On Sun, Aug 7, 2011 at 5:57 AM, Sundance <[email protected]> wrote:
> On Sunday 07 August 2011 12:10:06 Aaron Meurer wrote:
>
>> If I set the last character to … (the unicode character), it just puts
>> a ?.  If I set the second to last character to …, it makes that and
>> also the last character ?. But if I set the third to last character to
>> …, it works.
>
> On a hunch, I'd say that you're not setting the last *character*, but the last
> *byte*. Based on your account, it would seem that the '…' character as encoded
> in UTF-8 takes up three bytes. Let's check:
>
>  >>> len( u'…'.encode('UTF-8') )
>  3
>
> And bingo. :)
>
> So it sounds like you're working with bytes instead of characters. It's a
> common issue with Python 2: 'this string' is bytes, and u'this one' is
> characters; the u indicator right before the string makes all the difference.
>
> In Python 3, 'this string' is characters and b'this one' is bytes, because
> really, characters should be the default, dammit. Basically, in Python 2,
> every bit of text in your code should be prefixed with the u indicator, which
> kind of sucks.
>
> This here is a pretty good presentation on Unicode in Python:
>  http://farmdev.com/talks/unicode/
>
> It should help clear things up. Also recommended, Joel Spolsky's more general
> talk about Unicode: http://www.joelonsoftware.com/articles/Unicode.html
>
> Thank you again for your work on PuDB. You two are getting major cookie points
> from, err, some random guy somewhere in the world, but really, I do appreciate
> what you are doing. :)
>
> -- S.
>

Well, making the original text unicode in the first place is not
possible, as it is just grabbed from the variable values.

But even so, I am making the text unicode first.  I'm doing text[i] =
(unicode(text[i][:maxcol-1]) + unicode(u'…') +
unicode(text[i][maxcol:])) (see the source).  But this has no effect.
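
For reference, here is a minimal sketch of the kind of truncation I mean,
written so that any byte string is decoded explicitly before it is sliced.
The names to_text and mark_truncation are made up for illustration (they
are not in the pudb source), and UTF-8 is only an assumed encoding:

```python
# -*- coding: utf-8 -*-
ELLIPSIS = u"\u2026"  # the single character "…"


def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding bytes explicitly.

    The encoding is an assumption; undecodable bytes are replaced
    rather than raising an error.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, "replace")
    return value


def mark_truncation(line, maxcol):
    """Hypothetical helper: put an ellipsis at index maxcol - 1 of line.

    Mirrors the expression above, but slices characters, not bytes.
    Lines that already fit in maxcol characters are returned unchanged
    (that check is my own assumption).
    """
    line = to_text(line)
    if len(line) <= maxcol:
        return line
    return line[:maxcol - 1] + ELLIPSIS + line[maxcol:]
```

Since the slicing happens on characters here, the ellipsis can never land
in the middle of a multi-byte sequence, whatever happens further down in
the display code.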

I think the problem might have something to do with the color codes.
Anyway, I think unicode characters in urwid are just broken (at least
in Python 2; once the pudb Python 3 port is ready I'll try it there).
For example, if I create a file


# -*- coding: utf-8 -*-

ellipsis = u"…"*100

and run it in pudb, it appears as


# -*- coding: utf-8 -*-

ellipsis = u"??"*100

and the coloring is messed up. u"? is the string color (red), the next
? is the variable color (white), "*1 is the number literal color
(light blue), and 00 is the variable color (white) (all in the midnight
theme).  Also, the vertical bar character separating the right-hand
sidebar is offset by one.  All this is presumably because the width of
the … character, which is three bytes in UTF-8, isn't being calculated
correctly.  (By the way, should I open an issue for this?)
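
To make the mismatch concrete, here is a rough sketch comparing the three
numbers that can get confused for that line: characters, UTF-8 bytes, and
terminal columns.  The column_width function is my own approximation
(it is not what urwid does internally), and it assumes "ambiguous width"
characters such as … take one column:

```python
import unicodedata


def column_width(text):
    """Approximate terminal column count for a unicode string.

    Assumption: wide/fullwidth characters take two columns, combining
    marks take none, and everything else (including "ambiguous" width
    characters such as the ellipsis) takes one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


line = u'ellipsis = u"\u2026"*100'

n_chars = len(line)                  # characters in the source line
n_bytes = len(line.encode("utf-8"))  # bytes once encoded as UTF-8
n_cols = column_width(line)          # columns it should occupy on screen

print(n_chars, n_bytes, n_cols)
```

The byte count comes out two higher than the character and column counts,
one extra pair of bytes for the single … in the line.  Code that counts
bytes where it should count columns would be off by that amount, which
may be related to the shifted colors and the misplaced vertical bar.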

So I think getting the unicode … to work at the end of the variables
list is a losing battle, at least in Python 2.  Like I said, I'll try
it again in Python 3 when that is ready, where I hope it will just
work.

Aaron Meurer

_______________________________________________
Pudb mailing list
[email protected]
http://lists.tiker.net/listinfo/pudb
