On Sun, Aug 7, 2011 at 5:57 AM, Sundance <[email protected]> wrote:
> On Sunday 07 August 2011 12:10:06 Aaron Meurer wrote:
>
>> If I set the last character to … (the unicode character), it just puts
>> a ?. If I set the second to last character to …, it makes that and
>> also the last character ?. But if I set the third to last character to
>> …, it works.
>
> On a hunch, I'd say that you're not setting the last *character*, but the last
> *byte*. Based on your account, it would seem that the '…' character as encoded
> in UTF-8 takes up three bytes. Let's check:
>
> >>> len( u'…'.encode('UTF-8') )
> 3
>
> And bingo. :)
>
> So it sounds like you're working with bytes instead of characters. It's a
> common issue with Python 2: 'this string' is bytes, and u'this one' is
> characters; the u indicator right before the string makes all the difference.
>
> In Python 3, 'this string' is characters and b'this one' is bytes, because
> really, characters should be the default, dammit. Basically, in Python 2,
> every bit of text in your code should be prefixed with the u indicator, which
> kind of sucks.
>
> This here is a pretty good presentation on Unicode in Python:
> http://farmdev.com/talks/unicode/
>
> It should help clear things up. Also recommended, Joel Spolsky's more general
> talk about Unicode: http://www.joelonsoftware.com/articles/Unicode.html
>
> Thank you again for your work on PuDB. You two are getting major cookie points
> from, err, some random guy somewhere in the world, but really, I do appreciate
> what you are doing. :)
>
> -- S.
>
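For reference, the byte-versus-character point quoted above can be checked with a small sketch. It is written for Python 3, where plain string literals are already characters. The helper name truncate_chars and its maxcol parameter are hypothetical, made up for illustration; this is not the PuDB code, and it assumes maxcol is at least 1.

```python
# -*- coding: utf-8 -*-
ELLIPSIS = u"\u2026"  # the '…' character

# One character, but three bytes once encoded as UTF-8.
encoded = ELLIPSIS.encode("utf-8")
print(len(ELLIPSIS))  # length in characters
print(len(encoded))   # length in bytes

# Slicing the *bytes* can cut a multi-byte sequence in half; decoding the
# fragment with errors="replace" then yields replacement characters
# instead of the ellipsis.
broken = encoded[:2].decode("utf-8", "replace")


def truncate_chars(text, maxcol):
    """Hypothetical helper: cut a character string to at most maxcol
    characters, putting an ellipsis in the last position if it was cut."""
    if len(text) <= maxcol:
        return text
    return text[:maxcol - 1] + ELLIPSIS


print(truncate_chars(u"abcdefghij", 5))
```

Truncating on the character string, and encoding only when the text is finally written out, keeps the ellipsis from being split. Whether the display layer then measures that string in characters or in bytes is a separate question.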
Well, making the original text unicode in the first place is not
possible, as it is just grabbed from the variable values. But even
so, I am converting the text to unicode first. I'm doing

    text[i] = (unicode(text[i][:maxcol-1]) + unicode(u'…') + unicode(text[i][maxcol:]))

(see the source). But this has no effect. I think the problem might
have something to do with the color codes.

Anyway, I think unicode characters in urwid are just broken (at least
in Python 2; once the pudb Python 3 port is ready I'll try it there).
For example, if I create a file

    # -*- coding: utf-8 -*-
    ellipsis = u"…"*100

and run it in pudb, it appears as

    # -*- coding: utf-8 -*-
    ellipsis = u"??"*100

and the coloring is messed up. u"? is in the string color (red), the
next ? is in the variable color (white), "*1 is in the number literal
color (light blue), and 00 is in the variable color (white) (all in
the midnight theme). Also, the vertical bar character separating the
right-hand sidebar is offset by one. All this is presumably because
something did not correctly account for the … character being three
bytes. (By the way, should I open an issue for this?)

So I think getting the unicode … to work at the end of the variables
list is a losing battle, at least in Python 2. Like I said, I'll try
it again in Python 3 when that is ready, where I hope it will just
work.

Aaron Meurer

_______________________________________________
Pudb mailing list
[email protected]
http://lists.tiker.net/listinfo/pudb
