On Sunday 07 August 2011 12:10:06 Aaron Meurer wrote:

> If I set the last character to … (the unicode character), it just puts
> a ?.  If I set the second to last character to …, it makes that and
> also the last character ?. But if I set the third to last character to
> …, it works.

On a hunch, I'd say that you're not setting the last *character*, but the last 
*byte*. Based on your account, it would seem that the '…' character as encoded 
in UTF-8 takes up three bytes. Let's check:

  >>> len( u'…'.encode('UTF-8') )
  3

And bingo. :)
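
To make the failure concrete, here is a minimal sketch of what I suspect is 
going on. It is not PuDB's actual code: fit_last is a hypothetical helper I 
made up, and it simply assumes that only a fixed number of bytes is left at 
the end of the line for the '…' to go into.

```python
# -*- coding: utf-8 -*-
ELLIPSIS = u'\u2026'                 # the '…' character
ENCODED = ELLIPSIS.encode('utf-8')   # its UTF-8 form, three bytes long


def fit_last(room):
    """Hypothetical helper: decode '…' when only `room` bytes are left.

    Bytes that do not fit are cut off. An incomplete UTF-8 sequence
    cannot be decoded, so 'replace' substitutes U+FFFD for it, which
    many terminals display as '?'.
    """
    return ENCODED[:room].decode('utf-8', 'replace')


for room in (1, 2, 3):
    print(room, repr(fit_last(room)))
```

With room for one or two bytes the sequence is cut short and you get a 
replacement character; only with room for all three does the '…' survive, 
which would match what you describe.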

So it sounds like you're working with bytes instead of characters. It's a 
common pitfall in Python 2: 'this string' is bytes, and u'this one' is 
characters; the u prefix right before the string makes all the difference.

In Python 3, 'this string' is characters and b'this one' is bytes, because 
really, characters should be the default, dammit. Basically, in Python 2, 
every text literal in your code should carry the u prefix, which kind of 
sucks.
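
A quick sketch of the difference, assuming the usual fix of decoding to 
characters first and only then cutting the text to length. The sample string 
and the truncate_chars helper are made up for illustration; the u prefix is 
accepted by Python 2 and by Python 3.3 and later, so this runs on both.

```python
# -*- coding: utf-8 -*-
text = u'caf\u00e9 \u2026'   # 'café …' as characters
raw = text.encode('utf-8')   # the same text as UTF-8 bytes

# The two lengths differ: 'é' takes two bytes and '…' takes three.
print(len(text), len(raw))


def truncate_chars(data, max_chars):
    """Hypothetical helper: shorten UTF-8 `data` to at most `max_chars`
    characters, ending in '…' if anything was cut.

    Decoding before slicing means the cut always falls between
    characters and never in the middle of a multi-byte sequence.
    """
    s = data.decode('utf-8')
    if len(s) <= max_chars:
        return s
    return s[:max_chars - 1] + u'\u2026'


print(repr(truncate_chars(raw, 4)))
```

Counting and slicing happen on the decoded text, so the '…' always lands 
whole, wherever in the line it ends up.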

This here is a pretty good presentation on Unicode in Python:
  http://farmdev.com/talks/unicode/

It should help clear things up. Also recommended: Joel Spolsky's more general 
article about Unicode: http://www.joelonsoftware.com/articles/Unicode.html

Thank you again for your work on PuDB. You two are getting major cookie points 
from, err, some random guy somewhere in the world, but really, I do appreciate 
what you are doing. :)

-- S.

_______________________________________________
Pudb mailing list
[email protected]
http://lists.tiker.net/listinfo/pudb
