On Jan 13, 12:06 am, Christian Heimes <li...@cheimes.de> wrote:
> >> Perhaps you also like to hear from a developer who has worked on Python
> >> 3.0 itself and who has done lots of work with internationalized
> >> applications. If you want to get it right you must
> >>
> >> * decode incoming text data to unicode as early as possible
> >> * use unicode for all internal text data
> >> * encode outgoing unicode as late as possible.
> >>
> >> where incoming data is read from the file system, database, network etc.
> >>
> >> This rule applies not only to Python 3.0 but to *any* application
> >> written in *any* language.
>
> > The above is a story with which I'm quite familiar. However it is
> > *not* the issue!! The issue is why would anyone propose changing a
> > string constant "foo" in working 2.x code to u"foo"?
>
> Do I really have to repeat "use unicode for all internal text data"?
>
> "foo" and u"foo" are two totally different things. The former is a byte
> sequence "\x66\x6f\x6f" while the latter is the text 'foo'. It just
> happens that "foo" and u"foo" are equal in Python 2.x because
> "foo".decode("ascii") == u"foo". Python 3.x does it right: b"foo" is
> unequal to "foo".
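For the record, Christian's three rules in runnable form -- a minimal
sketch, assuming a hypothetical UTF-8 encoded file "names.txt" (io.open
behaves the same way on 2.6+ and 3.x):

    import io

    def read_names(path):
        # decode incoming bytes to unicode as early as possible
        with io.open(path, encoding="utf-8") as f:
            return [line.strip() for line in f]  # unicode internally

    def write_names(path, names):
        # encode outgoing unicode as late as possible
        with io.open(path, "w", encoding="utf-8") as f:
            f.write(u"\n".join(names) + u"\n")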
Again, all very true, but irrelevant. b"foo" is *not* involved. You're
ignoring the effect of 2to3:

Original 2.x code:
    assert "foo" == u"foo"   # works
Output from 2to3:
    assert "foo" == "foo"    # works

Original 2.x code with u prepended:
    assert u"foo" == u"foo"  # works
Output from 2to3:
    assert "foo" == "foo"    # works

I say again: show me a case of working 2.5 code where prepending u to an
ASCII string constant that is intended to be used in a text context is
actually worth the keystrokes.
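Anyone who wants to verify this can run 2to3 programmatically. A minimal
sketch using the lib2to3 package that ships with CPython (lib2to3 is
deprecated since 3.9 and removed in 3.13, so run this on an earlier
Python 3, or use the 2to3 command-line script instead):

    from lib2to3 import refactor

    # load the stock fixers, including fix_unicode, which strips u-prefixes
    fixers = refactor.get_fixers_from_package("lib2to3.fixes")
    tool = refactor.RefactoringTool(fixers)

    for src in ('assert "foo" == u"foo"\n',
                'assert u"foo" == u"foo"\n'):
        tree = tool.refactor_string(src, "<example>")
        print(str(tree), end="")  # both print: assert "foo" == "foo"

Both inputs come out as the identical line, which is exactly the point:
the u prefix buys you nothing once 2to3 has run.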