On 11/20/2014 04:15 PM, Chris Angelico wrote:
> On Fri, Nov 21, 2014 at 1:14 AM, Francis Moreau <francis.m...@gmail.com>
> wrote:
>> Hi,
>>
>> Thanks for the "from __future__ import unicode_literals" trick, it makes
>> that switch much less intrusive.
>>
>> However it seems that I will suddenly be trapped by all modules which
>> are not prepared to handle unicode. For example:
>>
>> >>> from __future__ import unicode_literals
>> >>> import locale
>> >>> locale.setlocale(locale.LC_ALL, 'fr_FR')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/lib64/python2.7/locale.py", line 546, in setlocale
>>     locale = normalize(_build_localename(locale))
>>   File "/usr/lib64/python2.7/locale.py", line 453, in _build_localename
>>     language, encoding = localetuple
>> ValueError: too many values to unpack
>>
>> Is the locale module an exception and in that case I'll fix it by doing:
>>
>> >>> locale.setlocale(locale.LC_ALL, b'fr_FR')
>>
>> or is a (big) part of the modules in python 2.7 still not ready for
>> unicode and in that case I have to decide which prefix (u or b) I should
>> manually add ?
>
> Sadly, there are quite a lot of parts of Python 2 that simply don't
> handle Unicode strings. But you can probably keep all of those down to
> just a handful of explicit b"whatever" strings; most places should
> accept unicode as well as str. What you're seeing here is a prime
> example of one of this author's points (caution, long post):
>
> http://unspecified.wordpress.com/2012/04/19/the-importance-of-language-level-abstract-unicode-strings/
>
> """The lesson of Python 3 is: give programmers a Unicode string type,
> *make it the default*, and encoding issues will /mostly/ go away."""
>
> There's a whole ecosystem to Python 2 - some in the standard library,
> heaps more in the rest of the world - and a lot of it was written on
> the assumption that a byte is a character is an octet. When you pass
> Unicode strings to functions written to expect byte strings, sometimes
> you win, and sometimes you lose... even with the standard library
> itself. But the Python 3 ecosystem has been written on the assumption
> that strings are Unicode. It's only a narrow set of programs
> ("boundary code", where you're moving text across networks and stuff
> like that) where the Python 2 model is easier to work with; and the
> recent Py3 releases have been progressively working to relieve that
> pain.
>
> The absolute worst case is a function which exists in Python 2 and 3,
> and requires a byte string in Py2 and a text string in Py3. Sadly,
> that may be exactly what locale.setlocale() is. For that, I would
> suggest explicitly passing stuff through str():
>
> locale.setlocale(locale.LC_ALL, str('fr_FR'))
>
> In Python 3, 'fr_FR' is already a str, so passing it through str()
> will have no significant effect. (Though it would be worth commenting
> that, to make it clear to a subsequent reader that this is Py2 compat
> code.) In Python 2 with unicode_literals active, 'fr_FR' is a unicode,
> so passing it through str() will encode it to ASCII, producing a byte
> string that setlocale should be happy with.
>
> By the way, the reason for the strange error message is clearer in
> Python 3, which chains in another exception:
>
> >>> locale.setlocale(locale.LC_ALL, b'fr_FR')
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/locale.py", line 498, in _build_localename
>     language, encoding = localetuple
> ValueError: too many values to unpack (expected 2)
>
> During handling of the above exception, another exception occurred:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python3.5/locale.py", line 594, in setlocale
>     locale = normalize(_build_localename(locale))
>   File "/usr/local/lib/python3.5/locale.py", line 507, in _build_localename
>     raise TypeError('Locale must be None, a string, or an iterable of two strings -- language code, encoding.')
> TypeError: Locale must be None, a string, or an iterable of two strings -- language code, encoding.
>
> So when it gets the wrong type of string, it attempts to unpack it as
> an iterable; it yields five values (the five bytes or characters,
> depending on which way it's the wrong type of string), but it's
> expecting two. Fortunately, str() will deal with this. But make sure
> you don't have the b prefix, or str() in Py3 will give you quite a
> different result!
>
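
For reference, a minimal sketch of the str() approach described above. The helper name set_locale_portably is hypothetical (it is not part of the locale module), and the 'C' locale stands in for 'fr_FR' only because 'fr_FR' may not be installed on every system. In a Python 2 module, this code would sit below "from __future__ import unicode_literals".

```python
# Sketch only: set_locale_portably is a hypothetical helper name,
# not something provided by the locale module.
import locale
import sys


def set_locale_portably(name, category=locale.LC_ALL):
    """Call locale.setlocale() with the native str type of the running Python.

    On Python 2, str() encodes an ASCII-only unicode value to a byte string.
    On Python 3, str() returns a text string unchanged.
    """
    return locale.setlocale(category, str(name))


# 'C' is used because it is always available; substitute 'fr_FR' if installed.
result = set_locale_portably('C', locale.LC_CTYPE)

# Caveat from the thread: do not combine str() with a b'' literal on Python 3.
# There, str() of a bytes object gives its repr, not the decoded text.
if sys.version_info[0] >= 3:
    wrong = str(b'fr_FR')
```

Keeping the str() call inside one small wrapper means the Py2 compatibility detail is documented in a single place rather than repeated at every call site.
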
Yes, I ended up using str(), since setlocale() was the only call that had problems with unicode_literals active in my application.

Thanks, Chris, for your useful insight.

--
https://mail.python.org/mailman/listinfo/python-list