On 05/15/2014 11:41 AM, Victor Stinner wrote:
> Hi,
>
> The functions safe_decode() and safe_encode() have been ported to Python 3,
> and changed more than once. IMO we can still improve these functions to make
> them more reliable and easier to use.
>
>
> (1) My first concern is that these functions try to guess user expectation
> about encodings. They use "sys.stdin.encoding or sys.getdefaultencoding()" as
> the default encoding to decode, but this encoding depends on the locale
> encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and
> on the Python major version.
>
> IMO the default encoding should be UTF-8 because most OpenStack components
> expect this encoding.
>
> Or maybe users want to display data to the terminal, and so the locale
> encoding should be used? In this case, locale.getpreferredencoding() would be
> more reliable than sys.stdin.encoding.

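For reference, the three sources named in the quoted text can be compared directly. The snippet below is only a rough sketch: candidate_encodings() is a hypothetical helper invented for this illustration, not part of any OpenStack library. It simply reports what each source returns in the current process, so it can be run once from a terminal and once with stdin redirected to see how the answers differ.

```python
import locale
import sys


def candidate_encodings():
    """Return what each encoding source mentioned above reports right now.

    Hypothetical helper, for illustration only.
    """
    return {
        # May be None, e.g. when stdin is replaced by an object without an
        # "encoding" attribute, or on Python 2 when stdin is not a TTY.
        "sys.stdin.encoding": getattr(sys.stdin, "encoding", None),
        # The interpreter-wide default used for implicit str/bytes conversion.
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # Derived from the user's locale settings.
        "locale.getpreferredencoding()": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for source, encoding in sorted(candidate_encodings().items()):
        print("%-32s %r" % (source, encoding))
```

Comparing the output of a plain run with a run such as "python script.py < /dev/null" shows whether the sources agree on a given machine.
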
The problem is you can't know the correct encoding to use until you know the encoding of the IO stream, therefore I don't think you can correctly write generic encode/decode functions. What if you're trying to send the output to multiple IO streams, potentially with different encodings?

Think that's far-fetched? Nope, it's one of the nastiest and most common problems in Python 2. The default encoding differs depending on whether the IO target is a TTY or not. Therefore code that works fine when writing to the terminal blows up with encoding errors when redirected to a file (because the TTY probably has UTF-8, and everything else falls back to ASCII via sys.getdefaultencoding()).

Another problem is that the Python 2 default encoding is ASCII but in Python 3 it's UTF-8. (IMHO the default encoding in Python 2 should have been UTF-8; the fact it was set to ASCII is the cause of 99% of the encoding exceptions in Python 2.)

Given that you don't know what the encoding of the IO stream is, I don't think you should base the default on the locale or on sys.stdin. Rather, I think we should just agree everything is UTF-8. If that messes up someone's terminal output, I think it's fair to say that if you're running OpenStack you'll need to switch to UTF-8. Anything else requires way more knowledge than we have available in a generic function. Solving this so the encodings match for each and every IO stream is very complicated; note that Python 3 still punts on this.

-- 
John
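
To illustrate the "just agree everything is UTF-8" position above, here is a minimal sketch written for Python 3. The names to_text() and to_bytes() are hypothetical and chosen for this example; they are not the existing safe_decode()/safe_encode() signatures. The only point is that the default is a fixed "utf-8" rather than something derived from sys.stdin or the locale, so the result does not depend on where the output is going.

```python
def to_text(value, encoding="utf-8", errors="strict"):
    """Decode bytes to text; text is returned unchanged.

    Hypothetical helper: the default is a fixed UTF-8, never guessed
    from sys.stdin or the locale.
    """
    if isinstance(value, bytes):
        return value.decode(encoding, errors)
    return value


def to_bytes(value, encoding="utf-8", errors="strict"):
    """Encode text to bytes; bytes are returned unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding, errors)


# Same bytes regardless of TTY, redirection, or locale settings.
raw = to_bytes(u"caf\u00e9")
text = to_text(raw)
```

A caller that really does know the encoding of a particular stream can still pass it explicitly through the encoding argument.
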