On Wed, May 21, 2014 at 12:30 PM, John Dennis <jden...@redhat.com> wrote:
> On 05/15/2014 11:41 AM, Victor Stinner wrote:
>> Hi,
>>
>> The functions safe_decode() and safe_encode() have been ported to Python 3,
>> and changed more than once. IMO we can still improve these functions to make
>> them more reliable and easier to use.
>>
>>
>> (1) My first concern is that these functions try to guess user expectation
>> about encodings. They use "sys.stdin.encoding or sys.getdefaultencoding()" as
>> the default encoding to decode, but this encoding depends on the locale
>> encoding (stdin encoding), on stdin (is stdin a TTY? is stdin mocked?), and
>> on the Python major version.
>>
>> IMO the default encoding should be UTF-8 because most OpenStack components
>> expect this encoding.
>>
>> Or maybe users want to display data to the terminal, and so the locale
>> encoding should be used? In this case, locale.getpreferredencoding() would be
>> more reliable than sys.stdin.encoding.
>
> The problem is you can't know the correct encoding to use until you know
> the encoding of the IO stream, therefore I don't think you can correctly
> write generic encode/decode functions. What if you're trying to send
> the output to multiple IO streams potentially with different encodings?
> Think that's far fetched? Nope, it's one of the nastiest and most common
> problems in Python2. The default encoding differs depending on whether
> the IO target is a tty or not. Therefore code that works fine when
> written to the terminal blows up with encoding errors when redirected to
> a file (because the TTY probably has UTF-8 and all other encodings
> default to ASCII due to sys.getdefaultencoding()).
>
> Another problem is that Python2 default encoding is ASCII but in Python3
> it's UTF-8 (IMHO the default encoding in Python2 should have been UTF-8,
> the fact it was set to ASCII is the cause of 99% of the encoding
> exceptions in Python2).
>
> Given that you don't know what the encoding of the IO stream is I don't
> think you should base it on the locale nor sys.stdin. Rather I think we
> should just agree everything is UTF-8. If that messes up someone's
> terminal output I think it's fair to say if you're running OpenStack
> you'll need to switch to UTF-8. Anything else requires way more
> knowledge than we have available in a generic function. Solving this so
> the encodings match for each and every IO stream is very complicated,
> note Python3 still punts on this.
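
For reference, the two positions quoted above can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
actual safe_decode()/safe_encode() code: the names decode_text and
encode_for_stream are made up for the example, and the UTF-8 default is
simply the one proposed in the quoted text.

```python
import io
import locale


def decode_text(data, incoming="utf-8", errors="strict"):
    """Decode bytes to text with an explicit encoding (UTF-8 by default).

    Hypothetical helper: nothing is guessed from sys.stdin or the locale.
    """
    if isinstance(data, bytes):
        return data.decode(incoming, errors)
    return data  # already text, return unchanged


def encode_for_stream(text, stream, errors="replace"):
    """Encode text for one specific output stream.

    Uses the encoding the stream itself reports and falls back to the
    locale's preferred encoding when the stream reports none (as Python 2
    does for output redirected to a file).
    """
    encoding = getattr(stream, "encoding", None) or locale.getpreferredencoding()
    return text.encode(encoding, errors)


# Example: bytes known to be UTF-8, written to a stream that is Latin-1.
text = decode_text(b"caf\xc3\xa9")
latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
encoded = encode_for_stream(text, latin1_stream)  # b'caf\xe9'
```

The first helper never guesses. The second chooses the encoding per
stream, which is the part a single generic default cannot provide.
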

Unfortunately we can't just agree to a single encoding in all cases. Lots
of people use encodings other than UTF-8 for terminals, and that's where
these functions are most frequently used.

Doug

>
>
> --
> John
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev