Michael Felt <aixto...@felt.demon.nl> added the comment:

On 23/08/2018 12:51, STINNER Victor wrote:
> STINNER Victor <vstin...@redhat.com> added the comment:
>
> Your issue is about decoding command line argument which is done from main() 
> function. It doesn't use Python codecs, but functions like Py_DecodeLocale().
This is beyond my understanding atm.
Early on I tried making the expected just be 'arg' and went from
situation A to situation B - which looked much closer, BUT, the 'types'
differed:

Situaltion A (original)
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"['h\\udcc3\\udca9\\udce2\\udc82\\udcac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ ['h\udcc3\udca9\udce2\udc82\udcac']
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

I tried saying the "expected" is arg, but arg is still a byte object, the 
cmd_line result is not (printed as such).

Situation B
AssertionError: "['h\\xc3\\xa9\\xe2\\x82\\xac']" != 
"[b'h\\xc3\\xa9\\xe2\\x82\\xac']"
- ['h\xc3\xa9\xe2\x82\xac']
+ [b'h\xc3\xa9\xe2\x82\xac']
?  +
 : ISO8859-1:['h\xc3\xa9\xe2\x82\xac']

After further digging - to understand why it was coming as "\x encoding rather 
than \udc"

I looked at what was happening here:

out = self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
becomes
out = self.get_output('-X', utf8_opt, '-c', code, 
'h\xe9\u20ac'.encode('utf-8'), **kw)
becomes
out = self.get_output('-X', utf8_opt, '-c', code, b'h\xc3\xa9\xe2\x82\xac', 
**kw)

And finally, at the CLI becomes:
['/data/prj/python/python3-3.8/python', '-X', 'faulthandler', '-X', 'utf8', 
'-c', 'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
ascii(sys.argv[1:])))', b'h\xc3\xa9\xe2\x82\xac']

/data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8' '-c' 
'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), 
ascii(sys.ar
gv[1:])))', b'h\xc3\xa9\xe2\x82\xac'
UTF-8:['bh\\xc3\\xa9\\xe2\\x82\\xac']

/data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 
'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.
argv[1:])))', b'h\xc3\xa9\xe2\x82\xac'
ISO8859-1:['bh\\xc3\\xa9\\xe2\\x82\\xac']

Note:
/data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 
'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.
argv[1:])))', 'h\udcc3\udca9\udce2\udc82\udcac'
ISO8859-1:['h\\udcc3\\udca9\\udce2\\udc82\\udcac']

/data/prj/python/python3-3.8/python '-X' 'faulthandler' '-X' 'utf8=0' '-c' 
'import locale, sys; print("%s:%s" % (locale.getpreferredencoding(), ascii(sys.
argv[1:])))', b'h\udcc3\udca9\udce2\udc82\udcac'
ISO8859-1:['bh\\udcc3\\udca9\\udce2\\udc82\\udcac']

root@x066:[/data/prj/python/python3-3.8]/data/prj/python/python3-3.8/python 
'-X' 'faulthandler' '-X' 'utf8' '-c' 'import locale, sys; print("%s:%s" % (>
UTF-8:['bh\\udcc3\\udca9\\udce2\\udc82\\udcac']

Summary:
a) concerned about how b'h....' becomes 'bh....'
b) whatwever argv[1] is, is very close to what is returned - so whatever 
happens durinf the transformation from 
self.get_output('-X', utf8_opt, '-c', code, arg, **kw)
 determines the output and the (failed) comparison.

>> Question 1: why is windows excluded? Because it does not use UTF-8 as it's 
>> default (it's default is CP1252)
> Windows uses wmain() which gets command line arguments as wchar_t* strings: 
> Unicode. No decoding is needed.
>
> ----------
>
> _______________________________________
> Python tracker <rep...@bugs.python.org>
> <https://bugs.python.org/issue34347>
> _______________________________________
>

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue34347>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to