On 18Aug2016 0829, Chris Angelico wrote:
> The second call to glob doesn't have any Unicode characters at all, the way I see it - it's all bytes. Am I completely misunderstanding this?
You're not the only one - I think this has been the most common misunderstanding.
On Windows, the paths as stored in the filesystem are actually all text - more precisely, utf-16-le encoded bytes, represented as strings of 16-bit characters.
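As a rough illustration of that storage format, the sketch below encodes a made-up file name as utf-16-le and splits it into 16-bit units. The name is hypothetical, and this only mimics the representation in pure Python; it does not call any Windows API.

```python
# Hypothetical file name containing non-ASCII characters.
name = "caf\u00e9\u4e2d.txt"

# utf-16-le: two bytes per code unit, least significant byte first.
raw = name.encode("utf-16-le")

# Regroup the bytes into the 16-bit units described above.
units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

print(len(name), len(raw))        # 9 characters, 18 bytes
print([hex(u) for u in units])    # one 16-bit value per character here

# Decoding returns the original text, so nothing is lost at this level.
assert raw.decode("utf-16-le") == name
```

Every character in this name fits in a single 16-bit unit; characters outside the Basic Multilingual Plane would take two units (a surrogate pair).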
Converting to an 8-bit character representation only exists for compatibility with code written for other platforms (either Linux, or much older versions of Windows). The operating system has one way to do the conversion to bytes, which Python currently uses, but since we control that transformation I'm proposing an alternative conversion that favours reliability over compatibility. (That is compatibility with Windows 3.1. It shouldn't affect compatibility with code that properly handles multibyte encodings, which should include anything developed for Linux in the last decade or two.)
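To make the reliability point concrete, here is a minimal sketch comparing a lossy 8-bit conversion with one that round-trips. Two assumptions that are not stated in this message: cp1252 with errors="replace" stands in for the operating system's conversion (the real code page depends on system settings), and UTF-8 stands in for the alternative conversion.

```python
# Hypothetical file name; U+4E2D cannot be represented in cp1252.
name = "caf\u00e9\u4e2d.txt"

# Assumption: cp1252 + "replace" approximates the OS conversion to bytes.
lossy = name.encode("cp1252", errors="replace")
print(lossy)                        # the CJK character has become b"?"

# The original name cannot be recovered from the lossy bytes.
assert lossy.decode("cp1252") != name

# Assumption: UTF-8 as the alternative. It is a multibyte encoding that
# can represent every character in this name, so the round trip is exact.
reliable = name.encode("utf-8")
assert reliable.decode("utf-8") == name
```

Code that already copes with multibyte encodings treats the UTF-8 bytes as opaque path data, which is why such code should be unaffected.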
Does that help? I tried to keep the explanation short and focused :)

Cheers,
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/