On 10Aug2016 1226, Random832 wrote:
> On Wed, Aug 10, 2016, at 15:08, Steve Dower wrote:
>> Testing with obscure filenames and strings is where help will be needed
>> most :)
>
> How about filenames with invalid surrogates? For added fun, consider
> that the file system encoding is normally used with surrogateescape.
This is where it gets extra fun, since surrogateescape is not normally
used on Windows because we receive paths as Unicode text and pass them
back as Unicode text without ever encoding or decoding them.
> Currently a broken filename (such as '\udee1.txt') can be correctly seen
> with os.listdir('.') but not os.listdir(b'.') (because Windows will
> return it as '?.txt'). It can be passed to open(), but encoding the name
> to utf-8 or utf-16 fails, and I doubt there's any encoding that is going
> to succeed.
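For anyone who wants to see the encoding failure without creating such a file, here is a minimal sketch, assuming Python 3. The variable names are only illustrative. It also shows that the 'surrogatepass' error handler (a handler, not a separate encoding) will let the lone surrogate through, producing bytes that are not valid UTF-8:

```python
# A lone low surrogate, as in the filename mentioned above.
name = '\udee1.txt'

# Strict encoding rejects the unpaired surrogate in both codecs.
failed = []
for encoding in ('utf-8', 'utf-16'):
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        failed.append(encoding)

print(failed)

# 'surrogatepass' encodes the surrogate as if it were an ordinary
# code point. The result round-trips, but only with the same handler.
raw = name.encode('utf-8', 'surrogatepass')
restored = raw.decode('utf-8', 'surrogatepass')
print(raw, restored == name)
```

Nothing here touches the file system, so it says nothing about what Windows itself returns for the bytes API; it only illustrates why a strict encode of that name cannot succeed.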
As far as I can tell, if you get a weird name in bytes today you are
broken, and there is no way to be unbroken without doing the actual
right thing and converting paths on POSIX into Unicode with
surrogateescape. So our official advice has to stay the same - treating
paths as text with smuggled bytes is the *only* way to be truly correct.
But unless we also deprecate byte paths on POSIX, we'll never get there.
(Now there's a dangerous idea ;) )
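To make "text with smuggled bytes" concrete, here is a rough sketch, assuming a UTF-8 file system encoding. It spells out the codec and error handler explicitly so it behaves the same on any platform; on POSIX, os.fsdecode() and os.fsencode() apply the same idea using the actual file system encoding. The sample bytes are made up for illustration:

```python
# A byte path that is not valid UTF-8 (0xE9 is Latin-1 for 'e' with
# an acute accent). Hypothetical sample data.
raw = b'caf\xe9.txt'

# surrogateescape maps each undecodable byte 0xNN to the lone
# surrogate U+DCNN instead of raising an error.
text = raw.decode('utf-8', 'surrogateescape')
print(ascii(text))

# Encoding with the same handler restores the original bytes exactly,
# so the path can be carried around as str and handed back unchanged.
roundtrip = text.encode('utf-8', 'surrogateescape')
print(roundtrip == raw)
```

The resulting str is not printable or encodable with strict error handling, which is the trade-off: it is only safe as an opaque path until it is passed back to the OS.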
Cheers,
Steve