eryk sun writes:

> On Wed, Aug 17, 2016 at 9:35 AM, Stephen J. Turnbull
> <turnbull.stephen...@u.tsukuba.ac.jp> wrote:
> > BTW, why "surrogate pairs"? Does Windows validate surrogates to
> > ensure they come in pairs, but not necessarily in the right order (or
> > perhaps sometimes they resolve to non-characters such as U+1FFFF)?
>
> Microsoft's filesystems remain compatible with UCS2

So it's not just invalid surrogate *pairs*, it's invalid surrogates of all kinds. This means that it's theoretically possible (though I gather that it's unlikely in the extreme) for a real Windows filename to be indistinguishable from one generated by Python's surrogateescape handler.

What happens when Python's directory manipulation functions on Windows encounter such a filename? Do they try to write it to the disk directory? Do they succeed? Does that depend on surrogateescape?

Is there a reason in practice to allow surrogateescape at all on names in Windows filesystems, at least when using the *W API? You mention non-Microsoft filesystems; are they common enough to matter?

I admit that as we converge on sanity (UTF-8 for text/* content, some kind of Unicode for filesystem names) none of this is very likely to matter, but I'm a worrywart....

Steve

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
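
A minimal sketch of the collision worried about in the message above, using only codec operations and no filesystem access. The byte strings and the helper name `has_escape_range_surrogate` are made up for illustration. The "raw UTF-16" name is a hypothetical stand-in for what a UCS2-compatible filesystem could store; whether Python's directory functions on Windows would actually hand back such a string is exactly the open question, so treat that part as an assumption.

```python
def has_escape_range_surrogate(name: str) -> bool:
    """Return True if name contains a code point in U+DC80..U+DCFF,
    the range the surrogateescape handler uses for undecodable bytes."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in name)


# 1. A str produced by surrogateescape from a byte that is not valid UTF-8.
from_bytes = b"caf\xe9".decode("utf-8", "surrogateescape")

# 2. Hypothetical raw 16-bit code units ending in a lone low surrogate
#    0xDCE9, decoded with surrogatepass so the lone surrogate is kept.
raw_units = b"c\x00a\x00f\x00\xe9\xdc"
from_utf16 = raw_units.decode("utf-16-le", "surrogatepass")

# The two strings compare equal, so Python code cannot tell which
# origin a given name had.
same = from_bytes == from_utf16

# A strict UTF-16 encode refuses the lone surrogate.
try:
    from_bytes.encode("utf-16-le")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

if __name__ == "__main__":
    print(ascii(from_bytes), ascii(from_utf16), same, strict_ok)
    print(has_escape_range_surrogate(from_utf16))
```

The check in `has_escape_range_surrogate` can flag a suspicious name, but it cannot say whether the surrogate came from an escaped byte or was already present in the name, which is the ambiguity in question.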