Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

eryk sun Tue, 09 Feb 2016 05:35:55 -0800

On Tue, Feb 9, 2016 at 3:22 AM, Victor Stinner <[email protected]> wrote:
> 2016-02-09 1:37 GMT+01:00 eryk sun <[email protected]>:
>> For example, in codepage 932 (Japanese), it's an error if a lead byte
>> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
>> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
>> uncommon). In this case the ANSI API substitutes the default character
>> for Japanese, '・' (U+30FB, Katakana middle dot).
>>
>>     >>> import locale, os
>>     >>> locale.getpreferredencoding()
>>     'cp932'
>>     >>> open(b'\xe05', 'w').close()
>>     >>> os.listdir('.')
>>     ['・']
>>     >>> os.listdir(b'.')
>>     [b'\x81E']
>>
>> All invalid sequences get mapped to '・', which roundtrips as
>> b'\x81\x45', so you can't reliably create and open files with
>> arbitrary bytes paths in this locale.
>
> Oh, and I forgot to ask: what is your filesystem? Is it the same
> behaviour for NTFS, FAT32, network shared directories, etc.?


That was tested on NTFS, but the same would apply to FAT32, exFAT,
and UDF, since they all store filenames as Unicode [1]. CreateFile[A|W]
wraps the NtCreateFile system call. The NT executive is Unicode, so the
system call receives the filename in a Unicode-only OBJECT_ATTRIBUTES [2]
record. I can't say what an arbitrary non-Microsoft filesystem will do
with the U+30FB character when it processes the IRP_MJ_CREATE request.
I was only concerned with the ANSI<=>Unicode conversion that's
implemented in the ntdll.dll runtime library.
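
As a rough illustration of why such a bytes name can't roundtrip, here
is a sketch that uses Python's own cp932 codec instead of the Windows
ANSI API, so it runs on any platform. The helper names are hypothetical,
and the substitution step is only a model of the behaviour described
above (the whole invalid name collapsing to one default character, as in
the b'\xe05' example), not the actual ntdll.dll conversion.

```python
# Illustrative model only: Python's cp932 codec stands in for the
# Windows ANSI<=>Unicode conversion, which may differ in detail.
DEFAULT_CHAR = "\u30fb"  # katakana middle dot, the default character for cp932


def decodes_strictly(name: bytes) -> bool:
    """Return True if Python's cp932 codec accepts `name` without errors."""
    try:
        name.decode("cp932")
    except UnicodeDecodeError:
        return False
    return True


def survives_substitution(name: bytes) -> bool:
    """Hypothetical helper: does `name` come back unchanged after
    bytes -> Unicode -> bytes, assuming an invalid name is replaced
    by the default character?"""
    if decodes_strictly(name):
        text = name.decode("cp932")
    else:
        # Assumption: the invalid name maps to a single default character.
        text = DEFAULT_CHAR
    return text.encode("cp932") == name
```

With this model, survives_substitution(b'\xe05') is False because the
default character encodes to b'\x81E', while
survives_substitution(b'\x81E') is True.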

[1]: https://msdn.microsoft.com/en-us/library/ee681827
[2]: https://msdn.microsoft.com/en-us/library/ff557749