Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-10 Thread Serhiy Storchaka

On 08.02.16 16:32, Victor Stinner wrote:

> On Python 2, it wasn't possible to use Unicode for filenames, many
> functions fail badly with Unicode, especially when you mix bytes and
> Unicode.


Not even all os functions support Unicode.
See http://bugs.python.org/issue18695.


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread Victor Stinner
On Wednesday, 10 February 2016, Steve Dower wrote:
>
> I really don't like the idea of not being able to use bytes in cross
> platform code. Unless it's become feasible to use Unicode for lossless
> filenames on Linux - last I heard it wasn't.
>

The point of my email is that even on Python 3, users have kept bad habits
from Python 2. *Yes*, you can use Unicode filenames on all platforms
on Python 3, and have been able to since 2009 thanks to the following PEP:
 https://www.python.org/dev/peps/pep-0383/

In my first email, I mentioned a bug report from a user still using bytes
filenames on Windows with Python 3. It is on the Blender project, which
*only* supports Python 3. Or maybe I missed something huge which really
forces Blender to use bytes??? But if a few functions still require bytes, I
would suggest using os.fsencode() for them instead. It's much more
convenient to handle filenames as Unicode on Python 3.
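
A minimal sketch of that conversion, assuming a UTF-8 filesystem encoding as on most current UNIX systems. The raw name and the helper name `to_legacy_bytes` are hypothetical, chosen only for illustration. PEP 383's surrogateescape error handler lets an undecodable byte survive a trip through str, and os.fsencode() turns a str path back into bytes for the few functions that still need them:

```python
import os

# Hypothetical raw name: 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
raw = b"caf\xe9.txt"

# PEP 383: the undecodable byte becomes the lone surrogate U+DCE9
# instead of raising UnicodeDecodeError.
name = raw.decode("utf-8", "surrogateescape")

# Encoding with the same error handler restores the original bytes.
restored = name.encode("utf-8", "surrogateescape")


def to_legacy_bytes(path):
    """Hypothetical helper: keep str everywhere, convert only at the boundary."""
    return os.fsencode(path)
```

os.fsdecode() and os.fsencode() apply the same idea using the platform's filesystem encoding and error handler, so application code does not have to name a codec.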

Victor


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread eryk sun
On Tue, Feb 9, 2016 at 3:22 AM, Victor Stinner  wrote:
> 2016-02-09 1:37 GMT+01:00 eryk sun :
>> For example, in codepage 932 (Japanese), it's an error if a lead byte
>> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
>> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
>> uncommon). In this case the ANSI API substitutes the default character
>> for Japanese, '・' (U+30FB, Katakana middle dot).
>>
>> >>> locale.getpreferredencoding()
>> 'cp932'
>> >>> open(b'\xe05', 'w').close()
>> >>> os.listdir('.')
>> ['・']
>> >>> os.listdir(b'.')
>> [b'\x81E']
>>
>> All invalid sequences get mapped to '・', which roundtrips as
>> b'\x81\x45', so you can't reliably create and open files with
>> arbitrary bytes paths in this locale.
>
> Oh, and I forgot to ask: what is your filesystem? Is it the same
> behaviour for NTFS, FAT32, network shared directories, etc.?

That was tested using NTFS, but the same would apply to FAT32, exFAT,
and UDF since they all use Unicode [1]. CreateFile[A|W] wraps the
NtCreateFile system call. The NT executive is Unicode, so the system
call receives the filename using a Unicode-only OBJECT_ATTRIBUTES [2]
record. I can't say what an arbitrary non-Microsoft filesystem will do
with the U+30FB character when it processes the IRP_MJ_CREATE. I was
only concerned with ANSI<=>Unicode conversion that's implemented in
the ntdll.dll runtime library.

[1]: https://msdn.microsoft.com/en-us/library/ee681827
[2]: https://msdn.microsoft.com/en-us/library/ff557749


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread eryk sun
On Tue, Feb 9, 2016 at 3:21 AM, Victor Stinner  wrote:
> 2016-02-09 1:37 GMT+01:00 eryk sun :
>> For example, in codepage 932 (Japanese), it's an error if a lead byte
>> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
>> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
>> uncommon). In this case the ANSI API substitutes the default character
>> for Japanese, '・' (U+30FB, Katakana middle dot).
>>
>> >>> locale.getpreferredencoding()
>> 'cp932'
>> >>> open(b'\xe05', 'w').close()
>> >>> os.listdir('.')
>> ['・']
>> >>> os.listdir(b'.')
>> [b'\x81E']
>
> Hum, I'm not sure that I understand your example.

Say I create a sequence of files with the names "file_à[N].txt"
encoded in Latin-1, where N is 0-2. They all map to the same file in a
Japanese system locale:

>>> open(b'file_\xe00.txt', 'w').close(); os.listdir('.')
['file_・.txt']
>>> open(b'file_\xe01.txt', 'w').close(); os.listdir('.')
['file_・.txt']
>>> open(b'file_\xe02.txt', 'w').close(); os.listdir('.')
['file_・.txt']
>>> os.listdir(b'.')
[b'file_\x81E.txt']

This isn't a problem with a single-byte codepage such as 1251. For
example, codepage 1251 doesn't map b"\x98" to any character, but
harmlessly maps it to "\x98" (SOS in the C1 Controls block).

Single-byte code pages still have the problem that when a filename is
created using the wide-character API, listing it as bytes may use
either an approximate mapping (e.g. "à" => "a" in 1251) or the
codepage default character (e.g. "\xd7" => "?" in 1251).
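
To make this concrete, here is a rough sketch that uses Python's own cp1251 codec to predict which names cannot be represented exactly in a single-byte ANSI code page. It is only an approximation: the Windows ANSI API substitutes a best-fit or default character where Python's strict codec raises an error, and the helper name `unencodable_names` is hypothetical.

```python
def unencodable_names(names, codepage):
    """Return the names that have no exact encoding in `codepage`.

    Hypothetical helper: it relies on Python's codec tables, not on the
    Windows conversion routines, so best-fit mappings are not modelled.
    """
    lossy = []
    for name in names:
        try:
            name.encode(codepage)
        except UnicodeEncodeError:
            lossy.append(name)
    return lossy


# U+00E0 and U+00D7 are not in code page 1251; the Cyrillic letters are.
candidates = ["file_\xe0.txt", "a\xd7b.txt", "\u043e\u0442\u0447\u0451\u0442.txt"]
problem_names = unencodable_names(candidates, "cp1251")
```

Names flagged this way are the ones that would come back altered from a bytes listing, so they could no longer be opened through the bytes API.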


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread Victor Stinner
2016-02-08 19:26 GMT+01:00 Paul Moore :
> On 8 February 2016 at 14:32, Victor Stinner  wrote:
>> Since 3.3, functions of the os module started to emit
>> DeprecationWarning when called with bytes filenames.
>
> Everywhere? Or just on Windows? I can't tell from your email and I
> don't have a Unix system to hand to check.

I propose to only drop support for bytes filenames on Windows.

Victor


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread Paul Moore
On 9 February 2016 at 10:13, Victor Stinner  wrote:
> IMHO we have to put a line somewhere between Python 2 and Python 3.
> For some specific use cases, there is no good solution which works on
> both Python versions.
>
> For filenames, there is no simple design on Python 2. bytes is the
> natural choice on UNIX, whereas Unicode is preferred on Windows. But
> it's difficult to handle two types in the same code base. As a
> consequence, most users use bytes on Python 2, which is a bad choice
> for Windows...
>
> On Python 3, it's much simpler: always use Unicode. Again, the PEP 383
> helps on UNIX.

So if you were proposing "drop the bytes APIs everywhere" that might
be acceptable (for Python 3). But of course it makes porting harder,
so it's probably not a good idea until Python 2 is no longer relevant.

Paul


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread Victor Stinner
Hi,

2016-02-08 18:02 GMT+01:00 Brett Cannon :
> If Unicode strings don't work in Python 2 then what is Python 2/3 to do as a
> cross-platform solution if we completely remove bytes support in Python 3?
> Wouldn't that mean there is no common type between Python 2 & 3 that one can
> use which will work with the os module except native strings (which are
> difficult to get right)?

IMHO we have to put a line somewhere between Python 2 and Python 3.
For some specific use cases, there is no good solution which works on
both Python versions.

For filenames, there is no simple design on Python 2. bytes is the
natural choice on UNIX, whereas Unicode is preferred on Windows. But
it's difficult to handle two types in the same code base. As a
consequence, most users use bytes on Python 2, which is a bad choice
for Windows...

On Python 3, it's much simpler: always use Unicode. Again, the PEP 383
helps on UNIX.
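
As an illustration of the str-only style, here is a minimal sketch that walks a tree with text paths and leaves any conversion to bytes to the caller. The function name `walk_files` is hypothetical.

```python
import os


def walk_files(top):
    """Yield the path of every file below `top` as str."""
    for root, _dirs, files in os.walk(top):
        for name in files:
            yield os.path.join(root, name)
```

Passing a str `top` makes os.walk() return str names, which on UNIX may contain surrogate escapes for undecodable bytes; os.fsencode(path) recovers the original bytes if a consumer insists on them.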

I wrote a PoC for Mercurial to always use Unicode, but the idea was
rejected since Mercurial must support undecodable filenames on UNIX.
It's possible on Python 3 (str+PEP 383), not on Python 2. I tried to
port Mercurial to Python 3 and use Unicode for filenames in the same
change. It's probably better to do that in two steps: first port to
Python 3, then use Unicode. I guess that the final change is to drop
Python 2? I don't know if it's feasible for Mercurial.

Victor


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread Stephen J. Turnbull
Chris Barker - NOAA Federal writes:

 > All I can say is "ouch". Hard to call it a regression to no longer
 > allow this mess...

We can't "disallow" the mess, it's embedded in the lunatic computing
environment (which I happen to live in).  We can't even stop people
from using existing Python programs abusing bytes-oriented APIs.  All
we can do is make it harder for people to port to Python 3, and that
would be bad because it's much easier to refactor once you're in
Python 3.

And as Paul points out, it works fine in ASCII-compatible one-byte
environments (and probably in ISO-2022-compatible 8-bit multibyte
environments, too -- the big problems are the abominations known as
Shift JIS and Big5).  Please, let's leave it alone.



Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread Victor Stinner
2016-02-09 1:37 GMT+01:00 eryk sun :
> For example, in codepage 932 (Japanese), it's an error if a lead byte
> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
> uncommon). In this case the ANSI API substitutes the default character
> for Japanese, '・' (U+30FB, Katakana middle dot).
>
> >>> locale.getpreferredencoding()
> 'cp932'
> >>> open(b'\xe05', 'w').close()
> >>> os.listdir('.')
> ['・']
> >>> os.listdir(b'.')
> [b'\x81E']
>
> All invalid sequences get mapped to '・', which roundtrips as
> b'\x81\x45', so you can't reliably create and open files with
> arbitrary bytes paths in this locale.

Oh, and I forgot to ask: what is your filesystem? Is it the same
behaviour for NTFS, FAT32, network shared directories, etc.?

Victor


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread Victor Stinner
2016-02-09 1:37 GMT+01:00 eryk sun :
> For example, in codepage 932 (Japanese), it's an error if a lead byte
> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
> uncommon). In this case the ANSI API substitutes the default character
> for Japanese, '・' (U+30FB, Katakana middle dot).
>
> >>> locale.getpreferredencoding()
> 'cp932'
> >>> open(b'\xe05', 'w').close()
> >>> os.listdir('.')
> ['・']
> >>> os.listdir(b'.')
> [b'\x81E']

Hum, I'm not sure that I understand your example. Can you pass the
result of os.listdir(str) to open() on Python 3? Are you able to open
the file? Same question for os.listdir(bytes).

Victor


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-09 Thread Paul Moore
On 9 February 2016 at 01:57, Chris Barker - NOAA Federal
 wrote:
> All I can say is "ouch". Hard to call it a regression to no longer
> allow this mess..

OTOH, it's a major regression for someone using an 8-bit codepage that
doesn't have these problems. Code that worked fine for them now
doesn't.

I dislike "works for some people" solutions as much as anyone, but
breaking code that does the job that people need it to is not
something we should do lightly (if at all).

Paul


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread Chris Barker - NOAA Federal
All I can say is "ouch". Hard to call it a regression to no longer
allow this mess...

CHB

> On Feb 8, 2016, at 4:37 PM, eryk sun  wrote:
>
>> On Mon, Feb 8, 2016 at 2:41 PM, Chris Barker  wrote:
>> Just to clarify -- what does it currently do for bytes? IIUC, Windows uses
>> UTF-16, so can you pass in UTF-16 bytes? Or when using bytes, is it assuming
>> some Windows ANSI-compatible encoding? (and what does it return?)
>
> UTF-16 is used in the [W]ide-character API. Bytes paths use the [A]NSI
> codepage. For a single-byte codepage, the ANSI API roundtrips, i.e. a
> bytes path that's passed to CreateFileA matches the listing from
> FindFirstFileA. But for a DBCS codepage arbitrary bytes paths do not
> roundtrip. Invalid byte sequences map to the default character. Note
> that an ASCII question mark is not always the default character. It
> depends on the codepage.
>
> For example, in codepage 932 (Japanese), it's an error if a lead byte
> (i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
> value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
> uncommon). In this case the ANSI API substitutes the default character
> for Japanese, '・' (U+30FB, Katakana middle dot).
>
> >>> locale.getpreferredencoding()
> 'cp932'
> >>> open(b'\xe05', 'w').close()
> >>> os.listdir('.')
> ['・']
> >>> os.listdir(b'.')
> [b'\x81E']
>
> All invalid sequences get mapped to '・', which roundtrips as
> b'\x81\x45', so you can't reliably create and open files with
> arbitrary bytes paths in this locale.


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread eryk sun
On Mon, Feb 8, 2016 at 2:41 PM, Chris Barker  wrote:
> Just to clarify -- what does it currently do for bytes? IIUC, Windows uses
> UTF-16, so can you pass in UTF-16 bytes? Or when using bytes, is it assuming
> some Windows ANSI-compatible encoding? (and what does it return?)

UTF-16 is used in the [W]ide-character API. Bytes paths use the [A]NSI
codepage. For a single-byte codepage, the ANSI API roundtrips, i.e. a
bytes path that's passed to CreateFileA matches the listing from
FindFirstFileA. But for a DBCS codepage arbitrary bytes paths do not
roundtrip. Invalid byte sequences map to the default character. Note
that an ASCII question mark is not always the default character. It
depends on the codepage.

For example, in codepage 932 (Japanese), it's an error if a lead byte
(i.e. 0x81-0x9F, 0xE0-0xFC) is followed by a trailing byte with a
value less than 0x40 (note that ASCII 0-9 is 0x30-0x39, so this is not
uncommon). In this case the ANSI API substitutes the default character
for Japanese, '・' (U+30FB, Katakana middle dot).

>>> locale.getpreferredencoding()
'cp932'
>>> open(b'\xe05', 'w').close()
>>> os.listdir('.')
['・']
>>> os.listdir(b'.')
[b'\x81E']

All invalid sequences get mapped to '・', which roundtrips as
b'\x81\x45', so you can't reliably create and open files with
arbitrary bytes paths in this locale.
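
The round-trip failure can be approximated without a Japanese Windows installation by using Python's own cp932 codec. This is only a sketch: Python's strict codec raises an error for the invalid sequence, whereas the Windows ANSI API silently substitutes the default character. The helper name `roundtrips` is hypothetical.

```python
def roundtrips(raw, codepage):
    """Return True if `raw` decodes in `codepage` and encodes back unchanged."""
    try:
        return raw.decode(codepage).encode(codepage) == raw
    except UnicodeError:
        return False


# Lead byte 0xE0 followed by "5" (0x35, below 0x40) is not a valid pair.
invalid_ok = roundtrips(b"\xe05", "cp932")

# The substituted default character (U+30FB) is representable and round-trips.
middle_dot = "\u30fb".encode("cp932")
dot_ok = roundtrips(middle_dot, "cp932")
```

A check along these lines could tell a program in advance which bytes names are at risk of collapsing onto the same file.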


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread Chris Barker
On Mon, Feb 8, 2016 at 6:32 AM, Victor Stinner wrote:

> Windows native type for filenames is
> Unicode, and Windows has a weird behaviour when you use bytes.

Just to clarify -- what does it currently do for bytes? IIUC, Windows uses
UTF-16, so can you pass in UTF-16 bytes? Or when using bytes, is it assuming
some Windows ANSI-compatible encoding? (And what does it return?)

> Are we brave enough to force users to use the "right" type for filenames?

I think so :-)

> On Python 2, it wasn't possible to use Unicode for filenames, many
> functions fail badly with Unicode,

I've had fine success using Unicode filenames with py2 on Windows -- in
fact, as soon as my users have non-ANSI characters in their names I'm
pretty sure I have no choice.

> especially when you mix bytes and
> Unicode.

Well yes, that sure does get ugly!

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread Paul Moore
On 8 February 2016 at 14:32, Victor Stinner  wrote:
> Since 3.3, functions of the os module started to emit
> DeprecationWarning when called with bytes filenames.

Everywhere? Or just on Windows? I can't tell from your email and I
don't have a Unix system to hand to check.

> The rationale is quite simple: Windows native type for filenames is
> Unicode, and Windows has a weird behaviour when you use bytes. For
> example, os.listdir(b'.') gives you paths which cannot be used with
> open() on filenames which are not encodable in the ANSI code page.
> Unencodable characters are replaced with "?". The following issue was
> opened to document this weird behaviour (but the doc was never
> completed):
>
> "Document that bytes OS API can returns unusable results on Windows"
> http://bugs.python.org/issue16700

OK, that seems fine, but obviously of limited interest to Unix users
who aren't worried about cross-platform portability :-)

> When the new os.scandir() API was designed, I asked to *not* support
> bytes filenames since they are "broken by design".
> https://www.python.org/dev/peps/pep-0471/
>
> Recently, a user complained that os.walk() doesn't work with bytes on
> Windows anymore:
>
> "Regression: os.walk now using os.scandir() breaks bytes filenames on windows"
> http://bugs.python.org/issue25911
>
> Serhiy Storchaka just pushed a change to reintroduce bytes
> support on Windows in os.walk(), but I would prefer to do the
> *opposite*: drop support for bytes filenames on Windows.

But leave those APIs as Unix only? That seems like a regression, too
(sure, the bytes APIs are problematic on Windows, but only for certain
characters AIUI). Windows users currently using programs written using
the bytes API (presumably originally intended for Unix where the bytes
API was a deliberate choice), who don't hit any encoding issues
currently, will see those programs broken for no reason other than
"users using different character sets than you may have been hitting
issues before". That seems like a weird justification to me...

> Are we brave enough to force users to use the "right" type for filenames?

If it were *all* users I'd say it's worth considering. But
practicality beats purity here IMO, and I feel that allowing people's
code to be "portable by default" is a more important goal than
enforcing encoding purity on a single platform.

Paul


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread Andrew Barnert via Python-Dev
On Monday, February 8, 2016 9:11 AM, Alexander Walters wrote:

> On 2/8/2016 12:02, Brett Cannon wrote:
>>
>>  If Unicode strings don't work in Python 2 then what is Python 2/3 to do
>>  as a cross-platform solution if we completely remove bytes support in 
>>  Python 3? Wouldn't that mean there is no common type between Python 2 
>>  & 3 that one can use which will work with the os module except native 
>>  strings (which are difficult to get right)?
> 
> The only solution then would be to do `if not PY3: arg =
> arg.encode(...);; os.SOMEFUNC(arg)`, pardon my pseudocode.

That's exactly what you _don't_ want to do.

More generally, the assumption here is wrong. 

It's not true that you can't use Unicode for Windows filenames on Python 2. What 
is true is that you have to be a lot more careful about using Unicode 
_consistently_. And that Python 2 gives you very little help in doing so. And 
some third-party modules may make it harder on you. But if you always use 
unicode, `os.listdir(u'.')` calls FindFirstFileW instead of FindFirstFileA and 
gives you back unicode filenames, os.stat or open call _wstat or _wopen with 
those unicode filenames, etc.
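
A small sketch of that "always use unicode" discipline, written so that it runs on both Python 2.7 and Python 3. The helper name `list_text_names` is hypothetical, and rejecting bytes outright is an assumption made for illustration rather than something prescribed in this thread.

```python
import os
import sys

# `unicode` exists only on Python 2; on Python 3 the text type is str.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def list_text_names(directory):
    """List `directory`, insisting on a text path so the result is text too."""
    if not isinstance(directory, text_type):
        raise TypeError("expected a text path, got %r" % type(directory))
    return os.listdir(directory)
```

Failing early on a bytes argument is one way to catch accidental mixing of the two types before it spreads through a code base.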

The problem is that on POSIX, you're often better off using str everywhere, 
because Python 2.7 doesn't do surrogate escape. And once you're using str on 
one platform/unicode on the other for filenames, it gets very easy to mix str 
and unicode in other places (like strings you want to print out for the user or 
store in a database), and then you're in mojibake hell.

The io module, the pathlib backport, and six can help a bit (at the cost of 
performance and/or simplicity), but there's no easy answer--if there _were_ an 
easy answer, we wouldn't have Python 3 in the first place, right?


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread Alexander Walters



On 2/8/2016 12:02, Brett Cannon wrote:

> If Unicode strings don't work in Python 2 then what is Python 2/3 to do
> as a cross-platform solution if we completely remove bytes support in
> Python 3? Wouldn't that mean there is no common type between Python 2
> & 3 that one can use which will work with the os module except native
> strings (which are difficult to get right)?

The only solution then would be to do `if not PY3: arg =
arg.encode(...);; os.SOMEFUNC(arg)`, pardon my pseudocode.  It's annoying,
but at least it's not a language syntax change, which means it isn't
intractable, just an annoying roadblock.  If I had my druthers it would
be put off until after 2.x is well and truly dead.




Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread Brett Cannon
On Mon, 8 Feb 2016 at 06:33 Victor Stinner  wrote:

> Hi,
>
> Since 3.3, functions of the os module started to emit
> DeprecationWarning when called with bytes filenames.
>
> The rationale is quite simple: Windows native type for filenames is
> Unicode, and Windows has a weird behaviour when you use bytes. For
> example, os.listdir(b'.') gives you paths which cannot be used with
> open() on filenames which are not encodable in the ANSI code page.
> Unencodable characters are replaced with "?". The following issue was
> opened to document this weird behaviour (but the doc was never
> completed):
>
> "Document that bytes OS API can returns unusable results on Windows"
> http://bugs.python.org/issue16700
>
>
> When the new os.scandir() API was designed, I asked to *not* support
> bytes filenames since they are "broken by design".
> https://www.python.org/dev/peps/pep-0471/
>
> Recently, a user complained that os.walk() doesn't work with bytes on
> Windows anymore:
>
> "Regression: os.walk now using os.scandir() breaks bytes filenames on
> windows"
> http://bugs.python.org/issue25911
>
>
> Serhiy Storchaka just pushed a change to reintroduce bytes
> support on Windows in os.walk(), but I would prefer to do the
> *opposite*: drop support for bytes filenames on Windows.
>
> Are we brave enough to force users to use the "right" type for filenames?
>
> --
>
> On Python 2, it wasn't possible to use Unicode for filenames, many
> functions fail badly with Unicode, especially when you mix bytes and
> Unicode.
>
> On Python 3, Unicode is the "natural" type, most Python functions
> prefer Unicode, and PEP 383 (surrogateescape) makes it safe to
> use Unicode on UNIX even with undecodable filenames (invalid bytes are
> stored as Unicode surrogate characters).
>

If Unicode strings don't work in Python 2 then what is Python 2/3 to do as a
cross-platform solution if we completely remove bytes support in Python 3?
Wouldn't that mean there is no common type between Python 2 & 3 that one
can use which will work with the os module except native strings (which are
difficult to get right)?


Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread Matthias Bussonnier

> On Feb 8, 2016, at 06:40, Victor Stinner  wrote:
> 
> 2016-02-08 15:32 GMT+01:00 Victor Stinner :
>> Since 3.3, functions of the os module started to emit
>> DeprecationWarning when called with bytes filenames.
>> (...)
>> Recently, a user complained that os.walk() doesn't work with bytes on
>> Windows anymore:
>> (...)
> 
> It's also sad to see that deprecation warnings are completely ignored.
> Python 3.3 was released in 2012, more than 3 years ago.

> 
> I would prefer to show deprecation warnings by default. But I know
> that it's an old debate: developers vs users :-) I like to see my
> users as potential developers ;-)


This is tracked in this issue:

http://bugs.python.org/issue24294  :  
DeprecationWarnings should be visible by default in the interactive REPL

IPython has enabled them only if they come from __main__. From totally
subjective experience,
that has already pushed a few libraries to update their code to new APIs[1].
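
For reference, a filter in that spirit can be written with the standard warnings module. This is a sketch, not IPython's actual code; the function name `show_main_deprecations` is hypothetical.

```python
import warnings


def show_main_deprecations():
    """Display DeprecationWarning only when it is triggered from __main__."""
    warnings.filterwarnings(
        "default", category=DeprecationWarning, module="__main__"
    )
```

Warnings attributed to third-party modules stay hidden, so end users are not flooded, while code typed at the prompt or run as a script still gets the notice.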
-- 
M

[1] or sometimes to wrap code in ignore warnings...



> 
> Victor



Re: [Python-Dev] Windows: Remove support of bytes filenames in the os module?

2016-02-08 Thread Victor Stinner
2016-02-08 15:32 GMT+01:00 Victor Stinner :
> Since 3.3, functions of the os module started to emit
> DeprecationWarning when called with bytes filenames.
> (...)
> Recently, a user complained that os.walk() doesn't work with bytes on
> Windows anymore:
> (...)

It's also sad to see that deprecation warnings are completely ignored.
Python 3.3 was released in 2012, more than 3 years ago.

I would prefer to show deprecation warnings by default. But I know
that it's an old debate: developers vs users :-) I like to see my
users as potential developers ;-)
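
For readers who want to see these warnings today, a minimal sketch follows. The function `legacy_listdir` is a hypothetical stand-in that warns on bytes paths; it is not the os module's implementation.

```python
import os
import warnings


def legacy_listdir(path):
    """Hypothetical wrapper that warns when given a bytes path."""
    if isinstance(path, bytes):
        warnings.warn(
            "bytes paths are deprecated here; pass str instead",
            DeprecationWarning,
            stacklevel=2,
        )
    return os.listdir(path)


# Record every warning instead of relying on the default filters,
# which hide most DeprecationWarnings.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_listdir(b".")
```

The same visibility is available without code changes by running the interpreter with -W default or by setting PYTHONWARNINGS=default.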

Victor