Re: [Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue

James Y Knight Tue, 30 Sep 2008 20:32:30 -0700


On Sep 30, 2008, at 10:06 PM, [EMAIL PROTECTED] wrote:

However, Martin, I can promise you that I will _never_ ask for any convenience functions related to bytes as a result of this decision. I want bytes to come back from filesystem APIs because I intend to have a wrapper layer which knows two things about the file: the bytes (which are needed to talk to POSIX filesystem APIs) and the characters (which are computed from those bytes, can be safely renormalized, displayed to users, etc). On Windows this filesystem wrapper will necessarily behave differently, and will not use bytes for anything. Any formatting beyond joining path segments together and possibly splitting extensions off will be done on character strings, not byte strings.
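For illustration only, the wrapper layer described in the quoted text might look roughly like the sketch below. The class name `WrappedName`, its attributes, and the choice of UTF-8 with replacement characters for the display form are all assumptions made for the example, not anything proposed in the thread.

```python
class WrappedName:
    """Hypothetical wrapper: raw bytes for POSIX calls, text for display only."""

    def __init__(self, raw, encoding="utf-8"):
        self.raw = raw            # exact bytes, safe to pass back to the OS
        self.encoding = encoding  # assumed display encoding
        # Display form; undecodable bytes become U+FFFD, so this is lossy
        # and must never be used to reopen the file.
        self.text = raw.decode(encoding, errors="replace")

    def join(self, segment):
        # Joining path segments is done on the bytes, never on the text.
        return WrappedName(self.raw + b"/" + segment.raw, self.encoding)


name = WrappedName(b"caf\xe9.txt")   # not valid UTF-8
path = WrappedName(b"home").join(name)
```

The point of the split is that `raw` round-trips exactly while `text` is allowed to be lossy.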


Can you clarify what proposal you are supporting for Python:

1) Two sets of APIs, one returning unicode strings, and one returning bytestrings. (subpoints: what does the unicode-returning API do when it cannot decode the bytestring into unicode? raise an exception, pretend the argument/envvar/file didn't exist, or something else?)

or

2) All APIs return bytestrings only. Converting to unicode is considered lossy, and would have to be done by applications for display purposes only.
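To make the subpoints under (1) concrete, here is a minimal sketch of the two behaviours a unicode-returning API could have when a name cannot be decoded, next to the bytes-only shape of (2). The function names are hypothetical, and UTF-8 stands in for "the locale's encoding"; the sketch works on a plain list of raw names rather than a real filesystem call.

```python
def names_strict(raw_names, encoding="utf-8"):
    # Option 1, variant "raise exception": one bad name fails the whole call.
    return [raw.decode(encoding) for raw in raw_names]


def names_skipping(raw_names, encoding="utf-8"):
    # Option 1, variant "pretend the file didn't exist": bad names vanish.
    decoded = []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            continue
    return decoded


def names_bytes(raw_names):
    # Option 2: hand back the bytes untouched; decoding is the caller's job.
    return list(raw_names)


listing = [b"ok.txt", b"caf\xe9.txt"]  # second name is not valid UTF-8
visible = names_skipping(listing)
everything = names_bytes(listing)
```

With this input `names_strict(listing)` raises `UnicodeDecodeError`, `names_skipping` silently hides the second file, and only `names_bytes` can still name it.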

I really don't understand the reasoning for (1). It seems to me that most software (probably including all of the Python stdlib) would continue to use the unicode string API. Switching all of the Python stdlib to use the bytestring APIs instead would certainly be a large undertaking, and would have all sorts of ripple-on API changes (e.g. __file__). So I can only imagine that if you're proposing (1), you're doing so without the intention of suggesting that Python be converted to use it.

And so, of course, that doesn't really fix things (such as getcwd failing if your cwd is a path that is undecodable in the current locale, or well, currently, Python refusing to even start).

If you're proposing (2), it's at least as large an undertaking as (1) + converting Python to use the optional bytestring APIs. But at least it avoids exposing an API that people ought not use, and does make it obvious what still needs to be fixed: the unfixed code simply won't run at all.

The proposal of using U+0000 seems like it would have been almost the same from such a wrapper's perspective, except (A) people using the filesystem APIs without the benefit of such a wrapper would have been even more screwed

I'm not sure what your "more screwed" is comparing against: current py3k behavior (aka decoding to Unicode in the locale's specified encoding)? I don't see how you can really be more screwed than that: not only can't you send your filename to display in a Gtk+ button, you can't access it at all, even staying within Python.

and (B) there are a few nasty corner-cases when dealing with surrogate (i.e. invalid, in UTF-8) code points which I'm not quite sure what it would have done with.

The lone-surrogate-pair proposal was a totally different proposal from the U+0000 one.
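As a rough illustration of the lone-surrogate idea, the sketch below uses the `surrogateescape` error handler that later Python 3 releases provide. Treating that handler as equivalent to the proposal discussed in this thread is an assumption; the helper name `to_text` is hypothetical.

```python
def to_text(raw, encoding="utf-8"):
    # Undecodable bytes are mapped to lone surrogates (0xE9 -> U+DCE9),
    # so the result is a str that still remembers the original bytes.
    return raw.decode(encoding, errors="surrogateescape")


raw = b"caf\xe9.txt"                 # not valid UTF-8
text = to_text(raw)
restored = text.encode("utf-8", errors="surrogateescape")

# A plain UTF-8 encode rejects the lone surrogate, which is the
# "invalid, in UTF-8" corner case mentioned above.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The string round-trips back to the original bytes, but it cannot be handed to anything that insists on strictly valid UTF-8.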


James
