On Sep 30, 2008, at 10:06 PM, [EMAIL PROTECTED] wrote:
However, Martin, I can promise you that I will _never_ ask for any convenience functions related to bytes as a result of this decision. I want bytes to come back from filesystem APIs because I intend to have a wrapper layer which knows two things about the file: the bytes (which are needed to talk to POSIX filesystem APIs) and the characters (which are computed from those bytes, can be safely renormalized, displayed to users, etc). On Windows this filesystem wrapper will necessarily behave differently, and will not use bytes for anything. Any formatting beyond joining path segments together and possibly splitting extensions off will be done on character strings, not byte strings.

Can you clarify what proposal you are supporting for Python:

1) Two sets of APIs, one returning unicode strings, and one returning bytestrings. (subpoints: what does the unicode-returning API do when it cannot decode the bytestring into unicode? raise exception, pretend argument/envvar/file didn't exist/?)

or

2) All APIs return bytestrings only. Converting to unicode is considered lossy, and would have to be done by applications for display purposes only.

I really don't understand the reasoning for (1). It seems to me that most software (probably including all of the Python stdlib) would continue to use the unicode string API. Switching all of the Python stdlib to use the bytestring APIs instead would certainly be a large undertaking, and would have all sorts of ripple-on API changes (e.g. __file__). So I can only imagine that if you're proposing (1), you're doing so without the intention of suggesting that Python be converted to use it.

And so, of course, that doesn't really fix things (such as getcwd failing if your cwd is a path that is undecodeable in the current locale, or well, currently, python refusing to even start).

If you're proposing (2), it's at least as large an undertaking as (1) + converting Python to use the optional bytestring APIs. But at least it avoids exposing an API that people ought not use, and does make it obvious what still needs to be fixed: the unfixed code simply won't run at all.

The proposal of using U+0000 seems like it would have been almost the same from such a wrapper's perspective, except (A) people using the filesystem APIs without the benefit of such a wrapper would have been even more screwed

I'm not sure what your "more screwed" is comparing against: current py3k behavior? (aka: decoding to Unicode in locale's specified encoding)? I don't see how you can really be more screwed than that: not only can't you send your filename to display in a Gtk+ button, you can't access it at all, even staying within python.

and (B) there are a few nasty corner-cases when dealing with surrogate (i.e. invalid, in UTF-8) code points which I'm not quite sure what it would have done with.

The lone-surrogate-pair proposal was a totally different proposal than the U+0000 one.

James
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to