On 9/20/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 9/20/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > On 9/20/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> > > On 9/20/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > > > Before we can decide on the internal representation of our unicode
> > > > objects, we need to decide on their external interface. My thoughts
> > > > so far:
> > >
> > > Let me cut this short. The external string API in Py3k should not
> > > change or only very marginally so (like removing rarely used useless
> > > APIs or adding a few new conveniences). The plan is to keep the 2.x
> > > API that is supported (in 2.x) by both str and unicode, but merge the
> > > two string types into one. Anything else could be done just as easily
> > > before or after Py3k.
> >
> > Thanks, but one thing remains unclear: is the indexing intended to
> > represent bytes, code points, or code units?
>
> I don't see what's unclear -- the existing unicode object does what it does.

The existing unicode object doesn't expose the difference between them
except when UTF-16 is used and surrogates exist.

> > Note that C code
> > operating on UTF-16 would use code units for slicing of UTF-16, which
> > splits surrogate pairs.
>
> I thought we were discussing the Python API.
>
> C code will likely have the same access to unicode objects as it has in 2.x.

I only mentioned it because C doesn't mind exposing the internal
details for performance benefits, whereas Python usually does mind.

> > As far as I can tell, CPython on Windows uses UTF-16 with code units.
> > Perhaps not intentionally, but by default (not throwing an error on
> > surrogates).
>
> This is intentional, to be compatible with the rest of that platform.
> Jython and IronPython do this too I believe.

So you're saying we should use code units?! Or are you referring to
the choice of UTF-16?

I would expect us to use code points in 3.x, but that's not how it is in 2.x.

--
Adam Olsen, aka Rhamphoryncus
_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
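
To make the bytes / code points / code units distinction in this thread
concrete, here is a minimal sketch. It assumes a present-day Python 3
(3.3 or later), where str is indexed by code point; it does not show the
behaviour of the 2.x unicode object being discussed. All variable names
are made up for the illustration.

```python
# Illustrative only: assumes Python 3.3+, where str indexing is by code point.
# All names here are hypothetical and exist only for this sketch.

s = "a\U0001D11Eb"  # 'a', U+1D11E (outside the BMP), 'b'

# Code points: what iterating or indexing a Python 3 str gives you.
code_points = [ord(ch) for ch in s]

# UTF-16 code units: the non-BMP character becomes a surrogate pair.
utf16 = s.encode("utf-16-le")
code_units = [
    int.from_bytes(utf16[i:i + 2], "little")
    for i in range(0, len(utf16), 2)
]

# Bytes: here, the UTF-8 encoding of the same string.
utf8_bytes = s.encode("utf-8")

print(len(code_points))  # 3
print(len(code_units))   # 4
print(len(utf8_bytes))   # 6

# Slicing by code unit can cut a surrogate pair in half:
# the slice ends on a lone high surrogate.
first_two_units = code_units[:2]
print([hex(u) for u in first_two_units])  # ['0x61', '0xd834']
```

The same three-character string has three different lengths depending on
which unit is counted, and a slice taken in code units can end in the
middle of a character. That is the case the question about indexing is
concerned with.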
