Ronald Oussoren wrote:
>
> On 7 Oct, 2009, at 22:13, M.-A. Lemburg wrote:
>
>> Ronald Oussoren wrote:
>>>
>>> On 7 Oct, 2009, at 20:05, M.-A. Lemburg wrote:
>>>>
>>>>
>>>> If we do go for a change, we should use sizeof(wchar_t)
>>>> as basis for the new default - on all platforms that
>>>> provide a wchar_t type.
>>>
>>> I'd be -1 on that. Sizeof(wchar_t) is 4 on OSX, but all non-Unix API's
>>> that deal with Unicode text use ucs16.
>>
>> Is that true for non-Carbon APIs as well ?
>>
>> This is what I found on the web (in summary):
>>
>> Apple chose to go with UTF-16 at about the same time as Microsoft did
>> and used sizeof(wchar_t) == 2 for Mac OS. When they moved to Mac OS X,
>> they switched wchar_t to sizeof(wchar_t) == 4.
>>
>
> Both Carbon and the modern APIs use UTF-16.

Thanks for that data point. So UTF-16 would be the more natural
choice on Mac OS X, despite the choice of sizeof(wchar_t).

> What I don't quite get in the UTF-16 vs. UTF-32 discussion is why UTF-32
> would be useful, because if you want to do generic Unicode processing
> you have to look at sequences of composed characters (base characters +
> composing marks) anyway instead of separate code points. Not that I'm a
> unicode expert in any way...

Very true. It's one of the reasons why I'm not much of a UCS4 fan -
it only helps with surrogates and that's about it.

Combining characters, various types of control code points
(e.g. joiners, bidirectional marks, breaks, non-breaks, annotations),
context-sensitive casing and other such features found in scripts
cause very similar problems. These are often much harder to solve,
since they are not as easily identifiable as high and low surrogate
code points.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 07 2009)
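
P.S. To make the surrogates vs. combining characters point concrete,
here is a minimal sketch. It assumes a Python 3.3 or later interpreter,
where len() on a str counts code points regardless of build (this was
not the case for the narrow builds under discussion). The helper names
utf16_units and count_base_characters are made up for this example, and
the combining-mark filter is only a rough stand-in for proper grapheme
cluster segmentation.

```python
import unicodedata


def utf16_units(s):
    """Number of 16-bit code units needed to encode s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


def count_base_characters(s):
    """Rough approximation of reader-visible characters: count only
    code points whose canonical combining class is 0."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP: one code point,
# but a surrogate pair (two code units) in UTF-16.
clef = "\U0001D11E"

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points in
# UTF-16 and UTF-32 alike, although a reader sees one character.
e_acute = "e\u0301"

for label, text in (("clef", clef), ("e_acute", e_acute)):
    print(label, len(text), utf16_units(text), count_base_characters(text))
```

A wider code unit removes the mismatch for the clef, but does nothing
for the accented "e": that sequence takes two units in either encoding,
so code that wants to work on what the reader sees has to group code
points anyway.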