Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>
>>Unicode has many code points that are meant only for composition
>>and don't have any standalone meaning, e.g. a combining acute
>>accent (U+0301), yet they are perfectly valid code points -
>>regardless of UCS-2 or UCS-4. It is easily possible to break
>>such a combining sequence using slicing, so the most
>>often presented argument for using UCS-4 instead of UCS-2
>>(+ surrogates) is rather weak if seen by daylight.
>
>
> I disagree. It is not just about slicing, it is also about
> searching for a character (either through the "in" operator,
> or through regular expressions). If you define an SRE character
> class, such a character class cannot hold a non-BMP character
> in UTF-16 mode, but it can in UCS-4 mode. Consequently,
> implementing XML's lexical classes (such as Name, NCName, etc.)
> is much easier in UCS-4 than it is in UCS-2. In this case,
> combining characters do not matter much, because the XML
> spec is defined in terms of Unicode coded characters, causing
> combining characters to appear as separate entities for lexical
> purposes (unlike half surrogates).

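The character-class point quoted above can be sketched roughly as
follows. This is only an illustration under assumptions: it runs on
Python 3, where strings are indexed by code point, so the UTF-16 view
is simulated by encoding to UTF-16; the helper names (utf16_units,
in_range_per_unit, in_range_per_code_point) are hypothetical and not
part of sre or any library.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def in_range_per_unit(text, lo, hi):
    """Test every UTF-16 code unit against a single [lo, hi] range,
    the way a character class that sees one unit at a time would."""
    return [lo <= unit <= hi for unit in utf16_units(text)]


def in_range_per_code_point(text, lo, hi):
    """Test every code point against the same [lo, hi] range."""
    return [lo <= ord(ch) <= hi for ch in text]


sample = "\U00010000"

# The non-BMP character is stored as a surrogate pair in UTF-16.
units = utf16_units(sample)

# Neither half of the pair falls inside a range of non-BMP code points,
# so a per-unit class cannot express "any character in U+10000..U+EFFFF".
per_unit = in_range_per_unit(sample, 0x10000, 0xEFFFF)

# Working on whole code points, the same range matches directly.
per_code_point = in_range_per_code_point(sample, 0x10000, 0xEFFFF)
```

The range U+10000..U+EFFFF is used here only as an example of a
non-BMP range.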
Searching for a character is possible in UCS-2 as well - even for
surrogates, now that "in" supports searches spanning multiple code
points:

>>> len(u'\U00010000')
2
>>> u'\U00010000' in u'\U00010001\U00010002\U00010000 and some extra stuff'
True
>>> u'\U00010000' in u'\U00010001\U00010002\U00010003 and some extra stuff'
False

On sre character classes: I don't think that these provide a good
approach to XML lexical classes - custom functions or methods, or
maybe even a codec mapping the characters to their XML lexical
class, are much more efficient in practice.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 09 2005)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
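
A minimal sketch of the "custom function" approach to XML lexical
classes might look like the following. It is written under several
assumptions: the ranges are taken from the NameStartChar production of
XML 1.1 (the thread does not say which edition is meant), the code
targets Python 3, where ord() sees a whole code point, and the names
_NAME_START_RANGES and is_name_start_char are hypothetical. On a
UCS-2 build, surrogate pairs would first have to be combined into one
code point.

```python
from bisect import bisect_right

# Inclusive code point ranges of XML 1.1 NameStartChar, sorted by start.
# Assumption: XML 1.1 is the edition of interest.
_NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),  # non-BMP range
]
_RANGE_STARTS = [lo for lo, _ in _NAME_START_RANGES]


def is_name_start_char(ch):
    """Return True if the single character ch may start an XML 1.1 Name."""
    cp = ord(ch)
    # Find the last range whose start is <= cp, then check its upper bound.
    i = bisect_right(_RANGE_STARTS, cp) - 1
    return i >= 0 and cp <= _NAME_START_RANGES[i][1]
```

The binary search keeps the lookup cheap without building a regular
expression; a table indexed by code point or a codec-style mapping
would be alternative designs with the same interface.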