Alexander Belopolsky wrote:
> To conclude, I feel that rather than trying to fully support non-BMP
> characters as surrogate pairs in narrow builds, we should make it
> easier for application developers to avoid them.

I don't understand what you're after here. Programmers can easily
avoid them by not using them :-)

> If abandoning
> internal use of UTF-16 is not an option, I think we should at least
> add an option for decoders that currently produce surrogate pairs to
> treat non-BMP characters as errors and handle them according to user's
> choice.

But what do you gain by doing this? You'd lose the round-trip
safety of those codecs and that's not a good thing.

Note that most text processing APIs in Python work based on code
units, which in most cases represent single code points, but in
some cases can also represent surrogates (both on UCS-2 and on
UCS-4 builds).

E.g. str.center(n) centers the string in a padded string that
is composed of n code units. Whether that operation will result
in a text that's centered visually on output is a completely
different story. The original string could contain surrogates,
it could also contain combining code points, so the visual
presentation of the result may very well not be centered at all;
it may not even appear as having the length n to the user.

Since we're not going to change the semantics of those APIs,
it is OK to not support padding with non-BMP code points on
UCS-2 builds.

Supporting such cases would only cause problems:

 * if the methods padded with surrogates, the resulting
   string would no longer have length n, breaking the
   assumption that len(str.center(n)) == n

 * if the methods padded with half the number of surrogates
   to make sure that len(str.center(n)) == n, the resulting
   output to e.g. a terminal would be even further off than what
   you already have with surrogates and combining code points
   in the original string.

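A minimal sketch of the difference between the length str.center()
guarantees and what ends up on screen. It assumes Python 3.3 or later,
where narrow builds no longer exist, so the UTF-16 code unit count is
computed by a hypothetical helper, utf16_code_units(), which is not a
stdlib API. The sample strings are illustrative only.

```python
import unicodedata


def utf16_code_units(text):
    """Count UTF-16 code units in text (hypothetical helper, not a stdlib API).

    On a narrow (UCS-2) build this is what len() would have reported.
    """
    return len(text.encode("utf-16-le")) // 2


# U+1D11E MUSICAL SYMBOL G CLEF lies outside the BMP.
clef = "\U0001D11E"
code_points = len(clef)              # one code point on Python 3.3+
code_units = utf16_code_units(clef)  # a surrogate pair in UTF-16

# "e" followed by U+0301 COMBINING ACUTE ACCENT displays as a single glyph.
word = "cafe\u0301"
centered = word.center(9, "*")

# The length invariant holds, because center() counts code points.
assert len(centered) == 9

# The accent takes no column of its own, so the composed (NFC) form,
# which is closer to what a terminal shows, is one character shorter.
composed = unicodedata.normalize("NFC", centered)
print(repr(centered), len(centered), len(composed))
```

The padding is computed purely from len(), so the combining accent
counts toward the width even though it does not occupy a column of
its own on output.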
More on codecs supporting surrogates:

http://mail.python.org/pipermail/python-dev/2008-July/080915.html

Perhaps it's time to reconsider a project I once started
but that never got off the ground:

http://mail.python.org/pipermail/python-dev/2008-July/080911.html

Here's the pre-PEP:

http://mail.python.org/pipermail/python-dev/2001-July/015938.html

-- 
Marc-Andre Lemburg
eGenix.com (Nov 24 2010)