Terry Reedy wrote:
> On 11/24/2010 3:06 PM, Alexander Belopolsky wrote:
>
>> Any non-trivial text processing is likely to be broken in presence of
>> surrogates. Producing them on input is just trading known issue for
>> an unknown one. Processing surrogate pairs in python code is hard.
>> Software that has to support non-BMP characters will most likely be
>> written for a wide build and contain subtle bugs when run under a
>> narrow build. Note that my latest proposal does not abolish
>> surrogates outright. Users who want them can still use something like
>> "surrogateescape" error handler for non-BMP characters.
>
> It seems to me that what you are asking for is an alternate, optional,
> utf-8-bmp codec that would raise an error, in either direction, for
> non-bmp chars. Then, as you suggest, if one is not prepared for
> surrogates, they are not allowed.

That would be a possibility as well... but I doubt that many users
are going to bother, since slicing surrogates is just as bad as
slicing combining code points, and the latter are much more common
in real life and happen to live mostly in the BMP.

-- 
Marc-Andre Lemburg
eGenix.com (Nov 25 2010)
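
To make the point about slicing combining code points concrete, here is a minimal sketch. The helper name split_point_is_safe is hypothetical and was not part of this thread; it only checks the canonical combining class via the standard unicodedata module, which is a rough approximation and not full grapheme-cluster segmentation.

```python
import unicodedata


def split_point_is_safe(text, index):
    """Return True if slicing text at index does not cut a combining
    mark off from the character before it.

    Hypothetical helper for illustration only: it looks at the canonical
    combining class of the code point that would start the right-hand
    slice, nothing more.
    """
    if index <= 0 or index >= len(text):
        return True
    return unicodedata.combining(text[index]) == 0


# 'e' followed by U+0301 COMBINING ACUTE ACCENT; every code point is in the BMP.
word = "cafe\u0301"
all_in_bmp = all(ord(ch) <= 0xFFFF for ch in word)

# Slicing between the base letter and its accent leaves a bare 'e' on the
# left and an orphaned combining mark on the right.
head, tail = word[:4], word[4:]
```

Here split_point_is_safe(word, 4) reports the cut as unsafe even though no surrogates are involved, so a codec that rejects non-BMP characters would not protect against this kind of breakage.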