Amaury Forgeot d'Arc <amaur...@gmail.com> added the comment:

> I must be missing some detail, but what does the Unicode database
> have to do with the unicodeobject.c C API ?

Ah, now I understand your concerns. My suggestion is to change only the 20
functions in unicodectype.c (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase, ...)
and to make no change in unicodeobject.c at all.
They all take a single code point as argument, and some also return a single
code point.
Changing these functions is backwards compatible.

I attach a patch so we can discuss concrete code (tests are still missing).

Another effect of the patch: unicodedata.numeric('\N{AEGEAN NUMBER TWO}') can 
return 2.0.
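
A minimal sketch of that lookup, assuming an interpreter where a str element
holds a full code point (any current Python 3). On a narrow build the same
literal is stored as a surrogate pair, which is the case the patch is about.

```python
import unicodedata

# AEGEAN NUMBER TWO lies outside the Basic Multilingual Plane (U+10108).
ch = "\N{AEGEAN NUMBER TWO}"

code_point = ord(ch)                # one code point, not two surrogates
value = unicodedata.numeric(ch)     # numeric value from the Unicode database

print(hex(code_point), value)
```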

str.isalpha() and the other str methods did not change: they still split
surrogate pairs.
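
To illustrate what "splitting" means here, a sketch using the standard UTF-16
arithmetic. The helper names split_surrogates and join_surrogates are
hypothetical and are not taken from the patch; the example only shows that each
half of a pair, looked at alone, is a surrogate and not a letter.

```python
import unicodedata


def split_surrogates(cp):
    """Return the UTF-16 (high, low) surrogate pair for a non-BMP code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def join_surrogates(high, low):
    """Rebuild the code point from a (high, low) surrogate pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x1D400  # MATHEMATICAL BOLD CAPITAL A, a letter outside the BMP
high, low = split_surrogates(cp)

# The whole code point is alphabetic; each half on its own is category Cs.
whole_is_alpha = chr(cp).isalpha()
high_is_alpha = chr(high).isalpha()
high_category = unicodedata.category(chr(high))
```

A method that walks a string one UTF-16 code unit at a time sees only the two
halves, so it cannot report the character as alphabetic unless it joins the
pair first.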

----------
keywords: +patch
Added file: http://bugs.python.org/file12934/unicodectype_ucs4.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5127>
_______________________________________