Amaury Forgeot d'Arc <[email protected]> added the comment:
> I must be missing some detail, but what does the Unicode database
> have to do with the unicodeobject.c C API ?
Ah, now I understand your concerns. My suggestion is to change only the 20
functions in unicodectype.c (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase, ...)
and to make no change at all in unicodeobject.c.
They all take a single code point as argument, and some also return a single
code point.
Changing these functions is backwards compatible.
I am attaching a patch so we can discuss concrete code (tests are still missing).
Another effect of the patch: unicodedata.numeric('\N{AEGEAN NUMBER TWO}') can
return 2.0.
str.isalpha() and the other str methods did not change: they still split
surrogate pairs.
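
To make the surrogate-pair point concrete, here is a minimal Python sketch. It
is not taken from the attached patch: the helper name join_surrogates is
hypothetical, and the snippet assumes Python 3.3 or later, where chr() accepts
any code point up to U+10FFFF. It only illustrates how a UTF-16 surrogate pair
maps to the single code point that a per-code-point lookup would receive.

```python
import unicodedata


def join_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into one code point (hypothetical helper)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits of the value above 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+10108 AEGEAN NUMBER TWO is encoded in UTF-16 as the pair D800 DD08.
code_point = join_surrogates(0xD800, 0xDD08)

# Looking up the joined code point gives the character's real properties;
# looking up each surrogate on its own cannot.
name = unicodedata.name(chr(code_point))
value = unicodedata.numeric(chr(code_point))
```

Each half of the pair, taken alone, is a surrogate code point with no name and
no numeric value, which is why methods that still split pairs cannot see the
properties of the full character.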
----------
keywords: +patch
Added file: http://bugs.python.org/file12934/unicodectype_ucs4.patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue5127>
_______________________________________