linebreak from Unicode database.

Amaury Forgeot d'Arc Tue, 30 Jun 2009 17:03:38 -0700

Amaury Forgeot d'Arc <[email protected]> added the comment:

Here is a refreshed version of the patch, without the generated files.
The patch combines several changes which are fairly independent from 
each other:


- Using the Unicode database to generate the functions adds 143 new 
codepoints to PyUnicode_ToNumeric, and one codepoint to 
PyUnicode_IsWhitespace.

- In addition, PyUnicode_ToNumeric now contains code for all numerics; 
previously those which are also digits fell into the 'default:' case and 
were converted with PyUnicode_ToDigit(). This adds 468 new codepoints, 
but removes the need to call PyUnicode_ToDigit().

- The Unihan.txt files (two files to download, 25 MB each) are now 
parsed, and this adds 73 more codepoints to PyUnicode_ToNumeric. (There 
are now 1009 entries in this function.)
The 3.2.0 version of this file contains two huge numbers, 1e16 and 1e20, 
so I had to widen the type of 'change_record.numeric_changed' from 'int' 
to 'double'.  It is possible that these were removed from the Unicode 
database between versions 4.1 and 5.1.

- The database has a new flag, NUMERIC_MASK, used by 
PyUnicode_IsNumeric.  This adds ~350 lines to the arrays of numbers in 
unicodetype_db.h.
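
For readers following along, the digit/numeric distinction and the need for 
a wider type can be sketched from the Python side. This is only an 
illustration written for Python 3, not part of the patch: the helper name 
classify() is made up here, and it uses the public unicodedata module on the 
assumption that it reflects the same database as the C functions above.

```python
import unicodedata


def classify(ch):
    """Return (digit, numeric) for one character.

    Hypothetical helper for illustration only; None marks a property
    the character does not have.
    """
    return unicodedata.digit(ch, None), unicodedata.numeric(ch, None)


# '5' is both a digit and a numeric, so a complete numeric table has
# to carry an entry for it instead of falling back to the digit lookup.
print(classify("5"))        # (5, 5.0)

# VULGAR FRACTION ONE HALF is numeric but not a digit.
print(classify("\u00bd"))   # (None, 0.5)

# A letter has neither property.
print(classify("A"))        # (None, None)

# 1e16 and 1e20 do not fit in a 32-bit int, but both are exactly
# representable as a double, which is why widening the field works.
INT32_MAX = 2**31 - 1
for value in (1e16, 1e20):
    assert value > INT32_MAX
    assert float(int(value)) == value
```

The last loop only checks the arithmetic; it says nothing about how 
'change_record' is laid out in the real sources.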

If this patch is accepted, the md5 checksum in test_unicodedata.py will 
need to change.
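
To see why a checksum over the database is sensitive to these additions, 
here is a rough sketch in Python 3. It does not reproduce the scheme used in 
test_unicodedata.py; the function numeric_checksum(), its 'limit' parameter 
and the record format are assumptions made up for this example.

```python
import hashlib
import unicodedata


def numeric_checksum(limit=0x3000):
    """Hash the numeric value of every code point below 'limit'.

    Hypothetical illustration: any code point that gains or loses a
    numeric value changes the resulting digest.
    """
    h = hashlib.md5()
    for cp in range(limit):
        # -1 stands in for "no numeric value".
        value = unicodedata.numeric(chr(cp), -1)
        h.update("{:04X}:{!r};".format(cp, value).encode("ascii"))
    return h.hexdigest()


print(numeric_checksum())
```

The digest depends on the Unicode version the interpreter was built with, 
so no expected value is given here.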

----------
nosy: +amaury.forgeotdarc
Added file: http://bugs.python.org/file14413/unicodedata-2.7.patch

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue1571184>
_______________________________________

[issue1571184] Generate numeric/space/linebreak from Unicode database.
