2008/8/22 Fredrik Lundh <[EMAIL PROTECTED]>:
> On Fri, Aug 22, 2008 at 4:59 PM, Guido van Rossum <[EMAIL PROTECTED]> wrote:
>
>>> (how's the 3.2/4.1 dual support implemented? do we have two distinct
>>> datasets, or are the differences encoded in some clever way? would it
>>> make sense to split the unicodedata module into three separate
>>> modules, one for each major Unicode version?)
>>
>> The current API looks fine to me: unicodedata is the latest version
>> whereas unicodedata.ucd_3_2_0 is the older version. The APIs are the
>> same; there's a tiny bit of code in the generated _db.h file that
>> expresses the differences:
>>
>> static const change_record* get_change_3_2_0(Py_UCS4 n)
>> {
>>     int index;
>>     if (n >= 0x110000) index = 0;
>>     else {
>>         index = changes_3_2_0_index[n>>7];
>>         index = changes_3_2_0_data[(index<<7)+(n & 127)];
>>     }
>>     return change_records_3_2_0+index;
>> }
>
> there's a bunch of data tables as well, but they don't seem to be very
> large. looks like Martin did a thorough job here.
>
> ... digging digging digging ...
>
> yes, the generator script produces difference tables between the main
> version and a list of older versions. I'd say it's worth running the
> script on the 5.1.0 tables, and if it doesn't choke, compare the
> resulting table with the corresponding table for 4.1.0 (a simple loop
> fetching the main properties for all code points). if the differences
> look reasonably small, switch 5.1.0 and keep the others.

Right, that's my hope as well. I believe the changes between 3.2 and
4.1 were much larger than more recent changes. (Yay convergence! :-)

> I can tinker a little with this over the weekend, unless Martin tells
> me not to ;-)

That would be great!

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
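
For reference, the "simple loop fetching the main properties for all code points" could look roughly like the sketch below. It is only an illustration: it assumes Python 3 (which postdates this thread), it compares the current database against unicodedata.ucd_3_2_0 because that is the only older version the module exposes, and the helper names main_properties and property_diff are made up for this example. Which properties count as "main" is also an assumption.

```python
import sys
import unicodedata


def main_properties(db, ch):
    """Collect a handful of per-character properties from one database.

    `db` is either the unicodedata module itself or a UCD object such as
    unicodedata.ucd_3_2_0; both expose the same lookup functions.
    """
    return (
        db.category(ch),
        db.bidirectional(ch),
        db.combining(ch),
        db.decimal(ch, None),  # None when the character has no decimal value
        db.mirrored(ch),
    )


def property_diff(new, old, limit=sys.maxunicode + 1):
    """Return the code points below `limit` whose properties differ."""
    changed = []
    for cp in range(limit):
        ch = chr(cp)
        if main_properties(new, ch) != main_properties(old, ch):
            changed.append(cp)
    return changed


# Hypothetical usage: current database versus the bundled 3.2.0 snapshot.
diffs = property_diff(unicodedata, unicodedata.ucd_3_2_0)
print(
    "%d code points differ between %s and %s"
    % (len(diffs), unicodedata.unidata_version,
       unicodedata.ucd_3_2_0.unidata_version)
)
```

The same loop would apply to any pair of versions once the generator script has produced tables for them; a small list of differing code points would support switching the main tables and keeping the older ones as difference tables.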
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev