Martin v. Löwis <mar...@v.loewis.de> added the comment:

The patch needs to take versioning into account. It seems that NamedSequences were added in 4.1, and NameAliases in 5.0. So for the moment, when using 3.2 (i.e. when self is not NULL), it is fine to look up neither. Please put an assertion into makeunicodedata stating that this needs to be reviewed when an old version other than 3.2 needs to be supported.
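A rough sketch of what such an assertion could look like. The function name `check_old_versions` and the way the version list is passed in are assumptions for illustration, not the actual makeunicodedata code.

```python
def check_old_versions(old_versions):
    """Fail loudly if an old version other than 3.2 has to be supported.

    Hypothetical helper: named sequences (added in 4.1) and name aliases
    (added in 5.0) are simply skipped for old versions, which is only
    safe as long as 3.2 is the sole old version.
    """
    for version in old_versions:
        major_minor = tuple(int(part) for part in version.split(".")[:2])
        if major_minor != (3, 2):
            raise AssertionError(
                "old version %s: review handling of named sequences "
                "and name aliases" % version
            )


# Passes silently while 3.2 is the only old version in use.
check_old_versions(["3.2.0"])
```

An explicit `raise AssertionError` is used here instead of an `assert` statement so the check still runs under `python -O`.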
The size of the DB does matter; there are frequent complaints about it. The named sequences take 20 kB on my system; I am not sure whether that is too much. If you want to reduce the size (and also speed up lookup), you could use private-use characters, like so:

- add the named sequences as PUA characters to the names table of makeunicodename, in the range(P, P+418) (for some P).
- in lookup, check whether the _getcode result is in range(P, P+418). If so, subtract P from the code and use this as an index into _namedsequences.
- add a _getcode wrapper that filters out all private-use characters, for regular lookup.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12753>
_______________________________________