Martin v. Löwis <mar...@v.loewis.de> added the comment:

The patch needs to take versioning into account. It seems that NamedSequences
were added in 4.1, and NameAliases in 5.0. So for the moment, when using 3.2
(i.e. when self is not NULL), it is fine to look up neither. Please put an
assertion into makeunicodedata that this needs to be reviewed when an old
version other than 3.2 needs to be supported.
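
A minimal sketch of the kind of assertion meant above. The names OLD_VERSIONS
and check_old_versions are hypothetical and only illustrate the idea; the
actual variable names and layout in makeunicodedata may differ.

```python
# Hypothetical stand-in for the list of old Unicode versions that the
# generator script still has to support.
OLD_VERSIONS = ["3.2.0"]


def check_old_versions(old_versions):
    """Fail loudly if an old version other than 3.2 shows up.

    NamedSequences appeared in Unicode 4.1 and NameAliases in 5.0, so
    skipping both is only known to be safe for 3.2. Any other old version
    means the lookup code needs to be reviewed.
    """
    for version in old_versions:
        assert version.startswith("3.2"), (
            "old version %s: review NamedSequences/NameAliases handling"
            % version
        )


check_old_versions(OLD_VERSIONS)
```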

The size of the DB does matter; there are frequent complaints about it. The
named sequences take 20kB on my system; I'm not sure whether that's too much.
If you want to reduce the size (and also speed up lookup), you could use
private-use characters, like so:
- add the named sequences as PUA characters to the names table of 
makeunicodename, in the range(P, P+418) (for some P).
- in lookup, check whether the _getcode result is in range(P,P+418). If so, 
subtract P from the code and use this as an index into _namedsequences.
- add a _getcode wrapper that filters out all private use characters, for 
regular lookup.
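
The three steps above can be sketched in Python as follows. This is only an
illustration under stated assumptions: the real lookup code is C, the value
chosen for P (0xF0000) is arbitrary, the tables are toy data with two
sequences instead of 418, and getcode_regular is a hypothetical name for the
filtering wrapper.

```python
# Assumption: P is some code point inside a private-use area.
P = 0xF0000

# Toy stand-in for _namedsequences; the real table would have 418 entries.
_namedsequences = ["\u0101\u0300", "#\u20e3"]

# Toy stand-in for the names table: named sequences are stored as if they
# were private-use characters P, P+1, ...
_names = {
    "LATIN SMALL LETTER A": 0x0061,
    "LATIN SMALL LETTER A WITH MACRON AND GRAVE": P + 0,
    "KEYCAP NUMBER SIGN": P + 1,
}

# The three private-use ranges defined by Unicode.
_PRIVATE_USE = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def _getcode(name):
    """Raw name-to-code lookup; returns None for unknown names."""
    return _names.get(name)


def _is_private_use(code):
    return any(lo <= code <= hi for lo, hi in _PRIVATE_USE)


def getcode_regular(name):
    """Wrapper for regular lookup: hides all private-use results."""
    code = _getcode(name)
    if code is None or _is_private_use(code):
        return None
    return code


def lookup(name):
    """Return a character, or a whole string for a named sequence."""
    code = _getcode(name)
    if code is None:
        raise KeyError(name)
    if P <= code < P + len(_namedsequences):
        # Subtract P and use the result as an index into _namedsequences.
        return _namedsequences[code - P]
    return chr(code)
```

With this toy data, lookup("KEYCAP NUMBER SIGN") returns the two-character
sequence, while getcode_regular("KEYCAP NUMBER SIGN") returns None, so callers
that expect a single code point never see the private-use placeholder.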

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12753>
_______________________________________