[issue18234] Unicodedata module should provide access to codepoint aliases

Alexander Belopolsky Sun, 23 Jun 2013 12:41:42 -0700

Alexander Belopolsky added the comment:

> Can a character or sequence have multiple aliases?


Yes, for example, most control characters have two aliases (and no name).

0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
0002;STX;abbreviation

(See <http://www.unicode.org/Public/UNIDATA/NameAliases.txt>)
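Here is a minimal sketch of how alias records like the ones quoted above can be grouped per code point. It assumes the three-field `code;alias;type` layout shown in the excerpt. It parses an inline sample rather than the downloaded file, and the function name `parse_aliases` is hypothetical.

```python
from collections import defaultdict

# Sample records copied from the excerpt above (format: hex code point;
# alias; alias type). Comment lines in NameAliases.txt start with '#'.
SAMPLE = """\
# NameAliases.txt excerpt
0000;NULL;control
0000;NUL;abbreviation
0001;START OF HEADING;control
0001;SOH;abbreviation
0002;START OF TEXT;control
0002;STX;abbreviation
"""


def parse_aliases(text):
    """Map each character to its list of (alias, type) pairs, in file order."""
    aliases = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        code, alias, kind = (field.strip() for field in line.split(";"))
        aliases[chr(int(code, 16))].append((alias, kind))
    return dict(aliases)


table = parse_aliases(SAMPLE)
print(table["\x00"])  # [('NULL', 'control'), ('NUL', 'abbreviation')]
```

A list per code point, rather than a single string, is what allows one character to carry both a "control" and an "abbreviation" alias.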

> What will be a result type of unicodedata.name() with "abbreviation" keyword 
> value?

Under my proposal:

>>> unicodedata.name('\N{ESCAPE}', type='abbreviation')
'ESC'
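The `type` keyword above is only a proposal and does not exist in `unicodedata`. The rough sketch below emulates the intended behavior with a hypothetical wrapper, `name_by_type`. The wrapper consults a small hand-copied alias table and otherwise defers to `unicodedata.name()`. That table is an assumption; a real implementation would draw on the full NameAliases.txt data.

```python
import unicodedata

# Hypothetical, hand-copied subset of NameAliases.txt: char -> [(alias, type)].
ALIASES = {
    "\x1b": [("ESCAPE", "control"), ("ESC", "abbreviation")],
}


def name_by_type(ch, type=None):
    """Emulate the proposed unicodedata.name(ch, type=...) keyword.

    The parameter is called `type` only to mirror the proposal.
    With type=None, defer to the ordinary unicodedata.name();
    otherwise return the first alias of the requested type.
    """
    if type is None:
        return unicodedata.name(ch)
    for alias, kind in ALIASES.get(ch, []):
        if kind == type:
            return alias
    raise ValueError("no alias of type %r" % type)


print(name_by_type("\x1b", type="abbreviation"))  # ESC
print(name_by_type("\x1b", type="control"))       # ESCAPE
```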

I would also like to consider changing the default slightly.  I find the 
following behavior rather unhelpful:

>>> unicodedata.name('\N{ESC}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name

I think most users would expect 'ESCAPE' instead.
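For comparison, since Python 3.3, `unicodedata.lookup()` and the `\N{...}` escape already resolve aliases such as 'ESC', so only the reverse direction is missing. The sketch below shows the suggested default using a hypothetical helper, `name_or_alias`. Its table of "control" aliases is hand-copied for illustration and is not the module's own data.

```python
import unicodedata

# Hypothetical, hand-copied subset of the "control" aliases in NameAliases.txt.
CONTROL_ALIASES = {"\x00": "NULL", "\x1b": "ESCAPE"}


def name_or_alias(ch):
    """Return the formal name, or fall back to a control alias if there is none."""
    name = unicodedata.name(ch, None)  # None instead of ValueError when unnamed
    if name is not None:
        return name
    if ch in CONTROL_ALIASES:
        return CONTROL_ALIASES[ch]
    raise ValueError("no such name")


print(repr(unicodedata.lookup("ESC")))  # '\x1b': lookup() already accepts the alias
print(name_or_alias("\x1b"))            # ESCAPE
```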

The following is more a curiosity than a genuine problem, but it is a good
illustration of a general point:

>>> unicodedata.name('\N{PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET}')
'PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET'

(Note that the word "BRACKET" is misspelled as "BRAKCET" in the output.)

Since "correction" aliases are the official mechanism for publishing corrections
to Unicode names, I think unicodedata.name() should return the corrected name by
default.
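The suggested default can be sketched as a wrapper that prefers a published correction alias over the formal name. The helper `corrected_name` and its one-entry table are hypothetical, hand-copied from the "correction" entry for U+FE18 for illustration only.

```python
import unicodedata

# Hypothetical, hand-copied subset of the "correction" aliases in NameAliases.txt.
CORRECTIONS = {
    "\ufe18": "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET",
}


def corrected_name(ch):
    """Prefer a correction alias; otherwise return the formal name."""
    return CORRECTIONS.get(ch) or unicodedata.name(ch)


ch = "\ufe18"
print(unicodedata.name(ch))  # formal name, ending in BRAKCET
print(corrected_name(ch))    # correction alias, ending in BRACKET
```

Formal names are frozen by Unicode's name stability policy, which is why a wrapper like this has to layer the correction on top instead of replacing the stored name.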

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18234>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
