Package: unicode
Version: 2.8-1.1
Severity: normal
The unicode tool fails to construct some of the systematic names
that are abbreviated to First/Last ranges in the UnicodeData.txt file.
It fails completely for the Tangut and Tangut Supplement blocks:
$ unicode --brief 17a98
𗪘 U+17A98 - No such unicode character name in database
The name above should be "TANGUT IDEOGRAPH-17A98".
All properties other than the name are listed correctly.
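For illustration, here is a minimal sketch of how such names could be derived: the name is a fixed prefix followed by the code point in uppercase hex. The mapping and the function name ideograph_name are hypothetical and are not taken from the unicode source; the range labels are the ones used in UnicodeData.txt.

```python
# Hypothetical mapping from a UnicodeData.txt range label (the text
# before ", First>" / ", Last>") to the prefix of the derived name.
# Assumption: a prefix match is enough, so "Tangut Ideograph Supplement"
# and "CJK Ideograph Extension A" fall under the same rule as their
# base ranges.
IDEOGRAPH_PREFIXES = {
    "CJK Ideograph": "CJK UNIFIED IDEOGRAPH-",
    "Tangut Ideograph": "TANGUT IDEOGRAPH-",
}


def ideograph_name(range_label, code_point):
    """Return the derived name, or None if the range is not covered here."""
    for label, prefix in IDEOGRAPH_PREFIXES.items():
        if range_label.startswith(label):
            # Code points are written in uppercase hex, at least 4 digits.
            return f"{prefix}{code_point:04X}"
    return None


print(ideograph_name("Tangut Ideograph", 0x17A98))
```

With this rule the first and last code points of a range need no special treatment, since they are named the same way as every code point between them.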
Even in ranges where systematic names are derived correctly, unicode
still displays the UnicodeData meta-label instead of the character
name for the first and the last character:
$ unicode --brief ac00 ac01 d7a3 3400 3401 4dbf
가 U+AC00 <Hangul Syllable, First>
각 U+AC01 HANGUL SYLLABLE GAG
힣 U+D7A3 <Hangul Syllable, Last>
㐀 U+3400 <CJK Ideograph Extension A, First>
㐁 U+3401 CJK UNIFIED IDEOGRAPH-3401
䶿 U+4DBF <CJK Ideograph Extension A, Last>
The missing names should be:
U+AC00 HANGUL SYLLABLE GA
U+D7A3 HANGUL SYLLABLE HIH
U+3400 CJK UNIFIED IDEOGRAPH-3400
U+4DBF CJK UNIFIED IDEOGRAPH-4DBF
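The Hangul names above follow from the syllable name generation rule in the Unicode Standard (section 3.12). The sketch below is one way to write it, not the tool's actual code; the function name hangul_syllable_name is made up for this example.

```python
# Constants and jamo short names as given by the Hangul syllable name
# generation rule in the Unicode Standard.
S_BASE = 0xAC00
V_COUNT = 21
T_COUNT = 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11172 syllables, U+AC00..U+D7A3

JAMO_L = ["G", "GG", "N", "D", "DD", "R", "M", "B", "BB", "S",
          "SS", "", "J", "JJ", "C", "K", "T", "P", "H"]
JAMO_V = ["A", "AE", "YA", "YAE", "EO", "E", "YEO", "YE", "O", "WA",
          "WAE", "OE", "YO", "U", "WEO", "WE", "WI", "YU", "EU", "YI", "I"]
JAMO_T = ["", "G", "GG", "GS", "N", "NJ", "NH", "D", "L", "LG",
          "LM", "LB", "LS", "LT", "LP", "LH", "M", "B", "BS", "S",
          "SS", "NG", "J", "C", "K", "T", "P", "H"]


def hangul_syllable_name(code_point):
    """Return the name of a precomposed Hangul syllable, else None."""
    s_index = code_point - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None
    # Split the syllable index into leading, vowel and trailing parts.
    l_index, rest = divmod(s_index, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return ("HANGUL SYLLABLE "
            + JAMO_L[l_index] + JAMO_V[v_index] + JAMO_T[t_index])


for cp in (0xAC00, 0xAC01, 0xD7A3):
    print(f"U+{cp:04X} {hangul_syllable_name(cp)}")
```

The range endpoints U+AC00 and U+D7A3 go through the same computation as any other syllable.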
Leaving the UnicodeData meta-label in place might make some sense in
the case of control characters*, but it makes no sense for ranges
whose systematic names are defined by generative rules and are
abbreviated in UnicodeData only to save space.
* It would probably be better if controls and other code points
with no name followed the convention for code point labels described
in section 4.8 of the Unicode Standard, i.e. <control-0009> for U+0009
instead of just <control>, and the corresponding labels for reserved,
noncharacter, private-use, and surrogate code points instead of just
" - No such unicode character name in database", but that would
be a separate feature request that I don't care enough about to
file.
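A rough sketch of what such labels could look like, using Python's standard unicodedata module. The function name code_point_label is made up for this example, and the reserved/noncharacter split assumes the General_Category data shipped with the running Python, which may lag behind the installed unicode-data package.

```python
import unicodedata


def code_point_label(code_point):
    """Return a label such as <control-0009>, or None if the code point
    is an assigned graphic or format character with a real name."""
    category = unicodedata.category(chr(code_point))
    if category == "Cc":
        kind = "control"
    elif category == "Co":
        kind = "private-use"
    elif category == "Cs":
        kind = "surrogate"
    elif category == "Cn":
        # Noncharacters are U+FDD0..U+FDEF plus the last two code
        # points of every plane; all other Cn code points are reserved.
        is_nonchar = (0xFDD0 <= code_point <= 0xFDEF
                      or (code_point & 0xFFFE) == 0xFFFE)
        kind = "noncharacter" if is_nonchar else "reserved"
    else:
        return None
    return f"<{kind}-{code_point:04X}>"


for cp in (0x0009, 0xE000, 0xD800, 0xFFFF):
    print(code_point_label(cp))
```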
-k
-- System Information:
Debian Release: bookworm/sid
APT prefers testing
APT policy: (900, 'testing'), (700, 'unstable')
Architecture: amd64 (x86_64)
Kernel: Linux 5.15.0-3-amd64 (SMP w/4 CPU threads)
Kernel taint flags: TAINT_WARN, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=pl_PL.UTF-8, LC_CTYPE=pl_PL.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /bin/dash
Init: sysvinit (via /sbin/init)
Versions of packages unicode depends on:
ii python3 3.9.8-1
Versions of packages unicode recommends:
ii unicode-data 14.0.0-1.1
Versions of packages unicode suggests:
ii bzip2 1.0.8-5
-- no debconf information