On Thu, Dec 15, 2016 at 4:53 PM, Steve D'Aprano
<steve+pyt...@pearwood.info> wrote:
> Suppose I have a Unicode character, and I want to determine the script or
> scripts it belongs to.
>
> For example:
>
> U+0033 DIGIT THREE "3" belongs to the script "COMMON";
> U+0061 LATIN SMALL LETTER A "a" belongs to the script "LATIN";
> U+03BE GREEK SMALL LETTER XI "ξ" belongs to the script "GREEK".
>
> Is this information available from Python?

CPython's unicodedata module is generated by Tools/unicode/makeunicodedata.py,
which doesn't read "Scripts.txt", so the standard library has no script
property. If adding an external dependency is ok, you can use PyICU. For
example:

    >>> import icu
    >>> icu.Script.getScript('\u0033').getName()
    'Common'
    >>> icu.Script.getScript('\u0061').getName()
    'Latin'
    >>> icu.Script.getScript('\u03be').getName()
    'Greek'

There's no Python-specific documentation, so you'll have to work things
out experimentally, using the ICU C API reference as a guide:

http://icu-project.org/apiref/icu4c
http://icu-project.org/apiref/icu4c/uscript_8h.html
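If an external dependency isn't acceptable, you can read Scripts.txt from the Unicode Character Database yourself. The following is a minimal stdlib-only sketch. The ScriptTable class and the SAMPLE lines are hypothetical and for illustration only. For real use, you would feed it the full Scripts.txt for the Unicode version that matches unicodedata.unidata_version. The sketch assumes non-overlapping ranges and follows the file's convention that unlisted code points are "Unknown". It covers only the single Script property. The "scripts" (plural) part of the question corresponds to Script_Extensions, which lives in ScriptExtensions.txt and is not parsed here.

```python
import bisect
import re
from typing import Iterable, List, Tuple

# Matches "0041..005A    ; Latin" (a range) or "00AA    ; Latin" (one code point)
# once the trailing "# ..." comment has been stripped.
_DATA_LINE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")

# Hypothetical hand-written lines in the Scripts.txt format, NOT the real file.
SAMPLE = """\
# Illustrative excerpt in the Scripts.txt format
0030..0039    ; Common # Nd  [10] DIGIT ZERO..DIGIT NINE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0061..007A    ; Latin # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
03B1..03C1    ; Greek # L&  [17] GREEK SMALL LETTER ALPHA..GREEK SMALL LETTER RHO
"""


class ScriptTable:
    """Hypothetical helper: code point -> script name, from Scripts.txt lines."""

    def __init__(self, lines: Iterable[str]) -> None:
        ranges: List[Tuple[int, int, str]] = []
        for raw in lines:
            data = raw.split("#", 1)[0].strip()  # drop comments and blank lines
            if not data:
                continue
            m = _DATA_LINE.match(data)
            if m is None:
                continue
            start = int(m.group(1), 16)
            end = int(m.group(2), 16) if m.group(2) else start
            ranges.append((start, end, m.group(3)))
        ranges.sort()
        self._ranges = ranges
        self._starts = [r[0] for r in ranges]  # sorted keys for bisect

    def script_of(self, ch: str) -> str:
        cp = ord(ch)
        # Find the last range whose start is <= cp, then check its end.
        i = bisect.bisect_right(self._starts, cp) - 1
        if i >= 0:
            start, end, name = self._ranges[i]
            if start <= cp <= end:
                return name
        return "Unknown"  # default for code points not listed in the file


table = ScriptTable(SAMPLE.splitlines())
```

With the sample above, table.script_of('3') gives 'Common', table.script_of('a') gives 'Latin', and table.script_of('\u03be') gives 'Greek'. These are the same long names that PyICU's getName() returns. To use the real data, pass it an open file instead of the sample, e.g. ScriptTable(open('Scripts.txt', encoding='utf-8')).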
-- 
https://mail.python.org/mailman/listinfo/python-list