[Python-ideas] Re: Add a .whitespace property to module unicodedata

David Mertz, Ph.D. Thu, 01 Jun 2023 11:07:31 -0700

I guess this is pretty general for the described need:

>>> import unicodedata
>>> %time unicode_whitespace = [chr(c) for c in range(0xFFFF) if unicodedata.category(chr(c)) == "Zs"]
CPU times: user 19.2 ms, sys: 0 ns, total: 19.2 ms
Wall time: 18.7 ms
>>> unicode_whitespace
[' ', '\xa0', '\u1680', '\u2000', '\u2001', '\u2002', '\u2003',
'\u2004', '\u2005', '\u2006', '\u2007', '\u2008', '\u2009', '\u200a',
'\u202f', '\u205f', '\u3000']


It takes milliseconds rather than nanoseconds, but this is presumably
something you do once at the start of an application.  Can anyone think
of a more efficient and/or more concise way of doing this?
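
One possible variation, offered as a sketch rather than a recommendation:
the string in the original proposal appears to correspond to what
str.isspace() accepts, which is broader than category Zs (it also covers
characters such as tab, newline, and U+2028).  Assuming that is the set
actually wanted, filtering on str.isspace() over the full code point range
is a little more concise.  The names isspace_chars, zs_chars, and extra are
made up for illustration.

```python
import sys
import unicodedata

# Every code point for which str.isspace() is true.
isspace_chars = "".join(
    chr(c) for c in range(sys.maxunicode + 1) if chr(c).isspace()
)

# Only the "Zs" (space separator) category, as in the comprehension above,
# but scanning all planes instead of just the BMP.
zs_chars = "".join(
    chr(c)
    for c in range(sys.maxunicode + 1)
    if unicodedata.category(chr(c)) == "Zs"
)

# Characters that str.isspace() accepts but that are not in category Zs.
extra = sorted(set(isspace_chars) - set(zs_chars))
print([f"U+{ord(ch):04X}" for ch in extra])
```

Scanning all of range(sys.maxunicode + 1) covers roughly 17 times as many
code points as the BMP-only loop, so the one-time cost will be higher than
the timing shown above.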

This definitely feels better than hard-coding a static sequence of
characters, since the Unicode Consortium may change (and has changed)
the definition.  In particular, MONGOLIAN VOWEL SEPARATOR (U+180E) was
removed from the Zs category to which it previously belonged.
I'm not sure why U+FEFF isn't included, but that seems to match the
current standard, so all good.
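
For what it's worth, the categories of those two characters can be checked
directly.  A small sketch, assuming a Python build with a reasonably recent
Unicode database (the exact version is reported by
unicodedata.unidata_version); the helper name describe is hypothetical:

```python
import unicodedata

def describe(ch):
    """Return (code point, name, general category) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
    )

# Unicode database version bundled with this interpreter.
print(unicodedata.unidata_version)

for ch in ("\u180e", "\ufeff", "\u00a0"):
    print(describe(ch))
```

On current Python versions both U+180E and U+FEFF report category Cf
(format) rather than Zs, which is why neither shows up in the list.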

On Thu, Jun 1, 2023 at 1:29 PM Marc-Andre Lemburg <[email protected]> wrote:
>
> On 01.06.2023 18:18, Paul Moore wrote:
> > On Thu, 1 Jun 2023 at 15:09, Antonio Carlos Jorge Patricio
> > <[email protected]> wrote:
> >
> >     I suggest including a simple str variable in unicodedata module to
> >     mirror string.whitespace, so it would contain all characters defined
> >     in CPython function _PyUnicode_IsWhitespace()
> >     <https://github.com/python/cpython/blob/main/Objects/unicodetype_db.h#L6314>
> >     so that:
> >
> >     # existing
> >     string.whitespace = ' \t\n\r\x0b\x0c'
> >
> >     # proposed
> >     unicodedata.whitespace = ' \t\n\x0b\x0c\r\x1c\x1d\x1e\x1f\x85\xa0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000'
> >
> >
> > What's the use case? I can't think of a single occasion when I would
> > have found this useful.
>
> Same here.
>
> For those few cases, where it might be useful, you can easily put the
> string into your application code.
>
> Putting this into the stdlib would just mean that we'd have to recheck
> whether new Unicode whitespace chars were added, every time the standard
> upgrades. With ASCII, this won't happen in the foreseeable future ;-)
>
> --
> Marc-Andre Lemburg
> eGenix.com
>
> _______________________________________________
> Python-ideas mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/[email protected]/message/REMDZ2SVFVOIDEJYX3VSB2WUZTQPTTLM/
> Code of Conduct: http://python.org/psf/codeofconduct/



-- 
The dead increasingly dominate and strangle both the living and the
not-yet born.  Vampiric capital and undead corporate persons abuse
the lives and control the thoughts of homo faber. Ideas, once born,
become abortifacients against new conceptions.
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/3CH6FHG4BCXNBTF4LBZOYLRNHEKXCMYY/
Code of Conduct: http://python.org/psf/codeofconduct/
