On Sat, 3 Jun 2023 at 08:28, David Mertz, Ph.D. <david.me...@gmail.com> wrote:
>
> This is just bar talk at this point.  I think we've shown that this is
> easy enough to do that programmers can roll their own.
>
> But as idle chat goes, note that in your code:
>
>    set(unicodedata.category(ch) for ch in s)
>
> If `s` is a billion characters long, then we make a billion calls to
> the `.category()` method.  Python calls are comparatively expensive,
> even on well optimized data structures like strings.
>
> In my version:
>
>     bool(set(s) & set(unicode_categories['Sc']))
>
> The billion characters are first reduced to a smallish set of hundreds
> or thousands of distinct characters without needing method calls. Then
> that is intersected with a smallish set of characters in the category.
>
> You could optimize your version, however, simply by using:
>
>    set(unicodedata.category(set(ch)) for ch in s)

Or perhaps:

set(unicodedata.category(ch) for ch in set(s))

But measure before considering this worthwhile.
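
A rough sketch of how one might measure it. The test string and the two function names are made up for illustration, and the timings will depend heavily on how many distinct characters the real input has:

```python
import timeit
import unicodedata

# Hypothetical input: long, but with only a few distinct characters.
s = "The price is $5 or €4. " * 10_000

def categories_all(s):
    # One category() call per character in the string.
    return set(unicodedata.category(ch) for ch in s)

def categories_dedup(s):
    # Deduplicate first, then one category() call per distinct character.
    return set(unicodedata.category(ch) for ch in set(s))

# Both forms must give the same answer before timing means anything.
assert categories_all(s) == categories_dedup(s)

for fn in (categories_all, categories_dedup):
    elapsed = timeit.timeit(lambda: fn(s), number=10)
    print(f"{fn.__name__}: {elapsed:.4f}s for 10 runs")
```

With highly repetitive input like this the deduplicated form should have far fewer calls to make, but a short string or one with mostly unique characters may show little difference.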

> Yours provides more information, since it lists all the categories.
> But if you REALLY only care about one category, then you still have to
> ask `'Sc' in set(unicodedata.category(set(ch)) for ch in s)`.  Which
> is fine, that's not a hard question to ask.

If you REALLY want to just check whether a given category is present,
you probably want something like:

any(unicodedata.category(ch) == "Sc" for ch in s)

which is completely different from what you were suggesting, and still
doesn't require the string of all codepoints in the category.
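
Wrapped up as a small helper, that might look like the following. The name has_category is invented for this sketch and is not anything in the stdlib:

```python
import unicodedata

def has_category(s, category):
    """Return True if any character of s is in the given Unicode category."""
    # any() stops at the first match, so a hit near the start of a long
    # string avoids scanning the rest of it.
    return any(unicodedata.category(ch) == category for ch in s)

print(has_category("That costs €20", "Sc"))   # True
print(has_category("No money here", "Sc"))    # False
```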

Point is, querying the string is almost always going to be more
efficient than intersecting with the full gamut of that category.
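
To make the comparison concrete, here is a sketch of both approaches side by side. It assumes the category set has to be built by scanning every code point, since unicodedata does not provide one ready-made; sc_chars and the function names are made up for illustration:

```python
import sys
import unicodedata

# One-time scan of every code point to collect the "Sc" characters.
# This is the up-front cost the intersection approach has to pay.
sc_chars = {
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Sc"
}

def has_sc_by_intersection(s):
    # Builds set(s) in full, then intersects with the category set.
    return bool(set(s) & sc_chars)

def has_sc_by_query(s):
    # Asks about each character directly and stops at the first hit.
    return any(unicodedata.category(ch) == "Sc" for ch in s)

# The two approaches agree; they differ only in how much work they do.
for text in ("price: $5", "no symbols"):
    assert has_sc_by_intersection(text) == has_sc_by_query(text)
```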

ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/KMHZOQJQPILZD6Z3AKKRQXGHXVYFQPER/
Code of Conduct: http://python.org/psf/codeofconduct/
