Yeah... oops. Obviously I typed that version straight into the email. Should
have tried it in the shell first. But you got the intention of set-ifying the
characters in the large string.
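
For the record, the intent was to dedupe *before* calling `category()`,
i.e. Chris's `set(s)` placement rather than `set(ch)`. A minimal sketch
(the function names are just illustrative, not from any library):

```python
import unicodedata


def categories_present(s: str) -> set[str]:
    # Dedupe first, so category() runs once per distinct character
    # rather than once per character of the (possibly huge) input.
    return {unicodedata.category(ch) for ch in set(s)}


def has_category(s: str, cat: str) -> bool:
    # Membership test on the small set of categories actually present.
    return cat in categories_present(s)


sample = "Price: $5 or €4, " * 3
print(sorted(categories_present(sample)))
print(has_category(sample, "Sc"))
```
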

Yes on lies, damn lies, and benchmarks.
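
For anyone who does want to measure, a `timeit` sketch along these lines
would do. The test string and repetition count are arbitrary assumptions and
no timings are claimed here; results will vary with machine, Python version,
how many distinct characters the input has, and where (or whether) a match
occurs.

```python
import timeit
import unicodedata


def per_char(s: str) -> set[str]:
    # One category() call per character of the input.
    return {unicodedata.category(ch) for ch in s}


def set_first(s: str) -> set[str]:
    # Dedupe first: one category() call per *distinct* character.
    return {unicodedata.category(ch) for ch in set(s)}


def short_circuit(s: str, cat: str = "Sc") -> bool:
    # Stops at the first match; here the only "$" is at the very end,
    # so this is the worst case for the any() approach.
    return any(unicodedata.category(ch) == cat for ch in s)


# Arbitrary input: ~540k characters, few distinct ones, one "$" at the end.
text = "lorem ipsum dolor sit amet " * 20_000 + "$"

for fn in (per_char, set_first, short_circuit):
    seconds = timeit.timeit(lambda: fn(text), number=3)
    print(f"{fn.__name__:>13}: {seconds:.3f}s for 3 runs")
```

Put the `$` first instead and the `any()` version returns after a single
`category()` call, which is exactly how benchmarks end up lying.
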

On Fri, Jun 2, 2023, 7:29 PM Chris Angelico <ros...@gmail.com> wrote:

> On Sat, 3 Jun 2023 at 08:28, David Mertz, Ph.D. <david.me...@gmail.com>
> wrote:
> >
> > This is just bar talk at this point.  I think we've shown that this is
> > easy enough to do that programmers can roll their own.
> >
> > But as idle chat goes, note that in your code:
> >
> >    set(unicodedata.category(ch) for ch in s)
> >
> > If `s` is a billion characters long, then we make a billion calls to
> > the `.category()` method.  Python calls are comparatively expensive,
> > even on well-optimized data structures like strings.
> >
> > In my version:
> >
> >     bool(set(s) & set(unicode_categories['Sc']))
> >
> > The billion characters are first reduced to a smallish set of hundreds
> > or thousands of distinct characters without needing method calls. Then
> > that is intersected with a smallish set of characters in the category.
> >
> > You could optimize your version, however, simply by using:
> >
> >    set(unicodedata.category(set(ch)) for ch in s)
>
> Or perhaps:
>
> set(unicodedata.category(ch) for ch in set(s))
>
> But measure before considering this worthwhile.
>
> > Yours provides more information, since it lists all the categories.
> > But if you REALLY only care about one category, then you still have to
> > ask `'Sc' in set(unicodedata.category(set(ch)) for ch in s)`.  Which
> > is fine; that's not a hard question to ask.
>
> If you REALLY want to just check whether one particular category is
> present, you probably want something like:
>
> any(unicodedata.category(ch) == "Sc" for ch in s)
>
> which is completely different from what you were suggesting, and still
> doesn't require the string of all codepoints in the category.
>
> Point is, querying the string is almost always going to be more
> efficient than intersecting with the full gamut of that category.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/KMHZOQJQPILZD6Z3AKKRQXGHXVYFQPER/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
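
For comparison, here is a rough sketch of both membership checks side by
side. The `unicode_categories` mapping in the quoted snippet isn't part of
the stdlib; `chars_in_category` below is a hypothetical stand-in that builds
one category's character set by scanning all ~1.1 million code points once
(cached afterwards). Chris's `any()` version needs no table at all and stops
at the first match.

```python
import sys
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def chars_in_category(cat: str) -> frozenset[str]:
    # Hypothetical stand-in for the thread's `unicode_categories[cat]`:
    # scan every code point once and keep those whose category is `cat`.
    # Contents depend on the Unicode version bundled with this Python.
    return frozenset(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == cat
    )


def has_category_intersect(s: str, cat: str) -> bool:
    # Intersection approach: dedupe the string, then intersect with the
    # precomputed set of all characters in the category.
    return bool(set(s) & chars_in_category(cat))


def has_category_any(s: str, cat: str) -> bool:
    # Query-the-string approach: stop at the first matching character.
    return any(unicodedata.category(ch) == cat for ch in s)


for s in ("costs $5", "no currency here", "€ and ¥"):
    print(s, has_category_intersect(s, "Sc"), has_category_any(s, "Sc"))
```
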
_______________________________________________
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/FC64VVAITJTQLIHQYT2BUHSU64VXJXSC/
