Yeah... oops. Obviously I typed that version straight into the email; I
should have tried it in the shell first. But you got the intention:
set-ify the characters of the large string before doing anything else.

Yes on lies, damn lies, and benchmarks.
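
For the record, here is a minimal sketch of the two spellings from the quoted thread, actually run in the shell this time. The helper names `categories_present` and `has_category` are hypothetical, made up for illustration; the only real library call is the stdlib `unicodedata.category()`.

```python
import unicodedata


def categories_present(s):
    """Return the set of Unicode general categories occurring in s.

    set(s) collapses the string to its distinct characters first, so
    unicodedata.category() is called once per distinct character rather
    than once per character of the input.
    """
    return {unicodedata.category(ch) for ch in set(s)}


def has_category(s, cat):
    """Return True if any character of s is in general category cat.

    any() stops at the first match, so a hit near the start of the
    string is cheap; a miss still walks the whole string.
    """
    return any(unicodedata.category(ch) == cat for ch in s)


# Hypothetical sample text with a few currency symbols (category "Sc").
text = "Price: $5 or €4, maybe £3" * 1000
print(sorted(categories_present(text)))
print(has_category(text, "Sc"))
```

The first function answers "which categories occur at all"; the second answers the narrower "is there any currency symbol" question without needing a table of every codepoint in the category.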

On Fri, Jun 2, 2023, 7:29 PM Chris Angelico wrote:

> On Sat, 3 Jun 2023 at 08:28, David Mertz, Ph.D. wrote:
> >
> > This is just bar talk at this point.  I think we've shown that this is
> > easy enough to do that programmers can roll their own.
> >
> > But as idle chat goes, note that in your code:
> >
> >    set(unicodedata.category(ch) for ch in s)
> >
> > If `s` is a billion characters long, then we make a billion calls to
> > the `.category()` method.  Python calls are comparatively expensive,
> > even on well optimized data structures like strings.
> >
> > In my version:
> >
> >     bool(set(s) & set(unicode_categories['Sc']))
> >
> > The billion characters are first reduced to a smallish set of hundreds
> > or thousands of distinct characters without needing method calls. Then
> > that is intersected with a smallish set of characters in the category.
> >
> > You could optimize your version, however, simply by using:
> >
> >    set(unicodedata.category(set(ch)) for ch in s)
>
> Or perhaps:
>
>     set(unicodedata.category(ch) for ch in set(s))
>
> But measure before considering this worthwhile.
>
> > Yours provides more information, since it lists all the categories.
> > But if you REALLY only care about one category, then you still have to
> > ask `'Sc' in set(unicodedata.category(set(ch)) for ch in s)`.  Which
> > is fine, that's not a hard question to ask.
>
> If you REALLY want to just check whether any category is there, you
> probably want something like:
>
>     any(unicodedata.category(ch) == "Sc" for ch in s)
>
> which is completely different from what you were suggesting, and still
> doesn't require the string of all codepoints in the category.
>
> Point is, querying the string is almost always going to be more
> efficient than intersecting with the full gamut of that category.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list --
> To unsubscribe send an email to
> Message archived at
> Code of Conduct:
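
On "measure before considering this worthwhile": a rough timing sketch using the stdlib `timeit` module. The sample string, its size, and the function names are arbitrary assumptions, and the numbers will differ by machine and Python version, so treat it as a way to check rather than as a result.

```python
import timeit
import unicodedata

# Hypothetical input: mostly ASCII with a single currency sign at the very
# end, which is the worst case for a scan that stops at the first match.
s = "plain ascii text " * 50_000 + "€"


def scan_every_char(s):
    # One category() call per character until the first "Sc" is found.
    return any(unicodedata.category(ch) == "Sc" for ch in s)


def scan_distinct_chars(s):
    # Build the set of distinct characters first, then query only those.
    return any(unicodedata.category(ch) == "Sc" for ch in set(s))


for func in (scan_every_char, scan_distinct_chars):
    seconds = timeit.timeit(lambda: func(s), number=5)
    print(f"{func.__name__}: {seconds:.3f}s for 5 runs")
```

Moving the currency sign to the front of the string, or using text with many distinct characters, should shift the comparison, which is rather the point about benchmarks.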