Yeah... oops. Obviously I typed that version straight into the email; I should have tried it in the shell. But you got the intention: set-ifying the characters in the large string.
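A minimal sketch of the set-ify idea next to the per-character `any()` check from the quoted message. It assumes Python 3.9+ and uses only the stdlib `unicodedata.category()`, not a precomputed `unicode_categories` table, which is not part of the stdlib. The helper names are hypothetical. `compare()` is an illustrative micro-benchmark on synthetic data, so its numbers are only indicative.

```python
import timeit
import unicodedata


def categories_present(s: str) -> set[str]:
    # Set-ify first, so category() runs once per *distinct* character
    # rather than once per character of a possibly huge string.
    return {unicodedata.category(ch) for ch in set(s)}


def has_category(s: str, cat: str = "Sc") -> bool:
    # Same set-ify idea, stopping at the first distinct character
    # whose category matches.
    return any(unicodedata.category(ch) == cat for ch in set(s))


def has_category_streaming(s: str, cat: str = "Sc") -> bool:
    # Per-character check from the quoted message: no set is built and it
    # short-circuits on the first match, but with no match it calls
    # category() once for every character in s.
    return any(unicodedata.category(ch) == cat for ch in s)


def compare(n: int = 200_000, repeat: int = 3) -> dict[str, float]:
    # Hypothetical micro-benchmark on a synthetic ASCII string with no
    # currency symbol (the worst case for short-circuiting). Timings
    # depend on the data and the machine; treat them as rough only.
    s = ("hello world " * (n // 12 + 1))[:n]
    return {
        "set-ify first": timeit.timeit(lambda: has_category(s), number=repeat),
        "per character": timeit.timeit(
            lambda: has_category_streaming(s), number=repeat
        ),
    }


if __name__ == "__main__":
    print(categories_present("price: $5 or €4"))
    for name, secs in compare().items():
        print(f"{name}: {secs:.4f}s")
```

Set-ifying should pay off on long strings with few distinct characters and no match. The streaming `any()` can win when a match shows up early, since it never has to build the set of the whole string first.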
Yes on lies, damn lies, and benchmarks.

On Fri, Jun 2, 2023, 7:29 PM Chris Angelico <[email protected]> wrote:
> On Sat, 3 Jun 2023 at 08:28, David Mertz, Ph.D. <[email protected]>
> wrote:
> >
> > This is just bar talk at this point. I think we've shown that this is
> > easy enough to do that programmers can roll their own.
> >
> > But as idle chat goes, note that in your code:
> >
> > set(unicodedata.category(ch) for ch in s)
> >
> > If `s` is a billion characters long, then we make a billion calls to
> > the `.category()` method. Python calls are comparatively expensive,
> > even on well optimized data structures like strings.
> >
> > In my version:
> >
> > bool(set(s) & set(unicode_categories['Sc'])
> >
> > The billion characters are first reduced to a smallish set of hundreds
> > or thousands of distinct characters without needing method calls. Then
> > that is intersected with a smallish set of characters in the category.
> >
> > You could optimize your version, however, simply by using:
> >
> > set(unicodedata.category(set(ch)) for ch in s)
>
> Or perhaps:
>
> set(unicodedata.category(ch) for ch in set(s))
>
> But measure before considering this worthwhile.
>
> > Yours provides more information, since it lists all the categories.
> > But if you REALLY only care about one category, then you still have to
> > ask `'Sc' in set(unicodedata.category(set(ch)) for ch in s)`. Which
> > is fine, that's not a hard question to ask.
>
> If you REALLY want to just check whether any category is there, you
> probably want something like:
>
> any(unicodedata.category(ch) == "Sc" for ch in s)
>
> which is completely different from what you were suggesting, and still
> doesn't require the string of all codepoints in the category.
>
> Point is, querying the string is almost always going to be more
> efficient than intersecting with the full gamut of that category.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/KMHZOQJQPILZD6Z3AKKRQXGHXVYFQPER/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/[email protected]/message/FC64VVAITJTQLIHQYT2BUHSU64VXJXSC/
Code of Conduct: http://python.org/psf/codeofconduct/
