On 25 Mar 2020, at 9:48, Stephen J. Turnbull wrote:
Walter Dörwald writes:
A `find()` that supports multiple search strings (and returns the
leftmost position where a search string can be found) is a great help
in implementing some kind of tokenizer.
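Such a helper might look like the following sketch (the name
`find_first` and its return convention are illustrative, not part of
any actual proposal):

```python
def find_first(text, needles, start=0):
    """Return (position, needle) for the leftmost occurrence of any
    needle in text at or after start, or (-1, None) if none is found.
    (Hypothetical helper; name and signature are illustrative only.)
    """
    best_pos, best_needle = -1, None
    for needle in needles:
        pos = text.find(needle, start)
        if pos != -1 and (best_pos == -1 or pos < best_pos):
            best_pos, best_needle = pos, needle
    return best_pos, best_needle
```

A tokenizer can then call this in a loop, advancing `start` past each
match.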
In other words, you want the equivalent of Emacs's "(search-forward
(regexp-opt list-of-strings))", which also meets the requirement of
returning which string was found (as "(match-string 0)").
Sounds like it. I'm not familiar with Emacs.
Since Python already has a functionally similar API for regexps, we
can add a regexp-opt (with appropriate name) method to re, perhaps as
.compile_string_list(), and provide a convenience function
re.search_string_list() for your application.
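Neither function exists in `re` today; a sketch of what they might do,
built only on the existing `re.escape` and alternation (sorting longer
strings first so overlapping alternatives prefer the longer match, as
Emacs's regexp-opt does):

```python
import re

def compile_string_list(strings):
    # Sort longest-first so that e.g. "<=" wins over "<" when both
    # could match at the same position.
    pattern = "|".join(re.escape(s)
                       for s in sorted(strings, key=len, reverse=True))
    return re.compile(pattern)

def search_string_list(strings, text):
    # Convenience wrapper: return (position, matched string),
    # or (-1, None) if no string occurs in text.
    m = compile_string_list(strings).search(text)
    return (m.start(), m.group(0)) if m else (-1, None)
```

Both names are placeholders taken from the suggestion above, not an
existing API.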
If you're using regexps anyway, building the appropriate or-expression
shouldn't be a problem. I guess that's what most lexers/tokenizers do.
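For a full tokenizer the usual trick (a variant of the "Writing a
Tokenizer" example in the `re` module documentation) is to give each
alternative a named group; the token names and pattern here are just
an illustration:

```python
import re

# One alternation with a named group per token kind; lastgroup tells
# us which alternative matched.
TOKEN_RE = re.compile(r"(?P<NUMBER>\d+)|(?P<OP>[+*-])|(?P<SKIP>\s+)")

def tokenize(text):
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()
```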
I'm applying practicality before purity, of course. To some extent
we want to encourage simple string approaches, and putting this in
the re module is not optimal for that.
Exactly. I'm always a bit hesitant to use regexps when there's a
simpler string approach.
Steve
Servus,
Walter
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/46KMMKYHW7DIDNZFO27GNQCJVILNSQ6Q/
Code of Conduct: http://python.org/psf/codeofconduct/