I don't think this should be a str method. It should be a separate class, so that the internal tree can be built once and reused.
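A rough sketch of the shape I have in mind. The `Replacer` name and its interface are hypothetical, not an existing stdlib API, and for brevity it uses a compiled `re` pattern as a stand-in for a real trie or Aho-Corasick automaton. Longest-match-first is an assumption of this sketch, not a settled design:

```python
import re


class Replacer:
    """Hypothetical helper: build the search structure once, reuse it often."""

    def __init__(self, mapping):
        self._mapping = dict(mapping)
        if "" in self._mapping:
            raise ValueError("empty search string is not supported in this sketch")
        # Longer keys first, so the alternation prefers the longest match
        # at any given position (an assumption of this sketch).
        keys = sorted(self._mapping, key=len, reverse=True)
        self._pattern = (
            re.compile("|".join(re.escape(k) for k in keys)) if keys else None
        )

    def replace(self, s):
        if self._pattern is None:
            return s
        # One left-to-right pass; already-replaced text is never rescanned,
        # so replacements cannot feed into each other.
        return self._pattern.sub(lambda m: self._mapping[m.group(0)], s)


swap = Replacer({"a": "b", "b": "a"})
print(swap.replace("abba"))  # baab
```

Chaining `"abba".replace("a", "b").replace("b", "a")` would give "aaaa" instead, which is the problem with doing one .replace at a time. The compiled object can be kept around and applied to many strings.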
There are several Aho-Corasick implementations on PyPI.
As far as I know, AC is longest-match.

On the other hand, Go's replacer (which is also trie-based) behaves as follows:

> Replacements are performed in order, without overlapping matches.
https://golang.org/pkg/strings/#NewReplacer

On Sun, Feb 4, 2018 at 7:04 AM, Franklin? Lee
<leewangzhong+pyt...@gmail.com> wrote:
> Let s be a str. I propose to allow these existing str methods to take params
> in new forms.
>
> s.replace(old, new):
>     Allow passing in a collection of olds.
>     Allow passing in a single argument, a mapping of olds to news.
>     Allow the olds in the mapping to be tuples of strings.
>
> s.split(sep), s.rsplit, s.partition:
>     Allow sep to be a collection of separators.
>
> s.startswith, s.endswith:
>     Allow argument to be a collection of strings.
>
> s.find, s.index, s.count, x in s:
>     Similar.
>     These methods are also in `list`, which can't distinguish between items,
>     subsequences, and subsets. However, `str` is already inconsistent with
>     `list` here: list.M looks for an item, while str.M looks for a subsequence.
>
> s.[r|l]strip:
>     Sadly, these functions already interpret their str arguments as
>     collections of characters.
>
> These new forms can be optimized internally, as a search for multiple
> candidate substrings can be more efficient than searching for one at a time.
> See
> https://stackoverflow.com/questions/3260962/algorithm-to-find-multiple-string-matches
>
> The most significant change is on .replace. The others are simple enough to
> simulate with a loop or something. It is harder to make multiple
> simultaneous replacements using one .replace at a time, because previous
> replacements can form new things that look like replaceables. The easiest
> Python solution is to use regex or install some package, which uses (if
> you're lucky) regex or (if unlucky) doesn't simulate simultaneous
> replacements. (If possible, just use str.translate.)
>
> I suppose .split on multiple separators is also annoying to simulate.
> The two-argument form of .split may be even more of a burden, though I don't
> know when a limited multiple-separator split is useful. The current best
> solution is, like before, to use regex, or install a package and hope for
> the best.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
INADA Naoki <songofaca...@gmail.com>