> Alternatively: creating a new section under https://docs.python.org/3.8/library/re.html#regular-expression-examples, titled "Finding the first match", where it briefly explains the difference in behavior between using re.findall()[0] and re.finditer().group(1) (or re.finditer.group() when there's not a subgroup). Based on the discussions in this thread and code examples, this seems to be rather commonly misunderstood.
Another clarification: The above should be about re.search(), not re.finditer(). I sent this reply late at night and got the two mixed up.

On Sat, Dec 7, 2019 at 7:47 AM Kyle Stanley <aeros...@gmail.com> wrote:
>
> Code examples should of course be used sparingly, but I think
> re.finditer() could benefit from at least one
>
> Clarification: I see that there's an example of it being used in
> https://docs.python.org/3.8/library/re.html#finding-all-adverbs-and-their-positions
> and one more complex example with
> https://docs.python.org/3.8/library/re.html#writing-a-tokenizer. I was
> specifically referring to including a basic example directly within
> https://docs.python.org/3.8/library/re.html#re.finditer, similar to the
> section for https://docs.python.org/3.8/library/re.html#re.split or
> https://docs.python.org/3.8/library/re.html#re.sub.
>
> Alternatively: creating a new section under
> https://docs.python.org/3.8/library/re.html#regular-expression-examples,
> titled "Finding the first match", where it briefly explains the difference
> in behavior between using re.findall()[0] and re.finditer().group(1) (or
> re.finditer.group() when there's not a subgroup). Based on the discussions
> in this thread and code examples, this seems to be rather commonly
> misunderstood.
>
> On Sat, Dec 7, 2019 at 7:29 AM Kyle Stanley <aeros...@gmail.com> wrote:
>
>> Serhiy Storchaka wrote:
>> > My concern is that this will add complexity to the module documentation
>> > which is already too complex. re.findfirst() has more complex semantics
>> > (if no capture groups returns this, if one capture group returns that,
>> > and in other cases returns even something of a different type) than
>> > re.search() which just returns a match object or None. This will
>> > increase the chance that the user misses the appropriate function and uses
>> > suboptimal functions like findall()[0].
>>
>> > re.finditer() is a more modern and powerful function than re.findall().
>> > The latter may even be deprecated in the future.
>>
>> Hmm, perhaps another consideration then would be to think of improvements
>> to make to the existing documentation, particularly by including some
>> code examples or expanding upon the docs for re.finditer() to make its
>> usage clearer. Personally, it took me quite a while to understand its
>> role in the module (as someone who does not use it on a frequent basis).
>> Code examples should of course be used sparingly, but I think re.finditer()
>> could benefit from at least one, especially considering that far less
>> complex functions in the module have several examples. See
>> https://docs.python.org/3.8/library/re.html#re.finditer.
>>
>> Serhiy Storchaka wrote:
>> > > Another option to consider might be adding a boolean parameter to
>> > > re.search() that changes the behavior to directly return a string
>> > > instead of a match object, similar to re.findall() when there are not
>> > > multiple subgroups.
>>
>> > Oh, no, this is the worst idea!
>>
>> Yeah, after having some time to reflect on that idea a bit more, I don't
>> think it would work. That would just end up adding confusion to
>> re.search(), ultimately defeating the purpose of the parameter in the first
>> place. It would be too drastic a change in behavior for a single
>> parameter to make.
>>
>> Thanks for the honesty though; not all of my ideas are good ones. But if
>> I can come up with something half-decent every once in a while, I think it's
>> worth throwing them out there. (:
>>
>> On Sat, Dec 7, 2019 at 2:56 AM Serhiy Storchaka <storch...@gmail.com>
>> wrote:
>>
>>> 06.12.19 23:20, Kyle Stanley wrote:
>>> > Serhiy Storchaka wrote:
>>> > > It seems that in most cases the authors just do not know about
>>> > > re.search(). Adding re.findfirst() will not fix this.
>>> >
>>> > That's definitely possible, but it might be just as likely that they saw
>>> > re.findall() as being simpler to use than re.search().
>>> > Although it performs substantially worse when parsing
>>> > decent amounts of text (assuming the first match isn't at the end),
>>> > ``re.findall()[0]`` /consistently/ returns the first string that was
>>> > matched, as long as no subgroups were used. This allows them to
>>> > circumvent the usage of match objects entirely, which makes it a bit
>>> > easier to learn, especially for those who are less familiar with OOP or
>>> > are already familiar with other popular flavors of regex (such as JS).
>>> >
>>> > I'll admit this is mostly speculation, but I think there's an especially
>>> > large number of re users (compared to other modules) who aren't
>>> > necessarily developers and might just want to write a
>>> > script to quickly parse some documents. These types of users are the
>>> > ones who would likely benefit the most from the proposed re.findfirst(),
>>> > particularly if it directly returns a string as Guido is suggesting.
>>> >
>>> > I think at the end of the day, the critical question to answer is this:
>>> >
>>> > *Do we want to add a new helper function that's easy to use, consistent,
>>> > and provides good performance for finding the first match, even if the
>>> > functionality already exists within the module?*
>>>
>>> My concern is that this will add complexity to the module documentation
>>> which is already too complex. re.findfirst() has more complex semantics
>>> (if no capture groups returns this, if one capture group returns that,
>>> and in other cases returns even something of a different type) than
>>> re.search() which just returns a match object or None. This will
>>> increase the chance that the user misses the appropriate function and uses
>>> suboptimal functions like findall()[0].
>>>
>>> re.finditer() is a more modern and powerful function than re.findall().
>>> The latter may even be deprecated in the future.
>>>
>>> In the future we may add a few more functions/methods: re.rmatch() (like
>>> re.match(), but matches at the end of the string instead of the start),
>>> re.rsearch() (searches from the end), re.rfinditer() (iterates in
>>> reverse order). Unlike findfirst(), they will implement features that
>>> cannot be easily expressed using existing functions.
>>>
>>> > Another option to consider might be adding a boolean parameter to
>>> > re.search() that changes the behavior to directly return a string
>>> > instead of a match object, similar to re.findall() when there are not
>>> > multiple subgroups.
>>>
>>> Oh, no, this is the worst idea!
>>> _______________________________________________
>>> Python-ideas mailing list -- python-ideas@python.org
>>> To unsubscribe send an email to python-ideas-le...@python.org
>>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>>> Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/C4VUEDFVLRJ5G7KTDI5G5RNC3MMP7X6V/
>>> Code of Conduct: http://python.org/psf/codeofconduct/
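For concreteness, here is a minimal sketch of what a "Finding the first match" example could demonstrate, using re.search() as intended rather than re.finditer().group(1). The helper names (first_with_findall, first_with_search, first_with_finditer) and the sample text are hypothetical, made up for this illustration; only the re.findall(), re.search() and re.finditer() calls are actual module API.

```python
import re

# Hypothetical helpers, written only to illustrate the comparison.


def first_with_findall(pattern, text):
    """First result via re.findall(...)[0].

    findall() collects every match in the string before [0] is applied,
    and indexing the empty list raises IndexError when nothing matches.
    The item is the whole match (no groups), the group's text (one group),
    or a tuple of group texts (two or more groups).
    """
    return re.findall(pattern, text)[0]


def first_with_search(pattern, text, group=0):
    """First result via re.search(), which stops at the first match.

    Returns None instead of raising when nothing matches.
    """
    match = re.search(pattern, text)
    return match.group(group) if match is not None else None


def first_with_finditer(pattern, text, group=0):
    """Same result via re.finditer(), taking only the first lazily yielded match."""
    match = next(re.finditer(pattern, text), None)
    return match.group(group) if match is not None else None


text = "id=17 id=42 id=99"  # made-up sample input

print(first_with_findall(r"id=\d+", text))        # id=17 (no groups)
print(first_with_search(r"id=\d+", text))         # id=17
print(first_with_findall(r"id=(\d+)", text))      # 17 (one group)
print(first_with_search(r"id=(\d+)", text, 1))    # 17
print(first_with_finditer(r"id=(\d+)", text, 1))  # 17
print(first_with_findall(r"(id)=(\d+)", text))    # ('id', '17') (two groups)

try:
    first_with_findall(r"name=\w+", text)
except IndexError:
    print("findall()[0] raised IndexError")
print(first_with_search(r"name=\w+", text))       # None
```

The sketch shows the two behavioral differences raised in this thread. findall()[0] raises IndexError when nothing matches, while re.search() returns None. The type of each findall() item also depends on how many groups the pattern has, which is the kind of semantics Serhiy is wary of adding through re.findfirst(). In addition, findall() scans the whole string before indexing, whereas re.search() and next(re.finditer(...)) stop at the first match.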
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/QBCYGN5EEQMKPPDXZOAUKAXRQ2QBGWMU/
Code of Conduct: http://python.org/psf/codeofconduct/