[Python-ideas] Re: Fwd: Re: Fwd: re.findfirst()

Kyle Stanley Sat, 07 Dec 2019 16:05:53 -0800

> Alternatively: creating a new section under
> https://docs.python.org/3.8/library/re.html#regular-expression-examples,
> titled "Finding the first match", where it briefly explains the difference
> in behavior between using re.findall()[0] and re.finditer().group(1) (or
> re.finditer.group() when there's not a subgroup). Based on the discussions
> in this thread and code examples, this seems to be rather commonly
> misunderstood.


Another clarification: The above should be about re.search(), not
re.finditer(). I sent this reply late at night and got the two mixed up.
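For illustration, the comparison proposed for a "Finding the first match" section might look like the sketch below, using re.search() as clarified above. The sample string and patterns are made up for the example. The point it shows is that what findall()[0] returns depends on how many capture groups the pattern has, so the equivalent search() call changes along with it.

```python
import re

text = "id=17, id=42"

# No capture groups: both give the whole first match, "id=17".
whole_findall = re.findall(r"id=\d+", text)[0]
whole_search = re.search(r"id=\d+", text).group()

# One capture group: findall() returns only the group's text, "17",
# so the search() equivalent is .group(1), not .group().
group_findall = re.findall(r"id=(\d+)", text)[0]
group_search = re.search(r"id=(\d+)", text).group(1)

# Two or more groups: findall() returns a tuple per match, and
# .groups() gives the same tuple for the first match.
tuple_findall = re.findall(r"(id)=(\d+)", text)[0]
tuple_search = re.search(r"(id)=(\d+)", text).groups()

# No match: search() returns None, while findall() returns an empty
# list, so findall()[0] would raise IndexError.
no_match = re.search(r"x=\d+", text)
```

Note also that findall() always scans the whole string and builds the full list, while search() stops at the first match.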

On Sat, Dec 7, 2019 at 7:47 AM Kyle Stanley <aeros...@gmail.com> wrote:

> > Code examples should of course be used sparingly, but I think
> > re.finditer() could benefit from at least one
>
> Clarification: I see that there's an example of it being used in
> https://docs.python.org/3.8/library/re.html#finding-all-adverbs-and-their-positions
> and one more complex example with
> https://docs.python.org/3.8/library/re.html#writing-a-tokenizer. I was
> specifically referring to including a basic example directly within
> https://docs.python.org/3.8/library/re.html#re.finditer, similar to the
> section for https://docs.python.org/3.8/library/re.html#re.split or
> https://docs.python.org/3.8/library/re.html#re.sub.
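A basic example of the sort that could sit directly under the re.finditer() entry might look like this. The input string and the key=value pattern are illustrative assumptions, not taken from the current docs. Unlike findall(), each item is a full match object, so positions and all groups remain available.

```python
import re

config = "host=example.org port=8080 debug=true"

pairs = {}
# finditer() lazily yields a match object for each non-overlapping match,
# left to right, so spans and every capture group stay available.
for m in re.finditer(r"(\w+)=(\S+)", config):
    print(m.span(), m.group(1), m.group(2))
    pairs[m.group(1)] = m.group(2)

# Printed:
# (0, 16) host example.org
# (17, 26) port 8080
# (27, 37) debug true
```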
>
> Alternatively: creating a new section under
> https://docs.python.org/3.8/library/re.html#regular-expression-examples,
> titled "Finding the first match", where it briefly explains the difference
> in behavior between using re.findall()[0] and re.finditer().group(1) (or
> re.finditer.group() when there's not a subgroup). Based on the discussions
> in this thread and code examples, this seems to be rather commonly
> misunderstood.
>
> On Sat, Dec 7, 2019 at 7:29 AM Kyle Stanley <aeros...@gmail.com> wrote:
>
>> Serhiy Storchaka wrote:
>> > My concern is that this will add complexity to the module documentation,
>> > which is already too complex. re.findfirst() has more complex semantics
>> > (if there are no capture groups it returns this, if there is one capture
>> > group it returns that, and in other cases it returns something of an
>> > entirely different type) than re.search(), which just returns a match
>> > object or None. This will increase the chance that users miss the
>> > appropriate function and use suboptimal functions like findall()[0].
>>
>> > re.finditer() is a more modern and powerful function than re.findall().
>> > The latter may even be deprecated in the future.
>>
>> Hmm, perhaps another consideration then would be to think of improvements
>> to make to the existing documentation, particularly with including some
>> code examples or expanding upon the docs for re.finditer() to make its
>> usage more clear. Personally, it took me quite a while to understand its
>> role in the module (as someone who does not use it on a frequent basis).
>> Code examples should of course be used sparingly, but I think re.finditer()
>> could benefit from at least one. Especially considering that far less
>> complex functions in the module have several examples. See
>> https://docs.python.org/3.8/library/re.html#re.finditer.
>>
>> Serhiy Storchaka wrote:
>> > > Another option to consider might be adding a boolean parameter to
>> > > re.search() that changes the behavior to directly return a string
>> > > instead of a match object, similar to re.findall() when there are not
>> > > multiple subgroups.
>>
>> > Oh, no, this is the worst idea!
>>
>> Yeah, after having some time to reflect on that idea a bit more I don't
>> think it would work. That would just end up adding confusion to
>> re.search(), ultimately defeating the purpose of the parameter in the first
>> place. It would be too drastic of a change in behavior for a single
>> parameter to make.
>>
>> Thanks for the honesty though, not all of my ideas are good ones. But, if
>> I can come up with something half-decent every once in a while I think it's
>> worth throwing them out there. (:
>>
>> On Sat, Dec 7, 2019 at 2:56 AM Serhiy Storchaka <storch...@gmail.com> wrote:
>>
>>> 06.12.19 23:20, Kyle Stanley wrote:
>>> > Serhiy Storchaka wrote:
>>> >  > It seems that in most cases the authors just do not know about
>>> >  > re.search(). Adding re.findfirst() will not fix this.
>>> >
>>> > That's definitely possible, but it might be just as likely that they
>>> > saw re.findall() as simpler to use than re.search(). Although it
>>> > performs substantially worse on large amounts of text (unless the
>>> > first match is near the end), ``re.findall()[0]`` /consistently/
>>> > returns the first string that was matched, as long as no subgroups
>>> > were used. This lets them avoid match objects entirely, which makes
>>> > it a bit easier to learn, especially for those who are less familiar
>>> > with OOP or already familiar with other popular flavors of regex
>>> > (such as JS).
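A rough sketch of the performance difference described above, assuming an input where the first match comes early and many more follow. Both calls return the same string, but findall() has to collect every match before [0] discards all but one. Timings are machine-dependent, so no particular numbers are implied.

```python
import re
import timeit

# Illustrative input: the first match is near the start, followed by
# 100,000 more matches that findall() must still collect.
text = "order 1001 " + "item 2002 " * 100_000
pattern = re.compile(r"\d+")

# findall() scans the whole string and builds a list of every match
# before [0] picks out the first one.
first_via_findall = pattern.findall(text)[0]

# search() stops scanning at the first match and returns a match object.
first_via_search = pattern.search(text).group()

# Rough timing comparison; absolute numbers depend on the machine.
t_findall = timeit.timeit(lambda: pattern.findall(text)[0], number=10)
t_search = timeit.timeit(lambda: pattern.search(text).group(), number=10)
print(f"findall()[0]: {t_findall:.4f}s  search(): {t_search:.4f}s")
```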
>>> >
>>> > I'll admit this is mostly speculation, but I think there's an
>>> > especially large number of re users (compared to other modules) who
>>> > aren't necessarily developers and might just want to write a script
>>> > to quickly parse some documents. These users would likely benefit the
>>> > most from the proposed re.findfirst(), particularly if it directly
>>> > returns a string as Guido is suggesting.
>>> >
>>> > I think at the end of the day, the critical question to answer is this:
>>> >
>>> > *Do we want to add a new helper function that's easy to use,
>>> > consistent, and provides good performance for finding the first match,
>>> > even if the functionality already exists within the module?*
>>>
>>> My concern is that this will add complexity to the module documentation,
>>> which is already too complex. re.findfirst() has more complex semantics
>>> (if there are no capture groups it returns this, if there is one capture
>>> group it returns that, and in other cases it returns something of an
>>> entirely different type) than re.search(), which just returns a match
>>> object or None. This will increase the chance that users miss the
>>> appropriate function and use suboptimal functions like findall()[0].
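To make the semantics described above concrete, here is a minimal sketch of what a findfirst() built on re.search() might look like. The function is hypothetical (it does not exist in the re module), and its return rules are assumed to mirror one element of re.findall()'s result, including the empty string for groups that did not participate.

```python
import re
from typing import Tuple, Union


def findfirst(pattern: str, string: str, flags: int = 0
              ) -> Union[str, Tuple[str, ...], None]:
    """Hypothetical helper, NOT part of the re module.

    Assumed to mirror one element of re.findall()'s result: the whole
    match with no groups, the group's text with one group, and a tuple
    of all groups otherwise. Returns None when nothing matches.
    """
    m = re.search(pattern, string, flags)
    if m is None:
        return None
    # Like findall(), report groups that did not participate as "".
    groups = m.groups(default="")
    if not groups:
        return m.group()
    if len(groups) == 1:
        return groups[0]
    return groups
```

For example, findfirst(r"id=(\d+)", "id=17, id=42") would give '17', while re.search() with the same arguments gives a match object. The three possible return types (string, tuple, None) are exactly the extra complexity being objected to.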
>>>
>>> re.finditer() is a more modern and powerful function than re.findall().
>>> The latter may even be deprecated in the future.
>>>
>>> In the future we may add a few more functions/methods: re.rmatch() (like
>>> re.match(), but matching at the end of the string instead of the start),
>>> re.rsearch() (searching from the end), and re.rfinditer() (iterating in
>>> reverse order). Unlike findfirst(), they will implement features that
>>> cannot be easily expressed using existing functions.
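One way to see why a reverse search is not easily expressed with existing functions: taking the last result of re.finditer() is not the same as searching from the end, because finditer() only reports non-overlapping matches found left to right. rsearch_sketch() below is hypothetical (no such function exists in re), a brute-force emulation that assumes "searching from the end" means returning the match whose start position is rightmost; a real re.rsearch() might define it differently.

```python
import re
from typing import Optional


def rsearch_sketch(pattern: str, string: str, flags: int = 0) -> Optional[re.Match]:
    """Hypothetical backward search, NOT an existing re function.

    Tries to match at each start position from the end of the string
    towards the beginning and returns the first success, i.e. the match
    with the rightmost start. Brute force, for illustration only.
    """
    compiled = re.compile(pattern, flags)
    for pos in range(len(string), -1, -1):
        m = compiled.match(string, pos)
        if m is not None:
            return m
    return None


text = "aaa"
# finditer() finds non-overlapping matches left to right, so its last
# (and only) match of "aa" starts at 0 ...
last_forward = list(re.finditer(r"aa", text))[-1].span()
# ... while searching from the end finds the match starting at 1.
from_end = rsearch_sketch(r"aa", text).span()
print(last_forward, from_end)  # (0, 2) (1, 3)
```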
>>>
>>> > Another option to consider might be adding a boolean parameter to
>>> > re.search() that changes the behavior to directly return a string
>>> > instead of a match object, similar to re.findall() when there are not
>>> > multiple subgroups.
>>>
>>> Oh, no, this is the worst idea!
>>> _______________________________________________
>>> Python-ideas mailing list -- python-ideas@python.org
>>> To unsubscribe send an email to python-ideas-le...@python.org
>>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>>> Message archived at
>>> https://mail.python.org/archives/list/python-ideas@python.org/message/C4VUEDFVLRJ5G7KTDI5G5RNC3MMP7X6V/
>>> Code of Conduct: http://python.org/psf/codeofconduct/
>>>
>>

Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QBCYGN5EEQMKPPDXZOAUKAXRQ2QBGWMU/
