[issue35859] Capture behavior depends on the order of an alternation

2022-03-29 Thread Ma Lin
Ma Lin added the comment: Thanks for your review. 3.11 has a more powerful re module. Thank you also for rebasing the atomic grouping code.

[issue35859] Capture behavior depends on the order of an alternation

2022-03-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Since the old behavior in many cases matches the behavior of Perl and Java (which are considered bugs, but still), it was decided not to backport the fix, to avoid possible breakage in bugfix releases. Thank you, Ma Lin, for your contribution.

[issue35859] Capture behavior depends on the order of an alternation

2022-03-29 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: New changeset 35699721a3391175d20e9ef03d434675b496 by Ma Lin in branch 'main': bpo-35859: Fix a few long-standing bugs in re engine (GH-12427) https://github.com/python/cpython/commit/35699721a3391175d20e9ef03d434675b496

[issue35859] Capture behavior depends on the order of an alternation

2020-06-29 Thread Ma Lin
Ma Lin added the comment: Do I need to write a detailed review guide? I suppose that after reading it from beginning to end, PR 12427 will be easy to understand, with no need to read anything else. Or is there a plan to replace the sre module with the regex module in a future version?

[issue35859] Capture behavior depends on the order of an alternation

2020-05-31 Thread Ma Lin
Ma Lin added the comment: Is there any hope of merging this into the 3.9 branch?

[issue35859] Capture behavior depends on the order of an alternation

2019-03-19 Thread Ma Lin
Ma Lin added the comment: I think PR 12427 is mature enough for review; I have been working on it these days. You may review the commits one by one, since each commit message serves as a review guide. https://github.com/python/cpython/pull/12427/commits Maybe you will need two or three days to understand it,

[issue35859] Capture behavior depends on the order of an alternation

2019-03-19 Thread Ma Lin
Change by Ma Lin: pull_requests: +12381

[issue35859] Capture behavior depends on the order of an alternation

2019-03-12 Thread Ma Lin
Ma Lin added the comment:

> Could you please create and run some microbenchmarks to measure
> possible performance penalty of additional MARK_PUSHes? I am
> especially interested in worst cases.

Besides the worst case, I prepared two solutions. Solution_A (PR12288): Fix the bugs, I can find

[issue35859] Capture behavior depends on the order of an alternation

2019-03-12 Thread Ma Lin
Change by Ma Lin: Added file: https://bugs.python.org/file48204/t.py

[issue35859] Capture behavior depends on the order of an alternation

2019-03-12 Thread Ma Lin
Change by Ma Lin: pull_requests: +12271

[issue35859] Capture behavior depends on the order of an alternation

2019-03-12 Thread Ma Lin
Change by Ma Lin: pull_requests: +12270

[issue35859] Capture behavior depends on the order of an alternation

2019-03-12 Thread Ma Lin
Change by Ma Lin: pull_requests: +12269

[issue35859] Capture behavior depends on the order of an alternation

2019-03-04 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Thank you for your great work, Ma Lin! But it will take some time to review it. Could you please create and run some microbenchmarks to measure the possible performance penalty of the additional MARK_PUSHes? I am especially interested in worst cases. If

[issue35859] Capture behavior depends on the order of an alternation

2019-03-04 Thread Ma Lin
Ma Lin added the comment: Found another bug in re:

>>> re.match(r'(?:.*?\b(?=(\t)|(x))x)*', 'a\txa\tx').groups()
('\t', 'x')

Expected result: (None, 'x')

PHP 7.3.2     NULL, "x"
Java 11.0.2   "\t", "x"
Perl 5.28.1   "\t", "x"
Ruby 2.6.1    nil, "x"
Go 1.12
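The report above can be checked on a local interpreter with a short script. This is a minimal sketch using only the pattern and input quoted in the comment; the value of group 1 depends on whether the interpreter includes the fix discussed in this thread, so no single expected output is given for it.

```python
import re

# Pattern and subject string taken verbatim from the report above.
pattern = r'(?:.*?\b(?=(\t)|(x))x)*'
groups = re.match(pattern, 'a\txa\tx').groups()

# Group 2 is 'x' either way. Group 1 is the value under discussion:
# the report shows '\t' as the observed result and None as the expected one.
print(groups)
```

Comparing the printed tuple with the "Expected result" line shows which behavior the running interpreter has.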

[issue35859] Capture behavior depends on the order of an alternation

2019-03-01 Thread Ma Lin
Change by Ma Lin: pull_requests: +12137

[issue35859] Capture behavior depends on the order of an alternation

2019-03-01 Thread Ma Lin
Ma Lin added the comment: PR 11756 is ready. I force-pushed the patch in four steps, which I hope makes it easier to review: https://github.com/python/cpython/pull/11756/commits

Step 1, test cases: show the wrong behaviors before this fix; the corresponding test case will be updated in next

[issue35859] Capture behavior depends on the order of an alternation

2019-02-22 Thread Ma Lin
Ma Lin added the comment: A harvest of bugs, see PR 11756; sre may have more. These bugs have existed since Python 2. Any ideas from regular expression experts?

[issue35859] Capture behavior depends on the order of an alternation

2019-02-19 Thread Serhiy Storchaka
Change by Serhiy Storchaka: pull_requests: -11701

[issue35859] Capture behavior depends on the order of an alternation

2019-02-19 Thread Serhiy Storchaka
Change by Serhiy Storchaka: pull_requests: -11700

[issue35859] Capture behavior depends on the order of an alternation

2019-02-09 Thread Ma Lin
Ma Lin added the comment: For a capture group, the state->mark[] array stores its begin and end:

begin: state->mark[(group_number-1)*2]
end: state->mark[(group_number-1)*2+1]

So state->mark[0] is the begin of the first capture group, and state->mark[1] is the end of the first capture group.
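The index arithmetic described above can be sketched in Python. This only illustrates the layout as stated in the comment; `mark_indices` is a hypothetical helper written for this example and is not part of `_sre.c`.

```python
def mark_indices(group_number):
    """Return the (begin, end) slots in a flat mark array.

    Hypothetical helper: group_number is 1-based, as in the comment above.
    """
    if group_number < 1:
        raise ValueError("capture groups are numbered from 1")
    begin = (group_number - 1) * 2
    return begin, begin + 1


# Groups 1, 2, 3 occupy slots (0, 1), (2, 3), (4, 5) respectively.
for group in (1, 2, 3):
    print(group, mark_indices(group))
```

Each group takes two adjacent slots, so a pattern with n capture groups needs 2*n entries under this layout.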

[issue35859] Capture behavior depends on the order of an alternation

2019-02-04 Thread Ma Lin
Change by Ma Lin: keywords: +patch, patch, patch; pull_requests: +11699, 11700, 11701; stage: -> patch review

[issue35859] Capture behavior depends on the order of an alternation

2019-02-04 Thread Ma Lin
Change by Ma Lin: keywords: +patch, patch; pull_requests: +11699, 11700; stage: -> patch review

[issue35859] Capture behavior depends on the order of an alternation

2019-02-04 Thread Ma Lin
Change by Ma Lin: keywords: +patch; pull_requests: +11699; stage: -> patch review

[issue35859] Capture behavior depends on the order of an alternation

2019-01-30 Thread Ma Lin
Ma Lin added the comment: You can `#define VERBOSE` in file `_sre.c`; it will print the engine's actual actions:

|02FAC684|02FC7402|MARK 0
...
|02FAC6BC|02FC7401|MARK 1

On my computer, 02FC7400 points to "ab", 02FC7401 points to 'b' in "ab", and 02FC7402 points to the end of "ab". This capture

[issue35859] Capture behavior depends on the order of an alternation

2019-01-30 Thread Matthew Barnett
Matthew Barnett added the comment: It matches, and the span is (0, 2). The only way that it can match like that is for the capture group to match the 'a', and the final 'b' to match the 'b'. Therefore, re.search(r'(ab|a)*b', 'ab').groups() should be ('a', ), as it is for the pattern with a
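The span argument above can be restated as a small script. This is a sketch of the reasoning only: it derives what the group can cover from the overall span, and prints whatever `groups()` returns on the running interpreter without assuming a value for it.

```python
import re

subject = 'ab'
m = re.search(r'(ab|a)*b', subject)
span = m.span()

# The trailing literal 'b' must consume the last character of the match,
# so the repeated group can only cover the text before it.
text_before_final_b = subject[span[0]:span[1] - 1]

print(span)                 # (0, 2)
print(text_before_final_b)  # a
print(m.groups())
```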

[issue35859] Capture behavior depends on the order of an alternation

2019-01-30 Thread Raymond Hettinger
Change by Raymond Hettinger: nosy: +effbot, serhiy.storchaka, tim.peters

[issue35859] Capture behavior depends on the order of an alternation

2019-01-30 Thread Raymond Hettinger
Raymond Hettinger added the comment: I'm not at all clear on how these features should interact (alternation, non-greedy matching, and group capture). I was just pointing out that the first-match behavior of alternation is the documented behavior and that re.DEBUG can be used to explore what the
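For readers who have not used `re.DEBUG`, a minimal sketch follows. It compiles one of the patterns from this issue with the flag set; the debug dump is written to standard output and its exact format varies between Python versions, so it is not reproduced here.

```python
import re

# Compiling with re.DEBUG prints a dump of the parsed pattern to stdout.
pattern = re.compile(r'(a|ab)*?b', re.DEBUG)

# The compiled pattern is then used like any other.
match = pattern.search('ab')
print(match.span())  # (0, 2)
```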

[issue35859] Capture behavior depends on the order of an alternation

2019-01-30 Thread Matthew Barnett
Matthew Barnett added the comment: It looks like a bug in re to me.

[issue35859] Capture behavior depends on the order of an alternation

2019-01-30 Thread James Davis
James Davis added the comment: Thanks for your thoughts, Raymond. I understand that the alternation has "short-circuit" behavior, but I still find it confusing in this case. Consider these two:

Regex pattern    matched?    matched string    captured content

[issue35859] Capture behavior depends on the order of an alternation

2019-01-30 Thread Raymond Hettinger
Raymond Hettinger added the comment:

> I cannot see why changing the order of the alternation should have this
> effect.

The first regex, r'(a|ab)*?b', looks for the first alternative group by matching left-to-right [1] stopping at the first matching alternation "a". Roughly, the regex
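The left-to-right, first-match behavior of alternation mentioned above can be seen in isolation, without repetition or capture groups. A minimal sketch:

```python
import re

# Alternatives are tried left to right; the first one that lets the
# overall match succeed is taken, even if a later one would match more.
first = re.match(r'a|ab', 'ab').group()
second = re.match(r'ab|a', 'ab').group()

print(first)   # a
print(second)  # ab
```

Swapping the order of the alternatives changes the result here because nothing after the alternation forces the engine to backtrack.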

[issue35859] Capture behavior depends on the order of an alternation

2019-01-30 Thread Ma Lin
Change by Ma Lin: nosy: +Ma Lin

[issue35859] Capture behavior depends on the order of an alternation

2019-01-30 Thread James Davis
New submission from James Davis: I have two regexes: /(a|ab)*?b/ and /(ab|a)*?b/. If I re.search the string "ab" for these regexes, I get inconsistent behavior. Specifically, /(a|ab)*?b/ matches with capture "a", while /(ab|a)*?b/ matches with an empty capture group. I am not actually sure
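The comparison in the report can be reproduced with a short script. This is a sketch, assuming only the two patterns and the subject string quoted above; the capture printed for the second pattern depends on whether the interpreter includes the fix discussed in this thread, so it is not annotated.

```python
import re

subject = 'ab'
results = {}

for pattern in (r'(a|ab)*?b', r'(ab|a)*?b'):
    m = re.search(pattern, subject)
    # Record the overall span and the captured groups for each pattern.
    results[pattern] = (m.span(), m.groups())
    print(pattern, m.span(), m.groups())
```

Both patterns match the whole string; only the reported capture differs between them on affected versions.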