[Perl/perl5] 9e9592: regcomp.c - add optimistic eval

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/curlyx_curlym
  Home:   https://github.com/Perl/perl5
  Commit: 9e959293ce1b7b9704e631ab54ab13a90ef5b3bc
  
https://github.com/Perl/perl5/commit/9e959293ce1b7b9704e631ab54ab13a90ef5b3bc
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M pod/perldelta.pod
M pod/perlre.pod
M regcomp.c
M regcomp.h
M regcomp_debug.c
M regcomp_internal.h
M regcomp_study.c
M regexec.c
M regnodes.h
M t/re/pat_re_eval.t
M t/re/pat_rt_report.t
M toke.c

  Log Message:
  ---
  regcomp.c - add optimistic eval

This adds (*{ ... }) and (**{ ... }) as equivalents to
(?{ ... }) and (??{ ... }). The only difference is that
the star variants are "optimistic" and are defined to never
disable optimisations.  This is especially relevant now that
use of (?{ ... }) prevents important optimisations anywhere
in the pattern, instead of the older and inconsistent rules
where it only affected the parts that contained the EVAL.

It is also very useful for injecting debugging-style expressions
into the pattern to understand what the regex engine is actually
doing. The older style (?{ ... }) variants would change the
regex engine's behavior, meaning this was not as effective a
tool as it could have been.


  Commit: 12dc87ffd0d1d88592aeddd0b1da3863a9e40f5e
  
https://github.com/Perl/perl5/commit/12dc87ffd0d1d88592aeddd0b1da3863a9e40f5e
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M pod/perldebguts.pod
M pp_ctl.c
M regcomp.c
M regcomp.h
M regcomp.sym
M regcomp_debug.c
M regexec.c
M regexp.h
M regnodes.h
M t/re/pat.t
M t/re/pat_rt_report.t
M t/re/re_tests

  Log Message:
  ---
  regcomp.c - Resolve issues clearing buffers in CURLYX (MAJOR-CHANGE)

CURLYX doesn't reset capture buffers properly. It is possible
for multiple buffers to be defined at once with values from
different iterations of the loop, which doesn't make sense really.

An example is this:

  "foobarfoo"=~/((foo)|(bar))+/

after this matches $1 should equal $2 and $3 should be undefined,
or $1 should equal $3 and $2 should be undefined. Prior to this
patch this would not be the case.

The solution that this patch uses is to introduce a form of
"layered transactional storage" for paren data. The existing
pair of start/end data for capture data is extended with a
start_new/end_new pair. Most of our code, when it wants to
check whether a given capture buffer is defined, first checks
"start_new/end_new"; if either is -1 it falls back to
whatever is in start/end.
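
The lookup rule just described can be sketched roughly as follows. This is a small Python model, not the C code in regexec.c: the `Paren` class and the `effective_span` helper are hypothetical names introduced only for illustration, and -1 is the "unset" sentinel as in the text.

```python
from dataclasses import dataclass

UNSET = -1  # sentinel the text uses for "no offset recorded"


@dataclass
class Paren:
    """Hypothetical model of one capture buffer's offsets."""
    start: int = UNSET      # committed layer
    end: int = UNSET
    start_new: int = UNSET  # uncommitted layer, written when the group closes
    end_new: int = UNSET


def effective_span(p):
    """Return (start, end) as a reader of the buffer would see it, or None."""
    if p.start_new != UNSET and p.end_new != UNSET:
        return (p.start_new, p.end_new)
    # Either new offset is -1: fall back to the committed pair.
    if p.start != UNSET and p.end != UNSET:
        return (p.start, p.end)
    return None
```

A buffer that has only committed data is still visible, and a buffer with both layers filled reports the newer one.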

When a capture buffer is CLOSEd the data is written into the
start_new/end_new pair instead of the start/end pair. When a CURLYX
loop is executing and has matched something (at least one "A" in
/A*B/ -- thus actually in WHILEM) it "commits" the start_new/end_new
data by writing it into start/end. When we begin a new iteration of
the loop we clear the start_new/end_new pairs that are contained by
the loop, by setting them to -1. If the loop fails then we roll back
as we used to. If the loop succeeds we continue. When we hit an END
block we commit everything.

Consider the example above. We start off with everything set to -1.

 $1 = (-1,-1):(-1,-1)
 $2 = (-1,-1):(-1,-1)
 $3 = (-1,-1):(-1,-1)

In the first iteration we have matched "foo" and end up with this:

 $1 = (-1,-1):( 0, 3)
 $2 = (-1,-1):( 0, 3)
 $3 = (-1,-1):(-1,-1)

We commit the results of $2 and $3, and then clear the new data in
the beginning of the next loop:

 $1 = (-1,-1):( 0, 3)
 $2 = ( 0, 3):(-1,-1)
 $3 = (-1,-1):(-1,-1)

We then match "bar":

 $1 = (-1,-1):( 0, 3)
 $2 = ( 0, 3):(-1,-1)
 $3 = (-1,-1):( 3, 6)

and then commit the result and clear the new data:

 $1 = (-1,-1):( 0, 3)
 $2 = (-1,-1):(-1,-1)
 $3 = ( 3, 6):(-1,-1)

and then we match "foo" again:

 $1 = (-1,-1):( 0, 3)
 $2 = (-1,-1):( 6, 9)
 $3 = ( 3, 6):(-1,-1)

And we then commit. We do a regcppush here as normal.

 $1 = (-1,-1):( 0, 3)
 $2 = ( 6, 9):( 6, 9)
 $3 = (-1,-1):(-1,-1)

We then clear it again, but since the next iteration does not match,
the regcppop restores the buffers to the above layout. When we finally
hit the END block we also commit all buffers, including the 0th (for
the full match).

Fixes GH Issue #18865, and adds tests for it and other things.


  Commit: 1cfe4e1985cfafa9d6c1f41bc46387236750e099
  
https://github.com/Perl/perl5/commit/1cfe4e1985cfafa9d6c1f41bc46387236750e099
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M pod/perldebguts.pod
M regcomp.c
M regcomp.h
M regcomp.sym
M regcomp_debug.c
M regcomp_trie.c
M regexec.c
M regexp.h
M regnodes.h
M t/re/re_tests

  Log Message:
  ---
  regexec.c - teach BRANCH and BRANCHJ nodes to reset capture buffers

In /((a)(b)|(a))+/ we should not end up with $2 and $4 being set at
the same time. When a branch fails it should 

[Perl/perl5] 329fff: regcomp.c - add optimistic eval (*{ ... }) and (**...

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/optimistic_eval
  Home:   https://github.com/Perl/perl5
  Commit: 329aa934cb54a43370945ee99e8731e459ff
  
https://github.com/Perl/perl5/commit/329aa934cb54a43370945ee99e8731e459ff
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M pod/perldelta.pod
M pod/perlre.pod
M regcomp.c
M regcomp.h
M regcomp_debug.c
M regcomp_internal.h
M regcomp_study.c
M regexec.c
M regnodes.h
M t/re/pat_re_eval.t
M t/re/pat_rt_report.t
M toke.c

  Log Message:
  ---
  regcomp.c - add optimistic eval (*{ ... }) and (**{ ... })

This adds (*{ ... }) and (**{ ... }) as equivalents to (?{ ... }) and
(??{ ... }). The only difference is that the star variants are
"optimistic" and are defined to never disable optimisations. This is
especially relevant now that use of (?{ ... }) prevents important
optimisations anywhere in the pattern, instead of the older and inconsistent
rules where it only affected the parts that contained the EVAL.

It is also very useful for injecting debugging-style expressions into the
pattern to understand what the regex engine is actually doing. The older
style (?{ ... }) variants would change the regex engine's behavior, meaning
this was not as effective a tool as it could have been.

Similarly it is now possible to test that a given regex optimisation
works correctly using (*{ ... }), which was not possible with (?{ ... }).




[Perl/perl5] 9768b3: regcomp.c - add optimistic eval (*{ ... }) and (**...

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/optimistic_eval
  Home:   https://github.com/Perl/perl5
  Commit: 9768b35ca0c67fe2b29655fe2ccbc897df0ee263
  
https://github.com/Perl/perl5/commit/9768b35ca0c67fe2b29655fe2ccbc897df0ee263
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M pod/perldelta.pod
M pod/perlre.pod
M regcomp.c
M regcomp.h
M regcomp_debug.c
M regcomp_internal.h
M regcomp_study.c
M regexec.c
M regnodes.h
M t/re/pat_re_eval.t
M t/re/pat_rt_report.t
M toke.c

  Log Message:
  ---
  regcomp.c - add optimistic eval (*{ ... }) and (**{ ... })

This adds (*{ ... }) and (**{ ... }) as equivalents to (?{ ... }) and
(??{ ... }). The only difference is that the star variants are
"optimistic" and are defined to never disable optimisations. This is
especially relevant now that use of (?{ ... }) prevents important
optimisations anywhere in the pattern, instead of the older and inconsistent
rules where it only affected the parts that contained the EVAL.

It is also very useful for injecting debugging-style expressions into the
pattern to understand what the regex engine is actually doing. The older
style (?{ ... }) variants would change the regex engine's behavior, meaning
this was not as effective a tool as it could have been.

Similarly it is now possible to test that a given regex optimisation
works correctly using (*{ ... }), which was not possible with (?{ ... }).




[Perl/perl5] bc0177: Override *.h files as C with Linguist

2023-01-15 Thread Elvin Aslanov via perl5-changes
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: bc01770b32f9d9bf9d313d2eec370c41822cd61c
  
https://github.com/Perl/perl5/commit/bc01770b32f9d9bf9d313d2eec370c41822cd61c
  Author: Elvin Aslanov 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M .gitattributes

  Log Message:
  ---
  Override *.h files as C with Linguist

GitHub classifies 23 files as C++ for some reason.
https://github.com/Perl/perl5/search?q=language%3AC%2B%2B=code
I believe Perl doesn't contain C++ code, and C++ headers can use the
distinguishable .hh, .hpp, .hxx, and .h++ extensions.




[Perl/perl5] 5e3422: regcomp.c - Resolve issues clearing buffers in CUR...

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/curlyx_curlym
  Home:   https://github.com/Perl/perl5
  Commit: 5e3422bbf1af947a1319f5bd9ced09cfa48bf17b
  
https://github.com/Perl/perl5/commit/5e3422bbf1af947a1319f5bd9ced09cfa48bf17b
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M pod/perldebguts.pod
M pp_ctl.c
M regcomp.c
M regcomp.h
M regcomp.sym
M regcomp_debug.c
M regexec.c
M regexp.h
M regnodes.h
M t/re/pat.t
M t/re/pat_rt_report.t
M t/re/re_tests

  Log Message:
  ---
  regcomp.c - Resolve issues clearing buffers in CURLYX (MAJOR-CHANGE)

CURLYX doesn't reset capture buffers properly. It is possible
for multiple buffers to be defined at once with values from
different iterations of the loop, which doesn't make sense really.

An example is this:

  "foobarfoo"=~/((foo)|(bar))+/

after this matches $1 should equal $2 and $3 should be undefined,
or $1 should equal $3 and $2 should be undefined. Prior to this
patch this would not be the case.

The solution that this patch uses is to introduce a form of
"layered transactional storage" for paren data. The existing
pair of start/end data for capture data is extended with a
start_new/end_new pair. Most of our code, when it wants to
check whether a given capture buffer is defined, first checks
"start_new/end_new"; if either is -1 it falls back to
whatever is in start/end.

When a capture buffer is CLOSEd the data is written into the
start_new/end_new pair instead of the start/end pair. When a CURLYX
loop is executing and has matched something (at least one "A" in
/A*B/ -- thus actually in WHILEM) it "commits" the start_new/end_new
data by writing it into start/end. When we begin a new iteration of
the loop we clear the start_new/end_new pairs that are contained by
the loop, by setting them to -1. If the loop fails then we roll back
as we used to. If the loop succeeds we continue. When we hit an END
block we commit everything.

Consider the example above. We start off with everything set to -1.

 $1 = (-1,-1):(-1,-1)
 $2 = (-1,-1):(-1,-1)
 $3 = (-1,-1):(-1,-1)

In the first iteration we have matched "foo" and end up with this:

 $1 = (-1,-1):( 0, 3)
 $2 = (-1,-1):( 0, 3)
 $3 = (-1,-1):(-1,-1)

We commit the results of $2 and $3, and then clear the new data in
the beginning of the next loop:

 $1 = (-1,-1):( 0, 3)
 $2 = ( 0, 3):(-1,-1)
 $3 = (-1,-1):(-1,-1)

We then match "bar":

 $1 = (-1,-1):( 0, 3)
 $2 = ( 0, 3):(-1,-1)
 $3 = (-1,-1):( 3, 6)

and then commit the result and clear the new data:

 $1 = (-1,-1):( 0, 3)
 $2 = (-1,-1):(-1,-1)
 $3 = ( 3, 6):(-1,-1)

and then we match "foo" again:

 $1 = (-1,-1):( 0, 3)
 $2 = (-1,-1):( 6, 9)
 $3 = ( 3, 6):(-1,-1)

And we then commit. We do a regcppush here as normal.

 $1 = (-1,-1):( 0, 3)
 $2 = ( 6, 9):( 6, 9)
 $3 = (-1,-1):(-1,-1)

We then clear it again, but since the next iteration does not match,
the regcppop restores the buffers to the above layout. When we finally
hit the END block we also commit all buffers, including the 0th (for
the full match).
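
The commit/clear cycle walked through above can be replayed with a small simulation. This is a Python sketch under simplifying assumptions, not the regex engine itself: `run_loop` is a hypothetical helper, it models only the two inner buffers ($2 and $3) as literal alternatives, and offsets are half-open [start, end) positions computed from the subject string. Rollback via regcppop is reduced to "return what earlier iterations committed".

```python
UNSET = (-1, -1)


def run_loop(subject, alternatives):
    """Simulate commit/clear over iterations of a loop like /((foo)|(bar))+/.

    `alternatives` maps a buffer number to the non-empty literal its group
    matches. Returns the committed (start, end) pair per buffer.
    """
    committed = {n: UNSET for n in alternatives}
    pending = {n: UNSET for n in alternatives}
    pos = 0
    while True:
        # Start of an iteration: clear the uncommitted layer.
        for n in pending:
            pending[n] = UNSET
        # Try each alternative in order at the current position.
        for n, literal in alternatives.items():
            if subject.startswith(literal, pos):
                # Closing a group writes into the uncommitted layer only.
                pending[n] = (pos, pos + len(literal))
                pos += len(literal)
                break
        else:
            # No alternative matched: keep what earlier iterations committed.
            return committed
        # The iteration matched: commit every buffer in the loop, which
        # also overwrites stale values from earlier iterations with -1.
        for n in committed:
            committed[n] = pending[n]
```

Calling `run_loop("foobarfoo", {2: "foo", 3: "bar"})` returns `{2: (6, 9), 3: (-1, -1)}`: only the buffer that took part in the last successful iteration remains set.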

Fixes GH Issue #18865, and adds tests for it and other things.


  Commit: 18d510bc522a1faef2f2d659a4435dcf0e9b0d62
  
https://github.com/Perl/perl5/commit/18d510bc522a1faef2f2d659a4435dcf0e9b0d62
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M pod/perldebguts.pod
M regcomp.c
M regcomp.h
M regcomp.sym
M regcomp_debug.c
M regcomp_trie.c
M regexec.c
M regexp.h
M regnodes.h
M t/re/re_tests

  Log Message:
  ---
  regexec.c - teach BRANCH and BRANCHJ nodes to reset capture buffers

In /((a)(b)|(a))+/ we should not end up with $2 and $4 being set at
the same time. When a branch fails it should reset any capture buffers
that might be touched by its branch.

We change BRANCH and BRANCHJ to store the number of parens before the
branch, and the number of parens after the branch was completed. When
a BRANCH operation fails, we clear the buffers it contains before we
continue on.
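
A rough sketch of that clearing step, written as a Python model rather than the actual regexec.c logic. The `try_branches` function and the tuple layout are assumptions made for illustration; the only thing taken from the text is that each branch knows the paren count before it and after it, and that a failed branch clears the buffers in between.

```python
UNSET = (-1, -1)


def try_branches(branches, captures):
    """Try branches in order; clear a failed branch's buffers before moving on.

    Each branch is (parens_before, parens_after, attempt). The branch
    contains buffers parens_before+1 .. parens_after, and `attempt` is a
    callable that may write into `captures` and returns True on success.
    """
    for parens_before, parens_after, attempt in branches:
        if attempt(captures):
            return True
        # The branch failed: reset every buffer it contains.
        for n in range(parens_before + 1, parens_after + 1):
            captures[n] = UNSET
    return False
```

For /((a)(b)|(a))+/ the first branch would carry the counts (1, 3) and the second (3, 4), so a first branch that sets $2 and then fails on (b) leaves $2 unset by the time the second branch sets $4.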

It is a bit more complex than it should be because we have BRANCHJ
and BRANCH. (One of these days we should merge them together.)

This is also made somewhat more complex because TRIE nodes are actually
branches, and may need to track capture buffers as well, at two levels:
the overall TRIE op, and, for jump tries especially, the places where we
emulate the behavior of branches. So we have to do the same clearing
logic if a trie branch fails as well.


  Commit: 9babd470eecba10fc402e34e231489d9b5e0a3f3
  
https://github.com/Perl/perl5/commit/9babd470eecba10fc402e34e231489d9b5e0a3f3
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M pod/perldelta.pod
M pod/perlre.pod
M regcomp.c
M regcomp.h
M regcomp_debug.c
M regcomp_internal.h
M regcomp_study.c
M regexec.c
M regnodes.h
M t/re/pat_re_eval.t
M t/re/pat_rt_report.t
M toke.c

  Log Message:
  ---
  

[Perl/perl5]

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/regexec_fix_eval_stack_leak
  Home:   https://github.com/Perl/perl5


[Perl/perl5]

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/fix_accept_in_curlyx_whilem
  Home:   https://github.com/Perl/perl5


[Perl/perl5]

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/disable_curlyx_with_eval
  Home:   https://github.com/Perl/perl5


[Perl/perl5]

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/curly_i32
  Home:   https://github.com/Perl/perl5


[Perl/perl5] 98ce67: regcomp_study.c - disable CURLYX optimizations whe...

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: 98ce67cb64ba29be3aa5fd5b81012a3aab873b8e
  
https://github.com/Perl/perl5/commit/98ce67cb64ba29be3aa5fd5b81012a3aab873b8e
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp_debug.c
M regcomp_study.c
M t/re/pat_re_eval.t

  Log Message:
  ---
  regcomp_study.c - disable CURLYX optimizations when EVAL has been seen 
anywhere

Historically we disabled CURLYX optimizations when they
*contained* an EVAL, on the assumption that the optimization might
affect how many times, etc, the eval was called. However, this is
also true for CURLYX with evals *afterwards*. If the CURLYN or CURLYM
optimization can prune off the search space, then an eval afterwards
will be affected. And when you take into account GOSUB, it means that
an eval in front might be affected by an optimization after it.

So for now we disable CURLYN and CURLYM in any pattern with an EVAL.
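
The difference between the historical rule and the new one can be stated as two predicates. This is only a toy Python model: the function names are hypothetical, and a pattern is represented as a flat list of node-type names, which is far simpler than what regcomp_study.c actually inspects.

```python
def may_optimize_curlyx_old(loop_body, whole_pattern):
    """Historical rule: only an EVAL inside the loop body blocks the optimization."""
    return "EVAL" not in loop_body


def may_optimize_curlyx_new(loop_body, whole_pattern):
    """Rule after this change: an EVAL anywhere in the pattern blocks it."""
    return "EVAL" not in whole_pattern
```

With a loop body of `["EXACT"]` inside a pattern `["CURLYX", "EXACT", "WHILEM", "EVAL", "END"]`, the old predicate allows the optimization and the new one does not, which is the "eval afterwards" case described above.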




[Perl/perl5] 067833: regcomp.c - increase size of CURLY nodes so the mi...

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: 0678333e684b55ba8877db1f865692713dacafc0
  
https://github.com/Perl/perl5/commit/0678333e684b55ba8877db1f865692713dacafc0
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp.c
M regcomp.h
M regcomp_internal.h
M t/re/pat.t
M t/re/reg_mesg.t

  Log Message:
  ---
  regcomp.c - increase size of CURLY nodes so the min/max is a I32

This allows us to resolve a test inconsistency between CURLYX and CURLY
and CURLYM, which have different maximums. We use I32 and not U32 because
the existing count logic uses -1 internally and using an I32 for the min/max
prevents warnings about comparing signed and unsigned values when the
count is compared against the min or max.
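
To see why a -1 sentinel and an unsigned type do not mix well, here is a short Python illustration using `ctypes` to mimic 32-bit C integers. It shows only the general signed/unsigned pitfall; the variable names are made up, and how the real count logic uses -1 is not spelled out in this message.

```python
import ctypes


def as_u32(value):
    """Reinterpret a Python int as a 32-bit unsigned value."""
    return ctypes.c_uint32(value).value


def as_i32(value):
    """Reinterpret a Python int as a 32-bit signed value."""
    return ctypes.c_int32(value).value


count = -1    # a sentinel of the kind the text says the count logic uses
maximum = 10  # an arbitrary quantifier maximum for illustration

# Compared as signed values, the sentinel is below the maximum.
signed_below_max = as_i32(maximum) > as_i32(count)

# Compared as unsigned values, -1 wraps to 4294967295 and is not below it.
unsigned_below_max = as_u32(maximum) > as_u32(count)
```

Here `signed_below_max` is True while `unsigned_below_max` is False, which is the kind of mismatch that signed/unsigned comparison warnings are meant to flag.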




[Perl/perl5] 5c6240: regexec.c - fix accept in CURLYX/WHILEM construct.

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: 5c6240fadac873b60c46677b4d5b180f4fb6074b
  
https://github.com/Perl/perl5/commit/5c6240fadac873b60c46677b4d5b180f4fb6074b
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M MANIFEST
M regexec.c
M t/re/pat_re_eval.t
M t/re/re_tests
M t/re/regexp.t
A t/re/regexp_normal.t

  Log Message:
  ---
  regexec.c - fix accept in CURLYX/WHILEM construct.

The ACCEPT logic didn't know how to handle WHILEM, which for
some reason does not have a next_off defined. I am not sure why.

This was revealed by forcing CURLYX optimisations off. This includes
a patch to test what happens if we embed an eval group in the tests
run by regexp.t when run via regexp_normal.t, which disabled CURLYX ->
CURLYN and CURLYM optimisations and revealed this issue.

This adds t/re/regexp_normal.t, which tests "normalized" forms of
the patterns in t/re/re_tests by munging them in various ways
to see if they still behave as expected. For instance injecting
a (?{}) can disable optimisations.
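
The idea of munging a pattern and checking that it still behaves the same can be sketched as below. This is a loose analogy in Python, not what regexp_normal.t does: Python's `re` module has no (?{}) construct, so an empty non-capturing group `(?:)` stands in as the injected no-op, and both helper names are hypothetical.

```python
import re


def normalize(pattern, noop="(?:)"):
    """Return a munged pattern with a no-op group prepended.

    Assumes the pattern does not start with global inline flags such as (?i).
    """
    return noop + pattern


def same_behavior(pattern, subject):
    """Check that the original and the munged pattern give the same result."""
    def outcome(p):
        m = re.search(p, subject)
        return None if m is None else (m.span(), m.groups())
    return outcome(pattern) == outcome(normalize(pattern))
```

A test driver would run `same_behavior` over every pattern/subject pair in its table and report any pair where the munged form diverges.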




[Perl/perl5] 370405: regexec.c - fix memory leak in EVAL.

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: 37040543d024b3ecb0aecd78849bd5af61408d02
  
https://github.com/Perl/perl5/commit/37040543d024b3ecb0aecd78849bd5af61408d02
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M MANIFEST
M ext/XS-APItest/APItest.xs
A ext/XS-APItest/t/savestack.t
M regexec.c

  Log Message:
  ---
  regexec.c - fix memory leak in EVAL.

EVAL was calling regcppush twice per invocation, once before executing the
callback, and once after. But not regcppop'ing twice. So each time we
would accumulate an extra "frame" of data. This is/was hidden somewhat by
the way we eventually "blow" the stack, so the extra data was just thrown
away at the end.

This removes the second set of pushes so that the save stack stays a stable
size as it unwinds from each failed eval.

We also weren't cleaning up after a (?{...}) when we failed to match to its
right. This unwinds the stack and restores the parens properly.

This adds tests to check how the save stack grows during patterns using
(?{ ... }) and (??{ ... }) and ensure that when we backtrack and re-execute
the EVAL again it cleans up the stack as it goes.
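
The growth described above is easy to reproduce with a toy stack. This Python sketch is an assumption-laden model: a list stands in for the save stack, `append` for regcppush and `pop` for regcppop, and `run_evals` is a hypothetical helper, not anything in regexec.c or the new savestack.t.

```python
def run_evals(attempts, pushes_per_eval):
    """Return the stack depth after `attempts` failed EVAL attempts.

    Each attempt pushes `pushes_per_eval` frames and pops one on failure.
    """
    stack = []
    for i in range(attempts):
        for _ in range(pushes_per_eval):
            stack.append(("frame", i))  # stands in for regcppush
        stack.pop()                     # stands in for the single regcppop
    return len(stack)
```

With two pushes per attempt the depth grows by one frame each time (`run_evals(5, 2)` gives 5); with one push it stays flat (`run_evals(5, 1)` gives 0).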




[Perl/perl5] ca6efc: regcomp_study.c - disable CURLYX optimizations whe...

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/disable_curlyx_with_eval
  Home:   https://github.com/Perl/perl5
  Commit: ca6efcf3d98c1af754358760ec3dd3019a2f6010
  
https://github.com/Perl/perl5/commit/ca6efcf3d98c1af754358760ec3dd3019a2f6010
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp_debug.c
M regcomp_study.c
M t/re/pat_re_eval.t

  Log Message:
  ---
  regcomp_study.c - disable CURLYX optimizations when EVAL has been seen 
anywhere

Historically we disabled CURLYX optimizations when they
*contained* an EVAL, on the assumption that the optimization might
affect how many times, etc, the eval was called. However, this is
also true for CURLYX with evals *afterwards*. If the CURLYN or CURLYM
optimization can prune off the search space, then an eval afterwards
will be affected. And when you take into account GOSUB, it means that
an eval in front might be affected by an optimization after it.

So for now we disable CURLYN and CURLYM in any pattern with an EVAL.




[Perl/perl5] 074a4d: regcomp_study.c - disable CURLYX optimizations whe...

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/curlyx_curlym
  Home:   https://github.com/Perl/perl5
  Commit: 074a4d7419ab6f9e9292b4b46cd482f7662d8ed2
  
https://github.com/Perl/perl5/commit/074a4d7419ab6f9e9292b4b46cd482f7662d8ed2
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp_debug.c
M regcomp_study.c
M t/re/pat_re_eval.t

  Log Message:
  ---
  regcomp_study.c - disable CURLYX optimizations when EVAL has been seen 
anywhere

Historically we disabled CURLYX optimizations when they
*contained* an EVAL, on the assumption that the optimization might
affect how many times, etc, the eval was called. However, this is
also true for CURLYX with evals *afterwards*. If the CURLYN or CURLYM
optimization can prune off the search space, then an eval afterwards
will be affected. And when you take into account GOSUB, it means that
an eval in front might be affected by an optimization after it.

So for now we disable CURLYN and CURLYM in any pattern with an EVAL.


  Commit: 33ce983f184eace1dea679a0349a5be2c5137484
  
https://github.com/Perl/perl5/commit/33ce983f184eace1dea679a0349a5be2c5137484
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M MANIFEST
M ext/XS-APItest/APItest.xs
A ext/XS-APItest/t/savestack.t
M regexec.c

  Log Message:
  ---
  regexec.c - fix memory leak in EVAL.

EVAL was calling regcppush twice per invocation, once before executing the
callback, and once after. But not regcppop'ing twice. So each time we
would accumulate an extra "frame" of data. This is/was hidden somewhat by
the way we eventually "blow" the stack, so the extra data was just thrown
away at the end.

This removes the second set of pushes so that the save stack stays a stable
size as it unwinds from each failed eval.

We also weren't cleaning up after a (?{...}) when we failed to match to its
right. This unwinds the stack and restores the parens properly.

This adds tests to check how the save stack grows during patterns using
(?{ ... }) and (??{ ... }) and ensure that when we backtrack and re-execute
the EVAL again it cleans up the stack as it goes.


  Commit: 64b1e8b2393d4733b9414a37db12d5842d97eaf4
  
https://github.com/Perl/perl5/commit/64b1e8b2393d4733b9414a37db12d5842d97eaf4
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M MANIFEST
M regexec.c
M t/re/pat_re_eval.t
M t/re/re_tests
M t/re/regexp.t
A t/re/regexp_normal.t

  Log Message:
  ---
  regexec.c - fix accept in CURLYX/WHILEM construct.

The ACCEPT logic didn't know how to handle WHILEM, which for
some reason does not have a next_off defined. I am not sure why.

This was revealed by forcing CURLYX optimisations off. This includes
a patch to test what happens if we embed an eval group in the tests
run by regexp.t when run via regexp_normal.t, which disabled CURLYX ->
CURLYN and CURLYM optimisations and revealed this issue.

This adds t/re/regexp_normal.t, which tests "normalized" forms of
the patterns in t/re/re_tests by munging them in various ways
to see if they still behave as expected. For instance injecting
a (?{}) can disable optimisations.


  Commit: b3655a49afabb87ebf787310f36f4a744146ef55
  
https://github.com/Perl/perl5/commit/b3655a49afabb87ebf787310f36f4a744146ef55
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp.c
M regcomp.h
M regcomp_internal.h
M t/re/pat.t
M t/re/reg_mesg.t

  Log Message:
  ---
  regcomp.c - increase size of CURLY nodes so the min/max is a I32

This allows us to resolve a test inconsistency between CURLYX and CURLY
and CURLYM. We use I32 because the existing count logic uses -1, and
this avoids comparing signed and unsigned values.


  Commit: a492304b44435a9dced29fa724da65a31b3aabd8
  
https://github.com/Perl/perl5/commit/a492304b44435a9dced29fa724da65a31b3aabd8
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M pod/perldebguts.pod
M pp_ctl.c
M regcomp.c
M regcomp.h
M regcomp.sym
M regcomp_debug.c
M regexec.c
M regexp.h
M regnodes.h
M t/re/pat.t
M t/re/pat_rt_report.t
M t/re/re_tests

  Log Message:
  ---
  regcomp.c - Resolve issues clearing buffers in CURLYX (MAJOR-CHANGE)

CURLYX doesn't reset capture buffers properly. It is possible
for multiple buffers to be defined at once with values from
different iterations of the loop, which doesn't make sense really.

An example is this:

  "foobarfoo"=~/((foo)|(bar))+/

after this matches $1 should equal $2 and $3 should be undefined,
or $1 should equal $3 and $2 should be undefined. Prior to this
patch this would not be the case.

The solution that this patch uses is to introduce a form of
"layered transactional storage" for paren data. The existing
pair of start/end data for capture data is extended with a
start_new/end_new pair. When the vast majority of our code 

[Perl/perl5]

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/regex_prep_patches
  Home:   https://github.com/Perl/perl5


[Perl/perl5] ee2168: t/re/re_rests - extend test to show more buffers

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: ee2168cf3ae08f222fd07cf66fb88a26c15e6306
  
https://github.com/Perl/perl5/commit/ee2168cf3ae08f222fd07cf66fb88a26c15e6306
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M t/re/re_tests

  Log Message:
  ---
  t/re/re_rests - extend test to show more buffers

This is a tricky test, showing more buffers makes it a bit easier
to understand if you break it. (Guess what I did?)


  Commit: c5b1c090dbd52c47488c0f80eecb9cb7fa6f93e3
  
https://github.com/Perl/perl5/commit/c5b1c090dbd52c47488c0f80eecb9cb7fa6f93e3
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp_internal.h
M regcomp_study.c

  Log Message:
  ---
  regcomp_study.c - Add a way to disable CURLYX optimisations

Also break up the condition so there is one condition per line so
it is more readable, and fold repeated binary tests together. This
makes it more obvious what the expression is doing.


  Commit: 0b5fb5dd6851cc2ffc94d9d28add98cc3f441ead
  
https://github.com/Perl/perl5/commit/0b5fb5dd6851cc2ffc94d9d28add98cc3f441ead
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regexec.c

  Log Message:
  ---
  regexec.c - rework CLOSE_CAPTURE() to take rex as an arg to enable reuse.

This also splits up CLOSE_CAPTURE() into two parts, with the important parts
implemented by CLOSE_ANY_CAPTURE(), and the debugging parts in
CLOSE_CAPTURE(). This allows it to be used in contexts where the regexp
structure isn't set up under the name 'rex', and where the debugging output it
includes might not be relevant or possible to produce.

This encapsulates all the places that "close" a capture buffer, and ensures
that they are closed properly. One important case in particular cannot use
CLOSE_CAPTURE() directly, as it does not have a 'rex' variable in scope (it is
called prog in this function), nor the debugging context used in normal
execution of CLOSE_CAPTURE(). Using CLOSE_ANY_CAPTURE() instead means all the
main points that update capture buffer state use the same macro API.
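
The shape of that split, a core step that takes the regexp explicitly plus a wrapper that adds debugging, might look like this in outline. It is a Python analogy only: the real code consists of C macros, and the dictionary layout, the `log` list, and the function bodies here are assumptions chosen for illustration.

```python
def close_any_capture(rex, n, start, end):
    """Core step: record offsets for buffer n on an explicitly passed regexp."""
    rex["offs"][n] = (start, end)


def close_capture(rex, n, start, end, log):
    """Wrapper for the main execution path: core step plus debug output."""
    close_any_capture(rex, n, start, end)
    log.append(f"CLOSE: buffer {n} set to ({start}, {end})")
```

A caller whose regexp object lives under another name (such as `prog`) and which has no debugging context can call `close_any_capture(prog, ...)` directly, while the main loop keeps using the wrapper.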


  Commit: b1ad323637ffe843cc851bc790fd31813638147d
  
https://github.com/Perl/perl5/commit/b1ad323637ffe843cc851bc790fd31813638147d
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp.c
M regcomp.h

  Log Message:
  ---
  regcomp.h - get rid of EXTRA_STEP defines

They are unused these days.


  Commit: 0e946b8626799edfa80f978f41e9abb045579c24
  
https://github.com/Perl/perl5/commit/0e946b8626799edfa80f978f41e9abb045579c24
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp.c

  Log Message:
  ---
  regcomp.c - add whitespace to binary operation

The tight & is hard to read.


  Commit: 3645ca4ee1a59fae1a6d6817c4582968ffd0a731
  
https://github.com/Perl/perl5/commit/3645ca4ee1a59fae1a6d6817c4582968ffd0a731
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp_trie.c

  Log Message:
  ---
  regcomp_trie.c - use the indirect types so we are safe to changes

We shouldn't assume that a TRIEC is a regcomp_charclass. We have a per
opcode type exactly for this type of use, so let's use it.


Compare: https://github.com/Perl/perl5/compare/41af9f428a96...3645ca4ee1a5


[Perl/perl5] b8914e: perl.h: Make sure PERL_IMPLICIT_CONTEXT doesn't co...

2023-01-15 Thread Karl Williamson via perl5-changes
  Branch: refs/heads/smoke-me/khw-env
  Home:   https://github.com/Perl/perl5
  Commit: b8914ecb7d2fcf9a84a920ee57d824cc89b8fbc5
  
https://github.com/Perl/perl5/commit/b8914ecb7d2fcf9a84a920ee57d824cc89b8fbc5
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M perl.h

  Log Message:
  ---
  perl.h: Make sure PERL_IMPLICIT_CONTEXT doesn't come back

This is an obsolete name, retained for back compat with cpan.  Make sure
the core doesn't have it defined.


  Commit: a83862b66c093b2c547de55f3021745855f84402
  
https://github.com/Perl/perl5/commit/a83862b66c093b2c547de55f3021745855f84402
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M pp.c

  Log Message:
  ---
  pp.c: Need to lock NUMERIC category only

This was doing a general locale lock, but only LC_NUMERIC is needed, and
a future commit will want to know that it is specifically LC_NUMERIC
that is affected.


  Commit: 3e8c9e530d770e60d3c74110663437be344b925e
  
https://github.com/Perl/perl5/commit/3e8c9e530d770e60d3c74110663437be344b925e
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M t/porting/customized.dat
M vutil.c

  Log Message:
  ---
  vutil.c: Clean up white space

Change tabs to blanks; Fix indentation; chomp trailing white space

Remove some blank lines that don't contribute to readability


  Commit: b43f0e5d4f7a4c0185d610a3bcb0ec4bbdb18e71
  
https://github.com/Perl/perl5/commit/b43f0e5d4f7a4c0185d610a3bcb0ec4bbdb18e71
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M cpan/Archive-Tar/t/02_methods.t

  Log Message:
  ---
  XXX skip Archive-Tar because of symlinks


  Commit: 730b30b12596093dcaff645ab99e447bca2f8634
  
https://github.com/Perl/perl5/commit/730b30b12596093dcaff645ab99e447bca2f8634
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M t/porting/cmp_version.t

  Log Message:
  ---
  XXX skip cmp_version.t because of sym links


  Commit: f5278eeb7659adf8f40e9bf3005914b394a134e7
  
https://github.com/Perl/perl5/commit/f5278eeb7659adf8f40e9bf3005914b394a134e7
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M perl.h

  Log Message:
  ---
  XXX temp to test broken lconv on non-Windows


  Commit: 0651608535eb044e409d3b6ad504035ed2909eaa
  
https://github.com/Perl/perl5/commit/0651608535eb044e409d3b6ad504035ed2909eaa
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M cpan/Sys-Syslog/t/syslog-inet-udp.t
M cpan/Sys-Syslog/t/syslog.t

  Log Message:
  ---
  XXX skip syslog tests because fail without LC_TIME


  Commit: 6baad763453094933bb5ba02d61dd71c29a98842
  
https://github.com/Perl/perl5/commit/6baad763453094933bb5ba02d61dd71c29a98842
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M Configure

  Log Message:
  ---
  XXX Configure temporary to get no_nl, etc working


  Commit: 44b06b709d9a2bcaa56e3616e8576bf8fce463ed
  
https://github.com/Perl/perl5/commit/44b06b709d9a2bcaa56e3616e8576bf8fce463ed
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M Configure
M win32/config_H.gc
M win32/config_H.vc

  Log Message:
  ---
  Regenerate Configure after metaconfig backports applied


  Commit: cea6dab399a3d1f3fa9feb6d56be35af4c4220ae
  
https://github.com/Perl/perl5/commit/cea6dab399a3d1f3fa9feb6d56be35af4c4220ae
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M Configure
M config_h.SH
M uconfig.h
M win32/config_H.gc
M win32/config_H.vc

  Log Message:
  ---
  Regenerate Configure after rm thread-safe nl_langinfo_l


  Commit: 312aa449aae44bc328e1ded487ac357f85ac998f
  
https://github.com/Perl/perl5/commit/312aa449aae44bc328e1ded487ac357f85ac998f
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M Configure
M Cross/config.sh-arm-linux
M Cross/config.sh-arm-linux-n770
M Porting/config.sh
M config_h.SH
M configure.com
M metaconfig.h
M plan9/config_sh.sample
M uconfig.h
M uconfig.sh
M uconfig64.sh
M win32/config.gc
M win32/config.vc
M win32/config_H.gc
M win32/config_H.vc

  Log Message:
  ---
  No count Regenerate Configure after LC_ALL


  Commit: 673d167b0f441e3c8a1c020db8c381e9d6eb21be
  
https://github.com/Perl/perl5/commit/673d167b0f441e3c8a1c020db8c381e9d6eb21be
  Author: Karl Williamson 
  Date:   2023-01-14 (Sat, 14 Jan 2023)

  Changed paths:
M uconfig.h

  Log Message:
  ---
  config


  Commit: 593d68cd1faeb7e062f719f570b138bdc50ef79e
  
https://github.com/Perl/perl5/commit/593d68cd1faeb7e062f719f570b138bdc50ef79e
  Author: Karl Williamson 
  Date:   

[Perl/perl5] 88434c: t/re/re_rests - extend test to show more buffers

2023-01-15 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/curlyx_curlym
  Home:   https://github.com/Perl/perl5
  Commit: 88434cc3bc39d9907d982d99e8256f3e041c79ef
  
https://github.com/Perl/perl5/commit/88434cc3bc39d9907d982d99e8256f3e041c79ef
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M t/re/re_tests

  Log Message:
  ---
  t/re/re_rests - extend test to show more buffers

This is a tricky test; showing more buffers makes it a bit easier
to understand if you break it. (Guess what I did?)


  Commit: b06fbb4fdf8e0ba55286dafb629937b24ec9adb5
  
https://github.com/Perl/perl5/commit/b06fbb4fdf8e0ba55286dafb629937b24ec9adb5
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp_internal.h
M regcomp_study.c

  Log Message:
  ---
  regcomp_study.c - Add a way to disable CURLYX optimisations

Also break up the condition so there is one condition per line so
it is more readable, and fold repeated binary tests together. This
makes it more obvious what the expression is doing.
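
As a rough illustration of the kind of cleanup described above, the sketch below puts one condition per line and folds several single-bit tests into one mask test. The flag names, the "off switch" bit and the function names are all hypothetical, invented for this example; they are not the identifiers used in regcomp_study.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, invented for this sketch only. */
#define SK_SEEN_EVAL     0x01u
#define SK_SEEN_RECURSE  0x02u
#define SK_NO_CURLY_OPT  0x04u  /* assumed "off switch" for the optimisation */

/* Before: everything on one line, one binary test per flag. */
static bool may_optimise_before(uint32_t flags, int mincount, int maxcount)
{
    return !(flags & SK_SEEN_EVAL) && !(flags & SK_SEEN_RECURSE) && !(flags & SK_NO_CURLY_OPT) && mincount <= maxcount;
}

/* After: one condition per line, repeated flag tests folded into one mask. */
static bool may_optimise_after(uint32_t flags, int mincount, int maxcount)
{
    return
        !(flags & (SK_SEEN_EVAL | SK_SEEN_RECURSE | SK_NO_CURLY_OPT))
        && mincount <= maxcount;
}

/* The two forms must agree for every combination of the three flags. */
static void check_equivalence(void)
{
    for (uint32_t flags = 0; flags < 8; flags++)
        assert(may_optimise_before(flags, 1, 3) == may_optimise_after(flags, 1, 3));
}
```

Folding the tests is only valid because each original test was a negated single-bit check joined by &&; a mix of positive and negative tests would need separate masks.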


  Commit: 59022ac481b0b72b7c14f1fcfd0e3090bcbdab95
  
https://github.com/Perl/perl5/commit/59022ac481b0b72b7c14f1fcfd0e3090bcbdab95
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regexec.c

  Log Message:
  ---
  regexec.c - rework CLOSE_CAPTURE() to take rex as an arg to enable reuse.

This also splits up CLOSE_CAPTURE() into two parts, with the important parts
implemented by CLOSE_ANY_CAPTURE(), and the debugging parts in
CLOSE_CAPTURE(). This allows it to be used in contexts where the regexp
structure isn't set up under the name 'rex', and where the debugging output it
includes might not be relevant or possible to produce.

This encapsulates all the places that "close" a capture buffer, and ensures
that they are closed properly. One important case in particular cannot use
CLOSE_CAPTURE() directly, as it does not have a 'rex' variable in scope (it is
called prog in this function), nor the debugging context used in normal
execution of CLOSE_CAPTURE(). Using CLOSE_ANY_CAPTURE() instead means all the
main points that update capture buffer state use the same macro API.
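
The split described above might look roughly like the following. This is a simplified sketch, not the real macros from regexec.c: the struct layout, the SKETCH_ prefixed names and the debug format are assumptions made for illustration, and the real macros may update more state than a start/end pair.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_MAX_PARENS 8

/* Hypothetical, cut-down stand-ins for the regexp structure. */
typedef struct { long start; long end; } sketch_offs;
typedef struct { sketch_offs offs[SKETCH_MAX_PARENS]; } sketch_regexp;

/* Core part: takes the regexp pointer as an argument, so it works
 * no matter what the caller names that variable. */
#define SKETCH_CLOSE_ANY_CAPTURE(rex, ix, s, e) \
    do {                                        \
        (rex)->offs[(ix)].start = (s);          \
        (rex)->offs[(ix)].end   = (e);          \
    } while (0)

/* Debugging wrapper: does the same work, then reports it. */
#define SKETCH_CLOSE_CAPTURE(rex, ix, s, e)                      \
    do {                                                         \
        SKETCH_CLOSE_ANY_CAPTURE(rex, ix, s, e);                 \
        fprintf(stderr, "CLOSE: rex=%p offs[%d]: %ld..%ld\n",    \
                (void *)(rex), (int)(ix),                        \
                (long)(rex)->offs[(ix)].start,                   \
                (long)(rex)->offs[(ix)].end);                    \
    } while (0)

/* Normal execution path: regexp is called 'rex', debug output wanted. */
static void close_in_matcher(sketch_regexp *rex, int ix, long s, long e)
{
    assert(ix >= 0 && ix < SKETCH_MAX_PARENS);
    SKETCH_CLOSE_CAPTURE(rex, ix, s, e);
}

/* Another caller: regexp is called 'prog' and no debug context exists. */
static void close_elsewhere(sketch_regexp *prog, int ix, long s, long e)
{
    assert(ix >= 0 && ix < SKETCH_MAX_PARENS);
    SKETCH_CLOSE_ANY_CAPTURE(prog, ix, s, e);
}
```

Both callers end up writing the buffer through the same core macro, which is the property the commit message is after.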


  Commit: 84e90860ab0be2e5997a91225a358f946210bc7f
  
https://github.com/Perl/perl5/commit/84e90860ab0be2e5997a91225a358f946210bc7f
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp.c
M regcomp.h

  Log Message:
  ---
  regcomp.h - get rid of EXTRA_STEP defines

They are unused these days.


  Commit: d3fa340271178edacc139aa4169a556e9d1a9ec5
  
https://github.com/Perl/perl5/commit/d3fa340271178edacc139aa4169a556e9d1a9ec5
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp.c

  Log Message:
  ---
  regcomp.c - add whitespace to binary operation

The tight & is hard to read.


  Commit: 784fccd8e85cca906c20b5e97b5088d2f667ff92
  
https://github.com/Perl/perl5/commit/784fccd8e85cca906c20b5e97b5088d2f667ff92
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp_trie.c

  Log Message:
  ---
  regcomp_trie.c - use the indirect types so we are safe to changes

We shouldn't assume that a TRIEC is a regcomp_charclass. We have a per
opcode type exactly for this type of use, so let's use it.
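
A minimal sketch of the idea, assuming a per-opcode typedef that aliases the underlying node struct. Every name here (the sk_ prefixed structs and typedef) is hypothetical and chosen for the example; the real definitions live in the regcomp headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layouts, invented for this sketch. */
struct sk_node           { uint8_t flags; uint8_t type; uint16_t next_off; };
struct sk_node_charclass { struct sk_node hdr; uint32_t arg1; char bitmap[32]; };

/* Per-opcode alias: today a TRIEC happens to use the charclass layout. */
typedef struct sk_node_charclass sk_tnode_TRIEC;

/* Fragile: hard-codes the assumption that TRIEC is a charclass node. */
static size_t triec_size_hardcoded(void)
{
    return sizeof(struct sk_node_charclass);
}

/* Robust: goes through the alias, so changing the typedef above is
 * enough to update every user. */
static size_t triec_size_indirect(void)
{
    return sizeof(sk_tnode_TRIEC);
}

/* With the current alias the two must give the same answer. */
static void check_sizes_agree(void)
{
    assert(triec_size_hardcoded() == triec_size_indirect());
}
```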


  Commit: ae66e45b465d5f1a186a0f449bdb8597bd754f84
  
https://github.com/Perl/perl5/commit/ae66e45b465d5f1a186a0f449bdb8597bd754f84
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M regcomp_debug.c
M regcomp_study.c
M t/re/pat_re_eval.t

  Log Message:
  ---
  regcomp_study.c - disable CURLYX optimizations when EVAL has been seen anywhere

Historically we disabled CURLYX optimizations when they
*contained* an EVAL, on the assumption that the optimization might
affect how many times, etc, the eval was called. However, this is
also true for CURLYX with evals *afterwards*. If the CURLYN or CURLYM
optimization can prune off the search space, then an eval afterwards
will be affected. And when you take into account GOSUB, it means that
an eval in front might be affected by an optimization after it.

So for now we disable CURLYN and CURLYM in any pattern with an EVAL.
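
The difference between the historical rule and the new one can be sketched as two predicates. The structs and field names below are hypothetical, invented for this illustration; the real check in regcomp_study.c works on the compiler's own state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summaries of what the compiler knows. */
struct sk_pattern { bool seen_eval; };      /* an EVAL appears anywhere  */
struct sk_loop    { bool contains_eval; };  /* an EVAL inside this loop  */

/* Historical rule: only an EVAL inside the loop blocks the optimisation. */
static bool old_rule_allows(const struct sk_pattern *pat,
                            const struct sk_loop *loop)
{
    (void)pat;  /* the rest of the pattern was not consulted */
    return !loop->contains_eval;
}

/* New rule: an EVAL anywhere in the pattern blocks the optimisation. */
static bool new_rule_allows(const struct sk_pattern *pat,
                            const struct sk_loop *loop)
{
    /* An EVAL inside the loop is also an EVAL in the pattern. */
    assert(!loop->contains_eval || pat->seen_eval);
    return !pat->seen_eval;
}
```

For a pattern whose only EVAL comes after the loop, old_rule_allows() returns true while new_rule_allows() returns false, which is the case the commit message singles out.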


  Commit: 8bd6445f12d47068a263b576fc04146a9a2886b4
  
https://github.com/Perl/perl5/commit/8bd6445f12d47068a263b576fc04146a9a2886b4
  Author: Yves Orton 
  Date:   2023-01-15 (Sun, 15 Jan 2023)

  Changed paths:
M MANIFEST
M ext/XS-APItest/APItest.xs
A ext/XS-APItest/t/savestack.t
M regexec.c

  Log Message:
  ---
  regexec.c - fix memory leak in EVAL.

EVAL was calling regcppush twice per invocation, once before executing the
callback and once after, but it was not calling regcppop twice. So each time
we would accumulate an extra "frame" of data. This was hidden somewhat by
the way we eventually "blow" the stack, so the extra data was just thrown
away at the end.
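
The accumulation described above can be modelled with a toy save stack that only tracks its depth. Everything in this sketch is hypothetical (the struct, the sk_ helpers and both eval functions); in particular, the balanced variant is just one way to pair pushes with pops and is not claimed to be the change this commit makes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the save stack: only the number of frames matters. */
struct sk_savestack { size_t depth; };

static void sk_push(struct sk_savestack *st) { st->depth++; }

static void sk_pop(struct sk_savestack *st)
{
    assert(st->depth > 0);  /* popping an empty stack would be a bug */
    st->depth--;
}

/* Models the leak: two pushes per invocation but only one pop. */
static void eval_unbalanced(struct sk_savestack *st)
{
    sk_push(st);   /* before the callback */
    /* ... callback would run here ... */
    sk_push(st);   /* after the callback  */
    sk_pop(st);    /* only one frame is released */
}

/* One balanced alternative: every push has a matching pop. */
static void eval_balanced(struct sk_savestack *st)
{
    sk_push(st);
    /* ... callback would run here ... */
    sk_pop(st);
}
```

In this model each call to eval_unbalanced() leaves one extra frame behind, so the depth grows linearly with the number of invocations until the whole stack is discarded.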

This removes