[Perl/perl5] f1601e: Pod::Html: Test --htmldir and --htmlroot separately

2023-01-11 Thread James E Keenan via perl5-changes
  Branch: refs/heads/smoke-me/jkeenan/pod-html-docs-conformance-20221209
  Home:   https://github.com/Perl/perl5
  Commit: f1601e42bdfeb71f8b709c00de772536ea074dcf
  
https://github.com/Perl/perl5/commit/f1601e42bdfeb71f8b709c00de772536ea074dcf
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M MANIFEST
M ext/Pod-Html/t/htmldir5.t
A ext/Pod-Html/t/htmldir7.t

  Log Message:
  ---
  Pod::Html: Test --htmldir and --htmlroot separately

The documentation advises that '--htmldir' and '--htmlroot' should not be used
in the same call to pod2html, as they are mutually exclusive.  However, two
files in the test suite have for a long time violated this advice.

This commit removes an instance of the "double call" from t/htmldir5.t and
moves a test of '--htmlroot' to new test file t/htmldir7.t.  (This new test
file will, however, use the same POD input as t/htmldir5.t.)  There is a
slight change in the HTML output, which is reflected in the "expected HTML" in
the DATA section of t/htmldir7.t.  Test descriptions are modified
appropriately.




[Perl/perl5] 475358: Pod::Html: Test --htmldir and --htmlroot separately

2023-01-11 Thread James E Keenan via perl5-changes
  Branch: refs/heads/smoke-me/jkeenan/pod-html-docs-conformance-20221209
  Home:   https://github.com/Perl/perl5
  Commit: 475358425e45c3e4019b47d40e473a2702d32bf9
  
https://github.com/Perl/perl5/commit/475358425e45c3e4019b47d40e473a2702d32bf9
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M MANIFEST
M ext/Pod-Html/t/htmldir1.t
A ext/Pod-Html/t/htmldir6.t

  Log Message:
  ---
  Pod::Html: Test --htmldir and --htmlroot separately

The documentation advises that '--htmldir' and '--htmlroot' should not be used
in the same call to pod2html, as they are mutually exclusive.  However, two
files in the test suite have for a long time violated this advice.

This commit removes an instance of the "double call" from t/htmldir1.t and
moves a test of '--htmlroot' to new test file t/htmldir6.t.  (This new test
file will, however, use the same POD input as t/htmldir1.t.)  Test descriptions
are modified appropriately.




[Perl/perl5] 1305ea: Increment $VERSION to 1.35 in all .pm files

2023-01-11 Thread James E Keenan via perl5-changes
  Branch: refs/heads/smoke-me/jkeenan/pod-html-docs-conformance-20221209
  Home:   https://github.com/Perl/perl5
  Commit: 1305eac89432bdfd1efac7bd5cbe598f8ac10be6
  
https://github.com/Perl/perl5/commit/1305eac89432bdfd1efac7bd5cbe598f8ac10be6
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/lib/Pod/Html.pm
M ext/Pod-Html/lib/Pod/Html/Util.pm
M ext/Pod-Html/t/lib/Testing.pm

  Log Message:
  ---
  Increment $VERSION to 1.35 in all .pm files




[Perl/perl5] 9d19ed: Standardize on 4-character indent for switches

2023-01-11 Thread James E Keenan via perl5-changes
  Branch: refs/heads/smoke-me/jkeenan/pod-html-docs-conformance-20221209
  Home:   https://github.com/Perl/perl5
  Commit: 9d19ed6a23e543d0447146a365ec77004a62515d
  
https://github.com/Perl/perl5/commit/9d19ed6a23e543d0447146a365ec77004a62515d
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html

  Log Message:
  ---
  Standardize on 4-character indent for switches


  Commit: 16b9c9a9d357477ec344291a8f608465b9711fce
  
https://github.com/Perl/perl5/commit/16b9c9a9d357477ec344291a8f608465b9711fce
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html
M ext/Pod-Html/lib/Pod/Html.pm

  Log Message:
  ---
  Conform --backlink, --nobacklink


  Commit: 8af6b2e229562820832563a141856200e1004f8e
  
https://github.com/Perl/perl5/commit/8af6b2e229562820832563a141856200e1004f8e
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M MANIFEST
A ext/Pod-Html/t/feature3.pod
A ext/Pod-Html/t/feature3.t

  Log Message:
  ---
  Pod-Html: explicitly test '--nobacklink'

Add dummy POD file and test file.


  Commit: 4735196de2a1d6ce76c716cf467ccff495c24211
  
https://github.com/Perl/perl5/commit/4735196de2a1d6ce76c716cf467ccff495c24211
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html
M ext/Pod-Html/lib/Pod/Html.pm

  Log Message:
  ---
  Conform --cachedir


  Commit: 83c75414f2d668156d042ea19b7013534ff42f60
  
https://github.com/Perl/perl5/commit/83c75414f2d668156d042ea19b7013534ff42f60
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/t/feature3.t

  Log Message:
  ---
  Use template value in expected html


  Commit: edb752f54658e753d09addda56076e93c3664a88
  
https://github.com/Perl/perl5/commit/edb752f54658e753d09addda56076e93c3664a88
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html
M ext/Pod-Html/lib/Pod/Html.pm

  Log Message:
  ---
  Conform docs for '--css'


  Commit: 35b8e6e4348c1d78de7dcf46ae50b2f8bb29ff0c
  
https://github.com/Perl/perl5/commit/35b8e6e4348c1d78de7dcf46ae50b2f8bb29ff0c
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html
M ext/Pod-Html/lib/Pod/Html.pm

  Log Message:
  ---
  Conform '--flush', '--header' and '--help'


  Commit: e0ac5532aead9f71ea500db2d25355de9720ef8e
  
https://github.com/Perl/perl5/commit/e0ac5532aead9f71ea500db2d25355de9720ef8e
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html
M ext/Pod-Html/lib/Pod/Html.pm

  Log Message:
  ---
  Conform '--htmldir'


  Commit: c5b52623f8393d3243f882501a380d0fa553f791
  
https://github.com/Perl/perl5/commit/c5b52623f8393d3243f882501a380d0fa553f791
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html
M ext/Pod-Html/lib/Pod/Html.pm

  Log Message:
  ---
  Conform '--index'; initial edit on '--htmlroot'


  Commit: cbe444632e0e969786423f8110ed4527fc0beb0f
  
https://github.com/Perl/perl5/commit/cbe444632e0e969786423f8110ed4527fc0beb0f
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/t/feature2.t

  Log Message:
  ---
  Explicit test of '--index'


  Commit: ddcfe2cc06a2c1f120004d2ee4eb7935cca888d9
  
https://github.com/Perl/perl5/commit/ddcfe2cc06a2c1f120004d2ee4eb7935cca888d9
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html
M ext/Pod-Html/lib/Pod/Html.pm

  Log Message:
  ---
  Correction on '--index'; conform '--infile'


  Commit: 7f57a5aa7f9b38ca4a3633dd5df68688c4ffb549
  
https://github.com/Perl/perl5/commit/7f57a5aa7f9b38ca4a3633dd5df68688c4ffb549
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html
M ext/Pod-Html/lib/Pod/Html.pm

  Log Message:
  ---
  Conform '--poderrors'


  Commit: fd2d01b43e071fb6b68197c82c39488b49e46a05
  
https://github.com/Perl/perl5/commit/fd2d01b43e071fb6b68197c82c39488b49e46a05
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/t/poderr.t

  Log Message:
  ---
  Explicit test for '--poderrors'


  Commit: 094cdbd9d72927d4b57f74eff5d6bc008ccbd2ed
  
https://github.com/Perl/perl5/commit/094cdbd9d72927d4b57f74eff5d6bc008ccbd2ed
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M ext/Pod-Html/bin/pod2html

  Log Message:
  ---
  Conform '--podroot'


  Commit: ec6962a66d4b979c8dbc8426b313012ee39763d7
  

[Perl/perl5] 97fa06: Replace FreeBSD URL's with new HTTPS ones

2023-01-11 Thread Elvin Aslanov via perl5-changes
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: 97fa06dbb856bb06338778dacd86751fa22f4f73
  
https://github.com/Perl/perl5/commit/97fa06dbb856bb06338778dacd86751fa22f4f73
  Author: Elvin Aslanov 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M README.freebsd
M caretx.c

  Log Message:
  ---
  Replace FreeBSD URL's with new HTTPS ones




[Perl/perl5] f91101: Correct one character typo appearing in lib/featur...

2023-01-11 Thread James E Keenan via perl5-changes
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: f91101a0615d2706c3cc4ebc69a428df2363e927
  
https://github.com/Perl/perl5/commit/f91101a0615d2706c3cc4ebc69a428df2363e927
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M lib/feature.pm
M regen/feature.pl

  Log Message:
  ---
  Correct one character typo appearing in lib/feature.pm

Since lib/feature.pm is a generated file, the actual changes are made in
regen/feature.pl, followed by 'make regen' to regenerate lib/feature.pm
(and then by 'make test_porting' to confirm).


  Commit: 80474df5fe9d8237ccb1cb224b2a849e54014ecd
  
https://github.com/Perl/perl5/commit/80474df5fe9d8237ccb1cb224b2a849e54014ecd
  Author: James E Keenan 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M t/porting/regen.t

  Log Message:
  ---
  Hint should advise using 'make regen'

Per discussion by @demerphq in
https://github.com/Perl/perl5/pull/20682#issuecomment-1377536039.  The
'regen' programs should be run with your installed 'perl'.

Use a single-quoted heredoc, as $_ is no longer being interpolated (per
@JRaspass in
https://github.com/Perl/perl5/pull/20683#discussion_r1066294815).


Compare: https://github.com/Perl/perl5/compare/3f11a2855248...80474df5fe9d


[Perl/perl5] c128f4: t/re/re_tests - extend test to show more buffers

2023-01-11 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/curlyx_curlym
  Home:   https://github.com/Perl/perl5
  Commit: c128f4426b843771b84e2f4e344905eb86dbe427
  
https://github.com/Perl/perl5/commit/c128f4426b843771b84e2f4e344905eb86dbe427
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M t/re/re_tests

  Log Message:
  ---
  t/re/re_tests - extend test to show more buffers

This is a tricky test; showing more buffers makes it a bit easier
to understand if you break it. (Guess what I did?)


  Commit: c68fac5f4bf38c2c0615f32c63e3f0c98ec1f3bd
  
https://github.com/Perl/perl5/commit/c68fac5f4bf38c2c0615f32c63e3f0c98ec1f3bd
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M regcomp.c
M regcomp.h
M regcomp_internal.h
M t/re/pat.t
M t/re/reg_mesg.t

  Log Message:
  ---
  regcomp.c - increase size of CURLY nodes so the min/max is an I32

This allows us to resolve a test inconsistency between CURLYX and CURLY
and CURLYM. We use I32 because the existing count logic uses -1 and
this keeps everything unsigned compatible.


  Commit: 22897d307282986e68a28989bdd42ba5430ac503
  
https://github.com/Perl/perl5/commit/22897d307282986e68a28989bdd42ba5430ac503
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M regcomp_internal.h
M regcomp_study.c

  Log Message:
  ---
  regcomp_study.c - Add a way to disable CURLYX optimisations

Also break up the condition so there is one condition per line so
it is more readable, and fold repeated binary tests together. This
makes it more obvious what the expression is doing.


  Commit: 9423873e18d9216ec98aed4df14ab114104931f8
  
https://github.com/Perl/perl5/commit/9423873e18d9216ec98aed4df14ab114104931f8
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M regcomp_debug.c
M regcomp_study.c
M t/re/pat_re_eval.t

  Log Message:
  ---
  regcomp_study.c - disable CURLYX optimizations when EVAL has been seen anywhere

Historically we disabled CURLYX optimizations when they
*contained* an EVAL, on the assumption that the optimization might
affect how many times, etc., the eval was called. However, this is
also true for CURLYX with evals *afterwards*. If the CURLYN or CURLYM
optimization can prune off the search space, then an eval afterwards
will be affected. And when you take into account GOSUB, it means that
an eval in front might be affected by an optimization after it.

So for now we disable CURLYN and CURLYM in any pattern with an EVAL.
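
The rule above can be sketched in C. This is only an illustration under
stated assumptions: pattern_state, seen_eval and may_optimise_curlyx() are
hypothetical names, not identifiers from regcomp_study.c, and the real
structural tests are reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the compiler saw in the whole pattern. */
typedef struct {
    bool seen_eval;   /* true if an EVAL appears anywhere in the pattern */
} pattern_state;

/* Decide whether a CURLYX may be rewritten as CURLYN or CURLYM.
 * structurally_eligible stands in for all the other conditions. */
bool may_optimise_curlyx(const pattern_state *st, bool structurally_eligible)
{
    assert(st != NULL);

    /* An EVAL anywhere, not only inside the loop body, vetoes the rewrite. */
    if (st->seen_eval)
        return false;

    return structurally_eligible;
}
```

The point of the sketch is that the veto is a property of the whole pattern
rather than of the individual quantified group.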


  Commit: d07c6e339e942438eda58692f95cd00613408216
  
https://github.com/Perl/perl5/commit/d07c6e339e942438eda58692f95cd00613408216
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M regexec.c

  Log Message:
  ---
  regexec.c - rework CLOSE_CAPTURE() macro to take a rex argument

This allows it to be used in contexts where rex isn't set up under
this name.
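
A minimal sketch of why passing the regexp explicitly helps. The struct,
both macros and record_group() are hypothetical stand-ins, not the actual
CLOSE_CAPTURE() definition from regexec.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS 8

typedef struct {
    ptrdiff_t start;
    ptrdiff_t end;
} paren_pair;

typedef struct {
    paren_pair offs[MAX_GROUPS];
    int lastparen;
} toy_prog;

/* Implicit style: only compiles where a variable named `rex` is in scope. */
#define CLOSE_GROUP_IMPLICIT(ix, s, e) \
    do { rex->offs[(ix)].start = (s); rex->offs[(ix)].end = (e); } while (0)

/* Explicit style: the caller says which regexp to update. */
#define CLOSE_GROUP(rx, ix, s, e)          \
    do {                                   \
        (rx)->offs[(ix)].start = (s);      \
        (rx)->offs[(ix)].end   = (e);      \
        if ((ix) > (rx)->lastparen)        \
            (rx)->lastparen = (ix);        \
    } while (0)

/* The local pointer is called `prog`, so only the explicit macro works. */
void record_group(toy_prog *prog, int ix, ptrdiff_t s, ptrdiff_t e)
{
    assert(prog != NULL && ix >= 0 && ix < MAX_GROUPS);
    CLOSE_GROUP(prog, ix, s, e);
}
```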


  Commit: b21d696822a07fed9cf0b0029ea94328985ae0b4
  
https://github.com/Perl/perl5/commit/b21d696822a07fed9cf0b0029ea94328985ae0b4
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M regcomp.c
M regcomp.h

  Log Message:
  ---
  regcomp.h - get rid of EXTRA_STEP defines

They are unused these days.


  Commit: 7a7955ec9b8e2745344cac828b3646b35e251fca
  
https://github.com/Perl/perl5/commit/7a7955ec9b8e2745344cac828b3646b35e251fca
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M regcomp.c

  Log Message:
  ---
  regcomp.c - add whitespace to binary operation

The tight & is hard to read.


  Commit: dc2e92ac67e541e0f6fd4903e2d0c433f8b274bf
  
https://github.com/Perl/perl5/commit/dc2e92ac67e541e0f6fd4903e2d0c433f8b274bf
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M regcomp_trie.c

  Log Message:
  ---
  regcomp_trie.c - use the indirect types so we are safe to changes

We shouldn't assume that a TRIEC is a regcomp_charclass. We have a per
opcode type exactly for this type of use, so let's use it.


  Commit: 9a58de08f43f69a9a96abfd6b90e0ba314e05f3e
  
https://github.com/Perl/perl5/commit/9a58de08f43f69a9a96abfd6b90e0ba314e05f3e
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M pod/perldebguts.pod
M pp_ctl.c
M regcomp.c
M regcomp.h
M regcomp.sym
M regcomp_debug.c
M regexec.c
M regexp.h
M regnodes.h
M t/re/pat.t
M t/re/pat_rt_report.t
M t/re/re_tests

  Log Message:
  ---
  regcomp.c - Resolve issues clearing buffers in CURLYX (MAJOR-CHANGE)

CURLYX doesn't reset capture buffers properly. It is possible
for multiple buffers to be defined at once with values from
different iterations of the loop, which doesn't make sense really.

An example is this:

  "foobarfoo"=~/((foo)|(bar))+/

after this matches $1 should equal $2 and $3 should 

[Perl/perl5] a921ad: test.pl - add support for rtriming fresh perl output

2023-01-11 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/re_capture
  Home:   https://github.com/Perl/perl5
  Commit: a921ad28c9d545fd1a83476e36c27ae85e2847ae
  
https://github.com/Perl/perl5/commit/a921ad28c9d545fd1a83476e36c27ae85e2847ae
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M t/test.pl

  Log Message:
  ---
  test.pl - add support for rtriming fresh perl output

This makes it easier to do regexp debug tests, where we don't care
about trailing whitespace.

It also fixes the line number reporting for fresh_perl_is() and
fresh_perl_like() so that it shows the actual place where the line
number is located, and it changes the relevant code to work properly
with external $Level overrides.


  Commit: 87b8f3376c1b7c826b45287601469ad417f764ac
  
https://github.com/Perl/perl5/commit/87b8f3376c1b7c826b45287601469ad417f764ac
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M handy.h

  Log Message:
  ---
  handy.h - add NewCopy() macro to combine New and Copy.
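
A rough sketch of the allocate-then-copy idea, assuming plain malloc() and
memcpy() in place of Perl's own allocation and copy macros. NEW_COPY and
duplicate_ints() are hypothetical names for illustration; the argument order
of the real NewCopy() is not given in this message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in: allocate room for n elements of type t in dst,
 * then copy n elements from src into it. dst is left NULL on failure. */
#define NEW_COPY(src, dst, n, t)                               \
    do {                                                       \
        (dst) = (t *)malloc((size_t)(n) * sizeof(t));          \
        if ((dst) != NULL)                                     \
            memcpy((dst), (src), (size_t)(n) * sizeof(t));     \
    } while (0)

/* Return a freshly allocated copy of src[0..n-1]; the caller frees it. */
int *duplicate_ints(const int *src, size_t n)
{
    int *copy = NULL;
    assert(src != NULL && n > 0);
    NEW_COPY(src, copy, n, int);
    return copy;
}
```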


  Commit: 9bef0ac56fb62562e6f085ca2982385c51d874cc
  
https://github.com/Perl/perl5/commit/9bef0ac56fb62562e6f085ca2982385c51d874cc
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M embed.fnc
M embed.h
M mg.c
M proto.h
M regcomp.c
M regcomp_debug.c
M regcomp_internal.h
M regexec.c
M regexp.h
M t/re/pat_advanced.t
M t/re/re_tests

  Log Message:
  ---
  regcomp.c etc - rework branch reset so it works properly

Branch reset was hacked in without much thought about how it might interact
with other features. Over time we added named capture and recursive patterns
with GOSUB, but I guess because branch reset is somewhat esoteric we didn't
notice the accumulating issues related to it.

The main problem was my original hack used a fairly simple device to give
multiple OPEN/CLOSE opcodes the same target buffer id. When it was introduced
this was fine. When GOSUB was added later, however, we overlooked that this
broke a key part of the book-keeping for GOSUB.

A GOSUB regop needs to know where to jump to, and which close paren to stop
at. However the structure of the regexp program can change from the time the
regop is created. This means we keep track of every OPEN/CLOSE regop we
encounter during parsing, and when something is inserted into the middle of
the program we make sure to move the offsets we store for the OPEN/CLOSE data.
This is essentially keyed and scaled to the number of parens we have seen.
When branch reset is used however the number of OPEN/CLOSE regops is more than
the number of logical buffers we have seen, and we only move one of the
OPEN/CLOSE buffers that is in the branch reset. Which of course breaks things.

Another issue with branch reset is that it creates weird artifacts like this:
/(?|(?<a>a)|(?<b>b))(?&a)(?&b)/ where the (?&b) actually maps to the (?<a>a)
capture buffer because they both have the same id. Another case is that you
cannot check if $+{b} matched and $+{a} did not, because conceptually they
were the same buffer under the hood.

These bugs are now fixed. The "aliasing" of capture buffers to each other is
now done virtually, and under the hood each capture buffer is distinct. We
introduce the concept of a "logical parno" which is the user visible capture
buffer id, and keep it distinct from the true capture buffer id. Most of the
internal logic uses the "true parno" for its business, so a bunch of problems
go away, and we keep maps from logical to physical parnos, and vice versa,
along with a map that gives us the "next physical parno with the same
logical parno". Thus we can quickly skip through the physical capture buffers
to find the one that matched. This means we also have to introduce a
logical_total_parens as well, to complement the already existing total_parens.
The latter refers to the true number of capture buffers. The former represents
the logical number visible to the user.

It is helpful to consider the following table:

  Logical:    $1      $2     $3      $2     $3     $4      $2     $5
  Physical:    1       2      3       4      5      6       7      8
  Next:        0       4      5       7      0      0       0      0
  Pattern:   /(pre)(?|(?<a>a)(?<b>b)|(?<c>c)(?<d>d)(?<e>e)|(?<f>))(post)/

The names are mapped to physical buffers. So $+{b} will show what is in
physical buffer 3. But $3 will show whichever of buffer 3 or 5 matched.
Similarly @{^CAPTURE} will contain 5 elements, not 8. But %+ will contain all
6 named buffers.

Since the need to map these values is rare, we only store these maps when they
are needed and branch reset has been used; when they are NULL it is assumed
that physical and logical buffers are identical.
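
The lookup described above can be sketched in C. The array names echo the
description but are assumptions for illustration, as are
physical_for_logical() and the matched[] flags; the values are taken from
the Logical, Physical and Next rows of the table.

```c
#include <assert.h>
#include <stddef.h>

#define PHYSICAL_PARENS 8   /* true number of capture buffers */
#define LOGICAL_PARENS  5   /* number visible to the user     */

/* Index 0 is unused so that indexes line up with capture numbers. */
static const int parno_to_logical[PHYSICAL_PARENS + 1] =
    { 0, 1, 2, 3, 2, 3, 4, 2, 5 };
static const int parno_to_logical_next[PHYSICAL_PARENS + 1] =
    { 0, 0, 4, 5, 7, 0, 0, 0, 0 };
static const int logical_to_parno[LOGICAL_PARENS + 1] =
    { 0, 1, 2, 3, 6, 8 };

/* Return the physical buffer that supplies logical capture `logical`,
 * or 0 if none did. matched[p] is nonzero when physical buffer p
 * took part in the match. */
int physical_for_logical(int logical, const int *matched)
{
    int p;

    assert(logical >= 1 && logical <= LOGICAL_PARENS);
    assert(matched != NULL);

    /* Start at the first physical buffer, then follow the "next" chain. */
    for (p = logical_to_parno[logical]; p != 0; p = parno_to_logical_next[p]) {
        assert(parno_to_logical[p] == logical);
        if (matched[p])
            return p;
    }
    return 0;
}
```

For example, if the second alternative of the branch reset matched, so that
physical buffers 1, 4, 5, 6 and 8 are set, the function maps logical 2 to
physical 4 and logical 3 to physical 5.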

Currently the way this change is implemented will likely break plug in regexp
engines because they will be missing the new logical_total_parens field at
the very least. Given that the perl internals code is somewhat poorly
abstracted from the regexp 

[Perl/perl5] 3f11a2: regexec engine - wrap and replace RX_OFFS() with b...

2023-01-11 Thread Yves Orton via perl5-changes
  Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: 3f11a2855248134af98ca8d71cf71a3fe736dbae
  
https://github.com/Perl/perl5/commit/3f11a2855248134af98ca8d71cf71a3fe736dbae
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M mg.c
M pp.c
M pp_ctl.c
M pp_hot.c
M regcomp.c
M regcomp_debug.c
M regexec.c
M regexp.h

  Log Message:
  ---
  regexec engine - wrap and replace RX_OFFS() with better abstractions

RX_OFFS() exposes a bit too much about how capture buffers are represented.
This adds RX_OFFS_START() and RX_OFFS_END() and RX_OFFS_VALID() to replace
most of the uses of the RX_OFFS() macro or direct access to the rx->offs[]
array. (We add RX_OFFSp() for those rare cases that should have direct
access to the array.) This allows us to replace this logic with more
complicated macros in the future. Pretty much anything using RX_OFFS() is
going to be broken by future changes, so changing the define allows us to
track it down easily.

Not all uses of the rx->offs[] array are converted; some uses are required
for the regex engine internals, but anything outside of the regex engine
should be using the replacement macros, and most things in the regex internals
should use them also.
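
A minimal sketch of the accessor idea. The TOY_* macros and toy_regexp are
hypothetical stand-ins, not the definitions from regexp.h, and using -1 as
the "unset" marker is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    ptrdiff_t start;
    ptrdiff_t end;
} toy_pair;

typedef struct {
    int      nparens;   /* highest usable capture index */
    toy_pair offs[4];   /* offs[0] is the whole match   */
} toy_regexp;

/* Callers go through these instead of touching offs[] directly, so the
 * representation can change later without editing every caller. */
#define TOY_OFFS_START(rx, n)  ((rx)->offs[(n)].start)
#define TOY_OFFS_END(rx, n)    ((rx)->offs[(n)].end)
#define TOY_OFFS_VALID(rx, n) \
    (TOY_OFFS_START(rx, n) != -1 && TOY_OFFS_END(rx, n) != -1)

/* Length of capture n, or -1 if that capture did not take part. */
ptrdiff_t capture_length(const toy_regexp *rx, int n)
{
    assert(rx != NULL && n >= 0 && n <= rx->nparens && n < 4);
    if (!TOY_OFFS_VALID(rx, n))
        return -1;
    return TOY_OFFS_END(rx, n) - TOY_OFFS_START(rx, n);
}
```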




[Perl/perl5]

2023-01-11 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/wrap_cap_buf_macro
  Home:   https://github.com/Perl/perl5


[Perl/perl5] a9f676: test.pl - add support for rtriming fresh perl output

2023-01-11 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/re_capture
  Home:   https://github.com/Perl/perl5
  Commit: a9f676271c3aac5dbb2a646da7d370e4f2a51ab9
  
https://github.com/Perl/perl5/commit/a9f676271c3aac5dbb2a646da7d370e4f2a51ab9
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M t/test.pl

  Log Message:
  ---
  test.pl - add support for rtriming fresh perl output

This makes it easier to do regexp debug tests, where we don't care
about trailing whitespace.

It also fixes the line number reporting for fresh_perl_is() and
fresh_perl_like() so that it shows the actual place where the line
number is located, and it changes the relevant code to work properly
with external $Level overrides.


  Commit: 683ebc3f8f6f479fb9a7ba66cab9374ad4f3bbfc
  
https://github.com/Perl/perl5/commit/683ebc3f8f6f479fb9a7ba66cab9374ad4f3bbfc
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M handy.h

  Log Message:
  ---
  handy.h - add NewCopy() macro to combine New and Copy.


  Commit: 66c1e1e2cd9aee0fc8791faf86917429230db73f
  
https://github.com/Perl/perl5/commit/66c1e1e2cd9aee0fc8791faf86917429230db73f
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M embed.fnc
M embed.h
M mg.c
M proto.h
M regcomp.c
M regcomp_debug.c
M regcomp_internal.h
M regexec.c
M regexp.h
M t/re/pat_advanced.t
M t/re/re_tests

  Log Message:
  ---
  regcomp.c etc - rework branch reset so it works properly

Branch reset was hacked in without much thought about how it might interact
with other features. Over time we added named capture and recursive patterns
with GOSUB, but I guess because branch reset is somewhat esoteric we didn't
notice the accumulating issues related to it.

The main problem was my original hack used a fairly simple device to give
multiple OPEN/CLOSE opcodes the same target buffer id. When it was introduced
this was fine. When GOSUB was added later, however, we overlooked that this
broke a key part of the book-keeping for GOSUB.

A GOSUB regop needs to know where to jump to, and which close paren to stop
at. However the structure of the regexp program can change from the time the
regop is created. This means we keep track of every OPEN/CLOSE regop we
encounter during parsing, and when something is inserted into the middle of
the program we make sure to move the offsets we store for the OPEN/CLOSE data.
This is essentially keyed and scaled to the number of parens we have seen.
When branch reset is used however the number of OPEN/CLOSE regops is more than
the number of logical buffers we have seen, and we only move one of the
OPEN/CLOSE buffers that is in the branch reset. Which of course breaks things.

Another issue with branch reset is that it creates weird artifacts like this:
/(?|(?<a>a)|(?<b>b))(?&a)(?&b)/ where the (?&b) actually maps to the (?<a>a)
capture buffer because they both have the same id. Another case is that you
cannot check if $+{b} matched and $+{a} did not, because conceptually they
were the same buffer under the hood.

These bugs are now fixed. The "aliasing" of capture buffers to each other is
now done virtually, and under the hood each capture buffer is distinct. We
introduce the concept of a "logical parno" which is the user visible capture
buffer id, and keep it distinct from the true capture buffer id. Most of the
internal logic uses the "true parno" for its business, so a bunch of problems
go away, and we keep maps from logical to physical parnos, and vice versa,
along with a map that gives us the "next physical parno with the same
logical parno". Thus we can quickly skip through the physical capture buffers
to find the one that matched. This means we also have to introduce a
logical_total_parens as well, to complement the already existing total_parens.
The latter refers to the true number of capture buffers. The former represents
the logical number visible to the user.

It is helpful to consider the following table:

  Logical:    $1      $2     $3      $2     $3     $4      $2     $5
  Physical:    1       2      3       4      5      6       7      8
  Next:        0       4      5       7      0      0       0      0
  Pattern:   /(pre)(?|(?<a>a)(?<b>b)|(?<c>c)(?<d>d)(?<e>e)|(?<f>))(post)/

The names are mapped to physical buffers. So $+{b} will show what is in
physical buffer 3. But $3 will show whichever of buffer 3 or 5 matched.
Similarly @{^CAPTURE} will contain 5 elements, not 8. But %+ will contain all
6 named buffers.

Since the need to map these values is rare, we only store these maps when they
are needed and branch reset has been used; when they are NULL it is assumed
that physical and logical buffers are identical.

Currently the way this change is implemented will likely break plug in regexp
engines because they will be missing the new logical_total_parens field at
the very least. Given that the perl internals code is somewhat poorly
abstracted from the regexp 

[Perl/perl5] 91d8a5: regexec engine - wrap and replace RX_OFFS() with b...

2023-01-11 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/wrap_cap_buf_macro
  Home:   https://github.com/Perl/perl5
  Commit: 91d8a59942b4e20c72026b53aaad7b89223ab656
  
https://github.com/Perl/perl5/commit/91d8a59942b4e20c72026b53aaad7b89223ab656
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M mg.c
M pp.c
M pp_ctl.c
M pp_hot.c
M regcomp.c
M regcomp_debug.c
M regexec.c
M regexp.h

  Log Message:
  ---
  regexec engine - wrap and replace RX_OFFS() with better abstractions

RX_OFFS() exposes a bit too much about how capture buffers are represented.
This adds RX_OFFS_START() and RX_OFFS_END() and RX_OFFS_VALID() to replace
most of the uses of the RX_OFFS() macro or direct access to the rx->offs[]
array. (We add RX_OFFSp() for those rare cases that should have direct
access to the array.) This allows us to replace this logic with more
complicated macros in the future. Pretty much anything using RX_OFFS() is
going to be broken by future changes, so changing the define allows us to
track it down easily.

Not all uses of the rx->offs[] array are converted; some uses are required
for the regex engine internals, but anything outside of the regex engine
should be using the replacement macros, and most things in the regex internals
should use them also.




[Perl/perl5] 77d4fd: test.pl - add support for rtriming fresh perl output

2023-01-11 Thread Yves Orton via perl5-changes
  Branch: refs/heads/yves/re_capture
  Home:   https://github.com/Perl/perl5
  Commit: 77d4fd3fd3c9546e888262d67a07b2b904c47437
  
https://github.com/Perl/perl5/commit/77d4fd3fd3c9546e888262d67a07b2b904c47437
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M t/test.pl

  Log Message:
  ---
  test.pl - add support for rtriming fresh perl output

This makes it easier to do regexp debug tests, where we don't care
about trailing whitespace.

It also fixes the line number reporting for fresh_perl_is() and
fresh_perl_like() so that it shows the actual place where the line
number is located, and it changes the relevant code to work properly
with external $Level overrides.


  Commit: b0e804579a3c3403623d77fa38910b2cab129c37
  
https://github.com/Perl/perl5/commit/b0e804579a3c3403623d77fa38910b2cab129c37
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M handy.h

  Log Message:
  ---
  handy.h - add NewCopy() macro to combine New and Copy.


  Commit: f25e8abd64fcba61e27651d5ddb2f55e41249deb
  
https://github.com/Perl/perl5/commit/f25e8abd64fcba61e27651d5ddb2f55e41249deb
  Author: Yves Orton 
  Date:   2023-01-11 (Wed, 11 Jan 2023)

  Changed paths:
M embed.fnc
M embed.h
M mg.c
M proto.h
M regcomp.c
M regcomp_debug.c
M regcomp_internal.h
M regexec.c
M regexp.h
M t/re/pat_advanced.t
M t/re/re_tests

  Log Message:
  ---
  regcomp.c etc - rework branch reset so it works properly

Branch reset was hacked in without much thought about how it might interact
with other features. Over time we added named capture and recursive patterns
with GOSUB, but I guess because branch reset is somewhat esoteric we didn't
notice the accumulating issues related to it.

The main problem was my original hack used a fairly simple device to give
multiple OPEN/CLOSE opcodes the same target buffer id. When it was introduced
this was fine. When GOSUB was added later, however, we overlooked that this
broke a key part of the book-keeping for GOSUB.

A GOSUB regop needs to know where to jump to, and which close paren to stop
at. However the structure of the regexp program can change from the time the
regop is created. This means we keep track of every OPEN/CLOSE regop we
encounter during parsing, and when something is inserted into the middle of
the program we make sure to move the offsets we store for the OPEN/CLOSE data.
This is essentially keyed and scaled to the number of parens we have seen.
When branch reset is used however the number of OPEN/CLOSE regops is more than
the number of logical buffers we have seen, and we only move one of the
OPEN/CLOSE buffers that is in the branch reset. Which of course breaks things.

Another issue with branch reset is that it creates weird artifacts like this:
/(?|(?<a>a)|(?<b>b))(?&a)(?&b)/ where the (?&b) actually maps to the (?<a>a)
capture buffer because they both have the same id. Another case is that you
cannot check if $+{b} matched and $+{a} did not, because conceptually they
were the same buffer under the hood.

These bugs are now fixed. The "aliasing" of capture buffers to each other is
now done virtually, and under the hood each capture buffer is distinct. We
introduce the concept of a "logical parno" which is the user visible capture
buffer id, and keep it distinct from the true capture buffer id. Most of the
internal logic uses the "true parno" for its business, so a bunch of problems
go away, and we keep maps from logical to physical parnos, and vice versa,
along with a map that gives us the "next physical parno with the same
logical parno". Thus we can quickly skip through the physical capture buffers
to find the one that matched. This means we also have to introduce a
logical_total_parens as well, to complement the already existing total_parens.
The latter refers to the true number of capture buffers. The former represents
the logical number visible to the user.

It is helpful to consider the following table:

  Logical:    $1      $2     $3      $2     $3     $4      $2     $5
  Physical:    1       2      3       4      5      6       7      8
  Next:        0       4      5       7      0      0       0      0
  Pattern:   /(pre)(?|(?<a>a)(?<b>b)|(?<c>c)(?<d>d)(?<e>e)|(?<f>))(post)/

The names are mapped to physical buffers. So $+{b} will show what is in
physical buffer 3. But $3 will show whichever of buffer 3 or 5 matched.
Similarly @{^CAPTURE} will contain 5 elements, not 8. But %+ will contain all
6 named buffers.

Since the need to map these values is rare, we only store these maps when they
are needed and branch reset has been used; when they are NULL it is assumed
that physical and logical buffers are identical.

Currently the way this change is implemented will likely break plug in regexp
engines because they will be missing the new logical_total_parens field at
the very least. Given that the perl internals code is somewhat poorly
abstracted from the regexp