[Perl/perl5] f1601e: Pod::Html: Test --htmldir and --htmlroot separately
Branch: refs/heads/smoke-me/jkeenan/pod-html-docs-conformance-20221209 Home: https://github.com/Perl/perl5 Commit: f1601e42bdfeb71f8b709c00de772536ea074dcf https://github.com/Perl/perl5/commit/f1601e42bdfeb71f8b709c00de772536ea074dcf Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M MANIFEST M ext/Pod-Html/t/htmldir5.t A ext/Pod-Html/t/htmldir7.t Log Message: --- Pod::Html: Test --htmldir and --htmlroot separately The documentation advises that '--htmldir' and '--htmlroot' should not be used in the same call to pod2html, as they are mutually exclusive. However, two files in the test suite have for a long time violated this advice. This commit removes an instance of the "double call" from t/htmldir5.t and moves a test of '--htmlroot' to new test file t/htmldir7.t. (This new test file will, however, use the same POD input as t/htmldir5.t.) There is a slight change in the HTML output, which is reflected in the "expected HTML" in the DATA section of t/htmldir7.t. Test descriptions are modified appropriately.
[Perl/perl5] 475358: Pod::Html: Test --htmldir and --htmlroot separately
Branch: refs/heads/smoke-me/jkeenan/pod-html-docs-conformance-20221209 Home: https://github.com/Perl/perl5 Commit: 475358425e45c3e4019b47d40e473a2702d32bf9 https://github.com/Perl/perl5/commit/475358425e45c3e4019b47d40e473a2702d32bf9 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M MANIFEST M ext/Pod-Html/t/htmldir1.t A ext/Pod-Html/t/htmldir6.t Log Message: --- Pod::Html: Test --htmldir and --htmlroot separately The documentation advises that '--htmldir' and '--htmlroot' should not be used in the same call to pod2html, as they are mutually exclusive. However, two files in the test suite have for a long time violated this advice. This commit removes an instance of the "double call" from t/htmldir1.t and moves a test of '--htmlroot' to new test file t/htmldir6.t. (This new test file will, however, use the same POD input as t/htmldir1.t.) Test descriptions are modified appropriately.
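The rule these two commits bring the test suite in line with (never pass '--htmldir' and '--htmlroot' in the same call) can be illustrated with a small pre-flight check. This is only a sketch under stated assumptions: conflicting_switches() is a hypothetical helper, not part of Pod::Html, and the file name and paths in the usage lines are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Pod::Html): given the argument list
# intended for pod2html, return the names of the two mutually exclusive
# switches if both are present, or an empty list otherwise.
sub conflicting_switches {
    my @args = @_;
    my %seen;
    for my $arg (@args) {
        # Accept both '--htmldir=VALUE' and a bare '--htmldir'.
        if ($arg =~ /\A--(htmldir|htmlroot)(?:=|\z)/) {
            $seen{$1} = 1;
        }
    }
    my @found = sort keys %seen;
    return @found == 2 ? @found : ();
}

# Placeholder arguments, for illustration only.
my @separate = ('--infile=example.pod', '--htmlroot=/docs');
my @combined = (@separate, '--htmldir=/tmp/html');

for my $case (\@separate, \@combined) {
    my @clash = conflicting_switches(@$case);
    print @clash
        ? "do not combine: @clash\n"
        : "no conflict\n";
}
```

Run as written, the first argument list reports no conflict and the second names both switches.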
[Perl/perl5] 1305ea: Increment $VERSION to 1.35 in all .pm files
Branch: refs/heads/smoke-me/jkeenan/pod-html-docs-conformance-20221209 Home: https://github.com/Perl/perl5 Commit: 1305eac89432bdfd1efac7bd5cbe598f8ac10be6 https://github.com/Perl/perl5/commit/1305eac89432bdfd1efac7bd5cbe598f8ac10be6 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/lib/Pod/Html.pm M ext/Pod-Html/lib/Pod/Html/Util.pm M ext/Pod-Html/t/lib/Testing.pm Log Message: --- Increment $VERSION to 1.35 in all .pm files
[Perl/perl5] 9d19ed: Standardize on 4-character indent for switches
Branch: refs/heads/smoke-me/jkeenan/pod-html-docs-conformance-20221209 Home: https://github.com/Perl/perl5 Commit: 9d19ed6a23e543d0447146a365ec77004a62515d https://github.com/Perl/perl5/commit/9d19ed6a23e543d0447146a365ec77004a62515d Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html Log Message: --- Standardize on 4-character indent for switches Commit: 16b9c9a9d357477ec344291a8f608465b9711fce https://github.com/Perl/perl5/commit/16b9c9a9d357477ec344291a8f608465b9711fce Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html M ext/Pod-Html/lib/Pod/Html.pm Log Message: --- Conform --backlink, --nobacklink Commit: 8af6b2e229562820832563a141856200e1004f8e https://github.com/Perl/perl5/commit/8af6b2e229562820832563a141856200e1004f8e Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M MANIFEST A ext/Pod-Html/t/feature3.pod A ext/Pod-Html/t/feature3.t Log Message: --- Pod-Html: explicitly test '--nobacklink' Add dummy POD file and test file. 
Commit: 4735196de2a1d6ce76c716cf467ccff495c24211 https://github.com/Perl/perl5/commit/4735196de2a1d6ce76c716cf467ccff495c24211 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html M ext/Pod-Html/lib/Pod/Html.pm Log Message: --- Conform --cachedir Commit: 83c75414f2d668156d042ea19b7013534ff42f60 https://github.com/Perl/perl5/commit/83c75414f2d668156d042ea19b7013534ff42f60 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/t/feature3.t Log Message: --- Use template value in expected html Commit: edb752f54658e753d09addda56076e93c3664a88 https://github.com/Perl/perl5/commit/edb752f54658e753d09addda56076e93c3664a88 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html M ext/Pod-Html/lib/Pod/Html.pm Log Message: --- Conform docs for '--css' Commit: 35b8e6e4348c1d78de7dcf46ae50b2f8bb29ff0c https://github.com/Perl/perl5/commit/35b8e6e4348c1d78de7dcf46ae50b2f8bb29ff0c Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html M ext/Pod-Html/lib/Pod/Html.pm Log Message: --- Conform '--flush', '--header' and '--help' Commit: e0ac5532aead9f71ea500db2d25355de9720ef8e https://github.com/Perl/perl5/commit/e0ac5532aead9f71ea500db2d25355de9720ef8e Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html M ext/Pod-Html/lib/Pod/Html.pm Log Message: --- Conform '--htmldir' Commit: c5b52623f8393d3243f882501a380d0fa553f791 https://github.com/Perl/perl5/commit/c5b52623f8393d3243f882501a380d0fa553f791 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html M ext/Pod-Html/lib/Pod/Html.pm Log Message: --- Conform '--index'; initial edit on '--htmlroot' Commit: cbe444632e0e969786423f8110ed4527fc0beb0f https://github.com/Perl/perl5/commit/cbe444632e0e969786423f8110ed4527fc0beb0f Author: James E Keenan Date: 2023-01-11 (Wed, 11 
Jan 2023) Changed paths: M ext/Pod-Html/t/feature2.t Log Message: --- Explicit test of '--index' Commit: ddcfe2cc06a2c1f120004d2ee4eb7935cca888d9 https://github.com/Perl/perl5/commit/ddcfe2cc06a2c1f120004d2ee4eb7935cca888d9 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html M ext/Pod-Html/lib/Pod/Html.pm Log Message: --- Correction on '--index'; conform '--infile' Commit: 7f57a5aa7f9b38ca4a3633dd5df68688c4ffb549 https://github.com/Perl/perl5/commit/7f57a5aa7f9b38ca4a3633dd5df68688c4ffb549 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html M ext/Pod-Html/lib/Pod/Html.pm Log Message: --- Conform '--poderrors' Commit: fd2d01b43e071fb6b68197c82c39488b49e46a05 https://github.com/Perl/perl5/commit/fd2d01b43e071fb6b68197c82c39488b49e46a05 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/t/poderr.t Log Message: --- Explicit test for '--poderrors' Commit: 094cdbd9d72927d4b57f74eff5d6bc008ccbd2ed https://github.com/Perl/perl5/commit/094cdbd9d72927d4b57f74eff5d6bc008ccbd2ed Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M ext/Pod-Html/bin/pod2html Log Message: --- Conform '--podroot' Commit: ec6962a66d4b979c8dbc8426b313012ee39763d7
[Perl/perl5] 97fa06: Replace FreeBSD URL's with new HTTPS ones
Branch: refs/heads/blead Home: https://github.com/Perl/perl5 Commit: 97fa06dbb856bb06338778dacd86751fa22f4f73 https://github.com/Perl/perl5/commit/97fa06dbb856bb06338778dacd86751fa22f4f73 Author: Elvin Aslanov Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M README.freebsd M caretx.c Log Message: --- Replace FreeBSD URL's with new HTTPS ones
[Perl/perl5] f91101: Correct one character typo appearing in lib/featur...
Branch: refs/heads/blead Home: https://github.com/Perl/perl5 Commit: f91101a0615d2706c3cc4ebc69a428df2363e927 https://github.com/Perl/perl5/commit/f91101a0615d2706c3cc4ebc69a428df2363e927 Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M lib/feature.pm M regen/feature.pl Log Message: --- Correct one character typo appearing in lib/feature.pm Since lib/feature.pm is a generated file, the actual changes are made in regen/feature.pl, followed by 'make regen' to regenerate lib/feature.pm (and then followed by 'make test_porting') to confirm. Commit: 80474df5fe9d8237ccb1cb224b2a849e54014ecd https://github.com/Perl/perl5/commit/80474df5fe9d8237ccb1cb224b2a849e54014ecd Author: James E Keenan Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M t/porting/regen.t Log Message: --- Hint should advise using 'make regen' Per discussion by @demerphq in https://github.com/Perl/perl5/pull/20682#issuecomment-1377536039. The 'regen' programs should be run with your installed 'perl'. Use single quote in heredoc, as $_ is no longer being interpolated (per @JRaspass in https://github.com/Perl/perl5/pull/20683#discussion_r1066294815). Compare: https://github.com/Perl/perl5/compare/3f11a2855248...80474df5fe9d
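The heredoc remark in the last commit can be illustrated with a short sketch. The message text below is invented for illustration and is not the actual hint printed by t/porting/regen.t; it only shows that a single-quoted heredoc terminator suppresses interpolation of $_.

```perl
use strict;
use warnings;

$_ = 'regen/feature.pl';

# Double-quoted terminator: $_ is interpolated into the text.
my $interpolated = <<"EOM";
Stale file: $_
EOM

# Single-quoted terminator: the body is taken literally, so '$_'
# stays as the two characters '$' and '_'.
my $literal = <<'EOM';
Stale file: $_
EOM

print $interpolated;
print $literal;
```

Once a message no longer needs a variable spliced in, the single-quoted form avoids accidental interpolation of anything that merely looks like one.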
[Perl/perl5] c128f4: t/re/re_rests - extend test to show more buffers
Branch: refs/heads/yves/curlyx_curlym Home: https://github.com/Perl/perl5 Commit: c128f4426b843771b84e2f4e344905eb86dbe427 https://github.com/Perl/perl5/commit/c128f4426b843771b84e2f4e344905eb86dbe427 Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M t/re/re_tests Log Message: --- t/re/re_rests - extend test to show more buffers This is a tricky test, showing more buffers makes it a bit easier to understand if you break it. (Guess what I did?) Commit: c68fac5f4bf38c2c0615f32c63e3f0c98ec1f3bd https://github.com/Perl/perl5/commit/c68fac5f4bf38c2c0615f32c63e3f0c98ec1f3bd Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M regcomp.c M regcomp.h M regcomp_internal.h M t/re/pat.t M t/re/reg_mesg.t Log Message: --- regcomp.c - increase size of CURLY nodes so the min/max is a I32 This allows us to resolve a test inconsistency between CURLYX and CURLY and CURLYM. We use I32 because the existing count logic uses -1 and this keeps everything unsigned compatible. Commit: 22897d307282986e68a28989bdd42ba5430ac503 https://github.com/Perl/perl5/commit/22897d307282986e68a28989bdd42ba5430ac503 Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M regcomp_internal.h M regcomp_study.c Log Message: --- regcomp_study.c - Add a way to disable CURLYX optimisations Also break up the condition so there is one condition per line so it is more readable, and fold repeated binary tests together. This makes it more obvious what the expression is doing. 
Commit: 9423873e18d9216ec98aed4df14ab114104931f8 https://github.com/Perl/perl5/commit/9423873e18d9216ec98aed4df14ab114104931f8 Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M regcomp_debug.c M regcomp_study.c M t/re/pat_re_eval.t Log Message: --- regcomp_study.c - disable CURLYX optimizations when EVAL has been seen anywhere Historically we disabled CURLYX optimizations when they *contained* an EVAL, on the assumption that the optimization might affect how many times, etc., the eval was called. However, this is also true for CURLYX with evals *afterwards*. If the CURLYN or CURLYM optimization can prune off the search space, then an eval afterwards will be affected. And when you take into account GOSUB, it means that an eval in front might be affected by an optimization after it. So for now we disable CURLYN and CURLYM in any pattern with an EVAL. Commit: d07c6e339e942438eda58692f95cd00613408216 https://github.com/Perl/perl5/commit/d07c6e339e942438eda58692f95cd00613408216 Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M regexec.c Log Message: --- regexec.c - rework CLOSE_CAPTURE() macro to take a rex argument This allows it to be used in contexts where rex isn't set up under this name. Commit: b21d696822a07fed9cf0b0029ea94328985ae0b4 https://github.com/Perl/perl5/commit/b21d696822a07fed9cf0b0029ea94328985ae0b4 Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M regcomp.c M regcomp.h Log Message: --- regcomp.h - get rid of EXTRA_STEP defines They are unused these days. Commit: 7a7955ec9b8e2745344cac828b3646b35e251fca https://github.com/Perl/perl5/commit/7a7955ec9b8e2745344cac828b3646b35e251fca Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M regcomp.c Log Message: --- regcomp.c - add whitespace to binary operation The tight & is hard to read.
Commit: dc2e92ac67e541e0f6fd4903e2d0c433f8b274bf https://github.com/Perl/perl5/commit/dc2e92ac67e541e0f6fd4903e2d0c433f8b274bf Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M regcomp_trie.c Log Message: --- regcomp_trie.c - use the indirect types so we are safe to changes We shouldn't assume that a TRIEC is a regcomp_charclass. We have a per-opcode type exactly for this type of use, so let's use it. Commit: 9a58de08f43f69a9a96abfd6b90e0ba314e05f3e https://github.com/Perl/perl5/commit/9a58de08f43f69a9a96abfd6b90e0ba314e05f3e Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M pod/perldebguts.pod M pp_ctl.c M regcomp.c M regcomp.h M regcomp.sym M regcomp_debug.c M regexec.c M regexp.h M regnodes.h M t/re/pat.t M t/re/pat_rt_report.t M t/re/re_tests Log Message: --- regcomp.c - Resolve issues clearing buffers in CURLYX (MAJOR-CHANGE) CURLYX doesn't reset capture buffers properly. It is possible for multiple buffers to be defined at once with values from different iterations of the loop, which doesn't really make sense. An example is this: "foobarfoo"=~/((foo)|(bar))+/ after this matches $1 should equal $2 and $3 should
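The example pattern from the CURLYX log message can be tried directly. This is a minimal sketch: the log excerpt is cut off before it states the expected value of $3, so the code prints that buffer without assuming what it holds. That value depends on whether the perl in use includes this change.

```perl
use strict;
use warnings;

# The pattern quoted in the log message: two capture groups in an
# alternation, inside a quantified outer group.
my @caps;
if ("foobarfoo" =~ /((foo)|(bar))+/) {
    @caps = ($1, $2, $3);
}

# The final loop iteration matched "foo", so $1 and $2 both hold "foo".
# $3 was set on the middle iteration ("bar"); whether that value from an
# earlier iteration is still visible is what the commit addresses.
print join(' ', map { defined $_ ? "'$_'" : 'undef' } @caps), "\n";
```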
[Perl/perl5] a921ad: test.pl - add support for rtriming fresh perl output
Branch: refs/heads/yves/re_capture Home: https://github.com/Perl/perl5 Commit: a921ad28c9d545fd1a83476e36c27ae85e2847ae https://github.com/Perl/perl5/commit/a921ad28c9d545fd1a83476e36c27ae85e2847ae Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M t/test.pl Log Message: --- test.pl - add support for rtriming fresh perl output This makes it easier to do regexp debug tests, where we don't care about trailing whitespace. It also fixes the line number reporting for fresh_perl_is() and fresh_perl_like() so that it shows the actual place where the line number is located, and it changes the relevant code to work properly with external $Level overrides. Commit: 87b8f3376c1b7c826b45287601469ad417f764ac https://github.com/Perl/perl5/commit/87b8f3376c1b7c826b45287601469ad417f764ac Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M handy.h Log Message: --- handy.h - add NewCopy() macro to combine New and Copy. Commit: 9bef0ac56fb62562e6f085ca2982385c51d874cc https://github.com/Perl/perl5/commit/9bef0ac56fb62562e6f085ca2982385c51d874cc Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M embed.fnc M embed.h M mg.c M proto.h M regcomp.c M regcomp_debug.c M regcomp_internal.h M regexec.c M regexp.h M t/re/pat_advanced.t M t/re/re_tests Log Message: --- regcomp.c etc - rework branch reset so it works properly Branch reset was hacked in without much thought about how it might interact with other features. Over time we added named capture and recursive patterns with GOSUB, but I guess because branch reset is somewhat esoteric we didn't notice the accumulating issues related to it. The main problem was my original hack used a fairly simple device to give multiple OPEN/CLOSE opcodes the same target buffer id. When it was introduced this was fine. When GOSUB was added later, however, we overlooked that this broke a key part of the book-keeping for GOSUB.
A GOSUB regop needs to know where to jump to, and which close paren to stop at. However, the structure of the regexp program can change from the time the regop is created. This means we keep track of every OPEN/CLOSE regop we encounter during parsing, and when something is inserted into the middle of the program we make sure to move the offsets we store for the OPEN/CLOSE data. This is essentially keyed and scaled to the number of parens we have seen. When branch reset is used, however, the number of OPEN/CLOSE regops is more than the number of logical buffers we have seen, and we only move one of the OPEN/CLOSE buffers that is in the branch reset. Which of course breaks things. Another issue with branch reset is that it creates weird artifacts like this: /(?|(?a)|(?b))(?)(?)/ where the (?) actually maps to the (?a) capture buffer because they both have the same id. Another case is that you cannot check if $+{b} matched and $+{a} did not, because conceptually they were the same buffer under the hood. These bugs are now fixed. The "aliasing" of capture buffers to each other is now done virtually, and under the hood each capture buffer is distinct. We introduce the concept of a "logical parno" which is the user-visible capture buffer id, and keep it distinct from the true capture buffer id. Most of the internal logic uses the "true parno" for its business, so a bunch of problems go away, and we keep maps from logical to physical parnos, and vice versa, along with a map that gives us the "next physical parno with the same logical parno". Thus we can quickly skip through the physical capture buffers to find the one that matched. This means we also have to introduce a logical_total_parens as well, to complement the already existing total_parens. The latter refers to the true number of capture buffers. The former represents the logical number visible to the user.
It is helpful to consider the following table:

    Logical:    $1  $2  $3  $2  $3  $4  $2  $5
    Physical:    1   2   3   4   5   6   7   8
    Next:        0   4   5   7   0   0   0   0
    Pattern:    /(pre)(?|(?a)(?b)|(?c)(?d)(?e)|(?))(post)/

The names are mapped to physical buffers. So $+{b} will show what is in physical buffer 3. But $3 will show whichever of buffer 3 or 5 matched. Similarly @{^CAPTURE} will contain 5 elements, not 8. But %+ will contain all 6 named buffers. Since the need to map these values is rare, we only store these maps when they are needed and branch reset has been used; when they are NULL it is assumed that physical and logical buffers are identical. Currently the way this change is implemented will likely break plug-in regexp engines because they will be missing the new logical_total_parens field at the very least. Given that the perl internals code is somewhat poorly abstracted from the regexp
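The lookup described above (start at a logical buffer, then follow the "next physical parno with the same logical parno" chain until a buffer that matched is found) can be sketched in a few lines. This is an illustration only, not the C code in the regex engine: the two arrays are copied from the table in the log message, while @first_physical and matched_physical() are hypothetical names introduced here.

```perl
use strict;
use warnings;

# Data from the table (index 0 is unused so that array indices equal
# physical buffer numbers).
my @logical_of = (undef, 1, 2, 3, 2, 3, 4, 2, 5);  # physical -> logical
my @next_same  = (0,     0, 4, 5, 7, 0, 0, 0, 0);  # physical -> next physical
                                                   # with the same logical

# Hypothetical derived map: logical -> first physical buffer.
my @first_physical;
for my $phys (1 .. $#logical_of) {
    $first_physical[ $logical_of[$phys] ] //= $phys;
}

# Hypothetical helper: return the physical buffer that supplies logical
# buffer $log, or 0 if none of its physical buffers matched.
# $did_match is a callback saying whether a physical buffer matched.
sub matched_physical {
    my ($log, $did_match) = @_;
    my $phys = $first_physical[$log] // 0;
    while ($phys) {
        return $phys if $did_match->($phys);
        $phys = $next_same[$phys];    # skip to the next alias, if any
    }
    return 0;
}

# Suppose the second alternative of the branch reset matched, so the
# physical buffers that took part are 1, 4, 5, 6 and 8.
my %matched = map { $_ => 1 } (1, 4, 5, 6, 8);
my $phys = matched_physical(3, sub { $matched{ $_[0] } });
print "\$3 comes from physical buffer $phys\n";
```

With the second alternative matching, the walk for $3 tries physical buffer 3, finds it unset, and follows the chain to buffer 5, which agrees with the statement that $3 shows whichever of buffer 3 or 5 matched.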
[Perl/perl5] 3f11a2: regexec engine - wrap and replace RX_OFFS() with b...
Branch: refs/heads/blead Home: https://github.com/Perl/perl5 Commit: 3f11a2855248134af98ca8d71cf71a3fe736dbae https://github.com/Perl/perl5/commit/3f11a2855248134af98ca8d71cf71a3fe736dbae Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M mg.c M pp.c M pp_ctl.c M pp_hot.c M regcomp.c M regcomp_debug.c M regexec.c M regexp.h Log Message: --- regexec engine - wrap and replace RX_OFFS() with better abstractions RX_OFFS() exposes a bit too much about how capture buffers are represented. This adds RX_OFFS_START() and RX_OFFS_END() and RX_OFFS_VALID() to replace most of the uses of the RX_OFFS() macro or direct access to the rx->offs[] array. (We add RX_OFFSp() for those rare cases that should have direct access to the array.) This allows us to replace this logic with more complicated macros in the future. Pretty much anything using RX_OFFS() is going to be broken by future changes, so changing the define allows us to track it down easily. Not all uses of the rx->offs[] array are converted; some uses are required for the regex engine internals, but anything outside of the regex engine should be using the replacement macros, and most things in the regex internals should use them also.
[Perl/perl5]
Branch: refs/heads/yves/wrap_cap_buf_macro Home: https://github.com/Perl/perl5
[Perl/perl5] a9f676: test.pl - add support for rtriming fresh perl output
Branch: refs/heads/yves/re_capture Home: https://github.com/Perl/perl5 Commit: a9f676271c3aac5dbb2a646da7d370e4f2a51ab9 https://github.com/Perl/perl5/commit/a9f676271c3aac5dbb2a646da7d370e4f2a51ab9 Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M t/test.pl Log Message: --- test.pl - add support for rtriming fresh perl output This makes it easier to do regexp debug tests, where we don't care about trailing whitespace. It also fixes the line number reporting for fresh_perl_is() and fresh_perl_like() so that it shows the actual place where the line number is located, and it changes the relevant code to work properly with external $Level overrides. Commit: 683ebc3f8f6f479fb9a7ba66cab9374ad4f3bbfc https://github.com/Perl/perl5/commit/683ebc3f8f6f479fb9a7ba66cab9374ad4f3bbfc Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M handy.h Log Message: --- handy.h - add NewCopy() macro to combine New and Copy. Commit: 66c1e1e2cd9aee0fc8791faf86917429230db73f https://github.com/Perl/perl5/commit/66c1e1e2cd9aee0fc8791faf86917429230db73f Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M embed.fnc M embed.h M mg.c M proto.h M regcomp.c M regcomp_debug.c M regcomp_internal.h M regexec.c M regexp.h M t/re/pat_advanced.t M t/re/re_tests Log Message: --- regcomp.c etc - rework branch reset so it works properly Branch reset was hacked in without much thought about how it might interact with other features. Over time we added named capture and recursive patterns with GOSUB, but I guess because branch reset is somewhat esoteric we didn't notice the accumulating issues related to it. The main problem was my original hack used a fairly simple device to give multiple OPEN/CLOSE opcodes the same target buffer id. When it was introduced this was fine. When GOSUB was added later, however, we overlooked that this broke a key part of the book-keeping for GOSUB.
A GOSUB regop needs to know where to jump to, and which close paren to stop at. However, the structure of the regexp program can change from the time the regop is created. This means we keep track of every OPEN/CLOSE regop we encounter during parsing, and when something is inserted into the middle of the program we make sure to move the offsets we store for the OPEN/CLOSE data. This is essentially keyed and scaled to the number of parens we have seen. When branch reset is used, however, the number of OPEN/CLOSE regops is more than the number of logical buffers we have seen, and we only move one of the OPEN/CLOSE buffers that is in the branch reset. Which of course breaks things. Another issue with branch reset is that it creates weird artifacts like this: /(?|(?a)|(?b))(?)(?)/ where the (?) actually maps to the (?a) capture buffer because they both have the same id. Another case is that you cannot check if $+{b} matched and $+{a} did not, because conceptually they were the same buffer under the hood. These bugs are now fixed. The "aliasing" of capture buffers to each other is now done virtually, and under the hood each capture buffer is distinct. We introduce the concept of a "logical parno" which is the user-visible capture buffer id, and keep it distinct from the true capture buffer id. Most of the internal logic uses the "true parno" for its business, so a bunch of problems go away, and we keep maps from logical to physical parnos, and vice versa, along with a map that gives us the "next physical parno with the same logical parno". Thus we can quickly skip through the physical capture buffers to find the one that matched. This means we also have to introduce a logical_total_parens as well, to complement the already existing total_parens. The latter refers to the true number of capture buffers. The former represents the logical number visible to the user.
It is helpful to consider the following table:

    Logical:    $1  $2  $3  $2  $3  $4  $2  $5
    Physical:    1   2   3   4   5   6   7   8
    Next:        0   4   5   7   0   0   0   0
    Pattern:    /(pre)(?|(?a)(?b)|(?c)(?d)(?e)|(?))(post)/

The names are mapped to physical buffers. So $+{b} will show what is in physical buffer 3. But $3 will show whichever of buffer 3 or 5 matched. Similarly @{^CAPTURE} will contain 5 elements, not 8. But %+ will contain all 6 named buffers. Since the need to map these values is rare, we only store these maps when they are needed and branch reset has been used; when they are NULL it is assumed that physical and logical buffers are identical. Currently the way this change is implemented will likely break plug-in regexp engines because they will be missing the new logical_total_parens field at the very least. Given that the perl internals code is somewhat poorly abstracted from the regexp
[Perl/perl5] 91d8a5: regexec engine - wrap and replace RX_OFFS() with b...
Branch: refs/heads/yves/wrap_cap_buf_macro Home: https://github.com/Perl/perl5 Commit: 91d8a59942b4e20c72026b53aaad7b89223ab656 https://github.com/Perl/perl5/commit/91d8a59942b4e20c72026b53aaad7b89223ab656 Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M mg.c M pp.c M pp_ctl.c M pp_hot.c M regcomp.c M regcomp_debug.c M regexec.c M regexp.h Log Message: --- regexec engine - wrap and replace RX_OFFS() with better abstractions RX_OFFS() exposes a bit too much about how capture buffers are represented. This adds RX_OFFS_START() and RX_OFFS_END() and RX_OFFS_VALID() to replace most of the uses of the RX_OFFS() macro or direct access to the rx->offs[] array. (We add RX_OFFSp() for those rare cases that should have direct access to the array.) This allows us to replace this logic with more complicated macros in the future. Pretty much anything using RX_OFFS() is going to be broken by future changes, so changing the define allows us to track it down easily. Not all uses of the rx->offs[] array are converted; some uses are required for the regex engine internals, but anything outside of the regex engine should be using the replacement macros, and most things in the regex internals should use them also.
[Perl/perl5] 77d4fd: test.pl - add support for rtriming fresh perl output
Branch: refs/heads/yves/re_capture Home: https://github.com/Perl/perl5 Commit: 77d4fd3fd3c9546e888262d67a07b2b904c47437 https://github.com/Perl/perl5/commit/77d4fd3fd3c9546e888262d67a07b2b904c47437 Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M t/test.pl Log Message: --- test.pl - add support for rtriming fresh perl output This makes it easier to do regexp debug tests, where we don't care about trailing whitespace. It also fixes the line number reporting for fresh_perl_is() and fresh_perl_like() so that it shows the actual place where the line number is located, and it changes the relevant code to work properly with external $Level overrides. Commit: b0e804579a3c3403623d77fa38910b2cab129c37 https://github.com/Perl/perl5/commit/b0e804579a3c3403623d77fa38910b2cab129c37 Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M handy.h Log Message: --- handy.h - add NewCopy() macro to combine New and Copy. Commit: f25e8abd64fcba61e27651d5ddb2f55e41249deb https://github.com/Perl/perl5/commit/f25e8abd64fcba61e27651d5ddb2f55e41249deb Author: Yves Orton Date: 2023-01-11 (Wed, 11 Jan 2023) Changed paths: M embed.fnc M embed.h M mg.c M proto.h M regcomp.c M regcomp_debug.c M regcomp_internal.h M regexec.c M regexp.h M t/re/pat_advanced.t M t/re/re_tests Log Message: --- regcomp.c etc - rework branch reset so it works properly Branch reset was hacked in without much thought about how it might interact with other features. Over time we added named capture and recursive patterns with GOSUB, but I guess because branch reset is somewhat esoteric we didn't notice the accumulating issues related to it. The main problem was my original hack used a fairly simple device to give multiple OPEN/CLOSE opcodes the same target buffer id. When it was introduced this was fine. When GOSUB was added later, however, we overlooked that this broke a key part of the book-keeping for GOSUB.
A GOSUB regop needs to know where to jump to, and which close paren to stop at. However, the structure of the regexp program can change from the time the regop is created. This means we keep track of every OPEN/CLOSE regop we encounter during parsing, and when something is inserted into the middle of the program we make sure to move the offsets we store for the OPEN/CLOSE data. This is essentially keyed and scaled to the number of parens we have seen. When branch reset is used, however, the number of OPEN/CLOSE regops is more than the number of logical buffers we have seen, and we only move one of the OPEN/CLOSE buffers that is in the branch reset. Which of course breaks things. Another issue with branch reset is that it creates weird artifacts like this: /(?|(?a)|(?b))(?)(?)/ where the (?) actually maps to the (?a) capture buffer because they both have the same id. Another case is that you cannot check if $+{b} matched and $+{a} did not, because conceptually they were the same buffer under the hood. These bugs are now fixed. The "aliasing" of capture buffers to each other is now done virtually, and under the hood each capture buffer is distinct. We introduce the concept of a "logical parno" which is the user-visible capture buffer id, and keep it distinct from the true capture buffer id. Most of the internal logic uses the "true parno" for its business, so a bunch of problems go away, and we keep maps from logical to physical parnos, and vice versa, along with a map that gives us the "next physical parno with the same logical parno". Thus we can quickly skip through the physical capture buffers to find the one that matched. This means we also have to introduce a logical_total_parens as well, to complement the already existing total_parens. The latter refers to the true number of capture buffers. The former represents the logical number visible to the user.
It is helpful to consider the following table:

    Logical:    $1  $2  $3  $2  $3  $4  $2  $5
    Physical:    1   2   3   4   5   6   7   8
    Next:        0   4   5   7   0   0   0   0
    Pattern:    /(pre)(?|(?a)(?b)|(?c)(?d)(?e)|(?))(post)/

The names are mapped to physical buffers. So $+{b} will show what is in physical buffer 3. But $3 will show whichever of buffer 3 or 5 matched. Similarly @{^CAPTURE} will contain 5 elements, not 8. But %+ will contain all 6 named buffers. Since the need to map these values is rare, we only store these maps when they are needed and branch reset has been used; when they are NULL it is assumed that physical and logical buffers are identical. Currently the way this change is implemented will likely break plug-in regexp engines because they will be missing the new logical_total_parens field at the very least. Given that the perl internals code is somewhat poorly abstracted from the regexp