In perl.git, the branch blead has been updated <http://perl5.git.perl.org/perl.git/commitdiff/6efb583a3779086d6dce2899767ca920f0eda754?hp=b9a58d500dd75ba783abac92a56e57d41227f62b>
- Log ----------------------------------------------------------------- commit 6efb583a3779086d6dce2899767ca920f0eda754 Merge: b9a58d500d 37b6262ff6 Author: David Mitchell <da...@iabyn.com> Date: Sun Jul 2 21:25:35 2017 +0100 [MERGE] scan_data_t: make fixed/float data array While compiling a pattern, rather than storing information about the current longest fixed and floating substring as individual fields of this struct - such as minlen_fixed, minlen_float etc - create a 2-element array to hold fixed and float data. So for example data->lookbehind_float becomes data->substrs[1].lookbehind etc. This mimics the way that substring data is stored in the regex struct once the pattern has been compiled. Similarly, move some fixed/float-specific flags from data->flags into a new per-substr subfield, data->subfield[i].flags. Also, the fixed substring now has both max and min offset fields, which are set equal to each other. This set of commits should make no functional difference apart from minor changes to debugging output (e.g. fixed offsets displayed as N..N rather than N). The various commits in this branch are mainly concerned with harmonising fixed and float code paths, and finally parameterising them, e.g. data->lookbehind_fixed = ...; data->lookbehind_float = ...; becomes data->substr[0].lookbehind = ...; data->substr[1].lookbehind = ...; and finally becomes for (i = 0; i < 2; i++) { data->substr[i].lookbehind = ...; } The two big advantages of this approach are; 1) it simplifies and rationalises the code, avoiding two similar blocks of code in several places; 2) it may allow for future expansion where there isn't necessarily at most 1 fixed and 1 floating substring. Note that this only affects the regex compile-time code. The run-time code in re_intuit_start is still a mess and could benefit from a similar rationalisation (it already has the arrray[2], but doesn't do much parameterising). commit 37b6262ff693a05099d6f0ccc01328e5223f6a85 Author: David Mitchell <da...@iabyn.com> Date: Sun Jul 2 19:24:07 2017 +0100 regcomp.c: parameterise scan_data_t substrs[] Now that the scan_data_t stores its fixed and floating substring data as a 2-element array, replace various bits of duplicated code which separately handled fixed and floating substrings with for (i = 0; i < 2; i++) loops etc. This makes the code shorter and simpler, and will make it easier in future to expand to more than a single each of fixed+float. There should be no functional changes, except that debugging output now displays N..N rather than just just N for the fixed substring start range (i.e. its now just a subset of float where max == min) M regcomp.c commit 54fc74717cb6983622ca7c2a88c698f8ebf621cd Author: David Mitchell <da...@iabyn.com> Date: Fri Jun 30 12:43:00 2017 +0100 scan_data_t: rename 'longest' field .. to 'cur_is_floating' It's an index into either the fixed or float substring info; the information it provides is whether the currently being captured substring is fixed or floating; it's nothing to do with whether the fixed or the floating is currently the longest. M regcomp.c commit a0e2688aebe4f6e7b0243bf8e1eff7fa78c08e08 Author: David Mitchell <da...@iabyn.com> Date: Fri Jun 30 10:33:37 2017 +0100 regcomp.c: remove float_min_offset etc macro use In this src file, expand all the various macros like #define anchored_offset substrs->data[0].min_offset #define float_min_offset substrs->data[1].min_offset This will later allow parts of the code to be parameterised, e.g. for (i=0; i<1; i++) { substrs->data[i].min_offset = ...; ... } M regcomp.c commit 0981c3b17a5185c79d86712f0fbe832b32591e6c Author: David Mitchell <da...@iabyn.com> Date: Fri Jun 30 10:20:11 2017 +0100 regcomp.c: S_setup_longest(): simplify args Rather than passing e.g. &(r->float_utf8), &(r->float_substr), &(r->float_end_shift), pass the single arg &(r->substrs->data[1]) (float_foo are macros which expand to substrs->data[1].foo) M regcomp.c commit 6e12f832ea033e57dcca8b75bd3a1d75389074c6 Author: David Mitchell <da...@iabyn.com> Date: Fri Jun 30 09:55:32 2017 +0100 regcomp: set fixed max_offset to min_offset previously scan_data_t had the three fields offset_fixed offset_float_min offset_float_max a few commits ago that was converted into a 2 element array (for fixed and float), each with the fields min_offset max_offset where the max_offset was unused in fixed (substrs[0]) case. Instead, set it equal to min_offset This makes the fixed and float code paths more similar. At the same time expand a few of the 'float_max_offset' type macros to make it clearer what's going on. M regcomp.c commit 11683ecb8678a801ac36414c33c4f730f2c18cdf Author: David Mitchell <da...@iabyn.com> Date: Thu Jun 29 21:40:41 2017 +0100 S_study_chunk: have per substring flags Currently the scan_data_t struct has a flags field which contains SF_ and SCF_ flags. Some of the SF_ flags are general; others are specific to the fixed or floating substr. For example there are these 3 flags: SF_BEFORE_MEOL SF_FIX_BEFORE_MEOL SF_FL_BEFORE_MEOL This commit adds a flags field to the per-substring substruct and sets some flags per-substring instead. For example previously we did: now we would do: -------------------------------- -------------------------------------- data->flags |= SF_BEFORE_MEOL unchanged data->flags |= SF_FIX_BEFORE_MEOL data->substrs[0].flags |= SF_BEFORE_MEOL data->flags |= SF_FL_BEFORE_MEOL data->substrs[1].flags |= SF_BEFORE_MEOL This allows us to simplify the code (e.g. eliminating some args from S_setup_longest()) and in future will allow more than one fixed or floating substring. M regcomp.c commit a302f6d6ad88210a4a24a69603c1c6511016d294 Author: David Mitchell <da...@iabyn.com> Date: Thu Jun 29 22:16:19 2017 +0100 regcomp.c: DEBUG_PEEP(): invalid flags DEBUG_PEEP(..., flags) was invoked from 3 functions - however in two of throse functions, the 'flags' local var did *not* contain SF_ and SCF_ bits, so the flag bits were being incorrectly displayed as SF_ etc. In those two functions, change it instead to DEBUG_PEEP(...., 0) M regcomp.c commit f5a36d78ddaf1655ac1208fbeba81ff863cb06be Author: David Mitchell <da...@iabyn.com> Date: Thu Jun 29 20:15:56 2017 +0100 regcomp.c: convert debugging macros to static fns make these 3 macros into thin wrappers around some new static functions, rather than just being huge macros: DEBUG_SHOW_STUDY_FLAGS DEBUG_STUDYDATA DEBUG_PEEP Also, avoid the macros implicitly using local vars: make them into explicit parameters instead (this is one of my pet peeves). M regcomp.c commit a2ca2017bc501f5c42d414bb761d2f1434eb746b Author: David Mitchell <da...@iabyn.com> Date: Thu Jun 29 16:45:39 2017 +0100 make struct scan_data_t->longest an index val In this private data structure used during regex compilation, the 'longest' field was an SV** pointer which was always set to point to one of these two addresses: &(data->substrs[0].str) &(data->substrs[1].str) Instead, just make it a U8 with the value 0 or 1. M regcomp.c commit cb6aef8b59927fcafdf780457956e281415062cb Author: David Mitchell <da...@iabyn.com> Date: Thu Jun 29 16:24:12 2017 +0100 S_setup_longest() pass struct rather than fields Now that a substring is a separate struct, pass as a single pointer rather than as 4 separate args. M regcomp.c commit 6ea771694bdcd55980c7932b2d2fa4aed8d66973 Author: David Mitchell <da...@iabyn.com> Date: Thu Jun 29 15:05:03 2017 +0100 struct scan_data_t: make some fields into an array This private struct is used just within regcomp.c while compiling a pattern. It has a set of fields for a fixed substring, and similar set for floating, e.g. SV *longest_fixed; SV *longest_float; SSize_t *minlen_fixed; SSize_t *minlen_float; etc Instead have a 2 element array, one for fixed, one for float, so e.g. data->offset_float_max becomes data->substrs[1].max_offset There are 3 reasons for doing this. First, it makes the code more regular, and allows a whole substr ptr to be passed as an arg to a function rather than having to pass every individual field; second, it makes the compile-time struct more similar to the runtime struct, which already has such an arrangement; third, it allows for a hypothetical future expansion where there aren't necessarily at most 1 fixed and 1 floating substring. Note that a side effect of this commit has been to change lookbehind_fixed from I32 to SSize_t; lookbehind_float was already SSize_t, so the I32 was probably a bug. M regcomp.c ----------------------------------------------------------------------- Summary of changes: regcomp.c | 672 +++++++++++++++++++++++++++++++++----------------------------- 1 file changed, 358 insertions(+), 314 deletions(-) diff --git a/regcomp.c b/regcomp.c index e2bdacc394..088def552a 100644 --- a/regcomp.c +++ b/regcomp.c @@ -392,7 +392,7 @@ struct RExC_state_t { For each string some basic information is maintained: - - offset or min_offset + - min_offset This is the position the string must appear at, or not before. It also implicitly (when combined with minlenp) tells us how many characters must match before the string we are searching for. @@ -404,6 +404,7 @@ struct RExC_state_t { Only used for floating strings. This is the rightmost point that the string can appear at. If set to SSize_t_MAX it indicates that the string can occur infinitely far to the right. + For fixed strings, it is equal to min_offset. - minlenp A pointer to the minimum number of characters of the pattern that the @@ -443,6 +444,15 @@ struct RExC_state_t { */ +struct scan_data_substrs { + SV *str; /* longest substring found in pattern */ + SSize_t min_offset; /* earliest point in string it can appear */ + SSize_t max_offset; /* latest point in string it can appear */ + SSize_t *minlenp; /* pointer to the minlen relevant to the string */ + SSize_t lookbehind; /* is the pos of the string modified by LB */ + I32 flags; /* per substring SF_* and SCF_* flags */ +}; + typedef struct scan_data_t { /*I32 len_min; unused */ /*I32 len_delta; unused */ @@ -452,17 +462,14 @@ typedef struct scan_data_t { SSize_t last_end; /* min value, <0 unless valid. */ SSize_t last_start_min; SSize_t last_start_max; - SV **longest; /* Either &l_fixed, or &l_float. */ - SV *longest_fixed; /* longest fixed string found in pattern */ - SSize_t offset_fixed; /* offset where it starts */ - SSize_t *minlen_fixed; /* pointer to the minlen relevant to the string */ - I32 lookbehind_fixed; /* is the position of the string modfied by LB */ - SV *longest_float; /* longest floating string found in pattern */ - SSize_t offset_float_min; /* earliest point in string it can appear */ - SSize_t offset_float_max; /* latest point in string it can appear */ - SSize_t *minlen_float; /* pointer to the minlen relevant to the string */ - SSize_t lookbehind_float; /* is the pos of the string modified by LB */ - I32 flags; + U8 cur_is_floating; /* whether the last_* values should be set as + * the next fixed (0) or floating (1) + * substring */ + + /* [0] is longest fixed substring so far, [1] is longest float so far */ + struct scan_data_substrs substrs[2]; + + I32 flags; /* common SF_* and SCF_* flags */ I32 whilem_c; SSize_t *last_closep; regnode_ssc *start_class; @@ -472,23 +479,21 @@ typedef struct scan_data_t { * Forward declarations for pregcomp()'s friends. */ -static const scan_data_t zero_scan_data = - { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ,0}; +static const scan_data_t zero_scan_data = { + 0, 0, NULL, 0, 0, 0, 0, + { + { NULL, 0, 0, 0, 0, 0 }, + { NULL, 0, 0, 0, 0, 0 }, + }, + 0, 0, NULL, NULL +}; + +/* study flags */ -#define SF_BEFORE_EOL (SF_BEFORE_SEOL|SF_BEFORE_MEOL) #define SF_BEFORE_SEOL 0x0001 #define SF_BEFORE_MEOL 0x0002 -#define SF_FIX_BEFORE_EOL (SF_FIX_BEFORE_SEOL|SF_FIX_BEFORE_MEOL) -#define SF_FL_BEFORE_EOL (SF_FL_BEFORE_SEOL|SF_FL_BEFORE_MEOL) - -#define SF_FIX_SHIFT_EOL (+2) -#define SF_FL_SHIFT_EOL (+4) - -#define SF_FIX_BEFORE_SEOL (SF_BEFORE_SEOL << SF_FIX_SHIFT_EOL) -#define SF_FIX_BEFORE_MEOL (SF_BEFORE_MEOL << SF_FIX_SHIFT_EOL) +#define SF_BEFORE_EOL (SF_BEFORE_SEOL|SF_BEFORE_MEOL) -#define SF_FL_BEFORE_SEOL (SF_BEFORE_SEOL << SF_FL_SHIFT_EOL) -#define SF_FL_BEFORE_MEOL (SF_BEFORE_MEOL << SF_FL_SHIFT_EOL) /* 0x20 */ #define SF_IS_INF 0x0040 #define SF_HAS_PAR 0x0080 #define SF_IN_PAR 0x0100 @@ -970,65 +975,123 @@ Perl_re_indentf(pTHX_ const char *fmt, U32 depth, ...) #define DEBUG_SHOW_STUDY_FLAG(flags,flag) \ if ((flags) & flag) Perl_re_printf( aTHX_ "%s ", #flag) -#define DEBUG_SHOW_STUDY_FLAGS(flags,open_str,close_str) \ - if ( ( flags ) ) { \ - Perl_re_printf( aTHX_ "%s", open_str); \ - DEBUG_SHOW_STUDY_FLAG(flags,SF_FL_BEFORE_SEOL); \ - DEBUG_SHOW_STUDY_FLAG(flags,SF_FL_BEFORE_MEOL); \ - DEBUG_SHOW_STUDY_FLAG(flags,SF_IS_INF); \ - DEBUG_SHOW_STUDY_FLAG(flags,SF_HAS_PAR); \ - DEBUG_SHOW_STUDY_FLAG(flags,SF_IN_PAR); \ - DEBUG_SHOW_STUDY_FLAG(flags,SF_HAS_EVAL); \ - DEBUG_SHOW_STUDY_FLAG(flags,SCF_DO_SUBSTR); \ - DEBUG_SHOW_STUDY_FLAG(flags,SCF_DO_STCLASS_AND); \ - DEBUG_SHOW_STUDY_FLAG(flags,SCF_DO_STCLASS_OR); \ - DEBUG_SHOW_STUDY_FLAG(flags,SCF_DO_STCLASS); \ - DEBUG_SHOW_STUDY_FLAG(flags,SCF_WHILEM_VISITED_POS); \ - DEBUG_SHOW_STUDY_FLAG(flags,SCF_TRIE_RESTUDY); \ - DEBUG_SHOW_STUDY_FLAG(flags,SCF_SEEN_ACCEPT); \ - DEBUG_SHOW_STUDY_FLAG(flags,SCF_TRIE_DOING_RESTUDY); \ - DEBUG_SHOW_STUDY_FLAG(flags,SCF_IN_DEFINE); \ - Perl_re_printf( aTHX_ "%s", close_str); \ - } - - -#define DEBUG_STUDYDATA(str,data,depth) \ -DEBUG_OPTIMISE_MORE_r(if(data){ \ - Perl_re_indentf( aTHX_ "" str "Pos:%" IVdf "/%" IVdf \ - " Flags: 0x%" UVXf, \ - depth, \ - (IV)((data)->pos_min), \ - (IV)((data)->pos_delta), \ - (UV)((data)->flags) \ - ); \ - DEBUG_SHOW_STUDY_FLAGS((data)->flags," [ ","]"); \ - Perl_re_printf( aTHX_ \ - " Whilem_c: %" IVdf " Lcp: %" IVdf " %s", \ - (IV)((data)->whilem_c), \ - (IV)((data)->last_closep ? *((data)->last_closep) : -1), \ - is_inf ? "INF " : "" \ - ); \ - if ((data)->last_found) \ - Perl_re_printf( aTHX_ \ - "Last:'%s' %" IVdf ":%" IVdf "/%" IVdf \ - " %sFixed:'%s' @ %" IVdf \ - " %sFloat: '%s' @ %" IVdf "/%" IVdf, \ - SvPVX_const((data)->last_found), \ - (IV)((data)->last_end), \ - (IV)((data)->last_start_min), \ - (IV)((data)->last_start_max), \ - ((data)->longest && \ - (data)->longest==&((data)->longest_fixed)) ? "*" : "", \ - SvPVX_const((data)->longest_fixed), \ - (IV)((data)->offset_fixed), \ - ((data)->longest && \ - (data)->longest==&((data)->longest_float)) ? "*" : "", \ - SvPVX_const((data)->longest_float), \ - (IV)((data)->offset_float_min), \ - (IV)((data)->offset_float_max) \ - ); \ - Perl_re_printf( aTHX_ "\n"); \ -}); + +#ifdef DEBUGGING +static void +S_debug_show_study_flags(pTHX_ U32 flags, const char *open_str, + const char *close_str) +{ + if (!flags) + return; + + Perl_re_printf( aTHX_ "%s", open_str); + DEBUG_SHOW_STUDY_FLAG(flags, SF_BEFORE_SEOL); + DEBUG_SHOW_STUDY_FLAG(flags, SF_BEFORE_MEOL); + DEBUG_SHOW_STUDY_FLAG(flags, SF_IS_INF); + DEBUG_SHOW_STUDY_FLAG(flags, SF_HAS_PAR); + DEBUG_SHOW_STUDY_FLAG(flags, SF_IN_PAR); + DEBUG_SHOW_STUDY_FLAG(flags, SF_HAS_EVAL); + DEBUG_SHOW_STUDY_FLAG(flags, SCF_DO_SUBSTR); + DEBUG_SHOW_STUDY_FLAG(flags, SCF_DO_STCLASS_AND); + DEBUG_SHOW_STUDY_FLAG(flags, SCF_DO_STCLASS_OR); + DEBUG_SHOW_STUDY_FLAG(flags, SCF_DO_STCLASS); + DEBUG_SHOW_STUDY_FLAG(flags, SCF_WHILEM_VISITED_POS); + DEBUG_SHOW_STUDY_FLAG(flags, SCF_TRIE_RESTUDY); + DEBUG_SHOW_STUDY_FLAG(flags, SCF_SEEN_ACCEPT); + DEBUG_SHOW_STUDY_FLAG(flags, SCF_TRIE_DOING_RESTUDY); + DEBUG_SHOW_STUDY_FLAG(flags, SCF_IN_DEFINE); + Perl_re_printf( aTHX_ "%s", close_str); +} + + +static void +S_debug_studydata(pTHX_ const char *where, scan_data_t *data, + U32 depth, int is_inf) +{ + GET_RE_DEBUG_FLAGS_DECL; + + DEBUG_OPTIMISE_MORE_r({ + if (!data) + return; + Perl_re_indentf(aTHX_ "%s: Pos:%" IVdf "/%" IVdf " Flags: 0x%" UVXf, + depth, + where, + (IV)data->pos_min, + (IV)data->pos_delta, + (UV)data->flags + ); + + S_debug_show_study_flags(aTHX_ data->flags," [","]"); + + Perl_re_printf( aTHX_ + " Whilem_c: %" IVdf " Lcp: %" IVdf " %s", + (IV)data->whilem_c, + (IV)(data->last_closep ? *((data)->last_closep) : -1), + is_inf ? "INF " : "" + ); + + if (data->last_found) { + int i; + Perl_re_printf(aTHX_ + "Last:'%s' %" IVdf ":%" IVdf "/%" IVdf, + SvPVX_const(data->last_found), + (IV)data->last_end, + (IV)data->last_start_min, + (IV)data->last_start_max + ); + + for (i = 0; i < 2; i++) { + Perl_re_printf(aTHX_ + " %s%s: '%s' @ %" IVdf "/%" IVdf, + data->cur_is_floating == i ? "*" : "", + i ? "Float" : "Fixed", + SvPVX_const(data->substrs[i].str), + (IV)data->substrs[i].min_offset, + (IV)data->substrs[i].max_offset + ); + S_debug_show_study_flags(aTHX_ data->substrs[i].flags," [","]"); + } + } + + Perl_re_printf( aTHX_ "\n"); + }); +} + + +static void +S_debug_peep(pTHX_ const char *str, const RExC_state_t *pRExC_state, + regnode *scan, U32 depth, U32 flags) +{ + GET_RE_DEBUG_FLAGS_DECL; + + DEBUG_OPTIMISE_r({ + regnode *Next; + + if (!scan) + return; + Next = regnext(scan); + regprop(RExC_rx, RExC_mysv, scan, NULL, pRExC_state); + Perl_re_indentf( aTHX_ "%s>%3d: %s (%d)", + depth, + str, + REG_NODE_NUM(scan), SvPV_nolen_const(RExC_mysv), + Next ? (REG_NODE_NUM(Next)) : 0 ); + S_debug_show_study_flags(aTHX_ flags," [ ","]"); + Perl_re_printf( aTHX_ "\n"); + }); +} + + +# define DEBUG_STUDYDATA(where, data, depth, is_inf) \ + S_debug_studydata(aTHX_ where, data, depth, is_inf) + +# define DEBUG_PEEP(str, scan, depth, flags) \ + S_debug_peep(aTHX_ str, pRExC_state, scan, depth, flags) + +#else +# define DEBUG_STUDYDATA(where, data, depth, is_inf) NOOP +# define DEBUG_PEEP(str, scan, depth, flags) NOOP +#endif /* ========================================================= @@ -1200,7 +1263,7 @@ S_cntrl_to_mnemonic(const U8 c) } /* Mark that we cannot extend a found fixed substring at this point. - Update the longest found anchored substring and the longest found + Update the longest found anchored substring or the longest found floating substrings if needed. */ STATIC void @@ -1208,42 +1271,38 @@ S_scan_commit(pTHX_ const RExC_state_t *pRExC_state, scan_data_t *data, SSize_t *minlenp, int is_inf) { const STRLEN l = CHR_SVLEN(data->last_found); - const STRLEN old_l = CHR_SVLEN(*data->longest); + SV * const longest_sv = data->substrs[data->cur_is_floating].str; + const STRLEN old_l = CHR_SVLEN(longest_sv); GET_RE_DEBUG_FLAGS_DECL; PERL_ARGS_ASSERT_SCAN_COMMIT; if ((l >= old_l) && ((l > old_l) || (data->flags & SF_BEFORE_EOL))) { - SvSetMagicSV(*data->longest, data->last_found); - if (*data->longest == data->longest_fixed) { - data->offset_fixed = l ? data->last_start_min : data->pos_min; - if (data->flags & SF_BEFORE_EOL) - data->flags - |= ((data->flags & SF_BEFORE_EOL) << SF_FIX_SHIFT_EOL); - else - data->flags &= ~SF_FIX_BEFORE_EOL; - data->minlen_fixed=minlenp; - data->lookbehind_fixed=0; - } - else { /* *data->longest == data->longest_float */ - data->offset_float_min = l ? data->last_start_min : data->pos_min; - data->offset_float_max = (l + const U8 i = data->cur_is_floating; + SvSetMagicSV(longest_sv, data->last_found); + data->substrs[i].min_offset = l ? data->last_start_min : data->pos_min; + + if (!i) /* fixed */ + data->substrs[0].max_offset = data->substrs[0].min_offset; + else { /* float */ + data->substrs[1].max_offset = (l ? data->last_start_max : (data->pos_delta > SSize_t_MAX - data->pos_min ? SSize_t_MAX : data->pos_min + data->pos_delta)); if (is_inf - || (STRLEN)data->offset_float_max > (STRLEN)SSize_t_MAX) - data->offset_float_max = SSize_t_MAX; - if (data->flags & SF_BEFORE_EOL) - data->flags - |= ((data->flags & SF_BEFORE_EOL) << SF_FL_SHIFT_EOL); - else - data->flags &= ~SF_FL_BEFORE_EOL; - data->minlen_float=minlenp; - data->lookbehind_float=0; - } + || (STRLEN)data->substrs[1].max_offset > (STRLEN)SSize_t_MAX) + data->substrs[1].max_offset = SSize_t_MAX; + } + + if (data->flags & SF_BEFORE_EOL) + data->substrs[i].flags |= (data->flags & SF_BEFORE_EOL); + else + data->substrs[i].flags &= ~SF_BEFORE_EOL; + data->substrs[i].minlenp = minlenp; + data->substrs[i].lookbehind = 0; } + SvCUR_set(data->last_found, 0); { SV * const sv = data->last_found; @@ -1255,7 +1314,7 @@ S_scan_commit(pTHX_ const RExC_state_t *pRExC_state, scan_data_t *data, } data->last_end = -1; data->flags &= ~SF_BEFORE_EOL; - DEBUG_STUDYDATA("commit: ",data,0); + DEBUG_STUDYDATA("commit", data, 0, is_inf); } /* An SSC is just a regnode_charclass_posix with an extra field: the inversion @@ -3621,17 +3680,6 @@ S_construct_ahocorasick_from_trie(pTHX_ RExC_state_t *pRExC_state, regnode *sour } -#define DEBUG_PEEP(str,scan,depth) \ - DEBUG_OPTIMISE_r({if (scan){ \ - regnode *Next = regnext(scan); \ - regprop(RExC_rx, RExC_mysv, scan, NULL, pRExC_state);\ - Perl_re_indentf( aTHX_ "" str ">%3d: %s (%d)", \ - depth, REG_NODE_NUM(scan), SvPV_nolen_const(RExC_mysv),\ - Next ? (REG_NODE_NUM(Next)) : 0 );\ - DEBUG_SHOW_STUDY_FLAGS(flags," [ ","]");\ - Perl_re_printf( aTHX_ "\n"); \ - }}); - /* The below joins as many adjacent EXACTish nodes as possible into a single * one. The regop may be changed if the node(s) contain certain sequences that * require special handling. The joining is only done if: @@ -3786,7 +3834,7 @@ S_join_exact(pTHX_ RExC_state_t *pRExC_state, regnode *scan, PERL_UNUSED_ARG(flags); PERL_UNUSED_ARG(val); #endif - DEBUG_PEEP("join",scan,depth); + DEBUG_PEEP("join", scan, depth, 0); /* Look through the subsequent nodes in the chain. Skip NOTHING, merge * EXACT ones that are mergeable to the current one. */ @@ -3800,7 +3848,7 @@ S_join_exact(pTHX_ RExC_state_t *pRExC_state, regnode *scan, if (OP(n) == TAIL || n > next) stringok = 0; if (PL_regkind[OP(n)] == NOTHING) { - DEBUG_PEEP("skip:",n,depth); + DEBUG_PEEP("skip:", n, depth, 0); NEXT_OFF(scan) += NEXT_OFF(n); next = n + NODE_STEP_REGNODE; #ifdef DEBUGGING @@ -3820,7 +3868,7 @@ S_join_exact(pTHX_ RExC_state_t *pRExC_state, regnode *scan, if (oldl + STR_LEN(n) > U8_MAX) break; - DEBUG_PEEP("merg",n,depth); + DEBUG_PEEP("merg", n, depth, 0); merged++; NEXT_OFF(scan) += NEXT_OFF(n); @@ -3837,7 +3885,7 @@ S_join_exact(pTHX_ RExC_state_t *pRExC_state, regnode *scan, #ifdef EXPERIMENTAL_INPLACESCAN if (flags && !NEXT_OFF(n)) { - DEBUG_PEEP("atch", val, depth); + DEBUG_PEEP("atch", val, depth, 0); if (reg_off_by_arg[OP(n)]) { ARG_SET(n, val - n); } @@ -4068,7 +4116,7 @@ S_join_exact(pTHX_ RExC_state_t *pRExC_state, regnode *scan, n++; } #endif - DEBUG_OPTIMISE_r(if (merged){DEBUG_PEEP("finl",scan,depth)}); + DEBUG_OPTIMISE_r(if (merged){DEBUG_PEEP("finl", scan, depth, 0);}); return stopnow; } @@ -4182,8 +4230,8 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp, the folded version may be shorter) */ bool unfolded_multi_char = FALSE; /* Peephole optimizer: */ - DEBUG_STUDYDATA("Peep:", data, depth); - DEBUG_PEEP("Peep", scan, depth); + DEBUG_STUDYDATA("Peep", data, depth, is_inf); + DEBUG_PEEP("Peep", scan, depth, flags); /* The reason we do this here is that we need to deal with things like @@ -4227,14 +4275,14 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp, StructCopy(&zero_scan_data, &data_fake, scan_data_t); scan = regnext(scan); assert( OP(scan) == IFTHEN ); - DEBUG_PEEP("expect IFTHEN", scan, depth); + DEBUG_PEEP("expect IFTHEN", scan, depth, flags); data_fake.last_closep= &fake_last_close; minlen = *minlenp; next = regnext(scan); scan = NEXTOPER(NEXTOPER(scan)); - DEBUG_PEEP("scan", scan, depth); - DEBUG_PEEP("next", next, depth); + DEBUG_PEEP("scan", scan, depth, flags); + DEBUG_PEEP("next", next, depth, flags); /* we suppose the run is continuous, last=next... * NOTE we dont use the return here! */ @@ -4278,7 +4326,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp, I32 f = 0; regnode_ssc this_class; - DEBUG_PEEP("Branch", scan, depth); + DEBUG_PEEP("Branch", scan, depth, flags); num++; StructCopy(&zero_scan_data, &data_fake, scan_data_t); @@ -4343,7 +4391,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp, else data->pos_delta += max1 - min1; if (max1 != min1 || is_inf) - data->longest = &(data->longest_float); + data->cur_is_floating = 1; } min += min1; if (delta == SSize_t_MAX @@ -4781,16 +4829,16 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp, RExC_study_chunk_recursed_bytes, U8); } /* we havent recursed into this paren yet, so recurse into it */ - DEBUG_STUDYDATA("gosub-set:", data,depth); + DEBUG_STUDYDATA("gosub-set", data, depth, is_inf); PAREN_SET(RExC_study_chunk_recursed + (recursed_depth * RExC_study_chunk_recursed_bytes), paren); my_recursed_depth= recursed_depth + 1; } else { - DEBUG_STUDYDATA("gosub-inf:", data,depth); + DEBUG_STUDYDATA("gosub-inf", data, depth, is_inf); /* some form of infinite recursion, assume infinite length * */ if (flags & SCF_DO_SUBSTR) { scan_commit(pRExC_state, data, minlenp, is_inf); - data->longest = &(data->longest_float); + data->cur_is_floating = 1; } is_inf = is_inf_internal = 1; if (flags & SCF_DO_STCLASS_OR) /* Allow everything */ @@ -4828,8 +4876,8 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp, newframe->prev_recursed_depth = recursed_depth; newframe->this_prev_frame= frame; - DEBUG_STUDYDATA("frame-new:",data,depth); - DEBUG_PEEP("fnew", scan, depth); + DEBUG_STUDYDATA("frame-new", data, depth, is_inf); + DEBUG_PEEP("fnew", scan, depth, flags); frame = newframe; scan = start; @@ -4918,7 +4966,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp, } data->pos_delta += min_subtract; if (min_subtract) { - data->longest = &(data->longest_float); + data->cur_is_floating = 1; /* float */ } } @@ -4985,7 +5033,7 @@ S_study_chunk(pTHX_ RExC_state_t *pRExC_state, regnode **scanp, if (flags & SCF_DO_SUBSTR) { scan_commit(pRExC_state, data, minlenp, is_inf); /* Cannot extend fixed substrings */ - data->longest = &(data->longest_float); + data->cur_is_floating = 1; /* float */ } is_inf = is_inf_internal = 1; scan = regnext(scan); @@ -5323,7 +5371,7 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", ? SSize_t_MAX : data->pos_min + data->pos_delta - last_chrs; } - data->longest = &(data->longest_float); + data->cur_is_floating = 1; /* float */ } SvREFCNT_dec(last_str); } @@ -5347,7 +5395,7 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", if (flags & SCF_DO_SUBSTR) { /* Cannot expect anything... */ scan_commit(pRExC_state, data, minlenp, is_inf); - data->longest = &(data->longest_float); + data->cur_is_floating = 1; /* float */ } is_inf = is_inf_internal = 1; if (flags & SCF_DO_STCLASS_OR) { @@ -5394,7 +5442,7 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", scan_commit(pRExC_state, data, minlenp, is_inf); data->pos_min += 1; data->pos_delta += 1; - data->longest = &(data->longest_float); + data->cur_is_floating = 1; /* float */ } } else if (REGNODE_SIMPLE(OP(scan))) { @@ -5666,6 +5714,8 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", else data_fake.last_closep = &fake; data_fake.flags = 0; + data_fake.substrs[0].flags = 0; + data_fake.substrs[1].flags = 0; data_fake.pos_delta = delta; if (is_inf) data_fake.flags |= SF_IS_INF; @@ -5708,29 +5758,29 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", data->flags |= SF_HAS_EVAL; data->whilem_c = data_fake.whilem_c; if ((flags & SCF_DO_SUBSTR) && data_fake.last_found) { + int i; if (RExC_rx->minlen<*minnextp) RExC_rx->minlen=*minnextp; scan_commit(pRExC_state, &data_fake, minnextp, is_inf); SvREFCNT_dec_NN(data_fake.last_found); - if ( data_fake.minlen_fixed != minlenp ) - { - data->offset_fixed= data_fake.offset_fixed; - data->minlen_fixed= data_fake.minlen_fixed; - data->lookbehind_fixed+= scan->flags; - } - if ( data_fake.minlen_float != minlenp ) - { - data->minlen_float= data_fake.minlen_float; - data->offset_float_min=data_fake.offset_float_min; - data->offset_float_max=data_fake.offset_float_max; - data->lookbehind_float+= scan->flags; + for (i = 0; i < 2; i++) { + if (data_fake.substrs[i].minlenp != minlenp) { + data->substrs[i].min_offset = + data_fake.substrs[i].min_offset; + data->substrs[i].max_offset = + data_fake.substrs[i].max_offset; + data->substrs[i].minlenp = + data_fake.substrs[i].minlenp; + data->substrs[i].lookbehind += scan->flags; + } } } } } #endif } + else if (OP(scan) == OPEN) { if (stopparen != (I32)ARG(scan)) pars++; @@ -5767,7 +5817,7 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", { if (flags & SCF_DO_SUBSTR) { scan_commit(pRExC_state, data, minlenp, is_inf); - data->longest = &(data->longest_float); + data->cur_is_floating = 1; /* float */ } is_inf = is_inf_internal = 1; if (flags & SCF_DO_STCLASS_OR) /* Allow everything */ @@ -5878,7 +5928,7 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", data->pos_min += min1; data->pos_delta += max1 - min1; if (max1 != min1 || is_inf) - data->longest = &(data->longest_float); + data->cur_is_floating = 1; /* float */ } min += min1; if (delta != SSize_t_MAX) @@ -5922,7 +5972,7 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", data->pos_min += trie->minlen; data->pos_delta += (trie->maxlen - trie->minlen); if (trie->maxlen != trie->minlen) - data->longest = &(data->longest_float); + data->cur_is_floating = 1; /* float */ } if (trie->jump) /* no more substrings -- for now /grr*/ flags &= ~SCF_DO_SUBSTR; @@ -5939,8 +5989,8 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", /* we need to unwind recursion. */ depth = depth - 1; - DEBUG_STUDYDATA("frame-end:",data,depth); - DEBUG_PEEP("fend", scan, depth); + DEBUG_STUDYDATA("frame-end", data, depth, is_inf); + DEBUG_PEEP("fend", scan, depth, flags); /* restore previous context */ last = frame->last_regnode; @@ -5954,7 +6004,7 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", } assert(!frame); - DEBUG_STUDYDATA("pre-fin:",data,depth); + DEBUG_STUDYDATA("pre-fin", data, depth, is_inf); *scanp = scan; *deltap = is_inf_internal ? SSize_t_MAX : delta; @@ -5976,7 +6026,7 @@ Perl_re_printf( aTHX_ "LHS=%" UVuf " RHS=%" UVuf "\n", if (flags & SCF_TRIE_RESTUDY) data->flags |= SCF_TRIE_RESTUDY; - DEBUG_STUDYDATA("post-fin:",data,depth); + DEBUG_STUDYDATA("post-fin", data, depth, is_inf); { SSize_t final_minlen= min < stopmin ? min : stopmin; @@ -6694,10 +6744,10 @@ S_compile_runtime_code(pTHX_ RExC_state_t * const pRExC_state, STATIC bool -S_setup_longest(pTHX_ RExC_state_t *pRExC_state, SV* sv_longest, - SV** rx_utf8, SV** rx_substr, SSize_t* rx_end_shift, - SSize_t lookbehind, SSize_t offset, SSize_t *minlen, - STRLEN longest_length, bool eol, bool meol) +S_setup_longest(pTHX_ RExC_state_t *pRExC_state, + struct reg_substr_datum *rsd, + struct scan_data_substrs *sub, + STRLEN longest_length) { /* This is the common code for setting up the floating and fixed length * string data extracted from Perl_re_op_compile() below. Returns a boolean @@ -6705,6 +6755,8 @@ S_setup_longest(pTHX_ RExC_state_t *pRExC_state, SV* sv_longest, I32 t; SSize_t ml; + bool eol = cBOOL(sub->flags & SF_BEFORE_EOL); + bool meol = cBOOL(sub->flags & SF_BEFORE_MEOL); if (! (longest_length || (eol /* Can't have SEOL and MULTI */ @@ -6718,29 +6770,29 @@ S_setup_longest(pTHX_ RExC_state_t *pRExC_state, SV* sv_longest, /* copy the information about the longest from the reg_scan_data over to the program. */ - if (SvUTF8(sv_longest)) { - *rx_utf8 = sv_longest; - *rx_substr = NULL; + if (SvUTF8(sub->str)) { + rsd->substr = NULL; + rsd->utf8_substr = sub->str; } else { - *rx_substr = sv_longest; - *rx_utf8 = NULL; + rsd->substr = sub->str; + rsd->utf8_substr = NULL; } /* end_shift is how many chars that must be matched that follow this item. We calculate it ahead of time as once the lookbehind offset is added in we lose the ability to correctly calculate it.*/ - ml = minlen ? *(minlen) : (SSize_t)longest_length; - *rx_end_shift = ml - offset + ml = sub->minlenp ? *(sub->minlenp) : (SSize_t)longest_length; + rsd->end_shift = ml - sub->min_offset - longest_length /* XXX SvTAIL is always false here - did you mean FBMcf_TAIL * intead? - DAPM - + (SvTAIL(sv_longest) != 0) + + (SvTAIL(sub->str) != 0) */ - + lookbehind; + + sub->lookbehind; t = (eol/* Can't have SEOL and MULTI */ && (! meol || (RExC_flags & RXf_PMf_MULTILINE))); - fbm_compile(sv_longest, t ? FBMcf_TAIL : 0); + fbm_compile(sub->str, t ? FBMcf_TAIL : 0); return TRUE; } @@ -7371,12 +7423,14 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count, if (!(RExC_seen & REG_TOP_LEVEL_BRANCHES_SEEN)) { /* Only one top-level choice. */ SSize_t fake; - STRLEN longest_float_length, longest_fixed_length; + STRLEN longest_length[2]; regnode_ssc ch_class; /* pointed to by data */ int stclass_flag; SSize_t last_close = 0; /* pointed to by data */ regnode *first= scan; regnode *first_next= regnext(first); + int i; + /* * Skip introductions and multiplicators >= 1 * so that we can extract the 'meat' of the pattern that must @@ -7418,7 +7472,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count, /* Starting-point info. */ again: - DEBUG_PEEP("first:",first,0); + DEBUG_PEEP("first:", first, 0, 0); /* Ignore EXACT as we deal with it later. */ if (PL_regkind[OP(first)] == EXACT) { if (OP(first) == EXACT || OP(first) == EXACTL) @@ -7499,13 +7553,13 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count, * earlier string may buy us something the later one won't.] */ - data.longest_fixed = newSVpvs(""); - data.longest_float = newSVpvs(""); + data.substrs[0].str = newSVpvs(""); + data.substrs[1].str = newSVpvs(""); data.last_found = newSVpvs(""); - data.longest = &(data.longest_fixed); + data.cur_is_floating = 0; /* initially any found substring is fixed */ ENTER_with_name("study_chunk"); - SAVEFREESV(data.longest_fixed); - SAVEFREESV(data.longest_float); + SAVEFREESV(data.substrs[0].str); + SAVEFREESV(data.substrs[1].str); SAVEFREESV(data.last_found); first = scan; if (!ri->regstclass) { @@ -7528,7 +7582,7 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count, CHECK_RESTUDY_GOTO_butfirst(LEAVE_with_name("study_chunk")); - if ( RExC_npar == 1 && data.longest == &(data.longest_fixed) + if ( RExC_npar == 1 && !data.cur_is_floating && data.last_start_min == 0 && data.last_end > 0 && !RExC_seen_zerolen && !(RExC_seen & REG_VERBARG_SEEN) @@ -7538,62 +7592,49 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count, } scan_commit(pRExC_state, &data,&minlen,0); - longest_float_length = CHR_SVLEN(data.longest_float); - - if (! ((SvCUR(data.longest_fixed) /* ok to leave SvCUR */ - && data.offset_fixed == data.offset_float_min - && SvCUR(data.longest_fixed) == SvCUR(data.longest_float))) - && S_setup_longest (aTHX_ pRExC_state, - data.longest_float, - &(r->float_utf8), - &(r->float_substr), - &(r->float_end_shift), - data.lookbehind_float, - data.offset_float_min, - data.minlen_float, - longest_float_length, - cBOOL(data.flags & SF_FL_BEFORE_EOL), - cBOOL(data.flags & SF_FL_BEFORE_MEOL))) - { - r->float_min_offset = data.offset_float_min - data.lookbehind_float; - r->float_max_offset = data.offset_float_max; - if (data.offset_float_max < SSize_t_MAX) /* Don't offset infinity */ - r->float_max_offset -= data.lookbehind_float; - SvREFCNT_inc_simple_void_NN(data.longest_float); - } - else { - r->float_substr = r->float_utf8 = NULL; - longest_float_length = 0; - } - longest_fixed_length = CHR_SVLEN(data.longest_fixed); - - if (S_setup_longest (aTHX_ pRExC_state, - data.longest_fixed, - &(r->anchored_utf8), - &(r->anchored_substr), - &(r->anchored_end_shift), - data.lookbehind_fixed, - data.offset_fixed, - data.minlen_fixed, - longest_fixed_length, - cBOOL(data.flags & SF_FIX_BEFORE_EOL), - cBOOL(data.flags & SF_FIX_BEFORE_MEOL))) - { - r->anchored_offset = data.offset_fixed - data.lookbehind_fixed; - SvREFCNT_inc_simple_void_NN(data.longest_fixed); - } - else { - r->anchored_substr = r->anchored_utf8 = NULL; - longest_fixed_length = 0; - } + /* XXX this is done in reverse order because that's the way the + * code was before it was parameterised. Don't know whether it + * actually needs doing in reverse order. DAPM */ + for (i = 1; i >= 0; i--) { + longest_length[i] = CHR_SVLEN(data.substrs[i].str); + + if ( !( i + && SvCUR(data.substrs[0].str) /* ok to leave SvCUR */ + && data.substrs[0].min_offset + == data.substrs[1].min_offset + && SvCUR(data.substrs[0].str) + == SvCUR(data.substrs[1].str) + ) + && S_setup_longest (aTHX_ pRExC_state, + &(r->substrs->data[i]), + &(data.substrs[i]), + longest_length[i])) + { + r->substrs->data[i].min_offset = + data.substrs[i].min_offset - data.substrs[i].lookbehind; + + r->substrs->data[i].max_offset = data.substrs[i].max_offset; + /* Don't offset infinity */ + if (data.substrs[i].max_offset < SSize_t_MAX) + r->substrs->data[i].max_offset -= data.substrs[i].lookbehind; + SvREFCNT_inc_simple_void_NN(data.substrs[i].str); + } + else { + r->substrs->data[i].substr = NULL; + r->substrs->data[i].utf8_substr = NULL; + longest_length[i] = 0; + } + } + LEAVE_with_name("study_chunk"); if (ri->regstclass && (OP(ri->regstclass) == REG_ANY || OP(ri->regstclass) == SANY)) ri->regstclass = NULL; - if ((!(r->anchored_substr || r->anchored_utf8) || r->anchored_offset) + if ((!(r->substrs->data[0].substr || r->substrs->data[0].utf8_substr) + || r->substrs->data[0].min_offset) && stclass_flag && ! (ANYOF_FLAGS(data.start_class) & SSC_MATCHES_EMPTY_STRING) && is_ssc_worth_it(pRExC_state, data.start_class)) @@ -7616,37 +7657,29 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count, data.start_class = NULL; } - /* A temporary algorithm prefers floated substr to fixed one to dig - * more info. */ - if (longest_fixed_length > longest_float_length) { - r->substrs->check_ix = 0; - r->check_end_shift = r->anchored_end_shift; - r->check_substr = r->anchored_substr; - r->check_utf8 = r->anchored_utf8; - r->check_offset_min = r->check_offset_max = r->anchored_offset; - if (r->intflags & (PREGf_ANCH_SBOL|PREGf_ANCH_GPOS)) - r->intflags |= PREGf_NOSCAN; - } - else { - r->substrs->check_ix = 1; - r->check_end_shift = r->float_end_shift; - r->check_substr = r->float_substr; - r->check_utf8 = r->float_utf8; - r->check_offset_min = r->float_min_offset; - r->check_offset_max = r->float_max_offset; - } + /* A temporary algorithm prefers floated substr to fixed one of + * same length to dig more info. */ + i = (longest_length[0] <= longest_length[1]); + r->substrs->check_ix = i; + r->check_end_shift = r->substrs->data[i].end_shift; + r->check_substr = r->substrs->data[i].substr; + r->check_utf8 = r->substrs->data[i].utf8_substr; + r->check_offset_min = r->substrs->data[i].min_offset; + r->check_offset_max = r->substrs->data[i].max_offset; + if (!i && (r->intflags & (PREGf_ANCH_SBOL|PREGf_ANCH_GPOS))) + r->intflags |= PREGf_NOSCAN; + if ((r->check_substr || r->check_utf8) ) { r->extflags |= RXf_USE_INTUIT; if (SvTAIL(r->check_substr ? r->check_substr : r->check_utf8)) r->extflags |= RXf_INTUIT_TAIL; } - r->substrs->data[0].max_offset = r->substrs->data[0].min_offset; /* XXX Unneeded? dmq (shouldn't as this is handled elsewhere) - if ( (STRLEN)minlen < longest_float_length ) - minlen= longest_float_length; - if ( (STRLEN)minlen < longest_fixed_length ) - minlen= longest_fixed_length; + if ( (STRLEN)minlen < longest_length[1] ) + minlen= longest_length[1]; + if ( (STRLEN)minlen < longest_length[0] ) + minlen= longest_length[0]; */ } else { @@ -7672,8 +7705,12 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count, CHECK_RESTUDY_GOTO_butfirst(NOOP); - r->check_substr = r->check_utf8 = r->anchored_substr = r->anchored_utf8 - = r->float_substr = r->float_utf8 = NULL; + r->check_substr = NULL; + r->check_utf8 = NULL; + r->substrs->data[0].substr = NULL; + r->substrs->data[0].utf8_substr = NULL; + r->substrs->data[1].substr = NULL; + r->substrs->data[1].utf8_substr = NULL; if (! (ANYOF_FLAGS(data.start_class) & SSC_MATCHES_EMPTY_STRING) && is_ssc_worth_it(pRExC_state, data.start_class)) @@ -18926,6 +18963,7 @@ void Perl_regdump(pTHX_ const regexp *r) { #ifdef DEBUGGING + int i; SV * const sv = sv_newmortal(); SV *dsv= sv_newmortal(); RXi_GET_DECL(r,ri); @@ -18936,41 +18974,40 @@ Perl_regdump(pTHX_ const regexp *r) (void)dumpuntil(r, ri->program, ri->program + 1, NULL, NULL, sv, 0, 0); /* Header fields of interest. */ - if (r->anchored_substr) { - RE_PV_QUOTED_DECL(s, 0, dsv, SvPVX_const(r->anchored_substr), - RE_SV_DUMPLEN(r->anchored_substr), 30); - Perl_re_printf( aTHX_ - "anchored %s%s at %" IVdf " ", - s, RE_SV_TAIL(r->anchored_substr), - (IV)r->anchored_offset); - } else if (r->anchored_utf8) { - RE_PV_QUOTED_DECL(s, 1, dsv, SvPVX_const(r->anchored_utf8), - RE_SV_DUMPLEN(r->anchored_utf8), 30); - Perl_re_printf( aTHX_ - "anchored utf8 %s%s at %" IVdf " ", - s, RE_SV_TAIL(r->anchored_utf8), - (IV)r->anchored_offset); - } - if (r->float_substr) { - RE_PV_QUOTED_DECL(s, 0, dsv, SvPVX_const(r->float_substr), - RE_SV_DUMPLEN(r->float_substr), 30); - Perl_re_printf( aTHX_ - "floating %s%s at %" IVdf "..%" UVuf " ", - s, RE_SV_TAIL(r->float_substr), - (IV)r->float_min_offset, (UV)r->float_max_offset); - } else if (r->float_utf8) { - RE_PV_QUOTED_DECL(s, 1, dsv, SvPVX_const(r->float_utf8), - RE_SV_DUMPLEN(r->float_utf8), 30); - Perl_re_printf( aTHX_ - "floating utf8 %s%s at %" IVdf "..%" UVuf " ", - s, RE_SV_TAIL(r->float_utf8), - (IV)r->float_min_offset, (UV)r->float_max_offset); + for (i = 0; i < 2; i++) { + if (r->substrs->data[i].substr) { + RE_PV_QUOTED_DECL(s, 0, dsv, + SvPVX_const(r->substrs->data[i].substr), + RE_SV_DUMPLEN(r->substrs->data[i].substr), + 30); + Perl_re_printf( aTHX_ + "%s %s%s at %" IVdf "..%" UVuf " ", + i ? "floating" : "anchored", + s, + RE_SV_TAIL(r->substrs->data[i].substr), + (IV)r->substrs->data[i].min_offset, + (UV)r->substrs->data[i].max_offset); + } + else if (r->substrs->data[i].utf8_substr) { + RE_PV_QUOTED_DECL(s, 1, dsv, + SvPVX_const(r->substrs->data[i].utf8_substr), + RE_SV_DUMPLEN(r->substrs->data[i].utf8_substr), + 30); + Perl_re_printf( aTHX_ + "%s utf8 %s%s at %" IVdf "..%" UVuf " ", + i ? "floating" : "anchored", + s, + RE_SV_TAIL(r->substrs->data[i].utf8_substr), + (IV)r->substrs->data[i].min_offset, + (UV)r->substrs->data[i].max_offset); + } } + if (r->check_substr || r->check_utf8) Perl_re_printf( aTHX_ (const char *) - (r->check_substr == r->float_substr - && r->check_utf8 == r->float_utf8 + ( r->check_substr == r->substrs->data[1].substr + && r->check_utf8 == r->substrs->data[1].utf8_substr ? "(checking floating" : "(checking anchored")); if (r->intflags & PREGf_NOSCAN) Perl_re_printf( aTHX_ " noscan"); @@ -19482,10 +19519,11 @@ Perl_pregfree2(pTHX_ REGEXP *rx) Safefree(r->xpv_len_u.xpvlenu_pv); } if (r->substrs) { - SvREFCNT_dec(r->anchored_substr); - SvREFCNT_dec(r->anchored_utf8); - SvREFCNT_dec(r->float_substr); - SvREFCNT_dec(r->float_utf8); + int i; + for (i = 0; i < 2; i++) { + SvREFCNT_dec(r->substrs->data[i].substr); + SvREFCNT_dec(r->substrs->data[i].utf8_substr); + } Safefree(r->substrs); } RX_MATCH_COPY_FREE(rx); @@ -19562,13 +19600,14 @@ Perl_reg_temp_copy (pTHX_ REGEXP *ret_x, REGEXP *rx) Copy(r->offs, ret->offs, npar, regexp_paren_pair); } if (r->substrs) { + int i; Newx(ret->substrs, 1, struct reg_substr_data); StructCopy(r->substrs, ret->substrs, struct reg_substr_data); - SvREFCNT_inc_void(ret->anchored_substr); - SvREFCNT_inc_void(ret->anchored_utf8); - SvREFCNT_inc_void(ret->float_substr); - SvREFCNT_inc_void(ret->float_utf8); + for (i = 0; i < 2; i++) { + SvREFCNT_inc_void(ret->substrs->data[i].substr); + SvREFCNT_inc_void(ret->substrs->data[i].utf8_substr); + } /* check_substr and check_utf8, if non-NULL, point to either their anchored or float namesakes, and don't hold a second reference. */ @@ -19747,36 +19786,41 @@ Perl_re_dup_guts(pTHX_ const REGEXP *sstr, REGEXP *dstr, CLONE_PARAMS *param) /* Do it this way to avoid reading from *r after the StructCopy(). That way, if any of the sv_dup_inc()s dislodge *r from the L1 cache, it doesn't matter. */ + int i; const bool anchored = r->check_substr - ? r->check_substr == r->anchored_substr - : r->check_utf8 == r->anchored_utf8; + ? r->check_substr == r->substrs->data[0].substr + : r->check_utf8 == r->substrs->data[0].utf8_substr; Newx(ret->substrs, 1, struct reg_substr_data); StructCopy(r->substrs, ret->substrs, struct reg_substr_data); - ret->anchored_substr = sv_dup_inc(ret->anchored_substr, param); - ret->anchored_utf8 = sv_dup_inc(ret->anchored_utf8, param); - ret->float_substr = sv_dup_inc(ret->float_substr, param); - ret->float_utf8 = sv_dup_inc(ret->float_utf8, param); + for (i = 0; i < 2; i++) { + ret->substrs->data[i].substr = + sv_dup_inc(ret->substrs->data[i].substr, param); + ret->substrs->data[i].utf8_substr = + sv_dup_inc(ret->substrs->data[i].utf8_substr, param); + } /* check_substr and check_utf8, if non-NULL, point to either their anchored or float namesakes, and don't hold a second reference. */ if (ret->check_substr) { if (anchored) { - assert(r->check_utf8 == r->anchored_utf8); - ret->check_substr = ret->anchored_substr; - ret->check_utf8 = ret->anchored_utf8; + assert(r->check_utf8 == r->substrs->data[0].utf8_substr); + + ret->check_substr = ret->substrs->data[0].substr; + ret->check_utf8 = ret->substrs->data[0].utf8_substr; } else { - assert(r->check_substr == r->float_substr); - assert(r->check_utf8 == r->float_utf8); - ret->check_substr = ret->float_substr; - ret->check_utf8 = ret->float_utf8; + assert(r->check_substr == r->substrs->data[1].substr); + assert(r->check_utf8 == r->substrs->data[1].utf8_substr); + + ret->check_substr = ret->substrs->data[1].substr; + ret->check_utf8 = ret->substrs->data[1].utf8_substr; } } else if (ret->check_utf8) { if (anchored) { - ret->check_utf8 = ret->anchored_utf8; + ret->check_utf8 = ret->substrs->data[0].utf8_substr; } else { - ret->check_utf8 = ret->float_utf8; + ret->check_utf8 = ret->substrs->data[1].utf8_substr; } } } -- Perl5 Master Repository