demerphq <[EMAIL PROTECTED]> wrote:
:Hmm. Yes I think you are right. Ok. If i rework it so the trie is
:allocated and inserted into the regexes data structure at the get-go
:then pregfree would clean it up.

Make sure in that case that you can cope with reentrancy.

:Should i use distinct ids for each call?

I think it would be desirable.

:Coming up with limits on pathological cases is something I had meant
:to mention already but I forgot. You are quite correct there should be
:some hardlimits, but im not sure what they should be, nor how they
:should be handled/configured. Should they be defined in regcomp.h? Id
:appreciate guidance here.
[...]
:To build the initial structure we need need to initially construct a
:two dimensional array, with the number of unique chars in the cols,
:and the length of the strings summed together as the number of rows.
:
:The number of rows is worst case and will be smaller when there are
:common prefixes in the trie. The number of columns only matters during
:construction as the compression logic removes or eliminates most of
:the wasted space. Eg, a string of all of the extended ASCII chars
:would take 256 * 256 nodes to construct, but would compress to only
:256 nodes.
:
:I suppose its possible to construct a compressed trie directly, but it
:would be a lot more complex (youd have to insert all of the first
:digits keep track of the transition points for each, then insert all
:of the second digits, etc, making the process a lot slower). IMO its
:not worth it if intelligent hardlimits are imposed. If people want to
:run pathological regexes then they can run them slowly. :-)

That would be fine if no pathological case would arise in normal usage.
But even where there are a mere 256 unique characters (ie, a situation
that does not even require Unicode) you'll need 2k per character, which
already seems excessive.

Maybe it would make sense to use a poor man's hash: have trans[] be
a C array of (charcount + 1) vectors, each of which is either a linked
list or a resizable array of (charid, reg_trie_trans*) pairs, and walk
the list to search for any existing entry each time - if I understand
correctly, seeing an 'a' => 'b' transition would then take processing
time proportional to the number of distinct 'a' => 'x' transitions
seen previously, but even the worst case would be less bad than trying
to allocate a 32GB data structure.

A hardcoded limit is a possible fallback, but my guess is that such
pathological cases could have most to gain from the trie so I think it'd
be preferable to try and cater for them.

:> :-       PerlIO_printf(Perl_debug_log, "%sCompiling REx%s `%s%*s%s'\n",
:> :+       PerlIO_printf(Perl_debug_log, "%sCompiling REx%s \"%s%*s%s\"\n",
:> [...]
:> :-                     "%sFreeing REx:%s `%s%*.*s%s%s'\n",
:> :+                     "%sFreeing REx:%s %s%*.*s%s%s\n",
:> 
:> Hmm, I hope you checked no-one is trying to parse that output ...
:
:Well, im not sure how I can do that.

You can't, which was the point. :)

:I guess this could be omitted, it just annoyed me.

It has annoyed me too in the past. Just wanted to be sure you were aware
of the danger - I'm sure we've never guaranteed the output (and it has
changed often enough), so it's probably ok. Might want to check with mjd
and japhy though, since I think neither usually follow p5p.

:I was able to address the memory leak issue of EVAL nodes by tracking
:when they are present and then using SAVEFREEPV() instead of
:Safefree() when an EVAL node is present and there are more than one
:accept states. It seems to work fine, insofar that I see no leakage
:from
: 
:  perl -e "eval { ('a' x 100)=~/(?:a|aa|aaa|aaaa)(??{die})/ } while 1"
:
:Can you see a flaw with this approach? 

If I understand correctly, this will free them at the next LEAVE, which
will normally occur when we leave the scope. I don't know offhand when
that's occurring in this case, or whether it's possible to construct
an example that defers it without bound on the number of these structs.

If the SAVEFREEPV approach works, I suggest using it always - EVAL is
not the only thing you may need to cater for.

Hugo

Reply via email to