Re: Text::Balanced v Parse::RecDescent

Damian Conway Tue, 03 Dec 2002 12:32:40 -0800

Andrew Savige wrote:

> I like this elegant solution.
> However, these recursive regexes seem to have a flawed implementation.
> For example, with the following test data:
>
> __DATA__
> {1}, hello one two three
>
> it seems to hang. Actually, it finishes eventually, taking 87.33
> seconds on Linux Perl 5.6.1.

It's not the implementation that's flawed; it's the interaction of recursion,
greediness, and backtracking that's to blame.


Here's a postmaturely optimized solution that makes use of the (?>...)
metasyntax to prevent expensive and useless backtracking.

You should find it runs very much faster.
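
To see what (?>...) changes, here is a minimal sketch on a toy string (the
string 'aaab' and the variable names are made up for illustration and are
not part of the solution). An ordinary group hands characters back when the
rest of the pattern fails; an atomic group keeps whatever it matched, so
the engine has far fewer alternatives to retry.

```perl
use strict;
use warnings;

my $text = 'aaab';    # illustrative input only

# Ordinary group: a+ first grabs "aaa", then backtracks one "a"
# so that the trailing "ab" can match.
my $greedy = ( $text =~ / (?: a+ ) ab /x ) ? 1 : 0;

# Atomic group: a+ grabs "aaa" and never gives any of it back,
# so the trailing "ab" has nothing left to match against.
my $atomic = ( $text =~ / (?> a+ ) ab /x ) ? 1 : 0;

print "greedy: $greedy, atomic: $atomic\n";    # greedy: 1, atomic: 0
```

The atomic version fails here on purpose. In the script that follows, the
atomic groups are placed where backtracking could never produce a match
anyway, so they only remove wasted work.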

-----cut----------cut----------cut----------cut----------cut-----

use re 'eval';

our $quoted = qr/ ' [^'\\]* (?> (?> \\. [^'\\]* )* ) '  # Match 'str'
                | " [^"\\]* (?> (?> \\. [^"\\]* )* ) "  # Match "str"
                /x;

our $element = qr/ (?> (?> [^'"{},]+ )          # Match non-special characters
                     | \\.                      # Match escaped anything
                     | $quoted                  # Match quoted anything
                     | (??{$nested})            # Match {...,...,...}
                   )
                 /xs;

our $nested  = qr/ [{]                          # Match {
                   (?> (?: $element , )* )      # Match list of subelements
                   $element?                    # Match last subelement
                   [}]                          # Match }
                 /x;


$data = <DATA>;

@fields = $data =~ m/\G ( $element ) ,? /gx;    # Capture elements repeatedly

use Data::Dumper 'Dumper';
print Dumper( @fields );

__DATA__
{1}, hello one two three four five six seven eight nine heat-death-of-the-universe

-----cut----------cut----------cut----------cut----------cut-----

Note that I also added a \G to the actual m// matcher, to ensure that
the sequence of elements matched is actually sequential (i.e. no
convenient skipping of inconvenient non-elements in the middle).
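
A small sketch of the difference \G makes, assuming a toy input in which
'!' stands in for an inconvenient non-element (the input and the one-letter
element pattern are invented for illustration):

```perl
use strict;
use warnings;

my $data = 'a,b,!,c';    # '!' plays the part of a non-element

# Without \G, each new match may start anywhere after the previous one,
# so the matcher quietly steps over "!," and carries on with "c".
my @loose = $data =~ m/ ( [a-z] ) ,? /gx;

# With \G, each match must start exactly where the previous one ended,
# so matching stops at the first thing that is not an element.
my @strict = $data =~ m/ \G ( [a-z] ) ,? /gx;

print "loose:  @loose\n";     # loose:  a b c
print "strict: @strict\n";    # strict: a b
```

Comparing the strict result with what was expected is one way to notice
that the input contained something the grammar does not cover.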

Damian
