Andrew Savige wrote:

> It definitely runs a lot faster.
> However, [it screws up for] the original test data:

Mea culpa. I'm a little distracted just at the moment and I
botched the upgrade to (?>...). Here's a solution that actually
works for both data sets (and, I devoutly hope, for all others too!):

-----cut----------cut----------cut----------cut----------cut-----

use re 'eval';

our $quoted = qr/ ' [^'\\]* (?> (?> \\. [^'\\]* )* ) '  # Match 'str'
                | " [^"\\]* (?> (?> \\. [^"\\]* )* ) "  # Match "str"
                /x;

our $element = qr/ (?> (?: (?> [^'"{},]+ )      # Match non-special characters
                     | \\.                      # Or match escaped anything
                     | $quoted                  # Or match quoted anything
                     | (??{$nested})            # Or match {...,...,...}
                   )+ )                         # ...as many times as possible
                 /xs;

our $nested  = qr/ [{]                          # Match {
                   (?> (?: $element , )* )      # Match list of subelements
                   $element?                    # Match last subelement
                   [}]                          # Match }
                 /x;


while (<DATA>) {
        @fields = m/\G ( $element ) ,? /gx;     # Capture elements repeatedly

        use Data::Dumper 'Dumper';
        print Dumper( @fields );

        print "=================\n";
}

__DATA__
abc, ',def'  "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
{1}, hello one two three four five six seven eight nine heat-death-of-the-universe

-----cut----------cut----------cut----------cut----------cut-----
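
The reason the (?>...) groups matter is that they never hand back what they have matched, so the regex engine cannot waste time retrying shorter alternatives. A minimal sketch of that behaviour, separate from the solution above and using a made-up test string:

```perl
use strict;
use warnings;

my $text = 'aaab';    # Hypothetical input, chosen only for illustration

# Ordinary greedy quantifier: a+ first grabs 'aaa', then backtracks
# to 'aa' so that the trailing 'ab' can still match.
my $greedy_matches = ($text =~ / a+ ab /x) ? 1 : 0;

# Atomic group: once (?> a+ ) has taken every 'a' it can, it refuses
# to give any back, so the trailing 'ab' can never match.
my $atomic_matches = ($text =~ / (?> a+ ) ab /x) ? 1 : 0;

print "greedy: $greedy_matches\n";
print "atomic: $atomic_matches\n";
```

The same property is what makes it easy to botch: wrap too much in (?>...) and a match that needed backtracking silently fails.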

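The (??{...}) construct is what lets $element and $nested refer to each other: the code block is run during matching and whatever pattern it returns is matched at that point. A stripped-down sketch of the same idea, assuming a hypothetical $balanced pattern that only checks brace nesting:

```perl
use strict;
use warnings;
use re 'eval';    # Mirrors the code above; harmless if not strictly needed

# Declare the package variable first so the code block inside the
# pattern can name it under strict; its value is looked up at match time.
our $balanced;
$balanced = qr/ [{]                      # Opening brace
                (?: (?> [^{}]+ )         # Run of non-brace characters
                  | (??{ $balanced })    # Or a nested {...}, fetched lazily
                )*
                [}]                      # Closing brace
              /x;

for my $candidate ('{a{b}c}', '{a{b}c', '{}') {
    my $verdict = $candidate =~ /\A $balanced \z/x ? 'balanced' : 'unbalanced';
    print "$candidate: $verdict\n";
}
```
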

> Will the new Perl 6 pattern matching be a vast improvement for these
> sort of parsing problems?

No. Because it still won't do our thinking for us. ;-)

However, syntactically it will be much cleaner, and it will
probably feature much better debugging tools. Both of which
*will* make a big difference.

Here's the same thing in Perl 6:

-----cut----------cut----------cut----------cut----------cut-----

grammar CSV::Nested {

    rule quoted  { ' <-['\\]>*  [ \\. <-['\\]>* ]*: '  # Match 'str'
                 | " <-["\\]>*  [ \\. <-["\\]>* ]*: "  # Match "str"
                 }

    rule element { [ <-['"{},]>+:       # Match non-special characters
                   | \\.                # Or match escaped anything
                   | <quoted>           # Or match quoted anything
                   | <nested>           # Or match {...,...,...}
                   ]+:                  # ...as many times as possible
                 }

    rule nested  { \{                   # Match {
                   [ <element> , ]*:    # Match list of subelements
                   <element>?           # Match last subelement
                   \}                   # Match }
                 }
}


while <$*DATA> {
        @fields = m:ec/ ( <CSV::Nested.element> ) ,? /;     # Capture elements repeatedly

        use Data::Dumper 'Dumper';
        print Dumper( @fields );

        print "=================\n";
}

__DATA__
abc, ',def'  "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
{1}, hello one two three four five six seven eight nine heat-death-of-the-universe

-----cut----------cut----------cut----------cut----------cut-----


Damian
