Andrew Savige wrote:
> It definitely runs a lot faster.
> However, [it screws up for] the original test data:
Mea culpa. I'm a little distracted just at the moment and I
botched the upgrade to (?>...). Here's a solution that actually
works for both data sets (and, I devoutly hope, for all others too!):
-----cut----------cut----------cut----------cut----------cut-----
use re 'eval';
our $quoted = qr/ ' [^'\\]* (?> (?> \\. [^'\\]* )* ) ' # Match 'str'
| " [^"\\]* (?> (?> \\. [^"\\]* )* ) " # Match "str"
/x;
our $element = qr/ (?> (?: (?> [^'"{},]+ ) # Match non-special characters
| \\. # Or match escaped anything
| $quoted # Or match quoted anything
| (??{$nested}) # Or match {...,...,...}
)+ ) # ...as many times as possible
/xs;
our $nested = qr/ [{] # Match {
(?> (?: $element , )* ) # Match list of subelements
$element? # Match last subelement
[}] # Match }
/x;
while (<DATA>) {
@fields = m/\G ( $element ) ,? /gx; # Capture elements repeatedly
use Data::Dumper 'Dumper';
print Dumper( @fields );
print "=================\n";
}
__DATA__
abc, ',def' "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
{1}, hello one two three four five six seven eight nine heat-death-of-the-universe
-----cut----------cut----------cut----------cut----------cut-----
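For readers less familiar with `(?>...)`: it makes a group atomic, so once the group has matched, the regex engine will not backtrack into it to try a shorter match. The standalone sketch below illustrates this. The test strings and variable names are invented for the demo, and `$quoted` simply restates the pattern from the program above without its comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Ordinary quantifier: a* first grabs all of "aaa", then backtracks one
# character so that the final literal 'a' can still match.
my $plain = ( "aaa" =~ /\A a* a \z/x ) ? 1 : 0;

# Atomic group: (?>a*) grabs all of "aaa" and never gives any back,
# so the final literal 'a' has nothing left to match.
my $atomic = ( "aaa" =~ /\A (?>a*) a \z/x ) ? 1 : 0;

print "plain:  $plain\n";     # prints "plain:  1"
print "atomic: $atomic\n";    # prints "atomic: 0"

# In $quoted, [^'\\]* can never swallow the closing quote, so the atomic
# groups change no results; they only stop pointless retries on failure.
my $quoted = qr/ ' [^'\\]* (?> (?> \\. [^'\\]* )* ) ' /x;
my @ok = grep { /\A $quoted \z/x } ( q{'abc'}, q{'a\'b'}, q{'unterminated} );
print "matched: @ok\n";       # prints "matched: 'abc' 'a\'b'"
```

The same reasoning is why atomic grouping is safe throughout the parser: every atomic group stops only where the next token must begin, so giving characters back could never lead to a successful match.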
> Will the new Perl 6 pattern matching be a vast improvement for these
> sorts of parsing problems?
No. Because it still won't do our thinking for us. ;-)
However, syntactically it will be much cleaner, and it will
probably feature much better debugging tools, both of which
*will* make a big difference.
Here's the same thing in Perl 6:
-----cut----------cut----------cut----------cut----------cut-----
grammar CSV::Nested {
rule quoted { ' <-['\\]>* [ \\. <-['\\]>* ]*: ' # Match 'str'
| " <-["\\]>* [ \\. <-["\\]>* ]*: " # Match "str"
}
rule element { [ <-['"{},]>+: # Match non-special characters
| \\. # Or match escaped anything
| <quoted> # Or match quoted anything
| <nested> # Or match {...,...,...}
]+: # ...as many times as possible
}
rule nested { \{ # Match {
[ <element> , ]*: # Match list of subelements
<element>? # Match last subelement
\} # Match }
}
}
while <$*DATA> {
@fields = m:ec/ ( <CSV::Nested.element> ) ,? /; # Capture elements repeatedly
use Data::Dumper 'Dumper';
print Dumper( @fields );
print "=================\n";
}
__DATA__
abc, ',def' "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
{1}, hello one two three four five six seven eight nine heat-death-of-the-universe
-----cut----------cut----------cut----------cut----------cut-----
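For comparison, Perl 5.10 and later can express much of the same grammar structure in Perl 5 itself, using `(?(DEFINE)...)` to hold named rules, `(?&name)` to call them recursively, and possessive quantifiers such as `*+` and `++` in place of the `(?>...)` groups. This removes the need for `(??{...})` and `use re 'eval'`. The following is an editorial sketch in that style, not the program above; the names `$rules`, `$text`, and `@parsed` are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper 'Dumper';

# Named subpatterns used as grammar rules (requires Perl 5.10 or later).
# Nothing inside (?(DEFINE)...) matches by itself; it only defines rules.
my $rules = qr/
    (?(DEFINE)
        (?<quoted>                               # 'str' or "str", backslash escapes allowed
            ' [^'\\]*+ (?: \\. [^'\\]*+ )*+ '
          | " [^"\\]*+ (?: \\. [^"\\]*+ )*+ "
        )
        (?<element>
            (?: [^'"{},]++                       # non-special characters
              | \\.                              # escaped anything
              | (?&quoted)                       # quoted anything
              | (?&nested)                       # nested list in braces
            )++                                  # as many as possible, never given back
        )
        (?<nested>
            [{] (?: (?&element) , )*+ (?&element)? [}]
        )
    )
/xs;

my $text = <<'END';
abc, ',def' "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
{1}, hello one two three four five six seven eight nine heat-death-of-the-universe
END

my @parsed;
for my $line ( split /\n/, $text ) {
    my @fields;
    # The DEFINE groups also receive capture numbers, so collect $1 on each
    # match rather than using the list-context return value of m//g.
    while ( $line =~ / \G ( (?&element) ) ,? $rules /gx ) {
        push @fields, $1;
    }
    push @parsed, \@fields;
    print Dumper( \@fields );
    print "=================\n";
}
```

On the two sample lines this should split the fields the same way as the Perl 5 program above (four fields, then two), except that no field carries a trailing newline, because the lines here come from `split` rather than `<DATA>`.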
Damian