Andrew Savige wrote:
> I need to split the following:
> abc, ',def' "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
> into comma-separated fields:
> 1st field: abc
> 2nd field: ',def' "\"ab'c,}" xyz
> 3rd field: fred IN { 1, "x}y",3 } x
> 4th field: 'z'
> This is similar to CSV but with a nasty { , , } construct.
> Is Text::Balanced powerful enough to solve this problem or do I need
> to use Parse::RecDescent or something else?
Either of those modules could do it, but neither is necessary.
Vanilla regexes can handle this, at least under Perl 5.6 or later (the
solution relies on the postponed (??{...}) subexpression):
-----cut----------cut----------cut----------cut----------cut-----
use strict;
use warnings;
use re 'eval';    # allow (??{...}) in interpolated patterns
use Data::Dumper 'Dumper';

our $nested;      # declared early so (??{$nested}) compiles under strict

our $quoted  = qr/ ' (?: \\. | [^'\\] )* '     # Match 'str'
                 | " (?: \\. | [^"\\] )* "     # Match "str"
                 /x;
our $element = qr/ (?: [^'"{,\\]+               # Match non-special characters
                     | \\.                      # Match escaped anything
                     | $quoted                  # Match quoted anything
                     | (??{$nested})            # Match {...,...,...}
                     )+
                 /xs;
$nested      = qr/ [{]                          # Match {
                   (?: $element , )*            # Match list of subelements
                   $element?                    # Match last subelement
                   [}]                          # Match }
                 /x;

chomp(my $data = <DATA>);
my @fields = $data =~ m/ ( $element ) ,? /gx;   # Capture elements repeatedly
s/^\s+|\s+$//g for @fields;                     # Trim surrounding whitespace
print Dumper(\@fields);
__DATA__
abc, ',def' "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
-----cut----------cut----------cut----------cut----------cut-----
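For comparison, here is a minimal sketch of the same grammar written with
the named recursion added in Perl 5.10, (?&NAME) inside a (?(DEFINE)...)
block, which avoids both (??{...}) and use re 'eval'. It assumes Perl 5.10
or later; split_fields is a hypothetical helper name, and unlike the version
above it also excludes } from the plain-character class, so an unbalanced }
outside quotes or braces is not absorbed into a field. Surrounding
whitespace is trimmed to match the requested output.

```perl
use strict;
use warnings;
use 5.010;    # (?&NAME) recursion and (?(DEFINE)...) need Perl 5.10 or later

# split_fields is a hypothetical helper name used only for this sketch.
sub split_fields {
    my ($data) = @_;

    # Named subpatterns; the DEFINE block declares them but matches nothing.
    my $grammar = qr/
        (?(DEFINE)
            (?<QUOTED>  ' (?: \\. | [^'\\] )* '        # 'str'
                      | " (?: \\. | [^"\\] )* " )      # "str"
            (?<NESTED>  [{] (?: (?&ELEMENT) , )* (?&ELEMENT)? [}] )
            (?<ELEMENT> (?: [^'"{},\\]+ | \\. | (?&QUOTED) | (?&NESTED) )+ )
        )
    /xs;

    my @fields;
    # Use $1 in a loop: list-context m//g would also return the undef
    # captures belonging to the named groups inside DEFINE.
    while ($data =~ m/ ( (?&ELEMENT) ) ,? $grammar /gx) {
        (my $field = $1) =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @fields, $field;
    }
    return @fields;
}

my $line = q|abc, ',def' "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'|;
print "$_\n" for split_fields($line);
```

This relies on the fact that Perl, unlike PCRE, can backtrack into a
recursed group, so the same grammar handles braces nested to any depth.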
Damian