Andrew Savige wrote:

> I need to split the following:
>
> abc, ',def'  "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
>
> into comma-separated fields:
>
> 1st field: abc
> 2nd field: ',def'  "\"ab'c,}" xyz
> 3rd field: fred IN { 1, "x}y",3 } x
> 4th field: 'z'
>
> This is similar to CSV but with a nasty { , , } construct.
>
> Is Text::Balanced powerful enough to solve this problem or do I need
> to use Parse::RecDescent or something else?

Either of those modules could do it, but neither is necessary.
Vanilla regexes can handle this (at least under Perl 5.6 or later):

-----cut----------cut----------cut----------cut----------cut-----

use re 'eval';

our $quoted  = qr/ ' (?: \\. | [^'\\] )* '      # Match 'str'
                 | " (?: \\. | [^"\\] )* "      # Match "str"
                 /x;

our $element = qr/ (?: [^'"{,\\]+               # Match non-special characters
                     | \\.                      # Match escaped anything
                     | $quoted                  # Match quoted anything
                     | (??{$nested})            # Match {...,...,...}
                   )+
                 /xs;

our $nested  = qr/ [{]                          # Match {
                   (?: $element , )*            # Match list of subelements
                   $element?                    # Match last subelement
                   [}]                          # Match }
                 /x;


$data = <DATA>;

@fields =  $data =~ m/ ( $element ) ,? /gx;     # Capture elements repeatedly
s/^\s+|\s+$//g for @fields;                     # Trim surrounding whitespace

use Data::Dumper 'Dumper';
print Dumper(\@fields);

__DATA__
abc, ',def'  "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'

-----cut----------cut----------cut----------cut----------cut-----
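On Perl 5.10 or later, the same grammar can be expressed without `use re 'eval'` and the `(??{...})` trick by using named recursion (`(?&NAME)`) inside a `(?(DEFINE)...)` block. What follows is a sketch under that assumption, not a drop-in replacement: `$field_re` and `split_fields` are illustrative names, and the ordinary-character class here also excludes `}`, so an unquoted `}` outside braces would end a field rather than being absorbed into it.

```perl
use strict;
use warnings;

# Sketch, assuming Perl 5.10+ (named recursion). $field_re and
# split_fields are hypothetical names chosen for this example.
my $field_re = qr/
    (?<field> (?&element) ) ,?                 # one field, then an optional comma

    (?(DEFINE)
        (?<quoted>  ' (?: \\. | [^'\\] )* '    # single-quoted string
                  | " (?: \\. | [^"\\] )* "    # double-quoted string
        )
        (?<element> (?: [^'"{},\\]+            # ordinary characters
                      | \\.                    # escaped anything
                      | (?&quoted)             # quoted string
                      | (?&nested)             # balanced braces
                    )+
        )
        (?<nested>  [{] (?: (?&element) , )* (?&element)? [}] )
    )
/xs;

# Return the top-level comma-separated fields of a line, trimmed.
sub split_fields {
    my ($line) = @_;
    my @fields;
    while ($line =~ m/$field_re/g) {
        (my $f = $+{field}) =~ s/^\s+|\s+$//g;   # trim surrounding blanks
        push @fields, $f;
    }
    return @fields;
}

chomp(my $input = <<'END');
abc, ',def'  "\"ab'c,}" xyz , fred IN { 1, "x}y",3 } x, 'z'
END

my @fields = split_fields($input);
print "$_\n" for @fields;
```

Because the recursion lives entirely inside one pattern, there is no package variable that has to exist before the match runs, so the sketch compiles cleanly under `use strict`.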

Damian