Hi -

I am trying to come up with a simple, elegant word-parsing script that:

* takes a scalar string, and
* splits it into words, separating on whitespace, commas,
  and a set of paired delimiters: "" '' // () {} [] <> ##, and
* returns the array of words.

So far I have:


# ----------------------------------------------------------------
print( '-', join( '-&-', parse_words( $_ ) ), "-\n" ) for( @ARGV );

sub parse_words
{
    my $line = shift;
    my @words = ();

    local $_ = $line;   # localize: the caller's for() aliases $_ to @ARGV
    
    while( 1 ) {
        s/^\s*(.*?)\s*$/$1/;
        last unless length $_;
        pos( $_ ) = 0;
        if( /^"(.*?)"/g   || /^'(.*?)'/g   ||
            /^\/(.*?)\//g || /^\((.*?)\)/g ||
            /^\{(.*?)\}/g || /^\[(.*?)\]/g ||
            /^<(.*?)>/g   || /^#(.*?)#/g
            ) {
            push @words, $1;
            $_ = substr $_, pos( $_ );
            next;
        }
        if( /^(.*?),/g ) {
            push @words, $1;
            $_ = substr $_, pos( $_ );
            next;
        }
        if( /^(.*?)\s+/g ) {
            push @words, $1;
            $_ = substr $_, pos( $_ );
            next;
        }
        push( @words, $_ ) if length $_;
        last;
    }

    @words;
}
# ----------------------------------------------------------------

A test gives the correct results:

perl t.pl "\"mother's apple pie\" <randy 'lewis'>apple, corn dog,0 1 2"
-mother's apple pie-&-randy 'lewis'-&-apple-&-corn dog-&-0-&-1-&-2-

Now this is fine, and I can use it as is, but it seems a bit pedestrian
and heavy-handed. I tried, and failed, to write one using a super-all-in-one
regex in a progressive matching /g while loop.
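
For illustration, a progressive-match loop of that kind might look roughly
like the sketch below. It is only a sketch under some assumptions: the name
parse_words_g is a placeholder, the (?|...) branch-reset group needs
Perl 5.10 or later, and it has only been checked against the sample input
above. Unlike the loop version, it also drops whitespace that sits directly
before a comma.

```perl
use strict;
use warnings;

print( '-', join( '-&-', parse_words_g( $_ ) ), "-\n" ) for @ARGV;

# Hypothetical single-regex variant of parse_words (name is a placeholder).
sub parse_words_g
{
    my ( $line ) = @_;
    my @words;

    # \G anchors each attempt where the previous match ended.
    # (?| ... ) is a branch reset: every alternative captures into $1.
    while( $line =~ m{
        \G \s*
        (?|   "(.*?)"   |  '(.*?)'      # quoted words
          |   /(.*?)/   |  \((.*?)\)    # slashes, parentheses
          | \{(.*?)\}   |  \[(.*?)\]    # braces, brackets
          |   <(.*?)>   |  \#(.*?)\#    # angle brackets, hashes
          |   (.*?) \s* ,               # anything up to the next comma
          |   (\S+)                     # otherwise a run of non-whitespace
        )
    }gx ) {
        push @words, $1;
    }

    return @words;
}
```

The alternatives are tried in the same order as the if() tests in the loop
version (paired delimiters first, then comma, then whitespace), which is why
"corn dog" still comes back as a single word for the sample input.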

Does anyone want to help me find elegance?

Aloha => Beau;

