>sub parse_words
>    my $line = shift;
>    my @words = ();
>    $_ = $line;

You should localize $_ if you're going to be assigning to it explicitly.

  local $_ = $line;

>    while( 1 ) {
>        s/^\s*(.*?)\s*$/$1/;

This is not a very efficient way to remove leading and trailing whitespace
from a string (and it breaks if there are newlines INSIDE the string).
Sometimes, one must resist the urge to try and do everything in one regex.

  s/^\s+//;
  s/\s+$//;

will end up being much faster in removing leading and trailing spaces
(although for reasons I don't want to get into, the trailing-spaces regex
is not nearly as efficient as I'd like it to be).
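
To see the newline problem concretely, here is a small sketch comparing the single regex with two separate substitutions. The name trim and the sample string are my own assumptions, not something from your code:

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace with
# two simple substitutions instead of one do-everything regex.
sub trim {
    my $s = shift;
    $s =~ s/^\s+//;   # leading whitespace
    $s =~ s/\s+$//;   # trailing whitespace
    return $s;
}

my $multi = "  first\nsecond  ";

# The one-regex version: . cannot cross the embedded newline, so the
# pattern never matches and the string comes back unchanged.
(my $one_regex = $multi) =~ s/^\s*(.*?)\s*$/$1/;

# The two-regex version does not care what is in the middle.
my $two_regex = trim($multi);
```

Adding /s to the single regex would let . cross the newline, but the two-substitution form avoids the question entirely.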

>        last unless length $_;
>        pos( $_ ) = 0;
>        if( /^"(.*?)"/g   || /^'(.*?)'/g   ||
>            /^\/(.*?)\//g || /^\((.*?)\)/g ||
>            /^{(.*?)}/g   || /^\[(.*?)\]/g ||
>            /^<(.*?)>/g   || /^#(.*?)#/g
>            ) {

I would suggest a change in the mechanism you're using.  Instead of doing

  if ( /^(p1)/g or /^(p2)/g or /^(p3)/g or /^(p4)/g ) {
    push @w, $1;
    $_ = substr $_, pos($_);
  }

I would suggest using what I call the "inch-worm" approach, which uses the
\G anchor and the /gc modifiers.

  if ( /\G(p1)/gc or /\G(p2)/gc or /\G(p3)/gc or /\G(p4)/gc ) {
    push @w, $1;
  }

You don't need to keep track of pos() or modify $_ yourself.  The /c
modifier changes the meaning of the /g modifier slightly:  it says that if
the regex doesn't match, it should NOT clear pos(), which a /g regex
normally would.  The \G anchor says "match IMMEDIATELY where the last
regex left off", or more specifically, it anchors the regex to match at
the location of pos().

Here's a demonstration of /gc versus /g:

  $str = "perl";
  $str =~ /../g;  # sets pos($str) to 2
  if ($str =~ /(...)/g or $str =~ /(..)/g) {
    $x = $1;  # $x is 'pe'
  }

  $str = "perl";
  $str =~ /../g;  # sets pos($str) to 2
  if ($str =~ /(...)/gc or $str =~ /(..)/gc) {
    $y = $1;  # $y is 'rl'
  }

$x is 'pe' because when we do /(...)/g on $str, the regex fails to match,
and pos($str) is reset, so then /(..)/g matches the first two characters
of $str.  $y is 'rl' because of the /c modifier -- when /(...)/gc fails,
pos($str) is NOT changed, so the next regex, /(..)/gc, matches, and since
pos($str) is 2, it matches starting at that location (or later).

Here's a demonstration of \G:

  $str = "Perl";
  $str =~ /(..)/g;   # puts 'Pe' in $1 and sets pos($str) to 2
  $str =~ /\G(.)/g;  # this puts 'r' in $1
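
Putting \G and /gc together, a rough sketch of the inch-worm approach applied to a word parser might look like the following. This is not a drop-in replacement for your sub: the name inchworm_words is made up, only the "..." and '...' delimiters are handled, and falling back to a run of non-whitespace for bare words is my assumption about what you want.

```perl
use strict;
use warnings;

# Hypothetical sketch: walk along $line with \G and /gc, never
# touching pos() by hand and never chopping up the string.
sub inchworm_words {
    my $line = shift;
    my @words;
    while (1) {
        $line =~ /\G\s+/gc;          # skip whitespace; on failure pos() is kept
        last if $line =~ /\G\z/gc;   # reached the end of the string
        if (   $line =~ /\G"(.*?)"/gc     # double-quoted word
            or $line =~ /\G'(.*?)'/gc     # single-quoted word
            or $line =~ /\G(\S+)/gc ) {   # assumed fallback: bare word
            push @words, $1;
        }
    }
    return @words;
}

my @words = inchworm_words(q{say "hello world" 'a b' done});
# @words is ('say', 'hello world', 'a b', 'done')
```

The loop cannot spin forever: once whitespace has been skipped and the end-of-string check has failed, the next character is non-whitespace, so the \S+ alternative always consumes at least one character.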

I'd say more, but I'm on vacation and I need to leave for church, so I'll
leave additional comments for later tonight or tomorrow morning.

