On Jun 10, Beau E. Cox said:

>sub parse_words
>{
>    my $line = shift;
>    my @words = ();
>
>    $_ = $line;

You should localize $_ if you're going to be assigning to it explicitly:

  local $_ = $line;

>    while( 1 ) {
>        s/^\s*(.*?)\s*$/$1/;

This is not a very efficient way to remove leading and trailing whitespace from a string, and it breaks if there are newlines INSIDE the string (without /s, . does not match a newline). Sometimes one must resist the urge to try to do everything in one regex.

  s/^\s+//;
  s/\s+$//;

will end up being much faster at removing leading and trailing spaces (although, for reasons I don't want to get into, the trailing-spaces regex is not nearly as efficient as I'd like it to be).

>        last unless length $_;
>        pos( $_ ) = 0;
>        if( /^"(.*?)"/g || /^'(.*?)'/g ||
>            /^\/(.*?)\//g || /^\((.*?)\)/g ||
>            /^{(.*?)}/g || /^\[(.*?)\]/g ||
>            /^<(.*?)>/g || /^#(.*?)#/g
>          ) {

I would suggest a change in the mechanism you're using. Instead of doing

  if ( /^(p1)/g or /^(p2)/g or /^(p3)/g or /^(p4)/g ) {
    push @w, $1;
    $_ = substr $_, pos($_);
  }

I would suggest what I call the "inch-worm" approach, which uses the \G anchor and the /gc modifiers:

  if ( /\G(p1)/gc or /\G(p2)/gc or /\G(p3)/gc or /\G(p4)/gc ) {
    push @w, $1;
  }

You don't need to keep track of pos() or modify $_ yourself. The /c modifier changes the meaning of the /g modifier slightly: if the regex doesn't match, pos() is NOT cleared, as it normally would be after a failed /g match. The \G anchor says "match IMMEDIATELY where the last regex left off"; more precisely, it anchors the regex to match at the location of pos().

Here's a demonstration of /gc versus /g:

  $str = "perl";
  $str =~ /../g;    # sets pos($str) to 2
  if ($str =~ /(...)/g or $str =~ /(..)/g) {
    $x = $1;        # $x is 'pe'
  }

  $str = "perl";
  $str =~ /../g;    # sets pos($str) to 2
  if ($str =~ /(...)/gc or $str =~ /(..)/gc) {
    $y = $1;        # $y is 'rl'
  }

$x is 'pe' because when we do /(...)/g on $str, the regex fails to match and pos($str) is reset, so /(..)/g then matches the first two characters of $str.
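
The two cases of the demonstration can be run together as one complete script under strict and warnings. This is a sketch: the separate variables $s1 and $s2 are introduced here (they are not part of the original demonstration) so that nothing left over from the first case can affect the second.

```perl
use strict;
use warnings;

# Case 1: /g alone. A failed /g match resets pos(), so the next
# /g match starts over from the beginning of the string.
my $s1 = "perl";
$s1 =~ /../g;                          # pos($s1) is now 2
my $x;
if ($s1 =~ /(...)/g or $s1 =~ /(..)/g) {
    $x = $1;                           # 'pe': matched from position 0
}

# Case 2: /gc. A failed match leaves pos() where it was, so the
# next match resumes at position 2.
my $s2 = "perl";
$s2 =~ /../g;                          # pos($s2) is now 2
my $y;
if ($s2 =~ /(...)/gc or $s2 =~ /(..)/gc) {
    $y = $1;                           # 'rl': matched from position 2
}

print "x=$x y=$y\n";                   # prints "x=pe y=rl"
```
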

$y is 'rl' because of the /c modifier: when /(...)/gc fails, pos($str) is NOT changed, so when the next regex, /(..)/gc, runs, pos($str) is still 2 and the match starts at that location (or later).

Here's a demonstration of \G:

  $str = "Perl";
  $str =~ /(..)/g;    # puts 'Pe' in $1 and sets pos($str) to 2
  $str =~ /\G(.)/g;   # this puts 'r' in $1

I'd say more, but I'm on vacation and I need to leave for church, so I'll leave additional comments for later tonight or tomorrow morning.

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
CPAN ID: PINYAN    [Need a programmer?  If you like my work, let me know.]
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.