On Jun 10, Beau E. Cox said:
>sub parse_words
>{
> my $line = shift;
> my @words = ();
>
> $_ = $line;
You should localize $_ if you're going to be assigning to it explicitly.

  local $_ = $line;
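
To show why the local matters, here is a minimal sketch. The helper name trimmed() and the sample strings are made up for illustration; the point is only that the caller's $_ survives the call.

```perl
use strict;
use warnings;

# Hypothetical helper: works on a localized $_ so the caller's $_ is untouched.
sub trimmed {
    local $_ = shift;    # the previous $_ is restored when the sub returns
    s/^\s+//;
    s/\s+$//;
    return $_;
}

my @items = ('outer value');
my @seen;
for (@items) {                       # $_ is aliased to $items[0] here
    my $t = trimmed('  inner  ');
    push @seen, $_, $t;              # $_ is still 'outer value'
}
```

Without the local, the assignment inside the sub would write through the alias and clobber $items[0].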
> while( 1 ) {
> s/^\s*(.*?)\s*$/$1/;
This is not a very efficient way to remove leading and trailing whitespace
from a string, and it breaks if there are newlines INSIDE the string: without
the /s modifier, . cannot match a newline, so the pattern fails to match and
nothing is trimmed. Sometimes, one must resist the urge to do everything in
one regex.

  s/^\s+//;
  s/\s+$//;

will end up being much faster at removing leading and trailing whitespace
(although for reasons I don't want to get into, the trailing-whitespace regex
is not nearly as efficient as I'd like it to be).
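
A small sketch of the newline problem, using a made-up sample string. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $text = "  first line\nsecond line  ";

# One-regex version: '.' does not match "\n" without /s, so the
# substitution fails and the string is left exactly as it was.
my $single = $text;
$single =~ s/^\s*(.*?)\s*$/$1/;

# Two-regex version: each substitution handles one end of the string.
my $double = $text;
$double =~ s/^\s+//;
$double =~ s/\s+$//;
```

Here $single still has its leading and trailing spaces, while $double is "first line\nsecond line".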
> last unless length $_;
> pos( $_ ) = 0;
> if( /^"(.*?)"/g || /^'(.*?)'/g ||
> /^\/(.*?)\//g || /^\((.*?)\)/g ||
> /^{(.*?)}/g || /^\[(.*?)\]/g ||
> /^<(.*?)>/g || /^#(.*?)#/g
> ) {
I would suggest a change in the mechanism you're using. Instead of doing

  if ( /^(p1)/g or /^(p2)/g or /^(p3)/g or /^(p4)/g ) {
    push @w, $1;
    $_ = substr $_, pos($_);
  }

use what I call the "inch-worm" approach, which relies on the \G anchor and
the /gc modifiers:

  if ( /\G(p1)/gc or /\G(p2)/gc or /\G(p3)/gc or /\G(p4)/gc ) {
    push @w, $1;
  }
You don't need to keep track of pos() or modify $_ yourself. The /c
modifier changes the meaning of the /g modifier slightly: it says that if
the regex doesn't match, it should NOT clear pos(), which a /g regex
normally would. The \G anchor says "match IMMEDIATELY where the last
regex left off", or more specifically, it anchors the regex to match at
the location of pos().
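
Put together, the inch-worm approach might look something like the sketch below. This is not a drop-in replacement for your parse_words: the name inch_worm_words is made up, and I've assumed only two delimiter pairs plus bare words to keep it short.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into words, treating "..." and '...'
# as single words. Only two delimiter pairs are handled in this sketch.
sub inch_worm_words {
    my ($line) = @_;
    my @words;
    for ($line) {                  # alias $_ to our private copy
        while (1) {
            /\G\s+/gc;             # skip whitespace; a failed match keeps pos()
            last if /\G\z/gc;      # stop once pos() has reached the end
            if    (/\G"(.*?)"/gc) { push @words, $1 }
            elsif (/\G'(.*?)'/gc) { push @words, $1 }
            elsif (/\G(\S+)/gc)   { push @words, $1 }   # bare word fallback
        }
    }
    return @words;
}

my @words = inch_worm_words(q{alpha "two words" 'three more words' omega});
```

The string is never modified and pos() is never set by hand; each successful match just inches pos() forward. The bare-word branch always succeeds on non-whitespace, so the loop cannot get stuck on an unterminated quote.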
Here's a demonstration of /gc versus /g:

  $str = "perl";
  $str =~ /../g;  # sets pos($str) to 2
  if ($str =~ /(...)/g or $str =~ /(..)/g) {
    $x = $1;  # $x is 'pe'
  }

  $str = "perl";
  $str =~ /../g;  # sets pos($str) to 2
  if ($str =~ /(...)/gc or $str =~ /(..)/gc) {
    $y = $1;  # $y is 'rl'
  }

$x is 'pe' because when we do /(...)/g on $str, the regex fails to match,
and pos($str) is reset, so then /(..)/g matches the first two characters
of $str. $y is 'rl' because of the /c modifier -- when /(...)/gc fails,
pos($str) is NOT changed, so the next regex, /(..)/gc, matches, and since
pos($str) is 2, it matches starting at that location (or later).
Here's a demonstration of \G:

  $str = "Perl";
  $str =~ /(..)/g;   # puts 'Pe' in $1 and sets pos($str) to 2
  $str =~ /\G(.)/g;  # this puts 'r' in $1

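
The difference \G makes is easier to see when the text at pos() does NOT match. A rough sketch, with a made-up string and variable names:

```perl
use strict;
use warnings;

my $str = "ab  cd";

$str =~ /ab/g;                                      # pos($str) is now 2

# Without \G the engine may scan forward from pos(), so it skips the
# spaces and finds 'cd'.
my $free = $str =~ /(\w+)/gc ? $1 : undef;

pos($str) = 2;                                      # rewind to the same spot

# With \G the match must begin exactly at offset 2, which is a space,
# so it fails; /c keeps pos($str) where it was.
my $anchored  = $str =~ /\G(\w+)/gc ? $1 : undef;
my $pos_after = pos($str);
```

So $free is 'cd', $anchored is undef, and $pos_after is still 2.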
I'd say more, but I'm on vacation and I need to leave for church, so I'll
leave additional comments for later tonight or tomorrow morning.
--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
CPAN ID: PINYAN [Need a programmer? If you like my work, let me know.]
<stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.