Hi,
I am parsing a collection of HTML files and want to extract a certain block
from each file. The invocation and the script look like this:
> ./script.pl *.html
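use strict;
use warnings;

my $accumulator     = '';
my $capture_counter = 0;

while ( <> ) {
    # True from a line matching <h1> through the next line matching labelsub.
    if ( /<h1>/ ... /labelsub/ ) {
        # Collect the block, leaving out the closing labelsub line.
        $accumulator .= $_ unless /labelsub/;

        # Print only the first complete block in each file.
        if ( /labelsub/ && !$capture_counter ) {
            print $accumulator;
            $capture_counter = 1;
        }
    }
}
continue {
    # At the end of each input file, flush the per-file variables.
    if ( eof ) {
        close ARGV;    # also resets $. for the next file
        $accumulator     = '';
        $capture_counter = 0;
    }
}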
The $capture_counter flag is there because some of the files have multiple
blocks of text that could be accumulated, and I only want the first block in
each file.
This usually worked fine, until I hit an input file that did not contain the
string 'labelsub' after the first '<h1>' pattern match. The range test then
stayed true and kept searching for 'labelsub' in the lines of the next file
(because I am processing the whole batch with the while (<>) operator). It
eventually found the string there, and the script then printed nothing,
because at the end-of-file of the previous file it had flushed the contents
of the accumulator.
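
To illustrate what I believe is happening, here is a minimal sketch. The sub
in_range and the two in-memory "files" are made up for this example and are
not part of my real script; the point is only that a single range operator
keeps its state from the last line of one input to the first line of the next.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for two input files; the contents are made up.
my @file_a = ( "<h1>Title A</h1>\n", "body A\n" );    # no labelsub here
my @file_b = ( "intro B\n", "labelsub\n", "trailer B\n" );

# One range operator, evaluated once per line, as in the while (<>) loop.
sub in_range {
    my ($line) = @_;
    return ( $line =~ /<h1>/ ... $line =~ /labelsub/ ) ? 1 : 0;
}

my @flags_a = map { in_range($_) } @file_a;
my @flags_b = map { in_range($_) } @file_b;

print "file A: @flags_a\n";    # file A: 1 1
print "file B: @flags_b\n";    # file B: 1 1 0
```

The first line of file B is already "in range" even though file B never
matched '<h1>', because the range opened in file A was never closed.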
One solution is to run the script on each file individually, but I am
wondering whether there is a way to reset the 'state' of the range operator
at the end of each physical file (or at any other time, for that matter).
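
The only other workaround I can sketch myself is to drop the range operator
and track the state in my own variable, which I can reset whenever I like.
This is a rough sketch under assumptions: first_blocks and the sample data
are hypothetical, and the files are passed in as arrays of lines instead of
being read with <>.

```perl
use strict;
use warnings;

# Hypothetical helper: takes array refs of lines (one per file) and returns
# the first <h1>-to-labelsub block found in each file.
sub first_blocks {
    my @files = @_;
    my @blocks;
    for my $lines (@files) {
        my $in_block    = 0;     # my own flag instead of the range operator
        my $accumulator = '';
        for my $line (@$lines) {
            if ( !$in_block ) {
                next unless $line =~ /<h1>/;    # wait for the opening line
                $in_block    = 1;
                $accumulator = $line;
            }
            elsif ( $line =~ /labelsub/ ) {
                push @blocks, $accumulator;     # keep the first block only
                last;
            }
            else {
                $accumulator .= $line;
            }
        }
        # A block still open at the end of a file is dropped, because
        # $in_block and $accumulator start fresh for the next file.
    }
    return @blocks;
}

# Made-up sample data: the first file never closes its block.
my @file_a = ( "<h1>Title A</h1>\n", "body A\n" );
my @file_b = ( "intro B\n", "<h1>Title B</h1>\n", "body B\n",
               "labelsub\n", "trailer B\n" );

my @blocks = first_blocks( \@file_a, \@file_b );
print @blocks;
```

In the real script the flag would be cleared in the eof branch of the
continue block. That sidesteps the problem, but it does not tell me whether
the range operator's own state can be reset, which is what I would really
like to know.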
Thanks,
--Marc