Hi,

I was parsing a collection of HTML files and wanted to extract a certain
block from each file, like this:

> ./script.pl *.html

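use strict;
use warnings;

my $accumulator     = '';
my $capture_counter = 0;

while ( <> ) {
    # '...' is true from a line matching <h1> through the next line
    # matching labelsub (the end pattern is not tested on the <h1> line)
    if ( /<h1>/ ... /labelsub/ ) {
        $accumulator .= $_ unless /labelsub/;
        if ( /labelsub/ && !$capture_counter ) {
            print $accumulator;    # print only the first block in the file
            $capture_counter = 1;
        }
        else {
            next;
        }
    }
    else {
        next;
    }
}
continue { # flush out the variables and clean up
    if ( eof ) {            # end of the current file, not of all input
        close ARGV;         # also resets $. for the next file
        $accumulator     = '';
        $capture_counter = 0;
    }
}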

The $capture_counter is there because some of the files contain multiple
blocks of text that could be accumulated, and I only want the first block in
each file.

This usually works fine, but then I encountered an input file that did not
contain the string 'labelsub' after the first '<h1>' match. The range
operator then stayed true past the end of that file and kept matching lines
in the next file (because I am processing the whole batch with the while (<>)
loop) until it eventually found 'labelsub' there. Nothing was printed for the
first file, because the script had already flushed the accumulator at that
file's end-of-file.
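A minimal reproduction of this, assuming two hypothetical in-memory "files"
(the first has an <h1> but no 'labelsub'). Each range operator keeps its own
state, tied to the operator rather than to the current file, so it is still
true when the second file starts, and that file's lines before its own <h1>
are swept into the range as well:

```perl
use strict;
use warnings;

# Hypothetical test data: file 0 opens a block but never closes it.
my @files = (
    "intro\n<h1>A</h1>\nbody A\n",
    "top of B\n<h1>B</h1>\nbody B\nlabelsub\n",
);

# Record, per file, every line for which the range operator is true.
my @in_range;
for my $i ( 0 .. $#files ) {
    open my $fh, '<', \$files[$i] or die "open: $!";
    while ( my $line = <$fh> ) {
        # The same '...' op is reused for every file, so its state carries over.
        push @{ $in_range[$i] }, $line
            if $line =~ /<h1>/ ... $line =~ /labelsub/;
    }
    close $fh;
}

# prints:
#   file 0: 2 lines in range
#   file 1: 4 lines in range   (starting with "top of B")
print "file $_: ", scalar @{ $in_range[$_] || [] }, " lines in range\n"
    for 0 .. $#files;
```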

One workaround is to run the script separately on each file, but is there a
way to reset the state of the range operator at the end of each physical file
(or at any other point, for that matter)?
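One possible approach, sketched under my own assumptions rather than as a
definitive answer: as far as I know there is no built-in way to reset a range
operator, so this replaces it with an explicit flag held in lexicals that are
scoped to one subroutine call per file. The sub name first_block and the
driver loop are hypothetical; the sub mimics '...' (the end pattern is not
tested on the opening <h1> line) and, like the original loop, leaves the
'labelsub' line out of the block:

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines from the first <h1> line up to (not
# including) the next 'labelsub' line, or undef if the file has no complete
# block. All state is local to this call, so nothing leaks between files.
sub first_block {
    my ($fh) = @_;
    my $in_block = 0;
    my $acc      = '';
    while ( my $line = <$fh> ) {
        if ( !$in_block ) {
            next unless $line =~ /<h1>/;
            $in_block = 1;    # like '...': don't test labelsub on this line
            $acc      = $line;
        }
        elsif ( $line =~ /labelsub/ ) {
            return $acc;      # first complete block found; stop reading
        }
        else {
            $acc .= $line;
        }
    }
    return;                   # EOF reached before a complete block
}

# Hypothetical driver: each file named on the command line gets a fresh call.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Can't open $file: $!";
    my $block = first_block($fh);
    close $fh;
    print $block if defined $block;
}
```

If you would rather keep the range operator, a common idiom is to let
end-of-file close it too: /<h1>/ ... (/labelsub/ || eof). Because '...' does
not test its right-hand side on the line that opened the range, that version
can still leak into the next file if an <h1> line is the very last line of a
file.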

Thanks,

--Marc
