Below is my revised code based on your comments. It is tidier but, more
importantly, it works correctly. Ironically, it didn't actually work
correctly before on my dev machine either: it didn't find all matches.
It looks like my original code was only using the first element in
each array. With the map syntax you provided, it now finds matches on
the second regex for the vin field.
Thank you for your help.
Gary
#!/usr/bin/perl
# searches a series of OCR-generated text files, one per page
# looks for sets of regexes for field contents and tallies the matches
use warnings;
use strict;
my %searches=('stock'=>[qr/\b([NU][LD] *\d{5})\b/],
'regno'=>[qr/\b([A-Za-z]{2}\d{2}[A-Za-z]{3})\b/],
'vin'=>[qr/\b(WF[0O]XX[A-Z]{6}\d{5}\b)/i,
qr/\b([A-Z]{6}\d{5}\b)/i]);
my %found;
my %values;
foreach my $fn (glob("*.txt")) {
print "file.....$fn\n";
my $FH;
if (!open $FH,"<",$fn ) {
print "file open failed: $!\n";
next;
}
my $content = slurp($FH);
close($FH);
foreach my $field (keys %searches) {
if (my @matches = map { $content =~ $_ } @{$searches{$field}}) {
foreach (@matches) {
$_=~s/ //g; # remove spaces
print "match found - '$field': '$_'\n";
$found{$field}{$_}++;
}
}
}
} # foreach page
foreach my $field (keys %found) { # foreach field
my $value='';
my $count=0;
foreach my $key (keys %{$found{$field}}) { # foreach field -> value
# if current key's tally is > the best so far, store the key and its tally
if ($found{$field}{$key} > $count) {
$value=$key;
$count=$found{$field}{$key};
}
}
print "field='$field' value='$value'\n";
$values{$field}=$value;
}
sub slurp {
my ($fh)=@_;
local $/;
return <$fh>;
}
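
For anyone following along, here is a minimal standalone sketch of why the
map form picks up every regex in a list. A match in list context returns its
captures (or an empty list on failure), so map gathers one result per regex
that matches. The sample text and the @patterns name are made up for
illustration and are not from the real OCR pages; the two regexes are the
'vin' pair from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the real OCR pages.
my $content = "Chassis WF0XXABCDEF12345 ref GHIJKL67890";

# The two 'vin' regexes from the script above.
my @patterns = (
    qr/\b(WF[0O]XX[A-Z]{6}\d{5}\b)/i,
    qr/\b([A-Z]{6}\d{5}\b)/i,
);

# Each regex is tried in turn against $content. In list context a
# successful match returns its captured groups and a failed match
# returns an empty list, so @matches holds one capture per regex
# that matched.
my @matches = map { $content =~ $_ } @patterns;

print "match: '$_'\n" for @matches;
```

Note that without the /g modifier each regex contributes only its first
match in the text, which is the same behaviour as the script above.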