Conor Lillis <> wrote:
> Hi all,
> I have a requirement to match a file against a number of possible
> strings.
> I also need to retain the successfully matching elements, and based
> on the match from the primary list, see if a corresponding secondary
> match is also in the file.
>
> E.g.. Picture the list below
>
> A B
> ---------
> 1 a
> 2 b
> 3 c
> 4 d
> 5 e
> 6 f
>
> I need to match any of the first column in the file, and for any
> matches in the first row match the corresponding entry for the second
> column.
> So if I match 3 and 5 in the file, I rescan and see if I can match c
> or e in the same file.
>
> Here is a snippet of how I am matching anything from column A, and
> then rescanning for only corresponding entries from column B.
> Is there a more efficient method than what I am doing here?
Probably, but if it works, and is fast enough, why bother.
So, assuming that it isn't fast enough (not sure if it works, as I am
not sure if I understand what you are trying to do), here is where I get
to pick holes in your code. (:-) in cast it isn't obvious).
First, I realise that this is just a code fragment, but you do have "use
strict; use warnings;" at the start, don't you? If not consider this the
first comment. Note that Outlook has decided to wrap some of your lines.
>
> while(<FILE>)
> {
> foreach my $string (@strings)
> {
> if (grep /$string/i, $_)
grep is wasted here. "if (/$string/i)" is more Perl-ish. Also it would
probably be more efficient to perform a single match for the whole set
of strings, than multiple matches for each one. See below.
> {
> $primary =++$primary;
Weird! What's wrong with just "++$primary;". That's the equivalent of
saying "$primary += 1; $primary = $primary".
> if (!grep /$string/i,
> @matchindex){push(@matchindex, "$string");}
That's more appropriate use of grep, but your quoting of $string is
unnecessary. Useless quoting is generally frowned upon.
> print gmtime()."\"$file\" matched on primary -
$string\n";
> }
> }
> }
> close(FILE);
No need to close the file here.
> if (@matchindex)
> {
> # GetSecondaries() Reads array to get 2nd column entries for
> primary matches by splitting row on seperators
> my @matches = GetSecondaries(@matchindex);
> open(FILE, $file);
You could just seek back to the beginning of the file rather than open
the file again. Also, you should always check that the open call was
successful.
> LOOP: while(<FILE>)
> {
> foreach my $string (@matches)
> {
> if (grep /$string/i, $_)
> {
> $secondarycounter = ++$secondarycounter;
As above.
> print gmtime()."********\t: Also matched
on secondary -
> $string\n"; logger("$string,$file");
#
> Logs output to a log file
> last LOOP;
Are you sure about that? You exit from the loop after the very first
secondary match. What about the others?
> }
> }
> }
> close(FILE);
If my guess about what you are trying to do is right, the following
might be a bit more efficient.
-------------------------------------------------------
use strict;
use warnings;
use Fcntl qw{:seek};
my %match_table = (1 => "a",
2 => "b",
3 => "c",
4 => "d",
5 => "e",
6 => "f");
my $fn = shift;
die "Expecting filename\n" unless $fn;
open my $fd, "<", $fn or die "Failed to open $fn: $!\n";
my @primaries = find_matches($fd, sort keys %match_table);
@primaries > 0 or die "No primaries found in $fn\n";
print "Found primaries: @primaries\n";
seek $fd, 0, SEEK_SET or die "Failed to seek: $!\n";
$. = undef;
my @secondaries = find_matches($fd, sort @match_tab...@primaries});
@secondaries > 0 or die "No secondaries found\n";
print "Found secondaries: @secondaries\n";
close $fd;
sub find_matches {
my $fd = shift;
my @strings = @_;
my @matches;
return () unless @strings > 0;
my $matchRE = makeRE(@strings);
print "Start looking at line $.\n";
while (<$fd>) {
if (/($matchRE)/i) {
my $match_str = $1;
push @matches, $match_str;
@strings = grep {$_ ne $match_str} @strings;
$matchRE = makeRE(@strings);
# No point continuing if nothing left to look for
last unless $matchRE;
}
}
print "Stopped looking at line $.\n";
return @matches;
}
# Make regexp to match a list of words.
sub makeRE {
return undef unless @_ > 0;
my $str = join "|", map {"(?:$_)"} @_;
return qr{$str};
}
-------------------------------------------------------
You may be able to do better with a single pass through the file,
depending upon the size of your file and the table of strings you are
looking for.
HTH
--
Brian Raven
This e-mail may contain confidential and/or privileged information. If you are
not the intended recipient or have received this e-mail in error, please advise
the sender immediately by reply e-mail and delete this message and any
attachments without retaining a copy.
Any unauthorised copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.
_______________________________________________
ActivePerl mailing list
[email protected]
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs