Hi there,
Hello.
I have one type A file that holds about 25,000 A-ids, one per line.
Further, I have 500 type B files in a directory that hold 10-500 B-ids each,
one per line.
All files are plain text (they pass -T).
Now I want to generate 500 type C files corresponding to the B files:
each B-id that occurs in both a B-type file AND the A-type file should go into
the corresponding C file.
That sounds simple enough.
Each A-id is unique in the A file, and each B-id is unique within its own file
BUT NOT across all the B files: each B-id may occur in one or more B files.
I'm looking for a fast and memory-conscious way to do that.
Hmm, both are relative and probably don't go together well here. Is reading 25,001 ids into memory conscious enough? That would be my first choice.
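A Perl hash makes that in-memory id set cheap: 25,000 short keys cost only a few megabytes, and exists() is a constant-time membership test, so each B-id check costs the same no matter how large the A set grows. A minimal sketch (the id001-style ids are placeholders, not your real data):

```perl
#!/usr/bin/perl
use strict; use warnings;

# Placeholder ids; in practice these would be read from the A file.
my @a_lines = qw(id001 id002 id003);

# Store each id as a hash key so lookups are O(1) instead of a linear scan.
my %a_ids = map { $_ => 1 } @a_lines;

print "found\n"     if     exists $a_ids{'id002'};   # prints "found"
print "not found\n" unless exists $a_ids{'id999'};   # prints "not found"
```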
(Note: The following code works with lines, not ids, since I don't know what they are like. It's also untested code.)
#!/usr/bin/perl
use strict; use warnings;
# Load the A-ids into a hash: exists() gives O(1) membership tests.
open my $a_fh, '<', 'a_id_file.txt' or die "File error: $!";
my %a_ids;
while (<$a_fh>) {
    chomp;
    $a_ids{$_} = 1;
}
close $a_fh;

# One pass over each B file, writing one C file per B file.
opendir my $b_dir, 'b_directory' or die "Directory error: $!";
while (defined(my $b_file = readdir $b_dir)) {
    next if $b_file =~ /^\./;    # or whatever filtering you need

    # readdir returns bare filenames, so prepend the directory.
    open my $b_fh, '<', "b_directory/$b_file" or die "File error: $!";
    # Name the C files however you like; this assumes c_directory exists.
    open my $c_fh, '>', "c_directory/$b_file" or die "File error: $!";
    while (<$b_fh>) {
        chomp;
        # A B-id may appear in several B files, and each match belongs in
        # that file's own C file, so nothing is removed from %a_ids.
        print $c_fh "$_\n" if exists $a_ids{$_};
    }
    close $b_fh;
    close $c_fh;
}
closedir $b_dir;
__END__
That's probably not perfect, but hopefully it gets you moving in the right direction.
If you don't want to keep the Type-A IDs in memory like that, you may have to go line by line through the B files, reading all of A to check for a match, as you originally showed. But don't do that if you can avoid it; it could be pretty slow.
James