Hi there,
Hello.
I have one type A file that holds about 25,000 A-ids, one per line.
Further, I have 500 type B files in a directory that hold 10-500 B-ids each,
one per line.
All files are plain text (they pass -T).
Now I want to generate 500 type C files corresponding to the B files:
each B-id that occurs in both a B-type file AND the A-type file should go into
the corresponding C file.
That sounds simple enough.
Each A-id is unique in the A file, and each B-id is unique within its own file
BUT NOT across all the B files: each B-id may occur in one or more B files.
I'm looking for a fast and memory-conscious way to do that.
Hmm, both are relative and probably don't go together well here. Is reading 25,001 ids into memory conscious enough? That would be my first choice.
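A Perl hash makes that in-memory id set cheap: 25,000 short keys cost only a few megabytes, and exists() is a constant-time membership test, so each B-id check costs the same no matter how large the A set grows. A minimal sketch (the id001-style ids are placeholders, not your real data):

```perl
#!/usr/bin/perl
use strict; use warnings;

# Placeholder ids; in practice these would be read from the A file.
my @a_lines = qw(id001 id002 id003);

# Store each id as a hash key so lookups are O(1) instead of a linear scan.
my %a_ids = map { $_ => 1 } @a_lines;

print "found\n"     if     exists $a_ids{'id002'};   # prints "found"
print "not found\n" unless exists $a_ids{'id999'};   # prints "not found"
```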
(Note: The following code works with lines, not ids, since I don't know what they are like. It's also untested code.)
#!/usr/bin/perl
use strict; use warnings;
# Load the A-ids into a hash: exists() gives O(1) membership tests.
open my $a_fh, '<', 'a_id_file.txt' or die "File error: $!";
my %a_ids;
while (<$a_fh>) {
    chomp;
    $a_ids{$_} = 1;
}
close $a_fh;

# One pass over each B file, writing one C file per B file.
opendir my $b_dir, 'b_directory' or die "Directory error: $!";
while (defined(my $b_file = readdir $b_dir)) {
    next if $b_file =~ /^\./;    # or whatever filtering you need

    # readdir returns bare filenames, so prepend the directory.
    open my $b_fh, '<', "b_directory/$b_file" or die "File error: $!";
    # Name the C files however you like; this assumes c_directory exists.
    open my $c_fh, '>', "c_directory/$b_file" or die "File error: $!";
    while (<$b_fh>) {
        chomp;
        # A B-id may appear in several B files, and each match belongs in
        # that file's own C file, so nothing is removed from %a_ids.
        print $c_fh "$_\n" if exists $a_ids{$_};
    }
    close $b_fh;
    close $c_fh;
}
closedir $b_dir;
__END__
That's probably not perfect, but hopefully it gets you moving in the right direction.
If you don't want to keep the Type-A IDs in memory like that, you may have to go line by line through the B files, reading all of A to check for a match, as you originally showed. But don't do that if you can avoid it; it could be pretty slow.
James