On May 12, Larry Wissink said:

>We have a backup server that is missing records from the production
>server for a particular table. We know that it should have sequential
>records and that it is missing some records. We want to get a sense of
>the number of records missing. So, we know the problem started around
>the beginning of March at id 70,000,000 (rounded for convenience).
>Currently we are at 79,000,000. So, I dumped to a file all the ids
>between 70,000,000 and 79,000,000 (commas inserted here for
>readability). I need to figure out what numbers are missing. The way
>that seemed easiest to me was to create two arrays. One with every
>number between 70 and 79 million, the other with every number in our
>dump file. Then compare them as illustrated in the Perl Cookbook using
>a hash.
>
>But, when I try to scale that to 9 million records, it doesn't work.
>This is probably because it is trying to do something like what db
>people call a cartesian join (every record against every record).
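For reference, the hash-based comparison the question describes can be sketched roughly as below. This is a minimal illustration in the spirit of the Perl Cookbook's "elements in one array but not another" approach, not the Cookbook's exact code, and the sub name `missing_by_hash` is hypothetical. Done this way it is not actually every record against every record: each id costs one hash lookup. At 9 million ids the more likely problem is memory, since it holds a hash key for every present id plus the whole range as a list.

```perl
use strict;
use warnings;

# Hash-based set difference: mark every id we actually have, then walk
# the full range and keep the ids that were never marked.
# Memory grows with the number of ids (hash keys plus the range list).
sub missing_by_hash {
    my ($lo, $hi, $present) = @_;    # $present: array ref of ids we have
    my %seen;
    @seen{@$present} = ();           # hash slice: one key per present id
    return grep { !exists $seen{$_} } $lo .. $hi;
}

my @have    = (1, 2, 4, 7, 8, 9);
my @missing = missing_by_hash(1, 10, \@have);
print "@missing\n";    # prints "3 5 6 10"
```

This works well for small ranges; the one-pass approach avoids building the hash and the full range at all.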
Well, don't do that! ;)

When you have a superset and a subset, and they're both sorted, you only need to walk through the superset ONCE, keeping an index into the subset:

  use strict;
  use warnings;

  my @superset = (1 .. 10);
  my @subset   = (1, 2, 4, 7, 8, 9);
  my @missing  = ();
  my $idx      = 0;    # position of the next unmatched element of @subset

  for (@superset) {
    # the "$idx < @subset" test stops comparing once @subset runs out
    push @missing, $_
      unless $idx < @subset and $subset[$idx] == $_ and ++$idx;
  }

That's just a bit of shorthand for:

  for (@superset) {
    if ($idx < @subset and $subset[$idx] == $_) { $idx++ }
    else                                         { push @missing, $_ }
  }

Anyway, that populates @missing with the missing elements (3, 5, 6 and 10 here) in linear time. It relies on every element of @subset also being in @superset; an extra element would stall $idx and make everything after it look missing.

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734  http://www.perlmonks.org/  http://www.cpan.org/
CPAN ID: PINYAN          [Need a programmer?  If you like my work, let me know.]
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.
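Applied to the original problem, the same one-pass idea can read the dump file line by line, so neither the 9-million-id range nor the dump ever has to sit in memory as an array. A minimal sketch, assuming the dump holds one integer id per line sorted ascending (running it through `sort -n` first if it isn't); `find_gaps` and `ids.txt` are hypothetical names. Instead of listing each missing id, it reports missing runs as [first, last] pairs, which keeps the output small if whole blocks went missing.

```perl
use strict;
use warnings;

# One pass over a sorted stream of ids, remembering only the next id we
# expect to see. Assumes one integer id per line, sorted ascending.
# Non-numeric lines, duplicates and ids below $lo are skipped.
# Returns the missing runs as [first, last] pairs.
sub find_gaps {
    my ($fh, $lo, $hi) = @_;
    my @gaps;
    my $expect = $lo;
    while (my $line = <$fh>) {
        next unless $line =~ /^\s*(\d+)\s*$/;
        my $id = $1;
        next if $id < $expect;    # duplicate, or below the range
        last if $id > $hi;        # sorted input: nothing further is in range
        push @gaps, [ $expect, $id - 1 ] if $id > $expect;
        $expect = $id + 1;
    }
    push @gaps, [ $expect, $hi ] if $expect <= $hi;    # missing tail
    return @gaps;
}

# Demo on an in-memory "file". For the real dump you would open the file
# instead (hypothetical name): open my $fh, '<', 'ids.txt' or die $!;
# and then call find_gaps($fh, 70_000_000, 79_000_000).
my $dump = join '', map { "$_\n" } 1, 2, 4, 7, 8, 9;
open my $fh, '<', \$dump or die "Cannot open in-memory dump: $!";

my $total = 0;
for my $gap ( find_gaps( $fh, 1, 10 ) ) {
    my ( $first, $last ) = @$gap;
    $total += $last - $first + 1;
    print $first == $last ? "$first\n" : "$first-$last\n";
}
print "missing: $total\n";
# prints 3, 5-6 and 10 on separate lines, then "missing: 4"
```

Only the current line, one counter and the list of gaps are held in memory, so the cost stays flat no matter how many ids the dump contains.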