On May 12, Larry Wissink said:

>We have a backup server that is missing records from the production
>server for a particular table.  We know that it should have sequential
>records and that it is missing some records.  We want to get a sense of
>the number of records missing.  So, we know the problem started around
>the beginning of March at id 70,000,000 (rounded for convenience).
>Currently we are at 79,000,000.  So, I dumped to a file all the ids
>between 70,000,000 and 79,000,000 (commas inserted here for
>readability).  I need to figure out what numbers are missing.  The way
>that seemed easiest to me was to create two arrays.  One with every
>number between 70 and 79 million, the other with every number in our
>dump file.  Then compare them as illustrated in the Perl Cookbook using
>a hash.
>
>But, when I try to scale that to 9 million records, it doesn't work.
>This is probably because it is trying to do something like what db
>people call a cartesian join (every record against every record).

Well, don't do that! ;)  When you have a superset and a subset, and
both are sorted, you only need to walk through the superset ONCE,
keeping a single index into the subset.

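  my @superset = (1 .. 10);
  my @subset = (1, 2, 4, 7, 8, 9);
  my @missing = ();

  my $idx = 0;  # position of the next unmatched element of @subset

  for (@superset) {
    # the bounds check avoids comparing against undef past the end
    push @missing, $_
      unless $idx < @subset and $subset[$idx] == $_ and ++$idx;
  }

That's just a bit of shorthand for:

  for (@superset) {
    if ($idx < @subset and $subset[$idx] == $_) { $idx++ }
    else { push @missing, $_ }
  }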

Anyway, that populates @missing with the missing elements (here 3, 5, 6
and 10), in linear time.
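
For the real 9-million-id case you probably don't want either array in
memory.  Here is a rough sketch of the same single pass done straight
off the dump file.  It assumes the dump has one id per line, sorted
ascending; the name missing_ids and the in-memory sample data are made
up for illustration, so adjust to taste.

```perl
use strict;
use warnings;

# Hypothetical helper: return every id in $first .. $last that does not
# appear on the filehandle. Assumes one id per line, sorted ascending.
sub missing_ids {
  my ($fh, $first, $last) = @_;
  my @missing;
  my $expect = $first;                  # next id we expect to see

  while (my $id = <$fh>) {
    chomp $id;
    next unless $id =~ /^\d+$/;         # skip blank or malformed lines
    next if $id < $expect;              # duplicate or below the range
    last if $id > $last;                # past the range, stop reading
    push @missing, $expect .. $id - 1;  # ids skipped over (may be none)
    $expect = $id + 1;
  }
  push @missing, $expect .. $last;      # tail after the last id seen
  return @missing;
}

# Sample data standing in for the dump file (an assumption, not real ids).
my $dump = "1\n2\n4\n7\n8\n9\n";
open my $fh, '<', \$dump or die "can't open in-memory dump: $!";
my @missing = missing_ids($fh, 1, 10);
print scalar(@missing), " missing: @missing\n";
```

With a real file you would open the dump by name instead of \$dump and
pass 70_000_000 and 79_000_000 as the bounds.  Only the missing ids are
ever held in memory, and scalar(@missing) gives the count.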

-- 
Jeff "japhy" Pinyan      [EMAIL PROTECTED]      http://www.pobox.com/~japhy/
RPI Acacia brother #734   http://www.perlmonks.org/   http://www.cpan.org/
CPAN ID: PINYAN    [Need a programmer?  If you like my work, let me know.]
<stu> what does y/// stand for?  <tenderpuss> why, yansliterate of course.

