Terry wrote: > Gene wrote: > > I'm sorry, but the response below is wrong on several levels. See > > notes. > > > > C Erler wrote: > > > Because there are infinite integers, there is no need for a list of > > > 4,300,000,000 of them to contain a duplicate. If there are > > > 4,299,999,999 choices, there will be one duplicate. Because the > > > solution uses binary search, the list is sorted. > > > > The problem is talking about 32-bit integers. There are 4,294,967,296 > > of these, so 4,300,000,000 must contain a duplicate. > > > > Binary search for a _solution_ only means you repeatedly halve the > > candidate set. Data need to be sorted when the problem is to find a > > particular element. Here you are solving a different problem entirely. > > Indeed if the ints are sorted, the algorithm is trivial: make one pass > > through them and look for a duplicate adjacent pair. > > > > > > > > So, the full problem is probably something like the following. There > > > is a sorted list of the first 4,299,999,999 positive integers. Someone > > > duplicates one of them and inserts it into the list right next to the > > > original. Figure out which integer is duplicated. > > > > > > With that, it is easy to solve. If you look at the first 5000 > > > integers, the last one should be 5000. If the last one is 4999, you > > > know that the duplicate is in that sublist. > > > > > > So, basically, you take the first half or so of the list and see if the > > > last one is too low. That will tell you which half of the list has the > > > duplicate. Repeat until your sublist has only the duplicated entry. > > > > As stated above, the data aren't sorted. > > > > The algorithm that Bentley is talking about works by repeatedly halving > > the candidate range in which the duplicate element must lie. Initially > > this range is 0..2^32-1. On each pass it throws away the half of the > > range that contains fewer data values and it also throws away the data > > lying in that half. Eventually the range decreases to a single value, > > which must be a duplicate because the remaining data has at least two > > elements. The problem that Bentley notes is that the data may still > > have 4 billion elements at this stage! The final improvement he > > mentions is that you can throw away data _inside_ the candidate range > > as long as you keep enough data around to ensure that at least one > > duplicate occurs in it. This is equal to the size of the current > > candidate range plus 1! > > Hi Gene, > I don't clearly understand what you are saying . > What if we had a list of 8 elements like > 1 3 2 4 2 5 6 7 > How are we going to find the element.
Hi Terry. Your example doesn't really apply to this problem. If you assume integers have 3 bits (rather than 32), then Bentley's algorithm will work only if you have a list longer than 2^3=8 elements. So let's fix up the example by adding a zero at the end of your list. So here is pseudocode for Bentley's proposed algorithm (without the final improvement): lo = 0; // bottom of range m = 8; // number of integers currently in range tape = [1,3,2,4,2,5,6,7,0]; // list to find dup in while (m > 1) { s = lo + m/2; // base of hi half of range tape_lo = []; // set work tapes to empty tape_hi = []; foreach i from tape if (i < s) add i to tape_lo else add i to tape_hi if (tape_lo has more integers than tape_hi use tape_lo for tape in next pass else { use tape_hi for tape in next pass lo = s; } m = m/2; } // answer is in lo On your data, you get lo=0 m=8 s=4 tape=(1 3 2 4 2 5 6 7 0) lo=0 m=4 s=2 tape=(0 2 2 3 1) lo=2 m=2 s=3 tape=(3 2 2) answer=2 Now the need for improvement shows up if you use a list [7,7,7,0,7,7,7,7] as input. lo=0 m=8 s=4 tape=(7 7 7 0 7 7 7 7 7) lo=4 m=4 s=6 tape=(7 7 7 7 7 7 7 7) lo=6 m=2 s=7 tape=(7 7 7 7 7 7 7 7) 7 You can see that all the 7's get copied over and over, so the run time per pass is not decreasing. The improvement is to note that once you have m+1 characters on tape_lo or tape_hi you can stop writing values and the algorithm still works fine. In case you want to try other examples, here is a perl code that implements the algorithm for 3 bits. It includes the improvement to limit tape length. sub main { my $lo = 0; my $m = 8; my $tape = [1,6,7,0,2,4,5,3,1]; while ($m > 1) { my $s = $lo + $m/2; print "lo=$lo m=$m s=$s tape=(@$tape)\n"; my $tape_lo = []; my $tape_hi = []; foreach my $i (@$tape) { if ($i < $s) { push @$tape_lo, $i unless scalar @$tape_lo > $m } else { push @$tape_hi, $i unless scalar @$tape_hi > $m } } if (scalar @$tape_lo > scalar @$tape_hi) { $tape = $tape_lo; } else { $tape = $tape_hi; $lo = $s; } $m /= 2; } print "$lo\n"; } main; --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group. To post to this group, send email to algogeeks@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/algogeeks -~----------~----~----~----~------~----~------~--~---