I'm sorry, but the response below is wrong on several levels.  See
notes.

C Erler wrote:
> Because there are infinite integers, there is no need for a list of
> 4,300,000,000 of them to contain a duplicate.  If there are
> 4,299,999,999 choices, there will be one duplicate.  Because the
> solution uses binary search, the list is sorted.

The problem is talking about 32-bit integers.  There are 4,294,967,296
of these, so 4,300,000,000 must contain a duplicate.

Binary search for a _solution_ only means you repeatedly halve the
candidate set.  The data need to be sorted when the problem is to find
a particular element; here you are solving a different problem
entirely.  Indeed, if the ints were sorted, the algorithm would be
trivial: make one pass through them and look for an adjacent duplicate
pair.
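For illustration, that one-pass scan is a few lines of Python (my own
sketch, not anything from the original post):

```python
def find_adjacent_duplicate(sorted_ints):
    """Return a value that appears twice in a sorted sequence, or None."""
    # In sorted data, any duplicate must sit next to its twin.
    for a, b in zip(sorted_ints, sorted_ints[1:]):
        if a == b:
            return a
    return None

print(find_adjacent_duplicate([1, 2, 3, 3, 4]))  # prints 3
```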

>
> So, the full problem is probably something like the following.  There
> is a sorted list of the first 4,299,999,999 positive integers.  Someone
> duplicates one of them and inserts it into the list right next to the
> original.  Figure out which integer is duplicated.
>
> With that, it is easy to solve.  If you look at the first 5000
> integers, the last one should be 5000.  If the last one is 4999, you
> know that the duplicate is in that sublist.
>
> So, basically, you take the first half or so of the list and see if the
> last one is too low.  That will tell you which half of the list has the
> duplicate.  Repeat until your sublist has only the duplicated entry.

As stated above, the data aren't sorted.

The algorithm that Bentley is talking about works by repeatedly halving
the candidate range in which the duplicate element must lie.  Initially
this range is 0..2^32-1.   On each pass it throws away the half of the
range that contains fewer data values and it also throws away the data
lying in that half.  Eventually the range decreases to a single value,
which must be a duplicate because the remaining data still have at
least two elements.  The problem that Bentley notes is that the data
may still have 4 billion elements at this stage!  The final improvement
he mentions is that you can also throw away data _inside_ the candidate
range, as long as you keep enough values to ensure that at least one
duplicate occurs among them.  By the pigeonhole principle, one more
value than the size of the current candidate range suffices.
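In-memory, the whole procedure fits in a few lines of Python.  This is
my own sketch of the technique, not Bentley's code (his version works
over a file in multiple passes); the function name and the list-based
filtering are illustrative assumptions:

```python
def find_duplicate(data, lo, hi):
    """Return a duplicated integer, assuming every element of `data`
    lies in [lo, hi] and len(data) > hi - lo + 1 (pigeonhole)."""
    while lo < hi:
        mid = (lo + hi) // 2
        lower = [x for x in data if x <= mid]
        upper = [x for x in data if x > mid]
        # At least one half holds more values than its subrange can
        # accommodate without a duplicate; keep that half and discard
        # the data lying in the other half.
        if len(lower) > mid - lo + 1:
            data, hi = lower, mid
        else:
            data, lo = upper, mid + 1
        # The final improvement: inside the surviving half we only need
        # (range size + 1) values to still guarantee a duplicate, so
        # the rest can be dropped too.
        data = data[:hi - lo + 2]
    return lo

# 0..9 with an extra 7: eleven values in a range of size ten.
print(find_duplicate(list(range(10)) + [7], 0, 9))  # prints 7
```

For the original problem the call would be `find_duplicate(data, 0,
2**32 - 1)`; each pass halves the candidate range, so it terminates
after at most 32 passes over the data.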

