On Sep 9, 2010, at 11:37 AM, Prabhakar Ragde wrote:

> And then what? Are we justified in saying T(N) = T(3N/4) + O(N)? Really, we 
> have T(N) = T(X) + O(N), where X is a random variable uniformly distributed 
> over [0..N-1]. With a suitable induction hypothesis, we can dig ourselves out 
> of this, but it's not simple, unfortunately. --PR

I don't think even this is quite right (I'm really reaching above my pay grade 
here, so I apologize if I'm making stupid mistakes):

We know that each iteration spends O(N) time for the partition, plus whatever 
time is spent on the chosen half of the partition.  So, we definitely have 

T(N) = T(X) + O(N),

where X is the length of the chosen partition, which must lie between 1 and N, 
since it must contain the desired element and could contain all the elements if 
the pivot is the smallest element from the list.  What is the distribution for 
X?  The distribution of N-lt (the length of the set of elements smaller than 
the pivot) is uniform random on [0...N-1] because the pivot is chosen uniformly 
at random from the list.  (We are assuming that the distribution of elements in 
the list is uniform, and that the set of possible list elements has cardinality 
much greater than the length of the list; if there are likely to be lots of 
repeated elements, then N-lt is biased toward smaller values, since repeated 
pivot elements wind up in the greater-or-equal list by default.)

But, even assuming N-lt ~ U(0,N-1), it's not obvious to me that the selection 
of the less-than or greater-or-equal lists is 50/50.  It seems like the longer 
lists are more likely to be chosen.  So, X should have a distribution something 
like

P(X) = P(N-lt = X) P(lt chosen) + P(N-gte = X) P(gte chosen) = P(N-lt = X)P(lt 
chosen) + P(N-lt = N-X) P(gte chosen) = 1/N X/N + 1/N X/N = 2X/N^2,

except that there is one case where this formula breaks down: when X = N, we 
cannot have N-lt = N since at least the pivot element must be found in the 
greater-or-equal list.  So 

P(X) = 2X/N^2 if 1 <= X < N, and 1/N if X = N.

(Note that this sums to 1 = sum(2X/N^2, X = 1..N-1) + 1/N, so it at least 
passes that simple check.)  Now we can arrive at a recursion relation for < 
T(N) >:

< T(N) > = < T(X) > + O(N) = 1/N (sum(2X/N^2 T(X), X=1..N-1) + 1/N T(N)) + O(N)

I'm not exactly sure where to go from here, however; as you say, it's tricky to 
analyze.  Nevertheless, it sure feels like it's O(N) (I've experimented quite a 
bit with timing tests, and the simple argument about averages I gave before 
"feels right").

Will


_________________________________________________
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/users

Reply via email to