@Gene: Considering the median-of-medians (m-of-m) algorithm, one thing that
worries me is that m-of-m is not a good fit for huge data, because when
implementing it in practice the constant factor has to be very large for the
O(n) bound to work.

Here is the reason why it is not a good approach for huge data. The
recurrence for m-of-m is T(n) <= T(n/5) + T(3n/4) + Theta(n); substituting
the inductive guess T(m) <= c*m for the two recursive calls gives:

T(n) <= (c/5)n + (3/4)cn + Theta(n)
     <= (19/20)cn + Theta(n)
     =  cn - ((1/20)cn - Theta(n))

Now, for the residual part, (1/20)cn - Theta(n), to stay non-negative, c has
to be at least about 20 times the constant hidden in the Theta(n) term (the
per-call partitioning work). So the O(n) bound only goes through with a large
constant factor.
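
To spell the step out in symbols, here is the same bound with an explicit
constant a standing in for the Theta(n) partitioning cost (the symbol a is
just my own notation for this illustration, not something from the thread):

\begin{align*}
T(n) &\le T(n/5) + T(3n/4) + a\,n \\
     &\le \tfrac{c}{5}\,n + \tfrac{3}{4}\,c\,n + a\,n \\
     &=   \tfrac{19}{20}\,c\,n + a\,n \\
     &=   c\,n - \Bigl(\tfrac{1}{20}\,c\,n - a\,n\Bigr) \\
     &\le c\,n \qquad \text{provided } c \ge 20\,a .
\end{align*}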

So for huge data input this is not a good approach, and the heap would win
over it. A minimal sketch of the single-pass heap method follows, and a
sketch of m-of-m selection itself is at the bottom of this mail, below the
quoted thread.
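
For concreteness, here is a rough sketch of the heap approach, assuming the
numbers come one per line from a text file (the function name top_k, the
file path, and k are placeholders, not anything agreed on in the thread):

import heapq

def top_k(path, k):
    """Return the k largest numbers in the file at `path` (one number per line).

    A min-heap of size k holds the best candidates seen so far, so each
    input number is read from disk exactly once and only k items ever stay
    in memory -- the property Gene describes below for the heap approach.
    """
    heap = []                           # heap[0] is the smallest of the current top k
    with open(path) as f:
        for line in f:
            x = float(line)
            if len(heap) < k:
                heapq.heappush(heap, x)
            elif x > heap[0]:
                heapq.heapreplace(heap, x)   # drop the smallest, keep x
    return sorted(heap, reverse=True)

# e.g. top_k("numbers.txt", 10) would return the 10 largest values.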



On Mon, Dec 12, 2011 at 3:19 PM, Gene <gene.ress...@gmail.com> wrote:

> If N is big enough so that all data will not fit in main memory and k
> is small enough so that the k top elements fit in memory, then the
> heap is likely to be the best. This is because disk access is so
> incredibly slow compared to memory access: A few milliseconds (10^-3)
> vs. a few nanoseconds (10^-9), a factor of a million.  The heap-based
> algorithms need to touch the disk only once per data item. It would be
> difficult if not impossible to implement m-of-m that achieves this.
>
> On the other hand, if both k and N are so big that this many data do
> not fit in memory, then the m-of-m algorithm will probably win.  I say
> this because Quicksort is known to have better performance than
> heapsort, including memory access patterns on large data. (In fact
> heaps can have terrible access patterns if implemented in the typical
> textbook way where the children of a[i] are at a[2*i] and a[2*i+1].)
> This is a strong indicator that m-of-m (which is at heart a Quicksort
> that stops early) will do better than keeping the top k elements in a
> heap for this problem when both algorithms need disk i/o.
>
> Needing to find the top 10 billion out of a few trillion data is
> actually a realistic problem in some areas of research.  So this is a
> useful thing to discuss.
>
> Gene
>
> On Dec 11, 8:11 pm, bharath sriram <bharath.sri...@gmail.com> wrote:
> > Hey group
> >
> > This is kind of a cliched question, but given a file with a billion
> > numbers and the task of computing the 'k' largest numbers from this
> > file, what approach is preferred?
> > 1) Using heaps
> > 2) Using Median-of-median algorithm.
> >
> > I have read a few links which prefer heaps, but the median-of-medians
> > algorithm clearly has linear time complexity, and I don't see how it's
> > any worse than, if not better than, using heaps.
> > Any thought?
>
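To make Gene's point concrete that m-of-m is "at heart a Quicksort that stops
early", here is a rough in-memory sketch of median-of-medians selection. The
name mom_select is mine, it works on a Python list rather than a file, and
the out-of-core handling Gene is talking about is not addressed at all:

def mom_select(a, k):
    """Return the k-th smallest element of list a (k is 0-based).

    The pivot is the median of the group-of-5 medians, which is what gives
    the n/5 and (at most) 3n/4 recursive call sizes used in the analysis
    at the top of this mail.
    """
    if len(a) <= 5:
        return sorted(a)[k]
    # Median of each group of 5, then recurse to find the median of those medians.
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    pivot = mom_select(medians, len(medians) // 2)
    # Three-way partition around the pivot, then recurse into one side only.
    lo = [x for x in a if x < pivot]
    eq = [x for x in a if x == pivot]
    hi = [x for x in a if x > pivot]
    if k < len(lo):
        return mom_select(lo, k)
    if k < len(lo) + len(eq):
        return pivot
    return mom_select(hi, k - len(lo) - len(eq))

# The k-th largest (1-based) is the (n-k)-th smallest (0-based):
# mom_select(a, len(a) - k).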
