Wouldn't median of medians work? Mappers and reducers using Hadoop.
Best Regards
Ashish Goel
"Think positive and find fuel in failure"
+919985813081
+919966006652


On Sun, Jul 17, 2011 at 8:57 PM, Dumanshu <duman...@gmail.com> wrote:

> Given billions of integers in a file. Memory constraints exist, so we need
> to parallelize. One solution: split the file across 100 machines, have each
> machine calculate the median of its part using quickselect, and send the
> result to a host machine. The host then calculates the median of these
> medians and asks the sub-machines to discard all the numbers less than
> this final median. So, map-reduce!
>
> Now I want to do something like this:
>
> Each sub-machine has a billion/100 integers. It chooses an integer k at
> random and divides its integers into two parts: one set less than k and
> the other greater than k (as in the quickselect algorithm). Each machine
> returns three things to the host: k itself, plus the number of elements
> in each of the two sets.
> Once it has that data from all the sub-machines, the host does some
> calculation and asks each machine to discard either its less-than set
> or its greater-than set. The whole thing is then repeated with fewer
> elements.
>
> Any ideas about how the host machine would do that and on what basis?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> To post to this group, send email to algogeeks@googlegroups.com.
> To unsubscribe from this group, send email to
> algogeeks+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.
>
>
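The host decision the question asks about can be sketched as a distributed quickselect: the host sums the less-than counts from all machines; if the target rank falls below that sum it keeps the less-than sets, otherwise it discards them and reduces the rank. A minimal single-process simulation, assuming a three-way split (<, ==, >) rather than the two-way split in the question so that duplicate values cannot stall progress (the function name and the list-of-lists stand-in for machines are illustrative):

```python
import random

def distributed_select(parts, rank):
    """Return the element of global 0-based `rank` across all `parts`.

    `parts` simulates the per-machine datasets; each loop iteration
    mirrors one pivot/report/discard round between host and machines.
    """
    while True:
        # One machine picks a random pivot k and broadcasts it.
        donor = random.choice([p for p in parts if p])
        k = random.choice(donor)
        # Each machine partitions locally; only the set sizes need to
        # travel back to the host, not the data itself.
        less = [[x for x in p if x < k] for p in parts]
        greater = [[x for x in p if x > k] for p in parts]
        n_less = sum(len(p) for p in less)
        n_eq = sum(len(p) for p in parts) - n_less - sum(len(p) for p in greater)
        # Host decision: which side does the target rank fall on?
        if rank < n_less:
            parts = less                  # discard each greater-than set
        elif rank < n_less + n_eq:
            return k                      # the pivot itself is the answer
        else:
            parts = greater               # discard each less-than set
            rank -= n_less + n_eq

# For the median of n values, ask for rank (n - 1) // 2.
```

Every round removes at least the pivot from consideration, so the loop terminates, and with random pivots only O(log n) rounds are expected; per round each machine sends just k and two counts.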
