Wouldn't median of medians work, with mappers and reducers using Hadoop?

Best Regards,
Ashish Goel
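The median-of-medians idea can be sketched as a two-stage map/reduce job. One caveat worth noting: the median of the per-machine medians is only guaranteed to lie between roughly the 25th and 75th percentile of the full data set, so it is an approximation of the true median, not the exact answer. A minimal single-process sketch, with the machines simulated as lists (`machine_median` is an illustrative name, not a Hadoop API):

```python
import random

def machine_median(chunk):
    # Map stage: each machine computes the median of its own chunk.
    # (Sorting stands in for quickselect, which a real machine would use.)
    s = sorted(chunk)
    return s[len(s) // 2]

data = list(range(1000))
random.shuffle(data)
chunks = [data[i::10] for i in range(10)]  # simulate 10 machines

medians = [machine_median(c) for c in chunks]
pivot = machine_median(medians)  # Reduce stage: median of the medians

# The pivot's global rank is bounded away from the extremes (at least
# ~25% of elements lie on each side), but it need not be the exact median.
rank = sum(1 for x in data if x < pivot)
assert 0.25 * len(data) <= rank <= 0.75 * len(data)
```

Because of that approximation, a discard step based on this pivot has to keep enough elements around the pivot to correct for the error before the exact median can be reported.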
On Sun, Jul 17, 2011 at 8:57 PM, Dumanshu <duman...@gmail.com> wrote:
> Given billions of integers in a file. Memory constraints exist, so we need
> to parallelize. One solution is to split the file over 100 machines; each
> machine calculates the median of its part using quickselect and sends the
> result to a host machine. The host then calculates the median of these
> medians and asks the sub-machines to discard all the numbers less than
> this final median. So, map-reduce!
>
> Now I want to do something like this:
>
> Each sub-machine has a billion/100 integers. It chooses an integer k at
> random and partitions its integers into two sets, one less than k and one
> greater than k (as in the quickselect algorithm). The machine returns
> three things to the host: k, and the number of elements in each of the
> two sets.
>
> After collecting this data from all the sub-machines, the host does some
> calculation and asks each machine to discard either its less-than set or
> its greater-than set. The whole process is then repeated with fewer
> elements.
>
> Any ideas about how the host machine would do that, and on what basis?

--
You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
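One common basis for the host's decision is to track the global rank of the element it is looking for. Each round the host settles on a single shared pivot (e.g. one of the k values the machines reported), every machine counts how many of its elements fall below it, and the host compares the total count to the target rank: if the target is smaller, the answer lies in the less-than sets and the greater-than sets can be discarded; otherwise the less-than sets are discarded and the target rank is reduced by the number of elements thrown away. A self-contained sketch under those assumptions, simulating each machine as a list (`distributed_select` and the single-pivot-per-round choice are illustrative, not from the original post, which has each machine pick its own k):

```python
import random

def distributed_select(chunks, target_rank):
    """Return the element of global 0-based rank `target_rank` across all
    chunks, discarding one side of a shared pivot each round."""
    while True:
        total = sum(len(c) for c in chunks)
        if total == 1:
            return next(x for c in chunks for x in c)
        # Each machine proposes a random pivot; the host picks one of them.
        pivot = random.choice([random.choice(c) for c in chunks if c])
        # Each machine reports how many of its elements are below / equal.
        below = sum(sum(1 for x in c if x < pivot) for c in chunks)
        equal = sum(sum(1 for x in c if x == pivot) for c in chunks)
        if target_rank < below:
            # Target lies in the less-than sets: discard the rest.
            chunks = [[x for x in c if x < pivot] for c in chunks]
        elif target_rank < below + equal:
            return pivot  # the pivot itself has the target rank
        else:
            # Target lies in the greater-than sets: discard the rest and
            # shift the rank by the number of elements discarded.
            chunks = [[x for x in c if x > pivot] for c in chunks]
            target_rank -= below + equal

data = random.sample(range(10_000), 1001)
chunks = [data[i::10] for i in range(10)]  # simulate 10 machines
median = distributed_select(chunks, len(data) // 2)
assert median == sorted(data)[len(data) // 2]
```

Each round removes at least the pivot itself, and with random pivots removes about half the remaining elements in expectation, so this terminates after O(log n) rounds; each round costs one broadcast of the pivot and one pair of counts per machine.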