Given billions of integers in a file, with memory constraints that force us to parallelize: one solution is to split the file across 100 machines, where each machine computes the median of its part using quickselect and sends the result to a host machine. The host then calculates the median of these 100 medians and asks the sub-machines to discard all numbers less than this final median. So, map-reduce!
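The scheme above can be simulated in one process, with lists standing in for the 100 machines (a hypothetical sketch; the data, chunking, and `median` helper are my assumptions, not from the post). One caveat worth noting: the median of medians only approximates the global median, so discarding everything below it can discard the true median; it is safer as a pivot for further rounds than as the final answer.

```python
import random

def median(nums):
    # Exact median of one machine's chunk; a real machine would use
    # quickselect for O(n) average time instead of sorting.
    s = sorted(nums)
    return s[len(s) // 2]

random.seed(0)

# Hypothetical simulation: each entry of `machines` stands in for the
# chunk of roughly a billion/100 integers held by one machine.
data = [random.randrange(10**6) for _ in range(10_000)]
machines = [data[i::100] for i in range(100)]

# Step 1: each machine computes the median of its own part
# and sends it to the host.
local_medians = [median(chunk) for chunk in machines]

# Step 2: the host computes the median of those 100 medians,
# which it would broadcast back as the value to partition on.
pivot = median(local_medians)
```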
Now I want to do something like this: each sub-machine holds about a billion/100 integers. It picks an integer k at random and partitions its integers into two sets, one with the elements less than k and one with the elements greater than k (as in the quickselect algorithm). The machine then reports three things to the host: k, the size of the "less than" set, and the size of the "greater than" set. Once the host has this data from every sub-machine, it does some calculation and tells each machine to discard either its "less than" set or its "greater than" set, and then the whole process is repeated on the smaller remaining sets. Any ideas about how the host machine would make that decision, and on what basis? -- You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group.
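One plausible answer (my sketch, not from the post) simplifies the protocol slightly: instead of a different pivot per machine, the host broadcasts a single shared pivot each round, every machine partitions its chunk around it and reports the two set sizes, and the host keeps only the side that contains the element of the target rank, adjusting the rank whenever the lower side is dropped. This is essentially distributed quickselect; the function name and the simulation setup are assumptions for illustration.

```python
import random

def distributed_select(machines, rank):
    # Host-side loop: find the element of 0-indexed `rank` across all
    # machines' data. Per-machine work is simulated in this one process;
    # in reality each machine would report only its two set sizes.
    while True:
        alive = [m for m in machines if m]
        # Host samples a pivot from a random machine's remaining data
        # and broadcasts it.
        pivot = random.choice(random.choice(alive))
        # Each machine partitions its chunk around the shared pivot.
        less = [[x for x in m if x < pivot] for m in machines]
        greater = [[x for x in m if x > pivot] for m in machines]
        n_less = sum(len(p) for p in less)
        n_greater = sum(len(p) for p in greater)
        n_equal = sum(len(m) for m in machines) - n_less - n_greater
        if rank < n_less:
            machines = less        # host says: discard the "greater" sets
        elif rank < n_less + n_equal:
            return pivot           # the pivot itself has the target rank
        else:
            machines = greater     # host says: discard the "less" sets
            rank -= n_less + n_equal

# Example: 1001 distinct integers spread over 10 "machines";
# the element of rank 500 is the median.
random.seed(42)
data = list(range(1001))
random.shuffle(data)
chunks = [data[i::10] for i in range(10)]
print(distributed_select(chunks, len(data) // 2))  # prints 500
```

Each round strictly shrinks the total (the pivot is always excluded from both kept sides), so the loop terminates, and only O(1) numbers travel between host and machines per round.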