Brice Arnould wrote:
I was asking myself if it could be a good idea to parallelize some of the algorithms of Hadoop, such as MergeSorter, for the case of a single job run on a multicore system.
One can already exploit parallelism on a multicore system by using "pseudo-distributed" mode and increasing mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.
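As a rough sketch, assuming a four-core machine, the tasktracker's site configuration file (hadoop-site.xml or mapred-site.xml, depending on the Hadoop version) might contain something like the following. The two property names are the ones mentioned above; the value 4 is only an illustrative guess and should be tuned to the number of cores and the memory available.

```xml
<configuration>
  <!-- Hypothetical values for a four-core machine; adjust to your hardware. -->
  <property>
    <!-- Maximum number of map tasks run simultaneously by one tasktracker. -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <!-- Maximum number of reduce tasks run simultaneously by one tasktracker. -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

These limits are read by the tasktracker daemon at startup, so it would need to be restarted for a change to take effect.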
LocalRunner should also someday be enhanced to run multiple maps and reduces in separate threads, which would be more efficient, since intermediate data would not need to travel through the loopback network interface. But I don't see an urgent case for making the sort code itself multi-threaded, since MapReduce already sorts in parallel: each map and reduce task sorts its own portion of the data.
Doug
