Hi all, I have been reading the source code of the mapper task, and it seems that the output of one mapper task is a single data file plus a single index file. Each reducer task then fetches only its own part of that mapper output. I am wondering why the mapper does not write its output into n files (n being the number of reducers), since the mapper task knows the Partitioner and the logic would be much simpler. Is there a performance reason for putting the output into one file? Thanks.
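
To make the layout I am describing concrete, here is a minimal sketch of how I understand it. This is not the actual Hadoop code: the class SingleFileMapOutput, its methods, and the hash-based partitionFor are names I made up for illustration, and the real index file format may differ. All partitions are written back to back into one data buffer, and an offset table (standing in for the index file) tells each reducer which byte range to fetch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SingleFileMapOutput {

    /** Hypothetical stand-in for a Partitioner: non-negative key hash modulo reducer count. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** One "data file" plus one "index": bytes offsets[i]..offsets[i+1] belong to reducer i. */
    static final class MapOutput {
        final byte[] data;
        final long[] offsets; // length is numReducers + 1

        MapOutput(byte[] data, long[] offsets) {
            this.data = data;
            this.offsets = offsets;
        }
    }

    /** Groups records by partition and writes the partitions contiguously into one buffer. */
    static MapOutput write(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int p = 0; p < numReducers; p++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReducers)).add(key);
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long[] offsets = new long[numReducers + 1];
        for (int p = 0; p < numReducers; p++) {
            offsets[p] = out.size(); // start of partition p in the single data buffer
            for (String key : buckets.get(p)) {
                byte[] record = (key + "\n").getBytes(StandardCharsets.UTF_8);
                out.write(record, 0, record.length);
            }
        }
        offsets[numReducers] = out.size(); // end of the last partition
        return new MapOutput(out.toByteArray(), offsets);
    }

    /** What a reducer would fetch: only its own byte range, located through the index. */
    static String readSegment(MapOutput output, int reducer) {
        int start = (int) output.offsets[reducer];
        int end = (int) output.offsets[reducer + 1];
        return new String(output.data, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MapOutput output = write(Arrays.asList("a", "b", "c", "d"), 2);
        System.out.println("index: " + Arrays.toString(output.offsets));
        for (int r = 0; r < 2; r++) {
            System.out.println("reducer " + r + " fetches: "
                    + readSegment(output, r).replace("\n", " "));
        }
    }
}
```

In this sketch one mapper always produces one data buffer and one offset table regardless of n, whereas the alternative I am asking about would produce n separate files per mapper.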
-- Best Regards Jeff Zhang