Harsh,

Is it possible for mapred.reduce.slowstart.completed.maps to play a significant role in this at all? The only benefit he would get from tweaking it for his problem is spreading the shuffle's network traffic over a longer period, at the cost of having the reducers hold resources earlier. Either way, he would see this effect in both sets of runs if he is using the default parameters. I guess it all depends on the cluster's network layout.

Matt

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Tuesday, June 21, 2011 12:09 PM
To: common-user@hadoop.apache.org
Subject: Re: Poor scalability with map reduce application

Alberto,

On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti <albertoandreo...@gmail.com> wrote:
> I don't know if speculatives maps are on, I'll check it. One thing I
> observed is that reduces begin before all maps have finished. Let me check
> also if the difference is on the map side or in the reduce. I believe it's
> balanced, both are slower when adding more nodes, but i'll confirm that.

Maps and reduces are speculative by default, so must've been ON. Could you also post a general input vs. output record counts and statistics like that between your job runs, to correlate?

The reducers get scheduled early but do not exactly "reduce()" until all maps are done. They just keep fetching outputs. Their scheduling can be controlled with some configurations (say, to start only after X% of maps are done -- by default it starts up when 5% of maps are done).

--
Harsh J
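
As a rough illustration of the slowstart threshold discussed in this thread, the sketch below estimates how many maps must complete before reducers may be scheduled. It is not Hadoop code: the function name maps_before_reduces is hypothetical, and the rule it applies (the configured fraction times the number of map tasks, rounded up) is an assumption about how mapred.reduce.slowstart.completed.maps is interpreted. The 0.05 default is the 5% figure cited above.

```python
import math

# Default cited in the thread: reducers are scheduled once 5% of maps are done.
DEFAULT_SLOWSTART = 0.05


def maps_before_reduces(total_maps, slowstart=DEFAULT_SLOWSTART):
    """Estimate how many maps must finish before reducers can be scheduled.

    Assumption: the threshold is slowstart * total_maps, rounded up.
    """
    if total_maps < 0:
        raise ValueError("total_maps must be non-negative")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0.0 and 1.0")
    return math.ceil(slowstart * total_maps)


if __name__ == "__main__":
    # Compare the default with two later-starting settings for a 400-map job.
    for fraction in (0.05, 0.80, 1.00):
        threshold = maps_before_reduces(400, fraction)
        print(f"slowstart={fraction:.2f}: reducers eligible after {threshold} maps")
```

Under this assumption, a value near 1.0 keeps reducers from occupying slots until most maps have finished, while the default lets them start fetching map outputs early, which is the trade-off between spreading shuffle traffic and holding resources that Matt describes.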