Harsh,

Could mapred.reduce.slowstart.completed.maps even play a significant role
here? The only benefit he would get from tweaking it for his problem would be
to spread the shuffle's network traffic over a longer period, at the cost of
the reducers holding resources earlier. Either way, he would see this effect
in both sets of runs if he is using the default parameters. I guess it all
depends on what kind of network layout the cluster is on.

Matt

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com] 
Sent: Tuesday, June 21, 2011 12:09 PM
To: common-user@hadoop.apache.org
Subject: Re: Poor scalability with map reduce application

Alberto,

On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
<albertoandreo...@gmail.com> wrote:
> I don't know if speculative maps are on, I'll check it. One thing I
> observed is that reduces begin before all maps have finished. Let me check
> also if the difference is on the map side or in the reduce. I believe it's
> balanced, both are slower when adding more nodes, but I'll confirm that.

Maps and reduces are speculative by default, so it must have been on. Could
you also post general input vs. output record counts and similar statistics
from your job runs, so we can correlate them?

The reducers get scheduled early but do not actually call reduce() until
all maps are done. Until then they just keep fetching map outputs. Their
scheduling can be controlled through configuration (say, to start only
after X% of maps are done; by default they start once 5% of maps are
done).
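
To make the threshold above concrete, here is a rough sketch of how the
slowstart fraction translates into a number of completed maps. It is a
simplified model, not the JobTracker's actual code: the class and method
names are hypothetical, and rounding up is an assumption. The 0.05 value
mirrors the 5% default mentioned above for
mapred.reduce.slowstart.completed.maps.

```java
public class SlowstartSketch {

    // Hypothetical helper: number of maps that must complete before
    // reducers become eligible for scheduling, assuming the fraction
    // is applied to the total map count and rounded up.
    static int mapsBeforeReducersStart(int totalMaps, double slowstartFraction) {
        if (totalMaps < 0 || slowstartFraction < 0.0 || slowstartFraction > 1.0) {
            throw new IllegalArgumentException("invalid map count or fraction");
        }
        return (int) Math.ceil(slowstartFraction * totalMaps);
    }

    public static void main(String[] args) {
        int totalMaps = 200;                  // illustrative job size
        double[] fractions = {0.05, 0.5, 1.0}; // 0.05 = the 5% default

        for (double f : fractions) {
            System.out.println("fraction " + f + " -> reducers may start after "
                    + mapsBeforeReducersStart(totalMaps, f) + " of "
                    + totalMaps + " maps");
        }
    }
}
```

With a fraction of 1.0 the reducers would not be scheduled until every map
has finished, which delays the shuffle but frees reduce slots in the
meantime; lower values start the fetch earlier.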

-- 
Harsh J