I saw that the link I sent you may not be working. Please take a look here to see what it is all about:
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B5AOpwg8IzVANjJlODZhZDctNWUzMS00MmNhLWI3OWMtMWNhMTdjODQwNjVl&hl=en_US

thanks again!

On 21 June 2011 14:22, Alberto Andreotti <albertoandreo...@gmail.com> wrote:

> Thank you guys, I really appreciate your answers. I don't have access to
> the cluster right now, I'll check the info you are asking and come back in a
> couple of hours.
> BTW, I tried the app on two clusters with similar results. I'm using
> 0.21.0.
>
> thanks again, Alberto.
>
> On 21 June 2011 14:16, GOEKE, MATTHEW (AG/1000) <matthew.go...@monsanto.com> wrote:
>
>> Harsh,
>>
>> Is it possible for mapred.reduce.slowstart.completed.maps to even play a
>> significant role in this? The only benefit he would find in tweaking that
>> for his problem would be to spread network traffic from the shuffle over a
>> longer period of time at a cost of having the reducer using resources
>> earlier. Either way he would see this effect across both sets of runs if he
>> is using the default parameters. I guess it would all depend on what kind of
>> network layout the cluster is on.
>>
>> Matt
>>
>> -----Original Message-----
>> From: Harsh J [mailto:ha...@cloudera.com]
>> Sent: Tuesday, June 21, 2011 12:09 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: Poor scalability with map reduce application
>>
>> Alberto,
>>
>> On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
>> <albertoandreo...@gmail.com> wrote:
>> > I don't know if speculatives maps are on, I'll check it. One thing I
>> > observed is that reduces begin before all maps have finished. Let me check
>> > also if the difference is on the map side or in the reduce. I believe it's
>> > balanced, both are slower when adding more nodes, but i'll confirm that.
>>
>> Maps and reduces are speculative by default, so must've been ON. Could
>> you also post a general input vs. output record counts and statistics
>> like that between your job runs, to correlate?
>>
>> The reducers get scheduled early but do not exactly "reduce()" until
>> all maps are done. They just keep fetching outputs. Their scheduling
>> can be controlled with some configurations (say, to start only after
>> X% of maps are done -- by default it starts up when 5% of maps are
>> done).
>>
>> --
>> Harsh J
>
> --
> José Pablo Alberto Andreotti.
> Tel: 54 351 4730292
> Móvil: 54351156526363.
> MSN: albertoandreo...@gmail.com
> Skype: andreottialberto

--
José Pablo Alberto Andreotti.
Tel: 54 351 4730292
Móvil: 54351156526363.
MSN: albertoandreo...@gmail.com
Skype: andreottialberto
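
For reference, the slow-start behaviour Harsh describes in the quoted thread (reducers are scheduled once a configured fraction of maps has finished, 5% by default) can be modelled roughly as below. This is only a simplified sketch in Python, not Hadoop's actual scheduler code: the function names are made up for illustration, and rounding the threshold up to a whole map is an assumption.

```python
import math

# Default taken from the thread: reducers start once 5% of maps are done.
# This mirrors the value of mapred.reduce.slowstart.completed.maps.
DEFAULT_SLOWSTART = 0.05


def maps_needed_before_reducers(total_maps, slowstart=DEFAULT_SLOWSTART):
    """Hypothetical helper: maps that must finish before reducers are scheduled.

    Assumption: the fractional threshold is rounded up to a whole map.
    """
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0 and 1")
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(slowstart * total_maps, 9))


def reducers_may_start(completed_maps, total_maps, slowstart=DEFAULT_SLOWSTART):
    """Hypothetical helper: True once enough maps are done to schedule reducers.

    Scheduled reducers only fetch map outputs at this point; reduce() itself
    still waits until all maps have completed.
    """
    return completed_maps >= maps_needed_before_reducers(total_maps, slowstart)
```

With 100 maps and the default fraction, reducers would be scheduled after 5 maps have finished. Setting the fraction to 1.0 holds them back until every map is done, which keeps reduce slots free for longer but also concentrates the shuffle traffic at the end of the map phase, the trade-off Matt raises above.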