I saw that the link I sent you may not be working. Please take a look here
to see what it is all about:

https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B5AOpwg8IzVANjJlODZhZDctNWUzMS00MmNhLWI3OWMtMWNhMTdjODQwNjVl&hl=en_US


thanks again!

On 21 June 2011 14:22, Alberto Andreotti <albertoandreo...@gmail.com> wrote:

> Thank you guys, I really appreciate your answers. I don't have access to
> the cluster right now, so I'll check the info you are asking for and come
> back in a couple of hours.
> BTW, I tried the app on two clusters with similar results. I'm using
> 0.21.0.
>
> thanks again, Alberto.
>
>
> On 21 June 2011 14:16, GOEKE, MATTHEW (AG/1000) <
> matthew.go...@monsanto.com> wrote:
>
>> Harsh,
>>
>> Can mapred.reduce.slowstart.completed.maps even play a significant role
>> here? The only benefit he would get from tweaking it for his problem would
>> be to spread the shuffle's network traffic over a longer period, at the
>> cost of the reducers occupying resources earlier. Either way, he would see
>> this effect across both sets of runs if he is using the default
>> parameters. I guess it all depends on the cluster's network layout.
>>
>> Matt
>>
>> -----Original Message-----
>> From: Harsh J [mailto:ha...@cloudera.com]
>> Sent: Tuesday, June 21, 2011 12:09 PM
>> To: common-user@hadoop.apache.org
>> Subject: Re: Poor scalability with map reduce application
>>
>> Alberto,
>>
>> On Tue, Jun 21, 2011 at 10:27 PM, Alberto Andreotti
>> <albertoandreo...@gmail.com> wrote:
>> > I don't know if speculative maps are on, I'll check it. One thing I
>> > observed is that reduces begin before all maps have finished. Let me
>> > also check if the difference is on the map side or in the reduce. I
>> > believe it's balanced, both are slower when adding more nodes, but I'll
>> > confirm that.
>>
>> Maps and reduces are speculative by default, so speculation must have
>> been on. Could you also post general input vs. output record counts and
>> similar statistics for each of your job runs, so we can compare them?
>>
>> The reducers get scheduled early but do not exactly "reduce()" until
>> all maps are done. They just keep fetching outputs. Their scheduling
>> can be controlled with some configurations (say, to start only after
>> X% of maps are done -- by default it starts up when 5% of maps are
>> done).
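
The slow-start rule described above can be sketched as a small model. This is
only an illustration, not Hadoop's actual scheduler code: the function names
are hypothetical, the configuration is modelled as a plain dict, and rounding
the fraction up to a whole number of maps is an assumption. The property name
and the 5% default are the ones mentioned in this thread.

```python
import math

# Property name as used in this thread; 0.05 is the "5% of maps" default
# quoted above.
SLOWSTART_PROPERTY = "mapred.reduce.slowstart.completed.maps"
DEFAULT_SLOWSTART = 0.05


def maps_needed_before_reducers(total_maps, conf=None):
    """Return how many maps must finish before reducers may be scheduled.

    Assumption: the fraction is rounded up to a whole number of maps.
    """
    conf = conf or {}
    fraction = float(conf.get(SLOWSTART_PROPERTY, DEFAULT_SLOWSTART))
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("slowstart fraction must be between 0 and 1")
    # round() guards against floating-point noise before taking the ceiling.
    return math.ceil(round(fraction * total_maps, 9))


def reducers_may_start(completed_maps, total_maps, conf=None):
    """True once enough maps are done for reducers to begin fetching."""
    return completed_maps >= maps_needed_before_reducers(total_maps, conf)
```

Under this model, a 100-map job with the default setting makes reducers
eligible once 5 maps have finished, while a value of 1.0 holds them back
until every map is done. In both cases reduce() itself still waits for all
map outputs; only the fetch phase moves.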
>>
>> --
>> Harsh J
>
>
> --
> José Pablo Alberto Andreotti.
> Tel: 54 351 4730292
> Móvil: 54351156526363.
> MSN: albertoandreo...@gmail.com
> Skype: andreottialberto
>



