Even if the code is the same, behavior can change if the data it processes 
has changed (e.g., date-related data) or if the parameters are different 
(e.g., sort/spill settings on the map side).
This looks to me like a buffering issue. The detailed logs should show exactly 
what is happening.

Thanks & Regards,
/R


On 11/24/09 2:18 PM, "himanshu chandola" <himanshu_cool...@yahoo.com> wrote:

Hi Todd,
It was definitely working fine a week ago, and the code hasn't changed much. 
On my laptop, a pseudo-distributed installation running the same code finishes 
successive MapReduce iterations quickly enough.

As far as I can see, it is probably due to reformatting the fs, but I can't 
understand why it would happen this way.

tx

Himanshu

Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.


________________________________
From: Todd Lipcon <t...@cloudera.com>
To: mapreduce-user@hadoop.apache.org
Sent: Tue, November 24, 2009 2:52:51 AM
Subject: Re: Maps getting stuck at 100%

Hi Himanshu,

The map progress percentage is calculated based on the input read, rather than 
the processing actually done. So, if you're doing a lot of work in your mapper, 
or reading ahead of what you've processed, you'll see this behavior reasonably 
often. It can also show up sometimes in streaming jobs if you are doing a lot 
of work per row, since there is more buffering going on between the counters 
and your actual mapper work.
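
To make the point above concrete, here is a toy model (not Hadoop's actual 
code) of a reader that pulls input in buffer-sized chunks while the mapper 
works through it one record at a time. The names map_progress and simulate, 
and all the sizes, are made up for illustration. Reported progress follows 
bytes read, so it can reach 100% while real work is still pending.

```python
def map_progress(bytes_read, split_bytes):
    """Fraction of the input split consumed so far (what gets reported)."""
    return min(1.0, bytes_read / split_bytes)


def simulate(split_bytes, buffer_bytes, record_bytes):
    """Return a list of (reported_progress, fraction_actually_processed).

    Hypothetical model: the reader fills a buffer ahead of the mapper,
    and the mapper handles one record per step.
    """
    bytes_read = 0
    bytes_processed = 0
    timeline = []
    while bytes_processed < split_bytes:
        # Refill the buffer only once the mapper has drained it.
        if bytes_processed >= bytes_read:
            bytes_read = min(split_bytes, bytes_read + buffer_bytes)
        # The mapper processes one record from the buffered data.
        bytes_processed = min(bytes_read, bytes_processed + record_bytes)
        timeline.append(
            (map_progress(bytes_read, split_bytes),
             bytes_processed / split_bytes)
        )
    return timeline


timeline = simulate(split_bytes=100, buffer_bytes=50, record_bytes=10)
# Steps where the task already reports 100% but still has records left.
stuck_steps = [step for step in timeline if step[0] == 1.0 and step[1] < 1.0]
```

With a larger buffer or slower per-record work, the stretch spent "at 100%" 
grows accordingly.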

The easiest way to see what the tasks are doing is to drill down to the logs 
for an individual task that's stuck at 100%. If you add some logging output to 
your program, that can be helpful. Another trick, if you have the right access, 
is to ssh into your tasktracker node and send the SIGQUIT signal to one of your 
task pids - this will make it dump stack to its stdout log, which you can then 
inspect to understand what's going on.
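
As a rough sketch of the "add some logging" suggestion, assuming the job is a 
Hadoop Streaming job written in Python (the thread does not say which kind of 
job it is): a streaming task can write lines of the form 
reporter:status:<message> and reporter:counter:<group>,<counter>,<amount> to 
stderr to update its status and counters, and any other stderr output ends up 
in the task's stderr log. The function name run_mapper, the counter names, and 
the placeholder per-record work are hypothetical.

```python
import sys
import time


def run_mapper(lines, out=sys.stdout, err=sys.stderr, report_every=1000):
    """Emit (key, 1) pairs and periodically report progress on stderr."""
    start = time.time()
    count = 0
    for count, line in enumerate(lines, 1):
        # Placeholder for the real per-record work: take the first field.
        key = line.rstrip("\n").split("\t", 1)[0]
        out.write(f"{key}\t1\n")
        if count % report_every == 0:
            # Status and counter updates understood by Hadoop Streaming.
            err.write(f"reporter:status:processed {count} records\n")
            err.write(f"reporter:counter:Debug,RecordsProcessed,{report_every}\n")
            err.flush()
    # A plain log line; it lands in the task's stderr log.
    err.write(f"mapper finished {count} records in {time.time() - start:.1f}s\n")
    return count
```

In a real streaming script you would call run_mapper(sys.stdin) at the bottom 
of the file. If the status line keeps changing while the task sits at 100%, 
the mapper is still working through buffered input rather than hanging.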

Hope that helps
-Todd

On Mon, Nov 23, 2009 at 11:48 PM, himanshu chandola 
<himanshu_cool...@yahoo.com> wrote:
Hi,
I use Cloudera's distribution for Hadoop. What I see is that a small fraction 
of maps get stuck at 100%. They show up as 100% but continue running. After a 
long delay they finally succeed, but it takes a while, around 10 minutes from 
the time they first show up as 100%.

We recently reformatted our Hadoop fs. Could it be related to that?


Thanks



