Re: Reduce task going away for 10 seconds at a time

2009-03-16 Thread Aaron Kimball
If you jstack the process in the middle of one of these pauses, can you see
where it's sticking?
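
If the pauses are hard to catch by hand, one option is to have the task dump its own stacks when progress stalls. The following is only a rough sketch under assumptions: `StallWatchdog`, `beat()`, and the threshold are hypothetical names and values, not part of Hadoop, and it relies only on the standard `Thread.getAllStackTraces()` call. It prints roughly what jstack would show for each live thread.

```java
import java.util.Map;

/** Hypothetical helper (not a Hadoop class): dumps all thread stacks during a stall. */
public class StallWatchdog {
    // Time of the last reported progress, written by the worker, read by the watchdog.
    private volatile long lastBeat = System.currentTimeMillis();

    /** Call this from the reduce loop whenever a record has been processed. */
    public void beat() {
        lastBeat = System.currentTimeMillis();
    }

    /** Builds a jstack-like report: one stack trace per live thread. */
    public static String dumpAllStacks() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Starts a daemon thread that prints stacks to stderr while no beat has arrived for thresholdMs. */
    public void start(final long thresholdMs) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        Thread.sleep(thresholdMs / 2);
                        if (System.currentTimeMillis() - lastBeat > thresholdMs) {
                            System.err.println(dumpAllStacks());
                        }
                    }
                } catch (InterruptedException ie) {
                    // Interrupted: stop watching.
                }
            }
        }, "stall-watchdog");
        t.setDaemon(true);
        t.start();
    }
}
```

You would call `start(2000)` once at the top of the reduce and `beat()` per record. If the whole JVM is paused (for example during garbage collection), the watchdog is paused too, so an external jstack is still worth trying.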
- Aaron

On Fri, Mar 13, 2009 at 6:51 AM, Doug Cook nab...@candiru.com wrote:


 Hi folks,

 I've been debugging a severe performance problem with a Hadoop-based
 application (a highly modified version of Nutch). I've recently upgraded to
 Hadoop 0.19.1 from a much, much older version, and a reduce that used to
 work just fine is now running orders of magnitude more slowly.

 From the logs I can see that progress of my reduce stops for periods that
 average almost exactly 10 seconds (with a very narrow distribution around
 10 seconds), and it does so in various places in my code, but more or less
 in proportion to how much time I'd expect the task would normally spend in
 that particular place in the code, i.e. the behavior seems like my code is
 randomly being interrupted for 10 seconds at a time.

 I'm planning to keep digging, but thought that these symptoms might sound
 familiar to someone on this list. Ring any bells? Your help much
 appreciated.

 Thanks!

 Doug Cook
 --
 View this message in context:
 http://www.nabble.com/Reduce-task-going-away-for-10-seconds-at-a-time-tp22496810p22496810.html
 Sent from the Hadoop core-user mailing list archive at Nabble.com.




Reduce task going away for 10 seconds at a time

2009-03-13 Thread Doug Cook

Hi folks,

I've been debugging a severe performance problem with a Hadoop-based
application (a highly modified version of Nutch). I've recently upgraded to
Hadoop 0.19.1 from a much, much older version, and a reduce that used to
work just fine is now running orders of magnitude more slowly. 

From the logs I can see that progress of my reduce stops for periods that
average almost exactly 10 seconds (with a very narrow distribution around 10
seconds), and it does so in various places in my code, but more or less in
proportion to how much time I'd expect the task would normally spend in that
particular place in the code, i.e. the behavior seems like my code is
randomly being interrupted for 10 seconds at a time. 

I'm planning to keep digging, but thought that these symptoms might sound
familiar to someone on this list. Ring any bells? Your help much
appreciated. 

Thanks!

Doug Cook