Hi Andrzej,

Thanks for your quick response.
Please find my comments below.

> Perhaps you are running with too large heap, as strange as it may
> sound ... If I understand this message correctly, the JVM complains
> that GC is taking too many resources.

I started with the defaults, a 200m heap and maxBufferedDocs at 100,
but I got a "too many open files" error. Then I increased
maxBufferedDocs to 2000 and got an OOM. So I went through a series of
changes and arrived at the conclusion that, irrespective of the
configuration, one reduce fails.

> This may be also related to ulimit on this account ...

I checked, and it has a limit of 1024. The number of segments generated
was around 500 for 1 million docs in each part.

> I think that with this configuration you could increase the number of
> reduces, to decrease the amount of data each reduce task has to
> handle.

Ideally I want a partition of 10-15 million docs per reduce, since I
want to index 100 million. I can try with 10 or 12 reduces, but even
with 8, one fails, and in isolation that same reduce works fine with
the same settings.

> In your current config you run at most 2 reduces per machine.

True. Why do you say so? I've set 4 tasks/node, but I was at 8 too and
faced the same issue.

> You can also use IsolationRunner to re-run individual tasks under a
> debugger and see where they fail.

I tried with mapred.job.tracker = local and things fly without errors.
I also tried the same with a single slave and that works too. Locally
on Windows using Cygwin, it works as well.

Any thoughts are greatly appreciated. I'm doing a proof-of-concept and
this is really a big hurdle. I've appended a couple of rough sketches
of my indexer and job settings at the very bottom of this mail, below
the quoted message, in case they help.

Thanks,
Venkat

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Venkat Seeth wrote:
> > Hi there,
> >
> > Howdy. I've been using hadoop to parse and index XML documents.
> > It's a 2-step process similar to Nutch. I parse the XML and create
> > field-value tuples written to a file.
> >
> > I read this file and index the field-value pairs in the next step.
> >
> > Everything works fine, but always one reduce out of N fails in the
> > last step when merging segments. It fails with one or more of the
> > following:
> > - Task failed to report status for 608 seconds. Killing.
> > - java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> Perhaps you are running with too large heap, as strange as it may
> sound ... If I understand this message correctly, the JVM complains
> that GC is taking too many resources.
>
> This may be also related to ulimit on this account ...
>
> > Configuration:
> > I have about 128 maps and 8 reduces, so I get to create 8
> > partitions of my index. It runs on a 4-node cluster with 4
> > dual-proc, 64 GB machines.
>
> I think that with this configuration you could increase the number of
> reduces, to decrease the amount of data each reduce task has to
> handle. In your current config you run at most 2 reduces per machine.
>
> > Number of documents: 1.65 million, each about 10K in size.
> >
> > I ran with 4 or 8 task trackers per node with a 4 GB heap for the
> > job tracker, task trackers and the child JVMs.
> >
> > mergeFactor set to 50 and maxBufferedDocs at 1000.
> >
> > I fail to understand what's going on. When I run the job
> > individually, it works with the same settings.
> >
> > Why would all the jobs work while only one fails?
>
> You can also use IsolationRunner to re-run individual tasks under a
> debugger and see where they fail.
>
> --
> Best regards,
> Andrzej Bialecki <><
> http://www.sigram.com  Contact: info at sigram dot com
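
P.S. In case it helps to see this concretely, here is a simplified
sketch of how the reduce-side indexer sets up the Lucene IndexWriter.
The class and field names below (XmlDocIndexer, "body", the index path)
are just placeholders for this mail, not the real code, but the
mergeFactor and maxBufferedDocs settings are the ones I described above.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

// Placeholder class name; the real reducer builds one index partition
// from the field-value tuples produced by the parse step.
public class XmlDocIndexer {

    public static void main(String[] args) throws Exception {
        // One index directory per reduce partition (placeholder path).
        IndexWriter writer = new IndexWriter("/tmp/index-part-0",
                                             new StandardAnalyzer(),
                                             true);

        // The two knobs I have been tuning:
        // - mergeFactor controls how many segments pile up before a
        //   merge, so higher values mean more open files at merge time.
        // - maxBufferedDocs controls how many docs are buffered in RAM
        //   before a segment is flushed, so higher values mean more heap.
        writer.setMergeFactor(50);
        writer.setMaxBufferedDocs(1000);

        // In the real job this loop reads the field-value tuples from
        // the map output; a single dummy document stands in for it here.
        Document doc = new Document();
        doc.add(new Field("body", "sample text",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);

        // The failures always happen around here, when the segments of
        // the partition get merged.
        writer.optimize();
        writer.close();
    }
}

With maxBufferedDocs at 100 each partition ends up with hundreds of
small segments (hence, I suspect, the "too many open files" against the
1024 ulimit), and at 2000 the buffered documents plus the merge seem to
exhaust the 4 GB child heap, which is why I suspect the final merge
step itself.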
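
And this is roughly how the job itself is configured. Again this is
only a sketch: the driver class name is a placeholder and the identity
mapper/reducer stand in for the real parse and index classes, but the
reduce count, child heap and the local-runner switch I used for
debugging are the ones mentioned above.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

// Placeholder driver; only the settings discussed above matter here.
public class IndexJobDriver {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(IndexJobDriver.class);
        conf.setJobName("xml-index");

        conf.setInputPath(new Path(args[0]));
        conf.setOutputPath(new Path(args[1]));
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        // 8 reduces -> 8 index partitions. Raising this, as you
        // suggest, would give each reduce fewer documents to merge.
        conf.setNumReduceTasks(8);

        // Heap for the child JVMs (task trackers and children run with
        // 4 GB, as mentioned above).
        conf.set("mapred.child.java.opts", "-Xmx4096m");

        // For debugging I switched to the local runner, and the very
        // same job then completes without errors:
        // conf.set("mapred.job.tracker", "local");

        JobClient.runJob(conf);
    }
}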