Venkat Seeth wrote:
> Hi there,
>
> Howdy. I've been using Hadoop to parse and index XML
> documents. It's a two-step process, similar to Nutch: I
> parse the XML and write field-value tuples to a file,
> then read this file and index the field-value pairs in
> the next step.
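Roughly, that parse step could look like the sketch below (old
org.apache.hadoop.mapred API; the class name and the parseXml() helper are
hypothetical stand-ins, not the actual code):

  import java.io.IOException;
  import java.util.Map;

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class XmlParseMapper extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {

    public void map(Text docId, Text xml,
                    OutputCollector<Text, Text> out, Reporter reporter)
        throws IOException {
      // One field=value tuple per document field; the second job reads
      // these pairs back and feeds them to the indexer.
      for (Map.Entry<String, String> f : parseXml(xml.toString()).entrySet()) {
        out.collect(docId, new Text(f.getKey() + "=" + f.getValue()));
      }
      reporter.progress();   // keeps the task from timing out on big documents
    }

    // Placeholder for whatever XML parser is actually used.
    private Map<String, String> parseXml(String xml) {
      return java.util.Collections.emptyMap();
    }
  }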
> Everything works fine, but one reduce out of N always
> fails in the last step, when merging segments. It fails
> with one or more of the following:
> - Task failed to report status for 608 seconds.
>   Killing.
> - java.lang.OutOfMemoryError: GC overhead limit
>   exceeded
Perhaps you are running with too large a heap, strange as it may sound
... If I understand this message correctly, the JVM is complaining that
GC is consuming too many resources.
This may also be related to the ulimit on this account ...
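If the child heap turns out to be the culprit, the knobs live in the job
configuration. A minimal sketch with the old JobConf API (the numbers are
placeholders, not recommendations):

  import org.apache.hadoop.mapred.JobConf;

  public class TuneChildJvm {
    public static JobConf configure() {
      JobConf conf = new JobConf(TuneChildJvm.class);

      // Heap for the child JVMs that run the map/reduce tasks. An oversized
      // heap can push the JVM into the mode where it spends nearly all of
      // its time collecting ("GC overhead limit exceeded").
      conf.set("mapred.child.java.opts", "-Xmx2048m");

      // Give long segment merges more time before the framework kills the
      // task for not reporting status; milliseconds, default 600000.
      conf.set("mapred.task.timeout", "1800000");

      return conf;
    }
  }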
> Configuration:
> I have about 128 maps and 8 reduces, so I get to create
> 8 partitions of my index. It runs on a 4-node cluster of
> dual-proc machines with 64 GB of RAM each.
I think that with this configuration you could increase the number of
reduces, to decrease the amount of data each reduce task has to handle.
In your current config you run at most 2 reduces per machine.
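For instance (same JobConf sketch as above; 16 is just an illustrative
number):

  import org.apache.hadoop.mapred.JobConf;

  public class MoreReduces {
    public static JobConf configure() {
      JobConf conf = new JobConf(MoreReduces.class);

      // 8 reduces on 4 nodes means at most 2 per machine; more reduces give
      // each task a smaller slice of the 1.65M documents, at the cost of
      // producing more (smaller) index partitions.
      conf.setNumReduceTasks(16);

      return conf;
    }
  }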
> Number of documents: 1.65 million, each about 10 KB in
> size.
> I ran with 4 or 8 task trackers per node, with a 4 GB
> heap for the JobTracker, the TaskTrackers, and the child
> JVMs. mergeFactor is set to 50 and maxBufferedDocs to 1000.
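For reference, those two knobs map onto Lucene's IndexWriter roughly as in
the sketch below (Lucene 2.x-era API; the path and analyzer choice are
placeholders). A mergeFactor of 50 defers a lot of merge work to the final,
biggest merges, which is exactly where the reduce is dying:

  import java.io.IOException;

  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.index.IndexWriter;

  public class WriterSettings {
    public static IndexWriter open(String path) throws IOException {
      // true = create a new index partition at this path
      IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);

      // How many segments get merged at once; higher values postpone merge
      // work and make the final merges much larger.
      writer.setMergeFactor(50);

      // How many documents are buffered in RAM before a segment is flushed.
      writer.setMaxBufferedDocs(1000);

      return writer;
    }
  }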
> I fail to understand what's going on. When I run the
> job individually, it works with the same settings.
> Why would all the tasks work when only one fails?
You can also use IsolationRunner to re-run individual tasks under a
debugger and see where they fail.
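Roughly (following the map/reduce tutorial of that era; the local path below
is a placeholder): keep the failed task's files around, then re-run just that
task on the node where it failed.

  import org.apache.hadoop.mapred.JobConf;

  public class KeepFailedTaskFiles {
    public static void configure(JobConf conf) {
      // Keep the intermediate files and job.xml of failed tasks on the
      // tasktracker's local disk instead of cleaning them up.
      conf.setKeepFailedTaskFiles(true);
    }
  }

  // Then, on the node where the reduce failed:
  //
  //   cd <local.dir>/taskTracker/<taskid>/work
  //   hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
  //
  // and attach a debugger to that JVM with the usual JPDA options if needed.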
--
Best regards,
Andrzej Bialecki <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com