It does look like the value for a particular key is huge. Does your map/reduce job fail on the same key/value pair every time, or is it non-deterministic?
Regards
Mahadev

> -----Original Message-----
> From: Venkat Seeth [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, February 20, 2007 4:09 PM
> To: hadoop-user@lucene.apache.org; Devaraj Das
> Subject: RE: Strange behavior - One reduce out of N reduces always fail.
>
> Hi Devraj,
>
> The log file for the key-value pairs is huge. If you can tell me what you
> are looking for, I can mine it and send the relevant information.
>
> 343891695  2007-02-20 18:37  seq.log
>
> This time around I get the following error:
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.util.Arrays.copyOfRange(Arrays.java:3209)
>         at java.lang.String.<init>(String.java:216)
>         at java.lang.StringBuffer.toString(StringBuffer.java:585)
>         at org.apache.log4j.WriterAppender.checkEntryConditions(WriterAppender.java:176)
>         at org.apache.log4j.WriterAppender.append(WriterAppender.java:156)
>         at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
>         at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)
>         at org.apache.log4j.Category.callAppenders(Category.java:203)
>         at org.apache.log4j.Category.forcedLog(Category.java:388)
>         at org.apache.log4j.Category.debug(Category.java:257)
>         at com.gale.searchng.workflow.model.TuplesWritable.readFields(TuplesWritable.java:127)
>         at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:199)
>         at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:160)
>         at com.gale.searchng.workflow.indexer.Indexer.reduce(Indexer.java:152)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:324)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1372)
>
> Thanks,
> Venkat
>
> --- Devaraj Das <[EMAIL PROTECTED]> wrote:
>
> > Hi Venkat,
> > You forgot to paste the log output in your reply. The patch that I sent
> > will log the key/value sizes in the Reducers as well. See if you get
> > helpful hints with that.
> > Thanks,
> > Devaraj.
> >
> > > -----Original Message-----
> > > From: Venkat Seeth [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, February 20, 2007 9:55 PM
> > > To: hadoop-user@lucene.apache.org; Devaraj Das
> > > Subject: RE: Strange behavior - One reduce out of N reduces always fail.
> > >
> > > Hi Devraj,
> > >
> > > Thanks for your response.
> > >
> > > > Do you have an estimate of the sizes?
> > > # of entries: 1080746
> > > [# of field-value pairs]
> > > min count: 20
> > > max count: 3116
> > > avg count: 66
> > >
> > > These are small documents and yes, the full-text content for each
> > > document can be big. I've also set the MaxFieldLength to 10000 so that
> > > I don't index very large values, as suggested in Lucene.
> > >
> > > The reduce always fails while merging segments. I do see a large line
> > > in the Log4J output which consists of
> > >
> > > Typically, the job that fails is ALWAYS VERY SLOW compared to the
> > > other N - 1 jobs.
> > >
> > > Can I log the key-value pair sizes in the reduce part of the indexer?
> > >
> > > Again,
> > >
> > > Thanks,
> > > Venkat
> > >
> > > --- Devaraj Das <[EMAIL PROTECTED]> wrote:
> > >
> > > > While this could be a JVM/GC issue as Andrez pointed out, it could
> > > > also be due to a very large key/value being read from the map
> > > > output. Do you have an estimate of the sizes? Attached is a
> > > > quick-hack patch to log the sizes of the key/values read from the
> > > > sequence files.
> > > > Please apply this patch on hadoop-0.11.2 and check in the userlogs
> > > > what key/value it is failing for (if at all).
> > > > Thanks,
> > > > Devaraj.
> > > >
> > > > > -----Original Message-----
> > > > > From: Venkat Seeth [mailto:[EMAIL PROTECTED]
> > > > > Sent: Tuesday, February 20, 2007 11:32 AM
> > > > > To: hadoop-user@lucene.apache.org
> > > > > Subject: Strange behavior - One reduce out of N reduces always fail.
> > > > >
> > > > > Hi there,
> > > > >
> > > > > Howdy. I've been using Hadoop to parse and index XML documents.
> > > > > It's a 2-step process similar to Nutch. I parse the XML and create
> > > > > field-value tuples written to a file.
> > > > >
> > > > > I read this file and index the field-value pairs in the next step.
> > > > >
> > > > > Everything works fine, but one reduce out of N always fails in the
> > > > > last step when merging segments. It fails with one or more of the
> > > > > following:
> > > > > - Task failed to report status for 608 seconds. Killing.
> > > > > - java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > >
> > > > > I've tried various configuration combinations and it always fails
> > > > > at the 4th reduce in an 8-reduce configuration and at the first one
> > > > > in a 4-reduce config.
> > > > >
> > > > > Environment:
> > > > > SUSE Linux, 64-bit
> > > > > Java 6 (Java 5 also fails)
> > > > > Hadoop 0.11.2
> > > > > Lucene 2.1 (Lucene 2.0 also fails)
> > > > >
> > > > > Configuration:
> > > > > I have about 128 maps and 8 reduces, so I get to create 8 partitions
> > > > > of my index. It runs on a 4-node cluster with 4 dual-proc 64GB
> > > > > machines.
> > > > >
> > > > > Number of documents: 1.65 million, each about 10K in size.
> > > > >
> > > > > I ran with 4 or 8 task trackers per node, with a 4 GB heap for the
> > > > > job tracker, the task trackers, and the child JVMs.
> > > > >
> > > > > mergeFactor is set to 50 and maxBufferedDocs to 1000.
> > > > >
> > > > > I fail to understand what's going on. When I run the job
> > > > > individually, it works with the same settings.
> > > > >
> > > > > Why would all the jobs work and only one fail?
> > > > >
> > > > > I'd appreciate it if anyone could share their experience.
> > > > >
> > > > > Thanks,
> > > > > Ven
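A note on the question above about logging key/value pair sizes in the reduce
part of the indexer: Devaraj's quick-hack patch is not preserved in this
archive, so the following is only a rough sketch of the same idea against the
old org.apache.hadoop.mapred API of that era. The class name
SizeLoggingReducer and its measuring helper are made up for illustration; the
sketch serializes each key and value into a scratch buffer and prints the
lengths to the task's userlogs, which is usually enough to spot one oversized
record.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical reducer that only measures and logs record sizes; a real
// indexer would do its indexing work inside the same loop.
public class SizeLoggingReducer implements Reducer {

  // Scratch buffer reused to measure the serialized size of a Writable.
  private final DataOutputBuffer scratch = new DataOutputBuffer();

  private int sizeOf(Writable w) throws IOException {
    scratch.reset();
    w.write(scratch);
    return scratch.getLength();
  }

  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter)
      throws IOException {
    int keySize = sizeOf((Writable) key);
    long largestValue = 0;
    long count = 0;
    while (values.hasNext()) {
      Writable value = (Writable) values.next();
      largestValue = Math.max(largestValue, sizeOf(value));
      count++;
      // ... index the field/value tuples here ...
    }
    // stderr ends up in the task's userlogs and stays out of log4j, which
    // the stack trace above shows being hit while values are read.
    System.err.println("key bytes=" + keySize + ", values=" + count
        + ", largest value bytes=" + largestValue);
    // Ping the framework so a slow key does not trip the
    // "failed to report status" timeout mentioned above.
    reporter.setStatus("values seen: " + count);
  }

  public void configure(JobConf job) { }

  public void close() throws IOException { }
}

The stack trace above also shows the OutOfMemoryError being thrown from a
log4j debug() call inside TuplesWritable.readFields(), so a debug statement
stringifying an oversized tuple is another thing worth checking.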
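For reference, the Lucene 2.1 knobs quoted in the thread (MaxFieldLength
10000, mergeFactor 50, maxBufferedDocs 1000) are normally applied on the
IndexWriter roughly as below. This is a minimal sketch, not the original
indexer's code; the partition directory and the StandardAnalyzer are
placeholders.

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class WriterSettings {
  // Opens a writer for one index partition with the settings from the thread.
  public static IndexWriter open(File partitionDir) throws IOException {
    IndexWriter writer = new IndexWriter(
        FSDirectory.getDirectory(partitionDir, true), // create the partition
        new StandardAnalyzer(),                       // placeholder analyzer
        true);                                        // create a new index
    writer.setMaxFieldLength(10000);   // drop terms past 10k per field
    writer.setMergeFactor(50);         // segments accumulated before a merge
    writer.setMaxBufferedDocs(1000);   // docs buffered in RAM before a flush
    return writer;
  }
}

Larger mergeFactor and maxBufferedDocs values both raise memory use when
segments are merged, which may be relevant given that the failing reduce dies
while merging segments.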