It does look like the value for a particular key is huge. Does your map/reduce job fail on the same key/value pair every time, or is it non-deterministic?
Regards
Mahadev

> -----Original Message-----
> From: Venkat Seeth [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, February 20, 2007 4:09 PM
> To: hadoop-user@lucene.apache.org; Devaraj Das
> Subject: RE: Strange behavior - One reduce out of N reduces always fail.
>
> Hi Devraj,
>
> The log file for the key-value pairs is huge. If you can tell me what you
> are looking for, I can mine it and send the relevant information.
>
> 343891695  2007-02-20 18:37  seq.log
>
> This time around I get the following error:
>
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>         at java.util.Arrays.copyOfRange(Arrays.java:3209)
>         at java.lang.String.<init>(String.java:216)
>         at java.lang.StringBuffer.toString(StringBuffer.java:585)
>         at org.apache.log4j.WriterAppender.checkEntryConditions(WriterAppender.java:176)
>         at org.apache.log4j.WriterAppender.append(WriterAppender.java:156)
>         at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230)
>         at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:65)
>         at org.apache.log4j.Category.callAppenders(Category.java:203)
>         at org.apache.log4j.Category.forcedLog(Category.java:388)
>         at org.apache.log4j.Category.debug(Category.java:257)
>         at com.gale.searchng.workflow.model.TuplesWritable.readFields(TuplesWritable.java:127)
>         at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.getNext(ReduceTask.java:199)
>         at org.apache.hadoop.mapred.ReduceTask$ValuesIterator.next(ReduceTask.java:160)
>         at com.gale.searchng.workflow.indexer.Indexer.reduce(Indexer.java:152)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:324)
>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1372)
>
> Thanks,
> Venkat
>
> --- Devaraj Das <[EMAIL PROTECTED]> wrote:
>
> > Hi Venkat,
> > You forgot to paste the log output in your reply. The patch that I sent
> > will log the key/value sizes in the Reducers as well. See if you get
> > helpful hints with that.
> > Thanks,
> > Devaraj.
> >
> > > -----Original Message-----
> > > From: Venkat Seeth [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, February 20, 2007 9:55 PM
> > > To: hadoop-user@lucene.apache.org; Devaraj Das
> > > Subject: RE: Strange behavior - One reduce out of N reduces always fail.
> > >
> > > Hi Devraj,
> > >
> > > Thanks for your response.
> > >
> > > > Do you have an estimate of the sizes?
> > > # of entries: 1080746
> > > [# of field-value pairs]
> > > min count: 20
> > > max count: 3116
> > > avg count: 66
> > >
> > > These are small documents and yes, the full-text content for each
> > > document can be big. I've also set the MaxFieldLength to 10000 so that
> > > I don't index very large values, as suggested in Lucene.
> > >
> > > The reduce always fails while merging segments. I do see a large line
> > > in the Log4J output which consists of
> > >
> > > Typically, the job that fails is ALWAYS VERY SLOW compared to the
> > > other N - 1 jobs.
> > >
> > > Can I log the key-value pair sizes in the reduce part of the indexer?
> > >
> > > Again,
> > >
> > > Thanks,
> > > Venkat
> > >
> > > --- Devaraj Das <[EMAIL PROTECTED]> wrote:
> > >
> > > > While this could be a JVM/GC issue as Andrez pointed out, it could
> > > > also be due to a very large key/value being read from the map
> > > > output. Do you have an estimate of the sizes? Attached is a
> > > > quick-hack patch to log the sizes of the key/values read from the
> > > > sequence files.
> > > > Please apply this patch on hadoop-0.11.2 and check in the userlogs
> > > > what key/value it is failing for (if at all).
> > > > Thanks,
> > > > Devaraj.
> > > >
> > > > > -----Original Message-----
> > > > > From: Venkat Seeth [mailto:[EMAIL PROTECTED]
> > > > > Sent: Tuesday, February 20, 2007 11:32 AM
> > > > > To: hadoop-user@lucene.apache.org
> > > > > Subject: Strange behavior - One reduce out of N reduces always fail.
> > > > >
> > > > > Hi there,
> > > > >
> > > > > Howdy. I've been using Hadoop to parse and index XML documents.
> > > > > It's a 2-step process similar to Nutch. I parse the XML and create
> > > > > field-value tuples written to a file.
> > > > >
> > > > > I read this file and index the field-value pairs in the next step.
> > > > >
> > > > > Everything works fine, but one reduce out of N always fails in the
> > > > > last step when merging segments. It fails with one or more of the
> > > > > following:
> > > > > - Task failed to report status for 608 seconds. Killing.
> > > > > - java.lang.OutOfMemoryError: GC overhead limit exceeded
> > > > >
> > > > > I've tried various configuration combinations and it always fails
> > > > > at the 4th reduce in an 8-reduce configuration and at the first one
> > > > > in a 4-reduce config.
> > > > >
> > > > > Environment:
> > > > > SUSE Linux, 64-bit
> > > > > Java 6 (Java 5 also fails)
> > > > > Hadoop 0.11.2
> > > > > Lucene 2.1 (Lucene 2.0 also fails)
> > > > >
> > > > > Configuration:
> > > > > I have about 128 maps and 8 reduces, so I get to create 8 partitions
> > > > > of my index. It runs on a 4-node cluster with 4 dual-proc 64GB
> > > > > machines.
> > > > >
> > > > > Number of documents: 1.65 million, each about 10K in size.
> > > > >
> > > > > I ran with 4 or 8 task trackers per node, with a 4 GB heap for the
> > > > > job tracker, the task trackers, and the child JVMs.
> > > > >
> > > > > mergeFactor is set to 50 and maxBufferedDocs to 1000.
> > > > >
> > > > > I fail to understand what's going on. When I run the job
> > > > > individually, it works with the same settings.
> > > > >
> > > > > Why would all the jobs work and only one fail?
> > > > >
> > > > > I'd appreciate it if anyone could share their experience.
> > > > >
> > > > > Thanks,
> > > > > Ven
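A note on the question above about logging key/value pair sizes in the reduce
part of the indexer: Devaraj's quick-hack patch is not preserved in this
archive, so the following is only a rough sketch of the same idea against the
old org.apache.hadoop.mapred API of that era. The class name
SizeLoggingReducer and its measuring helper are made up for illustration; the
sketch serializes each key and value into a scratch buffer and prints the
lengths to the task's userlogs, which is usually enough to spot one oversized
record.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical reducer that only measures and logs record sizes; a real
// indexer would do its indexing work inside the same loop.
public class SizeLoggingReducer implements Reducer {

  // Scratch buffer reused to measure the serialized size of a Writable.
  private final DataOutputBuffer scratch = new DataOutputBuffer();

  private int sizeOf(Writable w) throws IOException {
    scratch.reset();
    w.write(scratch);
    return scratch.getLength();
  }

  public void reduce(WritableComparable key, Iterator values,
                     OutputCollector output, Reporter reporter)
      throws IOException {
    int keySize = sizeOf((Writable) key);
    long largestValue = 0;
    long count = 0;
    while (values.hasNext()) {
      Writable value = (Writable) values.next();
      largestValue = Math.max(largestValue, sizeOf(value));
      count++;
      // ... index the field/value tuples here ...
    }
    // stderr ends up in the task's userlogs and stays out of log4j, which
    // the stack trace above shows being hit while values are read.
    System.err.println("key bytes=" + keySize + ", values=" + count
        + ", largest value bytes=" + largestValue);
    // Ping the framework so a slow key does not trip the
    // "failed to report status" timeout mentioned above.
    reporter.setStatus("values seen: " + count);
  }

  public void configure(JobConf job) { }

  public void close() throws IOException { }
}

The stack trace above also shows the OutOfMemoryError being thrown from a
log4j debug() call inside TuplesWritable.readFields(), so a debug statement
stringifying an oversized tuple is another thing worth checking.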
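For reference, the Lucene 2.1 knobs quoted in the thread (MaxFieldLength
10000, mergeFactor 50, maxBufferedDocs 1000) are normally applied on the
IndexWriter roughly as below. This is a minimal sketch, not the original
indexer's code; the partition directory and the StandardAnalyzer are
placeholders.

import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class WriterSettings {
  // Opens a writer for one index partition with the settings from the thread.
  public static IndexWriter open(File partitionDir) throws IOException {
    IndexWriter writer = new IndexWriter(
        FSDirectory.getDirectory(partitionDir, true), // create the partition
        new StandardAnalyzer(),                       // placeholder analyzer
        true);                                        // create a new index
    writer.setMaxFieldLength(10000);   // drop terms past 10k per field
    writer.setMergeFactor(50);         // segments accumulated before a merge
    writer.setMaxBufferedDocs(1000);   // docs buffered in RAM before a flush
    return writer;
  }
}

Larger mergeFactor and maxBufferedDocs values both raise memory use when
segments are merged, which may be relevant given that the failing reduce dies
while merging segments.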