RE: Merge of the inmemory files threw an exception and diffs between 0.17.2 and 0.18.1
Hi Devraj,

It was pretty consistent with my comparator class in my old email (the one that uses UTF8). While trying to resolve the issue, I changed UTF8 to Text. That made it disappear for a while, but then it came back again.

My new comparator class (with Text) is:

public class IncrementalURLIndexKey implements WritableComparable {
  private Text url;
  private long userid;

  public IncrementalURLIndexKey() {
  }

  public IncrementalURLIndexKey(Text url, long userid) {
    this.url = url;
    this.userid = userid;
  }

  public Text getUrl() {
    return url;
  }

  public long getUserid() {
    return userid;
  }

  public void write(DataOutput out) throws IOException {
    url.write(out);
    out.writeLong(userid);
  }

  public void readFields(DataInput in) throws IOException {
    url = new Text();
    url.readFields(in);
    userid = in.readLong();
  }

  public int compareTo(Object o) {
    IncrementalURLIndexKey other = (IncrementalURLIndexKey) o;
    int result = url.compareTo(other.getUrl());
    if (result == 0) result = CUID.compare(userid, other.userid);
    return result;
  }

  /** A Comparator optimized for IncrementalURLIndexKey. */
  public static class GroupingComparator extends WritableComparator {
    public GroupingComparator() {
      super(IncrementalURLIndexKey.class, true);
    }

    public int compare(WritableComparable a, WritableComparable b) {
      IncrementalURLIndexKey key1 = (IncrementalURLIndexKey) a;
      IncrementalURLIndexKey key2 = (IncrementalURLIndexKey) b;
      return key1.getUrl().compareTo(key2.getUrl());
    }
  }

  static {
    WritableComparator.define(IncrementalURLIndexKey.class, new GroupingComparator());
  }
}

Thanks,
Deepika

-----Original Message-----
From: Devaraj Das [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, October 28, 2008 9:01 PM
To: core-user@hadoop.apache.org
Subject: Re: Merge of the inmemory files threw an exception and diffs between 0.17.2 and 0.18.1

Quick question (I haven't looked at your comparator code yet) - is this reproducible/consistent?

On 10/28/08 11:52 PM, Deepika Khera [EMAIL PROTECTED] wrote:

I am getting a similar exception too with Hadoop 0.18.1 (see stack trace below), though it's an EOFException. Does anyone have any idea about what it means and how it can be fixed?

2008-10-27 16:53:07,407 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810241922_0844_r_06_0 Merge of the inmemory files threw an exception:
java.io.IOException: Intermedate merge failed
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
Caused by: java.lang.RuntimeException: java.io.EOFException
  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
  at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:269)
  at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:122)
  at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:49)
  at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:321)
  at org.apache.hadoop.mapred.Merger.merge(Merger.java:72)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2123)
  ... 1 more
Caused by: java.io.EOFException
  at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
  at org.apache.hadoop.io.UTF8.readFields(UTF8.java:103)
  at com.collarity.io.IOUtil.readUTF8(IOUtil.java:213)
  at com.collarity.url.IncrementalURLIndexKey.readFields(IncrementalURLIndexKey.java:40)
  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
  ... 7 more

2008-10-27 16:53:07,407 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810241922_0844_r_06_0 Merging of the local FS files threw an exception:
java.io.IOException: java.lang.RuntimeException: java.io.EOFException
  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
  at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:269)
  at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:135)
  at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:102)
  at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:226)
  at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:242)
  at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:83)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2035)
Caused by: java.io.EOFException
  at java.io.DataInputStream.readFully(DataInputStream.java:180)
  at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
  at com.collarity.io.IOUtil.readUTF8(IOUtil.java:213)
  at com.collarity.url.IncrementalURLIndexKey.readFields(IncrementalURLIndexKey.java:40)
  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
  ... 7 more
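A note on the comparator itself: one way to take readFields() out of the sort/merge path entirely (the spot where the EOFException is thrown) is a raw, byte-level comparator that works directly on the serialized key. A minimal sketch, assuming the layout written by the class above (a Text, i.e. a vint length followed by that many UTF-8 bytes, then a long) - this is an illustration, not code from the thread:

// Assumes org.apache.hadoop.io.WritableComparator, WritableUtils,
// and java.io.IOException are imported alongside the class above.
public static class RawGroupingComparator extends WritableComparator {
  public RawGroupingComparator() {
    super(IncrementalURLIndexKey.class);
  }

  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    try {
      // Each serialized key starts with the Text url: a vint length
      // followed by the UTF-8 bytes. Compare only that portion and
      // never deserialize, so readFields() cannot fail here.
      int vlen1 = WritableUtils.decodeVIntSize(b1[s1]);
      int vlen2 = WritableUtils.decodeVIntSize(b2[s2]);
      int len1 = readVInt(b1, s1);
      int len2 = readVInt(b2, s2);
      return compareBytes(b1, s1 + vlen1, len1, b2, s2 + vlen2, len2);
    } catch (IOException e) {
      throw new RuntimeException("corrupt serialized key", e);
    }
  }
}

Registered via WritableComparator.define(...) as in the static block above, this compares keys without ever materializing them, which is also faster than the deserializing default. Note that, like the GroupingComparator in the thread, it compares only the url field, so it is appropriate only where grouping semantics are wanted.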
RE: Merge of the inmemory files threw an exception and diffs between 0.17.2 and 0.18.1
Wow, if the issue is fixed with version 0.20, then could we please have a patch for version 0.18?

Thanks,
Deepika

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 30, 2008 12:19 PM
To: core-user@hadoop.apache.org
Subject: Re: Merge of the inmemory files threw an exception and diffs between 0.17.2 and 0.18.1

So, Philippe reports that the problem goes away with 0.20-dev (trunk?):
http://mahout.markmail.org/message/swmzreg6fnzf6icv

We aren't totally clear on the structure of SVN for Hadoop, but it seems like it is not fixed by this patch.

On Oct 29, 2008, at 10:28 AM, Grant Ingersoll wrote:

We'll try it out...

On Oct 28, 2008, at 3:00 PM, Arun C Murthy wrote:

On Oct 27, 2008, at 7:05 PM, Grant Ingersoll wrote:

Hi,

Over in Mahout (lucene.a.o/mahout), we are seeing an oddity with some of our clustering code and Hadoop 0.18.1. The thread in context is at:
http://mahout.markmail.org/message/vcyvlz2met7fnthr

The problem seems to occur when going from 0.17.2 to 0.18.1. In the user logs, we are seeing the following exception:

2008-10-27 21:18:37,014 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 5011 bytes
2008-10-27 21:18:37,033 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810272112_0011_r_00_0 Merge of the inmemory files threw an exception:
java.io.IOException: Intermedate merge failed
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
Caused by: java.lang.NumberFormatException: For input string: [

[Arun replied inline:] If you are sure that this isn't caused by your application-logic, you could try running with http://issues.apache.org/jira/browse/HADOOP-4277. That bug caused many a ship to sail in large circles, hopelessly.

Arun

  at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1224)
  at java.lang.Double.parseDouble(Double.java:510)
  at org.apache.mahout.matrix.DenseVector.decodeFormat(DenseVector.java:60)
  at org.apache.mahout.matrix.AbstractVector.decodeVector(AbstractVector.java:256)
  at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:38)
  at org.apache.mahout.clustering.kmeans.KMeansCombiner.reduce(KMeansCombiner.java:31)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.combineAndSpill(ReduceTask.java:2174)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.access$3100(ReduceTask.java:341)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2134)

And in the main output log (from running bin/hadoop jar mahout/examples/build/apache-mahout-examples-0.1-dev.job org.apache.mahout.clustering.syntheticcontrol.kmeans.Job) we see:

08/10/27 21:18:41 INFO mapred.JobClient: Task Id : attempt_200810272112_0011_r_00_0, Status : FAILED
java.io.IOException: attempt_200810272112_0011_r_00_0The reduce copier failed
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

If I run this exact same job on 0.17.2 it all runs fine. I suppose either a bug was introduced in 0.18.1 or a bug was fixed that we were relying on. Looking at the release notes between the fixes, nothing in particular struck me as related.

If it helps, I can provide the instructions for how to run the example in question (they need to be written up anyway!)

I see some related things at http://hadoop.markmail.org/search/?q=Merge+of+the+inmemory+files+threw+an+exception, but those are older, it seems, so not sure what to make of them.

Thanks,
Grant

--
Grant Ingersoll
Lucene Boot Camp Training Nov. 3-4, 2008, ApacheCon US New Orleans.
http://www.lucenebootcamp.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
RE: Merge of the inmemory files threw an exception and diffs between 0.17.2 and 0.18.1
I am getting a similar exception too with Hadoop 0.18.1 (see stack trace below), though it's an EOFException. Does anyone have any idea about what it means and how it can be fixed?

2008-10-27 16:53:07,407 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810241922_0844_r_06_0 Merge of the inmemory files threw an exception:
java.io.IOException: Intermedate merge failed
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2147)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2078)
Caused by: java.lang.RuntimeException: java.io.EOFException
  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
  at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:269)
  at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:122)
  at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:49)
  at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:321)
  at org.apache.hadoop.mapred.Merger.merge(Merger.java:72)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2123)
  ... 1 more
Caused by: java.io.EOFException
  at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:323)
  at org.apache.hadoop.io.UTF8.readFields(UTF8.java:103)
  at com.collarity.io.IOUtil.readUTF8(IOUtil.java:213)
  at com.collarity.url.IncrementalURLIndexKey.readFields(IncrementalURLIndexKey.java:40)
  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
  ... 7 more

2008-10-27 16:53:07,407 WARN org.apache.hadoop.mapred.ReduceTask: attempt_200810241922_0844_r_06_0 Merging of the local FS files threw an exception:
java.io.IOException: java.lang.RuntimeException: java.io.EOFException
  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
  at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:269)
  at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:135)
  at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:102)
  at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:226)
  at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:242)
  at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:83)
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2035)
Caused by: java.io.EOFException
  at java.io.DataInputStream.readFully(DataInputStream.java:180)
  at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
  at com.collarity.io.IOUtil.readUTF8(IOUtil.java:213)
  at com.collarity.url.IncrementalURLIndexKey.readFields(IncrementalURLIndexKey.java:40)
  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
  ... 7 more
  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2039)

2008-10-27 16:53:07,907 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.io.IOException: attempt_200810241922_0844_r_06_0The reduce copier failed
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:255)
  at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

My WritableComparable class looks like this:

public class IncrementalURLIndexKey implements WritableComparable {
  private UTF8 url;
  private long userid;

  public IncrementalURLIndexKey() {
  }

  public IncrementalURLIndexKey(UTF8 url, long userid) {
    this.url = url;
    this.userid = userid;
  }

  public UTF8 getUrl() {
    return url;
  }

  public long getUserid() {
    return userid;
  }

  public void write(DataOutput out) throws IOException {
    IOUtil.writeUTF8(out, url);
    out.writeLong(userid);
  }

  public void readFields(DataInput in) throws IOException {
    url = IOUtil.readUTF8(in, url);
    userid = in.readLong();
  }

  public int compareTo(Object o) {
    IncrementalURLIndexKey other = (IncrementalURLIndexKey) o;
    int result = url.compareTo(other.getUrl());
    if (result == 0) result = CUID.compare(userid, other.userid);
    return result;
  }

  /** A Comparator optimized for IncrementalURLIndexKey. */
  public static class GroupingComparator extends WritableComparator {
    public GroupingComparator() {
      super(IncrementalURLIndexKey.class, true);
    }

    public int compare(WritableComparable a, WritableComparable b) {
      IncrementalURLIndexKey key1 = (IncrementalURLIndexKey) a;
      IncrementalURLIndexKey key2 = (IncrementalURLIndexKey) b;
      return key1.getUrl().compareTo(key2.getUrl());
    }
  }

  static {
    WritableComparator.define(IncrementalURLIndexKey.class, new GroupingComparator());
  }
}

Thanks,
Deepika

-----Original Message-----
From: Grant
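An EOFException thrown from readFields() inside WritableComparator.compare() usually means write() and readFields() disagree about the byte layout (here, the IOUtil.writeUTF8/readUTF8 pair would be the first place to look). A quick round-trip check can rule that out. A sketch, using Hadoop's DataOutputBuffer/DataInputBuffer; the sample URL and userid are made up:

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.UTF8;

public class KeyRoundTripCheck {
  public static void main(String[] args) throws Exception {
    IncrementalURLIndexKey key =
        new IncrementalURLIndexKey(new UTF8("http://example.com/page"), 42L);

    // Serialize the key the same way the framework does.
    DataOutputBuffer out = new DataOutputBuffer();
    key.write(out);

    // Deserialize from the same bytes into a fresh instance.
    DataInputBuffer in = new DataInputBuffer();
    in.reset(out.getData(), out.getLength());
    IncrementalURLIndexKey copy = new IncrementalURLIndexKey();
    copy.readFields(in);

    // If write() and readFields() disagree, this either throws
    // (e.g. EOFException) or leaves bytes unread / keys unequal.
    System.out.println("round trip ok: " + (key.compareTo(copy) == 0));
    System.out.println("leftover bytes: " + (out.getLength() - in.getPosition()));
  }
}

Any leftover bytes, or any exception here, reproduces the merge failure in isolation, without running a whole job.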
Katta presentation slides
Hi Stefan,

Are the slides from the Katta presentation up somewhere? If not, could you please post them?

Thanks,
Deepika
Hadoop 0.18 stable?
Hi,

When is the Hadoop 0.18 version expected to be stable? I was looking into upgrading to it. Are there any known critical issues that people have run into with this version?

Thanks,
Deepika
RE: Cannot read reducer values into a list
Thanks... this works beautifully :)!

Deepika

-----Original Message-----
From: Owen O'Malley [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 20, 2008 7:52 AM
To: core-user@hadoop.apache.org
Subject: Re: Cannot read reducer values into a list

On Aug 19, 2008, at 4:57 PM, Deepika Khera wrote:

> Thanks for the clarification on this. So, it seems like cloning the object before adding to the list is the only solution for this problem. Is that right?

Yes. You can use WritableUtils.clone to do the job.

-- Owen
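For anyone finding this thread later, here is a minimal sketch of the cloning approach Owen describes, written against the old mapred API of that era. Text is a stand-in for your real value type, and the exact clone signature (some versions return a plain Writable, hence the cast) is worth checking against your release:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class CloningReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  private JobConf conf;

  public void configure(JobConf job) {
    this.conf = job; // keep a handle so clone() can create new instances
  }

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    List<Text> valueList = new ArrayList<Text>();
    while (values.hasNext()) {
      // The framework reuses a single Text instance across next() calls,
      // so copy each value before holding on to it.
      valueList.add((Text) WritableUtils.clone(values.next(), conf));
    }
    // valueList now holds one distinct object per value, not N pointers
    // to the same reused instance.
  }
}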
RE: Cannot read reducer values into a list
Hi,

Are we sure that this issue was fixed in 0.17.0 (or do we need to patch)? I am using this version and I still see the issue.

Thanks,
Deepika

-----Original Message-----
From: Arun C Murthy [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 19, 2008 1:04 PM
To: core-user@hadoop.apache.org
Subject: Re: Cannot read reducer values into a list

On Aug 19, 2008, at 12:17 PM, Stuart Sierra wrote:

Hello list,

Thought I would share this tidbit that frustrated me for a couple of hours. Beware! Hadoop reuses the Writable objects given to the reducer. For example:

[Arun replied inline:] Yes. http://issues.apache.org/jira/browse/HADOOP-2399 - fixed in 0.17.0.

Arun

public void reduce(K key, Iterator<V> values,
    OutputCollector<K, V> output, Reporter reporter)
    throws IOException {
  List<V> valueList = new ArrayList<V>();
  while (values.hasNext()) {
    valueList.add(values.next());
  }
  // Say there were 10 values. valueList now contains 10
  // pointers to the same object.
}

I assume this is done for efficiency, but a warning in the Reducer documentation would be nice.

-Stuart
RE: Distributed Lucene - from hadoop contrib
Thank you for your response.

I was imagining the two concepts, i) using hadoop.contrib.index to index documents and ii) providing search in a distributed fashion, to be all in one box. So basically, hadoop.contrib.index is used to create Lucene indexes in a distributed fashion (by creating shards, each shard being a Lucene instance). And then I can use Katta or any other distributed Lucene application to serve Lucene indexes distributed over many servers.

Deepika

-----Original Message-----
From: Ning Li [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 08, 2008 7:08 AM
To: core-user@hadoop.apache.org
Subject: Re: Distributed Lucene - from hadoop contrib

> 1) Katta and Distributed Lucene are different projects though, right? Both being based on kind of the same paradigm (Distributed Index)?

The design of Katta and that of Distributed Lucene are quite different last time I checked. I pointed out the Katta project because you can find the code for Distributed Lucene there.

> 2) So, I should be able to use the hadoop.contrib.index with HDFS. Though, it would be much better if it is integrated with Distributed Lucene or the Katta project as these are designed keeping the structure and behavior of indexes in mind. Right?

As described in the README file, hadoop.contrib.index uses map/reduce to build Lucene instances. It does not contain a component that serves queries. If that's not sufficient for you, you can check out the designs of Katta and Distributed Index and see which one suits your use better.

Ning
RE: Distributed Lucene - from hadoop contrib
Hey guys,

I would appreciate any feedback on this.

Deepika

-----Original Message-----
From: Deepika Khera [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, August 06, 2008 5:39 PM
To: core-user@hadoop.apache.org
Subject: Distributed Lucene - from hadoop contrib

Hi,

I am planning to use distributed Lucene from hadoop.contrib.index for indexing. Has anyone used this or tested it? Any issues or comments?

I see that the design described is different from HDFS (the Namenode is stateless, stores no information regarding blocks for files, etc.). Does anyone know how hard it will be to set up this kind of system, or is there something that can be reused?

A reference link: http://wiki.apache.org/hadoop/DistributedLucene

Thanks,
Deepika
Distributed Lucene - from hadoop contrib
Hi,

I am planning to use distributed Lucene from hadoop.contrib.index for indexing. Has anyone used this or tested it? Any issues or comments?

I see that the design described is different from HDFS (the Namenode is stateless, stores no information regarding blocks for files, etc.). Does anyone know how hard it will be to set up this kind of system, or is there something that can be reused?

A reference link: http://wiki.apache.org/hadoop/DistributedLucene

Thanks,
Deepika