Well, I also wish it were this simple, but as I said in the original message, I never wanted to use LongWritable at all. Here is how I set the job conf, and after that, the reduce task. Also, if I got the output key/value types wrong, shouldn't it fail as soon as the reduce task runs? But my code behaves strangely: sometimes the exception isn't thrown until a few iterations have passed successfully... Does the code reveal something that I missed? Thanks.
JobConf countNewCatalogJobConf = new JobConf(ThreddsCatalogIndexer.class);
countNewCatalogJobConf.setJobName("Count-New-Catalog-" + iteration);
countNewCatalogJobConf.setInputPath(newCatUrlDir);
countNewCatalogJobConf.setInputFormat(KeyValueTextInputFormat.class);
countNewCatalogJobConf.setOutputPath(newCatalogCountDir);
countNewCatalogJobConf.setOutputKeyClass(Text.class);
countNewCatalogJobConf.setOutputValueClass(Text.class);
countNewCatalogJobConf.setReducerClass(NewCatalogCounter.class);
countNewCatalogJobConf.setNumReduceTasks(1);
JobClient.runJob(countNewCatalogJobConf);

public void reduce(WritableComparable key, Iterator values,
                   OutputCollector output, Reporter reporter) throws IOException {
    long sum = 0;
    if (key.toString().equals("NEWCAT")) {
        while (values.hasNext()) {
            values.next(); // consume the value, or hasNext() stays true forever
            sum++;
        }
    }
    Text sumText = new Text();
    sumText.set(Long.toString(sum));
    output.collect(key, sumText);
}

On Jan 16, 2008 12:58 AM, Vadim Zaliva <[EMAIL PROTECTED]> wrote:
> On Jan 15, 2008, at 21:53, Jim the Standing Bear wrote:
>
> I was asking a lot of questions today, so I am glad to contribute at
> least one answer. I had this problem when there was a type mismatch
> for keys or values. You need to set the right types on your JobConf like
> this:
>
> conf.setOutputKeyClass(Text.class);
> conf.setOutputValueClass(LongWritable.class);
>
> (using the appropriate types your mapper produces)
>
> Vadim
>
> > I am using hadoop 0.15.1 to index a catalog that has a tree-like
> > structure, where the leaf nodes are data files. My main task is a
> > loop that performs a breadth-first walkthrough, parsing out URLs to
> > catalogs and data files at the next level; this is done in a mapper.
> > To determine when the loop should terminate, I use a reduce task that
> > counts the number of new catalogs found, and stops the loop when the
> > count is 0.
> >
> > But while I was running the jobs, I kept getting this exception
> > (pasted below from the logs).
> > I didn't quite understand what it was
> > trying to say. But in my code, I never used LongWritable, only Text
> > for output keys and values, and KeyValueTextInputFormat for
> > input.
> >
> > What's weirder is that this exception occurs at different places from
> > job to job. Sometimes it is thrown on the 2nd iteration of my
> > loop, while other times it is the 3rd, the 4th, etc. Can someone
> > explain to me what is happening and why? Also, what would be the best way
> > to test/debug a hadoop job? Thanks.
> >
> >
> > 2008-01-16 00:37:19,941 INFO org.apache.hadoop.mapred.ReduceTask:
> > task_200801160024_0011_r_000000_1 Copying
> > task_200801160024_0011_m_000000_0 output from ginkgo.mycluster.org
> > 2008-01-16 00:37:19,953 INFO org.apache.hadoop.mapred.ReduceTask:
> > task_200801160024_0011_r_000000_1 done copying
> > task_200801160024_0011_m_000000_0 output from ginkgo.mycluster.org
> > 2008-01-16 00:37:19,955 INFO org.apache.hadoop.mapred.ReduceTask:
> > task_200801160024_0011_r_000000_1 Copying of all map outputs complete.
> > Initiating the last merge on the remaining files in
> > ramfs://mapoutput26453615
> > 2008-01-16 00:37:20,088 WARN org.apache.hadoop.mapred.ReduceTask:
> > task_200801160024_0011_r_000000_1 Final merge of the inmemory files
> > threw an exception: java.io.IOException: java.io.IOException: wrong
> > key class: class org.apache.hadoop.io.LongWritable is not class
> > org.apache.hadoop.io.Text
> >   at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2874)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2683)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2437)
> >   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1153)
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:252)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> >   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1161)
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:252)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> > 2008-01-16 00:37:20,090 WARN org.apache.hadoop.mapred.TaskTracker:
> > Error running child
> > java.io.IOException: task_200801160024_0011_r_000000_1The reduce
> > copier failed
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:253)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> >
> > --
> > --------------------------------------
> > Standing Bear Has Spoken
> > --------------------------------------

--
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
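An aside on the counting loop in the reduce task: the iterator must actually be advanced with next() inside the loop, or hasNext() remains true forever and the loop never terminates. A minimal plain-Java sketch of that draining pattern, with no Hadoop dependencies (the class name and sample values are hypothetical, for illustration only):

```java
import java.util.Arrays;
import java.util.Iterator;

public class NewCatCount {
    // Count how many values arrive for a key by draining the iterator.
    static long count(Iterator<String> values) {
        long sum = 0;
        while (values.hasNext()) {
            values.next(); // must consume the element; hasNext() alone never advances
            sum++;
        }
        return sum;
    }

    public static void main(String[] args) {
        Iterator<String> values = Arrays.asList("cat1", "cat2", "cat3").iterator();
        System.out.println(count(values)); // prints 3
    }
}
```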