Well, I also wish it were this simple, but as I said in the original message, I never wanted to use LongWritable at all. Here is how I set the job conf, and after that, the reduce task. Also, if I got the output key/value types wrong, shouldn't it fail as soon as the reduce task runs? But my code behaves strangely: sometimes the exception isn't thrown until a few iterations have passed successfully... Does the code reveal something that I missed? Thanks.
JobConf countNewCatalogJobConf = new JobConf(ThreddsCatalogIndexer.class);
countNewCatalogJobConf.setJobName("Count-New-Catalog-" + iteration);
countNewCatalogJobConf.setInputPath(newCatUrlDir);
countNewCatalogJobConf.setInputFormat(KeyValueTextInputFormat.class);
countNewCatalogJobConf.setOutputPath(newCatalogCountDir);
countNewCatalogJobConf.setOutputKeyClass(Text.class);
countNewCatalogJobConf.setOutputValueClass(Text.class);
countNewCatalogJobConf.setReducerClass(NewCatalogCounter.class);
countNewCatalogJobConf.setNumReduceTasks(1);
JobClient.runJob(countNewCatalogJobConf);

public void reduce(WritableComparable key, Iterator values,
                   OutputCollector output, Reporter reporter) throws IOException {
    long sum = 0;
    if (key.toString().equals("NEWCAT")) {
        while (values.hasNext()) {
            values.next(); // consume the value, or hasNext() stays true forever
            sum++;
        }
    }
    Text sumText = new Text();
    sumText.set(Long.toString(sum));
    output.collect(key, sumText);
}

On Jan 16, 2008 12:58 AM, Vadim Zaliva <[EMAIL PROTECTED]> wrote:
> On Jan 15, 2008, at 21:53, Jim the Standing Bear wrote:
>
> I was asking a lot of questions today, so I am glad to contribute at
> least one answer. I had this problem when there was a type mismatch
> for keys or values. You need to set the right types on your JobConf like
> this:
>
> conf.setOutputKeyClass(Text.class);
> conf.setOutputValueClass(LongWritable.class);
>
> (using the appropriate types your mapper produces)
>
> Vadim
>
> > I am using hadoop 0.15.1 to index a catalog that has a tree-like
> > structure, where the leaf nodes are data files. My main task is a
> > loop that performs a breadth-first walkthrough, parsing out URLs to
> > catalogs and data files at the next level; this is done in a mapper.
> > To determine when the loop should terminate, I use a reduce task that
> > counts the number of new catalogs found, and stops the loop when the
> > count is 0.
> >
> > But while I was running the jobs, I kept getting this exception
> > (pasted below from the logs).
> > I didn't quite understand what it was
> > trying to say. But in my code, I never used LongWritable, only Text
> > for output keys and values, and KeyValueTextInputFormat for
> > input.
> >
> > What's weirder is that this exception occurs at different places from
> > job to job. Sometimes it is thrown on the 2nd iteration of my
> > loop, while other times it is the 3rd, the 4th, etc. Can someone
> > explain to me what is happening and why? Also, what would be the best way
> > to test/debug a hadoop job? Thanks.
> >
> >
> > 2008-01-16 00:37:19,941 INFO org.apache.hadoop.mapred.ReduceTask:
> > task_200801160024_0011_r_000000_1 Copying
> > task_200801160024_0011_m_000000_0 output from ginkgo.mycluster.org
> > 2008-01-16 00:37:19,953 INFO org.apache.hadoop.mapred.ReduceTask:
> > task_200801160024_0011_r_000000_1 done copying
> > task_200801160024_0011_m_000000_0 output from ginkgo.mycluster.org
> > 2008-01-16 00:37:19,955 INFO org.apache.hadoop.mapred.ReduceTask:
> > task_200801160024_0011_r_000000_1 Copying of all map outputs complete.
> > Initiating the last merge on the remaining files in
> > ramfs://mapoutput26453615
> > 2008-01-16 00:37:20,088 WARN org.apache.hadoop.mapred.ReduceTask:
> > task_200801160024_0011_r_000000_1 Final merge of the inmemory files
> > threw an exception: java.io.IOException: java.io.IOException: wrong
> > key class: class org.apache.hadoop.io.LongWritable is not class
> > org.apache.hadoop.io.Text
> >   at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2874)
> >   at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2683)
> >   at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2437)
> >   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1153)
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:252)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> >   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1161)
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:252)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> > 2008-01-16 00:37:20,090 WARN org.apache.hadoop.mapred.TaskTracker:
> > Error running child
> > java.io.IOException: task_200801160024_0011_r_000000_1The reduce
> > copier failed
> >   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:253)
> >   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1760)
> >
> >
> > --
> > --------------------------------------
> > Standing Bear Has Spoken
> > --------------------------------------

--
--------------------------------------
Standing Bear Has Spoken
--------------------------------------
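An aside on the counting loop in the reduce task: the iterator must actually be advanced with next() inside the loop, or hasNext() remains true forever and the loop never terminates. A minimal plain-Java sketch of that draining pattern, with no Hadoop dependencies (the class name and sample values are hypothetical, for illustration only):

```java
import java.util.Arrays;
import java.util.Iterator;

public class NewCatCount {
    // Count how many values arrive for a key by draining the iterator.
    static long count(Iterator<String> values) {
        long sum = 0;
        while (values.hasNext()) {
            values.next(); // must consume the element; hasNext() alone never advances
            sum++;
        }
        return sum;
    }

    public static void main(String[] args) {
        Iterator<String> values = Arrays.asList("cat1", "cat2", "cat3").iterator();
        System.out.println(count(values)); // prints 3
    }
}
```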