Hi. I'm quite new to Hadoop programming, so to get a good start I wrote my own program that summarizes a column in a large tab-separated file (~100 000 000 lines). My first naive implementation was quite simple: a small rework of the WordCount example that ships with Hadoop. This program did calculate the correct answer, but it performed quite badly, since every line in the file triggers a call to map(). To avoid this, I wrote my own RecordReader, one that returns a List<Text> instead of a single Text. It type-checks in Eclipse, and all seems fine until I actually run the program. When I do, I get the following error:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to java.util.List
        at Summarizer$TokenizerMapper.map(Summarizer.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
(repeated several times)

What might be the problem? And is there perhaps an existing InputFormat (one not marked as deprecated) that already solves this?

Source code:
Summarizer: http://pastebin.com/m52876939
RecordReader: http://pastebin.com/m2c541a00
InputFormat: http://pastebin.com/m7714b0c

Hadoop version: 0.20.0
Java JDK version: 1.6 u14

Regards,
Per and Felix
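P.S. While debugging, we tried to understand how code that type-checks in Eclipse can still blow up with this cast. As far as we understand, Java erases generic type parameters at compile time, so Mapper.run() only ever sees Objects, and the cast to List is inserted at the call site inside the framework's run loop — which would explain why the trace points at map() rather than at our reader. Below is a minimal, Hadoop-free sketch of what we think is the same failure mode; the names Reader, consume, and demo are our own inventions for illustration, not Hadoop APIs:

```java
import java.util.List;

public class ErasureDemo {

    // Plays the role of RecordReader<K, V>: it declares a value type V, but
    // after type erasure next() effectively returns Object, so no cast is
    // checked inside the reader itself.
    interface Reader<V> {
        V next();
    }

    // Plays the role of a Mapper declared over List values: the compiler
    // inserts the cast to List at THIS call site, i.e. inside the caller's
    // loop -- which is why the stack trace blames map(), not the reader.
    static int consume(Reader<List<String>> reader) {
        List<String> value = reader.next(); // ClassCastException is thrown here
        return value.size();
    }

    // Returns true if the type mismatch surfaces as a ClassCastException.
    static boolean demo() {
        // A reader that really produces plain Strings, like a default
        // line reader handing out one Text value per input line...
        Reader<String> lineReader = new Reader<String>() {
            public String next() { return "a\tline\tof\tinput"; }
        };
        // ...smuggled past the compiler with an unchecked cast, much as a job
        // configuration can silently pair the wrong reader with a mapper.
        @SuppressWarnings("unchecked")
        Reader<List<String>> wrong = (Reader<List<String>>) (Reader<?>) lineReader;
        try {
            consume(wrong);
            return false; // would mean the cast unexpectedly succeeded
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("demo() caught ClassCastException: " + demo());
    }
}
```

Compiling this produces only an unchecked-cast warning, yet running it throws a ClassCastException of exactly the shape we see from Hadoop — so our guess is that the job is still handing the mapper Text values (e.g. from the default input format) even though the mapper is declared over List<Text>.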