Hi all,

For the TextInputFormat class, the input key is the file position (byte offset) of each line, and that works well. When I switch to LzoTextInputFormat to read LZO files, however, the key no longer makes sense: it does not correspond to a file position. Is the file position supported with LzoTextInputFormat?
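To make concrete what I mean by file position: with plain TextInputFormat, the key is the byte offset at which each line starts in the uncompressed file, so a line can be found again from its key. Below is a minimal sketch of that behaviour outside Hadoop. The class LineOffsets and its two methods are made up for illustration; they are not part of Hadoop or hadoop-lzo, and the sketch assumes '\n' line endings and UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not a Hadoop class).
public class LineOffsets {

    /** Returns the byte offset at which each line of the buffer starts. */
    public static List<Long> lineOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                offsets.add((long) start);
                start = i + 1; // the next line begins right after the newline
            }
        }
        if (start < data.length) {
            offsets.add((long) start); // last line without a trailing newline
        }
        return offsets;
    }

    /** Returns the line that starts at the given byte offset. */
    public static String lineAt(byte[] data, long offset) {
        int start = (int) offset;
        int end = start;
        while (end < data.length && data[end] != '\n') {
            end++;
        }
        return new String(data, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = "first\nsecond line\n\nlast".getBytes(StandardCharsets.UTF_8);
        // Print "offset<TAB>line", the same shape as the job output.
        for (long offset : lineOffsets(data)) {
            System.out.println(offset + "\t" + lineAt(data, offset));
        }
    }
}
```

Every key is distinct and increases with each line, and passing a key back to lineAt returns the same line. That is the property I would like to have when reading LZO files.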
Here is a job that prints out the file position and the line:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    import com.hadoop.mapreduce.LzoTextInputFormat; // from hadoop-lzo

    public class Test {

        public static class Map extends Mapper<LongWritable, Text, LongWritable, Text> {

            private Text outputValue = new Text();

            /*
             * Outputs key,value pair.
             * key = offset
             * value = string (first 64 characters of the line)
             */
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String s = value.toString();
                if (s.length() > 64) {
                    s = s.substring(0, 64);
                }
                this.outputValue.set(s);
                context.write(key, this.outputValue);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration c = new Configuration();
            Job j = new Job(c, "Test");
            j.setJarByClass(Test.class);
            FileInputFormat.addInputPath(j, new Path(args[0]));
            FileOutputFormat.setOutputPath(j, new Path(args[1]));
            j.setMapperClass(Map.class);
            j.setInputFormatClass(LzoTextInputFormat.class);
            j.setOutputFormatClass(TextOutputFormat.class);
            j.setMapOutputKeyClass(LongWritable.class);
            j.setMapOutputValueClass(Text.class);
            j.setOutputKeyClass(LongWritable.class);
            j.setOutputValueClass(Text.class);
            if (!j.waitForCompletion(true)) {
                System.exit(1);
            }
        }
    }

The output is:

    0       [WEB.WWW.WARNING.30000][Mon 2012/01/09 14:00:00:933 PST][com.wm.
    101200  =DynamicItem to String MethodDynamicItem{id=15762417, timestamp=
    101200  {
    101200  2012-01-09 14:16:19:195 - TP-Processor2, 29718094 -> L2 STRAND B
    101200  2012-01-09 14:16:19:192 - TP-Processor2, 29718094 -> hostName=ed
    101200  2012-01-09 14:16:19:186 - pool-113-thread-2, 11661605 -> hostNam
    101200  SESSION FILTER BENCH: pre-process 0 millis <SessionID: 000000086
    101200  TOMCAT REQ: /ip/Archangels-Chessmen/17703726 Mon Jan 09 14:16:19
    101200  TIMESTAMP: Mon Jan 9 14:16:11 PST 2012
    101200  TOMCAT BENCH: /verify.gsp?novisitor=true&noses=true 3 elapsed Mo
    101200
    101200  [WEB.WWW.WARNING.PLATFORM][Mon 2012/01/09 14:16:11:778 PST][com.
    101200
    101200  [WEB.WWW.WARNING.PLATFORM][Mon 2012/01/09 14:16:11:778 PST][com.
    101200  TOMCAT REQ: /verify.gsp?novisitor=true&noses=true Mon Jan 09 14:
    101200  TOMCAT BENCH: /verify.gsp?novisitor=true&noses=true 3 elapsed Mo
    101200
    101200  [WEB.WWW.WARNING.PLATFORM][Mon 2012/01/09 14:16:03:767 PST][com.
    ...

The key does change, but its values do not make sense to me as file positions: many consecutive lines share the same key. Is there any way to get the file position of a line so that I can print out that line later?

Any help would be appreciated! Thanks!