Thank you everyone. Here is the code from the driver:

    Configuration conf = new Configuration();
    conf.addResource("/home/cluster/hadoop-1.0.3/conf/core-site.xml");
    conf.addResource("/home/cluster/hadoop-1.0.3/conf/hdfs-site.xml");
    Job job = new Job(conf, "XPTReader");
    job.setJarByClass(XPTReader.class);
    job.setMapperClass(XPTMapper.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormatClass(TextInputFormat.class);
    Path inPath = new Path("/mapin/TX.xpt");
    FileInputFormat.addInputPath(job, inPath);
    FileOutputFormat.setOutputPath(job,
        new Path("/mapout/" + inPath.toString().split("/")[4]
            + java.util.Random.class.newInstance().nextInt()));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
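A side note for readers of the thread: the IntWritable versus LongWritable behaviour reported in this message is consistent with a plain Java overload-versus-override mix-up. The mapper quoted further down declares IntWritable as its input key type while its map method takes a LongWritable, so that method does not override the inherited one, and the inherited pass-through map (which writes each key and value unchanged) would run instead. The sketch below reproduces the effect without Hadoop. BaseMapper, MismatchedMapper, MatchingMapper and run are hypothetical stand-ins, not Hadoop classes, and Long, Integer and String stand in for the Writable types.

```java
import java.util.ArrayList;
import java.util.List;

class OverridePitfallSketch {

    // Hypothetical stand-in for a framework base class whose default
    // map() passes every record through unchanged.
    static class BaseMapper<K, V> {
        final List<String> output = new ArrayList<>();

        public void map(K key, V value) {
            output.add(key + "\t" + value);
        }
    }

    // Declared with Integer keys, but map() takes a Long key.
    // This is an overload, not an override, so callers that go
    // through the base type never reach it.
    static class MismatchedMapper extends BaseMapper<Integer, String> {
        public void map(Long key, String value) {
            if (value.startsWith("TT") && value.length() >= 7) {
                output.add(key + "\t" + value.substring(0, 7));
            }
        }
    }

    // Declared key type and parameter type agree, so this overrides.
    static class MatchingMapper extends BaseMapper<Long, String> {
        @Override
        public void map(Long key, String value) {
            if (value.startsWith("TT") && value.length() >= 7) {
                output.add(key + "\t" + value.substring(0, 7));
            }
        }
    }

    // Simulates a framework that only knows the base type and feeds
    // (character offset, line) pairs to whatever mapper it was given.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<String> run(BaseMapper mapper, String[] lines) {
        long offset = 0;
        for (String line : lines) {
            mapper.map(Long.valueOf(offset), line);
            offset += line.length() + 1; // +1 for the newline
        }
        return mapper.output;
    }

    public static void main(String[] args) {
        String[] lines = {"HEADER RECORD", "TT00001AB rest", "junk"};
        // Pass-through: every input line comes back whole.
        System.out.println(run(new MismatchedMapper(), lines));
        // Filtered: only the first 7 characters of the "TT" line.
        System.out.println(run(new MatchingMapper(), lines));
    }
}
```

Putting @Override on the map method of the real mapper would turn a mismatch like this into a compile error instead of silently different output.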

Bejoy: I have observed one strange thing. When I am using IntWritable, the output file contains the entire content of the input file, but if I am using LongWritable, the output file is empty.

Sri, the code is working outside MR.

Regards,
Mohammad Tariq

On Thu, Aug 2, 2012 at 4:38 PM, Bejoy KS <bejoy.had...@gmail.com> wrote:
> Hi Tariq
>
> I assume the mapper being used is IdentityMapper instead of XPTMapper class.
> Can you share your main class?
>
> If you are using TextInputFormat and reading from a file in HDFS, it should
> have LongWritable keys as input, and your code has IntWritable as the input
> key type. Have a check on that as well.
>
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Mohammad Tariq <donta...@gmail.com>
> Date: Thu, 2 Aug 2012 15:48:42
> To: <mapreduce-user@hadoop.apache.org>
> Reply-To: mapreduce-user@hadoop.apache.org
> Subject: Re: Reading fields from a Text line
>
> Thanks for the response Harsh n Sri. Actually, I was trying to prepare
> a template for my application, using which I was trying to read one
> line at a time, extract the first field from it and emit that
> extracted value from the mapper. I have these few lines of code for
> that:
>
> public static class XPTMapper extends Mapper<IntWritable, Text,
> LongWritable, Text>{
>
>     public void map(LongWritable key, Text value, Context context)
>             throws IOException, InterruptedException{
>
>         Text word = new Text();
>         String line = value.toString();
>         if (!line.startsWith("TT")){
>             context.setStatus("INVALID LINE..SKIPPING........");
>         }else{
>             String stdid = line.substring(0, 7);
>             word.set(stdid);
>             context.write(key, word);
>         }
>     }
>
> But the output file contains all the rows of the input file, including
> the lines which I was expecting to get skipped. Also, I was expecting
> only the fields I am emitting, but the file contains entire lines.
> Could you guys please point out the mistake I might have made.
> (Pardon my ignorance, as I am not very good at MapReduce). Many thanks.
>
> Regards,
> Mohammad Tariq
>
>
> On Thu, Aug 2, 2012 at 10:58 AM, Sriram Ramachandrasekaran
> <sri.ram...@gmail.com> wrote:
>> Wouldn't it be better if you could skip those unwanted lines
>> upfront (preprocess) and have a file which is ready to be processed by the MR
>> system? In any case, more details are needed.
>>
>>
>> On Thu, Aug 2, 2012 at 8:23 AM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> Mohammad,
>>>
>>> > But it seems I am not doing things in correct way. Need some guidance.
>>>
>>> What do you mean by the above? What is your written code exactly
>>> expected to do and what is it not doing? Perhaps since you ask for a
>>> code question here, can you share it with us (pastebin or gists,
>>> etc.)?
>>>
>>> For skipping 8 lines, if you are using splits, you need to detect
>>> within the mapper or your record reader if the map task filesplit has
>>> an offset of 0 and skip 8 line reads if so (because it's the first split
>>> of some file).
>>>
>>> On Thu, Aug 2, 2012 at 1:54 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>>> > Hello list,
>>> >
>>> > I have a flat file in which data is stored as lines of 107
>>> > bytes each. I need to skip the first 8 lines (as they don't contain any
>>> > valuable info). Thereafter, I have to read each line and extract the
>>> > information from them, but not the line as a whole. Each line is
>>> > composed of several fields without any delimiter between them. For
>>> > example, the first field is of 8 bytes, second of 2 bytes and so on. I
>>> > was trying to read each line as a Text value, convert it into a string
>>> > and use the String.substring() method to extract the value of each field.
>>> > But it seems I am not doing things in correct way. Need some
>>> > guidance. Many thanks.
>>> >
>>> > Regards,
>>> > Mohammad Tariq
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>>
>> --
>> It's just about how deep your longing is!
>>
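
For reference, the fixed-width extraction described in the original question can be sketched in plain Java, outside MapReduce. This is only an illustration under stated assumptions: FixedWidthSketch, splitFields and parse are hypothetical names, only the two field widths mentioned in the thread (8 and 2) are used because the remaining widths are not given, and the data is assumed to be single-byte text so that character positions match byte positions. The header skip here is by line index over one in-memory list; inside a MapReduce job it would have to be tied to the first split of the file, as Harsh describes.

```java
import java.util.ArrayList;
import java.util.List;

class FixedWidthSketch {
    // Number of leading lines to ignore, as stated in the thread.
    static final int HEADER_LINES = 8;
    // Only the first two field widths are given in the thread;
    // the rest of each line is left unparsed here.
    static final int[] FIELD_WIDTHS = {8, 2};

    // Cuts one line into consecutive fields of the configured widths.
    static String[] splitFields(String line) {
        String[] fields = new String[FIELD_WIDTHS.length];
        int start = 0;
        for (int i = 0; i < FIELD_WIDTHS.length; i++) {
            int end = start + FIELD_WIDTHS[i];
            if (end > line.length()) {
                throw new IllegalArgumentException(
                        "line too short: " + line.length() + " characters");
            }
            fields[i] = line.substring(start, end);
            start = end;
        }
        return fields;
    }

    // Skips the header lines, then splits every remaining line.
    static List<String[]> parse(List<String> lines) {
        List<String[]> records = new ArrayList<>();
        for (int i = HEADER_LINES; i < lines.size(); i++) {
            records.add(splitFields(lines.get(i)));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < HEADER_LINES; i++) {
            lines.add("HEADER LINE " + i); // made-up header content
        }
        lines.add("STUDY001TXremaining fields"); // made-up data line
        for (String[] record : parse(lines)) {
            System.out.println(record[0] + " | " + record[1]);
        }
    }
}
```

Keeping the widths in one array means the start offsets are derived rather than typed by hand, which avoids off-by-one mistakes in a long chain of substring calls.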