Hi Tariq

I suspect the mapper actually being used is the default IdentityMapper rather than your XPTMapper class.
Can you share your main class?

If you are using TextInputFormat and reading from a file in HDFS, the mapper should have
LongWritable keys as input, but your code declares IntWritable as the input key type.
Please check that as well.
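Btw, the reason the identity behaviour kicks in is a plain Java rule: a map method whose parameter types do not match the superclass signature is an overload, not an override, so the framework never calls it. A self-contained sketch of that language behaviour (hypothetical class names, plain Java rather than Hadoop code):

```java
// Stand-in for the framework's base mapper: the default just echoes input.
class BaseMapper {
    public String map(Long key, String value) {
        return value; // identity behaviour
    }
}

class MyMapper extends BaseMapper {
    // Different parameter type (Integer vs Long): this is an OVERLOAD,
    // not an override, so calls through the base signature skip it.
    public String map(Integer key, String value) {
        return value.substring(0, 2);
    }
}

public class OverloadDemo {
    public static void main(String[] args) {
        BaseMapper m = new MyMapper();
        // Dispatch resolves on the (Long, String) signature, so the base
        // method runs and the whole value comes back unchanged.
        System.out.println(m.map(1L, "TTrecord"));
    }
}
```

Putting @Override on the subclass method turns this silent mismatch into a compile error, which is why it is a habit worth having.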


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Mohammad Tariq <donta...@gmail.com>
Date: Thu, 2 Aug 2012 15:48:42 
To: <mapreduce-user@hadoop.apache.org>
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Re: Reading fields from a Text line

Thanks for the response, Harsh and Sri. Actually, I was trying to prepare
a template for my application that reads one line at a time, extracts
the first field from it, and emits that extracted value from the
mapper. I have these few lines of code for that:

public static class XPTMapper extends Mapper<IntWritable, Text, LongWritable, Text> {

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        Text word = new Text();
        String line = value.toString();
        if (!line.startsWith("TT")) {
            context.setStatus("INVALID LINE..SKIPPING........");
        } else {
            String stdid = line.substring(0, 7);
            word.set(stdid);
            context.write(key, word);
        }
    }
}

But the output file contains all the rows of the input file, including
the lines I was expecting to get skipped. Also, I was expecting only
the fields I am emitting, but the file contains the entire lines.
Could you guys please point out the mistake I might have made?
(Pardon my ignorance; I am not very good at MapReduce.) Many thanks.
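For reference, this is the kind of extraction I am attempting, shown on a plain string outside Hadoop (sample line made up; since the first field is 8 bytes and substring's end index is exclusive, I use substring(0, 8) here):

```java
public class FieldExtractDemo {
    // Extract the id field from a fixed-width line, or null for lines
    // that do not start with the "TT" record marker.
    static String extractStdId(String line) {
        if (!line.startsWith("TT")) {
            return null; // invalid line, to be skipped
        }
        // End index is exclusive: (0, 8) takes the first 8 characters,
        // i.e. the whole 8-byte first field.
        return line.substring(0, 8);
    }

    public static void main(String[] args) {
        System.out.println(extractStdId("TT123456restofline")); // TT123456
        System.out.println(extractStdId("XX123456restofline")); // null
    }
}
```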

Regards,
    Mohammad Tariq


On Thu, Aug 2, 2012 at 10:58 AM, Sriram Ramachandrasekaran
<sri.ram...@gmail.com> wrote:
> Wouldn't it be better if you could skip those unwanted lines
> upfront (preprocess) and have a file which is ready to be processed by the MR
> system? In any case, more details are needed.
>
>
> On Thu, Aug 2, 2012 at 8:23 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Mohammad,
>>
>> > But it seems I am not doing  things in correct way. Need some guidance.
>>
>> What do you mean by the above? What is your written code exactly
>> expected to do and what is it not doing? Perhaps since you ask for a
>> code question here, can you share it with us (pastebin or gists,
>> etc.)?
>>
>> For skipping 8 lines, if you are using splits, you need to detect
>> within the mapper or your record reader whether the map task's filesplit has
>> an offset of 0, and skip 8 line reads if so (because it's the first split
>> of the file).
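A rough sketch of that skip logic in plain Java, with the split-offset check stubbed out as a boolean (in a real mapper it would come from the FileSplit's start offset being 0; class and field names here are made up for illustration):

```java
public class HeaderSkipDemo {
    // Counts records and skips the first `skip` lines, but only when
    // processing the first split of the file.
    static class LineSkipper {
        private final boolean firstSplit; // in Hadoop: fileSplit start == 0
        private final int skip;
        private int seen = 0;

        LineSkipper(boolean firstSplit, int skip) {
            this.firstSplit = firstSplit;
            this.skip = skip;
        }

        // Call once per record; true means "drop this record".
        boolean shouldSkip() {
            seen++;
            return firstSplit && seen <= skip;
        }
    }

    public static void main(String[] args) {
        LineSkipper s = new LineSkipper(true, 8);
        int skipped = 0;
        for (int i = 0; i < 10; i++) {
            if (s.shouldSkip()) skipped++;
        }
        System.out.println(skipped); // 8
    }
}
```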
>>
>> On Thu, Aug 2, 2012 at 1:54 AM, Mohammad Tariq <donta...@gmail.com> wrote:
>> > Hello list,
>> >
>> >        I have a flat file in which data is stored as lines of 107
>> > bytes each. I need to skip the first 8 lines (as they don't contain any
>> > valuable info). Thereafter, I have to read each line and extract the
>> > information from them, but not the line as a whole. Each line is
>> > composed of several fields without any delimiter between them. For
>> > example, the first field is of 8 bytes, the second of 2 bytes, and so on. I
>> > was trying to read each line as a Text value, convert it into a string,
>> > and use the String.substring() method to extract the value of each field.
>> > But it seems I am not doing things in the correct way. Need some
>> > guidance. Many thanks.
>> >
>> > Regards,
>> >     Mohammad Tariq
>>
>>
>>
>> --
>> Harsh J
>
>
>
>
> --
> It's just about how deep your longing is!
>
