RE: how to specify key and value for an input to mapreduce job

Vinayakumar B Tue, 14 Feb 2012 07:29:10 -0800

Hi Vamshi,

1. To read input whose lines contain both the key and the value as text, set
KeyValueTextInputFormat (in the org.apache.hadoop.mapreduce.lib.input package)
as the InputFormat class of your Job. This input format uses
KeyValueLineRecordReader, which reads each line and splits it into the key and
the value at the first occurrence of the separator.
You can set the key/value separator with the following property in the job
configuration:
"mapreduce.input.keyvaluelinerecordreader.key.value.separator"
By default the separator is '\t'.


2. The reduce output is written by the default TextOutputFormat, which in your
job receives a LongWritable key and a Text value.
In your case you need Text as both the key and the value.
Because you were using the default TextInputFormat, you were getting the
complete line as the value and its position (byte offset) in the file as the
key. If you switch to KeyValueTextInputFormat, you will get the desired result.
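
As a rough illustration of the split described in point 1, here is a plain-Java
sketch. It is not Hadoop's actual code; the class name KeyValueSplitSketch and
the method split are made up for this example, and it assumes a
single-character separator.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper, for illustration only (not part of Hadoop).
public class KeyValueSplitSketch {

    // Splits a line at the FIRST occurrence of the separator.
    // If the separator is absent, the whole line becomes the key
    // and the value is empty.
    static Map.Entry<String, String> split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // One line from the sample data, assuming the columns are tab-separated.
        Map.Entry<String, String> kv = split("00015DEGgJ\t-HM", '\t');
        System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
        // prints: key=00015DEGgJ value=-HM
    }
}
```

With "-" as the separator, as in the original attempt, the same line would
split into the key "00015DEGgJ" followed by a tab and the value "HM", which is
probably not what is wanted here.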

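Putting both points together, a driver for the second job might look roughly
like the sketch below. The class name KeyValueJobDriver, the job name, and the
output path taken from args[0] are assumptions for illustration; /user/HSOP is
the input folder from the question. Job.getInstance is available in Hadoop 2.x;
on older releases new Job(conf, name) plays the same role. No mapper or reducer
is set, so the default identity classes pass the Text key and Text value
through unchanged.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class; names and the output path are placeholders.
public class KeyValueJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Separator between key and value on each input line ('\t' is the default).
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", "\t");

        Job job = Job.getInstance(conf, "key-value input example");
        job.setJarByClass(KeyValueJobDriver.class);

        // Each input line becomes (Text key, Text value) instead of
        // (LongWritable offset, Text line).
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        // Text for both key and value, so no offset column is written.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("/user/HSOP"));
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If you keep your own Mapper and Reducer, their input key type would need to be
Text rather than LongWritable for this input format.
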
Thanks and Regards,
Vinayakumar B
________________________________
From: Vamshi Krishna [vamshi2...@gmail.com]
Sent: Tuesday, February 14, 2012 8:28 PM
To: mapreduce-user@hadoop.apache.org
Subject: how to specify key and value for an input to mapreduce job

Hi all,
I have a job which reads all the rows from an HBase table and writes them to
a location in DFS, i.e. /user/HSOP. HSOP is a folder containing 9 files, each
with content like:
00015DEGgJ    -HM
00016Pc4Tl    -HM
0001H0iImI    -HM
0001Oyb0Ju    -HM
0001hwBEOr    -HM
0002Qx2Uj9    -HM
0002jCs6gr    -HM
0003PMcWRa    -HM
000488xKIE    -HM

Both the first and second columns are of Text type, as specified in the first
job's OutputFormat class.

Now I want one more job to read all these files as input and treat the first
column element as the "key" and the second column element as the "value". For
that I tried starting a job with the line
job.getConfiguration().set("key.value.separator.in.input.line", "-");

In the reduce() method I have context.write(key, value); where key is
LongWritable and value is Text. But when I look at the output of this job, I
see the format:

46    0002mCjpo9    -HM
253    000AxT9LSA    -HM
460    000FYtnxiB    -HM
667    000WNVBo9N    -HM
874    000dQiseKz    -HM

But I don't want the first column to be added to each row. How can I do that?
Could somebody please help?
