Hello, I have a MapReduce application reading from an existing HBase table. The map function searches for certain values in the table and the reduce function averages them.
My question is simple.

**Method 1:** I initially wrote the program passing the map function an input key of type ImmutableBytesWritable and an input value of type RowResult. Of course I called setInputFormat(TableInputFormat.class) and set the COLUMN_LIST as well. I added a debug user counter to check how often my table was being read, and discovered (with your help as well) that the table was read N times, where N is the number of rows in the table, which was of course not acceptable. This was due to the fact that I was passing the RowResult as input to the map function: the map was invoked once per row, and since (as in both methods) each invocation created its own scanner over the table, the whole table was scanned once per row.

**Method 2:** I decided not to pass the RowResult as the input to the map. Instead I passed a Text, which in fact I do not use at all in the map function; I used it only to pass something so that Hadoop does not give me an error :). Then, as in the first method, I created a scanner on the HBase table inside the map function and started reading the rows. With this solution, once I no longer passed the RowResult as a parameter to the mapper, the job was much faster and the table was read only once! Perfect!

**Questions:**
- Are there any hidden performance issues or complications behind my Method 2?
- It is true that I have reached a working solution, but I am wondering if I can do it in a cleaner way: could I somehow skip passing an input key and input value to the map at all? If yes, how?

Regards,
CJ
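P.S. In case it helps, here is a stripped-down skeleton of what I mean by the Method 1 wiring, targeting the old (pre-0.20) org.apache.hadoop.hbase.mapred API. The table name `mytable`, the column `cf:value`, the class names `AverageJob`/`MyMap`/`AvgReduce`, the output path, and the single "avg" key are all placeholders for illustration, not my real code:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.mapred.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class AverageJob {

  public static class MyMap extends MapReduceBase
      implements Mapper<ImmutableBytesWritable, RowResult, Text, DoubleWritable> {

    private static final byte[] COLUMN = Bytes.toBytes("cf:value");

    // Called once per row of the table; the framework already hands us the
    // row as a RowResult. (In my actual Method 1 the map body also opened
    // its own scanner here, which is why the table was scanned once per row.)
    public void map(ImmutableBytesWritable key, RowResult row,
        OutputCollector<Text, DoubleWritable> output, Reporter reporter)
        throws IOException {
      Cell cell = row.get(COLUMN);
      if (cell != null) {
        double v = Double.parseDouble(Bytes.toString(cell.getValue()));
        output.collect(new Text("avg"), new DoubleWritable(v));
      }
    }
  }

  public static class AvgReduce extends MapReduceBase
      implements Reducer<Text, DoubleWritable, Text, DoubleWritable> {

    // Averages all values collected under the same key.
    public void reduce(Text key, Iterator<DoubleWritable> values,
        OutputCollector<Text, DoubleWritable> output, Reporter reporter)
        throws IOException {
      double sum = 0;
      long count = 0;
      while (values.hasNext()) {
        sum += values.next().get();
        count++;
      }
      output.collect(key, new DoubleWritable(count == 0 ? 0 : sum / count));
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf job = new JobConf(AverageJob.class);
    job.setInputFormat(TableInputFormat.class);
    job.set(TableInputFormat.COLUMN_LIST, "cf:value");
    // With TableInputFormat the table name is passed in as the input path.
    FileInputFormat.addInputPaths(job, "mytable");
    job.setMapperClass(MyMap.class);
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(DoubleWritable.class);
    job.setReducerClass(AvgReduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/avg-out"));
    JobClient.runJob(job);
  }
}
```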
