Hello all,

I am currently testing a MapReduce job that performs a naive self-join,
either on an HBase table or on a plain Hadoop (HDFS) file.

My finding is that the HBase table carries a huge overhead compared to the
Hadoop file when doing the join (as much as 50 times slower).

The map function uses the column I wish to join on as the key, with the
rest of the columns as the value.
The reducer just combines all of the values into a single row.
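
In rough Java, the HBase variant looks like this; it is a simplified
sketch, where the column family and qualifier names ("cf", "joinCol",
"otherCols") are placeholders and not my real schema:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NaiveJoin {

  // Mapper: emit the join column as the key, the remaining columns as the value.
  public static class JoinMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable row, Result result, Context context)
        throws IOException, InterruptedException {
      byte[] joinCol = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("joinCol"));
      byte[] rest = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("otherCols"));
      if (joinCol != null && rest != null) {
        context.write(new Text(joinCol), new Text(rest));
      }
    }
  }

  // Reducer: combine all values that share a join key into a single output row.
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      StringBuilder combined = new StringBuilder();
      for (Text v : values) {
        if (combined.length() > 0) combined.append('\t');
        combined.append(v.toString());
      }
      context.write(key, new Text(combined.toString()));
    }
  }
}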

I used 1 million rows for the join.
I am running 28 mappers and 21 reducers.
All in all, I have 7 nodes.
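
For reference, this is roughly how I wire the job up (the table name, scan
settings, and output path are placeholders; the mapper count is not set
directly, it follows the number of regions/input splits):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NaiveJoinDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "naive-self-join");
    job.setJarByClass(NaiveJoinDriver.class);

    Scan scan = new Scan();
    scan.setCaching(500);        // batch rows per RPC instead of one at a time
    scan.setCacheBlocks(false);  // usually recommended for full-table MR scans

    TableMapReduceUtil.initTableMapperJob(
        "mytable", scan, NaiveJoin.JoinMapper.class,
        Text.class, Text.class, job);

    job.setReducerClass(NaiveJoin.JoinReducer.class);
    job.setNumReduceTasks(21);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/join-output"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}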

My question is this:
Should there be such a big overhead (a 50x multiplier) when using HBase
instead of a plain Hadoop file?


Thanks,
Eran.
