Thanks Ryan, Kevin and Stack for your helpful answers and recommendations!

On Oct 20, 2009, at 1:58 AM, Ryan Rawson wrote:

I have to recommend doing the puts via the API straight in the mapper.
Passing all your data through the shuffle is not necessary, since
inserting into HBase is a form of sorting. Besides, let's not copy a
100 GB import more times than we have to, right?
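
For concreteness, a rough sketch of that direct-to-HBase route with the 0.20 client API in a map-only job. The table name "mytable", the family/qualifier names, and the row-key derivation below are placeholders, not anything from this thread:

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only import: each map task writes straight to HBase, so nothing
// goes through the shuffle.  Output types are NullWritable because we
// never call context.write().
public class DirectPutMapper
    extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

  private HTable table;

  @Override
  protected void setup(Context context) throws IOException {
    // Assumes the HBase config (ZooKeeper quorum etc.) is visible to the tasks.
    table = new HTable(new HBaseConfiguration(context.getConfiguration()), "mytable");
  }

  @Override
  protected void map(LongWritable key, Text line, Context context)
      throws IOException {
    byte[] rowId = Bytes.toBytes(line.toString());            // placeholder row key
    Put put = new Put(rowId);
    put.add(Bytes.toBytes("content"), Bytes.toBytes("raw"),   // placeholder family/qualifier
        Bytes.toBytes(line.toString()));
    table.put(put);   // write directly; no context.write() needed
  }

  @Override
  protected void cleanup(Context context) throws IOException {
    table.flushCommits();   // push anything still buffered on the client
  }
}

In the driver you would also call job.setNumReduceTasks(0), so the data is never copied through a shuffle at all.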

On Mon, Oct 19, 2009 at 11:41 PM, Kevin Peterson <[email protected]> wrote:
On Mon, Oct 19, 2009 at 7:40 PM, yz5od2 <woods5242- [email protected]> wrote:

OK, so what you are saying is that my mapper should talk directly to HBase to write the data into it? Or should I define my Mapper implementation class like

Mapper<LongWritable,Text,Text,byte[]>


Your Mapper must output a Hadoop Writable. You have two options:

1. Handle HBase all yourself, and just use Hadoop as a way to distribute your load and data across your cluster. Then you can just use NullWritables and not call output.collect (0.19 API) or context.write (0.20 API) at all.

2. Output HBase Puts and Deletes from the Mapper and use TableOutputFormat. Put and Delete implement Writable but don't share a more specific superclass, so the signature for the Mapper is the somewhat confusing <K1, V1, K2, Writable>, where K1 and V1 are whatever is needed for your input, and K2 is completely ignored.

The second one would involve writing less code. You would do something like this:

byte[] rowId = ...;                  // row key for this record
byte[] content = pojo.serialize();   // your serialized POJO (e.g. the Thrift bytes)
Put put = new Put(rowId);
// family "content", qualifier "thrift-thingie", value = the serialized bytes
put.add(Bytes.toBytes("content"), Bytes.toBytes("thrift-thingie"), content);
context.write(NullWritable.get(), put);   // the key is ignored by TableOutputFormat

As Ryan says, you don't want to use Hadoop writables as your serialization scheme, but they are part of the API to pass data to an output format.
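
For what it's worth, the driver-side wiring for that route looks roughly like this, assuming the org.apache.hadoop.hbase.mapreduce.TableOutputFormat from 0.20; the mapper class name PutEmittingMapper and the table name are placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class PutImportDriver {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new HBaseConfiguration(), "hbase import");
    job.setJarByClass(PutImportDriver.class);

    job.setMapperClass(PutEmittingMapper.class);   // your mapper emitting (NullWritable, Put)
    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // TableOutputFormat writes each emitted Put to the named table;
    // the output key is ignored, so NullWritable is fine.
    job.setOutputFormatClass(TableOutputFormat.class);
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "mytable");
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Put.class);

    job.setNumReduceTasks(0);   // map-only: no reason to sort/shuffle the Puts
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

TableMapReduceUtil in the same package can set up most of this for you as well.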

I don't know if the first has any advantages. Probably flexibility, and
better control over details like when to flush the commits.
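
That flush control looks roughly like this with the direct-HTable approach, inside the mapper sketched earlier; the buffer size is just an arbitrary example:

// In setup(): buffer puts client-side instead of sending an RPC per put(),
// then decide yourself when they go out.
table.setAutoFlush(false);
table.setWriteBufferSize(12 * 1024 * 1024);   // e.g. a 12 MB write buffer

// table.put(put) calls made in map() now accumulate in the buffer.

// In cleanup(): push whatever is still buffered before the task exits.
table.flushCommits();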

