Hi Patrick,

The HBase sink has two components: the sink itself and the serializer. When the sink picks up an event from the channel, it hands the event to the serializer, which can process it and return Puts and/or Increments. So if you plan to write to different columns within the same column family, all you need to do is write your own serializer that implements HbaseEventSerializer and set it as the serializer for the HBaseSink.
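For example, here is a minimal sketch of such a serializer against the Flume 1.x HbaseEventSerializer interface. The tab-separated log format, the row-key scheme, and the class and column names are assumptions for illustration only:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.conf.ComponentConfiguration;
    import org.apache.flume.sink.hbase.HbaseEventSerializer;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    /** Parses a tab-separated event body into one column per field. */
    public class TabSeparatedHbaseEventSerializer implements HbaseEventSerializer {

      private byte[] payload;
      private byte[] columnFamily;

      @Override
      public void configure(Context context) {
        // Column names could be read from serializer.* properties here.
      }

      @Override
      public void configure(ComponentConfiguration conf) { }

      @Override
      public void initialize(Event event, byte[] columnFamily) {
        this.payload = event.getBody();
        this.columnFamily = columnFamily;
      }

      @Override
      public List<Row> getActions() {
        // One Put per event; each tab-separated field becomes its own column.
        String[] fields = Bytes.toString(payload).split("\t");
        // Row key from the event's arrival time; a real serializer would
        // derive something meaningful (and unique) from the event itself.
        Put put = new Put(Bytes.toBytes(System.currentTimeMillis()));
        for (int i = 0; i < fields.length; i++) {
          put.add(columnFamily, Bytes.toBytes("field" + i), Bytes.toBytes(fields[i]));
        }
        List<Row> actions = new ArrayList<Row>();
        actions.add(put);
        return actions;
      }

      @Override
      public List<Increment> getIncrements() {
        // No counters in this sketch.
        return new ArrayList<Increment>();
      }

      @Override
      public void close() { }
    }

You would then point the sink at it with something like agent.sinks.sink1.serializer = com.example.TabSeparatedHbaseEventSerializer (package name hypothetical).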
If you need to write to more than one column family, the way to do it is to add a header to the event based on the column family/column, use the multiplexing channel selector to divert the event to different flows, and then use multiple HBase sinks (see the P.S. at the end of this message for a configuration sketch). As of now, the HBaseSink writes only to one table and one column family; this was done to simplify configuration and the serializer interface.

Basically: write an HbaseEventSerializer and plug it into the HBaseSink, which will write to HBase. I hope this helps.

Thanks,
Hari

--
Hari Shreedharan

On Saturday, June 9, 2012 at 11:27 PM, Patrick Wendell wrote:

> Hi There,
>
> For certain types of event data, such as log files, it would be nice to
> have a way to write to HBase such that fields from the original file can be
> parsed into distinct columns.
>
> I want to implement this for a one-off project (and maybe for contribution
> back to flume if this makes sense).
>
> What is the best way to go about it? Based on skimming the code my sense is
> that writing a custom HBase sink makes the most sense. Is that heading down
> the right path, or is there some other component I should be modifying or
> extending?
>
> - Patrick
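P.S. A rough sketch of the multiplexing setup described above. The agent, source, channel, and sink names, the "cf" header, and the column family names are all hypothetical; only the property keys come from Flume:

    # Route events to a channel based on a "cf" header set upstream.
    agent.sources.src.selector.type = multiplexing
    agent.sources.src.selector.header = cf
    agent.sources.src.selector.mapping.users = ch1
    agent.sources.src.selector.mapping.events = ch2

    # One HBase sink per column family, each reading its own channel.
    agent.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
    agent.sinks.sink1.table = mytable
    agent.sinks.sink1.columnFamily = users
    agent.sinks.sink1.serializer = com.example.TabSeparatedHbaseEventSerializer
    agent.sinks.sink1.channel = ch1

    agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
    agent.sinks.sink2.table = mytable
    agent.sinks.sink2.columnFamily = events
    agent.sinks.sink2.serializer = com.example.TabSeparatedHbaseEventSerializer
    agent.sinks.sink2.channel = ch2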
