Hi Patrick,

The HBase sink has two components: the sink itself and the serializer. When the sink picks up an event from the channel, it hands the event to the serializer, which can process it and return Puts and/or Increments. So if you plan to write to different columns within the same column family, all you need to do is write your own serializer that implements HbaseEventSerializer and set it as the serializer for the HBaseSink.
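For example, here is a minimal sketch of such a serializer against the Flume 1.x HbaseEventSerializer interface. The tab-separated log format, the row-key scheme, and the class and column names are assumptions for illustration only:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.conf.ComponentConfiguration;
    import org.apache.flume.sink.hbase.HbaseEventSerializer;
    import org.apache.hadoop.hbase.client.Increment;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Row;
    import org.apache.hadoop.hbase.util.Bytes;

    /** Parses a tab-separated event body into one column per field. */
    public class TabSeparatedHbaseEventSerializer implements HbaseEventSerializer {

      private byte[] payload;
      private byte[] columnFamily;

      @Override
      public void configure(Context context) {
        // Column names could be read from serializer.* properties here.
      }

      @Override
      public void configure(ComponentConfiguration conf) { }

      @Override
      public void initialize(Event event, byte[] columnFamily) {
        this.payload = event.getBody();
        this.columnFamily = columnFamily;
      }

      @Override
      public List<Row> getActions() {
        // One Put per event; each tab-separated field becomes its own column.
        String[] fields = Bytes.toString(payload).split("\t");
        // Row key from the event's arrival time; a real serializer would
        // derive something meaningful (and unique) from the event itself.
        Put put = new Put(Bytes.toBytes(System.currentTimeMillis()));
        for (int i = 0; i < fields.length; i++) {
          put.add(columnFamily, Bytes.toBytes("field" + i), Bytes.toBytes(fields[i]));
        }
        List<Row> actions = new ArrayList<Row>();
        actions.add(put);
        return actions;
      }

      @Override
      public List<Increment> getIncrements() {
        // No counters in this sketch.
        return new ArrayList<Increment>();
      }

      @Override
      public void close() { }
    }

You would then point the sink at it with something like agent.sinks.sink1.serializer = com.example.TabSeparatedHbaseEventSerializer (package name hypothetical).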
If you need to write to more than one column family, the way to do it is to add a header to the event based on the column family/column, use the multiplexing channel selector to divert the event to different flows, and then use multiple HBase sinks (see the P.S. at the end of this message for a configuration sketch). As of now, the HBaseSink writes only to one table and one column family; this was done to simplify configuration and the serializer interface.

Basically: write an HbaseEventSerializer and plug it into the HBaseSink, which will write to HBase. I hope this helps.

Thanks,
Hari

--
Hari Shreedharan

On Saturday, June 9, 2012 at 11:27 PM, Patrick Wendell wrote:

> Hi There,
>
> For certain types of event data, such as log files, it would be nice to
> have a way to write to HBase such that fields from the original file can be
> parsed into distinct columns.
>
> I want to implement this for a one-off project (and maybe for contribution
> back to flume if this makes sense).
>
> What is the best way to go about it? Based on skimming the code my sense is
> that writing a custom HBase sink makes the most sense. Is that heading down
> the right path, or is there some other component I should be modifying or
> extending?
>
> - Patrick
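P.S. A rough sketch of the multiplexing setup described above. The agent, source, channel, and sink names, the "cf" header, and the column family names are all hypothetical; only the property keys come from Flume:

    # Route events to a channel based on a "cf" header set upstream.
    agent.sources.src.selector.type = multiplexing
    agent.sources.src.selector.header = cf
    agent.sources.src.selector.mapping.users = ch1
    agent.sources.src.selector.mapping.events = ch2

    # One HBase sink per column family, each reading its own channel.
    agent.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
    agent.sinks.sink1.table = mytable
    agent.sinks.sink1.columnFamily = users
    agent.sinks.sink1.serializer = com.example.TabSeparatedHbaseEventSerializer
    agent.sinks.sink1.channel = ch1

    agent.sinks.sink2.type = org.apache.flume.sink.hbase.HBaseSink
    agent.sinks.sink2.table = mytable
    agent.sinks.sink2.columnFamily = events
    agent.sinks.sink2.serializer = com.example.TabSeparatedHbaseEventSerializer
    agent.sinks.sink2.channel = ch2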
