Thanks, Hari - that does help. I was envisioning something akin to the RegexSerDe in Hive, where you can just write a regular expression to extract fields from the event data and put them into separate columns (within a CF). Sounds like a custom serializer is exactly what I want here.
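
To make the idea concrete, here is a rough sketch of just the regex-to-columns step, in plain Java and not wired to Flume or HBase. The class name `RegexColumnExtractor`, the sample pattern, and the column names are all hypothetical; in an actual serializer, each entry of the returned map would presumably become one column of the Put within the configured column family.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps regex capture groups in an event body to column names.
// This is an illustration only, not part of Flume's API.
class RegexColumnExtractor {
    private final Pattern pattern;
    private final String[] columnNames;

    RegexColumnExtractor(String regex, String... columnNames) {
        this.pattern = Pattern.compile(regex);
        this.columnNames = columnNames;
    }

    // Returns column name -> value, in group order.
    // Returns an empty map if the body does not match the pattern, or if the
    // number of capture groups differs from the number of column names.
    Map<String, String> extract(byte[] eventBody) {
        String line = new String(eventBody, StandardCharsets.UTF_8);
        Map<String, String> columns = new LinkedHashMap<>();
        Matcher m = pattern.matcher(line);
        if (!m.matches() || m.groupCount() != columnNames.length) {
            return columns;
        }
        for (int i = 0; i < columnNames.length; i++) {
            // Capture groups are 1-indexed; group 0 is the whole match.
            columns.put(columnNames[i], m.group(i + 1));
        }
        return columns;
    }

    public static void main(String[] args) {
        // Assumed log format: "<host> <level> <message>"
        RegexColumnExtractor extractor =
            new RegexColumnExtractor("(\\S+) (\\S+) (.*)", "host", "level", "msg");
        byte[] body = "web01 ERROR disk full".getBytes(StandardCharsets.UTF_8);
        System.out.println(extractor.extract(body));
        // prints {host=web01, level=ERROR, msg=disk full}
    }
}
```

Returning an empty map on a non-match leaves it to the caller to decide whether to drop the event or write the raw body to a fallback column.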

- Patrick

On Sat, Jun 9, 2012 at 11:01 PM, Hari Shreedharan <[email protected]> wrote:

> Hi Patrick,
>
> The HbaseSink has 2 components - one being the sink itself and the other
> being the serializer. When the sink picks up an event from the channel, it
> is handed over to the serializer which can process the event and return
> Puts and/or Increments. So if you plan to write to different columns within
> the same column family, all you need to do is to write your own serializer
> that implements HbaseEventSerializer, and set that as the serializer for
> the HbaseSink.
>
> If you need to write to more than one column family, the way to do it is
> to add a header to the event based on the column family/column, use the
> multiplexing channel selector to divert the event to different flows and
> then use multiple Hbase sinks. As of now, the HbaseSink writes only to one
> table and one column family. This was done to simplify configuration and
> the serializer interface.
>
> Basically - write a HBaseEventSerializer and plug it into the HbaseSink,
> which will write to Hbase
>
> I hope this helps.
>
> Thanks
> Hari
>
> --
> Hari Shreedharan
>
> On Saturday, June 9, 2012 at 11:27 PM, Patrick Wendell wrote:
>
> > Hi There,
> >
> > For certain types of event data, such as log files, it would be nice to
> > have a way to write to HBase such that fields from the original file can be
> > parsed into distinct columns.
> >
> > I want to implement this for a one-off project (and maybe for contribution
> > back to flume if this makes sense).
> >
> > What is the best way to go about it? Based on skimming the code my sense is
> > that writing a custom HBase sink makes the most sense. Is that heading down
> > the right path, or is there some other component I should be modifying or
> > extending?
> >
> > - Patrick
