I have uploaded a patch to FLUME-1252, which is a better-performing HBase sink. If you are experimenting, it would be a great help if you could try that out. The serializer has almost the same functionality/API. It would be great to get some verification of its correctness and performance.
Thanks!
Hari

On Sunday, June 10, 2012, Patrick Wendell wrote:

> Thanks Hari - that does help. I was envisioning something akin to the
> RegexSerde in Hive, where you can just write a regular expression to
> extract fields from the event data and put them into separate columns
> (within a CF). Sounds like a custom Serializer is exactly what I want here.
>
> - Patrick
>
> On Sat, Jun 9, 2012 at 11:01 PM, Hari Shreedharan <
> [email protected]> wrote:
>
> > Hi Patrick,
> >
> > The HbaseSink has 2 components - one being the sink itself and the other
> > being the serializer. When the sink picks up an event from the channel,
> > it is handed over to the serializer, which can process the event and
> > return Puts and/or Increments. So if you plan to write to different
> > columns within the same column family, all you need to do is write your
> > own serializer that implements HbaseEventSerializer, and set that as the
> > serializer for the HbaseSink.
> >
> > If you need to write to more than one column family, the way to do it is
> > to add a header to the event based on the column family/column, use the
> > multiplexing channel selector to divert the event to different flows, and
> > then use multiple HBase sinks. As of now, the HbaseSink writes only to
> > one table and one column family. This was done to simplify configuration
> > and the serializer interface.
> >
> > Basically - write an HBaseEventSerializer and plug it into the HbaseSink,
> > which will write to HBase.
> >
> > I hope this helps.
> >
> > Thanks,
> > Hari
> >
> > --
> > Hari Shreedharan
> >
> > On Saturday, June 9, 2012 at 11:27 PM, Patrick Wendell wrote:
> >
> > > Hi There,
> > >
> > > For certain types of event data, such as log files, it would be nice
> > > to have a way to write to HBase such that fields from the original
> > > file can be parsed into distinct columns.
> > >
> > > I want to implement this for a one-off project (and maybe for
> > > contribution back to Flume if this makes sense).
> > >
> > > What is the best way to go about it? Based on skimming the code, my
> > > sense is that writing a custom HBase sink makes the most sense. Is
> > > that heading down the right path, or is there some other component I
> > > should be modifying or extending?
> > >
> > > - Patrick
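For readers following this thread: a minimal sketch of the regex field-extraction step Patrick describes, using only the JDK. In a real Flume serializer this logic would live inside a class implementing HbaseEventSerializer (whose initialize() receives the event and getActions() returns the Puts); the class name, regex, column names, and sample log line below are all made up for illustration, not part of Flume's API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: map regex capture groups to HBase column qualifiers.
// In a custom HbaseEventSerializer, this would run against the event
// body in initialize(), and getActions() would turn the resulting map
// into one Put with a qualifier per extracted field.
public class RegexFieldExtractor {
  private final Pattern pattern;
  private final String[] columns;

  public RegexFieldExtractor(String regex, String[] columns) {
    this.pattern = Pattern.compile(regex);
    this.columns = columns;
  }

  /** Returns column-qualifier -> value, or an empty map if no match. */
  public Map<String, String> extract(String eventBody) {
    Map<String, String> out = new LinkedHashMap<>();
    Matcher m = pattern.matcher(eventBody);
    if (m.matches()) {
      for (int i = 0; i < columns.length && i < m.groupCount(); i++) {
        out.put(columns[i], m.group(i + 1));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // Hypothetical access-log line: host, method, path, status.
    RegexFieldExtractor ex = new RegexFieldExtractor(
        "(\\S+) (\\S+) (\\S+) (\\d+)",
        new String[] {"host", "method", "path", "status"});
    System.out.println(ex.extract("127.0.0.1 GET /index.html 200"));
    // prints {host=127.0.0.1, method=GET, path=/index.html, status=200}
  }
}
```

Each entry in the returned map would become one column within the sink's configured column family, matching Hari's point that a single serializer can fan an event out to many columns in the same CF.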
