Re: piping data into Cassandra

AD Wed, 26 Oct 2011 05:16:28 -0700

Hmm, I am running bin/chukwa demux and I don't have anything past
dataSinkArchives; there is no directory named demuxOutputDir_*.


Also, isn't dp an aggregate view?  I need to parse the Apache logs to do
custom reports on things like remote_host, query strings, etc., so I was
hoping to parse the raw records, load them into Cassandra, and run M/R there
to do the aggregate views.  I thought a new version of TSProcessor was the
right place for this, but I could be wrong.
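
A minimal sketch of the parsing step described above, assuming the logs
follow the Apache Common Log Format (optionally with extra combined-format
fields at the end).  ApacheLogLine and parse are hypothetical names for
illustration; this is plain Java and does not use any Chukwa or Cassandra
API, so wiring it into a demux processor or a Cassandra writer is left open.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Chukwa: splits one Apache access log
// line (Common Log Format prefix assumed) into named fields.
final class ApacheLogLine {
    // host ident user [timestamp] "method target protocol" status bytes
    private static final Pattern CLF = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] "
        + "\"(\\S+) (\\S+)(?: (\\S+))?\" (\\d{3}) (\\S+)");

    private ApacheLogLine() {
    }

    // Returns the parsed fields, or null if the line does not match.
    static Map<String, String> parse(String line) {
        Matcher m = CLF.matcher(line);
        if (!m.find()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("remote_host", m.group(1));
        fields.put("timestamp", m.group(4));
        fields.put("method", m.group(5));
        // Split the request target into path and query string.
        String target = m.group(6);
        int q = target.indexOf('?');
        fields.put("path", q < 0 ? target : target.substring(0, q));
        fields.put("query_string", q < 0 ? "" : target.substring(q + 1));
        fields.put("status", m.group(8));
        fields.put("bytes", m.group(9));
        return fields;
    }
}
```

Each returned map could then become one row (for example keyed by
timestamp and remote_host), with the aggregate views computed later by
M/R over those rows.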

Thoughts?



If not, how do you write a custom postProcessor?

On Wed, Oct 26, 2011 at 12:57 AM, Eric Yang <[email protected]> wrote:

> Hi AD,
>
> Data is stored in demuxOutputDir_* by demux, and there is a
> PostProcessorManager (bin/chukwa dp) which monitors the postProcess
> directory and loads data into MySQL.  For your use case, you will need to
> modify PostProcessorManager.java to adapt it to your needs.  Hope this
> helps.
>
> regards,
> Eric
>
> On Tue, Oct 25, 2011 at 6:34 PM, AD <[email protected]> wrote:
> > hello,
> >  I currently push Apache logs into Chukwa.  I am trying to figure out how
> > to get all those logs into Cassandra and run MapReduce there.  Is the best
> > place to do this in Demux (write my own version of TSProcessor?)
> >  Also, the data flow seems to miss a step.  The page
> > http://incubator.apache.org/chukwa/docs/r0.4.0/dataflow.html says in 3.3
> > that
> >    - demux moves complete files to: dataSinkArchives/[yyyyMMdd]/*/*.done
> >    - the next step is to move files from
> >      postProcess/demuxOutputDir_*/[clusterName]/[dataType]/[dataType]_[yyyyMMdd]_[HH].R.evt
> >  How do they get from dataSinkArchives to postProcess?  Does this run
> > inside of DemuxManager or a separate process (bin/chukwa demux)?
> >  Thanks
> >  AD
>
