Re: piping data into Cassandra

Eric Yang Wed, 26 Oct 2011 19:09:42 -0700

See: http://incubator.apache.org/chukwa/docs/r0.4.0/agent.html and 
http://incubator.apache.org/chukwa/docs/r0.4.0/programming.html


The configuration is the same for collector-based demux.  Hope this helps.
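
To make the mapping concrete, here is a minimal sketch of a chukwa-demux-conf.xml entry, assuming the Hadoop-style property layout described in the programming guide linked above. The data type "ApacheLog" and the class org.example.chukwa.ApacheLogProcessor are hypothetical placeholders, not names from this thread. The idea is that the data type is named when the adaptor is registered on the agent, travels with each chunk through the collector, and demux looks up a processor under that same name.

```xml
<!-- Fragment for chukwa-demux-conf.xml; a sketch, not a verified config. -->
<!-- "ApacheLog" and the class name below are hypothetical placeholders. -->
<property>
  <!-- Data type name: assumed to match the data type given to the adaptor on the agent -->
  <name>ApacheLog</name>
  <!-- Demux processor class that should parse chunks of this data type -->
  <value>org.example.chukwa.ApacheLogProcessor</value>
  <description>Parser class for the ApacheLog data type (illustrative)</description>
</property>
```

If no entry matches the data type of the incoming chunks, demux may produce no output for them, which would be consistent with a missing demuxOutputDir_* directory.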

regards,
Eric

On Oct 26, 2011, at 4:20 PM, AD wrote:

> Thanks.  Sorry for being dense here, but where does the data type get mapped 
> from the agent to the collector when passing data, so that demux will match?
> 
> On Wed, Oct 26, 2011 at 12:34 PM, Eric Yang <[email protected]> wrote:
> "dp" serves two functions: first, it loads data into MySQL; second, it runs 
> SQL for aggregated views.  demuxOutputDir_* is created only if the demux 
> MapReduce job produces data.  Hence, make sure that there is a demux processor 
> mapped to your data type for the extraction process in chukwa-demux-conf.xml.
> 
> regards,
> Eric
> 
> On Oct 26, 2011, at 5:15 AM, AD wrote:
> 
> > Hmm, I am running bin/chukwa demux and I don't have anything past 
> > dataSinkArchives; there is no directory named demuxOutputDir_*.
> >
> > Also, isn't dp an aggregate view?  I need to parse the Apache logs to do 
> > custom reports on things like remote_host, query strings, etc., so I was 
> > hoping to parse the raw record, load it into Cassandra, and run M/R there 
> > to do the aggregate views.  I thought a new version of TSProcessor was the 
> > right place for this, but I could be wrong.
> >
> > Thoughts?
> >
> >
> >
> > If not, how do you write a custom postProcessor?
> >
> > On Wed, Oct 26, 2011 at 12:57 AM, Eric Yang <[email protected]> wrote:
> > Hi AD,
> >
> > Data is stored in demuxOutputDir_* by demux, and there is a
> > PostProcessorManager (bin/chukwa dp) which monitors the postProcess
> > directory and loads data into MySQL.  For your use case, you will need to
> > modify PostProcessorManager.java to adapt it to your needs.  Hope this
> > helps.
> >
> > regards,
> > Eric
> >
> > On Tue, Oct 25, 2011 at 6:34 PM, AD <[email protected]> wrote:
> > > hello,
> > >  I currently push Apache logs into Chukwa.  I am trying to figure out how
> > > to get all those logs into Cassandra and run MapReduce there.  Is the best
> > > place to do this in Demux (write my own version of TSProcessor)?
> > >  Also, the data flow seems to miss a step.  The
> > > page http://incubator.apache.org/chukwa/docs/r0.4.0/dataflow.html says in
> > > 3.3 that
> > >  - demux moves complete files to: dataSinkArchives/[yyyyMMdd]/*/*.done
> > >  - the next step is to move files from
> > > postProcess/demuxOutputDir_*/[clusterName]/[dataType]/[dataType]_[yyyyMMdd]_[HH].R.evt
> > >  How do they get from dataSinkArchives to postProcess?  Does this run
> > > inside of DemuxManager or a separate process (bin/chukwa demux)?
> > >  Thanks
> > >  AD
> >
> 
>
