Hmm, I am running bin/chukwa demux and I don't have anything past dataSinkArchives; there is no directory named demuxOutputDir_*.
Also, isn't dp an aggregate view? I need to parse the Apache logs to build custom reports on fields such as remote_host and query strings, so I was hoping to parse the raw records, load them into Cassandra, and run M/R there to produce the aggregate views. I thought a new version of TSProcessor was the right place for this, but I could be wrong. Thoughts? If not, how do you write a custom postProcessor?

On Wed, Oct 26, 2011 at 12:57 AM, Eric Yang <[email protected]> wrote:
> Hi AD,
>
> Data is stored in demuxOutputDir_* by demux and there is a
> postProcessorMananger (bin/chukwa dp) which monitors postProcess
> directory and load data to MySQL. For your use case, you will need to
> modify PostProcessorManager.java to adopt to your use case. Hope this
> helps.
>
> regards,
> Eric
>
> On Tue, Oct 25, 2011 at 6:34 PM, AD <[email protected]> wrote:
> > hello,
> > I currently push apache logs into Chukwa. I am trying to figure out how to
> > get all those logs into Cassandra and run mapreduce there. Is the best
> > place to do this in Demux (right my own version of TSProcessor?)
> > Also the data flow seems to miss a step. The
> > page http://incubator.apache.org/chukwa/docs/r0.4.0/dataflow.html says in
> > 3.3 that
> > - demux moves complete files to: dataSinkArchives/[yyyyMMdd]/*/*.done
> > - the next step is to move files from
> > postProcess/demuxOutputDir_*/[clusterName]/[dataType]/[dataType]_[yyyyMMdd]_[HH].R.evt
> > How do they get from dataSinkArchives to postProcess? does this run
> > inside of DemuxManager or a separate process (bin/chukwa demux) ?
> > Thanks
> > AD
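
P.S. To make the goal concrete, here is a rough sketch of the per-record parsing I have in mind. It assumes the logs are in the Apache combined (or common) log format, uses only the JDK, and does not touch any Chukwa or Cassandra API. The class name ApacheLogLine and the field keys are made up for illustration; wherever this ends up living (a processor or a post-processing step), the extraction would probably look something like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Chukwa: splits one raw access-log line
// into named fields that could later be written to a store such as Cassandra.
public class ApacheLogLine {

    // host ident user [time] "method uri protocol" status bytes ["referer" "agent"]
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+)(?: (\\S+))?\" (\\d{3}) (\\S+)"
        + "(?: \"([^\"]*)\" \"([^\"]*)\")?$");

    // Returns the parsed fields, or null if the line does not match the assumed format.
    public static Map<String, String> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("remote_host", m.group(1));
        fields.put("time", m.group(4));
        fields.put("method", m.group(5));

        // Split the request URI into path and query string at the first '?'.
        String uri = m.group(6);
        int q = uri.indexOf('?');
        fields.put("path", q < 0 ? uri : uri.substring(0, q));
        fields.put("query_string", q < 0 ? "" : uri.substring(q + 1));

        fields.put("status", m.group(8));
        fields.put("bytes", m.group(9));
        return fields;
    }

    public static void main(String[] args) {
        String sample = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
            + "\"GET /index.html?user=ad&page=2 HTTP/1.0\" 200 2326 "
            + "\"http://www.example.com/start.html\" \"Mozilla/4.08 [en] (Win98; I ;Nav)\"";
        System.out.println(parse(sample));
    }
}
```

Lines that do not match come back as null, so malformed records can be counted or skipped rather than loaded.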
