After the demux process, the data is stored in Hadoop as a sequence file. One easy way to read it is to use Pig via the ChukwaLoader:
http://svn.apache.org/viewvc/incubator/chukwa/trunk/contrib/chukwa-pig/src/java/org/apache/hadoop/chukwa/pig/ChukwaLoader.java?view=markup

Note that it reads the data with a SequenceFileRecordReader, typed like this, so if you don't want to use Pig, you could do something similar:

SequenceFileRecordReader<ChukwaRecordKey, ChukwaRecord>

The ChukwaRecord contains a handful of fields created by the Processor that you've configured to collect your data. If you're using the TSProcessor, the payload is in a field called 'body', IIRC.

There's also a command-line Java tool that dumps the contents of a sequence file to stdout, which can be handy. I forget what it's called, but it should be in the docs.

On Thu, Nov 17, 2011 at 2:53 AM, Mohammad Tariq wrote:
> Oh, in that case I have to wait for their reply and keep on trying
> till then. Thanks for the reply, Ahmed.
>
> Regards,
> Mohammad Tariq
>
> On Thu, Nov 17, 2011 at 4:20 PM, Ahmed Fathalla wrote:
> > Hmm... maybe in the demux part of the system (I think it utilizes Pig
> > scripts somewhere). I'm not an expert in this; maybe Ari, Bill or Eric
> > can help on this.
> >
> > On Thu, Nov 17, 2011 at 12:47 PM, Mohammad Tariq wrote:
> >>
> >> Is it possible for us to extract only the actual content present
> >> inside a file without any other information, using Chukwa?
> >>
> >> Regards,
> >> Mohammad Tariq
> >
> > --
> > Ahmed Fathalla
