Re: A demo setup on a single linux server

Eric Yang Fri, 03 Jun 2011 10:33:10 -0700

1. HadoopLog should be removed.  A legacy parser for the Hadoop 0.18 job 
history format still requires this table.  The code should probably be updated 
to parse the new Hadoop 0.20 format.


2. The data written to HBase is defined by the demux parsers.  The parsers are 
located in 
src/java/org/apache/hadoop/chukwa/extraction/demux/processor/mapper/*.java and 
src/java/org/apache/hadoop/chukwa/extraction/demux/processor/reducer/*.java.  
These are the same parsers that the demux mapreduce job executes, which keeps 
backward compatibility with Chukwa 0.4.  It is definitely possible to write 
your own parser to extract features from /var/log/messages and provide 
visualization through HICC.  As long as you have written a demux parser and 
defined the HBase schema, the data should show up in HICC.
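
As a rough illustration of the kind of feature extraction such a parser would 
perform on /var/log/messages, here is a minimal sketch in Python.  It is not 
Chukwa code: the real demux parsers are Java classes in the directories above, 
and the regular expression, the function name parse_syslog_line, and the field 
names are all assumptions made for this example.

```python
import re

# Assumed layout (classic BSD syslog): "Jun  3 10:33:10 host program[pid]: message"
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)


def parse_syslog_line(line):
    """Return a dict of extracted fields, or None if the line does not match."""
    match = SYSLOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    # Leave out optional fields that were absent (for example, no PID).
    return {key: value for key, value in match.groupdict().items()
            if value is not None}


if __name__ == "__main__":
    sample = "Jun  3 10:33:10 demo sshd[1234]: Accepted password for dkn"
    print(parse_syslog_line(sample))
```

Each key/value pair in the returned dict corresponds loosely to a column you 
would declare in the HBase schema; how the pairs map to column families is up 
to your own parser and schema.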

3. HBaseWriter acts as a mini-demux process and outputs demuxed key/value 
pairs.  If the data structure is predetermined and the analytics only require 
semi-structured data, HBaseWriter is good enough for near-real-time 
monitoring.  The use case for SeqFileWriter is writing the raw, unstructured 
data into HDFS.  Hence, if you need to preserve data whose structure is not 
yet known, use SeqFileWriter.  It creates archive files on HDFS, which can be 
processed by the demux (ETL) process as a secondary pipeline.

4. The Chukwa agent keeps track of the file offset and writes a checkpoint 
file.  If the agent is restarted, it resumes operation from the last 
checkpoint.  I am not sure if this was what you saw.
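
To make the offset-and-checkpoint idea concrete, here is a minimal sketch in 
Python.  It only illustrates the general technique; it is not the agent's 
actual code or checkpoint format, and the function name read_new_lines and 
the one-number checkpoint file are assumptions made for this example.

```python
import os


def read_new_lines(log_path, checkpoint_path):
    """Read lines appended since the last checkpoint, then save the new offset."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as checkpoint:
            offset = int(checkpoint.read().strip() or 0)

    # If the log is now shorter than the saved offset, assume it was rotated
    # or truncated and start again from the beginning.
    if offset > os.path.getsize(log_path):
        offset = 0

    with open(log_path, "rb") as log:
        log.seek(offset)
        data = log.read()
        offset = log.tell()

    # Write the checkpoint to a temporary file and rename it into place, so a
    # crash in the middle of the write cannot leave a half-written checkpoint.
    tmp_path = checkpoint_path + ".tmp"
    with open(tmp_path, "w") as checkpoint:
        checkpoint.write(str(offset))
    os.replace(tmp_path, checkpoint_path)

    # Simplification: a partially written last line is returned as-is.
    return data.decode("utf-8", errors="replace").splitlines()
```

Calling read_new_lines again after a restart returns only what was appended 
since the previous call, because the saved offset survives the restart.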

Regards,
Eric

On 6/3/11 12:46 AM, "DKN" <[email protected]> wrote:

A few corrections to the above post.

1. Data is written (a lot of it, actually!) to the Hadoop table in HBase.
However, the HadoopLog table is empty.

2. and 3. I was reading this archive today:
http://www.mail-archive.com/[email protected]/msg00078.html
and got some more insights. However, if Eric can comment on the data storage
strategy between HDFS and HBase tables, it will be useful.

4. I restarted the entire cluster and now I don't see this problem. In
other words, I do see the adaptors configured in the initial_adaptors file
running. I will put this on the back burner for now.

Thanks and regards, DKN

--
View this message in context: 
http://apache-chukwa.679492.n3.nabble.com/A-demo-setup-on-a-single-linux-server-tp3001627p3018864.html
Sent from the Chukwa - Users mailing list archive at Nabble.com.
