Comparison chart:

--------------------------------------------------------------------------------
| Feature                | Chukwa classic           | Chukwa on HBase          |
--------------------------------------------------------------------------------
| Installation cost      | Hadoop + Chukwa          | Hadoop + HBase + Chukwa  |
--------------------------------------------------------------------------------
| Data latency           | Fixed, n minutes         | 50-100 ms                |
--------------------------------------------------------------------------------
| File management cost   | Hourly/daily roll-up     | HBase periodically       |
|                        | MapReduce job            | spills data to disk      |
--------------------------------------------------------------------------------
| Record size            | Small; needs to fit      | Data node block          |
|                        | in a Java HashMap        | size (64 MB)             |
--------------------------------------------------------------------------------
| GUI-friendly view      | Data needs to be         | Drill down to raw or     |
|                        | aggregated first         | aggregated data          |
--------------------------------------------------------------------------------
| Demux                  | Single reducer, or       | Writes to HBase in       |
|                        | creates multiple         | parallel                 |
|                        | part-nnn files,          |                          |
|                        | unsorted between files   |                          |
--------------------------------------------------------------------------------
| Demux output           | Sequence file            | HBase table              |
--------------------------------------------------------------------------------
| Data analytics tools   | MapReduce/Pig            | MR/Pig/Hive/Cascading    |
--------------------------------------------------------------------------------
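To make the "Demux output" row concrete, here is a minimal Java sketch of how a consumer would read each form. The sequence-file half uses the ChukwaRecordKey/ChukwaRecord pair that classic demux emits under the repos directory in HDFS; the "SystemMetrics" table name is an assumption for illustration, and exact HBase client calls vary by version (this follows the 0.20-era API).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class DemuxOutputSketch {

  // Chukwa classic: demux emits SequenceFiles of
  // <ChukwaRecordKey, ChukwaRecord> pairs; readers glob part-nnn files.
  static void readSequenceFile(Configuration conf, Path part) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
    ChukwaRecordKey key = new ChukwaRecordKey();
    ChukwaRecord record = new ChukwaRecord();
    while (reader.next(key, record)) {
      System.out.println(key.getKey() + " -> " + record);
    }
    reader.close();
  }

  // Chukwa on HBase: the collector writes parsed records straight into an
  // HBase table, so readers scan the table instead of reading files.
  // "SystemMetrics" is an assumed table name for illustration.
  static void scanHBase() throws Exception {
    HBaseConfiguration hconf = new HBaseConfiguration();
    HTable table = new HTable(hconf, "SystemMetrics");
    ResultScanner scanner = table.getScanner(new Scan());
    for (Result row : scanner) {
      System.out.println(row);
    }
    scanner.close();
  }
}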
Regards,
Eric

On 11/22/10 3:05 PM, "Ahmed Fathalla" <[email protected]> wrote:

> I think what we need to do is create some kind of comparison table
> contrasting the merits of each approach (HBase vs. normal demux processing).
> This exercise will be useful both for making the decision about the default
> and for documentation purposes, to illustrate the difference for new users.
>
>
> On Mon, Nov 22, 2010 at 11:19 PM, Bill Graham <[email protected]> wrote:
>
>> We are going to continue to have use cases where we want log data
>> rolled up into 5-minute, hourly, and daily increments in HDFS to run
>> MapReduce jobs on them. How will this model work with the HBase
>> approach? What process will aggregate the HBase data into time
>> increments the way the current demux and hourly/daily rolling processes
>> do? Basically, what does the time partitioning look like in the HBase
>> storage scheme?
>>
>>> My concern is that the demux process is going to become two parallel
>>> tracks: one works in MapReduce, and another works in the collector. It
>>> becomes difficult to have clean, efficient parsers that work in both.
>>
>> This statement makes me concerned that you're implying the need to
>> deprecate the current demux model, which is very different from making
>> one or the other the default in the configs. Is that the case?
>>
>>
>>
>> On Mon, Nov 22, 2010 at 11:41 AM, Eric Yang <[email protected]> wrote:
>>> MySQL support has been removed from Chukwa 0.5. My concern is that the
>>> demux process is going to become two parallel tracks: one works in
>>> MapReduce, and another works in the collector. It becomes difficult to
>>> have clean, efficient parsers that work in both places. From an
>>> architecture perspective, incremental updates to data are better than
>>> batch processing for near-real-time monitoring. I'd like to ensure the
>>> Chukwa framework can deliver on Chukwa's mission statement, hence I
>>> stand by HBase as the default. I was playing with the HBase 0.20.6 +
>>> Pig 0.8 branch last weekend, and I was very impressed by both the speed
>>> and the performance of this combination. I encourage people to try it
>>> out.
>>>
>>> Regards,
>>> Eric
>>>
>>> On 11/22/10 10:50 AM, "Ariel Rabkin" <[email protected]> wrote:
>>>
>>> I agree with Bill and Deshpande that we ought to make clear to users
>>> that they don't need HICC, and therefore don't need either MySQL or
>>> HBase.
>>>
>>> But I think what Eric meant to ask was which of MySQL and HBase ought
>>> to be the default *for HICC*. My sense is that the HBase support
>>> isn't quite mature enough, but it's getting there.
>>>
>>> I think HBase is ultimately the way to go. I think we might benefit as
>>> a community by doing a 0.5 release first, while waiting for the
>>> Pig-based aggregation support that's blocking HBase.
>>>
>>> --Ari
>>>
>>> On Mon, Nov 22, 2010 at 10:47 AM, Deshpande, Deepak
>>> <[email protected]> wrote:
>>>> I agree. Making HBase the default would make some Chukwa users' lives
>>>> difficult. In my setup, I don't need HDFS. I am using Chukwa merely as
>>>> a log streaming framework. I have plugged in my own writer to write
>>>> log files to the local filesystem (instead of HDFS). I evaluated
>>>> Chukwa against other frameworks, and Chukwa had better fault tolerance
>>>> built in than the others. This made me recommend Chukwa over other
>>>> frameworks.
>>>>
>>>> Making HBase the default option would definitely make my life
>>>> difficult :).
>>>>
>>>> Thanks,
>>>> Deepak Deshpande
>>>>
>>>
>>>
>>> --
>>> Ari Rabkin [email protected]
>>> UC Berkeley Computer Science Department
>>>
>>>
>>
>
>
>
> --
> Ahmed Fathalla
>
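For readers who want to follow Deepak's approach, below is a rough sketch of a pluggable collector writer that dumps chunks to the local filesystem instead of HDFS. It assumes the ChukwaWriter interface roughly as it stood around 0.4/0.5 (init/add/close, with add returning a CommitStatus); the output path is a made-up example, and the exact signatures should be checked against your Chukwa version.

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.chukwa.Chunk;
import org.apache.hadoop.chukwa.datacollection.writer.ChukwaWriter;
import org.apache.hadoop.chukwa.datacollection.writer.WriterException;

public class LocalFSWriter implements ChukwaWriter {
  private FileOutputStream out;

  public void init(Configuration conf) throws WriterException {
    try {
      // Destination path is a made-up example; read it from conf in practice.
      out = new FileOutputStream("/var/log/chukwa/stream.log", true);
    } catch (IOException e) {
      throw new WriterException(e);
    }
  }

  public CommitStatus add(List<Chunk> chunks) throws WriterException {
    try {
      for (Chunk chunk : chunks) {
        out.write(chunk.getData());  // raw bytes of the collected log chunk
      }
      out.flush();
    } catch (IOException e) {
      throw new WriterException(e);
    }
    return COMMIT_OK;
  }

  public void close() throws WriterException {
    try {
      out.close();
    } catch (IOException e) {
      throw new WriterException(e);
    }
  }
}

If memory serves, the stock collector picks up such a writer through the chukwaCollector.writerClass property in chukwa-collector-conf.xml; verify the property name against your version's conf template before relying on it.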
