Comparison chart:

--------------------------------------------------------------------------------
| Feature                | Chukwa classic           | Chukwa on HBase          |
--------------------------------------------------------------------------------
| Installation cost      | Hadoop + Chukwa          | Hadoop + HBase + Chukwa  |
--------------------------------------------------------------------------------
| Data latency           | Fixed, n minutes         | 50-100 ms                |
--------------------------------------------------------------------------------
| File management cost   | Hourly/daily roll-up     | HBase periodically       |
|                        | MapReduce job            | spills data to disk      |
--------------------------------------------------------------------------------
| Record size            | Small; needs to fit      | Data node block          |
|                        | in a Java HashMap        | size (64 MB)             |
--------------------------------------------------------------------------------
| GUI-friendly view      | Data needs to be         | Drill down to raw or     |
|                        | aggregated first         | aggregated data          |
--------------------------------------------------------------------------------
| Demux                  | Single reducer, or       | Writes to HBase in       |
|                        | creates multiple         | parallel                 |
|                        | part-nnn files,          |                          |
|                        | unsorted between files   |                          |
--------------------------------------------------------------------------------
| Demux output           | Sequence file            | HBase table              |
--------------------------------------------------------------------------------
| Data analytics tools   | MapReduce/Pig            | MR/Pig/Hive/Cascading    |
--------------------------------------------------------------------------------
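To make the "Demux output" row concrete, here is a minimal Java sketch of how a consumer would read each form. The sequence-file half uses the ChukwaRecordKey/ChukwaRecord pair that classic demux emits under the repos directory in HDFS; the "SystemMetrics" table name is an assumption for illustration, and exact HBase client calls vary by version (this follows the 0.20-era API).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecord;
import org.apache.hadoop.chukwa.extraction.engine.ChukwaRecordKey;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class DemuxOutputSketch {

  // Chukwa classic: demux emits SequenceFiles of
  // <ChukwaRecordKey, ChukwaRecord> pairs; readers glob part-nnn files.
  static void readSequenceFile(Configuration conf, Path part) throws Exception {
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
    ChukwaRecordKey key = new ChukwaRecordKey();
    ChukwaRecord record = new ChukwaRecord();
    while (reader.next(key, record)) {
      System.out.println(key.getKey() + " -> " + record);
    }
    reader.close();
  }

  // Chukwa on HBase: the collector writes parsed records straight into an
  // HBase table, so readers scan the table instead of reading files.
  // "SystemMetrics" is an assumed table name for illustration.
  static void scanHBase() throws Exception {
    HBaseConfiguration hconf = new HBaseConfiguration();
    HTable table = new HTable(hconf, "SystemMetrics");
    ResultScanner scanner = table.getScanner(new Scan());
    for (Result row : scanner) {
      System.out.println(row);
    }
    scanner.close();
  }
}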
Regards,
Eric

On 11/22/10 3:05 PM, "Ahmed Fathalla" <[email protected]> wrote:

> I think what we need to do is create some kind of comparison table
> contrasting the merits of each approach (HBase vs. normal demux processing).
> This exercise will be useful both for making the decision about the default
> and for documentation purposes, to illustrate the difference for new users.
>
>
> On Mon, Nov 22, 2010 at 11:19 PM, Bill Graham <[email protected]> wrote:
>
>> We are going to continue to have use cases where we want log data
>> rolled up into 5-minute, hourly, and daily increments in HDFS to run
>> MapReduce jobs on them. How will this model work with the HBase
>> approach? What process will aggregate the HBase data into time
>> increments the way the current demux and hourly/daily rolling processes
>> do? Basically, what does the time partitioning look like in the HBase
>> storage scheme?
>>
>>> My concern is that the demux process is going to become two parallel
>>> tracks: one works in MapReduce, and another works in the collector. It
>>> becomes difficult to have clean, efficient parsers that work in both.
>>
>> This statement makes me concerned that you're implying the need to
>> deprecate the current demux model, which is very different from making
>> one or the other the default in the configs. Is that the case?
>>
>>
>>
>> On Mon, Nov 22, 2010 at 11:41 AM, Eric Yang <[email protected]> wrote:
>>> MySQL support has been removed from Chukwa 0.5. My concern is that the
>>> demux process is going to become two parallel tracks: one works in
>>> MapReduce, and another works in the collector. It becomes difficult to
>>> have clean, efficient parsers that work in both places. From an
>>> architecture perspective, incremental updates to data are better than
>>> batch processing for near-real-time monitoring. I'd like to ensure the
>>> Chukwa framework can deliver on Chukwa's mission statement, hence I
>>> stand by HBase as the default. I was playing with the HBase 0.20.6 +
>>> Pig 0.8 branch last weekend, and I was very impressed by both the speed
>>> and the performance of this combination. I encourage people to try it
>>> out.
>>>
>>> Regards,
>>> Eric
>>>
>>> On 11/22/10 10:50 AM, "Ariel Rabkin" <[email protected]> wrote:
>>>
>>> I agree with Bill and Deshpande that we ought to make clear to users
>>> that they don't need HICC, and therefore don't need either MySQL or
>>> HBase.
>>>
>>> But I think what Eric meant to ask was which of MySQL and HBase ought
>>> to be the default *for HICC*. My sense is that the HBase support
>>> isn't quite mature enough, but it's getting there.
>>>
>>> I think HBase is ultimately the way to go. I think we might benefit as
>>> a community by doing a 0.5 release first, while waiting for the
>>> Pig-based aggregation support that's blocking HBase.
>>>
>>> --Ari
>>>
>>> On Mon, Nov 22, 2010 at 10:47 AM, Deshpande, Deepak
>>> <[email protected]> wrote:
>>>> I agree. Making HBase the default would make some Chukwa users' lives
>>>> difficult. In my setup, I don't need HDFS. I am using Chukwa merely as
>>>> a log streaming framework. I have plugged in my own writer to write
>>>> log files to the local filesystem (instead of HDFS). I evaluated
>>>> Chukwa against other frameworks, and Chukwa had better fault tolerance
>>>> built in than the others. This made me recommend Chukwa over other
>>>> frameworks.
>>>>
>>>> Making HBase the default option would definitely make my life
>>>> difficult :).
>>>>
>>>> Thanks,
>>>> Deepak Deshpande
>>>>
>>>
>>>
>>> --
>>> Ari Rabkin [email protected]
>>> UC Berkeley Computer Science Department
>>>
>>>
>>
>
>
>
> --
> Ahmed Fathalla
>
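For readers who want to follow Deepak's approach, below is a rough sketch of a pluggable collector writer that dumps chunks to the local filesystem instead of HDFS. It assumes the ChukwaWriter interface roughly as it stood around 0.4/0.5 (init/add/close, with add returning a CommitStatus); the output path is a made-up example, and the exact signatures should be checked against your Chukwa version.

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.chukwa.Chunk;
import org.apache.hadoop.chukwa.datacollection.writer.ChukwaWriter;
import org.apache.hadoop.chukwa.datacollection.writer.WriterException;

public class LocalFSWriter implements ChukwaWriter {
  private FileOutputStream out;

  public void init(Configuration conf) throws WriterException {
    try {
      // Destination path is a made-up example; read it from conf in practice.
      out = new FileOutputStream("/var/log/chukwa/stream.log", true);
    } catch (IOException e) {
      throw new WriterException(e);
    }
  }

  public CommitStatus add(List<Chunk> chunks) throws WriterException {
    try {
      for (Chunk chunk : chunks) {
        out.write(chunk.getData());  // raw bytes of the collected log chunk
      }
      out.flush();
    } catch (IOException e) {
      throw new WriterException(e);
    }
    return COMMIT_OK;
  }

  public void close() throws WriterException {
    try {
      out.close();
    } catch (IOException e) {
      throw new WriterException(e);
    }
  }
}

If memory serves, the stock collector picks up such a writer through the chukwaCollector.writerClass property in chukwa-collector-conf.xml; verify the property name against your version's conf template before relying on it.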
