Hi

In my opinion, running HBase for immutable data is generally overkill,
particularly if you are already using Shark to cache and analyse the data
and provide fast access.

HBase is designed for random-access patterns and high-throughput read/write
workloads. If you are only ever writing immutable logs, that is exactly what
HDFS is designed for.

That said, if you replace HBase you will need a reliable way to get data
into HDFS (a log aggregator like Flume, a message bus like Kafka, etc.), so
the pain of setting that up may not be worth it given that you already know
HBase.
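To make the "writing to HDFS directly" option concrete, here is a minimal sketch of the append-only, date-partitioned layout you would typically use for immutable event logs. It uses the local filesystem to stand in for HDFS (on a real cluster you would go through an HDFS client instead), and the path scheme and helper names are illustrative, not anything from the thread:

```python
# Sketch: append-only, date-partitioned event-log layout, HDFS-style.
# Local filesystem stands in for HDFS; paths and names are illustrative.
import json
import os
from datetime import datetime, timezone

def partition_path(base, event_time):
    # Date-based partitions (e.g. base/2014/05/22/) keep writes
    # append-only and make time-range scans cheap for Shark/Spark.
    return os.path.join(base, event_time.strftime("%Y/%m/%d"))

def append_events(base, events):
    now = datetime.now(timezone.utc)
    part = partition_path(base, now)
    os.makedirs(part, exist_ok=True)
    # One new file per batch; existing files are never rewritten.
    path = os.path.join(part, "events-%s.jsonl" % now.strftime("%H%M%S%f"))
    with open(path, "w") as f:
        for e in events:
            f.write(json.dumps(e) + "\n")
    return path

written = append_events("/tmp/event-logs", [{"type": "login", "user": "a"}])
```

The design point is that "taking care of file structure" mostly amounts to picking a partition scheme and only ever adding files, which is what an aggregator like Flume does for you.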


On Thu, May 22, 2014 at 9:33 AM, Limbeck, Philip <philip.limb...@automic.com
> wrote:

>  Hi!
>
>
>
> We are currently using HBase as our primary data store for different
> event-like data. On top of that, we use Shark to aggregate this data and
> keep it in memory for fast access. Since we use no specific HBase
> functionality whatsoever except putting data into it, a discussion came
> up about whether we need this additional set of components on top of
> HDFS at all, instead of just writing to HDFS directly.
>
>  Is there any overview of the implications of doing that? I mean, apart
> from things like taking care of the file structure. What is the true
> advantage of Spark on HBase over Spark on HDFS?
>
>
>
> Best
>
> Philip
>
> Automic Software GmbH, Hauptstrasse 3C, 3012 Wolfsgraben
> Firmenbuchnummer/Commercial Register No. 275184h
> Firmenbuchgericht/Commercial Register Court: Landesgericht St. Poelten
>
