Also, I am unsure whether Spark on HBase leverages locality. When you cache &
process data, do you see NODE_LOCAL tasks in the process list?
Spark on HDFS leverages locality quite well & can boost performance
by 3-4x in my experience.
If you are loading all your data from HBase into Spark, then you are better
off using HDFS.
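
On the file structure question: if you write to HDFS directly, most of the work is choosing a directory layout that lets Spark read only the slices it needs. Below is a minimal sketch of one common approach, time-partitioned directories per event type. The function names, the /data/events base directory and the type=/dt=/hour= naming are all assumptions made up for illustration, not something Spark or HDFS prescribes.

```python
from datetime import datetime, timezone
import posixpath


def partition_path(base_dir, event_type, ts):
    """Return the directory an event should land in.

    base_dir, event_type and the dt=/hour= layout are illustrative
    assumptions; pick whatever granularity matches your queries.
    """
    # Bucket by UTC so the same event always maps to the same path.
    ts = ts.astimezone(timezone.utc)
    return posixpath.join(
        base_dir,
        "type=" + event_type,
        "dt=" + ts.strftime("%Y-%m-%d"),
        "hour=" + ts.strftime("%H"),
    )


def group_by_partition(events, base_dir="/data/events"):
    """Group (event_type, timestamp, payload) tuples by target directory."""
    groups = {}
    for event_type, ts, payload in events:
        path = partition_path(base_dir, event_type, ts)
        groups.setdefault(path, []).append(payload)
    return groups
```

With a layout like this, a job that only needs one day of one event type can point Spark at that directory (sc.textFile accepts directories and glob patterns) instead of scanning everything.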

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Thu, May 22, 2014 at 1:09 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> Hi
>
> In my opinion, running HBase for immutable data is generally overkill,
> particularly if you are using Shark anyway to cache and analyse the data and
> provide the speed.
>
> HBase is designed for random-access data patterns and high throughput R/W
> activities. If you are only ever writing immutable logs, then that is what
> HDFS is designed for.
>
> Having said that, if you replace HBase you will need to come up with a
> reliable way to put data into HDFS (perhaps a log aggregator like Flume or a
> message bus like Kafka), so the pain of doing that may not be worth it
> given you already know HBase.
>
>
> On Thu, May 22, 2014 at 9:33 AM, Limbeck, Philip <
> philip.limb...@automic.com> wrote:
>
>> Hi!
>>
>> We are currently using HBase as our primary data store for different kinds
>> of event-like data. On top of that, we use Shark to aggregate this data and
>> keep it in memory for fast data access. Since we use no specific HBase
>> functionality whatsoever except putting data into it, a discussion
>> came up on having to set up an additional set of components on top of
>> HDFS instead of just writing to HDFS directly.
>>
>> Is there any overview of the implications of doing that? I mean,
>> apart from things like taking care of file structure and the like. What is
>> the true advantage of Spark on HBase over Spark on HDFS?
>>
>>
>>
>> Best
>>
>> Philip
>>
>> Automic Software GmbH, Hauptstrasse 3C, 3012 Wolfsgraben
>> Firmenbuchnummer/Commercial Register No. 275184h
>> Firmenbuchgericht/Commercial Register Court: Landesgericht St. Poelten
>>
>>
>
