Also, I am unsure whether Spark on HBase leverages data locality. When you cache and process the data, do you see NODE_LOCAL tasks in the task list? Spark on HDFS leverages locality quite well, and in my experience that can boost performance by 3-4x. If you are loading all of your data from HBase into Spark, you are better off using HDFS.
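
A rough way to answer the locality question is to tally the locality level of each finished task. The sketch below is only an illustration: the `tasks` records are made-up sample data, shaped on the assumption that each task carries a `taskLocality` string such as NODE_LOCAL or ANY (the level names Spark shows per task), and `locality_share` is a hypothetical helper, not part of Spark.

```python
from collections import Counter

# Made-up sample data: one record per finished task. The "taskLocality"
# key and the record shape are assumptions for this sketch.
tasks = [
    {"taskId": 0, "taskLocality": "NODE_LOCAL"},
    {"taskId": 1, "taskLocality": "NODE_LOCAL"},
    {"taskId": 2, "taskLocality": "PROCESS_LOCAL"},
    {"taskId": 3, "taskLocality": "ANY"},
]


def locality_share(task_records):
    """Return the fraction of tasks that ran at each locality level."""
    counts = Counter(t["taskLocality"] for t in task_records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: n / total for level, n in counts.items()}


if __name__ == "__main__":
    # Print one line per locality level, sorted by level name.
    for level, share in sorted(locality_share(tasks).items()):
        print(f"{level}: {share:.0%}")
```

If most tasks come back as ANY when reading from HBase but as NODE_LOCAL or PROCESS_LOCAL when reading from HDFS, that would be consistent with the difference described above.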

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Thu, May 22, 2014 at 1:09 PM, Nick Pentreath <nick.pentre...@gmail.com> wrote:

> Hi
>
> In my opinion, running HBase for immutable data is generally overkill in
> particular if you are using Shark anyway to cache and analyse the data and
> provide the speed.
>
> HBase is designed for random-access data patterns and high throughput R/W
> activities. If you are only ever writing immutable logs, then that is what
> HDFS is designed for.
>
> Having said that, if you replace HBase you will need to come up with a
> reliable way to put data into HDFS (a log aggregator like Flume or message
> bus like Kafka perhaps, etc), so the pain of doing that may not be worth it
> given you already know HBase.
>
> On Thu, May 22, 2014 at 9:33 AM, Limbeck, Philip <philip.limb...@automic.com> wrote:
>
>> HI!
>>
>> We are currently using HBase as our primary data store of different
>> event-like data. On-top of that, we use Shark to aggregate this data and
>> keep it in memory for fast data access. Since we use no specific HBase
>> functionality whatsoever except Putting data into it, a discussion
>> came up on having to set up an additional set of components on top of
>> HDFS instead of just writing to HDFS directly.
>>
>> Is there any overview regarding implications of doing that ? I mean
>> except things like taking care of file structure and the like. What is the
>> true advantage of Spark on HBase in favor of Spark on HDFS?
>>
>> Best
>> Philip
>>
>> Automic Software GmbH, Hauptstrasse 3C, 3012 Wolfsgraben
>> Firmenbuchnummer/Commercial Register No. 275184h
>> Firmenbuchgericht/Commercial Register Court: Landesgericht St. Poelten