Re: Spark on HBase vs. Spark on HDFS

2014-05-23 Thread Mayur Rustagi
Also, I am unsure whether Spark on HBase leverages data locality. When you cache and process data, do you see NODE_LOCAL tasks in the process list? Spark on HDFS leverages locality quite well, which can boost performance by 3-4x in my experience. If you are loading all your data from HBase to Spark then you are

Spark on HBase vs. Spark on HDFS

2014-05-22 Thread Limbeck, Philip
Hi! We are currently using HBase as our primary data store for different event-like data. On top of that, we use Shark to aggregate this data and keep it in memory for fast data access. Since we use no specific HBase functionality whatsoever except putting data into it, a discussion came up on

Re: Spark on HBase vs. Spark on HDFS

2014-05-22 Thread Nick Pentreath
Hi, in my opinion, running HBase for immutable data is generally overkill, in particular if you are using Shark anyway to cache and analyse the data and provide the speed. HBase is designed for random-access data patterns and high-throughput R/W activity. If you are only ever writing immutable