Also, I am unsure whether Spark on HBase leverages locality. When you cache
and process data, do you see node_local jobs in the process list?
Spark on HDFS leverages locality quite well, which can boost performance
by 3-4x in my experience.
If you are loading all your data from HBase into Spark, then you are
Hi!
We are currently using HBase as our primary data store for various event-like
data. On top of that, we use Shark to aggregate this data and keep it
in memory for fast data access. Since we use no HBase-specific functionality
whatsoever except putting data into it, a discussion
came up on
Hi
In my opinion, running HBase for immutable data is generally overkill,
particularly if you are already using Shark to cache and analyse the data and
provide the speed.
HBase is designed for random-access data patterns and high-throughput read/write
activity. If you are only ever writing immutable