Spark, in general, is good at iterating over an entire dataset again and again: all of its operations are expressed as iteration through the records of at least one partition. You may want to look at IndexedRDD (https://issues.apache.org/jira/browse/SPARK-2365), which aims to improve point queries. In general, though, Spark is unlikely to outperform key/value stores, because it schedules a job for every operation.
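To make the cost difference concrete, here is a rough sketch in plain Python (no Spark, made-up data) of the two access patterns being compared: a KV store answers a point query with a hash lookup, while an RDD-style point query has to scan at least one whole partition, and Spark additionally pays job-scheduling overhead on top of that scan.

```python
# Toy dataset standing in for one RDD partition: 1,000,000 key/value pairs.
records = [(i, i * 2) for i in range(1_000_000)]

# KV-store-style point query: a hash lookup, roughly O(1) per query.
kv = dict(records)
print(kv[123_456])  # direct lookup, no scan

# RDD-style point query: iterate over every record in the "partition",
# analogous to filtering an RDD on the key and collecting the match.
def point_query(data, key):
    return [v for k, v in data if k == key]

print(point_query(records, 123_456))
# In real Spark, each such query also launches a job (scheduling,
# task serialization, etc.), adding fixed latency per lookup that a
# KV store like Redis or Hazelcast never pays.
```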
On Wed, Oct 22, 2014 at 7:51 AM, Hajime Takase <placeofnomemor...@gmail.com> wrote:
> Hi,
> Is it possible to use Spark as a clustered key/value store (say, like
> redis-cluster or Hazelcast)? Will it outperform them in read/write or
> other operations?
> My main urge is to use the same RDD from several different SparkContexts
> without saving to disk or using spark-jobserver, but I'm curious whether
> someone has already tried using Spark as a key/value store.
>
> Thanks,
>
> Hajime