Re: Spark as key/value store?

2014-10-22 Thread Akshat Aranya
Spark, in general, is good for iterating through an entire dataset again
and again.  All operations are expressed in terms of iteration through all
the records of at least one partition.  You may want to look at IndexedRDD (
https://issues.apache.org/jira/browse/SPARK-2365) that aims to improve
point queries.  In general, though, Spark is unlikely to outperform
dedicated KV stores, because it schedules a job for every operation.
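[Editorial sketch, in plain Python rather than the actual Spark or IndexedRDD API, of the difference being described: an unindexed point query must iterate the records of the owning partition, while an IndexedRDD-style layout hash-partitions keys and keeps a per-partition hash index, so a lookup goes straight to one entry. All names here are illustrative.]

```python
# Illustrative only -- not Spark code. Models hash-partitioned (key, value)
# records and contrasts a scan-based point query with an indexed one.

NUM_PARTITIONS = 4

def partition_for(key):
    # Route a key to a partition by hash, as a hash partitioner would.
    return hash(key) % NUM_PARTITIONS

def make_partitions(pairs):
    # Distribute (key, value) pairs across partitions.
    parts = [[] for _ in range(NUM_PARTITIONS)]
    for k, v in pairs:
        parts[partition_for(k)].append((k, v))
    return parts

def scan_lookup(partitions, key):
    # Unindexed point query: iterate every record of the owning partition.
    for k, v in partitions[partition_for(key)]:
        if k == key:
            return v
    return None

def make_indexed(partitions):
    # IndexedRDD-style idea: each partition keeps a hash index over its keys.
    return [dict(part) for part in partitions]

def indexed_lookup(indexed, key):
    # Indexed point query: constant-time hash lookup instead of a scan.
    return indexed[partition_for(key)].get(key)

pairs = [("alpha", 1), ("beta", 2), ("gamma", 3)]
parts = make_partitions(pairs)
indexed = make_indexed(parts)
assert scan_lookup(parts, "beta") == indexed_lookup(indexed, "beta") == 2
```

Either way, note that in real Spark even the indexed lookup still pays the cost of scheduling a job, which is the overhead mentioned above.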

On Wed, Oct 22, 2014 at 7:51 AM, Hajime Takase placeofnomemor...@gmail.com
wrote:

 Hi,
 Is it possible to use Spark as a clustered key/value store (say, like
 redis-cluster or Hazelcast)? Will it perform well for writes/reads and
 other operations?
 My main motivation is to use the same RDD from several different
 SparkContexts without saving to disk or using spark-jobserver, but I'm
 curious whether someone has already tried using Spark as a key/value store.

 Thanks,

 Hajime





Re: Spark as key/value store?

2014-10-22 Thread Hajime Takase
Thanks!

On Thu, Oct 23, 2014 at 10:56 AM, Akshat Aranya aara...@gmail.com wrote:

 Yes, that is a downside of Spark's design in general. The only way to
 share data across consumers of data is by having a separate entity that
 owns the Spark context. That's the idea behind Ooyala's job server. The
 driver is still a single point of failure; if you lose the driver process,
 you lose all information about the RDDs.
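[Editorial sketch of the single-owner pattern described above. This is an assumed shape, not Ooyala's spark-jobserver: one process owns the shared state (standing in for a SparkContext and its cached RDDs) and serves point queries over HTTP, so any number of consumers share the one context. If the owner process dies, the cached data is gone, which is exactly the single point of failure noted above. All names are illustrative.]

```python
# Illustrative only: a single "owner" process serving shared lookups.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Stands in for an RDD cached in the owner's SparkContext.
SHARED_RDD = {"a": 1, "b": 2}

class LookupHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Treat the request path as the key, e.g. GET /a -> key "a".
        key = self.path.lstrip("/")
        body = json.dumps({"key": key, "value": SHARED_RDD.get(key)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging for the sketch.
        pass

# Bind an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), LookupHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# Any number of consumer processes could issue queries like this one:
resp = json.loads(urlopen(f"http://127.0.0.1:{port}/a").read())
assert resp["value"] == 1
server.shutdown()
```

The real job server does far more (job submission, named RDDs, context management), but the essential design is the same: consumers never hold the context themselves; they talk to the one process that does.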
 On Oct 22, 2014 6:33 PM, Hajime Takase placeofnomemor...@gmail.com
 wrote:

 Interesting. I see the IndexedRDD interface, which looks like a
 key/value store tied to a particular SparkContext:
 https://github.com/apache/spark/pull/1297
 But one SparkContext won't let its IndexedRDD be used by another
 (I want to use multiple drivers in my system)?


