On Fri, Nov 7, 2014 at 8:04 PM, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:

> Sounds good. I'm looking forward to tracking improvements in this area.
>
> Also, just to connect some more dots here, I just remembered that there is
> currently an initiative to add an IndexedRDD
> <https://issues.apache.org/jira/browse/SPARK-2365> interface. Some
> interesting use cases mentioned there include (emphasis added):
>
> > To address these problems, we propose IndexedRDD, an efficient key-value
> > store built on RDDs. IndexedRDD would extend RDD[(Long, V)] by enforcing
> > key uniqueness and pre-indexing the entries for efficient joins and
> > *point lookups, updates, and deletions*.
>
>
> > GraphX would be the first user of IndexedRDD, since it currently implements
> > a limited form of this functionality in VertexRDD. We envision a variety of
> > other uses for IndexedRDD, including *streaming updates* to RDDs, *direct
> > serving* from RDDs, and as an execution strategy for Spark SQL.
>
>
> Maybe some day we'll have Spark clusters directly serving up point lookups
> or updates. I imagine the tasks running on clusters like that would be tiny
> and would benefit from very low task startup times and scheduling latency.
> Am I painting that picture correctly?
>
Yeah, we painted a similar picture in a short paper last year titled "The
Case for Tiny Tasks in Compute Clusters":
http://shivaram.org/publications/tinytasks-hotos13.pdf

> Anyway, thanks for explaining the current status of Sparrow.
>
> Nick
>
