Re: efficient zipping of lots of RDDs

Mohit Jaggi Thu, 11 Sep 2014 09:21:51 -0700

Filed JIRA SPARK-3489 <https://issues.apache.org/jira/browse/SPARK-3489>.


On Thu, Sep 4, 2014 at 9:36 AM, Mohit Jaggi <mohitja...@gmail.com> wrote:

> Folks,
> I sent an email announcing
> https://github.com/AyasdiOpenSource/df
>
> This dataframe is basically a map of column RDDs (along with DSL
> sugar), since column-based operations seem to be the most common. But row
> operations are not uncommon. To get rows out of columns right now, I zip the
> column RDDs together: I use RDD.zip and then flatten the tuples I get. I
> realize that RDD.zipPartitions might be faster. However, I believe an even
> better approach should be possible. Surely we can have a zip method that
> can combine a large, variable number of RDDs? Can that be added to
> Spark-core? Or is there an equally good or better alternative?
>
> Cheers,
> Mohit.
>
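Chaining pairwise RDD.zip calls yields nested tuples such as ((a, b), c) that must be flattened afterwards. An n-way zip avoids that: it walks the same partition of every column in lockstep and emits flat rows in one pass. The sketch below is a minimal, Spark-free illustration of that per-partition logic. It assumes each column is modeled as a plain Python list of partitions (each partition a list of values). `zip_columns` is a hypothetical helper, not a Spark API. Following RDD.zip's requirements, it insists on the same number of partitions and the same element count per partition.

```python
from itertools import zip_longest
from typing import Any, List, Sequence, Tuple

# Assumption: a "column" is modeled as a list of partitions, and each
# partition is a list of values. This only mimics RDD partitioning; no Spark.
Partition = List[Any]
Column = List[Partition]

_MISSING = object()  # sentinel used to detect uneven partition lengths


def zip_columns(columns: Sequence[Column]) -> List[List[Tuple[Any, ...]]]:
    """Hypothetical n-way zip: combine N columns into flat row tuples.

    Raises ValueError if the columns differ in partition count, or if any
    partition differs in element count across columns.
    """
    if not columns:
        return []
    num_parts = len(columns[0])
    if any(len(col) != num_parts for col in columns):
        raise ValueError("columns must have the same number of partitions")

    result = []
    for p in range(num_parts):
        rows = []
        # Advance the p-th partition of every column together, one row at a time.
        for row in zip_longest(*(col[p] for col in columns), fillvalue=_MISSING):
            if any(v is _MISSING for v in row):
                raise ValueError(f"partition {p} has uneven lengths across columns")
            rows.append(row)
        result.append(rows)
    return result


if __name__ == "__main__":
    cols = [
        [[1, 2], [3]],          # e.g. an "id" column with 2 partitions
        [["a", "b"], ["c"]],    # e.g. a "name" column
        [[0.1, 0.2], [0.3]],    # e.g. a "score" column
    ]
    print(zip_columns(cols))
    # [[(1, 'a', 0.1), (2, 'b', 0.2)], [(3, 'c', 0.3)]]
```

Within Spark, RDD.zipPartitions only has overloads for up to three additional RDDs. Note also that a plain iterator zip silently truncates on uneven lengths, which is why the sketch checks lengths explicitly.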
