Filed JIRA SPARK-3489: https://issues.apache.org/jira/browse/SPARK-3489
On Thu, Sep 4, 2014 at 9:36 AM, Mohit Jaggi mohitja...@gmail.com wrote:
Folks,
I sent an email announcing
https://github.com/AyasdiOpenSource/df
This dataframe is basically a map of RDDs of columns (along with DSL
sugar), since column-based operations seem to be the most common. Row
operations are not uncommon, though. To get rows out of columns, I
currently zip the column RDDs together: I use RDD.zip and then flatten the
nested tuples I get. I realize that RDD.zipPartitions might be faster.
However, I believe an even better approach should be possible. Surely we
can have a zip method that combines a large, variable number of RDDs? Can
that be added to Spark core? Or is there an alternative approach that is
equally good or better?
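
To make the question concrete, here is a minimal sketch in plain Python
rather than Spark. Lists of lists stand in for partitioned column RDDs, and
the names zip_pairwise and zip_all are hypothetical, not Spark APIs. It
contrasts chaining two-way zips and then flattening the nested tuples
(roughly the RDD.zip approach described above) with a single variadic zip
applied partition by partition.

```python
from functools import reduce

# Assumption: each column is modeled as a list of partitions, and each
# partition is a list of values. This is a stand-in for a column RDD.
columns = {
    "id": [[1, 2], [3]],
    "name": [["a", "b"], ["c"]],
    "score": [[0.5, 0.7], [0.9]],
}
cols = list(columns.values())


def zip_pairwise(cols):
    """Chain two-way zips, then flatten the left-nested tuples into rows."""
    def zip_two(left, right):
        # Zip matching partitions, then matching elements within them.
        return [list(zip(lp, rp)) for lp, rp in zip(left, right)]

    def flatten(row, n):
        # After n - 1 chained zips a row looks like (((c0, c1), c2), ...).
        out = []
        for _ in range(n - 1):
            row, last = row
            out.append(last)
        out.append(row)
        return tuple(reversed(out))

    nested = reduce(zip_two, cols)
    return [[flatten(row, len(cols)) for row in part] for part in nested]


def zip_all(cols):
    """Variadic zip: combine partition i of every column in one pass."""
    return [list(zip(*parts)) for parts in zip(*cols)]


rows = zip_all(cols)
```

Both functions produce the same rows; the variadic version simply avoids
building and unpacking the intermediate nested tuples. Note that Python's
zip silently truncates to the shortest input, whereas Spark's RDD.zip
expects the same number of partitions and the same number of elements in
each partition.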
Cheers,
Mohit.