Thanks very much for the detailed explanations. I suspected this came down
to architectural support for the notion of an RDD of RDDs, but my
understanding of Spark, and of distributed computing in general, is not
deep enough for me to have worked it out. So this really helps!
I ended up going with List[RDD]. The collection
Similar question was asked before:
http://apache-spark-user-list.1001560.n3.nabble.com/Rdd-of-Rdds-td17025.html
Here is one of the reasons why I think RDD[RDD[T]] is not possible:
- An RDD is only a handle to the actual data partitions. It has a
reference/pointer to the *SparkContext* object (*sc*) and a list ...
If and when the Spark architecture allows workers to launch
Spark jobs (the functions passed to the transformation or action APIs of RDD),
it will be possible to have an RDD of RDDs.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Rdd-of-Rdds-tp17025p23217.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
On Tue, Jun 9, 2015 at 1:47 PM, kiran lonikar loni...@gmail.com wrote:
Similar question was asked before:
http://apache-spark-user-list.1001560.n3.nabble.com/Rdd-of-Rdds-td17025.html
Here is one of the reasons why I think RDD[RDD[T]] is not possible:
Hi,
The problem I am looking at is as follows:
- I read in a log file of multiple users as an RDD
- I'd like to group the above RDD into *multiple RDDs* by userIds (the key)
- my processEachUser() function then takes in the RDD mapped to
each individual user and calls RDD.map
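A minimal sketch of the grouping step in the list above, using plain Java collections as a local stand-in rather than Spark itself. The log format, the class name, and the body of processEachUser() are hypothetical and only for illustration. In Spark, the analogous step would be groupByKey on a pair RDD keyed by user ID, which yields one collection of values per key rather than one RDD per user.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PerUserGrouping {

    // Hypothetical log format: "<userId> <message>".
    // A line without a space is treated as a bare user ID.
    static String userIdOf(String logLine) {
        int space = logLine.indexOf(' ');
        return space < 0 ? logLine : logLine.substring(0, space);
    }

    // Group log lines by user ID. Each value is an ordinary local list,
    // not a nested distributed collection.
    static Map<String, List<String>> groupByUser(List<String> logLines) {
        Map<String, List<String>> byUser = new LinkedHashMap<>();
        for (String line : logLines) {
            byUser.computeIfAbsent(userIdOf(line), k -> new ArrayList<>()).add(line);
        }
        return byUser;
    }

    // Hypothetical stand-in for processEachUser(): counts one user's lines.
    static int processEachUser(List<String> userLines) {
        return userLines.size();
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("alice login", "bob login", "alice logout");
        groupByUser(log).forEach((user, lines) ->
                System.out.println(user + " -> " + processEachUser(lines)));
    }
}
```

The per-user work here runs on a local list, which is the shape the "RDD of Lists" suggestion elsewhere in this thread points toward: the per-user function works on a plain collection instead of calling RDD operations from inside another RDD operation.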
an indexOutOfBounds, so
I am trying to figure out whether the original problem is manifesting itself as a new
one.
Regards
-Ravi
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-merge-a-RDD-of-RDDs-into-one-uber-RDD-tp20986p21012.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
of RDDs, which you can then fold over and merge.
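The fold-and-merge idea can be sketched with plain Java lists as a local stand-in for a driver-side list of RDDs. The class and method names are hypothetical. With real RDDs the merge step would presumably be a union of the RDDs rather than addAll; that reading of the poster's suggestion is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MergeByFolding {

    // Fold over the outer list, merging each part into one accumulated list.
    // With RDDs, the merge step would be a union instead of addAll.
    static <T> List<T> mergeAll(List<List<T>> parts) {
        List<T> merged = new ArrayList<>();
        for (List<T> part : parts) {
            merged.addAll(part);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> parts = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));
        System.out.println(mergeAll(parts));
    }
}
```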
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-merge-a-RDD-of-RDDs-into-one-uber-RDD-tp20986p21007.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
Hello,
I would like to parallelize my work on multiple RDDs I have. I wanted
to know if Spark can support a foreach on an RDD of RDDs. Here's a
Java example:
public static void main(String[] args) {
SparkConf sparkConf = new SparkConf().setAppName(testapp
No, there's no such thing as an RDD of RDDs in Spark.
Here though, why not just operate on an RDD of Lists, or a List of RDDs?
Usually one of these two is the right approach whenever you feel
inclined to operate on an RDD of RDDs.
On Wed, Oct 22, 2014 at 3:58 PM, Tomer Benyamini tomer
On Wed, Oct 22, 2014 at 8:35 PM, Sean Owen so...@cloudera.com wrote:
No, there's no such thing as an RDD of RDDs in Spark.
Here though, why not just operate on an RDD of Lists, or a List of RDDs?
Usually one of these two is the right approach whenever you feel
inclined to operate on an RDD of RDDs.