From: so...@cloudera.com
Date: Fri, 5 Dec 2014 14:58:37 -0600
Subject: Re: Java RDD Union
To: ronalday...@live.com; user@spark.apache.org
foreach does not modify an existing RDD either; it is an action, so it
does not even create a new one. However, in practice, nothing stops you
from fiddling with the Java objects inside an RDD when you get a
reference to them.
I guess a major problem with this is that you lose fault tolerance.
You have no way of recreating the local state of the mutable RDD if a
partition is lost.
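A minimal sketch of the pattern in question (the Point class and the
mutation here are my illustration, not code from this thread):

case class Point(var x: Double)  // hypothetical mutable element type

val points = sc.parallelize(Seq(Point(1.0), Point(2.0))).cache()

// Nothing in the API stops in-place mutation of the cached objects:
points.foreach(p => p.x += 10.0)

// But that state exists only in the cached partitions. If a partition
// is evicted or lost, lineage recomputes the original Points, and the
// in-place updates silently disappear.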
Why would you need thousands of RDDs for k-means? It's a few per iteration.
An RDD is more bookkeeping than data structure, itself. They don't
inherently consume resources unless you ask Spark to persist them.
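For example, one iteration of k-means written along these lines derives
only two new RDDs before collecting the new centers back to the driver
(a sketch under my own naming, not code from this thread):

import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD

// One k-means iteration: two derived RDDs, then a collect().
def kmeansStep(points: RDD[Array[Double]],
               centers: Array[Array[Double]]): Array[Array[Double]] = {
  def dist(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  def closest(p: Array[Double]): Int =
    centers.indices.minBy(i => dist(p, centers(i)))

  points
    .map(p => (closest(p), (p, 1L)))                // derived RDD #1
    .reduceByKey { case ((s1, n1), (s2, n2)) =>     // derived RDD #2
      (s1.zip(s2).map { case (x, y) => x + y }, n1 + n2)
    }
    .collect()                                      // back to the driver
    .map { case (_, (sum, n)) => sum.map(_ / n) }   // new centers
}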
But anyway, that is the very thing Spark is advertised for.
From: so...@cloudera.com
Date: Sat, 6 Dec 2014 06:39:10 -0600
Subject: Re: Java RDD Union
To: ronalday...@live.com
CC: user@spark.apache.org
I guess a major problem with this is that you lose fault tolerance.
You have no way of recreating the local state of the mutable RDD if a
partition is lost.
No, RDDs are immutable. union() creates a new RDD, and does not modify
an existing RDD. Maybe this obviates the question. I'm not sure what
you mean about releasing from memory. If you want to repartition the
unioned RDD, you repartition the result of union(), not anything else.
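Concretely, with made-up variable names:

val a = sc.parallelize(1 to 5)
val b = sc.parallelize(6 to 10)

// union() returns a brand-new RDD; a and b are left untouched.
val unioned = a.union(b)

// To change the partitioning, repartition the result of union():
val repartitioned = unioned.repartition(4)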
On Fri, Dec 5,
Hi Ron,
Out of curiosity, why do you think that union is modifying an existing RDD
in place? In general all transformations, including union, will create new
RDDs, not modify old RDDs in place.
Here's a quick test:
scala> val firstRDD = sc.parallelize(1 to 5)
firstRDD: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12
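Continuing that test (my reconstruction, not the original transcript):

scala> val secondRDD = sc.parallelize(6 to 10)
secondRDD: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:12

scala> val unionRDD = firstRDD.union(secondRDD)
unionRDD: org.apache.spark.rdd.RDD[Int] = UnionRDD[2] at union at <console>:16

scala> firstRDD.collect()
res0: Array[Int] = Array(1, 2, 3, 4, 5)

firstRDD is unchanged; union produced a separate RDD.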
but there are referents, and somehow this will no longer work
when clustering.
Thanks for the comments.
From: so...@cloudera.com
Date: Fri, 5 Dec 2014 14:22:38 -0600
Subject: Re: Java RDD Union
To: ronalday...@live.com
CC: user@spark.apache.org
No, RDDs are immutable. union() creates a new RDD, and does not modify
an existing RDD.