RE: Java RDD Union

2014-12-06 Thread Ron Ayoub
Date: Fri, 5 Dec 2014 14:58:37 -0600 Subject: Re: Java RDD Union To: ronalday...@live.com; user@spark.apache.org foreach also creates a new RDD, and does not modify an existing RDD. However, in practice, nothing stops you from fiddling with the Java objects inside an RDD when you get

Re: Java RDD Union

2014-12-06 Thread Sean Owen
I guess a major problem with this is that you lose fault tolerance. You have no way of recreating the local state of the mutable RDD if a partition is lost. Why would you need thousands of RDDs for kmeans? It's a few per iteration. An RDD is more bookkeeping than data structure, itself. They
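
The fault-tolerance point can be illustrated with a small plain-Java sketch. It does not use Spark: `SketchPartition`, `Counter`, and `LineageSketch` are hypothetical names invented for this illustration, and the "lineage" here is only a `Supplier` that rebuilds the data from its source. It is a rough model, assuming that a lost partition is rebuilt by re-running the recorded recipe, so any in-place mutation that was never part of that recipe is lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical mutable element, standing in for a Java object stored in an RDD.
final class Counter {
    int value;
    Counter(int value) { this.value = value; }
}

// Hypothetical model of one partition: cached data plus the recipe that produced it.
final class SketchPartition {
    private final Supplier<List<Counter>> lineage; // how to rebuild the data
    private List<Counter> cached;

    SketchPartition(Supplier<List<Counter>> lineage) {
        this.lineage = lineage;
        this.cached = lineage.get();
    }

    List<Counter> data() { return cached; }

    // Simulates losing the cached data and rebuilding it from the recipe.
    void loseAndRecompute() { cached = lineage.get(); }
}

final class LineageSketch {
    static int sum(SketchPartition p) {
        int total = 0;
        for (Counter c : p.data()) total += c.value;
        return total;
    }

    public static void main(String[] args) {
        SketchPartition p = new SketchPartition(() -> {
            List<Counter> fresh = new ArrayList<>();
            for (int i = 1; i <= 3; i++) fresh.add(new Counter(i));
            return fresh;
        });

        // In-place mutation: nothing records that this happened.
        for (Counter c : p.data()) c.value += 100;
        System.out.println(sum(p)); // prints 306

        // After a simulated loss, only the recorded recipe is replayed.
        p.loseAndRecompute();
        System.out.println(sum(p)); // prints 6
    }
}
```

Expressing the update as a new derived dataset, rather than mutating elements in place, keeps the change inside the recipe, which is what makes recomputation possible.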

RE: Java RDD Union

2014-12-06 Thread Ron Ayoub
. But anyway, that is the very thing Spark is advertised for. From: so...@cloudera.com Date: Sat, 6 Dec 2014 06:39:10 -0600 Subject: Re: Java RDD Union To: ronalday...@live.com CC: user@spark.apache.org I guess a major problem with this is that you lose fault tolerance. You have no way

Re: Java RDD Union

2014-12-05 Thread Sean Owen
No, RDDs are immutable. union() creates a new RDD, and does not modify an existing RDD. Maybe this obviates the question. I'm not sure what you mean about releasing from memory. If you want to repartition the unioned RDD, you repartition the result of union(), not anything else. On Fri, Dec 5,

Re: Java RDD Union

2014-12-05 Thread Sameer Farooqui
Hi Ron, Out of curiosity, why do you think that union is modifying an existing RDD in place? In general, all transformations, including union, will create new RDDs, not modify old RDDs in place. Here's a quick test: scala> val firstRDD = sc.parallelize(1 to 5) firstRDD:
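
The behaviour described above (union returns a new dataset and leaves both inputs unchanged) can be sketched in plain Java without a Spark installation. This is an analogy only, not the Spark API: `ImmutableBag` and `UnionSketch` are hypothetical names made up for this illustration, and `count()` merely mirrors the kind of check one might run against real RDDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an RDD: an immutable bag of integers.
final class ImmutableBag {
    private final List<Integer> items;

    ImmutableBag(List<Integer> items) {
        // Defensive copy so later changes to the caller's list cannot leak in.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    // Like a transformation: returns a new bag and leaves both inputs untouched.
    ImmutableBag union(ImmutableBag other) {
        List<Integer> merged = new ArrayList<>(items);
        merged.addAll(other.items);
        return new ImmutableBag(merged);
    }

    long count() { return items.size(); }
}

final class UnionSketch {
    public static void main(String[] args) {
        ImmutableBag first = new ImmutableBag(Arrays.asList(1, 2, 3, 4, 5));
        ImmutableBag second = new ImmutableBag(Arrays.asList(6, 7, 8, 9, 10));

        ImmutableBag both = first.union(second);

        System.out.println(first.count());  // prints 5: unchanged by union
        System.out.println(second.count()); // prints 5: unchanged by union
        System.out.println(both.count());   // prints 10: the new, combined bag
    }
}
```

Counting the inputs before and after the union, and seeing the same numbers, is the essence of the check: if union modified an input in place, its count would grow.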

Re: Java RDD Union

2014-12-05 Thread Sean Owen
but there are referents are and somehow this will no longer work when clustering. Thanks for comments. From: so...@cloudera.com Date: Fri, 5 Dec 2014 14:22:38 -0600 Subject: Re: Java RDD Union To: ronalday...@live.com CC: user@spark.apache.org No, RDDs are immutable. union() creates a new