foreach does not modify an existing RDD either. Note that it does not create a new RDD: it is an action that runs a function on each element for its side effects and returns nothing. However, in practice, nothing stops you from modifying the Java objects inside an RDD when you get a reference to them in a method like this. This is definitely a bad idea, as there is no guarantee that any other operation will see any, some, or all of these edits.
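
To make the hazard concrete, here is a minimal sketch in plain Java. It is a model, not the Spark API. It assumes the RDD is not cached, so each action recomputes the elements from their source; the Supplier stands in for that recomputation. The Assoc class and all method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Plain-Java model, NOT the Spark API: an uncached "RDD" is represented by a
// Supplier that rebuilds its elements from the source each time it is evaluated.
class LineageSketch {

    // Hypothetical element type standing in for (cluster id, content id, distance).
    static final class Assoc {
        final int clusterId;
        final int contentId;
        double distance; // deliberately mutable, to show the hazard

        Assoc(int clusterId, int contentId, double distance) {
            this.clusterId = clusterId;
            this.contentId = contentId;
            this.distance = distance;
        }
    }

    // Each call to get() models one recomputation of the dataset.
    static Supplier<List<Assoc>> source() {
        return () -> {
            List<Assoc> out = new ArrayList<>();
            out.add(new Assoc(1, 10, 0.5));
            out.add(new Assoc(1, 11, 1.5));
            return out;
        };
    }

    // Anti-pattern: edit elements in a foreach-style pass, then run a second "action".
    static double sumAfterInPlaceEdit(Supplier<List<Assoc>> rdd) {
        rdd.get().forEach(a -> a.distance *= 2); // edits one materialization only
        // The second evaluation rebuilds the elements, so the edits are gone.
        return rdd.get().stream().mapToDouble(a -> a.distance).sum();
    }

    // Safer pattern: derive a new dataset, in the spirit of map(), and use that.
    static double sumAfterMap(Supplier<List<Assoc>> rdd) {
        Supplier<List<Assoc>> doubled = () -> {
            List<Assoc> out = new ArrayList<>();
            for (Assoc a : rdd.get()) {
                out.add(new Assoc(a.clusterId, a.contentId, a.distance * 2));
            }
            return out;
        };
        return doubled.get().stream().mapToDouble(a -> a.distance).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumAfterInPlaceEdit(source())); // prints 2.0
        System.out.println(sumAfterMap(source()));         // prints 4.0
    }
}
```

In local mode with a cached RDD the in-place edit may appear to work, which is probably what you are seeing. With Spark itself, the second method corresponds to calling map() and using the RDD it returns.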

On Fri, Dec 5, 2014 at 2:40 PM, Ron Ayoub <ronalday...@live.com> wrote:
> I tricked myself into thinking it was uniting things correctly. I see I'm
> wrong now.
>
> I have a question regarding your comment that RDD are immutable. Can you
> change values in an RDD using forEach. Does that violate immutability. I've
> been using forEach to modify RDD but perhaps I've tricked myself once again
> into believing it is working. I have object reference so perhaps it is
> working serendipitously in local mode since the references are in fact not
> changing but their referents are and somehow this will no longer work
> when clustering.
>
> Thanks for comments.
>
>> From: so...@cloudera.com
>> Date: Fri, 5 Dec 2014 14:22:38 -0600
>> Subject: Re: Java RDD Union
>> To: ronalday...@live.com
>> CC: user@spark.apache.org
>>
>> No, RDDs are immutable. union() creates a new RDD, and does not modify
>> an existing RDD. Maybe this obviates the question. I'm not sure what
>> you mean about releasing from memory. If you want to repartition the
>> unioned RDD, you repartition the result of union(), not anything else.
>>
>> On Fri, Dec 5, 2014 at 1:27 PM, Ron Ayoub <ronalday...@live.com> wrote:
>> > I'm a bit confused regarding expected behavior of unions. I'm running
>> > on 8 cores. I have an RDD that is used to collect cluster associations
>> > (cluster id, content id, distance) for internal clusters as well as
>> > leaf clusters since I'm doing hierarchical k-means and need all
>> > distances for sorting documents appropriately upon examination.
>> >
>> > It appears that Union simply adds items in the argument to the RDD
>> > instance the method is called on rather than just returning a new RDD.
>> > If I want to do Union this way as more of an add/append should I be
>> > capturing the return value and releasing it from memory. Need help
>> > clarifying the semantics here.
>> >
>> > Also, in another related thread someone mentioned coalesce after union.
>> > Would I need to do the same on the instance RDD I'm calling Union on.
>> >
>> > Perhaps a method such as append would be useful and clearer.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org