With that said, and given the iterative algorithms Spark is advertised for,
isn't this a somewhat unnecessary restriction? I don't see where the problem
is. For instance, it is clear that when aggregating, operations need to be
associative because of the way the work is divided and combined. But since
forEach works on an individual item, the same problem doesn't exist.

As an example, during a k-means algorithm you have to continually update
cluster assignments per data item, along with perhaps the distance from the
centroid. So if you can't update items in place, you literally have to create
thousands upon thousands of RDDs. Does Spark have some kind of trick behind
the scenes, such as reuse or fully persistent data structures? How can it
possibly be efficient for 'iterative' algorithms when it is creating so many
RDDs instead of one?
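For what it's worth, here is a minimal plain-Java sketch (no Spark cluster,
and the `Assignment` record is a hypothetical shape, not a Spark API) of the
pattern Spark encourages: each iteration derives a new immutable collection
of assignments via a map-style transformation, rather than mutating the old
one in place. In actual Spark this would be `rdd.map(...)`, and since an RDD
is mostly lazy lineage metadata, producing a new one per iteration is cheap;
`cache()` is what avoids recomputing earlier steps.

```java
import java.util.List;
import java.util.stream.Collectors;

public class KMeansSketch {
    // Immutable per-point assignment (hypothetical shape, 1-D points).
    record Assignment(double point, int clusterId, double distance) {}

    // One assignment step: build a NEW list instead of mutating the old
    // one -- the same shape as rdd.map(...) in Spark, where each
    // iteration yields a new immutable RDD.
    static List<Assignment> assign(List<Double> points, double[] centroids) {
        return points.stream().map(p -> {
            int best = 0;
            for (int c = 1; c < centroids.length; c++) {
                if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[best])) {
                    best = c;
                }
            }
            return new Assignment(p, best, Math.abs(p - centroids[best]));
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> points = List.of(0.1, 0.2, 5.0, 5.1);
        double[] centroids = {0.0, 5.0};
        // Each "iteration" would re-run assign() after recomputing
        // centroids, producing a fresh immutable result each time.
        for (Assignment a : assign(points, centroids)) {
            System.out.println(a);
        }
    }
}
```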

> From: so...@cloudera.com
> Date: Fri, 5 Dec 2014 14:58:37 -0600
> Subject: Re: Java RDD Union
> To: ronalday...@live.com; user@spark.apache.org
> 
> foreach also does not modify an existing RDD (it is an action and does
> not return an RDD at all). However, in practice, nothing stops you from
> fiddling with the Java objects inside an RDD when you get a reference
> to them in a method like this. This is definitely a bad idea, as there
> is no guarantee that any other operations will see any, some, or all
> of these edits.
> 
> On Fri, Dec 5, 2014 at 2:40 PM, Ron Ayoub <ronalday...@live.com> wrote:
> > I tricked myself into thinking it was uniting things correctly. I see I'm
> > wrong now.
> >
> > I have a question regarding your comment that RDDs are immutable. Can you
> > change values in an RDD using forEach? Does that violate immutability? I've
> > been using forEach to modify RDDs, but perhaps I've tricked myself once
> > again into believing it is working. I hold object references, so perhaps it
> > is working serendipitously in local mode: the references themselves are not
> > changing, but their referents are, and somehow this will no longer work
> > when clustering.
> >
> > Thanks for comments.
> >
> >> From: so...@cloudera.com
> >> Date: Fri, 5 Dec 2014 14:22:38 -0600
> >> Subject: Re: Java RDD Union
> >> To: ronalday...@live.com
> >> CC: user@spark.apache.org
> >
> >>
> >> No, RDDs are immutable. union() creates a new RDD, and does not modify
> >> an existing RDD. Maybe this obviates the question. I'm not sure what
> >> you mean about releasing from memory. If you want to repartition the
> >> unioned RDD, you repartition the result of union(), not anything else.
> >>
> >> On Fri, Dec 5, 2014 at 1:27 PM, Ron Ayoub <ronalday...@live.com> wrote:
> >> > I'm a bit confused regarding the expected behavior of unions. I'm
> >> > running on 8 cores. I have an RDD that is used to collect cluster
> >> > associations (cluster id, content id, distance) for internal clusters as
> >> > well as leaf clusters, since I'm doing hierarchical k-means and need all
> >> > distances for sorting documents appropriately upon examination.
> >> >
> >> > It appears that union simply adds the items in the argument to the RDD
> >> > instance the method is called on, rather than just returning a new RDD.
> >> > If I want to use union this way, as more of an add/append, should I be
> >> > capturing the return value and releasing it from memory? I need help
> >> > clarifying the semantics here.
> >> >
> >> > Also, in another related thread someone mentioned coalesce after union.
> >> > Would I need to do the same on the instance RDD I'm calling union on?
> >> >
> >> > Perhaps a method such as append would be useful and clearer.
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: user-h...@spark.apache.org
> >>
> 
> 
                                          
