Re: Not able to update collections
RDD.foreach runs on the executors, so mutating a driver-side map inside it
never touches the driver's copy. Use collect() to fetch the data back to the
driver first, e.g.:

    myRdd.collect().foreach { node =>
      mp(node) = 1
    }

Best Regards,

Shixiong Zhu

2015-02-25 4:00 GMT+08:00 Vijayasarathy Kannan:

> Thanks, but it still doesn't seem to work.
> [...]
> Values in "mp" do not get updated for any element in "myRdd".
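For completeness, here is a minimal self-contained sketch of the
collect-then-update pattern described above. The SparkContext setup and the
toy RDD stand in for the poster's graph-derived data and are illustrative,
not from the thread (a GraphX VertexId is a Long):

    import org.apache.spark.{SparkConf, SparkContext}

    object CollectThenUpdate {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("collect-then-update").setMaster("local[*]"))

        // Stand-in for the graph-derived RDD of vertex IDs.
        val myRdd = sc.parallelize(Seq(1L, 2L, 3L))

        val mp = scala.collection.mutable.Map[Long, Int]()

        // collect() materializes the RDD on the driver, so this foreach
        // runs locally and really does update mp.
        myRdd.collect().foreach { node =>
          mp(node) = 1
        }

        println(mp) // all three keys now map to 1
        sc.stop()
      }
    }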
Re: Not able to update collections
Thanks, but it still doesn't seem to work.

Below is my entire code.

    var mp = scala.collection.mutable.Map[VertexId, Int]()

    var myRdd = graph.edges.groupBy[VertexId](f).flatMap {
      edgesBySrc => func(edgesBySrc, a, b)
    }

    myRdd.foreach {
      node => {
        mp(node) = 1
      }
    }

Values in "mp" do not get updated for any element in "myRdd".

On Tue, Feb 24, 2015 at 2:39 PM, Sean Owen wrote:

> [...]
> try
>
>     ...flatMap { edgesBySrc => func(edgesBySrc) }
>
> or even more succinctly
>
>     ...flatMap(func)
>
> [...]
Re: Not able to update collections
Instead of

    ...foreach {
      edgesBySrc => {
        lst ++= func(edgesBySrc)
      }
    }

try

    ...flatMap { edgesBySrc => func(edgesBySrc) }

or even more succinctly

    ...flatMap(func)

This returns an RDD that basically has the list you are trying to build, I
believe.

You can collect() to the driver, but beware if it is a huge data set.

If you really just mean to count the results, you can count() instead.

On Tue, Feb 24, 2015 at 7:35 PM, Vijayasarathy Kannan wrote:

> I am a beginner to Scala/Spark. Could you please elaborate on how to make
> an RDD of the results of func() and collect?
>
> [...]
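To make the suggestion concrete, here is a sketch of the end-to-end shape.
The f and func below are hypothetical stand-ins with plausible signatures,
since the originals were never posted:

    import org.apache.spark.graphx.{Edge, VertexId}
    import org.apache.spark.rdd.RDD

    object FlatMapResults {
      // Hypothetical stand-ins for the poster's f and func.
      def f(e: Edge[Int]): VertexId = e.srcId
      def func(group: (VertexId, Iterable[Edge[Int]])): Iterable[VertexId] =
        group._2.map(_.dstId)

      // flatMap builds the "list" as a distributed RDD instead of trying
      // to append to a driver-side collection from executor tasks.
      def results(edges: RDD[Edge[Int]]): RDD[VertexId] =
        edges.groupBy[VertexId](f).flatMap(func)

      // Bring the values back to the driver (beware huge datasets)...
      def asArray(edges: RDD[Edge[Int]]): Array[VertexId] =
        results(edges).collect()

      // ...or, if only the size matters, count without collecting.
      def size(edges: RDD[Edge[Int]]): Long =
        results(edges).count()
    }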
Re: Not able to update collections
I am a beginner to Scala/Spark. Could you please elaborate on how to make an
RDD of the results of func() and collect?

On Tue, Feb 24, 2015 at 2:27 PM, Sean Owen wrote:

> [...]
> You may just wish to make an RDD of the results of func() and
> collect() them back to the driver.
> [...]
Re: Not able to update collections
They aren't the same 'lst'. One is on your driver. It gets copied to the
executors when the tasks are executed. Those copies are updated, but the
updates will never be reflected in the local copy back on the driver.

You may just wish to make an RDD of the results of func() and collect()
them back to the driver.

On Tue, Feb 24, 2015 at 7:20 PM, kvvt wrote:

> [...]
> Here, the final println() always says that the length of the list is 0.
> [...]
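A tiny runnable illustration of the copy behavior; the local-mode setup and
sample data are mine, not from the thread:

    import org.apache.spark.{SparkConf, SparkContext}

    object ClosureCopyPitfall {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("closure-copy").setMaster("local[*]"))

        var lst = scala.collection.mutable.MutableList[Long]()
        val rdd = sc.parallelize(Seq(1L, 2L, 3L))

        // The closure (and lst with it) is serialized and shipped to the
        // executors; each task mutates its own deserialized copy, so the
        // driver's lst stays empty. On a cluster the executor-side updates
        // are never visible to the driver at all.
        rdd.foreach { v => lst += v }
        println(lst.length) // 0

        // Keeping the data as an RDD and collecting it works as expected.
        println(rdd.collect().length) // 3

        sc.stop()
      }
    }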
Not able to update collections
I am working on the below piece of code.

    var lst = scala.collection.mutable.MutableList[VertexId]()

    graph.edges.groupBy[VertexId](f).foreach {
      edgesBySrc => {
        lst ++= func(edgesBySrc)
      }
    }

    println(lst.length)

Here, the final println() always says that the length of the list is 0. The
list itself is non-empty (the length of the returned list prints correctly
inside func()).

I am not sure if I am doing the append correctly. Can someone point out what
I am doing wrong?