`RDD.foreach` runs on the executors, so it mutates executor-local copies of any driver-side variable it captures; the driver's copy never changes. Use `collect()` to bring the data back to the driver and update the map there. E.g.,

myRdd.collect().foreach { node =>
  mp(node) = 1
}
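Here is a minimal self-contained sketch of both behaviors (the data, names, and `local[2]` master are made up for illustration; it assumes only a plain spark-core dependency):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.mutable

object CollectDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("collect-demo"))
    val myRdd = sc.parallelize(Seq(1L, 2L, 3L))

    val broken = mutable.Map[Long, Int]()
    // Runs on the executors: each task deserializes and mutates its own
    // copy of `broken`, so the driver-side map stays empty.
    myRdd.foreach(node => broken(node) = 1)
    println(broken.size)  // 0

    val mp = mutable.Map[Long, Int]()
    // collect() ships the elements back to the driver, where the foreach
    // (and the map update) run locally.
    myRdd.collect().foreach(node => mp(node) = 1)
    println(mp.size)  // 3

    sc.stop()
  }
}
```

Note that Spark serializes the task closure even with a local master, which is why the first map is not updated.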


Best Regards,
Shixiong Zhu

2015-02-25 4:00 GMT+08:00 Vijayasarathy Kannan <kvi...@vt.edu>:

> Thanks, but it still doesn't seem to work.
>
> Below is my entire code.
>
>   var mp = scala.collection.mutable.Map[VertexId, Int]()
>
>   var myRdd = graph.edges.groupBy[VertexId](f).flatMap {
>      edgesBySrc => func(edgesBySrc, a, b)
>   }
>
>   myRdd.foreach {
>     node => {
>         mp(node) = 1
>     }
>   }
>
> Values in "mp" do not get updated for any element in "myRdd".
>
> On Tue, Feb 24, 2015 at 2:39 PM, Sean Owen <so...@cloudera.com> wrote:
>
>> Instead of
>>
>> ...foreach {
>>   edgesBySrc => {
>>       lst ++= func(edgesBySrc)
>>   }
>> }
>>
>> try
>>
>> ...flatMap { edgesBySrc => func(edgesBySrc) }
>>
>> or even more succinctly
>>
>> ...flatMap(func)
>>
>> This returns an RDD that basically has the list you are trying to
>> build, I believe.
>>
>> You can collect() to the driver but beware if it is a huge data set.
>>
>> If you really just mean to count the results, you can count() instead.
>>
>> On Tue, Feb 24, 2015 at 7:35 PM, Vijayasarathy Kannan <kvi...@vt.edu>
>> wrote:
>> > I am a beginner to Scala/Spark. Could you please elaborate on how to
>> > make an RDD of the results of func() and collect it?
>> >
>> >
>> > On Tue, Feb 24, 2015 at 2:27 PM, Sean Owen <so...@cloudera.com> wrote:
>> >>
>> >> They aren't the same 'lst'. One is on your driver. It gets copied to
>> >> executors when the tasks are executed. Those copies are updated. But
>> >> the updates are never reflected in the local copy back on the driver.
>> >>
>> >> You may just wish to make an RDD of the results of func() and
>> >> collect() them back to the driver.
>> >>
>> >> On Tue, Feb 24, 2015 at 7:20 PM, kvvt <kvi...@vt.edu> wrote:
>> >> > I am working on the below piece of code.
>> >> >
>> >> > var lst = scala.collection.mutable.MutableList[VertexId]()
>> >> > graph.edges.groupBy[VertexId](f).foreach {
>> >> >   edgesBySrc => {
>> >> >       lst ++= func(edgesBySrc)
>> >> >   }
>> >> > }
>> >> >
>> >> > println(lst.length)
>> >> >
>> >> > Here, the final println() always says that the length of the list is
>> >> > 0. The list is non-empty (its length prints correctly inside func()).
>> >> >
>> >> > I am not sure if I am doing the append correctly. Can someone point
>> >> > out what I am doing wrong?
>> >> >
>> >> >
>> >> > --
>> >> > View this message in context:
>> >> > http://apache-spark-user-list.1001560.n3.nabble.com/Not-able-to-update-collections-tp21790.html
>> >> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
>> >> >
>> >
>> >
>>
>
>
