Re: Not able to update collections

2015-02-24 Thread Shixiong Zhu
RDD.foreach runs on the executors, not on the driver. You should use
`collect()` to fetch the data back to the driver first. E.g.,

myRdd.collect().foreach { node =>
  mp(node) = 1
}
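
For context, here is a self-contained sketch of that pattern (the data is
hypothetical; `sc` is an existing SparkContext, and GraphX's VertexId is an
alias for Long):

  import org.apache.spark.graphx.VertexId

  val mp = scala.collection.mutable.Map[VertexId, Int]()
  val myRdd = sc.parallelize(Seq(1L, 2L, 3L))  // stand-in for the real RDD

  // collect() materializes the RDD on the driver, so this foreach runs
  // locally and the driver-side map really is updated
  myRdd.collect().foreach { node =>
    mp(node) = 1
  }

  println(mp.size)  // prints 3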


Best Regards,
Shixiong Zhu

2015-02-25 4:00 GMT+08:00 Vijayasarathy Kannan :

> Thanks, but it still doesn't seem to work.
>
> Below is my entire code.
>
>   var mp = scala.collection.mutable.Map[VertexId, Int]()
>
>   var myRdd = graph.edges.groupBy[VertexId](f).flatMap { edgesBySrc =>
>     func(edgesBySrc, a, b)
>   }
>
>   myRdd.foreach { node =>
>     mp(node) = 1
>   }
>
> Values in "mp" do not get updated for any element in "myRdd".
>
> On Tue, Feb 24, 2015 at 2:39 PM, Sean Owen  wrote:
>
>> Instead of
>>
>> ...foreach { edgesBySrc =>
>>   lst ++= func(edgesBySrc)
>> }
>>
>> try
>>
>> ...flatMap { edgesBySrc => func(edgesBySrc) }
>>
>> or even more succinctly
>>
>> ...flatMap(func)
>>
>> This returns an RDD that basically has the list you are trying to
>> build, I believe.
>>
>> You can collect() to the driver but beware if it is a huge data set.
>>
>> If you really just mean to count the results, you can count() instead
>>
>> On Tue, Feb 24, 2015 at 7:35 PM, Vijayasarathy Kannan  wrote:
>> > I am a beginner to Scala/Spark. Could you please elaborate on how to
>> > make RDD of results of func() and collect?
>> >
>> >
>> > On Tue, Feb 24, 2015 at 2:27 PM, Sean Owen  wrote:
>> >>
>> >> They aren't the same 'lst'. One is on your driver. It gets copied to
>> >> executors when the tasks are executed. Those copies are updated. But
>> >> the updates will never reflect in the local copy back in the driver.
>> >>
>> >> You may just wish to make an RDD of the results of func() and
>> >> collect() them back to the driver.
>> >>
>> >> On Tue, Feb 24, 2015 at 7:20 PM, kvvt  wrote:
>> >> > I am working on the below piece of code.
>> >> >
>> >> > var lst = scala.collection.mutable.MutableList[VertexId]()
>> >> > graph.edges.groupBy[VertexId](f).foreach { edgesBySrc =>
>> >> >   lst ++= func(edgesBySrc)
>> >> > }
>> >> >
>> >> > println(lst.length)
>> >> >
>> >> > Here, the final println() always says that the length of the list is
>> >> > 0. The list is non-empty (correctly prints the length of the returned
>> >> > list inside func()).
>> >> >
>> >> > I am not sure if I am doing the append correctly. Can someone point
>> >> > out what I am doing wrong?
>> >> >
>> >
>> >
>>
>
>


Re: Not able to update collections

2015-02-24 Thread Vijayasarathy Kannan
Thanks, but it still doesn't seem to work.

Below is my entire code.

  var mp = scala.collection.mutable.Map[VertexId, Int]()

  var myRdd = graph.edges.groupBy[VertexId](f).flatMap { edgesBySrc =>
    func(edgesBySrc, a, b)
  }

  myRdd.foreach { node =>
    mp(node) = 1
  }

Values in "mp" do not get updated for any element in "myRdd".

On Tue, Feb 24, 2015 at 2:39 PM, Sean Owen  wrote:

> Instead of
>
> ...foreach { edgesBySrc =>
>   lst ++= func(edgesBySrc)
> }
>
> try
>
> ...flatMap { edgesBySrc => func(edgesBySrc) }
>
> or even more succinctly
>
> ...flatMap(func)
>
> This returns an RDD that basically has the list you are trying to
> build, I believe.
>
> You can collect() to the driver but beware if it is a huge data set.
>
> If you really just mean to count the results, you can count() instead
>
> On Tue, Feb 24, 2015 at 7:35 PM, Vijayasarathy Kannan  wrote:
> > I am a beginner to Scala/Spark. Could you please elaborate on how to make
> > RDD of results of func() and collect?
> >
> >
> > On Tue, Feb 24, 2015 at 2:27 PM, Sean Owen  wrote:
> >>
> >> They aren't the same 'lst'. One is on your driver. It gets copied to
> >> executors when the tasks are executed. Those copies are updated. But
> >> the updates will never reflect in the local copy back in the driver.
> >>
> >> You may just wish to make an RDD of the results of func() and
> >> collect() them back to the driver.
> >>
> >> On Tue, Feb 24, 2015 at 7:20 PM, kvvt  wrote:
> >> > I am working on the below piece of code.
> >> >
> >> > var lst = scala.collection.mutable.MutableList[VertexId]()
> >> > graph.edges.groupBy[VertexId](f).foreach { edgesBySrc =>
> >> >   lst ++= func(edgesBySrc)
> >> > }
> >> >
> >> > println(lst.length)
> >> >
> >> > Here, the final println() always says that the length of the list is
> >> > 0. The list is non-empty (correctly prints the length of the returned
> >> > list inside func()).
> >> >
> >> > I am not sure if I am doing the append correctly. Can someone point
> >> > out what I am doing wrong?
> >> >
> >
> >
>


Re: Not able to update collections

2015-02-24 Thread Sean Owen
Instead of

...foreach { edgesBySrc =>
  lst ++= func(edgesBySrc)
}

try

...flatMap { edgesBySrc => func(edgesBySrc) }

or even more succinctly

...flatMap(func)

This returns an RDD that basically has the list you are trying to
build, I believe.

You can collect() to the driver but beware if it is a huge data set.

If you really just mean to count the results, you can count() instead.
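
A minimal sketch of that pipeline, with placeholder data standing in for the
grouped edges (this func is hypothetical and just returns the vertex ids in
its group):

  import org.apache.spark.graphx.VertexId

  // stand-in for graph.edges.groupBy[VertexId](f)
  val grouped = sc.parallelize(Seq(
    (1L, Iterable(10L, 11L)),
    (2L, Iterable(20L))
  ))

  def func(edgesBySrc: (VertexId, Iterable[VertexId])): Iterable[VertexId] =
    edgesBySrc._2

  val resultRdd = grouped.flatMap(func)  // an RDD of all results of func()

  val lst = resultRdd.collect()          // an Array back on the driver
  println(lst.length)                    // 3

  println(resultRdd.count())             // or just count without collecting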

On Tue, Feb 24, 2015 at 7:35 PM, Vijayasarathy Kannan  wrote:
> I am a beginner to Scala/Spark. Could you please elaborate on how to make
> RDD of results of func() and collect?
>
>
> On Tue, Feb 24, 2015 at 2:27 PM, Sean Owen  wrote:
>>
>> They aren't the same 'lst'. One is on your driver. It gets copied to
>> executors when the tasks are executed. Those copies are updated. But
>> the updates will never reflect in the local copy back in the driver.
>>
>> You may just wish to make an RDD of the results of func() and
>> collect() them back to the driver.
>>
>> On Tue, Feb 24, 2015 at 7:20 PM, kvvt  wrote:
>> > I am working on the below piece of code.
>> >
>> > var lst = scala.collection.mutable.MutableList[VertexId]()
>> > graph.edges.groupBy[VertexId](f).foreach { edgesBySrc =>
>> >   lst ++= func(edgesBySrc)
>> > }
>> >
>> > println(lst.length)
>> >
>> > Here, the final println() always says that the length of the list is 0.
>> > The list is non-empty (correctly prints the length of the returned list
>> > inside func()).
>> >
>> > I am not sure if I am doing the append correctly. Can someone point out
>> > what I am doing wrong?
>> >
>
>




Re: Not able to update collections

2015-02-24 Thread Vijayasarathy Kannan
I am a beginner to Scala/Spark. Could you please elaborate on how to make an
RDD of the results of func() and collect it?


On Tue, Feb 24, 2015 at 2:27 PM, Sean Owen  wrote:

> They aren't the same 'lst'. One is on your driver. It gets copied to
> executors when the tasks are executed. Those copies are updated. But
> the updates will never reflect in the local copy back in the driver.
>
> You may just wish to make an RDD of the results of func() and
> collect() them back to the driver.
>
> On Tue, Feb 24, 2015 at 7:20 PM, kvvt  wrote:
> > I am working on the below piece of code.
> >
> > var lst = scala.collection.mutable.MutableList[VertexId]()
> > graph.edges.groupBy[VertexId](f).foreach { edgesBySrc =>
> >   lst ++= func(edgesBySrc)
> > }
> >
> > println(lst.length)
> >
> > Here, the final println() always says that the length of the list is 0.
> > The list is non-empty (correctly prints the length of the returned list
> > inside func()).
> >
> > I am not sure if I am doing the append correctly. Can someone point out
> > what I am doing wrong?
> >
>


Re: Not able to update collections

2015-02-24 Thread Sean Owen
They aren't the same 'lst'. One is on your driver. It gets copied to the
executors when the tasks are executed. Those copies are updated, but the
updates are never reflected in the local copy back on the driver.

You may just wish to make an RDD of the results of func() and
collect() them back to the driver.
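
A minimal sketch of both halves of that (assuming a running SparkContext `sc`;
the numbers are placeholder data):

  val lst = scala.collection.mutable.MutableList[Int]()
  val rdd = sc.parallelize(1 to 5)

  // the closure, including its reference to lst, is serialized and shipped
  // with each task; every task mutates its own deserialized copy of lst
  rdd.foreach { x => lst += x }
  println(lst.length)  // 0 -- the driver's lst was never touched

  // building the results as an RDD and collecting them works instead
  val collected = rdd.map(x => x * 2).collect()  // x * 2 stands in for func()
  println(collected.length)  // 5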

On Tue, Feb 24, 2015 at 7:20 PM, kvvt  wrote:
> I am working on the below piece of code.
>
> var lst = scala.collection.mutable.MutableList[VertexId]()
> graph.edges.groupBy[VertexId](f).foreach { edgesBySrc =>
>   lst ++= func(edgesBySrc)
> }
>
> println(lst.length)
>
> Here, the final println() always says that the length of the list is 0. The
> list is non-empty (correctly prints the length of the returned list inside
> func()).
>
> I am not sure if I am doing the append correctly. Can someone point out what
> I am doing wrong?
>
>




Not able to update collections

2015-02-24 Thread kvvt
I am working on the piece of code below.

var lst = scala.collection.mutable.MutableList[VertexId]()
graph.edges.groupBy[VertexId](f).foreach { edgesBySrc =>
  lst ++= func(edgesBySrc)
}

println(lst.length)

Here, the final println() always says that the length of the list is 0. The
list is non-empty: inside func(), the length of the returned list prints
correctly.

I am not sure if I am doing the append correctly. Can someone point out what
I am doing wrong?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Not-able-to-update-collections-tp21790.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org