Re: mapPartitions - How Does it Work

2015-03-18 Thread Alex Turner (TMS)
List(x.next).iterator gives you the first element of each partition, which would be 1, 4, and 7 respectively. On 3/18/15, 10:19 AM, ashish.usoni ashish.us...@gmail.com wrote: I am trying to understand mapPartitions, but I am still not sure how it works. In the example below it creates
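The quoted thread is truncated, so the original RDD isn't shown. The behavior described can be sketched without a cluster by simulating each partition as a local iterator; the data (1..9 split into three partitions) and the helper name are assumptions, not the poster's code:

```scala
// Simulate mapPartitions locally: each inner Seq stands in for one
// RDD partition, and the function receives that partition's iterator.
def firstOfEach(parts: Seq[Seq[Int]]): Seq[Int] =
  parts.flatMap { p =>
    val x = p.iterator
    // Same shape as the thread's snippet: take the first element only.
    List(x.next()).iterator
  }

// Three partitions holding 1..3, 4..6, 7..9 (assumed data).
val result = firstOfEach(Seq(Seq(1, 2, 3), Seq(4, 5, 6), Seq(7, 8, 9)))
println(result)  // List(1, 4, 7)
```

In real Spark this corresponds to `rdd.mapPartitions(x => List(x.next).iterator)`: the function runs once per partition, so only one element per partition survives.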

RDD pair to pair of RDDs

2015-03-18 Thread Alex Turner (TMS)
What's the best way to go from RDD[(A, B)] to (RDD[A], RDD[B])? If I do: def separate[A, B](k: RDD[(A, B)]) = (k.map(_._1), k.map(_._2)) which is the obvious solution, this runs two maps in the cluster. Can I do some kind of a fold instead: def separate[A, B](l: List[(A, B)]) =
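The two-map split is easy to show on a plain List standing in for the RDD; Scala collections also offer `unzip`, which does the same split in a single pass. This is a local sketch only; it does not claim a single-pass equivalent exists for RDDs, where each of the two outputs is a separate lazy transformation:

```scala
// The poster's "obvious" solution, on a local List instead of an RDD:
// two maps, hence two traversals of the input.
def separate[A, B](k: List[(A, B)]): (List[A], List[B]) =
  (k.map(_._1), k.map(_._2))

// Single-pass alternative for local collections: the standard unzip.
val (nums, letters) = List((1, "a"), (2, "b"), (3, "c")).unzip
println(nums)     // List(1, 2, 3)
println(letters)  // List(a, b, c)
```

On an RDD, caching the parent (`k.cache()`) before the two maps avoids recomputing the source twice, which is the usual workaround since one Spark transformation cannot emit two RDDs.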

Memory Settings for local execution context

2015-03-17 Thread Alex Turner (TMS)
So the page that talks about settings, http://spark.apache.org/docs/1.2.1/configuration.html, seems not to apply when running local contexts. I have a shell script that starts my job: export SPARK_MASTER_OPTS=-Dsun.io.serialization.extendedDebugInfo=true export
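A likely explanation, sketched below: in local mode there is no separate worker process, so master/worker settings such as SPARK_MASTER_OPTS have no effect, and heap size must be set on the driver JVM before it launches. The flag values and jar name here are illustrative assumptions, not taken from the poster's script:

```shell
# Local mode runs everything in the driver JVM, so set its heap at launch
# (spark.driver.memory in SparkConf is too late; the JVM already started):
spark-submit --master "local[4]" --driver-memory 4g my-job.jar

# JVM debug options for the driver can go through spark-submit's own env var:
export SPARK_SUBMIT_OPTS="-Dsun.io.serialization.extendedDebugInfo=true"
```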