Hi, I am using fold(zeroValue)(t1, t2) on an RDD, and I noticed that it runs in parallel on all the partitions and then aggregates the per-partition results. My data object cannot be aggregated that way, so I was wondering whether there is any way to run the fold sequentially. (I am looking for something like Scala's foldLeft.)
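To make the difference concrete, here is a minimal sketch using plain Scala collections only (no Spark). The nested Seq is just a stand-in for an RDD's partitions, and the names FoldDemo, op, partitionThenMerge and chained are made up for illustration; the operation is deliberately non-associative so the two strategies give different results.

```scala
object FoldDemo {
  // Hypothetical stand-in for an RDD: each inner Seq plays the role of one partition.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4))

  // A deliberately non-associative operation, so evaluation order matters.
  def op(acc: Int, x: Int): Int = acc * 2 + x

  // Roughly the behaviour described above: fold every partition on its own,
  // starting from zero, then fold the per-partition results together.
  def partitionThenMerge(zero: Int): Int =
    partitions.map(_.fold(zero)(op)).fold(zero)(op)

  // The behaviour I am after: the result of one partition becomes the
  // starting value for the next partition.
  def chained(zero: Int): Int =
    partitions.foldLeft(zero)((carry, part) => part.foldLeft(carry)(op))

  def main(args: Array[String]): Unit = {
    println(partitionThenMerge(0)) // 18
    println(chained(0))            // 26
  }
}
```

Here chained(0) matches a single foldLeft over all the elements in order, while partitionThenMerge(0) does not, which is the mismatch I am running into.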
Here's what I want: run_partition_1 -> get_t1_and_send_to_next_partition -> run_partition_2 -> get_t1_and_send_to_next_partition -> ...

I tried calling coalesce(1, true) on the parent RDD, but since I have a lot of data (30G) it tried to shuffle all the data to one node and took forever, so that's not really an option.

Thanks,
-C