spark fold question

chinchu Tue, 07 Oct 2014 17:51:03 -0700

Hi,

I am using fold(zeroValue)(t1, t2) on an RDD, and I noticed that it runs
in parallel on all the partitions and then aggregates the results from the
partitions. My data object is not aggregatable, so I was wondering if
there's any way to run the fold sequentially. [I am looking to do a foldLeft
kind of Scala operation.]
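
To make the difference concrete, here is a minimal plain-Scala sketch (no
Spark involved). The nested Seq standing in for partitions, the subtraction
operator, and the object and method names are all assumptions chosen for
illustration. It only models the behaviour described above: each partition is
folded from the zero value, and the per-partition results are then merged
with the same operator.

```scala
object FoldVsFoldLeft {
  // Hypothetical stand-in for an RDD with two partitions.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4))

  // Subtraction is neither associative nor commutative, so it exposes the difference.
  val op: (Int, Int) => Int = _ - _

  // Models the parallel behaviour: fold every partition starting from the
  // zero value, then merge the per-partition results with the same operator.
  def partitionedFold(zero: Int): Int =
    partitions.map(_.fold(zero)(op)).fold(zero)(op)

  // The behaviour I am after: one left-to-right pass over all elements.
  def sequentialFold(zero: Int): Int =
    partitions.flatten.foldLeft(zero)(op)

  def main(args: Array[String]): Unit = {
    println(partitionedFold(0)) // 10
    println(sequentialFold(0))  // -10
  }
}
```

With an operator like this the two results disagree, which is why the
parallel fold does not work for my data.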


Here's what I want:
run_partition_1 -> get_t1_and_send_to_next_partition -> run_partition_2 ->
get_t1_and_send_to_next_partition -> ...
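
The chain above can be sketched in plain Scala as follows. This is only a
model of the semantics I want, not Spark code: the partitions are
represented as a Seq of Iterators, and foldPartitionsInOrder is a
hypothetical helper name, not an existing API.

```scala
object ChainedFold {
  // Fold the partitions strictly in order: the accumulator produced by one
  // partition becomes the starting value for the next one.
  def foldPartitionsInOrder[A, B](partitions: Seq[Iterator[A]], zero: B)(op: (B, A) => B): B =
    partitions.foldLeft(zero) { (acc, partition) =>
      partition.foldLeft(acc)(op)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical data split into three "partitions".
    val parts = Seq(Iterator("a", "b"), Iterator("c"), Iterator("d", "e"))
    val result = foldPartitionsInOrder(parts, "")(_ + _)
    println(result) // abcde
  }
}
```

Only one partition is consumed at a time here, so nothing would need to be
shuffled onto a single node; only the accumulator moves from one step to the
next.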

I tried calling coalesce(1, true) on the parent RDD, but since I have a lot of
data (30G), it tried to shuffle all the data to one node and took forever,
so that's not really an option.

Thanks,
-C



