Do not call collect: besides materializing the RDD, it also transfers all of the data to the driver, which might cause the driver to fail if the data is huge. You do have to materialize the RDD in some way, though. save, count, and collect all do that, so prefer save or count.
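To make the "mark now, cut later" behaviour concrete, here is a small toy model in plain Python. It is not Spark code: `ToyRDD` and all of its members are hypothetical names invented for this sketch, and it only mimics the behaviour described in this thread, where checkpoint() just sets a flag and the first action that materializes the data drops the lineage.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, parent=None, fn=None, data=None):
        self.parent = parent          # previous node in the lineage
        self.fn = fn                  # function applied to the parent's elements
        self.data = data              # set only on source or checkpointed nodes
        self.marked = False           # set by checkpoint()
        self.is_checkpointed = False  # set once an action has cut the lineage

    def map(self, fn):
        return ToyRDD(parent=self, fn=fn)

    def checkpoint(self):
        # Only marks the node; the lineage is left untouched.
        self.marked = True

    def lineage_depth(self):
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def _compute(self):
        # Recursive walk up the lineage: a very long chain would overflow the stack.
        if self.data is not None:
            return list(self.data)
        return [self.fn(x) for x in self.parent._compute()]

    def count(self):
        # An action: materializes the data, then cuts the lineage if marked.
        result = self._compute()
        if self.marked and not self.is_checkpointed:
            self.data = result
            self.parent = None
            self.fn = None
            self.is_checkpointed = True
        return len(result)


rdd = ToyRDD(data=[1, 2, 3])
for _ in range(50):
    rdd = rdd.map(lambda x: x + 1)

rdd.checkpoint()
depth_after_mark = rdd.lineage_depth()    # lineage is still fully there
n = rdd.count()                           # first materialization
depth_after_action = rdd.lineage_depth()  # lineage has been dropped
```

In this sketch the depth only changes after count() runs, which mirrors the point that marking an RDD for checkpointing is not enough on its own.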
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Tue, Jun 24, 2014 at 2:50 AM, Xiangrui Meng <men...@gmail.com> wrote:

> Calling checkpoint() alone doesn't cut the lineage. It only marks the
> RDD as to be checkpointed. The lineage is cut after the first time
> this RDD is materialized. You see StackOverflow because the lineage is
> still there. -Xiangrui
>
> On Sun, Jun 22, 2014 at 6:37 PM, dash <b...@nd.edu> wrote:
> > Hi Xiangrui,
> >
> > As far as I know, calling count materializes the RDD. Does collect do
> > the same thing, since it is also an action? I cannot call count,
> > because for a Graph object count does not materialize the RDD. I have
> > already filed an issue about that.
> >
> > My question is: why is there still a stack overflow even if
> > `isCheckpointed` is true?
> >
> > --
> > View this message in context:
> > http://apache-spark-developers-list.1001551.n3.nabble.com/Checkpointed-RDD-still-causing-StackOverflow-tp7066p7068.html
> > Sent from the Apache Spark Developers List mailing list archive at
> > Nabble.com.