Hi, I'm quite interested in how Spark's fault tolerance works and I'd like to ask a question here.
According to the paper, there are two kinds of dependencies: wide and narrow. My understanding is that if the operations I use are all "narrow", then when one machine crashes, the system only needs to recover the lost RDD partitions from the most recent checkpoint. However, if all transformations are "wide" (e.g., in calculating PageRank), then when one node crashes, all the other nodes need to roll back to the most recent checkpoint. Is my understanding correct?

Thanks!

Best Regards,
Fan
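To make the question concrete, here is a toy sketch (plain Python, not Spark's actual API; `parents_needed` is a hypothetical helper I made up) of the difference I'm asking about: under a narrow dependency a lost output partition depends on a single parent partition, while under a wide dependency it depends on every parent partition.

```python
# Toy model of lineage-based recovery, NOT real Spark code.
# Each output partition records which parent partitions it needs.

def parents_needed(dep_kind, lost_partition, num_parent_partitions):
    """Return the set of parent partitions that must be available
    (or recomputed) to rebuild one lost output partition."""
    if dep_kind == "narrow":
        # e.g. map/filter: each output partition reads one parent partition
        return {lost_partition}
    else:
        # "wide", e.g. groupByKey: a shuffle reads from all parent partitions
        return set(range(num_parent_partitions))

# Losing partition 2 of a 4-partition RDD:
print(parents_needed("narrow", 2, 4))  # only parent partition 2 is involved
print(parents_needed("wide", 2, 4))    # all four parent partitions are involved
```

If this toy model matches Spark's behavior, that would explain why wide dependencies make a single failure much more expensive to recover from.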