My job is not being fault-tolerant (e.g., when there's a fetch failure or
something).

The lineage of RDDs are constantly updated every iteration. However, I
think that when there's a failure, the lineage information is not being
correctly reapplied.

It goes something like this:

val rawRDD = read(...)
val repartRDD = rawRDD.repartition(X)

val tx1 = repartRDD.map(...)
var tx2 = tx1.map(...)

while (...) {
  tx2 = tx1.zip(tx2).map(...)
}


Is there any way to monitor RDD's lineage, maybe even including? I want to
make sure that there's no unexpected things happening.

Reply via email to