Github user markhamstra commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22112#discussion_r213061324

    --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
    @@ -1918,3 +1980,19 @@ object RDD {
         new DoubleRDDFunctions(rdd.map(x => num.toDouble(x)))
       }
     }
    +
    +/**
    + * The random level of RDD's output (i.e. what `RDD#compute` returns), which indicates how the
    + * output will diff when Spark reruns the tasks for the RDD. There are 3 random levels, ordered
    + * by the randomness from low to high:
    --- End diff --

Again, please remove "random" and "randomness". The issue is not randomness, but rather determinism. For example, the output of `RDD#compute` could be completely non-random but still dependent on state not contained in the RDD. That would still make it problematic in terms of recomputing only some partitions and aggregating the results.
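To make the distinction concrete, here is a minimal standalone Scala sketch (not Spark code; `compute`, `externalCounter`, and `DeterminismSketch` are hypothetical names chosen for illustration). It shows output that contains no randomness at all, yet still differs across reruns because it depends on state not contained in the computation's inputs, which is exactly the recompute-and-aggregate hazard described above:

```scala
// A deliberately non-random computation that is nevertheless
// nondeterministic across reruns, because it reads mutable state
// that lives outside its inputs (analogous to state not contained
// in an RDD).
object DeterminismSketch {
  // External mutable state, not part of the computation's inputs.
  var externalCounter: Int = 0

  // No random number generation anywhere...
  def compute(partition: Seq[Int]): Seq[Int] = {
    externalCounter += 1
    // ...yet the result depends on how many times we've been called.
    partition.map(_ + externalCounter)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1, 2, 3)
    val first = compute(data) // Seq(2, 3, 4)
    val rerun = compute(data) // Seq(3, 4, 5): a "recompute" disagrees
    // Same input, different output: nondeterministic without randomness.
    println(first == rerun)
  }
}
```

Rerunning only some partitions of such a computation and merging the results with earlier output would silently mix values from different `externalCounter` states, which is why the doc comment should speak of determinism rather than randomness.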