Github user mridulm commented on the issue: https://github.com/apache/spark/pull/22112

I am not sure what the definition of `isIdempotent` is here. For example, from `MapPartitionsRDD`:

```
override private[spark] def isIdempotent = {
  if (inputOrderSensitive) {
    prev.isIdempotent
  } else {
    true
  }
}
```

Consider: `val rdd1 = rdd.groupBy().map(...).repartition(...).filter(...)`. By the definition above, `rdd1` would be idempotent. Depending on what the definition of idempotent is (partition level, record level, etc.), this can be correct or wrong code.

Similarly, I am not sure why idempotency or ordering depends on `Partitioner`. IMO we should traverse the dependency graph and rely on how `ShuffledRDD` is configured: whether a key ordering is specified (this applies to both global sort and per-partition sort), whether it is from a checkpoint or marked for checkpoint, whether it is from a stable input source, etc.
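A rough sketch of the kind of graph traversal suggested above, written against a toy lineage model rather than Spark's own classes. Every name here (`Node`, `Source`, `Narrow`, `Shuffle`, `Checkpointed`, `Lineage`) is hypothetical and exists only for illustration. The sketch also assumes that a key ordering makes the post-shuffle record order repeatable, which ignores ties between equal keys.

```scala
// Toy lineage model. None of these types are Spark classes.
sealed trait Node
// Leaf input; `stable` means re-reading yields the same records in the same order.
final case class Source(stable: Boolean) extends Node
// Narrow transformation such as map or filter.
final case class Narrow(parent: Node) extends Node
// Shuffle boundary.
//   keyOrdering:    fetched records are sorted by key (assumed to fix their order).
//   orderSensitive: record placement depends on input order (e.g. round-robin repartition).
final case class Shuffle(parent: Node, keyOrdering: Boolean, orderSensitive: Boolean) extends Node
// Data read back from a checkpoint; recomputation never reaches the parent.
final case class Checkpointed(parent: Node) extends Node

object Lineage {
  // Does recomputing a partition yield the same records in the same order?
  def orderStable(n: Node): Boolean = n match {
    case Source(stable)             => stable
    case Checkpointed(_)            => true
    case Narrow(p)                  => orderStable(p)
    case Shuffle(p, keyOrdering, _) => keyOrdering && contentStable(p)
  }

  // Does recomputing a partition yield the same set of records, ignoring order?
  def contentStable(n: Node): Boolean = n match {
    case Source(stable)  => stable
    case Checkpointed(_) => true
    case Narrow(p)       => contentStable(p)
    case Shuffle(p, _, orderSensitive) =>
      // An order-sensitive shuffle only places records repeatably
      // if its input order is itself repeatable.
      if (orderSensitive) orderStable(p) else contentStable(p)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val input = Source(stable = true)

    // rdd.groupBy().map(...).repartition(...).filter(...)
    val grouped = Shuffle(input, keyOrdering = false, orderSensitive = false)
    val rdd1 = Narrow(Shuffle(Narrow(grouped), keyOrdering = false, orderSensitive = true))
    println(Lineage.contentStable(rdd1)) // false

    // Same chain, but the first shuffle specifies a key ordering.
    val sorted = Shuffle(input, keyOrdering = true, orderSensitive = false)
    val rdd2 = Narrow(Shuffle(Narrow(sorted), keyOrdering = false, orderSensitive = true))
    println(Lineage.contentStable(rdd2)) // true

    // Same chain, but checkpointed before the repartition.
    val rdd3 = Narrow(Shuffle(Checkpointed(Narrow(grouped)), keyOrdering = false, orderSensitive = true))
    println(Lineage.contentStable(rdd3)) // true
  }
}
```

In this model the result is decided by the shuffle configuration, checkpoints, and input stability found along the lineage. The final `filter` being insensitive to input order does not by itself make `rdd1` repeatable at the partition level.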