GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/20941
[SPARK-23827]

## What changes were proposed in this pull request?

Currently, the `requiredChildDistribution` of stateful operators does not specify the number of partitions. This can cause a subtle corner case where the child's output partitioning is `SinglePartition`, which satisfies the required distribution of `ClusteredDistribution` (with no num-partitions requirement), thus eliminating the shuffle needed to repartition the input data into the required number of partitions (i.e. the same as the state stores). That can lead to "file not found" errors on the state store delta files, as the micro-batch with no shuffle will not run certain tasks and therefore will not generate the expected state store delta files.

This PR adds the required constraint on the number of partitions.

## How was this patch tested?

Modified the test harness to always check that ANY stateful operator has a constraint on the number of partitions. As part of that, the existing opt-in checks on child output partitioning were removed, as they are redundant.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-23827

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20941.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20941

----

commit 02cc5509455d3f9d6d683a46fe4a50fcde8da348
Author: Tathagata Das <tathagata.das1565@...>
Date: 2018-03-29T02:38:59Z

    Fixed join issue

commit 7046fbd5244e5d3adb75b7d090d57f1adc8b9859
Author: Tathagata Das <tathagata.das1565@...>
Date: 2018-03-29T22:32:55Z

    Fix compilation

commit c162f8def7f7f57b9e8b954a5fe2f96368b5ed2f
Author: Tathagata Das <tathagata.das1565@...>
Date: 2018-03-29T23:22:53Z

    Removed unnecessary tests

----
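The corner case can be illustrated with a small, self-contained Scala sketch. This is a hypothetical model of the satisfaction check, not Spark's actual planner code; the names (`ClusteredDistribution`, `SinglePartition`, `satisfies`, `requiredNumPartitions`) merely mirror Spark's internal `Distribution`/`Partitioning` API for illustration. It shows why a distribution with no partition-count requirement lets `SinglePartition` skip the shuffle, while one carrying a requirement does not.

```scala
// Hypothetical, simplified model of distribution/partitioning satisfaction.
// Not Spark source code; used only to illustrate the corner case above.
sealed trait Distribution {
  def requiredNumPartitions: Option[Int]
}

case class ClusteredDistribution(
    clustering: Seq[String],
    requiredNumPartitions: Option[Int] = None) extends Distribution

sealed trait Partitioning {
  def numPartitions: Int
  def satisfies(required: Distribution): Boolean
}

case object SinglePartition extends Partitioning {
  val numPartitions = 1
  // One partition trivially co-locates all rows for any clustering keys,
  // so the only thing that can rule it out is an explicit requirement
  // on the number of partitions.
  def satisfies(required: Distribution): Boolean =
    required.requiredNumPartitions.forall(_ == numPartitions)
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Pre-PR behavior: no num-partitions requirement, so SinglePartition
    // satisfies the distribution and no shuffle is planned -- the bug.
    val noRequirement = ClusteredDistribution(Seq("key"))
    assert(SinglePartition.satisfies(noRequirement))

    // Post-PR behavior: the requirement (e.g. the number of state store
    // partitions) is specified, satisfaction fails, and the planner must
    // insert a shuffle into that many partitions.
    val withRequirement = ClusteredDistribution(Seq("key"), Some(200))
    assert(!SinglePartition.satisfies(withRequirement))

    println("ok")
  }
}
```

With the constraint in place, a `SinglePartition` child no longer short-circuits the exchange, so every micro-batch runs a task per state store partition and produces the expected delta files.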