GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/20941

    SPARK-23827: Require a specific number of partitions in stateful operators' child distribution

    ## What changes were proposed in this pull request?
    
    Currently, `requiredChildDistribution` does not specify the number of 
partitions. This can cause a weird corner case where the child's output 
partitioning is `SinglePartition`, which satisfies the required distribution of 
`ClusteredDistribution` (with no partition-count requirement), thus eliminating 
the shuffle needed to repartition the input data into the required number of 
partitions (i.e. the same number as the state stores). That can lead to 
"file not found" errors on the state store delta files, as a micro-batch with 
no shuffle will not run certain tasks and therefore will not generate the 
expected state store delta files.
    
    This PR adds the required constraint on the number of partitions.
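    The shape of the bug can be sketched with a minimal, self-contained model 
(these are illustrative stand-ins, not Spark's actual `Distribution` and 
`Partitioning` classes): a single-partition child trivially co-locates all rows, 
so it satisfies any clustering requirement unless the distribution also pins the 
partition count.

    ```scala
    // Hypothetical sketch of the satisfies() corner case described above.
    sealed trait Distribution { def requiredNumPartitions: Option[Int] }
    case class ClusteredDistribution(
        clusteringKeys: Seq[String],
        requiredNumPartitions: Option[Int] = None) extends Distribution

    sealed trait Partitioning {
      def numPartitions: Int
      def satisfies(required: Distribution): Boolean
    }
    case object SinglePartition extends Partitioning {
      val numPartitions = 1
      // One partition co-locates all rows, so any clustering is satisfied --
      // unless the distribution also constrains the partition count.
      def satisfies(required: Distribution): Boolean =
        required.requiredNumPartitions.forall(_ == numPartitions)
    }

    object Demo extends App {
      val noConstraint   = ClusteredDistribution(Seq("key"))
      val withConstraint = ClusteredDistribution(Seq("key"), Some(200))

      println(SinglePartition.satisfies(noConstraint))    // shuffle elided (the bug)
      println(SinglePartition.satisfies(withConstraint))  // shuffle forced (the fix)
    }
    ```

    With the partition-count requirement in place, the planner can no longer 
treat a `SinglePartition` child as satisfying the stateful operator's 
distribution, so the repartitioning shuffle is always inserted.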
    
    ## How was this patch tested?
    Modified the test harness to always check that ANY stateful operator has 
a constraint on the number of partitions. As part of that, the existing 
opt-in checks on child output partitioning were removed, as they are now 
redundant.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-23827

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20941.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20941
    
----
commit 02cc5509455d3f9d6d683a46fe4a50fcde8da348
Author: Tathagata Das <tathagata.das1565@...>
Date:   2018-03-29T02:38:59Z

    Fixed join issue

commit 7046fbd5244e5d3adb75b7d090d57f1adc8b9859
Author: Tathagata Das <tathagata.das1565@...>
Date:   2018-03-29T22:32:55Z

    Fix compilation

commit c162f8def7f7f57b9e8b954a5fe2f96368b5ed2f
Author: Tathagata Das <tathagata.das1565@...>
Date:   2018-03-29T23:22:53Z

    Removed unnecessary tests

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org