Tathagata Das created SPARK-23827:
-------------------------------------

             Summary: StreamingJoinExec should ensure that input data is 
partitioned into specific number of partitions
                 Key: SPARK-23827
                 URL: https://issues.apache.org/jira/browse/SPARK-23827
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 2.3.0
            Reporter: Tathagata Das
            Assignee: Tathagata Das


Currently, the requiredChildDistribution does not specify the partitions. This 
can cause the weird corner cases where the child's distribution is 
`SinglePartition` which satisfies the required distribution of 
`ClusterDistribution(no-num-partition-requirement)`, thus eliminating the 
shuffle needed to repartition input data into the required number of partitions 
(i.e. same as state stores). That can lead to "file not found" errors on the 
state store delta files as the micro-batch-with-no-shuffle will not run certain 
tasks and therefore not generate the expected state store delta files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to