[ https://issues.apache.org/jira/browse/SPARK-20718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhenhua Wang updated SPARK-20718:
---------------------------------
    Description:
Since `constraints` in `QueryPlan` is a set, the order of filters can differ. Usually this is fine because of canonicalization. However, in `FileSourceScanExec`, the data filters and partition filters are sequences, and their order is not canonicalized. As a result, `def sameResult` returns different results for different orderings of the data/partition filters. This leads to, for example, different decisions in `ReuseExchange`, and therefore to unstable performance.

The same issue exists in `HiveTableScanExec`.

  was:
Since `constraints` in `QueryPlan` is a set, the order of filters can differ. Usually this is fine because of canonicalization. However, in `FileSourceScanExec`, the data filters and partition filters are sequences, and their order is not canonicalized. As a result, `def sameResult` returns different results for different orderings of the data/partition filters. This leads to, for example, different decisions in `ReuseExchange`, and therefore to unstable performance.

> FileSourceScanExec with different filter orders should be the same after canonicalization
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-20718
>                 URL: https://issues.apache.org/jira/browse/SPARK-20718
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Zhenhua Wang
>            Assignee: Zhenhua Wang
>             Fix For: 2.2.0
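
To make the problem concrete, here is a minimal sketch that does not use Spark at all. `ToyScan`, `sameResultNaive`, `canonicalize`, and `sameResultCanonical` are hypothetical names invented for illustration, and filters are modeled as plain strings. Sorting the filter lists is only one possible way to obtain an order-independent form; it is an assumption of this sketch, not necessarily what the actual fix does.

```scala
// Toy model only: these are NOT Spark's classes. ToyScan stands in for a scan
// node such as FileSourceScanExec, and filters are plain strings.
object FilterOrderSketch {

  final case class ToyScan(
      table: String,
      dataFilters: Seq[String],
      partitionFilters: Seq[String])

  // Naive comparison: Seq equality is order-sensitive, so the same filters
  // listed in a different order make two scans look different.
  def sameResultNaive(a: ToyScan, b: ToyScan): Boolean = a == b

  // Hypothetical canonical form: sort each filter list so that ordering no
  // longer affects equality. Sorting by string value is an assumption here.
  def canonicalize(scan: ToyScan): ToyScan =
    scan.copy(
      dataFilters = scan.dataFilters.sorted,
      partitionFilters = scan.partitionFilters.sorted)

  // Compare the canonical forms instead of the raw nodes.
  def sameResultCanonical(a: ToyScan, b: ToyScan): Boolean =
    canonicalize(a) == canonicalize(b)

  def main(args: Array[String]): Unit = {
    // The same two data filters, emitted in different orders.
    val left = ToyScan("t", Seq("a > 1", "b < 5"), Seq("p = 1"))
    val right = ToyScan("t", Seq("b < 5", "a > 1"), Seq("p = 1"))

    println(sameResultNaive(left, right))     // false
    println(sameResultCanonical(left, right)) // true
  }
}
```

In this toy setting, a reuse rule keyed on `sameResultNaive` would treat `left` and `right` as distinct and miss the reuse opportunity, while one keyed on `sameResultCanonical` would not.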