[ https://issues.apache.org/jira/browse/SPARK-46460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhou Tong updated SPARK-46460: ------------------------------ Attachment: SPARK-46460.patch > The filter of partition including cast function may lead the partition > pruning to disable > ----------------------------------------------------------------------------------------- > > Key: SPARK-46460 > URL: https://issues.apache.org/jira/browse/SPARK-46460 > Project: Spark > Issue Type: Improvement > Components: Optimizer, SQL > Affects Versions: 3.2.0 > Reporter: Zhou Tong > Priority: Minor > Labels: pull-request-available > Attachments: SPARK-46460.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > SQL:select * from test_db.test_table where day between > date_sub('2023-12-01',1) and '2023-12-03' > The Physical Plan of sql above will implement _cast_ function on partition > col 'day', like this, {_}cast(day as date) > 2023-11-30{_}. In this > situation, spark just pass the filter condition _day < "2023-12-03"_ to > HiveMetastore, not including filter condition {_}cast(day as date) > > 2023-11-30{_}, which may lead performance of HMS degarde if the HiveTable has > huge number of partitions. > > In this regard, a new rule may solve this problem. This rule can convert > binary comparison _cast(day as date) > 2023-11-30_ to {_}day > > cast(2023-11-30 as string){_}. The right node is foldable, so the result is > {_}day > "2023-11-30"{_}, and the filter condition passed to HMS will be _day > > "2023-11-30" and_ _day < "2023-12-03"._ > > -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org