[ 
https://issues.apache.org/jira/browse/SPARK-46460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhou Tong updated SPARK-46460:
------------------------------
    Attachment: SPARK-46460.patch

> The filter of partition including cast function may lead the partition 
> pruning to disable
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-46460
>                 URL: https://issues.apache.org/jira/browse/SPARK-46460
>             Project: Spark
>          Issue Type: Improvement
>          Components: Optimizer, SQL
>    Affects Versions: 3.2.0
>            Reporter: Zhou Tong
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: SPARK-46460.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SQL:select * from test_db.test_table where day between 
> date_sub('2023-12-01',1) and  '2023-12-03'
> The Physical Plan of sql above will implement _cast_ function on partition 
> col 'day',  like this, {_}cast(day as date) > 2023-11-30{_}. In this 
> situation, spark just pass the filter condition _day < "2023-12-03"_ to 
> HiveMetastore, not including filter condition {_}cast(day as date) > 
> 2023-11-30{_}, which may lead performance of HMS degarde if the HiveTable has 
> huge number of partitions.
>  
> In this regard, a new rule may solve this problem. This rule can convert 
> binary comparison _cast(day as date) > 2023-11-30_ to {_}day > 
> cast(2023-11-30 as string){_}. The right node is foldable, so the result is 
> {_}day > "2023-11-30"{_}, and the filter condition passed to HMS will be _day 
> > "2023-11-30" and_ _day < "2023-12-03"._
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to