[jira] [Commented] (SPARK-45387) Partition key filter cannot be pushed down when using cast

2024-03-05 Thread TianyiMa (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17823524#comment-17823524
 ] 

TianyiMa commented on SPARK-45387:
--

[~doki] the output execution plan is the final result, but the problem lies in 
the optimize process.

In your example, the partition key is stringType, but was cast to int to filter 
partitions. The driver will get all the partition to do this filter. If you 
have a hive table with thousands of partitions, this process will very slow and 
costly.

> Partition key filter cannot be pushed down when using cast
> --
>
> Key: SPARK-45387
> URL: https://issues.apache.org/jira/browse/SPARK-45387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0
>Reporter: TianyiMa
>Priority: Critical
> Attachments: PruneFileSourcePartitions.diff
>
>
> Suppose we have a partitioned table `table_pt` with partition colum `dt` 
> which is StringType and the table metadata is managed by Hive Metastore, if 
> we filter partition by dt = '123', this filter can be pushed down to data 
> source, but if the filter condition is number, e.g. dt = 123, that cannot be 
> pushed down to data source, causing spark to pull all of that table's 
> partition meta data to client, which is poor of performance if the table has 
> thousands of partitions and increasing the risk of hive metastore oom.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45387) Partition key filter cannot be pushed down when using cast

2024-02-04 Thread Jie Han (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17814150#comment-17814150
 ] 

Jie Han commented on SPARK-45387:
-

I can't reproduce, can you give me a short reproduction?

> Partition key filter cannot be pushed down when using cast
> --
>
> Key: SPARK-45387
> URL: https://issues.apache.org/jira/browse/SPARK-45387
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1, 3.1.2, 3.3.0, 3.4.0
>Reporter: TianyiMa
>Priority: Critical
> Attachments: PruneFileSourcePartitions.diff
>
>
> Suppose we have a partitioned table `table_pt` with partition colum `dt` 
> which is StringType and the table metadata is managed by Hive Metastore, if 
> we filter partition by dt = '123', this filter can be pushed down to data 
> source, but if the filter condition is number, e.g. dt = 123, that cannot be 
> pushed down to data source, causing spark to pull all of that table's 
> partition meta data to client, which is poor of performance if the table has 
> thousands of partitions and increasing the risk of hive metastore oom.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org