[ 
https://issues.apache.org/jira/browse/SPARK-36128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17380299#comment-17380299
 ] 

Chao Sun edited comment on SPARK-36128 at 7/14/21, 4:24 AM:
------------------------------------------------------------

[~hyukjin.kwon] you are right - I didn't know this config was designed to be 
used only by Hive table scans, but this poses a few issues:
1. By default, data source tables also manage their partitions through HMS, via 
the config {{spark.sql.hive.manageFilesourcePartitions}}. That config's 
description also says 
"When partition management is enabled, datasource tables store partition in the 
Hive metastore, and use the metastore to prune partitions during query 
planning", so it sounds like they should have the same partition pruning 
mechanism as Hive tables, including the flag.
2. There is effectively only one implementation of {{ExternalCatalog}}, which 
is HMS, so I'm not sure why we treat Hive table scans differently from data 
source table scans, even though both read partition metadata from HMS.



> CatalogFileIndex.filterPartitions should respect 
> spark.sql.hive.metastorePartitionPruning
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-36128
>                 URL: https://issues.apache.org/jira/browse/SPARK-36128
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Chao Sun
>            Priority: Major
>
> Currently the config {{spark.sql.hive.metastorePartitionPruning}} is used 
> only in {{PruneHiveTablePartitions}} and not in 
> {{PruneFileSourcePartitions}}. The latter calls 
> {{CatalogFileIndex.filterPartitions}}, which calls 
> {{listPartitionsByFilter}} regardless of whether the above config is set.
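
To make the asymmetry in the description concrete, here is a toy model in plain Python. It is not Spark code: {{ToyMetastore}}, {{hive_table_scan}} and {{file_source_scan}} are hypothetical names made up for illustration, and the flag-off branch (list every partition, then filter on the client) is an assumption about what the Hive path does when the config is disabled.

```python
from dataclasses import dataclass, field

PRUNING_KEY = "spark.sql.hive.metastorePartitionPruning"


@dataclass
class ToyMetastore:
    """Hypothetical stand-in for HMS that records which listing call was made."""
    partitions: list
    calls: list = field(default_factory=list)

    def list_partitions(self):
        self.calls.append("listPartitions")
        return list(self.partitions)

    def list_partitions_by_filter(self, predicate):
        self.calls.append("listPartitionsByFilter")
        return [p for p in self.partitions if predicate(p)]


def hive_table_scan(metastore, conf, predicate):
    """Models the Hive table path: the flag decides where filtering happens."""
    if conf[PRUNING_KEY]:
        return metastore.list_partitions_by_filter(predicate)
    # Assumed flag-off behavior: fetch everything, then filter client-side.
    return [p for p in metastore.list_partitions() if predicate(p)]


def file_source_scan(metastore, conf, predicate):
    """Models the reported behavior: the flag in conf is never consulted."""
    return metastore.list_partitions_by_filter(predicate)


parts = [{"dt": "2021-07-13"}, {"dt": "2021-07-14"}]
wanted = lambda p: p["dt"] == "2021-07-14"
conf = {PRUNING_KEY: False}  # pruning in the metastore explicitly disabled

hive_ms = ToyMetastore(parts)
hive_result = hive_table_scan(hive_ms, conf, wanted)

file_ms = ToyMetastore(parts)
file_result = file_source_scan(file_ms, conf, wanted)

print(hive_ms.calls)  # which listing call the Hive path made
print(file_ms.calls)  # which listing call the file source path made
```

Both paths return the same partitions; the only difference the model shows is that the file source path still pushes the filter to the metastore with the flag set to false.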


