[ https://issues.apache.org/jira/browse/SPARK-45373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Asif updated SPARK-45373:
-------------------------
    Shepherd: Wenchen Fan

> Minimizing calls to HiveMetaStore layer for getting partitions, when tables are repeated
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-45373
>                 URL: https://issues.apache.org/jira/browse/SPARK-45373
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Asif
>            Priority: Major
>              Labels: pull-request-available
>
> In the rule PruneFileSourcePartitions, where the CatalogFileIndex gets
> converted to an InMemoryFileIndex, the HMS calls can get very expensive if:
> 1) The translated filter string for push down to the HMS layer becomes empty,
> resulting in fetching of all partitions, and the same table is referenced
> multiple times in the query.
> 2) The same table is referenced multiple times in the query with different
> partition filters.
> In such cases the current code results in multiple calls to the HMS layer.
> This can be avoided by grouping the tables based on CatalogFileIndex,
> passing a common minimum filter (filter1 || filter2), and getting a base
> PrunedInMemoryFileIndex which can become the basis for each of the specific
> tables.
> Opened the following PR for the ticket:
> [SPARK-45373-PR|https://github.com/apache/spark/pull/43183]
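
The grouping idea described in the ticket can be illustrated with a small, self-contained Python sketch. This is not the code from the PR and does not use any Spark or Hive API: `FakeMetastore`, `list_partitions`, and `prune_grouped` are hypothetical names invented for the illustration, and partition filters are modelled as plain Python predicates. It only shows the shape of the technique under those assumptions: group references by table, fetch once with the OR of their filters, then apply each reference's own filter locally.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Partition = Dict[str, str]
Predicate = Callable[[Partition], bool]


class FakeMetastore:
    """Hypothetical stand-in for the HMS client; it counts round trips."""

    def __init__(self, tables: Dict[str, List[Partition]]) -> None:
        self.tables = tables
        self.calls = 0

    def list_partitions(self, table: str, predicate: Predicate) -> List[Partition]:
        self.calls += 1  # each call models one expensive HMS request
        return [p for p in self.tables[table] if predicate(p)]


def prune_grouped(
    metastore: FakeMetastore, references: List[Tuple[str, Predicate]]
) -> List[List[Partition]]:
    """Return the pruned partitions for each (table, filter) reference,
    issuing one metastore call per distinct table."""
    by_table: Dict[str, List[Tuple[int, Predicate]]] = defaultdict(list)
    for idx, (table, pred) in enumerate(references):
        by_table[table].append((idx, pred))

    result: List[List[Partition]] = [[] for _ in references]
    for table, refs in by_table.items():
        preds = [pred for _, pred in refs]

        # Common minimum filter: filter1 || filter2 || ...
        def combined(p: Partition, preds: List[Predicate] = preds) -> bool:
            return any(f(p) for f in preds)

        base = metastore.list_partitions(table, combined)  # single fetch
        for idx, pred in refs:
            # Narrow the shared base result down to this reference's filter.
            result[idx] = [p for p in base if pred(p)]
    return result


hms = FakeMetastore(
    {"sales": [{"year": "2021"}, {"year": "2022"}, {"year": "2023"}]}
)
refs = [
    ("sales", lambda p: p["year"] == "2021"),
    ("sales", lambda p: p["year"] == "2023"),
]
pruned = prune_grouped(hms, refs)
```

In this toy model a reference whose filter could not be translated corresponds to a predicate that always returns True; the combined filter then matches every partition, so all partitions are still fetched, but only once per table rather than once per reference.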