Asif created SPARK-45373:
----------------------------
Summary: Minimizing calls to HiveMetaStore layer for getting
partitions, when tables are repeated
Key: SPARK-45373
URL: https://issues.apache.org/jira/browse/SPARK-45373
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.1
Reporter: Asif
Fix For: 3.5.1
In the rule PruneFileSourcePartitions, where the CatalogFileIndex gets converted
to an InMemoryFileIndex, the HMS calls can get very expensive if:
1) The translated filter string pushed down to the HMS layer becomes empty,
so that all partitions are fetched, and the same table is referenced multiple
times in the query.
2) The same table is referenced multiple times in the query with
different partition filters.
In such cases the current code results in multiple calls to the HMS layer.
This can be avoided by grouping the tables by their CatalogFileIndex,
passing a common minimal filter (filter1 || filter2) to HMS, and obtaining a base
PrunedInMemoryFileIndex from which the index for each specific table reference
can then be derived.
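
The grouping idea can be sketched outside Spark as a small, self-contained Python model. This is only an illustration under stated assumptions, not the proposed patch: FakeMetastore, or_filters and prune_grouped are hypothetical names, partitions are modelled as plain dicts, and a filter of None stands for a filter string that translated to empty. Each table is listed once with the OR of its filters, and each reference then prunes the shared base result locally.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Partition = Dict[str, str]
Predicate = Callable[[Partition], bool]


class FakeMetastore:
    """Hypothetical stand-in for the HMS client; counts partition listings."""

    def __init__(self, tables: Dict[str, List[Partition]]):
        self.tables = tables
        self.calls = 0

    def list_partitions(self, table: str, pred: Optional[Predicate]) -> List[Partition]:
        self.calls += 1
        parts = self.tables[table]
        # pred is None models an empty pushed-down filter: every partition is returned.
        return list(parts) if pred is None else [p for p in parts if pred(p)]


def or_filters(preds: List[Optional[Predicate]]) -> Optional[Predicate]:
    """Common minimal filter (filter1 || filter2 || ...)."""
    # If any reference has no pushable filter, the union must cover all partitions.
    if any(p is None for p in preds):
        return None
    return lambda part: any(p(part) for p in preds)


def prune_grouped(
    ms: FakeMetastore, refs: List[Tuple[str, Optional[Predicate]]]
) -> List[List[Partition]]:
    """Return the pruned partitions per table reference, one metastore call per table."""
    by_table = defaultdict(list)
    for idx, (table, pred) in enumerate(refs):
        by_table[table].append((idx, pred))

    results: List[List[Partition]] = [[] for _ in refs]
    for table, entries in by_table.items():
        # One call per distinct table, using the OR of all its filters.
        base = ms.list_partitions(table, or_filters([p for _, p in entries]))
        for idx, pred in entries:
            # Each reference narrows the shared base result with its own filter.
            results[idx] = list(base) if pred is None else [x for x in base if pred(x)]
    return results
```

With two references to the same table that use different partition filters, this model issues a single list_partitions call instead of two. The trade-off is that the base result may contain more partitions than any single reference needs.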