[ https://issues.apache.org/jira/browse/HUDI-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-4812:
---------------------------------
    Labels: pull-request-available  (was: )

> Lazy partition listing and file groups fetching in Spark Query
> --------------------------------------------------------------
>
>                 Key: HUDI-4812
>                 URL: https://issues.apache.org/jira/browse/HUDI-4812
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Yuwei Xiao
>            Assignee: Yuwei Xiao
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.1
>
>
> In the current Spark query implementation, the FileIndex refreshes and loads
> all file groups into the cache in order to serve subsequent queries.
>
> For a large table with many partitions, this may introduce significant
> overhead during initialization. Moreover, the query itself may come with a
> partition filter, in which case loading all file groups is unnecessary.
>
> To optimize this, the whole refresh logic will become lazy: the actual work
> will be carried out only after the partition filter has been applied.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
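
The lazy refresh described in the issue can be illustrated with a minimal sketch. This is not Hudi's actual FileIndex code: the class `LazyFileIndex`, the `list_file_groups` callback, and the `dt=...` partition names are all hypothetical and chosen only for illustration. The sketch covers only the file-group side of the idea (the partition list is still passed in up front) and shows that the expensive listing runs only for partitions that survive the filter.

```python
from typing import Callable, Dict, List


class LazyFileIndex:
    """Toy index (hypothetical) that loads file groups per partition on demand."""

    def __init__(self, partitions: List[str],
                 list_file_groups: Callable[[str], List[str]]):
        self._partitions = partitions
        # Hypothetical stand-in for an expensive storage listing call.
        self._list_file_groups = list_file_groups
        # partition -> file groups that have been loaded so far
        self._cache: Dict[str, List[str]] = {}

    def file_groups(self, partition_filter: Callable[[str], bool]) -> Dict[str, List[str]]:
        # Prune partitions first, then load file groups only for the survivors.
        selected = [p for p in self._partitions if partition_filter(p)]
        for p in selected:
            if p not in self._cache:
                self._cache[p] = self._list_file_groups(p)
        return {p: self._cache[p] for p in selected}


listed: List[str] = []  # records which partitions were actually listed


def fake_listing(partition: str) -> List[str]:
    # Assumed listing function; a real one would hit the file system.
    listed.append(partition)
    return [f"{partition}/fg-0", f"{partition}/fg-1"]


index = LazyFileIndex([f"dt=2022-09-{d:02d}" for d in range(1, 31)], fake_listing)
result = index.file_groups(lambda p: p == "dt=2022-09-07")
```

With 30 partitions and a filter matching one of them, `fake_listing` is invoked once instead of 30 times, and a repeated query on the same partition is served from the cache. An eager index would instead pay for all 30 listings at construction time.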