[ https://issues.apache.org/jira/browse/HUDI-4812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu closed HUDI-4812.
----------------------------
    Resolution: Done

> Lazy partition listing and file groups fetching in Spark Query
> --------------------------------------------------------------
>
>                 Key: HUDI-4812
>                 URL: https://issues.apache.org/jira/browse/HUDI-4812
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark
>            Reporter: Yuwei Xiao
>            Assignee: Yuwei Xiao
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>
> In the current Spark query implementation, the FileIndex refreshes and loads
> all file groups into its cache in order to serve subsequent queries.
>
> For a large table with many partitions, this can introduce significant
> overhead at initialization. Meanwhile, the query itself may come with a
> partition filter, in which case loading every file group is unnecessary.
>
> To optimize this, the whole refresh logic will become lazy: the actual work
> will be carried out only after the partition filter has been applied.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
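A minimal sketch of the lazy refresh idea from the issue description, written in Java. The class `LazyFileIndexSketch` and its `partitionLister` / `fileGroupLoader` callbacks are hypothetical stand-ins, not Hudi's actual FileIndex API; partition paths and file group IDs are plain strings purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a lazily refreshed file index (not Hudi's real classes).
 * Nothing is listed at construction time; partition paths are listed on first use,
 * and file groups are loaded only for partitions that pass the partition filter.
 */
public class LazyFileIndexSketch {

    private final Supplier<List<String>> partitionLister;          // hypothetical: lists all partition paths
    private final Function<String, List<String>> fileGroupLoader;  // hypothetical: loads file groups of one partition
    private List<String> cachedPartitions;                          // null until the first listing
    private final Map<String, List<String>> cachedFileGroups = new HashMap<>();

    public LazyFileIndexSketch(Supplier<List<String>> partitionLister,
                               Function<String, List<String>> fileGroupLoader) {
        this.partitionLister = partitionLister;
        this.fileGroupLoader = fileGroupLoader;
    }

    /** Lists partition paths once, on first demand, and reuses the result afterwards. */
    private synchronized List<String> partitions() {
        if (cachedPartitions == null) {
            cachedPartitions = partitionLister.get();
        }
        return cachedPartitions;
    }

    /** Applies the partition filter first, then loads and caches file groups only for matching partitions. */
    public synchronized Map<String, List<String>> listFiles(Predicate<String> partitionFilter) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String partition : partitions()) {
            if (partitionFilter.test(partition)) {
                result.put(partition, cachedFileGroups.computeIfAbsent(partition, fileGroupLoader));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> loaded = new ArrayList<>();
        LazyFileIndexSketch index = new LazyFileIndexSketch(
            () -> List.of("dt=2022-09-01", "dt=2022-09-02", "dt=2022-09-03"),
            partition -> {
                loaded.add(partition); // record which partitions actually had file groups loaded
                return List.of(partition + "/fg-1");
            });
        Map<String, List<String>> files = index.listFiles(p -> p.equals("dt=2022-09-02"));
        System.out.println(files);  // {dt=2022-09-02=[dt=2022-09-02/fg-1]}
        System.out.println(loaded); // [dt=2022-09-02]
    }
}
```

Because listing is deferred to the first `listFiles` call and file groups are cached per partition, a query whose filter selects a single partition pays the file-group loading cost for that partition only, and repeated queries reuse the cache.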