suryaprasanna opened a new pull request, #18265:
URL: https://github.com/apache/hudi/pull/18265
### Describe the issue this Pull Request addresses
When the metadata table is disabled or corrupted, partition listing falls
back to recursive filesystem listing, which can be expensive. This PR adds a
catalog-backed path that fetches partition information directly from the Spark
external catalog, avoiding the recursive calls and reducing listing latency.
### Summary and Changelog
Users get faster partition listing when the metadata table is
unavailable. Changes in this PR:
- Added CatalogBackedTableMetadata class that fetches partitions from
Spark's external catalog
- Added FILE_INDEX_PARTITION_LISTING_VIA_CATALOG config to enable
catalog-based partition listing
- Modified SparkHoodieTableFileIndex to use catalog-backed metadata when
metadata table is not available
- Added PartitionPathFilterUtil for partition path filtering logic
- Refactored BaseHoodieTableFileIndex.createMetadataTable() to be
overridable
- Added comprehensive unit tests in TestCatalogBackedTableMetadata
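
To illustrate why a catalog lookup can avoid a recursive walk, here is a minimal sketch of turning catalog partition locations into partition paths relative to the table base path. It is not the code in this PR: the class and method names are hypothetical, and it assumes the catalog returns one location URI per partition, all under the table base path.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not CatalogBackedTableMetadata itself.
final class CatalogPartitionPathsSketch {

    // Converts absolute partition locations (as a catalog might report them)
    // into paths relative to the table base path, with no filesystem calls.
    static List<String> relativePartitionPaths(URI basePath, List<URI> partitionLocations) {
        List<String> result = new ArrayList<>();
        for (URI location : partitionLocations) {
            URI relative = basePath.relativize(location);
            // relativize() returns the input unchanged when it is not under basePath;
            // such a URI still carries its scheme, so isAbsolute() detects it.
            if (relative.isAbsolute()) {
                continue; // assumption: ignore partitions stored outside the table
            }
            String path = relative.getPath();
            if (path.endsWith("/")) {
                path = path.substring(0, path.length() - 1);
            }
            result.add(path);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> paths = relativePartitionPaths(
            URI.create("file:/data/tbl"),
            List.of(URI.create("file:/data/tbl/year=2024/month=01"),
                    URI.create("file:/data/tbl/year=2024/month=02/")));
        System.out.println(paths);
    }
}
```

The cost here depends on the number of partitions the catalog returns, not on the depth or size of the directory tree.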
### Impact
- Performance: Reduced latency for partition listing when the metadata table
is disabled, by avoiding recursive filesystem queries
- API Change: Added new config option
FILE_INDEX_PARTITION_LISTING_VIA_CATALOG (default: false)
- Behavior: When the config is enabled and the metadata table is unavailable,
partitions are fetched from the catalog instead of the filesystem
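
The behavior described above can be read as a small decision table. The sketch below encodes that reading; the enum, class, and parameter names are hypothetical and are not taken from the Hudi codebase, and it assumes the metadata table is still preferred whenever it is available.

```java
// Hypothetical sketch of the selection logic described in this PR.
final class PartitionListingSketch {

    enum Source { METADATA_TABLE, CATALOG, FILESYSTEM }

    // listViaCatalog stands in for the value of
    // FILE_INDEX_PARTITION_LISTING_VIA_CATALOG (default: false).
    static Source chooseSource(boolean metadataTableAvailable, boolean listViaCatalog) {
        if (metadataTableAvailable) {
            return Source.METADATA_TABLE; // assumption: unchanged by this PR
        }
        // Metadata table disabled or unusable: use the catalog only when opted in.
        return listViaCatalog ? Source.CATALOG : Source.FILESYSTEM;
    }

    public static void main(String[] args) {
        for (boolean mdt : new boolean[] {true, false}) {
            for (boolean flag : new boolean[] {true, false}) {
                System.out.println("metadataTable=" + mdt + " viaCatalog=" + flag
                    + " -> " + chooseSource(mdt, flag));
            }
        }
    }
}
```

With the default (flag off), the result never selects the catalog, which matches the stated low risk level.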
### Risk Level
Low - The feature is behind a config flag (disabled by default). Unit tests
verify the catalog-based partition listing behavior. When the config is
disabled, listing falls back to the existing filesystem-based approach.
### Documentation Update
Config documentation needs to be updated to include the new
FILE_INDEX_PARTITION_LISTING_VIA_CATALOG option and to describe when to
enable catalog-based partition listing as a performance optimization.
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable