suryaprasanna opened a new pull request, #18265:
URL: https://github.com/apache/hudi/pull/18265

   ### Describe the issue this Pull Request addresses
   
   When the metadata table is disabled or corrupted, partition listing 
operations can result in expensive recursive filesystem queries. This PR 
introduces a catalog-backed approach that fetches partition information 
directly from the Spark external catalog, avoiding recursive calls and 
improving query performance.
   
   ### Summary and Changelog
   
   Users gain improved performance for partition listing operations when 
the metadata table is unavailable. Changes in this PR:
   
    - Added CatalogBackedTableMetadata class that fetches partitions from 
Spark's external catalog
    - Added FILE_INDEX_PARTITION_LISTING_VIA_CATALOG config to enable 
catalog-based partition listing
    - Modified SparkHoodieTableFileIndex to use catalog-backed metadata when 
the metadata table is not available
    - Added PartitionPathFilterUtil for partition path filtering logic
    - Refactored BaseHoodieTableFileIndex.createMetadataTable() to be 
overridable
    - Added comprehensive unit tests in TestCatalogBackedTableMetadata
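
The overridable factory hook described above could look roughly like the 
following. This is a minimal sketch, not the code in this PR: 
`PartitionSource`, `DefaultPartitionSource`, `CatalogPartitionSource`, 
`BaseFileIndexSketch`, and `SparkFileIndexSketch` are hypothetical stand-ins, 
and the partition values are placeholders.

```java
import java.util.List;

// Hypothetical stand-in for the metadata abstraction; not Hudi's actual interface.
interface PartitionSource {
    List<String> listPartitionPaths();
}

// Stand-in for the existing behavior (metadata table or filesystem listing).
class DefaultPartitionSource implements PartitionSource {
    @Override
    public List<String> listPartitionPaths() {
        return List.of("default/2024-01-01", "default/2024-01-02");
    }
}

// Stand-in for a source that asks an external catalog for partitions.
class CatalogPartitionSource implements PartitionSource {
    @Override
    public List<String> listPartitionPaths() {
        return List.of("catalog/2024-01-01", "catalog/2024-01-02");
    }
}

class BaseFileIndexSketch {
    // Overridable factory hook: subclasses may substitute another source.
    protected PartitionSource createPartitionSource() {
        return new DefaultPartitionSource();
    }

    public List<String> listPartitions() {
        return createPartitionSource().listPartitionPaths();
    }
}

class SparkFileIndexSketch extends BaseFileIndexSketch {
    private final boolean metadataTableAvailable;
    private final boolean listViaCatalog; // assumed to mirror the new config flag

    SparkFileIndexSketch(boolean metadataTableAvailable, boolean listViaCatalog) {
        this.metadataTableAvailable = metadataTableAvailable;
        this.listViaCatalog = listViaCatalog;
    }

    @Override
    protected PartitionSource createPartitionSource() {
        // Use the catalog only when the flag is on and the metadata table is unavailable.
        if (!metadataTableAvailable && listViaCatalog) {
            return new CatalogPartitionSource();
        }
        return super.createPartitionSource();
    }
}
```

In this sketch, `new SparkFileIndexSketch(false, true).listPartitions()` 
returns the catalog placeholders, while every other flag combination falls 
through to the default source.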
   
   ### Impact
   
    - Performance: Reduced latency for partition listing when the metadata 
table is disabled, by avoiding recursive filesystem queries
    - API Change: Added new config option 
FILE_INDEX_PARTITION_LISTING_VIA_CATALOG (default: false)
    - Behavior: When the config is enabled and the metadata table is 
unavailable, partitions are fetched from the catalog instead of the filesystem
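
To illustrate why recursive discovery is costly, the sketch below counts 
listing calls over a small in-memory directory tree. It is an illustration 
only: `ListingCostSketch` and the sample tree are hypothetical, a directory 
with no children is assumed to be a partition, and real filesystem and 
catalog calls are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ListingCostSketch {
    // In-memory tree: directory -> child directories; leaves have no entry.
    private final Map<String, List<String>> tree;
    int listCalls = 0;

    ListingCostSketch(Map<String, List<String>> tree) {
        this.tree = tree;
    }

    // Each call stands in for one filesystem listing request.
    private List<String> listChildren(String dir) {
        listCalls++;
        return tree.getOrDefault(dir, List.of());
    }

    // Recursive discovery: a directory with no children is treated as a partition.
    List<String> listRecursively(String dir) {
        List<String> children = listChildren(dir);
        List<String> partitions = new ArrayList<>();
        if (children.isEmpty()) {
            partitions.add(dir);
            return partitions;
        }
        for (String child : children) {
            partitions.addAll(listRecursively(child));
        }
        return partitions;
    }

    // Sample two-level layout with three leaf partitions.
    static Map<String, List<String>> sampleTree() {
        return Map.of(
            "tbl", List.of("tbl/y=2023", "tbl/y=2024"),
            "tbl/y=2023", List.of("tbl/y=2023/m=12"),
            "tbl/y=2024", List.of("tbl/y=2024/m=01", "tbl/y=2024/m=02"));
    }
}
```

For the sample tree, finding the three partitions takes six listing calls, 
one per directory visited. A catalog that already stores the partition list 
can answer the same question without walking the tree.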
   
   ### Risk Level
   
   Low - The feature is behind a config flag (disabled by default). Extensive 
unit tests verify catalog-based partition listing behavior. Listing falls back 
to the existing filesystem-based approach when the config is disabled.
   
   ### Documentation Update
   
   Config documentation needs to be updated to include the new 
FILE_INDEX_PARTITION_LISTING_VIA_CATALOG option and to describe when to enable 
catalog-based partition listing for performance optimization.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.