alamb opened a new issue, #17001: URL: https://github.com/apache/datafusion/issues/17001
### Is your feature request related to a problem or challenge?

@nuno-faria implemented the core Parquet metadata caching logic in the following PR:

- https://github.com/apache/datafusion/pull/16971

However, as implemented there, the cache has no bound on the amount of memory it can use, which will result in a "leak" over time (i.e., memory usage only ever goes up, never down).

### Describe the solution you'd like

I would like the cache to have an upper memory limit so that people can turn it on and off and its resource use is capped.

### Describe alternatives you've considered

I personally recommend:

1. Adding another [Runtime Configuration Setting](https://datafusion.apache.org/user-guide/configs.html#runtime-configuration-settings), `datafusion.runtime.file_metadata_cache_limit`, with the same interface as `datafusion.runtime.memory_limit`
2. Implementing a basic LRU strategy for the cache (when the limit is exceeded, evict the least recently used elements until there is space)
3. Adding tests for the above

You can get the memory usage of `ParquetMetaData` using the following API: https://docs.rs/parquet/latest/parquet/file/metadata/struct.ParquetMetaData.html#method.memory_size

Some care will be needed to make this work with the traits (e.g., you may have to change `FileMetadata` into a `trait`).

### Additional context

_No response_
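As a rough illustration of the LRU recommendation and the note about turning `FileMetadata` into a trait, here is a minimal Rust sketch of a memory-bounded LRU cache. It is not DataFusion's implementation: `MemorySized`, `BoundedLruCache`, and `FakeMetadata` are hypothetical names, and in a real version the per-entry size would presumably come from something like `ParquetMetaData::memory_size` behind whatever trait replaces `FileMetadata`.

```rust
use std::collections::{BTreeMap, HashMap};
use std::hash::Hash;

/// Hypothetical trait: a cached value that can report its approximate heap size.
/// For Parquet this could delegate to `ParquetMetaData::memory_size`.
pub trait MemorySized {
    fn memory_size(&self) -> usize;
}

/// Illustrative memory-bounded LRU cache (not an existing DataFusion type).
pub struct BoundedLruCache<K, V> {
    limit: usize,                         // maximum total bytes of cached values
    used: usize,                          // bytes currently held
    tick: u64,                            // monotonically increasing access counter
    entries: HashMap<K, (V, usize, u64)>, // key -> (value, size, last-access tick)
    recency: BTreeMap<u64, K>,            // last-access tick -> key, oldest first
}

impl<K: Eq + Hash + Clone, V: MemorySized> BoundedLruCache<K, V> {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: 0, tick: 0, entries: HashMap::new(), recency: BTreeMap::new() }
    }

    pub fn memory_used(&self) -> usize {
        self.used
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    fn next_tick(&mut self) -> u64 {
        self.tick += 1;
        self.tick
    }

    /// Looks up `key` and, on a hit, marks it as most recently used.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        let tick = self.next_tick();
        let entry = self.entries.get_mut(key)?;
        self.recency.remove(&entry.2);
        entry.2 = tick;
        self.recency.insert(tick, key.clone());
        Some(&entry.0)
    }

    /// Removes `key`, returning its value and releasing its bytes.
    pub fn remove(&mut self, key: &K) -> Option<V> {
        let (value, size, tick) = self.entries.remove(key)?;
        self.recency.remove(&tick);
        self.used -= size;
        Some(value)
    }

    /// Inserts `value`, then evicts least recently used entries until the total
    /// fits within `limit`. A value larger than `limit` is not cached at all.
    pub fn put(&mut self, key: K, value: V) -> bool {
        let size = value.memory_size();
        if size > self.limit {
            return false;
        }
        self.remove(&key); // replacing an entry releases its old size first
        let tick = self.next_tick();
        self.recency.insert(tick, key.clone());
        self.entries.insert(key, (value, size, tick));
        self.used += size;
        while self.used > self.limit {
            // The new entry has the newest tick and fits alone, so it is never evicted here.
            let (_, oldest) = self.recency.pop_first().expect("over limit implies non-empty");
            if let Some((_, old_size, _)) = self.entries.remove(&oldest) {
                self.used -= old_size;
            }
        }
        true
    }
}

/// Hypothetical stand-in for cached file metadata with a fixed reported size.
pub struct FakeMetadata(pub usize);

impl MemorySized for FakeMetadata {
    fn memory_size(&self) -> usize {
        self.0
    }
}

fn main() {
    let mut cache = BoundedLruCache::new(100);
    cache.put("a.parquet", FakeMetadata(40));
    cache.put("b.parquet", FakeMetadata(40));
    cache.get(&"a.parquet"); // "a" is now more recently used than "b"
    cache.put("c.parquet", FakeMetadata(40)); // 120 > 100, so "b" is evicted
    assert!(cache.get(&"b.parquet").is_none());
    assert_eq!(cache.memory_used(), 80);
    println!("entries: {}, bytes: {}", cache.len(), cache.memory_used());
}
```

The `HashMap` plus `BTreeMap` keyed by access tick keeps lookups, recency updates, and evictions at O(log n) using only the standard library. A production cache shared across queries would likely also need interior mutability (e.g. a `Mutex`) and might use a dedicated LRU structure instead; both are assumptions beyond what this issue specifies.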