alamb opened a new issue, #17001:
URL: https://github.com/apache/datafusion/issues/17001

   ### Is your feature request related to a problem or challenge?
   
   @nuno-faria implemented the core Parquet Metadata caching logic in the 
following PR:
   - https://github.com/apache/datafusion/pull/16971
   
   However, as implemented, there is no bound on the amount of memory the cache
can use, which will result in a "leak" over time (i.e. memory usage only ever
goes up, never down).
   
   ### Describe the solution you'd like
   
   I would like the cache to have an upper memory limit so that people can turn
it on/off and its resource use is capped.
   
   
   
   ### Describe alternatives you've considered
   
   I personally recommend:
   1. Add another [Runtime Configuration Setting](https://datafusion.apache.org/user-guide/configs.html#runtime-configuration-settings),
   `datafusion.runtime.file_metadata_cache_limit`, with the same interface as
   `datafusion.runtime.memory_limit`
   2. Implement a basic LRU strategy for the cache (when the limit is exceeded,
   evict the least recently used elements until there is space)
   3. Add tests for the above
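Item 2 could look roughly like the following minimal sketch. It assumes the caller supplies each entry's size in bytes; for Parquet, that size would presumably come from `ParquetMetaData::memory_size`. `ByteLruCache`, `put`, `get` and `memory_used` are hypothetical names chosen for illustration. They are not DataFusion APIs.

```rust
use std::collections::{BTreeMap, HashMap};
use std::hash::Hash;

/// Hypothetical LRU cache bounded by total size in bytes (illustrative sketch only).
pub struct ByteLruCache<K, V> {
    limit: usize,                        // maximum total bytes allowed
    used: usize,                         // bytes currently accounted for
    tick: u64,                           // monotonically increasing access counter
    entries: HashMap<K, (V, usize, u64)>, // key -> (value, size in bytes, last-access tick)
    order: BTreeMap<u64, K>,             // last-access tick -> key; smallest tick = least recently used
}

impl<K: Hash + Eq + Clone, V> ByteLruCache<K, V> {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: 0, tick: 0, entries: HashMap::new(), order: BTreeMap::new() }
    }

    pub fn memory_used(&self) -> usize {
        self.used
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    fn next_tick(&mut self) -> u64 {
        self.tick += 1;
        self.tick
    }

    /// Returns the value and marks it as most recently used.
    pub fn get(&mut self, key: &K) -> Option<&V> {
        let tick = self.next_tick();
        let entry = self.entries.get_mut(key)?;
        self.order.remove(&entry.2);
        entry.2 = tick;
        self.order.insert(tick, key.clone());
        Some(&entry.0)
    }

    /// Inserts `value` occupying `size` bytes, then evicts least recently used
    /// entries until the total fits within the limit. Returns false (and caches
    /// nothing) if the value alone is larger than the limit.
    pub fn put(&mut self, key: K, value: V, size: usize) -> bool {
        if size > self.limit {
            return false;
        }
        // Drop any previous entry for this key so its bytes are not counted twice.
        if let Some((_, old_size, old_tick)) = self.entries.remove(&key) {
            self.used -= old_size;
            self.order.remove(&old_tick);
        }
        let tick = self.next_tick();
        self.used += size;
        self.entries.insert(key.clone(), (value, size, tick));
        self.order.insert(tick, key);
        // The new entry has the largest tick and size <= limit, so it is never evicted here.
        while self.used > self.limit {
            let (_, lru_key) = self.order.pop_first().expect("over limit implies non-empty");
            if let Some((_, lru_size, _)) = self.entries.remove(&lru_key) {
                self.used -= lru_size;
            }
        }
        true
    }
}
```

The sketch orders entries with a `BTreeMap` keyed by an access counter, which makes eviction O(log n). A real implementation might instead use a linked list or an existing LRU crate, and would likely need interior mutability (e.g. a `Mutex`) to be shared across threads.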
   
   You can get the memory usage for `ParquetMetaData` using the following API: 
https://docs.rs/parquet/latest/parquet/file/metadata/struct.ParquetMetaData.html#method.memory_size
   
   Some care will be needed to make this work with the traits (e.g. you may
have to change `FileMetadata` into a `trait`).
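Here is one possible shape for that trait change, sketched with hypothetical names. If `FileMetadata` became a trait that reports its own size, the cache could account for memory without knowing the concrete file format, and a format-specific reader could downcast back to its own type via `Any`. `FakeParquetMetadata` is a stand-in for a wrapper around `ParquetMetaData`, whose `memory_size` would be the natural implementation. `MetadataCache` is likewise illustrative. None of this is DataFusion's actual API.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical trait: anything stored in the cache must report its own size.
pub trait FileMetadata: Any + Send + Sync {
    /// Approximate size in bytes (a Parquet wrapper might delegate to
    /// `ParquetMetaData::memory_size`).
    fn memory_size(&self) -> usize;
    /// Lets a format-specific reader recover the concrete type.
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical stand-in for a wrapper around Parquet metadata.
pub struct FakeParquetMetadata {
    pub row_groups: Vec<u64>,
}

impl FileMetadata for FakeParquetMetadata {
    fn memory_size(&self) -> usize {
        // Struct itself plus the heap allocation owned by the Vec.
        std::mem::size_of::<Self>() + self.row_groups.capacity() * std::mem::size_of::<u64>()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical cache that tracks total bytes across any `FileMetadata` implementation.
#[derive(Default)]
pub struct MetadataCache {
    entries: HashMap<String, Arc<dyn FileMetadata>>,
    used: usize,
}

impl MetadataCache {
    pub fn put(&mut self, path: &str, meta: Arc<dyn FileMetadata>) {
        self.used += meta.memory_size();
        // Replacing an entry releases the bytes of the old value.
        if let Some(old) = self.entries.insert(path.to_string(), meta) {
            self.used -= old.memory_size();
        }
    }

    pub fn get(&self, path: &str) -> Option<Arc<dyn FileMetadata>> {
        self.entries.get(path).cloned()
    }

    pub fn memory_used(&self) -> usize {
        self.used
    }
}
```

A Parquet reader that gets an `Arc<dyn FileMetadata>` back from the cache could call `as_any().downcast_ref::<...>()` to reach its concrete metadata type. Eviction (such as the LRU policy in item 2) would then compare `memory_used()` against the configured limit.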
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.