mkleen opened a new pull request, #20047: URL: https://github.com/apache/datafusion/pull/20047
## Which issue does this PR close?

This change introduces a default FileStatisticsCache implementation with a size limit for the ListingTable. It implements the steps outlined in https://github.com/apache/datafusion/issues/19052#issuecomment-3603796097:

- Add heap size estimation for file statistics and the relevant data types used in caching. (This is temporary until https://github.com/apache/datafusion/pull/19599 and https://github.com/apache/arrow-rs/pull/9138 are resolved.)
- Redesign DefaultFileStatisticsCache to use an LruQueue, following https://github.com/apache/datafusion/pull/18855.
- Introduce a size limit on DefaultFileStatisticsCache.

This update also moves FileStatisticsCache creation into CacheManager, making the cache session-scoped and shared across statements and listing tables.

Closes https://github.com/apache/datafusion/issues/19217, https://github.com/apache/datafusion/issues/19052

## Rationale for this change

See above.

## What changes are included in this PR?

See above.

## Are these changes tested?

Yes.

## Are there any user-facing changes?

A new runtime setting `datafusion.runtime.file_statistics.cache_limit`.
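
The size-limited LRU design described in this PR (estimate each entry's heap size, track recency, evict the least recently used entries once a byte limit is exceeded) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: `FileStats`, `SizeLimitedLru`, and all of their methods are hypothetical names, the heap-size estimate is deliberately crude, and a `VecDeque` stands in for DataFusion's LruQueue.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for cached per-file statistics (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    num_rows: usize,
    column_names: Vec<String>,
}

impl FileStats {
    /// Crude heap-size estimate: the Vec's element slots plus each string's bytes.
    fn heap_size(&self) -> usize {
        self.column_names.len() * std::mem::size_of::<String>()
            + self.column_names.iter().map(|s| s.len()).sum::<usize>()
    }
}

/// Hypothetical LRU cache bounded by estimated bytes rather than entry count.
struct SizeLimitedLru {
    limit: usize,
    used: usize,
    entries: HashMap<String, FileStats>,
    /// Keys ordered by recency: front = least recently used.
    order: VecDeque<String>,
}

impl SizeLimitedLru {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0, entries: HashMap::new(), order: VecDeque::new() }
    }

    fn len(&self) -> usize {
        self.entries.len()
    }

    fn used_bytes(&self) -> usize {
        self.used
    }

    /// Look up an entry and mark it as most recently used.
    fn get(&mut self, key: &str) -> Option<&FileStats> {
        if let Some(pos) = self.order.iter().position(|k| k.as_str() == key) {
            if let Some(k) = self.order.remove(pos) {
                self.order.push_back(k);
            }
        }
        self.entries.get(key)
    }

    /// Remove an entry and release its accounted bytes.
    fn remove(&mut self, key: &str) -> Option<FileStats> {
        let old = self.entries.remove(key)?;
        self.used -= old.heap_size();
        self.order.retain(|k| k.as_str() != key);
        Some(old)
    }

    /// Insert an entry, evicting least recently used entries until it fits.
    fn put(&mut self, key: String, value: FileStats) {
        let size = value.heap_size();
        // Drop any previous entry first so its bytes are not double-counted.
        let _ = self.remove(&key);
        // An entry larger than the whole limit is never cached.
        if size > self.limit {
            return;
        }
        while self.used + size > self.limit {
            match self.order.pop_front() {
                Some(oldest) => {
                    if let Some(old) = self.entries.remove(&oldest) {
                        self.used -= old.heap_size();
                    }
                }
                None => break,
            }
        }
        self.used += size;
        self.order.push_back(key.clone());
        self.entries.insert(key, value);
    }
}
```

In this sketch, a cache created with `SizeLimitedLru::new(limit)` never holds more than `limit` estimated bytes. Reading an entry with `get` protects it from being the next eviction, and an entry that alone exceeds the limit is skipped rather than flushing everything else.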
