alamb opened a new issue, #17047: URL: https://github.com/apache/datafusion/issues/17047
### Is your feature request related to a problem or challenge?

Now that we have a limited parquet metadata cache for the built-in ListingTableProvider, thanks to @nuno-faria ❤️ in https://github.com/apache/datafusion/pull/17031, there are two configuration options that control the caching behavior:

```sql
set datafusion.execution.parquet.cache_metadata = true;
```

And

```sql
set datafusion.runtime.file_metadata_cache_limit = 100M
```

Now that we have a cache limit, I think we should consider "always" trying to cache the parquet metadata.

### Describe the solution you'd like

I suggest we remove `options.cache_metadata` and always try to save the metadata (which will be a no-op if the cache is too small).

As @nuno-faria says on https://github.com/apache/datafusion/pull/17031#discussion_r225360044

> I think caching by default would be good. The only situation where it wouldn't help would be one-time scans of parquet files that do not require the page index, but for large files the scan should largely outweigh the page index retrieval anyway.

Especially if we limit the memory used to 50 or 100MB, and people can still disable the cache by turning it off, I think that would be the best "out of the box" experience for most users.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_