alamb opened a new issue, #17047:
URL: https://github.com/apache/datafusion/issues/17047

   ### Is your feature request related to a problem or challenge?
   
   Now that we have a size-limited parquet metadata cache for the built-in 
ListingTableProvider (thanks to @nuno-faria ❤️ in 
https://github.com/apache/datafusion/pull/17031), there are two configuration 
options that control the caching behavior:
   
   ```sql
   set datafusion.execution.parquet.cache_metadata = true;
   ```
   
   And 
   ```sql
   set datafusion.runtime.file_metadata_cache_limit = 100M;
   ```
   
   Now that we have a cache limit, I think we should consider "always" trying 
to cache the parquet metadata.
   
   ### Describe the solution you'd like
   
   I suggest we remove `options.cache_metadata` and always try to save the 
metadata (which will be a no-op if the cache is too small).
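
   To illustrate the "always try, no-op if it does not fit" behavior, here is 
a minimal sketch. It is not DataFusion's actual cache: the 
`BoundedMetadataCache` type, its fields, and `try_put` are hypothetical names 
made up for this example, and metadata is modeled as raw bytes. A real cache 
might evict older entries instead of rejecting new ones.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical size-limited cache keyed by file path.
   /// Illustration only; not DataFusion's implementation.
   struct BoundedMetadataCache {
       limit_bytes: usize,
       used_bytes: usize,
       entries: HashMap<String, Vec<u8>>,
   }

   impl BoundedMetadataCache {
       fn new(limit_bytes: usize) -> Self {
           Self { limit_bytes, used_bytes: 0, entries: HashMap::new() }
       }

       /// Always *try* to cache. Returns false (and changes nothing)
       /// when the entry would push the cache over its limit.
       fn try_put(&mut self, path: &str, metadata: Vec<u8>) -> bool {
           // If the path is already cached, its old size is freed first.
           let existing = self.entries.get(path).map_or(0, |m| m.len());
           let new_used = self.used_bytes - existing + metadata.len();
           if new_used > self.limit_bytes {
               return false; // too small: caching is a no-op
           }
           self.entries.insert(path.to_string(), metadata);
           self.used_bytes = new_used;
           true
       }

       fn get(&self, path: &str) -> Option<&Vec<u8>> {
           self.entries.get(path)
       }
   }
   ```

   In this sketch a limit of 0 rejects every entry, so lowering the limit 
acts as the "off switch" without needing a separate boolean option.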
   
   As @nuno-faria says on 
https://github.com/apache/datafusion/pull/17031#discussion_r225360044 
   
   > I think caching by default would be good. The only situation where it 
wouldn't help would be one-time scans of parquet files that do not require the 
page index, but for large files the scan should largely outweigh the page index 
retrieval anyway.
   
   Especially if we limit the memory used to 50 or 100MB, and people can still 
opt out by turning off the cache, I think that would be the best "out of the 
box" experience for most users.
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

