nuno-faria commented on code in PR #16971:
URL: https://github.com/apache/datafusion/pull/16971#discussion_r2246052908


##########
datafusion/execution/src/cache/cache_manager.rs:
##########
@@ -86,6 +114,10 @@ pub struct CacheManagerConfig {
     /// location.  
     /// Default is disable.
     pub list_files_cache: Option<ListFilesCache>,
+    /// Cache of file-embedded metadata, used to avoid reading it multiple times when processing a
+    /// data file (e.g., Parquet footer and page metadata).
+    /// If not provided, the [`CacheManager`] will create a [`DefaultFilesMetadataCache`].
+    pub file_metadata_cache: Option<FileMetadataCache>,

Review Comment:
   My initial idea here was to make it easy to enable the metadata cache
without having to provide a custom `FileMetadataCache` when setting up the
runtime (a default one is used instead). This way, the user can simply run
`set datafusion.execution.parquet.cache_metadata = true;`
or enable it for a single file through `ParquetReadOptions`. I don't know if
there is a better approach, though (maybe removing the `Option` from
`file_metadata_cache` altogether?).
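
   To make the trade-off concrete, here is a minimal, self-contained sketch of
the "optional field with a default fallback" pattern described above. It is not
the code from this PR: the `MetadataCache` trait, its `get`/`put` methods, and
the byte-vector payload are hypothetical stand-ins, and the
`FileMetadataCache`, `DefaultFilesMetadataCache`, `CacheManagerConfig`, and
`CacheManager` definitions are simplified to show only the fallback logic.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical cache interface; the real trait in DataFusion may differ.
pub trait MetadataCache: Send + Sync {
    fn get(&self, path: &str) -> Option<Arc<Vec<u8>>>;
    fn put(&self, path: &str, metadata: Arc<Vec<u8>>);
}

/// Simplified stand-in for the shared cache handle.
pub type FileMetadataCache = Arc<dyn MetadataCache>;

/// Simplified stand-in for the default cache: an unbounded in-memory map.
#[derive(Default)]
pub struct DefaultFilesMetadataCache {
    entries: Mutex<HashMap<String, Arc<Vec<u8>>>>,
}

impl MetadataCache for DefaultFilesMetadataCache {
    fn get(&self, path: &str) -> Option<Arc<Vec<u8>>> {
        self.entries.lock().unwrap().get(path).cloned()
    }

    fn put(&self, path: &str, metadata: Arc<Vec<u8>>) {
        self.entries.lock().unwrap().insert(path.to_string(), metadata);
    }
}

/// The user may leave the cache unset when configuring the runtime.
#[derive(Default)]
pub struct CacheManagerConfig {
    pub file_metadata_cache: Option<FileMetadataCache>,
}

/// The manager always holds a cache, so callers never handle `None`.
pub struct CacheManager {
    file_metadata_cache: FileMetadataCache,
}

impl CacheManager {
    pub fn new(config: CacheManagerConfig) -> Self {
        // Fall back to the default cache when the user did not provide one.
        let file_metadata_cache: FileMetadataCache = match config.file_metadata_cache {
            Some(cache) => cache,
            None => Arc::new(DefaultFilesMetadataCache::default()),
        };
        Self { file_metadata_cache }
    }

    pub fn file_metadata_cache(&self) -> FileMetadataCache {
        Arc::clone(&self.file_metadata_cache)
    }
}
```

   With this shape, `CacheManager::new(CacheManagerConfig::default())` already
yields a usable cache, so a session-level setting only has to decide whether to
consult it. Dropping the `Option` would move the same default into the config's
constructor instead.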



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

