nuno-faria commented on code in PR #17031:
URL: https://github.com/apache/datafusion/pull/17031#discussion_r2251218409
##########
Cargo.toml:
##########
@@ -153,6 +153,7 @@ hex = { version = "0.4.3" }
 indexmap = "2.10.0"
 itertools = "0.14"
 log = "^0.4"
+lru = "0.16.0"

Review Comment:
   The [`lru`](https://crates.io/crates/lru) crate appears to be well maintained and used by a number of popular crates (like `tracing-log`, `redis`, and `aws-sdk-s3`). Let me know if it is ok to include it.

##########
datafusion/execution/src/cache/cache_manager.rs:
##########
@@ -102,12 +118,19 @@ impl CacheManager {
     }
 
     /// Get the file embedded metadata cache.
-    pub fn get_file_metadata_cache(&self) -> Option<Arc<dyn FileMetadataCache>> {
-        self.file_metadata_cache.clone()
+    pub fn get_file_metadata_cache(&self) -> Arc<dyn FileMetadataCache> {
+        Arc::clone(&self.file_metadata_cache)
+    }
+
+    /// Get the limit of the file embedded metadata cache.
+    pub fn get_file_metadata_cache_limit(&self) -> Option<usize> {
+        self.file_metadata_cache.cache_limit()
     }
 }
 
-#[derive(Clone, Default)]
+const DEFAULT_FILE_METADATA_CACHE_LIMIT: usize = 1024 * 1024 * 1024; // 1G

Review Comment:
   The default limit is set to `1G`. I'm not sure if this is an appropriate default.

##########
datafusion/core/src/execution/context/mod.rs:
##########
@@ -1068,6 +1068,10 @@ impl SessionContext {
                 builder.with_max_temp_directory_size(directory_size as u64)
             }
             "temp_directory" => builder.with_temp_file_path(value),
+            "file_metadata_cache_limit" => {
+                let limit = Self::parse_memory_limit(value)?;
+                builder.with_file_metadata_cache_limit(Some(limit))
+            }

Review Comment:
   While in theory we can set the limit to `None` to allow unbounded caching, the parsing of `set ...` uses the existing `parse_memory_limit`, which always returns `usize`.
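To illustrate the last point, one possible way to make an unbounded limit reachable from `set ...` would be a parser that returns `Option<usize>`. The following is only a rough sketch under assumptions: `parse_optional_limit` is a hypothetical helper, not something that exists in DataFusion, and treating `0` or `unbounded` as "no limit" is an invented convention. It also accepts only whole numbers with a `K`, `M`, or `G` suffix, which may differ from what `parse_memory_limit` accepts.

```rust
/// Hypothetical helper (not part of DataFusion): parses a limit such as
/// "512K", "64M" or "1G" into a byte count. The values "0" and "unbounded"
/// are mapped to `None`, an assumed convention meaning "no limit".
fn parse_optional_limit(value: &str) -> Result<Option<usize>, String> {
    let value = value.trim();
    if value.eq_ignore_ascii_case("unbounded") || value == "0" {
        return Ok(None);
    }
    // Find the byte offset of the last character, which must be the unit.
    let unit_start = value
        .char_indices()
        .last()
        .map(|(i, _)| i)
        .ok_or_else(|| "empty limit".to_string())?;
    let (number, unit) = value.split_at(unit_start);
    let multiplier: usize = match unit {
        "K" => 1024,
        "M" => 1024 * 1024,
        "G" => 1024 * 1024 * 1024,
        _ => return Err(format!("unsupported unit in '{value}'")),
    };
    // Only whole numbers are accepted in this sketch.
    let number: usize = number
        .parse()
        .map_err(|_| format!("invalid number in '{value}'"))?;
    // Guard against overflow instead of silently wrapping.
    number
        .checked_mul(multiplier)
        .map(Some)
        .ok_or_else(|| format!("limit '{value}' overflows usize"))
}
```

With something along these lines, the result could be passed to `with_file_metadata_cache_limit` directly instead of always being wrapped in `Some(limit)`. Whether a sentinel value is preferable to a separate setting is a design question for the maintainers.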
##########
datafusion/execution/src/cache/cache_unit.rs:
##########
@@ -215,25 +350,23 @@ impl CacheAccessor<ObjectMeta, Arc<dyn FileMetadata>> for DefaultFilesMetadataCa
     }
 
     fn remove(&mut self, k: &ObjectMeta) -> Option<Arc<dyn FileMetadata>> {

Review Comment:
   I noticed that the `remove` method is the only one in the `CacheAccessor` trait that expects a `&mut`. This appears to be inconsistent with the other update methods, but I did not change it since the trait is public.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org