BlakeOrth commented on code in PR #16971:
URL: https://github.com/apache/datafusion/pull/16971#discussion_r2246474646


##########
datafusion/execution/src/cache/cache_unit.rs:
##########
@@ -157,9 +158,79 @@ impl CacheAccessor<Path, Arc<Vec<ObjectMeta>>> for 
DefaultListFilesCache {
     }
 }
 
+/// Collected file embedded metadata cache.
+/// The metadata for some file is invalided when the file size or last 
modification time have been
+/// changed.
+#[derive(Default)]
+pub struct DefaultFilesMetadataCache {
+    metadata: DashMap<Path, (ObjectMeta, Arc<FileMetadata>)>,
+}
+
+impl CacheAccessor<Path, Arc<FileMetadata>> for DefaultFilesMetadataCache {
+    type Extra = ObjectMeta;
+
+    fn get(&self, _k: &Path) -> Option<Arc<FileMetadata>> {
+        panic!("get in DefaultFilesMetadataCache is not supported, please use 
get_with_extra")
+    }
+
+    fn get_with_extra(&self, k: &Path, e: &Self::Extra) -> 
Option<Arc<FileMetadata>> {
+        self.metadata
+            .get(k)
+            .map(|s| {
+                let (extra, metadata) = s.value();
+                if extra.size != e.size || extra.last_modified != 
e.last_modified {
+                    None
+                } else {
+                    Some(Arc::clone(metadata))
+                }
+            })
+            .unwrap_or(None)
+    }
+
+    fn put(&self, _key: &Path, _value: Arc<FileMetadata>) -> 
Option<Arc<FileMetadata>> {
+        panic!("put in DefaultFilesMetadataCache is not supported, please use 
put_with_extra")

Review Comment:
   Preface: I'm a user looking forward to this work being implemented, so don't 
let anything said here impact the timeline for this PR.
   
   As a user who has recently been looking into (seemingly) IO related 
performance using remote parquet backing listing tables, I came across the 
`cache_unit.rs` source the other day and similarly felt a bit concerned that 
methods in the default cache implementation code could panic. Obviously an 
error that can be handled is better than a panic; is this a situation that 
could be reasonably handled at compile time? This implementation makes use of 
only `_with_extra` implementations, but the `DefaultListFilesCache` makes use 
of only the normal `get` and `put`. Returning a `Result` would already be API 
breaking, perhaps the `CacheAccessor` trait could be broken into several parts 
indicating whether the implementer intends to support just `get` and `put` or 
the `_with_extra` variants.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to