Phoenix500526 commented on code in PR #23201:
URL: https://github.com/apache/datafusion/pull/23201#discussion_r3517899928
##########
datafusion/execution/src/cache/cache_manager.rs:
##########
@@ -106,22 +109,29 @@ impl CachedFileMetadata {
/// Create a new cached file metadata entry.
pub fn new(
meta: ObjectMeta,
+ schema_fingerprint: Arc<SchemaFingerprint>,
statistics: Arc<Statistics>,
ordering: Option<LexOrdering>,
) -> Self {
Self {
meta,
+ schema_fingerprint,
statistics,
ordering,
}
}
/// Check if this cached entry is still valid for the given metadata.
///
- /// Returns true if the file size and last modified time match.
- pub fn is_valid_for(&self, current_meta: &ObjectMeta) -> bool {
+ /// Returns true if the file size, last modified time, and schema match.
+ pub fn is_valid_for(
+ &self,
+ current_meta: &ObjectMeta,
+ current_schema_fingerprint: &SchemaFingerprint,
+ ) -> bool {
self.meta.size == current_meta.size
&& self.meta.last_modified == current_meta.last_modified
+ && self.schema_fingerprint.as_ref() == current_schema_fingerprint
Review Comment:
Good point. I added an `Arc::ptr_eq` fast path for the case where the cached
entry and the caller share the same fingerprint `Arc`, and kept exact equality
as a fallback for equivalent fingerprints that live in different `Arc`
allocations. This preserves collision safety and the existing cache-reuse
behavior while making the common successful validation path O(1).
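The fast-path-plus-fallback idea can be sketched roughly as follows. This is not the PR's code. `SchemaFingerprint` and `FileMeta` here are hypothetical stand-ins: the fingerprint is assumed to compare by value, and `FileMeta` keeps only the two `ObjectMeta` fields that `is_valid_for` checks. The sketch also assumes the caller passes the current fingerprint as `&Arc<SchemaFingerprint>` so that `Arc::ptr_eq` applies. If the signature keeps `&SchemaFingerprint`, `std::ptr::eq` on the dereferenced values gives the same shortcut.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the PR's `SchemaFingerprint`; assumed to be
/// comparable by value (exact equality, not just a hash).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SchemaFingerprint {
    pub bytes: Vec<u8>,
}

/// Hypothetical simplification of `object_store::ObjectMeta`, keeping only
/// the fields that `is_valid_for` compares.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileMeta {
    pub size: u64,
    pub last_modified: i64,
}

pub struct CachedEntry {
    pub meta: FileMeta,
    pub schema_fingerprint: Arc<SchemaFingerprint>,
}

impl CachedEntry {
    /// Valid only if size, mtime, and schema fingerprint all match.
    pub fn is_valid_for(&self, current_meta: &FileMeta, current: &Arc<SchemaFingerprint>) -> bool {
        self.meta.size == current_meta.size
            && self.meta.last_modified == current_meta.last_modified
            && fingerprints_match(&self.schema_fingerprint, current)
    }
}

fn fingerprints_match(cached: &Arc<SchemaFingerprint>, current: &Arc<SchemaFingerprint>) -> bool {
    // O(1) fast path: both handles point at the same allocation.
    Arc::ptr_eq(cached, current)
        // Fallback: equal fingerprints stored in different allocations.
        || cached.as_ref() == current.as_ref()
}
```

Because `||` short-circuits, a shared `Arc` never pays for the value comparison. A `false` from `Arc::ptr_eq` only means "not the same allocation", so correctness never depends on the fingerprint being shared.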