abhita commented on issue #18405:
URL: https://github.com/apache/datafusion/issues/18405#issuecomment-3519855274
Thanks for the feedback, @alamb @nuno-faria
Regarding your suggestion to provide a custom cache manager - I’ve actually
looked into this, and the issue is that with the current design, I would need
to reimplement the entire cache accessor logic and all cache operations. This
leads to significant code duplication.
Following that approach, I implemented a custom `FileMetadataCache` with
decoupled storage and eviction:
```rust
pub struct MyCustomFileMetadataCache {
inner_cache: DashMap<ObjectMeta, Arc<dyn FileMetadata>>,
eviction_strategy: Arc<Mutex<Box<dyn MyEvictionStrategy>>>,
memory_limit: usize,
}
pub trait MyEvictionStrategy: Send + Sync {
fn on_access(&mut self, key: &ObjectMeta, size: usize);
fn select_for_eviction(&self, target_size: usize) -> Vec<ObjectMeta>;
}
impl FileMetadataCache for MyCustomFileMetadataCache {
fn get_with_extra(&self, k: &ObjectMeta, e: &ObjectMeta) ->
Option<Arc<dyn FileMetadata>> {
if let Some(entry) = self.inner_cache.get(k) {
self.eviction_strategy.lock().unwrap().on_access(k,
entry.memory_size());
return Some(entry.clone());
}
None
}
// ... 9 more trait methods
}
// Configure it
let config = CacheManagerConfig::default()
.with_file_metadata_cache(Some(Arc::new(MyCustomFileMetadataCache::new())));
```
## Why This Isn't a Clean Solution:
__1. I'm Building Framework Infrastructure:__ I'm implementing the entire
decoupling pattern (storage abstraction, eviction trait, memory management)
that should exist in framework itself.
__2. Maintenance Burden:__ Bug fixes and improvements to DataFusion's cache
infrastructure don't benefit the custom implementation.
__3. Not Scalable:__ Every user needing different eviction strategies must
implement this same decoupling pattern independently.
## Again Coming back to the Proposed Solution:
Moving the decoupling __into DataFusion__:
```rust
// In DataFusion - written once, maintained once
pub struct CustomFileMetadataCache {
inner: DashMap<ObjectMeta, Arc<dyn FileMetadata>>,
strategy: Arc<Mutex<Box<dyn EvictionStrategy>>>,
}
```
Users just configure the given eviction strategy. With this, the
maintenance of Cache storage and retrieval lies within Datafusion thereby
eliminating room for errors in case of : Cache Access, Memory Tracking, Metric
Evaluation,etc
```rust
let cache = CustomFileMetadataCache::new().with_strategy(LfuStrategy::new());
```
__How this solves the issue:__
- For different strategies I can configure the eviction policy without
reimplementing cache logic
- The cache storage code is written once and tested once
- Bug fixes apply to all strategies automatically
- Users wanting basic LRU continue using `DefaultFilesMetadataCache`
unchanged
- Allowing users to swap strategies without re-initializing the entire cache
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]