alamb commented on code in PR #17031:
URL: https://github.com/apache/datafusion/pull/17031#discussion_r2254953311


##########
Cargo.toml:
##########
@@ -153,6 +153,7 @@ hex = { version = "0.4.3" }
 indexmap = "2.10.0"
 itertools = "0.14"
 log = "^0.4"
+lru = "0.16.0"

Review Comment:
   I spoke with @XiangpengHao  this afternoon and we have a proposal:
   
   1. Use a FIFO queue -- aka evict the oldest elements first (e.g. store 
`Path`s in a `VecDeque` or something and remove them from the HashSet in order 
when the cache is full)
   
   Benefits of this approach
   1. Simple to implement
   2. Predictable, easy to explain behavior
   
   Downsides:
   1. Can evict item that are frequently used
   2. Doesn't account for how costly an item is to keep in the cache 
   
   However, I think this is an appropriate tradeoff for the default 
implementation because 
   1. a properly sized cache will have no evictions (so the eviction strategy 
for the simple case doesn't matter)
   1. users can supply their own implementations for more advanced usecases
   
   I argue against including anything more complicated in the initial 
implementation because of the many tradeoffs. A cache eviction strategy needs 
to weigh multiple different factors, for example
   1. The cost (eg. size in bytes) of keeping elements in the cache
   3. The  benefit (e.g. how much will the cached metadata improve future 
queries)
   4. The cost of next miss (e.g. the cost to reload metadata from a local disk 
is likely much lower than from S3)
   
   The relative tradeoffs between these factors likely varies substantially 
from system to sytem
   
   Users with more advanced needs can implement whatever strategy is most 
appropriate to their system via the traits



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to