alamb commented on issue #16365:
URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3167659695

   > Would there be interest in looking further at the POC or discussing
   > additional strategies for normalizing and reducing the amount of time
   > spent listing objects?
   
   I am interested for sure
   
   This is similar to the hooks for Parquet metadata that @nuno-faria is 
connecting up as part of this epic:
   - https://github.com/apache/datafusion/issues/17000
   
   There is an old partial API for caching the results of listing here:
   
https://docs.rs/datafusion/latest/datafusion/execution/cache/cache_manager/struct.CacheManager.html#method.get_list_files_cache
   
   Perhaps we can follow the same approach as in #17000 and add a default cache 
for listing results (maybe even using the same cache limit).
   
   The biggest challenge, I think, will be to clearly articulate when a cached 
listing expires (so that, for example, the table picks up changes to the 
underlying system).
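
   To make the expiry question concrete, here is a minimal sketch of one possible policy: a time-to-live (TTL) per cached listing. This is not DataFusion's actual API. `ListingCache`, its methods, and the choice of keying by a table prefix string are all hypothetical, and a real implementation would more likely store object metadata than plain file names.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache of listing results, keyed by table prefix.
/// Illustrative only; not part of DataFusion.
pub struct ListingCache {
    /// How long a cached listing is considered fresh.
    ttl: Duration,
    /// prefix -> (time the listing was stored, listed file names)
    entries: HashMap<String, (Instant, Vec<String>)>,
}

impl ListingCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store a listing for `prefix`, stamped with the caller-supplied `now`.
    pub fn put(&mut self, prefix: &str, files: Vec<String>, now: Instant) {
        self.entries.insert(prefix.to_string(), (now, files));
    }

    /// Return the cached listing only if it is younger than the TTL.
    /// An expired entry is dropped, so the caller has to list again.
    pub fn get(&mut self, prefix: &str, now: Instant) -> Option<&[String]> {
        let expired = match self.entries.get(prefix) {
            Some((stored_at, _)) => now.duration_since(*stored_at) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(prefix);
            return None;
        }
        self.entries.get(prefix).map(|(_, files)| files.as_slice())
    }
}
```

   Passing `now` in explicitly keeps the expiry rule easy to test. A TTL is only one option: it bounds staleness but cannot notice a change before the entry ages out, so explicit invalidation might be needed as well.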
   
   Is this something you might be willing to help with, @BlakeOrth?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

