alamb commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3167659695
> Would there be interest in looking further at the POC or discussing additional strategies for normalizing and reducing the amount of time spent listing objects?

I am interested for sure.

Similar to the hooks for Parquet metadata, which @nuno-faria is connecting up as part of this epic (https://github.com/apache/datafusion/issues/17000), there is an old, partial API for caching the results of listing here: https://docs.rs/datafusion/latest/datafusion/execution/cache/cache_manager/struct.CacheManager.html#method.get_list_files_cache

Perhaps we can follow the same approach as in #17000 and add a default cache for listing results (maybe even using the same cache limit).

The biggest challenge, I think, will be to clearly articulate when the cached list is expired (so that, for example, the table picks up changes to the underlying system).

Is this something you might be willing to help with, @BlakeOrth?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
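
For illustration only, one possible way to make the expiry rule for cached listings explicit is a fixed time-to-live (TTL) per entry. The sketch below uses only the Rust standard library. The names `ListingCache` and `CachedListing`, the string-keyed prefix, and the TTL policy itself are assumptions made for this example; they are not part of DataFusion's `CacheManager` API, and the comment above does not settle on any particular expiry strategy.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache entry: the object paths returned by one listing call.
struct CachedListing {
    paths: Vec<String>,
    fetched_at: Instant,
}

/// Hypothetical TTL-based cache keyed by table prefix (not a DataFusion type).
pub struct ListingCache {
    ttl: Duration,
    entries: HashMap<String, CachedListing>,
}

impl ListingCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store the result of a listing performed at time `now`.
    pub fn put(&mut self, prefix: &str, paths: Vec<String>, now: Instant) {
        self.entries
            .insert(prefix.to_string(), CachedListing { paths, fetched_at: now });
    }

    /// Return the cached listing if it is still fresh at `now`.
    /// An entry whose age has reached the TTL is evicted and `None` is
    /// returned, so the caller lists the object store again and picks up changes.
    pub fn get(&mut self, prefix: &str, now: Instant) -> Option<&[String]> {
        let expired = match self.entries.get(prefix) {
            Some(entry) => now.duration_since(entry.fetched_at) >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(prefix);
            return None;
        }
        self.entries.get(prefix).map(|entry| entry.paths.as_slice())
    }
}
```

Passing `now` explicitly keeps the expiry rule easy to state and to test. A TTL only bounds staleness; it does not detect changes, so a real design might combine it with explicit invalidation or a size limit.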