alamb commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3180916288
> At the end of the day I'm going to be working on some way to get listing results cached, and I'd much rather make those changes here to contribute back to open source than keep it in our proprietary code. I'm happy to help out to move this forward wherever I can.

@BlakeOrth I think we should make a new issue.

I think we can take the same approach for listing results as we took for parquet metadata caching (basically follow the path that @nuno-faria blazed):
- https://github.com/apache/datafusion/issues/17000

Basically
1. Provide a default implementation for the (already existing) [ListFilesCache](https://docs.rs/datafusion/latest/datafusion/execution/cache/cache_manager/struct.CacheManager.html#method.get_list_files_cache)
2. Implement some reasonable default value for refresh along with a config setting to change that default
3. Implement some way to see the contents of the cache

If you are willing to potentially help with this work, I can spec it out in a ticket / epic.

> In my mind the work to normalize performance between flat and hive partitioned datasets is separate, but related, to any work that would actually cache the listing results from either of those workflows. Should discussions on approach happen here or in separate issue(s) more aligned with the work directly?

Since they all use the ListingTable implementation, I think the code will be the same.
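
To make the three steps above concrete, here is a rough, standalone sketch of what a listing cache with a refresh interval could look like. It uses only the Rust standard library and does not use any DataFusion API: `ListedFile`, `TtlListingCache`, and all of their methods are hypothetical names made up for illustration, and the real `ListFilesCache` may have a quite different shape.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one listed object; not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
pub struct ListedFile {
    pub path: String,
    pub size: u64,
}

/// A cached listing plus the time it was stored.
struct Entry {
    files: Vec<ListedFile>,
    inserted_at: Instant,
}

/// Hypothetical listing cache keyed by table prefix (step 1),
/// with a configurable refresh interval (step 2).
pub struct TtlListingCache {
    ttl: Duration,
    entries: HashMap<String, Entry>,
}

impl TtlListingCache {
    /// `ttl` plays the role of the refresh setting; a real implementation
    /// would presumably read it from a config option.
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store the listing result for a prefix, replacing any previous entry.
    pub fn put(&mut self, prefix: &str, files: Vec<ListedFile>) {
        let entry = Entry { files, inserted_at: Instant::now() };
        self.entries.insert(prefix.to_string(), entry);
    }

    /// Return the cached listing if it is still fresh. An expired entry is
    /// evicted and reported as a miss, so the caller lists the store again.
    pub fn get(&mut self, prefix: &str) -> Option<Vec<ListedFile>> {
        let expired = match self.entries.get(prefix) {
            Some(entry) => entry.inserted_at.elapsed() >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(prefix);
            return None;
        }
        self.entries.get(prefix).map(|entry| entry.files.clone())
    }

    /// Step 3: a way to see the cache contents, as (prefix, file count)
    /// pairs sorted by prefix. Entries that have expired but have not yet
    /// been evicted by `get` are still included.
    pub fn contents(&self) -> Vec<(String, usize)> {
        let mut summary: Vec<(String, usize)> = self
            .entries
            .iter()
            .map(|(prefix, entry)| (prefix.clone(), entry.files.len()))
            .collect();
        summary.sort();
        summary
    }
}
```

Evicting lazily on `get` keeps the sketch small; a shared cache would also need interior mutability or a lock, and probably a size limit, which are left out here.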