alamb commented on issue #16365:
URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3180916288

   > At the end of the day I'm going to be working on some way to get listing 
results cached, and I'd much rather make those changes here to contribute back 
to open source than keep it in our proprietary code. I'm happy to help out to 
move this forward wherever I can.
   
   @BlakeOrth 
   
   
   I think we should make a new issue. I think we can take the same approach 
for listing results as we took for parquet metadata caching (basically follow 
the path that @nuno-faria blazed): 
   - https://github.com/apache/datafusion/issues/17000
   
   Basically 
   1. Provide a default implementation for the (already existing) 
[ListFilesCache](https://docs.rs/datafusion/latest/datafusion/execution/cache/cache_manager/struct.CacheManager.html#method.get_list_files_cache)
   2. Implement some reasonable default value for refresh along with a config 
setting to change that default
   3. Implement some way to see the contents of the cache
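
   To make the three steps above concrete, here is a rough sketch of the shape such a cache could take. It is only an illustration: `TtlListCache` and `FileMeta` are hypothetical names, it uses only the standard library, and it does not implement DataFusion's actual `ListFilesCache` interface. A real version would store the object store's metadata type, would need to be thread safe, and would take its refresh interval from a config setting.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the per-file metadata a listing returns.
#[derive(Clone, Debug, PartialEq)]
pub struct FileMeta {
    pub path: String,
    pub size: u64,
}

/// Hypothetical cache of listing results, keyed by table prefix.
/// Entries older than `ttl` are treated as stale (step 2).
pub struct TtlListCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, Vec<FileMeta>)>,
}

impl TtlListCache {
    /// `ttl` plays the role of the configurable refresh interval.
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Store the listing result for `prefix`, replacing any previous entry.
    pub fn put(&mut self, prefix: &str, files: Vec<FileMeta>) {
        self.entries
            .insert(prefix.to_string(), (Instant::now(), files));
    }

    /// Return the cached listing if it is still fresh (step 1).
    /// A stale entry is evicted and `None` is returned, so the caller
    /// knows it has to list the object store again.
    pub fn get(&mut self, prefix: &str) -> Option<Vec<FileMeta>> {
        let fresh = match self.entries.get(prefix) {
            Some((stored_at, _)) => stored_at.elapsed() < self.ttl,
            None => return None,
        };
        if fresh {
            self.entries.get(prefix).map(|(_, files)| files.clone())
        } else {
            self.entries.remove(prefix);
            None
        }
    }

    /// Expose the cache contents for inspection (step 3):
    /// one `(prefix, number of cached files)` pair per entry, sorted by prefix.
    pub fn contents(&self) -> Vec<(String, usize)> {
        let mut out: Vec<(String, usize)> = self
            .entries
            .iter()
            .map(|(prefix, (_, files))| (prefix.clone(), files.len()))
            .collect();
        out.sort();
        out
    }
}
```

   With something like this, a caller would try `get(prefix)` first and only fall back to listing the object store (followed by `put`) on a miss or an expired entry.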
   
   If you are willing to potentially help with this work, I can spec it out in 
a ticket / epic.
   
   > In my mind the work to normalize performance between flat and hive 
partitioned datasets is separate, but related, to any work that would actually 
cache the listing results from either of those workflows. Should discussions on 
approach happen here or in separate issue(s) more aligned with the work 
directly?
   
   Since they all use the ListingTable implementation, I think the code will be 
the same.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

