alamb commented on issue #7556: URL: https://github.com/apache/arrow-datafusion/issues/7556#issuecomment-1720040111
I think keeping a metadata cache on the RuntimeEnv is reasonable as long as:

1. There is a way to extend / disable the default behavior (as there is with the DiskManager and MemoryPool).
2. The default implementation in DataFusion is simple.

The rationale for something simple built in but a configurable API is that the exact caching strategy is likely to vary tremendously from system to system. For example, if there is a local file-based parquet cache, storing metadata in memory might not make sense, and systems will differ in how they do cache eviction, enforce limits, etc.

Therefore it is unlikely that anything in DataFusion will cover all use cases, so what is built in should be simple and allow users to add whatever specific caching policy they want.

Does that make sense @Ted-Jiang?
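To make the two requirements concrete, here is a minimal sketch of what "simple default plus a pluggable API" could look like. Every name in it (`MetadataCache`, `DefaultMetadataCache`, `NoopMetadataCache`, `RuntimeSketch`, `FileMetadata`) is hypothetical and chosen only for illustration; none of it is an existing DataFusion API, and the real design may look quite different.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for parsed file metadata (e.g. a parquet footer).
#[derive(Debug, Clone, PartialEq)]
pub struct FileMetadata {
    pub num_rows: usize,
}

/// Hypothetical extension point: a system plugs in its own caching policy
/// (eviction, size limits, no caching at all) by implementing this trait.
pub trait MetadataCache: Send + Sync {
    fn get(&self, path: &str) -> Option<Arc<FileMetadata>>;
    fn put(&self, path: &str, meta: Arc<FileMetadata>);
}

/// Deliberately simple built-in default: an unbounded in-memory map
/// with no eviction and no memory limit.
#[derive(Default)]
pub struct DefaultMetadataCache {
    entries: Mutex<HashMap<String, Arc<FileMetadata>>>,
}

impl MetadataCache for DefaultMetadataCache {
    fn get(&self, path: &str) -> Option<Arc<FileMetadata>> {
        self.entries.lock().unwrap().get(path).cloned()
    }

    fn put(&self, path: &str, meta: Arc<FileMetadata>) {
        self.entries.lock().unwrap().insert(path.to_string(), meta);
    }
}

/// Disables caching: every lookup misses and nothing is stored.
pub struct NoopMetadataCache;

impl MetadataCache for NoopMetadataCache {
    fn get(&self, _path: &str) -> Option<Arc<FileMetadata>> {
        None
    }

    fn put(&self, _path: &str, _meta: Arc<FileMetadata>) {}
}

/// Hypothetical stand-in for the runtime environment that owns the cache.
pub struct RuntimeSketch {
    pub metadata_cache: Arc<dyn MetadataCache>,
}

impl RuntimeSketch {
    /// Uses the simple default unless the caller overrides it.
    pub fn new() -> Self {
        Self {
            metadata_cache: Arc::new(DefaultMetadataCache::default()),
        }
    }

    /// Swaps in a user-supplied policy, including one that disables caching.
    pub fn with_metadata_cache(mut self, cache: Arc<dyn MetadataCache>) -> Self {
        self.metadata_cache = cache;
        self
    }
}
```

Under these assumptions, a system that already has a local file-based parquet cache could call `RuntimeSketch::new().with_metadata_cache(Arc::new(NoopMetadataCache))` to turn the in-memory cache off, while a system that needs eviction or limits would supply its own `MetadataCache` implementation.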
