tdcmeehan opened a new issue, #9991: URL: https://github.com/apache/iceberg/issues/9991
### Feature Request / Improvement

While experimenting with the features in #4518 (core: Provide mechanism to cache manifest file content), we encountered a couple of limitations that we would like to improve on, and we wanted to see if the Iceberg community has any thoughts or feedback.

### Add support for modifying the cache key

Manifest caching assumes that the key to the manifest cache is a single FileIO instance. This assumption is hard to work with in certain circumstances, for example when authenticating with Hadoop using impersonation: it's unclear how to do this with the reference implementation without creating multiple FileIOs per user, which would degrade the cache hit rate.

To allow for this change in caching behavior, I am wondering if it makes sense to allow FileIO instances to supply a cache key. This way, users of the Iceberg reference implementation can supply their own FileIO (as I believe is already common practice among some query engines), and that implementation could specify a cache key that achieves a better hit rate across multiple FileIO instances.

### Allow users to retrieve the cache statistics/potentially supply their own Caffeine cache

Ideally, when the cache is enabled, operators would have a way of monitoring the cache hit rate for effectiveness (to see if the cache needs to be tuned), or could simply supply a cache of their own, over which they can then set up independent monitoring. Right now, I believe the only way to access the stats is through debug-level logging, and there is no public interface exposed to retrieve the metrics. I haven't found a way to do this yet, but I think that, if possible, it would be a great idea to allow users to supply their own cache, with their own caching parameters, which they can then set up observability over so they can export these cache metrics to e.g. Prometheus.
In lieu of that though, I think a simple solution would be to allow the cache statistics to be exported from ManifestFiles.

---

I wanted to get some feedback on whether the community would be supportive of these contributions, or has feedback or different ideas on how to go about this.

### Query engine

PrestoDB
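
To make the two proposals above more concrete, here is a minimal, self-contained sketch in plain Java. It does not use Iceberg's or Caffeine's actual APIs: `KeyedIO`, `ImpersonatingIO`, `StatsCache`, and `cacheKey()` are hypothetical names invented for illustration. The idea is only that an IO object supplies its own cache key, so per-user instances can share entries, and that hit/miss counters are exposed through a public accessor rather than debug logging. Note that sharing entries across impersonated users assumes those users are all authorized to read the same manifests.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

final class KeyedManifestCacheSketch {

  // Hypothetical stand-in for a FileIO that can name the cache it belongs to.
  interface KeyedIO {
    String cacheKey();
  }

  // Hypothetical per-user IO: credentials differ per user, but the cache key
  // is derived from the storage location only, so users share cached entries.
  static final class ImpersonatingIO implements KeyedIO {
    private final String user;
    private final String warehouse;

    ImpersonatingIO(String user, String warehouse) {
      this.user = user;
      this.warehouse = warehouse;
    }

    String user() {
      return user;
    }

    @Override
    public String cacheKey() {
      return warehouse;
    }
  }

  // Unbounded toy cache with counters; a real cache would also need eviction.
  static final class StatsCache {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    byte[] get(KeyedIO io, String path, Function<String, byte[]> loader) {
      String key = io.cacheKey() + "|" + path;
      byte[] cached = entries.get(key);
      if (cached != null) {
        hits.incrementAndGet();
        return cached;
      }
      // Counters are approximate under concurrent loads of the same key.
      misses.incrementAndGet();
      return entries.computeIfAbsent(key, k -> loader.apply(path));
    }

    long hitCount() {
      return hits.get();
    }

    long missCount() {
      return misses.get();
    }

    // Returns 0.0 when nothing has been requested yet (a choice made for this sketch).
    double hitRate() {
      long total = hits.get() + misses.get();
      return total == 0 ? 0.0 : (double) hits.get() / total;
    }
  }

  public static void main(String[] args) {
    StatsCache cache = new StatsCache();
    KeyedIO alice = new ImpersonatingIO("alice", "hdfs://warehouse");
    KeyedIO bob = new ImpersonatingIO("bob", "hdfs://warehouse");

    // Both users read the same manifest path; the second read is a cache hit.
    cache.get(alice, "metadata/m1.avro", p -> new byte[] {1});
    cache.get(bob, "metadata/m1.avro", p -> new byte[] {1});

    System.out.println("hits=" + cache.hitCount()
        + " misses=" + cache.missCount()
        + " hitRate=" + cache.hitRate());
  }
}
```

With accessors like `hitCount()` and `hitRate()`, an operator could poll the values from a metrics exporter of their choice. If the underlying cache is Caffeine, its built-in statistics recording would likely be a better source for these numbers than hand-rolled counters.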