tdcmeehan opened a new issue, #9991:
URL: https://github.com/apache/iceberg/issues/9991

   ### Feature Request / Improvement
   
   While experimenting with the features in #4518 (core: Provide mechanism to 
cache manifest file content), we encountered a couple of limitations which we 
would like to improve on and wanted to see if there are any thoughts or 
feedback from the Iceberg community.
   
   ### Add support for modifying the cache key
   
   Manifest caching assumes that the key to the manifest cache is a single 
FileIO instance.  This assumption is hard to satisfy in certain circumstances. 
For example, when authenticating with Hadoop using impersonation, it's unclear 
how to do this with the reference implementation without creating multiple 
FileIOs, one per user, which would degrade the cache hit rate.
   
   To allow for this change in caching behavior, I am wondering if it makes 
sense to allow FileIO instances to supply a cache key.  This way, users of the 
Iceberg reference implementation can supply their own FileIO (as I believe is 
already common practice among some query engines), and in this implementation 
you could specify a cache key that achieves better hit rate across multiple 
FileIO instances.
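
   As a rough illustration of the idea, here is a minimal sketch using only the 
JDK. Everything in it is hypothetical: `CacheKeyProvider`, `ImpersonatingIO`, and 
`ContentCacheRegistry` are illustrative names, not existing Iceberg types, and a 
plain map stands in for the real content cache. The registry uses the supplied 
key when an IO object offers one and falls back to the instance itself 
otherwise, which I assume mirrors the current per-FileIO behavior.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical opt-in interface; Iceberg's FileIO does not define this today.
interface CacheKeyProvider {
  Object cacheKey();
}

// Stand-in for a FileIO that impersonates one user but reads shared storage.
final class ImpersonatingIO implements CacheKeyProvider {
  private final String user;
  private final String storageRoot;

  ImpersonatingIO(String user, String storageRoot) {
    this.user = user;
    this.storageRoot = storageRoot;
  }

  String user() {
    return user;
  }

  @Override
  public Object cacheKey() {
    // Assumption: instances that read the same storage may share cached manifests.
    return storageRoot;
  }
}

// Hypothetical registry: one content cache per cache key rather than per instance.
final class ContentCacheRegistry {
  private final Map<Object, Map<String, byte[]>> caches = new ConcurrentHashMap<>();

  Map<String, byte[]> cacheFor(Object io) {
    // Use the supplied key if present; otherwise key on the instance itself.
    Object key = (io instanceof CacheKeyProvider) ? ((CacheKeyProvider) io).cacheKey() : io;
    return caches.computeIfAbsent(key, k -> new ConcurrentHashMap<>());
  }

  int cacheCount() {
    return caches.size();
  }
}
```

   With this shape, two `ImpersonatingIO` instances created for different users 
but the same storage root resolve to the same cache, while an object that does 
not supply a key still gets a cache of its own. The choice of key decides which 
users end up sharing entries, so it would be up to the FileIO implementer.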
   
   ### Allow users to retrieve the cache statistics/potentially supply their own Caffeine cache
   
   Ideally, when the cache is enabled, operators would have a way to monitor 
the cache hit rate (to see whether the cache needs to be tuned), or could 
simply supply a cache of their own, on which they can then set up independent 
monitoring.  Right now, I believe the only way to access the stats is through 
debug-level logging, and there is no public interface exposed to retrieve the 
metrics.
   
   I haven't found a way to do this yet, but if possible, I think it would be 
a great idea to allow users to supply their own cache, with their own caching 
parameters, which they can then set up observability over so they can export 
these cache metrics to e.g. Prometheus.  Failing that, I think a simple 
solution would be to allow the cache statistics to be exported from 
ManifestFiles.
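
   To make the simpler option concrete, here is a minimal sketch of a cache 
that counts hits and misses and exposes them through a public accessor. It uses 
only the JDK so that it is self-contained: `CountingCache` and 
`CacheStatsSnapshot` are hypothetical names, and a real change would presumably 
surface the statistics of the underlying Caffeine cache instead of counting by 
hand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Function;

// Hypothetical immutable snapshot of the counters at one point in time.
final class CacheStatsSnapshot {
  final long hits;
  final long misses;

  CacheStatsSnapshot(long hits, long misses) {
    this.hits = hits;
    this.misses = misses;
  }

  double hitRate() {
    long total = hits + misses;
    // Convention chosen for this sketch: report 1.0 before any lookups.
    return total == 0 ? 1.0 : (double) hits / total;
  }
}

// Hypothetical stand-in for the manifest content cache, with public statistics.
final class CountingCache<K, V> {
  private final Map<K, V> entries = new ConcurrentHashMap<>();
  private final LongAdder hits = new LongAdder();
  private final LongAdder misses = new LongAdder();

  V get(K key, Function<K, V> loader) {
    V cached = entries.get(key);
    if (cached != null) {
      hits.increment();
      return cached;
    }
    misses.increment();
    // Load on a miss; concurrent callers for the same key share one entry.
    return entries.computeIfAbsent(key, loader);
  }

  // The accessor an operator could poll and export to a metrics system.
  CacheStatsSnapshot stats() {
    return new CacheStatsSnapshot(hits.sum(), misses.sum());
  }
}
```

   An operator could call `stats()` periodically and publish `hitRate()` as a 
gauge, which is the kind of monitoring hook that debug-level logging does not 
provide.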
   
   ---
   
   I wanted to get some feedback on whether the community would be supportive 
of these contributions, or has different ideas on how to go about this.
   
   ### Query engine
   
   PrestoDB


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

