Hi,

I drafted a proposal for adding Iceberg REST-compliant scan planning
support to Polaris. The proposal doc can be found here:
https://docs.google.com/document/d/1agpz4wwXxWfEy9fJLgPRDcrzdR5USM1i9vQhOBcHo3Q/edit?usp=sharing

tl;dr: the doc proposes first adding a straightforward implementation of
scan planning and integrating the new endpoints with Polaris authz.
Subsequently, we can improve scan planning performance with two
independent caching layers:

   - *CachingFileIO* - a FileIO wrapper that wraps existing FileIO
   implementations and introduces a configurable Caffeine-powered in-memory
   cache to speed up access to manifest files.
   - *SQL Pruning Index* - an additional index stored in an RDBMS and
   asynchronously updated by Polaris when a new table snapshot is registered.
   The goal is to store all relevant per-file stats in a DB table that will
   allow applying a pruning predicate in a single SQL query. This is
   essentially a DuckLake-style index, but used only as a file pruning index
   rather than as the source of truth. The index is allowed to lag behind the
   latest snapshot, in which case ScanPlanner will use both the index and the
   underlying files for the relevant parts of the table metadata.
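
To make the pruning index idea a bit more concrete, here is a minimal
sketch in Python with sqlite3. It is not the schema or code from the doc
or the POC: the file_stats table, its columns, and the build_index, prune
and plan_scan helpers are all hypothetical names made up for illustration.
It assumes integer min/max bounds for a single column, ignores deleted
files, and stands in for "reading the underlying manifest files" with a
plain list of files from snapshots the index has not caught up with yet.

```python
import sqlite3


def build_index(rows):
    """Create a hypothetical per-file stats table and load it.

    rows: (file_path, column_name, lower_bound, upper_bound) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_stats ("
        " file_path TEXT NOT NULL,"
        " column_name TEXT NOT NULL,"
        " lower_bound INTEGER,"
        " upper_bound INTEGER)"
    )
    conn.executemany("INSERT INTO file_stats VALUES (?, ?, ?, ?)", rows)
    return conn


def prune(conn, column, lo, hi):
    """Return indexed files that may contain values in [lo, hi].

    Files with missing bounds are kept, since they cannot be ruled out.
    """
    cur = conn.execute(
        "SELECT file_path FROM file_stats"
        " WHERE column_name = ?"
        " AND (lower_bound IS NULL OR lower_bound <= ?)"
        " AND (upper_bound IS NULL OR upper_bound >= ?)"
        " ORDER BY file_path",
        (column, hi, lo),
    )
    return [row[0] for row in cur]


def plan_scan(conn, unindexed_files, column, lo, hi):
    """Combine the (possibly lagging) index with not-yet-indexed files.

    unindexed_files: (file_path, lower_bound, upper_bound) tuples for files
    added by snapshots newer than the index; a placeholder for stats that
    would be read from the manifests themselves.
    """
    indexed = prune(conn, column, lo, hi)
    fresh = [
        path
        for path, lower, upper in unindexed_files
        if lower <= hi and upper >= lo
    ]
    return indexed + fresh
```

The point of the sketch is only that the range-overlap predicate runs as
one SQL query over the indexed part of the table, while whatever the index
has not seen yet is handled separately and the two results are merged.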

I have a POC for the caching layers in a private repo, which you can take a
look at as well: https://github.com/tokoko/iceberg-cache/.

thanks,
Tornike
