I agree with Yufei - I don't think we can implement something as heavy as server-side scan planning directly in Polaris as it stands. I think we need to revisit the Delegation Service discussion; it would be a great place to implement this type of functionality.

Best,
Adnan Hemani

On Wed, Jun 17, 2026 at 4:11 PM Yufei Gu <[email protected]> wrote:

> Thanks for putting this together. The first phase sounds good to me.
>
> My main concern is that, without some form of delegation service, scan
> planning could easily become a heavy workload that impacts Polaris
> performance.
>
> The SQL pruning index is also a pretty big topic with a lot of design
> choices around ownership, consistency, updates, and operations. I'm not
> sure Polaris itself should be responsible for managing the index.
>
> One possible direction is to delegate scan planning and indexing to a
> separate service. That would keep Polaris focused on catalog and governance
> responsibilities while still enabling these optimizations. In a way, that
> brings us back to the delegation service discussion.
>
> Curious what others think.
>
> Yufei
>
>
> On Tue, Jun 16, 2026 at 12:44 AM Tornike Gurgenidze <[email protected]>
> wrote:
>
> > Hi,
> >
> > I drafted a proposal regarding adding iceberg rest-compliant scan planning
> > support to Polaris. The proposal doc can be found here:
> >
> > https://docs.google.com/document/d/1agpz4wwXxWfEy9fJLgPRDcrzdR5USM1i9vQhOBcHo3Q/edit?usp=sharing
> >
> > tldr: doc proposes to first add a straightforward implementation of scan
> > planning in the initial phase and integrate new endpoints with polaris
> > authz. Subsequently, we can enhance scan planning performance with 2
> > independent caching layers:
> >
> > - *CachingFileIO* - FileIO wrapper that wraps existing FileIO
> >   implementations and introduces a configurable Caffeine-powered in-memory
> >   cache to speed up access to manifest files.
> > - *SQL Pruning Index* - additional index stored in a rdbms and
> >   asynchronously updated by polaris when a new table snapshot is
> >   registered. The goal is to store all relevant per-file stats in a db
> >   table that will allow applying a pruning predicate in a single sql
> >   query. This is essentially a ducklake-style index but used only as a
> >   file pruning index rather than the source of truth. Index is allowed to
> >   lag behind the latest snapshot in which case ScanPlanner will use both
> >   index and underlying files for the relevant parts of the table metadata.
> >
> > I have a POC for caching layers in a private repo which you can take a look
> > at as well: https://github.com/tokoko/iceberg-cache/.
> >
> > thanks,
> > Tornike
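
For readers following the thread, the "pruning predicate in a single sql query" idea from the quoted proposal can be sketched roughly as below. This is only an illustration under stated assumptions, not the design in the proposal doc or the POC: the `file_stats` table, its columns, and the `register_files` / `prune_greater_than` helpers are hypothetical names, integer-only stats are a simplification, and SQLite stands in for whichever rdbms would actually be used.

```python
import sqlite3


def create_index(conn):
    # Hypothetical layout: one row per (data file, column) holding min/max stats.
    conn.execute(
        """
        CREATE TABLE file_stats (
            snapshot_id INTEGER NOT NULL,
            file_path   TEXT    NOT NULL,
            column_name TEXT    NOT NULL,
            min_value   INTEGER NOT NULL,
            max_value   INTEGER NOT NULL
        )
        """
    )


def register_files(conn, snapshot_id, files):
    # files: iterable of (file_path, column_name, min_value, max_value).
    # In the proposal this step would run asynchronously after a new
    # snapshot is registered; here it is a plain synchronous insert.
    conn.executemany(
        "INSERT INTO file_stats VALUES (?, ?, ?, ?, ?)",
        [(snapshot_id, path, col, lo, hi) for path, col, lo, hi in files],
    )


def prune_greater_than(conn, column_name, lower_bound):
    # Keep only files whose max for the column exceeds the bound, i.e. files
    # that might contain rows matching `column > lower_bound`.
    rows = conn.execute(
        """
        SELECT file_path FROM file_stats
        WHERE column_name = ? AND max_value > ?
        ORDER BY file_path
        """,
        (column_name, lower_bound),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_index(conn)
    register_files(
        conn,
        snapshot_id=1,
        files=[
            ("a.parquet", "id", 0, 10),
            ("b.parquet", "id", 11, 50),
            ("c.parquet", "id", 51, 100),
        ],
    )
    print(prune_greater_than(conn, "id", 40))
```

The sketch leaves out the lagging-index case mentioned in the proposal, where files from snapshots not yet indexed would still have to be planned from the underlying manifests.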
