Hi Tornike,

To clarify, I support Phase 1. That was actually the main point of the first paragraph in my previous email. Could we focus on Phase 1 first? We can also discuss the other topics in parallel.

Thanks,
Yufei

On Fri, Jun 19, 2026 at 4:08 PM Dmitri Bourlatchkov <[email protected]> wrote:

> Hi Tornike,
>
> It's a very interesting proposal. Thanks for submitting it!
>
> The doc LGTM - no particular comments there.
>
> I imagine the actual caching layer might receive some more feedback and
> alternative suggestions later, but I'm sure it will invigorate the project.
>
> Breaking the implementation plan into multiple phases is certainly a
> good idea.
>
> Re: performance concerns, I propose making the implementation modular and
> composable (which is the approach followed by [4115]).
>
> Users of the ASF binaries will be able to switch the feature on/off
> according to their needs and avoid unnecessary overhead if they do not
> need this functionality.
>
> Downstream builds will be able to include/exclude related modules and
> further optimize the server's image this way.
>
> If a suitable external service (such as delegation) becomes available
> later, the modular design of this feature should simplify integrating
> with it.
>
> All in all, I support implementing this proposal in Polaris. Making it
> available in ASF releases will promote user feedback, which will inform
> further development of this feature.
>
> [4115] https://github.com/apache/polaris/pull/4115
>
> Cheers,
> Dmitri.
>
> On Fri, Jun 19, 2026 at 9:20 AM Tornike Gurgenidze <[email protected]>
> wrote:
>
> > Yufei, Adnan, thanks for taking a look at the proposal.
> >
> > I definitely understand the concern and agree that there should be a
> > way to avoid including compute-intensive workloads in the Polaris
> > server and/or metadata db. Still, my preferred approach would be to
> > implement the entire functionality first and make it configurable
> > later on, when we have a better idea of what the Delegation Service
> > will look like (planning will sit behind a feature flag, after all).
> > If that sounds fine, I can adjust the proposal to include eventual
> > integration with the Delegation Service (both for the ScanPlanner SPI
> > and indexing) rather than make the Delegation Service a hard
> > prerequisite.
> >
> > Regarding the SQL pruning index: I agree that it's a big topic and
> > probably valuable even outside the scope of Polaris. Still, since
> > there's no existing spec for anything like that outside of Polaris, I
> > think it makes sense to start laying the foundation for it here for
> > this particular use case, don't you agree? In terms of compute, the
> > actual indexing can happen "externally", maybe orchestrated by the
> > Polaris CLI rather than as a side effect of a snapshot update.
> >
> > In short, while I agree that we should coordinate planning and the
> > Delegation Service, I'd much rather implement the feature first and
> > then build the Delegation Service around it, especially since there
> > are both types of delegation requirements here (invoking an external
> > planner, notifying an external indexer).
> >
> > Thanks,
> > Tornike
> >
> > On Fri, Jun 19, 2026 at 2:12 AM Adnan Hemani via dev <[email protected]>
> > wrote:
> >
> > > I agree with Yufei - I don't think we can implement something as
> > > heavy as server-side planning directly onto Polaris as it stands. I
> > > think we need to revisit the Delegation Service discussion; it would
> > > be a great place to implement this type of functionality.
> > >
> > > Best,
> > > Adnan Hemani
> > >
> > > On Wed, Jun 17, 2026 at 4:11 PM Yufei Gu <[email protected]> wrote:
> > >
> > > > Thanks for putting this together. The first phase sounds good to
> > > > me.
> > > >
> > > > My main concern is that, without some form of delegation service,
> > > > scan planning could easily become a heavy workload that impacts
> > > > Polaris performance.
> > > >
> > > > The SQL pruning index is also a pretty big topic with a lot of
> > > > design choices around ownership, consistency, updates, and
> > > > operations. I'm not sure Polaris itself should be responsible for
> > > > managing the index.
> > > >
> > > > One possible direction is to delegate scan planning and indexing
> > > > to a separate service. That would keep Polaris focused on catalog
> > > > and governance responsibilities while still enabling these
> > > > optimizations. In a way, that brings us back to the delegation
> > > > service discussion.
> > > >
> > > > Curious what others think.
> > > >
> > > > Yufei
> > > >
> > > > On Tue, Jun 16, 2026 at 12:44 AM Tornike Gurgenidze
> > > > <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I drafted a proposal for adding Iceberg REST-compliant scan
> > > > > planning support to Polaris. The proposal doc can be found here:
> > > > >
> > > > > https://docs.google.com/document/d/1agpz4wwXxWfEy9fJLgPRDcrzdR5USM1i9vQhOBcHo3Q/edit?usp=sharing
> > > > >
> > > > > TL;DR: the doc proposes to first add a straightforward
> > > > > implementation of scan planning in the initial phase and
> > > > > integrate the new endpoints with Polaris authz. Subsequently, we
> > > > > can enhance scan planning performance with 2 independent caching
> > > > > layers:
> > > > >
> > > > >    - *CachingFileIO* - a FileIO wrapper that wraps existing
> > > > >    FileIO implementations and introduces a configurable
> > > > >    Caffeine-powered in-memory cache to speed up access to
> > > > >    manifest files.
> > > > >    - *SQL Pruning Index* - an additional index stored in an
> > > > >    RDBMS and asynchronously updated by Polaris when a new table
> > > > >    snapshot is registered. The goal is to store all relevant
> > > > >    per-file stats in a db table that will allow applying a
> > > > >    pruning predicate in a single SQL query. This is essentially
> > > > >    a DuckLake-style index, but used only as a file pruning index
> > > > >    rather than the source of truth. The index is allowed to lag
> > > > >    behind the latest snapshot, in which case ScanPlanner will
> > > > >    use both the index and the underlying files for the relevant
> > > > >    parts of the table metadata.
> > > > >
> > > > > I have a POC for the caching layers in a private repo which you
> > > > > can take a look at as well:
> > > > > https://github.com/tokoko/iceberg-cache/.
> > > > >
> > > > > Thanks,
> > > > > Tornike
