Re: [PROPOSAL] Scan Planning with Optional Caching Layers

Adnan Hemani via dev Thu, 18 Jun 2026 15:12:45 -0700

I agree with Yufei - I don't think we can implement something as heavy as
server-side planning directly in Polaris as it stands. I think we need to
revisit the Delegation Service discussion; it would be a great place to
implement this type of functionality.


Best,
Adnan Hemani

On Wed, Jun 17, 2026 at 4:11 PM Yufei Gu <[email protected]> wrote:

> Thanks for putting this together. The first phase sounds good to me.
>
> My main concern is that, without some form of delegation service, scan
> planning could easily become a heavy workload that impacts Polaris
> performance.
>
> The SQL pruning index is also a pretty big topic with a lot of design
> choices around ownership, consistency, updates, and operations. I'm not
> sure Polaris itself should be responsible for managing the index.
>
> One possible direction is to delegate scan planning and indexing to a
> separate service. That would keep Polaris focused on catalog and governance
> responsibilities while still enabling these optimizations. In a way, that
> brings us back to the delegation service discussion.
>
> Curious what others think.
>
> Yufei
>
>
> On Tue, Jun 16, 2026 at 12:44 AM Tornike Gurgenidze <
> [email protected]>
> wrote:
>
> > Hi,
> >
> > I drafted a proposal for adding Iceberg REST-compliant scan planning
> > support to Polaris. The proposal doc can be found here:
> >
> >
> > https://docs.google.com/document/d/1agpz4wwXxWfEy9fJLgPRDcrzdR5USM1i9vQhOBcHo3Q/edit?usp=sharing
> >
> > TL;DR: the doc proposes first adding a straightforward implementation of
> > scan planning and integrating the new endpoints with Polaris authz.
> > Subsequently, we can improve scan planning performance with two
> > independent caching layers:
> >
> >    - *CachingFileIO* - a FileIO wrapper around existing FileIO
> >    implementations that adds a configurable Caffeine-powered in-memory
> >    cache to speed up access to manifest files.
> >    - *SQL Pruning Index* - an additional index stored in an RDBMS and
> >    asynchronously updated by Polaris when a new table snapshot is
> >    registered. The goal is to store all relevant per-file stats in a DB
> >    table so that a pruning predicate can be applied in a single SQL
> >    query. This is essentially a DuckLake-style index, but used only as a
> >    file pruning index rather than as the source of truth. The index is
> >    allowed to lag behind the latest snapshot, in which case ScanPlanner
> >    will use both the index and the underlying files for the relevant
> >    parts of the table metadata.
> >
> > I have a POC for the caching layers in a private repo, which you can take
> > a look at as well: https://github.com/tokoko/iceberg-cache/.
> >
> > thanks,
> > Tornike
> >
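For readers following the thread, the CachingFileIO idea in the quoted proposal can be illustrated with a small read-through cache. This is only a sketch under stated assumptions, not the proposal's implementation: the actual design wraps Iceberg's Java FileIO and uses Caffeine, whereas the `CachingReader` class, its `read_bytes` callback, and the `max_bytes` limit below are hypothetical names chosen for illustration. Caching by path is reasonable here because Iceberg manifest files are immutable once written.

```python
from collections import OrderedDict
from typing import Callable


class CachingReader:
    """Read-through LRU cache around a byte-reading function.

    Hypothetical stand-in for a FileIO wrapper; not the Iceberg or Caffeine API.
    """

    def __init__(self, read_bytes: Callable[[str], bytes], max_bytes: int):
        self._read_bytes = read_bytes      # underlying (slow) reader
        self._max_bytes = max_bytes        # total cache budget in bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._size = 0
        self.hits = 0
        self.misses = 0

    def read(self, path: str) -> bytes:
        if path in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(path)
            self.hits += 1
            return self._entries[path]

        # Cache miss: fetch from the underlying reader.
        self.misses += 1
        data = self._read_bytes(path)

        # Only cache objects that fit in the budget at all.
        if len(data) <= self._max_bytes:
            self._entries[path] = data
            self._size += len(data)
            # Evict least recently used entries until within budget.
            while self._size > self._max_bytes:
                _, evicted = self._entries.popitem(last=False)
                self._size -= len(evicted)
        return data
```

Repeated reads of the same manifest path are then served from memory, and the byte budget bounds how much the planner process holds onto.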
>
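The SQL Pruning Index described in the quoted proposal can likewise be sketched in a few lines. This is a minimal illustration under assumptions, not the proposed schema: the `file_stats` table, its columns, and the `plan_scan` helper are hypothetical, integer min/max bounds stand in for per-file stats, and SQLite stands in for the RDBMS. Files from snapshots the index has not caught up with are passed in separately and checked in process, mirroring the "index plus underlying files" fallback.

```python
import sqlite3
from typing import Iterable, List, Tuple

# (file_path, column_name, lower_bound, upper_bound); hypothetical shape.
FileStats = Tuple[str, str, int, int]


def create_index(conn: sqlite3.Connection) -> None:
    conn.execute(
        """CREATE TABLE file_stats (
               file_path   TEXT,
               column_name TEXT,
               lower_bound INTEGER,
               upper_bound INTEGER
           )"""
    )


def add_files(conn: sqlite3.Connection, stats: Iterable[FileStats]) -> None:
    conn.executemany("INSERT INTO file_stats VALUES (?, ?, ?, ?)", list(stats))


def plan_scan(
    conn: sqlite3.Connection,
    column: str,
    lo: int,
    hi: int,
    unindexed: Iterable[FileStats] = (),
) -> List[str]:
    """Return files whose [lower, upper] range overlaps [lo, hi]."""
    # Indexed files: one SQL query applies the pruning predicate.
    rows = conn.execute(
        "SELECT file_path FROM file_stats "
        "WHERE column_name = ? AND upper_bound >= ? AND lower_bound <= ?",
        (column, lo, hi),
    ).fetchall()
    matched = [row[0] for row in rows]

    # Files the index has not seen yet: evaluate the same predicate directly.
    for path, col, lower, upper in unindexed:
        if col == column and upper >= lo and lower <= hi:
            matched.append(path)
    return sorted(matched)


conn = sqlite3.connect(":memory:")
create_index(conn)
add_files(conn, [
    ("f1.parquet", "id", 0, 99),
    ("f2.parquet", "id", 100, 199),
    ("f3.parquet", "id", 200, 299),
])
# f4 belongs to a newer snapshot that the index lags behind.
newer = [("f4.parquet", "id", 150, 250)]
planned = plan_scan(conn, "id", 120, 180, newer)
print(planned)
```

Because the index is only a pruning aid and not the source of truth, a stale or missing row costs extra metadata reads rather than correctness, as long as unindexed files are always evaluated against the underlying metadata.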
