Re: Dedicated sync for Iceberg Index Support

Péter Váry Fri, 13 Mar 2026 03:42:59 -0700

Please note that the next Secondary Index Sync will take place on March
16th, 9:00–10:00 AM PDT (5:00–6:00 PM CET).


Proposed agenda

   - Discussion of potential use cases
      - Vector indexes
         - IVF‑PQ
         - DiskANN
      - Writer requirements - *Iceberg secondary indexes*
      <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4b1p8r8nmfg1>
      - Index updates should be optional by default
      - Should we support mandatory indexes at the table level?
         - This either requires storing index metadata in table metadata, or
         - Enabling mandatory indexes only via REST Catalog composite APIs.
         In this model, REST Catalogs can report and enforce the requirement,
         while other catalog implementations may fail at commit time.
      - Index maintenance - *Iceberg secondary indexes*
      <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.hw2nt44i0k8q>
      - Asynchronous maintenance by default - This is typically required
      for all index types to optimize index layout, similar to table compaction.
      - REST Catalogs could enable synchronous table and index commits when
      needed - See discussion under Writer requirements.
   - Placement of index metadata (TableMetadata vs. IndexCatalog) - *Iceberg
   secondary indexes*
   <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4oz3yd6ngr3>
      - If we accept the following constraints, IndexCatalog enables looser
      coupling:
         - REST Catalog composite APIs are required to:
            - Read table and index metadata in a single call (instead of
            two parallel REST calls)
            - Notify writers when index updates are required
            - Support synchronous index updates
         - Until composite APIs are available, indexes can still be used,
         but will require multiple REST calls to fetch metadata.
   - Scope clarification
      - Synchronously updated indexes should remain a supported option, but
      I suggest treating them as out of scope for this proposal.


Huaxin, please update the doc with the agenda items from your side if you
have them.

See you there,
Thanks,
Peter

Péter Váry <[email protected]> wrote (on Wed, Mar 4, 2026, 13:42):

> You are right, Steven. If we use column IDs as a reference, then we should
> not have issues.
>
> On Tue, Mar 3, 2026, 18:07 Steven Wu <[email protected]> wrote:
>
>> > if a column’s default value changes (a schema/metadata-only update), we
>> > may still need to refresh the index to ensure it returns correct results.
>>
>> The initial-default value never changes after the column is added to the
>> schema. The write-default can change, but that only applies to new rows. I
>> am not sure we have a problem here.
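
The default-value semantics described above can be sketched in a few lines.
This is a minimal illustration, not Iceberg code: the `Field` class and the
`read_value` and `write_row` helpers are hypothetical, and rows are modeled as
plain dicts keyed by column ID. It assumes, as stated above, that
initial-default is fixed once the column is added and that write-default only
affects rows written afterwards.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field; not an Iceberg class."""
    field_id: int
    name: str
    initial_default: Any = None  # value for rows written before the column existed
    write_default: Any = None    # value for new rows when the writer supplies none


def read_value(stored_row: Dict[int, Any], field: Field) -> Any:
    # Rows are keyed by column ID. A row that predates the column has no
    # entry for it and is read as the (immutable) initial-default.
    if field.field_id in stored_row:
        return stored_row[field.field_id]
    return field.initial_default


def write_row(values: Dict[int, Any], field: Field) -> Dict[int, Any]:
    # A new row materializes the current write-default if no value is given.
    row = dict(values)
    row.setdefault(field.field_id, field.write_default)
    return row


# Column 2 is added with both defaults set to "unknown".
region = Field(field_id=2, name="region",
               initial_default="unknown", write_default="unknown")
old_row = {1: "a"}  # written before column 2 existed
indexed_before = read_value(old_row, region)

# Metadata-only change: only the write-default is updated.
region_v2 = replace(region, write_default="eu")
indexed_after = read_value(old_row, region_v2)
new_row = write_row({1: "b"}, region_v2)

print(indexed_before, indexed_after, read_value(new_row, region_v2))
```

Under these assumptions, the value an index would have recorded for the old
row is the same before and after the write-default change, because that row
still resolves through the unchanged initial-default by column ID.
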
>>
>> On Tue, Mar 3, 2026 at 5:27 AM Péter Váry <[email protected]>
>> wrote:
>>
>>> Thanks to everyone who participated in the community sync about
>>> indexes!
>>>
>>> Here is the recording:
>>> https://www.youtube.com/watch?v=pZFJfAlMHsM&list=PLkifVhhWtccwbfBhHk_DGOogxXNtiKvbF
>>> Here is the chat log:
>>> https://drive.google.com/file/d/1_N1suxhhdHt4aQuoPuLX24KJz32w3qW0/view
>>>
>>> Added my highlights about the general index discussion to the doc:
>>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.8041k7j2n7y3#heading=h.n0hz359alh52
>>>
>>> A few takeaways from the general index discussion:
>>>
>>>>
>>>>    - We reviewed the options for synchronous and asynchronous index
>>>>    updates. We agreed that asynchronous updates should be our primary
>>>>    focus, while we expect that synchronous updates could still be
>>>>    valuable in certain scenarios. In those cases, we may be able to rely
>>>>    on the catalog REST API to ensure that table updates and index updates
>>>>    occur atomically.
>>>>
>>>>    - We also touched on writer requirements. We would like to avoid
>>>>    requiring extra work from writers, but in some cases this might be
>>>>    necessary. Also, although many tables typically have a single writer,
>>>>    table maintenance operations still need to be taken into account. We
>>>>    may want to introduce a flag that blocks writes unless the writer is
>>>>    capable of updating the index as well. Alternatively, we could define
>>>>    a mechanism that ensures the table cannot be updated without updating
>>>>    the index.
>>>>
>>>>    - Prashant pointed out that we must also consider values stored
>>>>    solely in table metadata when computing indexes. For example, if a
>>>>    column’s default value changes (a schema/metadata-only update), we may
>>>>    still need to refresh the index to ensure it returns correct results.
>>>>
>>>>
>>> In the next sync, I would like to follow up with vector indexes and, if
>>> we have time, index maintenance.
>>>
>>> Thanks,
>>> Peter
>>>
>>>
>>> huaxin gao <[email protected]> wrote (on Mon, Mar 2, 2026, 4:24):
>>>
>>>> Thanks Peter for the reminder and agenda!
>>>>
>>>> Here are some more details for the Bloom index status:
>>>>
>>>>
>>>>    - When it helps: high-cardinality =/IN predicates where min/max
>>>>    stats are not selective and many files remain after normal Iceberg
>>>>    pruning (“needle in a haystack”).
>>>>    - Why it helps vs Parquet row-group Bloom: row-group Bloom still
>>>>    requires opening each candidate data file (footer/Bloom pages). Puffin
>>>>    Bloom is consulted during planning, so it can prune files before
>>>>    scheduling scan tasks and opening most files.
>>>>    - Savings vs cost:
>>>>       - Savings: plannedFiles → afterBloom (files avoided)
>>>>       - Cost: planner reads statsFiles/statsBytes/bloomPayloadBytes
>>>>       (Puffin footer + selective blob slices)
>>>>       - Example (POC benchmark): plannedFiles=658, afterBloom=1
>>>>       (needle), with index overhead statsFiles=1, statsBytes≈17MB,
>>>>       bloomPayloadBytes≈16.8MB. The goal is to show “avoided per-file
>>>>       opens/tasks” outweighs “index read”. This benchmark is
>>>>       intentionally scoped to the workload the feature targets; it’s not
>>>>       meant to claim Bloom skipping helps all queries, which is why the
>>>>       feature is opt-in. Users enable this when they see selective point
>>>>       lookups over many files and want to reduce file opens/task
>>>>       scheduling.
>>>>    - Sizing: for fpp=0.01, Bloom needs about 1.2 bytes per inserted
>>>>    value. Example: ~10,000 values/file → ~12 KB Bloom payload per data
>>>>    file (plus small Puffin overhead).
>>>>    - Lifecycle/maintenance: incremental shards for new files;
>>>>    missing/behind is safe (no pruning); shard compaction + snapshot
>>>>    expiration/orphan cleanup to bound artifacts.
>>>>    - Writer expectations: async maintenance is primary; inline is
>>>>    optional (inline writers may not know the final number of inserted
>>>>    values up front, so they can size at file close or use a
>>>>    scalable/growing Bloom filter); any error/missing/stale index ⇒
>>>>    fallback (correctness unchanged). Feature is opt-in for the targeted
>>>>    workload.
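
The sizing figures in the list above can be reproduced with the textbook
formula for a classic Bloom filter, bits per value = -ln(fpp) / (ln 2)^2.
The sketch below is only that arithmetic; it assumes the classic formula
applies, which may differ from the exact layout used in the POC, and the
function names are made up for illustration.

```python
import math


def bloom_bits_per_value(fpp: float) -> float:
    """Optimal bits per inserted value for a classic Bloom filter."""
    if not 0.0 < fpp < 1.0:
        raise ValueError("fpp must be strictly between 0 and 1")
    return -math.log(fpp) / (math.log(2) ** 2)


def bloom_payload_bytes(num_values: int, fpp: float) -> int:
    """Approximate payload size in bytes for num_values inserted values."""
    return math.ceil(num_values * bloom_bits_per_value(fpp) / 8)


def optimal_hash_count(fpp: float) -> int:
    """Number of hash functions that minimizes fpp at the optimal size."""
    return max(1, round(-math.log2(fpp)))


bytes_per_value = bloom_bits_per_value(0.01) / 8
per_file_bytes = bloom_payload_bytes(10_000, 0.01)

print(f"bytes per value at fpp=0.01: {bytes_per_value:.3f}")
print(f"payload for 10,000 values: {per_file_bytes} bytes")
print(f"hash functions: {optimal_hash_count(0.01)}")
```

This comes out at roughly 1.2 bytes per value and about 12 KB for 10,000
values, in line with the numbers quoted above. An inline writer that does not
know the value count up front could call `bloom_payload_bytes` at file close,
once the count is known.
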
>>>>
>>>> Looking forward to the sync!
>>>>
>>>> Best,
>>>>
>>>> Huaxin
>>>>
>>>> On Sat, Feb 28, 2026 at 3:53 AM Péter Váry <[email protected]>
>>>> wrote:
>>>>
>>>>> Please note that the next *Secondary Index Sync* will take place on *March
>>>>> 2nd, 9:00-10:00 AM PT*.
>>>>>
>>>>> *Proposed agenda*:
>>>>>
>>>>>    - Discussion of potential use‑cases
>>>>>       - Primary Key index for Flink equality‑delete resolution
>>>>>       - Secondary data layout
>>>>>          - Containing index
>>>>>          - Alternative query plans
>>>>>       - Vector index
>>>>>    - Discussion of the two alternative approaches for metadata
>>>>>    placement: keeping index metadata inside the table metadata vs.
>>>>>    managing it externally through an Index Catalog
>>>>>    - Bloom filter index status update
>>>>>       - Performance justification: when this helps (high-cardinality
>>>>>       = / IN, many data files, high object-store latency) and how it
>>>>>       differs from Parquet row-group Bloom filters (which still require
>>>>>       opening the data file).
>>>>>       - Cost / scalability: rough sizing (Bloom blob size per file,
>>>>>       Puffin file size), the planning cost trade-off (driver index reads
>>>>>       vs executor file opens), and mitigations via caching.
>>>>>       - Lifecycle / maintenance: incremental production as new data
>>>>>       files arrive, behavior when the index is missing/behind, and
>>>>>       sharding/compaction plus cleanup to avoid accumulating too many
>>>>>       small Puffin files over time.
>>>>>       - Writer expectations: inline (optional) vs asynchronous
>>>>>       (primary) index creation.
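
To make the "missing/behind" behavior in the agenda above concrete, here is a
minimal sketch of planning-time pruning with a safe fallback. Everything in it
is hypothetical: `TinyBloom` is a toy filter built on SHA-256 rather than the
Puffin blob format, and `prune_with_bloom` is not an Iceberg API. The only
point it illustrates is that a data file without a usable Bloom shard is kept
rather than pruned, so a lagging index costs performance but not correctness.

```python
import hashlib
from typing import Dict, Iterable, List


class TinyBloom:
    """Toy Bloom filter for illustration only (not the Puffin layout)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> List[int]:
        # Double hashing: derive all bit positions from two 64-bit halves
        # of a single SHA-256 digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


def prune_with_bloom(planned_files: Iterable[str],
                     blooms: Dict[str, TinyBloom],
                     needle: str) -> List[str]:
    """Keep a file unless its Bloom shard proves the needle is absent."""
    kept = []
    for path in planned_files:
        bloom = blooms.get(path)
        # Missing shard (index behind): fall back to scanning the file.
        if bloom is None or bloom.might_contain(needle):
            kept.append(path)
    return kept


def build_bloom(values: Iterable[str]) -> TinyBloom:
    bloom = TinyBloom(num_bits=1 << 16, num_hashes=7)
    for value in values:
        bloom.add(value)
    return bloom


blooms = {
    "a.parquet": build_bloom(f"id-{i}" for i in range(0, 1000)),
    "b.parquet": build_bloom(f"id-{i}" for i in range(1000, 2000)),
    # c.parquet was written after the last index refresh: no shard yet.
}
planned = ["a.parquet", "b.parquet", "c.parquet"]
kept = prune_with_bloom(planned, blooms, "id-1500")
print(kept)
```

`b.parquet` is always kept because a Bloom filter has no false negatives, and
`c.parquet` is always kept because it has no shard. `a.parquet` is dropped
unless the filter reports a false positive. A real implementation would also
need to treat stale or unreadable shards the same way as missing ones.
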
>>>>>
>>>>> Looking forward to diving into this topic together.
>>>>>
>>>>> See you all there,
>>>>> Peter
>>>>>
>>>>> Péter Váry <[email protected]> wrote (on Wed, Feb 25, 2026, 10:04):
>>>>>
>>>>>> Dan kindly set up a dedicated public Slack channel (*#indexes*) for
>>>>>> the Secondary Index discussion.
>>>>>> You can find it here:
>>>>>> https://apache-iceberg.slack.com/archives/C0AFDSU3EUU
>>>>>> Feel free to join if you’d like to participate in the discussion or
>>>>>> simply follow along.
>>>>>>
>>>>>> Thanks,
>>>>>> Peter
>>>>>>
>>>>>> Péter Váry <[email protected]> wrote (on Tue, Feb 24, 2026, 12:52):
>>>>>>
>>>>>>> We had an extended discussion on Slack with Dan, Steven, and Yufei
>>>>>>> about where index metadata should live: in particular, whether it
>>>>>>> should be stored directly in the table metadata or maintained in a
>>>>>>> dedicated index catalog. I tried to capture this discussion in the
>>>>>>> Layout
>>>>>>> <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4oz3yd6ngr3>
>>>>>>> section of the document.
>>>>>>>
>>>>>>> Once the decision is made, this section can be shortened, but for
>>>>>>> now it is intentionally more detailed so that everyone can see the
>>>>>>> arguments that were discussed and so that those who could not
>>>>>>> participate synchronously can still follow and provide feedback
>>>>>>> offline.
>>>>>>>
>>>>>>> In short, we are currently *leaning toward storing index metadata
>>>>>>> in its own catalog*, while allowing REST catalogs to expose a
>>>>>>> composite endpoint that returns both table and index metadata in a
>>>>>>> single round trip. This is similar in spirit to the universal load
>>>>>>> endpoint discussed in the context of materialized view loading.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Peter
>>>>>>>
>>>>>>> Péter Váry <[email protected]> wrote (on Thu, Feb 19, 2026, 14:06):
>>>>>>>
>>>>>>>> Thanks Huaxin for posting the recording and the meeting notes.
>>>>>>>>
>>>>>>>> I used this time to also address the questions collected during the
>>>>>>>> sync:
>>>>>>>>
>>>>>>>>    - Collected some representative use cases. See the example
>>>>>>>>    use-cases
>>>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.i4gt8za99j9d>
>>>>>>>>    paragraph. Anyone should feel free to suggest their own.
>>>>>>>>    - Collected my thoughts about the writer requirements. See the
>>>>>>>>    writer requirements
>>>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4b1p8r8nmfg1>
>>>>>>>>    paragraph.
>>>>>>>>    - Centralized the index maintenance related parts. See the index
>>>>>>>>    maintenance
>>>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.hw2nt44i0k8q>
>>>>>>>>    paragraph.
>>>>>>>>
>>>>>>>> It might be a bit premature, but I created a PR
>>>>>>>> <https://github.com/apache/iceberg/pull/15101> with the
>>>>>>>> proposed index catalog related changes, so that those who are more
>>>>>>>> code-oriented can take a look at it too.
>>>>>>>>
>>>>>>>> huaxin gao <[email protected]> wrote (on Thu, Feb 19, 2026, 5:34):
>>>>>>>>
>>>>>>>>> Hi Everyone,
>>>>>>>>>
>>>>>>>>> Here are the recording and notes from the Iceberg Index Support
>>>>>>>>> Sync on 2/11.
>>>>>>>>>
>>>>>>>>> Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>>>>>>>>>
>>>>>>>>> Notes:
>>>>>>>>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>>>>>>>>>
>>>>>>>>> The meeting will move to biweekly, Mondays 9–10am PST, starting
>>>>>>>>> March 2.
>>>>>>>>>
>>>>>>>>> Since the sync, I updated the Bloom skipping index proposal
>>>>>>>>> <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
>>>>>>>>> to address the discussion questions, specifically:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    - Performance justification: when this helps (high-cardinality
>>>>>>>>>    = / IN, many data files, high object-store latency) and how it
>>>>>>>>>    differs from Parquet row-group Bloom filters (which still require
>>>>>>>>>    opening the data file).
>>>>>>>>>    - Cost / scalability: rough sizing (Bloom blob size per file,
>>>>>>>>>    Puffin file size), the planning cost trade-off (driver index reads
>>>>>>>>>    vs executor file opens), and mitigations via caching.
>>>>>>>>>    - Lifecycle / maintenance: incremental production as new data
>>>>>>>>>    files arrive, behavior when the index is missing/behind, and
>>>>>>>>>    sharding/compaction plus cleanup to avoid accumulating too many
>>>>>>>>>    small Puffin files over time.
>>>>>>>>>    - Writer expectations: inline (optional) vs asynchronous
>>>>>>>>>    (primary) index creation.
>>>>>>>>>
>>>>>>>>> I also implemented a Spark 4.1 POC
>>>>>>>>> <https://github.com/apache/iceberg/pull/15311> and a local
>>>>>>>>> benchmark to quantify both the pruning impact (plannedFiles →
>>>>>>>>> afterBloom) and the index read overhead (statsFiles, statsBytes,
>>>>>>>>> bloomPayloadBytes) for point predicates on high-cardinality columns.
>>>>>>>>> Please take a look and let me know if you have any questions or
>>>>>>>>> feedback.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Huaxin
>>>>>>>>>
>>>>>>>>> On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Reminder for tomorrow's sync on Iceberg Index Support.
>>>>>>>>>>
>>>>>>>>>> Wednesday: Feb. 11 9:00 – 10:00am
>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>> Google Meet joining info
>>>>>>>>>> Video call link: meet.google.com/nsp-ctyr-khk
>>>>>>>>>> Design doc:
>>>>>>>>>>
>>>>>>>>>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>>>>>>>>>
>>>>>>>>>> https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Huaxin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Huaxin and Steven for organizing this. Looking forward to
>>>>>>>>>>> meeting you all next week!
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> We set up the dev calendar event with a new google meet link.
>>>>>>>>>>>> Please ignore the link from Huaxin's original email.
>>>>>>>>>>>>
>>>>>>>>>>>> The dev calendar has the correct info (including the new
>>>>>>>>>>>> meeting link)
>>>>>>>>>>>>
>>>>>>>>>>>> Iceberg Index Support Sync
>>>>>>>>>>>> Wednesday, February 11 · 9:00 – 10:00am
>>>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>>>> Google Meet joining info
>>>>>>>>>>>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry, I meant PST (not EST) :)
>>>>>>>>>>>>> Looking forward to the discussion!
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Huaxin,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for starting the sync!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>>>>>>>>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>>>>>>>>>>> not EST. Maybe it's a typo?
>>>>>>>>>>>>>> Otherwise, looking forward to the discussion!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>> Shawn
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>> I'd like to start a dedicated sync to discuss Iceberg Index
>>>>>>>>>>>>>>> support. Here is the existing discussion thread:
>>>>>>>>>>>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To ground the discussion, here are the two proposals:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - Peter's proposal (overall index support)
>>>>>>>>>>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>>>>>>>>>>>>    - My proposal (bloom filter skipping index)
>>>>>>>>>>>>>>>    <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST,
>>>>>>>>>>>>>>> starting next Wednesday (2/11). After FileFormat sync finishes,
>>>>>>>>>>>>>>> we plan to use that slot and switch to every other Monday, 9 AM
>>>>>>>>>>>>>>> to 10 AM EST.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Huaxin
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
