Re: Dedicated sync for Iceberg Index Support

Péter Váry Tue, 03 Mar 2026 05:25:47 -0800

Thanks to everyone who participated in the community sync about
indexes!


Here is the recording:
https://www.youtube.com/watch?v=pZFJfAlMHsM&list=PLkifVhhWtccwbfBhHk_DGOogxXNtiKvbF
Here is the chat log:
https://drive.google.com/file/d/1_N1suxhhdHt4aQuoPuLX24KJz32w3qW0/view

Added my highlights about the general index discussion to the doc:
https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.8041k7j2n7y3#heading=h.n0hz359alh52

A few takeaways from the general index discussion:

>
>    - We reviewed the options for synchronous and asynchronous index
>    updates. We agreed that asynchronous updates should be our primary focus,
>    while we expect that synchronous updates could still be valuable in certain
>    scenarios. In those cases, we may be able to rely on the catalog REST API
>    to ensure that table updates and index updates occur atomically.
>
>
>    - We also touched on writer requirements. We would like to avoid
>    requiring extra work from writers, but in some cases this might be
>    necessary. Also, although many tables typically have a single writer,
>    table maintenance operations still need to be taken into account. We may
>    want to introduce a flag that blocks writes unless the writer is capable
>    of updating the index as well. Alternatively, we could define a mechanism
>    that ensures the table cannot be updated without updating the index.
>
>
>    - Prashant pointed out that we must also consider values stored solely
>    in table metadata when computing indexes. For example, if a column’s
>    default value changes (a schema/metadata-only update), we may still need to
>    refresh the index to ensure it returns correct results.
>
>
In the next sync, I would like to follow up with vector indexes and, if
we have time, index maintenance.

Thanks,
Peter


huaxin gao <[email protected]> wrote on Mon, Mar 2, 2026,
4:24:

> Thanks Peter for the reminder and agenda!
>
> Here are some more details for the Bloom index status:
>
>
>    - When it helps: high-cardinality =/IN predicates where min/max stats
>    are not selective and many files remain after normal Iceberg pruning
>    (“needle in a haystack”).
>    - Why it helps vs Parquet row-group Bloom: row-group Bloom still
>    requires opening each candidate data file (footer/Bloom pages). Puffin
>    Bloom is consulted during planning, so it can prune files before scheduling
>    scan tasks and opening most files.
>    - Savings vs cost:
>       - Savings: plannedFiles → afterBloom (files avoided)
>       - Cost: planner reads statsFiles/statsBytes/bloomPayloadBytes
>       (Puffin footer + selective blob slices)
>       - Example (POC benchmark): plannedFiles=658, afterBloom=1 (needle),
>       with index overhead statsFiles=1, statsBytes≈17MB,
>       bloomPayloadBytes≈16.8MB. The goal is to show “avoided per-file
>       opens/tasks” outweighs “index read”. This benchmark is intentionally
>       scoped to the workload the feature targets; it’s not meant to claim
>       Bloom skipping helps all queries, which is why the feature is opt-in.
>       Users enable this when they see selective point lookups over many
>       files and want to reduce file opens/task scheduling.
>    - Sizing: for fpp=0.01, a Bloom filter needs about 1.2 bytes (roughly
>    9.6 bits) per inserted value. Example: ~10,000 values/file → ~12 KB Bloom
>    payload per data file (plus small Puffin overhead).
>    - Lifecycle/maintenance: incremental shards for new files;
>    missing/behind is safe (no pruning); shard compaction + snapshot
>    expiration/orphan cleanup to bound artifacts.
>    - Writer expectations: async maintenance is primary; inline is
>    optional (inline writers may not know the final number of inserted values
>    up front, so they can size at file close or use a scalable/growing Bloom
>    filter); any error/missing/stale index ⇒ fallback (correctness unchanged).
>    Feature is opt-in for the targeted workload.
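
As a rough check on the sizing figures above, here is a minimal sketch
using the textbook formula for a classic Bloom filter
(bits per value = -ln(fpp) / (ln 2)^2). The function names are made up for
illustration and are not part of the POC or of any Iceberg API, and Puffin
overhead is ignored.

```python
import math


def bloom_bits_per_value(fpp: float) -> float:
    """Optimal bits per inserted value for a classic Bloom filter."""
    return -math.log(fpp) / (math.log(2) ** 2)


def optimal_num_hashes(fpp: float) -> int:
    """Optimal number of hash functions, k = -log2(fpp), rounded."""
    return max(1, round(-math.log2(fpp)))


def bloom_payload_bytes(num_values: int, fpp: float = 0.01) -> int:
    """Approximate Bloom payload size in bytes for one data file.

    Assumption: one filter per data file, sized for `num_values`
    inserted values, with no container overhead.
    """
    bits = math.ceil(num_values * bloom_bits_per_value(fpp))
    return math.ceil(bits / 8)


if __name__ == "__main__":
    # Bytes per inserted value at fpp=0.01 (about 1.2).
    print(round(bloom_bits_per_value(0.01) / 8, 2))
    # Payload for ~10,000 values in one file (about 12 KB).
    print(bloom_payload_bytes(10_000, 0.01))
```

For fpp=0.01 this works out to about 9.6 bits, i.e. roughly 1.2 bytes per
value, which is consistent with the ~12 KB estimate for ~10,000 values.
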
>
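
The "missing/behind is safe" and fallback behavior described above could
look roughly like the sketch below. Everything here is hypothetical:
SimpleBloom is a toy filter (SHA-256 based double hashing), not the Puffin
blob format or the POC code, and prune_files is an invented helper. The
only point is that a file is skipped exclusively when a usable filter says
the value is definitely absent.

```python
import hashlib
from typing import Dict, List


class SimpleBloom:
    """Toy Bloom filter for illustration only (not the Puffin layout)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value: str) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit hashes.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


def prune_files(
    planned_files: List[str],
    blooms: Dict[str, SimpleBloom],
    lookup_value: str,
) -> List[str]:
    """Keep every file that might contain `lookup_value`.

    A file with no filter (index missing or behind) or with a filter that
    fails to evaluate is always kept, so correctness is unchanged.
    """
    kept = []
    for path in planned_files:
        bloom = blooms.get(path)
        if bloom is None:
            kept.append(path)  # no index entry: no pruning
            continue
        try:
            if bloom.might_contain(lookup_value):
                kept.append(path)
        except Exception:
            kept.append(path)  # any error: fall back to scanning the file
    return kept
```

Because a Bloom filter has no false negatives, the file that really holds
the value is always kept; false positives only cost an extra file open.
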
> Looking forward to the sync!
>
> Best,
>
> Huaxin
>
> On Sat, Feb 28, 2026 at 3:53 AM Péter Váry <[email protected]>
> wrote:
>
>> Please note that the next *Secondary Index Sync* will take place on *March
>> 2nd, 9:00-10:00 AM PT*.
>>
>> *Proposed agenda*:
>>
>>    - Discussion of potential use‑cases
>>       - Primary Key index for Flink equality‑delete resolution
>>       - Secondary data layout
>>          - Containing index
>>          - Alternative query plans
>>       - Vector index
>>    - Discussion of the two alternative approaches for metadata
>>    placement: keeping index metadata inside the table metadata vs. managing
>>    it externally through an Index Catalog
>>    - Bloom filter index status update
>>       - Performance justification: when this helps (high-cardinality = /
>>       IN, many data files, high object-store latency) and how it differs from
>>       Parquet row-group Bloom filters (which still require opening the data
>>       file).
>>       - Cost / scalability: rough sizing (Bloom blob size per file,
>>       Puffin file size), the planning cost trade-off (driver index reads vs
>>       executor file opens), and mitigations via caching.
>>       - Lifecycle / maintenance: incremental production as new data
>>       files arrive, behavior when the index is missing/behind, and
>>       sharding/compaction plus cleanup to avoid accumulating too many small
>>       Puffin files over time.
>>       - Writer expectations: inline (optional) vs asynchronous (primary)
>>       index creation.
>>
>> Looking forward to diving into this topic together.
>>
>> See you all there,
>> Peter
>>
>> Péter Váry <[email protected]> wrote on Wed, Feb 25, 2026,
>> 10:04:
>>
>>> Dan kindly set up a dedicated public Slack channel (*#indexes*) for the
>>> Secondary Index discussion.
>>> You can find it here:
>>> https://apache-iceberg.slack.com/archives/C0AFDSU3EUU
>>> Feel free to join if you’d like to participate in the discussion or
>>> simply follow along.
>>>
>>> Thanks,
>>> Peter
>>>
>>> Péter Váry <[email protected]> wrote on Tue, Feb 24, 2026,
>>> 12:52:
>>>
>>>> We had an extended discussion on Slack with Dan, Steven, and Yufei
>>>> about where index metadata should live, in particular whether it should
>>>> be stored directly in the table metadata or maintained in a dedicated
>>>> index catalog. I tried to capture this discussion in the Layout
>>>> <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4oz3yd6ngr3>
>>>> section of the document.
>>>>
>>>> Once the decision is made, this section can be shortened, but for now
>>>> it is intentionally more detailed so that everyone can see the arguments
>>>> that were discussed and so that those who could not participate
>>>> synchronously can still follow and provide feedback offline.
>>>>
>>>> In short, we are currently *leaning toward storing index metadata in
>>>> its own catalog*, while allowing REST catalogs to expose a composite
>>>> endpoint that returns both table and index metadata in a single round trip.
>>>> This is similar in spirit to the universal load endpoint discussed in the
>>>> context of materialized view loading.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> Péter Váry <[email protected]> wrote on Thu, Feb 19, 2026,
>>>> 14:06:
>>>>
>>>>> Thanks Huaxin for posting the recording and the meeting notes.
>>>>>
>>>>> I used this time to also address the questions collected during the
>>>>> sync:
>>>>>
>>>>>    - Collected some representative use cases. See the example
>>>>>    use-cases
>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.i4gt8za99j9d>
>>>>>    paragraph. Anyone should feel free to suggest their own.
>>>>>    - Collected my thoughts about the writer requirements. See the writer
>>>>>    requirements
>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4b1p8r8nmfg1>
>>>>>    paragraph.
>>>>>    - Centralized the index maintenance related parts. See the index
>>>>>    maintenance
>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.hw2nt44i0k8q>
>>>>>    paragraph.
>>>>>
>>>>> It might be a bit premature, but I created a PR
>>>>> <https://github.com/apache/iceberg/pull/15101> with the
>>>>> proposed index catalog related changes, so that those who are more
>>>>> code-oriented can take a look at it too.
>>>>>
>>>>> huaxin gao <[email protected]> wrote on Thu, Feb 19, 2026,
>>>>> 5:34:
>>>>>
>>>>>> Hi Everyone,
>>>>>>
>>>>>> Here are the recording and notes from the Iceberg Index Support Sync
>>>>>> on 2/11.
>>>>>>
>>>>>> Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>>>>>>
>>>>>> Notes:
>>>>>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>>>>>>
>>>>>> The meeting will move to biweekly, Mondays 9–10am PST, starting March
>>>>>> 2.
>>>>>>
>>>>>> Since the sync, I updated the Bloom skipping index proposal
>>>>>> <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
>>>>>> to address the discussion questions, specifically:
>>>>>>
>>>>>>
>>>>>>    - Performance justification: when this helps (high-cardinality =
>>>>>>    / IN, many data files, high object-store latency) and how it differs
>>>>>>    from Parquet row-group Bloom filters (which still require opening the
>>>>>>    data file).
>>>>>>    - Cost / scalability: rough sizing (Bloom blob size per file,
>>>>>>    Puffin file size), the planning cost trade-off (driver index reads vs
>>>>>>    executor file opens), and mitigations via caching.
>>>>>>    - Lifecycle / maintenance: incremental production as new data
>>>>>>    files arrive, behavior when the index is missing/behind, and
>>>>>>    sharding/compaction plus cleanup to avoid accumulating too many small
>>>>>>    Puffin files over time.
>>>>>>    - Writer expectations: inline (optional) vs asynchronous
>>>>>>    (primary) index creation.
>>>>>>
>>>>>> I also implemented a Spark 4.1 POC
>>>>>> <https://github.com/apache/iceberg/pull/15311> and a local benchmark
>>>>>> to quantify both the pruning impact (plannedFiles → afterBloom) and the
>>>>>> index read overhead (statsFiles, statsBytes, bloomPayloadBytes) for point
>>>>>> predicates on high-cardinality columns. Please take a look and let me
>>>>>> know if you have any questions or feedback.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Huaxin
>>>>>>
>>>>>> On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Reminder for tomorrow's sync on Iceberg Index Support.
>>>>>>>
>>>>>>> Wednesday: Feb. 11 9:00 – 10:00am
>>>>>>> Time zone: America/Los_Angeles
>>>>>>> Google Meet joining info
>>>>>>> Video call link: meet.google.com/nsp-ctyr-khk
>>>>>>> Design doc:
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Huaxin
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Huaxin and Steven for organizing this. Looking forward to
>>>>>>>> meeting you all next week!
>>>>>>>>
>>>>>>>> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> We set up the dev calendar event with a new google meet link.
>>>>>>>>> Please ignore the link from Huaxin's original email.
>>>>>>>>>
>>>>>>>>> The dev calendar has the correct info (including the new meeting
>>>>>>>>> link)
>>>>>>>>>
>>>>>>>>> Iceberg Index Support Sync
>>>>>>>>> Wednesday, February 11 · 9:00 – 10:00am
>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>> Google Meet joining info
>>>>>>>>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>>>>>>
>>>>>>>>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Sorry, I meant PST (not EST) :)
>>>>>>>>>> Looking forward to the discussion!
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Huaxin,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for starting the sync!
>>>>>>>>>>>
>>>>>>>>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>>>>>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>>>>>>>> not EST. Maybe it's a typo?
>>>>>>>>>>> Otherwise, looking forward to the discussion!
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Shawn
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> I'd like to start a dedicated sync to discuss Iceberg Index
>>>>>>>>>>>> support. Here is the existing discussion thread:
>>>>>>>>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty
>>>>>>>>>>>> .
>>>>>>>>>>>>
>>>>>>>>>>>> To ground the discussion, here are the two proposals:
>>>>>>>>>>>>
>>>>>>>>>>>>    - Peter's proposal
>>>>>>>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>>>>>>>>>    (overall index support)
>>>>>>>>>>>>    - My proposal
>>>>>>>>>>>>    <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>>>>>>>>>    (bloom filter skipping index)
>>>>>>>>>>>>
>>>>>>>>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST,
>>>>>>>>>>>> starting next Wednesday (2/11). After FileFormat sync finishes, we
>>>>>>>>>>>> plan to use that slot and switch to every other Monday, 9 AM to
>>>>>>>>>>>> 10 AM EST.
>>>>>>>>>>>>
>>>>>>>>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Huaxin
>>>>>>>>>>>>
>>>>>>>>>>>
