Re: Dedicated sync for Iceberg Index Support

Steven Wu Tue, 03 Mar 2026 09:07:21 -0800

> if a column’s default value changes (a schema/metadata-only update), we
> may still need to refresh the index to ensure it returns correct results.


The initial-default value never changes after the column is added to the
schema. The write-default can change, but that only applies to new rows. I
am not sure we have a problem here.
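
To make the distinction concrete, here is a minimal sketch in Python. It is
not Iceberg code: the Column, read_value, and write_row names are made up
for illustration, and it assumes the usual semantics (initial-default fills
in rows written before the column existed, write-default is materialized
into rows written afterwards when the writer supplies no value).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    """Hypothetical stand-in for a schema field with default values."""
    name: str
    initial_default: Optional[int]  # fixed once the column is added
    write_default: Optional[int]    # may change through schema evolution


def read_value(stored_row: dict, column: Column):
    """Value a reader sees for a row that is already in a data file."""
    if column.name in stored_row:
        return stored_row[column.name]
    # Row predates the column: fall back to the immutable initial-default.
    return column.initial_default


def write_row(values: dict, column: Column) -> dict:
    """Materialize the current write-default when the writer omits the column."""
    row = dict(values)
    row.setdefault(column.name, column.write_default)
    return row


region = Column("region", initial_default=0, write_default=0)
old_row = {"id": 1}                   # written before the column existed
row_a = write_row({"id": 2}, region)  # written with write-default 0

region.write_default = 5              # metadata-only schema change
row_b = write_row({"id": 3}, region)  # only new rows pick up the new default

print(read_value(old_row, region), read_value(row_a, region), read_value(row_b, region))
```

Under these assumptions the values read from existing files are the same
before and after the write-default change, so an index built over those
files would still match what readers see.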

On Tue, Mar 3, 2026 at 5:27 AM Péter Váry <[email protected]>
wrote:

> Thanks to everyone who participated in the community sync about
> indexes!
>
> Here is the recording:
> https://www.youtube.com/watch?v=pZFJfAlMHsM&list=PLkifVhhWtccwbfBhHk_DGOogxXNtiKvbF
> Here is the chat log:
> https://drive.google.com/file/d/1_N1suxhhdHt4aQuoPuLX24KJz32w3qW0/view
>
> Added my highlights about the general index discussion to the doc:
> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.8041k7j2n7y3#heading=h.n0hz359alh52
>
> A few takeaways from the general index discussion:
>
>>
>>    - We reviewed the options for synchronous and asynchronous index
>>    updates. We agreed that asynchronous updates should be our primary focus,
>>    while we expect that synchronous updates could still be valuable in
>>    certain scenarios. In those cases, we may be able to rely on the catalog
>>    REST API to ensure that table updates and index updates occur atomically.
>>
>>
>>    - We also touched on writer requirements. We would like to avoid
>>    requiring extra work from writers, but in some cases this might be
>>    necessary. Also, although many tables typically have a single writer,
>>    table maintenance operations still need to be taken into account. We may
>>    want to introduce a flag that blocks writes unless the writer is capable
>>    of updating the index as well. Alternatively, we could define a mechanism
>>    that ensures the table cannot be updated without updating the index.
>>
>>
>>    - Prashant pointed out that we must also consider values stored
>>    solely in table metadata when computing indexes. For example, if a
>>    column’s default value changes (a schema/metadata-only update), we may
>>    still need to refresh the index to ensure it returns correct results.
>>
>>
> In the next sync, I would like to follow up on vector indexes and, if
> we have time, index maintenance.
>
> Thanks,
> Peter
>
>
> huaxin gao <[email protected]> wrote (on Mon, Mar 2, 2026,
> 4:24):
>
>> Thanks Peter for the reminder and agenda!
>>
>> Here are some more details for the Bloom index status:
>>
>>
>>    - When it helps: high-cardinality =/IN predicates where min/max stats
>>    are not selective and many files remain after normal Iceberg pruning
>>    (“needle in a haystack”).
>>    - Why it helps vs Parquet row-group Bloom: row-group Bloom still
>>    requires opening each candidate data file (footer/Bloom pages). Puffin
>>    Bloom is consulted during planning, so it can prune files before
>>    scheduling scan tasks and opening most files.
>>    - Savings vs cost:
>>       - Savings: plannedFiles → afterBloom (files avoided)
>>       - Cost: planner reads statsFiles/statsBytes/bloomPayloadBytes
>>       (Puffin footer + selective blob slices)
>>       - Example (POC benchmark): plannedFiles=658, afterBloom=1
>>       (needle), with index overhead statsFiles=1, statsBytes≈17MB,
>>       bloomPayloadBytes≈16.8MB. The goal is to show “avoided per-file
>>       opens/tasks” outweighs “index read”. This benchmark is intentionally
>>       scoped to the workload the feature targets; it’s not meant to claim
>>       Bloom skipping helps all queries, which is why the feature is opt-in.
>>       Users enable this when they see selective point lookups over many
>>       files and want to reduce file opens/task scheduling.
>>    - Sizing: for fpp=0.01, Bloom needs about 1.2 bytes per inserted value.
>>    Example: ~10,000 values/file → ~12 KB Bloom payload per data file (plus
>>    small Puffin overhead).
>>    - Lifecycle/maintenance: incremental shards for new files;
>>    missing/behind is safe (no pruning); shard compaction + snapshot
>>    expiration/orphan cleanup to bound artifacts.
>>    - Writer expectations: async maintenance is primary; inline is
>>    optional (inline writers may not know the final number of inserted values
>>    up front, so they can size at file close or use a scalable/growing Bloom
>>    filter); any error/missing/stale index ⇒ fallback (correctness unchanged).
>>    Feature is opt-in for the targeted workload.
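
The sizing figure in the list above can be reproduced with the textbook
Bloom filter formula. The sketch below assumes a classic Bloom filter with
an optimal number of hash functions, where bits per value =
-ln(fpp) / (ln 2)^2; the filter layout used in the POC may differ (blocked
variants typically need somewhat more space), and the function names are
illustrative only.

```python
import math


def bloom_bytes_per_value(fpp: float) -> float:
    """Bytes per inserted value for a classic, optimally configured Bloom filter."""
    bits_per_value = -math.log(fpp) / (math.log(2) ** 2)
    return bits_per_value / 8


def bloom_payload_bytes(num_values: int, fpp: float) -> int:
    """Approximate Bloom payload size for one data file, excluding Puffin overhead."""
    return math.ceil(num_values * bloom_bytes_per_value(fpp))


def optimal_hash_count(fpp: float) -> int:
    """Optimal number of hash functions, k = -log2(fpp), rounded to an integer."""
    return round(-math.log2(fpp))


print(round(bloom_bytes_per_value(0.01), 2))  # about 1.2 bytes per value
print(bloom_payload_bytes(10_000, 0.01))      # about 12 KB for 10,000 values
print(optimal_hash_count(0.01))
```

For fpp=0.01 this gives roughly 9.6 bits (about 1.2 bytes) per value, which
matches the ~12 KB per 10,000-value data file quoted above.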
>>
>> Looking forward to the sync!
>>
>> Best,
>>
>> Huaxin
>>
>> On Sat, Feb 28, 2026 at 3:53 AM Péter Váry <[email protected]>
>> wrote:
>>
>>> Please note that the next *Secondary Index Sync* will take place on *March
>>> 2nd, 9:00-10:00 AM PT*.
>>>
>>> *Proposed agenda*:
>>>
>>>    - Discussion of potential use‑cases
>>>       - Primary Key index for Flink equality‑delete resolution
>>>       - Secondary data layout
>>>          - Containing index
>>>          - Alternative query plans
>>>       - Vector index
>>>    - Discussion of the two alternative approaches for metadata
>>>    placement: keeping index metadata inside the table metadata vs. managing
>>>    it externally through an Index Catalog
>>>    - Bloom filter index status update
>>>       - Performance justification: when this helps (high-cardinality =
>>>       / IN, many data files, high object-store latency) and how it differs
>>>       from Parquet row-group Bloom filters (which still require opening the
>>>       data file).
>>>       - Cost / scalability: rough sizing (Bloom blob size per file,
>>>       Puffin file size), the planning cost trade-off (driver index reads vs
>>>       executor file opens), and mitigations via caching.
>>>       - Lifecycle / maintenance: incremental production as new data
>>>       files arrive, behavior when the index is missing/behind, and
>>>       sharding/compaction plus cleanup to avoid accumulating too many small
>>>       Puffin files over time.
>>>       - Writer expectations: inline (optional) vs asynchronous
>>>       (primary) index creation.
>>>
>>> Looking forward to diving into this topic together.
>>>
>>> See you all there,
>>> Peter
>>>
>>> Péter Váry <[email protected]> wrote (on Wed, Feb 25, 2026,
>>> 10:04):
>>>
>>>> Dan kindly set up a dedicated public Slack channel (*#indexes*) for
>>>> the Secondary Index discussion.
>>>> You can find it here:
>>>> https://apache-iceberg.slack.com/archives/C0AFDSU3EUU
>>>> Feel free to join if you’d like to participate in the discussion or
>>>> simply follow along.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> Péter Váry <[email protected]> wrote (on Tue, Feb 24, 2026,
>>>> 12:52):
>>>>
>>>>> We had an extended discussion on Slack with Dan, Steven, and Yufei
>>>>> about where index metadata should live, in particular whether it should
>>>>> be stored directly in the table metadata or maintained in a dedicated
>>>>> index catalog. I tried to capture this discussion in the Layout
>>>>> <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4oz3yd6ngr3>
>>>>> section of the document.
>>>>>
>>>>> Once the decision is made, this section can be shortened, but for now
>>>>> it is intentionally more detailed so that everyone can see the arguments
>>>>> that were discussed and so that those who could not participate
>>>>> synchronously can still follow and provide feedback offline.
>>>>>
>>>>> In short, we are currently *leaning toward storing index metadata in
>>>>> its own catalog*, while allowing REST catalogs to expose a composite
>>>>> endpoint that returns both table and index metadata in a single round
>>>>> trip. This is similar in spirit to the universal load endpoint discussed
>>>>> in the context of materialized view loading.
>>>>>
>>>>> Thanks,
>>>>> Peter
>>>>>
>>>>> Péter Váry <[email protected]> wrote (on Thu, Feb 19, 2026,
>>>>> 14:06):
>>>>>
>>>>>> Thanks Huaxin for posting the recording and the meeting notes.
>>>>>>
>>>>>> I used this time to also address the questions collected during the
>>>>>> sync:
>>>>>>
>>>>>>    - Collected some representative use cases. See the example
>>>>>>    use-cases
>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.i4gt8za99j9d>
>>>>>>    paragraph. Anyone should feel free to suggest their own.
>>>>>>    - Collected my thoughts about the writer requirements. See the writer
>>>>>>    requirements
>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4b1p8r8nmfg1>
>>>>>>    paragraph.
>>>>>>    - Centralized the index maintenance related parts. See the index
>>>>>>    maintenance
>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.hw2nt44i0k8q>
>>>>>>    paragraph.
>>>>>>
>>>>>> It might be a bit premature, but I created a PR
>>>>>> <https://github.com/apache/iceberg/pull/15101> with the
>>>>>> proposed index-catalog-related changes, so those who are more
>>>>>> code-oriented can take a look at it too.
>>>>>>
>>>>>> huaxin gao <[email protected]> wrote (on Thu, Feb 19, 2026,
>>>>>> 5:34):
>>>>>>
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>> Here are the recording and notes from the Iceberg Index Support Sync
>>>>>>> on 2/11.
>>>>>>>
>>>>>>> Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>>>>>>>
>>>>>>> Notes:
>>>>>>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>>>>>>>
>>>>>>> The meeting will move to biweekly, Mondays 9–10am PST, starting
>>>>>>> March 2.
>>>>>>>
>>>>>>> Since the sync, I updated the Bloom skipping index proposal
>>>>>>> <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
>>>>>>> to address the discussion questions, specifically:
>>>>>>>
>>>>>>>
>>>>>>>    - Performance justification: when this helps (high-cardinality =
>>>>>>>    / IN, many data files, high object-store latency) and how it differs
>>>>>>>    from Parquet row-group Bloom filters (which still require opening the
>>>>>>>    data file).
>>>>>>>    - Cost / scalability: rough sizing (Bloom blob size per file,
>>>>>>>    Puffin file size), the planning cost trade-off (driver index reads vs
>>>>>>>    executor file opens), and mitigations via caching.
>>>>>>>    - Lifecycle / maintenance: incremental production as new data
>>>>>>>    files arrive, behavior when the index is missing/behind, and
>>>>>>>    sharding/compaction plus cleanup to avoid accumulating too many small
>>>>>>>    Puffin files over time.
>>>>>>>    - Writer expectations: inline (optional) vs asynchronous
>>>>>>>    (primary) index creation.
>>>>>>>
>>>>>>> I also implemented a Spark 4.1 POC
>>>>>>> <https://github.com/apache/iceberg/pull/15311> and a local
>>>>>>> benchmark to quantify both the pruning impact (plannedFiles →
>>>>>>> afterBloom) and the index read overhead (statsFiles, statsBytes,
>>>>>>> bloomPayloadBytes) for point predicates on high-cardinality columns.
>>>>>>> Please take a look and let me know if you have any questions or
>>>>>>> feedback.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Huaxin
>>>>>>>
>>>>>>> On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Reminder for tomorrow's sync on Iceberg Index Support.
>>>>>>>>
>>>>>>>> Wednesday: Feb. 11 9:00 – 10:00am
>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>> Google Meet joining info
>>>>>>>> Video call link: meet.google.com/nsp-ctyr-khk
>>>>>>>> Design doc:
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>>>>>>>
>>>>>>>> https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Huaxin
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Thanks Huaxin and Steven for organizing this. Looking forward to
>>>>>>>>> meeting you all next week!
>>>>>>>>>
>>>>>>>>> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> We set up the dev calendar event with a new google meet link.
>>>>>>>>>> Please ignore the link from Huaxin's original email.
>>>>>>>>>>
>>>>>>>>>> The dev calendar has the correct info (including the new meeting
>>>>>>>>>> link)
>>>>>>>>>>
>>>>>>>>>> Iceberg Index Support Sync
>>>>>>>>>> Wednesday, February 11 · 9:00 – 10:00am
>>>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>>>> Google Meet joining info
>>>>>>>>>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Sorry, I meant PST (not EST) :)
>>>>>>>>>>> Looking forward to the discussion!
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Huaxin,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for starting the sync!
>>>>>>>>>>>>
>>>>>>>>>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>>>>>>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>>>>>>>>> not EST. Maybe it's a typo?
>>>>>>>>>>>> Otherwise, looking forward to the discussion!
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Shawn
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> I'd like to start a dedicated sync to discuss Iceberg Index
>>>>>>>>>>>>> support. Here is the existing discussion thread:
>>>>>>>>>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty
>>>>>>>>>>>>> .
>>>>>>>>>>>>>
>>>>>>>>>>>>> To ground the discussion, here are the two proposals:
>>>>>>>>>>>>>
>>>>>>>>>>>>>    - Peter's proposal
>>>>>>>>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>>>>>>>>>>    (overall index support)
>>>>>>>>>>>>>    - My proposal
>>>>>>>>>>>>>    <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>>>>>>>>>>    (bloom filter skipping index)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Time slot: Every 3 weeks, Wednesdays at 9 AM to 10 AM EST,
>>>>>>>>>>>>> starting next Wednesday (2/11). After FileFormat sync finishes,
>>>>>>>>>>>>> we plan to use that slot and switch to every other Monday, 9 AM
>>>>>>>>>>>>> to 10 AM EST.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Huaxin
>>>>>>>>>>>>>
>>>>>>>>>>>>
