Re: Dedicated sync for Iceberg Index Support

huaxin gao Sun, 01 Mar 2026 19:24:20 -0800

Thanks Peter for the reminder and agenda!

Here are some more details for the Bloom index status:



   - When it helps: high-cardinality =/IN predicates where min/max stats
   are not selective and many files remain after normal Iceberg pruning
   (“needle in a haystack”).
   - Why it helps vs Parquet row-group Bloom: row-group Bloom still
   requires opening each candidate data file (footer/Bloom pages). Puffin
   Bloom is consulted during planning, so it can prune files before scan
   tasks are scheduled and before most files are ever opened.
   - Savings vs cost:
      - Savings: plannedFiles → afterBloom (files avoided)
      - Cost: the planner reads statsFiles/statsBytes/bloomPayloadBytes
      (Puffin footer + selective blob slices)
      - Example (POC benchmark): plannedFiles=658, afterBloom=1 (needle),
      with index overhead statsFiles=1, statsBytes≈17MB,
      bloomPayloadBytes≈16.8MB. The goal is to show that the avoided per-file
      opens/tasks outweigh the index read. This benchmark is intentionally
      scoped to the workload the feature targets; it is not meant to claim
      that Bloom skipping helps all queries, which is why the feature is
      opt-in. Users enable it when they see selective point lookups over many
      files and want to reduce file opens and task scheduling.
   - Sizing: for fpp=0.01, an optimally sized Bloom filter needs ~1.2 bytes
   (~9.6 bits) per inserted value. Example: ~10,000 values/file → ~12 KB Bloom
   payload per data file (plus small Puffin overhead).
   - Lifecycle/maintenance: incremental index shards for new data files; a
   missing or lagging index is safe (those files are simply not pruned); shard
   compaction plus snapshot expiration/orphan cleanup bound the number of
   Puffin artifacts.
   - Writer expectations: async maintenance is primary; inline is optional
   (inline writers may not know the final number of inserted values up front,
   so they can size the filter at file close or use a scalable/growing Bloom
   filter). Any error or missing/stale index falls back to normal planning
   (correctness unchanged). The feature is opt-in for the targeted workload.
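To make the sizing and planning-time pruning above a bit more concrete, here is a minimal Python sketch. It is illustrative only: it assumes a classic bit-array Bloom filter with SHA-256-based double hashing, which is not the Puffin blob format, not the Parquet split-block layout, and not the code in the Spark POC. All names (`BloomFilter`, `prune_files`, `optimal_bits_per_value`) are hypothetical. It shows the standard sizing formula m/n = -ln(p)/(ln 2)^2 (about 9.6 bits, or ~1.2 bytes, per value at fpp=0.01) and a pruning step that keeps any file without an index entry, so a missing or lagging index only costs pruning, never correctness.

```python
import hashlib
import math


def optimal_bits_per_value(fpp: float) -> float:
    """Standard Bloom sizing: bits per value m/n = -ln(p) / (ln 2)^2."""
    return -math.log(fpp) / (math.log(2) ** 2)


def optimal_num_hashes(bits_per_value: float) -> int:
    """Optimal hash count k = (m/n) * ln 2, rounded and at least 1."""
    return max(1, round(bits_per_value * math.log(2)))


class BloomFilter:
    """Hypothetical classic Bloom filter (not the Puffin or Parquet layout)."""

    def __init__(self, expected_values: int, fpp: float) -> None:
        bits_per_value = optimal_bits_per_value(fpp)
        self.num_bits = max(8, math.ceil(expected_values * bits_per_value))
        self.num_hashes = optimal_num_hashes(bits_per_value)
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


def prune_files(planned_files: list[str], blooms: dict[str, BloomFilter],
                values: list[str]) -> list[str]:
    """Planning-time pruning for `col = v` / `col IN (...)`.

    A file is dropped only if its Bloom filter rules out every value.
    Files with no index entry (missing or behind) are kept.
    """
    kept = []
    for path in planned_files:
        bloom = blooms.get(path)
        if bloom is None or any(bloom.might_contain(v) for v in values):
            kept.append(path)
    return kept


if __name__ == "__main__":
    bits = optimal_bits_per_value(0.01)
    print(f"fpp=0.01: {bits:.2f} bits = {bits / 8:.2f} bytes per value")
    print(f"10,000 values/file: ~{10_000 * bits / 8 / 1000:.1f} KB payload")

    files = [f"data-{i}.parquet" for i in range(3)]
    blooms = {}
    for i, path in enumerate(files[:2]):  # data-2 has no index yet
        bf = BloomFilter(expected_values=1_000, fpp=0.01)
        for j in range(1_000):
            bf.add(f"id-{i}-{j}")
        blooms[path] = bf
    print(prune_files(files, blooms, ["id-1-42"]))
```

Running it prints roughly 9.59 bits (1.20 bytes) per value and ~12.0 KB for 10,000 values, matching the sizing bullet. In the pruning demo, `data-1` is always kept (true match) and `data-2` is always kept (no index), while `data-0` is dropped unless it hits a ~1% false positive.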

Looking forward to the sync!

Best,

Huaxin

On Sat, Feb 28, 2026 at 3:53 AM Péter Váry <[email protected]>
wrote:

> Please note that the next *Secondary Index Sync* will take place on *March
> 2nd, 9:00-10:00 AM PT*.
>
> *Proposed agenda*:
>
>    - Discussion of potential use‑cases
>       - Primary Key index for Flink equality‑delete resolution
>       - Secondary data layout
>          - Containing index
>          - Alternative query plans
>       - Vector index
>    - Discussion of the two alternative approaches for metadata placement:
>    keeping index metadata inside the table metadata vs. managing it externally
>    through an Index Catalog
>    - Bloom filter index status update
>       - Performance justification: when this helps (high-cardinality = /
>       IN, many data files, high object-store latency) and how it differs from
>       Parquet row-group Bloom filters (which still require opening the data
>       file).
>       - Cost / scalability: rough sizing (Bloom blob size per file,
>       Puffin file size), the planning cost trade-off (driver index reads vs
>       executor file opens), and mitigations via caching.
>       - Lifecycle / maintenance: incremental production as new data files
>       arrive, behavior when the index is missing/behind, and 
> sharding/compaction
>       plus cleanup to avoid accumulating too many small Puffin files over 
> time.
>       - Writer expectations: inline (optional) vs asynchronous (primary)
>       index creation.
>
> Looking forward to diving into this topic together.
>
> See you all there,
> Peter
>
> On Wed, Feb 25, 2026 at 10:04 AM Péter Váry <[email protected]>
> wrote:
>
>> Dan kindly set up a dedicated public Slack channel (*#indexes*) for the
>> Secondary Index discussion.
>> You can find it here:
>> https://apache-iceberg.slack.com/archives/C0AFDSU3EUU
>> Feel free to join if you’d like to participate in the discussion or
>> simply follow along.
>>
>> Thanks,
>> Peter
>>
>> On Tue, Feb 24, 2026 at 12:52 PM Péter Váry <[email protected]>
>> wrote:
>>
>>> We had an extended discussion on Slack with Dan, Steven, and Yufei about
>>> where index metadata should live, in particular whether it should be
>>> stored directly in the table metadata or maintained in a dedicated index
>>> catalog. I tried to capture this discussion in the Layout section
>>> <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4oz3yd6ngr3>
>>> of the document.
>>>
>>> Once the decision is made, this section can be shortened, but for now it
>>> is intentionally more detailed so that everyone can see the arguments that
>>> were discussed and so that those who could not participate synchronously
>>> can still follow and provide feedback offline.
>>>
>>> In short, we are currently *leaning toward storing index metadata in
>>> its own catalog*, while allowing REST catalogs to expose a composite
>>> endpoint that returns both table and index metadata in a single round trip.
>>> This is similar in spirit to the universal load endpoint discussed in the
>>> context of materialized view loading.
>>>
>>> Thanks,
>>> Peter
>>>
>>> On Thu, Feb 19, 2026 at 2:06 PM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Thanks Huaxin for posting the recording and the meeting notes.
>>>>
>>>> I used this time to also address the questions collected during the
>>>> sync:
>>>>
>>>>    - Collected some representative use cases. See the example use-cases
>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.i4gt8za99j9d>
>>>>    paragraph. Feel free to suggest your own.
>>>>    - Collected my thoughts about the writer requirements. See the writer
>>>>    requirements
>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.4b1p8r8nmfg1>
>>>>    paragraph.
>>>>    - Centralized the index-maintenance-related parts. See the index
>>>>    maintenance
>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?pli=1&tab=t.0#heading=h.hw2nt44i0k8q>
>>>>    paragraph.
>>>>
>>>> It might be a bit premature, but I created a PR
>>>> <https://github.com/apache/iceberg/pull/15101> with the proposed
>>>> index-catalog changes so that those who are more code-oriented can take a
>>>> look too.
>>>>
>>>> On Thu, Feb 19, 2026 at 5:34 AM huaxin gao <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Everyone,
>>>>>
>>>>> Here are the recording and notes from the Iceberg Index Support Sync
>>>>> on 2/11.
>>>>>
>>>>> Recording: https://www.youtube.com/watch?v=3sFfQ0A50yk
>>>>>
>>>>> Notes:
>>>>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.8041k7j2n7y3
>>>>>
>>>>> The meeting will move to biweekly, Mondays 9–10am PST, starting March
>>>>> 2.
>>>>>
>>>>> Since the sync, I updated the Bloom skipping index proposal
>>>>> <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.5r5kl6k3fqwu>
>>>>> to address the discussion questions, specifically:
>>>>>
>>>>>
>>>>>    - Performance justification: when this helps (high-cardinality = /
>>>>>    IN, many data files, high object-store latency) and how it differs from
>>>>>    Parquet row-group Bloom filters (which still require opening the data
>>>>>    file).
>>>>>    - Cost / scalability: rough sizing (Bloom blob size per file,
>>>>>    Puffin file size), the planning cost trade-off (driver index reads vs
>>>>>    executor file opens), and mitigations via caching.
>>>>>    - Lifecycle / maintenance: incremental production as new data
>>>>>    files arrive, behavior when the index is missing/behind, and
>>>>>    sharding/compaction plus cleanup to avoid accumulating too many small
>>>>>    Puffin files over time.
>>>>>    - Writer expectations: inline (optional) vs asynchronous (primary)
>>>>>    index creation.
>>>>>
>>>>> I also implemented a Spark 4.1 POC
>>>>> <https://github.com/apache/iceberg/pull/15311> and a local benchmark
>>>>> to quantify both the pruning impact (plannedFiles → afterBloom) and the
>>>>> index read overhead (statsFiles, statsBytes, bloomPayloadBytes) for point
>>>>> predicates on high-cardinality columns. Please take a look and let me know
>>>>> if you have any questions or feedback.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Huaxin
>>>>>
>>>>> On Tue, Feb 10, 2026 at 1:43 PM huaxin gao <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Reminder for tomorrow's sync on Iceberg Index Support.
>>>>>>
>>>>>> Wednesday: Feb. 11 9:00 – 10:00am
>>>>>> Time zone: America/Los_Angeles
>>>>>> Google Meet joining info
>>>>>> Video call link: meet.google.com/nsp-ctyr-khk
>>>>>> Design docs:
>>>>>>
>>>>>> https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2
>>>>>>
>>>>>> https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7
>>>>>>
>>>>>> Thanks,
>>>>>> Huaxin
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 3, 2026 at 10:52 PM Péter Váry <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Thanks Huaxin and Steven for organizing this. Looking forward to
>>>>>>> meeting you all next week!
>>>>>>>
>>>>>>> On Wed, Feb 4, 2026, 02:48 Steven Wu <[email protected]> wrote:
>>>>>>>
>>>>>>>> We set up the dev calendar event with a new Google Meet link.
>>>>>>>> Please ignore the link from Huaxin's original email.
>>>>>>>>
>>>>>>>> The dev calendar has the correct info (including the new meeting
>>>>>>>> link).
>>>>>>>>
>>>>>>>> Iceberg Index Support Sync
>>>>>>>> Wednesday, February 11 · 9:00 – 10:00am
>>>>>>>> Time zone: America/Los_Angeles
>>>>>>>> Google Meet joining info
>>>>>>>> Video call link: https://meet.google.com/nsp-ctyr-khk
>>>>>>>>
>>>>>>>> On Tue, Feb 3, 2026 at 5:08 PM huaxin gao <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Sorry, I meant PST (not EST) :)
>>>>>>>>> Looking forward to the discussion!
>>>>>>>>>
>>>>>>>>> On Tue, Feb 3, 2026 at 4:58 PM Shawn Chang <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Huaxin,
>>>>>>>>>>
>>>>>>>>>> Thanks for starting the sync!
>>>>>>>>>>
>>>>>>>>>> The meeting seems to be 9-10AM PST on the dev events calendar
>>>>>>>>>> <https://calendar.google.com/calendar/u/0?cid=MzkwNWQ0OTJmMWI0NTBiYTA3MTJmMmFlNmFmYTc2ZWI3NTdmMTNkODUyMjBjYzAzYWE0NTI3ODg1YWRjNTYyOUBncm91cC5jYWxlbmRhci5nb29nbGUuY29t>,
>>>>>>>>>> not EST. Maybe it's a typo?
>>>>>>>>>> Otherwise, looking forward to the discussion!
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Shawn
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 3, 2026 at 9:18 AM huaxin gao <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>> I'd like to start a dedicated sync to discuss Iceberg Index
>>>>>>>>>>> support. Here is the existing discussion thread:
>>>>>>>>>>> https://lists.apache.org/thread/fzqk3jjf0xpj5m4cfqb3v4c65p0t04ty
>>>>>>>>>>>
>>>>>>>>>>> To ground the discussion, here are the two proposals:
>>>>>>>>>>>
>>>>>>>>>>>    - Peter's proposal
>>>>>>>>>>>    <https://docs.google.com/document/d/1N6a2IOzC6Qsqv7NBqHKesees4N6WF49YUSIX2FrF7S0/edit?tab=t.0#heading=h.hs6r9d26w1y2>
>>>>>>>>>>>    (overall index support)
>>>>>>>>>>>    - My proposal
>>>>>>>>>>>    <https://docs.google.com/document/d/1x-0KT43aTrt8u6EV7EgSietIFQSkGsocqwnBTHPebRU/edit?tab=t.0#heading=h.qouk73o4jxx7>
>>>>>>>>>>>    (Bloom filter skipping index)
>>>>>>>>>>>
>>>>>>>>>>> Time slot: every 3 weeks, Wednesdays 9 AM to 10 AM EST,
>>>>>>>>>>> starting next Wednesday (2/11). After the FileFormat sync finishes,
>>>>>>>>>>> we plan to use that slot and switch to every other Monday, 9 AM to
>>>>>>>>>>> 10 AM EST.
>>>>>>>>>>>
>>>>>>>>>>> Meet link: https://meet.google.com/fjn-tyze-mko
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Huaxin
>>>>>>>>>>>
>>>>>>>>>>
