Re: Dedicated sync for Iceberg materialized view

Benny Chow Tue, 18 Nov 2025 13:05:24 -0800

Thanks Igor.  The PR has a suggestion for exactly what you suggested.  I
called it a "*warm*" state which is a state where stale materialization can
still be used.
https://github.com/apache/iceberg/pull/11041/files#r2474661166


I think if we continue with the assumption that MVs can only reference
iceberg tables and views, then it makes sense for the max-staleness grace
period to be dynamic based on snapshot history.   This is what Trino does:
https://trino.io/docs/current/connector/iceberg.html?utm_source=chatgpt.com#materialized-views

If there are non-Iceberg tables in the view SQL, then the grace period will
have to be based on last refresh which is also what Trino describes here:
https://trino.io/docs/current/sql/create-materialized-view.html#mv-grace-period

Should we call out both scenarios in the MV spec?  I think this is worth
being explicit here.

Thanks


On Tue, Nov 18, 2025 at 11:03 AM Igor Belianski <[email protected]>
wrote:

> Re:  max-stalenss-ms interpretation
> proposal:
>    A Materialized View(MNV) considered fresh if and only if the results
> stored are equivalent to the those that would have been obtained by running
> MV's defining query at some point in time within interval :
>  [CurrentTime-max-staleness-ms, Current_time]
>
> Note: this definition allows for optimization proposed by option 2
> (implementing which is definitely a great idea) , but doesn't mandate it.
>  One can also imagine some other optimization that would be possible given
> definition above , and would be left up to the engines toi implement.
>
>
>
>
>
> On Tue, Nov 18, 2025 at 10:54 AM Steven Wu <[email protected]> wrote:
>
>> A reminder for tomorrow's community sync for the MV spec.
>> https://calendar.app.google/T4zSk6qKWoy1vV6P7
>>
>> We have one open question from the last meeting on how
>> `max-stalenesss-ms` should be interpreted. You can find more details in the
>> meeting notes.
>>
>> https://docs.google.com/document/d/1EVCM-hKr5tY33t0Yzq37cAXSPncySc6Ghke7OZEcqXU/edit?tab=t.0#heading=h.75r8e0rwq02o
>>
>> Please also bring other topics that we should discuss.
>>
>> On Sat, Nov 1, 2025 at 10:14 PM Steven Wu <[email protected]> wrote:
>>
>>> Sorry for the delay. Here are the recording and meeting notes for the MV
>>> sync meeting on Wednesday, Oct 29.
>>>
>>> https://docs.google.com/document/d/1EVCM-hKr5tY33t0Yzq37cAXSPncySc6Ghke7OZEcqXU/edit?tab=t.0#heading=h.75r8e0rwq02o
>>>
>>> We have started to collect them in the above google doc.
>>>
>>> On Mon, Oct 27, 2025 at 8:58 AM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> If we have materialized views (MVs) and support for incremental change
>>>> scans, then by introducing a Java-based representation of the view, we can
>>>> expose a scan API that always returns up-to-date results for the MV.
>>>>
>>>> The scan could include multiple tasks:
>>>>
>>>>    - A task for reading the current version of the MV.
>>>>    - An incremental change log scan covering the range between the
>>>>    snapshot ID of the source table at the time the MV was last refreshed 
>>>> and
>>>>    its current snapshot ID. Applying the Java representation of the view 
>>>> when
>>>>    transformations are required.
>>>>
>>>> This approach allows us to build an always up-to-date index
>>>> table/single source MV, using existing components.
>>>>
>>>> Benny Chow <[email protected]> ezt írta (időpont: 2025. okt. 24., P,
>>>> 7:44):
>>>>
>>>>> Hi Peter
>>>>>
>>>>> I think the current proposal would support your example.  In most
>>>>> situations, replace table operations after a view is materialized wouldn’t
>>>>> invalidate the materialization.  However, if the view includes metadata
>>>>> columns, then the replace operations should invalidate the 
>>>>> materialization.
>>>>>
>>>>>
>>>>> This also brings up another important point that engines will differ
>>>>> on what views can be materialized or not.  For example, maybe metadata
>>>>> columns are not allowed similar to non deterministic functions like
>>>>> random.  But some engines like Dremio may allow views that use current 
>>>>> date
>>>>> functions.  It should be possible for one engine to materialize a view and
>>>>> another engine to look at the query tree and decide it’s not a view it
>>>>> supports materializations on and choose not to use that materialization.
>>>>>
>>>>> Thanks
>>>>> Benny
>>>>>
>>>>>
>>>>>
>>>>> On Oct 23, 2025, at 8:44 AM, Péter Váry <[email protected]>
>>>>> wrote:
>>>>>
>>>>> 
>>>>> Hi All,
>>>>>
>>>>> I’ve been catching up on the discussion and wanted to share an
>>>>> observation. One aspect that stands out to me in the proposed staleness
>>>>> evaluation logic is that snapshots which don’t modify data can still 
>>>>> affect
>>>>> the view’s contents if the view includes metadata columns.
>>>>>
>>>>> I was considering using a materialized view as an index for a given
>>>>> table to accelerate the conversion of equality deletes to position 
>>>>> deletes.
>>>>> For example, the query might look like:
>>>>>
>>>>> *SELECT _POS, _FILE, id FROM target_table*
>>>>>
>>>>>
>>>>> During compaction, the materialized view would need to be refreshed to
>>>>> ensure it reflects the correct data.
>>>>>
>>>>> Does this seem like a valid use case? Or should we explicitly exclude
>>>>> scenarios like this?
>>>>>
>>>>> Thanks,
>>>>> Peter
>>>>>
>>>>> Steven Wu <[email protected]> ezt írta (időpont: 2025. okt. 20.,
>>>>> H, 17:30):
>>>>>
>>>>>> Walaa,
>>>>>>
>>>>>> > while Option 2 is described in your summary as "giving engines
>>>>>> *flexibility* to determine freshness recursively beyond a source
>>>>>> MV", that *isn’t achievable* under the MV evaluation model itself.
>>>>>> Because each MV treats upstream MVs as physical tables, recursion
>>>>>> stops at the first materialized boundary; *deeper staleness cannot
>>>>>> be discovered without switching to a logical-view evaluation model, i.e.,
>>>>>> stepping outside the MV model altogether (note that in Option 3 we can
>>>>>> determine recursive staleness while still inside the MV model).*
>>>>>>
>>>>>> In option 2, when determining the freshness of mv_3, engines can
>>>>>> choose to recursively evaluate the freshness of mv_1 and mv_2 since they
>>>>>> are also MVs. But engines can also choose not to.
>>>>>>
>>>>>> > This means that there seems to be an implicit “Option 3”. This
>>>>>> option treats MVs as logical views, i.e., storing only view versions + 
>>>>>> base
>>>>>> table snapshot IDs (no MV storage snapshot IDs, no per-path lineage).
>>>>>>
>>>>>> In the new option 3 you described, how could the engine update mv3's
>>>>>> refresh state for base table_a and table_b? unless all connected MVs are
>>>>>> refreshed and committed in one single transaction, one entry per base 
>>>>>> table
>>>>>> doesn't seem feasible. That's the main reason for option 1 to require the
>>>>>> lineage path information in refresh state for base tables.
>>>>>>
>>>>>> It also seems that option 3 can only interpret freshness recursively,
>>>>>> while today there are engines that support MVs without recursively
>>>>>> evaluating source MVs.
>>>>>>
>>>>>> Thanks,
>>>>>> Steven
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 20, 2025 at 1:44 AM Walaa Eldin Moustafa <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Hi Steven,
>>>>>>>
>>>>>>> Thanks for organizing the series and summarizing the outcome.
>>>>>>>
>>>>>>> After re-reading the Option 1/2 proposal, initially I interpreted
>>>>>>> Option 1 as simply expanding MVs like regular logical views. On closer
>>>>>>> look, it is actually more complex. It also preserves per-path lineage 
>>>>>>> state
>>>>>>> (e.g., multiple entries for the same base table via different parents),
>>>>>>> which increases expressiveness but significantly increases metadata
>>>>>>> complexity. So I agree it is not a practical option.
>>>>>>>
>>>>>>> This means that there seems to be an implicit “Option 3”. This
>>>>>>> option treats MVs as logical views, i.e., storing only view versions + 
>>>>>>> base
>>>>>>> table snapshot IDs (no MV storage snapshot IDs, no per-path lineage). 
>>>>>>> Under
>>>>>>> this model, mv_3’s metadata might look like:
>>>>>>>
>>>>>>> Type   Name     Tracked State
>>>>>>> -----  -------  -----------------------
>>>>>>> view   mv_1     view_version_id
>>>>>>> view   mv_2     view_version_id
>>>>>>> table  table_a  table_snapshot_id
>>>>>>> table  table_b  table_snapshot_id
>>>>>>>
>>>>>>> This preserves logical semantics and aligns MV behavior with pure
>>>>>>> views.
>>>>>>>
>>>>>>> *If we choose Option 2 (treat source MV as a materialized table), we
>>>>>>> may have to be consider those constraints:*
>>>>>>>
>>>>>>> * Staleness only degrades up the chain. mv_1 and mv_2 may already be
>>>>>>> stale relative to the base tables, but if mv_3 is refreshed using their
>>>>>>> storage snapshots, then mv_3 will be marked as fresh under Option 2, 
>>>>>>> even
>>>>>>> though all three MVs are stale relative to the base tables.
>>>>>>>
>>>>>>> * Engines can no longer discover staleness beyond mv_1. Once mv_3
>>>>>>> sees mv_1 (or mv_2) as fresh based only on their storage snapshots, it 
>>>>>>> will
>>>>>>> not expand into mv_1 or mv_2 to check whether they are stale relative to
>>>>>>> the base tables.
>>>>>>>
>>>>>>> * If mv_2 and mv_3 were purely logical views instead of MVs, they
>>>>>>> would evaluate directly against base tables and return newer data. Under
>>>>>>> Option 2, the same definitions but materialized upstream produce 
>>>>>>> different
>>>>>>> data, not just different metadata.
>>>>>>>
>>>>>>> Therefore, while Option 2 is described in your summary as "giving
>>>>>>> engines *flexibility* to determine freshness recursively beyond a
>>>>>>> source MV", that *isn’t achievable* under the MV evaluation model
>>>>>>> itself.
>>>>>>> Because each MV treats upstream MVs as physical tables, recursion
>>>>>>> stops at the first materialized boundary; *deeper staleness cannot
>>>>>>> be discovered without switching to a logical-view evaluation model, 
>>>>>>> i.e.,
>>>>>>> stepping outside the MV model altogether (note that in Option 3 we can
>>>>>>> determine recursive staleness while still inside the MV model).*
>>>>>>>
>>>>>>> Let me know your thoughts. I slightly prefer Option 3. I’m also fine
>>>>>>> with Option 2, but I don’t think the flexibility to recursively 
>>>>>>> determine
>>>>>>> freshness actually exists under its evaluation model. Not sure if this
>>>>>>> changes anyone’s view, but I wanted to clarify how I’m reading it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Walaa.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Oct 8, 2025 at 11:11 PM Benny Chow <[email protected]> wrote:
>>>>>>>
>>>>>>>> I just listened to the recording.  I'm the tech lead for MVs at
>>>>>>>> Dremio and responsible for both refresh management and query rewrites 
>>>>>>>> with
>>>>>>>> MVs.
>>>>>>>>
>>>>>>>> It's great that we seem to agree that Iceberg MV spec won't require
>>>>>>>> that MVs always be up to date in order to be usable for query rewrites.
>>>>>>>> There can be many data consistency issues (as Dan pointed out) but 
>>>>>>>> that is
>>>>>>>> the state of affairs today.
>>>>>>>>
>>>>>>>> It sounds like we are converging on the following scenarios for an
>>>>>>>> engine to validate the MV freshness:
>>>>>>>>
>>>>>>>> 1.  Use storage table without any validation.  This might be the
>>>>>>>> extreme "async MV" example.
>>>>>>>> 2.  Ignore storage table even if one exists because SQL command or
>>>>>>>> use case requires that.
>>>>>>>> 3.  Use storage table only if data is not more than x hours old.
>>>>>>>> This can be achieved with the proposed refresh-start-timestamp-ms which
>>>>>>>> is currently in the proposed spec.  For this to work with MVs built on 
>>>>>>>> MVs,
>>>>>>>> we should probably state in the spec that if a MV is built on another 
>>>>>>>> MV,
>>>>>>>> then it needs to inherit the refresh-start-timestamp-ms of the child 
>>>>>>>> MV.
>>>>>>>> In Steven's example, when building mv3, refresh-start-timestamp-ms 
>>>>>>>> needs to
>>>>>>>> be set to the minimum of mv1 or mv2's refresh-start-timestamp-ms.  If 
>>>>>>>> this
>>>>>>>> property name is confusing, we can rename it to
>>>>>>>> "refresh-earliest-table-timestamp-ms".  I originally proposed this 
>>>>>>>> property
>>>>>>>> and also listed out other benefits here:
>>>>>>>> https://github.com/apache/iceberg/pull/11041#discussion_r1779797796
>>>>>>>> Also, at the time, MVs built on MVs weren't being considered.  Now 
>>>>>>>> that it
>>>>>>>> is, I would recommend we have both "refresh-start-timestamp-ms" (when 
>>>>>>>> the
>>>>>>>> refresh was started on the storage table) and
>>>>>>>> "refresh-earliest-table-timestamp-ms" (used for freshness validation).
>>>>>>>> 4.  Don't use the storage table if it is older than X hours.  This
>>>>>>>> is what I had originally proposed for the
>>>>>>>> *materialization.max-stalessness-ms* view property here:
>>>>>>>> https://github.com/apache/iceberg/pull/11041#discussion_r1744837644
>>>>>>>> It wasn't meant to validate the freshness but more to prevent use of a
>>>>>>>> materialization after some criteria.
>>>>>>>> 5.  Use storage table if recursive validation passes... i.e.
>>>>>>>> refresh-state matches the current expanded query tree state.  This is 
>>>>>>>> what
>>>>>>>> I think Steven is calling the "synchronous MV".
>>>>>>>>
>>>>>>>> For scenario 1-4, it would support the nice use case of an Iceberg
>>>>>>>> client using a view's data through the storage table without needing to
>>>>>>>> know how to parse/validate/expand any view SQLs.
>>>>>>>>
>>>>>>>> In Dremio's planner, we primarily use scenario 1 and 4 together to
>>>>>>>> determine MV validity for query rewrite.  Scenario 2 and 5 also apply 
>>>>>>>> in
>>>>>>>> certain situations.  For scenario 3, Dremio only exposes the
>>>>>>>> "refresh-earliest-table-timestamp-ms" as an fyi to the user but it 
>>>>>>>> would be
>>>>>>>> interesting to allow the user to set this time so that they could run
>>>>>>>> queries and be 100% certain that they were not seeing data older than x
>>>>>>>> hours.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Benny
>>>>>>>>
>>>>>>>> On Wed, Oct 8, 2025 at 3:37 PM Steven Wu <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> correction for a typo.
>>>>>>>>>
>>>>>>>>> Prashanth brought up another scenario of compaction/rewrite where
>>>>>>>>> a new snapshot was added *with* actual data change
>>>>>>>>> -->
>>>>>>>>> Prashanth brought up another scenario of compaction/rewrite where
>>>>>>>>> a new snapshot was added *without* actual data change
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, Oct 8, 2025 at 2:12 PM Steven Wu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Thanks everyone for joining the MV discussion meeting. We will
>>>>>>>>>> continue to have the recurring sync meeting on Wednesday 9 am 
>>>>>>>>>> (Pacific)
>>>>>>>>>> every 3 weeks until we get to the finish line where Jan's MV spec PR 
>>>>>>>>>> [1] is
>>>>>>>>>> merged. I have scheduled our next meeting on Oct 29 in the Iceberg 
>>>>>>>>>> dev
>>>>>>>>>> events calendar.
>>>>>>>>>>
>>>>>>>>>> Here is the video recording for today's meeting.
>>>>>>>>>>
>>>>>>>>>> https://drive.google.com/file/d/1-nfhBPDWLoAFDu5cKP0rwLd_30HB6byR/view?usp=sharing
>>>>>>>>>>
>>>>>>>>>> We mostly discussed freshness evaluation. Here is the meeting
>>>>>>>>>> summary.
>>>>>>>>>>
>>>>>>>>>>    1. For tracking the refresh state for the source MV [2], the
>>>>>>>>>>    consensus is option 2 (treating source MV as a materialized 
>>>>>>>>>> table) which
>>>>>>>>>>    would give engines the flexibility on freshness determination 
>>>>>>>>>> (recursive
>>>>>>>>>>    beyond source MV or not).
>>>>>>>>>>    2. Earlier design doc [3] discussed max staleness config. But
>>>>>>>>>>    it wasn't reflected in the spec PR. The general opinion is to add 
>>>>>>>>>> the
>>>>>>>>>>    config to the spec PR. The open question is whether the `
>>>>>>>>>>    materialization.max-staleness-ms` config should be added to
>>>>>>>>>>    the view metadata or the storage table metadata. Either can work. 
>>>>>>>>>> We just
>>>>>>>>>>    need to decide which makes a little better fit.
>>>>>>>>>>    3. Prashanth brought up schema change with default value and
>>>>>>>>>>    how it may affect the MV refresh state (for SQL representation 
>>>>>>>>>> with select
>>>>>>>>>>    *). Jan mentioned that snapshot contains schema id when the 
>>>>>>>>>> snapshot was
>>>>>>>>>>    created. Engine can compare the snapshot schema id to the source 
>>>>>>>>>> table
>>>>>>>>>>    schema id during freshness evaluation. There is no need for 
>>>>>>>>>> additional
>>>>>>>>>>    schema info in refresh-state tracking in the storage table.
>>>>>>>>>>    4. Prashanth brought up another scenario of
>>>>>>>>>>    compaction/rewrite where a new snapshot was added with actual 
>>>>>>>>>> data change.
>>>>>>>>>>    The general take is that the engine can optimize and decide that 
>>>>>>>>>> MV is
>>>>>>>>>>    fresh as the new snapshot doesn't have any data change.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We can add some clarifications in the spec PR for freshness
>>>>>>>>>> evaluation based on the above discussions.
>>>>>>>>>>
>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/11041
>>>>>>>>>> [2]
>>>>>>>>>> https://docs.google.com/document/d/1_StBW5hCQhumhIvgbdsHjyW0ED3dWMkjtNzyPp9Sfr8/edit?tab=t.0
>>>>>>>>>> [3]
>>>>>>>>>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?tab=t.0#heading=h.3wigecex0zls
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 25, 2025 at 9:27 AM Steven Wu <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Iceberg materialized view has been discussed in the community
>>>>>>>>>>> for a long time. Thanks Jan Kaul for driving the discussion and the 
>>>>>>>>>>> spec
>>>>>>>>>>> PR. It has been stalled for a long time due to lack of consensus on 
>>>>>>>>>>> 1 or 2
>>>>>>>>>>> topics. In Wed's Iceberg community sync meeting, Talat brought up 
>>>>>>>>>>> the
>>>>>>>>>>> question on how to move forward and if we can have a dedicated 
>>>>>>>>>>> meeting for
>>>>>>>>>>> MV.
>>>>>>>>>>>
>>>>>>>>>>> I have set up a meeting on *Oct 8 (9-10 am Pacific)*. If you
>>>>>>>>>>> subscribe to the "Iceberg Dev Events" calendar, you should be
>>>>>>>>>>> able to see it. If not, here is the link:
>>>>>>>>>>> https://meet.google.com/nfe-guyq-pqf
>>>>>>>>>>>
>>>>>>>>>>> We are going to discuss
>>>>>>>>>>> * remaining open questions
>>>>>>>>>>> * unresolved concerns
>>>>>>>>>>> * the next step and hopefully some consensus on moving forward
>>>>>>>>>>>
>>>>>>>>>>> MV spec PR is up to date. Jan has incorporated recent feedback.
>>>>>>>>>>> This should be the base of the discussion.
>>>>>>>>>>> https://github.com/apache/iceberg/pull/11041
>>>>>>>>>>> <https://www.google.com/url?q=https://github.com/apache/iceberg/pull/11041&sa=D&source=calendar&usd=2&usg=AOvVaw3w0TjRpwbC17AGzmxZmElM>
>>>>>>>>>>>
>>>>>>>>>>> Dev discussion thread (a long-running thread started by Jan).
>>>>>>>>>>> https://lists.apache.org/thread/y1vlpzbn2x7xookjkffcl08zzyofk5hf
>>>>>>>>>>> <https://www.google.com/url?q=https://lists.apache.org/thread/y1vlpzbn2x7xookjkffcl08zzyofk5hf&sa=D&source=calendar&usd=2&usg=AOvVaw0fotlsrnRBOb820mA5JRyB>
>>>>>>>>>>>
>>>>>>>>>>> The mail archive has broken lineage and doesn't show all
>>>>>>>>>>> replies. Email subject is "*[DISCUSS] Iceberg
>>>>>>>>>>> Materialzied Views*".
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Steven
>>>>>>>>>>>
>>>>>>>>>>>

Re: Dedicated sync for Iceberg materialized view

Reply via email to