Re: Dedicated sync for Iceberg materialized view

Steven Wu Sat, 01 Nov 2025 22:15:33 -0700

Sorry for the delay. Here are the recording and meeting notes for the MV
sync meeting on Wednesday, Oct 29.
https://docs.google.com/document/d/1EVCM-hKr5tY33t0Yzq37cAXSPncySc6Ghke7OZEcqXU/edit?tab=t.0#heading=h.75r8e0rwq02o


We have started to collect them in the above google doc.

On Mon, Oct 27, 2025 at 8:58 AM Péter Váry <[email protected]>
wrote:

> If we have materialized views (MVs) and support for incremental change
> scans, then by introducing a Java-based representation of the view, we can
> expose a scan API that always returns up-to-date results for the MV.
>
> The scan could include multiple tasks:
>
>    - A task for reading the current version of the MV.
>    - An incremental change log scan covering the range between the
>    snapshot ID of the source table at the time the MV was last refreshed and
>    its current snapshot ID. Applying the Java representation of the view when
>    transformations are required.
>
> This approach allows us to build an always up-to-date index table/single
> source MV, using existing components.
>
> Benny Chow <[email protected]> ezt írta (időpont: 2025. okt. 24., P, 7:44):
>
>> Hi Peter
>>
>> I think the current proposal would support your example.  In most
>> situations, replace table operations after a view is materialized wouldn’t
>> invalidate the materialization.  However, if the view includes metadata
>> columns, then the replace operations should invalidate the materialization.
>>
>>
>> This also brings up another important point that engines will differ on
>> what views can be materialized or not.  For example, maybe metadata columns
>> are not allowed similar to non deterministic functions like random.  But
>> some engines like Dremio may allow views that use current date functions.
>> It should be possible for one engine to materialize a view and another
>> engine to look at the query tree and decide it’s not a view it supports
>> materializations on and choose not to use that materialization.
>>
>> Thanks
>> Benny
>>
>>
>>
>> On Oct 23, 2025, at 8:44 AM, Péter Váry <[email protected]>
>> wrote:
>>
>> 
>> Hi All,
>>
>> I’ve been catching up on the discussion and wanted to share an
>> observation. One aspect that stands out to me in the proposed staleness
>> evaluation logic is that snapshots which don’t modify data can still affect
>> the view’s contents if the view includes metadata columns.
>>
>> I was considering using a materialized view as an index for a given table
>> to accelerate the conversion of equality deletes to position deletes. For
>> example, the query might look like:
>>
>> *SELECT _POS, _FILE, id FROM target_table*
>>
>>
>> During compaction, the materialized view would need to be refreshed to
>> ensure it reflects the correct data.
>>
>> Does this seem like a valid use case? Or should we explicitly exclude
>> scenarios like this?
>>
>> Thanks,
>> Peter
>>
>> Steven Wu <[email protected]> ezt írta (időpont: 2025. okt. 20., H,
>> 17:30):
>>
>>> Walaa,
>>>
>>> > while Option 2 is described in your summary as "giving engines
>>> *flexibility* to determine freshness recursively beyond a source MV",
>>> that *isn’t achievable* under the MV evaluation model itself.
>>> Because each MV treats upstream MVs as physical tables, recursion stops
>>> at the first materialized boundary; *deeper staleness cannot be
>>> discovered without switching to a logical-view evaluation model, i.e.,
>>> stepping outside the MV model altogether (note that in Option 3 we can
>>> determine recursive staleness while still inside the MV model).*
>>>
>>> In option 2, when determining the freshness of mv_3, engines can choose
>>> to recursively evaluate the freshness of mv_1 and mv_2 since they are also
>>> MVs. But engines can also choose not to.
>>>
>>> > This means that there seems to be an implicit “Option 3”. This option
>>> treats MVs as logical views, i.e., storing only view versions + base table
>>> snapshot IDs (no MV storage snapshot IDs, no per-path lineage).
>>>
>>> In the new option 3 you described, how could the engine update mv3's
>>> refresh state for base table_a and table_b? unless all connected MVs are
>>> refreshed and committed in one single transaction, one entry per base table
>>> doesn't seem feasible. That's the main reason for option 1 to require the
>>> lineage path information in refresh state for base tables.
>>>
>>> It also seems that option 3 can only interpret freshness recursively,
>>> while today there are engines that support MVs without recursively
>>> evaluating source MVs.
>>>
>>> Thanks,
>>> Steven
>>>
>>>
>>> On Mon, Oct 20, 2025 at 1:44 AM Walaa Eldin Moustafa <
>>> [email protected]> wrote:
>>>
>>>> Hi Steven,
>>>>
>>>> Thanks for organizing the series and summarizing the outcome.
>>>>
>>>> After re-reading the Option 1/2 proposal, initially I interpreted
>>>> Option 1 as simply expanding MVs like regular logical views. On closer
>>>> look, it is actually more complex. It also preserves per-path lineage state
>>>> (e.g., multiple entries for the same base table via different parents),
>>>> which increases expressiveness but significantly increases metadata
>>>> complexity. So I agree it is not a practical option.
>>>>
>>>> This means that there seems to be an implicit “Option 3”. This option
>>>> treats MVs as logical views, i.e., storing only view versions + base table
>>>> snapshot IDs (no MV storage snapshot IDs, no per-path lineage). Under this
>>>> model, mv_3’s metadata might look like:
>>>>
>>>> Type   Name     Tracked State
>>>> -----  -------  -----------------------
>>>> view   mv_1     view_version_id
>>>> view   mv_2     view_version_id
>>>> table  table_a  table_snapshot_id
>>>> table  table_b  table_snapshot_id
>>>>
>>>> This preserves logical semantics and aligns MV behavior with pure views.
>>>>
>>>> *If we choose Option 2 (treat source MV as a materialized table), we
>>>> may have to be consider those constraints:*
>>>>
>>>> * Staleness only degrades up the chain. mv_1 and mv_2 may already be
>>>> stale relative to the base tables, but if mv_3 is refreshed using their
>>>> storage snapshots, then mv_3 will be marked as fresh under Option 2, even
>>>> though all three MVs are stale relative to the base tables.
>>>>
>>>> * Engines can no longer discover staleness beyond mv_1. Once mv_3 sees
>>>> mv_1 (or mv_2) as fresh based only on their storage snapshots, it will not
>>>> expand into mv_1 or mv_2 to check whether they are stale relative to the
>>>> base tables.
>>>>
>>>> * If mv_2 and mv_3 were purely logical views instead of MVs, they would
>>>> evaluate directly against base tables and return newer data. Under Option
>>>> 2, the same definitions but materialized upstream produce different data,
>>>> not just different metadata.
>>>>
>>>> Therefore, while Option 2 is described in your summary as "giving
>>>> engines *flexibility* to determine freshness recursively beyond a
>>>> source MV", that *isn’t achievable* under the MV evaluation model
>>>> itself.
>>>> Because each MV treats upstream MVs as physical tables, recursion stops
>>>> at the first materialized boundary; *deeper staleness cannot be
>>>> discovered without switching to a logical-view evaluation model, i.e.,
>>>> stepping outside the MV model altogether (note that in Option 3 we can
>>>> determine recursive staleness while still inside the MV model).*
>>>>
>>>> Let me know your thoughts. I slightly prefer Option 3. I’m also fine
>>>> with Option 2, but I don’t think the flexibility to recursively determine
>>>> freshness actually exists under its evaluation model. Not sure if this
>>>> changes anyone’s view, but I wanted to clarify how I’m reading it.
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>>
>>>> On Wed, Oct 8, 2025 at 11:11 PM Benny Chow <[email protected]> wrote:
>>>>
>>>>> I just listened to the recording.  I'm the tech lead for MVs at Dremio
>>>>> and responsible for both refresh management and query rewrites with MVs.
>>>>>
>>>>> It's great that we seem to agree that Iceberg MV spec won't require
>>>>> that MVs always be up to date in order to be usable for query rewrites.
>>>>> There can be many data consistency issues (as Dan pointed out) but that is
>>>>> the state of affairs today.
>>>>>
>>>>> It sounds like we are converging on the following scenarios for an
>>>>> engine to validate the MV freshness:
>>>>>
>>>>> 1.  Use storage table without any validation.  This might be the
>>>>> extreme "async MV" example.
>>>>> 2.  Ignore storage table even if one exists because SQL command or use
>>>>> case requires that.
>>>>> 3.  Use storage table only if data is not more than x hours old.  This
>>>>> can be achieved with the proposed refresh-start-timestamp-ms which is
>>>>> currently in the proposed spec.  For this to work with MVs built on MVs, 
>>>>> we
>>>>> should probably state in the spec that if a MV is built on another MV, 
>>>>> then
>>>>> it needs to inherit the refresh-start-timestamp-ms of the child MV.  In
>>>>> Steven's example, when building mv3, refresh-start-timestamp-ms needs to 
>>>>> be
>>>>> set to the minimum of mv1 or mv2's refresh-start-timestamp-ms.  If this
>>>>> property name is confusing, we can rename it to
>>>>> "refresh-earliest-table-timestamp-ms".  I originally proposed this 
>>>>> property
>>>>> and also listed out other benefits here:
>>>>> https://github.com/apache/iceberg/pull/11041#discussion_r1779797796
>>>>> Also, at the time, MVs built on MVs weren't being considered.  Now that it
>>>>> is, I would recommend we have both "refresh-start-timestamp-ms" (when the
>>>>> refresh was started on the storage table) and
>>>>> "refresh-earliest-table-timestamp-ms" (used for freshness validation).
>>>>> 4.  Don't use the storage table if it is older than X hours.  This is
>>>>> what I had originally proposed for the
>>>>> *materialization.max-stalessness-ms* view property here:
>>>>> https://github.com/apache/iceberg/pull/11041#discussion_r1744837644
>>>>> It wasn't meant to validate the freshness but more to prevent use of a
>>>>> materialization after some criteria.
>>>>> 5.  Use storage table if recursive validation passes... i.e.
>>>>> refresh-state matches the current expanded query tree state.  This is what
>>>>> I think Steven is calling the "synchronous MV".
>>>>>
>>>>> For scenario 1-4, it would support the nice use case of an Iceberg
>>>>> client using a view's data through the storage table without needing to
>>>>> know how to parse/validate/expand any view SQLs.
>>>>>
>>>>> In Dremio's planner, we primarily use scenario 1 and 4 together to
>>>>> determine MV validity for query rewrite.  Scenario 2 and 5 also apply in
>>>>> certain situations.  For scenario 3, Dremio only exposes the
>>>>> "refresh-earliest-table-timestamp-ms" as an fyi to the user but it would 
>>>>> be
>>>>> interesting to allow the user to set this time so that they could run
>>>>> queries and be 100% certain that they were not seeing data older than x
>>>>> hours.
>>>>>
>>>>> Thanks
>>>>> Benny
>>>>>
>>>>> On Wed, Oct 8, 2025 at 3:37 PM Steven Wu <[email protected]> wrote:
>>>>>
>>>>>> correction for a typo.
>>>>>>
>>>>>> Prashanth brought up another scenario of compaction/rewrite where a
>>>>>> new snapshot was added *with* actual data change
>>>>>> -->
>>>>>> Prashanth brought up another scenario of compaction/rewrite where a
>>>>>> new snapshot was added *without* actual data change
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 8, 2025 at 2:12 PM Steven Wu <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks everyone for joining the MV discussion meeting. We will
>>>>>>> continue to have the recurring sync meeting on Wednesday 9 am (Pacific)
>>>>>>> every 3 weeks until we get to the finish line where Jan's MV spec PR 
>>>>>>> [1] is
>>>>>>> merged. I have scheduled our next meeting on Oct 29 in the Iceberg dev
>>>>>>> events calendar.
>>>>>>>
>>>>>>> Here is the video recording for today's meeting.
>>>>>>>
>>>>>>> https://drive.google.com/file/d/1-nfhBPDWLoAFDu5cKP0rwLd_30HB6byR/view?usp=sharing
>>>>>>>
>>>>>>> We mostly discussed freshness evaluation. Here is the meeting
>>>>>>> summary.
>>>>>>>
>>>>>>>    1. For tracking the refresh state for the source MV [2], the
>>>>>>>    consensus is option 2 (treating source MV as a materialized table) 
>>>>>>> which
>>>>>>>    would give engines the flexibility on freshness determination 
>>>>>>> (recursive
>>>>>>>    beyond source MV or not).
>>>>>>>    2. Earlier design doc [3] discussed max staleness config. But it
>>>>>>>    wasn't reflected in the spec PR. The general opinion is to add the 
>>>>>>> config
>>>>>>>    to the spec PR. The open question is whether the `
>>>>>>>    materialization.max-staleness-ms` config should be added to the
>>>>>>>    view metadata or the storage table metadata. Either can work. We 
>>>>>>> just need
>>>>>>>    to decide which makes a little better fit.
>>>>>>>    3. Prashanth brought up schema change with default value and how
>>>>>>>    it may affect the MV refresh state (for SQL representation with 
>>>>>>> select *).
>>>>>>>    Jan mentioned that snapshot contains schema id when the snapshot was
>>>>>>>    created. Engine can compare the snapshot schema id to the source 
>>>>>>> table
>>>>>>>    schema id during freshness evaluation. There is no need for 
>>>>>>> additional
>>>>>>>    schema info in refresh-state tracking in the storage table.
>>>>>>>    4. Prashanth brought up another scenario of compaction/rewrite
>>>>>>>    where a new snapshot was added with actual data change. The general 
>>>>>>> take is
>>>>>>>    that the engine can optimize and decide that MV is fresh as the new
>>>>>>>    snapshot doesn't have any data change.
>>>>>>>
>>>>>>>
>>>>>>> We can add some clarifications in the spec PR for freshness
>>>>>>> evaluation based on the above discussions.
>>>>>>>
>>>>>>> [1] https://github.com/apache/iceberg/pull/11041
>>>>>>> [2]
>>>>>>> https://docs.google.com/document/d/1_StBW5hCQhumhIvgbdsHjyW0ED3dWMkjtNzyPp9Sfr8/edit?tab=t.0
>>>>>>> [3]
>>>>>>> https://docs.google.com/document/d/1UnhldHhe3Grz8JBngwXPA6ZZord1xMedY5ukEhZYF-A/edit?tab=t.0#heading=h.3wigecex0zls
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Sep 25, 2025 at 9:27 AM Steven Wu <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Iceberg materialized view has been discussed in the community for a
>>>>>>>> long time. Thanks Jan Kaul for driving the discussion and the spec PR. 
>>>>>>>> It
>>>>>>>> has been stalled for a long time due to lack of consensus on 1 or 2 
>>>>>>>> topics.
>>>>>>>> In Wed's Iceberg community sync meeting, Talat brought up the question 
>>>>>>>> on
>>>>>>>> how to move forward and if we can have a dedicated meeting for MV.
>>>>>>>>
>>>>>>>> I have set up a meeting on *Oct 8 (9-10 am Pacific)*. If you
>>>>>>>> subscribe to the "Iceberg Dev Events" calendar, you should be able
>>>>>>>> to see it. If not, here is the link:
>>>>>>>> https://meet.google.com/nfe-guyq-pqf
>>>>>>>>
>>>>>>>> We are going to discuss
>>>>>>>> * remaining open questions
>>>>>>>> * unresolved concerns
>>>>>>>> * the next step and hopefully some consensus on moving forward
>>>>>>>>
>>>>>>>> MV spec PR is up to date. Jan has incorporated recent feedback.
>>>>>>>> This should be the base of the discussion.
>>>>>>>> https://github.com/apache/iceberg/pull/11041
>>>>>>>> <https://www.google.com/url?q=https://github.com/apache/iceberg/pull/11041&sa=D&source=calendar&usd=2&usg=AOvVaw3w0TjRpwbC17AGzmxZmElM>
>>>>>>>>
>>>>>>>> Dev discussion thread (a long-running thread started by Jan).
>>>>>>>> https://lists.apache.org/thread/y1vlpzbn2x7xookjkffcl08zzyofk5hf
>>>>>>>> <https://www.google.com/url?q=https://lists.apache.org/thread/y1vlpzbn2x7xookjkffcl08zzyofk5hf&sa=D&source=calendar&usd=2&usg=AOvVaw0fotlsrnRBOb820mA5JRyB>
>>>>>>>>
>>>>>>>> The mail archive has broken lineage and doesn't show all replies.
>>>>>>>> Email subject is "*[DISCUSS] Iceberg Materialzied Views*".
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Steven
>>>>>>>>
>>>>>>>>

Re: Dedicated sync for Iceberg materialized view

Reply via email to