Thanks, Ryan, for helping trace this back to the root question! One
clarification question about your reply before I respond further: what
exactly does the option "a combination of the two (i.e. commits are
combined)" mean? How is that different from "a new metadata type"?

-Jack




On Wed, Feb 28, 2024 at 2:10 PM Ryan Blue <b...@tabular.io> wrote:

> I’m catching up on this conversation, so hopefully I can bring a fresh
> perspective.
>
> Jack already pointed out that we need to start from the basics and I agree
> with that. Let’s remove voting at this point. Right now is the time for
> discussing trade-offs, not lining up and taking sides. I realize that
> wasn’t the intent with adding a vote, but that’s almost always the result.
> It’s too easy to use it as a stand-in for consensus and move on
> prematurely. I get the impression from the swirl in Slack that discussion
> has moved ahead of agreement.
>
> We’re still at the most basic question: is a materialized view a view and
> a separate table, a combination of the two (i.e. commits are combined), or
> a new metadata type?
>
> For now, I’m ignoring whether the “separate table” is some kind of “system
> table” (meaning hidden?) or if it is exposed in the catalog. That’s a later
> choice (already pointed out) and, I suspect, it should be delegated to
> catalog implementations.
>
> To simplify this a little, I think that we can eliminate the option to
> combine table and view commits. I don’t think there is a reason to combine
> the two. If separate, a table would track the view version used along with
> freshness information for referenced tables. If the table is automatically
> skipped when the version no longer matches the view, then no action needs
> to happen when a view definition changes. Similarly, the table can be
> updated independently without needing to also swap view metadata. This also
> aligns with the idea from the original doc that there can be multiple
> materialization tables for a view. Each should operate independently unless
> I’m missing something.
>
> I don’t think the last paragraph’s conclusion is contentious so I’ll move
> on, but please stop here and reply if you disagree!
>
> That leaves the two main options: a view and a separate table linked by
> metadata, or combined materialized view metadata.
>
> As the doc notes, the separate view and table option is simpler because it
> reuses existing metadata definitions and falls back to simple views. That
> is a significantly smaller spec and small is very, very important when it
> comes to specs. I think that the argument for a new definition of a
> materialized view needs to overcome this disadvantage.
>
> The arguments that I see for a combined materialized view object are:
>
>    - Regular views are separate, rather than being tables with SQL and no
>    data, so it would be inconsistent (“Iceberg view is just a table with no
>    data but with representations defined. But we did not do that.”)
>    - Materialized views are different objects in DDL
>    - Tables may be a superset of functionality needed for materialized
>    views
>    - Tables are not typically exposed to end users — but this isn’t
>    required by the separate view and table option
>
> Am I missing any arguments for combined metadata?
>
> Ryan
> --
> Ryan Blue
> Tabular
>
