Thanks, Ryan, for helping trace this back to the root question! One clarification question about your reply before I respond further: what exactly does the option "a combination of the two (i.e. commits are combined)" mean? How is that different from "a new metadata type"?
-Jack

On Wed, Feb 28, 2024 at 2:10 PM Ryan Blue <b...@tabular.io> wrote:

> I’m catching up on this conversation, so hopefully I can bring a fresh
> perspective.
>
> Jack already pointed out that we need to start from the basics and I agree
> with that. Let’s remove voting at this point. Right now is the time for
> discussing trade-offs, not lining up and taking sides. I realize that
> wasn’t the intent with adding a vote, but that’s almost always the result.
> It’s too easy to use it as a stand-in for consensus and move on
> prematurely. I get the impression from the swirl in Slack that discussion
> has moved ahead of agreement.
>
> We’re still at the most basic question: is a materialized view a view and
> a separate table, a combination of the two (i.e. commits are combined), or
> a new metadata type?
>
> For now, I’m ignoring whether the “separate table” is some kind of “system
> table” (meaning hidden?) or if it is exposed in the catalog. That’s a later
> choice (already pointed out) and, I suspect, it should be delegated to
> catalog implementations.
>
> To simplify this a little, I think that we can eliminate the option to
> combine table and view commits. I don’t think there is a reason to combine
> the two. If separate, a table would track the view version used along with
> freshness information for referenced tables. If the table is automatically
> skipped when the version no longer matches the view, then no action needs
> to happen when a view definition changes. Similarly, the table can be
> updated independently without needing to also swap view metadata. This also
> aligns with the idea from the original doc that there can be multiple
> materialization tables for a view. Each should operate independently unless
> I’m missing something.
>
> I don’t think the last paragraph’s conclusion is contentious so I’ll move
> on, but please stop here and reply if you disagree!
>
> That leaves the main two options, a view and a separate table linked by
> metadata, or, combined materialized view metadata.
>
> As the doc notes, the separate view and table option is simpler because it
> reuses existing metadata definitions and falls back to simple views. That
> is a significantly smaller spec and small is very, very important when it
> comes to specs. I think that the argument for a new definition of a
> materialized view needs to overcome this disadvantage.
>
> The arguments that I see for a combined materialized view object are:
>
> - Regular views are separate, rather than being tables with SQL and no
>   data so it would be inconsistent (“Iceberg view is just a table with no
>   data but with representations defined. But we did not do that.”)
> - Materialized views are different objects in DDL
> - Tables may be a superset of functionality needed for materialized
>   views
> - Tables are not typically exposed to end users, but this isn’t
>   required by the separate view and table option
>
> Am I missing any arguments for combined metadata?
>
> Ryan
> --
> Ryan Blue
> Tabular
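
To make the "view and a separate table" option in the quoted message concrete, here is a minimal Python sketch of the skip-on-mismatch behavior it describes. This is an illustration under stated assumptions, not anything from the Iceberg spec or API: the names `View`, `Materialization`, `usable`, and `plan` are hypothetical, and "freshness information" is assumed here to be one snapshot ID recorded per referenced table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class View:
    """Hypothetical stand-in for a view; only its current version matters here."""
    name: str
    current_version_id: int


@dataclass(frozen=True)
class Materialization:
    """Hypothetical storage table that records what it was built from."""
    name: str
    view_version_id: int              # view version used to produce the data
    source_snapshots: Dict[str, int]  # referenced table -> snapshot ID read (assumed form of freshness info)


def usable(view: View, mat: Materialization, current_snapshots: Dict[str, int]) -> bool:
    """A materialization is usable only if it matches the view version and is fresh."""
    if mat.view_version_id != view.current_version_id:
        return False  # built from an older view definition: skip it, no cleanup needed
    return all(
        current_snapshots.get(table) == snapshot
        for table, snapshot in mat.source_snapshots.items()
    )


def plan(view: View, mats: List[Materialization], current_snapshots: Dict[str, int]) -> str:
    """Return the name of the first usable materialization, else fall back to the plain view."""
    for mat in mats:
        if usable(view, mat, current_snapshots):
            return mat.name
    return view.name


if __name__ == "__main__":
    view = View("daily_totals", current_version_id=2)
    mats = [
        Materialization("daily_totals_m1", view_version_id=1, source_snapshots={"orders": 10}),
        Materialization("daily_totals_m2", view_version_id=2, source_snapshots={"orders": 10}),
    ]
    print(plan(view, mats, {"orders": 10}))
    print(plan(view, mats, {"orders": 11}))
```

In this sketch, changing the view definition only bumps `current_version_id`; no materialization table is written, and mismatched tables are simply ignored until refreshed. Each materialization is checked on its own, which matches the point that multiple materialization tables for one view can operate independently.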