Hi Ramin,

I like this proposal. I wondered what you thought of supporting evolutions like:

1. Changing a column or child object to be nullable?
2. Allowing column type changes if they were castable? This could require a rewriting of the column data - so may not be feasible.

WDYT?

Kind regards,
David.

From: Ramin Gharib <[email protected]>
Date: Friday, 5 September 2025 at 11:27
To: [email protected] <[email protected]>
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-546: Introduce CREATE OR ALTER for Materialized Tables

Hi Ron,

Thanks again for the excellent feedback and for bringing FLIP-492 [1] into the discussion. I want to address your points on CREATE OR REPLACE vs. CREATE OR ALTER and the SQL standard.

*1. Semantics of REPLACE vs. ALTER*

You’re right that FLIP-492 [1] proposed CREATE OR REPLACE. However, I believe the semantics of ALTER are a much better fit for an object as complex as a Materialized Table, which has a running job, state, and physical data associated with it.

Interestingly, this exact distinction is recognized by other modern data platforms. For example, *Snowflake supports both CREATE OR REPLACE TABLE and CREATE OR ALTER TABLE as distinct commands with different use cases [2].*

- Their documentation for CREATE OR ALTER TABLE explicitly states it *"creates a table if it doesn’t exist, or alters it according to the table definition... existing data in the table is preserved when possible."* This "modify in place" semantic is precisely what this FLIP aims to achieve.
- Conversely, their docs describe CREATE OR REPLACE as *"the equivalent of using DROP TABLE on the existing table and then creating a new table..."* This "drop and recreate" semantics is a destructive action that is not suitable for a stateful object like a Flink Materialized Table, where the goal is to evolve the pipeline, not destroy it.

This is particularly relevant for Flink, as we already have an ALTER MATERIALIZED TABLE ... AS <select_statement> command that defines the "modify in-place" behavior.
My proposal is that the ALTER path of CREATE OR ALTER should be implemented by delegating directly to this existing logic. This ensures behavioral consistency. Using REPLACE would introduce semantics that conflict with our current implementation.

*2. Forward-Looking Resilience*

The ALTER semantic is also more forward-looking. As Flink's evolution capabilities become more sophisticated (potentially including complex reprocessing strategies that retain historical data), the concept of 'altering' a pipeline is more fitting than 'replacing' it. REPLACE suggests a simple, destructive action, whereas ALTER provides a more resilient foundation for nuanced, state-preserving modifications in the future.

*3. The SQL Standard*

You've raised an important point about the SQL standard. A review shows that neither CREATE OR REPLACE nor CREATE OR ALTER is part of the formal ANSI/ISO SQL standard. They are both widely adopted, vendor-specific extensions.

- *CREATE OR REPLACE* was popularized by systems like PostgreSQL (for views/functions) [3] and Oracle [4].
- *CREATE OR ALTER* was popularized by Microsoft SQL Server [5] and is now a key feature in Snowflake [6].

Since neither is formally "standard," our choice should be guided by which one provides the most clarity and consistency for Flink users. The fact that a major platform like Snowflake has implemented both as distinct commands reinforces the idea that the industry sees value in their different semantics.

*4. Preventing User Confusion*

I believe CREATE OR ALTER will actually be *less confusing* for Flink users precisely because it maps directly to the existing commands they already know. A user familiar with ALTER MATERIALIZED TABLE ... AS will immediately understand what the ALTER part of the new command does.
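
To make the distinction concrete, here is a minimal Python sketch of the two semantics. It is a toy in-memory model, not Flink code: `Catalog`, `MaterializedTable`, and every method name are hypothetical, and `create_or_replace` models only the "drop and recreate" reading quoted from the Snowflake docs, not necessarily what FLIP-492 implements.

```python
from dataclasses import dataclass, field


@dataclass
class MaterializedTable:
    """Toy stand-in for a materialized table: a defining query plus stored rows."""
    query: str
    rows: list = field(default_factory=list)


class Catalog:
    """Hypothetical in-memory catalog; none of these names are Flink APIs."""

    def __init__(self):
        self.tables = {}

    def create(self, name, query):
        # Plain CREATE: not idempotent, fails if the table already exists.
        if name in self.tables:
            raise ValueError(f"table {name!r} already exists")
        self.tables[name] = MaterializedTable(query)

    def alter_as(self, name, query):
        # Models ALTER MATERIALIZED TABLE ... AS <select_statement>:
        # swap the defining query in place and keep the existing rows.
        self.tables[name].query = query

    def create_or_alter(self, name, query):
        # CREATE OR ALTER: create when absent, otherwise delegate to alter_as.
        if name in self.tables:
            self.alter_as(name, query)
        else:
            self.create(name, query)

    def create_or_replace(self, name, query):
        # "Drop and recreate" reading of CREATE OR REPLACE: old rows are discarded.
        self.tables.pop(name, None)
        self.create(name, query)


catalog = Catalog()
catalog.create_or_alter("orders_mt", "SELECT id FROM orders")
catalog.tables["orders_mt"].rows.append({"id": 1})

# Re-running the declarative statement with a changed query keeps the data.
catalog.create_or_alter("orders_mt", "SELECT id, amount FROM orders")
print(len(catalog.tables["orders_mt"].rows))

# The replace path starts from an empty table.
catalog.create_or_replace("orders_mt", "SELECT id FROM orders")
print(len(catalog.tables["orders_mt"].rows))
```

In this model `create_or_alter` can be run any number of times without failing or losing rows, which is the property declarative tools rely on; whether a real query change can preserve state depends on the evolution rules the FLIP defines.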

Given that the implementation for FLIP-492 [1] is not yet complete (FLINK-36995 [7] is still open), now is a good time to choose (or define) the syntax with the clearest and safest semantics before any code is merged.

In summary, I believe CREATE OR ALTER provides clearer, non-destructive semantics that are more consistent with Flink's existing DDL for Materialized Tables, making it the superior choice for both current functionality and future resilience.

Thanks again for the great discussion points. I look forward to hearing your thoughts and those of the wider community.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-492%3A+Support+Query+Modifications+for+Materialized+Tables
[2] https://docs.snowflake.com/en/sql-reference/sql/create-table#usage-notes
[3] https://www.postgresql.org/docs/current/sql-createview.html
[4] https://docs.oracle.com/cd/E17952_01/mysql-5.7-en/create-view.html
[5] https://learn.microsoft.com/en-us/sql/t-sql/statements/create-view-transact-sql?view=sql-server-ver17#or-alter
[6] https://docs.snowflake.com/en/sql-reference/sql/create-table#label-create-or-alter-table-syntax
[7] https://issues.apache.org/jira/browse/FLINK-36995

Best,
Ramin

On Thu, Sep 4, 2025 at 1:54 PM Ron Liu <[email protected]> wrote:

> Hi, Ramin
>
> In FLIP-492[1], we introduced the `CREATE OR REPLACE MATERIALIZED TABLE` syntax to support modifying materialized tables. Can we extend this syntax to achieve the functionality you need, such as introducing clause parameters to determine whether to replace the entire table or just some of its attributes?
>
> Regarding the `CREATE OR ALTER TABLE` syntax, I couldn't find it in the SQL standard. I'm concerned that this new syntax might confuse users and add new usage costs. What do you think?
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-492%3A+Support+Query+Modifications+for+Materialized+Tables
>
> Best,
> Ron
>
> Ramin Gharib <[email protected]> wrote on Thu, Sep 4, 2025 at 15:04:
>
> > Good morning, Mate,
> >
> > Thanks for the feedback! When I wrote this, the FLIP was not yet closed! I have added the DISTRIBUTED BY/INTO clause to the FLIP.
> >
> > Cheers,
> >
> > Ramin
> >
> > On Wed, Sep 3, 2025 at 7:58 PM Mate Czagany <[email protected]> wrote:
> >
> > > Hi Ramin,
> > >
> > > Thank you for the proposal, I think this new command makes perfect sense and has the potential to improve IaC pipelines.
> > >
> > > It's missing from the SQL syntax, but are there any plans to make this new command also support the DISTRIBUTED BY/INTO functionality of FLIP-542 [1] that was just voted to be implemented yesterday [2]?
> > >
> > > Best Regards
> > > Mate
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables
> > > [2] https://lists.apache.org/thread/40xyjjvklt65tw21o2pmk8r3srprxs5q
> > >
> > > On Tue, Sep 2, 2025 at 3:14 PM Ramin Gharib <[email protected]> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I want to start a discussion on a new Flink Improvement Proposal, FLIP-546: Introduce CREATE OR ALTER for Materialized Tables [1].
> > > >
> > > > The introduction of Materialized Tables in FLIP-435 [2] was a significant step toward simplifying data pipelines. However, as we integrate them into production environments managed by automation, a key challenge arises. Modern data operations rely on declarative tools (like dbt, Terraform, CI/CD pipelines) that require idempotent commands to manage the lifecycle of data assets.
> > > >
> > > > The current CREATE MATERIALIZED TABLE syntax is not idempotent and fails if the table already exists.
> > > > This forces developers and tool creators to implement complex and risky CREATE IF NOT EXISTS or DROP...IF EXISTS logic, which can lead to data loss and complicates the development of reliable data management tools.
> > > >
> > > > This FLIP proposes introducing a CREATE OR ALTER MATERIALIZED TABLE command to provide a native, idempotent way to manage materialized tables, aligning Flink with best practices for declarative, infrastructure-as-code deployments.
> > > >
> > > > [1] https://cwiki.apache.org/confluence/x/0wnXFg
> > > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-546%3A+Introduce+CREATE+OR+ALTER+for+Materialized+Tables>
> > > > [2] https://cwiki.apache.org/confluence/x/HYySEQ
> > > >
> > > > Best Regards,
> > > >
> > > > Ramin Gharib
