Hi Timo,

Sorry for the late response, and thanks for your feedback. Regarding your questions:

> Flink has introduced the concept of Dynamic Tables many years ago. How
> does the term "Dynamic Table" fit into Flink's regular tables and also
> how does it relate to Table API?
>
> I fear that adding the DYNAMIC TABLE keyword could cause confusion for
> users, because a term for regular CREATE TABLE (that can be "kind of
> dynamic" as well and is backed by a changelog) is then missing. Also
> given that we call our connectors for those tables, DynamicTableSource
> and DynamicTableSink.
>
> In general, I find it contradicting that a TABLE can be "paused" or
> "resumed". From an English language perspective, this does sound
> incorrect. In my opinion (without much research yet), a continuous
> updating trigger should rather be modelled as a CREATE MATERIALIZED VIEW
> (which users are familiar with?) or a new concept such as a CREATE TASK
> (that can be paused and resumed?).

1. The current concept [1] actually covers two things: Dynamic Tables and
Continuous Queries. A Dynamic Table is only an abstract logical concept
whose physical form is either a table or a changelog stream. It has to be
combined with a Continuous Query to dynamically update a target table,
much like a database's Materialized View.

We hope to upgrade the Dynamic Table to a real entity that users can
operate, one that combines the logical concepts of Dynamic Table and
Continuous Query. By integrating the table definition with the query, it
can provide functionality similar to Materialized Views and simplify
users' data processing pipelines.

So the object of the suspend operation is the refresh task of the dynamic
table. The command `ALTER DYNAMIC TABLE table_name SUSPEND` is shorthand
for `ALTER DYNAMIC TABLE table_name SUSPEND REFRESH`. If the full form is
clearer, we can change the syntax accordingly.

2. Initially, we also considered Materialized Views, but ultimately
decided against them.
Materialized views are designed to improve query performance for
workloads made up of common, repetitive query patterns. In essence, a
materialized view represents the result of a query and is not intended to
support data modification. For Lakehouse scenarios, where the ability to
delete or update data is crucial (such as compliance with GDPR, FLIP-2),
materialized views fall short.

3. Compared to CREATE (regular) TABLE, CREATE DYNAMIC TABLE not only
defines metadata in the catalog but also automatically starts a data
refresh task based on the query specified at table creation. The refresh
task then applies the data updates automatically, so users can focus on
data dependencies and data generation logic.

4. The new dynamic table does not conflict with the existing
DynamicTableSource and DynamicTableSink interfaces. For developers, the
only new piece to implement is CatalogDynamicTable; the source and sink
implementations do not change.

5. For now, the FLIP does not cover Table API operations on Dynamic
Tables. Once the SQL syntax is finalized, we can discuss this in a
separate FLIP. My rough idea at the moment is that the Table API should
introduce DynamicTable operation interfaces corresponding to the existing
Table interfaces, and that TableEnvironment will provide methods for the
various dynamic table operations. The goal of the new Dynamic Table is to
offer users an experience similar to using a database, which is why we
prioritize the SQL-based approach first.

> How do you envision re-adding the functionality of a statement set, that
> fans out to multiple tables? This is a very important use case for data
> pipelines.

Multi-table output is indeed a very important user scenario. In the
future, we can consider extending the statement set syntax to support
creating multiple dynamic tables.

> Since the early days of Flink SQL, we were discussing `SELECT STREAM *
> FROM T EMIT 5 MINUTES`.
> Your proposal seems to rephrase STREAM and EMIT,
> into other keywords DYNAMIC TABLE and FRESHNESS. But the core
> functionality is still there. I'm wondering if we should widen the scope
> (maybe not part of this FLIP but a new FLIP) to follow the standard more
> closely. Making `SELECT * FROM t` bounded by default and use new syntax
> for the dynamic behavior. Flink 2.0 would be the perfect time for this,
> however, it would require careful discussions. What do you think?

The query part does require a separate FLIP for discussion, as it involves
changes to the default behavior.

[1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables

Best,
Ron

On Wed, Mar 13, 2024 at 15:19, Jing Zhang <beyond1...@gmail.com> wrote:

> Hi, Lincoln & Ron,
>
> Thanks for the proposal.
>
> I agree with the question raised by Timo.
>
> Besides, I have some other questions.
> 1. How to define query of dynamic table?
> Use flink sql or introducing new syntax?
> If use flink sql, how to handle the difference in SQL between streaming and
> batch processing?
> For example, a query including window aggregate based on processing time?
> or a query including global order by?
>
> 2. Whether modify the query of dynamic table is allowed?
> Or we could only refresh a dynamic table based on initial query?
>
> 3. How to use dynamic table?
> The dynamic table seems to be similar with materialized view. Will we do
> something like materialized view rewriting during the optimization?
>
> Best,
> Jing Zhang
>
>
> On Wed, Mar 13, 2024 at 01:24, Timo Walther <twal...@apache.org> wrote:
>
> > Hi Lincoln & Ron,
> >
> > thanks for proposing this FLIP. I think a design similar to what you
> > propose has been in the heads of many people, however, I'm wondering how
> > this will fit into the bigger picture.
> >
> > I haven't deeply reviewed the FLIP yet, but would like to ask some
> > initial questions:
> >
> > Flink has introduced the concept of Dynamic Tables many years ago.
> > How does the term "Dynamic Table" fit into Flink's regular tables and
> > also how does it relate to Table API?
> >
> > I fear that adding the DYNAMIC TABLE keyword could cause confusion for
> > users, because a term for regular CREATE TABLE (that can be "kind of
> > dynamic" as well and is backed by a changelog) is then missing. Also
> > given that we call our connectors for those tables, DynamicTableSource
> > and DynamicTableSink.
> >
> > In general, I find it contradicting that a TABLE can be "paused" or
> > "resumed". From an English language perspective, this does sound
> > incorrect. In my opinion (without much research yet), a continuous
> > updating trigger should rather be modelled as a CREATE MATERIALIZED VIEW
> > (which users are familiar with?) or a new concept such as a CREATE TASK
> > (that can be paused and resumed?).
> >
> > How do you envision re-adding the functionality of a statement set, that
> > fans out to multiple tables? This is a very important use case for data
> > pipelines.
> >
> > Since the early days of Flink SQL, we were discussing `SELECT STREAM *
> > FROM T EMIT 5 MINUTES`. Your proposal seems to rephrase STREAM and EMIT,
> > into other keywords DYNAMIC TABLE and FRESHNESS. But the core
> > functionality is still there. I'm wondering if we should widen the scope
> > (maybe not part of this FLIP but a new FLIP) to follow the standard more
> > closely. Making `SELECT * FROM t` bounded by default and use new syntax
> > for the dynamic behavior. Flink 2.0 would be the perfect time for this,
> > however, it would require careful discussions. What do you think?
> >
> > Regards,
> > Timo
> >
> >
> > On 11.03.24 08:23, Ron liu wrote:
> > > Hi, Dev
> > >
> > > Lincoln Lee and I would like to start a discussion about FLIP-435:
> > > Introduce a New Dynamic Table for Simplifying Data Pipelines.
> > >
> > > This FLIP is designed to simplify the development of data processing
> > > pipelines.
> > > With Dynamic Tables with uniform SQL statements and
> > > freshness, users can define batch and streaming transformations to
> > > data in the same way, accelerate ETL pipeline development, and manage
> > > task scheduling automatically.
> > >
> > > For more details, see FLIP-435 [1]. Looking forward to your feedback.
> > >
> > > [1]
> > >
> > > Best,
> > >
> > > Lincoln & Ron