Hi Dev,

Sorry, my previous statement was not quite accurate. We will hold a vote for the name within this thread.
Best,
Ron

Ron liu <ron9....@gmail.com> wrote on Tue, Apr 9, 2024 at 19:29:
> Hi, Timo
>
> Thanks for your reply.
>
> I agree with you that naming is sometimes the more difficult part. When no one has a clear preference, voting on the name is a good solution, so I'll send a separate email for the vote, clarify the rules for the vote, and then let everyone vote.
>
> One other point to confirm: in your ranking there is an option for Materialized View. Does it stand for the UPDATING Materialized View that you mentioned earlier in the discussion? If we use Materialized View, I think it would need to be extended.
>
> Best,
> Ron
>
> Timo Walther <twal...@apache.org> wrote on Tue, Apr 9, 2024 at 17:20:
>
>> Hi Ron,
>>
>> Yes, naming is hard. But it will have a large impact on trainings, presentations, and the mental model of users. Maybe the easiest is to collect a ranking from everyone with some short justification.
>>
>> My ranking (from good to not so good):
>>
>> 1. Refresh Table -> states what it does
>> 2. Materialized Table -> similar to an SQL materialized view, but a table
>> 3. Live Table -> nice buzzword, but maybe still too close to dynamic tables?
>> 4. Materialized View -> a bit broader than the standard but still very similar
>> 5. Derived Table -> taken by the standard
>>
>> Regards,
>> Timo
>>
>> On 07.04.24 11:34, Ron liu wrote:
>> > Hi, Dev
>> >
>> > This is a summary letter. After several rounds of discussion, there is a strong consensus about the FLIP proposal and the issues it aims to address. The current point of disagreement is the naming of the new concept. I have summarized the candidates as follows:
>> >
>> > 1. Derived Table (inspired by Google Looker)
>> >    - Pros: Google Looker has introduced this concept, which is designed for building Looker's automated modeling, aligning with our purpose for the stream-batch automatic pipeline.
>> >    - Cons: The SQL standard uses the term "derived table" extensively; vendors adopt it for simply referring to a table within a subclause.
>> >
>> > 2. Materialized Table: It means materializing the query result into a table, similar to Db2 MQT (Materialized Query Tables). In addition, Snowflake Dynamic Table's predecessor is also called Materialized Table.
>> >
>> > 3. Updating Table (from Timo)
>> >
>> > 4. Updating Materialized View (from Timo)
>> >
>> > 5. Refresh/Live Table (from Martijn)
>> >
>> > As Martijn said, naming is a headache; looking forward to more valuable input from everyone.
>> >
>> > [1] https://cloud.google.com/looker/docs/derived-tables#persistent_derived_tables
>> > [2] https://www.ibm.com/docs/en/db2/11.5?topic=tables-materialized-query
>> > [3] https://community.denodo.com/docs/html/browse/6.0/vdp/vql/materialized_tables/creating_materialized_tables/creating_materialized_tables
>> >
>> > Best,
>> > Ron
>> >
>> > Ron liu <ron9....@gmail.com> wrote on Sun, Apr 7, 2024 at 15:55:
>> >
>> >> Hi, Lorenzo
>> >>
>> >> Thank you for your insightful input.
>> >>
>> >>>>> I think the 2 above twisted the materialized view concept to more than just an optimization for accessing pre-computed aggregates/filters. I think that concept (at least in my mind) is now more adherent to the semantics of the words themselves ("materialized" and "view") than to its implementations in DBMSs, as just a view on raw data that, hopefully, is constantly updated with fresh results. That's why I understand Timo's et al. objections.
>> >>
>> >> Your understanding of Materialized Views is correct. However, in our scenario, an important feature is the support for Update & Delete operations, which current Materialized Views cannot fulfill. As we discussed with Timo before, if Materialized Views need to support data modifications, it would require extending them with new keywords, such as CREATE xxx (UPDATING) MATERIALIZED VIEW.
>> >>
>> >>>>> Still, I don't understand why we need another type of special table. Could you dive deep into the reasons why not simply add the FRESHNESS parameter to standard tables?
>> >>
>> >> Firstly, I need to emphasize that we cannot achieve the design goal of the FLIP through CREATE TABLE syntax combined with a FRESHNESS parameter. The proposal of this FLIP is to use Dynamic Table + Continuous Query, combined with FRESHNESS, to realize stream-batch unification. However, CREATE TABLE is merely a metadata operation and cannot automatically start a background refresh job. To achieve the design goal of the FLIP with standard tables, we would have to extend the CTAS [1] syntax to introduce the FRESHNESS keyword. We considered this design initially, but it has the following problems:
>> >>
>> >> 1. Using the FRESHNESS keyword to distinguish whether a table created through CTAS is a standard table or a "special" standard table with an ongoing background refresh job is very obscure for users.
>> >> 2. It intrudes on the semantics of the CTAS syntax. Currently, tables created using CTAS only add table metadata to the Catalog and do not record attributes such as the query. There are also no ongoing background refresh jobs, and the data writing operation happens only once, at table creation.
>> >> 3. When the framework performs some kind of ALTER TABLE behavior, it is unclear how that behavior should differ between tables created with FRESHNESS and tables created without it, which will also cause confusion.
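To make the three objections above concrete, here is a hedged sketch of the contrast (all table and column names are made up, and both the FRESHNESS clause and the DYNAMIC TABLE keyword are placeholders; the final spelling is exactly what this thread is debating):

```sql
-- Standard CTAS today: a one-shot operation. Metadata goes to the Catalog,
-- data is written once at creation time, and the query is not remembered.
CREATE TABLE dws_orders AS
SELECT o.order_id, o.amount, u.region
FROM dwd_orders AS o JOIN dim_users AS u ON o.user_id = u.id;

-- Rejected alternative: CTAS plus a FRESHNESS keyword. The same statement
-- would now also record the query and start a background refresh job,
-- which is the obscure dual behavior objected to in points 1-3.
-- CREATE TABLE dws_orders FRESHNESS = INTERVAL '3' MINUTE AS SELECT ...;

-- FLIP direction: a separate first-level concept (placeholder keyword
-- DYNAMIC TABLE; the name is still under vote), so DDL, ALTER behavior,
-- and the background refresh job are unambiguous.
CREATE DYNAMIC TABLE dws_orders
FRESHNESS = INTERVAL '3' MINUTE
AS SELECT o.order_id, o.amount, u.region
FROM dwd_orders AS o JOIN dim_users AS u ON o.user_id = u.id;
```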
>> >> In terms of the design goal of combining Dynamic Table + Continuous Query, the FLIP proposal cannot be realized by only extending the current standard tables, so a new kind of dynamic table needs to be introduced as a first-level concept.
>> >>
>> >> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#as-select_statement
>> >>
>> >> Best,
>> >> Ron
>> >>
>> >> <lorenzo.affe...@ververica.com.invalid> wrote on Wed, Apr 3, 2024 at 22:25:
>> >>
>> >>> Hello everybody!
>> >>> Thanks for the FLIP, it looks amazing (and I think the proof is this deep discussion it is provoking :))
>> >>>
>> >>> I have a couple of comments to add to this:
>> >>>
>> >>> Even though I get the reason why you rejected MATERIALIZED VIEW, I still like it a lot, and I would like to provide pointers on how the materialized view concept has twisted in recent years:
>> >>>
>> >>> • Materialize DB (https://materialize.com/)
>> >>> • The famous talk by Martin Kleppmann, "Turning the Database Inside Out" (https://www.youtube.com/watch?v=fU9hR3kiOK0)
>> >>>
>> >>> I think the 2 above twisted the materialized view concept to more than just an optimization for accessing pre-computed aggregates/filters.
>> >>> I think that concept (at least in my mind) is now more adherent to the semantics of the words themselves ("materialized" and "view") than to its implementations in DBMSs, as just a view on raw data that, hopefully, is constantly updated with fresh results.
>> >>> That's why I understand Timo's et al. objections.
>> >>> Still, I understand there is no need to add confusion :)
>> >>>
>> >>> Still, I don't understand why we need another type of special table.
>> >>> Could you dive deep into the reasons why not simply add the FRESHNESS parameter to standard tables?
>> >>>
>> >>> I would see that as a very seamless implementation of the goal of unifying batch and streaming.
>> >>> If we stick to a unified world, I think that Flink should just provide one type of table that is inherently dynamic.
>> >>> Now, depending on the FRESHNESS objectives / connectors used in WITH, that table can be backed by a stream or batch job, as you explained in your FLIP.
>> >>>
>> >>> Maybe I am totally missing the point :)
>> >>>
>> >>> Thank you in advance,
>> >>> Lorenzo
>> >>>
>> >>> On Apr 3, 2024 at 15:25 +0200, Martijn Visser <martijnvis...@apache.org>, wrote:
>> >>>> Hi all,
>> >>>>
>> >>>> Thanks for the proposal. While the FLIP talks extensively about how Snowflake has Dynamic Tables and Databricks has Delta Live Tables, my understanding is that Databricks has CREATE STREAMING TABLE [1], which relates to this proposal.
>> >>>>
>> >>>> I do have concerns about using CREATE DYNAMIC TABLE, specifically about confusing users who are familiar with Snowflake's approach, where you can't change the content via DML statements, while that is something that would work in this proposal. Naming is hard of course, but I would probably prefer something like CREATE CONTINUOUS TABLE, CREATE REFRESH TABLE or CREATE LIVE TABLE.
>> >>>>
>> >>>> Best regards,
>> >>>>
>> >>>> Martijn
>> >>>>
>> >>>> [1] https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-streaming-table.html
>> >>>>
>> >>>> On Wed, Apr 3, 2024 at 5:19 AM Ron liu <ron9....@gmail.com> wrote:
>> >>>>
>> >>>>> Hi, dev
>> >>>>>
>> >>>>> After an offline discussion with Becket Qin, Lincoln Lee and Jark Wu, we have improved some parts of the FLIP.
>> >>>>>
>> >>>>> 1. Add a Full Refresh Mode section to clarify the semantics of full refresh mode.
>> >>>>> 2.
Add Future Improvement section explaining why query statement >> does >> >>> not >> >>>>> support references to temporary view and possible solutions. >> >>>>> 3. The Future Improvement section explains a possible future >> solution >> >>> for >> >>>>> dynamic table to support the modification of query statements to >> meet >> >>> the >> >>>>> common field-level schema evolution requirements of the lakehouse. >> >>>>> 4. The Refresh section emphasizes that the Refresh command and the >> >>>>> background refresh job can be executed in parallel, with no >> >>> restrictions at >> >>>>> the framework level. >> >>>>> 5. Convert RefreshHandler into a plug-in interface to support >> various >> >>>>> workflow schedulers. >> >>>>> >> >>>>> Best, >> >>>>> Ron >> >>>>> >> >>>>> Ron liu <ron9....@gmail.com> 于2024年4月2日周二 10:28写道: >> >>>>> >> >>>>>>> Hi, Venkata krishnan >> >>>>>>> >> >>>>>>> Thank you for your involvement and suggestions, and hope that the >> >>> design >> >>>>>>> goals of this FLIP will be helpful to your business. >> >>>>>>> >> >>>>>>>>>>>>> 1. In the proposed FLIP, given the example for the >> >>> dynamic table, do >> >>>>>>> the >> >>>>>>> data sources always come from a single lake storage such as >> >>> Paimon or >> >>>>> does >> >>>>>>> the same proposal solve for 2 disparate storage systems like >> >>> Kafka and >> >>>>>>> Iceberg where Kafka events are ETLed to Iceberg similar to Paimon? >> >>>>>>> Basically the lambda architecture that is mentioned in the FLIP >> >>> as well. >> >>>>>>> I'm wondering if it is possible to switch b/w sources based on the >> >>>>>>> execution mode, for eg: if it is backfill operation, switch to a >> >>> data >> >>>>> lake >> >>>>>>> storage system like Iceberg, otherwise an event streaming system >> >>> like >> >>>>>>> Kafka. >> >>>>>>> >> >>>>>>> Dynamic table is a design abstraction at the framework level and >> >>> is not >> >>>>>>> tied to the physical implementation of the connector. 
If a >> >>> connector >> >>>>>>> supports a combination of Kafka and lake storage, this works fine. >> >>>>>>> >> >>>>>>>>>>>>> 2. What happens in the context of a bootstrap (batch) + >> >>> nearline >> >>>>> update >> >>>>>>> (streaming) case that are stateful applications? What I mean by >> >>> that is, >> >>>>>>> will the state from the batch application be transferred to the >> >>> nearline >> >>>>>>> application after the bootstrap execution is complete? >> >>>>>>> >> >>>>>>> I think this is another orthogonal thing, something that FLIP-327 >> >>> tries >> >>>>> to >> >>>>>>> address, not directly related to Dynamic Table. >> >>>>>>> >> >>>>>>> [1] >> >>>>>>> >> >>>>> >> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-327%3A+Support+switching+from+batch+to+stream+mode+to+improve+throughput+when+processing+backlog+data >> >>>>>>> >> >>>>>>> Best, >> >>>>>>> Ron >> >>>>>>> >> >>>>>>> Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2024年3月30日周六 >> >>> 07:06写道: >> >>>>>>> >> >>>>>>>>> Ron and Lincoln, >> >>>>>>>>> >> >>>>>>>>> Great proposal and interesting discussion for adding support >> >>> for dynamic >> >>>>>>>>> tables within Flink. >> >>>>>>>>> >> >>>>>>>>> At LinkedIn, we are also trying to solve compute/storage >> >>> convergence for >> >>>>>>>>> similar problems discussed as part of this FLIP, specifically >> >>> periodic >> >>>>>>>>> backfill, bootstrap + nearline update use cases using single >> >>>>>>>>> implementation >> >>>>>>>>> of business logic (single script). >> >>>>>>>>> >> >>>>>>>>> Few clarifying questions: >> >>>>>>>>> >> >>>>>>>>> 1. In the proposed FLIP, given the example for the dynamic >> >>> table, do the >> >>>>>>>>> data sources always come from a single lake storage such as >> >>> Paimon or >> >>>>> does >> >>>>>>>>> the same proposal solve for 2 disparate storage systems like >> >>> Kafka and >> >>>>>>>>> Iceberg where Kafka events are ETLed to Iceberg similar to >> >>> Paimon? 
>> >>>>>>>>> Basically the lambda architecture that is mentioned in the >> >>> FLIP as well. >> >>>>>>>>> I'm wondering if it is possible to switch b/w sources based on >> >>> the >> >>>>>>>>> execution mode, for eg: if it is backfill operation, switch to >> >>> a data >> >>>>> lake >> >>>>>>>>> storage system like Iceberg, otherwise an event streaming >> >>> system like >> >>>>>>>>> Kafka. >> >>>>>>>>> 2. What happens in the context of a bootstrap (batch) + >> >>> nearline update >> >>>>>>>>> (streaming) case that are stateful applications? What I mean >> >>> by that is, >> >>>>>>>>> will the state from the batch application be transferred to >> >>> the nearline >> >>>>>>>>> application after the bootstrap execution is complete? >> >>>>>>>>> >> >>>>>>>>> Regards >> >>>>>>>>> Venkata krishnan >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> On Mon, Mar 25, 2024 at 8:03 PM Ron liu <ron9....@gmail.com> >> >>> wrote: >> >>>>>>>>> >> >>>>>>>>>>> Hi, Timo >> >>>>>>>>>>> >> >>>>>>>>>>> Thanks for your quick response, and your suggestion. >> >>>>>>>>>>> >> >>>>>>>>>>> Yes, this discussion has turned into confirming whether >> >>> it's a special >> >>>>>>>>>>> table or a special MV. >> >>>>>>>>>>> >> >>>>>>>>>>> 1. The key problem with MVs is that they don't support >> >>> modification, >> >>>>> so >> >>>>>>>>> I >> >>>>>>>>>>> prefer it to be a special table. Although the periodic >> >>> refresh >> >>>>> behavior >> >>>>>>>>> is >> >>>>>>>>>>> more characteristic of an MV, since we are already a >> >>> special table, >> >>>>>>>>>>> supporting periodic refresh behavior is quite natural, >> >>> similar to >> >>>>>>>>> Snowflake >> >>>>>>>>>>> dynamic tables. >> >>>>>>>>>>> >> >>>>>>>>>>> 2. Regarding the keyword UPDATING, since the current >> >>> Regular Table is >> >>>>> a >> >>>>>>>>>>> Dynamic Table, which implies support for updating through >> >>> Continuous >> >>>>>>>>> Query, >> >>>>>>>>>>> I think it is redundant to add the keyword UPDATING. 
In >> >>> addition, >> >>>>>>>>> UPDATING >> >>>>>>>>>>> can not reflect the Continuous Query part, can not express >> >>> the purpose >> >>>>>>>>> we >> >>>>>>>>>>> want to simplify the data pipeline through Dynamic Table + >> >>> Continuous >> >>>>>>>>>>> Query. >> >>>>>>>>>>> >> >>>>>>>>>>> 3. From the perspective of the SQL standard definition, I >> >>> can >> >>>>> understand >> >>>>>>>>>>> your concerns about Derived Table, but is it possible to >> >>> make a slight >> >>>>>>>>>>> adjustment to meet our needs? Additionally, as Lincoln >> >>> mentioned, the >> >>>>>>>>>>> Google Looker platform has introduced Persistent Derived >> >>> Table, and >> >>>>>>>>> there >> >>>>>>>>>>> are precedents in the industry; could Derived Table be a >> >>> candidate? >> >>>>>>>>>>> >> >>>>>>>>>>> Of course, look forward to your better suggestions. >> >>>>>>>>>>> >> >>>>>>>>>>> Best, >> >>>>>>>>>>> Ron >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> Timo Walther <twal...@apache.org> 于2024年3月25日周一 18:49写道: >> >>>>>>>>>>> >> >>>>>>>>>>>>> After thinking about this more, this discussion boils >> >>> down to >> >>>>> whether >> >>>>>>>>>>>>> this is a special table or a special materialized >> >>> view. In both >> >>>>> cases, >> >>>>>>>>>>>>> we would need to add a special keyword: >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Either >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> CREATE UPDATING TABLE >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> or >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> CREATE UPDATING MATERIALIZED VIEW >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> I still feel that the periodic refreshing behavior is >> >>> closer to a >> >>>>> MV. >> >>>>>>>>> If >> >>>>>>>>>>>>> we add a special keyword to MV, the optimizer would >> >>> know that the >> >>>>> data >> >>>>>>>>>>>>> cannot be used for query optimizations. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> I will ask more people for their opinion. 
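Spelled out, the two candidates Timo compares look roughly like this (hypothetical DDL; neither keyword exists in Flink SQL today, the refresh clause is speculative, and the PAUSE/RESUME commands are only the English-language reading mentioned earlier in the thread):

```sql
-- Candidate A: a special table. DML on the materialized data stays allowed,
-- so the optimizer must NOT reuse its data for automatic query rewriting.
CREATE UPDATING TABLE shipments
FRESHNESS = INTERVAL '1' HOUR        -- speculative refresh clause
AS SELECT ship_id, status FROM raw_shipments;

-- Candidate B: a special materialized view. The extra keyword signals that,
-- unlike a standard MV, the data may have been modified after refresh.
CREATE UPDATING MATERIALIZED VIEW shipments_mv
AS SELECT ship_id, status FROM raw_shipments;

-- "You can PAUSE or RESUME the UPDATING" (speculative commands):
ALTER TABLE shipments PAUSE;
ALTER TABLE shipments RESUME;
```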
>> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Regards, >> >>>>>>>>>>>>> Timo >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> On 25.03.24 10:45, Timo Walther wrote: >> >>>>>>>>>>>>>>> Hi Ron and Lincoln, >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> thanks for the quick response and the very >> >>> insightful discussion. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> we might limit future opportunities to >> >>> optimize queries >> >>>>>>>>>>>>>>>>> through automatic materialization rewriting by >> >>> allowing data >> >>>>>>>>>>>>>>>>> modifications, thus losing the potential for >> >>> such >> >>>>> optimizations. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> This argument makes a lot of sense to me. Due to >> >>> the updates, the >> >>>>>>>>>>> system >> >>>>>>>>>>>>>>> is not in full control of the persisted data. >> >>> However, the system >> >>>>> is >> >>>>>>>>>>>>>>> still in full control of the job that powers the >> >>> refresh. So if >> >>>>> the >> >>>>>>>>>>>>>>> system manages all updating pipelines, it could >> >>> still leverage >> >>>>>>>>>>> automatic >> >>>>>>>>>>>>>>> materialization rewriting but without leveraging >> >>> the data at rest >> >>>>>>>>> (only >> >>>>>>>>>>>>>>> the data in flight). >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> we are considering another candidate, Derived >> >>> Table, the term >> >>>>>>>>>>> 'derive' >> >>>>>>>>>>>>>>>>> suggests a query, and 'table' retains >> >>> modifiability. This >> >>>>>>>>> approach >> >>>>>>>>>>>>>>>>> would not disrupt our current concept of a >> >>> dynamic table >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> I did some research on this term. The SQL standard >> >>> uses the term >> >>>>>>>>>>>>>>> "derived table" extensively (defined in section >> >>> 4.17.3). 
Thus, a lot of vendors adopt this for simply referring to a table within a subclause:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> https://dev.mysql.com/doc/refman/8.0/en/derived-tables.html
>> >>>>>>>>>>>>>>> https://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc32300.1600/doc/html/san1390612291252.html
>> >>>>>>>>>>>>>>> https://www.c-sharpcorner.com/article/derived-tables-vs-common-table-expressions/
>> >>>>>>>>>>>>>>> https://stackoverflow.com/questions/26529804/what-are-the-derived-tables-in-my-explain-statement
>> >>>>>>>>>>>>>>> https://www.sqlservercentral.com/articles/sql-derived-tables
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Esp.
the latter example is interesting, SQL Server >> >>> allows things >> >>>>>>>>> like >> >>>>>>>>>>>>>>> this on derived tables: >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> UPDATE T SET Name='Timo' FROM (SELECT * FROM >> >>> Product) AS T >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> SELECT * FROM Product; >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Btw also Snowflake's dynamic table state: >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Because the content of a dynamic table is >> >>> fully determined >> >>>>>>>>>>>>>>>>> by the given query, the content cannot be >> >>> changed by using DML. >> >>>>>>>>>>>>>>>>> You don’t insert, update, or delete the rows >> >>> in a dynamic >> >>>>> table. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> So a new term makes a lot of sense. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> How about using `UPDATING`? >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> CREATE UPDATING TABLE >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> This reflects that modifications can be made and >> >>> from an >> >>>>>>>>>>>>>>> English-language perspective you can PAUSE or >> >>> RESUME the UPDATING. >> >>>>>>>>>>>>>>> Thus, a user can define UPDATING interval and mode? >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Looking forward to your thoughts. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Regards, >> >>>>>>>>>>>>>>> Timo >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> On 25.03.24 07:09, Ron liu wrote: >> >>>>>>>>>>>>>>>>> Hi, Ahmed >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Thanks for your feedback. >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Regarding your question: >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> I want to iterate on Timo's comments >> >>> regarding the confusion >> >>>>>>>>> between >> >>>>>>>>>>>>>>>>> "Dynamic Table" and current Flink "Table". 
>> >>> Should the refactoring >> >>>>>>>>> of >> >>>>>>>>>>> the >> >>>>>>>>>>>>>>>>> system happen in 2.0, should we rename it in >> >>> this Flip ( as the >> >>>>>>>>>>>>>>>>> suggestions >> >>>>>>>>>>>>>>>>> in the thread ) and address the holistic >> >>> changes in a separate >> >>>>> Flip >> >>>>>>>>>>>>>>>>> for 2.0? >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Lincoln proposed a new concept in reply to >> >>> Timo: Derived Table, >> >>>>>>>>> which >> >>>>>>>>>>>>>>>>> is a >> >>>>>>>>>>>>>>>>> combination of Dynamic Table + Continuous >> >>> Query, and the use of >> >>>>>>>>>>> Derived >> >>>>>>>>>>>>>>>>> Table will not conflict with existing concepts, >> >>> what do you >> >>>>> think? >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> I feel confused with how it is further with >> >>> other components, >> >>>>> the >> >>>>>>>>>>>>>>>>> examples provided feel like a standalone ETL >> >>> job, could you >> >>>>>>>>> provide in >> >>>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>> FLIP an example where the table is further used >> >>> in subsequent >> >>>>>>>>> queries >> >>>>>>>>>>>>>>>>> (specially in batch mode). >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Thanks for your suggestion, I added how to use >> >>> Dynamic Table in >> >>>>>>>>> FLIP >> >>>>>>>>>>>>> user >> >>>>>>>>>>>>>>>>> story section, Dynamic Table can be referenced >> >>> by downstream >> >>>>>>>>> Dynamic >> >>>>>>>>>>>>>>>>> Table >> >>>>>>>>>>>>>>>>> and can also support OLAP queries. >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Best, >> >>>>>>>>>>>>>>>>> Ron >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Ron liu <ron9....@gmail.com> 于2024年3月23日周六 >> >>> 10:35写道: >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Hi, Feng >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Thanks for your feedback. 
>> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> Although currently we restrict users from >> >>> modifying the query, >> >>>>> I >> >>>>>>>>>>>>> wonder >> >>>>>>>>>>>>>>>>>>> if >> >>>>>>>>>>>>>>>>>>> we can provide a better way to help users >> >>> rebuild it without >> >>>>>>>>>>> affecting >> >>>>>>>>>>>>>>>>>>> downstream OLAP queries. >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Considering the problem of data consistency, >> >>> so in the first >> >>>>> step >> >>>>>>>>> we >> >>>>>>>>>>>>> are >> >>>>>>>>>>>>>>>>>>> strictly limited in semantics and do not >> >>> support modify the >> >>>>> query. >> >>>>>>>>>>>>>>>>>>> This is >> >>>>>>>>>>>>>>>>>>> really a good problem, one of my ideas is to >> >>> introduce a syntax >> >>>>>>>>>>>>>>>>>>> similar to >> >>>>>>>>>>>>>>>>>>> SWAP [1], which supports exchanging two >> >>> Dynamic Tables. >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> From the documentation, the definitions >> >>> SQL and job >> >>>>> information >> >>>>>>>>> are >> >>>>>>>>>>>>>>>>>>> stored in the Catalog. Does this mean that >> >>> if a system needs to >> >>>>>>>>> adapt >> >>>>>>>>>>>>> to >> >>>>>>>>>>>>>>>>>>> Dynamic Tables, it also needs to store >> >>> Flink's job information >> >>>>> in >> >>>>>>>>> the >> >>>>>>>>>>>>>>>>>>> corresponding system? >> >>>>>>>>>>>>>>>>>>> For example, does MySQL's Catalog need to >> >>> store flink job >> >>>>>>>>> information >> >>>>>>>>>>>>> as >> >>>>>>>>>>>>>>>>>>> well? >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Yes, currently we need to rely on Catalog to >> >>> store refresh job >> >>>>>>>>>>>>>>>>>>> information. >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> Users still need to consider how much >> >>> memory is being used, how >> >>>>>>>>>>> large >> >>>>>>>>>>>>>>>>>>> the concurrency is, which type of state >> >>> backend is being used, >> >>>>> and >> >>>>>>>>>>>>>>>>>>> may need >> >>>>>>>>>>>>>>>>>>> to set TTL expiration. 
>> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Similar to the current practice, job >> >>> parameters can be set via >> >>>>> the >> >>>>>>>>>>>>> Flink >> >>>>>>>>>>>>>>>>>>> conf or SET commands >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> When we submit a refresh command, can we >> >>> help users detect if >> >>>>>>>>> there >> >>>>>>>>>>>>> are >> >>>>>>>>>>>>>>>>>>> any >> >>>>>>>>>>>>>>>>>>> running jobs and automatically stop them >> >>> before executing the >> >>>>>>>>> refresh >> >>>>>>>>>>>>>>>>>>> command? Then wait for it to complete before >> >>> restarting the >> >>>>>>>>>>> background >> >>>>>>>>>>>>>>>>>>> streaming job? >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Purely from a technical implementation point >> >>> of view, your >> >>>>>>>>> proposal >> >>>>>>>>>>> is >> >>>>>>>>>>>>>>>>>>> doable, but it would be more costly. Also I >> >>> think data >> >>>>> consistency >> >>>>>>>>>>>>>>>>>>> itself >> >>>>>>>>>>>>>>>>>>> is the responsibility of the user, similar >> >>> to how Regular Table >> >>>>> is >> >>>>>>>>>>>>>>>>>>> now also >> >>>>>>>>>>>>>>>>>>> the responsibility of the user, so it's >> >>> consistent with its >> >>>>>>>>> behavior >> >>>>>>>>>>>>>>>>>>> and no >> >>>>>>>>>>>>>>>>>>> additional guarantees are made at the engine >> >>> level. >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Best, >> >>>>>>>>>>>>>>>>>>> Ron >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>> Ahmed Hamdy <hamdy10...@gmail.com> >> >>> 于2024年3月22日周五 23:50写道: >> >>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> Hi Ron, >> >>>>>>>>>>>>>>>>>>>>> Sorry for joining the discussion late, >> >>> thanks for the effort. >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> I think the base idea is great, however I >> >>> have a couple of >> >>>>>>>>> comments: >> >>>>>>>>>>>>>>>>>>>>> - I want to iterate on Timo's comments >> >>> regarding the confusion >> >>>>>>>>>>> between >> >>>>>>>>>>>>>>>>>>>>> "Dynamic Table" and current Flink >> >>> "Table". 
Should the >> >>>>>>>>> refactoring of >> >>>>>>>>>>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>> system happen in 2.0, should we rename it >> >>> in this Flip ( as the >> >>>>>>>>>>>>>>>>>>>>> suggestions >> >>>>>>>>>>>>>>>>>>>>> in the thread ) and address the holistic >> >>> changes in a separate >> >>>>>>>>> Flip >> >>>>>>>>>>>>> for >> >>>>>>>>>>>>>>>>>>>>> 2.0? >> >>>>>>>>>>>>>>>>>>>>> - I feel confused with how it is further >> >>> with other components, >> >>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>> examples provided feel like a standalone >> >>> ETL job, could you >> >>>>>>>>> provide >> >>>>>>>>>>>>>>>>>>>>> in the >> >>>>>>>>>>>>>>>>>>>>> FLIP an example where the table is >> >>> further used in subsequent >> >>>>>>>>>>> queries >> >>>>>>>>>>>>>>>>>>>>> (specially in batch mode). >> >>>>>>>>>>>>>>>>>>>>> - I really like the standard of keeping >> >>> the unified batch and >> >>>>>>>>>>>>> streaming >> >>>>>>>>>>>>>>>>>>>>> approach >> >>>>>>>>>>>>>>>>>>>>> Best Regards >> >>>>>>>>>>>>>>>>>>>>> Ahmed Hamdy >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>> On Fri, 22 Mar 2024 at 12:07, Lincoln Lee >> >>> < >> >>>>>>>>> lincoln.8...@gmail.com> >> >>>>>>>>>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> Hi Timo, >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your thoughtful inputs! >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> Yes, expanding the MATERIALIZED >> >>> VIEW(MV) could achieve the >> >>>>> same >> >>>>>>>>>>>>>>>>>>>>> function, >> >>>>>>>>>>>>>>>>>>>>>>> but our primary concern is that by >> >>> using a view, we might >> >>>>> limit >> >>>>>>>>>>>>> future >> >>>>>>>>>>>>>>>>>>>>>>> opportunities >> >>>>>>>>>>>>>>>>>>>>>>> to optimize queries through automatic >> >>> materialization >> >>>>> rewriting >> >>>>>>>>>>> [1], >> >>>>>>>>>>>>>>>>>>>>>>> leveraging >> >>>>>>>>>>>>>>>>>>>>>>> the support for MV by physical >> >>> storage. 
This is because we >> >>>>>>>>> would be >> >>>>>>>>>>>>>>>>>>>>>>> breaking >> >>>>>>>>>>>>>>>>>>>>>>> the intuitive semantics of a >> >>> materialized view (a materialized >> >>>>>>>>> view >> >>>>>>>>>>>>>>>>>>>>>>> represents >> >>>>>>>>>>>>>>>>>>>>>>> the result of a query) by allowing >> >>> data modifications, thus >> >>>>>>>>> losing >> >>>>>>>>>>>>> the >> >>>>>>>>>>>>>>>>>>>>>>> potential >> >>>>>>>>>>>>>>>>>>>>>>> for such optimizations. >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> With these considerations in mind, we >> >>> were inspired by Google >> >>>>>>>>>>>>> Looker's >> >>>>>>>>>>>>>>>>>>>>>>> Persistent >> >>>>>>>>>>>>>>>>>>>>>>> Derived Table [2]. PDT is designed for >> >>> building Looker's >> >>>>>>>>> automated >> >>>>>>>>>>>>>>>>>>>>>>> modeling, >> >>>>>>>>>>>>>>>>>>>>>>> aligning with our purpose for the >> >>> stream-batch automatic >> >>>>>>>>> pipeline. >> >>>>>>>>>>>>>>>>>>>>>>> Therefore, >> >>>>>>>>>>>>>>>>>>>>>>> we are considering another candidate, >> >>> Derived Table, the term >> >>>>>>>>>>>>> 'derive' >> >>>>>>>>>>>>>>>>>>>>>>> suggests a >> >>>>>>>>>>>>>>>>>>>>>>> query, and 'table' retains >> >>> modifiability. This approach would >> >>>>>>>>> not >> >>>>>>>>>>>>>>>>>>>>> disrupt >> >>>>>>>>>>>>>>>>>>>>>>> our current >> >>>>>>>>>>>>>>>>>>>>>>> concept of a dynamic table, preserving >> >>> the future utility of >> >>>>>>>>> MVs. >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> Conceptually, a Derived Table is a >> >>> Dynamic Table + Continuous >> >>>>>>>>>>>>>>>>>>>>>>> Query. By >> >>>>>>>>>>>>>>>>>>>>>>> introducing >> >>>>>>>>>>>>>>>>>>>>>>> a new concept Derived Table for this >> >>> FLIP, this makes all >> >>>>>>>>>>>>>>>>>>>>>>> concepts to >> >>>>>>>>>>>>>>>>>>>>> play >> >>>>>>>>>>>>>>>>>>>>>>> together nicely. >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> What do you think about this? 
[1] https://calcite.apache.org/docs/materialized_views.html
[2] https://cloud.google.com/looker/docs/derived-tables#persistent_derived_tables

Best,
Lincoln Lee


Timo Walther <twal...@apache.org> wrote on Fri, Mar 22, 2024, 17:54:

Hi Ron,

thanks for the detailed answer. Sorry for my late reply, we had a conference that kept me busy.

> In the current concept [1], it actually includes: Dynamic Tables & Continuous Query. Dynamic Table is just an abstract logical concept

This explanation makes sense to me. But the docs also say "A continuous query is evaluated on the dynamic table yielding a new dynamic table." So even our regular CREATE TABLEs are considered dynamic tables.
This can also be seen in the diagram "Dynamic Table -> Continuous Query -> Dynamic Table". Currently, Flink queries can only be executed on Dynamic Tables.

> In essence, a materialized view represents the result of a query.

Isn't that what your proposal does as well?

> the object of the suspend operation is the refresh task of the dynamic table

I understand that Snowflake uses the term [1] to merge their concepts of STREAM, TASK, and TABLE into a single concept. But Flink has no concept of a "refresh task". Also, they had already introduced MATERIALIZED VIEW. Flink is in the convenient position that the concept of materialized views is not taken (reserved maybe for exactly this use case?), and the SQL standard concept could be "slightly adapted" to our needs. Looking at other vendors like Postgres [2], they also use `REFRESH` commands, so why not add additional commands such as DELETE or UPDATE?
Oracle supports the "ON PREBUILT TABLE clause [that] tells the database to use an existing table segment" [3], which comes closer to what we want as well.

> it is not intended to support data modification

This is an argument that I understand. But we as Flink could allow data modifications. This way we are only extending the standard and don't introduce new concepts.

If we can't agree on using the MATERIALIZED VIEW concept, we should fix our syntax in a Flink 2.0 effort: making regular tables bounded and dynamic tables unbounded. We would be closer to the SQL standard with this and pave the way for the future. I would actually support this if all concepts play together nicely.

> In the future, we can consider extending the statement set syntax to support the creation of multiple dynamic tables.

It's good that we called the concept STATEMENT SET. This allows us to define CREATE TABLE within it, even if it might look a bit confusing.
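A sketch of the "extend the standard" direction outlined above, borrowing the `REFRESH` command shape from Postgres. The direct DML on a materialized view in the last statement is exactly the non-standard extension being debated, and all object names are made up:

```sql
-- Standard-style materialized view definition:
CREATE MATERIALIZED VIEW mv_daily_sales AS
SELECT ds, SUM(amount) AS total
FROM orders
GROUP BY ds;

-- Postgres-style full refresh command:
REFRESH MATERIALIZED VIEW mv_daily_sales;

-- The discussed extension: allow direct data modification on the
-- materialized result, which the SQL standard does not foresee:
DELETE FROM mv_daily_sales WHERE ds < '2024-01-01';
```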
Regards,
Timo

[1] https://docs.snowflake.com/en/user-guide/dynamic-tables-about
[2] https://www.postgresql.org/docs/current/sql-creatematerializedview.html
[3] https://oracle-base.com/articles/misc/materialized-views


On 21.03.24 04:14, Feng Jin wrote:

Hi Ron and Lincoln,

Thanks for driving this discussion. I believe it will greatly improve the convenience of managing users' real-time pipelines.

I have some questions.

*Regarding Limitations of Dynamic Table:*

> Does not support modifying the select statement after the dynamic table is created.
Although currently we restrict users from modifying the query, I wonder if we can provide a better way to help users rebuild it without affecting downstream OLAP queries.

*Regarding the management of background jobs:*

1. From the documentation, the definition SQL and job information are stored in the Catalog. Does this mean that if a system needs to adapt to Dynamic Tables, it also needs to store Flink's job information in the corresponding system? For example, does MySQL's Catalog need to store Flink job information as well?

2. Users still need to consider how much memory is being used, how large the concurrency is, which type of state backend is being used, and may need to set a TTL expiration.
*Regarding the Refresh Part:*

> If the refresh mode is continuous and a background job is running, caution should be taken with the refresh command as it can lead to inconsistent data.

When we submit a refresh command, can we help users detect whether there are any running jobs and automatically stop them before executing the refresh command, then wait for it to complete before restarting the background streaming job?

Best,
Feng


On Tue, Mar 19, 2024 at 9:40 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:

Hi Yun,

Thank you very much for your valuable input!

Incremental mode is indeed an attractive idea; we have also discussed this, but in the current design we first provide two refresh modes: CONTINUOUS and FULL.
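Under the design discussed here, the declared freshness would determine which of the two refresh modes applies; roughly as below. This is an illustrative sketch: the syntax, thresholds, and table names are assumptions, not the FLIP's final wording.

```sql
-- A tight freshness target implies CONTINUOUS mode: a streaming job
-- keeps the table continuously up to date.
CREATE DYNAMIC TABLE dt_realtime
FRESHNESS = INTERVAL '30' SECOND
AS SELECT user_id, COUNT(*) AS cnt FROM clicks GROUP BY user_id;

-- A loose freshness target implies FULL mode: a scheduled batch job
-- periodically recomputes the whole table.
CREATE DYNAMIC TABLE dt_hourly
FRESHNESS = INTERVAL '1' HOUR
AS SELECT ds, SUM(amount) AS total FROM orders GROUP BY ds;

-- The manual refresh command Feng's question refers to, plus the
-- suspend/resume of the background refresh task:
ALTER DYNAMIC TABLE dt_hourly REFRESH;
ALTER DYNAMIC TABLE dt_realtime SUSPEND;  -- pauses the refresh task only
ALTER DYNAMIC TABLE dt_realtime RESUME;
```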
Incremental mode can be introduced once the execution layer has the capability.

My answers to the two questions:

1. Yes, cascading is a good question. The current proposal provides a freshness that defines a dynamic table's lag relative to its base tables. If users need to consider the end-to-end freshness of multiple cascaded dynamic tables, they can split it manually for now. Of course, letting multiple cascaded or dependent dynamic tables complete the freshness definition in a simpler way is something that can be extended in the future.

2. Cascading refresh is also a part we focused on in the discussion. In this FLIP, we hope to concentrate as much as possible on the core features (as it already involves a lot of things), so we did not directly introduce related syntax.
However, based on the current design, combined with the catalog and lineage, users can theoretically also accomplish a cascading refresh.

Best,
Lincoln Lee


Yun Tang <myas...@live.com> wrote on Tue, Mar 19, 2024, 13:45:

Hi Lincoln,

Thanks for driving this discussion; I am excited to see this topic being discussed in the Flink community!

From my point of view, beyond the work of unifying streaming and batch in the DataStream API [1], this FLIP could actually let users benefit from one engine to rule batch & streaming.

If we treat this FLIP as an open-source implementation of Snowflake's dynamic tables [2], we still lack an incremental refresh mode to make the ETL near real-time at a much cheaper computation cost.
However, I think this could be done under the current design by introducing another refresh mode in the future, although the extra work for incremental view maintenance would be much larger.

For the FLIP itself, I have several questions below:

1. It seems this FLIP does not consider the lag of refreshes across ETL layers from ODS -> DWD -> APP [3]. We currently only consider the scheduler interval, which means we cannot use lag to automatically schedule the upstream micro-batch jobs to do the work.
2. To support the automagical refreshes, we should consider the lineage in the catalog or somewhere else.
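Yun's first point can be illustrated with a cascaded pipeline: each layer declares freshness only relative to its direct inputs, so end-to-end lag accumulates. The syntax and names below are assumptions for illustration only.

```sql
-- ODS -> DWD -> APP: FRESHNESS is per layer, relative to the table
-- directly upstream.
CREATE DYNAMIC TABLE dwd_orders
FRESHNESS = INTERVAL '5' MINUTE
AS SELECT * FROM ods_orders WHERE amount > 0;

CREATE DYNAMIC TABLE app_order_stats
FRESHNESS = INTERVAL '5' MINUTE
AS SELECT ds, COUNT(*) AS orders_cnt FROM dwd_orders GROUP BY ds;

-- The end-to-end lag of app_order_stats relative to ods_orders can
-- reach roughly the sum (~10 minutes); today the user must split the
-- freshness budget across layers manually.
```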
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-134%3A+Batch+execution+for+the+DataStream+API
[2] https://docs.snowflake.com/en/user-guide/dynamic-tables-about
[3] https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh

Best
Yun Tang

________________________________
From: Lincoln Lee <lincoln.8...@gmail.com>
Sent: Thursday, March 14, 2024 14:35
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for Simplifying Data Pipelines

Hi Jing,
Thanks for your attention to this FLIP! I'll try to answer the following questions.

> 1. How to define the query of a dynamic table? Use Flink SQL or introduce new syntax? If Flink SQL is used, how to handle the difference in SQL between streaming and batch processing? For example, a query including a window aggregate based on processing time, or a query including a global order by?

Similar to `CREATE TABLE AS query`, here the `query` also uses Flink SQL and doesn't introduce a totally new syntax.
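The two cases in the quoted question are concrete examples of queries that are valid in only one execution mode (regular Flink SQL; table and column names are made up):

```sql
-- Streaming only: a window aggregate over a processing-time attribute
-- has no meaningful result in a batch recomputation.
SELECT window_start, COUNT(*) AS cnt
FROM TABLE(
  TUMBLE(TABLE clicks, DESCRIPTOR(proctime), INTERVAL '1' MINUTE))
GROUP BY window_start, window_end;

-- Batch only: a global sort on a non-time column is not supported on
-- an unbounded stream.
SELECT * FROM orders ORDER BY amount DESC;
```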
We will not change the status quo with respect to the differences in the functionality of Flink SQL itself between streaming and batch. For example, the proctime window aggregation on streaming and the global sort on batch that you mentioned do not, in fact, work properly in the other mode, so when a user changes the refresh mode of a dynamic table to one that is not supported, we will throw an exception.

> 2. Is modifying the query of a dynamic table allowed? Or can we only refresh a dynamic table based on the initial query?

Yes, in the current design the query definition of a dynamic table is not allowed to be modified, and you can only refresh the data based on the initial definition.

> 3. How to use a dynamic table? The dynamic table seems to be similar to the materialized view.
> Will we do something like materialized view rewriting during the optimization?

It's true that dynamic tables and materialized views are similar in some ways, but as Ron explains, there are differences. In terms of optimization, automated materialization discovery similar to what Calcite supports is also a potential possibility, perhaps with the addition of automated rewriting in the future.

Best,
Lincoln Lee


Ron liu <ron9....@gmail.com> wrote on Thu, Mar 14, 2024, 14:01:

Hi, Timo

Sorry for the late response, and thanks for your feedback. Regarding your questions:

> Flink has introduced the concept of Dynamic Tables many years ago.
> How does the term "Dynamic Table" fit into Flink's regular tables and also how does it relate to the Table API?

> I fear that adding the DYNAMIC TABLE keyword could cause confusion for users, because a term for the regular CREATE TABLE (that can be "kind of dynamic" as well and is backed by a changelog) is then missing. Also given that we call our connectors for those tables DynamicTableSource and DynamicTableSink.

> In general, I find it contradicting that a TABLE can be "paused" or "resumed". From an English language perspective, this does sound incorrect. In my opinion (without much research yet), a continuous updating trigger should rather be modelled as a CREATE MATERIALIZED VIEW (which users are familiar with?)
> or a new concept such as a CREATE TASK (that can be paused and resumed?).

1. In the current concept [1], it actually includes: Dynamic Tables & Continuous Query. Dynamic Table is just an abstract logical concept, which in its physical form represents either a table or a changelog stream. It requires combination with a Continuous Query to achieve dynamic updates of the target table, similar to a database's Materialized View. We hope to upgrade the Dynamic Table to a real entity that users can operate on, one that combines the logical concepts of Dynamic Tables + Continuous Query. By integrating the definition of tables and queries, it can achieve functions similar to Materialized Views, simplifying users' data processing pipelines. So, the object of the suspend operation is the refresh task of the dynamic table.
The command `ALTER DYNAMIC TABLE table_name SUSPEND` is actually shorthand for `ALTER DYNAMIC TABLE table_name SUSPEND REFRESH` (if writing it in full is clearer, we can also modify it).

2. Initially, we also considered Materialized Views, but ultimately decided against them. Materialized views are designed to enhance query performance for workloads that consist of common, repetitive query patterns. In essence, a materialized view represents the result of a query. However, it is not intended to support data modification. For Lakehouse scenarios, where the ability to delete or update data is crucial (such as compliance with GDPR, FLIP-2), materialized views fall short.

3.
Compared to CREATE (regular) TABLE, CREATE DYNAMIC TABLE not only defines metadata in the catalog but also automatically initiates a data refresh task based on the query specified during table creation, dynamically executing data updates. Users can focus on data dependencies and data generation logic.

4. The new dynamic table does not conflict with the existing DynamicTableSource and DynamicTableSink interfaces. For developers, all that needs to be implemented is the new CatalogDynamicTable, without changing the implementation of source and sink.

5. For now, the FLIP does not consider supporting Table API operations on Dynamic Table. However, once the SQL syntax is finalized, we can discuss this in a separate FLIP.
Currently, I have a rough idea: the Table API should also introduce DynamicTable operation interfaces corresponding to the existing Table interfaces. The TableEnvironment will provide relevant methods to support various dynamic table operations. The goal for the new Dynamic Table is to offer users an experience similar to using a database, which is why we prioritize SQL-based approaches initially.

> How do you envision re-adding the functionality of a statement set, that fans out to multiple tables? This is a very important use case for data pipelines.

Multi-table support is indeed a very important user scenario. In the future, we can consider extending the statement set syntax to support the creation of multiple dynamic tables.
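A possible future shape for that statement-set extension (explicitly not part of this FLIP; the syntax below is speculative):

```sql
-- Speculative: one statement set creating several dynamic tables that
-- share sources, so they could be planned as a single fan-out job.
EXECUTE STATEMENT SET
BEGIN
  CREATE DYNAMIC TABLE dwd_orders
  FRESHNESS = INTERVAL '5' MINUTE
  AS SELECT * FROM ods_orders WHERE amount > 0;

  CREATE DYNAMIC TABLE dwd_refunds
  FRESHNESS = INTERVAL '5' MINUTE
  AS SELECT * FROM ods_orders WHERE amount < 0;
END;
```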
> Since the early days of Flink SQL, we were discussing `SELECT STREAM *
> FROM T EMIT 5 MINUTES`. Your proposal seems to rephrase STREAM and
> EMIT into other keywords, DYNAMIC TABLE and FRESHNESS. But the core
> functionality is still there. I'm wondering if we should widen the
> scope (maybe not as part of this FLIP but a new FLIP) to follow the
> standard more closely: making `SELECT * FROM t` bounded by default and
> using new syntax for the dynamic behavior. Flink 2.0 would be the
> perfect time for this; however, it would require careful discussions.
> What do you think?

The query part indeed requires a separate FLIP for discussion, as it
involves changes to the default behavior.
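For readers who missed the earlier discussion, the two formulations being
compared look roughly like this (both are sketches: the first was never
implemented, and the second uses illustrative names for the FLIP's proposed
direction):

```sql
-- Early Flink SQL idea (discussed years ago, never implemented):
-- the continuous, periodically emitting behavior lives in the query.
SELECT STREAM * FROM T EMIT 5 MINUTES;

-- FLIP-435 direction: the same intent expressed in the table definition,
-- with FRESHNESS controlling how often the result is refreshed.
CREATE DYNAMIC TABLE t_result
FRESHNESS = INTERVAL '5' MINUTE
AS SELECT * FROM T;
```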
[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables

Best,
Ron


Jing Zhang <beyond1...@gmail.com> wrote on Wed, Mar 13, 2024 at 15:19:

Hi, Lincoln & Ron,

Thanks for the proposal.

I agree with the question raised by Timo.

Besides, I have some other questions.

1. How to define the query of a dynamic table? Using Flink SQL, or by
introducing new syntax? If Flink SQL is used, how should the differences
in SQL between streaming and batch processing be handled?
For example, a query including a window aggregate based on processing
time? Or a query including a global ORDER BY?

2. Is modifying the query of a dynamic table allowed, or can a dynamic
table only be refreshed based on its initial query?

3. How is a dynamic table used? A dynamic table seems similar to a
materialized view. Will we do something like materialized-view rewriting
during optimization?

Best,
Jing Zhang


Timo Walther <twal...@apache.org> wrote on Wed, Mar 13, 2024 at 01:24:

Hi Lincoln & Ron,

thanks for proposing this FLIP.
I think a design similar to what you propose has been in the heads of
many people; however, I'm wondering how this will fit into the bigger
picture.

I haven't deeply reviewed the FLIP yet, but would like to ask some
initial questions:

Flink introduced the concept of Dynamic Tables many years ago. How does
the term "Dynamic Table" fit in with Flink's regular tables, and how
does it relate to the Table API?

I fear that adding the DYNAMIC TABLE keyword could cause confusion for
users, because a term for a regular CREATE TABLE (which can be "kind of
dynamic" as well and is backed by a changelog) would then be missing.
Also note that we call our connector interfaces for those tables
DynamicTableSource and DynamicTableSink.
In general, I find it contradictory that a TABLE can be "paused" or
"resumed"; from an English language perspective, this sounds incorrect.
In my opinion (without much research yet), a continuously updating
trigger should rather be modelled as a CREATE MATERIALIZED VIEW (which
users are familiar with?) or a new concept such as a CREATE TASK (that
can be paused and resumed?).

How do you envision re-adding the functionality of a statement set that
fans out to multiple tables? This is a very important use case for data
pipelines.

Since the early days of Flink SQL, we were discussing `SELECT STREAM *
FROM T EMIT 5 MINUTES`. Your proposal seems to rephrase STREAM and EMIT
into other keywords, DYNAMIC TABLE and FRESHNESS.
But the core functionality is still there. I'm wondering if we should
widen the scope (maybe not as part of this FLIP but a new FLIP) to
follow the standard more closely: making `SELECT * FROM t` bounded by
default and using new syntax for the dynamic behavior. Flink 2.0 would
be the perfect time for this; however, it would require careful
discussions. What do you think?

Regards,
Timo


On 11.03.24 08:23, Ron liu wrote:

Hi, Dev

Lincoln Lee and I would like to start a discussion about FLIP-435:
Introduce a New Dynamic Table for Simplifying Data Pipelines.
This FLIP is designed to simplify the development of data processing
pipelines. With Dynamic Tables, with their uniform SQL statements and
freshness, users can define batch and streaming transformations to data
in the same way, accelerate ETL pipeline development, and manage task
scheduling automatically.

For more details, see FLIP-435 [1]. Looking forward to your feedback.
[1]

Best,

Lincoln & Ron