Hi, Dev

My rankings are:

1. Derived Table
2. Materialized Table
3. Live Table
4. Materialized View

Best,
Ron



Ron liu <ron9....@gmail.com> wrote on Tue, Apr 9, 2024 at 20:07:

> Hi, Dev
>
> After several rounds of discussion, there is currently no consensus on the
> name of the new concept. Timo has proposed that we decide the name through
> a vote. This is a good solution when there is no clear preference, so we
> will adopt this approach.
>
> Regarding the name of the new concept, there are currently five candidates:
> 1. Derived Table -> taken by SQL standard
> 2. Materialized Table -> similar to SQL materialized view but a table
> 3. Live Table -> similar to dynamic tables
> 4. Refresh Table -> states what it does
> 5. Materialized View -> needs to extend the standard to support modifying
> data
>
> For the above five candidates, everyone can give their ranking based on
> their preferences. You can choose up to five options or only some of them.
> We will use a scoring rule, where the *first rank gets 5 points, second
> rank gets 4 points, third rank gets 3 points, fourth rank gets 2 points,
> and fifth rank gets 1 point*.
> After the voting closes, I will score all the candidates based on
> everyone's votes, and the candidate with the highest score will be chosen
> as the name for the new concept.
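>
> For example (illustrative numbers only): if a candidate receives two
> first-place votes and one third-place vote, its total score would be
> 5 + 5 + 3 = 13 points.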
>
> The voting will last up to 72 hours and is expected to close this Friday.
> I look forward to everyone voting on the name in this thread. Of course, we
> also welcome new input regarding the name.
>
> Best,
> Ron
>
> Ron liu <ron9....@gmail.com> wrote on Tue, Apr 9, 2024 at 19:49:
>
>> Hi, Dev
>>
>> Sorry, my previous statement was not quite accurate. We will hold a
>> vote for the name within this thread.
>>
>> Best,
>> Ron
>>
>>
>> Ron liu <ron9....@gmail.com> wrote on Tue, Apr 9, 2024 at 19:29:
>>
>>> Hi, Timo
>>>
>>> Thanks for your reply.
>>>
>>> I agree with you that naming can sometimes be difficult. When no one
>>> has a clear preference, voting on the name is a good solution, so I'll send
>>> a separate email for the vote, clarify the voting rules, and then let
>>> everyone vote.
>>>
>>> One other point to confirm: your ranking includes an option for
>>> Materialized View. Does it stand for the UPDATING Materialized View that
>>> you mentioned earlier in the discussion? If we use Materialized View, I
>>> think it needs to be extended.
>>>
>>> Best,
>>> Ron
>>>
>>> Timo Walther <twal...@apache.org> wrote on Tue, Apr 9, 2024 at 17:20:
>>>
>>>> Hi Ron,
>>>>
>>>> yes, naming is hard. But it will have a large impact on trainings,
>>>> presentations, and the mental model of users. Maybe the easiest is to
>>>> collect a ranking from everyone with a short justification:
>>>>
>>>>
>>>> My ranking (from good to not so good):
>>>>
>>>> 1. Refresh Table -> states what it does
>>>> 2. Materialized Table -> similar to SQL materialized view but a table
>>>> 3. Live Table -> nice buzzword, but maybe still too close to dynamic
>>>> tables?
>>>> 4. Materialized View -> a bit broader than standard but still very
>>>> similar
>>>> 5. Derived table -> taken by standard
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>>
>>>>
>>>> On 07.04.24 11:34, Ron liu wrote:
>>>> > Hi, Dev
>>>> >
>>>> > This is a summary letter. After several rounds of discussion, there
>>>> is a
>>>> > strong consensus about the FLIP proposal and the issues it aims to
>>>> address.
>>>> > The current point of disagreement is the naming of the new concept. I
>>>> have
>>>> > summarized the candidates as follows:
>>>> >
>>>> > 1. Derived Table (Inspired by Google Looker)
>>>> >      - Pros: Google Looker has introduced this concept, which is
>>>> > designed for building Looker's automated modeling, aligning with our
>>>> > purpose for the stream-batch automatic pipeline.
>>>> >
>>>> >      - Cons: The SQL standard uses the term "derived table" extensively;
>>>> > vendors adopt it simply to refer to a table within a subclause.
>>>> >
>>>> > 2. Materialized Table: It means materializing the query result into a
>>>> > table, similar to Db2 MQT (Materialized Query Tables). In addition,
>>>> > Snowflake Dynamic Table's predecessor was also called Materialized Table.
>>>> >
>>>> > 3. Updating Table (From Timo)
>>>> >
>>>> > 4. Updating Materialized View (From Timo)
>>>> >
>>>> > 5. Refresh/Live Table (From Martijn)
>>>> >
>>>> > As Martijn said, naming is a headache; I'm looking forward to more
>>>> > valuable input from everyone.
>>>> >
>>>> > [1]
>>>> >
>>>> https://cloud.google.com/looker/docs/derived-tables#persistent_derived_tables
>>>> > [2]
>>>> https://www.ibm.com/docs/en/db2/11.5?topic=tables-materialized-query
>>>> > [3]
>>>> >
>>>> https://community.denodo.com/docs/html/browse/6.0/vdp/vql/materialized_tables/creating_materialized_tables/creating_materialized_tables
>>>> >
>>>> > Best,
>>>> > Ron
>>>> >
>>>> > Ron liu <ron9....@gmail.com> wrote on Sun, Apr 7, 2024 at 15:55:
>>>> >
>>>> >> Hi, Lorenzo
>>>> >>
>>>> >> Thank you for your insightful input.
>>>> >>
>>>> >>>>> I think the two above twisted the materialized view concept into more
>>>> >>>>> than just an optimization for accessing pre-computed aggregates/filters.
>>>> >>>>> I think that concept (at least in my mind) now adheres more to the
>>>> >>>>> semantics of the words themselves ("materialized" and "view") than to
>>>> >>>>> its implementations in DBMSs: just a view on raw data that, hopefully,
>>>> >>>>> is constantly updated with fresh results.
>>>> >>>>> That's why I understand Timo's et al. objections.
>>>> >>
>>>> >> Your understanding of Materialized Views is correct. However, in our
>>>> >> scenario, an important feature is the support for Update & Delete
>>>> >> operations, which current Materialized Views cannot fulfill. As we
>>>> >> discussed with Timo before, if Materialized Views need to support data
>>>> >> modifications, it would require an extension with new keywords, such as
>>>> >> CREATE (UPDATING) MATERIALIZED VIEW.
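>>>> >>
>>>> >> Purely as an illustration (hypothetical syntax that neither Flink nor
>>>> >> the SQL standard supports today; table names are invented):
>>>> >>
>>>> >> -- the UPDATING keyword would signal that rows of the view may be
>>>> >> -- modified by DML, so it must not be used for automatic query rewriting
>>>> >> CREATE UPDATING MATERIALIZED VIEW dwd_users
>>>> >> AS SELECT user_id, user_name, region FROM ods_users;
>>>> >>
>>>> >> -- unlike a standard materialized view, direct modifications would then
>>>> >> -- be allowed
>>>> >> UPDATE dwd_users SET region = 'EU' WHERE user_id = 42;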
>>>> >>
>>>> >>>>> Still, I don't understand why we need another type of special
>>>> table.
>>>> >> Could you dive deep into the reasons why not simply adding the
>>>> FRESHNESS
>>>> >> parameter to standard tables?
>>>> >>
>>>> >> Firstly, I need to emphasize that we cannot achieve the design goal of
>>>> >> the FLIP through the CREATE TABLE syntax combined with a FRESHNESS
>>>> >> parameter. The proposal of this FLIP is to use Dynamic Table +
>>>> >> Continuous Query, combined with FRESHNESS, to realize streaming-batch
>>>> >> unification. However, CREATE TABLE is merely a metadata operation and
>>>> >> cannot automatically start a background refresh job. To achieve the
>>>> >> design goal of the FLIP with standard tables, we would have to extend
>>>> >> the CTAS[1] syntax to introduce the FRESHNESS keyword (a hypothetical
>>>> >> sketch follows the list below). We considered this design initially,
>>>> >> but it has the following problems:
>>>> >>
>>>> >> 1. Distinguishing whether a table created through CTAS is a standard
>>>> >> table or a "special" standard table with an ongoing background refresh
>>>> >> job, purely based on the FRESHNESS keyword, is very obscure for users.
>>>> >> 2. It intrudes on the semantics of the CTAS syntax. Currently, tables
>>>> >> created using CTAS only add table metadata to the Catalog and do not
>>>> >> record attributes such as the query. There are also no ongoing
>>>> >> background refresh jobs, and the data writing operation happens only
>>>> >> once at table creation.
>>>> >> 3. For the framework, when a certain kind of ALTER TABLE operation is
>>>> >> performed, it would have to distinguish between tables created with
>>>> >> FRESHNESS and tables created without it, which will also cause
>>>> >> confusion.
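>>>> >>
>>>> >> For illustration only, a rough sketch of what such a hypothetical
>>>> >> CTAS-with-FRESHNESS statement might look like (the syntax and table
>>>> >> names are invented, this is not proposed Flink syntax):
>>>> >>
>>>> >> -- hypothetical: looks like a one-shot CTAS, but the FRESHNESS clause
>>>> >> -- would silently attach an ongoing background refresh job
>>>> >> CREATE TABLE dwd_orders
>>>> >> FRESHNESS = INTERVAL '3' MINUTE
>>>> >> AS SELECT order_id, SUM(amount) AS amount
>>>> >>    FROM ods_orders GROUP BY order_id;
>>>> >>
>>>> >> Whether dwd_orders then behaves like a plain table or has a refresh job
>>>> >> attached is exactly the ambiguity described in point 1 above.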
>>>> >>
>>>> >> In terms of the design goal of combining Dynamic Table + Continuous
>>>> >> Query, the FLIP proposal cannot be realized by only extending the
>>>> >> current standard tables, so a new kind of dynamic table needs to be
>>>> >> introduced as a first-level concept.
>>>> >>
>>>> >> [1]
>>>> >>
>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#as-select_statement
>>>> >>
>>>> >> Best,
>>>> >> Ron
>>>> >>
>>>> >> <lorenzo.affe...@ververica.com.invalid> wrote on Wed, Apr 3, 2024 at 22:25:
>>>> >>
>>>> >>> Hello everybody!
>>>> >>> Thanks for the FLIP as it looks amazing (and I think the proof is the
>>>> >>> deep discussion it is provoking :))
>>>> >>>
>>>> >>> I have a couple of comments to add to this:
>>>> >>>
>>>> >>> Even though I get the reason why you rejected MATERIALIZED VIEW, I still
>>>> >>> like it a lot, and I would like to provide pointers on how the
>>>> >>> materialized view concept has twisted in recent years:
>>>> >>>
>>>> >>> • Materialize DB (https://materialize.com/)
>>>> >>> • The famous talk by Martin Kleppmann "turning the database inside
>>>> out" (
>>>> >>> https://www.youtube.com/watch?v=fU9hR3kiOK0)
>>>> >>>
>>>> >>> I think the two above twisted the materialized view concept into more
>>>> >>> than just an optimization for accessing pre-computed aggregates/filters.
>>>> >>> I think that concept (at least in my mind) now adheres more to the
>>>> >>> semantics of the words themselves ("materialized" and "view") than to
>>>> >>> its implementations in DBMSs: just a view on raw data that, hopefully,
>>>> >>> is constantly updated with fresh results.
>>>> >>> That's why I understand Timo's et al. objections.
>>>> >>> Still, I understand there is no need to add confusion :)
>>>> >>>
>>>> >>> Still, I don't understand why we need another type of special table.
>>>> >>> Could you dive deep into the reasons why not simply adding the
>>>> FRESHNESS
>>>> >>> parameter to standard tables?
>>>> >>>
>>>> >>> I would see that as a very seamless implementation with the goal of
>>>> >>> unifying batch and streaming.
>>>> >>> If we stick to a unified world, I think that Flink should just provide
>>>> >>> one type of table that is inherently dynamic.
>>>> >>> Now, depending on FRESHNESS objectives / connectors used in WITH,
>>>> that
>>>> >>> table can be backed by a stream or batch job as you explained in
>>>> your FLIP.
>>>> >>>
>>>> >>> Maybe I am totally missing the point :)
>>>> >>>
>>>> >>> Thank you in advance,
>>>> >>> Lorenzo
>>>> >>> On Apr 3, 2024 at 15:25 +0200, Martijn Visser <
>>>> martijnvis...@apache.org>,
>>>> >>> wrote:
>>>> >>>> Hi all,
>>>> >>>>
>>>> >>>> Thanks for the proposal. While the FLIP talks extensively about how
>>>> >>>> Snowflake has Dynamic Tables and Databricks has Delta Live Tables, my
>>>> >>>> understanding is that Databricks has CREATE STREAMING TABLE [1], which
>>>> >>>> relates to this proposal.
>>>> >>>>
>>>> >>>> I do have concerns about using CREATE DYNAMIC TABLE, specifically
>>>> about
>>>> >>>> confusing the users who are familiar with Snowflake's approach
>>>> where you
>>>> >>>> can't change the content via DML statements, while that is
>>>> something
>>>> >>> that
>>>> >>>> would work in this proposal. Naming is hard of course, but I would
>>>> >>> probably
>>>> >>>> prefer something like CREATE CONTINUOUS TABLE, CREATE REFRESH
>>>> TABLE or
>>>> >>>> CREATE LIVE TABLE.
>>>> >>>>
>>>> >>>> Best regards,
>>>> >>>>
>>>> >>>> Martijn
>>>> >>>>
>>>> >>>> [1]
>>>> >>>>
>>>> >>>
>>>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-streaming-table.html
>>>> >>>>
>>>> >>>> On Wed, Apr 3, 2024 at 5:19 AM Ron liu <ron9....@gmail.com> wrote:
>>>> >>>>
>>>> >>>>> Hi, dev
>>>> >>>>>
>>>> >>>>> After offline discussion with Becket Qin, Lincoln Lee and Jark
>>>> Wu, we
>>>> >>> have
>>>> >>>>> improved some parts of the FLIP.
>>>> >>>>>
>>>> >>>>> 1. Add Full Refresh Mode section to clarify the semantics of full
>>>> >>> refresh
>>>> >>>>> mode.
>>>> >>>>> 2. Add Future Improvement section explaining why query statement
>>>> does
>>>> >>> not
>>>> >>>>> support references to temporary view and possible solutions.
>>>> >>>>> 3. The Future Improvement section explains a possible future
>>>> solution
>>>> >>> for
>>>> >>>>> dynamic table to support the modification of query statements to
>>>> meet
>>>> >>> the
>>>> >>>>> common field-level schema evolution requirements of the lakehouse.
>>>> >>>>> 4. The Refresh section emphasizes that the Refresh command and the
>>>> >>>>> background refresh job can be executed in parallel, with no
>>>> >>> restrictions at
>>>> >>>>> the framework level.
>>>> >>>>> 5. Convert RefreshHandler into a plug-in interface to support
>>>> various
>>>> >>>>> workflow schedulers.
>>>> >>>>>
>>>> >>>>> Best,
>>>> >>>>> Ron
>>>> >>>>>
>>>> >>>>> Ron liu <ron9....@gmail.com> wrote on Tue, Apr 2, 2024 at 10:28:
>>>> >>>>>
>>>> >>>>>>> Hi, Venkata krishnan
>>>> >>>>>>>
>>>> >>>>>>> Thank you for your involvement and suggestions, and hope that
>>>> the
>>>> >>> design
>>>> >>>>>>> goals of this FLIP will be helpful to your business.
>>>> >>>>>>>
>>>> >>>>>>>>>>>>> 1. In the proposed FLIP, given the example for the
>>>> >>> dynamic table, do
>>>> >>>>>>> the
>>>> >>>>>>> data sources always come from a single lake storage such as
>>>> >>> Paimon or
>>>> >>>>> does
>>>> >>>>>>> the same proposal solve for 2 disparate storage systems like
>>>> >>> Kafka and
>>>> >>>>>>> Iceberg where Kafka events are ETLed to Iceberg similar to
>>>> Paimon?
>>>> >>>>>>> Basically the lambda architecture that is mentioned in the FLIP
>>>> >>> as well.
>>>> >>>>>>> I'm wondering if it is possible to switch b/w sources based on
>>>> the
>>>> >>>>>>> execution mode, for eg: if it is backfill operation, switch to a
>>>> >>> data
>>>> >>>>> lake
>>>> >>>>>>> storage system like Iceberg, otherwise an event streaming system
>>>> >>> like
>>>> >>>>>>> Kafka.
>>>> >>>>>>>
>>>> >>>>>>> Dynamic table is a design abstraction at the framework level and
>>>> >>>>>>> is not tied to the physical implementation of the connector. If a
>>>> >>>>>>> connector supports a combination of Kafka and lake storage, this
>>>> >>>>>>> works fine.
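>>>> >>>>>>>
>>>> >>>>>>> As a purely illustrative sketch (the exact syntax, table names, and
>>>> >>>>>>> connector option are invented here, not final FLIP syntax):
>>>> >>>>>>>
>>>> >>>>>>> -- one definition; the connector decides how the data is physically
>>>> >>>>>>> -- stored (e.g. a lake table, possibly fronted by a log like Kafka)
>>>> >>>>>>> CREATE DYNAMIC TABLE dwd_orders
>>>> >>>>>>> FRESHNESS = INTERVAL '3' MINUTE
>>>> >>>>>>> WITH ('connector' = 'paimon')
>>>> >>>>>>> AS SELECT o.order_id, o.amount, u.region
>>>> >>>>>>>    FROM ods_orders AS o JOIN dim_users AS u ON o.user_id = u.user_id;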
>>>> >>>>>>>
>>>> >>>>>>>>>>>>> 2. What happens in the context of a bootstrap (batch) +
>>>> >>> nearline
>>>> >>>>> update
>>>> >>>>>>> (streaming) case that are stateful applications? What I mean by
>>>> >>> that is,
>>>> >>>>>>> will the state from the batch application be transferred to the
>>>> >>> nearline
>>>> >>>>>>> application after the bootstrap execution is complete?
>>>> >>>>>>>
>>>> >>>>>>> I think this is another orthogonal thing, something that
>>>> FLIP-327
>>>> >>> tries
>>>> >>>>> to
>>>> >>>>>>> address, not directly related to Dynamic Table.
>>>> >>>>>>>
>>>> >>>>>>> [1]
>>>> >>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-327%3A+Support+switching+from+batch+to+stream+mode+to+improve+throughput+when+processing+backlog+data
>>>> >>>>>>>
>>>> >>>>>>> Best,
>>>> >>>>>>> Ron
>>>> >>>>>>>
>>>> >>>>>>> Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote on Sat, Mar 30, 2024 at 07:06:
>>>> >>>>>>>
>>>> >>>>>>>>> Ron and Lincoln,
>>>> >>>>>>>>>
>>>> >>>>>>>>> Great proposal and interesting discussion for adding support
>>>> >>> for dynamic
>>>> >>>>>>>>> tables within Flink.
>>>> >>>>>>>>>
>>>> >>>>>>>>> At LinkedIn, we are also trying to solve compute/storage
>>>> >>> convergence for
>>>> >>>>>>>>> similar problems discussed as part of this FLIP, specifically
>>>> >>> periodic
>>>> >>>>>>>>> backfill, bootstrap + nearline update use cases using single
>>>> >>>>>>>>> implementation
>>>> >>>>>>>>> of business logic (single script).
>>>> >>>>>>>>>
>>>> >>>>>>>>> Few clarifying questions:
>>>> >>>>>>>>>
>>>> >>>>>>>>> 1. In the proposed FLIP, given the example for the dynamic
>>>> >>> table, do the
>>>> >>>>>>>>> data sources always come from a single lake storage such as
>>>> >>> Paimon or
>>>> >>>>> does
>>>> >>>>>>>>> the same proposal solve for 2 disparate storage systems like
>>>> >>> Kafka and
>>>> >>>>>>>>> Iceberg where Kafka events are ETLed to Iceberg similar to
>>>> >>> Paimon?
>>>> >>>>>>>>> Basically the lambda architecture that is mentioned in the
>>>> >>> FLIP as well.
>>>> >>>>>>>>> I'm wondering if it is possible to switch b/w sources based on
>>>> >>> the
>>>> >>>>>>>>> execution mode, for eg: if it is backfill operation, switch to
>>>> >>> a data
>>>> >>>>> lake
>>>> >>>>>>>>> storage system like Iceberg, otherwise an event streaming
>>>> >>> system like
>>>> >>>>>>>>> Kafka.
>>>> >>>>>>>>> 2. What happens in the context of a bootstrap (batch) +
>>>> >>> nearline update
>>>> >>>>>>>>> (streaming) case that are stateful applications? What I mean
>>>> >>> by that is,
>>>> >>>>>>>>> will the state from the batch application be transferred to
>>>> >>> the nearline
>>>> >>>>>>>>> application after the bootstrap execution is complete?
>>>> >>>>>>>>>
>>>> >>>>>>>>> Regards
>>>> >>>>>>>>> Venkata krishnan
>>>> >>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Mon, Mar 25, 2024 at 8:03 PM Ron liu <ron9....@gmail.com>
>>>> >>> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>>>> Hi, Timo
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Thanks for your quick response, and your suggestion.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Yes, this discussion has turned into confirming whether
>>>> >>> it's a special
>>>> >>>>>>>>>>> table or a special MV.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> 1. The key problem with MVs is that they don't support
>>>> >>> modification,
>>>> >>>>> so
>>>> >>>>>>>>> I
>>>> >>>>>>>>>>> prefer it to be a special table. Although the periodic
>>>> >>> refresh
>>>> >>>>> behavior
>>>> >>>>>>>>> is
>>>> >>>>>>>>>>> more characteristic of an MV, since we are already a
>>>> >>> special table,
>>>> >>>>>>>>>>> supporting periodic refresh behavior is quite natural,
>>>> >>> similar to
>>>> >>>>>>>>> Snowflake
>>>> >>>>>>>>>>> dynamic tables.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> 2. Regarding the keyword UPDATING: since the current Regular
>>>> >>>>>>>>>>> Table is already a Dynamic Table, which implies support for
>>>> >>>>>>>>>>> updating through a Continuous Query, I think it is redundant to
>>>> >>>>>>>>>>> add the keyword UPDATING. In addition, UPDATING cannot reflect
>>>> >>>>>>>>>>> the Continuous Query part and cannot express the purpose we
>>>> >>>>>>>>>>> want: to simplify the data pipeline through Dynamic Table +
>>>> >>>>>>>>>>> Continuous Query.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> 3. From the perspective of the SQL standard definition, I
>>>> >>> can
>>>> >>>>> understand
>>>> >>>>>>>>>>> your concerns about Derived Table, but is it possible to
>>>> >>> make a slight
>>>> >>>>>>>>>>> adjustment to meet our needs? Additionally, as Lincoln
>>>> >>> mentioned, the
>>>> >>>>>>>>>>> Google Looker platform has introduced Persistent Derived
>>>> >>> Table, and
>>>> >>>>>>>>> there
>>>> >>>>>>>>>>> are precedents in the industry; could Derived Table be a
>>>> >>> candidate?
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Of course, look forward to your better suggestions.
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Best,
>>>> >>>>>>>>>>> Ron
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> Timo Walther <twal...@apache.org> wrote on Mon, Mar 25, 2024 at 18:49:
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>>> After thinking about this more, this discussion boils
>>>> >>> down to
>>>> >>>>> whether
>>>> >>>>>>>>>>>>> this is a special table or a special materialized
>>>> >>> view. In both
>>>> >>>>> cases,
>>>> >>>>>>>>>>>>> we would need to add a special keyword:
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Either
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> CREATE UPDATING TABLE
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> or
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> CREATE UPDATING MATERIALIZED VIEW
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> I still feel that the periodic refreshing behavior is
>>>> >>> closer to a
>>>> >>>>> MV.
>>>> >>>>>>>>> If
>>>> >>>>>>>>>>>>> we add a special keyword to MV, the optimizer would
>>>> >>> know that the
>>>> >>>>> data
>>>> >>>>>>>>>>>>> cannot be used for query optimizations.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> I will ask more people for their opinion.
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> Regards,
>>>> >>>>>>>>>>>>> Timo
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>>> On 25.03.24 10:45, Timo Walther wrote:
>>>> >>>>>>>>>>>>>>> Hi Ron and Lincoln,
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> thanks for the quick response and the very
>>>> >>> insightful discussion.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> we might limit future opportunities to
>>>> >>> optimize queries
>>>> >>>>>>>>>>>>>>>>> through automatic materialization rewriting by
>>>> >>> allowing data
>>>> >>>>>>>>>>>>>>>>> modifications, thus losing the potential for
>>>> >>> such
>>>> >>>>> optimizations.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> This argument makes a lot of sense to me. Due to
>>>> >>> the updates, the
>>>> >>>>>>>>>>> system
>>>> >>>>>>>>>>>>>>> is not in full control of the persisted data.
>>>> >>> However, the system
>>>> >>>>> is
>>>> >>>>>>>>>>>>>>> still in full control of the job that powers the
>>>> >>> refresh. So if
>>>> >>>>> the
>>>> >>>>>>>>>>>>>>> system manages all updating pipelines, it could
>>>> >>> still leverage
>>>> >>>>>>>>>>> automatic
>>>> >>>>>>>>>>>>>>> materialization rewriting but without leveraging
>>>> >>> the data at rest
>>>> >>>>>>>>> (only
>>>> >>>>>>>>>>>>>>> the data in flight).
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> we are considering another candidate, Derived
>>>> >>> Table, the term
>>>> >>>>>>>>>>> 'derive'
>>>> >>>>>>>>>>>>>>>>> suggests a query, and 'table' retains
>>>> >>> modifiability. This
>>>> >>>>>>>>> approach
>>>> >>>>>>>>>>>>>>>>> would not disrupt our current concept of a
>>>> >>> dynamic table
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> I did some research on this term. The SQL standard
>>>> >>> uses the term
>>>> >>>>>>>>>>>>>>> "derived table" extensively (defined in section
>>>> >>> 4.17.3). Thus, a
>>>> >>>>>>>>> lot of
>>>> >>>>>>>>>>>>>>> vendors adopt this for simply referring to a table
>>>> >>> within a
>>>> >>>>>>>>> subclause:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://dev.mysql.com/doc/refman/8.0/en/derived-tables.html
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc32300.1600/doc/html/san1390612291252.html
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://www.c-sharpcorner.com/article/derived-tables-vs-common-table-expressions/
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://stackoverflow.com/questions/26529804/what-are-the-derived-tables-in-my-explain-statement
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://www.sqlservercentral.com/articles/sql-derived-tables
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Esp. the latter example is interesting, SQL Server
>>>> >>> allows things
>>>> >>>>>>>>> like
>>>> >>>>>>>>>>>>>>> this on derived tables:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> UPDATE T SET Name='Timo' FROM (SELECT * FROM
>>>> >>> Product) AS T
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> SELECT * FROM Product;
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Btw also Snowflake's dynamic table state:
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Because the content of a dynamic table is
>>>> >>> fully determined
>>>> >>>>>>>>>>>>>>>>> by the given query, the content cannot be
>>>> >>> changed by using DML.
>>>> >>>>>>>>>>>>>>>>> You don’t insert, update, or delete the rows
>>>> >>> in a dynamic
>>>> >>>>> table.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> So a new term makes a lot of sense.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> How about using `UPDATING`?
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> CREATE UPDATING TABLE
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> This reflects that modifications can be made and
>>>> >>> from an
>>>> >>>>>>>>>>>>>>> English-language perspective you can PAUSE or
>>>> >>> RESUME the UPDATING.
>>>> >>>>>>>>>>>>>>> Thus, a user can define UPDATING interval and mode?
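>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Purely as an illustration of the idea (hypothetical syntax,
>>>> >>>>>>>>>>>>>>> invented table names, nothing that exists in Flink today):
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> -- the query keeps the table up to date; the UPDATING
>>>> >>>>>>>>>>>>>>> -- interval/mode controls how often that happens
>>>> >>>>>>>>>>>>>>> CREATE UPDATING TABLE dwd_orders
>>>> >>>>>>>>>>>>>>> UPDATING = INTERVAL '3' MINUTE
>>>> >>>>>>>>>>>>>>> AS SELECT order_id, SUM(amount) AS amount
>>>> >>>>>>>>>>>>>>>    FROM ods_orders GROUP BY order_id;
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> -- pause or resume the UPDATING
>>>> >>>>>>>>>>>>>>> ALTER TABLE dwd_orders PAUSE UPDATING;
>>>> >>>>>>>>>>>>>>> ALTER TABLE dwd_orders RESUME UPDATING;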
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Looking forward to your thoughts.
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> Regards,
>>>> >>>>>>>>>>>>>>> Timo
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>> On 25.03.24 07:09, Ron liu wrote:
>>>> >>>>>>>>>>>>>>>>> Hi, Ahmed
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Thanks for your feedback.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Regarding your question:
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> I want to iterate on Timo's comments
>>>> >>> regarding the confusion
>>>> >>>>>>>>> between
>>>> >>>>>>>>>>>>>>>>> "Dynamic Table" and current Flink "Table".
>>>> >>> Should the refactoring
>>>> >>>>>>>>> of
>>>> >>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>> system happen in 2.0, should we rename it in
>>>> >>> this Flip ( as the
>>>> >>>>>>>>>>>>>>>>> suggestions
>>>> >>>>>>>>>>>>>>>>> in the thread ) and address the holistic
>>>> >>> changes in a separate
>>>> >>>>> Flip
>>>> >>>>>>>>>>>>>>>>> for 2.0?
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Lincoln proposed a new concept in reply to
>>>> >>> Timo: Derived Table,
>>>> >>>>>>>>> which
>>>> >>>>>>>>>>>>>>>>> is a
>>>> >>>>>>>>>>>>>>>>> combination of Dynamic Table + Continuous
>>>> >>> Query, and the use of
>>>> >>>>>>>>>>> Derived
>>>> >>>>>>>>>>>>>>>>> Table will not conflict with existing concepts,
>>>> >>> what do you
>>>> >>>>> think?
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> I feel confused with how it is further with
>>>> >>> other components,
>>>> >>>>> the
>>>> >>>>>>>>>>>>>>>>> examples provided feel like a standalone ETL
>>>> >>> job, could you
>>>> >>>>>>>>> provide in
>>>> >>>>>>>>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>> FLIP an example where the table is further used
>>>> >>> in subsequent
>>>> >>>>>>>>> queries
>>>> >>>>>>>>>>>>>>>>> (specially in batch mode).
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Thanks for your suggestion, I added how to use
>>>> >>> Dynamic Table in
>>>> >>>>>>>>> FLIP
>>>> >>>>>>>>>>>>> user
>>>> >>>>>>>>>>>>>>>>> story section, Dynamic Table can be referenced
>>>> >>> by downstream
>>>> >>>>>>>>> Dynamic
>>>> >>>>>>>>>>>>>>>>> Table
>>>> >>>>>>>>>>>>>>>>> and can also support OLAP queries.
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>> Ron
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>> Ron liu <ron9....@gmail.com> wrote on Sat, Mar 23, 2024 at 10:35:
>>>> >>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Hi, Feng
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Thanks for your feedback.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> Although currently we restrict users from
>>>> >>> modifying the query,
>>>> >>>>> I
>>>> >>>>>>>>>>>>> wonder
>>>> >>>>>>>>>>>>>>>>>>> if
>>>> >>>>>>>>>>>>>>>>>>> we can provide a better way to help users
>>>> >>> rebuild it without
>>>> >>>>>>>>>>> affecting
>>>> >>>>>>>>>>>>>>>>>>> downstream OLAP queries.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Considering the problem of data consistency,
>>>> >>> so in the first
>>>> >>>>> step
>>>> >>>>>>>>> we
>>>> >>>>>>>>>>>>> are
>>>> >>>>>>>>>>>>>>>>>>> strictly limited in semantics and do not
>>>> >>> support modify the
>>>> >>>>> query.
>>>> >>>>>>>>>>>>>>>>>>> This is
>>>> >>>>>>>>>>>>>>>>>>> really a good problem, one of my ideas is to
>>>> >>> introduce a syntax
>>>> >>>>>>>>>>>>>>>>>>> similar to
>>>> >>>>>>>>>>>>>>>>>>> SWAP [1], which supports exchanging two
>>>> >>> Dynamic Tables.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>  From the documentation, the definitions
>>>> >>> SQL and job
>>>> >>>>> information
>>>> >>>>>>>>> are
>>>> >>>>>>>>>>>>>>>>>>> stored in the Catalog. Does this mean that
>>>> >>> if a system needs to
>>>> >>>>>>>>> adapt
>>>> >>>>>>>>>>>>> to
>>>> >>>>>>>>>>>>>>>>>>> Dynamic Tables, it also needs to store
>>>> >>> Flink's job information
>>>> >>>>> in
>>>> >>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>> corresponding system?
>>>> >>>>>>>>>>>>>>>>>>> For example, does MySQL's Catalog need to
>>>> >>> store flink job
>>>> >>>>>>>>> information
>>>> >>>>>>>>>>>>> as
>>>> >>>>>>>>>>>>>>>>>>> well?
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Yes, currently we need to rely on Catalog to
>>>> >>> store refresh job
>>>> >>>>>>>>>>>>>>>>>>> information.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> Users still need to consider how much
>>>> >>> memory is being used, how
>>>> >>>>>>>>>>> large
>>>> >>>>>>>>>>>>>>>>>>> the concurrency is, which type of state
>>>> >>> backend is being used,
>>>> >>>>> and
>>>> >>>>>>>>>>>>>>>>>>> may need
>>>> >>>>>>>>>>>>>>>>>>> to set TTL expiration.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Similar to the current practice, job
>>>> >>> parameters can be set via
>>>> >>>>> the
>>>> >>>>>>>>>>>>> Flink
>>>> >>>>>>>>>>>>>>>>>>> conf or SET commands
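>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> For example (just an illustration of the usual knobs):
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> SET 'parallelism.default' = '4';
>>>> >>>>>>>>>>>>>>>>>>> SET 'state.backend' = 'rocksdb';
>>>> >>>>>>>>>>>>>>>>>>> SET 'table.exec.state.ttl' = '1 h';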
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> When we submit a refresh command, can we
>>>> >>> help users detect if
>>>> >>>>>>>>> there
>>>> >>>>>>>>>>>>> are
>>>> >>>>>>>>>>>>>>>>>>> any
>>>> >>>>>>>>>>>>>>>>>>> running jobs and automatically stop them
>>>> >>> before executing the
>>>> >>>>>>>>> refresh
>>>> >>>>>>>>>>>>>>>>>>> command? Then wait for it to complete before
>>>> >>> restarting the
>>>> >>>>>>>>>>> background
>>>> >>>>>>>>>>>>>>>>>>> streaming job?
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Purely from a technical implementation point
>>>> >>> of view, your
>>>> >>>>>>>>> proposal
>>>> >>>>>>>>>>> is
>>>> >>>>>>>>>>>>>>>>>>> doable, but it would be more costly. Also I
>>>> >>> think data
>>>> >>>>> consistency
>>>> >>>>>>>>>>>>>>>>>>> itself
>>>> >>>>>>>>>>>>>>>>>>> is the responsibility of the user, similar
>>>> >>> to how Regular Table
>>>> >>>>> is
>>>> >>>>>>>>>>>>>>>>>>> now also
>>>> >>>>>>>>>>>>>>>>>>> the responsibility of the user, so it's
>>>> >>> consistent with its
>>>> >>>>>>>>> behavior
>>>> >>>>>>>>>>>>>>>>>>> and no
>>>> >>>>>>>>>>>>>>>>>>> additional guarantees are made at the engine
>>>> >>> level.
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>>>> Ron
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>> Ahmed Hamdy <hamdy10...@gmail.com> wrote on Fri, Mar 22, 2024 at 23:50:
>>>> >>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> Hi Ron,
>>>> >>>>>>>>>>>>>>>>>>>>> Sorry for joining the discussion late,
>>>> >>> thanks for the effort.
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> I think the base idea is great, however I
>>>> >>> have a couple of
>>>> >>>>>>>>> comments:
>>>> >>>>>>>>>>>>>>>>>>>>> - I want to iterate on Timo's comments
>>>> >>> regarding the confusion
>>>> >>>>>>>>>>> between
>>>> >>>>>>>>>>>>>>>>>>>>> "Dynamic Table" and current Flink
>>>> >>> "Table". Should the
>>>> >>>>>>>>> refactoring of
>>>> >>>>>>>>>>>>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>> system happen in 2.0, should we rename it
>>>> >>> in this Flip ( as the
>>>> >>>>>>>>>>>>>>>>>>>>> suggestions
>>>> >>>>>>>>>>>>>>>>>>>>> in the thread ) and address the holistic
>>>> >>> changes in a separate
>>>> >>>>>>>>> Flip
>>>> >>>>>>>>>>>>> for
>>>> >>>>>>>>>>>>>>>>>>>>> 2.0?
>>>> >>>>>>>>>>>>>>>>>>>>> - I feel confused with how it is further
>>>> >>> with other components,
>>>> >>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>> examples provided feel like a standalone
>>>> >>> ETL job, could you
>>>> >>>>>>>>> provide
>>>> >>>>>>>>>>>>>>>>>>>>> in the
>>>> >>>>>>>>>>>>>>>>>>>>> FLIP an example where the table is
>>>> >>> further used in subsequent
>>>> >>>>>>>>>>> queries
>>>> >>>>>>>>>>>>>>>>>>>>> (specially in batch mode).
>>>> >>>>>>>>>>>>>>>>>>>>> - I really like the standard of keeping
>>>> >>> the unified batch and
>>>> >>>>>>>>>>>>> streaming
>>>> >>>>>>>>>>>>>>>>>>>>> approach
>>>> >>>>>>>>>>>>>>>>>>>>> Best Regards
>>>> >>>>>>>>>>>>>>>>>>>>> Ahmed Hamdy
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>> On Fri, 22 Mar 2024 at 12:07, Lincoln Lee
>>>> >>> <
>>>> >>>>>>>>> lincoln.8...@gmail.com>
>>>> >>>>>>>>>>>>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> Hi Timo,
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your thoughtful inputs!
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> Yes, expanding the MATERIALIZED
>>>> >>> VIEW(MV) could achieve the
>>>> >>>>> same
>>>> >>>>>>>>>>>>>>>>>>>>> function,
>>>> >>>>>>>>>>>>>>>>>>>>>>> but our primary concern is that by
>>>> >>> using a view, we might
>>>> >>>>> limit
>>>> >>>>>>>>>>>>> future
>>>> >>>>>>>>>>>>>>>>>>>>>>> opportunities
>>>> >>>>>>>>>>>>>>>>>>>>>>> to optimize queries through automatic
>>>> >>> materialization
>>>> >>>>> rewriting
>>>> >>>>>>>>>>> [1],
>>>> >>>>>>>>>>>>>>>>>>>>>>> leveraging
>>>> >>>>>>>>>>>>>>>>>>>>>>> the support for MV by physical
>>>> >>> storage. This is because we
>>>> >>>>>>>>> would be
>>>> >>>>>>>>>>>>>>>>>>>>>>> breaking
>>>> >>>>>>>>>>>>>>>>>>>>>>> the intuitive semantics of a
>>>> >>> materialized view (a materialized
>>>> >>>>>>>>> view
>>>> >>>>>>>>>>>>>>>>>>>>>>> represents
>>>> >>>>>>>>>>>>>>>>>>>>>>> the result of a query) by allowing
>>>> >>> data modifications, thus
>>>> >>>>>>>>> losing
>>>> >>>>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>>>> potential
>>>> >>>>>>>>>>>>>>>>>>>>>>> for such optimizations.
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> With these considerations in mind, we
>>>> >>> were inspired by Google
>>>> >>>>>>>>>>>>> Looker's
>>>> >>>>>>>>>>>>>>>>>>>>>>> Persistent
>>>> >>>>>>>>>>>>>>>>>>>>>>> Derived Table [2]. PDT is designed for
>>>> >>> building Looker's
>>>> >>>>>>>>> automated
>>>> >>>>>>>>>>>>>>>>>>>>>>> modeling,
>>>> >>>>>>>>>>>>>>>>>>>>>>> aligning with our purpose for the
>>>> >>> stream-batch automatic
>>>> >>>>>>>>> pipeline.
>>>> >>>>>>>>>>>>>>>>>>>>>>> Therefore,
>>>> >>>>>>>>>>>>>>>>>>>>>>> we are considering another candidate,
>>>> >>> Derived Table, the term
>>>> >>>>>>>>>>>>> 'derive'
>>>> >>>>>>>>>>>>>>>>>>>>>>> suggests a
>>>> >>>>>>>>>>>>>>>>>>>>>>> query, and 'table' retains
>>>> >>> modifiability. This approach would
>>>> >>>>>>>>> not
>>>> >>>>>>>>>>>>>>>>>>>>> disrupt
>>>> >>>>>>>>>>>>>>>>>>>>>>> our current
>>>> >>>>>>>>>>>>>>>>>>>>>>> concept of a dynamic table, preserving
>>>> >>> the future utility of
>>>> >>>>>>>>> MVs.
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> Conceptually, a Derived Table is a
>>>> >>> Dynamic Table + Continuous
>>>> >>>>>>>>>>>>>>>>>>>>>>> Query. By
>>>> >>>>>>>>>>>>>>>>>>>>>>> introducing
>>>> >>>>>>>>>>>>>>>>>>>>>>> a new concept Derived Table for this
>>>> >>> FLIP, this makes all
>>>> >>>>>>>>>>>>>>>>>>>>>>> concepts to
>>>> >>>>>>>>>>>>>>>>>>>>> play
>>>> >>>>>>>>>>>>>>>>>>>>>>> together nicely.
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> What do you think about this?
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> [1]
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://calcite.apache.org/docs/materialized_views.html
>>>> >>>>>>>>>>>>>>>>>>>>>>> [2]
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://cloud.google.com/looker/docs/derived-tables#persistent_derived_tables
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>> Timo Walther <twal...@apache.org> wrote on Fri, Mar 22, 2024 at 17:54:
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi Ron,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> thanks for the detailed answer.
>>>> >>> Sorry, for my late reply, we
>>>> >>>>>>>>> had a
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> conference that kept me busy.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the current concept[1], it
>>>> >>> actually includes: Dynamic
>>>> >>>>>>>>>>> Tables
>>>> >>>>>>>>>>>>> &
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> & Continuous Query. Dynamic
>>>> >>> Table is just an abstract
>>>> >>>>>>>>> logical
>>>> >>>>>>>>>>>>>>>>>>>>> concept
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> This explanation makes sense to me.
>>>> >>> But the docs also say "A
>>>> >>>>>>>>>>>>>>>>>>>>> continuous
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> query is evaluated on the dynamic
>>>> >>> table yielding a new
>>>> >>>>> dynamic
>>>> >>>>>>>>>>>>>>>>>>>>> table.".
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> So even our regular CREATE TABLEs
>>>> >>> are considered dynamic
>>>> >>>>>>>>> tables.
>>>> >>>>>>>>>>>>> This
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> can also be seen in the diagram
>>>> >>> "Dynamic Table -> Continuous
>>>> >>>>>>>>> Query
>>>> >>>>>>>>>>>>> ->
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Table". Currently, Flink
>>>> >>> queries can only be executed
>>>> >>>>>>>>> on
>>>> >>>>>>>>>>>>>>>>>>>>> Dynamic
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Tables.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In essence, a materialized view
>>>> >>> represents the result of
>>>> >>>>> a
>>>> >>>>>>>>>>>>> query.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Isn't that what your proposal does
>>>> >>> as well?
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the object of the suspend
>>>> >>> operation is the refresh task
>>>> >>>>> of
>>>> >>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> dynamic table
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I understand that Snowflake uses
>>>> >>> the term [1] to merge their
>>>> >>>>>>>>>>>>> concepts
>>>> >>>>>>>>>>>>>>>>>>>>> of
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> STREAM, TASK, and TABLE into one
>>>> >>> piece of concept. But Flink
>>>> >>>>>>>>> has
>>>> >>>>>>>>>>> no
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> concept of a "refresh task". Also,
>>>> >>> they already introduced
>>>> >>>>>>>>>>>>>>>>>>>>> MATERIALIZED
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> VIEW. Flink is in the convenient
>>>> >>> position that the concept of
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> materialized views is not taken
>>>> >>> (reserved maybe for exactly
>>>> >>>>>>>>> this
>>>> >>>>>>>>>>> use
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> case?). And SQL standard concept
>>>> >>> could be "slightly adapted"
>>>> >>>>> to
>>>> >>>>>>>>>>> our
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> needs. Looking at other vendors
>>>> >>> like Postgres[2], they also
>>>> >>>>> use
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> `REFRESH` commands so why not
>>>> >>> adding additional commands such
>>>> >>>>>>>>> as
>>>> >>>>>>>>>>>>>>>>>>>>> DELETE
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> or UPDATE. Oracle supports "ON
>>>> >>> PREBUILT TABLE clause tells
>>>> >>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>> database
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> to use an existing table
>>>> >>> segment"[3] which comes closer to
>>>> >>>>>>>>> what we
>>>> >>>>>>>>>>>>>>>>>>>>> want
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> as well.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> it is not intended to support
>>>> >>> data modification
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> This is an argument that I
>>>> >>> understand. But we as Flink could
>>>> >>>>>>>>> allow
>>>> >>>>>>>>>>>>>>>>>>>>> data
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> modifications. This way we are only
>>>> >>> extending the standard
>>>> >>>>> and
>>>> >>>>>>>>>>> don't
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> introduce new concepts.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> If we can't agree on using
>>>> >>> MATERIALIZED VIEW concept. We
>>>> >>>>> should
>>>> >>>>>>>>>>> fix
>>>> >>>>>>>>>>>>>>>>>>>>> our
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> syntax in a Flink 2.0 effort.
>>>> >>> Making regular tables bounded
>>>> >>>>> and
>>>> >>>>>>>>>>>>>>>>>>>>> dynamic
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> tables unbounded. We would be
>>>> >>> closer to the SQL standard with
>>>> >>>>>>>>> this
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> and
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> pave the way for the future. I
>>>> >>> would actually support this if
>>>> >>>>>>>>> all
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> concepts play together nicely.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the future, we can consider
>>>> >>> extending the statement
>>>> >>>>> set
>>>> >>>>>>>>>>>>> syntax
>>>> >>>>>>>>>>>>>>>>>>>>> to
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> support the creation of multiple
>>>> >>> dynamic tables.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> It's good that we called the
>>>> >>> concept STATEMENT SET. This
>>>> >>>>>>>>> allows us
>>>> >>>>>>>>>>>>> to
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> defined CREATE TABLE within. Even
>>>> >>> if it might look a bit
>>>> >>>>>>>>>>> confusing.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Timo
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://docs.snowflake.com/en/user-guide/dynamic-tables-about
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://www.postgresql.org/docs/current/sql-creatematerializedview.html
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> [3]
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://oracle-base.com/articles/misc/materialized-views
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On 21.03.24 04:14, Feng Jin wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Ron and Lincoln
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this
>>>> >>> discussion. I believe it will
>>>> >>>>> greatly
>>>> >>>>>>>>>>>>>>>>>>>>> improve
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> convenience of managing user
>>>> >>> real-time pipelines.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I have some questions.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding Limitations of
>>>> >>> Dynamic Table:*
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does not support modifying
>>>> >>> the select statement after the
>>>> >>>>>>>>>>> dynamic
>>>> >>>>>>>>>>>>>>>>>>>>>>> table
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> is created.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Although currently we restrict
>>>> >>> users from modifying the
>>>> >>>>>>>>> query, I
>>>> >>>>>>>>>>>>>>>>>>>>> wonder
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> if
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> we can provide a better way to
>>>> >>> help users rebuild it without
>>>> >>>>>>>>>>>>>>>>>>>>> affecting
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> downstream OLAP queries.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding the management of
>>>> >>> background jobs:*
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 1. From the documentation, the
>>>> >>> definitions SQL and job
>>>> >>>>>>>>>>> information
>>>> >>>>>>>>>>>>>>>>>>>>> are
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> stored in the Catalog. Does this
>>>> >>> mean that if a system needs
>>>> >>>>>>>>> to
>>>> >>>>>>>>>>>>>>>>>>>>> adapt
>>>> >>>>>>>>>>>>>>>>>>>>>>> to
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Tables, it also needs to
>>>> >>> store Flink's job
>>>> >>>>>>>>> information in
>>>> >>>>>>>>>>>>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> corresponding system?
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> For example, does MySQL's
>>>> >>> Catalog need to store flink job
>>>> >>>>>>>>>>>>>>>>>>>>> information
>>>> >>>>>>>>>>>>>>>>>>>>>>> as
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> well?
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Users still need to consider
>>>> >>> how much memory is being
>>>> >>>>> used,
>>>> >>>>>>>>>>> how
>>>> >>>>>>>>>>>>>>>>>>>>>>> large
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the concurrency is, which type
>>>> >>> of state backend is being
>>>> >>>>> used,
>>>> >>>>>>>>>>> and
>>>> >>>>>>>>>>>>>>>>>>>>> may
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> need
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> to set TTL expiration.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding the Refresh Part:*
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> If the refresh mode is
>>>> >>> continuous and a background job is
>>>> >>>>>>>>>>> running,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> caution should be taken with the
>>>> >>> refresh command as it can
>>>> >>>>>>>>> lead
>>>> >>>>>>>>>>> to
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> inconsistent data.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> When we submit a refresh
>>>> >>> command, can we help users detect
>>>> >>>>> if
>>>> >>>>>>>>>>> there
>>>> >>>>>>>>>>>>>>>>>>>>> are
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> any
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> running jobs and automatically
>>>> >>> stop them before executing
>>>> >>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>> refresh
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> command? Then wait for it to
>>>> >>> complete before restarting the
>>>> >>>>>>>>>>>>>>>>>>>>> background
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> streaming job?
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Feng
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Mar 19, 2024 at 9:40 PM
>>>> >>> Lincoln Lee <
>>>> >>>>>>>>>>>>> lincoln.8...@gmail.com
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Yun,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your
>>>> >>> valuable input!
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Incremental mode is indeed an
>>>> >>> attractive idea, we have also
>>>> >>>>>>>>>>>>>>>>>>>>> discussed
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> this, but in the current
>>>> >>> design,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> we first provided two refresh
>>>> >>> modes: CONTINUOUS and
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> FULL. Incremental mode can be
>>>> >>> introduced
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> once the execution layer has
>>>> >>> the capability.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> My answer for the two
>>>> >>> questions:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, cascading is a good
>>>> >>> question. Current proposal
>>>> >>>>>>>>> provides a
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> freshness that defines a
>>>> >>> dynamic
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> table relative to the base
>>>> >>> table’s lag. If users need to
>>>> >>>>>>>>>>> consider
>>>> >>>>>>>>>>>>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> end-to-end freshness of
>>>> >>> multiple
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cascaded dynamic tables, he
>>>> >>> can manually split them for
>>>> >>>>> now.
>>>> >>>>>>>>> Of
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> course, how to let multiple
>>>> >>> cascaded
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> or dependent dynamic tables
>>>> >>> complete the freshness
>>>> >>>>>>>>> definition
>>>> >>>>>>>>>>>>> in
>>>> >>>>>>>>>>>>>>>>>>>>> a
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> simpler way, I think it can be
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> extended in the future.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cascading refresh is also a
>>>> >>> part we focus on discussing. In
>>>> >>>>>>>>> this
>>>> >>>>>>>>>>>>>>>>>>>>> flip,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> we hope to focus as much as
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> possible on the core features
>>>> >>> (as it already involves a lot
>>>> >>>>>>>>>>>>>>>>>>>>> things),
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> so we did not directly
>>>> >>> introduce related
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> syntax. However, based on the
>>>> >>> current design, combined
>>>> >>>>>>>>> with
>>>> >>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> catalog and lineage,
>>>> >>> theoretically,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> users can also finish the
>>>> >>> cascading refresh.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yun Tang <myas...@live.com> wrote on Tue, Mar 19, 2024 at 13:45:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Lincoln,
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this
>>>> >>> discussion, and I am so excited to
>>>> >>>>>>>>> see
>>>> >>>>>>>>>>>>>>>>>>>>> this
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> topic
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> being discussed in the
>>>> >>> Flink community!
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>  From my point of view,
>>>> >>> instead of the work of unifying
>>>> >>>>>>>>>>>>> streaming
>>>> >>>>>>>>>>>>>>>>>>>>> and
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> batch
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in DataStream API [1],
>>>> >>> this FLIP actually could make users
>>>> >>>>>>>>>>>>> benefit
>>>> >>>>>>>>>>>>>>>>>>>>>>> from
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> one
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> engine to rule batch &
>>>> >>> streaming.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we treat this FLIP as
>>>> >>> an open-source implementation of
>>>> >>>>>>>>>>>>>>>>>>>>> Snowflake's
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic tables [2], we
>>>> >>> still lack an incremental refresh
>>>> >>>>>>>>> mode
>>>> >>>>>>>>>>> to
>>>> >>>>>>>>>>>>>>>>>>>>> make
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ETL near real-time with a
>>>> >>> much cheaper computation cost.
>>>> >>>>>>>>>>> However,
>>>> >>>>>>>>>>>>>>>>>>>>> I
>>>> >>>>>>>>>>>>>>>>>>>>>>>>> think
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> this could be done under
>>>> >>> the current design by introducing
>>>> >>>>>>>>>>>>> another
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> refresh
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> mode in the future.
>>>> >>> Although the extra work of incremental
>>>> >>>>>>>>> view
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> maintenance
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would be much larger.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the FLIP itself, I
>>>> >>> have several questions below:
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. It seems this FLIP does
>>>> >>> not consider the lag of
>>>> >>>>> refreshes
>>>> >>>>>>>>>>>>>>>>>>>>> across
>>>> >>>>>>>>>>>>>>>>>>>>>>> ETL
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> layers from ODS ---> DWD
>>>> >>> ---> APP [3]. We currently only
>>>> >>>>>>>>>>> consider
>>>> >>>>>>>>>>>>>>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scheduler interval, which
>>>> >>> means we cannot use lag to
>>>> >>>>>>>>>>>>> automatically
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> schedule
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the upfront micro-batch
>>>> >>> jobs to do the work.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. To support the
>>>> >>> automagical refreshes, we should
>>>> >>>>> consider
>>>> >>>>>>>>> the
>>>> >>>>>>>>>>>>>>>>>>>>>>> lineage
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> in
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the catalog or somewhere
>>>> >>> else.
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1]
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-134%3A+Batch+execution+for+the+DataStream+API
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2]
>>>> >>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://docs.snowflake.com/en/user-guide/dynamic-tables-about
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3]
>>>> >>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>
>>>> >>>>>
>>>> >>>
>>>> https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yun Tang
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> >>> ________________________________
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Lincoln Lee <
>>>> >>> lincoln.8...@gmail.com>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, March 14,
>>>> >>> 2024 14:35
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@flink.apache.org <
>>>> >>> dev@flink.apache.org>
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSS]
>>>> >>> FLIP-435: Introduce a New Dynamic
>>>> >>>>>>>>> Table
>>>> >>>>>>>>>>>>> for
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simplifying Data Pipelines
>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

Hi Jing,

Thanks for your attention to this FLIP! I'll try to answer the following questions.

> 1. How to define the query of a dynamic table? Use Flink SQL or introduce new syntax? If we use Flink SQL, how to handle the differences in SQL between streaming and batch processing? For example, a query including a window aggregate based on processing time, or a query including a global order by?

Similar to `CREATE TABLE AS query`, here the `query` also uses Flink SQL and doesn't introduce a totally new syntax. We will not change the status quo with respect to the differences in functionality of Flink SQL itself on streaming and batch. For example, the processing-time window aggregate on streaming and the global sort on batch that you mentioned do not, in fact, work properly in the other mode, so when a user switches the refresh mode of a dynamic table to one that its query does not support, we will throw an exception.
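
For illustration only: the `AS <query>` part is the same ordinary Flink SQL one would write in today's CTAS. Table and column names below are made up for the example, and the dynamic-table variant is only a sketch of this proposal, not finalized syntax.

-- Existing CTAS: the defining query is plain Flink SQL.
CREATE TABLE enriched_orders AS
SELECT o.order_id, o.amount, c.region
FROM orders AS o
JOIN customers AS c ON o.customer_id = c.id;

-- The proposed dynamic table is expected to keep the same AS <query> shape,
-- just under CREATE DYNAMIC TABLE instead of CREATE TABLE.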

> 2. Is modifying the query of a dynamic table allowed? Or can we only refresh a dynamic table based on the initial query?

Yes, in the current design the query definition of the dynamic table is not allowed to be modified, and you can only refresh the data based on the initial definition.

> 3. How to use a dynamic table? The dynamic table seems to be similar to a materialized view. Will we do something like materialized view rewriting during the optimization?

It's true that dynamic tables and materialized views are similar in some ways, but as Ron explains there are differences. In terms of optimization, automated materialization discovery similar to what Calcite supports is also a potential possibility, perhaps with the addition of automated rewriting in the future.

Best,
Lincoln Lee

Ron liu <ron9....@gmail.com> wrote on Thursday, March 14, 2024 at 14:01:

Hi, Timo

Sorry for the late response, and thanks for your feedback.
Regarding your questions:

> Flink has introduced the concept of Dynamic Tables many years ago. How does the term "Dynamic Table" fit into Flink's regular tables and also how does it relate to Table API?

> I fear that adding the DYNAMIC TABLE keyword could cause confusion for users, because a term for regular CREATE TABLE (that can be "kind of dynamic" as well and is backed by a changelog) is then missing. Also given that we call our connectors for those tables DynamicTableSource and DynamicTableSink.

> In general, I find it contradicting that a TABLE can be "paused" or "resumed". From an English language perspective, this does sound incorrect. In my opinion (without much research yet), a continuous updating trigger should rather be modelled as a CREATE MATERIALIZED VIEW (which users are familiar with?) or a new concept such as a CREATE TASK (that can be paused and resumed?).

1. The current concept [1] actually consists of two parts: Dynamic Tables & Continuous Query. Dynamic Table is just an abstract logical concept, which in its physical form represents either a table or a changelog stream. It requires the combination with a Continuous Query to achieve dynamic updates of the target table, similar to a database's Materialized View.
We hope to upgrade the Dynamic Table to a real entity that users can operate, one that combines the logical concepts of Dynamic Tables + Continuous Query. By integrating the definition of tables and queries, it can achieve functions similar to Materialized Views, simplifying users' data processing pipelines.
So the object of the suspend operation is the refresh task of the dynamic table. The command `ALTER DYNAMIC TABLE table_name SUSPEND` is actually a shorthand for `ALTER DYNAMIC TABLE table_name SUSPEND REFRESH` (if writing it out in full is clearer, we can also change it).
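
A rough sketch of the lifecycle commands being discussed (SUSPEND and SUSPEND REFRESH are taken from the paragraph above; the RESUME counterpart is only an assumed symmetric operation, not confirmed syntax):

-- Pause the background refresh task of the dynamic table (shorthand form).
ALTER DYNAMIC TABLE dwd_orders SUSPEND;
-- The same operation spelled out in full.
ALTER DYNAMIC TABLE dwd_orders SUSPEND REFRESH;
-- Assumed symmetric counterpart that restarts the refresh task.
ALTER DYNAMIC TABLE dwd_orders RESUME;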

2. Initially, we also considered Materialized Views, but ultimately decided against them. Materialized views are designed to enhance query performance for workloads that consist of common, repetitive query patterns. In essence, a materialized view represents the result of a query. However, it is not intended to support data modification. For Lakehouse scenarios, where the ability to delete or update data is crucial (such as compliance with GDPR, FLIP-2), materialized views fall short.
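
Purely as an illustration of the kind of modification a classic materialized view does not offer (the table, column, and id below are invented; whether such statements go through Flink SQL's batch DELETE/UPDATE support or directly against the lake format is a separate question):

-- A compliance-driven correction applied to the stored data itself,
-- independent of the query that originally produced it.
DELETE FROM dwd_user_events WHERE user_id = 42;
UPDATE dwd_user_events SET email = NULL WHERE user_id = 42;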

3. Compared to CREATE (regular) TABLE, CREATE DYNAMIC TABLE not only defines metadata in the catalog but also automatically initiates a data refresh task based on the query specified during table creation. It dynamically executes data updates. Users can focus on data dependencies and data generation logic.
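
A minimal sketch of what such a creation statement could look like, assuming the DYNAMIC TABLE and FRESHNESS keywords mentioned in this thread (identifiers and the interval value are placeholders, not text from the FLIP):

-- Defines the catalog metadata AND starts a background refresh task that
-- keeps the table roughly three minutes behind its sources.
CREATE DYNAMIC TABLE dwd_orders
FRESHNESS = INTERVAL '3' MINUTE
AS SELECT o.order_id, o.user_id, o.amount, p.pay_status
   FROM orders AS o
   LEFT JOIN payments AS p ON o.order_id = p.order_id;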

4. The new dynamic table does not conflict with the existing DynamicTableSource and DynamicTableSink interfaces. For the developer, all that needs to be implemented is the new CatalogDynamicTable, without changing the implementation of source and sink.

5. For now, the FLIP does not consider supporting Table API operations on Dynamic Table. However, once the SQL syntax is finalized, we can discuss this in a separate FLIP. Currently, I have a rough idea: the Table API should also introduce DynamicTable operation interfaces corresponding to the existing Table interfaces. The TableEnvironment will provide relevant methods to support the various dynamic table operations. The goal for the new Dynamic Table is to offer users an experience similar to using a database, which is why we prioritize SQL-based approaches initially.

> How do you envision re-adding the functionality of a statement set, that fans out to multiple tables? This is a very important use case for data pipelines.

Multi-table support is indeed a very important user scenario. In the future, we can consider extending the statement set syntax to support the creation of multiple dynamic tables.
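
For context, Flink SQL already has a statement set for fanning one job out to several sinks; the block below shows that existing syntax, and extending it to CREATE DYNAMIC TABLE statements is exactly the speculative future work mentioned above, not something this FLIP defines (table names are made up):

-- Existing statement set syntax: one job writing to multiple tables.
EXECUTE STATEMENT SET
BEGIN
  INSERT INTO dwd_orders SELECT order_id, amount FROM ods_orders;
  INSERT INTO dwd_payments SELECT order_id, pay_status FROM ods_payments;
END;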

> Since the early days of Flink SQL, we were discussing `SELECT STREAM * FROM T EMIT 5 MINUTES`. Your proposal seems to rephrase STREAM and EMIT into other keywords DYNAMIC TABLE and FRESHNESS. But the core functionality is still there. I'm wondering if we should widen the scope (maybe not part of this FLIP but a new FLIP) to follow the standard more closely. Making `SELECT * FROM t` bounded by default and use new syntax for the dynamic behavior. Flink 2.0 would be the perfect time for this, however, it would require careful discussions. What do you think?

The query part indeed requires a separate FLIP for discussion, as it involves changes to the default behavior.

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables

Best,
Ron

Jing Zhang <beyond1...@gmail.com> wrote on Wednesday, March 13, 2024 at 15:19:

Hi, Lincoln & Ron,

Thanks for the proposal.

I agree with the question raised by Timo.

Besides, I have some other questions.
1. How to define the query of a dynamic table? Use Flink SQL or introduce new syntax? If we use Flink SQL, how to handle the differences in SQL between streaming and batch processing? For example, a query including a window aggregate based on processing time, or a query including a global order by?

2. Is modifying the query of a dynamic table allowed? Or can we only refresh a dynamic table based on the initial query?

3. How to use a dynamic table? The dynamic table seems to be similar to a materialized view. Will we do something like materialized view rewriting during the optimization?

Best,
Jing Zhang

Timo Walther <twal...@apache.org> wrote on Wednesday, March 13, 2024 at 01:24:

Hi Lincoln & Ron,

thanks for proposing this FLIP. I think a design similar to what you propose has been in the heads of many people, however, I'm wondering how this will fit into the bigger picture.

I haven't deeply reviewed the FLIP yet, but would like to ask some initial questions:

Flink has introduced the concept of Dynamic Tables many years ago. How does the term "Dynamic Table" fit into Flink's regular tables and also how does it relate to Table API?

I fear that adding the DYNAMIC TABLE keyword could cause confusion for users, because a term for regular CREATE TABLE (that can be "kind of dynamic" as well and is backed by a changelog) is then missing. Also given that we call our connectors for those tables DynamicTableSource and DynamicTableSink.

In general, I find it contradicting that a TABLE can be "paused" or "resumed". From an English language perspective, this does sound incorrect. In my opinion (without much research yet), a continuous updating trigger should rather be modelled as a CREATE MATERIALIZED VIEW (which users are familiar with?) or a new concept such as a CREATE TASK (that can be paused and resumed?).

How do you envision re-adding the functionality of a statement set, that fans out to multiple tables? This is a very important use case for data pipelines.

Since the early days of Flink SQL, we were discussing `SELECT STREAM * FROM T EMIT 5 MINUTES`. Your proposal seems to rephrase STREAM and EMIT into other keywords DYNAMIC TABLE and FRESHNESS. But the core functionality is still there. I'm wondering if we should widen the scope (maybe not part of this FLIP but a new FLIP) to follow the standard more closely. Making `SELECT * FROM t` bounded by default and use new syntax for the dynamic behavior. Flink 2.0 would be the perfect time for this, however, it would require careful discussions. What do you think?

Regards,
Timo

On 11.03.24 08:23, Ron liu wrote:

Hi, Dev

Lincoln Lee and I would like to start a discussion about FLIP-435: Introduce a New Dynamic Table for Simplifying Data Pipelines.

This FLIP is designed to simplify the development of data processing pipelines. With Dynamic Tables with uniform SQL statements and freshness, users can define batch and streaming transformations to data in the same way, accelerate ETL pipeline development, and manage task scheduling automatically.

For more details, see FLIP-435 [1]. Looking forward to your feedback.

[1]

Best,
Lincoln & Ron