Hi, Dev

Sorry, my previous statement was not quite accurate. We will hold a vote
for the name within this thread.

Best,
Ron


Ron liu <ron9....@gmail.com> wrote on Tue, Apr 9, 2024 at 19:29:

> Hi, Timo
>
> Thanks for your reply.
>
> I agree with you that naming is sometimes the hard part. When no one has
> a clear preference, voting on the name is a good solution, so I'll send a
> separate email for the vote, clarify the voting rules, and then let
> everyone vote.
>
> One other point to confirm: your ranking has an option for
> Materialized View. Does it stand for the UPDATING Materialized View that
> you mentioned earlier in the discussion? If we use Materialized View, I
> think it needs to be extended.
>
> Best,
> Ron
>
> Timo Walther <twal...@apache.org> wrote on Tue, Apr 9, 2024 at 17:20:
>
>> Hi Ron,
>>
>> Yes, naming is hard. But it will have a large impact on trainings,
>> presentations, and the mental model of users. Maybe the easiest is to
>> collect a ranking from everyone with a short justification:
>>
>>
>> My ranking (from good to not so good):
>>
>> 1. Refresh Table -> states what it does
>> 2. Materialized Table -> similar to SQL materialized view but a table
>> 3. Live Table -> nice buzzword, but maybe still too close to dynamic
>> tables?
>> 4. Materialized View -> a bit broader than standard but still very similar
>> 5. Derived table -> taken by standard
>>
>> Regards,
>> Timo
>>
>>
>>
>> On 07.04.24 11:34, Ron liu wrote:
>> > Hi, Dev
>> >
>> > This is a summary letter. After several rounds of discussion, there is a
>> > strong consensus about the FLIP proposal and the issues it aims to
>> address.
>> > The current point of disagreement is the naming of the new concept. I
>> have
>> > summarized the candidates as follows:
>> >
>> > 1. Derived Table (inspired by Google Looker)
>> >      - Pros: Google Looker has introduced this concept, which is
>> > designed for building Looker's automated modeling, aligning with our
>> > purpose for the stream-batch automatic pipeline.
>> >
>> >      - Cons: The SQL standard uses the term derived table extensively;
>> > vendors adopt it to refer simply to a table within a subclause.
>> >
>> > 2. Materialized Table: It means materializing the query result into a table,
>> > similar to Db2 MQT (Materialized Query Tables). In addition, Snowflake
>> > Dynamic Table's predecessor is also called Materialized Table.
>> >
>> > 3. Updating Table (From Timo)
>> >
>> > 4. Updating Materialized View (From Timo)
>> >
>> > 5. Refresh/Live Table (From Martijn)
>> >
>> > As Martijn said, naming is a headache, looking forward to more valuable
>> > input from everyone.
>> >
>> > [1]
>> >
>> https://cloud.google.com/looker/docs/derived-tables#persistent_derived_tables
>> > [2]
>> https://www.ibm.com/docs/en/db2/11.5?topic=tables-materialized-query
>> > [3]
>> >
>> https://community.denodo.com/docs/html/browse/6.0/vdp/vql/materialized_tables/creating_materialized_tables/creating_materialized_tables
>> >
>> > Best,
>> > Ron
>> >
>> > Ron liu <ron9....@gmail.com> wrote on Sun, Apr 7, 2024 at 15:55:
>> >
>> >> Hi, Lorenzo
>> >>
>> >> Thank you for your insightful input.
>> >>
>> >>>>> I think the 2 above twisted the materialized view concept to more
>> than
>> >> just an optimization for accessing pre-computed aggregates/filters.
>> >> I think that concept (at least in my mind) is now more adherent to the
>> >> semantics of the words themselves ("materialized" and "view") than to
>> >> its implementations in DBMSs: just a view on raw data that, hopefully, is
>> >> constantly updated with fresh results.
>> >> That's why I understand Timo's et al. objections.
>> >>
>> >> Your understanding of Materialized Views is correct. However, in our
>> >> scenario, an important feature is support for Update & Delete
>> >> operations, which current Materialized Views cannot fulfill. As we
>> >> discussed with Timo before, if Materialized Views need to support data
>> >> modifications, it would require extending them with new keywords, such as
>> >> CREATING xxx (UPDATING) MATERIALIZED VIEW.
>> >>
>> >>>>> Still, I don't understand why we need another type of special table.
>> >> Could you dive deep into the reasons why not simply adding the
>> FRESHNESS
>> >> parameter to standard tables?
>> >>
>> >> Firstly, I need to emphasize that we cannot achieve the design goal of
>> >> the FLIP through the CREATE TABLE syntax combined with a FRESHNESS
>> >> parameter. The proposal of this FLIP is to use Dynamic Table +
>> >> Continuous Query, combined with FRESHNESS, to realize stream-batch
>> >> unification. However, CREATE TABLE is merely a metadata operation and
>> >> cannot automatically start a background refresh job. Achieving the
>> >> design goal of the FLIP with standard tables would require extending
>> >> the CTAS [1] syntax to introduce the FRESHNESS keyword. We considered
>> >> this design initially, but it has the following problems:
>> >>
>> >> 1. Using the FRESHNESS keyword to distinguish whether a table created
>> >> through CTAS is a standard table or a "special" standard table with an
>> >> ongoing background refresh job is very obscure for users.
>> >> 2. It intrudes on the semantics of the CTAS syntax. Currently, tables
>> >> created using CTAS only add table metadata to the Catalog and do not
>> >> record attributes such as the query. There are also no ongoing
>> >> background refresh jobs, and the data writing operation happens only
>> >> once, at table creation.
>> >> 3. For the framework, when performing some ALTER TABLE operation, it is
>> >> unclear how to distinguish the behavior of a table created with
>> >> FRESHNESS from one created without it, which also causes confusion.
>> >>
>> >> In terms of the design goal of combining Dynamic Table + Continuous
>> >> Query, the FLIP proposal cannot be realized by only extending the
>> >> current standard tables, so a new kind of dynamic table needs to be
>> >> introduced as a first-level concept.
>> >>
>> >> [1]
>> >>
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/create/#as-select_statement
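To make the CTAS contrast above concrete, here is a minimal sketch; the dynamic-table keywords below (DYNAMIC TABLE, FRESHNESS) only illustrate the FLIP's direction and are not final syntax, and the table names are made up:

```sql
-- Standard CTAS: a one-shot operation. It adds table metadata to the
-- Catalog and writes data exactly once at creation; no refresh job remains.
CREATE TABLE dwd_orders AS SELECT * FROM ods_orders;

-- Illustrative dynamic-table DDL (keyword names are not final): the query
-- is recorded in the Catalog and a background refresh job is started
-- according to the declared freshness.
CREATE DYNAMIC TABLE dwd_orders_live
FRESHNESS = INTERVAL '3' MINUTE
AS SELECT * FROM ods_orders;
```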
>> >>
>> >> Best,
>> >> Ron
>> >>
>> >> <lorenzo.affe...@ververica.com.invalid> wrote on Wed, Apr 3, 2024 at 22:25:
>> >>
>> >>> Hello everybody!
>> >>> Thanks for the FLIP, it looks amazing (and I think the proof is the
>> >>> deep discussion it is provoking :))
>> >>>
>> >>> I have a couple of comments to add to this:
>> >>>
>> >>> Even though I get the reason why you rejected MATERIALIZED VIEW, I
>> >>> still like it a lot, and I would like to provide pointers on how the
>> >>> materialized view concept has twisted in recent years:
>> >>>
>> >>> • Materialize DB (https://materialize.com/)
>> >>> • The famous talk by Martin Kleppmann "turning the database inside
>> out" (
>> >>> https://www.youtube.com/watch?v=fU9hR3kiOK0)
>> >>>
>> >>> I think the 2 above twisted the materialized view concept to more than
>> >>> just an optimization for accessing pre-computed aggregates/filters.
>> >>> I think that concept (at least in my mind) is now more adherent to the
>> >>> semantics of the words themselves ("materialized" and "view") than to
>> >>> its implementations in DBMSs: just a view on raw data that, hopefully,
>> >>> is constantly updated with fresh results.
>> >>> That's why I understand Timo's et al. objections.
>> >>> Still I understand there is no need to add confusion :)
>> >>>
>> >>> Still, I don't understand why we need another type of special table.
>> >>> Could you dive deep into the reasons why not simply adding the
>> FRESHNESS
>> >>> parameter to standard tables?
>> >>>
>> >>> I would see that as a very seamless implementation toward the goal of
>> >>> unifying batch and streaming.
>> >>> If we stick to a unified world, I think that Flink should just provide
>> >>> one type of table that is inherently dynamic.
>> >>> Now, depending on FRESHNESS objectives / connectors used in WITH, that
>> >>> table can be backed by a stream or batch job as you explained in your
>> FLIP.
>> >>>
>> >>> Maybe I am totally missing the point :)
>> >>>
>> >>> Thank you in advance,
>> >>> Lorenzo
>> >>> On Apr 3, 2024 at 15:25 +0200, Martijn Visser <
>> martijnvis...@apache.org>,
>> >>> wrote:
>> >>>> Hi all,
>> >>>>
>> >>>> Thanks for the proposal. While the FLIP talks extensively about how
>> >>>> Snowflake has Dynamic Tables and Databricks has Delta Live Tables, my
>> >>>> understanding is that Databricks has CREATE STREAMING TABLE [1], which
>> >>>> relates to this proposal.
>> >>>>
>> >>>> I do have concerns about using CREATE DYNAMIC TABLE, specifically
>> about
>> >>>> confusing the users who are familiar with Snowflake's approach where
>> you
>> >>>> can't change the content via DML statements, while that is something
>> >>> that
>> >>>> would work in this proposal. Naming is hard of course, but I would
>> >>> probably
>> >>>> prefer something like CREATE CONTINUOUS TABLE, CREATE REFRESH TABLE
>> or
>> >>>> CREATE LIVE TABLE.
>> >>>>
>> >>>> Best regards,
>> >>>>
>> >>>> Martijn
>> >>>>
>> >>>> [1]
>> >>>>
>> >>>
>> https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-ddl-create-streaming-table.html
>> >>>>
>> >>>> On Wed, Apr 3, 2024 at 5:19 AM Ron liu <ron9....@gmail.com> wrote:
>> >>>>
>> >>>>> Hi, dev
>> >>>>>
>> >>>>> After offline discussion with Becket Qin, Lincoln Lee and Jark Wu,
>> we
>> >>> have
>> >>>>> improved some parts of the FLIP.
>> >>>>>
>> >>>>> 1. Add Full Refresh Mode section to clarify the semantics of full
>> >>> refresh
>> >>>>> mode.
>> >>>>> 2. Add a Future Improvement section explaining why the query
>> >>>>> statement does not support references to temporary views, and
>> >>>>> possible solutions.
>> >>>>> 3. The Future Improvement section explains a possible future
>> solution
>> >>> for
>> >>>>> dynamic table to support the modification of query statements to
>> meet
>> >>> the
>> >>>>> common field-level schema evolution requirements of the lakehouse.
>> >>>>> 4. The Refresh section emphasizes that the Refresh command and the
>> >>>>> background refresh job can be executed in parallel, with no
>> >>> restrictions at
>> >>>>> the framework level.
>> >>>>> 5. Convert RefreshHandler into a plug-in interface to support
>> various
>> >>>>> workflow schedulers.
>> >>>>>
>> >>>>> Best,
>> >>>>> Ron
>> >>>>>
>> >>>>> Ron liu <ron9....@gmail.com> wrote on Tue, Apr 2, 2024 at 10:28:
>> >>>>>
>> >>>>>>> Hi, Venkata krishnan
>> >>>>>>>
>> >>>>>>> Thank you for your involvement and suggestions, and hope that the
>> >>> design
>> >>>>>>> goals of this FLIP will be helpful to your business.
>> >>>>>>>
>> >>>>>>>>>>>>> 1. In the proposed FLIP, given the example for the
>> >>> dynamic table, do
>> >>>>>>> the
>> >>>>>>> data sources always come from a single lake storage such as
>> >>> Paimon or
>> >>>>> does
>> >>>>>>> the same proposal solve for 2 disparate storage systems like
>> >>> Kafka and
>> >>>>>>> Iceberg where Kafka events are ETLed to Iceberg similar to Paimon?
>> >>>>>>> Basically the lambda architecture that is mentioned in the FLIP
>> >>> as well.
>> >>>>>>> I'm wondering if it is possible to switch b/w sources based on the
>> >>>>>>> execution mode, for eg: if it is backfill operation, switch to a
>> >>> data
>> >>>>> lake
>> >>>>>>> storage system like Iceberg, otherwise an event streaming system
>> >>> like
>> >>>>>>> Kafka.
>> >>>>>>>
>> >>>>>>> Dynamic table is a design abstraction at the framework level and
>> >>> is not
>> >>>>>>> tied to the physical implementation of the connector. If a
>> >>> connector
>> >>>>>>> supports a combination of Kafka and lake storage, this works fine.
>> >>>>>>>
>> >>>>>>>>>>>>> 2. What happens in the context of a bootstrap (batch) +
>> >>> nearline
>> >>>>> update
>> >>>>>>> (streaming) case that are stateful applications? What I mean by
>> >>> that is,
>> >>>>>>> will the state from the batch application be transferred to the
>> >>> nearline
>> >>>>>>> application after the bootstrap execution is complete?
>> >>>>>>>
>> >>>>>>> I think this is another orthogonal thing, something that FLIP-327
>> >>> tries
>> >>>>> to
>> >>>>>>> address, not directly related to Dynamic Table.
>> >>>>>>>
>> >>>>>>> [1]
>> >>>>>>>
>> >>>>>
>> >>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-327%3A+Support+switching+from+batch+to+stream+mode+to+improve+throughput+when+processing+backlog+data
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Ron
>> >>>>>>>
>> >>>>>>> Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote on Sat, Mar 30, 2024 at 07:06:
>> >>>>>>>
>> >>>>>>>>> Ron and Lincoln,
>> >>>>>>>>>
>> >>>>>>>>> Great proposal and interesting discussion for adding support
>> >>> for dynamic
>> >>>>>>>>> tables within Flink.
>> >>>>>>>>>
>> >>>>>>>>> At LinkedIn, we are also trying to solve compute/storage
>> >>> convergence for
>> >>>>>>>>> similar problems discussed as part of this FLIP, specifically
>> >>> periodic
>> >>>>>>>>> backfill, bootstrap + nearline update use cases using single
>> >>>>>>>>> implementation
>> >>>>>>>>> of business logic (single script).
>> >>>>>>>>>
>> >>>>>>>>> Few clarifying questions:
>> >>>>>>>>>
>> >>>>>>>>> 1. In the proposed FLIP, given the example for the dynamic
>> >>> table, do the
>> >>>>>>>>> data sources always come from a single lake storage such as
>> >>> Paimon or
>> >>>>> does
>> >>>>>>>>> the same proposal solve for 2 disparate storage systems like
>> >>> Kafka and
>> >>>>>>>>> Iceberg where Kafka events are ETLed to Iceberg similar to
>> >>> Paimon?
>> >>>>>>>>> Basically the lambda architecture that is mentioned in the
>> >>> FLIP as well.
>> >>>>>>>>> I'm wondering if it is possible to switch b/w sources based on
>> >>> the
>> >>>>>>>>> execution mode, for eg: if it is backfill operation, switch to
>> >>> a data
>> >>>>> lake
>> >>>>>>>>> storage system like Iceberg, otherwise an event streaming
>> >>> system like
>> >>>>>>>>> Kafka.
>> >>>>>>>>> 2. What happens in the context of a bootstrap (batch) +
>> >>> nearline update
>> >>>>>>>>> (streaming) case that are stateful applications? What I mean
>> >>> by that is,
>> >>>>>>>>> will the state from the batch application be transferred to
>> >>> the nearline
>> >>>>>>>>> application after the bootstrap execution is complete?
>> >>>>>>>>>
>> >>>>>>>>> Regards
>> >>>>>>>>> Venkata krishnan
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Mon, Mar 25, 2024 at 8:03 PM Ron liu <ron9....@gmail.com>
>> >>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>>> Hi, Timo
>> >>>>>>>>>>>
>> >>>>>>>>>>> Thanks for your quick response, and your suggestion.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Yes, this discussion has turned into confirming whether
>> >>> it's a special
>> >>>>>>>>>>> table or a special MV.
>> >>>>>>>>>>>
>> >>>>>>>>>>> 1. The key problem with MVs is that they don't support
>> >>>>>>>>>>> modification, so I prefer it to be a special table. Although
>> >>>>>>>>>>> the periodic refresh behavior is more characteristic of an MV,
>> >>>>>>>>>>> since it is already a special table, supporting periodic
>> >>>>>>>>>>> refresh behavior is quite natural, similar to Snowflake
>> >>>>>>>>>>> dynamic tables.
>> >>>>>>>>>>>
>> >>>>>>>>>>> 2. Regarding the keyword UPDATING: since the current Regular
>> >>>>>>>>>>> Table is a Dynamic Table, which implies support for updating
>> >>>>>>>>>>> through Continuous Query, I think it is redundant to add the
>> >>>>>>>>>>> keyword UPDATING. In addition, UPDATING cannot reflect the
>> >>>>>>>>>>> Continuous Query part and cannot express our purpose of
>> >>>>>>>>>>> simplifying the data pipeline through Dynamic Table +
>> >>>>>>>>>>> Continuous Query.
>> >>>>>>>>>>>
>> >>>>>>>>>>> 3. From the perspective of the SQL standard definition, I
>> >>> can
>> >>>>> understand
>> >>>>>>>>>>> your concerns about Derived Table, but is it possible to
>> >>> make a slight
>> >>>>>>>>>>> adjustment to meet our needs? Additionally, as Lincoln
>> >>> mentioned, the
>> >>>>>>>>>>> Google Looker platform has introduced Persistent Derived
>> >>> Table, and
>> >>>>>>>>> there
>> >>>>>>>>>>> are precedents in the industry; could Derived Table be a
>> >>> candidate?
>> >>>>>>>>>>>
>> >>>>>>>>>>> Of course, look forward to your better suggestions.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Ron
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Timo Walther <twal...@apache.org> wrote on Mon, Mar 25, 2024 at 18:49:
>> >>>>>>>>>>>
>> >>>>>>>>>>>>> After thinking about this more, this discussion boils
>> >>> down to
>> >>>>> whether
>> >>>>>>>>>>>>> this is a special table or a special materialized
>> >>> view. In both
>> >>>>> cases,
>> >>>>>>>>>>>>> we would need to add a special keyword:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Either
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> CREATE UPDATING TABLE
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> or
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> CREATE UPDATING MATERIALIZED VIEW
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I still feel that the periodic refreshing behavior is
>> >>> closer to a
>> >>>>> MV.
>> >>>>>>>>> If
>> >>>>>>>>>>>>> we add a special keyword to MV, the optimizer would
>> >>> know that the
>> >>>>> data
>> >>>>>>>>>>>>> cannot be used for query optimizations.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I will ask more people for their opinion.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>> Timo
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On 25.03.24 10:45, Timo Walther wrote:
>> >>>>>>>>>>>>>>> Hi Ron and Lincoln,
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> thanks for the quick response and the very
>> >>> insightful discussion.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> we might limit future opportunities to
>> >>> optimize queries
>> >>>>>>>>>>>>>>>>> through automatic materialization rewriting by
>> >>> allowing data
>> >>>>>>>>>>>>>>>>> modifications, thus losing the potential for
>> >>> such
>> >>>>> optimizations.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> This argument makes a lot of sense to me. Due to
>> >>> the updates, the
>> >>>>>>>>>>> system
>> >>>>>>>>>>>>>>> is not in full control of the persisted data.
>> >>> However, the system
>> >>>>> is
>> >>>>>>>>>>>>>>> still in full control of the job that powers the
>> >>> refresh. So if
>> >>>>> the
>> >>>>>>>>>>>>>>> system manages all updating pipelines, it could
>> >>> still leverage
>> >>>>>>>>>>> automatic
>> >>>>>>>>>>>>>>> materialization rewriting but without leveraging
>> >>> the data at rest
>> >>>>>>>>> (only
>> >>>>>>>>>>>>>>> the data in flight).
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> we are considering another candidate, Derived
>> >>> Table, the term
>> >>>>>>>>>>> 'derive'
>> >>>>>>>>>>>>>>>>> suggests a query, and 'table' retains
>> >>> modifiability. This
>> >>>>>>>>> approach
>> >>>>>>>>>>>>>>>>> would not disrupt our current concept of a
>> >>> dynamic table
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> I did some research on this term. The SQL standard
>> >>> uses the term
>> >>>>>>>>>>>>>>> "derived table" extensively (defined in section
>> >>> 4.17.3). Thus, a
>> >>>>>>>>> lot of
>> >>>>>>>>>>>>>>> vendors adopt this for simply referring to a table
>> >>> within a
>> >>>>>>>>> subclause:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>
>> https://dev.mysql.com/doc/refman/8.0/en/derived-tables.html
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>
>> https://infocenter.sybase.com/help/topic/com.sybase.infocenter.dc32300.1600/doc/html/san1390612291252.html
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>
>> https://www.c-sharpcorner.com/article/derived-tables-vs-common-table-expressions/
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>
>> https://stackoverflow.com/questions/26529804/what-are-the-derived-tables-in-my-explain-statement
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>
>> >>>
>> https://www.sqlservercentral.com/articles/sql-derived-tables
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Esp. the latter example is interesting, SQL Server
>> >>> allows things
>> >>>>>>>>> like
>> >>>>>>>>>>>>>>> this on derived tables:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> UPDATE T SET Name='Timo' FROM (SELECT * FROM
>> >>> Product) AS T
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> SELECT * FROM Product;
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Btw also Snowflake's dynamic table state:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Because the content of a dynamic table is
>> >>> fully determined
>> >>>>>>>>>>>>>>>>> by the given query, the content cannot be
>> >>> changed by using DML.
>> >>>>>>>>>>>>>>>>> You don’t insert, update, or delete the rows
>> >>> in a dynamic
>> >>>>> table.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> So a new term makes a lot of sense.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> How about using `UPDATING`?
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> CREATE UPDATING TABLE
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> This reflects that modifications can be made and
>> >>> from an
>> >>>>>>>>>>>>>>> English-language perspective you can PAUSE or
>> >>> RESUME the UPDATING.
>> >>>>>>>>>>>>>>> Thus, a user can define UPDATING interval and mode?
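A sketch of what this proposal could look like as DDL; every keyword below is hypothetical and exists in no released Flink version, and the table names are made up:

```sql
-- Hypothetical CREATE UPDATING TABLE syntax: the interval defines how
-- fresh the table content is kept by the background continuous query.
CREATE UPDATING TABLE dwd_orders
UPDATING = INTERVAL '1' MINUTE
AS SELECT * FROM ods_orders;

-- Hypothetical lifecycle commands matching "PAUSE or RESUME the UPDATING":
ALTER TABLE dwd_orders PAUSE UPDATING;
ALTER TABLE dwd_orders RESUME UPDATING;
```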
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Looking forward to your thoughts.
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>>>> Timo
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On 25.03.24 07:09, Ron liu wrote:
>> >>>>>>>>>>>>>>>>> Hi, Ahmed
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thanks for your feedback.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Regarding your question:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I want to iterate on Timo's comments
>> >>> regarding the confusion
>> >>>>>>>>> between
>> >>>>>>>>>>>>>>>>> "Dynamic Table" and current Flink "Table".
>> >>> Should the refactoring
>> >>>>>>>>> of
>> >>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>> system happen in 2.0, should we rename it in
>> >>> this Flip ( as the
>> >>>>>>>>>>>>>>>>> suggestions
>> >>>>>>>>>>>>>>>>> in the thread ) and address the holistic
>> >>> changes in a separate
>> >>>>> Flip
>> >>>>>>>>>>>>>>>>> for 2.0?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Lincoln proposed a new concept in reply to
>> >>> Timo: Derived Table,
>> >>>>>>>>> which
>> >>>>>>>>>>>>>>>>> is a
>> >>>>>>>>>>>>>>>>> combination of Dynamic Table + Continuous
>> >>> Query, and the use of
>> >>>>>>>>>>> Derived
>> >>>>>>>>>>>>>>>>> Table will not conflict with existing concepts,
>> >>> what do you
>> >>>>> think?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> I feel confused about how it integrates further with
>> >>>>>>>>>>>>>>>>>>> other components; the examples provided feel like a
>> >>>>>>>>>>>>>>>>>>> standalone ETL job. Could you provide in the FLIP an
>> >>>>>>>>>>>>>>>>>>> example where the table is further used in subsequent
>> >>>>>>>>>>>>>>>>>>> queries (especially in batch mode)?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Thanks for your suggestion, I added how to use
>> >>> Dynamic Table in
>> >>>>>>>>> FLIP
>> >>>>>>>>>>>>> user
>> >>>>>>>>>>>>>>>>> story section, Dynamic Table can be referenced
>> >>> by downstream
>> >>>>>>>>> Dynamic
>> >>>>>>>>>>>>>>>>> Table
>> >>>>>>>>>>>>>>>>> and can also support OLAP queries.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>> Ron
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Ron liu <ron9....@gmail.com> wrote on Sat, Mar 23, 2024 at 10:35:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Hi, Feng
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Thanks for your feedback.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Although currently we restrict users from
>> >>> modifying the query,
>> >>>>> I
>> >>>>>>>>>>>>> wonder
>> >>>>>>>>>>>>>>>>>>> if
>> >>>>>>>>>>>>>>>>>>> we can provide a better way to help users
>> >>> rebuild it without
>> >>>>>>>>>>> affecting
>> >>>>>>>>>>>>>>>>>>> downstream OLAP queries.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Considering the problem of data consistency,
>> >>> so in the first
>> >>>>> step
>> >>>>>>>>> we
>> >>>>>>>>>>>>> are
>> >>>>>>>>>>>>>>>>>>> strictly limited in semantics and do not
>> >>> support modify the
>> >>>>> query.
>> >>>>>>>>>>>>>>>>>>> This is
>> >>>>>>>>>>>>>>>>>>> really a good problem, one of my ideas is to
>> >>> introduce a syntax
>> >>>>>>>>>>>>>>>>>>> similar to
>> >>>>>>>>>>>>>>>>>>> SWAP [1], which supports exchanging two
>> >>> Dynamic Tables.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>  From the documentation, the definitions
>> >>> SQL and job
>> >>>>> information
>> >>>>>>>>> are
>> >>>>>>>>>>>>>>>>>>> stored in the Catalog. Does this mean that
>> >>> if a system needs to
>> >>>>>>>>> adapt
>> >>>>>>>>>>>>> to
>> >>>>>>>>>>>>>>>>>>> Dynamic Tables, it also needs to store
>> >>> Flink's job information
>> >>>>> in
>> >>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>> corresponding system?
>> >>>>>>>>>>>>>>>>>>> For example, does MySQL's Catalog need to
>> >>> store flink job
>> >>>>>>>>> information
>> >>>>>>>>>>>>> as
>> >>>>>>>>>>>>>>>>>>> well?
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Yes, currently we need to rely on Catalog to
>> >>> store refresh job
>> >>>>>>>>>>>>>>>>>>> information.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Users still need to consider how much
>> >>> memory is being used, how
>> >>>>>>>>>>> large
>> >>>>>>>>>>>>>>>>>>> the concurrency is, which type of state
>> >>> backend is being used,
>> >>>>> and
>> >>>>>>>>>>>>>>>>>>> may need
>> >>>>>>>>>>>>>>>>>>> to set TTL expiration.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Similar to the current practice, job
>> >>> parameters can be set via
>> >>>>> the
>> >>>>>>>>>>>>> Flink
>> >>>>>>>>>>>>>>>>>>> conf or SET commands
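As a reference, SET already works this way in today's SQL client; a small sketch (the option keys shown are existing Flink options, though exact key names can vary between Flink versions):

```sql
-- Tune the refresh job like any other Flink SQL job before submitting it
-- (key names may differ slightly between Flink versions):
SET 'parallelism.default' = '4';
SET 'table.exec.state.ttl' = '1h';
SET 'state.backend.type' = 'rocksdb';
```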
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> When we submit a refresh command, can we
>> >>> help users detect if
>> >>>>>>>>> there
>> >>>>>>>>>>>>> are
>> >>>>>>>>>>>>>>>>>>> any
>> >>>>>>>>>>>>>>>>>>> running jobs and automatically stop them
>> >>> before executing the
>> >>>>>>>>> refresh
>> >>>>>>>>>>>>>>>>>>> command? Then wait for it to complete before
>> >>> restarting the
>> >>>>>>>>>>> background
>> >>>>>>>>>>>>>>>>>>> streaming job?
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Purely from a technical implementation point
>> >>> of view, your
>> >>>>>>>>> proposal
>> >>>>>>>>>>> is
>> >>>>>>>>>>>>>>>>>>> doable, but it would be more costly. Also I
>> >>> think data
>> >>>>> consistency
>> >>>>>>>>>>>>>>>>>>> itself
>> >>>>>>>>>>>>>>>>>>> is the responsibility of the user, similar
>> >>> to how Regular Table
>> >>>>> is
>> >>>>>>>>>>>>>>>>>>> now also
>> >>>>>>>>>>>>>>>>>>> the responsibility of the user, so it's
>> >>> consistent with its
>> >>>>>>>>> behavior
>> >>>>>>>>>>>>>>>>>>> and no
>> >>>>>>>>>>>>>>>>>>> additional guarantees are made at the engine
>> >>> level.
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>> Ron
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>> Ahmed Hamdy <hamdy10...@gmail.com> wrote on Fri, Mar 22, 2024 at 23:50:
>> >>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> Hi Ron,
>> >>>>>>>>>>>>>>>>>>>>> Sorry for joining the discussion late,
>> >>> thanks for the effort.
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> I think the base idea is great, however I
>> >>> have a couple of
>> >>>>>>>>> comments:
>> >>>>>>>>>>>>>>>>>>>>> - I want to iterate on Timo's comments
>> >>> regarding the confusion
>> >>>>>>>>>>> between
>> >>>>>>>>>>>>>>>>>>>>> "Dynamic Table" and current Flink
>> >>> "Table". Should the
>> >>>>>>>>> refactoring of
>> >>>>>>>>>>>>>>>>>>>>> the
>> >>>>>>>>>>>>>>>>>>>>> system happen in 2.0, should we rename it
>> >>> in this Flip ( as the
>> >>>>>>>>>>>>>>>>>>>>> suggestions
>> >>>>>>>>>>>>>>>>>>>>> in the thread ) and address the holistic
>> >>> changes in a separate
>> >>>>>>>>> Flip
>> >>>>>>>>>>>>> for
>> >>>>>>>>>>>>>>>>>>>>> 2.0?
>> >>>>>>>>>>>>>>>>>>>>> - I feel confused about how it integrates further
>> >>>>>>>>>>>>>>>>>>>>> with other components; the examples provided feel
>> >>>>>>>>>>>>>>>>>>>>> like a standalone ETL job. Could you provide in the
>> >>>>>>>>>>>>>>>>>>>>> FLIP an example where the table is further used in
>> >>>>>>>>>>>>>>>>>>>>> subsequent queries (especially in batch mode)?
>> >>>>>>>>>>>>>>>>>>>>> - I really like the standard of keeping
>> >>> the unified batch and
>> >>>>>>>>>>>>> streaming
>> >>>>>>>>>>>>>>>>>>>>> approach
>> >>>>>>>>>>>>>>>>>>>>> Best Regards
>> >>>>>>>>>>>>>>>>>>>>> Ahmed Hamdy
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>> On Fri, 22 Mar 2024 at 12:07, Lincoln Lee
>> >>> <
>> >>>>>>>>> lincoln.8...@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Hi Timo,
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for your thoughtful inputs!
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Yes, expanding the MATERIALIZED
>> >>> VIEW(MV) could achieve the
>> >>>>> same
>> >>>>>>>>>>>>>>>>>>>>> function,
>> >>>>>>>>>>>>>>>>>>>>>>> but our primary concern is that by
>> >>> using a view, we might
>> >>>>> limit
>> >>>>>>>>>>>>> future
>> >>>>>>>>>>>>>>>>>>>>>>> opportunities
>> >>>>>>>>>>>>>>>>>>>>>>> to optimize queries through automatic materialization rewriting [1],
>> >>>>>>>>>>>>>>>>>>>>>>> leveraging the support for MV by physical storage. This is because we
>> >>>>>>>>>>>>>>>>>>>>>>> would be breaking the intuitive semantics of a materialized view (a
>> >>>>>>>>>>>>>>>>>>>>>>> materialized view represents the result of a query) by allowing data
>> >>>>>>>>>>>>>>>>>>>>>>> modifications, thus losing the potential for such optimizations.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> With these considerations in mind, we were inspired by Google Looker's
>> >>>>>>>>>>>>>>>>>>>>>>> Persistent Derived Table [2]. PDT is designed for building Looker's
>> >>>>>>>>>>>>>>>>>>>>>>> automated modeling, aligning with our purpose for the stream-batch
>> >>>>>>>>>>>>>>>>>>>>>>> automatic pipeline. Therefore, we are considering another candidate,
>> >>>>>>>>>>>>>>>>>>>>>>> Derived Table: the term 'derive' suggests a query, and 'table' retains
>> >>>>>>>>>>>>>>>>>>>>>>> modifiability. This approach would not disrupt our current concept of a
>> >>>>>>>>>>>>>>>>>>>>>>> dynamic table, preserving the future utility of MVs.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Conceptually, a Derived Table is a Dynamic Table + Continuous Query. By
>> >>>>>>>>>>>>>>>>>>>>>>> introducing the new concept Derived Table in this FLIP, all concepts
>> >>>>>>>>>>>>>>>>>>>>>>> play together nicely.
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> What do you think about this?
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> [1] https://calcite.apache.org/docs/materialized_views.html
>> >>>>>>>>>>>>>>>>>>>>>>> [2] https://cloud.google.com/looker/docs/derived-tables#persistent_derived_tables
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>> Timo Walther <twal...@apache.org> wrote on Friday, March 22, 2024 at 17:54:
>> >>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Hi Ron,
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> thanks for the detailed answer. Sorry for my late reply, we had a
>> >>>>>>>>>>>>>>>>>>>>>>>>> conference that kept me busy.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the current concept[1], it actually includes: Dynamic Tables &
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Continuous Query. Dynamic Table is just an abstract logical concept
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> This explanation makes sense to me. But the docs also say "A continuous
>> >>>>>>>>>>>>>>>>>>>>>>>>> query is evaluated on the dynamic table yielding a new dynamic table."
>> >>>>>>>>>>>>>>>>>>>>>>>>> So even our regular CREATE TABLEs are considered dynamic tables. This
>> >>>>>>>>>>>>>>>>>>>>>>>>> can also be seen in the diagram "Dynamic Table -> Continuous Query ->
>> >>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Table". Currently, Flink queries can only be executed on
>> >>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Tables.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In essence, a materialized view represents the result of a query.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Isn't that what your proposal does as well?
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the object of the suspend operation is the refresh task of the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic table
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> I understand that Snowflake uses the term [1] to merge their concepts
>> >>>>>>>>>>>>>>>>>>>>>>>>> of STREAM, TASK, and TABLE into a single concept. But Flink has no
>> >>>>>>>>>>>>>>>>>>>>>>>>> concept of a "refresh task". Also, they already introduced MATERIALIZED
>> >>>>>>>>>>>>>>>>>>>>>>>>> VIEW. Flink is in the convenient position that the concept of
>> >>>>>>>>>>>>>>>>>>>>>>>>> materialized views is not taken (reserved maybe for exactly this use
>> >>>>>>>>>>>>>>>>>>>>>>>>> case?). And the SQL standard concept could be "slightly adapted" to our
>> >>>>>>>>>>>>>>>>>>>>>>>>> needs. Looking at other vendors like Postgres[2], they also use
>> >>>>>>>>>>>>>>>>>>>>>>>>> `REFRESH` commands, so why not add additional commands such as DELETE
>> >>>>>>>>>>>>>>>>>>>>>>>>> or UPDATE? Oracle supports an "ON PREBUILT TABLE clause [that] tells
>> >>>>>>>>>>>>>>>>>>>>>>>>> the database to use an existing table segment"[3], which comes closer
>> >>>>>>>>>>>>>>>>>>>>>>>>> to what we want as well.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
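[Editor's note: for readers following along, the Postgres lifecycle referenced in [2] above looks roughly like this. Table and column names are made up for illustration.]

```sql
-- Define a materialized view over a base table:
CREATE MATERIALIZED VIEW sales_summary AS
  SELECT region, SUM(amount) AS total
  FROM sales
  GROUP BY region;

-- Recompute the stored result on demand:
REFRESH MATERIALIZED VIEW sales_summary;

-- Rebuild without blocking concurrent reads
-- (requires a unique index on the view):
REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;
```

Postgres only offers this manual, full refresh; the point above is that Flink could extend the same standard surface with additional commands (DELETE/UPDATE, scheduled refreshes) instead of inventing a new concept.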
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> it is not intended to support data modification
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> This is an argument that I understand. But we as Flink could allow data
>> >>>>>>>>>>>>>>>>>>>>>>>>> modifications. This way we are only extending the standard and don't
>> >>>>>>>>>>>>>>>>>>>>>>>>> introduce new concepts.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> If we can't agree on using the MATERIALIZED VIEW concept, we should fix
>> >>>>>>>>>>>>>>>>>>>>>>>>> our syntax in a Flink 2.0 effort, making regular tables bounded and
>> >>>>>>>>>>>>>>>>>>>>>>>>> dynamic tables unbounded. We would be closer to the SQL standard with
>> >>>>>>>>>>>>>>>>>>>>>>>>> this and pave the way for the future. I would actually support this if
>> >>>>>>>>>>>>>>>>>>>>>>>>> all concepts play together nicely.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> In the future, we can consider extending the statement set syntax to
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> support the creation of multiple dynamic tables.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> It's good that we called the concept STATEMENT SET. This allows us to
>> >>>>>>>>>>>>>>>>>>>>>>>>> define CREATE TABLE within, even if it might look a bit confusing.
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> Regards,
>> >>>>>>>>>>>>>>>>>>>>>>>>> Timo
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> [1] https://docs.snowflake.com/en/user-guide/dynamic-tables-about
>> >>>>>>>>>>>>>>>>>>>>>>>>> [2] https://www.postgresql.org/docs/current/sql-creatematerializedview.html
>> >>>>>>>>>>>>>>>>>>>>>>>>> [3] https://oracle-base.com/articles/misc/materialized-views
>> >>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>> On 21.03.24 04:14, Feng Jin wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Ron and Lincoln,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this discussion. I believe it will greatly improve
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the convenience of managing users' real-time pipelines.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> I have some questions.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding limitations of Dynamic Table:*
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Does not support modifying the select statement after the dynamic
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> table is created.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Although we currently restrict users from modifying the query, I
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> wonder if we can provide a better way to help users rebuild it without
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> affecting downstream OLAP queries.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding the management of background jobs:*
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 1. From the documentation, the SQL definitions and job information are
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> stored in the Catalog. Does this mean that if a system needs to adapt
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> to Dynamic Tables, it also needs to store Flink's job information in
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the corresponding system? For example, does MySQL's Catalog need to
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> store Flink job information as well?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Users still need to consider how much memory is being used, how
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> large the concurrency is, which type of state backend is being used,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> and may need to set a TTL expiration.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> *Regarding the refresh part:*
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> If the refresh mode is continuous and a background job is running,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> caution should be taken with the refresh command as it can lead to
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> inconsistent data.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> When we submit a refresh command, can we help users detect whether
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> there are any running jobs and automatically stop them before executing
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> the refresh command, then wait for it to complete before restarting the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> background streaming job?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> Feng
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, Mar 19, 2024 at 9:40 PM Lincoln Lee <lincoln.8...@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Yun,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your valuable input!
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Incremental mode is indeed an attractive idea; we have also discussed
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> this, but in the current design we first provide two refresh modes:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> CONTINUOUS and FULL. An incremental mode can be introduced once the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> execution layer has the capability.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> My answers to the two questions:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Yes, cascading is a good question. The current proposal provides a
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> freshness that defines a dynamic table's lag relative to its base
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> tables. If users need to consider the end-to-end freshness of multiple
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> cascaded dynamic tables, they can manually split it up for now. Of
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> course, how to let multiple cascaded or dependent dynamic tables
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> complete the freshness definition in a simpler way is something that
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> can be extended in the future.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Cascading refresh is also a part we focused on discussing. In this
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> FLIP we hope to concentrate as much as possible on the core features
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> (as it already involves a lot of things), so we did not directly
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> introduce related syntax. However, based on the current design,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> combined with the catalog and lineage, users can theoretically still
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> achieve a cascading refresh.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yun Tang <myas...@live.com> wrote on Tuesday, March 19, 2024 at 13:45:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Lincoln,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for driving this discussion, and I am so excited to see this
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> topic being discussed in the Flink community!
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From my point of view, beyond the work of unifying streaming and batch
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the DataStream API [1], this FLIP could actually let users benefit
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from one engine to rule batch & streaming.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If we treat this FLIP as an open-source implementation of Snowflake's
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic tables [2], we still lack an incremental refresh mode to make
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the ETL near real-time with a much cheaper computation cost. However, I
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> think this could be done under the current design by introducing
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> another refresh mode in the future, although the extra work of
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> incremental view maintenance would be much larger.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the FLIP itself, I have several questions below:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. It seems this FLIP does not consider the lag of refreshes across ETL
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> layers from ODS ---> DWD ---> APP [3]. We currently only consider the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> scheduler interval, which means we cannot use lag to automatically
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> schedule the upfront micro-batch jobs to do the work.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. To support the automagical refreshes, we should consider the lineage
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> in the catalog or somewhere else.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-134%3A+Batch+execution+for+the+DataStream+API
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [2] https://docs.snowflake.com/en/user-guide/dynamic-tables-about
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [3] https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yun Tang
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ________________________________
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> From: Lincoln Lee <lincoln.8...@gmail.com>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, March 14, 2024 14:35
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> To: dev@flink.apache.org <dev@flink.apache.org>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [DISCUSS] FLIP-435: Introduce a New Dynamic Table for
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Simplifying Data Pipelines
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Jing,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for your attention to this FLIP! I'll try to answer the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> following questions.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. How to define the query of a dynamic table?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Use Flink SQL or introduce new syntax?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> If using Flink SQL, how to handle the difference in SQL between
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> streaming and batch processing?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For example, a query including a window aggregate based on
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processing time, or a query including a global order by?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Similar to `CREATE TABLE AS query`, here the `query` also uses Flink
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL and doesn't introduce a totally new syntax. We will not change the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> status quo with respect to the difference in functionality of Flink
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL itself on streaming and batch. For example, the proctime window
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> aggregate on streaming and the global sort on batch that you mentioned
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> do not, in fact, work properly in the other mode, so when the user
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> switches a dynamic table to a refresh mode that is not supported, we
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will throw an exception.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Is modifying the query of a dynamic table allowed?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Or can we only refresh a dynamic table based on the initial query?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, in the current design the query definition of the dynamic table
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is not allowed to be modified, and you can only refresh the data based
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> on the initial definition.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. How to use a dynamic table?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The dynamic table seems to be similar to the materialized view.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Will we do something like materialized view rewriting during the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optimization?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> It's true that dynamic tables and materialized views are similar in
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some ways, but as Ron explains there are differences. In terms of
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> optimization, automated materialization discovery similar to that
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> supported by Calcite is also a potential possibility, perhaps with the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> addition of automated rewriting in the future.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Lincoln Lee
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Ron liu <ron9....@gmail.com> wrote on Thursday, March 14, 2024 at 14:01:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi, Timo
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sorry for the late response, and thanks for your feedback.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding your questions:
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Flink has introduced the concept of Dynamic Tables many years
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ago. How does the term "Dynamic Table" fit into Flink's regular
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> tables and also how does it relate to Table API?
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I fear that adding the DYNAMIC TABLE keyword could cause
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> confusion for users, because a term for regular CREATE TABLE
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> (that can be "kind of dynamic" as well and is backed by a
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> changelog) is then missing. Also given that we call our
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> connectors for those tables DynamicTableSource and
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTableSink.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> In general, I find it contradicting that a TABLE can be "paused"
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or "resumed". From an English language perspective, this does
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> sound incorrect. In my opinion (without much research yet), a
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> continuous updating trigger should rather be modelled as a
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CREATE MATERIALIZED VIEW (which users are familiar with?) or a
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> new concept such as a CREATE TASK (that can be paused and
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> resumed?).
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. The current concept[1] actually includes both Dynamic Tables &
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Continuous Query. Dynamic Table is just an abstract logical concept,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> which in its physical form represents either a table or a changelog
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> stream. It requires the combination with a Continuous Query to
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> achieve dynamic updates of the target table, similar to a database's
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Materialized View. We hope to upgrade the Dynamic Table to a real
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entity that users can operate, which combines the logical concepts
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of Dynamic Tables + Continuous Query. By integrating the definition
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> of tables and queries, it can achieve functions similar to
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Materialized Views, simplifying users' data processing pipelines.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> So, the object of the suspend operation is the refresh task of the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> dynamic table. The command `ALTER DYNAMIC TABLE table_name SUSPEND`
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> is actually a shorthand for `ALTER DYNAMIC TABLE table_name SUSPEND
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> REFRESH` (if writing it in full is clearer, we can also change it).
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
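[Editor's note: to make the shorthand concrete, the commands under discussion would look something like the sketch below. All identifiers are hypothetical, the RESUME counterpart is assumed from the "paused and resumed" wording above, and the final grammar is still open at this point in the thread.]

```sql
-- Pauses only the background refresh task, not the table itself:
ALTER DYNAMIC TABLE dwd_orders SUSPEND;

-- ...which is shorthand for the fully spelled-out form:
ALTER DYNAMIC TABLE dwd_orders SUSPEND REFRESH;

-- Assumed counterpart that restarts the refresh task:
ALTER DYNAMIC TABLE dwd_orders RESUME;
```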
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Initially, we also considered Materialized Views, but ultimately
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> decided against them. Materialized views are designed to enhance
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> query performance for workloads that consist of common, repetitive
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> query patterns. In essence, a materialized view represents the
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> result of a query. However, it is not intended to support data
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> modification. For Lakehouse scenarios, where the ability to delete
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> or update data is crucial (such as compliance with GDPR, FLIP-2),
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> materialized views fall short.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. Compared to CREATE (regular) TABLE, CREATE DYNAMIC TABLE not only
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> defines metadata in the catalog but also automatically initiates a
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data refresh task based on the query specified during table
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> creation. It dynamically executes data updates. Users can focus on
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> data dependencies and data generation logic.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
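[Editor's note: a sketch of what such a DDL could look like, combining the table definition, its query, and a freshness target in one statement. This is illustrative only: all identifiers are hypothetical and the exact syntax is still under discussion in this FLIP.]

```sql
CREATE DYNAMIC TABLE dwd_orders
FRESHNESS = INTERVAL '3' MINUTE  -- acceptable lag behind the base tables
AS SELECT o.order_id, o.user_id, p.amount
   FROM ods_orders AS o
   JOIN ods_payments AS p ON o.order_id = p.order_id;
```

Depending on the freshness target, the framework would then run either a continuous streaming job or periodically scheduled full refreshes, per the CONTINUOUS and FULL modes mentioned earlier in the thread.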
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 4.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The new dynamic table
>> >>> does not conflict with the existing
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTableSource and
>> >>> DynamicTableSink interfaces. For
>> >>>>> the
>> >>>>>>>>>>>>>>>>>>>>>>> developer,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> all that needs to be
>> >>> implemented is the new
>> >>>>>>>>>>> CatalogDynamicTable,
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> without changing the
>> >>> implementation of source and sink.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 5. For now, the FLIP
>> >>> does not consider supporting Table
>> >>>>> API
>> >>>>>>>>>>>>>>>>>>>>>>> operations
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>> on
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dynamic Table
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> . However, once the SQL
>> >>> syntax is finalized, we can
>> >>>>> discuss
>> >>>>>>>>>>> this
>> >>>>>>>>>>>>>>>>>>>>> in
>> >>>>>>>>>>>>>>>>>>>>>>> a
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> separate FLIP.
>> >>> Currently, I have a rough idea: the Table
>> >>>>>>>>> API
>> >>>>>>>>>>>>>>>>>>>>> should
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> also introduce
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> DynamicTable operation
>> >>> interfaces
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> corresponding to the
>> >>> existing Table interfaces.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The TableEnvironment
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> will provide relevant
>> >>> methods to support various
>> >>>>> dynamic
>> >>>>>>>>>>>>> table
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operations. The goal
>> >>> for the new Dynamic Table is to
>> >>>>> offer
>> >>>>>>>>>>> users
>> >>>>>>>>>>>>>>>>>>>>> an
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> experience similar to
>> >>> using a database, which is why we
>> >>>>>>>>>>>>>>>>>>>>> prioritize
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> SQL-based approaches
>> >>> initially.
>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>> >> How do you envision re-adding the functionality of a statement
>> >> set, that fans out to multiple tables? This is a very important use
>> >> case for data pipelines.
>> >
>> > Multi-table pipelines are indeed a very important user scenario. In
>> > the future, we can consider extending the statement set syntax to
>> > support the creation of multiple dynamic tables.
>> >
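One possible shape for such an extension, reusing Flink's existing EXECUTE STATEMENT SET syntax, could look like the sketch below. This is purely illustrative: it is not part of the FLIP, the combined syntax was not agreed upon in this thread, and the table names are made up.

```sql
-- Illustrative sketch only: this syntax is NOT part of FLIP-435 and was
-- not agreed upon; it shows one possible way a statement set could fan
-- out to multiple dynamic tables. Table names are hypothetical.
EXECUTE STATEMENT SET
BEGIN
  CREATE DYNAMIC TABLE dwd_orders
    FRESHNESS = INTERVAL '3' MINUTE
    AS SELECT * FROM ods_orders WHERE status <> 'CANCELLED';

  CREATE DYNAMIC TABLE dwd_users
    FRESHNESS = INTERVAL '3' MINUTE
    AS SELECT id, name, region FROM ods_users;
END;
```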
>> >
>> >> Since the early days of Flink SQL, we were discussing `SELECT
>> >> STREAM * FROM T EMIT 5 MINUTES`. Your proposal seems to rephrase
>> >> STREAM and EMIT into other keywords DYNAMIC TABLE and FRESHNESS.
>> >> But the core functionality is still there. I'm wondering if we
>> >> should widen the scope (maybe not part of this FLIP but a new FLIP)
>> >> to follow the standard more closely. Making `SELECT * FROM t`
>> >> bounded by default and using new syntax for the dynamic behavior.
>> >> Flink 2.0 would be the perfect time for this; however, it would
>> >> require careful discussions. What do you think?
>> >
>> > The query part indeed requires a separate FLIP for discussion, as
>> > it involves changes to the default behavior.
>> >
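Side by side, the two directions mentioned above might look like this. Both are sketches of syntax under discussion in this thread, not implemented Flink SQL; the table name `t_summary` is made up.

```sql
-- Sketch 1: the early-days idea, with streaming semantics expressed in
-- the query itself (never implemented).
SELECT STREAM * FROM T EMIT 5 MINUTES;

-- Sketch 2: the FLIP-435 direction, with semantics attached to the
-- table definition via a freshness target (exact clause syntax was not
-- final at the time of this discussion).
CREATE DYNAMIC TABLE t_summary
  FRESHNESS = INTERVAL '5' MINUTE
AS SELECT * FROM T;
```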
>> > [1]
>> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/dynamic_tables
>> >
>> > Best,
>> > Ron
>> >
>> > Jing Zhang <beyond1...@gmail.com> wrote on Wed, Mar 13, 2024, 15:19:
>> >
>> >> Hi, Lincoln & Ron,
>> >>
>> >> Thanks for the proposal.
>> >>
>> >> I agree with the question raised by Timo.
>> >>
>> >> Besides, I have some other questions.
>> >> 1. How is the query of a dynamic table defined? Using Flink SQL, or
>> >> by introducing new syntax? If Flink SQL is used, how do we handle
>> >> the differences in SQL between streaming and batch processing? For
>> >> example, a query including a window aggregate based on processing
>> >> time, or a query including a global ORDER BY?
>> >>
>> >> 2. Is modifying the query of a dynamic table allowed, or can we
>> >> only refresh a dynamic table based on its initial query?
>> >>
>> >> 3. How is a dynamic table used? A dynamic table seems similar to a
>> >> materialized view. Will we do something like materialized view
>> >> rewriting during optimization?
>> >>
>> >> Best,
>> >> Jing Zhang
>> >>
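As a concrete instance of question 1: a processing-time window aggregate is the kind of query that is meaningful in streaming mode but has no deterministic batch equivalent. The sketch below uses Flink's windowing table-valued function syntax and assumes a hypothetical `orders` table with a processing-time attribute column declared as `proc_time AS PROCTIME()`.

```sql
-- Results depend on wall-clock arrival time, so this query has no
-- deterministic batch equivalent. Assumes a table `orders` with a
-- processing-time attribute column `proc_time AS PROCTIME()`
-- (both names are hypothetical).
SELECT window_start, window_end, COUNT(*) AS order_cnt
FROM TABLE(
  TUMBLE(TABLE orders, DESCRIPTOR(proc_time), INTERVAL '10' MINUTES))
GROUP BY window_start, window_end;
```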
>> >> Timo Walther <twal...@apache.org> wrote on Wed, Mar 13, 2024, 01:24:
>> >>
>> >>> Hi Lincoln & Ron,
>> >>>
>> >>> thanks for proposing this FLIP. I think a design similar to what
>> >>> you propose has been in the heads of many people; however, I'm
>> >>> wondering how this will fit into the bigger picture.
>> >>>
>> >>> I haven't deeply reviewed the FLIP yet, but would like to ask some
>> >>> initial questions:
>> >>>
>> >>> Flink introduced the concept of Dynamic Tables many years ago.
>> >>> How does the term "Dynamic Table" fit in with Flink's regular
>> >>> tables, and how does it relate to the Table API?
>> >>>
>> >>> I fear that adding the DYNAMIC TABLE keyword could cause
>> >>> confusion for users, because a term for a regular CREATE TABLE
>> >>> (which can be "kind of dynamic" as well and is backed by a
>> >>> changelog) is then missing. Also, the connectors for those tables
>> >>> are already called DynamicTableSource and DynamicTableSink.
>> >>>
>> >>> In general, I find it contradictory that a TABLE can be "paused"
>> >>> or "resumed". From an English language perspective, this sounds
>> >>> incorrect. In my opinion (without much research yet), a
>> >>> continuously updating trigger should rather be modelled as a
>> >>> CREATE MATERIALIZED VIEW (which users are familiar with?) or a
>> >>> new concept such as a CREATE TASK (that can be paused and
>> >>> resumed?).
>> >>>
>> >>> How do you envision re-adding the functionality of a statement
>> >>> set, that fans out to multiple tables? This is a very important
>> >>> use case for data pipelines.
>> >>>
>> >>> Since the early days of Flink SQL, we were discussing `SELECT
>> >>> STREAM * FROM T EMIT 5 MINUTES`. Your proposal seems to rephrase
>> >>> STREAM and EMIT into other keywords DYNAMIC TABLE and FRESHNESS.
>> >>> But the core functionality is still there. I'm wondering if we
>> >>> should widen the scope (maybe not part of this FLIP but a new
>> >>> FLIP) to follow the standard more closely. Making `SELECT * FROM
>> >>> t` bounded by default and using new syntax for the dynamic
>> >>> behavior. Flink 2.0 would be the perfect time for this; however,
>> >>> it would require careful discussions. What do you think?
>> >>>
>> >>> Regards,
>> >>> Timo
>> >>>
>> >>>
>> >>> On 11.03.24 08:23, Ron liu wrote:
>> >>>> Hi, Dev
>> >>>>
>> >>>> Lincoln Lee and I would like to start a discussion about
>> >>>> FLIP-435: Introduce a New Dynamic Table for Simplifying Data
>> >>>> Pipelines.
>> >>>>
>> >>>> This FLIP is designed to simplify the development of data
>> >>>> processing pipelines. With dynamic tables, uniform SQL
>> >>>> statements, and freshness, users can define batch and streaming
>> >>>> transformations to data in the same way, accelerate ETL pipeline
>> >>>> development, and manage task scheduling automatically.
>> >>>>
>> >>>> For more details, see FLIP-435 [1]. Looking forward to your
>> >>>> feedback.
>> >>>>
>> >>>> [1]
>> >>>>
>> >>>> Best,
>> >>>>
>> >>>> Lincoln & Ron
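To make the proposal's shape concrete, a definition and the pause/resume operations debated in this thread might look like the following. This is an illustrative sketch assembled from the keywords mentioned in the discussion (DYNAMIC TABLE, FRESHNESS, suspend/resume); none of it was finalized syntax at the time, and all table and column names are made up.

```sql
-- Illustrative sketch only: syntax not finalized at the time of this
-- thread; table and column names are hypothetical.
CREATE DYNAMIC TABLE dwd_orders
  FRESHNESS = INTERVAL '3' MINUTE
AS SELECT order_id, user_id, amount
   FROM ods_orders
   WHERE status <> 'CANCELLED';

-- The automatically scheduled background refresh could then be
-- controlled per table (this is the "paused/resumed" behavior the
-- discussion refers to):
ALTER DYNAMIC TABLE dwd_orders SUSPEND;
ALTER DYNAMIC TABLE dwd_orders RESUME;
```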