Well, thanks xia for your clarification. I agree with your point and have no other concerns.

Best,
Aitozi.

yuxia <luoyu...@alumni.sjtu.edu.cn> wrote on Thu, Apr 13, 2023 at 16:17:

>
> Hi, Aitozi.
> Thanks for your inputs. I understand your concern. Although the external
> connector can update the metadata in the method `executeTruncation`,
> the Flink catalog can't be aware of the update in some cases. If the Hive
> catalog only stores Hive tables, everything will be fine.
> But if the Hive catalog also stores non-Hive tables, and a non-Hive table
> can't update the underlying Hive metastore, then
> the Hive catalog will still return the old metadata.
>
> This problem is generic: it is not limited to the truncate table
> statement, but also applies to other statements, like insert, update/delete,
> or other statements on the way.
> I think it deserves another dedicated channel to discuss what the Flink
> catalog is for, or whether we need to introduce some new mechanism for it.
>
>
> Best regards,
> Yuxia
>
> ----- Original Message -----
> From: "Aitozi" <gjying1...@gmail.com>
> To: "dev" <dev@flink.apache.org>
> Sent: Thursday, April 13, 2023, 2:37:48 PM
> Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
>
> Hi, xia
>
> which I think if Flink supports table cache in framework-level,
> we can also recache in framework-level for truncate table statement.
>
> I think the Flink catalog currently already stores some stats for the table,
> e.g. after `ANALYZE TABLE`, the table's statistics will be stored in
> the catalog, but truncate table will not correct the statistics.
>
> I know it's hard for Flink to do the unified follow-up actions after
> truncating a table. But I think we need to define a clear positioning for
> the Flink Catalog.
> IMO, since Flink is a compute engine, it's hard for it to maintain the
> catalog for tables in different storage systems by itself. So as more and
> more `Executable` commands are introduced, the data in the catalog will
> become fragmented.
> In this case, after truncate, the following parts of the catalog may be
> affected:
>
> - the table/column statistics will no longer be correct
> - the partitions of this table should be cleared
>
>
> Best,
> Aitozi.
>
>
> liu ron <ron9....@gmail.com> wrote on Thu, Apr 13, 2023 at 11:28:
>
> >
> > Hi, xia
> >
> > Thanks for your explanation. For the first question, given the current
> > status, I think we can provide the generic interface in the future if we
> > need it. For the second question, it makes sense to me if we can
> > support the table cache at the framework level.
> >
> > Best,
> > Ron
> >
> > yuxia <luoyu...@alumni.sjtu.edu.cn> wrote on Tue, Apr 11, 2023 at 16:12:
> >
> > > Hi, ron.
> > >
> > > 1: Considering that, for deleting rows, Flink also writes delete records
> > > to achieve the purpose of deleting data, it may not be so strange for
> > > connector devs to make DynamicTableSink implement SupportsTruncate to
> > > support truncating the table. Based on the assumption that
> > > DynamicTableSink is used for inserting/updating/deleting, I think it's
> > > reasonable for DynamicTableSink to implement SupportsTruncate. That
> > > said, it also sounds reasonable to add a generic interface like
> > > DynamicTable to differentiate DynamicTableSource & DynamicTableSink.
> > > But it will definitely require much design and discussion, which
> > > deserves a dedicated FLIP. I prefer not to do that in this FLIP to
> > > avoid overdesign, and I think it's not a must for this FLIP. Maybe
> > > we can discuss it some day if we do need the new generic table
> > > interface.
> > >
> > > 2: Considering the variety of catalogs and tables, it's hard for Flink
> > > to do the unified follow-up actions after truncating a table. But the
> > > external connector can still do such follow-up actions in the method
> > > `executeTruncation`.
> > > Btw, in Spark, for the new truncate table interface [1], Spark only
> > > recaches the table after truncating it [2]. I think if Flink
> > > supports table cache at the framework level,
> > > we can also recache at the framework level for the truncate table
> > > statement.
> > >
> > > [1]
> > > https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
> > > [2]
> > > https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala
> > >
> > >
> > > I think the external catalog can implement such logic in the method
> > > `executeTruncation`.
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- Original Message -----
> > > From: "liu ron" <ron9....@gmail.com>
> > > To: "dev" <dev@flink.apache.org>
> > > Sent: Tuesday, April 11, 2023, 10:51:36 AM
> > > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> > >
> > > Hi, xia
> > > It's a nice improvement to support the TRUNCATE TABLE statement, making
> > > Flink more feature-rich.
> > > I think the truncate syntax is a command that will be executed in the
> > > client's process, rather than pulling up a Flink job to execute on the
> > > cluster. So for the user-facing interface, I think we should not let
> > > users implement the SupportsTruncate interface on the DynamicTableSink
> > > interface. This seems a bit strange and also confuses users; as hang
> > > said, why doesn't a source table support truncate? It would be nice if
> > > we could come up with a generic interface that supports truncate instead
> > > of binding it to the DynamicTableSink interface, and maybe in the future
> > > we will support more commands like the truncate command.
> > >
> > > In addition, after truncating data, we may also need to update the
> > > metadata of the table. For a Hive table, for example, we need to update
> > > the statistics as well as clear the cache in the metastore. I think we
> > > should also consider these capabilities. Spark has considered these,
> > > refer to
> > >
> > > https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
> > > .
> > >
> > > Best,
> > >
> > > Ron
> > >
> > > Jim Hughes <jhug...@confluent.io.invalid> wrote on Tue, Apr 11, 2023 at 02:15:
> > >
> > > > Hi Yuxia,
> > > >
> > > > On Mon, Apr 10, 2023 at 10:35 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > > > wrote:
> > > >
> > > > > Hi, Jim.
> > > > >
> > > > > 1: I'm expecting all DynamicTableSinks to support it. But it's hard
> > > > > to support all of them in one shot. For the DynamicTableSinks that
> > > > > haven't implemented the SupportsTruncate interface, we'll throw an
> > > > > exception like 'The truncate statement for the table is not
> > > > > supported as it hasn't implemented the interface SupportsTruncate'.
> > > > > Also, a sink that doesn't support deleting data can still implement
> > > > > the interface but throw a more concrete exception like "xxx doesn't
> > > > > support truncating a table as delete is impossible for xxx". It
> > > > > depends on the external connector's implementation.
> > > > > Thanks for your advice, I have added it to the FLIP.
> > > > >
> > > >
> > > > Makes sense.
> > > >
> > > >
> > > > > 2: What do you mean by saying "truncate an input to a streaming
> > > > > query"?
> > > > > This FLIP aims to support the TRUNCATE TABLE statement, which is for
> > > > > truncating a table. In which case will it interoperate with
> > > > > streaming queries?
> > > > >
> > > >
> > > > Let's take a source like Kafka as an example. Suppose I have an input
> > > > topic Foo, and a query which uses it as an input.
> > > >
> > > > When Foo is truncated, if the truncation works as a delete and create,
> > > > then the connector may need to be made aware (otherwise it may try to
> > > > use offsets from the previous topic). On the other hand, one may have
> > > > to ask Kafka to delete records up to a certain point.
> > > >
> > > > Also, savepoints for the query may contain information from the
> > > > truncated table. Should this FLIP involve invalidating that
> > > > information in some manner? Or does truncating a source table for a
> > > > query cause undefined behavior on that query?
> > > >
> > > > Basically, I'm trying to think through the implementations of a
> > > > truncate operation to streaming sources and queries.
> > > >
> > > > Cheers,
> > > >
> > > > Jim
> > > >
> > > >
> > > > > Best regards,
> > > > > Yuxia
> > > > >
> > > > > ----- Original Message -----
> > > > > From: "Jim Hughes" <jhug...@confluent.io.INVALID>
> > > > > To: "dev" <dev@flink.apache.org>
> > > > > Sent: Monday, April 10, 2023, 9:32:28 PM
> > > > > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> > > > >
> > > > > Hi Yuxia,
> > > > >
> > > > > Two questions:
> > > > >
> > > > > 1. Are you expecting all DynamicTableSinks to support Truncate? The
> > > > > FLIP could use some explanation for what supporting and not
> > > > > supporting the operation means.
> > > > >
> > > > > 2. How will truncate interoperate with streaming queries? That is,
> > > > > if I truncate an input to a streaming query, is there any defined
> > > > > behavior?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Jim
> > > > >
> > > > > On Wed, Mar 22, 2023 at 9:13 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > > > > wrote:
> > > > >
> > > > > > Hi, devs.
> > > > > >
> > > > > > I'd like to start a discussion about FLIP-302: Support TRUNCATE
> > > > > > TABLE statement [1].
> > > > > >
> > > > > > The TRUNCATE TABLE statement is a SQL command that allows users to
> > > > > > quickly and efficiently delete all rows from a table without
> > > > > > dropping the table itself. This statement is commonly used in data
> > > > > > warehouses, where large data sets are frequently loaded into and
> > > > > > unloaded from tables.
> > > > > > So, this FLIP is meant to support the TRUNCATE TABLE statement.
> > > > > > More exactly, this FLIP will bring Flink the TRUNCATE TABLE syntax
> > > > > > and an interface with which the corresponding connectors can
> > > > > > implement their own logic for truncating a table.
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > [1]:
> > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > > > >
> > > > > >
> > > > > > Best regards,
> > > > > > Yuxia
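
For readers skimming the thread, the contract discussed above (a sink opts in by implementing SupportsTruncate, the engine calls `executeTruncation`, and sinks that have not opted in get a clear exception) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the FLIP's actual code: `DynamicTableSink` here is a local stand-in rather than Flink's real interface, the boolean return type of `executeTruncation` is assumed, and `InMemorySink`, `AppendOnlySink`, and `TruncateExecutor` are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Flink's sink interface (assumption: only a summary string is needed here).
interface DynamicTableSink {
    String asSummaryString();
}

// Ability interface named in the thread; the boolean return type is an assumption.
interface SupportsTruncate {
    boolean executeTruncation();
}

// Hypothetical sink that opts in to truncation.
class InMemorySink implements DynamicTableSink, SupportsTruncate {
    final List<String> rows = new ArrayList<>();

    @Override
    public String asSummaryString() {
        return "InMemorySink";
    }

    @Override
    public boolean executeTruncation() {
        rows.clear();
        // Connector-specific follow-up actions (refreshing statistics,
        // clearing caches) would go here, as suggested in the thread.
        return true;
    }
}

// Hypothetical sink that has not implemented SupportsTruncate.
class AppendOnlySink implements DynamicTableSink {
    @Override
    public String asSummaryString() {
        return "AppendOnlySink";
    }
}

// Hypothetical engine-side dispatch for a TRUNCATE TABLE statement.
class TruncateExecutor {
    static boolean truncate(DynamicTableSink sink) {
        if (!(sink instanceof SupportsTruncate)) {
            // Message modeled on the wording proposed in the thread.
            throw new UnsupportedOperationException(
                    "The truncate statement for the table is not supported as "
                            + sink.asSummaryString()
                            + " hasn't implemented the interface SupportsTruncate");
        }
        return ((SupportsTruncate) sink).executeTruncation();
    }
}
```

With this shape, `TruncateExecutor.truncate(new InMemorySink())` clears the rows in the client process without starting a job, while passing an `AppendOnlySink` fails fast with the "not supported" message. A connector for which delete is impossible could still implement the interface and throw its own, more specific exception.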