Hi, Ron.

1: Since Flink already writes delete records to delete rows, it may not be so
strange for connector devs to have a DynamicTableSink implement
SupportsTruncate in order to truncate the table. Given the assumption that
DynamicTableSink is used for inserting/updating/deleting, I think it's
reasonable for DynamicTableSink to implement SupportsTruncate. That said, it
also sounds reasonable to add a generic interface like DynamicTable to
differentiate it from DynamicTableSource & DynamicTableSink. But that would
definitely require a lot of design and discussion, which deserves a dedicated
FLIP. I prefer not to do that in this FLIP to avoid overdesign, and I don't
think it's a must for this FLIP. We can discuss it if some day we do need the
new generic table interface.
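
To make point 1 concrete, here is a minimal sketch of a sink opting in to
truncation and of the framework-side check. The types below are hypothetical
stand-ins declared locally so that the snippet compiles on its own; they are
not Flink's real classes, and the boolean return type of `executeTruncation`
as well as the names InMemorySink, AppendOnlySink and TruncateDispatch are
assumptions made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the sink interface (not Flink's real class).
interface DynamicTableSink {
    String asSummaryString();
}

// Ability interface as discussed in this thread; the boolean return is an assumption.
interface SupportsTruncate {
    boolean executeTruncation();
}

// A toy in-memory sink that opts in to truncation.
class InMemorySink implements DynamicTableSink, SupportsTruncate {
    final List<String> rows = new ArrayList<>();

    @Override
    public String asSummaryString() {
        return "InMemorySink";
    }

    @Override
    public boolean executeTruncation() {
        rows.clear(); // delete all rows, keep the table itself
        return true;
    }
}

// A sink that has not implemented the ability.
class AppendOnlySink implements DynamicTableSink {
    @Override
    public String asSummaryString() {
        return "AppendOnlySink";
    }
}

class TruncateDispatch {
    // Framework-side check: truncate only if the sink opted in, otherwise fail loudly.
    static boolean truncate(DynamicTableSink sink) {
        if (sink instanceof SupportsTruncate) {
            return ((SupportsTruncate) sink).executeTruncation();
        }
        throw new UnsupportedOperationException(
                "The truncate statement for the table is not supported as it hasn't "
                        + "implemented the interface SupportsTruncate: "
                        + sink.asSummaryString());
    }
}
```

With this shape, a sink that cannot delete data could still implement the
interface and throw its own, more specific exception from `executeTruncation`.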

2: Given the variety of catalogs and tables, it's hard for Flink to perform
unified follow-up actions after truncating a table. But the external
connector can still perform such follow-up actions in the method
`executeTruncation`.
Btw, for Spark's new truncate table interface [1], Spark only recaches the
table after truncating it [2]. If Flink supported table caching at the
framework level, we could likewise recache at the framework level for the
TRUNCATE TABLE statement.

[1] 
https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
[2] 
https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala


I think the external catalog can implement such logic in the method
`executeTruncation`.
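
As a rough sketch of what "such logic" could look like, the snippet below
resets statistics and drops a cache entry inside `executeTruncation`.
FakeMetastore and MetastoreBackedSink are hypothetical names invented for this
example, and the locally declared SupportsTruncate only mirrors the interface
discussed here; none of this is a real Hive or Flink API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical ability interface, mirroring the one discussed in this thread.
interface SupportsTruncate {
    boolean executeTruncation();
}

// Hypothetical stand-in for an external metastore holding statistics and a cache.
class FakeMetastore {
    final Map<String, Long> rowCounts = new HashMap<>();
    final Map<String, Object> cache = new HashMap<>();
}

class MetastoreBackedSink implements SupportsTruncate {
    private final FakeMetastore metastore;
    private final String table;

    MetastoreBackedSink(FakeMetastore metastore, String table) {
        this.metastore = metastore;
        this.table = table;
    }

    @Override
    public boolean executeTruncation() {
        // Step 1: delete the data itself (connector-specific, omitted here).
        // Step 2: follow-up actions owned by the connector, not by the framework.
        metastore.rowCounts.put(table, 0L); // reset table statistics
        metastore.cache.remove(table);      // invalidate any cached entry
        return true;
    }
}
```

The point of the sketch is only that the follow-up steps live next to the
connector-specific deletion, so Flink does not need to know about them.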

Best regards,
Yuxia

----- Original Message -----
From: "liu ron" <ron9....@gmail.com>
To: "dev" <dev@flink.apache.org>
Sent: Tuesday, April 11, 2023 10:51:36 AM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Hi, xia
It's a nice improvement to support the TRUNCATE TABLE statement, making Flink
more feature-rich.
I think truncate is a command that will be executed in the client's process,
rather than launching a Flink job to execute on the cluster. So for the
user-facing interface, I think we should not make users implement the
SupportsTruncate interface on the DynamicTableSink interface. This seems a
bit strange and also confuses users; as Hang said, why does a source table
not support truncate? It would be nice if we could come up with a generic
interface that supports truncate instead of binding it to the
DynamicTableSink interface, and maybe in the future we will support more
commands like truncate.

In addition, after truncating data, we may also need to update the metadata
of the table. For a Hive table, for example, we need to update the
statistics as well as clear the cache in the metastore. I think we should
also consider these capabilities. Spark has considered these; refer to
https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573

Best,

Ron

On Tue, Apr 11, 2023 at 02:15, Jim Hughes <jhug...@confluent.io.invalid> wrote:

> Hi Yuxia,
>
> On Mon, Apr 10, 2023 at 10:35 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> wrote:
>
> > Hi, Jim.
> >
> > 1: I'm expecting all DynamicTableSinks to support it, but it's hard to
> > support them all in one shot. For DynamicTableSinks that haven't
> > implemented the SupportsTruncate interface, we'll throw an exception
> > like 'The truncate statement for the table is not supported as it hasn't
> > implemented the interface SupportsTruncate'. Also, a sink that doesn't
> > support deleting data can still implement it but throw a more concrete
> > exception like "xxx doesn't support truncating a table as delete is
> > impossible for xxx". It depends on the external connector's
> > implementation.
> > Thanks for your advice, I've added it to the FLIP.
> >
>
> Makes sense.
>
>
> > 2: What do you mean by "truncate an input to a streaming query"?
> > This FLIP aims to support the TRUNCATE TABLE statement, which truncates
> > a table. In which case would it interoperate with streaming queries?
> >
>
> Let's take a source like Kafka as an example.  Suppose I have an input
> topic Foo, and a query which uses it as an input.
>
> When Foo is truncated, if the truncation works as a delete and create, then
> the connector may need to be made aware (otherwise it may try to use
> offsets from the previous topic).  On the other hand, one may have to ask
> Kafka to delete records up to a certain point.
>
> Also, savepoints for the query may contain information from the truncated
> table.  Should this FLIP involve invalidating that information in some
> manner?  Or does truncating a source table for a query cause undefined
> behavior on that query?
>
> Basically, I'm trying to think through the implementations of a truncate
> operation to streaming sources and queries.
>
> Cheers,
>
> Jim
>
>
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "Jim Hughes" <jhug...@confluent.io.INVALID>
> > To: "dev" <dev@flink.apache.org>
> > Sent: Monday, April 10, 2023 9:32:28 PM
> > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> >
> > Hi Yuxia,
> >
> > Two questions:
> >
> > 1.  Are you expecting all DynamicTableSinks to support Truncate?  The
> > FLIP could use some explanation for what supporting and not supporting
> > the operation means.
> >
> > 2.  How will truncate interoperate with streaming queries?  That is, if I
> > truncate an input to a streaming query, is there any defined behavior?
> >
> > Cheers,
> >
> > Jim
> >
> > On Wed, Mar 22, 2023 at 9:13 AM yuxia <luoyu...@alumni.sjtu.edu.cn>
> > wrote:
> >
> > > Hi, devs.
> > >
> > > I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> > > statement [1].
> > >
> > > > The TRUNCATE TABLE statement is a SQL command that allows users to
> > > > quickly and efficiently delete all rows from a table without dropping
> > > > the table itself. This statement is commonly used in data warehouses,
> > > > where large data sets are frequently loaded into and unloaded from
> > > > tables.
> > > > So, this FLIP is meant to support the TRUNCATE TABLE statement. More
> > > > exactly, this FLIP will bring Flink the TRUNCATE TABLE syntax and an
> > > > interface with which the corresponding connectors can implement their
> > > > own logic for truncating a table.
> > > >
> > > > Looking forward to your feedback.
> > >
> > > [1]: [
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > |
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > ]
> > >
> > >
> > > Best regards,
> > > Yuxia
> > >
> >
>
