Hi, Aitozi.
Thanks for your input. I understand your concern. Although the external
connector can update the metadata in the method `executeTruncation`,
the Flink catalog may not be aware of the update in some cases. If the Hive
catalog only stores Hive tables, everything will be fine.
But if the Hive catalog also stores non-Hive tables, and the connector of a
non-Hive table can't update the underlying Hive metastore, then
the Hive catalog will still return stale metadata.
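To make this failure mode concrete, here is a toy model. It is a minimal sketch, not Flink or Hive code: `MetaCatalog`, `register`, `truncate` and the "connector updates metastore" flag are hypothetical names, where the flag stands for whether a table's connector is able to write back to the Hive metastore.

```java
import java.util.HashMap;
import java.util.Map;

class StaleCatalogDemo {
    public static void main(String[] args) {
        MetaCatalog catalog = new MetaCatalog();
        catalog.register("hive_orders", true, 100);   // connector can write back to the metastore
        catalog.register("kafka_orders", false, 100); // connector cannot
        catalog.truncate("hive_orders");
        catalog.truncate("kafka_orders");
        for (String t : new String[] {"hive_orders", "kafka_orders"}) {
            System.out.printf("%s: catalog says %d rows, storage has %d%n",
                    t, catalog.cachedRowCount(t), catalog.actualRowCount(t));
        }
    }
}

/** Hypothetical catalog caching a row-count statistic per table (not a real Flink/Hive API). */
class MetaCatalog {
    private final Map<String, Boolean> updatesMetastore = new HashMap<>();
    private final Map<String, Long> cachedRows = new HashMap<>();
    private final Map<String, Long> actualRows = new HashMap<>();

    void register(String table, boolean connectorUpdatesMetastore, long rows) {
        updatesMetastore.put(table, connectorUpdatesMetastore);
        cachedRows.put(table, rows);
        actualRows.put(table, rows);
    }

    /** Simulates a connector's truncation: storage is always emptied, metadata only sometimes. */
    void truncate(String table) {
        actualRows.put(table, 0L);
        if (updatesMetastore.get(table)) {
            cachedRows.put(table, 0L);
        }
        // Otherwise the cached statistic silently keeps its pre-truncation value.
    }

    long cachedRowCount(String table) { return cachedRows.get(table); }
    long actualRowCount(String table) { return actualRows.get(table); }
}
```

Running it reports 0 cached rows for `hive_orders` but still 100 for `kafka_orders`, although both tables are empty in storage.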

Since this problem is generic, it is not limited to the TRUNCATE TABLE
statement; it also applies to other statements such as INSERT, UPDATE/DELETE,
and other statements still on the way.
I think it deserves a dedicated discussion about what the Flink catalog
is for, or whether we need to introduce a new mechanism for it.


Best regards,
Yuxia

----- Original Message -----
From: "Aitozi" <gjying1...@gmail.com>
To: "dev" <dev@flink.apache.org>
Sent: Thursday, April 13, 2023 2:37:48 PM
Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement

Hi, xia
> if Flink supports a table cache at the framework level,
> we can also recache at the framework level for the TRUNCATE TABLE statement.

I think the Flink catalog currently already stores some statistics for the
table, e.g. after `ANALYZE TABLE`, the table's statistics are stored in the
catalog, but TRUNCATE TABLE will not correct these statistics.

I know it's hard for Flink to perform unified follow-up actions after
truncating a table. But I think we need to define a clear role for the
Flink catalog.
IMO, as a compute engine, it's hard for Flink to maintain the catalog
for tables of different storage systems by itself. So as more and more
`Executable` commands are introduced, the data in the catalog will drift
out of sync with the actual tables.
In this case, after truncation the following parts of the catalog may be
affected:

- the table/column statistics will no longer be correct
- the partitions of this table should be cleared
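The two bullets above can be read as a pure function from the catalog's view of a table before truncation to its view afterwards. The sketch below assumes a hypothetical `TableMeta` snapshot (row count plus partition specs such as `dt=2023-04-13`) and a hypothetical `PostTruncate.apply` hook; whether such a hook belongs in the framework or in the connector is exactly the open question in this thread.

```java
import java.util.List;

class PostTruncateDemo {
    public static void main(String[] args) {
        TableMeta before = new TableMeta("orders", 1_000L, List.of("dt=2023-04-12", "dt=2023-04-13"));
        TableMeta after = PostTruncate.apply(before);
        System.out.println("before: " + before);
        System.out.println("after:  " + after);
    }
}

/** Hypothetical snapshot of what a catalog keeps for one table (not Flink's CatalogTable). */
final class TableMeta {
    final String name;
    final long rowCount;            // e.g. computed by an earlier ANALYZE TABLE
    final List<String> partitions;  // partition specs such as "dt=2023-04-13"

    TableMeta(String name, long rowCount, List<String> partitions) {
        this.name = name;
        this.rowCount = rowCount;
        this.partitions = List.copyOf(partitions); // immutable defensive copy
    }

    @Override
    public String toString() {
        return name + "{rows=" + rowCount + ", partitions=" + partitions + "}";
    }
}

/** Hypothetical hook covering the two catalog items that truncation invalidates. */
final class PostTruncate {
    private PostTruncate() {}

    static TableMeta apply(TableMeta before) {
        // Statistics: the table is empty right after truncation, so reset the row count to 0
        // instead of keeping the stale value.
        // Partitions: no partition holds data any more, so drop all partition entries.
        return new TableMeta(before.name, 0L, List.of());
    }
}
```

Resetting the row count to 0 is one possible choice; marking the statistics as unknown would be the more conservative option if writes can race with the truncation.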


Best,
Aitozi.


liu ron <ron9....@gmail.com> wrote on Thu, Apr 13, 2023 at 11:28:

>
> Hi, xia
>
> Thanks for your explanation. For the first question, given the current
> status, I think we can provide the generic interface in the future if we
> need it. For the second question, it makes sense to me if we can
> support the table cache at the framework level.
>
> Best,
> Ron
>
> yuxia <luoyu...@alumni.sjtu.edu.cn> wrote on Tue, Apr 11, 2023 at 16:12:
>
> > Hi, ron.
> >
> > 1: Considering that, for deleting rows, Flink also writes delete records
> > to achieve the purpose of deleting data, it may not be so strange for
> > connector devs to make DynamicTableSink implement SupportsTruncate to
> > support truncating the table. Based on the assumption that
> > DynamicTableSink is used for inserting/updating/deleting, I think it's
> > reasonable for DynamicTableSink to implement SupportsTruncate. I also
> > think it sounds reasonable to add a generic interface like DynamicTable,
> > separate from DynamicTableSource & DynamicTableSink. But it will
> > definitely require much design and discussion, which deserves a
> > dedicated FLIP. I prefer not to do that in this FLIP to avoid
> > overdesign, and I think it's not a must for this FLIP. Maybe we can
> > discuss it some day if we do need the new generic table interface.
> >
> > 2: Given the variety of catalogs and tables, it's hard for Flink to
> > perform unified follow-up actions after truncating a table. But the
> > external connector can still perform such follow-up actions in the method
> > `executeTruncation`.
> > Btw, in Spark, for the new truncate table interface[1], Spark only
> > recaches the table after truncating it[2]. I think if Flink supports a
> > table cache at the framework level, we can also recache at the framework
> > level for the TRUNCATE TABLE statement.
> >
> > [1]
> > https://github.com/apache/spark/blob/1a42aa5bd44e7524bb55463bbd85bea782715834/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TruncatableTable.java
> > [2]
> > https://github.com/apache/spark/blob/06c09a79b371c5ac3e4ebad1118ed94b460f48d1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/TruncateTableExec.scala
> >
> >
> > I think the external catalog can implement such logic in the method
> > `executeTruncation`.
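A connector-side version of these follow-up actions, as a minimal sketch: `SupportsTruncate` below is a simplified local stand-in for the interface proposed in the FLIP (a single no-argument `executeTruncation()` is assumed), and `InMemorySink`, its statistics map and its cache flag are hypothetical. What it illustrates is the ordering: delete the data, refresh the metadata the connector owns, then invalidate cached state, similar in spirit to Spark recaching the table after truncation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConnectorTruncateDemo {
    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink(List.of("a", "b", "c"));
        sink.executeTruncation();
        System.out.println("rows=" + sink.rows() + ", stats=" + sink.stats()
                + ", cacheValid=" + sink.cacheValid());
    }
}

/** Simplified stand-in for the interface proposed in FLIP-302 (assumed shape). */
interface SupportsTruncate {
    void executeTruncation();
}

/** Hypothetical connector that owns its data, its statistics and a read cache. */
class InMemorySink implements SupportsTruncate {
    private final List<String> rows;
    private final Map<String, Long> stats = new HashMap<>();
    private boolean cacheValid = true;

    InMemorySink(List<String> initialRows) {
        this.rows = new ArrayList<>(initialRows);
        stats.put("rowCount", (long) rows.size());
    }

    @Override
    public void executeTruncation() {
        rows.clear();               // 1. delete the data
        stats.put("rowCount", 0L);  // 2. refresh metadata the connector owns
        cacheValid = false;         // 3. invalidate cached state so readers reload
    }

    List<String> rows() { return rows; }
    Map<String, Long> stats() { return stats; }
    boolean cacheValid() { return cacheValid; }
}
```

Note that this only fixes metadata the connector itself controls; it does not help when the catalog holding the table cannot be updated by the connector.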
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "liu ron" <ron9....@gmail.com>
> > To: "dev" <dev@flink.apache.org>
> > Sent: Tuesday, April 11, 2023 10:51:36 AM
> > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> >
> > Hi, xia
> > It's a nice improvement to support the TRUNCATE TABLE statement, making
> > Flink more feature-rich.
> > I think the truncate syntax is a command that will be executed in the
> > client's process, rather than launching a Flink job on the cluster. So
> > for the user-facing interface, I think we should not ask users to
> > implement the SupportsTruncate interface on DynamicTableSink. This seems
> > a bit strange and also confuses users: as Hang said, why doesn't a source
> > table support truncate? It would be nice if we could come up with a
> > generic interface that supports truncate instead of binding it to the
> > DynamicTableSink interface, since in the future we may support more
> > commands like the truncate command.
> >
> > In addition, after truncating data, we may also need to update the
> > metadata of the table. For example, for a Hive table we need to update the
> > statistics as well as clear the cache in the metastore. I think we should
> > also consider these capabilities; Spark has considered them, see
> >
> > https://github.com/apache/spark/blob/69dd20b5e45c7e3533efbfdc1974f59931c1b781/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L573
> > .
> >
> > Best,
> >
> > Ron
> >
> > Jim Hughes <jhug...@confluent.io.invalid> wrote on Tue, Apr 11, 2023 at 02:15:
> >
> > > Hi Yuxia,
> > >
> > > On Mon, Apr 10, 2023 at 10:35 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> > >
> > > > Hi, Jim.
> > > >
> > > > 1: I'm expecting all DynamicTableSinks to support it, but it's hard to
> > > > support all of them in one shot. For the DynamicTableSinks that haven't
> > > > implemented the SupportsTruncate interface, we'll throw an exception
> > > > like 'The truncate statement for the table is not supported as it
> > > > hasn't implemented the interface SupportsTruncate'. Also, sinks that
> > > > don't support deleting data can still implement it, but throw a more
> > > > concrete exception like "xxx doesn't support truncating a table as
> > > > delete is impossible for xxx". It depends on the external connector's
> > > > implementation.
> > > > Thanks for your advice, I've updated the FLIP accordingly.
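The two failure paths described above can be sketched as a small dispatch routine. This is a hedged illustration, not the planner code from the FLIP: `DynamicSink` is a marker stand-in for `DynamicTableSink`, `SupportsTruncate` is a simplified stand-in with an assumed no-argument `executeTruncation()`, and `TruncateDispatcher` plus the three sink classes are hypothetical.

```java
class TruncateDispatchDemo {
    public static void main(String[] args) {
        DynamicSink[] sinks = {new DeletableSink(), new AppendOnlySink(), new LegacySink()};
        for (DynamicSink sink : sinks) {
            String name = sink.getClass().getSimpleName();
            try {
                TruncateDispatcher.truncate("t_" + name, sink);
                System.out.println(name + ": truncated");
            } catch (UnsupportedOperationException e) {
                System.out.println(name + ": " + e.getMessage());
            }
        }
    }
}

/** Marker stand-in for DynamicTableSink. */
interface DynamicSink {}

/** Simplified stand-in for the proposed SupportsTruncate ability interface. */
interface SupportsTruncate {
    void executeTruncation();
}

/** Hypothetical framework-side dispatch. */
final class TruncateDispatcher {
    private TruncateDispatcher() {}

    static void truncate(String tableName, DynamicSink sink) {
        if (!(sink instanceof SupportsTruncate)) {
            // Path 1: the connector has not implemented the interface at all.
            throw new UnsupportedOperationException(String.format(
                    "The truncate statement for the table '%s' is not supported as it hasn't "
                            + "implemented the interface SupportsTruncate.", tableName));
        }
        // Path 2: the connector decides; it may still reject with a more concrete message.
        ((SupportsTruncate) sink).executeTruncation();
    }
}

/** A sink that can delete data. */
class DeletableSink implements DynamicSink, SupportsTruncate {
    boolean truncated;

    @Override
    public void executeTruncation() { truncated = true; }
}

/** A sink that implements the interface but cannot delete, so it rejects with a reason. */
class AppendOnlySink implements DynamicSink, SupportsTruncate {
    @Override
    public void executeTruncation() {
        throw new UnsupportedOperationException(
                "AppendOnlySink doesn't support truncating a table as delete is impossible for it.");
    }
}

/** A sink that has not implemented SupportsTruncate yet. */
class LegacySink implements DynamicSink {}
```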
> > > >
> > >
> > > Makes sense.
> > >
> > >
> > > > 2: What do you mean by "truncate an input to a streaming query"?
> > > > This FLIP aims to support the TRUNCATE TABLE statement, which is for
> > > > truncating a table. In which case would it interoperate with streaming
> > > > queries?
> > > >
> > >
> > > Let's take a source like Kafka as an example. Suppose I have an input
> > > topic Foo and a query which uses it as an input.
> > >
> > > When Foo is truncated, if the truncation works as a delete and create,
> > > then the connector may need to be made aware of it (otherwise it may try
> > > to use offsets from the previous topic). On the other hand, one may have
> > > to ask Kafka to delete records up to a certain point.
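Why the two strategies differ for a running query can be seen in a toy model of one partition's offset range. It does not use the Kafka client: `PartitionLog` and its methods are hypothetical. `deleteRecordsBefore` imitates what Kafka's admin client offers through `Admin#deleteRecords` with `RecordsToDelete.beforeOffset`, while `recreate` models delete-and-create. The assumed saved offset (42) stands for what a query might carry in a savepoint.

```java
class OffsetAliasingDemo {
    public static void main(String[] args) {
        long savedOffset = 42; // offset the query had reached before the truncation

        PartitionLog recreated = new PartitionLog(100);
        recreated.recreate();  // truncation as delete + create
        recreated.append(50);  // new data arrives afterwards

        PartitionLog trimmed = new PartitionLog(100);
        trimmed.deleteRecordsBefore(100); // truncation as "delete records up to the end"
        trimmed.append(50);

        System.out.println("recreated: offset 42 in range? " + recreated.inRange(savedOffset));
        System.out.println("trimmed:   offset 42 in range? " + trimmed.inRange(savedOffset));
    }
}

/** Hypothetical model of a single partition's offset range [logStart, logEnd). */
class PartitionLog {
    private long logStart;
    private long logEnd;

    PartitionLog(long initialRecords) {
        this.logStart = 0;
        this.logEnd = initialRecords;
    }

    void append(long n) { logEnd += n; }

    /** Delete-and-create: a brand-new topic starts numbering from 0 again. */
    void recreate() { logStart = 0; logEnd = 0; }

    /** Advance the log start; offsets keep growing monotonically. */
    void deleteRecordsBefore(long offset) {
        logStart = Math.max(logStart, Math.min(offset, logEnd));
    }

    /** Whether a consumer resuming at this offset would find a record there. */
    boolean inRange(long offset) { return offset >= logStart && offset < logEnd; }

    long logStart() { return logStart; }
    long logEnd() { return logEnd; }
}
```

After delete-and-create, offset 42 looks valid but refers to a record written after the truncation; after deleting records it falls below the log start, so the connector can at least detect it.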
> > >
> > > Also, savepoints for the query may contain information from the truncated
> > > table.  Should this FLIP involve invalidating that information in some
> > > manner?  Or does truncating a source table for a query cause undefined
> > > behavior on that query?
> > >
> > > Basically, I'm trying to think through the implications of a truncate
> > > operation for streaming sources and queries.
> > >
> > > Cheers,
> > >
> > > Jim
> > >
> > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > ----- Original Message -----
> > > > From: "Jim Hughes" <jhug...@confluent.io.INVALID>
> > > > To: "dev" <dev@flink.apache.org>
> > > > Sent: Monday, April 10, 2023 9:32:28 PM
> > > > Subject: Re: [DISCUSS] FLIP-302: Support TRUNCATE TABLE statement
> > > >
> > > > Hi Yuxia,
> > > >
> > > > Two questions:
> > > >
> > > > 1.  Are you expecting all DynamicTableSinks to support Truncate? The
> > > > FLIP could use some explanation of what supporting and not supporting
> > > > the operation means.
> > > >
> > > > 2.  How will truncate interoperate with streaming queries? That is, if I
> > > > truncate an input to a streaming query, is there any defined behavior?
> > > >
> > > > Cheers,
> > > >
> > > > Jim
> > > >
> > > > On Wed, Mar 22, 2023 at 9:13 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> > > >
> > > > > Hi, devs.
> > > > >
> > > > > I'd like to start a discussion about FLIP-302: Support TRUNCATE TABLE
> > > > > statement [1].
> > > > >
> > > > > The TRUNCATE TABLE statement is a SQL command that allows users to
> > > > > quickly and efficiently delete all rows from a table without dropping
> > > > > the table itself. This statement is commonly used in data warehouses,
> > > > > where large data sets are frequently loaded into and unloaded from
> > > > > tables.
> > > > > So, this FLIP is meant to support the TRUNCATE TABLE statement. More
> > > > > precisely, this FLIP will bring Flink the TRUNCATE TABLE syntax and an
> > > > > interface with which the corresponding connectors can implement their
> > > > > own logic for truncating a table.
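As a rough illustration of the syntax half of the proposal, the sketch below recognises `TRUNCATE TABLE [catalog.][database.]table` and resolves a partially qualified name against the current catalog and database. It is a regex toy with hypothetical names (`TruncateParser`, `parse`); Flink's real SQL parser is built on Calcite and handles quoting and escaping that this ignores.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TruncateSyntaxDemo {
    public static void main(String[] args) {
        System.out.println(TruncateParser.parse("TRUNCATE TABLE orders", "default_catalog", "default_database"));
        System.out.println(TruncateParser.parse("truncate table hive.sales.orders;", "default_catalog", "default_database"));
        System.out.println(TruncateParser.parse("SELECT 1", "default_catalog", "default_database"));
    }
}

/** Hypothetical toy parser; not Flink's Calcite-based SQL parser. */
final class TruncateParser {
    // Up to three dot-separated identifiers, optional trailing semicolon, case-insensitive keywords.
    private static final Pattern STMT = Pattern.compile(
            "^\\s*TRUNCATE\\s+TABLE\\s+([A-Za-z_]\\w*(?:\\.[A-Za-z_]\\w*){0,2})\\s*;?\\s*$",
            Pattern.CASE_INSENSITIVE);

    private TruncateParser() {}

    /** Returns [catalog, database, table] if the statement is a TRUNCATE TABLE, else empty. */
    static Optional<List<String>> parse(String sql, String currentCatalog, String currentDb) {
        Matcher m = STMT.matcher(sql);
        if (!m.matches()) {
            return Optional.empty();
        }
        String[] parts = m.group(1).split("\\.");
        switch (parts.length) {
            case 1:
                return Optional.of(List.of(currentCatalog, currentDb, parts[0]));
            case 2:
                return Optional.of(List.of(currentCatalog, parts[0], parts[1]));
            default:
                return Optional.of(Arrays.asList(parts));
        }
    }
}
```

The resolved three-part path is what a planner would then use to look up the table and its sink before calling into the connector.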
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-302%3A+Support+TRUNCATE+TABLE+statement
> > > > >
> > > > >
> > > > > Best regards,
> > > > > Yuxia
> > > > >
> > > >
> > >
> >
