Hi Jingsong, thanks for the proposal. Providing a built-in storage solution for users will make Flink SQL much easier to use in production.
I have some questions which may have been missed in the FLIP, but may be important IMO:

1. Is it possible to read historical data from the file store first and then fetch new changes from the log store? Something like a hybrid source, but I think we need a mechanism to get exactly-once semantics.
2. How would the built-in table be persisted in the Catalog?
3. Currently a catalog can provide a default table factory, which is used as the top-priority factory. What would happen once the built-in storage's default factory is introduced?

On Wed, 20 Oct 2021 at 19:35, Ingo Bürk <i...@ververica.com> wrote:

> Hi Jingsong,
>
> thank you for writing up the proposal. The benefits such a mechanism will
> bring will be very valuable! I haven't yet looked into this in detail, but
> one question came to my mind immediately:
>
> The DDL for these tables seems to rely on there not being a 'connector'
> option. However, catalogs can provide a custom factory, and thus tables
> don't necessarily need to contain such an option already today. How will
> this interact / work with catalogs? I think there are more points regarding
> interaction with catalogs, e.g. if tables are dropped externally rather
> than through Flink SQL DDL, how would Flink be able to remove the physical
> storage for them?
>
>
> Best
> Ingo
>
> On Wed, Oct 20, 2021 at 11:14 AM Jingsong Li <jingsongl...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Kurt and I propose to introduce built-in storage support for dynamic
> > tables: a truly unified changelog & table representation, from Flink
> > SQL's perspective. We believe this kind of storage will improve
> > usability a lot.
> >
> > We want to highlight some characteristics of this storage:
> >
> > - It's a built-in storage for Flink SQL
> > ** Resolves usability issues
> > ** Flink DDL is no longer just a mapping, but a real creation of these
> > tables
> > ** Masks & abstracts the underlying technical details, no annoying
> > options
> >
> > - Supports subsecond streaming write & consumption
> > ** It could be backed by a service-oriented message queue (like Kafka)
> > ** High-throughput scan capability
> > ** A filesystem with columnar formats would be an ideal choice, just
> > like Iceberg/Hudi do
> >
> > - More importantly, in order to lower the cognitive bar, the storage
> > needs to automatically handle various Insert/Update/Delete inputs and
> > table definitions
> > ** Receives any type of changelog
> > ** Tables can have a primary key or no primary key
> >
> > Looking forward to your feedback.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage
> >
> > Best,
> > Jingsong Lee
> >
>
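[Editor's note] To make the "no 'connector' option" discussion above concrete, the DDL distinction being debated could be sketched roughly as follows. This is an illustrative sketch only; the table names are made up, and the exact managed-table syntax is defined by FLIP-188 itself:

```sql
-- Today: a Flink SQL DDL is only a mapping; the 'connector' option points
-- at externally managed storage.
CREATE TABLE ordinary_table (
  id BIGINT,
  name STRING
) WITH (
  'connector' = 'kafka'  -- explicit mapping to external storage
);

-- Under the proposal: omitting 'connector' would create a managed table
-- whose physical storage (log store + file store) Flink provisions itself.
CREATE TABLE managed_table (
  id BIGINT PRIMARY KEY NOT ENFORCED,  -- primary key is optional
  name STRING
);
```

Ingo's question then amounts to: if a catalog already supplies its own factory for tables without a 'connector' option, how do the two behaviors coexist?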