Hi Jingsong, thanks for the proposal. Providing a built-in storage solution
for users will make Flink SQL much easier to use in production.

I have some questions which may not be covered in the FLIP yet, but which
are important IMO:
1. Is it possible to read historical data from the file store first and
then fetch new changes from the log store? Something like a hybrid source,
but I think we need a mechanism to guarantee exactly-once semantics. (A
rough sketch of the idea follows after these questions.)
2. How would the built-in table be persisted in the Catalog?
3. Currently a catalog can provide a default table factory, which is used
as the top-priority factory. What would happen after the built-in default
factory is introduced?
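
To make question 1 concrete, below is a minimal sketch assuming Flink
1.14's HybridSource (FLIP-150). The file path, topic, and switch timestamp
are hypothetical placeholders; by itself this handoff does not give
exactly-once semantics, which is exactly the gap I mean:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.base.source.hybrid.HybridSource;
    import org.apache.flink.connector.file.src.FileSource;
    import org.apache.flink.connector.file.src.reader.TextLineFormat;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class HybridReadSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // Bounded history from the file store (path is made up).
            FileSource<String> fileStore = FileSource
                    .forRecordStreamFormat(new TextLineFormat(),
                            new Path("/path/to/file-store"))
                    .build();

            // Where the snapshot ends; this would have to come from the
            // file store's metadata for the handoff to be exactly-once.
            long switchTimestamp = 0L;

            // Unbounded changes from the log store (topic is made up).
            KafkaSource<String> logStore = KafkaSource.<String>builder()
                    .setBootstrapServers("localhost:9092")
                    .setTopics("table-changelog")
                    .setStartingOffsets(
                            OffsetsInitializer.timestamp(switchTimestamp + 1))
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            // Read history first, then switch to the changelog.
            HybridSource<String> hybrid =
                    HybridSource.builder(fileStore).addSource(logStore).build();

            env.fromSource(hybrid, WatermarkStrategy.noWatermarks(),
                    "hybrid-table-source").print();
            env.execute("hybrid read sketch");
        }
    }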

On Wed, 20 Oct 2021 at 19:35, Ingo Bürk <i...@ververica.com> wrote:

> Hi Jingsong,
>
> thank you for writing up the proposal. The benefits such a mechanism
> would bring are very valuable! I haven't yet looked into this in detail, but
> one question came to my mind immediately:
>
> The DDL for these tables seems to rely on there not being a 'connector'
> option. However, catalogs can provide a custom factory, and thus tables
> don't necessarily need to contain such an option already today (sketched
> below). How will this interact with catalogs? I think there are more
> points regarding interaction with catalogs, e.g. if tables are dropped
> externally rather than through Flink SQL DDL, how would Flink be able to
> remove the physical storage for them?
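>
> To make that concrete, here is a minimal sketch; the catalog and table
> names are made up, and I'm assuming a catalog whose getFactory() supplies
> a custom factory:
>
>     import org.apache.flink.table.api.EnvironmentSettings;
>     import org.apache.flink.table.api.TableEnvironment;
>
>     public class NoConnectorOptionSketch {
>         public static void main(String[] args) {
>             TableEnvironment tEnv = TableEnvironment.create(
>                     EnvironmentSettings.newInstance().inStreamingMode().build());
>             // Assume my_custom_catalog was registered and provides its
>             // own table factory via Catalog#getFactory().
>             tEnv.executeSql("USE CATALOG my_custom_catalog");
>             // No 'connector' option, yet today this table is resolved by
>             // the catalog's factory, not by the proposed built-in storage.
>             tEnv.executeSql(
>                     "CREATE TABLE orders (order_id BIGINT, amount DOUBLE)");
>         }
>     }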
>
>
> Best
> Ingo
>
> On Wed, Oct 20, 2021 at 11:14 AM Jingsong Li <jingsongl...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > Kurt and I propose to introduce built-in storage support for dynamic
> > tables [1]: a truly unified changelog & table representation from Flink
> > SQL’s perspective. We believe this kind of storage will greatly improve
> > usability.
> >
> > We want to highlight some characteristics about this storage:
> >
> > - It’s a built-in storage for Flink SQL
> > ** Resolves usability issues
> > ** Flink DDL is no longer just a mapping, but a real creation of these
> > tables (see the sketch right below)
> > ** Masks & abstracts the underlying technical details, no annoying options
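> >
> > As a hedged sketch of what such a DDL could look like (table and column
> > names are made up, and the exact syntax reflects this FLIP's direction,
> > not any released API):
> >
> >     import org.apache.flink.table.api.EnvironmentSettings;
> >     import org.apache.flink.table.api.TableEnvironment;
> >
> >     public class BuiltinDdlSketch {
> >         public static void main(String[] args) {
> >             TableEnvironment tEnv = TableEnvironment.create(
> >                     EnvironmentSettings.newInstance().inStreamingMode().build());
> >             // No 'connector' and no other options: this DDL would
> >             // actually create the table and its storage, not just map
> >             // to an external system.
> >             tEnv.executeSql(
> >                     "CREATE TABLE user_clicks ("
> >                             + "  user_id BIGINT,"
> >                             + "  url STRING,"
> >                             + "  ts TIMESTAMP(3))");
> >         }
> >     }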
> >
> > - Supports subsecond streaming write & consumption
> > ** It could be backed by a service-oriented message queue (like Kafka)
> > ** High-throughput scan capability
> > ** A filesystem with columnar formats would be an ideal choice, just as
> > Iceberg/Hudi do
> >
> > - More importantly, in order to lower the cognitive bar, the storage
> > needs to automatically handle various Insert/Update/Delete inputs and
> > table definitions (see the sketch after this list)
> > ** Receives any type of changelog
> > ** Tables can have a primary key or no primary key
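> >
> > As a minimal sketch of the primary-key case (table names are made up,
> > and 'transactions' is assumed to exist already): the aggregation below
> > produces an update-before/update-after changelog, which the built-in
> > storage would absorb directly:
> >
> >     import org.apache.flink.table.api.EnvironmentSettings;
> >     import org.apache.flink.table.api.TableEnvironment;
> >
> >     public class ChangelogSketch {
> >         public static void main(String[] args) {
> >             TableEnvironment tEnv = TableEnvironment.create(
> >                     EnvironmentSettings.newInstance().inStreamingMode().build());
> >             // Hypothetical built-in table with a primary key.
> >             tEnv.executeSql(
> >                     "CREATE TABLE user_balances ("
> >                             + "  user_id BIGINT,"
> >                             + "  balance DECIMAL(18, 2),"
> >                             + "  PRIMARY KEY (user_id) NOT ENFORCED)");
> >             // The grouped aggregation emits updates per key; with a
> >             // primary key the storage can upsert, without one it would
> >             // have to retain the full changelog.
> >             tEnv.executeSql(
> >                     "INSERT INTO user_balances "
> >                             + "SELECT user_id, SUM(amount) "
> >                             + "FROM transactions GROUP BY user_id");
> >         }
> >     }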
> >
> > Looking forward to your feedback.
> >
> > [1]
> >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-188%3A+Introduce+Built-in+Dynamic+Table+Storage
> >
> > Best,
> > Jingsong Lee
> >
>
