Hi Wenlong, thanks a lot for the comments.

1) I agree we can infer the table type from the queries if the Flink job is
static. However, for SQL client cases, the query is ad hoc, dynamic, and not
known beforehand. In such cases, we might want to enforce the table open
mode at startup time, so users won't accidentally write to a Kafka topic
that is supposed to be written only by producers outside of the Flink world.
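For illustration, here is a minimal sketch of what I mean, using the proposed
syntax (the table names are made up for the example):

-- declared as source-only at creation time
CREATE SOURCE TABLE UserIpSourceTable (
  intField INTEGER,
  stringField VARCHAR(128)
);

-- a later ad-hoc statement in the SQL client that tries to write to the table
-- would then be rejected at validation time, because the table is source-only
INSERT INTO UserIpSourceTable
SELECT intField, stringField FROM SomeOtherTable;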
2) As in [1], format and connector are currently first-class concepts in the
Flink Table API, and they are required by most table creations, so I think
giving them dedicated keywords makes the DDL more organized and readable. But
I do agree that a flattened key-value list is simpler for the parser and
easier to extend. So maybe something like the following makes more sense:

CREATE SOURCE TABLE Kafka10SourceTable (
  intField INTEGER,
  stringField VARCHAR(128) COMMENT 'User IP address',
  longField BIGINT,
  rowTimeField TIMESTAMP
  TIMESTAMPS FROM 'longField'
  WATERMARKS PERIODIC-BOUNDED WITH DELAY '60'
)
COMMENT 'Kafka Source Table of topic user_ip_address'
CONNECTOR (
  type = 'kafka',
  property-version = '1',
  version = '0.10',
  properties.topic = 'test-kafka-topic',
  properties.startup-mode = 'latest-offset',
  properties.specific-offset = 'offset'
)
FORMAT (
  format.type = 'json',
  format.properties.version = '1',
  format.derive-schema = 'true'
)

Shuyi

[1]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit#heading=h.41fd6rs7b3cf

On Sun, Nov 4, 2018 at 7:15 PM wenlong.lwl <wenlong88....@gmail.com> wrote:

> Hi, Shuyi, thanks for the proposal.
>
> I have two concerns about the table ddl:
>
> 1. How about removing the source/sink mark from the DDL? It is not
> necessary: the framework can determine whether the referred table is a source
> or a sink from the context of the query using the table. It would be more
> convenient for users defining a table that can be both a source and a sink,
> and more convenient for the catalog to persist and manage the meta info.
>
> 2. How about just keeping one pure string map as the parameters for a table,
> like:
> create table Kafka10SourceTable (
> intField INTEGER,
> stringField VARCHAR(128),
> longField BIGINT,
> rowTimeField TIMESTAMP
> ) with (
> connector.type = 'kafka',
> connector.property-version = '1',
> connector.version = '0.10',
> connector.properties.topic = 'test-kafka-topic',
> connector.properties.startup-mode = 'latest-offset',
> connector.properties.specific-offset = 'offset',
> format.type = 'json',
> format.properties.version = '1',
> format.derive-schema = 'true'
> );
> Because:
> 1. In TableFactory, what the user works with is a string map of properties,
> so defining parameters as a string map is the closest match to how the user
> actually uses them.
> 2. The table descriptor can be extended by users, like what is done for Kafka
> and Json. It means that the parameter keys in connector or format can differ
> between implementations, and we cannot restrict the keys to a specified set,
> so we would need a map in the connector scope and a map in the
> connector.properties scope. Why not just give users a single map and let them
> put parameters in a format they like, which is also the simplest way to
> implement the DDL parser?
> 3. Whether we can define a format clause or not depends on the
> implementation of the connector. Using a separate clause in the DDL may
> create the misunderstanding that connectors can be combined with arbitrary
> formats, which may not actually work.
>
> On Sun, 4 Nov 2018 at 18:25, Dominik Wosiński <wos...@gmail.com> wrote:
>
> > +1,  Thanks for the proposal.
> >
> > I guess this is a long-awaited change. This can vastly increase the
> > functionalities of the SQL Client, as it will be possible to use complex
> > extensions, for example those provided by Apache Bahir [1].
> >
> > Best Regards,
> > Dom.
> >
> > [1]
> > https://github.com/apache/bahir-flink
> >
> > sob., 3 lis 2018 o 17:17 Rong Rong <walter...@gmail.com> napisał(a):
> >
> > > +1. Thanks for putting the proposal together Shuyi.
> > >
> > > DDL has been brought up a couple of times previously [1,2]. Utilizing
> > > DDL will definitely be a great extension to the current Flink SQL to
> > > systematically support some of the previously brought up features such
> > > as [3]. And it will also be beneficial to see the document closely
> > > aligned with the previous discussion for the unified SQL connector API [4].
> > >
> > > I also left a few comments on the doc. Looking forward to the alignment
> > > with the other couple of efforts and contributing to them!
> > >
> > > Best,
> > > Rong
> > >
> > > [1]
> > > http://mail-archives.apache.org/mod_mbox/flink-dev/201805.mbox/%3CCAMZk55ZTJA7MkCK1Qu4gLPu1P9neqCfHZtTcgLfrFjfO4Xv5YQ%40mail.gmail.com%3E
> > > [2]
> > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3CDC070534-0782-4AFD-8A85-8A82B384B8F7%40gmail.com%3E
> > > [3] https://issues.apache.org/jira/browse/FLINK-8003
> > > [4]
> > > http://mail-archives.apache.org/mod_mbox/flink-dev/201810.mbox/%3c6676cb66-6f31-23e1-eff5-2e9c19f88...@apache.org%3E
> > >
> > >
> > > On Fri, Nov 2, 2018 at 10:22 AM Bowen Li <bowenl...@gmail.com> wrote:
> > >
> > > > Thanks Shuyi!
> > > >
> > > > I left some comments there. I think the design of SQL DDL and Flink-Hive
> > > > integration/External catalog enhancements will work closely with each
> > > > other. Hope we are well aligned on the directions of the two designs,
> > > > and I look forward to working with you guys on both!
> > > >
> > > > Bowen
> > > >
> > > >
> > > > On Thu, Nov 1, 2018 at 10:57 PM Shuyi Chen <suez1...@gmail.com> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > SQL DDL support has been a long-time ask from the community. Current
> > > > > Flink SQL supports only DML (e.g. SELECT and INSERT statements). In its
> > > > > current form, Flink SQL users still need to define/create table sources
> > > > > and sinks programmatically in Java/Scala. Also, without DDL support, the
> > > > > current SQL Client implementation does not allow dynamic creation of
> > > > > tables, types or functions with SQL, which adds friction for its
> > > > > adoption.
> > > > >
> > > > > I drafted a design doc [1] with a few other community members that
> > > > > proposes the design and implementation for adding DDL support in Flink.
> > > > > The initial design considers DDL for table, view, type, library and
> > > > > function. It will be great to get feedback on the design from the
> > > > > community, and to align with the latest efforts in the unified SQL
> > > > > connector API [2] and the Flink-Hive integration [3].
> > > > >
> > > > > Any feedback is highly appreciated.
> > > > >
> > > > > Thanks
> > > > > Shuyi Chen
> > > > >
> > > > > [1]
> > > > > https://docs.google.com/document/d/1TTP-GCC8wSsibJaSUyFZ_5NBAHYEB1FVmPpP7RgDGBA/edit?usp=sharing
> > > > > [2]
> > > > > https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit?usp=sharing
> > > > > [3]
> > > > > https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing
> > > > > --
> > > > > "So you have to trust that the dots will somehow connect in your
> > > future."
> > > > >
> > > >
> > >
> >
>


-- 
"So you have to trust that the dots will somehow connect in your future."
