Re: [DISCUSS]FLIP-163: SQL Client Improvements

Jark Wu Thu, 04 Feb 2021 01:50:40 -0800

Hi Ingo,

We have supported the WITH syntax and the SET command since v1.9 [1][2] and
have never received such complaints, so I think this difference is fine.


Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also requires
string literal keys [3], and Hive's SET <key>=<value> doesn't allow quoted
keys [4].
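
To make the difference concrete, here is a minimal Python sketch (not Flink
code) of a light-weight client-side parser that accepts unquoted keys in SET
and passes everything else on. The two regexes, the `split_statements` and
`classify` helpers, and the naive split on `;` are assumptions made for
illustration only; a real implementation would also have to handle semicolons
inside string literals and comments.

```python
import re

# Hypothetical patterns for client-side commands; not taken from Flink.
SET_PATTERN = re.compile(r"^SET\s+([\w.\-]+)\s*=\s*(.+)$", re.IGNORECASE)
ADD_JAR_PATTERN = re.compile(r"^ADD\s+JAR\s+(\S+)$", re.IGNORECASE)


def split_statements(script):
    """Naively split a script on ';' and drop empty fragments."""
    return [part.strip() for part in script.split(";") if part.strip()]


def classify(statement):
    """Return (kind, payload); unmatched statements go to the SQL parser."""
    match = SET_PATTERN.match(statement)
    if match:
        return ("SET", (match.group(1), match.group(2).strip()))
    match = ADD_JAR_PATTERN.match(statement)
    if match:
        return ("ADD_JAR", match.group(1))
    return ("SQL", statement)


script = """
SET table.exec.mini-batch.enabled = true;
ADD JAR /local/my-home/test.jar;
CREATE TABLE t (a INT) WITH ('connector' = 'datagen');
"""
for stmt in split_statements(script):
    print(classify(stmt))
```

The SET key is matched as plain text, so the "-" in "mini-batch" needs no
quoting, while the key inside WITH (...) stays a string literal that is left
to the SQL parser.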

Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
[2]:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
[3]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
[4]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
(search "set mapred.reduce.tasks=32")

On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <i...@ververica.com> wrote:

> Hi,
>
> regarding the (un-)quoted question, compatibility is of course an important
> argument, but in terms of consistency I'd find it a bit surprising that
> WITH handles it differently than SET, and I wonder if that could cause
> friction for developers when writing their SQL.
>
>
> Regards
> Ingo
>
> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <imj...@gmail.com> wrote:
>
> > Hi all,
> >
> > Regarding "One Parser", I think it's not possible for now because the
> > Calcite parser can't parse special characters (e.g. "-") unless they are
> > quoted as string literals. That's why the WITH option keys are string
> > literals, not identifiers.
> >
> > SET table.exec.mini-batch.enabled = true and ADD JAR /local/my-home/test.jar
> > have the same problem. That's why we propose two parsers: one splits the
> > lines into multiple statements and matches special commands through regex,
> > which is light-weight, and delegates the other statements to the second
> > parser, which is the Calcite parser.
> >
> > Note: we should stick to the unquoted SET table.exec.mini-batch.enabled =
> > true syntax, both for backward compatibility and ease of use; none of the
> > other systems put quotes on the key.
> >
> >
> > Regarding "table.planner" vs "sql-client.planner":
> > if we want to use "table.planner", I think we should explain clearly in the
> > documentation the scope in which it can be used.
> > Otherwise, users will complain that the planner doesn't change when they
> > set the configuration on TableEnv.
> > It would be better to throw an exception telling users that it's not
> > allowed to change the planner after TableEnv is initialized.
> > However, that doesn't seem easy to implement.
> >
> > Best,
> > Jark
> >
> > On Thu, 4 Feb 2021 at 15:49, godfrey he <godfre...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > Regarding "table.planner" and "table.execution-mode":
> > > If we define that those two options are just used to initialize the
> > > TableEnvironment, +1 for introducing table options instead of sql-client
> > > options.
> > >
> > > Regarding "the sql client, we will maintain two parsers", I want to give
> > > more input:
> > > We want to introduce sql-gateway into the Flink project (see FLIP-24 &
> > > FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client and
> > > the gateway service will communicate through a REST API. The "ADD JAR
> > > /local/path/jar" will be executed on the CLI client machine. So when we
> > > submit a sql file which contains multiple statements, the CLI client needs
> > > to pick out the "ADD JAR" lines, and the statements need to be submitted or
> > > executed one by one to make sure the result is correct. The sql file may
> > > look like:
> > >
> > > SET xxx=yyy;
> > > create table my_table ...;
> > > create table my_sink ...;
> > > ADD JAR /local/path/jar1;
> > > create function my_udf as com....MyUdf;
> > > insert into my_sink select ..., my_udf(xx) from ...;
> > > REMOVE JAR /local/path/jar1;
> > > drop function my_udf;
> > > ADD JAR /local/path/jar2;
> > > create function my_udf as com....MyUdf2;
> > > insert into my_sink select ..., my_udf(xx) from ...;
> > >
> > > The lines need to be split into multiple statements first in the CLI
> > > client. There are two approaches:
> > > 1. The CLI client depends on the sql-parser: the sql-parser splits the
> > > lines and tells which lines are "ADD JAR".
> > > pro: there is only one parser.
> > > cons: It's a little heavy for the CLI client to depend on the sql-parser,
> > > because the CLI client is just a simple tool which receives the user
> > > commands and displays the result. The non-"ADD JAR" commands will be parsed
> > > twice.
> > >
> > > 2. The CLI client splits the lines into multiple statements and finds the
> > > ADD JAR command through regex matching.
> > > pro: The CLI client is very light-weight.
> > > cons: there are two parsers.
> > >
> > > (personally, I prefer the second option)
> > >
> > > Regarding "SHOW or LIST JARS", I think we can support them both.
> > > For default dialect, we support SHOW JARS, but if we switch to hive
> > > dialect, LIST JARS is also supported.
> > >
> > >
> > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > >
> > > Best,
> > > Godfrey
> > >
> > > On Thu, Feb 4, 2021 at 10:40 AM Rui Li <lirui.fu...@gmail.com> wrote:
> > >
> > > > Hi guys,
> > > >
> > > > Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> > > > commands than LIST JARS. I don't have a strong opinion about REMOVE vs
> > > > DELETE though.
> > > >
> > > > While Flink doesn't need to follow Hive syntax, as far as I know, most
> > > > users who are requesting these features were previously Hive users. So I
> > > > wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE JARS
> > > > as synonyms? It's just like how lots of systems accept both EXIT and QUIT
> > > > as the command to terminate the program. So if that's not hard to achieve
> > > > and will make users happier, I don't see a reason why we must choose one
> > > > over the other.
> > > >
> > > > On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <twal...@apache.org> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > some feedback regarding the open questions. Maybe we can discuss the
> > > > > `TableEnvironment.executeMultiSql` story offline to determine how we
> > > > > proceed with this in the near future.
> > > > >
> > > > > 1) "whether the table environment has the ability to update itself"
> > > > >
> > > > > Maybe there was some misunderstanding. I don't think that we should
> > > > > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > > > > "old")`. Instead I'm proposing to support
> > > > > `TableEnvironment.create(Configuration)` where planner and execution
> > > > > mode are read immediately and subsequent changes to these options will
> > > > > have no effect. We are doing it similarly in `new
> > > > > StreamExecutionEnvironment(Configuration)`. These two ConfigOptions
> > > > > must not be SQL Client specific but can be part of the core table code
> > > > > base. Many users would like to get a 100% preconfigured environment from
> > > > > just Configuration. And this is not possible right now. We can solve
> > > > > both use cases in one change.
> > > > >
> > > > > 2) "the sql client, we will maintain two parsers"
> > > > >
> > > > > I remember we had some discussion about this and decided that we would
> > > > > like to maintain only one parser. In the end it is "One Flink SQL" where
> > > > > commands influence each other also with respect to keywords. It should
> > > > > be fine to include the SQL Client commands in the Flink parser. Of
> > > > > course the table environment would not be able to handle the `Operation`
> > > > > instance that would be the result but we can introduce hooks to handle
> > > > > those `Operation`s. Or we introduce parser extensions.
> > > > >
> > > > > Can we skip `table.job.async` in the first version? We should further
> > > > > discuss whether we introduce a special SQL clause for wrapping async
> > > > > behavior or if we use a config option. Esp. for streaming queries we
> > > > > need to be careful and should force users to either "one INSERT INTO" or
> > > > > "one STATEMENT SET".
> > > > >
> > > > > 3) 4) "HIVE also uses these commands"
> > > > >
> > > > > In general, Hive is not a good reference. Aligning the commands more
> > > > > with the remaining commands should be our goal. We just had a MODULE
> > > > > discussion where we selected SHOW instead of LIST. But it is true that
> > > > > JARs are not part of the catalog which is why I would not use
> > > > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English language.
> > > > > Take a look at the Java collection API as another example.
> > > > >
> > > > > 6) "Most of the commands should belong to the table environment"
> > > > >
> > > > > Thanks for updating the FLIP, this makes things easier to understand. It
> > > > > is good to see that most commands will be available in TableEnvironment.
> > > > > However, I would also support SET and RESET for consistency. Again, from
> > > > > an architectural point of view, if we would allow some kind of
> > > > > `Operation` hook in table environment, we could check for SQL Client
> > > > > specific options and forward to regular `TableConfig.getConfiguration`
> > > > > otherwise. What do you think?
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > >
> > > > > On 03.02.21 08:58, Jark Wu wrote:
> > > > > > Hi Timo,
> > > > > >
> > > > > > > I will respond to some of the questions:
> > > > > >
> > > > > > 1) SQL client specific options
> > > > > >
> > > > > > > Whether it starts with "table" or "sql-client" depends on where the
> > > > > > > configuration takes effect.
> > > > > > > If it is a table configuration, we should make clear what the behavior
> > > > > > > is when users change the configuration during the lifecycle of the
> > > > > > > TableEnvironment.
> > > > > > >
> > > > > > > I agree with Shengkai that `sql-client.planner` and
> > > > > > > `sql-client.execution.mode` are something special that can't be changed
> > > > > > > after TableEnvironment has been initialized. You can see that
> > > > > > > `StreamExecutionEnvironment` provides a `configure()` method to override
> > > > > > > configuration after StreamExecutionEnvironment has been initialized.
> > > > > > >
> > > > > > > Therefore, I think it would be better to still use `sql-client.planner`
> > > > > > > and `sql-client.execution.mode`.
> > > > > >
> > > > > > 2) Execution file
> > > > > >
> > > > > > > From my point of view, there is a big difference between
> > > > > > > `sql-client.job.detach` and `TableEnvironment.executeMultiSql()`:
> > > > > > > `sql-client.job.detach` will affect every single DML statement
> > > > > > > in the terminal, not only the statements in SQL files. I think the single
> > > > > > > DML statement in the interactive terminal is something like
> > > > > > > tEnv#executeSql() instead of tEnv#executeMultiSql.
> > > > > > > So I don't like the "multi" and "sql" keywords in `table.multi-sql-async`.
> > > > > > > I just found that the runtime provides a configuration called
> > > > > > > "execution.attached" [1], which is false by default and specifies
> > > > > > > whether the pipeline is submitted in attached or detached mode.
> > > > > > > It provides exactly the same functionality as `sql-client.job.detach`.
> > > > > > > What do you think about using this option?
> > > > > >
> > > > > > > If we also want to support this config in TableEnvironment, I think it
> > > > > > > should also affect the DML execution of `tEnv#executeSql()`, not only
> > > > > > > DMLs in `tEnv#executeMultiSql()`.
> > > > > > > Therefore, the behavior may look like this:
> > > > > >
> > > > > > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by default
> > > > > > > tableResult.await()   ==> manually block until finish
> > > > > > > tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
> > > > > > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't need to wait on the TableResult
> > > > > > > tEnv.executeMultiSql(
> > > > > > > """
> > > > > > > CREATE TABLE ....  ==> always sync
> > > > > > > INSERT INTO ...  ==> sync, because we set configuration above
> > > > > > > SET execution.attached = false;
> > > > > > > INSERT INTO ...  ==> async
> > > > > > > """)
> > > > > >
> > > > > > > On the other hand, I think `sql-client.job.detach`
> > > > > > > and `TableEnvironment.executeMultiSql()` should be two separate topics;
> > > > > > > as Shengkai mentioned above, SQL CLI only depends on
> > > > > > > `TableEnvironment#executeSql()` to support multi-line statements.
> > > > > > > I'm fine with making `executeMultiSql()` clear but don't want it to block
> > > > > > > this FLIP, maybe we can discuss this in another thread.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > > > > >
> > > > > > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > >
> > > > > >> Hi, Timo.
> > > > > > >> Thanks for your detailed feedback. I have some thoughts about your
> > > > > > >> feedback.
> > > > > > >>
> > > > > > >> *Regarding #1*: I think the main problem is whether the table environment
> > > > > > >> has the ability to update itself. Let's take a simple program as an
> > > > > > >> example.
> > > > > >>
> > > > > >>
> > > > > >> ```
> > > > > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > > > > >>
> > > > > > >> tEnv.getConfig().getConfiguration().setString("table.planner", "old");
> > > > > >>
> > > > > >>
> > > > > >> tEnv.executeSql("...");
> > > > > >>
> > > > > >> ```
> > > > > >>
> > > > > > >> If we regard this option as a table option, users don't have to create
> > > > > > >> another table environment manually. In that case, tEnv needs to check
> > > > > > >> whether the current mode and planner are the same as before on every
> > > > > > >> executeSql or explainSql call. I don't think that's easy work for the
> > > > > > >> table environment, especially if users have a StreamExecutionEnvironment
> > > > > > >> but set the old planner and batch mode. But when we make this option a
> > > > > > >> sql client option, users only use the SET command to change the setting.
> > > > > > >> We can rebuild a new table environment when SET succeeds.
> > > > > >>
> > > > > >>
> > > > > > >> *Regarding #2*: I think we need to discuss the implementation before
> > > > > > >> continuing this topic. In the sql client, we will maintain two parsers.
> > > > > > >> The first parser (client parser) will only match the sql client commands.
> > > > > > >> If the client parser can't parse the statement, we will leverage the
> > > > > > >> power of the table environment to execute it. According to our blueprint,
> > > > > > >> TableEnvironment#executeSql is enough for the sql client. Therefore,
> > > > > > >> TableEnvironment#executeMultiSql is out of scope for this FLIP.
> > > > > >>
> > > > > > >> But if we need to introduce `TableEnvironment.executeMultiSql` in the
> > > > > > >> future, I think it's OK to use the option `table.multi-sql-async` rather
> > > > > > >> than the option `sql-client.job.detach`. But we think the name is not
> > > > > > >> suitable because it is confusing for others. When setting the option to
> > > > > > >> false, we just mean it will block the execution of the INSERT INTO
> > > > > > >> statement, not DDL or others (other sql statements are always executed
> > > > > > >> synchronously). So how about `table.job.async`? It only works for the
> > > > > > >> sql-client and executeMultiSql. If we set this value to false, the table
> > > > > > >> environment will not return the result until the job finishes.
> > > > > >>
> > > > > >>
> > > > > > >> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
> > > > > > >> because HIVE also uses these commands to add the jar into the classpath or
> > > > > > >> delete the jar. If we use such commands, it can reduce our work for hive
> > > > > > >> compatibility.
> > > > > >>
> > > > > > >> For SHOW JAR, I think the main concern is that the jars are not maintained
> > > > > > >> by the Catalog. If we really need to keep consistent with SQL grammar,
> > > > > > >> maybe we should use
> > > > > >>
> > > > > >> `ADD JAR` -> `CREATE JAR`,
> > > > > >> `DELETE JAR` -> `DROP JAR`,
> > > > > >> `LIST JAR` -> `SHOW JAR`.
> > > > > >>
> > > > > > >> *Regarding #5*: I agree with you that we'd better keep consistent.
> > > > > > >>
> > > > > > >> *Regarding #6*: Yes. Most of the commands should belong to the table
> > > > > > >> environment. In the Summary section, I use the <NOTE> tag to identify which
> > > > > > >> commands should belong to the sql client and which commands should belong
> > > > > > >> to the table environment. I also added a new section about implementation
> > > > > > >> details in the FLIP.
> > > > > >>
> > > > > >> Best,
> > > > > >> Shengkai
> > > > > >>
> > > > > > >> On Tue, Feb 2, 2021 at 6:43 PM Timo Walther <twal...@apache.org> wrote:
> > > > > >>
> > > > > > >>> Thanks for this great proposal Shengkai. This will give the SQL Client a
> > > > > > >>> very good update and make it production ready.
> > > > > >>>
> > > > > >>> Here is some feedback from my side:
> > > > > >>>
> > > > > >>> 1) SQL client specific options
> > > > > >>>
> > > > > > >>> I don't think that `sql-client.planner` and `sql-client.execution.mode`
> > > > > > >>> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> > > > > > >>> `ExecutionConfig#configure` that have been added recently, we should
> > > > > > >>> offer a possibility for TableEnvironment. How about we offer
> > > > > > >>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
> > > > > > >>> `table.execution-mode` to
> > > > > > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > > > > >>>
> > > > > >>> 2) Execution file
> > > > > >>>
> > > > > > >>> Did you have a look at the Appendix of FLIP-84 [1] including the mailing
> > > > > > >>> list thread at that time? Could you further elaborate how the
> > > > > > >>> multi-statement execution should work for a unified batch/streaming
> > > > > > >>> story? According to our past discussions, each line in an execution file
> > > > > > >>> should be executed blocking which means a streaming query needs a
> > > > > > >>> statement set to execute multiple INSERT INTO statements, correct? We
> > > > > > >>> should also offer this functionality in
> > > > > > >>> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
> > > > > > >>> SQL Client specific needs to be determined, it could also be a general
> > > > > > >>> `table.multi-sql-async` option?
> > > > > >>>
> > > > > >>> 3) DELETE JAR
> > > > > >>>
> > > > > > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
> > > > > > >>> actively deleting the JAR in the corresponding path.
> > > > > >>>
> > > > > >>> 4) LIST JAR
> > > > > >>>
> > > > > > >>> This should be `SHOW JARS` according to other SQL commands such as `SHOW
> > > > > > >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > > > > >>>
> > > > > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > > > > >>>
> > > > > > >>> We should keep the details in sync with
> > > > > > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> > > > > > >>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> > > > > > >>> instead of `COST`. I'm sure the original author had a reason to call
> > > > > > >>> it that way.
> > > > > >>>
> > > > > >>> 6) Implementation details
> > > > > >>>
> > > > > > >>> It would be nice to understand how we plan to implement the given
> > > > > > >>> features. Most of the commands and config options should go into
> > > > > > >>> TableEnvironment and SqlParser directly, correct? This way users have a
> > > > > > >>> unified way of using Flink SQL. TableEnvironment would provide a user
> > > > > > >>> experience in notebooks or interactive programs similar to that of the
> > > > > > >>> SQL Client.
> > > > > >>>
> > > > > >>> [1]
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > > > > >>> [2]
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Timo
> > > > > >>>
> > > > > >>>
> > > > > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > > > > > >>>> Sorry for the typo. I mean `RESET` is much better than `UNSET`.
> > > > > >>>>
> > > > > > >>>> On Tue, Feb 2, 2021 at 4:44 PM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>>> Hi, Jingsong.
> > > > > >>>>>
> > > > > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > > > > >>>>>
> > > > > > >>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
> > > > > > >>>>> supported in the current sql client now. Our proposal just extends
> > > > > > >>>>> its grammar and allows users to reset the specified keys.
> > > > > > >>>>> 2. Hive beeline also uses `RESET` to set the key to the default value [1].
> > > > > > >>>>> I think it is more friendly for batch users.
> > > > > >>>>>
> > > > > >>>>> Best,
> > > > > >>>>> Shengkai
> > > > > >>>>>
> > > > > >>>>> [1]
> > > > > >>>
> > > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > > > > >>>>>
> > > > > > >>>>> On Tue, Feb 2, 2021 at 1:56 PM Jingsong Li <jingsongl...@gmail.com> wrote:
> > > > > >>>>>
> > > > > > >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> > > > > > >>>>>> improving it.
> > > > > > >>>>>>
> > > > > > >>>>>> About "SET" and "RESET": why not "SET" and "UNSET"?
> > > > > >>>>>>
> > > > > >>>>>> Best,
> > > > > >>>>>> Jingsong
> > > > > >>>>>>
> > > > > > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <lirui.fu...@gmail.com> wrote:
> > > > > >>>>>>
> > > > > > >>>>>>> Thanks Shengkai for the update! The proposed changes look good to me.
> > > > > > >>>>>>>
> > > > > > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Hi, Rui.
> > > > > >>>>>>>> You are right. I have already modified the FLIP.
> > > > > >>>>>>>>
> > > > > >>>>>>>> The main changes:
> > > > > >>>>>>>>
> > > > > > >>>>>>>> # The -f parameter has no restriction on the statement type.
> > > > > > >>>>>>>> Sometimes, users use a pipe to redirect the result of queries for
> > > > > > >>>>>>>> debugging when submitting a job with the -f parameter. It's much more
> > > > > > >>>>>>>> convenient than writing INSERT INTO statements.
> > > > > > >>>>>>>>
> > > > > > >>>>>>>> # Add a new sql client option `sql-client.job.detach`.
> > > > > > >>>>>>>> Users prefer to execute jobs one by one in the batch mode. Users can set
> > > > > > >>>>>>>> this option to false and the client will not process the next job until
> > > > > > >>>>>>>> the current job finishes. The default value of this option is false,
> > > > > > >>>>>>>> which means the client will execute the next job when the current job is
> > > > > > >>>>>>>> submitted.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>> Shengkai
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > > >>>>>>>> On Fri, Jan 29, 2021 at 4:52 PM Rui Li <lirui.fu...@gmail.com> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi Shengkai,
> > > > > >>>>>>>>>
> > > > > > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have different
> > > > > > >>>>>>>>> implications, and we should clarify the behavior. For example, if the
> > > > > > >>>>>>>>> client just submits the job and exits, what happens if the file contains
> > > > > > >>>>>>>>> two INSERT statements? I don't think we should treat them as a statement
> > > > > > >>>>>>>>> set, because users should explicitly write BEGIN STATEMENT SET in that
> > > > > > >>>>>>>>> case. And the client shouldn't asynchronously submit the two jobs, because
> > > > > > >>>>>>>>> the 2nd may depend on the 1st, right?
> > > > > > >>>>>>>>>
> > > > > > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> Hi Rui,
> > > > > >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> For suggestion 1: Yes, we plan to strengthen the SET command. In
> > > > > > >>>>>>>>>> the implementation, it will just put the key-value pair into the
> > > > > > >>>>>>>>>> `Configuration`, which will be used to generate the table config. If hive
> > > > > > >>>>>>>>>> supports reading the settings from the table config, users are able to set
> > > > > > >>>>>>>>>> the hive-related settings.
> > > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> For suggestion 2: The -f parameter will submit the job and exit. If
> > > > > > >>>>>>>>>> the queries never end, users have to cancel the job by themselves, which is
> > > > > > >>>>>>>>>> not reliable (people may forget their jobs). In most cases, queries are used
> > > > > > >>>>>>>>>> to analyze the data. Users should use queries in the interactive mode.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Best,
> > > > > >>>>>>>>>> Shengkai
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>> On Fri, Jan 29, 2021 at 3:18 PM Rui Li <lirui.fu...@gmail.com> wrote:
> > > > > >>>>>>>>>>
> > > > > > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it covers a
> > > > > > >>>>>>>>>>> lot of useful features which will dramatically improve the usability of our
> > > > > > >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> 1. Do you think we can let users set arbitrary configurations via the
> > > > > > >>>>>>>>>>> SET command? A connector may have its own configurations and we don't have
> > > > > > >>>>>>>>>>> a way to dynamically change such configurations in SQL Client. For example,
> > > > > > >>>>>>>>>>> users may want to be able to change hive conf when using hive connector [1].
> > > > > > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files specified with
> > > > > > >>>>>>>>>>> the -f option? Hive supports a similar -f option but allows queries in the
> > > > > > >>>>>>>>>>> file. And a common use case is to run some query and redirect the results
> > > > > > >>>>>>>>>>> to a file. So I think maybe flink users would like to do the same,
> > > > > > >>>>>>>>>>> especially in batch scenarios.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > > > > >>>>>>>>>>>
> > > > > > >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <liuyang0...@gmail.com> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> Hi Shengkai,
> > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> Glad to see this improvement. And I have some additional suggestions:
> > > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > > > > > >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> > > > > > >>>>>>>>>>>> #2. Improve the way results are retrieved: the sql client currently
> > > > > > >>>>>>>>>>>> collects the results locally all at once using accumulators,
> > > > > > >>>>>>>>>>>> which may cause memory issues in the JM or locally for big query
> > > > > > >>>>>>>>>>>> results. Accumulators are only suitable for testing purposes.
> > > > > > >>>>>>>>>>>> We may change to use SelectTableSink, which is based
> > > > > > >>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> > > > > > >>>>>>>>>>>> #3. Do we need to consider the Flink SQL gateway which is in FLIP-91?
> > > > > > >>>>>>>>>>>> It seems that this FLIP has not moved forward for a long time.
> > > > > > >>>>>>>>>>>> Providing a long-running service out of the box to facilitate
> > > > > > >>>>>>>>>>>> sql submission is necessary.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> What do you think of these?
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> [1]
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > > >>>>>>>>>>>> On Thu, Jan 28, 2021 at 8:54 PM Shengkai Fang <fskm...@gmail.com> wrote:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Hi devs,
> > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> Jark and I want to start a discussion about FLIP-163: SQL Client
> > > > > > >>>>>>>>>>>>> Improvements.
> > > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> Many users have complained about the problems of the sql client. For
> > > > > > >>>>>>>>>>>>> example, users can not register the table proposed by FLIP-95.
> > > > > >>>>>>>>>>>>> The main changes in this FLIP:
> > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> - use the -i parameter to specify the sql file to initialize the table
> > > > > > >>>>>>>>>>>>> environment and deprecate the YAML file;
> > > > > > >>>>>>>>>>>>> - add -f to submit a sql file and deprecate the '-u' parameter;
> > > > > > >>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
> > > > > > >>>>>>>>>>>>> - support statement set syntax;
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > > >>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163 [1].
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Look forward to your feedback.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Best,
> > > > > >>>>>>>>>>>>> Shengkai
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> [1]
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> *With kind regards
> > > > > >>>>>>>>>>>>
> > > ------------------------------------------------------------
> > > > > >>>>>>>>>>>> Sebastian Liu 刘洋
> > > > > >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
> > > > Science
> > > > > >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> > > > > >>>>>>>>>>>> E-mail: liuyang0...@gmail.com <liuyang0...@gmail.com>
> > > > > >>>>>>>>>>>> QQ: 3239559*
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> --
> > > > > >>>>>>>>>>> Best regards!
> > > > > >>>>>>>>>>> Rui Li
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> --
> > > > > >>>>>>>>> Best regards!
> > > > > >>>>>>>>> Rui Li
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> --
> > > > > >>>>>>> Best regards!
> > > > > >>>>>>> Rui Li
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> --
> > > > > >>>>>> Best, Jingsong Lee
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Best regards!
> > > > Rui Li
> > > >
> > >
> >
>
