Ah, I just forgot the option name. I'm also fine with `table.dml-async`.
What do you think @Rui Li <lirui.fu...@gmail.com> @Shengkai Fang <fskm...@gmail.com> ? Best, Jark On Mon, 8 Feb 2021 at 23:06, Timo Walther <twal...@apache.org> wrote: > Great to hear that. Can someone update the FLIP a final time before we > start a vote? > > We should quickly discuss how we would like to name the config option > for the async/sync mode. I heared voices internally that are strongly > against calling it "detach" due to historical reasons with a Flink job > detach mode. How about `table.dml-async`? > > Thanks, > Timo > > > On 08.02.21 15:55, Jark Wu wrote: > > Thanks Timo, > > > > I'm +1 for option#2 too. > > > > I think we have addressed all the concerns and can start a vote. > > > > Best, > > Jark > > > > On Mon, 8 Feb 2021 at 22:19, Timo Walther <twal...@apache.org> wrote: > > > >> Hi Jark, > >> > >> you are right. Nesting STATEMENT SET and ASYNC might be too verbose. > >> > >> So let's stick to the config option approach. > >> > >> However, I strongly believe that we should not use the batch/streaming > >> mode for deriving semantics. This discussion is similar to time function > >> discussion. We should not derive sync/async submission behavior from a > >> flag that should only influence runtime operators and the incremental > >> computation. Statements for bounded streams should have the same > >> semantics in batch mode. > >> > >> I think your proposed option 2) is a good tradeoff. For the following > >> reasons: > >> > >> pros: > >> - by default, batch and streaming behave exactly the same > >> - SQL Client CLI behavior does not change compared to 1.12 and remains > >> async for batch and streaming > >> - consistent with the async Table API behavior > >> > >> con: > >> - batch files are not 100% SQL compliant by default > >> > >> The last item might not be an issue since we can expect that users have > >> long-running jobs and prefer async execution in most cases. 
> >> > >> Regards, > >> Timo > >> > >> > >> On 08.02.21 14:15, Jark Wu wrote: > >>> Hi Timo, > >>> > >>> Actually, I'm not in favor of explicit syntax `BEGIN ASYNC;... END;`. > >>> Because it makes submitting streaming jobs very verbose, every INSERT > >> INTO > >>> and STATEMENT SET must be wrapped in the ASYNC clause which is > >>> not user-friendly and not backward-compatible. > >>> > >>> I agree we will have unified behavior but this is at the cost of > hurting > >>> our main users. > >>> I'm worried that end users can't understand the technical decision, and > >>> they would > >>> feel streaming is harder to use. > >>> > >>> If we want to have an unified behavior, and let users decide what's the > >>> desirable behavior, I prefer to have a config option. A Flink cluster > can > >>> be set to async, then > >>> users don't need to wrap every DML in an ASYNC clause. This is the > least > >>> intrusive > >>> way to the users. > >>> > >>> > >>> Personally, I'm fine with following options in priority: > >>> > >>> 1) sync for batch DML and async for streaming DML > >>> ==> only breaks batch behavior, but makes both happy > >>> > >>> 2) async for both batch and streaming DML, and can be set to sync via a > >>> configuration. > >>> ==> compatible, and provides flexible configurable behavior > >>> > >>> 3) sync for both batch and streaming DML, and can be > >>> set to async via a configuration. > >>> ==> +0 for this, because it breaks all the compatibility, esp. our main > >>> users. > >>> > >>> Best, > >>> Jark > >>> > >>> On Mon, 8 Feb 2021 at 17:34, Timo Walther <twal...@apache.org> wrote: > >>> > >>>> Hi Jark, Hi Rui, > >>>> > >>>> 1) How should we execute statements in CLI and in file? Should there > be > >>>> a difference? > >>>> So it seems we have consensus here with unified bahavior. Even though > >>>> this means we are breaking existing batch INSERT INTOs that were > >>>> asynchronous before. 
> >>>>
> >>>> 2) Should we have different behavior for batch and streaming?
> >>>> I think batch users also prefer async behavior because usually even
> >>>> those pipelines take some time to execute. But we should stick to
> >>>> standard SQL blocking semantics.
> >>>>
> >>>> What are your opinions on making async explicit in SQL via
> >>>> `BEGIN ASYNC; ... END;`? This would allow us to really have unified
> >>>> semantics because batch and streaming would behave the same?
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>>
> >>>> On 07.02.21 04:46, Rui Li wrote:
> >>>>> Hi Timo,
> >>>>>
> >>>>> I agree with Jark that we should provide a consistent experience
> >>>>> regarding SQL CLI and files. Some systems even allow users to execute
> >>>>> SQL files in the CLI, e.g. the "SOURCE" command in MySQL. If we want
> >>>>> to support that in the future, it's a little tricky to decide whether
> >>>>> that should be treated as CLI or file.
> >>>>>
> >>>>> I actually prefer a config option and let users decide what's the
> >>>>> desirable behavior. But if we have agreed not to use options, I'm
> >>>>> also fine with Alternative #1.
> >>>>>
> >>>>> On Sun, Feb 7, 2021 at 11:01 AM Jark Wu <imj...@gmail.com> wrote:
> >>>>>
> >>>>>> Hi Timo,
> >>>>>>
> >>>>>> 1) How should we execute statements in CLI and in file? Should
> >>>>>> there be a difference?
> >>>>>> I do think we should unify the behavior of CLI and SQL files. SQL
> >>>>>> files can be thought of as a shortcut of
> >>>>>> "start CLI" => "copy content of SQL files" => "paste content in CLI".
> >>>>>> Actually, we already did this in kafka_e2e.sql [1].
> >>>>>> I think it's hard for users to understand why SQL files behave
> >>>>>> differently from CLI; all the other systems don't have such a
> >>>>>> difference.
> >>>>>>
> >>>>>> If we distinguish SQL files and CLI, should there be a difference
> >>>>>> in JDBC driver and UI platform?
> >>>>>> Personally, they all should have consistent behavior.
> >>>>>>
> >>>>>> 2) Should we have different behavior for batch and streaming?
> >>>>>> I think we all agree streaming users prefer async execution,
> >>>>>> otherwise it's weird and difficult to use if the submit script or
> >>>>>> CLI never exits. On the other hand, batch SQL users are used to SQL
> >>>>>> statements being executed in a blocking way.
> >>>>>>
> >>>>>> Either unified async execution or unified sync execution will hurt
> >>>>>> one side of the streaming/batch users. In order to make both sides
> >>>>>> happy, I think we can have different behavior for batch and
> >>>>>> streaming. There are many essential differences between batch and
> >>>>>> stream systems, so I think it's normal to have some different
> >>>>>> behaviors, and the behavior doesn't break the unified batch/stream
> >>>>>> semantics.
> >>>>>>
> >>>>>> Thus, I'm +1 to Alternative 1:
> >>>>>> We consider batch/streaming mode and block for batch INSERT INTO
> >>>>>> and async for streaming INSERT INTO/STATEMENT SET.
> >>>>>> And this behavior is consistent across CLI and files.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>> [1]:
> >>>>>> https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/resources/kafka_e2e.sql
> >>>>>>
> >>>>>> On Fri, 5 Feb 2021 at 21:49, Timo Walther <twal...@apache.org> wrote:
> >>>>>>
> >>>>>>> Hi Jark,
> >>>>>>>
> >>>>>>> thanks for the summary. I hope we can also find a good long-term
> >>>>>>> solution on the async/sync execution behavior topic.
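Alternative 1, as summarized above, would make the same DML behave differently depending on the execution mode. A sketch (the option name `table.execution-mode` is the one proposed later in this thread, and the table names are hypothetical):

```sql
-- Batch mode: INSERT INTO blocks until the bounded job finishes.
SET table.execution-mode = batch;
INSERT INTO warehouse_sink SELECT * FROM bounded_source;

-- Streaming mode: INSERT INTO submits the job and returns immediately.
SET table.execution-mode = streaming;
INSERT INTO kafka_sink SELECT * FROM unbounded_source;
```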
> >>>>>>>
> >>>>>>> It should be discussed in a bigger round because it is (similar to
> >>>>>>> the time function discussion) related to batch-streaming
> >>>>>>> unification where we should stick to the SQL standard to some
> >>>>>>> degree but also need to come up with good streaming semantics.
> >>>>>>>
> >>>>>>> Let me summarize the problem again to hear opinions:
> >>>>>>>
> >>>>>>> - Batch SQL users are used to executing SQL files sequentially
> >>>>>>> (from top to bottom).
> >>>>>>> - Batch SQL users are used to SQL statements being executed
> >>>>>>> blocking. One after the other. Esp. when moving around data with
> >>>>>>> INSERT INTO.
> >>>>>>> - Streaming users prefer async execution because unbounded streams
> >>>>>>> are more frequent than bounded streams.
> >>>>>>> - We decided to make the Flink Table API async because in a
> >>>>>>> programming language it is easy to call `.await()` on the result
> >>>>>>> to make it blocking.
> >>>>>>> - INSERT INTO statements in the current SQL Client implementation
> >>>>>>> are always submitted asynchronously.
> >>>>>>> - Other clients such as Ververica platform allow only one INSERT
> >>>>>>> INTO or a STATEMENT SET at the end of a file that will run
> >>>>>>> asynchronously.
> >>>>>>>
> >>>>>>> Questions:
> >>>>>>>
> >>>>>>> - How should we execute statements in CLI and in file? Should
> >>>>>>> there be a difference?
> >>>>>>> - Should we have different behavior for batch and streaming?
> >>>>>>> - Shall we solve parts with a config option or is it better to
> >>>>>>> make it explicit in the SQL job definition because it influences
> >>>>>>> the semantics of multiple INSERT INTOs?
> >>>>>>>
> >>>>>>> Let me summarize my opinion at the moment:
> >>>>>>>
> >>>>>>> - SQL files should always be executed blocking by default. Because
> >>>>>>> they could potentially contain a long list of INSERT INTO
> >>>>>>> statements. This would be SQL standard compliant.
> >>>>>>> - If we allow async execution, we should make this explicit in the > >> SQL > >>>>>>> file via `BEGIN ASYNC; ... END;`. > >>>>>>> - In the CLI, we always execute async to maintain the old behavior. > >> We > >>>>>>> can also assume that people are only using the CLI to fire > statements > >>>>>>> and close the CLI afterwards. > >>>>>>> > >>>>>>> Alternative 1: > >>>>>>> - We consider batch/streaming mode and block for batch INSERT INTO > >> and > >>>>>>> async for streaming INSERT INTO/STATEMENT SET > >>>>>>> > >>>>>>> What do others think? > >>>>>>> > >>>>>>> Regards, > >>>>>>> Timo > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On 05.02.21 04:03, Jark Wu wrote: > >>>>>>>> Hi all, > >>>>>>>> > >>>>>>>> After an offline discussion with Timo and Kurt, we have reached > some > >>>>>>>> consensus. > >>>>>>>> Please correct me if I am wrong or missed anything. > >>>>>>>> > >>>>>>>> 1) We will introduce "table.planner" and "table.execution-mode" > >>>> instead > >>>>>>> of > >>>>>>>> "sql-client" prefix, > >>>>>>>> and add `TableEnvironment.create(Configuration)` interface. These > 2 > >>>>>>> options > >>>>>>>> can only be used > >>>>>>>> for tableEnv initialization. If used after initialization, Flink > >>>> should > >>>>>>>> throw an exception. We may can > >>>>>>>> support dynamic switch the planner in the future. > >>>>>>>> > >>>>>>>> 2) We will have only one parser, > >>>>>>>> i.e. org.apache.flink.table.delegation.Parser. It accepts a string > >>>>>>>> statement, and returns a list of Operation. It will first use > regex > >> to > >>>>>>>> match some special statement, > >>>>>>>> e.g. SET, ADD JAR, others will be delegated to the underlying > >>>> Calcite > >>>>>>>> parser. The Parser can > >>>>>>>> have different implementations, e.g. HiveParser. > >>>>>>>> > >>>>>>>> 3) We only support ADD JAR, REMOVE JAR, SHOW JAR for Flink > dialect. > >>>> But > >>>>>>> we > >>>>>>>> can allow > >>>>>>>> DELETE JAR, LIST JAR in Hive dialect through HiveParser. 
> >>>>>>>> > >>>>>>>> 4) We don't have a conclusion for async/sync execution behavior > yet. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jark > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Thu, 4 Feb 2021 at 17:50, Jark Wu <imj...@gmail.com> wrote: > >>>>>>>> > >>>>>>>>> Hi Ingo, > >>>>>>>>> > >>>>>>>>> Since we have supported the WITH syntax and SET command since > v1.9 > >>>>>>> [1][2], > >>>>>>>>> and > >>>>>>>>> we have never received such complaints, I think it's fine for > such > >>>>>>>>> differences. > >>>>>>>>> > >>>>>>>>> Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also > >>>>>> requires > >>>>>>>>> string literal keys[3], > >>>>>>>>> and the SET <key>=<value> doesn't allow quoted keys [4]. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Jark > >>>>>>>>> > >>>>>>>>> [1]: > >>>>>>>>> > >>>>>>> > >>>>>> > >>>> > >> > https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html > >>>>>>>>> [2]: > >>>>>>>>> > >>>>>>> > >>>>>> > >>>> > >> > https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries > >>>>>>>>> [3]: > >>>>>>> > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL > >>>>>>>>> [4]: > >>>>>>> > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli > >>>>>>>>> (search "set mapred.reduce.tasks=32") > >>>>>>>>> > >>>>>>>>> On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <i...@ververica.com> > wrote: > >>>>>>>>> > >>>>>>>>>> Hi, > >>>>>>>>>> > >>>>>>>>>> regarding the (un-)quoted question, compatibility is of course > an > >>>>>>>>>> important > >>>>>>>>>> argument, but in terms of consistency I'd find it a bit > surprising > >>>>>> that > >>>>>>>>>> WITH handles it differently than SET, and I wonder if that could > >>>>>> cause > >>>>>>>>>> friction for developers when writing their SQL. 
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Regards
> >>>>>>>>>> Ingo
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <imj...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding "One Parser", I think it's not possible for now
> >>>>>>>>>>> because the Calcite parser can't parse special characters
> >>>>>>>>>>> (e.g. "-") unless quoting them as string literals. That's why
> >>>>>>>>>>> the WITH option keys are string literals, not identifiers.
> >>>>>>>>>>>
> >>>>>>>>>>> SET table.exec.mini-batch.enabled = true and ADD JAR
> >>>>>>>>>>> /local/my-home/test.jar have the same problems. That's why we
> >>>>>>>>>>> propose two parsers: one splits lines into multiple statements
> >>>>>>>>>>> and matches special commands through regex, which is
> >>>>>>>>>>> light-weight, and delegates other statements to the other
> >>>>>>>>>>> parser, which is the Calcite parser.
> >>>>>>>>>>>
> >>>>>>>>>>> Note: we should stick to the unquoted
> >>>>>>>>>>> SET table.exec.mini-batch.enabled = true syntax, both for
> >>>>>>>>>>> backward compatibility and ease of use, and all the other
> >>>>>>>>>>> systems don't have quotes on the key.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Regarding "table.planner" vs "sql-client.planner",
> >>>>>>>>>>> if we want to use "table.planner", I think we should explain
> >>>>>>>>>>> clearly in the documentation what's the scope it can be used
> >>>>>>>>>>> in. Otherwise, there will be users complaining why the planner
> >>>>>>>>>>> doesn't change when setting the configuration on TableEnv.
> >>>>>>>>>>> It would be better to throw an exception to indicate to users
> >>>>>>>>>>> that it's not allowed to change the planner after TableEnv is
> >>>>>>>>>>> initialized.
> >>>>>>>>>>> However, it seems not easy to implement.
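The quoting inconsistency Ingo and Jark discuss above can be sketched as follows (the table definition is a hypothetical example):

```sql
-- WITH option keys must be string literals, because the Calcite parser
-- cannot parse '-' or '.' inside unquoted identifiers:
CREATE TABLE t (id INT) WITH ('connector' = 'datagen');

-- SET keys, matched by the client-side regex parser instead, stay unquoted:
SET table.exec.mini-batch.enabled = true;
```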
> >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> Jark > >>>>>>>>>>> > >>>>>>>>>>> On Thu, 4 Feb 2021 at 15:49, godfrey he <godfre...@gmail.com> > >>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>> > >>>>>>>>>>>> Regarding "table.planner" and "table.execution-mode" > >>>>>>>>>>>> If we define that those two options are just used to > initialize > >>>> the > >>>>>>>>>>>> TableEnvironment, +1 for introducing table options instead of > >>>>>>>>>> sql-client > >>>>>>>>>>>> options. > >>>>>>>>>>>> > >>>>>>>>>>>> Regarding "the sql client, we will maintain two parsers", I > want > >>>> to > >>>>>>>>>> give > >>>>>>>>>>>> more inputs: > >>>>>>>>>>>> We want to introduce sql-gateway into the Flink project (see > >>>>>> FLIP-24 > >>>>>>> & > >>>>>>>>>>>> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI > >>>>>> client > >>>>>>>>>> and > >>>>>>>>>>>> the gateway service will communicate through Rest API. The " > ADD > >>>>>> JAR > >>>>>>>>>>>> /local/path/jar " will be executed in the CLI client machine. > So > >>>>>> when > >>>>>>>>>> we > >>>>>>>>>>>> submit a sql file which contains multiple statements, the CLI > >>>>>> client > >>>>>>>>>>> needs > >>>>>>>>>>>> to pick out the "ADD JAR" line, and also statements need to be > >>>>>>>>>> submitted > >>>>>>>>>>> or > >>>>>>>>>>>> executed one by one to make sure the result is correct. 
The sql file may look like:
> >>>>>>>>>>>>
> >>>>>>>>>>>> SET xxx=yyy;
> >>>>>>>>>>>> create table my_table ...;
> >>>>>>>>>>>> create table my_sink ...;
> >>>>>>>>>>>> ADD JAR /local/path/jar1;
> >>>>>>>>>>>> create function my_udf as com....MyUdf;
> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>>>>>>>>> REMOVE JAR /local/path/jar1;
> >>>>>>>>>>>> drop function my_udf;
> >>>>>>>>>>>> ADD JAR /local/path/jar2;
> >>>>>>>>>>>> create function my_udf as com....MyUdf2;
> >>>>>>>>>>>> insert into my_sink select ..., my_udf(xx) from ...;
> >>>>>>>>>>>>
> >>>>>>>>>>>> The lines need to be split into multiple statements first in
> >>>>>>>>>>>> the CLI client; there are two approaches:
> >>>>>>>>>>>> 1. The CLI client depends on the sql-parser: the sql-parser
> >>>>>>>>>>>> splits the lines and tells which lines are "ADD JAR".
> >>>>>>>>>>>> pro: there is only one parser
> >>>>>>>>>>>> cons: It's a little heavy that the CLI client depends on the
> >>>>>>>>>>>> sql-parser, because the CLI client is just a simple tool which
> >>>>>>>>>>>> receives the user commands and displays the result. The non
> >>>>>>>>>>>> "ADD JAR" commands will be parsed twice.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. The CLI client splits the lines into multiple statements
> >>>>>>>>>>>> and finds the ADD JAR command through regex matching.
> >>>>>>>>>>>> pro: The CLI client is very light-weight.
> >>>>>>>>>>>> cons: there are two parsers.
> >>>>>>>>>>>>
> >>>>>>>>>>>> (personally, I prefer the second option)
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regarding "SHOW or LIST JARS", I think we can support them
> >>>>>>>>>>>> both. For the default dialect, we support SHOW JARS, but if we
> >>>>>>>>>>>> switch to the Hive dialect, LIST JARS is also supported.
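The jar commands being debated would then look roughly like this (the final keyword choices, e.g. REMOVE vs DELETE, were still open at this point in the thread; the jar path is a hypothetical example):

```sql
ADD JAR /local/path/udfs.jar;    -- register a jar on the session classpath
SHOW JARS;                       -- default dialect
LIST JARS;                       -- proposed Hive-dialect synonym
REMOVE JAR /local/path/udfs.jar; -- REMOVE vs DELETE was under discussion
```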
> >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> [1] > >>>>>>>>>>> > >>>>>>> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client > >>>>>>>>>>>> [2] > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>> > >>>>>> > >>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway > >>>>>>>>>>>> > >>>>>>>>>>>> Best, > >>>>>>>>>>>> Godfrey > >>>>>>>>>>>> > >>>>>>>>>>>> Rui Li <lirui.fu...@gmail.com> 于2021年2月4日周四 上午10:40写道: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi guys, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Regarding #3 and #4, I agree SHOW JARS is more consistent > with > >>>>>> other > >>>>>>>>>>>>> commands than LIST JARS. I don't have a strong opinion about > >>>>>> REMOVE > >>>>>>>>>> vs > >>>>>>>>>>>>> DELETE though. > >>>>>>>>>>>>> > >>>>>>>>>>>>> While flink doesn't need to follow hive syntax, as far as I > >> know, > >>>>>>>>>> most > >>>>>>>>>>>>> users who are requesting these features are previously hive > >>>> users. > >>>>>>>>>> So I > >>>>>>>>>>>>> wonder whether we can support both LIST/SHOW JARS and > >>>>>> REMOVE/DELETE > >>>>>>>>>>> JARS > >>>>>>>>>>>>> as synonyms? It's just like lots of systems accept both EXIT > >> and > >>>>>>>>>> QUIT > >>>>>>>>>>> as > >>>>>>>>>>>>> the command to terminate the program. So if that's not hard > to > >>>>>>>>>> achieve, > >>>>>>>>>>>> and > >>>>>>>>>>>>> will make users happier, I don't see a reason why we must > >> choose > >>>>>> one > >>>>>>>>>>> over > >>>>>>>>>>>>> the other. > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther < > >> twal...@apache.org > >>>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> some feedback regarding the open questions. Maybe we can > >> discuss > >>>>>>>>>> the > >>>>>>>>>>>>>> `TableEnvironment.executeMultiSql` story offline to > determine > >>>> how > >>>>>>>>>> we > >>>>>>>>>>>>>> proceed with this in the near future. 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 1) "whether the table environment has the ability to update > >>>>>>>>>> itself" > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Maybe there was some misunderstanding. I don't think that we > >>>>>>>>>> should > >>>>>>>>>>>>>> support > >>>>>>>>>> `tEnv.getConfig.getConfiguration.setString("table.planner", > >>>>>>>>>>>>>> "old")`. Instead I'm proposing to support > >>>>>>>>>>>>>> `TableEnvironment.create(Configuration)` where planner and > >>>>>>>>>> execution > >>>>>>>>>>>>>> mode are read immediately and a subsequent changes to these > >>>>>>>>>> options > >>>>>>>>>>>> will > >>>>>>>>>>>>>> have no effect. We are doing it similar in `new > >>>>>>>>>>>>>> StreamExecutionEnvironment(Configuration)`. These two > >>>>>>>>>> ConfigOption's > >>>>>>>>>>>>>> must not be SQL Client specific but can be part of the core > >>>> table > >>>>>>>>>>> code > >>>>>>>>>>>>>> base. Many users would like to get a 100% preconfigured > >>>>>>>>>> environment > >>>>>>>>>>>> from > >>>>>>>>>>>>>> just Configuration. And this is not possible right now. We > can > >>>>>>>>>> solve > >>>>>>>>>>>>>> both use cases in one change. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 2) "the sql client, we will maintain two parsers" > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I remember we had some discussion about this and decided > that > >> we > >>>>>>>>>>> would > >>>>>>>>>>>>>> like to maintain only one parser. In the end it is "One > Flink > >>>>>> SQL" > >>>>>>>>>>>> where > >>>>>>>>>>>>>> commands influence each other also with respect to keywords. > >> It > >>>>>>>>>>> should > >>>>>>>>>>>>>> be fine to include the SQL Client commands in the Flink > >> parser. > >>>>>> Of > >>>>>>>>>>>>>> cource the table environment would not be able to handle the > >>>>>>>>>>>> `Operation` > >>>>>>>>>>>>>> instance that would be the result but we can introduce hooks > >> to > >>>>>>>>>>> handle > >>>>>>>>>>>>>> those `Operation`s. Or we introduce parser extensions. 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Can we skip `table.job.async` in the first version? We > should > >>>>>>>>>> further > >>>>>>>>>>>>>> discuss whether we introduce a special SQL clause for > wrapping > >>>>>>>>>> async > >>>>>>>>>>>>>> behavior or if we use a config option? Esp. for streaming > >>>> queries > >>>>>>>>>> we > >>>>>>>>>>>>>> need to be careful and should force users to either "one > >> INSERT > >>>>>>>>>> INTO" > >>>>>>>>>>>> or > >>>>>>>>>>>>>> "one STATEMENT SET". > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 3) 4) "HIVE also uses these commands" > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> In general, Hive is not a good reference. Aligning the > >> commands > >>>>>>>>>> more > >>>>>>>>>>>>>> with the remaining commands should be our goal. We just had > a > >>>>>>>>>> MODULE > >>>>>>>>>>>>>> discussion where we selected SHOW instead of LIST. But it is > >>>> true > >>>>>>>>>>> that > >>>>>>>>>>>>>> JARs are not part of the catalog which is why I would not > use > >>>>>>>>>>>>>> CREATE/DROP. ADD/REMOVE are commonly siblings in the English > >>>>>>>>>>> language. > >>>>>>>>>>>>>> Take a look at the Java collection API as another example. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 6) "Most of the commands should belong to the table > >> environment" > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks for updating the FLIP this makes things easier to > >>>>>>>>>> understand. > >>>>>>>>>>> It > >>>>>>>>>>>>>> is good to see that most commends will be available in > >>>>>>>>>>>> TableEnvironment. > >>>>>>>>>>>>>> However, I would also support SET and RESET for consistency. > >>>>>>>>>> Again, > >>>>>>>>>>>> from > >>>>>>>>>>>>>> an architectural point of view, if we would allow some kind > of > >>>>>>>>>>>>>> `Operation` hook in table environment, we could check for > SQL > >>>>>>>>>> Client > >>>>>>>>>>>>>> specific options and forward to regular > >>>>>>>>>>> `TableConfig.getConfiguration` > >>>>>>>>>>>>>> otherwise. What do you think? 
> >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 03.02.21 08:58, Jark Wu wrote: > >>>>>>>>>>>>>>> Hi Timo, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I will respond some of the questions: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 1) SQL client specific options > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Whether it starts with "table" or "sql-client" depends on > >> where > >>>>>>>>>> the > >>>>>>>>>>>>>>> configuration takes effect. > >>>>>>>>>>>>>>> If it is a table configuration, we should make clear what's > >> the > >>>>>>>>>>>>> behavior > >>>>>>>>>>>>>>> when users change > >>>>>>>>>>>>>>> the configuration in the lifecycle of TableEnvironment. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I agree with Shengkai `sql-client.planner` and > >>>>>>>>>>>>>> `sql-client.execution.mode` > >>>>>>>>>>>>>>> are something special > >>>>>>>>>>>>>>> that can't be changed after TableEnvironment has been > >>>>>>>>>> initialized. > >>>>>>>>>>>> You > >>>>>>>>>>>>>> can > >>>>>>>>>>>>>>> see > >>>>>>>>>>>>>>> `StreamExecutionEnvironment` provides `configure()` method > >> to > >>>>>>>>>>>> override > >>>>>>>>>>>>>>> configuration after > >>>>>>>>>>>>>>> StreamExecutionEnvironment has been initialized. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Therefore, I think it would be better to still use > >>>>>>>>>>>>> `sql-client.planner` > >>>>>>>>>>>>>>> and `sql-client.execution.mode`. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> 2) Execution file > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> >From my point of view, there is a big difference between > >>>>>>>>>>>>>>> `sql-client.job.detach` and > >>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` that > >>>>>>>>>> `sql-client.job.detach` > >>>>>>>>>>>> will > >>>>>>>>>>>>>>> affect every single DML statement > >>>>>>>>>>>>>>> in the terminal, not only the statements in SQL files. 
I > >> think > >>>>>>>>>> the > >>>>>>>>>>>>> single > >>>>>>>>>>>>>>> DML statement in the interactive > >>>>>>>>>>>>>>> terminal is something like tEnv#executeSql() instead of > >>>>>>>>>>>>>>> tEnv#executeMultiSql. > >>>>>>>>>>>>>>> So I don't like the "multi" and "sql" keyword in > >>>>>>>>>>>>> `table.multi-sql-async`. > >>>>>>>>>>>>>>> I just find that runtime provides a configuration called > >>>>>>>>>>>>>>> "execution.attached" [1] which is false by default > >>>>>>>>>>>>>>> which specifies if the pipeline is submitted in attached or > >>>>>>>>>>> detached > >>>>>>>>>>>>>> mode. > >>>>>>>>>>>>>>> It provides exactly the same > >>>>>>>>>>>>>>> functionality of `sql-client.job.detach`. What do you think > >>>>>>>>>> about > >>>>>>>>>>>> using > >>>>>>>>>>>>>>> this option? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> If we also want to support this config in > TableEnvironment, I > >>>>>>>>>> think > >>>>>>>>>>>> it > >>>>>>>>>>>>>>> should also affect the DML execution > >>>>>>>>>>>>>>> of `tEnv#executeSql()`, not only DMLs in > >>>>>>>>>>> `tEnv#executeMultiSql()`. > >>>>>>>>>>>>>>> Therefore, the behavior may look like this: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> val tableResult = tEnv.executeSql("INSERT INTO ...") ==> > >> async > >>>>>>>>>> by > >>>>>>>>>>>>>> default > >>>>>>>>>>>>>>> tableResult.await() ==> manually block until finish > >>>>>>>>>>>>>>> > >>>>>>>>>> > >> tEnv.getConfig().getConfiguration().setString("execution.attached", > >>>>>>>>>>>>>> "true") > >>>>>>>>>>>>>>> val tableResult2 = tEnv.executeSql("INSERT INTO ...") ==> > >>>> sync, > >>>>>>>>>>>> don't > >>>>>>>>>>>>>> need > >>>>>>>>>>>>>>> to wait on the TableResult > >>>>>>>>>>>>>>> tEnv.executeMultiSql( > >>>>>>>>>>>>>>> """ > >>>>>>>>>>>>>>> CREATE TABLE .... ==> always sync > >>>>>>>>>>>>>>> INSERT INTO ... => sync, because we set configuration > above > >>>>>>>>>>>>>>> SET execution.attached = false; > >>>>>>>>>>>>>>> INSERT INTO ... 
=> async > >>>>>>>>>>>>>>> """) > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On the other hand, I think `sql-client.job.detach` > >>>>>>>>>>>>>>> and `TableEnvironment.executeMultiSql()` should be two > >> separate > >>>>>>>>>>>> topics, > >>>>>>>>>>>>>>> as Shengkai mentioned above, SQL CLI only depends on > >>>>>>>>>>>>>>> `TableEnvironment#executeSql()` to support multi-line > >>>>>>>>>> statements. > >>>>>>>>>>>>>>> I'm fine with making `executeMultiSql()` clear but don't > want > >>>>>>>>>> it to > >>>>>>>>>>>>> block > >>>>>>>>>>>>>>> this FLIP, maybe we can discuss this in another thread. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> [1]: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>> > >>>>>> > >>>> > >> > https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang < > >> fskm...@gmail.com> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hi, Timo. > >>>>>>>>>>>>>>>> Thanks for your detailed feedback. I have some thoughts > >> about > >>>>>>>>>> your > >>>>>>>>>>>>>>>> feedback. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> *Regarding #1*: I think the main problem is whether the > >> table > >>>>>>>>>>>>>> environment > >>>>>>>>>>>>>>>> has the ability to update itself. Let's take a simple > >> program > >>>>>>>>>> as > >>>>>>>>>>> an > >>>>>>>>>>>>>>>> example. 
> >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> ``` > >>>>>>>>>>>>>>>> TableEnvironment tEnv = TableEnvironment.create(...); > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> tEnv.getConfig.getConfiguration.setString("table.planner", > >>>>>>>>>> "old"); > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> tEnv.executeSql("..."); > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> ``` > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> If we regard this option as a table option, users don't > have > >>>> to > >>>>>>>>>>>> create > >>>>>>>>>>>>>>>> another table environment manually. In that case, tEnv > needs > >>>> to > >>>>>>>>>>>> check > >>>>>>>>>>>>>>>> whether the current mode and planner are the same as > before > >>>>>>>>>> when > >>>>>>>>>>>>>> executeSql > >>>>>>>>>>>>>>>> or explainSql. I don't think it's easy work for the table > >>>>>>>>>>>> environment, > >>>>>>>>>>>>>>>> especially if users have a StreamExecutionEnvironment but > >> set > >>>>>>>>>> old > >>>>>>>>>>>>>> planner > >>>>>>>>>>>>>>>> and batch mode. But when we make this option as a sql > client > >>>>>>>>>>> option, > >>>>>>>>>>>>>> users > >>>>>>>>>>>>>>>> only use the SET command to change the setting. We can > >> rebuild > >>>>>>>>>> a > >>>>>>>>>>> new > >>>>>>>>>>>>>> table > >>>>>>>>>>>>>>>> environment when set successes. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> *Regarding #2*: I think we need to discuss the > >> implementation > >>>>>>>>>>> before > >>>>>>>>>>>>>>>> continuing this topic. In the sql client, we will maintain > >> two > >>>>>>>>>>>>> parsers. > >>>>>>>>>>>>>> The > >>>>>>>>>>>>>>>> first parser(client parser) will only match the sql client > >>>>>>>>>>> commands. > >>>>>>>>>>>>> If > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> client parser can't parse the statement, we will leverage > >> the > >>>>>>>>>>> power > >>>>>>>>>>>> of > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> table environment to execute. 
According to our blueprint, > >>>>>>>>>>>>>>>> TableEnvironment#executeSql is enough for the sql client. > >>>>>>>>>>> Therefore, > >>>>>>>>>>>>>>>> TableEnvironment#executeMultiSql is out-of-scope for this > >>>> FLIP. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> But if we need to introduce the > >>>>>>>>>> `TableEnvironment.executeMultiSql` > >>>>>>>>>>>> in > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> future, I think it's OK to use the option > >>>>>>>>>> `table.multi-sql-async` > >>>>>>>>>>>>> rather > >>>>>>>>>>>>>>>> than option `sql-client.job.detach`. But we think the name > >> is > >>>>>>>>>> not > >>>>>>>>>>>>>> suitable > >>>>>>>>>>>>>>>> because the name is confusing for others. When setting the > >>>>>>>>>> option > >>>>>>>>>>>>>> false, we > >>>>>>>>>>>>>>>> just mean it will block the execution of the INSERT INTO > >>>>>>>>>>> statement, > >>>>>>>>>>>>> not > >>>>>>>>>>>>>> DDL > >>>>>>>>>>>>>>>> or others(other sql statements are always executed > >>>>>>>>>> synchronously). > >>>>>>>>>>>> So > >>>>>>>>>>>>>> how > >>>>>>>>>>>>>>>> about `table.job.async`? It only works for the sql-client > >> and > >>>>>>>>>> the > >>>>>>>>>>>>>>>> executeMultiSql. If we set this value false, the table > >>>>>>>>>> environment > >>>>>>>>>>>>> will > >>>>>>>>>>>>>>>> return the result until the job finishes. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> *Regarding #3, #4*: I still think we should use DELETE JAR > >> and > >>>>>>>>>>> LIST > >>>>>>>>>>>>> JAR > >>>>>>>>>>>>>>>> because HIVE also uses these commands to add the jar into > >> the > >>>>>>>>>>>>> classpath > >>>>>>>>>>>>>> or > >>>>>>>>>>>>>>>> delete the jar. If we use such commands, it can reduce > our > >>>>>>>>>> work > >>>>>>>>>>> for > >>>>>>>>>>>>>> hive > >>>>>>>>>>>>>>>> compatibility. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> For SHOW JAR, I think the main concern is the jars are not > >>>>>>>>>>>> maintained > >>>>>>>>>>>>> by > >>>>>>>>>>>>>>>> the Catalog. 
If we really need to keep consistent with SQL grammar, maybe we should use:

`ADD JAR` -> `CREATE JAR`,
`DELETE JAR` -> `DROP JAR`,
`LIST JAR` -> `SHOW JAR`.

*Regarding #5*: I agree with you that we'd better keep consistent.

*Regarding #6*: Yes. Most of the commands should belong to the table environment. In the Summary section, I use the <NOTE> tag to identify which commands belong to the sql client and which belong to the table environment. I have also added a new section about implementation details to the FLIP.

Best,
Shengkai

Timo Walther <twal...@apache.org> wrote on Tue, 2 Feb 2021 at 18:43:

Thanks for this great proposal Shengkai. This will give the SQL Client a very good update and make it production ready.

Here is some feedback from my side:

1) SQL client specific options

I don't think that `sql-client.planner` and `sql-client.execution.mode` are SQL Client specific. Similar to `StreamExecutionEnvironment` and `ExecutionConfig#configure`, which have been added recently, we should offer the same possibility for TableEnvironment.
How about we offer `TableEnvironment.create(ReadableConfig)` and add `table.planner` and `table.execution-mode` options to `org.apache.flink.table.api.config.TableConfigOptions`?

2) Execution file

Did you have a look at the Appendix of FLIP-84 [1], including the mailing list thread at that time? Could you further elaborate on how multi-statement execution should work for a unified batch/streaming story? According to our past discussions, each line in an execution file should be executed blocking, which means a streaming query needs a statement set to execute multiple INSERT INTO statements, correct? We should also offer this functionality in `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is SQL Client specific still needs to be determined; it could also be a general `table.multi-sql-async` option.

3) DELETE JAR

Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is actively deleting the JAR at the corresponding path.

4) LIST JAR

This should be `SHOW JARS`, in line with other SQL commands such as `SHOW CATALOGS`, `SHOW TABLES`, etc. [2].
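Taken together, the JAR command variants raised in this thread map out like this (a sketch of the proposed grammar under discussion; none of these keywords were final at this point):

```sql
-- Variants discussed in this thread (none final here):
ADD JAR '/path/to/udf.jar';      -- SQL-style alternative: CREATE JAR
DELETE JAR '/path/to/udf.jar';   -- REMOVE JAR was suggested; SQL-style: DROP JAR
LIST JAR;                        -- SHOW JARS was suggested, matching SHOW TABLES etc.
```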
5) EXPLAIN [ExplainDetail[, ExplainDetail]*]

We should keep the details in sync with `org.apache.flink.table.api.ExplainDetail` and avoid confusion about differently named ExplainDetails. I would vote for `ESTIMATED_COST` instead of `COST`; I'm sure the original author had a reason to call it that way.

6) Implementation details

It would be nice to understand how we plan to implement the given features. Most of the commands and config options should go into TableEnvironment and SqlParser directly, correct? This way users have a unified way of using Flink SQL, and TableEnvironment provides a similar user experience in notebooks or interactive programs as the SQL Client does.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
[2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html

Regards,
Timo

On 02.02.21 10:13, Shengkai Fang wrote:

Sorry for the typo.
I mean `RESET` is much better than `UNSET`.

Shengkai Fang <fskm...@gmail.com> wrote on Tue, 2 Feb 2021 at 16:44:

Hi, Jingsong.

Thanks for your reply. I think `UNSET` is much better.

1. We don't need to introduce another command `UNSET`: `RESET` is already supported in the current sql client. Our proposal just extends its grammar to allow users to reset specific keys.
2. Hive Beeline also uses `RESET` to set a key back to its default value [1]. I think it is friendlier for batch users.

Best,
Shengkai

[1] https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Jingsong Li <jingsongl...@gmail.com> wrote on Tue, 2 Feb 2021 at 13:56:

Thanks for the proposal; yes, the sql client is too outdated. +1 for improving it.

About "SET" and "RESET": why not "SET" and "UNSET"?

Best,
Jingsong

On Mon, Feb 1, 2021 at 2:46 PM Rui Li <lirui.fu...@gmail.com> wrote:

Thanks Shengkai for the update! The proposed changes look good to me.
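The SET / RESET grammar discussed in this exchange can be sketched as follows (the per-key RESET is the proposal being debated here, not behavior that existed at the time):

```sql
SET table.planner = blink;   -- set a session property
SET;                         -- list all properties
RESET table.planner;         -- proposed extension: reset one key to its default
RESET;                       -- reset all properties (already supported by the client)
```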
On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <fskm...@gmail.com> wrote:

Hi, Rui.

You are right. I have already modified the FLIP.

The main changes:

# The -f parameter has no restriction on the statement type.

Sometimes users pipe the result of queries somewhere for debugging when submitting a job with the -f parameter. That is much more convenient than writing INSERT INTO statements.

# Add a new sql client option `sql-client.job.detach`.

Users prefer to execute jobs one by one in batch mode. They can set this option to false, and the client will only process the next job once the current job finishes. The default value of this option is true, which means the client will execute the next job as soon as the current job is submitted.
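As an analogy only (no Flink APIs; the class and timings are made up), the detach semantics above amount to whether the client waits on each job's completion future before starting the next one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/**
 * Toy model of the proposed detach option: in detached mode the client moves
 * on as soon as a job is submitted; otherwise it blocks until the job
 * finishes. This is an analogy only, not Flink code.
 */
public class DetachDemo {

    /** Pretend job submission: returns a future that completes after a delay. */
    static CompletableFuture<String> submit(String job) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return job + " finished";
        });
    }

    /** Runs jobs in order; blocks between jobs unless detach is true. */
    static List<CompletableFuture<String>> runAll(List<String> jobs, boolean detach) {
        List<CompletableFuture<String>> results = new ArrayList<>();
        for (String job : jobs) {
            CompletableFuture<String> result = submit(job);
            if (!detach) {
                result.join(); // attached: wait before submitting the next job
            }
            results.add(result);
        }
        return results;
    }

    public static void main(String[] args) {
        // Attached (detach=false): every future is done when runAll returns.
        runAll(List.of("INSERT 1", "INSERT 2"), false)
                .forEach(f -> System.out.println(f.join()));
    }
}
```

The ordering concern raised later in the thread (a second INSERT depending on the first) is exactly the attached case: `join()` between submissions guarantees sequencing, while detached mode does not.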
Best,
Shengkai

Rui Li <lirui.fu...@gmail.com> wrote on Fri, 29 Jan 2021 at 16:52:

Hi Shengkai,

Regarding #2, maybe the -f options in Flink and Hive have different implications, and we should clarify the behavior. For example, if the client just submits the job and exits, what happens if the file contains two INSERT statements? I don't think we should treat them as a statement set, because users should explicitly write BEGIN STATEMENT SET in that case. And the client shouldn't submit the two jobs asynchronously, because the 2nd may depend on the 1st, right?

On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <fskm...@gmail.com> wrote:

Hi Rui,

Thanks for your feedback. I agree with your suggestions.

For suggestion 1: Yes, we plan to strengthen the SET command.
In the implementation, it will just put the key-value pair into the `Configuration`, which is then used to generate the table config. If Hive supports reading settings from the table config, users will be able to set Hive-related settings this way.

For suggestion 2: The -f parameter will submit the job and exit. If a query never ends, users have to cancel the job themselves, which is not reliable (people may forget their jobs). In most cases, queries are used to analyze data, so users should run queries in interactive mode.

Best,
Shengkai

Rui Li <lirui.fu...@gmail.com> wrote on Fri, 29 Jan 2021 at 15:18:

Thanks Shengkai for bringing up this discussion. I think it covers a lot of useful features which will dramatically improve the usability of our SQL Client. I have two questions regarding the FLIP.

1. 
Do you think we can let users set arbitrary configurations via the SET command? A connector may have its own configurations, and we don't have a way to dynamically change such configurations in the SQL Client. For example, users may want to change the Hive conf when using the Hive connector [1].
2. Any reason why we have to forbid queries in SQL files specified with the -f option? Hive supports a similar -f option but allows queries in the file, and a common use case is to run some query and redirect the results to a file. So I think Flink users would like to do the same, especially in batch scenarios.

[1] https://issues.apache.org/jira/browse/FLINK-20590

On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <liuyang0...@gmail.com> wrote:

Hi Shengkai,

Glad to see this improvement. And I have some additional suggestions:

#1. 
Unify the TableEnvironment in ExecutionContext to StreamTableEnvironment for both streaming and batch SQL.
#2. Improve the way results are retrieved: the sql client currently collects the results locally all at once using accumulators, which may cause memory issues in the JM or locally for big query results. Accumulators are only suitable for testing purposes. We may change to use SelectTableSink, which is based on CollectSinkOperatorCoordinator.
#3. Do we need to consider the Flink SQL gateway from FLIP-91? That FLIP seems to have not moved forward for a long time. Providing a long-running service out of the box to facilitate SQL submission is necessary.

What do you think of these?
[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway

Shengkai Fang <fskm...@gmail.com> wrote on Thu, 28 Jan 2021 at 20:54:

Hi devs,

Jark and I want to start a discussion about FLIP-163: SQL Client Improvements.

Many users have complained about problems with the sql client. For example, users cannot register the tables proposed by FLIP-95.

The main changes in this FLIP:

- use the -i parameter to specify a sql file that initializes the table environment, deprecating the YAML file;
- add -f to submit a sql file, deprecating the '-u' parameter;
- add more interactive commands, e.g. ADD JAR;
- support statement set syntax;

For more detailed changes, please refer to FLIP-163 [1].

Look forward to your feedback.
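To make the -i / -f split and the statement set syntax concrete, an initialization file and an execution file might look like this (a sketch only; the table definitions and connector names are hypothetical examples, not taken from the FLIP):

```sql
-- init.sql (passed with -i): initializes the session, replacing the YAML file.
-- Table definitions and connector choices here are made-up examples.
CREATE TABLE orders (id BIGINT, amount DOUBLE) WITH ('connector' = 'datagen');
CREATE TABLE sink1 (id BIGINT) WITH ('connector' = 'blackhole');
CREATE TABLE sink2 (total DOUBLE) WITH ('connector' = 'blackhole');

-- job.sql (passed with -f): a statement set groups several INSERTs into one job.
BEGIN STATEMENT SET;
INSERT INTO sink1 SELECT id FROM orders;
INSERT INTO sink2 SELECT SUM(amount) FROM orders;
END;
```

The invocation would then be along the lines of `sql-client.sh -i init.sql -f job.sql` (script name as in Flink's distribution; the flags are the ones proposed in this FLIP).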
Best,
Shengkai

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements

--
*With kind regards
------------------------------------------------------------
Sebastian Liu 刘洋
Institute of Computing Technology, Chinese Academy of Science
Mobile\WeChat: +86-15201613655
E-mail: liuyang0...@gmail.com
QQ: 3239559*

--
Best regards!
Rui Li