Hi Godfrey,

Sorry for the late reply, I was on vacation.
It looks like we have a variety of preferences on the syntax, so how about we choose the most widely acceptable one?

WRT the keyword for SQL jobs, we use JOBS, so the statements related to jobs would be:

- SHOW JOBS
- STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and `table.job.stop-with-drain`)

WRT savepoints for SQL jobs, we use the `CREATE/DROP` pattern with `FOR JOB`:

- CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
- SHOW SAVEPOINTS FOR JOB <job_id> (shows the savepoints the current job manager remembers)
- DROP SAVEPOINT <savepoint_path>

cc @Jark @ShengKai @Martijn @Timo

Best,
Paul Lam

godfrey he <godfre...@gmail.com> wrote on Mon, 23 May 2022 at 21:34:

> Hi Paul,
>
> Thanks for the update.
>
> > 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
> > (DataStream or SQL) or clients (SQL client or CLI).
>
> Is a DataStream job a QUERY? I think not.
> For a QUERY, the most important concept is the statement, but the
> result does not contain this info.
> If we need to include all jobs in the cluster, I think the name should
> be JOB or PIPELINE.
> I lean toward SHOW PIPELINES and STOP PIPELINE [IF RUNNING] <id>.
>
> > SHOW SAVEPOINTS
>
> To list the savepoints for a specific job, we need to specify a
> specific pipeline, so the syntax should be SHOW SAVEPOINTS FOR PIPELINE <id>.
>
> Best,
> Godfrey
>
> Paul Lam <paullin3...@gmail.com> wrote on Fri, 20 May 2022 at 11:25:
> >
> > Hi Jark,
> >
> > WRT “DROP QUERY”, I agree that it’s not very intuitive, and that’s
> > part of the reason why I proposed “STOP/CANCEL QUERY” at the
> > beginning. The downside is that it’s not ANSI-SQL compatible.
> >
> > Another question is: what should the syntax be for ungracefully
> > canceling a query? As ShengKai pointed out in an offline discussion,
> > “STOP QUERY” and “CANCEL QUERY” might confuse SQL users.
> > The Flink CLI has both stop and cancel, mostly for historical reasons.
> >
> > WRT “SHOW SAVEPOINT”, I agree it’s a missing part.
> > My concern is that savepoints are owned by users and live beyond the
> > lifecycle of a Flink cluster. For example, a user might take a savepoint
> > at a custom path that’s different from the default savepoint path; I
> > think the jobmanager would not remember that, not to mention the
> > jobmanager may be a brand-new one after a cluster restart. Thus if we
> > support “SHOW SAVEPOINT”, it’s probably a best-effort one.
> >
> > WRT the savepoint syntax, I’m thinking about the semantics of
> > savepoints. Savepoints are aliases for nested transactions in the DB
> > area [1], and correspondingly there are global transactions. If we
> > consider Flink jobs as global transactions and Flink checkpoints as
> > nested transactions, then the savepoint semantics are close, so I think
> > the savepoint syntax in the SQL standard could be considered. But again,
> > I don’t have a very strong preference.
> >
> > Ping @Timo to get more inputs.
> >
> > [1] https://en.wikipedia.org/wiki/Nested_transaction
> >
> > Best,
> > Paul Lam
> >
> > > On 18 May 2022, at 17:48, Jark Wu <imj...@gmail.com> wrote:
> > >
> > > Hi Paul,
> > >
> > > 1) SHOW QUERIES
> > > +1 to add finished time, but it would be better to call it "end_time"
> > > to keep it aligned with the names in the Web UI.
> > >
> > > 2) DROP QUERY
> > > I think we shouldn't throw exceptions for batch jobs; otherwise, how
> > > would we stop batch queries?
> > > At present, I don't think "DROP" is a suitable keyword for this
> > > statement. From the perspective of users, "DROP" sounds like the
> > > query should be removed from the list of "SHOW QUERIES". However, it
> > > isn't. Maybe "STOP QUERY" is more suitable and compliant with the
> > > commands of the Flink CLI.
> > >
> > > 3) SHOW SAVEPOINTS
> > > I think this statement is needed; otherwise, savepoints are lost
> > > after the SAVEPOINT command is executed.
> > > Savepoints can be retrieved from the REST API
> > > "/jobs/:jobid/checkpoints" by filtering on
> > > "checkpoint_type"="savepoint". It's also worth considering providing
> > > "SHOW CHECKPOINTS" to list all checkpoints.
> > >
> > > 4) SAVEPOINT & RELEASE SAVEPOINT
> > > I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT
> > > statements now.
> > > In other vendors, the parameters of SAVEPOINT and RELEASE SAVEPOINT
> > > are both the same savepoint id.
> > > However, in our syntax, the first one is a query id and the second
> > > one is a savepoint path, which is confusing and inconsistent. When I
> > > came across SHOW SAVEPOINT, I thought maybe they should be in the
> > > same syntax set.
> > > For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP SAVEPOINT
> > > <sp_path>.
> > > That means we wouldn't follow the majority of vendors in the
> > > SAVEPOINT commands; I would say the purpose is different in Flink.
> > > What are others' opinions on this?
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > > https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
> > >
> > > On Wed, 18 May 2022 at 14:43, Paul Lam <paullin3...@gmail.com> wrote:
> > >
> > >> Hi Godfrey,
> > >>
> > >> Thanks a lot for your inputs!
> > >>
> > >> 'SHOW QUERIES' lists all jobs in the cluster, with no limit on APIs
> > >> (DataStream or SQL) or clients (SQL client or CLI). Under the hood,
> > >> it’s based on ClusterClient#listJobs, the same as the Flink CLI. I
> > >> think it’s okay to have non-SQL jobs listed in the SQL client,
> > >> because these jobs can be managed via the SQL client too.
> > >>
> > >> WRT finished time, I think you’re right. Adding it to the FLIP. But
> > >> I’m a bit afraid that the rows would be too long.
> > >>
> > >> WRT ‘DROP QUERY’:
> > >>> What's the behavior for batch jobs and non-running jobs?
> > >>
> > >> In general, the behavior would be aligned with the Flink CLI.
> > >> Triggering a savepoint for a non-running job would cause errors,
> > >> and the error message would be printed to the SQL client. Triggering
> > >> a savepoint for batch (bounded) jobs in streaming execution mode
> > >> would be the same as for streaming jobs. However, for batch jobs in
> > >> batch execution mode, I think there would be an error, because batch
> > >> execution doesn’t support checkpoints currently (please correct me
> > >> if I’m wrong).
> > >>
> > >> WRT ‘SHOW SAVEPOINTS’, I’ve thought about it, but Flink’s
> > >> ClusterClient/JobClient doesn’t have such functionality at the
> > >> moment, and neither does the Flink CLI. Maybe we could make it a
> > >> follow-up FLIP, which includes the modifications to
> > >> ClusterClient/JobClient and the Flink CLI. WDYT?
> > >>
> > >> Best,
> > >> Paul Lam
> > >>
> > >>> On 17 May 2022, at 20:34, godfrey he <godfre...@gmail.com> wrote:
> > >>>
> > >>> Godfrey
> > >>
> > >>
> > >
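[Editor's note] For readers skimming the thread, the statement set proposed at the top could look like the sketch below. This is proposal-stage syntax that may change as the FLIP evolves; the job id and savepoint paths are made-up placeholders, and the idea that the stop options are configured via SET before STOP JOBS is an assumption of this sketch, not something the thread has settled.

```sql
-- List all jobs in the cluster, whether submitted via SQL or the DataStream API.
SHOW JOBS;

-- Hypothetical: configure graceful-stop behavior, then stop a job by id.
SET 'table.job.stop-with-savepoint' = 'true';
SET 'table.job.stop-with-drain' = 'true';
STOP JOBS '228d70913eab60dda85c5e7f78b5782c';

-- Savepoint lifecycle via the CREATE/DROP pattern with FOR JOB.
CREATE SAVEPOINT '/flink/savepoints/sp-1' FOR JOB '228d70913eab60dda85c5e7f78b5782c';

-- Best-effort: lists only the savepoints the current jobmanager remembers.
SHOW SAVEPOINTS FOR JOB '228d70913eab60dda85c5e7f78b5782c';

-- Savepoints are user-owned files; DROP targets a path rather than a savepoint id.
DROP SAVEPOINT '/flink/savepoints/sp-1';
```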