Hi Godfrey,

Sorry for the late reply, I was on vacation.

It looks like we have a variety of preferences on the syntax, so how about
we go with the one that's most acceptable to everyone?

WRT the keyword for SQL jobs, we use JOBS, thus the job-related statements
would be as follows (a rough usage sketch follows the list):

- SHOW JOBS
- STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
`table.job.stop-with-drain`)
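
To make it concrete, here's a rough sketch of how this could look in the
SQL client (the job id is made up, and setting the options via `SET` is
just my assumption, not part of the proposal yet):

  -- list all jobs in the cluster (SQL and DataStream alike)
  SHOW JOBS;

  -- stop a job, taking a savepoint and draining it first
  SET 'table.job.stop-with-savepoint' = 'true';
  SET 'table.job.stop-with-drain' = 'true';
  STOP JOBS '6b1af540c0c0bb3fcfcad50ac037c862';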

WRT savepoints for SQL jobs, we use the `CREATE/DROP` pattern with
`FOR JOB` (again, a usage sketch follows the list):

- CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
- SHOW SAVEPOINTS FOR JOB <job_id> (shows the savepoints that the current
jobmanager remembers)
- DROP SAVEPOINT <savepoint_path>
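
Again, just a sketch to illustrate the intended usage (the path and job id
are made up, and quoting them as string literals is my assumption):

  -- trigger a savepoint at an explicit path for a running job
  CREATE SAVEPOINT '/tmp/savepoints/savepoint-6b1af5'
    FOR JOB '6b1af540c0c0bb3fcfcad50ac037c862';

  -- list the savepoints the current jobmanager remembers for the job
  SHOW SAVEPOINTS FOR JOB '6b1af540c0c0bb3fcfcad50ac037c862';

  -- remove the savepoint at the given path
  DROP SAVEPOINT '/tmp/savepoints/savepoint-6b1af5';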

cc @Jark @ShengKai @Martijn @Timo .

Best,
Paul Lam


On Mon, 23 May 2022 at 21:34, godfrey he <godfre...@gmail.com> wrote:

> Hi Paul,
>
> Thanks for the update.
>
> > 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
> > (DataStream or SQL) or clients (SQL client or CLI).
>
> Is a DataStream job a QUERY? I think not.
> For a QUERY, the most important concept is the statement. But the
> result does not contain this info.
> If we need to include all jobs in the cluster, I think the name should
> be JOB or PIPELINE.
> I lean towards SHOW PIPELINES and STOP PIPELINE [IF RUNNING] id.
>
> > SHOW SAVEPOINTS
> To list the savepoints for a specific job, we need to specify a
> specific pipeline; the syntax should be SHOW SAVEPOINTS FOR PIPELINE id
>
> Best,
> Godfrey
>
> On Fri, 20 May 2022 at 11:25, Paul Lam <paullin3...@gmail.com> wrote:
> >
> > Hi Jark,
> >
> > WRT “DROP QUERY”, I agree that it’s not very intuitive, and that’s
> > part of the reason why I proposed “STOP/CANCEL QUERY” at the
> > beginning. The downside of it is that it’s not ANSI-SQL compatible.
> >
> > Another question is, what should be the syntax for ungracefully
> > canceling a query? As ShengKai pointed out in an offline discussion,
> > “STOP QUERY” and “CANCEL QUERY” might confuse SQL users.
> > Flink CLI has both stop and cancel, mostly due to historical problems.
> >
> > WRT “SHOW SAVEPOINT”, I agree it’s a missing part. My concern is
> > that savepoints are owned by users and beyond the lifecycle of a Flink
> > cluster. For example, a user might take a savepoint at a custom path
> > that’s different from the default savepoint path; I don’t think the
> > jobmanager would remember that, not to mention that the jobmanager may
> > be a brand-new one after a cluster restart. Thus if we support “SHOW
> > SAVEPOINT”, it's probably a best-effort one.
> >
> > WRT the savepoint syntax, I’m thinking of the semantics of savepoints.
> > Savepoints are aliases for nested transactions in the DB area [1], and
> > there are correspondingly global transactions. If we consider Flink
> > jobs as global transactions and Flink checkpoints as nested
> > transactions, then the savepoint semantics are close; thus I think the
> > SQL-standard savepoint syntax could be considered. But again, I don’t
> > have a very strong preference.
> >
> > Ping @Timo to get more inputs.
> >
> > [1] https://en.wikipedia.org/wiki/Nested_transaction
> >
> > Best,
> > Paul Lam
> >
> > > On Wed, 18 May 2022 at 17:48, Jark Wu <imj...@gmail.com> wrote:
> > >
> > > Hi Paul,
> > >
> > > 1) SHOW QUERIES
> > > +1 to add the finished time, but it would be better to call it
> > > "end_time" to keep it aligned with the names in the Web UI.
> > >
> > > 2) DROP QUERY
> > > I think we shouldn't throw exceptions for batch jobs; otherwise, how
> > > could users stop batch queries?
> > > At present, I don't think "DROP" is a suitable keyword for this
> > > statement. From the perspective of users, "DROP" sounds like the query
> > > should be removed from the list of "SHOW QUERIES". However, it isn't
> > > removed. Maybe "STOP QUERY" is more suitable and compliant with the
> > > commands of the Flink CLI.
> > >
> > > 3) SHOW SAVEPOINTS
> > > I think this statement is needed; otherwise, savepoints are lost
> > > after the SAVEPOINT command is executed. Savepoints can be retrieved
> > > from the REST API "/jobs/:jobid/checkpoints" with the filter
> > > "checkpoint_type"="savepoint". It's also worth considering providing
> > > "SHOW CHECKPOINTS" to list all checkpoints.
> > >
> > > 4) SAVEPOINT & RELEASE SAVEPOINT
> > > I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT
> > > statements now.
> > > Among the vendors, the parameters of SAVEPOINT and RELEASE SAVEPOINT
> > > are both the same savepoint id.
> > > However, in our syntax, the first one is a query id and the second one
> > > is a savepoint path, which is confusing and inconsistent. When I came
> > > across SHOW SAVEPOINT, I thought maybe they should be in the same
> > > syntax set.
> > > For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP SAVEPOINT
> > > <sp_path>.
> > > That means we don't follow the majority of vendors in SAVEPOINT
> > > commands; I would say the purpose is different in Flink.
> > > What are others' opinions on this?
> > >
> > > Best,
> > > Jark
> > >
> > > [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
> > >
> > >
> > > On Wed, 18 May 2022 at 14:43, Paul Lam <paullin3...@gmail.com> wrote:
> > >
> > >> Hi Godfrey,
> > >>
> > >> Thanks a lot for your inputs!
> > >>
> > >> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
> > >> (DataStream or SQL) or clients (SQL client or CLI). Under the hood,
> > >> it’s based on ClusterClient#listJobs, the same as the Flink CLI. I
> > >> think it’s okay to have non-SQL jobs listed in the SQL client,
> > >> because these jobs can be managed via the SQL client too.
> > >>
> > >> WRT the finished time, I think you’re right. I'll add it to the
> > >> FLIP. But I’m a bit afraid that the rows would be too long.
> > >>
> > >> WRT ‘DROP QUERY’,
> > >>> What's the behavior for batch jobs and the non-running jobs?
> > >>
> > >>
> > >> In general, the behavior would be aligned with the Flink CLI.
> > >> Triggering a savepoint for a non-running job would cause errors, and
> > >> the error message would be printed to the SQL client. Triggering a
> > >> savepoint for batch (bounded) jobs in streaming execution mode would
> > >> be the same as for streaming jobs. However, for batch jobs in batch
> > >> execution mode, I think there would be an error, because batch
> > >> execution doesn’t support checkpoints currently (please correct me if
> > >> I’m wrong).
> > >>
> > >> WRT ’SHOW SAVEPOINTS’, I’ve thought about it, but the Flink
> > >> clusterClient/jobClient doesn’t have such functionality at the
> > >> moment, and neither does the Flink CLI. Maybe we could make it a
> > >> follow-up FLIP, which includes the modifications to
> > >> clusterClient/jobClient and the Flink CLI. WDYT?
> > >>
> > >> Best,
> > >> Paul Lam
> > >>
> > >>> On Tue, 17 May 2022 at 20:34, godfrey he <godfre...@gmail.com> wrote:
> > >>>
> > >>> Godfrey
> > >>
> > >>
> >
>
