Hi Cheng,

Thanks a lot for your input! IMHO, the whole design is now pretty clear.

Let me re-summarize the plan as follows; please correct me if I’m wrong.

1. Discuss savepoint SQL syntax with Flink community

There are 3 types of syntax:

1) ANSI SQL style with extended objects

SHOW SAVEPOINTS <query_id>
CREATE SAVEPOINT <query_id>
DROP SAVEPOINT <savepoint_id>


2) ANSI SQL style with stored procedure

CALL trigger_savepoint(…)
CALL show_savepoints(…)
CALL drop_savepoint(...)

3) Command-like style 

TRIGGER SAVEPOINT <query_id>
SHOW SAVEPOINTS <query_id>
REMOVE SAVEPOINT <savepoint_path>

If we reach agreement with the Flink community on one of them, we adopt
that syntax. There might be some duplicated effort in the short term,
but we will eventually converge and reuse most, if not all, of the
functionality that Flink provides.

Personally speaking, I think 3) is the most likely to be adopted by the Flink
community, since TRIGGER/SAVEPOINT are already reserved
keywords in Flink SQL[1]. Should we propose only 3) to the Flink
community?
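
Just to illustrate how 3) could look end-to-end in a session (a rough sketch;
<query_id> and <savepoint_path> are placeholders, and the restore step relies
on Flink’s existing 'execution.savepoint.path' option):

TRIGGER SAVEPOINT <query_id>
SHOW SAVEPOINTS <query_id>

-- restoring from a savepoint is already supported by the Flink SQL client
SET 'execution.savepoint.path' = '<savepoint_path>'

REMOVE SAVEPOINT <savepoint_path>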

2. Draft a KIP about savepoint management

TODO list as far as we can see:
1) Support retrieving query ID in Flink engine (see the sketch below)
2) Introduce a SQL layer to support new SQL syntax if needed (compatible 
    with Flink SQL)
3) Support savepoint related operations in Flink engine
4) Extend Beeline to support query ID
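
For 1) and 3), here’s a minimal sketch of what the engine side could look like,
assuming Flink’s TableResult / JobClient APIs (the class and method names below
are made up for illustration only, not a final design):

import java.util.concurrent.CompletableFuture;
import org.apache.flink.api.common.JobID;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableResult;

public class SavepointSketch {

    // Returns the Flink job ID (i.e. the query ID) behind a DML statement.
    public static JobID executeAndGetQueryId(TableEnvironment tableEnv, String statement) {
        TableResult result = tableEnv.executeSql(statement);
        // Only statements that actually submit a Flink job carry a JobClient.
        JobClient jobClient = result.getJobClient()
            .orElseThrow(() -> new IllegalStateException("No job submitted for: " + statement));
        return jobClient.getJobID();
    }

    // Triggers a savepoint for the running job behind a TableResult and
    // returns a future holding the savepoint path.
    public static CompletableFuture<String> triggerSavepoint(TableResult result, String savepointDir) {
        return result.getJobClient()
            .map(client -> client.triggerSavepoint(savepointDir))
            .orElseThrow(() -> new IllegalStateException("No running job to savepoint"));
    }
}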

Best,
Paul Lam

> On Mar 30, 2022, at 16:01, Cheng Pan <[email protected]> wrote:
> 
> Thanks Paul, I agree that we’ve reached a consensus at a high level: 1)
> use SQL to manipulate the savepoint, 2) follow the upstreaming-first
> philosophy in SQL syntax and RPC protocol to achieve the best
> compatibility and user experience.
> 
> Specifically on the details, I’ll add some comments.
> 
>> 1) ANSI SQL
>>  `CALL trigger_savepoint($query_id)`
>>  `CALL show_savepoint($query_id)`
> 
> We could interpret the concept of “ANSI SQL-like” with more flexibility.
> 
> For instance, we have
> 
> SHOW TABLES [LIKE ...]
> ALTER TABLE <table_name> SET xxx
> ALTER TABLE ADD ...
> DROP TABLE <table_name>
> SELECT xxx FROM <table_name>
> DESC <table_name>
> 
> We can extend SQL in the same style for savepoints, e.g.
> 
> SHOW SAVEPOINTS <query_id>
> CREATE SAVEPOINT <query_id>
> DROP SAVEPOINT <query_id>
> SELECT ... FROM <system.savepoint_table_name> WHERE ...
> DESC <query_id>
> 
> One example is DistSQL[1]
> 
> The command style, in contrast, is about introducing new SQL action keywords, e.g.
> 
> OPTIMIZE <table_name>, VACUUM <table_name>, KILL <query_id>, KILL
> QUERY <query_id>
> 
> Usually, different engines/databases may have different syntax for the
> same behavior or different behavior in the same syntax. Unless the
> syntax has been adopted by the upstream, I prefer to use
> 
> CALL <procedure_name>(arg1, arg2, ...)
> 
> to avoid conflicts, and switch to the official syntax once the
> upstream introduces the new syntax.
> 
>> There are 2 approaches to return the query ID to the clients.
>> 
>> 1) TGetQueryIdReq/Resp
>> The clients need to request the query ID when a query is finished.
>> Given that the original semantics of the Req is to return all query IDs in the
>> session[1], we may need to change it to “the ID of the latest query”, or else
>> it would be difficult for users to figure out which ID is the right one.
>> 
>> 2) Return it in the result set
>> This approach is straightforward. Flink returns -1 as the affected-rows count,
>> which is not very useful. We can simply replace that with the query ID.
> 
> Having a look at TGetQueryIdReq/Resp, I think we can simplify the procedure to
> 
> 1. client sends an ExecuteQueryReq
> 2. server returns an OpHandle to client immediately
> 3. client sends TGetQueryIdReq(OpHandle) periodically to ask for the QueryId
> until it gets a legal result.
> 4. server returns the corresponding TGetQueryIdResp(QueryId) if it is
> available, otherwise returns a predefined QueryId constant, e.g.
> 'UNDEFINED_QUERY_ID', if the statement has not been accepted by the engine
> yet (there is no queryId for the stmt at that point)
> 
> [1] https://github.com/apache/shardingsphere/releases/tag/5.1.0
> 
> Thanks,
> Cheng Pan
> 
> On Tue, Mar 29, 2022 at 7:13 PM 林小铂 <[email protected]> wrote:
>> 
>> Hi team,
>> 
>> Sorry for the late follow-up. It took me some time to do some research.
>> 
>> TL;DR  It’s good to express savepoints in SQL statements. We should join
>> efforts with the Flink community to discuss SQL syntax for savepoint
>> statements. There are mainly two styles of SQL syntax to discuss: ANSI SQL
>> and command-like. The rest are implementation details, such as how to
>> return the query ID.
>> 
>> We had an offline discussion on DingTalk last week, and I believe we’ve 
>> reached
>> a consensus on some issues.
>> 
>> As pointed out in the previous mails, we should consider
>> 1. how to trigger a savepoint?
>> 2. how to find the available savepoints/checkpoints for a job?
>> 3. how to specify a savepoint/checkpoint for restore?
>> 
>> However, 3 is already supported by the Flink SQL client, leaving 2 questions.
>> As we discussed previously, the most straightforward solution is to extend
>> Flink’s SQL parser to support savepoint commands. In this way, we treat a
>> savepoint command as a normal SQL statement, so we can split the topic into
>> SQL syntax and implementation.
>> 
>> WRT SQL syntax, to follow the upstreaming-first philosophy, we’d better align
>> these efforts with the Flink community. So I think we should draft a proposal
>> and start a discussion in the Flink community to determine a solution, then we
>> could implement it in Kyuubi first and push it back to Flink (I’m planning to
>> start a discussion in the Flink community this week).
>> 
>> We have two solutions (thanks to Cheng):
>> 
>> 1) ANSI SQL
>> 
>>   `CALL trigger_savepoint($query_id)`
>>   `CALL show_savepoint($query_id)`
>> 
>> pros:
>> - no syntax conflict
>> - respect ANSI SQL
>> 
>> cons:
>> - CALL is not used in Flink SQL yet
>> - not sure if it’s viable to return savepoint paths, because stored procedures
>>   should return a row count in normal cases
>> 
>> 2)  Custom command
>> 
>>  `TRIGGER SAVEPOINT $query_id`
>>  `SHOW SAVEPOINT $query_id`
>> 
>> pros:
>> - simple syntax, easy to understand
>> 
>> cons:
>> - need to introduce new reserved keywords TRIGGER/SAVEPOINT
>> - not ANSI-SQL compatible
>> 
>> 
>> WRT implementation, first we need a query ID, namely the Flink job ID,
>> which we could acquire through TableResult with a few adjustments
>> to ExecuteStatement in the Flink engine.
>> 
>> There are 2 approaches to return the query ID to the clients.
>> 
>> 1) TGetQueryIdReq/Resp
>> The clients need to request the query ID when a query is finished.
>> Given that the original semantics of the Req is to return all query IDs in the
>> session[1], we may need to change it to “the ID of the latest query”, or else
>> it would be difficult for users to figure out which ID is the right one.
>> 
>> 2) Return it in the result set
>> This approach is straightforward. Flink returns -1 as the affected-rows count,
>> which is not very useful. We can simply replace that with the query ID.
>> 
>> Please tell me what you think. Thanks a lot!
>> 
>> [1] 
>> https://github.com/apache/hive/blob/bf84d8a1f715d7457037192d97676aeffa35d571/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1761
>> 
>> 
>>> On Mar 24, 2022, at 18:15, Vino Yang <[email protected]> wrote:
>>> 
>>> Hi Paul,
>>> 
>>> Big +1 for the proposal.
>>> 
>>> You can summarize all of this into a design document. And drive this 
>>> feature!
>>> 
>>> Best,
>>> Vino
>>> 
>>>> On Tue, Mar 22, 2022 at 14:40, Paul Lam <[email protected]> wrote:
>>>> 
>>>> Hi Kent,
>>>> 
>>>> Thanks for your pointer!
>>>> 
>>>> TGetQueryIdReq/Resp looks very promising.
>>>> 
>>>> Best,
>>>> Paul Lam
>>>> 
>>>>> On Mar 21, 2022, at 12:20, Kent Yao <[email protected]> wrote:
>>>>> 
>>>>> 
>>>> 
>> 
