[ https://issues.apache.org/jira/browse/FLINK-10212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marc Rooding updated FLINK-10212:
---------------------------------
    Description: 
*Background*

I'm one of the authors of the open-source Flink job deployer 
(https://github.com/ing-bank/flink-deployer). Recently, I rewrote our 
implementation to use the Flink REST API instead of the native CLI. 

In our use case, we store the job savepoints in a Kubernetes persistent volume. 
We mount this persistent volume into our deployer container so that the 
deployer can find and use the savepoints. 

In the rewrite to the REST API, I saw that the API to monitor savepoint 
creation returns the complete path to the created savepoint, and we can use 
this path in the job deployer to start the new job from the latest savepoint.

However, we also allow users to deploy a job with a recovered state by 
specifying only the directory in which savepoints are stored. In this scenario, 
we look for the latest savepoint created for this job ourselves inside the 
given directory. To find this path, we still rely on the mounted volume and 
list the directory contents to discover savepoints.
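
As a rough illustration of the directory-listing approach described above, 
here is a minimal Python sketch. It is not the deployer's actual code. The 
function name is hypothetical, and the sketch assumes that savepoint 
directories are named savepoint-<first 6 characters of the job ID>-<suffix> 
and that the most recently modified directory is the latest savepoint.

```python
import os
from typing import Optional


def find_latest_savepoint(directory: str, job_id: Optional[str] = None) -> Optional[str]:
    """Return the path of the most recently modified savepoint directory.

    Assumption: savepoint directories are named
    "savepoint-<first 6 chars of job ID>-<suffix>".
    """
    prefix = "savepoint-"
    if job_id:
        # Narrow the search to savepoints belonging to one job.
        prefix += job_id[:6] + "-"

    candidates = [
        entry.path
        for entry in os.scandir(directory)
        if entry.is_dir() and entry.name.startswith(prefix)
    ]
    if not candidates:
        return None

    # Assumption: modification time is a good enough proxy for "latest".
    return max(candidates, key=os.path.getmtime)
```

This only works when the caller can see the savepoint directory on its own 
file system, which is exactly the dependency on the mounted volume described 
above.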

*Feature*

I was thinking that it might be a good addition if the native Flink REST API 
offered the ability to retrieve savepoints. Since the API doesn't inherently 
know where savepoints are stored, it could take a path as one of its 
arguments. It could even allow the user to provide a job ID as an argument so 
that the API could search for the savepoints of a specific job in the 
specified directory. 

As the API would require the path as an argument, and providing a path 
containing forward slashes in the URL isn't ideal, I'm eager to discuss what a 
proper solution would look like.
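
To make the concern about forward slashes concrete, the following Python 
sketch shows what a client would have to do to put a directory into a single 
URL path segment. The function name and the example values are hypothetical, 
and the URL layout is only the GET variant discussed in this issue, not an 
existing Flink endpoint.

```python
from urllib.parse import quote


def savepoint_listing_url(base_url: str, job_id: str, target_directory: str) -> str:
    """Build a URL that carries a directory as a single path segment.

    Every "/" in the directory has to be percent-encoded as %2F, otherwise
    it would be interpreted as a path separator.
    """
    encoded = quote(target_directory, safe="")
    return f"{base_url}/jobs/{job_id}/savepoints/{encoded}"


# Hypothetical values, for illustration only.
print(savepoint_listing_url("http://localhost:8081", "abc", "/data/savepoints"))
```

The resulting URL is hard to read, and encoded slashes are treated 
differently by different servers and proxies, which is one reason a path in 
the request body may be the more robust choice.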

A POST request to /jobs/:jobid/savepoints with the path as a body parameter 
would make sense if the API were to list all savepoints in a specific path, 
but this request is already used for creating new savepoints.

An alternative could be a POST to /savepoints with the path and job ID in the 
request body.
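
A client-side sketch of this alternative might look like the following. The 
/savepoints endpoint does not exist in Flink today, and the body field names 
"target-directory" and "job-id" are assumptions made purely for illustration. 
The code only builds the request and does not send it.

```python
import json
from typing import Optional
from urllib.request import Request


def build_list_savepoints_request(
    base_url: str, directory: str, job_id: Optional[str] = None
) -> Request:
    """Build (but do not send) a POST request for the proposed endpoint.

    Hypothetical: neither the /savepoints endpoint nor the field names
    below are part of the current Flink REST API.
    """
    body = {"target-directory": directory}
    if job_id is not None:
        # Optional filter so the server only returns savepoints of one job.
        body["job-id"] = job_id

    return Request(
        url=f"{base_url}/savepoints",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because the directory travels in the JSON body, it needs no URL encoding, at 
the cost of using POST for what is semantically a read.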

A POST request to retrieve data is obviously not the most straightforward 
approach, but in my opinion it is still preferable to a GET request to a path 
such as /jobs/:jobid/savepoints/:targetDirectory.

I'm willing to help out on this one by submitting a pull request.

Looking forward to your thoughts! 



> REST API for listing all the available save points
> --------------------------------------------------
>
>                 Key: FLINK-10212
>                 URL: https://issues.apache.org/jira/browse/FLINK-10212
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Marc Rooding
>            Priority: Major
>
> *Background*
> I'm one of the authors of the open-source Flink job deployer 
> (https://github.com/ing-bank/flink-deployer). Recently, I rewrote our 
> implementation to use the Flink REST API instead of the native CLI. 
> In our use case, we store the job savepoints in a Kubernetes persistent 
> volume. For our deployer, we mount the persistent volume to our deployer 
> container so that we can find and use the savepoints. 
> In the rewrite to the REST API, I saw that the API to monitor savepoint 
> creation returns the complete path to the created savepoint, and we can use 
> this path in the job deployer to start the new job from the latest savepoint.
> However, we also allow users to deploy a job with a recovered state by 
> specifying only the directory in which savepoints are stored. In this 
> scenario, we look for the latest savepoint created for this job ourselves 
> inside the given directory. To find this path, we still rely on the mounted 
> volume and list the directory contents to discover savepoints.
> *Feature*
> I was thinking that it might be a good addition if the native Flink REST API 
> offers the ability to retrieve savepoints. Seeing that the API doesn't 
> inherently know where savepoints are stored, it could take a path as one of 
> the arguments. It could even allow the user to provide a job ID as an 
> argument so that the API would be able to search for savepoints for a 
> specific job ID in the specified directory. 
> As the API would require the path as an argument, and providing a path 
> containing forward slashes in the URL isn't ideal, I'm eager to discuss what 
> a proper solution would look like.
> A POST request to /jobs/:jobid/savepoints with the path as a body parameter 
> would make sense if the API were to list all savepoints in a specific path, 
> but this request is already used for creating new savepoints.
> An alternative could be a POST to /savepoints with the path and job ID in the 
> request body.
> A POST request to retrieve data is obviously not the most straightforward 
> approach, but in my opinion it is still preferable to a GET request to a 
> path such as /jobs/:jobid/savepoints/:targetDirectory.
> I'm willing to help out on this one by submitting a pull request.
> Looking forward to your thoughts! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
