Hi Till,
The problem is reproducible with a basic shell script that performs the
following steps:
1. POST to /jobs/${JOB_ID}/savepoints with the payload
{"cancel-job": true, "target-directory": ${LOCATION}}
and store the trigger ID
2. Sleep 10 seconds
3. GET /jobs/${JOB_ID}/savepoints/${TRIGGER_ID}
results in a connect exception because the REST endpoint has been shut down.
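
For reference, the steps above can be sketched roughly as follows. This is a
minimal sketch, not the exact script: the FLINK_REST address (localhost:8081),
the helper function names, and the assumption that the trigger response
carries the ID in a "request-id" field are illustrative and may need adjusting
for a given setup.

```shell
#!/usr/bin/env bash
# Assumption: the REST endpoint listens on localhost:8081; override via FLINK_REST.
FLINK_REST="${FLINK_REST:-http://localhost:8081}"

# Hypothetical helper: build the JSON payload for the savepoint trigger request.
build_payload() {
  printf '{"cancel-job": true, "target-directory": "%s"}' "$1"
}

# Hypothetical helper: read a trigger response on stdin and print the value of
# its "request-id" field (assumed field name).
extract_request_id() {
  sed -n 's/.*"request-id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage: reproduce <job-id> <savepoint-directory>
reproduce() {
  local job_id="$1" location="$2" trigger_id

  # 1. Trigger a savepoint with cancel-job=true and remember the trigger ID.
  trigger_id=$(curl -sS -X POST -H 'Content-Type: application/json' \
    -d "$(build_payload "$location")" \
    "${FLINK_REST}/jobs/${job_id}/savepoints" | extract_request_id)
  echo "trigger id: ${trigger_id}"

  # 2. Give the savepoint some time to complete.
  sleep 10

  # 3. Ask for the result; curl exits non-zero if the connection is refused.
  curl -sS "${FLINK_REST}/jobs/${job_id}/savepoints/${trigger_id}"
}
```

Calling `reproduce "$JOB_ID" "$LOCATION"` against a running job should then
show whether the final request is still answered or fails to connect.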
Sorry if I misunderstood your previous answer, but I would expect that
stopping the job with a savepoint is an asynchronous operation and that the
shutdown should be blocked until the result has been served.
I can also confirm that the cluster itself is not shut down, but the REST
endpoint is, which makes it impossible to serve the asynchronous result.
Best,
Fabian