On 08.04.21 19:26, Vladimir Sementsov-Ogievskiy wrote:
08.04.2021 20:04, John Snow wrote:
On 4/8/21 12:58 PM, Vladimir Sementsov-Ogievskiy wrote:
The job-complete command is async. Can we instead just add a boolean
like job->completion_requested, set it if job-complete is called in
STANDBY state, and have job_resume call job_complete automatically if
this boolean is true?
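
To make the idea above concrete, here is a minimal sketch of a deferred
completion request. Everything in it is a hypothetical, simplified
stand-in (ToyJob, toy_job_complete(), toy_job_resume(), and so on);
only the completion_requested flag is taken from the text. It is not
QEMU's actual Job code and it ignores locking and the synchronous error
checks that job_complete performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not QEMU's real Job types. */
typedef enum { TOY_RUNNING, TOY_STANDBY } ToyStatus;

typedef struct ToyJob {
    ToyStatus status;
    bool completion_requested;  /* set when completion arrives in STANDBY */
    int complete_calls;         /* counts how often the completion setup ran */
} ToyJob;

/* Stand-in for the job's real completion setup. */
static void toy_do_complete(ToyJob *job)
{
    job->complete_calls++;
}

/* Model of job-complete: only record the request while in STANDBY. */
static void toy_job_complete(ToyJob *job)
{
    assert(job != NULL);
    if (job->status == TOY_STANDBY) {
        job->completion_requested = true;
        return;
    }
    toy_do_complete(job);
}

/* Model of job_resume: replay a completion request that was deferred. */
static void toy_job_resume(ToyJob *job)
{
    assert(job != NULL);
    job->status = TOY_RUNNING;
    if (job->completion_requested) {
        job->completion_requested = false;
        toy_do_complete(job);
    }
}
```

In this toy model, a completion request made in STANDBY does nothing
until the job is resumed, at which point the setup runs exactly once.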
job_complete has a synchronous setup, though -- we lose out on a lot
of synchronous error checking in that circumstance.
Yes, that's a problem.
I was not able to audit it to determine that it'd be safe to attempt
that setup during a drained section -- I imagine it won't work and
will fail, though.
So I thought we'd have to signal completion and run the setup *later*,
but what do we do if we get an error then? Does the entire job fail?
Do we emit some new event? ("BLOCK_JOB_COMPLETION_FAILED" ?) Is it
recoverable?
Isn't it possible even now that, after a successful job-complete, the
job still fails and we report BLOCK_JOB_COMPLETED with an error?
And actually, how much benefit does the user get from the fact that
job-complete may fail?
We can make job-complete a simple always-success boolean flag setter
like job-pause.
I wanted to say the following:
But job-pause does always succeed, in contrast to block-job-complete.
block-job-complete is more akin to job-finalize, which too is a
synchronous operation.
But when I wrote that last sentence, I asked myself whether what
mirror_complete() does isn’t actually a remnant of what we had to do
when we didn’t have job-finalize yet. Shouldn’t that all be in
mirror_exit_common()? What’s the advantage of opening the backing chain
or putting blockers on the to-replace node in block-job-complete?
Aren’t those all graph-changing operations, basically, i.e. stuff that
should be done in job-finalize?
If we move everything to mirror_exit_common(), all that remains to do is
basically set some should_complete flag (could even be part of the Job
struct), and then the whole problem disappears.
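
A rough sketch of that split, assuming a simplified model rather than
the real mirror job: the completion request only sets a should_complete
flag and cannot fail, while the graph-changing work runs in the exit
path and any error ends up in the job's final result. SketchJob,
sketch_request_complete(), sketch_exit_common() and the graph_op_fails
knob are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified model; names are illustrative, not QEMU's API. */
typedef struct SketchJob {
    bool should_complete;  /* set by the completion request */
    bool graph_op_fails;   /* test knob: make the deferred graph change fail */
    int ret;               /* final result: 0 on success, negative on error */
    bool finished;
} SketchJob;

/* Completion request: always succeeds, it only records the intent. */
static void sketch_request_complete(SketchJob *job)
{
    assert(job != NULL);
    job->should_complete = true;
}

/* Stand-in for the graph-changing work (backing chain, blockers, ...). */
static int sketch_graph_changes(const SketchJob *job)
{
    return job->graph_op_fails ? -1 : 0;
}

/* Exit path: performs the graph changes and records the final result. */
static void sketch_exit_common(SketchJob *job)
{
    assert(job != NULL);
    if (job->should_complete) {
        job->ret = sketch_graph_changes(job);
    }
    job->finished = true;
}
```

With this shape, a failure in the deferred work is only visible in the
final job status, so the caller has to inspect that result instead of
relying on an error from the completion request itself.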
Thoughts?
Max
And the actual completion will be done in the background, when possible.
And if it fails, the job just fails, like it does for any background I/O
error. And the user has to check the error/success status of the final
BLOCK_JOB_COMPLETED anyway.