John Snow <js...@redhat.com> writes: > On 04/18/2018 03:25 AM, Markus Armbruster wrote: >> John Snow <js...@redhat.com> writes: >> >>> On 04/17/2018 09:44 AM, Markus Armbruster wrote: >>>> John Snow <js...@redhat.com> writes: >>>> >>>>> This series seeks to address two distinct but closely related issues >>>>> concerning the job management API. >>>>> >>>>> (1) For jobs that complete when a monitor is not attached and receiving >>>>> events or notifications, there's no way to discern the job's final >>>>> return code. Jobs must remain in the query list until dismissed >>>>> for reliable management. >>>>> >>>>> (2) Jobs that change the block graph structure at an indeterminate point >>>>> after the job starts compete with the management layer that relies >>>>> on that graph structure to issue meaningful commands. >>>>> >>>>> This structure should change only at the behest of the management >>>>> API, and not asynchronously at unknown points in time. Before a job >>>>> issues such changes, it must rely on explicit and synchronous >>>>> confirmation from the management API. >>>>> >>>>> These changes are implemented by formalizing a State Transition Machine >>>>> for the BlockJob subsystem. >>>>> >>>>> Job States: >>>>> >>>>> UNDEFINED Default state. Internal state only. >>>>> CREATED Job has been created >>>>> RUNNING Job has been started and is running >>>>> PAUSED Job is not ready and has been paused >>>>> READY Job is ready and is running >>>>> STANDBY Job is ready and is paused >>>>> >>>>> WAITING Job is waiting on peers in transaction >>>>> PENDING Job is waiting on ACK from QMP >>>>> ABORTING Job is aborting or has been cancelled >>>>> CONCLUDED Job has finished and has a retcode available >>>>> NULL Job is being dismantled. Internal state only. >>>>> >>>>> Job Verbs: >>>>> >>> >>> Backporting your quote up here: >>> >>>> For each job verb and job state: what's the new job state? >>>> >>> >>> That's not always 1:1, though I tried to address it in the commit messages. >> >> Let me rephrase my question then. For each job verb and job state: what >> are the possible new job states? If there's more than one, what's the >> condition for each? >> > > Is my answer below not sufficient? Maybe you're asking "Can you write > this up in a formal document" instead, or did I miss explaining something?
Your answer was fine. I blame my one-pass reply writing. Nevertheless: >> I appreciate commit messages explaining that, but having complete state >> machine documentation in one place (a comment or in docs/) would be >> nice, wouldn't it? Pretty-please? >>>>> CANCEL Instructs a running job to terminate with error, >>>>> (Except when that job is READY, which produces no error.) >>> >>> CANCEL will take a job to either NULL... (this is the early abort >>> pathway, prior to the job being fully realized.) >>> >>> ...or to ABORTING (from CREATED once it has fully realized the job, or >>> from RUNNING, READY, WAITING, or PENDING.) >>> >>>>> PAUSE Request a job to pause. >>> >>> issued to RUNNING or READY, transitions to PAUSED or STANDBY respectively. >>> >>>>> RESUME Request a job to resume from a pause. >>> >>> issued to PAUSED or STANDBY, transitions to RUNNING or READY respectively. >>> >>>>> SET-SPEED Change the speed limiting parameter of a job. >>> >>> No run state change. >>> >>>>> COMPLETE Ask a READY job to finish and exit. >>>>> >>> >>> Issued to a READY job, transitions to WAITING. >>> >>>>> FINALIZE Ask a PENDING job to perform its graph finalization. >>> >>> Issued to a PENDING job, transitions to CONCLUDED. >>> >>>>> DISMISS Finish cleaning up an empty job. >>>> >>> >>> Issued to a CONCLUDED job, transitions to NULL. >>> >>> >>>>> And here's my stab at a diagram: >>>>> >>>>> +---------+ >>>>> |UNDEFINED| >>>>> +--+------+ >>>>> | >>>>> +--v----+ >>>>> +---------+CREATED+-----------------+ >>>>> | +--+----+ | >>>>> | | | >>>>> | +--+----+ +------+ | >>>>> +---------+RUNNING<----->PAUSED| | >>>>> | +--+-+--+ +------+ | >>>>> | | | | >>>>> | | +------------------+ | >>>>> | | | | >>>>> | +--v--+ +-------+ | | >>>>> +---------+READY<------->STANDBY| | | >>>>> | +--+--+ +-------+ | | >>>>> | | | | >>>>> | +--v----+ | | >>>>> +---------+WAITING<---------------+ | >>>>> | +--+----+ | >>>>> | | | >>>>> | +--v----+ | >>>>> +---------+PENDING| | >>>>> | +--+----+ | >>>>> | | | >>>>> +--v-----+ +--v------+ | >>>>> |ABORTING+--->CONCLUDED| | >>>>> +--------+ +--+------+ | >>>>> | | >>>>> +--v-+ | >>>>> |NULL<--------------------+ >>>>> +----+ >>>> >>>> Is this diagram missing a few arrowheads? E.g. on the edge between >>>> RUNNING and WAITING. >>>> >>> >>> Apparently yes. :\ >>> >>> (Secretly fixed up in my reply.) >>> >>>> Might push the limits of ASCII art, but here goes anyway: can we label >>>> the arrows with job verbs? >>>> >>> >>> Can you recommend a tool to help me do that? I've been using asciiflow >>> infinity (http://asciiflow.com) and it's not very good, but I don't have >>> anything better. >> >> I do my ASCII art in Emacs picture-mode. >> >>>> Can you briefly explain how this state machine addresses (1) and (2)? >>>> >>> >>> (1) The CONCLUDED state allows jobs to persist in the job query list >>> after they would have disappeared in 2.11-era QEMU. This lets us query >>> for completion codes and to dismiss the job at our own leisure. >> >> Got it. >> >>> (2) The PENDING state allows jobs to wait in a nearly-completed state, >>> pending authorization from the QMP client to make graph changes. >>> Otherwise, the job has to asynchronously perform this cleanup and the >>> exact point in time is unknowable to the QMP client. By making a PENDING >>> state and a finalize callback (.prepare), we can make this portion of a >>> job's task synchronous. >> >> This provides for jobs modifying the graph on job completion. It >> doesn't provide for jobs modifying the graph while they run. Fine with >> me; we're not aware of a use for messing with the graph in the middle of >> a job. >> > > I didn't consider this possibility. The concept could in theory be > expanded to arbitrary sync points, but I'm not going to worry about that > until the need arises. Makes sense. >>> "John, you added more than two states..." >>> >>> Yup, this was to help simplify the existing state machine, believe it or >>> not. I modeled all jobs as transactions to eliminate different cleanup >>> routing and added two new interim states; >>> >>> - WAITING >>> - ABORTING >>> >>> to help make assertions about the valid transitions jobs can make. The >>> ABORTING state helps make it clear when a job is allowed to fail (and >>> emit QMP events related to such). >>> >>> The WAITING state is simply advisory to help a client know that a job is >>> "finished" but cannot yet receive further instruction because of peers >>> in a transaction. This helps me to add nice QMP errors for any verbs >>> issued to such jobs. "Sorry pal, this job is waiting and can't hear you >>> right now!" >>> >>> This kept the code cleaner than adding a bunch of very fragile boolean >>> error-checking pathways in dozens of helper functions to help avoid >>> illegal instructions on jobs not prepared to receive those instructions. >>> >>> So these two new states don't help accomplish (1) or (2) strictly, but >>> they do facilitate the code additions that _do_ a lot less ugly. >> > > I really bungled that sentence. No problem, I got it anyway :) >> Thanks! >> >> Looks like a fine starting point for in-tree state machine documentation >> :)