On Wed, May 18, 2011 at 04:41:55PM +0200, Michael Hanselmann wrote:
> Hello
> 
> The following is a design proposal for the implementation of “chained jobs”. 
> It
> is not yet finished, but before I get into the technical details I'd like to
> get some review on the general idea (see the “TODO” section).

Hi,

Question: how does this help multi-group?

> Thanks,
> Michael
> 
> +++ b/doc/design-chained-jobs.rst
> @@ -0,0 +1,142 @@
> +============
> +Chained jobs
> +============
> +
> +.. contents:: :depth: 4
> +
> +This is a design document about the innards of Ganeti's job processing.
> +Readers are advised to study previous design documents on the topic:
> +
> +- :ref:`Original job queue <jqueue-original-design>`
> +- :ref:`Job priorities <jqueue-job-priority-design>`
> +- :doc:`LU-generated jobs <design-lu-generated-jobs>`
> +
> +
> +Current state and shortcomings
> +==============================
> +
> +Ever since the introduction of the job queue with Ganeti 2.0 there have
> +been situations where we wanted to run several jobs in a specific order.
> +Due to the job queue's current design, such a guarantee can not be
> +given. Jobs are run according to their priority, their ability to
> +acquire all necessary locks and other factors.
> +
> +One way to work around this limitation is to do some kind of job
> +grouping in the client code. Once all jobs of a group have finished, the
> +next group is submitted and waited for. There are different kinds of
> +clients for Ganeti, some of which don't share code (e.g. Python clients
> +vs. htools). This design proposes a solution which would be implemented
> +as part of the job queue in the master daemon.

FYI, for htools at least, the current solution is working well enough,
so I'm not likely to change over. The rationale is that queue management
is easier in the current situation as compared to submitting all jobs
upfront.

> +
> +
> +Proposed changes
> +================
> +
> +With the implementation of :ref:`job priorities
> +<jqueue-job-priority-design>` the processing code was re-architectured
> +and became a lot more versatile. It now returns jobs to the queue in
> +case the locks for an opcode can't be acquired, allowing other
> +jobs/opcodes to be run in the meantime.
> +
> +The proposal is to add a new, optional property to opcodes to define
> +dependencies on other jobs. Job X could define opcodes with a dependency
> +on the success of job Y and would only be run once job Y is finished. If
> +there's a dependency on success and job Y failed, job X would fail as
> +well. Since such dependencies would use job IDs, the jobs still need to
> +be submitted in the right order.
> +
> +The new attribute's value would be a list of two-valued tuples. Each
> +tuple contains the job ID and a boolean denoting whether success is
> +required.
> +
> +With this change, clients can submit all wanted jobs in the right order
> +and proceed to wait for changes on all these jobs (see
> +``cli.JobExecutor``). The master daemon will take care of executing them
> +in the right order, while still presenting the client with a simple
> +interface.
> +
> +.. highlight:: javascript
> +
> +Example data structures::
> +
> +  # First job
> +  {
> +    "job_id": "6151",
> +    "ops": [
> +      { "OP_ID": "OP_INSTANCE_REPLACE_DISKS", ..., },
> +      { "OP_ID": "OP_INSTANCE_FAILOVER", ..., },
> +      ],
> +  }
> +
> +  # Second job, runs in parallel with first job
> +  {
> +    "job_id": "7687",
> +    "ops": [
> +      { "OP_ID": "OP_INSTANCE_MIGRATE", ..., },
> +      ],
> +  }
> +
> +  # Third job, depending on success of previous jobs
> +  {
> +    "job_id": "9218",
> +    "ops": [
> +      { "OP_ID": "OP_NODE_SET_PARAMS",
> +        "depend": [
> +          [6151, True],
> +          [7687, True],
> +          ],
> +        "offline": True, },
> +      ],
> +  }

Question: does this mean one has to submit the first job, get its id,
and only then submit the second job where 'depend' contains the id
gotten from the first job submit?

> +
> +
> +.. TODO:
> +
> +   - How to notify jobs waiting in queue?
> +   - How to handle quick job archivals? Minimum age? Separate backlog
> +     of N jobs with job ID and final status?
> +   - How to handle job submissions referencing unknown jobs?
> +   - Should explicit status ("success", "error") be used to define
> +     dependencies?
> +
> +
> +Other discussed solutions
> +=========================
> +
> +Job-level attribute
> +-------------------
> +
> +At a first look it might seem to be better to put dependencies on
> +previous jobs at a job level. However, it turns out that having the
> +option of defining only a single opcode in a job as having such a
> +dependency can be useful as well. The code complexity in the job queue
> +is equivalent if not simpler.

Ack.

> +Since opcodes are guaranteed to run in order, clients can just define
> +the dependency on the first opcode.
> +
> +Another reason for the choice of an opcode-level attribute is that the
> +current LUXI interface for submitting jobs is a bit restricted and would
> +need to be changed to allow the addition of job-level attributes,
> +potentially requiring changes in all LUXI clients and/or breaking
> +backwards compatibility.

Not only this. If it would be a Luxi-level parameter, then we would not
have in the job queue this information. The dependency data should be
persistent, so it needs to be in the job or opcode.

> +Client-side logic
> +-----------------
> +
> +There's at least one implementation of a batched job executor twisted
> +into the ``burnin`` tool's code. While certainly possible, a client-side
> +solution should be avoided due to the different clients already in use.
> +For one, the :doc:`remote API <rapi>` client shouldn't import
> +non-standard modules. htools are written in Haskell and can't use Python
> +modules. A batched job executor contains quite some logic.

Disagree here (last sentence) :)

> Even if
> +cleanly abstracted in a (Python) library, sharing code between different
> +clients is difficult if not impossible.

Ack.

iustin

Reply via email to