Re: [libvirt] [RFC] New domain job control and stat APIs

2019-07-25 Thread Daniel P. Berrangé
On Thu, Jul 25, 2019 at 01:52:01PM +0200, Pavel Hrdina wrote:
> On Wed, Jul 24, 2019 at 03:43:14PM +0100, Daniel P. Berrangé wrote:
> 
> [...]
> 
> > > +typedef struct _virDomainJob virDomainJob;
> > > +typedef virDomainJob *virDomainJobPtr;
> > > +struct _virDomainJob {
> > > +char *name;
> > > +virDomainJobType type;
> > > +virDomainJobState state;
> > > +
> > > +/* possibly overkill? - currently empty*/
> > > +virTypedParameterPtr data;
> > > +size_t ndata;
> > > +};
> > 
> > I'd have a preference for keeping this as an opaque
> > struct not exposed in the public header, and making
> > it a first-class object. So instead have accessors:
> > 
> >   char * virDomainJobGetName(virDomainJobPtr job);
> >   virDomainJobType virDomainJobGetType(virDomainJobPtr job);
> >   virDomainJobState virDomainJobGetState(virDomainJobPtr job);
> > 
> > which fetch from the local struct. The local struct would
> > also need to contain a reference to the virDomainPtr passed in.
> > 
> > This avoids the query you have about virTypedParameterPtr
> > inclusion, as we can ignore it now, and if we get a compelling
> > need, we then just add a remotable method:
> > 
> >    virDomainJobGetStats(virDomainJobPtr job,
> >                         virTypedParameterPtr **params,
> >                         unsigned int *nparams);
> 
> Agreed to make it a first-class object.  Shouldn't we rename it to
> virJob* as well, or do we want to have it tied to domains only?

I guess the obvious use case would be in the storage pool APIs
which already have long running background tasks where this would
be useful.

The difficulty though is that we then need the job APIs to be
dispatched to different drivers depending on the instance of
the job, e.g. one job needs to be processed by the QEMU driver
while the next job needs to be processed by the storage driver.
This could get quite fugly with the way we do API dispatch.
We already have this with the virStream APIs of course, but
I think they are one of the more horrible parts of our internal
framework.

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|


Re: [libvirt] [RFC] New domain job control and stat APIs

2019-07-25 Thread Pavel Hrdina
On Wed, Jul 24, 2019 at 03:43:14PM +0100, Daniel P. Berrangé wrote:

[...]

> > +typedef struct _virDomainJob virDomainJob;
> > +typedef virDomainJob *virDomainJobPtr;
> > +struct _virDomainJob {
> > +char *name;
> > +virDomainJobType type;
> > +virDomainJobState state;
> > +
> > +/* possibly overkill? - currently empty*/
> > +virTypedParameterPtr data;
> > +size_t ndata;
> > +};
> 
> I'd have a preference for keeping this as an opaque
> struct not exposed in the public header, and making
> it a first-class object. So instead have accessors:
> 
>   char * virDomainJobGetName(virDomainJobPtr job);
>   virDomainJobType virDomainJobGetType(virDomainJobPtr job);
>   virDomainJobState virDomainJobGetState(virDomainJobPtr job);
> 
> which fetch from the local struct. The local struct would
> also need to contain a reference to the virDomainPtr passed in.
> 
> This avoids the query you have about virTypedParameterPtr
> inclusion, as we can ignore it now, and if we get a compelling
> need, we then just add a remotable method:
> 
>    virDomainJobGetStats(virDomainJobPtr job,
>                         virTypedParameterPtr **params,
>                         unsigned int *nparams);

Agreed to make it a first-class object.  Shouldn't we rename it to
virJob* as well, or do we want to have it tied to domains only?

> > +void virDomainJobFree(virDomainJobPtr job);
> > +
> > +int virDomainJobList(virDomainPtr domain,
> > + virDomainJobPtr **jobs,
> > + unsigned int flags);
> > +
> > +int virDomainJobGetXMLDesc(virDomainPtr domain,
> > +   const char *jobname,
> > +   unsigned int flags);
> 
> This would become
> 
>  int virDomainJobGetXMLDesc(virDomainJobPtr job,
>                             unsigned int flags);
> 
> 
> > +
> > +typedef enum {
> > +VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE = 0,
> > +VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT = 1,
> > +VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE = 2,
> > +VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE = 3,
> > +VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME = 4,
> > +VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS = 5,
> > +
> > +# ifdef VIR_ENUM_SENTINELS
> > +VIR_DOMAIN_JOB_CONTROL_OPERATION_LAST
> > +# endif
> > +} virDomainJobControlOperation;
> > +
> > +int virDomainJobControl(virDomainPtr domain,
> > +const char *jobname,
> > +virDomainJobControlOperation op,
> > +unsigned int flags);
> 
> And again change to
> 
>  int virDomainJobControl(virDomainJobPtr job,
>                          virDomainJobControlOperation op,
>                          unsigned int flags);
> 
> 
> I think I'd also have a preference to identify a job based on a UUID too,
> as that gives stronger uniqueness, e.g. when debugging there's no risk of
> confusing two jobs which are against different domains but happened to get
> the same name.

Agreed, we need to have a unique ID and we should definitely avoid any
magic with ID=0 to pick the "current" job as that would only lead to a
lot of issues.

Overall looks good.

Pavel



Re: [libvirt] [RFC] New domain job control and stat APIs

2019-07-24 Thread Daniel P. Berrangé
On Wed, Jul 10, 2019 at 02:27:21PM +0200, Peter Krempa wrote:
> Currently we don't have a consolidated approach for managing
> asynchronous long-running domain jobs. Historically there were
> long-running jobs which interlocked with each other and thus there was
> only one such job possible at a given time (migration, save, restore, dump).
> 
> These jobs have a rather inflexible set of APIs:
> virDomainGetJobInfo, virDomainGetJobStats, virDomainAbortJob.

Yeah the person who designed virDomainGetJobInfo was not doing
a very good job of it ;-P

> With this series I want to introduce a set of APIs for managing the jobs
> which are designed to be universal enough, and a new event, so that
> nothing will have to be retrofitted onto existing infrastructure.
> 
> An example of the job XML would be:
> 
> 

I'd expect to see a unique identifier of some kind here.

>   
> vda
> vda[1]
> vda[5]
>   
>   
>12345
>12345
>   
> 
> 
> but this will be mostly a topic for the second part of this exercise
> after we discuss the APIs.

> +typedef enum {
> +VIR_DOMAIN_JOB_TYPE_NONE = 0,
> +VIR_DOMAIN_JOB_TYPE_MIGRATION = 1,
> +VIR_DOMAIN_JOB_TYPE_BLOCK_PULL = 2,
> +[...]
> +
> +# ifdef VIR_ENUM_SENTINELS
> +VIR_DOMAIN_JOB_TYPE_LAST
> +# endif
> +} virDomainJobType;
> +
> +
> +typedef enum {
> +VIR_DOMAIN_JOB_STATE_NONE = 0, /* unknown job state */
> +VIR_DOMAIN_JOB_STATE_RUNNING = 1, /* job is currently running */
> +VIR_DOMAIN_JOB_STATE_READY = 2, /* job reached a synchronized state and may be finalized */
> +VIR_DOMAIN_JOB_STATE_FAILED = 3, /* job has failed */
> +VIR_DOMAIN_JOB_STATE_COMPLETED = 4, /* job has completed successfully */
> +VIR_DOMAIN_JOB_STATE_ABORTED = 5, /* job has been aborted */
> +[...]
> +
> +# ifdef VIR_ENUM_SENTINELS
> +VIR_DOMAIN_JOB_STATE_LAST
> +# endif
> +} virDomainJobState;
> +
> +
> +typedef struct _virDomainJob virDomainJob;
> +typedef virDomainJob *virDomainJobPtr;
> +struct _virDomainJob {
> +char *name;
> +virDomainJobType type;
> +virDomainJobState state;
> +
> +/* possibly overkill? - currently empty*/
> +virTypedParameterPtr data;
> +size_t ndata;
> +};

I'd have a preference for keeping this as an opaque
struct not exposed in the public header, and making
it a first-class object. So instead have accessors:

  char * virDomainJobGetName(virDomainJobPtr job);
  virDomainJobType virDomainJobGetType(virDomainJobPtr job);
  virDomainJobState virDomainJobGetState(virDomainJobPtr job);

which fetch from the local struct. The local struct would
also need to contain a reference to the virDomainPtr passed in.

This avoids the query you have about virTypedParameterPtr
inclusion, as we can ignore it now, and if we get a compelling
need, we then just add a remotable method:

   virDomainJobGetStats(virDomainJobPtr job,
                        virTypedParameterPtr **params,
                        unsigned int *nparams);

> +void virDomainJobFree(virDomainJobPtr job);
> +
> +int virDomainJobList(virDomainPtr domain,
> + virDomainJobPtr **jobs,
> + unsigned int flags);
> +
> +int virDomainJobGetXMLDesc(virDomainPtr domain,
> +   const char *jobname,
> +   unsigned int flags);

This would become

 int virDomainJobGetXMLDesc(virDomainJobPtr job,
                            unsigned int flags);
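
To make the handle-based shape concrete, here is a minimal client-side
sketch. All of these calls (virDomainJobList, the virDomainJobGet*
accessors, virDomainJobFree) are only the signatures proposed in this
thread, not existing libvirt API, so treat it as illustrative only:

  #include <stdio.h>
  #include <stdlib.h>
  #include <libvirt/libvirt.h>

  /* Illustrative sketch only: these job APIs are the proposal from this
   * thread, not released libvirt API. */
  static void
  dump_domain_jobs(virDomainPtr dom)
  {
      virDomainJobPtr *jobs = NULL;
      int njobs;
      int i;

      /* enumerate all jobs currently known for the domain */
      if ((njobs = virDomainJobList(dom, &jobs, 0)) < 0)
          return;

      for (i = 0; i < njobs; i++) {
          printf("job '%s': type=%d state=%d\n",
                 virDomainJobGetName(jobs[i]),
                 virDomainJobGetType(jobs[i]),
                 virDomainJobGetState(jobs[i]));
          virDomainJobFree(jobs[i]);
      }
      free(jobs);
  }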


> +
> +typedef enum {
> +VIR_DOMAIN_JOB_CONTROL_OPERATION_NONE = 0,
> +VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT = 1,
> +VIR_DOMAIN_JOB_CONTROL_OPERATION_FINALIZE = 2,
> +VIR_DOMAIN_JOB_CONTROL_OPERATION_PAUSE = 3,
> +VIR_DOMAIN_JOB_CONTROL_OPERATION_RESUME = 4,
> +VIR_DOMAIN_JOB_CONTROL_OPERATION_DISMISS = 5,
> +
> +# ifdef VIR_ENUM_SENTINELS
> +VIR_DOMAIN_JOB_CONTROL_OPERATION_LAST
> +# endif
> +} virDomainJobControlOperation;
> +
> +int virDomainJobControl(virDomainPtr domain,
> +const char *jobname,
> +virDomainJobControlOperation op,
> +unsigned int flags);

And again change to

 int virDomainJobControl(virDomainJobPtr job,
                         virDomainJobControlOperation op,
                         unsigned int flags);
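
As a usage sketch (same caveat, none of this exists yet, it is just the
signature proposed here), aborting a previously obtained job handle
would then be:

  if (virDomainJobControl(job, VIR_DOMAIN_JOB_CONTROL_OPERATION_ABORT, 0) < 0)
      fprintf(stderr, "failed to abort job '%s'\n",
              virDomainJobGetName(job));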


I think I'd also have a preference to identify a job based on a UUID too,
as that gives stronger uniqueness, e.g. when debugging there's no risk of
confusing two jobs which are against different domains but happened to get
the same name.
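
For illustration, the UUID could be exposed via an accessor mirroring the
existing virDomainGetUUIDString() pattern; the accessor name below is
purely hypothetical and not part of the posted proposal:

  char uuid[VIR_UUID_STRING_BUFLEN];

  /* hypothetical accessor, mirroring virDomainGetUUIDString() */
  if (virDomainJobGetUUIDString(job, uuid) == 0)
      printf("job %s (%s)\n", uuid, virDomainJobGetName(job));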

Regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: [libvirt] [RFC] New domain job control and stat APIs

2019-07-10 Thread Peter Krempa
On Wed, Jul 10, 2019 at 08:38:20 -0500, Eric Blake wrote:
> On 7/10/19 7:27 AM, Peter Krempa wrote:
> > Currently we don't have a consolidated approach for managing
> > asynchronous long-running domain jobs. Historically there were
> > long-running jobs which interlocked with each other and thus there was
> > only one such job possible at a given time (migration, save, restore, dump).
> 
> Yes, I agree with the problem statement. In fact, one of my questions
> when starting on incremental backup was whether I'd have to implement
> any of this first, or punt it to later. Well, here we are; it's later,
> and I still don't have incremental backup in.
> 
> > 
> > These jobs have a rather inflexible set of APIs:
> > virDomainGetJobInfo, virDomainGetJobStats, virDomainAbortJob.
> > 
> > These don't really allow selecting which job to terminate since there's
> > only one, thus if we wanted to add different kinds of jobs which don't
> > necessarily interlock but are able to run in parallel, we would have to
> > introduce another set of APIs.
> > 
> > This resulted in the creation of the block job APIs:
> > virDomainBlockJobAbort, virDomainGetBlockJobInfo
> > 
> > These allow parallel jobs (discriminated by the disk to which the job
> > belongs) but are not universal, nor do they allow parallel jobs on a
> > single disk.
> > 
> > Similarly, blockjobs can also become detached from the disk, e.g. if the
> > guest unplugs the disk frontend. That way the job would linger in limbo
> > and would not be controllable. (This is certainly a possibility with
> > -blockdev.)
> > 
> > With -blockdev we also get a potentially long-running blockdev-create
> > job which is not bound to any disk as part of kicking off a snapshot or
> > block copy job. This one might also get stuck and in the current state
> > is not really controllable.
> > 
> > Additionally, the upcoming block-backup job will be a combination of the
> > above. It's a job which spans multiple disks (thus not really a block
> > job in libvirt terminology) but not a domain job either, as there
> > can potentially be more than one block backup job. The proposal for
> > block-backup introduces its own single-purpose set of APIs for managing
> > the backup job only, but abuses the block job and domain job events to
> > distribute the async state updates.
> 
> At the bare minimum, a push job implementation has to send async state
> update events, but a pull job does not. If the backup API goes into
> 5.6.0 but not your new job control APIs, we can limit the qemu
> implementation to JUST pull backups for now (the XML has been shown to
> be extensible to add in push support once we also have sane APIs for
> managing async state updates with push support).
> 
> > 
> > With this series I want to introduce a set of APIs for managing the jobs
> > which are designed to be universal enough, and a new event, so that
> > nothing will have to be retrofitted onto existing infrastructure.
> > 
> > An example of the job XML would be:
> > 
> > 
> >   
> > vda
> > vda[1]
> > vda[5]
> >   
> 
> Where what forms a valid target element is dependent on the type='...' of
> the job element, correct?

Yes exactly.

> 
> >   
> >12345
> >12345
> >   
> > 
> 
> Looks reasonable.
> 
> > 
> > but this will be mostly a topic for the second part of this exercise
> > after we discuss the APIs.
> > 
> > The new infrastructure will also allow adding a flag for all the
> > existing APIs which kick off a job so that the job will persist even
> > after it finishes. This will also properly implement the statistics for
> > a finished migration and similar.
> > 
> > Obviously we will need to take special care when wiring these up so that
> > the old APIs keep working in the old situations and the events are reported
> > correctly.
> > 
> > The initial idea would be to implement the stats XML both for the domain
> > jobs (migration, dump) and blockjobs to simplify the job for mgmt apps
> > so that they won't have to infer whether the given job type is already
> > reported in the new API.
> > 
> > Additionally we can also implement flags for the XML getter API that
> > will skip the stats gathering as that may require monitor interactions.
> > Also one possibility would be to return an abbreviated XML in the
> > listing API.
> 
> Makes sense.
> 
> > ---

[...]

> > +typedef enum {
> > +VIR_DOMAIN_JOB_STATE_NONE = 0, /* unknown job state */
> > +VIR_DOMAIN_JOB_STATE_RUNNING = 1, /* job is currently running */
> > +VIR_DOMAIN_JOB_STATE_READY = 2, /* job reached a synchronized state and may be finalized */
> > +VIR_DOMAIN_JOB_STATE_FAILED = 3, /* job has failed */
> > +VIR_DOMAIN_JOB_STATE_COMPLETED = 4, /* job has completed successfully */
> > +VIR_DOMAIN_JOB_STATE_ABORTED = 5, /* job has been aborted */
> > +[...]
> > +
> > +# ifdef VIR_ENUM_SENTINELS
> > +VIR_DOMAIN_JOB_STATE_LAST
> > +# endif
> > +} virDomainJobState;
> 
> Not all job types will utilize all of the possible job states, but I
> don't see that as an 

Re: [libvirt] [RFC] New domain job control and stat APIs

2019-07-10 Thread Eric Blake
On 7/10/19 7:27 AM, Peter Krempa wrote:
> Currently we don't have a consolidated approach for managing
> asynchronous long-running domain jobs. Historically there were
> long-running jobs which interlocked with each other and thus there was
> only one such job possible at a given time (migration, save, restore, dump).

Yes, I agree with the problem statement. In fact, one of my questions
when starting on incremental backup was whether I'd have to implement
any of this first, or punt it to later. Well, here we are; it's later,
and I still don't have incremental backup in.

> 
> These jobs have a rather inflexible set of APIs:
> virDomainGetJobInfo, virDomainGetJobStats, virDomainAbortJob.
> 
> These don't really allow selecting which job to terminate since there's
> only one, thus if we wanted to add different kinds of jobs which don't
> necessarily interlock but are able to run in parallel, we would have to
> introduce another set of APIs.
> 
> This resulted in the creation of the block job APIs:
> virDomainBlockJobAbort, virDomainGetBlockJobInfo
> 
> These allow parallel jobs (discriminated by the disk to which the job
> belongs) but are not universal, nor do they allow parallel jobs on a
> single disk.
> 
> Similarly, blockjobs can also become detached from the disk, e.g. if the
> guest unplugs the disk frontend. That way the job would linger in limbo
> and would not be controllable. (This is certainly a possibility with
> -blockdev.)
> 
> With -blockdev we also get a potentially long-running blockdev-create
> job which is not bound to any disk as part of kicking off a snapshot or
> block copy job. This one might also get stuck and in the current state
> is not really controllable.
> 
> Additionally, the upcoming block-backup job will be a combination of the
> above. It's a job which spans multiple disks (thus not really a block
> job in libvirt terminology) but not a domain job either, as there
> can potentially be more than one block backup job. The proposal for
> block-backup introduces its own single-purpose set of APIs for managing
> the backup job only, but abuses the block job and domain job events to
> distribute the async state updates.

At the bare minimum, a push job implementation has to send async state
update events, but a pull job does not. If the backup API goes into
5.6.0 but not your new job control APIs, we can limit the qemu
implementation to JUST pull backups for now (the XML has been shown to
be extensible to add in push support once we also have sane APIs for
managing async state updates with push support).

> 
> With this series I want to introduce a set of APIs for managing the jobs
> which are designed to be universal enough, and a new event, so that
> nothing will have to be retrofitted onto existing infrastructure.
> 
> An example of the job XML would be:
> 
> 
>   
> vda
> vda[1]
> vda[5]
>   

Where what forms a valid target element is dependent on the type='...' of
the job element, correct?

>   
>12345
>12345
>   
> 

Looks reasonable.

> 
> but this will be mostly a topic for the second part of this exercise
> after we discuss the APIs.
> 
> The new infrastructure will also allow adding a flag for all the
> existing APIs which kick off a job so that the job will persist even
> after it finishes. This will also properly implement the statistics for
> a finished migration and similar.
> 
> Obviously we will need to take special care when wiring these up so that
> the old APIs keep working in the old situations and the events are reported
> correctly.
> 
> The initial idea would be to implement the stats XML both for the domain
> jobs (migration, dump) and blockjobs to simplify the job for mgmt apps
> so that they won't have to infer whether the given job type is already
> reported in the new API.
> 
> Additionally we can also implement flags for the XML getter API that
> will skip the stats gathering as that may require monitor interactions.
> Also one possibility would be to return an abbreviated XML in the
> listing API.

Makes sense.

> ---
>  include/libvirt/libvirt-domain.h | 91 +++
>  src/libvirt-domain.c | 94 
>  2 files changed, 185 insertions(+)
> 
> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
> index 2dbd74d4f3..dac1be 100644
> --- a/include/libvirt/libvirt-domain.h
> +++ b/include/libvirt/libvirt-domain.h
> @@ -4485,6 +4485,28 @@ typedef void (*virConnectDomainEventBlockThresholdCallback)(virConnectPtr conn,
>                                                               unsigned long long excess,
>                                                               void *opaque);
> 
> +/**
> + * virConnectDomainEventJobStateCallback:
> + * @conn: connection object
> + * @dom: domain on which the event occurred
> + * @jobname: name of job which changed state
> + * @jobtype: type of the job
> + * @newstate: the new state the job entered
> + * @opaque: application specified data
> + *
> + * The 

[libvirt] [RFC] New domain job control and stat APIs

2019-07-10 Thread Peter Krempa
Currently we don't have a consolidated approach for managing
asynchronous long-running domain jobs. Historically there were
long-running jobs which interlocked with each other and thus there was
only one such job possible at a given time (migration, save, restore, dump).

These jobs have a rather inflexible set of APIs:
virDomainGetJobInfo, virDomainGetJobStats, virDomainAbortJob.

These don't really allow selecting which job to terminate since there's
only one, thus if we wanted to add different kinds of jobs which don't
necessarily interlock but are able to run in parallel, we would have to
introduce another set of APIs.

This resulted in the creation of the block job APIs:
virDomainBlockJobAbort, virDomainGetBlockJobInfo

These allow parallel jobs (discriminated by the disk to which the job
belongs) but are not universal, nor do they allow parallel jobs on a
single disk.

Similarly, blockjobs can also become detached from the disk, e.g. if the
guest unplugs the disk frontend. That way the job would linger in limbo
and would not be controllable. (This is certainly a possibility with
-blockdev.)

With -blockdev we also get a potentially long-running blockdev-create
job which is not bound to any disk as part of kicking off a snapshot or
block copy job. This one might also get stuck and in the current state
is not really controllable.

Additionally, the upcoming block-backup job will be a combination of the
above. It's a job which spans multiple disks (thus not really a block
job in libvirt terminology) but not a domain job either, as there
can potentially be more than one block backup job. The proposal for
block-backup introduces its own single-purpose set of APIs for managing
the backup job only, but abuses the block job and domain job events to
distribute the async state updates.

With this series I want to introduce a set of APIs for managing the jobs
which are designed to be universal enough, and a new event, so that
nothing will have to be retrofitted onto existing infrastructure.

An example of the job XML would be:


  
vda
vda[1]
vda[5]
  
  
   12345
   12345
  


but this will be mostly a topic for the second part of this exercise
after we discuss the APIs.

The new infrastructure will also allow adding a flag for all the
existing APIs which kick off a job so that the job will persist even
after it finishes. This will also properly implement the statistics for
a finished migration and similar.

Obviously we will need to take special care when wiring these up so that
the old APIs keep working in the old situations and the events are reported
correctly.

The initial idea would be to implement the stats XML both for the domain
jobs (migration, dump) and blockjobs to simplify the job for mgmt apps
so that they won't have to infer whether the given job type is already
reported in the new API.

Additionally we can also implement flags for the XML getter API that
will skip the stats gathering as that may require monitor interactions.
Also one possibility would be to return an abbreviated XML in the
listing API.
---
 include/libvirt/libvirt-domain.h | 91 +++
 src/libvirt-domain.c | 94 
 2 files changed, 185 insertions(+)

diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index 2dbd74d4f3..dac1be 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -4485,6 +4485,28 @@ typedef void (*virConnectDomainEventBlockThresholdCallback)(virConnectPtr conn,
                                                              unsigned long long excess,
                                                              void *opaque);

+/**
+ * virConnectDomainEventJobStateCallback:
+ * @conn: connection object
+ * @dom: domain on which the event occurred
+ * @jobname: name of job which changed state
+ * @jobtype: type of the job
+ * @newstate: the new state the job entered
+ * @opaque: application specified data
+ *
+ * The callback occurs when a long running domain job (see virDomainJobList)
+ * changes state.
+ *
+ * The callback signature to use when registering for an event of type
+ * VIR_DOMAIN_EVENT_ID_JOB_STATE with virConnectDomainEventRegisterAny()
+ */
+typedef void (*virConnectDomainEventJobStateCallback)(virConnectPtr conn,
+                                                       virDomainPtr dom,
+                                                       const char *jobname,
+                                                       virDomainJobType jobtype,
+                                                       virDomainJobState newstate,
+                                                       void *opaque);
+
 /**
  * VIR_DOMAIN_EVENT_CALLBACK:
  *
@@ -4527,6 +4549,7 @@ typedef enum {
    VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED = 22, /* virConnectDomainEventDeviceRemovalFailedCallback */
    VIR_DOMAIN_EVENT_ID_METADATA_CHANGE = 23, /* virConnectDomainEventMetadataChangeCallback */
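
For context, a hedged sketch of how a management application could consume
the proposed event. virConnectDomainEventRegisterAny() and
VIR_DOMAIN_EVENT_CALLBACK() are existing libvirt API, while
VIR_DOMAIN_EVENT_ID_JOB_STATE, the callback typedef and the job type/state
enums come only from the patch above, so this is purely illustrative:

  #include <stdio.h>
  #include <libvirt/libvirt.h>

  /* sketch only: the event id and the job type/state enums are from the
   * proposed patch, not from a released libvirt */
  static void
  job_state_cb(virConnectPtr conn, virDomainPtr dom, const char *jobname,
               virDomainJobType jobtype, virDomainJobState newstate,
               void *opaque)
  {
      printf("domain %s: job '%s' (type %d) changed state to %d\n",
             virDomainGetName(dom), jobname, (int)jobtype, (int)newstate);
  }

  static int
  watch_job_events(virConnectPtr conn, virDomainPtr dom)
  {
      /* register for the proposed job state change event on one domain */
      return virConnectDomainEventRegisterAny(conn, dom,
                                              VIR_DOMAIN_EVENT_ID_JOB_STATE,
                                              VIR_DOMAIN_EVENT_CALLBACK(job_state_cb),
                                              NULL, NULL);
  }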