RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request into one.

2020-06-03 Thread Zhang, Chen



> -Original Message-
> From: Zhanghailiang 
> Sent: Wednesday, June 3, 2020 5:39 PM
> To: Zhang, Chen ; Dr . David Alan Gilbert
> ; Juan Quintela ; qemu-dev
> 
> Cc: Zhang Chen ; Jason Wang
> 
> Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> into one.
> 
> > -Original Message-
> > From: Zhang, Chen [mailto:chen.zh...@intel.com]
> > Sent: Wednesday, June 3, 2020 5:11 PM
> > To: Zhanghailiang ; Dr . David Alan
> > Gilbert ; Juan Quintela ;
> > qemu-dev 
> > Cc: Zhang Chen ; Jason Wang
> 
> > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint
> > request into one.
> >
> >
> >
> > > -Original Message-
> > > From: Zhanghailiang 
> > > Sent: Tuesday, June 2, 2020 2:59 PM
> > > To: Zhang, Chen ; Dr . David Alan Gilbert
> > > ; Juan Quintela ; qemu-
> dev
> > > 
> > > Cc: Zhang Chen ; Jason Wang
> > 
> > > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint
> > > request into one.
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Zhang Chen [mailto:chen.zh...@intel.com]
> > > > Sent: Friday, May 15, 2020 12:28 PM
> > > > To: Dr . David Alan Gilbert ; Juan Quintela
> > > > ; Zhanghailiang
> > > ;
> > > > qemu-dev 
> > > > Cc: Zhang Chen ; Jason Wang
> > > > ; Zhang Chen 
> > > > Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint
> > > > request into one.
> > > >
> > > > From: Zhang Chen 
> > > >
> > > > When COLO guest occur issues, COLO-compare will catch lots of
> > > > different network packet and trigger notification multi times,
> > > > force periodic may happen at the same time. So this can be
> > > > efficient merge checkpoint request within
> COLO_CHECKPOINT_INTERVAL.
> > > >
> > > > Signed-off-by: Zhang Chen 
> > > > ---
> > > >  migration/colo.c | 22 --
> > > >  1 file changed, 16 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/migration/colo.c b/migration/colo.c index
> > > > d5bced22cb..e6a7d8c6e2 100644
> > > > --- a/migration/colo.c
> > > > +++ b/migration/colo.c
> > > > @@ -47,6 +47,9 @@ static COLOMode last_colo_mode;
> > > >
> > > >  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> > > >
> > > > +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define
> > > > +COLO_CHECKPOINT_INTERVAL 1000
> > > > +
> > > >  bool migration_in_colo_state(void)  {
> > > >  MigrationState *s = migrate_get_current(); @@ -651,13 +654,20
> > > > @@
> > > > out:
> > > >  void colo_checkpoint_notify(void *opaque)  {
> > > >  MigrationState *s = opaque;
> > > > -int64_t next_notify_time;
> > > > +int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > > >
> > > > -qemu_sem_post(&s->colo_checkpoint_sem);
> > > > -s->colo_checkpoint_time =
> > qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > > > -next_notify_time = s->colo_checkpoint_time +
> > > > -s->parameters.x_checkpoint_delay;
> > > > -timer_mod(s->colo_delay_timer, next_notify_time);
> > > > +/*
> > > > + * When COLO guest occur issues, COLO-compare will catch lots of
> > > > + * different network packet and trigger notification multi times,
> > > > + * force periodic may happen at the same time. So this can be
> > > > + * efficient merge checkpoint request within
> > > > COLO_CHECKPOINT_INTERVAL.
> > > > + */
> > > > +if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL)
> > {
> > > > +qemu_sem_post(&s->colo_checkpoint_sem);
> > >
> > > It is not right here, this notification should not be controlled by
> > > the interval time, I got what happened here, when multiple
> > > checkpoint requires come, this Colo_delay_time will be added every
> > > time and it will be a big value which is not what we want.
> >
> > Not just this, multi checkpoint will spend lots of resource to sync
> > memory from PVM to SVM, It will make VM stop/start multi times, but
> > for the results are same with one checkpoint.
> > So in short time just need one checkpoint, because do checkpoint still
> > need some time...
> >
> 
> Yes, this because we use semaphore here, it will be increased multiple times,
> And I think Lukas's patch 'migration/colo.c: Use event instead of semaphore'
> has fixed this problem.
> Did you try the qemu upstream which has merged this patch ?

Oh, Thanks reminder, I will drop this patch and rebase on upstream for V2.

Thank
Zhang Chen

> 
> > Thanks
> > Zhang Chen
> >
> > >
> > > Besides, please update this patch based on [PATCH 0/6] colo:
> > > migration related bugfixes series which Has modified the same place.
> > >
> > >
> > >
> > > > +timer_mod(s->colo_delay_timer, now +
> > > > +  s->parameters.x_checkpoint_delay);
> > > > +s->colo_checkpoint_time = now;
> > > > +}
> > > >  }
> > > >
> > > >  void migrate_start_colo_process(MigrationState *s)
> > > > --
> > > > 2.17.1




RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request into one.

2020-06-03 Thread Zhanghailiang
> -Original Message-
> From: Zhang, Chen [mailto:chen.zh...@intel.com]
> Sent: Wednesday, June 3, 2020 5:11 PM
> To: Zhanghailiang ; Dr . David Alan
> Gilbert ; Juan Quintela ;
> qemu-dev 
> Cc: Zhang Chen ; Jason Wang
> 
> Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> into one.
> 
> 
> 
> > -Original Message-
> > From: Zhanghailiang 
> > Sent: Tuesday, June 2, 2020 2:59 PM
> > To: Zhang, Chen ; Dr . David Alan Gilbert
> > ; Juan Quintela ; qemu-dev
> > 
> > Cc: Zhang Chen ; Jason Wang
> 
> > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint
> > request into one.
> >
> >
> >
> > > -Original Message-
> > > From: Zhang Chen [mailto:chen.zh...@intel.com]
> > > Sent: Friday, May 15, 2020 12:28 PM
> > > To: Dr . David Alan Gilbert ; Juan Quintela
> > > ; Zhanghailiang
> > ;
> > > qemu-dev 
> > > Cc: Zhang Chen ; Jason Wang
> > > ; Zhang Chen 
> > > Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> > > into one.
> > >
> > > From: Zhang Chen 
> > >
> > > When COLO guest occur issues, COLO-compare will catch lots of
> > > different network packet and trigger notification multi times, force
> > > periodic may happen at the same time. So this can be efficient merge
> > > checkpoint request within COLO_CHECKPOINT_INTERVAL.
> > >
> > > Signed-off-by: Zhang Chen 
> > > ---
> > >  migration/colo.c | 22 --
> > >  1 file changed, 16 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/migration/colo.c b/migration/colo.c index
> > > d5bced22cb..e6a7d8c6e2 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -47,6 +47,9 @@ static COLOMode last_colo_mode;
> > >
> > >  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> > >
> > > +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define
> > > +COLO_CHECKPOINT_INTERVAL 1000
> > > +
> > >  bool migration_in_colo_state(void)
> > >  {
> > >  MigrationState *s = migrate_get_current(); @@ -651,13 +654,20
> > > @@
> > > out:
> > >  void colo_checkpoint_notify(void *opaque)  {
> > >  MigrationState *s = opaque;
> > > -int64_t next_notify_time;
> > > +int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > >
> > > -qemu_sem_post(&s->colo_checkpoint_sem);
> > > -s->colo_checkpoint_time =
> qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > > -next_notify_time = s->colo_checkpoint_time +
> > > -s->parameters.x_checkpoint_delay;
> > > -timer_mod(s->colo_delay_timer, next_notify_time);
> > > +/*
> > > + * When COLO guest occur issues, COLO-compare will catch lots of
> > > + * different network packet and trigger notification multi times,
> > > + * force periodic may happen at the same time. So this can be
> > > + * efficient merge checkpoint request within
> > > COLO_CHECKPOINT_INTERVAL.
> > > + */
> > > +if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL)
> {
> > > +qemu_sem_post(&s->colo_checkpoint_sem);
> >
> > It is not right here, this notification should not be controlled by
> > the interval time, I got what happened here, when multiple checkpoint
> > requires come, this Colo_delay_time will be added every time and it
> > will be a big value which is not what we want.
> 
> Not just this, multi checkpoint will spend lots of resource to sync memory
> from PVM to SVM, It will make VM stop/start multi times, but for the results
> are same with one checkpoint.
> So in short time just need one checkpoint, because do checkpoint still need
> some time...
> 

Yes, this because we use semaphore here, it will be increased multiple times,
And I think Lukas's patch 'migration/colo.c: Use event instead of semaphore' 
has fixed this problem.
Did you try the qemu upstream which has merged this patch ?

> Thanks
> Zhang Chen
> 
> >
> > Besides, please update this patch based on [PATCH 0/6] colo: migration
> > related bugfixes series which Has modified the same place.
> >
> >
> >
> > > +timer_mod(s->colo_delay_timer, now +
> > > +  s->parameters.x_checkpoint_delay);
> > > +s->colo_checkpoint_time = now;
> > > +}
> > >  }
> > >
> > >  void migrate_start_colo_process(MigrationState *s)
> > > --
> > > 2.17.1




RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request into one.

2020-06-03 Thread Zhang, Chen



> -Original Message-
> From: Zhanghailiang 
> Sent: Tuesday, June 2, 2020 2:59 PM
> To: Zhang, Chen ; Dr . David Alan Gilbert
> ; Juan Quintela ; qemu-dev
> 
> Cc: Zhang Chen ; Jason Wang
> 
> Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> into one.
> 
> 
> 
> > -Original Message-
> > From: Zhang Chen [mailto:chen.zh...@intel.com]
> > Sent: Friday, May 15, 2020 12:28 PM
> > To: Dr . David Alan Gilbert ; Juan Quintela
> > ; Zhanghailiang
> ;
> > qemu-dev 
> > Cc: Zhang Chen ; Jason Wang
> > ; Zhang Chen 
> > Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> > into one.
> >
> > From: Zhang Chen 
> >
> > When COLO guest occur issues, COLO-compare will catch lots of
> > different network packet and trigger notification multi times, force
> > periodic may happen at the same time. So this can be efficient merge
> > checkpoint request within COLO_CHECKPOINT_INTERVAL.
> >
> > Signed-off-by: Zhang Chen 
> > ---
> >  migration/colo.c | 22 --
> >  1 file changed, 16 insertions(+), 6 deletions(-)
> >
> > diff --git a/migration/colo.c b/migration/colo.c index
> > d5bced22cb..e6a7d8c6e2 100644
> > --- a/migration/colo.c
> > +++ b/migration/colo.c
> > @@ -47,6 +47,9 @@ static COLOMode last_colo_mode;
> >
> >  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> >
> > +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define
> > +COLO_CHECKPOINT_INTERVAL 1000
> > +
> >  bool migration_in_colo_state(void)
> >  {
> >  MigrationState *s = migrate_get_current(); @@ -651,13 +654,20 @@
> > out:
> >  void colo_checkpoint_notify(void *opaque)  {
> >  MigrationState *s = opaque;
> > -int64_t next_notify_time;
> > +int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> >
> > -qemu_sem_post(&s->colo_checkpoint_sem);
> > -s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> > -next_notify_time = s->colo_checkpoint_time +
> > -s->parameters.x_checkpoint_delay;
> > -timer_mod(s->colo_delay_timer, next_notify_time);
> > +/*
> > + * When COLO guest occur issues, COLO-compare will catch lots of
> > + * different network packet and trigger notification multi times,
> > + * force periodic may happen at the same time. So this can be
> > + * efficient merge checkpoint request within
> > COLO_CHECKPOINT_INTERVAL.
> > + */
> > +if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) {
> > +qemu_sem_post(&s->colo_checkpoint_sem);
> 
> It is not right here, this notification should not be controlled by the 
> interval
> time, I got what happened here, when multiple checkpoint requires come,
> this Colo_delay_time will be added every time and it will be a big value which
> is not what we want.

Not just this, multi checkpoint will spend lots of resource to sync memory from 
PVM to SVM,
It will make VM stop/start multi times, but for the results are same with one 
checkpoint. 
So in short time just need one checkpoint, because do checkpoint still need 
some time...

Thanks
Zhang Chen

> 
> Besides, please update this patch based on [PATCH 0/6] colo: migration
> related bugfixes series which Has modified the same place.
> 
> 
> 
> > +timer_mod(s->colo_delay_timer, now +
> > +  s->parameters.x_checkpoint_delay);
> > +s->colo_checkpoint_time = now;
> > +}
> >  }
> >
> >  void migrate_start_colo_process(MigrationState *s)
> > --
> > 2.17.1




RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request into one.

2020-06-02 Thread Zhanghailiang



> -Original Message-
> From: Zhang Chen [mailto:chen.zh...@intel.com]
> Sent: Friday, May 15, 2020 12:28 PM
> To: Dr . David Alan Gilbert ; Juan Quintela
> ; Zhanghailiang ;
> qemu-dev 
> Cc: Zhang Chen ; Jason Wang
> ; Zhang Chen 
> Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint request into
> one.
> 
> From: Zhang Chen 
> 
> When COLO guest occur issues, COLO-compare will catch lots of different
> network packet and trigger notification multi times, force periodic may
> happen at the same time. So this can be efficient merge checkpoint request
> within COLO_CHECKPOINT_INTERVAL.
> 
> Signed-off-by: Zhang Chen 
> ---
>  migration/colo.c | 22 --
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/colo.c b/migration/colo.c index
> d5bced22cb..e6a7d8c6e2 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -47,6 +47,9 @@ static COLOMode last_colo_mode;
> 
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> 
> +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define
> +COLO_CHECKPOINT_INTERVAL 1000
> +
>  bool migration_in_colo_state(void)
>  {
>  MigrationState *s = migrate_get_current(); @@ -651,13 +654,20 @@
> out:
>  void colo_checkpoint_notify(void *opaque)  {
>  MigrationState *s = opaque;
> -int64_t next_notify_time;
> +int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> 
> -qemu_sem_post(&s->colo_checkpoint_sem);
> -s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> -next_notify_time = s->colo_checkpoint_time +
> -s->parameters.x_checkpoint_delay;
> -timer_mod(s->colo_delay_timer, next_notify_time);
> +/*
> + * When COLO guest occur issues, COLO-compare will catch lots of
> + * different network packet and trigger notification multi times,
> + * force periodic may happen at the same time. So this can be
> + * efficient merge checkpoint request within
> COLO_CHECKPOINT_INTERVAL.
> + */
> +if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) {
> +qemu_sem_post(&s->colo_checkpoint_sem);

It is not right here, this notification should not be controlled by the 
interval time,
I got what happened here, when multiple checkpoint requires come, this
Colo_delay_time will be added every time and it will be a big value which is 
not what we want.

Besides, please update this patch based on [PATCH 0/6] colo: migration related 
bugfixes series which
Has modified the same place.



> +timer_mod(s->colo_delay_timer, now +
> +  s->parameters.x_checkpoint_delay);
> +s->colo_checkpoint_time = now;
> +}
>  }
> 
>  void migrate_start_colo_process(MigrationState *s)
> --
> 2.17.1