Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-03-07 Thread Leonardo Bras Soares Passos
On Tue, Mar 1, 2022 at 12:57 AM Peter Xu  wrote:
>
> On Fri, Feb 18, 2022 at 05:57:13PM +0100, Juan Quintela wrote:
> > I did a change on:
> >
> > commit d48c3a044537689866fe44e65d24c7d39a68868a
> > Author: Juan Quintela 
> > Date:   Fri Nov 19 15:35:58 2021 +0100
> >
> > multifd: Use a single writev on the send side
> >
> > Until now, we wrote the packet header with write(), and the rest of the
> > pages with writev().  Just increase the size of the iovec and do a
> > single writev().
> >
> > Signed-off-by: Juan Quintela 
> > Reviewed-by: Dr. David Alan Gilbert 
> >
> > And now we need to "preserve" this header until we do the sync,
> > otherwise we are overwriting it with other things.
> >
> > What testing have you done after this commit?
> >
> > Notice that it is not _complicated_ to fix it, I will try to come up with
> > some idea on Monday, but basically it means having an array of buffers for
> > each thread, and storing them until we call a sync().
>
> Or can we conditionally merge the two write()s?  IMHO the array-of-buffers
> idea sounds too complicated, and I'm not entirely sure whether it'll pay
> off in the end.  We could keep the two write()s with ZEROCOPY enabled, and use
> the merged version otherwise.

I think that's a great idea!
It would optimize the non-zerocopy version while letting us have a
simpler zerocopy implementation.
The array-of-buffers implementation would either require us to keep a
'large' amount of memory around for the headers, or have flushes
happening too often.
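
Just to make the idea concrete, here is a rough, untested sketch of how it
could look inside multifd_send_thread() (use_zero_copy and local_err are
assumed to come from the surrounding function, and iov[0] is assumed to be
the header slot, as in d48c3a044537):

    if (use_zero_copy) {
        /*
         * Copying send for the header: the kernel copies p->packet
         * immediately, so the buffer can be reused before the flush.
         */
        ret = qio_channel_write_all(p->c, (const char *)p->packet,
                                    p->packet_len, &local_err);
        if (ret == 0) {
            /*
             * Pages go out with MSG_ZEROCOPY and must stay untouched
             * until qio_channel_flush() confirms them.
             */
            ret = qio_channel_writev_full_all(p->c, &p->iov[1],
                                              p->iovs_num - 1, NULL, 0,
                                              p->write_flags, &local_err);
        }
    } else {
        /* Non-zero-copy path keeps the single merged writev. */
        p->iov[0].iov_len = p->packet_len;
        p->iov[0].iov_base = p->packet;
        ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num,
                                          NULL, 0, p->write_flags,
                                          &local_err);
    }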

>
> Btw, are there any performance measurements for the above commit d48c3a044537?
> I had a feeling that the single write() may not help that much, because for
> multifd the bottleneck should be on the NIC, not on the processor.

I am quite curious about those numbers too.

>
> IOW, we could find that the major time spent does not fall into the
> user<->kernel switches (which is where the extra overhead of the write()
> syscall lies, IIUC), but that we simply block on any of the write()s because the
> socket write buffer is full...  So we could have saved some CPU cycles by
> merging the calls, but performance-wise we may not get much.
>
> Thanks,
>
> --
> Peter Xu
>

Thanks Peter!




Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-02-28 Thread Peter Xu
On Fri, Feb 18, 2022 at 05:57:13PM +0100, Juan Quintela wrote:
> I did a change on:
> 
> commit d48c3a044537689866fe44e65d24c7d39a68868a
> Author: Juan Quintela 
> Date:   Fri Nov 19 15:35:58 2021 +0100
> 
> multifd: Use a single writev on the send side
> 
> Until now, we wrote the packet header with write(), and the rest of the
> pages with writev().  Just increase the size of the iovec and do a
> single writev().
> 
> Signed-off-by: Juan Quintela 
> Reviewed-by: Dr. David Alan Gilbert 
> 
> And now we need to "preserve" this header until we do the sync,
> otherwise we are overwriting it with other things.
> 
> What testing have you done after this commit?
> 
> Notice that it is not _complicated_ to fix it, I will try to come up with
> some idea on Monday, but basically it means having an array of buffers for
> each thread, and storing them until we call a sync().

Or can we conditionally merge the two write()s?  IMHO the array-of-buffers
idea sounds too complicated, and I'm not entirely sure whether it'll pay
off in the end.  We could keep the two write()s with ZEROCOPY enabled, and use
the merged version otherwise.

Btw, are there any performance measurements for the above commit d48c3a044537?
I had a feeling that the single write() may not help that much, because for
multifd the bottleneck should be on the NIC, not on the processor.

IOW, we could find that the major time spent does not fall into the
user<->kernel switches (which is where the extra overhead of the write()
syscall lies, IIUC), but that we simply block on any of the write()s because the
socket write buffer is full...  So we could have saved some CPU cycles by
merging the calls, but performance-wise we may not get much.

Thanks,

-- 
Peter Xu




Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-02-21 Thread Leonardo Bras Soares Passos
On Mon, Feb 21, 2022 at 4:41 PM Leonardo Bras Soares Passos wrote:
>
> Hello Juan, thanks for the feedback!
>
> On Fri, Feb 18, 2022 at 1:57 PM Juan Quintela  wrote:
> >
> > Leonardo Bras  wrote:
> > > Implement zero copy send on nocomp_send_write(), by making use of 
> > > QIOChannel
> > > writev + flags & flush interface.
> > >
> > > Change multifd_send_sync_main() so flush_zero_copy() can be called
> > > after each iteration in order to make sure all dirty pages are sent before
> > > a new iteration is started. It will also flush at the beginning and at the
> > > end of migration.
> > >
> > > Also make it return -1 if flush_zero_copy() fails, in order to cancel
> > > the migration process, and avoid resuming the guest in the target host
> > > without receiving all current RAM.
> > >
> > > This will work fine on RAM migration because the RAM pages are not 
> > > usually freed,
> > > and there is no problem on changing the pages content between 
> > > writev_zero_copy() and
> > > the actual sending of the buffer, because this change will dirty the page 
> > > and
> > > cause it to be re-sent on a next iteration anyway.
> > >
> > > A lot of locked memory may be needed in order to use multid migration
> >^^
> > multifd.
> >
> > I can fix it on the commit.
>
> No worries, fixed for v9.
>
> >
> >
> > > @@ -1479,7 +1479,16 @@ static bool 
> > > migrate_params_check(MigrationParameters *params, Error **errp)
> > >  error_prepend(errp, "Invalid mapping given for 
> > > block-bitmap-mapping: ");
> > >  return false;
> > >  }
> > > -
> > > +#ifdef CONFIG_LINUX
> > > +if (params->zero_copy_send &&
> > > +(!migrate_use_multifd() ||
> > > + params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > > + (params->tls_creds && *params->tls_creds))) {
> > > +error_setg(errp,
> > > +   "Zero copy only available for non-compressed non-TLS 
> > > multifd migration");
> > > +return false;
> > > +}
> > > +#endif
> > >  return true;
> > >  }
> >
> > Test is long, but it is exactly what we need.  Good.
>
> Thanks!
>
>
> >
> >
> > >
> > > diff --git a/migration/multifd.c b/migration/multifd.c
> > > index 43998ad117..2d68b9cf4f 100644
> > > --- a/migration/multifd.c
> > > +++ b/migration/multifd.c
> > > @@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
> > >  multifd_send_state = NULL;
> > >  }
> > >
> > > -void multifd_send_sync_main(QEMUFile *f)
> > > +int multifd_send_sync_main(QEMUFile *f)
> > >  {
> > >  int i;
> > > +bool flush_zero_copy;
> > >
> > >  if (!migrate_use_multifd()) {
> > > -return;
> > > +return 0;
> > >  }
> > >  if (multifd_send_state->pages->num) {
> > >  if (multifd_send_pages(f) < 0) {
> > >  error_report("%s: multifd_send_pages fail", __func__);
> > > -return;
> > > +return 0;
> > >  }
> > >  }
> > > +
> > > +/*
> > > + * When using zero-copy, it's necessary to flush after each 
> > > iteration to
> > > + * make sure pages from earlier iterations don't end up replacing 
> > > newer
> > > + * pages.
> > > + */
> > > +flush_zero_copy = migrate_use_zero_copy_send();
> > > +
> > >  for (i = 0; i < migrate_multifd_channels(); i++) {
> > >  MultiFDSendParams *p = &multifd_send_state->params[i];
> > >
> > > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> > >  if (p->quit) {
> > >  error_report("%s: channel %d has already quit", __func__, i);
> > >  qemu_mutex_unlock(&p->mutex);
> > > -return;
> > > +return 0;
> > >  }
> > >
> > >  p->packet_num = multifd_send_state->packet_num++;
> > > @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
> > >  ram_counters.transferred += p->packet_len;
> > >  qemu_mutex_unlock(&p->mutex);
> > >  qemu_sem_post(&p->sem);
> > > +
> > > +if (flush_zero_copy) {
> > > +int ret;
> > > +Error *err = NULL;
> > > +
> > > +ret = qio_channel_flush(p->c, &err);
> > > +if (ret < 0) {
> > > +error_report_err(err);
> > > +return -1;
> > > +}
> > > +}
> > >  }
> > >  for (i = 0; i < migrate_multifd_channels(); i++) {
> > >  MultiFDSendParams *p = &multifd_send_state->params[i];
> > > @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
> > >  qemu_sem_wait(&p->sem_sync);
> > >  }
> > >  trace_multifd_send_sync_main(multifd_send_state->packet_num);
> > > +
> > > +return 0;
> > >  }
> >
> > We are leaving pages in flight for potentially a lot of time. I *think*
> > that we can sync shorter than that.
> >
> > >  static void *multifd_send_thread(void *opaque)
> > > @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque)
> > >  p->iov[0].iov_len = 

Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-02-21 Thread Leonardo Bras Soares Passos
On Fri, Feb 18, 2022 at 2:36 PM Juan Quintela  wrote:
>
> Leonardo Bras Soares Passos  wrote:
> > Hello Peter, thanks for reviewing!
> >
> > On Mon, Feb 7, 2022 at 11:22 PM Peter Xu  wrote:
> >>
> >> On Tue, Feb 01, 2022 at 03:29:03AM -0300, Leonardo Bras wrote:
> >> > -void multifd_send_sync_main(QEMUFile *f)
> >> > +int multifd_send_sync_main(QEMUFile *f)
> >> >  {
> >> >  int i;
> >> > +bool flush_zero_copy;
> >> >
> >> >  if (!migrate_use_multifd()) {
> >> > -return;
> >> > +return 0;
> >> >  }
> >> >  if (multifd_send_state->pages->num) {
> >> >  if (multifd_send_pages(f) < 0) {
> >> >  error_report("%s: multifd_send_pages fail", __func__);
> >> > -return;
> >> > +return 0;
> >>
> >> I've not checked how it used to behave if multifd_send_pages() failed, but..
> >> should it return -1 rather than 0 now that there will be a return code?
> >
> > Yeah, that makes sense.
> > The point here is that I was trying not to modify much of the current 
> > behavior.
>
> if (qatomic_read(&multifd_send_state->exiting)) {
> return -1;
> }
>
> if (p->quit) {
> error_report("%s: channel %d has already quit!", __func__, i);
> qemu_mutex_unlock(&p->mutex);
> return -1;
> }
>
> These are the only two cases where the current code can return an
> error.  In the 1st case we are exiting, we are already in the middle of
> finishing, so we don't really care.
> In the second one, we have already quit, and the error is already quite big.
>
> But I agree with both comments:
> - we need to improve the error paths
> - Leonardo's changes don't affect what is already there.
>



>
> > I mean, multifd_send_sync_main() would previously return void, so any
> > other errors would not matter to the caller of this function, which
> > will continue to run as if nothing happened.
> >
> > Now, if it fails with flush_zero_copy, the operation needs to be aborted.
> >
> > Maybe, I should make it different:
> > - In any error, return -1.
> > - Create/use a specific error code in the case of a failing
> > flush_zero_copy, so I can test the return value for it on the caller
> > function and return early.
>
> We need to add the check.  It doesn't matter if the problem is zero_copy
> or the existing one, we are under a minor catastrophe and migration has
> to be aborted.

Ok, I will fix that so we can abort in case of any error.
Maybe it's better to do that in a separate patch, before 5/5, right?
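
For that preliminary patch, the change would be roughly the following
(untested sketch; only the early-return values change, and the part that
queues the SYNC packet is elided):

    int multifd_send_sync_main(QEMUFile *f)
    {
        int i;

        if (!migrate_use_multifd()) {
            return 0;
        }
        if (multifd_send_state->pages->num) {
            if (multifd_send_pages(f) < 0) {
                error_report("%s: multifd_send_pages fail", __func__);
                return -1;    /* was a silent early return */
            }
        }
        for (i = 0; i < migrate_multifd_channels(); i++) {
            MultiFDSendParams *p = &multifd_send_state->params[i];

            qemu_mutex_lock(&p->mutex);
            if (p->quit) {
                error_report("%s: channel %d has already quit", __func__, i);
                qemu_mutex_unlock(&p->mutex);
                return -1;    /* was a silent early return */
            }
            /* ... queue the SYNC packet and kick the thread, as today ... */
            qemu_mutex_unlock(&p->mutex);
            qemu_sem_post(&p->sem);
        }
        for (i = 0; i < migrate_multifd_channels(); i++) {
            MultiFDSendParams *p = &multifd_send_state->params[i];

            qemu_sem_wait(&p->sem_sync);
        }
        trace_multifd_send_sync_main(multifd_send_state->packet_num);

        return 0;
    }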

>
> > Or alternatively, the other errors could also return early, but since
> > this will change how the code currently works, I would probably need
> > another patch for that change. (so it can be easily reverted if
> > needed)
> >
> > What do you think is better?
> >
> >
> >> >  }
> >> >  }
> >> > +
> >> > +/*
> >> > + * When using zero-copy, it's necessary to flush after each 
> >> > iteration to
> >> > + * make sure pages from earlier iterations don't end up replacing 
> >> > newer
> >> > + * pages.
> >> > + */
> >> > +flush_zero_copy = migrate_use_zero_copy_send();
> >> > +
> >> >  for (i = 0; i < migrate_multifd_channels(); i++) {
> >> >  MultiFDSendParams *p = &multifd_send_state->params[i];
> >> >
> >> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> >> >  if (p->quit) {
> >> >  error_report("%s: channel %d has already quit", __func__, 
> >> > i);
> >> >  qemu_mutex_unlock(&p->mutex);
> >> > -return;
> >> > +return 0;
> >>
> >> Same question here.
> >
> > Please see above,
> >
> >>
> >> >  }
> >>
> >> The rest looks good.  Thanks,
>
> Later, Juan.
>

Thanks for the feedback!

Best regards,
Leo




Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-02-21 Thread Leonardo Bras Soares Passos
Hello Juan, thanks for the feedback!

On Fri, Feb 18, 2022 at 1:57 PM Juan Quintela  wrote:
>
> Leonardo Bras  wrote:
> > Implement zero copy send on nocomp_send_write(), by making use of QIOChannel
> > writev + flags & flush interface.
> >
> > Change multifd_send_sync_main() so flush_zero_copy() can be called
> > after each iteration in order to make sure all dirty pages are sent before
> > a new iteration is started. It will also flush at the beginning and at the
> > end of migration.
> >
> > Also make it return -1 if flush_zero_copy() fails, in order to cancel
> > the migration process, and avoid resuming the guest in the target host
> > without receiving all current RAM.
> >
> > This will work fine on RAM migration because the RAM pages are not usually 
> > freed,
> > and there is no problem on changing the pages content between 
> > writev_zero_copy() and
> > the actual sending of the buffer, because this change will dirty the page 
> > and
> > cause it to be re-sent on a next iteration anyway.
> >
> > A lot of locked memory may be needed in order to use multid migration
>^^
> multifd.
>
> I can fix it on the commit.

No worries, fixed for v9.

>
>
> > @@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters 
> > *params, Error **errp)
> >  error_prepend(errp, "Invalid mapping given for 
> > block-bitmap-mapping: ");
> >  return false;
> >  }
> > -
> > +#ifdef CONFIG_LINUX
> > +if (params->zero_copy_send &&
> > +(!migrate_use_multifd() ||
> > + params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
> > + (params->tls_creds && *params->tls_creds))) {
> > +error_setg(errp,
> > +   "Zero copy only available for non-compressed non-TLS 
> > multifd migration");
> > +return false;
> > +}
> > +#endif
> >  return true;
> >  }
>
> Test is long, but it is exactly what we need.  Good.

Thanks!


>
>
> >
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 43998ad117..2d68b9cf4f 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
> >  multifd_send_state = NULL;
> >  }
> >
> > -void multifd_send_sync_main(QEMUFile *f)
> > +int multifd_send_sync_main(QEMUFile *f)
> >  {
> >  int i;
> > +bool flush_zero_copy;
> >
> >  if (!migrate_use_multifd()) {
> > -return;
> > +return 0;
> >  }
> >  if (multifd_send_state->pages->num) {
> >  if (multifd_send_pages(f) < 0) {
> >  error_report("%s: multifd_send_pages fail", __func__);
> > -return;
> > +return 0;
> >  }
> >  }
> > +
> > +/*
> > + * When using zero-copy, it's necessary to flush after each iteration 
> > to
> > + * make sure pages from earlier iterations don't end up replacing newer
> > + * pages.
> > + */
> > +flush_zero_copy = migrate_use_zero_copy_send();
> > +
> >  for (i = 0; i < migrate_multifd_channels(); i++) {
> >  MultiFDSendParams *p = &multifd_send_state->params[i];
> >
> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> >  if (p->quit) {
> >  error_report("%s: channel %d has already quit", __func__, i);
> >  qemu_mutex_unlock(&p->mutex);
> > -return;
> > +return 0;
> >  }
> >
> >  p->packet_num = multifd_send_state->packet_num++;
> > @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
> >  ram_counters.transferred += p->packet_len;
> >  qemu_mutex_unlock(&p->mutex);
> >  qemu_sem_post(&p->sem);
> > +
> > +if (flush_zero_copy) {
> > +int ret;
> > +Error *err = NULL;
> > +
> > +ret = qio_channel_flush(p->c, &err);
> > +if (ret < 0) {
> > +error_report_err(err);
> > +return -1;
> > +}
> > +}
> >  }
> >  for (i = 0; i < migrate_multifd_channels(); i++) {
> >  MultiFDSendParams *p = &multifd_send_state->params[i];
> > @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
> >  qemu_sem_wait(&p->sem_sync);
> >  }
> >  trace_multifd_send_sync_main(multifd_send_state->packet_num);
> > +
> > +return 0;
> >  }
>
> We are leaving pages in flight for potentially a lot of time. I *think*
> that we can sync shorter than that.
>
> >  static void *multifd_send_thread(void *opaque)
> > @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque)
> >  p->iov[0].iov_len = p->packet_len;
> >  p->iov[0].iov_base = p->packet;
> >
> > -ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> > - &local_err);
> > +ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> > +  0, p->write_flags, &local_err);
> >   

Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-02-18 Thread Juan Quintela
Leonardo Bras Soares Passos  wrote:
> Hello Peter, thanks for reviewing!
>
> On Mon, Feb 7, 2022 at 11:22 PM Peter Xu  wrote:
>>
>> On Tue, Feb 01, 2022 at 03:29:03AM -0300, Leonardo Bras wrote:
>> > -void multifd_send_sync_main(QEMUFile *f)
>> > +int multifd_send_sync_main(QEMUFile *f)
>> >  {
>> >  int i;
>> > +bool flush_zero_copy;
>> >
>> >  if (!migrate_use_multifd()) {
>> > -return;
>> > +return 0;
>> >  }
>> >  if (multifd_send_state->pages->num) {
>> >  if (multifd_send_pages(f) < 0) {
>> >  error_report("%s: multifd_send_pages fail", __func__);
>> > -return;
>> > +return 0;
>>
>> I've not checked how it used to behave if multifd_send_pages() failed, but..
>> should it return -1 rather than 0 now that there will be a return code?
>
> Yeah, that makes sense.
> The point here is that I was trying not to modify much of the current 
> behavior.

if (qatomic_read(&multifd_send_state->exiting)) {
return -1;
}

if (p->quit) {
error_report("%s: channel %d has already quit!", __func__, i);
qemu_mutex_unlock(&p->mutex);
return -1;
}

These are the only two cases where the current code can return an
error.  In the 1st case we are exiting, we are already in the middle of
finishing, so we don't really care.
In the second one, we have already quit, and the error is already quite big.

But I agree with both comments:
- we need to improve the error paths
- Leonardo's changes don't affect what is already there.


> I mean, multifd_send_sync_main() would previously return void, so any
> other errors would not matter to the caller of this function, which
> will continue to run as if nothing happened.
>
> Now, if it fails with flush_zero_copy, the operation needs to be aborted.
>
> Maybe, I should make it different:
> - In any error, return -1.
> - Create/use a specific error code in the case of a failing
> flush_zero_copy, so I can test the return value for it on the caller
> function and return early.

We need to add the check.  It doesn't matter if the problem is zero_copy
or the existing one, we are under a minor catastrophe and migration has
to be aborted.

> Or alternatively, the other errors could also return early, but since
> this will change how the code currently works, I would probably need
> another patch for that change. (so it can be easily reverted if
> needed)
>
> What do you think is better?
>
>
>> >  }
>> >  }
>> > +
>> > +/*
>> > + * When using zero-copy, it's necessary to flush after each iteration 
>> > to
>> > + * make sure pages from earlier iterations don't end up replacing 
>> > newer
>> > + * pages.
>> > + */
>> > +flush_zero_copy = migrate_use_zero_copy_send();
>> > +
>> >  for (i = 0; i < migrate_multifd_channels(); i++) {
>> >  MultiFDSendParams *p = &multifd_send_state->params[i];
>> >
>> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
>> >  if (p->quit) {
>> >  error_report("%s: channel %d has already quit", __func__, i);
>> >  qemu_mutex_unlock(&p->mutex);
>> > -return;
>> > +return 0;
>>
>> Same question here.
>
> Please see above,
>
>>
>> >  }
>>
>> The rest looks good.  Thanks,

Later, Juan.




Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-02-18 Thread Juan Quintela
Leonardo Bras  wrote:
> Implement zero copy send on nocomp_send_write(), by making use of QIOChannel
> writev + flags & flush interface.
>
> Change multifd_send_sync_main() so flush_zero_copy() can be called
> after each iteration in order to make sure all dirty pages are sent before
> a new iteration is started. It will also flush at the beginning and at the
> end of migration.
>
> Also make it return -1 if flush_zero_copy() fails, in order to cancel
> the migration process, and avoid resuming the guest in the target host
> without receiving all current RAM.
>
> This will work fine on RAM migration because the RAM pages are not usually 
> freed,
> and there is no problem on changing the pages content between 
> writev_zero_copy() and
> the actual sending of the buffer, because this change will dirty the page and
> cause it to be re-sent on a next iteration anyway.
>
> A lot of locked memory may be needed in order to use multid migration
   ^^
multifd.

I can fix it on the commit.
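
As an aside, since MSG_ZEROCOPY pins the pages being transmitted, the
locked-memory limit on the source has to be large enough to cover the
in-flight buffers.  A purely illustrative way to inspect it from code
(not part of this patch) would be:

    #include <stdio.h>
    #include <sys/resource.h>

    /* Illustrative only: print the current RLIMIT_MEMLOCK values, which
     * bound how much memory MSG_ZEROCOPY may keep pinned for us. */
    static void print_memlock_limit(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0) {
            fprintf(stderr, "RLIMIT_MEMLOCK: cur=%llu max=%llu\n",
                    (unsigned long long)rl.rlim_cur,
                    (unsigned long long)rl.rlim_max);
        }
    }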


> @@ -1479,7 +1479,16 @@ static bool migrate_params_check(MigrationParameters 
> *params, Error **errp)
>  error_prepend(errp, "Invalid mapping given for block-bitmap-mapping: 
> ");
>  return false;
>  }
> -
> +#ifdef CONFIG_LINUX
> +if (params->zero_copy_send &&
> +(!migrate_use_multifd() ||
> + params->multifd_compression != MULTIFD_COMPRESSION_NONE ||
> + (params->tls_creds && *params->tls_creds))) {
> +error_setg(errp,
> +   "Zero copy only available for non-compressed non-TLS 
> multifd migration");
> +return false;
> +}
> +#endif
>  return true;
>  }

Test is long, but it is exactly what we need.  Good.


>  
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 43998ad117..2d68b9cf4f 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -568,19 +568,28 @@ void multifd_save_cleanup(void)
>  multifd_send_state = NULL;
>  }
>  
> -void multifd_send_sync_main(QEMUFile *f)
> +int multifd_send_sync_main(QEMUFile *f)
>  {
>  int i;
> +bool flush_zero_copy;
>  
>  if (!migrate_use_multifd()) {
> -return;
> +return 0;
>  }
>  if (multifd_send_state->pages->num) {
>  if (multifd_send_pages(f) < 0) {
>  error_report("%s: multifd_send_pages fail", __func__);
> -return;
> +return 0;
>  }
>  }
> +
> +/*
> + * When using zero-copy, it's necessary to flush after each iteration to
> + * make sure pages from earlier iterations don't end up replacing newer
> + * pages.
> + */
> +flush_zero_copy = migrate_use_zero_copy_send();
> +
>  for (i = 0; i < migrate_multifd_channels(); i++) {
>  MultiFDSendParams *p = &multifd_send_state->params[i];
>  
> @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
>  if (p->quit) {
>  error_report("%s: channel %d has already quit", __func__, i);
>  qemu_mutex_unlock(&p->mutex);
> -return;
> +return 0;
>  }
>  
>  p->packet_num = multifd_send_state->packet_num++;
> @@ -602,6 +611,17 @@ void multifd_send_sync_main(QEMUFile *f)
>  ram_counters.transferred += p->packet_len;
>  qemu_mutex_unlock(&p->mutex);
>  qemu_sem_post(&p->sem);
> +
> +if (flush_zero_copy) {
> +int ret;
> +Error *err = NULL;
> +
> +ret = qio_channel_flush(p->c, &err);
> +if (ret < 0) {
> +error_report_err(err);
> +return -1;
> +}
> +}
>  }
>  for (i = 0; i < migrate_multifd_channels(); i++) {
>  MultiFDSendParams *p = &multifd_send_state->params[i];
> @@ -610,6 +630,8 @@ void multifd_send_sync_main(QEMUFile *f)
>  qemu_sem_wait(&p->sem_sync);
>  }
>  trace_multifd_send_sync_main(multifd_send_state->packet_num);
> +
> +return 0;
>  }

We are leaving pages in flight for potentially a lot of time. I *think*
that we can sync shorter than that.

>  static void *multifd_send_thread(void *opaque)
> @@ -668,8 +690,8 @@ static void *multifd_send_thread(void *opaque)
>  p->iov[0].iov_len = p->packet_len;
>  p->iov[0].iov_base = p->packet;
>  
> -ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
> - &local_err);
> +ret = qio_channel_writev_full_all(p->c, p->iov, p->iovs_num, NULL,
> +  0, p->write_flags, &local_err);
>  if (ret != 0) {
>  break;
>  }

I still think that it would be better to have a sync here in each
thread.  I don't know the size, but once every couple megabytes of RAM
or so.
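
Very roughly, something like this inside multifd_send_thread(), as a
sketch only (the ~4MB threshold and the zero_copy_bytes counter are
invented for the example, and use_zero_copy_send would come from
migrate_use_zero_copy_send()):

    /* after the writev in the send loop of multifd_send_thread() */
    if (use_zero_copy_send) {
        zero_copy_bytes += p->next_packet_size + p->packet_len;
        if (zero_copy_bytes >= 4 * 1024 * 1024) {   /* arbitrary threshold */
            /* wait for the kernel to release the pinned pages */
            if (qio_channel_flush(p->c, &local_err) < 0) {
                break;  /* handled like any other send error */
            }
            zero_copy_bytes = 0;
        }
    }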

I did a change on:

commit d48c3a044537689866fe44e65d24c7d39a68868a
Author: Juan Quintela 
Date:   Fri Nov 19 15:35:58 2021 +0100

multifd: Use a single writev on the send side


Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-02-07 Thread Peter Xu
On Mon, Feb 07, 2022 at 11:49:38PM -0300, Leonardo Bras Soares Passos wrote:
> Hello Peter, thanks for reviewing!
> 
> On Mon, Feb 7, 2022 at 11:22 PM Peter Xu  wrote:
> >
> > On Tue, Feb 01, 2022 at 03:29:03AM -0300, Leonardo Bras wrote:
> > > -void multifd_send_sync_main(QEMUFile *f)
> > > +int multifd_send_sync_main(QEMUFile *f)
> > >  {
> > >  int i;
> > > +bool flush_zero_copy;
> > >
> > >  if (!migrate_use_multifd()) {
> > > -return;
> > > +return 0;
> > >  }
> > >  if (multifd_send_state->pages->num) {
> > >  if (multifd_send_pages(f) < 0) {
> > >  error_report("%s: multifd_send_pages fail", __func__);
> > > -return;
> > > +return 0;
> >
> > I've not checked how it used to behave if multifd_send_pages() failed, but..
> > should it return -1 rather than 0 now that there will be a return code?
> 
> Yeah, that makes sense.
> The point here is that I was trying not to modify much of the current 
> behavior.
> 
> I mean, multifd_send_sync_main() would previously return void, so any
> other errors would not matter to the caller of this function, which
> will continue to run as if nothing happened.
> 
> Now, if it fails with flush_zero_copy, the operation needs to be aborted.

Right, so the way I understand it, we'll fail anyway, and this allows us to fail
(probably) sooner.

> 
> Maybe, I should make it different:
> - In any error, return -1.
> - Create/use a specific error code in the case of a failing
> flush_zero_copy, so I can test the return value for it on the caller
> function and return early.
> 
> Or alternatively, the other errors could also return early, but since
> this will change how the code currently works, I would probably need
> another patch for that change. (so it can be easily reverted if
> needed)

Yeah, adding a patch before this one should work too.

> 
> What do you think is better?

I just don't see how it could continue if e.g. multifd_send_pages() failed.

The other thing is that returning zero looks weird when there's obviously an
error.  Normally we could allow that, but better with a comment explaining why.
For this case it's more natural to me if we could just fail early.

Juan?

-- 
Peter Xu




Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-02-07 Thread Leonardo Bras Soares Passos
Hello Peter, thanks for reviewing!

On Mon, Feb 7, 2022 at 11:22 PM Peter Xu  wrote:
>
> On Tue, Feb 01, 2022 at 03:29:03AM -0300, Leonardo Bras wrote:
> > -void multifd_send_sync_main(QEMUFile *f)
> > +int multifd_send_sync_main(QEMUFile *f)
> >  {
> >  int i;
> > +bool flush_zero_copy;
> >
> >  if (!migrate_use_multifd()) {
> > -return;
> > +return 0;
> >  }
> >  if (multifd_send_state->pages->num) {
> >  if (multifd_send_pages(f) < 0) {
> >  error_report("%s: multifd_send_pages fail", __func__);
> > -return;
> > +return 0;
>
> I've not checked how it used to behave if multifd_send_pages() failed, but..
> should it return -1 rather than 0 now that there will be a return code?

Yeah, that makes sense.
The point here is that I was trying not to modify much of the current behavior.

I mean, multifd_send_sync_main() would previously return void, so any
other errors would not matter to the caller of this function, which
will continue to run as if nothing happened.

Now, if it fails with flush_zero_copy, the operation needs to be aborted.

Maybe, I should make it different:
- In any error, return -1.
- Create/use a specific error code in the case of a failing
flush_zero_copy, so I can test the return value for it on the caller
function and return early.

Or alternatively, the other errors could also return early, but since
this will change how the code currently works, I would probably need
another patch for that change. (so it can be easily reverted if
needed)

What do you think is better?


> >  }
> >  }
> > +
> > +/*
> > + * When using zero-copy, it's necessary to flush after each iteration 
> > to
> > + * make sure pages from earlier iterations don't end up replacing newer
> > + * pages.
> > + */
> > +flush_zero_copy = migrate_use_zero_copy_send();
> > +
> >  for (i = 0; i < migrate_multifd_channels(); i++) {
> >  MultiFDSendParams *p = &multifd_send_state->params[i];
> >
> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> >  if (p->quit) {
> >  error_report("%s: channel %d has already quit", __func__, i);
> >  qemu_mutex_unlock(&p->mutex);
> > -return;
> > +return 0;
>
> Same question here.

Please see above,

>
> >  }
>
> The rest looks good.  Thanks,

Thank you!

Best regards,
Leo




Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

2022-02-07 Thread Peter Xu
On Tue, Feb 01, 2022 at 03:29:03AM -0300, Leonardo Bras wrote:
> -void multifd_send_sync_main(QEMUFile *f)
> +int multifd_send_sync_main(QEMUFile *f)
>  {
>  int i;
> +bool flush_zero_copy;
>  
>  if (!migrate_use_multifd()) {
> -return;
> +return 0;
>  }
>  if (multifd_send_state->pages->num) {
>  if (multifd_send_pages(f) < 0) {
>  error_report("%s: multifd_send_pages fail", __func__);
> -return;
> +return 0;

I've not checked how it used to behave if multifd_send_pages() failed, but..
should it return -1 rather than 0 now that there will be a return code?

>  }
>  }
> +
> +/*
> + * When using zero-copy, it's necessary to flush after each iteration to
> + * make sure pages from earlier iterations don't end up replacing newer
> + * pages.
> + */
> +flush_zero_copy = migrate_use_zero_copy_send();
> +
>  for (i = 0; i < migrate_multifd_channels(); i++) {
>  MultiFDSendParams *p = &multifd_send_state->params[i];
>  
> @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
>  if (p->quit) {
>  error_report("%s: channel %d has already quit", __func__, i);
>  qemu_mutex_unlock(&p->mutex);
> -return;
> +return 0;

Same question here.

>  }

The rest looks good.  Thanks,

-- 
Peter Xu