Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

Leonardo Bras Soares Passos Mon, 21 Feb 2022 11:49:13 -0800

On Fri, Feb 18, 2022 at 2:36 PM Juan Quintela <quint...@redhat.com> wrote:
>
> Leonardo Bras Soares Passos <leob...@redhat.com> wrote:
> > Hello Peter, thanks for reviewing!
> >
> > On Mon, Feb 7, 2022 at 11:22 PM Peter Xu <pet...@redhat.com> wrote:
> >>
> >> On Tue, Feb 01, 2022 at 03:29:03AM -0300, Leonardo Bras wrote:
> >> > -void multifd_send_sync_main(QEMUFile *f)
> >> > +int multifd_send_sync_main(QEMUFile *f)
> >> >  {
> >> >      int i;
> >> > +    bool flush_zero_copy;
> >> >
> >> >      if (!migrate_use_multifd()) {
> >> > -        return;
> >> > +        return 0;
> >> >      }
> >> >      if (multifd_send_state->pages->num) {
> >> >          if (multifd_send_pages(f) < 0) {
> >> >              error_report("%s: multifd_send_pages fail", __func__);
> >> > -            return;
> >> > +            return 0;
> >>
> >> I've not checked how it used to do if multifd_send_pages() failed, but.. 
> >> should
> >> it returns -1 rather than 0 when there will be a return code?
> >
> > Yeah, that makes sense.
> > The point here is that I was trying not to modify much of the current 
> > behavior.
>
>     if (qatomic_read(&multifd_send_state->exiting)) {
>         return -1;
>     }
>
>         if (p->quit) {
>             error_report("%s: channel %d has already quit!", __func__, i);
>             qemu_mutex_unlock(&p->mutex);
>             return -1;
>         }
>
> This are the only two cases where the current code can return one
> error.  In the 1st case we are exiting, we are already in the middle of
> finishing, so we don't really care.
> In the second one, we have already quit, and error as already quite big.
>
> But I agree with both comments:
> - we need to improve the error paths
> - leonardo changes don't affect what is already there.
>




>
> > I mean, multifd_send_sync_main() would previously return void, so any
> > other errors would not matter to the caller of this function, which
> > will continue to run as if nothing happened.
> >
> > Now, if it fails with flush_zero_copy, the operation needs to be aborted.
> >
> > Maybe, I should make it different:
> > - In any error, return -1.
> > - Create/use a specific error code in the case of a failing
> > flush_zero_copy, so I can test the return value for it on the caller
> > function and return early.
>
> We need to add the check.  It don't matter if the problem is zero_copy
> or the existing one, we are under a minor catastrophe and migration has
> to be aborted.

Ok, I will fix that so we can abort in case of any error.
Maybe it's better to do that on a separated patch, before 5/5, right?

>
> > Or alternatively, the other errors could also return early, but since
> > this will change how the code currently works, I would probably need
> > another patch for that change. (so it can be easily reverted if
> > needed)
> >
> > What do you think is better?
> >
> >
> >> >          }
> >> >      }
> >> > +
> >> > +    /*
> >> > +     * When using zero-copy, it's necessary to flush after each 
> >> > iteration to
> >> > +     * make sure pages from earlier iterations don't end up replacing 
> >> > newer
> >> > +     * pages.
> >> > +     */
> >> > +    flush_zero_copy = migrate_use_zero_copy_send();
> >> > +
> >> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> >> >          MultiFDSendParams *p = &multifd_send_state->params[i];
> >> >
> >> > @@ -591,7 +600,7 @@ void multifd_send_sync_main(QEMUFile *f)
> >> >          if (p->quit) {
> >> >              error_report("%s: channel %d has already quit", __func__, 
> >> > i);
> >> >              qemu_mutex_unlock(&p->mutex);
> >> > -            return;
> >> > +            return 0;
> >>
> >> Same question here.
> >
> > Please see above,
> >
> >>
> >> >          }
> >>
> >> The rest looks good.  Thanks,
>
> Later, Juan.
>

Thanks for the feedback!

Best regards,
Leo

Re: [PATCH v8 5/5] multifd: Implement zero copy write in multifd migration (multifd-zero-copy)

Reply via email to