On Wed, Aug 02, 2023 at 04:58:38PM -0300, Fabiano Rosas wrote:
> Peter Xu <pet...@redhat.com> writes:
> 
> > On Wed, Aug 02, 2023 at 11:36:43AM -0300, Fabiano Rosas wrote:
> >> This function currently has a straight-forward part which is waiting
> >> for the thread to join and a complicated part which is doing a
> >> qemu_file_shutdown() on the return path file.
> >> 
> >> The shutdown is tricky because all calls to qemu_file_shutdown() set
> >> f->last_error to -EIO, which means we can never know if an error is an
> >> actual error or if we cleanly shutdown the file previously.
> >> 
> >> This is particularly bothersome for postcopy because it would send the
> >> return path thread into the retry routine which would wait on the
> >> postcopy_pause_rp_sem and consequently block the main thread. We
> >> haven't had reports of this so I must presume we never reach here with
> >> postcopy.
> >> 
> >> The shutdown call is also racy because since it doesn't take the
> >> qemu_file_lock, it could NULL-dereference if the return path thread
> >> happens to be in the middle of the critical region at
> >> migration_release_dst_files().
> >
> > After you rework the thread model on resume, shall we move
> > migration_release_dst_files() into the migration thread to be after the
> > pthread_join()?  I assume then we don't even need a mutex to protect it?
> >
> 
> I just need to figure out if it's ok to move the postcopy_qemufile_src
> cleanup along. No idea why it is there in the first place. I see you
> moved it from postcopy_pause and we're about to move it back to the
> exact same place =D

It was there because the old postcopy-preempt was sending data via
postcopy_qemufile_src from the migration thread, while postcopy_pause is
also the migration thread context.

Then we had 9358982744 ("migration: Send requested page directly in
rp-return thread") where we moved that "send page" operation into the
return path thread to reduce latencies.  After moving there it also means
the file handle can be accessed in >1 threads, so I just moved it over to
operate that always in the return path thread, then no race should happen.

With your change, return path will vanish before migration thread accesses
it later (so as mentioned above, it must be after pthread_join()
succeeded), then I assume it'll be fine too to have it back in migration
thread.

Or perhaps just take the file lock?

> 
> >> 
> >> Move this more complicated part of the code to a separate routine so
> >> we can wait on the thread without all of this baggage.
> >
> > I think you mentioned "some nuance" on having mark_source_rp_bad() in
> > await_return_path_close_on_source(), I did remember I tried to look into
> > that "nuance" too a long time ago but I just forgot what was that.  Great
> > if you can share some details.
> >
> 
> Well, mark_source_rp_bad() at await_return_path_close_on_source() is
> basically useless:
> 
> - We only call mark_source_rp_bad() if s->to_dst_file has an error and the
>   migration_completion() already checks that condition and fails the
>   migration anyway.
> 
> - If to_dst_file has an error, chances are the destination already did
>   cleanup by this point, so from_dst_file would already have an errno,
>   due to that. At qemu_fill_buffer(), the len == 0 case basically means
>   "the other end finished cleanly". We still set -EIO in that case, I
>   don't know why. Possibly because not all backends will have the same
>   semantics for len == 0.

I don't know either, but I checked it's from a555b8092a ("qemu-file: Don't
do IO after shutdown").  Maybe there's better way to do so we could
identify the difference, but yes can be for later.

> 
> - If the above doesn't happen, then due to the shutdown, from_dst_file
>   will already have an error again set by qemu_file_shutdown().
> 
> Not to mention that mark_source_rp_bad() is in a race with the return
> path thread which could clear the error during postcopy retry.
> 
> As this patch tries to convey, this whole shutdown routine is weird. We
> don't have any documentation explaining when it could happen, so we're
> left with wanting to call it always. Except that doesn't work because in
> postcopy we'd trigger the retry logic and that hangs, and because of the
> "shutdown means -EIO" issue we'd be eating up whatever error happened
> before (it's all -EIO, so there's no way to tell them apart).
> 
> Given all of that, I thought just moving this aside would be better for
> the time being than to try to rationalise all of this. This series fixes
> a reproducible bug while everything I said above is just code inspection
> and some artificial testing of mine.

Yeah we can leave that for later.  Actually my other series removed that (I
totally forgot it, until I just noticed and checked).  But that'll conflict
with yours.  I think this series should in earlier if it fixes a possible
race, so I'll just rebase when it lands.

Thanks,

-- 
Peter Xu


Reply via email to