On Thu, 19 Sept 2024 at 12:59, Peter Xu <pet...@redhat.com> wrote:
>
> On Thu, Sep 19, 2024 at 10:08:25AM +0100, Peter Maydell wrote:
> > Thanks for looking at the issues with the migration tests.
> > This run went through first time without my needing to retry any
> > jobs, so fingers crossed that we have at least improved the reliability.
> > (I have a feeling there's still something funny with the k8s runners,
> > but that's not migration-test specific, it's just that test tends
> > to be the longest running and so most likely to be affected.)
>
> Kudos all go to Fabiano for debugging the hard problem.
>
> And yes, please let either of us know if it fails again; we can either
> keep looking, or disable it when necessary (if it takes too long to debug).

On the subject of potential races in the migration code,
there are a couple of outstanding Coverity issues that might
be worth looking at. If they're false positives, let me know
and I can reclassify them in Coverity.

CID 1527402: In migrate_fd_cleanup() Coverity thinks there's
a race because we read s->to_dst_file in the "if (s->to_dst_file)"
check without holding the qemu_file_lock. This might be a
false positive, because the race Coverity identifies only happens
if two threads both call migrate_fd_cleanup() at the same
time, which is probably not permitted. (But OTOH taking a
mutex gets you any necessary memory barriers for free...)
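
To illustrate what I mean, here's a made-up standalone sketch of
the pattern (plain C with pthreads, not the actual QEMU code;
the struct and function names are invented). The unlocked outer
check is the bit Coverity flags:

    /* Invented sketch of the CID 1527402 pattern, not QEMU code */
    #include <pthread.h>
    #include <stdlib.h>

    struct state {
        pthread_mutex_t file_lock;
        void *to_dst_file;
    };

    static void cleanup(struct state *s)
    {
        if (s->to_dst_file) {       /* unlocked read: what Coverity flags */
            void *tmp;

            pthread_mutex_lock(&s->file_lock);
            tmp = s->to_dst_file;   /* re-read and clear under the lock */
            s->to_dst_file = NULL;
            pthread_mutex_unlock(&s->file_lock);

            /*
             * free(NULL) is a no-op, so even if another cleanup() had
             * raced us in here and won, this would still be safe.
             */
            free(tmp);
        }
    }

If only one thread can ever be in cleanup() at a time, the unlocked
check is harmless; the lock-protected re-read of tmp is what actually
matters.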

CID 1527413: In postcopy_pause_incoming() we read
mis->postcopy_qemufile_dst without holding the
postcopy_prio_thread_mutex, which we use to protect the write
to that field, so Coverity thinks there's a race if two
threads call this function at once.
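
Sketching that one the same way (again with invented names, not the
real code), what would satisfy Coverity is simply taking the same
mutex around the read that the writer holds, which also gets you the
memory barrier:

    /* Invented sketch for CID 1527413, not QEMU code */
    #include <pthread.h>

    struct incoming {
        pthread_mutex_t prio_mutex;  /* protects qemufile_dst writes */
        void *qemufile_dst;
    };

    static void *read_dst_file(struct incoming *mis)
    {
        void *f;

        pthread_mutex_lock(&mis->prio_mutex);  /* same lock the writer takes */
        f = mis->qemufile_dst;
        pthread_mutex_unlock(&mis->prio_mutex);
        return f;
    }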

(The only other migration Coverity issue is CID 1560071,
which is the "better to use pstrcpy()" not-really-a-bug
we discussed in another thread.)

thanks
-- PMM
