On 05/16/2011 07:30 PM, Eric Blake wrote:
> +        while (priv->jobSignals&  QEMU_JOB_SIGNAL_BLKSTAT)
> +            ignore_value(virCondWait(&priv->signalCond,&vm->lock));
> +
> +        priv->jobSignalsData.statDevName = disk->info.alias;
> +        priv->jobSignalsData.blockStat = stats;
> +        priv->jobSignalsData.statRetCode = -1;
> +        priv->jobSignals |= QEMU_JOB_SIGNAL_BLKSTAT;
> +
> +        while (priv->jobSignals&  QEMU_JOB_SIGNAL_BLKSTAT)
> +            ignore_value(virCondWait(&priv->signalCond,&vm->lock));

I'm not sure that the original problem with jobSignals is fixed (the 
one that you solved by splitting the retcode field).  A similar race 
can happen with two threads, both requesting blkstat.

Thread 1 starts a migration.
Thread 2 queries BlockStats, waits for the condition variable and
    priv->jobSignals to be 0.
Thread 2 sets priv->jobSignals to non-zero, then waits for condition
    variable and priv->jobSignals to clear BLKSTATS.
Thread 1 processes thread 2's query, and clears priv->jobSignals
Thread 3 sees priv->jobSignals == 0 and queries BlockStats.
Thread 1 processes thread 3's query, and clears priv->jobSignals
Thread 2 gets condition lock, but sees the answer intended for
thread 3.

At this point I'm not even sure if there is a deadlock or what.

I think the code should use two condition variables.  signalCond is 
broadcast when you _can start_ a job, resultCond is broadcast when a 
job has finished.

       while ((priv->jobSignals | priv->jobResults) & QEMU_JOB_SIGNAL_BLKSTAT)
           ignore_value(virCondWait(&priv->signalCond, &vm->lock));

       priv->jobSignalsData.statDevName = disk->info.alias;
       priv->jobSignalsData.blockStat = stats;
       priv->jobSignalsData.statRetCode = -1;
       priv->jobSignals |= QEMU_JOB_SIGNAL_BLKSTAT;

       while ((priv->jobResults & QEMU_JOB_SIGNAL_BLKSTAT) == 0)
            ignore_value(virCondWait(&priv->resultCond, &vm->lock));

       ret = priv->jobSignalsData.statRetCode;
       priv->jobResults ^= QEMU_JOB_SIGNAL_BLKSTAT;
       virCondBroadcast(&priv->signalCond);

...

       priv->jobSignalsData.statRetCode = ret;
       priv->jobResults |= QEMU_JOB_SIGNAL_BLKSTAT;
       priv->jobSignals ^= QEMU_JOB_SIGNAL_BLKSTAT;
       virCondBroadcast(&priv->resultCond);

...

       while (priv->jobSignals & ~priv->jobResults) {
           if (qemuMigrationProcessJobSignals(driver, vm, job, false) < 0)
               goto cleanup;
       }

...

 cleanup:
     while (priv->jobSignals & ~priv->jobResults) {
         qemuMigrationProcessJobSignals(driver, vm, job, false);
     }

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Reply via email to