On Fri, Mar 23, 2018 at 3:43 AM, QingFeng Hao <ha...@linux.vnet.ibm.com> wrote: > Test case 185 failed since commit 4486e89c219 --- "vl: introduce > vm_shutdown()". > It's because of the newly introduced function vm_shutdown calls > bdrv_drain_all, > which is called later by bdrv_close_all. bdrv_drain_all resumes the jobs > that doubles the speed and offset is doubled. > Some jobs' status are changed as well. > > The fix is to not resume the jobs that are already yielded and also change > 185.out accordingly. > > Suggested-by: Stefan Hajnoczi <stefa...@gmail.com> > Signed-off-by: QingFeng Hao <ha...@linux.vnet.ibm.com> > --- > blockjob.c | 10 +++++++++- > include/block/blockjob.h | 5 +++++ > tests/qemu-iotests/185.out | 11 +++++++++--
If drain no longer forces the block job to iterate, shouldn't the test output remain the same? (The means the test is fixed by the QEMU patch.) > 3 files changed, 23 insertions(+), 3 deletions(-) > > diff --git a/blockjob.c b/blockjob.c > index ef3ed69ff1..fa9838ac97 100644 > --- a/blockjob.c > +++ b/blockjob.c > @@ -206,11 +206,16 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob > *job) > > static void block_job_pause(BlockJob *job) > { > - job->pause_count++; > + if (!job->yielded) { > + job->pause_count++; > + } The pause cannot be ignored. This change introduces a bug. Pause is not a synchronous operation that stops the job immediately. Pause just remembers that the job needs to be paused. When the job runs again (e.g. timer callback, fd handler) it eventually reaches block_job_pause_point() where it really pauses. The bug in this patch is: 1. The job has a timer pending. 2. block_job_pause() is called during drain. 3. The timer fires during drain but now the job doesn't know it needs to pause, so it continues running! Instead what needs to happen is that block_job_pause() remains unmodified but block_job_resume if extended: static void block_job_resume(BlockJob *job) { assert(job->pause_count > 0); job->pause_count--; if (job->pause_count) { return; } + if (job_yielded_before_pause_and_is_still_yielded) { block_job_enter(job); + } } This handles the case I mentioned above, where the yield ends before pause ends (therefore resume must enter the job!). To make this a little clearer, there are two cases to consider: Case 1: 1. Job yields 2. Pause 3. Job is entered from timer/fd callback 4. Resume (enter job? yes) Case 2: 1. Job yields 2. Pause 3. Resume (enter job? no) 4. Job is entered from timer/fd callback Stefan