On Tue, Mar 27, 2018 at 11:32:00AM +0800, QingFeng Hao wrote:
> 
> 在 2018/3/23 18:04, Stefan Hajnoczi 写道:
> > On Fri, Mar 23, 2018 at 3:43 AM, QingFeng Hao <ha...@linux.vnet.ibm.com> 
> > wrote:
> > > Test case 185 failed since commit 4486e89c219 --- "vl: introduce 
> > > vm_shutdown()".
> > > It's because of the newly introduced function vm_shutdown calls 
> > > bdrv_drain_all,
> > > which is called later by bdrv_close_all. bdrv_drain_all resumes the jobs
> > > that doubles the speed and offset is doubled.
> > > Some jobs' status are changed as well.
> > > 
> > > The fix is to not resume the jobs that are already yielded and also change
> > > 185.out accordingly.
> > > 
> > > Suggested-by: Stefan Hajnoczi <stefa...@gmail.com>
> > > Signed-off-by: QingFeng Hao <ha...@linux.vnet.ibm.com>
> > > ---
> > >   blockjob.c                 | 10 +++++++++-
> > >   include/block/blockjob.h   |  5 +++++
> > >   tests/qemu-iotests/185.out | 11 +++++++++--
> > 
> > If drain no longer forces the block job to iterate, shouldn't the test
> > output remain the same?  (The means the test is fixed by the QEMU
> > patch.)
> > 
> > >   3 files changed, 23 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/blockjob.c b/blockjob.c
> > > index ef3ed69ff1..fa9838ac97 100644
> > > --- a/blockjob.c
> > > +++ b/blockjob.c
> > > @@ -206,11 +206,16 @@ void block_job_txn_add_job(BlockJobTxn *txn, 
> > > BlockJob *job)
> > > 
> > >   static void block_job_pause(BlockJob *job)
> > >   {
> > > -    job->pause_count++;
> > > +    if (!job->yielded) {
> > > +        job->pause_count++;
> > > +    }
> > 
> > The pause cannot be ignored.  This change introduces a bug.
> > 
> > Pause is not a synchronous operation that stops the job immediately.
> > Pause just remembers that the job needs to be paused.   When the job
> > runs again (e.g. timer callback, fd handler) it eventually reaches
> > block_job_pause_point() where it really pauses.
> > 
> > The bug in this patch is:
> > 
> > 1. The job has a timer pending.
> > 2. block_job_pause() is called during drain.
> > 3. The timer fires during drain but now the job doesn't know it needs
> > to pause, so it continues running!
> > 
> > Instead what needs to happen is that block_job_pause() remains
> > unmodified but block_job_resume if extended:
> > 
> > static void block_job_resume(BlockJob *job)
> > {
> >      assert(job->pause_count > 0);
> >      job->pause_count--;
> >      if (job->pause_count) {
> >          return;
> >      }
> > +    if (job_yielded_before_pause_and_is_still_yielded) {
> Thanks a lot for your great comments! But I can't figure out how to check
> this.
> >      block_job_enter(job);
> > +    }
> > }
> > 
> > This handles the case I mentioned above, where the yield ends before
> > pause ends (therefore resume must enter the job!).
> > 
> > To make this a little clearer, there are two cases to consider:
> > 
> > Case 1:
> > 1. Job yields
> > 2. Pause
> > 3. Job is entered from timer/fd callback
> How do I know that if job is entered from ...? thanks

Sorry, in order to answer your question properly I would have to study
the code and get the point where I could write the patch myself.

I have sent a patch to update the test output for the upcoming QEMU 2.12
release.  At this time in the release cycle it's the most appropriate
solution.

Stefan

Attachment: signature.asc
Description: PGP signature

Reply via email to