Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/1] iotests: fix test case 185
On Tue, Mar 27, 2018 at 11:32:00AM +0800, QingFeng Hao wrote:
> On 2018/3/23 18:04, Stefan Hajnoczi wrote:
> > On Fri, Mar 23, 2018 at 3:43 AM, QingFeng Hao wrote:
> > > Test case 185 failed since commit 4486e89c219 --- "vl: introduce
> > > vm_shutdown()".
> > > It's because of the newly introduced function vm_shutdown calls
> > > bdrv_drain_all, which is called later by bdrv_close_all.
> > > bdrv_drain_all resumes the jobs that doubles the speed and offset
> > > is doubled.
> > > Some jobs' status are changed as well.
> > >
> > > The fix is to not resume the jobs that are already yielded and also
> > > change 185.out accordingly.
> > >
> > > Suggested-by: Stefan Hajnoczi
> > > Signed-off-by: QingFeng Hao
> > > ---
> > > blockjob.c | 10 +-
> > > include/block/blockjob.h | 5 +
> > > tests/qemu-iotests/185.out | 11 +--
> >
> > If drain no longer forces the block job to iterate, shouldn't the test
> > output remain the same? (That means the test is fixed by the QEMU
> > patch.)
> >
> > > 3 files changed, 23 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/blockjob.c b/blockjob.c
> > > index ef3ed69ff1..fa9838ac97 100644
> > > --- a/blockjob.c
> > > +++ b/blockjob.c
> > > @@ -206,11 +206,16 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
> > >
> > >  static void block_job_pause(BlockJob *job)
> > >  {
> > > -    job->pause_count++;
> > > +    if (!job->yielded) {
> > > +        job->pause_count++;
> > > +    }
> >
> > The pause cannot be ignored. This change introduces a bug.
> >
> > Pause is not a synchronous operation that stops the job immediately.
> > Pause just remembers that the job needs to be paused. When the job
> > runs again (e.g. timer callback, fd handler) it eventually reaches
> > block_job_pause_point() where it really pauses.
> >
> > The bug in this patch is:
> >
> > 1. The job has a timer pending.
> > 2. block_job_pause() is called during drain.
> > 3. The timer fires during drain but now the job doesn't know it needs
> >    to pause, so it continues running!
> >
> > Instead what needs to happen is that block_job_pause() remains
> > unmodified but block_job_resume() is extended:
> >
> >  static void block_job_resume(BlockJob *job)
> >  {
> >      assert(job->pause_count > 0);
> >      job->pause_count--;
> >      if (job->pause_count) {
> >          return;
> >      }
> > +    if (job_yielded_before_pause_and_is_still_yielded) {
>
> Thanks a lot for your great comments! But I can't figure out how to check
> this.
>
> >          block_job_enter(job);
> > +    }
> >  }
> >
> > This handles the case I mentioned above, where the yield ends before
> > pause ends (therefore resume must enter the job!).
> >
> > To make this a little clearer, there are two cases to consider:
> >
> > Case 1:
> > 1. Job yields
> > 2. Pause
> > 3. Job is entered from timer/fd callback
>
> How do I know whether the job is entered from ...? Thanks

Sorry, in order to answer your question properly I would have to study
the code and get to the point where I could write the patch myself.

I have sent a patch to update the test output for the upcoming QEMU 2.12
release. At this time in the release cycle it's the most appropriate
solution.

Stefan
Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/1] iotests: fix test case 185
On 2018/3/26 18:29, Kevin Wolf wrote:
> On 23.03.2018 at 04:43, QingFeng Hao wrote:
> > Test case 185 failed since commit 4486e89c219 --- "vl: introduce
> > vm_shutdown()".
> > It's because of the newly introduced function vm_shutdown calls
> > bdrv_drain_all, which is called later by bdrv_close_all.
> > bdrv_drain_all resumes the jobs that doubles the speed and offset
> > is doubled.
> > Some jobs' status are changed as well.
> >
> > The fix is to not resume the jobs that are already yielded and also
> > change 185.out accordingly.
> >
> > Suggested-by: Stefan Hajnoczi
> > Signed-off-by: QingFeng Hao
>
> Stefan already commented on the fix itself, but I want to add two more
> points:
>
> Please change your subject line. "iotests: fix test case 185" means that
> you are fixing the test case, not qemu code that makes the test case
> fail. The subject line should describe the actual change. In all
> likelihood it will start with "blockjob:" rather than "iotests:".

Sure! Thanks for pointing that out.

> > diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> > index fc645dac68..f8f208bbcf 100644
> > --- a/include/block/blockjob.h
> > +++ b/include/block/blockjob.h
> > @@ -99,6 +99,11 @@ typedef struct BlockJob {
> >      bool ready;
> >
> >      /**
> > +     * Set to true when the job is yielded.
> > +     */
> > +    bool yielded;
>
> This is the same as !busy, so we don't need a new field for this.

Mostly yes, but the trick is that busy is set to true in
block_job_do_yield.

> Kevin

--
Regards
QingFeng Hao
Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/1] iotests: fix test case 185
On 2018/3/23 18:04, Stefan Hajnoczi wrote:
> On Fri, Mar 23, 2018 at 3:43 AM, QingFeng Hao wrote:
> > Test case 185 failed since commit 4486e89c219 --- "vl: introduce
> > vm_shutdown()".
> > It's because of the newly introduced function vm_shutdown calls
> > bdrv_drain_all, which is called later by bdrv_close_all.
> > bdrv_drain_all resumes the jobs that doubles the speed and offset
> > is doubled.
> > Some jobs' status are changed as well.
> >
> > The fix is to not resume the jobs that are already yielded and also
> > change 185.out accordingly.
> >
> > Suggested-by: Stefan Hajnoczi
> > Signed-off-by: QingFeng Hao
> > ---
> > blockjob.c | 10 +-
> > include/block/blockjob.h | 5 +
> > tests/qemu-iotests/185.out | 11 +--
>
> If drain no longer forces the block job to iterate, shouldn't the test
> output remain the same? (That means the test is fixed by the QEMU
> patch.)
>
> > 3 files changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/blockjob.c b/blockjob.c
> > index ef3ed69ff1..fa9838ac97 100644
> > --- a/blockjob.c
> > +++ b/blockjob.c
> > @@ -206,11 +206,16 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
> >
> >  static void block_job_pause(BlockJob *job)
> >  {
> > -    job->pause_count++;
> > +    if (!job->yielded) {
> > +        job->pause_count++;
> > +    }
>
> The pause cannot be ignored. This change introduces a bug.
>
> Pause is not a synchronous operation that stops the job immediately.
> Pause just remembers that the job needs to be paused. When the job
> runs again (e.g. timer callback, fd handler) it eventually reaches
> block_job_pause_point() where it really pauses.
>
> The bug in this patch is:
>
> 1. The job has a timer pending.
> 2. block_job_pause() is called during drain.
> 3. The timer fires during drain but now the job doesn't know it needs
>    to pause, so it continues running!
>
> Instead what needs to happen is that block_job_pause() remains
> unmodified but block_job_resume() is extended:
>
>  static void block_job_resume(BlockJob *job)
>  {
>      assert(job->pause_count > 0);
>      job->pause_count--;
>      if (job->pause_count) {
>          return;
>      }
> +    if (job_yielded_before_pause_and_is_still_yielded) {

Thanks a lot for your great comments! But I can't figure out how to check
this.

>          block_job_enter(job);
> +    }
>  }
>
> This handles the case I mentioned above, where the yield ends before
> pause ends (therefore resume must enter the job!).
>
> To make this a little clearer, there are two cases to consider:
>
> Case 1:
> 1. Job yields
> 2. Pause
> 3. Job is entered from timer/fd callback

How do I know whether the job is entered from ...? Thanks

> 4. Resume (enter job? yes)
>
> Case 2:
> 1. Job yields
> 2. Pause
> 3. Resume (enter job? no)
> 4. Job is entered from timer/fd callback
>
> Stefan

--
Regards
QingFeng Hao
Re: [Qemu-block] [Qemu-devel] [PATCH v2 1/1] iotests: fix test case 185
On Fri, Mar 23, 2018 at 3:43 AM, QingFeng Hao wrote:
> Test case 185 failed since commit 4486e89c219 --- "vl: introduce
> vm_shutdown()".
> It's because of the newly introduced function vm_shutdown calls
> bdrv_drain_all, which is called later by bdrv_close_all.
> bdrv_drain_all resumes the jobs that doubles the speed and offset
> is doubled.
> Some jobs' status are changed as well.
>
> The fix is to not resume the jobs that are already yielded and also
> change 185.out accordingly.
>
> Suggested-by: Stefan Hajnoczi
> Signed-off-by: QingFeng Hao
> ---
> blockjob.c | 10 +-
> include/block/blockjob.h | 5 +
> tests/qemu-iotests/185.out | 11 +--

If drain no longer forces the block job to iterate, shouldn't the test
output remain the same? (That means the test is fixed by the QEMU
patch.)

> 3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/blockjob.c b/blockjob.c
> index ef3ed69ff1..fa9838ac97 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -206,11 +206,16 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
>
>  static void block_job_pause(BlockJob *job)
>  {
> -    job->pause_count++;
> +    if (!job->yielded) {
> +        job->pause_count++;
> +    }

The pause cannot be ignored. This change introduces a bug.

Pause is not a synchronous operation that stops the job immediately.
Pause just remembers that the job needs to be paused. When the job
runs again (e.g. timer callback, fd handler) it eventually reaches
block_job_pause_point() where it really pauses.

The bug in this patch is:

1. The job has a timer pending.
2. block_job_pause() is called during drain.
3. The timer fires during drain but now the job doesn't know it needs
   to pause, so it continues running!

Instead what needs to happen is that block_job_pause() remains
unmodified but block_job_resume() is extended:

 static void block_job_resume(BlockJob *job)
 {
     assert(job->pause_count > 0);
     job->pause_count--;
     if (job->pause_count) {
         return;
     }
+    if (job_yielded_before_pause_and_is_still_yielded) {
         block_job_enter(job);
+    }
 }

This handles the case I mentioned above, where the yield ends before
pause ends (therefore resume must enter the job!).

To make this a little clearer, there are two cases to consider:

Case 1:
1. Job yields
2. Pause
3. Job is entered from timer/fd callback
4. Resume (enter job? yes)

Case 2:
1. Job yields
2. Pause
3. Resume (enter job? no)
4. Job is entered from timer/fd callback

Stefan