Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)

Alberto Garcia Thu, 16 Nov 2017 07:56:04 -0800

On Wed 15 Nov 2017 05:31:20 PM CET, Anton Nefedov wrote:
>> I have the impression that one major source of headaches is the fact
>> that the reopen queue contains nodes that don't need to be reopened at
>> all. Ideally this should be detected early on in bdrv_reopen_queue(), so
>> there's no chance that the queue contains nodes used by a different
>> block job. If we had that then op blockers should be enough to prevent
>> these things. Or am I missing something?
>> 
> After applying Max's patch I tried the similar approach; that is keep
> BDSes referenced while they are in the reopen queue.  Now I get the
> stream job hanging. Somehow one blk_root_drained_begin() is not paired
> with blk_root_drained_end(). So the job stays paused.


I can see this if I apply Max's patch and keep refs to BDSs in the
reopen queue:

#0  block_job_pause (...) at blockjob.c:130
#1  0x000055c143cb586d in block_job_drained_begin (...) at blockjob.c:227
#2  0x000055c143d08067 in blk_set_dev_ops (...) at block/block-backend.c:887
#3  0x000055c143cb69db in block_job_create (...) at blockjob.c:678
#4  0x000055c143d17c0c in mirror_start_job (...) at block/mirror.c:1177

There's a ops->drained_begin(opaque) call in blk_set_dev_ops() that
doesn't seem to be paired. And when you call block_job_start() then it
yields immediately waiting for the resume (that never arrives).

John, this change was yours (f4d9cc88ee69a5b04). Any idea?

Berto

Re: [Qemu-block] segfault in parallel blockjobs (iotest 30)

Reply via email to