On Fri, Nov 24, 2017 at 01:57:46AM +0800, Fam Zheng wrote:
> Jeff's block job patch made the latent drain bug visible, and I find this
> patch, which by itself also makes some sense, can hide it again. :) With it
> applied we are at least back to the ground where patchew's iotests (make
> docker-test-block@fedora) can pass.
> 

Unfortunately, I am still seeing segfaults and aborts even with this patch,
for instance on tests 097, 141, and 176.

> The real bug is that in the middle of bdrv_parent_drained_end(), bs's parent
> list changes. One drained_end call before the mirror_exit() already did one
> blk_root_drained_end(), a second drained_end on an updated parent node can do
> another same blk_root_drained_end(), making it unbalanced with
> blk_root_drained_begin(). This is shown by the following three backtraces as
> captured by rr with a crashed "qemu-img commit", essentially the same as in
> the failed iotest 020:
> 
> * Backtrace 1, where drain begins:
> 
> (rr) bt
> 
> * Backtrace 2, in the early phase of bdrv_parent_drained_end(), before
>   mirror_exit() happened:
> 
> (rr) bt
> 
> * Backtrace 3, in a later phase of the same bdrv_parent_drained_end(), after
>   mirror_exit() which changed the node graph:
> 
> (rr) bt
> 
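
To make the imbalance described above concrete, here is a minimal standalone
sketch. It is not QEMU code: Parent, Node, parents_drained_begin(),
parents_drained_end() and simulate_drain_cycle() are hypothetical stand-ins,
and the "graph change" is simulated by re-appending the already-processed
parent to the list while the end loop is still iterating. It only assumes
that begin increments a per-parent counter and end decrements it.

```c
#include <assert.h>

#define MAX_PARENTS 4

/* Hypothetical stand-in for a parent, e.g. a BlockBackend root. */
typedef struct Parent {
    int quiesce_counter;            /* +1 on drained_begin, -1 on drained_end */
} Parent;

/* Hypothetical stand-in for a node (bs) with a list of parents. */
typedef struct Node {
    Parent *parents[MAX_PARENTS];
    int n_parents;
} Node;

static void parents_drained_begin(Node *bs)
{
    for (int i = 0; i < bs->n_parents; i++) {
        bs->parents[i]->quiesce_counter++;
    }
}

/*
 * End the drain on every parent. If mutate_after >= 0, simulate a callback
 * that runs right after that index is processed and re-attaches the same
 * parent at the end of the list, i.e. the list changes mid-iteration.
 */
static void parents_drained_end(Node *bs, int mutate_after)
{
    for (int i = 0; i < bs->n_parents; i++) {
        Parent *p = bs->parents[i];

        p->quiesce_counter--;
        if (i == mutate_after) {
            assert(bs->n_parents < MAX_PARENTS);
            bs->parents[bs->n_parents++] = p;   /* list grows while we walk it */
            mutate_after = -1;                  /* mutate only once */
        }
    }
}

/* Returns the parent's counter after one begin/end cycle; 0 means balanced. */
int simulate_drain_cycle(int graph_changes)
{
    Parent blk = { 0 };
    Node bs = { .parents = { &blk }, .n_parents = 1 };

    parents_drained_begin(&bs);
    parents_drained_end(&bs, graph_changes ? 0 : -1);
    return blk.quiesce_counter;
}
```

With a stable list the counter returns to 0. With the simulated change the
same parent is ended twice and the counter goes negative, which is the kind
of unbalanced begin/end the backtraces point at.
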
> IMO we should rethink bdrv_parent_drained_begin/end to avoid such complications
> and maybe in the long term get rid of the nested BDRV_POLL_WHILE() if possible.
> 
> It's late for me so I'm posting the patch anyway in case we could use it for
> -rc3.
> 
> Note this doesn't fix the hanging 056, which I haven't debugged yet.
> 
> Fam
> 
> Fam Zheng (1):
>   block: Don't poll for drain end
> 
>  block/io.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> -- 
> 2.14.3
> 
