On Fri, Nov 24, 2017 at 01:57:46AM +0800, Fam Zheng wrote:
> Jeff's block job patch made the latent drain bug visible, and I find this
> patch, which by itself also makes some sense, can hide it again. :) With it
> applied we are at least back to the ground where patchew's iotests (make
> docker-test-block@fedora) can pass.
>

Unfortunately, I am still seeing segfaults and aborts even with this patch,
for instance on iotests 097, 141, and 176.

> The real bug is that in the middle of bdrv_parent_drained_end(), bs's parent
> list changes. One drained_end call before the mirror_exit() already did one
> blk_root_drained_end(), a second drained_end on an updated parent node can do
> another same blk_root_drained_end(), making it unbalanced with
> blk_root_drained_begin(). This is shown by the following three backtraces as
> captured by rr with a crashed "qemu-img commit", essentially the same as in
> the failed iotest 020:
>
> * Backtrace 1, where drain begins:
>
> (rr) bt
>
> * Backtrace 2, in the early phase of bdrv_parent_drained_end(), before
>   mirror_exit happened:
>
> (rr) bt
>
> * Backtrace 3, in a later phase of the same bdrv_parent_drained_end(), after
>   mirror_exit() which changed the node graph:
>
> (rr) bt
>
> IMO we should rethink bdrv_parent_drained_begin/end to avoid such
> complications and maybe in the long term get rid of the nested
> BDRV_POLL_WHILE() if possible.
>
> It's late for me so I'm posting the patch anyway in case we could use it for
> -rc3.
>
> Note this doesn't fix the hanging 056, which I haven't debugged yet.
>
> Fam
>
> Fam Zheng (1):
>   block: Don't poll for drain end
>
>  block/io.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> --
> 2.14.3
>
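
To make the begin/end imbalance described in the quoted text concrete, here is
a toy C model. It is not QEMU code: Parent, Node, quiesce_counter,
move_first_to_tail() and run_demo() are hypothetical names made up for this
sketch, and the hook only stands in for a graph change (such as the one
mirror_exit() makes) happening while the parents are being walked. The
snapshot variant is an assumption about one way to keep the calls balanced;
it is not what the posted patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARENTS 4

/* Hypothetical stand-ins; these are not QEMU's data structures. */
typedef struct Parent {
    int quiesce_counter;              /* +1 on drained_begin, -1 on drained_end */
} Parent;

typedef struct Node {
    Parent *parents[MAX_PARENTS];
    size_t n_parents;
} Node;

typedef void (*GraphChange)(Node *n);

static void drained_begin(Node *n)
{
    for (size_t i = 0; i < n->n_parents; i++) {
        n->parents[i]->quiesce_counter++;
    }
}

/* Walks the live parent list; the hook runs in the middle of the walk. */
static void drained_end_naive(Node *n, GraphChange hook)
{
    for (size_t i = 0; i < n->n_parents; i++) {
        n->parents[i]->quiesce_counter--;
        if (i == 0 && hook) {
            hook(n);                  /* parent list may change here */
        }
    }
}

/* Walks a copy taken up front, so each original parent is ended once. */
static void drained_end_snapshot(Node *n, GraphChange hook)
{
    Parent *snap[MAX_PARENTS];
    size_t count = n->n_parents;

    memcpy(snap, n->parents, count * sizeof(snap[0]));
    for (size_t i = 0; i < count; i++) {
        snap[i]->quiesce_counter--;
        if (i == 0 && hook) {
            hook(n);
        }
    }
}

/* Hypothetical graph change: the parent already ended moves to the tail. */
static void move_first_to_tail(Node *n)
{
    Parent *first = n->parents[0];

    for (size_t i = 1; i < n->n_parents; i++) {
        n->parents[i - 1] = n->parents[i];
    }
    n->parents[n->n_parents - 1] = first;
}

/* Runs one begin/end cycle on two parents and reports their final counters. */
static void run_demo(int use_snapshot, int *a_out, int *b_out)
{
    Parent a = { 0 }, b = { 0 };
    Node n = { { &a, &b }, 2 };

    drained_begin(&n);
    if (use_snapshot) {
        drained_end_snapshot(&n, move_first_to_tail);
    } else {
        drained_end_naive(&n, move_first_to_tail);
    }
    *a_out = a.quiesce_counter;
    *b_out = b.quiesce_counter;
}
```

In the naive walk the first parent is ended twice and the second is never
ended, so the counters no longer match the single begin each parent saw. The
snapshot walk ends each parent exactly once even though the list is reordered
midway.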