> -----Original Message----- > From: Stefan Hajnoczi <stefa...@redhat.com> > Sent: Thursday, May 13, 2021 10:26 PM > To: Zhang, Chen <chen.zh...@intel.com> > Cc: Kevin Wolf <kw...@redhat.com>; Max Reitz <mre...@redhat.com>; Fam > Zheng <f...@euphon.net>; qemu-dev <qemu-de...@nongnu.org>; qemu- > block <qemu-block@nongnu.org>; Zhang Chen <zhangc...@gmail.com>; > Minghao Yuan <me...@qq.com> > Subject: Re: [RFC PATCH] block/io.c: Flush parent for quorum in generic code > > On Wed, May 12, 2021 at 03:49:57PM +0800, Zhang Chen wrote: > > Fix the issue from this patch: > > [PATCH] block: Flush all children in generic code From > > 883833e29cb800b4d92b5d4736252f4004885191 > > > > Quorum driver do not have the primary child. > > It will caused guest block flush issue when use quorum and NBD. > > The vm guest flushes failed,and then guest filesystem is shutdown. > > > > Signed-off-by: Zhang Chen <chen.zh...@intel.com> > > Reported-by: Minghao Yuan <me...@qq.com> > > --- > > block/io.c | 31 ++++++++++++++++++++++--------- > > 1 file changed, 22 insertions(+), 9 deletions(-) > ... > > +flush_data: > > + if (no_primary_child) { > > + /* Flush parent */ > > + ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0; > > + } else { > > + /* Flush childrens */ > > + ret = 0; > > + QLIST_FOREACH(child, &bs->children, next) { > > + if (child->perm & (BLK_PERM_WRITE | > BLK_PERM_WRITE_UNCHANGED)) { > > + int this_child_ret = bdrv_co_flush(child->bs); > > + if (!ret) { > > + ret = this_child_ret; > > + } > > } > > } > > I'm missing something: > > The quorum driver has a valid bs->children list even though it does not have a > primary child. Why does QLIST_FOREACH() loop fail for you? >
Yes, in most cases QLIST_FOREACH() works for me. But not work when one of the child disconnected, the original patch changed the default behavior of quorum driver when occurs issue. For example: VM quorum driver have two children, local disk1 and NBD disk2. When network down and NBD disk2 disconnected, current code will report "event": "QUORUM_REPORT_BAD" "type": "flush", "error": "Input/output error" And "event": "BLOCK_IO_ERROR" "#block008", "reason": "Input/output error" The guest even cannot read/write the normal local disk1. VM IO will crashed caused by NDB disk2 input/output error. I think we do need report the event about we lose a child(NBD disk2) at this time, but no need crash all IO system. Because we can fix it by x-blockdev-change and drive_add/drive_del for new children. Before the original patch(883833e2), VM still can read/write the local disk1. This patch just the RFC version, please give me more comments to fix this issue. Thanks Chen > Does this patch effectively skip bdrv_co_flush() calls on all quorum children? > That seems wrong since children need to be flushed so that data is persisted. > Yes, > Stefan