> -----Original Message-----
> From: Stefan Hajnoczi <stefa...@redhat.com>
> Sent: Thursday, May 13, 2021 10:26 PM
> To: Zhang, Chen <chen.zh...@intel.com>
> Cc: Kevin Wolf <kw...@redhat.com>; Max Reitz <mre...@redhat.com>; Fam
> Zheng <f...@euphon.net>; qemu-dev <qemu-de...@nongnu.org>; qemu-
> block <qemu-block@nongnu.org>; Zhang Chen <zhangc...@gmail.com>;
> Minghao Yuan <me...@qq.com>
> Subject: Re: [RFC PATCH] block/io.c: Flush parent for quorum in generic code
> 
> On Wed, May 12, 2021 at 03:49:57PM +0800, Zhang Chen wrote:
> > Fix the issue from this patch:
> > [PATCH] block: Flush all children in generic code
> > From 883833e29cb800b4d92b5d4736252f4004885191
> >
> > The quorum driver does not have a primary child.
> > This causes guest block flush failures when quorum is used with NBD.
> > The guest's flushes fail, and then the guest filesystem is shut down.
> >
> > Signed-off-by: Zhang Chen <chen.zh...@intel.com>
> > Reported-by: Minghao Yuan <me...@qq.com>
> > ---
> >  block/io.c | 31 ++++++++++++++++++++++---------
> >  1 file changed, 22 insertions(+), 9 deletions(-)
> ...
> > +flush_data:
> > +    if (no_primary_child) {
> > +        /* Flush parent */
> > +        ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
> > +    } else {
> > +        /* Flush children */
> > +        ret = 0;
> > +        QLIST_FOREACH(child, &bs->children, next) {
> > +            if (child->perm & (BLK_PERM_WRITE | BLK_PERM_WRITE_UNCHANGED)) {
> > +                int this_child_ret = bdrv_co_flush(child->bs);
> > +                if (!ret) {
> > +                    ret = this_child_ret;
> > +                }
> >              }
> >          }
> 
> I'm missing something:
> 
> The quorum driver has a valid bs->children list even though it does not have a
> primary child. Why does the QLIST_FOREACH() loop fail for you?
> 

Yes, in most cases QLIST_FOREACH() works for me.
But it does not work when one of the children is disconnected; the original patch
changed the default behavior of the quorum driver when such a failure occurs.

For example:
The VM's quorum driver has two children, local disk1 and NBD disk2.
When the network goes down and NBD disk2 is disconnected, the current code reports
"event": "QUORUM_REPORT_BAD" "type": "flush", "error": "Input/output error"
and
"event": "BLOCK_IO_ERROR" "#block008", "reason": "Input/output error"

The guest then cannot even read/write the healthy local disk1; all VM I/O fails
because of the NBD disk2 input/output error.
I think we do need to report an event saying that we lost a child (NBD disk2) at
that point, but there is no need to take down the whole I/O path, because the
broken child can be replaced with x-blockdev-change and drive_add/drive_del.
Before the original patch (883833e2), the VM could still read/write the local disk1.
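
To make the behavior I am arguing for concrete, here is a minimal standalone
sketch (not QEMU code, and not part of this patch) of the flush policy: flush
every child, report each failed child, and only fail the whole flush when the
number of children that flushed successfully drops below the quorum threshold.
The child list, the simulated per-child results and the threshold value are all
made up for illustration.

#include <stdio.h>
#include <stdbool.h>

#define NUM_CHILDREN     2
#define QUORUM_THRESHOLD 1   /* minimum number of children that must flush OK */

/* Simulated per-child flush results: local disk1 OK, NBD disk2 disconnected */
static bool child_flush_ok[NUM_CHILDREN] = { true, false };

static int quorum_flush(void)
{
    int success = 0;

    for (int i = 0; i < NUM_CHILDREN; i++) {
        if (child_flush_ok[i]) {
            success++;
        } else {
            /* Stands in for emitting QUORUM_REPORT_BAD for this child */
            printf("QUORUM_REPORT_BAD: child %d flush failed\n", i);
        }
    }

    /* Only propagate an error when quorum can no longer be met */
    return success >= QUORUM_THRESHOLD ? 0 : -1;
}

int main(void)
{
    int ret = quorum_flush();
    printf("flush %s (ret=%d)\n", ret == 0 ? "succeeded" : "failed", ret);
    return 0;
}

With disk1 healthy and disk2 disconnected, this model still returns success for
the flush and only emits the per-child report, which is the behavior I would
like to keep for the guest.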

This patch is just an RFC version; please give me more comments on how to fix this issue.
 
Thanks
Chen


> Does this patch effectively skip bdrv_co_flush() calls on all quorum children?
> That seems wrong since children need to be flushed so that data is persisted.
> 

Yes.

> Stefan
