On Wednesday 10 Sep 2014 at 15:18:22 (+0800), Liu Yuan wrote:
> On Sun, Sep 07, 2014 at 05:12:31PM +0200, Benoît Canet wrote:
> > On Monday 01 Sep 2014 at 15:43:06 (+0800), Liu Yuan wrote:
> > > This patch set mainly adds two pieces of logic to implement device recovery:
> > > - notify the quorum driver of broken states from the child driver(s)
> > > - track dirty data and sync the device after it is repaired
> > >
> > > Thus quorum allows VMs to continue while some child devices are broken.
> > > When the child devices are repaired and come back, we sync the bits
> > > dirtied during the downtime to keep the data consistent.
> > >
> > > The recovery logic is based on the driver state bitmap and syncs the dirty
> > > bits with a timeslice window in a coroutine in this primitive
> > > implementation.
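
As a rough illustration of the time-sliced sync described above, here is a
small Python model (not QEMU code). The names sync_slice, copy_chunk and
slice_seconds are hypothetical, and a plain set stands in for the dirty
bitmap. It is only a sketch of the idea under those assumptions.

```python
import time


def sync_slice(dirty, copy_chunk, slice_seconds, clock=time.monotonic):
    """Copy dirty chunks until the slice expires; return True when none remain.

    dirty         -- set of dirty chunk indexes (stands in for the dirty bitmap)
    copy_chunk    -- hypothetical callback copying one chunk to the repaired child
    slice_seconds -- hypothetical time budget for one sync iteration
    clock         -- injectable time source, so the loop can be tested
    """
    deadline = clock() + slice_seconds
    while dirty:
        chunk = min(dirty)        # lowest dirty chunk first, for determinism
        copy_chunk(chunk)
        dirty.discard(chunk)      # clear the bit only after the copy returned
        if clock() >= deadline:
            break                 # slice used up: yield so guest I/O can run
    return not dirty
```

Each call copies at least one chunk, so the sync always makes progress even
with a very small slice. The caller would invoke it repeatedly until it
returns True.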
> > >
> > > Simple graph of 2 children with threshold=1 and read-pattern=fifo
> > > (similar to DRBD):
> > >
> > > + denotes a device sync iteration
> > > - IO on a single device
> > > = IO on two devices
> > >
> > >                       sync complete, release dirty bitmap
> > >                                        ^
> > >                                        |
> > > ====-----------------++++----++++----++==========
> > >     |                |
> > >     |                v
> > >     |                device repaired and begin to sync
> > >     v
> > >     device broken, create a dirty bitmap
> > >
> > > This sync logic can handle the nested failure case, where devices break
> > > again while a sync is in progress. We simply start a new sync after the
> > > devices are repaired again, and switch them from broken to sound only
> > > when the sync completes.
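
The broken/syncing/sound transitions described above can be sketched as a toy
Python model (not QEMU code). State, Child, Mirror and all method names are
hypothetical. The model assumes one child always stays sound and that writes
arriving during a sync are tracked as dirty rather than mirrored.

```python
from enum import Enum


class State(Enum):
    SOUND = "sound"
    BROKEN = "broken"
    SYNCING = "syncing"


class Child:
    def __init__(self, nchunks):
        self.data = [0] * nchunks
        self.state = State.SOUND
        self.dirty = set()          # stands in for the dirty bitmap


class Mirror:
    """Toy two-child mirror: primary stays sound, secondary may break."""

    def __init__(self, nchunks):
        self.primary = Child(nchunks)
        self.secondary = Child(nchunks)

    def write(self, chunk, value):
        self.primary.data[chunk] = value
        if self.secondary.state is State.SOUND:
            self.secondary.data[chunk] = value
        else:
            self.secondary.dirty.add(chunk)   # remember it for a later sync

    def on_broken(self):
        # Also covers breaking again mid-sync: the dirty set is kept as-is.
        self.secondary.state = State.BROKEN

    def on_repaired(self):
        self.secondary.state = State.SYNCING

    def sync_step(self, max_chunks):
        """Copy up to max_chunks dirty chunks; return True once sound again."""
        sec = self.secondary
        if sec.state is not State.SYNCING:
            return False
        for chunk in sorted(sec.dirty)[:max_chunks]:
            sec.data[chunk] = self.primary.data[chunk]
            sec.dirty.discard(chunk)
        if not sec.dirty:
            sec.state = State.SOUND           # only now leave the broken path
        return sec.state is State.SOUND
```

Because the dirty set survives on_broken(), a failure in the middle of a sync
only delays the switch back to sound. The next repair resumes from whatever
is still dirty.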
> > >
> > > For read-pattern=quorum mode, it enjoys the recovery logic without any problem.
> > >
> > > Todo:
> > > - use the aio interface to sync data (multiple transfers in one go)
> > > - dynamic slice window to control sync bandwidth more smoothly
> > > - add an auto-reconnection mechanism to other protocols (if not supported yet)
> > > - add tests
> > >
> > > Cc: Eric Blake <ebl...@redhat.com>
> > > Cc: Benoit Canet <ben...@irqsave.net>
> > > Cc: Kevin Wolf <kw...@redhat.com>
> > > Cc: Stefan Hajnoczi <stefa...@redhat.com>
> > >
> > > Liu Yuan (8):
> > > block/quorum: initialize qcrs.aiocb for read
> > > block: add driver operation callbacks
> > > block/sheepdog: propagate disconnect/reconnect events to upper driver
> > > block/quorum: add quorum_aio_release() helper
> > > quorum: fix quorum_aio_cancel()
> > > block/quorum: add broken state to BlockDriverState
> > > block: add two helpers
> > > quorum: add basic device recovery logic
> > >
> > >  block.c                   |  17 +++
> > >  block/quorum.c            | 324 +++++++++++++++++++++++++++++++++++++++++-----
> > >  block/sheepdog.c          |   9 ++
> > >  include/block/block.h     |   9 ++
> > >  include/block/block_int.h |   6 +
> > >  trace-events              |   5 +
> > >  6 files changed, 336 insertions(+), 34 deletions(-)
> > >
> > > --
> > > 1.9.1
> > >
> >
> > Hi Liu,
> >
> > Have you noticed that your series conflicts with one of Fam's series in
> > the quorum cancel function fix patch?
>
> Not yet, thanks for the reminder.
I think Fam somehow digested your patch.
>
> > Could you find an arrangement with Fam so the two patches don't collide
> > anymore?
> >
> > Do you intend to respin your series?
>
> Yes, I'll rebase v2 later, before any further reviews.
>
> Thanks
> Yuan
>