Peter Jeremy wrote this message on Sat, Feb 05, 2022 at 20:50 +1100:
> On 2022-Feb-02 11:49:44 +0200, Andriy Gapon <a...@freebsd.org> wrote:
> >On 02/02/2022 11:14, Warner Losh wrote:
> >> On Wed, Feb 2, 2022 at 2:05 AM Andriy Gapon <a...@freebsd.org> wrote:
> >>     Hmm... it looks like both the old and new (Open)ZFS use BIO_FLUSH command
> >>     without BIO_ORDERED flag.  Not sure if it happens to do the right thing
> >>     anyway or not.
> >> 
> >> 
> >> It's an unordered flush then. The flush will happen whenever. I have a vague
> >> memory that ZFS will only issue this command in cases where there's no other
> >> I/O pending.
> >
> >I think that there is still a potential problem that an earlier write request
> >might get re-ordered after the flush.
> >I think that we should add BIO_ORDERED for correctness.
> 
> I've raised https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261731 to
> make geom_gate support BIO_ORDERED.  Exposing the BIO_ORDERED flag to
> userland is quite easy (once a decision is made as to how to do that).
> Enhancing the geom_gate clients to correctly implement BIO_ORDERED is
> somewhat harder.

The clients are single threaded wrt IOs, so I don't think updating them
is required.

I do have patches that make ggated multithreaded to improve IOPS, so
making this improvement would allow those patches to be useful.

I do have a question though: what are the exact semantics of _ORDERED?

Do all the previous IOs have to be ack'd/received by the kernel before
the _ORDERED bio is executed, OR, once ggated (for example) has received
notification that the writes before an _ORDERED bio have completed, can
it then execute the _ORDERED command w/o the other side having received
that notification?
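
To make the question concrete, here is a minimal sketch of one possible
reading, assuming _ORDERED acts as a full barrier on the server side. That
barrier reading is an assumption, not something confirmed in this thread,
and plan_batches() is a hypothetical helper, not part of ggated. It only
covers ordering inside the server and says nothing about whether the
kernel has seen the earlier completions.

```python
def plan_batches(requests):
    """Group requests into batches that may run concurrently.

    `requests` is a list of (name, ordered) tuples in submission order.
    Assumption: an ordered request is a full barrier, so it runs alone,
    after everything submitted before it and before everything after it.
    """
    batches = []
    current = []
    for name, ordered in requests:
        if ordered:
            # Everything queued so far must finish before the barrier.
            if current:
                batches.append(current)
                current = []
            # The ordered request runs in a batch of its own.
            batches.append([name])
        else:
            current.append(name)
    if current:
        batches.append(current)
    return batches


example = plan_batches([
    ("write-a", False),
    ("write-b", False),
    ("flush", True),
    ("write-c", False),
])
print(example)
```

With this reading, a multithreaded server could run every request inside
one batch in parallel and only move on to the next batch once all of them
have completed locally.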

The reason I ask is this: if the connection is broken before the kernel
ack's the pre-_ORDERED bios, but after the _ORDERED bio has been written,
what are the implications?

I can think of an issue where the pre-_ORDERED bio and the _ORDERED bio
overlap, which might cause a problem.  Here is the scenario that I'm
thinking of.

_WRITE 16 sectors at offset 0
_WRITE _ORDERED 16 sectors at offset 8
connection is now broken
ggate reconnects
kernel reissues both IOs.
_WRITE 16 sectors at offset 0
kernel crashes before the second _WRITE happens and needs to read the
data.

We now have a situation where sectors 16-24 have "new" data, while
sectors 8-16 have "old" data on them, which may corrupt the FS's view
of what is on disk.
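
The scenario above can be replayed with a small simulation. This is only
a sketch: it treats offsets and lengths as sector counts, models the disk
as a 24-entry list, and tags data from the first write as "A" and data
from the _ORDERED write as "B". The function names are made up for
illustration.

```python
SECTORS = 24  # enough to cover both writes in the scenario


def apply_write(disk, offset, count, tag):
    """Overwrite `count` sectors starting at `offset` with `tag`."""
    for sector in range(offset, offset + count):
        disk[sector] = tag


def replay_scenario():
    disk = ["-"] * SECTORS
    apply_write(disk, 0, 16, "A")  # _WRITE 16 sectors at offset 0
    apply_write(disk, 8, 16, "B")  # _WRITE _ORDERED 16 sectors at offset 8
    # Connection breaks before the acks reach the kernel; ggate
    # reconnects and the kernel reissues both IOs.
    apply_write(disk, 0, 16, "A")  # first _WRITE is replayed
    # Kernel crashes here, so the second _WRITE is never replayed.
    return "".join(disk)


print(replay_scenario())  # prints AAAAAAAAAAAAAAAABBBBBBBB
```

Reading the ranges as half-open, sectors 8-16 end up holding the first
write's data again while sectors 16-24 keep the _ORDERED write's data,
which is the mixed state described above.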

And right now, the ggate protocol (from what I remember) doesn't have
a way to know when the remote kernel has received notification that an
IO is complete.

I guess this situation isn't any worse than it is right now w/o passing
the _ORDERED flag down though.

> I've done some experiments and OpenZFS doesn't generate BIO_ORDERED
> operations so I've also raised https://github.com/openzfs/zfs/issues/13065
> I haven't looked into how difficult that would be to fix.

-- 
  John-Mark Gurney                              Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
