Peter Jeremy wrote this message on Sat, Feb 05, 2022 at 20:50 +1100:
> On 2022-Feb-02 11:49:44 +0200, Andriy Gapon <a...@freebsd.org> wrote:
> >On 02/02/2022 11:14, Warner Losh wrote:
> >> On Wed, Feb 2, 2022 at 2:05 AM Andriy Gapon <a...@freebsd.org
> >> <mailto:a...@freebsd.org>> wrote:
> >>     Hmm... it looks like both the old and new (Open)ZFS use BIO_FLUSH command
> >>     without BIO_ORDERED flag.  Not sure if it happens to do the right thing anyway
> >>     or not.
> >>
> >>
> >> It's an unordered flush then. The flush will happen whenever. I have a vague
> >> memory that ZFS will only issue this command in cases where there's no other I/O
> >> pending.
> >
> >I think that there is still a potential problem that an earlier write request
> >might get re-ordered after the flush.
> >I think that we should add BIO_ORDERED for correctness.
>
> I've raised https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261731 to
> make geom_gate support BIO_ORDERED.  Exposing the BIO_ORDERED flag to
> userland is quite easy (once a decision is made as to how to do that).
> Enhancing the geom_gate clients to correctly implement BIO_ORDERED is
> somewhat harder.

The clients are single-threaded wrt IOs, so I don't think updating them
is required.  I do have patches that make ggated multithreaded to
improve IOPS, so making this improvement would allow those patches to
be useful.

I do have a question, though: what are the exact semantics of _ORDERED?

Do all the previous IOs have to be ack'd/received by the kernel before
the _ORDERED bio is executed, OR, once ggated (for example) has received
notification that the writes before an _ORDERED bio have completed, can
it then execute the _ORDERED command w/o the other side having received
that notification?

The reason I ask is that if the connection is broken before the kernel
acks the pre-_ORDERED bios, but after the _ORDERED bio has been written,
what are the implications?  I can think of an issue where the
pre-_ORDERED and _ORDERED bios overlap that might cause problems.

Here is the scenario that I'm thinking of:

    _WRITE 16 sectors at offset 0
    _WRITE _ORDERED 16 sectors at offset 8
    connection is now broken
    ggate reconnects
    kernel reissues both IOs
    _WRITE 16 sectors at offset 0
    kernel crashes before the second _WRITE happens and needs to read
    the data

We now have a situation where sectors 16-24 have "new" data, while
sectors 8-16 have "old" data on them, which may corrupt what a FS
thinks is on disk.

And right now, the ggate protocol (from what I remember) doesn't have
a way to know when the remote kernel has received notification that an
IO is complete.

I guess this situation isn't any worse than it is right now w/o passing
the _ORDERED flag down, though.

> I've done some experiments and OpenZFS doesn't generate BIO_ORDERED
> operations so I've also raised https://github.com/openzfs/zfs/issues/13065
> I haven't looked into how difficult that would be to fix.

-- 
  John-Mark Gurney				Voice: +1 415 225 5579

     "All that I will do, has been done, All that I have, has not."
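
For illustration only, the overlapping-write scenario from the message can be replayed with a tiny in-memory model. This is a sketch under stated assumptions, not ggate code: the "disk" is a Python list, offsets are taken to be in sectors (as the sector ranges in the message suggest), the tags "A" and "B" are hypothetical stand-ins for the data of the first and the _ORDERED write, and it is assumed that the remote side applied both writes before the connection broke.

```python
SECTORS = 32  # hypothetical disk size, just large enough for the example


def write(disk, offset, count, tag):
    """Overwrite `count` sectors starting at sector `offset` with `tag`."""
    for sector in range(offset, offset + count):
        disk[sector] = tag


def replay_scenario():
    disk = ["-"] * SECTORS

    # First pass: the remote side applies both writes, but the
    # completions never reach the local kernel (assumption).
    write(disk, 0, 16, "A")  # _WRITE, sectors 0..15
    write(disk, 8, 16, "B")  # _WRITE _ORDERED, sectors 8..23

    # Connection breaks, ggate reconnects, the kernel reissues both IOs.
    write(disk, 0, 16, "A")  # the first _WRITE is replayed

    # The kernel crashes here, before the _ORDERED write is replayed.
    return disk


disk = replay_scenario()
print("".join(disk))
```

In this model sectors 8-15 end up holding the replayed first write while sectors 16-23 still hold the _ORDERED write, which is the mixed "old"/"new" state the message describes.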