Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer
Am 06.07.2016 um 04:04 hat Eric Blake geschrieben: > On 06/21/2016 11:54 PM, Fam Zheng wrote: > > On Mon, 06/20 17:39, Eric Blake wrote: > >> We have max_transfer documented in BlockLimits, but while we > >> honor it during pwrite_zeroes, we were blindly ignoring it > >> during pwritev and preadv, leading to multiple drivers having > >> to implement fragmentation themselves. This series moves > >> fragmentation to the block layer, then fixes the NBD driver to > >> use it; if you like this but it needs a v2, you can request that > >> I further do other drivers (I know at least iscsi and qcow2 do > >> some self-fragmenting and/or error reporting that can be > >> simplified by deferring fragmentation to the block layer). > >> > >> Prequisite: Kevin's block branch, plus my work on byte-based > >> block limits (v2 at the moment): > >> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html > >> > >> Also available as a tag at: > >> git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1 > > > > Patches 1-6: > > > > Reviewed-by: Fam Zheng > > ping - series still applies to latest master without tweaks Apart from the build fix and the minor comments I made, this looks good to me. Kevin pgpUhpjnXMZwa.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer
On 06/21/2016 11:54 PM, Fam Zheng wrote: > On Mon, 06/20 17:39, Eric Blake wrote: >> We have max_transfer documented in BlockLimits, but while we >> honor it during pwrite_zeroes, we were blindly ignoring it >> during pwritev and preadv, leading to multiple drivers having >> to implement fragmentation themselves. This series moves >> fragmentation to the block layer, then fixes the NBD driver to >> use it; if you like this but it needs a v2, you can request that >> I further do other drivers (I know at least iscsi and qcow2 do >> some self-fragmenting and/or error reporting that can be >> simplified by deferring fragmentation to the block layer). >> >> Prequisite: Kevin's block branch, plus my work on byte-based >> block limits (v2 at the moment): >> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html >> >> Also available as a tag at: >> git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1 > > Patches 1-6: > > Reviewed-by: Fam Zheng ping - series still applies to latest master without tweaks -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer
On Tue, Jun 21, 2016 at 04:05:22PM -0600, Eric Blake wrote: > On 06/21/2016 04:23 AM, Stefan Hajnoczi wrote: > > On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote: > >> We have max_transfer documented in BlockLimits, but while we > >> honor it during pwrite_zeroes, we were blindly ignoring it > >> during pwritev and preadv, leading to multiple drivers having > >> to implement fragmentation themselves. This series moves > >> fragmentation to the block layer, then fixes the NBD driver to > >> use it; if you like this but it needs a v2, you can request that > >> I further do other drivers (I know at least iscsi and qcow2 do > >> some self-fragmenting and/or error reporting that can be > >> simplified by deferring fragmentation to the block layer). > >> > > > I'm concerned that requests A & B which should be atomic can now be > > interleaved. > > > > For example, two writes that are overlapping and fragmented. > > Applications expect to either see A or B on disk when both requests have > > completed. Fragmentation must serialize overlapping requests in order > > to prevent interleaved results where the application sees some of A and > > some of B when both requests have completed. > > > > A similar scenario happens when A is a read and B is a write, too. Read > > A is supposed to see either "before B" or "after B". With fragmentation > > it can see "some of before B and some of after B". > > This patch series doesn't change that, it just changes where the > atomicity is broken. I know but I needed to clarify what the right semantics are :). Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer
On Tue, Jun 21, 2016 at 12:43:08PM +0200, Kevin Wolf wrote: > Am 21.06.2016 um 12:23 hat Stefan Hajnoczi geschrieben: > > On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote: > > > We have max_transfer documented in BlockLimits, but while we > > > honor it during pwrite_zeroes, we were blindly ignoring it > > > during pwritev and preadv, leading to multiple drivers having > > > to implement fragmentation themselves. This series moves > > > fragmentation to the block layer, then fixes the NBD driver to > > > use it; if you like this but it needs a v2, you can request that > > > I further do other drivers (I know at least iscsi and qcow2 do > > > some self-fragmenting and/or error reporting that can be > > > simplified by deferring fragmentation to the block layer). > > > > I'm concerned that requests A & B which should be atomic can now be > > interleaved. > > I don't think there is any guarantee of atomicity for overlapping > requests, at least not with more than a single sector (logical block > size, not BDRV_SECTOR_SIZE). > > That is, as far as I know neither hardware nor the Linux kernel nor the > qemu block layer (image formats fragment all the time!) protect against > this. If you have concurrent overlapping requests, you always get > undefined behaviour. Kevin, I think you are right and my mental model was wrong. I spent some time looking at SCSI SAM and SBC documents. The closest information I found was SBC "4.27 Model for uninterrupted sequences on LBA ranges" where ORWRITE, XDWRITEREAD, XPWRITE, and COMPARE AND WRITE are singled out. It implies that the device server may access LBAs while other types of commands are executing. If anyone knows a better reference in the SCSI specifications that clarifies behavior of overlapping I/O requests, please share! Stefan signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer
On Mon, 06/20 17:39, Eric Blake wrote: > We have max_transfer documented in BlockLimits, but while we > honor it during pwrite_zeroes, we were blindly ignoring it > during pwritev and preadv, leading to multiple drivers having > to implement fragmentation themselves. This series moves > fragmentation to the block layer, then fixes the NBD driver to > use it; if you like this but it needs a v2, you can request that > I further do other drivers (I know at least iscsi and qcow2 do > some self-fragmenting and/or error reporting that can be > simplified by deferring fragmentation to the block layer). > > Prequisite: Kevin's block branch, plus my work on byte-based > block limits (v2 at the moment): > https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html > > Also available as a tag at: > git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1 Patches 1-6: Reviewed-by: Fam Zheng
Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer
On 06/21/2016 04:23 AM, Stefan Hajnoczi wrote: > On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote: >> We have max_transfer documented in BlockLimits, but while we >> honor it during pwrite_zeroes, we were blindly ignoring it >> during pwritev and preadv, leading to multiple drivers having >> to implement fragmentation themselves. This series moves >> fragmentation to the block layer, then fixes the NBD driver to >> use it; if you like this but it needs a v2, you can request that >> I further do other drivers (I know at least iscsi and qcow2 do >> some self-fragmenting and/or error reporting that can be >> simplified by deferring fragmentation to the block layer). >> > I'm concerned that requests A & B which should be atomic can now be > interleaved. > > For example, two writes that are overlapping and fragmented. > Applications expect to either see A or B on disk when both requests have > completed. Fragmentation must serialize overlapping requests in order > to prevent interleaved results where the application sees some of A and > some of B when both requests have completed. > > A similar scenario happens when A is a read and B is a write, too. Read > A is supposed to see either "before B" or "after B". With fragmentation > it can see "some of before B and some of after B". This patch series doesn't change that, it just changes where the atomicity is broken. Consider that pre-patch, a raw format wrapping an NBD protocol connection would see a 32M max_transfer, but we were not fragmenting at the raw protocol. Thus, if someone requests a 40M operation, the raw protocol treats it as a single transaction, and NBD then fragments it into two transactions over the wire; and NBD can complete transactions out of order. All this series does is move the fragmentation to the raw layer, rather than the NBD layer, but the race between overlapping oversize requests is present whether or not the series is applied. > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer
Am 21.06.2016 um 12:23 hat Stefan Hajnoczi geschrieben: > On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote: > > We have max_transfer documented in BlockLimits, but while we > > honor it during pwrite_zeroes, we were blindly ignoring it > > during pwritev and preadv, leading to multiple drivers having > > to implement fragmentation themselves. This series moves > > fragmentation to the block layer, then fixes the NBD driver to > > use it; if you like this but it needs a v2, you can request that > > I further do other drivers (I know at least iscsi and qcow2 do > > some self-fragmenting and/or error reporting that can be > > simplified by deferring fragmentation to the block layer). > > I'm concerned that requests A & B which should be atomic can now be > interleaved. I don't think there is any guarantee of atomicity for overlapping requests, at least not with more than a single sector (logical block size, not BDRV_SECTOR_SIZE). That is, as far as I know neither hardware nor the Linux kernel nor the qemu block layer (image formats fragment all the time!) protect against this. If you have concurrent overlapping requests, you always get undefined behaviour. > For example, two writes that are overlapping and fragmented. > Applications expect to either see A or B on disk when both requests have > completed. Fragmentation must serialize overlapping requests in order > to prevent interleaved results where the application sees some of A and > some of B when both requests have completed. > > A similar scenario happens when A is a read and B is a write, too. Read > A is supposed to see either "before B" or "after B". With fragmentation > it can see "some of before B and some of after B". If we wanted to achieve this semantics, it would be easy enough: Add a mark_request_serialising() in the right place. But I'm pretty sure that this isn't needed. Kevin pgprxOykknfvs.pgp Description: PGP signature
Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer
On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote: > We have max_transfer documented in BlockLimits, but while we > honor it during pwrite_zeroes, we were blindly ignoring it > during pwritev and preadv, leading to multiple drivers having > to implement fragmentation themselves. This series moves > fragmentation to the block layer, then fixes the NBD driver to > use it; if you like this but it needs a v2, you can request that > I further do other drivers (I know at least iscsi and qcow2 do > some self-fragmenting and/or error reporting that can be > simplified by deferring fragmentation to the block layer). > > Prequisite: Kevin's block branch, plus my work on byte-based > block limits (v2 at the moment): > https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html > > Also available as a tag at: > git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1 > > Eric Blake (5): > block: Fragment reads to max transfer length > block: Fragment writes to max transfer length > raw_bsd: Don't advertise flags not supported by protocol layer > nbd: Rely on block layer to break up large requests > nbd: Drop unused offset parameter > > include/block/nbd.h | 1 - > nbd/nbd-internal.h | 4 +-- > block/io.c | 84 > +++-- > block/nbd-client.c | 78 ++--- > block/nbd.c | 12 ++-- > block/raw_bsd.c | 6 ++-- > nbd/common.c| 3 +- > 7 files changed, 95 insertions(+), 93 deletions(-) I'm concerned that requests A & B which should be atomic can now be interleaved. For example, two writes that are overlapping and fragmented. Applications expect to either see A or B on disk when both requests have completed. Fragmentation must serialize overlapping requests in order to prevent interleaved results where the application sees some of A and some of B when both requests have completed. A similar scenario happens when A is a read and B is a write, too. Read A is supposed to see either "before B" or "after B". With fragmentation it can see "some of before B and some of after B". signature.asc Description: PGP signature
Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer
On 06/20/2016 05:39 PM, Eric Blake wrote: > We have max_transfer documented in BlockLimits, but while we > honor it during pwrite_zeroes, we were blindly ignoring it > during pwritev and preadv, leading to multiple drivers having > to implement fragmentation themselves. This series moves > fragmentation to the block layer, then fixes the NBD driver to > use it; if you like this but it needs a v2, you can request that > I further do other drivers (I know at least iscsi and qcow2 do > some self-fragmenting and/or error reporting that can be > simplified by deferring fragmentation to the block layer). iscsi turned out to be easy (see 6/5), but qcow2 is too tricky (we still have to fragment in qcow2_alloc_cluster_offset() if cluster allocation is not contiguous, so there is no one good value of max_transfer to use) > > Prequisite: Kevin's block branch, plus my work on byte-based > block limits (v2 at the moment): > https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html > > Also available as a tag at: > git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1 > > Eric Blake (5): > block: Fragment reads to max transfer length > block: Fragment writes to max transfer length > raw_bsd: Don't advertise flags not supported by protocol layer > nbd: Rely on block layer to break up large requests > nbd: Drop unused offset parameter > > include/block/nbd.h | 1 - > nbd/nbd-internal.h | 4 +-- > block/io.c | 84 > +++-- > block/nbd-client.c | 78 ++--- > block/nbd.c | 12 ++-- > block/raw_bsd.c | 6 ++-- > nbd/common.c| 3 +- > 7 files changed, 95 insertions(+), 93 deletions(-) > -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature