Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer

2016-07-08 Thread Kevin Wolf
Am 06.07.2016 um 04:04 hat Eric Blake geschrieben:
> On 06/21/2016 11:54 PM, Fam Zheng wrote:
> > On Mon, 06/20 17:39, Eric Blake wrote:
> >> We have max_transfer documented in BlockLimits, but while we
> >> honor it during pwrite_zeroes, we were blindly ignoring it
> >> during pwritev and preadv, leading to multiple drivers having
> >> to implement fragmentation themselves.  This series moves
> >> fragmentation to the block layer, then fixes the NBD driver to
> >> use it; if you like this but it needs a v2, you can request that
> >> I further do other drivers (I know at least iscsi and qcow2 do
> >> some self-fragmenting and/or error reporting that can be
> >> simplified by deferring fragmentation to the block layer).
> >>
> >> Prequisite: Kevin's block branch, plus my work on byte-based
> >> block limits (v2 at the moment):
> >> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html
> >>
> >> Also available as a tag at:
> >> git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1
> > 
> > Patches 1-6:
> > 
> > Reviewed-by: Fam Zheng 
> 
> ping - series still applies to latest master without tweaks

Apart from the build fix and the minor comments I made, this looks good
to me.

Kevin


pgpUhpjnXMZwa.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer

2016-07-05 Thread Eric Blake
On 06/21/2016 11:54 PM, Fam Zheng wrote:
> On Mon, 06/20 17:39, Eric Blake wrote:
>> We have max_transfer documented in BlockLimits, but while we
>> honor it during pwrite_zeroes, we were blindly ignoring it
>> during pwritev and preadv, leading to multiple drivers having
>> to implement fragmentation themselves.  This series moves
>> fragmentation to the block layer, then fixes the NBD driver to
>> use it; if you like this but it needs a v2, you can request that
>> I further do other drivers (I know at least iscsi and qcow2 do
>> some self-fragmenting and/or error reporting that can be
>> simplified by deferring fragmentation to the block layer).
>>
>> Prequisite: Kevin's block branch, plus my work on byte-based
>> block limits (v2 at the moment):
>> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html
>>
>> Also available as a tag at:
>> git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1
> 
> Patches 1-6:
> 
> Reviewed-by: Fam Zheng 

ping - series still applies to latest master without tweaks

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer

2016-06-22 Thread Stefan Hajnoczi
On Tue, Jun 21, 2016 at 04:05:22PM -0600, Eric Blake wrote:
> On 06/21/2016 04:23 AM, Stefan Hajnoczi wrote:
> > On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote:
> >> We have max_transfer documented in BlockLimits, but while we
> >> honor it during pwrite_zeroes, we were blindly ignoring it
> >> during pwritev and preadv, leading to multiple drivers having
> >> to implement fragmentation themselves.  This series moves
> >> fragmentation to the block layer, then fixes the NBD driver to
> >> use it; if you like this but it needs a v2, you can request that
> >> I further do other drivers (I know at least iscsi and qcow2 do
> >> some self-fragmenting and/or error reporting that can be
> >> simplified by deferring fragmentation to the block layer).
> >>
> 
> > I'm concerned that requests A & B which should be atomic can now be
> > interleaved.
> > 
> > For example, two writes that are overlapping and fragmented.
> > Applications expect to either see A or B on disk when both requests have
> > completed.  Fragmentation must serialize overlapping requests in order
> > to prevent interleaved results where the application sees some of A and
> > some of B when both requests have completed.
> > 
> > A similar scenario happens when A is a read and B is a write, too.  Read
> > A is supposed to see either "before B" or "after B".  With fragmentation
> > it can see "some of before B and some of after B".
> 
> This patch series doesn't change that, it just changes where the
> atomicity is broken.

I know but I needed to clarify what the right semantics are :).

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer

2016-06-22 Thread Stefan Hajnoczi
On Tue, Jun 21, 2016 at 12:43:08PM +0200, Kevin Wolf wrote:
> Am 21.06.2016 um 12:23 hat Stefan Hajnoczi geschrieben:
> > On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote:
> > > We have max_transfer documented in BlockLimits, but while we
> > > honor it during pwrite_zeroes, we were blindly ignoring it
> > > during pwritev and preadv, leading to multiple drivers having
> > > to implement fragmentation themselves.  This series moves
> > > fragmentation to the block layer, then fixes the NBD driver to
> > > use it; if you like this but it needs a v2, you can request that
> > > I further do other drivers (I know at least iscsi and qcow2 do
> > > some self-fragmenting and/or error reporting that can be
> > > simplified by deferring fragmentation to the block layer).
> > 
> > I'm concerned that requests A & B which should be atomic can now be
> > interleaved.
> 
> I don't think there is any guarantee of atomicity for overlapping
> requests, at least not with more than a single sector (logical block
> size, not BDRV_SECTOR_SIZE).
> 
> That is, as far as I know neither hardware nor the Linux kernel nor the
> qemu block layer (image formats fragment all the time!) protect against
> this. If you have concurrent overlapping requests, you always get
> undefined behaviour.

Kevin, I think you are right and my mental model was wrong.

I spent some time looking at SCSI SAM and SBC documents.  The closest
information I found was SBC "4.27 Model for uninterrupted sequences on LBA
ranges" where ORWRITE, XDWRITEREAD, XPWRITE, and COMPARE AND WRITE are
singled out.  It implies that the device server may access LBAs while
other types of commands are executing.

If anyone knows a better reference in the SCSI specifications that
clarifies behavior of overlapping I/O requests, please share!

Stefan


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer

2016-06-21 Thread Fam Zheng
On Mon, 06/20 17:39, Eric Blake wrote:
> We have max_transfer documented in BlockLimits, but while we
> honor it during pwrite_zeroes, we were blindly ignoring it
> during pwritev and preadv, leading to multiple drivers having
> to implement fragmentation themselves.  This series moves
> fragmentation to the block layer, then fixes the NBD driver to
> use it; if you like this but it needs a v2, you can request that
> I further do other drivers (I know at least iscsi and qcow2 do
> some self-fragmenting and/or error reporting that can be
> simplified by deferring fragmentation to the block layer).
> 
> Prequisite: Kevin's block branch, plus my work on byte-based
> block limits (v2 at the moment):
> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html
> 
> Also available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1

Patches 1-6:

Reviewed-by: Fam Zheng 



Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer

2016-06-21 Thread Eric Blake
On 06/21/2016 04:23 AM, Stefan Hajnoczi wrote:
> On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote:
>> We have max_transfer documented in BlockLimits, but while we
>> honor it during pwrite_zeroes, we were blindly ignoring it
>> during pwritev and preadv, leading to multiple drivers having
>> to implement fragmentation themselves.  This series moves
>> fragmentation to the block layer, then fixes the NBD driver to
>> use it; if you like this but it needs a v2, you can request that
>> I further do other drivers (I know at least iscsi and qcow2 do
>> some self-fragmenting and/or error reporting that can be
>> simplified by deferring fragmentation to the block layer).
>>

> I'm concerned that requests A & B which should be atomic can now be
> interleaved.
> 
> For example, two writes that are overlapping and fragmented.
> Applications expect to either see A or B on disk when both requests have
> completed.  Fragmentation must serialize overlapping requests in order
> to prevent interleaved results where the application sees some of A and
> some of B when both requests have completed.
> 
> A similar scenario happens when A is a read and B is a write, too.  Read
> A is supposed to see either "before B" or "after B".  With fragmentation
> it can see "some of before B and some of after B".

This patch series doesn't change that, it just changes where the
atomicity is broken.  Consider that pre-patch, a raw format wrapping an
NBD protocol connection would see a 32M max_transfer, but we were not
fragmenting at the raw protocol.  Thus, if someone requests a 40M
operation, the raw protocol treats it as a single transaction, and NBD
then fragments it into two transactions over the wire; and NBD can
complete transactions out of order.  All this series does is move the
fragmentation to the raw layer, rather than the NBD layer, but the race
between overlapping oversize requests is present whether or not the
series is applied.

> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer

2016-06-21 Thread Kevin Wolf
Am 21.06.2016 um 12:23 hat Stefan Hajnoczi geschrieben:
> On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote:
> > We have max_transfer documented in BlockLimits, but while we
> > honor it during pwrite_zeroes, we were blindly ignoring it
> > during pwritev and preadv, leading to multiple drivers having
> > to implement fragmentation themselves.  This series moves
> > fragmentation to the block layer, then fixes the NBD driver to
> > use it; if you like this but it needs a v2, you can request that
> > I further do other drivers (I know at least iscsi and qcow2 do
> > some self-fragmenting and/or error reporting that can be
> > simplified by deferring fragmentation to the block layer).
> 
> I'm concerned that requests A & B which should be atomic can now be
> interleaved.

I don't think there is any guarantee of atomicity for overlapping
requests, at least not with more than a single sector (logical block
size, not BDRV_SECTOR_SIZE).

That is, as far as I know neither hardware nor the Linux kernel nor the
qemu block layer (image formats fragment all the time!) protect against
this. If you have concurrent overlapping requests, you always get
undefined behaviour.

> For example, two writes that are overlapping and fragmented.
> Applications expect to either see A or B on disk when both requests have
> completed.  Fragmentation must serialize overlapping requests in order
> to prevent interleaved results where the application sees some of A and
> some of B when both requests have completed.
> 
> A similar scenario happens when A is a read and B is a write, too.  Read
> A is supposed to see either "before B" or "after B".  With fragmentation
> it can see "some of before B and some of after B".

If we wanted to achieve this semantics, it would be easy enough: Add a
mark_request_serialising() in the right place. But I'm pretty sure that
this isn't needed.

Kevin


pgprxOykknfvs.pgp
Description: PGP signature


Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer

2016-06-21 Thread Stefan Hajnoczi
On Mon, Jun 20, 2016 at 05:39:24PM -0600, Eric Blake wrote:
> We have max_transfer documented in BlockLimits, but while we
> honor it during pwrite_zeroes, we were blindly ignoring it
> during pwritev and preadv, leading to multiple drivers having
> to implement fragmentation themselves.  This series moves
> fragmentation to the block layer, then fixes the NBD driver to
> use it; if you like this but it needs a v2, you can request that
> I further do other drivers (I know at least iscsi and qcow2 do
> some self-fragmenting and/or error reporting that can be
> simplified by deferring fragmentation to the block layer).
> 
> Prequisite: Kevin's block branch, plus my work on byte-based
> block limits (v2 at the moment):
> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html
> 
> Also available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1
> 
> Eric Blake (5):
>   block: Fragment reads to max transfer length
>   block: Fragment writes to max transfer length
>   raw_bsd: Don't advertise flags not supported by protocol layer
>   nbd: Rely on block layer to break up large requests
>   nbd: Drop unused offset parameter
> 
>  include/block/nbd.h |  1 -
>  nbd/nbd-internal.h  |  4 +--
>  block/io.c  | 84 
> +++--
>  block/nbd-client.c  | 78 ++---
>  block/nbd.c | 12 ++--
>  block/raw_bsd.c |  6 ++--
>  nbd/common.c|  3 +-
>  7 files changed, 95 insertions(+), 93 deletions(-)

I'm concerned that requests A & B which should be atomic can now be
interleaved.

For example, two writes that are overlapping and fragmented.
Applications expect to either see A or B on disk when both requests have
completed.  Fragmentation must serialize overlapping requests in order
to prevent interleaved results where the application sees some of A and
some of B when both requests have completed.

A similar scenario happens when A is a read and B is a write, too.  Read
A is supposed to see either "before B" or "after B".  With fragmentation
it can see "some of before B and some of after B".


signature.asc
Description: PGP signature


Re: [Qemu-devel] [PATCH 0/5] Auto-fragment large transactions at the block layer

2016-06-20 Thread Eric Blake
On 06/20/2016 05:39 PM, Eric Blake wrote:
> We have max_transfer documented in BlockLimits, but while we
> honor it during pwrite_zeroes, we were blindly ignoring it
> during pwritev and preadv, leading to multiple drivers having
> to implement fragmentation themselves.  This series moves
> fragmentation to the block layer, then fixes the NBD driver to
> use it; if you like this but it needs a v2, you can request that
> I further do other drivers (I know at least iscsi and qcow2 do
> some self-fragmenting and/or error reporting that can be
> simplified by deferring fragmentation to the block layer).

iscsi turned out to be easy (see 6/5), but qcow2 is too tricky (we still
have to fragment in qcow2_alloc_cluster_offset() if cluster allocation
is not contiguous, so there is no one good value of max_transfer to use)

> 
> Prequisite: Kevin's block branch, plus my work on byte-based
> block limits (v2 at the moment):
> https://lists.gnu.org/archive/html/qemu-devel/2016-06/msg04006.html
> 
> Also available as a tag at:
> git fetch git://repo.or.cz/qemu/ericb.git nbd-fragment-v1
> 
> Eric Blake (5):
>   block: Fragment reads to max transfer length
>   block: Fragment writes to max transfer length
>   raw_bsd: Don't advertise flags not supported by protocol layer
>   nbd: Rely on block layer to break up large requests
>   nbd: Drop unused offset parameter
> 
>  include/block/nbd.h |  1 -
>  nbd/nbd-internal.h  |  4 +--
>  block/io.c  | 84 
> +++--
>  block/nbd-client.c  | 78 ++---
>  block/nbd.c | 12 ++--
>  block/raw_bsd.c |  6 ++--
>  nbd/common.c|  3 +-
>  7 files changed, 95 insertions(+), 93 deletions(-)
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature