Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-24 Thread Michael Roth via
On Fri, Sep 24, 2021 at 08:50:05AM +0200, Christian Borntraeger wrote:
> Peter, Michael,
> 
> do we still do stable releases for QEMU or has this stopped?

Hi Christian,

Yes, it's just been a perfect storm of job moves / bad timing / much-needed
testing rework. I plan to restart the stable releases starting with 6.0.1 and
6.1.1 shortly after. I'll get the release schedules posted on the release wiki
week.

Sorry for the delays on this.

-Mike

> 
> On 24.09.21 at 07:27, Paolo Bonzini wrote:
> > Yes, the question is whether it still exists...
> > 
> > Paolo
> > 
> > On Thu, 23 Sept 2021 at 16:48, Christian Borntraeger wrote:
> > 
> > 
> > 
> > On 23.09.21 at 15:04, Paolo Bonzini wrote:
> >  > Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
> >  > sources, IOV_MAX in POSIX).  Because of this, on some host adapters
> >  > requests with many iovecs are rejected with -EINVAL by the
> >  > io_submit() or readv()/writev() system calls.
> >  >
> >  > In fact, the same limit applies to SG_IO as well.  To fix both the
> >  > EINVAL and the possible performance issues from using fewer iovecs
> >  > than allowed by Linux (some HBAs have max_segments as low as 128),
> >  > introduce a separate entry in BlockLimits to hold the max_segments
> >  > value from sysfs.  This new limit is used only for SG_IO and clamped
> >  > to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
> >  > bs->bl.max_transfer.
> >  >
> >  > Reported-by: Halil Pasic
> >  > Cc: Hanna Reitz <hre...@redhat.com>
> >  > Cc: Kevin Wolf <kw...@redhat.com>
> >  > Cc: qemu-bl...@nongnu.org
> >  > Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too,
> >  > do not round to power of 2", 2021-06-25)
> > 
> > This sneaked in shortly before the 6.1 release (between rc0 and rc1, I
> > think).
> > Shouldn't that go to stable in case this still exists?
> > 
> > 
> >  > Signed-off-by: Paolo Bonzini
> >  > ---
> >  >   block/block-backend.c          | 6 ++
> >  >   block/file-posix.c             | 2 +-
> >  >   block/io.c                     | 1 +
> >  >   hw/scsi/scsi-generic.c         | 2 +-
> >  >   include/block/block_int.h      | 7 +++
> >  >   include/sysemu/block-backend.h | 1 +
> >  >   6 files changed, 17 insertions(+), 2 deletions(-)
> >  >
> >  > diff --git a/block/block-backend.c b/block/block-backend.c
> >  > index 6140d133e2..ba2b5ebb10 100644
> >  > --- a/block/block-backend.c
> >  > +++ b/block/block-backend.c
> >  > @@ -1986,6 +1986,12 @@ uint32_t blk_get_max_transfer(BlockBackend 
> > *blk)
> >  >       return ROUND_DOWN(max, blk_get_request_alignment(blk));
> >  >   }
> >  >
> >  > +int blk_get_max_hw_iov(BlockBackend *blk)
> >  > +{
> >  > +    return MIN_NON_ZERO(blk->root->bs->bl.max_hw_iov,
> >  > +                        blk->root->bs->bl.max_iov);
> >  > +}
> >  > +
> >  >   int blk_get_max_iov(BlockBackend *blk)
> >  >   {
> >  >       return blk->root->bs->bl.max_iov;
> >  > diff --git a/block/file-posix.c b/block/file-posix.c
> >  > index cb9bffe047..1567edb3d5 100644
> >  > --- a/block/file-posix.c
> >  > +++ b/block/file-posix.c
> >  > @@ -1273,7 +1273,7 @@ static void 
> > raw_refresh_limits(BlockDriverState *bs, Error **errp)
> >  >
> >  >           ret = hdev_get_max_segments(s->fd, &st);
> >  >           if (ret > 0) {
> >  > -            bs->bl.max_iov = ret;
> >  > +            bs->bl.max_hw_iov = ret;
> >  >           }
> >  >       }
> >  >   }
> >  > diff --git a/block/io.c b/block/io.c
> >  > index a19942718b..f38e7f81d8 100644
> >  > --- a/block/io.c
> >  > +++ b/block/io.c
> >  > @@ -136,6 +136,7 @@ static void bdrv_merge_limits(BlockLimits *dst, 
> > const BlockLimits *src)
> >  >       dst->min_mem_alignment = MAX(dst->min_mem_alignment,
> >  >                                    src->min_mem_alignment);
> >  >       dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
> >  > +    dst->max_hw_iov = MIN_NON_ZERO(dst->max_hw_iov, 
> > src->max_hw_iov);
> >  >   }
> >  >
> >  >   typedef struct BdrvRefreshLimitsState {
> >  > diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> >  > index 665baf900e..0306ccc7b1 100644
> >  > --- 

Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-24 Thread Kevin Wolf
On 23.09.2021 at 15:04, Paolo Bonzini wrote:
> Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
> sources, IOV_MAX in POSIX).  Because of this, on some host adapters
> requests with many iovecs are rejected with -EINVAL by the
> io_submit() or readv()/writev() system calls.
> 
> In fact, the same limit applies to SG_IO as well.  To fix both the
> EINVAL and the possible performance issues from using fewer iovecs
> than allowed by Linux (some HBAs have max_segments as low as 128),
> introduce a separate entry in BlockLimits to hold the max_segments
> value from sysfs.  This new limit is used only for SG_IO and clamped
> to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
> bs->bl.max_transfer.
> 
> Reported-by: Halil Pasic 
> Cc: Hanna Reitz 
> Cc: Kevin Wolf 
> Cc: qemu-bl...@nongnu.org
> Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do not 
> round to power of 2", 2021-06-25)
> Signed-off-by: Paolo Bonzini 

Thanks, applied to the block branch.

Kevin
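
For readers unfamiliar with the "max_segments value from sysfs" mentioned in the commit message, here is a minimal sketch of how such a limit could be read on a Linux host. It is not the code from block/file-posix.c. The helper names max_segments_path() and read_limit_file() are hypothetical, and the /sys/dev/block/MAJ:MIN/queue/max_segments layout is an assumption about the host kernel's sysfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: build the sysfs path of the max_segments attribute
 * for a block device identified by its major:minor numbers.  The path
 * layout is an assumption about Linux sysfs.  Returns the number of
 * characters that snprintf() wanted to write. */
static int max_segments_path(char *buf, size_t len,
                             unsigned major_no, unsigned minor_no)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "/sys/dev/block/%u:%u/queue/max_segments",
                    major_no, minor_no);
}

/* Hypothetical helper: parse one positive decimal integer from a
 * sysfs-style attribute file.  Returns the value, or -1 if the file
 * cannot be opened or does not contain a positive number. */
static long read_limit_file(const char *path)
{
    long value = -1;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    if (fscanf(f, "%ld", &value) != 1 || value <= 0) {
        value = -1;
    }
    fclose(f);
    return value;
}
```

A caller would treat a result of -1 as "no hardware limit known" and leave the corresponding limit unset, which matches the "if (ret > 0)" guard visible in the patch.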




Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-24 Thread Christian Borntraeger

Peter, Michael,

do we still do stable releases for QEMU or has this stopped?

On 24.09.21 at 07:27, Paolo Bonzini wrote:

Yes, the question is whether it still exists...

Paolo

On Thu, 23 Sept 2021 at 16:48, Christian Borntraeger <borntrae...@de.ibm.com> wrote:



On 23.09.21 at 15:04, Paolo Bonzini wrote:
 > Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
 > sources, IOV_MAX in POSIX).  Because of this, on some host adapters
 > requests with many iovecs are rejected with -EINVAL by the
 > io_submit() or readv()/writev() system calls.
 >
 > In fact, the same limit applies to SG_IO as well.  To fix both the
 > EINVAL and the possible performance issues from using fewer iovecs
 > than allowed by Linux (some HBAs have max_segments as low as 128),
 > introduce a separate entry in BlockLimits to hold the max_segments
 > value from sysfs.  This new limit is used only for SG_IO and clamped
 > to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
 > bs->bl.max_transfer.
 >
 > Reported-by: Halil Pasic <pa...@linux.ibm.com>
 > Cc: Hanna Reitz <hre...@redhat.com>
 > Cc: Kevin Wolf <kw...@redhat.com>
 > Cc: qemu-bl...@nongnu.org
 > Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do not
 > round to power of 2", 2021-06-25)

This sneaked in shortly before the 6.1 release (between rc0 and rc1, I
think).
Shouldn't that go to stable in case this still exists?


 > Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
 > ---
 >   block/block-backend.c          | 6 ++
 >   block/file-posix.c             | 2 +-
 >   block/io.c                     | 1 +
 >   hw/scsi/scsi-generic.c         | 2 +-
 >   include/block/block_int.h      | 7 +++
 >   include/sysemu/block-backend.h | 1 +
 >   6 files changed, 17 insertions(+), 2 deletions(-)
 >
 > diff --git a/block/block-backend.c b/block/block-backend.c
 > index 6140d133e2..ba2b5ebb10 100644
 > --- a/block/block-backend.c
 > +++ b/block/block-backend.c
 > @@ -1986,6 +1986,12 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
 >       return ROUND_DOWN(max, blk_get_request_alignment(blk));
 >   }
 >
 > +int blk_get_max_hw_iov(BlockBackend *blk)
 > +{
 > +    return MIN_NON_ZERO(blk->root->bs->bl.max_hw_iov,
 > +                        blk->root->bs->bl.max_iov);
 > +}
 > +
 >   int blk_get_max_iov(BlockBackend *blk)
 >   {
 >       return blk->root->bs->bl.max_iov;
 > diff --git a/block/file-posix.c b/block/file-posix.c
 > index cb9bffe047..1567edb3d5 100644
 > --- a/block/file-posix.c
 > +++ b/block/file-posix.c
 > @@ -1273,7 +1273,7 @@ static void raw_refresh_limits(BlockDriverState 
*bs, Error **errp)
 >
 >           ret = hdev_get_max_segments(s->fd, &st);
 >           if (ret > 0) {
 > -            bs->bl.max_iov = ret;
 > +            bs->bl.max_hw_iov = ret;
 >           }
 >       }
 >   }
 > diff --git a/block/io.c b/block/io.c
 > index a19942718b..f38e7f81d8 100644
 > --- a/block/io.c
 > +++ b/block/io.c
 > @@ -136,6 +136,7 @@ static void bdrv_merge_limits(BlockLimits *dst, 
const BlockLimits *src)
 >       dst->min_mem_alignment = MAX(dst->min_mem_alignment,
 >                                    src->min_mem_alignment);
 >       dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
 > +    dst->max_hw_iov = MIN_NON_ZERO(dst->max_hw_iov, src->max_hw_iov);
 >   }
 >
 >   typedef struct BdrvRefreshLimitsState {
 > diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
 > index 665baf900e..0306ccc7b1 100644
 > --- a/hw/scsi/scsi-generic.c
 > +++ b/hw/scsi/scsi-generic.c
 > @@ -180,7 +180,7 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq 
*r, SCSIDevice *s, int len)
 >           page = r->req.cmd.buf[2];
 >           if (page == 0xb0) {
 >               uint64_t max_transfer = 
blk_get_max_hw_transfer(s->conf.blk);
 > -            uint32_t max_iov = blk_get_max_iov(s->conf.blk);
 > +            uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
 >
 >               assert(max_transfer);
 >               max_transfer = MIN_NON_ZERO(max_transfer, max_iov * 
qemu_real_host_page_size)
 > diff --git a/include/block/block_int.h b/include/block/block_int.h
 > index f1a54db0f8..c31cbd034a 100644
 > --- a/include/block/block_int.h
 > +++ 

Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-23 Thread Paolo Bonzini
Yes, the question is whether it still exists...

Paolo

On Thu, 23 Sept 2021 at 16:48, Christian Borntraeger wrote:

>
>
> On 23.09.21 at 15:04, Paolo Bonzini wrote:
> > Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
> > sources, IOV_MAX in POSIX).  Because of this, on some host adapters
> > requests with many iovecs are rejected with -EINVAL by the
> > io_submit() or readv()/writev() system calls.
> >
> > In fact, the same limit applies to SG_IO as well.  To fix both the
> > EINVAL and the possible performance issues from using fewer iovecs
> > than allowed by Linux (some HBAs have max_segments as low as 128),
> > introduce a separate entry in BlockLimits to hold the max_segments
> > value from sysfs.  This new limit is used only for SG_IO and clamped
> > to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
> > bs->bl.max_transfer.
> >
> > Reported-by: Halil Pasic 
> > Cc: Hanna Reitz 
> > Cc: Kevin Wolf 
> > Cc: qemu-bl...@nongnu.org
> > Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do
> not round to power of 2", 2021-06-25)
>
> This sneaked in shortly before the 6.1 release (between rc0 and rc1, I
> think).
> Shouldn't that go to stable in case this still exists?
>
>
> > Signed-off-by: Paolo Bonzini 
> > ---
> >   block/block-backend.c  | 6 ++
> >   block/file-posix.c | 2 +-
> >   block/io.c | 1 +
> >   hw/scsi/scsi-generic.c | 2 +-
> >   include/block/block_int.h  | 7 +++
> >   include/sysemu/block-backend.h | 1 +
> >   6 files changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index 6140d133e2..ba2b5ebb10 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> > @@ -1986,6 +1986,12 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
> >   return ROUND_DOWN(max, blk_get_request_alignment(blk));
> >   }
> >
> > +int blk_get_max_hw_iov(BlockBackend *blk)
> > +{
> > +return MIN_NON_ZERO(blk->root->bs->bl.max_hw_iov,
> > +blk->root->bs->bl.max_iov);
> > +}
> > +
> >   int blk_get_max_iov(BlockBackend *blk)
> >   {
> >   return blk->root->bs->bl.max_iov;
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index cb9bffe047..1567edb3d5 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1273,7 +1273,7 @@ static void raw_refresh_limits(BlockDriverState
> *bs, Error **errp)
> >
> >   ret = hdev_get_max_segments(s->fd, &st);
> >   if (ret > 0) {
> > -bs->bl.max_iov = ret;
> > +bs->bl.max_hw_iov = ret;
> >   }
> >   }
> >   }
> > diff --git a/block/io.c b/block/io.c
> > index a19942718b..f38e7f81d8 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -136,6 +136,7 @@ static void bdrv_merge_limits(BlockLimits *dst,
> const BlockLimits *src)
> >   dst->min_mem_alignment = MAX(dst->min_mem_alignment,
> >src->min_mem_alignment);
> >   dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
> > +dst->max_hw_iov = MIN_NON_ZERO(dst->max_hw_iov, src->max_hw_iov);
> >   }
> >
> >   typedef struct BdrvRefreshLimitsState {
> > diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> > index 665baf900e..0306ccc7b1 100644
> > --- a/hw/scsi/scsi-generic.c
> > +++ b/hw/scsi/scsi-generic.c
> > @@ -180,7 +180,7 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq
> *r, SCSIDevice *s, int len)
> >   page = r->req.cmd.buf[2];
> >   if (page == 0xb0) {
> >   uint64_t max_transfer =
> blk_get_max_hw_transfer(s->conf.blk);
> > -uint32_t max_iov = blk_get_max_iov(s->conf.blk);
> > +uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
> >
> >   assert(max_transfer);
> >   max_transfer = MIN_NON_ZERO(max_transfer, max_iov *
> qemu_real_host_page_size)
> > diff --git a/include/block/block_int.h b/include/block/block_int.h
> > index f1a54db0f8..c31cbd034a 100644
> > --- a/include/block/block_int.h
> > +++ b/include/block/block_int.h
> > @@ -702,6 +702,13 @@ typedef struct BlockLimits {
> >*/
> >   uint64_t max_hw_transfer;
> >
> > +/* Maximal number of scatter/gather elements allowed by the
> hardware.
> > + * Applies whenever transfers to the device bypass the kernel I/O
> > + * scheduler, for example with SG_IO.  If larger than max_iov
> > + * or if zero, blk_get_max_hw_iov will fall back to max_iov.
> > + */
> > +int max_hw_iov;
> > +
> >   /* memory alignment, in bytes so that no bounce buffer is needed */
> >   size_t min_mem_alignment;
> >
> > diff --git a/include/sysemu/block-backend.h
> b/include/sysemu/block-backend.h
> > index 29d4fdbf63..82bae55161 100644
> > --- a/include/sysemu/block-backend.h
> > +++ b/include/sysemu/block-backend.h
> > @@ -211,6 +211,7 @@ uint32_t blk_get_request_alignment(BlockBackend
> *blk);
> >   uint32_t 

Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-23 Thread Halil Pasic
On Thu, 23 Sep 2021 16:28:11 +0200
Halil Pasic  wrote:

> Can't we use some of the established constants instead of hard coding a
> qemu specific IOV_MAX?
> 
> POSIX.1 seems to guarantee the availability of IOV_MAX in <limits.h>
> according to: https://man7.org/linux/man-pages/man2/readv.2.html
> and <sys/uio.h> may have UIO_MAXIOV defined.

Never mind, the
#define IOV_MAX 1024
in osdep.h is conditional, and I guess we already use IOV_MAX from <limits.h>
when CONFIG_IOVEC is defined, i.e. when we don't emulate the interface.

Sorry for the noise.

Regards,
Halil
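
The conditional definition discussed above can be illustrated with a short sketch. This is not the content of osdep.h; it only shows the general pattern of preferring the platform's IOV_MAX and falling back to 1024 when the headers do not provide one. The function name iovec_limit() is hypothetical, and the use of sysconf(_SC_IOV_MAX) is an addition for illustration that the thread does not mention.

```c
#include <assert.h>
#include <limits.h>
#include <unistd.h>

/* Fallback used only when the platform headers do not define IOV_MAX
 * (whether <limits.h> exposes it can depend on feature-test macros). */
#ifndef IOV_MAX
#define IOV_MAX 1024
#endif

/* Hypothetical helper: return the iovec count limit, preferring the
 * run-time value reported by sysconf() and falling back to the
 * compile-time IOV_MAX. */
static long iovec_limit(void)
{
    long n = -1;

#ifdef _SC_IOV_MAX
    n = sysconf(_SC_IOV_MAX);
#endif
    if (n <= 0) {
        n = IOV_MAX;
    }
    assert(n > 0);
    return n;
}
```

Because the fallback sits behind #ifndef, a platform that already defines IOV_MAX keeps its own value, which is the point made in the message above.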



Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-23 Thread Christian Borntraeger




On 23.09.21 at 15:04, Paolo Bonzini wrote:

Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
sources, IOV_MAX in POSIX).  Because of this, on some host adapters
requests with many iovecs are rejected with -EINVAL by the
io_submit() or readv()/writev() system calls.

In fact, the same limit applies to SG_IO as well.  To fix both the
EINVAL and the possible performance issues from using fewer iovecs
than allowed by Linux (some HBAs have max_segments as low as 128),
introduce a separate entry in BlockLimits to hold the max_segments
value from sysfs.  This new limit is used only for SG_IO and clamped
to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
bs->bl.max_transfer.

Reported-by: Halil Pasic 
Cc: Hanna Reitz 
Cc: Kevin Wolf 
Cc: qemu-bl...@nongnu.org
Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do not round to 
power of 2", 2021-06-25)


This sneaked in shortly before the 6.1 release (between rc0 and rc1, I think).
Shouldn't that go to stable in case this still exists?



Signed-off-by: Paolo Bonzini 
---
  block/block-backend.c  | 6 ++
  block/file-posix.c | 2 +-
  block/io.c | 1 +
  hw/scsi/scsi-generic.c | 2 +-
  include/block/block_int.h  | 7 +++
  include/sysemu/block-backend.h | 1 +
  6 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 6140d133e2..ba2b5ebb10 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1986,6 +1986,12 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
  return ROUND_DOWN(max, blk_get_request_alignment(blk));
  }
  
+int blk_get_max_hw_iov(BlockBackend *blk)
+{
+return MIN_NON_ZERO(blk->root->bs->bl.max_hw_iov,
+blk->root->bs->bl.max_iov);
+}
+
  int blk_get_max_iov(BlockBackend *blk)
  {
  return blk->root->bs->bl.max_iov;
diff --git a/block/file-posix.c b/block/file-posix.c
index cb9bffe047..1567edb3d5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1273,7 +1273,7 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
  
  ret = hdev_get_max_segments(s->fd, &st);
  if (ret > 0) {
-bs->bl.max_iov = ret;
+bs->bl.max_hw_iov = ret;
  }
  }
  }
diff --git a/block/io.c b/block/io.c
index a19942718b..f38e7f81d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -136,6 +136,7 @@ static void bdrv_merge_limits(BlockLimits *dst, const 
BlockLimits *src)
  dst->min_mem_alignment = MAX(dst->min_mem_alignment,
   src->min_mem_alignment);
  dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
+dst->max_hw_iov = MIN_NON_ZERO(dst->max_hw_iov, src->max_hw_iov);
  }
  
  typedef struct BdrvRefreshLimitsState {
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 665baf900e..0306ccc7b1 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -180,7 +180,7 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq *r, 
SCSIDevice *s, int len)
  page = r->req.cmd.buf[2];
  if (page == 0xb0) {
  uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
-uint32_t max_iov = blk_get_max_iov(s->conf.blk);
+uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
  
  assert(max_transfer);
  max_transfer = MIN_NON_ZERO(max_transfer, max_iov * 
qemu_real_host_page_size)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f1a54db0f8..c31cbd034a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -702,6 +702,13 @@ typedef struct BlockLimits {
   */
  uint64_t max_hw_transfer;
  
+/* Maximal number of scatter/gather elements allowed by the hardware.
+ * Applies whenever transfers to the device bypass the kernel I/O
+ * scheduler, for example with SG_IO.  If larger than max_iov
+ * or if zero, blk_get_max_hw_iov will fall back to max_iov.
+ */
+int max_hw_iov;
+
  /* memory alignment, in bytes so that no bounce buffer is needed */
  size_t min_mem_alignment;
  
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 29d4fdbf63..82bae55161 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -211,6 +211,7 @@ uint32_t blk_get_request_alignment(BlockBackend *blk);
  uint32_t blk_get_max_transfer(BlockBackend *blk);
  uint64_t blk_get_max_hw_transfer(BlockBackend *blk);
  int blk_get_max_iov(BlockBackend *blk);
+int blk_get_max_hw_iov(BlockBackend *blk);
  void blk_set_guest_block_size(BlockBackend *blk, int align);
  void *blk_try_blockalign(BlockBackend *blk, size_t size);
  void *blk_blockalign(BlockBackend *blk, size_t size);





Re: [PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-23 Thread Halil Pasic
On Thu, 23 Sep 2021 09:04:36 -0400
Paolo Bonzini  wrote:

> Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
> sources, IOV_MAX in POSIX).  Because of this, on some host adapters
> requests with many iovecs are rejected with -EINVAL by the
> io_submit() or readv()/writev() system calls.
> 
> In fact, the same limit applies to SG_IO as well.  To fix both the
> EINVAL and the possible performance issues from using fewer iovecs
> than allowed by Linux (some HBAs have max_segments as low as 128),
> introduce a separate entry in BlockLimits to hold the max_segments
> value from sysfs.  This new limit is used only for SG_IO and clamped
> to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
> bs->bl.max_transfer.

Doesn't this patch render bs->bl.max_iov a constant?

$ git grep -p -e 'bl\(.\|->\)max_iov'
block/block-backend.c=int blk_get_max_iov(BlockBackend *blk)
block/block-backend.c:return blk->root->bs->bl.max_iov;
block/file-posix.c=static void raw_refresh_limits(BlockDriverState *bs, Error 
**errp)
block/file-posix.c:bs->bl.max_iov = ret;
block/io.c=void bdrv_refresh_limits(BlockDriverState *bs, Transaction *tran, 
Error **errp)
block/io.c:bs->bl.max_iov = IOV_MAX;
block/mirror.c=static int coroutine_fn mirror_run(Job *job, Error **errp)
block/mirror.c:s->max_iov = MIN(bs->bl.max_iov, target_bs->bl.max_iov);

Can't we use some of the established constants instead of hard coding a
qemu specific IOV_MAX?

POSIX.1 seems to guarantee the availability of IOV_MAX in <limits.h>
according to: https://man7.org/linux/man-pages/man2/readv.2.html
and <sys/uio.h> may have UIO_MAXIOV defined.

> 
> Reported-by: Halil Pasic 
> Cc: Hanna Reitz 
> Cc: Kevin Wolf 
> Cc: qemu-bl...@nongnu.org
> Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do not 
> round to power of 2", 2021-06-25)
> Signed-off-by: Paolo Bonzini 
> ---
>  block/block-backend.c  | 6 ++
>  block/file-posix.c | 2 +-
>  block/io.c | 1 +
>  hw/scsi/scsi-generic.c | 2 +-
>  include/block/block_int.h  | 7 +++
>  include/sysemu/block-backend.h | 1 +
>  6 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 6140d133e2..ba2b5ebb10 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1986,6 +1986,12 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
>  return ROUND_DOWN(max, blk_get_request_alignment(blk));
>  }
>  
> +int blk_get_max_hw_iov(BlockBackend *blk)
> +{
> +return MIN_NON_ZERO(blk->root->bs->bl.max_hw_iov,
> +blk->root->bs->bl.max_iov);
> +}
> +
>  int blk_get_max_iov(BlockBackend *blk)
>  {
>  return blk->root->bs->bl.max_iov;
> diff --git a/block/file-posix.c b/block/file-posix.c
> index cb9bffe047..1567edb3d5 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1273,7 +1273,7 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  
>  ret = hdev_get_max_segments(s->fd, &st);
>  if (ret > 0) {
> -bs->bl.max_iov = ret;
> +bs->bl.max_hw_iov = ret;
>  }
>  }
>  }
> diff --git a/block/io.c b/block/io.c
> index a19942718b..f38e7f81d8 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -136,6 +136,7 @@ static void bdrv_merge_limits(BlockLimits *dst, const 
> BlockLimits *src)
>  dst->min_mem_alignment = MAX(dst->min_mem_alignment,
>   src->min_mem_alignment);
>  dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
> +dst->max_hw_iov = MIN_NON_ZERO(dst->max_hw_iov, src->max_hw_iov);
>  }
>  
>  typedef struct BdrvRefreshLimitsState {
> diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> index 665baf900e..0306ccc7b1 100644
> --- a/hw/scsi/scsi-generic.c
> +++ b/hw/scsi/scsi-generic.c
> @@ -180,7 +180,7 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq *r, 
> SCSIDevice *s, int len)
>  page = r->req.cmd.buf[2];
>  if (page == 0xb0) {
>  uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
> -uint32_t max_iov = blk_get_max_iov(s->conf.blk);
> +uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
>  
>  assert(max_transfer);
>  max_transfer = MIN_NON_ZERO(max_transfer, max_iov * 
> qemu_real_host_page_size)
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index f1a54db0f8..c31cbd034a 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -702,6 +702,13 @@ typedef struct BlockLimits {
>   */
>  uint64_t max_hw_transfer;
>  
> +/* Maximal number of scatter/gather elements allowed by the hardware.
> + * Applies whenever transfers to the device bypass the kernel I/O
> + * scheduler, for example with SG_IO.  If larger than max_iov
> + * or if zero, blk_get_max_hw_iov will fall back to max_iov.
> + */
> +int max_hw_iov;
> +
>  

[PATCH] block: introduce max_hw_iov for use in scsi-generic

2021-09-23 Thread Paolo Bonzini
Linux limits the size of iovecs to 1024 (UIO_MAXIOV in the kernel
sources, IOV_MAX in POSIX).  Because of this, on some host adapters
requests with many iovecs are rejected with -EINVAL by the
io_submit() or readv()/writev() system calls.

In fact, the same limit applies to SG_IO as well.  To fix both the
EINVAL and the possible performance issues from using fewer iovecs
than allowed by Linux (some HBAs have max_segments as low as 128),
introduce a separate entry in BlockLimits to hold the max_segments
value from sysfs.  This new limit is used only for SG_IO and clamped
to bs->bl.max_iov anyway, just like max_hw_transfer is clamped to
bs->bl.max_transfer.

Reported-by: Halil Pasic 
Cc: Hanna Reitz 
Cc: Kevin Wolf 
Cc: qemu-bl...@nongnu.org
Fixes: 18473467d5 ("file-posix: try BLKSECTGET on block devices too, do not 
round to power of 2", 2021-06-25)
Signed-off-by: Paolo Bonzini 
---
 block/block-backend.c  | 6 ++
 block/file-posix.c | 2 +-
 block/io.c | 1 +
 hw/scsi/scsi-generic.c | 2 +-
 include/block/block_int.h  | 7 +++
 include/sysemu/block-backend.h | 1 +
 6 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 6140d133e2..ba2b5ebb10 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1986,6 +1986,12 @@ uint32_t blk_get_max_transfer(BlockBackend *blk)
     return ROUND_DOWN(max, blk_get_request_alignment(blk));
 }
 
+int blk_get_max_hw_iov(BlockBackend *blk)
+{
+    return MIN_NON_ZERO(blk->root->bs->bl.max_hw_iov,
+                        blk->root->bs->bl.max_iov);
+}
+
 int blk_get_max_iov(BlockBackend *blk)
 {
     return blk->root->bs->bl.max_iov;
diff --git a/block/file-posix.c b/block/file-posix.c
index cb9bffe047..1567edb3d5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1273,7 +1273,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 
         ret = hdev_get_max_segments(s->fd, &st);
         if (ret > 0) {
-            bs->bl.max_iov = ret;
+            bs->bl.max_hw_iov = ret;
         }
     }
 }
diff --git a/block/io.c b/block/io.c
index a19942718b..f38e7f81d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -136,6 +136,7 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
     dst->min_mem_alignment = MAX(dst->min_mem_alignment,
                                  src->min_mem_alignment);
     dst->max_iov = MIN_NON_ZERO(dst->max_iov, src->max_iov);
+    dst->max_hw_iov = MIN_NON_ZERO(dst->max_hw_iov, src->max_hw_iov);
 }
 
 typedef struct BdrvRefreshLimitsState {
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 665baf900e..0306ccc7b1 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -180,7 +180,7 @@ static int scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s, int len)
         page = r->req.cmd.buf[2];
         if (page == 0xb0) {
             uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
-            uint32_t max_iov = blk_get_max_iov(s->conf.blk);
+            uint32_t max_iov = blk_get_max_hw_iov(s->conf.blk);
 
             assert(max_transfer);
             max_transfer = MIN_NON_ZERO(max_transfer, max_iov * qemu_real_host_page_size)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f1a54db0f8..c31cbd034a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -702,6 +702,13 @@ typedef struct BlockLimits {
      */
     uint64_t max_hw_transfer;
 
+    /* Maximal number of scatter/gather elements allowed by the hardware.
+     * Applies whenever transfers to the device bypass the kernel I/O
+     * scheduler, for example with SG_IO.  If larger than max_iov
+     * or if zero, blk_get_max_hw_iov will fall back to max_iov.
+     */
+    int max_hw_iov;
+
     /* memory alignment, in bytes so that no bounce buffer is needed */
     size_t min_mem_alignment;
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 29d4fdbf63..82bae55161 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -211,6 +211,7 @@ uint32_t blk_get_request_alignment(BlockBackend *blk);
 uint32_t blk_get_max_transfer(BlockBackend *blk);
 uint64_t blk_get_max_hw_transfer(BlockBackend *blk);
 int blk_get_max_iov(BlockBackend *blk);
+int blk_get_max_hw_iov(BlockBackend *blk);
 void blk_set_guest_block_size(BlockBackend *blk, int align);
 void *blk_try_blockalign(BlockBackend *blk, size_t size);
 void *blk_blockalign(BlockBackend *blk, size_t size);
-- 
2.27.0
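
The clamping described in the commit message can be sketched in isolation as follows. This is a standalone illustration, not QEMU code: min_non_zero() is a hypothetical function that mirrors the intent of the MIN_NON_ZERO macro used in the patch (zero means "no limit"), and effective_hw_iov() and cap_transfer_bytes() are hypothetical names for the two computations that appear in blk_get_max_hw_iov() and in the scsi-generic hunk. Reading max_iov * page size as "at most one page per segment" is an interpretation, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Smaller of two limits, where 0 means "no limit set".
 * Hypothetical stand-in for the MIN_NON_ZERO macro in the patch. */
static uint64_t min_non_zero(uint64_t a, uint64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/* Segment limit for SG_IO: the hardware value clamped to the generic
 * iovec limit; falls back to max_iov when max_hw_iov is 0. */
static int effective_hw_iov(int max_hw_iov, int max_iov)
{
    assert(max_hw_iov >= 0 && max_iov >= 0);
    return (int)min_non_zero((uint64_t)max_hw_iov, (uint64_t)max_iov);
}

/* Byte cap for a transfer, bounded by the segment count times the
 * page size (interpreted here as one page per segment). */
static uint64_t cap_transfer_bytes(uint64_t max_transfer, int max_iov,
                                   uint64_t page_size)
{
    assert(max_iov >= 0);
    return min_non_zero(max_transfer, (uint64_t)max_iov * page_size);
}
```

With an HBA reporting max_segments = 128 and a generic limit of 1024, effective_hw_iov(128, 1024) yields 128, while a device without a sysfs value (0) keeps the generic 1024. With 4 KiB pages, 128 segments bound a transfer to 512 KiB.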