Re: [RFC PATCH 9/9] hw/sd: Allow card size not power of 2 again

2021-06-24 Thread Tom Yan
Hi,

On Thu, 24 Jun 2021 at 02:01, Philippe Mathieu-Daudé  wrote:
>
> In commit a9bcedd15a5 ("hw/sd/sdcard: Do not allow invalid SD card
> sizes") we tried to protect us from CVE-2020-13253 by only allowing
> card with power-of-2 sizes. However doing so we disrupted valid user
> cases. As a compromise, allow any card size, but warn only power of 2
> sizes are supported, still suggesting the user how to increase a
> card with 'qemu-img resize'.
>
> Cc: Tom Yan 
> Cc: Warner Losh 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1910586
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/sd/sd.c | 25 +
>  1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/hw/sd/sd.c b/hw/sd/sd.c
> index 9c8dd11bad1..cab4aab1475 100644
> --- a/hw/sd/sd.c
> +++ b/hw/sd/sd.c
> @@ -2131,23 +2131,16 @@ static void sd_realize(DeviceState *dev, Error **errp)
>  blk_size = blk_getlength(sd->blk);
>  if (blk_size > 0 && !is_power_of_2(blk_size)) {
>  int64_t blk_size_aligned = pow2ceil(blk_size);
> -char *blk_size_str;
> +g_autofree char *blk_size_s = size_to_str(blk_size);
> +g_autofree char *blk_size_aligned_s = 
> size_to_str(blk_size_aligned);
>
> -blk_size_str = size_to_str(blk_size);
> -error_setg(errp, "Invalid SD card size: %s", blk_size_str);
> -g_free(blk_size_str);
> -
> -blk_size_str = size_to_str(blk_size_aligned);
> -error_append_hint(errp,
> -  "SD card size has to be a power of 2, e.g. 
> %s.\n"
> -  "You can resize disk images with"
> -  " 'qemu-img resize  '\n"
> -  "(note that this will lose data if you make 
> the"
> -  " image smaller than it currently is).\n",
> -  blk_size_str);
> -g_free(blk_size_str);
> -
> -return;
> +warn_report("SD card size is not a power of 2 (%s). It might 
> work"
> +" but is not supported by QEMU. If possible, resize"
> +" your disk image to %s with:",
> +blk_size_s, blk_size_aligned_s);
> +warn_report(" 'qemu-img resize  '");
> +warn_report("(note that this will lose data if you make the"
> +" image smaller than it currently is).");

Not trying to be picky, but I don't think this is much better. IMHO
it's quite irresponsible to give a warning like that, leaving users in
a state like "Should I use it or not then?", without giving a concrete
reference to what exactly might/would lead to the warned problem.

I really think we should get (/ have gotten) things clear first. What
exactly is the bug we have been talking about here? I mean like, where
does it occur and what's the nature of it.

1. Is it specific to a certain type / model of backend / physical
storage device that will be made use of by qemu for the emulated
storage? (I presume not since you mention about image, unless you
irrationally related/bound the emulated storage type and the physical
storage type together.)

2. Does it have anything to do with a certain flaw in qemu itself?
Like the code that does read/write operation is flawed that it cannot
be handled by a certain *proper* backend device?

3. Or is it actually a bug in a certain driver / firmware blob that
will be used by an *emulated* device in the guest? In that case, can
we emulate another model so that it won't be using the problematic
driver / firmware?

Also, could you provide any information / reference to what the bug
really is? Like a bug report (for the problem itself, not some vague
claim that qemu should workaround the problem)?

>  }
>
>  ret = blk_set_perm(sd->blk, BLK_PERM_CONSISTENT_READ | 
> BLK_PERM_WRITE,
> --
> 2.31.1
>



Re: [PATCH v2 1/5] file-posix: split hdev_refresh_limits from raw_refresh_limits

2020-12-10 Thread Tom Yan
On Wed, 9 Dec 2020 at 21:54, Maxim Levitsky  wrote:
>
> From: Tom Yan 
>
> We can and should get max transfer length and max segments for all host
> devices / cdroms (on Linux).
>
> Also use MIN_NON_ZERO instead when we clamp max transfer length against
> max segments.
>
> Signed-off-by: Tom Yan 
> Signed-off-by: Maxim Levitsky 
> ---
>  block/file-posix.c | 59 +-
>  1 file changed, 43 insertions(+), 16 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index d5fd1dbcd2..226ddbbdad 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1162,6 +1162,12 @@ static void raw_reopen_abort(BDRVReopenState *state)
>
>  static int sg_get_max_transfer_length(int fd)
>  {
> +/*
> + * BLKSECTGET for /dev/sg* character devices incorrectly returns
> + * the max transfer size in bytes (rather than in blocks).
> + * Also note that /dev/sg* doesn't support BLKSSZGET ioctl.
> + */
The second statement should be removed. Also maybe it's better to have
the first one right above the line `return max_bytes;`.
> +
>  #ifdef BLKSECTGET
>  int max_bytes = 0;
>
> @@ -1175,7 +1181,22 @@ static int sg_get_max_transfer_length(int fd)
>  #endif
>  }
>
> -static int sg_get_max_segments(int fd)
> +static int get_max_transfer_length(int fd)
> +{
> +#if defined(BLKSECTGET)
> +int sect = 0;
> +
> +if (ioctl(fd, BLKSECTGET, ) == 0) {
> +return sect << 9;
> +} else {
> +return -errno;
> +}
> +#else
> +return -ENOSYS;
> +#endif
> +}
> +
> +static int get_max_segments(int fd)
>  {
>  #ifdef CONFIG_LINUX
>  char buf[32];
> @@ -1230,23 +1251,29 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
>
> -if (bs->sg) {
> -int ret = sg_get_max_transfer_length(s->fd);
> +raw_probe_alignment(bs, s->fd, errp);
> +bs->bl.min_mem_alignment = s->buf_align;
> +bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
> +}
>
> -if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> -bs->bl.max_transfer = pow2floor(ret);
> -}
> +static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
> +{
> +BDRVRawState *s = bs->opaque;
>
> -ret = sg_get_max_segments(s->fd);
> -if (ret > 0) {
> -bs->bl.max_transfer = MIN(bs->bl.max_transfer,
> -  ret * qemu_real_host_page_size);
> -}
> +int ret = bs->sg ? sg_get_max_transfer_length(s->fd) :
> +   get_max_transfer_length(s->fd);
> +
> +if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> +bs->bl.max_transfer = pow2floor(ret);
>  }
>
> -raw_probe_alignment(bs, s->fd, errp);
> -bs->bl.min_mem_alignment = s->buf_align;
> -bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
> +ret = get_max_segments(s->fd);
> +if (ret > 0) {
> +bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> +   ret * qemu_real_host_page_size);
> +}
> +
> +raw_refresh_limits(bs, errp);
>  }
>
>  static int check_for_dasd(int fd)
> @@ -3601,7 +3628,7 @@ static BlockDriver bdrv_host_device = {
>  .bdrv_co_pdiscard   = hdev_co_pdiscard,
>  .bdrv_co_copy_range_from = raw_co_copy_range_from,
>  .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> -.bdrv_refresh_limits = raw_refresh_limits,
> +.bdrv_refresh_limits = hdev_refresh_limits,
>  .bdrv_io_plug = raw_aio_plug,
>  .bdrv_io_unplug = raw_aio_unplug,
>  .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> @@ -3725,7 +3752,7 @@ static BlockDriver bdrv_host_cdrom = {
>  .bdrv_co_preadv = raw_co_preadv,
>  .bdrv_co_pwritev= raw_co_pwritev,
>  .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> -.bdrv_refresh_limits = raw_refresh_limits,
> +.bdrv_refresh_limits = hdev_refresh_limits,
>  .bdrv_io_plug = raw_aio_plug,
>  .bdrv_io_unplug = raw_aio_unplug,
>  .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> --
> 2.26.2
>



Re: [PATCH 1/5] file-posix: split hdev_refresh_limits from raw_refresh_limits

2020-11-04 Thread Tom Yan
Actually I made a mistake in this. BLKSECTGET (the one in the block
layer) returns the number of "sectors", which is "defined" as 512-byte
block. So we shouldn't use BLKSSZGET here, but simply 512 (1 << 9).
See logical_to_sectors() in sd.h of the kernel.

On Thu, 5 Nov 2020 at 01:32, Maxim Levitsky  wrote:
>
> From: Tom Yan 
>
> We can and should get max transfer length and max segments for all host
> devices / cdroms (on Linux).
>
> Also use MIN_NON_ZERO instead when we clamp max transfer length against
> max segments.
>
> Signed-off-by: Tom Yan 
> Signed-off-by: Maxim Levitsky 
> ---
>  block/file-posix.c | 61 ++
>  1 file changed, 45 insertions(+), 16 deletions(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index c63926d592..6581f41b2b 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1162,6 +1162,12 @@ static void raw_reopen_abort(BDRVReopenState *state)
>
>  static int sg_get_max_transfer_length(int fd)
>  {
> +/*
> + * BLKSECTGET for /dev/sg* character devices incorrectly returns
> + * the max transfer size in bytes (rather than in blocks).
> + * Also note that /dev/sg* doesn't support BLKSSZGET ioctl.
> + */
> +
>  #ifdef BLKSECTGET
>  int max_bytes = 0;
>
> @@ -1175,7 +1181,24 @@ static int sg_get_max_transfer_length(int fd)
>  #endif
>  }
>
> -static int sg_get_max_segments(int fd)
> +static int get_max_transfer_length(int fd)
> +{
> +#if defined(BLKSECTGET) && defined(BLKSSZGET)
> +int sect = 0;
> +int ssz = 0;
> +
> +if (ioctl(fd, BLKSECTGET, ) == 0 &&
> +ioctl(fd, BLKSSZGET, ) == 0) {
> +return sect * ssz;
> +} else {
> +return -errno;
> +}
> +#else
> +return -ENOSYS;
> +#endif
> +}
> +
> +static int get_max_segments(int fd)
>  {
>  #ifdef CONFIG_LINUX
>  char buf[32];
> @@ -1230,23 +1253,29 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
>
> -if (bs->sg) {
> -int ret = sg_get_max_transfer_length(s->fd);
> +raw_probe_alignment(bs, s->fd, errp);
> +bs->bl.min_mem_alignment = s->buf_align;
> +bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
> +}
>
> -if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> -bs->bl.max_transfer = pow2floor(ret);
> -}
> +static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
> +{
> +BDRVRawState *s = bs->opaque;
>
> -ret = sg_get_max_segments(s->fd);
> -if (ret > 0) {
> -bs->bl.max_transfer = MIN(bs->bl.max_transfer,
> -  ret * qemu_real_host_page_size);
> -}
> +int ret = bs->sg ? sg_get_max_transfer_length(s->fd) :
> +   get_max_transfer_length(s->fd);
> +
> +if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
> +bs->bl.max_transfer = pow2floor(ret);
>  }
>
> -raw_probe_alignment(bs, s->fd, errp);
> -bs->bl.min_mem_alignment = s->buf_align;
> -bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
> +ret = get_max_segments(s->fd);
> +if (ret > 0) {
> +bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> +   ret * qemu_real_host_page_size);
> +}
> +
> +raw_refresh_limits(bs, errp);
>  }
>
>  static int check_for_dasd(int fd)
> @@ -3600,7 +3629,7 @@ static BlockDriver bdrv_host_device = {
>  .bdrv_co_pdiscard   = hdev_co_pdiscard,
>  .bdrv_co_copy_range_from = raw_co_copy_range_from,
>  .bdrv_co_copy_range_to  = raw_co_copy_range_to,
> -.bdrv_refresh_limits = raw_refresh_limits,
> +.bdrv_refresh_limits = hdev_refresh_limits,
>  .bdrv_io_plug = raw_aio_plug,
>  .bdrv_io_unplug = raw_aio_unplug,
>  .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> @@ -3724,7 +3753,7 @@ static BlockDriver bdrv_host_cdrom = {
>  .bdrv_co_preadv = raw_co_preadv,
>  .bdrv_co_pwritev= raw_co_pwritev,
>  .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
> -.bdrv_refresh_limits = raw_refresh_limits,
> +.bdrv_refresh_limits = hdev_refresh_limits,
>  .bdrv_io_plug = raw_aio_plug,
>  .bdrv_io_unplug = raw_aio_unplug,
>  .bdrv_attach_aio_context = raw_aio_attach_aio_context,
> --
> 2.26.2
>



Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg

2020-09-06 Thread Tom Yan
Your patch only fixed it for sg devices. S_ISBLK() will never be true
when sg_get_max_segments() is only called in raw_refresh_limits() when
bs->sg is true.

With that said, it's not like we cannot ditch the bs->sg check in
raw_refresh_limits() and switch to S_ISBLK()/S_ISCHR(). (Although not
strictly necessary, I would also do S_ISCHR() for
SG_GET_SG_TABLESIZE.)

I don't know if it's a better approach though. Seems repetitive to me
(and bs->sg is not only used/checked in
file-posix/raw_refresh_limits):
https://github.com/qemu/qemu/blob/v5.1.0/block/file-posix.c#L3390
https://github.com/qemu/qemu/blob/v5.1.0/block/file-posix.c#L3504

On Mon, 7 Sep 2020 at 03:50, Dmitry Fomichev  wrote:
>
>
>
> > -Original Message-
> > From: Qemu-block  > bounces+dmitry.fomichev=wdc@nongnu.org> On Behalf Of Maxim
> > Levitsky
> > Sent: Sunday, September 6, 2020 1:06 PM
> > To: Tom Yan ; ebl...@redhat.com;
> > pbonz...@redhat.com; f...@euphon.net; anie...@linux.vnet.ibm.com;
> > kw...@redhat.com; mre...@redhat.com
> > Cc: qemu-de...@nongnu.org; qemu-block@nongnu.org
> > Subject: Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that
> > actually works with sg
> >
> > On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote:
> > > I don't disagree with your proposal, but the thing is, do we even need
> > > another field/limit for case 1? For example, do we *really* need to
> > > clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and
> > > max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind
> > > of "dynamic" limit?
> >
> > It depends on if we have other block drivers that have IO segment/max
> > transfer sizes,
> > that we need to enforce, other than the SG_IO case.
> >
> > Knowing that the commit that added these limits to file-posix, which I was
> > fixing is relatively
> > new, I guess there are other cases for these limits.
> >
> > I'll check this tomorrow.
> >
> > >
> > > Either way I can add another field (`max_pwrite`, maybe?) to
> > > BlockLimits, as an infrastructure/hint, but what should be switched to
> > > it, and what value should each block driver set it to, is probably
> > > beyond what I can take.
> >
> > Implementation wise, I think I can do this, but I'll wait few days for the
> > feedback on my proposal.
> >
> > Thanks for finding this bug!
>
> There was also 
> https://lore.kernel.org/qemu-devel/20200811225122.17342-2-dmitry.fomic...@wdc.com/
> posted three weeks ago, but, seemingly, it was ignored... sorry if this
> bug is already being fixed in a different way, but I think the fix presented
> in that patch is worth considering.
>
> Very best,
> Dmitry
>
> >
> > Best regards,
> >   Maxim Levitsky
> >
> > >
> > > IMHO my patches should be merged first (they are really just fixing a
> > > regression and some bugs). If anyone finds it necessary and is capable
> > > and willing to fix the insufficiency of BlockLimits, they can do it
> > > afterwards with another patch series as an improvement. I can add a
> > > patch to remove the clamping in nbd/server.c as a perhaps-temporary
> > > measure to address the original issue, if desired.
> > >
> > > On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky 
> > wrote:
> > > > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:
> > > > > Maybe you want to add some condition for this:
> > > > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
> > > > > Or not clamp it at all.
> > > > >
> > > > > On Sun, 6 Sep 2020 at 18:58, Tom Yan  wrote:
> > > > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky
> > appears
> > > > > > to have assumed that the only "SCSI Passthrough" is `-device
> > > > > > scsi-generic`, while the fact is there's also `-device scsi-block`
> > > > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, 
> > > > > > getting
> > > > > > max_sectors is necessary to it (more precisely, hw_max_sectors
> > might
> > > > > > what matters, but BLKSECTGET reports max_sectors, so).
> > > > > >
> > > > > > I'm unsure about how qemu-nbd works, but the commit clearly
> > wasn't the
> > > > > > right approach to fix the original issue it addresses. (It should 
> > > > > > for
> > > > > > example ignore the max_transfer if it will never matter in to it, 

Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg

2020-09-06 Thread Tom Yan
I don't disagree with your proposal, but the thing is, do we even need
another field/limit for case 1? For example, do we *really* need to
clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and
max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind
of "dynamic" limit?

Either way I can add another field (`max_pwrite`, maybe?) to
BlockLimits, as an infrastructure/hint, but what should be switched to
it, and what value should each block driver set it to, is probably
beyond what I can take.

IMHO my patches should be merged first (they are really just fixing a
regression and some bugs). If anyone finds it necessary and is capable
and willing to fix the insufficiency of BlockLimits, they can do it
afterwards with another patch series as an improvement. I can add a
patch to remove the clamping in nbd/server.c as a perhaps-temporary
measure to address the original issue, if desired.

On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky  wrote:
>
> On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:
> > Maybe you want to add some condition for this:
> > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
> > Or not clamp it at all.
> >
> > On Sun, 6 Sep 2020 at 18:58, Tom Yan  wrote:
> > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears
> > > to have assumed that the only "SCSI Passthrough" is `-device
> > > scsi-generic`, while the fact is there's also `-device scsi-block`
> > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> > > max_sectors is necessary to it (more precisely, hw_max_sectors might
> > > what matters, but BLKSECTGET reports max_sectors, so).
> > >
> > > I'm unsure about how qemu-nbd works, but the commit clearly wasn't the
> > > right approach to fix the original issue it addresses. (It should for
> > > example ignore the max_transfer if it will never matter in to it, or
> > > overrides it in certain cases; when I glimpsed over
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how
> > > it could be file-posix problem when it is reporting the right thing,
> > > regardless of whether "removing" the code helps.)
> > >
> > > I don't think we want to "mark" `-device scsi-block` as sg either. It
> > > will probably bring even more unexpected problems, because they are
> > > literally different sets of things behind the scene / in the kernel.
>
> Yes, I did overlook the fact that scsi-block is kind of hybrid passthough 
> device,
> doing SG_IO for some things and regular IO for others.
>
> I don't think that my approach was wrong way to fix the problem, but as you 
> found
> out, abusing 'bs->sg' hack (which I would be very happy to remove completely)
> wasn't a good idea.
> I actualy was aware of scsi-block and that it does SG_IO but it
> was forgotten some where on the way.
>
> So in summary what the problem is and what I think is the right solution:
>
>
> Each qemu block driver exposes block limits and assumes that they are the same
> for two IO interfaces the block driver can expose:
>
> 1. Regular qemu blk_pread/pwrite alike functions
> 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on
>host block devices/sg char devices, and by iscsi
>
> The problem is that these two interfaces can have different block limits.
>
> I don't know about iscsi, but for files, doing regular IO is always unlimited,
> since it passes through the kernel block layer and segemented there on
> demand which is faster that doing it in userspace, while SG_IO is passed as is
> to the underlying SCSI device and lacks this segmentation.
>
> Regardless of how NBD uses these limits, I think that these limits should be 
> correctly
> exposed by the block drivers, and therefore I propose is that each qemu block 
> driver
> would expose a pair of block limits.
> One for the regular block IO, and other for SG_IO.
>
> Then block driver clients (like scsi devices that you mention, nbd, etc)
> can choose which limit to use, depending on which IO api they use.
> The scsi-generic, and scsi-block can use the SG_IO limits,
> while the rest  can use the normal (unlimited for file I/O) limits, including 
> the NBD.
>
> Best regards,
> Maxim Levitsky
>
> > >
> > > On Fri, 4 Sep 2020 at 10:09, Tom Yan  wrote:
> > > > sg devices have different major/minor than their corresponding
> > > > block devices. Using sysfs to get max segments never really worked
> > > > for them.
> > > >
> > > > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > > > which is appa

Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg

2020-09-06 Thread Tom Yan
Maybe you want to add some condition for this:
https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
Or not clamp it at all.

On Sun, 6 Sep 2020 at 18:58, Tom Yan  wrote:
>
> In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears
> to have assumed that the only "SCSI Passthrough" is `-device
> scsi-generic`, while the fact is there's also `-device scsi-block`
> (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> max_sectors is necessary to it (more precisely, hw_max_sectors might
> what matters, but BLKSECTGET reports max_sectors, so).
>
> I'm unsure about how qemu-nbd works, but the commit clearly wasn't the
> right approach to fix the original issue it addresses. (It should for
> example ignore the max_transfer if it will never matter in to it, or
> overrides it in certain cases; when I glimpsed over
> https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how
> it could be file-posix problem when it is reporting the right thing,
> regardless of whether "removing" the code helps.)
>
> I don't think we want to "mark" `-device scsi-block` as sg either. It
> will probably bring even more unexpected problems, because they are
> literally different sets of things behind the scene / in the kernel.
>
> On Fri, 4 Sep 2020 at 10:09, Tom Yan  wrote:
> >
> > sg devices have different major/minor than their corresponding
> > block devices. Using sysfs to get max segments never really worked
> > for them.
> >
> > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > which is apparently equivalent to max segments.
> >
> > Signed-off-by: Tom Yan 
> > ---
> >  block/file-posix.c | 17 -
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 411a23cf99..9e37594145 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
> >  #endif
> >  }
> >
> > +static int sg_get_max_segments(int fd)
> > +{
> > +#ifdef SG_GET_SG_TABLESIZE
> > +long max_segments = 0;
> > +
> > +if (ioctl(fd, SG_GET_SG_TABLESIZE, _segments) == 0) {
> > +return max_segments;
> > +} else {
> > +return -errno;
> > +}
> > +#else
> > +return -ENOSYS;
> > +#endif
> > +}
> > +
> >  static int get_max_transfer_length(int fd)
> >  {
> >  #if defined(BLKSECTGET) && defined(BLKSSZGET)
> > @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, 
> > Error **errp)
> >  bs->bl.max_transfer = pow2floor(ret);
> >  }
> >
> > -ret = get_max_segments(s->fd);
> > +ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
> >  if (ret > 0) {
> >  bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> > ret * qemu_real_host_page_size);
> > --
> > 2.28.0
> >



Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg

2020-09-06 Thread Tom Yan
In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky appears
to have assumed that the only "SCSI Passthrough" is `-device
scsi-generic`, while the fact is there's also `-device scsi-block`
(passthrough without the sg driver). Unlike `-device scsi-hd`, getting
max_sectors is necessary to it (more precisely, hw_max_sectors might
what matters, but BLKSECTGET reports max_sectors, so).

I'm unsure about how qemu-nbd works, but the commit clearly wasn't the
right approach to fix the original issue it addresses. (It should for
example ignore the max_transfer if it will never matter in to it, or
overrides it in certain cases; when I glimpsed over
https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see how
it could be file-posix problem when it is reporting the right thing,
regardless of whether "removing" the code helps.)

I don't think we want to "mark" `-device scsi-block` as sg either. It
will probably bring even more unexpected problems, because they are
literally different sets of things behind the scene / in the kernel.

On Fri, 4 Sep 2020 at 10:09, Tom Yan  wrote:
>
> sg devices have different major/minor than their corresponding
> block devices. Using sysfs to get max segments never really worked
> for them.
>
> Fortunately the sg driver provides an ioctl to get sg_tablesize,
> which is apparently equivalent to max segments.
>
> Signed-off-by: Tom Yan 
> ---
>  block/file-posix.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 411a23cf99..9e37594145 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
>  #endif
>  }
>
> +static int sg_get_max_segments(int fd)
> +{
> +#ifdef SG_GET_SG_TABLESIZE
> +long max_segments = 0;
> +
> +if (ioctl(fd, SG_GET_SG_TABLESIZE, _segments) == 0) {
> +return max_segments;
> +} else {
> +return -errno;
> +}
> +#else
> +return -ENOSYS;
> +#endif
> +}
> +
>  static int get_max_transfer_length(int fd)
>  {
>  #if defined(BLKSECTGET) && defined(BLKSSZGET)
> @@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  bs->bl.max_transfer = pow2floor(ret);
>  }
>
> -ret = get_max_segments(s->fd);
> +ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
>  if (ret > 0) {
>  bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> ret * qemu_real_host_page_size);
> --
> 2.28.0
>



[PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg

2020-09-03 Thread Tom Yan
sg devices have different major/minor than their corresponding
block devices. Using sysfs to get max segments never really worked
for them.

Fortunately the sg driver provides an ioctl to get sg_tablesize,
which is apparently equivalent to max segments.

Signed-off-by: Tom Yan 
---
 block/file-posix.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 411a23cf99..9e37594145 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
+static int sg_get_max_segments(int fd)
+{
+#ifdef SG_GET_SG_TABLESIZE
+long max_segments = 0;
+
+if (ioctl(fd, SG_GET_SG_TABLESIZE, _segments) == 0) {
+return max_segments;
+} else {
+return -errno;
+}
+#else
+return -ENOSYS;
+#endif
+}
+
 static int get_max_transfer_length(int fd)
 {
 #if defined(BLKSECTGET) && defined(BLKSSZGET)
@@ -1268,7 +1283,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_transfer = pow2floor(ret);
 }
 
-ret = get_max_segments(s->fd);
+ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
 if (ret > 0) {
 bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
ret * qemu_real_host_page_size);
-- 
2.28.0




[PATCH RESEND v2 1/2] file-posix: split hdev_refresh_limits from raw_refresh_limits

2020-09-03 Thread Tom Yan
We can and should get max transfer length and max segments for all host
devices / cdroms (on Linux).

Also use MIN_NON_ZERO instead when we clamp max transfer length against
max segments.

Signed-off-by: Tom Yan 
---
v2: fix get_max_transfer_length for non-sg devices; also add/fix
sg_get_max_segments for sg devices
 block/file-posix.c | 56 +-
 1 file changed, 40 insertions(+), 16 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 9a00d4190a..0c9124c8aa 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1178,7 +1178,23 @@ static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
-static int sg_get_max_segments(int fd)
+static int get_max_transfer_length(int fd)
+{
+#if defined(BLKSECTGET) && defined(BLKSSZGET)
+int sect = 0;
+int ssz = 0;
+
+if (ioctl(fd, BLKSECTGET, ) == 0 && ioctl(fd, BLKSSZGET, )) {
+return sect * ssz;
+} else {
+return -errno;
+}
+#else
+return -ENOSYS;
+#endif
+}
+
+static int get_max_segments(int fd)
 {
 #ifdef CONFIG_LINUX
 char buf[32];
@@ -1233,23 +1249,31 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 
-if (bs->sg) {
-int ret = sg_get_max_transfer_length(s->fd);
+raw_probe_alignment(bs, s->fd, errp);
+bs->bl.min_mem_alignment = s->buf_align;
+bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+}
+
+static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+BDRVRawState *s = bs->opaque;
+
+/* BLKSECTGET has always been implemented wrongly in the sg driver
+   and BLKSSZGET has never been supported by it*/
+int ret = bs->sg ? sg_get_max_transfer_length(s->fd) :
+   get_max_transfer_length(s->fd);
 
-if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-bs->bl.max_transfer = pow2floor(ret);
-}
+if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
+bs->bl.max_transfer = pow2floor(ret);
+}
 
-ret = sg_get_max_segments(s->fd);
-if (ret > 0) {
-bs->bl.max_transfer = MIN(bs->bl.max_transfer,
-  ret * qemu_real_host_page_size);
-}
+ret = get_max_segments(s->fd);
+if (ret > 0) {
+bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+   ret * qemu_real_host_page_size);
 }
 
-raw_probe_alignment(bs, s->fd, errp);
-bs->bl.min_mem_alignment = s->buf_align;
-bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+raw_refresh_limits(bs, errp);
 }
 
 static int check_for_dasd(int fd)
@@ -3604,7 +3628,7 @@ static BlockDriver bdrv_host_device = {
 .bdrv_co_pdiscard   = hdev_co_pdiscard,
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
-.bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_refresh_limits = hdev_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
@@ -3728,7 +3752,7 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_preadv = raw_co_preadv,
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
-.bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_refresh_limits = hdev_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
-- 
2.28.0




[PATCH v3 1/2] file-posix: split hdev_refresh_limits from raw_refresh_limits

2020-09-03 Thread Tom Yan
We can and should get max transfer length and max segments for all host
devices / cdroms (on Linux).

Also use MIN_NON_ZERO instead when we clamp max transfer length against
max segments.

Signed-off-by: Tom Yan 
---
v2: fix get_max_transfer_length for non-sg devices; also add/fix
sg_get_max_segments for sg devices
v3: fix typo in get_max_transfer_length
 block/file-posix.c | 57 +-
 1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 9a00d4190a..411a23cf99 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1178,7 +1178,24 @@ static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
-static int sg_get_max_segments(int fd)
+static int get_max_transfer_length(int fd)
+{
+#if defined(BLKSECTGET) && defined(BLKSSZGET)
+int sect = 0;
+int ssz = 0;
+
+if (ioctl(fd, BLKSECTGET, ) == 0 &&
+ioctl(fd, BLKSSZGET, ) == 0) {
+return sect * ssz;
+} else {
+return -errno;
+}
+#else
+return -ENOSYS;
+#endif
+}
+
+static int get_max_segments(int fd)
 {
 #ifdef CONFIG_LINUX
 char buf[32];
@@ -1233,23 +1250,31 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 
-if (bs->sg) {
-int ret = sg_get_max_transfer_length(s->fd);
+raw_probe_alignment(bs, s->fd, errp);
+bs->bl.min_mem_alignment = s->buf_align;
+bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+}
+
+static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+BDRVRawState *s = bs->opaque;
+
+/* BLKSECTGET has always been implemented wrongly in the sg driver
+   and BLKSSZGET has never been supported by it*/
+int ret = bs->sg ? sg_get_max_transfer_length(s->fd) :
+   get_max_transfer_length(s->fd);
 
-if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-bs->bl.max_transfer = pow2floor(ret);
-}
+if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
+bs->bl.max_transfer = pow2floor(ret);
+}
 
-ret = sg_get_max_segments(s->fd);
-if (ret > 0) {
-bs->bl.max_transfer = MIN(bs->bl.max_transfer,
-  ret * qemu_real_host_page_size);
-}
+ret = get_max_segments(s->fd);
+if (ret > 0) {
+bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+   ret * qemu_real_host_page_size);
 }
 
-raw_probe_alignment(bs, s->fd, errp);
-bs->bl.min_mem_alignment = s->buf_align;
-bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+raw_refresh_limits(bs, errp);
 }
 
 static int check_for_dasd(int fd)
@@ -3604,7 +3629,7 @@ static BlockDriver bdrv_host_device = {
 .bdrv_co_pdiscard   = hdev_co_pdiscard,
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
-.bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_refresh_limits = hdev_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
@@ -3728,7 +3753,7 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_preadv = raw_co_preadv,
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
-.bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_refresh_limits = hdev_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
-- 
2.28.0




[PATCH v4 1/2] file-posix: split hdev_refresh_limits from raw_refresh_limits

2020-09-03 Thread Tom Yan
We can and should get max transfer length and max segments for all host
devices / cdroms (on Linux).

Also use MIN_NON_ZERO instead when we clamp max transfer length against
max segments.

Signed-off-by: Tom Yan 
---
v2: fix get_max_transfer_length for non-sg devices; also add/fix
sg_get_max_segments for sg devices
v3: fix typo in get_max_transfer_length
v4: fix accidental .orig inclusion
 block/file-posix.c | 57 +-
 1 file changed, 41 insertions(+), 16 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 9a00d4190a..411a23cf99 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1178,7 +1178,24 @@ static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
-static int sg_get_max_segments(int fd)
+static int get_max_transfer_length(int fd)
+{
+#if defined(BLKSECTGET) && defined(BLKSSZGET)
+int sect = 0;
+int ssz = 0;
+
+if (ioctl(fd, BLKSECTGET, ) == 0 &&
+ioctl(fd, BLKSSZGET, ) == 0) {
+return sect * ssz;
+} else {
+return -errno;
+}
+#else
+return -ENOSYS;
+#endif
+}
+
+static int get_max_segments(int fd)
 {
 #ifdef CONFIG_LINUX
 char buf[32];
@@ -1233,23 +1250,31 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 
-if (bs->sg) {
-int ret = sg_get_max_transfer_length(s->fd);
+raw_probe_alignment(bs, s->fd, errp);
+bs->bl.min_mem_alignment = s->buf_align;
+bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+}
+
+static void hdev_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+BDRVRawState *s = bs->opaque;
+
+/* BLKSECTGET has always been implemented wrongly in the sg driver
+   and BLKSSZGET has never been supported by it*/
+int ret = bs->sg ? sg_get_max_transfer_length(s->fd) :
+   get_max_transfer_length(s->fd);
 
-if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
-bs->bl.max_transfer = pow2floor(ret);
-}
+if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
+bs->bl.max_transfer = pow2floor(ret);
+}
 
-ret = sg_get_max_segments(s->fd);
-if (ret > 0) {
-bs->bl.max_transfer = MIN(bs->bl.max_transfer,
-  ret * qemu_real_host_page_size);
-}
+ret = get_max_segments(s->fd);
+if (ret > 0) {
+bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
+   ret * qemu_real_host_page_size);
 }
 
-raw_probe_alignment(bs, s->fd, errp);
-bs->bl.min_mem_alignment = s->buf_align;
-bs->bl.opt_mem_alignment = MAX(s->buf_align, qemu_real_host_page_size);
+raw_refresh_limits(bs, errp);
 }
 
 static int check_for_dasd(int fd)
@@ -3604,7 +3629,7 @@ static BlockDriver bdrv_host_device = {
 .bdrv_co_pdiscard   = hdev_co_pdiscard,
 .bdrv_co_copy_range_from = raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = raw_co_copy_range_to,
-.bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_refresh_limits = hdev_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
@@ -3728,7 +3753,7 @@ static BlockDriver bdrv_host_cdrom = {
 .bdrv_co_preadv = raw_co_preadv,
 .bdrv_co_pwritev= raw_co_pwritev,
 .bdrv_co_flush_to_disk  = raw_co_flush_to_disk,
-.bdrv_refresh_limits = raw_refresh_limits,
+.bdrv_refresh_limits = hdev_refresh_limits,
 .bdrv_io_plug = raw_aio_plug,
 .bdrv_io_unplug = raw_aio_unplug,
 .bdrv_attach_aio_context = raw_aio_attach_aio_context,
-- 
2.28.0




[PATCH RESEND v2 2/2] file-posix: add sg_get_max_segments that actually works with sg

2020-09-03 Thread Tom Yan
sg devices have different major/minor than their corresponding
block devices. Using sysfs to get max segments never really worked
for them.

Fortunately the sg driver provides an ioctl to get sg_tablesize,
which is apparently equivalent to max segments.

Signed-off-by: Tom Yan 
---
 block/file-posix.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 0c9124c8aa..d2cd9a3362 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1178,6 +1178,21 @@ static int sg_get_max_transfer_length(int fd)
 #endif
 }
 
+static int sg_get_max_segments(int fd)
+{
+#ifdef SG_GET_SG_TABLESIZE
+long max_segments = 0;
+
+if (ioctl(fd, SG_GET_SG_TABLESIZE, _segments) == 0) {
+return max_segments;
+} else {
+return -errno;
+}
+#else
+return -ENOSYS;
+#endif
+}
+
 static int get_max_transfer_length(int fd)
 {
 #if defined(BLKSECTGET) && defined(BLKSSZGET)
@@ -1267,7 +1282,7 @@ static void hdev_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_transfer = pow2floor(ret);
 }
 
-ret = get_max_segments(s->fd);
+ret = bs->sg ? sg_get_max_segments(s->fd) : get_max_segments(s->fd);
 if (ret > 0) {
 bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
ret * qemu_real_host_page_size);
-- 
2.28.0