Re: [Qemu-block] [PATCH v3 03/19] qemu-nbd: Sanity check partition bounds

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
12.01.2019 20:57, Eric Blake wrote:
> When the user requests a partition, we were using data read
> from the disk as disk offsets without a bounds check. We got
> lucky that even when computed offsets are out-of-bounds,
> blk_pread() will gracefully catch the error later (so I don't
> think a malicious image can crash or exploit qemu-nbd, and am
> not treating this as a security flaw), but it's better to
> flag the problem up front than to risk permanent EIO death of
> the block device down the road.  Also, note that the
> partition code blindly overwrites any offset passed in by the
> user; so make the -o/-P combo an error for less confusion.
> 
> This can be tested with nbdkit:
> $ echo hi > file
> $ nbdkit -fv --filter=truncate partitioning file truncate=64k
> 
> Pre-patch:
> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809 &
> $ qemu-io -f raw nbd://localhost:10810
> qemu-io> r -v 0 1
> Disconnect client, due to: Failed to send reply: reading from file failed: 
> Input/output error
> Connection closed
> read failed: Input/output error
> qemu-io> q
> [1]+  Doneqemu-nbd -p 10810 -P 1 -f raw 
> nbd://localhost:10809
> 
> Post-patch:
> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809
> qemu-nbd: Discovered partition 1 at offset 1048576 size 512, but size exceeds 
> file length 65536
> 
> Signed-off-by: Eric Blake 
> ---
> v3: new patch
> ---
>   qemu-nbd.c | 18 +-
>   1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 51b55f2e066..ff4adb9b3eb 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -1013,12 +1013,28 @@ int main(int argc, char **argv)
>   fd_size -= dev_offset;

This reuse of file-size variable as export-size is a source of errors, like your
assert in the following part.

> 
>   if (partition != -1) {
> -ret = find_partition(blk, partition, &dev_offset, &fd_size);
> +off_t limit;
> +
> +if (dev_offset) {
> +error_report("Cannot request partition and offset together");
> +exit(EXIT_FAILURE);
> +}
> +ret = find_partition(blk, partition, &dev_offset, &limit);
>   if (ret < 0) {
>   error_report("Could not find partition %d: %s", partition,
>strerror(-ret));
>   exit(EXIT_FAILURE);
>   }
> +/* partition limits are (32-bit << 9); can't overflow 64 bits */
> +assert(dev_offset >= 0 && dev_offset + limit >= dev_offset);
> +if (dev_offset + limit > fd_size) {
> +error_report("Discovered partition %d at offset %lld size %lld, "
> + "but size exceeds file length %lld", partition,
> + (long long int) dev_offset, (long long int) limit,
> + (long long int) fd_size);
> +exit(EXIT_FAILURE);
> +}
> +fd_size = limit;
>   }
> 
>   export = nbd_export_new(bs, dev_offset, fd_size, export_name,
> 

Ok, anyway, finally I understand the point, thank you for detailed explanation:

Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-block] Virtio-BLK/SCSI write requests and data payload checksums

2019-01-15 Thread Vadim Rozenfeld
On Fri, 2019-01-11 at 09:50 +0100, Peter Lieven wrote:
> > Am 11.01.2019 um 08:14 schrieb Vadim Rozenfeld  > >:
> > 
> > > On Thu, 2019-01-10 at 14:57 +0100, Peter Lieven wrote:
> > > > Am 18.12.18 um 15:45 schrieb Peter Lieven:
> > > > > Am 18.12.18 um 14:15 schrieb Vadim Rozenfeld:
> > > > > Peter, I must be missing something here, but what exactly the
> > > > > problem
> > > > > is?
> > > > 
> > > > The issue is that I see concurrent read requests coming in from
> > > > Windows Guest with vioscsi as driver that
> > > > 
> > > > have the same buffer address from guest memory space. I noticed
> > > > this because I have Data Digests enabled
> > > > 
> > > > and the calculated Digest has a wrong CRC32C. This happens
> > > > because
> > > > the CRC is calculated while or after data
> > > > 
> > > > from a second requests reads into the same buffer. I see this
> > > > only
> > > > with Windows and the request size seems
> > > > 
> > > > to be always 4K. This is either a bug in Windows, vioscsi or a
> > > > mad
> > > > application issuing concurrent reads using
> > > > 
> > 
> > vioscsi driver (just like viostor) doesn't allocate any memory by
> > itself, except for some small pieces, needed to build virtio queues
> > and
> > to maintain some internally used data. 
> > 
> > All SCSI Request Block (SRB) memory related jobs are done with
> > StorPortGetScatterGatherList function which just retrieves SG List
> > from
> > the specified SRB. SG List is a set SG Elements, where each
> > elements
> > is
> > 
> > typedef struct _STOR_SCATTER_GATHER_ELEMENT {
> >  STOR_PHYSICAL_ADDRESS PhysicalAddress;
> >  ULONG Length;
> >  ULONG_PTR Reserved;
> > } STOR_SCATTER_GATHER_ELEMENT,
> > *PSTOR_SCATTER_GATHER_ELEMENT;
> > 
> > PhysicalAddress is a physical address of one single page in a data
> > buffer,
> > Length is almost always 4K (page size). Theoretically it can be
> > less
> > for the first chunk if data buffer is not page aligned, or for the
> > last
> > chunk, if buffer size is not rounded up to the page granularity. 
> > 
> > In my understanding, class driver can allocate SG list from non-
> > paged
> > pool, which can explain why you are observing the same memory
> > address
> > for different read/write SRBs, just because class driver can
> > recycle
> > the same pages from the same non-paged pool.
> 
> Vadim, thanks for the explaination. The issue is that I see the same
> buffer address in parallel requests. But if I unterstand you
> correctly this is likely a windows or application issue,
> 

Hi Peter,

Sorry for delay in response.

The entire call stack before it comes to vioscsi's BuildIo routine can
be quite big and complicated. Like this one below, for example.

# ChildEBP RetAddr  Args to Child  
00 aa7235d0 84168cf3 857e7008 875df7e8 878710e0
vioscsi!VioScsiBuildIo+0x3fd (FPO: [Non-Fpo]) (CONV: stdcall)
[x:\work\dev\internal-kvm-guest-drivers-windows\vioscsi\vioscsi.c @
1269]
01 aa723654 84168986 9cf51f08 00c0 
storport!RaidAdapterPostScatterGatherExecute+0x343 (FPO: [0,25,4])
02 aa723668 8120f418 87871028  8932d090
storport!RaidpAdapterContinueScatterGather+0x36 (FPO: [Non-Fpo])
03 aa7236b0 84168376 88b0c9d8 87871028 9cf51f08
hal!HalBuildScatterGatherListV2+0x288 (FPO: [Non-Fpo])
04 aa72371c 84167e58 a056c0e8 a053de28 aa723788
storport!RaUnitStartIo+0x256 (FPO: [Non-Fpo])
05 aa723798 84167129   a053de28
storport!RaidStartIoPacket+0xcf8 (FPO: [2,23,4])
06 aa7237c0 84166f73 8136c940 a053de28 a056c030
storport!RaidUnitSubmitRequest+0x89 (FPO: [Non-Fpo])
07 aa7237fc 84166e00 8136c940 a056c030 8c65a60f
storport!RaUnitScsiIrp+0x133 (FPO: [Non-Fpo])
08 aa723810 812dd008 a056c030 a053de28 a01e36b8
storport!RaDriverScsiIrp+0x40 (FPO: [Non-Fpo])
09 aa723828 840c46dc 8c6c45e0   nt!IofCallDriver+0x48
(FPO: [Non-Fpo])
0a aa723888 840fb438 8fe138c0 a01ef680 6de948f5
CLASSPNP!SubmitTransferPacket+0x1dc (FPO: [Non-Fpo])
0b aa723928 840fb20a 00c0 003f 
CLASSPNP!ClasspModeSense+0xa0 (FPO: [Non-Fpo])
0c aa72393c 840a87c6 8c6c45e0 8fe138c0 00c0
CLASSPNP!ClassModeSense+0x18 (FPO: [Non-Fpo])
0d aa723978 840a17a3 8c6463c8 8c6c4698 8c6c4698
disk!DiskIoctlGetMediaTypesEx+0x11e (FPO: [Non-Fpo])
0e aa7239a8 840c4ca4 8c6c45e0 8c6463c8 8c6c45e0
disk!DiskDeviceControl+0x223 (FPO: [Non-Fpo])
0f aa723a18 840c37e3 8c6c45e0 8c6463c8 8aac5be4
CLASSPNP!ClassDeviceControlDispatch+0x12c (FPO: [2,21,4])
10 aa723a30 812dd008 8c6c45e0 8c6463c8 8c6463c8
CLASSPNP!ClassGlobalDispatch+0x23 (FPO: [Non-Fpo])
11 aa723a48 84be1bd2 8aac5ae0 8aac5b98 8c646480 nt!IofCallDriver+0x48
(FPO: [Non-Fpo])
12 aa723a68 84be1139 8aac5ae0 8c6463c8 8c6b8f68
partmgr!PmFilterDeviceControl+0x142 (FPO: [Non-Fpo])
13 aa723ab8 812dd008 8aac5ae0 8c6463c8 8c6463c8
partmgr!PmGlobalDispatch+0x39 (FPO: [Non-Fpo])
14 aa723ad4 8155b575 89313c9c 030c 031cf8f0 nt!IofCallDriver+0x48
(FPO: [Non-Fpo])
15 aa723bb8 8155afca   031cf900
nt!IopXxxControlFile

[Qemu-block] qemu-img info ran at 100% CPU for 45 minutes without writing a byte (stopped it)

2019-01-15 Thread james harvey
In IRC, jsnow said this was a known problem, "that lseek is not
reliably fast", but requested I send this to the block list.  I used a
workaround of giving the qcow to a VM not otherwise using it, and ran
dd within the VM.  So, I don't personally need a fix, but I'm happy to
mail the list to give info to help down the bug.

I ran:

# qemu-img convert /var/lib/libvirt/images/win7.qcow2 -O raw
/mnt/tmpqcow/win7.raw

45 minutes later, qemu-img had been running with 100% CPU every time I
checked, and it had allocated the raw file, but still hadn't actually
written a single byte: (note the dd in the VM completed the 90GB in
about 8 minutes)

# ls -la /mnt/tmpqcow
-rw-r--r-- 1 root root 96636764160 Jan 15 18:50 win7.raw
# du /mnt/tmpqcow/win7.raw
0   /mnt/tmpqcow/win7.raw

Both /var/lib/libvirt/images and /mnt/tmpqcow are on the same Samsung
960 EVO NVMe (spec 1900MB/s write, 3200MB/s read.)

$ ls -la /var/lib/libvirt/images/win7.qcow2
-rw--- 1 root root 96251936768 Jan 15 18:45
/var/lib/libvirt/images/win7.qcow2

# qemu-img info /var/lib/libvirt/images/win7.qcow2
image: /var/lib/libvirt/images/win7.qcow2
file format: qcow2
virtual size: 90G (96636764160 bytes)
disk size: 90G
cluster_size: 65536
Format specific information:
compat: 1.1
lazy refcounts: true
refcount bits: 16
corrupt: false

After running this long, I ran strace for 15 seconds, here:
https://termbin.com/gg9k -- It's repeatedly running lseek with
SEEK_DATA and SEEK_HOLE.  The SEEK_HOLE always results in 96251936768,
and SEEK_DATA is different results.

Starting over to get the beginning of an strace, here: https://termbin.com/misf

Running up to date Arch Linux.  Kernel 4.20.1.  qemu 3.1.0.

/var/lib/libvirt/images and /mnt/tmpqcow are on separate BTRFS
volumes, on top of LVM thin volumes.  I'll admit I'm only now seeing
that BTRFS' copy on write is being used on "/var/lib/libvirt/".  I
thought it was turned off.  This is the only VM I'm running as a qcow
or through libvirt - rest are virtio direct through qemu.

Hitting CTRL+C didn't stop qemu-img convert, I had to kill it.



Re: [Qemu-block] [PATCH v3 4/9] qcow2-threads: qcow2_co_do_compress: protect queuing by mutex

2019-01-15 Thread Paolo Bonzini
On 08/01/19 18:06, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  block/qcow2-threads.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
> index 20b2616529..156e0667be 100644
> --- a/block/qcow2-threads.c
> +++ b/block/qcow2-threads.c
> @@ -158,15 +158,19 @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, 
> size_t dest_size,
>  .func = func,
>  };
>  
> +qemu_co_mutex_lock(&s->lock);
>  while (s->nb_compress_threads >= MAX_COMPRESS_THREADS) {
> -qemu_co_queue_wait(&s->compress_wait_queue, NULL);
> +qemu_co_queue_wait(&s->compress_wait_queue, &s->lock);
>  }
> -
>  s->nb_compress_threads++;
> +qemu_co_mutex_unlock(&s->lock);
> +
>  thread_pool_submit_co(pool, qcow2_compress_pool_func, &arg);
> -s->nb_compress_threads--;
>  
> +qemu_co_mutex_lock(&s->lock);
> +s->nb_compress_threads--;
>  qemu_co_queue_next(&s->compress_wait_queue);
> +qemu_co_mutex_unlock(&s->lock);
>  
>  return arg.ret;
>  }
> 

Reviewed-by: Paolo Bonzini 

but, some information would be nice to have in the commit message.

Paolo



[Qemu-block] [PULL v2 46/49] block/sheepdog: Use QEMU_NONSTRING for non NUL-terminated arrays

2019-01-15 Thread Michael S. Tsirkin
From: Philippe Mathieu-Daudé 

GCC 8 added a -Wstringop-truncation warning:

  The -Wstringop-truncation warning added in GCC 8.0 via r254630 for
  bug 81117 is specifically intended to highlight likely unintended
  uses of the strncpy function that truncate the terminating NUL
  character from the source string.

This new warning leads to compilation failures:

CC  block/sheepdog.o
  qemu/block/sheepdog.c: In function 'find_vdi_name':
  qemu/block/sheepdog.c:1239:5: error: 'strncpy' specified bound 256 equals 
destination size [-Werror=stringop-truncation]
   strncpy(buf + SD_MAX_VDI_LEN, tag, SD_MAX_VDI_TAG_LEN);
   ^~
  make: *** [qemu/rules.mak:69: block/sheepdog.o] Error 1

As described previous to the strncpy() calls, the use of strncpy() is
correct here:

/* This pair of strncpy calls ensures that the buffer is zero-filled,
 * which is desirable since we'll soon be sending those bytes, and
 * don't want the send_req to read uninitialized data.
 */
strncpy(buf, filename, SD_MAX_VDI_LEN);
strncpy(buf + SD_MAX_VDI_LEN, tag, SD_MAX_VDI_TAG_LEN);

Use the QEMU_NONSTRING attribute, since this array is intended to store
character arrays that do not necessarily contain a terminating NUL.

Suggested-by: Michael S. Tsirkin 
Reviewed-by: Eric Blake 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Philippe Mathieu-Daudé 
---
 block/sheepdog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 90ab43baa4..ed14f7afbe 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1224,7 +1224,7 @@ static int find_vdi_name(BDRVSheepdogState *s, const char 
*filename,
 SheepdogVdiReq hdr;
 SheepdogVdiRsp *rsp = (SheepdogVdiRsp *)&hdr;
 unsigned int wlen, rlen = 0;
-char buf[SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN];
+char buf[SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN] QEMU_NONSTRING;
 
 fd = connect_to_sdog(s, errp);
 if (fd < 0) {
-- 
MST




[Qemu-block] [PULL v2 04/49] qemu: avoid memory leak while remove disk

2019-01-15 Thread Michael S. Tsirkin
From: Jian Wang 

Memset vhost_dev to zero in the vhost_dev_cleanup function.
This causes dev.vqs to be NULL, so that
vqs does not free up space when calling the g_free function.
This will result in a memory leak. But you can't release vqs
directly in the vhost_dev_cleanup function, because vhost_net
will also call this function, and vhost_net's vqs is assigned by array.
In order to solve this problem, we first save the pointer of vqs,
and release the space of vqs after vhost_dev_cleanup is called.

Signed-off-by: Jian Wang 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/block/vhost-user-blk.c | 7 +--
 hw/scsi/vhost-scsi.c  | 3 ++-
 hw/scsi/vhost-user-scsi.c | 3 ++-
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 1451940845..c3af28fad4 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -250,6 +250,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserBlk *s = VHOST_USER_BLK(vdev);
 VhostUserState *user;
+struct vhost_virtqueue *vqs = NULL;
 int i, ret;
 
 if (!s->chardev.chr) {
@@ -288,6 +289,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
 s->dev.vq_index = 0;
 s->dev.backend_features = 0;
+vqs = s->dev.vqs;
 
 vhost_dev_set_config_notifier(&s->dev, &blk_ops);
 
@@ -314,7 +316,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 vhost_err:
 vhost_dev_cleanup(&s->dev);
 virtio_err:
-g_free(s->dev.vqs);
+g_free(vqs);
 virtio_cleanup(vdev);
 
 vhost_user_cleanup(user);
@@ -326,10 +328,11 @@ static void vhost_user_blk_device_unrealize(DeviceState 
*dev, Error **errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserBlk *s = VHOST_USER_BLK(dev);
+struct vhost_virtqueue *vqs = s->dev.vqs;
 
 vhost_user_blk_set_status(vdev, 0);
 vhost_dev_cleanup(&s->dev);
-g_free(s->dev.vqs);
+g_free(vqs);
 virtio_cleanup(vdev);
 
 if (s->vhost_user) {
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 7f21b4f9d6..61e2e57da9 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -215,6 +215,7 @@ static void vhost_scsi_unrealize(DeviceState *dev, Error 
**errp)
 {
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostSCSICommon *vsc = VHOST_SCSI_COMMON(dev);
+struct vhost_virtqueue *vqs = vsc->dev.vqs;
 
 migrate_del_blocker(vsc->migration_blocker);
 error_free(vsc->migration_blocker);
@@ -223,7 +224,7 @@ static void vhost_scsi_unrealize(DeviceState *dev, Error 
**errp)
 vhost_scsi_set_status(vdev, 0);
 
 vhost_dev_cleanup(&vsc->dev);
-g_free(vsc->dev.vqs);
+g_free(vqs);
 
 virtio_scsi_common_unrealize(dev, errp);
 }
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index 2e1ba4a87b..6728878a52 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -121,12 +121,13 @@ static void vhost_user_scsi_unrealize(DeviceState *dev, 
Error **errp)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VHostUserSCSI *s = VHOST_USER_SCSI(dev);
 VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s);
+struct vhost_virtqueue *vqs = vsc->dev.vqs;
 
 /* This will stop the vhost backend. */
 vhost_user_scsi_set_status(vdev, 0);
 
 vhost_dev_cleanup(&vsc->dev);
-g_free(vsc->dev.vqs);
+g_free(vqs);
 
 virtio_scsi_common_unrealize(dev, errp);
 
-- 
MST




Re: [Qemu-block] [PATCH v3 06/19] qemu-nbd: Avoid strtol open-coding

2019-01-15 Thread Richard W.M. Jones
On Sat, Jan 12, 2019 at 11:57:59AM -0600, Eric Blake wrote:
> Our copy-and-pasted open-coding of strtol handling forgot to
> handle overflow conditions.  Use qemu_strto*() instead.
> 
> In the case of --partition, since we insist on a user-supplied
> partition to be non-zero, we can use 0 rather than -1 for our
> initial value to distinguish when a partition is not being
> served, for slightly more optimal code.
> 
> The error messages for out-of-bounds values are less specific,
> but should not be a terrible loss in quality.
> 
> Signed-off-by: Eric Blake 
> Message-Id: <20181215135324.152629-8-ebl...@redhat.com>
> 
> ---
> v3: rebase to use int64_t rather than off_t [Vladimir]
> ---
>  qemu-nbd.c | 28 +---
>  1 file changed, 9 insertions(+), 19 deletions(-)
> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 96c0829970c..4670b659167 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -546,9 +546,8 @@ int main(int argc, char **argv)
>  };
>  int ch;
>  int opt_ind = 0;
> -char *end;
>  int flags = BDRV_O_RDWR;
> -int partition = -1;
> +int partition = 0;
>  int ret = 0;
>  bool seen_cache = false;
>  bool seen_discard = false;
> @@ -660,9 +659,8 @@ int main(int argc, char **argv)
>  port = optarg;
>  break;
>  case 'o':
> -dev_offset = strtoll (optarg, &end, 0);
> -if (*end) {
> -error_report("Invalid offset `%s'", optarg);
> +if (qemu_strtou64(optarg, NULL, 0, &dev_offset) < 0) {
> +error_report("Invalid offset '%s'", optarg);
>  exit(EXIT_FAILURE);
>  }
>  break;
> @@ -684,13 +682,9 @@ int main(int argc, char **argv)
>  flags &= ~BDRV_O_RDWR;
>  break;
>  case 'P':
> -partition = strtol(optarg, &end, 0);
> -if (*end) {
> -error_report("Invalid partition `%s'", optarg);
> -exit(EXIT_FAILURE);
> -}
> -if (partition < 1 || partition > 8) {
> -error_report("Invalid partition %d", partition);
> +if (qemu_strtoi(optarg, NULL, 0, &partition) < 0 ||
> +partition < 1 || partition > 8) {
> +error_report("Invalid partition '%s'", optarg);
>  exit(EXIT_FAILURE);
>  }
>  break;
> @@ -711,15 +705,11 @@ int main(int argc, char **argv)
>  device = optarg;
>  break;
>  case 'e':
> -shared = strtol(optarg, &end, 0);
> -if (*end) {
> +if (qemu_strtoi(optarg, NULL, 0, &shared) < 0 ||
> +shared < 1) {
>  error_report("Invalid shared device number '%s'", optarg);
>  exit(EXIT_FAILURE);
>  }
> -if (shared < 1) {
> -error_report("Shared device number must be greater than 0");
> -exit(EXIT_FAILURE);
> -}
>  break;
>  case 'f':
>  fmt = optarg;
> @@ -1007,7 +997,7 @@ int main(int argc, char **argv)
>  }
>  fd_size -= dev_offset;
> 
> -if (partition != -1) {
> +if (partition) {
>  int64_t limit;
> 
>  if (dev_offset) {

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/



Re: [Qemu-block] [PATCH v3 03/19] qemu-nbd: Sanity check partition bounds

2019-01-15 Thread Eric Blake
On 1/15/19 12:00 PM, Richard W.M. Jones wrote:
> On Sat, Jan 12, 2019 at 11:57:56AM -0600, Eric Blake wrote:
>> When the user requests a partition, we were using data read
>> from the disk as disk offsets without a bounds check. We got
>> lucky that even when computed offsets are out-of-bounds,
>> blk_pread() will gracefully catch the error later (so I don't
>> think a malicious image can crash or exploit qemu-nbd, and am
>> not treating this as a security flaw), but it's better to
>> flag the problem up front than to risk permanent EIO death of
>> the block device down the road.  Also, note that the
>> partition code blindly overwrites any offset passed in by the
>> user; so make the -o/-P combo an error for less confusion.
>>
>> This can be tested with nbdkit:
>> $ echo hi > file
>> $ nbdkit -fv --filter=truncate partitioning file truncate=64k
>>
>> Pre-patch:
>> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809 &
>> $ qemu-io -f raw nbd://localhost:10810
>> qemu-io> r -v 0 1
>> Disconnect client, due to: Failed to send reply: reading from file failed: 
>> Input/output error
>> Connection closed
>> read failed: Input/output error
>> qemu-io> q
>> [1]+  Doneqemu-nbd -p 10810 -P 1 -f raw 
>> nbd://localhost:10809
>>
>> Post-patch:
>> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809
>> qemu-nbd: Discovered partition 1 at offset 1048576 size 512, but size 
>> exceeds file length 65536
>>
>> Signed-off-by: Eric Blake 
>> ---
>> v3: new patch
>> ---
>>  qemu-nbd.c | 18 +-
>>  1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/qemu-nbd.c b/qemu-nbd.c
>> index 51b55f2e066..ff4adb9b3eb 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
>> @@ -1013,12 +1013,28 @@ int main(int argc, char **argv)
>>  fd_size -= dev_offset;
>>
>>  if (partition != -1) {
>> -ret = find_partition(blk, partition, &dev_offset, &fd_size);
>> +off_t limit;
> 
> I was only vaguely following the other review comments, but off_t does
> seem odd here.  Even though we can assume that _FILE_OFFSET_BITS=64
> maybe we should just use {u,}int64_t?  Does this represent an offset
> in a host file?  Only in the case where qemu-nbd is serving a raw
> format file.  In other cases (say, qcow2) the partition size exists in
> an abstract virtual space.

Yes, later patches switch it to int64_t.  Here, it remains off_t because
find_partition(&limit) still expects off_t.  I suppose in my later
patches, I could use uint64_t limit in spite of keeping int64_t fd_size,
and change the signature of find_partition() accordingly, since I've
decoupled the MBR partition lookup (which is a 41-bit value, always
positive) from the file size checks.

> 
> But it's not a big deal, so:

Yeah, no need to rewrite this patch, since later patches improve the
type anyway (whether or not I stick to int64_t or uint64_t for the
find_partition() code).

> 
> Reviewed-by: Richard W.M. Jones 
> 
> Rich.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PULL 0/2] Block patches

2019-01-15 Thread Peter Maydell
On Mon, 14 Jan 2019 at 16:34, Stefan Hajnoczi  wrote:
>
> The following changes since commit 7260438b7056469610ee166f7abe9ff8a26b8b16:
>
>   Merge remote-tracking branch 
> 'remotes/palmer/tags/riscv-for-master-3.2-part2' into staging (2019-01-14 
> 11:41:43 +)
>
> are available in the Git repository at:
>
>   git://github.com/stefanha/qemu.git tags/block-pull-request
>
> for you to fetch changes up to fef1660132b0f25bf2d275d7f986ddcfe19a4426:
>
>   aio-posix: Fix concurrent aio_poll/set_fd_handler. (2019-01-14 14:09:41 
> +)
>
> 
> Pull request
>
> No user-visible changes.
>
> 
>

Applied, thanks.
-- PMM



Re: [Qemu-block] [Qemu-devel] [PATCH 03/15] hw/ssi: Remove SSIBus from "qemu/typedefs.h"

2019-01-15 Thread Paolo Bonzini
On 15/01/19 13:56, Thomas Huth wrote:
> Yes, agreed, removing things from typedefs.h just to add lots of
> #includes in other files is not really the best idea. I also dropped
> this patch in v2 of my current PULL request because of this reason.
> Phil, I suggest to simply drop this patch. What we maybe could do is to
> split up typedefs.h per subsystem, so that we additionally have a
> hw/arm/typedefs.h, hw/ppc/typedefs.h etc. in the end, then the
> target-specific typedefs would not clutter the common qemu/typedefs.h
> file anymore.

I disagree.  The *inclusions* of target-specific typedefs.h would then
clutter something.

For the (important, mind) case of circular includes, allowing struct in
prototypes pretty much solves the issues, and a subsystem-specific
typedefs.h is another solution according to maintainer's discretion.

In this case, however, keeping subsystems self-contained is a good
reason to apply the patch.  Having SSIBus in typedefs.h means that the
next SSI type has a higher chance of being added to typedefs.h even if
it's not necessary.

Sometimes we need to take patches that seem unnecessary in order to keep
the codebase tidy.  Think of the churn that was the introduction of
hw/SUBSYSTEM.  It was painful and it added all the complexity to the
Makefiles in order to support recursive Makefile.objs in our
not-that-recursive makefiles.  However, it made MAINTAINERS usable and
now, a few years later, few of us would probably be happy to go back to
the flat hw/ directory.

Paolo



Re: [Qemu-block] [PATCH v3 03/19] qemu-nbd: Sanity check partition bounds

2019-01-15 Thread Richard W.M. Jones
On Sat, Jan 12, 2019 at 11:57:56AM -0600, Eric Blake wrote:
> When the user requests a partition, we were using data read
> from the disk as disk offsets without a bounds check. We got
> lucky that even when computed offsets are out-of-bounds,
> blk_pread() will gracefully catch the error later (so I don't
> think a malicious image can crash or exploit qemu-nbd, and am
> not treating this as a security flaw), but it's better to
> flag the problem up front than to risk permanent EIO death of
> the block device down the road.  Also, note that the
> partition code blindly overwrites any offset passed in by the
> user; so make the -o/-P combo an error for less confusion.
> 
> This can be tested with nbdkit:
> $ echo hi > file
> $ nbdkit -fv --filter=truncate partitioning file truncate=64k
> 
> Pre-patch:
> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809 &
> $ qemu-io -f raw nbd://localhost:10810
> qemu-io> r -v 0 1
> Disconnect client, due to: Failed to send reply: reading from file failed: 
> Input/output error
> Connection closed
> read failed: Input/output error
> qemu-io> q
> [1]+  Doneqemu-nbd -p 10810 -P 1 -f raw 
> nbd://localhost:10809
> 
> Post-patch:
> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809
> qemu-nbd: Discovered partition 1 at offset 1048576 size 512, but size exceeds 
> file length 65536
> 
> Signed-off-by: Eric Blake 
> ---
> v3: new patch
> ---
>  qemu-nbd.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 51b55f2e066..ff4adb9b3eb 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -1013,12 +1013,28 @@ int main(int argc, char **argv)
>  fd_size -= dev_offset;
> 
>  if (partition != -1) {
> -ret = find_partition(blk, partition, &dev_offset, &fd_size);
> +off_t limit;

I was only vaguely following the other review comments, but off_t does
seem odd here.  Even though we can assume that _FILE_OFFSET_BITS=64
maybe we should just use {u,}int64_t?  Does this represent an offset
in a host file?  Only in the case where qemu-nbd is serving a raw
format file.  In other cases (say, qcow2) the partition size exists in
an abstract virtual space.

> +if (dev_offset) {
> +error_report("Cannot request partition and offset together");
> +exit(EXIT_FAILURE);
> +}
> +ret = find_partition(blk, partition, &dev_offset, &limit);
>  if (ret < 0) {
>  error_report("Could not find partition %d: %s", partition,
>   strerror(-ret));
>  exit(EXIT_FAILURE);
>  }
> +/* partition limits are (32-bit << 9); can't overflow 64 bits */
> +assert(dev_offset >= 0 && dev_offset + limit >= dev_offset);
> +if (dev_offset + limit > fd_size) {
> +error_report("Discovered partition %d at offset %lld size %lld, "
> + "but size exceeds file length %lld", partition,
> + (long long int) dev_offset, (long long int) limit,
> + (long long int) fd_size);
> +exit(EXIT_FAILURE);
> +}
> +fd_size = limit;
>  }
> 
>  export = nbd_export_new(bs, dev_offset, fd_size, export_name,

But it's not a big deal, so:

Reviewed-by: Richard W.M. Jones 

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v



Re: [Qemu-block] [PATCH v3 01/19] maint: Allow for EXAMPLES in texi2pod

2019-01-15 Thread Richard W.M. Jones
On Sat, Jan 12, 2019 at 11:57:54AM -0600, Eric Blake wrote:
> The next commit will add an EXAMPLES section to qemu-nbd.8;
> for that to work, we need to recognize EXAMPLES in texi2pod.
> We also need to add a dependency from all man pages against
> the generator script, since a change to the generator may
> cause the resulting man page to differ.
> 
> Signed-off-by: Eric Blake 
> ---
> v3: add generic dependency for all man pages in $(DOCS) instead of
> per-line editing [Vladimir]
> ---
>  Makefile| 2 ++
>  scripts/texi2pod.pl | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index a9ac16d94e8..e2d3ace190a 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -857,6 +857,8 @@ docs/interop/qemu-qmp-ref.dvi 
> docs/interop/qemu-qmp-ref.html \
>  docs/interop/qemu-qmp-ref.txt docs/interop/qemu-qmp-ref.7: \
>   docs/interop/qemu-qmp-ref.texi docs/interop/qemu-qmp-qapi.texi
> 
> +$(filter %.1 %.7 %.8,$(DOCS)): scripts/texi2pod.pl

Much simpler, and adds the dependencies for all applicable $(DOCS)
targets as was suggested by Vladimir previously, so:

Reviewed-by: Richard W.M. Jones 

Rich.

>  # Reports/Analysis
> 
>  %/coverage-report.html:
> diff --git a/scripts/texi2pod.pl b/scripts/texi2pod.pl
> index 39ce584a322..839b7917cf7 100755
> --- a/scripts/texi2pod.pl
> +++ b/scripts/texi2pod.pl
> @@ -398,7 +398,7 @@ $sects{NAME} = "$fn \- $tl\n";
>  $sects{FOOTNOTES} .= "=back\n" if exists $sects{FOOTNOTES};
> 
>  for $sect (qw(NAME SYNOPSIS DESCRIPTION OPTIONS ENVIRONMENT FILES
> -   BUGS NOTES FOOTNOTES SEEALSO AUTHOR COPYRIGHT)) {
> +   BUGS NOTES FOOTNOTES EXAMPLES SEEALSO AUTHOR COPYRIGHT)) {
>  if(exists $sects{$sect}) {
>   $head = $sect;
>   $head =~ s/SEEALSO/SEE ALSO/;
> -- 
> 2.20.1

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v



Re: [Qemu-block] [Qemu-devel] [PATCH 03/15] hw/ssi: Remove SSIBus from "qemu/typedefs.h"

2019-01-15 Thread Paolo Bonzini
On 15/01/19 13:28, Markus Armbruster wrote:
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/arm/strongarm.h  | 1 +
>>  include/hw/arm/pxa.h| 1 +
>>  include/hw/ssi/pl022.h  | 1 +
>>  include/hw/ssi/ssi.h| 1 +
>>  include/qemu/typedefs.h | 1 -
>>  5 files changed, 4 insertions(+), 1 deletion(-)
> When typedefs.h changes, we recompile the world, but it pretty much only
> ever changes when new typedefs are added.  Thus, *keeping* a typedef
> there is therefore pretty cheap.
> 
> Nevertheless, we shouldn't keep typedefs there without a real reason.
> Being able to move one away without having to add any new #include
> directives is a strong sign for "no real reason".  I like patches doing
> that.
> 
> What I don't like is adding #include directives just so you can move
> typedefs out of typedefs.h: it slows down the build.  Granted, the four

(three - one added line is the typedef).

> added by this patch are a drop in the bucket.  The point I'm trying to
> make is typedefs.h's purpose: it's for avoiding #include directives.
> Circular ones in particular, but others, too.

In this case, adding ssi.h inclusions to SSI controllers seems to be a
feature, not a bug.

Paolo



Re: [Qemu-block] [PATCH v3 02/19] qemu-nbd: Enhance man page

2019-01-15 Thread Richard W.M. Jones
On Sat, Jan 12, 2019 at 11:57:55AM -0600, Eric Blake wrote:
> Document some useful qemu-nbd command lines. Mention some restrictions
> on particular options, like -p being only for MBR images, or -c/-d
> being Linux-only.  Update some text given the recent change to no
> longer serve oldstyle protocol (missed in commit 7f7dfe2a).  Also,
> consistently use trailing '.' in describing options.
> 
> Signed-off-by: Eric Blake 
>
> ---
> v3: wording improvements, use -t in more examples [Rich]
> ---
>  qemu-nbd.texi | 91 ---
>  1 file changed, 72 insertions(+), 19 deletions(-)
> 
> diff --git a/qemu-nbd.texi b/qemu-nbd.texi
> index 96b1546006a..3f22559beb4 100644
> --- a/qemu-nbd.texi
> +++ b/qemu-nbd.texi
> @@ -10,11 +10,17 @@
> 
>  Export a QEMU disk image using the NBD protocol.
> 
> +Other uses:
> +@itemize
> +@item
> +Bind a /dev/nbdX block device to a QEMU server (on Linux).
> +@end itemize
> +
>  @c man end
> 
>  @c man begin OPTIONS
>  @var{filename} is a disk image filename, or a set of block
> -driver options if @var{--image-opts} is specified.
> +driver options if @option{--image-opts} is specified.
> 
>  @var{dev} is an NBD device.
> 
> @@ -27,24 +33,25 @@ supported. The common object types that it makes sense to 
> define are the
>  keys, and the @code{tls-creds} object, which is used to supply TLS
>  credentials for the qemu-nbd server.
>  @item -p, --port=@var{port}
> -The TCP port to listen on (default @samp{10809})
> +The TCP port to listen on (default @samp{10809}).
>  @item -o, --offset=@var{offset}
> -The offset into the image
> +The offset into the image.
>  @item -b, --bind=@var{iface}
> -The interface to bind to (default @samp{0.0.0.0})
> +The interface to bind to (default @samp{0.0.0.0}).
>  @item -k, --socket=@var{path}
> -Use a unix socket with path @var{path}
> +Use a unix socket with path @var{path}.
>  @item --image-opts
>  Treat @var{filename} as a set of image options, instead of a plain
>  filename. If this flag is specified, the @var{-f} flag should
>  not be used, instead the '@code{format=}' option should be set.
>  @item -f, --format=@var{fmt}
>  Force the use of the block driver for format @var{fmt} instead of
> -auto-detecting
> +auto-detecting.
>  @item -r, --read-only
> -Export the disk as read-only
> +Export the disk as read-only.
>  @item -P, --partition=@var{num}
> -Only expose partition @var{num}
> +Only expose MBR partition @var{num}.  Understands physical partitions
> +1-4 and logical partitions 5-8.
>  @item -B, --bitmap=@var{name}
>  If @var{filename} has a qcow2 persistent bitmap @var{name}, expose
>  that bitmap via the ``qemu:dirty-bitmap:@var{name}'' context
> @@ -52,7 +59,7 @@ accessible through NBD_OPT_SET_META_CONTEXT.
>  @item -s, --snapshot
>  Use @var{filename} as an external snapshot, create a temporary
>  file with backing_file=@var{filename}, redirect the write to
> -the temporary one
> +the temporary one.
>  @item -l, --load-snapshot=@var{snapshot_param}
>  Load an internal snapshot inside @var{filename} and export it
>  as an read-only device, @var{snapshot_param} format is
> @@ -76,19 +83,20 @@ driver-specific optimized zero write commands.  
> @var{detect-zeroes} is one of
>  converts a zero write to an unmap operation and can only be used if
>  @var{discard} is set to @samp{unmap}.  The default is @samp{off}.
>  @item -c, --connect=@var{dev}
> -Connect @var{filename} to NBD device @var{dev}
> +Connect @var{filename} to NBD device @var{dev} (Linux only).
>  @item -d, --disconnect
> -Disconnect the device @var{dev}
> +Disconnect the device @var{dev} (Linux only).
>  @item -e, --shared=@var{num}
> -Allow up to @var{num} clients to share the device (default @samp{1})
> +Allow up to @var{num} clients to share the device (default
> +@samp{1}). Safe for readers, but for now, consistency is not
> +guaranteed between multiple writers.
>  @item -t, --persistent
> -Don't exit on the last connection
> +Don't exit on the last connection.
>  @item -x, --export-name=@var{name}
> -Set the NBD volume export name. This switches the server to use
> -the new style NBD protocol negotiation
> +Set the NBD volume export name (default of a zero-length string).
>  @item -D, --description=@var{description}
>  Set the NBD volume export description, as a human-readable
> -string. Requires the use of @option{-x}
> +string.
>  @item --tls-creds=ID
>  Enable mandatory TLS encryption for the server by setting the ID
>  of the TLS credentials object previously created with the --object
> @@ -96,11 +104,11 @@ option.
>  @item --fork
>  Fork off the server process and exit the parent once the server is running.
>  @item -v, --verbose
> -Display extra debugging information
> +Display extra debugging information.
>  @item -h, --help
> -Display this help and exit
> +Display this help and exit.
>  @item -V, --version
> -Display version information and exit
> +Display version information and exit.
>  @item -T, --trace 
> [[enable=]@var{pat

Re: [Qemu-block] [PATCH v3 04/19] nbd/server: Hoist length check to qemp_nbd_server_add

2019-01-15 Thread Eric Blake
On 1/15/19 10:26 AM, Vladimir Sementsov-Ogievskiy wrote:

>>> @size is not size of the image, but size of the export, so it may be less 
>>> than dev_offset
>>> (qemu-nbd.c do "fd_size -= dev_offset" before "nbd_export_new(bs, 
>>> dev_offset, fd_size, "
>>
>> But the assert is fine because patch 3/19 fixed qemu-nbd.c to never pass
>> in dev_offset larger than size (it fails up front if dev_offset is out
>> of bounds, whether from the -o command line option or from what it read
>> from the partition header with the -P command line option).
>>
> 
> Don't follow =(
> 
> Assume, image size 3M, and we have offset 2M, i.e. -o 2M.
> 
> than in qemu-nbd.c, we have
> 
> fd_size = blk_getlength(blk); # 3M
> ...
> fd_size -= dev_offset; # 1M
> ...
> export = nbd_export_new(bs, dev_offset, fd_size # bs, 2M, 1M
> 
> in nbd_export_new:
> 
> assert(dev_offset <= size); # 2M <= 1M
> 
> fail.

Ouch, you are right. I don't need the assertion in server.c at all;
because all callers pass in a validated size, but the validated size has
no comparable relation to dev_offset.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 03/19] qemu-nbd: Sanity check partition bounds

2019-01-15 Thread Eric Blake
On 1/15/19 10:20 AM, Vladimir Sementsov-Ogievskiy wrote:
> 12.01.2019 20:57, Eric Blake wrote:
>> When the user requests a partition, we were using data read
>> from the disk as disk offsets without a bounds check. We got
>> lucky that even when computed offsets are out-of-bounds,
>> blk_pread() will gracefully catch the error later (so I don't
>> think a malicious image can crash or exploit qemu-nbd, and am
>> not treating this as a security flaw), but it's better to
>> flag the problem up front than to risk permanent EIO death of
>> the block device down the road.  Also, note that the
>> partition code blindly overwrites any offset passed in by the
>> user; so make the -o/-P combo an error for less confusion.
>>
>> This can be tested with nbdkit:
>> $ echo hi > file
>> $ nbdkit -fv --filter=truncate partitioning file truncate=64k
>>
>> Pre-patch:
>> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809 &
>> $ qemu-io -f raw nbd://localhost:10810
>> qemu-io> r -v 0 1
>> Disconnect client, due to: Failed to send reply: reading from file failed: 
>> Input/output error
>> Connection closed
>> read failed: Input/output error
>> qemu-io> q
>> [1]+  Doneqemu-nbd -p 10810 -P 1 -f raw 
>> nbd://localhost:10809
>>
>> Post-patch:
>> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809
>> qemu-nbd: Discovered partition 1 at offset 1048576 size 512, but size 
>> exceeds file length 65536
>>
>> Signed-off-by: Eric Blake 
>> ---
>> v3: new patch
>> ---
>>   qemu-nbd.c | 18 +-
>>   1 file changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/qemu-nbd.c b/qemu-nbd.c
>> index 51b55f2e066..ff4adb9b3eb 100644
>> --- a/qemu-nbd.c
>> +++ b/qemu-nbd.c
>> @@ -1013,12 +1013,28 @@ int main(int argc, char **argv)
>>   fd_size -= dev_offset;
>>
>>   if (partition != -1) {
>> -ret = find_partition(blk, partition, &dev_offset, &fd_size);
>> +off_t limit;
>> +
>> +if (dev_offset) {
>> +error_report("Cannot request partition and offset together");
> 
> hmm, but you still allow to request partition and set -o 0, I think it's 
> better
> to forbid it too.

Not worth the bother. Someday, we may want to permit -o and -P together,
where we first calculate the bounds of the partition from -P, and then
use -o to further restrict an even smaller subset of the image exposed
to the user.  Under that interpretation, the added error makes sense as
a short-term stop-gap that prevents us from doing what we want (since we
did not actually treat a non-zero offset as an offset within the
partition), but an offset of 0 is the same as omitting the offset.  So I
didn't see the point to complicate the code just to check whether -o had
been explicitly supplied with its already-default value.

> 
>> +exit(EXIT_FAILURE);
>> +}
>> +ret = find_partition(blk, partition, &dev_offset, &limit);
>>   if (ret < 0) {
>>   error_report("Could not find partition %d: %s", partition,
>>strerror(-ret));
>>   exit(EXIT_FAILURE);
>>   }
>> +/* partition limits are (32-bit << 9); can't overflow 64 bits */
>> +assert(dev_offset >= 0 && dev_offset + limit >= dev_offset);
> 
> hmm, so these values are read from file and may be whatsoever. Why to assert 
> instead
> of error_report and exit() ?

The assertion is letting the compiler know that we can't get a negative
value.  (off_t)(uint32_t << 9)*2 is always a positive value for 64-bit
off_t, but the compiler doesn't necessarily know that ((off_t)a +
(off_t)b >= a) is going to be true if it doesn't track the range
limitations imposed by find_partition().  You are right that we are
doing an assertion based on data obtained by reading from potentially
malicious data, but the assertion is safe because it will NEVER fail,
but is rather augmenting the compiler's data-type analysis.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 04/19] nbd/server: Hoist length check to qemp_nbd_server_add

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
15.01.2019 18:25, Eric Blake wrote:
> On 1/15/19 3:44 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 12.01.2019 20:57, Eric Blake wrote:
>>> We only had two callers to nbd_export_new; qemu-nbd.c always
>>> passed a valid offset/length pair (because it already checked
>>> the file length, to ensure that offset was in bounds), while
>>> blockdev-nbd always passed 0/-1.  Then nbd_export_new reduces
>>> the size to a multiple of BDRV_SECTOR_SIZE (can only happen
>>> when offset is not sector-aligned, since bdrv_getlength()
>>> currently rounds up), which can result in offset being greater
>>> than the enforced length, but that's not fatal (the server
>>> rejects client requests that exceed the advertised length).
>>>
>>> However, I'm finding it easier to work with the code if we are
>>> consistent on having both callers pass in a valid length, and
>>> just assert that things are sane in nbd_export_new.
>>>
>>> Signed-off-by: Eric Blake 
>>>
> 
>>> +++ b/nbd/server.c
>>> @@ -1499,13 +1499,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, 
>>> off_t dev_offset, off_t size,
>>>exp->name = g_strdup(name);
>>>exp->description = g_strdup(description);
>>>exp->nbdflags = nbdflags;
>>> -exp->size = size < 0 ? blk_getlength(blk) : size;
>>> -if (exp->size < 0) {
>>> -error_setg_errno(errp, -exp->size,
>>> - "Failed to determine the NBD export's length");
>>> -goto fail;
>>> -}
>>> -exp->size -= exp->size % BDRV_SECTOR_SIZE;
>>> +assert(dev_offset <= size);
>>
>> @size is not size of the image, but size of the export, so it may be less 
>> than dev_offset
>> (qemu-nbd.c do "fd_size -= dev_offset" before "nbd_export_new(bs, 
>> dev_offset, fd_size, "
> 
> But the assert is fine because patch 3/19 fixed qemu-nbd.c to never pass
> in dev_offset larger than size (it fails up front if dev_offset is out
> of bounds, whether from the -o command line option or from what it read
> from the partition header with the -P command line option).
> 

Don't follow =(

Assume, image size 3M, and we have offset 2M, i.e. -o 2M.

than in qemu-nbd.c, we have

fd_size = blk_getlength(blk); # 3M
...
fd_size -= dev_offset; # 1M
...
export = nbd_export_new(bs, dev_offset, fd_size # bs, 2M, 1M

in nbd_export_new:

assert(dev_offset <= size); # 2M <= 1M

fail.

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v3 03/19] qemu-nbd: Sanity check partition bounds

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
12.01.2019 20:57, Eric Blake wrote:
> When the user requests a partition, we were using data read
> from the disk as disk offsets without a bounds check. We got
> lucky that even when computed offsets are out-of-bounds,
> blk_pread() will gracefully catch the error later (so I don't
> think a malicious image can crash or exploit qemu-nbd, and am
> not treating this as a security flaw), but it's better to
> flag the problem up front than to risk permanent EIO death of
> the block device down the road.  Also, note that the
> partition code blindly overwrites any offset passed in by the
> user; so make the -o/-P combo an error for less confusion.
> 
> This can be tested with nbdkit:
> $ echo hi > file
> $ nbdkit -fv --filter=truncate partitioning file truncate=64k
> 
> Pre-patch:
> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809 &
> $ qemu-io -f raw nbd://localhost:10810
> qemu-io> r -v 0 1
> Disconnect client, due to: Failed to send reply: reading from file failed: 
> Input/output error
> Connection closed
> read failed: Input/output error
> qemu-io> q
> [1]+  Doneqemu-nbd -p 10810 -P 1 -f raw 
> nbd://localhost:10809
> 
> Post-patch:
> $ qemu-nbd -p 10810 -P 1 -f raw nbd://localhost:10809
> qemu-nbd: Discovered partition 1 at offset 1048576 size 512, but size exceeds 
> file length 65536
> 
> Signed-off-by: Eric Blake 
> ---
> v3: new patch
> ---
>   qemu-nbd.c | 18 +-
>   1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 51b55f2e066..ff4adb9b3eb 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -1013,12 +1013,28 @@ int main(int argc, char **argv)
>   fd_size -= dev_offset;
> 
>   if (partition != -1) {
> -ret = find_partition(blk, partition, &dev_offset, &fd_size);
> +off_t limit;
> +
> +if (dev_offset) {
> +error_report("Cannot request partition and offset together");

hmm, but you still allow to request partition and set -o 0, I think it's better
to forbid it too.

> +exit(EXIT_FAILURE);
> +}
> +ret = find_partition(blk, partition, &dev_offset, &limit);
>   if (ret < 0) {
>   error_report("Could not find partition %d: %s", partition,
>strerror(-ret));
>   exit(EXIT_FAILURE);
>   }
> +/* partition limits are (32-bit << 9); can't overflow 64 bits */
> +assert(dev_offset >= 0 && dev_offset + limit >= dev_offset);

hmm, so these values are read from file and may be whatsoever. Why to assert 
instead
of error_report and exit() ?

> +if (dev_offset + limit > fd_size) {
> +error_report("Discovered partition %d at offset %lld size %lld, "
> + "but size exceeds file length %lld", partition,
> + (long long int) dev_offset, (long long int) limit,
> + (long long int) fd_size);
> +exit(EXIT_FAILURE);
> +}
> +fd_size = limit;
>   }
> 
>   export = nbd_export_new(bs, dev_offset, fd_size, export_name,
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v3 10/19] nbd/client: Split out nbd_send_one_meta_context()

2019-01-15 Thread Eric Blake
On 1/15/19 9:52 AM, Vladimir Sementsov-Ogievskiy wrote:

 -nbd_opt_meta_request(const char *context, const char *export) "Requesting 
 to set meta context %s for export %s"
 +nbd_opt_meta_request(const char *optname, const char *context, const char 
 *export) "Requesting %s %s for export %s"
>>>
>>> Hmm, you forget nbd_opt_lookup()
>>
>> Where? The updated trace point at [2] has nbd_opt_lookup() for
>> determining optname.
> 
> Your morning is my evening)
> 
> Yes, it's ok, and in the next patch too. But you usually trace both number + 
> lookup-string,
> and here only string. Is there a reason to?

I didn't see any added value in tracing the number, since the trace
point can only happen on one of two opt values (the number is more
important when the opt value can be anything, including one that does
not have a corresponding string name compiled in).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 10/19] nbd/client: Split out nbd_send_one_meta_context()

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
15.01.2019 18:55, Eric Blake wrote:
> On 1/15/19 9:52 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
> -nbd_opt_meta_request(const char *context, const char *export) 
> "Requesting to set meta context %s for export %s"
> +nbd_opt_meta_request(const char *optname, const char *context, const 
> char *export) "Requesting %s %s for export %s"

 Hmm, you forget nbd_opt_lookup()
>>>
>>> Where? The updated trace point at [2] has nbd_opt_lookup() for
>>> determining optname.
>>
>> Your morning is my evening)
>>
>> Yes, it's ok, and in the next patch too. But you usually trace both number + 
>> lookup-string,
>> and here only string. Is there a reason to?
> 
> I didn't see any added value in tracing the number, since the trace
> point can only happen on one of two opt values (the number is more
> important when the opt value can be anything, including one that does
> not have a corresponding string name compiled in).
> 

OK, agreed.

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v3 11/19] nbd/client: Split out nbd_receive_one_meta_context()

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
15.01.2019 18:50, Eric Blake wrote:
> On 1/15/19 9:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 12.01.2019 20:58, Eric Blake wrote:
>>> Extract portions of nbd_negotiate_simple_meta_context() to
>>> a new function nbd_receive_one_meta_context() that copies the
>>> pattern of nbd_receive_list() for performing the argument
>>> validation of one reply.  The error message when the server
>>> replies with more than one context changes slightly, but
>>> that shouldn't happen in the common case.
>>>
>>> Signed-off-by: Eric Blake 
>>> Message-Id: <20181215135324.152629-15-ebl...@redhat.com>
>>>
>>> ---
>>> v3: rebase, without changing into a loop
>>> ---
> 
>>>
>>> +static int nbd_receive_one_meta_context(QIOChannel *ioc,
>>> +uint32_t opt,
>>> +char **name,
>>> +uint32_t *id,
>>> +Error **errp)
>>> +{
> 
>>> +reply.length -= sizeof(local_id);
>>> +local_name = g_malloc(reply.length + 1);
>>> +if (nbd_read(ioc, local_name, reply.length, errp) < 0) {
>>> +g_free(local_name);
>>> +return -1;
>>> +}
>>> +local_name[reply.length] = '\0';
>>> +trace_nbd_opt_meta_reply(nbd_opt_lookup(opt), local_name, local_id);
>>> +
> 
> [1]
> 
>>> @@ -746,36 +796,20 @@ static int 
>>> nbd_negotiate_simple_meta_context(QIOChannel *ioc,
>>>return -1;
>>>}
>>>g_free(name);
>>> -
>>> -trace_nbd_opt_meta_reply(context, info->context_id);
>>>received = true;
>>>
>>> -/* receive NBD_REP_ACK */
>>> -if (nbd_receive_option_reply(ioc, NBD_OPT_SET_META_CONTEXT, &reply,
>>> - errp) < 0)
>>> -{
>>> +ret = nbd_receive_one_meta_context(ioc, NBD_OPT_SET_META_CONTEXT,
>>> +   &name, &info->context_id, errp);
>>
>> indent, and, no reasons to use variables instead of NULL, NULL, as success 
>> here is an error
>> path anyway
> 
> Not sure how I missed the indent, and good point about not needing &name
> or &info->context_id on this second call.
> 
>>> +++ b/nbd/trace-events
>>> @@ -13,7 +13,7 @@ nbd_receive_query_exports_success(const char *wantname) 
>>> "Found desired export na
>>>nbd_receive_starttls_new_client(void) "Setting up TLS"
>>>nbd_receive_starttls_tls_handshake(void) "Starting TLS handshake"
>>>nbd_opt_meta_request(const char *optname, const char *context, const 
>>> char *export) "Requesting %s %s for export %s"
>>> -nbd_opt_meta_reply(const char *context, uint32_t id) "Received mapping of 
>>> context %s to id %" PRIu32
>>> +nbd_opt_meta_reply(const char *optname, const char *context, uint32_t id) 
>>> "Received %s mapping of %s to id %" PRIu32
>>
>> nbd_opt_lookup
> 
> Added at [1], so I'm not sure what you are still wanting here.

Yes, my fault, it's OK.

> 
>>
>>>nbd_receive_negotiate(void *tlscreds, const char *hostname) "Receiving 
>>> negotiation tlscreds=%p hostname=%s"
>>>nbd_receive_negotiate_magic(uint64_t magic) "Magic is 0x%" PRIx64
>>>nbd_receive_negotiate_server_flags(uint32_t globalflags) "Global flags 
>>> are 0x%" PRIx32
>>>
>>
>> With at least nbd_opt_lookup:
>> Reviewed-by: Vladimir Sementsov-Ogievskiy 
>>
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v3 10/19] nbd/client: Split out nbd_send_one_meta_context()

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
15.01.2019 18:44, Eric Blake wrote:
> On 1/15/19 7:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 12.01.2019 20:58, Eric Blake wrote:
>>> Refactor nbd_negotiate_simple_meta_context() to pull out the
>>> code that can be reused to send a LIST request for 0 or 1 query.
>>> No semantic change.  The old comment about 'sizeof(uint32_t)'
>>> being equivalent to '/* number of queries */' is no longer
>>> needed, now that we are computing 'sizeof(queries)' instead.
>>>
>>> Signed-off-by: Eric Blake 
>>> Message-Id: <20181215135324.152629-14-ebl...@redhat.com>
>>> Reviewed-by: Richard W.M. Jones 
>>>
>>> ---
>>> v3: Improve commit message [Rich], formatting tweak [checkpatch],
>>> rebase to dropped patch
>>> ---
> 
>>> +++ b/nbd/client.c
>>> @@ -629,6 +629,49 @@ static QIOChannel *nbd_receive_starttls(QIOChannel 
>>> *ioc,
>>>return QIO_CHANNEL(tioc);
>>>}
>>>
>>> +/*
>>> + * nbd_send_one_meta_context:
>>> + * Send 0 or 1 set/list meta context queries.
>>> + * Return 0 on success, -1 with errp set for any error
>>> + */
>>> +static int nbd_send_one_meta_context(QIOChannel *ioc,
>>> + uint32_t opt,
>>> + const char *export,
>>> + const char *query,
>>> + Error **errp)
>>> +{
>>> +int ret;
>>> +uint32_t export_len = strlen(export);
>>> +uint32_t queries = !!query;
>>
>> n_ or nb_ prefix may make it more clear
>>
>>> +uint32_t context_len = 0;
>>> +uint32_t data_len;
>>> +char *data;
>>> +char *p;
>>> +
>>> +data_len = sizeof(export_len) + export_len + sizeof(queries);
> 
> [1]
> 
>>> +if (query) {
>>> +context_len = strlen(query);
>>
>> looks like it then should be query_len
> 
> Sure, an alternative name may make things easier to read (I think this
> is somewhat fallout from my rebase churn, where earlier versions of the
> patch shared as much code with NBD_OPT_SET_META_CONTEXT, and that code
> used the name 'context' rather than 'query'; but now that I've split
> things to add a new function, it doesn't have to maintain the old naming).
> 
>>
>>> +data_len += sizeof(context_len) + context_len;
>>> +} else {
>>> +assert(opt == NBD_OPT_LIST_META_CONTEXT);
>>> +}
>>> +data = g_malloc(data_len);
>>> +p = data;
>>
>> may use p = data = g_malloc
> 
> Will do.
> 
>>
>>> +
>>> +trace_nbd_opt_meta_request(nbd_opt_lookup(opt), query ?: "(all)", 
>>> export);
> 
> [2]
> 
>>> +stl_be_p(p, export_len);
>>> +memcpy(p += sizeof(export_len), export, export_len);
>>> +stl_be_p(p += export_len, queries);
>>> +if (query) {
>>> +stl_be_p(p += sizeof(uint32_t), context_len);
>>
>> :), aha, please, s/uint32_t/queries, as you promised
> 
> I did up at [1], but should indeed do so again here.
> 
>>
>> Hmm, its my code. It's hard to read and not very comfortable to maintain..
>>
>> In block/nbd-client.c we have
>> payload_advance* functions, to read such formatted data, I think, it should 
>> be
>> good to make something like this for server-part. Not about these series, of 
>> course.
> 
> Yes, I wouldn't object to even more cleanups to make the code easier to
> maintain, but not as part of this series.
> 
>>
>> Interesting, troubles around "don't use be64_to_cpuS, use only be64_to_cpu",
>> do they apply somehow to *_be_p functions family?
> 
> The problem was in the in-place conversion routines where the
> destination type was strongly typed to something wider than char*.  This
> is not an inplace conversion, because st*_p takes a raw pointer
> interpreted as char* as its destination.  So no, clang does not have
> problems with this construct.
> 
>>
>>> +memcpy(p += sizeof(context_len), query, context_len);
>>> +}
>>> +
>>> +ret = nbd_send_option_request(ioc, opt, data_len, data, errp);
>>> +g_free(data);
>>> +return ret;
>>> +}
>>> +
>>>/* nbd_negotiate_simple_meta_context:
>>> * Request the server to set the meta context for export @info->name
>>> * using @info->x_dirty_bitmap with a fallback to "base:allocation",
>>> @@ -653,26 +696,10 @@ static int 
>>> nbd_negotiate_simple_meta_context(QIOChannel *ioc,
>>>NBDOptionReply reply;
>>>const char *context = info->x_dirty_bitmap ?: "base:allocation";
>>>bool received = false;
>>> -uint32_t export_len = strlen(info->name);
>>> -uint32_t context_len = strlen(context);
>>> -uint32_t data_len = sizeof(export_len) + export_len +
>>> -sizeof(uint32_t) + /* number of queries */
>>> -sizeof(context_len) + context_len;
>>> -char *data = g_malloc(data_len);
>>> -char *p = data;
>>>
>>> -trace_nbd_opt_meta_request(context, info->name);
>>> -stl_be_p(p, export_len);
>>> -memcpy(p += sizeof(export_len), info->name, export_len);
>>> -stl_be_p(p += export_len, 1);
>>> -stl_be_p(p += sizeof(uint32_t), cont

Re: [Qemu-block] [PATCH v3 11/19] nbd/client: Split out nbd_receive_one_meta_context()

2019-01-15 Thread Eric Blake
On 1/15/19 9:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> 12.01.2019 20:58, Eric Blake wrote:
>> Extract portions of nbd_negotiate_simple_meta_context() to
>> a new function nbd_receive_one_meta_context() that copies the
>> pattern of nbd_receive_list() for performing the argument
>> validation of one reply.  The error message when the server
>> replies with more than one context changes slightly, but
>> that shouldn't happen in the common case.
>>
>> Signed-off-by: Eric Blake 
>> Message-Id: <20181215135324.152629-15-ebl...@redhat.com>
>>
>> ---
>> v3: rebase, without changing into a loop
>> ---

>> 
>> +static int nbd_receive_one_meta_context(QIOChannel *ioc,
>> +uint32_t opt,
>> +char **name,
>> +uint32_t *id,
>> +Error **errp)
>> +{

>> +reply.length -= sizeof(local_id);
>> +local_name = g_malloc(reply.length + 1);
>> +if (nbd_read(ioc, local_name, reply.length, errp) < 0) {
>> +g_free(local_name);
>> +return -1;
>> +}
>> +local_name[reply.length] = '\0';
>> +trace_nbd_opt_meta_reply(nbd_opt_lookup(opt), local_name, local_id);
>> +

[1]

>> @@ -746,36 +796,20 @@ static int 
>> nbd_negotiate_simple_meta_context(QIOChannel *ioc,
>>   return -1;
>>   }
>>   g_free(name);
>> -
>> -trace_nbd_opt_meta_reply(context, info->context_id);
>>   received = true;
>>
>> -/* receive NBD_REP_ACK */
>> -if (nbd_receive_option_reply(ioc, NBD_OPT_SET_META_CONTEXT, &reply,
>> - errp) < 0)
>> -{
>> +ret = nbd_receive_one_meta_context(ioc, NBD_OPT_SET_META_CONTEXT,
>> +   &name, &info->context_id, errp);
> 
> indent, and, no reasons to use variables instead of NULL, NULL, as success 
> here is an error
> path anyway

Not sure how I missed the indent, and good point about not needing &name
or &info->context_id on this second call.

>> +++ b/nbd/trace-events
>> @@ -13,7 +13,7 @@ nbd_receive_query_exports_success(const char *wantname) 
>> "Found desired export na
>>   nbd_receive_starttls_new_client(void) "Setting up TLS"
>>   nbd_receive_starttls_tls_handshake(void) "Starting TLS handshake"
>>   nbd_opt_meta_request(const char *optname, const char *context, const char 
>> *export) "Requesting %s %s for export %s"
>> -nbd_opt_meta_reply(const char *context, uint32_t id) "Received mapping of 
>> context %s to id %" PRIu32
>> +nbd_opt_meta_reply(const char *optname, const char *context, uint32_t id) 
>> "Received %s mapping of %s to id %" PRIu32
> 
> nbd_opt_lookup

Added at [1], so I'm not sure what you are still wanting here.

> 
>>   nbd_receive_negotiate(void *tlscreds, const char *hostname) "Receiving 
>> negotiation tlscreds=%p hostname=%s"
>>   nbd_receive_negotiate_magic(uint64_t magic) "Magic is 0x%" PRIx64
>>   nbd_receive_negotiate_server_flags(uint32_t globalflags) "Global flags are 
>> 0x%" PRIx32
>>
> 
> With at least nbd_opt_lookup:
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 14/19] nbd/client: Pull out oldstyle size determination

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
15.01.2019 18:35, Vladimir Sementsov-Ogievskiy wrote:
> It must be set to true before calling, and nbd_start_negotiate may set it to 
> true.

set it to false, I wanted to say

-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v3 10/19] nbd/client: Split out nbd_send_one_meta_context()

2019-01-15 Thread Eric Blake
On 1/15/19 7:18 AM, Vladimir Sementsov-Ogievskiy wrote:
> 12.01.2019 20:58, Eric Blake wrote:
>> Refactor nbd_negotiate_simple_meta_context() to pull out the
>> code that can be reused to send a LIST request for 0 or 1 query.
>> No semantic change.  The old comment about 'sizeof(uint32_t)'
>> being equivalent to '/* number of queries */' is no longer
>> needed, now that we are computing 'sizeof(queries)' instead.
>>
>> Signed-off-by: Eric Blake 
>> Message-Id: <20181215135324.152629-14-ebl...@redhat.com>
>> Reviewed-by: Richard W.M. Jones 
>>
>> ---
>> v3: Improve commit message [Rich], formatting tweak [checkpatch],
>> rebase to dropped patch
>> ---

>> +++ b/nbd/client.c
>> @@ -629,6 +629,49 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>>   return QIO_CHANNEL(tioc);
>>   }
>>
>> +/*
>> + * nbd_send_one_meta_context:
>> + * Send 0 or 1 set/list meta context queries.
>> + * Return 0 on success, -1 with errp set for any error
>> + */
>> +static int nbd_send_one_meta_context(QIOChannel *ioc,
>> + uint32_t opt,
>> + const char *export,
>> + const char *query,
>> + Error **errp)
>> +{
>> +int ret;
>> +uint32_t export_len = strlen(export);
>> +uint32_t queries = !!query;
> 
> n_ or nb_ prefix may make it more clear
> 
>> +uint32_t context_len = 0;
>> +uint32_t data_len;
>> +char *data;
>> +char *p;
>> +
>> +data_len = sizeof(export_len) + export_len + sizeof(queries);

[1]

>> +if (query) {
>> +context_len = strlen(query);
> 
> looks like it then should be query_len

Sure, an alternative name may make things easier to read (I think this
is somewhat fallout from my rebase churn, where earlier versions of the
patch shared as much code with NBD_OPT_SET_META_CONTEXT, and that code
used the name 'context' rather than 'query'; but now that I've split
things to add a new function, it doesn't have to maintain the old naming).

> 
>> +data_len += sizeof(context_len) + context_len;
>> +} else {
>> +assert(opt == NBD_OPT_LIST_META_CONTEXT);
>> +}
>> +data = g_malloc(data_len);
>> +p = data;
> 
> may use p = data = g_malloc

Will do.

> 
>> +
>> +trace_nbd_opt_meta_request(nbd_opt_lookup(opt), query ?: "(all)", 
>> export);

[2]

>> +stl_be_p(p, export_len);
>> +memcpy(p += sizeof(export_len), export, export_len);
>> +stl_be_p(p += export_len, queries);
>> +if (query) {
>> +stl_be_p(p += sizeof(uint32_t), context_len);
> 
> :), aha, please, s/uint32_t/queries, as you promised

I did up at [1], but should indeed do so again here.

> 
> Hmm, its my code. It's hard to read and not very comfortable to maintain..
> 
> In block/nbd-client.c we have
> payload_advance* functions, to read such formatted data, I think, it should be
> good to make something like this for server-part. Not about these series, of 
> course.

Yes, I wouldn't object to even more cleanups to make the code easier to
maintain, but not as part of this series.

> 
> Interesting, troubles around "don't use be64_to_cpuS, use only be64_to_cpu",
> do they apply somehow to *_be_p functions family?

The problem was in the in-place conversion routines where the
destination type was strongly typed to something wider than char*.  This
is not an inplace conversion, because st*_p takes a raw pointer
interpreted as char* as its destination.  So no, clang does not have
problems with this construct.

> 
>> +memcpy(p += sizeof(context_len), query, context_len);
>> +}
>> +
>> +ret = nbd_send_option_request(ioc, opt, data_len, data, errp);
>> +g_free(data);
>> +return ret;
>> +}
>> +
>>   /* nbd_negotiate_simple_meta_context:
>>* Request the server to set the meta context for export @info->name
>>* using @info->x_dirty_bitmap with a fallback to "base:allocation",
>> @@ -653,26 +696,10 @@ static int 
>> nbd_negotiate_simple_meta_context(QIOChannel *ioc,
>>   NBDOptionReply reply;
>>   const char *context = info->x_dirty_bitmap ?: "base:allocation";
>>   bool received = false;
>> -uint32_t export_len = strlen(info->name);
>> -uint32_t context_len = strlen(context);
>> -uint32_t data_len = sizeof(export_len) + export_len +
>> -sizeof(uint32_t) + /* number of queries */
>> -sizeof(context_len) + context_len;
>> -char *data = g_malloc(data_len);
>> -char *p = data;
>>
>> -trace_nbd_opt_meta_request(context, info->name);
>> -stl_be_p(p, export_len);
>> -memcpy(p += sizeof(export_len), info->name, export_len);
>> -stl_be_p(p += export_len, 1);
>> -stl_be_p(p += sizeof(uint32_t), context_len);
>> -memcpy(p += sizeof(context_len), context, context_len);
>> -
>> -ret = nbd_send_option_request(ioc, NBD_OPT_SET_META_CONTEXT, data_len, 
>> data,
>> -  errp);

Re: [Qemu-block] [PATCH v3 05/19] nbd/server: Favor [u]int64_t over off_t

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
15.01.2019 18:33, Eric Blake wrote:
> On 1/15/19 4:19 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 12.01.2019 20:57, Eric Blake wrote:
>>> Although our compile-time environment is set up so that we always
>>> support long files with 64-bit off_t, we have no guarantee whether
>>> off_t is the same type as int64_t.  This requires casts when
>>> printing values, and prevents us from directly using qemu_strtoi64().
>>> Let's just flip to [u]int64_t (signed for length, because we have to
>>> detect failure of blk_getlength()
>>
>> we have not, after previous patch
> 
> nbd/server.c no longer has to check for blk_getlength() failures, but
> blockdev-nbd.c and qemu-nbd.c still do.  Since the callers have to use
> an int64_t type for the length as part of their error checking, it's
> easier to accept an int64_t length to nbd_export_new(), even if
> nbd_export_new() could also use an unsigned type.
> 
>>
>> and because off_t was signed;
>>> unsigned for offset because it lets us simplify some math without
>>> having to worry about signed overflow).
>>>
>>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>>> Signed-off-by: Eric Blake 
>>>
>>> ---
>>> v3: new patch
>>> ---
>>>include/block/nbd.h |  4 ++--
>>>nbd/server.c| 14 +++---
>>>qemu-nbd.c  | 26 ++
>>>3 files changed, 19 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>>> index 1971b557896..0f252829376 100644
>>> --- a/include/block/nbd.h
>>> +++ b/include/block/nbd.h
>>> @@ -294,8 +294,8 @@ int nbd_errno_to_system_errno(int err);
>>>typedef struct NBDExport NBDExport;
>>>typedef struct NBDClient NBDClient;
>>>
>>> -NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t 
>>> size,
>>> -  const char *name, const char *description,
>>> +NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>> +  int64_t size, const char *name, const char *desc,
>>
>> in previous patch you drop use of negative @size parameter, so it looks 
>> better
>> to use unsigned for size like for offset
> 
> You can't have a size larger than 2^63; but an unsigned type holds
> nearly 2^64.  I prefer a signed size, for the same reason that off_t is
> signed, in that checking for a negative size is easier to rule out sizes
> that are too large.
> 
> 
>>>
>>>static int find_partition(BlockBackend *blk, int partition,
>>> -  off_t *offset, off_t *size)
>>> +  uint64_t *offset, int64_t *size)
>>
>> function never return negative @size, so what is the reason to keep it 
>> signed?
> 
> Because the C compiler does NOT like:
> 
> int64_t len;
> find_partition(..., &len);
> 
> with a uint64_t* parameter type - you HAVE to match the signed-ness of
> your caller's parameter with your pointer type. Since the caller already
> has to use a signed type (to check for blk_getlength() failure AND
> because sizes really cannot exceed 2^63), it's easier to keep it signed
> here.
> 
>>
>> Also, type conversion (uint64_t) should be dropped from the function code I 
>> think.
> 
> Are you talking about this part:
> 
>  if ((ext_partnum + j + 1) == partition) {
>  *offset = (uint64_t)ext[j].start_sector_abs << 9;
>  *size = (uint64_t)ext[j].nb_sectors_abs << 9;
>  return 0;
>  }
>  }
>  ext_partnum += 4;
>  } else if ((i + 1) == partition) {
>  *offset = (uint64_t)mbr[i].start_sector_abs << 9;
>  *size = (uint64_t)mbr[i].nb_sectors_abs << 9;
>  return 0;
> 
> No - that has to keep the cast, because .start_sector_abs and
> .nb_sectors_abs are uint32_t values, but we want to shift into 64-bit
> results.  You need the cast to force the correct arithmetic rather than
> truncating into a 32-bit value that then gets widened into 64-bit storage.

Oops, I'm stupid)

I thought about something like (uint64_t),
but pointed to  = 
(uint64_t)

> 
>>> @@ -665,10 +665,6 @@ int main(int argc, char **argv)
>>>error_report("Invalid offset `%s'", optarg);
>>>exit(EXIT_FAILURE);
>>>}
>>> -if (dev_offset < 0) {
>>> -error_report("Offset must be positive `%s'", optarg);
>>> -exit(EXIT_FAILURE);
>>> -}
>>
>> hm, then, may be, s/strtoll/strtoull before this?
> 
> I clean that up in patch 6/19.
> 
>>
>>>break;
>>>case 'l':
>>>if (strstart(optarg, SNAPSHOT_OPT_BASE, NULL)) {
>>> @@ -1005,15 +1001,14 @@ int main(int argc, char **argv)
>>>}
>>>
>>>if (dev_offset >= fd_size) {
>>> -error_report("Offset (%lld) has to be smaller than the image size "
>>> - "(%lld)",
>>> - (long long int)dev_offset, (long long int)fd_size);
>>> + 

Re: [Qemu-block] [PATCH v3 14/19] nbd/client: Pull out oldstyle size determination

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
12.01.2019 20:58, Eric Blake wrote:
> Another refactoring creating nbd_negotiate_finish_oldstyle()
> for further reuse during 'qemu-nbd --list'.
> 
> Signed-off-by: Eric Blake 
> Message-Id: <20181215135324.152629-18-ebl...@redhat.com>
> Reviewed-by: Richard W.M. Jones 

Reviewed-by: Vladimir Sementsov-Ogievskiy 

Hmm, looking at this, I found that nbd_start_negotiate() has very unusual 
semantics
for @zeroes field, without any comment about it:

It must be set to true before calling, and nbd_start_negotiate may set it to 
true.

I think better is not to rely on caller initialization and set zeroes to true 
at the
beginning of nbd_start_negotiate().

> ---
>   nbd/client.c | 49 -
>   1 file changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 5053433ea5e..620fbb5ef01 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -818,7 +818,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
> *ioc,
>* Start the handshake to the server.  After a positive return, the server
>* is ready to accept additional NBD_OPT requests.
>* Returns: negative errno: failure talking to server
> - *  0: server is oldstyle, client must still parse export size
> + *  0: server is oldstyle, must call nbd_negotiate_finish_oldstyle
>*  1: server is newstyle, but can only accept EXPORT_NAME
>*  2: server is newstyle, but lacks structured replies
>*  3: server is newstyle and set up for structured replies
> @@ -923,6 +923,36 @@ static int nbd_start_negotiate(QIOChannel *ioc, 
> QCryptoTLSCreds *tlscreds,
>   }
>   }
> 
> +/*
> + * nbd_negotiate_finish_oldstyle:
> + * Populate @info with the size and export flags from an oldstyle server,
> + * but does not consume 124 bytes of reserved zero padding.
> + * Returns 0 on success, -1 with @errp set on failure
> + */
> +static int nbd_negotiate_finish_oldstyle(QIOChannel *ioc, NBDExportInfo 
> *info,
> + Error **errp)
> +{
> +uint32_t oldflags;
> +
> +if (nbd_read(ioc, &info->size, sizeof(info->size), errp) < 0) {
> +error_prepend(errp, "Failed to read export length: ");
> +return -EINVAL;
> +}
> +info->size = be64_to_cpu(info->size);
> +
> +if (nbd_read(ioc, &oldflags, sizeof(oldflags), errp) < 0) {
> +error_prepend(errp, "Failed to read export flags: ");
> +return -EINVAL;
> +}
> +oldflags = be32_to_cpu(oldflags);
> +if (oldflags & ~0x) {
> +error_setg(errp, "Unexpected export flags %0x" PRIx32, oldflags);
> +return -EINVAL;
> +}
> +info->flags = oldflags;
> +return 0;
> +}
> +
>   /*
>* nbd_receive_negotiate:
>* Connect to server, complete negotiation, and move into transmission 
> phase.
> @@ -936,7 +966,6 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
> QCryptoTLSCreds *tlscreds,
>   int result;
>   bool zeroes = true;
>   bool base_allocation = info->base_allocation;
> -uint32_t oldflags;
> 
>   assert(info->name);
>   trace_nbd_receive_negotiate_name(info->name);
> @@ -1009,23 +1038,9 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
> QCryptoTLSCreds *tlscreds,
>   error_setg(errp, "Server does not support non-empty export 
> names");
>   return -EINVAL;
>   }
> -
> -if (nbd_read(ioc, &info->size, sizeof(info->size), errp) < 0) {
> -error_prepend(errp, "Failed to read export length: ");
> +if (nbd_negotiate_finish_oldstyle(ioc, info, errp) < 0) {
>   return -EINVAL;
>   }
> -info->size = be64_to_cpu(info->size);
> -
> -if (nbd_read(ioc, &oldflags, sizeof(oldflags), errp) < 0) {
> -error_prepend(errp, "Failed to read export flags: ");
> -return -EINVAL;
> -}
> -oldflags = be32_to_cpu(oldflags);
> -if (oldflags & ~0x) {
> -error_setg(errp, "Unexpected export flags %0x" PRIx32, oldflags);
> -return -EINVAL;
> -}
> -info->flags = oldflags;
>   break;
>   default:
>   return result;
> 


-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v3 06/19] qemu-nbd: Avoid strtol open-coding

2019-01-15 Thread Eric Blake
On 1/15/19 6:31 AM, Vladimir Sementsov-Ogievskiy wrote:
> 12.01.2019 20:57, Eric Blake wrote:
>> Our copy-and-pasted open-coding of strtol handling forgot to
>> handle overflow conditions.  Use qemu_strto*() instead.
>>
>> In the case of --partition, since we insist on a user-supplied
>> partition to be non-zero, we can use 0 rather than -1 for our
>> initial value to distinguish when a partition is not being
>> served, for slightly more optimal code.
>>
>> The error messages for out-of-bounds values are less specific,
>> but should not be a terrible loss in quality.
>>
>> Signed-off-by: Eric Blake 
>> Message-Id: <20181215135324.152629-8-ebl...@redhat.com>
>>

>> -if (partition < 1 || partition > 8) {
>> -error_report("Invalid partition %d", partition);
>> +if (qemu_strtoi(optarg, NULL, 0, &partition) < 0 ||
> 
> we can use unsigned conversion like for offset (and unsigned type for 
> partition), but this doesn't really matter.

Yes, but I didn't see the point in changing the variable types in this
patch.

> 
> anyway,
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 13/19] nbd/client: Split handshake into two functions

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
12.01.2019 20:58, Eric Blake wrote:
> An upcoming patch will add the ability for qemu-nbd to list
> the services provided by an NBD server.  Share the common
> code of the TLS handshake by splitting the initial exchange
> into a separate function, leaving only the export handling
> in the original function.  Functionally, there should be no
> change in behavior in this patch, although some of the code
> motion may be difficult to follow due to indentation changes
> (view with 'git diff -w' for a smaller changeset).
> 
> I considered an enum for the return code coordinating state
> between the two functions, but in the end just settled with
> ample comments.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Richard W.M. Jones 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Message-Id: <20181215135324.152629-17-ebl...@redhat.com>
> ---
>   nbd/client.c | 144 +++
>   nbd/trace-events |   2 +-
>   2 files changed, 95 insertions(+), 51 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index fa931fd8e5d..5053433ea5e 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -813,21 +813,24 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
> *ioc,
>   return received;
>   }
> 
> -int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
> -  const char *hostname, QIOChannel **outioc,
> -  NBDExportInfo *info, Error **errp)
> +/*
> + * nbd_start_negotiate:
> + * Start the handshake to the server.  After a positive return, the server
> + * is ready to accept additional NBD_OPT requests.

+ * @zeroes must be set to true before call !!!


> + * Returns: negative errno: failure talking to server
> + *  0: server is oldstyle, client must still parse export size
> + *  1: server is newstyle, but can only accept EXPORT_NAME
> + *  2: server is newstyle, but lacks structured replies
> + *  3: server is newstyle and set up for structured replies
> + */
> +static int nbd_start_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
> +   const char *hostname, QIOChannel **outioc,
> +   bool structured_reply, bool *zeroes,
> +   Error **errp)
>   {
>   uint64_t magic;
> -bool zeroes = true;
> -bool structured_reply = info->structured_reply;
> -bool base_allocation = info->base_allocation;
> 
> -trace_nbd_receive_negotiate(tlscreds, hostname ? hostname : "");
> -
> -assert(info->name);
> -trace_nbd_receive_negotiate_name(info->name);
> -info->structured_reply = false;
> -info->base_allocation = false;
> +trace_nbd_start_negotiate(tlscreds, hostname ? hostname : "");

or, a lot better:

+*zeroes = true


> 
>   if (outioc) {
>   *outioc = NULL;
> @@ -872,7 +875,7 @@ int nbd_receive_negotiate(QIOChannel *ioc, 
> QCryptoTLSCreds *tlscreds,
>   clientflags |= NBD_FLAG_C_FIXED_NEWSTYLE;
>   }
>   if (globalflags & NBD_FLAG_NO_ZEROES) {
> -zeroes = false;
> +*zeroes = false;
>   clientflags |= NBD_FLAG_C_NO_ZEROES;
>   }
>   /* client requested flags */

[..]

> +/*
> + * nbd_receive_negotiate:
> + * Connect to server, complete negotiation, and move into transmission phase.
> + * Returns: negative errno: failure talking to server
> + *  0: server is connected
> + */
> +int nbd_receive_negotiate(QIOChannel *ioc, QCryptoTLSCreds *tlscreds,
> +  const char *hostname, QIOChannel **outioc,
> +  NBDExportInfo *info, Error **errp)
> +{
> +int result;
> +bool zeroes = true;

and then, this initialization may be dropped (or left as is, if gcc like it)

> +bool base_allocation = info->base_allocation;
> +uint32_t oldflags;
> +
> +assert(info->name);
> +trace_nbd_receive_negotiate_name(info->name);
> +
> +result = nbd_start_negotiate(ioc, tlscreds, hostname, outioc,
> + info->structured_reply, &zeroes, errp);
> +



-- 
Best regards,
Vladimir


Re: [Qemu-block] [PATCH v3 05/19] nbd/server: Favor [u]int64_t over off_t

2019-01-15 Thread Eric Blake
On 1/15/19 4:19 AM, Vladimir Sementsov-Ogievskiy wrote:
> 12.01.2019 20:57, Eric Blake wrote:
>> Although our compile-time environment is set up so that we always
>> support long files with 64-bit off_t, we have no guarantee whether
>> off_t is the same type as int64_t.  This requires casts when
>> printing values, and prevents us from directly using qemu_strtoi64().
>> Let's just flip to [u]int64_t (signed for length, because we have to
>> detect failure of blk_getlength()
> 
> we have not, after previous patch

nbd/server.c no longer has to check for blk_getlength() failures, but
blockdev-nbd.c and qemu-nbd.c still do.  Since the callers have to use
an int64_t type for the length as part of their error checking, it's
easier to accept an int64_t length to nbd_export_new(), even if
nbd_export_new() could also use an unsigned type.

> 
> and because off_t was signed;
>> unsigned for offset because it lets us simplify some math without
>> having to worry about signed overflow).
>>
>> Suggested-by: Vladimir Sementsov-Ogievskiy 
>> Signed-off-by: Eric Blake 
>>
>> ---
>> v3: new patch
>> ---
>>   include/block/nbd.h |  4 ++--
>>   nbd/server.c| 14 +++---
>>   qemu-nbd.c  | 26 ++
>>   3 files changed, 19 insertions(+), 25 deletions(-)
>>
>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>> index 1971b557896..0f252829376 100644
>> --- a/include/block/nbd.h
>> +++ b/include/block/nbd.h
>> @@ -294,8 +294,8 @@ int nbd_errno_to_system_errno(int err);
>>   typedef struct NBDExport NBDExport;
>>   typedef struct NBDClient NBDClient;
>>
>> -NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t 
>> size,
>> -  const char *name, const char *description,
>> +NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>> +  int64_t size, const char *name, const char *desc,
> 
> in previous patch you drop use of negative @size parameter, so it looks better
> to use unsigned for size like for offset

You can't have a size larger than 2^63; but an unsigned type holds
nearly 2^64.  I prefer a signed size, for the same reason that off_t is
signed, in that checking for a negative size is easier to rule out sizes
that are too large.


>>
>>   static int find_partition(BlockBackend *blk, int partition,
>> -  off_t *offset, off_t *size)
>> +  uint64_t *offset, int64_t *size)
> 
> function never return negative @size, so what is the reason to keep it signed?

Because the C compiler does NOT like:

int64_t len;
find_partition(..., &len);

with a uint64_t* parameter type - you HAVE to match the signed-ness of
your caller's parameter with your pointer type. Since the caller already
has to use a signed type (to check for blk_getlength() failure AND
because sizes really cannot exceed 2^63), it's easier to keep it signed
here.

> 
> Also, type conversion (uint64_t) should be dropped from the function code I 
> think.

Are you talking about this part:

if ((ext_partnum + j + 1) == partition) {
*offset = (uint64_t)ext[j].start_sector_abs << 9;
*size = (uint64_t)ext[j].nb_sectors_abs << 9;
return 0;
}
}
ext_partnum += 4;
} else if ((i + 1) == partition) {
*offset = (uint64_t)mbr[i].start_sector_abs << 9;
*size = (uint64_t)mbr[i].nb_sectors_abs << 9;
return 0;

No - that has to keep the cast, because .start_sector_abs and
.nb_sectors_abs are uint32_t values, but we want to shift into 64-bit
results.  You need the cast to force the correct arithmetic rather than
truncating into a 32-bit value that then gets widened into 64-bit storage.

>> @@ -665,10 +665,6 @@ int main(int argc, char **argv)
>>   error_report("Invalid offset `%s'", optarg);
>>   exit(EXIT_FAILURE);
>>   }
>> -if (dev_offset < 0) {
>> -error_report("Offset must be positive `%s'", optarg);
>> -exit(EXIT_FAILURE);
>> -}
> 
> hm, then, may be, s/strtoll/strtoull before this?

I clean that up in patch 6/19.

> 
>>   break;
>>   case 'l':
>>   if (strstart(optarg, SNAPSHOT_OPT_BASE, NULL)) {
>> @@ -1005,15 +1001,14 @@ int main(int argc, char **argv)
>>   }
>>
>>   if (dev_offset >= fd_size) {
>> -error_report("Offset (%lld) has to be smaller than the image size "
>> - "(%lld)",
>> - (long long int)dev_offset, (long long int)fd_size);
>> +error_report("Offset (%" PRIu64 ") has to be smaller than the image 
>> "
>> + "size (%" PRIu64 ")", dev_offset, fd_size);
> 
> PRId64 for fd_size

Sure.


>> @@ -1026,12 +1021,11 @@ int main(int argc, char **argv)
>>   exit(EXIT_FAILURE);
>>   }
>>   /* partition limits are (3

Re: [Qemu-block] [PATCH v12 09/10] qcow2: skip writing zero buffers to empty COW areas

2019-01-15 Thread Alberto Garcia
On Mon 14 Jan 2019 12:18:30 PM CET, Anton Nefedov wrote:
> If COW areas of the newly allocated clusters are zeroes on the backing image,
> efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
> cluster instead of writing explicit zero buffers later in perform_cow().
>
> iotest 060:
> write to the discarded cluster does not trigger COW anymore.
> Use a backing image instead.
>
> Signed-off-by: Anton Nefedov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

> +ret = handle_alloc_space(bs, l2meta);

I insist that it would be nice to have a short comment explaining what
this does.

Berto



Re: [Qemu-block] [PATCH v3 04/19] nbd/server: Hoist length check to qemp_nbd_server_add

2019-01-15 Thread Eric Blake
On 1/15/19 3:44 AM, Vladimir Sementsov-Ogievskiy wrote:
> 12.01.2019 20:57, Eric Blake wrote:
>> We only had two callers to nbd_export_new; qemu-nbd.c always
>> passed a valid offset/length pair (because it already checked
>> the file length, to ensure that offset was in bounds), while
>> blockdev-nbd always passed 0/-1.  Then nbd_export_new reduces
>> the size to a multiple of BDRV_SECTOR_SIZE (can only happen
>> when offset is not sector-aligned, since bdrv_getlength()
>> currently rounds up), which can result in offset being greater
>> than the enforced length, but that's not fatal (the server
>> rejects client requests that exceed the advertised length).
>>
>> However, I'm finding it easier to work with the code if we are
>> consistent on having both callers pass in a valid length, and
>> just assert that things are sane in nbd_export_new.
>>
>> Signed-off-by: Eric Blake 
>>

>> +++ b/nbd/server.c
>> @@ -1499,13 +1499,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
>> dev_offset, off_t size,
>>   exp->name = g_strdup(name);
>>   exp->description = g_strdup(description);
>>   exp->nbdflags = nbdflags;
>> -exp->size = size < 0 ? blk_getlength(blk) : size;
>> -if (exp->size < 0) {
>> -error_setg_errno(errp, -exp->size,
>> - "Failed to determine the NBD export's length");
>> -goto fail;
>> -}
>> -exp->size -= exp->size % BDRV_SECTOR_SIZE;
>> +assert(dev_offset <= size);
> 
> @size is not size of the image, but size of the export, so it may be less 
> than dev_offset
> (qemu-nbd.c do "fd_size -= dev_offset" before "nbd_export_new(bs, dev_offset, 
> fd_size, "

But the assert is fine because patch 3/19 fixed qemu-nbd.c to never pass
in dev_offset larger than size (it fails up front if dev_offset is out
of bounds, whether from the -o command line option or from what it read
from the partition header with the -P command line option).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v3 11/19] nbd/client: Split out nbd_receive_one_meta_context()

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
12.01.2019 20:58, Eric Blake wrote:
> Extract portions of nbd_negotiate_simple_meta_context() to
> a new function nbd_receive_one_meta_context() that copies the
> pattern of nbd_receive_list() for performing the argument
> validation of one reply.  The error message when the server
> replies with more than one context changes slightly, but
> that shouldn't happen in the common case.
> 
> Signed-off-by: Eric Blake 
> Message-Id: <20181215135324.152629-15-ebl...@redhat.com>
> 
> ---
> v3: rebase, without changing into a loop
> ---
>   nbd/client.c | 148 +--
>   nbd/trace-events |   2 +-
>   2 files changed, 92 insertions(+), 58 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 3c716be2719..22505199d3b 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -672,7 +672,86 @@ static int nbd_send_one_meta_context(QIOChannel *ioc,
>   return ret;
>   }
> 
> -/* nbd_negotiate_simple_meta_context:
> +/*
> + * nbd_receive_one_meta_context:
> + * Called in a loop to receive and trace one set/list meta context reply.
> + * Pass non-NULL @name or @id to collect results back to the caller, which
> + * must eventually call g_free().
> + * return 1 if name is set and iteration must continue,
> + *0 if iteration is complete (including if option is unsupported),
> + *-1 with errp set for any error
> + */
> +static int nbd_receive_one_meta_context(QIOChannel *ioc,
> +uint32_t opt,
> +char **name,
> +uint32_t *id,
> +Error **errp)
> +{
> +int ret;
> +NBDOptionReply reply;
> +char *local_name = NULL;
> +uint32_t local_id;
> +
> +if (nbd_receive_option_reply(ioc, opt, &reply, errp) < 0) {
> +return -1;
> +}
> +
> +ret = nbd_handle_reply_err(ioc, &reply, errp);
> +if (ret <= 0) {
> +return ret;
> +}
> +
> +if (reply.type == NBD_REP_ACK) {
> +if (reply.length != 0) {
> +error_setg(errp, "Unexpected length to ACK response");
> +nbd_send_opt_abort(ioc);
> +return -1;
> +}
> +return 0;
> +} else if (reply.type != NBD_REP_META_CONTEXT) {
> +error_setg(errp, "Unexpected reply type %u (%s), expected %u (%s)",
> +   reply.type, nbd_rep_lookup(reply.type),
> +   NBD_REP_META_CONTEXT, 
> nbd_rep_lookup(NBD_REP_META_CONTEXT));
> +nbd_send_opt_abort(ioc);
> +return -1;
> +}
> +
> +if (reply.length <= sizeof(local_id) ||
> +reply.length > NBD_MAX_BUFFER_SIZE) {
> +error_setg(errp, "Failed to negotiate meta context, server "
> +   "answered with unexpected length %" PRIu32,
> +   reply.length);
> +nbd_send_opt_abort(ioc);
> +return -1;
> +}
> +
> +if (nbd_read(ioc, &local_id, sizeof(local_id), errp) < 0) {
> +return -1;
> +}
> +local_id = be32_to_cpu(local_id);
> +
> +reply.length -= sizeof(local_id);
> +local_name = g_malloc(reply.length + 1);
> +if (nbd_read(ioc, local_name, reply.length, errp) < 0) {
> +g_free(local_name);
> +return -1;
> +}
> +local_name[reply.length] = '\0';
> +trace_nbd_opt_meta_reply(nbd_opt_lookup(opt), local_name, local_id);
> +
> +if (name) {
> +*name = local_name;
> +} else {
> +g_free(local_name);
> +}
> +if (id) {
> +*id = local_id;
> +}
> +return 1;
> +}
> +
> +/*
> + * nbd_negotiate_simple_meta_context:
>* Request the server to set the meta context for export @info->name
>* using @info->x_dirty_bitmap with a fallback to "base:allocation",
>* setting @info->context_id to the resulting id. Fail if the server
> @@ -693,50 +772,21 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
> *ioc,
>* function should lose the term _simple.
>*/
>   int ret;
> -NBDOptionReply reply;
>   const char *context = info->x_dirty_bitmap ?: "base:allocation";
>   bool received = false;
> +char *name = NULL;
> 
>   if (nbd_send_one_meta_context(ioc, NBD_OPT_SET_META_CONTEXT,
> info->name, context, errp) < 0) {
>   return -1;
>   }
> 
> -if (nbd_receive_option_reply(ioc, NBD_OPT_SET_META_CONTEXT, &reply,
> - errp) < 0)
> -{
> +ret = nbd_receive_one_meta_context(ioc, NBD_OPT_SET_META_CONTEXT,
> +   &name, &info->context_id, errp);
> +if (ret < 0) {
>   return -1;
>   }
> -
> -ret = nbd_handle_reply_err(ioc, &reply, errp);
> -if (ret <= 0) {
> -return ret;
> -}
> -
> -if (reply.type == NBD_REP_META_CONTEXT) {
> -char *name;
> -
> -if (reply.length != sizeof(info->context_id) + strlen(context)) {
> -

Re: [Qemu-block] [PATCH] throttle-groups: fix restart coroutine iothread race

2019-01-15 Thread Alberto Garcia
On Tue 15 Jan 2019 03:18:00 PM CET, Stefan Hajnoczi wrote:
>> So if my understanding is correct QEMU can be shut down when there
>> are iothreads waiting for a mutex. Is that something that we should
>> be worried about?
>
> Nothing joins the iothreads in vl.c:main().
>
> The assumption is that anything using iothreads will detach from them.
> For example, the vm runstate changes during shutdown so devices can
> disable the iothread code path (and this involves draining in-flight
> requests).
>
> My fix effectively does this by waiting for in-flight throttling
> restart coroutines.

Yeah, it's clear in the case of your fix. Thanks!

Berto



Re: [Qemu-block] [PATCH] throttle-groups: fix restart coroutine iothread race

2019-01-15 Thread Stefan Hajnoczi
On Mon, Jan 14, 2019 at 09:56:28PM +0100, Alberto Garcia wrote:
> On Mon 14 Jan 2019 05:31:17 PM CET, Stefan Hajnoczi  
> wrote:
> > On Mon, Jan 14, 2019 at 05:26:48PM +0100, Alberto Garcia wrote:
> >> On Mon 14 Jan 2019 05:15:25 PM CET, Stefan Hajnoczi wrote:
> >> >> > I've been able to reproduce this in an iotest, please see v2 of this
> >> >> > series.
> >> >> 
> >> >> That iotest doesn't crash for me :-?
> >> >
> >> > Does my iotest pass for you?
> >> 
> >> Yes, it does. I'm trying to figure out why because if I run the QMP
> >> commands by hand then it does crash.
> >
> > I ran the iotest 20 times on my machine and it segfaulted every time
> > (with the fix not yet applied).
> 
> Yeah I can also reproduce it all the time if I run it by hand...
> 
> I was debugging it and although I don't know why this is different when
> I run it through tests/qemu-iotests/check, here's why it doesn't crash:
> 
> After the ThrottleGroupMember is unregistered and its BlockBackend is
> destroyed, the throttle_group_co_restart_queue() coroutine takes
> control.
> 
> The first thing that it does is lock tgm->throttled_reqs_lock. It turns
> out that although this memory has been freed (it's part of the
> BlockBackend struct) it is still accessible but contains pure
> gargabe. 'Garbage' here means that the mutex counter contains some
> random value != 0, so the thread waits, it doesn't have a chance to
> crash the process, and QEMU shuts down cleanly.
> 
> So if my understanding is correct QEMU can be shut down when there are
> iothreads waiting for a mutex. Is that something that we should be
> worried about?

Nothing joins the iothreads in vl.c:main().

The assumption is that anything using iothreads will detach from them.
For example, the vm runstate changes during shutdown so devices can
disable the iothread code path (and this involves draining in-flight
requests).

My fix effectively does this by waiting for in-flight throttling restart
coroutines.

Stefan


signature.asc
Description: PGP signature


[Qemu-block] Crash when deleting an iothread that is being used

2019-01-15 Thread Alberto Garcia
Here's how to reproduce the crash:

{ "execute": "qmp_capabilities" }
{ "execute": "blockdev-add", "arguments": {"driver": "null-co", "node-name": 
"hd0"}}
{ "execute": "object-add", "arguments": {"qom-type": "iothread", "id": 
"iothread0"}}
{ "execute": "x-blockdev-set-iothread", "arguments": {"node-name": "hd0", 
"iothread": "iothread0"}}
{ "execute": "object-del", "arguments": {"id": "iothread0"}}
{ "execute": "blockdev-del", "arguments": {"node-name": "hd0"}}

The problem is that bs->aio_context is the one that belonged to the
IOThread and was destroyed by the object-del call. One would need to
do x-blockdev-set-iothread(hd0, null) before deleting the thread.

The IOThread class does not have a can_be_deleted() method to prevent
threads from being deleted. One possible implementation would require
a reference count but that doesn't seem immediately trivial because
users don't use the IOThread itself but its AioContext, and not all
bdrv_set_aio_context() are related to IOThreads.

A quicker fix is of course to prevent the threads from being deleted
at all :-)

Berto



Re: [Qemu-block] [PATCH v3 10/19] nbd/client: Split out nbd_send_one_meta_context()

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
12.01.2019 20:58, Eric Blake wrote:
> Refactor nbd_negotiate_simple_meta_context() to pull out the
> code that can be reused to send a LIST request for 0 or 1 query.
> No semantic change.  The old comment about 'sizeof(uint32_t)'
> being equivalent to '/* number of queries */' is no longer
> needed, now that we are computing 'sizeof(queries)' instead.
> 
> Signed-off-by: Eric Blake 
> Message-Id: <20181215135324.152629-14-ebl...@redhat.com>
> Reviewed-by: Richard W.M. Jones 
> 
> ---
> v3: Improve commit message [Rich], formatting tweak [checkpatch],
> rebase to dropped patch
> ---
>   nbd/client.c | 67 +---
>   nbd/trace-events |  2 +-
>   2 files changed, 48 insertions(+), 21 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index 77993890f04..3c716be2719 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -629,6 +629,49 @@ static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
>   return QIO_CHANNEL(tioc);
>   }
> 
> +/*
> + * nbd_send_one_meta_context:
> + * Send 0 or 1 set/list meta context queries.
> + * Return 0 on success, -1 with errp set for any error
> + */
> +static int nbd_send_one_meta_context(QIOChannel *ioc,
> + uint32_t opt,
> + const char *export,
> + const char *query,
> + Error **errp)
> +{
> +int ret;
> +uint32_t export_len = strlen(export);
> +uint32_t queries = !!query;

n_ or nb_ prefix may make it more clear

> +uint32_t context_len = 0;
> +uint32_t data_len;
> +char *data;
> +char *p;
> +
> +data_len = sizeof(export_len) + export_len + sizeof(queries);
> +if (query) {
> +context_len = strlen(query);

looks like it then should be query_len

> +data_len += sizeof(context_len) + context_len;
> +} else {
> +assert(opt == NBD_OPT_LIST_META_CONTEXT);
> +}
> +data = g_malloc(data_len);
> +p = data;

may use p = data = g_malloc

> +
> +trace_nbd_opt_meta_request(nbd_opt_lookup(opt), query ?: "(all)", 
> export);
> +stl_be_p(p, export_len);
> +memcpy(p += sizeof(export_len), export, export_len);
> +stl_be_p(p += export_len, queries);
> +if (query) {
> +stl_be_p(p += sizeof(uint32_t), context_len);

:), aha, please, s/uint32_t/queries, as you promised

Hmm, its my code. It's hard to read and not very comfortable to maintain..

In block/nbd-client.c we have
payload_advance* functions, to read such formatted data, I think, it should be
good to make something like this for server-part. Not about these series, of 
course.

Interesting, troubles around "don't use be64_to_cpuS, use only be64_to_cpu",
do they apply somehow to *_be_p functions family?

> +memcpy(p += sizeof(context_len), query, context_len);
> +}
> +
> +ret = nbd_send_option_request(ioc, opt, data_len, data, errp);
> +g_free(data);
> +return ret;
> +}
> +
>   /* nbd_negotiate_simple_meta_context:
>* Request the server to set the meta context for export @info->name
>* using @info->x_dirty_bitmap with a fallback to "base:allocation",
> @@ -653,26 +696,10 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
> *ioc,
>   NBDOptionReply reply;
>   const char *context = info->x_dirty_bitmap ?: "base:allocation";
>   bool received = false;
> -uint32_t export_len = strlen(info->name);
> -uint32_t context_len = strlen(context);
> -uint32_t data_len = sizeof(export_len) + export_len +
> -sizeof(uint32_t) + /* number of queries */
> -sizeof(context_len) + context_len;
> -char *data = g_malloc(data_len);
> -char *p = data;
> 
> -trace_nbd_opt_meta_request(context, info->name);
> -stl_be_p(p, export_len);
> -memcpy(p += sizeof(export_len), info->name, export_len);
> -stl_be_p(p += export_len, 1);
> -stl_be_p(p += sizeof(uint32_t), context_len);
> -memcpy(p += sizeof(context_len), context, context_len);
> -
> -ret = nbd_send_option_request(ioc, NBD_OPT_SET_META_CONTEXT, data_len, 
> data,
> -  errp);
> -g_free(data);
> -if (ret < 0) {
> -return ret;
> +if (nbd_send_one_meta_context(ioc, NBD_OPT_SET_META_CONTEXT,
> +  info->name, context, errp) < 0) {
> +return -1;
>   }
> 
>   if (nbd_receive_option_reply(ioc, NBD_OPT_SET_META_CONTEXT, &reply,
> @@ -689,7 +716,7 @@ static int nbd_negotiate_simple_meta_context(QIOChannel 
> *ioc,
>   if (reply.type == NBD_REP_META_CONTEXT) {
>   char *name;
> 
> -if (reply.length != sizeof(info->context_id) + context_len) {
> +if (reply.length != sizeof(info->context_id) + strlen(context)) {
>   error_setg(errp, "Failed to negotiate meta context '%s', server 
> "
>  "answered with unexpected length %

Re: [Qemu-block] [Qemu-devel] [PATCH 03/15] hw/ssi: Remove SSIBus from "qemu/typedefs.h"

2019-01-15 Thread Thomas Huth
On 2019-01-15 13:28, Markus Armbruster wrote:
> Philippe Mathieu-Daudé  writes:
> 
>> From: Philippe Mathieu-Daudé 
>>
>> There are only three files requiring this typedef, let them
>> include "hw/ssi/ssi.h" directly to simplify "qemu/typedefs.h".
>>
>> To clean "qemu/typedefs.h", move the forward declaration
>> to "hw/ssi/ssi.h".
>>
>> Signed-off-by: Philippe Mathieu-Daudé 
>> ---
>>  hw/arm/strongarm.h  | 1 +
>>  include/hw/arm/pxa.h| 1 +
>>  include/hw/ssi/pl022.h  | 1 +
>>  include/hw/ssi/ssi.h| 1 +
>>  include/qemu/typedefs.h | 1 -
>>  5 files changed, 4 insertions(+), 1 deletion(-)
> 
> When typedefs.h changes, we recompile the world, but it pretty much only
> ever changes when new typedefs are added.  Thus, *keeping* a typedef
> there is therefore pretty cheap.
> 
> Nevertheless, we shouldn't keep typedefs there without a real reason.
> Being able to move one away without having to add any new #include
> directives is a strong sign for "no real reason".  I like patches doing
> that.
> 
> What I don't like is adding #include directives just so you can move
> typedefs out of typedefs.h: it slows down the build.  Granted, the four
> added by this patch are a drop in the bucket.  The point I'm trying to
> make is typedefs.h's purpose: it's for avoiding #include directives.
> Circular ones in particular, but others, too.

Yes, agreed, removing things from typedefs.h just to add lots of
#includes in other files is not really the best idea. I also dropped
this patch in v2 of my current PULL request because of this reason.
Phil, I suggest to simply drop this patch. What we maybe could do is to
split up typedefs.h per subsystem, so that we additionally have a
hw/arm/typedefs.h, hw/ppc/typedefs.h etc. in the end, then the
target-specific typedefs would not clutter the common qemu/typedefs.h
file anymore.

 Thomas




Re: [Qemu-block] [PATCH v3 06/19] qemu-nbd: Avoid strtol open-coding

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
12.01.2019 20:57, Eric Blake wrote:
> Our copy-and-pasted open-coding of strtol handling forgot to
> handle overflow conditions.  Use qemu_strto*() instead.
> 
> In the case of --partition, since we insist on a user-supplied
> partition to be non-zero, we can use 0 rather than -1 for our
> initial value to distinguish when a partition is not being
> served, for slightly more optimal code.
> 
> The error messages for out-of-bounds values are less specific,
> but should not be a terrible loss in quality.
> 
> Signed-off-by: Eric Blake 
> Message-Id: <20181215135324.152629-8-ebl...@redhat.com>
> 
> ---
> v3: rebase to use int64_t rather than off_t [Vladimir]
> ---
>   qemu-nbd.c | 28 +---
>   1 file changed, 9 insertions(+), 19 deletions(-)
> 
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index 96c0829970c..4670b659167 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -546,9 +546,8 @@ int main(int argc, char **argv)
>   };
>   int ch;
>   int opt_ind = 0;
> -char *end;
>   int flags = BDRV_O_RDWR;
> -int partition = -1;
> +int partition = 0;
>   int ret = 0;
>   bool seen_cache = false;
>   bool seen_discard = false;
> @@ -660,9 +659,8 @@ int main(int argc, char **argv)
>   port = optarg;
>   break;
>   case 'o':
> -dev_offset = strtoll (optarg, &end, 0);
> -if (*end) {
> -error_report("Invalid offset `%s'", optarg);
> +if (qemu_strtou64(optarg, NULL, 0, &dev_offset) < 0) {
> +error_report("Invalid offset '%s'", optarg);
>   exit(EXIT_FAILURE);
>   }
>   break;
> @@ -684,13 +682,9 @@ int main(int argc, char **argv)
>   flags &= ~BDRV_O_RDWR;
>   break;
>   case 'P':
> -partition = strtol(optarg, &end, 0);
> -if (*end) {
> -error_report("Invalid partition `%s'", optarg);
> -exit(EXIT_FAILURE);
> -}
> -if (partition < 1 || partition > 8) {
> -error_report("Invalid partition %d", partition);
> +if (qemu_strtoi(optarg, NULL, 0, &partition) < 0 ||

we can use unsigned conversion like for offset (and unsigned type for 
partition), but this doesn't really matter.

> +partition < 1 || partition > 8) {
> +error_report("Invalid partition '%s'", optarg);
>   exit(EXIT_FAILURE);
>   }
>   break;
> @@ -711,15 +705,11 @@ int main(int argc, char **argv)
>   device = optarg;
>   break;
>   case 'e':
> -shared = strtol(optarg, &end, 0);
> -if (*end) {
> +if (qemu_strtoi(optarg, NULL, 0, &shared) < 0 ||

and here

> +shared < 1) {
>   error_report("Invalid shared device number '%s'", optarg);
>   exit(EXIT_FAILURE);
>   }
> -if (shared < 1) {
> -error_report("Shared device number must be greater than 0");
> -exit(EXIT_FAILURE);
> -}
>   break;
>   case 'f':
>   fmt = optarg;
> @@ -1007,7 +997,7 @@ int main(int argc, char **argv)
>   }
>   fd_size -= dev_offset;
> 
> -if (partition != -1) {
> +if (partition) {
>   int64_t limit;
> 
>   if (dev_offset) {
> 

anyway,
Reviewed-by: Vladimir Sementsov-Ogievskiy 

-- 
Best regards,
Vladimir


Re: [Qemu-block] [Qemu-devel] [PATCH 03/15] hw/ssi: Remove SSIBus from "qemu/typedefs.h"

2019-01-15 Thread Markus Armbruster
Philippe Mathieu-Daudé  writes:

> From: Philippe Mathieu-Daudé 
>
> There are only three files requiring this typedef, let them
> include "hw/ssi/ssi.h" directly to simplify "qemu/typedefs.h".
>
> To clean "qemu/typedefs.h", move the forward declaration
> to "hw/ssi/ssi.h".
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/arm/strongarm.h  | 1 +
>  include/hw/arm/pxa.h| 1 +
>  include/hw/ssi/pl022.h  | 1 +
>  include/hw/ssi/ssi.h| 1 +
>  include/qemu/typedefs.h | 1 -
>  5 files changed, 4 insertions(+), 1 deletion(-)

When typedefs.h changes, we recompile the world, but it pretty much only
ever changes when new typedefs are added.  Thus, *keeping* a typedef
there is therefore pretty cheap.

Nevertheless, we shouldn't keep typedefs there without a real reason.
Being able to move one away without having to add any new #include
directives is a strong sign for "no real reason".  I like patches doing
that.

What I don't like is adding #include directives just so you can move
typedefs out of typedefs.h: it slows down the build.  Granted, the four
added by this patch are a drop in the bucket.  The point I'm trying to
make is typedefs.h's purpose: it's for avoiding #include directives.
Circular ones in particular, but others, too.



Re: [Qemu-block] [Qemu-devel] [PATCH 0/2] qcow2: include LUKS payload overhead in qemu-img measure

2019-01-15 Thread Philippe Mathieu-Daudé
On 1/15/19 12:10 PM, Stefan Hajnoczi wrote:
> The LUKS payload has a significant size (>1 MB).  It must be included in the
> qemu-img measure calculation so we arrive at the correct minimum image size.
> 
> Stefan Hajnoczi (2):
>   qcow2: include LUKS payload overhead in qemu-img measure
>   iotests: add LUKS payload overhead to 178 qemu-img measure test
> 
>  block/qcow2.c| 51 +++-
>  tests/qemu-iotests/178   |  8 +
>  tests/qemu-iotests/178.out.qcow2 | 24 +++
>  3 files changed, 82 insertions(+), 1 deletion(-)
> 

Reviewed-by: Philippe Mathieu-Daudé 



[Qemu-block] [PATCH 2/2] iotests: add LUKS payload overhead to 178 qemu-img measure test

2019-01-15 Thread Stefan Hajnoczi
The previous patch includes the LUKS payload overhead into the qemu-img
measure calculation for qcow2.  Update qemu-iotests 178 to exercise this
new code path.

Signed-off-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/178   |  8 
 tests/qemu-iotests/178.out.qcow2 | 24 
 2 files changed, 32 insertions(+)

diff --git a/tests/qemu-iotests/178 b/tests/qemu-iotests/178
index 3f4b4a4564..23eb017ea1 100755
--- a/tests/qemu-iotests/178
+++ b/tests/qemu-iotests/178
@@ -142,6 +142,14 @@ for ofmt in human json; do
 # The backing file doesn't need to exist :)
 $QEMU_IMG measure --output=$ofmt -o backing_file=x \
   -f "$fmt" -O "$IMGFMT" "$TEST_IMG"
+
+echo
+echo "== $fmt input image and LUKS encryption =="
+echo
+$QEMU_IMG measure --output=$ofmt \
+  --object secret,id=sec0,data=base \
+  -o encrypt.format=luks,encrypt.key-secret=sec0 \
+  -f "$fmt" -O "$IMGFMT" "$TEST_IMG"
 fi
 
 echo
diff --git a/tests/qemu-iotests/178.out.qcow2 b/tests/qemu-iotests/178.out.qcow2
index d42d4a4597..55a8dc926f 100644
--- a/tests/qemu-iotests/178.out.qcow2
+++ b/tests/qemu-iotests/178.out.qcow2
@@ -68,6 +68,11 @@ converted image file size in bytes: 458752
 required size: 1074135040
 fully allocated size: 1074135040
 
+== qcow2 input image and LUKS encryption ==
+
+required size: 2686976
+fully allocated size: 1076232192
+
 == qcow2 input image and preallocation (human) ==
 
 required size: 1074135040
@@ -114,6 +119,11 @@ converted image file size in bytes: 524288
 required size: 1074135040
 fully allocated size: 1074135040
 
+== raw input image and LUKS encryption ==
+
+required size: 2686976
+fully allocated size: 1076232192
+
 == raw input image and preallocation (human) ==
 
 required size: 1074135040
@@ -205,6 +215,13 @@ converted image file size in bytes: 458752
 "fully-allocated": 1074135040
 }
 
+== qcow2 input image and LUKS encryption ==
+
+{
+"required": 2686976,
+"fully-allocated": 1076232192
+}
+
 == qcow2 input image and preallocation (json) ==
 
 {
@@ -263,6 +280,13 @@ converted image file size in bytes: 524288
 "fully-allocated": 1074135040
 }
 
+== raw input image and LUKS encryption ==
+
+{
+"required": 2686976,
+"fully-allocated": 1076232192
+}
+
 == raw input image and preallocation (json) ==
 
 {
-- 
2.20.1




[Qemu-block] [PATCH 1/2] qcow2: include LUKS payload overhead in qemu-img measure

2019-01-15 Thread Stefan Hajnoczi
LUKS encryption reserves clusters for its own payload data.  The size of
this area must be included in the qemu-img measure calculation so that
we arrive at the correct minimum required image size.

(Ab)use the qcrypto_block_create() API to determine the payload
overhead.  We discard the payload data that qcrypto thinks will be
written to the image.

Signed-off-by: Stefan Hajnoczi 
---
 block/qcow2.c | 51 ++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4897abae5e..7ab93a5d2f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4231,6 +4231,24 @@ static coroutine_fn int 
qcow2_co_flush_to_os(BlockDriverState *bs)
 return ret;
 }
 
+static ssize_t qcow2_measure_crypto_hdr_init_func(QCryptoBlock *block,
+size_t headerlen, void *opaque, Error **errp)
+{
+size_t *headerlenp = opaque;
+
+/* Stash away the payload size */
+*headerlenp = headerlen;
+return 0;
+}
+
+static ssize_t qcow2_measure_crypto_hdr_write_func(QCryptoBlock *block,
+size_t offset, const uint8_t *buf, size_t buflen,
+void *opaque, Error **errp)
+{
+/* Discard the bytes, we're not actually writing to an image */
+return buflen;
+}
+
 static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
Error **errp)
 {
@@ -4240,11 +4258,13 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 uint64_t virtual_size; /* disk size as seen by guest */
 uint64_t refcount_bits;
 uint64_t l2_tables;
+uint64_t luks_payload_size = 0;
 size_t cluster_size;
 int version;
 char *optstr;
 PreallocMode prealloc;
 bool has_backing_file;
+bool has_luks;
 
 /* Parse image creation options */
 cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
@@ -4274,6 +4294,35 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 has_backing_file = !!optstr;
 g_free(optstr);
 
+optstr = qemu_opt_get_del(opts, BLOCK_OPT_ENCRYPT_FORMAT);
+has_luks = optstr && strcmp(optstr, "luks") == 0;
+g_free(optstr);
+
+if (has_luks) {
+QCryptoBlockCreateOptions cryptoopts = {
+.format = Q_CRYPTO_BLOCK_FORMAT_LUKS,
+};
+QCryptoBlock *crypto;
+size_t headerlen;
+
+optstr = qemu_opt_get_del(opts, "encrypt.key-secret");
+cryptoopts.u.luks.has_key_secret = !!optstr;
+cryptoopts.u.luks.key_secret = optstr;
+
+crypto = qcrypto_block_create(&cryptoopts, "encrypt.",
+  qcow2_measure_crypto_hdr_init_func,
+  qcow2_measure_crypto_hdr_write_func,
+  &headerlen, &local_err);
+
+g_free(optstr);
+if (!crypto) {
+goto err;
+}
+qcrypto_block_free(crypto);
+
+luks_payload_size = ROUND_UP(headerlen, cluster_size);
+}
+
 virtual_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
 virtual_size = ROUND_UP(virtual_size, cluster_size);
 
@@ -4344,7 +4393,7 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, 
BlockDriverState *in_bs,
 info = g_new(BlockMeasureInfo, 1);
 info->fully_allocated =
 qcow2_calc_prealloc_size(virtual_size, cluster_size,
- ctz32(refcount_bits));
+ ctz32(refcount_bits)) + luks_payload_size;
 
 /* Remove data clusters that are not required.  This overestimates the
  * required size because metadata needed for the fully allocated file is
-- 
2.20.1




[Qemu-block] [PATCH 0/2] qcow2: include LUKS payload overhead in qemu-img measure

2019-01-15 Thread Stefan Hajnoczi
The LUKS payload has a significant size (>1 MB).  It must be included in the
qemu-img measure calculation so we arrive at the correct minimum image size.

Stefan Hajnoczi (2):
  qcow2: include LUKS payload overhead in qemu-img measure
  iotests: add LUKS payload overhead to 178 qemu-img measure test

 block/qcow2.c| 51 +++-
 tests/qemu-iotests/178   |  8 +
 tests/qemu-iotests/178.out.qcow2 | 24 +++
 3 files changed, 82 insertions(+), 1 deletion(-)

-- 
2.20.1




Re: [Qemu-block] [PATCH v3 05/19] nbd/server: Favor [u]int64_t over off_t

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
12.01.2019 20:57, Eric Blake wrote:
> Although our compile-time environment is set up so that we always
> support long files with 64-bit off_t, we have no guarantee whether
> off_t is the same type as int64_t.  This requires casts when
> printing values, and prevents us from directly using qemu_strtoi64().
> Let's just flip to [u]int64_t (signed for length, because we have to
> detect failure of blk_getlength()

we have not, after previous patch

and because off_t was signed;
> unsigned for offset because it lets us simplify some math without
> having to worry about signed overflow).
> 
> Suggested-by: Vladimir Sementsov-Ogievskiy 
> Signed-off-by: Eric Blake 
> 
> ---
> v3: new patch
> ---
>   include/block/nbd.h |  4 ++--
>   nbd/server.c| 14 +++---
>   qemu-nbd.c  | 26 ++
>   3 files changed, 19 insertions(+), 25 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 1971b557896..0f252829376 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -294,8 +294,8 @@ int nbd_errno_to_system_errno(int err);
>   typedef struct NBDExport NBDExport;
>   typedef struct NBDClient NBDClient;
> 
> -NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
> -  const char *name, const char *description,
> +NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
> +  int64_t size, const char *name, const char *desc,

in previous patch you drop use of negative @size parameter, so it looks better
to use unsigned for size like for offset

> const char *bitmap, uint16_t nbdflags,
> void (*close)(NBDExport *), bool writethrough,
> BlockBackend *on_eject_blk, Error **errp);
> diff --git a/nbd/server.c b/nbd/server.c
> index c9937ccdc2a..15357d40fd7 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -77,8 +77,8 @@ struct NBDExport {
>   BlockBackend *blk;
>   char *name;
>   char *description;
> -off_t dev_offset;
> -off_t size;
> +uint64_t dev_offset;
> +int64_t size;
>   uint16_t nbdflags;
>   QTAILQ_HEAD(, NBDClient) clients;
>   QTAILQ_ENTRY(NBDExport) next;
> @@ -1455,8 +1455,8 @@ static void nbd_eject_notifier(Notifier *n, void *data)
>   nbd_export_close(exp);
>   }
> 
> -NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
> -  const char *name, const char *description,
> +NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
> +  int64_t size, const char *name, const char *desc,
> const char *bitmap, uint16_t nbdflags,
> void (*close)(NBDExport *), bool writethrough,
> BlockBackend *on_eject_blk, Error **errp)
> @@ -1497,7 +1497,7 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
> dev_offset, off_t size,
>   exp->blk = blk;
>   exp->dev_offset = dev_offset;
>   exp->name = g_strdup(name);
> -exp->description = g_strdup(description);
> +exp->description = g_strdup(desc);

unrelated but at least obvious, OK. However tiny note in commit message won't 
hurt.

>   exp->nbdflags = nbdflags;
>   assert(dev_offset <= size);
>   exp->size = QEMU_ALIGN_DOWN(size, BDRV_SECTOR_SIZE);
> @@ -2131,8 +2131,8 @@ static int nbd_co_receive_request(NBDRequestData *req, 
> NBDRequest *request,
>   if (request->from > client->exp->size ||
>   request->from + request->len > client->exp->size) {
>   error_setg(errp, "operation past EOF; From: %" PRIu64 ", Len: %" 
> PRIu32
> -   ", Size: %" PRIu64, request->from, request->len,
> -   (uint64_t)client->exp->size);
> +   ", Size: %" PRId64, request->from, request->len,
> +   client->exp->size);
>   return (request->type == NBD_CMD_WRITE ||
>   request->type == NBD_CMD_WRITE_ZEROES) ? -ENOSPC : -EINVAL;
>   }
> diff --git a/qemu-nbd.c b/qemu-nbd.c
> index ff4adb9b3eb..96c0829970c 100644
> --- a/qemu-nbd.c
> +++ b/qemu-nbd.c
> @@ -176,7 +176,7 @@ static void read_partition(uint8_t *p, struct 
> partition_record *r)
>   }
> 
>   static int find_partition(BlockBackend *blk, int partition,
> -  off_t *offset, off_t *size)
> +  uint64_t *offset, int64_t *size)

function never return negative @size, so what is the reason to keep it signed?

Also, type conversion (uint64_t) should be dropped from the function code I 
think.

>   {
>   struct partition_record mbr[4];
>   uint8_t data[MBR_SIZE];
> @@ -500,14 +500,14 @@ int main(int argc, char **argv)
>   {
>   BlockBackend *blk;
>   BlockDriverState *bs;
> -off_t dev_offset = 0;
> +uint64_t dev_offset = 0;
>   uint16_t nbdflags = 0;
>   bool disconnect = false;

Re: [Qemu-block] [PATCH v3 04/19] nbd/server: Hoist length check to qemp_nbd_server_add

2019-01-15 Thread Vladimir Sementsov-Ogievskiy
12.01.2019 20:57, Eric Blake wrote:
> We only had two callers to nbd_export_new; qemu-nbd.c always
> passed a valid offset/length pair (because it already checked
> the file length, to ensure that offset was in bounds), while
> blockdev-nbd always passed 0/-1.  Then nbd_export_new reduces
> the size to a multiple of BDRV_SECTOR_SIZE (can only happen
> when offset is not sector-aligned, since bdrv_getlength()
> currently rounds up), which can result in offset being greater
> than the enforced length, but that's not fatal (the server
> rejects client requests that exceed the advertised length).
> 
> However, I'm finding it easier to work with the code if we are
> consistent on having both callers pass in a valid length, and
> just assert that things are sane in nbd_export_new.
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v3: new patch
> ---
>   blockdev-nbd.c | 10 +-
>   nbd/server.c   |  9 ++---
>   2 files changed, 11 insertions(+), 8 deletions(-)
> 
> diff --git a/blockdev-nbd.c b/blockdev-nbd.c
> index c76d5416b90..d73ac1b026a 100644
> --- a/blockdev-nbd.c
> +++ b/blockdev-nbd.c
> @@ -146,6 +146,7 @@ void qmp_nbd_server_add(const char *device, bool 
> has_name, const char *name,
>   BlockDriverState *bs = NULL;
>   BlockBackend *on_eject_blk;
>   NBDExport *exp;
> +int64_t len;
> 
>   if (!nbd_server) {
>   error_setg(errp, "NBD server not running");
> @@ -168,6 +169,13 @@ void qmp_nbd_server_add(const char *device, bool 
> has_name, const char *name,
>   return;
>   }
> 
> +len = bdrv_getlength(bs);
> +if (len < 0) {
> +error_setg_errno(errp, -len,
> + "Failed to determine the NBD export's length");
> +return;
> +}
> +
>   if (!has_writable) {
>   writable = false;
>   }
> @@ -175,7 +183,7 @@ void qmp_nbd_server_add(const char *device, bool 
> has_name, const char *name,
>   writable = false;
>   }
> 
> -exp = nbd_export_new(bs, 0, -1, name, NULL, bitmap,
> +exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>writable ? 0 : NBD_FLAG_READ_ONLY,
>NULL, false, on_eject_blk, errp);
>   if (!exp) {
> diff --git a/nbd/server.c b/nbd/server.c
> index e8c56607eff..c9937ccdc2a 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1499,13 +1499,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t 
> dev_offset, off_t size,
>   exp->name = g_strdup(name);
>   exp->description = g_strdup(description);
>   exp->nbdflags = nbdflags;
> -exp->size = size < 0 ? blk_getlength(blk) : size;
> -if (exp->size < 0) {
> -error_setg_errno(errp, -exp->size,
> - "Failed to determine the NBD export's length");
> -goto fail;
> -}
> -exp->size -= exp->size % BDRV_SECTOR_SIZE;
> +assert(dev_offset <= size);

@size is not size of the image, but size of the export, so it may be less than 
dev_offset
(qemu-nbd.c do "fd_size -= dev_offset" before "nbd_export_new(bs, dev_offset, 
fd_size, "

> +exp->size = QEMU_ALIGN_DOWN(size, BDRV_SECTOR_SIZE);
> 
>   if (bitmap) {
>   BdrvDirtyBitmap *bm = NULL;
> 


-- 
Best regards,
Vladimir