Re: [Qemu-block] [Qemu-devel] [PATCH v2 00/13] vvfat: misc fixes for read-only mode

2017-07-05 Thread Hervé Poussineau

Hi,

Thanks for taking this patch series.

However, I already have the v3 patch series in my repository; its changelog is:

Changes v2->v3:
- added patches 5, 12, 16
- fixed warning (unused variable) (patch 11)
- added #defines for constants for the deleted byte, following Philippe's remarks (patch 14)
- added #define and explanations for the OEM name, following Philippe's remarks (patch 15)

Changes v1->v2:
- small changes following Kevin's remarks (patches 3, 5, 6)
- use g_utf8_* functions instead of ad-hoc code (patches 8 and 9)
- fix a bug with filenames starting with a dot (patch 9)

Hervé Poussineau (16):
  vvfat: fix qemu-img map and qemu-img convert
  vvfat: replace tabs by 8 spaces
  vvfat: fix typos
  vvfat: rename useless enumeration values
  vvfat: add constants for special values of name[0]
  vvfat: introduce offset_to_bootsector, offset_to_fat and
offset_to_root_dir
  vvfat: fix field names in FAT12/FAT16 and FAT32 boot sectors
  vvfat: always create . and .. entries at first and in that order
  vvfat: correctly create long names for non-ASCII filenames
  vvfat: correctly create base short names for non-ASCII filenames
  vvfat: correctly generate numeric-tail of short file names
  vvfat: correctly parse non-ASCII short and long file names
  vvfat: limit number of entries in root directory in FAT12/FAT16
  vvfat: handle KANJI lead byte 0xe5
  vvfat: change OEM name to 'MSWIN4.1'
  vvfat: initialize memory after allocating it

Should I rebase on top of your branch, or should I send the v3 as is?

It fixes the last random errors I had in Win9x Scandisk (uninitialized memory).

Regards,

Hervé

On 03/07/2017 at 18:50, Kevin Wolf wrote:

On 22.05.2017 at 23:11, Hervé Poussineau wrote:

Hi,

This patchset fixes some of the issues I encountered when trying to use vvfat,
and fixes bug #1599539: https://bugs.launchpad.net/qemu/+bug/1599539

Patch 1 fixes a crash when using 'qemu-img convert'.
Patches 2 to 6 are code cleanup. No functional changes.
Patches 7 to 13 fix problems detected by disk checking utilities in read-only mode.

With these patches, vvfat creates valid FAT volumes and can be used with QEMU 
disk utilities.

Read-write mode is still buggy after this patchset, but at least, I was not
able to crash QEMU anymore.

Note that patch 2 doesn't pass checkpatch.pl, as it changes indentation only.


Thanks, fixed the build error in patch 9 (yet unused variables) and
applied to the block branch.

There were a few more minor comments for this series, but it has been on
the list for long enough and I figured that they can be addressed on
top.

Kevin






Re: [Qemu-block] [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()

2017-07-05 Thread Eric Blake
On 07/03/2017 05:14 PM, Eric Blake wrote:
> Not all callers care about which BDS owns the mapping for a given
> range of the file.  In particular, bdrv_is_allocated() cares more
> about finding the largest run of allocated data from the guest
> perspective, whether or not that data is consecutive from the
> host perspective.  Therefore, doing subsequent refinements such
> as checking how much of the format-layer allocation also satisfies
> BDRV_BLOCK_ZERO at the protocol layer is wasted work - in the best
> case, it just costs extra CPU cycles during a single
> bdrv_is_allocated(), but in the worst case, it results in a smaller
> *pnum, and forces callers to iterate through more status probes when
> visiting the entire file for even more extra CPU cycles.
> 
> This patch only optimizes the block layer.  But subsequent patches
> will tweak the driver callback to be byte-based, and in the process,
> can also pass this hint through to the driver.
> 
> Signed-off-by: Eric Blake 
> 

> @@ -1810,12 +1817,13 @@ static int64_t coroutine_fn 
> bdrv_co_get_block_status(BlockDriverState *bs,
>  }
>  }
> 
> -if (local_file && local_file != bs &&
> +if (!allocation && local_file && local_file != bs &&
>  (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>  (ret & BDRV_BLOCK_OFFSET_VALID)) {
>  int file_pnum;
> 
> -ret2 = bdrv_co_get_block_status(local_file, ret >> BDRV_SECTOR_BITS,
> +ret2 = bdrv_co_get_block_status(local_file, true,
> +ret >> BDRV_SECTOR_BITS,
>  *pnum, &file_pnum, NULL);
>  if (ret2 >= 0) {
>  /* Ignore errors.  This is just providing extra information, it

Hmm. My initial thinking here was that if we already have a good primary
status, we want our secondary status (where we are probing bs->file for
whether we can add BDRV_BLOCK_ZERO) to be as fast as possible, so I
hard-coded the answer that favors is_allocated (I have to be careful
describing this, since v3 will switch from 'bool allocated=true' to
'bool mapping=false' to express that same request).  But it turns out
that, at least for file-posix.c (for that matter, for several protocol
drivers), it's a LOT faster to just blindly state that the entire file
is allocated and data than it is to lseek(SEEK_HOLE).  So favoring
allocation status instead of mapping status defeats the purpose, and
this should be s/true/allocation/ (which is always false at this point)
[or conversely s/false/mapping/, which is always true, in v3].
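
For readers following along, here is a minimal standalone sketch of the
trade-off described above; it is not QEMU code, and every identifier in it
is invented for illustration.  The point is simply that the second,
potentially expensive probe of the protocol layer (which on file-posix may
translate into an lseek(SEEK_HOLE) call) is only worth doing when the caller
actually asked for mapping/zero detail:

#include <stdbool.h>
#include <stdio.h>

typedef struct ToyStatus {
    bool allocated;   /* data comes from this layer (guest view) */
    bool zero;        /* range is known to read as zeroes */
} ToyStatus;

static ToyStatus probe_format_layer(long offset)
{
    /* cheap: consult format-layer metadata only */
    (void)offset;
    return (ToyStatus){ .allocated = true, .zero = false };
}

static ToyStatus probe_protocol_layer(long offset)
{
    /* potentially expensive: stands in for an lseek(SEEK_HOLE) probe */
    (void)offset;
    return (ToyStatus){ .allocated = true, .zero = false };
}

static ToyStatus toy_get_status(long offset, bool want_mapping)
{
    ToyStatus st = probe_format_layer(offset);

    /* Only refine with the protocol layer when the caller cares about
     * mapping/zero detail; an is_allocated-style caller does not. */
    if (want_mapping && st.allocated && !st.zero) {
        if (probe_protocol_layer(offset).zero) {
            st.zero = true;
        }
    }
    return st;
}

int main(void)
{
    ToyStatus st = toy_get_status(0, false);
    printf("allocated=%d zero=%d\n", st.allocated, st.zero);
    return 0;
}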

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH v4 0/1] virtio-scsi-ccw: fix iotest 068 for s390x

2017-07-05 Thread QingFeng Hao



On 2017/7/5 23:15, Stefan Hajnoczi wrote:

On Tue, Jul 04, 2017 at 03:23:49PM +0200, QingFeng Hao wrote:

This commit fixes iotest 068 for s390x as s390x uses virtio-scsi-ccw.
It's based on commit c324fd0a39c by Stefan Hajnoczi.
Thanks!

Change history:
v4:
 Got Cornelia Huck's Reviewed-by and took the comment to change the
 commit message.

v3:
 Took Christian Borntraeger's and Cornelia Huck's comments to check
 whether KVM is enabled in s390_assign_subch_ioeventfd instead of
 kvm_s390_assign_subch_ioeventfd, as the former is the more general one.

v2:
 Remove Stefan from sign-off list and change the patch's commit message
 according to Christian Borntraeger's comment.

QingFeng Hao (1):
   virtio-scsi-ccw: use ioeventfd even when KVM is disabled

  hw/s390x/virtio-ccw.c | 2 +-
  target/s390x/cpu.h    | 6 +-
  2 files changed, 6 insertions(+), 2 deletions(-)

I didn't realize s390 also has this old check.  Thanks for fixing it!

Thanks Stefan!


Reviewed-by: Stefan Hajnoczi 


--
Regards
QingFeng Hao




Re: [Qemu-block] [Qemu-devel] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors

2017-07-05 Thread John Snow


On 07/05/2017 05:08 PM, Eric Blake wrote:
> bdrv_is_allocated_above() was relying on intermediate->total_sectors,
> which is a field that can have stale contents depending on the value
> of intermediate->has_variable_length.  An audit shows that we are safe
> (we were first calling through bdrv_co_get_block_status() which in
> turn calls bdrv_nb_sectors() and therefore just refreshed the current
> length), but it's nicer to favor our accessor functions to avoid having
> to repeat such an audit, even if it means refresh_total_sectors() is
> called more frequently.
> 
> Suggested-by: John Snow 
> Signed-off-by: Eric Blake 
> Reviewed-by: Manos Pitsidianakis 
> Reviewed-by: Jeff Cody 

Reviewed-by: John Snow 



Re: [Qemu-block] [Qemu-devel] [PATCH v4 08/21] mirror: Switch MirrorBlockJob to byte-based

2017-07-05 Thread John Snow


On 07/05/2017 05:08 PM, Eric Blake wrote:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting an
> internal structure (no semantic change), and all references to the
> buffer size.
> 
> Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS
> (necessary for interaction with sector-based dirty bitmaps, until
> a later patch converts those to be byte-based) does not suffer from
> truncation problems.
> 
> [checkpatch has a false positive on use of MIN() in this patch]
> 
> Signed-off-by: Eric Blake 
> 

Reviewed-by: John Snow 



Re: [Qemu-block] [Qemu-devel] [PATCH v4 04/21] stream: Drop reached_end for stream_complete()

2017-07-05 Thread John Snow


On 07/05/2017 05:08 PM, Eric Blake wrote:
> stream_complete() skips the work of rewriting the backing file if
> the job was cancelled, if data->reached_end is false, or if there
> was an error detected (non-zero data->ret) during the streaming.
> But note that in stream_run(), data->reached_end is only set if the
> loop ran to completion, and data->ret is only 0 in two cases:
> either the loop ran to completion (possibly by cancellation, but
> stream_complete checks for that), or we took an early goto out
> because there is no bs->backing.  Thus, we can preserve the same
> semantics without the use of reached_end, by merely checking for
> bs->backing (and logically, if there was no backing file, streaming
> is a no-op, so there is no backing file to rewrite).
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: Eric Blake 
> 
> ---
> v4: new patch
> ---
>  block/stream.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/block/stream.c b/block/stream.c
> index 746d525..12f1659 100644
> --- a/block/stream.c
> +++ b/block/stream.c
> @@ -59,7 +59,6 @@ static int coroutine_fn stream_populate(BlockBackend *blk,
> 
>  typedef struct {
>  int ret;
> -bool reached_end;
>  } StreamCompleteData;
> 
>  static void stream_complete(BlockJob *job, void *opaque)
> @@ -70,7 +69,7 @@ static void stream_complete(BlockJob *job, void *opaque)
>  BlockDriverState *base = s->base;
>  Error *local_err = NULL;
> 
> -if (!block_job_is_cancelled(&s->common) && data->reached_end &&
> +if (!block_job_is_cancelled(&s->common) && bs->backing &&
>  data->ret == 0) {
>  const char *base_id = NULL, *base_fmt = NULL;
>  if (base) {
> @@ -211,7 +210,6 @@ out:
>  /* Modify backing chain and close BDSes in main loop */
>  data = g_malloc(sizeof(*data));
>  data->ret = ret;
> -data->reached_end = sector_num == end;
>  block_job_defer_to_main_loop(&s->common, stream_complete, data);
>  }
> 

This seems ever so slightly less intuitive to me, but it is functionally
identical.

Reviewed-by: John Snow 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 01/15] block: add default implementations for bdrv_co_get_block_status()

2017-07-05 Thread Eric Blake
On 07/03/2017 05:14 PM, Eric Blake wrote:
> From: Manos Pitsidianakis 
> 
> bdrv_co_get_block_status_from_file() and
> bdrv_co_get_block_status_from_backing() set *file to bs->file and
> bs->backing respectively, so that bdrv_co_get_block_status() can recurse
> to them. Future block drivers won't have to duplicate code to implement
> this.
> 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Manos Pitsidianakis 
> Message-Id: <20170629184320.7151-4-el13...@mail.ntua.gr>
> 

Missing my Signed-off-by if we take this one in isolation through my
series; still awaiting resolution on what will happen to the rest of
his series (v3 had some valid review comments that still need addressing).

> ---
> v2: Including this patch from Manos, since it affects my later patches;
> however, I anticipate that we will get a full v3 series from Manos
> merged first


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH v2 00/11] Block layer thread-safety, part 2

2017-07-05 Thread Paolo Bonzini
On 29/06/2017 15:27, Paolo Bonzini wrote:
> This part takes care of drivers and devices, making sure that they can
> accept concurrent I/O from multiple AioContext.
> 
> The following drivers are thread-safe without using any QemuMutex/CoMutex:
> crypto, gluster, null, rbd, win32-aio.  NBD has already been fixed,
> because the patch fixed an unrelated testcase.
> 
> The following drivers already use mutexes for everything except possibly
> snapshots, which do not (yet?) need protection: bochs, cloop, dmg, qcow,
> parallels, vhdx, vmdk, curl, iscsi, nfs.
> 
> The following drivers already use mutexes for _almost_ everything: vpc
> (missing get_block_status), vdi (missing bitmap access), vvfat (missing
> commit, which is not protected), qcow2 (must call CoQueue APIs under CoMutex).
> They are fixed by patches 1-5.
> 
> The following drivers must be changed to use CoMutex to protect internal
> data: qed (patches 6-9), sheepdog (patch 10).
> 
> The following driver must be changed to support I/O from any AioContext:
> ssh.  It is fixed by patch 11.
> 
> Paolo
> 
> v1->v2: new patch 8 + adjustments to patch 9 to fix qemu-iotests testcase
> 183 (bdrv_invalidate_cache from block migration)
> 
> Paolo Bonzini (11):
>   qcow2: call CoQueue APIs under CoMutex
>   coroutine-lock: add qemu_co_rwlock_downgrade and
> qemu_co_rwlock_upgrade
>   vdi: make it thread-safe
>   vpc: make it thread-safe
>   vvfat: make it thread-safe
>   qed: move tail of qed_aio_write_main to qed_aio_write_{cow,alloc}
>   block: invoke .bdrv_drain callback in coroutine context and from
> AioContext
>   qed: introduce bdrv_qed_init_state
>   qed: protect table cache with CoMutex
>   sheepdog: add queue_lock
>   ssh: support I/O from any AioContext
> 
>  block/io.c |  42 +++--
>  block/qcow2.c  |   4 +-
>  block/qed-cluster.c|   4 +-
>  block/qed-l2-cache.c   |   6 ++
>  block/qed-table.c  |  24 +++--
>  block/qed.c| 214 
> -
>  block/qed.h|  11 ++-
>  block/sheepdog.c   |  21 -
>  block/ssh.c|  24 +++--
>  block/vdi.c|  48 +-
>  block/vpc.c|  20 ++---
>  block/vvfat.c  |   8 +-
>  include/block/block_int.h  |   2 +-
>  include/qemu/coroutine.h   |  18 
>  util/qemu-coroutine-lock.c |  35 
>  15 files changed, 331 insertions(+), 150 deletions(-)
> 


ping?

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 12/17] migration: add postcopy migration of dirty bitmaps

2017-07-05 Thread John Snow


On 07/05/2017 05:24 AM, Vladimir Sementsov-Ogievskiy wrote:
> 16.02.2017 16:04, Fam Zheng wrote:
>>> +dbms->node_name = bdrv_get_node_name(bs);
>>> +if (!dbms->node_name || dbms->node_name[0] == '\0') {
>>> +dbms->node_name = bdrv_get_device_name(bs);
>>> +}
>>> +dbms->bitmap = bitmap;
>> What protects the case that the bitmap is released before migration
>> completes?
>>
> What is the source of such deletion? A QMP command? Theoretically possible.
> 
> I see the following variants:
> 
> 1. additional variable BdrvDirtyBitmap.migration, which forbids bitmap
> deletion
> 
> 2. make bitmap anonymous (bdrv_dirty_bitmap_make_anon) - it will not be
> available through qmp
> 

Making the bitmap anonymous would prevent us from querying it, and there
is no general reason to forbid queries, except that a third party
attempting to use the bitmap during a migration is probably a bad idea.
I don't really like the idea of "hiding" information from the user,
though, because then we'd have to worry about name collisions when we
de-anonymize the bitmap again. That's not so palatable.

> what do you think?
> 

The modes for bitmaps are getting messy.

As a reminder, the officially exposed "modes" of a bitmap are currently:

FROZEN: Cannot be reset/deleted. Implication is that the bitmap is
otherwise "ACTIVE."
DISABLED: Not recording any writes (by choice.)
ACTIVE: Actively recording writes.

These are documented in the public API as possibilities for
DirtyBitmapStatus in block-core.json. We didn't add a new condition for
"readonly" either, which I think is actually required:

READONLY: Not recording any writes (by necessity.)


Your new use case here sounds like Frozen to me, but it simply does not
have an anonymous successor to force it to be recognized as "frozen." We
can add a `bool protected` or `bool frozen` field to force recognition
of this status and adjust the json documentation accordingly.

I think then we'd have four recognized states:

FROZEN: Cannot be reset/deleted. Bitmap is in-use by a block job or
other internal process. Bitmap is otherwise ACTIVE.
DISABLED: Not recording any writes (by choice.)
READONLY: Not able to record any writes (by necessity.)
ACTIVE: Normal bitmap status.

Sound right?
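
To make the proposal concrete, here is a rough standalone sketch of how the
four states could be derived from per-bitmap flags.  This is not the QEMU
implementation; the type and field names are hypothetical:

#include <stdbool.h>
#include <stdio.h>

typedef enum {
    TOY_BITMAP_ACTIVE,
    TOY_BITMAP_DISABLED,
    TOY_BITMAP_READONLY,
    TOY_BITMAP_FROZEN,
} ToyBitmapStatus;

typedef struct {
    bool frozen;    /* in use by a block job or other internal process */
    bool readonly;  /* cannot record writes (by necessity) */
    bool enabled;   /* recording writes (by choice) */
} ToyBitmap;

static ToyBitmapStatus toy_bitmap_status(const ToyBitmap *bm)
{
    if (bm->frozen) {
        return TOY_BITMAP_FROZEN;   /* cannot be reset/deleted */
    } else if (bm->readonly) {
        return TOY_BITMAP_READONLY;
    } else if (!bm->enabled) {
        return TOY_BITMAP_DISABLED;
    }
    return TOY_BITMAP_ACTIVE;
}

int main(void)
{
    ToyBitmap migrating = { .frozen = true, .enabled = true };
    printf("status=%d (expect FROZEN=%d)\n",
           toy_bitmap_status(&migrating), TOY_BITMAP_FROZEN);
    return 0;
}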



[Qemu-block] [PATCH v4 18/21] backup: Switch backup_run() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of backups to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are cluster-aligned).
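
As a condensed illustration of what the diff below does (simplified,
standalone, with invented variable names standing in for the job fields),
the converted loop advances by cluster-sized byte offsets instead of
keeping a separate sector counter:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Toy numbers for illustration only. */
    int64_t total_len    = 10 * 65536 + 4096;  /* stand-in for job->common.len */
    int64_t cluster_size = 65536;              /* stand-in for job->cluster_size */
    int64_t offset;
    int iterations = 0;

    /* Byte-based iteration: step by cluster_size, so no rescaling between
     * sector indices and byte offsets is needed at every use. */
    for (offset = 0; offset < total_len; offset += cluster_size) {
        iterations++;
    }
    printf("%d cluster-sized steps\n", iterations);  /* prints 11 */
    return 0;
}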

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
---
v2-v4: no change
---
 block/backup.c | 32 +++-
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index c029d44..04def91 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -370,11 +370,10 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 int ret = 0;
 int clusters_per_iter;
 uint32_t granularity;
-int64_t sector;
+int64_t offset;
 int64_t cluster;
 int64_t end;
 int64_t last_cluster = -1;
-int64_t sectors_per_cluster = cluster_size_sectors(job);
 BdrvDirtyBitmapIter *dbi;

 granularity = bdrv_dirty_bitmap_granularity(job->sync_bitmap);
@@ -382,8 +381,8 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 dbi = bdrv_dirty_iter_new(job->sync_bitmap, 0);

 /* Find the next dirty sector(s) */
-while ((sector = bdrv_dirty_iter_next(dbi)) != -1) {
-cluster = sector / sectors_per_cluster;
+while ((offset = bdrv_dirty_iter_next(dbi) * BDRV_SECTOR_SIZE) >= 0) {
+cluster = offset / job->cluster_size;

 /* Fake progress updates for any clusters we skipped */
 if (cluster != last_cluster + 1) {
@@ -410,7 +409,8 @@ static int coroutine_fn 
backup_run_incremental(BackupBlockJob *job)
 /* If the bitmap granularity is smaller than the backup granularity,
  * we need to advance the iterator pointer to the next cluster. */
 if (granularity < job->cluster_size) {
-bdrv_set_dirty_iter(dbi, cluster * sectors_per_cluster);
+bdrv_set_dirty_iter(dbi,
+cluster * job->cluster_size / 
BDRV_SECTOR_SIZE);
 }

 last_cluster = cluster - 1;
@@ -432,17 +432,15 @@ static void coroutine_fn backup_run(void *opaque)
 BackupBlockJob *job = opaque;
 BackupCompleteData *data;
 BlockDriverState *bs = blk_bs(job->common.blk);
-int64_t start, end;
+int64_t offset;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
 int ret = 0;

 QLIST_INIT(&job->inflight_reqs);
 qemu_co_rwlock_init(&job->flush_rwlock);

-start = 0;
-end = DIV_ROUND_UP(job->common.len, job->cluster_size);
-
-job->done_bitmap = bitmap_new(end);
+job->done_bitmap = bitmap_new(DIV_ROUND_UP(job->common.len,
+   job->cluster_size));

 job->before_write.notify = backup_before_write_notify;
 bdrv_add_before_write_notifier(bs, &job->before_write);
@@ -457,7 +455,8 @@ static void coroutine_fn backup_run(void *opaque)
 ret = backup_run_incremental(job);
 } else {
 /* Both FULL and TOP SYNC_MODE's require copying.. */
-for (; start < end; start++) {
+for (offset = 0; offset < job->common.len;
+ offset += job->cluster_size) {
 bool error_is_read;
 int alloced = 0;

@@ -480,8 +479,8 @@ static void coroutine_fn backup_run(void *opaque)
  * needed but at some point that is always the case. */
 alloced =
 bdrv_is_allocated(bs,
-start * sectors_per_cluster + i,
-sectors_per_cluster - i, &n);
+  (offset >> BDRV_SECTOR_BITS) + i,
+  sectors_per_cluster - i, &n);
 i += n;

 if (alloced || n == 0) {
@@ -499,9 +498,8 @@ static void coroutine_fn backup_run(void *opaque)
 if (alloced < 0) {
 ret = alloced;
 } else {
-ret = backup_do_cow(job, start * job->cluster_size,
-job->cluster_size, &error_is_read,
-false);
+ret = backup_do_cow(job, offset, job->cluster_size,
+&error_is_read, false);
 }
 if (ret < 0) {
 /* Depending on error action, fail now or retry cluster */
@@ -510,7 +508,7 @@ static void coroutine_fn backup_run(void *opaque)
 if (action == BLOCK_ERROR_ACTION_REPORT) {
 break;
 } else {
-start--;
+offset -= job->cluster_size;
 continue;
 }
 }
-- 
2.9.4




[Qemu-block] [PATCH v4 19/21] block: Make bdrv_is_allocated() byte-based

2017-07-05 Thread Eric Blake
We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned
on input and that *pnum is sector-aligned on return to the caller,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, this code adds usages like
DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned
values, where the call might reasonably give non-aligned results
in the future; on the other hand, no rounding is needed for callers
that should just continue to work with byte alignment.
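
As a small worked illustration of the DIV_ROUND_UP(, BDRV_SECTOR_SIZE)
scaling mentioned above (standalone toy, not QEMU code; the macro here
mirrors the usual round-up-division definition):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_SECTOR_SIZE 512
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

int main(void)
{
    int64_t aligned   = 4608;   /* exactly 9 sectors */
    int64_t unaligned = 4700;   /* ends mid-sector, e.g. tail of an odd-sized file */

    /* Rounding up keeps sector-based callers covering the whole byte range. */
    printf("%" PRId64 "\n", DIV_ROUND_UP(aligned, (int64_t)TOY_SECTOR_SIZE));   /* 9 */
    printf("%" PRId64 "\n", DIV_ROUND_UP(unaligned, (int64_t)TOY_SECTOR_SIZE)); /* 10 */
    return 0;
}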

For the most part this patch is just the addition of scaling at the
callers followed by inverse scaling at bdrv_is_allocated().  But
some code, particularly bdrv_commit(), gets a lot simpler because it
no longer has to mess with sectors; also, it is now possible to pass
NULL if the caller does not care how much of the image is allocated
beyond the initial offset.

For ease of review, bdrv_is_allocated_above() will be tackled
separately.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Juan Quintela 
Reviewed-by: Jeff Cody 

---
v4: rebase to vvfat TAB cleanup, R-b kept
v3: no change
v2: rebase to earlier changes, tweak commit message
---
 include/block/block.h |  4 +--
 block/backup.c| 17 -
 block/commit.c| 21 +++-
 block/io.c| 49 +---
 block/stream.c|  5 ++--
 block/vvfat.c | 34 ++---
 migration/block.c |  9 ---
 qemu-img.c|  5 +++-
 qemu-io-cmds.c| 70 +++
 9 files changed, 114 insertions(+), 100 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index a9dc753..d3e01fb 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -427,8 +427,8 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 int64_t sector_num,
 int nb_sectors, int *pnum,
 BlockDriverState **file);
-int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-  int *pnum);
+int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
+  int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
 int64_t sector_num, int nb_sectors, int *pnum);

diff --git a/block/backup.c b/block/backup.c
index 04def91..b2048bf 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -47,12 +47,6 @@ typedef struct BackupBlockJob {
 QLIST_HEAD(, CowRequest) inflight_reqs;
 } BackupBlockJob;

-/* Size of a cluster in sectors, instead of bytes. */
-static inline int64_t cluster_size_sectors(BackupBlockJob *job)
-{
-  return job->cluster_size / BDRV_SECTOR_SIZE;
-}
-
 /* See if in-flight requests overlap and wait for them to complete */
 static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
int64_t offset,
@@ -433,7 +427,6 @@ static void coroutine_fn backup_run(void *opaque)
 BackupCompleteData *data;
 BlockDriverState *bs = blk_bs(job->common.blk);
 int64_t offset;
-int64_t sectors_per_cluster = cluster_size_sectors(job);
 int ret = 0;

 QLIST_INIT(&job->inflight_reqs);
@@ -465,12 +458,13 @@ static void coroutine_fn backup_run(void *opaque)
 }

 if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
-int i, n;
+int i;
+int64_t n;

 /* Check to see if these blocks are already in the
  * backing file. */

-for (i = 0; i < sectors_per_cluster;) {
+for (i = 0; i < job->cluster_size;) {
 /* bdrv_is_allocated() only returns true/false based
  * on the first set of sectors it comes across that
  * are are all in the same state.
@@ -478,9 +472,8 @@ static void coroutine_fn backup_run(void *opaque)
  * backup cluster length.  We end up copying more than
  * needed but at some point that is always the case. */
 alloced =
-bdrv_is_allocated(bs,
-  (offset >> BDRV_SECTOR_BITS) + i,
-  sectors_per_cluster - i, &n);
+bdrv_is_allocated(bs, offset + i,
+   

[Qemu-block] [PATCH v4 16/21] backup: Switch block_backup.h to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting
the public interface to backup jobs (no semantic change), including
a change to CowRequest to track by bytes instead of cluster indices.

Note that this does not change the difference between the public
interface (starting point, and size of the subsequent range) and
the internal interface (starting and end points).
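
The internal start/end form can be derived from the public offset/bytes form
by aligning down and up to the cluster size.  A minimal standalone sketch of
that conversion (toy values, local macro definitions, not QEMU code):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
#define QEMU_ALIGN_UP(n, m)   QEMU_ALIGN_DOWN((n) + (m) - 1, (m))

int main(void)
{
    int64_t cluster_size = 65536;
    int64_t offset = 70000;          /* public interface: start of range */
    uint64_t bytes = 10000;          /* public interface: length of range */

    /* internal interface: cluster-aligned start and end points */
    int64_t start = QEMU_ALIGN_DOWN(offset, cluster_size);
    int64_t end   = QEMU_ALIGN_UP(offset + (int64_t)bytes, cluster_size);

    printf("[%" PRId64 ", %" PRId64 ")\n", start, end);  /* [65536, 131072) */
    return 0;
}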

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Xie Changlong 
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: change a couple more parameter names
---
 include/block/block_backup.h | 11 +--
 block/backup.c   | 33 -
 block/replication.c  | 12 
 3 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/include/block/block_backup.h b/include/block/block_backup.h
index 8a75947..994a3bd 100644
--- a/include/block/block_backup.h
+++ b/include/block/block_backup.h
@@ -21,17 +21,16 @@
 #include "block/block_int.h"

 typedef struct CowRequest {
-int64_t start;
-int64_t end;
+int64_t start_byte;
+int64_t end_byte;
 QLIST_ENTRY(CowRequest) list;
 CoQueue wait_queue; /* coroutines blocked on this request */
 } CowRequest;

-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
-  int nb_sectors);
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
+  uint64_t bytes);
 void backup_cow_request_begin(CowRequest *req, BlockJob *job,
-  int64_t sector_num,
-  int nb_sectors);
+  int64_t offset, uint64_t bytes);
 void backup_cow_request_end(CowRequest *req);

 void backup_do_checkpoint(BlockJob *job, Error **errp);
diff --git a/block/backup.c b/block/backup.c
index 4e64710..cfbd921 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -55,7 +55,7 @@ static inline int64_t cluster_size_sectors(BackupBlockJob 
*job)

 /* See if in-flight requests overlap and wait for them to complete */
 static void coroutine_fn wait_for_overlapping_requests(BackupBlockJob *job,
-   int64_t start,
+   int64_t offset,
int64_t end)
 {
 CowRequest *req;
@@ -64,7 +64,7 @@ static void coroutine_fn 
wait_for_overlapping_requests(BackupBlockJob *job,
 do {
 retry = false;
 QLIST_FOREACH(req, &job->inflight_reqs, list) {
-if (end > req->start && start < req->end) {
+if (end > req->start_byte && offset < req->end_byte) {
 qemu_co_queue_wait(&req->wait_queue, NULL);
 retry = true;
 break;
@@ -75,10 +75,10 @@ static void coroutine_fn 
wait_for_overlapping_requests(BackupBlockJob *job,

 /* Keep track of an in-flight request */
 static void cow_request_begin(CowRequest *req, BackupBlockJob *job,
- int64_t start, int64_t end)
+  int64_t offset, int64_t end)
 {
-req->start = start;
-req->end = end;
+req->start_byte = offset;
+req->end_byte = end;
 qemu_co_queue_init(&req->wait_queue);
 QLIST_INSERT_HEAD(&job->inflight_reqs, req, list);
 }
@@ -114,8 +114,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
   sector_num * BDRV_SECTOR_SIZE,
   nb_sectors * BDRV_SECTOR_SIZE);

-wait_for_overlapping_requests(job, start, end);
-cow_request_begin(&cow_request, job, start, end);
+wait_for_overlapping_requests(job, start * job->cluster_size,
+  end * job->cluster_size);
+cow_request_begin(&cow_request, job, start * job->cluster_size,
+  end * job->cluster_size);

 for (; start < end; start++) {
 if (test_bit(start, job->done_bitmap)) {
@@ -277,32 +279,29 @@ void backup_do_checkpoint(BlockJob *job, Error **errp)
 bitmap_zero(backup_job->done_bitmap, len);
 }

-void backup_wait_for_overlapping_requests(BlockJob *job, int64_t sector_num,
-  int nb_sectors)
+void backup_wait_for_overlapping_requests(BlockJob *job, int64_t offset,
+  uint64_t bytes)
 {
 BackupBlockJob *backup_job = container_of(job, BackupBlockJob, common);
-int64_t sectors_per_cluster = cluster_size_sectors(backup_job);
 int64_t start, end;

 assert(job->driver->job_type == BLOCK_JOB_TYPE_BACKUP);

-start = sector_num / sectors_per_cluster;
-end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+start = QEMU_ALIGN_DOWN(offset, backup_job->cluster_size);
+end = QEMU_ALIGN_UP(offset + bytes, backup_job->cluster_size);
 wait_for

[Qemu-block] [PATCH v4 17/21] backup: Switch backup_do_cow() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 

---
v2-v4: no change
---
 block/backup.c | 62 --
 1 file changed, 26 insertions(+), 36 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index cfbd921..c029d44 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -91,7 +91,7 @@ static void cow_request_end(CowRequest *req)
 }

 static int coroutine_fn backup_do_cow(BackupBlockJob *job,
-  int64_t sector_num, int nb_sectors,
+  int64_t offset, uint64_t bytes,
   bool *error_is_read,
   bool is_write_notifier)
 {
@@ -101,34 +101,28 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 QEMUIOVector bounce_qiov;
 void *bounce_buffer = NULL;
 int ret = 0;
-int64_t sectors_per_cluster = cluster_size_sectors(job);
-int64_t start, end; /* clusters */
+int64_t start, end; /* bytes */
 int n; /* bytes */

 qemu_co_rwlock_rdlock(&job->flush_rwlock);

-start = sector_num / sectors_per_cluster;
-end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);
+start = QEMU_ALIGN_DOWN(offset, job->cluster_size);
+end = QEMU_ALIGN_UP(bytes + offset, job->cluster_size);

-trace_backup_do_cow_enter(job, start * job->cluster_size,
-  sector_num * BDRV_SECTOR_SIZE,
-  nb_sectors * BDRV_SECTOR_SIZE);
+trace_backup_do_cow_enter(job, start, offset, bytes);

-wait_for_overlapping_requests(job, start * job->cluster_size,
-  end * job->cluster_size);
-cow_request_begin(&cow_request, job, start * job->cluster_size,
-  end * job->cluster_size);
+wait_for_overlapping_requests(job, start, end);
+cow_request_begin(&cow_request, job, start, end);

-for (; start < end; start++) {
-if (test_bit(start, job->done_bitmap)) {
-trace_backup_do_cow_skip(job, start * job->cluster_size);
+for (; start < end; start += job->cluster_size) {
+if (test_bit(start / job->cluster_size, job->done_bitmap)) {
+trace_backup_do_cow_skip(job, start);
 continue; /* already copied */
 }

-trace_backup_do_cow_process(job, start * job->cluster_size);
+trace_backup_do_cow_process(job, start);

-n = MIN(job->cluster_size,
-job->common.len - start * job->cluster_size);
+n = MIN(job->cluster_size, job->common.len - start);

 if (!bounce_buffer) {
 bounce_buffer = blk_blockalign(blk, job->cluster_size);
@@ -137,11 +131,10 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 iov.iov_len = n;
 qemu_iovec_init_external(&bounce_qiov, &iov, 1);

-ret = blk_co_preadv(blk, start * job->cluster_size,
-bounce_qiov.size, &bounce_qiov,
+ret = blk_co_preadv(blk, start, bounce_qiov.size, &bounce_qiov,
 is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
 if (ret < 0) {
-trace_backup_do_cow_read_fail(job, start * job->cluster_size, ret);
+trace_backup_do_cow_read_fail(job, start, ret);
 if (error_is_read) {
 *error_is_read = true;
 }
@@ -149,22 +142,22 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 }

 if (buffer_is_zero(iov.iov_base, iov.iov_len)) {
-ret = blk_co_pwrite_zeroes(job->target, start * job->cluster_size,
+ret = blk_co_pwrite_zeroes(job->target, start,
bounce_qiov.size, BDRV_REQ_MAY_UNMAP);
 } else {
-ret = blk_co_pwritev(job->target, start * job->cluster_size,
+ret = blk_co_pwritev(job->target, start,
  bounce_qiov.size, &bounce_qiov,
  job->compress ? BDRV_REQ_WRITE_COMPRESSED : 
0);
 }
 if (ret < 0) {
-trace_backup_do_cow_write_fail(job, start * job->cluster_size, 
ret);
+trace_backup_do_cow_write_fail(job, start, ret);
 if (error_is_read) {
 *error_is_read = false;
 }
 goto out;
 }

-set_bit(start, job->done_bitmap);
+set_bit(start / job->cluster_size, job->done_bitmap);

 /* Publish progress, guest I/O counts as progress too.  Note that the
  * offset field is an opaque progress value, it is not a disk offset.
@@ -180,8 +173,7 @@ out:

 cow_request_end(&cow_request);

-trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
-   nb_

[Qemu-block] [PATCH v4 08/21] mirror: Switch MirrorBlockJob to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting an
internal structure (no semantic change), and all references to the
buffer size.

Add an assertion that our use of s->granularity >> BDRV_SECTOR_BITS
(necessary for interaction with sector-based dirty bitmaps, until
a later patch converts those to be byte-based) does not suffer from
truncation problems.

[checkpatch has a false positive on use of MIN() in this patch]
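
As a small standalone illustration of the truncation concern that the added
assertion guards against (toy values and names, not the actual assertion from
this patch): a granularity smaller than one 512-byte sector would shift down
to zero sectors and break the interaction with sector-based dirty bitmaps.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_SECTOR_BITS 9   /* 512-byte sectors, as with BDRV_SECTOR_BITS */

int main(void)
{
    uint32_t granularity = 64 * 1024;   /* example dirty-bitmap granularity */

    /* granularity >> TOY_SECTOR_BITS must not truncate to zero while the
     * dirty bitmap interfaces are still sector-based. */
    assert((granularity >> TOY_SECTOR_BITS) > 0);
    printf("%u sectors per chunk\n", granularity >> TOY_SECTOR_BITS);  /* 128 */
    return 0;
}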

Signed-off-by: Eric Blake 

---
v4: add assertion and formatting tweak [Kevin], R-b dropped
v2-v3: no change
---
 block/mirror.c | 82 +-
 1 file changed, 41 insertions(+), 41 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index b4dfe95..9aca0cb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -24,9 +24,8 @@

 #define SLICE_TIME1ULL /* ns */
 #define MAX_IN_FLIGHT 16
-#define MAX_IO_SECTORS ((1 << 20) >> BDRV_SECTOR_BITS) /* 1 Mb */
-#define DEFAULT_MIRROR_BUF_SIZE \
-(MAX_IN_FLIGHT * MAX_IO_SECTORS * BDRV_SECTOR_SIZE)
+#define MAX_IO_BYTES (1 << 20) /* 1 Mb */
+#define DEFAULT_MIRROR_BUF_SIZE (MAX_IN_FLIGHT * MAX_IO_BYTES)

 /* The mirroring buffer is a list of granularity-sized chunks.
  * Free chunks are organized in a list.
@@ -67,11 +66,11 @@ typedef struct MirrorBlockJob {
 uint64_t last_pause_ns;
 unsigned long *in_flight_bitmap;
 int in_flight;
-int64_t sectors_in_flight;
+int64_t bytes_in_flight;
 int ret;
 bool unmap;
 bool waiting_for_io;
-int target_cluster_sectors;
+int target_cluster_size;
 int max_iov;
 bool initial_zeroing_ongoing;
 } MirrorBlockJob;
@@ -79,8 +78,8 @@ typedef struct MirrorBlockJob {
 typedef struct MirrorOp {
 MirrorBlockJob *s;
 QEMUIOVector qiov;
-int64_t sector_num;
-int nb_sectors;
+int64_t offset;
+uint64_t bytes;
 } MirrorOp;

 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
@@ -101,13 +100,12 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 MirrorBlockJob *s = op->s;
 struct iovec *iov;
 int64_t chunk_num;
-int i, nb_chunks, sectors_per_chunk;
+int i, nb_chunks;

-trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
-op->nb_sectors * BDRV_SECTOR_SIZE, ret);
+trace_mirror_iteration_done(s, op->offset, op->bytes, ret);

 s->in_flight--;
-s->sectors_in_flight -= op->nb_sectors;
+s->bytes_in_flight -= op->bytes;
 iov = op->qiov.iov;
 for (i = 0; i < op->qiov.niov; i++) {
 MirrorBuffer *buf = (MirrorBuffer *) iov[i].iov_base;
@@ -115,16 +113,15 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 s->buf_free_count++;
 }

-sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-chunk_num = op->sector_num / sectors_per_chunk;
-nb_chunks = DIV_ROUND_UP(op->nb_sectors, sectors_per_chunk);
+chunk_num = op->offset / s->granularity;
+nb_chunks = DIV_ROUND_UP(op->bytes, s->granularity);
 bitmap_clear(s->in_flight_bitmap, chunk_num, nb_chunks);
 if (ret >= 0) {
 if (s->cow_bitmap) {
 bitmap_set(s->cow_bitmap, chunk_num, nb_chunks);
 }
 if (!s->initial_zeroing_ongoing) {
-s->common.offset += (uint64_t)op->nb_sectors * BDRV_SECTOR_SIZE;
+s->common.offset += op->bytes;
 }
 }
 qemu_iovec_destroy(&op->qiov);
@@ -144,7 +141,8 @@ static void mirror_write_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;

-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
+  op->bytes >> BDRV_SECTOR_BITS);
 action = mirror_error_action(s, false, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -163,7 +161,8 @@ static void mirror_read_complete(void *opaque, int ret)
 if (ret < 0) {
 BlockErrorAction action;

-bdrv_set_dirty_bitmap(s->dirty_bitmap, op->sector_num, op->nb_sectors);
+bdrv_set_dirty_bitmap(s->dirty_bitmap, op->offset >> BDRV_SECTOR_BITS,
+  op->bytes >> BDRV_SECTOR_BITS);
 action = mirror_error_action(s, true, -ret);
 if (action == BLOCK_ERROR_ACTION_REPORT && s->ret >= 0) {
 s->ret = ret;
@@ -171,7 +170,7 @@ static void mirror_read_complete(void *opaque, int ret)

 mirror_iteration_done(op, ret);
 } else {
-blk_aio_pwritev(s->target, op->sector_num * BDRV_SECTOR_SIZE, 
&op->qiov,
+blk_aio_pwritev(s->target, op->offset, &op->qiov,
 0, mirror_write_complete, op);
 }
 aio_context_release(blk_get_aio_context(s->common.blk));
@@ -211,7 +210,8 @@ static int mirror_cow_align(MirrorBlockJob *s,
 align_nb_secto

[Qemu-block] [PATCH v4 13/21] mirror: Switch mirror_iteration() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of mirroring to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are both sector-aligned and multiples of the granularity).  Drop
the now-unused mirror_clip_sectors().

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
---
v4: no change
v3: rebase to Paolo's thread-safety changes, R-b kept
v2: straightforward rebase to earlier mirror_clip_bytes() change, R-b kept
---
 block/mirror.c | 105 +
 1 file changed, 46 insertions(+), 59 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index d3325d0..f54a8d7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -184,15 +184,6 @@ static inline int64_t mirror_clip_bytes(MirrorBlockJob *s,
 return MIN(bytes, s->bdev_length - offset);
 }

-/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
-static inline int mirror_clip_sectors(MirrorBlockJob *s,
-  int64_t sector_num,
-  int nb_sectors)
-{
-return MIN(nb_sectors,
-   s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
-}
-
 /* Round offset and/or bytes to target cluster if COW is needed, and
  * return the offset of the adjusted tail against original. */
 static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
@@ -336,30 +327,28 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
 static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 {
 BlockDriverState *source = s->source;
-int64_t sector_num, first_chunk;
+int64_t offset, first_chunk;
 uint64_t delay_ns = 0;
 /* At least the first dirty chunk is mirrored in one iteration. */
 int nb_chunks = 1;
-int64_t end = s->bdev_length / BDRV_SECTOR_SIZE;
 int sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
 bool write_zeroes_ok = bdrv_can_write_zeroes_with_unmap(blk_bs(s->target));
 int max_io_bytes = MAX(s->buf_size / MAX_IN_FLIGHT, MAX_IO_BYTES);

 bdrv_dirty_bitmap_lock(s->dirty_bitmap);
-sector_num = bdrv_dirty_iter_next(s->dbi);
-if (sector_num < 0) {
+offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+if (offset < 0) {
 bdrv_set_dirty_iter(s->dbi, 0);
-sector_num = bdrv_dirty_iter_next(s->dbi);
+offset = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
 trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap) *
   BDRV_SECTOR_SIZE);
-assert(sector_num >= 0);
+assert(offset >= 0);
 }
 bdrv_dirty_bitmap_unlock(s->dirty_bitmap);

-first_chunk = sector_num / sectors_per_chunk;
+first_chunk = offset / s->granularity;
 while (test_bit(first_chunk, s->in_flight_bitmap)) {
-trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
- s->in_flight);
+trace_mirror_yield_in_flight(s, offset, s->in_flight);
 mirror_wait_for_io(s);
 }

@@ -368,25 +357,26 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 /* Find the number of consective dirty chunks following the first dirty
  * one, and wait for in flight requests in them. */
 bdrv_dirty_bitmap_lock(s->dirty_bitmap);
-while (nb_chunks * sectors_per_chunk < (s->buf_size >> BDRV_SECTOR_BITS)) {
+while (nb_chunks * s->granularity < s->buf_size) {
 int64_t next_dirty;
-int64_t next_sector = sector_num + nb_chunks * sectors_per_chunk;
-int64_t next_chunk = next_sector / sectors_per_chunk;
-if (next_sector >= end ||
-!bdrv_get_dirty_locked(source, s->dirty_bitmap, next_sector)) {
+int64_t next_offset = offset + nb_chunks * s->granularity;
+int64_t next_chunk = next_offset / s->granularity;
+if (next_offset >= s->bdev_length ||
+!bdrv_get_dirty_locked(source, s->dirty_bitmap,
+   next_offset >> BDRV_SECTOR_BITS)) {
 break;
 }
 if (test_bit(next_chunk, s->in_flight_bitmap)) {
 break;
 }

-next_dirty = bdrv_dirty_iter_next(s->dbi);
-if (next_dirty > next_sector || next_dirty < 0) {
+next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
+if (next_dirty > next_offset || next_dirty < 0) {
 /* The bitmap iterator's cache is stale, refresh it */
-bdrv_set_dirty_iter(s->dbi, next_sector);
-next_dirty = bdrv_dirty_iter_next(s->dbi);
+bdrv_set_dirty_iter(s->dbi, next_offset >> BDRV_SECTOR_BITS);
+next_dirty = bdrv_dirty_iter_next(s->dbi) * BDRV_SECTOR_SIZE;
 }
-assert(next_dirty == next_sector);
+assert(next_dirty == next_offset);
 nb_chunks++;
 }

@@ -394,14 +3

[Qemu-block] [PATCH v4 14/21] block: Drop unused bdrv_round_sectors_to_clusters()

2017-07-05 Thread Eric Blake
Now that the last user [mirror_iteration()] has converted to using
bytes, we no longer need a function to round sectors to clusters.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
---
v3-v4: no change
v2: hoist to earlier series, no change
---
 include/block/block.h |  4 
 block/io.c| 21 -
 2 files changed, 25 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index f432ea6..a9dc753 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -472,10 +472,6 @@ const char *bdrv_get_device_or_node_name(const 
BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
-void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors,
-int64_t *cluster_sector_num,
-int *cluster_nb_sectors);
 void bdrv_round_to_clusters(BlockDriverState *bs,
 int64_t offset, unsigned int bytes,
 int64_t *cluster_offset,
diff --git a/block/io.c b/block/io.c
index 9186c5a..f662c97 100644
--- a/block/io.c
+++ b/block/io.c
@@ -419,27 +419,6 @@ static void mark_request_serialising(BdrvTrackedRequest 
*req, uint64_t align)
 }

 /**
- * Round a region to cluster boundaries (sector-based)
- */
-void bdrv_round_sectors_to_clusters(BlockDriverState *bs,
-int64_t sector_num, int nb_sectors,
-int64_t *cluster_sector_num,
-int *cluster_nb_sectors)
-{
-BlockDriverInfo bdi;
-
-if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
-*cluster_sector_num = sector_num;
-*cluster_nb_sectors = nb_sectors;
-} else {
-int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
-*cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
-*cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
-nb_sectors, c);
-}
-}
-
-/**
  * Round a region to cluster boundaries
  */
 void bdrv_round_to_clusters(BlockDriverState *bs,
-- 
2.9.4




[Qemu-block] [PATCH v4 12/21] mirror: Switch mirror_do_read() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: rebase to earlier changes
---
 block/mirror.c | 75 ++
 1 file changed, 33 insertions(+), 42 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 374cefd..d3325d0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -196,7 +196,7 @@ static inline int mirror_clip_sectors(MirrorBlockJob *s,
 /* Round offset and/or bytes to target cluster if COW is needed, and
  * return the offset of the adjusted tail against original. */
 static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
-unsigned int *bytes)
+uint64_t *bytes)
 {
 bool need_cow;
 int ret = 0;
@@ -204,6 +204,7 @@ static int mirror_cow_align(MirrorBlockJob *s, int64_t 
*offset,
 unsigned int align_bytes = *bytes;
 int max_bytes = s->granularity * s->max_iov;

+assert(*bytes < INT_MAX);
 need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
 need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
   s->cow_bitmap);
@@ -239,59 +240,50 @@ static inline void mirror_wait_for_io(MirrorBlockJob *s)
 }

 /* Submit async read while handling COW.
- * Returns: The number of sectors copied after and including sector_num,
- *  excluding any sectors copied prior to sector_num due to alignment.
- *  This will be nb_sectors if no alignment is necessary, or
- *  (new_end - sector_num) if tail is rounded up or down due to
+ * Returns: The number of bytes copied after and including offset,
+ *  excluding any bytes copied prior to offset due to alignment.
+ *  This will be @bytes if no alignment is necessary, or
+ *  (new_end - offset) if tail is rounded up or down due to
  *  alignment or buffer limit.
  */
-static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
-  int nb_sectors)
+static uint64_t mirror_do_read(MirrorBlockJob *s, int64_t offset,
+   uint64_t bytes)
 {
 BlockBackend *source = s->common.blk;
-int sectors_per_chunk, nb_chunks;
-int ret;
+int nb_chunks;
+uint64_t ret;
 MirrorOp *op;
-int max_sectors;
+uint64_t max_bytes;

-sectors_per_chunk = s->granularity >> BDRV_SECTOR_BITS;
-max_sectors = sectors_per_chunk * s->max_iov;
+max_bytes = s->granularity * s->max_iov;

 /* We can only handle as much as buf_size at a time. */
-nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
-nb_sectors = MIN(max_sectors, nb_sectors);
-assert(nb_sectors);
-assert(nb_sectors < BDRV_REQUEST_MAX_SECTORS);
-ret = nb_sectors;
+bytes = MIN(s->buf_size, MIN(max_bytes, bytes));
+assert(bytes);
+assert(bytes < BDRV_REQUEST_MAX_BYTES);
+ret = bytes;

 if (s->cow_bitmap) {
-int64_t offset = sector_num * BDRV_SECTOR_SIZE;
-unsigned int bytes = nb_sectors * BDRV_SECTOR_SIZE;
-int gap;
-
-gap = mirror_cow_align(s, &offset, &bytes);
-sector_num = offset / BDRV_SECTOR_SIZE;
-nb_sectors = bytes / BDRV_SECTOR_SIZE;
-ret += gap / BDRV_SECTOR_SIZE;
+ret += mirror_cow_align(s, &offset, &bytes);
 }
-assert(nb_sectors << BDRV_SECTOR_BITS <= s->buf_size);
-/* The sector range must meet granularity because:
+assert(bytes <= s->buf_size);
+/* The range will be sector-aligned because:
  * 1) Caller passes in aligned values;
- * 2) mirror_cow_align is used only when target cluster is larger. */
-assert(!(sector_num % sectors_per_chunk));
-nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);
+ * 2) mirror_cow_align is used only when target cluster is larger.
+ * But it might not be cluster-aligned at end-of-file. */
+assert(QEMU_IS_ALIGNED(bytes, BDRV_SECTOR_SIZE));
+nb_chunks = DIV_ROUND_UP(bytes, s->granularity);

 while (s->buf_free_count < nb_chunks) {
-trace_mirror_yield_in_flight(s, sector_num * BDRV_SECTOR_SIZE,
- s->in_flight);
+trace_mirror_yield_in_flight(s, offset, s->in_flight);
 mirror_wait_for_io(s);
 }

 /* Allocate a MirrorOp that is used as an AIO callback.  */
 op = g_new(MirrorOp, 1);
 op->s = s;
-op->offset = sector_num * BDRV_SECTOR_SIZE;
-op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
+op->offset = offset;
+op->bytes = bytes;

 /* Now make a QEMUIOVector taking enough granularity-sized chunks
  * from s->buf_free.
@@ -299,7 +291,7 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t 
sector_num,
 qemu_iovec_init(&op->qiov, nb_chunks);
 while (nb_chunks-- > 0) {
 MirrorBuffer *buf

[Qemu-block] [PATCH v4 20/21] block: Minimize raw use of bds->total_sectors

2017-07-05 Thread Eric Blake
bdrv_is_allocated_above() was relying on intermediate->total_sectors,
which is a field that can have stale contents depending on the value
of intermediate->has_variable_length.  An audit shows that we are safe
(we were first calling through bdrv_co_get_block_status() which in
turn calls bdrv_nb_sectors() and therefore just refreshed the current
length), but it's nicer to favor our accessor functions to avoid having
to repeat such an audit, even if it means refresh_total_sectors() is
called more frequently.
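
A toy standalone sketch (invented names, not QEMU code) of the pattern this
patch favors: read the length through an accessor that refreshes it, instead
of trusting a cached field that may be stale when the length can vary.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct ToyState {
    int64_t total_sectors;      /* cached; may be stale if length varies */
    int has_variable_length;
} ToyState;

/* Stand-in for querying the backing storage for its current size. */
static int64_t toy_query_backend_sectors(void)
{
    return 2048;
}

/* Accessor in the spirit of bdrv_nb_sectors(): refresh before reporting. */
static int64_t toy_nb_sectors(ToyState *s)
{
    if (s->has_variable_length) {
        s->total_sectors = toy_query_backend_sectors();
    }
    return s->total_sectors;
}

int main(void)
{
    ToyState s = { .total_sectors = 1024, .has_variable_length = 1 };

    printf("raw field: %" PRId64 "\n", s.total_sectors);    /* possibly stale */
    printf("accessor:  %" PRId64 "\n", toy_nb_sectors(&s)); /* refreshed: 2048 */
    return 0;
}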

Suggested-by: John Snow 
Signed-off-by: Eric Blake 
Reviewed-by: Manos Pitsidianakis 
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: new patch
---
 block/io.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/io.c b/block/io.c
index cb40069..fb8d1c7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1952,6 +1952,7 @@ int bdrv_is_allocated_above(BlockDriverState *top,
 intermediate = top;
 while (intermediate && intermediate != base) {
 int64_t pnum_inter;
+int64_t size_inter;
 int psectors_inter;

 ret = bdrv_is_allocated(intermediate, sector_num * BDRV_SECTOR_SIZE,
@@ -1969,13 +1970,14 @@ int bdrv_is_allocated_above(BlockDriverState *top,

 /*
  * [sector_num, nb_sectors] is unallocated on top but intermediate
- * might have
- *
- * [sector_num+x, nr_sectors] allocated.
+ * might have [sector_num+x, nb_sectors-x] allocated.
  */
+size_inter = bdrv_nb_sectors(intermediate);
+if (size_inter < 0) {
+return size_inter;
+}
 if (n > psectors_inter &&
-(intermediate == top ||
- sector_num + psectors_inter < intermediate->total_sectors)) {
+(intermediate == top || sector_num + psectors_inter < size_inter)) 
{
 n = psectors_inter;
 }

-- 
2.9.4




[Qemu-block] [PATCH v4 21/21] block: Make bdrv_is_allocated_above() byte-based

2017-07-05 Thread Eric Blake
We are gradually moving away from sector-based interfaces, towards
byte-based.  In the common case, allocation is unlikely to ever use
values that are not naturally sector-aligned, but it is possible
that byte-based values will let us be more precise about allocation
at the end of an unaligned file that can do byte-based access.

Changing the signature of the function to use int64_t *pnum ensures
that the compiler enforces that all callers are updated.  For now,
the io.c layer still assert()s that all callers are sector-aligned,
but that can be relaxed when a later patch implements byte-based
block status.  Therefore, for the most part this patch is just the
addition of scaling at the callers followed by inverse scaling at
bdrv_is_allocated().  But some code, particularly stream_run(),
gets a lot simpler because it no longer has to mess with sectors.

For ease of review, bdrv_is_allocated() was tackled separately.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Xie Changlong  [replication 
part]
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: tweak function comments, favor bdrv_getlength() over ->total_sectors
---
 include/block/block.h |  2 +-
 block/commit.c| 20 
 block/io.c| 42 --
 block/mirror.c|  5 -
 block/replication.c   | 17 -
 block/stream.c| 21 +
 qemu-img.c| 10 +++---
 7 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index d3e01fb..f0fdbe8 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -430,7 +430,7 @@ int64_t bdrv_get_block_status_above(BlockDriverState *bs,
 int bdrv_is_allocated(BlockDriverState *bs, int64_t offset, int64_t bytes,
   int64_t *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
-int64_t sector_num, int nb_sectors, int *pnum);
+int64_t offset, int64_t bytes, int64_t *pnum);

 bool bdrv_is_read_only(BlockDriverState *bs);
 bool bdrv_is_writable(BlockDriverState *bs);
diff --git a/block/commit.c b/block/commit.c
index 241aa95..774a8a5 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -146,7 +146,7 @@ static void coroutine_fn commit_run(void *opaque)
 int64_t offset;
 uint64_t delay_ns = 0;
 int ret = 0;
-int n = 0; /* sectors */
+int64_t n = 0; /* bytes */
 void *buf = NULL;
 int bytes_written = 0;
 int64_t base_len;
@@ -171,7 +171,7 @@ static void coroutine_fn commit_run(void *opaque)

 buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);

-for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
+for (offset = 0; offset < s->common.len; offset += n) {
 bool copy;

 /* Note that even when no rate limit is applied we need to yield
@@ -183,15 +183,12 @@ static void coroutine_fn commit_run(void *opaque)
 }
 /* Copy if allocated above the base */
 ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
-  offset / BDRV_SECTOR_SIZE,
-  COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
-  &n);
+  offset, COMMIT_BUFFER_SIZE, &n);
 copy = (ret == 1);
-trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
+trace_commit_one_iteration(s, offset, n, ret);
 if (copy) {
-ret = commit_populate(s->top, s->base, offset,
-  n * BDRV_SECTOR_SIZE, buf);
-bytes_written += n * BDRV_SECTOR_SIZE;
+ret = commit_populate(s->top, s->base, offset, n, buf);
+bytes_written += n;
 }
 if (ret < 0) {
 BlockErrorAction action =
@@ -204,11 +201,10 @@ static void coroutine_fn commit_run(void *opaque)
 }
 }
 /* Publish progress */
-s->common.offset += n * BDRV_SECTOR_SIZE;
+s->common.offset += n;

 if (copy && s->common.speed) {
-delay_ns = ratelimit_calculate_delay(&s->limit,
- n * BDRV_SECTOR_SIZE);
+delay_ns = ratelimit_calculate_delay(&s->limit, n);
 }
 }

diff --git a/block/io.c b/block/io.c
index fb8d1c7..569c503 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1931,54 +1931,52 @@ int coroutine_fn bdrv_is_allocated(BlockDriverState 
*bs, int64_t offset,
 /*
  * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
  *
- * Return true if the given sector is allocated in any image between
- * BASE and TOP (inclusive).  BASE can be NULL to check if the given
- * sector is allocated in any image of the chain.  Return false otherwise,
+ * Return true if the (prefix of the) given range is allocated in any image
+ * between BASE and T

[Qemu-block] [PATCH v4 15/21] backup: Switch BackupBlockJob to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Continue by converting an
internal structure (no semantic change), and all references to
tracking progress.  Drop a redundant local variable bytes_per_cluster.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
---
v2-v4: no change
---
 block/backup.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 06431ac..4e64710 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -39,7 +39,7 @@ typedef struct BackupBlockJob {
 BlockdevOnError on_source_error;
 BlockdevOnError on_target_error;
 CoRwlock flush_rwlock;
-uint64_t sectors_read;
+uint64_t bytes_read;
 unsigned long *done_bitmap;
 int64_t cluster_size;
 bool compress;
@@ -102,16 +102,15 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 void *bounce_buffer = NULL;
 int ret = 0;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
-int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
-int64_t start, end;
-int n;
+int64_t start, end; /* clusters */
+int n; /* bytes */

 qemu_co_rwlock_rdlock(&job->flush_rwlock);

 start = sector_num / sectors_per_cluster;
 end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);

-trace_backup_do_cow_enter(job, start * bytes_per_cluster,
+trace_backup_do_cow_enter(job, start * job->cluster_size,
   sector_num * BDRV_SECTOR_SIZE,
   nb_sectors * BDRV_SECTOR_SIZE);

@@ -120,28 +119,27 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,

 for (; start < end; start++) {
 if (test_bit(start, job->done_bitmap)) {
-trace_backup_do_cow_skip(job, start * bytes_per_cluster);
+trace_backup_do_cow_skip(job, start * job->cluster_size);
 continue; /* already copied */
 }

-trace_backup_do_cow_process(job, start * bytes_per_cluster);
+trace_backup_do_cow_process(job, start * job->cluster_size);

-n = MIN(sectors_per_cluster,
-job->common.len / BDRV_SECTOR_SIZE -
-start * sectors_per_cluster);
+n = MIN(job->cluster_size,
+job->common.len - start * job->cluster_size);

 if (!bounce_buffer) {
 bounce_buffer = blk_blockalign(blk, job->cluster_size);
 }
 iov.iov_base = bounce_buffer;
-iov.iov_len = n * BDRV_SECTOR_SIZE;
+iov.iov_len = n;
 qemu_iovec_init_external(&bounce_qiov, &iov, 1);

 ret = blk_co_preadv(blk, start * job->cluster_size,
 bounce_qiov.size, &bounce_qiov,
 is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
 if (ret < 0) {
-trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, ret);
+trace_backup_do_cow_read_fail(job, start * job->cluster_size, ret);
 if (error_is_read) {
 *error_is_read = true;
 }
@@ -157,7 +155,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
   job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
 }
 if (ret < 0) {
-trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, ret);
+trace_backup_do_cow_write_fail(job, start * job->cluster_size, ret);
 if (error_is_read) {
 *error_is_read = false;
 }
@@ -169,8 +167,8 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 /* Publish progress, guest I/O counts as progress too.  Note that the
  * offset field is an opaque progress value, it is not a disk offset.
  */
-job->sectors_read += n;
-job->common.offset += n * BDRV_SECTOR_SIZE;
+job->bytes_read += n;
+job->common.offset += n;
 }

 out:
@@ -363,9 +361,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
  */
 if (job->common.speed) {
 uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
-  job->sectors_read *
-  BDRV_SECTOR_SIZE);
-job->sectors_read = 0;
+  job->bytes_read);
+job->bytes_read = 0;
 block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
 } else {
 block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, 0);
-- 
2.9.4




[Qemu-block] [PATCH v4 09/21] mirror: Switch mirror_do_zero_or_discard() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 

---
v2-v4: no change
---
 block/mirror.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 9aca0cb..2736d0b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -305,8 +305,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
 }

 static void mirror_do_zero_or_discard(MirrorBlockJob *s,
-  int64_t sector_num,
-  int nb_sectors,
+  int64_t offset,
+  uint64_t bytes,
   bool is_discard)
 {
 MirrorOp *op;
@@ -315,16 +315,16 @@ static void mirror_do_zero_or_discard(MirrorBlockJob *s,
  * so the freeing in mirror_iteration_done is nop. */
 op = g_new0(MirrorOp, 1);
 op->s = s;
-op->offset = sector_num * BDRV_SECTOR_SIZE;
-op->bytes = nb_sectors * BDRV_SECTOR_SIZE;
+op->offset = offset;
+op->bytes = bytes;

 s->in_flight++;
-s->bytes_in_flight += nb_sectors * BDRV_SECTOR_SIZE;
+s->bytes_in_flight += bytes;
 if (is_discard) {
-blk_aio_pdiscard(s->target, sector_num << BDRV_SECTOR_BITS,
+blk_aio_pdiscard(s->target, offset,
  op->bytes, mirror_write_complete, op);
 } else {
-blk_aio_pwrite_zeroes(s->target, sector_num * BDRV_SECTOR_SIZE,
+blk_aio_pwrite_zeroes(s->target, offset,
   op->bytes, s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
   mirror_write_complete, op);
 }
@@ -453,7 +453,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 break;
 case MIRROR_METHOD_ZERO:
 case MIRROR_METHOD_DISCARD:
-mirror_do_zero_or_discard(s, sector_num, io_sectors,
+mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
+  io_sectors * BDRV_SECTOR_SIZE,
   mirror_method == MIRROR_METHOD_DISCARD);
 if (write_zeroes_ok) {
 io_bytes_acct = 0;
@@ -657,7 +658,8 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 continue;
 }

-mirror_do_zero_or_discard(s, sector_num, nb_sectors, false);
+mirror_do_zero_or_discard(s, sector_num * BDRV_SECTOR_SIZE,
+  nb_sectors * BDRV_SECTOR_SIZE, false);
 sector_num += nb_sectors;
 }

-- 
2.9.4




[Qemu-block] [PATCH v4 11/21] mirror: Switch mirror_cow_align() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Convert another internal
function (no semantic change), and add mirror_clip_bytes() as a
counterpart to mirror_clip_sectors().  Some of the conversion is
a bit tricky, requiring temporaries to convert between units; it
will be cleared up in a following patch.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 

---
v3-v4: no change
v2: tweak mirror_clip_bytes() signature to match previous patch
---
 block/mirror.c | 64 ++
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 70682b6..374cefd 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -176,6 +176,15 @@ static void mirror_read_complete(void *opaque, int ret)
 aio_context_release(blk_get_aio_context(s->common.blk));
 }

+/* Clip bytes relative to offset to not exceed end-of-file */
+static inline int64_t mirror_clip_bytes(MirrorBlockJob *s,
+int64_t offset,
+int64_t bytes)
+{
+return MIN(bytes, s->bdev_length - offset);
+}
+
+/* Clip nb_sectors relative to sector_num to not exceed end-of-file */
 static inline int mirror_clip_sectors(MirrorBlockJob *s,
   int64_t sector_num,
   int nb_sectors)
@@ -184,44 +193,39 @@ static inline int mirror_clip_sectors(MirrorBlockJob *s,
s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
 }

-/* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
- * return the offset of the adjusted tail sector against original. */
-static int mirror_cow_align(MirrorBlockJob *s,
-int64_t *sector_num,
-int *nb_sectors)
+/* Round offset and/or bytes to target cluster if COW is needed, and
+ * return the offset of the adjusted tail against original. */
+static int mirror_cow_align(MirrorBlockJob *s, int64_t *offset,
+unsigned int *bytes)
 {
 bool need_cow;
 int ret = 0;
-int chunk_sectors = s->granularity >> BDRV_SECTOR_BITS;
-int64_t align_sector_num = *sector_num;
-int align_nb_sectors = *nb_sectors;
-int max_sectors = chunk_sectors * s->max_iov;
+int64_t align_offset = *offset;
+unsigned int align_bytes = *bytes;
+int max_bytes = s->granularity * s->max_iov;

-need_cow = !test_bit(*sector_num / chunk_sectors, s->cow_bitmap);
-need_cow |= !test_bit((*sector_num + *nb_sectors - 1) / chunk_sectors,
+need_cow = !test_bit(*offset / s->granularity, s->cow_bitmap);
+need_cow |= !test_bit((*offset + *bytes - 1) / s->granularity,
   s->cow_bitmap);
 if (need_cow) {
-bdrv_round_sectors_to_clusters(blk_bs(s->target), *sector_num,
-   *nb_sectors, &align_sector_num,
-   &align_nb_sectors);
+bdrv_round_to_clusters(blk_bs(s->target), *offset, *bytes,
+   &align_offset, &align_bytes);
 }

-if (align_nb_sectors > max_sectors) {
-align_nb_sectors = max_sectors;
+if (align_bytes > max_bytes) {
+align_bytes = max_bytes;
 if (need_cow) {
-align_nb_sectors = QEMU_ALIGN_DOWN(align_nb_sectors,
-   s->target_cluster_size >>
-   BDRV_SECTOR_BITS);
+align_bytes = QEMU_ALIGN_DOWN(align_bytes,
+  s->target_cluster_size);
 }
 }
-/* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
+/* Clipping may result in align_bytes unaligned to chunk boundary, but
  * that doesn't matter because it's already the end of source image. */
-align_nb_sectors = mirror_clip_sectors(s, align_sector_num,
-   align_nb_sectors);
+align_bytes = mirror_clip_bytes(s, align_offset, align_bytes);

-ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
-*sector_num = align_sector_num;
-*nb_sectors = align_nb_sectors;
+ret = align_offset + align_bytes - (*offset + *bytes);
+*offset = align_offset;
+*bytes = align_bytes;
 assert(ret >= 0);
 return ret;
 }
@@ -257,10 +261,18 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
 nb_sectors = MIN(s->buf_size >> BDRV_SECTOR_BITS, nb_sectors);
 nb_sectors = MIN(max_sectors, nb_sectors);
 assert(nb_sectors);
+assert(nb_sectors < BDRV_REQUEST_MAX_SECTORS);
 ret = nb_sectors;

 if (s->cow_bitmap) {
-ret += mirror_cow_align(s, &sector_num, &nb_sectors);
+int64_t offset = sector_num * BDRV_SECTOR_SIZE;
+unsigned int bytes = nb_sectors * BDRV_SECTOR_SIZE;
+i

[Qemu-block] [PATCH v4 06/21] commit: Switch commit_populate() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
v2: no change
---
 block/commit.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 4cda7f2..6f67d78 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -47,26 +47,25 @@ typedef struct CommitBlockJob {
 } CommitBlockJob;

 static int coroutine_fn commit_populate(BlockBackend *bs, BlockBackend *base,
-int64_t sector_num, int nb_sectors,
+int64_t offset, uint64_t bytes,
 void *buf)
 {
 int ret = 0;
 QEMUIOVector qiov;
 struct iovec iov = {
 .iov_base = buf,
-.iov_len = nb_sectors * BDRV_SECTOR_SIZE,
+.iov_len = bytes,
 };

+assert(bytes < SIZE_MAX);
 qemu_iovec_init_external(&qiov, &iov, 1);

-ret = blk_co_preadv(bs, sector_num * BDRV_SECTOR_SIZE,
-qiov.size, &qiov, 0);
+ret = blk_co_preadv(bs, offset, qiov.size, &qiov, 0);
 if (ret < 0) {
 return ret;
 }

-ret = blk_co_pwritev(base, sector_num * BDRV_SECTOR_SIZE,
- qiov.size, &qiov, 0);
+ret = blk_co_pwritev(base, offset, qiov.size, &qiov, 0);
 if (ret < 0) {
 return ret;
 }
@@ -193,7 +192,9 @@ static void coroutine_fn commit_run(void *opaque)
 trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
-ret = commit_populate(s->top, s->base, sector_num, n, buf);
+ret = commit_populate(s->top, s->base,
+  sector_num * BDRV_SECTOR_SIZE,
+  n * BDRV_SECTOR_SIZE, buf);
 bytes_written += n * BDRV_SECTOR_SIZE;
 }
 if (ret < 0) {
-- 
2.9.4




[Qemu-block] [PATCH v4 10/21] mirror: Update signature of mirror_clip_sectors()

2017-07-05 Thread Eric Blake
Rather than having a void function that modifies its input
in-place as the output, change the signature to reduce a layer
of indirection and return the result.

Suggested-by: John Snow 
Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 

---
v3-v4: no change
v2: new patch
---
 block/mirror.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 2736d0b..70682b6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -176,12 +176,12 @@ static void mirror_read_complete(void *opaque, int ret)
 aio_context_release(blk_get_aio_context(s->common.blk));
 }

-static inline void mirror_clip_sectors(MirrorBlockJob *s,
-   int64_t sector_num,
-   int *nb_sectors)
+static inline int mirror_clip_sectors(MirrorBlockJob *s,
+  int64_t sector_num,
+  int nb_sectors)
 {
-*nb_sectors = MIN(*nb_sectors,
-  s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
+return MIN(nb_sectors,
+   s->bdev_length / BDRV_SECTOR_SIZE - sector_num);
 }

 /* Round sector_num and/or nb_sectors to target cluster if COW is needed, and
@@ -216,7 +216,8 @@ static int mirror_cow_align(MirrorBlockJob *s,
 }
 /* Clipping may result in align_nb_sectors unaligned to chunk boundary, but
  * that doesn't matter because it's already the end of source image. */
-mirror_clip_sectors(s, align_sector_num, &align_nb_sectors);
+align_nb_sectors = mirror_clip_sectors(s, align_sector_num,
+   align_nb_sectors);

 ret = align_sector_num + align_nb_sectors - (*sector_num + *nb_sectors);
 *sector_num = align_sector_num;
@@ -445,7 +446,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 return 0;
 }

-mirror_clip_sectors(s, sector_num, &io_sectors);
+io_sectors = mirror_clip_sectors(s, sector_num, io_sectors);
 switch (mirror_method) {
 case MIRROR_METHOD_COPY:
 io_sectors = mirror_do_read(s, sector_num, io_sectors);
-- 
2.9.4




[Qemu-block] [PATCH v4 03/21] stream: Switch stream_populate() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Start by converting an
internal function (no semantic change).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
v2: no change
---
 block/stream.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 6cb3939..746d525 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -41,20 +41,20 @@ typedef struct StreamBlockJob {
 } StreamBlockJob;

 static int coroutine_fn stream_populate(BlockBackend *blk,
-int64_t sector_num, int nb_sectors,
+int64_t offset, uint64_t bytes,
 void *buf)
 {
 struct iovec iov = {
 .iov_base = buf,
-.iov_len  = nb_sectors * BDRV_SECTOR_SIZE,
+.iov_len  = bytes,
 };
 QEMUIOVector qiov;

+assert(bytes < SIZE_MAX);
 qemu_iovec_init_external(&qiov, &iov, 1);

 /* Copy-on-read the unallocated clusters */
-return blk_co_preadv(blk, sector_num * BDRV_SECTOR_SIZE, qiov.size, &qiov,
- BDRV_REQ_COPY_ON_READ);
+return blk_co_preadv(blk, offset, qiov.size, &qiov, BDRV_REQ_COPY_ON_READ);
 }

 typedef struct {
@@ -171,7 +171,8 @@ static void coroutine_fn stream_run(void *opaque)
 trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
-ret = stream_populate(blk, sector_num, n, buf);
+ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
+  n * BDRV_SECTOR_SIZE, buf);
 }
 if (ret < 0) {
 BlockErrorAction action =
-- 
2.9.4




[Qemu-block] [PATCH v4 02/21] trace: Show blockjob actions via bytes, not sectors

2017-07-05 Thread Eric Blake
Upcoming patches are going to switch to byte-based interfaces
instead of sector-based.  Even worse, trace_backup_do_cow_enter()
had a weird mix of cluster and sector indices.

The trace interface is low enough that there are no stability
guarantees, and therefore nothing wrong with changing our units,
even in cases like trace_backup_do_cow_skip() where we are not
changing the trace output.  So make the tracing uniformly use
bytes.

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
v2: improve commit message, no code change
---
 block/backup.c | 16 ++--
 block/commit.c |  3 ++-
 block/mirror.c | 26 +-
 block/stream.c |  3 ++-
 block/trace-events | 14 +++---
 5 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/block/backup.c b/block/backup.c
index 9ca1d8e..06431ac 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -102,6 +102,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 void *bounce_buffer = NULL;
 int ret = 0;
 int64_t sectors_per_cluster = cluster_size_sectors(job);
+int64_t bytes_per_cluster = sectors_per_cluster * BDRV_SECTOR_SIZE;
 int64_t start, end;
 int n;

@@ -110,18 +111,20 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 start = sector_num / sectors_per_cluster;
 end = DIV_ROUND_UP(sector_num + nb_sectors, sectors_per_cluster);

-trace_backup_do_cow_enter(job, start, sector_num, nb_sectors);
+trace_backup_do_cow_enter(job, start * bytes_per_cluster,
+  sector_num * BDRV_SECTOR_SIZE,
+  nb_sectors * BDRV_SECTOR_SIZE);

 wait_for_overlapping_requests(job, start, end);
 cow_request_begin(&cow_request, job, start, end);

 for (; start < end; start++) {
 if (test_bit(start, job->done_bitmap)) {
-trace_backup_do_cow_skip(job, start);
+trace_backup_do_cow_skip(job, start * bytes_per_cluster);
 continue; /* already copied */
 }

-trace_backup_do_cow_process(job, start);
+trace_backup_do_cow_process(job, start * bytes_per_cluster);

 n = MIN(sectors_per_cluster,
 job->common.len / BDRV_SECTOR_SIZE -
@@ -138,7 +141,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
 bounce_qiov.size, &bounce_qiov,
 is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
 if (ret < 0) {
-trace_backup_do_cow_read_fail(job, start, ret);
+trace_backup_do_cow_read_fail(job, start * bytes_per_cluster, ret);
 if (error_is_read) {
 *error_is_read = true;
 }
@@ -154,7 +157,7 @@ static int coroutine_fn backup_do_cow(BackupBlockJob *job,
   job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
 }
 if (ret < 0) {
-trace_backup_do_cow_write_fail(job, start, ret);
+trace_backup_do_cow_write_fail(job, start * bytes_per_cluster, ret);
 if (error_is_read) {
 *error_is_read = false;
 }
@@ -177,7 +180,8 @@ out:

 cow_request_end(&cow_request);

-trace_backup_do_cow_return(job, sector_num, nb_sectors, ret);
+trace_backup_do_cow_return(job, sector_num * BDRV_SECTOR_SIZE,
+   nb_sectors * BDRV_SECTOR_SIZE, ret);

 qemu_co_rwlock_unlock(&job->flush_rwlock);

diff --git a/block/commit.c b/block/commit.c
index 6993994..4cda7f2 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -190,7 +190,8 @@ static void coroutine_fn commit_run(void *opaque)
   COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
   &n);
 copy = (ret == 1);
-trace_commit_one_iteration(s, sector_num, n, ret);
+trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
+   n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
 ret = commit_populate(s->top, s->base, sector_num, n, buf);
 bytes_written += n * BDRV_SECTOR_SIZE;
diff --git a/block/mirror.c b/block/mirror.c
index eb27efc..b4dfe95 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -103,7 +103,8 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 int64_t chunk_num;
 int i, nb_chunks, sectors_per_chunk;

-trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors, ret);
+trace_mirror_iteration_done(s, op->sector_num * BDRV_SECTOR_SIZE,
+op->nb_sectors * BDRV_SECTOR_SIZE, ret);

 s->in_flight--;
 s->sectors_in_flight -= op->nb_sectors;
@@ -268,7 +269,8 @@ static int mirror_do_read(MirrorBlockJob *s, int64_t sector_num,
 nb_chunks = DIV_ROUND_UP(nb_sectors, sectors_per_chunk);

 while (s->buf_free_count < nb_chunks) {
-trace_mirror_yield_in_flight(s, sector_num, s->in_flig

[Qemu-block] [PATCH v4 07/21] commit: Switch commit_run() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of committing to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).
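
A minimal standalone sketch of the resulting loop shape (simplified; the real
job gets the image length from blk_getlength() and n from
bdrv_is_allocated_above()): the offset advances in bytes while each step is
still a whole number of sectors.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define BDRV_SECTOR_SIZE 512

int main(void)
{
    int64_t len = 6 * BDRV_SECTOR_SIZE;  /* hypothetical image length */
    int n = 2;                           /* sectors handled per iteration */

    for (int64_t offset = 0; offset < len; offset += n * BDRV_SECTOR_SIZE) {
        printf("iteration at byte offset %" PRId64 " (%d sectors)\n",
               offset, n);
    }
    return 0;
}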

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 

---
v2: no change
---
 block/commit.c | 16 ++--
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/block/commit.c b/block/commit.c
index 6f67d78..c3a7bca 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -143,17 +143,16 @@ static void coroutine_fn commit_run(void *opaque)
 {
 CommitBlockJob *s = opaque;
 CommitCompleteData *data;
-int64_t sector_num, end;
+int64_t offset;
 uint64_t delay_ns = 0;
 int ret = 0;
-int n = 0;
+int n = 0; /* sectors */
 void *buf = NULL;
 int bytes_written = 0;
 int64_t base_len;

 ret = s->common.len = blk_getlength(s->top);

-
 if (s->common.len < 0) {
 goto out;
 }
@@ -170,10 +169,9 @@ static void coroutine_fn commit_run(void *opaque)
 }
 }

-end = s->common.len >> BDRV_SECTOR_BITS;
 buf = blk_blockalign(s->top, COMMIT_BUFFER_SIZE);

-for (sector_num = 0; sector_num < end; sector_num += n) {
+for (offset = 0; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
 bool copy;

 /* Note that even when no rate limit is applied we need to yield
@@ -185,15 +183,13 @@ static void coroutine_fn commit_run(void *opaque)
 }
 /* Copy if allocated above the base */
 ret = bdrv_is_allocated_above(blk_bs(s->top), blk_bs(s->base),
-  sector_num,
+  offset / BDRV_SECTOR_SIZE,
   COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
   &n);
 copy = (ret == 1);
-trace_commit_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-   n * BDRV_SECTOR_SIZE, ret);
+trace_commit_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
-ret = commit_populate(s->top, s->base,
-  sector_num * BDRV_SECTOR_SIZE,
+ret = commit_populate(s->top, s->base, offset,
   n * BDRV_SECTOR_SIZE, buf);
 bytes_written += n * BDRV_SECTOR_SIZE;
 }
-- 
2.9.4




[Qemu-block] [PATCH v4 05/21] stream: Switch stream_run() to byte-based

2017-07-05 Thread Eric Blake
We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the internal
loop iteration of streaming to track by bytes instead of sectors
(although we are still guaranteed that we iterate by steps that
are sector-aligned).

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 

---
v4: avoid reached_end change by rebasing to earlier changes [Kevin], R-b kept
v2-v3: no change
---
 block/stream.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 12f1659..e3dd2ac 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -107,12 +107,11 @@ static void coroutine_fn stream_run(void *opaque)
 BlockBackend *blk = s->common.blk;
 BlockDriverState *bs = blk_bs(blk);
 BlockDriverState *base = s->base;
-int64_t sector_num = 0;
-int64_t end = -1;
+int64_t offset = 0;
 uint64_t delay_ns = 0;
 int error = 0;
 int ret = 0;
-int n = 0;
+int n = 0; /* sectors */
 void *buf;

 if (!bs->backing) {
@@ -125,7 +124,6 @@ static void coroutine_fn stream_run(void *opaque)
 goto out;
 }

-end = s->common.len >> BDRV_SECTOR_BITS;
 buf = qemu_blockalign(bs, STREAM_BUFFER_SIZE);

 /* Turn on copy-on-read for the whole block device so that guest read
@@ -137,7 +135,7 @@ static void coroutine_fn stream_run(void *opaque)
 bdrv_enable_copy_on_read(bs);
 }

-for (sector_num = 0; sector_num < end; sector_num += n) {
+for ( ; offset < s->common.len; offset += n * BDRV_SECTOR_SIZE) {
 bool copy;

 /* Note that even when no rate limit is applied we need to yield
@@ -150,28 +148,26 @@ static void coroutine_fn stream_run(void *opaque)

 copy = false;

-ret = bdrv_is_allocated(bs, sector_num,
+ret = bdrv_is_allocated(bs, offset / BDRV_SECTOR_SIZE,
 STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
 if (ret == 1) {
 /* Allocated in the top, no need to copy.  */
 } else if (ret >= 0) {
 /* Copy if allocated in the intermediate images.  Limit to the
- * known-unallocated area [sector_num, sector_num+n).  */
+ * known-unallocated area [offset, offset+n*BDRV_SECTOR_SIZE).  */
 ret = bdrv_is_allocated_above(backing_bs(bs), base,
-  sector_num, n, &n);
+  offset / BDRV_SECTOR_SIZE, n, &n);

 /* Finish early if end of backing file has been reached */
 if (ret == 0 && n == 0) {
-n = end - sector_num;
+n = (s->common.len - offset) / BDRV_SECTOR_SIZE;
 }

 copy = (ret == 1);
 }
-trace_stream_one_iteration(s, sector_num * BDRV_SECTOR_SIZE,
-   n * BDRV_SECTOR_SIZE, ret);
+trace_stream_one_iteration(s, offset, n * BDRV_SECTOR_SIZE, ret);
 if (copy) {
-ret = stream_populate(blk, sector_num * BDRV_SECTOR_SIZE,
-  n * BDRV_SECTOR_SIZE, buf);
+ret = stream_populate(blk, offset, n * BDRV_SECTOR_SIZE, buf);
 }
 if (ret < 0) {
 BlockErrorAction action =
-- 
2.9.4




[Qemu-block] [PATCH v4 00/21] make bdrv_is_allocated[_above] byte-based

2017-07-05 Thread Eric Blake
There are patches floating around to add NBD_CMD_BLOCK_STATUS,
but NBD wants to report status on byte granularity (even if the
reporting will probably be naturally aligned to sectors or even
much higher levels).  I've therefore started the task of
converting our block status code to report at a byte granularity
rather than sectors.
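
For readers joining here: the whole conversion boils down to replacing
(sector_num, nb_sectors) pairs with (offset, bytes) pairs.  A minimal
standalone sketch of the scaling involved (illustration only, not code from
any patch in this series):

#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512   /* 1 << BDRV_SECTOR_BITS in QEMU */

/* Sector-based call sites describe a request as (sector_num, nb_sectors);
 * byte-based ones use (offset, bytes).  During the transition the two are
 * bridged by explicit scaling, which the later patches then remove. */
static void sectors_to_bytes(int64_t sector_num, int nb_sectors,
                             int64_t *offset, int64_t *bytes)
{
    *offset = sector_num * BDRV_SECTOR_SIZE;
    *bytes = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
}

int main(void)
{
    int64_t offset, bytes;

    sectors_to_bytes(16, 8, &offset, &bytes);
    assert(offset == 8192 && bytes == 4096);
    return 0;
}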

The overall conversion currently looks like:
part 1: bdrv_is_allocated (this series, v3 was at [1])
part 2: dirty-bitmap (v4 is posted [2]; needs reviews)
part 3: bdrv_get_block_status (v2 is posted [3] and is mostly reviewed)
part 4: upcoming series, for .bdrv_co_block_status (second half of v1 [4])

Available as a tag at:
git fetch git://repo.or.cz/qemu/ericb.git nbd-byte-allocated-v4

Depends on Max's block branch (which in turn includes Kevin's)

[1] https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg06077.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00269.html
[3] https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg00427.html
[4] https://lists.gnu.org/archive/html/qemu-devel/2017-04/msg02642.html

Changes since v3:
new patch 4/21 to avoid a semantic change in 5/21 [Kevin]
new assertion in 8/21 [Kevin]
whitespace rebase churn in vvfat in 19/21

Most patches are now reviewed multiple times, although Kevin has not yet
reviewed the second half of the series.

001/21:[] [--] 'blockjob: Track job ratelimits via bytes, not sectors'
002/21:[] [--] 'trace: Show blockjob actions via bytes, not sectors'
003/21:[] [--] 'stream: Switch stream_populate() to byte-based'
004/21:[down] 'stream: Drop reached_end for stream_complete()'
005/21:[0002] [FC] 'stream: Switch stream_run() to byte-based'
006/21:[] [--] 'commit: Switch commit_populate() to byte-based'
007/21:[] [--] 'commit: Switch commit_run() to byte-based'
008/21:[0005] [FC] 'mirror: Switch MirrorBlockJob to byte-based'
009/21:[] [--] 'mirror: Switch mirror_do_zero_or_discard() to byte-based'
010/21:[] [--] 'mirror: Update signature of mirror_clip_sectors()'
011/21:[] [--] 'mirror: Switch mirror_cow_align() to byte-based'
012/21:[] [--] 'mirror: Switch mirror_do_read() to byte-based'
013/21:[] [--] 'mirror: Switch mirror_iteration() to byte-based'
014/21:[] [--] 'block: Drop unused bdrv_round_sectors_to_clusters()'
015/21:[] [--] 'backup: Switch BackupBlockJob to byte-based'
016/21:[] [--] 'backup: Switch block_backup.h to byte-based'
017/21:[] [--] 'backup: Switch backup_do_cow() to byte-based'
018/21:[] [--] 'backup: Switch backup_run() to byte-based'
019/21:[0004] [FC] 'block: Make bdrv_is_allocated() byte-based'
020/21:[] [--] 'block: Minimize raw use of bds->total_sectors'
021/21:[] [--] 'block: Make bdrv_is_allocated_above() byte-based'

Eric Blake (21):
  blockjob: Track job ratelimits via bytes, not sectors
  trace: Show blockjob actions via bytes, not sectors
  stream: Switch stream_populate() to byte-based
  stream: Drop reached_end for stream_complete()
  stream: Switch stream_run() to byte-based
  commit: Switch commit_populate() to byte-based
  commit: Switch commit_run() to byte-based
  mirror: Switch MirrorBlockJob to byte-based
  mirror: Switch mirror_do_zero_or_discard() to byte-based
  mirror: Update signature of mirror_clip_sectors()
  mirror: Switch mirror_cow_align() to byte-based
  mirror: Switch mirror_do_read() to byte-based
  mirror: Switch mirror_iteration() to byte-based
  block: Drop unused bdrv_round_sectors_to_clusters()
  backup: Switch BackupBlockJob to byte-based
  backup: Switch block_backup.h to byte-based
  backup: Switch backup_do_cow() to byte-based
  backup: Switch backup_run() to byte-based
  block: Make bdrv_is_allocated() byte-based
  block: Minimize raw use of bds->total_sectors
  block: Make bdrv_is_allocated_above() byte-based

 include/block/block.h|  10 +-
 include/block/block_backup.h |  11 +-
 include/qemu/ratelimit.h |   3 +-
 block/backup.c   | 130 --
 block/commit.c   |  54 
 block/io.c   |  92 +++--
 block/mirror.c   | 305 ++-
 block/replication.c  |  29 ++--
 block/stream.c   |  37 +++---
 block/vvfat.c|  34 +++--
 migration/block.c|   9 +-
 qemu-img.c   |  15 ++-
 qemu-io-cmds.c   |  70 +-
 block/trace-events   |  14 +-
 14 files changed, 400 insertions(+), 413 deletions(-)

-- 
2.9.4




[Qemu-block] [PATCH v4 04/21] stream: Drop reached_end for stream_complete()

2017-07-05 Thread Eric Blake
stream_complete() skips the work of rewriting the backing file if
the job was cancelled, if data->reached_end is false, or if there
was an error detected (non-zero data->ret) during the streaming.
But note that in stream_run(), data->reached_end is only set if the
loop ran to completion, and data->ret is only 0 in two cases:
either the loop ran to completion (possibly by cancellation, but
stream_complete checks for that), or we took an early goto out
because there is no bs->backing.  Thus, we can preserve the same
semantics without the use of reached_end, by merely checking for
bs->backing (and logically, if there was no backing file, streaming
is a no-op, so there is no backing file to rewrite).

Suggested-by: Kevin Wolf 
Signed-off-by: Eric Blake 

---
v4: new patch
---
 block/stream.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/stream.c b/block/stream.c
index 746d525..12f1659 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -59,7 +59,6 @@ static int coroutine_fn stream_populate(BlockBackend *blk,

 typedef struct {
 int ret;
-bool reached_end;
 } StreamCompleteData;

 static void stream_complete(BlockJob *job, void *opaque)
@@ -70,7 +69,7 @@ static void stream_complete(BlockJob *job, void *opaque)
 BlockDriverState *base = s->base;
 Error *local_err = NULL;

-if (!block_job_is_cancelled(&s->common) && data->reached_end &&
+if (!block_job_is_cancelled(&s->common) && bs->backing &&
 data->ret == 0) {
 const char *base_id = NULL, *base_fmt = NULL;
 if (base) {
@@ -211,7 +210,6 @@ out:
 /* Modify backing chain and close BDSes in main loop */
 data = g_malloc(sizeof(*data));
 data->ret = ret;
-data->reached_end = sector_num == end;
 block_job_defer_to_main_loop(&s->common, stream_complete, data);
 }

-- 
2.9.4




[Qemu-block] [PATCH v4 01/21] blockjob: Track job ratelimits via bytes, not sectors

2017-07-05 Thread Eric Blake
The user interface specifies job rate limits in bytes/second.
It's pointless to have our internal representation track things
in sectors/second, particularly since we want to move away from
sector-based interfaces.

Fix up a doc typo found while verifying that the ratelimit
code handles the scaling difference.

Repetition of expressions like 'n * BDRV_SECTOR_SIZE' will be
cleaned up later when functions are converted to iterate over
images by bytes rather than by sectors.
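
A toy model of that scaling difference (illustration only; QEMU's RateLimit
also tracks time slices, which this ignores):

#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512
#define NANOSECONDS_PER_SECOND 1000000000LL

/* Toy delay calculation: same answer whether we account in sectors against
 * a sectors/second budget or in bytes against a bytes/second budget. */
static int64_t toy_delay_ns(int64_t units_sent, int64_t units_per_sec)
{
    return units_sent * NANOSECONDS_PER_SECOND / units_per_sec;
}

int main(void)
{
    int64_t speed = 10 * 1024 * 1024;   /* bytes/second, as the user gives it */
    int64_t n_sectors = 128;

    assert(toy_delay_ns(n_sectors, speed / BDRV_SECTOR_SIZE) ==
           toy_delay_ns(n_sectors * BDRV_SECTOR_SIZE, speed));
    return 0;
}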

Signed-off-by: Eric Blake 
Reviewed-by: John Snow 
Reviewed-by: Jeff Cody 
Reviewed-by: Kevin Wolf 
---
v2: adjust commit message based on review; no code change
---
 include/qemu/ratelimit.h |  3 ++-
 block/backup.c   |  5 +++--
 block/commit.c   |  5 +++--
 block/mirror.c   | 13 +++--
 block/stream.c   |  5 +++--
 5 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/include/qemu/ratelimit.h b/include/qemu/ratelimit.h
index 8da1232..8dece48 100644
--- a/include/qemu/ratelimit.h
+++ b/include/qemu/ratelimit.h
@@ -24,7 +24,8 @@ typedef struct {

 /** Calculate and return delay for next request in ns
  *
- * Record that we sent @p n data units. If we may send more data units
+ * Record that we sent @n data units (where @n matches the scale chosen
+ * during ratelimit_set_speed). If we may send more data units
  * in the current time slice, return 0 (i.e. no delay). Otherwise
  * return the amount of time (in ns) until the start of the next time
  * slice that will permit sending the next chunk of data.
diff --git a/block/backup.c b/block/backup.c
index 5387fbd..9ca1d8e 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -208,7 +208,7 @@ static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
 error_setg(errp, QERR_INVALID_PARAMETER, "speed");
 return;
 }
-ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
@@ -359,7 +359,8 @@ static bool coroutine_fn yield_and_check(BackupBlockJob *job)
  */
 if (job->common.speed) {
 uint64_t delay_ns = ratelimit_calculate_delay(&job->limit,
-  job->sectors_read);
+  job->sectors_read *
+  BDRV_SECTOR_SIZE);
 job->sectors_read = 0;
 block_job_sleep_ns(&job->common, QEMU_CLOCK_REALTIME, delay_ns);
 } else {
diff --git a/block/commit.c b/block/commit.c
index 524bd54..6993994 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -209,7 +209,8 @@ static void coroutine_fn commit_run(void *opaque)
 s->common.offset += n * BDRV_SECTOR_SIZE;

 if (copy && s->common.speed) {
-delay_ns = ratelimit_calculate_delay(&s->limit, n);
+delay_ns = ratelimit_calculate_delay(&s->limit,
+ n * BDRV_SECTOR_SIZE);
 }
 }

@@ -231,7 +232,7 @@ static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
 error_setg(errp, QERR_INVALID_PARAMETER, "speed");
 return;
 }
-ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+ratelimit_set_speed(&s->limit, speed, SLICE_TIME);
 }

 static const BlockJobDriver commit_job_driver = {
diff --git a/block/mirror.c b/block/mirror.c
index 61a862d..eb27efc 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -396,7 +396,8 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 bitmap_set(s->in_flight_bitmap, sector_num / sectors_per_chunk, nb_chunks);
 while (nb_chunks > 0 && sector_num < end) {
 int64_t ret;
-int io_sectors, io_sectors_acct;
+int io_sectors;
+int64_t io_bytes_acct;
 BlockDriverState *file;
 enum MirrorMethod {
 MIRROR_METHOD_COPY,
@@ -444,16 +445,16 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 switch (mirror_method) {
 case MIRROR_METHOD_COPY:
 io_sectors = mirror_do_read(s, sector_num, io_sectors);
-io_sectors_acct = io_sectors;
+io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
 break;
 case MIRROR_METHOD_ZERO:
 case MIRROR_METHOD_DISCARD:
 mirror_do_zero_or_discard(s, sector_num, io_sectors,
   mirror_method == MIRROR_METHOD_DISCARD);
 if (write_zeroes_ok) {
-io_sectors_acct = 0;
+io_bytes_acct = 0;
 } else {
-io_sectors_acct = io_sectors;
+io_bytes_acct = io_sectors * BDRV_SECTOR_SIZE;
 }
 break;
 default:
@@ -463,7 +464,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
 sector_num += io_sectors;
 nb_chunks -= DIV_R

Re: [Qemu-block] [PATCH v3 07/20] mirror: Switch MirrorBlockJob to byte-based

2017-07-05 Thread Eric Blake
On 07/05/2017 06:42 AM, Kevin Wolf wrote:
> Am 27.06.2017 um 21:24 hat Eric Blake geschrieben:
>> We are gradually converting to byte-based interfaces, as they are
>> easier to reason about than sector-based.  Continue by converting an
>> internal structure (no semantic change), and all references to the
>> buffer size.
>>
>> [checkpatch has a false positive on use of MIN() in this patch]
>>
>> Signed-off-by: Eric Blake 
>> Reviewed-by: John Snow 
> 
> I wouldn't mind an assertion that granularity is a multiple of
> BDRV_SECTOR_SIZE, along with a comment that explains that this is
> required so that we avoid rounding problems when dealing with the bitmap
> functions.

That goes away later when series two converts the bitmap functions to be
byte-based, but you're right that the intermediate state should be
easier to follow.
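
For the intermediate state, something as small as this next to where the job
derives sector counts from the granularity would probably do (untested
sketch; the standalone names are only for illustration):

#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_SIZE 512

/* Untested sketch of the suggested sanity check; in the real code this
 * would simply be an assert() on s->granularity in the mirror job. */
static void check_granularity(uint32_t granularity)
{
    assert(granularity % BDRV_SECTOR_SIZE == 0);
}

int main(void)
{
    check_granularity(65536);
    return 0;
}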

> 
> blockdev_mirror_common() does already check this, but it feels like it's
> a bit far away from where the actual problem would happen in the mirror
> job code.

Indeed.

> 
>> @@ -768,17 +765,17 @@ static void coroutine_fn mirror_run(void *opaque)
>>   * the destination do COW.  Instead, we copy sectors around the
>>   * dirty data if needed.  We need a bitmap to do that.
>>   */
>> +s->target_cluster_size = BDRV_SECTOR_SIZE;
>>  bdrv_get_backing_filename(target_bs, backing_filename,
>>sizeof(backing_filename));
>>  if (!bdrv_get_info(target_bs, &bdi) && bdi.cluster_size) {
>> -target_cluster_size = bdi.cluster_size;
>> +s->target_cluster_size = bdi.cluster_size;
>>  }
> 
> Why have the unrelated bdrv_get_backing_filename() between the two
> assignments of s->target_cluster_size? Or actually, wouldn't it be
> even easier to read with an else branch?
> 
> if (!bdrv_get_info(target_bs, &bdi) && bdi.cluster_size) {
> s->target_cluster_size = bdi.cluster_size;
> } else {
> s->target_cluster_size = BDRV_SECTOR_SIZE;
> }

Yes, that looks nicer.

> 
> None of these comments are critical, so anyway:
> 
> Reviewed-by: Kevin Wolf 

I'm respinning v4 anyways, so I'll make the change (and while it is
small, it's still enough that I'll drop R-b).

> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH v4 5/5] tests: Add check-qobject for equality tests

2017-07-05 Thread Eric Blake
On 07/05/2017 02:04 PM, Max Reitz wrote:
> Add a new test file (check-qobject.c) for unit tests that concern
> QObjects as a whole.
> 
> Its only purpose for now is to test the qobject_is_equal() function.
> 
> Signed-off-by: Max Reitz 
> ---
>  tests/Makefile.include |   4 +-
>  qobject/qnum.c |  16 +-
>  tests/check-qobject.c  | 404 
> +
>  3 files changed, 417 insertions(+), 7 deletions(-)
>  create mode 100644 tests/check-qobject.c
> 

> +++ b/qobject/qnum.c
> @@ -217,12 +217,16 @@ QNum *qobject_to_qnum(const QObject *obj)
>  /**
>   * qnum_is_equal(): Test whether the two QNums are equal
>   *
> - * Negative integers are never considered equal to unsigned integers.
> - * Doubles are only considered equal to integers if their fractional
> - * part is zero and their integral part is exactly equal to the
> - * integer.  Because doubles have limited precision, there are
> - * therefore integers which do not have an equal double (e.g.
> - * INT64_MAX).
> + * This comparison is done independently of the internal
> + * representation.  Any two numbers are considered equal if they are
> + * mathmatically equal, that means:

s/mathmatically/mathematically/

> + * - Negative integers are never considered equal to unsigned
> + *   integers.
> + * - Floating point values are only considered equal to integers if
> + *   their fractional part is zero and their integral part is exactly
> + *   equal to the integer.  Because doubles have limited precision,
> + *   there are therefore integers which do not have an equal floating
> + *   point value (e.g. INT64_MAX).
>   */

> +static void qobject_is_equal_num_test(void)
> +{
> +QNum *u0, *i0, *d0, *d0p25, *dnan, *um42, *im42, *dm42;

Given my comments on 2/5, do you want a dinf?

> +QNum *umax, *imax, *umax_exact, *umax_exact_p1;
> +QNum *dumax, *dimax, *dumax_exact, *dumax_exact_p1;
> +QString *s0, *s_empty;
> +QBool *bfalse;
> +
> +u0 = qnum_from_uint(0u);
> +i0 = qnum_from_int(0);
> +d0 = qnum_from_double(0.0);
> +d0p25 = qnum_from_double(0.25);
> +dnan = qnum_from_double(0.0 / 0.0);

Are there compilers that complain if we open-code division by zero
instead of using NAN from <math.h>?  (Similarly, if you test infinity, I'd
use the INFINITY macro instead of an open-coded computation.)
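
Something along these lines, i.e. (untested; 'dinf' is only the hypothetical
extra test value mentioned above, not something the patch creates today):

#include <math.h>
#include "qapi/qmp/qnum.h"

/* Untested sketch: build the special-value QNums from the <math.h> macros
 * instead of open-coded arithmetic; dinf is hypothetical. */
static void make_special_qnums(QNum **dnan, QNum **dinf)
{
    *dnan = qnum_from_double(NAN);
    *dinf = qnum_from_double(INFINITY);
}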

> +um42 = qnum_from_uint((uint64_t)-42);
> +im42 = qnum_from_int(-42);
> +dm42 = qnum_from_int(-42.0);
> +
> +/* 2^64 - 1: Not exactly representable as a double (needs 64 bits
> + * of precision, but double only has 53).  The double equivalent
> + * may be either 2^64 or 2^64 - 2^11. */
> +umax = qnum_from_uint(UINT64_MAX);
> +
> +/* 2^63 - 1: Not exactly representable as a double (needs 63 bits
> + * of precision, but double only has 53).  The double equivalent
> + * may be either 2^63 or 2^63 - 2^10. */
> +imax = qnum_from_int(INT64_MAX);
> +/* 2^64 - 2^11: Exactly representable as a double (the least
> + * significant 11 bits are set to 0, so we only need the 53 bits
> + * of precision double offers).  This is the maximum value which
> + * is exactly representable both as a uint64_t and a double. */
> +umax_exact = qnum_from_uint(UINT64_MAX - 0x7ff);
> +
> +/* 2^64 - 2^11 + 1: Not exactly representable as a double (needs
> + * 64 bits again), but whereas (double)UINT64_MAX may be rounded
> + * up to 2^64, this will most likely be rounded down to
> + * 2^64 - 2^11. */
> +umax_exact_p1 = qnum_from_uint(UINT64_MAX - 0x7ff + 1);

Nice.

> +
> +dumax = qnum_from_double((double)qnum_get_uint(umax));
> +dimax = qnum_from_double((double)qnum_get_int(imax));
> +dumax_exact = qnum_from_double((double)qnum_get_uint(umax_exact));
> +dumax_exact_p1 = qnum_from_double((double)qnum_get_uint(umax_exact_p1));

Compiler-dependent what values (some) of these doubles hold.

> +
> +s0 = qstring_from_str("0");
> +s_empty = qstring_new();
> +bfalse = qbool_from_bool(false);
> +
> +/* The internal representation should not matter, as long as the
> + * precision is sufficient */
> +test_equality(true, u0, i0, d0);
> +
> +/* No automatic type conversion */
> +test_equality(false, u0, s0, s_empty, bfalse, qnull(), NULL);
> +test_equality(false, i0, s0, s_empty, bfalse, qnull(), NULL);
> +test_equality(false, d0, s0, s_empty, bfalse, qnull(), NULL);
> +
> +/* Do not round */
> +test_equality(false, u0, d0p25);
> +test_equality(false, i0, d0p25);
> +
> +/* Do not assume any object is equal to itself -- note however
> + * that NaN cannot occur in a JSON object anyway. */
> +g_assert(qobject_is_equal(QOBJECT(dnan), QOBJECT(dnan)) == false);

If you test infinity, that also cannot occur in JSON objects.

> +
> +/* No unsigned overflow */
> +test_equality(false, um42, im42);
> +test_equality(false, um42, dm42);
> +test_equality(true, im42, dm42);
> +
> 

Re: [Qemu-block] [PATCH v4 3/5] block: qobject_is_equal() in bdrv_reopen_prepare()

2017-07-05 Thread Eric Blake
On 07/05/2017 02:04 PM, Max Reitz wrote:
> Currently, bdrv_reopen_prepare() assumes that all BDS options are
> strings. However, this is not the case if the BDS has been created
> through the json: pseudo-protocol or blockdev-add.
> 
> Note that the user-invokable reopen command is an HMP command, so you
> can only specify strings there. Therefore, specifying a non-string
> option with the "same" value as it was when originally created will now
> return an error because the values are supposedly similar (and there is
> no way for the user to circumvent this but to just not specify the
> option again -- however, this is still strictly better than just
> crashing).
> 
> Signed-off-by: Max Reitz 
> ---
>  block.c | 29 ++---
>  1 file changed, 18 insertions(+), 11 deletions(-)
> 

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH v4 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Eric Blake
On 07/05/2017 02:04 PM, Max Reitz wrote:
> This generic function (along with its implementations for different
> types) determines whether two QObjects are equal.
> 
> Signed-off-by: Max Reitz 
> ---
> Markus also proposed just reporting two values as unequal if they have a
> different internal representation (i.e. a different QNum kind).
> 
> I don't like this very much, because I feel like QInt and QFloat have
> been unified for a reason: Outside of these classes, nobody should care
> about the exact internal representation.  In JSON, there is no
> difference anyway.  We probably want to use integers as long as we can
> and doubles whenever we cannot.
> 
> In any case, I feel like the class should hide the different internal
> representations from the user.  This necessitates being able to compare
> floating point values against integers.  Since apparently the main use
> of QObject is to parse and emit JSON (and represent such objects
> internally), we also have to agree that JSON doesn't make a difference:
> 42 is just the same as 42.0.
> 
> Finally, I think it's rather pointless not to consider 42u and 42 the
> same value.  But since unsigned/signed are two different kinds of QNums
> already, we cannot consider them equal without considering 42.0 equal,
> too.
> 
> Because of this, I have decided to continue to compare QNum values even
> if they are of a different kind.

This explanation may deserve to be in the commit log proper.

>  /**
> + * qnum_is_equal(): Test whether the two QNums are equal
> + *
> + * Negative integers are never considered equal to unsigned integers.
> + * Doubles are only considered equal to integers if their fractional
> + * part is zero and their integral part is exactly equal to the
> + * integer.  Because doubles have limited precision, there are
> + * therefore integers which do not have an equal double (e.g.
> + * INT64_MAX).
> + */
> +bool qnum_is_equal(const QObject *x, const QObject *y)
> +{
> +QNum *num_x = qobject_to_qnum(x);
> +QNum *num_y = qobject_to_qnum(y);
> +double integral_part; /* Needed for the modf() calls below */
> +
> +switch (num_x->kind) {
> +case QNUM_I64:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +/* Comparison in native int64_t type */
> +return num_x->u.i64 == num_y->u.i64;
> +case QNUM_U64:
> +/* Implicit conversion of x to uint64_t, so we have to
> + * check its sign before */
> +return num_x->u.i64 >= 0 && num_x->u.i64 == num_y->u.u64;
> +case QNUM_DOUBLE:
> +/* Comparing x to y in double (which the implicit
> + * conversion would do) is not exact.  So after having
> + * checked that y is an integer in the int64_t range
> + * (i.e. that it is within bounds and its fractional part
> + * is zero), compare both as integers. */
> +return num_y->u.dbl >= -0x1p63 && num_y->u.dbl < 0x1p63 &&
> +modf(num_y->u.dbl, &integral_part) == 0.0 &&

'man modf': given modf(x, &iptr), if x is a NaN, a NaN is returned
(good, NaN, is never equal to any integer value). But if x is positive
infinity, +0 is returned...

> +num_x->u.i64 == (int64_t)num_y->u.dbl;

...and *iptr is set to positive infinity.  You are now converting
infinity to int64_t (whether via num_y->u.dbl or via &integral_part),
which falls in the unspecified portion of C99 (your quotes from 6.3.1.4
mentioned converting a finite value of real to integer, and say nothing
about converting NaN or infinity to integer).

Adding an 'isfinite(num_y->u.dbl) &&' to the expression would cover your
bases (or even 'isfinite(integral_part)', if we are worried about a
static checker complaining that we assign but never read integral_part).
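
Concretely, the int64_t branch would then behave like this standalone helper
(untested sketch with generic names, not a drop-in replacement for the
num_x/num_y code):

#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Untested sketch: compare a double against an int64_t without a lossy
 * int64 -> double conversion.  The isfinite() guard keeps NaN and
 * +/-infinity away from the (int64_t) cast, whose behaviour C99 leaves
 * unspecified for non-finite values. */
static bool double_equals_int64(double d, int64_t i)
{
    double integral;

    return isfinite(d) &&
           d >= -0x1p63 && d < 0x1p63 &&
           modf(d, &integral) == 0.0 &&
           (int64_t)d == i;
}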

> +}
> +abort();
> +case QNUM_U64:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +return qnum_is_equal(y, x);
> +case QNUM_U64:
> +/* Comparison in native uint64_t type */
> +return num_x->u.u64 == num_y->u.u64;
> +case QNUM_DOUBLE:
> +/* Comparing x to y in double (which the implicit
> + * conversion would do) is not exact.  So after having
> + * checked that y is an integer in the uint64_t range
> + * (i.e. that it is within bounds and its fractional part
> + * is zero), compare both as integers. */
> +return num_y->u.dbl >= 0 && num_y->u.dbl < 0x1p64 &&
> +modf(num_y->u.dbl, &integral_part) == 0.0 &&
> +num_x->u.u64 == (uint64_t)num_y->u.dbl;

And again.

With that addition,
Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





[Qemu-block] [PATCH v4 5/5] tests: Add check-qobject for equality tests

2017-07-05 Thread Max Reitz
Add a new test file (check-qobject.c) for unit tests that concern
QObjects as a whole.

Its only purpose for now is to test the qobject_is_equal() function.

Signed-off-by: Max Reitz 
---
 tests/Makefile.include |   4 +-
 qobject/qnum.c |  16 +-
 tests/check-qobject.c  | 404 +
 3 files changed, 417 insertions(+), 7 deletions(-)
 create mode 100644 tests/check-qobject.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 42e17e2..07b130c 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -18,6 +18,7 @@ check-unit-y += tests/check-qlist$(EXESUF)
 gcov-files-check-qlist-y = qobject/qlist.c
 check-unit-y += tests/check-qnull$(EXESUF)
 gcov-files-check-qnull-y = qobject/qnull.c
+check-unit-y += tests/check-qobject$(EXESUF)
 check-unit-y += tests/check-qjson$(EXESUF)
 gcov-files-check-qjson-y = qobject/qjson.c
 check-unit-y += tests/test-qobject-output-visitor$(EXESUF)
@@ -508,7 +509,7 @@ GENERATED_FILES += tests/test-qapi-types.h tests/test-qapi-visit.h \
tests/test-qmp-introspect.h
 
 test-obj-y = tests/check-qnum.o tests/check-qstring.o tests/check-qdict.o \
-   tests/check-qlist.o tests/check-qnull.o \
+   tests/check-qlist.o tests/check-qnull.o tests/check-qobject.o \
tests/check-qjson.o \
tests/test-coroutine.o tests/test-string-output-visitor.o \
tests/test-string-input-visitor.o tests/test-qobject-output-visitor.o \
 tests/check-qstring$(EXESUF): tests/check-qstring.o $(test-util-obj-y)
 tests/check-qdict$(EXESUF): tests/check-qdict.o $(test-util-obj-y)
 tests/check-qlist$(EXESUF): tests/check-qlist.o $(test-util-obj-y)
 tests/check-qnull$(EXESUF): tests/check-qnull.o $(test-util-obj-y)
+tests/check-qobject$(EXESUF): tests/check-qobject.o $(test-util-obj-y)
 tests/check-qjson$(EXESUF): tests/check-qjson.o $(test-util-obj-y)
 tests/check-qom-interface$(EXESUF): tests/check-qom-interface.o $(test-qom-obj-y)
 tests/check-qom-proplist$(EXESUF): tests/check-qom-proplist.o $(test-qom-obj-y)
diff --git a/qobject/qnum.c b/qobject/qnum.c
index 96c348c..3d029f6 100644
--- a/qobject/qnum.c
+++ b/qobject/qnum.c
@@ -217,12 +217,16 @@ QNum *qobject_to_qnum(const QObject *obj)
 /**
  * qnum_is_equal(): Test whether the two QNums are equal
  *
- * Negative integers are never considered equal to unsigned integers.
- * Doubles are only considered equal to integers if their fractional
- * part is zero and their integral part is exactly equal to the
- * integer.  Because doubles have limited precision, there are
- * therefore integers which do not have an equal double (e.g.
- * INT64_MAX).
+ * This comparison is done independently of the internal
+ * representation.  Any two numbers are considered equal if they are
+ * mathmatically equal, that means:
+ * - Negative integers are never considered equal to unsigned
+ *   integers.
+ * - Floating point values are only considered equal to integers if
+ *   their fractional part is zero and their integral part is exactly
+ *   equal to the integer.  Because doubles have limited precision,
+ *   there are therefore integers which do not have an equal floating
+ *   point value (e.g. INT64_MAX).
  */
 bool qnum_is_equal(const QObject *x, const QObject *y)
 {
diff --git a/tests/check-qobject.c b/tests/check-qobject.c
new file mode 100644
index 000..fd964bf
--- /dev/null
+++ b/tests/check-qobject.c
@@ -0,0 +1,404 @@
+/*
+ * Generic QObject unit-tests.
+ *
+ * Copyright (C) 2017 Red Hat Inc.
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ */
+#include "qemu/osdep.h"
+
+#include "qapi/qmp/types.h"
+#include "qemu-common.h"
+
+/* Marks the end of the test_equality() argument list.
+ * We cannot use NULL there because that is a valid argument. */
+static QObject _test_equality_end_of_arguments;
+
+/**
+ * Test whether all variadic QObject *arguments are equal (@expected
+ * is true) or whether they are all not equal (@expected is false).
+ * Every QObject is tested to be equal to itself (to test
+ * reflexivity), all tests are done both ways (to test symmetry), and
+ * transitivity is not assumed but checked (each object is compared to
+ * every other one).
+ *
+ * Note that qobject_is_equal() is not really an equivalence relation,
+ * so this function may not be used for all objects (reflexivity is
+ * not guaranteed, e.g. in the case of a QNum containing NaN).
+ */
+static void do_test_equality(bool expected, ...)
+{
+va_list ap_count, ap_extract;
+QObject **args;
+int arg_count = 0;
+int i, j;
+
+va_start(ap_count, expected);
+va_copy(ap_extract, ap_count);
+while (va_arg(ap_count, QObject *) != &_test_equality_end_of_arguments) {
+arg_count++;
+}
+va_end(ap_count);
+
+args = g_new(QObject *, arg_count);
+for (i = 0; i < arg_count; i++) {
+args[i] = va_arg(ap_extract, QObject *)

[Qemu-block] [PATCH v4 0/5] block: Don't compare strings in bdrv_reopen_prepare()

2017-07-05 Thread Max Reitz
bdrv_reopen_prepare() assumes that all BDS options are strings, which is
not necessarily correct. This series introduces a new qobject_is_equal()
function which can be used to test whether any options have changed,
independently of their type.


v4:
- Patch 1: Kept, because Markus gave his R-b already...
- Patch 2: Check that doubles match integers exactly, not just after the
   integer has been (lossily) converted to a double
   [Markus/Eric]
- Patch 3: Winged comment and s/simply can not/can simply omit/ (:-()
   [Markus/Eric]
- Patch 5:
  - Mention (at least one) reason for the non-reflexivity of
qobject_is_equal() (that being NaN) [Eric]
  - Mention that NaN cannot occur in JSON [Eric]
  - Tests for integer values that cannot be exactly represented as a
double


git-backport-diff against v3:

Key:
[] : patches are identical
[] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/5:[] [--] 'qapi/qnull: Add own header'
002/5:[0032] [FC] 'qapi: Add qobject_is_equal()'
003/5:[0022] [FC] 'block: qobject_is_equal() in bdrv_reopen_prepare()'
004/5:[] [--] 'iotests: Add test for non-string option reopening'
005/5:[0114] [FC] 'tests: Add check-qobject for equality tests'


Max Reitz (5):
  qapi/qnull: Add own header
  qapi: Add qobject_is_equal()
  block: qobject_is_equal() in bdrv_reopen_prepare()
  iotests: Add test for non-string option reopening
  tests: Add check-qobject for equality tests

 tests/Makefile.include |   4 +-
 include/qapi/qmp/qbool.h   |   1 +
 include/qapi/qmp/qdict.h   |   1 +
 include/qapi/qmp/qlist.h   |   1 +
 include/qapi/qmp/qnull.h   |  28 
 include/qapi/qmp/qnum.h|   1 +
 include/qapi/qmp/qobject.h |  17 +-
 include/qapi/qmp/qstring.h |   1 +
 include/qapi/qmp/types.h   |   1 +
 block.c|  29 ++--
 qobject/qbool.c|   8 +
 qobject/qdict.c|  29 
 qobject/qlist.c|  32 
 qobject/qnull.c|  11 +-
 qobject/qnum.c |  77 +
 qobject/qobject.c  |  29 
 qobject/qstring.c  |   9 +
 tests/check-qnull.c|   2 +-
 tests/check-qobject.c  | 404 +
 tests/qemu-iotests/133 |   9 +
 tests/qemu-iotests/133.out |   5 +
 21 files changed, 677 insertions(+), 22 deletions(-)
 create mode 100644 include/qapi/qmp/qnull.h
 create mode 100644 tests/check-qobject.c

-- 
2.9.4




[Qemu-block] [PATCH v4 4/5] iotests: Add test for non-string option reopening

2017-07-05 Thread Max Reitz
Reviewed-by: Kevin Wolf 
Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/133 | 9 +
 tests/qemu-iotests/133.out | 5 +
 2 files changed, 14 insertions(+)

diff --git a/tests/qemu-iotests/133 b/tests/qemu-iotests/133
index 9d35a6a..af6b3e1 100755
--- a/tests/qemu-iotests/133
+++ b/tests/qemu-iotests/133
@@ -83,6 +83,15 @@ $QEMU_IO -c 'reopen -o driver=qcow2' $TEST_IMG
 $QEMU_IO -c 'reopen -o file.driver=file' $TEST_IMG
 $QEMU_IO -c 'reopen -o backing.driver=qcow2' $TEST_IMG
 
+echo
+echo "=== Check that reopening works with non-string options ==="
+echo
+
+# Using the json: pseudo-protocol we can create non-string options
+# (Invoke 'info' just so we get some output afterwards)
+IMGOPTSSYNTAX=false $QEMU_IO -f null-co -c 'reopen' -c 'info' \
+"json:{'driver': 'null-co', 'size': 65536}"
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/133.out b/tests/qemu-iotests/133.out
index cc86b94..f4a85ae 100644
--- a/tests/qemu-iotests/133.out
+++ b/tests/qemu-iotests/133.out
@@ -19,4 +19,9 @@ Cannot change the option 'driver'
 
 === Check that unchanged driver is okay ===
 
+
+=== Check that reopening works with non-string options ===
+
+format name: null-co
+format name: null-co
 *** done
-- 
2.9.4




[Qemu-block] [PATCH v4 3/5] block: qobject_is_equal() in bdrv_reopen_prepare()

2017-07-05 Thread Max Reitz
Currently, bdrv_reopen_prepare() assumes that all BDS options are
strings. However, this is not the case if the BDS has been created
through the json: pseudo-protocol or blockdev-add.

Note that the user-invokable reopen command is an HMP command, so you
can only specify strings there. Therefore, specifying a non-string
option with the "same" value as it was when originally created will now
return an error because the values are supposedly similar (and there is
no way for the user to circumvent this but to just not specify the
option again -- however, this is still strictly better than just
crashing).

Signed-off-by: Max Reitz 
---
 block.c | 29 ++---
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 913bb43..f8c8e92 100644
--- a/block.c
+++ b/block.c
@@ -2947,19 +2947,26 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
 const QDictEntry *entry = qdict_first(reopen_state->options);
 
 do {
-QString *new_obj = qobject_to_qstring(entry->value);
-const char *new = qstring_get_str(new_obj);
+QObject *new = entry->value;
+QObject *old = qdict_get(reopen_state->bs->options, entry->key);
+
 /*
- * Caution: while qdict_get_try_str() is fine, getting
- * non-string types would require more care.  When
- * bs->options come from -blockdev or blockdev_add, its
- * members are typed according to the QAPI schema, but
- * when they come from -drive, they're all QString.
+ * TODO: When using -drive to specify blockdev options, all values
+ * will be strings; however, when using -blockdev, blockdev-add or
+ * filenames using the json:{} pseudo-protocol, they will be
+ * correctly typed.
+ * In contrast, reopening options are (currently) always strings
+ * (because you can only specify them through qemu-io; all other
+ * callers do not specify any options).
+ * Therefore, when using anything other than -drive to create a BDS,
+ * this cannot detect non-string options as unchanged, because
+ * qobject_is_equal() always returns false for objects of different
+ * type.  In the future, this should be remedied by correctly typing
+ * all options.  For now, this is not too big of an issue because
+ * the user can simply omit options which cannot be changed anyway,
+ * so they will stay unchanged.
  */
-const char *old = qdict_get_try_str(reopen_state->bs->options,
-entry->key);
-
-if (!old || strcmp(new, old)) {
+if (!qobject_is_equal(new, old)) {
 error_setg(errp, "Cannot change the option '%s'", entry->key);
 ret = -EINVAL;
 goto error;
-- 
2.9.4




[Qemu-block] [PATCH v4 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Max Reitz
This generic function (along with its implementations for different
types) determines whether two QObjects are equal.

Signed-off-by: Max Reitz 
---
Markus also proposed just reporting two values as unequal if they have a
different internal representation (i.e. a different QNum kind).

I don't like this very much, because I feel like QInt and QFloat have
been unified for a reason: Outside of these classes, nobody should care
about the exact internal representation.  In JSON, there is no
difference anyway.  We probably want to use integers as long as we can
and doubles whenever we cannot.

In any case, I feel like the class should hide the different internal
representations from the user.  This necessitates being able to compare
floating point values against integers.  Since apparently the main use
of QObject is to parse and emit JSON (and represent such objects
internally), we also have to agree that JSON doesn't make a difference:
42 is just the same as 42.0.

Finally, I think it's rather pointless not to consider 42u and 42 the
same value.  But since unsigned/signed are two different kinds of QNums
already, we cannot consider them equal without considering 42.0 equal,
too.

Because of this, I have decided to continue to compare QNum values even
if they are of a different kind.
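
To make these semantics concrete, a small sketch (not part of the patch; it
assumes the existing qnum_from_int()/qnum_from_uint()/qnum_from_double()
constructors):

    QNum *i = qnum_from_int(42);
    QNum *u = qnum_from_uint(42);
    QNum *d = qnum_from_double(42.0);

    /* all three represent the number 42, so every pairing compares equal */
    assert(qobject_is_equal(QOBJECT(i), QOBJECT(u)));
    assert(qobject_is_equal(QOBJECT(i), QOBJECT(d)));
    assert(qobject_is_equal(QOBJECT(u), QOBJECT(d)));
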
---
 include/qapi/qmp/qbool.h   |  1 +
 include/qapi/qmp/qdict.h   |  1 +
 include/qapi/qmp/qlist.h   |  1 +
 include/qapi/qmp/qnull.h   |  2 ++
 include/qapi/qmp/qnum.h|  1 +
 include/qapi/qmp/qobject.h |  9 ++
 include/qapi/qmp/qstring.h |  1 +
 qobject/qbool.c|  8 +
 qobject/qdict.c| 29 ++
 qobject/qlist.c| 32 
 qobject/qnull.c|  9 ++
 qobject/qnum.c | 73 ++
 qobject/qobject.c  | 29 ++
 qobject/qstring.c  |  9 ++
 14 files changed, 205 insertions(+)

diff --git a/include/qapi/qmp/qbool.h b/include/qapi/qmp/qbool.h
index a4c..f77ea86 100644
--- a/include/qapi/qmp/qbool.h
+++ b/include/qapi/qmp/qbool.h
@@ -24,6 +24,7 @@ typedef struct QBool {
 QBool *qbool_from_bool(bool value);
 bool qbool_get_bool(const QBool *qb);
 QBool *qobject_to_qbool(const QObject *obj);
+bool qbool_is_equal(const QObject *x, const QObject *y);
 void qbool_destroy_obj(QObject *obj);
 
 #endif /* QBOOL_H */
diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h
index 363e431..84f8ea7 100644
--- a/include/qapi/qmp/qdict.h
+++ b/include/qapi/qmp/qdict.h
@@ -42,6 +42,7 @@ void qdict_del(QDict *qdict, const char *key);
 int qdict_haskey(const QDict *qdict, const char *key);
 QObject *qdict_get(const QDict *qdict, const char *key);
 QDict *qobject_to_qdict(const QObject *obj);
+bool qdict_is_equal(const QObject *x, const QObject *y);
 void qdict_iter(const QDict *qdict,
 void (*iter)(const char *key, QObject *obj, void *opaque),
 void *opaque);
diff --git a/include/qapi/qmp/qlist.h b/include/qapi/qmp/qlist.h
index c4b5fda..24e1e9f 100644
--- a/include/qapi/qmp/qlist.h
+++ b/include/qapi/qmp/qlist.h
@@ -58,6 +58,7 @@ QObject *qlist_peek(QList *qlist);
 int qlist_empty(const QList *qlist);
 size_t qlist_size(const QList *qlist);
 QList *qobject_to_qlist(const QObject *obj);
+bool qlist_is_equal(const QObject *x, const QObject *y);
 void qlist_destroy_obj(QObject *obj);
 
 static inline const QListEntry *qlist_first(const QList *qlist)
diff --git a/include/qapi/qmp/qnull.h b/include/qapi/qmp/qnull.h
index 48edad4..f4fbcae 100644
--- a/include/qapi/qmp/qnull.h
+++ b/include/qapi/qmp/qnull.h
@@ -23,4 +23,6 @@ static inline QObject *qnull(void)
 return &qnull_;
 }
 
+bool qnull_is_equal(const QObject *x, const QObject *y);
+
 #endif /* QNULL_H */
diff --git a/include/qapi/qmp/qnum.h b/include/qapi/qmp/qnum.h
index 09d745c..237d01b 100644
--- a/include/qapi/qmp/qnum.h
+++ b/include/qapi/qmp/qnum.h
@@ -48,6 +48,7 @@ double qnum_get_double(QNum *qn);
 char *qnum_to_string(QNum *qn);
 
 QNum *qobject_to_qnum(const QObject *obj);
+bool qnum_is_equal(const QObject *x, const QObject *y);
 void qnum_destroy_obj(QObject *obj);
 
 #endif /* QNUM_H */
diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index ef1d1a9..38ac688 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -68,6 +68,15 @@ static inline void qobject_incref(QObject *obj)
 }
 
 /**
+ * qobject_is_equal(): Return whether the two objects are equal.
+ *
+ * Any of the pointers may be NULL; return true if both are.  Always
+ * return false if only one is (therefore a QNull object is not
+ * considered equal to a NULL pointer).
+ */
+bool qobject_is_equal(const QObject *x, const QObject *y);
+
+/**
  * qobject_destroy(): Free resources used by the object
  */
 void qobject_destroy(QObject *obj);
diff --git a/include/qapi/qmp/qstring.h b/include/qapi/qmp/qstring.h
index 10076b7..65c05a9 100644
--- a/include/qapi/qmp/qstring.h
+++ b/include/

[Qemu-block] [PATCH v4 1/5] qapi/qnull: Add own header

2017-07-05 Thread Max Reitz
Reviewed-by: Markus Armbruster 
Reviewed-by: Eric Blake 
Signed-off-by: Max Reitz 
---
 include/qapi/qmp/qnull.h   | 26 ++
 include/qapi/qmp/qobject.h |  8 
 include/qapi/qmp/types.h   |  1 +
 qobject/qnull.c|  2 +-
 tests/check-qnull.c|  2 +-
 5 files changed, 29 insertions(+), 10 deletions(-)
 create mode 100644 include/qapi/qmp/qnull.h

diff --git a/include/qapi/qmp/qnull.h b/include/qapi/qmp/qnull.h
new file mode 100644
index 0000000..48edad4
--- /dev/null
+++ b/include/qapi/qmp/qnull.h
@@ -0,0 +1,26 @@
+/*
+ * QNull
+ *
+ * Copyright (C) 2015 Red Hat, Inc.
+ *
+ * Authors:
+ *  Markus Armbruster 
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1
+ * or later.  See the COPYING.LIB file in the top-level directory.
+ */
+
+#ifndef QNULL_H
+#define QNULL_H
+
+#include "qapi/qmp/qobject.h"
+
+extern QObject qnull_;
+
+static inline QObject *qnull(void)
+{
+qobject_incref(&qnull_);
+return &qnull_;
+}
+
+#endif /* QNULL_H */
diff --git a/include/qapi/qmp/qobject.h b/include/qapi/qmp/qobject.h
index b8ddbca..ef1d1a9 100644
--- a/include/qapi/qmp/qobject.h
+++ b/include/qapi/qmp/qobject.h
@@ -93,12 +93,4 @@ static inline QType qobject_type(const QObject *obj)
 return obj->type;
 }
 
-extern QObject qnull_;
-
-static inline QObject *qnull(void)
-{
-qobject_incref(&qnull_);
-return &qnull_;
-}
-
 #endif /* QOBJECT_H */
diff --git a/include/qapi/qmp/types.h b/include/qapi/qmp/types.h
index a4bc662..749ac44 100644
--- a/include/qapi/qmp/types.h
+++ b/include/qapi/qmp/types.h
@@ -19,5 +19,6 @@
 #include "qapi/qmp/qstring.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qlist.h"
+#include "qapi/qmp/qnull.h"
 
 #endif /* QAPI_QMP_TYPES_H */
diff --git a/qobject/qnull.c b/qobject/qnull.c
index c124d05..43918f1 100644
--- a/qobject/qnull.c
+++ b/qobject/qnull.c
@@ -12,7 +12,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu-common.h"
-#include "qapi/qmp/qobject.h"
+#include "qapi/qmp/qnull.h"
 
 QObject qnull_ = {
 .type = QTYPE_QNULL,
diff --git a/tests/check-qnull.c b/tests/check-qnull.c
index 8dd1c96..4a67b9a 100644
--- a/tests/check-qnull.c
+++ b/tests/check-qnull.c
@@ -8,7 +8,7 @@
  */
 #include "qemu/osdep.h"
 
-#include "qapi/qmp/qobject.h"
+#include "qapi/qmp/qnull.h"
 #include "qemu-common.h"
 #include "qapi/qobject-input-visitor.h"
 #include "qapi/qobject-output-visitor.h"
-- 
2.9.4




Re: [Qemu-block] [Qemu-devel] [PATCH v3 3/5] block: qobject_is_equal() in bdrv_reopen_prepare()

2017-07-05 Thread Max Reitz
On 2017-07-05 09:14, Markus Armbruster wrote:
> Max Reitz  writes:
> 
>> Currently, bdrv_reopen_prepare() assumes that all BDS options are
>> strings. However, this is not the case if the BDS has been created
>> through the json: pseudo-protocol or blockdev-add.
>>
>> Note that the user-invokable reopen command is an HMP command, so you
>> can only specify strings there. Therefore, specifying a non-string
>> option with the "same" value as it was when originally created will now
>> return an error because the values are considered different (and there is
>> no way for the user to circumvent this but to just not specify the
>> option again -- however, this is still strictly better than just
>> crashing).
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  block.c | 31 ++-
>>  1 file changed, 18 insertions(+), 13 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 913bb43..45eb248 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2947,19 +2947,24 @@ int bdrv_reopen_prepare(BDRVReopenState 
>> *reopen_state, BlockReopenQueue *queue,
>>  const QDictEntry *entry = qdict_first(reopen_state->options);
>>  
>>  do {
>> -QString *new_obj = qobject_to_qstring(entry->value);
>> -const char *new = qstring_get_str(new_obj);
>> -/*
>> - * Caution: while qdict_get_try_str() is fine, getting
>> - * non-string types would require more care.  When
>> - * bs->options come from -blockdev or blockdev_add, its
>> - * members are typed according to the QAPI schema, but
>> - * when they come from -drive, they're all QString.
>> - */
>> -const char *old = qdict_get_try_str(reopen_state->bs->options,
>> -entry->key);
>> -
>> -if (!old || strcmp(new, old)) {
>> +QObject *new = entry->value;
>> +QObject *old = qdict_get(reopen_state->bs->options, entry->key);
>> +
>> +/* TODO: When using -drive to specify blockdev options, all 
>> values
>> + * will be strings; however, when using -blockdev, blockdev-add 
>> or
>> + * filenames using the json:{} pseudo-protocol, they will be
>> + * correctly typed.
>> + * In contrast, reopening options are (currently) always strings
>> + * (because you can only specify them through qemu-io; all other
>> + * callers do not specify any options).
>> + * Therefore, when using anything other than -drive to create a 
>> BDS,
>> + * this cannot detect non-string options as unchanged, because
>> + * qobject_is_equal() always returns false for objects of 
>> different
>> + * type.  In the future, this should be remedied by correctly 
>> typing
>> + * all options.  For now, this is not too big of an issue 
>> because
>> + * the user simply can not specify options which cannot be 
>> changed
>> + * anyway, so they will stay unchanged. */
> 
> I'm not the maintainer, and this is not a demand: consider winging this
> comment and wrapping lines around column 70.

I actually did consider wrapping around column 70 and decided against it
because this comment is already indented by 12 characters, so none of
the lines exceed 65 characters (80 - (12 + strlen(" * "))).

About winging...  For some reason I don't quite like it here.  But it
probably is better because the comment is not immediately related to the
qobject_is_equal() call below, so I'll do it.

> As much as I fancy word play (see your reply to Eric), I have to admit
> that I had to read "the user simply can not specify options" about three
> times to make sense of it.  Please consider a wording that's easier to
> grasp, perhaps "the user can simply refrain from specifying options that
> cannot be changed".

Aw.  I've wanted this to be an example I could point people to to
educate them about the difference.  Now, alas, none shall learn. :-(

Max





Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Eric Blake
On 07/05/2017 12:04 PM, Max Reitz wrote:

> Or, well, yes, it is in this case, but I meant literally UINT64_MAX + 1,
> not the uint64_t value. I meant the natural number 2^64.
> 
> Because the issue is that (double)UINT64_MAX will (or may, depending on
> the environment and such) give us 2.0^64 == 0x1p64.
> 
> And this is where the quotation I gave below comes in. When casting an
> integer to a double we may end up with a value that is not representable
> in the original integer type, so we cannot easily cast it back.

And it's more than just UINT64_MAX that rounds to the double value of
0x1p64.  Okay, I'm starting to be convinced that using a floating-point
comparison to 0x1p64 is going to be easier than an integer comparison to
the rounding point (what point is that, anyways: where the low-order 11
bits that don't fit in 53 bits of precision cause us to round up instead
of down?)

>>> Justifying the use of binary exponents is going to be harder than that,
>>> and would need a really good comment in the code, compared to just using
>>> a safe double-cast.
>>
>> It's not safe.
>>
>> Max
>>
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
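
For the record, the rounding point asked about above can be pinned down:
assuming IEEE-754 doubles (53-bit significand, round-to-nearest-even),
representable values near 2^64 are 2^11 = 2048 apart, so every uint64_t
within 1024 of 2^64 (the halfway case included, by ties-to-even) converts
to exactly 0x1p64.  A quick check:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t first_up = UINT64_MAX - 1023;   /* 2^64 - 1024 */

        printf("%d\n", (double)first_up == 0x1p64);        /* prints 1 */
        printf("%d\n", (double)(first_up - 1) == 0x1p64);  /* prints 0 */
        return 0;
    }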





Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Eric Blake
On 07/05/2017 12:00 PM, Max Reitz wrote:
>> (uint64_t)(UINT64_MAX + 1) is well-defined - it is 0.
>>
>> (Adding in unsigned integers is always well-defined - it wraps around on
>> mathematical overflow modulo the integer size.  You're thinking of
>> overflow addition on signed integers, which is indeed undefined)
> 
> It's not. See the standard:
> 
> When a finite value of real floating type is converted to an integer

Ah, you're talking:

(uint64_t)(double)(UINT64_MAX + 1), which is indeed undefined.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Max Reitz
On 2017-07-05 19:00, Max Reitz wrote:
> On 2017-07-05 18:29, Eric Blake wrote:
>> On 07/05/2017 11:22 AM, Max Reitz wrote:
>>
>> return (double)x == x && x == y;
>
> Yes, that would do, too; and spares me of having to think about how well
> comparing an arbitrary double to UINT64_MAX actually works. :-)

 On second thought, this won't do, because (double)x == x is always true
 if x is an integer (because this will implicitly cast the second x to a
 double, too). However, (uint64_t)(double)x == x should work.
>>>
>>> Hm, well, the nice thing with this is that (double)UINT64_MAX is
>>> actually UINT64_MAX + 1, and now (uint64_t)(UINT64_MAX + 1) is
>>> undefined... Urgs.
>>
>> (uint64_t)(UINT64_MAX + 1) is well-defined - it is 0.
>>
>> (Adding in unsigned integers is always well-defined - it wraps around on
>> mathematical overflow modulo the integer size.  You're thinking of
>> overflow addition on signed integers, which is indeed undefined)
> 
> It's not. See the standard:

Or, well, yes, it is in this case, but I meant literally UINT64_MAX + 1,
not the uint64_t value. I meant the natural number 2^64.

Because the issue is that (double)UINT64_MAX will (or may, depending on
the environment and such) give us 2.0^64 == 0x1p64.

And this is where the quotation I gave below comes in. When casting an
integer to a double we may end up with a value that is not representable
in the original integer type, so we cannot easily cast it back.

Max

> When a finite value of real floating type is converted to an integer
> type other than _Bool, the fractional part is discarded (i.e., the value
> is truncated toward zero). If the value of the integral part cannot be
> represented by the integer type, the behavior is undefined. [61]
> 
> [61] The remaindering operation performed when a value of integer type
> is converted to unsigned type need not be performed when a value of real
> floating type is converted to unsigned type. Thus, the range of portable
> real floating values is (−1, Utype_MAX+1).
> 
> See for yourself:
> 
> printf("%i\n", (uint64_t)(double)UINT64_MAX == UINT64_MAX);
> 
> This yields 1 with gcc and -O3.
> 
>>>
>>> So I guess one thing that isn't very obvious but that should *always*
>>> work (and is always well-defined) is this:
>>>
>>> For uint64_t: y < 0x1p64 && (uint64_t)y == x
>>>
>>> For int64_t: y >= -0x1p63 && y < 0x1p63 && (int64_t)y == x
>>
>> That's harder to read, compared to the double-cast method which is
>> well-defined after all.
>>
>>>
>>> I hope. :-/
>>>
>>> (But finally a chance to use binary exponents! Yay!)
>>
>> Justifying the use of binary exponents is going to be harder than that,
>> and would need a really good comment in the code, compared to just using
>> a safe double-cast.
> 
> It's not safe.
> 
> Max
> 






Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Max Reitz
On 2017-07-05 18:29, Eric Blake wrote:
> On 07/05/2017 11:22 AM, Max Reitz wrote:
> 
> return (double)x == x && x == y;

 Yes, that would do, too; and spares me of having to think about how well
 comparing an arbitrary double to UINT64_MAX actually works. :-)
>>>
>>> On second thought, this won't do, because (double)x == x is always true
>>> if x is an integer (because this will implicitly cast the second x to a
>>> double, too). However, (uint64_t)(double)x == x should work.
>>
>> Hm, well, the nice thing with this is that (double)UINT64_MAX is
>> actually UINT64_MAX + 1, and now (uint64_t)(UINT64_MAX + 1) is
>> undefined... Urgs.
> 
> (uint64_t)(UINT64_MAX + 1) is well-defined - it is 0.
> 
> (Adding in unsigned integers is always well-defined - it wraps around on
> mathematical overflow modulo the integer size.  You're thinking of
> overflow addition on signed integers, which is indeed undefined)

It's not. See the standard:

When a finite value of real floating type is converted to an integer
type other than _Bool, the fractional part is discarded (i.e., the value
is truncated toward zero). If the value of the integral part cannot be
represented by the integer type, the behavior is undefined. [61]

[61] The remaindering operation performed when a value of integer type
is converted to unsigned type need not be performed when a value of real
floating type is converted to unsigned type. Thus, the range of portable
real floating values is (−1, Utype_MAX+1).

See for yourself:

printf("%i\n", (uint64_t)(double)UINT64_MAX == UINT64_MAX);

This yields 1 with gcc and -O3.

>>
>> So I guess one thing that isn't very obvious but that should *always*
>> work (and is always well-defined) is this:
>>
>> For uint64_t: y < 0x1p64 && (uint64_t)y == x
>>
>> For int64_t: y >= -0x1p63 && y < 0x1p63 && (int64_t)y == x
> 
> That's harder to read, compared to the double-cast method which is
> well-defined after all.
> 
>>
>> I hope. :-/
>>
>> (But finally a chance to use binary exponents! Yay!)
> 
> Justifying the use of binary exponents is going to be harder than that,
> and would need a really good comment in the code, compared to just using
> a safe double-cast.

It's not safe.

Max





Re: [Qemu-block] [Qemu-devel] [PATCH 08/35] block: all bdrv_aio callbacks are coroutine_fn

2017-07-05 Thread Paolo Bonzini


On 05/07/2017 18:40, Marc-André Lureau wrote:
> Hi
> 
> - Original Message -
>>
>>
>> On 05/07/2017 18:06, Marc-André Lureau wrote:
> coroutine_fn too)
 It's not controversial, I would not have expected the functions to call
 coroutine_fn. :)  How do they do that?

>>> For example,  null_co_readv() calls  null_co_common() which calls
>>> co_aio_sleep_ns()
>>
>> But these are bdrv_co_*, not bdrv_aio_*.
> 
> Oops, right.
> 
> Indeed, it's not needed, but to avoid coroutine annotation mismatch, we would 
> have to remove a few:
> 
> static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,
> 
> static coroutine_fn BlockAIOCB *hdev_aio_pdiscard(BlockDriverState *bs,
> 
> Only those 2, it seems.

Good!  If it's just those two, they are wrong indeed.  I'd be surprised
to see more (and even more surprised to see that the annotations were
right :)).

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 08/35] block: all bdrv_aio callbacks are coroutine_fn

2017-07-05 Thread Marc-André Lureau
Hi

- Original Message -
> 
> 
> On 05/07/2017 18:06, Marc-André Lureau wrote:
> >>> coroutine_fn too)
> >> It's not controversial, I would not have expected the functions to call
> >> coroutine_fn. :)  How do they do that?
> >>
> > For example,  null_co_readv() calls  null_co_common() which calls
> > co_aio_sleep_ns()
> 
> But these are bdrv_co_*, not bdrv_aio_*.

Oops, right.

Indeed, it's not needed, but to avoid coroutine annotation mismatch, we would 
have to remove a few:

static coroutine_fn BlockAIOCB *raw_aio_pdiscard(BlockDriverState *bs,

static coroutine_fn BlockAIOCB *hdev_aio_pdiscard(BlockDriverState *bs,

Only those 2, it seems.



Re: [Qemu-block] [PATCH v3 09/20] mirror: Update signature of mirror_clip_sectors()

2017-07-05 Thread Kevin Wolf
Am 27.06.2017 um 21:24 hat Eric Blake geschrieben:
> Rather than having a void function that modifies its input
> in-place as the output, change the signature to reduce a layer
> of indirection and return the result.
> 
> Suggested-by: John Snow 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 

Reviewed-by: Kevin Wolf 



Re: [Qemu-block] [PATCH v3 08/20] mirror: Switch mirror_do_zero_or_discard() to byte-based

2017-07-05 Thread Kevin Wolf
Am 27.06.2017 um 21:24 hat Eric Blake geschrieben:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Convert another internal
> function (no semantic change).
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 

Reviewed-by: Kevin Wolf 



Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Max Reitz
On 2017-07-05 18:22, Max Reitz wrote:
> On 2017-07-05 18:05, Max Reitz wrote:
>> On 2017-07-05 15:48, Max Reitz wrote:
>>> On 2017-07-05 09:07, Markus Armbruster wrote:
 Max Reitz  writes:

> This generic function (along with its implementations for different
> types) determines whether two QObjects are equal.
>
> Signed-off-by: Max Reitz 
 [...]
> diff --git a/qobject/qnum.c b/qobject/qnum.c
> index 476e81c..784d061 100644
> --- a/qobject/qnum.c
> +++ b/qobject/qnum.c
> @@ -213,6 +213,59 @@ QNum *qobject_to_qnum(const QObject *obj)
>  }
>  
>  /**
> + * qnum_is_equal(): Test whether the two QNums are equal
> + */
> +bool qnum_is_equal(const QObject *x, const QObject *y)
> +{
> +QNum *num_x = qobject_to_qnum(x);
> +QNum *num_y = qobject_to_qnum(y);
> +
> +switch (num_x->kind) {
> +case QNUM_I64:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +/* Comparison in native int64_t type */
> +return num_x->u.i64 == num_y->u.i64;
> +case QNUM_U64:
> +/* Implicit conversion of x to uint64_t, so we have to
> + * check its sign before */
> +return num_x->u.i64 >= 0 && num_x->u.i64 == num_y->u.u64;
> +case QNUM_DOUBLE:
> +/* Implicit conversion of x to double; no overflow
> + * possible */
> +return num_x->u.i64 == num_y->u.dbl;

 Overflow is impossible, but loss of precision is possible:

 (double)9007199254740993ull == 9007199254740992.0

 yields true.  Is this what we want?
>>>
>>> I'd argue that yes, because the floating point value represents
>>> basically all of the values which are "equal" to it.
>>>
>>> But I don't have a strong opinion. I guess the alternative would be to
>>> convert the double to an integer instead and check for overflows before?
>>>
> +}
> +abort();
> +case QNUM_U64:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +return qnum_is_equal(y, x);
> +case QNUM_U64:
> +/* Comparison in native uint64_t type */
> +return num_x->u.u64 == num_y->u.u64;
> +case QNUM_DOUBLE:
> +/* Implicit conversion of x to double; no overflow
> + * possible */
> +return num_x->u.u64 == num_y->u.dbl;

 Similar loss of precision.

> +}
> +abort();
> +case QNUM_DOUBLE:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +return qnum_is_equal(y, x);
> +case QNUM_U64:
> +return qnum_is_equal(y, x);
> +case QNUM_DOUBLE:
> +/* Comparison in native double type */
> +return num_x->u.dbl == num_y->u.dbl;
> +}
> +abort();
> +}
> +
> +abort();
> +}

 I think there's more than one sane interpretations of "is equal",
 including:

 * The mathematical numbers represented by @x and @y are equal.

 * @x and @y have the same contents, i.e. same kind and u.

 * @x and @y are the same object (listed for completeness; we don't need
   a function to compare pointers).

 Your patch implements yet another one.  Which one do we want, and why?
>>>
>>> Mine is the first one, except that I think that a floating point value
>>> does not represent a single number but just some number in a range.
>>>
 The second is easier to implement than the first.
>>>
>>> It seems much less useful, though.
>>>
 If we really want the first, you need to fix the loss of precision bugs.
>>>
>>> I'm not sure, but I don't mind either, so...
>>>
 I guess the obvious fix is

 return (double)x == x && x == y;
>>>
>>> Yes, that would do, too; and spares me of having to think about how well
>>> comparing an arbitrary double to UINT64_MAX actually works. :-)
>>
>> On second thought, this won't do, because (double)x == x is always true
>> if x is an integer (because this will implicitly cast the second x to a
>> double, too). However, (uint64_t)(double)x == x should work.
> 
> Hm, well, the nice thing with this is that (double)UINT64_MAX is
> actually UINT64_MAX + 1, and now (uint64_t)(UINT64_MAX + 1) is
> undefined... Urgs.
> 
> So I guess one thing that isn't very obvious but that should *always*
> work (and is always well-defined) is this:
> 
> For uint64_t: y < 0x1p64 && (uint64_t)y == x

Here comes iteration number 4: Forgot the y >= 0 check.

> For int64_t: y >= -0x1p63 && y < 0x1p63 && (int64_t)y == x

Also, I should check that the fractional part of y is 0 (through modf(y,
&_) == 0.0).

Floating point numbers are so much fun!

(And all of this gives me such great ideas for tests to add to patch 5!)

Max

> I hope. :-/
> 
> (But finally a chance to use binary exponents! Yay!)
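
Putting the checks from this exchange together, a minimal sketch of the
bounded comparison being discussed (a sketch only, not a drop-in
qnum_is_equal(); it assumes IEEE-754 doubles and the standard <stdint.h>,
<stdbool.h> and <math.h> headers):

    /* True iff the double y represents exactly the integer x. */
    static bool u64_eq_double(uint64_t x, double y)
    {
        double integral;

        /* y must be integral, non-negative and strictly below 2^64
         * before casting it back to uint64_t is well-defined */
        return modf(y, &integral) == 0.0 &&
               y >= 0.0 && y < 0x1p64 &&
               (uint64_t)y == x;
    }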

Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Eric Blake
On 07/05/2017 11:22 AM, Max Reitz wrote:

 return (double)x == x && x == y;
>>>
>>> Yes, that would do, too; and spares me of having to think about how well
>>> comparing an arbitrary double to UINT64_MAX actually works. :-)
>>
>> On second thought, this won't do, because (double)x == x is always true
>> if x is an integer (because this will implicitly cast the second x to a
>> double, too). However, (uint64_t)(double)x == x should work.
> 
> Hm, well, the nice thing with this is that (double)UINT64_MAX is
> actually UINT64_MAX + 1, and now (uint64_t)(UINT64_MAX + 1) is
> undefined... Urgs.

(uint64_t)(UINT64_MAX + 1) is well-defined - it is 0.

(Adding in unsigned integers is always well-defined - it wraps around on
mathematical overflow modulo the integer size.  You're thinking of
overflow addition on signed integers, which is indeed undefined)

> 
> So I guess one thing that isn't very obvious but that should *always*
> work (and is always well-defined) is this:
> 
> For uint64_t: y < 0x1p64 && (uint64_t)y == x
> 
> For int64_t: y >= -0x1p63 && y < 0x1p63 && (int64_t)y == x

That's harder to read, compared to the double-cast method which is
well-defined after all.

> 
> I hope. :-/
> 
> (But finally a chance to use binary exponents! Yay!)

Justifying the use of binary exponents is going to be harder than that,
and would need a really good comment in the code, compared to just using
a safe double-cast.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Max Reitz
On 2017-07-05 18:05, Max Reitz wrote:
> On 2017-07-05 15:48, Max Reitz wrote:
>> On 2017-07-05 09:07, Markus Armbruster wrote:
>>> Max Reitz  writes:
>>>
 This generic function (along with its implementations for different
 types) determines whether two QObjects are equal.

 Signed-off-by: Max Reitz 
>>> [...]
 diff --git a/qobject/qnum.c b/qobject/qnum.c
 index 476e81c..784d061 100644
 --- a/qobject/qnum.c
 +++ b/qobject/qnum.c
 @@ -213,6 +213,59 @@ QNum *qobject_to_qnum(const QObject *obj)
  }
  
  /**
 + * qnum_is_equal(): Test whether the two QNums are equal
 + */
 +bool qnum_is_equal(const QObject *x, const QObject *y)
 +{
 +QNum *num_x = qobject_to_qnum(x);
 +QNum *num_y = qobject_to_qnum(y);
 +
 +switch (num_x->kind) {
 +case QNUM_I64:
 +switch (num_y->kind) {
 +case QNUM_I64:
 +/* Comparison in native int64_t type */
 +return num_x->u.i64 == num_y->u.i64;
 +case QNUM_U64:
 +/* Implicit conversion of x to uint64_t, so we have to
 + * check its sign before */
 +return num_x->u.i64 >= 0 && num_x->u.i64 == num_y->u.u64;
 +case QNUM_DOUBLE:
 +/* Implicit conversion of x to double; no overflow
 + * possible */
 +return num_x->u.i64 == num_y->u.dbl;
>>>
>>> Overflow is impossible, but loss of precision is possible:
>>>
>>> (double)9007199254740993ull == 9007199254740992.0
>>>
>>> yields true.  Is this what we want?
>>
>> I'd argue that yes, because the floating point value represents
>> basically all of the values which are "equal" to it.
>>
>> But I don't have a strong opinion. I guess the alternative would be to
>> convert the double to an integer instead and check for overflows before?
>>
 +}
 +abort();
 +case QNUM_U64:
 +switch (num_y->kind) {
 +case QNUM_I64:
 +return qnum_is_equal(y, x);
 +case QNUM_U64:
 +/* Comparison in native uint64_t type */
 +return num_x->u.u64 == num_y->u.u64;
 +case QNUM_DOUBLE:
 +/* Implicit conversion of x to double; no overflow
 + * possible */
 +return num_x->u.u64 == num_y->u.dbl;
>>>
>>> Similar loss of precision.
>>>
 +}
 +abort();
 +case QNUM_DOUBLE:
 +switch (num_y->kind) {
 +case QNUM_I64:
 +return qnum_is_equal(y, x);
 +case QNUM_U64:
 +return qnum_is_equal(y, x);
 +case QNUM_DOUBLE:
 +/* Comparison in native double type */
 +return num_x->u.dbl == num_y->u.dbl;
 +}
 +abort();
 +}
 +
 +abort();
 +}
>>>
>>> I think there's more than one sane interpretations of "is equal",
>>> including:
>>>
>>> * The mathematical numbers represented by @x and @y are equal.
>>>
>>> * @x and @y have the same contents, i.e. same kind and u.
>>>
>>> * @x and @y are the same object (listed for completeness; we don't need
>>>   a function to compare pointers).
>>>
>>> Your patch implements yet another one.  Which one do we want, and why?
>>
>> Mine is the first one, except that I think that a floating point value
>> does not represent a single number but just some number in a range.
>>
>>> The second is easier to implement than the first.
>>
>> It seems much less useful, though.
>>
>>> If we really want the first, you need to fix the loss of precision bugs.
>>
>> I'm not sure, but I don't mind either, so...
>>
>>> I guess the obvious fix is
>>>
>>> return (double)x == x && x == y;
>>
>> Yes, that would do, too; and spares me of having to think about how well
>> comparing an arbitrary double to UINT64_MAX actually works. :-)
> 
> On second thought, this won't do, because (double)x == x is always true
> if x is an integer (because this will implicitly cast the second x to a
> double, too). However, (uint64_t)(double)x == x should work.

Hm, well, the nice thing with this is that (double)UINT64_MAX is
actually UINT64_MAX + 1, and now (uint64_t)(UINT64_MAX + 1) is
undefined... Urgs.

So I guess one thing that isn't very obvious but that should *always*
work (and is always well-defined) is this:

For uint64_t: y < 0x1p64 && (uint64_t)y == x

For int64_t: y >= -0x1p63 && y < 0x1p63 && (int64_t)y == x

I hope. :-/

(But finally a chance to use binary exponents! Yay!)

Max





Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Markus Armbruster
Eric Blake  writes:

> On 07/05/2017 08:48 AM, Max Reitz wrote:
  /**
 + * qnum_is_equal(): Test whether the two QNums are equal
 + */
 +bool qnum_is_equal(const QObject *x, const QObject *y)
 +{
 +QNum *num_x = qobject_to_qnum(x);
 +QNum *num_y = qobject_to_qnum(y);
 +
 +switch (num_x->kind) {
 +case QNUM_I64:
 +switch (num_y->kind) {
 +case QNUM_I64:
 +/* Comparison in native int64_t type */
 +return num_x->u.i64 == num_y->u.i64;
 +case QNUM_U64:
 +/* Implicit conversion of x to uint64_t, so we have to
 + * check its sign before */
 +return num_x->u.i64 >= 0 && num_x->u.i64 == num_y->u.u64;
 +case QNUM_DOUBLE:
 +/* Implicit conversion of x to double; no overflow
 + * possible */
 +return num_x->u.i64 == num_y->u.dbl;
>>>
>>> Overflow is impossible, but loss of precision is possible:
>>>
>>> (double)9007199254740993ull == 9007199254740992.0
>>>
>>> yields true.  Is this what we want?
>> 
>> I'd argue that yes, because the floating point value represents
>> basically all of the values which are "equal" to it.
>
> But the problem is that we CAN represent the fully-precise number as an
> integer, so having multiple distinct integers that compare differently
> against each other, but equal to the same double, is awkward.

Yup.

>> But I don't have a strong opinion. I guess the alternative would be to
>> convert the double to an integer instead and check for overflows before?
>
> That's the solution Markus gave, and I'm in favor of the tighter check:
>
[...]
>>> I think there's more than one sane interpretations of "is equal",
>>> including:
>>> 
>>> * The mathematical numbers represented by @x and @y are equal.
>>> 
>>> * @x and @y have the same contents, i.e. same kind and u.
>>> 
>>> * @x and @y are the same object (listed for completeness; we don't need
>>>   a function to compare pointers).
>>> 
>>> Your patch implements yet another one.  Which one do we want, and why?
>>
>> Mine is the first one, except that I think that a floating point value
>> does not represent a single number but just some number in a range.
>>
>>> The second is easier to implement than the first.
>>
>> It seems much less useful, though.

Depends on what for.

Common Lisp has

* eq: same object
* eql: eq, or both numbers with same type and value, or both characters
  that represent the same character
* =: mathematically equal (arguments must be numbers)
* equal: like eql, but it recursively descends into lists (I'm
  simplifying)

Decades of use back the assertion that eq, eql and = are all useful.

>>> If we really want the first, you need to fix the loss of precision bugs.
>>
>> I'm not sure, but I don't mind either, so...

For what it's worth, your v2 had QInt compare not equal to QFloat.
Makes me suspect that eql is good enough for the problem at hand.

>>> I guess the obvious fix is
>>>
>>> return (double)x == x && x == y;
>> 
>> Yes, that would do, too; and spares me of having to think about how well
>> comparing an arbitrary double to UINT64_MAX actually works. :-)
>
> It basically says that we are unwilling to declare an integer equivalent
> to the double if the double loses precision when trying to store the
> integer.
>
>>> Note that this is what you do for mixed signedness: first check @x is
>>> exactly representable in @y's type, then compare in @y's type.
>>> 
>>> Regardless of which one we pick, the function comment needs to explain.



Re: [Qemu-block] [Qemu-devel] [PATCH 08/35] block: all bdrv_aio callbacks are coroutine_fn

2017-07-05 Thread Paolo Bonzini


On 05/07/2017 18:06, Marc-André Lureau wrote:
>>> coroutine_fn too)
>> It's not controversial, I would not have expected the functions to call
>> coroutine_fn. :)  How do they do that?
>>
> For example,  null_co_readv() calls  null_co_common() which calls 
> co_aio_sleep_ns()

But these are bdrv_co_*, not bdrv_aio_*.

Paolo



Re: [Qemu-block] [PATCH] docs: add qemu-block-drivers(7) man page

2017-07-05 Thread Paolo Bonzini


On 22/06/2017 14:17, Stefan Hajnoczi wrote:
> +@c man begin SYNOPSIS
> +QEMU block driver reference manual
> +@c man end
> +

I think this should be wrapped with @ignore / @end ignore.  Otherwise
looks like a great idea.

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH 08/35] block: all bdrv_aio callbacks are coroutine_fn

2017-07-05 Thread Marc-André Lureau
Hi

- Original Message -
> 
> 
> On 05/07/2017 16:21, Marc-André Lureau wrote:
> >>
> >> They are, but it's an implementation detail.  Why is this patch necessary?
> > I didn't think this would be controversial :) well, the checks I added to
> > clang verify that function pointers share the coroutine attribute.
> > 
> > The functions themselves are/need to be coroutine_fn (as they will call
> > coroutine_fn too)
> 
> It's not controversial, I would not have expected the functions to call
> coroutine_fn. :)  How do they do that?
> 

For example,  null_co_readv() calls  null_co_common() which calls 
co_aio_sleep_ns()
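
A simplified sketch of the call chain mentioned above (modelled loosely on
block/null.c, details abridged), showing why the bdrv_co_* implementations
carry the coroutine_fn annotation even though the bdrv_aio_* callbacks
themselves turn out not to need it:

    static coroutine_fn int null_co_common(BlockDriverState *bs)
    {
        BDRVNullState *s = bs->opaque;

        if (s->latency_ns) {
            /* co_aio_sleep_ns() yields, so it may only run inside a
             * coroutine -- the annotation propagates to every caller */
            co_aio_sleep_ns(bdrv_get_aio_context(bs), QEMU_CLOCK_REALTIME,
                            s->latency_ns);
        }
        return 0;
    }

    static coroutine_fn int null_co_readv(BlockDriverState *bs,
                                          int64_t sector_num, int nb_sectors,
                                          QEMUIOVector *qiov)
    {
        return null_co_common(bs);
    }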



Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Max Reitz
On 2017-07-05 15:48, Max Reitz wrote:
> On 2017-07-05 09:07, Markus Armbruster wrote:
>> Max Reitz  writes:
>>
>>> This generic function (along with its implementations for different
>>> types) determines whether two QObjects are equal.
>>>
>>> Signed-off-by: Max Reitz 
>> [...]
>>> diff --git a/qobject/qnum.c b/qobject/qnum.c
>>> index 476e81c..784d061 100644
>>> --- a/qobject/qnum.c
>>> +++ b/qobject/qnum.c
>>> @@ -213,6 +213,59 @@ QNum *qobject_to_qnum(const QObject *obj)
>>>  }
>>>  
>>>  /**
>>> + * qnum_is_equal(): Test whether the two QNums are equal
>>> + */
>>> +bool qnum_is_equal(const QObject *x, const QObject *y)
>>> +{
>>> +QNum *num_x = qobject_to_qnum(x);
>>> +QNum *num_y = qobject_to_qnum(y);
>>> +
>>> +switch (num_x->kind) {
>>> +case QNUM_I64:
>>> +switch (num_y->kind) {
>>> +case QNUM_I64:
>>> +/* Comparison in native int64_t type */
>>> +return num_x->u.i64 == num_y->u.i64;
>>> +case QNUM_U64:
>>> +/* Implicit conversion of x to uint64_t, so we have to
>>> + * check its sign before */
>>> +return num_x->u.i64 >= 0 && num_x->u.i64 == num_y->u.u64;
>>> +case QNUM_DOUBLE:
>>> +/* Implicit conversion of x to double; no overflow
>>> + * possible */
>>> +return num_x->u.i64 == num_y->u.dbl;
>>
>> Overflow is impossible, but loss of precision is possible:
>>
>> (double)9007199254740993ull == 9007199254740992.0
>>
>> yields true.  Is this what we want?
> 
> I'd argue that yes, because the floating point value represents
> basically all of the values which are "equal" to it.
> 
> But I don't have a strong opinion. I guess the alternative would be to
> convert the double to an integer instead and check for overflows before?
> 
>>> +}
>>> +abort();
>>> +case QNUM_U64:
>>> +switch (num_y->kind) {
>>> +case QNUM_I64:
>>> +return qnum_is_equal(y, x);
>>> +case QNUM_U64:
>>> +/* Comparison in native uint64_t type */
>>> +return num_x->u.u64 == num_y->u.u64;
>>> +case QNUM_DOUBLE:
>>> +/* Implicit conversion of x to double; no overflow
>>> + * possible */
>>> +return num_x->u.u64 == num_y->u.dbl;
>>
>> Similar loss of precision.
>>
>>> +}
>>> +abort();
>>> +case QNUM_DOUBLE:
>>> +switch (num_y->kind) {
>>> +case QNUM_I64:
>>> +return qnum_is_equal(y, x);
>>> +case QNUM_U64:
>>> +return qnum_is_equal(y, x);
>>> +case QNUM_DOUBLE:
>>> +/* Comparison in native double type */
>>> +return num_x->u.dbl == num_y->u.dbl;
>>> +}
>>> +abort();
>>> +}
>>> +
>>> +abort();
>>> +}
>>
>> I think there's more than one sane interpretations of "is equal",
>> including:
>>
>> * The mathematical numbers represented by @x and @y are equal.
>>
>> * @x and @y have the same contents, i.e. same kind and u.
>>
>> * @x and @y are the same object (listed for completeness; we don't need
>>   a function to compare pointers).
>>
>> Your patch implements yet another one.  Which one do we want, and why?
> 
> Mine is the first one, except that I think that a floating point value
> does not represent a single number but just some number in a range.
> 
>> The second is easier to implement than the first.
> 
> It seems much less useful, though.
> 
>> If we really want the first, you need to fix the loss of precision bugs.
> 
> I'm not sure, but I don't mind either, so...
> 
>> I guess the obvious fix is
>>
>> return (double)x == x && x == y;
> 
> Yes, that would do, too; and spares me of having to think about how well
> comparing an arbitrary double to UINT64_MAX actually works. :-)

On second thought, this won't do, because (double)x == x is always true
if x is an integer (because this will implicitly cast the second x to a
double, too). However, (uint64_t)(double)x == x should work.

Max

> 
>> Note that this is what you do for mixed signedness: first check @x is
>> exactly representable in @y's type, then compare in @y's type.
>>
>> Regardless of which one we pick, the function comment needs to explain.
> 
> OK, will do.
> 
> Max
> 
>>> +
>>> +/**
>>>   * qnum_destroy_obj(): Free all memory allocated by a
>>>   * QNum object
>>>   */
>> [...]
>>
>> Remainder of the patch looks good to me.
>>
> 
> 






Re: [Qemu-block] [Qemu-devel] [RFC v1 2/4] util/oslib-win32: Remove invalid check

2017-07-05 Thread Alistair Francis
On Fri, Jun 30, 2017 at 3:37 AM, Paolo Bonzini  wrote:
>
>
> On 29/06/2017 18:37, Alistair Francis wrote:
>>> Hmm, I think it's possible, poll_msgs is true here.
>> poll_msgs?
>>
>> If nhandles is 0 then we have already entered an earlier if statement
>> and set ready to either WAIT_FAILED or WAIT_TIMEOUT in which case we
>> can't enter this part of the if statement.
>
> No, that's not correct.  The code is:
>
> if (poll_msgs) {
> /* Wait for either messages or handles
>  * -> Use MsgWaitForMultipleObjectsEx
>  */
> ready = MsgWaitForMultipleObjectsEx(nhandles, handles, timeout,
> QS_ALLINPUT, MWMO_ALERTABLE);
>
> if (ready == WAIT_FAILED) {
> gchar *emsg = g_win32_error_message(GetLastError());
> g_warning("MsgWaitForMultipleObjectsEx failed: %s", emsg);
> g_free(emsg);
> }
> } else if (nhandles == 0) {
> /* No handles to wait for, just the timeout */
> if (timeout == INFINITE) {
> ready = WAIT_FAILED;
> } else {
> SleepEx(timeout, TRUE);
> ready = WAIT_TIMEOUT;
> }
>
> You can have poll_msgs == TRUE && nhandles == 0.  This happens for
>
>GPollFD fds[1] = { .fd = G_WIN32_MSG_HANDLE, .events = G_IO_IN };
>g_poll(fds, 1, timeout);

Ah. Yeah good point.

Ok, I'll respin the series without this patch then.

Thanks,
Alistair

>
> Thanks,
>
> Paolo
>



Re: [Qemu-block] [PATCH v4 0/1] virtio-scsi-ccw: fix iotest 068 for s390x

2017-07-05 Thread Stefan Hajnoczi
On Tue, Jul 04, 2017 at 03:23:49PM +0200, QingFeng Hao wrote:
> This commit fixes iotest 068 for s390x as s390x uses virtio-scsi-ccw.
> It's based on commit c324fd0a39c by Stefan Hajnoczi. 
> Thanks!
> 
> Change history:
> v4:
> Got Cornelia Huck's Reviewed-by and take the comment to change the
> commit message.
> 
> v3:
> Take Christian Borntraeger and Cornelia Huck's comment to check
> if kvm is enabled in s390_assign_subch_ioeventfd instead of
> kvm_s390_assign_subch_ioeventfd, as the former is the more general one.
> 
> v2:
> Remove Stefan from sign-off list and change the patch's commit message 
> according to Christian Borntraeger's comment.
> 
> QingFeng Hao (1):
>   virtio-scsi-ccw: use ioeventfd even when KVM is disabled
> 
>  hw/s390x/virtio-ccw.c | 2 +-
>  target/s390x/cpu.h| 6 +-
>  2 files changed, 6 insertions(+), 2 deletions(-)

I didn't realize s390 also has this old check.  Thanks for fixing it!

Reviewed-by: Stefan Hajnoczi 




Re: [Qemu-block] [PATCH v4] tests: Avoid non-portable 'echo -ARG'

2017-07-05 Thread Max Reitz
On 2017-07-03 20:09, Eric Blake wrote:
> POSIX says that backslashes in the arguments to 'echo', as well as
> any use of 'echo -n' and 'echo -e', are non-portable; it recommends
> people should favor 'printf' instead.  This is definitely true where
> we do not control which shell is running (such as in makefile snippets
> or in documentation examples).  But even for scripts where we
> require bash (and therefore, where echo does what we want by default),
> it is still possible to use 'shopt -s xpg_echo' to change bash's
> behavior of echo.  And setting a good example never hurts when we are
> not sure if a snippet will be copied from a bash-only script to a
> general shell script (although I don't change the use of non-portable
> \e for ESC when we know the running shell is bash).
> 
> Replace 'echo -n "..."' with 'printf %s "..."', and 'echo -e "..."'
> with 'printf %b "...\n"', with the optimization that the %s/%b
> argument can be omitted if the string being printed is a strict
> literal with no '%', '$', or '`' (we could technically also make
> this optimization when there are $ or `` substitutions but where
> we can prove their results will not be problematic, but proving
> that such substitutions are safe makes the patch less trivial
> compared to just being consistent).
> 
> In the qemu-iotests check script, fix unusual shell quoting
> that would result in word-splitting if 'date' outputs a space.
> 
> In test 051, take an opportunity to shorten the line.
> 
> In test 068, get rid of a pointless second invocation of bash.
> 
> CC: qemu-triv...@nongnu.org
> Signed-off-by: Eric Blake 
> 
> ---
> v4: even more use of %b/%s [Max]
> v3: use 'printf %s' in a few more places that substitute [Max]
> v2: be robust to potential % in substitutions
> ---
>  qemu-options.hx |  4 ++--
>  tests/multiboot/run_test.sh | 10 +-
>  tests/qemu-iotests/051  |  7 ---
>  tests/qemu-iotests/068  |  2 +-
>  tests/qemu-iotests/142  | 48 
> ++---
>  tests/qemu-iotests/171  | 14 ++---
>  tests/qemu-iotests/check| 18 -
>  tests/rocker/all| 10 +-
>  tests/tcg/cris/Makefile |  8 
>  9 files changed, 61 insertions(+), 60 deletions(-)

Thanks a lot (not least for bearing my nagging)! Applied to my block branch:

https://github.com/XanClic/qemu/commits/block

(I agree it's fit for qemu-trivial now, but, well...)

Max





Re: [Qemu-block] [RFC] QMP design: Fixing query-block and friends

2017-07-05 Thread Stefan Hajnoczi
On Tue, Jun 27, 2017 at 02:24:12PM -0400, John Snow wrote:
> On 06/27/2017 12:31 PM, Kevin Wolf wrote:
> > If that's what we're going to do, I think I can figure out something
> > nice for block nodes. That shouldn't be too hard. The only question
> > would be whether we want a command to query one node or whether we would
> > keep returning all of them.
> > 
> 
> Personally in favor of allowing filtering, as an option. I don't see why
> we couldn't, or why it would be a problem, or worse in any way.

Yes.  Filtering is important for performance with large numbers of
drives.

Stefan




Re: [Qemu-block] [Qemu-devel] [PATCH 08/35] block: all bdrv_aio callbacks are coroutine_fn

2017-07-05 Thread Paolo Bonzini


On 05/07/2017 16:21, Marc-André Lureau wrote:
>>
>> They are, but it's an implementation detail.  Why is this patch necessary?
> I didn't think this would be controversial :) well, the checks I added to 
> clang verify that function pointers share the coroutine attribute.
> 
> The functions themselves are/need to be coroutine_fn (as they will call
> coroutine_fn too)

It's not controversial, I would not have expected the functions to call
coroutine_fn. :)  How do they do that?

Paolo



Re: [Qemu-block] [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()

2017-07-05 Thread Fam Zheng
On Wed, 07/05 09:01, Eric Blake wrote:
> On 07/05/2017 07:07 AM, Fam Zheng wrote:
> >>>
> >>> Sorry for bikeshedding.
> >>
> >> Not a problem, I also had some double-takes in writing my own code
> >> trying to remember which way I wanted the 'allocation' boolean to be
> >> set, so coming up with a more intuitive name/default state in order to
> >> help legibility is worth it.  Do any of my above suggestions sound better?
> >>
> > 
> > I'd vote for "mapping" because it has a close connection with offset (as in
> > BDRV_BLOCK_OFFSET_VALID).
> > 
> > Or simply call it "offset" and if false, never return 
> > BDRV_BLOCK_OFFSET_VALID.
> 
> Well, there ARE drivers that WANT to return BDRV_BLOCK_OFFSET_VALID
> regardless of the state of the boolean (namely, any driver that also
> returns BDRV_BLOCK_RAW, to hint that this is a passthrough and the query
> should be repeated at the next BDS in the chain).  So stating that
> 'offset' is false MUST preclude BDRV_BLOCK_OFFSET_VALID is a bit too
> strong, but I can probably come up with appropriate wording that meets
> in the middle ground (if 'offset' is true, then make all efforts to
> include BDRV_BLOCK_OFFSET_VALID in the return if that is possible; if it
> is false, then omitting the flag in order to get a larger pnum is
> acceptable).

Yes you are totally right, I thought that too after replied.

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH 08/35] block: all bdrv_aio callbacks are coroutine_fn

2017-07-05 Thread Marc-André Lureau


- Original Message -
> On 05/07/2017 00:03, Marc-André Lureau wrote:
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  include/block/block_int.h | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/include/block/block_int.h b/include/block/block_int.h
> > index 15fa602150..93eb2a9528 100644
> > --- a/include/block/block_int.h
> > +++ b/include/block/block_int.h
> > @@ -133,15 +133,15 @@ struct BlockDriver {
> >  void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
> >  
> >  /* aio */
> > -BlockAIOCB *(*bdrv_aio_readv)(BlockDriverState *bs,
> > +BlockAIOCB * coroutine_fn (*bdrv_aio_readv)(BlockDriverState *bs,
> >  int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> >  BlockCompletionFunc *cb, void *opaque);
> > -BlockAIOCB *(*bdrv_aio_writev)(BlockDriverState *bs,
> > +BlockAIOCB * coroutine_fn (*bdrv_aio_writev)(BlockDriverState *bs,
> >  int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> >  BlockCompletionFunc *cb, void *opaque);
> > -BlockAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
> > +BlockAIOCB * coroutine_fn (*bdrv_aio_flush)(BlockDriverState *bs,
> >  BlockCompletionFunc *cb, void *opaque);
> > -BlockAIOCB *(*bdrv_aio_pdiscard)(BlockDriverState *bs,
> > +BlockAIOCB * coroutine_fn (*bdrv_aio_pdiscard)(BlockDriverState *bs,
> >  int64_t offset, int bytes,
> >  BlockCompletionFunc *cb, void *opaque);
> >  
> > @@ -247,7 +247,7 @@ struct BlockDriver {
> >  void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
> >  
> >  /* to control generic scsi devices */
> > -BlockAIOCB *(*bdrv_aio_ioctl)(BlockDriverState *bs,
> > +BlockAIOCB * coroutine_fn (*bdrv_aio_ioctl)(BlockDriverState *bs,
> >  unsigned long int req, void *buf,
> >  BlockCompletionFunc *cb, void *opaque);
> >  int coroutine_fn (*bdrv_co_ioctl)(BlockDriverState *bs,
> > 
> 
> 
> They are, but it's an implementation detail.  Why is this patch necessary?

I didn't think this would be controversial :) well, the checks I added to clang 
verify that function pointers share the coroutine attribute.

The functions themselves are/need to be coroutine_fn (as they will call
coroutine_fn too)



Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Eric Blake
On 07/05/2017 08:48 AM, Max Reitz wrote:
>>>  /**
>>> + * qnum_is_equal(): Test whether the two QNums are equal
>>> + */
>>> +bool qnum_is_equal(const QObject *x, const QObject *y)
>>> +{
>>> +QNum *num_x = qobject_to_qnum(x);
>>> +QNum *num_y = qobject_to_qnum(y);
>>> +
>>> +switch (num_x->kind) {
>>> +case QNUM_I64:
>>> +switch (num_y->kind) {
>>> +case QNUM_I64:
>>> +/* Comparison in native int64_t type */
>>> +return num_x->u.i64 == num_y->u.i64;
>>> +case QNUM_U64:
>>> +/* Implicit conversion of x to uint64_t, so we have to
>>> + * check its sign before */
>>> +return num_x->u.i64 >= 0 && num_x->u.i64 == num_y->u.u64;
>>> +case QNUM_DOUBLE:
>>> +/* Implicit conversion of x to double; no overflow
>>> + * possible */
>>> +return num_x->u.i64 == num_y->u.dbl;
>>
>> Overflow is impossible, but loss of precision is possible:
>>
>> (double)9007199254740993ull == 9007199254740992.0
>>
>> yields true.  Is this what we want?
> 
> I'd argue that yes, because the floating point value represents
> basically all of the values which are "equal" to it.

But the problem is that we CAN represent the fully-precise number as an
integer, so having multiple distinct integers that compare differently
against each other, but equal to the same double, is awkward.

> 
> But I don't have a strong opinion. I guess the alternative would be to
> convert the double to an integer instead and check for overflows before?

That's the solution Markus gave, and I'm in favor of the tighter check:

> 
>> I guess the obvious fix is
>>
>> return (double)x == x && x == y;
> 
> Yes, that would do, too; and spares me of having to think about how well
> comparing an arbitrary double to UINT64_MAX actually works. :-)

It basically says that we are unwilling to declare an integer equivalent
to the double if the double loses precision when trying to store the
integer.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH 31/35] parallels: mark coroutine_fn

2017-07-05 Thread Denis V. Lunev
On 07/05/2017 01:03 AM, Marc-André Lureau wrote:
> Signed-off-by: Marc-André Lureau 
> ---
>  block/parallels.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/block/parallels.c b/block/parallels.c
> index 8be46a7d48..213e42b9d2 100644
> --- a/block/parallels.c
> +++ b/block/parallels.c
> @@ -472,7 +472,8 @@ static int parallels_check(BlockDriverState *bs, 
> BdrvCheckResult *res,
>  }
>  
>  
> -static int parallels_create(const char *filename, QemuOpts *opts, Error 
> **errp)
> +static int coroutine_fn
> +parallels_create(const char *filename, QemuOpts *opts, Error **errp)
>  {
>  int64_t total_size, cl_size;
>  uint8_t tmp[BDRV_SECTOR_SIZE];
Reviewed-by: Denis V. Lunev 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()

2017-07-05 Thread Eric Blake
On 07/05/2017 07:07 AM, Fam Zheng wrote:
>>>
>>> Sorry for bikeshedding.
>>
>> Not a problem, I also had some double-takes in writing my own code
>> trying to remember which way I wanted the 'allocation' boolean to be
>> set, so coming up with a more intuitive name/default state in order to
>> help legibility is worth it.  Do any of my above suggestions sound better?
>>
> 
> I'd vote for "mapping" because it has a close connection with offset (as in
> BDRV_BLOCK_OFFSET_VALID).
> 
> Or simply call it "offset" and if false, never return BDRV_BLOCK_OFFSET_VALID.

Well, there ARE drivers that WANT to return BDRV_BLOCK_OFFSET_VALID
regardless of the state of the boolean (namely, any driver that also
returns BDRV_BLOCK_RAW, to hint that this is a passthrough and the query
should be repeated at the next BDS in the chain).  So stating that
'offset' is false MUST preclude BDRV_BLOCK_OFFSET_VALID is a bit too
strong, but I can probably come up with appropriate wording that meets
in the middle ground (if 'offset' is true, then make all efforts to
include BDRV_BLOCK_OFFSET_VALID in the return if that is possible; if it
is false, then omitting the flag in order to get a larger pnum is
acceptable).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Max Reitz
On 2017-07-05 09:07, Markus Armbruster wrote:
> Max Reitz  writes:
> 
>> This generic function (along with its implementations for different
>> types) determines whether two QObjects are equal.
>>
>> Signed-off-by: Max Reitz 
> [...]
>> diff --git a/qobject/qnum.c b/qobject/qnum.c
>> index 476e81c..784d061 100644
>> --- a/qobject/qnum.c
>> +++ b/qobject/qnum.c
>> @@ -213,6 +213,59 @@ QNum *qobject_to_qnum(const QObject *obj)
>>  }
>>  
>>  /**
>> + * qnum_is_equal(): Test whether the two QNums are equal
>> + */
>> +bool qnum_is_equal(const QObject *x, const QObject *y)
>> +{
>> +QNum *num_x = qobject_to_qnum(x);
>> +QNum *num_y = qobject_to_qnum(y);
>> +
>> +switch (num_x->kind) {
>> +case QNUM_I64:
>> +switch (num_y->kind) {
>> +case QNUM_I64:
>> +/* Comparison in native int64_t type */
>> +return num_x->u.i64 == num_y->u.i64;
>> +case QNUM_U64:
>> +/* Implicit conversion of x to uin64_t, so we have to
>> + * check its sign before */
>> +return num_x->u.i64 >= 0 && num_x->u.i64 == num_y->u.u64;
>> +case QNUM_DOUBLE:
>> +/* Implicit conversion of x to double; no overflow
>> + * possible */
>> +return num_x->u.i64 == num_y->u.dbl;
> 
> Overflow is impossible, but loss of precision is possible:
> 
> (double)9007199254740993ull == 9007199254740992.0
> 
> yields true.  Is this what we want?

I'd argue that yes, because the floating point value represents
basically all of the values which are "equal" to it.

But I don't have a strong opinion. I guess the alternative would be to
convert the double to an integer instead and check for overflow beforehand?

>> +}
>> +abort();
>> +case QNUM_U64:
>> +switch (num_y->kind) {
>> +case QNUM_I64:
>> +return qnum_is_equal(y, x);
>> +case QNUM_U64:
>> +/* Comparison in native uint64_t type */
>> +return num_x->u.u64 == num_y->u.u64;
>> +case QNUM_DOUBLE:
>> +/* Implicit conversion of x to double; no overflow
>> + * possible */
>> +return num_x->u.u64 == num_y->u.dbl;
> 
> Similar loss of precision.
> 
>> +}
>> +abort();
>> +case QNUM_DOUBLE:
>> +switch (num_y->kind) {
>> +case QNUM_I64:
>> +return qnum_is_equal(y, x);
>> +case QNUM_U64:
>> +return qnum_is_equal(y, x);
>> +case QNUM_DOUBLE:
>> +/* Comparison in native double type */
>> +return num_x->u.dbl == num_y->u.dbl;
>> +}
>> +abort();
>> +}
>> +
>> +abort();
>> +}
> 
> I think there's more than one sane interpretations of "is equal",
> including:
> 
> * The mathematical numbers represented by @x and @y are equal.
> 
> * @x and @y have the same contents, i.e. same kind and u.
> 
> * @x and @y are the same object (listed for completeness; we don't need
>   a function to compare pointers).
> 
> Your patch implements yet another one.  Which one do we want, and why?

Mine is the first one, except that I think that a floating point value
does not represent a single number but just some number in a range.

> The second is easier to implement than the first.

It seems much less useful, though.

> If we really want the first, you need to fix the loss of precision bugs.

I'm not sure, but I don't mind either, so...

> I guess the obvious fix is
> 
> return (double)x == x && x == y;

Yes, that would do, too; and spares me of having to think about how well
comparing an arbitrary double to UINT64_MAX actually works. :-)

> Note that this is what you do for mixed signedness: first check @x is
> exactly representable in @y's type, then compare in @y's type.
> 
> Regardless of which one we pick, the function comment needs to explain.

OK, will do.

Max

>> +
>> +/**
>>   * qnum_destroy_obj(): Free all memory allocated by a
>>   * QNum object
>>   */
> [...]
> 
> Remainder of the patch looks good to me.
> 






Re: [Qemu-block] [PATCH v3 6/6] block: Move NVMe spec definitions to a separate header

2017-07-05 Thread Paolo Bonzini


On 05/07/2017 15:36, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> ---
>  block/nvme.c|   7 +-
>  block/nvme.h| 700 
> 

This should be include/block/nvme.h.  Can be fixed by maintainer, I suppose.

Paolo

>  hw/block/nvme.h | 698 +--
>  3 files changed, 702 insertions(+), 703 deletions(-)
>  create mode 100644 block/nvme.h
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 7913017..2680b29 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -22,12 +22,7 @@
>  #include "block/nvme-vfio.h"
>  #include "trace.h"
>  
> -/* TODO: Move nvme spec definitions from hw/block/nvme.h into a separate file
> - * that doesn't depend on dma/pci headers. */
> -#include "sysemu/dma.h"
> -#include "hw/pci/pci.h"
> -#include "hw/block/block.h"
> -#include "hw/block/nvme.h"
> +#include "block/nvme.h"
>  
>  #define NVME_SQ_ENTRY_BYTES 64
>  #define NVME_CQ_ENTRY_BYTES 16
> diff --git a/block/nvme.h b/block/nvme.h
> new file mode 100644
> index 000..ed18091
> --- /dev/null
> +++ b/block/nvme.h
> @@ -0,0 +1,700 @@
> +#ifndef BLOCK_NVME_H
> +#define BLOCK_NVME_H
> +
> +typedef struct NvmeBar {
> +uint64_tcap;
> +uint32_tvs;
> +uint32_tintms;
> +uint32_tintmc;
> +uint32_tcc;
> +uint32_trsvd1;
> +uint32_tcsts;
> +uint32_tnssrc;
> +uint32_taqa;
> +uint64_tasq;
> +uint64_tacq;
> +uint32_tcmbloc;
> +uint32_tcmbsz;
> +} NvmeBar;
> +
> +enum NvmeCapShift {
> +CAP_MQES_SHIFT = 0,
> +CAP_CQR_SHIFT  = 16,
> +CAP_AMS_SHIFT  = 17,
> +CAP_TO_SHIFT   = 24,
> +CAP_DSTRD_SHIFT= 32,
> +CAP_NSSRS_SHIFT= 33,
> +CAP_CSS_SHIFT  = 37,
> +CAP_MPSMIN_SHIFT   = 48,
> +CAP_MPSMAX_SHIFT   = 52,
> +};
> +
> +enum NvmeCapMask {
> +CAP_MQES_MASK  = 0xffff,
> +CAP_CQR_MASK   = 0x1,
> +CAP_AMS_MASK   = 0x3,
> +CAP_TO_MASK= 0xff,
> +CAP_DSTRD_MASK = 0xf,
> +CAP_NSSRS_MASK = 0x1,
> +CAP_CSS_MASK   = 0xff,
> +CAP_MPSMIN_MASK= 0xf,
> +CAP_MPSMAX_MASK= 0xf,
> +};
> +
> +#define NVME_CAP_MQES(cap)  (((cap) >> CAP_MQES_SHIFT)   & CAP_MQES_MASK)
> +#define NVME_CAP_CQR(cap)   (((cap) >> CAP_CQR_SHIFT)& CAP_CQR_MASK)
> +#define NVME_CAP_AMS(cap)   (((cap) >> CAP_AMS_SHIFT)& CAP_AMS_MASK)
> +#define NVME_CAP_TO(cap)(((cap) >> CAP_TO_SHIFT) & CAP_TO_MASK)
> +#define NVME_CAP_DSTRD(cap) (((cap) >> CAP_DSTRD_SHIFT)  & CAP_DSTRD_MASK)
> +#define NVME_CAP_NSSRS(cap) (((cap) >> CAP_NSSRS_SHIFT)  & CAP_NSSRS_MASK)
> +#define NVME_CAP_CSS(cap)   (((cap) >> CAP_CSS_SHIFT)& CAP_CSS_MASK)
> +#define NVME_CAP_MPSMIN(cap)(((cap) >> CAP_MPSMIN_SHIFT) & CAP_MPSMIN_MASK)
> +#define NVME_CAP_MPSMAX(cap)(((cap) >> CAP_MPSMAX_SHIFT) & CAP_MPSMAX_MASK)
> +
> +#define NVME_CAP_SET_MQES(cap, val)   (cap |= (uint64_t)(val & 
> CAP_MQES_MASK)  \
> +   << CAP_MQES_SHIFT)
> +#define NVME_CAP_SET_CQR(cap, val)(cap |= (uint64_t)(val & CAP_CQR_MASK) 
>   \
> +   << CAP_CQR_SHIFT)
> +#define NVME_CAP_SET_AMS(cap, val)(cap |= (uint64_t)(val & CAP_AMS_MASK) 
>   \
> +   << CAP_AMS_SHIFT)
> +#define NVME_CAP_SET_TO(cap, val) (cap |= (uint64_t)(val & CAP_TO_MASK)  
>   \
> +   << CAP_TO_SHIFT)
> +#define NVME_CAP_SET_DSTRD(cap, val)  (cap |= (uint64_t)(val & 
> CAP_DSTRD_MASK) \
> +   << 
> CAP_DSTRD_SHIFT)
> +#define NVME_CAP_SET_NSSRS(cap, val)  (cap |= (uint64_t)(val & 
> CAP_NSSRS_MASK) \
> +   << 
> CAP_NSSRS_SHIFT)
> +#define NVME_CAP_SET_CSS(cap, val)(cap |= (uint64_t)(val & CAP_CSS_MASK) 
>   \
> +   << CAP_CSS_SHIFT)
> +#define NVME_CAP_SET_MPSMIN(cap, val) (cap |= (uint64_t)(val & 
> CAP_MPSMIN_MASK)\
> +   << 
> CAP_MPSMIN_SHIFT)
> +#define NVME_CAP_SET_MPSMAX(cap, val) (cap |= (uint64_t)(val & 
> CAP_MPSMAX_MASK)\
> +<< 
> CAP_MPSMAX_SHIFT)
> +
> +enum NvmeCcShift {
> +CC_EN_SHIFT = 0,
> +CC_CSS_SHIFT= 4,
> +CC_MPS_SHIFT= 7,
> +CC_AMS_SHIFT= 11,
> +CC_SHN_SHIFT= 14,
> +CC_IOSQES_SHIFT = 16,
> +CC_IOCQES_SHIFT = 20,
> +};
> +
> +enum NvmeCcMask {
> +CC_EN_MASK  = 0x1,
> +CC_CSS_MASK = 0x7,
> +CC_MPS_MASK = 0xf,
> +CC_AMS_MASK = 0x7,
> +CC_SHN_MASK = 0x3,
> +CC_IOSQES_MASK  = 0xf,
> +CC_IOCQES_MASK  = 0xf,
> +};
> +
> +#define NVME_CC_EN(cc) ((cc >> CC_EN_SHIFT) & CC_EN_MASK)
> +#define NVME_C

Re: [Qemu-block] [PATCH v3 04/20] stream: Switch stream_run() to byte-based

2017-07-05 Thread Kevin Wolf
Am 05.07.2017 um 14:13 hat Eric Blake geschrieben:
> On 07/04/2017 10:00 AM, Kevin Wolf wrote:
> > Am 27.06.2017 um 21:24 hat Eric Blake geschrieben:
> >> We are gradually converting to byte-based interfaces, as they are
> >> easier to reason about than sector-based.  Change the internal
> >> loop iteration of streaming to track by bytes instead of sectors
> >> (although we are still guaranteed that we iterate by steps that
> >> are sector-aligned).
> >>
> >> Signed-off-by: Eric Blake 
> >> Reviewed-by: John Snow 
> >>
> >> ---
> >> v2: no change
> >> ---
> >>  block/stream.c | 24 ++--
> >>  1 file changed, 10 insertions(+), 14 deletions(-)
> >>
> >> diff --git a/block/stream.c b/block/stream.c
> >> index 746d525..2f9618b 100644
> >> --- a/block/stream.c
> >> +++ b/block/stream.c
> >> @@ -108,12 +108,11 @@ static void coroutine_fn stream_run(void *opaque)
> >>  BlockBackend *blk = s->common.blk;
> >>  BlockDriverState *bs = blk_bs(blk);
> >>  BlockDriverState *base = s->base;
> >> -int64_t sector_num = 0;
> >> -int64_t end = -1;
> > 
> Here, end was initialised to -1. This made a difference for early 'goto
> > out' paths because otherwise data->reached_end would incorrectly be true
> > in stream_complete.
> 
> Oh, good call.
> 
> > 
> > Because we also check data->ret, I think the only case where it actually
> > makes a difference is for the !bs->backing case: This used to result in
> > data->reached_end == false, but now it ends up as true. This is because
> > s->common.len hasn't been set yet, so it is still 0.
> 
> When !bs->backing, ret is 0; so that means my commit message is wrong
> about there being no semantic change.  But remember, when !bs->backing,
> stream is a no-op (there was no backing file to stream into the current
> layer, anyways) - so which do we want, declaring that the operation
> never reached the end (odd, since we did nothing), or that it is
> complete?  In other words, is this something where I should fix the
> semantics (perhaps as a separate bug-fix patch), or where I need to fix
> this patch to preserve existing semantics?
> 
> The next code reached is stream_complete().  Before my patch, it skipped
> the main body of the function (because !data->reached_end); with my
> patch, we are now calling bdrv_change_backing_file() (presumably a no-op
> when there is no backing?)

Effectively probably yes, but it involves another write+flush for
rewriting the qcow2 header (even though nothing is expected to change in
it).

But if we wanted to avoid this, I think we could just directly check
bs->backing in stream_complete(). So I wonder why data->reached_end even
exists in the first place.
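
That is, roughly (fragment only, untested):

    /* in stream_complete(): rewrite the backing link only if there was
     * a backing chain to stream from in the first place */
    if (data->ret == 0 && bs->backing) {
        /* ... bdrv_change_backing_file() etc., as done today when
         * data->reached_end is set ... */
    }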

Kevin




Re: [Qemu-block] [PATCH v3 0/6] block: Add VFIO based driver for NVMe device

2017-07-05 Thread Paolo Bonzini


On 05/07/2017 15:36, Fam Zheng wrote:
> v3: Rebase, small tweaks/fixes and add locks to provide basic thread safety
> (basic because it is not really tested).

Sounds good, it can be converted to CoMutex later.

Paolo



[Qemu-block] [PATCH v3 5/6] qemu-img: Map bench buffer

2017-07-05 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 qemu-img.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 91ad6be..fea156c 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3875,6 +3875,7 @@ static int img_bench(int argc, char **argv)
 struct timeval t1, t2;
 int i;
 bool force_share = false;
+size_t buf_size;
 
 for (;;) {
 static const struct option long_options[] = {
@@ -4063,9 +4064,12 @@ static int img_bench(int argc, char **argv)
 printf("Sending flush every %d requests\n", flush_interval);
 }
 
-data.buf = blk_blockalign(blk, data.nrreq * data.bufsize);
+buf_size = data.nrreq * data.bufsize;
+data.buf = blk_blockalign(blk, buf_size);
 memset(data.buf, pattern, data.nrreq * data.bufsize);
 
+blk_dma_map(blk, data.buf, buf_size);
+
 data.qiov = g_new(QEMUIOVector, data.nrreq);
 for (i = 0; i < data.nrreq; i++) {
 qemu_iovec_init(&data.qiov[i], 1);
@@ -4086,6 +4090,9 @@ static int img_bench(int argc, char **argv)
+ ((double)(t2.tv_usec - t1.tv_usec) / 100));
 
 out:
+if (data.buf) {
+blk_dma_unmap(blk, data.buf);
+}
 qemu_vfree(data.buf);
 blk_unref(blk);
 
-- 
2.9.4




[Qemu-block] [PATCH v3 6/6] block: Move NVMe spec definitions to a separate header

2017-07-05 Thread Fam Zheng
Signed-off-by: Fam Zheng 
---
 block/nvme.c|   7 +-
 block/nvme.h| 700 
 hw/block/nvme.h | 698 +--
 3 files changed, 702 insertions(+), 703 deletions(-)
 create mode 100644 block/nvme.h

diff --git a/block/nvme.c b/block/nvme.c
index 7913017..2680b29 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -22,12 +22,7 @@
 #include "block/nvme-vfio.h"
 #include "trace.h"
 
-/* TODO: Move nvme spec definitions from hw/block/nvme.h into a separate file
- * that doesn't depend on dma/pci headers. */
-#include "sysemu/dma.h"
-#include "hw/pci/pci.h"
-#include "hw/block/block.h"
-#include "hw/block/nvme.h"
+#include "block/nvme.h"
 
 #define NVME_SQ_ENTRY_BYTES 64
 #define NVME_CQ_ENTRY_BYTES 16
diff --git a/block/nvme.h b/block/nvme.h
new file mode 100644
index 000..ed18091
--- /dev/null
+++ b/block/nvme.h
@@ -0,0 +1,700 @@
+#ifndef BLOCK_NVME_H
+#define BLOCK_NVME_H
+
+typedef struct NvmeBar {
+uint64_tcap;
+uint32_tvs;
+uint32_tintms;
+uint32_tintmc;
+uint32_tcc;
+uint32_trsvd1;
+uint32_tcsts;
+uint32_tnssrc;
+uint32_taqa;
+uint64_tasq;
+uint64_tacq;
+uint32_tcmbloc;
+uint32_tcmbsz;
+} NvmeBar;
+
+enum NvmeCapShift {
+CAP_MQES_SHIFT = 0,
+CAP_CQR_SHIFT  = 16,
+CAP_AMS_SHIFT  = 17,
+CAP_TO_SHIFT   = 24,
+CAP_DSTRD_SHIFT= 32,
+CAP_NSSRS_SHIFT= 33,
+CAP_CSS_SHIFT  = 37,
+CAP_MPSMIN_SHIFT   = 48,
+CAP_MPSMAX_SHIFT   = 52,
+};
+
+enum NvmeCapMask {
+CAP_MQES_MASK  = 0xffff,
+CAP_CQR_MASK   = 0x1,
+CAP_AMS_MASK   = 0x3,
+CAP_TO_MASK= 0xff,
+CAP_DSTRD_MASK = 0xf,
+CAP_NSSRS_MASK = 0x1,
+CAP_CSS_MASK   = 0xff,
+CAP_MPSMIN_MASK= 0xf,
+CAP_MPSMAX_MASK= 0xf,
+};
+
+#define NVME_CAP_MQES(cap)  (((cap) >> CAP_MQES_SHIFT)   & CAP_MQES_MASK)
+#define NVME_CAP_CQR(cap)   (((cap) >> CAP_CQR_SHIFT)& CAP_CQR_MASK)
+#define NVME_CAP_AMS(cap)   (((cap) >> CAP_AMS_SHIFT)& CAP_AMS_MASK)
+#define NVME_CAP_TO(cap)(((cap) >> CAP_TO_SHIFT) & CAP_TO_MASK)
+#define NVME_CAP_DSTRD(cap) (((cap) >> CAP_DSTRD_SHIFT)  & CAP_DSTRD_MASK)
+#define NVME_CAP_NSSRS(cap) (((cap) >> CAP_NSSRS_SHIFT)  & CAP_NSSRS_MASK)
+#define NVME_CAP_CSS(cap)   (((cap) >> CAP_CSS_SHIFT)& CAP_CSS_MASK)
+#define NVME_CAP_MPSMIN(cap)(((cap) >> CAP_MPSMIN_SHIFT) & CAP_MPSMIN_MASK)
+#define NVME_CAP_MPSMAX(cap)(((cap) >> CAP_MPSMAX_SHIFT) & CAP_MPSMAX_MASK)
+
+#define NVME_CAP_SET_MQES(cap, val)   (cap |= (uint64_t)(val & CAP_MQES_MASK)  
\
+   << CAP_MQES_SHIFT)
+#define NVME_CAP_SET_CQR(cap, val)(cap |= (uint64_t)(val & CAP_CQR_MASK)   
\
+   << CAP_CQR_SHIFT)
+#define NVME_CAP_SET_AMS(cap, val)(cap |= (uint64_t)(val & CAP_AMS_MASK)   
\
+   << CAP_AMS_SHIFT)
+#define NVME_CAP_SET_TO(cap, val) (cap |= (uint64_t)(val & CAP_TO_MASK)
\
+   << CAP_TO_SHIFT)
+#define NVME_CAP_SET_DSTRD(cap, val)  (cap |= (uint64_t)(val & CAP_DSTRD_MASK) 
\
+   << CAP_DSTRD_SHIFT)
+#define NVME_CAP_SET_NSSRS(cap, val)  (cap |= (uint64_t)(val & CAP_NSSRS_MASK) 
\
+   << CAP_NSSRS_SHIFT)
+#define NVME_CAP_SET_CSS(cap, val)(cap |= (uint64_t)(val & CAP_CSS_MASK)   
\
+   << CAP_CSS_SHIFT)
+#define NVME_CAP_SET_MPSMIN(cap, val) (cap |= (uint64_t)(val & 
CAP_MPSMIN_MASK)\
+   << CAP_MPSMIN_SHIFT)
+#define NVME_CAP_SET_MPSMAX(cap, val) (cap |= (uint64_t)(val & 
CAP_MPSMAX_MASK)\
+<< 
CAP_MPSMAX_SHIFT)
+
+enum NvmeCcShift {
+CC_EN_SHIFT = 0,
+CC_CSS_SHIFT= 4,
+CC_MPS_SHIFT= 7,
+CC_AMS_SHIFT= 11,
+CC_SHN_SHIFT= 14,
+CC_IOSQES_SHIFT = 16,
+CC_IOCQES_SHIFT = 20,
+};
+
+enum NvmeCcMask {
+CC_EN_MASK  = 0x1,
+CC_CSS_MASK = 0x7,
+CC_MPS_MASK = 0xf,
+CC_AMS_MASK = 0x7,
+CC_SHN_MASK = 0x3,
+CC_IOSQES_MASK  = 0xf,
+CC_IOCQES_MASK  = 0xf,
+};
+
+#define NVME_CC_EN(cc) ((cc >> CC_EN_SHIFT) & CC_EN_MASK)
+#define NVME_CC_CSS(cc)((cc >> CC_CSS_SHIFT)& CC_CSS_MASK)
+#define NVME_CC_MPS(cc)((cc >> CC_MPS_SHIFT)& CC_MPS_MASK)
+#define NVME_CC_AMS(cc)((cc >> CC_AMS_SHIFT)& CC_AMS_MASK)
+#define NVME_CC_SHN(cc)((cc >> CC_SHN_SHIFT)& CC_SHN_MASK)
+#define NVME_CC_IOSQES(cc) ((cc >> CC_IOSQES_SHIFT) & CC_IOSQES_MASK)
+#define NVME_CC_IOCQES(cc) ((cc >> CC_IOCQES_SHIFT) & CC_IOCQES_MASK)
+
+enum Nvm

Re: [Qemu-block] [PATCH v3 1/4] block: Pass bdrv_* methods to bs->file by default

2017-07-05 Thread Kevin Wolf
Am 05.07.2017 um 14:43 hat Eric Blake geschrieben:
> On 07/04/2017 04:44 AM, Kevin Wolf wrote:
> > Am 04.07.2017 um 10:40 hat Manos Pitsidianakis geschrieben:
> >> The following functions fail if bs->drv does not implement them:
> >>
> 
> >> @@ -511,6 +513,8 @@ int bdrv_probe_geometry(BlockDriverState *bs, 
> >> HDGeometry *geo)
> >>  
> >>  if (drv && drv->bdrv_probe_geometry) {
> >>  return drv->bdrv_probe_geometry(bs, geo);
> >> +} else if (drv && bs->file && bs->file->bs) {
> >> +return bdrv_probe_geometry(bs->file->bs, geo);
> >>  }
> >>  
> >>  return -ENOTSUP;
> > 
> > The probed geometry would refer to the physical image file, not to the
> > actually presented image. So propagating this to bs->file is wrong for
> > image formats.
> > 
> > Possibly checking .is_filter in addition would be enough.
> 
> In fact, this made me think: if we define .is_filter as an enum instead
> of a bool, we could implement 0 for no filter, 1 for file filter, and 2
> for backing filter.  Then instead of having to have 2 separate default
> bdrv_get_block_status_for_*() generic functions in patch 4, we could
> instead patch bdrv_co_get_block_status() to check .is_filter, and
> automatically fall back to bs->file or bs->backing according to the enum
> value stored in .is_filter.

After reviewing the rest of the series, I'm not completely convinced of
using .is_filter any more. If you look at the patch for the raw format
driver, it actually turns out that even though we would intuitively call
it a filter, it has different properties than for example a throttle
filter and passing through function calls to bs->file makes sense for a
different set of functions.

Kevin




[Qemu-block] [PATCH v3 4/6] block/nvme: Implement .bdrv_dma_map and .bdrv_dma_unmap

2017-07-05 Thread Fam Zheng
Forward these two calls to the IOVA manager.

Signed-off-by: Fam Zheng 
---
 block/nvme.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/nvme.c b/block/nvme.c
index eb999a1..7913017 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1056,6 +1056,20 @@ static void nvme_aio_unplug(BlockDriverState *bs)
 }
 }
 
+static void nvme_dma_map(BlockDriverState *bs, void *host, size_t size)
+{
+BDRVNVMeState *s = bs->opaque;
+
+nvme_vfio_dma_map(s->vfio, host, size, false, NULL);
+}
+
+static void nvme_dma_unmap(BlockDriverState *bs, void *host)
+{
+BDRVNVMeState *s = bs->opaque;
+
+nvme_vfio_dma_unmap(s->vfio, host);
+}
+
 static BlockDriver bdrv_nvme = {
 .format_name  = "nvme",
 .protocol_name= "nvme",
@@ -1081,6 +1095,9 @@ static BlockDriver bdrv_nvme = {
 
 .bdrv_io_plug = nvme_aio_plug,
 .bdrv_io_unplug   = nvme_aio_unplug,
+
+.bdrv_dma_map = nvme_dma_map,
+.bdrv_dma_unmap   = nvme_dma_unmap,
 };
 
 static void bdrv_nvme_init(void)
-- 
2.9.4




[Qemu-block] [PATCH v3 2/6] block: Add VFIO based NVMe driver

2017-07-05 Thread Fam Zheng
This is a new protocol driver that exclusively opens a host NVMe
controller through VFIO. It achieves better latency than linux-aio by
completely bypassing host kernel vfs/block layer.

$rw-$bs-$iodepth    linux-aio   nvme://

randread-4k-1            8269      8851
randread-512k-1           584       610
randwrite-4k-1          28601     34649
randwrite-512k-1         1809      1975

The driver also integrates with the polling mechanism of iothread.

This patch is co-authored by Paolo and me.

Signed-off-by: Fam Zheng 
---
 MAINTAINERS |6 +
 block/Makefile.objs |1 +
 block/nvme-vfio.c   |  703 +
 block/nvme-vfio.h   |   30 ++
 block/nvme.c| 1091 +++
 block/trace-events  |   32 ++
 6 files changed, 1863 insertions(+)
 create mode 100644 block/nvme-vfio.c
 create mode 100644 block/nvme-vfio.h
 create mode 100644 block/nvme.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 839f7ca..4cce80c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1746,6 +1746,12 @@ L: qemu-block@nongnu.org
 S: Supported
 F: block/null.c
 
+NVMe Block Driver
+M: Fam Zheng 
+L: qemu-block@nongnu.org
+S: Supported
+F: block/nvme*
+
 Bootdevice
 M: Gonglei 
 S: Maintained
diff --git a/block/Makefile.objs b/block/Makefile.objs
index f9368b5..8866487 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -11,6 +11,7 @@ block-obj-$(CONFIG_POSIX) += file-posix.o
 block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 block-obj-y += null.o mirror.o commit.o io.o
 block-obj-y += throttle-groups.o
+block-obj-$(CONFIG_LINUX) += nvme.o nvme-vfio.o
 
 block-obj-y += nbd.o nbd-client.o sheepdog.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
diff --git a/block/nvme-vfio.c b/block/nvme-vfio.c
new file mode 100644
index 000..f030a82
--- /dev/null
+++ b/block/nvme-vfio.c
@@ -0,0 +1,703 @@
+/*
+ * NVMe VFIO interface
+ *
+ * Copyright 2016, 2017 Red Hat, Inc.
+ *
+ * Authors:
+ *   Fam Zheng 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include 
+#include 
+#include "qapi/error.h"
+#include "exec/ramlist.h"
+#include "exec/cpu-common.h"
+#include "trace.h"
+#include "qemu/queue.h"
+#include "qemu/error-report.h"
+#include "standard-headers/linux/pci_regs.h"
+#include "qemu/event_notifier.h"
+#include "block/nvme-vfio.h"
+#include "trace.h"
+
+#define NVME_DEBUG 0
+
+#define NVME_VFIO_IOVA_MIN 0x1ULL
+/* XXX: Once VFIO exposes the iova bit width in the IOMMU capability interface,
+ * we can use a runtime limit; alternatively it's also possible to do platform
+ * specific detection by reading sysfs entries. Until then, 39 is a safe bet.
+ **/
+#define NVME_VFIO_IOVA_MAX (1ULL << 39)
+
+typedef struct {
+/* Page aligned addr. */
+void *host;
+size_t size;
+uint64_t iova;
+} IOVAMapping;
+
+struct NVMeVFIOState {
+int container;
+int group;
+int device;
+RAMBlockNotifier ram_notifier;
+struct vfio_region_info config_region_info, bar_region_info[6];
+
+/* VFIO's IO virtual address space is managed by splitting into a few
+ * sections:
+ *
+ * ---   <= 0
+ * |x|
+ * |-|   <= NVME_VFIO_IOVA_MIN
+ * | |
+ * |Fixed|
+ * | |
+ * |-|   <= low_water_mark
+ * | |
+ * |Free |
+ * | |
+ * |-|   <= high_water_mark
+ * | |
+ * |Temp |
+ * | |
+ * |-|   <= NVME_VFIO_IOVA_MAX
+ * |x|
+ * |x|
+ * ---
+ *
+ * - Addresses lower than NVME_VFIO_IOVA_MIN are reserved as invalid;
+ *
+ * - Fixed mappings of HVAs are assigned "low" IOVAs in the range of
+ *   [NVME_VFIO_IOVA_MIN, low_water_mark).  Once allocated they will not be
+ *   reclaimed - low_water_mark never shrinks;
+ *
+ * - IOVAs in range [low_water_mark, high_water_mark) are free;
+ *
+ * - IOVAs in range [high_water_mark, NVME_VFIO_IOVA_MAX) are volatile
+ *   mappings. At each nvme_vfio_dma_reset_temporary() call, the whole area
+ *   is recycled. The caller should make sure I/O's depending on these
+ *   mappings are completed before calling.
+ **/
+uint64_t low_water_mark;
+uint64_t high_water_mark;
+IOVAMapping *mappings;
+int nr_mappings;
+QemuMutex lock;
+};
+
+/** Find group file and return the full path in @path by PCI device address
+ * @device. If succeeded, caller needs to g_free the returned path. */
+static int sysfs_find_group_file(const char *device, char **path, Error **errp)
+{
+int ret;
+char *sysfs_link = NULL;
+char *sysfs_group = NULL;
+char *p;
+
+sysfs_link = g_strdup_printf("/sys/bus/pci/

[Qemu-block] [PATCH v3 3/6] block: Introduce bdrv_dma_map and bdrv_dma_unmap

2017-07-05 Thread Fam Zheng
Allow block driver to map and unmap a buffer for later I/O, as a performance
hint.

Signed-off-by: Fam Zheng 
---
 block/block-backend.c  | 10 ++
 block/io.c | 24 
 include/block/block.h  |  2 ++
 include/block/block_int.h  |  4 
 include/sysemu/block-backend.h |  3 +++
 5 files changed, 43 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 0df3457..784b936 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1974,3 +1974,13 @@ static void blk_root_drained_end(BdrvChild *child)
 }
 }
 }
+
+void blk_dma_map(BlockBackend *blk, void *host, size_t size)
+{
+bdrv_dma_map(blk_bs(blk), host, size);
+}
+
+void blk_dma_unmap(BlockBackend *blk, void *host)
+{
+bdrv_dma_unmap(blk_bs(blk), host);
+}
diff --git a/block/io.c b/block/io.c
index 2de7c77..988e4db 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2537,3 +2537,27 @@ void bdrv_io_unplug(BlockDriverState *bs)
 bdrv_io_unplug(child->bs);
 }
 }
+
+void bdrv_dma_map(BlockDriverState *bs, void *host, size_t size)
+{
+BdrvChild *child;
+
+if (bs->drv && bs->drv->bdrv_dma_map) {
+bs->drv->bdrv_dma_map(bs, host, size);
+}
+QLIST_FOREACH(child, &bs->children, next) {
+bdrv_dma_map(child->bs, host, size);
+}
+}
+
+void bdrv_dma_unmap(BlockDriverState *bs, void *host)
+{
+BdrvChild *child;
+
+if (bs->drv && bs->drv->bdrv_dma_unmap) {
+bs->drv->bdrv_dma_unmap(bs, host);
+}
+QLIST_FOREACH(child, &bs->children, next) {
+bdrv_dma_unmap(child->bs, host);
+}
+}
diff --git a/include/block/block.h b/include/block/block.h
index 4c149ad..f59b50a 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -624,4 +624,6 @@ void bdrv_add_child(BlockDriverState *parent, 
BlockDriverState *child,
 Error **errp);
 void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
 
+void bdrv_dma_map(BlockDriverState *bs, void *host, size_t size);
+void bdrv_dma_unmap(BlockDriverState *bs, void *host);
 #endif
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 15fa602..4092669 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -381,6 +381,10 @@ struct BlockDriver {
  uint64_t parent_perm, uint64_t parent_shared,
  uint64_t *nperm, uint64_t *nshared);
 
+/* Map and unmap a buffer for I/O, as a performance hint to the
+ * driver. */
+void (*bdrv_dma_map)(BlockDriverState *bs, void *host, size_t size);
+void (*bdrv_dma_unmap)(BlockDriverState *bs, void *host);
 QLIST_ENTRY(BlockDriver) list;
 };
 
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 1e05281..5f7ccdb 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -239,4 +239,7 @@ void blk_io_limits_disable(BlockBackend *blk);
 void blk_io_limits_enable(BlockBackend *blk, const char *group);
 void blk_io_limits_update_group(BlockBackend *blk, const char *group);
 
+void blk_dma_map(BlockBackend *blk, void *host, size_t size);
+void blk_dma_unmap(BlockBackend *blk, void *host);
+
 #endif
-- 
2.9.4




[Qemu-block] [PATCH v3 0/6] block: Add VFIO based driver for NVMe device

2017-07-05 Thread Fam Zheng
v3: Rebase, small tweaks/fixes and add locks to provide basic thread safety
(basic because it is not really tested).

v2:
- Implement "split vfio addr space" appraoch. [Paolo]
- Add back 'device reset' in nvme_close(). [Paolo]
- Better variable namings. [Stefan]
- "Reuse" macro definitions from NVMe emulation code.
- Rebase onto current master which has polling by default and update
  performance results accordingly.
- Update MAINTAINERS.
- Specify namespace in URI.
- The sporadic I/O error from v1 "disappeared" in this version.
- Tests done: qemu-img bench, fio, bonnie++ and installation of
  ubuntu/fedora/rhel on QEMU emulated nvme and an Intel P3700 card.

Fam Zheng (6):
  stubs: Add stubs for ram block API
  block: Add VFIO based NVMe driver
  block: Introduce bdrv_dma_map and bdrv_dma_unmap
  block/nvme: Implement .bdrv_dma_map and .bdrv_dma_unmap
  qemu-img: Map bench buffer
  block: Move NVMe spec definitions to a separate header

 MAINTAINERS|6 +
 block/Makefile.objs|1 +
 block/block-backend.c  |   10 +
 block/io.c |   24 +
 block/nvme-vfio.c  |  703 +
 block/nvme-vfio.h  |   30 ++
 block/nvme.c   | 1103 
 block/nvme.h   |  700 +
 block/trace-events |   32 ++
 hw/block/nvme.h|  698 +
 include/block/block.h  |2 +
 include/block/block_int.h  |4 +
 include/sysemu/block-backend.h |3 +
 qemu-img.c |9 +-
 stubs/Makefile.objs|1 +
 stubs/ram-block.c  |   16 +
 16 files changed, 2644 insertions(+), 698 deletions(-)
 create mode 100644 block/nvme-vfio.c
 create mode 100644 block/nvme-vfio.h
 create mode 100644 block/nvme.c
 create mode 100644 block/nvme.h
 create mode 100644 stubs/ram-block.c

-- 
2.9.4




[Qemu-block] [PATCH v3 1/6] stubs: Add stubs for ram block API

2017-07-05 Thread Fam Zheng
These functions will be wanted by block-obj-y but the actual definition
is in obj-y, so stub them to keep the linker happy.

Signed-off-by: Fam Zheng 
Acked-by: Paolo Bonzini 
---
 stubs/Makefile.objs |  1 +
 stubs/ram-block.c   | 16 
 2 files changed, 17 insertions(+)
 create mode 100644 stubs/ram-block.c

diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index f5b47bf..c93a800 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -39,3 +39,4 @@ stub-obj-y += pc_madt_cpu_entry.o
 stub-obj-y += vmgenid.o
 stub-obj-y += xen-common.o
 stub-obj-y += xen-hvm.o
+stub-obj-y += ram-block.o
diff --git a/stubs/ram-block.c b/stubs/ram-block.c
new file mode 100644
index 000..cfa5d86
--- /dev/null
+++ b/stubs/ram-block.c
@@ -0,0 +1,16 @@
+#include "qemu/osdep.h"
+#include "exec/ramlist.h"
+#include "exec/cpu-common.h"
+
+void ram_block_notifier_add(RAMBlockNotifier *n)
+{
+}
+
+void ram_block_notifier_remove(RAMBlockNotifier *n)
+{
+}
+
+int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
+{
+return 0;
+}
-- 
2.9.4




Re: [Qemu-block] [Qemu-devel] [PATCH] docs: add qemu-block-drivers(7) man page

2017-07-05 Thread Stefan Hajnoczi
On Tue, Jun 27, 2017 at 10:20:10AM -0500, Eric Blake wrote:
> Then again, much of your patch is just code motion, rather than new
> documentation.  So my comments are probably worth separate patches.

Yes, they sound like good improvements to me but beyond the scope of
this patch, which moves the existing documentation.

Stefan




Re: [Qemu-block] Fuzzing event loops

2017-07-05 Thread Stefan Hajnoczi
On Mon, Jun 26, 2017 at 04:53:45PM -0400, John Snow wrote:
> On 06/11/2017 06:47 AM, Stefan Hajnoczi wrote:
> > I wanted to share this idea about fuzzing event loops:
> > 
> > https://blog.acolyer.org/2017/06/09/node-fz-fuzzing-the-server-side-event-driven-architecture/
> > 
> > The idea is to expose ordering dependencies and atomicity bugs in
> > event loop callbacks/coroutines by randomly shuffling the order in
> > which fd handlers, timers, etc execute.
> > 
> > I'm not sure we'd find many bugs since QEMU tends to use big locks or
> > request serialization when concurrency gets tricky in the block layer.
> > Still, it's an interesting concept that we could apply in the future.
> > 
> > Stefan
> > 
> 
> Sounds fun, probably too detailed for a GSoC/Outreachy project, right?
> Do we have a page on the wiki for random "Hey, this might be nice..." ideas?
> 
> (Or is that a bad idea itself so we don't have a graveyard of 'not my
> problem' projects?)

This idea is difficult as an internship because it's an investigation
project.  It may produce no results or may require deep knowledge of
QEMU internals to resolve issues that are identified.

I just wanted to share the idea.

Stefan




Re: [Qemu-block] [PATCH v3 1/4] block: Pass bdrv_* methods to bs->file by default

2017-07-05 Thread Eric Blake
On 07/04/2017 04:44 AM, Kevin Wolf wrote:
> Am 04.07.2017 um 10:40 hat Manos Pitsidianakis geschrieben:
>> The following functions fail if bs->drv does not implement them:
>>

>> @@ -511,6 +513,8 @@ int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry 
>> *geo)
>>  
>>  if (drv && drv->bdrv_probe_geometry) {
>>  return drv->bdrv_probe_geometry(bs, geo);
>> +} else if (drv && bs->file && bs->file->bs) {
>> +return bdrv_probe_geometry(bs->file->bs, geo);
>>  }
>>  
>>  return -ENOTSUP;
> 
> The probed geometry would refer to the physical image file, not to the
> actually presented image. So propagating this to bs->file is wrong for
> image formats.
> 
> Possibly checking .is_filter in addition would be enough.

In fact, this made me think: if we define .is_filter as an enum instead
of a bool, we could implement 0 for no filter, 1 for file filter, and 2
for backing filter.  Then instead of having to have 2 separate default
bdrv_get_block_status_for_*() generic functions in patch 4, we could
instead patch bdrv_co_get_block_status() to check .is_filter, and
automatically fall back to bs->file or bs->backing according to the enum
value stored in .is_filter.
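
Roughly, with invented names (illustrative only, not an actual patch):

    typedef enum {
        BDRV_FILTER_NONE = 0,   /* normal format/protocol driver */
        BDRV_FILTER_FILE,       /* filter passing requests to bs->file */
        BDRV_FILTER_BACKING,    /* filter passing requests to bs->backing */
    } BdrvFilterKind;

so that the generic code can derive the fallback child (bs->file or
bs->backing) from drv->is_filter instead of needing two separate default
helpers.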

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH v3 1/4] block: Pass bdrv_* methods to bs->file by default

2017-07-05 Thread Eric Blake
[adding libvirt for block size discussion]

On 07/04/2017 04:44 AM, Kevin Wolf wrote:
> Am 04.07.2017 um 10:40 hat Manos Pitsidianakis geschrieben:
>> The following functions fail if bs->drv does not implement them:
>>
>> bdrv_probe_blocksizes
>> bdrv_probe_geometry
>> bdrv_truncate
>> bdrv_has_zero_init
>> bdrv_get_info
>> bdrv_media_changed
>> bdrv_eject
>> bdrv_lock_medium
>> bdrv_co_ioctl
>>
>> Instead, the call should be passed to bs->file if it exists, to allow
>> filter drivers to support those methods without implementing them.
>>

> 
>> +return bdrv_probe_blocksizes(bs->file->bs, bsz);
>>  }
>>  
>>  return -ENOTSUP;
> 
> This change actually changes existing behaviour. Basically, all of your
> additions also apply for qcow2 images or other image formats, and we
> need to check whether they make sense there.
> 
> bdrv_probe_blocksizes() is used for the default values for the physical
> and logical block sizes for block devices if no value is explicitly
> specified in -device. This makes sense as it will allow the guest to
> optimise its requests so that they align with the block size that is
> optimal for the storage that contains the image. (I also think that we
> probably should implement .bdrv_probe_blocksizes() not only for
> host_device, but also for regular files to get optimal performance by
> default.)
> 
> Changing the defaults can be a problem for cross-version live migration
> if the default values are used. Management tools that want to do live
> migration are supposed to explicitly provide all options they can, so
> this should be okay.
> 
> Eric, does libvirt set the physical/logical block size explicitly? If
> not, it would be good if it started to do so.

Libvirt supports user control of block size via this XML in <disk>:

  <blockio logical_block_size='...' physical_block_size='...'/>

which it translates to command-line logical_block_size= and
physical_block_size=.  But I didn't see anywhere that libvirt tries to
do anything with block size in QMP, which makes me wonder if that is a
libvirt bug for not allowing a block size when hot-plugging a drive, or
a weakness in QMP.
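
For reference, these end up as device properties on the QEMU command
line, for example (drive name and values purely illustrative):

    -device virtio-blk-pci,drive=drive0,logical_block_size=512,physical_block_size=4096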

Your point is well taken: if libvirt isn't always specifying the block
size, then we risk the block size changing on migration (that is,
libvirt should be patched to always specify the block size, and not
just when the XML specifies it).


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH v3 04/20] stream: Switch stream_run() to byte-based

2017-07-05 Thread Eric Blake
On 07/04/2017 10:00 AM, Kevin Wolf wrote:
> Am 27.06.2017 um 21:24 hat Eric Blake geschrieben:
>> We are gradually converting to byte-based interfaces, as they are
>> easier to reason about than sector-based.  Change the internal
>> loop iteration of streaming to track by bytes instead of sectors
>> (although we are still guaranteed that we iterate by steps that
>> are sector-aligned).
>>
>> Signed-off-by: Eric Blake 
>> Reviewed-by: John Snow 
>>
>> ---
>> v2: no change
>> ---
>>  block/stream.c | 24 ++--
>>  1 file changed, 10 insertions(+), 14 deletions(-)
>>
>> diff --git a/block/stream.c b/block/stream.c
>> index 746d525..2f9618b 100644
>> --- a/block/stream.c
>> +++ b/block/stream.c
>> @@ -108,12 +108,11 @@ static void coroutine_fn stream_run(void *opaque)
>>  BlockBackend *blk = s->common.blk;
>>  BlockDriverState *bs = blk_bs(blk);
>>  BlockDriverState *base = s->base;
>> -int64_t sector_num = 0;
>> -int64_t end = -1;
> 
> Here, end was initialised to -1. This made a difference for early 'goto
> out' paths because otherwise data->reached_end would incorrectly be true
> in stream_complete.

Oh, good call.

> 
> Because we also check data->ret, I think the only case where it actually
> makes a difference is for the !bs->backing case: This used to result in
> data->reached_end == false, but now it ends up as true. This is because
> s->common.len hasn't been set yet, so it is still 0.

When !bs->backing, ret is 0; so that means my commit message is wrong
about there being no semantic change.  But remember, when !bs->backing,
stream is a no-op (there was no backing file to stream into the current
layer, anyways) - so which do we want, declaring that the operation
never reached the end (odd, since we did nothing), or that it is
complete?  In other words, is this something where I should fix the
semantics (perhaps as a separate bug-fix patch), or where I need to fix
this patch to preserve existing semantics?

The next code reached is stream_complete().  Before my patch, it skipped
the main body of the function (because !data->reached_end); with my
patch, we are now calling bdrv_change_backing_file() (presumably a no-op
when there is no backing?)

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [PATCH v2 14/15] block: Align block status requests

2017-07-05 Thread Fam Zheng
On Wed, 07/05 07:01, Eric Blake wrote:
> On 07/04/2017 04:44 AM, Fam Zheng wrote:
> > On Mon, 07/03 17:14, Eric Blake wrote:
> >> Any device that has request_alignment greater than 512 should be
> >> unable to report status at a finer granularity; it may also be
> >> simpler for such devices to be guaranteed that the block layer
> >> has rounded things out to the granularity boundary (the way the
> >> block layer already rounds all other I/O out).  Besides, getting
> >> the code correct for super-sector alignment also benefits us
> >> for the fact that our public interface now has byte granularity,
> >> even though none of our drivers have byte-level callbacks.
> >>
> >> Add an assertion in blkdebug that proves that the block layer
> >> never requests status of unaligned sections, similar to what it
> >> does on other requests (while still keeping the generic helper
> >> in place for when future patches add a throttle driver).  Note
> >> note that iotest 177 already covers this (it would fail if you
> > 
> > Drop one "note"?
> 
> Indeed. There are studies on how people read that show that redundant
> words that cross a line boundary are the easiest to mentally overlook.
> 
> 
> >> +/* Round out to request_alignment boundaries */
> >> +align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
> >> +aligned_offset = QEMU_ALIGN_DOWN(offset, align);
> >> +aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
> >> +
> >> +{
> > 
> > Not sure why using a {} block here?
> > 
> >> +int count; /* sectors */
> 
> Because it makes it easier (less indentation churn) to delete the code
> when series 4 later deletes the .bdrv_co_get_block_status sector-based
> callback in favor of the newer .bdrv_co_block_status byte-based (patch
> 16/15 which start series 4 turns it into an if statement, then a later
> patch deletes the entire conditional); it is also justifiable because it
> creates a tighter scope for 'int count' which is needed only on the
> boundary of sector-based operations when the rest of the function is
> byte-based (rather than declaring the helper variable up front).  I'll
> have to call it out more specifically in the commit message as intentional.

Thanks for explaining, with the syntax error fixed in the commit message:

Reviewed-by: Fam Zheng 



Re: [Qemu-block] [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()

2017-07-05 Thread Fam Zheng
On Wed, 07/05 06:56, Eric Blake wrote:
> On 07/04/2017 02:06 AM, Fam Zheng wrote:
> > On Mon, 07/03 17:14, Eric Blake wrote:
> >> @@ -1717,6 +1718,10 @@ int64_t coroutine_fn 
> >> bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
> >>   * Drivers not implementing the functionality are assumed to not support
> >>   * backing files, hence all their sectors are reported as allocated.
> >>   *
> >> + * If 'allocation' is true, the caller only cares about allocation
> >> + * status; this is a hint that a larger 'pnum' result is more
> >> + * important than including BDRV_BLOCK_OFFSET_VALID in the return.
> >> + *
> > 
> > This is slightly unintuitive. I would guess "allocation == false" means "I 
> > don't
> > care about the allocation status" but it actually is "I don't care about the
> > consecutiveness". The "only" semantics is missing in the parameter name.
> > 
> > Maybe rename it to "consecutive" and invert the logic?  I.e. "consecutive ==
> > true" means BDRV_BLOCK_OFFSET_VALID is wanted.
> 
> Reasonable idea; other [shorter] names I've been toying with:
> strict
> mapping
> precise
> 
> any of which, if true (set true by bdrv_get_block_status), means that I
> care more about BDRV_BLOCK_OFFSET_VALID and validity for learning host
> offsets, if false  it means I'm okay getting a larger *pnum even if it
> extends over disjoint host offsets; or:
> 
> fast
> 
> which if true (set true by bdrv_is_allocated) means I want a larger
> *pnum even if I have to sacrifice accuracy by omitting
> BDRV_BLOCK_OFFSET_VALID, because I'm trying to reduce effort spent.
> 
> > 
> > Sorry for bikeshedding.
> 
> Not a problem, I also had some double-takes in writing my own code
> trying to remember which way I wanted the 'allocation' boolean to be
> set, so coming up with a more intuitive name/default state in order to
> help legibility is worth it.  Do any of my above suggestions sound better?
> 

I'd vote for "mapping" because it has a close connection with offset (as in
BDRV_BLOCK_OFFSET_VALID).

Or simply call it "offset" and if false, never return BDRV_BLOCK_OFFSET_VALID.

Fam



Re: [Qemu-block] [PATCH v2 14/15] block: Align block status requests

2017-07-05 Thread Eric Blake
On 07/04/2017 04:44 AM, Fam Zheng wrote:
> On Mon, 07/03 17:14, Eric Blake wrote:
>> Any device that has request_alignment greater than 512 should be
>> unable to report status at a finer granularity; it may also be
>> simpler for such devices to be guaranteed that the block layer
>> has rounded things out to the granularity boundary (the way the
>> block layer already rounds all other I/O out).  Besides, getting
>> the code correct for super-sector alignment also benefits us
>> for the fact that our public interface now has byte granularity,
>> even though none of our drivers have byte-level callbacks.
>>
>> Add an assertion in blkdebug that proves that the block layer
>> never requests status of unaligned sections, similar to what it
>> does on other requests (while still keeping the generic helper
>> in place for when future patches add a throttle driver).  Note
>> note that iotest 177 already covers this (it would fail if you
> 
> Drop one "note"?

Indeed. There are studies on how people read that show that redundant
words that cross a line boundary are the easiest to mentally overlook.


>> +/* Round out to request_alignment boundaries */
>> +align = MAX(bs->bl.request_alignment, BDRV_SECTOR_SIZE);
>> +aligned_offset = QEMU_ALIGN_DOWN(offset, align);
>> +aligned_bytes = ROUND_UP(offset + bytes, align) - aligned_offset;
>> +
>> +{
> 
> Not sure why using a {} block here?
> 
>> +int count; /* sectors */

Because it makes it easier (less indentation churn) to delete the code
when series 4 later deletes the .bdrv_co_get_block_status sector-based
callback in favor of the newer .bdrv_co_block_status byte-based (patch
16/15 which start series 4 turns it into an if statement, then a later
patch deletes the entire conditional); it is also justifiable because it
creates a tighter scope for 'int count' which is needed only on the
boundary of sector-based operations when the rest of the function is
byte-based (rather than declaring the helper variable up front).  I'll
have to call it out more specifically in the commit message as intentional.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH v2 03/15] block: Add flag to avoid wasted work in bdrv_is_allocated()

2017-07-05 Thread Eric Blake
On 07/04/2017 02:06 AM, Fam Zheng wrote:
> On Mon, 07/03 17:14, Eric Blake wrote:
>> @@ -1717,6 +1718,10 @@ int64_t coroutine_fn 
>> bdrv_co_get_block_status_from_backing(BlockDriverState *bs,
>>   * Drivers not implementing the functionality are assumed to not support
>>   * backing files, hence all their sectors are reported as allocated.
>>   *
>> + * If 'allocation' is true, the caller only cares about allocation
>> + * status; this is a hint that a larger 'pnum' result is more
>> + * important than including BDRV_BLOCK_OFFSET_VALID in the return.
>> + *
> 
> This is slightly unintuitive. I would guess "allocation == false" means "I 
> don't
> care about the allocation status" but it actually is "I don't care about the
> consecutiveness". The "only" semantics is missing in the parameter name.
> 
> Maybe rename it to "consecutive" and invert the logic?  I.e. "consecutive ==
> true" means BDRV_BLOCK_OFFSET_VALID is wanted.

Reasonable idea; other [shorter] names I've been toying with:
strict
mapping
precise

any of which, if true (set true by bdrv_get_block_status), means that I
care more about BDRV_BLOCK_OFFSET_VALID and validity for learning host
offsets, if false  it means I'm okay getting a larger *pnum even if it
extends over disjoint host offsets; or:

fast

which if true (set true by bdrv_is_allocated) means I want a larger
*pnum even if I have to sacrifice accuracy by omitting
BDRV_BLOCK_OFFSET_VALID, because I'm trying to reduce effort spent.

> 
> Sorry for bikeshedding.

Not a problem, I also had some double-takes in writing my own code
trying to remember which way I wanted the 'allocation' boolean to be
set, so coming up with a more intuitive name/default state in order to
help legibility is worth it.  Do any of my above suggestions sound better?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org





Re: [Qemu-block] [Qemu-devel] [PATCH 2/5] block: Don't try to set *errp directly

2017-07-05 Thread Markus Armbruster
Eduardo Habkost  writes:

> Assigning directly to *errp is not valid, as errp may be NULL,
> &error_fatal, or &error_abort.  Use error_propagate() instead.
>
> With this, there's no need to check if errp is NULL anymore, as
> error_propagate() and error_prepend() are able to handle that.
>
> Cc: Kevin Wolf 
> Cc: Max Reitz 
> Cc: qemu-block@nongnu.org
> Signed-off-by: Eduardo Habkost 
> ---
>  block.c | 8 +++-
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/block.c b/block.c
> index fa1d06d846..1750a1838e 100644
> --- a/block.c
> +++ b/block.c
> @@ -4263,11 +4263,9 @@ bool bdrv_op_is_blocked(BlockDriverState *bs, 
> BlockOpType op, Error **errp)
>  assert((int) op >= 0 && op < BLOCK_OP_TYPE_MAX);
>  if (!QLIST_EMPTY(&bs->op_blockers[op])) {
>  blocker = QLIST_FIRST(&bs->op_blockers[op]);
> -if (errp) {
> -*errp = error_copy(blocker->reason);
> -error_prepend(errp, "Node '%s' is busy: ",
> -  bdrv_get_device_or_node_name(bs));
> -}
> +error_propagate(errp, error_copy(blocker->reason));
> +error_prepend(errp, "Node '%s' is busy: ",
> +  bdrv_get_device_or_node_name(bs));
>  return true;
>  }
>  return false;

Reviewed-by: Markus Armbruster 



Re: [Qemu-block] [PATCH v3 07/20] mirror: Switch MirrorBlockJob to byte-based

2017-07-05 Thread Kevin Wolf
Am 27.06.2017 um 21:24 hat Eric Blake geschrieben:
> We are gradually converting to byte-based interfaces, as they are
> easier to reason about than sector-based.  Continue by converting an
> internal structure (no semantic change), and all references to the
> buffer size.
> 
> [checkpatch has a false positive on use of MIN() in this patch]
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: John Snow 

I wouldn't mind an assertion that granularity is a multiple of
BDRV_SECTOR_SIZE, along with a comment that explains that this is
required so that we avoid rounding problems when dealing with the bitmap
functions.

blockdev_mirror_common() does already check this, but it feels like it's
a bit far away from where the actual problem would happen in the mirror
job code.
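
For example (sketch only):

    /* mirror_run() relies on sector-aligned granularity for the bitmap
     * <-> byte offset conversions that follow */
    assert(s->granularity % BDRV_SECTOR_SIZE == 0);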

> @@ -768,17 +765,17 @@ static void coroutine_fn mirror_run(void *opaque)
>   * the destination do COW.  Instead, we copy sectors around the
>   * dirty data if needed.  We need a bitmap to do that.
>   */
> +s->target_cluster_size = BDRV_SECTOR_SIZE;
>  bdrv_get_backing_filename(target_bs, backing_filename,
>sizeof(backing_filename));
>  if (!bdrv_get_info(target_bs, &bdi) && bdi.cluster_size) {
> -target_cluster_size = bdi.cluster_size;
> +s->target_cluster_size = bdi.cluster_size;
>  }

Why have the unrelated bdrv_get_backing_filename() between the two
assignments of s->target_cluster_size? Or actually, wouldn't it be
even easier to read with an else branch?

if (!bdrv_get_info(target_bs, &bdi) && bdi.cluster_size) {
s->target_cluster_size = bdi.cluster_size;
} else {
s->target_cluster_size = BDRV_SECTOR_SIZE;
}

None of these comments are critical, so anyway:

Reviewed-by: Kevin Wolf 



Re: [Qemu-block] [Qemu-devel] [PATCH 08/35] block: all bdrv_aio callbacks are coroutine_fn

2017-07-05 Thread Paolo Bonzini
On 05/07/2017 00:03, Marc-André Lureau wrote:
> Signed-off-by: Marc-André Lureau 
> ---
>  include/block/block_int.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 15fa602150..93eb2a9528 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -133,15 +133,15 @@ struct BlockDriver {
>  void (*bdrv_refresh_filename)(BlockDriverState *bs, QDict *options);
>  
>  /* aio */
> -BlockAIOCB *(*bdrv_aio_readv)(BlockDriverState *bs,
> +BlockAIOCB * coroutine_fn (*bdrv_aio_readv)(BlockDriverState *bs,
>  int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>  BlockCompletionFunc *cb, void *opaque);
> -BlockAIOCB *(*bdrv_aio_writev)(BlockDriverState *bs,
> +BlockAIOCB * coroutine_fn (*bdrv_aio_writev)(BlockDriverState *bs,
>  int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
>  BlockCompletionFunc *cb, void *opaque);
> -BlockAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
> +BlockAIOCB * coroutine_fn (*bdrv_aio_flush)(BlockDriverState *bs,
>  BlockCompletionFunc *cb, void *opaque);
> -BlockAIOCB *(*bdrv_aio_pdiscard)(BlockDriverState *bs,
> +BlockAIOCB * coroutine_fn (*bdrv_aio_pdiscard)(BlockDriverState *bs,
>  int64_t offset, int bytes,
>  BlockCompletionFunc *cb, void *opaque);
>  
> @@ -247,7 +247,7 @@ struct BlockDriver {
>  void (*bdrv_lock_medium)(BlockDriverState *bs, bool locked);
>  
>  /* to control generic scsi devices */
> -BlockAIOCB *(*bdrv_aio_ioctl)(BlockDriverState *bs,
> +BlockAIOCB * coroutine_fn (*bdrv_aio_ioctl)(BlockDriverState *bs,
>  unsigned long int req, void *buf,
>  BlockCompletionFunc *cb, void *opaque);
>  int coroutine_fn (*bdrv_co_ioctl)(BlockDriverState *bs,
> 


They are, but it's an implementation detail.  Why is this patch necessary?

Thanks,

Paolo



Re: [Qemu-block] [PATCH 0/2] block: make .bdrv_create() a coroutine_fn

2017-07-05 Thread Marc-André Lureau


- Original Message -
> The BlockDriver->bdrv_create() function is always called from coroutine
> context.  These patches rename it and clean up qcow2 code that is currently
> calling CoMutex functions outside coroutine_fn.
> 
> Stefan Hajnoczi (2):
>   block: rename .bdrv_create() to .bdrv_co_create()
>   qcow2: make qcow2_co_create2() a coroutine_fn
> 

I came to the same changes with my series, so
Reviewed-by: Marc-André Lureau 

>  include/block/block_int.h |  3 ++-
>  block.c   |  4 ++--
>  block/crypto.c|  8 
>  block/file-posix.c| 15 ---
>  block/file-win32.c|  3 ++-
>  block/gluster.c   | 12 ++--
>  block/iscsi.c |  7 ---
>  block/nfs.c   |  5 +++--
>  block/parallels.c |  6 --
>  block/qcow.c  |  5 +++--
>  block/qcow2.c | 22 --
>  block/qed.c   |  6 --
>  block/raw-format.c|  5 +++--
>  block/rbd.c   |  6 --
>  block/sheepdog.c  | 10 +-
>  block/ssh.c   |  5 +++--
>  block/vdi.c   |  5 +++--
>  block/vhdx.c  |  5 +++--
>  block/vmdk.c  |  5 +++--
>  block/vpc.c   |  5 +++--
>  20 files changed, 81 insertions(+), 61 deletions(-)
> 
> --
> 2.9.4
> 
> 



[Qemu-block] [PATCH 1/2] block: rename .bdrv_create() to .bdrv_co_create()

2017-07-05 Thread Stefan Hajnoczi
BlockDriver->bdrv_create() has been called from coroutine context since
commit 5b7e1542cfa41a281af9629d31cef03704d976e6 ("block: make
bdrv_create adopt coroutine").

Make this explicit by renaming to .bdrv_co_create() and add the
coroutine_fn annotation.  This makes it obvious to block driver authors
that they may yield, use CoMutex, or other coroutine_fn APIs.

Signed-off-by: Stefan Hajnoczi 
---
 include/block/block_int.h |  3 ++-
 block.c   |  4 ++--
 block/crypto.c|  8 
 block/file-posix.c| 15 ---
 block/file-win32.c|  3 ++-
 block/gluster.c   | 12 ++--
 block/iscsi.c |  7 ---
 block/nfs.c   |  5 +++--
 block/parallels.c |  6 --
 block/qcow.c  |  5 +++--
 block/qcow2.c |  5 +++--
 block/qed.c   |  6 --
 block/raw-format.c|  5 +++--
 block/rbd.c   |  6 --
 block/sheepdog.c  | 10 +-
 block/ssh.c   |  5 +++--
 block/vdi.c   |  5 +++--
 block/vhdx.c  |  5 +++--
 block/vmdk.c  |  5 +++--
 block/vpc.c   |  5 +++--
 20 files changed, 72 insertions(+), 53 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 15fa602..3f32c61 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -126,7 +126,8 @@ struct BlockDriver {
 int (*bdrv_file_open)(BlockDriverState *bs, QDict *options, int flags,
   Error **errp);
 void (*bdrv_close)(BlockDriverState *bs);
-int (*bdrv_create)(const char *filename, QemuOpts *opts, Error **errp);
+int coroutine_fn (*bdrv_co_create)(const char *filename, QemuOpts *opts,
+   Error **errp);
 int (*bdrv_set_key)(BlockDriverState *bs, const char *key);
 int (*bdrv_make_empty)(BlockDriverState *bs);
 
diff --git a/block.c b/block.c
index 6943962..223ca3f 100644
--- a/block.c
+++ b/block.c
@@ -416,7 +416,7 @@ static void coroutine_fn bdrv_create_co_entry(void *opaque)
 CreateCo *cco = opaque;
 assert(cco->drv);
 
-ret = cco->drv->bdrv_create(cco->filename, cco->opts, &local_err);
+ret = cco->drv->bdrv_co_create(cco->filename, cco->opts, &local_err);
 error_propagate(&cco->err, local_err);
 cco->ret = ret;
 }
@@ -435,7 +435,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 .err = NULL,
 };
 
-if (!drv->bdrv_create) {
+if (!drv->bdrv_co_create) {
 error_setg(errp, "Driver '%s' does not support image creation", 
drv->format_name);
 ret = -ENOTSUP;
 goto out;
diff --git a/block/crypto.c b/block/crypto.c
index 10e5ddc..8d2f403 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -568,9 +568,9 @@ static int block_crypto_open_luks(BlockDriverState *bs,
  bs, options, flags, errp);
 }
 
-static int block_crypto_create_luks(const char *filename,
-QemuOpts *opts,
-Error **errp)
+static int coroutine_fn block_crypto_co_create_luks(const char *filename,
+QemuOpts *opts,
+Error **errp)
 {
 return block_crypto_create_generic(Q_CRYPTO_BLOCK_FORMAT_LUKS,
filename, opts, errp);
@@ -630,7 +630,7 @@ BlockDriver bdrv_crypto_luks = {
 .bdrv_open  = block_crypto_open_luks,
 .bdrv_close = block_crypto_close,
 .bdrv_child_perm= bdrv_format_default_perms,
-.bdrv_create= block_crypto_create_luks,
+.bdrv_co_create = block_crypto_co_create_luks,
 .bdrv_truncate  = block_crypto_truncate,
 .create_opts= &block_crypto_create_opts_luks,
 
diff --git a/block/file-posix.c b/block/file-posix.c
index 3927fab..aefcae0 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1835,7 +1835,8 @@ static int64_t 
raw_get_allocated_file_size(BlockDriverState *bs)
 return (int64_t)st.st_blocks * 512;
 }
 
-static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
+static int coroutine_fn raw_co_create(const char *filename, QemuOpts *opts,
+  Error **errp)
 {
 int fd;
 int result = 0;
@@ -2197,7 +2198,7 @@ BlockDriver bdrv_file = {
 .bdrv_reopen_commit = raw_reopen_commit,
 .bdrv_reopen_abort = raw_reopen_abort,
 .bdrv_close = raw_close,
-.bdrv_create = raw_create,
+.bdrv_co_create = raw_co_create,
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
 .bdrv_co_get_block_status = raw_co_get_block_status,
 .bdrv_co_pwrite_zeroes = raw_co_pwrite_zeroes,
@@ -2592,8 +2593,8 @@ static coroutine_fn int 
hdev_co_pwrite_zeroes(BlockDriverState *bs,
 return -ENOTSUP;
 }
 
-static int hdev_create(const char *filename, QemuOpts *opts,
-   Error **errp

[Qemu-block] [PATCH 2/2] qcow2: make qcow2_co_create2() a coroutine_fn

2017-07-05 Thread Stefan Hajnoczi
qcow2_create2() calls qemu_co_mutex_lock().  Only a coroutine_fn may
call another coroutine_fn.

Rename the function (the block layer API is now called
.bdrv_co_create()) and add coroutine_fn.  It is always called from
coroutine context.
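
(As a minimal sketch of the rule, not part of this patch: the qemu_co_*
primitives are themselves coroutine_fn, so every function that calls them has
to carry the marker and may only run inside a coroutine; plain code would
first have to go through qemu_coroutine_create()/qemu_coroutine_enter().
The helper name below is invented for the sketch.)

#include "qemu/osdep.h"
#include "qemu/coroutine.h"

/* Sketch only: taking a CoMutex forces the caller to be a coroutine_fn. */
static int coroutine_fn example_locked_update(CoMutex *lock)
{
    qemu_co_mutex_lock(lock);    /* legal only because we are a coroutine_fn */
    /* ... critical section ... */
    qemu_co_mutex_unlock(lock);
    return 0;
}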

Reported-by: Marc-André Lureau 
Signed-off-by: Stefan Hajnoczi 
---
 block/qcow2.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 5279839..b0ae612 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2140,11 +2140,12 @@ static int preallocate(BlockDriverState *bs)
 return 0;
 }
 
-static int qcow2_create2(const char *filename, int64_t total_size,
- const char *backing_file, const char *backing_format,
- int flags, size_t cluster_size, PreallocMode prealloc,
- QemuOpts *opts, int version, int refcount_order,
- Error **errp)
+static int coroutine_fn
+qcow2_co_create2(const char *filename, int64_t total_size,
+ const char *backing_file, const char *backing_format,
+ int flags, size_t cluster_size, PreallocMode prealloc,
+ QemuOpts *opts, int version, int refcount_order,
+ Error **errp)
 {
 int cluster_bits;
 QDict *options;
@@ -2476,9 +2477,9 @@ static int coroutine_fn qcow2_co_create(const char *filename, QemuOpts *opts,
 
 refcount_order = ctz32(refcount_bits);
 
-ret = qcow2_create2(filename, size, backing_file, backing_fmt, flags,
-cluster_size, prealloc, opts, version, refcount_order,
-&local_err);
+ret = qcow2_co_create2(filename, size, backing_file, backing_fmt, flags,
+   cluster_size, prealloc, opts, version,
+   refcount_order, &local_err);
 error_propagate(errp, local_err);
 
 finish:
-- 
2.9.4




[Qemu-block] [PATCH 0/2] block: make .bdrv_create() a coroutine_fn

2017-07-05 Thread Stefan Hajnoczi
The BlockDriver->bdrv_create() function is always called from coroutine
context.  These patches rename it and clean up qcow2 code that is currently
calling CoMutex functions outside coroutine_fn.
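
(For orientation, a simplified sketch, not the verbatim block.c code, of why
the callback only ever runs in coroutine context: bdrv_create() enters a
coroutine itself whenever it is called from plain code.  The Example* names
and the sentinel below are invented for this sketch.)

#include "qemu/osdep.h"
#include "qemu/coroutine.h"
#include "block/aio.h"
#include "block/block_int.h"

#define EXAMPLE_NOT_DONE INT_MIN          /* sentinel, like block.c's NOT_DONE */

typedef struct {                          /* cut-down stand-in for CreateCo */
    BlockDriver *drv;
    const char *filename;
    QemuOpts *opts;
    int ret;
    Error *err;
} ExampleCreateCo;

static void coroutine_fn example_create_co_entry(void *opaque)
{
    ExampleCreateCo *cco = opaque;

    cco->ret = cco->drv->bdrv_co_create(cco->filename, cco->opts, &cco->err);
}

static int example_bdrv_create(BlockDriver *drv, const char *filename,
                               QemuOpts *opts)
{
    ExampleCreateCo cco = { .drv = drv, .filename = filename,
                            .opts = opts, .ret = EXAMPLE_NOT_DONE };

    if (qemu_in_coroutine()) {
        example_create_co_entry(&cco);    /* fast path: already in a coroutine */
    } else {
        Coroutine *co = qemu_coroutine_create(example_create_co_entry, &cco);
        qemu_coroutine_enter(co);
        while (cco.ret == EXAMPLE_NOT_DONE) {
            aio_poll(qemu_get_aio_context(), true);
        }
    }
    return cco.ret;
}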

Stefan Hajnoczi (2):
  block: rename .bdrv_create() to .bdrv_co_create()
  qcow2: make qcow2_co_create2() a coroutine_fn

 include/block/block_int.h |  3 ++-
 block.c   |  4 ++--
 block/crypto.c|  8 
 block/file-posix.c| 15 ---
 block/file-win32.c|  3 ++-
 block/gluster.c   | 12 ++--
 block/iscsi.c |  7 ---
 block/nfs.c   |  5 +++--
 block/parallels.c |  6 --
 block/qcow.c  |  5 +++--
 block/qcow2.c | 22 --
 block/qed.c   |  6 --
 block/raw-format.c|  5 +++--
 block/rbd.c   |  6 --
 block/sheepdog.c  | 10 +-
 block/ssh.c   |  5 +++--
 block/vdi.c   |  5 +++--
 block/vhdx.c  |  5 +++--
 block/vmdk.c  |  5 +++--
 block/vpc.c   |  5 +++--
 20 files changed, 81 insertions(+), 61 deletions(-)

-- 
2.9.4




Re: [Qemu-block] [PATCH 12/17] migration: add postcopy migration of dirty bitmaps

2017-07-05 Thread Vladimir Sementsov-Ogievskiy

16.02.2017 16:04, Fam Zheng wrote:

+dbms->node_name = bdrv_get_node_name(bs);
+if (!dbms->node_name || dbms->node_name[0] == '\0') {
+dbms->node_name = bdrv_get_device_name(bs);
+}
+dbms->bitmap = bitmap;

What protects against the bitmap being released before migration completes?


What would be the source of such a deletion? A QMP command? It is
theoretically possible.

I see the following variants:

1. add a BdrvDirtyBitmap.migration flag, which forbids bitmap deletion
while the migration is in progress

2. make the bitmap anonymous (bdrv_dirty_bitmap_make_anon), so that it is
no longer reachable through QMP

What do you think? (A rough sketch of variant 1 follows below.)
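
(Sketch of variant 1, for illustration only: the type, field and function
names here are invented, and a real patch would report the failure through
error_setg() and -EBUSY rather than stderr.)

/* Tiny standalone model: migration marks the bitmap as busy, and the
 * removal path refuses to delete a busy bitmap. */
#include <stdbool.h>
#include <stdio.h>

typedef struct ExampleDirtyBitmap {
    const char *name;
    bool migration;              /* set for the whole lifetime of the migration */
} ExampleDirtyBitmap;

static int example_bitmap_remove(ExampleDirtyBitmap *bitmap)
{
    if (bitmap->migration) {
        fprintf(stderr, "bitmap '%s' is in use by migration\n", bitmap->name);
        return -1;               /* QEMU would use error_setg() and -EBUSY */
    }
    /* ... actually release the bitmap here ... */
    return 0;
}

int main(void)
{
    ExampleDirtyBitmap bm = { .name = "bitmap0", .migration = true };

    example_bitmap_remove(&bm);  /* refused while migration owns the bitmap */
    bm.migration = false;        /* migration has completed */
    example_bitmap_remove(&bm);  /* now succeeds */
    return 0;
}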

--
Best regards,
Vladimir



Re: [Qemu-block] [Qemu-devel] [PATCH v3 5/5] tests: Add check-qobject for equality tests

2017-07-05 Thread Markus Armbruster
Max Reitz  writes:

> Add a new test file (check-qobject.c) for unit tests that concern
> QObjects as a whole.
>
> Its only purpose for now is to test the qobject_is_equal() function.
>
> Signed-off-by: Max Reitz 

I'll review this one as soon as we've made up our minds on what exactly
qnum_is_equal() should do.



Re: [Qemu-block] [Qemu-devel] [PATCH v3 3/5] block: qobject_is_equal() in bdrv_reopen_prepare()

2017-07-05 Thread Markus Armbruster
Max Reitz  writes:

> Currently, bdrv_reopen_prepare() assumes that all BDS options are
> strings. However, this is not the case if the BDS has been created
> through the json: pseudo-protocol or blockdev-add.
>
> Note that the user-invokable reopen command is an HMP command, so you
> can only specify strings there. Therefore, specifying a non-string
> option with the "same" value as it was when originally created will now
> return an error because the values are supposedly similar (and there is
> no way for the user to circumvent this but to just not specify the
> option again -- however, this is still strictly better than just
> crashing).
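
(To make the scenario above concrete, a small sketch with an invented option
name of why a typed original value and a string reopen value can never compare
equal, even when they mean the same thing; this assumes the qobject_is_equal()
helper introduced earlier in this series:)

#include "qemu/osdep.h"
#include "qapi/qmp/qobject.h"
#include "qapi/qmp/qbool.h"
#include "qapi/qmp/qstring.h"

static bool example_reopen_mismatch(void)
{
    QBool *old = qbool_from_bool(true);        /* as set via blockdev-add */
    QString *new = qstring_from_str("on");     /* as re-specified on reopen */
    bool equal = qobject_is_equal(QOBJECT(old), QOBJECT(new));  /* false */

    qobject_decref(QOBJECT(old));
    qobject_decref(QOBJECT(new));
    return equal;
}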
>
> Signed-off-by: Max Reitz 
> ---
>  block.c | 31 ++-
>  1 file changed, 18 insertions(+), 13 deletions(-)
>
> diff --git a/block.c b/block.c
> index 913bb43..45eb248 100644
> --- a/block.c
> +++ b/block.c
> @@ -2947,19 +2947,24 @@ int bdrv_reopen_prepare(BDRVReopenState *reopen_state, BlockReopenQueue *queue,
>  const QDictEntry *entry = qdict_first(reopen_state->options);
>  
>  do {
> -QString *new_obj = qobject_to_qstring(entry->value);
> -const char *new = qstring_get_str(new_obj);
> -/*
> - * Caution: while qdict_get_try_str() is fine, getting
> - * non-string types would require more care.  When
> - * bs->options come from -blockdev or blockdev_add, its
> - * members are typed according to the QAPI schema, but
> - * when they come from -drive, they're all QString.
> - */
> -const char *old = qdict_get_try_str(reopen_state->bs->options,
> -entry->key);
> -
> -if (!old || strcmp(new, old)) {
> +QObject *new = entry->value;
> +QObject *old = qdict_get(reopen_state->bs->options, entry->key);
> +
> +/* TODO: When using -drive to specify blockdev options, all values
> + * will be strings; however, when using -blockdev, blockdev-add or
> + * filenames using the json:{} pseudo-protocol, they will be
> + * correctly typed.
> + * In contrast, reopening options are (currently) always strings
> + * (because you can only specify them through qemu-io; all other
> + * callers do not specify any options).
> + * Therefore, when using anything other than -drive to create a BDS,
> + * this cannot detect non-string options as unchanged, because
> + * qobject_is_equal() always returns false for objects of different
> + * type.  In the future, this should be remedied by correctly typing
> + * all options.  For now, this is not too big of an issue because
> + * the user simply can not specify options which cannot be changed
> + * anyway, so they will stay unchanged. */

I'm not the maintainer, and this is not a demand: consider reworking this
comment and wrapping its lines around column 70.

As much as I fancy word play (see your reply to Eric), I have to admit
that I had to read "the user simply can not specify options" about three
times to make sense of it.  Please consider a wording that's easier to
grasp, perhaps "the user can simply refrain from specifying options that
cannot be changed".

> +if (!qobject_is_equal(new, old)) {
>  error_setg(errp, "Cannot change the option '%s'", entry->key);
>  ret = -EINVAL;
>  goto error;



Re: [Qemu-block] [Qemu-devel] [PATCH v3 2/5] qapi: Add qobject_is_equal()

2017-07-05 Thread Markus Armbruster
Max Reitz  writes:

> This generic function (along with its implementations for different
> types) determines whether two QObjects are equal.
>
> Signed-off-by: Max Reitz 
[...]
> diff --git a/qobject/qnum.c b/qobject/qnum.c
> index 476e81c..784d061 100644
> --- a/qobject/qnum.c
> +++ b/qobject/qnum.c
> @@ -213,6 +213,59 @@ QNum *qobject_to_qnum(const QObject *obj)
>  }
>  
>  /**
> + * qnum_is_equal(): Test whether the two QNums are equal
> + */
> +bool qnum_is_equal(const QObject *x, const QObject *y)
> +{
> +QNum *num_x = qobject_to_qnum(x);
> +QNum *num_y = qobject_to_qnum(y);
> +
> +switch (num_x->kind) {
> +case QNUM_I64:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +/* Comparison in native int64_t type */
> +return num_x->u.i64 == num_y->u.i64;
> +case QNUM_U64:
> +/* Implicit conversion of x to uint64_t, so we have to
> + * check its sign before */
> +return num_x->u.i64 >= 0 && num_x->u.i64 == num_y->u.u64;
> +case QNUM_DOUBLE:
> +/* Implicit conversion of x to double; no overflow
> + * possible */
> +return num_x->u.i64 == num_y->u.dbl;

Overflow is impossible, but loss of precision is possible:

(double)9007199254740993ull == 9007199254740992.0

yields true.  Is this what we want?
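
(A tiny standalone program makes the effect visible; nothing QEMU-specific,
just ISO C:)

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    uint64_t x = 9007199254740993ull;     /* 2^53 + 1, not representable */
    double   y = 9007199254740992.0;      /* 2^53 */

    printf("x == y: %d\n", x == y);                        /* prints 1 */
    printf("round trip intact: %d\n",
           (uint64_t)(double)x == x);                      /* prints 0 */
    printf("%" PRIu64 " becomes %.1f\n", x, (double)x);
    return 0;
}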

> +}
> +abort();
> +case QNUM_U64:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +return qnum_is_equal(y, x);
> +case QNUM_U64:
> +/* Comparison in native uint64_t type */
> +return num_x->u.u64 == num_y->u.u64;
> +case QNUM_DOUBLE:
> +/* Implicit conversion of x to double; no overflow
> + * possible */
> +return num_x->u.u64 == num_y->u.dbl;

Similar loss of precision.

> +}
> +abort();
> +case QNUM_DOUBLE:
> +switch (num_y->kind) {
> +case QNUM_I64:
> +return qnum_is_equal(y, x);
> +case QNUM_U64:
> +return qnum_is_equal(y, x);
> +case QNUM_DOUBLE:
> +/* Comparison in native double type */
> +return num_x->u.dbl == num_y->u.dbl;
> +}
> +abort();
> +}
> +
> +abort();
> +}

I think there's more than one sane interpretation of "is equal",
including:

* The mathematical numbers represented by @x and @y are equal.

* @x and @y have the same contents, i.e. same kind and u.

* @x and @y are the same object (listed for completeness; we don't need
  a function to compare pointers).

Your patch implements yet another one.  Which one do we want, and why?

The second is easier to implement than the first.

If we really want the first, you need to fix the loss of precision bugs.
I guess the obvious fix is

return (double)x == x && x == y;

Note that this is what you do for mixed signedness: first check @x is
exactly representable in @y's type, then compare in @y's type.
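
(To illustrate one way of implementing that idea without risking an
out-of-range cast back, this sketch decides on the double side instead;
the helper name is mine, not from the patch:)

#include <math.h>
#include <stdbool.h>
#include <stdint.h>

static bool example_i64_eq_double(int64_t i, double d)
{
    /* 9223372036854775808.0 is exactly 2^63, so these bounds are exact. */
    if (!(d >= -9223372036854775808.0 && d < 9223372036854775808.0)) {
        return false;                /* NaN, infinities, out of int64_t range */
    }
    if (d != floor(d)) {
        return false;                /* a fractional double never equals an int */
    }
    /* d is an integral double inside int64_t range, so the cast is exact. */
    return (int64_t)d == i;
}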

Regardless of which one we pick, the function comment needs to explain.

> +
> +/**
>   * qnum_destroy_obj(): Free all memory allocated by a
>   * QNum object
>   */
[...]

Remainder of the patch looks good to me.


