date:20211123

Re: [PATCH v11 19/26] linux-user: Add LoongArch signal support

2021-11-23 Thread gaosong


Hi Richard,

On 2021/11/24 下午3:27, Richard Henderson wrote:

On 11/24/21 3:46 AM, gaosong wrote:

Hi Richard,

On 2021/11/20 下午6:33, Richard Henderson wrote:

On 11/19/21 7:13 AM, Song Gao wrote:


+
+struct target_sigcontext {
+    uint64_t   sc_pc;
+    uint64_t   sc_gpr[32];
+    uint64_t   sc_fpr[32];
+    uint64_t   sc_fcc;
+    uint32_t   sc_fcsr;
+    uint32_t   sc_flags;
+};


Does not match
https://github.com/loongson/linux/blob/loongarch-next/arch/loongarch/include/uapi/asm/sigcontext.h 




#define FPU_REG_WIDTH   256
union fpureg {
 uint32_t   val32[FPU_REG_WIDTH / 32];
 uint64_t   val64[FPU_REG_WIDTH / 64];
};

struct target_sigcontext {
 uint64_t   sc_pc;
 uint64_t   sc_regs[32];
 uint32_t   sc_flags;

 uint32_t   sc_fcsr;
 uint32_t   sc_vcsr;
 uint64_t   sc_fcc;
 uint64_t   scr[4];
 union fpureg    sc_fpregs[32] __attribute__((aligned(32)));

 uint32_t   sc_reserved;
};

Is this OK?


No, sc_reserved does not match.


uint8   sc_reserved[4096] __attribute__((aligned(16)));

BTW,  should we set scr[0-3] as 0 ?




+
+struct target_ucontext {
+    target_ulong tuc_flags;
+    target_ulong tuc_link;
+    target_stack_t tuc_stack;
+    target_ulong pad0;
+    struct target_sigcontext tuc_mcontext;
+    target_sigset_t tuc_sigmask;
+};


Does not match
https://github.com/loongson/linux/blob/loongarch-next/arch/loongarch/include/uapi/asm/ucontext.h 




struct target_ucontext {
 target_ulong tuc_flags;
 target_ulong tuc_link;
 target_stack_t tuc_stack;
 target_sigset_t tuc_sigmask;
 target_ulong pad0;
 struct target_sigcontext tuc_mcontext;
};

Is this OK?


No, pad0 does not match __unused.


uint8  unused[1024 / 8 - sizeof(target_sigset_t)];
Is this right?

Thanks
Song Gao



r~

Re: [PATCH] virtio-balloon: correct used length

2021-11-23 Thread Michael S. Tsirkin

On Wed, Nov 24, 2021 at 12:32:55PM +0800, Jason Wang wrote:
> Spec said:
> 
> "and len the total of bytes written into the buffer."
> 
> For inflateq, deflateq and statsq, we don't process in_sg so the used
> length should be zero. For free_page_vq, though the free pages are
> supplied via in_sgs, zero used length should still be fine since
> anyway driver is expected to use the length in this case and it
> simplifies the error handling path.
> 
> Signed-off-by: Jason Wang 

I think for free page vq the point is that the pages are
zeroed by hypervisor, so we must set used len accordingly. No?

> ---
>  hw/virtio/virtio-balloon.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index c6962fcbfe..3e52daa793 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -231,7 +231,7 @@ static void balloon_stats_poll_cb(void *opaque)
>  return;
>  }
>  
> -virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
> +virtqueue_push(s->svq, s->stats_vq_elem, 0);
>  virtio_notify(vdev, s->svq);
>  g_free(s->stats_vq_elem);
>  s->stats_vq_elem = NULL;
> @@ -438,7 +438,7 @@ static void virtio_balloon_handle_output(VirtIODevice 
> *vdev, VirtQueue *vq)
>  memory_region_unref(section.mr);
>  }
>  
> -virtqueue_push(vq, elem, offset);
> +virtqueue_push(vq, elem, 0);
>  virtio_notify(vdev, vq);
>  g_free(elem);
>  virtio_balloon_pbp_free(&pbp);
> @@ -549,7 +549,7 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
>  }
>  
>  out:
> -virtqueue_push(vq, elem, 1);
> +virtqueue_push(vq, elem, 0);
>  g_free(elem);
>  return ret;
>  }
> -- 
> 2.25.1

Re: [PATCH] microvm: use MachineState->dumpdtb

2021-11-23 Thread Philippe Mathieu-Daudé

On 11/24/21 08:24, Igor Mammedov wrote:
> On Tue, 23 Nov 2021 10:16:48 +0100
> Gerd Hoffmann  wrote:
> 
>> There already is a machine property to dumb the device tree for
> s/dumb/dump/
> 
> 
>> debugging purposes, and the helper function qemu_fdt_dumpdtb()
>> implementing the dumbing.  Make microvm use it for consistency.
> ditto
> 
>>
>> Signed-off-by: Gerd Hoffmann 
>> ---
>>  hw/i386/microvm-dt.c | 9 +
>>  1 file changed, 1 insertion(+), 8 deletions(-)

Regardless of the funny typo:
Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 2/4] include/sysemu/blockdev.h: rename if_name in block_if_name

2021-11-23 Thread Philippe Mathieu-Daudé

On 11/24/21 07:36, Emanuele Giuseppe Esposito wrote:
> In preparation to next patch, where we export it to be used
> also in softmmu/vlc.c

"vl.c"? :)

> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  blockdev.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH 4/4] include/sysemu/blockdev.h: remove drive_get_max_devs

2021-11-23 Thread Philippe Mathieu-Daudé

On 11/24/21 07:36, Emanuele Giuseppe Esposito wrote:
> Remove drive_get_max_devs, as it is not used by anyone.

Maybe complete:

  Last use was removed in commit 8f2d75e81d5
  ("hw: Drop superfluous special checks for orphaned -drive").

> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  include/sysemu/blockdev.h |  1 -
>  blockdev.c| 17 -
>  2 files changed, 18 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé

How to enable virgl in headless mode?

2021-11-23 Thread Yonggang Luo

Here is my command line
```
qemu-system-arm ^
 -monitor telnet:127.0.0.1:5318,server=on,wait=off,nodelay=on ^
 -serial telnet:127.0.0.1:5319,server=on,wait=on,nodelay=on ^
 -parallel none ^
 -cpu cortex-a15 -smp 4 -M virt -m 2G ^
 -kernel armhf-installed/vmlinuz ^
 -initrd armhf-installed/initrd.img ^
 -vnc :1,websocket=5701 ^
 -nographic ^
 -append "root=/dev/vda2 console=ttyAMA0" ^
 -device virtio-gpu-gl-device ^
 -device virtio-blk-device,drive=hd -drive
file=armhf-installed/debian_11.img,if=none,id=hd ^
 -device virtio-net-device,netdev=net0 -netdev
user,hostfwd=tcp::-:22,id=net0
```
-- 
 此致
礼
罗勇刚
Yours
sincerely,
Yonggang Luo

Re: [PATCH v1 1/7] softmmu: fix watchpoint-interrupt races

2021-11-23 Thread Richard Henderson


On 11/23/21 9:57 PM, Alex Bennée wrote:

From: Pavel Dovgalyuk 

Watchpoint may be processed in two phases. First one is detecting
the instruction with target memory access. And the second one is
executing only one instruction and setting the debug interrupt flag.
Hardware interrupts can break this sequence when they happen after
the first watchpoint phase.
This patch postpones the interrupt request until watchpoint is
processed.

Signed-off-by: Pavel Dovgalyuk 
Reviewed-by: Alex Bennée 
Reviewed-by: David Hildenbrand 
Message-Id: <163662451431.125458.14945698834107669531.stgit@pasha-ThinkPad-X280>
Signed-off-by: Alex Bennée 
---
  accel/tcg/cpu-exec.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2d14d02f6c..9cb892e326 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -742,6 +742,11 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
  qemu_mutex_unlock_iothread();
  return true;
  }
+/* Process watchpoints first, or interrupts will ruin everything */
+if (cpu->watchpoint_hit) {
+qemu_mutex_unlock_iothread();
+return false;
+}


I think this is redundant with the next patch.


r~

[PATCH v5 28/31] block.c: assert BQL lock held in bdrv_co_invalidate_cache

2021-11-23 Thread Emanuele Giuseppe Esposito

bdrv_co_invalidate_cache is special: it is an I/O function,
but uses the block layer permission API, which is GS.

Because of this, we can assert that either the function is
being called with BQL held, and thus can use the permission API,
or make sure that the permission API is not used, by ensuring that
bs (and parents) .open_flags does not contain BDRV_O_INACTIVE.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/block.c b/block.c
index a0309f827d..805974676b 100644
--- a/block.c
+++ b/block.c
@@ -6574,6 +6574,26 @@ void bdrv_init_with_whitelist(void)
 bdrv_init();
 }
 
+static bool bdrv_is_active(BlockDriverState *bs)
+{
+BdrvChild *parent;
+
+if (bs->open_flags & BDRV_O_INACTIVE) {
+return false;
+}
+
+QLIST_FOREACH(parent, &bs->parents, next_parent) {
+if (parent->klass->parent_is_bds) {
+BlockDriverState *parent_bs = parent->opaque;
+if (!bdrv_is_active(parent_bs)) {
+return false;
+}
+}
+}
+
+   return true;
+}
+
 int coroutine_fn bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
 {
 BdrvChild *child, *parent;
@@ -6585,6 +6605,12 @@ int coroutine_fn 
bdrv_co_invalidate_cache(BlockDriverState *bs, Error **errp)
 return -ENOMEDIUM;
 }
 
+/*
+ * No need to muck with permissions if bs is active.
+ * TODO: should activation be a separate function?
+ */
+assert(qemu_in_main_thread() || bdrv_is_active(bs));
+
 QLIST_FOREACH(child, &bs->children, next) {
 bdrv_co_invalidate_cache(child->bs, &local_err);
 if (local_err) {
-- 
2.27.0

Re: [RFC PATCH v2 06/44] hw/i386: Introduce kvm-type for TDX guest

2021-11-23 Thread Xiaoyao Li


On 8/26/2021 6:22 PM, Gerd Hoffmann wrote:

On Wed, Jul 07, 2021 at 05:54:36PM -0700, isaku.yamah...@gmail.com wrote:

From: Xiaoyao Li 

Introduce a machine property, kvm-type, to allow the user to create a
Trusted Domain eXtensions (TDX) VM, a.k.a. a Trusted Domain (TD), e.g.:

  # $QEMU \
-machine ...,kvm-type=tdx \
...


Sorry for the very late reply.


Can we align sev and tdx better than that?

SEV is enabled this way:

qemu -machine ...,confidential-guest-support=sev0 \
  -object sev-guest,id=sev0,...

(see docs/amd-memory-encryption.txt for details).

tdx could likewise use a tdx-guest object (and both sev-guest and
tdx-guest should probably have a common parent object type) to enable
and configure tdx support.


yes, sev only introduced a new object and passed it to 
confidential-guest-support. This is because SEV doesn't require the new 
type of VM.

However, TDX does require a new type of VM.

If we read KVM code, there is a parameter of CREATE_VM to pass the 
vm_type, though x86 doesn't use this field so far. On QEMU side, it also 
has the codes to pass/configure vm-type in command line. Of cousre, x86 
arch doesn't implement it. With upcoming TDX, it will implement and use 
vm type for TDX. That's the reason we wrote this patch to implement 
kvm-type for x86, similar to other arches.


yes, of course we can infer the vm_type from "-object tdx-guest". But I 
prefer to just use vm_type. Let's see others opinion.


thanks,
-Xiaoyao


take care,
   Gerd

Re: [PATCH 3/4] include/sysemu/blockdev.h: move drive_add and inline drive_def

2021-11-23 Thread Philippe Mathieu-Daudé

On 11/24/21 07:36, Emanuele Giuseppe Esposito wrote:
> drive_add is only used in softmmu/vl.c, so it can be a static
> function there, and drive_def is only a particular use case of
> qemu_opts_parse_noisily, so it can be inlined.
> 
> Also remove drive_mark_claimed_by_board, as it is only defined
> but not implemented (nor used) anywhere.
> 
> Signed-off-by: Emanuele Giuseppe Esposito 
> ---
>  include/sysemu/blockdev.h  |  6 ++
>  block/monitor/block-hmp-cmds.c |  2 +-
>  blockdev.c | 27 +--
>  softmmu/vl.c   | 25 -
>  4 files changed, 28 insertions(+), 32 deletions(-)
> 
> diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
> index 32c2d6023c..aacc587a33 100644
> --- a/include/sysemu/blockdev.h
> +++ b/include/sysemu/blockdev.h
> @@ -27,6 +27,8 @@ typedef enum {
>  IF_COUNT
>  } BlockInterfaceType;
>  
> +extern const char *const block_if_name[];

Maybe a cleaner alternative is to ignore the previous patch,
and add a new public method:

  const char *block_if_name(BlockInterfaceType iface);

[PATCH v5 27/31] job.h: assertions in the callers of JobDriver funcion pointers

2021-11-23 Thread Emanuele Giuseppe Esposito

Also assert that job->run() callback is called in the job aiocontext.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 job.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/job.c b/job.c
index dbfa67bb0a..bb57ec6887 100644
--- a/job.c
+++ b/job.c
@@ -380,6 +380,8 @@ void job_ref(Job *job)
 
 void job_unref(Job *job)
 {
+assert(qemu_in_main_thread());
+
 if (--job->refcnt == 0) {
 assert(job->status == JOB_STATUS_NULL);
 assert(!timer_pending(&job->sleep_timer));
@@ -601,6 +603,7 @@ bool job_user_paused(Job *job)
 void job_user_resume(Job *job, Error **errp)
 {
 assert(job);
+assert(qemu_in_main_thread());
 if (!job->user_paused || job->pause_count <= 0) {
 error_setg(errp, "Can't resume a job that was not paused");
 return;
@@ -671,6 +674,7 @@ static void job_update_rc(Job *job)
 static void job_commit(Job *job)
 {
 assert(!job->ret);
+assert(qemu_in_main_thread());
 if (job->driver->commit) {
 job->driver->commit(job);
 }
@@ -679,6 +683,7 @@ static void job_commit(Job *job)
 static void job_abort(Job *job)
 {
 assert(job->ret);
+assert(qemu_in_main_thread());
 if (job->driver->abort) {
 job->driver->abort(job);
 }
@@ -686,6 +691,7 @@ static void job_abort(Job *job)
 
 static void job_clean(Job *job)
 {
+assert(qemu_in_main_thread());
 if (job->driver->clean) {
 job->driver->clean(job);
 }
@@ -725,6 +731,7 @@ static int job_finalize_single(Job *job)
 
 static void job_cancel_async(Job *job, bool force)
 {
+assert(qemu_in_main_thread());
 if (job->driver->cancel) {
 force = job->driver->cancel(job, force);
 } else {
@@ -824,6 +831,7 @@ static void job_completed_txn_abort(Job *job)
 
 static int job_prepare(Job *job)
 {
+assert(qemu_in_main_thread());
 if (job->ret == 0 && job->driver->prepare) {
 job->ret = job->driver->prepare(job);
 job_update_rc(job);
@@ -950,6 +958,7 @@ static void coroutine_fn job_co_entry(void *opaque)
 {
 Job *job = opaque;
 
+assert(job->aio_context == qemu_get_current_aio_context());
 assert(job && job->driver && job->driver->run);
 job_pause_point(job);
 job->ret = job->driver->run(job, &job->err);
@@ -1053,6 +1062,7 @@ void job_complete(Job *job, Error **errp)
 {
 /* Should not be reachable via external interface for internal jobs */
 assert(job->id);
+assert(qemu_in_main_thread());
 if (job_apply_verb(job, JOB_VERB_COMPLETE, errp)) {
 return;
 }
-- 
2.27.0

Re: [PATCH v11 19/26] linux-user: Add LoongArch signal support

2021-11-23 Thread Richard Henderson


On 11/24/21 3:46 AM, gaosong wrote:

Hi Richard,

On 2021/11/20 下午6:33, Richard Henderson wrote:

On 11/19/21 7:13 AM, Song Gao wrote:


+
+struct target_sigcontext {
+    uint64_t   sc_pc;
+    uint64_t   sc_gpr[32];
+    uint64_t   sc_fpr[32];
+    uint64_t   sc_fcc;
+    uint32_t   sc_fcsr;
+    uint32_t   sc_flags;
+};


Does not match
https://github.com/loongson/linux/blob/loongarch-next/arch/loongarch/include/uapi/asm/sigcontext.h 




#define FPU_REG_WIDTH   256
union fpureg {
 uint32_t   val32[FPU_REG_WIDTH / 32];
 uint64_t   val64[FPU_REG_WIDTH / 64];
};

struct target_sigcontext {
 uint64_t   sc_pc;
 uint64_t   sc_regs[32];
 uint32_t   sc_flags;

 uint32_t   sc_fcsr;
 uint32_t   sc_vcsr;
 uint64_t   sc_fcc;
 uint64_t   scr[4];
 union fpuregsc_fpregs[32] __attribute__((aligned(32)));

 uint32_t   sc_reserved;
};

Is this OK?


No, sc_reserved does not match.




+
+struct target_ucontext {
+    target_ulong tuc_flags;
+    target_ulong tuc_link;
+    target_stack_t tuc_stack;
+    target_ulong pad0;
+    struct target_sigcontext tuc_mcontext;
+    target_sigset_t tuc_sigmask;
+};


Does not match
https://github.com/loongson/linux/blob/loongarch-next/arch/loongarch/include/uapi/asm/ucontext.h 




struct target_ucontext {
 target_ulong tuc_flags;
 target_ulong tuc_link;
 target_stack_t tuc_stack;
 target_sigset_t tuc_sigmask;
 target_ulong pad0;
 struct target_sigcontext tuc_mcontext;
};

Is this OK?


No, pad0 does not match __unused.


r~

Re: [PATCH v5 02/18] exec/memop: Adding signed quad and octo defines

2021-11-23 Thread Philippe Mathieu-Daudé

On 11/12/21 15:58, Frédéric Pétrot wrote:
> Adding defines to handle signed 64-bit and unsigned 128-bit quantities in
> memory accesses.
> 
> Signed-off-by: Frédéric Pétrot 
> ---
>  include/exec/memop.h | 7 +++
>  1 file changed, 7 insertions(+)

Reviewed-by: Philippe Mathieu-Daudé

[PATCH v5 25/31] block-backend-common.h: split function pointers in BlockDevOps

2021-11-23 Thread Emanuele Giuseppe Esposito

Assertions in the callers of the function pointrs are already
added by previous patches.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/sysemu/block-backend-common.h | 28 ++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/include/sysemu/block-backend-common.h 
b/include/sysemu/block-backend-common.h
index 6963bbf45a..ae630c624c 100644
--- a/include/sysemu/block-backend-common.h
+++ b/include/sysemu/block-backend-common.h
@@ -27,6 +27,14 @@
 
 /* Callbacks for block device models */
 typedef struct BlockDevOps {
+
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
 /*
  * Runs when virtual media changed (monitor commands eject, change)
  * Argument load is true on load and false on eject.
@@ -44,16 +52,26 @@ typedef struct BlockDevOps {
  * true, even if they do not support eject requests.
  */
 void (*eject_request_cb)(void *opaque, bool force);
-/*
- * Is the virtual tray open?
- * Device models implement this only when the device has a tray.
- */
-bool (*is_tray_open)(void *opaque);
+
 /*
  * Is the virtual medium locked into the device?
  * Device models implement this only when device has such a lock.
  */
 bool (*is_medium_locked)(void *opaque);
+
+/*
+ * I/O API functions. These functions are thread-safe.
+ *
+ * See include/block/block-io.h for more information about
+ * the I/O API.
+ */
+
+/*
+ * Is the virtual tray open?
+ * Device models implement this only when the device has a tray.
+ */
+bool (*is_tray_open)(void *opaque);
+
 /*
  * Runs when the size changed (e.g. monitor command block_resize)
  */
-- 
2.27.0

Re: [PATCH v5 07/18] target/riscv: setup everything so that riscv128-softmmu compiles

2021-11-23 Thread Philippe Mathieu-Daudé

Hi Frédéric,

On 11/24/21 07:55, Frédéric Pétrot wrote:
> On 24/11/2021 07:12, Alistair Francis wrote:
>> On Sat, Nov 13, 2021 at 1:16 AM Frédéric Pétrot
>>  wrote:
>>>
>>> This patch is kind of a mess because several files have to be slightly
>>> modified to allow for a new target. In the current status, we have done
>>> our best to have RV64 and RV128 under the same RV64 umbrella, but there
>>> is still work to do to have a single executable for both.
>>> In particular, we have no atomic accesses for aligned 128-bit addresses.
>>>
>>> Once this patch applied, adding risc128-sofmmu to --target-list produces
>>> a (no so useful yet) executable.
>>
>> I can't remember if we discussed this before, but do we need the
>> riscv128-sofmmu executable? Can we instead just use a riscv64-sofmmu
>> executable?
> 
>   Hello Alistair,
>   Richard was also advocating for a single executable, but pointed out that
>   we need to disable mttcg because there is a need for specific tcg
> support for
>   128-bit aligned atomics.
>   Given my understanding of that part of QEMU, I choose the easy way to
> disable
>   it once and for all at compile time until we have that.


In rv128_base_cpu_init():

  if (qemu_tcg_mttcg_enabled) {
  /* Missing 128-bit aligned atomics */
  error_report("128-bit RISC-V currently does not work"
   " with Multi Threaded TCG. Please use:"
   " -accel tcg,thread=single");
  exit(EXIT_FAILURE);
  }

Regards,

Phil.

Re: [PATCH v1 4/7] plugins/meson.build: fix linker issue with weird paths

2021-11-23 Thread Philippe Mathieu-Daudé

On 11/23/21 21:57, Alex Bennée wrote:
> Signed-off-by: Alex Bennée 
> Tested-by: Stefan Weil 
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/712
> ---
>  plugins/meson.build | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH v1 3/7] tests/avocado: fix tcg_plugin mem access count test

2021-11-23 Thread Philippe Mathieu-Daudé

On 11/23/21 21:57, Alex Bennée wrote:
> When we cleaned up argument handling the test was missed.
> 
> Fixes: 5ae589faad ("tests/plugins/mem: introduce "track" arg and make args 
> not positional")
> Signed-off-by: Alex Bennée 
> ---
>  tests/avocado/tcg_plugins.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé

[PATCH v5 23/31] block_int-common.h: split function pointers in BdrvChildClass

2021-11-23 Thread Emanuele Giuseppe Esposito

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/block/block_int-common.h | 67 +++-
 1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 0e63dc694f..3ceb2365a8 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -817,12 +817,16 @@ struct BdrvChildClass {
  */
 bool parent_is_bds;
 
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
 void (*inherit_options)(BdrvChildRole role, bool parent_is_format,
 int *child_flags, QDict *child_options,
 int parent_flags, QDict *parent_options);
-
 void (*change_media)(BdrvChild *child, bool load);
-void (*resize)(BdrvChild *child);
 
 /*
  * Returns a name that is supposedly more useful for human users than the
@@ -839,6 +843,40 @@ struct BdrvChildClass {
  */
 char *(*get_parent_desc)(BdrvChild *child);
 
+/*
+ * Notifies the parent that the child has been activated/inactivated (e.g.
+ * when migration is completing) and it can start/stop requesting
+ * permissions and doing I/O on it.
+ */
+void (*activate)(BdrvChild *child, Error **errp);
+int (*inactivate)(BdrvChild *child);
+
+void (*attach)(BdrvChild *child);
+void (*detach)(BdrvChild *child);
+
+/*
+ * Notifies the parent that the filename of its child has changed (e.g.
+ * because the direct child was removed from the backing chain), so that it
+ * can update its reference.
+ */
+int (*update_filename)(BdrvChild *child, BlockDriverState *new_base,
+   const char *filename, Error **errp);
+
+bool (*can_set_aio_ctx)(BdrvChild *child, AioContext *ctx,
+GSList **ignore, Error **errp);
+void (*set_aio_ctx)(BdrvChild *child, AioContext *ctx, GSList **ignore);
+
+AioContext *(*get_parent_aio_context)(BdrvChild *child);
+
+/*
+ * I/O API functions. These functions are thread-safe.
+ *
+ * See include/block/block-io.h for more information about
+ * the I/O API.
+ */
+
+void (*resize)(BdrvChild *child);
+
 /*
  * If this pair of functions is implemented, the parent doesn't issue new
  * requests after returning from .drained_begin() until .drained_end() is
@@ -863,31 +901,6 @@ struct BdrvChildClass {
  * activity on the child has stopped.
  */
 bool (*drained_poll)(BdrvChild *child);
-
-/*
- * Notifies the parent that the child has been activated/inactivated (e.g.
- * when migration is completing) and it can start/stop requesting
- * permissions and doing I/O on it.
- */
-void (*activate)(BdrvChild *child, Error **errp);
-int (*inactivate)(BdrvChild *child);
-
-void (*attach)(BdrvChild *child);
-void (*detach)(BdrvChild *child);
-
-/*
- * Notifies the parent that the filename of its child has changed (e.g.
- * because the direct child was removed from the backing chain), so that it
- * can update its reference.
- */
-int (*update_filename)(BdrvChild *child, BlockDriverState *new_base,
-   const char *filename, Error **errp);
-
-bool (*can_set_aio_ctx)(BdrvChild *child, AioContext *ctx,
-GSList **ignore, Error **errp);
-void (*set_aio_ctx)(BdrvChild *child, AioContext *ctx, GSList **ignore);
-
-AioContext *(*get_parent_aio_context)(BdrvChild *child);
 };
 
 extern const BdrvChildClass child_of_bds;
-- 
2.27.0

[PATCH v5 22/31] block_int-common.h: assertion in the callers of BlockDriver function pointers

2021-11-23 Thread Emanuele Giuseppe Esposito

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
 block.c| 18 ++
 block/create.c | 10 ++
 2 files changed, 28 insertions(+)

diff --git a/block.c b/block.c
index b77ab0a104..180884b8c0 100644
--- a/block.c
+++ b/block.c
@@ -526,6 +526,7 @@ static void coroutine_fn bdrv_create_co_entry(void *opaque)
 
 CreateCo *cco = opaque;
 assert(cco->drv);
+assert(qemu_in_main_thread());
 
 ret = cco->drv->bdrv_co_create_opts(cco->drv,
 cco->filename, cco->opts, &local_err);
@@ -1090,6 +1091,7 @@ int refresh_total_sectors(BlockDriverState *bs, int64_t 
hint)
 static void bdrv_join_options(BlockDriverState *bs, QDict *options,
   QDict *old_options)
 {
+assert(qemu_in_main_thread());
 if (bs->drv && bs->drv->bdrv_join_options) {
 bs->drv->bdrv_join_options(options, old_options);
 } else {
@@ -1598,6 +1600,7 @@ static int bdrv_open_driver(BlockDriverState *bs, 
BlockDriver *drv,
 {
 Error *local_err = NULL;
 int i, ret;
+assert(qemu_in_main_thread());
 
 bdrv_assign_node_name(bs, node_name, &local_err);
 if (local_err) {
@@ -1989,6 +1992,8 @@ static int bdrv_fill_options(QDict **options, const char 
*filename,
 BlockDriver *drv = NULL;
 Error *local_err = NULL;
 
+assert(qemu_in_main_thread());
+
 /*
  * Caution: while qdict_get_try_str() is fine, getting non-string
  * types would require more care.  When @options come from
@@ -2182,6 +2187,7 @@ static void bdrv_child_perm(BlockDriverState *bs, 
BlockDriverState *child_bs,
 uint64_t *nperm, uint64_t *nshared)
 {
 assert(bs->drv && bs->drv->bdrv_child_perm);
+assert(qemu_in_main_thread());
 bs->drv->bdrv_child_perm(bs, c, role, reopen_queue,
  parent_perm, parent_shared,
  nperm, nshared);
@@ -2267,6 +2273,7 @@ static void bdrv_drv_set_perm_commit(void *opaque)
 {
 BlockDriverState *bs = opaque;
 uint64_t cumulative_perms, cumulative_shared_perms;
+assert(qemu_in_main_thread());
 
 if (bs->drv->bdrv_set_perm) {
 bdrv_get_cumulative_perm(bs, &cumulative_perms,
@@ -2278,6 +2285,7 @@ static void bdrv_drv_set_perm_commit(void *opaque)
 static void bdrv_drv_set_perm_abort(void *opaque)
 {
 BlockDriverState *bs = opaque;
+assert(qemu_in_main_thread());
 
 if (bs->drv->bdrv_abort_perm_update) {
 bs->drv->bdrv_abort_perm_update(bs);
@@ -2293,6 +2301,7 @@ static int bdrv_drv_set_perm(BlockDriverState *bs, 
uint64_t perm,
  uint64_t shared_perm, Transaction *tran,
  Error **errp)
 {
+assert(qemu_in_main_thread());
 if (!bs->drv) {
 return 0;
 }
@@ -4357,6 +4366,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, 
Error **errp)
 
 assert(qemu_get_current_aio_context() == qemu_get_aio_context());
 assert(bs_queue != NULL);
+assert(qemu_in_main_thread());
 
 QTAILQ_FOREACH(bs_entry, bs_queue, entry) {
 ctx = bdrv_get_aio_context(bs_entry->state.bs);
@@ -4622,6 +4632,7 @@ static int bdrv_reopen_prepare(BDRVReopenState 
*reopen_state,
 
 assert(reopen_state != NULL);
 assert(reopen_state->bs->drv != NULL);
+assert(qemu_in_main_thread());
 drv = reopen_state->bs->drv;
 
 /* This function and each driver's bdrv_reopen_prepare() remove
@@ -4832,6 +4843,7 @@ static void bdrv_reopen_commit(BDRVReopenState 
*reopen_state)
 bs = reopen_state->bs;
 drv = bs->drv;
 assert(drv != NULL);
+assert(qemu_in_main_thread());
 
 /* If there are any driver level actions to take */
 if (drv->bdrv_reopen_commit) {
@@ -4873,6 +4885,7 @@ static void bdrv_reopen_abort(BDRVReopenState 
*reopen_state)
 assert(reopen_state != NULL);
 drv = reopen_state->bs->drv;
 assert(drv != NULL);
+assert(qemu_in_main_thread());
 
 if (drv->bdrv_reopen_abort) {
 drv->bdrv_reopen_abort(reopen_state);
@@ -4886,6 +4899,7 @@ static void bdrv_close(BlockDriverState *bs)
 BdrvChild *child, *next;
 
 assert(!bs->refcnt);
+assert(qemu_in_main_thread());
 
 bdrv_drained_begin(bs); /* complete I/O */
 bdrv_flush(bs);
@@ -6671,6 +6685,8 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
 int ret;
 uint64_t cumulative_perms, cumulative_shared_perms;
 
+assert(qemu_in_main_thread());
+
 if (!bs->drv) {
 return -ENOMEDIUM;
 }
@@ -7180,6 +7196,7 @@ static void bdrv_detach_aio_context(BlockDriverState *bs)
 BdrvAioNotifier *baf, *baf_tmp;
 
 assert(!bs->walking_aio_notifiers);
+assert(qemu_in_main_thread());
 bs->walking_aio_notifiers = true;
 QLIST_FOREACH_SAFE(baf, &bs->aio_notifiers, list, baf_tmp) {
 if (baf->deleted) {
@@ -7207,6 +7224,7 @@ static void bdrv_attach_aio_context(BlockDriverState *bs,
 Aio

Re: [PATCH] microvm: use MachineState->dumpdtb

2021-11-23 Thread Igor Mammedov

On Tue, 23 Nov 2021 10:16:48 +0100
Gerd Hoffmann  wrote:

> There already is a machine property to dumb the device tree for
s/dumb/dump/


> debugging purposes, and the helper function qemu_fdt_dumpdtb()
> implementing the dumbing.  Make microvm use it for consistency.
ditto

> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  hw/i386/microvm-dt.c | 9 +
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/hw/i386/microvm-dt.c b/hw/i386/microvm-dt.c
> index 875ba9196394..e6f5a90209c5 100644
> --- a/hw/i386/microvm-dt.c
> +++ b/hw/i386/microvm-dt.c
> @@ -330,12 +330,5 @@ void dt_setup_microvm(MicrovmMachineState *mms)
>  fprintf(stderr, "%s: add etc/fdt to fw_cfg\n", __func__);
>  fw_cfg_add_file(x86ms->fw_cfg, "etc/fdt", mms->fdt, size);
>  
> -if (debug) {
> -fprintf(stderr, "%s: writing microvm.fdt\n", __func__);
> -g_file_set_contents("microvm.fdt", mms->fdt, size, NULL);
> -int ret = system("dtc -I dtb -O dts microvm.fdt");
> -if (ret != 0) {
> -fprintf(stderr, "%s: oops, dtc not installed?\n", __func__);
> -}
> -}
> +qemu_fdt_dumpdtb(mms->fdt, size);
>  }

Re: [PATCH v5 07/18] target/riscv: setup everything so that riscv128-softmmu compiles

2021-11-23 Thread Frédéric Pétrot


On 24/11/2021 07:12, Alistair Francis wrote:

On Sat, Nov 13, 2021 at 1:16 AM Frédéric Pétrot
 wrote:


This patch is kind of a mess because several files have to be slightly
modified to allow for a new target. In the current status, we have done
our best to have RV64 and RV128 under the same RV64 umbrella, but there
is still work to do to have a single executable for both.
In particular, we have no atomic accesses for aligned 128-bit addresses.

Once this patch applied, adding risc128-sofmmu to --target-list produces
a (no so useful yet) executable.


I can't remember if we discussed this before, but do we need the
riscv128-sofmmu executable? Can we instead just use a riscv64-sofmmu
executable?


  Hello Alistair,
  Richard was also advocating for a single executable, but pointed out that
  we need to disable mttcg because there is a need for specific tcg support for
  128-bit aligned atomics.
  Given my understanding of that part of QEMU, I choose the easy way to disable
  it once and for all at compile time until we have that.

  Frédéric



Alistair



Signed-off-by: Frédéric Pétrot 
Co-authored-by: Fabien Portas 
---
  configs/devices/riscv128-softmmu/default.mak | 17 +++
  configs/targets/riscv128-softmmu.mak |  6 ++
  include/disas/dis-asm.h  |  1 +
  include/hw/riscv/sifive_cpu.h|  3 +++
  target/riscv/cpu-param.h |  5 +
  target/riscv/cpu.h   |  3 +++
  disas/riscv.c|  5 +
  target/riscv/cpu.c   | 22 ++--
  target/riscv/gdbstub.c   |  8 +++
  target/riscv/insn_trans/trans_rvd.c.inc  | 12 +--
  target/riscv/insn_trans/trans_rvf.c.inc  |  6 +++---
  target/riscv/Kconfig |  3 +++
  12 files changed, 80 insertions(+), 11 deletions(-)
  create mode 100644 configs/devices/riscv128-softmmu/default.mak
  create mode 100644 configs/targets/riscv128-softmmu.mak

diff --git a/configs/devices/riscv128-softmmu/default.mak 
b/configs/devices/riscv128-softmmu/default.mak
new file mode 100644
index 00..e838f35785
--- /dev/null
+++ b/configs/devices/riscv128-softmmu/default.mak
@@ -0,0 +1,17 @@
+# Default configuration for riscv128-softmmu
+
+# Uncomment the following lines to disable these optional devices:
+#
+#CONFIG_PCI_DEVICES=n
+# No does not seem to be an option for these two parameters
+CONFIG_SEMIHOSTING=y
+CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
+
+# Boards:
+#
+CONFIG_SPIKE=n
+CONFIG_SIFIVE_E=n
+CONFIG_SIFIVE_U=n
+CONFIG_RISCV_VIRT=y
+CONFIG_MICROCHIP_PFSOC=n
+CONFIG_SHAKTI_C=n
diff --git a/configs/targets/riscv128-softmmu.mak 
b/configs/targets/riscv128-softmmu.mak
new file mode 100644
index 00..d812cc1c80
--- /dev/null
+++ b/configs/targets/riscv128-softmmu.mak
@@ -0,0 +1,6 @@
+TARGET_ARCH=riscv128
+TARGET_BASE_ARCH=riscv
+# As long as we have no atomic accesses for aligned 128-bit addresses
+TARGET_SUPPORTS_MTTCG=n
+TARGET_XML_FILES=gdb-xml/riscv-64bit-cpu.xml gdb-xml/riscv-32bit-fpu.xml 
gdb-xml/riscv-64bit-fpu.xml gdb-xml/riscv-64bit-virtual.xml
+TARGET_NEED_FDT=y
diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index 08e1beec85..102a1e7f50 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -459,6 +459,7 @@ int print_insn_nios2(bfd_vma, disassemble_info*);
  int print_insn_xtensa   (bfd_vma, disassemble_info*);
  int print_insn_riscv32  (bfd_vma, disassemble_info*);
  int print_insn_riscv64  (bfd_vma, disassemble_info*);
+int print_insn_riscv128 (bfd_vma, disassemble_info*);
  int print_insn_rx(bfd_vma, disassemble_info *);
  int print_insn_hexagon(bfd_vma, disassemble_info *);

diff --git a/include/hw/riscv/sifive_cpu.h b/include/hw/riscv/sifive_cpu.h
index 136799633a..64078feba8 100644
--- a/include/hw/riscv/sifive_cpu.h
+++ b/include/hw/riscv/sifive_cpu.h
@@ -26,6 +26,9 @@
  #elif defined(TARGET_RISCV64)
  #define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E51
  #define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U54
+#else
+#define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E51
+#define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U54
  #endif

  #endif /* HW_SIFIVE_CPU_H */
diff --git a/target/riscv/cpu-param.h b/target/riscv/cpu-param.h
index 80eb615f93..c10459b56f 100644
--- a/target/riscv/cpu-param.h
+++ b/target/riscv/cpu-param.h
@@ -16,6 +16,11 @@
  # define TARGET_LONG_BITS 32
  # define TARGET_PHYS_ADDR_SPACE_BITS 34 /* 22-bit PPN */
  # define TARGET_VIRT_ADDR_SPACE_BITS 32 /* sv32 */
+#else
+/* 64-bit target, since QEMU isn't built to have TARGET_LONG_BITS over 64 */
+# define TARGET_LONG_BITS 64
+# define TARGET_PHYS_ADDR_SPACE_BITS 56 /* 44-bit PPN */
+# define TARGET_VIRT_ADDR_SPACE_BITS 48 /* sv48 */
  #endif
  #define TARGET_PAGE_BITS 12 /* 4 KiB Pages */
  /*
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 53a295efb7..8ff5b08d15 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu

[PATCH v5 18/31] include/block/snapshot: global state API + assertions

2021-11-23 Thread Emanuele Giuseppe Esposito

Snapshots run also under the BQL lock, so they all are
in the global state API. The aiocontext lock that they hold
is currently an overkill and in future could be removed.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
 include/block/snapshot.h | 13 +++--
 block/snapshot.c | 28 
 migration/savevm.c   |  2 ++
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/block/snapshot.h b/include/block/snapshot.h
index 940345692f..cda82c1ba1 100644
--- a/include/block/snapshot.h
+++ b/include/block/snapshot.h
@@ -45,6 +45,13 @@ typedef struct QEMUSnapshotInfo {
 uint64_t icount; /* record/replay step */
 } QEMUSnapshotInfo;
 
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
 int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo *sn_info,
const char *name);
 bool bdrv_snapshot_find_by_id_and_name(BlockDriverState *bs,
@@ -73,9 +80,11 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState 
*bs,
  Error **errp);
 
 
-/* Group operations. All block drivers are involved.
+/*
+ * Group operations. All block drivers are involved.
  * These functions will properly handle dataplane (take aio_context_acquire
- * when appropriate for appropriate block drivers */
+ * when appropriate for appropriate block drivers
+ */
 
 bool bdrv_all_can_snapshot(bool has_devices, strList *devices,
Error **errp);
diff --git a/block/snapshot.c b/block/snapshot.c
index ccacda8bd5..3dd3c9d3bd 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -57,6 +57,8 @@ int bdrv_snapshot_find(BlockDriverState *bs, QEMUSnapshotInfo 
*sn_info,
 QEMUSnapshotInfo *sn_tab, *sn;
 int nb_sns, i, ret;
 
+assert(qemu_in_main_thread());
+
 ret = -ENOENT;
 nb_sns = bdrv_snapshot_list(bs, &sn_tab);
 if (nb_sns < 0) {
@@ -105,6 +107,7 @@ bool bdrv_snapshot_find_by_id_and_name(BlockDriverState *bs,
 bool ret = false;
 
 assert(id || name);
+assert(qemu_in_main_thread());
 
 nb_sns = bdrv_snapshot_list(bs, &sn_tab);
 if (nb_sns < 0) {
@@ -200,6 +203,7 @@ static BlockDriverState 
*bdrv_snapshot_fallback(BlockDriverState *bs)
 int bdrv_can_snapshot(BlockDriverState *bs)
 {
 BlockDriver *drv = bs->drv;
+assert(qemu_in_main_thread());
 if (!drv || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs)) {
 return 0;
 }
@@ -220,6 +224,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
 {
 BlockDriver *drv = bs->drv;
 BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);
+
+assert(qemu_in_main_thread());
+
 if (!drv) {
 return -ENOMEDIUM;
 }
@@ -240,6 +247,8 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 BdrvChild **fallback_ptr;
 int ret, open_ret;
 
+assert(qemu_in_main_thread());
+
 if (!drv) {
 error_setg(errp, "Block driver is closed");
 return -ENOMEDIUM;
@@ -348,6 +357,8 @@ int bdrv_snapshot_delete(BlockDriverState *bs,
 BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);
 int ret;
 
+assert(qemu_in_main_thread());
+
 if (!drv) {
 error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, bdrv_get_device_name(bs));
 return -ENOMEDIUM;
@@ -380,6 +391,8 @@ int bdrv_snapshot_list(BlockDriverState *bs,
 {
 BlockDriver *drv = bs->drv;
 BlockDriverState *fallback_bs = bdrv_snapshot_fallback(bs);
+
+assert(qemu_in_main_thread());
 if (!drv) {
 return -ENOMEDIUM;
 }
@@ -419,6 +432,8 @@ int bdrv_snapshot_load_tmp(BlockDriverState *bs,
 {
 BlockDriver *drv = bs->drv;
 
+assert(qemu_in_main_thread());
+
 if (!drv) {
 error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, bdrv_get_device_name(bs));
 return -ENOMEDIUM;
@@ -447,6 +462,8 @@ int bdrv_snapshot_load_tmp_by_id_or_name(BlockDriverState 
*bs,
 int ret;
 Error *local_err = NULL;
 
+assert(qemu_in_main_thread());
+
 ret = bdrv_snapshot_load_tmp(bs, id_or_name, NULL, &local_err);
 if (ret == -ENOENT || ret == -EINVAL) {
 error_free(local_err);
@@ -515,6 +532,8 @@ bool bdrv_all_can_snapshot(bool has_devices, strList 
*devices,
 g_autoptr(GList) bdrvs = NULL;
 GList *iterbdrvs;
 
+assert(qemu_in_main_thread());
+
 if (bdrv_all_get_snapshot_devices(has_devices, devices, &bdrvs, errp) < 0) 
{
 return false;
 }
@@ -549,6 +568,8 @@ int bdrv_all_delete_snapshot(const char *name,
 g_autoptr(GList) bdrvs = NULL;
 GList *iterbdrvs;
 
+assert(qemu_in_main_thread());
+
 if (bdrv_all_get_snapshot_devices(has_devices, devices, &bdrvs, errp) < 0) 
{
 return -1;
 }
@@ -588,6 +609,8 @@ int bdrv_all_goto_snapshot(const char *name,
 g_autoptr(GList) bdrvs = NULL;
 GList *iterbdrvs;
 
+assert(qemu_in_main_t

[PATCH v5 30/31] crypto: delegate permission functions to JobDriver .pre_run

2021-11-23 Thread Emanuele Giuseppe Esposito

block_crypto_amend_options_generic_luks uses the block layer
permission API, therefore it should be called with the BQL held.

However, the same function is being called ib two BlockDriver
callbacks: bdrv_amend_options (under BQL) and bdrv_co_amend (I/O).

The latter is I/O because it is invoked by block/amend.c's
blockdev_amend_run(), a .run callback of the amend JobDriver

Therefore we want to 1) change block_crypto_amend_options_generic_luks
to use the permission API only when the BQL is held, and
2) use the .pre_run JobDriver callback to check for
permissions before switching to the job aiocontext. This has also
the benefit of applying the same permission operation to all
amend implementations, not only luks.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/amend.c  | 20 
 block/crypto.c | 18 --
 2 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/block/amend.c b/block/amend.c
index 392df9ef83..fba6add51a 100644
--- a/block/amend.c
+++ b/block/amend.c
@@ -53,10 +53,30 @@ static int coroutine_fn blockdev_amend_run(Job *job, Error 
**errp)
 return ret;
 }
 
+static int blockdev_amend_refresh_perms(Job *job, Error **errp)
+{
+BlockdevAmendJob *s = container_of(job, BlockdevAmendJob, common);
+
+return bdrv_child_refresh_perms(s->bs, s->bs->file, errp);
+}
+
+static int blockdev_amend_pre_run(Job *job, Error **errp)
+{
+return blockdev_amend_refresh_perms(job, errp);
+}
+
+static void blockdev_amend_clean(Job *job)
+{
+Error *errp;
+blockdev_amend_refresh_perms(job, &errp);
+}
+
 static const JobDriver blockdev_amend_job_driver = {
 .instance_size = sizeof(BlockdevAmendJob),
 .job_type  = JOB_TYPE_AMEND,
 .run   = blockdev_amend_run,
+.pre_run   = blockdev_amend_pre_run,
+.clean = blockdev_amend_clean,
 };
 
 void qmp_x_blockdev_amend(const char *job_id,
diff --git a/block/crypto.c b/block/crypto.c
index c8ba4681e2..82f154516c 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -780,6 +780,7 @@ block_crypto_get_specific_info_luks(BlockDriverState *bs, 
Error **errp)
 static int
 block_crypto_amend_options_generic_luks(BlockDriverState *bs,
 QCryptoBlockAmendOptions 
*amend_options,
+bool under_bql,
 bool force,
 Error **errp)
 {
@@ -791,9 +792,12 @@ block_crypto_amend_options_generic_luks(BlockDriverState 
*bs,
 
 /* apply for exclusive read/write permissions to the underlying file*/
 crypto->updating_keys = true;
-ret = bdrv_child_refresh_perms(bs, bs->file, errp);
-if (ret) {
-goto cleanup;
+
+if (under_bql) {
+ret = bdrv_child_refresh_perms(bs, bs->file, errp);
+if (ret) {
+goto cleanup;
+}
 }
 
 ret = qcrypto_block_amend_options(crypto->block,
@@ -806,7 +810,9 @@ block_crypto_amend_options_generic_luks(BlockDriverState 
*bs,
 cleanup:
 /* release exclusive read/write permissions to the underlying file*/
 crypto->updating_keys = false;
-bdrv_child_refresh_perms(bs, bs->file, errp);
+if (under_bql) {
+bdrv_child_refresh_perms(bs, bs->file, errp);
+}
 return ret;
 }
 
@@ -834,7 +840,7 @@ block_crypto_amend_options_luks(BlockDriverState *bs,
 goto cleanup;
 }
 ret = block_crypto_amend_options_generic_luks(bs, amend_options,
-  force, errp);
+  true, force, errp);
 cleanup:
 qapi_free_QCryptoBlockAmendOptions(amend_options);
 return ret;
@@ -853,7 +859,7 @@ coroutine_fn block_crypto_co_amend_luks(BlockDriverState 
*bs,
 .u.luks = *qapi_BlockdevAmendOptionsLUKS_base(&opts->u.luks),
 };
 return block_crypto_amend_options_generic_luks(bs, &amend_opts,
-   force, errp);
+   false, force, errp);
 }
 
 static void
-- 
2.27.0

[PATCH v5 31/31] block.c: assertions to the block layer permissions API

2021-11-23 Thread Emanuele Giuseppe Esposito

Now that we "covered" the three main cases where the
permission API was being used under BQL (fuse,
amend and invalidate_cache), we can safely assert for
the permission functions implemented in block.c

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block.c b/block.c
index 805974676b..6056ec4bc5 100644
--- a/block.c
+++ b/block.c
@@ -2138,6 +2138,7 @@ static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, 
Error **errp)
 
 assert(a->bs);
 assert(a->bs == b->bs);
+assert(qemu_in_main_thread());
 
 if ((b->perm & a->shared_perm) == b->perm) {
 return true;
@@ -2161,6 +2162,7 @@ static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, 
Error **errp)
 static bool bdrv_parent_perms_conflict(BlockDriverState *bs, Error **errp)
 {
 BdrvChild *a, *b;
+assert(qemu_in_main_thread());
 
 /*
  * During the loop we'll look at each pair twice. That's correct because
@@ -2245,6 +2247,8 @@ static void bdrv_child_set_perm_abort(void *opaque)
 {
 BdrvChildSetPermState *s = opaque;
 
+assert(qemu_in_main_thread());
+
 s->child->perm = s->old_perm;
 s->child->shared_perm = s->old_shared_perm;
 }
@@ -2258,6 +2262,7 @@ static void bdrv_child_set_perm(BdrvChild *c, uint64_t 
perm,
 uint64_t shared, Transaction *tran)
 {
 BdrvChildSetPermState *s = g_new(BdrvChildSetPermState, 1);
+assert(qemu_in_main_thread());
 
 *s = (BdrvChildSetPermState) {
 .child = c,
@@ -2442,6 +2447,7 @@ static int bdrv_node_refresh_perm(BlockDriverState *bs, 
BlockReopenQueue *q,
 BdrvChild *c;
 int ret;
 uint64_t cumulative_perms, cumulative_shared_perms;
+assert(qemu_in_main_thread());
 
 bdrv_get_cumulative_perm(bs, &cumulative_perms, &cumulative_shared_perms);
 
@@ -2510,6 +2516,7 @@ static int bdrv_list_refresh_perms(GSList *list, 
BlockReopenQueue *q,
 {
 int ret;
 BlockDriverState *bs;
+assert(qemu_in_main_thread());
 
 for ( ; list; list = list->next) {
 bs = list->data;
@@ -2582,6 +2589,7 @@ static int bdrv_refresh_perms(BlockDriverState *bs, Error 
**errp)
 int ret;
 Transaction *tran = tran_new();
 g_autoptr(GSList) list = bdrv_topological_dfs(NULL, NULL, bs);
+assert(qemu_in_main_thread());
 
 ret = bdrv_list_refresh_perms(list, NULL, tran, errp);
 tran_finalize(tran, ret);
@@ -2648,6 +2656,7 @@ static void bdrv_filter_default_perms(BlockDriverState 
*bs, BdrvChild *c,
   uint64_t perm, uint64_t shared,
   uint64_t *nperm, uint64_t *nshared)
 {
+assert(qemu_in_main_thread());
 *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
 *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | DEFAULT_PERM_UNCHANGED;
 }
@@ -2659,6 +2668,7 @@ static void bdrv_default_perms_for_cow(BlockDriverState 
*bs, BdrvChild *c,
uint64_t *nperm, uint64_t *nshared)
 {
 assert(role & BDRV_CHILD_COW);
+assert(qemu_in_main_thread());
 
 /*
  * We want consistent read from backing files if the parent needs it.
@@ -2696,6 +2706,7 @@ static void 
bdrv_default_perms_for_storage(BlockDriverState *bs, BdrvChild *c,
 {
 int flags;
 
+assert(qemu_in_main_thread());
 assert(role & (BDRV_CHILD_METADATA | BDRV_CHILD_DATA));
 
 flags = bdrv_reopen_get_flags(reopen_queue, bs);
@@ -6094,6 +6105,7 @@ static void xdbg_graph_add_edge(XDbgBlockGraphConstructor 
*gr, void *parent,
 {
 BlockPermission qapi_perm;
 XDbgBlockGraphEdge *edge;
+assert(qemu_in_main_thread());
 
 edge = g_new0(XDbgBlockGraphEdge, 1);
 
-- 
2.27.0

[PATCH v5 21/31] block_int-common.h: split function pointers in BlockDriver

2021-11-23 Thread Emanuele Giuseppe Esposito

Similar to the header split, also the function pointers in BlockDriver
can be split in I/O and global state.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/block/block_int-common.h | 440 +--
 1 file changed, 235 insertions(+), 205 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 70534f94ae..0e63dc694f 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -96,6 +96,11 @@ typedef struct BdrvTrackedRequest {
 
 
 struct BlockDriver {
+/*
+ * These fields are initialized when this object is created,
+ * and are never changed afterwards.
+ */
+
 const char *format_name;
 int instance_size;
 
@@ -121,23 +126,7 @@ struct BlockDriver {
  * on those children.
  */
 bool is_format;
-/*
- * Return true if @to_replace can be replaced by a BDS with the
- * same data as @bs without it affecting @bs's behavior (that is,
- * without it being visible to @bs's parents).
- */
-bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
- BlockDriverState *to_replace);
-
-int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
-int (*bdrv_probe_device)(const char *filename);
 
-/*
- * Any driver implementing this callback is expected to be able to handle
- * NULL file names in its .bdrv_open() implementation.
- */
-void (*bdrv_parse_filename)(const char *filename, QDict *options,
-Error **errp);
 /*
  * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
  * this field set to true, except ones that are defined only by their
@@ -159,7 +148,66 @@ struct BlockDriver {
  */
 bool supports_backing;
 
-/* For handling image reopen for split or non-split files */
+bool has_variable_length;
+
+/*
+ * Drivers setting this field must be able to work with just a plain
+ * filename with ':' as a prefix, and no other options.
+ * Options may be extracted from the filename by implementing
+ * bdrv_parse_filename.
+ */
+const char *protocol_name;
+
+/* List of options for creating images, terminated by name == NULL */
+QemuOptsList *create_opts;
+
+/* List of options for image amend */
+QemuOptsList *amend_opts;
+
+/*
+ * If this driver supports reopening images this contains a
+ * NULL-terminated list of the runtime options that can be
+ * modified. If an option in this list is unspecified during
+ * reopen then it _must_ be reset to its default value or return
+ * an error.
+ */
+const char *const *mutable_opts;
+
+/*
+ * Pointer to a NULL-terminated array of names of strong options
+ * that can be specified for bdrv_open(). A strong option is one
+ * that changes the data of a BDS.
+ * If this pointer is NULL, the array is considered empty.
+ * "filename" and "driver" are always considered strong.
+ */
+const char *const *strong_runtime_opts;
+
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
+/*
+ * Return true if @to_replace can be replaced by a BDS with the
+ * same data as @bs without it affecting @bs's behavior (that is,
+ * without it being visible to @bs's parents).
+ */
+bool (*bdrv_recurse_can_replace)(BlockDriverState *bs,
+ BlockDriverState *to_replace);
+
+int (*bdrv_probe)(const uint8_t *buf, int buf_size, const char *filename);
+int (*bdrv_probe_device)(const char *filename);
+
+/*
+ * Any driver implementing this callback is expected to be able to handle
+ * NULL file names in its .bdrv_open() implementation.
+ */
+void (*bdrv_parse_filename)(const char *filename, QDict *options,
+Error **errp);
+
+/* For handling image reopen for split or non-split files. */
 int (*bdrv_reopen_prepare)(BDRVReopenState *reopen_state,
BlockReopenQueue *queue, Error **errp);
 void (*bdrv_reopen_commit)(BDRVReopenState *reopen_state);
@@ -175,7 +223,6 @@ struct BlockDriver {
   Error **errp);
 void (*bdrv_close)(BlockDriverState *bs);
 
-
 int coroutine_fn (*bdrv_co_create)(BlockdevCreateOptions *opts,
Error **errp);
 int coroutine_fn (*bdrv_co_create_opts)(BlockDriver *drv,
@@ -183,11 +230,6 @@ struct BlockDriver {
 QemuOpts *opts,
 Error **errp);
 
-int coroutine_fn (*bdrv_co_amend)(BlockDriverState *bs,
-  BlockdevAmendOptions *opts,
-  bool force,
-

[PATCH v5 08/31] assertions for block_int global state API

2021-11-23 Thread Emanuele Giuseppe Esposito

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block.c | 15 +++
 block/backup.c  |  1 +
 block/block-backend.c   |  3 +++
 block/commit.c  |  2 ++
 block/dirty-bitmap.c|  1 +
 block/io.c  |  1 +
 block/mirror.c  |  4 
 block/monitor/bitmap-qmp-cmds.c |  6 ++
 block/stream.c  |  2 ++
 blockdev.c  |  7 +++
 10 files changed, 42 insertions(+)

diff --git a/block.c b/block.c
index 49bee69e27..198ec636ff 100644
--- a/block.c
+++ b/block.c
@@ -662,6 +662,8 @@ int coroutine_fn bdrv_co_create_opts_simple(BlockDriver 
*drv,
 Error *local_err = NULL;
 int ret;
 
+assert(qemu_in_main_thread());
+
 size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
 buf = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
 prealloc = qapi_enum_parse(&PreallocMode_lookup, buf,
@@ -2492,6 +2494,8 @@ void bdrv_get_cumulative_perm(BlockDriverState *bs, 
uint64_t *perm,
 uint64_t cumulative_perms = 0;
 uint64_t cumulative_shared_perms = BLK_PERM_ALL;
 
+assert(qemu_in_main_thread());
+
 QLIST_FOREACH(c, &bs->parents, next_parent) {
 cumulative_perms |= c->perm;
 cumulative_shared_perms &= c->shared_perm;
@@ -2552,6 +2556,8 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, 
uint64_t shared,
 Transaction *tran = tran_new();
 int ret;
 
+assert(qemu_in_main_thread());
+
 bdrv_child_set_perm(c, perm, shared, tran);
 
 ret = bdrv_refresh_perms(c->bs, &local_err);
@@ -2582,6 +2588,8 @@ int bdrv_child_refresh_perms(BlockDriverState *bs, 
BdrvChild *c, Error **errp)
 uint64_t parent_perms, parent_shared;
 uint64_t perms, shared;
 
+assert(qemu_in_main_thread());
+
 bdrv_get_cumulative_perm(bs, &parent_perms, &parent_shared);
 bdrv_child_perm(bs, c->bs, c, c->role, NULL,
 parent_perms, parent_shared, &perms, &shared);
@@ -2724,6 +2732,7 @@ void bdrv_default_perms(BlockDriverState *bs, BdrvChild 
*c,
 uint64_t perm, uint64_t shared,
 uint64_t *nperm, uint64_t *nshared)
 {
+assert(qemu_in_main_thread());
 if (role & BDRV_CHILD_FILTERED) {
 assert(!(role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
  BDRV_CHILD_COW)));
@@ -3084,6 +3093,8 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState 
*child_bs,
 BdrvChild *child = NULL;
 Transaction *tran = tran_new();
 
+assert(qemu_in_main_thread());
+
 ret = bdrv_attach_child_common(child_bs, child_name, child_class,
child_role, perm, shared_perm, opaque,
&child, tran, errp);
@@ -7436,6 +7447,8 @@ bool bdrv_recurse_can_replace(BlockDriverState *bs,
 {
 BlockDriverState *filtered;
 
+assert(qemu_in_main_thread());
+
 if (!bs || !bs->drv) {
 return false;
 }
@@ -7607,6 +7620,7 @@ static bool append_strong_runtime_options(QDict *d, 
BlockDriverState *bs)
  * would result in exactly bs->backing. */
 static bool bdrv_backing_overridden(BlockDriverState *bs)
 {
+assert(qemu_in_main_thread());
 if (bs->backing) {
 return strcmp(bs->auto_backing_file,
   bs->backing->bs->filename);
@@ -7995,6 +8009,7 @@ static BlockDriverState 
*bdrv_do_skip_filters(BlockDriverState *bs,
  */
 BlockDriverState *bdrv_skip_implicit_filters(BlockDriverState *bs)
 {
+assert(qemu_in_main_thread());
 return bdrv_do_skip_filters(bs, true);
 }
 
diff --git a/block/backup.c b/block/backup.c
index 21d5983779..c53041772c 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -372,6 +372,7 @@ BlockJob *backup_job_create(const char *job_id, 
BlockDriverState *bs,
 
 assert(bs);
 assert(target);
+assert(qemu_in_main_thread());
 
 /* QMP interface protects us from these cases */
 assert(sync_mode != MIRROR_SYNC_MODE_INCREMENTAL);
diff --git a/block/block-backend.c b/block/block-backend.c
index 11131ca68c..56670a774f 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1095,6 +1095,7 @@ static void blk_root_change_media(BdrvChild *child, bool 
load)
  */
 bool blk_dev_has_removable_media(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
 return !blk->dev || (blk->dev_ops && blk->dev_ops->change_media_cb);
 }
 
@@ -1112,6 +1113,7 @@ bool blk_dev_has_tray(BlockBackend *blk)
  */
 void blk_dev_eject_request(BlockBackend *blk, bool force)
 {
+assert(qemu_in_main_thread());
 if (blk->dev_ops && blk->dev_ops->eject_request_cb) {
 blk->dev_ops->eject_request_cb(blk->dev_opaque, force);
 }
@@ -1134,6 +1136,7 @@ bool blk_dev_is_tray_open(BlockBackend *blk)
  */
 bool blk_dev_is_medium_locked(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
 if (blk->dev_ops && blk->dev_ops->is_medium_locked) {
 return blk->dev_ops->is_medium_locked(blk->dev_opaqu

[PATCH v5 26/31] job.h: split function pointers in JobDriver

2021-11-23 Thread Emanuele Giuseppe Esposito

The job API will be handled separately in another serie.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 6e67b6977f..4ea7a4a0cd 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -169,6 +169,12 @@ typedef struct Job {
  * Callbacks and other information about a Job driver.
  */
 struct JobDriver {
+
+/*
+ * These fields are initialized when this object is created,
+ * and are never changed afterwards
+ */
+
 /** Derived Job struct size */
 size_t instance_size;
 
@@ -184,9 +190,18 @@ struct JobDriver {
  * aborted. If it returns zero, the job moves into the WAITING state. If it
  * is the last job to complete in its transaction, all jobs in the
  * transaction move from WAITING to PENDING.
+ *
+ * This callback must be run in the job's context.
  */
 int coroutine_fn (*run)(Job *job, Error **errp);
 
+/*
+ * Functions run without regard to the BQL that may run in any
+ * arbitrary thread. These functions do not need to be thread-safe
+ * because the caller ensures that they are invoked from one
+ * thread at time.
+ */
+
 /**
  * If the callback is not NULL, it will be invoked when the job transitions
  * into the paused state.  Paused jobs must not perform any asynchronous
@@ -201,6 +216,13 @@ struct JobDriver {
  */
 void coroutine_fn (*resume)(Job *job);
 
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
 /**
  * Called when the job is resumed by the user (i.e. user_paused becomes
  * false). .user_resume is called before .resume.
-- 
2.27.0

[PATCH v5 29/31] jobs: introduce pre_run function in JobDriver

2021-11-23 Thread Emanuele Giuseppe Esposito

.pre_run takes care of doing some initial job setup just before
the job is run inside the coroutine. Doing so can be useful
to invoke GS functions that require the BQL held.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/qemu/job.h |  9 +
 job.c  | 13 +
 2 files changed, 22 insertions(+)

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 4ea7a4a0cd..915ceff425 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -223,6 +223,15 @@ struct JobDriver {
  * the GS API.
  */
 
+/**
+ * Called in the Main Loop context, before the job coroutine
+ * is started in the job aiocontext. Useful to perform operations
+ * that needs to be done under BQL (like permissions setup).
+ * On success, it returns 0. Otherwise the job failed to initialize
+ * and won't start.
+ */
+int (*pre_run)(Job *job, Error **errp);
+
 /**
  * Called when the job is resumed by the user (i.e. user_paused becomes
  * false). .user_resume is called before .resume.
diff --git a/job.c b/job.c
index bb57ec6887..e048037099 100644
--- a/job.c
+++ b/job.c
@@ -967,11 +967,24 @@ static void coroutine_fn job_co_entry(void *opaque)
 aio_bh_schedule_oneshot(qemu_get_aio_context(), job_exit, job);
 }
 
+static int job_pre_run(Job *job)
+{
+assert(qemu_in_main_thread());
+if (job->driver->pre_run) {
+return job->driver->pre_run(job, &job->err);
+}
+
+return 0;
+}
+
 void job_start(Job *job)
 {
 assert(job && !job_started(job) && job->paused &&
job->driver && job->driver->run);
 job->co = qemu_coroutine_create(job_co_entry, job);
+if (job_pre_run(job)) {
+return;
+}
 job->pause_count--;
 job->busy = true;
 job->paused = false;
-- 
2.27.0

[PATCH v5 19/31] block/copy-before-write.h: global state API + assertions

2021-11-23 Thread Emanuele Giuseppe Esposito

copy-before-write functions always run under BQL lock.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
 block/copy-before-write.h | 7 +++
 block/copy-before-write.c | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 51847e711a..9a45de2fce 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -29,6 +29,13 @@
 #include "block/block_int.h"
 #include "block/block-copy.h"
 
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
 BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
   BlockDriverState *target,
   const char *filter_node_name,
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index c30a5ff8de..36a8d7ba52 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -223,6 +223,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 QDict *opts;
 
 assert(source->total_sectors == target->total_sectors);
+assert(qemu_in_main_thread());
 
 opts = qdict_new();
 qdict_put_str(opts, "driver", "copy-before-write");
@@ -245,6 +246,7 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 
 void bdrv_cbw_drop(BlockDriverState *bs)
 {
+assert(qemu_in_main_thread());
 bdrv_drop_filter(bs, &error_abort);
 bdrv_unref(bs);
 }
-- 
2.27.0

[PATCH v5 10/31] block.c: modify .attach and .detach callbacks of child_of_bds

2021-11-23 Thread Emanuele Giuseppe Esposito

According to the assertions put in the previous patch, we should
first drain and then modify the ->children list. In this way
we prevent other iothreads to read the list while it is being
updated.

In this case, moving the drain won't cause any harm, because
child is a parameter of the drain function so it will still be
included in the operation, despite not being in the list.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 522a273140..5516c84ec4 100644
--- a/block.c
+++ b/block.c
@@ -1416,6 +1416,7 @@ static void bdrv_child_cb_attach(BdrvChild *child)
 {
 BlockDriverState *bs = child->opaque;
 
+bdrv_apply_subtree_drain(child, bs);
 assert_bdrv_graph_writable(bs);
 QLIST_INSERT_HEAD(&bs->children, child, next);
 
@@ -1423,7 +1424,6 @@ static void bdrv_child_cb_attach(BdrvChild *child)
 bdrv_backing_attach(child);
 }
 
-bdrv_apply_subtree_drain(child, bs);
 }
 
 static void bdrv_child_cb_detach(BdrvChild *child)
@@ -1434,10 +1434,9 @@ static void bdrv_child_cb_detach(BdrvChild *child)
 bdrv_backing_detach(child);
 }
 
-bdrv_unapply_subtree_drain(child, bs);
-
 assert_bdrv_graph_writable(bs);
 QLIST_REMOVE(child, next);
+bdrv_unapply_subtree_drain(child, bs);
 }
 
 static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
-- 
2.27.0

[PATCH v5 24/31] block_int-common.h: assertions in the callers of BdrvChildClass function pointers

2021-11-23 Thread Emanuele Giuseppe Esposito

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block.c b/block.c
index 180884b8c0..a0309f827d 100644
--- a/block.c
+++ b/block.c
@@ -1491,6 +1491,7 @@ const BdrvChildClass child_of_bds = {
 
 AioContext *bdrv_child_get_parent_aio_context(BdrvChild *c)
 {
+assert(qemu_in_main_thread());
 return c->klass->get_parent_aio_context(c);
 }
 
@@ -2120,6 +2121,7 @@ bool bdrv_is_writable(BlockDriverState *bs)
 
 static char *bdrv_child_user_desc(BdrvChild *c)
 {
+assert(qemu_in_main_thread());
 return c->klass->get_parent_desc(c);
 }
 
@@ -2832,6 +2834,7 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
 
 assert(!child->frozen);
 assert(old_bs != new_bs);
+assert(qemu_in_main_thread());
 
 if (old_bs && new_bs) {
 assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
@@ -2928,6 +2931,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
 BdrvChild *child = *s->child;
 BlockDriverState *bs = child->bs;
 
+assert(qemu_in_main_thread());
 /*
  * Pass free_empty_child=false, because we still need the child
  * for the AioContext operations on the parent below; those
@@ -3296,6 +3300,7 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild 
*child)
 static void bdrv_parent_cb_change_media(BlockDriverState *bs, bool load)
 {
 BdrvChild *c;
+assert(qemu_in_main_thread());
 QLIST_FOREACH(c, &bs->parents, next_parent) {
 if (c->klass->change_media) {
 c->klass->change_media(c, load);
@@ -3792,6 +3797,7 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 
 assert(!child_class || !flags);
 assert(!child_class == !parent);
+assert(qemu_in_main_thread());
 
 if (reference) {
 bool options_non_empty = options ? qdict_size(options) : false;
@@ -4178,6 +4184,7 @@ static BlockReopenQueue 
*bdrv_reopen_queue_child(BlockReopenQueue *bs_queue,
  * important to avoid graph changes between the recursive queuing here and
  * bdrv_reopen_multiple(). */
 assert(bs->quiesce_counter > 0);
+assert(qemu_in_main_thread());
 
 if (bs_queue == NULL) {
 bs_queue = g_new0(BlockReopenQueue, 1);
@@ -7271,6 +7278,7 @@ void bdrv_set_aio_context_ignore(BlockDriverState *bs,
 BdrvChild *child, *parent;
 
 g_assert(qemu_get_current_aio_context() == qemu_get_aio_context());
+assert(qemu_in_main_thread());
 
 if (old_context == new_context) {
 return;
@@ -7343,6 +7351,7 @@ void bdrv_set_aio_context_ignore(BlockDriverState *bs,
 static bool bdrv_parent_can_set_aio_context(BdrvChild *c, AioContext *ctx,
 GSList **ignore, Error **errp)
 {
+assert(qemu_in_main_thread());
 if (g_slist_find(*ignore, c)) {
 return true;
 }
-- 
2.27.0

[PATCH v5 20/31] block/coroutines: I/O API

2021-11-23 Thread Emanuele Giuseppe Esposito

block coroutines functions run in different aiocontext, and are
not protected by the BQL. Therefore are I/O.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
 block/coroutines.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/coroutines.h b/block/coroutines.h
index c8c14a29c8..c61abd271a 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -29,6 +29,12 @@
 
 /* For blk_bs() in generated block/block-gen.c */
 #include "sysemu/block-backend.h"
+/*
+ * I/O API functions. These functions are thread-safe.
+ *
+ * See include/block/block-io.h for more information about
+ * the I/O API.
+ */
 
 int coroutine_fn bdrv_co_check(BlockDriverState *bs,
BdrvCheckResult *res, BdrvCheckMode fix);
-- 
2.27.0

[PATCH v5 09/31] block: introduce assert_bdrv_graph_writable

2021-11-23 Thread Emanuele Giuseppe Esposito

We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.

BQL prevents from concurrent writings from the GS API, while
drains protect from I/O.

TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in future
series.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/block/block_int-global-state.h | 10 +-
 block.c|  4 
 block/io.c | 11 +++
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/block/block_int-global-state.h 
b/include/block/block_int-global-state.h
index a1b7d0579d..fa96e8b449 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -312,4 +312,12 @@ void bdrv_remove_aio_context_notifier(BlockDriverState *bs,
  */
 void bdrv_drain_all_end_quiesce(BlockDriverState *bs);
 
-#endif /* BLOCK_INT_GLOBAL_STATE*/
+/**
+ * Make sure that the function is running under both drain and BQL.
+ * The latter protects from concurrent writings
+ * from the GS API, while the former prevents concurrent reads
+ * from I/O.
+ */
+void assert_bdrv_graph_writable(BlockDriverState *bs);
+
+#endif /* BLOCK_INT_GLOBAL_STATE */
diff --git a/block.c b/block.c
index 198ec636ff..522a273140 100644
--- a/block.c
+++ b/block.c
@@ -1416,6 +1416,7 @@ static void bdrv_child_cb_attach(BdrvChild *child)
 {
 BlockDriverState *bs = child->opaque;
 
+assert_bdrv_graph_writable(bs);
 QLIST_INSERT_HEAD(&bs->children, child, next);
 
 if (child->role & BDRV_CHILD_COW) {
@@ -1435,6 +1436,7 @@ static void bdrv_child_cb_detach(BdrvChild *child)
 
 bdrv_unapply_subtree_drain(child, bs);
 
+assert_bdrv_graph_writable(bs);
 QLIST_REMOVE(child, next);
 }
 
@@ -2818,6 +2820,7 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
 if (child->klass->detach) {
 child->klass->detach(child);
 }
+assert_bdrv_graph_writable(old_bs);
 QLIST_REMOVE(child, next_parent);
 }
 
@@ -2827,6 +2830,7 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
 }
 
 if (new_bs) {
+assert_bdrv_graph_writable(new_bs);
 QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
 
 /*
diff --git a/block/io.c b/block/io.c
index cb095deeec..3be08cad29 100644
--- a/block/io.c
+++ b/block/io.c
@@ -734,6 +734,17 @@ void bdrv_drain_all(void)
 bdrv_drain_all_end();
 }
 
+void assert_bdrv_graph_writable(BlockDriverState *bs)
+{
+/*
+ * TODO: this function is incomplete. Because the users of this
+ * assert lack the necessary drains, check only for BQL.
+ * Once the necessary drains are added,
+ * assert also for qatomic_read(&bs->quiesce_counter) > 0
+ */
+assert(qemu_in_main_thread());
+}
+
 /**
  * Remove an active request from the tracked requests list
  *
-- 
2.27.0

[PATCH v5 16/31] include/sysemu/blockdev.h: global state API

2021-11-23 Thread Emanuele Giuseppe Esposito

blockdev functions run always under the BQL lock.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/sysemu/blockdev.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index c4b7b8b54e..e53eb91be6 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -13,9 +13,6 @@
 #include "block/block.h"
 #include "qemu/queue.h"
 
-void blockdev_mark_auto_del(BlockBackend *blk);
-void blockdev_auto_del(BlockBackend *blk);
-
 typedef enum {
 IF_DEFAULT = -1,/* for use with drive_add() only */
 /*
@@ -40,6 +37,16 @@ struct DriveInfo {
 QTAILQ_ENTRY(DriveInfo) next;
 };
 
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
+void blockdev_mark_auto_del(BlockBackend *blk);
+void blockdev_auto_del(BlockBackend *blk);
+
 DriveInfo *blk_legacy_dinfo(BlockBackend *blk);
 DriveInfo *blk_set_legacy_dinfo(BlockBackend *blk, DriveInfo *dinfo);
 BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo);
-- 
2.27.0

[PATCH v5 13/31] block.c: add assertions to static functions

2021-11-23 Thread Emanuele Giuseppe Esposito

Following the assertion derived from the API split,
propagate the assertion also in the static functions.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block.c | 45 -
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 5516c84ec4..b77ab0a104 100644
--- a/block.c
+++ b/block.c
@@ -434,6 +434,7 @@ BlockDriverState *bdrv_new(void)
 static BlockDriver *bdrv_do_find_format(const char *format_name)
 {
 BlockDriver *drv1;
+assert(qemu_in_main_thread());
 
 QLIST_FOREACH(drv1, &bdrv_drivers, list) {
 if (!strcmp(drv1->format_name, format_name)) {
@@ -593,6 +594,8 @@ static int64_t create_file_fallback_truncate(BlockBackend 
*blk,
 int64_t size;
 int ret;
 
+assert(qemu_in_main_thread());
+
 ret = blk_truncate(blk, minimum_size, false, PREALLOC_MODE_OFF, 0,
&local_err);
 if (ret < 0 && ret != -ENOTSUP) {
@@ -631,6 +634,8 @@ static int 
create_file_fallback_zero_first_sector(BlockBackend *blk,
 int64_t bytes_to_clear;
 int ret;
 
+assert(qemu_in_main_thread());
+
 bytes_to_clear = MIN(current_size, BDRV_SECTOR_SIZE);
 if (bytes_to_clear) {
 ret = blk_pwrite_zeroes(blk, 0, bytes_to_clear, BDRV_REQ_MAY_UNMAP);
@@ -891,6 +896,7 @@ static BlockDriver *find_hdev_driver(const char *filename)
 {
 int score_max = 0, score;
 BlockDriver *drv = NULL, *d;
+assert(qemu_in_main_thread());
 
 QLIST_FOREACH(d, &bdrv_drivers, list) {
 if (d->bdrv_probe_device) {
@@ -908,6 +914,7 @@ static BlockDriver *find_hdev_driver(const char *filename)
 static BlockDriver *bdrv_do_find_protocol(const char *protocol)
 {
 BlockDriver *drv1;
+assert(qemu_in_main_thread());
 
 QLIST_FOREACH(drv1, &bdrv_drivers, list) {
 if (drv1->protocol_name && !strcmp(drv1->protocol_name, protocol)) {
@@ -1016,6 +1023,8 @@ static int find_image_format(BlockBackend *file, const 
char *filename,
 uint8_t buf[BLOCK_PROBE_BUF_SIZE];
 int ret = 0;
 
+assert(qemu_in_main_thread());
+
 /* Return the raw BlockDriver * to scsi-generic devices or empty drives */
 if (blk_is_sg(file) || !blk_is_inserted(file) || blk_getlength(file) == 0) 
{
 *pdrv = &bdrv_raw;
@@ -1097,6 +1106,7 @@ static BlockdevDetectZeroesOptions 
bdrv_parse_detect_zeroes(QemuOpts *opts,
 BlockdevDetectZeroesOptions detect_zeroes =
 qapi_enum_parse(&BlockdevDetectZeroesOptions_lookup, value,
 BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF, &local_err);
+assert(qemu_in_main_thread());
 g_free(value);
 if (local_err) {
 error_propagate(errp, local_err);
@@ -1214,6 +1224,7 @@ static void bdrv_child_cb_drained_end(BdrvChild *child,
 static int bdrv_child_cb_inactivate(BdrvChild *child)
 {
 BlockDriverState *bs = child->opaque;
+assert(qemu_in_main_thread());
 assert(bs->open_flags & BDRV_O_INACTIVE);
 return 0;
 }
@@ -1241,6 +1252,7 @@ static void bdrv_temp_snapshot_options(int *child_flags, 
QDict *child_options,
int parent_flags, QDict *parent_options)
 {
 *child_flags = (parent_flags & ~BDRV_O_SNAPSHOT) | BDRV_O_TEMPORARY;
+assert(qemu_in_main_thread());
 
 /* For temporary files, unconditional cache=unsafe is fine */
 qdict_set_default_str(child_options, BDRV_OPT_CACHE_DIRECT, "off");
@@ -1260,6 +1272,7 @@ static void bdrv_backing_attach(BdrvChild *c)
 BlockDriverState *parent = c->opaque;
 BlockDriverState *backing_hd = c->bs;
 
+assert(qemu_in_main_thread());
 assert(!parent->backing_blocker);
 error_setg(&parent->backing_blocker,
"node is used as backing hd of '%s'",
@@ -1298,6 +1311,7 @@ static void bdrv_backing_detach(BdrvChild *c)
 {
 BlockDriverState *parent = c->opaque;
 
+assert(qemu_in_main_thread());
 assert(parent->backing_blocker);
 bdrv_op_unblock_all(c->bs, parent->backing_blocker);
 error_free(parent->backing_blocker);
@@ -1310,6 +1324,7 @@ static int bdrv_backing_update_filename(BdrvChild *c, 
BlockDriverState *base,
 BlockDriverState *parent = c->opaque;
 bool read_only = bdrv_is_read_only(parent);
 int ret;
+assert(qemu_in_main_thread());
 
 if (read_only) {
 ret = bdrv_reopen_set_read_only(parent, false, errp);
@@ -1341,6 +1356,7 @@ static void bdrv_inherited_options(BdrvChildRole role, 
bool parent_is_format,
int parent_flags, QDict *parent_options)
 {
 int flags = parent_flags;
+assert(qemu_in_main_thread());
 
 /*
  * First, decide whether to set, clear, or leave BDRV_O_PROTOCOL.
@@ -1479,6 +1495,7 @@ AioContext *bdrv_child_get_parent_aio_context(BdrvChild 
*c)
 static int bdrv_open_flags(BlockDriverState *bs, int flags)
 {
 int open_flags = flags;
+assert(qemu_in_main_thread());
 
 /*
  * Clear flags that are internal to the block layer before opening the
@@ -1491,6 +1508,8 @

[PATCH v5 15/31] assertions for blockjob.h global state API

2021-11-23 Thread Emanuele Giuseppe Esposito

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
 blockjob.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 10c807413e..74476af473 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -62,6 +62,7 @@ static bool is_block_job(Job *job)
 BlockJob *block_job_next(BlockJob *bjob)
 {
 Job *job = bjob ? &bjob->job : NULL;
+assert(qemu_in_main_thread());
 
 do {
 job = job_next(job);
@@ -73,6 +74,7 @@ BlockJob *block_job_next(BlockJob *bjob)
 BlockJob *block_job_get(const char *id)
 {
 Job *job = job_get(id);
+assert(qemu_in_main_thread());
 
 if (job && is_block_job(job)) {
 return container_of(job, BlockJob, job);
@@ -185,6 +187,7 @@ static const BdrvChildClass child_job = {
 
 void block_job_remove_all_bdrv(BlockJob *job)
 {
+assert(qemu_in_main_thread());
 /*
  * bdrv_root_unref_child() may reach child_job_[can_]set_aio_ctx(),
  * which will also traverse job->nodes, so consume the list one by
@@ -207,6 +210,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
 bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
 {
 GSList *el;
+assert(qemu_in_main_thread());
 
 for (el = job->nodes; el; el = el->next) {
 BdrvChild *c = el->data;
@@ -223,6 +227,7 @@ int block_job_add_bdrv(BlockJob *job, const char *name, 
BlockDriverState *bs,
 {
 BdrvChild *c;
 bool need_context_ops;
+assert(qemu_in_main_thread());
 
 bdrv_ref(bs);
 
@@ -272,6 +277,8 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, 
Error **errp)
 const BlockJobDriver *drv = block_job_driver(job);
 int64_t old_speed = job->speed;
 
+assert(qemu_in_main_thread());
+
 if (job_apply_verb(&job->job, JOB_VERB_SET_SPEED, errp) < 0) {
 return false;
 }
@@ -309,6 +316,8 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp)
 BlockJobInfo *info;
 uint64_t progress_current, progress_total;
 
+assert(qemu_in_main_thread());
+
 if (block_job_is_internal(job)) {
 error_setg(errp, "Cannot query QEMU internal jobs");
 return NULL;
@@ -498,6 +507,7 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 
 void block_job_iostatus_reset(BlockJob *job)
 {
+assert(qemu_in_main_thread());
 if (job->iostatus == BLOCK_DEVICE_IO_STATUS_OK) {
 return;
 }
-- 
2.27.0

[PATCH v5 17/31] assertions for blockdev.h global state API

2021-11-23 Thread Emanuele Giuseppe Esposito

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Paolo Bonzini 
Reviewed-by: Stefan Hajnoczi 
---
 block/block-backend.c |  3 +++
 blockdev.c| 15 +++
 2 files changed, 18 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 56670a774f..aad6d9a4c3 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -794,6 +794,7 @@ bool bdrv_is_root_node(BlockDriverState *bs)
  */
 DriveInfo *blk_legacy_dinfo(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
 return blk->legacy_dinfo;
 }
 
@@ -805,6 +806,7 @@ DriveInfo *blk_legacy_dinfo(BlockBackend *blk)
 DriveInfo *blk_set_legacy_dinfo(BlockBackend *blk, DriveInfo *dinfo)
 {
 assert(!blk->legacy_dinfo);
+assert(qemu_in_main_thread());
 return blk->legacy_dinfo = dinfo;
 }
 
@@ -815,6 +817,7 @@ DriveInfo *blk_set_legacy_dinfo(BlockBackend *blk, 
DriveInfo *dinfo)
 BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo)
 {
 BlockBackend *blk = NULL;
+assert(qemu_in_main_thread());
 
 while ((blk = blk_next(blk)) != NULL) {
 if (blk->legacy_dinfo == dinfo) {
diff --git a/blockdev.c b/blockdev.c
index 7706919410..eebd9317c2 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -113,6 +113,8 @@ void override_max_devs(BlockInterfaceType type, int 
max_devs)
 BlockBackend *blk;
 DriveInfo *dinfo;
 
+assert(qemu_in_main_thread());
+
 if (max_devs <= 0) {
 return;
 }
@@ -142,6 +144,8 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 DriveInfo *dinfo = blk_legacy_dinfo(blk);
 BlockJob *job;
 
+assert(qemu_in_main_thread());
+
 if (!dinfo) {
 return;
 }
@@ -163,6 +167,7 @@ void blockdev_mark_auto_del(BlockBackend *blk)
 void blockdev_auto_del(BlockBackend *blk)
 {
 DriveInfo *dinfo = blk_legacy_dinfo(blk);
+assert(qemu_in_main_thread());
 
 if (dinfo && dinfo->auto_del) {
 monitor_remove_blk(blk);
@@ -187,6 +192,8 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int 
unit)
 BlockBackend *blk;
 DriveInfo *dinfo;
 
+assert(qemu_in_main_thread());
+
 for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
 dinfo = blk_legacy_dinfo(blk);
 if (dinfo && dinfo->type == type
@@ -209,6 +216,8 @@ void drive_check_orphaned(void)
 Location loc;
 bool orphans = false;
 
+assert(qemu_in_main_thread());
+
 for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
 dinfo = blk_legacy_dinfo(blk);
 /*
@@ -242,6 +251,7 @@ void drive_check_orphaned(void)
 
 DriveInfo *drive_get_by_index(BlockInterfaceType type, int index)
 {
+assert(qemu_in_main_thread());
 return drive_get(type,
  drive_index_to_bus_id(type, index),
  drive_index_to_unit_id(type, index));
@@ -253,6 +263,8 @@ int drive_get_max_bus(BlockInterfaceType type)
 BlockBackend *blk;
 DriveInfo *dinfo;
 
+assert(qemu_in_main_thread());
+
 max_bus = -1;
 for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
 dinfo = blk_legacy_dinfo(blk);
@@ -269,6 +281,7 @@ int drive_get_max_bus(BlockInterfaceType type)
 DriveInfo *drive_get_next(BlockInterfaceType type)
 {
 static int next_block_unit[IF_COUNT];
+assert(qemu_in_main_thread());
 
 return drive_get(type, 0, next_block_unit[type]++);
 }
@@ -749,6 +762,8 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type,
 const char *filename;
 int i;
 
+assert(qemu_in_main_thread());
+
 /* Change legacy command line options into QMP ones */
 static const struct {
 const char *from;
-- 
2.27.0

[PATCH v5 07/31] include/block/block_int: split header into I/O and global state API

2021-11-23 Thread Emanuele Giuseppe Esposito

Similarly to the previous patch, split block_int.h
in block_int-io.h and block_int-global-state.h

block_int-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/block/block_int-common.h   | 1161 +++
 include/block/block_int-global-state.h |  315 +
 include/block/block_int-io.h   |  167 +++
 include/block/block_int.h  | 1475 +---
 blockdev.c |5 +
 5 files changed, 1651 insertions(+), 1472 deletions(-)
 create mode 100644 include/block/block_int-common.h
 create mode 100644 include/block/block_int-global-state.h
 create mode 100644 include/block/block_int-io.h

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
new file mode 100644
index 00..70534f94ae
--- /dev/null
+++ b/include/block/block_int-common.h
@@ -0,0 +1,1161 @@
+/*
+ * QEMU System Emulator block driver
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef BLOCK_INT_COMMON_H
+#define BLOCK_INT_COMMON_H
+
+#include "block/accounting.h"
+#include "block/block.h"
+#include "block/aio-wait.h"
+#include "qemu/queue.h"
+#include "qemu/coroutine.h"
+#include "qemu/stats64.h"
+#include "qemu/timer.h"
+#include "qemu/hbitmap.h"
+#include "block/snapshot.h"
+#include "qemu/throttle.h"
+#include "qemu/rcu.h"
+
+#define BLOCK_FLAG_LAZY_REFCOUNTS   8
+
+#define BLOCK_OPT_SIZE  "size"
+#define BLOCK_OPT_ENCRYPT   "encryption"
+#define BLOCK_OPT_ENCRYPT_FORMAT"encrypt.format"
+#define BLOCK_OPT_COMPAT6   "compat6"
+#define BLOCK_OPT_HWVERSION "hwversion"
+#define BLOCK_OPT_BACKING_FILE  "backing_file"
+#define BLOCK_OPT_BACKING_FMT   "backing_fmt"
+#define BLOCK_OPT_CLUSTER_SIZE  "cluster_size"
+#define BLOCK_OPT_TABLE_SIZE"table_size"
+#define BLOCK_OPT_PREALLOC  "preallocation"
+#define BLOCK_OPT_SUBFMT"subformat"
+#define BLOCK_OPT_COMPAT_LEVEL  "compat"
+#define BLOCK_OPT_LAZY_REFCOUNTS"lazy_refcounts"
+#define BLOCK_OPT_ADAPTER_TYPE  "adapter_type"
+#define BLOCK_OPT_REDUNDANCY"redundancy"
+#define BLOCK_OPT_NOCOW "nocow"
+#define BLOCK_OPT_EXTENT_SIZE_HINT  "extent_size_hint"
+#define BLOCK_OPT_OBJECT_SIZE   "object_size"
+#define BLOCK_OPT_REFCOUNT_BITS "refcount_bits"
+#define BLOCK_OPT_DATA_FILE "data_file"
+#define BLOCK_OPT_DATA_FILE_RAW "data_file_raw"
+#define BLOCK_OPT_COMPRESSION_TYPE  "compression_type"
+#define BLOCK_OPT_EXTL2 "extended_l2"
+
+#define BLOCK_PROBE_BUF_SIZE512
+
+enum BdrvTrackedRequestType {
+BDRV_TRACKED_READ,
+BDRV_TRACKED_WRITE,
+BDRV_TRACKED_DISCARD,
+BDRV_TRACKED_TRUNCATE,
+};
+
+/*
+ * That is not quite good that BdrvTrackedRequest structure is public,
+ * as block/io.c is very careful about incoming offset/bytes being
+ * correct. Be sure to assert bdrv_check_request() succeeded after any
+ * modification of BdrvTrackedRequest object out of block/io.c
+ */
+typedef struct BdrvTrackedRequest {
+BlockDriverState *bs;
+int64_t offset;
+int64_t bytes;
+enum BdrvTrackedRequestType type;
+
+bool serialising;
+int64_t overlap_offset;
+int64_t overlap_bytes;
+
+QLIST_ENTRY(BdrvTrackedRequest) list;
+Coroutine *co; /* owner, used for deadlock detection */
+CoQueue wait_queue; /* coroutines blocked on this request */
+
+struct BdrvTrackedRequest *waiting_for;
+} BdrvTrackedRequest;
+
+
+struct BlockDriver {
+const char *format_name;
+int instance_size;
+
+/*
+ * Set to true if the BlockDriver is a block filter. Block filters pass
+ * certain callbacks that refer to data (see block.c) to their bs->file
+ * or bs->backi

[PATCH v5 11/31] include/block/blockjob_int.h: split header into I/O and GS API

2021-11-23 Thread Emanuele Giuseppe Esposito

Since the I/O functions are not many, keep a single file.
Also split the function pointers in BlockJobDriver.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
 include/block/blockjob_int.h | 28 
 1 file changed, 28 insertions(+)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 6633d83da2..718d7b92d2 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -38,6 +38,13 @@ struct BlockJobDriver {
 /** Generic JobDriver callbacks and settings */
 JobDriver job_driver;
 
+/*
+ * I/O API functions. These functions are thread-safe.
+ *
+ * See include/block/block-io.h for more information about
+ * the I/O API.
+ */
+
 /*
  * Returns whether the job has pending requests for the child or will
  * submit new requests before the next pause point. This callback is polled
@@ -46,6 +53,13 @@ struct BlockJobDriver {
  */
 bool (*drained_poll)(BlockJob *job);
 
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
 /*
  * If the callback is not NULL, it will be invoked before the job is
  * resumed in a new AioContext.  This is the place to move any resources
@@ -56,6 +70,13 @@ struct BlockJobDriver {
 void (*set_speed)(BlockJob *job, int64_t speed);
 };
 
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
 /**
  * block_job_create:
  * @job_id: The id of the newly-created job, or %NULL to have one
@@ -98,6 +119,13 @@ void block_job_free(Job *job);
  */
 void block_job_user_resume(Job *job);
 
+/*
+ * I/O API functions. These functions are thread-safe.
+ *
+ * See include/block/block-io.h for more information about
+ * the I/O API.
+ */
+
 /**
  * block_job_ratelimit_get_delay:
  *
-- 
2.27.0

[PATCH v5 14/31] include/block/blockjob.h: global state API

2021-11-23 Thread Emanuele Giuseppe Esposito

blockjob functions run always under the BQL lock.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
 include/block/blockjob.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d200f33c10..fa0c3f7a47 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -77,6 +77,13 @@ typedef struct BlockJob {
 GSList *nodes;
 } BlockJob;
 
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
 /**
  * block_job_next:
  * @job: A block job, or %NULL.
@@ -158,6 +165,8 @@ BlockJobInfo *block_job_query(BlockJob *job, Error **errp);
  */
 void block_job_iostatus_reset(BlockJob *job);
 
+/* Common functions that are neither I/O nor Global State */
+
 /**
  * block_job_is_internal:
  * @job: The job to determine if it is user-visible or not.
-- 
2.27.0

[PATCH v5 06/31] block/block-backend.c: assertions for block-backend

2021-11-23 Thread Emanuele Giuseppe Esposito

All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/block-backend.c  | 83 ++
 softmmu/qdev-monitor.c |  2 +
 2 files changed, 85 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index 7a4b50a8f0..11131ca68c 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -228,6 +228,7 @@ static void blk_root_activate(BdrvChild *child, Error 
**errp)
 
 void blk_set_force_allow_inactivate(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
 blk->force_allow_inactivate = true;
 }
 
@@ -346,6 +347,8 @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, 
uint64_t shared_perm)
 {
 BlockBackend *blk;
 
+assert(qemu_in_main_thread());
+
 blk = g_new0(BlockBackend, 1);
 blk->refcnt = 1;
 blk->ctx = ctx;
@@ -383,6 +386,8 @@ BlockBackend *blk_new_with_bs(BlockDriverState *bs, 
uint64_t perm,
 {
 BlockBackend *blk = blk_new(bdrv_get_aio_context(bs), perm, shared_perm);
 
+assert(qemu_in_main_thread());
+
 if (blk_insert_bs(blk, bs, errp) < 0) {
 blk_unref(blk);
 return NULL;
@@ -411,6 +416,8 @@ BlockBackend *blk_new_open(const char *filename, const char 
*reference,
 uint64_t perm = 0;
 uint64_t shared = BLK_PERM_ALL;
 
+assert(qemu_in_main_thread());
+
 /*
  * blk_new_open() is mainly used in .bdrv_create implementations and the
  * tools where sharing isn't a major concern because the BDS stays private
@@ -488,6 +495,7 @@ static void drive_info_del(DriveInfo *dinfo)
 
 int blk_get_refcnt(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
 return blk ? blk->refcnt : 0;
 }
 
@@ -498,6 +506,7 @@ int blk_get_refcnt(BlockBackend *blk)
 void blk_ref(BlockBackend *blk)
 {
 assert(blk->refcnt > 0);
+assert(qemu_in_main_thread());
 blk->refcnt++;
 }
 
@@ -508,6 +517,7 @@ void blk_ref(BlockBackend *blk)
  */
 void blk_unref(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
 if (blk) {
 assert(blk->refcnt > 0);
 if (blk->refcnt > 1) {
@@ -528,6 +538,7 @@ void blk_unref(BlockBackend *blk)
  */
 BlockBackend *blk_all_next(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
 return blk ? QTAILQ_NEXT(blk, link)
: QTAILQ_FIRST(&block_backends);
 }
@@ -536,6 +547,8 @@ void blk_remove_all_bs(void)
 {
 BlockBackend *blk = NULL;
 
+assert(qemu_in_main_thread());
+
 while ((blk = blk_all_next(blk)) != NULL) {
 AioContext *ctx = blk_get_aio_context(blk);
 
@@ -559,6 +572,7 @@ void blk_remove_all_bs(void)
  */
 BlockBackend *blk_next(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
 return blk ? QTAILQ_NEXT(blk, monitor_link)
: QTAILQ_FIRST(&monitor_block_backends);
 }
@@ -625,6 +639,7 @@ static void bdrv_next_reset(BdrvNextIterator *it)
 
 BlockDriverState *bdrv_first(BdrvNextIterator *it)
 {
+assert(qemu_in_main_thread());
 bdrv_next_reset(it);
 return bdrv_next(it);
 }
@@ -662,6 +677,7 @@ bool monitor_add_blk(BlockBackend *blk, const char *name, 
Error **errp)
 {
 assert(!blk->name);
 assert(name && name[0]);
+assert(qemu_in_main_thread());
 
 if (!id_wellformed(name)) {
 error_setg(errp, "Invalid device name");
@@ -689,6 +705,8 @@ bool monitor_add_blk(BlockBackend *blk, const char *name, 
Error **errp)
  */
 void monitor_remove_blk(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
+
 if (!blk->name) {
 return;
 }
@@ -715,6 +733,7 @@ BlockBackend *blk_by_name(const char *name)
 {
 BlockBackend *blk = NULL;
 
+assert(qemu_in_main_thread());
 assert(name);
 while ((blk = blk_next(blk)) != NULL) {
 if (!strcmp(name, blk->name)) {
@@ -749,6 +768,7 @@ static BlockBackend *bdrv_first_blk(BlockDriverState *bs)
  */
 bool bdrv_has_blk(BlockDriverState *bs)
 {
+assert(qemu_in_main_thread());
 return bdrv_first_blk(bs) != NULL;
 }
 
@@ -759,6 +779,7 @@ bool bdrv_is_root_node(BlockDriverState *bs)
 {
 BdrvChild *c;
 
+assert(qemu_in_main_thread());
 QLIST_FOREACH(c, &bs->parents, next_parent) {
 if (c->klass != &child_root) {
 return false;
@@ -808,6 +829,7 @@ BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo)
  */
 BlockBackendPublic *blk_get_public(BlockBackend *blk)
 {
+assert(qemu_in_main_thread());
 return &blk->public;
 }
 
@@ -816,6 +838,7 @@ BlockBackendPublic *blk_get_public(BlockBackend *blk)
  */
 BlockBackend *blk_by_public(BlockBackendPublic *public)
 {
+assert(qemu_in_main_thread());
 return container_of(public, BlockBackend, public);
 }
 
@@ -828,6 +851,8 @@ void blk_remove_bs(BlockBackend *blk)
 BlockDriverState *bs;
 BdrvChild *root;
 
+assert(qemu_in_main_thread());
+
 notifier_list_notify(&blk->remove_bs_notifiers

[PATCH v5 12/31] assertions for blockjob_int.h

2021-11-23 Thread Emanuele Giuseppe Esposito

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
 blockjob.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 4bad1408cb..10c807413e 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -84,6 +84,7 @@ BlockJob *block_job_get(const char *id)
 void block_job_free(Job *job)
 {
 BlockJob *bjob = container_of(job, BlockJob, job);
+assert(qemu_in_main_thread());
 
 block_job_remove_all_bdrv(bjob);
 blk_unref(bjob->blk);
@@ -436,6 +437,8 @@ void *block_job_create(const char *job_id, const 
BlockJobDriver *driver,
 BlockBackend *blk;
 BlockJob *job;
 
+assert(qemu_in_main_thread());
+
 if (job_id == NULL && !(flags & JOB_INTERNAL)) {
 job_id = bdrv_get_device_name(bs);
 }
@@ -505,6 +508,7 @@ void block_job_iostatus_reset(BlockJob *job)
 void block_job_user_resume(Job *job)
 {
 BlockJob *bjob = container_of(job, BlockJob, job);
+assert(qemu_in_main_thread());
 block_job_iostatus_reset(bjob);
 }
 
-- 
2.27.0

[PATCH v5 03/31] assertions for block global state API

2021-11-23 Thread Emanuele Giuseppe Esposito

All the global state (GS) API functions will check that
qemu_in_main_thread() returns true. If not, it means
that the safety of BQL cannot be guaranteed, and
they need to be moved to I/O.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block.c| 135 -
 block/commit.c |   2 +
 block/io.c |  14 +
 blockdev.c |   1 +
 4 files changed, 150 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 84de6867e6..49bee69e27 100644
--- a/block.c
+++ b/block.c
@@ -278,6 +278,8 @@ bool bdrv_is_read_only(BlockDriverState *bs)
 int bdrv_can_set_read_only(BlockDriverState *bs, bool read_only,
bool ignore_allow_rdw, Error **errp)
 {
+assert(qemu_in_main_thread());
+
 /* Do not set read_only if copy_on_read is enabled */
 if (bs->copy_on_read && read_only) {
 error_setg(errp, "Can't set node '%s' to r/o with copy-on-read 
enabled",
@@ -311,6 +313,7 @@ int bdrv_apply_auto_read_only(BlockDriverState *bs, const 
char *errmsg,
   Error **errp)
 {
 int ret = 0;
+assert(qemu_in_main_thread());
 
 if (!(bs->open_flags & BDRV_O_RDWR)) {
 return 0;
@@ -393,6 +396,7 @@ char *bdrv_get_full_backing_filename(BlockDriverState *bs, 
Error **errp)
 void bdrv_register(BlockDriver *bdrv)
 {
 assert(bdrv->format_name);
+assert(qemu_in_main_thread());
 QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
 }
 
@@ -401,6 +405,8 @@ BlockDriverState *bdrv_new(void)
 BlockDriverState *bs;
 int i;
 
+assert(qemu_in_main_thread());
+
 bs = g_new0(BlockDriverState, 1);
 QLIST_INIT(&bs->dirty_bitmaps);
 for (i = 0; i < BLOCK_OP_TYPE_MAX; i++) {
@@ -443,6 +449,8 @@ BlockDriver *bdrv_find_format(const char *format_name)
 BlockDriver *drv1;
 int i;
 
+assert(qemu_in_main_thread());
+
 drv1 = bdrv_do_find_format(format_name);
 if (drv1) {
 return drv1;
@@ -492,11 +500,13 @@ static int bdrv_format_is_whitelisted(const char 
*format_name, bool read_only)
 
 int bdrv_is_whitelisted(BlockDriver *drv, bool read_only)
 {
+assert(qemu_in_main_thread());
 return bdrv_format_is_whitelisted(drv->format_name, read_only);
 }
 
 bool bdrv_uses_whitelist(void)
 {
+assert(qemu_in_main_thread());
 return use_bdrv_whitelist;
 }
 
@@ -527,6 +537,8 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 {
 int ret;
 
+assert(qemu_in_main_thread());
+
 Coroutine *co;
 CreateCo cco = {
 .drv = drv,
@@ -702,6 +714,8 @@ int bdrv_create_file(const char *filename, QemuOpts *opts, 
Error **errp)
 QDict *qdict;
 int ret;
 
+assert(qemu_in_main_thread());
+
 drv = bdrv_find_protocol(filename, true, errp);
 if (drv == NULL) {
 return -ENOENT;
@@ -799,6 +813,7 @@ int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes 
*bsz)
 {
 BlockDriver *drv = bs->drv;
 BlockDriverState *filtered = bdrv_filter_bs(bs);
+assert(qemu_in_main_thread());
 
 if (drv && drv->bdrv_probe_blocksizes) {
 return drv->bdrv_probe_blocksizes(bs, bsz);
@@ -819,6 +834,7 @@ int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry 
*geo)
 {
 BlockDriver *drv = bs->drv;
 BlockDriverState *filtered = bdrv_filter_bs(bs);
+assert(qemu_in_main_thread());
 
 if (drv && drv->bdrv_probe_geometry) {
 return drv->bdrv_probe_geometry(bs, geo);
@@ -910,6 +926,7 @@ BlockDriver *bdrv_find_protocol(const char *filename,
 const char *p;
 int i;
 
+assert(qemu_in_main_thread());
 /* TODO Drivers without bdrv_file_open must be specified explicitly */
 
 /*
@@ -975,6 +992,7 @@ BlockDriver *bdrv_probe_all(const uint8_t *buf, int 
buf_size,
 {
 int score_max = 0, score;
 BlockDriver *drv = NULL, *d;
+assert(qemu_in_main_thread());
 
 QLIST_FOREACH(d, &bdrv_drivers, list) {
 if (d->bdrv_probe) {
@@ -1122,6 +1140,7 @@ int bdrv_parse_aio(const char *mode, int *flags)
  */
 int bdrv_parse_discard_flags(const char *mode, int *flags)
 {
+assert(qemu_in_main_thread());
 *flags &= ~BDRV_O_UNMAP;
 
 if (!strcmp(mode, "off") || !strcmp(mode, "ignore")) {
@@ -1142,6 +1161,7 @@ int bdrv_parse_discard_flags(const char *mode, int *flags)
  */
 int bdrv_parse_cache_mode(const char *mode, int *flags, bool *writethrough)
 {
+assert(qemu_in_main_thread());
 *flags &= ~BDRV_O_CACHE_MASK;
 
 if (!strcmp(mode, "off") || !strcmp(mode, "none")) {
@@ -1634,6 +1654,8 @@ BlockDriverState *bdrv_new_open_driver_opts(BlockDriver 
*drv,
 BlockDriverState *bs;
 int ret;
 
+assert(qemu_in_main_thread());
+
 bs = bdrv_new();
 bs->open_flags = flags;
 bs->options = options ?: qdict_new();
@@ -1659,6 +1681,7 @@ BlockDriverState *bdrv_new_open_driver_opts(BlockDriver 
*drv,
 BlockDriverState *bdrv_new_open_driver(BlockDriver *drv, const char *node_name,
int flags, Error **errp)
 {
+assert(qe

[PATCH v5 05/31] block-backend: special comments for blk_set/get_perm due to fuse

2021-11-23 Thread Emanuele Giuseppe Esposito

Fuse logic can be classified as I/O, so there is no BQL held
during its execution. And yet, it uses blk_{get/set}_perm
functions, that are classified as BQL and clearly require
the BQL lock. Since there is no easy solution for this,
add a couple of TODOs and FIXME in the relevant sections of the
code.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 block/block-backend.c | 10 ++
 block/export/fuse.c   | 16 ++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 1f0bda578e..7a4b50a8f0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -888,6 +888,11 @@ int blk_set_perm(BlockBackend *blk, uint64_t perm, 
uint64_t shared_perm,
  Error **errp)
 {
 int ret;
+/*
+ * FIXME: blk_{get/set}_perm should be always called under
+ * BQL, but it is not the case right now (see block/export/fuse.c)
+ */
+/* assert(qemu_in_main_thread()); */
 
 if (blk->root && !blk->disable_perm) {
 ret = bdrv_child_try_set_perm(blk->root, perm, shared_perm, errp);
@@ -904,6 +909,11 @@ int blk_set_perm(BlockBackend *blk, uint64_t perm, 
uint64_t shared_perm,
 
 void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
 {
+/*
+ * FIXME: blk_{get/set}_perm should be always called under
+ * BQL, but it is not the case right now (see block/export/fuse.c)
+ */
+/* assert(qemu_in_main_thread()); */
 *perm = blk->perm;
 *shared_perm = blk->shared_perm;
 }
diff --git a/block/export/fuse.c b/block/export/fuse.c
index 823c126d23..7ceb8d783b 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -89,7 +89,10 @@ static int fuse_export_create(BlockExport *blk_exp,
 /* For growable exports, take the RESIZE permission */
 if (args->growable) {
 uint64_t blk_perm, blk_shared_perm;
-
+/*
+ * FIXME: blk_{get/set}_perm should not be here, as permissions
+ * should be modified only under BQL and here we are not!
+ */
 blk_get_perm(exp->common.blk, &blk_perm, &blk_shared_perm);
 
 ret = blk_set_perm(exp->common.blk, blk_perm | BLK_PERM_RESIZE,
@@ -400,6 +403,11 @@ static int fuse_do_truncate(const FuseExport *exp, int64_t 
size,
 
 /* Growable exports have a permanent RESIZE permission */
 if (!exp->growable) {
+
+/*
+ * FIXME: blk_{get/set}_perm should not be here, as permissions
+ * should be modified only under BQL and here we are not!
+ */
 blk_get_perm(exp->common.blk, &blk_perm, &blk_shared_perm);
 
 ret = blk_set_perm(exp->common.blk, blk_perm | BLK_PERM_RESIZE,
@@ -413,7 +421,11 @@ static int fuse_do_truncate(const FuseExport *exp, int64_t 
size,
truncate_flags, NULL);
 
 if (!exp->growable) {
-/* Must succeed, because we are only giving up the RESIZE permission */
+/*
+ * Must succeed, because we are only giving up the RESIZE permission.
+ * FIXME: blk_{get/set}_perm should not be here, as permissions
+ * should be modified only under BQL and here we are not!
+ */
 blk_set_perm(exp->common.blk, blk_perm, blk_shared_perm, &error_abort);
 }
 
-- 
2.27.0

[PATCH v5 02/31] include/block/block: split header into I/O and global state API

2021-11-23 Thread Emanuele Giuseppe Esposito

block.h currently contains a mix of functions:
some of them run under the BQL and modify the block layer graph,
others are instead thread-safe and perform I/O in iothreads.
It is not easy to understand which function is part of which
group (I/O vs GS), and this patch aims to clarify it.

The "GS" functions need the BQL, and often use
aio_context_acquire/release and/or drain to be sure they
can modify the graph safely.
The I/O function are instead thread safe, and can run in
any AioContext.

By splitting the header in two files, block-io.h
and block-global-state.h we have a clearer view on what
needs what kind of protection. block-common.h
contains common structures shared by both headers.

block.h is left there for legacy and to avoid changing
all includes in all c files that use the block APIs.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/block/block-common.h   | 380 +
 include/block/block-global-state.h | 271 +
 include/block/block-io.h   | 322 +++
 include/block/block.h  | 878 +
 block.c|   3 +
 block/meson.build  |   7 +-
 6 files changed, 1004 insertions(+), 857 deletions(-)
 create mode 100644 include/block/block-common.h
 create mode 100644 include/block/block-global-state.h
 create mode 100644 include/block/block-io.h

diff --git a/include/block/block-common.h b/include/block/block-common.h
new file mode 100644
index 00..dfaa73ac70
--- /dev/null
+++ b/include/block/block-common.h
@@ -0,0 +1,380 @@
+#ifndef BLOCK_COMMON_H
+#define BLOCK_COMMON_H
+
+#include "block/aio.h"
+#include "block/aio-wait.h"
+#include "qemu/iov.h"
+#include "qemu/coroutine.h"
+#include "block/accounting.h"
+#include "block/dirty-bitmap.h"
+#include "block/blockjob.h"
+#include "qemu/hbitmap.h"
+#include "qemu/transactions.h"
+
+/*
+ * generated_co_wrapper
+ *
+ * Function specifier, which does nothing but mark functions to be
+ * generated by scripts/block-coroutine-wrapper.py
+ *
+ * Read more in docs/devel/block-coroutine-wrapper.rst
+ */
+#define generated_co_wrapper
+
+/* block.c */
+typedef struct BlockDriver BlockDriver;
+typedef struct BdrvChild BdrvChild;
+typedef struct BdrvChildClass BdrvChildClass;
+
+typedef struct BlockDriverInfo {
+/* in bytes, 0 if irrelevant */
+int cluster_size;
+/* offset at which the VM state can be saved (0 if not possible) */
+int64_t vm_state_offset;
+bool is_dirty;
+/*
+ * True if this block driver only supports compressed writes
+ */
+bool needs_compressed_writes;
+} BlockDriverInfo;
+
+typedef struct BlockFragInfo {
+uint64_t allocated_clusters;
+uint64_t total_clusters;
+uint64_t fragmented_clusters;
+uint64_t compressed_clusters;
+} BlockFragInfo;
+
+typedef enum {
+BDRV_REQ_COPY_ON_READ   = 0x1,
+BDRV_REQ_ZERO_WRITE = 0x2,
+
+/*
+ * The BDRV_REQ_MAY_UNMAP flag is used in write_zeroes requests to indicate
+ * that the block driver should unmap (discard) blocks if it is guaranteed
+ * that the result will read back as zeroes. The flag is only passed to the
+ * driver if the block device is opened with BDRV_O_UNMAP.
+ */
+BDRV_REQ_MAY_UNMAP  = 0x4,
+
+BDRV_REQ_FUA= 0x10,
+BDRV_REQ_WRITE_COMPRESSED   = 0x20,
+
+/*
+ * Signifies that this write request will not change the visible disk
+ * content.
+ */
+BDRV_REQ_WRITE_UNCHANGED= 0x40,
+
+/*
+ * Forces request serialisation. Use only with write requests.
+ */
+BDRV_REQ_SERIALISING= 0x80,
+
+/*
+ * Execute the request only if the operation can be offloaded or otherwise
+ * be executed efficiently, but return an error instead of using a slow
+ * fallback.
+ */
+BDRV_REQ_NO_FALLBACK= 0x100,
+
+/*
+ * BDRV_REQ_PREFETCH makes sense only in the context of copy-on-read
+ * (i.e., together with the BDRV_REQ_COPY_ON_READ flag or when a COR
+ * filter is involved), in which case it signals that the COR operation
+ * need not read the data into memory (qiov) but only ensure they are
+ * copied to the top layer (i.e., that COR operation is done).
+ */
+BDRV_REQ_PREFETCH  = 0x200,
+
+/*
+ * If we need to wait for other requests, just fail immediately. Used
+ * only together with BDRV_REQ_SERIALISING.
+ */
+BDRV_REQ_NO_WAIT = 0x400,
+
+/* Mask of valid flags */
+BDRV_REQ_MASK   = 0x7ff,
+} BdrvRequestFlags;
+
+#define BDRV_O_NO_SHARE0x0001 /* don't share permissions */
+#define BDRV_O_RDWR0x0002
+#define BDRV_O_RESIZE  0x0004 /* request permission for resizing the node 
*/
+#define BDRV_O_SNAPSHOT0x0008 /* open the file read only and save
+ writes in a snapshot */
+#define BDRV_O_TEMPORARY   0x0010 /* delete the file after use */
+#def

[PATCH v5 00/31] block layer: split block APIs in global state and I/O

2021-11-23 Thread Emanuele Giuseppe Esposito

Currently, block layer APIs like block.h contain a mix of
functions that are either running in the main loop and under the
BQL, or are thread-safe functions and run in iothreads performing I/O.
The functions running under BQL also take care of modifying the
block graph, by using drain and/or aio_context_acquire/release.
This makes it very confusing to understand where each function
runs, and what assumptions it provided with regards to thread
safety.

We call the functions running under BQL "global state (GS) API", and
distinguish them from the thread-safe "I/O API".

The aim of this series is to split the relevant block headers in
global state and I/O sub-headers. The division will be done in
this way:
header.h will be split in header-global-state.h, header-io.h and
header-common.h. The latter will just contain the data structures
needed by header-global-state and header-io, and common helpers
that are neither in GS nor in I/O. header.h will remain for
legacy and to avoid changing all includes in all QEMU c files,
but will only include the two new headers. No function shall be
added in header.c .
Once we split all relevant headers, it will be much easier to see what
uses the AioContext lock and remove it, which is the overall main
goal of this and other series that I posted/will post.

In addition to splitting the relevant headers shown in this series,
it is also very helpful splitting the function pointers in some
block structures, to understand what runs under AioContext lock and
what doesn't. This is what patches 21-27 do.

Each function in the GS API will have an assertion, checking
that it is always running under BQL.
I/O functions are instead thread safe (or so should be), meaning
that they *can* run under BQL, but also in an iothread in another
AioContext. Therefore they do not provide any assertion, and
need to be audited manually to verify the correctness.

Adding assetions has helped finding 2 bugs already, as shown in
my series "Migration: fix missing iothread locking".

Tested this series by running unit tests, qemu-iotests and qtests
(x86_64).
Some functions in the GS API are used everywhere but not
properly tested. Therefore their assertion is never actually run in
the tests, so despite my very careful auditing, it is not impossible
to exclude that some will trigger while actually using QEMU.

Patch 1 introduces qemu_in_main_thread(), the function used in
all assertions. This had to be introduced otherwise all unit tests
would fail, since they run in the main loop but use the code in
stubs/iothread.c
Patches 2-27 (with the exception of patch 9-10, that are an additional
assert) are all structured in the same way: first we split the header
and in the next (or same, if small) patch we add assertions.
Patch 28-31 take care instead of the block layer permission API,
fixing some bugs where they are used in I/O functions.

Next steps once this get reviewed:
1) audit the GS API and replace the AioContext lock with drains,
or remove them when not necessary (requires further discussion).
2) [optional as it should be already the case] audit the I/O API
and check that thread safety is guaranteed

In order to keep this series a little bit smaller, move some
refactoring patches in another series, "block: minor refactoring
in preparation to the block layer API split".

Based-on: <20211124063640.3118897-1-eespo...@redhat.com>

Signed-off-by: Emanuele Giuseppe Esposito 
---
v5 -> v6:
* In short, apply all Hanna's comments. More in details,
  the following functions in the following headers have been moved:
  block-backend:
blk_replace_bs (to gs)
blk_nb_sectors (to io)
blk_name (to io)
blk_set_perm (to io)
blk_get_perm (to io)
blk_drain (to io)
blk_abort_aio_request (to io)
blk_make_empty (to gs)
blk_invalidate_cache (was in io, but had GS assertion)
blk_aio_cancel (to gs)
  block:
bdrv_replace_child_bs (to gs)
bdrv_get_device_name (to io)
bdrv_get_device_or_node_name (to io)
bdrv_drained_end_no_poll (to io)
bdrv_block_status (was in io, but had GS assertion)
bdrv_drain (to io)
bdrv_co_drain (to io)
bdrv_make_zero (was in GS, but did not have the assertion)
bdrv_save_vmstate (to io)
bdrv_load_vmstate (to io)
bdrv_aio_cancel_async (to io)
  block_int:
bdrv_get_parent_name (to io)
bdrv_apply_subtree_drain (to io)
bdrv_unapply_subtree_drain (to io)
bdrv_co_copy_range_from (was in io, but had GS assertion)
bdrv_co_copy_range_to (was in io, but had GS assertion)
->bdrv_save_vmstate (to io)
->bdrv_load_vmstate (to io)

  coding style (assertion after definitions):
bdrv_save_vmstate
bdrv_load_vmstate
block_job_next
block_job_get

  new patches:
block.c: modify .attach and .detach callbacks of child_of_bds
introduce pre_run as JobDriver callback to handle
  bdrv_co_amend usage of permission function
leave blk_set/get_perm as a TODO in fuse.c
make sure bdrv_co_invalidate_

[PATCH 4/4] include/sysemu/blockdev.h: remove drive_get_max_devs

2021-11-23 Thread Emanuele Giuseppe Esposito

Remove drive_get_max_devs, as it is not used by anyone.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/sysemu/blockdev.h |  1 -
 blockdev.c| 17 -
 2 files changed, 18 deletions(-)

diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index aacc587a33..c4b7b8b54e 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -50,7 +50,6 @@ DriveInfo *drive_get(BlockInterfaceType type, int bus, int 
unit);
 void drive_check_orphaned(void);
 DriveInfo *drive_get_by_index(BlockInterfaceType type, int index);
 int drive_get_max_bus(BlockInterfaceType type);
-int drive_get_max_devs(BlockInterfaceType type);
 DriveInfo *drive_get_next(BlockInterfaceType type);
 
 DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type,
diff --git a/blockdev.c b/blockdev.c
index 928bb743a1..84897f9d70 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -168,23 +168,6 @@ void blockdev_auto_del(BlockBackend *blk)
 }
 }
 
-/**
- * Returns the current mapping of how many units per bus
- * a particular interface can support.
- *
- *  A positive integer indicates n units per bus.
- *  0 implies the mapping has not been established.
- * -1 indicates an invalid BlockInterfaceType was given.
- */
-int drive_get_max_devs(BlockInterfaceType type)
-{
-if (type >= IF_IDE && type < IF_COUNT) {
-return if_max_devs[type];
-}
-
-return -1;
-}
-
 static int drive_index_to_bus_id(BlockInterfaceType type, int index)
 {
 int max_devs = if_max_devs[type];
-- 
2.27.0

[PATCH v5 04/31] include/sysemu/block-backend: split header into I/O and global state (GS) API

2021-11-23 Thread Emanuele Giuseppe Esposito

Similarly to the previous patches, split block-backend.h
in block-backend-io.h and block-backend-global-state.h

In addition, remove "block/block.h" include as it seems
it is not necessary anymore, together with "qemu/iov.h"

block-backend-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.

Assertions are added in the next patch.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/sysemu/block-backend-common.h   |  84 ++
 include/sysemu/block-backend-global-state.h | 121 +
 include/sysemu/block-backend-io.h   | 139 ++
 include/sysemu/block-backend.h  | 269 +---
 block/block-backend.c   |   9 +-
 5 files changed, 353 insertions(+), 269 deletions(-)
 create mode 100644 include/sysemu/block-backend-common.h
 create mode 100644 include/sysemu/block-backend-global-state.h
 create mode 100644 include/sysemu/block-backend-io.h

diff --git a/include/sysemu/block-backend-common.h 
b/include/sysemu/block-backend-common.h
new file mode 100644
index 00..6963bbf45a
--- /dev/null
+++ b/include/sysemu/block-backend-common.h
@@ -0,0 +1,84 @@
+/*
+ * QEMU Block backends
+ *
+ * Copyright (C) 2014-2016 Red Hat, Inc.
+ *
+ * Authors:
+ *  Markus Armbruster ,
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1
+ * or later.  See the COPYING.LIB file in the top-level directory.
+ */
+
+#ifndef BLOCK_BACKEND_COMMON_H
+#define BLOCK_BACKEND_COMMON_H
+
+#include "qemu/iov.h"
+#include "block/throttle-groups.h"
+
+/*
+ * TODO Have to include block/block.h for a bunch of block layer
+ * types.  Unfortunately, this pulls in the whole BlockDriverState
+ * API, which we don't want used by many BlockBackend users.  Some of
+ * the types belong here, and the rest should be split into a common
+ * header and one for the BlockDriverState API.
+ */
+#include "block/block.h"
+
+/* Callbacks for block device models */
+typedef struct BlockDevOps {
+/*
+ * Runs when virtual media changed (monitor commands eject, change)
+ * Argument load is true on load and false on eject.
+ * Beware: doesn't run when a host device's physical media
+ * changes.  Sure would be useful if it did.
+ * Device models with removable media must implement this callback.
+ */
+void (*change_media_cb)(void *opaque, bool load, Error **errp);
+/*
+ * Runs when an eject request is issued from the monitor, the tray
+ * is closed, and the medium is locked.
+ * Device models that do not implement is_medium_locked will not need
+ * this callback.  Device models that can lock the medium or tray might
+ * want to implement the callback and unlock the tray when "force" is
+ * true, even if they do not support eject requests.
+ */
+void (*eject_request_cb)(void *opaque, bool force);
+/*
+ * Is the virtual tray open?
+ * Device models implement this only when the device has a tray.
+ */
+bool (*is_tray_open)(void *opaque);
+/*
+ * Is the virtual medium locked into the device?
+ * Device models implement this only when device has such a lock.
+ */
+bool (*is_medium_locked)(void *opaque);
+/*
+ * Runs when the size changed (e.g. monitor command block_resize)
+ */
+void (*resize_cb)(void *opaque);
+/*
+ * Runs when the backend receives a drain request.
+ */
+void (*drained_begin)(void *opaque);
+/*
+ * Runs when the backend's last drain request ends.
+ */
+void (*drained_end)(void *opaque);
+/*
+ * Is the device still busy?
+ */
+bool (*drained_poll)(void *opaque);
+} BlockDevOps;
+
+/*
+ * This struct is embedded in (the private) BlockBackend struct and contains
+ * fields that must be public. This is in particular for QLIST_ENTRY() and
+ * friends so that BlockBackends can be kept in lists outside block-backend.c
+ */
+typedef struct BlockBackendPublic {
+ThrottleGroupMember throttle_group_member;
+} BlockBackendPublic;
+
+#endif /* BLOCK_BACKEND_COMMON_H */
diff --git a/include/sysemu/block-backend-global-state.h 
b/include/sysemu/block-backend-global-state.h
new file mode 100644
index 00..77009bf7a2
--- /dev/null
+++ b/include/sysemu/block-backend-global-state.h
@@ -0,0 +1,121 @@
+/*
+ * QEMU Block backends
+ *
+ * Copyright (C) 2014-2016 Red Hat, Inc.
+ *
+ * Authors:
+ *  Markus Armbruster ,
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1
+ * or later.  See the COPYING.LIB file in the top-level directory.
+ */
+
+#ifndef BLOCK_BACKEND_GS_H
+#define BLOCK_BACKEND_GS_H
+
+#include "block-backend-common.h"
+
+/*
+ * Global state (GS) API. These functions run under the BQL lock.
+ *
+ * See include/block/block-global-state.h for more information about
+ * the GS API.
+ */
+
+BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm);
+BlockBackend *blk_new_with_bs(

[PATCH v5 01/31] main-loop.h: introduce qemu_in_main_thread()

2021-11-23 Thread Emanuele Giuseppe Esposito

When invoked from the main loop, this function is the same
as qemu_mutex_iothread_locked, and returns true if the BQL is held.
When invoked from iothreads or tests, it returns true only
if the current AioContext is the Main Loop.

This essentially just extends qemu_mutex_iothread_locked to work
also in unit tests or other users like storage-daemon, that run
in the Main Loop but end up using the implementation in
stubs/iothread-lock.c.

Using qemu_mutex_iothread_locked in unit tests defaults to false
because they use the implementation in stubs/iothread-lock,
making all assertions added in next patches fail despite the
AioContext is still the main loop.

Signed-off-by: Emanuele Giuseppe Esposito 
Reviewed-by: Stefan Hajnoczi 
---
 include/qemu/main-loop.h | 13 +
 softmmu/cpus.c   |  5 +
 stubs/iothread-lock.c|  5 +
 3 files changed, 23 insertions(+)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index 8dbc6fcb89..6b8fa57c5d 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -245,6 +245,19 @@ AioContext *iohandler_get_aio_context(void);
  */
 bool qemu_mutex_iothread_locked(void);
 
+/**
+ * qemu_in_main_thread: same as qemu_mutex_iothread_locked when
+ * softmmu/cpus.c implementation is linked. Otherwise this function
+ * checks that the current AioContext is the global AioContext
+ * (main loop).
+ *
+ * This is useful when checking that the BQL is held, to avoid
+ * returning false when invoked by unit tests or other users like
+ * storage-daemon that end up using stubs/iothread-lock.c
+ * implementation.
+ */
+bool qemu_in_main_thread(void);
+
 /**
  * qemu_mutex_lock_iothread: Lock the main loop mutex.
  *
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 071085f840..3f61a3c31d 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -481,6 +481,11 @@ bool qemu_mutex_iothread_locked(void)
 return iothread_locked;
 }
 
+bool qemu_in_main_thread(void)
+{
+return qemu_mutex_iothread_locked();
+}
+
 /*
  * The BQL is taken from so many places that it is worth profiling the
  * callers directly, instead of funneling them all through a single function.
diff --git a/stubs/iothread-lock.c b/stubs/iothread-lock.c
index 5b45b7fc8b..ff7386e42c 100644
--- a/stubs/iothread-lock.c
+++ b/stubs/iothread-lock.c
@@ -6,6 +6,11 @@ bool qemu_mutex_iothread_locked(void)
 return false;
 }
 
+bool qemu_in_main_thread(void)
+{
+return qemu_get_current_aio_context() == qemu_get_aio_context();
+}
+
 void qemu_mutex_lock_iothread_impl(const char *file, int line)
 {
 }
-- 
2.27.0

[PATCH 3/4] include/sysemu/blockdev.h: move drive_add and inline drive_def

2021-11-23 Thread Emanuele Giuseppe Esposito

drive_add is only used in softmmu/vl.c, so it can be a static
function there, and drive_def is only a particular use case of
qemu_opts_parse_noisily, so it can be inlined.

Also remove drive_mark_claimed_by_board, as it is only defined
but not implemented (nor used) anywhere.

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/sysemu/blockdev.h  |  6 ++
 block/monitor/block-hmp-cmds.c |  2 +-
 blockdev.c | 27 +--
 softmmu/vl.c   | 25 -
 4 files changed, 28 insertions(+), 32 deletions(-)

diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h
index 32c2d6023c..aacc587a33 100644
--- a/include/sysemu/blockdev.h
+++ b/include/sysemu/blockdev.h
@@ -27,6 +27,8 @@ typedef enum {
 IF_COUNT
 } BlockInterfaceType;
 
+extern const char *const block_if_name[];
+
 struct DriveInfo {
 BlockInterfaceType type;
 int bus;
@@ -45,16 +47,12 @@ BlockBackend *blk_by_legacy_dinfo(DriveInfo *dinfo);
 void override_max_devs(BlockInterfaceType type, int max_devs);
 
 DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit);
-void drive_mark_claimed_by_board(void);
 void drive_check_orphaned(void);
 DriveInfo *drive_get_by_index(BlockInterfaceType type, int index);
 int drive_get_max_bus(BlockInterfaceType type);
 int drive_get_max_devs(BlockInterfaceType type);
 DriveInfo *drive_get_next(BlockInterfaceType type);
 
-QemuOpts *drive_def(const char *optstr);
-QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
-const char *optstr);
 DriveInfo *drive_new(QemuOpts *arg, BlockInterfaceType block_default_type,
  Error **errp);
 
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 2ac4aedfff..bfb3c043a0 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -101,7 +101,7 @@ void hmp_drive_add(Monitor *mon, const QDict *qdict)
 return;
 }
 
-opts = drive_def(optstr);
+opts = qemu_opts_parse_noisily(qemu_find_opts("drive"), optstr, false);
 if (!opts)
 return;
 
diff --git a/blockdev.c b/blockdev.c
index 1b6ffbbc73..928bb743a1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -71,7 +71,7 @@ void bdrv_set_monitor_owned(BlockDriverState *bs)
 QTAILQ_INSERT_TAIL(&monitor_bdrv_states, bs, monitor_list);
 }
 
-static const char *const block_if_name[IF_COUNT] = {
+const char *const block_if_name[IF_COUNT] = {
 [IF_NONE] = "none",
 [IF_IDE] = "ide",
 [IF_SCSI] = "scsi",
@@ -197,31 +197,6 @@ static int drive_index_to_unit_id(BlockInterfaceType type, 
int index)
 return max_devs ? index % max_devs : index;
 }
 
-QemuOpts *drive_def(const char *optstr)
-{
-return qemu_opts_parse_noisily(qemu_find_opts("drive"), optstr, false);
-}
-
-QemuOpts *drive_add(BlockInterfaceType type, int index, const char *file,
-const char *optstr)
-{
-QemuOpts *opts;
-
-opts = drive_def(optstr);
-if (!opts) {
-return NULL;
-}
-if (type != IF_DEFAULT) {
-qemu_opt_set(opts, "if", block_if_name[type], &error_abort);
-}
-if (index >= 0) {
-qemu_opt_set_number(opts, "index", index, &error_abort);
-}
-if (file)
-qemu_opt_set(opts, "file", file, &error_abort);
-return opts;
-}
-
 DriveInfo *drive_get(BlockInterfaceType type, int bus, int unit)
 {
 BlockBackend *blk;
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 1159a64bce..f47827926c 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -650,6 +650,27 @@ static int drive_enable_snapshot(void *opaque, QemuOpts 
*opts, Error **errp)
 return 0;
 }
 
+static QemuOpts *drive_add(BlockInterfaceType type, int index,
+   const char *file, const char *optstr)
+{
+QemuOpts *opts;
+
+opts = qemu_opts_parse_noisily(qemu_find_opts("drive"), optstr, false);
+if (!opts) {
+return NULL;
+}
+if (type != IF_DEFAULT) {
+qemu_opt_set(opts, "if", block_if_name[type], &error_abort);
+}
+if (index >= 0) {
+qemu_opt_set_number(opts, "index", index, &error_abort);
+}
+if (file) {
+qemu_opt_set(opts, "file", file, &error_abort);
+}
+return opts;
+}
+
 static void default_drive(int enable, int snapshot, BlockInterfaceType type,
   int index, const char *optstr)
 {
@@ -2884,7 +2905,9 @@ void qemu_init(int argc, char **argv, char **envp)
 break;
 }
 case QEMU_OPTION_drive:
-if (drive_def(optarg) == NULL) {
+opts = qemu_opts_parse_noisily(qemu_find_opts("drive"),
+   optarg, false);
+if (opts == NULL) {
 exit(1);
 }
 break;
-- 
2.27.0

[PATCH 1/4] block_int: make bdrv_backing_overridden static

2021-11-23 Thread Emanuele Giuseppe Esposito

bdrv_backing_overridden is only used in block.c, so there is
no need to leave it in block_int.h

Signed-off-by: Emanuele Giuseppe Esposito 
---
 include/block/block_int.h | 3 ---
 block.c   | 4 +++-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index f4c75e8ba9..27008cfb22 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1122,9 +1122,6 @@ BlockDriver *bdrv_probe_all(const uint8_t *buf, int 
buf_size,
 void bdrv_parse_filename_strip_prefix(const char *filename, const char *prefix,
   QDict *options);
 
-bool bdrv_backing_overridden(BlockDriverState *bs);
-
-
 /**
  * bdrv_add_aio_context_notifier:
  *
diff --git a/block.c b/block.c
index 0ac5b163d2..10346b5011 100644
--- a/block.c
+++ b/block.c
@@ -103,6 +103,8 @@ static int bdrv_reopen_prepare(BDRVReopenState 
*reopen_state,
 static void bdrv_reopen_commit(BDRVReopenState *reopen_state);
 static void bdrv_reopen_abort(BDRVReopenState *reopen_state);
 
+static bool bdrv_backing_overridden(BlockDriverState *bs);
+
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
 
@@ -7475,7 +7477,7 @@ static bool append_strong_runtime_options(QDict *d, 
BlockDriverState *bs)
 /* Note: This function may return false positives; it may return true
  * even if opening the backing file specified by bs's image header
  * would result in exactly bs->backing. */
-bool bdrv_backing_overridden(BlockDriverState *bs)
+static bool bdrv_backing_overridden(BlockDriverState *bs)
 {
 if (bs->backing) {
 return strcmp(bs->auto_backing_file,
-- 
2.27.0

[PATCH 2/4] include/sysemu/blockdev.h: rename if_name in block_if_name

2021-11-23 Thread Emanuele Giuseppe Esposito

In preparation to next patch, where we export it to be used
also in softmmu/vlc.c

Signed-off-by: Emanuele Giuseppe Esposito 
---
 blockdev.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index b35072644e..1b6ffbbc73 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -71,7 +71,7 @@ void bdrv_set_monitor_owned(BlockDriverState *bs)
 QTAILQ_INSERT_TAIL(&monitor_bdrv_states, bs, monitor_list);
 }
 
-static const char *const if_name[IF_COUNT] = {
+static const char *const block_if_name[IF_COUNT] = {
 [IF_NONE] = "none",
 [IF_IDE] = "ide",
 [IF_SCSI] = "scsi",
@@ -120,7 +120,7 @@ void override_max_devs(BlockInterfaceType type, int 
max_devs)
 if (dinfo->type == type) {
 fprintf(stderr, "Cannot override units-per-bus property of"
 " the %s interface, because a drive of that type has"
-" already been added.\n", if_name[type]);
+" already been added.\n", block_if_name[type]);
 g_assert_not_reached();
 }
 }
@@ -212,7 +212,7 @@ QemuOpts *drive_add(BlockInterfaceType type, int index, 
const char *file,
 return NULL;
 }
 if (type != IF_DEFAULT) {
-qemu_opt_set(opts, "if", if_name[type], &error_abort);
+qemu_opt_set(opts, "if", block_if_name[type], &error_abort);
 }
 if (index >= 0) {
 qemu_opt_set_number(opts, "index", index, &error_abort);
@@ -269,7 +269,7 @@ void drive_check_orphaned(void)
 qemu_opts_loc_restore(dinfo->opts);
 error_report("machine type does not support"
  " if=%s,bus=%d,unit=%d",
- if_name[dinfo->type], dinfo->bus, dinfo->unit);
+ block_if_name[dinfo->type], dinfo->bus, dinfo->unit);
 loc_pop(&loc);
 orphans = true;
 }
@@ -887,7 +887,7 @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType 
block_default_type,
 value = qemu_opt_get(legacy_opts, "if");
 if (value) {
 for (type = 0;
- type < IF_COUNT && strcmp(value, if_name[type]);
+ type < IF_COUNT && strcmp(value, block_if_name[type]);
  type++) {
 }
 if (type == IF_COUNT) {
@@ -945,10 +945,10 @@ DriveInfo *drive_new(QemuOpts *all_opts, 
BlockInterfaceType block_default_type,
 mediastr = (media == MEDIA_CDROM) ? "-cd" : "-hd";
 }
 if (max_devs) {
-new_id = g_strdup_printf("%s%i%s%i", if_name[type], bus_id,
+new_id = g_strdup_printf("%s%i%s%i", block_if_name[type], bus_id,
  mediastr, unit_id);
 } else {
-new_id = g_strdup_printf("%s%s%i", if_name[type],
+new_id = g_strdup_printf("%s%s%i", block_if_name[type],
  mediastr, unit_id);
 }
 qdict_put_str(bs_opts, "id", new_id);
-- 
2.27.0

[PATCH 0/4] block: minor refactoring in preparation to the block layer API split

2021-11-23 Thread Emanuele Giuseppe Esposito

These patches are taken from my old patches and feedback of
my series "block layer: split block APIs in global state and I/O".

The reason for a separate series is that the original one is
already too long, and these patches are just refactoring the code,
mainly deleting or moving functions in blockdev.h and block_int.h.

Signed-off-by: Emanuele Giuseppe Esposito 

Emanuele Giuseppe Esposito (4):
  block_int: make bdrv_backing_overridden static
  include/sysemu/blockdev.h: rename if_name in block_if_name
  include/sysemu/blockdev.h: move drive_add and inline drive_def
  include/sysemu/blockdev.h: remove drive_get_max_devs

 include/block/block_int.h  |  3 --
 include/sysemu/blockdev.h  |  7 ++---
 block.c|  4 ++-
 block/monitor/block-hmp-cmds.c |  2 +-
 blockdev.c | 54 --
 softmmu/vl.c   | 25 +++-
 6 files changed, 36 insertions(+), 59 deletions(-)

-- 
2.27.0

Re: [PATCH v5 15/18] target/riscv: adding high part of some csrs

2021-11-23 Thread Alistair Francis

On Sat, Nov 13, 2021 at 1:19 AM Frédéric Pétrot
 wrote:
>
> Adding the high part of a very minimal set of csr.
>
> Signed-off-by: Frédéric Pétrot 
> Co-authored-by: Fabien Portas 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.h | 4 
>  target/riscv/machine.c | 2 ++
>  2 files changed, 6 insertions(+)
>
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index ae1f9cb876..15609a5533 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -195,6 +195,10 @@ struct CPURISCVState {
>  target_ulong hgatp;
>  uint64_t htimedelta;
>
> +/* Upper 64-bits of 128-bit CSRs */
> +uint64_t mscratchh;
> +uint64_t sscratchh;
> +
>  /* Virtual CSRs */
>  /*
>   * For RV32 this is 32-bit vsstatus and 32-bit vsstatush.
> diff --git a/target/riscv/machine.c b/target/riscv/machine.c
> index 7e2d02457e..6f0eabf66a 100644
> --- a/target/riscv/machine.c
> +++ b/target/riscv/machine.c
> @@ -179,6 +179,8 @@ static const VMStateDescription vmstate_rv128 = {
>  .needed = rv128_needed,
>  .fields = (VMStateField[]) {
>  VMSTATE_UINTTL_ARRAY(env.gprh, RISCVCPU, 32),
> +VMSTATE_UINT64(env.mscratchh, RISCVCPU),
> +VMSTATE_UINT64(env.sscratchh, RISCVCPU),
>  VMSTATE_END_OF_LIST()
>  }
>  };
> --
> 2.33.1
>
>

Re: [PATCH v5 11/18] target/riscv: support for 128-bit U-type instructions

2021-11-23 Thread Alistair Francis

On Sat, Nov 13, 2021 at 1:12 AM Frédéric Pétrot
 wrote:
>
> Adding the 128-bit version of lui and auipc, and introducing to that end
> a "set register with immediate" function to handle extension on 128 bits.
>
> Signed-off-by: Frédéric Pétrot 
> Co-authored-by: Fabien Portas 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/translate.c| 21 +
>  target/riscv/insn_trans/trans_rvi.c.inc |  8 
>  2 files changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 508ae87985..d2a2f1021d 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -289,6 +289,27 @@ static void gen_set_gpr(DisasContext *ctx, int reg_num, 
> TCGv t)
>  }
>  }
>
> +static void gen_set_gpri(DisasContext *ctx, int reg_num, target_long imm)
> +{
> +if (reg_num != 0) {
> +switch (get_ol(ctx)) {
> +case MXL_RV32:
> +tcg_gen_movi_tl(cpu_gpr[reg_num], (int32_t)imm);
> +break;
> +case MXL_RV64:
> +case MXL_RV128:
> +tcg_gen_movi_tl(cpu_gpr[reg_num], imm);
> +break;
> +default:
> +g_assert_not_reached();
> +}
> +
> +if (get_xl_max(ctx) == MXL_RV128) {
> +tcg_gen_movi_tl(cpu_gprh[reg_num], -(imm < 0));
> +}
> +}
> +}
> +
>  static void gen_set_gpr128(DisasContext *ctx, int reg_num, TCGv rl, TCGv rh)
>  {
>  assert(get_ol(ctx) == MXL_RV128);
> diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> b/target/riscv/insn_trans/trans_rvi.c.inc
> index fc73735b9e..0070fe606a 100644
> --- a/target/riscv/insn_trans/trans_rvi.c.inc
> +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> @@ -26,14 +26,14 @@ static bool trans_illegal(DisasContext *ctx, arg_empty *a)
>
>  static bool trans_c64_illegal(DisasContext *ctx, arg_empty *a)
>  {
> - REQUIRE_64BIT(ctx);
> - return trans_illegal(ctx, a);
> +REQUIRE_64_OR_128BIT(ctx);
> +return trans_illegal(ctx, a);
>  }
>
>  static bool trans_lui(DisasContext *ctx, arg_lui *a)
>  {
>  if (a->rd != 0) {
> -tcg_gen_movi_tl(cpu_gpr[a->rd], a->imm);
> +gen_set_gpri(ctx, a->rd, a->imm);
>  }
>  return true;
>  }
> @@ -41,7 +41,7 @@ static bool trans_lui(DisasContext *ctx, arg_lui *a)
>  static bool trans_auipc(DisasContext *ctx, arg_auipc *a)
>  {
>  if (a->rd != 0) {
> -tcg_gen_movi_tl(cpu_gpr[a->rd], a->imm + ctx->base.pc_next);
> +gen_set_gpri(ctx, a->rd, a->imm + ctx->base.pc_next);
>  }
>  return true;
>  }
> --
> 2.33.1
>
>

Re: [PATCH v5 10/18] target/riscv: support for 128-bit bitwise instructions

2021-11-23 Thread Alistair Francis

On Sat, Nov 13, 2021 at 1:20 AM Frédéric Pétrot
 wrote:
>
> The 128-bit bitwise instructions do not need any function prototype change
> as the functions can be applied independently on the lower and upper part of
> the registers.
>
> Signed-off-by: Frédéric Pétrot 
> Co-authored-by: Fabien Portas 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/translate.c | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> index 554cf05084..508ae87985 100644
> --- a/target/riscv/translate.c
> +++ b/target/riscv/translate.c
> @@ -448,7 +448,15 @@ static bool gen_logic_imm_fn(DisasContext *ctx, arg_i *a,
>
>  func(dest, src1, a->imm);
>
> -gen_set_gpr(ctx, a->rd, dest);
> +if (get_xl(ctx) == MXL_RV128) {
> +TCGv src1h = get_gprh(ctx, a->rs1);
> +TCGv desth = dest_gprh(ctx, a->rd);
> +
> +func(desth, src1h, -(a->imm < 0));
> +gen_set_gpr128(ctx, a->rd, dest, desth);
> +} else {
> +gen_set_gpr(ctx, a->rd, dest);
> +}
>
>  return true;
>  }
> @@ -462,7 +470,16 @@ static bool gen_logic(DisasContext *ctx, arg_r *a,
>
>  func(dest, src1, src2);
>
> -gen_set_gpr(ctx, a->rd, dest);
> +if (get_xl(ctx) == MXL_RV128) {
> +TCGv src1h = get_gprh(ctx, a->rs1);
> +TCGv src2h = get_gprh(ctx, a->rs2);
> +TCGv desth = dest_gprh(ctx, a->rd);
> +
> +func(desth, src1h, src2h);
> +gen_set_gpr128(ctx, a->rd, dest, desth);
> +} else {
> +gen_set_gpr(ctx, a->rd, dest);
> +}
>
>  return true;
>  }
> --
> 2.33.1
>
>

Re: [PATCH v5 07/18] target/riscv: setup everything so that riscv128-softmmu compiles

2021-11-23 Thread Alistair Francis

On Sat, Nov 13, 2021 at 1:16 AM Frédéric Pétrot
 wrote:
>
> This patch is kind of a mess because several files have to be slightly
> modified to allow for a new target. In the current status, we have done
> our best to have RV64 and RV128 under the same RV64 umbrella, but there
> is still work to do to have a single executable for both.
> In particular, we have no atomic accesses for aligned 128-bit addresses.
>
> Once this patch applied, adding risc128-sofmmu to --target-list produces
> a (no so useful yet) executable.

I can't remember if we discussed this before, but do we need the
riscv128-sofmmu executable? Can we instead just use a riscv64-sofmmu
executable?

Alistair

>
> Signed-off-by: Frédéric Pétrot 
> Co-authored-by: Fabien Portas 
> ---
>  configs/devices/riscv128-softmmu/default.mak | 17 +++
>  configs/targets/riscv128-softmmu.mak |  6 ++
>  include/disas/dis-asm.h  |  1 +
>  include/hw/riscv/sifive_cpu.h|  3 +++
>  target/riscv/cpu-param.h |  5 +
>  target/riscv/cpu.h   |  3 +++
>  disas/riscv.c|  5 +
>  target/riscv/cpu.c   | 22 ++--
>  target/riscv/gdbstub.c   |  8 +++
>  target/riscv/insn_trans/trans_rvd.c.inc  | 12 +--
>  target/riscv/insn_trans/trans_rvf.c.inc  |  6 +++---
>  target/riscv/Kconfig |  3 +++
>  12 files changed, 80 insertions(+), 11 deletions(-)
>  create mode 100644 configs/devices/riscv128-softmmu/default.mak
>  create mode 100644 configs/targets/riscv128-softmmu.mak
>
> diff --git a/configs/devices/riscv128-softmmu/default.mak 
> b/configs/devices/riscv128-softmmu/default.mak
> new file mode 100644
> index 00..e838f35785
> --- /dev/null
> +++ b/configs/devices/riscv128-softmmu/default.mak
> @@ -0,0 +1,17 @@
> +# Default configuration for riscv128-softmmu
> +
> +# Uncomment the following lines to disable these optional devices:
> +#
> +#CONFIG_PCI_DEVICES=n
> +# No does not seem to be an option for these two parameters
> +CONFIG_SEMIHOSTING=y
> +CONFIG_ARM_COMPATIBLE_SEMIHOSTING=y
> +
> +# Boards:
> +#
> +CONFIG_SPIKE=n
> +CONFIG_SIFIVE_E=n
> +CONFIG_SIFIVE_U=n
> +CONFIG_RISCV_VIRT=y
> +CONFIG_MICROCHIP_PFSOC=n
> +CONFIG_SHAKTI_C=n
> diff --git a/configs/targets/riscv128-softmmu.mak 
> b/configs/targets/riscv128-softmmu.mak
> new file mode 100644
> index 00..d812cc1c80
> --- /dev/null
> +++ b/configs/targets/riscv128-softmmu.mak
> @@ -0,0 +1,6 @@
> +TARGET_ARCH=riscv128
> +TARGET_BASE_ARCH=riscv
> +# As long as we have no atomic accesses for aligned 128-bit addresses
> +TARGET_SUPPORTS_MTTCG=n
> +TARGET_XML_FILES=gdb-xml/riscv-64bit-cpu.xml gdb-xml/riscv-32bit-fpu.xml 
> gdb-xml/riscv-64bit-fpu.xml gdb-xml/riscv-64bit-virtual.xml
> +TARGET_NEED_FDT=y
> diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
> index 08e1beec85..102a1e7f50 100644
> --- a/include/disas/dis-asm.h
> +++ b/include/disas/dis-asm.h
> @@ -459,6 +459,7 @@ int print_insn_nios2(bfd_vma, disassemble_info*);
>  int print_insn_xtensa   (bfd_vma, disassemble_info*);
>  int print_insn_riscv32  (bfd_vma, disassemble_info*);
>  int print_insn_riscv64  (bfd_vma, disassemble_info*);
> +int print_insn_riscv128 (bfd_vma, disassemble_info*);
>  int print_insn_rx(bfd_vma, disassemble_info *);
>  int print_insn_hexagon(bfd_vma, disassemble_info *);
>
> diff --git a/include/hw/riscv/sifive_cpu.h b/include/hw/riscv/sifive_cpu.h
> index 136799633a..64078feba8 100644
> --- a/include/hw/riscv/sifive_cpu.h
> +++ b/include/hw/riscv/sifive_cpu.h
> @@ -26,6 +26,9 @@
>  #elif defined(TARGET_RISCV64)
>  #define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E51
>  #define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U54
> +#else
> +#define SIFIVE_E_CPU TYPE_RISCV_CPU_SIFIVE_E51
> +#define SIFIVE_U_CPU TYPE_RISCV_CPU_SIFIVE_U54
>  #endif
>
>  #endif /* HW_SIFIVE_CPU_H */
> diff --git a/target/riscv/cpu-param.h b/target/riscv/cpu-param.h
> index 80eb615f93..c10459b56f 100644
> --- a/target/riscv/cpu-param.h
> +++ b/target/riscv/cpu-param.h
> @@ -16,6 +16,11 @@
>  # define TARGET_LONG_BITS 32
>  # define TARGET_PHYS_ADDR_SPACE_BITS 34 /* 22-bit PPN */
>  # define TARGET_VIRT_ADDR_SPACE_BITS 32 /* sv32 */
> +#else
> +/* 64-bit target, since QEMU isn't built to have TARGET_LONG_BITS over 64 */
> +# define TARGET_LONG_BITS 64
> +# define TARGET_PHYS_ADDR_SPACE_BITS 56 /* 44-bit PPN */
> +# define TARGET_VIRT_ADDR_SPACE_BITS 48 /* sv48 */
>  #endif
>  #define TARGET_PAGE_BITS 12 /* 4 KiB Pages */
>  /*
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 53a295efb7..8ff5b08d15 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -38,6 +38,7 @@
>  #define TYPE_RISCV_CPU_ANY  RISCV_CPU_TYPE_NAME("any")
>  #define TYPE_RISCV_CPU_BASE32   RISCV_CPU_TYPE_NAME("rv32")
>  #define TYPE_RISCV_CPU_BASE64   RISCV_CPU

[PATCH] intel-iommu: ignore SNP bit in scalable mode

2021-11-23 Thread Jason Wang

When booting with scalable mode, I hit this error:

qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve non-zero 
iova=0xf002, level=0x1slpte=0x102681803)
qemu-system-x86_64: vtd_iommu_translate: detected translation failure 
(dev=01:00:00, iova=0xf002)
qemu-system-x86_64: New fault is not recorded due to compression of faults

This is because the SNP bit is set since Linux kernel commit
6c00612d0cba1 ("iommu/vt-d: Report right snoop capability when using
FL for IOVA") where SNP bit is set if scalable mode is on though this
seems to be an violation on the spec which said the SNP bit is
considered to be reserved if SC is not supported.

To unbreak the guest, ignore the SNP bit for scalable mode first. In
the future we may consider to add SC support.

Signed-off-by: Jason Wang 
---
 hw/i386/intel_iommu.c  | 18 --
 hw/i386/intel_iommu_internal.h |  2 ++
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 294499ee20..3bcac56c3e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -969,7 +969,8 @@ static dma_addr_t vtd_get_iova_pgtbl_base(IntelIOMMUState 
*s,
 static uint64_t vtd_spte_rsvd[5];
 static uint64_t vtd_spte_rsvd_large[5];
 
-static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level)
+static bool vtd_slpte_nonzero_rsvd(IntelIOMMUState *s,
+   uint64_t slpte, uint32_t level)
 {
 uint64_t rsvd_mask = vtd_spte_rsvd[level];
 
@@ -979,6 +980,10 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, 
uint32_t level)
 rsvd_mask = vtd_spte_rsvd_large[level];
 }
 
+if (s->scalable_mode) {
+rsvd_mask &= ~VTD_SPTE_SNP;
+}
+
 return slpte & rsvd_mask;
 }
 
@@ -1054,7 +1059,7 @@ static int vtd_iova_to_slpte(IntelIOMMUState *s, 
VTDContextEntry *ce,
   iova, level, slpte, is_write);
 return is_write ? -VTD_FR_WRITE : -VTD_FR_READ;
 }
-if (vtd_slpte_nonzero_rsvd(slpte, level)) {
+if (vtd_slpte_nonzero_rsvd(s, slpte, level)) {
 error_report_once("%s: detected splte reserve non-zero "
   "iova=0x%" PRIx64 ", level=0x%" PRIx32
   "slpte=0x%" PRIx64 ")", __func__, iova,
@@ -1185,7 +1190,8 @@ static int vtd_page_walk_one(IOMMUTLBEvent *event, 
vtd_page_walk_info *info)
  * @write: whether parent level has write permission
  * @info: constant information for the page walk
  */
-static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
+static int vtd_page_walk_level(IntelIOMMUState *s,
+   dma_addr_t addr, uint64_t start,
uint64_t end, uint32_t level, bool read,
bool write, vtd_page_walk_info *info)
 {
@@ -1214,7 +1220,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
 goto next;
 }
 
-if (vtd_slpte_nonzero_rsvd(slpte, level)) {
+if (vtd_slpte_nonzero_rsvd(s, slpte, level)) {
 trace_vtd_page_walk_skip_reserve(iova, iova_next);
 goto next;
 }
@@ -1235,7 +1241,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t 
start,
  * This is a valid PDE (or even bigger than PDE).  We need
  * to walk one further level.
  */
-ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, info->aw),
+ret = vtd_page_walk_level(s, vtd_get_slpte_addr(slpte, info->aw),
   iova, MIN(iova_next, end), level - 1,
   read_cur, write_cur, info);
 } else {
@@ -1294,7 +1300,7 @@ static int vtd_page_walk(IntelIOMMUState *s, 
VTDContextEntry *ce,
 end = vtd_iova_limit(s, ce, info->aw);
 }
 
-return vtd_page_walk_level(addr, start, end, level, true, true, info);
+return vtd_page_walk_level(s, addr, start, end, level, true, true, info);
 }
 
 static int vtd_root_entry_rsvd_bits_check(IntelIOMMUState *s,
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 3d5487fe2c..a6c788049b 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -388,6 +388,8 @@ typedef union VTDInvDesc VTDInvDesc;
 #define VTD_INV_DESC_DEVICE_IOTLB_RSVD_LO 0xffe0fff8
 
 /* Rsvd field masks for spte */
+#define VTD_SPTE_SNP 0x800ULL
+
 #define VTD_SPTE_PAGE_L1_RSVD_MASK(aw, dt_supported) \
 dt_supported ? \
 (0x800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM | VTD_SL_TM)) : \
-- 
2.25.1

[PATCH] virtio-balloon: correct used length

2021-11-23 Thread Jason Wang

Spec said:

"and len the total of bytes written into the buffer."

For inflateq, deflateq and statsq, we don't process in_sg so the used
length should be zero. For free_page_vq, though the free pages are
supplied via in_sgs, zero used length should still be fine since
anyway driver is expected to use the length in this case and it
simplifies the error handling path.

Signed-off-by: Jason Wang 
---
 hw/virtio/virtio-balloon.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index c6962fcbfe..3e52daa793 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -231,7 +231,7 @@ static void balloon_stats_poll_cb(void *opaque)
 return;
 }
 
-virtqueue_push(s->svq, s->stats_vq_elem, s->stats_vq_offset);
+virtqueue_push(s->svq, s->stats_vq_elem, 0);
 virtio_notify(vdev, s->svq);
 g_free(s->stats_vq_elem);
 s->stats_vq_elem = NULL;
@@ -438,7 +438,7 @@ static void virtio_balloon_handle_output(VirtIODevice 
*vdev, VirtQueue *vq)
 memory_region_unref(section.mr);
 }
 
-virtqueue_push(vq, elem, offset);
+virtqueue_push(vq, elem, 0);
 virtio_notify(vdev, vq);
 g_free(elem);
 virtio_balloon_pbp_free(&pbp);
@@ -549,7 +549,7 @@ static bool get_free_page_hints(VirtIOBalloon *dev)
 }
 
 out:
-virtqueue_push(vq, elem, 1);
+virtqueue_push(vq, elem, 0);
 g_free(elem);
 return ret;
 }
-- 
2.25.1

Re: [PATCH v11 19/26] linux-user: Add LoongArch signal support

2021-11-23 Thread gaosong


Hi Richard,

On 2021/11/20 下午6:33, Richard Henderson wrote:

On 11/19/21 7:13 AM, Song Gao wrote:


+
+struct target_sigcontext {
+    uint64_t   sc_pc;
+    uint64_t   sc_gpr[32];
+    uint64_t   sc_fpr[32];
+    uint64_t   sc_fcc;
+    uint32_t   sc_fcsr;
+    uint32_t   sc_flags;
+};


Does not match
https://github.com/loongson/linux/blob/loongarch-next/arch/loongarch/include/uapi/asm/sigcontext.h 




#define FPU_REG_WIDTH   256
union fpureg {
uint32_t   val32[FPU_REG_WIDTH / 32];
uint64_t   val64[FPU_REG_WIDTH / 64];
};

struct target_sigcontext {
uint64_t   sc_pc;
uint64_t   sc_regs[32];
uint32_t   sc_flags;

uint32_t   sc_fcsr;
uint32_t   sc_vcsr;
uint64_t   sc_fcc;
uint64_t   scr[4];
union fpuregsc_fpregs[32] __attribute__((aligned(32)));

uint32_t   sc_reserved;
};

Is this OK?


+
+struct target_ucontext {
+    target_ulong tuc_flags;
+    target_ulong tuc_link;
+    target_stack_t tuc_stack;
+    target_ulong pad0;
+    struct target_sigcontext tuc_mcontext;
+    target_sigset_t tuc_sigmask;
+};


Does not match
https://github.com/loongson/linux/blob/loongarch-next/arch/loongarch/include/uapi/asm/ucontext.h 




struct target_ucontext {
target_ulong tuc_flags;
target_ulong tuc_link;
target_stack_t tuc_stack;
target_sigset_t tuc_sigmask;
target_ulong pad0;
struct target_sigcontext tuc_mcontext;
};

Is this OK?

Thanks
Song Gao

Re: [PATCH 0/2] change IVSHMEM endianess to LITTLE_ENDIAN

2021-11-23 Thread Cédric Le Goater


On 11/23/21 22:19, Daniel Henrique Barboza wrote:

Hi,

This small series fixes an issue reported in Gitlab [1] that
affects PowerPC big-endian and little-endian and probably all
other big-endians in the wild that might use 'ivshmem'.

It's not clear to me who is the maintainer/responsible for this device
(MAINTAINERS doesn't seem to have any 'ivhshmem' entries nor someone
that looks upon all hw/misc/* files) so I didn't add any CC in that
regard. 'qemu-ppc' is being copied for awareness since they are the
folks that are most likely being impacted by the bug.

[1] https://gitlab.com/qemu-project/qemu/-/issues/168


Do we want these fixes for 6.2 ?

Thanks,

C.



Daniel Henrique Barboza (2):
   ivshmem.c: change endianness to LITTLE_ENDIAN
   ivshmem-test.c: enable test_ivshmem_server for ppc64 arch

  hw/misc/ivshmem.c  | 2 +-
  tests/qtest/ivshmem-test.c | 5 +
  2 files changed, 2 insertions(+), 5 deletions(-)

Delay rc2 until 24 Nov

2021-11-23 Thread Richard Henderson


I'm going to delay rc2 from tonight until tomorrow.
Hoping for a resolution to the gicv3 issue.

Bear in mind that Thursday, 25 Nov starts the Thanksgiving holiday weekend in the US, so 
I'm officially off until Monday.



r~

[PATCH 2/2] ivshmem-test.c: enable test_ivshmem_server for ppc64 arch

2021-11-23 Thread Daniel Henrique Barboza

This test, if enabled by hand, was failing when the ivhsmem device was
being declared as DEVICE_NATIVE_ENDIAN with the following error:

/ppc64/ivshmem/pair: OK
/ppc64/ivshmem/server:
**
ERROR:/home/danielhb/qemu/tests/qtest/ivshmem-test.c:367:test_ivshmem_server:
assertion failed (ret != 0): (0 != 0)
Aborted

After the endianess change done in the previous patch, we can verify in
both a a Power 9 little-endian host and in a Power 8 big-endian host
that this test is now passing:

$ QTEST_QEMU_BINARY=./ppc64-softmmu/qemu-system-ppc64 
./tests/qtest/ivshmem-test -m slow
/ppc64/ivshmem/single: OK
/ppc64/ivshmem/hotplug: OK
/ppc64/ivshmem/memdev: OK
/ppc64/ivshmem/pair: OK
/ppc64/ivshmem/server: OK

Let's keep it that way by officialy enabling it for ppc64.

Signed-off-by: Daniel Henrique Barboza 
---
 tests/qtest/ivshmem-test.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tests/qtest/ivshmem-test.c b/tests/qtest/ivshmem-test.c
index dfa69424ed..fe94dd3b96 100644
--- a/tests/qtest/ivshmem-test.c
+++ b/tests/qtest/ivshmem-test.c
@@ -463,7 +463,6 @@ static gchar *mktempshm(int size, int *fd)
 int main(int argc, char **argv)
 {
 int ret, fd;
-const char *arch = qtest_get_arch();
 gchar dir[] = "/tmp/ivshmem-test.XX";
 
 g_test_init(&argc, &argv, NULL);
@@ -488,9 +487,7 @@ int main(int argc, char **argv)
 qtest_add_func("/ivshmem/memdev", test_ivshmem_memdev);
 if (g_test_slow()) {
 qtest_add_func("/ivshmem/pair", test_ivshmem_pair);
-if (strcmp(arch, "ppc64") != 0) {
-qtest_add_func("/ivshmem/server", test_ivshmem_server);
-}
+qtest_add_func("/ivshmem/server", test_ivshmem_server);
 }
 
 out:
-- 
2.31.1

[PATCH 1/2] ivshmem.c: change endianness to LITTLE_ENDIAN

2021-11-23 Thread Daniel Henrique Barboza

The ivshmem device, as with most PCI devices, uses little endian byte
order. However, the endianess of its mmio_ops is marked as
DEVICE_NATIVE_ENDIAN. This presents not only the usual problems with big
endian hosts but also with PowerPC little endian hosts as well, since
the Power architecture in QEMU uses big endian hardware (XIVE controller,
PCI Host Bridges, etc) even if the host is in little endian byte order.

As it is today, the IVPosition of the device will be byte swapped when
running in Power BE and LE. This can be seen by changing the existing
qtest 'ivshmem-test' to run in ppc64 hosts and printing the IVPOSITION
regs in test_ivshmem_server() right after the VM ids assert. For x86_64
the VM id values read are '0' and '1', for ppc64 (tested in a Power8
RHEL 7.9 BE server) and ppc64le (tested in a Power9 RHEL 8.6 LE server)
the ids will be '0' and '0x100'.

Change this device to LITTLE_ENDIAN fixes the issue for Power hosts of
both endianess, and every other big-endian architecture that might use
this device, without impacting x86 users.

Fixes: cb06608e17f8 ("ivshmem: convert to memory API")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/168
Signed-off-by: Daniel Henrique Barboza 
---
 hw/misc/ivshmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
index 1ba4a98377..299837e5c1 100644
--- a/hw/misc/ivshmem.c
+++ b/hw/misc/ivshmem.c
@@ -243,7 +243,7 @@ static uint64_t ivshmem_io_read(void *opaque, hwaddr addr,
 static const MemoryRegionOps ivshmem_mmio_ops = {
 .read = ivshmem_io_read,
 .write = ivshmem_io_write,
-.endianness = DEVICE_NATIVE_ENDIAN,
+.endianness = DEVICE_LITTLE_ENDIAN,
 .impl = {
 .min_access_size = 4,
 .max_access_size = 4,
-- 
2.31.1

[PATCH 0/2] change IVSHMEM endianess to LITTLE_ENDIAN

2021-11-23 Thread Daniel Henrique Barboza

Hi,

This small series fixes an issue reported in Gitlab [1] that
affects PowerPC big-endian and little-endian and probably all
other big-endians in the wild that might use 'ivshmem'.

It's not clear to me who is the maintainer/responsible for this device
(MAINTAINERS doesn't seem to have any 'ivhshmem' entries nor someone
that looks upon all hw/misc/* files) so I didn't add any CC in that
regard. 'qemu-ppc' is being copied for awareness since they are the
folks that are most likely being impacted by the bug.

[1] https://gitlab.com/qemu-project/qemu/-/issues/168

Daniel Henrique Barboza (2):
  ivshmem.c: change endianness to LITTLE_ENDIAN
  ivshmem-test.c: enable test_ivshmem_server for ppc64 arch

 hw/misc/ivshmem.c  | 2 +-
 tests/qtest/ivshmem-test.c | 5 +
 2 files changed, 2 insertions(+), 5 deletions(-)

-- 
2.31.1

[PATCH v1 3/7] tests/avocado: fix tcg_plugin mem access count test

2021-11-23 Thread Alex Bennée

When we cleaned up argument handling the test was missed.

Fixes: 5ae589faad ("tests/plugins/mem: introduce "track" arg and make args not 
positional")
Signed-off-by: Alex Bennée 
---
 tests/avocado/tcg_plugins.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/avocado/tcg_plugins.py b/tests/avocado/tcg_plugins.py
index 9ca1515c3b..642d2e49e3 100644
--- a/tests/avocado/tcg_plugins.py
+++ b/tests/avocado/tcg_plugins.py
@@ -131,7 +131,7 @@ def test_aarch64_virt_mem_icount(self):
  suffix=".log")
 
 self.run_vm(kernel_path, kernel_command_line,
-"tests/plugin/libmem.so,arg=both", plugin_log.name,
+"tests/plugin/libmem.so,inline=true,callback=true", 
plugin_log.name,
 console_pattern,
 args=('-icount', 'shift=1'))
 
-- 
2.30.2

[PATCH v1 5/7] gdbstub: handle a potentially racing TaskState

2021-11-23 Thread Alex Bennée

When dealing with multi-threaded userspace programs there is a race
condition with the addition of cpu->opaque (aka TaskState). This is
due to cpu_copy calling cpu_create which updates the global vCPU list.
However the task state isn't set until later. This shouldn't be a
problem because the new thread can't have executed anything yet but
the gdbstub code does liberally iterate through the CPU list in
various places.

This sticking plaster ensure the not yet fully realized vCPU is given
an pid of -1 which should be enough to ensure it doesn't show up
anywhere else.

In the longer term I think the code that manages the association
between vCPUs and attached GDB processes could do with a clean-up and
re-factor.

Signed-off-by: Alex Bennée 
Tested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Cc: Richard Henderson 
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/730
---
 gdbstub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gdbstub.c b/gdbstub.c
index 23baaef40e..141d7bc4ec 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -94,7 +94,7 @@ static inline int cpu_gdb_index(CPUState *cpu)
 {
 #if defined(CONFIG_USER_ONLY)
 TaskState *ts = (TaskState *) cpu->opaque;
-return ts->ts_tid;
+return ts ? ts->ts_tid : -1;
 #else
 return cpu->cpu_index + 1;
 #endif
-- 
2.30.2

[PATCH v1 7/7] MAINTAINERS: Add section for Aarch64 GitLab custom runner

2021-11-23 Thread Alex Bennée

From: Philippe Mathieu-Daudé 

Add a MAINTAINERS section to cover the GitLab YAML config file
containing the jobs run on the custom runner sponsored by the
Works On Arm project [*].

[*] https://developer.arm.com/solutions/infrastructure/works-on-arm

Suggested-by: Thomas Huth 
Signed-off-by: Philippe Mathieu-Daudé 
Signed-off-by: Alex Bennée 
Message-Id: <2026163226.2719320-1-f4...@amsat.org>
---
 MAINTAINERS | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8f5156bfa7..006a2293ba 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3511,6 +3511,12 @@ R: Beraldo Leal 
 S: Odd Fixes
 F: tests/avocado/
 
+GitLab custom runner (Works On Arm Sponsored)
+M: Alex Bennée 
+M: Philippe Mathieu-Daudé 
+S: Maintained
+F: .gitlab-ci.d/custom-runners/ubuntu-20.04-aarch64.yml
+
 Documentation
 -
 Build system architecture
-- 
2.30.2

[PATCH v1 1/7] softmmu: fix watchpoint-interrupt races

2021-11-23 Thread Alex Bennée

From: Pavel Dovgalyuk 

Watchpoint may be processed in two phases. First one is detecting
the instruction with target memory access. And the second one is
executing only one instruction and setting the debug interrupt flag.
Hardware interrupts can break this sequence when they happen after
the first watchpoint phase.
This patch postpones the interrupt request until watchpoint is
processed.

Signed-off-by: Pavel Dovgalyuk 
Reviewed-by: Alex Bennée 
Reviewed-by: David Hildenbrand 
Message-Id: <163662451431.125458.14945698834107669531.stgit@pasha-ThinkPad-X280>
Signed-off-by: Alex Bennée 
---
 accel/tcg/cpu-exec.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2d14d02f6c..9cb892e326 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -742,6 +742,11 @@ static inline bool cpu_handle_interrupt(CPUState *cpu,
 qemu_mutex_unlock_iothread();
 return true;
 }
+/* Process watchpoints first, or interrupts will ruin everything */
+if (cpu->watchpoint_hit) {
+qemu_mutex_unlock_iothread();
+return false;
+}
 #if !defined(CONFIG_USER_ONLY)
 if (replay_mode == REPLAY_MODE_PLAY && !replay_has_interrupt()) {
 /* Do nothing */
-- 
2.30.2

[PATCH v1 6/7] MAINTAINERS: Remove me as a reviewer for the build and test/avocado

2021-11-23 Thread Alex Bennée

From: Willian Rampazzo 

Remove me as a reviewer for the Build and test automation and the
Integration Testing with the Avocado Framework and add Beraldo
Leal.

Signed-off-by: Willian Rampazzo 
Reviewed-by: Beraldo Leal 
Message-Id: <20211122191124.31620-1-willi...@redhat.com>
Signed-off-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d3879aa3c1..8f5156bfa7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3469,7 +3469,7 @@ M: Alex Bennée 
 M: Philippe Mathieu-Daudé 
 M: Thomas Huth 
 R: Wainer dos Santos Moschetta 
-R: Willian Rampazzo 
+R: Beraldo Leal 
 S: Maintained
 F: .github/lockdown.yml
 F: .gitlab-ci.yml
@@ -3507,7 +3507,7 @@ W: https://trello.com/b/6Qi1pxVn/avocado-qemu
 R: Cleber Rosa 
 R: Philippe Mathieu-Daudé 
 R: Wainer dos Santos Moschetta 
-R: Willian Rampazzo 
+R: Beraldo Leal 
 S: Odd Fixes
 F: tests/avocado/
 
-- 
2.30.2

[PATCH v1 2/7] accel/tcg: suppress IRQ check for special TBs

2021-11-23 Thread Alex Bennée

Generally when we set cpu->cflags_next_tb it is because we want to
carefully control the execution of the next TB. Currently there is a
race that causes cflags_next_tb to get ignored if an IRQ is processed
before we execute any actual instructions.

To avoid this we introduce a new compiler flag: CF_NOIRQ to suppress
this check in the generated code so we know we will definitely execute
the next block.

Signed-off-by: Alex Bennée 
Cc: Pavel Dovgalyuk 
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/245
---
 include/exec/exec-all.h   |  1 +
 include/exec/gen-icount.h | 21 +
 accel/tcg/cpu-exec.c  | 14 ++
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 6bb2a0f7ec..35d8e93976 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -503,6 +503,7 @@ struct TranslationBlock {
 #define CF_USE_ICOUNT0x0002
 #define CF_INVALID   0x0004 /* TB is stale. Set with @jmp_lock held */
 #define CF_PARALLEL  0x0008 /* Generate code for a parallel context */
+#define CF_NOIRQ 0x0010 /* Generate an uninterruptible TB */
 #define CF_CLUSTER_MASK  0xff00 /* Top 8 bits are cluster ID */
 #define CF_CLUSTER_SHIFT 24
 
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 610cba58fe..c57204ddad 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -21,7 +21,6 @@ static inline void gen_tb_start(const TranslationBlock *tb)
 {
 TCGv_i32 count;
 
-tcg_ctx->exitreq_label = gen_new_label();
 if (tb_cflags(tb) & CF_USE_ICOUNT) {
 count = tcg_temp_local_new_i32();
 } else {
@@ -42,7 +41,19 @@ static inline void gen_tb_start(const TranslationBlock *tb)
 icount_start_insn = tcg_last_op();
 }
 
-tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
+/*
+ * Emit the check against icount_decr.u32 to see if we should exit
+ * unless we suppress the check with CF_NOIRQ. If we are using
+ * icount and have suppressed interruption the higher level code
+ * should have ensured we don't run more instructions than the
+ * budget.
+ */
+if (tb_cflags(tb) & CF_NOIRQ) {
+tcg_ctx->exitreq_label = NULL;
+} else {
+tcg_ctx->exitreq_label = gen_new_label();
+tcg_gen_brcondi_i32(TCG_COND_LT, count, 0, tcg_ctx->exitreq_label);
+}
 
 if (tb_cflags(tb) & CF_USE_ICOUNT) {
 tcg_gen_st16_i32(count, cpu_env,
@@ -74,8 +85,10 @@ static inline void gen_tb_end(const TranslationBlock *tb, 
int num_insns)
tcgv_i32_arg(tcg_constant_i32(num_insns)));
 }
 
-gen_set_label(tcg_ctx->exitreq_label);
-tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
+if (tcg_ctx->exitreq_label) {
+gen_set_label(tcg_ctx->exitreq_label);
+tcg_gen_exit_tb(tb, TB_EXIT_REQUESTED);
+}
 }
 
 #endif
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 9cb892e326..9e3ed42ceb 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -721,6 +721,15 @@ static inline bool need_replay_interrupt(int 
interrupt_request)
 static inline bool cpu_handle_interrupt(CPUState *cpu,
 TranslationBlock **last_tb)
 {
+/*
+ * If we have special cflags lets not get distracted with IRQs. We
+ * shall exit the loop as soon as the next TB completes what it
+ * needs to do.
+ */
+if (cpu->cflags_next_tb != -1) {
+return false;
+}
+
 /* Clear the interrupt flag now since we're processing
  * cpu->interrupt_request and cpu->exit_request.
  * Ensure zeroing happens before reading cpu->exit_request or
@@ -954,11 +963,16 @@ int cpu_exec(CPUState *cpu)
  * after-access watchpoints.  Since this request should never
  * have CF_INVALID set, -1 is a convenient invalid value that
  * does not require tcg headers for cpu_common_reset.
+ *
+ * As we don't want this special TB being interrupted by
+ * some sort of asynchronous event we apply CF_NOIRQ to
+ * disable the usual event checking.
  */
 cflags = cpu->cflags_next_tb;
 if (cflags == -1) {
 cflags = curr_cflags(cpu);
 } else {
+cflags |= CF_NOIRQ;
 cpu->cflags_next_tb = -1;
 }
 
-- 
2.30.2

[PATCH v1 4/7] plugins/meson.build: fix linker issue with weird paths

2021-11-23 Thread Alex Bennée

Signed-off-by: Alex Bennée 
Tested-by: Stefan Weil 
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/712
---
 plugins/meson.build | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/plugins/meson.build b/plugins/meson.build
index aeb386ebae..b3de57853b 100644
--- a/plugins/meson.build
+++ b/plugins/meson.build
@@ -2,9 +2,9 @@ plugin_ldflags = []
 # Modules need more symbols than just those in plugins/qemu-plugins.symbols
 if not enable_modules
   if 'CONFIG_HAS_LD_DYNAMIC_LIST' in config_host
-plugin_ldflags = ['-Wl,--dynamic-list=' + (meson.project_build_root() / 
'qemu-plugins-ld.symbols')]
+plugin_ldflags = ['-Wl,--dynamic-list=qemu-plugins-ld.symbols']
   elif 'CONFIG_HAS_LD_EXPORTED_SYMBOLS_LIST' in config_host
-plugin_ldflags = ['-Wl,-exported_symbols_list,' + 
(meson.project_build_root() / 'qemu-plugins-ld64.symbols')]
+plugin_ldflags = ['-Wl,-exported_symbols_list,qemu-plugins-ld64.symbols']
   endif
 endif
 
-- 
2.30.2

[PATCH for 6.2 v1 0/7] more tcg, plugin, test and build fixes

2021-11-23 Thread Alex Bennée

Hi,

As the release process rolls on here if the current state of my
for-6.2 tree. There are fixes for TCG, plugins, build and test.

The following still need review:

 - plugins/meson.build: fix linker issue with weird paths
 - tests/avocado: fix tcg_plugin mem access count test
 - accel/tcg: suppress IRQ check for special TBs

Alex Bennée (4):
  accel/tcg: suppress IRQ check for special TBs
  tests/avocado: fix tcg_plugin mem access count test
  plugins/meson.build: fix linker issue with weird paths
  gdbstub: handle a potentially racing TaskState

Pavel Dovgalyuk (1):
  softmmu: fix watchpoint-interrupt races

Philippe Mathieu-Daudé (1):
  MAINTAINERS: Add section for Aarch64 GitLab custom runner

Willian Rampazzo (1):
  MAINTAINERS: Remove me as a reviewer for the build and test/avocado

 include/exec/exec-all.h  |  1 +
 include/exec/gen-icount.h| 21 +
 accel/tcg/cpu-exec.c | 19 +++
 gdbstub.c|  2 +-
 MAINTAINERS  | 10 --
 plugins/meson.build  |  4 ++--
 tests/avocado/tcg_plugins.py |  2 +-
 7 files changed, 49 insertions(+), 10 deletions(-)

-- 
2.30.2

Re: [PATCH] MAINTAINERS: Add section for Aarch64 GitLab custom runner

2021-11-23 Thread Alex Bennée



Philippe Mathieu-Daudé  writes:

> Add a MAINTAINERS section to cover the GitLab YAML config file
> containing the jobs run on the custom runner sponsored by the
> Works On Arm project [*].

Queued to for-6.2/tcg-gdb-plugin-fixes, thanks.

-- 
Alex Bennée

Re: [PATCH for-6.2] Revert "arm: tcg: Adhere to SMCCC 1.3 section 5.2"

2021-11-23 Thread Jean-Christophe DUBOIS


Le 19/11/2021 à 17:34, Peter Maydell a écrit :

This reverts commit 9fcd15b9193e819b6cc2fd0a45e3506148812bb4.

This change turns out to cause regressions, for instance on the
imx6ul boards as described here:
https://lore.kernel.org/qemu-devel/c8b89685-7490-328b-51a3-48711c140...@tribudubois.net/

The primary cause of that regression is that the guest code running
at EL3 expects SMCs (not related to PSCI) to do what they would if
our PSCI emulation was not present at all, but after this change
they instead set a value in R0/X0 and continue.

We could fix that by a refactoring that allowed us to only turn on
the PSCI emulation if we weren't booting the guest at EL3, but there
is a more tangled problem with the highbank board, which:
  (1) wants to enable PSCI emulation
  (2) has a bit of guest code that it wants to run at EL3 and
  to perform SMC calls that trap to the monitor vector table:
  this is the boot stub code that is written to memory by
  arm_write_secure_board_setup_dummy_smc() and which the
  highbank board enables by setting bootinfo->secure_board_setup

We can't satisfy both of those and also have the PSCI emulation
handle all SMC instruction executions regardless of function
identifier value.

This is too tricky to try to sort out before 6.2 is released;
revert this commit so we can take the time to get it right in
the 7.0 release.

Signed-off-by: Peter Maydell 
---
Jean-Christophe: could you confirm that reverting this fixes the
regressions you had on the imx boards ?


Hello Peter,

With this patch, things are back to "normal".

Thanks.

JC



  target/arm/psci.c | 35 +--
  1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/target/arm/psci.c b/target/arm/psci.c
index b279c0b9a45..6709e280133 100644
--- a/target/arm/psci.c
+++ b/target/arm/psci.c
@@ -27,13 +27,15 @@
  
  bool arm_is_psci_call(ARMCPU *cpu, int excp_type)

  {
-/*
- * Return true if the exception type matches the configured PSCI conduit.
- * This is called before the SMC/HVC instruction is executed, to decide
- * whether we should treat it as a PSCI call or with the architecturally
+/* Return true if the r0/x0 value indicates a PSCI call and
+ * the exception type matches the configured PSCI conduit. This is
+ * called before the SMC/HVC instruction is executed, to decide whether
+ * we should treat it as a PSCI call or with the architecturally
   * defined behaviour for an SMC or HVC (which might be UNDEF or trap
   * to EL2 or to EL3).
   */
+CPUARMState *env = &cpu->env;
+uint64_t param = is_a64(env) ? env->xregs[0] : env->regs[0];
  
  switch (excp_type) {

  case EXCP_HVC:
@@ -50,7 +52,27 @@ bool arm_is_psci_call(ARMCPU *cpu, int excp_type)
  return false;
  }
  
-return true;

+switch (param) {
+case QEMU_PSCI_0_2_FN_PSCI_VERSION:
+case QEMU_PSCI_0_2_FN_MIGRATE_INFO_TYPE:
+case QEMU_PSCI_0_2_FN_AFFINITY_INFO:
+case QEMU_PSCI_0_2_FN64_AFFINITY_INFO:
+case QEMU_PSCI_0_2_FN_SYSTEM_RESET:
+case QEMU_PSCI_0_2_FN_SYSTEM_OFF:
+case QEMU_PSCI_0_1_FN_CPU_ON:
+case QEMU_PSCI_0_2_FN_CPU_ON:
+case QEMU_PSCI_0_2_FN64_CPU_ON:
+case QEMU_PSCI_0_1_FN_CPU_OFF:
+case QEMU_PSCI_0_2_FN_CPU_OFF:
+case QEMU_PSCI_0_1_FN_CPU_SUSPEND:
+case QEMU_PSCI_0_2_FN_CPU_SUSPEND:
+case QEMU_PSCI_0_2_FN64_CPU_SUSPEND:
+case QEMU_PSCI_0_1_FN_MIGRATE:
+case QEMU_PSCI_0_2_FN_MIGRATE:
+return true;
+default:
+return false;
+}
  }
  
  void arm_handle_psci_call(ARMCPU *cpu)

@@ -172,9 +194,10 @@ void arm_handle_psci_call(ARMCPU *cpu)
  break;
  case QEMU_PSCI_0_1_FN_MIGRATE:
  case QEMU_PSCI_0_2_FN_MIGRATE:
-default:
  ret = QEMU_PSCI_RET_NOT_SUPPORTED;
  break;
+default:
+g_assert_not_reached();
  }
  
  err:

RE: [for-6.2] hw/intc/arm_gicv3: Update cached state after acknowledging LPI

2021-11-23 Thread Shashi Mallela

Since LPIs do not have an active or active and pending state,the current implementation only clears the LPI pending state from the pending table once ICC_IAR1_EL1 acknowledges the interrupt.But, as part of gicv3_lpi_pending() processing, cs->hpplpi is updated with the next best priotiy lpi (only if the current acknowledged irq was best priority irq).By calling gicv3_redist_update() in icc_activate_irq(), we are re-initiating high priority irqs scan in redistributor and if applicable trigger of next best pending lpi from the latest cs->hpplpi info(which otherwise would have happened on next irq trigger from source).Is this the behaviour expected? ThanksShashi Sent from Mail for Windows From: Peter MaydellSent: November 23, 2021 12:46 PMTo: qemu-...@nongnu.org; qemu-devel@nongnu.orgCc: Alex Bennée; Shashi MallelaSubject: Re: [for-6.2] hw/intc/arm_gicv3: Update cached state after acknowledging LPI On Tue, 23 Nov 2021 at 17:10, Peter Maydell  wrote:> > In gicv3_redist_lpi_pending() we update cs->hpplpi to indicate the> new highest priority pending LPI after changing the requested LPI> pending bit.  However the overall highest priority pending interrupt> information won't be updated unless we call gicv3_redist_update().> We do that from the callsite in gicv3-redist_process_lpi(), but not> from the callsite in icc_activate_irq().  The effect is that when the> guest acknowledges an LPI by reading ICC_IAR1_EL1, we mark it as not> pending in the data structure but still leave it in cs->hppi so will> offer it to the guest again.> > The effect is that if we are using an emulated GICv3 and ITS and> using devices which use LPIs (ie PCI devices) then Linux will> complain "irq 54: nobody cared" and then hang (probably because the> stale bogus interrupt info meant we never tried to deliver some other> real interrupt). Hmm; this is definitely a bug, but maybe it's not the cause ofthe symptoms listed above -- I've just seen them again evenwith this fix. I'll keep digging... -- PMM

Re: [for-6.2] hw/intc/arm_gicv3: Update cached state after acknowledging LPI

2021-11-23 Thread Alex Bennée



Peter Maydell  writes:

> On Tue, 23 Nov 2021 at 17:10, Peter Maydell  wrote:
>>
>> In gicv3_redist_lpi_pending() we update cs->hpplpi to indicate the
>> new highest priority pending LPI after changing the requested LPI
>> pending bit.  However the overall highest priority pending interrupt
>> information won't be updated unless we call gicv3_redist_update().
>> We do that from the callsite in gicv3-redist_process_lpi(), but not
>> from the callsite in icc_activate_irq().  The effect is that when the
>> guest acknowledges an LPI by reading ICC_IAR1_EL1, we mark it as not
>> pending in the data structure but still leave it in cs->hppi so will
>> offer it to the guest again.
>>
>> The effect is that if we are using an emulated GICv3 and ITS and
>> using devices which use LPIs (ie PCI devices) then Linux will
>> complain "irq 54: nobody cared" and then hang (probably because the
>> stale bogus interrupt info meant we never tried to deliver some other
>> real interrupt).
>
> Hmm; this is definitely a bug, but maybe it's not the cause of
> the symptoms listed above -- I've just seen them again even
> with this fix. I'll keep digging...

Hmm interesting - does this affect the kvm-unit-tests for GICv3?

>
> -- PMM


-- 
Alex Bennée

Re: [PULL 0/3] Block patches

2021-11-23 Thread Richard Henderson


On 11/23/21 4:59 PM, Hanna Reitz wrote:

The following changes since commit 73e0f70e097b7c92a5ce16ee35b53afe119b20d7:

   Merge tag 'pull-lu-20211123' of https://gitlab.com/rth7680/qemu into staging 
(2021-11-23 11:33:14 +0100)

are available in the Git repository at:

   https://gitlab.com/hreitz/qemu.git tags/pull-block-2021-11-23

for you to fetch changes up to 4dd218fd0717ed3cddb69c01eeb9da630107d89d:

   iotests/149: Skip on unsupported ciphers (2021-11-23 15:39:12 +0100)


Block patches for 6.2-rc2:
- Fix memory leak in vvfat when vvfat_open() fails
- iotest fixes for the gnutls crypto backend


Daniella Lee (1):
   block/vvfat.c fix leak when failure occurs

Hanna Reitz (2):
   iotests: Use aes-128-cbc
   iotests/149: Skip on unsupported ciphers

  block/vvfat.c  | 16 
  tests/qemu-iotests/149 | 23 ++-
  tests/qemu-iotests/206 |  4 ++--
  tests/qemu-iotests/206.out |  6 +++---
  tests/qemu-iotests/210 |  4 ++--
  tests/qemu-iotests/210.out |  6 +++---
  6 files changed, 40 insertions(+), 19 deletions(-)


Applied, thanks.

r~

Re: [for-6.2] hw/intc/arm_gicv3: Update cached state after acknowledging LPI

2021-11-23 Thread Richard Henderson


On 11/23/21 6:10 PM, Peter Maydell wrote:

In gicv3_redist_lpi_pending() we update cs->hpplpi to indicate the
new highest priority pending LPI after changing the requested LPI
pending bit.  However the overall highest priority pending interrupt
information won't be updated unless we call gicv3_redist_update().
We do that from the callsite in gicv3-redist_process_lpi(), but not
from the callsite in icc_activate_irq().  The effect is that when the
guest acknowledges an LPI by reading ICC_IAR1_EL1, we mark it as not
pending in the data structure but still leave it in cs->hppi so will
offer it to the guest again.

The effect is that if we are using an emulated GICv3 and ITS and
using devices which use LPIs (ie PCI devices) then Linux will
complain "irq 54: nobody cared" and then hang (probably because the
stale bogus interrupt info meant we never tried to deliver some other
real interrupt).

Add the missing gicv3_redist_update() call.

Signed-off-by: Peter Maydell 
---
Marked for-6.2 because this is a bug fix to the ITS support
which is new in this release. At least for me this is necessary
to boot Debian on the virt board, since the ITS is default-enabled.
The failure seemed to be somewhat intermittent; I haven't bothered
to try to work out why.


Reviewed-by: Richard Henderson 

r~

[PATCH 17/23] multifd: Use normal pages array on the send side

2021-11-23 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/multifd.h  |  8 ++--
 migration/multifd-zlib.c |  6 +++---
 migration/multifd-zstd.c |  6 +++---
 migration/multifd.c  | 30 +++---
 migration/trace-events   |  4 ++--
 5 files changed, 33 insertions(+), 21 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 7496f951a7..78e73df3ec 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -104,14 +104,18 @@ typedef struct {
 /* thread local variables */
 /* packets sent through this channel */
 uint64_t num_packets;
-/* pages sent through this channel */
-uint64_t num_pages;
+/* non zero pages sent through this channel */
+uint64_t num_normal_pages;
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
 /* buffers to send */
 struct iovec *iov;
 /* number of iovs used */
 uint32_t iovs_num;
+/* Pages that are not zero */
+ram_addr_t *normal;
+/* num of non zero pages */
+uint32_t normal_num;
 /* used for compression methods */
 void *data;
 }  MultiFDSendParams;
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index f65159392a..25ef68a548 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -106,16 +106,16 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
 int ret;
 uint32_t i;
 
-for (i = 0; i < p->pages->num; i++) {
+for (i = 0; i < p->normal_num; i++) {
 uint32_t available = z->zbuff_len - out_size;
 int flush = Z_NO_FLUSH;
 
-if (i == p->pages->num - 1) {
+if (i == p->normal_num - 1) {
 flush = Z_SYNC_FLUSH;
 }
 
 zs->avail_in = page_size;
-zs->next_in = p->pages->block->host + p->pages->offset[i];
+zs->next_in = p->pages->block->host + p->normal[i];
 
 zs->avail_out = available;
 zs->next_out = z->zbuff + out_size;
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 6933ba622a..61842d713e 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -121,13 +121,13 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
**errp)
 z->out.size = z->zbuff_len;
 z->out.pos = 0;
 
-for (i = 0; i < p->pages->num; i++) {
+for (i = 0; i < p->normal_num; i++) {
 ZSTD_EndDirective flush = ZSTD_e_continue;
 
-if (i == p->pages->num - 1) {
+if (i == p->normal_num - 1) {
 flush = ZSTD_e_flush;
 }
-z->in.src = p->pages->block->host + p->pages->offset[i];
+z->in.src = p->pages->block->host + p->normal[i];
 z->in.size = page_size;
 z->in.pos = 0;
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 6983ba3e7c..dbe919b764 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -89,13 +89,13 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error 
**errp)
 MultiFDPages_t *pages = p->pages;
 size_t page_size = qemu_target_page_size();
 
-for (int i = 0; i < p->pages->num; i++) {
-p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
+for (int i = 0; i < p->normal_num; i++) {
+p->iov[p->iovs_num].iov_base = pages->block->host + p->normal[i];
 p->iov[p->iovs_num].iov_len = page_size;
 p->iovs_num++;
 }
 
-p->next_packet_size = p->pages->num * page_size;
+p->next_packet_size = p->normal_num * page_size;
 p->flags |= MULTIFD_FLAG_NOCOMP;
 return 0;
 }
@@ -262,7 +262,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 
 packet->flags = cpu_to_be32(p->flags);
 packet->pages_alloc = cpu_to_be32(p->pages->allocated);
-packet->pages_used = cpu_to_be32(p->pages->num);
+packet->pages_used = cpu_to_be32(p->normal_num);
 packet->next_packet_size = cpu_to_be32(p->next_packet_size);
 packet->packet_num = cpu_to_be64(p->packet_num);
 
@@ -270,9 +270,9 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 strncpy(packet->ramblock, p->pages->block->idstr, 256);
 }
 
-for (i = 0; i < p->pages->num; i++) {
+for (i = 0; i < p->normal_num; i++) {
 /* there are architectures where ram_addr_t is 32 bit */
-uint64_t temp = p->pages->offset[i];
+uint64_t temp = p->normal[i];
 
 packet->offset[i] = cpu_to_be64(temp);
 }
@@ -556,6 +556,8 @@ void multifd_save_cleanup(void)
 p->packet = NULL;
 g_free(p->iov);
 p->iov = NULL;
+g_free(p->normal);
+p->normal = NULL;
 multifd_send_state->ops->send_cleanup(p, &local_err);
 if (local_err) {
 migrate_set_error(migrate_get_current(), local_err);
@@ -640,12 +642,17 @@ static void *multifd_send_thread(void *opaque)
 qemu_mutex_lock(&p->mutex);
 
 if (p->pending_job) {
-uint32_t used = p->pages->num;
 uint64_t packet_num = p->packet_num;
 uint32_t flags = p->flags;

[PATCH 15/23] multifd: Use a single writev on the send side

2021-11-23 Thread Juan Quintela

Until now, we wrote the packet header with write(), and the rest of the
pages with writev().  Just increase the size of the iovec and do a
single writev().

Signed-off-by: Juan Quintela 
---
 migration/multifd.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 71bdef068e..65676d56fd 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -643,7 +643,7 @@ static void *multifd_send_thread(void *opaque)
 uint32_t used = p->pages->num;
 uint64_t packet_num = p->packet_num;
 uint32_t flags = p->flags;
-p->iovs_num = 0;
+p->iovs_num = 1;
 
 if (used) {
 ret = multifd_send_state->ops->send_prepare(p, &local_err);
@@ -663,20 +663,15 @@ static void *multifd_send_thread(void *opaque)
 trace_multifd_send(p->id, packet_num, used, flags,
p->next_packet_size);
 
-ret = qio_channel_write_all(p->c, (void *)p->packet,
-p->packet_len, &local_err);
+p->iov[0].iov_len = p->packet_len;
+p->iov[0].iov_base = p->packet;
+
+ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
+ &local_err);
 if (ret != 0) {
 break;
 }
 
-if (used) {
-ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
- &local_err);
-if (ret != 0) {
-break;
-}
-}
-
 qemu_mutex_lock(&p->mutex);
 p->pending_job--;
 qemu_mutex_unlock(&p->mutex);
@@ -913,7 +908,8 @@ int multifd_save_setup(Error **errp)
 p->packet->version = cpu_to_be32(MULTIFD_VERSION);
 p->name = g_strdup_printf("multifdsend_%d", i);
 p->tls_hostname = g_strdup(s->hostname);
-p->iov = g_new0(struct iovec, page_count);
+/* We need one extra place for the packet header */
+p->iov = g_new0(struct iovec, page_count + 1);
 socket_send_channel_create(multifd_new_send_channel_async, p);
 }
 
-- 
2.33.1

[PATCH 06/23] multifd: remove used parameter from send_prepare() method

2021-11-23 Thread Juan Quintela

It is already there as p->pages->num.

Signed-off-by: Juan Quintela 
---
 migration/multifd.h  | 2 +-
 migration/multifd-zlib.c | 7 +++
 migration/multifd-zstd.c | 7 +++
 migration/multifd.c  | 9 +++--
 4 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 86820dd028..7968cc5c20 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -159,7 +159,7 @@ typedef struct {
 /* Cleanup for sending side */
 void (*send_cleanup)(MultiFDSendParams *p, Error **errp);
 /* Prepare the send packet */
-int (*send_prepare)(MultiFDSendParams *p, uint32_t used, Error **errp);
+int (*send_prepare)(MultiFDSendParams *p, Error **errp);
 /* Write the send packet */
 int (*send_write)(MultiFDSendParams *p, uint32_t used, Error **errp);
 /* Setup for receiving side */
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index d0437cce2a..28f0ed933b 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -94,10 +94,9 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int zlib_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
+static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 struct iovec *iov = p->pages->iov;
 struct zlib_data *z = p->data;
@@ -106,11 +105,11 @@ static int zlib_send_prepare(MultiFDSendParams *p, 
uint32_t used, Error **errp)
 int ret;
 uint32_t i;
 
-for (i = 0; i < used; i++) {
+for (i = 0; i < p->pages->num; i++) {
 uint32_t available = z->zbuff_len - out_size;
 int flush = Z_NO_FLUSH;
 
-if (i == used - 1) {
+if (i == p->pages->num - 1) {
 flush = Z_SYNC_FLUSH;
 }
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 09ae1cf91a..4a71e96e06 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -107,10 +107,9 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int zstd_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
+static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
 struct iovec *iov = p->pages->iov;
 struct zstd_data *z = p->data;
@@ -121,10 +120,10 @@ static int zstd_send_prepare(MultiFDSendParams *p, 
uint32_t used, Error **errp)
 z->out.size = z->zbuff_len;
 z->out.pos = 0;
 
-for (i = 0; i < used; i++) {
+for (i = 0; i < p->pages->num; i++) {
 ZSTD_EndDirective flush = ZSTD_e_continue;
 
-if (i == used - 1) {
+if (i == p->pages->num - 1) {
 flush = ZSTD_e_flush;
 }
 z->in.src = iov[i].iov_base;
diff --git a/migration/multifd.c b/migration/multifd.c
index ce7101cf9d..098ef8842c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -82,13 +82,11 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int nocomp_send_prepare(MultiFDSendParams *p, uint32_t used,
-   Error **errp)
+static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-p->next_packet_size = used * qemu_target_page_size();
+p->next_packet_size = p->pages->num * qemu_target_page_size();
 p->flags |= MULTIFD_FLAG_NOCOMP;
 return 0;
 }
@@ -654,8 +652,7 @@ static void *multifd_send_thread(void *opaque)
 uint32_t flags = p->flags;
 
 if (used) {
-ret = multifd_send_state->ops->send_prepare(p, used,
-&local_err);
+ret = multifd_send_state->ops->send_prepare(p, &local_err);
 if (ret != 0) {
 qemu_mutex_unlock(&p->mutex);
 break;
-- 
2.33.1

[PATCH 16/23] multifd: Unfold "used" variable by its value

2021-11-23 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/multifd.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index 65676d56fd..6983ba3e7c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1059,7 +1059,6 @@ static void *multifd_recv_thread(void *opaque)
 rcu_register_thread();
 
 while (true) {
-uint32_t used;
 uint32_t flags;
 
 if (p->quit) {
@@ -1082,17 +1081,16 @@ static void *multifd_recv_thread(void *opaque)
 break;
 }
 
-used = p->pages->num;
 flags = p->flags;
 /* recv methods don't know how to handle the SYNC flag */
 p->flags &= ~MULTIFD_FLAG_SYNC;
-trace_multifd_recv(p->id, p->packet_num, used, flags,
+trace_multifd_recv(p->id, p->packet_num, p->pages->num, flags,
p->next_packet_size);
 p->num_packets++;
-p->num_pages += used;
+p->num_pages += p->pages->num;
 qemu_mutex_unlock(&p->mutex);
 
-if (used) {
+if (p->pages->num) {
 ret = multifd_recv_state->ops->recv_pages(p, &local_err);
 if (ret != 0) {
 break;
-- 
2.33.1

[PATCH 05/23] multifd: The variable is only used inside the loop

2021-11-23 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/multifd.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/multifd.c b/migration/multifd.c
index cdeffdc4c5..ce7101cf9d 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -629,7 +629,6 @@ static void *multifd_send_thread(void *opaque)
 MultiFDSendParams *p = opaque;
 Error *local_err = NULL;
 int ret = 0;
-uint32_t flags = 0;
 
 trace_multifd_send_thread_start(p->id);
 rcu_register_thread();
@@ -652,7 +651,7 @@ static void *multifd_send_thread(void *opaque)
 if (p->pending_job) {
 uint32_t used = p->pages->num;
 uint64_t packet_num = p->packet_num;
-flags = p->flags;
+uint32_t flags = p->flags;
 
 if (used) {
 ret = multifd_send_state->ops->send_prepare(p, used,
-- 
2.33.1

[PATCH 13/23] multifd: Make zstd use iov's

2021-11-23 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/multifd-zstd.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 2d5b61106c..259277dc42 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -154,6 +154,9 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
**errp)
 return -1;
 }
 }
+p->iov[p->iovs_num].iov_base = z->zbuff;
+p->iov[p->iovs_num].iov_len = z->out.pos;
+p->iovs_num++;
 p->next_packet_size = z->out.pos;
 p->flags |= MULTIFD_FLAG_ZSTD;
 
@@ -173,10 +176,7 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
**errp)
  */
 static int zstd_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-struct zstd_data *z = p->data;
-
-return qio_channel_write_all(p->c, (void *)z->zbuff, p->next_packet_size,
- errp);
+return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
 }
 
 /**
-- 
2.33.1

[PATCH 10/23] multifd: Make zlib compression method not use iovs

2021-11-23 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/multifd-zlib.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index e85ef8824d..da6201704c 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include 
 #include "qemu/rcu.h"
+#include "exec/ramblock.h"
 #include "exec/target_page.h"
 #include "qapi/error.h"
 #include "migration.h"
@@ -98,8 +99,8 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  */
 static int zlib_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-struct iovec *iov = p->pages->iov;
 struct zlib_data *z = p->data;
+size_t page_size = qemu_target_page_size();
 z_stream *zs = &z->zs;
 uint32_t out_size = 0;
 int ret;
@@ -113,8 +114,8 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
 flush = Z_SYNC_FLUSH;
 }
 
-zs->avail_in = iov[i].iov_len;
-zs->next_in = iov[i].iov_base;
+zs->avail_in = page_size;
+zs->next_in = p->pages->block->host + p->pages->offset[i];
 
 zs->avail_out = available;
 zs->next_out = z->zbuff + out_size;
@@ -235,6 +236,7 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
 static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
 struct zlib_data *z = p->data;
+size_t page_size = qemu_target_page_size();
 z_stream *zs = &z->zs;
 uint32_t in_size = p->next_packet_size;
 /* we measure the change of total_out */
@@ -259,7 +261,6 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error 
**errp)
 zs->next_in = z->zbuff;
 
 for (i = 0; i < p->pages->num; i++) {
-struct iovec *iov = &p->pages->iov[i];
 int flush = Z_NO_FLUSH;
 unsigned long start = zs->total_out;
 
@@ -267,8 +268,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error 
**errp)
 flush = Z_SYNC_FLUSH;
 }
 
-zs->avail_out = iov->iov_len;
-zs->next_out = iov->iov_base;
+zs->avail_out = page_size;
+zs->next_out = p->pages->block->host + p->pages->offset[i];
 
 /*
  * Welcome to inflate semantics
@@ -281,8 +282,8 @@ static int zlib_recv_pages(MultiFDRecvParams *p, Error 
**errp)
 do {
 ret = inflate(zs, flush);
 } while (ret == Z_OK && zs->avail_in
- && (zs->total_out - start) < iov->iov_len);
-if (ret == Z_OK && (zs->total_out - start) < iov->iov_len) {
+ && (zs->total_out - start) < page_size);
+if (ret == Z_OK && (zs->total_out - start) < page_size) {
 error_setg(errp, "multifd %d: inflate generated too few output",
p->id);
 return -1;
-- 
2.33.1

[PATCH 14/23] multifd: Remove send_write() method

2021-11-23 Thread Juan Quintela

Everything use now iov's.

Signed-off-by: Juan Quintela 
---
 migration/multifd.h  |  2 --
 migration/multifd-zlib.c | 17 -
 migration/multifd-zstd.c | 17 -
 migration/multifd.c  | 20 ++--
 4 files changed, 2 insertions(+), 54 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index c3f18af364..7496f951a7 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -164,8 +164,6 @@ typedef struct {
 void (*send_cleanup)(MultiFDSendParams *p, Error **errp);
 /* Prepare the send packet */
 int (*send_prepare)(MultiFDSendParams *p, Error **errp);
-/* Write the send packet */
-int (*send_write)(MultiFDSendParams *p, uint32_t used, Error **errp);
 /* Setup for receiving side */
 int (*recv_setup)(MultiFDRecvParams *p, Error **errp);
 /* Cleanup for receiving side */
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 478a4af115..f65159392a 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -152,22 +152,6 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
 return 0;
 }
 
-/**
- * zlib_send_write: do the actual write of the data
- *
- * Do the actual write of the comprresed buffer.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @used: number of pages used
- * @errp: pointer to an error
- */
-static int zlib_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
-{
-return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
-}
-
 /**
  * zlib_recv_setup: setup receive side
  *
@@ -307,7 +291,6 @@ static MultiFDMethods multifd_zlib_ops = {
 .send_setup = zlib_send_setup,
 .send_cleanup = zlib_send_cleanup,
 .send_prepare = zlib_send_prepare,
-.send_write = zlib_send_write,
 .recv_setup = zlib_recv_setup,
 .recv_cleanup = zlib_recv_cleanup,
 .recv_pages = zlib_recv_pages
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 259277dc42..6933ba622a 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -163,22 +163,6 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
**errp)
 return 0;
 }
 
-/**
- * zstd_send_write: do the actual write of the data
- *
- * Do the actual write of the comprresed buffer.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @used: number of pages used
- * @errp: pointer to an error
- */
-static int zstd_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
-{
-return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
-}
-
 /**
  * zstd_recv_setup: setup receive side
  *
@@ -320,7 +304,6 @@ static MultiFDMethods multifd_zstd_ops = {
 .send_setup = zstd_send_setup,
 .send_cleanup = zstd_send_cleanup,
 .send_prepare = zstd_send_prepare,
-.send_write = zstd_send_write,
 .recv_setup = zstd_recv_setup,
 .recv_cleanup = zstd_recv_cleanup,
 .recv_pages = zstd_recv_pages
diff --git a/migration/multifd.c b/migration/multifd.c
index 37487fd01c..71bdef068e 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -100,22 +100,6 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error 
**errp)
 return 0;
 }
 
-/**
- * nocomp_send_write: do the actual write of the data
- *
- * For no compression we just have to write the data.
- *
- * Returns 0 for success or -1 for error
- *
- * @p: Params for the channel that we are using
- * @used: number of pages used
- * @errp: pointer to an error
- */
-static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
-{
-return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
-}
-
 /**
  * nocomp_recv_setup: setup receive side
  *
@@ -173,7 +157,6 @@ static MultiFDMethods multifd_nocomp_ops = {
 .send_setup = nocomp_send_setup,
 .send_cleanup = nocomp_send_cleanup,
 .send_prepare = nocomp_send_prepare,
-.send_write = nocomp_send_write,
 .recv_setup = nocomp_recv_setup,
 .recv_cleanup = nocomp_recv_cleanup,
 .recv_pages = nocomp_recv_pages
@@ -687,7 +670,8 @@ static void *multifd_send_thread(void *opaque)
 }
 
 if (used) {
-ret = multifd_send_state->ops->send_write(p, used, &local_err);
+ret = qio_channel_writev_all(p->c, p->iov, p->iovs_num,
+ &local_err);
 if (ret != 0) {
 break;
 }
-- 
2.33.1

[PATCH 02/23] migration: Never call twice qemu_target_page_size()

2021-11-23 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/migration.c | 7 ---
 migration/multifd.c   | 7 ---
 migration/savevm.c| 5 +++--
 3 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 2c1edb2cb9..3de11ae921 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -996,6 +996,8 @@ static void populate_time_info(MigrationInfo *info, 
MigrationState *s)
 
 static void populate_ram_info(MigrationInfo *info, MigrationState *s)
 {
+size_t page_size = qemu_target_page_size();
+
 info->has_ram = true;
 info->ram = g_malloc0(sizeof(*info->ram));
 info->ram->transferred = ram_counters.transferred;
@@ -1004,12 +1006,11 @@ static void populate_ram_info(MigrationInfo *info, 
MigrationState *s)
 /* legacy value.  It is not used anymore */
 info->ram->skipped = 0;
 info->ram->normal = ram_counters.normal;
-info->ram->normal_bytes = ram_counters.normal *
-qemu_target_page_size();
+info->ram->normal_bytes = ram_counters.normal * page_size;
 info->ram->mbps = s->mbps;
 info->ram->dirty_sync_count = ram_counters.dirty_sync_count;
 info->ram->postcopy_requests = ram_counters.postcopy_requests;
-info->ram->page_size = qemu_target_page_size();
+info->ram->page_size = page_size;
 info->ram->multifd_bytes = ram_counters.multifd_bytes;
 info->ram->pages_per_second = s->pages_per_second;
 
diff --git a/migration/multifd.c b/migration/multifd.c
index 7c9deb1921..8125d0015c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -289,7 +289,8 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
 {
 MultiFDPacket_t *packet = p->packet;
-uint32_t pages_max = MULTIFD_PACKET_SIZE / qemu_target_page_size();
+size_t page_size = qemu_target_page_size();
+uint32_t pages_max = MULTIFD_PACKET_SIZE / page_size;
 RAMBlock *block;
 int i;
 
@@ -358,14 +359,14 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams 
*p, Error **errp)
 for (i = 0; i < p->pages->used; i++) {
 uint64_t offset = be64_to_cpu(packet->offset[i]);
 
-if (offset > (block->used_length - qemu_target_page_size())) {
+if (offset > (block->used_length - page_size)) {
 error_setg(errp, "multifd: offset too long %" PRIu64
" (max " RAM_ADDR_FMT ")",
offset, block->used_length);
 return -1;
 }
 p->pages->iov[i].iov_base = block->host + offset;
-p->pages->iov[i].iov_len = qemu_target_page_size();
+p->pages->iov[i].iov_len = page_size;
 }
 
 return 0;
diff --git a/migration/savevm.c b/migration/savevm.c
index d59e976d50..0bef031acb 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1685,6 +1685,7 @@ static int 
loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
 {
 PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_ADVISE);
 uint64_t remote_pagesize_summary, local_pagesize_summary, remote_tps;
+size_t page_size = qemu_target_page_size();
 Error *local_err = NULL;
 
 trace_loadvm_postcopy_handle_advise();
@@ -1741,13 +1742,13 @@ static int 
loadvm_postcopy_handle_advise(MigrationIncomingState *mis,
 }
 
 remote_tps = qemu_get_be64(mis->from_src_file);
-if (remote_tps != qemu_target_page_size()) {
+if (remote_tps != page_size) {
 /*
  * Again, some differences could be dealt with, but for now keep it
  * simple.
  */
 error_report("Postcopy needs matching target page sizes (s=%d d=%zd)",
- (int)remote_tps, qemu_target_page_size());
+ (int)remote_tps, page_size);
 return -1;
 }
 
-- 
2.33.1

[PATCH 08/23] multifd: Fill offset and block for reception

2021-11-23 Thread Juan Quintela

We were using the iov directly, but we will need this info on the
following patch.

Signed-off-by: Juan Quintela 
---
 migration/multifd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/multifd.c b/migration/multifd.c
index 55d99a8232..0533da154a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -354,6 +354,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, 
Error **errp)
 return -1;
 }
 
+p->pages->block = block;
 for (i = 0; i < p->pages->num; i++) {
 uint64_t offset = be64_to_cpu(packet->offset[i]);
 
@@ -363,6 +364,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, 
Error **errp)
offset, block->used_length);
 return -1;
 }
+p->pages->offset[i] = offset;
 p->pages->iov[i].iov_base = block->host + offset;
 p->pages->iov[i].iov_len = page_size;
 }
-- 
2.33.1

[PATCH 09/23] multifd: Make zstd compression method not use iovs

2021-11-23 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/multifd-zstd.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index a8b104f4ee..2d5b61106c 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -13,6 +13,7 @@
 #include "qemu/osdep.h"
 #include 
 #include "qemu/rcu.h"
+#include "exec/ramblock.h"
 #include "exec/target_page.h"
 #include "qapi/error.h"
 #include "migration.h"
@@ -111,8 +112,8 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  */
 static int zstd_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-struct iovec *iov = p->pages->iov;
 struct zstd_data *z = p->data;
+size_t page_size = qemu_target_page_size();
 int ret;
 uint32_t i;
 
@@ -126,8 +127,8 @@ static int zstd_send_prepare(MultiFDSendParams *p, Error 
**errp)
 if (i == p->pages->num - 1) {
 flush = ZSTD_e_flush;
 }
-z->in.src = iov[i].iov_base;
-z->in.size = iov[i].iov_len;
+z->in.src = p->pages->block->host + p->pages->offset[i];
+z->in.size = page_size;
 z->in.pos = 0;
 
 /*
@@ -256,7 +257,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error 
**errp)
 {
 uint32_t in_size = p->next_packet_size;
 uint32_t out_size = 0;
-uint32_t expected_size = p->pages->num * qemu_target_page_size();
+size_t page_size = qemu_target_page_size();
+uint32_t expected_size = p->pages->num * page_size;
 uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 struct zstd_data *z = p->data;
 int ret;
@@ -278,10 +280,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error 
**errp)
 z->in.pos = 0;
 
 for (i = 0; i < p->pages->num; i++) {
-struct iovec *iov = &p->pages->iov[i];
-
-z->out.dst = iov->iov_base;
-z->out.size = iov->iov_len;
+z->out.dst = p->pages->block->host + p->pages->offset[i];
+z->out.size = page_size;
 z->out.pos = 0;
 
 /*
@@ -295,8 +295,8 @@ static int zstd_recv_pages(MultiFDRecvParams *p, Error 
**errp)
 do {
 ret = ZSTD_decompressStream(z->zds, &z->out, &z->in);
 } while (ret > 0 && (z->in.size - z->in.pos > 0)
- && (z->out.pos < iov->iov_len));
-if (ret > 0 && (z->out.pos < iov->iov_len)) {
+ && (z->out.pos < page_size));
+if (ret > 0 && (z->out.pos < page_size)) {
 error_setg(errp, "multifd %d: decompressStream buffer too small",
p->id);
 return -1;
-- 
2.33.1

[PATCH 07/23] multifd: remove used parameter from send_recv_pages() method

2021-11-23 Thread Juan Quintela

It is already there as p->pages->num.

Signed-off-by: Juan Quintela 
---
 migration/multifd.h  | 2 +-
 migration/multifd-zlib.c | 9 -
 migration/multifd-zstd.c | 7 +++
 migration/multifd.c  | 7 +++
 4 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 7968cc5c20..e57adc783b 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -167,7 +167,7 @@ typedef struct {
 /* Cleanup for receiving side */
 void (*recv_cleanup)(MultiFDRecvParams *p);
 /* Read all pages */
-int (*recv_pages)(MultiFDRecvParams *p, uint32_t used, Error **errp);
+int (*recv_pages)(MultiFDRecvParams *p, Error **errp);
 } MultiFDMethods;
 
 void multifd_register_ops(int method, MultiFDMethods *ops);
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 28f0ed933b..e85ef8824d 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -230,17 +230,16 @@ static void zlib_recv_cleanup(MultiFDRecvParams *p)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int zlib_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp)
+static int zlib_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
 struct zlib_data *z = p->data;
 z_stream *zs = &z->zs;
 uint32_t in_size = p->next_packet_size;
 /* we measure the change of total_out */
 uint32_t out_size = zs->total_out;
-uint32_t expected_size = used * qemu_target_page_size();
+uint32_t expected_size = p->pages->num * qemu_target_page_size();
 uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 int ret;
 int i;
@@ -259,12 +258,12 @@ static int zlib_recv_pages(MultiFDRecvParams *p, uint32_t 
used, Error **errp)
 zs->avail_in = in_size;
 zs->next_in = z->zbuff;
 
-for (i = 0; i < used; i++) {
+for (i = 0; i < p->pages->num; i++) {
 struct iovec *iov = &p->pages->iov[i];
 int flush = Z_NO_FLUSH;
 unsigned long start = zs->total_out;
 
-if (i == used - 1) {
+if (i == p->pages->num - 1) {
 flush = Z_SYNC_FLUSH;
 }
 
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 4a71e96e06..a8b104f4ee 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -250,14 +250,13 @@ static void zstd_recv_cleanup(MultiFDRecvParams *p)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int zstd_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp)
+static int zstd_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
 uint32_t in_size = p->next_packet_size;
 uint32_t out_size = 0;
-uint32_t expected_size = used * qemu_target_page_size();
+uint32_t expected_size = p->pages->num * qemu_target_page_size();
 uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 struct zstd_data *z = p->data;
 int ret;
@@ -278,7 +277,7 @@ static int zstd_recv_pages(MultiFDRecvParams *p, uint32_t 
used, Error **errp)
 z->in.size = in_size;
 z->in.pos = 0;
 
-for (i = 0; i < used; i++) {
+for (i = 0; i < p->pages->num; i++) {
 struct iovec *iov = &p->pages->iov[i];
 
 z->out.dst = iov->iov_base;
diff --git a/migration/multifd.c b/migration/multifd.c
index 098ef8842c..55d99a8232 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -141,10 +141,9 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
  * Returns 0 for success or -1 for error
  *
  * @p: Params for the channel that we are using
- * @used: number of pages used
  * @errp: pointer to an error
  */
-static int nocomp_recv_pages(MultiFDRecvParams *p, uint32_t used, Error **errp)
+static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
 uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
 
@@ -153,7 +152,7 @@ static int nocomp_recv_pages(MultiFDRecvParams *p, uint32_t 
used, Error **errp)
p->id, flags, MULTIFD_FLAG_NOCOMP);
 return -1;
 }
-return qio_channel_readv_all(p->c, p->pages->iov, used, errp);
+return qio_channel_readv_all(p->c, p->pages->iov, p->pages->num, errp);
 }
 
 static MultiFDMethods multifd_nocomp_ops = {
@@ -1099,7 +1098,7 @@ static void *multifd_recv_thread(void *opaque)
 qemu_mutex_unlock(&p->mutex);
 
 if (used) {
-ret = multifd_recv_state->ops->recv_pages(p, used, &local_err);
+ret = multifd_recv_state->ops->recv_pages(p, &local_err);
 if (ret != 0) {
 break;
 }
-- 
2.33.1

[PATCH 11/23] multifd: Move iov from pages to params

2021-11-23 Thread Juan Quintela

This will allow us to reduce the number of system calls on the next patch.

Signed-off-by: Juan Quintela 
---
 migration/multifd.h |  8 ++--
 migration/multifd.c | 34 --
 2 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index e57adc783b..c3f18af364 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -62,8 +62,6 @@ typedef struct {
 uint64_t packet_num;
 /* offset of each page */
 ram_addr_t *offset;
-/* pointer to each page */
-struct iovec *iov;
 RAMBlock *block;
 } MultiFDPages_t;
 
@@ -110,6 +108,10 @@ typedef struct {
 uint64_t num_pages;
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
+/* buffers to send */
+struct iovec *iov;
+/* number of iovs used */
+uint32_t iovs_num;
 /* used for compression methods */
 void *data;
 }  MultiFDSendParams;
@@ -149,6 +151,8 @@ typedef struct {
 uint64_t num_pages;
 /* syncs main thread and channels */
 QemuSemaphore sem_sync;
+/* buffers to recv */
+struct iovec *iov;
 /* used for de-compression methods */
 void *data;
 } MultiFDRecvParams;
diff --git a/migration/multifd.c b/migration/multifd.c
index 0533da154a..37487fd01c 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -86,7 +86,16 @@ static void nocomp_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  */
 static int nocomp_send_prepare(MultiFDSendParams *p, Error **errp)
 {
-p->next_packet_size = p->pages->num * qemu_target_page_size();
+MultiFDPages_t *pages = p->pages;
+size_t page_size = qemu_target_page_size();
+
+for (int i = 0; i < p->pages->num; i++) {
+p->iov[p->iovs_num].iov_base = pages->block->host + pages->offset[i];
+p->iov[p->iovs_num].iov_len = page_size;
+p->iovs_num++;
+}
+
+p->next_packet_size = p->pages->num * page_size;
 p->flags |= MULTIFD_FLAG_NOCOMP;
 return 0;
 }
@@ -104,7 +113,7 @@ static int nocomp_send_prepare(MultiFDSendParams *p, Error 
**errp)
  */
 static int nocomp_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-return qio_channel_writev_all(p->c, p->pages->iov, used, errp);
+return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
 }
 
 /**
@@ -146,13 +155,18 @@ static void nocomp_recv_cleanup(MultiFDRecvParams *p)
 static int nocomp_recv_pages(MultiFDRecvParams *p, Error **errp)
 {
 uint32_t flags = p->flags & MULTIFD_FLAG_COMPRESSION_MASK;
+size_t page_size = qemu_target_page_size();
 
 if (flags != MULTIFD_FLAG_NOCOMP) {
 error_setg(errp, "multifd %d: flags received %x flags expected %x",
p->id, flags, MULTIFD_FLAG_NOCOMP);
 return -1;
 }
-return qio_channel_readv_all(p->c, p->pages->iov, p->pages->num, errp);
+for (int i = 0; i < p->pages->num; i++) {
+p->iov[i].iov_base = p->pages->block->host + p->pages->offset[i];
+p->iov[i].iov_len = page_size;
+}
+return qio_channel_readv_all(p->c, p->iov, p->pages->num, errp);
 }
 
 static MultiFDMethods multifd_nocomp_ops = {
@@ -242,7 +256,6 @@ static MultiFDPages_t *multifd_pages_init(size_t size)
 MultiFDPages_t *pages = g_new0(MultiFDPages_t, 1);
 
 pages->allocated = size;
-pages->iov = g_new0(struct iovec, size);
 pages->offset = g_new0(ram_addr_t, size);
 
 return pages;
@@ -254,8 +267,6 @@ static void multifd_pages_clear(MultiFDPages_t *pages)
 pages->allocated = 0;
 pages->packet_num = 0;
 pages->block = NULL;
-g_free(pages->iov);
-pages->iov = NULL;
 g_free(pages->offset);
 pages->offset = NULL;
 g_free(pages);
@@ -365,8 +376,6 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, 
Error **errp)
 return -1;
 }
 p->pages->offset[i] = offset;
-p->pages->iov[i].iov_base = block->host + offset;
-p->pages->iov[i].iov_len = page_size;
 }
 
 return 0;
@@ -470,8 +479,6 @@ int multifd_queue_page(QEMUFile *f, RAMBlock *block, 
ram_addr_t offset)
 
 if (pages->block == block) {
 pages->offset[pages->num] = offset;
-pages->iov[pages->num].iov_base = block->host + offset;
-pages->iov[pages->num].iov_len = qemu_target_page_size();
 pages->num++;
 
 if (pages->num < pages->allocated) {
@@ -564,6 +571,8 @@ void multifd_save_cleanup(void)
 p->packet_len = 0;
 g_free(p->packet);
 p->packet = NULL;
+g_free(p->iov);
+p->iov = NULL;
 multifd_send_state->ops->send_cleanup(p, &local_err);
 if (local_err) {
 migrate_set_error(migrate_get_current(), local_err);
@@ -651,6 +660,7 @@ static void *multifd_send_thread(void *opaque)
 uint32_t used = p->pages->num;
 uint64_t packet_num = p->packet_num;
 uint32_t flags = p->flags;
+p->iovs_num = 0;
 
 if (used) {
 ret = multi

[PATCH 01/23] multifd: Delete useless operation

2021-11-23 Thread Juan Quintela

We are divining by page_size to multiply again in the only use.
Once there, impreve the comments.

Signed-off-by: Juan Quintela 
---
 migration/multifd-zlib.c | 13 -
 migration/multifd-zstd.c | 13 -
 2 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index ab4ba75d75..3fc7813b44 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -42,7 +42,6 @@ struct zlib_data {
  */
 static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
 {
-uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
 struct zlib_data *z = g_malloc0(sizeof(struct zlib_data));
 z_stream *zs = &z->zs;
 
@@ -54,9 +53,8 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
 error_setg(errp, "multifd %d: deflate init failed", p->id);
 return -1;
 }
-/* We will never have more than page_count pages */
-z->zbuff_len = page_count * qemu_target_page_size();
-z->zbuff_len *= 2;
+/* To be safe, we reserve twice the size of the packet */
+z->zbuff_len = MULTIFD_PACKET_SIZE * 2;
 z->zbuff = g_try_malloc(z->zbuff_len);
 if (!z->zbuff) {
 deflateEnd(&z->zs);
@@ -180,7 +178,6 @@ static int zlib_send_write(MultiFDSendParams *p, uint32_t 
used, Error **errp)
  */
 static int zlib_recv_setup(MultiFDRecvParams *p, Error **errp)
 {
-uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
 struct zlib_data *z = g_malloc0(sizeof(struct zlib_data));
 z_stream *zs = &z->zs;
 
@@ -194,10 +191,8 @@ static int zlib_recv_setup(MultiFDRecvParams *p, Error 
**errp)
 error_setg(errp, "multifd %d: inflate init failed", p->id);
 return -1;
 }
-/* We will never have more than page_count pages */
-z->zbuff_len = page_count * qemu_target_page_size();
-/* We know compression "could" use more space */
-z->zbuff_len *= 2;
+/* To be safe, we reserve twice the size of the packet */
+z->zbuff_len = MULTIFD_PACKET_SIZE * 2;
 z->zbuff = g_try_malloc(z->zbuff_len);
 if (!z->zbuff) {
 inflateEnd(zs);
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 693bddf8c9..cc3b8869c0 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -47,7 +47,6 @@ struct zstd_data {
  */
 static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
 {
-uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
 struct zstd_data *z = g_new0(struct zstd_data, 1);
 int res;
 
@@ -67,9 +66,8 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
p->id, ZSTD_getErrorName(res));
 return -1;
 }
-/* We will never have more than page_count pages */
-z->zbuff_len = page_count * qemu_target_page_size();
-z->zbuff_len *= 2;
+/* To be safe, we reserve twice the size of the packet */
+z->zbuff_len = MULTIFD_PACKET_SIZE * 2;
 z->zbuff = g_try_malloc(z->zbuff_len);
 if (!z->zbuff) {
 ZSTD_freeCStream(z->zcs);
@@ -191,7 +189,6 @@ static int zstd_send_write(MultiFDSendParams *p, uint32_t 
used, Error **errp)
  */
 static int zstd_recv_setup(MultiFDRecvParams *p, Error **errp)
 {
-uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
 struct zstd_data *z = g_new0(struct zstd_data, 1);
 int ret;
 
@@ -212,10 +209,8 @@ static int zstd_recv_setup(MultiFDRecvParams *p, Error 
**errp)
 return -1;
 }
 
-/* We will never have more than page_count pages */
-z->zbuff_len = page_count * qemu_target_page_size();
-/* We know compression "could" use more space */
-z->zbuff_len *= 2;
+/* To be safe, we reserve twice the size of the packet */
+z->zbuff_len = MULTIFD_PACKET_SIZE * 2;
 z->zbuff = g_try_malloc(z->zbuff_len);
 if (!z->zbuff) {
 ZSTD_freeDStream(z->zds);
-- 
2.33.1

[PATCH 04/23] multifd: Add missing documention

2021-11-23 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/multifd-zlib.c | 2 ++
 migration/multifd-zstd.c | 2 ++
 migration/multifd.c  | 1 +
 3 files changed, 5 insertions(+)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 3fc7813b44..d0437cce2a 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -72,6 +72,7 @@ static int zlib_send_setup(MultiFDSendParams *p, Error **errp)
  * Close the channel and return memory.
  *
  * @p: Params for the channel that we are using
+ * @errp: pointer to an error
  */
 static void zlib_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
@@ -94,6 +95,7 @@ static void zlib_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  *
  * @p: Params for the channel that we are using
  * @used: number of pages used
+ * @errp: pointer to an error
  */
 static int zlib_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index cc3b8869c0..09ae1cf91a 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -84,6 +84,7 @@ static int zstd_send_setup(MultiFDSendParams *p, Error **errp)
  * Close the channel and return memory.
  *
  * @p: Params for the channel that we are using
+ * @errp: pointer to an error
  */
 static void zstd_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
@@ -107,6 +108,7 @@ static void zstd_send_cleanup(MultiFDSendParams *p, Error 
**errp)
  *
  * @p: Params for the channel that we are using
  * @used: number of pages used
+ * @errp: pointer to an error
  */
 static int zstd_send_prepare(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
diff --git a/migration/multifd.c b/migration/multifd.c
index 8ea86d81dc..cdeffdc4c5 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -66,6 +66,7 @@ static int nocomp_send_setup(MultiFDSendParams *p, Error 
**errp)
  * For no compression this function does nothing.
  *
  * @p: Params for the channel that we are using
+ * @errp: pointer to an error
  */
 static void nocomp_send_cleanup(MultiFDSendParams *p, Error **errp)
 {
-- 
2.33.1

[PATCH 03/23] multifd: Rename used field to num

2021-11-23 Thread Juan Quintela

We will need to split it later in zero_num (number of zero pages) and
normal_num (number of normal pages).  This name is better.

Signed-off-by: Juan Quintela 
---
 migration/multifd.h |  2 +-
 migration/multifd.c | 38 +++---
 2 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/migration/multifd.h b/migration/multifd.h
index 15c50ca0b2..86820dd028 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -55,7 +55,7 @@ typedef struct {
 
 typedef struct {
 /* number of used pages */
-uint32_t used;
+uint32_t num;
 /* number of allocated pages */
 uint32_t allocated;
 /* global number of generated multifd packets */
diff --git a/migration/multifd.c b/migration/multifd.c
index 8125d0015c..8ea86d81dc 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -252,7 +252,7 @@ static MultiFDPages_t *multifd_pages_init(size_t size)
 
 static void multifd_pages_clear(MultiFDPages_t *pages)
 {
-pages->used = 0;
+pages->num = 0;
 pages->allocated = 0;
 pages->packet_num = 0;
 pages->block = NULL;
@@ -270,7 +270,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 
 packet->flags = cpu_to_be32(p->flags);
 packet->pages_alloc = cpu_to_be32(p->pages->allocated);
-packet->pages_used = cpu_to_be32(p->pages->used);
+packet->pages_used = cpu_to_be32(p->pages->num);
 packet->next_packet_size = cpu_to_be32(p->next_packet_size);
 packet->packet_num = cpu_to_be64(p->packet_num);
 
@@ -278,7 +278,7 @@ static void multifd_send_fill_packet(MultiFDSendParams *p)
 strncpy(packet->ramblock, p->pages->block->idstr, 256);
 }
 
-for (i = 0; i < p->pages->used; i++) {
+for (i = 0; i < p->pages->num; i++) {
 /* there are architectures where ram_addr_t is 32 bit */
 uint64_t temp = p->pages->offset[i];
 
@@ -332,18 +332,18 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams 
*p, Error **errp)
 p->pages = multifd_pages_init(packet->pages_alloc);
 }
 
-p->pages->used = be32_to_cpu(packet->pages_used);
-if (p->pages->used > packet->pages_alloc) {
+p->pages->num = be32_to_cpu(packet->pages_used);
+if (p->pages->num > packet->pages_alloc) {
 error_setg(errp, "multifd: received packet "
"with %d pages and expected maximum pages are %d",
-   p->pages->used, packet->pages_alloc) ;
+   p->pages->num, packet->pages_alloc) ;
 return -1;
 }
 
 p->next_packet_size = be32_to_cpu(packet->next_packet_size);
 p->packet_num = be64_to_cpu(packet->packet_num);
 
-if (p->pages->used == 0) {
+if (p->pages->num == 0) {
 return 0;
 }
 
@@ -356,7 +356,7 @@ static int multifd_recv_unfill_packet(MultiFDRecvParams *p, 
Error **errp)
 return -1;
 }
 
-for (i = 0; i < p->pages->used; i++) {
+for (i = 0; i < p->pages->num; i++) {
 uint64_t offset = be64_to_cpu(packet->offset[i]);
 
 if (offset > (block->used_length - page_size)) {
@@ -443,13 +443,13 @@ static int multifd_send_pages(QEMUFile *f)
 }
 qemu_mutex_unlock(&p->mutex);
 }
-assert(!p->pages->used);
+assert(!p->pages->num);
 assert(!p->pages->block);
 
 p->packet_num = multifd_send_state->packet_num++;
 multifd_send_state->pages = p->pages;
 p->pages = pages;
-transferred = ((uint64_t) pages->used) * qemu_target_page_size()
+transferred = ((uint64_t) pages->num) * qemu_target_page_size()
 + p->packet_len;
 qemu_file_update_transfer(f, transferred);
 ram_counters.multifd_bytes += transferred;
@@ -469,12 +469,12 @@ int multifd_queue_page(QEMUFile *f, RAMBlock *block, 
ram_addr_t offset)
 }
 
 if (pages->block == block) {
-pages->offset[pages->used] = offset;
-pages->iov[pages->used].iov_base = block->host + offset;
-pages->iov[pages->used].iov_len = qemu_target_page_size();
-pages->used++;
+pages->offset[pages->num] = offset;
+pages->iov[pages->num].iov_base = block->host + offset;
+pages->iov[pages->num].iov_len = qemu_target_page_size();
+pages->num++;
 
-if (pages->used < pages->allocated) {
+if (pages->num < pages->allocated) {
 return 1;
 }
 }
@@ -586,7 +586,7 @@ void multifd_send_sync_main(QEMUFile *f)
 if (!migrate_use_multifd()) {
 return;
 }
-if (multifd_send_state->pages->used) {
+if (multifd_send_state->pages->num) {
 if (multifd_send_pages(f) < 0) {
 error_report("%s: multifd_send_pages fail", __func__);
 return;
@@ -649,7 +649,7 @@ static void *multifd_send_thread(void *opaque)
 qemu_mutex_lock(&p->mutex);
 
 if (p->pending_job) {
-uint32_t used = p->pages->used;
+uint32_t used = p->pages->num;
 uint64_t packet_num = p->packet_num;
 flags = p->flags;
 
@@ -665,7 +66

[PATCH 12/23] multifd: Make zlib use iov's

2021-11-23 Thread Juan Quintela

Signed-off-by: Juan Quintela 
---
 migration/multifd-zlib.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index da6201704c..478a4af115 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -143,6 +143,9 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
 }
 out_size += available - zs->avail_out;
 }
+p->iov[p->iovs_num].iov_base = z->zbuff;
+p->iov[p->iovs_num].iov_len = out_size;
+p->iovs_num++;
 p->next_packet_size = out_size;
 p->flags |= MULTIFD_FLAG_ZLIB;
 
@@ -162,10 +165,7 @@ static int zlib_send_prepare(MultiFDSendParams *p, Error 
**errp)
  */
 static int zlib_send_write(MultiFDSendParams *p, uint32_t used, Error **errp)
 {
-struct zlib_data *z = p->data;
-
-return qio_channel_write_all(p->c, (void *)z->zbuff, p->next_packet_size,
- errp);
+return qio_channel_writev_all(p->c, p->iov, p->iovs_num, errp);
 }
 
 /**
-- 
2.33.1

[PATCH 00/23] Migration: Transmit and detect zero pages in the multifd threads

2021-11-23 Thread Juan Quintela

Hi

Since Friday version:
- More cleanups on the code
- Remove repeated calls to qemu_target_page_size()
- Establish normal pages and zero pages
- detect zero pages on the multifd threads
- send zero pages through the multifd channels.
- reviews by Richard addressed.

It pases migration-test, so it should be perfect O:+)

ToDo for next version:
- check the version changes
  I need that 6.2 is out to check for 7.0.
  This code don't exist at all due to that reason.
- Send measurements of the differences

Please, review.

[

Friday version that just created a single writev instead of
write+writev.

]

Right now, multifd does a write() for the header and a writev() for
each group of pages.  Simplify it so we send the header as another
member of the IOV.

Once there, I got several simplifications:
* is_zero_range() was used only once, just use its body.
* same with is_zero_page().
* Be consintent and use offset insed the ramblock everywhere.
* Now that we have the offsets of the ramblock, we can drop the iov.
* Now that nothing uses iov's except NOCOMP method, move the iovs
  from pages to methods.
* Now we can use iov's with a single field for zlib/zstd.
* send_write() method is the same in all the implementaitons, so use
  it directly.
* Now, we can use a single writev() to write everything.

ToDo: Move zero page detection to the multifd thrteads.

With RAM in the Terabytes size, the detection of the zero page takes
too much time on the main thread.

Last patch on the series removes the detection of zero pages in the
main thread for multifd.  In the next series post, I will add how to
detect the zero pages and send them on multifd channels.

Please review.

Later, Juan.

Juan Quintela (23):
  multifd: Delete useless operation
  migration: Never call twice qemu_target_page_size()
  multifd: Rename used field to num
  multifd: Add missing documention
  multifd: The variable is only used inside the loop
  multifd: remove used parameter from send_prepare() method
  multifd: remove used parameter from send_recv_pages() method
  multifd: Fill offset and block for reception
  multifd: Make zstd compression method not use iovs
  multifd: Make zlib compression method not use iovs
  multifd: Move iov from pages to params
  multifd: Make zlib use iov's
  multifd: Make zstd use iov's
  multifd: Remove send_write() method
  multifd: Use a single writev on the send side
  multifd: Unfold "used" variable by its value
  multifd: Use normal pages array on the send side
  multifd: Use normal pages array on the recv side
  multifd: recv side only needs the RAMBlock host address
  multifd: Rename pages_used to normal_pages
  multifd: Support for zero pages transmission
  multifd: Zero pages transmission
  migration: Use multifd before we check for the zero page

 migration/multifd.h  |  52 +++---
 migration/migration.c|   7 +-
 migration/multifd-zlib.c |  71 +
 migration/multifd-zstd.c |  70 +
 migration/multifd.c  | 214 +++
 migration/ram.c  |  22 ++--
 migration/savevm.c   |   5 +-
 migration/trace-events   |   4 +-
 8 files changed, 231 insertions(+), 214 deletions(-)

-- 
2.33.1

Re: [for-6.2] hw/intc/arm_gicv3: Update cached state after acknowledging LPI

2021-11-23 Thread Peter Maydell

On Tue, 23 Nov 2021 at 17:10, Peter Maydell  wrote:
>
> In gicv3_redist_lpi_pending() we update cs->hpplpi to indicate the
> new highest priority pending LPI after changing the requested LPI
> pending bit.  However the overall highest priority pending interrupt
> information won't be updated unless we call gicv3_redist_update().
> We do that from the callsite in gicv3-redist_process_lpi(), but not
> from the callsite in icc_activate_irq().  The effect is that when the
> guest acknowledges an LPI by reading ICC_IAR1_EL1, we mark it as not
> pending in the data structure but still leave it in cs->hppi so will
> offer it to the guest again.
>
> The effect is that if we are using an emulated GICv3 and ITS and
> using devices which use LPIs (ie PCI devices) then Linux will
> complain "irq 54: nobody cared" and then hang (probably because the
> stale bogus interrupt info meant we never tried to deliver some other
> real interrupt).

Hmm; this is definitely a bug, but maybe it's not the cause of
the symptoms listed above -- I've just seen them again even
with this fix. I'll keep digging...

-- PMM

[PATCH v6 16/16] meson: Move bsd_user_ss to bsd-user/

2021-11-23 Thread Richard Henderson

We have no need to reference bsd_user_ss outside of bsd-user.
Go ahead and merge it directly into specific_ss.

Reviewed-by: Warner Losh 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 meson.build  | 3 ---
 bsd-user/meson.build | 4 
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/meson.build b/meson.build
index bf7af382de..e425129011 100644
--- a/meson.build
+++ b/meson.build
@@ -2363,7 +2363,6 @@ genh += hxdep
 authz_ss = ss.source_set()
 blockdev_ss = ss.source_set()
 block_ss = ss.source_set()
-bsd_user_ss = ss.source_set()
 chardev_ss = ss.source_set()
 common_ss = ss.source_set()
 common_user_ss = ss.source_set()
@@ -2618,8 +2617,6 @@ subdir('common-user')
 subdir('bsd-user')
 subdir('linux-user')
 
-specific_ss.add_all(when: 'CONFIG_BSD_USER', if_true: bsd_user_ss)
-
 common_user_ss = common_user_ss.apply(config_all, strict: false)
 common_user = static_library('common-user',
  sources: common_user_ss.sources(),
diff --git a/bsd-user/meson.build b/bsd-user/meson.build
index 25c3976ead..9fcb80c3fa 100644
--- a/bsd-user/meson.build
+++ b/bsd-user/meson.build
@@ -2,6 +2,8 @@ if not have_bsd_user
subdir_done()
 endif
 
+bsd_user_ss = ss.source_set()
+
 common_user_inc += include_directories('.')
 
 bsd_user_ss.add(files(
@@ -17,3 +19,5 @@ bsd_user_ss.add(files(
 
 # Pull in the OS-specific build glue, if any
 subdir(targetos)
+
+specific_ss.add_all(when: 'CONFIG_BSD_USER', if_true: bsd_user_ss)
-- 
2.25.1

[PATCH v6 14/16] linux-user: Move thunk.c from top-level

2021-11-23 Thread Richard Henderson

So far, linux-user is the only user of these functions.
Clean up the build machinery by restricting it to linux-user.

Reviewed-by: Warner Losh 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 meson.build   | 1 -
 thunk.c => linux-user/thunk.c | 0
 MAINTAINERS   | 1 -
 linux-user/meson.build| 1 +
 4 files changed, 1 insertion(+), 2 deletions(-)
 rename thunk.c => linux-user/thunk.c (100%)

diff --git a/meson.build b/meson.build
index d5ac65b877..cd97b154bd 100644
--- a/meson.build
+++ b/meson.build
@@ -2621,7 +2621,6 @@ subdir('linux-user')
 
 specific_ss.add_all(when: 'CONFIG_BSD_USER', if_true: bsd_user_ss)
 
-linux_user_ss.add(files('thunk.c'))
 specific_ss.add_all(when: 'CONFIG_LINUX_USER', if_true: linux_user_ss)
 
 common_user_ss = common_user_ss.apply(config_all, strict: false)
diff --git a/thunk.c b/linux-user/thunk.c
similarity index 100%
rename from thunk.c
rename to linux-user/thunk.c
diff --git a/MAINTAINERS b/MAINTAINERS
index e3a12857f9..d3c045ff86 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3076,7 +3076,6 @@ Usermode Emulation
 Overall usermode emulation
 M: Riku Voipio 
 S: Maintained
-F: thunk.c
 F: accel/tcg/user-exec*.c
 F: include/user/
 F: common-user/
diff --git a/linux-user/meson.build b/linux-user/meson.build
index bf9d945504..fcf7d40f23 100644
--- a/linux-user/meson.build
+++ b/linux-user/meson.build
@@ -15,6 +15,7 @@ linux_user_ss.add(files(
   'signal.c',
   'strace.c',
   'syscall.c',
+  'thunk.c',
   'uaccess.c',
   'uname.c',
 ))
-- 
2.25.1

1 2 3 >

1 - 100 of 215 matches

Mail list logo