Re: [PATCH 4/7] block/dirty-bitmap: Clean up local variable shadowing

2023-09-18 Thread Markus Armbruster
Kevin Wolf  writes:

> Am 31.08.2023 um 15:25 hat Markus Armbruster geschrieben:
>> Local variables shadowing other local variables or parameters make the
>> code needlessly hard to understand.  Tracked down with -Wshadow=local.
>> Clean up: delete inner declarations when they are actually redundant,
>> else rename variables.
>> 
>> Signed-off-by: Markus Armbruster 
>> ---
>>  block/monitor/bitmap-qmp-cmds.c | 2 +-
>>  block/qcow2-bitmap.c| 3 +--
>>  2 files changed, 2 insertions(+), 3 deletions(-)
>> 
>> diff --git a/block/monitor/bitmap-qmp-cmds.c 
>> b/block/monitor/bitmap-qmp-cmds.c
>> index 55f778f5af..4d018423d8 100644
>> --- a/block/monitor/bitmap-qmp-cmds.c
>> +++ b/block/monitor/bitmap-qmp-cmds.c
>> @@ -276,7 +276,7 @@ BdrvDirtyBitmap *block_dirty_bitmap_merge(const char 
>> *node, const char *target,
>>  
>>  for (lst = bms; lst; lst = lst->next) {
>>  switch (lst->value->type) {
>> -const char *name, *node;
>> +const char *name;
>>  case QTYPE_QSTRING:
>>  name = lst->value->u.local;
>>  src = bdrv_find_dirty_bitmap(bs, name);
>
> The names in this function are all over the place... A more ambitious
> patch could rename the parameters to dst_node/dst_bitmap and these
> variables to src_node/src_bitmap to get some more consistency (both with
> each other and with the existing src/dst variables).

What exactly would you like me to consider?  Perhaps:

* Rename parameter @node to @dst_node

* Rename which parameter to @dst_bitmap?

* Rename nested local @node to @src_node

* Rename which local variable to @src_bitmap?

* Move nested locals to function scope

> Preexisting, so I'm not insisting that you should do this.
>
>> diff --git a/block/qcow2-bitmap.c b/block/qcow2-bitmap.c
>> index 037fa2d435..ffd5cd3b23 100644
>> --- a/block/qcow2-bitmap.c
>> +++ b/block/qcow2-bitmap.c
>> @@ -1555,7 +1555,6 @@ bool 
>> qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>>  FOR_EACH_DIRTY_BITMAP(bs, bitmap) {
>>  const char *name = bdrv_dirty_bitmap_name(bitmap);
>>  uint32_t granularity = bdrv_dirty_bitmap_granularity(bitmap);
>> -Qcow2Bitmap *bm;
>>  
>>  if (!bdrv_dirty_bitmap_get_persistence(bitmap) ||
>>  bdrv_dirty_bitmap_inconsistent(bitmap)) {
>> @@ -1625,7 +1624,7 @@ bool 
>> qcow2_store_persistent_dirty_bitmaps(BlockDriverState *bs,
>>  
>>  /* allocate clusters and store bitmaps */
>>  QSIMPLEQ_FOREACH(bm, bm_list, entry) {
>> -BdrvDirtyBitmap *bitmap = bm->dirty_bitmap;
>> +bitmap = bm->dirty_bitmap;
>>  
>>  if (bitmap == NULL || bdrv_dirty_bitmap_readonly(bitmap)) {
>>  continue;
>
> Reviewed-by: Kevin Wolf 

Thanks!




Re: [PATCH v3] target/riscv: update checks on writing pmpcfg for Smepmp

2023-09-18 Thread Himanshu Chauhan



> On 19-Sep-2023, at 10:51 AM, Alistair Francis  wrote:
> 
> On Tue, Sep 19, 2023 at 3:08 PM Chang Alvin  wrote:
>> 
>>> -Original Message-
>> 
>>> From: Alistair Francis 
>> 
>>> Sent: Tuesday, September 19, 2023 12:42 PM
>> 
>>> To: Alvin Che-Chia Chang(張哲嘉) 
>> 
>>> Cc: qemu-ri...@nongnu.org; qemu-devel@nongnu.org;
>> 
>>> alistair.fran...@wdc.com; ajo...@ventanamicro.com
>> 
>>> Subject: Re: [PATCH v3] target/riscv: update checks on writing pmpcfg for
>> 
>>> Smepmp to version 1.0
>> 
>>> 
>> 
>>> On Fri, Sep 15, 2023 at 6:32 PM Alvin Chang  wrote:
>> 
 
>> 
 Current checks on writing pmpcfg for Smepmp follows Smepmp version
>> 
 0.9.1. However, Smepmp specification has already been ratified, and
>> 
 there are some differences between version 0.9.1 and 1.0. In this
>> 
 commit we update the checks of writing pmpcfg to follow Smepmp version
>> 
 1.0.
>> 
 
>> 
 When mseccfg.MML is set, the constraints to modify PMP rules are:
>> 
 1. Locked rules cannot be removed or modified until a PMP reset, unless
>> 
   mseccfg.RLB is set.
>> 
 2. From Smepmp specification version 1.0, chapter 2 section 4b:
>> 
   Adding a rule with executable privileges that either is M-mode-only
>> 
   or a locked Shared-Region is not possible and such pmpcfg writes are
>> 
   ignored, leaving pmpcfg unchanged.
>> 
 
>> 
 The commit transfers the value of pmpcfg into the index of the Smepmp
>> 
 truth table, and checks the rules by aforementioned specification
>> 
 changes.
>> 
 
>> 
 Signed-off-by: Alvin Chang 
>> 
 ---
>> 
 Changes from v2: Adopt switch case ranges and numerical order.
>> 
 
>> 
 Changes from v1: Convert ePMP over to Smepmp.
>> 
>>> 
>> 
>>> Did this part get lost?
>> 
>>> 
>> 
>>> Alistair
>> 
>>> 
>> 
>> 
>> Sorry, do you mean that the term "ePMP" should be changed to "Smepmp" in 
>> source code?
> 
> We still call it epmp to users and it's still marked as experimental.
> 
> See this line in the QEMU source:
> 
> MULTI_EXT_CFG_BOOL("x-epmp", epmp, false),
> 
> Alistair

I had sent a patch to rename epmp to Smepmp
https://www.mail-archive.com/qemu-devel@nongnu.org/msg967676.html


Regards
Himanshu





Re: [PATCH 1/7] migration/rdma: Fix save_page method to fail on polling error

2023-09-18 Thread Markus Armbruster
Eric Blake  writes:

> On Thu, Aug 31, 2023 at 03:25:40PM +0200, Markus Armbruster wrote:
>> qemu_rdma_save_page() reports polling error with error_report(), then
>> succeeds anyway.  This is because the variable holding the polling
>> status *shadows* the variable the function returns.  The latter
>> remains zero.
>> 
>> Broken since day one, and duplicated more recently.
>> 
>> Fixes: 2da776db4846 (rdma: core logic)
>> Fixes: b390afd8c50b (migration/rdma: Fix out of order wrid)
>
> Alas, the curse of immutable git history preserving typos in commit
> subjects ;)

"wrid" is short for "work request ID", actually.

> The alternative of rewriting history and breaking SHA
> references is worse.

Rewriting master is a big no-no.

git notes can be used to correct the record, but it has its usability
issues.
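
For readers unfamiliar with this bug class, here is a minimal, self-contained
sketch (illustrative only, not the actual migration/rdma.c code) of how an
inner declaration shadows the status variable the function returns, so the
error gets reported but never propagated:

    #include <stdio.h>

    static int poll_once(void)
    {
        return -1;                      /* pretend polling failed */
    }

    static int save_page(void)
    {
        int ret = 0;                    /* value the caller will see */

        if (1) {
            int ret = poll_once();      /* shadows the outer "ret" */
            if (ret < 0) {
                fprintf(stderr, "polling failed\n");
                /* the outer "ret" is still 0, so the failure is lost */
            }
        }
        return ret;                     /* returns 0: "success" despite the error */
    }

    int main(void)
    {
        printf("save_page() = %d\n", save_page());   /* prints 0 */
        return 0;
    }

Building this with -Wshadow=local flags the inner declaration, which is how
the series tracked the bug down.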

>> Signed-off-by: Markus Armbruster 
>> ---
>>  migration/rdma.c | 6 --
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> Reviewed-by: Eric Blake 

Thanks!




Re: [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs

2023-09-18 Thread Zhijian Li (Fujitsu)


On 18/09/2023 22:41, Markus Armbruster wrote:
> We use int instead of uint64_t in a few places.  Change them to
> uint64_t.
> 
> This cleans up a comparison of signed qemu_rdma_block_for_wrid()
> parameter @wrid_requested with unsigned @wr_id.  Harmless, because the
> actual arguments are non-negative enumeration constants.
> 
> Signed-off-by: Markus Armbruster 


Reviewed-by: Li Zhijian 

Thanks
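
The rdma.c arguments are non-negative enumeration constants, so the old code
happened to work; still, as a self-contained illustration (not QEMU code) of
why a signed/unsigned comparison like the one mentioned above is worth
cleaning up:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int wrid_requested = -1;        /* hypothetical sentinel value */
        uint64_t wr_id = 2;

        /*
         * Usual arithmetic conversions turn the int into a uint64_t, so -1
         * becomes UINT64_MAX and the comparison means something other than
         * what it reads; -Wsign-compare warns about exactly this.
         */
        if (wrid_requested < wr_id) {
            printf("-1 compares as smaller\n");
        } else {
            printf("-1 compares as LARGER (it converted to UINT64_MAX)\n");
        }
        return 0;
    }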


> ---
>   migration/rdma.c   | 7 ---
>   migration/trace-events | 8 
>   2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/rdma.c b/migration/rdma.c
> index cda22be3f7..4328610a4c 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1599,13 +1599,13 @@ static int qemu_rdma_wait_comp_channel(RDMAContext 
> *rdma,
>   return rdma->error_state;
>   }
>   
> -static struct ibv_comp_channel *to_channel(RDMAContext *rdma, int wrid)
> +static struct ibv_comp_channel *to_channel(RDMAContext *rdma, uint64_t wrid)
>   {
>   return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_comp_channel :
>  rdma->recv_comp_channel;
>   }
>   
> -static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
> +static struct ibv_cq *to_cq(RDMAContext *rdma, uint64_t wrid)
>   {
>   return wrid < RDMA_WRID_RECV_CONTROL ? rdma->send_cq : rdma->recv_cq;
>   }
> @@ -1623,7 +1623,8 @@ static struct ibv_cq *to_cq(RDMAContext *rdma, int wrid)
>* completions only need to be recorded, but do not actually
>* need further processing.
>*/
> -static int qemu_rdma_block_for_wrid(RDMAContext *rdma, int wrid_requested,
> +static int qemu_rdma_block_for_wrid(RDMAContext *rdma,
> +uint64_t wrid_requested,
>   uint32_t *byte_len)
>   {
>   int num_cq_events = 0, ret = 0;
> diff --git a/migration/trace-events b/migration/trace-events
> index b78808f28b..d733107ec6 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -207,7 +207,7 @@ qemu_rdma_accept_incoming_migration(void) ""
>   qemu_rdma_accept_incoming_migration_accepted(void) ""
>   qemu_rdma_accept_pin_state(bool pin) "%d"
>   qemu_rdma_accept_pin_verbsc(void *verbs) "Verbs context after listen: %p"
> -qemu_rdma_block_for_wrid_miss(int wcomp, uint64_t req) "A Wanted wrid %d but 
> got %" PRIu64
> +qemu_rdma_block_for_wrid_miss(uint64_t wcomp, uint64_t req) "A Wanted wrid 
> %" PRIu64 " but got %" PRIu64
>   qemu_rdma_cleanup_disconnect(void) ""
>   qemu_rdma_close(void) ""
>   qemu_rdma_connect_pin_all_requested(void) ""
> @@ -221,9 +221,9 @@ qemu_rdma_exchange_send_waiting(const char *desc) 
> "Waiting for response %s"
>   qemu_rdma_exchange_send_received(const char *desc) "Response %s received."
>   qemu_rdma_fill(size_t control_len, size_t size) "RDMA %zd of %zd bytes 
> already in buffer"
>   qemu_rdma_init_ram_blocks(int blocks) "Allocated %d local ram block 
> structures"
> -qemu_rdma_poll_recv(int64_t comp, int64_t id, int sent) "completion %" 
> PRId64 " received (%" PRId64 ") left %d"
> -qemu_rdma_poll_write(int64_t comp, int left, uint64_t block, uint64_t chunk, 
> void *local, void *remote) "completions %" PRId64 " left %d, block %" PRIu64 
> ", chunk: %" PRIu64 " %p %p"
> -qemu_rdma_poll_other(int64_t comp, int left) "other completion %" PRId64 " 
> received left %d"
> +qemu_rdma_poll_recv(uint64_t comp, int64_t id, int sent) "completion %" 
> PRIu64 " received (%" PRId64 ") left %d"
> +qemu_rdma_poll_write(uint64_t comp, int left, uint64_t block, uint64_t 
> chunk, void *local, void *remote) "completions %" PRIu64 " left %d, block %" 
> PRIu64 ", chunk: %" PRIu64 " %p %p"
> +qemu_rdma_poll_other(uint64_t comp, int left) "other completion %" PRIu64 " 
> received left %d"
>   qemu_rdma_post_send_control(const char *desc) "CONTROL: sending %s.."
>   qemu_rdma_register_and_get_keys(uint64_t len, void *start) "Registering %" 
> PRIu64 " bytes @ %p"
>   qemu_rdma_register_odp_mr(const char *name) "Try to register On-Demand 
> Paging memory region: %s"

Re: [PATCH v3] target/riscv: update checks on writing pmpcfg for Smepmp

2023-09-18 Thread Alistair Francis
On Tue, Sep 19, 2023 at 3:08 PM Chang Alvin  wrote:
>
>  > -Original Message-
>
> > From: Alistair Francis 
>
> > Sent: Tuesday, September 19, 2023 12:42 PM
>
> > To: Alvin Che-Chia Chang(張哲嘉) 
>
> > Cc: qemu-ri...@nongnu.org; qemu-devel@nongnu.org;
>
> > alistair.fran...@wdc.com; ajo...@ventanamicro.com
>
> > Subject: Re: [PATCH v3] target/riscv: update checks on writing pmpcfg for
>
> > Smepmp to version 1.0
>
> >
>
> > On Fri, Sep 15, 2023 at 6:32 PM Alvin Chang  wrote:
>
> > >
>
> > > Current checks on writing pmpcfg for Smepmp follows Smepmp version
>
> > > 0.9.1. However, Smepmp specification has already been ratified, and
>
> > > there are some differences between version 0.9.1 and 1.0. In this
>
> > > commit we update the checks of writing pmpcfg to follow Smepmp version
>
> > > 1.0.
>
> > >
>
> > > When mseccfg.MML is set, the constraints to modify PMP rules are:
>
> > > 1. Locked rules cannot be removed or modified until a PMP reset, unless
>
> > >mseccfg.RLB is set.
>
> > > 2. From Smepmp specification version 1.0, chapter 2 section 4b:
>
> > >Adding a rule with executable privileges that either is M-mode-only
>
> > >or a locked Shared-Region is not possible and such pmpcfg writes are
>
> > >ignored, leaving pmpcfg unchanged.
>
> > >
>
> > > The commit transfers the value of pmpcfg into the index of the Smepmp
>
> > > truth table, and checks the rules by aforementioned specification
>
> > > changes.
>
> > >
>
> > > Signed-off-by: Alvin Chang 
>
> > > ---
>
> > > Changes from v2: Adopt switch case ranges and numerical order.
>
> > >
>
> > > Changes from v1: Convert ePMP over to Smepmp.
>
> >
>
> > Did this part get lost?
>
> >
>
> > Alistair
>
> >
>
>
> Sorry, do you mean that the term "ePMP" should be changed to "Smepmp" in 
> source code?

We still call it epmp to users and it's still marked as experimental.

See this line in the QEMU source:

MULTI_EXT_CFG_BOOL("x-epmp", epmp, false),

Alistair



Re: [PATCH v3] target/riscv: update checks on writing pmpcfg for Smepmp

2023-09-18 Thread Chang Alvin
 > -Original Message-

> From: Alistair Francis 

> Sent: Tuesday, September 19, 2023 12:42 PM

> To: Alvin Che-Chia Chang(張哲嘉) 

> Cc: qemu-ri...@nongnu.org; qemu-devel@nongnu.org;

> alistair.fran...@wdc.com; ajo...@ventanamicro.com

> Subject: Re: [PATCH v3] target/riscv: update checks on writing pmpcfg for

> Smepmp to version 1.0

>

> On Fri, Sep 15, 2023 at 6:32 PM Alvin Chang  wrote:

> >

> > Current checks on writing pmpcfg for Smepmp follows Smepmp version

> > 0.9.1. However, Smepmp specification has already been ratified, and

> > there are some differences between version 0.9.1 and 1.0. In this

> > commit we update the checks of writing pmpcfg to follow Smepmp version

> > 1.0.

> >

> > When mseccfg.MML is set, the constraints to modify PMP rules are:

> > 1. Locked rules cannot be removed or modified until a PMP reset, unless

> >mseccfg.RLB is set.

> > 2. From Smepmp specification version 1.0, chapter 2 section 4b:

> >Adding a rule with executable privileges that either is M-mode-only

> >or a locked Shared-Region is not possible and such pmpcfg writes are

> >ignored, leaving pmpcfg unchanged.

> >

> > The commit transfers the value of pmpcfg into the index of the Smepmp

> > truth table, and checks the rules by aforementioned specification

> > changes.

> >

> > Signed-off-by: Alvin Chang 

> > ---

> > Changes from v2: Adopt switch case ranges and numerical order.

> >

> > Changes from v1: Convert ePMP over to Smepmp.

>

> Did this part get lost?

>

> Alistair

>


Sorry, do you mean that the term "ePMP" should be changed to "Smepmp" in
source code?


Alvin Chang


> >

> >  target/riscv/pmp.c | 40 

> >  1 file changed, 32 insertions(+), 8 deletions(-)

> >

> > diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c index

> > a08cd95658..b144932b3b 100644

> > --- a/target/riscv/pmp.c

> > +++ b/target/riscv/pmp.c

> > @@ -99,16 +99,40 @@ static void pmp_write_cfg(CPURISCVState *env,

> uint32_t pmp_index, uint8_t val)

> >  locked = false;

> >  }

> >

> > -/* mseccfg.MML is set */

> > -if (MSECCFG_MML_ISSET(env)) {

> > -/* not adding execute bit */

> > -if ((val & PMP_LOCK) != 0 && (val & PMP_EXEC) !=

> PMP_EXEC) {

> > +/*

> > + * mseccfg.MML is set. Locked rules cannot be removed or

> modified

> > + * until a PMP reset. Besides, from Smepmp specification

> version 1.0

> > + * , chapter 2 section 4b says:

> > + * Adding a rule with executable privileges that either is

> > + * M-mode-only or a locked Shared-Region is not possible

> and such

> > + * pmpcfg writes are ignored, leaving pmpcfg unchanged.

> > + */

> > +if (MSECCFG_MML_ISSET(env) && !pmp_is_locked(env,

> pmp_index)) {

> > +/*

> > + * Convert the PMP permissions to match the truth

> table in the

> > + * ePMP spec.

> > + */

> > +const uint8_t epmp_operation =

> > +((val & PMP_LOCK) >> 4) | ((val & PMP_READ) <<

> 2) |

> > +(val & PMP_WRITE) | ((val & PMP_EXEC) >> 2);

> > +

> > +switch (epmp_operation) {

> > +case 0 ... 8:

> >  locked = false;

> > -}

> > -/* shared region and not adding X bit */

> > -if ((val & PMP_LOCK) != PMP_LOCK &&

> > -(val & 0x7) != (PMP_WRITE | PMP_EXEC)) {

> > +break;

> > +case 9 ... 11:

> > +break;

> > +case 12:

> > +locked = false;

> > +break;

> > +case 13:

> > +break;

> > +case 14:

> > +case 15:

> >  locked = false;

> > +break;

> > +default:

> > +g_assert_not_reached();

> >  }

> >  }

> >  } else {

> > --

> > 2.34.1

> >

> >


Re: [PATCH v5 16/20] linux-user/riscv: Add vdso

2023-09-18 Thread Alistair Francis
On Wed, Aug 30, 2023 at 9:17 AM Richard Henderson
 wrote:
>
> Signed-off-by: Richard Henderson 

Acked-by: Alistair Francis 

Alistair

> ---
>  linux-user/riscv/vdso-asmoffset.h |   9 ++
>  linux-user/elfload.c  |   4 +
>  linux-user/riscv/signal.c |   8 ++
>  linux-user/meson.build|   1 +
>  linux-user/riscv/meson.build  |  30 +
>  linux-user/riscv/vdso-32.so   | Bin 0 -> 2888 bytes
>  linux-user/riscv/vdso-64.so   | Bin 0 -> 3840 bytes
>  linux-user/riscv/vdso.S   | 186 ++
>  linux-user/riscv/vdso.ld  |  74 
>  9 files changed, 312 insertions(+)
>  create mode 100644 linux-user/riscv/vdso-asmoffset.h
>  create mode 100644 linux-user/riscv/meson.build
>  create mode 100755 linux-user/riscv/vdso-32.so
>  create mode 100755 linux-user/riscv/vdso-64.so
>  create mode 100644 linux-user/riscv/vdso.S
>  create mode 100644 linux-user/riscv/vdso.ld
>
> diff --git a/linux-user/riscv/vdso-asmoffset.h 
> b/linux-user/riscv/vdso-asmoffset.h
> new file mode 100644
> index 00..123902ef61
> --- /dev/null
> +++ b/linux-user/riscv/vdso-asmoffset.h
> @@ -0,0 +1,9 @@
> +#ifdef TARGET_ABI32
> +# define sizeof_rt_sigframe 0x2b0
> +# define offsetof_uc_mcontext   0x120
> +# define offsetof_freg0 0x80
> +#else
> +# define sizeof_rt_sigframe 0x340
> +# define offsetof_uc_mcontext   0x130
> +# define offsetof_freg0 0x100
> +#endif
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index eb22a17e0e..8f902bb427 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -1733,8 +1733,10 @@ static void elf_core_copy_regs(target_elf_gregset_t 
> *regs,
>
>  #ifdef TARGET_RISCV32
>  #define ELF_CLASS ELFCLASS32
> +#include "vdso-32.c.inc"
>  #else
>  #define ELF_CLASS ELFCLASS64
> +#include "vdso-64.c.inc"
>  #endif
>
>  #define ELF_HWCAP get_elf_hwcap()
> @@ -1751,6 +1753,8 @@ static uint32_t get_elf_hwcap(void)
>  #undef MISA_BIT
>  }
>
> +#define vdso_image_info()_image_info
> +
>  static inline void init_thread(struct target_pt_regs *regs,
> struct image_info *infop)
>  {
> diff --git a/linux-user/riscv/signal.c b/linux-user/riscv/signal.c
> index eaa168199a..5449c7618a 100644
> --- a/linux-user/riscv/signal.c
> +++ b/linux-user/riscv/signal.c
> @@ -21,6 +21,7 @@
>  #include "user-internals.h"
>  #include "signal-common.h"
>  #include "linux-user/trace.h"
> +#include "vdso-asmoffset.h"
>
>  /* Signal handler invocation must be transparent for the code being
> interrupted. Complete CPU (hart) state is saved on entry and restored
> @@ -37,6 +38,8 @@ struct target_sigcontext {
>  uint32_t fcsr;
>  }; /* cf. riscv-linux:arch/riscv/include/uapi/asm/ptrace.h */
>
> +QEMU_BUILD_BUG_ON(offsetof(struct target_sigcontext, fpr) != offsetof_freg0);
> +
>  struct target_ucontext {
>  unsigned long uc_flags;
>  struct target_ucontext *uc_link;
> @@ -51,6 +54,11 @@ struct target_rt_sigframe {
>  struct target_ucontext uc;
>  };
>
> +QEMU_BUILD_BUG_ON(sizeof(struct target_rt_sigframe)
> +  != sizeof_rt_sigframe);
> +QEMU_BUILD_BUG_ON(offsetof(struct target_rt_sigframe, uc.uc_mcontext)
> +  != offsetof_uc_mcontext);
> +
>  static abi_ulong get_sigframe(struct target_sigaction *ka,
>CPURISCVState *regs, size_t framesize)
>  {
> diff --git a/linux-user/meson.build b/linux-user/meson.build
> index 1b265ed365..3aa967b27c 100644
> --- a/linux-user/meson.build
> +++ b/linux-user/meson.build
> @@ -46,6 +46,7 @@ subdir('microblaze')
>  subdir('mips64')
>  subdir('mips')
>  subdir('ppc')
> +subdir('riscv')
>  subdir('s390x')
>  subdir('sh4')
>  subdir('sparc')
> diff --git a/linux-user/riscv/meson.build b/linux-user/riscv/meson.build
> new file mode 100644
> index 00..0a00cae9fd
> --- /dev/null
> +++ b/linux-user/riscv/meson.build
> @@ -0,0 +1,30 @@
> +vdso_cmd = [
> +build_vdso_cmd,
> +'-B', meson.project_build_root(),
> +'-C', meson.current_source_dir(),
> +'-T', 'riscv64-linux-user',
> +'-o', '@OUTPUT@',
> +'--',
> +'-nostdlib', '-shared', '-fpic',
> +'-Wl,-h,linux-vdso.so.1',
> +'-Wl,--build-id=sha1',
> +'-Wl,--hash-style=both',
> +'-Wl,-T,@INPUT1@',
> +'@INPUT0@'
> +]
> +
> +vdso_32_so = custom_target(output: 'vdso-32.so',
> +   input: files('vdso.S', 'vdso.ld'),
> +   depend_files: files('vdso-asmoffset.h'),
> +   command: vdso_cmd + ['-mabi=ilp32d', 
> '-march=rv32g'])
> +
> +vdso_64_so = custom_target(output: 'vdso-64.so',
> +   input: files('vdso.S', 'vdso.ld'),
> +   depend_files: files('vdso-asmoffset.h'),
> +   command: vdso_cmd + ['-mabi=lp64d', 
> '-march=rv64g'])
> +
> +vdso_32_inc = gen_vdso.process(vdso_32_so, extra_args: ['-r', 
> '__vdso_rt_sigreturn'])
> +vdso_64_inc 

Re: Call for agenda for 2023-09-19 QEMU developers call

2023-09-18 Thread Elena Ufimtseva
On Tue, Sep 19, 2023 at 02:02:49AM +0200, Juan Quintela wrote:
> Elena Ufimtseva  wrote:
> > Hello Juan,
> >
> > Not sure if this is worth its own topic, but would it be possible to hear
> > the community's thoughts on the live migration series review/pull
> > progress (atomics, zero page multifd etc.. )? Seems like there are few
> > outstanding relevant patches.
> 
> Hi
> 
> If everybody agrees, can we move this topic to next call?
> I am on vacation this week and the next.
> 
> I was planning time to "moderate" the call, but preparing for a call
> about my topics is going to mean a divorce O:-)
> 

Thank you Juan! Understood :)

> Later, Juan.
> 
> PD.  I have had too many problems in the recent past with several things,
>  from my test machines disappearing (and configuring new ones taking
>  forever), to a very bad time with the BOTS.  I expect/hope that
>  things are going to get better in the near future.



Re: [PATCH v3] target/riscv: update checks on writing pmpcfg for Smepmp to version 1.0

2023-09-18 Thread Alistair Francis
On Fri, Sep 15, 2023 at 6:32 PM Alvin Chang  wrote:
>
> Current checks on writing pmpcfg for Smepmp follows Smepmp version
> 0.9.1. However, Smepmp specification has already been ratified, and
> there are some differences between version 0.9.1 and 1.0. In this
> commit we update the checks of writing pmpcfg to follow Smepmp version
> 1.0.
>
> When mseccfg.MML is set, the constraints to modify PMP rules are:
> 1. Locked rules cannot be removed or modified until a PMP reset, unless
>mseccfg.RLB is set.
> 2. From Smepmp specification version 1.0, chapter 2 section 4b:
>Adding a rule with executable privileges that either is M-mode-only
>or a locked Shared-Region is not possible and such pmpcfg writes are
>ignored, leaving pmpcfg unchanged.
>
> The commit transfers the value of pmpcfg into the index of the Smepmp
> truth table, and checks the rules by aforementioned specification
> changes.
>
> Signed-off-by: Alvin Chang 
> ---
> Changes from v2: Adopt switch case ranges and numerical order.
>
> Changes from v1: Convert ePMP over to Smepmp.

Did this part get lost?

Alistair

>
>  target/riscv/pmp.c | 40 
>  1 file changed, 32 insertions(+), 8 deletions(-)
>
> diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> index a08cd95658..b144932b3b 100644
> --- a/target/riscv/pmp.c
> +++ b/target/riscv/pmp.c
> @@ -99,16 +99,40 @@ static void pmp_write_cfg(CPURISCVState *env, uint32_t 
> pmp_index, uint8_t val)
>  locked = false;
>  }
>
> -/* mseccfg.MML is set */
> -if (MSECCFG_MML_ISSET(env)) {
> -/* not adding execute bit */
> -if ((val & PMP_LOCK) != 0 && (val & PMP_EXEC) != PMP_EXEC) {
> +/*
> + * mseccfg.MML is set. Locked rules cannot be removed or modified
> + * until a PMP reset. Besides, from Smepmp specification version 
> 1.0
> + * , chapter 2 section 4b says:
> + * Adding a rule with executable privileges that either is
> + * M-mode-only or a locked Shared-Region is not possible and such
> + * pmpcfg writes are ignored, leaving pmpcfg unchanged.
> + */
> +if (MSECCFG_MML_ISSET(env) && !pmp_is_locked(env, pmp_index)) {
> +/*
> + * Convert the PMP permissions to match the truth table in 
> the
> + * ePMP spec.
> + */
> +const uint8_t epmp_operation =
> +((val & PMP_LOCK) >> 4) | ((val & PMP_READ) << 2) |
> +(val & PMP_WRITE) | ((val & PMP_EXEC) >> 2);
> +
> +switch (epmp_operation) {
> +case 0 ... 8:
>  locked = false;
> -}
> -/* shared region and not adding X bit */
> -if ((val & PMP_LOCK) != PMP_LOCK &&
> -(val & 0x7) != (PMP_WRITE | PMP_EXEC)) {
> +break;
> +case 9 ... 11:
> +break;
> +case 12:
> +locked = false;
> +break;
> +case 13:
> +break;
> +case 14:
> +case 15:
>  locked = false;
> +break;
> +default:
> +g_assert_not_reached();
>  }
>  }
>  } else {
> --
> 2.34.1
>
>
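
To make the truth-table transformation in the patch concrete, here is a small
self-contained computation; the PMP_* bit values are assumptions taken from
target/riscv/pmp.h, while the index expression is copied from the hunk above:

    #include <stdint.h>
    #include <stdio.h>

    #define PMP_READ   0x01
    #define PMP_WRITE  0x02
    #define PMP_EXEC   0x04
    #define PMP_LOCK   0x80

    int main(void)
    {
        /* an M-mode-only, read-only rule: L=1 R=1 W=0 X=0 */
        uint8_t val = PMP_LOCK | PMP_READ;

        /* same expression as in the patch: index bits are L|R|W|X */
        uint8_t epmp_operation =
            ((val & PMP_LOCK) >> 4) | ((val & PMP_READ) << 2) |
            (val & PMP_WRITE) | ((val & PMP_EXEC) >> 2);

        /* prints 12, which the switch above treats as "locked = false",
         * i.e. the pmpcfg write is accepted while mseccfg.MML is set */
        printf("epmp_operation = %u\n", epmp_operation);
        return 0;
    }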



Re: [PATCH] target/riscv: pmp: Clear pmp/smepmp bits on reset

2023-09-18 Thread Alistair Francis
On Thu, Sep 7, 2023 at 4:25 PM Mayuresh Chitale
 wrote:
>
> As per the Priv and Smepmp specifications, certain bits such as the 'L'
> bit of pmp entries and mseccfg.MML can only be cleared upon reset and it
> is necessary to do so to allow 'M' mode firmware to correctly reinitialize
> the pmp/smepmp state across reboots.
>
> Signed-off-by: Mayuresh Chitale 
> ---
>  target/riscv/cpu.c | 11 +++
>  target/riscv/pmp.c | 10 ++
>  target/riscv/pmp.h |  1 +
>  3 files changed, 22 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 0843461660..77ed653b8d 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -896,6 +896,17 @@ static void riscv_cpu_reset_hold(Object *obj)
>  }
>  /* mmte is supposed to have pm.current hardwired to 1 */
>  env->mmte |= (EXT_STATUS_INITIAL | MMTE_M_PM_CURRENT);
> +
> +/*
> + * Clear mseccfg and unlock all the PMP entries upon reset.
> + * This is allowed as per the priv and smepmp specifications.
> + * and is needed to clear stale entries across reboots.
> + */
> +if (riscv_cpu_cfg(env)->ext_smepmp) {

Does this compile? ext_smepmp doesn't seem to exist

Alistair

> +env->mseccfg = 0;
> +}
> +
> +pmp_unlock_entries(env);
>  #endif
>  env->xl = riscv_cpu_mxl(env);
>  riscv_cpu_update_mask(env);
> diff --git a/target/riscv/pmp.c b/target/riscv/pmp.c
> index 3f6c8cf08d..f3eb6e6585 100644
> --- a/target/riscv/pmp.c
> +++ b/target/riscv/pmp.c
> @@ -131,6 +131,16 @@ static bool pmp_write_cfg(CPURISCVState *env, uint32_t 
> pmp_index, uint8_t val)
>  return false;
>  }
>
> +void pmp_unlock_entries(CPURISCVState *env)
> +{
> +uint32_t pmp_num = pmp_get_num_rules(env);
> +int i;
> +
> +for (i = 0; i < pmp_num; i++) {
> +env->pmp_state.pmp[i].cfg_reg &= ~PMP_LOCK;
> +}
> +}
> +
>  static void pmp_decode_napot(target_ulong a, target_ulong *sa,
>   target_ulong *ea)
>  {
> diff --git a/target/riscv/pmp.h b/target/riscv/pmp.h
> index cf5c99f8e6..2c5ec3cdf1 100644
> --- a/target/riscv/pmp.h
> +++ b/target/riscv/pmp.h
> @@ -81,6 +81,7 @@ void pmp_update_rule_addr(CPURISCVState *env, uint32_t 
> pmp_index);
>  void pmp_update_rule_nums(CPURISCVState *env);
>  uint32_t pmp_get_num_rules(CPURISCVState *env);
>  int pmp_priv_to_page_prot(pmp_priv_t pmp_priv);
> +void pmp_unlock_entries(CPURISCVState *env);
>
>  #define MSECCFG_MML_ISSET(env) get_field(env->mseccfg, MSECCFG_MML)
>  #define MSECCFG_MMWP_ISSET(env) get_field(env->mseccfg, MSECCFG_MMWP)
> --
> 2.34.1
>
>



Re: [PATCH] hw/i386/pc: fix max_used_gpa for 32-bit systems

2023-09-18 Thread Ani Sinha
On Tue, Sep 19, 2023 at 9:20 AM Ani Sinha  wrote:
>
> On Mon, Sep 18, 2023 at 9:28 PM David Hildenbrand  wrote:
> >
> > On 18.09.23 17:56, Ani Sinha wrote:
> > > On Mon, Sep 18, 2023 at 8:59 PM David Hildenbrand  
> > > wrote:
> > >>
> > >> On 18.09.23 17:22, Ani Sinha wrote:
> > >>> On Mon, Sep 18, 2023 at 7:25 PM Ani Sinha  wrote:
> > 
> >  32-bit systems do not have a reserved memory for hole64 but they may 
> >  have a
> >  reserved memory space for memory hotplug. Since, hole64 starts after 
> >  the
> >  reserved hotplug memory, the unaligned hole64 start address gives us 
> >  the
> >  end address for this memory hotplug region that the processor may use.
> >  Fix this. This ensures that the physical address space bound checking 
> >  works
> >  correctly for 32-bit systems as well.
> > >>>
> > >>> This patch breaks some unit tests. I am not sure why it did not catch
> > >>> it when I tested it before sending.
> > >>> Will have to resend after fixing the tests.
> > >>
> > >> Probably because they supply more memory than the system can actually
> > >> handle? (e.g., -m 4g on 32bit)?
> > >
> > > cxl tests are failing for example.
> > >
> > > $ ./qemu-system-i386 -display none -machine q35,cxl=on
> > > qemu-system-i386: Address space limit 0x < 0x1000f
> > > phys-bits too low (32)
>
> also another thing is:
>
> ./qemu-system-i386 -machine pc -m 128
> works but ...
>
> $ ./qemu-system-i386 -machine pc -m 128,slots=3,maxmem=1G
> qemu-system-i386: Address space limit 0x < 0x1f7ff
> phys-bits too low (32)
>
> or
>
> $ ./qemu-system-i386 -machine pc-i440fx-8.2 -accel kvm -m 
> 128,slots=3,maxmem=1G
> qemu-system-i386: Address space limit 0x < 0x1f7ff
> phys-bits too low (32)
>
> but of course after the compat knob older pc machines work fine using
> the old logic :
>
> $ ./qemu-system-i386 -machine pc-i440fx-8.1 -accel kvm -m 
> 128,slots=3,maxmem=1G
> VNC server running on ::1:5900
> ^Cqemu-system-i386: terminating on signal 2

I don't know if we always need to do this, but this code adds 1 GiB per
slot for device memory:

    if (pcmc->enforce_aligned_dimm) {
        /* size device region assuming 1G page max alignment per slot */
        size += (1 * GiB) * machine->ram_slots;
    }

For a 32-bit machine that is a lot of memory consumed in just alignment.
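
To make the reasoning in the commit message concrete: the last guest-physical
address a 32-bit machine can use is the end of the reserved device-memory
(hotplug) region, which is exactly where the unaligned hole64 would start. A
conceptual sketch follows; the helper name and parameters are assumptions for
illustration, not the actual hw/i386/pc.c code:

    #include <inttypes.h>
    #include <stdio.h>

    /* hypothetical helper: last usable guest-physical address on a machine
     * with a device-memory (hotplug) region but no 64-bit PCI hole */
    static uint64_t max_used_gpa_32bit(uint64_t device_mem_base,
                                       uint64_t device_mem_size)
    {
        uint64_t hole64_start_unaligned = device_mem_base + device_mem_size;

        return hole64_start_unaligned - 1;
    }

    int main(void)
    {
        /* e.g. device memory placed at 4 GiB, sized 1 GiB (values made up) */
        printf("max used GPA: 0x%" PRIx64 "\n",
               max_used_gpa_32bit(4ULL << 30, 1ULL << 30));
        return 0;
    }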




Re: [PATCH] disas/riscv: Fix the typo of inverted order of pmpaddr13 and pmpaddr14

2023-09-18 Thread Alistair Francis
On Thu, Sep 7, 2023 at 6:47 PM Alvin Chang  wrote:
>
> Fix the inverted order of pmpaddr13 and pmpaddr14 in csr_name().
>
> Signed-off-by: Alvin Chang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  disas/riscv.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/disas/riscv.c b/disas/riscv.c
> index 3873a69157..8e89e1d115 100644
> --- a/disas/riscv.c
> +++ b/disas/riscv.c
> @@ -2116,8 +2116,8 @@ static const char *csr_name(int csrno)
>  case 0x03ba: return "pmpaddr10";
>  case 0x03bb: return "pmpaddr11";
>  case 0x03bc: return "pmpaddr12";
> -case 0x03bd: return "pmpaddr14";
> -case 0x03be: return "pmpaddr13";
> +case 0x03bd: return "pmpaddr13";
> +case 0x03be: return "pmpaddr14";
>  case 0x03bf: return "pmpaddr15";
>  case 0x0780: return "mtohost";
>  case 0x0781: return "mfromhost";
> --
> 2.34.1
>
>



Re: [PATCH v4] target/riscv: Clearing the CSR values at reset and syncing the MPSTATE with the host

2023-09-18 Thread Alistair Francis
On Wed, Sep 13, 2023 at 7:14 PM liguang.zhang <18622748...@163.com> wrote:
>
> From: "liguang.zhang" 
>
> Fix the guest reboot error when using KVM
> There are two issues when rebooting a guest using KVM
> 1. When the guest initiates a reboot the host is unable to stop the vcpu
> 2. When running a SMP guest the qemu monitor system_reset causes a vcpu crash
>
> This can be fixed by clearing the CSR values at reset and syncing the
> MPSTATE with the host.
>
> v4 update:
> rebase on riscv-to-apply
>
> Signed-off-by: liguang.zhang 

I have fixed up some of the issues and applied this

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/kvm.c   | 42 
>  target/riscv/kvm_riscv.h |  1 +
>  2 files changed, 43 insertions(+)
>
> diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> index c01cfb03f4..8ee410b9b1 100644
> --- a/target/riscv/kvm.c
> +++ b/target/riscv/kvm.c
> @@ -51,6 +51,8 @@ void riscv_kvm_aplic_request(void *opaque, int irq, int 
> level)
>  kvm_set_irq(kvm_state, irq, !!level);
>  }
>
> +static bool cap_has_mp_state;
> +
>  static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type,
>   uint64_t idx)
>  {
> @@ -797,6 +799,24 @@ int kvm_arch_get_registers(CPUState *cs)
>  return ret;
>  }
>
> +int kvm_riscv_sync_mpstate_to_kvm(RISCVCPU *cpu, int state)
> +{
> +if (cap_has_mp_state) {
> +struct kvm_mp_state mp_state = {
> +.mp_state = state
> +};
> +
> > +int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MP_STATE, &mp_state);
> +if (ret) {
> +fprintf(stderr, "%s: failed to sync MP_STATE %d/%s\n",
> +__func__, ret, strerror(-ret));
> +return -1;
> +}
> +}
> +
> +return 0;
> +}
> +
>  int kvm_arch_put_registers(CPUState *cs, int level)
>  {
>  int ret = 0;
> @@ -816,6 +836,18 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>  return ret;
>  }
>
> +if (KVM_PUT_RESET_STATE == level) {
> +RISCVCPU *cpu = RISCV_CPU(cs);
> +if (cs->cpu_index == 0) {
> +ret = kvm_riscv_sync_mpstate_to_kvm(cpu, KVM_MP_STATE_RUNNABLE);
> +} else {
> +ret = kvm_riscv_sync_mpstate_to_kvm(cpu, KVM_MP_STATE_STOPPED);
> +}
> +if (ret) {
> +return ret;
> +}
> +}
> +
>  return ret;
>  }
>
> @@ -928,6 +960,7 @@ int kvm_arch_get_default_type(MachineState *ms)
>
>  int kvm_arch_init(MachineState *ms, KVMState *s)
>  {
> +cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
>  return 0;
>  }
>
> @@ -1014,10 +1047,19 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
>  if (!kvm_enabled()) {
>  return;
>  }
> +for (int i=0; i<32; i++)
> +env->gpr[i] = 0;
>  env->pc = cpu->env.kernel_addr;
>  env->gpr[10] = kvm_arch_vcpu_id(CPU(cpu)); /* a0 */
>  env->gpr[11] = cpu->env.fdt_addr;  /* a1 */
>  env->satp = 0;
> +env->mie = 0;
> +env->stvec = 0;
> +env->sscratch = 0;
> +env->sepc = 0;
> +env->scause = 0;
> +env->stval = 0;
> +env->mip = 0;
>  }
>
>  void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
> diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h
> index de8c209ebc..8f8c1f969a 100644
> --- a/target/riscv/kvm_riscv.h
> +++ b/target/riscv/kvm_riscv.h
> @@ -27,5 +27,6 @@ void kvm_riscv_aia_create(MachineState *machine, uint64_t 
> group_shift,
>uint64_t aplic_base, uint64_t imsic_base,
>uint64_t guest_num);
>  void riscv_kvm_aplic_request(void *opaque, int irq, int level);
> +int kvm_riscv_sync_mpstate_to_kvm(RISCVCPU *cpu, int state);
>
>  #endif
> --
> 2.41.0
>



Re: [PATCH v4] target/riscv: Clearing the CSR values at reset and syncing the MPSTATE with the host

2023-09-18 Thread Alistair Francis
On Wed, Sep 13, 2023 at 7:14 PM liguang.zhang <18622748...@163.com> wrote:
>
> From: "liguang.zhang" 
>
> Fix the guest reboot error when using KVM
> There are two issues when rebooting a guest using KVM
> 1. When the guest initiates a reboot the host is unable to stop the vcpu
> 2. When running a SMP guest the qemu monitor system_reset causes a vcpu crash
>
> This can be fixed by clearing the CSR values at reset and syncing the
> MPSTATE with the host.
>
> v4 update:
> rebase on riscv-to-apply
>
> Signed-off-by: liguang.zhang 

This patch fails checkpatch. You should run checkpatch on all patches
before submitting them

git show | ./scripts/checkpatch.pl -

Alistair
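
For reference, a sketch of a checkpatch-clean form of the register-reset loop
quoted below (QEMU style wants spaces around operators and braces even on a
single-statement body); the struct here is a stand-in for illustration, not
the real CPURISCVState:

    #include <stdint.h>

    struct cpu_state {
        uint64_t gpr[32];
    };

    static void clear_gprs(struct cpu_state *env)
    {
        for (int i = 0; i < 32; i++) {
            env->gpr[i] = 0;
        }
    }

    int main(void)
    {
        struct cpu_state env = { .gpr = { 1, 2, 3 } };

        clear_gprs(&env);
        return env.gpr[0];      /* 0 */
    }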

> ---
>  target/riscv/kvm.c   | 42 
>  target/riscv/kvm_riscv.h |  1 +
>  2 files changed, 43 insertions(+)
>
> diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> index c01cfb03f4..8ee410b9b1 100644
> --- a/target/riscv/kvm.c
> +++ b/target/riscv/kvm.c
> @@ -51,6 +51,8 @@ void riscv_kvm_aplic_request(void *opaque, int irq, int 
> level)
>  kvm_set_irq(kvm_state, irq, !!level);
>  }
>
> +static bool cap_has_mp_state;
> +
>  static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type,
>   uint64_t idx)
>  {
> @@ -797,6 +799,24 @@ int kvm_arch_get_registers(CPUState *cs)
>  return ret;
>  }
>
> +int kvm_riscv_sync_mpstate_to_kvm(RISCVCPU *cpu, int state)
> +{
> +if (cap_has_mp_state) {
> +struct kvm_mp_state mp_state = {
> +.mp_state = state
> +};
> +
> > +int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MP_STATE, &mp_state);
> +if (ret) {
> +fprintf(stderr, "%s: failed to sync MP_STATE %d/%s\n",
> +__func__, ret, strerror(-ret));
> +return -1;
> +}
> +}
> +
> +return 0;
> +}
> +
>  int kvm_arch_put_registers(CPUState *cs, int level)
>  {
>  int ret = 0;
> @@ -816,6 +836,18 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>  return ret;
>  }
>
> +if (KVM_PUT_RESET_STATE == level) {
> +RISCVCPU *cpu = RISCV_CPU(cs);
> +if (cs->cpu_index == 0) {
> +ret = kvm_riscv_sync_mpstate_to_kvm(cpu, KVM_MP_STATE_RUNNABLE);
> +} else {
> +ret = kvm_riscv_sync_mpstate_to_kvm(cpu, KVM_MP_STATE_STOPPED);
> +}
> +if (ret) {
> +return ret;
> +}
> +}
> +
>  return ret;
>  }
>
> @@ -928,6 +960,7 @@ int kvm_arch_get_default_type(MachineState *ms)
>
>  int kvm_arch_init(MachineState *ms, KVMState *s)
>  {
> +cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
>  return 0;
>  }
>
> @@ -1014,10 +1047,19 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
>  if (!kvm_enabled()) {
>  return;
>  }
> +for (int i=0; i<32; i++)
> +env->gpr[i] = 0;
>  env->pc = cpu->env.kernel_addr;
>  env->gpr[10] = kvm_arch_vcpu_id(CPU(cpu)); /* a0 */
>  env->gpr[11] = cpu->env.fdt_addr;  /* a1 */
>  env->satp = 0;
> +env->mie = 0;
> +env->stvec = 0;
> +env->sscratch = 0;
> +env->sepc = 0;
> +env->scause = 0;
> +env->stval = 0;
> +env->mip = 0;
>  }
>
>  void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
> diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h
> index de8c209ebc..8f8c1f969a 100644
> --- a/target/riscv/kvm_riscv.h
> +++ b/target/riscv/kvm_riscv.h
> @@ -27,5 +27,6 @@ void kvm_riscv_aia_create(MachineState *machine, uint64_t 
> group_shift,
>uint64_t aplic_base, uint64_t imsic_base,
>uint64_t guest_num);
>  void riscv_kvm_aplic_request(void *opaque, int irq, int level);
> +int kvm_riscv_sync_mpstate_to_kvm(RISCVCPU *cpu, int state);
>
>  #endif
> --
> 2.41.0
>



Re: [PATCH v4] target/riscv: Clearing the CSR values at reset and syncing the MPSTATE with the host

2023-09-18 Thread Alistair Francis
On Wed, Sep 13, 2023 at 7:14 PM liguang.zhang <18622748...@163.com> wrote:
>
> From: "liguang.zhang" 
>
> Fix the guest reboot error when using KVM
> There are two issues when rebooting a guest using KVM
> 1. When the guest initiates a reboot the host is unable to stop the vcpu
> 2. When running a SMP guest the qemu monitor system_reset causes a vcpu crash
>
> This can be fixed by clearing the CSR values at reset and syncing the
> MPSTATE with the host.
>
> v4 update:
> rebase on riscv-to-apply

This should be below the line

>
> Signed-off-by: liguang.zhang 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/kvm.c   | 42 
>  target/riscv/kvm_riscv.h |  1 +
>  2 files changed, 43 insertions(+)
>
> diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> index c01cfb03f4..8ee410b9b1 100644
> --- a/target/riscv/kvm.c
> +++ b/target/riscv/kvm.c
> @@ -51,6 +51,8 @@ void riscv_kvm_aplic_request(void *opaque, int irq, int 
> level)
>  kvm_set_irq(kvm_state, irq, !!level);
>  }
>
> +static bool cap_has_mp_state;
> +
>  static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type,
>   uint64_t idx)
>  {
> @@ -797,6 +799,24 @@ int kvm_arch_get_registers(CPUState *cs)
>  return ret;
>  }
>
> +int kvm_riscv_sync_mpstate_to_kvm(RISCVCPU *cpu, int state)
> +{
> +if (cap_has_mp_state) {
> +struct kvm_mp_state mp_state = {
> +.mp_state = state
> +};
> +
> > +int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MP_STATE, &mp_state);
> +if (ret) {
> +fprintf(stderr, "%s: failed to sync MP_STATE %d/%s\n",
> +__func__, ret, strerror(-ret));
> +return -1;
> +}
> +}
> +
> +return 0;
> +}
> +
>  int kvm_arch_put_registers(CPUState *cs, int level)
>  {
>  int ret = 0;
> @@ -816,6 +836,18 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>  return ret;
>  }
>
> +if (KVM_PUT_RESET_STATE == level) {
> +RISCVCPU *cpu = RISCV_CPU(cs);
> +if (cs->cpu_index == 0) {
> +ret = kvm_riscv_sync_mpstate_to_kvm(cpu, KVM_MP_STATE_RUNNABLE);
> +} else {
> +ret = kvm_riscv_sync_mpstate_to_kvm(cpu, KVM_MP_STATE_STOPPED);
> +}
> +if (ret) {
> +return ret;
> +}
> +}
> +
>  return ret;
>  }
>
> @@ -928,6 +960,7 @@ int kvm_arch_get_default_type(MachineState *ms)
>
>  int kvm_arch_init(MachineState *ms, KVMState *s)
>  {
> +cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
>  return 0;
>  }
>
> @@ -1014,10 +1047,19 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
>  if (!kvm_enabled()) {
>  return;
>  }
> +for (int i=0; i<32; i++)
> +env->gpr[i] = 0;
>  env->pc = cpu->env.kernel_addr;
>  env->gpr[10] = kvm_arch_vcpu_id(CPU(cpu)); /* a0 */
>  env->gpr[11] = cpu->env.fdt_addr;  /* a1 */
>  env->satp = 0;
> +env->mie = 0;
> +env->stvec = 0;
> +env->sscratch = 0;
> +env->sepc = 0;
> +env->scause = 0;
> +env->stval = 0;
> +env->mip = 0;
>  }
>
>  void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
> diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h
> index de8c209ebc..8f8c1f969a 100644
> --- a/target/riscv/kvm_riscv.h
> +++ b/target/riscv/kvm_riscv.h
> @@ -27,5 +27,6 @@ void kvm_riscv_aia_create(MachineState *machine, uint64_t 
> group_shift,
>uint64_t aplic_base, uint64_t imsic_base,
>uint64_t guest_num);
>  void riscv_kvm_aplic_request(void *opaque, int irq, int level);
> +int kvm_riscv_sync_mpstate_to_kvm(RISCVCPU *cpu, int state);
>
>  #endif
> --
> 2.41.0
>



Re: [PATCH] hw/i386/pc: fix max_used_gpa for 32-bit systems

2023-09-18 Thread Ani Sinha
On Mon, Sep 18, 2023 at 9:28 PM David Hildenbrand  wrote:
>
> On 18.09.23 17:56, Ani Sinha wrote:
> > On Mon, Sep 18, 2023 at 8:59 PM David Hildenbrand  wrote:
> >>
> >> On 18.09.23 17:22, Ani Sinha wrote:
> >>> On Mon, Sep 18, 2023 at 7:25 PM Ani Sinha  wrote:
> 
>  32-bit systems do not have a reserved memory for hole64 but they may 
>  have a
>  reserved memory space for memory hotplug. Since, hole64 starts after the
>  reserved hotplug memory, the unaligned hole64 start address gives us the
>  end address for this memory hotplug region that the processor may use.
>  Fix this. This ensures that the physical address space bound checking 
>  works
>  correctly for 32-bit systems as well.
> >>>
> >>> This patch breaks some unit tests. I am not sure why it did not catch
> >>> it when I tested it before sending.
> >>> Will have to resend after fixing the tests.
> >>
> >> Probably because they supply more memory than the system can actually
> >> handle? (e.g., -m 4g on 32bit)?
> >
> > cxl tests are failing for example.
> >
> > $ ./qemu-system-i386 -display none -machine q35,cxl=on
> > qemu-system-i386: Address space limit 0x < 0x1000f
> > phys-bits too low (32)

also another thing is:

./qemu-system-i386 -machine pc -m 128
works but ...

$ ./qemu-system-i386 -machine pc -m 128,slots=3,maxmem=1G
qemu-system-i386: Address space limit 0x < 0x1f7ff
phys-bits too low (32)

or

$ ./qemu-system-i386 -machine pc-i440fx-8.2 -accel kvm -m 128,slots=3,maxmem=1G
qemu-system-i386: Address space limit 0x < 0x1f7ff
phys-bits too low (32)

but of course after the compat knob older pc machines work fine using
the old logic :

$ ./qemu-system-i386 -machine pc-i440fx-8.1 -accel kvm -m 128,slots=3,maxmem=1G
VNC server running on ::1:5900
^Cqemu-system-i386: terminating on signal 2




>
> CXL with 32bit CPUs ... it might be reasonably to just disable such
> tests. Certainly does not exist in real HW ... :)
>
> --
> Cheers,
>
> David / dhildenb
>




Re: Call for agenda for 2023-09-19 QEMU developers call

2023-09-18 Thread Juan Quintela
Elena Ufimtseva  wrote:
> Hello Juan,
>
> Not sure if this is worth its own topic, but would it be possible to hear
> the community's thoughts on the live migration series review/pull
> progress (atomics, zero page multifd etc.. )? Seems like there are few
> outstanding relevant patches.

Hi

If everybody agrees, can we move this topic to next call?
I am on vacation this week and the next.

I was planning time to "moderate" the call, but preparing for a call
about my topics is going to mean a divorce O:-)

Later, Juan.

PD.  I have had too many problems in the recent past with several things,
 from my test machines disappearing (and configuring new ones taking
 forever), to a very bad time with the BOTS.  I expect/hope that
 things are going to get better in the near future.



Re: Call for agenda for 2023-09-19 QEMU developers call

2023-09-18 Thread Juan Quintela
Mark Burton  wrote:
> Seems like we’ve had a bit of a ’slower’ time in recent weeks -
> presumably “summer time” - If I understand correctly, Linaro are not
> going toe preset this week?
> Maybe we should re-group in the next meeting, 
>
> So I’m happy to have the meeting tomorrow if Linaro can make it, otherwise 
> for 3rd Oct, I have:
>   I’d like to know where we are with the “Single Binary” work.
>   I think there are still outstanding questions on the whole
> “startup” subject (related, but not the same)
>   Alex and I have talked about the plugin-API to cover ‘icount’
> use cases, and that seems to be nudging open a pandora’s box, so I
> think we should discuss that?

As nobody else answered this, I guess we are not having this in the call
for this week.

Move it to next call.

Thanks, Juan.




[PATCH v3 qemu 3/3] dump: Add command interface for kdump-raw formats

2023-09-18 Thread Stephen Brennan
The QMP dump API represents the dump format as an enumeration. Add three
new enumerators, one for each supported kdump compression, each named
"kdump-raw-*".

For the HMP command line, rather than adding a new flag corresponding to
each format, it seems more human-friendly to add a single flag "-R" to
switch the kdump formats to "raw" mode. The choice of "-R" also
correlates nicely to the "makedumpfile -R" option, which would serve to
reassemble a flattened vmcore.

Signed-off-by: Stephen Brennan 
---
 dump/dump-hmp-cmds.c | 21 +
 dump/dump.c  | 31 ++-
 hmp-commands.hx  |  9 +++--
 qapi/dump.json   | 24 
 4 files changed, 74 insertions(+), 11 deletions(-)

diff --git a/dump/dump-hmp-cmds.c b/dump/dump-hmp-cmds.c
index b038785fee..b428ec33df 100644
--- a/dump/dump-hmp-cmds.c
+++ b/dump/dump-hmp-cmds.c
@@ -19,6 +19,7 @@ void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict)
 bool paging = qdict_get_try_bool(qdict, "paging", false);
 bool zlib = qdict_get_try_bool(qdict, "zlib", false);
 bool lzo = qdict_get_try_bool(qdict, "lzo", false);
+bool raw = qdict_get_try_bool(qdict, "raw", false);
 bool snappy = qdict_get_try_bool(qdict, "snappy", false);
 const char *file = qdict_get_str(qdict, "filename");
 bool has_begin = qdict_haskey(qdict, "begin");
@@ -40,16 +41,28 @@ void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict)
 dump_format = DUMP_GUEST_MEMORY_FORMAT_WIN_DMP;
 }
 
-if (zlib) {
-dump_format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_ZLIB;
+if (zlib && raw) {
+if (raw) {
+dump_format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_RAW_ZLIB;
+} else {
+dump_format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_ZLIB;
+}
 }
 
 if (lzo) {
-dump_format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_LZO;
+if (raw) {
+dump_format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_RAW_LZO;
+} else {
+dump_format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_LZO;
+}
 }
 
 if (snappy) {
-dump_format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_SNAPPY;
+if (raw) {
+dump_format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_RAW_SNAPPY;
+} else {
+dump_format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_SNAPPY;
+}
 }
 
 if (has_begin) {
diff --git a/dump/dump.c b/dump/dump.c
index 10aa2c79e0..55cb6af20b 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -2090,6 +2090,7 @@ void qmp_dump_guest_memory(bool paging, const char *file,
 int fd = -1;
 DumpState *s;
 bool detach_p = false;
+bool kdump_raw = false;
 
 if (runstate_check(RUN_STATE_INMIGRATE)) {
 error_setg(errp, "Dump not allowed during incoming migration.");
@@ -2103,6 +2104,27 @@ void qmp_dump_guest_memory(bool paging, const char *file,
 return;
 }
 
+/* externally, we represent kdump-raw-* as separate formats, but internally
+ * they are handled the same, except for the "raw" flag */
+if (has_format) {
+switch (format) {
+case DUMP_GUEST_MEMORY_FORMAT_KDUMP_RAW_ZLIB:
+format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_ZLIB;
+kdump_raw = true;
+break;
+case DUMP_GUEST_MEMORY_FORMAT_KDUMP_RAW_LZO:
+format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_LZO;
+kdump_raw = true;
+break;
+case DUMP_GUEST_MEMORY_FORMAT_KDUMP_RAW_SNAPPY:
+format = DUMP_GUEST_MEMORY_FORMAT_KDUMP_SNAPPY;
+kdump_raw = true;
+break;
+default:
+break;
+}
+}
+
 /*
  * kdump-compressed format need the whole memory dumped, so paging or
  * filter is not supported here.
@@ -2166,6 +2188,10 @@ void qmp_dump_guest_memory(bool paging, const char *file,
 error_setg(errp, QERR_INVALID_PARAMETER, "protocol");
 return;
 }
+if (kdump_raw && lseek(fd, 0, SEEK_CUR) == (loff_t) -1) {
+error_setg(errp, "kdump-raw formats require a seekable file");
+return;
+}
 
 if (!dump_migration_blocker) {
 error_setg(&dump_migration_blocker,
@@ -2186,7 +2212,7 @@ void qmp_dump_guest_memory(bool paging, const char *file,
 dump_state_prepare(s);
 
 dump_init(s, fd, has_format, format, paging, has_begin,
-  begin, length, false, errp);
+  begin, length, kdump_raw, errp);
 if (*errp) {
 qatomic_set(&s->status, DUMP_STATUS_FAILED);
 return;
@@ -2214,15 +2240,18 @@ DumpGuestMemoryCapability 
*qmp_query_dump_guest_memory_capability(Error **errp)
 
 /* kdump-zlib is always available */
 QAPI_LIST_APPEND(tail, DUMP_GUEST_MEMORY_FORMAT_KDUMP_ZLIB);
+QAPI_LIST_APPEND(tail, DUMP_GUEST_MEMORY_FORMAT_KDUMP_RAW_ZLIB);
 
 /* add new item if kdump-lzo is available */
 #ifdef CONFIG_LZO
 QAPI_LIST_APPEND(tail, 

[PATCH v3 qemu 1/3] dump: Pass DumpState to write_ functions

2023-09-18 Thread Stephen Brennan
For the next patch, we need a reference to DumpState when writing data.

Signed-off-by: Stephen Brennan 
---
 dump/dump.c   | 40 
 include/sysemu/dump.h |  2 +-
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index d4ef713cd0..74071a1565 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -809,7 +809,7 @@ static void create_vmcore(DumpState *s, Error **errp)
 dump_end(s, errp);
 }
 
-static int write_start_flat_header(int fd)
+static int write_start_flat_header(DumpState *s)
 {
 MakedumpfileHeader *mh;
 int ret = 0;
@@ -824,7 +824,7 @@ static int write_start_flat_header(int fd)
 mh->version = cpu_to_be64(VERSION_FLAT_HEADER);
 
 size_t written_size;
-written_size = qemu_write_full(fd, mh, MAX_SIZE_MDF_HEADER);
+written_size = qemu_write_full(s->fd, mh, MAX_SIZE_MDF_HEADER);
 if (written_size != MAX_SIZE_MDF_HEADER) {
 ret = -1;
 }
@@ -833,7 +833,7 @@ static int write_start_flat_header(int fd)
 return ret;
 }
 
-static int write_end_flat_header(int fd)
+static int write_end_flat_header(DumpState *s)
 {
 MakedumpfileDataHeader mdh;
 
@@ -841,7 +841,7 @@ static int write_end_flat_header(int fd)
 mdh.buf_size = END_FLAG_FLAT_HEADER;
 
 size_t written_size;
-written_size = qemu_write_full(fd, &mdh, sizeof(mdh));
+written_size = qemu_write_full(s->fd, , sizeof(mdh));
 if (written_size != sizeof(mdh)) {
 return -1;
 }
@@ -849,7 +849,7 @@ static int write_end_flat_header(int fd)
 return 0;
 }
 
-static int write_buffer(int fd, off_t offset, const void *buf, size_t size)
+static int write_buffer(DumpState *s, off_t offset, const void *buf, size_t 
size)
 {
 size_t written_size;
 MakedumpfileDataHeader mdh;
@@ -857,12 +857,12 @@ static int write_buffer(int fd, off_t offset, const void 
*buf, size_t size)
 mdh.offset = cpu_to_be64(offset);
 mdh.buf_size = cpu_to_be64(size);
 
-written_size = qemu_write_full(fd, &mdh, sizeof(mdh));
+written_size = qemu_write_full(s->fd, &mdh, sizeof(mdh));
 if (written_size != sizeof(mdh)) {
 return -1;
 }
 
-written_size = qemu_write_full(fd, buf, size);
+written_size = qemu_write_full(s->fd, buf, size);
 if (written_size != size) {
 return -1;
 }
@@ -982,7 +982,7 @@ static void create_header32(DumpState *s, Error **errp)
 #endif
 dh->status = cpu_to_dump32(s, status);
 
-if (write_buffer(s->fd, 0, dh, size) < 0) {
+if (write_buffer(s, 0, dh, size) < 0) {
 error_setg(errp, "dump: failed to write disk dump header");
 goto out;
 }
@@ -1012,7 +1012,7 @@ static void create_header32(DumpState *s, Error **errp)
 kh->offset_note = cpu_to_dump64(s, offset_note);
 kh->note_size = cpu_to_dump32(s, s->note_size);
 
-if (write_buffer(s->fd, DISKDUMP_HEADER_BLOCKS *
+if (write_buffer(s, DISKDUMP_HEADER_BLOCKS *
  block_size, kh, size) < 0) {
 error_setg(errp, "dump: failed to write kdump sub header");
 goto out;
@@ -1027,7 +1027,7 @@ static void create_header32(DumpState *s, Error **errp)
 if (*errp) {
 goto out;
 }
-if (write_buffer(s->fd, offset_note, s->note_buf,
+if (write_buffer(s, offset_note, s->note_buf,
  s->note_size) < 0) {
 error_setg(errp, "dump: failed to write notes");
 goto out;
@@ -1093,7 +1093,7 @@ static void create_header64(DumpState *s, Error **errp)
 #endif
 dh->status = cpu_to_dump32(s, status);
 
-if (write_buffer(s->fd, 0, dh, size) < 0) {
+if (write_buffer(s, 0, dh, size) < 0) {
 error_setg(errp, "dump: failed to write disk dump header");
 goto out;
 }
@@ -1123,7 +1123,7 @@ static void create_header64(DumpState *s, Error **errp)
 kh->offset_note = cpu_to_dump64(s, offset_note);
 kh->note_size = cpu_to_dump64(s, s->note_size);
 
-if (write_buffer(s->fd, DISKDUMP_HEADER_BLOCKS *
+if (write_buffer(s, DISKDUMP_HEADER_BLOCKS *
  block_size, kh, size) < 0) {
 error_setg(errp, "dump: failed to write kdump sub header");
 goto out;
@@ -1139,7 +1139,7 @@ static void create_header64(DumpState *s, Error **errp)
 goto out;
 }
 
-if (write_buffer(s->fd, offset_note, s->note_buf,
+if (write_buffer(s, offset_note, s->note_buf,
  s->note_size) < 0) {
 error_setg(errp, "dump: failed to write notes");
 goto out;
@@ -1204,7 +1204,7 @@ static int set_dump_bitmap(uint64_t last_pfn, uint64_t 
pfn, bool value,
 while (old_offset < new_offset) {
 /* calculate the offset and write dump_bitmap */
 offset_bitmap1 = s->offset_dump_bitmap + old_offset;
-if (write_buffer(s->fd, offset_bitmap1, buf,
+if (write_buffer(s, offset_bitmap1, buf,
  bitmap_bufsize) < 0) {
 return -1;
 }
@@ -1212,7 +1212,7 @@ static int 

[PATCH v3 qemu 0/3] Allow dump-guest-memory to output standard kdump format

2023-09-18 Thread Stephen Brennan
Hello all,

This is the third version of the kdump patch series, the first and
second revisions being visible at [1] and [2] respectively. You can see
the history and motivation for the patch series described in the cover
letter of [2].

Thank you for your continued feedback and review.
Stephen

Changes from v2 to v3:
- Rather than use "reassembled" flag in QMP API, represent each kdump
  format "kdump-X" with a new enumerator "kdump-raw-X". (The HMP
  interface retains the -R option)
- Return an error if the file descriptor passed in is not seekable, yet
  the requested dump format is kdump-raw-*

Changes from v1 to v2:
- Keep the default as the flattened format
- Add QMP / HMP arguments for "reassembled"

[1]: 
https://lore.kernel.org/qemu-devel/20230717163855.7383-1-stephen.s.bren...@oracle.com/
[2]: 
https://lore.kernel.org/qemu-devel/20230914010315.945705-1-stephen.s.bren...@oracle.com/

Stephen Brennan (3):
  dump: Pass DumpState to write_ functions
  dump: Allow directly outputting raw kdump format
  dump: Add command interface for kdump-raw formats

 dump/dump-hmp-cmds.c  | 21 +++--
 dump/dump.c   | 99 +++
 hmp-commands.hx   |  9 +++-
 include/sysemu/dump.h |  3 +-
 qapi/dump.json| 24 +--
 5 files changed, 119 insertions(+), 37 deletions(-)

-- 
2.39.3




[PATCH v3 qemu 2/3] dump: Allow directly outputting raw kdump format

2023-09-18 Thread Stephen Brennan
The flattened format (currently output by QEMU) is used by makedumpfile
only when it is outputting a vmcore to a file which is not seekable. The
flattened format functions essentially as a set of instructions of the
form "seek to the given offset, then write the given bytes out".

The flattened format can be reconstructed using makedumpfile -R, or
makedumpfile-R.pl, but it is a slow process because it requires copying
the entire vmcore. The flattened format can also be directly read by
crash, but still, it requires a lengthy reassembly phase.

To sum up, the flattened format is not an ideal one: it should only be
used on files which are actually not seekable. This is the exact
strategy which makedumpfile uses, as seen in the implementation of
"write_buffer()" in makedumpfile [1]. However, QEMU has always used the
flattened format. For compatibility it is best not to change the default
output format without warning. So, add a flag to DumpState which changes
the output to use the normal (i.e. raw) format. This flag will be added
to the QMP and HMP commands in the next change.

[1]: 
https://github.com/makedumpfile/makedumpfile/blob/f23bb943568188a2746dbf9b6692668f5a2ac3b6/makedumpfile.c#L5008-L5040
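
For readers new to the format, here is a compact sketch of the reassembly
step the text describes (roughly what "makedumpfile -R" does), assuming the
MakedumpfileDataHeader layout used in dump.c: two big-endian 64-bit fields,
offset and size, followed by the data, terminated by a header whose fields
are END_FLAG_FLAT_HEADER. The leading makedumpfile signature block is ignored
here for brevity, and be64toh() is glibc's <endian.h> helper:

    #include <endian.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct flat_data_header {
        uint64_t offset;    /* big-endian: where to seek in the output */
        uint64_t buf_size;  /* big-endian: how many bytes follow */
    };

    static int reassemble(FILE *flat, FILE *raw)
    {
        struct flat_data_header h;

        while (fread(&h, sizeof(h), 1, flat) == 1) {
            uint64_t off = be64toh(h.offset);
            uint64_t size = be64toh(h.buf_size);

            if (off == (uint64_t)-1) {  /* END_FLAG_FLAT_HEADER terminator */
                return 0;
            }
            char *buf = malloc(size);
            if (!buf ||
                fread(buf, 1, size, flat) != size ||
                fseeko(raw, off, SEEK_SET) != 0 ||
                fwrite(buf, 1, size, raw) != size) {
                free(buf);
                return -1;
            }
            free(buf);
        }
        return -1;                      /* truncated stream */
    }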

Signed-off-by: Stephen Brennan 
---
 dump/dump.c   | 32 +---
 include/sysemu/dump.h |  1 +
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/dump/dump.c b/dump/dump.c
index 74071a1565..10aa2c79e0 100644
--- a/dump/dump.c
+++ b/dump/dump.c
@@ -814,6 +814,10 @@ static int write_start_flat_header(DumpState *s)
 MakedumpfileHeader *mh;
 int ret = 0;
 
+if (s->kdump_raw) {
+return 0;
+}
+
 QEMU_BUILD_BUG_ON(sizeof *mh > MAX_SIZE_MDF_HEADER);
 mh = g_malloc0(MAX_SIZE_MDF_HEADER);
 
@@ -837,6 +841,10 @@ static int write_end_flat_header(DumpState *s)
 {
 MakedumpfileDataHeader mdh;
 
+if (s->kdump_raw) {
+return 0;
+}
+
 mdh.offset = END_FLAG_FLAT_HEADER;
 mdh.buf_size = END_FLAG_FLAT_HEADER;
 
@@ -853,13 +861,21 @@ static int write_buffer(DumpState *s, off_t offset, const 
void *buf, size_t size
 {
 size_t written_size;
 MakedumpfileDataHeader mdh;
+loff_t seek_loc;
 
-mdh.offset = cpu_to_be64(offset);
-mdh.buf_size = cpu_to_be64(size);
+if (s->kdump_raw) {
+seek_loc = lseek(s->fd, offset, SEEK_SET);
+if (seek_loc == (off_t) -1) {
+return -1;
+}
+} else {
+mdh.offset = cpu_to_be64(offset);
+mdh.buf_size = cpu_to_be64(size);
 
-written_size = qemu_write_full(s->fd, &mdh, sizeof(mdh));
-if (written_size != sizeof(mdh)) {
-return -1;
+written_size = qemu_write_full(s->fd, &mdh, sizeof(mdh));
+if (written_size != sizeof(mdh)) {
+return -1;
+}
 }
 
 written_size = qemu_write_full(s->fd, buf, size);
@@ -1775,7 +1791,8 @@ static void vmcoreinfo_update_phys_base(DumpState *s)
 
 static void dump_init(DumpState *s, int fd, bool has_format,
   DumpGuestMemoryFormat format, bool paging, bool 
has_filter,
-  int64_t begin, int64_t length, Error **errp)
+  int64_t begin, int64_t length, bool kdump_raw,
+  Error **errp)
 {
 ERRP_GUARD();
 VMCoreInfoState *vmci = vmcoreinfo_find();
@@ -1786,6 +1803,7 @@ static void dump_init(DumpState *s, int fd, bool 
has_format,
 s->has_format = has_format;
 s->format = format;
 s->written_size = 0;
+s->kdump_raw = kdump_raw;
 
 /* kdump-compressed is conflict with paging and filter */
 if (has_format && format != DUMP_GUEST_MEMORY_FORMAT_ELF) {
@@ -2168,7 +2186,7 @@ void qmp_dump_guest_memory(bool paging, const char *file,
 dump_state_prepare(s);
 
 dump_init(s, fd, has_format, format, paging, has_begin,
-  begin, length, errp);
+  begin, length, false, errp);
 if (*errp) {
qatomic_set(&s->status, DUMP_STATUS_FAILED);
 return;
diff --git a/include/sysemu/dump.h b/include/sysemu/dump.h
index e27af8fb34..d702854853 100644
--- a/include/sysemu/dump.h
+++ b/include/sysemu/dump.h
@@ -157,6 +157,7 @@ typedef struct DumpState {
 MemoryMappingList list;
 bool resume;
 bool detached;
+bool kdump_raw;
 hwaddr memory_offset;
 int fd;
 
-- 
2.39.3




Re: [PATCH v4 2/3] i386: Explicitly ignore unsupported BUS_MCEERR_AO MCE on AMD guest

2023-09-18 Thread William Roche

Hi John,

I'd like to put the emphasis on the fact that ignoring the SRAO error
for a VM is a real problem at least for a specific (rare) case I'm
currently working on: The VM migration.

Context:

- In the case of a poisoned page in the VM address space, the migration
can't read it and will skip this page, considering it as a zero-filled
page. The VM kernel (that handled the vMCE) would have marked its
associated page as poisoned, and if the VM touches the page, the VM
kernel generates the associated MCE because it already knows about the
poisoned page.

- When we ignore the vMCE in the case of a SIGBUS/BUS_MCEERR_AO error
(what this patch does), we entirely rely on the Hypervisor to send an
SRAR error to qemu when the page is touched: The AMD VM kernel will
receive the SIGBUS/BUS_MCEERR_AR and deal with it, thanks to your
changes here.

So it looks like the mechanism works fine... unless the VM has migrated
between the SRAO error and the first time it really touches the poisoned
page to get an SRAR error !  In this case, its new address space
(created on the migration destination) will have a zero-page where we
had a poisoned page, and the AMD VM Kernel (that never dealt with the
SRAO) doesn't know about the poisoned page and will access the page
finding only zeros...  We have a memory corruption !

It is a very rare window, but in order to fix it the most reasonable
course of action would be to make the AMD emulation deal with SRAO
errors, instead of ignoring them.

Do you agree with my analysis ?
Would an AMD platform generate SRAO signal to a process
(SIGBUS/BUS_MCEERR_AO) in case of a real hardware error ?

Thanks,
William.



Re: [PATCH 2/3] hw/mem/cxl_type3: Add missing copyright and license notice

2023-09-18 Thread Warner Losh
On Mon, Sep 18, 2023 at 4:04 PM Jonathan Cameron via 
wrote:

> This has been missing from the start. Assume it should match
> with cxl/cxl-component-utils.c as both were part of early
> postings from Ben.
>
> Suggested-by: Philippe Mathieu-Daudé 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/mem/cxl_type3.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index c5855d4e7d..ad3f0f6a9d 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -1,3 +1,12 @@
> +/*
> + * CXL Type 3 (memory expander) device
> + *
> + * Copyright(C) 2020 Intel Corporation.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> the
> + * COPYING file in the top-level directory.
> + */
>

SPDX-License-Identifier: GPL-v2-only

while you're at it (plus a +1 on the other concerns in the thread, though
I'll let
that play out elsewhere).

Warner


> +
>  #include "qemu/osdep.h"
>  #include "qemu/units.h"
>  #include "qemu/error-report.h"
> --
> 2.39.2
>
>
>


Re: [PATCH 2/3] hw/mem/cxl_type3: Add missing copyright and license notice

2023-09-18 Thread Ira Weiny
Jonathan Cameron wrote:
> On Mon, 18 Sep 2023 17:31:38 +0100
> Peter Maydell  wrote:
> 
> > On Mon, 18 Sept 2023 at 16:04, Jonathan Cameron
> >  wrote:
> > >
> > > This has been missing from the start. Assume it should match
> > > with cxl/cxl-component-utils.c as both were part of early
> > > postings from Ben.  
> > 
> > Sounds plausible -- is there an Intel person who could give us
> > an acked-by for this?

While we are at it; what about .../hw/mem/cxl_type3_stubs.c?

> > 
> > (Ideally we wouldn't have let more gpl-2-only code into the
> > codebase without a rationale...)

I'm curious about this statement.  Does the qemu project not want gpl v2
only code?  I agree with Jonathan that this is the intention of Ben's
initial submission; so from that PoV.

Acked-by: Ira Weiny 

Going forward I'd like to better understand the qemu community's view.

Thanks,
Ira

> > 
> 
> I've +CC'd the kernel CXL maintainers from Intel a few of whom
> have also contributed some of the QEMU CXL code.
> Hopefully someone can ack.
> 
> > > Suggested-by: Philippe Mathieu-Daudé 
> > > Signed-off-by: Jonathan Cameron 

> > > ---
> > >  hw/mem/cxl_type3.c | 9 +
> > >  1 file changed, 9 insertions(+)
> > >
> > > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > > index c5855d4e7d..ad3f0f6a9d 100644
> > > --- a/hw/mem/cxl_type3.c
> > > +++ b/hw/mem/cxl_type3.c
> > > @@ -1,3 +1,12 @@
> > > +/*
> > > + * CXL Type 3 (memory expander) device
> > > + *
> > > + * Copyright(C) 2020 Intel Corporation.
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2. See 
> > > the
> > > + * COPYING file in the top-level directory.
> > > + */
> > > +
> > >  #include "qemu/osdep.h"
> > >  #include "qemu/units.h"
> > >  #include "qemu/error-report.h"  
> > 
> > -- PMM
> > 
> 





Re: Call for agenda for 2023-09-19 QEMU developers call

2023-09-18 Thread Elena Ufimtseva
Hello Juan,

Not sure if this is worth its own topic, but would it be possible to hear the 
community's thoughts on the live migration series review/pull progress (atomics, 
zero page multifd, etc.)? It seems like there are a few outstanding relevant 
patches.


Thank you!

From: Juan Quintela 
Sent: Friday, September 15, 2023 1:45 AM
To: f4...@amsat.org ; Joao Martins 
; Juan Quintela ; 
md...@redhat.com ; fel...@nutanix.com ; 
afaer...@suse.de ; bazu...@redhat.com ; 
bbau...@redhat.com ; c...@f00f.org ; 
dustin.kirkl...@canonical.com ; 
ebl...@redhat.com ; edgar.igles...@gmail.com 
; eric.au...@redhat.com ; 
i...@theiggy.com ; jan.kis...@web.de ; 
jidong.x...@gmail.com ; jjhe...@linux.vnet.ibm.com 
; m...@linux.vnet.ibm.com 
; peter.mayd...@linaro.org ; 
richard.hender...@linaro.org ; stefa...@gmail.com 
; i...@bsdimp.com ; z@139.com 
; zwu.ker...@gmail.com ; j...@nvidia.com 
; c...@nvidia.com ; David Edmondson 
; Elena Ufimtseva ; 
Konrad Wilk ; Alessandro Di Federico ; 
a...@rev.ng ; shameerali.kolothum.th...@huawei.com 
; wei.w.w...@intel.com 
; chao.p.p...@linux.intel.com 
; qemu-devel@nongnu.org ; 
Mark Burton 
Subject: Call for agenda for 2023-09-19 QEMU developers call

Hi

If you have any topics, please add to this email.

Thanks, Juan.


Re: [PULL 00/17] Net patches

2023-09-18 Thread Ilya Maximets
On 9/14/23 10:13, Daniel P. Berrangé wrote:
> On Wed, Sep 13, 2023 at 08:46:42PM +0200, Ilya Maximets wrote:
>> On 9/8/23 16:15, Daniel P. Berrangé wrote:
>>> On Fri, Sep 08, 2023 at 04:06:35PM +0200, Ilya Maximets wrote:
 On 9/8/23 14:15, Daniel P. Berrangé wrote:
> On Fri, Sep 08, 2023 at 02:00:47PM +0200, Ilya Maximets wrote:
>> On 9/8/23 13:49, Daniel P. Berrangé wrote:
>>> On Fri, Sep 08, 2023 at 01:34:54PM +0200, Ilya Maximets wrote:
 On 9/8/23 13:19, Stefan Hajnoczi wrote:
> Hi Ilya and Jason,
> There is a CI failure related to a missing Debian libxdp-dev package:
> https://gitlab.com/qemu-project/qemu/-/jobs/5046139967
>
> I think the issue is that the debian-amd64 container image that QEMU
> uses for testing is based on Debian 11 ("bullseye" aka "oldstable")
> and libxdp is not available on that release:
> https://packages.debian.org/search?keywords=libxdp&searchon=names&suite=oldstable&section=all

 Hmm.  Sorry about that.

>
> If we need to support Debian 11 CI then either XDP could be disabled
> for that distro or libxdp could be compiled from source.

 I'd suggest we just remove the attempt to install the package for now,
 building libxdp from sources may be a little painful to maintain.

 Can be re-added later once distributions with libxdp 1.4+ will be more
 widely available, i.e. when fedora dockerfile will be updated to 39,
 for example.  That should be soon-ish, right?
>>>
>>> If you follow the process in docs/devel/testing.rst for adding
>>> libxdp in libvirt-ci, then lcitool will "do the right thing"
>>> when we move the auto-generated dockerfiles to new distro versions.
>>
>> Thanks!  I'll prepare changes for libvirt-ci.
>>
>> In the meantime, none of the currently tested images will have a required
>> version of libxdp anyway, so I'm suggesting to just drop this one 
>> dockerfile
>> modification from the patch.  What do you think?
>
> Sure, if none of the distros have it, then lcitool won't emit the
> dockerfile changes until we update the inherited distro version.
> So it is sufficient to just update libvirt-ci.git with the mappings.yml
> info for libxdp, and add 'libxdp' to the tests/lcitool/projects/qemu.yml
> file in qemu.git. It will then 'just work' when someone updates the
> distro versions later.

 I posted an MR for libvirt-ci adding libxdp:
   https://gitlab.com/libvirt/libvirt-ci/-/merge_requests/429

 Please, take a look.

 The docs say that CI will try to build containers with the MR changes,
 but I don't think anything except sanity checks is actually tested on MR.
 Sorry if I missed something, never used GitLab pipelines before.
>>>
>>> No, that's our fault - we've broken the CI and your change alerted
>>> me to that fact :-)
>>>
 Note that with this update we will be installing older version of libxdp
 in many containers, even though they will not be used by QEMU, unless
 they are newer than 1.4.0.
>>>
>>> No problem, as it means QEMU CI will demonstrate that the meson.build
>>> change is ignoring the outdated libxdp.
>>>
 tests/lcitool/projects/qemu.yml in qemu.git cannot be updated without
 updating a submodule after the MR merge.
>>>
>>> Yep.
>>
>> Since all the required changes went into libvirt-ci project, I posted an
>> updated patch set named:
>>
>>   '[PATCH v4 0/2] net: add initial support for AF_XDP network backend'
>>
>> Please, take a look.
>>
>> This should fix the CI issues, though I'm not sure how to run QEMU gitlab
>> pipelines myself, so I didn't actually test all the images.
> 
>   git push gitlab  -o ci.variable=QEMU_CI=2
> 
> will create pipeline and immediately run all jobs.

Thanks!  That worked.  Though I wasn't able to test much anyway as
this thing burned through all my free compute credits less than
half way through the pipeline. :D

So, AFAIU, it's not something an occasional contributor like me can
use, unless they are spending their own money.


> 
> Replace 'gitlab' with the name of the git remote pointing to your
> gitlab fork of QEMU.
> 
> Using QEMU_CI=1 will create pipeline, but let you manually start
> individual jobs from the web UI.
> 
> For further details see docs/devel/ci-jobs.rst.inc
> 
> 
>>
>> Sent as a patch set because the libvirt-ci submodule bump brings in a few
>> unrelated changes.  So, I split that into a separate patch.
> 
> Yep, that's perfect thanks.
> 
> With regards,
> Daniel




Re: [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until
> they find one that works.  If none works, they return the first Error
> set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one.
>
> qemu_rdma_broken_ipv6_kernel() neglects to set an Error when
> ibv_open_device() fails.  If a later address fails differently, we use
> that Error instead, or else the generic one.  Harmless enough, but
> needs fixing all the same.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 




Re: [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> QIOChannelClass methods qio_channel_rdma_readv() and
> qio_channel_rdma_writev() violate their method contract when
> rdma->error_state is non-zero:
>
> 1. They return whatever is in rdma->error_state then.  Only -1 will be
>fine.  -2 will be misinterpreted as "would block".  Anything less
>than -2 isn't defined in the contract.  A positive value would be
>misinterpreted as success, but I believe that's not actually
>possible.
>
> 2. They neglect to set an error then.  If something up the call stack
>dereferences the error when failure is returned, it will crash.  If
>it ignores the return value and checks the error instead, it will
>miss the error.
>
> Crap like this happens when return statements hide in macros,
> especially when their uses are far away from the definition.
>
> I elected not to investigate how callers are impacted.
>
> Expand the two bad macro uses, so we can set an error and return -1.
> The next commit will then get rid of the macro altogether.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> Hiding return statements in macros is a bad idea.  Use a function
> instead, and open code the return part.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PULL 00/28] Block layer patches

2023-09-18 Thread Stefan Hajnoczi
Hi Kevin,
I believe that my own commit "block-coroutine-wrapper: use
qemu_get_current_aio_context()" breaks this test. The failure is
non-deterministic (happens about 1 out of 4 runs).

It seems the job hangs and the test times out in vm.run_job('job1', wait=5.0).

I haven't debugged it yet but wanted to share this information to save
some time. Tomorrow I'll investigate further.

Stefan



Re: [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> Several error messages include numeric error codes returned by failed
> functions:
>
> * ibv_poll_cq() returns an unspecified negative value.  Useless.
>
> * rdma_accept and rmda_get_cm_event() return -1.  Useless.
>
> * qemu_rdma_poll() returns either -1 or an unspecified negative
>   value.  Useless.
>
> * qemu_rdma_block_for_wrid(), qemu_rdma_write_flush(),
>   qemu_rdma_exchange_send(), qemu_rdma_exchange_recv(),
>   qemu_rdma_write() return a negative value that may or may not be an
>   errno value.  While reporting human-readable errno
>   information (which a number is not) can be useful, reporting an
>   error code that may or may not be an errno value is useless.
>
> Drop these error codes from the error messages.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 2/3] hw/mem/cxl_type3: Add missing copyright and license notice

2023-09-18 Thread Dave Jiang



On 9/18/23 10:00, Jonathan Cameron wrote:
> On Mon, 18 Sep 2023 17:31:38 +0100
> Peter Maydell  wrote:
> 
>> On Mon, 18 Sept 2023 at 16:04, Jonathan Cameron
>>  wrote:
>>>
>>> This has been missing from the start. Assume it should match
>>> with cxl/cxl-component-utils.c as both were part of early
>>> postings from Ben.  
>>
>> Sounds plausible -- is there an Intel person who could give us
>> an acked-by for this?
>>
>> (Ideally we wouldn't have let more gpl-2-only code into the
>> codebase without a rationale...)
>>
> 
> I've +CC'd the kernel CXL maintainers from Intel a few of whom
> have also contributed some of the QEMU CXL code.
> Hopefully someone can ack.
> 
>>> Suggested-by: Philippe Mathieu-Daudé 
>>> Signed-off-by: Jonathan Cameron 

Acked-by: Dave Jiang 

>>> ---
>>>  hw/mem/cxl_type3.c | 9 +
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
>>> index c5855d4e7d..ad3f0f6a9d 100644
>>> --- a/hw/mem/cxl_type3.c
>>> +++ b/hw/mem/cxl_type3.c
>>> @@ -1,3 +1,12 @@
>>> +/*
>>> + * CXL Type 3 (memory expander) device
>>> + *
>>> + * Copyright(C) 2020 Intel Corporation.
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2. See the
>>> + * COPYING file in the top-level directory.
>>> + */
>>> +
>>>  #include "qemu/osdep.h"
>>>  #include "qemu/units.h"
>>>  #include "qemu/error-report.h"  
>>
>> -- PMM
>>
> 



[PATCH 12/22] parallels: collect bitmap of used clusters at open

2023-09-18 Thread Denis V. Lunev
If the operation fails, we need to check image consistency unless the
problem is a memory allocation failure.

Bitmap adjustments in allocate_cluster are not performed yet.
They are worth a separate patch. This proved useful during debugging of
this series. Kept as is for future bisecting.

It should be specifically noted that the used bitmap must be recalculated
if data_off has been fixed during the image consistency check.
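
For reference, the series assumes a one-bit-per-cluster mapping between
host (file) offsets and the used bitmap, counted from the start of the
data area. A rough sketch of that indexing (field meanings as in
BDRVParallelsState; the helper name here is purely illustrative):

    #include <stdint.h>

    /* Hypothetical illustration of the used-bitmap indexing: data_start is
     * kept in 512-byte sectors, and each bit covers one cluster. */
    static inline int64_t used_bitmap_index(int64_t host_off,
                                            int64_t data_start_sectors,
                                            uint32_t cluster_size)
    {
        int64_t data_start_bytes = data_start_sectors * 512;
        return (host_off - data_start_bytes) / cluster_size;
    }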

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 73 +++
 block/parallels.h |  3 ++
 2 files changed, 76 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 2b5f2b54a0..c41511398f 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -193,6 +193,58 @@ static int mark_used(BlockDriverState *bs,
 return 0;
 }
 
+/*
+ * Collect used bitmap. The image can contain errors, we should fill the
+ * bitmap anyway, as much as we can. This information will be used for
+ * error resolution.
+ */
+static int parallels_fill_used_bitmap(BlockDriverState *bs)
+{
+BDRVParallelsState *s = bs->opaque;
+int64_t payload_bytes;
+uint32_t i;
+int err = 0;
+
+payload_bytes = bdrv_co_getlength(bs->file->bs);
+if (payload_bytes < 0) {
+return payload_bytes;
+}
+payload_bytes -= s->data_start * BDRV_SECTOR_SIZE;
+if (payload_bytes < 0) {
+return -EINVAL;
+}
+
+s->used_bmap_size = DIV_ROUND_UP(payload_bytes, s->cluster_size);
+if (s->used_bmap_size == 0) {
+return 0;
+}
+s->used_bmap = bitmap_try_new(s->used_bmap_size);
+if (s->used_bmap == NULL) {
+return -ENOMEM;
+}
+
+for (i = 0; i < s->bat_size; i++) {
+int err2;
+int64_t host_off = bat2sect(s, i) << BDRV_SECTOR_BITS;
+if (host_off == 0) {
+continue;
+}
+
+err2 = mark_used(bs, s->used_bmap, s->used_bmap_size, host_off);
+if (err2 < 0 && err == 0) {
+err = err2;
+}
+}
+return err;
+}
+
+static void parallels_free_used_bitmap(BlockDriverState *bs)
+{
+BDRVParallelsState *s = bs->opaque;
+s->used_bmap_size = 0;
+g_free(s->used_bmap);
+}
+
 static int64_t coroutine_fn GRAPH_RDLOCK
 allocate_clusters(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors, int *pnum)
@@ -530,8 +582,17 @@ parallels_check_data_off(BlockDriverState *bs, 
BdrvCheckResult *res,
 
 res->corruptions++;
 if (fix & BDRV_FIX_ERRORS) {
+int err;
 s->header->data_off = cpu_to_le32(data_off);
 s->data_start = data_off;
+
+parallels_free_used_bitmap(bs);
+err = parallels_fill_used_bitmap(bs);
+if (err == -ENOMEM) {
+res->check_errors++;
+return err;
+}
+
 res->corruptions_fixed++;
 }
 
@@ -1222,6 +1283,14 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 need_check = need_check || s->data_end > file_nb_sectors;
 
+if (!need_check) {
+ret = parallels_fill_used_bitmap(bs);
+if (ret == -ENOMEM) {
+goto fail;
+}
+need_check = need_check || ret < 0; /* These are correctable errors */
+}
+
 /*
  * We don't repair the image here if it's opened for checks. Also we don't
  * want to change inactive images and can't change readonly images.
@@ -1251,6 +1320,8 @@ fail:
  * "s" object was allocated by g_malloc0 so we can safely
  * try to free its fields even they were not allocated.
  */
+parallels_free_used_bitmap(bs);
+
 error_free(s->migration_blocker);
 g_free(s->bat_dirty_bmap);
 qemu_vfree(s->header);
@@ -1271,6 +1342,8 @@ static void parallels_close(BlockDriverState *bs)
   PREALLOC_MODE_OFF, 0, NULL);
 }
 
+parallels_free_used_bitmap(bs);
+
 g_free(s->bat_dirty_bmap);
 qemu_vfree(s->header);
 
diff --git a/block/parallels.h b/block/parallels.h
index 4e53e9572d..6b199443cf 100644
--- a/block/parallels.h
+++ b/block/parallels.h
@@ -72,6 +72,9 @@ typedef struct BDRVParallelsState {
 unsigned long *bat_dirty_bmap;
 unsigned int  bat_dirty_block;
 
+unsigned long *used_bmap;
+unsigned long used_bmap_size;
+
 uint32_t *bat_bitmap;
 unsigned int bat_size;
 
-- 
2.34.1




Re: [PATCH 3/3] tests: extend test 131 to cover availability of the write-zeroes

2023-09-18 Thread Denis V. Lunev

On 9/18/23 20:00, Denis V. Lunev wrote:

This patch contains a test which minimally tests write-zeroes on top of
working discard.

The following checks are added:
* write 2 clusters, write-zero the first allocated cluster
* write 2 clusters, write-zero the first half of the first allocated cluster

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
  tests/qemu-iotests/131 | 20 
  tests/qemu-iotests/131.out | 20 
  2 files changed, 40 insertions(+)

diff --git a/tests/qemu-iotests/131 b/tests/qemu-iotests/131
index e50a658f22..308732d84b 100755
--- a/tests/qemu-iotests/131
+++ b/tests/qemu-iotests/131
@@ -105,6 +105,26 @@ _make_test_img $size
  { $QEMU_IO -c "read -P 0 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
  { $QEMU_IO -c "read -P 0 $((CLUSTER_SIZE + CLUSTER_HALF_SIZE)) $CLUSTER_DBL_SIZE" 
"$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
  
+echo "== check write-zeroes =="

+
+# Clear image
+_make_test_img $size
+
+{ $QEMU_IO -c "write -P 0x11 0 $CLUSTER_DBL_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "write -z 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io 
| _filter_testdir
+{ $QEMU_IMG map "$TEST_IMG"; } 2>&1 | _filter_qemu_img_map
+{ $QEMU_IO -c "read -P 0 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== check cluster-partial write-zeroes =="
+
+# Clear image
+_make_test_img $size
+
+{ $QEMU_IO -c "write -P 0x11 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "write -z 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_HALF_SIZE $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+
  echo "== allocate with backing =="
  # Verify that allocating clusters works fine even when there is a backing 
image.
  # Regression test for a bug where we would pass a buffer read from the backing
diff --git a/tests/qemu-iotests/131.out b/tests/qemu-iotests/131.out
index 9882f9df6c..8493561bab 100644
--- a/tests/qemu-iotests/131.out
+++ b/tests/qemu-iotests/131.out
@@ -64,6 +64,26 @@ read 524288/524288 bytes at offset 0
  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  read 2097152/2097152 bytes at offset 1572864
  2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check write-zeroes ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset  Length  File
+0x100x10TEST_DIR/t.IMGFMT
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check cluster-partial write-zeroes ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 524288
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  == allocate with backing ==
  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
  Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864

This patch is actually patch 22, please disregard it.

Den



[PATCH 16/22] parallels: update used bitmap in allocate_cluster

2023-09-18 Thread Denis V. Lunev
We should extend the bitmap if the file is extended and set the bit in
the image used bitmap once a cluster is allocated. A sanity check at
that moment also looks like a good idea.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index b6505fcc5b..3beb18e44f 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -282,6 +282,8 @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num,
 return len;
 }
 if (s->data_end + space > (len >> BDRV_SECTOR_BITS)) {
+uint32_t new_usedsize;
+
 space += s->prealloc_size;
 /*
  * We require the expanded size to read back as zero. If the
@@ -305,6 +307,12 @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num,
 if (ret < 0) {
 return ret;
 }
+
+new_usedsize = s->used_bmap_size +
+   (space << BDRV_SECTOR_BITS) / s->cluster_size;
+s->used_bmap = bitmap_zero_extend(s->used_bmap, s->used_bmap_size,
+  new_usedsize);
+s->used_bmap_size = new_usedsize;
 }
 
 /*
@@ -336,6 +344,12 @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num,
 }
 }
 
+ret = mark_used(bs, s->used_bmap, s->used_bmap_size,
+s->data_end << BDRV_SECTOR_BITS, to_allocate);
+if (ret < 0) {
+/* Image consistency is broken. Alarm! */
+return ret;
+}
 for (i = 0; i < to_allocate; i++) {
 parallels_set_bat_entry(s, idx + i, s->data_end / s->off_multiplier);
 s->data_end += s->tracks;
-- 
2.34.1




[PATCH 09/22] tests: ensure that image validation will not cure the corruption

2023-09-18 Thread Denis V. Lunev
Since
commit cfce1091d55322789582480798a891cbaf66924e
Author: Alexander Ivanov 
Date:   Tue Jul 18 12:44:29 2023 +0200
parallels: Image repairing in parallels_open()
there is a potential pitfall with calling
qemu-io -c "read"
The image is opened in read-write mode and thus could potentially be
repaired. This could ruin the testing process.

The patch forces read-only opening for reads. In that case repairing
is impossible.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 tests/qemu-iotests/tests/parallels-checks | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/tests/parallels-checks 
b/tests/qemu-iotests/tests/parallels-checks
index a7a1b357b5..5917ee079d 100755
--- a/tests/qemu-iotests/tests/parallels-checks
+++ b/tests/qemu-iotests/tests/parallels-checks
@@ -91,7 +91,7 @@ file_size=`stat --printf="%s" "$TEST_IMG"`
 echo "file size: $file_size"
 
 echo "== check last cluster =="
-{ $QEMU_IO -c "read -P 0x11 $LAST_CLUSTER_OFF $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -r -c "read -P 0x11 $LAST_CLUSTER_OFF $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
 
 # Clear image
 _make_test_img $SIZE
@@ -105,19 +105,20 @@ echo "== write another pattern to second cluster =="
 { $QEMU_IO -c "write -P 0x55 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 
| _filter_qemu_io | _filter_testdir
 
 echo "== check second cluster =="
-{ $QEMU_IO -c "read -P 0x55 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -r -c "read -P 0x55 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+
 
 echo "== corrupt image =="
 poke_file "$TEST_IMG" "$(($BAT_OFFSET + 4))" "\x01\x00\x00\x00"
 
 echo "== check second cluster =="
-{ $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -r -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
 
 echo "== repair image =="
 _check_test_img -r all
 
 echo "== check second cluster =="
-{ $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -r -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
 
 echo "== check first cluster on host =="
 printf "content: 0x%02x\n" `peek_file_le $TEST_IMG $(($CLUSTER_SIZE)) 1`
-- 
2.34.1




[PATCH 22/22] tests: extend test 131 to cover availability of the write-zeroes

2023-09-18 Thread Denis V. Lunev
This patch contains a test which minimally tests write-zeroes on top of
working discard.

The following checks are added:
* write 2 clusters, write-zero the first allocated cluster
* write 2 clusters, write-zero the first half of the first allocated cluster

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 tests/qemu-iotests/131 | 21 +
 tests/qemu-iotests/131.out | 22 ++
 2 files changed, 43 insertions(+)

diff --git a/tests/qemu-iotests/131 b/tests/qemu-iotests/131
index 324008b3f6..3119100e78 100755
--- a/tests/qemu-iotests/131
+++ b/tests/qemu-iotests/131
@@ -105,6 +105,27 @@ _make_test_img $size
 { $QEMU_IO -c "read -P 0 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 { $QEMU_IO -c "read -P 0 $((CLUSTER_SIZE + CLUSTER_HALF_SIZE)) 
$CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
+echo "== check write-zeroes =="
+
+# Clear image
+_make_test_img $size
+
+{ $QEMU_IO -c "write -P 0x11 0 $CLUSTER_DBL_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "write -z 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io 
| _filter_testdir
+{ $QEMU_IMG map "$TEST_IMG"; } 2>&1 | _filter_qemu_img_map
+{ $QEMU_IO -c "read -P 0 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== check cluster-partial write-zeroes =="
+
+# Clear image
+_make_test_img $size
+
+{ $QEMU_IO -c "write -P 0x11 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "write -z 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_HALF_SIZE $CLUSTER_HALF_SIZE" 
"$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+
 echo "== allocate with backing =="
 # Verify that allocating clusters works fine even when there is a backing 
image.
 # Regression test for a bug where we would pass a buffer read from the backing
diff --git a/tests/qemu-iotests/131.out b/tests/qemu-iotests/131.out
index 27df91ca97..02aa55abf1 100644
--- a/tests/qemu-iotests/131.out
+++ b/tests/qemu-iotests/131.out
@@ -64,6 +64,28 @@ read 524288/524288 bytes at offset 0
 512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 524288/524288 bytes at offset 1572864
 512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check write-zeroes ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset  Length  File
+0   0x20TEST_DIR/t.IMGFMT
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check cluster-partial write-zeroes ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 524288
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == allocate with backing ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
-- 
2.34.1




[PATCH 02/22] parallels: mark driver as supporting CBT

2023-09-18 Thread Denis V. Lunev
The Parallels driver does support the Parallels Dirty Bitmap feature in
read-only mode. The patch adds the bdrv_supports_persistent_dirty_bitmap()
callback, which always returns 1 to indicate that.

This will allow copying CBT from a Parallels image with qemu-img.

Note: read-write support is signalled through
bdrv_co_can_store_new_dirty_bitmap() and is different.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 2ebd8e1301..428f72de1c 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -1248,6 +1248,11 @@ static void parallels_close(BlockDriverState *bs)
 error_free(s->migration_blocker);
 }
 
+static bool parallels_is_support_dirty_bitmaps(BlockDriverState *bs)
+{
+return 1;
+}
+
 static BlockDriver bdrv_parallels = {
 .format_name= "parallels",
 .instance_size  = sizeof(BDRVParallelsState),
@@ -1256,6 +1261,7 @@ static BlockDriver bdrv_parallels = {
 .supports_backing   = true,
 
 .bdrv_has_zero_init = bdrv_has_zero_init_1,
+.bdrv_supports_persistent_dirty_bitmap = 
parallels_is_support_dirty_bitmaps,
 
 .bdrv_probe = parallels_probe,
 .bdrv_open  = parallels_open,
-- 
2.34.1




[PATCH 13/22] tests: fix broken deduplication check in parallels format test

2023-09-18 Thread Denis V. Lunev
The original check is broken: the supposed reads from 2 different clusters
actually read from the same file offset twice. This is definitely
wrong.

We should be sure that
* the content of both clusters is correct after repair
* the clusters are at different offsets after repair
In order to check the latter we write some content into the first one
and validate that the second cluster is unaffected.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 tests/qemu-iotests/tests/parallels-checks | 14 ++
 tests/qemu-iotests/tests/parallels-checks.out | 16 
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/tests/qemu-iotests/tests/parallels-checks 
b/tests/qemu-iotests/tests/parallels-checks
index f4ca50295e..df99558486 100755
--- a/tests/qemu-iotests/tests/parallels-checks
+++ b/tests/qemu-iotests/tests/parallels-checks
@@ -117,14 +117,20 @@ echo "== check second cluster =="
 echo "== repair image =="
 _check_test_img -r all
 
+echo "== check the first cluster =="
+{ $QEMU_IO -r -c "read -P 0x11 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
 echo "== check second cluster =="
 { $QEMU_IO -r -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
 
-echo "== check first cluster on host =="
-printf "content: 0x%02x\n" `peek_file_le $TEST_IMG $(($CLUSTER_SIZE)) 1`
+echo "== write another pattern to the first clusters =="
+{ $QEMU_IO -c "write -P 0x66 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== check the first cluster =="
+{ $QEMU_IO -r -c "read -P 0x66 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 
-echo "== check second cluster on host =="
-printf "content: 0x%02x\n" `peek_file_le $TEST_IMG $(($CLUSTER_SIZE)) 1`
+echo "== check the second cluster (deduplicated) =="
+{ $QEMU_IO -r -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
 
 # Clear image
 _make_test_img $SIZE
diff --git a/tests/qemu-iotests/tests/parallels-checks.out 
b/tests/qemu-iotests/tests/parallels-checks.out
index 74a5e29260..1325d2b611 100644
--- a/tests/qemu-iotests/tests/parallels-checks.out
+++ b/tests/qemu-iotests/tests/parallels-checks.out
@@ -55,13 +55,21 @@ The following inconsistencies were found and repaired:
 
 Double checking the fixed image now...
 No errors were found on the image.
+== check the first cluster ==
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == check second cluster ==
 read 1048576/1048576 bytes at offset 1048576
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-== check first cluster on host ==
-content: 0x11
-== check second cluster on host ==
-content: 0x11
+== write another pattern to the first clusters ==
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check the first cluster ==
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check the second cluster (deduplicated) ==
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
 == TEST DATA_OFF CHECK ==
 == write pattern to first cluster ==
-- 
2.34.1




[PATCH 04/22] parallels: invent parallels_opts_prealloc() helper to parse prealloc opts

2023-09-18 Thread Denis V. Lunev
This patch creates the above-mentioned helper and moves its usage to the
beginning of parallels_open(). This simplifies parallels_open() a bit.

The patch also ensures that we always store prealloc_size in the block
driver state in sectors. This makes the code cleaner and avoids the wrong
impression at the assignment that the value is in bytes.

Signed-off-by: Denis V. Lunev 
---
 block/parallels.c | 72 +--
 1 file changed, 44 insertions(+), 28 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index af7be427c9..ae006e7fc7 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -1025,6 +1025,44 @@ static int parallels_update_header(BlockDriverState *bs)
 return bdrv_pwrite_sync(bs->file, 0, size, s->header, 0);
 }
 
+
+static int parallels_opts_prealloc(BlockDriverState *bs, QDict *options,
+   Error **errp)
+{
+int err;
+char *buf;
+int64_t bytes;
+BDRVParallelsState *s = bs->opaque;
+Error *local_err = NULL;
+QemuOpts *opts = qemu_opts_create(&parallels_runtime_opts, NULL, 0, errp);
+if (!opts) {
+return -ENOMEM;
+}
+
+err = -EINVAL;
+if (!qemu_opts_absorb_qdict(opts, options, errp)) {
+goto done;
+}
+
+bytes = qemu_opt_get_size_del(opts, PARALLELS_OPT_PREALLOC_SIZE, 0);
+s->prealloc_size = bytes >> BDRV_SECTOR_BITS;
+buf = qemu_opt_get_del(opts, PARALLELS_OPT_PREALLOC_MODE);
+/* prealloc_mode can be downgraded later during allocate_clusters */
+s->prealloc_mode = qapi_enum_parse(&prealloc_mode_lookup, buf,
+   PRL_PREALLOC_MODE_FALLOCATE,
+   &local_err);
+g_free(buf);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+goto done;
+}
+err = 0;
+
+done:
+qemu_opts_del(opts);
+return err;
+}
+
 static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
   Error **errp)
 {
@@ -1033,11 +1071,13 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 int ret, size, i;
 int64_t file_nb_sectors, sector;
 uint32_t data_start;
-QemuOpts *opts = NULL;
-Error *local_err = NULL;
-char *buf;
 bool data_off_is_correct;
 
+ret = parallels_opts_prealloc(bs, options, errp);
+if (ret < 0) {
+return ret;
+}
+
 ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
 if (ret < 0) {
 return ret;
@@ -1078,6 +1118,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 ret = -EFBIG;
 goto fail;
 }
+s->prealloc_size = MAX(s->tracks, s->prealloc_size);
 s->cluster_size = s->tracks << BDRV_SECTOR_BITS;
 
 s->bat_size = le32_to_cpu(ph.bat_entries);
@@ -1117,29 +1158,6 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 s->header_size = size;
 }
 
-opts = qemu_opts_create(&parallels_runtime_opts, NULL, 0, errp);
-if (!opts) {
-goto fail_options;
-}
-
-if (!qemu_opts_absorb_qdict(opts, options, errp)) {
-goto fail_options;
-}
-
-s->prealloc_size =
-qemu_opt_get_size_del(opts, PARALLELS_OPT_PREALLOC_SIZE, 0);
-s->prealloc_size = MAX(s->tracks, s->prealloc_size >> BDRV_SECTOR_BITS);
-buf = qemu_opt_get_del(opts, PARALLELS_OPT_PREALLOC_MODE);
-/* prealloc_mode can be downgraded later during allocate_clusters */
-s->prealloc_mode = qapi_enum_parse(&prealloc_mode_lookup, buf,
-   PRL_PREALLOC_MODE_FALLOCATE,
-   &local_err);
-g_free(buf);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-goto fail_options;
-}
-
 if (ph.ext_off) {
 if (flags & BDRV_O_RDWR) {
 /*
@@ -1214,10 +1232,8 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 fail_format:
 error_setg(errp, "Image not in Parallels format");
-fail_options:
 ret = -EINVAL;
 fail:
-qemu_opts_del(opts);
 /*
  * "s" object was allocated by g_malloc0 so we can safely
  * try to free its fields even they were not allocated.
-- 
2.34.1




[PATCH 01/22] parallels: fix formatting in bdrv_parallels initialization

2023-09-18 Thread Denis V. Lunev
The old code is ugly and contains tab characters. There are no functional
changes in this patch.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 48c32d6821..2ebd8e1301 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -1249,23 +1249,25 @@ static void parallels_close(BlockDriverState *bs)
 }
 
 static BlockDriver bdrv_parallels = {
-.format_name   = "parallels",
-.instance_size = sizeof(BDRVParallelsState),
-.bdrv_probe= parallels_probe,
-.bdrv_open = parallels_open,
-.bdrv_close= parallels_close,
-.bdrv_child_perm  = bdrv_default_perms,
-.bdrv_co_block_status = parallels_co_block_status,
-.bdrv_has_zero_init   = bdrv_has_zero_init_1,
-.bdrv_co_flush_to_os  = parallels_co_flush_to_os,
-.bdrv_co_readv  = parallels_co_readv,
-.bdrv_co_writev = parallels_co_writev,
-.is_format  = true,
-.supports_backing = true,
-.bdrv_co_create  = parallels_co_create,
-.bdrv_co_create_opts = parallels_co_create_opts,
-.bdrv_co_check  = parallels_co_check,
-.create_opts= _create_opts,
+.format_name= "parallels",
+.instance_size  = sizeof(BDRVParallelsState),
+.create_opts= _create_opts,
+.is_format  = true,
+.supports_backing   = true,
+
+.bdrv_has_zero_init = bdrv_has_zero_init_1,
+
+.bdrv_probe = parallels_probe,
+.bdrv_open  = parallels_open,
+.bdrv_close = parallels_close,
+.bdrv_child_perm= bdrv_default_perms,
+.bdrv_co_block_status   = parallels_co_block_status,
+.bdrv_co_flush_to_os= parallels_co_flush_to_os,
+.bdrv_co_readv  = parallels_co_readv,
+.bdrv_co_writev = parallels_co_writev,
+.bdrv_co_create = parallels_co_create,
+.bdrv_co_create_opts= parallels_co_create_opts,
+.bdrv_co_check  = parallels_co_check,
 };
 
 static void bdrv_parallels_init(void)
-- 
2.34.1




[PATCH 10/22] parallels: fix broken parallels_check_data_off()

2023-09-18 Thread Denis V. Lunev
Once we have repaired the data_off field in the header, we should update
s->data_start, which is calculated based on it.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/parallels.c b/block/parallels.c
index 66c86d87e3..2b5f2b54a0 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -531,6 +531,7 @@ parallels_check_data_off(BlockDriverState *bs, 
BdrvCheckResult *res,
 res->corruptions++;
 if (fix & BDRV_FIX_ERRORS) {
 s->header->data_off = cpu_to_le32(data_off);
+s->data_start = data_off;
 res->corruptions_fixed++;
 }
 
-- 
2.34.1




[PATCH 06/22] parallels: return earlier from parallels_open() function on error

2023-09-18 Thread Denis V. Lunev
At the beginning of the function we can return immediately, up until the
point where we actually allocate s->header.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 12f38cf70b..bd26c8db63 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -1090,7 +1090,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 
ret = bdrv_pread(bs->file, 0, sizeof(ph), &ph, 0);
 if (ret < 0) {
-goto fail;
+return ret;
 }
 
 bs->total_sectors = le64_to_cpu(ph.nb_sectors);
@@ -1110,13 +1110,11 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 s->tracks = le32_to_cpu(ph.tracks);
 if (s->tracks == 0) {
 error_setg(errp, "Invalid image: Zero sectors per track");
-ret = -EINVAL;
-goto fail;
+return -EINVAL;
 }
 if (s->tracks > INT32_MAX/513) {
 error_setg(errp, "Invalid image: Too big cluster");
-ret = -EFBIG;
-goto fail;
+return -EFBIG;
 }
 s->prealloc_size = MAX(s->tracks, s->prealloc_size);
 s->cluster_size = s->tracks << BDRV_SECTOR_BITS;
@@ -1124,16 +1122,14 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 s->bat_size = le32_to_cpu(ph.bat_entries);
 if (s->bat_size > INT_MAX / sizeof(uint32_t)) {
 error_setg(errp, "Catalog too large");
-ret = -EFBIG;
-goto fail;
+return -EFBIG;
 }
 
 size = bat_entry_off(s->bat_size);
 s->header_size = ROUND_UP(size, bdrv_opt_mem_align(bs->file->bs));
 s->header = qemu_try_blockalign(bs->file->bs, s->header_size);
 if (s->header == NULL) {
-ret = -ENOMEM;
-goto fail;
+return -ENOMEM;
 }
 
 ret = bdrv_pread(bs->file, 0, s->header_size, s->header, 0);
-- 
2.34.1




[PATCH 03/22] parallels: fix memory leak in parallels_open()

2023-09-18 Thread Denis V. Lunev
We should free opts allocated through qemu_opts_create() at the end.

Signed-off-by: Denis V. Lunev 
---
 block/parallels.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/parallels.c b/block/parallels.c
index 428f72de1c..af7be427c9 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -1217,6 +1217,7 @@ fail_format:
 fail_options:
 ret = -EINVAL;
 fail:
+qemu_opts_del(opts);
 /*
  * "s" object was allocated by g_malloc0 so we can safely
  * try to free its fields even they were not allocated.
-- 
2.34.1




[PATCH 07/22] parallels: refactor path when we need to re-check image in parallels_open

2023-09-18 Thread Denis V. Lunev
More conditions follow, thus the check should be more scalable.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index bd26c8db63..af3b4894d7 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -1071,7 +1071,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 int ret, size, i;
 int64_t file_nb_sectors, sector;
 uint32_t data_start;
-bool data_off_is_correct;
+bool need_check = false;
 
 ret = parallels_opts_prealloc(bs, options, errp);
 if (ret < 0) {
@@ -1139,11 +1139,14 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 s->bat_bitmap = (uint32_t *)(s->header + 1);
 
 if (le32_to_cpu(ph.inuse) == HEADER_INUSE_MAGIC) {
-s->header_unclean = true;
+need_check = s->header_unclean = true;
+}
+
+{
+bool ok = parallels_test_data_off(s, file_nb_sectors, &data_start);
+need_check = need_check || !ok;
 }
 
-data_off_is_correct = parallels_test_data_off(s, file_nb_sectors,
-  &data_start);
 s->data_start = data_start;
 s->data_end = s->data_start;
 if (s->data_end < (s->header_size >> BDRV_SECTOR_BITS)) {
@@ -1200,6 +1203,7 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 s->data_end = sector + s->tracks;
 }
 }
+need_check = need_check || s->data_end > file_nb_sectors;
 
 /*
  * We don't repair the image here if it's opened for checks. Also we don't
@@ -1209,12 +1213,8 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 return 0;
 }
 
-/*
- * Repair the image if it's dirty or
- * out-of-image corruption was detected.
- */
-if (s->data_end > file_nb_sectors || s->header_unclean
-|| !data_off_is_correct) {
+/* Repair the image if corruption was detected. */
+if (need_check) {
 BdrvCheckResult res;
ret = bdrv_check(bs, &res, BDRV_FIX_ERRORS | BDRV_FIX_LEAKS);
 if (ret < 0) {
@@ -1223,7 +1223,6 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 goto fail;
 }
 }
-
 return 0;
 
 fail_format:
-- 
2.34.1




[PATCH 14/22] tests: test self-cure of parallels image with duplicated clusters

2023-09-18 Thread Denis V. Lunev
The test is quite similar to the original one for duplicated clusters.
The only difference is in the operation which should fix the
image.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 tests/qemu-iotests/tests/parallels-checks | 36 +++
 tests/qemu-iotests/tests/parallels-checks.out | 31 
 2 files changed, 67 insertions(+)

diff --git a/tests/qemu-iotests/tests/parallels-checks 
b/tests/qemu-iotests/tests/parallels-checks
index df99558486..b281246a42 100755
--- a/tests/qemu-iotests/tests/parallels-checks
+++ b/tests/qemu-iotests/tests/parallels-checks
@@ -135,6 +135,42 @@ echo "== check the second cluster (deduplicated) =="
 # Clear image
 _make_test_img $SIZE
 
+echo "== TEST DUPLICATION SELF-CURE =="
+
+echo "== write pattern to whole image =="
+{ $QEMU_IO -c "write -P 0x11 0 $SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | 
_filter_testdir
+
+echo "== write another pattern to second cluster =="
+{ $QEMU_IO -c "write -P 0x55 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 
| _filter_qemu_io | _filter_testdir
+
+echo "== check second cluster =="
+{ $QEMU_IO -r -c "read -P 0x55 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+
+
+echo "== corrupt image =="
+poke_file "$TEST_IMG" "$(($BAT_OFFSET + 4))" "\x01\x00\x00\x00"
+
+echo "== check second cluster =="
+{ $QEMU_IO -r -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+
+echo "== check the first cluster with self-repair =="
+{ $QEMU_IO -c "read -P 0x11 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== check second cluster =="
+{ $QEMU_IO -r -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+
+echo "== write another pattern to the first clusters =="
+{ $QEMU_IO -c "write -P 0x66 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== check the first cluster =="
+{ $QEMU_IO -r -c "read -P 0x66 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== check the second cluster (deduplicated) =="
+{ $QEMU_IO -r -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+
+# Clear image
+_make_test_img $SIZE
+
 echo "== TEST DATA_OFF CHECK =="
 
 echo "== write pattern to first cluster =="
diff --git a/tests/qemu-iotests/tests/parallels-checks.out 
b/tests/qemu-iotests/tests/parallels-checks.out
index 1325d2b611..9793423111 100644
--- a/tests/qemu-iotests/tests/parallels-checks.out
+++ b/tests/qemu-iotests/tests/parallels-checks.out
@@ -71,6 +71,37 @@ read 1048576/1048576 bytes at offset 0
 read 1048576/1048576 bytes at offset 1048576
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
+== TEST DUPLICATION SELF-CURE ==
+== write pattern to whole image ==
+wrote 4194304/4194304 bytes at offset 0
+4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== write another pattern to second cluster ==
+wrote 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check second cluster ==
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== corrupt image ==
+== check second cluster ==
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check the first cluster with self-repair ==
+Repairing duplicate offset in BAT entry 1
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check second cluster ==
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== write another pattern to the first clusters ==
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check the first cluster ==
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check the second cluster (deduplicated) ==
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
 == TEST DATA_OFF CHECK ==
 == write pattern to first cluster ==
 wrote 1048576/1048576 bytes at offset 0
-- 
2.34.1




[PATCH 21/22] parallels: naive implementation of parallels_co_pwrite_zeroes

2023-09-18 Thread Denis V. Lunev
The zero flag is missing from the Parallels format specification. We can
resort to discard if we have no backing file.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index 1ef23f6669..a6d64d0d47 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -582,6 +582,19 @@ done:
 return ret;
 }
 
+static int coroutine_fn
+parallels_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
+   BdrvRequestFlags flags)
+{
+/*
+ * The zero flag is missing from the Parallels format specification. We can
+ * resort to discard if we have no backing file (this condition is checked
+ * inside parallels_co_pdiscard()).
+ */
+return parallels_co_pdiscard(bs, offset, bytes);
+}
+
+
 static void parallels_check_unclean(BlockDriverState *bs,
 BdrvCheckResult *res,
 BdrvCheckMode fix)
@@ -1463,6 +1476,7 @@ static BlockDriver bdrv_parallels = {
 .bdrv_co_create_opts= parallels_co_create_opts,
 .bdrv_co_check  = parallels_co_check,
 .bdrv_co_pdiscard   = parallels_co_pdiscard,
+.bdrv_co_pwrite_zeroes  = parallels_co_pwrite_zeroes,
 };
 
 static void bdrv_parallels_init(void)
-- 
2.34.1




[PATCH 19/22] parallels: naive implementation of parallels_co_pdiscard

2023-09-18 Thread Denis V. Lunev
* Discarding with backing stores is not supported by the format.
* There is no buffering/queueing of the discard operation.
* Only operations aligned to the cluster are supported.

Signed-off-by: Denis V. Lunev 
---
 block/parallels.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/block/parallels.c b/block/parallels.c
index d9d36c514b..1ef23f6669 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -537,6 +537,51 @@ parallels_co_readv(BlockDriverState *bs, int64_t 
sector_num, int nb_sectors,
 return ret;
 }
 
+
+static int coroutine_fn
+parallels_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+int ret = 0;
+uint32_t cluster, count;
+BDRVParallelsState *s = bs->opaque;
+
+/*
+ * The image does not support ZERO mark inside the BAT, which means that
+ * stale data could be exposed from the backing file.
+ */
+if (bs->backing) {
+return -ENOTSUP;
+}
+
+if (!QEMU_IS_ALIGNED(offset, s->cluster_size)) {
+return -ENOTSUP;
+} else if (!QEMU_IS_ALIGNED(bytes, s->cluster_size)) {
+return -ENOTSUP;
+}
+
+cluster = offset / s->cluster_size;
+count = bytes / s->cluster_size;
+
+qemu_co_mutex_lock(>lock);
+for (; count > 0; cluster++, count--) {
+int64_t host_off = bat2sect(s, cluster) << BDRV_SECTOR_BITS;
+if (host_off == 0) {
+continue;
+}
+
+ret = bdrv_co_pdiscard(bs->file, host_off, s->cluster_size);
+if (ret < 0) {
+goto done;
+}
+
+parallels_set_bat_entry(s, cluster, 0);
+bitmap_clear(s->used_bmap, host_cluster_index(s, host_off), 1);
+}
+done:
+qemu_co_mutex_unlock(&s->lock);
+return ret;
+}
+
 static void parallels_check_unclean(BlockDriverState *bs,
 BdrvCheckResult *res,
 BdrvCheckMode fix)
@@ -1417,6 +1462,7 @@ static BlockDriver bdrv_parallels = {
 .bdrv_co_create = parallels_co_create,
 .bdrv_co_create_opts= parallels_co_create_opts,
 .bdrv_co_check  = parallels_co_check,
+.bdrv_co_pdiscard   = parallels_co_pdiscard,
 };
 
 static void bdrv_parallels_init(void)
-- 
2.34.1




[PATCH 17/22] parallels: naive implementation of allocate_clusters with used bitmap

2023-09-18 Thread Denis V. Lunev
Access to the bitmap is not fully optimized yet.
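
The gist of the naive strategy in the hunk below is: take the first free
bit in the used bitmap, and if the free run is shorter than the request,
shrink the request. A minimal sketch of that idea (assuming
find_first_zero_bit()/find_next_bit() from qemu/bitops.h; the wrapper
itself is illustrative, not the driver's actual code):

    #include "qemu/bitops.h"

    /* Hypothetical distillation: locate a run of free clusters, shrinking
     * the request if the hole is smaller than what was asked for. */
    static long find_free_run(const unsigned long *bmap, unsigned long nbits,
                              long *to_allocate)
    {
        unsigned long first_free = find_first_zero_bit(bmap, nbits);
        unsigned long next_used;

        if (first_free == nbits) {
            return -1;    /* no hole in the data area: append at the end */
        }
        next_used = find_next_bit(bmap, nbits, first_free);
        if ((long)(next_used - first_free) < *to_allocate) {
            *to_allocate = next_used - first_free;
        }
        return first_free;
    }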

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 51 ---
 1 file changed, 39 insertions(+), 12 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 3beb18e44f..6a5bff4fcb 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -253,7 +253,7 @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num,
 {
 int ret = 0;
 BDRVParallelsState *s = bs->opaque;
-int64_t pos, space, idx, to_allocate, i, len;
+int64_t i, pos, idx, to_allocate, first_free, host_off;
 
 pos = block_status(s, sector_num, nb_sectors, pnum);
 if (pos > 0) {
@@ -276,15 +276,13 @@ allocate_clusters(BlockDriverState *bs, int64_t 
sector_num,
  */
 assert(idx < s->bat_size && idx + to_allocate <= s->bat_size);
 
-space = to_allocate * s->tracks;
-len = bdrv_co_getlength(bs->file->bs);
-if (len < 0) {
-return len;
-}
-if (s->data_end + space > (len >> BDRV_SECTOR_BITS)) {
+first_free = find_first_zero_bit(s->used_bmap, s->used_bmap_size);
+if (first_free == s->used_bmap_size) {
 uint32_t new_usedsize;
+int64_t space = to_allocate * s->tracks + s->prealloc_size;
+
+host_off = s->data_end * BDRV_SECTOR_SIZE;
 
-space += s->prealloc_size;
 /*
  * We require the expanded size to read back as zero. If the
  * user permitted truncation, we try that; but if it fails, we
@@ -313,6 +311,32 @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num,
 s->used_bmap = bitmap_zero_extend(s->used_bmap, s->used_bmap_size,
   new_usedsize);
 s->used_bmap_size = new_usedsize;
+} else {
+int64_t next_used;
+next_used = find_next_bit(s->used_bmap, s->used_bmap_size, first_free);
+
+/* Not enough continuous clusters in the middle, adjust the size */
+if (next_used - first_free < to_allocate) {
+to_allocate = next_used - first_free;
+*pnum = (idx + to_allocate) * s->tracks - sector_num;
+}
+
+host_off = s->data_start * BDRV_SECTOR_SIZE;
+host_off += first_free * s->cluster_size;
+
+/*
+ * No need to preallocate if we are using tail area from the above
+ * branch. In the other case we are likely re-using hole. Preallocate
+ * the space if required by the prealloc_mode.
+ */
+if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE &&
+host_off < s->data_end * BDRV_SECTOR_SIZE) {
+ret = bdrv_co_pwrite_zeroes(bs->file, host_off,
+s->cluster_size * to_allocate, 0);
+if (ret < 0) {
+return ret;
+}
+}
 }
 
 /*
@@ -344,15 +368,18 @@ allocate_clusters(BlockDriverState *bs, int64_t 
sector_num,
 }
 }
 
-ret = mark_used(bs, s->used_bmap, s->used_bmap_size,
-s->data_end << BDRV_SECTOR_BITS, to_allocate);
+ret = mark_used(bs, s->used_bmap, s->used_bmap_size, host_off, 
to_allocate);
 if (ret < 0) {
 /* Image consistency is broken. Alarm! */
 return ret;
 }
 for (i = 0; i < to_allocate; i++) {
-parallels_set_bat_entry(s, idx + i, s->data_end / s->off_multiplier);
-s->data_end += s->tracks;
+parallels_set_bat_entry(s, idx + i,
+host_off / BDRV_SECTOR_SIZE / s->off_multiplier);
+host_off += s->cluster_size;
+}
+if (host_off > s->data_end * BDRV_SECTOR_SIZE) {
+s->data_end = host_off / BDRV_SECTOR_SIZE;
 }
 
 return bat2sect(s, idx) + sector_num % s->tracks;
-- 
2.34.1




[PATCH 11/22] parallels: add test which will validate data_off fixes through repair

2023-09-18 Thread Denis V. Lunev
So far we have only checked this through self-repair, and that has proven to be not enough.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 tests/qemu-iotests/tests/parallels-checks | 17 +
 tests/qemu-iotests/tests/parallels-checks.out | 18 ++
 2 files changed, 35 insertions(+)

diff --git a/tests/qemu-iotests/tests/parallels-checks 
b/tests/qemu-iotests/tests/parallels-checks
index 5917ee079d..f4ca50295e 100755
--- a/tests/qemu-iotests/tests/parallels-checks
+++ b/tests/qemu-iotests/tests/parallels-checks
@@ -140,6 +140,23 @@ poke_file "$TEST_IMG" "$DATA_OFF_OFFSET" "\xff\xff\xff\xff"
 echo "== check first cluster =="
 { $QEMU_IO -c "read -P 0x55 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 
+# Clear image
+_make_test_img $SIZE
+
+echo "== TEST DATA_OFF THROUGH REPAIR =="
+
+echo "== write pattern to first cluster =="
+{ $QEMU_IO -c "write -P 0x55 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== spoil data_off field =="
+poke_file "$TEST_IMG" "$DATA_OFF_OFFSET" "\xff\xff\xff\xff"
+
+echo "== repair image =="
+_check_test_img -r all
+
+echo "== check first cluster =="
+{ $QEMU_IO -r -c "read -P 0x55 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/tests/parallels-checks.out 
b/tests/qemu-iotests/tests/parallels-checks.out
index 98a3a7f55e..74a5e29260 100644
--- a/tests/qemu-iotests/tests/parallels-checks.out
+++ b/tests/qemu-iotests/tests/parallels-checks.out
@@ -72,4 +72,22 @@ wrote 1048576/1048576 bytes at offset 0
 Repairing data_off field has incorrect value
 read 1048576/1048576 bytes at offset 0
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=4194304
+== TEST DATA_OFF THROUGH REPAIR ==
+== write pattern to first cluster ==
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== spoil data_off field ==
+== repair image ==
+Repairing data_off field has incorrect value
+The following inconsistencies were found and repaired:
+
+0 leaked clusters
+1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+== check first cluster ==
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 *** done
-- 
2.34.1




[PATCH 15/22] parallels: accept multiple clusters in mark_used()

2023-09-18 Thread Denis V. Lunev
This will be useful in the next patch, in allocate_clusters(). This
change does not imply serious performance drawbacks, as usually the image
is full of data or the clusters being marked are at the end of the bitmap.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index c41511398f..b6505fcc5b 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -178,18 +178,20 @@ static void parallels_set_bat_entry(BDRVParallelsState *s,
 bitmap_set(s->bat_dirty_bmap, bat_entry_off(index) / s->bat_dirty_block, 
1);
 }
 
-static int mark_used(BlockDriverState *bs,
- unsigned long *bitmap, uint32_t bitmap_size, int64_t off)
+static int mark_used(BlockDriverState *bs, unsigned long *bitmap,
+ uint32_t bitmap_size, int64_t off, uint32_t count)
 {
 BDRVParallelsState *s = bs->opaque;
 uint32_t cluster_index = host_cluster_index(s, off);
-if (cluster_index >= bitmap_size) {
+unsigned long next_used;
+if (cluster_index + count > bitmap_size) {
 return -E2BIG;
 }
-if (test_bit(cluster_index, bitmap)) {
+next_used = find_next_bit(bitmap, bitmap_size, cluster_index);
+if (next_used < cluster_index + count) {
 return -EBUSY;
 }
-bitmap_set(bitmap, cluster_index, 1);
+bitmap_set(bitmap, cluster_index, count);
 return 0;
 }
 
@@ -230,7 +232,7 @@ static int parallels_fill_used_bitmap(BlockDriverState *bs)
 continue;
 }
 
-err2 = mark_used(bs, s->used_bmap, s->used_bmap_size, host_off);
+err2 = mark_used(bs, s->used_bmap, s->used_bmap_size, host_off, 1);
 if (err2 < 0 && err == 0) {
 err = err2;
 }
@@ -732,7 +734,7 @@ parallels_check_duplicate(BlockDriverState *bs, 
BdrvCheckResult *res,
 continue;
 }
 
-ret = mark_used(bs, bitmap, bitmap_size, host_off);
+ret = mark_used(bs, bitmap, bitmap_size, host_off, 1);
 assert(ret != -E2BIG);
 if (ret == 0) {
 continue;
@@ -792,7 +794,7 @@ parallels_check_duplicate(BlockDriverState *bs, 
BdrvCheckResult *res,
  * considered, and the bitmap size doesn't change. This specifically
  * means that -E2BIG is OK.
  */
-ret = mark_used(bs, bitmap, bitmap_size, host_off);
+ret = mark_used(bs, bitmap, bitmap_size, host_off, 1);
 if (ret == -EBUSY) {
 res->check_errors++;
 goto out_repair_bat;
-- 
2.34.1




[PATCH 05/22] parallels: return earlier in fail_format branch in parallels_open()

2023-09-18 Thread Denis V. Lunev
We do not need to perform any deallocation/cleanup if a wrong format is
detected.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/parallels.c b/block/parallels.c
index ae006e7fc7..12f38cf70b 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -1232,7 +1232,8 @@ static int parallels_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 fail_format:
 error_setg(errp, "Image not in Parallels format");
-ret = -EINVAL;
+return -EINVAL;
+
 fail:
 /*
  * "s" object was allocated by g_malloc0 so we can safely
-- 
2.34.1




[PATCH 08/22] parallels: create mark_used() helper which sets bit in used bitmap

2023-09-18 Thread Denis V. Lunev
This functionality is already used twice, and the next patch will add more
code using it.

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 34 +-
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index af3b4894d7..66c86d87e3 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -178,6 +178,21 @@ static void parallels_set_bat_entry(BDRVParallelsState *s,
 bitmap_set(s->bat_dirty_bmap, bat_entry_off(index) / s->bat_dirty_block, 
1);
 }
 
+static int mark_used(BlockDriverState *bs,
+ unsigned long *bitmap, uint32_t bitmap_size, int64_t off)
+{
+BDRVParallelsState *s = bs->opaque;
+uint32_t cluster_index = host_cluster_index(s, off);
+if (cluster_index >= bitmap_size) {
+return -E2BIG;
+}
+if (test_bit(cluster_index, bitmap)) {
+return -EBUSY;
+}
+bitmap_set(bitmap, cluster_index, 1);
+return 0;
+}
+
 static int64_t coroutine_fn GRAPH_RDLOCK
 allocate_clusters(BlockDriverState *bs, int64_t sector_num,
   int nb_sectors, int *pnum)
@@ -621,7 +636,7 @@ parallels_check_duplicate(BlockDriverState *bs, 
BdrvCheckResult *res,
 BDRVParallelsState *s = bs->opaque;
 int64_t host_off, host_sector, guest_sector;
 unsigned long *bitmap;
-uint32_t i, bitmap_size, cluster_index, bat_entry;
+uint32_t i, bitmap_size, bat_entry;
 int n, ret = 0;
 uint64_t *buf = NULL;
 bool fixed = false;
@@ -655,10 +670,9 @@ parallels_check_duplicate(BlockDriverState *bs, 
BdrvCheckResult *res,
 continue;
 }
 
-cluster_index = host_cluster_index(s, host_off);
-assert(cluster_index < bitmap_size);
-if (!test_bit(cluster_index, bitmap)) {
-bitmap_set(bitmap, cluster_index, 1);
+ret = mark_used(bs, bitmap, bitmap_size, host_off);
+assert(ret != -E2BIG);
+if (ret == 0) {
 continue;
 }
 
@@ -713,11 +727,13 @@ parallels_check_duplicate(BlockDriverState *bs, 
BdrvCheckResult *res,
  * consistent for the new allocated clusters too.
  *
  * Note, clusters allocated outside the current image are not
- * considered, and the bitmap size doesn't change.
+ * considered, and the bitmap size doesn't change. This specifically
+ * means that -E2BIG is OK.
  */
-cluster_index = host_cluster_index(s, host_off);
-if (cluster_index < bitmap_size) {
-bitmap_set(bitmap, cluster_index, 1);
+ret = mark_used(bs, bitmap, bitmap_size, host_off);
+if (ret == -EBUSY) {
+res->check_errors++;
+goto out_repair_bat;
 }
 
 fixed = true;
-- 
2.34.1




[PATCH 18/22] parallels: improve readability of allocate_clusters

2023-09-18 Thread Denis V. Lunev
Replace 'space' representing the amount of data to preallocate with
'bytes'.

Rationale:
* 'space' is converted to bytes at each place it is used
* the unit (bytes) is closer to the variable name

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 block/parallels.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 6a5bff4fcb..d9d36c514b 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -279,7 +279,8 @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num,
 first_free = find_first_zero_bit(s->used_bmap, s->used_bmap_size);
 if (first_free == s->used_bmap_size) {
 uint32_t new_usedsize;
-int64_t space = to_allocate * s->tracks + s->prealloc_size;
+int64_t bytes = to_allocate * s->cluster_size;
+bytes += s->prealloc_size * BDRV_SECTOR_SIZE;
 
 host_off = s->data_end * BDRV_SECTOR_SIZE;
 
@@ -289,8 +290,7 @@ allocate_clusters(BlockDriverState *bs, int64_t sector_num,
  * force the safer-but-slower fallocate.
  */
 if (s->prealloc_mode == PRL_PREALLOC_MODE_TRUNCATE) {
-ret = bdrv_co_truncate(bs->file,
-   (s->data_end + space) << BDRV_SECTOR_BITS,
+ret = bdrv_co_truncate(bs->file, host_off + bytes,
false, PREALLOC_MODE_OFF,
BDRV_REQ_ZERO_WRITE, NULL);
 if (ret == -ENOTSUP) {
@@ -298,16 +298,13 @@ allocate_clusters(BlockDriverState *bs, int64_t 
sector_num,
 }
 }
 if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) {
-ret = bdrv_co_pwrite_zeroes(bs->file,
-s->data_end << BDRV_SECTOR_BITS,
-space << BDRV_SECTOR_BITS, 0);
+ret = bdrv_co_pwrite_zeroes(bs->file, host_off, bytes, 0);
 }
 if (ret < 0) {
 return ret;
 }
 
-new_usedsize = s->used_bmap_size +
-   (space << BDRV_SECTOR_BITS) / s->cluster_size;
+new_usedsize = s->used_bmap_size + bytes / s->cluster_size;
 s->used_bmap = bitmap_zero_extend(s->used_bmap, s->used_bmap_size,
   new_usedsize);
 s->used_bmap_size = new_usedsize;
-- 
2.34.1




[PATCH 20/22] tests: extend test 131 to cover availability of the discard operation

2023-09-18 Thread Denis V. Lunev
This patch contains a test which minimally exercises the discard and new
cluster allocation logic.

The following checks are added:
* write 2 clusters, discard the first allocated
* write another cluster, check that the hole is filled
* write 2 clusters, discard the first allocated, write 1 cluster at an
  offset not aligned to the cluster size (2 new clusters should be allocated)

Signed-off-by: Denis V. Lunev 
---
 tests/qemu-iotests/131 | 31 +++
 tests/qemu-iotests/131.out | 38 ++
 2 files changed, 69 insertions(+)

diff --git a/tests/qemu-iotests/131 b/tests/qemu-iotests/131
index 304bbb3f61..324008b3f6 100755
--- a/tests/qemu-iotests/131
+++ b/tests/qemu-iotests/131
@@ -74,6 +74,37 @@ poke_file "$TEST_IMG" "$inuse_offset" "\x59\x6e\x6f\x74"
 echo "== read corrupted image with repairing =="
 { $QEMU_IO -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 
+echo "== check discard =="
+
+# Clear image
+_make_test_img $size
+
+{ $QEMU_IO -c "write -P 0x11 0 $CLUSTER_DBL_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IMG map "$TEST_IMG"; } 2>&1 | _filter_qemu_img_map
+{ $QEMU_IO -c "discard 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io 
| _filter_testdir
+{ $QEMU_IMG map "$TEST_IMG"; } 2>&1 | _filter_qemu_img_map
+{ $QEMU_IO -c "read -P 0 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== check simple allocation over the discarded hole =="
+
+{ $QEMU_IO -c "write -P 0x11 $CLUSTER_DBL_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IMG map "$TEST_IMG"; } 2>&1 | _filter_qemu_img_map
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_DBL_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+
+echo "== check more complex allocation over the discard hole =="
+
+# Clear image
+_make_test_img $size
+
+{ $QEMU_IO -c "write -P 0x11 $CLUSTER_DBL_SIZE $CLUSTER_DBL_SIZE" "$TEST_IMG"; 
} 2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "discard $CLUSTER_DBL_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+# There is 1 cluster hole. Fill it fully and allocate 1 cluster at the end
+{ $QEMU_IO -c "write -P 0x12 $CLUSTER_HALF_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IMG map "$TEST_IMG"; } 2>&1 | _filter_qemu_img_map
+{ $QEMU_IO -c "read -P 0x12 $CLUSTER_HALF_SIZE $CLUSTER_SIZE" "$TEST_IMG"; } 
2>&1 | _filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0 $((CLUSTER_SIZE + CLUSTER_HALF_SIZE)) 
$CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+
 echo "== allocate with backing =="
 # Verify that allocating clusters works fine even when there is a backing 
image.
 # Regression test for a bug where we would pass a buffer read from the backing
diff --git a/tests/qemu-iotests/131.out b/tests/qemu-iotests/131.out
index d2904578df..27df91ca97 100644
--- a/tests/qemu-iotests/131.out
+++ b/tests/qemu-iotests/131.out
@@ -26,6 +26,44 @@ read 524288/524288 bytes at offset 0
 Repairing image was not closed correctly
 read 1048576/1048576 bytes at offset 1048576
 1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check discard ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset          Length          File
+0               0x200000        TEST_DIR/t.IMGFMT
+discard 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset          Length          File
+0x100000        0x100000        TEST_DIR/t.IMGFMT
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check simple allocation over the discarded hole ==
+wrote 1048576/1048576 bytes at offset 2097152
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset          Length          File
+0x100000        0x100000        TEST_DIR/t.IMGFMT
+0x200000        0x100000        TEST_DIR/t.IMGFMT
+read 1048576/1048576 bytes at offset 2097152
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check more complex allocation over the discard hole ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 2097152/2097152 bytes at offset 2097152
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+discard 1048576/1048576 bytes at offset 2097152
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1048576/1048576 bytes at offset 524288
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset          Length          File
+0               0x100000        TEST_DIR/t.IMGFMT
+0x100000        0x100000        TEST_DIR/t.IMGFMT
+0x300000        0x100000        TEST_DIR/t.IMGFMT
+read 1048576/1048576 bytes at offset 524288
+1 MiB, X ops; 

[PATCH v2 00/22] implement discard operation for Parallels images

2023-09-18 Thread Denis V. Lunev
This series introduces a new block allocation scheme: unused data blocks
inside the image are reused first, and only after that is the file
extended. On top of that, a naive implementation of discard and
write-zeroes (through discard) is added.
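
In essence, the allocator first looks for a hole inside the image before
growing the file. A rough sketch of that policy, condensed from patch 17
of this series (names as used there; an illustration, not the exact hunk):

    first_free = find_first_zero_bit(s->used_bmap, s->used_bmap_size);
    if (first_free == s->used_bmap_size) {
        /* no free hole inside the image: extend the file as before */
        host_off = s->data_end * BDRV_SECTOR_SIZE;
    } else {
        /* re-use a previously discarded cluster inside the image */
        host_off = s->data_start * BDRV_SECTOR_SIZE +
                   first_free * s->cluster_size;
    }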

There are also a bunch of bugs revealed in the code during the
implementation and testing.

Changes from v1:
* added #3 - fix memory leak in parallels_open
* rewritten #4 (orig #3) - reflecting leak fixed previous patch
* fixed warning in #7 (orig #6), thanks Mike!
* fixed typo in #16 (was #15)
* fixed warning in #19 (was #18)
* fixed wrong argument for bdrv_co_pdiscard in #19 (was #18)
* improved discard tests in #20 (was #19)

Signed-off-by: Denis V. Lunev 

Denis V. Lunev (22):
  parallels: fix formatting in bdrv_parallels initialization
  parallels: mark driver as supporting CBT
  parallels: fix memory leak in parallels_open()
  parallels: invent parallels_opts_prealloc() helper to parse prealloc
opts
  parallels: return earlier in fail_format branch in parallels_open()
  parallels: return earlier from parallels_open() function on error
  parallels: refactor path when we need to re-check image in
parallels_open
  parallels: create mark_used() helper which sets bit in used bitmap
  tests: ensure that image validation will not cure the corruption
  parallels: fix broken parallels_check_data_off()
  parallels: add test which will validate data_off fixes through repair
  parallels: collect bitmap of used clusters at open
  tests: fix broken deduplication check in parallels format test
  tests: test self-cure of parallels image with duplicated clusters
  parallels: accept multiple clusters in mark_used()
  parallels: update used bitmap in allocate_cluster
  parallels: naive implementation of allocate_clusters with used bitmap
  parallels: improve readability of allocate_clusters
  parallels: naive implementation of parallels_co_pdiscard
  tests: extend test 131 to cover availability of the discard operation
  parallels: naive implementation of parallels_co_pwrite_zeroes
  tests: extend test 131 to cover availability of the write-zeroes

 block/parallels.c | 389 ++
 block/parallels.h |   3 +
 tests/qemu-iotests/131|  52 +++
 tests/qemu-iotests/131.out|  60 +++
 tests/qemu-iotests/tests/parallels-checks |  76 +++-
 tests/qemu-iotests/tests/parallels-checks.out |  65 ++-
 6 files changed, 544 insertions(+), 101 deletions(-)

-- 
2.34.1




[PATCH 3/3] tests: extend test 131 to cover availability of the write-zeroes

2023-09-18 Thread Denis V. Lunev
This patch contains a test which minimally exercises write-zeroes on top of
working discard.

The following checks are added:
* write 2 clusters, write-zero to the first allocated cluster
* write 2 clusters, write-zero to half of the first allocated cluster

Signed-off-by: Denis V. Lunev 
Reviewed-by: Alexander Ivanov 
---
 tests/qemu-iotests/131 | 20 
 tests/qemu-iotests/131.out | 20 
 2 files changed, 40 insertions(+)

diff --git a/tests/qemu-iotests/131 b/tests/qemu-iotests/131
index e50a658f22..308732d84b 100755
--- a/tests/qemu-iotests/131
+++ b/tests/qemu-iotests/131
@@ -105,6 +105,26 @@ _make_test_img $size
 { $QEMU_IO -c "read -P 0 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
 { $QEMU_IO -c "read -P 0 $((CLUSTER_SIZE + CLUSTER_HALF_SIZE)) 
$CLUSTER_DBL_SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
 
+echo "== check write-zeroes =="
+
+# Clear image
+_make_test_img $size
+
+{ $QEMU_IO -c "write -P 0x11 0 $CLUSTER_DBL_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "write -z 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | _filter_qemu_io 
| _filter_testdir
+{ $QEMU_IMG map "$TEST_IMG"; } 2>&1 | _filter_qemu_img_map
+{ $QEMU_IO -c "read -P 0 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+
+echo "== check cluster-partial write-zeroes =="
+
+# Clear image
+_make_test_img $size
+
+{ $QEMU_IO -c "write -P 0x11 0 $CLUSTER_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "write -z 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0 0 $CLUSTER_HALF_SIZE" "$TEST_IMG"; } 2>&1 | 
_filter_qemu_io | _filter_testdir
+{ $QEMU_IO -c "read -P 0x11 $CLUSTER_HALF_SIZE $CLUSTER_HALF_SIZE" 
"$TEST_IMG"; } 2>&1 | _filter_qemu_io | _filter_testdir
+
 echo "== allocate with backing =="
 # Verify that allocating clusters works fine even when there is a backing 
image.
 # Regression test for a bug where we would pass a buffer read from the backing
diff --git a/tests/qemu-iotests/131.out b/tests/qemu-iotests/131.out
index 9882f9df6c..8493561bab 100644
--- a/tests/qemu-iotests/131.out
+++ b/tests/qemu-iotests/131.out
@@ -64,6 +64,26 @@ read 524288/524288 bytes at offset 0
 512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 2097152/2097152 bytes at offset 1572864
 2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check write-zeroes ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 2097152/2097152 bytes at offset 0
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Offset          Length          File
+0x100000        0x100000        TEST_DIR/t.IMGFMT
+read 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== check cluster-partial write-zeroes ==
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+wrote 1048576/1048576 bytes at offset 0
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 524288/524288 bytes at offset 524288
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 == allocate with backing ==
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=67108864
-- 
2.34.1




Re: [PATCH v2 16/18] hw/sm501: allow compiling without PIXMAN

2023-09-18 Thread BALATON Zoltan

On Mon, 18 Sep 2023, marcandre.lur...@redhat.com wrote:

From: Marc-André Lureau 

Drop the "x-pixman" property and use fallback path in such case.

Signed-off-by: Marc-André Lureau 
---
hw/display/sm501.c | 19 ---
1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hw/display/sm501.c b/hw/display/sm501.c
index 0eecd00701..a897c82f04 100644
--- a/hw/display/sm501.c
+++ b/hw/display/sm501.c
@@ -730,7 +730,6 @@ static void sm501_2d_operation(SM501State *s)
switch (cmd) {
case 0: /* BitBlt */
{
-static uint32_t tmp_buf[16384];
unsigned int src_x = (s->twoD_source >> 16) & 0x01FFF;
unsigned int src_y = s->twoD_source & 0x;
uint32_t src_base = s->twoD_source_base & 0x03FF;
@@ -828,9 +827,11 @@ static void sm501_2d_operation(SM501State *s)
de = db + (width + (height - 1) * dst_pitch) * bypp;
overlap = (db < se && sb < de);
}
+#ifdef CONFIG_PIXMAN
if (overlap && (s->use_pixman & BIT(2))) {
/* pixman can't do reverse blit: copy via temporary */
int tmp_stride = DIV_ROUND_UP(width * bypp, sizeof(uint32_t));
+static uint32_t tmp_buf[16384];
uint32_t *tmp = tmp_buf;

if (tmp_stride * sizeof(uint32_t) * height > sizeof(tmp_buf)) {
@@ -860,9 +861,12 @@ static void sm501_2d_operation(SM501State *s)
   dst_pitch * bypp / sizeof(uint32_t),
   8 * bypp, 8 * bypp, src_x, src_y,
   dst_x, dst_y, width, height);
-} else {
+} else
+#else
+{
fallback = true;
}
+#endif
if (fallback) {
uint8_t *sp = s->local_mem + src_base;
uint8_t *d = s->local_mem + dst_base;
@@ -894,10 +898,13 @@ static void sm501_2d_operation(SM501State *s)
color = cpu_to_le16(color);
}

+#ifdef CONFIG_PIXMAN
if (!(s->use_pixman & BIT(0)) || (width == 1 && height == 1) ||
!pixman_fill((uint32_t *)&s->local_mem[dst_base],
 dst_pitch * bypp / sizeof(uint32_t), 8 * bypp,
- dst_x, dst_y, width, height, color)) {
+ dst_x, dst_y, width, height, color))
+#endif
+{
/* fallback when pixman failed or we don't want to call it */
uint8_t *d = s->local_mem + dst_base;
unsigned int x, y, i;
@@ -2038,7 +2045,9 @@ static void sm501_realize_sysbus(DeviceState *dev, Error 
**errp)

static Property sm501_sysbus_properties[] = {
DEFINE_PROP_UINT32("vram-size", SM501SysBusState, vram_size, 0),
+#ifdef CONFIG_PIXMAN
DEFINE_PROP_UINT8("x-pixman", SM501SysBusState, state.use_pixman, 7),
+#endif


Do we want to omit the property when compiled without pixman? I think we 
could leave it there and it would just be ignored without pixman, but the 
same command line would still work and we would need fewer ifdefs in the code.
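
Purely as a hypothetical sketch (not something taken from the patch), that
would mean keeping the property definition unconditional and letting the
value simply have no effect when pixman is not built in, e.g.:

    static Property sm501_sysbus_properties[] = {
        DEFINE_PROP_UINT32("vram-size", SM501SysBusState, vram_size, 0),
        /* always present; without CONFIG_PIXMAN the bits are never used */
        DEFINE_PROP_UINT8("x-pixman", SM501SysBusState, state.use_pixman, 7),
        DEFINE_PROP_END_OF_LIST(),
    };

That keeps the command line stable across builds, at the cost of silently
accepting a knob that does nothing.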


Otherwise:

Reviewed-by: BALATON Zoltan 

Regards,
BALATON Zoltan


DEFINE_PROP_END_OF_LIST(),
};

@@ -2126,7 +2135,9 @@ static void sm501_realize_pci(PCIDevice *dev, Error 
**errp)

static Property sm501_pci_properties[] = {
DEFINE_PROP_UINT32("vram-size", SM501PCIState, vram_size, 64 * MiB),
+#ifdef CONFIG_PIXMAN
DEFINE_PROP_UINT8("x-pixman", SM501PCIState, state.use_pixman, 7),
+#endif
DEFINE_PROP_END_OF_LIST(),
};

@@ -2169,8 +2180,10 @@ static void sm501_pci_class_init(ObjectClass *klass, 
void *data)

static void sm501_pci_init(Object *o)
{
+#ifdef CONFIG_PIXMAN
object_property_set_description(o, "x-pixman", "Use pixman for: "
"1: fill, 2: blit, 4: overlap blit");
+#endif
}

static const TypeInfo sm501_pci_info = {


Re: [PATCH v1 00/22] vfio: Adopt iommufd

2023-09-18 Thread Jason Gunthorpe
On Mon, Sep 18, 2023 at 02:23:48PM +0200, Cédric Le Goater wrote:
> On 9/18/23 13:51, Jason Gunthorpe wrote:
> > On Fri, Sep 15, 2023 at 02:42:48PM +0200, Cédric Le Goater wrote:
> > > On 8/30/23 12:37, Zhenzhong Duan wrote:
> > > > Hi All,
> > > > 
> > > > As the kernel side iommufd cdev and hot reset feature have been queued,
> > > > also hwpt alloc has been added in Jason's for_next branch [1], I'd like
> > > > to update a new version matching kernel side update and with rfc flag
> > > > removed. Qemu code can be found at [2], look forward more comments!
> > > 
> > > FYI, I have started cleaning up the VFIO support in QEMU PPC. First
> > > is the removal of nvlink2, which was dropped from the kernel 2.5 years
> > > ago. Next is probably removal of all the PPC bits in VFIO. Code is
> > > bitrotting and AFAICT VFIO has been broken on these platforms since
> > > 5.18 or so.
> > 
> > It was fixed since then - at least one company (not IBM) still cares
> > about vfio on ppc, though I think it is for a DPDK use case not VFIO.
> 
> Indeed.
> I just checked on a POWER9 box running a debian sid (6.4) and device
> assignment of a simple NIC (e1000e) in a ubuntu 23.04 guest worked
> correctly. Using a 6.6-rc1 on the host worked also. One improvement
> would be to reflect in the Kconfig files that CONFIG_IOMMUFD is not
> supported on PPC so that it can not be selected.

When we did this I thought there were other iommu drivers on Power
that did work with VFIO (fsl_pamu specifically), but it turns out that
ppc iommu driver doesn't support VFIO and the VFIO FSL stuff is for
ARM only.

So it could be done...

These days I believe we have the capacity to do the PPC stuff without
making it so special - it would be a lot of work but the road is pretty
clear. At least if qemu wants to remove PPC VFIO support I would not
object.

Jason



Re: [PULL 00/19] crypto: Provide clmul.h and host accel

2023-09-18 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any 
user-visible changes.




Re: [PATCH v2 qemu 3/3] dump: Add qmp argument "reassembled"

2023-09-18 Thread Daniel P . Berrangé
On Mon, Sep 18, 2023 at 10:34:30AM -0700, Stephen Brennan wrote:
> Daniel P. Berrangé  writes:
> > #
> > # @DumpGuestMemoryFormat:
> > #
> > # An enumeration of guest-memory-dump's format.
> > #
> > # @elf: elf format
> > #
> > # @kdump-zlib: makedumpfile flattened, kdump-compressed format with 
> > zlib-compressed
> > #
> > # @kdump-lzo: makedumpfile flattened, kdump-compressed format with 
> > lzo-compressed
> > #
> > # @kdump-snappy: makedumpfile flattened, kdump-compressed format with 
> > snappy-compressed
> > #
> > # @kdump-raw-zlib: raw assembled kdump-compressed format with 
> > zlib-compressed (since 8.2)
> > #
> > # @kdump-raw-lzo: raw assembled kdump-compressed format with lzo-compressed 
> > (since 8.2)
> > #
> > # @kdump-raw-snappy: raw assembled kdump-compressed format with 
> > snappy-compressed (since 8.2)
> > #
> > # @win-dmp: Windows full crashdump format, can be used instead of ELF
> > # converting (since 2.13)
> > #
> > # Since: 2.0
> > ##
> > { 'enum': 'DumpGuestMemoryFormat',
> >   'data': [ 'elf',
> > 'kdump-zlib', 'kdump-lzo', 'kdump-snappy',
> > 'kdump-raw-zlib', 'kdump-raw-lzo', 'kdump-raw-snappy',
> > 'win-dmp' ] }
> 
> Hi Daniel,
> 
> Sure, I'll go ahead and use this approach instead. One question: I see
> that this generates the enumeration DumpGuestMemoryFormat in
> qapi-types-dump.h. I just wanted to double-check if there's any ABI
> considerations for the numbering of this enum? Inserting kdump-raw-* at
> this point would result in 'win-dmp' getting a different numbering, and
> it seems possible that the API/ABI which libvirt uses might depend on
> the enumeration values not changing. E.G. if libvirt is built against
> one version of Qemu and then used with a different one.

The QAPI integer representation of enums is a private internal impl
detail known only to QEMU.

In terms of QMP, the on the wire representation is exclusively string
format, so safe wrt re-ordering for new/old QEMU and new/old libvirt.
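
To illustrate, the generated C code only ever uses the integer as an index
into a string table, roughly of this shape (a sketch; the generated names
are approximate):

    const QEnumLookup DumpGuestMemoryFormat_lookup = {
        .array = (const char *const[]) {
            [DUMP_GUEST_MEMORY_FORMAT_ELF] = "elf",
            [DUMP_GUEST_MEMORY_FORMAT_KDUMP_ZLIB] = "kdump-zlib",
            /* ... */
            [DUMP_GUEST_MEMORY_FORMAT_WIN_DMP] = "win-dmp",
        },
        .size = DUMP_GUEST_MEMORY_FORMAT__MAX,
    };

Only the strings ever appear on the wire, so the numeric indexes can change
freely between QEMU versions.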


In libvirt's own public API, if we chose to expose these new formats,
then we have to strictly append after the existing enums constants
in libvirt's header file.

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH 2/3] hw/mem/cxl_type3: Add missing copyright and license notice

2023-09-18 Thread Peter Maydell
On Mon, 18 Sept 2023 at 18:26, Dave Jiang  wrote:
>
>
>
> On 9/18/23 10:00, Jonathan Cameron wrote:
> > On Mon, 18 Sep 2023 17:31:38 +0100
> > Peter Maydell  wrote:
> >
> >> On Mon, 18 Sept 2023 at 16:04, Jonathan Cameron
> >>  wrote:
> >>>
> >>> This has been missing from the start. Assume it should match
> >>> with cxl/cxl-component-utils.c as both were part of early
> >>> postings from Ben.
> >>
> >> Sounds plausible -- is there an Intel person who could give us
> >> an acked-by for this?
> >>
> >> (Ideally we wouldn't have let more gpl-2-only code into the
> >> codebase without a rationale...)
> >>
> >
> > I've +CC'd the kernel CXL maintainers from Intel a few of whom
> > have also contributed some of the QEMU CXL code.
> > Hopefully someone can ack.
>
> I see that nvdimm.c from Intel is under LGPL 2.1. What is the typical license 
> that should be applied for QEMU?

The project has a mix of licenses, for mostly historical reasons.
The overall license is thus GPLv2 (as the most-restrictive of the set).
Our preference (as noted in the top level LICENSE file) for new
code is for GPL-v2-or-later; we can take other GPL-2-compatible
licenses (preferably GPL-v2-or-later compatible) if there's a
good rationale from the submitter. (Historically, one reason
for the GPL-v2-only code has been "this came from the Linux
kernel and so it's GPL-2-only"; "we copied a lot of this code
from some other file in QEMU and that has license X" is
the other one.)

thanks
-- PMM



Re: [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> @error_reported and @received_error are flags.  The latter is even
> assigned bool true.  Change them from int to bool.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



[PATCH v4 1/1] cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

2023-09-18 Thread Gregory Price
Create a new device to emulate the SK hynix Niagara MHSLD platform.

This device has custom CCI commands that allow for applying isolation
to each memory block between hosts. This enables an early form of
dynamic capacity, whereby the NUMA node maps the entire region, but
the host is responsible for asking the device which memory blocks
are allocated to it, and therefore may be onlined.

To instantiate:

-device 
cxl-skh-niagara,cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0,sn=6,mhd-head=0,mhd-shmid=15

The Linux kernel will require raw CXL commands to be enabled to allow
passing through of Niagara CXL commands via the CCI mailbox.

The Niagara MH-SLD has a shared memory region that must be initialized
using the 'init_niagara' tool located in the vendor subdirectory

usage: init_niagara <heads> <sections> <section_size> <shmid>
heads : number of heads on the device
sections  : number of sections
section_size  : size of a section in 128mb increments
shmid : shmid produced by ipcmk

Example:
$shmid1=ipcmk -M 131072
./init_niagara 4 32 1 $shmid1

Signed-off-by: Gregory Price 
Signed-off-by: Junhee Ryu 
Signed-off-by: Kwangjin Ko 
---
 hw/cxl/Kconfig  |   6 +
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/Kconfig   |   1 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/Kconfig   |   4 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   3 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 516 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 162 
 10 files changed, 795 insertions(+)
 create mode 100644 hw/cxl/vendor/Kconfig
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/Kconfig
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h

diff --git a/hw/cxl/Kconfig b/hw/cxl/Kconfig
index c9b2e46bac..88022008c7 100644
--- a/hw/cxl/Kconfig
+++ b/hw/cxl/Kconfig
@@ -1,6 +1,12 @@
+source vendor/Kconfig
+
 config CXL
 bool
 default y if PCI_EXPRESS
 
+config CXL_VENDOR
+bool
+default y
+
 config I2C_MCTP_CXL
 bool
diff --git a/hw/cxl/meson.build b/hw/cxl/meson.build
index 1393821fc4..e8c8c1355a 100644
--- a/hw/cxl/meson.build
+++ b/hw/cxl/meson.build
@@ -15,3 +15,5 @@ system_ss.add(when: 'CONFIG_CXL',
 system_ss.add(when: 'CONFIG_I2C_MCTP_CXL', if_true: files('i2c_mctp_cxl.c'))
 
 system_ss.add(when: 'CONFIG_ALL', if_true: files('cxl-host-stubs.c'))
+
+subdir('vendor')
diff --git a/hw/cxl/vendor/Kconfig b/hw/cxl/vendor/Kconfig
new file mode 100644
index 00..aa23bb051b
--- /dev/null
+++ b/hw/cxl/vendor/Kconfig
@@ -0,0 +1 @@
+source skhynix/Kconfig
diff --git a/hw/cxl/vendor/meson.build b/hw/cxl/vendor/meson.build
new file mode 100644
index 00..12db8991f1
--- /dev/null
+++ b/hw/cxl/vendor/meson.build
@@ -0,0 +1 @@
+subdir('skhynix')
diff --git a/hw/cxl/vendor/skhynix/.gitignore b/hw/cxl/vendor/skhynix/.gitignore
new file mode 100644
index 00..6d96de38ea
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/.gitignore
@@ -0,0 +1 @@
+init_niagara
diff --git a/hw/cxl/vendor/skhynix/Kconfig b/hw/cxl/vendor/skhynix/Kconfig
new file mode 100644
index 00..382fa0cd6c
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/Kconfig
@@ -0,0 +1,4 @@
+config CXL_SKHYNIX_NIAGARA
+bool
+depends on CXL_MEM_DEVICE
+default y if CXL_VENDOR
diff --git a/hw/cxl/vendor/skhynix/init_niagara.c 
b/hw/cxl/vendor/skhynix/init_niagara.c
new file mode 100644
index 00..2c189dc33c
--- /dev/null
+++ b/hw/cxl/vendor/skhynix/init_niagara.c
@@ -0,0 +1,99 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Copyright (c) 2023 MemVerge Inc.
+ * Copyright (c) 2023 SK hynix Inc.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct niagara_state {
+uint8_t nr_heads;
+uint8_t nr_lds;
+uint8_t ldmap[65536];
+uint32_t total_sections;
+uint32_t free_sections;
+uint32_t section_size;
+uint32_t sections[];
+};
+
+int main(int argc, char *argv[])
+{
+int shmid = 0;
+uint32_t sections = 0;
+uint32_t section_size = 0;
+uint32_t heads = 0;
+struct niagara_state *niagara_state = NULL;
+size_t state_size;
+uint8_t i;
+
+if (argc != 5) {
+printf("usage: init_niagara <heads> <sections> <section_size> <shmid>\n"
+"\theads : number of heads on the device\n"
+"\tsections  : number of sections\n"
+"\tsection_size  : size of a section in 128mb increments\n"
+"\tshmid : /tmp/mytoken.tmp\n\n"
+"It is recommended your shared memory 

[PATCH v4 0/1] Niagara MHSLD

2023-09-18 Thread Gregory Price
v4 update: Kconfig and meson fixes

Since Niagara uses , it presently can only be built
for Linux.  Also adds missing Kconfig files and options
to turn it off, and turns it off by default if CXL_VENDOR or
CXL_MEM_DEVICE are turned off.

Gregory Price (1):
  cxl/vendor: SK hynix Niagara Multi-Headed SLD Device

 hw/cxl/Kconfig  |   6 +
 hw/cxl/meson.build  |   2 +
 hw/cxl/vendor/Kconfig   |   1 +
 hw/cxl/vendor/meson.build   |   1 +
 hw/cxl/vendor/skhynix/.gitignore|   1 +
 hw/cxl/vendor/skhynix/Kconfig   |   4 +
 hw/cxl/vendor/skhynix/init_niagara.c|  99 +
 hw/cxl/vendor/skhynix/meson.build   |   3 +
 hw/cxl/vendor/skhynix/skhynix_niagara.c | 516 
 hw/cxl/vendor/skhynix/skhynix_niagara.h | 162 
 10 files changed, 795 insertions(+)
 create mode 100644 hw/cxl/vendor/Kconfig
 create mode 100644 hw/cxl/vendor/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/.gitignore
 create mode 100644 hw/cxl/vendor/skhynix/Kconfig
 create mode 100644 hw/cxl/vendor/skhynix/init_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/meson.build
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.c
 create mode 100644 hw/cxl/vendor/skhynix/skhynix_niagara.h

-- 
2.39.1




Re: [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> qemu_rdma_buffer_mergable() is semantically a predicate.  It returns
> int 0 or 1.  Return bool instead.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() error handling

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> qemu_rdma_search_ram_block() can't fail.  Return void, and drop the
> unreachable error handling.
>
> Signed-off-by: Markus Armbruster 
> ---
>  migration/rdma.c | 22 --
>  1 file changed, 8 insertions(+), 14 deletions(-)
>
> diff --git a/migration/rdma.c b/migration/rdma.c
> index 2b0f9d52d8..98520a42b4 100644
> --- a/migration/rdma.c
> +++ b/migration/rdma.c
> @@ -1234,12 +1234,12 @@ static int qemu_rdma_reg_whole_ram_blocks(RDMAContext 
> *rdma)
>   *
>   * This search cannot fail or the migration will fail.
>   */

This comment can be removed as well.

> -static int qemu_rdma_search_ram_block(RDMAContext *rdma,
> -  uintptr_t block_offset,
> -  uint64_t offset,
> -  uint64_t length,
> -  uint64_t *block_index,
> -  uint64_t *chunk_index)
> +static void qemu_rdma_search_ram_block(RDMAContext *rdma,
> +   uintptr_t block_offset,
> +   uint64_t offset,
> +   uint64_t length,
> +   uint64_t *block_index,
> +   uint64_t *chunk_index)
>  {
>  uint64_t current_addr = block_offset + offset;
>  RDMALocalBlock *block = g_hash_table_lookup(rdma->blockmap,
> @@ -1251,8 +1251,6 @@ static int qemu_rdma_search_ram_block(RDMAContext *rdma,
>  *block_index = block->index;
>  *chunk_index = ram_chunk_index(block->local_host_addr,
>  block->local_host_addr + (current_addr - block->offset));
> -
> -return 0;
>  }
>  
>  /*
> @@ -2321,12 +2319,8 @@ static int qemu_rdma_write(QEMUFile *f, RDMAContext 
> *rdma,
>  rdma->current_length = 0;
>  rdma->current_addr = current_addr;
>  
> -ret = qemu_rdma_search_ram_block(rdma, block_offset,
> - offset, len, &index, &chunk);
> -if (ret) {
> -error_report("ram block search failed");
> -return ret;
> -}
> +qemu_rdma_search_ram_block(rdma, block_offset,
> +   offset, len, &index, &chunk);
>  rdma->current_index = index;
>  rdma->current_chunk = chunk;
>  }



Re: [PATCH v2 qemu 3/3] dump: Add qmp argument "reassembled"

2023-09-18 Thread Stephen Brennan
Daniel P. Berrangé  writes:
> On Wed, Sep 13, 2023 at 06:03:15PM -0700, Stephen Brennan wrote:
>> This can be used from QMP command line as "-R" to mirror the
>> corresponding flag for makedumpfile. This enables the kdump_reassembled
>> flag introduced in the previous patch.
>> 
>> Signed-off-by: Stephen Brennan 
>> ---
>>  dump/dump-hmp-cmds.c |  8 +++-
>>  dump/dump.c  | 12 +++-
>>  hmp-commands.hx  |  7 +--
>>  qapi/dump.json   | 14 +-
>>  4 files changed, 36 insertions(+), 5 deletions(-)
>
>> diff --git a/qapi/dump.json b/qapi/dump.json
>> index 4ae1f722a9..9cc7c3ea93 100644
>> --- a/qapi/dump.json
>> +++ b/qapi/dump.json
>> @@ -69,6 +69,18 @@
>>  # to dump all guest's memory, please specify the start @begin and
>>  # @length
>>  #
>> +# @reassembled: if false (the default), the kdump output formats will use 
>> the
>> +# "makedumpfile flattened" variant of the format, which is less broadly
>> +# compatible with analysis tools. The flattened dump can be reassembled
>> +# after the fact using the command "makedumpfile -R". If true, Qemu
>> +# attempts to generate the standard kdump format. This requires a
>> +# seekable file as output -- if the output file is not seekable, then
>> +# the flattened format is still generated. The standard format is more
>> +# broadly compatible with debugging tools, but generating it requires a
>> +# seekable output file descriptor, and could use more system memory due
>> +# to page cache utilization. This should be left unspecified for 
>> non-kdump
>> +# output formats.
>> +#
>>  # @format: if specified, the format of guest memory dump.  But non-elf
>>  # format is conflict with paging and filter, ie.  @paging, @begin
>>  # and @length is not allowed to be specified with non-elf @format
>> @@ -89,7 +101,7 @@
>>  { 'command': 'dump-guest-memory',
>>'data': { 'paging': 'bool', 'protocol': 'str', '*detach': 'bool',
>>  '*begin': 'int', '*length': 'int',
>> -'*format': 'DumpGuestMemoryFormat'} }
>> +'*reassembled': 'bool', '*format': 'DumpGuestMemoryFormat'} }
>
> The 'reassembled' flag is changing the meaning of 3 out of the 5
> 'format' enum values. IMHO, we should just be adding new formats
> instead of changing the meaning of existing formats. It is a shame
> we have the current 'kdump' naming prefix, but we're stuck with
> that for backwards compat, we need a new prefix. I'd suggest
> 'kdump-raw'. eg
>
> #
> # @DumpGuestMemoryFormat:
> #
> # An enumeration of guest-memory-dump's format.
> #
> # @elf: elf format
> #
> # @kdump-zlib: makedumpfile flattened, kdump-compressed format with 
> zlib-compressed
> #
> # @kdump-lzo: makedumpfile flattened, kdump-compressed format with 
> lzo-compressed
> #
> # @kdump-snappy: makedumpfile flattened, kdump-compressed format with 
> snappy-compressed
> #
> # @kdump-raw-zlib: raw assembled kdump-compressed format with zlib-compressed 
> (since 8.2)
> #
> # @kdump-raw-lzo: raw assembled kdump-compressed format with lzo-compressed 
> (since 8.2)
> #
> # @kdump-raw-snappy: raw assembled kdump-compressed format with 
> snappy-compressed (since 8.2)
> #
> # @win-dmp: Windows full crashdump format, can be used instead of ELF
> # converting (since 2.13)
> #
> # Since: 2.0
> ##
> { 'enum': 'DumpGuestMemoryFormat',
>   'data': [ 'elf',
> 'kdump-zlib', 'kdump-lzo', 'kdump-snappy',
> 'kdump-raw-zlib', 'kdump-raw-lzo', 'kdump-raw-snappy',
> 'win-dmp' ] }

Hi Daniel,

Sure, I'll go ahead and use this approach instead. One question: I see
that this generates the enumeration DumpGuestMemoryFormat in
qapi-types-dump.h. I just wanted to double-check if there's any ABI
considerations for the numbering of this enum? Inserting kdump-raw-* at
this point would result in 'win-dmp' getting a different numbering, and
it seems possible that the API/ABI which libvirt uses might depend on
the enumeration values not changing. E.G. if libvirt is built against
one version of Qemu and then used with a different one.

Thanks,
Stephen

> With regards,
> Daniel
> -- 
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> rdma_add_block() can't fail.  Return void, and drop the unreachable
> error handling.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



[PATCH 2/8] migration: Fix possible race when setting rp_state.error

2023-09-18 Thread Fabiano Rosas
We don't need to set the rp_state.error right after a shutdown because
qemu_file_shutdown() always sets the QEMUFile error, so the return
path thread would have seen it and set the rp error itself.

Setting the error outside of the thread is also racy because the
thread could clear it after we set it.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3ee1e6b0d6..d426b69ada 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2074,7 +2074,6 @@ static int 
await_return_path_close_on_source(MigrationState *ms)
  * waiting for the destination.
  */
 qemu_file_shutdown(ms->rp_state.from_dst_file);
-mark_source_rp_bad(ms);
 }
 trace_await_return_path_close_on_source_joining();
 qemu_thread_join(&ms->rp_state.rp_thread);
-- 
2.35.3




[PATCH 4/8] migration: Fix possible race when shutting down to_dst_file

2023-09-18 Thread Fabiano Rosas
It's not safe to call qemu_file_shutdown() on the to_dst_file without
first checking for the file's presence under the lock. The cleanup of
this file happens at postcopy_pause() and migrate_fd_cleanup() which
are not necessarily running in the same thread as migrate_fd_cancel().

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 15b7258bb2..6e09463466 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1246,7 +1246,7 @@ static void migrate_fd_error(MigrationState *s, const 
Error *error)
 static void migrate_fd_cancel(MigrationState *s)
 {
 int old_state ;
-QEMUFile *f = migrate_get_current()->to_dst_file;
+
 trace_migrate_fd_cancel();
 
 WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
@@ -1272,11 +1272,13 @@ static void migrate_fd_cancel(MigrationState *s)
  * If we're unlucky the migration code might be stuck somewhere in a
  * send/write while the network has failed and is waiting to timeout;
  * if we've got shutdown(2) available then we can force it to quit.
- * The outgoing qemu file gets closed in migrate_fd_cleanup that is
- * called in a bh, so there is no race against this cancel.
  */
-if (s->state == MIGRATION_STATUS_CANCELLING && f) {
-qemu_file_shutdown(f);
+if (s->state == MIGRATION_STATUS_CANCELLING) {
+WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
+if (s->to_dst_file) {
+qemu_file_shutdown(s->to_dst_file);
+}
+}
 }
 if (s->state == MIGRATION_STATUS_CANCELLING && s->block_inactive) {
 Error *local_err = NULL;
@@ -1536,12 +1538,14 @@ void qmp_migrate_pause(Error **errp)
 {
 MigrationState *ms = migrate_get_current();
 MigrationIncomingState *mis = migration_incoming_get_current();
-int ret;
+int ret = 0;
 
 if (ms->state == MIGRATION_STATUS_POSTCOPY_ACTIVE) {
 /* Source side, during postcopy */
 qemu_mutex_lock(&ms->qemu_file_lock);
-ret = qemu_file_shutdown(ms->to_dst_file);
+if (ms->to_dst_file) {
+ret = qemu_file_shutdown(ms->to_dst_file);
+}
 qemu_mutex_unlock(&ms->qemu_file_lock);
 if (ret) {
 error_setg(errp, "Failed to pause source migration");
-- 
2.35.3




[PATCH 6/8] migration: Consolidate return path closing code

2023-09-18 Thread Fabiano Rosas
We'll start calling the await_return_path_close_on_source() function
from other parts of the code, so move all of the related checks and
tracepoints into it.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4372b0fbbf..f6c0250d33 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2061,6 +2061,14 @@ static int open_return_path_on_source(MigrationState *ms,
 /* Returns 0 if the RP was ok, otherwise there was an error on the RP */
 static int await_return_path_close_on_source(MigrationState *ms)
 {
+int ret;
+
+if (!ms->rp_state.rp_thread_created) {
+return 0;
+}
+
+trace_migration_return_path_end_before();
+
 /*
  * If this is a normal exit then the destination will send a SHUT
  * and the rp_thread will exit, however if there's an error we
@@ -2078,7 +2086,10 @@ static int 
await_return_path_close_on_source(MigrationState *ms)
 qemu_thread_join(&ms->rp_state.rp_thread);
 ms->rp_state.rp_thread_created = false;
 trace_await_return_path_close_on_source_close();
-return ms->rp_state.error;
+
+ret = ms->rp_state.error;
+trace_migration_return_path_end_after(ret);
+return ret;
 }
 
 static inline void
@@ -2374,20 +2385,8 @@ static void migration_completion(MigrationState *s)
 goto fail;
 }
 
-/*
- * If rp was opened we must clean up the thread before
- * cleaning everything else up (since if there are no failures
- * it will wait for the destination to send it's status in
- * a SHUT command).
- */
-if (s->rp_state.rp_thread_created) {
-int rp_error;
-trace_migration_return_path_end_before();
-rp_error = await_return_path_close_on_source(s);
-trace_migration_return_path_end_after(rp_error);
-if (rp_error) {
-goto fail;
-}
+if (await_return_path_close_on_source(s)) {
+goto fail;
 }
 
 if (qemu_file_get_error(s->to_dst_file)) {
-- 
2.35.3




[PATCH 8/8] migration: Move return path cleanup to main migration thread

2023-09-18 Thread Fabiano Rosas
Now that the return path thread is allowed to finish during a paused
migration, we can move the cleanup of the QEMUFiles to the main
migration thread.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/migration/migration.c b/migration/migration.c
index af78f7ee54..e2ed85b5be 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -98,6 +98,7 @@ static int migration_maybe_pause(MigrationState *s,
  int *current_active_state,
  int new_state);
 static void migrate_fd_cancel(MigrationState *s);
+static int await_return_path_close_on_source(MigrationState *s);
 
 static bool migration_needs_multiple_sockets(void)
 {
@@ -1178,6 +1179,12 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_fclose(tmp);
 }
 
+/*
+ * We already cleaned up to_dst_file, so errors from the return
+ * path might be due to that, ignore them.
+ */
+await_return_path_close_on_source(s);
+
 assert(!migration_is_active(s));
 
 if (s->state == MIGRATION_STATUS_CANCELLING) {
@@ -1997,7 +2004,6 @@ out:
 }
 
 trace_source_return_path_thread_end();
-migration_release_dst_files(ms);
 rcu_unregister_thread();
 return NULL;
 }
@@ -2051,6 +2057,9 @@ static int 
await_return_path_close_on_source(MigrationState *ms)
 
 ret = ms->rp_state.error;
 ms->rp_state.error = false;
+
+migration_release_dst_files(ms);
+
 trace_migration_return_path_end_after(ret);
 return ret;
 }
-- 
2.35.3




[PATCH 0/8] migration fixes

2023-09-18 Thread Fabiano Rosas
This series contains fixes for the two currently known failures that
show up in migration tests, plus a set of fixes for some theoretical
race conditions around QEMUFile handling.

Patch 1 addresses the issue found in the postcopy/preempt/plain test:
https://gitlab.com/qemu-project/qemu/-/issues/1886

Patch 7 fixes a rare crash during the postcopy/preempt/recovery/plain test:

  Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
  0x560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at 
../migration/qemu-file.c:154
  154 return f->last_error;

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1008652837

Fabiano Rosas (7):
  migration: Fix possible race when setting rp_state.error
  migration: Fix possible races when shutting down the return path
  migration: Fix possible race when shutting down to_dst_file
  migration: Remove redundant cleanup of postcopy_qemufile_src
  migration: Consolidate return path closing code
  migration: Replace the return path retry logic
  migration: Move return path cleanup to main migration thread

Peter Xu (1):
  migration: Fix race that dest preempt thread close too early

 migration/migration.c| 145 +++
 migration/migration.h|  14 +++-
 migration/postcopy-ram.c |  38 +-
 3 files changed, 106 insertions(+), 91 deletions(-)

-- 
2.35.3




[PATCH 1/8] migration: Fix race that dest preempt thread close too early

2023-09-18 Thread Fabiano Rosas
From: Peter Xu 

We hit an intermittent CI issue failing at migration-test over the unit
test preempt/plain:

qemu-system-x86_64: Unable to read from socket: Connection reset by peer
Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current 
= 4f hit_edge = 1
**
ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: 
(bad == 0)
(test program exited with status code -6)

Fabiano debugged into it and found that the preempt thread can quit even
without receiving all the pages, which can cause the guest not to receive
all the pages, corrupting the guest memory.

To make sure the preempt thread has finished receiving all the pages, we
can rely on page_requested_count being zero, because the preempt channel
will only receive requested page faults. Note that not all the faulted
pages are required to be sent via the preempt channel/thread; imagine the
case when a requested page is just queued into the background main channel
for migration: the src qemu will still send it via the background channel.

Here, instead of spinning over reading the count, we add a condvar so the
main thread can wait on it if that unusual case happens, without burning
the cpu for no good reason, even if the duration is short; spinning in
this rare case would probably be fine, but it's better not to do so.

The condvar is only used when that special case is triggered.  Some memory
ordering trickery is needed (against the preempt thread status field) to
guarantee that the main thread will always get a kick when that case
triggers.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
Debugged-by: Fabiano Rosas 
Signed-off-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c|  3 ++-
 migration/migration.h| 13 -
 migration/postcopy-ram.c | 38 +-
 3 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d61e572742..3ee1e6b0d6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -153,6 +153,7 @@ void migration_object_init(void)
 qemu_sem_init(&current_incoming->postcopy_qemufile_dst_done, 0);
 
 qemu_mutex_init(&current_incoming->page_request_mutex);
+qemu_cond_init(&current_incoming->page_request_cond);
 current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
 
 migration_object_check(current_migration, _fatal);
@@ -367,7 +368,7 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
  * things like g_tree_lookup() will return TRUE (1) when found.
  */
 g_tree_insert(mis->page_requested, aligned, (gpointer)1);
-mis->page_requested_count++;
+qatomic_inc(&mis->page_requested_count);
 trace_postcopy_page_req_add(aligned, mis->page_requested_count);
 }
 }
diff --git a/migration/migration.h b/migration/migration.h
index c390500604..cdaa10d515 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -196,7 +196,10 @@ struct MigrationIncomingState {
 
 /* A tree of pages that we requested to the source VM */
 GTree *page_requested;
-/* For debugging purpose only, but would be nice to keep */
+/*
+ * For postcopy only, count the number of requested page faults that
+ * still haven't been resolved.
+ */
 int page_requested_count;
 /*
  * The mutex helps to maintain the requested pages that we sent to the
@@ -210,6 +213,14 @@ struct MigrationIncomingState {
  * contains valid information.
  */
 QemuMutex page_request_mutex;
+/*
+ * If postcopy preempt is enabled, there is a chance that the main
+ * thread finished loading its data before the preempt channel has
+ * finished loading the urgent pages.  If that happens, the two threads
+ * will use this condvar to synchronize, so the main thread will always
+ * wait until all pages received.
+ */
+QemuCond page_request_cond;
 
 /*
  * Number of devices that have yet to approve switchover. When this reaches
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 29aea9456d..5408e028c6 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -599,6 +599,30 @@ int postcopy_ram_incoming_cleanup(MigrationIncomingState 
*mis)
 if (mis->preempt_thread_status == PREEMPT_THREAD_CREATED) {
 /* Notify the fast load thread to quit */
 mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
+/*
+ * Update preempt_thread_status before reading count.  Note: mutex
+ * lock only provide ACQUIRE semantic, and it doesn't stops this
+ * write to be reordered after reading the count.
+ */
+smp_mb();
+/*
+ * It's possible that the preempt thread is still handling the last
+ * pages to arrive which were requested by guest page faults.
+ * Making sure nothing is left behind by waiting on the condvar if
+ * 
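(The last hunk is cut off in this archive.  For reference, a rough sketch of
the wait/kick pairing the commit message describes -- the names are taken from
the patch above, but the exact loop condition and call sites are assumptions,
not the literal hunks:)

    /*
     * Sketch only: main thread side, during postcopy incoming cleanup --
     * wait until every requested page fault that the preempt channel still
     * owes has been resolved.
     */
    qemu_mutex_lock(&mis->page_request_mutex);
    while (qatomic_read(&mis->page_requested_count)) {
        qemu_cond_wait(&mis->page_request_cond, &mis->page_request_mutex);
    }
    qemu_mutex_unlock(&mis->page_request_mutex);

    /*
     * Sketch only: wherever a requested page is resolved (with
     * page_request_mutex held), kick the main thread once the count drops
     * to zero and it has already asked the preempt thread to quit.
     */
    if (qatomic_fetch_dec(&mis->page_requested_count) == 1 &&
        mis->preempt_thread_status == PREEMPT_THREAD_QUIT) {
        qemu_cond_signal(&mis->page_request_cond);
    }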

[PATCH 7/8] migration: Replace the return path retry logic

2023-09-18 Thread Fabiano Rosas
Replace the return path retry logic with finishing and restarting the
thread. This fixes a race when resuming the migration that leads to a
segfault.

Currently when doing postcopy we consider that an IO error on the
return path file could be due to a network intermittency. We then keep
the thread alive but have it do cleanup of the 'from_dst_file' and
wait on the 'postcopy_pause_rp' semaphore. When the user issues a
migrate resume, a new return path is opened and the thread is allowed
to continue.

There's a race condition in the above mechanism. It is possible for
the new return path file to be setup *before* the cleanup code in the
return path thread has had a chance to run, leading to the *new* file
being closed and the pointer set to NULL. When the thread is released
after the resume, it tries to dereference 'from_dst_file' and crashes:

Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
0x560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at 
../migration/qemu-file.c:154
154 return f->last_error;

(gdb) bt
 #0  0x560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at 
../migration/qemu-file.c:154
 #1  0x560e4983 in qemu_file_get_error (f=0x0) at 
../migration/qemu-file.c:206
 #2  0x55b9a1df in source_return_path_thread (opaque=0x56e06000) at 
../migration/migration.c:1876
 #3  0x5602e14f in qemu_thread_start (args=0x5782e780) at 
../util/qemu-thread-posix.c:541
 #4  0x738d76ea in start_thread (arg=0x7fffd1dbf700) at 
pthread_create.c:477
 #5  0x735efa6f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Here's the race (important bit is open_return_path happening before
migration_release_dst_files):

migration                    | qmp                         | return path
-----------------------------+-----------------------------+---------------------
                             | qmp_migrate_pause()         |
                             |  shutdown(ms->to_dst_file)  |
                             |   f->last_error = -EIO      |
migrate_detect_error()       |                             |
 postcopy_pause()            |                             |
  set_state(PAUSED)          |                             |
  wait(postcopy_pause_sem)   |                             |
                             | qmp_migrate(resume)         |
                             | migrate_fd_connect()        |
                             |  resume = state == PAUSED   |
                             |  open_return_path <-- TOO SOON!
                             |  set_state(RECOVER)         |
                             |  post(postcopy_pause_sem)   |
                             |                             | (incoming closes to_src_file)
                             |                             | res = qemu_file_get_error(rp)
                             |                             | migration_release_dst_files()
                             |                             | ms->rp_state.from_dst_file = NULL
  post(postcopy_pause_rp_sem)|                             |
                             |                             | postcopy_pause_return_path_thread()
                             |                             |   wait(postcopy_pause_rp_sem)
                             |                             | rp = ms->rp_state.from_dst_file
                             |                             | goto retry
                             |                             | qemu_file_get_error(rp)
                             |                             | SIGSEGV
---

We can keep the retry logic without having the thread alive and
waiting. The only piece of data used by it is the 'from_dst_file' and
it is only allowed to proceed after a migrate resume is issued and the
semaphore released at migrate_fd_connect().

Move the retry logic to outside the thread by waiting for the thread
to finish before pausing the migration.
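
Roughly, the shape this gives the pause path is the following (a sketch of the
idea only, not the literal hunk; the real function also cleans up to_dst_file
and loops over recovery attempts):

    static MigThrError postcopy_pause_sketch(MigrationState *s)
    {
        /*
         * Ignore any error reported by the return path and simply join the
         * thread here; a fresh thread is created by
         * open_return_path_on_source() when the user resumes.
         */
        await_return_path_close_on_source(s);

        migrate_set_state(&s->state, s->state,
                          MIGRATION_STATUS_POSTCOPY_PAUSED);

        /* Woken up by migrate_fd_connect() on qmp_migrate(resume). */
        qemu_sem_wait(&s->postcopy_pause_sem);

        return MIG_THR_ERR_RECOVERED;
    }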

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 60 ---
 migration/migration.h |  1 -
 2 files changed, 11 insertions(+), 50 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f6c0250d33..af78f7ee54 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1787,18 +1787,6 @@ static void migrate_handle_rp_req_pages(MigrationState 
*ms, const char* rbname,
 }
 }
 
-/* Return true to retry, false to quit */
-static bool postcopy_pause_return_path_thread(MigrationState *s)
-{
-trace_postcopy_pause_return_path();
-
-qemu_sem_wait(&s->postcopy_pause_rp_sem);
-
-trace_postcopy_pause_return_path_continued();
-
-return true;
-}
-
 static int migrate_handle_rp_recv_bitmap(MigrationState *s, char *block_name)
 {
 RAMBlock *block = qemu_ram_block_by_name(block_name);
@@ -1882,7 +1870,6 @@ static void *source_return_path_thread(void *opaque)
 trace_source_return_path_thread_entry();
 rcu_register_thread();
 
-retry:
 while (!ms->rp_state.error && !qemu_file_get_error(rp) &&
migration_is_setup_or_active(ms->state)) {
   

Deadlock with SATA CD I/O and eject

2023-09-18 Thread John Levon


Observed with base of qemu 6.2.0, but from code inspection it looks to me like
it's still current in upstream master. Apologies if I have missed a fix in this
area.

Symptom: run a UEFI-booted SATA CD Windows installer. When it hits "Loading
files.." screen, run an eject e.g.

virsh qemu-monitor-command 64c6e190-ea7f-49e2-b2d5-6ba1814b00ae 
'{"execute":"eject", "arguments": { "id": "sata0-0-0" } }'

qemu will get stuck like so:

gdb) bt
#0  0x7f8ba4b16036 in ppoll () from /lib64/libc.so.6
#1  0x561813c48ed5 in ppoll (__ss=0x0, __timeout=0x7ffcbd981a70, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:62
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=999896128) at ../util/qemu-timer.c:348
#3  0x561813c29be9 in fdmon_poll_wait (ctx=0x56181516e070, 
ready_list=0x7ffcbd981af0, timeout=999896128) at ../util/fdmon-poll.c:80
#4  0x561813c297e1 in aio_poll (ctx=ctx@entry=0x56181516e070, 
blocking=blocking@entry=true) at ../util/aio-posix.c:607
#5  0x561813ae2fad in bdrv_do_drained_begin (poll=true, 
ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x56181533fcc0) at 
../block/io.c:483
#6  bdrv_do_drained_begin (bs=0x56181533fcc0, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at ../block/io.c:446
#7  0x561813ad9982 in blk_drain (blk=0x5618161c1f10) at 
../block/block-backend.c:1741
#8  0x561813ad9b8c in blk_remove_bs (blk=blk@entry=0x5618161c1f10) at 
../block/block-backend.c:852
#9  0x56181382b8ab in blockdev_remove_medium (has_device=<optimized out>, device=<optimized out>, has_id=<optimized out>, id=<optimized out>, errp=0x7ffcbd981c78) at ../block/qapi-sysemu.c:232
#10 0x56181382bfb1 in qmp_eject (has_device=<optimized out>, device=0x0, has_id=<optimized out>, id=0x561815e6efe0 "sata0-0-0", has_force=<optimized out>, force=<optimized out>, errp=0x7ffcbd981c78) at ../block/qapi-sysemu.c:45

We are stuck forever here:

 351 static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent,
 352   bool poll)
...
 380 if (poll) {
 381 BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, parent));
 382 }

Because the blk root's ->in_flight is 1, as tested by the condition
blk_root_drained_poll().


Our blk->in_flight user is stuck here:

1298 static void coroutine_fn blk_wait_while_drained(BlockBackend *blk)
...
1310 blk_dec_in_flight(blk);
1311 qemu_co_queue_wait(&blk->queued_requests, &blk->queued_requests_lock);
1312 blk_inc_in_flight(blk);

Note that before entering this stanza, blk->in_flight was 2. This turns out to
be due to the ide atapi code. In particular, the guest VM is generating lots of
read I/O. The "first IO" arrives into blk via:

cd_read_sector()->ide_buffered_readv()->blk_aio_preadv()

This initial IO completes:

blk_aio_read_entry()->blk_aio_complete()

1560 static void blk_aio_complete(BlkAioEmAIOCB *acb)
1561 {
1562 if (acb->has_returned) {
1563 acb->common.cb(acb->common.opaque, acb->rwco.ret);
1564 blk_dec_in_flight(acb->rwco.blk);
1565 qemu_aio_unref(acb);
1566 }
1567 }

Line 1564 is what we need to move blk->in_flight down to zero, but that is never
reached! This is because of what happens at :1563

acb->common.cb()->cd_read_sector_cb()->ide_atapi_cmd_reply_end()->cd_read_sector_sync()->blk_pread()

That is, the IO callback in the atapi code itself triggers more - synchronous - 
IO.

In the meantime, we start processing the blk_drain() code, so by the time this
blk_pread() actually gets handled, quiesce is set, and we get stuck in the
blk_wait_while_drained().

I don't know the qemu block stack well enough to propose an actual fix.

Experimentally, waiting for ->in_flight to drop to zero *before* we quiesce in
blk_remove_bs() via an AIO_WAIT_WHILE() avoids the symptom, but I'm pretty sure
that's just a band-aid instead of fixing the deadlock.

Any suggestions/clues/thoughts?

thanks
john
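
(For what it's worth, the experiment mentioned above might look roughly like
this -- a band-aid sketch only, not a proposed fix; the placement inside
blk_remove_bs() and the exact condition are assumptions:)

    /*
     * Band-aid sketch: let in-flight requests -- including the nested
     * synchronous blk_pread() issued from the atapi completion callback --
     * finish before the quiesce done for blk_remove_bs().
     */
    void blk_remove_bs(BlockBackend *blk)
    {
        /* ... existing code before the drain ... */

        AIO_WAIT_WHILE(blk_get_aio_context(blk),
                       qatomic_read(&blk->in_flight) > 0);

        blk_drain(blk);

        /* ... existing code: detach the root BdrvChild, etc. ... */
    }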



[PATCH 5/8] migration: Remove redundant cleanup of postcopy_qemufile_src

2023-09-18 Thread Fabiano Rosas
This file is owned by the return path thread which is already doing
cleanup.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 6e09463466..4372b0fbbf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1178,12 +1178,6 @@ static void migrate_fd_cleanup(MigrationState *s)
 qemu_fclose(tmp);
 }
 
-if (s->postcopy_qemufile_src) {
-migration_ioc_unregister_yank_from_file(s->postcopy_qemufile_src);
-qemu_fclose(s->postcopy_qemufile_src);
-s->postcopy_qemufile_src = NULL;
-}
-
 assert(!migration_is_active(s));
 
 if (s->state == MIGRATION_STATUS_CANCELLING) {
-- 
2.35.3




[PATCH 3/8] migration: Fix possible races when shutting down the return path

2023-09-18 Thread Fabiano Rosas
We cannot call qemu_file_shutdown() on the return path file without
taking the file lock. The return path thread could be running its
cleanup code and have just cleared the from_dst_file pointer.

Checking ms->to_dst_file for errors could also race with
migrate_fd_cleanup() which clears the to_dst_file pointer.

Protect both accesses by taking the file lock.

This was caught by inspection; it should be rare, but the next patches
will start calling this code from other places, so let's do the
correct thing.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 migration/migration.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index d426b69ada..15b7258bb2 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2064,17 +2064,18 @@ static int open_return_path_on_source(MigrationState 
*ms,
 static int await_return_path_close_on_source(MigrationState *ms)
 {
 /*
- * If this is a normal exit then the destination will send a SHUT and the
- * rp_thread will exit, however if there's an error we need to cause
- * it to exit.
+ * If this is a normal exit then the destination will send a SHUT
+ * and the rp_thread will exit, however if there's an error we
+ * need to cause it to exit. shutdown(2), if we have it, will
+ * cause it to unblock if it's stuck waiting for the destination.
  */
-if (qemu_file_get_error(ms->to_dst_file) && ms->rp_state.from_dst_file) {
-/*
- * shutdown(2), if we have it, will cause it to unblock if it's stuck
- * waiting for the destination.
- */
-qemu_file_shutdown(ms->rp_state.from_dst_file);
+WITH_QEMU_LOCK_GUARD(&ms->qemu_file_lock) {
+if (ms->to_dst_file && ms->rp_state.from_dst_file &&
+qemu_file_get_error(ms->to_dst_file)) {
+qemu_file_shutdown(ms->rp_state.from_dst_file);
+}
 }
+
 trace_await_return_path_close_on_source_joining();
 qemu_thread_join(&ms->rp_state.rp_thread);
 ms->rp_state.rp_thread_created = false;
-- 
2.35.3




Re: [PATCH] hw/i386/pc: fix max_used_gpa for 32-bit systems

2023-09-18 Thread Michael S. Tsirkin
On Mon, Sep 18, 2023 at 07:40:45PM +0530, Ani Sinha wrote:
> On Mon, Sep 18, 2023 at 7:31 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, Sep 18, 2023 at 07:24:48PM +0530, Ani Sinha wrote:
> > > 32-bit systems do not have a reserved memory for hole64 but they may have 
> > > a
> > > reserved memory space for memory hotplug. Since, hole64 starts after the
> > > reserved hotplug memory, the unaligned hole64 start address gives us the
> > > end address for this memory hotplug region that the processor may use.
> > > Fix this. This ensures that the physical address space bound checking 
> > > works
> > > correctly for 32-bit systems as well.
> > >
> > > Suggested-by: David Hildenbrand 
> > > Signed-off-by: Ani Sinha 
> >
> >
> > I doubt we can make changes like this without compat machinery. No?
> 
> Is that for not breaking migration or for backward compatibility
> (something which was broken in the first place used to work, but now
> it doesn't because we fixed it)?

migration mostly.

> >
> > > ---
> > >  hw/i386/pc.c | 60 ++--
> > >  1 file changed, 35 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > index 54838c0c41..c8abcabd53 100644
> > > --- a/hw/i386/pc.c
> > > +++ b/hw/i386/pc.c
> > > @@ -904,13 +904,43 @@ static uint64_t pc_get_cxl_range_end(PCMachineState 
> > > *pcms)
> > >  return start;
> > >  }
> > >
> > > +/*
> > > + * The 64bit pci hole starts after "above 4G RAM" and
> > > + * potentially the space reserved for memory hotplug.
> > > + * This function returns unaligned start address.
> > > + */
> > > +static uint64_t pc_pci_hole64_start_unaligned(void)
> > > +{
> > > +PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> > > +PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> > > +MachineState *ms = MACHINE(pcms);
> > > +uint64_t hole64_start = 0;
> > > +ram_addr_t size = 0;
> > > +
> > > +if (pcms->cxl_devices_state.is_enabled) {
> > > +hole64_start = pc_get_cxl_range_end(pcms);
> > > +} else if (pcmc->has_reserved_memory && (ms->ram_size < 
> > > ms->maxram_size)) {
> > > +pc_get_device_memory_range(pcms, &hole64_start, &size);
> > > +if (!pcmc->broken_reserved_end) {
> > > +hole64_start += size;
> > > +}
> > > +} else {
> > > +hole64_start = pc_above_4g_end(pcms);
> > > +}
> > > +
> > > +return hole64_start;
> > > +}
> > > +
> > >  static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t 
> > > pci_hole64_size)
> > >  {
> > >  X86CPU *cpu = X86_CPU(first_cpu);
> > >
> > > -/* 32-bit systems don't have hole64 thus return max CPU address */
> > > -if (cpu->phys_bits <= 32) {
> > > -return ((hwaddr)1 << cpu->phys_bits) - 1;
> > > +/*
> > > + * 32-bit systems don't have hole64, but we might have a region for
> > > + * memory hotplug.
> > > + */
> > > +if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) {
> > > +return pc_pci_hole64_start_unaligned() - 1;
> > >  }
> > >
> >
> > I see you are changing cpu->phys_bits to a CPUID check.
> > Could you explain why in the commit log?
> 
> Yeah missed that but will do in v2.
> 
> >
> > >  return pc_pci_hole64_start() + pci_hole64_size - 1;
> > > @@ -1147,30 +1177,10 @@ void pc_memory_init(PCMachineState *pcms,
> > >  pcms->memhp_io_base = ACPI_MEMORY_HOTPLUG_BASE;
> > >  }
> > >
> > > -/*
> > > - * The 64bit pci hole starts after "above 4G RAM" and
> > > - * potentially the space reserved for memory hotplug.
> > > - */
> > > +/* returns 1 GiB aligned hole64 start address */
> > >  uint64_t pc_pci_hole64_start(void)
> > >  {
> > > -PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
> > > -PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
> > > -MachineState *ms = MACHINE(pcms);
> > > -uint64_t hole64_start = 0;
> > > -ram_addr_t size = 0;
> > > -
> > > -if (pcms->cxl_devices_state.is_enabled) {
> > > -hole64_start = pc_get_cxl_range_end(pcms);
> > > -} else if (pcmc->has_reserved_memory && (ms->ram_size < 
> > > ms->maxram_size)) {
> > > -pc_get_device_memory_range(pcms, &hole64_start, &size);
> > > -if (!pcmc->broken_reserved_end) {
> > > -hole64_start += size;
> > > -}
> > > -} else {
> > > -hole64_start = pc_above_4g_end(pcms);
> > > -}
> > > -
> > > -return ROUND_UP(hole64_start, 1 * GiB);
> > > +return ROUND_UP(pc_pci_hole64_start_unaligned(), 1 * GiB);
> > >  }
> > >
> > >  DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus)
> > > --
> > > 2.39.1
> >




Re: [PATCH 2/3] hw/mem/cxl_type3: Add missing copyright and license notice

2023-09-18 Thread Dave Jiang



On 9/18/23 10:00, Jonathan Cameron wrote:
> On Mon, 18 Sep 2023 17:31:38 +0100
> Peter Maydell  wrote:
> 
>> On Mon, 18 Sept 2023 at 16:04, Jonathan Cameron
>>  wrote:
>>>
>>> This has been missing from the start. Assume it should match
>>> with cxl/cxl-component-utils.c as both were part of early
>>> postings from Ben.  
>>
>> Sounds plausible -- is there an Intel person who could give us
>> an acked-by for this?
>>
>> (Ideally we wouldn't have let more gpl-2-only code into the
>> codebase without a rationale...)
>>
> 
> I've +CC'd the kernel CXL maintainers from Intel a few of whom
> have also contributed some of the QEMU CXL code.
> Hopefully someone can ack.

I see that nvdimm.c from Intel is under LGPL 2.1. What is the typical license 
that should be applied for QEMU?

> 
>>> Suggested-by: Philippe Mathieu-Daudé 
>>> Signed-off-by: Jonathan Cameron 
>>> ---
>>>  hw/mem/cxl_type3.c | 9 +
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
>>> index c5855d4e7d..ad3f0f6a9d 100644
>>> --- a/hw/mem/cxl_type3.c
>>> +++ b/hw/mem/cxl_type3.c
>>> @@ -1,3 +1,12 @@
>>> +/*
>>> + * CXL Type 3 (memory expander) device
>>> + *
>>> + * Copyright(C) 2020 Intel Corporation.
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2. See the
>>> + * COPYING file in the top-level directory.
>>> + */
>>> +
>>>  #include "qemu/osdep.h"
>>>  #include "qemu/units.h"
>>>  #include "qemu/error-report.h"  
>>
>> -- PMM
>>
> 



Re: [PATCH 10/52] migration/rdma: Eliminate error_propagate()

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> When all we do with an Error we receive into a local variable is
> propagating to somewhere else, we can just as well receive it there
> right away.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 2/3] qga: Add optional stream-output argument to guest-exec

2023-09-18 Thread Daniel Xu
Hi Daniel,

On Mon, Sep 18, 2023 at 04:14:47PM +0100, Daniel P. Berrangé wrote:
> On Mon, Sep 18, 2023 at 04:54:22AM -0600, Daniel Xu wrote:
> > Currently, commands run through guest-exec are "silent" until they
> > finish running. This is fine for short lived commands. But for commands
> > that take a while, this is a bad user experience.
> > 
> > Usually long running programs know that they will run for a while. To
> > improve user experience, they will typically print some kind of status
> > to output at a regular interval. So that the user knows that their
> > command isn't just hanging.
> > 
> > This commit adds support for an optional stream-output parameter to
> > guest-exec. This causes subsequent calls to guest-exec-status to return
> > all buffered output. This allows downstream applications to be able to
> > relay "status" to the end user.
> > 
> > If stream-output is requested, it is up to the guest-exec-status caller
> > to keep track of the last seen output position and slice the returned
> > output appropriately. This is fairly trivial for a client to do. And it
> > is a more reliable design than having QGA internally keep track of
> > position -- for the cases that the caller "loses" a response.
> 
> I can understand why you want this incremental output facility,
> but at the same time I wonder where we draw the line for QGA
> with users needing a real shell session instead.

You mean interactive shell, right? If so, I would agree an interactive
shell is not a good fit for QGA.

But as it stands, a non-interactive shell works quite well (having
guest-exec run a bash script). I was the one who added the merged output
stream support a few months back. With merged output streams and this
streaming support, you can do some really neat things with QGA (see
below).

The primary reason I'm adding this support is for vmtest [0]. You can
find code for it here [1]. Basically what leveraging QGA does is allow
the vmtest implementation to reuse the same code for both kernel-only
(ie bzImage) and image targets (eg qcow2). 

[0]: https://dxuuu.xyz/vmtest.html
[1]: https://github.com/danobi/vmtest

> 
> When there is a long lived command, then IMHO it is also likely
> that there will be a need to kill the background running command
> too.
> 
> We quickly end up re-inventing a shell in QGA if we go down this
> route.

I can understand if you don't want to bloat the QGA feature set, but
IMHO this change cleanly composes with the current implementation and
is easily unit testable (and comes with a test).

Per the discussion in the other thread, it could be argued that this
streaming feature is actually a bug fix -- the documentation seems to
imply the streaming behaviour already, a reading both Markus and I
arrived at independently. But I don't think we need to go into
semantics like that :) .

But it does kinda imply from first principles that it is a reasonable
thing for guest-exec-status to provide. Perhaps it's too late to change
the existing behavior, so a flag is needed.

I hope my reasoning makes sense. And thanks for giving this a look.

Thanks,
Daniel

[...]
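
(As an aside, the client-side bookkeeping the commit message describes really
is small; something like the following hypothetical client fragment, assuming
the out-data field has already been base64-decoded elsewhere:)

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /*
     * Track how much of the streamed guest-exec output has been relayed.
     * Each guest-exec-status reply carries the full buffer captured so far,
     * so only the new suffix needs to be printed.
     */
    typedef struct {
        size_t seen;    /* bytes of out-data already relayed */
    } ExecStream;

    static void relay_new_output(ExecStream *st, const uint8_t *out, size_t len)
    {
        if (len > st->seen) {
            fwrite(out + st->seen, 1, len - st->seen, stdout);
            st->seen = len;
        }
    }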



Re: [PATCH 09/52] migration/rdma: Put @errp parameter last

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> include/qapi/error.h demands:
>
>  * - Functions that use Error to report errors have an Error **errp
>  *   parameter.  It should be the last parameter, except for functions
>  *   taking variable arguments.
>
> qemu_rdma_connect() does not conform.  Clean it up.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> qemu_rdma_accept() returns 0 in some cases even when it didn't
> complete its job due to errors.  Impact is not obvious.  I figure the
> caller will soon fail again with a misleading error message.
>
> Fix it to return -1 on any failure.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 06/52] migration/rdma: Clean up two more harmless signed vs. unsigned issues

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> qemu_rdma_exchange_get_response() compares int parameter @expecting
> with uint32_t head->type.  Actual arguments are non-negative
> enumeration constants, RDMAControlHeader uint32_t member type, or
> qemu_rdma_exchange_recv() int parameter expecting.  Actual arguments
> for the latter are non-negative enumeration constants.  Change both
> parameters to uint32_t.
>
> In qio_channel_rdma_readv(), loop control variable @i is ssize_t, and
> counts from 0 up to @niov, which is size_t.  Change @i to size_t.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 

just a comment...

>  static int qemu_rdma_exchange_recv(RDMAContext *rdma, RDMAControlHeader 
> *head,
> -int expecting)
> +   uint32_t expecting)
>  {
>  RDMAControlHeader ready = {
>  .len = 0,
> @@ -2851,7 +2851,7 @@ static ssize_t qio_channel_rdma_readv(QIOChannel *ioc,
>  RDMAContext *rdma;
>  RDMAControlHeader head;
>  int ret = 0;
> -ssize_t i;
> +size_t i;
>  size_t done = 0;

It seems the idea was for 'done' to be ssize_t like in
qio_channel_rdma_writev()




Re: [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> We use int instead of uint64_t in a few places.  Change them to
> uint64_t.
>
> This cleans up a comparison of signed qemu_rdma_block_for_wrid()
> parameter @wrid_requested with unsigned @wr_id.  Harmless, because the
> actual arguments are non-negative enumeration constants.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> wrid_desc[] uses 4001 pointers to map four integer values to strings.
>
> print_wrid() accesses wrid_desc[] out of bounds when passed a negative
> argument.  It returns null for values 2..1999 and 2001..3999.
>
> qemu_rdma_poll() and qemu_rdma_block_for_wrid() print wrid_desc[wr_id]
> and pass print_wrid(wr_id) to tracepoints.  Could conceivably crash
> trying to format a null string.  I believe access out of bounds is not
> possible.
>
> Not worth cleaning up.  Dumb down to show just numeric wr_id.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 2/3] hw/mem/cxl_type3: Add missing copyright and license notice

2023-09-18 Thread Jonathan Cameron via
On Mon, 18 Sep 2023 17:31:38 +0100
Peter Maydell  wrote:

> On Mon, 18 Sept 2023 at 16:04, Jonathan Cameron
>  wrote:
> >
> > This has been missing from the start. Assume it should match
> > with cxl/cxl-component-utils.c as both were part of early
> > postings from Ben.  
> 
> Sounds plausible -- is there an Intel person who could give us
> an acked-by for this?
> 
> (Ideally we wouldn't have let more gpl-2-only code into the
> codebase without a rationale...)
> 

I've +CC'd the kernel CXL maintainers from Intel a few of whom
have also contributed some of the QEMU CXL code.
Hopefully someone can ack.

> > Suggested-by: Philippe Mathieu-Daudé 
> > Signed-off-by: Jonathan Cameron 
> > ---
> >  hw/mem/cxl_type3.c | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> > index c5855d4e7d..ad3f0f6a9d 100644
> > --- a/hw/mem/cxl_type3.c
> > +++ b/hw/mem/cxl_type3.c
> > @@ -1,3 +1,12 @@
> > +/*
> > + * CXL Type 3 (memory expander) device
> > + *
> > + * Copyright(C) 2020 Intel Corporation.
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2. See the
> > + * COPYING file in the top-level directory.
> > + */
> > +
> >  #include "qemu/osdep.h"
> >  #include "qemu/units.h"
> >  #include "qemu/error-report.h"  
> 
> -- PMM
> 




Re: [PATCH 2/3] qga: Add optional stream-output argument to guest-exec

2023-09-18 Thread Daniel Xu
On Mon, Sep 18, 2023 at 05:05:03PM +0200, Markus Armbruster wrote:
> Daniel Xu  writes:
> 
> > Currently, commands run through guest-exec are "silent" until they
> > finish running. This is fine for short lived commands. But for commands
> > that take a while, this is a bad user experience.
> >
> > Usually long running programs know that they will run for a while. To
> > improve user experience, they will typically print some kind of status
> > to output at a regular interval. So that the user knows that their
> > command isn't just hanging.
> >
> > This commit adds support for an optional stream-output parameter to
> > guest-exec. This causes subsequent calls to guest-exec-status to return
> > all buffered output. This allows downstream applications to be able to
> > relay "status" to the end user.
> >
> > If stream-output is requested, it is up to the guest-exec-status caller
> > to keep track of the last seen output position and slice the returned
> > output appropriately. This is fairly trivial for a client to do. And it
> > is a more reliable design than having QGA internally keep track of
> > position -- for the cases that the caller "loses" a response.
> >
> > Signed-off-by: Daniel Xu 
> > ---
> 
> [...]
> 
> > diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
> > index b720dd4379..0a76e35082 100644
> > --- a/qga/qapi-schema.json
> > +++ b/qga/qapi-schema.json
> > @@ -1315,13 +1315,18 @@
> >  # @capture-output: bool flag to enable capture of stdout/stderr of
> >  # running process.  defaults to false.
> >  #
> > +# @stream-output: causes future guest-exec-status calls to always
> > +# return current captured output rather than waiting to return
> > +# it all when the command exits. defaults to false. (since: 8.2)
> 
> So guest-exec-status normally returns no captured output until the
> process terminates, right?  Its documentation (shown below for
> convenience) did not make me expect this!

Right. You can see I made the same mistake [0] quite a few months ago
and realized it some time later.

[0]: 
https://github.com/danobi/vmtest/blob/73007280446cea3c823eb2dde78261501e6273ab/src/qemu.rs#L368-L406

> 
> > +#
> >  # Returns: PID on success.
> >  #
> >  # Since: 2.5
> >  ##
> >  { 'command': 'guest-exec',
> >'data':{ 'path': 'str', '*arg': ['str'], '*env': ['str'],
> > -   '*input-data': 'str', '*capture-output': 
> > 'GuestExecCaptureOutput' },
> > +   '*input-data': 'str', '*capture-output': 
> > 'GuestExecCaptureOutput',
> > +   '*stream-output': 'bool' },
> >'returns': 'GuestExec' }
> 
>##
># @GuestExecStatus:
>#
># @exited: true if process has already terminated.
>#
># @exitcode: process exit code if it was normally terminated.
>#
># @signal: signal number (linux) or unhandled exception code (windows)
># if the process was abnormally terminated.
>#
># @out-data: base64-encoded stdout of the process
>#
># @err-data: base64-encoded stderr of the process Note: @out-data and
># @err-data are present only if 'capture-output' was specified for
># 'guest-exec'
>#
># @out-truncated: true if stdout was not fully captured due to size
># limitation.
>#
># @err-truncated: true if stderr was not fully captured due to size
># limitation.
>#
># Since: 2.5
>##
>{ 'struct': 'GuestExecStatus',
>  'data': { 'exited': 'bool', '*exitcode': 'int', '*signal': 'int',
>'*out-data': 'str', '*err-data': 'str',
>'*out-truncated': 'bool', '*err-truncated': 'bool' }}
>##
># @guest-exec-status:
>#
># Check status of process associated with PID retrieved via
># guest-exec.  Reap the process and associated metadata if it has
># exited.
>#
># @pid: pid returned from guest-exec
>#
># Returns: GuestExecStatus on success.
>#
># Since: 2.5
>##
>{ 'command': 'guest-exec-status',
>  'data':{ 'pid': 'int' },
>  'returns': 'GuestExecStatus' }
> 
> Could you throw in a patch to clarify what exactly is returned while the
> process is still running, and what only after it terminated?  It should
> go first.
> 

Yes, will do.

Thanks,
Daniel



Re: [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s return type

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> rdma_delete_block() always returns 0, which its only caller ignores.
> Return void instead.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s return type

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> qemu_rdma_data_init() return type is void *.  It actually returns
> RDMAContext *, and all its callers assign the value to an
> RDMAContext *.  Unclean.
>
> Return RDMAContext * instead.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type

2023-09-18 Thread Fabiano Rosas
Markus Armbruster  writes:

> qemu_rdma_poll()'s return type is uint64_t, even though it returns 0,
> -1, or @ret, which is int.  Its callers assign the return value to int
> variables, then check whether it's negative.  Unclean.
>
> Return int instead.
>
> Signed-off-by: Markus Armbruster 

Reviewed-by: Fabiano Rosas 



Re: [PATCH v2 4/4] hw/cxl: Line length reductions

2023-09-18 Thread Fan Ni
On Fri, Sep 15, 2023 at 06:04:18PM +0100, Jonathan Cameron wrote:
> Michael Tsirkin observed that there were some unnecessarily
> long lines in the CXL code in a recent review.
> This patch is intended to rectify that where it does not
> hurt readability.
> 
> Reviewed-by: Michael Tokarev 
> Signed-off-by: Jonathan Cameron 
> ---

One minor comment inline. Other than that, looks good to me.


>  include/hw/cxl/cxl_component.h |  3 ++-
>  include/hw/cxl/cxl_device.h|  5 +++--
>  include/hw/cxl/cxl_events.h|  3 ++-
>  hw/cxl/cxl-cdat.c  |  3 ++-
>  hw/cxl/cxl-component-utils.c   | 10 ++
>  hw/cxl/cxl-events.c|  9 ++---
>  hw/cxl/cxl-mailbox-utils.c | 21 ++---
>  hw/mem/cxl_type3.c | 31 +++
>  hw/mem/cxl_type3_stubs.c   |  5 +++--
>  9 files changed, 57 insertions(+), 33 deletions(-)
> 
> diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
> index 3c795a6278..e52dd8d2b9 100644
> --- a/include/hw/cxl/cxl_component.h
> +++ b/include/hw/cxl/cxl_component.h
> @@ -175,7 +175,8 @@ HDM_DECODER_INIT(3);
>  (CXL_IDE_REGISTERS_OFFSET + CXL_IDE_REGISTERS_SIZE)
>  #define CXL_SNOOP_REGISTERS_SIZE   0x8
>  
> -QEMU_BUILD_BUG_MSG((CXL_SNOOP_REGISTERS_OFFSET + CXL_SNOOP_REGISTERS_SIZE) 
> >= 0x1000,
> +QEMU_BUILD_BUG_MSG((CXL_SNOOP_REGISTERS_OFFSET +
> +CXL_SNOOP_REGISTERS_SIZE) >= 0x1000,
> "No space for registers");
>  
>  typedef struct component_registers {
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 51cd0d9ce3..007ddaf078 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -192,7 +192,7 @@ void cxl_device_register_init_common(CXLDeviceState *dev);
>   * Documented as a 128 bit register, but 64 bit accesses and the second
>   * 64 bits are currently reserved.
>   */
> -REG64(CXL_DEV_CAP_ARRAY, 0) /* Documented as 128 bit register but 64 byte 
> accesses */
> +REG64(CXL_DEV_CAP_ARRAY, 0)
>  FIELD(CXL_DEV_CAP_ARRAY, CAP_ID, 0, 16)
>  FIELD(CXL_DEV_CAP_ARRAY, CAP_VERSION, 16, 8)
>  FIELD(CXL_DEV_CAP_ARRAY, CAP_COUNT, 32, 16)
> @@ -361,7 +361,8 @@ struct CXLType3Class {
>  uint64_t offset);
>  void (*set_lsa)(CXLType3Dev *ct3d, const void *buf, uint64_t size,
>  uint64_t offset);
> -bool (*set_cacheline)(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t 
> *data);
> +bool (*set_cacheline)(CXLType3Dev *ct3d, uint64_t dpa_offset,
> +  uint8_t *data);
>  };
>  
>  MemTxResult cxl_type3_read(PCIDevice *d, hwaddr host_addr, uint64_t *data,
> diff --git a/include/hw/cxl/cxl_events.h b/include/hw/cxl/cxl_events.h
> index 089ba2091f..d778487b7e 100644
> --- a/include/hw/cxl/cxl_events.h
> +++ b/include/hw/cxl/cxl_events.h
> @@ -92,7 +92,8 @@ typedef enum CXLEventIntMode {
>  CXL_INT_RES  = 0x03,
>  } CXLEventIntMode;
>  #define CXL_EVENT_INT_MODE_MASK 0x3
> -#define CXL_EVENT_INT_SETTING(vector) ((((uint8_t)vector & 0xf) << 4) | 
> CXL_INT_MSI_MSIX)
> +#define CXL_EVENT_INT_SETTING(vector) \
> +((((uint8_t)vector & 0xf) << 4) | CXL_INT_MSI_MSIX)
>  typedef struct CXLEventInterruptPolicy {
>  uint8_t info_settings;
>  uint8_t warn_settings;
> diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
> index d246d6885b..639a2db3e1 100644
> --- a/hw/cxl/cxl-cdat.c
> +++ b/hw/cxl/cxl-cdat.c
> @@ -60,7 +60,8 @@ static void ct3_build_cdat(CDATObject *cdat, Error **errp)
>  return;
>  }
>  
> -cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf, 
> cdat->private);
> +cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf,
> + cdat->private);
>  
>  if (!cdat->built_buf_len) {
>  /* Build later as not all data available yet */
> diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> index c8d632d540..915c208209 100644
> --- a/hw/cxl/cxl-component-utils.c
> +++ b/hw/cxl/cxl-component-utils.c
> @@ -240,7 +240,8 @@ static void hdm_init_common(uint32_t *reg_state, uint32_t 
> *write_msk,
>  ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, TARGET_COUNT, 1);
>  ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, INTERLEAVE_256B, 
> 1);
>  ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, INTERLEAVE_4K, 
> 1);
> -ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY, 
> POISON_ON_ERR_CAP, 0);
> +ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_CAPABILITY,
> + POISON_ON_ERR_CAP, 0);
>  ARRAY_FIELD_DP32(reg_state, CXL_HDM_DECODER_GLOBAL_CONTROL,
>   HDM_DECODER_ENABLE, 0);
>  write_msk[R_CXL_HDM_DECODER_GLOBAL_CONTROL] = 0x3;
> @@ -263,13 +264,14 @@ static void hdm_init_common(uint32_t *reg_state, 
> uint32_t *write_msk,
>  }
>  }
>  
> -void cxl_component_register_init_common(uint32_t *reg_state, uint32_t 
> 

Re: [PATCH 3/3] docs/cxl: Cleanout some more aarch64 examples.

2023-09-18 Thread Peter Maydell
On Mon, 18 Sept 2023 at 16:04, Jonathan Cameron
 wrote:
>
> These crossed with the previous fix to get rid of examples
> using aarch64 for which support is not yet upstream.
>
> Signed-off-by: Jonathan Cameron 

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1892

thanks
-- PMM



Re: [PATCH v2 3/4] hw/cxl: CXLDVSECPortExtensions renamed to CXLDVSECPortExt

2023-09-18 Thread Fan Ni
On Fri, Sep 15, 2023 at 06:04:17PM +0100, Jonathan Cameron wrote:

> Done to reduce line lengths where this is used.
> Ext seems sufficiently obvious that it need not be spelt out
> fully.
> 
> Signed-off-by: Jonathan Cameron 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---

Reviewed-by: Fan Ni 

>  include/hw/cxl/cxl_pci.h   |  6 ++---
>  hw/cxl/cxl-component-utils.c   | 49 --
>  hw/pci-bridge/cxl_downstream.c |  2 +-
>  hw/pci-bridge/cxl_root_port.c  |  2 +-
>  hw/pci-bridge/cxl_upstream.c   |  2 +-
>  5 files changed, 35 insertions(+), 26 deletions(-)
> 
> diff --git a/include/hw/cxl/cxl_pci.h b/include/hw/cxl/cxl_pci.h
> index 407be95b9e..ddf01a543b 100644
> --- a/include/hw/cxl/cxl_pci.h
> +++ b/include/hw/cxl/cxl_pci.h
> @@ -86,7 +86,7 @@ typedef struct CXLDVSECDevice {
>  QEMU_BUILD_BUG_ON(sizeof(CXLDVSECDevice) != 0x38);
>  
>  /* CXL 2.0 - 8.1.5 (ID 0003) */
> -typedef struct CXLDVSECPortExtensions {
> +typedef struct CXLDVSECPortExt {
>  DVSECHeader hdr;
>  uint16_t status;
>  uint16_t control;
> @@ -100,8 +100,8 @@ typedef struct CXLDVSECPortExtensions {
>  uint32_t alt_prefetch_limit_high;
>  uint32_t rcrb_base;
>  uint32_t rcrb_base_high;
> -} CXLDVSECPortExtensions;
> -QEMU_BUILD_BUG_ON(sizeof(CXLDVSECPortExtensions) != 0x28);
> +} CXLDVSECPortExt;
> +QEMU_BUILD_BUG_ON(sizeof(CXLDVSECPortExt) != 0x28);
>  
>  #define PORT_CONTROL_OFFSET  0xc
>  #define PORT_CONTROL_UNMASK_SBR  1
> diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> index 7ef3ef2bd6..c8d632d540 100644
> --- a/hw/cxl/cxl-component-utils.c
> +++ b/hw/cxl/cxl-component-utils.c
> @@ -392,26 +392,35 @@ void cxl_component_create_dvsec(CXLComponentState *cxl,
>  case NON_CXL_FUNCTION_MAP_DVSEC:
>  break; /* Not yet implemented */
>  case EXTENSIONS_PORT_DVSEC:
> -wmask[offset + offsetof(CXLDVSECPortExtensions, control)] = 0x0F;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, control) + 1] = 0x40;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_bus_base)] = 
> 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_bus_limit)] = 
> 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_memory_base)] = 
> 0xF0;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_memory_base) + 
> 1] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_memory_limit)] = 
> 0xF0;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_memory_limit) + 
> 1] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_prefetch_base)] 
> = 0xF0;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_prefetch_base) + 
> 1] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_prefetch_limit)] 
> = 0xF0;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, alt_prefetch_limit) 
> + 1] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, 
> alt_prefetch_base_high)] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, 
> alt_prefetch_base_high) + 1] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, 
> alt_prefetch_base_high) + 2] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, 
> alt_prefetch_base_high) + 3] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, 
> alt_prefetch_limit_high)] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, 
> alt_prefetch_limit_high) + 1] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, 
> alt_prefetch_limit_high) + 2] = 0xFF;
> -wmask[offset + offsetof(CXLDVSECPortExtensions, 
> alt_prefetch_limit_high) + 3] = 0xFF;
> +wmask[offset + offsetof(CXLDVSECPortExt, control)] = 0x0F;
> +wmask[offset + offsetof(CXLDVSECPortExt, control) + 1] = 0x40;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_bus_base)] = 0xFF;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_bus_limit)] = 0xFF;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_memory_base)] = 0xF0;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_memory_base) + 1] = 
> 0xFF;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_memory_limit)] = 0xF0;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_memory_limit) + 1] = 
> 0xFF;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_prefetch_base)] = 0xF0;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_prefetch_base) + 1] = 
> 0xFF;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_prefetch_limit)] = 0xF0;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_prefetch_limit) + 1] =
> +0xFF;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_prefetch_base_high)] =
> +0xFF;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_prefetch_base_high) + 
> 1] =
> +0xFF;
> +wmask[offset + offsetof(CXLDVSECPortExt, alt_prefetch_base_high) + 
> 2] =
> +

Re: [PATCH 18/22] target/s390x: Call s390_cpu_realize_sysemu from s390_realize_cpu_model

2023-09-18 Thread David Hildenbrand

On 18.09.23 18:02, Philippe Mathieu-Daudé wrote:

s390_cpu_realize_sysemu() runs some checks for the TCG accelerator,
prior to creating the vCPU. s390_realize_cpu_model() also runs
some checks for KVM.
Move the sysemu call to s390_realize_cpu_model(). Having a single
call before cpu_exec_realizefn() will allow us to factor a
verify_accel_features() handler out in a pair of commits.

Directly pass a S390CPU* to s390_cpu_realize_sysemu() to simplify.

Signed-off-by: Philippe Mathieu-Daudé 
---
  target/s390x/s390x-internal.h | 2 +-
  target/s390x/cpu-sysemu.c | 3 +--
  target/s390x/cpu.c| 6 --
  target/s390x/cpu_models.c | 4 
  4 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/s390x/s390x-internal.h b/target/s390x/s390x-internal.h
index 825252d728..781ac08458 100644
--- a/target/s390x/s390x-internal.h
+++ b/target/s390x/s390x-internal.h
@@ -241,7 +241,7 @@ uint32_t calc_cc(CPUS390XState *env, uint32_t cc_op, 
uint64_t src, uint64_t dst,
  unsigned int s390_cpu_halt(S390CPU *cpu);
  void s390_cpu_unhalt(S390CPU *cpu);
  void s390_cpu_init_sysemu(Object *obj);
-bool s390_cpu_realize_sysemu(DeviceState *dev, Error **errp);
+bool s390_cpu_realize_sysemu(S390CPU *cpu, Error **errp);
  void s390_cpu_finalize(Object *obj);
  void s390_cpu_class_init_sysemu(CPUClass *cc);
  void s390_cpu_machine_reset_cb(void *opaque);
diff --git a/target/s390x/cpu-sysemu.c b/target/s390x/cpu-sysemu.c
index 8112561e5e..5178736c46 100644
--- a/target/s390x/cpu-sysemu.c
+++ b/target/s390x/cpu-sysemu.c
@@ -122,9 +122,8 @@ void s390_cpu_init_sysemu(Object *obj)
  s390_cpu_set_state(S390_CPU_STATE_STOPPED, cpu);
  }
  
-bool s390_cpu_realize_sysemu(DeviceState *dev, Error **errp)

+bool s390_cpu_realize_sysemu(S390CPU *cpu, Error **errp)
  {
-S390CPU *cpu = S390_CPU(dev);
  MachineState *ms = MACHINE(qdev_get_machine());
  unsigned int max_cpus = ms->smp.max_cpus;
  
diff --git a/target/s390x/cpu.c b/target/s390x/cpu.c

index 416ac6c4e0..7257d4bc19 100644
--- a/target/s390x/cpu.c
+++ b/target/s390x/cpu.c
@@ -237,12 +237,6 @@ static void s390_cpu_realizefn(DeviceState *dev, Error 
**errp)
  goto out;
  }
  
-#if !defined(CONFIG_USER_ONLY)

-if (!s390_cpu_realize_sysemu(dev, &err)) {
-goto out;
-}
-#endif
-
  cpu_exec_realizefn(cs, &err);
  if (err != NULL) {
  goto out;
diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 98f14c09c2..f030be0d55 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -612,6 +612,10 @@ void s390_realize_cpu_model(CPUState *cs, Error **errp)
  cpu->env.cpuid = deposit64(cpu->env.cpuid, CPU_PHYS_ADDR_SHIFT,
 CPU_PHYS_ADDR_BITS, cpu->env.core_id);
  }
+
+if (!s390_cpu_realize_sysemu(cpu, errp)) {
+return;
+}
  #endif
  }
  


That has nothing to do with CPU models and is, therefore, completely 
misplaced ... :/


Or what am I missing?

--
Cheers,

David / dhildenb




Re: [PATCH 2/3] hw/mem/cxl_type3: Add missing copyright and license notice

2023-09-18 Thread Peter Maydell
On Mon, 18 Sept 2023 at 16:04, Jonathan Cameron
 wrote:
>
> This has been missing from the start. Assume it should match
> with cxl/cxl-component-utils.c as both were part of early
> postings from Ben.

Sounds plausible -- is there an Intel person who could give us
an acked-by for this?

(Ideally we wouldn't have let more gpl-2-only code into the
codebase without a rationale...)

> Suggested-by: Philippe Mathieu-Daudé 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/mem/cxl_type3.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index c5855d4e7d..ad3f0f6a9d 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -1,3 +1,12 @@
> +/*
> + * CXL Type 3 (memory expander) device
> + *
> + * Copyright(C) 2020 Intel Corporation.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See the
> + * COPYING file in the top-level directory.
> + */
> +
>  #include "qemu/osdep.h"
>  #include "qemu/units.h"
>  #include "qemu/error-report.h"

-- PMM



  1   2   3   4   >