date:20240617

Re: [PATCH 08/23] Add Aarch64 sysarch() system call emulation for BSD-USER

2024-06-17 Thread Richard Henderson


On 6/17/24 11:57, Ajeet Singh wrote:

From: Stacey Son

Initial implementation of sysarch() syscall and a printing function

Signed-off-by: Stacey Son
Signed-off-by: Ajeet Singh
---
  bsd-user/aarch64/target_arch_sysarch.h | 42 ++
  1 file changed, 42 insertions(+)
  create mode 100644 bsd-user/aarch64/target_arch_sysarch.h


Reviewed-by: Richard Henderson 

r~

Re: [PATCH 07/23] Add ARM AArch64 TLS Management Prototypes for BSD-User

2024-06-17 Thread Richard Henderson


On 6/17/24 11:57, Ajeet Singh wrote:

From: Stacey Son 

Prototypes for setting and getting TLS( thread local storage)

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
  bsd-user/aarch64/target_arch.h | 28 
  1 file changed, 28 insertions(+)
  create mode 100644 bsd-user/aarch64/target_arch.h

diff --git a/bsd-user/aarch64/target_arch.h b/bsd-user/aarch64/target_arch.h
new file mode 100644
index 00..27f47de8eb
--- /dev/null
+++ b/bsd-user/aarch64/target_arch.h
@@ -0,0 +1,28 @@
+/*
+ * ARM AArch64 specific prototypes for bsd-user
+ *
+ * Copyright (c) 2015 Stacey D. Son 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_H
+#define TARGET_ARCH_H
+
+#include "qemu.h"
+
+void target_cpu_set_tls(CPUARMState *env, target_ulong newtls);
+target_ulong target_cpu_get_tls(CPUARMState *env);
+
+#endif /* TARGET_ARCH_H */


Acked-by: Richard Henderson 

I suggest that these declarations use CPUArchState, and be made common for all 
targets.

r~

Re: [PATCH 06/23] Add Aarch64 register handling

2024-06-17 Thread Richard Henderson


On 6/17/24 11:57, Ajeet Singh wrote:

From: Stacey Son 

Header file for managing CPU register states in
FreeBSD user mode

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
  bsd-user/aarch64/target_arch_reg.h | 56 ++
  1 file changed, 56 insertions(+)
  create mode 100644 bsd-user/aarch64/target_arch_reg.h

diff --git a/bsd-user/aarch64/target_arch_reg.h 
b/bsd-user/aarch64/target_arch_reg.h
new file mode 100644
index 00..5c7154f0c1
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_reg.h
@@ -0,0 +1,56 @@
+/*
+ *  FreeBSD arm64 register structures
+ *
+ *  Copyright (c) 2015 Stacey Son
+ *  All rights reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef TARGET_ARCH_REG_H
+#define TARGET_ARCH_REG_H
+
+/* See sys/arm64/include/reg.h */
+typedef struct target_reg {
+uint64_tx[30];
+uint64_tlr;
+uint64_tsp;
+uint64_telr;
+uint64_tspsr;
+} target_reg_t;
+
+typedef struct target_fpreg {
+__uint128_t fp_q[32];


I'm not keen on this, though possibly it doesn't matter for hosts that bsd-user is 
intended to support.  Better as either Int128 or uint64_t fp_q[32][2].


What is this structure used for within qemu?
Does freebsd support SVE yet?

It's certainly not used with this patch, so it's hard to tell, but can we omit it entirely 
for now?



r~

Re: [PATCH 05/23] Managing CPU register for BSD-USER

2024-06-17 Thread Richard Henderson


On 6/17/24 11:57, Ajeet Singh wrote:

From: Stacey Son

Added structure for storing register states

Signed-off-by: Stacey Son
Signed-off-by: Ajeet Singh
Co-authored-by: Sean Bruno
---
  bsd-user/aarch64/target_syscall.h | 51 +++
  1 file changed, 51 insertions(+)
  create mode 100644 bsd-user/aarch64/target_syscall.h


Acked-by: Richard Henderson 

r~

Re: [PATCH 04/23] AArch64 specific CPU for bsd-user

2024-06-17 Thread Richard Henderson


On 6/17/24 11:57, Ajeet Singh wrote:

From: Stacey Son 

Function to set and recieve thread-local-storage value
from tpidr_el0 register

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
  bsd-user/aarch64/target_arch_cpu.c | 34 ++
  1 file changed, 34 insertions(+)
  create mode 100644 bsd-user/aarch64/target_arch_cpu.c

diff --git a/bsd-user/aarch64/target_arch_cpu.c 
b/bsd-user/aarch64/target_arch_cpu.c
new file mode 100644
index 00..70ef651827
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_cpu.c
@@ -0,0 +1,34 @@
+/*
+ * ARM AArch64 specific CPU for bsd-user
+ *
+ * Copyright (c) 2015 Stacey Son
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+#include "qemu/osdep.h"
+
+#include "target_arch.h"
+
+/* See cpu_set_user_tls() in arm64/arm64/vm_machdep.c */
+void target_cpu_set_tls(CPUARMState *env, target_ulong newtls)
+{
+
+env->cp15.tpidr_el[0] = newtls;


Extra newline.

Otherwise,
Reviewed-by: Richard Henderson 

r~

Re: [PATCH 03/23] Added function to clone CPU state

2024-06-17 Thread Richard Henderson


On 6/17/24 11:57, Ajeet Singh wrote:

From: Stacey Son

Function can copy cpu state to create new thread

Signed-off-by: Stacey Son
Signed-off-by: Ajeet Singh
---
  bsd-user/aarch64/target_arch_cpu.h | 17 +
  1 file changed, 17 insertions(+)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH] hw/usb/dev-mtp: Correctly report free space

2024-06-17 Thread Philippe Mathieu-Daudé


On 18/6/24 02:36, Fabio D'Urso wrote:

In order to compute the amount of free space (in bytes), the number
of available blocks (f_bavail) should be multiplied by the block
size (f_frsize) instead of the total number of blocks (f_blocks).

Signed-off-by: Fabio D'Urso 
---
  hw/usb/dev-mtp.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 7e4a0765ae..554b397e88 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -886,7 +886,7 @@ static MTPData *usb_mtp_get_storage_info(MTPState *s, 
MTPControl *c)
  rc = statvfs(s->root, );
  if (rc == 0) {
  usb_mtp_add_u64(d, (uint64_t)buf.f_frsize * buf.f_blocks);
-usb_mtp_add_u64(d, (uint64_t)buf.f_bavail * buf.f_blocks);
+usb_mtp_add_u64(d, (uint64_t)buf.f_frsize * buf.f_bavail);


Indeed (USB MTP Spec v1.1, 5.2.2.5 "Free Space In Bytes").

Reviewed-by: Philippe Mathieu-Daudé 


  usb_mtp_add_u32(d, buf.f_ffree);
  } else {
  usb_mtp_add_u64(d, 0x);

Re: [PATCH 02/23] Added CPU loop function

2024-06-17 Thread Richard Henderson


On 6/17/24 11:57, Ajeet Singh wrote:

+/*
+ * The carry bit is cleared for no error; set for error.
+ * See arm64/arm64/vm_machdep.c cpu_set_syscall_retval()
+ */
+pstate = pstate_read(env);
+if (ret >= 0) {
+pstate &= ~PSTATE_C;
+env->xregs[0] = ret;
+} else if (ret == -TARGET_ERESTART) {
+env->pc -= 4;
+break;
+} else if (ret != -TARGET_EJUSTRETURN) {
+pstate |= PSTATE_C;
+env->xregs[0] = -ret;
+}
+pstate_write(env, pstate);


No need for full pstate read/write:

env->CF = {0,1};



+break;
+
+case EXCP_INTERRUPT:
+/* Just indicate that signals should be handle ASAP. */
+break;
+
+case EXCP_UDEF:
+force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPN, env->pc);
+break;
+
+
+case EXCP_PREFETCH_ABORT:
+case EXCP_DATA_ABORT:
+/* We should only arrive here with EC in {DATAABORT, INSNABORT}. */
+ec = syn_get_ec(env->exception.syndrome);


Nevermind about my question about syndrome.h vs patch 1.


r~

Re: [PATCH 01/23] Add CPU initialization function

2024-06-17 Thread Richard Henderson


On 6/17/24 11:57, Ajeet Singh wrote:

From: Stacey Son 

Addded function to initialize ARM CPU
and to check if it supports 64 bit mode

Signed-off-by: Ajeet Singh 
Signed-off-by: Stacey Son 
---
  bsd-user/aarch64/target_arch_cpu.h | 42 ++
  1 file changed, 42 insertions(+)
  create mode 100644 bsd-user/aarch64/target_arch_cpu.h

diff --git a/bsd-user/aarch64/target_arch_cpu.h 
b/bsd-user/aarch64/target_arch_cpu.h
new file mode 100644
index 00..db5c7062b9
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_cpu.h
@@ -0,0 +1,42 @@
+/*
+ *  ARM AArch64 cpu init and loop
+ *
+ * Copyright (c) 2015 Stacey Son
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_CPU_H
+#define TARGET_ARCH_CPU_H
+
+#include "target_arch.h"
+#include "target/arm/syndrome.h"


Do you actually need syndrome.h?

Otherwise,
Reviewed-by: Richard Henderson 

r~


+
+#define TARGET_DEFAULT_CPU_MODEL "any"
+
+static inline void target_cpu_init(CPUARMState *env,
+struct target_pt_regs *regs)
+{
+int i;
+
+if (!(arm_feature(env, ARM_FEATURE_AARCH64))) {
+fprintf(stderr, "The selected ARM CPU does not support 64 bit mode\n");
+exit(1);
+}
+for (i = 0; i < 31; i++) {
+env->xregs[i] = regs->regs[i];
+}
+env->pc = regs->pc;
+env->xregs[31] = regs->sp;
+}

Re: [PATCH 18/18] tcg/loongarch64: Enable v256 with LASX

2024-06-17 Thread gaosong


在 2024/5/29 下午2:39, Philippe Mathieu-Daudé 写道:

On 27/5/24 23:19, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.h |  2 +-
  tcg/loongarch64/tcg-target.c.inc | 11 ---
  2 files changed, 9 insertions(+), 4 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 

diff --git a/tcg/loongarch64/tcg-target.c.inc 
b/tcg/loongarch64/tcg-target.c.inc

index e2b5aad5e3..0b41b807e3 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -2484,9 +2484,14 @@ static void tcg_target_init(TCGContext *s)
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S8);
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S9);
  -    if (cpuinfo & CPUINFO_LSX) {
-    tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
-    tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+    if (cpuinfo & (CPUINFO_LSX | CPUINFO_LASX)) {
+    if (cpuinfo & CPUINFO_LSX) {
+    tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+    tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+    }
+    if (cpuinfo & CPUINFO_LASX) {
+    tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
+    }


Out of curiosity, could we have LASX without LSX?

No.  LSX depends on FPU.  LASX depends on LSX.

Thanks.
Song Gao

Re: [PATCH 18/18] tcg/loongarch64: Enable v256 with LASX

2024-06-17 Thread gaosong


在 2024/5/28 上午5:19, Richard Henderson 写道:

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.h |  2 +-
  tcg/loongarch64/tcg-target.c.inc | 11 ---
  2 files changed, 9 insertions(+), 4 deletions(-)

Reviewed-by: Song Gao 

Thanks.
Song Gao

diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 990bad1d51..58bd7d258e 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -173,7 +173,7 @@ typedef enum {
  
  #define TCG_TARGET_HAS_v64  (cpuinfo & CPUINFO_LSX)

  #define TCG_TARGET_HAS_v128 (cpuinfo & CPUINFO_LSX)
-#define TCG_TARGET_HAS_v256 0
+#define TCG_TARGET_HAS_v256 (cpuinfo & CPUINFO_LASX)
  
  #define TCG_TARGET_HAS_not_vec  1

  #define TCG_TARGET_HAS_neg_vec  1
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index e2b5aad5e3..0b41b807e3 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -2484,9 +2484,14 @@ static void tcg_target_init(TCGContext *s)
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S8);
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S9);
  
-if (cpuinfo & CPUINFO_LSX) {

-tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
-tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+if (cpuinfo & (CPUINFO_LSX | CPUINFO_LASX)) {
+if (cpuinfo & CPUINFO_LSX) {
+tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+}
+if (cpuinfo & CPUINFO_LASX) {
+tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
+}
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V24);
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V25);
  tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V26);

Re: [PATCH 17/18] tcg/loongarch64: Support LASX in tcg_out_vec_op

2024-06-17 Thread gaosong


在 2024/5/28 上午5:19, Richard Henderson 写道:

Signed-off-by: Richard Henderson
---
  tcg/loongarch64/tcg-target.c.inc | 223 +++
  1 file changed, 137 insertions(+), 86 deletions(-)

Reviewed-by: Song Gao 

Thanks.
Song Gao

Re: [PATCH 16/18] tcg/loongarch64: Split out vdvjukN in tcg_out_vec_op

2024-06-17 Thread gaosong


在 2024/5/28 上午5:19, Richard Henderson 写道:

Fixes a bug in the immediate shifts, because the exact
encoding depends on the element size.

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.c.inc | 58 ++--
  1 file changed, 32 insertions(+), 26 deletions(-)

Reviewed-by: Song Gao 

Thanks.
Song Gao

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 54f7bc9d14..5d2a6b2ca2 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1901,6 +1901,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  static const LoongArchInsn rotrv_vec_insn[4] = {
  OPC_VROTR_B, OPC_VROTR_H, OPC_VROTR_W, OPC_VROTR_D
  };
+static const LoongArchInsn rotri_vec_insn[4] = {
+OPC_VROTRI_B, OPC_VROTRI_H, OPC_VROTRI_W, OPC_VROTRI_D
+};
  
  a0 = args[0];

  a1 = args[1];
@@ -2034,15 +2037,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  case INDEX_op_sarv_vec:
  insn = sarv_vec_insn[vece];
  goto vdvjvk;
-case INDEX_op_shli_vec:
-tcg_out32(s, encode_vdvjuk3_insn(shli_vec_insn[vece], a0, a1, a2));
-break;
-case INDEX_op_shri_vec:
-tcg_out32(s, encode_vdvjuk3_insn(shri_vec_insn[vece], a0, a1, a2));
-break;
-case INDEX_op_sari_vec:
-tcg_out32(s, encode_vdvjuk3_insn(sari_vec_insn[vece], a0, a1, a2));
-break;
  case INDEX_op_rotlv_vec:
  /* rotlv_vec a1, a2 = rotrv_vec a1, -a2 */
  tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], TCG_VEC_TMP0, a2));
@@ -2051,26 +2045,20 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  case INDEX_op_rotrv_vec:
  insn = rotrv_vec_insn[vece];
  goto vdvjvk;
+case INDEX_op_shli_vec:
+insn = shli_vec_insn[vece];
+goto vdvjukN;
+case INDEX_op_shri_vec:
+insn = shri_vec_insn[vece];
+goto vdvjukN;
+case INDEX_op_sari_vec:
+insn = sari_vec_insn[vece];
+goto vdvjukN;
  case INDEX_op_rotli_vec:
  /* rotli_vec a1, a2 = rotri_vec a1, -a2 */
  a2 = extract32(-a2, 0, 3 + vece);
-switch (vece) {
-case MO_8:
-tcg_out_opc_vrotri_b(s, a0, a1, a2);
-break;
-case MO_16:
-tcg_out_opc_vrotri_h(s, a0, a1, a2);
-break;
-case MO_32:
-tcg_out_opc_vrotri_w(s, a0, a1, a2);
-break;
-case MO_64:
-tcg_out_opc_vrotri_d(s, a0, a1, a2);
-break;
-default:
-g_assert_not_reached();
-}
-break;
+insn = rotri_vec_insn[vece];
+goto vdvjukN;
  case INDEX_op_bitsel_vec:
  /* vbitsel vd, vj, vk, va = bitsel_vec vd, va, vk, vj */
  tcg_out_opc_vbitsel_v(s, a0, a3, a2, a1);
@@ -2083,6 +2071,24 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  vdvjvk:
  tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
  break;
+vdvjukN:
+switch (vece) {
+case MO_8:
+tcg_out32(s, encode_vdvjuk3_insn(insn, a0, a1, a2));
+break;
+case MO_16:
+tcg_out32(s, encode_vdvjuk4_insn(insn, a0, a1, a2));
+break;
+case MO_32:
+tcg_out32(s, encode_vdvjuk5_insn(insn, a0, a1, a2));
+break;
+case MO_64:
+tcg_out32(s, encode_vdvjuk6_insn(insn, a0, a1, a2));
+break;
+default:
+g_assert_not_reached();
+}
+break;
  }
  }

Re: [PATCH 15/18] tcg/loongarch64: Remove temp_vec from tcg_out_vec_op

2024-06-17 Thread gaosong


在 2024/5/28 上午5:19, Richard Henderson 写道:

Use TCG_VEC_TMP0 directly.

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.c.inc | 9 -
  1 file changed, 4 insertions(+), 5 deletions(-)

Reviewed-by: Song Gao 

Thanks.
Song Gao

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index e633d268d0..54f7bc9d14 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1834,7 +1834,6 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  {
  TCGType type = vecl + TCG_TYPE_V64;
  TCGArg a0, a1, a2, a3;
-TCGReg temp_vec = TCG_VEC_TMP0;
  
  static const LoongArchInsn cmp_vec_insn[16][4] = {

  [TCG_COND_EQ] = {OPC_VSEQ_B, OPC_VSEQ_H, OPC_VSEQ_W, OPC_VSEQ_D},
@@ -1976,8 +1975,8 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
   * dupi_vec temp, a2
   * cmp_vec a0, a1, temp, cond
   */
-tcg_out_dupi_vec(s, type, vece, temp_vec, a2);
-a2 = temp_vec;
+tcg_out_dupi_vec(s, type, vece, TCG_VEC_TMP0, a2);
+a2 = TCG_VEC_TMP0;
  }
  
  insn = cmp_vec_insn[cond][vece];

@@ -2046,8 +2045,8 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  break;
  case INDEX_op_rotlv_vec:
  /* rotlv_vec a1, a2 = rotrv_vec a1, -a2 */
-tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], temp_vec, a2));
-a2 = temp_vec;
+tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], TCG_VEC_TMP0, a2));
+a2 = TCG_VEC_TMP0;
  /* fall through */
  case INDEX_op_rotrv_vec:
  insn = rotrv_vec_insn[vece];

Re: [PATCH 14/18] tcg/loongarch64: Support LASX in tcg_out_{mov,ld,st}

2024-06-17 Thread gaosong


在 2024/5/28 上午5:19, Richard Henderson 写道:

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.c.inc | 19 +++
  1 file changed, 19 insertions(+)

Reviewed-by: Song Gao 

Thanks.
Song Gao

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 5f4915c6ac..e633d268d0 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -325,6 +325,9 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg 
ret, TCGReg arg)
  case TCG_TYPE_V128:
  tcg_out_opc_vori_b(s, ret, arg, 0);
  break;
+case TCG_TYPE_V256:
+tcg_out_opc_xvori_b(s, ret, arg, 0);
+break;
  default:
  g_assert_not_reached();
  }
@@ -854,6 +857,14 @@ static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg 
dest,
  tcg_out_opc_vldx(s, dest, base, TCG_REG_TMP0);
  }
  break;
+case TCG_TYPE_V256:
+if (-0x800 <= offset && offset <= 0x7ff) {
+tcg_out_opc_xvld(s, dest, base, offset);
+} else {
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP0, offset);
+tcg_out_opc_xvldx(s, dest, base, TCG_REG_TMP0);
+}
+break;
  default:
  g_assert_not_reached();
  }
@@ -886,6 +897,14 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg 
src,
  tcg_out_opc_vstx(s, src, base, TCG_REG_TMP0);
  }
  break;
+case TCG_TYPE_V256:
+if (-0x800 <= offset && offset <= 0x7ff) {
+tcg_out_opc_xvst(s, src, base, offset);
+} else {
+tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP0, offset);
+tcg_out_opc_xvstx(s, src, base, TCG_REG_TMP0);
+}
+break;
  default:
  g_assert_not_reached();
  }

Re: [PATCH 13/18] tcg/loongarch64: Split out vdvjvk in tcg_out_vec_op

2024-06-17 Thread gaosong


在 2024/5/28 上午5:19, Richard Henderson 写道:

  case INDEX_op_andc_vec:
  /*
   * vandn vd, vj, vk: vd = vk & ~vj
   * andc_vec vd, vj, vk: vd = vj & ~vk
- * vk and vk are swapped
+ * vj and vk are swapped
   */
-tcg_out_opc_vandn_v(s, a0, a2, a1);
-break;
+a1 = a2;
+a2 = args[2];

Should be args[1]?  Similar to op_not_vec 'a2 = a1'.

Thanks.
Song Gao

+insn = OPC_VANDN_V;
+goto vdvjvk;
  case INDEX_op_or_vec:
-tcg_out_opc_vor_v(s, a0, a1, a2);
-break;
+insn = OPC_VOR_V;
+goto vdvjvk;
  case INDEX_op_orc_vec:
-tcg_out_opc_vorn_v(s, a0, a1, a2);
-break;
+insn = OPC_VORN_V;
+goto vdvjvk;
  case INDEX_op_xor_vec:
-tcg_out_opc_vxor_v(s, a0, a1, a2);
-break;
-case INDEX_op_nor_vec:
-tcg_out_opc_vnor_v(s, a0, a1, a2);
-break;
+insn = OPC_VXOR_V;
+goto vdvjvk;
  case INDEX_op_not_vec:
-tcg_out_opc_vnor_v(s, a0, a1, a1);
-break;
+a2 = a1;
+/* fall through */
+case INDEX_op_nor_vec:
+insn = OPC_VNOR_V;
+goto vdvjvk;

Re: [PATCH 12/18] tcg/loongarch64: Support LASX in tcg_out_addsub_vec

2024-06-17 Thread gaosong


在 2024/5/28 上午5:19, Richard Henderson 写道:

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.c.inc | 36 ++--
  1 file changed, 20 insertions(+), 16 deletions(-)

Reviewed-by: Song Gao 

Thanks.
Song Gao

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 47011488dd..652aa261a3 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1758,21 +1758,25 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType 
type, unsigned vece,
  tcg_out_dup_vec(s, type, vece, rd, TCG_REG_TMP0);
  }
  
-static void tcg_out_addsub_vec(TCGContext *s, unsigned vece, const TCGArg a0,

-   const TCGArg a1, const TCGArg a2,
+static void tcg_out_addsub_vec(TCGContext *s, bool lasx, unsigned vece,
+   TCGArg a0, TCGArg a1, TCGArg a2,
 bool a2_is_const, bool is_add)
  {
-static const LoongArchInsn add_vec_insn[4] = {
-OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D
+static const LoongArchInsn add_vec_insn[2][4] = {
+{ OPC_VADD_B, OPC_VADD_H, OPC_VADD_W, OPC_VADD_D },
+{ OPC_XVADD_B, OPC_XVADD_H, OPC_XVADD_W, OPC_XVADD_D },
  };
-static const LoongArchInsn add_vec_imm_insn[4] = {
-OPC_VADDI_BU, OPC_VADDI_HU, OPC_VADDI_WU, OPC_VADDI_DU
+static const LoongArchInsn add_vec_imm_insn[2][4] = {
+{ OPC_VADDI_BU, OPC_VADDI_HU, OPC_VADDI_WU, OPC_VADDI_DU },
+{ OPC_XVADDI_BU, OPC_XVADDI_HU, OPC_XVADDI_WU, OPC_XVADDI_DU },
  };
-static const LoongArchInsn sub_vec_insn[4] = {
-OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D
+static const LoongArchInsn sub_vec_insn[2][4] = {
+{ OPC_VSUB_B, OPC_VSUB_H, OPC_VSUB_W, OPC_VSUB_D },
+{ OPC_XVSUB_B, OPC_XVSUB_H, OPC_XVSUB_W, OPC_XVSUB_D },
  };
-static const LoongArchInsn sub_vec_imm_insn[4] = {
-OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU
+static const LoongArchInsn sub_vec_imm_insn[2][4] = {
+{ OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU },
+{ OPC_XVSUBI_BU, OPC_XVSUBI_HU, OPC_XVSUBI_WU, OPC_XVSUBI_DU },
  };
  LoongArchInsn insn;
  
@@ -1783,10 +1787,10 @@ static void tcg_out_addsub_vec(TCGContext *s, unsigned vece, const TCGArg a0,

  value = -value;
  }
  if (value < 0) {
-insn = sub_vec_imm_insn[vece];
+insn = sub_vec_imm_insn[lasx][vece];
  value = -value;
  } else {
-insn = add_vec_imm_insn[vece];
+insn = add_vec_imm_insn[lasx][vece];
  }
  
  /* Constraint TCG_CT_CONST_VADD ensures validity. */

@@ -1797,9 +1801,9 @@ static void tcg_out_addsub_vec(TCGContext *s, unsigned 
vece, const TCGArg a0,
  }
  
  if (is_add) {

-insn = add_vec_insn[vece];
+insn = add_vec_insn[lasx][vece];
  } else {
-insn = sub_vec_insn[vece];
+insn = sub_vec_insn[lasx][vece];
  }
  tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
  }
@@ -1963,10 +1967,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
  }
  break;
  case INDEX_op_add_vec:
-tcg_out_addsub_vec(s, vece, a0, a1, a2, const_args[2], true);
+tcg_out_addsub_vec(s, false, vece, a0, a1, a2, const_args[2], true);
  break;
  case INDEX_op_sub_vec:
-tcg_out_addsub_vec(s, vece, a0, a1, a2, const_args[2], false);
+tcg_out_addsub_vec(s, false, vece, a0, a1, a2, const_args[2], false);
  break;
  case INDEX_op_neg_vec:
  tcg_out32(s, encode_vdvj_insn(neg_vec_insn[vece], a0, a1));

Re: linux-user: array overflow in pselect6 emulation

2024-06-17 Thread Richard Henderson


On 6/17/24 03:43, Andreas Schwab wrote:

$ cat select.c
#include 
#include 
#include 
#include 
#include 
#include 

int
main (int argc, char **argv)
{
   int nfds = (argc > 1 ? atoi (argv[1]) : 1031);
   fd_set *fds = calloc ((nfds + (sizeof (fd_mask) * 8) - 1)
 / (sizeof (fd_mask) * 8), sizeof (fd_mask));
   setrlimit (RLIMIT_NOFILE,
  &(struct rlimit){ .rlim_cur = nfds, .rlim_max = nfds });
   dup2 (open ("/dev/null", O_RDONLY), nfds - 1);
   FD_SET (nfds - 1, fds);
   syscall (SYS_pselect6, nfds, fds, 0, 0, 0, 0);
}


Ack.

We use libc fd_set, which is sized for FD_SETSIZE at 1024.

We can either artificially limit RLIMIT_NOFILE (not ideal), or dynamically allocate all 
fd_set within qemu (which will take some time and effort).



r~

Re: [PATCH 11/18] tcg/loongarch64: Simplify tcg_out_addsub_vec

2024-06-17 Thread gaosong


在 2024/5/28 上午5:19, Richard Henderson 写道:

Signed-off-by: Richard Henderson 
---
  tcg/loongarch64/tcg-target.c.inc | 29 +++--
  1 file changed, 15 insertions(+), 14 deletions(-)

Reviewed-by: Song Gao 

Thanks.
Song Gao

diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index c7d0c7839b..47011488dd 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1774,33 +1774,34 @@ static void tcg_out_addsub_vec(TCGContext *s, unsigned 
vece, const TCGArg a0,
  static const LoongArchInsn sub_vec_imm_insn[4] = {
  OPC_VSUBI_BU, OPC_VSUBI_HU, OPC_VSUBI_WU, OPC_VSUBI_DU
  };
+LoongArchInsn insn;
  
  if (a2_is_const) {

  int64_t value = sextract64(a2, 0, 8 << vece);
+
  if (!is_add) {
  value = -value;
  }
-
-/* Try vaddi/vsubi */
-if (0 <= value && value <= 0x1f) {
-tcg_out32(s, encode_vdvjuk5_insn(add_vec_imm_insn[vece], a0, \
- a1, value));
-return;
-} else if (-0x1f <= value && value < 0) {
-tcg_out32(s, encode_vdvjuk5_insn(sub_vec_imm_insn[vece], a0, \
- a1, -value));
-return;
+if (value < 0) {
+insn = sub_vec_imm_insn[vece];
+value = -value;
+} else {
+insn = add_vec_imm_insn[vece];
  }
  
-/* constraint TCG_CT_CONST_VADD ensures unreachable */

-g_assert_not_reached();
+/* Constraint TCG_CT_CONST_VADD ensures validity. */
+tcg_debug_assert(0 <= value && value <= 0x1f);
+
+tcg_out32(s, encode_vdvjuk5_insn(insn, a0, a1, value));
+return;
  }
  
  if (is_add) {

-tcg_out32(s, encode_vdvjvk_insn(add_vec_insn[vece], a0, a1, a2));
+insn = add_vec_insn[vece];
  } else {
-tcg_out32(s, encode_vdvjvk_insn(sub_vec_insn[vece], a0, a1, a2));
+insn = sub_vec_insn[vece];
  }
+tcg_out32(s, encode_vdvjvk_insn(insn, a0, a1, a2));
  }
  
  static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,

[PATCH] hw/usb/dev-mtp: Correctly report free space

2024-06-17 Thread Fabio D'Urso

In order to compute the amount of free space (in bytes), the number
of available blocks (f_bavail) should be multiplied by the block
size (f_frsize) instead of the total number of blocks (f_blocks).

Signed-off-by: Fabio D'Urso 
---
 hw/usb/dev-mtp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/usb/dev-mtp.c b/hw/usb/dev-mtp.c
index 7e4a0765ae..554b397e88 100644
--- a/hw/usb/dev-mtp.c
+++ b/hw/usb/dev-mtp.c
@@ -886,7 +886,7 @@ static MTPData *usb_mtp_get_storage_info(MTPState *s, 
MTPControl *c)
 rc = statvfs(s->root, );
 if (rc == 0) {
 usb_mtp_add_u64(d, (uint64_t)buf.f_frsize * buf.f_blocks);
-usb_mtp_add_u64(d, (uint64_t)buf.f_bavail * buf.f_blocks);
+usb_mtp_add_u64(d, (uint64_t)buf.f_frsize * buf.f_bavail);
 usb_mtp_add_u32(d, buf.f_ffree);
 } else {
 usb_mtp_add_u64(d, 0x);
-- 
2.45.2.627.g7a2c4fd464-goog

Re: [RFC PATCH v2 3/5] rust: add PL011 device model

2024-06-17 Thread Pierrick Bouvier


On 6/17/24 14:04, Manos Pitsidianakis wrote:

On Mon, 17 Jun 2024 17:32, Paolo Bonzini  wrote:

On Mon, Jun 17, 2024 at 4:04 PM Manos Pitsidianakis
 wrote:

I respectfully disagree and recommend taking another look at the code.

The device actually performs all logic in non-unsafe methods and is
typed instead of operating on raw integers as fields/state. The C stuff
is the FFI boundary calls which you cannot avoid; they are the same
things you'd wrap under these bindings we're talking about.


Indeed, but the whole point is that the bindings wrap unsafe code in
such a way that the safety invariants hold. Not doing this, especially
for a device that does not do DMA (so that there are very few ways
that things can go wrong in the first place), runs counter to the
whole philosophy of Rust.

For example, you have:

pub fn realize( self) {
unsafe {
qemu_chr_fe_set_handlers(
addr_of_mut!(self.char_backend),
Some(pl011_can_receive),
Some(pl011_receive),
Some(pl011_event),
None,
addr_of_mut!(*self).cast::(),
core::ptr::null_mut(),
true,
);
}
}

where you are implicitly relying on the fact that pl011_can_receive(),
pl011_receive(), pl011_event() are never called from the
MemoryRegionOps read() and write() callbacks. Otherwise you'd have two
mutable references at the same time, one as an argument to the
callbacks:

   pub fn read( self, offset: hwaddr, ...

and one from e.g. "state.as_mut().put_fifo()" in pl011_receive().

This is not Rust code. It makes no attempt at enforcing the whole
"shared XOR mutable" which is the basis of Rust's reference semantics.
In other words, this is as safe as C code---sure, it can use nice
abstractions for register access, it has "unsafe" added in front of
pointer dereferences, but it is not safe.

Again, I'm not saying it's a bad first step. It's *awesome* if we
treat it as what it is: a guide towards providing safe bindings
between Rust and C (which often implies them being idiomatic). But if
we don't accept this, if there is no plan to make the code safe, it is
a potential huge source of technical debt.


QEMU calls the device's FFI callbacks with its pointer and arguments,
the pointer gets dereferenced to the actual Rust type which qemu
guarantees is valid, then the Rust struct's methods are called to handle
each functionality. There is nothing actually unsafe here, assuming
QEMU's invariants and code are correct.


The same can be said of C code, can't it? There is nothing unsafe in C
code, assuming there are no bugs...


Paolo, first please tone down your condescending tone, it's incredibly
offensive. I'm honestly certain this is not on purpose otherwise I'd not
engage at all.


The best compliment you had was "I'm not saying it's a bad first step", 
and I would say this differently: It's a great first step!


We should have a first version where we stick to the C API, with all the 
Rust code being as Rusty as possible: benefit from typed enums, error 
handling, bounds checking and other nice things.


It's useless to iterate/debate for two years on the list before landing 
something upstream. We can start with this, have one or two devices that 
use this build system, and then focus on designing a good interface for 
this.




Secondly, are you implying that these callbacks are not operated under
the BQL? I'm not seeing the C UART devices using mutexes. If they are
not running under the BQL, then gladly we add mutexes, big deal. Just
say this directly instead of writing all these amounts of text. If it's
true I'd just accept it and move on with a new iteration. Isn't that the
point of code review? It is really that simple. Why not do this right
away? I'm frankly puzzled.



As I mentioned in my previous answer, this device already makes a good 
progress: it eliminates a whole class of memory errors, and the only 
issue brought by unsafe code is concurrency issues. And this should be 
our focus once we get the build infrastructure done!



Finally, this is Rust code. You cannot turn off the type system, you
cannot turn off the borrow checker, you can only go around creating new
types and references out of raw memory addresses and tell the compiler
'trust me on this'. Ignoring that misses the entire point of Rust. The
statement 'this is not Rust code but as good as C' is dishonest and
misguided. Check for example the source code of the nix crate, which
exposes libc and various POSIX/*nix APIs. Is it the same as C and not
Rust code?

If you have specific scenarios in mind where such things exist in the
code and end up doing invalid behavior please be kind and write them
down explicitly and demonstrate them on code review. This approach of
'yes but no' is not constructive because it is not addressing any
specific problems directly. Instead it comes out as vague dismissive FUD
and I'm sure that is not what

Re: [RFC PATCH v2 3/5] rust: add PL011 device model

2024-06-17 Thread Pierrick Bouvier


On 6/17/24 07:32, Paolo Bonzini wrote:

On Mon, Jun 17, 2024 at 4:04 PM Manos Pitsidianakis
 wrote:

I respectfully disagree and recommend taking another look at the code.

The device actually performs all logic in non-unsafe methods and is
typed instead of operating on raw integers as fields/state. The C stuff
is the FFI boundary calls which you cannot avoid; they are the same
things you'd wrap under these bindings we're talking about.


Indeed, but the whole point is that the bindings wrap unsafe code in
such a way that the safety invariants hold. Not doing this, especially
for a device that does not do DMA (so that there are very few ways
that things can go wrong in the first place), runs counter to the
whole philosophy of Rust.



You are raising an interesting point, and should be a central discussion 
when designing the future Rust API for this subsystem.
It may not be intuitive to people that even a code without any unsafe 
section could still call code in a sequence that will violate some 
invariants, and break Rust rules.
IMHO, this is one of the big challenge with the Rust/C interfacing, 
including for Linux.


But it's *not yet* the goal of this series. First, we should see how to 
build one device (I would even like to see a second, to factorize all 
the boilerplate), and then, focus on which interface we want to have to 
make it really better than the C version.



For example, you have:

 pub fn realize( self) {
 unsafe {
 qemu_chr_fe_set_handlers(
 addr_of_mut!(self.char_backend),
 Some(pl011_can_receive),
 Some(pl011_receive),
 Some(pl011_event),
 None,
 addr_of_mut!(*self).cast::(),
 core::ptr::null_mut(),
 true,
 );
 }
 }

where you are implicitly relying on the fact that pl011_can_receive(),
pl011_receive(), pl011_event() are never called from the
MemoryRegionOps read() and write() callbacks. Otherwise you'd have two
mutable references at the same time, one as an argument to the
callbacks:

pub fn read( self, offset: hwaddr, ...

and one from e.g. "state.as_mut().put_fifo()" in pl011_receive().

This is not Rust code. It makes no attempt at enforcing the whole
"shared XOR mutable" which is the basis of Rust's reference semantics.
In other words, this is as safe as C code---sure, it can use nice
abstractions for register access, it has "unsafe" added in front of
pointer dereferences, but it is not safe.



I think that Manos did a great amount of work to reduce the use of 
unsafe code. Basically, he wrote the device as Rusty as possible, while 
still using the QEMU C API part that is inevitable today.


In more, given the current design, yes some race conditions are possible 
(depends on how QEMU calls callbacks installed), but a whole class of 
problems is still eliminated.


From the start of this series, it was precised that focus was on build 
system, and it seemed to me that we shifted on the hot debate of "How to 
interface with C code?".



Again, I'm not saying it's a bad first step. It's *awesome* if we
treat it as what it is: a guide towards providing safe bindings
between Rust and C (which often implies them being idiomatic). But if
we don't accept this, if there is no plan to make the code safe, it is
a potential huge source of technical debt.



I agree, it should be the next direction after a first device was 
written: How to remove almost all usage of unsafe code and maintain good 
invariants in the API?


While talking about idiomatic, Rust tends to favor message passing to 
memory share when it comes to concurrency (and IMHO, it's the right 
thing). And it's definitely now how QEMU is architected at this time.


With extra locking, we should be able to have a first correct interface 
that offers strong guarantees, and we can then work on making it faster 
with concurrency.



QEMU calls the device's FFI callbacks with its pointer and arguments,
the pointer gets dereferenced to the actual Rust type which qemu
guarantees is valid, then the Rust struct's methods are called to handle
each functionality. There is nothing actually unsafe here, assuming
QEMU's invariants and code are correct.


The same can be said of C code, can't it? There is nothing unsafe in C
code, assuming there are no bugs...



Not the same, you still get other compile/runtime guarantees:
- strong typed enums, so no case is forgotten
- good error handling
- safe type conversions
- bound check of fifo access

The only open issue by calling unsafe code in this context is related to 
(potential) concurrency. If a first step to have a good Rust API is to 
run everything under a lock, there is good chance you already started to 
design the device in the right way to support real concurrency later, so 
it's still useful.


Pierrick


Paolo



I think we're actually in a great position. We have a PoC from Marc-André,
plus the

Re: [PATCH 9/9] contrib/plugins: add ips plugin example for cost modeling

2024-06-17 Thread Dr. David Alan Gilbert

* Pierrick Bouvier (pierrick.bouv...@linaro.org) wrote:
> On 6/17/24 13:56, Dr. David Alan Gilbert wrote:
> > * Pierrick Bouvier (pierrick.bouv...@linaro.org) wrote:
> > > On 6/14/24 15:00, Dr. David Alan Gilbert wrote:
> > > > * Pierrick Bouvier (pierrick.bouv...@linaro.org) wrote:
> > > > > Hi Dave,
> > > > > 
> > > > > On 6/12/24 14:02, Dr. David Alan Gilbert wrote:
> > > > > > * Alex Bennée (alex.ben...@linaro.org) wrote:
> > > > > > > From: Pierrick Bouvier 
> > > > > > > 
> > > > > > > This plugin uses the new time control interface to make decisions
> > > > > > > about the state of time during the emulation. The algorithm is
> > > > > > > currently very simple. The user specifies an ips rate which 
> > > > > > > applies
> > > > > > > per core. If the core runs ahead of its allocated execution time 
> > > > > > > the
> > > > > > > plugin sleeps for a bit to let real time catch up. Either way 
> > > > > > > time is
> > > > > > > updated for the emulation as a function of total executed 
> > > > > > > instructions
> > > > > > > with some adjustments for cores that idle.
> > > > > > 
> > > > > > A few random thoughts:
> > > > > >  a) Are there any definitions of what a plugin that controls 
> > > > > > time
> > > > > > should do with a live migration?
> > > > > 
> > > > > It's not something that was considered as part of this work.
> > > > 
> > > > That's OK, the only thing is we need to stop anyone from hitting 
> > > > problems
> > > > when they don't realise it's not been addressed.
> > > > One way might be to add a migration blocker; see 
> > > > include/migration/blocker.h
> > > > then you might print something like 'Migration not available due to 
> > > > plugin '
> > > > 
> > > 
> > > So basically, we could make a call to migrate_add_blocker(), when someone
> > > request time_control through plugin API?
> > > 
> > > IMHO, it's something that should be part of plugin API (if any plugin 
> > > calls
> > > qemu_plugin_request_time_control()), instead of the plugin code itself. 
> > > This
> > > way, any plugin getting time control automatically blocks any potential
> > > migration.
> > 
> > Note my question asked for a 'any definitions of what a plugin ..' - so
> > you could define it that way, another one is to think that in the future
> > you may allow it and the plugin somehow interacts with migration not to
> > change time at certain migration phases.
> > 
> 
> I would be in favor to forbid usage for now in this context. I'm not sure
> why people would play with migration and plugins generally at this time
> (there might be experiments or use cases I'm not aware of), so a simple
> barrier preventing that seems ok.
> 
> This plugin is part of an experiment where we implement a qemu feature
> (icount=auto in this case) by using plugins. If it turns into a successful
> usage and this plugin becomes popular, we can always lift the limitation
> later.

Sounds reasonable to me.

Dave

> @Alex, would you like to add this now (icount=auto is still not removed from
> qemu), or wait for integration, and add this as another patch?
> 
> > > > > >  b) The sleep in migration/dirtyrate.c points out g_usleep might
> > > > > > sleep for longer, so reads the actual wall clock time to
> > > > > > figure out a new 'now'.
> > > > > 
> > > > > The current API mentions time starts at 0 from qemu startup. Maybe we 
> > > > > could
> > > > > consider in the future to change this behavior to retrieve time from 
> > > > > an
> > > > > existing migrated machine.
> > > > 
> > > > Ah, I meant for (b) to be independent of (a) - not related to 
> > > > migration; just
> > > > down to the fact you used g_usleep in the plugin and a g_usleep might 
> > > > sleep
> > > > for a different amount of time than you asked.
> > > > 
> > > 
> > > We know that, and the plugin is not meant to be "cycle accurate" in 
> > > general,
> > > we just set a upper bound for number of instructions we can execute in a
> > > given amount of time (1/10 second for now).
> > > 
> > > We compute the new time based on how many instructions effectively ran on
> > > the most used cpu, so even if we slept a bit more than expected, it's
> > > correct.
> > 
> > Ah OK.
> > 
> > Dave
> > 
> > > > > >  c) A fun thing to do with this would be to follow an external 
> > > > > > simulation
> > > > > > or 2nd qemu, trying to keep the two from running too far 
> > > > > > past
> > > > > > each other.
> > > > > > 
> > > > > 
> > > > > Basically, to slow the first one, waiting for the replicated one to 
> > > > > catch
> > > > > up?
> > > > 
> > > > Yes, something like that.
> > > > 
> > > > Dave
> > > > 
> > > > > > Dave >
> > > > > > > Examples
> > > > > > > 
> > > > > > > 
> > > > > > > Slow down execution of /bin/true:
> > > > > > > $ num_insn=$(./build/qemu-x86_64 -plugin 
> > > > > > > ./build/tests/plugin/libinsn.so -d plugin /bin/true |& grep total 
> > > > > > > | sed -e 's/.*: //')
> > > > > > > $

[PATCH v2] hw/arm/virt-acpi-build: Fix IORT id_count

2024-06-17 Thread Nicolin Chen

The IORT doc defines "Number of IDs" ("id_count" in the virt-acpi-build)
to be "the number of IDs in the range minus one". Otherwise, Linux kernel
reports "conflicting mapping for input ID" FW_BUG at the overlapped ID.

Fixes: 42e0f050e3a5 ("hw/arm/virt-acpi-build: Add IORT support to bypass 
SMMUv3")
Signed-off-by: Nicolin Chen 
---
Changelog
v2:
 * Moved "-1" to the same line of id_count calculation
 * Added "+1" to the next_range.input_base calculation
v1:
 https://lore.kernel.org/all/20240613234802.828265-1-nicol...@nvidia.com/

 hw/arm/virt-acpi-build.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index c3ccfef026..631f2c6d04 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -243,7 +243,8 @@ iort_host_bridges(Object *obj, void *opaque)
 
 AcpiIortIdMapping idmap = {
 .input_base = min_bus << 8,
-.id_count = (max_bus - min_bus + 1) << 8,
+/* id_count is the number of IDs in the range minus one */
+.id_count = ((max_bus - min_bus + 1) << 8) - 1,
 };
 g_array_append_val(idmap_blob, idmap);
 }
@@ -298,11 +299,13 @@ build_iort(GArray *table_data, BIOSLinker *linker, 
VirtMachineState *vms)
 idmap = _array_index(smmu_idmaps, AcpiIortIdMapping, i);
 
 if (next_range.input_base < idmap->input_base) {
-next_range.id_count = idmap->input_base - 
next_range.input_base;
+/* id_count is the number of IDs in the range minus one */
+next_range.id_count = idmap->input_base -
+  next_range.input_base - 1;
 g_array_append_val(its_idmaps, next_range);
 }
 
-next_range.input_base = idmap->input_base + idmap->id_count;
+next_range.input_base = idmap->input_base + idmap->id_count + 1;
 }
 
 /* Append the last RC -> ITS ID mapping */
-- 
2.43.0

Re: [PATCH 9/9] contrib/plugins: add ips plugin example for cost modeling

2024-06-17 Thread Pierrick Bouvier


On 6/17/24 13:56, Dr. David Alan Gilbert wrote:

* Pierrick Bouvier (pierrick.bouv...@linaro.org) wrote:

On 6/14/24 15:00, Dr. David Alan Gilbert wrote:

* Pierrick Bouvier (pierrick.bouv...@linaro.org) wrote:

Hi Dave,

On 6/12/24 14:02, Dr. David Alan Gilbert wrote:

* Alex Bennée (alex.ben...@linaro.org) wrote:

From: Pierrick Bouvier 

This plugin uses the new time control interface to make decisions
about the state of time during the emulation. The algorithm is
currently very simple. The user specifies an ips rate which applies
per core. If the core runs ahead of its allocated execution time the
plugin sleeps for a bit to let real time catch up. Either way time is
updated for the emulation as a function of total executed instructions
with some adjustments for cores that idle.


A few random thoughts:
 a) Are there any definitions of what a plugin that controls time
should do with a live migration?


It's not something that was considered as part of this work.


That's OK, the only thing is we need to stop anyone from hitting problems
when they don't realise it's not been addressed.
One way might be to add a migration blocker; see include/migration/blocker.h
then you might print something like 'Migration not available due to plugin '



So basically, we could make a call to migrate_add_blocker(), when someone
request time_control through plugin API?

IMHO, it's something that should be part of plugin API (if any plugin calls
qemu_plugin_request_time_control()), instead of the plugin code itself. This
way, any plugin getting time control automatically blocks any potential
migration.


Note my question asked for a 'any definitions of what a plugin ..' - so
you could define it that way, another one is to think that in the future
you may allow it and the plugin somehow interacts with migration not to
change time at certain migration phases.



I would be in favor to forbid usage for now in this context. I'm not 
sure why people would play with migration and plugins generally at this 
time (there might be experiments or use cases I'm not aware of), so a 
simple barrier preventing that seems ok.


This plugin is part of an experiment where we implement a qemu feature 
(icount=auto in this case) by using plugins. If it turns into a 
successful usage and this plugin becomes popular, we can always lift the 
limitation later.


@Alex, would you like to add this now (icount=auto is still not removed 
from qemu), or wait for integration, and add this as another patch?



 b) The sleep in migration/dirtyrate.c points out g_usleep might
sleep for longer, so reads the actual wall clock time to
figure out a new 'now'.


The current API mentions time starts at 0 from qemu startup. Maybe we could
consider in the future to change this behavior to retrieve time from an
existing migrated machine.


Ah, I meant for (b) to be independent of (a) - not related to migration; just
down to the fact you used g_usleep in the plugin and a g_usleep might sleep
for a different amount of time than you asked.



We know that, and the plugin is not meant to be "cycle accurate" in general,
we just set a upper bound for number of instructions we can execute in a
given amount of time (1/10 second for now).

We compute the new time based on how many instructions effectively ran on
the most used cpu, so even if we slept a bit more than expected, it's
correct.


Ah OK.

Dave


 c) A fun thing to do with this would be to follow an external simulation
or 2nd qemu, trying to keep the two from running too far past
each other.



Basically, to slow the first one, waiting for the replicated one to catch
up?


Yes, something like that.

Dave


Dave >

Examples


Slow down execution of /bin/true:
$ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin 
/bin/true |& grep total | sed -e 's/.*: //')
$ time ./build/qemu-x86_64 -plugin 
./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
real 4.000s

Boot a Linux kernel simulating a 250MHz cpu:
$ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append 
"console=ttyS0" -plugin 
./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
check time until kernel panic on serial0

Tested in system mode by booting a full debian system, and using:
$ sysbench cpu run
Performance decrease linearly with the given number of ips.

Signed-off-by: Pierrick Bouvier 
Message-Id: <20240530220610.1245424-7-pierrick.bouv...@linaro.org>
---
contrib/plugins/ips.c| 164 +++
contrib/plugins/Makefile |   1 +
2 files changed, 165 insertions(+)
create mode 100644 contrib/plugins/ips.c

diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
new file mode 100644
index 00..db77729264
--- /dev/null
+++ b/contrib/plugins/ips.c
@@ -0,0 +1,164 @@
+/*
+ * ips rate limiting plugin.
+ *
+ * This plugin can be used to restrict the execution of a

Re: [RFC PATCH v2 3/5] rust: add PL011 device model

2024-06-17 Thread Manos Pitsidianakis


On Mon, 17 Jun 2024 17:32, Paolo Bonzini  wrote:

On Mon, Jun 17, 2024 at 4:04 PM Manos Pitsidianakis
 wrote:

I respectfully disagree and recommend taking another look at the code.

The device actually performs all logic in non-unsafe methods and is
typed instead of operating on raw integers as fields/state. The C stuff
is the FFI boundary calls which you cannot avoid; they are the same
things you'd wrap under these bindings we're talking about.


Indeed, but the whole point is that the bindings wrap unsafe code in
such a way that the safety invariants hold. Not doing this, especially
for a device that does not do DMA (so that there are very few ways
that things can go wrong in the first place), runs counter to the
whole philosophy of Rust.

For example, you have:

   pub fn realize( self) {
   unsafe {
   qemu_chr_fe_set_handlers(
   addr_of_mut!(self.char_backend),
   Some(pl011_can_receive),
   Some(pl011_receive),
   Some(pl011_event),
   None,
   addr_of_mut!(*self).cast::(),
   core::ptr::null_mut(),
   true,
   );
   }
   }

where you are implicitly relying on the fact that pl011_can_receive(),
pl011_receive(), pl011_event() are never called from the
MemoryRegionOps read() and write() callbacks. Otherwise you'd have two
mutable references at the same time, one as an argument to the
callbacks:

  pub fn read( self, offset: hwaddr, ...

and one from e.g. "state.as_mut().put_fifo()" in pl011_receive().

This is not Rust code. It makes no attempt at enforcing the whole
"shared XOR mutable" which is the basis of Rust's reference semantics.
In other words, this is as safe as C code---sure, it can use nice
abstractions for register access, it has "unsafe" added in front of
pointer dereferences, but it is not safe.

Again, I'm not saying it's a bad first step. It's *awesome* if we
treat it as what it is: a guide towards providing safe bindings
between Rust and C (which often implies them being idiomatic). But if
we don't accept this, if there is no plan to make the code safe, it is
a potential huge source of technical debt.


QEMU calls the device's FFI callbacks with its pointer and arguments,
the pointer gets dereferenced to the actual Rust type which qemu
guarantees is valid, then the Rust struct's methods are called to handle
each functionality. There is nothing actually unsafe here, assuming
QEMU's invariants and code are correct.


The same can be said of C code, can't it? There is nothing unsafe in C
code, assuming there are no bugs...


Paolo, first please tone down your condescending tone, it's incredibly 
offensive. I'm honestly certain this is not on purpose otherwise I'd not 
engage at all.


Secondly, are you implying that these callbacks are not operated under 
the BQL? I'm not seeing the C UART devices using mutexes. If they are 
not running under the BQL, then gladly we add mutexes, big deal. Just 
say this directly instead of writing all these amounts of text. If it's 
true I'd just accept it and move on with a new iteration. Isn't that the 
point of code review? It is really that simple. Why not do this right 
away? I'm frankly puzzled.


Finally, this is Rust code. You cannot turn off the type system, you 
cannot turn off the borrow checker, you can only go around creating new 
types and references out of raw memory addresses and tell the compiler 
'trust me on this'. Ignoring that misses the entire point of Rust. The 
statement 'this is not Rust code but as good as C' is dishonest and 
misguided. Check for example the source code of the nix crate, which 
exposes libc and various POSIX/*nix APIs. Is it the same as C and not 
Rust code?


If you have specific scenarios in mind where such things exist in the 
code and end up doing invalid behavior please be kind and write them 
down explicitly and demonstrate them on code review. This approach of 
'yes but no' is not constructive because it is not addressing any 
specific problems directly. Instead it comes out as vague dismissive FUD 
and I'm sure that is not what you or anyone else wants.


Please take some time to understand my POV here, it'd help both of us 
immensely.


Sincerely thank you in advance,
Manos

Re: [PATCH 20/20] qapi: convert "Example" sections to rST

2024-06-17 Thread John Snow

On Fri, Jun 14, 2024 at 10:39 AM Markus Armbruster 
wrote:

> John Snow  writes:
>
> > Eliminate the "Example" sections in QAPI doc blocks, converting them
> > into QMP example code blocks. This is generally done by converting
> > "Example:" or "Examples:" lines into ".. code-block:: QMP" lines.
> >
> > This patch does also allow for the use of the rST syntax "Example::" by
> > exempting double-colon syntax from the QAPI doc parser, but that form is
> > not used by this conversion patch. The phrase "Example" here is not
> > special, it is the double-colon syntax that transforms the following
> > block into a code-block. By default, *this* form does not apply QMP
> > highlighting.
> >
> > This patch has several benefits:
> >
> > 1. Example sections can now be written more arbitrarily, mixing
> >explanatory paragraphs and code blocks however desired.
> >
> > 2. Example sections can now use fully arbitrary rST.
> >
> > 3. All code blocks are now lexed and validated as QMP; increasing
> >usability of the docs and ensuring validity of example snippets.
> >
> > 4. Each code-block can be captioned independently without bypassing the
> >QMP lexer/validator.
> >
> > For any sections with more than one example, examples are split up into
> > multiple code-block regions. If annotations are present, those
> > annotations are converted into code-block captions instead, e.g.
> >
> > ```
> > Examples:
> >
> >1. Lorem Ipsum
> >
> >-> { "foo": "bar" }
> > ```
> >
> > Is rewritten as:
> >
> > ```
> > .. code-block:: QMP
> >:caption: Example: Lorem Ipsum
> >
> >-> { "foo": "bar" }
> > ```
> >
> > This process was only semi-automated:
> >
> > 1. Replace "Examples?:" sections with sed:
> >
> > sed -i 's|# Example:|# .. code-block:: QMP|' *.json
> > sed -i 's|# Examples:|# .. code-block:: QMP|' *.json
> >
> > 2. Identify sections that no longer parse successfully by attempting the
> >doc build, convert annotations into captions manually.
> >(Tedious, oh well.)
> >
> > 3. Add captions where still needed:
> >
> > sed -zi 's|# .. code-block:: QMP\n#\n|# .. code-block:: QMP\n#
> :caption: Example\n#\n|g' *.json
> >
> > Not fully ideal, but hopefully not something that has to be done very
> > often. (Or ever again.)
> >
> > Signed-off-by: John Snow 
> > ---
> >  qapi/acpi.json  |   6 +-
> >  qapi/block-core.json| 120 --
> >  qapi/block.json |  60 +++--
> >  qapi/char.json  |  36 ++--
> >  qapi/control.json   |  16 ++--
> >  qapi/dump.json  |  12 ++-
> >  qapi/machine-target.json|   3 +-
> >  qapi/machine.json   |  79 ++---
> >  qapi/migration.json | 145 +++-
> >  qapi/misc-target.json   |  33 +---
> >  qapi/misc.json  |  48 +++
> >  qapi/net.json   |  30 +--
> >  qapi/pci.json   |   6 +-
> >  qapi/qapi-schema.json   |   6 +-
> >  qapi/qdev.json  |  15 +++-
> >  qapi/qom.json   |  20 +++--
> >  qapi/replay.json|  12 ++-
> >  qapi/rocker.json|  12 ++-
> >  qapi/run-state.json |  45 ++
> >  qapi/tpm.json   |   9 +-
> >  qapi/trace.json |   6 +-
> >  qapi/transaction.json   |   3 +-
> >  qapi/ui.json|  62 +-
> >  qapi/virtio.json|  38 +
> >  qapi/yank.json  |   6 +-
> >  scripts/qapi/parser.py  |  15 +++-
> >  tests/qapi-schema/doc-good.json |  12 +--
> >  tests/qapi-schema/doc-good.out  |  17 ++--
> >  tests/qapi-schema/doc-good.txt  |  17 +---
> >  29 files changed, 574 insertions(+), 315 deletions(-)
>
> Missing: update of docs/devel/qapi-code-gen.rst.
>
> > diff --git a/qapi/acpi.json b/qapi/acpi.json
> > index aa4dbe57943..3da01f1b7fc 100644
> > --- a/qapi/acpi.json
> > +++ b/qapi/acpi.json
> > @@ -111,7 +111,8 @@
> >  #
> >  # Since: 2.1
> >  #
> > -# Example:
> > +# .. code-block:: QMP
> > +#:caption: Example
>
> I wish this was a bit less verbose.  Oh well, we'll live.
>

We can create a custom directive that assumes a default caption; e.g.

.. qapi:example::

... blah blah blah ...

And if you want to override it, you'd just

.. qapi:example::
:caption: Lorem Ipsum ...

... blah blah blah ...

Interested? (I kept this patch vanilla to avoid getting fancy, but there
are fanciness options available if you'd like them.)


>
> >  #
> >  # -> { "execute": "query-acpi-ospm-status" }
> >  # <- { "return": [ { "device": "d1", "slot": "0", "slot-type":
> "DIMM", "source": 1, "status": 0},
>
> This is rendered as a light green box with the caption on top, in
> italics and centered.  I'm not sure I like the use of the caption.  The
> previous patch's Note boxes look nicer.
>

We can change that with styling

Re: [PATCH v2 08/10] tests/migration-tests: Always enable migration events

2024-06-17 Thread Peter Xu

On Mon, Jun 17, 2024 at 04:51:32PM -0300, Fabiano Rosas wrote:
> Peter Xu  writes:
> 
> > Libvirt should always enable it, so it'll be nice qtest also cover that for
> > all tests.  Though this patch only enables it, no extra tests are done on
> > these events yet.
> >
> > Signed-off-by: Peter Xu 
> > ---
> >  tests/qtest/migration-test.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index 13b59d4c10..9ae8892e26 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -841,6 +841,13 @@ static int test_migrate_start(QTestState **from, 
> > QTestState **to,
> >  unlink(shmem_path);
> >  }
> >  
> > +/*
> > + * Always enable migration events.  Libvirt always uses it, let's try
> > + * to mimic as closer as that.
> > + */
> > +migrate_set_capability(*from, "events", true);
> > +migrate_set_capability(*to, "events", true);
> > +
> 
> What do we do with the one at migrate_incoming_qmp()?

Hmm missed that..  I'll drop that one in this same patch and rewrite the
commit message.  New version attached:

===8<===
>From 443fef4188d544362fc026b46784c15b82624642 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Mon, 17 Jun 2024 10:49:52 -0400
Subject: [PATCH] tests/migration-tests: Always enable migration events

Libvirt should always enable it, so it'll be nice qtest also cover that for
all tests on both sides.  migrate_incoming_qmp() used to enable it only on
dst, now we enable them on both, as we'll start to sanity check events even
on the src QEMU.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-helpers.c | 2 --
 tests/qtest/migration-test.c| 7 +++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index 0ac49ceb54..797b1e8c1c 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -258,8 +258,6 @@ void migrate_incoming_qmp(QTestState *to, const char *uri, 
const char *fmt, ...)
 g_assert(!qdict_haskey(args, "uri"));
 qdict_put_str(args, "uri", uri);
 
-migrate_set_capability(to, "events", true);
-
 rsp = qtest_qmp(to, "{ 'execute': 'migrate-incoming', 'arguments': %p}",
 args);
 
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 640713bfd5..c015e801ac 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -851,6 +851,13 @@ static int test_migrate_start(QTestState **from, 
QTestState **to,
 unlink(shmem_path);
 }
 
+/*
+ * Always enable migration events.  Libvirt always uses it, let's try
+ * to mimic as closer as that.
+ */
+migrate_set_capability(*from, "events", true);
+migrate_set_capability(*to, "events", true);
+
 return 0;
 }
 
-- 
2.45.0


-- 
Peter Xu

Re: [PULL 0/1] Dirty page rate and dirty page limit 20240617 patches

2024-06-17 Thread Richard Henderson


On 6/17/24 08:49, Hyman Huang wrote:

The following changes since commit 05ad1440b8428b0ade9b8e5c01469adb8fbf83e3:

   Merge tag 'virtio-grants-v8-tag' ofhttps://gitlab.com/sstabellini/qemu  into 
staging (2024-06-15 20:13:06 -0700)

are available in the Git repository at:

   https://github.com/newfriday/qemu.git  
tags/dirtylimit-dirtyrate-pull-request-20240617

for you to fetch changes up to e65152d5483b2c847ec7a947ed52650152cfdcc0:

   migration/dirtyrate: Fix segmentation fault (2024-06-17 23:29:21 +0800)


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~

Re: [PULL v2 00/25] target/i386, SCSI changes for 2024-06-11

2024-06-17 Thread Richard Henderson


On 6/17/24 01:13, Paolo Bonzini wrote:

The following changes since commit 80e8f0602168f451a93e71cbb1d59e93d745e62e:

   Merge tag 'bsd-user-misc-2024q2-pull-request' of gitlab.com:bsdimp/qemu into 
staging (2024-06-09 11:21:55 -0700)

are available in the Git repository at:

   https://gitlab.com/bonzini/qemu.git  tags/for-upstream

for you to fetch changes up to 109238a8d97cd8e85ca614109724a0b1222b21f5:

   target/i386: SEV: do not assume machine->cgs is SEV (2024-06-17 09:47:39 
+0200)


* i386: fix issue with cache topology passthrough
* scsi-disk: migrate emulated requests
* i386/sev: fix Coverity issues
* i386/tcg: more conversions to new decoder



v1->v2: fixed MOV from/to CR and DR in 64-bit modes (does not need REX.W)


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~

[PATCH 3/5] pnv/xive2: Set Translation Table for the NVC port space

2024-06-17 Thread Michael Kowal

From: Frederic Barrat 

Set Translation Table for the NVC port space is missing.  The xive model
doesn't take into account the remapping of IO operations via the Set
Translation Table but firmware is allowed to define it for the Notify
Virtual Crowd (NVC), like it's already done for the other VST tables.

Signed-off-by: Michael Kowal 
---
 hw/intc/pnv_xive2.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
index ff3d2d9c7b..a1146311a3 100644
--- a/hw/intc/pnv_xive2.c
+++ b/hw/intc/pnv_xive2.c
@@ -724,6 +724,7 @@ static int pnv_xive2_stt_set_data(PnvXive2 *xive, uint64_t 
val)
 case CQ_TAR_NVPG:
 case CQ_TAR_ESB:
 case CQ_TAR_END:
+case CQ_TAR_NVC:
 xive->tables[tsel][entry] = val;
 break;
 default:
-- 
2.39.3

[PATCH 2/5] pnv/xive2: Enable VST NVG and NVC index compression

2024-06-17 Thread Michael Kowal

From: Frederic Barrat 

Enable NVG and NVC VST tables for index compression which indicates the number
of bits the address is shifted to the right for the table accesses.
The compression values are defined as:
    - No compression
   0001 - 1 bit shift
   0010 - 2 bit shift
   
   1000 - 8 bit shift
   1001- - No compression

Signed-off-by: Michael Kowal 
---
 hw/intc/pnv_xive2_regs.h |  2 ++
 hw/intc/pnv_xive2.c  | 20 
 2 files changed, 22 insertions(+)

diff --git a/hw/intc/pnv_xive2_regs.h b/hw/intc/pnv_xive2_regs.h
index ca05255d20..e8b87b3d2c 100644
--- a/hw/intc/pnv_xive2_regs.h
+++ b/hw/intc/pnv_xive2_regs.h
@@ -427,6 +427,8 @@
 #define X_PC_NXC_PROC_CONFIG0x28A
 #define PC_NXC_PROC_CONFIG  0x450
 #define   PC_NXC_PROC_CONFIG_WATCH_ASSIGN   PPC_BITMASK(0, 3)
+#define   PC_NXC_PROC_CONFIG_NVG_TABLE_COMPRESS PPC_BITMASK(32, 35)
+#define   PC_NXC_PROC_CONFIG_NVC_TABLE_COMPRESS PPC_BITMASK(36, 39)
 
 /* NxC Cache Watch 0 Specification */
 #define X_PC_NXC_WATCH0_SPEC0x2A0
diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
index 2050ed5efd..ff3d2d9c7b 100644
--- a/hw/intc/pnv_xive2.c
+++ b/hw/intc/pnv_xive2.c
@@ -217,6 +217,20 @@ static uint64_t pnv_xive2_vst_addr_indirect(PnvXive2 
*xive, uint32_t type,
 return pnv_xive2_vst_addr_direct(xive, type, vsd, (idx % vst_per_page));
 }
 
+static uint8_t nvc_table_compress_shift(PnvXive2 *xive)
+{
+uint8_t shift =  GETFIELD(PC_NXC_PROC_CONFIG_NVC_TABLE_COMPRESS,
+  xive->pc_regs[PC_NXC_PROC_CONFIG >> 3]);
+return shift > 8 ? 0 : shift;
+}
+
+static uint8_t nvg_table_compress_shift(PnvXive2 *xive)
+{
+uint8_t shift = GETFIELD(PC_NXC_PROC_CONFIG_NVG_TABLE_COMPRESS,
+ xive->pc_regs[PC_NXC_PROC_CONFIG >> 3]);
+return shift > 8 ? 0 : shift;
+}
+
 static uint64_t pnv_xive2_vst_addr(PnvXive2 *xive, uint32_t type, uint8_t blk,
uint32_t idx)
 {
@@ -238,6 +252,12 @@ static uint64_t pnv_xive2_vst_addr(PnvXive2 *xive, 
uint32_t type, uint8_t blk,
 return xive ? pnv_xive2_vst_addr(xive, type, blk, idx) : 0;
 }
 
+if (type == VST_NVG) {
+idx >>= nvg_table_compress_shift(xive);
+} else if (type == VST_NVC) {
+idx >>= nvc_table_compress_shift(xive);
+}
+
 if (VSD_INDIRECT & vsd) {
 return pnv_xive2_vst_addr_indirect(xive, type, vsd, idx);
 }
-- 
2.39.3

[PATCH 1/5] pnv/xive2: XIVE2 Cache Watch, Cache Flush and Sync Injection support

2024-06-17 Thread Michael Kowal

From: Frederic Barrat 

XIVE offers a 'cache watch facility', which allows software to read/update
a potentially cached table entry with no software lock. There's one such
facility in the Virtualization Controller (VC) to update the ESB and END
entries and one in the Presentation Controller (PC) to update the
NVP/NVG/NVC entries.

Both the Virtualization layer (VC) and Presentation layer (PC) need to be
configured to access the VSTs. Since the information is redundant, the
xive model combines both into one set of tables.  Since the VST tables can
be set through both, they are dynamically re-mapped in memory by first
deleting the memory subregion.

For the NVG and NVC tables, it can make sense to only configure them
with the PC, since they are only used by the presenter. So this patch
allow to configure the VST tables through the PC as well. The
definitions are still shared.

Each facility has 4 cache watch engines to control the updates and
firmware can request an available engine by querying the hardware
'watch_assign' register of the VC or PC. The engine is then reserved and
is released after the data is updated by reading the 'watch_spec' register
(which also allows to check for a conflict during the update).
If no engine is available, the special value 0xFF is returned and
firmware is expected to repeat the request until an engine becomes
available.

There is also support for writing a completion notification byte in memory
whenever a cache flush or queue sync inject operation is requested by
software.  QEMU does not cache any of the XIVE data that is in memory and
therefore it simply writes the completion notification byte at the time
that the operation is requested.

Signed-off-by: Michael Kowal 
---
 hw/intc/pnv_xive2_regs.h  | 106 
 include/hw/ppc/pnv_chip.h |   1 +
 hw/intc/pnv_xive2.c   | 511 +-
 3 files changed, 554 insertions(+), 64 deletions(-)

diff --git a/hw/intc/pnv_xive2_regs.h b/hw/intc/pnv_xive2_regs.h
index 7165dc8704..ca05255d20 100644
--- a/hw/intc/pnv_xive2_regs.h
+++ b/hw/intc/pnv_xive2_regs.h
@@ -232,6 +232,10 @@
 #define  VC_ESBC_FLUSH_POLL_BLOCK_ID_MASK   PPC_BITMASK(32, 35)
 #define  VC_ESBC_FLUSH_POLL_OFFSET_MASK PPC_BITMASK(36, 63) /* 28-bit 
*/
 
+/* ESBC cache flush inject register */
+#define X_VC_ESBC_FLUSH_INJECT  0x142
+#define VC_ESBC_FLUSH_INJECT0x210
+
 /* ESBC configuration */
 #define X_VC_ESBC_CFG   0x148
 #define VC_ESBC_CFG 0x240
@@ -250,6 +254,10 @@
 #define  VC_EASC_FLUSH_POLL_BLOCK_ID_MASK   PPC_BITMASK(32, 35)
 #define  VC_EASC_FLUSH_POLL_OFFSET_MASK PPC_BITMASK(36, 63) /* 28-bit 
*/
 
+/* EASC flush inject register */
+#define X_VC_EASC_FLUSH_INJECT  0x162
+#define VC_EASC_FLUSH_INJECT0x310
+
 /*
  * VC2
  */
@@ -270,6 +278,10 @@
 #define  VC_ENDC_FLUSH_POLL_BLOCK_ID_MASK   PPC_BITMASK(36, 39)
 #define  VC_ENDC_FLUSH_POLL_OFFSET_MASK PPC_BITMASK(40, 63) /* 24-bit 
*/
 
+/* ENDC flush inject register */
+#define X_VC_ENDC_FLUSH_INJECT  0x182
+#define VC_ENDC_FLUSH_INJECT0x410
+
 /* ENDC Sync done */
 #define X_VC_ENDC_SYNC_DONE 0x184
 #define VC_ENDC_SYNC_DONE   0x420
@@ -283,6 +295,15 @@
 #define   VC_ENDC_SYNC_QUEUE_HARD   PPC_BIT(6)
 #define   VC_QUEUE_COUNT7
 
+/* ENDC cache watch assign */
+#define X_VC_ENDC_WATCH_ASSIGN  0x186
+#define VC_ENDC_WATCH_ASSIGN0x430
+
+/* ENDC configuration register */
+#define X_VC_ENDC_CFG   0x188
+#define VC_ENDC_CFG 0x440
+#define   VC_ENDC_CFG_CACHE_WATCH_ASSIGNPPC_BITMASK(32, 35)
+
 /* ENDC cache watch specification 0  */
 #define X_VC_ENDC_WATCH0_SPEC   0x1A0
 #define VC_ENDC_WATCH0_SPEC 0x500
@@ -302,6 +323,42 @@
 #define VC_ENDC_WATCH0_DATA20x530
 #define VC_ENDC_WATCH0_DATA30x538
 
+/* ENDC cache watch 1  */
+#define X_VC_ENDC_WATCH1_SPEC   0x1A8
+#define VC_ENDC_WATCH1_SPEC 0x540
+#define X_VC_ENDC_WATCH1_DATA0  0x1AC
+#define X_VC_ENDC_WATCH1_DATA1  0x1AD
+#define X_VC_ENDC_WATCH1_DATA2  0x1AE
+#define X_VC_ENDC_WATCH1_DATA3  0x1AF
+#define VC_ENDC_WATCH1_DATA00x560
+#define VC_ENDC_WATCH1_DATA10x568
+#define VC_ENDC_WATCH1_DATA20x570
+#define VC_ENDC_WATCH1_DATA30x578
+
+/* ENDC cache watch 2  */
+#define X_VC_ENDC_WATCH2_SPEC   0x1B0
+#define VC_ENDC_WATCH2_SPEC 0x580
+#define X_VC_ENDC_WATCH2_DATA0  0x1B4
+#define X_VC_ENDC_WATCH2_DATA1  0x1B5
+#define X_VC_ENDC_WATCH2_DATA2

[PATCH 5/5] pnv/xive2: Move xive2_nvp_pic_print_info() to xive2.c

2024-06-17 Thread Michael Kowal

From: Frederic Barrat 

Moving xive2_nvp_pic_print_info() align with the other "pic_print_info"
functions and allows us to call functions internal to xive2.c.

In XIVE Gen 2 there were some minor changes to the TIMA header that were
updated when printed.

Additional END state 'info pic' information as added.  The 'ignore',
'crowd' and 'precluded escalation control' bits of an Event Notification
Descriptor are all used when delivering an interrupt targeting a VP-group
or crowd.

Signed-off-by: Michael Kowal 
---
 include/hw/ppc/xive2_regs.h |  9 +
 hw/intc/pnv_xive2.c | 27 ---
 hw/intc/xive.c  | 11 +--
 hw/intc/xive2.c | 33 +++--
 4 files changed, 49 insertions(+), 31 deletions(-)

diff --git a/include/hw/ppc/xive2_regs.h b/include/hw/ppc/xive2_regs.h
index 816f5d0e84..5fa39f3ccc 100644
--- a/include/hw/ppc/xive2_regs.h
+++ b/include/hw/ppc/xive2_regs.h
@@ -97,6 +97,7 @@ typedef struct Xive2End {
 uint32_t   w6;
 #define END2_W6_FORMAT_BIT PPC_BIT32(0)
 #define END2_W6_IGNORE PPC_BIT32(1)
+#define END2_W6_CROWD  PPC_BIT32(2)
 #define END2_W6_VP_BLOCK   PPC_BITMASK32(4, 7)
 #define END2_W6_VP_OFFSET  PPC_BITMASK32(8, 31)
 #define END2_W6_VP_OFFSET_GEN1 PPC_BITMASK32(13, 31)
@@ -111,6 +112,8 @@ typedef struct Xive2End {
 #define xive2_end_is_notify(end)\
 (be32_to_cpu((end)->w0) & END2_W0_UCOND_NOTIFY)
 #define xive2_end_is_backlog(end)  (be32_to_cpu((end)->w0) & END2_W0_BACKLOG)
+#define xive2_end_is_precluded_escalation(end)  \
+(be32_to_cpu((end)->w0) & END2_W0_PRECL_ESC_CTL)
 #define xive2_end_is_escalate(end)  \
 (be32_to_cpu((end)->w0) & END2_W0_ESCALATE_CTL)
 #define xive2_end_is_uncond_escalation(end)  \
@@ -123,6 +126,10 @@ typedef struct Xive2End {
 (be32_to_cpu((end)->w0) & END2_W0_FIRMWARE1)
 #define xive2_end_is_firmware2(end)  \
 (be32_to_cpu((end)->w0) & END2_W0_FIRMWARE2)
+#define xive2_end_is_ignore(end)\
+(be32_to_cpu((end)->w6) & END2_W6_IGNORE)
+#define xive2_end_is_crowd(end) \
+(be32_to_cpu((end)->w6) & END2_W6_CROWD)
 
 static inline uint64_t xive2_end_qaddr(Xive2End *end)
 {
@@ -194,6 +201,8 @@ static inline uint32_t xive2_nvp_blk(uint32_t cam_line)
 return (cam_line >> XIVE2_NVP_SHIFT) & 0xf;
 }
 
+void xive2_nvp_pic_print_info(Xive2Nvp *nvp, uint32_t nvp_idx, Monitor *mon);
+
 /*
  * Notification Virtual Group or Crowd (NVG/NVC)
  */
diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
index e473109196..2138b7e365 100644
--- a/hw/intc/pnv_xive2.c
+++ b/hw/intc/pnv_xive2.c
@@ -2435,33 +2435,6 @@ static void pnv_xive2_register_types(void)
 
 type_init(pnv_xive2_register_types)
 
-static void xive2_nvp_pic_print_info(Xive2Nvp *nvp, uint32_t nvp_idx,
- Monitor *mon)
-{
-uint8_t  eq_blk = xive_get_field32(NVP2_W5_VP_END_BLOCK, nvp->w5);
-uint32_t eq_idx = xive_get_field32(NVP2_W5_VP_END_INDEX, nvp->w5);
-
-if (!xive2_nvp_is_valid(nvp)) {
-return;
-}
-
-monitor_printf(mon, "  %08x end:%02x/%04x IPB:%02x",
-   nvp_idx, eq_blk, eq_idx,
-   xive_get_field32(NVP2_W2_IPB, nvp->w2));
-/*
- * When the NVP is HW controlled, more fields are updated
- */
-if (xive2_nvp_is_hw(nvp)) {
-monitor_printf(mon, " CPPR:%02x",
-   xive_get_field32(NVP2_W2_CPPR, nvp->w2));
-if (xive2_nvp_is_co(nvp)) {
-monitor_printf(mon, " CO:%04x",
-   xive_get_field32(NVP2_W1_CO_THRID, nvp->w1));
-}
-}
-monitor_printf(mon, "\n");
-}
-
 /*
  * If the table is direct, we can compute the number of PQ entries
  * provisioned by FW.
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 057b308ae9..de583a87af 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -693,8 +693,15 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
 }
 }
 
-monitor_printf(mon, "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE 
PIPR"
-   "  W2\n", cpu_index);
+if (xive_presenter_get_config(tctx->xptr) & XIVE_PRESENTER_GEN1_TIMA_OS) {
+monitor_printf(mon,
+   "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR"
+   "  W2\n", cpu_index);
+} else {
+monitor_printf(mon,
+   "CPU[%04x]:   QW   NSR CPPR IPB LSMFB   -  LGS  T  PIPR"
+   "  W2\n", cpu_index);
+}
 
 for (i = 0; i < XIVE_TM_RING_COUNT; i++) {
 char *s = xive_tctx_ring_print(>regs[i * XIVE_TM_RING_SIZE]);
diff --git a/hw/intc/xive2.c b/hw/intc/xive2.c
index 98c0d8ba44..e5168330f3 100644
--- a/hw/intc/xive2.c
+++ b/hw/intc/xive2.c
@@ -91,7 +91,7 @@ void xive2_end_pic_print_info(Xive2End *end, uint32_t 
end_idx, Monitor *mon)
 pq =

[PATCH 0/5] XIVE changes for Cache Watch, VSTs, STT and info pic

2024-06-17 Thread Michael Kowal

These changes provide enhanced support of the External Interrupt Virtualization
Engine.  The changes are focused on the following areas:
 - Cache Watch, Cache Flush and Sync Injection
 - Virtual Structure Tables
 - Set Translation Table
 - 'info pic' command data that is dumped

Frederic Barrat (5):
  pnv/xive2: XIVE2 Cache Watch, Cache Flush and Sync Injection support
  pnv/xive2: Enable VST NVG and NVC index compression
  pnv/xive2: Set Translation Table for the NVC port space
  pnv/xive2: Fail VST entry address computation if table has no VSD
  pnv/xive2: Move xive2_nvp_pic_print_info() to xive2.c

 hw/intc/pnv_xive2_regs.h| 108 +++
 include/hw/ppc/pnv_chip.h   |   1 +
 include/hw/ppc/xive2_regs.h |   9 +
 hw/intc/pnv_xive2.c | 564 ++--
 hw/intc/xive.c  |  11 +-
 hw/intc/xive2.c |  33 ++-
 6 files changed, 631 insertions(+), 95 deletions(-)

--
2.39.3

[PATCH 4/5] pnv/xive2: Fail VST entry address computation if table has no VSD

2024-06-17 Thread Michael Kowal

From: Frederic Barrat 

Fail VST entry address computatio if firmware doesn't define a descriptor
for one of the Virtualization Structure Tables (VST), there's no point in
trying to compute the address of its entry.  Abort the operation and log
an error.

Signed-off-by: Michael Kowal 
---
 hw/intc/pnv_xive2.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
index a1146311a3..e473109196 100644
--- a/hw/intc/pnv_xive2.c
+++ b/hw/intc/pnv_xive2.c
@@ -244,6 +244,11 @@ static uint64_t pnv_xive2_vst_addr(PnvXive2 *xive, 
uint32_t type, uint8_t blk,
 }
 
 vsd = xive->vsds[type][blk];
+if (vsd == 0) {
+xive2_error(xive, "VST: vsd == 0 block id %d for VST %s %d !?",
+   blk, info->name, idx);
+return 0;
+}
 
 /* Remote VST access */
 if (GETFIELD(VSD_MODE, vsd) == VSD_MODE_FORWARD) {
-- 
2.39.3

Re: [RFC PATCH v2 2/5] rust: add bindgen step as a meson dependency

2024-06-17 Thread Paolo Bonzini

Just one somewhat larger request, otherwise just a collection of ideas.

On Tue, Jun 11, 2024 at 12:34 PM Manos Pitsidianakis
 wrote:
> diff --git a/rust/meson.build b/rust/meson.build
> new file mode 100644
> index 00..e9660a3045
> --- /dev/null
> +++ b/rust/meson.build
> @@ -0,0 +1,91 @@
> +rust_targets = {}
> +
> +cargo_wrapper = [
> +  find_program(meson.global_source_root() / 'scripts/cargo_wrapper.py'),
> +  '--config-headers', meson.project_build_root() / 'config-host.h',
> +  '--meson-build-root', meson.project_build_root(),
> +  '--meson-build-dir', meson.current_build_dir(),
> +  '--meson-source-dir', meson.current_source_dir(),
> +]
>
> +
> +# TODO: verify rust_target_triple if given as an option
> +if rust_target_triple == ''
> +  if not supported_oses.contains(host_os)
> +message()
> +error('QEMU does not support `' + host_os +'` as a Rust platform.')
> +  elif not supported_cpus.contains(host_arch)
> +message()
> +error('QEMU does not support `' + host_arch +'` as a Rust architecture.')
> +  endif
> +  rust_target_triple = host_arch + rust_supported_oses[host_os]
> +  if host_os == 'windows' and host_arch == 'aarch64'
> +rust_target_triple += 'llvm'
> +  endif
> +endif
> +
> +if get_option('optimization') in ['0', '1', 'g']
> +  rs_build_type = 'debug'
> +else
> +  rs_build_type = 'release'
> +endif
> +
> +rust_hw_target_list = {}
> +
> +foreach rust_hw_target, rust_hws: rust_hw_target_list
> +  foreach rust_hw_dev: rust_hws
> +output = meson.current_build_dir() / rust_target_triple / rs_build_type 
> / rust_hw_dev['output']
> +crate_metadata = {
> +  'name': rust_hw_dev['name'],
> +  'output': [rust_hw_dev['output']],
> +  'output-path': output,
> +  'command': [cargo_wrapper,
> +'--crate-dir',
> +meson.current_source_dir() / rust_hw_dev['dirname'],
> +'--profile',
> +rs_build_type,
> +'--target-triple',
> +rust_target_triple,
> +'--outdir',
> +'@OUTDIR@',
> +'build-lib'

Probably needs to add config-devices.h as well to --config-headers? I
think that we can have just one crate that is built per target,
instead of many small crates like your rust_pl011_cargo. And then that
crate is built many times with a rust/src/ hierarchy that resembles
the meson.build files we use for C.

rust/src/mod.rs:
pub mod hw;

rust/src/hw/mod.rs:
pub mod char;

rust/src/hw/char/mod.rs:
#[cfg(feature = "CONFIG_PL011")] pub mod pl011;

(perhaps just src/? And we could even move all sources down one level
to src/ for QEMU as well? But that can be done later, for now it can
be rust/src/). This basically gives Kconfig integration for free if it
works, so I'd ask you to try doing this for the next version?

Also, sooner or later we'll have to tackle building a set of common
dependencies (similar to common_ss), and then bringing those as a
dependency to the build of the per-target crates. Even if we don't get
to the point of building devices once for all targets, I'd rather at
least avoid having to build a dozen times the common deps of
procedural macros.

This is not trivial, but should not be hard either, by using build.rs
(cargo::rustc-link-search) to add the -Ldependency=/path/to/rlibs
option to the per-target rustc invocations. This may also require
grabbing the compilation log via "cargo build
--message-format=json-render-diagnostics", but that's not hard to do
in cargo_wrapper.py. And unlike the previous request about Kconfig, it
can be done after the first merge.

> +#include "qemu/osdep.h"
> +#include "qemu/module.h"
> +#include "qemu-io.h"
> +#include "sysemu/sysemu.h"
> +#include "hw/sysbus.h"
> +#include "exec/memory.h"
> +#include "chardev/char-fe.h"
> +#include "hw/clock.h"
> +#include "hw/qdev-clock.h"
> +#include "hw/qdev-properties.h"
> +#include "hw/qdev-properties-system.h"
> +#include "hw/irq.h"
> +#include "qapi/error.h"
> +#include "migration/vmstate.h"
> +#include "chardev/char-serial.h"

Please check for any headers included indirectly and whether it's
worth including them here (e.g. it seems that qapi/error.h,
qom/object.h, hw/qdev-core.h should be there).

> diff --git a/scripts/cargo_wrapper.py b/scripts/cargo_wrapper.py
> index d338effdaa..a36a4fc86d 100644
> --- a/scripts/cargo_wrapper.py
> +++ b/scripts/cargo_wrapper.py
> @@ -94,6 +94,8 @@ def get_cargo_rustc(args: argparse.Namespace) -> 
> tuple[Dict[str, Any], List[str]
>
>  env = os.environ
>  env["CARGO_ENCODED_RUSTFLAGS"] = cfg
> +env["MESON_BUILD_DIR"] = str(target_dir)
> +env["MESON_BUILD_ROOT"] = str(args.meson_build_root)
>
>  return (env, cargo_cmd)
>
> @@ -164,6 +166,14 @@ def main() -> None:
>  required=True,
>  )
>  parser.add_argument(
> +"--meson-build-root",
> +metavar="BUILD_ROOT",
> +help="meson.project_build_root()",
> +type=pathlib.Path,
> +dest="meson_build_root",
> +required=True,
> +)
> +

Re: [PATCH 9/9] contrib/plugins: add ips plugin example for cost modeling

2024-06-17 Thread Dr. David Alan Gilbert

* Pierrick Bouvier (pierrick.bouv...@linaro.org) wrote:
> On 6/14/24 15:00, Dr. David Alan Gilbert wrote:
> > * Pierrick Bouvier (pierrick.bouv...@linaro.org) wrote:
> > > Hi Dave,
> > > 
> > > On 6/12/24 14:02, Dr. David Alan Gilbert wrote:
> > > > * Alex Bennée (alex.ben...@linaro.org) wrote:
> > > > > From: Pierrick Bouvier 
> > > > > 
> > > > > This plugin uses the new time control interface to make decisions
> > > > > about the state of time during the emulation. The algorithm is
> > > > > currently very simple. The user specifies an ips rate which applies
> > > > > per core. If the core runs ahead of its allocated execution time the
> > > > > plugin sleeps for a bit to let real time catch up. Either way time is
> > > > > updated for the emulation as a function of total executed instructions
> > > > > with some adjustments for cores that idle.
> > > > 
> > > > A few random thoughts:
> > > > a) Are there any definitions of what a plugin that controls time
> > > >should do with a live migration?
> > > 
> > > It's not something that was considered as part of this work.
> > 
> > That's OK, the only thing is we need to stop anyone from hitting problems
> > when they don't realise it's not been addressed.
> > One way might be to add a migration blocker; see include/migration/blocker.h
> > then you might print something like 'Migration not available due to plugin 
> > '
> > 
> 
> So basically, we could make a call to migrate_add_blocker(), when someone
> request time_control through plugin API?
> 
> IMHO, it's something that should be part of plugin API (if any plugin calls
> qemu_plugin_request_time_control()), instead of the plugin code itself. This
> way, any plugin getting time control automatically blocks any potential
> migration.

Note my question asked for a 'any definitions of what a plugin ..' - so
you could define it that way, another one is to think that in the future
you may allow it and the plugin somehow interacts with migration not to
change time at certain migration phases.

> > > > b) The sleep in migration/dirtyrate.c points out g_usleep might
> > > >sleep for longer, so reads the actual wall clock time to
> > > >figure out a new 'now'.
> > > 
> > > The current API mentions time starts at 0 from qemu startup. Maybe we 
> > > could
> > > consider in the future to change this behavior to retrieve time from an
> > > existing migrated machine.
> > 
> > Ah, I meant for (b) to be independent of (a) - not related to migration; 
> > just
> > down to the fact you used g_usleep in the plugin and a g_usleep might sleep
> > for a different amount of time than you asked.
> > 
> 
> We know that, and the plugin is not meant to be "cycle accurate" in general,
> we just set a upper bound for number of instructions we can execute in a
> given amount of time (1/10 second for now).
> 
> We compute the new time based on how many instructions effectively ran on
> the most used cpu, so even if we slept a bit more than expected, it's
> correct.

Ah OK.

Dave

> > > > c) A fun thing to do with this would be to follow an external 
> > > > simulation
> > > >or 2nd qemu, trying to keep the two from running too far past
> > > >each other.
> > > > 
> > > 
> > > Basically, to slow the first one, waiting for the replicated one to catch
> > > up?
> > 
> > Yes, something like that.
> > 
> > Dave
> > 
> > > > Dave >
> > > > > Examples
> > > > > 
> > > > > 
> > > > > Slow down execution of /bin/true:
> > > > > $ num_insn=$(./build/qemu-x86_64 -plugin 
> > > > > ./build/tests/plugin/libinsn.so -d plugin /bin/true |& grep total | 
> > > > > sed -e 's/.*: //')
> > > > > $ time ./build/qemu-x86_64 -plugin 
> > > > > ./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
> > > > > real 4.000s
> > > > > 
> > > > > Boot a Linux kernel simulating a 250MHz cpu:
> > > > > $ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 
> > > > > -append "console=ttyS0" -plugin 
> > > > > ./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
> > > > > check time until kernel panic on serial0
> > > > > 
> > > > > Tested in system mode by booting a full debian system, and using:
> > > > > $ sysbench cpu run
> > > > > Performance decrease linearly with the given number of ips.
> > > > > 
> > > > > Signed-off-by: Pierrick Bouvier 
> > > > > Message-Id: <20240530220610.1245424-7-pierrick.bouv...@linaro.org>
> > > > > ---
> > > > >contrib/plugins/ips.c| 164 
> > > > > +++
> > > > >contrib/plugins/Makefile |   1 +
> > > > >2 files changed, 165 insertions(+)
> > > > >create mode 100644 contrib/plugins/ips.c
> > > > > 
> > > > > diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
> > > > > new file mode 100644
> > > > > index 00..db77729264
> > > > > --- /dev/null
> > > > > +++ b/contrib/plugins/ips.c
> > > > > @@ -0,0 +1,164 @@
> > > > > +/*
> > > > > + * ips rate limiting plugin.
>

Re: [PATCH v2 00/10] migration: New postcopy state, and some cleanups

2024-06-17 Thread Fabiano Rosas

Peter Xu  writes:

> Hello,
>
> On Mon, Jun 17, 2024 at 02:15:24PM -0400, Peter Xu wrote:
>> v2:
>> - Collect tags
>> - Patch 3
>>   - cover all states in migration_postcopy_is_alive()
>> - Patch 4 (old)
>>   - English changes [Fabiano]
>>   - Split the migration_incoming_state_setup() cleanup into a new patch
>> [Fabiano]
>>   - Drop RECOVER_SETUP in fill_destination_migration_info() [Fabiano]
>>   - Keep using explicit state check in migrate_fd_connect() for resume
>> [Fabiano]
>> - New patches
>>   - New doc update: "migration/docs: Update postcopy recover session for
>> SETUP phase"
>>   - New test case: last four patches
>
> I just found that this won't apply on top of latest master, and also has a
> trivial conflict against the direct-io stuffs.  Fabiano, I'll wait for a
> few days on comments, and resend v3 on top of your direct-io stuff.
>
> Meanwhile I also plan to squash below fixup to the last test patch, just to
> fix up a spelling error I just found, and also renamed the test cases (as
> the new test is actually also a "double failure" test, just at different
> phase).  Comments welcomed for that fixup even before a repost.
>
> ===8<===
> From 5b8fbc3a9d9e87ebfef1a3e5592fd196eecd5923 Mon Sep 17 00:00:00 2001
> From: Peter Xu 
> Date: Mon, 17 Jun 2024 14:40:15 -0400
> Subject: [PATCH] fixup! tests/migration-tests: Cover postcopy failure on
>  reconnect
>
> Signed-off-by: Peter Xu 
> ---
>  tests/qtest/migration-test.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index a4fed4cc6b..fe33b86783 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1474,7 +1474,7 @@ static void postcopy_recover_fail(QTestState *from, 
> QTestState *to,
>  
>  /*
>   * Kick dest QEMU out too. This is normally not needed in reality
> - * because when the channel is shutdown it should also happens on src.
> + * because when the channel is shutdown it should also happen on src.
>   * However here we used separate socket pairs so we need to do that
>   * explicitly.
>   */
> @@ -1565,7 +1565,7 @@ static void test_postcopy_recovery(void)
>  test_postcopy_recovery_common();
>  }
>  
> -static void test_postcopy_recovery_double_fail(void)
> +static void test_postcopy_recovery_fail_handshake(void)
>  {
>  MigrateCommon args = {
>  .postcopy_recovery_fail_stage = POSTCOPY_FAIL_RECOVERY,
> @@ -1574,7 +1574,7 @@ static void test_postcopy_recovery_double_fail(void)
>  test_postcopy_recovery_common();
>  }
>  
> -static void test_postcopy_recovery_channel_reconnect(void)
> +static void test_postcopy_recovery_fail_reconnect(void)
>  {
>  MigrateCommon args = {
>  .postcopy_recovery_fail_stage = POSTCOPY_FAIL_CHANNEL_ESTABLISH,
> @@ -3759,10 +3759,10 @@ int main(int argc, char **argv)
> test_postcopy_preempt);
>  migration_test_add("/migration/postcopy/preempt/recovery/plain",
> test_postcopy_preempt_recovery);
> -migration_test_add("/migration/postcopy/recovery/double-failures",
> -   test_postcopy_recovery_double_fail);
> -migration_test_add("/migration/postcopy/recovery/channel-reconnect",
> -   test_postcopy_recovery_channel_reconnect);
> +
> migration_test_add("/migration/postcopy/recovery/double-failures/handshake",
> +   test_postcopy_recovery_fail_handshake);
> +
> migration_test_add("/migration/postcopy/recovery/double-failures/reconnect",
> +   test_postcopy_recovery_fail_reconnect);
>  if (is_x86) {
>  migration_test_add("/migration/postcopy/suspend",
> test_postcopy_suspend);
> -- 
> 2.45.0

LGTM

Re: [RFC PATCH 0/3] target/i386: Reorg push/pop within seg_helper.c

2024-06-17 Thread Paolo Bonzini

On Mon, Jun 17, 2024 at 6:12 PM Richard Henderson
 wrote:
>
> Hi Paolo,
>
> Thanks for offering to do the work to fix the memory access issues
> identified by Robert.
>
> Here is a code dump from this weekend that I noodled with -- it is
> prep work only, not intending to change any semantics, but it may
> be helpful in finishing the work.
>
> I considered adding a MemOp member to the structure, which would
> allow the push/pop subroutine to choose the correct access width,
> which would allow the callers to stop having 3 nearly identical
> code paths.  But I didn't quite get that far, and I don't yet
> know if that would really work out wrt long-mode.

Thanks, I had something working but your patch 3 is definitely
prettier so I'll just rebase on top of these three.

Paolo

Re: [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect

2024-06-17 Thread Fabiano Rosas

Peter Xu  writes:

> Make sure there will be an event for postcopy recovery, irrelevant of
> whether the reconnect will success, or when the failure happens.
>
> The added new case is to fail early in postcopy recovery, in which case it
> didn't even reach RECOVER stage on src (and in real life it'll be the same
> to dest, but the test case is just slightly more involved due to the dual
> socketpair setup).
>
> To do that, rename the postcopy_recovery_test_fail to reflect either stage
> to fail, instead of a boolean.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH 19/20] qapi: convert "Note" sections to plain rST

2024-06-17 Thread John Snow

On Fri, Jun 14, 2024, 9:44 AM Markus Armbruster  wrote:

> John Snow  writes:
>
> > We do not need a dedicated section for notes. By eliminating a specially
> > parsed section, these notes can be treated as normal rST paragraphs in
> > the new QMP reference manual, and can be placed and styled much more
> > flexibly.
> >
> > Update the QAPI parser to now prohibit special "Note" sections while
> > suggesting a new syntax.
> >
> > The exact formatting to use is a matter of taste, but a good candidate
> > is simply:
> >
> > .. note:: lorem ipsum ...
> >
> > but there are other choices, too. The Sphinx readthedocs theme offers
> > theming for the following forms (capitalization unimportant); all are
> > adorned with a (!) symbol in the title bar for rendered HTML docs.
>

Store this paragraph in your L1 cache for a moment ...

>
> > These are rendered in orange:
> >
> > .. Attention:: ...
> > .. Caution:: ...
> > .. WARNING:: ...
> >
> > These are rendered in red:
> >
> > .. DANGER:: ...
> > .. Error:: ...
> >
> > These are rendered in green:
> >
> > .. Hint:: ...
> > .. Important:: ...
> > .. Tip:: ...
> >
> > These are rendered in blue:
> >
> > .. Note:: ...
> > .. admonition:: custom title
> >
> >admonition body text
> >
> > This patch uses ".. notes::" almost everywhere, with just two "caution"
> > directives. ".. admonition:: notes" is used in a few places where we had
> > an ordered list of multiple notes.
> >
> > Signed-off-by: John Snow 
> > ---
> >  qapi/block-core.json  | 30 +++
> >  qapi/block.json   |  2 +-
> >  qapi/char.json| 12 +--
> >  qapi/control.json | 15 ++--
> >  qapi/dump.json|  2 +-
> >  qapi/introspect.json  |  6 +-
> >  qapi/machine-target.json  | 26 +++---
> >  qapi/machine.json | 47 +-
> >  qapi/migration.json   | 12 +--
> >  qapi/misc.json| 88 +--
> >  qapi/net.json |  6 +-
> >  qapi/pci.json |  7 +-
> >  qapi/qdev.json| 30 +++
> >  qapi/qom.json | 19 ++--
> >  qapi/rocker.json  | 16 ++--
> >  qapi/run-state.json   | 18 ++--
> >  qapi/sockets.json | 10 +--
> >  qapi/stats.json   | 22 ++---
> >  qapi/transaction.json |  8 +-
> >  qapi/ui.json  | 29 +++---
> >  qapi/virtio.json  | 12 +--
> >  qga/qapi-schema.json  | 48 +-
> >  scripts/qapi/parser.py|  9 ++
> >  tests/qapi-schema/doc-empty-section.err   |  2 +-
> >  tests/qapi-schema/doc-empty-section.json  |  2 +-
> >  tests/qapi-schema/doc-good.json   |  6 +-
> >  tests/qapi-schema/doc-good.out| 10 ++-
> >  tests/qapi-schema/doc-good.txt| 14 ++-
> >  .../qapi-schema/doc-interleaved-section.json  |  2 +-
> >  29 files changed, 258 insertions(+), 252 deletions(-)
>
> Missing: update of docs/devel/qapi-code-gen.rst.  Something like
>
> diff --git a/docs/devel/qapi-code-gen.rst b/docs/devel/qapi-code-gen.rst
> index f453bd3546..1a4e240adb 100644
> --- a/docs/devel/qapi-code-gen.rst
> +++ b/docs/devel/qapi-code-gen.rst
> @@ -995,14 +995,13 @@ line "Features:", like this::
># @feature: Description text
>
>  A tagged section begins with a paragraph that starts with one of the
> -following words: "Note:"/"Notes:", "Since:", "Example:"/"Examples:",
> -"Returns:", "Errors:", "TODO:".  It ends with the start of a new
> -section.
> +following words: "Since:", "Example:"/"Examples:", "Returns:",
> +"Errors:", "TODO:".  It ends with the start of a new section.
>
>  The second and subsequent lines of tagged sections must be indented
>  like this::
>
> - # Note: Ut enim ad minim veniam, quis nostrud exercitation ullamco
> + # TODO: Ut enim ad minim veniam, quis nostrud exercitation ullamco
>   # laboris nisi ut aliquip ex ea commodo consequat.
>   #
>   # Duis aute irure dolor in reprehenderit in voluptate velit esse
>
>
Done.


> >
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 64fe5240cc9..530af40404d 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -1510,7 +1510,7 @@
> >  # @mode: whether and how QEMU should create a new image, default is
> >  # 'absolute-paths'.
> >  #
> > -# Note: Either @device or @node-name must be set but not both.
> > +# .. note:: Either @device or @node-name must be set but not both.
>
> The commit message talks about ".. Note::", but you actually use
> ".. note::".  Is there a difference?
>

Retrieve paragraph from L1 cache.

Nope!

Re: [PATCH v2 09/10] tests/migration-tests: Verify postcopy-recover-setup status

2024-06-17 Thread Fabiano Rosas

Peter Xu  writes:

> Making sure the postcopy-recover-setup status is present in the postcopy
> failure unit test.  Note that it only applies to src QEMU not dest.
>
> This also introduces the tiny but helpful migration_event_wait() helper.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH v2 08/10] tests/migration-tests: Always enable migration events

2024-06-17 Thread Fabiano Rosas

Peter Xu  writes:

> Libvirt should always enable it, so it'll be nice qtest also cover that for
> all tests.  Though this patch only enables it, no extra tests are done on
> these events yet.
>
> Signed-off-by: Peter Xu 
> ---
>  tests/qtest/migration-test.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 13b59d4c10..9ae8892e26 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -841,6 +841,13 @@ static int test_migrate_start(QTestState **from, 
> QTestState **to,
>  unlink(shmem_path);
>  }
>  
> +/*
> + * Always enable migration events.  Libvirt always uses it, let's try
> + * to mimic as closer as that.
> + */
> +migrate_set_capability(*from, "events", true);
> +migrate_set_capability(*to, "events", true);
> +

What do we do with the one at migrate_incoming_qmp()?

Re: [PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests

2024-06-17 Thread Fabiano Rosas

Peter Xu  writes:

> Most of them are not needed, we can stick with one ifdef inside
> postcopy_recover_fail() so as to cover the scm right tricks only.
> The tests won't run on windows anyway due to has_uffd always false.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH v2 06/10] migration/docs: Update postcopy recover session for SETUP phase

2024-06-17 Thread Fabiano Rosas

Peter Xu  writes:

> Firstly, the "Paused" state was added in the wrong place before. The state
> machine section was describing PostcopyState, rather than MigrationStatus.
> Drop the Paused state descriptions.
>
> Then in the postcopy recover session, add more information on the state
> machine for MigrationStatus in the lines.  Add the new RECOVER_SETUP phase.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas 

> ---
>  docs/devel/migration/postcopy.rst | 31 ---
>  1 file changed, 16 insertions(+), 15 deletions(-)
>
> diff --git a/docs/devel/migration/postcopy.rst 
> b/docs/devel/migration/postcopy.rst
> index 6c51e96d79..a15594e11f 100644
> --- a/docs/devel/migration/postcopy.rst
> +++ b/docs/devel/migration/postcopy.rst
> @@ -99,17 +99,6 @@ ADVISE->DISCARD->LISTEN->RUNNING->END
>  (although it can't do the cleanup it would do as it
>  finishes a normal migration).
>  
> - - Paused
> -
> -Postcopy can run into a paused state (normally on both sides when
> -happens), where all threads will be temporarily halted mostly due to
> -network errors.  When reaching paused state, migration will make sure
> -the qemu binary on both sides maintain the data without corrupting
> -the VM.  To continue the migration, the admin needs to fix the
> -migration channel using the QMP command 'migrate-recover' on the
> -destination node, then resume the migration using QMP command 'migrate'
> -again on source node, with resume=true flag set.
> -
>   - End
>  
>  The listen thread can now quit, and perform the cleanup of migration
> @@ -221,7 +210,8 @@ paused postcopy migration.
>  
>  The recovery phase normally contains a few steps:
>  
> -  - When network issue occurs, both QEMU will go into PAUSED state
> +  - When network issue occurs, both QEMU will go into **POSTCOPY_PAUSED**
> +migration state.
>  
>- When the network is recovered (or a new network is provided), the admin
>  can setup the new channel for migration using QMP command
> @@ -229,9 +219,20 @@ The recovery phase normally contains a few steps:
>  
>- On source host, the admin can continue the interrupted postcopy
>  migration using QMP command 'migrate' with resume=true flag set.
> -
> -  - After the connection is re-established, QEMU will continue the postcopy
> -migration on both sides.
> +Source QEMU will go into **POSTCOPY_RECOVER_SETUP** state trying to
> +re-establish the channels.
> +
> +  - When both sides of QEMU successfully reconnects using a new or fixed up

s/reconnects/reconnect

I can touch it up when queueing

> +channel, they will go into **POSTCOPY_RECOVER** state, some handshake
> +procedure will be needed to properly synchronize the VM states between
> +the two QEMUs to continue the postcopy migration.  For example, there
> +can be pages sent right during the window when the network is
> +interrupted, then the handshake will guarantee pages lost in-flight
> +will be resent again.
> +
> +  - After a proper handshake synchronization, QEMU will continue the
> +postcopy migration on both sides and go back to **POSTCOPY_ACTIVE**
> +state.  Postcopy migration will continue.
>  
>  During a paused postcopy migration, the VM can logically still continue
>  running, and it will not be impacted from any page access to pages that

Re: [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase

2024-06-17 Thread Fabiano Rosas

Peter Xu  writes:

> This patch adds a migration state on src called "postcopy-recover-setup".
> The new state will describe the intermediate step starting from when the
> src QEMU received a postcopy recovery request, until the migration channels
> are properly established, but before the recovery process take place.
>
> The request came from Libvirt where Libvirt currently rely on the migration
> state events to detect migration state changes.  That works for most of the
> migration process but except postcopy recovery failures at the beginning.
>
> Currently postcopy recovery only has two major states:
>
>   - postcopy-paused: this is the state that both sides of QEMU will be in
> for a long time as long as the migration channel was interrupted.
>
>   - postcopy-recover: this is the state where both sides of QEMU handshake
> with each other, preparing for a continuation of postcopy which used to
> be interrupted.
>
> The issue here is when the recovery port is invalid, the src QEMU will take
> the URI/channels, noticing the ports are not valid, and it'll silently keep
> in the postcopy-paused state, with no event sent to Libvirt.  In this case,
> the only thing Libvirt can do is to poll the migration status with a proper
> interval, however that's less optimal.
>
> Considering that this is the only case where Libvirt won't get a
> notification from QEMU on such events, let's add postcopy-recover-setup
> state to mimic what we have with the "setup" state of a newly initialized
> migration, describing the phase of connection establishment.
>
> With that, postcopy recovery will have two paths to go now, and either path
> will guarantee an event generated.  Now the events will look like this
> during a recovery process on src QEMU:
>
>   - Initially when the recovery is initiated on src, QEMU will go from
> "postcopy-paused" -> "postcopy-recover-setup".  Old QEMUs don't have
> this event.
>
>   - Depending on whether the channel re-establishment is succeeded:
>
> - In succeeded case, src QEMU will move from "postcopy-recover-setup"
>   to "postcopy-recover".  Old QEMUs also have this event.
>
> - In failure case, src QEMU will move from "postcopy-recover-setup" to
>   "postcopy-paused" again.  Old QEMUs don't have this event.
>
> This guarantees that Libvirt will always receive a notification for
> recovery process properly.
>
> One thing to mention is, such new status is only needed on src QEMU not
> both.  On dest QEMU, the state machine doesn't change.  Hence the events
> don't change either.  It's done like so because dest QEMU may not have an
> explicit point of setup start.  E.g., it can happen that when dest QEMUs
> doesn't use migrate-recover command to use a new URI/channel, but the old
> URI/channels can be reused in recovery, in which case the old ports simply
> can work again after the network routes are fixed up.
>
> Add a new helper postcopy_is_paused() detecting whether postcopy is still
> paused, taking RECOVER_SETUP into account too.  When using it on both
> src/dst, a slight change is done altogether to always wait for the
> semaphore before checking the status, because for both sides a sem_post()
> will be required for a recovery.
>
> Cc: Jiri Denemark 
> Cc: Fabiano Rosas 
> Cc: Prasad Pandit 
> Buglink: https://issues.redhat.com/browse/RHEL-38485
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH v2 04/10] migration: Cleanup incoming migration setup state change

2024-06-17 Thread Fabiano Rosas

Peter Xu  writes:

> Destination QEMU can setup incoming ports for two purposes: either a fresh
> new incoming migration, in which QEMU will switch to SETUP for channel
> establishment, or a paused postcopy migration, in which QEMU will stay in
> POSTCOPY_PAUSED until kicking off the RECOVER phase.
>
> Now the state machine worked on dest node for the latter, only because
> migrate_set_state() implicitly will become a noop if the current state
> check failed.  It wasn't clear at all.
>
> Clean it up by providing a helper migration_incoming_state_setup() doing
> proper checks over current status.  Postcopy-paused will be explicitly
> checked now, and then we can bail out for unknown states.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH v2 03/10] migration: Use MigrationStatus instead of int

2024-06-17 Thread Fabiano Rosas

Peter Xu  writes:

> QEMU uses "int" in most cases even if it stores MigrationStatus.  I don't
> know why, so let's try to do that right and see what blows up..
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH v2 00/10] migration: New postcopy state, and some cleanups

2024-06-17 Thread Peter Xu

Hello,

On Mon, Jun 17, 2024 at 02:15:24PM -0400, Peter Xu wrote:
> v2:
> - Collect tags
> - Patch 3
>   - cover all states in migration_postcopy_is_alive()
> - Patch 4 (old)
>   - English changes [Fabiano]
>   - Split the migration_incoming_state_setup() cleanup into a new patch
> [Fabiano]
>   - Drop RECOVER_SETUP in fill_destination_migration_info() [Fabiano]
>   - Keep using explicit state check in migrate_fd_connect() for resume
> [Fabiano]
> - New patches
>   - New doc update: "migration/docs: Update postcopy recover session for
> SETUP phase"
>   - New test case: last four patches

I just found that this won't apply on top of latest master, and also has a
trivial conflict against the direct-io stuffs.  Fabiano, I'll wait for a
few days on comments, and resend v3 on top of your direct-io stuff.

Meanwhile I also plan to squash below fixup to the last test patch, just to
fix up a spelling error I just found, and also renamed the test cases (as
the new test is actually also a "double failure" test, just at different
phase).  Comments welcomed for that fixup even before a repost.

===8<===
>From 5b8fbc3a9d9e87ebfef1a3e5592fd196eecd5923 Mon Sep 17 00:00:00 2001
From: Peter Xu 
Date: Mon, 17 Jun 2024 14:40:15 -0400
Subject: [PATCH] fixup! tests/migration-tests: Cover postcopy failure on
 reconnect

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index a4fed4cc6b..fe33b86783 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1474,7 +1474,7 @@ static void postcopy_recover_fail(QTestState *from, 
QTestState *to,
 
 /*
  * Kick dest QEMU out too. This is normally not needed in reality
- * because when the channel is shutdown it should also happens on src.
+ * because when the channel is shutdown it should also happen on src.
  * However here we used separate socket pairs so we need to do that
  * explicitly.
  */
@@ -1565,7 +1565,7 @@ static void test_postcopy_recovery(void)
 test_postcopy_recovery_common();
 }
 
-static void test_postcopy_recovery_double_fail(void)
+static void test_postcopy_recovery_fail_handshake(void)
 {
 MigrateCommon args = {
 .postcopy_recovery_fail_stage = POSTCOPY_FAIL_RECOVERY,
@@ -1574,7 +1574,7 @@ static void test_postcopy_recovery_double_fail(void)
 test_postcopy_recovery_common();
 }
 
-static void test_postcopy_recovery_channel_reconnect(void)
+static void test_postcopy_recovery_fail_reconnect(void)
 {
 MigrateCommon args = {
 .postcopy_recovery_fail_stage = POSTCOPY_FAIL_CHANNEL_ESTABLISH,
@@ -3759,10 +3759,10 @@ int main(int argc, char **argv)
test_postcopy_preempt);
 migration_test_add("/migration/postcopy/preempt/recovery/plain",
test_postcopy_preempt_recovery);
-migration_test_add("/migration/postcopy/recovery/double-failures",
-   test_postcopy_recovery_double_fail);
-migration_test_add("/migration/postcopy/recovery/channel-reconnect",
-   test_postcopy_recovery_channel_reconnect);
+
migration_test_add("/migration/postcopy/recovery/double-failures/handshake",
+   test_postcopy_recovery_fail_handshake);
+
migration_test_add("/migration/postcopy/recovery/double-failures/reconnect",
+   test_postcopy_recovery_fail_reconnect);
 if (is_x86) {
 migration_test_add("/migration/postcopy/suspend",
test_postcopy_suspend);
-- 
2.45.0


-- 
Peter Xu

Re: [PATCH v3 16/16] tests/qtest/migration: Add a test for mapped-ram with passing of fds

2024-06-17 Thread Peter Xu

On Mon, Jun 17, 2024 at 03:57:31PM -0300, Fabiano Rosas wrote:
> Add a multifd test for mapped-ram with passing of fds into QEMU. This
> is how libvirt will consume the feature.
> 
> There are a couple of details to the fdset mechanism:
> 
> - multifd needs two distinct file descriptors (not duplicated with
>   dup()) so it can enable O_DIRECT only on the channels that do
>   aligned IO. The dup() system call creates file descriptors that
>   share status flags, of which O_DIRECT is one.
> 
> - the open() access mode flags used for the fds passed into QEMU need
>   to match the flags QEMU uses to open the file. Currently O_WRONLY
>   for src and O_RDONLY for dst.
> 
> Note that fdset code goes under _WIN32 because fd passing is not
> supported on Windows.
> 
> Signed-off-by: Fabiano Rosas 
> ---
> - dropped Peter's r-b
> 
> - stopped removing the fdset at the end of the tests. The migration
> code should always cleanup after itself.

Ah, that looks also ok.

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH v3 12/16] migration/multifd: Add direct-io support

2024-06-17 Thread Peter Xu

On Mon, Jun 17, 2024 at 03:57:27PM -0300, Fabiano Rosas wrote:
> When multifd is used along with mapped-ram, we can take benefit of a
> filesystem that supports the O_DIRECT flag and perform direct I/O in
> the multifd threads. This brings a significant performance improvement
> because direct-io writes bypass the page cache which would otherwise
> be thrashed by the multifd data which is unlikely to be needed again
> in a short period of time.
> 
> To be able to use a multifd channel opened with O_DIRECT, we must
> ensure that a certain aligment is used. Filesystems usually require a
> block-size alignment for direct I/O. The way to achieve this is by
> enabling the mapped-ram feature, which already aligns its I/O properly
> (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).
> 
> By setting O_DIRECT on the multifd channels, all writes to the same
> file descriptor need to be aligned as well, even the ones that come
> from outside multifd, such as the QEMUFile I/O from the main migration
> code. This makes it impossible to use the same file descriptor for the
> QEMUFile and for the multifd channels. The various flags and metadata
> written by the main migration code will always be unaligned by virtue
> of their small size. To workaround this issue, we'll require a second
> file descriptor to be used exclusively for direct I/O.
> 
> The second file descriptor can be obtained by QEMU by re-opening the
> migration file (already possible), or by being provided by the user or
> management application (support to be added in future patches).
> 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH 9/9] contrib/plugins: add ips plugin example for cost modeling

2024-06-17 Thread Pierrick Bouvier


On 6/14/24 15:00, Dr. David Alan Gilbert wrote:

* Pierrick Bouvier (pierrick.bouv...@linaro.org) wrote:

Hi Dave,

On 6/12/24 14:02, Dr. David Alan Gilbert wrote:

* Alex Bennée (alex.ben...@linaro.org) wrote:

From: Pierrick Bouvier 

This plugin uses the new time control interface to make decisions
about the state of time during the emulation. The algorithm is
currently very simple. The user specifies an ips rate which applies
per core. If the core runs ahead of its allocated execution time the
plugin sleeps for a bit to let real time catch up. Either way time is
updated for the emulation as a function of total executed instructions
with some adjustments for cores that idle.


A few random thoughts:
a) Are there any definitions of what a plugin that controls time
   should do with a live migration?


It's not something that was considered as part of this work.


That's OK, the only thing is we need to stop anyone from hitting problems
when they don't realise it's not been addressed.
One way might be to add a migration blocker; see include/migration/blocker.h
then you might print something like 'Migration not available due to plugin '



So basically, we could make a call to migrate_add_blocker(), when 
someone request time_control through plugin API?


IMHO, it's something that should be part of plugin API (if any plugin 
calls qemu_plugin_request_time_control()), instead of the plugin code 
itself. This way, any plugin getting time control automatically blocks 
any potential migration.



b) The sleep in migration/dirtyrate.c points out g_usleep might
   sleep for longer, so reads the actual wall clock time to
   figure out a new 'now'.


The current API mentions time starts at 0 from qemu startup. Maybe we could
consider in the future to change this behavior to retrieve time from an
existing migrated machine.


Ah, I meant for (b) to be independent of (a) - not related to migration; just
down to the fact you used g_usleep in the plugin and a g_usleep might sleep
for a different amount of time than you asked.



We know that, and the plugin is not meant to be "cycle accurate" in 
general, we just set a upper bound for number of instructions we can 
execute in a given amount of time (1/10 second for now).


We compute the new time based on how many instructions effectively ran 
on the most used cpu, so even if we slept a bit more than expected, it's 
correct.



c) A fun thing to do with this would be to follow an external simulation
   or 2nd qemu, trying to keep the two from running too far past
   each other.



Basically, to slow the first one, waiting for the replicated one to catch
up?


Yes, something like that.

Dave


Dave >

Examples


Slow down execution of /bin/true:
$ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin 
/bin/true |& grep total | sed -e 's/.*: //')
$ time ./build/qemu-x86_64 -plugin 
./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
real 4.000s

Boot a Linux kernel simulating a 250MHz cpu:
$ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append 
"console=ttyS0" -plugin 
./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
check time until kernel panic on serial0

Tested in system mode by booting a full debian system, and using:
$ sysbench cpu run
Performance decrease linearly with the given number of ips.

Signed-off-by: Pierrick Bouvier 
Message-Id: <20240530220610.1245424-7-pierrick.bouv...@linaro.org>
---
   contrib/plugins/ips.c| 164 +++
   contrib/plugins/Makefile |   1 +
   2 files changed, 165 insertions(+)
   create mode 100644 contrib/plugins/ips.c

diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
new file mode 100644
index 00..db77729264
--- /dev/null
+++ b/contrib/plugins/ips.c
@@ -0,0 +1,164 @@
+/*
+ * ips rate limiting plugin.
+ *
+ * This plugin can be used to restrict the execution of a system to a
+ * particular number of Instructions Per Second (ips). This controls
+ * time as seen by the guest so while wall-clock time may be longer
+ * from the guests point of view time will pass at the normal rate.
+ *
+ * This uses the new plugin API which allows the plugin to control
+ * system time.
+ *
+ * Copyright (c) 2023 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include 
+#include 
+#include 
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+/* how many times do we update time per sec */
+#define NUM_TIME_UPDATE_PER_SEC 10
+#define NSEC_IN_ONE_SEC (1000 * 1000 * 1000)
+
+static GMutex global_state_lock;
+
+static uint64_t max_insn_per_second = 1000 * 1000 * 1000; /* ips per core, per 
second */
+static uint64_t max_insn_per_quantum; /* trap every N instructions */
+static int64_t virtual_time_ns; /* last set virtual time */
+
+static const void *time_handle;
+
+typedef struct {
+uint64_t total_insn;
+uint64_t quantum_insn; /*

Re: [PATCH v3 01/16] migration: Drop reference to QIOChannel if file seeking fails

2024-06-17 Thread Peter Xu

On Mon, Jun 17, 2024 at 03:57:16PM -0300, Fabiano Rosas wrote:
> We forgot to drop the reference to the QIOChannel in the error path of
> the offset adjustment. Do it now.
> 
> Signed-off-by: Fabiano Rosas 

Reviewed-by: Peter Xu 

-- 
Peter Xu

Re: [PATCH 9/9] contrib/plugins: add ips plugin example for cost modeling

2024-06-17 Thread Pierrick Bouvier


On 6/16/24 11:43, Alex Bennée wrote:

Pierrick Bouvier  writes:


On 6/13/24 01:54, Philippe Mathieu-Daudé wrote:

On 12/6/24 17:35, Alex Bennée wrote:

From: Pierrick Bouvier 

This plugin uses the new time control interface to make decisions
about the state of time during the emulation. The algorithm is
currently very simple. The user specifies an ips rate which applies

... IPS rate (Instructions Per Second) which ...


per core. If the core runs ahead of its allocated execution time the
plugin sleeps for a bit to let real time catch up. Either way time is
updated for the emulation as a function of total executed instructions
with some adjustments for cores that idle.

Examples


Slow down execution of /bin/true:
$ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin 
/bin/true |& grep total | sed -e 's/.*: //')
$ time ./build/qemu-x86_64 -plugin 
./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
real 4.000s

Boot a Linux kernel simulating a 250MHz cpu:
$ /build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append 
"console=ttyS0" -plugin 
./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
check time until kernel panic on serial0

Tested in system mode by booting a full debian system, and using:
$ sysbench cpu run
Performance decrease linearly with the given number of ips.

Signed-off-by: Pierrick Bouvier 
Message-Id: <20240530220610.1245424-7-pierrick.bouv...@linaro.org>
---
contrib/plugins/ips.c| 164 +++
contrib/plugins/Makefile |   1 +
2 files changed, 165 insertions(+)
create mode 100644 contrib/plugins/ips.c

diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
new file mode 100644
index 00..db77729264
--- /dev/null
+++ b/contrib/plugins/ips.c
@@ -0,0 +1,164 @@
+/*
+ * ips rate limiting plugin.

The plugin names are really to packed to my taste (each time I look
for
one I have to open most source files to figure out the correct one); so
please ease my life by using a more descriptive header at least:
Instructions Per Second (IPS) rate limiting plugin.
Thanks.



I agree most of the plugin names are pretty cryptic, and they are
lacking a common "help" system, to describe what they do, and which
options are available for them. It's definitely something we could add
in the future.

Regarding what you reported, I'm totally ok with the change.

However, since this is a new series, I'm not if I or Alex should
change it. If it's ok for you to modify this Alex, it could be simpler
than waiting for me to push a new patch with just this.


Its my tree so I'll fix it up. I'll ask you if I want a respin ;-)



Thanks Alex.

[PATCH 12/23] Add ability to get rval2

2024-06-17 Thread Ajeet Singh

From: Warner Losh 

Function accesses the x1 register which holds the value

Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch_vmparam.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/bsd-user/aarch64/target_arch_vmparam.h 
b/bsd-user/aarch64/target_arch_vmparam.h
index dc66e1289b..0c35491970 100644
--- a/bsd-user/aarch64/target_arch_vmparam.h
+++ b/bsd-user/aarch64/target_arch_vmparam.h
@@ -65,4 +65,10 @@ static inline void set_second_rval(CPUARMState *state, 
abi_ulong retval2)
 {
 state->xregs[1] = retval2; /* XXX not really used on 64-bit arch */
 }
+
+static inline abi_ulong get_second_rval(CPUARMState *state)
+{
+return state->xregs[1];
+}
+
 #endif /* TARGET_ARCH_VMPARAM_H */
-- 
2.34.1

[PATCH 06/23] Add Aarch64 register handling

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Header file for managing CPU register states in
FreeBSD user mode

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch_reg.h | 56 ++
 1 file changed, 56 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch_reg.h

diff --git a/bsd-user/aarch64/target_arch_reg.h 
b/bsd-user/aarch64/target_arch_reg.h
new file mode 100644
index 00..5c7154f0c1
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_reg.h
@@ -0,0 +1,56 @@
+/*
+ *  FreeBSD arm64 register structures
+ *
+ *  Copyright (c) 2015 Stacey Son
+ *  All rights reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#ifndef TARGET_ARCH_REG_H
+#define TARGET_ARCH_REG_H
+
+/* See sys/arm64/include/reg.h */
+typedef struct target_reg {
+uint64_tx[30];
+uint64_tlr;
+uint64_tsp;
+uint64_telr;
+uint64_tspsr;
+} target_reg_t;
+
+typedef struct target_fpreg {
+__uint128_t fp_q[32];
+uint32_tfp_sr;
+uint32_tfp_cr;
+} target_fpreg_t;
+
+#define tswapreg(ptr)   tswapal(ptr)
+
+static inline void target_copy_regs(target_reg_t *regs, CPUARMState *env)
+{
+int i;
+
+for (i = 0; i < 30; i++) {
+regs->x[i] = tswapreg(env->xregs[i]);
+}
+regs->lr = tswapreg(env->xregs[30]);
+regs->sp = tswapreg(env->xregs[31]);
+regs->elr = tswapreg(env->pc);
+regs->spsr = tswapreg(pstate_read(env));
+}
+
+#undef tswapreg
+
+#endif /* TARGET_ARCH_REG_H */
-- 
2.34.1

[PATCH 20/23] Add get_mcontext function for ARM AArch64 in bsd-user

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

function to retrieve machine context,it populates the provided
target_mcontext_t structure with information from the CPUARMState
registers

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
Co-authored-by: Kyle Evans 
---
 bsd-user/aarch64/signal.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/bsd-user/aarch64/signal.c b/bsd-user/aarch64/signal.c
index 98861f9ab3..ab3bf8558a 100644
--- a/bsd-user/aarch64/signal.c
+++ b/bsd-user/aarch64/signal.c
@@ -51,3 +51,33 @@ abi_long set_sigtramp_args(CPUARMState *regs, int sig,
 
 return 0;
 }
+
+/*
+ * Compare to get_mcontext() in arm64/arm64/machdep.c
+ * Assumes that the memory is locked if mcp points to user memory.
+ */
+abi_long get_mcontext(CPUARMState *regs, target_mcontext_t *mcp, int flags)
+{
+int err = 0, i;
+uint64_t *gr = mcp->mc_gpregs.gp_x;
+
+mcp->mc_gpregs.gp_spsr = pstate_read(regs);
+if (flags & TARGET_MC_GET_CLEAR_RET) {
+gr[0] = 0UL;
+mcp->mc_gpregs.gp_spsr &= ~CPSR_C;
+} else {
+gr[0] = tswap64(regs->xregs[0]);
+}
+
+for (i = 1; i < 30; i++) {
+gr[i] = tswap64(regs->xregs[i]);
+}
+
+mcp->mc_gpregs.gp_sp = tswap64(regs->xregs[TARGET_REG_SP]);
+mcp->mc_gpregs.gp_lr = tswap64(regs->xregs[TARGET_REG_LR]);
+mcp->mc_gpregs.gp_elr = tswap64(regs->pc);
+
+/* XXX FP? */
+
+return err;
+}
-- 
2.34.1

[PATCH 22/23] Add set_mcontext function for ARM AArch64 in bsd-user

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

The function copies register values from the provided target_mcontext_t
structure to the CPUARMState registers

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/signal.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/bsd-user/aarch64/signal.c b/bsd-user/aarch64/signal.c
index 43c886e603..13faac8ce6 100644
--- a/bsd-user/aarch64/signal.c
+++ b/bsd-user/aarch64/signal.c
@@ -95,3 +95,25 @@ abi_long setup_sigframe_arch(CPUARMState *env, abi_ulong 
frame_addr,
 return 0;
 }
 
+/*
+ * Compare to set_mcontext() in arm64/arm64/machdep.c
+ * Assumes that the memory is locked if frame points to user memory.
+ */
+abi_long set_mcontext(CPUARMState *regs, target_mcontext_t *mcp, int srflag)
+{
+int err = 0, i;
+const uint64_t *gr = mcp->mc_gpregs.gp_x;
+
+for (i = 0; i < 30; i++) {
+regs->xregs[i] = tswap64(gr[i]);
+}
+
+regs->xregs[TARGET_REG_SP] = tswap64(mcp->mc_gpregs.gp_sp);
+regs->xregs[TARGET_REG_LR] = tswap64(mcp->mc_gpregs.gp_lr);
+regs->pc = mcp->mc_gpregs.gp_elr;
+pstate_write(regs, mcp->mc_gpregs.gp_spsr);
+
+/* XXX FP? */
+
+return err;
+}
-- 
2.34.1

[PATCH 23/23] Add get_ucontext_sigreturn function

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Function checks the processor state to ensure that the current
execution mode is EL0 and no flags indicating interrupts or
exceptions are set

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/signal.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/bsd-user/aarch64/signal.c b/bsd-user/aarch64/signal.c
index 13faac8ce6..ad81531ec5 100644
--- a/bsd-user/aarch64/signal.c
+++ b/bsd-user/aarch64/signal.c
@@ -117,3 +117,21 @@ abi_long set_mcontext(CPUARMState *regs, target_mcontext_t 
*mcp, int srflag)
 
 return err;
 }
+
+/* Compare to sys_sigreturn() in  arm64/arm64/machdep.c */
+abi_long get_ucontext_sigreturn(CPUARMState *regs, abi_ulong target_sf,
+abi_ulong *target_uc)
+{
+uint32_t pstate = pstate_read(regs);
+
+*target_uc = 0;
+
+if ((pstate & PSTATE_M) != PSTATE_MODE_EL0t  ||
+(pstate & (PSTATE_F | PSTATE_I | PSTATE_A | PSTATE_D)) != 0) {
+return -TARGET_EINVAL;
+}
+
+*target_uc = target_sf;
+
+return 0;
+}
-- 
2.34.1

[PATCH v3 03/16] tests/qtest/migration: Fix file migration offset check

2024-06-17 Thread Fabiano Rosas

When doing file migration, QEMU accepts an offset that should be
skipped when writing the migration stream to the file. The purpose of
the offset is to allow the management layer to put its own metadata at
the start of the file.

We have tests for this in migration-test, but only testing that the
migration stream starts at the correct offset and not that it actually
leaves the data intact. Unsurprisingly, there's been a bug in that
area that the tests didn't catch.

Fix the tests to write some data to the offset region and check that
it's actually there after the migration.

While here, switch to using g_get_file_contents() which is more
portable than mmap().

Reviewed-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 79 ++--
 1 file changed, 48 insertions(+), 31 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0dccb4beff..0a529a527b 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -68,6 +68,7 @@ static QTestMigrationState dst_state;
 #define QEMU_VM_FILE_MAGIC 0x5145564d
 #define FILE_TEST_FILENAME "migfile"
 #define FILE_TEST_OFFSET 0x1000
+#define FILE_TEST_MARKER 'X'
 #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
 #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
 
@@ -1693,10 +1694,43 @@ finish:
 test_migrate_end(from, to, args->result == MIG_TEST_SUCCEED);
 }
 
+static void file_dirty_offset_region(void)
+{
+g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+size_t size = FILE_TEST_OFFSET;
+g_autofree char *data = g_new0(char, size);
+
+memset(data, FILE_TEST_MARKER, size);
+g_assert(g_file_set_contents(path, data, size, NULL));
+}
+
+static void file_check_offset_region(void)
+{
+g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+size_t size = FILE_TEST_OFFSET;
+g_autofree char *expected = g_new0(char, size);
+g_autofree char *actual = NULL;
+uint64_t *stream_start;
+
+/*
+ * Ensure the skipped offset region's data has not been touched
+ * and the migration stream starts at the right place.
+ */
+
+memset(expected, FILE_TEST_MARKER, size);
+
+g_assert(g_file_get_contents(path, , NULL, NULL));
+g_assert(!memcmp(actual, expected, size));
+
+stream_start = (uint64_t *)(actual + size);
+g_assert_cmpint(cpu_to_be64(*stream_start) >> 32, ==, QEMU_VM_FILE_MAGIC);
+}
+
 static void test_file_common(MigrateCommon *args, bool stop_src)
 {
 QTestState *from, *to;
 void *data_hook = NULL;
+bool check_offset = false;
 
 if (test_migrate_start(, , args->listen_uri, >start)) {
 return;
@@ -1709,6 +1743,16 @@ static void test_file_common(MigrateCommon *args, bool 
stop_src)
  */
 g_assert_false(args->live);
 
+if (g_strrstr(args->connect_uri, "offset=")) {
+check_offset = true;
+/*
+ * This comes before the start_hook because it's equivalent to
+ * a management application creating the file and writing to
+ * it so hooks should expect the file to be already present.
+ */
+file_dirty_offset_region();
+}
+
 if (args->start_hook) {
 data_hook = args->start_hook(from, to);
 }
@@ -1743,6 +1787,10 @@ static void test_file_common(MigrateCommon *args, bool 
stop_src)
 
 wait_for_serial("dest_serial");
 
+if (check_offset) {
+file_check_offset_region();
+}
+
 finish:
 if (args->finish_hook) {
 args->finish_hook(from, to, data_hook);
@@ -1942,36 +1990,6 @@ static void test_precopy_file(void)
 test_file_common(, true);
 }
 
-static void file_offset_finish_hook(QTestState *from, QTestState *to,
-void *opaque)
-{
-#if defined(__linux__)
-g_autofree char *path = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
-size_t size = FILE_TEST_OFFSET + sizeof(QEMU_VM_FILE_MAGIC);
-uintptr_t *addr, *p;
-int fd;
-
-fd = open(path, O_RDONLY);
-g_assert(fd != -1);
-addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
-g_assert(addr != MAP_FAILED);
-
-/*
- * Ensure the skipped offset contains zeros and the migration
- * stream starts at the right place.
- */
-p = addr;
-while (p < addr + FILE_TEST_OFFSET / sizeof(uintptr_t)) {
-g_assert(*p == 0);
-p++;
-}
-g_assert_cmpint(cpu_to_be64(*p) >> 32, ==, QEMU_VM_FILE_MAGIC);
-
-munmap(addr, size);
-close(fd);
-#endif
-}
-
 static void test_precopy_file_offset(void)
 {
 g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
@@ -1980,7 +1998,6 @@ static void test_precopy_file_offset(void)
 MigrateCommon args = {
 .connect_uri = uri,
 .listen_uri = "defer",
-.finish_hook = file_offset_finish_hook,
 };
 
 test_file_common(, false);
-- 
2.35.3

[PATCH v3 10/16] io: Stop using qemu_open_old in channel-file

2024-06-17 Thread Fabiano Rosas

We want to make use of the Error object to report fdset errors from
qemu_open_internal() and passing the error pointer to qemu_open_old()
would require changing all callers. Move the file channel to the new
API instead.

Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 io/channel-file.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/io/channel-file.c b/io/channel-file.c
index 6436cfb6ae..2ea8d08360 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -68,11 +68,13 @@ qio_channel_file_new_path(const char *path,
 
 ioc = QIO_CHANNEL_FILE(object_new(TYPE_QIO_CHANNEL_FILE));
 
-ioc->fd = qemu_open_old(path, flags, mode);
+if (flags & O_CREAT) {
+ioc->fd = qemu_create(path, flags & ~O_CREAT, mode, errp);
+} else {
+ioc->fd = qemu_open(path, flags, errp);
+}
 if (ioc->fd < 0) {
 object_unref(OBJECT(ioc));
-error_setg_errno(errp, errno,
- "Unable to open %s", path);
 return NULL;
 }
 
-- 
2.35.3

[PATCH 19/23] Add ARM AArch64 signal trampoline argument setup for bsd-user

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

function to set up signal handler arguments it populates
register values in `CPUARMState` based on the provided
signal, signal frame, signal action, and frame address

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/signal.c | 53 +++
 1 file changed, 53 insertions(+)
 create mode 100644 bsd-user/aarch64/signal.c

diff --git a/bsd-user/aarch64/signal.c b/bsd-user/aarch64/signal.c
new file mode 100644
index 00..98861f9ab3
--- /dev/null
+++ b/bsd-user/aarch64/signal.c
@@ -0,0 +1,53 @@
+/*
+ * ARM AArch64 specific signal definitions for bsd-user
+ *
+ * Copyright (c) 2015 Stacey D. Son 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+#include "qemu/osdep.h"
+
+#include "qemu.h"
+
+/*
+ * Compare to sendsig() in sys/arm64/arm64/machdep.c
+ * Assumes that target stack frame memory is locked.
+ */
+abi_long set_sigtramp_args(CPUARMState *regs, int sig,
+   struct target_sigframe *frame,
+   abi_ulong frame_addr,
+   struct target_sigaction *ka)
+{
+/*
+ * Arguments to signal handler:
+ *  x0 = signal number
+ *  x1 = siginfo pointer
+ *  x2 = ucontext pointer
+ *  pc/elr = signal handler pointer
+ *  sp = sigframe struct pointer
+ *  lr = sigtramp at base of user stack
+ */
+
+regs->xregs[0] = sig;
+regs->xregs[1] = frame_addr +
+offsetof(struct target_sigframe, sf_si);
+regs->xregs[2] = frame_addr +
+offsetof(struct target_sigframe, sf_uc);
+
+regs->pc = ka->_sa_handler;
+regs->xregs[TARGET_REG_SP] = frame_addr;
+regs->xregs[TARGET_REG_LR] = TARGET_PS_STRINGS - TARGET_SZSIGCODE;
+
+return 0;
+}
-- 
2.34.1

[PATCH 08/23] Add Aarch64 sysarch() system call emulation for BSD-USER

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Initial implementation of sysarch() syscall and a printing function

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch_sysarch.h | 42 ++
 1 file changed, 42 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch_sysarch.h

diff --git a/bsd-user/aarch64/target_arch_sysarch.h 
b/bsd-user/aarch64/target_arch_sysarch.h
new file mode 100644
index 00..b003015daf
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_sysarch.h
@@ -0,0 +1,42 @@
+/*
+ * ARM AArch64 sysarch() system call emulation for bsd-user.
+ *
+ * Copyright (c) 2015 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_SYSARCH_H
+#define TARGET_ARCH_SYSARCH_H
+
+#include "target_syscall.h"
+#include "target_arch.h"
+
+/* See sysarch() in sys/arm64/arm64/sys_machdep.c */
+static inline abi_long do_freebsd_arch_sysarch(CPUARMState *env, int op,
+abi_ulong parms)
+{
+int ret = -TARGET_EOPNOTSUPP;
+
+fprintf(stderr, "sysarch");
+return ret;
+}
+
+static inline void do_freebsd_arch_print_sysarch(
+const struct syscallname *name, abi_long arg1, abi_long arg2,
+abi_long arg3, abi_long arg4, abi_long arg5, abi_long arg6)
+{
+}
+
+#endif /* TARGET_ARCH_SYSARCH_H */
-- 
2.34.1

[PATCH 03/23] Added function to clone CPU state

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Function can copy cpu state to create new thread

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch_cpu.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/bsd-user/aarch64/target_arch_cpu.h 
b/bsd-user/aarch64/target_arch_cpu.h
index 1962d2c99b..4e950305d3 100644
--- a/bsd-user/aarch64/target_arch_cpu.h
+++ b/bsd-user/aarch64/target_arch_cpu.h
@@ -171,4 +171,21 @@ static inline void target_cpu_loop(CPUARMState *env)
 } /* for (;;) */
 }
 
+
+/* See arm64/arm64/vm_machdep.c cpu_fork() */
+static inline void target_cpu_clone_regs(CPUARMState *env, target_ulong newsp)
+{
+if (newsp) {
+env->xregs[31] = newsp;
+}
+env->regs[0] = 0;
+env->regs[1] = 0;
+pstate_write(env, 0);
+}
+
+static inline void target_cpu_reset(CPUArchState *env)
+{
+}
+
+
 #endif /* TARGET_ARCH_CPU_H */
-- 
2.34.1

[PATCH 18/23] Add ARM AArch64 specific signal definitions for bsd-user

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Defined register indices and sizes,introduced structures to represent
general purpose registers, floating point registers, and machine context

Signed-off-by: Stacey Son 
Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
Co-authored-by: Warner Losh 
---
 bsd-user/aarch64/target_arch_signal.h | 80 +++
 1 file changed, 80 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch_signal.h

diff --git a/bsd-user/aarch64/target_arch_signal.h 
b/bsd-user/aarch64/target_arch_signal.h
new file mode 100644
index 00..df17173316
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_signal.h
@@ -0,0 +1,80 @@
+/*
+ * ARM AArch64 specific signal definitions for bsd-user
+ *
+ * Copyright (c) 2015 Stacey D. Son 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_SIGNAL_H
+#define TARGET_ARCH_SIGNAL_H
+
+#include "cpu.h"
+
+#define TARGET_REG_X0   0
+#define TARGET_REG_X30  30
+#define TARGET_REG_X31  31
+#define TARGET_REG_LR   TARGET_REG_X30
+#define TARGET_REG_SP   TARGET_REG_X31
+
+#define TARGET_INSN_SIZE4   /* arm64 instruction size */
+
+/* Size of the signal trampolin code. See _sigtramp(). */
+#define TARGET_SZSIGCODE((abi_ulong)(9 * TARGET_INSN_SIZE))
+
+/* compare to sys/arm64/include/_limits.h */
+#define TARGET_MINSIGSTKSZ  (1024 * 4)  /* min sig stack size 
*/
+#define TARGET_SIGSTKSZ (TARGET_MINSIGSTKSZ + 32768)  /* recommended size 
*/
+
+/* struct __mcontext in sys/arm64/include/ucontext.h */
+
+struct target_gpregs {
+uint64_tgp_x[30];
+uint64_tgp_lr;
+uint64_tgp_sp;
+uint64_tgp_elr;
+uint32_tgp_spsr;
+uint32_tgp_pad;
+};
+
+struct target_fpregs {
+__uint128_t fp_q[32];
+uint32_tfp_sr;
+uint32_tfp_cr;
+uint32_tfp_flags;
+uint32_tfp_pad;
+};
+
+struct target__mcontext {
+struct target_gpregs mc_gpregs;
+struct target_fpregs mc_fpregs;
+uint32_tmc_flags;
+#define TARGET_MC_FP_VALID  0x1
+uint32_tmc_pad;
+uint64_tmc_spare[8];
+};
+
+typedef struct target__mcontext target_mcontext_t;
+
+#define TARGET_MCONTEXT_SIZE 880
+#define TARGET_UCONTEXT_SIZE 960
+
+#include "target_os_ucontext.h"
+
+struct target_sigframe {
+target_siginfo_tsf_si;  /* saved siginfo */
+target_ucontext_t   sf_uc;  /* saved ucontext */
+};
+
+#endif /* TARGET_ARCH_SIGNAL_H */
-- 
2.34.1

[PATCH 10/23] Add thread initialization for BSD-USER

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Initializes thread's register state

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
Co-authored-by: Jessica Clarke 
---
 bsd-user/aarch64/target_arch_thread.h | 16 
 1 file changed, 16 insertions(+)

diff --git a/bsd-user/aarch64/target_arch_thread.h 
b/bsd-user/aarch64/target_arch_thread.h
index d2f2dea7ce..bfc9050cb1 100644
--- a/bsd-user/aarch64/target_arch_thread.h
+++ b/bsd-user/aarch64/target_arch_thread.h
@@ -42,4 +42,20 @@ static inline void target_thread_set_upcall(CPUARMState 
*regs, abi_ulong entry,
 pstate_write(regs, PSTATE_MODE_EL0t);
 }
 
+static inline void target_thread_init(struct target_pt_regs *regs,
+struct image_info *infop)
+{
+abi_long stack = infop->start_stack;
+
+/*
+ * Make sure the stack is properly aligned.
+ * arm64/include/param.h (STACKLIGN() macro)
+ */
+
+memset(regs, 0, sizeof(*regs));
+regs->regs[0] = infop->start_stack;
+regs->pc = infop->entry;
+regs->sp = stack & ~(16 - 1);
+}
+
 #endif /* TARGET_ARCH_THREAD_H */
-- 
2.34.1

[PATCH 00/23] ARM AArch64 Support for BSD

2024-06-17 Thread Ajeet Singh

making sure to credit all the authors correctly

Stacey Son (18):
  Add CPU initialization function
  Added CPU loop function
  Added function to clone CPU state
  AArch64 specific CPU for bsd-user
  Managing CPU register for BSD-USER
  Add Aarch64 register handling
  Add ARM AArch64 TLS Management Prototypes for BSD-User
  Add Aarch64 sysarch() system call emulation for BSD-USER
  Add thread setup for BSD-USER
  Add thread initialization for BSD-USER
  Update ARM AArch64 VM parameter definitions for bsd-user
  Add ARM AArch64 ELF definitions for bsd-user
  Add ARM AArch64 sigcode setup function for bsd-user
  Add ARM AArch64 specific signal definitions for bsd-user
  Add ARM AArch64 signal trampoline argument setup for bsd-user
  Add get_mcontext function for ARM AArch64 in bsd-user
  Add set_mcontext function for ARM AArch64 in bsd-user
  Add get_ucontext_sigreturn function

Warner Losh (5):
  Add ability to get rval2
  Add ARM AArch64 hardware capability definitions
  Add function to retrieve ARM AArch64 hardware capabilities
  Add function to retrieve additional ARM AArch64 hwcap
  Add setup_sigframe_arch function for ARM AArch64 in bsd-user

 bsd-user/aarch64/signal.c   | 137 +
 bsd-user/aarch64/target_arch.h  |  28 
 bsd-user/aarch64/target_arch_cpu.c  |  34 +
 bsd-user/aarch64/target_arch_cpu.h  | 191 
 bsd-user/aarch64/target_arch_elf.h  | 165 
 bsd-user/aarch64/target_arch_reg.h  |  56 +++
 bsd-user/aarch64/target_arch_signal.h   |  80 ++
 bsd-user/aarch64/target_arch_sigtramp.h |  48 ++
 bsd-user/aarch64/target_arch_sysarch.h  |  42 ++
 bsd-user/aarch64/target_arch_thread.h   |  61 
 bsd-user/aarch64/target_arch_vmparam.h  |  74 +
 bsd-user/aarch64/target_syscall.h   |  51 +++
 12 files changed, 967 insertions(+)
 create mode 100644 bsd-user/aarch64/signal.c
 create mode 100644 bsd-user/aarch64/target_arch.h
 create mode 100644 bsd-user/aarch64/target_arch_cpu.c
 create mode 100644 bsd-user/aarch64/target_arch_cpu.h
 create mode 100644 bsd-user/aarch64/target_arch_elf.h
 create mode 100644 bsd-user/aarch64/target_arch_reg.h
 create mode 100644 bsd-user/aarch64/target_arch_signal.h
 create mode 100644 bsd-user/aarch64/target_arch_sigtramp.h
 create mode 100644 bsd-user/aarch64/target_arch_sysarch.h
 create mode 100644 bsd-user/aarch64/target_arch_thread.h
 create mode 100644 bsd-user/aarch64/target_arch_vmparam.h
 create mode 100644 bsd-user/aarch64/target_syscall.h

-- 
2.34.1

[PATCH 09/23] Add thread setup for BSD-USER

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Function for setting up thread upcall which will
add thread support to BSD-User

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
Co-authored-by: Jessica Clarke 
---
 bsd-user/aarch64/target_arch_thread.h | 45 +++
 1 file changed, 45 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch_thread.h

diff --git a/bsd-user/aarch64/target_arch_thread.h 
b/bsd-user/aarch64/target_arch_thread.h
new file mode 100644
index 00..d2f2dea7ce
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_thread.h
@@ -0,0 +1,45 @@
+/*
+ * ARM AArch64 thread support for bsd-user.
+ *
+ * Copyright (c) 2015 Stacey D. Son 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_THREAD_H
+#define TARGET_ARCH_THREAD_H
+
+/* Compare to arm64/arm64/vm_machdep.c cpu_set_upcall_kse() */
+static inline void target_thread_set_upcall(CPUARMState *regs, abi_ulong entry,
+abi_ulong arg, abi_ulong stack_base, abi_ulong stack_size)
+{
+abi_ulong sp;
+
+/*
+ * Make sure the stack is properly aligned.
+ * arm64/include/param.h (STACKLIGN() macro)
+ */
+sp = (abi_ulong)(stack_base + stack_size) & ~(16 - 1);
+
+/* sp = stack base */
+regs->xregs[31] = sp;
+/* pc = start function entry */
+regs->pc = entry;
+/* r0 = arg */
+regs->xregs[0] = arg;
+
+pstate_write(regs, PSTATE_MODE_EL0t);
+}
+
+#endif /* TARGET_ARCH_THREAD_H */
-- 
2.34.1

[PATCH 16/23] Add function to retrieve additional ARM AArch64 hwcap

2024-06-17 Thread Ajeet Singh

From: Warner Losh 

Function to retrieve the extended hardware capability flags

Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch_elf.h | 29 +
 1 file changed, 29 insertions(+)

diff --git a/bsd-user/aarch64/target_arch_elf.h 
b/bsd-user/aarch64/target_arch_elf.h
index ed2419408e..7202cd8334 100644
--- a/bsd-user/aarch64/target_arch_elf.h
+++ b/bsd-user/aarch64/target_arch_elf.h
@@ -133,4 +133,33 @@ static uint32_t get_elf_hwcap(void)
 return hwcaps;
 }
 
+static uint32_t get_elf_hwcap2(void)
+{
+ARMCPU *cpu = ARM_CPU(thread_cpu);
+uint32_t hwcaps = 0;
+
+GET_FEATURE_ID(aa64_dcpodp, ARM_HWCAP2_A64_DCPODP);
+GET_FEATURE_ID(aa64_sve2, ARM_HWCAP2_A64_SVE2);
+GET_FEATURE_ID(aa64_sve2_aes, ARM_HWCAP2_A64_SVEAES);
+GET_FEATURE_ID(aa64_sve2_pmull128, ARM_HWCAP2_A64_SVEPMULL);
+GET_FEATURE_ID(aa64_sve2_bitperm, ARM_HWCAP2_A64_SVEBITPERM);
+GET_FEATURE_ID(aa64_sve2_sha3, ARM_HWCAP2_A64_SVESHA3);
+GET_FEATURE_ID(aa64_sve2_sm4, ARM_HWCAP2_A64_SVESM4);
+GET_FEATURE_ID(aa64_condm_5, ARM_HWCAP2_A64_FLAGM2);
+GET_FEATURE_ID(aa64_frint, ARM_HWCAP2_A64_FRINT);
+GET_FEATURE_ID(aa64_sve_i8mm, ARM_HWCAP2_A64_SVEI8MM);
+GET_FEATURE_ID(aa64_sve_f32mm, ARM_HWCAP2_A64_SVEF32MM);
+GET_FEATURE_ID(aa64_sve_f64mm, ARM_HWCAP2_A64_SVEF64MM);
+GET_FEATURE_ID(aa64_sve_bf16, ARM_HWCAP2_A64_SVEBF16);
+GET_FEATURE_ID(aa64_i8mm, ARM_HWCAP2_A64_I8MM);
+GET_FEATURE_ID(aa64_bf16, ARM_HWCAP2_A64_BF16);
+GET_FEATURE_ID(aa64_rndr, ARM_HWCAP2_A64_RNG);
+GET_FEATURE_ID(aa64_bti, ARM_HWCAP2_A64_BTI);
+GET_FEATURE_ID(aa64_mte, ARM_HWCAP2_A64_MTE);
+
+return hwcaps;
+}
+
+#undef GET_FEATURE_ID
+
 #endif /* TARGET_ARCH_ELF_H */
-- 
2.34.1

[PATCH v3 04/16] tests/qtest/migration: Add a precopy file test with fdset

2024-06-17 Thread Fabiano Rosas

Add a test for file migration using fdset. The passing of fds is more
complex than using a file path. This is also the scenario where it's
most important we ensure that the initial migration stream offset is
respected because the fdset interface is the one used by the
management layer when providing a non empty migration file.

Note that fd passing is not available on Windows, so anything that
uses add-fd needs to exclude that platform.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 tests/qtest/migration-test.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 0a529a527b..22b07bc0ec 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1990,6 +1990,46 @@ static void test_precopy_file(void)
 test_file_common(, true);
 }
 
+#ifndef _WIN32
+static void fdset_add_fds(QTestState *qts, const char *file, int flags,
+  int num_fds)
+{
+for (int i = 0; i < num_fds; i++) {
+int fd;
+
+fd = open(file, flags, 0660);
+assert(fd != -1);
+
+qtest_qmp_fds_assert_success(qts, , 1, "{'execute': 'add-fd', "
+ "'arguments': {'fdset-id': 1}}");
+close(fd);
+}
+}
+
+static void *file_offset_fdset_start_hook(QTestState *from, QTestState *to)
+{
+g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+
+fdset_add_fds(from, file, O_WRONLY, 1);
+fdset_add_fds(to, file, O_RDONLY, 1);
+
+return NULL;
+}
+
+static void test_precopy_file_offset_fdset(void)
+{
+g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
+   FILE_TEST_OFFSET);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = file_offset_fdset_start_hook,
+};
+
+test_file_common(, false);
+}
+#endif
+
 static void test_precopy_file_offset(void)
 {
 g_autofree char *uri = g_strdup_printf("file:%s/%s,offset=%d", tmpfs,
@@ -3527,6 +3567,10 @@ int main(int argc, char **argv)
test_precopy_file);
 migration_test_add("/migration/precopy/file/offset",
test_precopy_file_offset);
+#ifndef _WIN32
+migration_test_add("/migration/precopy/file/offset/fdset",
+   test_precopy_file_offset_fdset);
+#endif
 migration_test_add("/migration/precopy/file/offset/bad",
test_precopy_file_offset_bad);
 
-- 
2.35.3

[PATCH 07/23] Add ARM AArch64 TLS Management Prototypes for BSD-User

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Prototypes for setting and getting TLS( thread local storage)

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch.h | 28 
 1 file changed, 28 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch.h

diff --git a/bsd-user/aarch64/target_arch.h b/bsd-user/aarch64/target_arch.h
new file mode 100644
index 00..27f47de8eb
--- /dev/null
+++ b/bsd-user/aarch64/target_arch.h
@@ -0,0 +1,28 @@
+/*
+ * ARM AArch64 specific prototypes for bsd-user
+ *
+ * Copyright (c) 2015 Stacey D. Son 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_H
+#define TARGET_ARCH_H
+
+#include "qemu.h"
+
+void target_cpu_set_tls(CPUARMState *env, target_ulong newtls);
+target_ulong target_cpu_get_tls(CPUARMState *env);
+
+#endif /* TARGET_ARCH_H */
-- 
2.34.1

[PATCH v3 12/16] migration/multifd: Add direct-io support

2024-06-17 Thread Fabiano Rosas

When multifd is used along with mapped-ram, we can take benefit of a
filesystem that supports the O_DIRECT flag and perform direct I/O in
the multifd threads. This brings a significant performance improvement
because direct-io writes bypass the page cache which would otherwise
be thrashed by the multifd data which is unlikely to be needed again
in a short period of time.

To be able to use a multifd channel opened with O_DIRECT, we must
ensure that a certain aligment is used. Filesystems usually require a
block-size alignment for direct I/O. The way to achieve this is by
enabling the mapped-ram feature, which already aligns its I/O properly
(see MAPPED_RAM_FILE_OFFSET_ALIGNMENT at ram.c).

By setting O_DIRECT on the multifd channels, all writes to the same
file descriptor need to be aligned as well, even the ones that come
from outside multifd, such as the QEMUFile I/O from the main migration
code. This makes it impossible to use the same file descriptor for the
QEMUFile and for the multifd channels. The various flags and metadata
written by the main migration code will always be unaligned by virtue
of their small size. To workaround this issue, we'll require a second
file descriptor to be used exclusively for direct I/O.

The second file descriptor can be obtained by QEMU by re-opening the
migration file (already possible), or by being provided by the user or
management application (support to be added in future patches).

Signed-off-by: Fabiano Rosas 
---
 migration/file.c  | 33 -
 migration/file.h  |  1 -
 migration/migration.c | 23 +++
 3 files changed, 51 insertions(+), 6 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index a903710f06..db870f2cf0 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -50,12 +50,31 @@ void file_cleanup_outgoing_migration(void)
 outgoing_args.fname = NULL;
 }
 
+static void file_enable_direct_io(int *flags)
+{
+#ifdef O_DIRECT
+*flags |= O_DIRECT;
+#else
+/* it should have been rejected when setting the parameter */
+g_assert_not_reached();
+#endif
+}
+
 bool file_send_channel_create(gpointer opaque, Error **errp)
 {
 QIOChannelFile *ioc;
 int flags = O_WRONLY;
 bool ret = true;
 
+if (migrate_direct_io()) {
+/*
+ * Enable O_DIRECT for the secondary channels. These are used
+ * for sending ram pages and writes should be guaranteed to be
+ * aligned to at least page size.
+ */
+file_enable_direct_io();
+}
+
 ioc = qio_channel_file_new_path(outgoing_args.fname, flags, 0, errp);
 if (!ioc) {
 ret = false;
@@ -117,21 +136,25 @@ static gboolean file_accept_incoming_migration(QIOChannel 
*ioc,
 return G_SOURCE_REMOVE;
 }
 
-void file_create_incoming_channels(QIOChannel *ioc, Error **errp)
+static void file_create_incoming_channels(QIOChannel *ioc, char *filename,
+  Error **errp)
 {
-int i, fd, channels = 1;
+int i, channels = 1;
 g_autofree QIOChannel **iocs = NULL;
+int flags = O_RDONLY;
 
 if (migrate_multifd()) {
 channels += migrate_multifd_channels();
+if (migrate_direct_io()) {
+file_enable_direct_io();
+}
 }
 
 iocs = g_new0(QIOChannel *, channels);
-fd = QIO_CHANNEL_FILE(ioc)->fd;
 iocs[0] = ioc;
 
 for (i = 1; i < channels; i++) {
-QIOChannelFile *fioc = qio_channel_file_new_dupfd(fd, errp);
+QIOChannelFile *fioc = qio_channel_file_new_path(filename, flags, 0, 
errp);
 
 if (!fioc) {
 while (i) {
@@ -171,7 +194,7 @@ void file_start_incoming_migration(FileMigrationArgs 
*file_args, Error **errp)
 return;
 }
 
-file_create_incoming_channels(QIO_CHANNEL(fioc), errp);
+file_create_incoming_channels(QIO_CHANNEL(fioc), filename, errp);
 }
 
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
diff --git a/migration/file.h b/migration/file.h
index 7699c04677..9f71e87f74 100644
--- a/migration/file.h
+++ b/migration/file.h
@@ -20,7 +20,6 @@ void file_start_outgoing_migration(MigrationState *s,
 int file_parse_offset(char *filespec, uint64_t *offsetp, Error **errp);
 void file_cleanup_outgoing_migration(void);
 bool file_send_channel_create(gpointer opaque, Error **errp);
-void file_create_incoming_channels(QIOChannel *ioc, Error **errp);
 int file_write_ramblock_iov(QIOChannel *ioc, const struct iovec *iov,
 int niov, RAMBlock *block, Error **errp);
 int multifd_file_recv_data(MultiFDRecvParams *p, Error **errp);
diff --git a/migration/migration.c b/migration/migration.c
index e1b269624c..e03c80b3aa 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -155,6 +155,16 @@ static bool migration_needs_seekable_channel(void)
 return migrate_mapped_ram();
 }
 
+static bool migration_needs_extra_fds(void)
+{
+/*
+ * When doing direct-io, multifd requires two

[PATCH v3 08/16] monitor: Simplify fdset and fd removal

2024-06-17 Thread Fabiano Rosas

Remove fds right away instead of setting the ->removed flag. We don't
need the extra complexity of having a cleanup function reap the
removed entries at a later time.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 monitor/fds.c | 27 ++-
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index ea85ba1f05..32b79c621e 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -43,7 +43,6 @@ struct mon_fd_t {
 typedef struct MonFdsetFd MonFdsetFd;
 struct MonFdsetFd {
 int fd;
-bool removed;
 char *opaque;
 QLIST_ENTRY(MonFdsetFd) next;
 };
@@ -193,20 +192,6 @@ static void monitor_fdset_fd_free(MonFdsetFd *mon_fdset_fd)
 g_free(mon_fdset_fd);
 }
 
-static void monitor_fdset_cleanup(MonFdset *mon_fdset)
-{
-MonFdsetFd *mon_fdset_fd;
-MonFdsetFd *mon_fdset_fd_next;
-
-QLIST_FOREACH_SAFE(mon_fdset_fd, _fdset->fds, next, mon_fdset_fd_next) 
{
-if (mon_fdset_fd->removed) {
-monitor_fdset_fd_free(mon_fdset_fd);
-}
-}
-
-monitor_fdset_free_if_empty(mon_fdset);
-}
-
 void monitor_fdsets_cleanup(void)
 {
 MonFdset *mon_fdset;
@@ -281,7 +266,7 @@ void qmp_get_win32_socket(const char *infos, const char 
*fdname, Error **errp)
 void qmp_remove_fd(int64_t fdset_id, bool has_fd, int64_t fd, Error **errp)
 {
 MonFdset *mon_fdset;
-MonFdsetFd *mon_fdset_fd;
+MonFdsetFd *mon_fdset_fd, *mon_fdset_fd_next;
 char fd_str[60];
 
 QEMU_LOCK_GUARD(_fdsets_lock);
@@ -289,21 +274,22 @@ void qmp_remove_fd(int64_t fdset_id, bool has_fd, int64_t 
fd, Error **errp)
 if (mon_fdset->id != fdset_id) {
 continue;
 }
-QLIST_FOREACH(mon_fdset_fd, _fdset->fds, next) {
+QLIST_FOREACH_SAFE(mon_fdset_fd, _fdset->fds, next,
+   mon_fdset_fd_next) {
 if (has_fd) {
 if (mon_fdset_fd->fd != fd) {
 continue;
 }
-mon_fdset_fd->removed = true;
+monitor_fdset_fd_free(mon_fdset_fd);
 break;
 } else {
-mon_fdset_fd->removed = true;
+monitor_fdset_fd_free(mon_fdset_fd);
 }
 }
 if (has_fd && !mon_fdset_fd) {
 goto error;
 }
-monitor_fdset_cleanup(mon_fdset);
+monitor_fdset_free_if_empty(mon_fdset);
 return;
 }
 
@@ -413,7 +399,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
int64_t fdset_id,
 
 mon_fdset_fd = g_malloc0(sizeof(*mon_fdset_fd));
 mon_fdset_fd->fd = fd;
-mon_fdset_fd->removed = false;
 mon_fdset_fd->opaque = g_strdup(opaque);
 QLIST_INSERT_HEAD(_fdset->fds, mon_fdset_fd, next);
 
-- 
2.35.3

[PATCH v3 15/16] migration: Add documentation for fdset with multifd + file

2024-06-17 Thread Fabiano Rosas

With the last few changes to the fdset infrastructure, we now allow
multifd to use an fdset when migrating to a file. This is useful for
the scenario where the management layer wants to have control over the
migration file.

By receiving the file descriptors directly, QEMU can delegate some
high level operating system operations to the management layer (such
as mandatory access control). The management layer might also want to
add its own headers before the migration stream.

Document the "file:/dev/fdset/#" syntax for the multifd migration with
mapped-ram. The requirements for the fdset mechanism are:

- the fdset must contain two fds that are not duplicates between
  themselves;

- if direct-io is to be used, exactly one of the fds must have the
  O_DIRECT flag set;

- the file must be opened with WRONLY on the migration source side;

- the file must be opened with RDONLY on the migration destination
  side.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 docs/devel/migration/main.rst   | 24 +++-
 docs/devel/migration/mapped-ram.rst |  6 +-
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 495cdcb112..784c899dca 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -47,11 +47,25 @@ over any transport.
   QEMU interference. Note that QEMU does not flush cached file
   data/metadata at the end of migration.
 
-In addition, support is included for migration using RDMA, which
-transports the page data using ``RDMA``, where the hardware takes care of
-transporting the pages, and the load on the CPU is much lower.  While the
-internals of RDMA migration are a bit different, this isn't really visible
-outside the RAM migration code.
+  The file migration also supports using a file that has already been
+  opened. A set of file descriptors is passed to QEMU via an "fdset"
+  (see add-fd QMP command documentation). This method allows a
+  management application to have control over the migration file
+  opening operation. There are, however, strict requirements to this
+  interface if the multifd capability is enabled:
+
+- the fdset must contain two file descriptors that are not
+  duplicates between themselves;
+- if the direct-io capability is to be used, exactly one of the
+  file descriptors must have the O_DIRECT flag set;
+- the file must be opened with WRONLY on the migration source side
+  and RDONLY on the migration destination side.
+
+- rdma migration: support is included for migration using RDMA, which
+  transports the page data using ``RDMA``, where the hardware takes
+  care of transporting the pages, and the load on the CPU is much
+  lower.  While the internals of RDMA migration are a bit different,
+  this isn't really visible outside the RAM migration code.
 
 All these migration protocols use the same infrastructure to
 save/restore state devices.  This infrastructure is shared with the
diff --git a/docs/devel/migration/mapped-ram.rst 
b/docs/devel/migration/mapped-ram.rst
index fa4cefd9fc..d352b546e9 100644
--- a/docs/devel/migration/mapped-ram.rst
+++ b/docs/devel/migration/mapped-ram.rst
@@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
 sequential stream. Having the pages at fixed offsets also allows the
 usage of O_DIRECT for save/restore of the migration stream as the
 pages are ensured to be written respecting O_DIRECT alignment
-restrictions (direct-io support not yet implemented).
+restrictions.
 
 Usage
 -
@@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
 Mapped-ram migration is best done non-live, i.e. by stopping the VM on
 the source side before migrating.
 
+For best performance enable the ``direct-io`` parameter as well:
+
+``migrate_set_parameter direct-io on``
+
 Use-cases
 -
 
-- 
2.35.3

[PATCH 21/23] Add setup_sigframe_arch function for ARM AArch64 in bsd-user

2024-06-17 Thread Ajeet Singh

From: Warner Losh 

The function utilizes the `get_mcontext` function to retrieve the machine
context for the current CPUARMState

Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/signal.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/bsd-user/aarch64/signal.c b/bsd-user/aarch64/signal.c
index ab3bf8558a..43c886e603 100644
--- a/bsd-user/aarch64/signal.c
+++ b/bsd-user/aarch64/signal.c
@@ -81,3 +81,17 @@ abi_long get_mcontext(CPUARMState *regs, target_mcontext_t 
*mcp, int flags)
 
 return err;
 }
+
+/*
+ * Compare to arm64/arm64/exec_machdep.c sendsig()
+ * Assumes that the memory is locked if frame points to user memory.
+ */
+abi_long setup_sigframe_arch(CPUARMState *env, abi_ulong frame_addr,
+ struct target_sigframe *frame, int flags)
+{
+target_mcontext_t *mcp = >sf_uc.uc_mcontext;
+
+get_mcontext(env, mcp, flags);
+return 0;
+}
+
-- 
2.34.1

[PATCH v3 16/16] tests/qtest/migration: Add a test for mapped-ram with passing of fds

2024-06-17 Thread Fabiano Rosas

Add a multifd test for mapped-ram with passing of fds into QEMU. This
is how libvirt will consume the feature.

There are a couple of details to the fdset mechanism:

- multifd needs two distinct file descriptors (not duplicated with
  dup()) so it can enable O_DIRECT only on the channels that do
  aligned IO. The dup() system call creates file descriptors that
  share status flags, of which O_DIRECT is one.

- the open() access mode flags used for the fds passed into QEMU need
  to match the flags QEMU uses to open the file. Currently O_WRONLY
  for src and O_RDONLY for dst.

Note that fdset code goes under _WIN32 because fd passing is not
supported on Windows.

Signed-off-by: Fabiano Rosas 
---
- dropped Peter's r-b

- stopped removing the fdset at the end of the tests. The migration
code should always cleanup after itself.
---
 tests/qtest/migration-test.c | 98 ++--
 1 file changed, 95 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 928f0ca6ce..85a21ff5e9 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2024,11 +2024,18 @@ static void test_precopy_file(void)
 
 #ifndef _WIN32
 static void fdset_add_fds(QTestState *qts, const char *file, int flags,
-  int num_fds)
+  int num_fds, bool direct_io)
 {
 for (int i = 0; i < num_fds; i++) {
 int fd;
 
+#ifdef O_DIRECT
+/* only secondary channels can use direct-io */
+if (direct_io && i != 0) {
+flags |= O_DIRECT;
+}
+#endif
+
 fd = open(file, flags, 0660);
 assert(fd != -1);
 
@@ -2042,8 +2049,8 @@ static void *file_offset_fdset_start_hook(QTestState 
*from, QTestState *to)
 {
 g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
 
-fdset_add_fds(from, file, O_WRONLY, 1);
-fdset_add_fds(to, file, O_RDONLY, 1);
+fdset_add_fds(from, file, O_WRONLY, 1, false);
+fdset_add_fds(to, file, O_RDONLY, 1, false);
 
 return NULL;
 }
@@ -2215,6 +,84 @@ static void test_multifd_file_mapped_ram_dio(void)
 test_file_common(, true);
 }
 
+#ifndef _WIN32
+static void multifd_mapped_ram_fdset_end(QTestState *from, QTestState *to,
+ void *opaque)
+{
+QDict *resp;
+QList *fdsets;
+
+/*
+ * Make sure no fdsets are left after migration, otherwise a
+ * second migration would fail due fdset reuse.
+ */
+resp = qtest_qmp(from, "{'execute': 'query-fdsets', "
+ "'arguments': {}}");
+g_assert(qdict_haskey(resp, "return"));
+fdsets = qdict_get_qlist(resp, "return");
+g_assert(fdsets && qlist_empty(fdsets));
+}
+
+static void *multifd_mapped_ram_fdset_dio(QTestState *from, QTestState *to)
+{
+g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+
+fdset_add_fds(from, file, O_WRONLY, 2, true);
+fdset_add_fds(to, file, O_RDONLY, 2, true);
+
+migrate_multifd_mapped_ram_start(from, to);
+migrate_set_parameter_bool(from, "direct-io", true);
+migrate_set_parameter_bool(to, "direct-io", true);
+
+return NULL;
+}
+
+static void *multifd_mapped_ram_fdset(QTestState *from, QTestState *to)
+{
+g_autofree char *file = g_strdup_printf("%s/%s", tmpfs, 
FILE_TEST_FILENAME);
+
+fdset_add_fds(from, file, O_WRONLY, 2, false);
+fdset_add_fds(to, file, O_RDONLY, 2, false);
+
+migrate_multifd_mapped_ram_start(from, to);
+
+return NULL;
+}
+
+static void test_multifd_file_mapped_ram_fdset(void)
+{
+g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
+   FILE_TEST_OFFSET);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = multifd_mapped_ram_fdset,
+.finish_hook = multifd_mapped_ram_fdset_end,
+};
+
+test_file_common(, true);
+}
+
+static void test_multifd_file_mapped_ram_fdset_dio(void)
+{
+g_autofree char *uri = g_strdup_printf("file:/dev/fdset/1,offset=%d",
+   FILE_TEST_OFFSET);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = multifd_mapped_ram_fdset_dio,
+.finish_hook = multifd_mapped_ram_fdset_end,
+};
+
+if (!probe_o_direct_support(tmpfs)) {
+g_test_skip("Filesystem does not support O_DIRECT");
+return;
+}
+
+test_file_common(, true);
+}
+#endif /* !_WIN32 */
+
 static void test_precopy_tcp_plain(void)
 {
 MigrateCommon args = {
@@ -3654,6 +3739,13 @@ int main(int argc, char **argv)
 migration_test_add("/migration/multifd/file/mapped-ram/dio",
test_multifd_file_mapped_ram_dio);
 
+#ifndef _WIN32
+migration_test_add("/migration/multifd/file/mapped-ram/fdset",
+   test_multifd_file_mapped_ram_fdset);
+

[PATCH 17/23] Add ARM AArch64 sigcode setup function for bsd-user

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

signal trampoline function initializes a sequence of instructions
to handle signal returns and exits, and copies this code to the target offset.

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch_sigtramp.h | 48 +
 1 file changed, 48 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch_sigtramp.h

diff --git a/bsd-user/aarch64/target_arch_sigtramp.h 
b/bsd-user/aarch64/target_arch_sigtramp.h
new file mode 100644
index 00..8cdd33b621
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_sigtramp.h
@@ -0,0 +1,48 @@
+/*
+ * ARM AArch64 sigcode for bsd-user
+ *
+ * Copyright (c) 2015 Stacey D. Son 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_SIGTRAMP_H
+#define TARGET_ARCH_SIGTRAMP_H
+
+/* Compare to ENTRY(sigcode) in arm64/arm64/locore.S */
+static inline abi_long setup_sigtramp(abi_ulong offset, unsigned sigf_uc,
+unsigned sys_sigreturn)
+{
+int i;
+uint32_t sys_exit = TARGET_FREEBSD_NR_exit;
+
+uint32_t sigtramp_code[] = {
+/* 1 */ 0x910003e0, /* mov x0, sp */
+/* 2 */ 0x9100 + (sigf_uc << 10), /* add x0, x0, #SIGF_UC */
+/* 3 */ 0xd280 + (sys_sigreturn << 5) + 0x8, /* mov x8, #SYS_sigreturn 
*/
+/* 4 */ 0xd401, /* svc #0 */
+/* 5 */ 0xd2800028 + (sys_exit << 5) + 0x8, /* mov x8, #SYS_exit */
+/* 6 */ 0xd401, /* svc #0 */
+/* 7 */ 0x17fc, /* b -4 */
+/* 8 */ sys_sigreturn,
+/* 9 */ sys_exit
+};
+
+for (i = 0; i < 9; i++) {
+tswap32s(_code[i]);
+}
+
+return memcpy_to_target(offset, sigtramp_code, TARGET_SZSIGCODE);
+}
+#endif /* TARGET_ARCH_SIGTRAMP_H */
-- 
2.34.1

[PATCH 02/23] Added CPU loop function

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

CPU loop function to handle exceptions
and emulate execution of instructions

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
Co-authored-by: Kyle Evans 
Co-authored-by: Sean Bruno 
Co-authored-by: Jessica Clarke 
---
 bsd-user/aarch64/target_arch_cpu.h | 132 +
 1 file changed, 132 insertions(+)

diff --git a/bsd-user/aarch64/target_arch_cpu.h 
b/bsd-user/aarch64/target_arch_cpu.h
index db5c7062b9..1962d2c99b 100644
--- a/bsd-user/aarch64/target_arch_cpu.h
+++ b/bsd-user/aarch64/target_arch_cpu.h
@@ -40,3 +40,135 @@ static inline void target_cpu_init(CPUARMState *env,
 env->pc = regs->pc;
 env->xregs[31] = regs->sp;
 }
+
+
+static inline void target_cpu_loop(CPUARMState *env)
+{
+CPUState *cs = env_cpu(env);
+int trapnr, ec, fsc, si_code, si_signo;
+uint64_t code, arg1, arg2, arg3, arg4, arg5, arg6, arg7, arg8;
+uint32_t pstate;
+abi_long ret;
+
+for (;;) {
+cpu_exec_start(cs);
+trapnr = cpu_exec(cs);
+cpu_exec_end(cs);
+process_queued_cpu_work(cs);
+
+switch (trapnr) {
+case EXCP_SWI:
+/* See arm64/arm64/trap.c cpu_fetch_syscall_args() */
+code = env->xregs[8];
+if (code == TARGET_FREEBSD_NR_syscall ||
+code == TARGET_FREEBSD_NR___syscall) {
+code = env->xregs[0];
+arg1 = env->xregs[1];
+arg2 = env->xregs[2];
+arg3 = env->xregs[3];
+arg4 = env->xregs[4];
+arg5 = env->xregs[5];
+arg6 = env->xregs[6];
+arg7 = env->xregs[7];
+arg8 = 0;
+} else {
+arg1 = env->xregs[0];
+arg2 = env->xregs[1];
+arg3 = env->xregs[2];
+arg4 = env->xregs[3];
+arg5 = env->xregs[4];
+arg6 = env->xregs[5];
+arg7 = env->xregs[6];
+arg8 = env->xregs[7];
+}
+ret = do_freebsd_syscall(env, code, arg1, arg2, arg3,
+arg4, arg5, arg6, arg7, arg8);
+/*
+ * The carry bit is cleared for no error; set for error.
+ * See arm64/arm64/vm_machdep.c cpu_set_syscall_retval()
+ */
+pstate = pstate_read(env);
+if (ret >= 0) {
+pstate &= ~PSTATE_C;
+env->xregs[0] = ret;
+} else if (ret == -TARGET_ERESTART) {
+env->pc -= 4;
+break;
+} else if (ret != -TARGET_EJUSTRETURN) {
+pstate |= PSTATE_C;
+env->xregs[0] = -ret;
+}
+pstate_write(env, pstate);
+break;
+
+case EXCP_INTERRUPT:
+/* Just indicate that signals should be handle ASAP. */
+break;
+
+case EXCP_UDEF:
+force_sig_fault(TARGET_SIGILL, TARGET_ILL_ILLOPN, env->pc);
+break;
+
+
+case EXCP_PREFETCH_ABORT:
+case EXCP_DATA_ABORT:
+/* We should only arrive here with EC in {DATAABORT, INSNABORT}. */
+ec = syn_get_ec(env->exception.syndrome);
+assert(ec == EC_DATAABORT || ec == EC_INSNABORT);
+
+/* Both EC have the same format for FSC, or close enough. */
+fsc = extract32(env->exception.syndrome, 0, 6);
+switch (fsc) {
+case 0x04 ... 0x07: /* Translation fault, level {0-3} */
+si_signo = TARGET_SIGSEGV;
+si_code = TARGET_SEGV_MAPERR;
+break;
+case 0x09 ... 0x0b: /* Access flag fault, level {1-3} */
+case 0x0d ... 0x0f: /* Permission fault, level {1-3} */
+si_signo = TARGET_SIGSEGV;
+si_code = TARGET_SEGV_ACCERR;
+break;
+case 0x11: /* Synchronous Tag Check Fault */
+si_signo = TARGET_SIGSEGV;
+si_code = /* TARGET_SEGV_MTESERR; */ TARGET_SEGV_ACCERR;
+break;
+case 0x21: /* Alignment fault */
+si_signo = TARGET_SIGBUS;
+si_code = TARGET_BUS_ADRALN;
+break;
+default:
+g_assert_not_reached();
+}
+force_sig_fault(si_signo, si_code, env->exception.vaddress);
+break;
+
+case EXCP_DEBUG:
+case EXCP_BKPT:
+force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, env->pc);
+break;
+
+case EXCP_ATOMIC:
+cpu_exec_step_atomic(cs);
+break;
+
+case EXCP_YIELD:
+/* nothing to do here for user-mode, just resume guest code */
+break;
+default:
+fprintf(stderr, "qemu: unhandled CPU exception 0x%x - aborting\n",
+trapnr);
+cpu_dump_state(cs, stderr, 0);
+abort();
+} /*

[PATCH 05/23] Managing CPU register for BSD-USER

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Added structure for storing register states

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
Co-authored-by: Sean Bruno 
---
 bsd-user/aarch64/target_syscall.h | 51 +++
 1 file changed, 51 insertions(+)
 create mode 100644 bsd-user/aarch64/target_syscall.h

diff --git a/bsd-user/aarch64/target_syscall.h 
b/bsd-user/aarch64/target_syscall.h
new file mode 100644
index 00..08ae913c42
--- /dev/null
+++ b/bsd-user/aarch64/target_syscall.h
@@ -0,0 +1,51 @@
+/*
+ * ARM AArch64 specific CPU for bsd-user
+ *
+ * Copyright (c) 2015 Stacey D. Son 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef BSD_USER_AARCH64_TARGET_SYSCALL_H
+#define BSD_USER_AARCH64_TARGET_SYSCALL_H
+
+/*
+ * The aarch64 registers are named:
+ *
+ * x0 through x30 - for 64-bit-wide access (same registers)
+ * Register '31' is one of two registers depending on the instruction context:
+ *  For instructions dealing with the stack, it is the stack pointer, named rsp
+ *  For all other instructions, it is a "zero" register, which returns 0 when
+ *  read and discards data when written - named rzr (xzr, wzr)
+ *
+ * Usage during syscall/function call:
+ * r0-r7 are used for arguments and return values
+ * For syscalls, the syscall number is in r8
+ * r9-r15 are for temporary values (may get trampled)
+ * r16-r18 are used for intra-procedure-call and platform values (avoid)
+ * The called routine is expected to preserve r19-r28
+ * r29 and r30 are used as the frame register and link register (avoid)
+ * See the ARM Procedure Call Reference for details.
+ */
+struct target_pt_regs {
+uint64_tregs[31];
+uint64_tsp;
+uint64_tpc;
+uint64_tpstate;
+};
+
+#define TARGET_HW_MACHINE   "arm64"
+#define TARGET_HW_MACHINE_ARCH  "aarch64"
+
+#endif /* BSD_USER_AARCH64_TARGET_SYSCALL_H */
-- 
2.34.1

[PATCH 14/23] Add ARM AArch64 hardware capability definitions

2024-06-17 Thread Ajeet Singh

From: Warner Losh 

Defined a huge list of hardware capabilites and added
macros for retrieving hwcap flags

Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch_elf.h | 61 ++
 1 file changed, 61 insertions(+)

diff --git a/bsd-user/aarch64/target_arch_elf.h 
b/bsd-user/aarch64/target_arch_elf.h
index 6d0fa3525f..41afa5a9da 100644
--- a/bsd-user/aarch64/target_arch_elf.h
+++ b/bsd-user/aarch64/target_arch_elf.h
@@ -34,4 +34,65 @@
 #define USE_ELF_CORE_DUMP
 #define ELF_EXEC_PAGESIZE   4096
 
+enum {
+ARM_HWCAP_A64_FP= 1 << 0,
+ARM_HWCAP_A64_ASIMD = 1 << 1,
+ARM_HWCAP_A64_EVTSTRM   = 1 << 2,
+ARM_HWCAP_A64_AES   = 1 << 3,
+ARM_HWCAP_A64_PMULL = 1 << 4,
+ARM_HWCAP_A64_SHA1  = 1 << 5,
+ARM_HWCAP_A64_SHA2  = 1 << 6,
+ARM_HWCAP_A64_CRC32 = 1 << 7,
+ARM_HWCAP_A64_ATOMICS   = 1 << 8,
+ARM_HWCAP_A64_FPHP  = 1 << 9,
+ARM_HWCAP_A64_ASIMDHP   = 1 << 10,
+ARM_HWCAP_A64_CPUID = 1 << 11,
+ARM_HWCAP_A64_ASIMDRDM  = 1 << 12,
+ARM_HWCAP_A64_JSCVT = 1 << 13,
+ARM_HWCAP_A64_FCMA  = 1 << 14,
+ARM_HWCAP_A64_LRCPC = 1 << 15,
+ARM_HWCAP_A64_DCPOP = 1 << 16,
+ARM_HWCAP_A64_SHA3  = 1 << 17,
+ARM_HWCAP_A64_SM3   = 1 << 18,
+ARM_HWCAP_A64_SM4   = 1 << 19,
+ARM_HWCAP_A64_ASIMDDP   = 1 << 20,
+ARM_HWCAP_A64_SHA512= 1 << 21,
+ARM_HWCAP_A64_SVE   = 1 << 22,
+ARM_HWCAP_A64_ASIMDFHM  = 1 << 23,
+ARM_HWCAP_A64_DIT   = 1 << 24,
+ARM_HWCAP_A64_USCAT = 1 << 25,
+ARM_HWCAP_A64_ILRCPC= 1 << 26,
+ARM_HWCAP_A64_FLAGM = 1 << 27,
+ARM_HWCAP_A64_SSBS  = 1 << 28,
+ARM_HWCAP_A64_SB= 1 << 29,
+ARM_HWCAP_A64_PACA  = 1 << 30,
+ARM_HWCAP_A64_PACG  = 1UL << 31,
+
+ARM_HWCAP2_A64_DCPODP   = 1 << 0,
+ARM_HWCAP2_A64_SVE2 = 1 << 1,
+ARM_HWCAP2_A64_SVEAES   = 1 << 2,
+ARM_HWCAP2_A64_SVEPMULL = 1 << 3,
+ARM_HWCAP2_A64_SVEBITPERM   = 1 << 4,
+ARM_HWCAP2_A64_SVESHA3  = 1 << 5,
+ARM_HWCAP2_A64_SVESM4   = 1 << 6,
+ARM_HWCAP2_A64_FLAGM2   = 1 << 7,
+ARM_HWCAP2_A64_FRINT= 1 << 8,
+ARM_HWCAP2_A64_SVEI8MM  = 1 << 9,
+ARM_HWCAP2_A64_SVEF32MM = 1 << 10,
+ARM_HWCAP2_A64_SVEF64MM = 1 << 11,
+ARM_HWCAP2_A64_SVEBF16  = 1 << 12,
+ARM_HWCAP2_A64_I8MM = 1 << 13,
+ARM_HWCAP2_A64_BF16 = 1 << 14,
+ARM_HWCAP2_A64_DGH  = 1 << 15,
+ARM_HWCAP2_A64_RNG  = 1 << 16,
+ARM_HWCAP2_A64_BTI  = 1 << 17,
+ARM_HWCAP2_A64_MTE  = 1 << 18,
+};
+
+#define ELF_HWCAP   get_elf_hwcap()
+#define ELF_HWCAP2  get_elf_hwcap2()
+
+#define GET_FEATURE_ID(feat, hwcap) \
+do { if (cpu_isar_feature(feat, cpu)) { hwcaps |= hwcap; } } while (0)
+
 #endif /* TARGET_ARCH_ELF_H */
-- 
2.34.1

[PATCH v3 11/16] migration: Add direct-io parameter

2024-06-17 Thread Fabiano Rosas

Add the direct-io migration parameter that tells the migration code to
use O_DIRECT when opening the migration stream file whenever possible.

This is currently only used with the mapped-ram migration that has a
clear window guaranteed to perform aligned writes.

Acked-by: Markus Armbruster 
Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 include/qemu/osdep.h   |  2 ++
 migration/migration-hmp-cmds.c | 11 +++
 migration/options.c| 35 ++
 migration/options.h|  1 +
 qapi/migration.json| 21 +---
 util/osdep.c   |  9 +
 6 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index f61edcfdc2..191916f38e 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -612,6 +612,8 @@ int qemu_lock_fd_test(int fd, int64_t start, int64_t len, 
bool exclusive);
 bool qemu_has_ofd_lock(void);
 #endif
 
+bool qemu_has_direct_io(void);
+
 #if defined(__HAIKU__) && defined(__i386__)
 #define FMT_pid "%ld"
 #elif defined(WIN64)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index 9f0e8029e0..7d608d26e1 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -351,6 +351,13 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %s\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MODE),
 qapi_enum_lookup(_lookup, params->mode));
+
+if (params->has_direct_io) {
+monitor_printf(mon, "%s: %s\n",
+   MigrationParameter_str(
+   MIGRATION_PARAMETER_DIRECT_IO),
+   params->direct_io ? "on" : "off");
+}
 }
 
 qapi_free_MigrationParameters(params);
@@ -624,6 +631,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict 
*qdict)
 p->has_mode = true;
 visit_type_MigMode(v, param, >mode, );
 break;
+case MIGRATION_PARAMETER_DIRECT_IO:
+p->has_direct_io = true;
+visit_type_bool(v, param, >direct_io, );
+break;
 default:
 assert(0);
 }
diff --git a/migration/options.c b/migration/options.c
index 5ab5b6d85d..645f55003d 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -702,6 +702,25 @@ bool migrate_cpu_throttle_tailslow(void)
 return s->parameters.cpu_throttle_tailslow;
 }
 
+bool migrate_direct_io(void)
+{
+MigrationState *s = migrate_get_current();
+
+/*
+ * O_DIRECT is only supported with mapped-ram and multifd.
+ *
+ * mapped-ram is needed because filesystems impose restrictions on
+ * O_DIRECT IO alignment (see MAPPED_RAM_FILE_OFFSET_ALIGNMENT).
+ *
+ * multifd is needed to keep the unaligned portion of the stream
+ * isolated to the main migration thread while multifd channels
+ * process the aligned data with O_DIRECT enabled.
+ */
+return s->parameters.direct_io &&
+s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM] &&
+s->capabilities[MIGRATION_CAPABILITY_MULTIFD];
+}
+
 uint64_t migrate_downtime_limit(void)
 {
 MigrationState *s = migrate_get_current();
@@ -905,6 +924,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->mode = s->parameters.mode;
 params->has_zero_page_detection = true;
 params->zero_page_detection = s->parameters.zero_page_detection;
+params->has_direct_io = true;
+params->direct_io = s->parameters.direct_io;
 
 return params;
 }
@@ -937,6 +958,7 @@ void migrate_params_init(MigrationParameters *params)
 params->has_vcpu_dirty_limit = true;
 params->has_mode = true;
 params->has_zero_page_detection = true;
+params->has_direct_io = true;
 }
 
 /*
@@ -1110,6 +1132,11 @@ bool migrate_params_check(MigrationParameters *params, 
Error **errp)
 return false;
 }
 
+if (params->has_direct_io && params->direct_io && !qemu_has_direct_io()) {
+error_setg(errp, "No build-time support for direct-io");
+return false;
+}
+
 return true;
 }
 
@@ -1216,6 +1243,10 @@ static void 
migrate_params_test_apply(MigrateSetParameters *params,
 if (params->has_zero_page_detection) {
 dest->zero_page_detection = params->zero_page_detection;
 }
+
+if (params->has_direct_io) {
+dest->direct_io = params->direct_io;
+}
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1341,6 +1372,10 @@ static void migrate_params_apply(MigrateSetParameters 
*params, Error **errp)
 if (params->has_zero_page_detection) {
 s->parameters.zero_page_detection = params->zero_page_detection;
 }
+
+if (params->has_direct_io) {
+s->parameters.direct_io = params->direct_io;
+}
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/migration/options.h

[PATCH v3 09/16] monitor: Report errors from monitor_fdset_dup_fd_add

2024-06-17 Thread Fabiano Rosas

I'm keeping the EACCES because callers expect to be able to look at
errno.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 include/monitor/monitor.h |  2 +-
 monitor/fds.c | 10 +-
 stubs/fdset.c |  2 +-
 util/osdep.c  | 10 +-
 4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index fd9b3f538c..c3740ec616 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -51,7 +51,7 @@ int monitor_read_password(MonitorHMP *mon, ReadLineFunc 
*readline_func,
 
 AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, int64_t fdset_id,
 const char *opaque, Error **errp);
-int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
+int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, Error **errp);
 void monitor_fdset_dup_fd_remove(int dup_fd);
 
 void monitor_register_hmp(const char *name, bool info,
diff --git a/monitor/fds.c b/monitor/fds.c
index 32b79c621e..fd87e7db8b 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -409,9 +409,10 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
int64_t fdset_id,
 return fdinfo;
 }
 
-int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
+int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, Error **errp)
 {
 #ifdef _WIN32
+error_setg(errp, "Platform does not support fd passing (fdset)");
 return -ENOENT;
 #else
 MonFdset *mon_fdset;
@@ -431,6 +432,8 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 QLIST_FOREACH(mon_fdset_fd, _fdset->fds, next) {
 mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
 if (mon_fd_flags == -1) {
+error_setg(errp, "Failed to read file status flags for fd=%d",
+   mon_fdset_fd->fd);
 return -1;
 }
 
@@ -442,11 +445,15 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 
 if (fd == -1) {
 errno = EACCES;
+error_setg(errp,
+   "Failed to find file descriptor with matching 
flags=0x%x",
+   flags);
 return -1;
 }
 
 dup_fd = qemu_dup_flags(fd, flags);
 if (dup_fd == -1) {
+error_setg(errp, "Failed to dup() given file descriptor fd=%d", 
fd);
 return -1;
 }
 
@@ -456,6 +463,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 return dup_fd;
 }
 
+error_setg(errp, "Failed to find fdset /dev/fdset/%" PRId64, fdset_id);
 errno = ENOENT;
 return -1;
 #endif
diff --git a/stubs/fdset.c b/stubs/fdset.c
index 389e368a29..2950fd91fd 100644
--- a/stubs/fdset.c
+++ b/stubs/fdset.c
@@ -3,7 +3,7 @@
 #include "monitor/monitor.h"
 #include "../monitor/monitor-internal.h"
 
-int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
+int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, Error **errp)
 {
 errno = ENOSYS;
 return -1;
diff --git a/util/osdep.c b/util/osdep.c
index 756de9a745..5bbfdfac7a 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -310,7 +310,6 @@ qemu_open_internal(const char *name, int flags, mode_t 
mode, Error **errp)
 /* Attempt dup of fd from fd set */
 if (strstart(name, "/dev/fdset/", _id_str)) {
 int64_t fdset_id;
-int dupfd;
 
 fdset_id = qemu_parse_fdset(fdset_id_str);
 if (fdset_id == -1) {
@@ -319,14 +318,7 @@ qemu_open_internal(const char *name, int flags, mode_t 
mode, Error **errp)
 return -1;
 }
 
-dupfd = monitor_fdset_dup_fd_add(fdset_id, flags);
-if (dupfd == -1) {
-error_setg_errno(errp, errno, "Could not dup FD for %s flags %x",
- name, flags);
-return -1;
-}
-
-return dupfd;
+return monitor_fdset_dup_fd_add(fdset_id, flags, errp);
 }
 #endif
 
-- 
2.35.3

[PATCH 15/23] Add function to retrieve ARM AArch64 hardware capabilities

2024-06-17 Thread Ajeet Singh

From: Warner Losh 

The function initializes default hardware capabilities and
finds additional features using the `GET_FEATURE_ID` macro

Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch_elf.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/bsd-user/aarch64/target_arch_elf.h 
b/bsd-user/aarch64/target_arch_elf.h
index 41afa5a9da..ed2419408e 100644
--- a/bsd-user/aarch64/target_arch_elf.h
+++ b/bsd-user/aarch64/target_arch_elf.h
@@ -95,4 +95,42 @@ enum {
 #define GET_FEATURE_ID(feat, hwcap) \
 do { if (cpu_isar_feature(feat, cpu)) { hwcaps |= hwcap; } } while (0)
 
+static uint32_t get_elf_hwcap(void)
+{
+ARMCPU *cpu = ARM_CPU(thread_cpu);
+uint32_t hwcaps = 0;
+
+hwcaps |= ARM_HWCAP_A64_FP;
+hwcaps |= ARM_HWCAP_A64_ASIMD;
+hwcaps |= ARM_HWCAP_A64_CPUID;
+
+/* probe for the extra features */
+
+GET_FEATURE_ID(aa64_aes, ARM_HWCAP_A64_AES);
+GET_FEATURE_ID(aa64_pmull, ARM_HWCAP_A64_PMULL);
+GET_FEATURE_ID(aa64_sha1, ARM_HWCAP_A64_SHA1);
+GET_FEATURE_ID(aa64_sha256, ARM_HWCAP_A64_SHA2);
+GET_FEATURE_ID(aa64_sha512, ARM_HWCAP_A64_SHA512);
+GET_FEATURE_ID(aa64_crc32, ARM_HWCAP_A64_CRC32);
+GET_FEATURE_ID(aa64_sha3, ARM_HWCAP_A64_SHA3);
+GET_FEATURE_ID(aa64_sm3, ARM_HWCAP_A64_SM3);
+GET_FEATURE_ID(aa64_sm4, ARM_HWCAP_A64_SM4);
+GET_FEATURE_ID(aa64_fp16, ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP);
+GET_FEATURE_ID(aa64_atomics, ARM_HWCAP_A64_ATOMICS);
+GET_FEATURE_ID(aa64_rdm, ARM_HWCAP_A64_ASIMDRDM);
+GET_FEATURE_ID(aa64_dp, ARM_HWCAP_A64_ASIMDDP);
+GET_FEATURE_ID(aa64_fcma, ARM_HWCAP_A64_FCMA);
+GET_FEATURE_ID(aa64_sve, ARM_HWCAP_A64_SVE);
+GET_FEATURE_ID(aa64_pauth, ARM_HWCAP_A64_PACA | ARM_HWCAP_A64_PACG);
+GET_FEATURE_ID(aa64_fhm, ARM_HWCAP_A64_ASIMDFHM);
+GET_FEATURE_ID(aa64_jscvt, ARM_HWCAP_A64_JSCVT);
+GET_FEATURE_ID(aa64_sb, ARM_HWCAP_A64_SB);
+GET_FEATURE_ID(aa64_condm_4, ARM_HWCAP_A64_FLAGM);
+GET_FEATURE_ID(aa64_dcpop, ARM_HWCAP_A64_DCPOP);
+GET_FEATURE_ID(aa64_rcpc_8_3, ARM_HWCAP_A64_LRCPC);
+GET_FEATURE_ID(aa64_rcpc_8_4, ARM_HWCAP_A64_ILRCPC);
+
+return hwcaps;
+}
+
 #endif /* TARGET_ARCH_ELF_H */
-- 
2.34.1

[PATCH 04/23] AArch64 specific CPU for bsd-user

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Function to set and recieve thread-local-storage value
from tpidr_el0 register

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
---
 bsd-user/aarch64/target_arch_cpu.c | 34 ++
 1 file changed, 34 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch_cpu.c

diff --git a/bsd-user/aarch64/target_arch_cpu.c 
b/bsd-user/aarch64/target_arch_cpu.c
new file mode 100644
index 00..70ef651827
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_cpu.c
@@ -0,0 +1,34 @@
+/*
+ * ARM AArch64 specific CPU for bsd-user
+ *
+ * Copyright (c) 2015 Stacey Son
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+#include "qemu/osdep.h"
+
+#include "target_arch.h"
+
+/* See cpu_set_user_tls() in arm64/arm64/vm_machdep.c */
+void target_cpu_set_tls(CPUARMState *env, target_ulong newtls)
+{
+
+env->cp15.tpidr_el[0] = newtls;
+}
+
+target_ulong target_cpu_get_tls(CPUARMState *env)
+{
+
+return env->cp15.tpidr_el[0];
+}
-- 
2.34.1

[PATCH v3 13/16] tests/qtest/migration: Add tests for file migration with direct-io

2024-06-17 Thread Fabiano Rosas

The tests are only allowed to run in systems that know about the
O_DIRECT flag and in filesystems which support it.

Note: this also brings back migrate_set_parameter_bool() which went
away when we removed the compression tests. I copied it verbatim.

Signed-off-by: Fabiano Rosas 
Reviewed-by: Peter Xu 
---
 tests/qtest/migration-helpers.c | 44 +++
 tests/qtest/migration-helpers.h |  8 +
 tests/qtest/migration-test.c| 62 +
 3 files changed, 114 insertions(+)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index ce6d6615b5..0ac49ceb54 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -18,6 +18,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qlist.h"
 #include "qemu/cutils.h"
+#include "qemu/memalign.h"
 
 #include "migration-helpers.h"
 
@@ -473,3 +474,46 @@ void migration_test_add(const char *path, void (*fn)(void))
 qtest_add_data_func_full(path, test, migration_test_wrapper,
  migration_test_destroy);
 }
+
+#ifdef O_DIRECT
+/*
+ * Probe for O_DIRECT support on the filesystem. Since this is used
+ * for tests, be conservative, if anything fails, assume it's
+ * unsupported.
+ */
+bool probe_o_direct_support(const char *tmpfs)
+{
+g_autofree char *filename = g_strdup_printf("%s/probe-o-direct", tmpfs);
+int fd, flags = O_CREAT | O_RDWR | O_TRUNC | O_DIRECT;
+void *buf;
+ssize_t ret, len;
+uint64_t offset;
+
+fd = open(filename, flags, 0660);
+if (fd < 0) {
+unlink(filename);
+return false;
+}
+
+/*
+ * Using 1MB alignment as conservative choice to satisfy any
+ * plausible architecture default page size, and/or filesystem
+ * alignment restrictions.
+ */
+len = 0x10;
+offset = 0x10;
+
+buf = qemu_try_memalign(len, len);
+g_assert(buf);
+
+ret = pwrite(fd, buf, len, offset);
+unlink(filename);
+g_free(buf);
+
+if (ret < 0) {
+return false;
+}
+
+return true;
+}
+#endif
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index 1339835698..50095fca4a 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -54,5 +54,13 @@ char *find_common_machine_version(const char *mtype, const 
char *var1,
   const char *var2);
 char *resolve_machine_version(const char *alias, const char *var1,
   const char *var2);
+#ifdef O_DIRECT
+bool probe_o_direct_support(const char *tmpfs);
+#else
+static inline bool probe_o_direct_support(const char *tmpfs)
+{
+return false;
+}
+#endif
 void migration_test_add(const char *path, void (*fn)(void));
 #endif /* MIGRATION_HELPERS_H */
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 22b07bc0ec..928f0ca6ce 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -407,6 +407,38 @@ static void migrate_set_parameter_str(QTestState *who, 
const char *parameter,
 migrate_check_parameter_str(who, parameter, value);
 }
 
+static long long migrate_get_parameter_bool(QTestState *who,
+   const char *parameter)
+{
+QDict *rsp;
+int result;
+
+rsp = qtest_qmp_assert_success_ref(
+who, "{ 'execute': 'query-migrate-parameters' }");
+result = qdict_get_bool(rsp, parameter);
+qobject_unref(rsp);
+return !!result;
+}
+
+static void migrate_check_parameter_bool(QTestState *who, const char 
*parameter,
+int value)
+{
+int result;
+
+result = migrate_get_parameter_bool(who, parameter);
+g_assert_cmpint(result, ==, value);
+}
+
+static void migrate_set_parameter_bool(QTestState *who, const char *parameter,
+  int value)
+{
+qtest_qmp_assert_success(who,
+ "{ 'execute': 'migrate-set-parameters',"
+ "'arguments': { %s: %i } }",
+ parameter, value);
+migrate_check_parameter_bool(who, parameter, value);
+}
+
 static void migrate_ensure_non_converge(QTestState *who)
 {
 /* Can't converge with 1ms downtime + 3 mbs bandwidth limit */
@@ -2155,6 +2187,33 @@ static void test_multifd_file_mapped_ram(void)
 test_file_common(, true);
 }
 
+static void *multifd_mapped_ram_dio_start(QTestState *from, QTestState *to)
+{
+migrate_multifd_mapped_ram_start(from, to);
+
+migrate_set_parameter_bool(from, "direct-io", true);
+migrate_set_parameter_bool(to, "direct-io", true);
+
+return NULL;
+}
+
+static void test_multifd_file_mapped_ram_dio(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+   FILE_TEST_FILENAME);
+MigrateCommon args = {
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook =

[PATCH v3 02/16] migration: Fix file migration with fdset

2024-06-17 Thread Fabiano Rosas

When the "file:" migration support was added we missed the special
case in the qemu_open_old implementation that allows for a particular
file name format to be used to refer to a set of file descriptors that
have been previously provided to QEMU via the add-fd QMP command.

When using this fdset feature, we should not truncate the migration
file because being given an fd means that the management layer is in
control of the file and will likely already have some data written to
it. This is further indicated by the presence of the 'offset'
argument, which indicates the start of the region where QEMU is
allowed to write.

Fix the issue by replacing the O_TRUNC flag on open by an ftruncate
call, which will take the offset into consideration.

Fixes: 385f510df5 ("migration: file URI offset")
Suggested-by: Daniel P. Berrangé 
Reviewed-by: Prasad Pandit 
Reviewed-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Fabiano Rosas 
---
 migration/file.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/migration/file.c b/migration/file.c
index 2bb8c64092..a903710f06 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -84,12 +84,19 @@ void file_start_outgoing_migration(MigrationState *s,
 
 trace_migration_file_outgoing(filename);
 
-fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY | O_TRUNC,
- 0600, errp);
+fioc = qio_channel_file_new_path(filename, O_CREAT | O_WRONLY, 0600, errp);
 if (!fioc) {
 return;
 }
 
+if (ftruncate(fioc->fd, offset)) {
+error_setg_errno(errp, errno,
+ "failed to truncate migration file to offset %" 
PRIx64,
+ offset);
+object_unref(OBJECT(fioc));
+return;
+}
+
 outgoing_args.fname = g_strdup(filename);
 
 ioc = QIO_CHANNEL(fioc);
-- 
2.35.3

[PATCH 11/23] Update ARM AArch64 VM parameter definitions for bsd-user

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Defined address spaces for FreeBSD/arm64 and added function for
getting stack pointer from CPU and setting a return value.

Signed-off-by: Stacey Son 
Signed-off-by: Warner Losh 
Signed-off-by: Ajeet Singh 
Co-authored-by: Sean Bruno 
Co-authored-by: Warner Losh 
---
 bsd-user/aarch64/target_arch_vmparam.h | 68 ++
 1 file changed, 68 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch_vmparam.h

diff --git a/bsd-user/aarch64/target_arch_vmparam.h 
b/bsd-user/aarch64/target_arch_vmparam.h
new file mode 100644
index 00..dc66e1289b
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_vmparam.h
@@ -0,0 +1,68 @@
+/*
+ * ARM AArch64 VM parameters definitions for bsd-user.
+ *
+ * Copyright (c) 2015 Stacey D. Son 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_VMPARAM_H
+#define TARGET_ARCH_VMPARAM_H
+
+#include "cpu.h"
+
+/**
+ * FreeBSD/arm64 Address space layout.
+ *
+ * ARMv8 implements up to a 48 bit virtual address space. The address space is
+ * split into 2 regions at each end of the 64 bit address space, with an
+ * out of range "hole" in the middle.
+ *
+ * We limit the size of the two spaces to 39 bits each.
+ *
+ * Upper region:0x
+ *  0xff80
+ *
+ * Hole:0xff7f
+ *  0x0080
+ *
+ * Lower region:0x007f
+ *  0x
+ *
+ * The upper region for the kernel, and the lower region for userland.
+ */
+
+
+/* compare to sys/arm64/include/vmparam.h */
+#define TARGET_MAXTSIZ  (1 * GiB)   /* max text size */
+#define TARGET_DFLDSIZ  (128 * MiB) /* initial data size limit */
+#define TARGET_MAXDSIZ  (1 * GiB)   /* max data size */
+#define TARGET_DFLSSIZ  (128 * MiB) /* initial stack size limit */
+#define TARGET_MAXSSIZ  (1 * GiB)   /* max stack size */
+#define TARGET_SGROWSIZ (128 * KiB) /* amount to grow stack */
+
+/* KERNBASE - 512 MB */
+#define TARGET_VM_MAXUSER_ADDRESS   (0x7f00ULL - (512 * MiB))
+#define TARGET_USRSTACK TARGET_VM_MAXUSER_ADDRESS
+
+static inline abi_ulong get_sp_from_cpustate(CPUARMState *state)
+{
+return state->xregs[31]; /* sp */
+}
+
+static inline void set_second_rval(CPUARMState *state, abi_ulong retval2)
+{
+state->xregs[1] = retval2; /* XXX not really used on 64-bit arch */
+}
+#endif /* TARGET_ARCH_VMPARAM_H */
-- 
2.34.1

[PATCH v3 07/16] monitor: Stop removing non-duplicated fds

2024-06-17 Thread Fabiano Rosas

monitor_fdsets_cleanup() currently has three responsibilities:

1- Remove the fds that have been marked for removal(->removed=true) by
   qmp_remove_fd(). This is overly complicated, but ok.

2- Remove any file descriptors that have been passed into QEMU and
   never duplicated[1,2]. A file descriptor without duplicates
   indicates that no part of QEMU has made use of it. This is
   problematic because the current implementation does it only if the
   guest is not running and the monitor is closed.

3- Remove/free fdsets that have become empty due to the above
   removals. This is ok.

The scenario described in (2) is starting to show some cracks now that
we're trying to consume fds from the migration code:

- Doing cleanup every time the last monitor connection closes works to
  reap unused fds, but also has the side effect of forcing the
  management layer to pass the file descriptors again in case of a
  disconnect/re-connect, if that happened to be the only monitor
  connection.

  Another side effect is that removing an fd with qmp_remove_fd() is
  effectively delayed until the last monitor connection closes.

  The usage of mon_refcount is also problematic because it's racy.

- Checking runstate_is_running() skips the cleanup unless the VM is
  running and avoids premature cleanup of the fds, but also has the
  side effect of blocking the legitimate removal of an fd via
  qmp_remove_fd() if the VM happens to be in another state.

  This affects qmp_remove_fd() and qmp_query_fdsets() in particular
  because requesting a removal at a bad time (guest stopped) might
  cause an fd to never be removed, or to be removed at a much later
  point in time, causing the query command to continue showing the
  supposedly removed fd/fdset.

Note that file descriptors that *have* been duplicated are owned by
the code that uses them and will be removed after qemu_close() is
called. Therefore we've decided that the best course of action to
avoid the undesired side-effects is to stop managing non-duplicated
file descriptors.

1- efb87c1697 ("monitor: Clean up fd sets on monitor disconnect")
2- ebe52b592d ("monitor: Prevent removing fd from set during init")

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 monitor/fds.c  | 15 ---
 monitor/hmp.c  |  2 --
 monitor/monitor-internal.h |  1 -
 monitor/monitor.c  |  1 -
 monitor/qmp.c  |  2 --
 5 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index bd45a26368..ea85ba1f05 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -175,6 +175,11 @@ static void monitor_fdset_free(MonFdset *mon_fdset)
 
 static void monitor_fdset_free_if_empty(MonFdset *mon_fdset)
 {
+/*
+ * Only remove an empty fdset. The fds are owned by the user and
+ * should have been removed with qmp_remove_fd(). The dup_fds are
+ * owned by QEMU and should have been removed with qemu_close().
+ */
 if (QLIST_EMPTY(_fdset->fds) && QLIST_EMPTY(_fdset->dup_fds)) {
 monitor_fdset_free(mon_fdset);
 }
@@ -194,9 +199,7 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
 MonFdsetFd *mon_fdset_fd_next;
 
 QLIST_FOREACH_SAFE(mon_fdset_fd, _fdset->fds, next, mon_fdset_fd_next) 
{
-if ((mon_fdset_fd->removed ||
-(QLIST_EMPTY(_fdset->dup_fds) && mon_refcount == 0)) &&
-runstate_is_running()) {
+if (mon_fdset_fd->removed) {
 monitor_fdset_fd_free(mon_fdset_fd);
 }
 }
@@ -211,7 +214,7 @@ void monitor_fdsets_cleanup(void)
 
 QEMU_LOCK_GUARD(_fdsets_lock);
 QLIST_FOREACH_SAFE(mon_fdset, _fdsets, next, mon_fdset_next) {
-monitor_fdset_cleanup(mon_fdset);
+monitor_fdset_free_if_empty(mon_fdset);
 }
 }
 
@@ -484,9 +487,7 @@ void monitor_fdset_dup_fd_remove(int dup_fd)
 if (mon_fdset_fd_dup->fd == dup_fd) {
 QLIST_REMOVE(mon_fdset_fd_dup, next);
 g_free(mon_fdset_fd_dup);
-if (QLIST_EMPTY(_fdset->dup_fds)) {
-monitor_fdset_cleanup(mon_fdset);
-}
+monitor_fdset_free(mon_fdset);
 return;
 }
 }
diff --git a/monitor/hmp.c b/monitor/hmp.c
index 69c1b7e98a..460e8832f6 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1437,11 +1437,9 @@ static void monitor_event(void *opaque, QEMUChrEvent 
event)
 monitor_resume(mon);
 }
 qemu_mutex_unlock(>mon_lock);
-mon_refcount++;
 break;
 
 case CHR_EVENT_CLOSED:
-mon_refcount--;
 monitor_fdsets_cleanup();
 break;
 
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index 252de85681..cb628f681d 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -168,7 +168,6 @@ extern bool qmp_dispatcher_co_shutdown;
 extern QmpCommandList qmp_commands, qmp_cap_negotiation_commands;
 extern QemuMutex

[PATCH v3 14/16] monitor: fdset: Match against O_DIRECT

2024-06-17 Thread Fabiano Rosas

We're about to enable the use of O_DIRECT in the migration code and
due to the alignment restrictions imposed by filesystems we need to
make sure the flag is only used when doing aligned IO.

The migration will do parallel IO to different regions of a file, so
we need to use more than one file descriptor. Those cannot be obtained
by duplicating (dup()) since duplicated file descriptors share the
file status flags, including O_DIRECT. If one migration channel does
unaligned IO while another sets O_DIRECT to do aligned IO, the
filesystem would fail the unaligned operation.

The add-fd QMP command along with the fdset code are specifically
designed to allow the user to pass a set of file descriptors with
different access flags into QEMU to be later fetched by code that
needs to alternate between those flags when doing IO.

Extend the fdset matching to behave the same with the O_DIRECT flag.

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 monitor/fds.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index fd87e7db8b..436d5d80b8 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -424,6 +424,11 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, 
Error **errp)
 int fd = -1;
 int dup_fd;
 int mon_fd_flags;
+int mask = O_ACCMODE;
+
+#ifdef O_DIRECT
+mask |= O_DIRECT;
+#endif
 
 if (mon_fdset->id != fdset_id) {
 continue;
@@ -437,7 +442,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags, 
Error **errp)
 return -1;
 }
 
-if ((flags & O_ACCMODE) == (mon_fd_flags & O_ACCMODE)) {
+if ((flags & mask) == (mon_fd_flags & mask)) {
 fd = mon_fdset_fd->fd;
 break;
 }
-- 
2.35.3

[PATCH 01/23] Add CPU initialization function

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Addded function to initialize ARM CPU
and to check if it supports 64 bit mode

Signed-off-by: Ajeet Singh 
Signed-off-by: Stacey Son 
---
 bsd-user/aarch64/target_arch_cpu.h | 42 ++
 1 file changed, 42 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch_cpu.h

diff --git a/bsd-user/aarch64/target_arch_cpu.h 
b/bsd-user/aarch64/target_arch_cpu.h
new file mode 100644
index 00..db5c7062b9
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_cpu.h
@@ -0,0 +1,42 @@
+/*
+ *  ARM AArch64 cpu init and loop
+ *
+ * Copyright (c) 2015 Stacey Son
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_CPU_H
+#define TARGET_ARCH_CPU_H
+
+#include "target_arch.h"
+#include "target/arm/syndrome.h"
+
+#define TARGET_DEFAULT_CPU_MODEL "any"
+
+static inline void target_cpu_init(CPUARMState *env,
+struct target_pt_regs *regs)
+{
+int i;
+
+if (!(arm_feature(env, ARM_FEATURE_AARCH64))) {
+fprintf(stderr, "The selected ARM CPU does not support 64 bit mode\n");
+exit(1);
+}
+for (i = 0; i < 31; i++) {
+env->xregs[i] = regs->regs[i];
+}
+env->pc = regs->pc;
+env->xregs[31] = regs->sp;
+}
-- 
2.34.1

[PATCH 13/23] Add ARM AArch64 ELF definitions for bsd-user

2024-06-17 Thread Ajeet Singh

From: Stacey Son 

Defined mmap and dynamic load adresses and
set various elf parameters

Signed-off-by: Stacey Son 
Signed-off-by: Ajeet Singh 
Co-authored-by: Kyle Evans 
---
 bsd-user/aarch64/target_arch_elf.h | 37 ++
 1 file changed, 37 insertions(+)
 create mode 100644 bsd-user/aarch64/target_arch_elf.h

diff --git a/bsd-user/aarch64/target_arch_elf.h 
b/bsd-user/aarch64/target_arch_elf.h
new file mode 100644
index 00..6d0fa3525f
--- /dev/null
+++ b/bsd-user/aarch64/target_arch_elf.h
@@ -0,0 +1,37 @@
+/*
+ * ARM AArch64 ELF definitions for bsd-user
+ *
+ * Copyright (c) 2015 Stacey D. Son
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef TARGET_ARCH_ELF_H
+#define TARGET_ARCH_ELF_H
+
+#include "target/arm/cpu-features.h"
+
+#define ELF_START_MMAP 0x8000
+#define ELF_ET_DYN_LOAD_ADDR0x10
+
+#define elf_check_arch(x) ((x) == EM_AARCH64)
+
+#define ELF_CLASS   ELFCLASS64
+#define ELF_DATAELFDATA2LSB
+#define ELF_ARCHEM_AARCH64
+
+#define USE_ELF_CORE_DUMP
+#define ELF_EXEC_PAGESIZE   4096
+
+#endif /* TARGET_ARCH_ELF_H */
-- 
2.34.1

[PATCH v3 01/16] migration: Drop reference to QIOChannel if file seeking fails

2024-06-17 Thread Fabiano Rosas

We forgot to drop the reference to the QIOChannel in the error path of
the offset adjustment. Do it now.

Signed-off-by: Fabiano Rosas 
---
 migration/file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/file.c b/migration/file.c
index ab18ba505a..2bb8c64092 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -94,6 +94,7 @@ void file_start_outgoing_migration(MigrationState *s,
 
 ioc = QIO_CHANNEL(fioc);
 if (offset && qio_channel_io_seek(ioc, offset, SEEK_SET, errp) < 0) {
+object_unref(OBJECT(fioc));
 return;
 }
 qio_channel_set_name(ioc, "migration-file-outgoing");
-- 
2.35.3

[PATCH v3 00/16] migration/mapped-ram: Add direct-io support

2024-06-17 Thread Fabiano Rosas

Hi,

Not many changes since v2:

- new patch 1: drop IOC reference on offset check failure

- dropped 2 patches adding direct-io to the single threaded
  migration. We have agreed to use multifd with 1 channel for that
  use-case.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1335719164

Thanks

v2:
https://lore.kernel.org/r/20240523190548.23977-1-faro...@suse.de

v1:
https://lore.kernel.org/r/20240426142042.14573-1-faro...@suse.de

Hi everyone, here's the rest of the migration "mapped-ram" feature
that didn't get merged for 9.0. This series adds support for direct
I/O, the missing piece to get the desired performance improvements.

There's 3 parts to this:

1- The plumbing for the new "direct-io" migration parameter. With this
   we can already use direct-io with the file transport + multifd +
   mapped-ram. Patches 1-3.

Due to the alignment requirements of O_DIRECT and the fact that
multifd runs the channels in parallel with the migration thread, we
must open the migration file two times, one with O_DIRECT set and
another with it clear.

If the user is not passing in a file name which QEMU can open at will,
we must then require that the user pass the two file descriptors with
the flags already properly set. We'll use the already existing fdset +
QMP add-fd infrastructure for this.

2- Changes to the fdset infrastructure to support O_DIRECT. We need
   those to be able to select from the user-provided fdset the file
   descriptor that contains the O_DIRECT flag. Patches 4-5.

3- Some fdset validation to make sure the two-fds requirement is being
   met. Patches 6-7.

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1269352083

Fabiano Rosas (15):
  migration: Drop reference to QIOChannel if file seeking fails
  migration: Fix file migration with fdset
  tests/qtest/migration: Fix file migration offset check
  tests/qtest/migration: Add a precopy file test with fdset
  monitor: Introduce monitor_fdset_*free
  monitor: Stop removing non-duplicated fds
  monitor: Simplify fdset and fd removal
  monitor: Report errors from monitor_fdset_dup_fd_add
  io: Stop using qemu_open_old in channel-file
  migration: Add direct-io parameter
  migration/multifd: Add direct-io support
  tests/qtest/migration: Add tests for file migration with direct-io
  monitor: fdset: Match against O_DIRECT
  migration: Add documentation for fdset with multifd + file
  tests/qtest/migration: Add a test for mapped-ram with passing of fds

Peter Xu (1):
  monitor: Drop monitor_fdset_dup_fd_find/_remove()

 docs/devel/migration/main.rst   |  24 ++-
 docs/devel/migration/mapped-ram.rst |   6 +-
 include/monitor/monitor.h   |   3 +-
 include/qemu/osdep.h|   2 +
 io/channel-file.c   |   8 +-
 migration/file.c|  45 -
 migration/file.h|   1 -
 migration/migration-hmp-cmds.c  |  11 ++
 migration/migration.c   |  23 +++
 migration/options.c |  35 
 migration/options.h |   1 +
 monitor/fds.c   |  96 +-
 monitor/hmp.c   |   2 -
 monitor/monitor-internal.h  |   1 -
 monitor/monitor.c   |   1 -
 monitor/qmp.c   |   2 -
 qapi/migration.json |  21 ++-
 stubs/fdset.c   |   7 +-
 tests/qtest/migration-helpers.c |  44 +
 tests/qtest/migration-helpers.h |   8 +
 tests/qtest/migration-test.c| 263 +---
 util/osdep.c|  34 ++--
 22 files changed, 508 insertions(+), 130 deletions(-)


base-commit: 05ad1440b8428b0ade9b8e5c01469adb8fbf83e3
-- 
2.35.3

[PATCH v3 05/16] monitor: Drop monitor_fdset_dup_fd_find/_remove()

2024-06-17 Thread Fabiano Rosas

From: Peter Xu 

Those functions are not needed, one remove function should already
work.  Clean it up.

Here the code doesn't really care about whether we need to keep that dupfd
around if close() failed: when that happens something got very wrong,
keeping the dup_fd around the fdsets may not help that situation so far.

Cc: Dr. David Alan Gilbert 
Cc: Markus Armbruster 
Cc: Philippe Mathieu-Daudé 
Cc: Paolo Bonzini 
Cc: Daniel P. Berrangé 
Signed-off-by: Peter Xu 
Reviewed-by: Daniel P. Berrangé 
[add missing return statement, removal during traversal is not safe]
Signed-off-by: Fabiano Rosas 
---
 include/monitor/monitor.h |  1 -
 monitor/fds.c | 28 ++--
 stubs/fdset.c |  5 -
 util/osdep.c  | 15 +--
 4 files changed, 7 insertions(+), 42 deletions(-)

diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 965f5d5450..fd9b3f538c 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -53,7 +53,6 @@ AddfdInfo *monitor_fdset_add_fd(int fd, bool has_fdset_id, 
int64_t fdset_id,
 const char *opaque, Error **errp);
 int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags);
 void monitor_fdset_dup_fd_remove(int dup_fd);
-int64_t monitor_fdset_dup_fd_find(int dup_fd);
 
 void monitor_register_hmp(const char *name, bool info,
   void (*cmd)(Monitor *mon, const QDict *qdict));
diff --git a/monitor/fds.c b/monitor/fds.c
index d86c2c674c..fb9f58c056 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -458,7 +458,7 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 #endif
 }
 
-static int64_t monitor_fdset_dup_fd_find_remove(int dup_fd, bool remove)
+void monitor_fdset_dup_fd_remove(int dup_fd)
 {
 MonFdset *mon_fdset;
 MonFdsetFd *mon_fdset_fd_dup;
@@ -467,31 +467,15 @@ static int64_t monitor_fdset_dup_fd_find_remove(int 
dup_fd, bool remove)
 QLIST_FOREACH(mon_fdset, _fdsets, next) {
 QLIST_FOREACH(mon_fdset_fd_dup, _fdset->dup_fds, next) {
 if (mon_fdset_fd_dup->fd == dup_fd) {
-if (remove) {
-QLIST_REMOVE(mon_fdset_fd_dup, next);
-g_free(mon_fdset_fd_dup);
-if (QLIST_EMPTY(_fdset->dup_fds)) {
-monitor_fdset_cleanup(mon_fdset);
-}
-return -1;
-} else {
-return mon_fdset->id;
+QLIST_REMOVE(mon_fdset_fd_dup, next);
+g_free(mon_fdset_fd_dup);
+if (QLIST_EMPTY(_fdset->dup_fds)) {
+monitor_fdset_cleanup(mon_fdset);
 }
+return;
 }
 }
 }
-
-return -1;
-}
-
-int64_t monitor_fdset_dup_fd_find(int dup_fd)
-{
-return monitor_fdset_dup_fd_find_remove(dup_fd, false);
-}
-
-void monitor_fdset_dup_fd_remove(int dup_fd)
-{
-monitor_fdset_dup_fd_find_remove(dup_fd, true);
 }
 
 int monitor_fd_param(Monitor *mon, const char *fdname, Error **errp)
diff --git a/stubs/fdset.c b/stubs/fdset.c
index d7c39a28ac..389e368a29 100644
--- a/stubs/fdset.c
+++ b/stubs/fdset.c
@@ -9,11 +9,6 @@ int monitor_fdset_dup_fd_add(int64_t fdset_id, int flags)
 return -1;
 }
 
-int64_t monitor_fdset_dup_fd_find(int dup_fd)
-{
-return -1;
-}
-
 void monitor_fdset_dup_fd_remove(int dupfd)
 {
 }
diff --git a/util/osdep.c b/util/osdep.c
index 5d23bbfbec..756de9a745 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -398,21 +398,8 @@ int qemu_open_old(const char *name, int flags, ...)
 
 int qemu_close(int fd)
 {
-int64_t fdset_id;
-
 /* Close fd that was dup'd from an fdset */
-fdset_id = monitor_fdset_dup_fd_find(fd);
-if (fdset_id != -1) {
-int ret;
-
-ret = close(fd);
-if (ret == 0) {
-monitor_fdset_dup_fd_remove(fd);
-}
-
-return ret;
-}
-
+monitor_fdset_dup_fd_remove(fd);
 return close(fd);
 }
 
-- 
2.35.3

[PATCH v3 06/16] monitor: Introduce monitor_fdset_*free

2024-06-17 Thread Fabiano Rosas

Introduce new functions to remove and free no longer used fds and
fdsets.

We need those to decouple the remove/free routines from
monitor_fdset_cleanup() which will go away in the next patches.

The new functions:

- monitor_fdset_free/_if_empty() will be used when a monitor
  connection closes and when an fd is removed to cleanup any fdset
  that is now empty.

- monitor_fdset_fd_free() will be used to remove one or more fds that
  have been explicitly targeted by qmp_remove_fd().

Reviewed-by: Peter Xu 
Signed-off-by: Fabiano Rosas 
---
 monitor/fds.c | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/monitor/fds.c b/monitor/fds.c
index fb9f58c056..bd45a26368 100644
--- a/monitor/fds.c
+++ b/monitor/fds.c
@@ -167,6 +167,27 @@ int monitor_get_fd(Monitor *mon, const char *fdname, Error 
**errp)
 return -1;
 }
 
+static void monitor_fdset_free(MonFdset *mon_fdset)
+{
+QLIST_REMOVE(mon_fdset, next);
+g_free(mon_fdset);
+}
+
+static void monitor_fdset_free_if_empty(MonFdset *mon_fdset)
+{
+if (QLIST_EMPTY(_fdset->fds) && QLIST_EMPTY(_fdset->dup_fds)) {
+monitor_fdset_free(mon_fdset);
+}
+}
+
+static void monitor_fdset_fd_free(MonFdsetFd *mon_fdset_fd)
+{
+close(mon_fdset_fd->fd);
+g_free(mon_fdset_fd->opaque);
+QLIST_REMOVE(mon_fdset_fd, next);
+g_free(mon_fdset_fd);
+}
+
 static void monitor_fdset_cleanup(MonFdset *mon_fdset)
 {
 MonFdsetFd *mon_fdset_fd;
@@ -176,17 +197,11 @@ static void monitor_fdset_cleanup(MonFdset *mon_fdset)
 if ((mon_fdset_fd->removed ||
 (QLIST_EMPTY(_fdset->dup_fds) && mon_refcount == 0)) &&
 runstate_is_running()) {
-close(mon_fdset_fd->fd);
-g_free(mon_fdset_fd->opaque);
-QLIST_REMOVE(mon_fdset_fd, next);
-g_free(mon_fdset_fd);
+monitor_fdset_fd_free(mon_fdset_fd);
 }
 }
 
-if (QLIST_EMPTY(_fdset->fds) && QLIST_EMPTY(_fdset->dup_fds)) {
-QLIST_REMOVE(mon_fdset, next);
-g_free(mon_fdset);
-}
+monitor_fdset_free_if_empty(mon_fdset);
 }
 
 void monitor_fdsets_cleanup(void)
-- 
2.35.3

Re: [PULL v2 00/19] aspeed queue

2024-06-17 Thread Richard Henderson


On 6/16/24 21:02, Cédric Le Goater wrote:

The following changes since commit 05ad1440b8428b0ade9b8e5c01469adb8fbf83e3:

   Merge tag 'virtio-grants-v8-tag' ofhttps://gitlab.com/sstabellini/qemu  into 
staging (2024-06-15 20:13:06 -0700)

are available in the Git repository at:

   https://github.com/legoater/qemu/  tags/pull-aspeed-20240617

for you to fetch changes up to 5f44521242d2fdfa190206a6be40577a58a71ef9:

   MAINTAINERS: Add reviewers for ASPEED BMCs (2024-06-16 21:08:54 +0200)


aspeed queue:

* Add AST2700 support

Changes in v2:

   - Fixed class_size in TYPE_ASPEED_INTC definition
   - Fixed spelling : Unhandeled -> Unhandled


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/9.1 as 
appropriate.


r~

Re: [PATCH v2] os-posix: Expand setrlimit() syscall compatibility

2024-06-17 Thread Richard Henderson


On 6/17/24 08:15, Philippe Mathieu-Daudé wrote:

On 17/6/24 15:07, Michael Tokarev wrote:

17.06.2024 10:19, Philippe Mathieu-Daudé wrote:

Hi Trent,

On 14/6/24 23:06, Trent Huber wrote:

Darwin uses a subtly different version of the setrlimit() syscall as
described in the COMPATIBILITY section of the macOS man page. The value
of the rlim_cur member has been adjusted accordingly for Darwin-based
systems.

Signed-off-by: Trent Huber 
---
The previous version assumed OPEN_MAX was a constant defined on all
POSIX systems--turns out it's only a macOS constant. This version adds
preprocessing conditionals to maintain compatibility with Linux.

  os-posix.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/os-posix.c b/os-posix.c
index a4284e2c07..43f9a43f3f 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -270,7 +270,11 @@ void os_setup_limits(void)
  return;
  }
+#ifdef CONFIG_DARWIN
+    nofile.rlim_cur = OPEN_MAX < nofile.rlim_max ? OPEN_MAX : nofile.rlim_max;


Why open-code min()? (The man-page also suggests it).


I guess it's because stddef.h isn't included there, so min() isn't immediately
available :)


I see os-posix.c -> "qemu/osdep.h" -> . Anyway,


We also have MIN in osdep.h.


r~

[PATCH v2 04/10] migration: Cleanup incoming migration setup state change

2024-06-17 Thread Peter Xu

Destination QEMU can setup incoming ports for two purposes: either a fresh
new incoming migration, in which QEMU will switch to SETUP for channel
establishment, or a paused postcopy migration, in which QEMU will stay in
POSTCOPY_PAUSED until kicking off the RECOVER phase.

Now the state machine worked on dest node for the latter, only because
migrate_set_state() implicitly will become a noop if the current state
check failed.  It wasn't clear at all.

Clean it up by providing a helper migration_incoming_state_setup() doing
proper checks over current status.  Postcopy-paused will be explicitly
checked now, and then we can bail out for unknown states.

Signed-off-by: Peter Xu 
---
 migration/migration.c | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 75c9d80e8e..59442181a1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -595,6 +595,29 @@ bool migrate_uri_parse(const char *uri, MigrationChannel 
**channel,
 return true;
 }
 
+static bool
+migration_incoming_state_setup(MigrationIncomingState *mis, Error **errp)
+{
+MigrationStatus current = mis->state;
+
+if (current == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+/*
+ * Incoming postcopy migration will stay in PAUSED state even if
+ * reconnection happened.
+ */
+return true;
+}
+
+if (current != MIGRATION_STATUS_NONE) {
+error_setg(errp, "Illegal migration incoming state: %s",
+   MigrationStatus_str(current));
+return false;
+}
+
+migrate_set_state(>state, current, MIGRATION_STATUS_SETUP);
+return true;
+}
+
 static void qemu_start_incoming_migration(const char *uri, bool has_channels,
   MigrationChannelList *channels,
   Error **errp)
@@ -633,8 +656,9 @@ static void qemu_start_incoming_migration(const char *uri, 
bool has_channels,
 return;
 }
 
-migrate_set_state(>state, MIGRATION_STATUS_NONE,
-  MIGRATION_STATUS_SETUP);
+if (!migration_incoming_state_setup(mis, errp)) {
+return;
+}
 
 if (addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET) {
 SocketAddress *saddr = >u.socket;
-- 
2.45.0

[PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests

2024-06-17 Thread Peter Xu

Most of them are not needed, we can stick with one ifdef inside
postcopy_recover_fail() so as to cover the scm right tricks only.
The tests won't run on windows anyway due to has_uffd always false.

Signed-off-by: Peter Xu 
---
 tests/qtest/migration-test.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index b7e3406471..13b59d4c10 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1353,9 +1353,9 @@ static void wait_for_postcopy_status(QTestState *one, 
const char *status)
   "completed", NULL });
 }
 
-#ifndef _WIN32
 static void postcopy_recover_fail(QTestState *from, QTestState *to)
 {
+#ifndef _WIN32
 int ret, pair1[2], pair2[2];
 char c;
 
@@ -1417,8 +1417,8 @@ static void postcopy_recover_fail(QTestState *from, 
QTestState *to)
 close(pair1[1]);
 close(pair2[0]);
 close(pair2[1]);
+#endif
 }
-#endif /* _WIN32 */
 
 static void test_postcopy_recovery_common(MigrateCommon *args)
 {
@@ -1458,7 +1458,6 @@ static void test_postcopy_recovery_common(MigrateCommon 
*args)
 wait_for_postcopy_status(to, "postcopy-paused");
 wait_for_postcopy_status(from, "postcopy-paused");
 
-#ifndef _WIN32
 if (args->postcopy_recovery_test_fail) {
 /*
  * Test when a wrong socket specified for recover, and then the
@@ -1467,7 +1466,6 @@ static void test_postcopy_recovery_common(MigrateCommon 
*args)
 postcopy_recover_fail(from, to);
 /* continue with a good recovery */
 }
-#endif /* _WIN32 */
 
 /*
  * Create a new socket to emulate a new channel that is different
@@ -1496,7 +1494,6 @@ static void test_postcopy_recovery(void)
 test_postcopy_recovery_common();
 }
 
-#ifndef _WIN32
 static void test_postcopy_recovery_double_fail(void)
 {
 MigrateCommon args = {
@@ -1505,7 +1502,6 @@ static void test_postcopy_recovery_double_fail(void)
 
 test_postcopy_recovery_common();
 }
-#endif /* _WIN32 */
 
 #ifdef CONFIG_GNUTLS
 static void test_postcopy_recovery_tls_psk(void)
@@ -3486,10 +3482,8 @@ int main(int argc, char **argv)
test_postcopy_preempt);
 migration_test_add("/migration/postcopy/preempt/recovery/plain",
test_postcopy_preempt_recovery);
-#ifndef _WIN32
 migration_test_add("/migration/postcopy/recovery/double-failures",
test_postcopy_recovery_double_fail);
-#endif /* _WIN32 */
 if (is_x86) {
 migration_test_add("/migration/postcopy/suspend",
test_postcopy_suspend);
-- 
2.45.0

1 2 3 4 >

1 - 100 of 300 matches

Mail list logo