[PATCH v2] linux-user/riscv: Use abi type for target_ucontext

2023-08-10 Thread LIU Zhiwei
We should not use types dependent on the host arch for target_ucontext.
This bug was found when running rv32 applications.
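
For illustration only (not part of the submitted patch): on a 64-bit host
running an rv32 guest, the host types are twice as wide as the guest ABI
expects, so every field after uc_flags lands at the wrong guest offset.
A minimal sketch of the size mismatch, with the abi typedefs simplified:

    /* Hedged sketch: an rv32 guest on an x86-64 host. */
    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t abi_ulong;   /* guest "unsigned long" is 4 bytes */
    typedef uint32_t abi_ptr;     /* guest pointer is 4 bytes         */

    int main(void)
    {
        /* Host: 8 and 8.  Guest ABI: 4 and 4. */
        printf("host:  %zu %zu\n", sizeof(unsigned long), sizeof(void *));
        printf("guest: %zu %zu\n", sizeof(abi_ulong), sizeof(abi_ptr));
        return 0;
    }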

Signed-off-by: LIU Zhiwei 
Reviewed-by: Richard Henderson 
Reviewed-by: Daniel Henrique Barboza 
---
v2:
- Use abi_ptr instead of abi_ulong for uc_link. (Suggested by Philippe Mathieu-Daudé)
---
 linux-user/riscv/signal.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/riscv/signal.c b/linux-user/riscv/signal.c
index eaa168199a..f989f7f51f 100644
--- a/linux-user/riscv/signal.c
+++ b/linux-user/riscv/signal.c
@@ -38,8 +38,8 @@ struct target_sigcontext {
 }; /* cf. riscv-linux:arch/riscv/include/uapi/asm/ptrace.h */
 
 struct target_ucontext {
-unsigned long uc_flags;
-struct target_ucontext *uc_link;
+abi_ulong uc_flags;
+abi_ptr uc_link;
 target_stack_t uc_stack;
 target_sigset_t uc_sigmask;
 uint8_t   __unused[1024 / 8 - sizeof(target_sigset_t)];
-- 
2.17.1




Re:Re: Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-10 Thread ThinerLogoer
At 2023-08-11 05:24:43, "Peter Xu"  wrote:
>On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
>> >I think we have the following options (there might be more)
>> >
>> >1) This patch.
>> >
>> >2) New flag for memory-backend-file. We already have "readonly" and 
>> >"share=". I'm having a hard time coming up with a good name that really 
>> >describes the subtle difference.
>> >
>> >3) Glue behavior to the QEMU machine
>> >
>> 
>> 4) '-deny-private-discard' argv, or environment variable, or both
>
>I'd personally vote for (2).  How about "fdperm"?  To describe when we want
>to use different rw permissions on the file (besides the access permission
>of the memory we already provided with "readonly"=XXX).  IIUC the only sane
>value will be ro/rw/default, where "default" should just use the same rw
>permission as the memory ("readonly"=XXX).
>
>Would that be relatively clean and also work in this use case?
>
>(the other thing I'd wish we don't have that fallback is, as long as we
> have any of that "fallback" we'll need to be compatible with it since
> then, and for ever...)

If it must be (2), I would vote for (2) + (4), with (4) adjusting the default
behavior of said `fdperm`.
Mainly because (private+discard) is itself not a good practice, and (4) serves
as a good tool to help catch existing (private+discard) problems.

Actually (readonly+private) is more reasonable than (private+discard), so I
want at least some room for a default (readonly+private) behavior.

Also, in my case I kind of have to use "-mem-path" despite it being considered
close to deprecated. Only with this can I avoid knowing about the memory
backend before migration. Actually there seems to be no equivalent working
after-migration setup of "-object memory-backend-file,... -machine q35,mem=..."
that can match a before-migration setup of "-machine q35" (specifying nothing).
Therefore I must make a plan and choose a migration method BEFORE I boot the
machine and prepare to migrate, which reduces operational freedom.
Considering that, I have to use "-mem-path", which keeps the freedom but
has no configurable arguments, so I have to rely on the default config.

Is there any "-object memory-backend-file..." setup equivalent to "-machine q35"
such that the two can migrate to and from each other? If there is, I want to try
it out. By the way, "-object memory-backend-file,id=pc.ram" has just been killed
by an earlier commit.

Either (4) or fixing this should help my config. I hope you can consider this
more deeply and figure out a more systematic solution that helps more users.

--

Regards,
logoerthiner

[PATCH v2 2/4] hw/i2c/aspeed: Fix Tx count and Rx size error

2023-08-10 Thread Hang Yu
According to the ast2600 datasheet, the actual Tx count is
Transmit Data Byte Count plus 1, and the max Rx size is
Receive Pool Buffer Size plus 1, both in the Pool Buffer Control Register.
The previous version forgot to add 1, and mistook Rx count for Rx size.
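
As a quick illustration of the +1 semantics this patch implements (the
values below are made up, not taken from the datasheet):

    /* The register fields encode N-1; the hardware moves N bytes. */
    unsigned tx_field = 0;                /* Transmit Data Byte Count field */
    unsigned tx_count = tx_field + 1;     /* 1 byte is actually sent        */
    unsigned rx_field = 15;               /* Receive Pool Buffer Size field */
    unsigned rx_size  = rx_field + 1;     /* the pool can receive 16 bytes  */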

Signed-off-by: Hang Yu 
---
 hw/i2c/aspeed_i2c.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c
index 1f071a3811..e485d8bfb8 100644
--- a/hw/i2c/aspeed_i2c.c
+++ b/hw/i2c/aspeed_i2c.c
@@ -236,7 +236,7 @@ static int aspeed_i2c_bus_send(AspeedI2CBus *bus, uint8_t pool_start)
 uint32_t reg_byte_buf = aspeed_i2c_bus_byte_buf_offset(bus);
 uint32_t reg_dma_len = aspeed_i2c_bus_dma_len_offset(bus);
 int pool_tx_count = SHARED_ARRAY_FIELD_EX32(bus->regs, reg_pool_ctrl,
-TX_COUNT);
+TX_COUNT) + 1;
 
 if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, TX_BUFF_EN)) {
 for (i = pool_start; i < pool_tx_count; i++) {
@@ -293,7 +293,7 @@ static void aspeed_i2c_bus_recv(AspeedI2CBus *bus)
 uint32_t reg_dma_len = aspeed_i2c_bus_dma_len_offset(bus);
 uint32_t reg_dma_addr = aspeed_i2c_bus_dma_addr_offset(bus);
 int pool_rx_count = SHARED_ARRAY_FIELD_EX32(bus->regs, reg_pool_ctrl,
-RX_COUNT);
+RX_SIZE) + 1;
 
 if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, RX_BUFF_EN)) {
 uint8_t *pool_base = aic->bus_pool_base(bus);
@@ -418,7 +418,7 @@ static void aspeed_i2c_bus_cmd_dump(AspeedI2CBus *bus)
 uint32_t reg_intr_sts = aspeed_i2c_bus_intr_sts_offset(bus);
 uint32_t reg_dma_len = aspeed_i2c_bus_dma_len_offset(bus);
 if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, RX_BUFF_EN)) {
-count = SHARED_ARRAY_FIELD_EX32(bus->regs, reg_pool_ctrl, TX_COUNT);
+count = SHARED_ARRAY_FIELD_EX32(bus->regs, reg_pool_ctrl, TX_COUNT) + 1;
 } else if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, RX_DMA_EN)) {
 count = bus->regs[reg_dma_len];
 } else { /* BYTE mode */
@@ -490,7 +490,7 @@ static void aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value)
  */
 if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, TX_BUFF_EN)) {
 if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_pool_ctrl, TX_COUNT)
-== 1) {
+== 0) {
 SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_TX_CMD, 0);
 } else {
 /*
-- 
2.39.2 (Apple Git-143)




[PATCH v2 3/4] hw/i2c/aspeed: Fix TXBUF transmission start position error

2023-08-10 Thread Hang Yu
According to the ast2600 datasheet and the Linux Aspeed I2C driver,
the TXBUF transmission start position should be TXBUF[0] instead of
TXBUF[1], so the arg pool_start is useless, and the address is not
included in TXBUF. So even if Tx Count equals zero, there is at least
1 byte of data that needs to be transmitted, and M_TX_CMD should not be
cleared in this condition. The driver URL is:
https://github.com/AspeedTech-BMC/linux/blob/aspeed-master-v5.15/drivers/i2c/busses/i2c-ast2600.c
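
To make the resulting layout concrete, an illustrative sketch based on the
description above (send_byte() and the variable names are made up):

    /* The slave address goes through the byte buffer, not the Tx pool,
     * so data starts at TXBUF[0] and a Tx Count field of N means N+1
     * data bytes are sent.                                             */
    for (unsigned i = 0; i < tx_count_field + 1; i++) {
        send_byte(txbuf[i]);          /* hypothetical helper */
    }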

Signed-off-by: Hang Yu 
---
 hw/i2c/aspeed_i2c.c | 30 ++
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c
index e485d8bfb8..44905d7899 100644
--- a/hw/i2c/aspeed_i2c.c
+++ b/hw/i2c/aspeed_i2c.c
@@ -226,7 +226,7 @@ static int aspeed_i2c_dma_read(AspeedI2CBus *bus, uint8_t *data)
 return 0;
 }
 
-static int aspeed_i2c_bus_send(AspeedI2CBus *bus, uint8_t pool_start)
+static int aspeed_i2c_bus_send(AspeedI2CBus *bus)
 {
 AspeedI2CClass *aic = ASPEED_I2C_GET_CLASS(bus->controller);
 int ret = -1;
@@ -239,7 +239,7 @@ static int aspeed_i2c_bus_send(AspeedI2CBus *bus, uint8_t pool_start)
 TX_COUNT) + 1;
 
 if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, TX_BUFF_EN)) {
-for (i = pool_start; i < pool_tx_count; i++) {
+for (i = 0; i < pool_tx_count; i++) {
 uint8_t *pool_base = aic->bus_pool_base(bus);
 
 trace_aspeed_i2c_bus_send("BUF", i + 1, pool_tx_count,
@@ -273,7 +273,7 @@ static int aspeed_i2c_bus_send(AspeedI2CBus *bus, uint8_t pool_start)
 }
 SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, TX_DMA_EN, 0);
 } else {
-trace_aspeed_i2c_bus_send("BYTE", pool_start, 1,
+trace_aspeed_i2c_bus_send("BYTE", 0, 1,
   bus->regs[reg_byte_buf]);
 ret = i2c_send(bus->bus, bus->regs[reg_byte_buf]);
 }
@@ -446,10 +446,8 @@ static void aspeed_i2c_bus_cmd_dump(AspeedI2CBus *bus)
  */
 static void aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value)
 {
-uint8_t pool_start = 0;
 uint32_t reg_intr_sts = aspeed_i2c_bus_intr_sts_offset(bus);
 uint32_t reg_cmd = aspeed_i2c_bus_cmd_offset(bus);
-uint32_t reg_pool_ctrl = aspeed_i2c_bus_pool_ctrl_offset(bus);
 uint32_t reg_dma_len = aspeed_i2c_bus_dma_len_offset(bus);
 
 if (!aspeed_i2c_check_sram(bus)) {
@@ -483,27 +481,11 @@ static void aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value)
 
 SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_START_CMD, 0);
 
-/*
- * The START command is also a TX command, as the slave
- * address is sent on the bus. Drop the TX flag if nothing
- * else needs to be sent in this sequence.
- */
-if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, TX_BUFF_EN)) {
-if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_pool_ctrl, TX_COUNT)
-== 0) {
-SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_TX_CMD, 0);
-} else {
-/*
- * Increase the start index in the TX pool buffer to
- * skip the address byte.
- */
-pool_start++;
-}
-} else if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, TX_DMA_EN)) {
+if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, TX_DMA_EN)) {
 if (bus->regs[reg_dma_len] == 0) {
 SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_TX_CMD, 0);
 }
-} else {
+} else if (!SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, TX_BUFF_EN)) {
 SHARED_ARRAY_FIELD_DP32(bus->regs, reg_cmd, M_TX_CMD, 0);
 }
 
@@ -520,7 +502,7 @@ static void aspeed_i2c_bus_handle_cmd(AspeedI2CBus *bus, uint64_t value)
 
 if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, M_TX_CMD)) {
 aspeed_i2c_set_state(bus, I2CD_MTXD);
-if (aspeed_i2c_bus_send(bus, pool_start)) {
+if (aspeed_i2c_bus_send(bus)) {
 SHARED_ARRAY_FIELD_DP32(bus->regs, reg_intr_sts, TX_NAK, 1);
 i2c_end_transfer(bus->bus);
 } else {
-- 
2.39.2 (Apple Git-143)




[PATCH v2 1/4] hw/i2c/aspeed: Fix I2CD_POOL_CTRL register bit field definition

2023-08-10 Thread Hang Yu
Fixed an inconsistency between the register bit field definition in the
header file and the ast2600 datasheet. The reg name is I2CD1C: Pool Buffer
Control Register in old register mode and I2CC0C: Master/Slave Pool Buffer
Control Register in new register mode. They share bit field
[12:8]: Transmit Data Byte Count and bit field
[29:24]: Actual Received Pool Buffer Size according to the datasheet.
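
For reference, a small illustration of what the corrected fields decode to,
in plain C rather than the QEMU FIELD macros (the register value is made up):

    unsigned reg = 0x1F001F00;               /* example register value  */
    unsigned tx  = (reg >> 8)  & 0x1F;       /* bits [12:8]  -> 0x1F    */
    unsigned rx  = (reg >> 24) & 0x3F;       /* bits [29:24] -> 0x1F    */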

Signed-off-by: Hang Yu 
---
 include/hw/i2c/aspeed_i2c.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/hw/i2c/aspeed_i2c.h b/include/hw/i2c/aspeed_i2c.h
index 51c944efea..2e1e15aaf0 100644
--- a/include/hw/i2c/aspeed_i2c.h
+++ b/include/hw/i2c/aspeed_i2c.h
@@ -139,9 +139,9 @@ REG32(I2CD_CMD, 0x14) /* I2CD Command/Status */
 REG32(I2CD_DEV_ADDR, 0x18) /* Slave Device Address */
 SHARED_FIELD(SLAVE_DEV_ADDR1, 0, 7)
 REG32(I2CD_POOL_CTRL, 0x1C) /* Pool Buffer Control */
-SHARED_FIELD(RX_COUNT, 24, 5)
+SHARED_FIELD(RX_COUNT, 24, 6)
 SHARED_FIELD(RX_SIZE, 16, 5)
-SHARED_FIELD(TX_COUNT, 9, 5)
+SHARED_FIELD(TX_COUNT, 8, 5)
 FIELD(I2CD_POOL_CTRL, OFFSET, 2, 6) /* AST2400 */
 REG32(I2CD_BYTE_BUF, 0x20) /* Transmit/Receive Byte Buffer */
 SHARED_FIELD(RX_BUF, 8, 8)
-- 
2.39.2 (Apple Git-143)




[PATCH v2 4/4] hw/i2c/aspeed: Add support for BUFFER ORGANIZATION in new register mode

2023-08-10 Thread Hang Yu
Added support for the BUFFER ORGANIZATION option in reg I2CC_POOL_CTRL:
when set to 1, the buffer is split into two parts, the lower 16 bytes for Tx
and the higher 16 bytes for Rx.
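
A small sketch of the split (illustrative only; pool_base and the 32-byte
pool size are assumptions for the example):

    /* BUF_ORGANIZATION = 1: one 32-byte pool, two 16-byte halves. */
    unsigned char *pool   = pool_base;   /* hypothetical pool base   */
    unsigned char *tx_buf = pool;        /* bytes  0..15 used for Tx */
    unsigned char *rx_buf = pool + 16;   /* bytes 16..31 used for Rx */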

Signed-off-by: Hang Yu 
---
 hw/i2c/aspeed_i2c.c | 7 ++-
 include/hw/i2c/aspeed_i2c.h | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/i2c/aspeed_i2c.c b/hw/i2c/aspeed_i2c.c
index 44905d7899..26fefb8f9e 100644
--- a/hw/i2c/aspeed_i2c.c
+++ b/hw/i2c/aspeed_i2c.c
@@ -296,7 +296,12 @@ static void aspeed_i2c_bus_recv(AspeedI2CBus *bus)
 RX_SIZE) + 1;
 
 if (SHARED_ARRAY_FIELD_EX32(bus->regs, reg_cmd, RX_BUFF_EN)) {
-uint8_t *pool_base = aic->bus_pool_base(bus);
+uint8_t *pool_base;
+if (ARRAY_FIELD_EX32(bus->regs, I2CC_POOL_CTRL, BUF_ORGANIZATION)) {
+pool_base = aic->bus_pool_base(bus) + 16;
+} else {
+pool_base = aic->bus_pool_base(bus);
+}
 
 for (i = 0; i < pool_rx_count; i++) {
 pool_base[i] = i2c_recv(bus->bus);
diff --git a/include/hw/i2c/aspeed_i2c.h b/include/hw/i2c/aspeed_i2c.h
index 2e1e15aaf0..88b144a599 100644
--- a/include/hw/i2c/aspeed_i2c.h
+++ b/include/hw/i2c/aspeed_i2c.h
@@ -162,6 +162,7 @@ REG32(I2CC_MS_TXRX_BYTE_BUF, 0x08)
 /* 15:0  shared with I2CD_BYTE_BUF[15:0] */
 REG32(I2CC_POOL_CTRL, 0x0c)
 /* 31:0 shared with I2CD_POOL_CTRL[31:0] */
+FIELD(I2CC_POOL_CTRL, BUF_ORGANIZATION, 0, 1) /* AST2600 */
 REG32(I2CM_INTR_CTRL, 0x10)
 REG32(I2CM_INTR_STS, 0x14)
 FIELD(I2CM_INTR_STS, PKT_STATE, 28, 4)
-- 
2.39.2 (Apple Git-143)




Re: [PULL 1/1] target/openrisc: Set EPCR to next PC on FPE exceptions

2023-08-10 Thread Michael Tokarev

10.08.2023 22:50, Stafford Horne wrote:

On Thu, Aug 10, 2023 at 09:35:18AM +0300, Michael Tokarev wrote:

..

Is it -stable material?  It applies cleanly to 8.0 and 7.2.
Or maybe it is not needed on older versions, not having been noticed before?


I would say no; it will work on 8.0 and 7.2, but this code path is not very useful
without the other 8.1 Floating Point Exception handling updates.


Thank you for letting me know. This makes good sense and matches my expectations
too.  This particular situation is rather interesting; that's why I asked.

/mjt




Re: [PATCH v1 2/6] target/loongarch: Add some checks before translating fpu instructions

2023-08-10 Thread gaosong

On 2023/8/10 23:03, Richard Henderson wrote:

On 8/10/23 05:41, Song Gao wrote:

This patch adds REQUIRE_FP/FP_SP/FP_DP to check CPUCFG2.FP/FP_SP/FP_DP.

Signed-off-by: Song Gao 
---
  target/loongarch/cpu.h    |   6 +
  .../loongarch/insn_trans/trans_farith.c.inc   | 132 --
  target/loongarch/insn_trans/trans_fcmp.c.inc  |   4 +
  target/loongarch/insn_trans/trans_fcnv.c.inc  |  56 
  .../loongarch/insn_trans/trans_fmemory.c.inc  | 104 ++
  target/loongarch/insn_trans/trans_fmov.c.inc  |  47 +--
  6 files changed, 247 insertions(+), 102 deletions(-)

diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 9f550793ca..5594d83011 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -459,6 +459,9 @@ static inline void set_pc(CPULoongArchState *env, uint64_t value)

  #define HW_FLAGS_EUEN_FPE   0x04
  #define HW_FLAGS_EUEN_SXE   0x08
  #define HW_FLAGS_VA32   0x20
+#define HW_FLAGS_FP 0x40
+#define HW_FLAGS_FP_SP  0x80
+#define HW_FLAGS_FP_DP  0x100
  static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
                                          uint64_t *cs_base, uint32_t *flags)
@@ -469,6 +472,9 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
  *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
  *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;

  *flags |= is_va32(env) * HW_FLAGS_VA32;
+    *flags |= FIELD_EX32(env->cpucfg[2], CPUCFG2, FP) * HW_FLAGS_FP;
+    *flags |= FIELD_EX32(env->cpucfg[2], CPUCFG2, FP_SP) * HW_FLAGS_FP_SP;
+    *flags |= FIELD_EX32(env->cpucfg[2], CPUCFG2, FP_DP) * HW_FLAGS_FP_DP;


You do not need to put any of these in HW_FLAGS, because CPUCFG space 
never changes for the lifetime of the cpu.


You can extract these into DisasContext in loongarch_tr_init_disas_context.


+#define REQUIRE_FP do { \
+    if ((ctx->base.tb->flags & HW_FLAGS_FP) == 0) { \
+    return false; \
+    } \
+} while (0)
+
+#define REQUIRE_FP_SP do { \
+    if ((ctx->base.tb->flags & HW_FLAGS_FP_SP) == 0) { \
+    return false; \
+    } \
+} while (0)
+
+#define REQUIRE_FP_DP do { \
+    if ((ctx->base.tb->flags & HW_FLAGS_FP_DP) == 0) { \
+    return false; \
+    } \
+} while (0)


It would be much better to not create so many of these REQUIRE macros.


+TRANS(fadd_s, gen_fff, 0, gen_helper_fadd_s)
+TRANS(fadd_d, gen_fff, 1, gen_helper_fadd_d)


0 vs 1 is very opaque.

Better is something like Jiajie Chen's TRANS_64,


+/* for LoongArch64-only instructions */
+#define TRANS_64(NAME, FUNC, ...) \
+    static bool trans_##NAME(DisasContext *ctx, arg_##NAME * a) \
+    { \
+    return ctx->la64 && FUNC(ctx, a, __VA_ARGS__); \
+    }


But as we now know, we would need at least 7 of these.

Even better would be to generalize this so that every instruction 
records the condition under which it is valid.


Perhaps

typedef struct DisasContext {
     ...
     uint32_t cpucfg1;
     uint32_t cpucfg2;
};

static void loongarch_tr_init_disas_context(...)
{
     ...
     ctx->cpucfg1 = env->cpucfg[1];
     ctx->cpucfg2 = env->cpucfg[2];
}

#define avail_ALL(C)  true
#define avail_64(C)   FIELD_EX32((C)->cpucfg1, CPUCFG1, ARCH) == CPUCFG1_ARCH_LA64

#define avail_FP(C)   FIELD_EX32((C)->cpucfg2, CPUCFG2, FP)
etc


#define TRANS(NAME, AVAIL, FUNC, ...) \
     static bool trans_##NAME(DisasContext *ctx, arg_##NAME *a)  \
     { return avail_##AVAIL(ctx) && FUNC(ctx, a, __VA_ARGS__); }

so that the above becomes

TRANS(fadd_s, FP_SP, gen_fff, gen_helper_fadd_s)
TRANS(fadd_d, FP_DP, gen_fff, gen_helper_fadd_d)

and even simple instructions get

TRANS(add_w, ALL, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_add_tl)
TRANS(add_d,  64, gen_rrr, EXT_NONE, EXT_NONE, EXT_SIGN, tcg_gen_add_tl)


Thanks for your suggestions, I will send v2 as soon as possible.

Thanks.
Song Gao




Re: [PATCH] linux-user/riscv: Use abi_ulong for target_ucontext

2023-08-10 Thread LIU Zhiwei



On 2023/8/10 18:48, Philippe Mathieu-Daudé wrote:

On 8/8/23 11:34, LIU Zhiwei wrote:

We should not use types dependent on the host arch for target_ucontext.
This bug was found when running rv32 applications.

Signed-off-by: LIU Zhiwei 
---
  linux-user/riscv/signal.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/linux-user/riscv/signal.c b/linux-user/riscv/signal.c
index eaa168199a..ff8634a272 100644
--- a/linux-user/riscv/signal.c
+++ b/linux-user/riscv/signal.c
@@ -38,8 +38,8 @@ struct target_sigcontext {
  }; /* cf. riscv-linux:arch/riscv/include/uapi/asm/ptrace.h */
    struct target_ucontext {
-    unsigned long uc_flags;
-    struct target_ucontext *uc_link;
+    abi_ulong uc_flags;


Correct.


+    abi_ulong uc_link;


Isn't it 'abi_ptr uc_link' instead?


Thanks, I think abi_ptr is better. As RISC-V doesn't have an ABI similar to
sparc32plus (64-bit long but 32-bit address space), it is also correct here.
And many arches use abi_ulong for uc_link, such as ARM.


I will send a v2 patch.

Zhiwei




  target_stack_t uc_stack;
  target_sigset_t uc_sigmask;
  uint8_t   __unused[1024 / 8 - sizeof(target_sigset_t)];







Re: CXL volatile memory is not listed

2023-08-10 Thread Maverickk 78
Thanks Phil, David and Fan.
Looks like it was an error on my side due to lack of information;
cxl create-region works :)
cxl create-region works :)


On Thu, 10 Aug 2023 at 16:29, Philippe Mathieu-Daudé  wrote:
>
> Hi,
>
> Cc'ing Igor and David.
>
> On 9/8/23 00:51, Maverickk 78 wrote:
> > Hello,
> >
> > I am running qemu-system-x86_64
> >
> > qemu-system-x86_64 --version
> > QEMU emulator version 8.0.92 (v8.1.0-rc2-80-g0450cf0897)
> >
> > qemu-system-x86_64 \
> > -m 2G,slots=4,maxmem=4G \
> > -smp 4 \
> > -machine type=q35,accel=kvm,cxl=on \
> > -enable-kvm \
> > -nographic \
> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> > -object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
> > -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> >
> >
> > I was expecting the CXL memory to be listed in "System Ram", the lsmem
> > shows only 2G memory which is System RAM, it's not listing the CXL
> > memory.
>
> Sounds like a bug. Do you mind reporting at
> https://gitlab.com/qemu-project/qemu/-/issues?
>
> Thanks,
>
> Phil.
>
> > Do I need to pass any particular parameter in the kernel command line?
> >
> > Is there any documentation available? I followed the inputs provided in
> >
> > https://lore.kernel.org/linux-mm/y+csoehvlkudn...@kroah.com/T/
> >
> > Is there any documentation/blog listed?
> >
>



Re: CXL volatile memory is not listed

2023-08-10 Thread Maverickk 78
Jonathan,

> More generally for the flow that would bring the memory up as system ram
> you would typically need the bios to have done the CXL enumeration or
> a bunch of scripts in the kernel to have done it.  In general it can't
> be fully automated, because there are policy decisions to make on things like
> interleaving.

BIOS CXL enumeration? Is CEDT not enough? Or does the BIOS further need to
create an entry in the e820 table?

>
> I'm not aware of any open source BIOSs that do it yet.  So you have
> to rely on the same kernel paths as for persistent memory - manual 
> configuration
> etc in the kernel.
>
Manual configuration works with "cxl create-region" :)

On Thu, 10 Aug 2023 at 16:05, Jonathan Cameron
 wrote:
>
> On Wed, 9 Aug 2023 04:21:47 +0530
> Maverickk 78  wrote:
>
> > Hello,
> >
> > I am running qemu-system-x86_64
> >
> > qemu-system-x86_64 --version
> > QEMU emulator version 8.0.92 (v8.1.0-rc2-80-g0450cf0897)
> >
> +Cc linux-cxl as the answer is more to do with Linux than QEMU.
>
> > qemu-system-x86_64 \
> > -m 2G,slots=4,maxmem=4G \
> > -smp 4 \
> > -machine type=q35,accel=kvm,cxl=on \
> > -enable-kvm \
> > -nographic \
> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> > -object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
> > -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
>
> There are some problems upstream at the moment (probably not cxl related but
> I'm digging). So today I can't boot an x86 machine. (goody)
>
>
> More generally for the flow that would bring the memory up as system ram
> you would typically need the bios to have done the CXL enumeration or
> a bunch of scripts in the kernel to have done it.  In general it can't
> be fully automated, because there are policy decisions to make on things like
> interleaving.
>
> I'm not aware of any open source BIOSs that do it yet.  So you have
> to rely on the same kernel paths as for persistent memory - manual 
> configuration
> etc in the kernel.
>
> There is support in ndctl for those enabling flows, so I'd look there
> for more information
>
> Jonathan
>
>
> >
> >
> > I was expecting the CXL memory to be listed in "System Ram", the lsmem
> > shows only 2G memory which is System RAM, it's not listing the CXL
> > memory.
> >
> > Do I need to pass any particular parameter in the kernel command line?
> >
> > Is there any documentation available? I followed the inputs provided in
> >
> > https://lore.kernel.org/linux-mm/y+csoehvlkudn...@kroah.com/T/
> >
> > Is there any documentation/blog listed?
>



Re: [PATCH QEMU v2 0/3] provide a smooth upgrade solution for multi-queues disk

2023-08-10 Thread Yong Huang
Hi Stefan, thank you for your interest in this series.
I'm trying to explain my point; if you think my explanation
doesn't stand up, please let me know.

On Fri, Aug 11, 2023 at 2:33 AM Stefan Hajnoczi  wrote:

> On Thu, Aug 10, 2023 at 07:07:09AM +, ~hyman wrote:
> > Ping,
> >
> > This version is a copy of version 1 and is rebased
> > on the master. No functional changes.
> >
> > A 1:1 virtqueue:vCPU mapping implementation for virtio-*-pci disk
> > introduced since qemu >= 5.2.0, which improves IO performance
> > remarkably. To enjoy this feature for exiting running VMs without
> > service interruption, the common solution is to migrate VMs from the
> > lower version of the hypervisor to the upgraded hypervisor, then wait
> > for the next cold reboot of the VM to enable this feature. That's the
> > way "discard" and "write-zeroes" features work.
> >
> > As to multi-queues disk allocation automatically, it's a little
> > different because the destination will allocate queues to match the
> > number of vCPUs automatically by default in the case of live migration,
> > and the VMs on the source side remain 1 queue by default, which results
> > in migration failure due to loading disk VMState incorrectly on the
> > destination side.
>
> Are you using QEMU's versioned machine types to freeze the VM
> configuration?


> If not, then live migration won't work reliably because you're migrating
> between two potentially different VM configurations. This issue is not
> specific to num-queues, it affects all device properties.
>
> In commit 9445e1e15e66c19e42bea942ba810db28052cd05 ("virtio-blk-pci:
> default num_queues to -smp N") the num_queues property is set to 1 for
> versioned machine types <=5.1:
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 9ee2aa0f7b..7f65fa8743 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -31,6 +31,7 @@
>  GlobalProperty hw_compat_5_1[] = {
>  { "vhost-scsi", "num_queues", "1"},
>  { "vhost-user-scsi", "num_queues", "1"},
> +{ "virtio-blk-device", "num-queues", "1"},
>  { "virtio-scsi-device", "num_queues", "1"},
>  };
>  const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);
>
> Live migration works when the source and destination QEMU are launched
> with the same versioned machine type. You can check the "info qtree"
> output to confirm that starting a VM with -smp 4 -M pc-q35-5.1 results
> in num-queues=1 while -smp 4 -M pc-q35-5.2 results in num-queues=4.
>
> > This issue requires Qemu to provide a hint that shows
> > multi-queues disk allocation is automatically supported, and this allows
> > upper APPs, e.g., libvirt, to recognize the hypervisor's capability of
> > this. And upper APPs can ensure to allocate the same num-queues on the
> > destination side in case of migration failure.
> >
> > To fix the issue, we introduce the auto-num-queues property for
> > virtio-*-pci as a solution, which would be probed by APPs, e.g., libvirt
> > by querying the device properties of QEMU. When launching live
> > migration, libvirt will send the auto-num-queues property as a migration
> > cookie to the destination, and thus the destination knows if the source
> > side supports auto-num-queues. If not, the destination would switch off
> > by building the command line with "auto-num-queues=off" when preparing
> > the incoming VM process. The following patches of libvirt show how it
> > roughly works:
> >
> https://github.com/newfriday/libvirt/commit/ce2bae2e1a6821afeb80756dc01f3680f525e506
> >
> https://github.com/newfriday/libvirt/commit/f546972b009458c88148fe079544db7e9e1f43c3
> >
> https://github.com/newfriday/libvirt/commit/5ee19c8646fdb4d87ab8b93f287c20925268ce83
> >
> > The smooth upgrade solution requires the introduction of the auto-num-
> > queues property on the QEMU side, which is what the patch set does. I'm
> > hoping for comments about the series.
>
> Please take a look at versioned machine types. I think auto-num-queues
> is not necessary if you use versioned machine types.
>
> If you do think auto-num-queues is needed, please explain the issue in
> more detail and state why versioned machine types don't help.


"Using the versioned machine types" is indeed the standard way to ensure
the proper functioning of live migration.

However, keeping a stable machine version is strongly advised to maintain
functionality in our production environment, and probably in practically all
production environments in other businesses. As a result, we must backport
features like "auto-allocation num-queues" while keeping the machine type the
same.

This patch set is posted for that reason. The "feature-backport" scenario
is its target. I'm not sure if the upstream development strategy should
take this scenario into account; if it does, perhaps this patch set can be
of use. After all, the primary goal of this patch set is to broaden the uses
for this feature, and it essentially makes no functional changes.




> Thanks,
> Stefan
>
> >
> > Please review, thanks.
> > 

Re: LTP test related to virtio releasing and reassigning resource leads to guest hung

2023-08-10 Thread longguang.yue


1)
Can you post the guest kernel messages (dmesg)? If the guest is hanging
then it may be easiest to configure a serial console so the kernel
messages are sent to the host where you can see them.

Does the hang occur during the LTP code you linked or afterwards when
the PCI device is bound to a virtio driver?




>   I used the console; the hang occurred afterwards. dmesg shows that the tpci test
> finished without error.
LTP test case: 
https://github.com/linux-test-project/ltp/blob/522d7fba4afc84e07b252aa4cd91b241e81d6613/testcases/kernel/device-drivers/pci/tpci_kernel/ltp_tpci.c#L428
kernel 5.10, qemu 6.2



Different guest-configuration tests show different results. The guest did not
crash if hung-task-panic=0; in my case I enabled hung-task-panic in order to
trace.


test case 1:
XML machine pc, virtio disk, virtio net: the guest's I/O hung and the network broke
down; the console is available but I/O operations hang.


#ps -aux| grep D

root   7  0.0  0.0  0 0 ?D14:37   0:00 
[kworker/u16:0+flush-253:0]
root 483  0.0  0.0  0 0 ?  D14:37   0:00 [jbd2/vda3-8]


test case 2:
XML machine q35, virtio/q35, scsi: the disk did not hang but the network broke down.
Ping errors occur, though everything looks OK; no crash and no kernel error.






2)
I didn't see your original email so I missed the panic. I'd still like
to see the earlier kernel messages before the panic in order to
understand how the PCI device is bound.

Is the vda device with hung I/O the same device that was accessed by
the LTP test earlier? I guess the LTP test runs against the device and
then the virtio driver binds to the device again afterwards?



> the test is 
```
// iterate all devices
……
for (i = 0; i < 7; ++i) {  // iterate current device's resources

  if (r->flags & IORESOURCE_MEM &&
  r->flags & IORESOURCE_PREFETCH) {
  pci_release_resource(dev, i);
  ret = pci_assign_resource(dev, i);
  prk_info("assign resource to '%d', ret '%d'", i, ret);
  rc |= (ret < 0 && ret != -EBUSY) ? TFAIL : TPASS;
  }

}
```
The test does not do virtio device unbind and bind.
I only noticed the mem resource changed; see 'test-case 12'.


———
[   88.905705] ltp_tpci: test-case 12
[   88.905706] ltp_tpci: assign resources
[   88.905706] ltp_tpci: assign resource #0
[   88.905707] ltp_tpci: name = :00:07.0, flags = 262401, start 0xc080, end 
0xc0ff
[   88.905707] ltp_tpci: assign resource #1
[   88.905708] ltp_tpci: name = :00:07.0, flags = 262656, start 0xfebd4000, 
end 0xfebd4fff
[   88.905709] ltp_tpci: assign resource #2
[   88.905709] ltp_tpci: name = :00:07.0, flags = 0, start 0x0, end 0x0
[   88.905710] ltp_tpci: assign resource #3
[   88.905710] ltp_tpci: name = :00:07.0, flags = 0, start 0x0, end 0x0
[   88.905711] ltp_tpci: assign resource #4
[   88.905711] ltp_tpci: name = :00:07.0, flags = 1319436, start 
0xfe00c000, end 0xfe00
[   88.905713] virtio-pci :00:07.0: BAR 4: releasing [mem 
0xfe00c000-0xfe00 64bit pref]
[   88.905715] virtio-pci :00:07.0: BAR 4: assigned [mem 
0x24000c000-0x24000 64bit pref]
[   88.906693] ltp_tpci: assign resource to '4', ret '0'
[   88.906694] ltp_tpci: assign resource #5
[   88.906694] ltp_tpci: name = (null), flags = 0, start 0x0, end 0x0
[   88.906695] ltp_tpci: assign resource #6
[   88.906695] ltp_tpci: name = :00:07.0, flags = 0, start 0x0, end 0x0

[   88.906800] ltp_tpci: test-case 13





 Replied Message 
From: Stefan Hajnoczi
Date: 08/10/2023 23:24
To: Stefan Hajnoczi
Cc: longguang.yue, Michael Tokarev, m...@redhat.com, qemu-devel, linux-kernel
Subject: Re: LTP test related to virtio releasing and reassigning resource leads to guest hung
On Thu, 10 Aug 2023 at 10:14, Stefan Hajnoczi  wrote:

On Thu, Aug 10, 2023 at 06:35:32PM +0800, longguang.yue wrote:
Could you please give me some tips to diagnose?  I could do tests on QEMU 8.0,
but the production environment cannot be updated.
I tested on different kernel versions 5.10.0-X; one is better, and the results show
the problem is more about the host kernel rather than QEMU.


The test cases are different combinations of i440fx/q35, virtio/scsi, and kernel versions.

Can you post the guest kernel messages (dmesg)? If the guest is hanging
then it may be easiest to configure a serial console so the kernel
messages are sent to the host where you can see them.

Does the hang occur during the LTP code you linked or afterwards when
the PCI device is bound to a virtio driver?

I didn't see your original email so I missed the panic. I'd still like
to see the earlier kernel messages before the panic in order to
understand how the PCI device is bound.

Is the vda device with hung I/O the same device that was accessed by
the LTP test earlier? I guess the LTP test runs against the device and
then the virtio driver binds to the device again afterwards?


Which virtio device causes the problem?

Can you describe the hang in more detail: is the guest still responsive
(e.g. console or 

Re: CXL volatile memory is not listed

2023-08-10 Thread Maverickk 78
Thanks Fan,

cxl create-region works like a charm :)

Since this gets listed as "System Ram (kmem)", I guess the kernel treats it as
regular memory and allocates it to applications when needed?
Or is extra effort needed to make it available to applications on the host?

On Thu, 10 Aug 2023 at 22:03, Fan Ni  wrote:
>
> On Wed, Aug 09, 2023 at 04:21:47AM +0530, Maverickk 78 wrote:
> > Hello,
> >
> > I am running qemu-system-x86_64
> >
> > qemu-system-x86_64 --version
> > QEMU emulator version 8.0.92 (v8.1.0-rc2-80-g0450cf0897)
> >
> > qemu-system-x86_64 \
> > -m 2G,slots=4,maxmem=4G \
> > -smp 4 \
> > -machine type=q35,accel=kvm,cxl=on \
> > -enable-kvm \
> > -nographic \
> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> > -object memory-backend-file,id=mem0,mem-path=/tmp/mem0,size=1G,share=true \
> > -device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
> >
> >
> > I was expecting the CXL memory to be listed in "System Ram", the lsmem
> > shows only 2G memory which is System RAM, it's not listing the CXL
> > memory.
> >
> > Do I need to pass any particular parameter in the kernel command line?
> >
> > Is there any documentation available? I followed the inputs provided in
> >
> > https://lore.kernel.org/linux-mm/y+csoehvlkudn...@kroah.com/T/
> >
> > Is there any documentation/blog listed?
>
> If I remember it correctly, for volatile cxl memory, we need to create a
> region and then it will be discovered as system memory and shows up.
>
> Try to create a region with "cxl create-region".
>
> Fan
> >



[RFC v2 PATCH] record-replay: support SMP target machine

2023-08-10 Thread Nicholas Piggin
RR CPU switching is driven by timers and events so it is deterministic
like everything else. Record a CPU switch event and use that to drive
the CPU switch on replay.

Signed-off-by: Nicholas Piggin 
---
This is still in the RFC phase because so far I've only really tested ppc
pseries, and only with patches that are not yet upstream (but posted
to the list).

It works with smp 2, can step, reverse-step, reverse-continue, etc.
throughout a Linux boot.

One issue is reverse-step on one gdb thread (vCPU) only steps back one
icount, so if another thread is currently running then it is that one
which goes back one instruction and the selected thread doesn't move. I
would call this a separate issue from the record-replay mechanism, which
is in the replay-debugging policy. I think we could record in each vCPU
an icount of the last instruction it executed before switching, then
reverse step for that vCPU could replay to there. I think that's not so
important yet until this mechanism is solid. But if you test and rsi is
not going backwards, then check your other threads.
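
A rough sketch of that per-vCPU icount idea, purely illustrative (none of
these names exist in the RFC or in QEMU):

    /* Remember where each vCPU stopped when it was switched out, so a
     * reverse-step on that vCPU could target this icount directly.    */
    typedef struct VCPUReplayInfo {
        uint64_t last_exec_icount;   /* icount of the last insn this vCPU ran */
    } VCPUReplayInfo;

    static void note_cpu_switch(VCPUReplayInfo *info, uint64_t current_icount)
    {
        info->last_exec_icount = current_icount;
    }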

Thanks,
Nick


 accel/tcg/tcg-accel-ops-icount.c |  9 +++-
 accel/tcg/tcg-accel-ops-rr.c | 73 +---
 include/exec/replay-core.h   |  3 ++
 replay/replay-internal.h |  1 +
 replay/replay.c  | 34 ++-
 scripts/replay-dump.py   |  5 +++
 softmmu/vl.c |  4 --
 7 files changed, 115 insertions(+), 14 deletions(-)

diff --git a/accel/tcg/tcg-accel-ops-icount.c b/accel/tcg/tcg-accel-ops-icount.c
index 3d2cfbbc97..c26782a56a 100644
--- a/accel/tcg/tcg-accel-ops-icount.c
+++ b/accel/tcg/tcg-accel-ops-icount.c
@@ -93,10 +93,15 @@ void icount_handle_deadline(void)
 int64_t icount_percpu_budget(int cpu_count)
 {
 int64_t limit = icount_get_limit();
-int64_t timeslice = limit / cpu_count;
+int64_t timeslice;
 
-if (timeslice == 0) {
+if (replay_mode == REPLAY_MODE_PLAY) {
 timeslice = limit;
+} else {
+timeslice = limit / cpu_count;
+if (timeslice == 0) {
+timeslice = limit;
+}
 }
 
 return timeslice;
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index 212d6f8df4..ce040a687e 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -27,6 +27,7 @@
 #include "qemu/lockable.h"
 #include "sysemu/tcg.h"
 #include "sysemu/replay.h"
+#include "sysemu/reset.h"
 #include "sysemu/cpu-timers.h"
 #include "qemu/main-loop.h"
 #include "qemu/notify.h"
@@ -61,6 +62,22 @@ void rr_kick_vcpu_thread(CPUState *unused)
 
 static QEMUTimer *rr_kick_vcpu_timer;
 static CPUState *rr_current_cpu;
+static CPUState *rr_next_cpu;
+static CPUState *rr_last_cpu;
+
+/*
+ * Reset the vCPU scheduler to the initial state.
+ */
+static void record_replay_reset(void *param)
+{
+if (rr_kick_vcpu_timer) {
+timer_del(rr_kick_vcpu_timer);
+}
+g_assert(!rr_current_cpu);
+rr_next_cpu = NULL;
+rr_last_cpu = first_cpu;
+current_cpu = NULL;
+}
 
 static inline int64_t rr_next_kick_time(void)
 {
@@ -184,6 +201,8 @@ static void *rr_cpu_thread_fn(void *arg)
 Notifier force_rcu;
 CPUState *cpu = arg;
 
+qemu_register_reset(record_replay_reset, NULL);
+
 assert(tcg_enabled());
 rcu_register_thread();
 force_rcu.notify = rr_force_rcu;
@@ -238,14 +257,20 @@ static void *rr_cpu_thread_fn(void *arg)
 cpu_budget = icount_percpu_budget(cpu_count);
 }
 
+if (!rr_next_cpu) {
+qatomic_set_mb(&rr_next_cpu, first_cpu);
+}
+cpu = rr_next_cpu;
+
+if (cpu != rr_last_cpu) {
+replay_switch_cpu();
+qatomic_set_mb(&rr_last_cpu, cpu);
+}
+
 rr_start_kick_timer();
 
 replay_mutex_unlock();
 
-if (!cpu) {
-cpu = first_cpu;
-}
-
 while (cpu && cpu_work_list_empty(cpu) && !cpu->exit_request) {
 /* Store rr_current_cpu before evaluating cpu_can_run().  */
 qatomic_set_mb(&rr_current_cpu, cpu);
@@ -284,7 +309,34 @@ static void *rr_cpu_thread_fn(void *arg)
 break;
 }
 
-cpu = CPU_NEXT(cpu);
+if (replay_mode == REPLAY_MODE_NONE) {
+cpu = CPU_NEXT(cpu);
+} else if (replay_mode == REPLAY_MODE_RECORD) {
+/*
+ * Exit the loop immediately so CPU switch events can be
+ * recorded. This may be able to be improved to record
+ * switch events here.
+ */
+cpu = CPU_NEXT(cpu);
+break;
+} else if (replay_mode == REPLAY_MODE_PLAY) {
+/*
+ * Play can exit from tcg_cpus_exec at different times than
+ * record, because icount budget is set to next non-insn
+ * event which could be an exception or something that
+ * tcg_cpus_exec can process 

[ANNOUNCE] QEMU 8.1.0-rc3 is now available

2023-08-10 Thread Michael Roth
Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
fourth release candidate for the QEMU 8.1 release. This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu.org/qemu-8.1.0-rc3.tar.xz
  http://download.qemu.org/qemu-8.1.0-rc3.tar.xz.sig

You can help improve the quality of the QEMU 8.1 release by testing this
release and reporting bugs using our GitLab issue tracker:

  https://gitlab.com/qemu-project/qemu/-/milestones/8#tab-issues

The release plan, as well a documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/8.1

Please add entries to the ChangeLog for the 8.1 release below:

  http://wiki.qemu.org/ChangeLog/8.1

Thank you to everyone involved!

Changes since rc2:

3944e93af0: Update version for v8.1.0-rc3 release (Richard Henderson)
f1b0f894c8: gdbstub: don't complain about preemptive ACK chars (Alex Bennée)
3869eb7eee: gdbstub: more fixes for client Ctrl-C handling (Alex Bennée)
dad1036f43: tests/tcg: ensure system-mode gdb tests start stopped (Alex Bennée)
6a2c23ddeb: accel/tcg: Avoid reading too much in load_atom_{2,4} (Richard 
Henderson)
b8002058c4: linux-user: Fix openat() emulation to correctly detect accesses to 
/proc (Helge Deller)
47d1e98231: util/interval-tree: Check root for null in interval_tree_iter_first 
(Helge Deller)
1b65895ddd: tests/tcg: Disable filename test for info proc mappings (Richard 
Henderson)
a05cee93f4: linux-user: Use ARRAY_SIZE with bitmask_transtbl (Richard Henderson)
9ab8d07149: linux-user: Split out do_mmap (Richard Henderson)
3439ba9c5d: hw/nvme: fix null pointer access in ruh update (Klaus Jensen)
6c8f8456cb: hw/nvme: fix null pointer access in directive receive (Klaus Jensen)
c42e77a90d: qemu/osdep: Remove fallback for MAP_FIXED_NOREPLACE (Richard 
Henderson)
dd55885516: linux-user: Rewrite non-fixed probe_guest_base (Richard Henderson)
06f38c6688: linux-user: Rewrite fixed probe_guest_base (Richard Henderson)
0c441aeb39: linux-user: Consolidate guest bounds check in probe_guest_base 
(Richard Henderson)
435c042fdc: linux-user: Remove duplicate CPU_LOG_PAGE from probe_guest_base 
(Richard Henderson)
3ce3dd8ca9: util/selfmap: Rewrite using qemu/interval-tree.h (Richard Henderson)
5f4e5b3409: linux-user: Use zero_bss for PT_LOAD with no file contents too 
(Richard Henderson)
2d385be615: linux-user: Do not adjust zero_bss for host page size (Richard 
Henderson)
e3d97d5c5d: linux-user: Do not adjust image mapping for host page size (Richard 
Henderson)
1f356e8c01: linux-user: Adjust initial brk when interpreter is close to 
executable (Helge Deller)
1ea06ded0d: linux-user: Use elf_et_dyn_base for ET_DYN with interpreter 
(Richard Henderson)
ad25051bae: linux-user: Use MAP_FIXED_NOREPLACE for initial image mmap (Richard 
Henderson)
da2b71fab6: linux-user: Define ELF_ET_DYN_BASE in $guest/target_mman.h (Richard 
Henderson)
2d708164e0: linux-user: Define TASK_UNMAPPED_BASE in $guest/target_mman.h 
(Richard Henderson)
c8fb5cf97d: linux-user: Adjust task_unmapped_base for reserved_va (Richard 
Henderson)
971fac2731: configure: unify case statements for CPU canonicalization (Paolo 
Bonzini)
50a0012227: linux-user: cleanup unused linux-user/include/host directories 
(Paolo Bonzini)
f140823c56: configure: fix detection for x32 linux-user (Paolo Bonzini)
ec5a138ce6: docs: update hw/nvme documentation for protection information 
(Ankit Kumar)
dbdb13f931: hw/nvme: fix CRC64 for guard tag (Ankit Kumar)
58ea90f803: ui/gtk: set scanout mode in gd_egl/gd_gl_area_scanout_texture 
(Dongwon Kim)
fdd649538e: hw/i386/vmmouse:add relative packet flag for button status (Zongmin 
Zhou)
8a64609eea: dump: kdump-zlib data pages not dumped with pvtime/aarch64 (Dongli 
Zhang)
a41e2d97f9: virtio-gpu: reset gfx resources in main thread (Marc-André Lureau)
957d77863e: virtio-gpu: free BHs, by implementing unrealize (Marc-André Lureau)
81cd34a359: chardev: report the handshake error (Marc-André Lureau)
6ee960823d: Fixed incorrect LLONG alignment for openrisc and cris (Luca Bonissi)
beb1a91781: stubs/colo.c: spelling (Michael Tokarev)
8ada214a90: hw/i2c: Fix bitbang_i2c_data trace event (BALATON Zoltan)
6a33f2e920: hw/nvme: fix compliance issue wrt. iosqes/iocqes (Klaus Jensen)
ecb1b7b082: hw/nvme: fix oob memory read in fdp events log (Klaus Jensen)
3c4a8a8fda: bsd-user: Remove last_brk (Richard Henderson)
62cbf08150: linux-user: Remove last_brk (Richard Henderson)
0662a626a7: linux-user: Properly set image_info.brk in flatload (Richard 
Henderson)
2aea137a42: linux-user: Do not align brk with host page size (Akihiko Odaki)
cb9d5d1fda: linux-user: Do nothing if too small brk is specified (Akihiko Odaki)
e69e032d1a: linux-user: Use MAP_FIXED_NOREPLACE for do_brk() (Akihiko Odaki)
c6cc059eca: linux-user: Do not call get_errno() in do_brk() (Akihiko Odaki)
ddcdd8c48f: linux-user: Fix MAP_FIXED_NOREPLACE on old kernels (Akihiko Odaki)
c3dd50da0f: linux-user: Unset 

[PATCH v2] util: Delete checks for old host definitions

2023-08-10 Thread Akihiko Odaki
IA-64 and PA-RISC host support has already been removed with commit
b1cef6d02f ("Drop remaining bits of ia64 host support").
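
As context for the hunks below, an illustrative sketch only (not part of
the change itself): with every remaining host growing its stack downward,
the guard page is always the lowest page of the allocation.

    size_t sz;
    void *ptr = qemu_alloc_stack(&sz);   /* [ptr, ptr+pagesz) is the guard */
    void *top = (char *)ptr + sz;        /* the stack grows down from here */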

Signed-off-by: Akihiko Odaki 
---
 util/async-teardown.c |  3 ---
 util/oslib-posix.c| 14 ++
 2 files changed, 2 insertions(+), 15 deletions(-)

diff --git a/util/async-teardown.c b/util/async-teardown.c
index 62cdeb0f20..396963c091 100644
--- a/util/async-teardown.c
+++ b/util/async-teardown.c
@@ -121,10 +121,7 @@ static void *new_stack_for_clone(void)
 
 /* Allocate a new stack and get a pointer to its top. */
 stack_ptr = qemu_alloc_stack(_size);
-#if !defined(HOST_HPPA)
-/* The top is at the end of the area, except on HPPA. */
 stack_ptr += stack_size;
-#endif
 
 return stack_ptr;
 }
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 760390b31e..6da3cc5014 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -585,7 +585,7 @@ char *qemu_get_pid_name(pid_t pid)
 
 void *qemu_alloc_stack(size_t *sz)
 {
-void *ptr, *guardpage;
+void *ptr;
 int flags;
 #ifdef CONFIG_DEBUG_STACK_USAGE
 void *ptr2;
@@ -618,17 +618,7 @@ void *qemu_alloc_stack(size_t *sz)
 abort();
 }
 
-#if defined(HOST_IA64)
-/* separate register stack */
-guardpage = ptr + (((*sz - pagesz) / 2) & ~pagesz);
-#elif defined(HOST_HPPA)
-/* stack grows up */
-guardpage = ptr + *sz - pagesz;
-#else
-/* stack grows down */
-guardpage = ptr;
-#endif
-if (mprotect(guardpage, pagesz, PROT_NONE) != 0) {
+if (mprotect(ptr, pagesz, PROT_NONE) != 0) {
 perror("failed to set up stack guard page");
 abort();
 }
-- 
2.41.0




[PATCH 0/3] Fix the build on CentOS 7

2023-08-10 Thread Ilya Leoshkevich
Hi,

I know that CentOS 7 is not tested anymore, but unfortunately it's the
only ppc64le system that I have, so I had to fix a few build issues
that crept in since the testing stopped. The fixes are simple and may
be helpful to people in the same situation.

Best regards,
Ilya

Ilya Leoshkevich (3):
  linux-user: Fix the build on systems without SOL_ALG
  linux-user: Fix the build on systems without MAP_SHARED_VALIDATE
  linux-user: Fix the build on systems without MADV_{KEEP,WIPE}ONFORK

 linux-user/mmap.c| 1 +
 linux-user/syscall.c | 3 +++
 2 files changed, 4 insertions(+)

-- 
2.41.0




[PATCH 1/3] linux-user: Fix the build on systems without SOL_ALG

2023-08-10 Thread Ilya Leoshkevich
Building QEMU on CentOS 7 fails because SOL_ALG is not defined there.
There already exists #if defined(SOL_ALG) in do_setsockopt(); add it to
target_to_host_cmsg() as well.

Fixes: 27404b6c15c1 ("linux-user: Implement SOL_ALG encryption support")
Signed-off-by: Ilya Leoshkevich 
---
 linux-user/syscall.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 18d3107194b..42f4aed8e84 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1806,6 +1806,7 @@ static inline abi_long target_to_host_cmsg(struct msghdr *msgh,
  __get_user(cred->pid, &target_cred->pid);
  __get_user(cred->uid, &target_cred->uid);
  __get_user(cred->gid, &target_cred->gid);
+#ifdef SOL_ALG
 } else if (cmsg->cmsg_level == SOL_ALG) {
 uint32_t *dst = (uint32_t *)data;
 
@@ -1814,6 +1815,7 @@ static inline abi_long target_to_host_cmsg(struct msghdr *msgh,
 if (len >= sizeof(uint32_t)) {
 *dst = tswap32(*dst);
 }
+#endif
 } else {
 qemu_log_mask(LOG_UNIMP, "Unsupported ancillary data: %d/%d\n",
   cmsg->cmsg_level, cmsg->cmsg_type);
-- 
2.41.0




[PATCH 3/3] linux-user: Fix the build on systems without MADV_{KEEP, WIPE}ONFORK

2023-08-10 Thread Ilya Leoshkevich
CentOS 7 does not define MADV_KEEPONFORK and MADV_WIPEONFORK. Use the
definitions provided by QEMU's copy of linux/mman.h.

Fixes: 4530deb1 ("linux-user: Add emulation for MADV_WIPEONFORK and MADV_KEEPONFORK in madvise()")
Signed-off-by: Ilya Leoshkevich 
---
 linux-user/mmap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 9aab48d4a30..127962c1c26 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -17,6 +17,7 @@
 *  along with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 #include "qemu/osdep.h"
+#include <linux/mman.h>
 #include "trace.h"
 #include "exec/log.h"
 #include "qemu.h"
-- 
2.41.0




Re: [PATCH 2/3] linux-user: Fix the build on systems without MAP_SHARED_VALIDATE

2023-08-10 Thread Ilya Leoshkevich
On Fri, 2023-08-11 at 00:03 +0200, Helge Deller wrote:
> On 8/10/23 23:51, Ilya Leoshkevich wrote:
> > CentOS 7 does not define MAP_SHARED_VALIDATE. Use a definition
> > provided
> > by the QEMU's copy of linux/mman.h.
> > 
> > Fixes: 4b840f96096d ("linux-user: Populate more bits in
> > mmap_flags_tbl")
> > Signed-off-by: Ilya Leoshkevich 
> 
> Does it fix the missing MADV_WIPEONFORK as well?
> https://gitlab.com/qemu-project/qemu/-/issues/1824#note_1507837354
> 
> Helge

What a coincidence that multiple people ran into this on the same day.

This should be fixed by [3/3] of this series.

Best regards,
Ilya



Re: [PATCH 2/3] linux-user: Fix the build on systems without MAP_SHARED_VALIDATE

2023-08-10 Thread Helge Deller

On 8/10/23 23:51, Ilya Leoshkevich wrote:

CentOS 7 does not define MAP_SHARED_VALIDATE. Use the definition provided
by QEMU's copy of linux/mman.h.

Fixes: 4b840f96096d ("linux-user: Populate more bits in mmap_flags_tbl")
Signed-off-by: Ilya Leoshkevich 


Does it fix the missing MADV_WIPEONFORK as well?
https://gitlab.com/qemu-project/qemu/-/issues/1824#note_1507837354

Helge


---
  linux-user/syscall.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 42f4aed8e84..256f38cdd5d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -121,6 +121,7 @@
  #ifdef HAVE_BTRFS_H
  #include <linux/btrfs.h>
  #endif
+#include <linux/mman.h>
  #ifdef HAVE_DRM_H
  #include <libdrm/drm.h>
  #include <libdrm/i915_drm.h>





Re: [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h

2023-08-10 Thread Helge Deller

On 8/10/23 23:31, Ilya Leoshkevich wrote:

On Mon, 2023-08-07 at 11:17 -0700, Richard Henderson wrote:

On 8/7/23 09:37, Richard Henderson wrote:

We will want to be able to search the set of mappings.
For this patch, the two users iterate the tree in order.

Signed-off-by: Richard Henderson 
---
   include/qemu/selfmap.h |  20 
   linux-user/elfload.c   |  14 +++--
   linux-user/syscall.c   |  15 +++---
   util/selfmap.c | 114 +---
-
   4 files changed, 96 insertions(+), 67 deletions(-)


I should note that, for 8.2, this will enable a rewrite of
open_self_maps_1 so that it
does not require page-by-page checking of page_get_flags.

My idea is that open_self_maps_1 would use walk_memory_regions to see
all guest memory
regions.  The per-region callback would cross-check with the host-
region interval tree to
find the dev+inode+path.

Cc Ilya and Helge, since there are two outstanding changes to
open_self_maps.


I think the rewrite is good.
My patches regarding the map aren't important, I can adjust them
afterwards and resend (if necessary).

Helge



[PATCH 2/3] linux-user: Fix the build on systems without MAP_SHARED_VALIDATE

2023-08-10 Thread Ilya Leoshkevich
CentOS 7 does not define MAP_SHARED_VALIDATE. Use the definition provided
by QEMU's copy of linux/mman.h.

Fixes: 4b840f96096d ("linux-user: Populate more bits in mmap_flags_tbl")
Signed-off-by: Ilya Leoshkevich 
---
 linux-user/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 42f4aed8e84..256f38cdd5d 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -121,6 +121,7 @@
 #ifdef HAVE_BTRFS_H
 #include <linux/btrfs.h>
 #endif
+#include <linux/mman.h>
 #ifdef HAVE_DRM_H
 #include <libdrm/drm.h>
 #include <libdrm/i915_drm.h>
-- 
2.41.0




Re: [PATCH] thunk: Delete checks for old host definitions

2023-08-10 Thread Helge Deller

On 8/10/23 23:29, Akihiko Odaki wrote:

On 2023/08/10 19:56, Philippe Mathieu-Daudé wrote:

Helge and myself sometimes run the tests on an HPPA host


I think we are mixing up HOST and TARGET here.
I run an HPPA target (=guest) on an x86-64 host.
That means both qemu-hppa-user and qemu-hppa-system run
fine for me (on x86-64, "HOST_X86_64", emulating HPPA).


(testing the QEMU tools). I guess remember John Paul
also runs some on Alpha (so Cc'ing him).


Yes, I think so.
If I find more time, I'd like to as well.


Helge, what is your take on this?


This file is only used in userspace emulation so it's not a problem
for Alpha, which no longer has userspace emulation.


Akihiko, your statement is correct, but somewhat misleading.
A native Alpha machine (as host) is no longer able to run linux-user
emulation to emulate Linux for some other architecture.
That's true.


The story is different for HPPA. HPPA has userspace emulation code,
and there are also references to HOST_HPPA in coroutine code
(util/async-teardown.c and util/os-posix.c). Probably HPPA support is
broken both for userspace and system emulation. I think it's time to
think about dropping HPPA support for both userspace and system
emulation.

HPPA as host... Yes, I think it doesn't make sense to try to emulate
something else on HPPA.

So, I think the patch below is OK.

Helge


On 8/8/23 17:23, Akihiko Odaki wrote:

Alpha, IA-64, and PA-RISC hosts are no longer supported.

Signed-off-by: Akihiko Odaki 
---
  include/exec/user/thunk.h | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/exec/user/thunk.h b/include/exec/user/thunk.h
index 300a840d58..d9c131ec80 100644
--- a/include/exec/user/thunk.h
+++ b/include/exec/user/thunk.h
@@ -111,8 +111,7 @@ static inline int thunk_type_size(const argtype *type_ptr, int is_host)
  if (is_host) {
  #if defined(HOST_X86_64)
  return 8;
-#elif defined(HOST_ALPHA) || defined(HOST_IA64) || defined(HOST_MIPS) || \
-  defined(HOST_PARISC) || defined(HOST_SPARC64)
+#elif defined(HOST_MIPS) || defined(HOST_SPARC64)
  return 4;
  #elif defined(HOST_PPC)
  return sizeof(void *);









Re: [PATCH for-8.1 v10 10/14] util/selfmap: Rewrite using qemu/interval-tree.h

2023-08-10 Thread Ilya Leoshkevich
On Mon, 2023-08-07 at 11:17 -0700, Richard Henderson wrote:
> On 8/7/23 09:37, Richard Henderson wrote:
> > We will want to be able to search the set of mappings.
> > For this patch, the two users iterate the tree in order.
> > 
> > Signed-off-by: Richard Henderson 
> > ---
> >   include/qemu/selfmap.h |  20 
> >   linux-user/elfload.c   |  14 +++--
> >   linux-user/syscall.c   |  15 +++---
> >   util/selfmap.c | 114 +---
> > -
> >   4 files changed, 96 insertions(+), 67 deletions(-)
> 
> I should note that, for 8.2, this will enable a rewrite of
> open_self_maps_1 so that it 
> does not require page-by-page checking of page_get_flags.
> 
> My idea is that open_self_maps_1 would use walk_memory_regions to see
> all guest memory 
> regions.  The per-region callback would cross-check with the host-
> region interval tree to 
> find the dev+inode+path.
> 
> Cc Ilya and Helge, since there are two outstanding changes to
> open_self_maps.
> 
> 
> r~

My outstanding change should not be sensitive to this; it should be
possible to put it in either before or after the rewrite.



I really like this idea though, since I looked into ppc64le and there
printing maps is quite broken: it's not just that QEMU can't determine
the names of the mapped files, but also a number of regions are simply
missing. This also affects core dumps generated by GDB attached to
gdbstub.

For example, cat /proc/self/maps has the following internal page
layout:

startend  size prot
1000-1000d000 d000 r-x
1000d000-1001 3000 ---
1001-1001f000 f000 r--
1001f000-1002 1000 r--
1002-10021000 1000 rw-
1000-1001 0001 ---
1001-1081 0080 rw-
1081-1083 0002 r-x
1083-1083d000 d000 r-x
1083d000-1084 3000 ---
1084-1084f000 f000 r--
1084f000-1085 1000 r--
1085-10851000 1000 rw-
10851000-10852000 1000 rw-
1086-10861000 1000 r-x
1088-10a5 001d r-x
10a5-10a6 0001 r--
10a6-10a7 0001 rw-
10a7-10b7 0010 rw-
10b7-1742d000 068bd000 r--
7fffb22b-7fffb22e 0003 rw-

but prints only:

1000-1001 ---p  00:00 0   
1001-1081 rw-p  00:00 0   
[stack]
1081-1083 r-xp  fd:00 3049136 
/usr/lib64/ld-2.17.so
1088-10a5 r-xp  fd:00 3017372 
/usr/lib64/libc-2.17.so
10a5-10a6 r--p 001c fd:00 3017372 
/usr/lib64/libc-2.17.so
10a6-10a7 rw-p 001d fd:00 3017372 
/usr/lib64/libc-2.17.so
10a7-10b7 rw-p  00:00 0   
7fffb22b-7fffb22e rw-p  00:00 0   

I don't see a good way to prevent page_check_range() from rejecting
most of the mappings with the current code structure, but I think that
after the proposed rewrite it should begin to just work.
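
If it helps the discussion, here is a very rough sketch of how such a
per-region callback could look. walk_memory_regions() is the existing
linux-user API; everything else (the maps_dump struct, lookup_host_mapping()
and emit_maps_line()) is made up and not Richard's actual design:

    static int open_self_maps_cb(void *opaque, target_ulong start,
                                 target_ulong end, unsigned long flags)
    {
        struct maps_dump *d = opaque;                   /* hypothetical */
        const char *path = lookup_host_mapping(start);  /* dev/inode/path lookup,
                                                           hypothetical */

        emit_maps_line(d, start, end, flags, path);     /* hypothetical */
        return 0;   /* a non-zero return stops walk_memory_regions() */
    }

    /* ... walk_memory_regions(&dump, open_self_maps_cb); ... */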



Re: [PATCH] thunk: Delete checks for old host definitions

2023-08-10 Thread Akihiko Odaki

On 2023/08/10 19:56, Philippe Mathieu-Daudé wrote:

Helge and myself sometimes run the tests on an HPPA host
(testing the QEMU tools). I guess remember John Paul
also runs some on Alpha (so Cc'ing him).

Helge, what is your take on this?


This file is only used in userspace emulation so it's not a problem for
Alpha, which no longer has userspace emulation.


The story is different for HPPA. HPPA has userspace emulation code, and 
there are also references to HOST_HPPA in coroutine code 
(util/async-teardown.c and util/os-posix.c). Probably HPPA support is 
broken for both userspace and system emulation. I think it's time to 
consider dropping HPPA support for both userspace and system emulation.




On 8/8/23 17:23, Akihiko Odaki wrote:

Alpha, IA-64, and PA-RISC hosts are no longer supported.

Signed-off-by: Akihiko Odaki 
---
  include/exec/user/thunk.h | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/exec/user/thunk.h b/include/exec/user/thunk.h
index 300a840d58..d9c131ec80 100644
--- a/include/exec/user/thunk.h
+++ b/include/exec/user/thunk.h
@@ -111,8 +111,7 @@ static inline int thunk_type_size(const argtype *type_ptr, int is_host)
 
  if (is_host) {
  #if defined(HOST_X86_64)
  return 8;
-#elif defined(HOST_ALPHA) || defined(HOST_IA64) || defined(HOST_MIPS) || \
-  defined(HOST_PARISC) || defined(HOST_SPARC64)
+#elif defined(HOST_MIPS) || defined(HOST_SPARC64)
  return 4;
  #elif defined(HOST_PPC)
  return sizeof(void *);






Re: Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-10 Thread Peter Xu
On Fri, Aug 11, 2023 at 01:06:12AM +0800, ThinerLogoer wrote:
> >I think we have the following options (there might be more)
> >
> >1) This patch.
> >
> >2) New flag for memory-backend-file. We already have "readonly" and 
> >"share=". I'm having a hard time coming up with a good name that really 
> >describes the subtle difference.
> >
> >3) Glue behavior to the QEMU machine
> >
> 
> 4) '-deny-private-discard' argv, or environment variable, or both

I'd personally vote for (2).  How about "fdperm"?  To describe when we want
to use different rw permissions on the file (besides the access permission
of the memory we already provided with "readonly"=XXX).  IIUC the only sane
value will be ro/rw/default, where "default" should just use the same rw
permission as the memory ("readonly"=XXX).

Would that be relatively clean and also work in this use case?

(the other thing I'd wish we don't have that fallback is, as long as we
 have any of that "fallback" we'll need to be compatible with it since
 then, and for ever...)

-- 
Peter Xu
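As a rough illustration of option (2), the proposed property might be used like
this on the command line. Note that "fdperm" is only a name suggested in this
thread, so the option and its values are assumptions rather than an existing
QEMU interface; the other properties shown are real memory-backend-file options:

  qemu-system-x86_64 -m 4G \
    -object memory-backend-file,id=mem0,size=4G,mem-path=/path/to/guest.ram,share=off,readonly=off,fdperm=ro \
    -machine q35,memory-backend=mem0

Here the guest RAM stays writable through a private (copy-on-write) mapping,
while the backing file itself would only ever be opened read-only.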




Re: [PULL 1/1] target/openrisc: Set EPCR to next PC on FPE exceptions

2023-08-10 Thread Stafford Horne
On Thu, Aug 10, 2023 at 09:35:18AM +0300, Michael Tokarev wrote:
> 09.08.2023 23:34, Stafford Horne пишет:
> > The architecture specification calls for the EPCR to be set to "Address
> > of next not executed instruction" when there is a floating point
> > exception (FPE).  This was not being done, so fix it by using the same
> > pattern as syscall.  Also, we move this logic down to be done for
> > instructions not in the delay slot as called for by the architecture
> > manual.
> > 
> > Without this patch FPU exceptions will loop, as the exception handling
> > will always return back to the failed floating point instruction.
> > 
> > This was not noticed in earlier testing because:
> > 
> >   1. The compiler usually generates code which clobbers the input operand
> >  such as:
> > 
> >lf.div.s r19,r17,r19
> > 
> >   2. The target will store the operation output before to the register
> >  before handling the exception.  So an operation such as:
> > 
> >float a = 100.0f;
> >float b = 0.0f;
> >float c = a / b;/* lf.div.s r19,r17,r19 */
> > 
> >  Will first execute:
> > 
> >100 / 0-> Store inf to c (r19)
> >   -> triggering divide by zero exception
> >   -> handle and return
> > 
> >  Then it will execute:
> > 
> >100 / inf  -> Store 0 to c  (no exception)
> > 
> > To confirm the looping behavior and the fix I used the following:
> > 
> >  float fpu_div(float a, float b) {
> > float c;
> > asm volatile("lf.div.s %0, %1, %2"
> >   : "+r" (c)
> >   : "r" (a), "r" (b));
> > return c;
> >  }
> 
> Is it a -stable material?  It applies cleanly to 8.0 and 7.2.
> Or maybe it is not needed on older versions, not being noticed before?

I would say no. It will work on 8.0 and 7.2, but this code path is not very useful
without the other 8.1 Floating Point Exception handling updates.

-Stafford



Re: [PULL 0/4] tcg/gdbstub late fixes

2023-08-10 Thread Richard Henderson

On 8/10/23 11:08, Richard Henderson wrote:

The following changes since commit 64d3be986f9e2379bc688bf1d0aca0557e0035ca:

   Merge tag 'or1k-pull-request-20230809' of https://github.com/stffrdhrn/qemu 
into staging (2023-08-09 15:05:02 -0700)

are available in the Git repository at:

   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230810

for you to fetch changes up to f1b0f894c8c25f7ed24197ff130c7acb6b9fd6e7:

   gdbstub: don't complain about preemptive ACK chars (2023-08-10 11:04:34 
-0700)


accel/tcg: Avoid reading too much in load_atom_{2,4}
tests/tcg: ensure system-mode gdb tests start stopped
gdbstub: more fixes for client Ctrl-C handling


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as 
appropriate.


r~





Re: [PATCH v1 3/8] hw/misc/xlnx-versal-cfu: Introduce a model of Xilinx Versal CFU_FDRO

2023-08-10 Thread Francisco Iglesias

Hi Peter,

On 2023-08-03 15:48, Peter Maydell wrote:

On Mon, 10 Jul 2023 at 15:03, Francisco Iglesias
 wrote:


Introduce a model of Xilinx Versal's Configuration Frame Unit's data out
port (CFU_FDRO).

Signed-off-by: Francisco Iglesias 
---
  hw/misc/xlnx-versal-cfu.c | 105 ++
  include/hw/misc/xlnx-versal-cfu.h |  11 
  2 files changed, 116 insertions(+)

diff --git a/hw/misc/xlnx-versal-cfu.c b/hw/misc/xlnx-versal-cfu.c
index cbd17d2351..528090ef1b 100644
--- a/hw/misc/xlnx-versal-cfu.c
+++ b/hw/misc/xlnx-versal-cfu.c
@@ -257,6 +257,26 @@ static void cfu_stream_write(void *opaque, hwaddr addr, 
uint64_t value,
  }
  }

+static uint64_t cfu_fdro_read(void *opaque, hwaddr addr, unsigned size)
+{
+XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(opaque);
+uint64_t ret = 0;
+
+if (s->fdro_data->len) {
+ret = g_array_index(s->fdro_data, uint32_t, 0);
+g_array_remove_index(s->fdro_data, 0);


This is pretty expensive because everything in the GArray
after element 0 must be copied downwards. Are you sure you
don't want a different data structure ?

What actually is this, and what are the operations we want
to do on it ?


Thank you very much for reviewing! Regarding the above, it is a FIFO, so I 
changed it to use a Fifo32 in v2, and I also tried to update according to 
all the other comments!


Best regards,
Francisco Iglesias




+}
+
+return ret;
+}
+
+static void cfu_fdro_write(void *opaque, hwaddr addr, uint64_t value,
+   unsigned size)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Unsupported write from addr=%"
+  HWADDR_PRIx "\n", __func__, addr);
+}
+
  static const MemoryRegionOps cfu_stream_ops = {
  .read = cfu_stream_read,
  .write = cfu_stream_write,
@@ -267,6 +287,16 @@ static const MemoryRegionOps cfu_stream_ops = {
  },
  };

+static const MemoryRegionOps cfu_fdro_ops = {
+.read = cfu_fdro_read,
+.write = cfu_fdro_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+};
+
  static void cfu_apb_init(Object *obj)
  {
  XlnxVersalCFUAPB *s = XLNX_VERSAL_CFU_APB(obj);
@@ -298,6 +328,24 @@ static void cfu_apb_init(Object *obj)
  sysbus_init_irq(sbd, &s->irq_cfu_imr);
  }

+static void cfu_fdro_init(Object *obj)
+{
+XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(obj);
+SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+
+memory_region_init_io(&s->iomem_fdro, obj, &cfu_fdro_ops, s,
+  TYPE_XLNX_VERSAL_CFU_FDRO, KEYHOLE_STREAM_4K);
+sysbus_init_mmio(sbd, &s->iomem_fdro);
+s->fdro_data = g_array_new(FALSE, FALSE, sizeof(uint32_t));
+}
+
+static void cfu_fdro_cfi_transfer_packet(XlnxCfiIf *cfi_if, XlnxCfiPacket *pkt)
+{
+XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(cfi_if);
+
+g_array_append_vals(s->fdro_data, &pkt->data[0], 4);
+}
+
  static Property cfu_props[] = {
  DEFINE_PROP_LINK("cframe0", XlnxVersalCFUAPB, cfg.cframe[0],
   TYPE_XLNX_CFI_IF, XlnxCfiIf *),
@@ -344,6 +392,41 @@ static const VMStateDescription vmstate_cfu_apb = {
  }
  };

+static int cfdro_reg_pre_save(void *opaque)
+{
+XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(opaque);
+
+if (s->fdro_data->len) {
+s->ro_data = (uint32_t *) s->fdro_data->data;
+s->ro_dlen = s->fdro_data->len;
+}


I think we need to initialise ro_data and ro_dlen in
the else case here as well. Otherwise they might have old
stale stuff in them that then goes into the migration stream.
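As a minimal sketch of that (reusing the v1 field names quoted above; the v2
series drops this code in favour of a Fifo32), the else path could simply clear
the pair:

static int cfdro_reg_pre_save(void *opaque)
{
    XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(opaque);

    if (s->fdro_data->len) {
        s->ro_data = (uint32_t *) s->fdro_data->data;
        s->ro_dlen = s->fdro_data->len;
    } else {
        /* don't let stale pointers/lengths go into the migration stream */
        s->ro_data = NULL;
        s->ro_dlen = 0;
    }

    return 0;
}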


+
+return 0;
+}
+
+static int cfdro_reg_post_load(void *opaque, int version_id)
+{
+XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(opaque);
+
+if (s->ro_dlen) {
+g_array_append_vals(s->fdro_data, s->ro_data, s->ro_dlen);
+}
+return 0;
+}
+
+static const VMStateDescription vmstate_cfu_fdro = {
+.name = TYPE_XLNX_VERSAL_CFU_FDRO,
+.version_id = 1,
+.minimum_version_id = 1,
+.pre_save = cfdro_reg_pre_save,
+.post_load = cfdro_reg_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_VARRAY_UINT32_ALLOC(ro_data, XlnxVersalCFUFDRO, ro_dlen,
+0, vmstate_info_uint32, uint32_t),


This kind of _ALLOC vmstate will cause the migration core
code to g_malloc() you a buffer for the data. We don't
free that anywhere (and if we have a subsequent vmsave
then we will overwrite the ro-data pointer, and leak the
memory).

It might be better to avoid the GArray and just directly
work with a g_malloc()'d buffer of our own, to fit better
with how the _ALLOC vmstate wants to work.
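For illustration only (again using the v1 field names; v2 avoids the problem
entirely by switching to VMSTATE_FIFO32), the post_load hook could at least
release the buffer the migration core allocated once its contents have been
copied into the GArray:

static int cfdro_reg_post_load(void *opaque, int version_id)
{
    XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(opaque);

    if (s->ro_dlen) {
        g_array_append_vals(s->fdro_data, s->ro_data, s->ro_dlen);
    }
    /* the _ALLOC varray was g_malloc()ed by the migration core; free it */
    g_free(s->ro_data);
    s->ro_data = NULL;
    s->ro_dlen = 0;

    return 0;
}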


+VMSTATE_END_OF_LIST(),
+}
+};
+
  static void cfu_apb_class_init(ObjectClass *klass, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(klass);
@@ -353,6 +436,15 @@ static void cfu_apb_class_init(ObjectClass *klass, void 
*data)
  device_class_set_props(dc, cfu_props);
  }

+static void cfu_fdro_class_init(ObjectClass *klass, 

[PATCH v2 5/8] hw/misc: Introduce a model of Xilinx Versal's CFRAME_REG

2023-08-10 Thread Francisco Iglesias
Introduce a model of Xilinx Versal's Configuration Frame controller
(CFRAME_REG).

Signed-off-by: Francisco Iglesias 
---
 MAINTAINERS  |   2 +
 hw/misc/meson.build  |   1 +
 hw/misc/xlnx-versal-cframe-reg.c | 753 +++
 include/hw/misc/xlnx-versal-cframe-reg.h | 289 +
 4 files changed, 1045 insertions(+)
 create mode 100644 hw/misc/xlnx-versal-cframe-reg.c
 create mode 100644 include/hw/misc/xlnx-versal-cframe-reg.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 847b997d73..645374c1d9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1041,6 +1041,8 @@ F: hw/misc/xlnx-cfi-if.c
 F: include/hw/misc/xlnx-cfi-if.h
 F: hw/misc/xlnx-versal-cfu.c
 F: include/hw/misc/xlnx-versal-cfu.h
+F: hw/misc/xlnx-versal-cframe-reg.c
+F: include/hw/misc/xlnx-versal-cframe-reg.h
 
 STM32F100
 M: Alexandre Iooss 
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index d95cc3fd87..1b425b03bd 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -99,6 +99,7 @@ system_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files(
   'xlnx-versal-pmc-iou-slcr.c',
   'xlnx-versal-cfu.c',
   'xlnx-cfi-if.c',
+  'xlnx-versal-cframe-reg.c',
 ))
 system_ss.add(when: 'CONFIG_STM32F2XX_SYSCFG', if_true: 
files('stm32f2xx_syscfg.c'))
 system_ss.add(when: 'CONFIG_STM32F4XX_SYSCFG', if_true: 
files('stm32f4xx_syscfg.c'))
diff --git a/hw/misc/xlnx-versal-cframe-reg.c b/hw/misc/xlnx-versal-cframe-reg.c
new file mode 100644
index 00..401bd7a4a9
--- /dev/null
+++ b/hw/misc/xlnx-versal-cframe-reg.c
@@ -0,0 +1,753 @@
+/*
+ * QEMU model of the Configuration Frame Control module
+ *
+ * Copyright (C) 2023, Advanced Micro Devices, Inc.
+ *
+ * Written by Francisco Iglesias 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/register.h"
+#include "hw/registerfields.h"
+#include "qemu/bitops.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "migration/vmstate.h"
+#include "hw/irq.h"
+#include "hw/misc/xlnx-versal-cframe-reg.h"
+
+#ifndef XLNX_VERSAL_CFRAME_REG_ERR_DEBUG
+#define XLNX_VERSAL_CFRAME_REG_ERR_DEBUG 0
+#endif
+
+#define KEYHOLE_STREAM_4K (4 * KiB)
+#define N_WORDS_128BIT 4
+#define MIG_CFRAME_SZ ((FRAME_NUM_WORDS + 1) * sizeof(uint32_t))
+
+#define MAX_BLOCKTYPE 6
+#define MAX_BLOCKTYPE_FRAMES 0xF
+
+enum {
+CFRAME_CMD_WCFG = 1,
+CFRAME_CMD_ROWON = 2,
+CFRAME_CMD_ROWOFF = 3,
+CFRAME_CMD_RCFG = 4,
+CFRAME_CMD_DLPARK = 5,
+};
+
+static void cfrm_imr_update_irq(XlnxVersalCFrameReg *s)
+{
+bool pending = s->regs[R_CFRM_ISR0] & ~s->regs[R_CFRM_IMR0];
+qemu_set_irq(s->irq_cfrm_imr, pending);
+}
+
+static void cfrm_isr_postw(RegisterInfo *reg, uint64_t val64)
+{
+XlnxVersalCFrameReg *s = XLNX_VERSAL_CFRAME_REG(reg->opaque);
+cfrm_imr_update_irq(s);
+}
+
+static uint64_t cfrm_ier_prew(RegisterInfo *reg, uint64_t val64)
+{
+XlnxVersalCFrameReg *s = XLNX_VERSAL_CFRAME_REG(reg->opaque);
+
+s->regs[R_CFRM_IMR0] &= ~s->regs[R_CFRM_IER0];
+s->regs[R_CFRM_IER0] = 0;
+cfrm_imr_update_irq(s);
+return 0;
+}
+
+static uint64_t cfrm_idr_prew(RegisterInfo *reg, uint64_t val64)
+{
+XlnxVersalCFrameReg *s = XLNX_VERSAL_CFRAME_REG(reg->opaque);
+
+s->regs[R_CFRM_IMR0] |= s->regs[R_CFRM_IDR0];
+s->regs[R_CFRM_IDR0] = 0;
+cfrm_imr_update_irq(s);
+return 0;
+}
+
+static uint64_t cfrm_itr_prew(RegisterInfo *reg, uint64_t val64)
+{
+XlnxVersalCFrameReg *s = XLNX_VERSAL_CFRAME_REG(reg->opaque);
+
+s->regs[R_CFRM_ISR0] |= s->regs[R_CFRM_ITR0];
+s->regs[R_CFRM_ITR0] = 0;
+cfrm_imr_update_irq(s);
+return 0;
+}
+
+static void cframe_incr_far(XlnxVersalCFrameReg *s)
+{
+uint32_t faddr = ARRAY_FIELD_EX32(s->regs, FAR0, FRAME_ADDR);
+uint32_t blktype = ARRAY_FIELD_EX32(s->regs, FAR0, BLOCKTYPE);
+
+assert(blktype <= MAX_BLOCKTYPE);
+
+faddr++;
+if (faddr > s->cfg.blktype_num_frames[blktype]) {
+/* Restart from 0 and increment block type */
+faddr = 0;
+blktype++;
+
+assert(blktype <= MAX_BLOCKTYPE);
+
+ARRAY_FIELD_DP32(s->regs, FAR0, BLOCKTYPE, blktype);
+}
+
+ARRAY_FIELD_DP32(s->regs, FAR0, FRAME_ADDR, faddr);
+}
+
+static XlnxCFrame *cframes_get_frame(XlnxVersalCFrameReg *s, uint32_t addr)
+{
+for (int i = 0; i < s->cframes->len; i++) {
+XlnxCFrame *f = &g_array_index(s->cframes, XlnxCFrame, i);
+
+if (f->addr == addr) {
+return f;
+}
+}
+return NULL;
+}
+
+static void cframe_alloc(XlnxCFrame *f)
+{
+f->addr = 0;
+fifo32_create(&f->data, FRAME_NUM_WORDS);
+}
+
+static void cframe_move(XlnxCFrame *dst, XlnxCFrame *src)
+{
+fifo32_destroy(&dst->data);
+dst[0] = src[0];
+}
+
+static void cfrm_fdri_post_write(RegisterInfo *reg, uint64_t val)
+{
+XlnxVersalCFrameReg *s = XLNX_VERSAL_CFRAME_REG(reg->opaque);
+
+if (s->row_configured 

[PATCH v2 3/8] hw/misc/xlnx-versal-cfu: Introduce a model of Xilinx Versal CFU_FDRO

2023-08-10 Thread Francisco Iglesias
Introduce a model of Xilinx Versal's Configuration Frame Unit's data out
port (CFU_FDRO).

Signed-off-by: Francisco Iglesias 
---
 hw/misc/xlnx-versal-cfu.c | 96 +++
 include/hw/misc/xlnx-versal-cfu.h | 12 
 2 files changed, 108 insertions(+)

diff --git a/hw/misc/xlnx-versal-cfu.c b/hw/misc/xlnx-versal-cfu.c
index b2dc6ab211..255c1bf4b8 100644
--- a/hw/misc/xlnx-versal-cfu.c
+++ b/hw/misc/xlnx-versal-cfu.c
@@ -264,6 +264,25 @@ static void cfu_stream_write(void *opaque, hwaddr addr, 
uint64_t value,
 }
 }
 
+static uint64_t cfu_fdro_read(void *opaque, hwaddr addr, unsigned size)
+{
+XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(opaque);
+uint64_t ret = 0;
+
+if (!fifo32_is_empty(&s->fdro_data)) {
+ret = fifo32_pop(&s->fdro_data);
+}
+
+return ret;
+}
+
+static void cfu_fdro_write(void *opaque, hwaddr addr, uint64_t value,
+   unsigned size)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Unsupported write from addr=%"
+  HWADDR_PRIx "\n", __func__, addr);
+}
+
 static const MemoryRegionOps cfu_stream_ops = {
 .read = cfu_stream_read,
 .write = cfu_stream_write,
@@ -274,6 +293,16 @@ static const MemoryRegionOps cfu_stream_ops = {
 },
 };
 
+static const MemoryRegionOps cfu_fdro_ops = {
+.read = cfu_fdro_read,
+.write = cfu_fdro_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+};
+
 static void cfu_apb_init(Object *obj)
 {
 XlnxVersalCFUAPB *s = XLNX_VERSAL_CFU_APB(obj);
@@ -305,6 +334,39 @@ static void cfu_apb_init(Object *obj)
 sysbus_init_irq(sbd, &s->irq_cfu_imr);
 }
 
+static void cfu_fdro_init(Object *obj)
+{
+XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(obj);
+SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+
+memory_region_init_io(&s->iomem_fdro, obj, &cfu_fdro_ops, s,
+  TYPE_XLNX_VERSAL_CFU_FDRO, KEYHOLE_STREAM_4K);
+sysbus_init_mmio(sbd, &s->iomem_fdro);
+fifo32_create(&s->fdro_data, 8 * KiB / sizeof(uint32_t));
+}
+
+static void cfu_fdro_reset_enter(Object *obj, ResetType type)
+{
+XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(obj);
+
+fifo32_reset(&s->fdro_data);
+}
+
+static void cfu_fdro_cfi_transfer_packet(XlnxCfiIf *cfi_if, XlnxCfiPacket *pkt)
+{
+XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(cfi_if);
+
+if (fifo32_num_free(&s->fdro_data) >= ARRAY_SIZE(pkt->data)) {
+for (int i = 0; i < ARRAY_SIZE(pkt->data); i++) {
+fifo32_push(&s->fdro_data, pkt->data[i]);
+}
+} else {
+/* It is a programming error to fill the fifo. */
+qemu_log_mask(LOG_GUEST_ERROR,
+  "CFU_FDRO: CFI data dropped due to full read fifo\n");
+}
+}
+
 static Property cfu_props[] = {
 DEFINE_PROP_LINK("cframe0", XlnxVersalCFUAPB, cfg.cframe[0],
  TYPE_XLNX_CFI_IF, XlnxCfiIf *),
@@ -351,6 +413,16 @@ static const VMStateDescription vmstate_cfu_apb = {
 }
 };
 
+static const VMStateDescription vmstate_cfu_fdro = {
+.name = TYPE_XLNX_VERSAL_CFU_FDRO,
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_FIFO32(fdro_data, XlnxVersalCFUFDRO),
+VMSTATE_END_OF_LIST(),
+}
+};
+
 static void cfu_apb_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -360,6 +432,17 @@ static void cfu_apb_class_init(ObjectClass *klass, void 
*data)
 device_class_set_props(dc, cfu_props);
 }
 
+static void cfu_fdro_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+ResettableClass *rc = RESETTABLE_CLASS(klass);
+XlnxCfiIfClass *xcic = XLNX_CFI_IF_CLASS(klass);
+
+dc->vmsd = &vmstate_cfu_fdro;
+xcic->cfi_transfer_packet = cfu_fdro_cfi_transfer_packet;
+rc->phases.enter = cfu_fdro_reset_enter;
+}
+
 static const TypeInfo cfu_apb_info = {
 .name  = TYPE_XLNX_VERSAL_CFU_APB,
 .parent= TYPE_SYS_BUS_DEVICE,
@@ -372,9 +455,22 @@ static const TypeInfo cfu_apb_info = {
 }
 };
 
+static const TypeInfo cfu_fdro_info = {
+.name  = TYPE_XLNX_VERSAL_CFU_FDRO,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(XlnxVersalCFUFDRO),
+.class_init= cfu_fdro_class_init,
+.instance_init = cfu_fdro_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_XLNX_CFI_IF },
+{ }
+}
+};
+
 static void cfu_apb_register_types(void)
 {
 type_register_static(&cfu_apb_info);
+type_register_static(&cfu_fdro_info);
 }
 
 type_init(cfu_apb_register_types)
diff --git a/include/hw/misc/xlnx-versal-cfu.h 
b/include/hw/misc/xlnx-versal-cfu.h
index 62d10caf27..73e9a21af4 100644
--- a/include/hw/misc/xlnx-versal-cfu.h
+++ b/include/hw/misc/xlnx-versal-cfu.h
@@ -20,10 +20,14 @@
 #include "hw/sysbus.h"
 #include "hw/register.h"
 #include "hw/misc/xlnx-cfi-if.h"
+#include "qemu/fifo32.h"
 
 #define 

[PATCH v2 8/8] hw/arm/versal: Connect the CFRAME_REG and CFRAME_BCAST_REG

2023-08-10 Thread Francisco Iglesias
Connect the Configuration Frame controller (CFRAME_REG) and the
Configuration Frame broadcast controller (CFRAME_BCAST_REG) to the
Versal machine.

Signed-off-by: Francisco Iglesias 
---
 hw/arm/xlnx-versal.c | 113 ++-
 include/hw/arm/xlnx-versal.h |  69 +
 2 files changed, 181 insertions(+), 1 deletion(-)

diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index 3f4b4b1560..fa556d8764 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -27,7 +27,7 @@
 #define XLNX_VERSAL_RCPU_TYPE ARM_CPU_TYPE_NAME("cortex-r5f")
 #define GEM_REVISION0x40070106
 
-#define VERSAL_NUM_PMC_APB_IRQS 3
+#define VERSAL_NUM_PMC_APB_IRQS 18
 #define NUM_OSPI_IRQ_LINES 3
 
 static void versal_create_apu_cpus(Versal *s)
@@ -341,6 +341,7 @@ static void versal_create_pmc_apb_irq_orgate(Versal *s, 
qemu_irq *pic)
  *  - RTC
  *  - BBRAM
  *  - PMC SLCR
+ *  - CFRAME regs (input 3 - 17 to the orgate)
  */
 object_initialize_child(OBJECT(s), "pmc-apb-irq-orgate",
 >pmc.apb_irq_orgate, TYPE_OR_IRQ);
@@ -573,6 +574,42 @@ static void versal_create_ospi(Versal *s, qemu_irq *pic)
 static void versal_create_cfu(Versal *s, qemu_irq *pic)
 {
 SysBusDevice *sbd;
+DeviceState *dev;
+int i;
+const struct {
+uint64_t reg_base;
+uint64_t fdri_base;
+} cframe_addr[] = {
+{ MM_PMC_CFRAME0_REG, MM_PMC_CFRAME0_FDRI },
+{ MM_PMC_CFRAME1_REG, MM_PMC_CFRAME1_FDRI },
+{ MM_PMC_CFRAME2_REG, MM_PMC_CFRAME2_FDRI },
+{ MM_PMC_CFRAME3_REG, MM_PMC_CFRAME3_FDRI },
+{ MM_PMC_CFRAME4_REG, MM_PMC_CFRAME4_FDRI },
+{ MM_PMC_CFRAME5_REG, MM_PMC_CFRAME5_FDRI },
+{ MM_PMC_CFRAME6_REG, MM_PMC_CFRAME6_FDRI },
+{ MM_PMC_CFRAME7_REG, MM_PMC_CFRAME7_FDRI },
+{ MM_PMC_CFRAME8_REG, MM_PMC_CFRAME8_FDRI },
+{ MM_PMC_CFRAME9_REG, MM_PMC_CFRAME9_FDRI },
+{ MM_PMC_CFRAME10_REG, MM_PMC_CFRAME10_FDRI },
+{ MM_PMC_CFRAME11_REG, MM_PMC_CFRAME11_FDRI },
+{ MM_PMC_CFRAME12_REG, MM_PMC_CFRAME12_FDRI },
+{ MM_PMC_CFRAME13_REG, MM_PMC_CFRAME13_FDRI },
+{ MM_PMC_CFRAME14_REG, MM_PMC_CFRAME14_FDRI },
+};
+const struct {
+uint32_t blktype0_frames;
+uint32_t blktype1_frames;
+uint32_t blktype2_frames;
+uint32_t blktype3_frames;
+uint32_t blktype4_frames;
+uint32_t blktype5_frames;
+uint32_t blktype6_frames;
+} cframe_cfg[] = {
+[0] = { 34111, 3528, 12800, 11, 5, 1, 1 },
+[1] = { 38498, 3841, 15361, 13, 7, 3, 1 },
+[2] = { 38498, 3841, 15361, 13, 7, 3, 1 },
+[3] = { 38498, 3841, 15361, 13, 7, 3, 1 },
+};
 
 /* CFU FDRO */
 object_initialize_child(OBJECT(s), "cfu-fdro", >pmc.cfu_fdro,
@@ -583,10 +620,84 @@ static void versal_create_cfu(Versal *s, qemu_irq *pic)
 memory_region_add_subregion(&s->mr_ps, MM_PMC_CFU_FDRO,
 sysbus_mmio_get_region(sbd, 0));
 
+/* CFRAME REG */
+for (i = 0; i < ARRAY_SIZE(s->pmc.cframe); i++) {
+g_autofree char *name = g_strdup_printf("cframe%d", i);
+
+object_initialize_child(OBJECT(s), name, &s->pmc.cframe[i],
+TYPE_XLNX_VERSAL_CFRAME_REG);
+
+sbd = SYS_BUS_DEVICE(&s->pmc.cframe[i]);
+dev = DEVICE(&s->pmc.cframe[i]);
+
+if (i < ARRAY_SIZE(cframe_cfg)) {
+object_property_set_int(OBJECT(dev), "blktype0-frames",
+cframe_cfg[i].blktype0_frames,
+&error_abort);
+object_property_set_int(OBJECT(dev), "blktype1-frames",
+cframe_cfg[i].blktype1_frames,
+&error_abort);
+object_property_set_int(OBJECT(dev), "blktype2-frames",
+cframe_cfg[i].blktype2_frames,
+&error_abort);
+object_property_set_int(OBJECT(dev), "blktype3-frames",
+cframe_cfg[i].blktype3_frames,
+&error_abort);
+object_property_set_int(OBJECT(dev), "blktype4-frames",
+cframe_cfg[i].blktype4_frames,
+&error_abort);
+object_property_set_int(OBJECT(dev), "blktype5-frames",
+cframe_cfg[i].blktype5_frames,
+&error_abort);
+object_property_set_int(OBJECT(dev), "blktype6-frames",
+cframe_cfg[i].blktype6_frames,
+&error_abort);
+}
+object_property_set_link(OBJECT(dev), "cfu-fdro",
+ OBJECT(>pmc.cfu_fdro), _fatal);
+
+sysbus_realize(SYS_BUS_DEVICE(dev), _fatal);
+
+memory_region_add_subregion(>mr_ps, cframe_addr[i].reg_base,
+  

[PATCH v2 7/8] hw/arm/xlnx-versal: Connect the CFU_APB, CFU_FDRO and CFU_SFR

2023-08-10 Thread Francisco Iglesias
Connect the Configuration Frame Unit (CFU_APB, CFU_FDRO and CFU_SFR) to
the Versal machine.

Signed-off-by: Francisco Iglesias 
Acked-by: Edgar E. Iglesias 
Reviewed-by: Peter Maydell 
---
 hw/arm/xlnx-versal.c | 42 
 include/hw/arm/xlnx-versal.h | 16 ++
 2 files changed, 58 insertions(+)

diff --git a/hw/arm/xlnx-versal.c b/hw/arm/xlnx-versal.c
index 60bf5fe657..3f4b4b1560 100644
--- a/hw/arm/xlnx-versal.c
+++ b/hw/arm/xlnx-versal.c
@@ -570,6 +570,47 @@ static void versal_create_ospi(Versal *s, qemu_irq *pic)
 qdev_connect_gpio_out(orgate, 0, pic[VERSAL_OSPI_IRQ]);
 }
 
+static void versal_create_cfu(Versal *s, qemu_irq *pic)
+{
+SysBusDevice *sbd;
+
+/* CFU FDRO */
+object_initialize_child(OBJECT(s), "cfu-fdro", >pmc.cfu_fdro,
+TYPE_XLNX_VERSAL_CFU_FDRO);
+sbd = SYS_BUS_DEVICE(>pmc.cfu_fdro);
+
+sysbus_realize(sbd, _fatal);
+memory_region_add_subregion(>mr_ps, MM_PMC_CFU_FDRO,
+sysbus_mmio_get_region(sbd, 0));
+
+/* CFU APB */
+object_initialize_child(OBJECT(s), "cfu-apb", >pmc.cfu_apb,
+TYPE_XLNX_VERSAL_CFU_APB);
+sbd = SYS_BUS_DEVICE(>pmc.cfu_apb);
+
+sysbus_realize(sbd, _fatal);
+memory_region_add_subregion(>mr_ps, MM_PMC_CFU_APB,
+sysbus_mmio_get_region(sbd, 0));
+memory_region_add_subregion(>mr_ps, MM_PMC_CFU_STREAM,
+sysbus_mmio_get_region(sbd, 1));
+memory_region_add_subregion(>mr_ps, MM_PMC_CFU_STREAM_2,
+sysbus_mmio_get_region(sbd, 2));
+sysbus_connect_irq(sbd, 0, pic[VERSAL_CFU_IRQ_0]);
+
+/* CFU SFR */
+object_initialize_child(OBJECT(s), "cfu-sfr", >pmc.cfu_sfr,
+TYPE_XLNX_VERSAL_CFU_SFR);
+
+sbd = SYS_BUS_DEVICE(>pmc.cfu_sfr);
+
+object_property_set_link(OBJECT(>pmc.cfu_sfr),
+"cfu", OBJECT(>pmc.cfu_apb), _abort);
+
+sysbus_realize(sbd, _fatal);
+memory_region_add_subregion(>mr_ps, MM_PMC_CFU_SFR,
+sysbus_mmio_get_region(sbd, 0));
+}
+
 static void versal_create_crl(Versal *s, qemu_irq *pic)
 {
 SysBusDevice *sbd;
@@ -763,6 +804,7 @@ static void versal_realize(DeviceState *dev, Error **errp)
 versal_create_pmc_iou_slcr(s, pic);
 versal_create_ospi(s, pic);
 versal_create_crl(s, pic);
+versal_create_cfu(s, pic);
 versal_map_ddr(s);
 versal_unimp(s);
 
diff --git a/include/hw/arm/xlnx-versal.h b/include/hw/arm/xlnx-versal.h
index 39ee31185c..29b9c60301 100644
--- a/include/hw/arm/xlnx-versal.h
+++ b/include/hw/arm/xlnx-versal.h
@@ -32,6 +32,7 @@
 #include "hw/misc/xlnx-versal-crl.h"
 #include "hw/misc/xlnx-versal-pmc-iou-slcr.h"
 #include "hw/net/xlnx-versal-canfd.h"
+#include "hw/misc/xlnx-versal-cfu.h"
 
 #define TYPE_XLNX_VERSAL "xlnx-versal"
 OBJECT_DECLARE_SIMPLE_TYPE(Versal, XLNX_VERSAL)
@@ -117,6 +118,9 @@ struct Versal {
 XlnxEFuse efuse;
 XlnxVersalEFuseCtrl efuse_ctrl;
 XlnxVersalEFuseCache efuse_cache;
+XlnxVersalCFUAPB cfu_apb;
+XlnxVersalCFUFDRO cfu_fdro;
+XlnxVersalCFUSFR cfu_sfr;
 
 OrIRQState apb_irq_orgate;
 } pmc;
@@ -147,6 +151,7 @@ struct Versal {
 #define VERSAL_GEM1_WAKE_IRQ_0 59
 #define VERSAL_ADMA_IRQ_0  60
 #define VERSAL_XRAM_IRQ_0  79
+#define VERSAL_CFU_IRQ_0   120
 #define VERSAL_PMC_APB_IRQ 121
 #define VERSAL_OSPI_IRQ124
 #define VERSAL_SD0_IRQ_0   126
@@ -240,6 +245,17 @@ struct Versal {
 #define MM_PMC_EFUSE_CACHE  0xf125
 #define MM_PMC_EFUSE_CACHE_SIZE 0x00C00
 
+#define MM_PMC_CFU_APB  0xf12b
+#define MM_PMC_CFU_APB_SIZE 0x1
+#define MM_PMC_CFU_STREAM   0xf12c
+#define MM_PMC_CFU_STREAM_SIZE  0x1000
+#define MM_PMC_CFU_SFR  0xf12c1000
+#define MM_PMC_CFU_SFR_SIZE 0x1000
+#define MM_PMC_CFU_FDRO 0xf12c2000
+#define MM_PMC_CFU_FDRO_SIZE0x1000
+#define MM_PMC_CFU_STREAM_2 0xf1f8
+#define MM_PMC_CFU_STREAM_2_SIZE0x4
+
 #define MM_PMC_CRP  0xf126U
 #define MM_PMC_CRP_SIZE 0x1
 #define MM_PMC_RTC  0xf12a
-- 
2.34.1




[PATCH v2 6/8] hw/misc: Introduce a model of Xilinx Versal's CFRAME_BCAST_REG

2023-08-10 Thread Francisco Iglesias
Introduce a model of Xilinx Versal's Configuration Frame broadcast
controller (CFRAME_BCAST_REG).

Signed-off-by: Francisco Iglesias 
---
 hw/misc/xlnx-versal-cframe-reg.c | 161 +++
 include/hw/misc/xlnx-versal-cframe-reg.h |  17 +++
 2 files changed, 178 insertions(+)

diff --git a/hw/misc/xlnx-versal-cframe-reg.c b/hw/misc/xlnx-versal-cframe-reg.c
index 401bd7a4a9..686425756b 100644
--- a/hw/misc/xlnx-versal-cframe-reg.c
+++ b/hw/misc/xlnx-versal-cframe-reg.c
@@ -582,6 +582,83 @@ static const MemoryRegionOps cframe_reg_fdri_ops = {
 },
 };
 
+static uint64_t cframes_bcast_reg_read(void *opaque, hwaddr addr, unsigned 
size)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Unsupported read from addr=%"
+  HWADDR_PRIx "\n", __func__, addr);
+return 0;
+}
+
+static void cframes_bcast_write(XlnxVersalCFrameBcastReg *s, uint8_t reg_addr,
+uint32_t *wfifo)
+{
+XlnxCfiPacket pkt = {
+.reg_addr = reg_addr,
+.data[0] = wfifo[0],
+.data[1] = wfifo[1],
+.data[2] = wfifo[2],
+.data[3] = wfifo[3]
+};
+
+for (int i = 0; i < ARRAY_SIZE(s->cfg.cframe); i++) {
+if (s->cfg.cframe[i]) {
+xlnx_cfi_transfer_packet(s->cfg.cframe[i], &pkt);
+}
+}
+}
+
+static void cframes_bcast_reg_write(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size)
+{
+XlnxVersalCFrameBcastReg *s = XLNX_VERSAL_CFRAME_BCAST_REG(opaque);
+uint32_t wfifo[WFIFO_SZ];
+
+if (update_wfifo(addr, value, s->wfifo, wfifo)) {
+uint8_t reg_addr = extract32(addr, 4, 6);
+
+cframes_bcast_write(s, reg_addr, wfifo);
+}
+}
+
+static uint64_t cframes_bcast_fdri_read(void *opaque, hwaddr addr,
+unsigned size)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Unsupported read from addr=%"
+  HWADDR_PRIx "\n", __func__, addr);
+return 0;
+}
+
+static void cframes_bcast_fdri_write(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size)
+{
+XlnxVersalCFrameBcastReg *s = XLNX_VERSAL_CFRAME_BCAST_REG(opaque);
+uint32_t wfifo[WFIFO_SZ];
+
+if (update_wfifo(addr, value, s->wfifo, wfifo)) {
+cframes_bcast_write(s, CFRAME_FDRI, wfifo);
+}
+}
+
+static const MemoryRegionOps cframes_bcast_reg_reg_ops = {
+.read = cframes_bcast_reg_read,
+.write = cframes_bcast_reg_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+};
+
+static const MemoryRegionOps cframes_bcast_reg_fdri_ops = {
+.read = cframes_bcast_fdri_read,
+.write = cframes_bcast_fdri_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+};
+
 static void cframe_reg_realize(DeviceState *dev, Error **errp)
 {
 XlnxVersalCFrameReg *s = XLNX_VERSAL_CFRAME_REG(dev);
@@ -719,6 +796,71 @@ static Property cframe_regs_props[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static void cframe_bcast_reg_init(Object *obj)
+{
+XlnxVersalCFrameBcastReg *s = XLNX_VERSAL_CFRAME_BCAST_REG(obj);
+SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+
+memory_region_init_io(&s->iomem_reg, obj, &cframes_bcast_reg_reg_ops, s,
+  TYPE_XLNX_VERSAL_CFRAME_BCAST_REG, KEYHOLE_STREAM_4K);
+memory_region_init_io(&s->iomem_fdri, obj, &cframes_bcast_reg_fdri_ops, s,
+  TYPE_XLNX_VERSAL_CFRAME_BCAST_REG "-fdri",
+  KEYHOLE_STREAM_4K);
+sysbus_init_mmio(sbd, &s->iomem_reg);
+sysbus_init_mmio(sbd, &s->iomem_fdri);
+}
+
+static void cframe_bcast_reg_reset_enter(Object *obj, ResetType type)
+{
+XlnxVersalCFrameBcastReg *s = XLNX_VERSAL_CFRAME_BCAST_REG(obj);
+
+memset(s->wfifo, 0, WFIFO_SZ * sizeof(uint32_t));
+}
+
+static const VMStateDescription vmstate_cframe_bcast_reg = {
+.name = TYPE_XLNX_VERSAL_CFRAME_BCAST_REG,
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32_ARRAY(wfifo, XlnxVersalCFrameBcastReg, 4),
+VMSTATE_END_OF_LIST(),
+}
+};
+
+static Property cframe_bcast_regs_props[] = {
+DEFINE_PROP_LINK("cframe0", XlnxVersalCFrameBcastReg, cfg.cframe[0],
+ TYPE_XLNX_CFI_IF, XlnxCfiIf *),
+DEFINE_PROP_LINK("cframe1", XlnxVersalCFrameBcastReg, cfg.cframe[1],
+ TYPE_XLNX_CFI_IF, XlnxCfiIf *),
+DEFINE_PROP_LINK("cframe2", XlnxVersalCFrameBcastReg, cfg.cframe[2],
+ TYPE_XLNX_CFI_IF, XlnxCfiIf *),
+DEFINE_PROP_LINK("cframe3", XlnxVersalCFrameBcastReg, cfg.cframe[3],
+ TYPE_XLNX_CFI_IF, XlnxCfiIf *),
+DEFINE_PROP_LINK("cframe4", XlnxVersalCFrameBcastReg, cfg.cframe[4],
+ TYPE_XLNX_CFI_IF, XlnxCfiIf *),
+DEFINE_PROP_LINK("cframe5", XlnxVersalCFrameBcastReg, cfg.cframe[5],
+ TYPE_XLNX_CFI_IF, 

[PATCH v2 2/8] hw/misc: Introduce a model of Xilinx Versal's CFU_APB

2023-08-10 Thread Francisco Iglesias
Introduce a model of the software programming interface (CFU_APB) of
Xilinx Versal's Configuration Frame Unit.

Signed-off-by: Francisco Iglesias 
---
 MAINTAINERS   |   2 +
 hw/misc/meson.build   |   1 +
 hw/misc/xlnx-versal-cfu.c | 380 ++
 include/hw/misc/xlnx-versal-cfu.h | 231 ++
 4 files changed, 614 insertions(+)
 create mode 100644 hw/misc/xlnx-versal-cfu.c
 create mode 100644 include/hw/misc/xlnx-versal-cfu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index e0cd365462..847b997d73 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1039,6 +1039,8 @@ M: Francisco Iglesias 
 S: Maintained
 F: hw/misc/xlnx-cfi-if.c
 F: include/hw/misc/xlnx-cfi-if.h
+F: hw/misc/xlnx-versal-cfu.c
+F: include/hw/misc/xlnx-versal-cfu.h
 
 STM32F100
 M: Alexandre Iooss 
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 0c562f5e3e..d95cc3fd87 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -97,6 +97,7 @@ specific_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: 
files('xlnx-versal-crl.c'))
 system_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files(
   'xlnx-versal-xramc.c',
   'xlnx-versal-pmc-iou-slcr.c',
+  'xlnx-versal-cfu.c',
   'xlnx-cfi-if.c',
 ))
 system_ss.add(when: 'CONFIG_STM32F2XX_SYSCFG', if_true: 
files('stm32f2xx_syscfg.c'))
diff --git a/hw/misc/xlnx-versal-cfu.c b/hw/misc/xlnx-versal-cfu.c
new file mode 100644
index 00..b2dc6ab211
--- /dev/null
+++ b/hw/misc/xlnx-versal-cfu.c
@@ -0,0 +1,380 @@
+/*
+ * QEMU model of the CFU Configuration Unit.
+ *
+ * Copyright (C) 2023, Advanced Micro Devices, Inc.
+ *
+ * Written by Edgar E. Iglesias ,
+ *Sai Pavan Boddu ,
+ *Francisco Iglesias 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/sysbus.h"
+#include "hw/register.h"
+#include "hw/irq.h"
+#include "qemu/bitops.h"
+#include "qemu/log.h"
+#include "qemu/units.h"
+#include "migration/vmstate.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "hw/misc/xlnx-versal-cfu.h"
+
+#ifndef XLNX_VERSAL_CFU_APB_ERR_DEBUG
+#define XLNX_VERSAL_CFU_APB_ERR_DEBUG 0
+#endif
+
+#define KEYHOLE_STREAM_4K (4 * KiB)
+#define KEYHOLE_STREAM_256K (256 * KiB)
+#define CFRAME_BROADCAST_ROW 0x1F
+
+bool update_wfifo(hwaddr addr, uint64_t value,
+  uint32_t *wfifo, uint32_t *wfifo_ret)
+{
+unsigned int idx = extract32(addr, 2, 2);
+
+wfifo[idx] = value;
+
+if (idx == 3) {
+memcpy(wfifo_ret, wfifo, WFIFO_SZ * sizeof(uint32_t));
+memset(wfifo, 0, WFIFO_SZ * sizeof(uint32_t));
+return true;
+}
+
+return false;
+}
+
+static void cfu_imr_update_irq(XlnxVersalCFUAPB *s)
+{
+bool pending = s->regs[R_CFU_ISR] & ~s->regs[R_CFU_IMR];
+qemu_set_irq(s->irq_cfu_imr, pending);
+}
+
+static void cfu_isr_postw(RegisterInfo *reg, uint64_t val64)
+{
+XlnxVersalCFUAPB *s = XLNX_VERSAL_CFU_APB(reg->opaque);
+cfu_imr_update_irq(s);
+}
+
+static uint64_t cfu_ier_prew(RegisterInfo *reg, uint64_t val64)
+{
+XlnxVersalCFUAPB *s = XLNX_VERSAL_CFU_APB(reg->opaque);
+uint32_t val = val64;
+
+s->regs[R_CFU_IMR] &= ~val;
+cfu_imr_update_irq(s);
+return 0;
+}
+
+static uint64_t cfu_idr_prew(RegisterInfo *reg, uint64_t val64)
+{
+XlnxVersalCFUAPB *s = XLNX_VERSAL_CFU_APB(reg->opaque);
+uint32_t val = val64;
+
+s->regs[R_CFU_IMR] |= val;
+cfu_imr_update_irq(s);
+return 0;
+}
+
+static uint64_t cfu_itr_prew(RegisterInfo *reg, uint64_t val64)
+{
+XlnxVersalCFUAPB *s = XLNX_VERSAL_CFU_APB(reg->opaque);
+uint32_t val = val64;
+
+s->regs[R_CFU_ISR] |= val;
+cfu_imr_update_irq(s);
+return 0;
+}
+
+static void cfu_fgcr_postw(RegisterInfo *reg, uint64_t val64)
+{
+XlnxVersalCFUAPB *s = XLNX_VERSAL_CFU_APB(reg->opaque);
+uint32_t val = (uint32_t)val64;
+
+/* Do a scan. It always looks good. */
+if (FIELD_EX32(val, CFU_FGCR, SC_HBC_TRIGGER)) {
+ARRAY_FIELD_DP32(s->regs, CFU_STATUS, SCAN_CLEAR_PASS, 1);
+ARRAY_FIELD_DP32(s->regs, CFU_STATUS, SCAN_CLEAR_DONE, 1);
+}
+}
+
+static const RegisterAccessInfo cfu_apb_regs_info[] = {
+{   .name = "CFU_ISR",  .addr = A_CFU_ISR,
+.rsvd = 0xfc00,
+.w1c = 0x3ff,
+.post_write = cfu_isr_postw,
+},{ .name = "CFU_IMR",  .addr = A_CFU_IMR,
+.reset = 0x3ff,
+.rsvd = 0xfc00,
+.ro = 0x3ff,
+},{ .name = "CFU_IER",  .addr = A_CFU_IER,
+.rsvd = 0xfc00,
+.pre_write = cfu_ier_prew,
+},{ .name = "CFU_IDR",  .addr = A_CFU_IDR,
+.rsvd = 0xfc00,
+.pre_write = cfu_idr_prew,
+},{ .name = "CFU_ITR",  .addr = A_CFU_ITR,
+.rsvd = 0xfc00,
+.pre_write = cfu_itr_prew,
+},{ .name = "CFU_PROTECT",  .addr = A_CFU_PROTECT,
+.reset = 0x1,
+},{ .name = "CFU_FGCR",  .addr = A_CFU_FGCR,
+.rsvd = 0x8000,
+.post_write = cfu_fgcr_postw,
+   

[PATCH v2 4/8] hw/misc/xlnx-versal-cfu: Introduce a model of Xilinx Versal's CFU_SFR

2023-08-10 Thread Francisco Iglesias
Introduce a model of Xilinx Versal's Configuration Frame Unit's Single
Frame Read port (CFU_SFR).

Signed-off-by: Francisco Iglesias 
---
 hw/misc/xlnx-versal-cfu.c | 87 +++
 include/hw/misc/xlnx-versal-cfu.h | 15 ++
 2 files changed, 102 insertions(+)

diff --git a/hw/misc/xlnx-versal-cfu.c b/hw/misc/xlnx-versal-cfu.c
index 255c1bf4b8..8e588ac1d8 100644
--- a/hw/misc/xlnx-versal-cfu.c
+++ b/hw/misc/xlnx-versal-cfu.c
@@ -264,6 +264,31 @@ static void cfu_stream_write(void *opaque, hwaddr addr, 
uint64_t value,
 }
 }
 
+static uint64_t cfu_sfr_read(void *opaque, hwaddr addr, unsigned size)
+{
+qemu_log_mask(LOG_GUEST_ERROR, "%s: Unsupported read from addr=%"
+  HWADDR_PRIx "\n", __func__, addr);
+return 0;
+}
+
+static void cfu_sfr_write(void *opaque, hwaddr addr, uint64_t value,
+  unsigned size)
+{
+XlnxVersalCFUSFR *s = XLNX_VERSAL_CFU_SFR(opaque);
+uint32_t wfifo[WFIFO_SZ];
+
+if (update_wfifo(addr, value, s->wfifo, wfifo)) {
+uint8_t row_addr = extract32(wfifo[0], 23, 5);
+uint32_t frame_addr = extract32(wfifo[0], 0, 23);
+XlnxCfiPacket pkt = { .reg_addr = CFRAME_SFR,
+  .data[0] = frame_addr };
+
+if (s->cfg.cfu) {
+cfu_transfer_cfi_packet(s->cfg.cfu, row_addr, &pkt);
+}
+}
+}
+
 static uint64_t cfu_fdro_read(void *opaque, hwaddr addr, unsigned size)
 {
 XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(opaque);
@@ -293,6 +318,16 @@ static const MemoryRegionOps cfu_stream_ops = {
 },
 };
 
+static const MemoryRegionOps cfu_sfr_ops = {
+.read = cfu_sfr_read,
+.write = cfu_sfr_write,
+.endianness = DEVICE_LITTLE_ENDIAN,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+},
+};
+
 static const MemoryRegionOps cfu_fdro_ops = {
 .read = cfu_fdro_read,
 .write = cfu_fdro_write,
@@ -334,6 +369,23 @@ static void cfu_apb_init(Object *obj)
 sysbus_init_irq(sbd, >irq_cfu_imr);
 }
 
+static void cfu_sfr_init(Object *obj)
+{
+XlnxVersalCFUSFR *s = XLNX_VERSAL_CFU_SFR(obj);
+SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+
+memory_region_init_io(&s->iomem_sfr, obj, &cfu_sfr_ops, s,
+  TYPE_XLNX_VERSAL_CFU_SFR, KEYHOLE_STREAM_4K);
+sysbus_init_mmio(sbd, &s->iomem_sfr);
+}
+
+static void cfu_sfr_reset_enter(Object *obj, ResetType type)
+{
+XlnxVersalCFUSFR *s = XLNX_VERSAL_CFU_SFR(obj);
+
+memset(s->wfifo, 0, WFIFO_SZ * sizeof(uint32_t));
+}
+
 static void cfu_fdro_init(Object *obj)
 {
 XlnxVersalCFUFDRO *s = XLNX_VERSAL_CFU_FDRO(obj);
@@ -401,6 +453,12 @@ static Property cfu_props[] = {
 DEFINE_PROP_END_OF_LIST(),
 };
 
+static Property cfu_sfr_props[] = {
+DEFINE_PROP_LINK("cfu", XlnxVersalCFUSFR, cfg.cfu,
+ TYPE_XLNX_VERSAL_CFU_APB, XlnxVersalCFUAPB *),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static const VMStateDescription vmstate_cfu_apb = {
 .name = TYPE_XLNX_VERSAL_CFU_APB,
 .version_id = 1,
@@ -423,6 +481,16 @@ static const VMStateDescription vmstate_cfu_fdro = {
 }
 };
 
+static const VMStateDescription vmstate_cfu_sfr = {
+.name = TYPE_XLNX_VERSAL_CFU_SFR,
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32_ARRAY(wfifo, XlnxVersalCFUSFR, 4),
+VMSTATE_END_OF_LIST(),
+}
+};
+
 static void cfu_apb_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -443,6 +511,16 @@ static void cfu_fdro_class_init(ObjectClass *klass, void 
*data)
 rc->phases.enter = cfu_fdro_reset_enter;
 }
 
+static void cfu_sfr_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+ResettableClass *rc = RESETTABLE_CLASS(klass);
+
+device_class_set_props(dc, cfu_sfr_props);
+dc->vmsd = &vmstate_cfu_sfr;
+rc->phases.enter = cfu_sfr_reset_enter;
+}
+
 static const TypeInfo cfu_apb_info = {
 .name  = TYPE_XLNX_VERSAL_CFU_APB,
 .parent= TYPE_SYS_BUS_DEVICE,
@@ -467,10 +545,19 @@ static const TypeInfo cfu_fdro_info = {
 }
 };
 
+static const TypeInfo cfu_sfr_info = {
+.name  = TYPE_XLNX_VERSAL_CFU_SFR,
+.parent= TYPE_SYS_BUS_DEVICE,
+.instance_size = sizeof(XlnxVersalCFUSFR),
+.class_init= cfu_sfr_class_init,
+.instance_init = cfu_sfr_init,
+};
+
 static void cfu_apb_register_types(void)
 {
 type_register_static(&cfu_apb_info);
 type_register_static(&cfu_fdro_info);
+type_register_static(&cfu_sfr_info);
 }
 
 type_init(cfu_apb_register_types)
diff --git a/include/hw/misc/xlnx-versal-cfu.h 
b/include/hw/misc/xlnx-versal-cfu.h
index 73e9a21af4..86fb841053 100644
--- a/include/hw/misc/xlnx-versal-cfu.h
+++ b/include/hw/misc/xlnx-versal-cfu.h
@@ -28,6 +28,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(XlnxVersalCFUAPB, 
XLNX_VERSAL_CFU_APB)
 #define TYPE_XLNX_VERSAL_CFU_FDRO "xlnx,versal-cfu-fdro"
 

[PATCH v2 1/8] hw/misc: Introduce the Xilinx CFI interface

2023-08-10 Thread Francisco Iglesias
Introduce the Xilinx Configuration Frame Interface (CFI) for transmitting
CFI data packets between the Xilinx Configuration Frame Unit models
(CFU_APB, CFU_FDRO and CFU_SFR), the Xilinx CFRAME controller (CFRAME_REG)
and the Xilinx CFRAME broadcast controller (CFRAME_BCAST_REG) models (when
emulating bitstream programming and readback).

Signed-off-by: Francisco Iglesias 
Reviewed-by: Sai Pavan Boddu 
Acked-by: Edgar E. Iglesias 
---
 MAINTAINERS   |  6 
 hw/misc/meson.build   |  1 +
 hw/misc/xlnx-cfi-if.c | 34 
 include/hw/misc/xlnx-cfi-if.h | 59 +++
 4 files changed, 100 insertions(+)
 create mode 100644 hw/misc/xlnx-cfi-if.c
 create mode 100644 include/hw/misc/xlnx-cfi-if.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6111b6b4d9..e0cd365462 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1034,6 +1034,12 @@ S: Maintained
 F: hw/ssi/xlnx-versal-ospi.c
 F: include/hw/ssi/xlnx-versal-ospi.h
 
+Xilinx Versal CFI
+M: Francisco Iglesias 
+S: Maintained
+F: hw/misc/xlnx-cfi-if.c
+F: include/hw/misc/xlnx-cfi-if.h
+
 STM32F100
 M: Alexandre Iooss 
 L: qemu-...@nongnu.org
diff --git a/hw/misc/meson.build b/hw/misc/meson.build
index 892f8b91c5..0c562f5e3e 100644
--- a/hw/misc/meson.build
+++ b/hw/misc/meson.build
@@ -97,6 +97,7 @@ specific_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: 
files('xlnx-versal-crl.c'))
 system_ss.add(when: 'CONFIG_XLNX_VERSAL', if_true: files(
   'xlnx-versal-xramc.c',
   'xlnx-versal-pmc-iou-slcr.c',
+  'xlnx-cfi-if.c',
 ))
 system_ss.add(when: 'CONFIG_STM32F2XX_SYSCFG', if_true: 
files('stm32f2xx_syscfg.c'))
 system_ss.add(when: 'CONFIG_STM32F4XX_SYSCFG', if_true: 
files('stm32f4xx_syscfg.c'))
diff --git a/hw/misc/xlnx-cfi-if.c b/hw/misc/xlnx-cfi-if.c
new file mode 100644
index 00..c45f05c4aa
--- /dev/null
+++ b/hw/misc/xlnx-cfi-if.c
@@ -0,0 +1,34 @@
+/*
+ * Xilinx CFI interface
+ *
+ * Copyright (C) 2023, Advanced Micro Devices, Inc.
+ *
+ * Written by Francisco Iglesias 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#include "qemu/osdep.h"
+#include "hw/misc/xlnx-cfi-if.h"
+
+void xlnx_cfi_transfer_packet(XlnxCfiIf *cfi_if, XlnxCfiPacket *pkt)
+{
+XlnxCfiIfClass *xcic = XLNX_CFI_IF_GET_CLASS(cfi_if);
+
+if (xcic->cfi_transfer_packet) {
+xcic->cfi_transfer_packet(cfi_if, pkt);
+}
+}
+
+static const TypeInfo xlnx_cfi_if_info = {
+.name  = TYPE_XLNX_CFI_IF,
+.parent= TYPE_INTERFACE,
+.class_size = sizeof(XlnxCfiIfClass),
+};
+
+static void xlnx_cfi_if_register_types(void)
+{
+type_register_static(&xlnx_cfi_if_info);
+}
+
+type_init(xlnx_cfi_if_register_types)
+
diff --git a/include/hw/misc/xlnx-cfi-if.h b/include/hw/misc/xlnx-cfi-if.h
new file mode 100644
index 00..f9bd12292d
--- /dev/null
+++ b/include/hw/misc/xlnx-cfi-if.h
@@ -0,0 +1,59 @@
+/*
+ * Xilinx CFI interface
+ *
+ * Copyright (C) 2023, Advanced Micro Devices, Inc.
+ *
+ * Written by Francisco Iglesias 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef XLNX_CFI_IF_H
+#define XLNX_CFI_IF_H 1
+
+#include "qemu/help-texts.h"
+#include "hw/hw.h"
+#include "qom/object.h"
+
+#define TYPE_XLNX_CFI_IF "xlnx-cfi-if"
+typedef struct XlnxCfiIfClass XlnxCfiIfClass;
+DECLARE_CLASS_CHECKERS(XlnxCfiIfClass, XLNX_CFI_IF, TYPE_XLNX_CFI_IF)
+
+#define XLNX_CFI_IF(obj) \
+ INTERFACE_CHECK(XlnxCfiIf, (obj), TYPE_XLNX_CFI_IF)
+
+typedef enum {
+PACKET_TYPE_CFU = 0x52,
+PACKET_TYPE_CFRAME = 0xA1,
+} xlnx_cfi_packet_type;
+
+typedef enum {
+CFRAME_FAR = 1,
+CFRAME_SFR = 2,
+CFRAME_FDRI = 4,
+CFRAME_CMD = 6,
+} xlnx_cfi_reg_addr;
+
+typedef struct XlnxCfiPacket {
+uint8_t reg_addr;
+uint32_t data[4];
+} XlnxCfiPacket;
+
+typedef struct XlnxCfiIf {
+Object Parent;
+} XlnxCfiIf;
+
+typedef struct XlnxCfiIfClass {
+InterfaceClass parent;
+
+void (*cfi_transfer_packet)(XlnxCfiIf *cfi_if, XlnxCfiPacket *pkt);
+} XlnxCfiIfClass;
+
+/**
+ * Transfer a XlnxCfiPacket.
+ *
+ * @cfi_if: the object implementing this interface
+ * @XlnxCfiPacket: a pointer to the XlnxCfiPacket to transfer
+ */
+void xlnx_cfi_transfer_packet(XlnxCfiIf *cfi_if, XlnxCfiPacket *pkt);
+
+#endif /* XLNX_CFI_IF_H */
-- 
2.34.1




[PATCH v2 0/8] Xilinx Versal CFI support

2023-08-10 Thread Francisco Iglesias
Hi,

This series adds support for the Configuration Frame Unit (CFU) and the
Configuration Frame controllers (CFRAME) to the Xilinx Versal machine
([1], chapter 21) for emulating bitstream loading and readback.

The series starts by introducing the Xilinx CFI interface that is
thereafter used by the Xilinx CFU components, the Xilinx CFRAME and Xilinx
CFRAME broadcast models for transferring CFI packets between each other.
Models of the CFU_APB, CFU_FDRO and CFU_SFR are then introduced, as well
as models of the CFRAME controller and the CFRAME broadcast controller.

The series thereafter ends with connecting the models to Xilinx Versal
machine.

Best regards,
Francisco Iglesias

References:
[1] https://docs.xilinx.com/r/en-US/am011-versal-acap-trm/PSM-Local-Registers

Changelog:
v1->v2:
 [PATCH 2]
   * Use KiB when defining KEYHOLE_STREAM_4K/KEYHOLE_STREAM_256K
   * Updated to be able to share wfifo code 

 [PATCH 3]
   * Swap to use Fifo32 instead of GArray in the CFU_FDRO model
   * Add device reset to the CFU_FDRO model

 [PATCH 4]
   * Add device reset to the CFU_SFR model

 [PATCH 5]
   * Use KiB when defining KEYHOLE_STREAM_4K
   * Add comma after CFRAME_CMD_DLPARK
   * Remove backwards compatibility comment (and the 'cfu' alias propname for
 cfg.cfu_fdro)
   * Use Fifo32 inside the XlnxCFrame structure
   * Reworked cframes_reg_pre_save / cframes_reg_post_load

 [PATCH 6]
   * Add device reset to the CFrame broadcast reg model

 [PATCH 8]
   * Switch to use g_autofree instead of explicit g_free


Francisco Iglesias (8):
  hw/misc: Introduce the Xilinx CFI interface
  hw/misc: Introduce a model of Xilinx Versal's CFU_APB
  hw/misc/xlnx-versal-cfu: Introduce a model of Xilinx Versal CFU_FDRO
  hw/misc/xlnx-versal-cfu: Introduce a model of Xilinx Versal's CFU_SFR
  hw/misc: Introduce a model of Xilinx Versal's CFRAME_REG
  hw/misc: Introduce a model of Xilinx Versal's CFRAME_BCAST_REG
  hw/arm/xlnx-versal: Connect the CFU_APB, CFU_FDRO and CFU_SFR
  hw/arm/versal: Connect the CFRAME_REG and CFRAME_BCAST_REG

 MAINTAINERS  |  10 +
 hw/arm/xlnx-versal.c | 155 +++-
 hw/misc/meson.build  |   3 +
 hw/misc/xlnx-cfi-if.c|  34 +
 hw/misc/xlnx-versal-cframe-reg.c | 914 +++
 hw/misc/xlnx-versal-cfu.c| 563 ++
 include/hw/arm/xlnx-versal.h |  85 +++
 include/hw/misc/xlnx-cfi-if.h|  59 ++
 include/hw/misc/xlnx-versal-cframe-reg.h | 306 
 include/hw/misc/xlnx-versal-cfu.h| 258 +++
 10 files changed, 2386 insertions(+), 1 deletion(-)
 create mode 100644 hw/misc/xlnx-cfi-if.c
 create mode 100644 hw/misc/xlnx-versal-cframe-reg.c
 create mode 100644 hw/misc/xlnx-versal-cfu.c
 create mode 100644 include/hw/misc/xlnx-cfi-if.h
 create mode 100644 include/hw/misc/xlnx-versal-cframe-reg.h
 create mode 100644 include/hw/misc/xlnx-versal-cfu.h

-- 
2.34.1




Re: [PATCH 4/5] target/arm: Support more GM blocksizes

2023-08-10 Thread Richard Henderson

On 8/10/23 07:23, Peter Maydell wrote:

+case 4:
+/* 64 bytes -> 4 tags -> 16 result bits */
+ret = cpu_to_le16(*(uint16_t *)tag_mem);


Does this really make a difference compared to ldw_le_p() ?


ldw_le_p uses memcpy, though only mips and sparc hosts do not have unaligned reads, so 
perhaps it doesn't make much difference.


I had originally been thinking about atomicity, but then noticed that the pseudocode uses 
a loop and so the instruction is therefore non-atomic.




Is it worth having an assert in CPU realize for an invalid
blocksize, so that we can catch duff ID register values
without having to rely on there being a test run that
uses ldgm/stgm ?


Yes, that's a good idea.


r~
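For reference, such a realize-time check might look roughly like the sketch
below. The gm_blocksize field name and the exact bounds are assumptions here
and would have to match whatever block sizes the ldgm/stgm helpers actually
implement:

    /* e.g. in arm_cpu_realizefn(), sketch only */
    if (cpu_isar_feature(aa64_mte, cpu)) {
        /*
         * The ldgm/stgm helpers only handle a limited set of GMID_EL1.BS
         * values; catch a duff ID register value at realize time instead
         * of failing at run time.
         */
        assert(cpu->gm_blocksize >= 3 && cpu->gm_blocksize <= 6);
    }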



Re: [PATCH 2/5] target/arm: Reduce dcz_blocksize to uint8_t

2023-08-10 Thread Richard Henderson

On 8/10/23 07:09, Peter Maydell wrote:

On Thu, 10 Aug 2023 at 03:37, Richard Henderson
 wrote:


This value is only 4 bits wide.


True. Any particular reason to change the type, though?


To save space.


r~




Signed-off-by: Richard Henderson 
---
  target/arm/cpu.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)



Reviewed-by: Peter Maydell 

thanks
-- PMM





Re: Re: [PATCH] target/riscv: Clearing the CSR values at reset and syncing the MPSTATE with the host

2023-08-10 Thread Alistair Francis
On Mon, Jul 24, 2023 at 2:06 AM liguang.zhang <18622748...@163.com> wrote:
>
> > On Tue, Jul 18, 2023 at 10:22 PM liguang.zhang <18622748...@163.com> wrote:
> > >
> > > From: "liguang.zhang" 
> > >
> > > Fix the guest reboot error when using KVM
> > > There are two issues when rebooting a guest using KVM
> > > 1. When the guest initiates a reboot the host is unable to stop the vcpu
> > > 2. When running a SMP guest the qemu monitor system_reset causes a vcpu 
> > > crash
> > >
> > > This can be fixed by clearing the CSR values at reset and syncing the
> > > MPSTATE with the host.
> > >
> > > Signed-off-by: liguang.zhang 
> >
> > Thanks!
> >
> > When sending new versions of patches please increment the patch
> > version:
> > https://www.qemu.org/docs/master/devel/submitting-a-patch.html#when-resending-patches-add-a-version-tag
> >
>
> Sorry about that; I was confused about git send-email, so the original mail
> thread was lost. ->
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg977038.html
> I would like to resubmit and track the email history.

No worries :)

I have noticed that when you send the patch emails I get multiple
emails sent a few minutes apart. I count at least 8 copies of the same
email. Do you mind trying to fix whatever is causing that?

>
> > The patch looks good, but don't we need an equivalent for the get register 
> > call?
> >
> > Alistair
>
> Sorry, "get register call" refers to which section? It was not mentioned in 
> the previous suggestions for modifications.

You are adding code to kvm_arch_put_registers(); don't you also need to
add code to kvm_arch_get_registers(), and a *_mpstate_to_qemu()
function to match?

> Following the original modification suggestions, I hope to upstream this as soon as
> possible, as it has been delayed for quite some time.

Upstreaming code is an iterative process. Just because something wasn't
brought up in the first version doesn't mean it won't be raised later.

I'm sorry it has taken so long. If you don't get a reply within a week
please ping your patches or responses. Ensuring you follow patch
submission best practices can help improve the upstreaming speed, such
as incrementing patch versions and responding with plain text inline.

Hopefully just one more revision is required :)

Alistair

>
> Thanks ~
>



Re: [PATCH v3] target/riscv: Clearing the CSR values at reset and syncing the MPSTATE with the host

2023-08-10 Thread Alistair Francis
On Mon, Jul 24, 2023 at 2:26 AM liguang.zhang <18622748...@163.com> wrote:
>
> From: "liguang.zhang" 
>
> Fix the guest reboot error when using KVM
> There are two issues when rebooting a guest using KVM
> 1. When the guest initiates a reboot the host is unable to stop the vcpu
> 2. When running a SMP guest the qemu monitor system_reset causes a vcpu crash
>
> This can be fixed by clearing the CSR values at reset and syncing the
> MPSTATE with the host.
>
> Signed-off-by: liguang.zhang 
> ---
>  target/riscv/kvm.c   | 42 
>  target/riscv/kvm_riscv.h |  1 +
>  2 files changed, 43 insertions(+)
>
> diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> index 9d8a8982f9..ecc8ab8238 100644
> --- a/target/riscv/kvm.c
> +++ b/target/riscv/kvm.c
> @@ -44,6 +44,8 @@
>  #include "migration/migration.h"
>  #include "sysemu/runstate.h"
>
> +static bool cap_has_mp_state;
> +
>  static uint64_t kvm_riscv_reg_id(CPURISCVState *env, uint64_t type,
>   uint64_t idx)
>  {
> @@ -790,6 +792,24 @@ int kvm_arch_get_registers(CPUState *cs)
>  return ret;
>  }
>
> +int kvm_riscv_sync_mpstate_to_kvm(RISCVCPU *cpu, int state)
> +{
> +if (cap_has_mp_state) {
> +struct kvm_mp_state mp_state = {
> +.mp_state = state
> +};
> +
> +int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_MP_STATE, &mp_state);
> +if (ret) {
> +fprintf(stderr, "%s: failed to sync MP_STATE %d/%s\n",
> +__func__, ret, strerror(-ret));
> +return -1;
> +}
> +}
> +
> +return 0;
> +}
> +
>  int kvm_arch_put_registers(CPUState *cs, int level)
>  {
>  int ret = 0;
> @@ -809,6 +829,18 @@ int kvm_arch_put_registers(CPUState *cs, int level)
>  return ret;
>  }
>
> +if (KVM_PUT_RESET_STATE == level) {
> +RISCVCPU *cpu = RISCV_CPU(cs);
> +if (cs->cpu_index == 0) {
> +ret = kvm_riscv_sync_mpstate_to_kvm(cpu, KVM_MP_STATE_RUNNABLE);
> +} else {
> +ret = kvm_riscv_sync_mpstate_to_kvm(cpu, KVM_MP_STATE_STOPPED);
> +}
> +if (ret) {
> +return ret;
> +}
> +}

You are adding code to kvm_arch_put_registers(); don't you also need to
add code to kvm_arch_get_registers(), and a *_mpstate_to_qemu()
function to match?

Alistair
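For illustration, the matching read-back helper might look roughly like the
sketch below (naming follows the existing kvm_riscv_sync_mpstate_to_kvm();
where the fetched state should be recorded on the QEMU side is deliberately
left open here), and it would be called from kvm_arch_get_registers():

static int kvm_riscv_sync_mpstate_to_qemu(RISCVCPU *cpu)
{
    if (cap_has_mp_state) {
        struct kvm_mp_state mp_state;
        int ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_MP_STATE, &mp_state);

        if (ret) {
            fprintf(stderr, "%s: failed to get MP_STATE %d/%s\n",
                    __func__, ret, strerror(-ret));
            return ret;
        }
        /* record mp_state.mp_state (RUNNABLE/STOPPED) in the vCPU state */
    }

    return 0;
}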

> +
>  return ret;
>  }
>
> @@ -909,6 +941,7 @@ int kvm_arch_add_msi_route_post(struct 
> kvm_irq_routing_entry *route,
>
>  int kvm_arch_init(MachineState *ms, KVMState *s)
>  {
> +cap_has_mp_state = kvm_check_extension(s, KVM_CAP_MP_STATE);
>  return 0;
>  }
>
> @@ -987,10 +1020,19 @@ void kvm_riscv_reset_vcpu(RISCVCPU *cpu)
>  if (!kvm_enabled()) {
>  return;
>  }
> +for (int i=0; i<32; i++)
> +env->gpr[i] = 0;
>  env->pc = cpu->env.kernel_addr;
>  env->gpr[10] = kvm_arch_vcpu_id(CPU(cpu)); /* a0 */
>  env->gpr[11] = cpu->env.fdt_addr;  /* a1 */
>  env->satp = 0;
> +env->mie = 0;
> +env->stvec = 0;
> +env->sscratch = 0;
> +env->sepc = 0;
> +env->scause = 0;
> +env->stval = 0;
> +env->mip = 0;
>  }
>
>  void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level)
> diff --git a/target/riscv/kvm_riscv.h b/target/riscv/kvm_riscv.h
> index e3ba935808..3ea68c38e3 100644
> --- a/target/riscv/kvm_riscv.h
> +++ b/target/riscv/kvm_riscv.h
> @@ -22,5 +22,6 @@
>  void kvm_riscv_init_user_properties(Object *cpu_obj);
>  void kvm_riscv_reset_vcpu(RISCVCPU *cpu);
>  void kvm_riscv_set_irq(RISCVCPU *cpu, int irq, int level);
> +int kvm_riscv_sync_mpstate_to_kvm(RISCVCPU *cpu, int state);
>
>  #endif
> --
> 2.41.0
>



Re: [PATCH QEMU v2 0/3] provide a smooth upgrade solution for multi-queues disk

2023-08-10 Thread Stefan Hajnoczi
On Thu, Aug 10, 2023 at 07:07:09AM +, ~hyman wrote:
> Ping,
> 
> This version is a copy of version 1 and is rebased
> on the master. No functional changes.
> 
> A 1:1 virtqueue:vCPU mapping implementation for virtio-*-pci disk
> introduced since qemu >= 5.2.0, which improves IO performance
> remarkably. To enjoy this feature for existing running VMs without
> service interruption, the common solution is to migrate VMs from the
> lower version of the hypervisor to the upgraded hypervisor, then wait
> for the next cold reboot of the VM to enable this feature. That's the
> way "discard" and "write-zeroes" features work.
> 
> As to multi-queues disk allocation automatically, it's a little
> different because the destination will allocate queues to match the
> number of vCPUs automatically by default in the case of live migration,
> and the VMs on the source side remain 1 queue by default, which results
> in migration failure due to loading disk VMState incorrectly on the
> destination side.

Are you using QEMU's versioned machine types to freeze the VM
configuration?

If not, then live migration won't work reliably because you're migrating
between two potentially different VM configurations. This issue is not
specific to num-queues, it affects all device properties.

In commit 9445e1e15e66c19e42bea942ba810db28052cd05 ("virtio-blk-pci:
default num_queues to -smp N") the num_queues property is set to 1 for
versioned machine types <=5.1:

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 9ee2aa0f7b..7f65fa8743 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -31,6 +31,7 @@
 GlobalProperty hw_compat_5_1[] = {
 { "vhost-scsi", "num_queues", "1"},
 { "vhost-user-scsi", "num_queues", "1"},
+{ "virtio-blk-device", "num-queues", "1"},
 { "virtio-scsi-device", "num_queues", "1"},
 };
 const size_t hw_compat_5_1_len = G_N_ELEMENTS(hw_compat_5_1);

Live migration works when the source and destination QEMU are launched
with the same versioned machine type. You can check the "info qtree"
output to confirm that starting a VM with -smp 4 -M pc-q35-5.1 results
in num-queues=1 while -smp 4 -M pc-q35-5.2 results in num-queues=4.

> This issue requires QEMU to provide a hint that shows multi-queue disk
> allocation is automatically supported, which allows upper-layer apps,
> e.g. libvirt, to recognize this capability of the hypervisor. Upper-layer
> apps can then ensure that the same num-queues is allocated on the
> destination side, avoiding the migration failure.
> 
> To fix the issue, we introduce the auto-num-queues property for
> virtio-*-pci as a solution, which would be probed by APPs, e.g., libvirt
> by querying the device properties of QEMU. When launching live
> migration, libvirt will send the auto-num-queues property as a migration
> cookie to the destination, and thus the destination knows if the source
> side supports auto-num-queues. If not, the destination would switch off
> by building the command line with "auto-num-queues=off" when preparing
> the incoming VM process. The following patches of libvirt show how it
> roughly works:
> https://github.com/newfriday/libvirt/commit/ce2bae2e1a6821afeb80756dc01f3680f525e506
> https://github.com/newfriday/libvirt/commit/f546972b009458c88148fe079544db7e9e1f43c3
> https://github.com/newfriday/libvirt/commit/5ee19c8646fdb4d87ab8b93f287c20925268ce83
> 
> The smooth upgrade solution requires the introduction of the auto-num-
> queues property on the QEMU side, which is what the patch set does. I'm
> hoping for comments about the series.

Please take a look at versioned machine types. I think auto-num-queues
is not necessary if you use versioned machine types.

If you do think auto-num-queues is needed, please explain the issue in
more detail and state why versioned machine types don't help.

Thanks,
Stefan

> 
> Please review, thanks.
> Yong
> 
> Hyman Huang(黄勇) (3):
>   virtio-scsi-pci: introduce auto-num-queues property
>   virtio-blk-pci: introduce auto-num-queues property
>   vhost-user-blk-pci: introduce auto-num-queues property
> 
>  hw/block/vhost-user-blk.c  |  1 +
>  hw/block/virtio-blk.c  |  1 +
>  hw/scsi/vhost-scsi.c   |  2 ++
>  hw/scsi/vhost-user-scsi.c  |  2 ++
>  hw/scsi/virtio-scsi.c  |  2 ++
>  hw/virtio/vhost-scsi-pci.c | 11 +--
>  hw/virtio/vhost-user-blk-pci.c |  9 -
>  hw/virtio/vhost-user-scsi-pci.c| 11 +--
>  hw/virtio/virtio-blk-pci.c |  9 -
>  hw/virtio/virtio-scsi-pci.c| 11 +--
>  include/hw/virtio/vhost-user-blk.h |  5 +
>  include/hw/virtio/virtio-blk.h |  5 +
>  include/hw/virtio/virtio-scsi.h|  5 +
>  13 files changed, 66 insertions(+), 8 deletions(-)
> 
> -- 
> 2.38.5
> 


signature.asc
Description: PGP signature


Re: [PATCH 2/2] hw/intc: Make rtc variable names consistent

2023-08-10 Thread Alistair Francis
On Fri, Jul 28, 2023 at 4:57 AM Jason Chien  wrote:
>
> The variables whose values are given by cpu_riscv_read_rtc() should be named
> "rtc". The variables whose value are given by cpu_riscv_read_rtc_raw()
> should be named "rtc_r".
>
> Signed-off-by: Jason Chien 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/intc/riscv_aclint.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
> index bf77e29a70..25cf7a5d9d 100644
> --- a/hw/intc/riscv_aclint.c
> +++ b/hw/intc/riscv_aclint.c
> @@ -64,13 +64,13 @@ static void 
> riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
>  uint64_t next;
>  uint64_t diff;
>
> -uint64_t rtc_r = cpu_riscv_read_rtc(mtimer);
> +uint64_t rtc = cpu_riscv_read_rtc(mtimer);
>
>  /* Compute the relative hartid w.r.t the socket */
>  hartid = hartid - mtimer->hartid_base;
>
>  mtimer->timecmp[hartid] = value;
> -if (mtimer->timecmp[hartid] <= rtc_r) {
> +if (mtimer->timecmp[hartid] <= rtc) {
>  /*
>   * If we're setting an MTIMECMP value in the "past",
>   * immediately raise the timer interrupt
> @@ -81,7 +81,7 @@ static void 
> riscv_aclint_mtimer_write_timecmp(RISCVAclintMTimerState *mtimer,
>
>  /* otherwise, set up the future timer interrupt */
>  qemu_irq_lower(mtimer->timer_irqs[hartid]);
> -diff = mtimer->timecmp[hartid] - rtc_r;
> +diff = mtimer->timecmp[hartid] - rtc;
>  /* back to ns (note args switched in muldiv64) */
>  uint64_t ns_diff = muldiv64(diff, NANOSECONDS_PER_SECOND, timebase_freq);
>
> --
> 2.17.1
>
>



Re: [PATCH 1/2] hw/intc: Fix upper/lower mtime write calculation

2023-08-10 Thread Alistair Francis
On Fri, Jul 28, 2023 at 5:13 AM Jason Chien  wrote:
>
> When writing the upper mtime, we should keep the original lower mtime
> whose value is given by cpu_riscv_read_rtc() instead of
> cpu_riscv_read_rtc_raw(). The same logic applies to writes to lower mtime.
>
> Signed-off-by: Jason Chien 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  hw/intc/riscv_aclint.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/intc/riscv_aclint.c b/hw/intc/riscv_aclint.c
> index b466a6abaf..bf77e29a70 100644
> --- a/hw/intc/riscv_aclint.c
> +++ b/hw/intc/riscv_aclint.c
> @@ -208,11 +208,12 @@ static void riscv_aclint_mtimer_write(void *opaque, 
> hwaddr addr,
>  return;
>  } else if (addr == mtimer->time_base || addr == mtimer->time_base + 4) {
>  uint64_t rtc_r = cpu_riscv_read_rtc_raw(mtimer->timebase_freq);
> +uint64_t rtc = cpu_riscv_read_rtc(mtimer);
>
>  if (addr == mtimer->time_base) {
>  if (size == 4) {
>  /* time_lo for RV32/RV64 */
> -mtimer->time_delta = ((rtc_r & ~0xFFFFFFFFULL) | value) - 
> rtc_r;
> +mtimer->time_delta = ((rtc & ~0xFFFFFFFFULL) | value) - 
> rtc_r;
>  } else {
>  /* time for RV64 */
>  mtimer->time_delta = value - rtc_r;
> @@ -220,7 +221,7 @@ static void riscv_aclint_mtimer_write(void *opaque, 
> hwaddr addr,
>  } else {
>  if (size == 4) {
>  /* time_hi for RV32/RV64 */
> -mtimer->time_delta = (value << 32 | (rtc_r & 0xFFFFFFFF)) - 
> rtc_r;
> +mtimer->time_delta = (value << 32 | (rtc & 0xFFFFFFFF)) - 
> rtc_r;
>  } else {
>  qemu_log_mask(LOG_GUEST_ERROR,
>"aclint-mtimer: invalid time_hi write: %08x",
> --
> 2.17.1
>
>



Re: [PATCH v5 17/17] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

2023-08-10 Thread Eric Blake
On Thu, Aug 10, 2023 at 12:37:04PM -0500, Eric Blake wrote:
> Allow a client to request a subset of negotiated meta contexts.  For
> example, a client may ask to use a single connection to learn about
> both block status and dirty bitmaps, but where the dirty bitmap
> queries only need to be performed on a subset of the disk; forcing the
> server to compute that information on block status queries in the rest
> of the disk is wasted effort (both at the server, and on the amount of
> traffic sent over the wire to be parsed and ignored by the client).
> 

> +nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
> + Error **errp)
> +{
> +int payload_len = request->len;
> +g_autofree char *buf = NULL;
> +size_t count, i, nr_bitmaps;
> +uint32_t id;
> +
> +if (payload_len > NBD_MAX_BUFFER_SIZE) {
> +error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
> +   request->len, NBD_MAX_BUFFER_SIZE);

Copy-and-paste spacing bug produces "len (12345678980 ) is larger...",
should be 'PRIu64 ")'; will touch up here and in all other places it
occurs.
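
For reference, the corrected call (per the note above) would read:

    error_setg(errp, "len (%" PRIu64 ") is larger than max len (%u)",
               request->len, NBD_MAX_BUFFER_SIZE);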

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




[PULL 3/4] gdbstub: more fixes for client Ctrl-C handling

2023-08-10 Thread Richard Henderson
From: Alex Bennée 

The original fix caused problems with spurious characters on other
system emulation. So:

  - instead of spamming output make the warning a trace point
  - ensure we only allow a stop reply if it was 0x3

Suggested-by: Matheus Tavares Bernardino 
Signed-off-by: Alex Bennée 
Message-Id: 
<456ed3318421dd7946bdfb5ceda7e05332da368c.1690910333.git.quic_mathb...@quicinc.com>
Reviewed-by: Richard Henderson 
Tested-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230810153640.1879717-8-alex.ben...@linaro.org>
Signed-off-by: Richard Henderson 
---
 gdbstub/gdbstub.c| 5 +++--
 gdbstub/trace-events | 1 +
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index e74ecc78cc..20b6fe03fb 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -2059,9 +2059,10 @@ void gdb_read_byte(uint8_t ch)
  * here, but it does expect a stop reply.
  */
 if (ch != 0x03) {
-warn_report("gdbstub: client sent packet while target running\n");
+trace_gdbstub_err_unexpected_runpkt(ch);
+} else {
+gdbserver_state.allow_stop_reply = true;
 }
-gdbserver_state.allow_stop_reply = true;
 vm_stop(RUN_STATE_PAUSED);
 } else
 #endif
diff --git a/gdbstub/trace-events b/gdbstub/trace-events
index 0c18a4d70a..7bc79a73c4 100644
--- a/gdbstub/trace-events
+++ b/gdbstub/trace-events
@@ -26,6 +26,7 @@ gdbstub_err_invalid_repeat(uint8_t ch) "got invalid RLE 
count: 0x%02x"
 gdbstub_err_invalid_rle(void) "got invalid RLE sequence"
 gdbstub_err_checksum_invalid(uint8_t ch) "got invalid command checksum digit: 
0x%02x"
 gdbstub_err_checksum_incorrect(uint8_t expected, uint8_t got) "got command 
packet with incorrect checksum, expected=0x%02x, received=0x%02x"
+gdbstub_err_unexpected_runpkt(uint8_t ch) "unexpected packet (0x%02x) while 
target running"
 
 # softmmu.c
 gdbstub_hit_watchpoint(const char *type, int cpu_gdb_index, uint64_t vaddr) 
"Watchpoint hit, type=\"%s\" cpu=%d, vaddr=0x%" PRIx64 ""
-- 
2.34.1




[PULL 1/4] accel/tcg: Avoid reading too much in load_atom_{2,4}

2023-08-10 Thread Richard Henderson
When load_atom_extract_al16_or_al8 is inexpensive, we want to use
it early, in order to avoid the overhead of required_atomicity.
However, we must not read past the end of the page.

If there are more than 8 bytes remaining, then both the "aligned 16"
and "aligned 8" paths align down so that the read has at least
16 bytes remaining on the page.
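
As a standalone illustration of that bound (assuming 4 KiB pages, i.e.
TARGET_PAGE_MASK == -4096; this is a worked example, not QEMU code):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        const intptr_t page_mask = (intptr_t)-4096;
        uintptr_t pi = 0x1000ffa;                  /* 6 bytes before page end */
        intptr_t left_in_page = -(pi | page_mask);

        assert(left_in_page == 6);                 /* <= 8: take the slow path */

        /*
         * When left_in_page > 8, the distance from the aligned-down 8- or
         * 16-byte base to the page end is a multiple of 8 greater than 8,
         * hence at least 16 -- so the 16-byte read performed by
         * load_atom_extract_al16_or_al8() cannot cross the page boundary.
         */
        return 0;
    }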

Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 tests/tcg/aarch64/lse2-fault.c| 38 +++
 accel/tcg/ldst_atomicity.c.inc| 10 ++--
 tests/tcg/aarch64/Makefile.target |  2 +-
 3 files changed, 47 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/aarch64/lse2-fault.c

diff --git a/tests/tcg/aarch64/lse2-fault.c b/tests/tcg/aarch64/lse2-fault.c
new file mode 100644
index 00..2187219a08
--- /dev/null
+++ b/tests/tcg/aarch64/lse2-fault.c
@@ -0,0 +1,38 @@
+#include <sys/mman.h>
+#include <sys/shm.h>
+#include <unistd.h>
+#include <stdio.h>
+
+int main()
+{
+int psize = getpagesize();
+int id;
+void *p;
+
+/*
+ * We need a shared mapping to enter CF_PARALLEL mode.
+ * The easiest way to get that is shmat.
+ */
+id = shmget(IPC_PRIVATE, 2 * psize, IPC_CREAT | 0600);
+if (id < 0) {
+perror("shmget");
+return 2;
+}
+p = shmat(id, NULL, 0);
+if (p == MAP_FAILED) {
+perror("shmat");
+return 2;
+}
+
+/* Protect the second page. */
+if (mprotect(p + psize, psize, PROT_NONE) < 0) {
+perror("mprotect");
+return 2;
+}
+
+/*
+ * Load 4 bytes, 6 bytes from the end of the page.
+ * On success this will load 0 from the newly allocated shm.
+ */
+return *(int *)(p + psize - 6);
+}
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index e5c590a499..1b793e6935 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -404,7 +404,10 @@ static uint16_t load_atom_2(CPUArchState *env, uintptr_t 
ra,
 return load_atomic2(pv);
 }
 if (HAVE_ATOMIC128_RO) {
-return load_atom_extract_al16_or_al8(pv, 2);
+intptr_t left_in_page = -(pi | TARGET_PAGE_MASK);
+if (likely(left_in_page > 8)) {
+return load_atom_extract_al16_or_al8(pv, 2);
+}
 }
 
 atmax = required_atomicity(env, pi, memop);
@@ -443,7 +446,10 @@ static uint32_t load_atom_4(CPUArchState *env, uintptr_t 
ra,
 return load_atomic4(pv);
 }
 if (HAVE_ATOMIC128_RO) {
-return load_atom_extract_al16_or_al8(pv, 4);
+intptr_t left_in_page = -(pi | TARGET_PAGE_MASK);
+if (likely(left_in_page > 8)) {
+return load_atom_extract_al16_or_al8(pv, 4);
+}
 }
 
 atmax = required_atomicity(env, pi, memop);
diff --git a/tests/tcg/aarch64/Makefile.target 
b/tests/tcg/aarch64/Makefile.target
index 617f821613..681dfa077c 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -9,7 +9,7 @@ AARCH64_SRC=$(SRC_PATH)/tests/tcg/aarch64
 VPATH  += $(AARCH64_SRC)
 
 # Base architecture tests
-AARCH64_TESTS=fcvt pcalign-a64
+AARCH64_TESTS=fcvt pcalign-a64 lse2-fault
 
 fcvt: LDFLAGS+=-lm
 
-- 
2.34.1




[PULL 2/4] tests/tcg: ensure system-mode gdb tests start stopped

2023-08-10 Thread Richard Henderson
From: Alex Bennée 

Without -S we run into potential races with tests starting before the
gdbstub attaches. We don't need to worry about user-mode as enabling
the gdbstub implies we wait for the initial connection.

Signed-off-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230810153640.1879717-7-alex.ben...@linaro.org>
Signed-off-by: Richard Henderson 
---
 tests/guest-debug/run-test.py | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tests/guest-debug/run-test.py b/tests/guest-debug/run-test.py
index de6106a5e5..a032e01f79 100755
--- a/tests/guest-debug/run-test.py
+++ b/tests/guest-debug/run-test.py
@@ -69,13 +69,10 @@ def log(output, msg):
 
 # Launch QEMU with binary
 if "system" in args.qemu:
-cmd = "%s %s %s -gdb unix:path=%s,server=on" % (args.qemu,
-args.qargs,
-args.binary,
-socket_name)
+cmd = f'{args.qemu} {args.qargs} {args.binary}' \
+f' -S -gdb unix:path={socket_name},server=on'
 else:
-cmd = "%s %s -g %s %s" % (args.qemu, args.qargs, socket_name,
-  args.binary)
+cmd = f'{args.qemu} {args.qargs} -g {socket_name} {args.binary}'
 
 log(output, "QEMU CMD: %s" % (cmd))
 inferior = subprocess.Popen(shlex.split(cmd))
-- 
2.34.1




[PULL 4/4] gdbstub: don't complain about preemptive ACK chars

2023-08-10 Thread Richard Henderson
From: Alex Bennée 

When starting a remote connection GDB sends an '+':

  /* Ack any packet which the remote side has already sent.  */
  remote_serial_write ("+", 1);

which gets flagged as a garbage character in the gdbstub state
machine. As gdb does send it out, let's be permissive about the handling
so we can better see real issues.

Signed-off-by: Alex Bennée 
Cc: gdb-patc...@sourceware.org
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230810153640.1879717-9-alex.ben...@linaro.org>
Signed-off-by: Richard Henderson 
---
 gdbstub/gdbstub.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 20b6fe03fb..5f28d5cf57 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -2074,6 +2074,11 @@ void gdb_read_byte(uint8_t ch)
 gdbserver_state.line_buf_index = 0;
 gdbserver_state.line_sum = 0;
 gdbserver_state.state = RS_GETLINE;
+} else if (ch == '+') {
+/*
+ * do nothing, gdb may preemptively send out ACKs on
+ * initial connection
+ */
 } else {
 trace_gdbstub_err_garbage(ch);
 }
-- 
2.34.1




[PULL 0/4] tcg/gdbstub late fixes

2023-08-10 Thread Richard Henderson
The following changes since commit 64d3be986f9e2379bc688bf1d0aca0557e0035ca:

  Merge tag 'or1k-pull-request-20230809' of https://github.com/stffrdhrn/qemu 
into staging (2023-08-09 15:05:02 -0700)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230810

for you to fetch changes up to f1b0f894c8c25f7ed24197ff130c7acb6b9fd6e7:

  gdbstub: don't complain about preemptive ACK chars (2023-08-10 11:04:34 -0700)


accel/tcg: Avoid reading too much in load_atom_{2,4}
tests/tcg: ensure system-mode gdb tests start stopped
gdbstub: more fixes for client Ctrl-C handling


Alex Bennée (3):
  tests/tcg: ensure system-mode gdb tests start stopped
  gdbstub: more fixes for client Ctrl-C handling
  gdbstub: don't complain about preemptive ACK chars

Richard Henderson (1):
  accel/tcg: Avoid reading too much in load_atom_{2,4}

 gdbstub/gdbstub.c | 10 --
 tests/tcg/aarch64/lse2-fault.c| 38 ++
 accel/tcg/ldst_atomicity.c.inc| 10 --
 gdbstub/trace-events  |  1 +
 tests/guest-debug/run-test.py |  9 +++--
 tests/tcg/aarch64/Makefile.target |  2 +-
 6 files changed, 59 insertions(+), 11 deletions(-)
 create mode 100644 tests/tcg/aarch64/lse2-fault.c



Re: [PATCH 2/2] riscv: zicond: make default

2023-08-10 Thread Alistair Francis
On Tue, Aug 8, 2023 at 6:10 PM Vineet Gupta  wrote:
>
>
>
> On 8/8/23 14:06, Daniel Henrique Barboza wrote:
> > (CCing Alistair and other reviewers)
> >
> > On 8/8/23 15:17, Vineet Gupta wrote:
> >> Again this helps with better testing and something qemu has been doing
> >> with newer features anyways.
> >>
> >> Signed-off-by: Vineet Gupta 
> >> ---
> >
> > Even if we can reach a consensus about removing the experimental
> > (x- prefix) status from an extension that is Frozen instead of
> > ratified, enabling stuff in the default CPUs because it's easier to
> > test is something we would like to avoid. The rv64 CPU has a random
> > set of extensions enabled for varied and undocumented reasons, and
> > users don't know what they'll get because we keep beefing up the
> > generic CPUs arbitrarily.

The idea was to enable "most" extensions for the virt machine. It's a
bit wishy-washy, but the idea was to enable as much as possible by
default on the virt machine, as long as it doesn't conflict. The goal
being to allow users to get the "best" experience as all their
favourite extensions are enabled.

It's harder to do in practice, so we are in a weird state where users
don't know what is and isn't enabled.

We probably want to revisit this. We should try to enable what is
useful for users and make it clear what is and isn't enabled. I'm not
clear on how best to do that though.

Again, I think this comes back to we need to version the virt machine.
I might do that as a starting point, that allows us to make changes in
a clear way.

>
> I understand this position given the arbitrary nature of gazillion
> extensions. However pragmatically things like bitmanip and zicond are so
> fundamental it would be strange for designs to not have them, in a few
> years. Besides these don't compete or conflict with other extensions.
> But on face value it is indeed possible for vendors to drop them for
> various reasons or no-reasons.
>
> But having the x- dropped is good enough for our needs as there's
> already mechanisms to enable the toggles from elf attributes.
>
> >
> > Starting on QEMU 8.2 we'll have a 'max' CPU type that will enable all
> > non-experimental
> > and non-vendor extensions by default, making it easier for tooling to
> > test new
> > features/extensions. All tooling should consider changing their
> > scripts to use the
> > 'max' CPU when it's available.
>
> That would be great.

The max CPU helps, but I do feel that the default should allow users
to experience as many RISC-V extensions/features as practical.

Alistair

>
> >
> > For now, I fear that gcc and friends will still need to enable
> > 'zicond' in the command
> > line via 'zicond=true'.  Thanks,
>
> Thx,
> -Vineet
>



Re: [PATCH v2] gdbstub: fixes cases where wrong threads were reported to GDB on SIGINT

2023-08-10 Thread Alex Bennée


Matheus Branco Borella  writes:

> Alex Bennée  writes:
>> Can gdb switch which packet sequence it uses to halt and restart
>> threads?
>
> Yes, but the way it does it does not trigger the behavior I was concerned 
> about. GDB falls back to the old sequence when either (1) the target does not
> support the vCont command it's trying to send or (2) you step backwards. In 
> both
> cases, though, whenever it does fall back, it will first send an Hc packet 
> before continuing or stepping, which means we won't ever see a sequence such 
> as
> ["Hc", "vCont;c:*", "c"]. This means, in short, that, while the shortcoming 
> does
> exist in the code, GDB never actually triggers it.
>
>> The test I would like see is pretty much your test case
>> 
>>  - load a multi-threaded program
>>  - wait until threads running
>>  - pause
>>  - resume thread
>>  - check resumed thread was the right one
>
> What I have here should be pretty much that. 
>
> Is there something else you think I'm missing?
>
> ---
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1725
>
> This fix is implemented by having the vCont handler set the value of
> `gdbserver_state.c_cpu` if any threads are to be resumed. The specific CPU
> is picked arbitrarily from the ones to be resumed, but that should be okay,
> as all GDB cares about is that it is a resumed thread.
>
> Signed-off-by: Matheus Branco Borella 
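
A minimal sketch of the kind of change described above (hypothetical; the
variable names and the surrounding vCont parsing code are assumptions, only
gdbserver_state.c_cpu comes from the description):

    /* while walking the parsed vCont actions ... */
    if (action == 'c' || action == 's') {
        /*
         * Remember one of the CPUs being resumed so that a later stop
         * reply reports a thread that is actually running.
         */
        gdbserver_state.c_cpu = cpu;
    }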

Arg the commit message is in the --- discard section.

Queued to for-8.1/misc-fixes, thanks.

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH v6 11/12] avocado, risc-v: add opensbi tests for 'max' CPU

2023-08-10 Thread Alistair Francis
On Thu, Jul 27, 2023 at 6:33 PM Daniel Henrique Barboza
 wrote:
>
> Add smoke tests to ensure that we'll not break the 'max' CPU type when
> adding new ratified extensions to be enabled.
>
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  tests/avocado/riscv_opensbi.py | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/tests/avocado/riscv_opensbi.py b/tests/avocado/riscv_opensbi.py
> index bfff9cc3c3..15fd57fe51 100644
> --- a/tests/avocado/riscv_opensbi.py
> +++ b/tests/avocado/riscv_opensbi.py
> @@ -61,3 +61,19 @@ def test_riscv64_virt(self):
>  :avocado: tags=machine:virt
>  """
>  self.boot_opensbi()
> +
> +def test_riscv32_virt_maxcpu(self):
> +"""
> +:avocado: tags=arch:riscv32
> +:avocado: tags=machine:virt
> +:avocado: tags=cpu:max
> +"""
> +self.boot_opensbi()
> +
> +def test_riscv64_virt_maxcpu(self):
> +"""
> +:avocado: tags=arch:riscv64
> +:avocado: tags=machine:virt
> +:avocado: tags=cpu:max
> +"""
> +self.boot_opensbi()
> --
> 2.41.0
>
>



Re: [PATCH v6 10/12] target/riscv: add 'max' CPU type

2023-08-10 Thread Alistair Francis
On Thu, Jul 27, 2023 at 6:53 PM Daniel Henrique Barboza
 wrote:
>
> The 'max' CPU type is used by tooling to determine what's the most
> capable CPU a current QEMU version implements. Other archs such as ARM
> implement this type. Let's add it to RISC-V.
>
> What we consider "most capable CPU" in this context are related to
> ratified, non-vendor extensions. This means that we want the 'max' CPU
> to enable all (possible) ratified extensions by default. The reasoning
> behind this design is (1) vendor extensions can conflict with each other
> and we won't play favorities deciding which one is default or not and
> (2) non-ratified extensions are always prone to changes, not being
> stable enough to be enabled by default.
>
> All this said, we're still not able to enable all ratified extensions
> due to conflicts between them. Zfinx and all its dependencies aren't
> enabled because of a conflict with RVF. zce, zcmp and zcmt are also
> disabled due to RVD conflicts. When running with 64 bits we're also
> disabling zcf.
>
> MISA bits RVG, RVJ and RVV are also being set manually since they're
> default disabled.
>
> This is the resulting 'riscv,isa' DT for this new CPU:
>
> rv64imafdcvh_zicbom_zicboz_zicsr_zifencei_zihintpause_zawrs_zfa_
> zfh_zfhmin_zca_zcb_zcd_zba_zbb_zbc_zbkb_zbkc_zbkx_zbs_zk_zkn_zknd_
> zkne_zknh_zkr_zks_zksed_zksh_zkt_zve32f_zve64f_zve64d_
> smstateen_sscofpmf_sstc_svadu_svinval_svnapot_svpbmt
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Weiwei Li 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu-qom.h |  1 +
>  target/riscv/cpu.c | 56 ++
>  2 files changed, 57 insertions(+)
>
> diff --git a/target/riscv/cpu-qom.h b/target/riscv/cpu-qom.h
> index 04af50983e..f3fbe37a2c 100644
> --- a/target/riscv/cpu-qom.h
> +++ b/target/riscv/cpu-qom.h
> @@ -30,6 +30,7 @@
>  #define CPU_RESOLVING_TYPE TYPE_RISCV_CPU
>
>  #define TYPE_RISCV_CPU_ANY  RISCV_CPU_TYPE_NAME("any")
> +#define TYPE_RISCV_CPU_MAX  RISCV_CPU_TYPE_NAME("max")
>  #define TYPE_RISCV_CPU_BASE32   RISCV_CPU_TYPE_NAME("rv32")
>  #define TYPE_RISCV_CPU_BASE64   RISCV_CPU_TYPE_NAME("rv64")
>  #define TYPE_RISCV_CPU_BASE128  RISCV_CPU_TYPE_NAME("x-rv128")
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 01b0d228f5..3e840f1a20 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -250,6 +250,7 @@ static const char * const riscv_intr_names[] = {
>  };
>
>  static void riscv_cpu_add_user_properties(Object *obj);
> +static void riscv_init_max_cpu_extensions(Object *obj);
>
>  const char *riscv_cpu_get_trap_name(target_ulong cause, bool async)
>  {
> @@ -376,6 +377,25 @@ static void riscv_any_cpu_init(Object *obj)
>  cpu->cfg.pmp = true;
>  }
>
> +static void riscv_max_cpu_init(Object *obj)
> +{
> +RISCVCPU *cpu = RISCV_CPU(obj);
> +CPURISCVState *env = &cpu->env;
> +RISCVMXL mlx = MXL_RV64;
> +
> +#ifdef TARGET_RISCV32
> +mlx = MXL_RV32;
> +#endif
> +set_misa(env, mlx, 0);
> +riscv_cpu_add_user_properties(obj);
> +riscv_init_max_cpu_extensions(obj);
> +env->priv_ver = PRIV_VERSION_LATEST;
> +#ifndef CONFIG_USER_ONLY
> +set_satp_mode_max_supported(RISCV_CPU(obj), mlx == MXL_RV32 ?
> +VM_1_10_SV32 : VM_1_10_SV57);
> +#endif
> +}
> +
>  #if defined(TARGET_RISCV64)
>  static void rv64_base_cpu_init(Object *obj)
>  {
> @@ -1961,6 +1981,41 @@ static void riscv_cpu_add_user_properties(Object *obj)
>  ADD_CPU_QDEV_PROPERTIES_ARRAY(dev, riscv_cpu_experimental_exts);
>  }
>
> +/*
> + * The 'max' type CPU will have all possible ratified
> + * non-vendor extensions enabled.
> + */
> +static void riscv_init_max_cpu_extensions(Object *obj)
> +{
> +RISCVCPU *cpu = RISCV_CPU(obj);
> +CPURISCVState *env = &cpu->env;
> +
> +/* Enable RVG, RVJ and RVV that are disabled by default */
> +set_misa(env, env->misa_mxl, env->misa_ext | RVG | RVJ | RVV);
> +
> +for (int i = 0; i < ARRAY_SIZE(riscv_cpu_extensions); i++) {
> +object_property_set_bool(obj, riscv_cpu_extensions[i].name,
> + true, NULL);
> +}
> +
> +/* set vector version */
> +env->vext_ver = VEXT_VERSION_1_00_0;
> +
> +/* Zfinx is not compatible with F. Disable it */
> +object_property_set_bool(obj, "zfinx", false, NULL);
> +object_property_set_bool(obj, "zdinx", false, NULL);
> +object_property_set_bool(obj, "zhinx", false, NULL);
> +object_property_set_bool(obj, "zhinxmin", false, NULL);
> +
> +object_property_set_bool(obj, "zce", false, NULL);
> +object_property_set_bool(obj, "zcmp", false, NULL);
> +object_property_set_bool(obj, "zcmt", false, NULL);
> +
> +if (env->misa_mxl != MXL_RV32) {
> +object_property_set_bool(obj, "zcf", false, NULL);
> +}
> +}
> +
>  static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_BOOL("debug", RISCVCPU, cfg.debug, true),
>
> @@ 

Re: [PATCH v6 09/12] target/riscv/cpu.c: limit cfg->vext_spec log message

2023-08-10 Thread Alistair Francis
On Thu, Jul 27, 2023 at 6:22 PM Daniel Henrique Barboza
 wrote:
>
> Inside riscv_cpu_validate_v() we're always throwing a log message if the
> user didn't set a vector version via 'vext_spec'.
>
> We're going to include one case with the 'max' CPU where env->vext_ver
> will be set in the cpu_init(). But that alone will not stop the "vector
> version is not specified" message from appearing. The usefulness of this
> log message is debatable for the generic CPUs, but for a 'max' CPU type,
> where we are supposed to deliver a CPU model with all features possible,
> it's strange to force users to set 'vext_spec' to get rid of this
> message.
>
> Change riscv_cpu_validate_v() to not throw this log message if
> env->vext_ver is already set.
>
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 9 -
>  1 file changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 08f61ed051..01b0d228f5 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -939,8 +939,6 @@ static void riscv_cpu_disas_set_info(CPUState *s, 
> disassemble_info *info)
>  static void riscv_cpu_validate_v(CPURISCVState *env, RISCVCPUConfig *cfg,
>   Error **errp)
>  {
> -int vext_version = VEXT_VERSION_1_00_0;
> -
>  if (!is_power_of_2(cfg->vlen)) {
>  error_setg(errp, "Vector extension VLEN must be power of 2");
>  return;
> @@ -963,17 +961,18 @@ static void riscv_cpu_validate_v(CPURISCVState *env, 
> RISCVCPUConfig *cfg,
>  }
>  if (cfg->vext_spec) {
>  if (!g_strcmp0(cfg->vext_spec, "v1.0")) {
> -vext_version = VEXT_VERSION_1_00_0;
> +env->vext_ver = VEXT_VERSION_1_00_0;
>  } else {
>  error_setg(errp, "Unsupported vector spec version '%s'",
> cfg->vext_spec);
>  return;
>  }
> -} else {
> +} else if (env->vext_ver == 0) {
>  qemu_log("vector version is not specified, "
>   "use the default value v1.0\n");
> +
> +env->vext_ver = VEXT_VERSION_1_00_0;
>  }
> -env->vext_ver = vext_version;
>  }
>
>  static void riscv_cpu_validate_priv_spec(RISCVCPU *cpu, Error **errp)
> --
> 2.41.0
>
>



Re: [PATCH v4 9/9] docs/system: add basic virtio-gpu documentation

2023-08-10 Thread Gurchetan Singh
On Wed, Aug 9, 2023 at 11:55 PM Akihiko Odaki 
wrote:

> On 2023/08/10 10:11, Gurchetan Singh wrote:
> >
> >
> > On Tue, Aug 8, 2023 at 10:18 PM Akihiko Odaki  > > wrote:
> >
> > On 2023/08/09 11:11, Gurchetan Singh wrote:
> >  > This adds basic documentation for virtio-gpu.
> >  >
> >  > Suggested-by: Akihiko Odaki  > >
> >  > Signed-off-by: Gurchetan Singh  > >
> >  > ---
> >  > v2: - Incorporated suggestions by Akihiko Odaki
> >  >  - Listed the currently supported capset_names (Bernard)
> >  >
> >  > v3: - Incorporated suggestions by Akihiko Odaki and Alyssa Ross
> >  >
> >  > v4: - Incorporated suggestions by Akihiko Odaki
> >  >
> >  >   docs/system/device-emulation.rst   |   1 +
> >  >   docs/system/devices/virtio-gpu.rst | 115
> > +
> >  >   2 files changed, 116 insertions(+)
> >  >   create mode 100644 docs/system/devices/virtio-gpu.rst
> >  >
> >  > diff --git a/docs/system/device-emulation.rst
> > b/docs/system/device-emulation.rst
> >  > index 4491c4cbf7..1167f3a9f2 100644
> >  > --- a/docs/system/device-emulation.rst
> >  > +++ b/docs/system/device-emulation.rst
> >  > @@ -91,6 +91,7 @@ Emulated Devices
> >  >  devices/nvme.rst
> >  >  devices/usb.rst
> >  >  devices/vhost-user.rst
> >  > +   devices/virtio-gpu.rst
> >  >  devices/virtio-pmem.rst
> >  >  devices/vhost-user-rng.rst
> >  >  devices/canokey.rst
> >  > diff --git a/docs/system/devices/virtio-gpu.rst
> > b/docs/system/devices/virtio-gpu.rst
> >  > new file mode 100644
> >  > index 00..d56524270d
> >  > --- /dev/null
> >  > +++ b/docs/system/devices/virtio-gpu.rst
> >  > @@ -0,0 +1,115 @@
> >  > +..
> >  > +   SPDX-License-Identifier: GPL-2.0
> >  > +
> >  > +virtio-gpu
> >  > +==
> >  > +
> >  > +This document explains the setup and usage of the virtio-gpu
> device.
> >  > +The virtio-gpu device paravirtualizes the GPU and display
> > controller.
> >  > +
> >  > +Linux kernel support
> >  > +
> >  > +
> >  > +virtio-gpu requires a guest Linux kernel built with the
> >  > +``CONFIG_DRM_VIRTIO_GPU`` option.
> >  > +
> >  > +QEMU virtio-gpu variants
> >  > +
> >  > +
> >  > +QEMU virtio-gpu device variants come in the following form:
> >  > +
> >  > + * ``virtio-vga[-BACKEND]``
> >  > + * ``virtio-gpu[-BACKEND][-INTERFACE]``
> >  > + * ``vhost-user-vga``
> >  > + * ``vhost-user-pci``
> >  > +
> >  > +**Backends:** QEMU provides a 2D virtio-gpu backend, and two
> > accelerated
> >  > +backends: virglrenderer ('gl' device label) and rutabaga_gfx
> > ('rutabaga'
> >  > +device label).  There is a vhost-user backend that runs the
> > graphics stack
> >  > +in a separate process for improved isolation.
> >  > +
> >  > +**Interfaces:** QEMU further categorizes virtio-gpu device
> > variants based
> >  > +on the interface exposed to the guest. The interfaces can be
> > classified
> >  > +into VGA and non-VGA variants. The VGA ones are prefixed with
> > virtio-vga
> >  > +or vhost-user-vga while the non-VGA ones are prefixed with
> > virtio-gpu or
> >  > +vhost-user-gpu.
> >  > +
> >  > +The VGA ones always use the PCI interface, but for the non-VGA
> > ones, the
> >  > +user can further pick between MMIO or PCI. For MMIO, the user
> > can suffix
> >  > +the device name with -device, though vhost-user-gpu does not
> > support MMIO.
> >  > +For PCI, the user can suffix it with -pci. Without these
> > suffixes, the
> >  > +platform default will be chosen.
> >  > +
> >  > +This document uses the PCI interface in examples.
> >
> > I think it's better to omit -pci.
> >
> >
> > Are you suggesting to use "-device virtio-gpu-rutabaga" or "-device
> > virtio-gpu-gl" in the examples?  Or "-device virtio-gpu-rutabaga-device"
> > or "-device virtio-gpu-gl-device"?  The former I believe wouldn't
> > launch, and the examples should ideally be directly applicable to a user.
> >
> >
> > By the way you are not adding the aliases for Rutabaga so please do
> so.
> > You can find the table in: softmmu/qdev-monitor.c
> >
> >
> > I don't follow this comment.  Isn't "-device virtio-gpu-rutabaga-pci"
> > (along with "-device virtio-gpu-rutabaga-device") an alias for the
> > rutabaga device?  Where would the alias be placed in the doc (we don't
> > explicitly list aliases for other devices either), outside the
> > "..parsed-literal::" launch command?
>
> virtio-gpu-gl should work, and you need add an alias definition to get
> virtio-gpu-rutabaga work.
>

I see the 

Re: [PATCH v6 12/12] target/riscv: deprecate the 'any' CPU type

2023-08-10 Thread Alistair Francis
On Thu, Jul 27, 2023 at 6:39 PM Daniel Henrique Barboza
 wrote:
>
> The 'any' CPU type was introduced in commit dc5bd18fa5725 ("RISC-V CPU
> Core Definition"), being around since the beginning. It's not an easy
> CPU to use: it's undocumented and its name doesn't tell users much about
> what the CPU is supposed to bring. 'git log' doesn't help us either in
> knowing what was the original design of this CPU type.
>
> The closest we have is a comment from Alistair [1] where he recalls from
> memory that the 'any' CPU is supposed to behave like the newly added
> 'max' CPU. He also suggested that the 'any' CPU should be removed.
>
> The default CPUs are rv32 and rv64, so removing the 'any' CPU will have
> impact only on users that might have a script that uses '-cpu any'.
> And those users are better off using the default CPUs or the new 'max'
> CPU.
>
> We would love to just remove the code and be done with it, but one does
> not simply remove a feature in QEMU. We'll put the CPU in quarantine
> first, letting users know that we have the intent of removing it in the
> future.
>
> [1] https://lists.gnu.org/archive/html/qemu-devel/2023-07/msg02891.html
>
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  docs/about/deprecated.rst | 12 
>  target/riscv/cpu.c|  5 +
>  2 files changed, 17 insertions(+)
>
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 02ea5a839f..68afa43fd0 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -371,6 +371,18 @@ QEMU's ``vhost`` feature, which would eliminate the high 
> latency costs under
>  which the 9p ``proxy`` backend currently suffers. However as of to date 
> nobody
>  has indicated plans for such kind of reimplemention unfortunately.
>
> +RISC-V 'any' CPU type ``-cpu any`` (since 8.2)
> +^^
> +
> +The 'any' CPU type was introduced back in 2018 and has been around since the
> +initial RISC-V QEMU port. Its usage has always been unclear: users don't know
> +what to expect from a CPU called 'any', and in fact the CPU does not do 
> anything
> +special that isn't already done by the default CPUs rv32/rv64.
> +
> +After the introduction of the 'max' CPU type RISC-V now has a good coverage
> +of generic CPUs: rv32 and rv64 as default CPUs and 'max' as a feature 
> complete
> +CPU for both 32 and 64 bit builds. Users are then discouraged from using the 
> 'any'
> +CPU type starting in 8.2.
>
>  Block device options
>  
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 3e840f1a20..b5a2266eef 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1477,6 +1477,11 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
> **errp)
>  RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
>  Error *local_err = NULL;
>
> +if (object_dynamic_cast(OBJECT(dev), TYPE_RISCV_CPU_ANY) != NULL) {
> +warn_report("The 'any' CPU is deprecated and will be "
> +"removed in the future.");
> +}
> +
>  cpu_exec_realizefn(cs, &local_err);
>  if (local_err != NULL) {
>  error_propagate(errp, local_err);
> --
> 2.41.0
>
>



Re: [PATCH] hw/pci-host: Allow extended config space access for Designware PCIe host

2023-08-10 Thread Michael S. Tsirkin
On Wed, Aug 09, 2023 at 10:22:50AM +, Jason Chien wrote:
> In pcie_bus_realize(), a root bus is realized as a PCIe bus and a non-root
> bus is realized as a PCIe bus if its parent bus is a PCIe bus. However,
> the child bus "dw-pcie" is realized before the parent bus "pcie" which is
> the root PCIe bus. Thus, the extended configuration space is not accessible
> on "dw-pcie". The issue can be resolved by adding the
> PCI_BUS_EXTENDED_CONFIG_SPACE flag to "pcie" before "dw-pcie" is realized.
> 
> Signed-off-by: Jason Chien 


Acked-by: Michael S. Tsirkin 

I'm not planning another pull before release, hopefully
another maintainer can pick it up? Peter?

> ---
>  hw/pci-host/designware.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
> index 9e183caa48..388d252ee2 100644
> --- a/hw/pci-host/designware.c
> +++ b/hw/pci-host/designware.c
> @@ -694,6 +694,7 @@ static void designware_pcie_host_realize(DeviceState 
> *dev, Error **errp)
>   &s->pci.io,
>   0, 4,
>   TYPE_PCIE_BUS);
> +pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE;
>  
>  memory_region_init(&s->pci.address_space_root,
> OBJECT(s),
> -- 
> 2.17.1




Re: [PATCH v6 04/12] target/riscv/cpu.c: del DEFINE_PROP_END_OF_LIST() from riscv_cpu_extensions

2023-08-10 Thread Alistair Francis
On Thu, Jul 27, 2023 at 6:20 PM Daniel Henrique Barboza
 wrote:
>
> This last blank element is used by the 'for' loop to check if a property
> has a valid name.
>
> Remove it and use ARRAY_SIZE() instead like riscv_cpu_options is already
> using. All future arrays will also do the same and we'll able to
> encapsulate more repetitions in macros later on.

Is this the right approach? This seems different from the rest of QEMU.

Alistair

>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Weiwei Li 
> ---
>  target/riscv/cpu.c | 12 
>  1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index f1a292d967..33a2e9328c 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1842,8 +1842,6 @@ static Property riscv_cpu_extensions[] = {
>  DEFINE_PROP_BOOL("x-zfbfmin", RISCVCPU, cfg.ext_zfbfmin, false),
>  DEFINE_PROP_BOOL("x-zvfbfmin", RISCVCPU, cfg.ext_zvfbfmin, false),
>  DEFINE_PROP_BOOL("x-zvfbfwma", RISCVCPU, cfg.ext_zvfbfwma, false),
> -
> -DEFINE_PROP_END_OF_LIST(),
>  };
>
>  static Property riscv_cpu_options[] = {
> @@ -1901,14 +1899,13 @@ static void riscv_cpu_add_kvm_unavail_prop(Object 
> *obj, const char *prop_name)
>
>  static void riscv_cpu_add_kvm_properties(Object *obj)
>  {
> -Property *prop;
>  DeviceState *dev = DEVICE(obj);
>
>  kvm_riscv_init_user_properties(obj);
>  riscv_cpu_add_misa_properties(obj);
>
> -for (prop = riscv_cpu_extensions; prop && prop->name; prop++) {
> -riscv_cpu_add_kvm_unavail_prop(obj, prop->name);
> +for (int i = 0; i < ARRAY_SIZE(riscv_cpu_extensions); i++) {
> +riscv_cpu_add_kvm_unavail_prop(obj, riscv_cpu_extensions[i].name);
>  }
>
>  for (int i = 0; i < ARRAY_SIZE(riscv_cpu_options); i++) {
> @@ -1929,7 +1926,6 @@ static void riscv_cpu_add_kvm_properties(Object *obj)
>   */
>  static void riscv_cpu_add_user_properties(Object *obj)
>  {
> -Property *prop;
>  DeviceState *dev = DEVICE(obj);
>
>  #ifndef CONFIG_USER_ONLY
> @@ -1943,8 +1939,8 @@ static void riscv_cpu_add_user_properties(Object *obj)
>
>  riscv_cpu_add_misa_properties(obj);
>
> -for (prop = riscv_cpu_extensions; prop && prop->name; prop++) {
> -qdev_property_add_static(dev, prop);
> +for (int i = 0; i < ARRAY_SIZE(riscv_cpu_extensions); i++) {
> +qdev_property_add_static(dev, &riscv_cpu_extensions[i]);
>  }
>
>  for (int i = 0; i < ARRAY_SIZE(riscv_cpu_options); i++) {
> --
> 2.41.0
>
>



Re: [PATCH] hw/pci-host: Allow extended config space access for Designware PCIe host

2023-08-10 Thread Jason Chien
The patch link:
https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg02162.html

On Fri, Aug 11, 2023 at 1:44 AM Michael S. Tsirkin  wrote:

> On Fri, Aug 11, 2023 at 01:22:08AM +0800, Jason Chien wrote:
> > As far as I know, the order issue is caused by nested device
> realization. In
> > this case, realizing TYPE_DESIGNWARE_PCIE_HOST will also
> > realize TYPE_DESIGNWARE_PCIE_ROOT(see designware_pcie_host_realize()).
> > device_set_realized() is the function that realizing a device must go
> through,
> > and this function first realizes the device by dc->realize() and then
> realizes
> > the device's child bus by qbus_realize(). Whether there is any child bus
> of the
> > device may depend on dc->realize(). The realization flow will be like a
> > recursive call to device_set_realized(). More precisely, the flow in
> this case
> > is: qdev_realize() --> ... --> FIRST device_set_realized() --> FIRST dc->
> > realize() --> ... --> designware_pcie_host_realize() --> qdev_realize()
> --> ...
> > --> SECOND device_set_realized() --> SECOND dc->realize() --> ... -->
> >  designware_pcie_root_realize() --> ...--> back to the SECOND
> > device_set_realized() --> SECOND qbus_realize() the CHILD bus "dw-pcie"
> --> ...
> > --> back to the FIRST device_set_realized() --> FIRST qbus_realize() the
> PARENT
> > bus "pcie".
> >
> > I also found this patch that solves the same bus issue.
>
> Which patch?
>
> > Do you have any suggestions on the order of realization? Thanks!
>
>
> I see. It's not easy to fix. Worth thinking about but I guess your
> patch is ok for now.
>
> > On Thu, Aug 10, 2023 at 5:24 AM Michael S. Tsirkin 
> wrote:
> >
> > On Wed, Aug 09, 2023 at 10:22:50AM +, Jason Chien wrote:
> > > In pcie_bus_realize(), a root bus is realized as a PCIe bus and a
> > non-root
> > > bus is realized as a PCIe bus if its parent bus is a PCIe bus.
> However,
> > > the child bus "dw-pcie" is realized before the parent bus "pcie"
> which is
> > > the root PCIe bus. Thus, the extended configuration space is not
> > accessible
> > > on "dw-pcie". The issue can be resolved by adding the
> > > PCI_BUS_EXTENDED_CONFIG_SPACE flag to "pcie" before "dw-pcie" is
> > realized.
> > >
> > > Signed-off-by: Jason Chien 
> >
> > I think we should fix the order of initialization rather than
> > hack around it.
> >
> > > ---
> > >  hw/pci-host/designware.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
> > > index 9e183caa48..388d252ee2 100644
> > > --- a/hw/pci-host/designware.c
> > > +++ b/hw/pci-host/designware.c
> > > @@ -694,6 +694,7 @@ static void
> designware_pcie_host_realize(DeviceState
> > *dev, Error **errp)
> > >   >pci.io,
> > >   0, 4,
> > >   TYPE_PCIE_BUS);
> > > +pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE;
> > >
> > >  memory_region_init(>pci.address_space_root,
> > > OBJECT(s),
> > > --
> > > 2.17.1
> >
> >
>
>


[PATCH v5 11/17] nbd/client: Plumb errp through nbd_receive_replies

2023-08-10 Thread Eric Blake
Instead of ignoring the low-level error just to refabricate our own
message to pass to the caller, we can just plumb the caller's errp
down to the low level.

Signed-off-by: Eric Blake 
---

v5: set errp on more failure cases [Vladimir], typo fix

v4: new patch [Vladimir]
---
 block/nbd.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 57123c17f94..4b60b832b70 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -417,7 +417,8 @@ static void coroutine_fn GRAPH_RDLOCK 
nbd_reconnect_attempt(BDRVNBDState *s)
 reconnect_delay_timer_del(s);
 }

-static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie)
+static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t cookie,
+Error **errp)
 {
 int ret;
 uint64_t ind = COOKIE_TO_INDEX(cookie), ind2;
@@ -458,20 +459,25 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie)

 /* We are under mutex and cookie is 0. We have to do the dirty work. */
 assert(s->reply.cookie == 0);
-ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL);
-if (ret <= 0) {
-ret = ret ? ret : -EIO;
+ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, errp);
+if (ret == 0) {
+ret = -EIO;
+error_setg(errp, "server dropped connection");
+}
+if (ret < 0) {
 nbd_channel_error(s, ret);
 return ret;
 }
 if (nbd_reply_is_structured(&s->reply) &&
 s->info.mode < NBD_MODE_STRUCTURED) {
 nbd_channel_error(s, -EINVAL);
+error_setg(errp, "unexpected structured reply");
 return -EINVAL;
 }
 ind2 = COOKIE_TO_INDEX(s->reply.cookie);
 if (ind2 >= MAX_NBD_REQUESTS || !s->requests[ind2].coroutine) {
 nbd_channel_error(s, -EINVAL);
+error_setg(errp, "unexpected cookie value");
 return -EINVAL;
 }
 if (s->reply.cookie == cookie) {
@@ -843,9 +849,9 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 }
 *request_ret = 0;

-ret = nbd_receive_replies(s, cookie);
+ret = nbd_receive_replies(s, cookie, errp);
 if (ret < 0) {
-error_setg(errp, "Connection closed");
+error_prepend(errp, "Connection closed: ");
 return -EIO;
 }
 assert(s->ioc);
-- 
2.41.0




[PATCH v5 07/17] nbd/server: Prepare to receive extended header requests

2023-08-10 Thread Eric Blake
Although extended mode is not yet enabled, once we do turn it on, we
need to accept extended requests for all messages.  Previous patches
have already taken care of supporting 64-bit lengths, now we just need
to read it off the wire.

Note that this implementation will block indefinitely on a buggy
client that sends a non-extended payload (that is, we try to read a
full packet before we ever check the magic number, but a client that
mistakenly sends a simple request after negotiating extended headers
doesn't send us enough bytes), but it's no different from any other
client that stops talking to us partway through a packet and thus not
worth coding around.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v5: no change

v4: new patch, split out from v3 9/14
---
 nbd/nbd-internal.h |  5 -
 nbd/server.c   | 43 ++-
 2 files changed, 34 insertions(+), 14 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 133b1d94b50..dfa02f77ee4 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -34,8 +34,11 @@
  * https://github.com/yoe/nbd/blob/master/doc/proto.md
  */

-/* Size of all NBD_OPT_*, without payload */
+/* Size of all compact NBD_CMD_*, without payload */
 #define NBD_REQUEST_SIZE(4 + 2 + 2 + 8 + 8 + 4)
+/* Size of all extended NBD_CMD_*, without payload */
+#define NBD_EXTENDED_REQUEST_SIZE   (4 + 2 + 2 + 8 + 8 + 8)
+
 /* Size of all NBD_REP_* sent in answer to most NBD_OPT_*, without payload */
 #define NBD_REPLY_SIZE  (4 + 4 + 8)
 /* Size of reply to NBD_OPT_EXPORT_NAME */
diff --git a/nbd/server.c b/nbd/server.c
index 9b7fb3c55ae..566afe9527c 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1415,11 +1415,13 @@ nbd_read_eof(NBDClient *client, void *buffer, size_t 
size, Error **errp)
 static int coroutine_fn nbd_receive_request(NBDClient *client, NBDRequest 
*request,
 Error **errp)
 {
-uint8_t buf[NBD_REQUEST_SIZE];
-uint32_t magic;
+uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+uint32_t magic, expect;
 int ret;
+size_t size = client->mode >= NBD_MODE_EXTENDED ?
+NBD_EXTENDED_REQUEST_SIZE : NBD_REQUEST_SIZE;

-ret = nbd_read_eof(client, buf, sizeof(buf), errp);
+ret = nbd_read_eof(client, buf, size, errp);
 if (ret < 0) {
 return ret;
 }
@@ -1427,13 +1429,21 @@ static int coroutine_fn nbd_receive_request(NBDClient 
*client, NBDRequest *reque
 return -EIO;
 }

-/* Request
-   [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
-   [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
-   [ 6 ..  7]   type(NBD_CMD_READ, ...)
-   [ 8 .. 15]   cookie
-   [16 .. 23]   from
-   [24 .. 27]   len
+/*
+ * Compact request
+ *  [ 0 ..  3]   magic   (NBD_REQUEST_MAGIC)
+ *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, ...)
+ *  [ 6 ..  7]   type(NBD_CMD_READ, ...)
+ *  [ 8 .. 15]   cookie
+ *  [16 .. 23]   from
+ *  [24 .. 27]   len
+ * Extended request
+ *  [ 0 ..  3]   magic   (NBD_EXTENDED_REQUEST_MAGIC)
+ *  [ 4 ..  5]   flags   (NBD_CMD_FLAG_FUA, NBD_CMD_FLAG_PAYLOAD_LEN, ...)
+ *  [ 6 ..  7]   type(NBD_CMD_READ, ...)
+ *  [ 8 .. 15]   cookie
+ *  [16 .. 23]   from
+ *  [24 .. 31]   len
  */

 magic = ldl_be_p(buf);
@@ -1441,13 +1451,20 @@ static int coroutine_fn nbd_receive_request(NBDClient 
*client, NBDRequest *reque
 request->type   = lduw_be_p(buf + 6);
 request->cookie = ldq_be_p(buf + 8);
 request->from   = ldq_be_p(buf + 16);
-request->len= ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+if (client->mode >= NBD_MODE_EXTENDED) {
+request->len = ldq_be_p(buf + 24);
+expect = NBD_EXTENDED_REQUEST_MAGIC;
+} else {
+request->len = ldl_be_p(buf + 24); /* widen 32 to 64 bits */
+expect = NBD_REQUEST_MAGIC;
+}

 trace_nbd_receive_request(magic, request->flags, request->type,
   request->from, request->len);

-if (magic != NBD_REQUEST_MAGIC) {
-error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
+if (magic != expect) {
+error_setg(errp, "invalid magic (got 0x%" PRIx32 ", expected 0x%"
+   PRIx32 ")", magic, expect);
 return -EINVAL;
 }
 return 0;
-- 
2.41.0




[PATCH v5 04/17] nbd: Prepare for 64-bit request effect lengths

2023-08-10 Thread Eric Blake
Widen the length field of NBDRequest to 64-bits, although we can
assert that all current uses are still under 32 bits: either because
of NBD_MAX_BUFFER_SIZE which is even smaller (and where size_t can
still be appropriate, even on 32-bit platforms), or because nothing
ever puts us into NBD_MODE_EXTENDED yet (and while future patches will
allow larger transactions, the lengths in play here are still capped
at 32-bit).  Thus no semantic change.

Signed-off-by: Eric Blake 
---

v5: tweak commit message, adjust a few more spots [Vladimir].

v4: split off enum changes to earlier patches [Vladimir]
---
 include/block/nbd.h |  4 ++--
 block/nbd.c | 25 +++--
 nbd/client.c|  1 +
 nbd/server.c| 21 ++---
 block/trace-events  |  2 +-
 nbd/trace-events| 14 +++---
 6 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index b2fb8ab44d5..ec4e8eda6bd 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -71,8 +71,8 @@ typedef enum NBDMode {
  */
 typedef struct NBDRequest {
 uint64_t cookie;
-uint64_t from;
-uint32_t len;
+uint64_t from;  /* Offset touched by the command */
+uint64_t len;   /* Effect length; 32 bit limit without extended headers */
 uint16_t flags; /* NBD_CMD_FLAG_* */
 uint16_t type;  /* NBD_CMD_* */
 NBDMode mode;   /* Determines which network representation to use */
diff --git a/block/nbd.c b/block/nbd.c
index c7581794873..57123c17f94 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1306,10 +1306,11 @@ nbd_client_co_pwrite_zeroes(BlockDriverState *bs, 
int64_t offset, int64_t bytes,
 NBDRequest request = {
 .type = NBD_CMD_WRITE_ZEROES,
 .from = offset,
-.len = bytes,  /* .len is uint32_t actually */
+.len = bytes,
 };

-assert(bytes <= UINT32_MAX); /* rely on max_pwrite_zeroes */
+/* rely on max_pwrite_zeroes */
+assert(bytes <= UINT32_MAX || s->info.mode >= NBD_MODE_EXTENDED);

 assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
 if (!(s->info.flags & NBD_FLAG_SEND_WRITE_ZEROES)) {
@@ -1356,10 +1357,11 @@ nbd_client_co_pdiscard(BlockDriverState *bs, int64_t 
offset, int64_t bytes)
 NBDRequest request = {
 .type = NBD_CMD_TRIM,
 .from = offset,
-.len = bytes, /* len is uint32_t */
+.len = bytes,
 };

-assert(bytes <= UINT32_MAX); /* rely on max_pdiscard */
+/* rely on max_pdiscard */
+assert(bytes <= UINT32_MAX || s->info.mode >= NBD_MODE_EXTENDED);

 assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
 if (!(s->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
@@ -1381,8 +1383,7 @@ static int coroutine_fn GRAPH_RDLOCK 
nbd_client_co_block_status(
 NBDRequest request = {
 .type = NBD_CMD_BLOCK_STATUS,
 .from = offset,
-.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
-   MIN(bytes, s->info.size - offset)),
+.len = MIN(bytes, s->info.size - offset),
 .flags = NBD_CMD_FLAG_REQ_ONE,
 };

@@ -1392,6 +1393,10 @@ static int coroutine_fn GRAPH_RDLOCK 
nbd_client_co_block_status(
 *file = bs;
 return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
 }
+if (s->info.mode < NBD_MODE_EXTENDED) {
+request.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
+  request.len);
+}

 /*
  * Work around the fact that the block layer doesn't do
@@ -1956,6 +1961,14 @@ static void nbd_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_pwrite_zeroes = max;
 bs->bl.max_transfer = max;

+/*
+ * Assume that if the server supports extended headers, it also
+ * supports unlimited size zero and trim commands.
+ */
+if (s->info.mode >= NBD_MODE_EXTENDED) {
+bs->bl.max_pdiscard = bs->bl.max_pwrite_zeroes = 0;
+}
+
 if (s->info.opt_block &&
 s->info.opt_block > bs->bl.opt_transfer) {
 bs->bl.opt_transfer = s->info.opt_block;
diff --git a/nbd/client.c b/nbd/client.c
index 40a1eb72346..1495a9b0ab1 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1355,6 +1355,7 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
 uint8_t buf[NBD_REQUEST_SIZE];

 assert(request->mode <= NBD_MODE_STRUCTURED); /* TODO handle extended */
+assert(request->len <= UINT32_MAX);
 trace_nbd_send_request(request->from, request->len, request->cookie,
request->flags, request->type,
nbd_cmd_lookup(request->type));
diff --git a/nbd/server.c b/nbd/server.c
index 97403db2e07..db8f5943139 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1441,7 +1441,7 @@ static int coroutine_fn nbd_receive_request(NBDClient 
*client, NBDRequest *reque
 request->type   = lduw_be_p(buf + 6);
 request->cookie = ldq_be_p(buf + 8);
 request->from   = ldq_be_p(buf + 16);
-request->len= ldl_be_p(buf + 24);
+  

[PATCH v5 08/17] nbd/server: Prepare to send extended header replies

2023-08-10 Thread Eric Blake
Although extended mode is not yet enabled, once we do turn it on, we
need to reply with extended headers to all messages.  Update the low
level entry points necessary so that all other callers automatically
get the right header based on the current mode.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v5: s/iov->iov_len/iov[0].iov_len/ [Vladimir], add R-b

v4: new patch, split out from v3 9/14
---
 nbd/server.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 566afe9527c..5c06a6466ec 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1950,8 +1950,6 @@ static inline void set_be_chunk(NBDClient *client, struct 
iovec *iov,
 size_t niov, uint16_t flags, uint16_t type,
 NBDRequest *request)
 {
-/* TODO - handle structured vs. extended replies */
-NBDStructuredReplyChunk *chunk = iov->iov_base;
 size_t i, length = 0;

 for (i = 1; i < niov; i++) {
@@ -1959,12 +1957,26 @@ static inline void set_be_chunk(NBDClient *client, 
struct iovec *iov,
 }
 assert(length <= NBD_MAX_BUFFER_SIZE + sizeof(NBDStructuredReadData));

-iov[0].iov_len = sizeof(*chunk);
-stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
-stw_be_p(&chunk->flags, flags);
-stw_be_p(&chunk->type, type);
-stq_be_p(&chunk->cookie, request->cookie);
-stl_be_p(&chunk->length, length);
+if (client->mode >= NBD_MODE_EXTENDED) {
+NBDExtendedReplyChunk *chunk = iov->iov_base;
+
+iov[0].iov_len = sizeof(*chunk);
+stl_be_p(&chunk->magic, NBD_EXTENDED_REPLY_MAGIC);
+stw_be_p(&chunk->flags, flags);
+stw_be_p(&chunk->type, type);
+stq_be_p(&chunk->cookie, request->cookie);
+stq_be_p(&chunk->offset, request->from);
+stq_be_p(&chunk->length, length);
+} else {
+NBDStructuredReplyChunk *chunk = iov->iov_base;
+
+iov[0].iov_len = sizeof(*chunk);
+stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
+stw_be_p(&chunk->flags, flags);
+stw_be_p(&chunk->type, type);
+stq_be_p(&chunk->cookie, request->cookie);
+stl_be_p(&chunk->length, length);
+}
 }

 static int coroutine_fn nbd_co_send_chunk_done(NBDClient *client,
@@ -2515,6 +2527,8 @@ static coroutine_fn int nbd_send_generic_reply(NBDClient 
*client,
 {
 if (client->mode >= NBD_MODE_STRUCTURED && ret < 0) {
 return nbd_co_send_chunk_error(client, request, -ret, error_msg, errp);
+} else if (client->mode >= NBD_MODE_EXTENDED) {
+return nbd_co_send_chunk_done(client, request, errp);
 } else {
 return nbd_co_send_simple_reply(client, request, ret < 0 ? -ret : 0,
 NULL, 0, errp);
-- 
2.41.0




[PATCH v5 12/17] nbd/client: Initial support for extended headers

2023-08-10 Thread Eric Blake
Update the client code to be able to send an extended request, and
parse an extended header from the server.  Note that since we reject
any structured reply with a too-large payload, we can always normalize
a valid header back into the compact form, so that the caller need not
deal with two branches of a union.  Still, until a later patch lets
the client negotiate extended headers, the code added here should not
be reached.  Note that because of the different magic numbers, it is
just as easy to trace and then tolerate a non-compliant server sending
the wrong header reply as it would be to insist that the server is
compliant.
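
(Illustration only, not part of the patch.) A minimal sketch of the normalization idea, using invented names (normalize(), compact_hdr) rather than the qemu ones: once the magic has identified which header arrived, either form can be folded into one in-memory shape so callers never deal with a union.

#include <stdint.h>

/* Host-order views of the two chunk headers, after the 4-byte magic has
 * been consumed and the remaining fields byte-swapped by the transport. */
struct structured_hdr { uint16_t flags, type; uint64_t cookie; uint32_t length; };
struct extended_hdr   { uint16_t flags, type; uint64_t cookie, offset, length; };

/* The single compact form the rest of the client code consumes. */
struct compact_hdr    { uint16_t flags, type; uint64_t cookie, offset, length; };

static struct compact_hdr normalize(const void *hdr, int is_extended)
{
    struct compact_hdr out;

    if (is_extended) {
        const struct extended_hdr *e = hdr;
        out = (struct compact_hdr){ e->flags, e->type, e->cookie,
                                    e->offset, e->length };
    } else {
        const struct structured_hdr *s = hdr;
        out = (struct compact_hdr){ s->flags, s->type, s->cookie,
                                    0, s->length };    /* no offset field */
    }
    return out;
}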

Signed-off-by: Eric Blake 
---

v5: fix logic bug on error reporting [Vladimir]

v4: split off errp handling to separate patch [Vladimir], better
function naming [Vladimir]
---
 include/block/nbd.h |   3 +-
 block/nbd.c |   2 +-
 nbd/client.c| 104 +---
 nbd/trace-events|   3 +-
 4 files changed, 74 insertions(+), 38 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index ec4e8eda6bd..4e9ce679e37 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -390,7 +390,8 @@ int nbd_init(int fd, QIOChannelSocket *sioc, NBDExportInfo 
*info,
  Error **errp);
 int nbd_send_request(QIOChannel *ioc, NBDRequest *request);
 int coroutine_fn nbd_receive_reply(BlockDriverState *bs, QIOChannel *ioc,
-   NBDReply *reply, Error **errp);
+   NBDReply *reply, NBDMode mode,
+   Error **errp);
 int nbd_client(int fd);
 int nbd_disconnect(int fd);
 int nbd_errno_to_system_errno(int err);
diff --git a/block/nbd.c b/block/nbd.c
index 4b60b832b70..d60782b25c7 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -459,7 +459,7 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie,

 /* We are under mutex and cookie is 0. We have to do the dirty work. */
 assert(s->reply.cookie == 0);
-ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, errp);
+ret = nbd_receive_reply(s->bs, s->ioc, &s->reply, s->info.mode, errp);
 if (ret == 0) {
 ret = -EIO;
 error_setg(errp, "server dropped connection");
diff --git a/nbd/client.c b/nbd/client.c
index 1495a9b0ab1..e78d6c00f18 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1352,22 +1352,29 @@ int nbd_disconnect(int fd)

 int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
 {
-uint8_t buf[NBD_REQUEST_SIZE];
+uint8_t buf[NBD_EXTENDED_REQUEST_SIZE];
+size_t len;

-assert(request->mode <= NBD_MODE_STRUCTURED); /* TODO handle extended */
-assert(request->len <= UINT32_MAX);
 trace_nbd_send_request(request->from, request->len, request->cookie,
request->flags, request->type,
nbd_cmd_lookup(request->type));

-stl_be_p(buf, NBD_REQUEST_MAGIC);
 stw_be_p(buf + 4, request->flags);
 stw_be_p(buf + 6, request->type);
 stq_be_p(buf + 8, request->cookie);
 stq_be_p(buf + 16, request->from);
-stl_be_p(buf + 24, request->len);
+if (request->mode >= NBD_MODE_EXTENDED) {
+stl_be_p(buf, NBD_EXTENDED_REQUEST_MAGIC);
+stq_be_p(buf + 24, request->len);
+len = NBD_EXTENDED_REQUEST_SIZE;
+} else {
+assert(request->len <= UINT32_MAX);
+stl_be_p(buf, NBD_REQUEST_MAGIC);
+stl_be_p(buf + 24, request->len);
+len = NBD_REQUEST_SIZE;
+}

-return nbd_write(ioc, buf, sizeof(buf), NULL);
+return nbd_write(ioc, buf, len, NULL);
 }

 /* nbd_receive_simple_reply
@@ -1394,30 +1401,36 @@ static int nbd_receive_simple_reply(QIOChannel *ioc, 
NBDSimpleReply *reply,
 return 0;
 }

-/* nbd_receive_structured_reply_chunk
+/* nbd_receive_reply_chunk_header
  * Read structured reply chunk except magic field (which should be already
- * read).
+ * read).  Normalize into the compact form.
  * Payload is not read.
  */
-static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
-  NBDStructuredReplyChunk *chunk,
-  Error **errp)
+static int nbd_receive_reply_chunk_header(QIOChannel *ioc, NBDReply *chunk,
+  Error **errp)
 {
 int ret;
+size_t len;
+uint64_t payload_len;

-assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
+if (chunk->magic == NBD_STRUCTURED_REPLY_MAGIC) {
+len = sizeof(chunk->structured);
+} else {
+assert(chunk->magic == NBD_EXTENDED_REPLY_MAGIC);
+len = sizeof(chunk->extended);
+}

 ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
-   sizeof(*chunk) - sizeof(chunk->magic), "structured chunk",
+   len - sizeof(chunk->magic), "structured chunk",
errp);
 if (ret < 0) {
 return ret;
 }

-

[PATCH v5 06/17] nbd/server: Support a request payload

2023-08-10 Thread Eric Blake
Upcoming additions to support NBD 64-bit effect lengths allow for the
possibility to distinguish between payload length (capped at 32M) and
effect length (64 bits, although we generally assume 63 bits because
of off_t limitations).  Without that extension, only the NBD_CMD_WRITE
request has a payload; but with the extension, it makes sense to allow
at least NBD_CMD_BLOCK_STATUS to have both a payload and effect length
in a future patch (where the payload is a limited-size struct that in
turn gives the real effect length as well as a subset of known ids for
which status is requested).  Other future NBD commands may also have a
request payload, so the 64-bit extension introduces a new
NBD_CMD_FLAG_PAYLOAD_LEN that distinguishes between whether the header
length is a payload length or an effect length, rather than
hard-coding the decision based on the command; although a client
should never send a command with a payload without the negotiation
phase proving such extension is available, we are now able to
gracefully fail unexpected client payloads while keeping the
connection alive.  Note that we do not support the payload version of
BLOCK_STATUS yet.
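
(Illustration only; helper name and parameters are invented.) A hedged sketch of the decision being added here: the header length is interpreted either as a payload length (bytes that must be read, or drained, from the socket) or as an effect length (no bytes follow).

#include <stdbool.h>
#include <stdint.h>

/* Returns how many payload bytes follow the request header; sets *drop
 * when the payload must be read and discarded to keep the connection
 * alive (a command we do not yet accept a payload for). */
static uint64_t request_payload_len(bool extended_mode, bool payload_flag,
                                    bool is_write, uint64_t header_len,
                                    bool *drop)
{
    *drop = false;
    if (is_write) {
        /* WRITE always carries data; in extended mode the client should
         * also have set the payload flag (noncompliance is only traced). */
        return header_len;
    }
    if (extended_mode && payload_flag) {
        /* Payload on a non-WRITE command: not supported yet, drain it. */
        *drop = true;
        return header_len;
    }
    return 0;   /* header_len is an effect length, no payload follows */
}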

Signed-off-by: Eric Blake 
---

v5: retitled from v4 13/24, rewrite on top of previous patch's switch
statement [Vladimir]

v4: less indentation on several 'if's [Vladimir]
---
 nbd/server.c | 33 -
 nbd/trace-events |  1 +
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 795f7c86781..9b7fb3c55ae 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2334,7 +2334,8 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,
Error **errp)
 {
 NBDClient *client = req->client;
-bool check_length = false;
+bool extended_with_payload;
+bool check_length;
 bool check_rofs = false;
 bool allocate_buffer = false;
 unsigned payload_len = 0;
@@ -2350,6 +2351,9 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,

 trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
  nbd_cmd_lookup(request->type));
+check_length = extended_with_payload = client->mode >= NBD_MODE_EXTENDED &&
+request->flags & NBD_CMD_FLAG_PAYLOAD_LEN;
+
 switch (request->type) {
 case NBD_CMD_DISC:
 /* Special case: we're going to disconnect without a reply,
@@ -2366,6 +2370,14 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,
 break;

 case NBD_CMD_WRITE:
+if (client->mode >= NBD_MODE_EXTENDED) {
+if (!extended_with_payload) {
+/* The client is noncompliant. Trace it, but proceed. */
+trace_nbd_co_receive_ext_payload_compliance(request->from,
+request->len);
+}
+valid_flags |= NBD_CMD_FLAG_PAYLOAD_LEN;
+}
 payload_len = request->len;
 check_length = true;
 allocate_buffer = true;
@@ -2407,6 +2419,15 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,
request->len, NBD_MAX_BUFFER_SIZE);
 return -EINVAL;
 }
+if (extended_with_payload && !allocate_buffer) {
+/*
+ * For now, we don't support payloads on other commands; but
+ * we can keep the connection alive by ignoring the payload.
+ */
+assert(request->type != NBD_CMD_WRITE);
+payload_len = request->len;
+request->len = 0;
+}
 if (allocate_buffer) {
 /* READ, WRITE */
 req->data = blk_try_blockalign(client->exp->common.blk,
@@ -2417,10 +2438,12 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req,
 }
 }
 if (payload_len) {
-/* WRITE */
-assert(req->data);
-ret = nbd_read(client->ioc, req->data, payload_len,
-   "CMD_WRITE data", errp);
+if (req->data) {
+ret = nbd_read(client->ioc, req->data, payload_len,
+   "CMD_WRITE data", errp);
+} else {
+ret = nbd_drop(client->ioc, payload_len, errp);
+}
 if (ret < 0) {
 return -EIO;
 }
diff --git a/nbd/trace-events b/nbd/trace-events
index f9dccfcfb44..c1a3227613f 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -71,6 +71,7 @@ nbd_co_send_extents(uint64_t cookie, unsigned int extents, 
uint32_t id, uint64_t
 nbd_co_send_chunk_error(uint64_t cookie, int err, const char *errname, const 
char *msg) "Send structured error reply: cookie = %" PRIu64 ", error = %d (%s), 
msg = '%s'"
 nbd_co_receive_request_decode_type(uint64_t cookie, uint16_t type, const char 
*name) "Decoding type: cookie = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t cookie, uint64_t len) 
"Payload received: cookie = 

[PATCH v5 13/17] nbd/client: Accept 64-bit block status chunks

2023-08-10 Thread Eric Blake
Once extended mode is enabled, we need to accept 64-bit status replies
(even for replies that don't exceed a 32-bit length).  It is easier to
normalize narrow replies into wide format so that the rest of our code
only has to handle one width.  Although a server is non-compliant if
it sends a 64-bit reply in compact mode, or a 32-bit reply in extended
mode, it is still easy enough to tolerate these mismatches.

In normal execution, we are only requesting "base:allocation" which
never exceeds 32 bits for flag values. But during testing with
x-dirty-bitmap, we can force qemu to connect to some other context
that might have 64-bit status bit; however, we ignore those upper bits
(other than mapping qemu:allocation-depth into something that
'qemu-img map --output=json' can expose), and since that only affects
testing, we really don't bother with checking whether more than the
two least-significant bits are set.
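
(Illustrative sketch; type and helper names invented.) The widening step amounts to zero-extending narrow 32-bit extents into the 64-bit representation so the rest of the client only handles one width:

#include <stdint.h>

struct extent32 { uint32_t length; uint32_t flags; };
struct extent64 { uint64_t length; uint64_t flags; };

static void widen_extents(struct extent64 *dst, const struct extent32 *src,
                          unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        dst[i].length = src[i].length;   /* zero-extend */
        dst[i].flags  = src[i].flags;
    }
}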

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v5: factor out duplicate length calculation [Vladimir], add R-b

v4: tweak comments and error message about count mismatch, fix setting
of wide in loop [Vladimir]
---
 block/nbd.c| 49 --
 block/trace-events |  1 +
 2 files changed, 35 insertions(+), 15 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index d60782b25c7..d37f5425a0f 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -616,13 +616,17 @@ static int nbd_parse_offset_hole_payload(BDRVNBDState *s,
  */
 static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
  NBDStructuredReplyChunk *chunk,
- uint8_t *payload, uint64_t 
orig_length,
- NBDExtent32 *extent, Error **errp)
+ uint8_t *payload, bool wide,
+ uint64_t orig_length,
+ NBDExtent64 *extent, Error **errp)
 {
 uint32_t context_id;
+uint32_t count;
+size_t ext_len = wide ? sizeof(*extent) : sizeof(NBDExtent32);
+size_t pay_len = sizeof(context_id) + wide * sizeof(count) + ext_len;

 /* The server succeeded, so it must have sent [at least] one extent */
-if (chunk->length < sizeof(context_id) + sizeof(*extent)) {
+if (chunk->length < pay_len) {
 error_setg(errp, "Protocol error: invalid payload for "
  "NBD_REPLY_TYPE_BLOCK_STATUS");
 return -EINVAL;
@@ -637,8 +641,15 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
 return -EINVAL;
 }

-extent->length = payload_advance32(&payload);
-extent->flags = payload_advance32(&payload);
+if (wide) {
+count = payload_advance32(&payload);
+extent->length = payload_advance64(&payload);
+extent->flags = payload_advance64(&payload);
+} else {
+count = 0;
+extent->length = payload_advance32(&payload);
+extent->flags = payload_advance32(&payload);
+}

 if (extent->length == 0) {
 error_setg(errp, "Protocol error: server sent status chunk with "
@@ -659,7 +670,7 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
  * (always a safe status, even if it loses information).
  */
 if (s->info.min_block && !QEMU_IS_ALIGNED(extent->length,
-   s->info.min_block)) {
+  s->info.min_block)) {
 trace_nbd_parse_blockstatus_compliance("extent length is unaligned");
 if (extent->length > s->info.min_block) {
 extent->length = QEMU_ALIGN_DOWN(extent->length,
@@ -673,13 +684,15 @@ static int nbd_parse_blockstatus_payload(BDRVNBDState *s,
 /*
  * We used NBD_CMD_FLAG_REQ_ONE, so the server should not have
  * sent us any more than one extent, nor should it have included
- * status beyond our request in that extent. However, it's easy
- * enough to ignore the server's noncompliance without killing the
+ * status beyond our request in that extent. Furthermore, a wide
+ * server should have replied with an accurate count (we left
+ * count at 0 for a narrow server).  However, it's easy enough to
+ * ignore the server's noncompliance without killing the
  * connection; just ignore trailing extents, and clamp things to
  * the length of our request.
  */
-if (chunk->length > sizeof(context_id) + sizeof(*extent)) {
-trace_nbd_parse_blockstatus_compliance("more than one extent");
+if (count != wide || chunk->length > pay_len) {
+trace_nbd_parse_blockstatus_compliance("unexpected extent count");
 }
 if (extent->length > orig_length) {
 extent->length = orig_length;
@@ -1125,7 +1138,7 @@ nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t 
cookie,

 static int coroutine_fn
 nbd_co_receive_blockstatus_reply(BDRVNBDState *s, uint64_t cookie,
- uint64_t 

[PATCH v5 14/17] nbd/client: Request extended headers during negotiation

2023-08-10 Thread Eric Blake
All the pieces are in place for a client to finally request extended
headers.  Note that we must not request extended headers when qemu-nbd
is used to connect to the kernel module (as nbd.ko does not expect
them, but expects us to do the negotiation in userspace before handing
the socket over to the kernel), but there is no harm in all other
clients requesting them.

Extended headers are not essential to the information collected during
'qemu-nbd --list', but probing for it gives us one more piece of
information in that output.  Update the iotests affected by the new
line of output.
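
(Illustration only, simplified return convention.) The resulting negotiation order can be summarized in a small sketch: ask for the most capable mode first and step down on refusal.

enum mode { MODE_SIMPLE, MODE_STRUCTURED, MODE_EXTENDED };

/* try_option() stands in for sending one NBD_OPT_* request:
 * 1 = server acked, 0 = server refused, <0 = hard error. */
static int negotiate_mode(enum mode max_mode,
                          int (*try_option)(enum mode))
{
    if (max_mode >= MODE_EXTENDED) {
        int r = try_option(MODE_EXTENDED);
        if (r) {
            return r < 0 ? r : MODE_EXTENDED;
        }
    }
    if (max_mode >= MODE_STRUCTURED) {
        int r = try_option(MODE_STRUCTURED);
        if (r) {
            return r < 0 ? r : MODE_STRUCTURED;
        }
    }
    return MODE_SIMPLE;
}

Capping max_mode (as qemu-nbd does when handing the socket to nbd.ko) simply skips the first request.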

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v5: add R-b

v4: rebase to earlier changes, tweak commit message for why qemu-nbd
connection to /dev/nbd cannot use extended mode [Vladimir]
---
 nbd/client-connection.c   |  2 +-
 nbd/client.c  | 20 ++-
 qemu-nbd.c|  3 +++
 tests/qemu-iotests/223.out|  6 ++
 tests/qemu-iotests/233.out|  4 
 tests/qemu-iotests/241.out|  3 +++
 tests/qemu-iotests/307.out|  5 +
 .../tests/nbd-qemu-allocation.out |  1 +
 8 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 13e4cb6684b..d9d946da006 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -93,7 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const 
SocketAddress *saddr,
 .do_negotiation = do_negotiation,

 .initial_info.request_sizes = true,
-.initial_info.mode = NBD_MODE_STRUCTURED,
+.initial_info.mode = NBD_MODE_EXTENDED,
 .initial_info.base_allocation = true,
 .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
 .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index e78d6c00f18..4520c08049e 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -958,15 +958,23 @@ static int nbd_start_negotiate(AioContext *aio_context, 
QIOChannel *ioc,
 if (fixedNewStyle) {
 int result = 0;

+if (max_mode >= NBD_MODE_EXTENDED) {
+result = nbd_request_simple_option(ioc,
+   NBD_OPT_EXTENDED_HEADERS,
+   false, errp);
+if (result) {
+return result < 0 ? -EINVAL : NBD_MODE_EXTENDED;
+}
+}
 if (max_mode >= NBD_MODE_STRUCTURED) {
 result = nbd_request_simple_option(ioc,
NBD_OPT_STRUCTURED_REPLY,
false, errp);
-if (result < 0) {
-return -EINVAL;
+if (result) {
+return result < 0 ? -EINVAL : NBD_MODE_STRUCTURED;
 }
 }
-return result ? NBD_MODE_STRUCTURED : NBD_MODE_SIMPLE;
+return NBD_MODE_SIMPLE;
 } else {
 return NBD_MODE_EXPORT_NAME;
 }
@@ -1040,6 +1048,7 @@ int nbd_receive_negotiate(AioContext *aio_context, 
QIOChannel *ioc,
 }

 switch (info->mode) {
+case NBD_MODE_EXTENDED:
 case NBD_MODE_STRUCTURED:
 if (base_allocation) {
 result = nbd_negotiate_simple_meta_context(ioc, info, errp);
@@ -1150,7 +1159,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,

 *info = NULL;
 result = nbd_start_negotiate(NULL, ioc, tlscreds, hostname, &sioc,
- NBD_MODE_STRUCTURED, NULL, errp);
+ NBD_MODE_EXTENDED, NULL, errp);
 if (tlscreds && sioc) {
 ioc = sioc;
 }
@@ -1161,6 +1170,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 switch ((NBDMode)result) {
 case NBD_MODE_SIMPLE:
 case NBD_MODE_STRUCTURED:
+case NBD_MODE_EXTENDED:
 /* newstyle - use NBD_OPT_LIST to populate array, then try
  * NBD_OPT_INFO on each array member. If structured replies
  * are enabled, also try NBD_OPT_LIST_META_CONTEXT. */
@@ -1197,7 +1207,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 break;
 }

-if (result == NBD_MODE_STRUCTURED &&
+if (result >= NBD_MODE_STRUCTURED &&
 nbd_list_meta_contexts(ioc, &array[i], errp) < 0) {
 goto out;
 }
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 32c5a349e06..ca846f7d96d 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -237,6 +237,9 @@ static int qemu_nbd_client_list(SocketAddress *saddr, 
QCryptoTLSCreds *tls,
 printf("  opt block: %u\n", list[i].opt_block);
 printf("  max block: %u\n", list[i].max_block);
 

[PATCH v5 02/17] nbd/client: Pass mode through to nbd_send_request

2023-08-10 Thread Eric Blake
Once the 64-bit headers extension is enabled, the data layout we send
over the wire for a client request depends on the mode negotiated with
the server.  Rather than adding a parameter to nbd_send_request, we
can add a member to struct NBDRequest, since it already does not
reflect on-wire format.  Some callers initialize it directly; many
others rely on a common initialization point during
nbd_co_send_request().  At this point, there is no semantic change.
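
(Illustration, not qemu code.) Once the later patches land, the mode member is what selects the on-wire request layout; the sizes below are derived from the field offsets already used by nbd_send_request(), so treat the helper as a sketch.

#include <stddef.h>

enum mode { MODE_SIMPLE, MODE_STRUCTURED, MODE_EXTENDED };

/* Size of the request header eventually put on the wire:
 * magic(4) + flags(2) + type(2) + cookie(8) + offset(8) + length(4 or 8). */
static size_t request_wire_size(enum mode m)
{
    return m >= MODE_EXTENDED ? 32 : 28;
}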

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v5: R-b added
v4: new patch, based on ideas in v3 4/14, but by modifying NBDRequest
instead of adding a parameter
---
 include/block/nbd.h | 12 +++-
 block/nbd.c |  5 +++--
 nbd/client.c|  3 ++-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index abf6030b513..c4cbe130e07 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -63,17 +63,19 @@ typedef enum NBDMode {
 /* TODO add NBD_MODE_EXTENDED */
 } NBDMode;

-/* Transmission phase structs
- *
- * Note: these are _NOT_ the same as the network representation of an NBD
- * request and reply!
+/* Transmission phase structs */
+
+/*
+ * Note: NBDRequest is _NOT_ the same as the network representation of an NBD
+ * request!
  */
 typedef struct NBDRequest {
 uint64_t cookie;
 uint64_t from;
 uint32_t len;
 uint16_t flags; /* NBD_CMD_FLAG_* */
-uint16_t type; /* NBD_CMD_* */
+uint16_t type;  /* NBD_CMD_* */
+NBDMode mode;   /* Determines which network representation to use */
 } NBDRequest;

 typedef struct NBDSimpleReply {
diff --git a/block/nbd.c b/block/nbd.c
index 5f88f7a819b..ca5991f868a 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -339,7 +339,7 @@ int coroutine_fn 
nbd_co_do_establish_connection(BlockDriverState *bs,
  * We have connected, but must fail for other reasons.
  * Send NBD_CMD_DISC as a courtesy to the server.
  */
-NBDRequest request = { .type = NBD_CMD_DISC };
+NBDRequest request = { .type = NBD_CMD_DISC, .mode = s->info.mode };

 nbd_send_request(s->ioc, &request);

@@ -521,6 +521,7 @@ nbd_co_send_request(BlockDriverState *bs, NBDRequest 
*request,

 qemu_co_mutex_lock(>send_mutex);
 request->cookie = INDEX_TO_COOKIE(i);
+request->mode = s->info.mode;

 assert(s->ioc);

@@ -1466,7 +1467,7 @@ static void nbd_yank(void *opaque)
 static void nbd_client_close(BlockDriverState *bs)
 {
 BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
-NBDRequest request = { .type = NBD_CMD_DISC };
+NBDRequest request = { .type = NBD_CMD_DISC, .mode = s->info.mode };

 if (s->ioc) {
 nbd_send_request(s->ioc, &request);
diff --git a/nbd/client.c b/nbd/client.c
index faa054c4527..40a1eb72346 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -1224,7 +1224,7 @@ int nbd_receive_export_list(QIOChannel *ioc, 
QCryptoTLSCreds *tlscreds,
 /* Send NBD_CMD_DISC as a courtesy to the server, but ignore all
  * errors now that we have the information we wanted. */
 if (nbd_drop(ioc, 124, NULL) == 0) {
-NBDRequest request = { .type = NBD_CMD_DISC };
+NBDRequest request = { .type = NBD_CMD_DISC, .mode = result };

 nbd_send_request(ioc, &request);
 }
@@ -1354,6 +1354,7 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
 {
 uint8_t buf[NBD_REQUEST_SIZE];

+assert(request->mode <= NBD_MODE_STRUCTURED); /* TODO handle extended */
 trace_nbd_send_request(request->from, request->len, request->cookie,
request->flags, request->type,
nbd_cmd_lookup(request->type));
-- 
2.41.0




Re: [PATCH] hw/pci-host: Allow extended config space access for Designware PCIe host

2023-08-10 Thread Michael S. Tsirkin
On Fri, Aug 11, 2023 at 01:22:08AM +0800, Jason Chien wrote:
> As far as I know, the order issue is caused by nested device realization. In
> this case, realizing TYPE_DESIGNWARE_PCIE_HOST will also
> realize TYPE_DESIGNWARE_PCIE_ROOT (see designware_pcie_host_realize()).
> device_set_realized() is the function that every device realization must go through,
> and this function first realizes the device by dc->realize() and then realizes
> the device's child bus by qbus_realize(). Whether there is any child bus of 
> the
> device may depend on dc->realize(). The realization flow will be like a
> recursive call to device_set_realized(). More precisely, the flow in this case
> is: qdev_realize() --> ... --> FIRST device_set_realized() --> FIRST dc->
> realize() --> ... --> designware_pcie_host_realize() --> qdev_realize() --> 
> ...
> --> SECOND device_set_realized() --> SECOND dc->realize() --> ... -->
>  designware_pcie_root_realize() --> ...--> back to the SECOND
> device_set_realized() --> SECOND qbus_realize() the CHILD bus "dw-pcie" --> 
> ...
> --> back to the FIRST device_set_realized() --> FIRST qbus_realize() the 
> PARENT
> bus "pcie".
> 
> I also found this patch that solves the same bus issue.

Which patch?

> Do you have any suggestions on the order of realization? Thanks!


I see. It's not easy to fix. Worth thinking about but I guess your
patch is ok for now.

> On Thu, Aug 10, 2023 at 5:24 AM Michael S. Tsirkin  wrote:
> 
> On Wed, Aug 09, 2023 at 10:22:50AM +, Jason Chien wrote:
> > In pcie_bus_realize(), a root bus is realized as a PCIe bus and a
> non-root
> > bus is realized as a PCIe bus if its parent bus is a PCIe bus. However,
> > the child bus "dw-pcie" is realized before the parent bus "pcie" which 
> is
> > the root PCIe bus. Thus, the extended configuration space is not
> accessible
> > on "dw-pcie". The issue can be resolved by adding the
> > PCI_BUS_EXTENDED_CONFIG_SPACE flag to "pcie" before "dw-pcie" is
> realized.
> >
> > Signed-off-by: Jason Chien 
> 
> I think we should fix the order of initialization rather than
> hack around it.
> 
> > ---
> >  hw/pci-host/designware.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
> > index 9e183caa48..388d252ee2 100644
> > --- a/hw/pci-host/designware.c
> > +++ b/hw/pci-host/designware.c
> > @@ -694,6 +694,7 @@ static void designware_pcie_host_realize(DeviceState
> *dev, Error **errp)
> >                                       &s->pci.io,
> >                                       0, 4,
> >                                       TYPE_PCIE_BUS);
> > +    pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE;
> > 
> >      memory_region_init(&s->pci.address_space_root,
> >                         OBJECT(s),
> > --
> > 2.17.1
> 
> 




[PATCH v5 01/17] nbd: Replace bool structured_reply with mode enum

2023-08-10 Thread Eric Blake
The upcoming patches for 64-bit extensions requires various points in
the protocol to make decisions based on what was negotiated.  While we
could easily add a 'bool extended_headers' alongside the existing
'bool structured_reply', this does not scale well if more modes are
added in the future.  Better is to expose the mode enum added in the
recent commit bfe04d0a7d out to a wider use in the code base.

Where the code previously checked for structured_reply being set or
clear, it now prefers checking for an inequality; this works because
the modes are in a continuum of increasing abilities, and allows us to
touch fewer places if we ever insert other modes in the middle of the
enum.  There should be no semantic change in this patch.
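
(Illustrative names.) A tiny sketch of the idiom this enables: because the enum values are ordered by capability, feature checks become inequalities that keep working if new modes are inserted later.

enum NBDMode_sketch {              /* ordered from least to most capable */
    MODE_OLDSTYLE,
    MODE_EXPORT_NAME,
    MODE_SIMPLE,
    MODE_STRUCTURED,
    MODE_EXTENDED,                 /* added later in this series */
};

static int can_send_structured_reply(enum NBDMode_sketch m)
{
    /* Anything at or above STRUCTURED keeps working if further modes are
     * added above it, which is the point of >= instead of a bool per
     * feature. */
    return m >= MODE_STRUCTURED;
}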

Signed-off-by: Eric Blake 
---

v5: rebase to master, populate correct mode during server handshake
[Vladimir], fix stray comment leaked in commit 66d4f4fe

v4: new patch, expanding enum idea from v3 4/14
---
 include/block/nbd.h |  2 +-
 block/nbd.c |  8 +---
 nbd/client-connection.c |  4 ++--
 nbd/client.c| 18 +-
 nbd/server.c| 31 ++-
 qemu-nbd.c  |  4 +++-
 6 files changed, 38 insertions(+), 29 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 4428bcffbb9..abf6030b513 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -305,7 +305,7 @@ typedef struct NBDExportInfo {

 /* In-out fields, set by client before nbd_receive_negotiate() and
  * updated by server results during nbd_receive_negotiate() */
-bool structured_reply;
+NBDMode mode; /* input maximum mode tolerated; output actual mode chosen */
 bool base_allocation; /* base:allocation context for NBD_CMD_BLOCK_STATUS 
*/

 /* Set by server results during nbd_receive_negotiate() and
diff --git a/block/nbd.c b/block/nbd.c
index 5322e66166c..5f88f7a819b 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -464,7 +464,8 @@ static coroutine_fn int nbd_receive_replies(BDRVNBDState 
*s, uint64_t cookie)
 nbd_channel_error(s, ret);
 return ret;
 }
-if (nbd_reply_is_structured(&s->reply) && !s->info.structured_reply) {
+if (nbd_reply_is_structured(&s->reply) &&
+s->info.mode < NBD_MODE_STRUCTURED) {
 nbd_channel_error(s, -EINVAL);
 return -EINVAL;
 }
@@ -867,7 +868,7 @@ static coroutine_fn int nbd_co_do_receive_one_chunk(
 }

 /* handle structured reply chunk */
-assert(s->info.structured_reply);
+assert(s->info.mode >= NBD_MODE_STRUCTURED);
 chunk = &s->reply.structured;

 if (chunk->type == NBD_REPLY_TYPE_NONE) {
@@ -1071,7 +1072,8 @@ nbd_co_receive_cmdread_reply(BDRVNBDState *s, uint64_t 
cookie,
 void *payload = NULL;
 Error *local_err = NULL;

-NBD_FOREACH_REPLY_CHUNK(s, iter, cookie, s->info.structured_reply,
+NBD_FOREACH_REPLY_CHUNK(s, iter, cookie,
+s->info.mode >= NBD_MODE_STRUCTURED,
 qiov, &reply, &payload)
 {
 int ret;
diff --git a/nbd/client-connection.c b/nbd/client-connection.c
index 3d14296c042..13e4cb6684b 100644
--- a/nbd/client-connection.c
+++ b/nbd/client-connection.c
@@ -1,5 +1,5 @@
 /*
- * QEMU Block driver for  NBD
+ * QEMU Block driver for NBD
  *
  * Copyright (c) 2021 Virtuozzo International GmbH.
  *
@@ -93,7 +93,7 @@ NBDClientConnection *nbd_client_connection_new(const 
SocketAddress *saddr,
 .do_negotiation = do_negotiation,

 .initial_info.request_sizes = true,
-.initial_info.structured_reply = true,
+.initial_info.mode = NBD_MODE_STRUCTURED,
 .initial_info.base_allocation = true,
 .initial_info.x_dirty_bitmap = g_strdup(x_dirty_bitmap),
 .initial_info.name = g_strdup(export_name ?: "")
diff --git a/nbd/client.c b/nbd/client.c
index 479208d5d9d..faa054c4527 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -880,7 +880,7 @@ static int nbd_list_meta_contexts(QIOChannel *ioc,
 static int nbd_start_negotiate(AioContext *aio_context, QIOChannel *ioc,
QCryptoTLSCreds *tlscreds,
const char *hostname, QIOChannel **outioc,
-   bool structured_reply, bool *zeroes,
+   NBDMode max_mode, bool *zeroes,
Error **errp)
 {
 ERRP_GUARD();
@@ -958,7 +958,7 @@ static int nbd_start_negotiate(AioContext *aio_context, 
QIOChannel *ioc,
 if (fixedNewStyle) {
 int result = 0;

-if (structured_reply) {
+if (max_mode >= NBD_MODE_STRUCTURED) {
 result = nbd_request_simple_option(ioc,
NBD_OPT_STRUCTURED_REPLY,
false, errp);
@@ -1028,20 +1028,19 @@ int nbd_receive_negotiate(AioContext *aio_context, 
QIOChannel *ioc,
 

Re: [PATCH v6 03/12] target/riscv/cpu.c: split kvm prop handling to its own helper

2023-08-10 Thread Alistair Francis
On Thu, Jul 27, 2023 at 6:39 PM Daniel Henrique Barboza
 wrote:
>
> Future patches will split the existing Property arrays even further, and
> the existing code in riscv_cpu_add_user_properties() will start to scale
> bad with it because it's dealing with KVM constraints mixed in with TCG
> constraints. We're going to pay a high price to share a couple of common
> lines of code between the two.
>
> Create a new riscv_cpu_add_kvm_properties() that will be forked from
> riscv_cpu_add_user_properties() if we're running KVM. The helper
> includes all properties that a KVM CPU will add. The rest of
> riscv_cpu_add_user_properties() body will then be relieved from having
> to deal with KVM constraints.
>
> Signed-off-by: Daniel Henrique Barboza 
> Reviewed-by: Weiwei Li 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 65 ++
>  1 file changed, 42 insertions(+), 23 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 2fa2581742..f1a292d967 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1881,6 +1881,46 @@ static void cpu_set_cfg_unavailable(Object *obj, 
> Visitor *v,
>  }
>  #endif
>
> +#ifndef CONFIG_USER_ONLY
> +static void riscv_cpu_add_kvm_unavail_prop(Object *obj, const char 
> *prop_name)
> +{
> +/* Check if KVM created the property already */
> +if (object_property_find(obj, prop_name)) {
> +return;
> +}
> +
> +/*
> + * Set the default to disabled for every extension
> + * unknown to KVM and error out if the user attempts
> + * to enable any of them.
> + */
> +object_property_add(obj, prop_name, "bool",
> +NULL, cpu_set_cfg_unavailable,
> +NULL, (void *)prop_name);
> +}
> +
> +static void riscv_cpu_add_kvm_properties(Object *obj)
> +{
> +Property *prop;
> +DeviceState *dev = DEVICE(obj);
> +
> +kvm_riscv_init_user_properties(obj);
> +riscv_cpu_add_misa_properties(obj);
> +
> +for (prop = riscv_cpu_extensions; prop && prop->name; prop++) {
> +riscv_cpu_add_kvm_unavail_prop(obj, prop->name);
> +}
> +
> +for (int i = 0; i < ARRAY_SIZE(riscv_cpu_options); i++) {
> +/* Check if KVM created the property already */
> +if (object_property_find(obj, riscv_cpu_options[i].name)) {
> +continue;
> +}
> +qdev_property_add_static(dev, &riscv_cpu_options[i]);
> +}
> +}
> +#endif
> +
>  /*
>   * Add CPU properties with user-facing flags.
>   *
> @@ -1896,39 +1936,18 @@ static void riscv_cpu_add_user_properties(Object *obj)
>  riscv_add_satp_mode_properties(obj);
>
>  if (kvm_enabled()) {
> -kvm_riscv_init_user_properties(obj);
> +riscv_cpu_add_kvm_properties(obj);
> +return;
>  }
>  #endif
>
>  riscv_cpu_add_misa_properties(obj);
>
>  for (prop = riscv_cpu_extensions; prop && prop->name; prop++) {
> -#ifndef CONFIG_USER_ONLY
> -if (kvm_enabled()) {
> -/* Check if KVM created the property already */
> -if (object_property_find(obj, prop->name)) {
> -continue;
> -}
> -
> -/*
> - * Set the default to disabled for every extension
> - * unknown to KVM and error out if the user attempts
> - * to enable any of them.
> - */
> -object_property_add(obj, prop->name, "bool",
> -NULL, cpu_set_cfg_unavailable,
> -NULL, (void *)prop->name);
> -continue;
> -}
> -#endif
>  qdev_property_add_static(dev, prop);
>  }
>
>  for (int i = 0; i < ARRAY_SIZE(riscv_cpu_options); i++) {
> -/* Check if KVM created the property already */
> -if (object_property_find(obj, riscv_cpu_options[i].name)) {
> -continue;
> -}
>  qdev_property_add_static(dev, &riscv_cpu_options[i]);
>  }
>  }
> --
> 2.41.0
>
>



[PATCH v5 15/17] nbd/server: Refactor list of negotiated meta contexts

2023-08-10 Thread Eric Blake
Perform several minor refactorings of how the list of negotiated meta
contexts is managed, to make upcoming patches easier: Promote the
internal type NBDExportMetaContexts to the public opaque type
NBDMetaContexts, and mark exp const.  Use a shorter member name in
NBDClient.  Hoist calls to nbd_check_meta_context() earlier in their
callers, as the number of negotiated contexts may impact the flags
exposed in regards to an export, which in turn requires a new
parameter.  Drop a redundant parameter to nbd_negotiate_meta_queries.
No semantic change intended on the success path; on the failure path,
dropping context in nbd_check_meta_export even when reporting an error
is safer.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v5: rebase to master, tweak commit message [Vladimir], R-b added

v4: new patch split out from v3 13/14, with smaller impact (quit
trying to separate exp outside of NBDMeataContexts)
---
 include/block/nbd.h |  1 +
 nbd/server.c| 55 -
 2 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 4e9ce679e37..7643c321f36 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -29,6 +29,7 @@
 typedef struct NBDExport NBDExport;
 typedef struct NBDClient NBDClient;
 typedef struct NBDClientConnection NBDClientConnection;
+typedef struct NBDMetaContexts NBDMetaContexts;

 extern const BlockExportDriver blk_exp_nbd;

diff --git a/nbd/server.c b/nbd/server.c
index 8b48cdca1ef..76235347174 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -105,11 +105,13 @@ struct NBDExport {

 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);

-/* NBDExportMetaContexts represents a list of contexts to be exported,
+/*
+ * NBDMetaContexts represents a list of meta contexts in use,
  * as selected by NBD_OPT_SET_META_CONTEXT. Also used for
- * NBD_OPT_LIST_META_CONTEXT. */
-typedef struct NBDExportMetaContexts {
-NBDExport *exp;
+ * NBD_OPT_LIST_META_CONTEXT.
+ */
+struct NBDMetaContexts {
+const NBDExport *exp; /* associated export */
 size_t count; /* number of negotiated contexts */
 bool base_allocation; /* export base:allocation context (block status) */
 bool allocation_depth; /* export qemu:allocation-depth */
@@ -117,7 +119,7 @@ typedef struct NBDExportMetaContexts {
 * export qemu:dirty-bitmap:<name>,
 * sized by exp->nr_export_bitmaps
 */
-} NBDExportMetaContexts;
+};

 struct NBDClient {
 int refcount;
@@ -144,7 +146,7 @@ struct NBDClient {
 uint32_t check_align; /* If non-zero, check for aligned client requests */

 NBDMode mode;
-NBDExportMetaContexts export_meta;
+NBDMetaContexts contexts; /* Negotiated meta contexts */

 uint32_t opt; /* Current option being negotiated */
 uint32_t optlen; /* remaining length of data in ioc for the option being
@@ -455,10 +457,10 @@ static int nbd_negotiate_handle_list(NBDClient *client, 
Error **errp)
 return nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
 }

-static void nbd_check_meta_export(NBDClient *client)
+static void nbd_check_meta_export(NBDClient *client, NBDExport *exp)
 {
-if (client->exp != client->export_meta.exp) {
-client->export_meta.count = 0;
+if (exp != client->contexts.exp) {
+client->contexts.count = 0;
 }
 }

@@ -504,6 +506,7 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,
 error_setg(errp, "export not found");
 return -EINVAL;
 }
+nbd_check_meta_export(client, client->exp);

 myflags = client->exp->nbdflags;
 if (client->mode >= NBD_MODE_STRUCTURED) {
@@ -521,7 +524,6 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,

 QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
 blk_exp_ref(&client->exp->common);
-nbd_check_meta_export(client);

 return 0;
 }
@@ -641,6 +643,9 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
   errp, "export '%s' not present",
   sane_name);
 }
+if (client->opt == NBD_OPT_GO) {
+nbd_check_meta_export(client, exp);
+}

 /* Don't bother sending NBD_INFO_NAME unless client requested it */
 if (sendname) {
@@ -729,7 +734,6 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
 client->check_align = check_align;
 QTAILQ_INSERT_TAIL(&client->exp->clients, client, next);
 blk_exp_ref(&client->exp->common);
-nbd_check_meta_export(client);
 rc = 1;
 }
 return rc;
@@ -852,7 +856,7 @@ static bool nbd_strshift(const char **str, const char 
*prefix)
  * Handle queries to 'base' namespace. For now, only the base:allocation
  * context is available.  Return true if @query has been handled.
  */
-static bool nbd_meta_base_query(NBDClient *client, 

[PATCH v5 17/17] nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

2023-08-10 Thread Eric Blake
Allow a client to request a subset of negotiated meta contexts.  For
example, a client may ask to use a single connection to learn about
both block status and dirty bitmaps, but where the dirty bitmap
queries only need to be performed on a subset of the disk; forcing the
server to compute that information on block status queries in the rest
of the disk is wasted effort (both at the server, and on the amount of
traffic sent over the wire to be parsed and ignored by the client).

Qemu as an NBD client never requests to use more than one meta
context, so it has no need to use block status payloads.  Testing this
instead requires support from libnbd, which CAN access multiple meta
contexts in parallel from a single NBD connection; an interop test
submitted to the libnbd project at the same time as this patch
demonstrates the feature working, as well as testing some corner cases
(for example, when the payload length is longer than the export
length), although other corner cases (like passing the same id
duplicated) require a protocol fuzzer because libnbd is not wired up
to break the protocol that badly.

This also includes tweaks to 'qemu-nbd --list' to show when a server
is advertising the capability, and to the testsuite to reflect the
addition to that output.

Of note: qemu will always advertise the new feature bit during
NBD_OPT_INFO if extended headers have already been negotiated
(regardless of whether any NBD_OPT_SET_META_CONTEXT negotiation has
occurred); but for NBD_OPT_GO, qemu only advertises the feature if
block status is also enabled (that is, if the client does not
negotiate any contexts, then NBD_CMD_BLOCK_STATUS cannot be used, so
the feature is not advertised).
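
(Illustration only; the helper is invented and the real bounds checking, including matching ids against negotiated contexts, is in the patch below.) The client payload is an 8-byte effect length followed by 4-byte context ids, so a server's first sanity check is on shape:

#include <stdbool.h>
#include <stdint.h>

/* Shape check only: payload = 8-byte effect length + 4 bytes per id. */
static bool block_status_payload_shape_ok(uint64_t payload_len,
                                          uint64_t *nr_ids)
{
    if (payload_len < 8 || (payload_len - 8) % 4 != 0) {
        return false;
    }
    *nr_ids = (payload_len - 8) / 4;
    return true;
}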

Signed-off-by: Eric Blake 
---

v5: factor out 'id - NBD_MTA_ID_DIRTY_BITMAP' [Vladimir], rework logic
on zero-length requests to be clearer [Vladimir], rebase to earlier
changes
---
 docs/interop/nbd.txt  |   2 +-
 nbd/server.c  | 114 --
 qemu-nbd.c|   1 +
 nbd/trace-events  |   1 +
 tests/qemu-iotests/223.out|  12 +-
 tests/qemu-iotests/307.out|  10 +-
 .../tests/nbd-qemu-allocation.out |   2 +-
 7 files changed, 122 insertions(+), 20 deletions(-)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index 9aae5e1f294..18efb251de9 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,4 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
-* 8.2: NBD_OPT_EXTENDED_HEADERS
+* 8.2: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD
diff --git a/nbd/server.c b/nbd/server.c
index ed988aa6308..4d5416e3ffb 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -512,6 +512,9 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,
 if (client->mode >= NBD_MODE_STRUCTURED) {
 myflags |= NBD_FLAG_SEND_DF;
 }
+if (client->mode >= NBD_MODE_EXTENDED && client->contexts.count) {
+myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+}
 trace_nbd_negotiate_new_style_size_flags(client->exp->size, myflags);
 stq_be_p(buf, client->exp->size);
 stw_be_p(buf + 8, myflags);
@@ -699,6 +702,10 @@ static int nbd_negotiate_handle_info(NBDClient *client, 
Error **errp)
 if (client->mode >= NBD_MODE_STRUCTURED) {
 myflags |= NBD_FLAG_SEND_DF;
 }
+if (client->mode >= NBD_MODE_EXTENDED &&
+(client->contexts.count || client->opt == NBD_OPT_INFO)) {
+myflags |= NBD_FLAG_BLOCK_STAT_PAYLOAD;
+}
 trace_nbd_negotiate_new_style_size_flags(exp->size, myflags);
 stq_be_p(buf, exp->size);
 stw_be_p(buf + 8, myflags);
@@ -2432,6 +2439,87 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient 
*client,
 return nbd_co_send_extents(client, request, ea, last, context_id, errp);
 }

+/*
+ * nbd_co_block_status_payload_read
+ * Called when a client wants a subset of negotiated contexts via a
+ * BLOCK_STATUS payload.  Check the payload for valid length and
+ * contents.  On success, return 0 with request updated to effective
+ * length.  If request was invalid but all payload consumed, return 0
+ * with request->len and request->contexts->count set to 0 (which will
+ * trigger an appropriate NBD_EINVAL response later on).  Return
+ * negative errno if the payload was not fully consumed.
+ */
+static int
+nbd_co_block_status_payload_read(NBDClient *client, NBDRequest *request,
+ Error **errp)
+{
+int payload_len = request->len;
+g_autofree char *buf = NULL;
+size_t count, i, nr_bitmaps;
+uint32_t id;
+
+if (payload_len > NBD_MAX_BUFFER_SIZE) {
+error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
+   request->len, 

[PATCH v5 10/17] nbd/server: Enable initial support for extended headers

2023-08-10 Thread Eric Blake
Time to start supporting clients that request extended headers.  Now
we can finally reach the code added across several previous patches.

Even though the NBD spec has been altered to allow us to accept
NBD_CMD_READ larger than the max payload size (provided our response
is a hole or broken up over more than one data chunk), we are not
planning to take advantage of that, and continue to cap NBD_CMD_READ
to 32M regardless of header size.

For NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM, the block layer already
supports 64-bit operations without any effort on our part.  For
NBD_CMD_BLOCK_STATUS, the client's length is a hint, and the previous
patch took care of implementing the required
NBD_REPLY_TYPE_BLOCK_STATUS_EXT.

We do not yet support clients that want to do request payload
filtering of NBD_CMD_BLOCK_STATUS; that will be added in later
patches, but is not essential for qemu as a client since qemu only
requests the single context base:allocation.
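
In other words, the wider header field does not widen the server's buffering; a one-function sketch of the unchanged cap (illustrative, constant value matching the 32M limit mentioned above):

#include <stdbool.h>
#include <stdint.h>

#define MAX_BUFFER_SIZE (32 * 1024 * 1024)   /* payload cap, unchanged */

static bool read_len_acceptable(uint64_t len)
{
    /* 64-bit headers widen the length field, not the server's willingness
     * to buffer the reply; anything above 32M is still rejected. */
    return len <= MAX_BUFFER_SIZE;
}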

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v5: add R-b, s/8.1/8.2/

v4: split out parts into earlier patches, rebase to earlier changes,
simplify handling of generic replies, retitle (compare to v3 9/14)
---
 docs/interop/nbd.txt |  1 +
 nbd/server.c | 21 +
 2 files changed, 22 insertions(+)

diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index f5ca25174a6..9aae5e1f294 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -69,3 +69,4 @@ NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
 NBD_CMD_FLAG_FAST_ZERO
 * 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
 * 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
+* 8.2: NBD_OPT_EXTENDED_HEADERS
diff --git a/nbd/server.c b/nbd/server.c
index af41810e9e7..8b48cdca1ef 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -482,6 +482,10 @@ static int nbd_negotiate_handle_export_name(NBDClient 
*client, bool no_zeroes,
 [10 .. 133]   reserved (0) [unless no_zeroes]
  */
 trace_nbd_negotiate_handle_export_name();
+if (client->mode >= NBD_MODE_EXTENDED) {
+error_setg(errp, "Extended headers already negotiated");
+return -EINVAL;
+}
 if (client->optlen > NBD_MAX_STRING_SIZE) {
 error_setg(errp, "Bad length received");
 return -EINVAL;
@@ -1264,6 +1268,10 @@ static int nbd_negotiate_options(NBDClient *client, 
Error **errp)
 case NBD_OPT_STRUCTURED_REPLY:
 if (length) {
 ret = nbd_reject_length(client, false, errp);
+} else if (client->mode >= NBD_MODE_EXTENDED) {
+ret = nbd_negotiate_send_rep_err(
+client, NBD_REP_ERR_EXT_HEADER_REQD, errp,
+"extended headers already negotiated");
 } else if (client->mode >= NBD_MODE_STRUCTURED) {
 ret = nbd_negotiate_send_rep_err(
 client, NBD_REP_ERR_INVALID, errp,
@@ -1280,6 +1288,19 @@ static int nbd_negotiate_options(NBDClient *client, 
Error **errp)
  errp);
 break;

+case NBD_OPT_EXTENDED_HEADERS:
+if (length) {
+ret = nbd_reject_length(client, false, errp);
+} else if (client->mode >= NBD_MODE_EXTENDED) {
+ret = nbd_negotiate_send_rep_err(
+client, NBD_REP_ERR_INVALID, errp,
+"extended headers already negotiated");
+} else {
+ret = nbd_negotiate_send_rep(client, NBD_REP_ACK, errp);
+client->mode = NBD_MODE_EXTENDED;
+}
+break;
+
 default:
 ret = nbd_opt_drop(client, NBD_REP_ERR_UNSUP, errp,
"Unsupported option %" PRIu32 " (%s)",
-- 
2.41.0




[PATCH v5 03/17] nbd: Add types for extended headers

2023-08-10 Thread Eric Blake
Add the constants and structs necessary for later patches to start
implementing the NBD_OPT_EXTENDED_HEADERS extension in both the client
and server, matching recent upstream nbd.git (through commit
e6f3b94a934).  This patch does not change any existing behavior, but
merely sets the stage for upcoming patches.

This patch does not change the status quo that neither the client nor
server use a packed-struct representation for the request header.
While most of the patch adds new types, there is also some churn for
renaming the existing NBDExtent to NBDExtent32 to contrast it with
NBDExtent64, which I thought was a nicer name than NBDExtentExt.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v5: Add R-b

v4: Hoist earlier in series, tweak a few comments, defer docs/interop
change to when feature is actually turned on, NBDExtent rename, add
QEMU_BUG_BUILD_ON for sanity sake, hoist in block status payload bits
from v3 14/14; R-b dropped
---
 include/block/nbd.h | 124 +++-
 nbd/nbd-internal.h  |   3 +-
 block/nbd.c |   6 +--
 nbd/common.c|  12 -
 nbd/server.c|   6 +--
 5 files changed, 106 insertions(+), 45 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index c4cbe130e07..b2fb8ab44d5 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -60,7 +60,7 @@ typedef enum NBDMode {
 NBD_MODE_EXPORT_NAME,  /* newstyle but only OPT_EXPORT_NAME safe */
 NBD_MODE_SIMPLE,   /* newstyle but only simple replies */
 NBD_MODE_STRUCTURED,   /* newstyle, structured replies enabled */
-/* TODO add NBD_MODE_EXTENDED */
+NBD_MODE_EXTENDED, /* newstyle, extended headers enabled */
 } NBDMode;

 /* Transmission phase structs */
@@ -93,20 +93,36 @@ typedef struct NBDStructuredReplyChunk {
 uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;

+typedef struct NBDExtendedReplyChunk {
+uint32_t magic;  /* NBD_EXTENDED_REPLY_MAGIC */
+uint16_t flags;  /* combination of NBD_REPLY_FLAG_* */
+uint16_t type;   /* NBD_REPLY_TYPE_* */
+uint64_t cookie; /* request handle */
+uint64_t offset; /* request offset */
+uint64_t length; /* length of payload */
+} QEMU_PACKED NBDExtendedReplyChunk;
+
 typedef union NBDReply {
 NBDSimpleReply simple;
 NBDStructuredReplyChunk structured;
+NBDExtendedReplyChunk extended;
 struct {
 /*
- * @magic and @cookie fields have the same offset and size both in
- * simple reply and structured reply chunk, so let them be accessible
- * without ".simple." or ".structured." specification
+ * @magic and @cookie fields have the same offset and size in all
+ * forms of replies, so let them be accessible without ".simple.",
+ * ".structured.", or ".extended." specifications.
  */
 uint32_t magic;
 uint32_t _skip;
 uint64_t cookie;
-} QEMU_PACKED;
+};
 } NBDReply;
+QEMU_BUILD_BUG_ON(offsetof(NBDReply, simple.cookie) !=
+  offsetof(NBDReply, cookie));
+QEMU_BUILD_BUG_ON(offsetof(NBDReply, structured.cookie) !=
+  offsetof(NBDReply, cookie));
+QEMU_BUILD_BUG_ON(offsetof(NBDReply, extended.cookie) !=
+  offsetof(NBDReply, cookie));

 /* Header of chunk for NBD_REPLY_TYPE_OFFSET_DATA */
 typedef struct NBDStructuredReadData {
@@ -133,14 +149,34 @@ typedef struct NBDStructuredError {
 typedef struct NBDStructuredMeta {
 /* header's length >= 12 (at least one extent) */
 uint32_t context_id;
-/* extents follows */
+/* NBDExtent32 extents[] follows, array length implied by header */
 } QEMU_PACKED NBDStructuredMeta;

-/* Extent chunk for NBD_REPLY_TYPE_BLOCK_STATUS */
-typedef struct NBDExtent {
+/* Extent array element for NBD_REPLY_TYPE_BLOCK_STATUS */
+typedef struct NBDExtent32 {
 uint32_t length;
 uint32_t flags; /* NBD_STATE_* */
-} QEMU_PACKED NBDExtent;
+} QEMU_PACKED NBDExtent32;
+
+/* Header of NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
+typedef struct NBDExtendedMeta {
+/* header's length >= 24 (at least one extent) */
+uint32_t context_id;
+uint32_t count; /* header length must be count * 16 + 8 */
+/* NBDExtent64 extents[count] follows */
+} QEMU_PACKED NBDExtendedMeta;
+
+/* Extent array element for NBD_REPLY_TYPE_BLOCK_STATUS_EXT */
+typedef struct NBDExtent64 {
+uint64_t length;
+uint64_t flags; /* NBD_STATE_* */
+} QEMU_PACKED NBDExtent64;
+
+/* Client payload for limiting NBD_CMD_BLOCK_STATUS reply */
+typedef struct NBDBlockStatusPayload {
+uint64_t effect_length;
+/* uint32_t ids[] follows, array length implied by header */
+} QEMU_PACKED NBDBlockStatusPayload;

 /* Transmission (export) flags: sent from server to client during handshake,
but describe what will happen during transmission */
@@ -158,20 +194,22 @@ enum {
 NBD_FLAG_SEND_RESIZE_BIT=  9, /* Send resize */
 

[PATCH v5 05/17] nbd/server: Refactor handling of command sanity checks

2023-08-10 Thread Eric Blake
Upcoming additions to support NBD 64-bit effect lengths will add a new
command flag NBD_CMD_FLAG_PAYLOAD_LEN that needs to be considered in
our sanity checks of the client's messages (that is, more than just
CMD_WRITE have the potential to carry a client payload when extended
headers are in effect).  But before we can start to support that, it
is easier to first refactor the existing set of various if statements
over open-coded combinations of request->type to instead be a single
switch statement over all command types that sets witnesses, then
straight-line processing based on the witnesses.  No semantic change
is intended.
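
(Toy example, unrelated command names, same pattern.) The shape of the refactor: one switch records per-command facts, and the checks that follow are written once against those facts.

#include <stdbool.h>

enum cmd { CMD_READ, CMD_WRITE, CMD_FLUSH };

static int validate(enum cmd type, unsigned long len, unsigned long max_len)
{
    bool check_length = false;      /* the "witnesses" */
    bool needs_buffer = false;

    switch (type) {
    case CMD_READ:
    case CMD_WRITE:
        check_length = true;
        needs_buffer = true;
        break;
    case CMD_FLUSH:
        break;
    }

    /* Straight-line processing driven by the witnesses. */
    if (check_length && len > max_len) {
        return -1;
    }
    if (needs_buffer) {
        /* allocate here in the real code */
    }
    return 0;
}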

Signed-off-by: Eric Blake 
---

v5: new patch split out from v4 13/24 [Vladimir]
---
 nbd/server.c | 118 ---
 1 file changed, 74 insertions(+), 44 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index db8f5943139..795f7c86781 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2329,11 +2329,16 @@ static int coroutine_fn nbd_co_send_bitmap(NBDClient 
*client,
  * to the client (although the caller may still need to disconnect after
  * reporting the error).
  */
-static int coroutine_fn nbd_co_receive_request(NBDRequestData *req, NBDRequest 
*request,
+static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
+   NBDRequest *request,
Error **errp)
 {
 NBDClient *client = req->client;
-int valid_flags;
+bool check_length = false;
+bool check_rofs = false;
+bool allocate_buffer = false;
+unsigned payload_len = 0;
+int valid_flags = NBD_CMD_FLAG_FUA;
 int ret;

 g_assert(qemu_in_coroutine());
@@ -2345,55 +2350,88 @@ static int coroutine_fn 
nbd_co_receive_request(NBDRequestData *req, NBDRequest *

 trace_nbd_co_receive_request_decode_type(request->cookie, request->type,
  nbd_cmd_lookup(request->type));
-
-if (request->type != NBD_CMD_WRITE) {
-/* No payload, we are ready to read the next request.  */
-req->complete = true;
-}
-
-if (request->type == NBD_CMD_DISC) {
+switch (request->type) {
+case NBD_CMD_DISC:
 /* Special case: we're going to disconnect without a reply,
  * whether or not flags, from, or len are bogus */
+req->complete = true;
 return -EIO;
-}

-if (request->type == NBD_CMD_READ || request->type == NBD_CMD_WRITE ||
-request->type == NBD_CMD_CACHE)
-{
-if (request->len > NBD_MAX_BUFFER_SIZE) {
-error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
-   request->len, NBD_MAX_BUFFER_SIZE);
-return -EINVAL;
+case NBD_CMD_READ:
+if (client->mode >= NBD_MODE_STRUCTURED) {
+valid_flags |= NBD_CMD_FLAG_DF;
 }
+check_length = true;
+allocate_buffer = true;
+break;

-if (request->type != NBD_CMD_CACHE) {
-req->data = blk_try_blockalign(client->exp->common.blk,
-   request->len);
-if (req->data == NULL) {
-error_setg(errp, "No memory");
-return -ENOMEM;
-}
-}
+case NBD_CMD_WRITE:
+payload_len = request->len;
+check_length = true;
+allocate_buffer = true;
+check_rofs = true;
+break;
+
+case NBD_CMD_FLUSH:
+break;
+
+case NBD_CMD_TRIM:
+check_rofs = true;
+break;
+
+case NBD_CMD_CACHE:
+check_length = true;
+break;
+
+case NBD_CMD_WRITE_ZEROES:
+valid_flags |= NBD_CMD_FLAG_NO_HOLE | NBD_CMD_FLAG_FAST_ZERO;
+check_rofs = true;
+break;
+
+case NBD_CMD_BLOCK_STATUS:
+valid_flags |= NBD_CMD_FLAG_REQ_ONE;
+break;
+
+default:
+/* Unrecognized, will fail later */
+;
 }

-if (request->type == NBD_CMD_WRITE) {
-assert(request->len <= NBD_MAX_BUFFER_SIZE);
-if (nbd_read(client->ioc, req->data, request->len, "CMD_WRITE data",
- errp) < 0)
-{
+/* Payload and buffer handling. */
+if (!payload_len) {
+req->complete = true;
+}
+if (check_length && request->len > NBD_MAX_BUFFER_SIZE) {
+/* READ, WRITE, CACHE */
+error_setg(errp, "len (%" PRIu64" ) is larger than max len (%u)",
+   request->len, NBD_MAX_BUFFER_SIZE);
+return -EINVAL;
+}
+if (allocate_buffer) {
+/* READ, WRITE */
+req->data = blk_try_blockalign(client->exp->common.blk,
+   request->len);
+if (req->data == NULL) {
+error_setg(errp, "No memory");
+return -ENOMEM;
+}
+}
+if (payload_len) {
+/* WRITE */
+assert(req->data);
+ret = nbd_read(client->ioc, req->data, payload_len,

[PATCH v5 00/17] qemu patches for 64-bit NBD extensions

2023-08-10 Thread Eric Blake
v4 was here:
https://lists.gnu.org/archive/html/qemu-devel/2023-06/msg01898.html
(1-8/24 of that series made it into 8.1; this is the rest)

v5 addresses Vladimir's review comments; and the amount of change is
smaller, so this is probably ready to merge in once 8.1 is out the
door and the remaining patches get R-b tags.  The biggest change is
probably the split of v4 13/24 into a split 5 and 6 here, which had
knock-on effects to patch 17.

001/17:[0006] [FC] 'nbd: Replace bool structured_reply with mode enum'
002/17:[] [--] 'nbd/client: Pass mode through to nbd_send_request'
003/17:[] [--] 'nbd: Add types for extended headers'
004/17:[0018] [FC] 'nbd: Prepare for 64-bit request effect lengths'
005/17:[down] 'nbd/server: Refactor handling of command sanity checks'
006/17:[down] 'nbd/server: Support a request payload'
007/17:[] [--] 'nbd/server: Prepare to receive extended header requests'
008/17:[0004] [FC] 'nbd/server: Prepare to send extended header replies'
009/17:[0004] [FC] 'nbd/server: Support 64-bit block status'
010/17:[0002] [FC] 'nbd/server: Enable initial support for extended headers'
011/17:[0002] [FC] 'nbd/client: Plumb errp through nbd_receive_replies'
012/17:[0004] [FC] 'nbd/client: Initial support for extended headers'
013/17:[0010] [FC] 'nbd/client: Accept 64-bit block status chunks'
014/17:[] [--] 'nbd/client: Request extended headers during negotiation'
015/17:[] [-C] 'nbd/server: Refactor list of negotiated meta contexts'
016/17:[0006] [FC] 'nbd/server: Prepare for per-request filtering of 
BLOCK_STATUS'
017/17:[0053] [FC] 'nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS'

Eric Blake (17):
  nbd: Replace bool structured_reply with mode enum
  nbd/client: Pass mode through to nbd_send_request
  nbd: Add types for extended headers
  nbd: Prepare for 64-bit request effect lengths
  nbd/server: Refactor handling of command sanity checks
  nbd/server: Support a request payload
  nbd/server: Prepare to receive extended header requests
  nbd/server: Prepare to send extended header replies
  nbd/server: Support 64-bit block status
  nbd/server: Enable initial support for extended headers
  nbd/client: Plumb errp through nbd_receive_replies
  nbd/client: Initial support for extended headers
  nbd/client: Accept 64-bit block status chunks
  nbd/client: Request extended headers during negotiation
  nbd/server: Refactor list of negotiated meta contexts
  nbd/server: Prepare for per-request filtering of BLOCK_STATUS
  nbd/server: Add FLAG_PAYLOAD support to CMD_BLOCK_STATUS

 docs/interop/nbd.txt  |   1 +
 include/block/nbd.h   | 147 +++--
 nbd/nbd-internal.h|   8 +-
 block/nbd.c   | 105 +++-
 nbd/client-connection.c   |   4 +-
 nbd/client.c  | 140 +++--
 nbd/common.c  |  12 +-
 nbd/server.c  | 558 +-
 qemu-nbd.c|   8 +-
 block/trace-events|   3 +-
 nbd/trace-events  |  19 +-
 tests/qemu-iotests/223.out|  18 +-
 tests/qemu-iotests/233.out|   4 +
 tests/qemu-iotests/241.out|   3 +
 tests/qemu-iotests/307.out|  15 +-
 .../tests/nbd-qemu-allocation.out |   3 +-
 16 files changed, 762 insertions(+), 286 deletions(-)


base-commit: 64d3be986f9e2379bc688bf1d0aca0557e0035ca
-- 
2.41.0




[PATCH v5 16/17] nbd/server: Prepare for per-request filtering of BLOCK_STATUS

2023-08-10 Thread Eric Blake
The next commit will add support for the optional extension
NBD_CMD_FLAG_PAYLOAD during NBD_CMD_BLOCK_STATUS, where the client can
request that the server only return a subset of negotiated contexts,
rather than all contexts.  To make that task easier, this patch
populates the list of contexts to return on a per-command basis (for
now, identical to the full set of negotiated contexts).
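
(Names invented.) A sketch of the ownership rule this sets up: a BLOCK_STATUS request points at a context list, which defaults to the client's negotiated set and, once per-request filtering lands, may instead be a private copy that is freed with the request.

#include <stdlib.h>

struct contexts { size_t count; /* ... negotiated context flags ... */ };

struct client  { struct contexts negotiated; };
struct request { struct contexts *contexts; };

static void begin_block_status(struct request *req, struct client *cli)
{
    req->contexts = &cli->negotiated;       /* default: full negotiated set */
}

static void finish_request(struct request *req, struct client *cli)
{
    if (req->contexts && req->contexts != &cli->negotiated) {
        free(req->contexts);                /* per-request filtered copy */
    }
    req->contexts = NULL;
}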

Signed-off-by: Eric Blake 
---

v5: fix null dereference on early error [Vladimir], hoist in assertion
from v4 24/24

v4: split out NBDMetaContexts refactoring to its own patch, track
NBDRequests.contexts as a pointer rather than inline
---
 include/block/nbd.h |  1 +
 nbd/server.c| 22 +-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 7643c321f36..9285aa85826 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -77,6 +77,7 @@ typedef struct NBDRequest {
 uint16_t flags; /* NBD_CMD_FLAG_* */
 uint16_t type;  /* NBD_CMD_* */
 NBDMode mode;   /* Determines which network representation to use */
+NBDMetaContexts *contexts; /* Used by NBD_CMD_BLOCK_STATUS */
 } NBDRequest;

 typedef struct NBDSimpleReply {
diff --git a/nbd/server.c b/nbd/server.c
index 76235347174..ed988aa6308 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2511,6 +2511,7 @@ static int coroutine_fn nbd_co_receive_request(NBDRequestData *req,
 break;

 case NBD_CMD_BLOCK_STATUS:
+request->contexts = &client->contexts;
 valid_flags |= NBD_CMD_FLAG_REQ_ONE;
 break;

@@ -2751,17 +2752,18 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
   "discard failed", errp);

 case NBD_CMD_BLOCK_STATUS:
+assert(request->contexts);
 if (!request->len) {
 return nbd_send_generic_reply(client, request, -EINVAL,
   "need non-zero length", errp);
 }
 assert(client->mode >= NBD_MODE_EXTENDED ||
request->len <= UINT32_MAX);
-if (client->contexts.count) {
+if (request->contexts->count) {
 bool dont_fragment = request->flags & NBD_CMD_FLAG_REQ_ONE;
-int contexts_remaining = client->contexts.count;
+int contexts_remaining = request->contexts->count;

-if (client->contexts.base_allocation) {
+if (request->contexts->base_allocation) {
 ret = nbd_co_send_block_status(client, request,
exp->common.blk,
request->from,
@@ -2774,7 +2776,7 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
 }
 }

-if (client->contexts.allocation_depth) {
+if (request->contexts->allocation_depth) {
 ret = nbd_co_send_block_status(client, request,
exp->common.blk,
request->from, request->len,
@@ -2787,8 +2789,9 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
 }
 }

+assert(request->contexts->exp == client->exp);
 for (i = 0; i < client->exp->nr_export_bitmaps; i++) {
-if (!client->contexts.bitmaps[i]) {
+if (!request->contexts->bitmaps[i]) {
 continue;
 }
 ret = nbd_co_send_bitmap(client, request,
@@ -2804,6 +2807,10 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
 assert(!contexts_remaining);

 return 0;
+} else if (client->contexts.count) {
+return nbd_send_generic_reply(client, request, -EINVAL,
+  "CMD_BLOCK_STATUS payload not valid",
+  errp);
 } else {
 return nbd_send_generic_reply(client, request, -EINVAL,
   "CMD_BLOCK_STATUS not negotiated",
@@ -2882,6 +2889,11 @@ static coroutine_fn void nbd_trip(void *opaque)
 } else {
ret = nbd_handle_request(client, &request, req->data, &local_err);
 }
+if (request.contexts && request.contexts != &client->contexts) {
+assert(request.type == NBD_CMD_BLOCK_STATUS);
+g_free(request.contexts->bitmaps);
+g_free(request.contexts);
+}
 if (ret < 0) {
 error_prepend(_err, "Failed to send reply: ");
 goto disconnect;
-- 
2.41.0




[PATCH v5 09/17] nbd/server: Support 64-bit block status

2023-08-10 Thread Eric Blake
The NBD spec states that if the client negotiates extended headers,
the server must avoid NBD_REPLY_TYPE_BLOCK_STATUS and instead use
NBD_REPLY_TYPE_BLOCK_STATUS_EXT which supports 64-bit lengths, even if
the reply does not need more than 32 bits.  As of this patch,
client->mode is still never NBD_MODE_EXTENDED, so the code added here
does not take effect until the next patch enables negotiation.

For now, all metacontexts that we know how to export never populate
more than 32 bits of information, so we don't have to worry about
NBD_REP_ERR_EXT_HEADER_REQD or filtering during handshake, and we
always send all zeroes for the upper 32 bits of status during
NBD_CMD_BLOCK_STATUS.

Note that we previously had some interesting size-juggling on call
chains, such as:

nbd_co_send_block_status(uint32_t length)
-> blockstatus_to_extents(uint32_t bytes)
  -> bdrv_block_status_above(bytes, &uint64_t num)
  -> nbd_extent_array_add(uint64_t num)
-> store num in 32-bit length

But we were lucky that it never overflowed: bdrv_block_status_above
never sets num larger than bytes, and we had previously been capping
'bytes' at 32 bits (since the protocol does not allow sending a larger
request without extended headers).  This patch adds some assertions
that ensure we continue to avoid overflowing 32 bits for a narrow
client, while fully utilizing 64-bits all the way through when the
client understands that.  Even in 64-bit math, overflow is not an
issue, because all lengths are coming from the block layer, and we
know that the block layer does not support images larger than off_t
(if lengths were coming from the network, the story would be
different).

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

v5: stronger justification on assertion [Vladimir], add R-b

v4: split conversion to big-endian across two helper functions rather
than in-place union [Vladimir]
---
 nbd/server.c | 108 ++-
 1 file changed, 82 insertions(+), 26 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 5c06a6466ec..af41810e9e7 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2115,20 +2115,24 @@ static int coroutine_fn nbd_co_send_sparse_read(NBDClient *client,
 }

 typedef struct NBDExtentArray {
-NBDExtent32 *extents;
+NBDExtent64 *extents;
 unsigned int nb_alloc;
 unsigned int count;
 uint64_t total_length;
+bool extended;
 bool can_add;
 bool converted_to_be;
 } NBDExtentArray;

-static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc)
+static NBDExtentArray *nbd_extent_array_new(unsigned int nb_alloc,
+NBDMode mode)
 {
 NBDExtentArray *ea = g_new0(NBDExtentArray, 1);

+assert(mode >= NBD_MODE_STRUCTURED);
 ea->nb_alloc = nb_alloc;
-ea->extents = g_new(NBDExtent32, nb_alloc);
+ea->extents = g_new(NBDExtent64, nb_alloc);
+ea->extended = mode >= NBD_MODE_EXTENDED;
 ea->can_add = true;

 return ea;
@@ -2147,15 +2151,36 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
 int i;

 assert(!ea->converted_to_be);
+assert(ea->extended);
 ea->can_add = false;
 ea->converted_to_be = true;

 for (i = 0; i < ea->count; i++) {
-ea->extents[i].flags = cpu_to_be32(ea->extents[i].flags);
-ea->extents[i].length = cpu_to_be32(ea->extents[i].length);
+ea->extents[i].length = cpu_to_be64(ea->extents[i].length);
+ea->extents[i].flags = cpu_to_be64(ea->extents[i].flags);
 }
 }

+/* Further modifications of the array after conversion are abandoned */
+static NBDExtent32 *nbd_extent_array_convert_to_narrow(NBDExtentArray *ea)
+{
+int i;
+NBDExtent32 *extents = g_new(NBDExtent32, ea->count);
+
+assert(!ea->converted_to_be);
+assert(!ea->extended);
+ea->can_add = false;
+ea->converted_to_be = true;
+
+for (i = 0; i < ea->count; i++) {
+assert((ea->extents[i].length | ea->extents[i].flags) <= UINT32_MAX);
+extents[i].length = cpu_to_be32(ea->extents[i].length);
+extents[i].flags = cpu_to_be32(ea->extents[i].flags);
+}
+
+return extents;
+}
+
 /*
  * Add extent to NBDExtentArray. If extent can't be added (no available space),
  * return -1.
@@ -2166,19 +2191,27 @@ static void nbd_extent_array_convert_to_be(NBDExtentArray *ea)
  * would result in an incorrect range reported to the client)
  */
 static int nbd_extent_array_add(NBDExtentArray *ea,
-uint32_t length, uint32_t flags)
+uint64_t length, uint32_t flags)
 {
 assert(ea->can_add);

 if (!length) {
 return 0;
 }
+if (!ea->extended) {
+assert(length <= UINT32_MAX);
+}

 /* Extend previous extent if flags are the same */
 if (ea->count > 0 && flags == ea->extents[ea->count - 1].flags) {
-uint64_t sum = (uint64_t)length + ea->extents[ea->count - 1].length;
+uint64_t sum = length + ea->extents[ea->count - 1].length;

Re: [PATCH 0/8] some testing and gdbstub fixes

2023-08-10 Thread Richard Henderson

On 8/10/23 09:43, Richard Henderson wrote:

On 8/10/23 09:35, Alex Bennée wrote:

So 7 and 8? I would argue for 6 as well given that's a foot gun just
waiting to happen.


Yes, the timing issues with 6 are nasty.


I'm going to queue 6-8 to tcg-next, along with the %x change Phil suggested for logging 
non-ASCII characters.



r~




Re: [PATCH 2/8] target/riscv: make CPUCFG() macro public

2023-08-10 Thread Alistair Francis
On Fri, Jul 28, 2023 at 9:20 AM Daniel Henrique Barboza
 wrote:
>
> The RISC-V KVM driver uses a CPUCFG() macro that calculates the offset
> of a certain field in the struct RISCVCPUConfig. We're going to use this
> macro in target/riscv/cpu.c as well in the next patches. Make it public.
>
> Rename it to CPU_CFG_OFFSET() for more clarity while we're at it.
>
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 2 +-
>  target/riscv/cpu.h | 2 ++
>  target/riscv/kvm.c | 8 +++-
>  3 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 644ce7a018..3e62881d85 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -48,7 +48,7 @@ struct isa_ext_data {
>  };
>
>  #define ISA_EXT_DATA_ENTRY(_name, _min_ver, _prop) \
> -{#_name, _min_ver, offsetof(struct RISCVCPUConfig, _prop)}
> +{#_name, _min_ver, CPU_CFG_OFFSET(_prop)}
>
>  /*
>   * From vector_helper.c
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index 6ea22e0eea..577abcd724 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -62,6 +62,8 @@
>  const char *riscv_get_misa_ext_name(uint32_t bit);
>  const char *riscv_get_misa_ext_description(uint32_t bit);
>
> +#define CPU_CFG_OFFSET(_prop) offsetof(struct RISCVCPUConfig, _prop)
> +
>  /* Privileged specification version */
>  enum {
>  PRIV_VERSION_1_10_0 = 0,
> diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
> index 9d8a8982f9..9b8565d809 100644
> --- a/target/riscv/kvm.c
> +++ b/target/riscv/kvm.c
> @@ -198,10 +198,8 @@ static void kvm_riscv_update_cpu_misa_ext(RISCVCPU *cpu, CPUState *cs)
>  }
>  }
>
> -#define CPUCFG(_prop) offsetof(struct RISCVCPUConfig, _prop)
> -
>  #define KVM_EXT_CFG(_name, _prop, _reg_id) \
> -{.name = _name, .offset = CPUCFG(_prop), \
> +{.name = _name, .offset = CPU_CFG_OFFSET(_prop), \
>   .kvm_reg_id = _reg_id}
>
>  static KVMCPUConfig kvm_multi_ext_cfgs[] = {
> @@ -278,13 +276,13 @@ static void kvm_cpu_set_multi_ext_cfg(Object *obj, Visitor *v,
>
>  static KVMCPUConfig kvm_cbom_blocksize = {
>  .name = "cbom_blocksize",
> -.offset = CPUCFG(cbom_blocksize),
> +.offset = CPU_CFG_OFFSET(cbom_blocksize),
>  .kvm_reg_id = KVM_REG_RISCV_CONFIG_REG(zicbom_block_size)
>  };
>
>  static KVMCPUConfig kvm_cboz_blocksize = {
>  .name = "cboz_blocksize",
> -.offset = CPUCFG(cboz_blocksize),
> +.offset = CPU_CFG_OFFSET(cboz_blocksize),
>  .kvm_reg_id = KVM_REG_RISCV_CONFIG_REG(zicboz_block_size)
>  };
>
> --
> 2.41.0
>
>



Re: [PATCH 1/8] target/riscv/cpu.c: use offset in isa_ext_is_enabled/update_enabled

2023-08-10 Thread Alistair Francis
On Fri, Jul 28, 2023 at 9:18 AM Daniel Henrique Barboza
 wrote:
>
> We'll have future usage for a function where, given an offset of the
> struct RISCVCPUConfig, the flag is updated to a certain val.
>
> Change all existing callers to use edata->ext_enable_offset instead of
> 'edata'.
>
> Signed-off-by: Daniel Henrique Barboza 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index b5a2266eef..644ce7a018 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -153,18 +153,17 @@ static const struct isa_ext_data isa_edata_arr[] = {
>  ISA_EXT_DATA_ENTRY(xventanacondops, PRIV_VERSION_1_12_0, 
> ext_XVentanaCondOps),
>  };
>
> -static bool isa_ext_is_enabled(RISCVCPU *cpu,
> -   const struct isa_ext_data *edata)
> +static bool isa_ext_is_enabled(RISCVCPU *cpu, uint32_t ext_offset)
>  {
> -bool *ext_enabled = (void *)&cpu->cfg + edata->ext_enable_offset;
> +bool *ext_enabled = (void *)&cpu->cfg + ext_offset;
>
>  return *ext_enabled;
>  }
>
> -static void isa_ext_update_enabled(RISCVCPU *cpu,
> -   const struct isa_ext_data *edata, bool en)
> +static void isa_ext_update_enabled(RISCVCPU *cpu, uint32_t ext_offset,
> +   bool en)
>  {
> -bool *ext_enabled = (void *)&cpu->cfg + edata->ext_enable_offset;
> +bool *ext_enabled = (void *)&cpu->cfg + ext_offset;
>
>  *ext_enabled = en;
>  }
> @@ -1025,9 +1024,10 @@ static void riscv_cpu_disable_priv_spec_isa_exts(RISCVCPU *cpu)
>
>  /* Force disable extensions if priv spec version does not match */
>  for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
> -if (isa_ext_is_enabled(cpu, &isa_edata_arr[i]) &&
> +if (isa_ext_is_enabled(cpu, isa_edata_arr[i].ext_enable_offset) &&
>  (env->priv_ver < isa_edata_arr[i].min_version)) {
> -isa_ext_update_enabled(cpu, &isa_edata_arr[i], false);
> +isa_ext_update_enabled(cpu, isa_edata_arr[i].ext_enable_offset,
> +   false);
>  #ifndef CONFIG_USER_ONLY
>  warn_report("disabling %s extension for hart 0x" TARGET_FMT_lx
>  " because privilege spec version does not match",
> @@ -2271,7 +2271,7 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str,
>  int i;
>
>  for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
> -if (isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
> +if (isa_ext_is_enabled(cpu, isa_edata_arr[i].ext_enable_offset)) {
>  new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL);
>  g_free(old);
>  old = new;
> --
> 2.41.0
>
>



Re: [PATCH] target/riscv: Fix zfa fleq.d and fltq.d

2023-08-10 Thread Alistair Francis
On Thu, Jul 27, 2023 at 8:50 PM LIU Zhiwei  wrote:
>
> Commit a47842d ("riscv: Add support for the Zfa extension") implemented the
> zfa extension. However, it has some typos for fleq.d and fltq.d. Both of
> them misused the fltq.s helper function.
>
> Signed-off-by: LIU Zhiwei 

Thanks!

Applied to riscv-to-apply.next after adding a `Fixes` tag

Alistair

> ---
>  target/riscv/insn_trans/trans_rvzfa.c.inc | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/insn_trans/trans_rvzfa.c.inc b/target/riscv/insn_trans/trans_rvzfa.c.inc
> index 2c715af3e5..0fdd2698f6 100644
> --- a/target/riscv/insn_trans/trans_rvzfa.c.inc
> +++ b/target/riscv/insn_trans/trans_rvzfa.c.inc
> @@ -470,7 +470,7 @@ bool trans_fleq_d(DisasContext *ctx, arg_fleq_d *a)
>  TCGv_i64 src1 = get_fpr_hs(ctx, a->rs1);
>  TCGv_i64 src2 = get_fpr_hs(ctx, a->rs2);
>
> -gen_helper_fltq_s(dest, cpu_env, src1, src2);
> +gen_helper_fleq_d(dest, cpu_env, src1, src2);
>  gen_set_gpr(ctx, a->rd, dest);
>  return true;
>  }
> @@ -485,7 +485,7 @@ bool trans_fltq_d(DisasContext *ctx, arg_fltq_d *a)
>  TCGv_i64 src1 = get_fpr_hs(ctx, a->rs1);
>  TCGv_i64 src2 = get_fpr_hs(ctx, a->rs2);
>
> -gen_helper_fltq_s(dest, cpu_env, src1, src2);
> +gen_helper_fltq_d(dest, cpu_env, src1, src2);
>  gen_set_gpr(ctx, a->rd, dest);
>  return true;
>  }
> --
> 2.17.1
>
>



Re: [PATCH] hw/pci-host: Allow extended config space access for Designware PCIe host

2023-08-10 Thread Jason Chien
As far as I know, the ordering issue is caused by nested device realization.
In this case, realizing TYPE_DESIGNWARE_PCIE_HOST also realizes
TYPE_DESIGNWARE_PCIE_ROOT (see designware_pcie_host_realize()).
device_set_realized() is the function every device goes through when it is
realized: it first realizes the device itself via dc->realize() and only then
realizes the device's child buses via qbus_realize(). Whether the device has
any child bus at all may depend on what dc->realize() did, so the realization
flow becomes a recursive call into device_set_realized(). More precisely, the
flow in this case is:

  qdev_realize()
    --> ... --> FIRST device_set_realized()
      --> FIRST dc->realize()
        --> ... --> designware_pcie_host_realize()
          --> qdev_realize() --> ... --> SECOND device_set_realized()
            --> SECOND dc->realize() --> ... --> designware_pcie_root_realize()
            --> back in the SECOND device_set_realized():
                SECOND qbus_realize() of the CHILD bus "dw-pcie"
      --> ... --> back in the FIRST device_set_realized():
          FIRST qbus_realize() of the PARENT bus "pcie"
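
To see the effect of that ordering in isolation, here is a self-contained toy
model (illustration only, not QEMU code; the struct and helper names are
invented, and only the device/bus names echo the ones above):

#include <stdio.h>

struct toy_dev {
    const char *name;
    struct toy_dev *child_device;   /* realized from inside realize() */
    const char *child_bus;          /* realized only after realize() returns */
};

/* Mirrors the described device_set_realized() ordering: the device itself,
 * then any nested child device (recursively), and the device's own child
 * bus only at the very end. */
static void toy_set_realized(struct toy_dev *d)
{
    printf("dc->realize()  %s\n", d->name);
    if (d->child_device) {
        toy_set_realized(d->child_device);   /* the nested qdev_realize() step */
    }
    if (d->child_bus) {
        printf("qbus_realize() %s\n", d->child_bus);
    }
}

int main(void)
{
    struct toy_dev root = { "designware-pcie-root", NULL,  "dw-pcie" };
    struct toy_dev host = { "designware-pcie-host", &root, "pcie" };

    toy_set_realized(&host);   /* prints "dw-pcie" before "pcie" */
    return 0;
}

Running it prints the buses in the order dw-pcie, then pcie, which is exactly
why the child bus never inherits PCI_BUS_EXTENDED_CONFIG_SPACE from its
parent during pcie_bus_realize().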

I also found this patch
 that
solves the same bus issue.

Do you have any suggestions on the order of realization? Thanks!

On Thu, Aug 10, 2023 at 5:24 AM Michael S. Tsirkin  wrote:

> On Wed, Aug 09, 2023 at 10:22:50AM +, Jason Chien wrote:
> > In pcie_bus_realize(), a root bus is realized as a PCIe bus and a
> non-root
> > bus is realized as a PCIe bus if its parent bus is a PCIe bus. However,
> > the child bus "dw-pcie" is realized before the parent bus "pcie" which is
> > the root PCIe bus. Thus, the extended configuration space is not
> accessible
> > on "dw-pcie". The issue can be resolved by adding the
> > PCI_BUS_EXTENDED_CONFIG_SPACE flag to "pcie" before "dw-pcie" is
> realized.
> >
> > Signed-off-by: Jason Chien 
>
> I think we should fix the order of initialization rather than
> hack around it.
>
> > ---
> >  hw/pci-host/designware.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/pci-host/designware.c b/hw/pci-host/designware.c
> > index 9e183caa48..388d252ee2 100644
> > --- a/hw/pci-host/designware.c
> > +++ b/hw/pci-host/designware.c
> > @@ -694,6 +694,7 @@ static void designware_pcie_host_realize(DeviceState *dev, Error **errp)
> >   &s->pci.io,
> >   0, 4,
> >   TYPE_PCIE_BUS);
> > +pci->bus->flags |= PCI_BUS_EXTENDED_CONFIG_SPACE;
> >
> >  memory_region_init(>pci.address_space_root,
> > OBJECT(s),
> > --
> > 2.17.1
>
>


Re: [RESEND PATCH v3 1/1] target/riscv: Add Zihintntl extension ISA string to DTS

2023-08-10 Thread Alistair Francis
On Wed, Jul 26, 2023 at 3:42 AM Jason Chien  wrote:
>
> RVA23 Profiles states:
> The RVA23 profiles are intended to be used for 64-bit application
> processors that will run rich OS stacks from standard binary OS
> distributions and with a substantial number of third-party binary user
> applications that will be supported over a considerable length of time
> in the field.
>
> The chapter 4 of the unprivileged spec introduces the Zihintntl extension
> and Zihintntl is a mandatory extension presented in RVA23 Profiles, whose
> purpose is to enable application and operating system portability across
> different implementations. Thus the DTS should contain the Zihintntl ISA
> string in order to pass to software.
>
> The unprivileged spec states:
> Like any HINTs, these instructions may be freely ignored. Hence, although
> they are described in terms of cache-based memory hierarchies, they do not
> mandate the provision of caches.
>
> These instructions are encoded with non-used opcode, e.g. ADD x0, x0, x2,
> which QEMU already supports, and QEMU does not emulate cache. Therefore
> these instructions can be considered as a no-op, and we only need to add
> a new property for the Zihintntl extension.
>
> Reviewed-by: Frank Chang 
> Reviewed-by: Alistair Francis 
> Signed-off-by: Jason Chien 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/cpu.c | 2 ++
>  target/riscv/cpu_cfg.h | 1 +
>  2 files changed, 3 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 921c19e6cd..a49e934b41 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -87,6 +87,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
>  ISA_EXT_DATA_ENTRY(zicond, PRIV_VERSION_1_12_0, ext_zicond),
>  ISA_EXT_DATA_ENTRY(zicsr, PRIV_VERSION_1_10_0, ext_icsr),
>  ISA_EXT_DATA_ENTRY(zifencei, PRIV_VERSION_1_10_0, ext_ifencei),
> +ISA_EXT_DATA_ENTRY(zihintntl, PRIV_VERSION_1_10_0, ext_zihintntl),
>  ISA_EXT_DATA_ENTRY(zihintpause, PRIV_VERSION_1_10_0, ext_zihintpause),
>  ISA_EXT_DATA_ENTRY(zmmul, PRIV_VERSION_1_12_0, ext_zmmul),
>  ISA_EXT_DATA_ENTRY(zawrs, PRIV_VERSION_1_12_0, ext_zawrs),
> @@ -1763,6 +1764,7 @@ static Property riscv_cpu_extensions[] = {
>  DEFINE_PROP_BOOL("sscofpmf", RISCVCPU, cfg.ext_sscofpmf, false),
>  DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
>  DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
> +DEFINE_PROP_BOOL("Zihintntl", RISCVCPU, cfg.ext_zihintntl, true),
>  DEFINE_PROP_BOOL("Zihintpause", RISCVCPU, cfg.ext_zihintpause, true),
>  DEFINE_PROP_BOOL("Zawrs", RISCVCPU, cfg.ext_zawrs, true),
>  DEFINE_PROP_BOOL("Zfa", RISCVCPU, cfg.ext_zfa, true),
> diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
> index 2bd9510ba3..518686eaa3 100644
> --- a/target/riscv/cpu_cfg.h
> +++ b/target/riscv/cpu_cfg.h
> @@ -66,6 +66,7 @@ struct RISCVCPUConfig {
>  bool ext_icbom;
>  bool ext_icboz;
>  bool ext_zicond;
> +bool ext_zihintntl;
>  bool ext_zihintpause;
>  bool ext_smstateen;
>  bool ext_sstc;
> --
> 2.17.1
>
>



Re: [PATCH 1/2] riscv: zicond: make non-experimental

2023-08-10 Thread Alistair Francis
On Tue, Aug 8, 2023 at 2:18 PM Vineet Gupta  wrote:
>
> zicond is now codegen supported in both llvm and gcc.
>
> This change allows seamless enabling/testing of zicond in downstream
> projects. e.g. currently riscv-gnu-toolchain parses elf attributes
> to create a cmdline for qemu but fails short of enabling it because of
> the "x-" prefix.
>
> Signed-off-by: Vineet Gupta 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 6b93b04453c8..022bd9d01223 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -1816,6 +1816,7 @@ static Property riscv_cpu_extensions[] = {
>  DEFINE_PROP_BOOL("zcf", RISCVCPU, cfg.ext_zcf, false),
>  DEFINE_PROP_BOOL("zcmp", RISCVCPU, cfg.ext_zcmp, false),
>  DEFINE_PROP_BOOL("zcmt", RISCVCPU, cfg.ext_zcmt, false),
> +DEFINE_PROP_BOOL("zicond", RISCVCPU, cfg.ext_zicond, false),
>
>  /* Vendor-specific custom extensions */
>  DEFINE_PROP_BOOL("xtheadba", RISCVCPU, cfg.ext_xtheadba, false),
> @@ -1832,7 +1833,6 @@ static Property riscv_cpu_extensions[] = {
>  DEFINE_PROP_BOOL("xventanacondops", RISCVCPU, cfg.ext_XVentanaCondOps, 
> false),
>
>  /* These are experimental so mark with 'x-' */
> -DEFINE_PROP_BOOL("x-zicond", RISCVCPU, cfg.ext_zicond, false),
>
>  /* ePMP 0.9.3 */
>  DEFINE_PROP_BOOL("x-epmp", RISCVCPU, cfg.epmp, false),
> --
> 2.34.1
>
>



Re: [PATCH 1/2] riscv: zicond: make non-experimental

2023-08-10 Thread Alistair Francis
On Tue, Aug 8, 2023 at 5:16 PM Palmer Dabbelt  wrote:
>
> On Tue, 08 Aug 2023 14:10:54 PDT (-0700), dbarb...@ventanamicro.com wrote:
> >
> >
> > On 8/8/23 17:52, Palmer Dabbelt wrote:
> >> On Tue, 08 Aug 2023 11:45:49 PDT (-0700), Vineet Gupta wrote:
> >>>
> >>>
> >>> On 8/8/23 11:29, Richard Henderson wrote:
>  On 8/8/23 11:17, Vineet Gupta wrote:
> > zicond is now codegen supported in both llvm and gcc.
> 
>  It is still not in
> 
>  https://wiki.riscv.org/display/HOME/Recently+Ratified+Extensions
> >>>
> >>> Right, its been frozen since April though and with support trickling in
> >>> rest of tooling it becomes harder to test.
> >>> I don't know what exactly QEMU's policy is on this ?
> >>
> >> IIUC we'd historically marked stuff as non-experimental when it's frozen, 
> >> largely because ratification is such a nebulous process. There's obviously 
> >> risk there, but there's risk to anything.  Last I can find is 260b594d8a 
> >> ("RISC-V: Add Zawrs ISA extension support"), which specifically calls out 
> >> Zawrs as frozen and IIUC adds support without the "x-" prefix.
> >
> > If that's the case then I think it's sensible to remove the 'experimental' 
> > status
> > of zicond as well.
> >
> >>
> >> I can't find anything written down about it, though...
> >
> > As soon as we agree on an official policy I'll do a doc update. Thanks,
>
> Thanks.  We should probably give Alistair some time to chime in, it's
> still pretty early there.

Frozen should be enough to remove the `x-`. We do have it written down
at: 
https://wiki.qemu.org/Documentation/Platforms/RISCV#RISC-V_Foundation_Extensions

Alistair

>
> >
> >
> > Daniel
> >
> >>
>



Re: [PATCH 5/5] target/arm: Implement cortex-a710

2023-08-10 Thread Peter Maydell
On Thu, 10 Aug 2023 at 18:05, Richard Henderson
 wrote:
>
> On 8/10/23 08:49, Peter Maydell wrote:
> > On Thu, 10 Aug 2023 at 03:36, Richard Henderson
> >  wrote:
> >>
> >> The cortex-a710 is a first generation ARMv9.0-A processor.
> >>
> >> Signed-off-by: Richard Henderson 
> >> ---
> >>   docs/system/arm/virt.rst |   1 +
> >>   hw/arm/virt.c|   1 +
> >>   target/arm/tcg/cpu64.c   | 167 +++
> >>   3 files changed, 169 insertions(+)
> >>
> >> diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
> >> index 51cdac6841..e1697ac8f4 100644
> >> --- a/docs/system/arm/virt.rst
> >> +++ b/docs/system/arm/virt.rst
> >> @@ -58,6 +58,7 @@ Supported guest CPU types:
> >>   - ``cortex-a57`` (64-bit)
> >>   - ``cortex-a72`` (64-bit)
> >>   - ``cortex-a76`` (64-bit)
> >> +- ``cortex-a710`` (64-bit)
> >>   - ``a64fx`` (64-bit)
> >>   - ``host`` (with KVM only)
> >>   - ``neoverse-n1`` (64-bit)
> >> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> >> index 7d9dbc2663..d1522c305d 100644
> >> --- a/hw/arm/virt.c
> >> +++ b/hw/arm/virt.c
> >> @@ -211,6 +211,7 @@ static const char *valid_cpus[] = {
> >>   ARM_CPU_TYPE_NAME("cortex-a55"),
> >>   ARM_CPU_TYPE_NAME("cortex-a72"),
> >>   ARM_CPU_TYPE_NAME("cortex-a76"),
> >> +ARM_CPU_TYPE_NAME("cortex-a710"),
> >>   ARM_CPU_TYPE_NAME("a64fx"),
> >>   ARM_CPU_TYPE_NAME("neoverse-n1"),
> >>   ARM_CPU_TYPE_NAME("neoverse-v1"),
> >
> > Will sbsa-ref want this core ?
>
> It only has 40 PA bits, and I think sbsa-ref requires 48.

Yes, it does want 48 (we ran into that with some other core).

> >> +cpu->isar.id_mmfr4 = 0x21021110;
> >
> > I don't think we implement HPDS == 2 (that's FEAT_HPDS2).
> > I guess we should push it down to HPDS 1 only in cpu.c
> > for now. (Or implement it, it's probably simple.)
>
> Feh.  I thought I'd double-checked all of the features.
> I'll have a look at implementing that.

I think we (meaning Linaro) kind of noted a lot of features
as architecturally optional and then didn't think through
that we might need them anyway for specific implementations.
(I got surprised by FEAT_NV that way for the Neoverse-V1.)

> >> +cpu->ctr   = 0x00000004b444c004ull; /* with DIC set */
> >
> > Why set DIC? The h/w doesn't.
>
> Heh.  From the comment in neoverse-v1, I thought you had force enabled it
> there.  But it must simply be a h/w option?

Yes, the Neoverse-V1 TRM documents a config option of
"instruction cache hardware coherency" (which sets DIC),
and that the IDC pin "reflects the inverse value of the
BROADCASTCACHEMAINTPOU pin". So I opted for the config
choices that happen to be faster for QEMU.
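
As a quick aside, a tiny standalone check of the two CTR_EL0 bits in question
(the macro names are invented for this sketch, not taken from QEMU headers):

#include <assert.h>
#include <stdint.h>

#define CTR_EL0_IDC  (UINT64_C(1) << 28)   /* set: DC clean to PoU not required */
#define CTR_EL0_DIC  (UINT64_C(1) << 29)   /* set: IC invalidate to PoU not required */

int main(void)
{
    uint64_t ctr = UINT64_C(0x00000004b444c004);   /* the value proposed in the patch */

    assert(ctr & CTR_EL0_DIC);   /* DIC is set ... */
    assert(ctr & CTR_EL0_IDC);   /* ... and so is IDC */
    return 0;
}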

thanks
-- PMM



Re: [PATCH 4/8] tests: remove test-gdbstub.py

2023-08-10 Thread Richard Henderson

On 8/10/23 08:36, Alex Bennée wrote:

This isn't directly called by our CI and because it doesn't run via
our run-test.py script does things slightly differently. Lets remove
it as we have plenty of working in-tree tests now for various aspects
of gdbstub.

Signed-off-by: Alex Bennée
---
  tests/guest-debug/test-gdbstub.py | 177 --
  1 file changed, 177 deletions(-)
  delete mode 100644 tests/guest-debug/test-gdbstub.py


The first sentence could be clearer.  But as it's unused,

Acked-by: Richard Henderson 


r~



Re: [PATCH] linux-user/elfload: Set V in ELF_HWCAP for RISC-V

2023-08-10 Thread Alistair Francis
On Tue, Aug 8, 2023 at 2:37 AM Michael Tokarev  wrote:
>
> 03.08.2023 16:14, Nathan Egge wrote:
> > From: "Nathan Egge" 
> >
> > Set V bit for hwcap if misa is set.
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1793
> > Signed-off-by: Nathan Egge 
> > ---
> >   linux-user/elfload.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> > index 861ec07abc..a299ba7300 100644
> > --- a/linux-user/elfload.c
> > +++ b/linux-user/elfload.c
> > @@ -1710,7 +1710,8 @@ static uint32_t get_elf_hwcap(void)
> >   #define MISA_BIT(EXT) (1 << (EXT - 'A'))
> >   RISCVCPU *cpu = RISCV_CPU(thread_cpu);
> >   uint32_t mask = MISA_BIT('I') | MISA_BIT('M') | MISA_BIT('A')
> > -| MISA_BIT('F') | MISA_BIT('D') | MISA_BIT('C');
> > +| MISA_BIT('F') | MISA_BIT('D') | MISA_BIT('C')
> > +| MISA_BIT('V');
>
> Is smells like a -stable material (incl. 7.2), is it not?

I think so as well

Alistair

>
> Thanks,
>
> /mjt
>



Re: [PATCH 5/8] tests/tcg: clean-up gdb confirm/pagination settings

2023-08-10 Thread Richard Henderson

On 8/10/23 08:36, Alex Bennée wrote:

We can do this all in the run-test.py script so remove the extraneous
bits from the individual tests which got copied from the original
non-CI gdb tests.

Signed-off-by: Alex Bennée
---
  tests/guest-debug/run-test.py | 2 ++
  tests/tcg/aarch64/gdbstub/test-sve-ioctl.py   | 3 ---
  tests/tcg/aarch64/gdbstub/test-sve.py | 3 ---
  tests/tcg/multiarch/gdbstub/memory.py | 3 ---
  tests/tcg/multiarch/gdbstub/sha1.py   | 4 
  tests/tcg/multiarch/gdbstub/test-proc-mappings.py | 4 
  tests/tcg/multiarch/gdbstub/test-qxfer-auxv-read.py   | 4 
  tests/tcg/multiarch/gdbstub/test-thread-breakpoint.py | 4 
  tests/tcg/s390x/gdbstub/test-signals-s390x.py | 4 
  tests/tcg/s390x/gdbstub/test-svc.py   | 4 
  10 files changed, 2 insertions(+), 33 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 6/8] tests/tcg: ensure system-mode gdb tests start stopped

2023-08-10 Thread Richard Henderson

On 8/10/23 08:36, Alex Bennée wrote:

Without -S we run into potential races with tests starting before the
gdbstub attaches. We don't need to worry about user-mode as enabling
the gdbstub implies we wait for the initial connection.

Signed-off-by: Alex Bennée
---
  tests/guest-debug/run-test.py | 9 +++--
  1 file changed, 3 insertions(+), 6 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH] target/riscv: Implement WARL behaviour for mcountinhibit/mcounteren

2023-08-10 Thread Alistair Francis
On Wed, Aug 2, 2023 at 8:50 AM Rob Bradford  wrote:
>
> These are WARL fields - zero out the bits for unavailable counters and
> special case the TM bit in mcountinhibit which is hardwired to zero.
> This patch achieves this by modifying the value written so that any use
> of the field will see the correctly masked bits.
>
> Tested by modifying OpenSBI to write max value to these CSRs and upon
> subsequent read the appropriate number of bits for number of PMUs is
> enabled and the TM bit is zero in mcountinhibit.
>
> Signed-off-by: Rob Bradford 

Thanks!

Applied to riscv-to-apply.next

Alistair

> ---
>  target/riscv/csr.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index ea7585329e..495ff6a9c2 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -1834,8 +1834,11 @@ static RISCVException write_mcountinhibit(CPURISCVState *env, int csrno,
>  {
>  int cidx;
>  PMUCTRState *counter;
> +RISCVCPU *cpu = env_archcpu(env);
>
> -env->mcountinhibit = val;
> +/* WARL register - disable unavailable counters; TM bit is always 0 */
> +env->mcountinhibit =
> +val & (cpu->pmu_avail_ctrs | COUNTEREN_CY | COUNTEREN_IR);
>
>  /* Check if any other counter is also monitoring cycles/instructions */
>  for (cidx = 0; cidx < RV_MAX_MHPMCOUNTERS; cidx++) {
> @@ -1858,7 +1861,11 @@ static RISCVException read_mcounteren(CPURISCVState *env, int csrno,
>  static RISCVException write_mcounteren(CPURISCVState *env, int csrno,
> target_ulong val)
>  {
> -env->mcounteren = val;
> +RISCVCPU *cpu = env_archcpu(env);
> +
> +/* WARL register - disable unavailable counters */
> +env->mcounteren = val & (cpu->pmu_avail_ctrs | COUNTEREN_CY | COUNTEREN_TM |
> + COUNTEREN_IR);
>  return RISCV_EXCP_NONE;
>  }
>
> --
> 2.41.0
>
>



Re:Re: [PATCH v1 1/3] softmmu/physmem: fallback to opening guest RAM file as readonly in a MAP_PRIVATE mapping

2023-08-10 Thread ThinerLogoer
At 2023-08-10 22:19:45, "David Hildenbrand"  wrote:
>>> Most importantly, we won't be corrupting/touching the original file in any
>>> case, because it is R/O.
>>>
>>> If we really want to be careful, we could clue that behavior to compat
>>> machines. I'm not really sure yet if we really have to go down that path.
>>>
>>> Any other alternatives? I'd like to avoid new flags where not really
>>> required.
>> 
>> I was just thinking of a new flag. :) So have you already discussed that
>> possibility and decided that not a good idea?
>
>Not really. I was briefly playing with that idea but already struggled 
>to come up with a reasonable name :)
>
>Less toggles and just have it working nice, if possible.
>
>> 
>> The root issue to me here is we actually have two resources (memory map of
>> the process, and the file) but we only have one way to describe the
>> permissions upon the two objects.  I'd think it makes a lot more sense if a
>> new flag is added, when there's a need to differentiate the two.
>> 
>> Consider if you see a bunch of qemu instances with:
>> 
>>-mem-path $RAM_FILE
>> 
>> On the same host, which can be as weird as it could be to me.. At least
>> '-mem-path' looks still like a way to exclusively own a ram file for an
>> instance. I hesitate the new fallback can confuse people too, while that's
>> so far not the major use case.
>
>Once I learned that this is not a MAP_SHARED mapping, I was extremely 
>confused. For example, vhost-user with "-mem-path" will absolutely not 
>work with "-mem-path", even though the documentation explicitly spells 
>that out (I still have to send a patch to fix that).
>
>I guess "-mem-path" was primarily only used to consume hugetlb. Even for 
>tmpfs it will already result in a double memory consumption, just like 
>when using -memory-backend-memfd,share=no.
>
>I guess deprecating it was the right decision.
>
>But memory-backend-file also defaults to "share=no" ... so the same 
>default behavior unfortunately.
>
>> 
>> Nobody may really rely on any existing behavior of the failure, but
>> changing existing behavior is just always not wanted.  The guideline here
>> to me is: whether we want existing "-mem-path XXX" users to start using the
>> fallback in general?  If it's "no", then maybe it implies a new flag is
>> better?
>
>I think we have the following options (there might be more)
>
>1) This patch.
>
>2) New flag for memory-backend-file. We already have "readonly" and 
>"share=". I'm having a hard time coming up with a good name that really 
>describes the subtle difference.
>
>3) Glue behavior to the QEMU machine
>

4) '-deny-private-discard' argv, or environment variable, or both

I proposed a 4) earlier in the discussion, which is to add a global qemu flag
like '-deny-private-discard' or '-disallow-private-discard' (let's find a
better name!) for some duration, until the private-discard behavior phases
out. Without the flag we do everything exactly as before; with the flag,
private CoW mappings of files are strictly opened readonly, discard on a
private memory backend is denied outright as early as the possibility arises,
and a file backing private memory is always opened readonly without creating
any file (so the file must exist, and there are no more nasty edge cases).

This has the benefit that it can also help diagnose and debug all existing
private-discard usages, which could be required in the long run. With this
flag we directly solve the immediate demand while delaying the hard problem
indefinitely. I think this solution seems the most promising and the most
acceptable to everyone. At least for my use case, I would be glad to add
such a flag to my argv if it is all I need, since it does not hurt the
flexibility I care about.

Note that a difference in this option probably should not cause a difference
in the machine specification. Otherwise migration would fail because one
machine has this option and the other does not, which would be absurd, since
it is a backend implementation flag.

>
>For 3), one option would be to always open a COW file readonly (as 
>Thiner originally proposed). We could leave "-mem-path" behavior alone 
>and only change memory-backend-file semantics. If the COW file does 
>*not* exist yet, we would refuse to create the file like patch 2+3 do. 
>Therefore, no ftruncate() errors, and fallocate() errors would always 
>happen.
>
>
>What are your thoughts?
>
>[...]
>

I would be happy if -mem-path stays supported, since in that case I would
not need knowledge of the memory backend before migration.

--

Regards,

logoerthiner

Re: [PATCH 5/5] target/arm: Implement cortex-a710

2023-08-10 Thread Richard Henderson

On 8/10/23 08:49, Peter Maydell wrote:

On Thu, 10 Aug 2023 at 03:36, Richard Henderson
 wrote:


The cortex-a710 is a first generation ARMv9.0-A processor.

Signed-off-by: Richard Henderson 
---
  docs/system/arm/virt.rst |   1 +
  hw/arm/virt.c|   1 +
  target/arm/tcg/cpu64.c   | 167 +++
  3 files changed, 169 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 51cdac6841..e1697ac8f4 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -58,6 +58,7 @@ Supported guest CPU types:
  - ``cortex-a57`` (64-bit)
  - ``cortex-a72`` (64-bit)
  - ``cortex-a76`` (64-bit)
+- ``cortex-a710`` (64-bit)
  - ``a64fx`` (64-bit)
  - ``host`` (with KVM only)
  - ``neoverse-n1`` (64-bit)
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7d9dbc2663..d1522c305d 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -211,6 +211,7 @@ static const char *valid_cpus[] = {
  ARM_CPU_TYPE_NAME("cortex-a55"),
  ARM_CPU_TYPE_NAME("cortex-a72"),
  ARM_CPU_TYPE_NAME("cortex-a76"),
+ARM_CPU_TYPE_NAME("cortex-a710"),
  ARM_CPU_TYPE_NAME("a64fx"),
  ARM_CPU_TYPE_NAME("neoverse-n1"),
  ARM_CPU_TYPE_NAME("neoverse-v1"),


Will sbsa-ref want this core ?


It only has 40 PA bits, and I think sbsa-ref requires 48.


+static void define_cortex_a710_cp_reginfo(ARMCPU *cpu)
+{
+/*
+ * The Cortex A710 has all of the Neoverse V1's IMPDEF
+ * registers and a few more of its own.
+ */
+define_arm_cp_regs(cpu, neoverse_n1_cp_reginfo);
+define_arm_cp_regs(cpu, neoverse_v1_cp_reginfo);
+define_arm_cp_regs(cpu, cortex_a710_cp_reginfo);


The TRM doesn't document the existence of these regs
from the n1 reginfo:

 { .name = "ERXPFGCDN_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 2,
   .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 { .name = "ERXPFGCTL_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 1,
   .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },
 { .name = "ERXPFGF_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 15, .crm = 2, .opc2 = 0,
   .access = PL1_RW, .type = ARM_CP_CONST, .resetvalue = 0 },

This one in the v1 reginfo:

 { .name = "CPUPPMCR3_EL3", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 6, .crn = 15, .crm = 2, .opc2 = 6,
   .access = PL3_RW, .type = ARM_CP_CONST, .resetvalue = 0 },

exists but has been renamed CPUPPMCR6_EL3, which means it's
a duplicate of an entry in your new array. Meanwhile the
A710's actual CPUPPMCR3_EL3 at 3, 0, c15, c2, 4 isn't in
your new array.

(I thought we had an assert to detect duplicate regdefs,
so I'm surprised this didn't fall over.)


It did fall over.  Pre-send-email testing mistake, which I found immediately after (of 
course).



+cpu->revidr = cpu->midr; /* mirror midr: "no significance" */


The bit about "no significance" is just the boilerplate text from
the architecture manual. I think we should continue our usual
practice of setting the revidr to 0.


Ok.


+cpu->isar.id_dfr0  = 0x06011099; /* w/o FEAT_TRF */


You don't have to suppress FEAT_TRF manually, we do
it in cpu.c.


Ok.


+cpu->isar.id_isar5 = 0x11011121;


For isar5 we could say /* with Crypto */


Ok.


+cpu->isar.id_mmfr4 = 0x21021110;


I don't think we implement HPDS == 2 (that's FEAT_HPDS2).
I guess we should push it down to HPDS 1 only in cpu.c
for now. (Or implement it, it's probably simple.)


Feh.  I thought I'd double-checked all of the features.
I'll have a look at implementing that.


+cpu->ctr   = 0x00000004b444c004ull; /* with DIC set */


Why set DIC? The h/w doesn't.


Heh.  From the comment in neoverse-v1, I thought you had force enabled it there.  But it 
must simply be a h/w option?



+cpu->ccsidr[0] = 0x000000ff0000001aull; /* 64KB L1 dcache */
+cpu->ccsidr[1] = 0x000000ff0000001aull; /* 64KB L1 icache */
+cpu->ccsidr[2] = 0x000003ff0000003aull; /* 512KB L2 cache */


I was too lazy to do this for neoverse-v1, so I don't insist
on it here, but if we're going to find ourselves calculating
new-format ccsidr values by hand for each new CPU, I wonder if we
should define a macro that takes numsets, assoc, linesize,
subtracts 1 where relevant, and shifts them into the right bit
fields? (Shame the preprocessor can't do a log2() operation ;-))


I'll create something for this.
It doesn't need to be in the preprocessor.  :-)
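
For illustration, a standalone sketch of what such a helper could look like
(the name make_ccsidr64 and its exact shape are assumptions of this sketch,
not necessarily what will land in QEMU):

#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build a CCSIDR_EL1 value in the 64-bit (FEAT_CCIDX) layout:
 *   [2:0]   LineSize      = log2(line size in words) - 2
 *   [23:3]  Associativity = ways - 1
 *   [55:32] NumSets       = sets - 1
 */
static uint64_t make_ccsidr64(unsigned sets, unsigned ways, unsigned linesize)
{
    unsigned lg_words = 0;

    assert(linesize >= 16 && !(linesize & (linesize - 1)));   /* power of 2 */
    while ((4u << lg_words) < linesize) {   /* log2 of the line size in words */
        lg_words++;
    }
    return (lg_words - 2)
           | ((uint64_t)(ways - 1) << 3)
           | ((uint64_t)(sets - 1) << 32);
}

int main(void)
{
    /* 64KB L1: 256 sets, 4 ways, 64-byte lines */
    printf("0x%016" PRIx64 "\n", make_ccsidr64(256, 4, 64));   /* 0x000000ff0000001a */
    /* 512KB L2: 1024 sets, 8 ways, 64-byte lines */
    printf("0x%016" PRIx64 "\n", make_ccsidr64(1024, 8, 64));  /* 0x000003ff0000003a */
    return 0;
}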

Thanks for the careful review.


r~



Re: [PATCH] target/riscv: Implement WARL behaviour for mcountinhibit/mcounteren

2023-08-10 Thread Alistair Francis
On Wed, Aug 2, 2023 at 8:50 AM Rob Bradford  wrote:
>
> These are WARL fields - zero out the bits for unavailable counters and
> special case the TM bit in mcountinhibit which is hardwired to zero.
> This patch achieves this by modifying the value written so that any use
> of the field will see the correctly masked bits.
>
> Tested by modifying OpenSBI to write max value to these CSRs and upon
> subsequent read the appropriate number of bits for number of PMUs is
> enabled and the TM bit is zero in mcountinhibit.
>
> Signed-off-by: Rob Bradford 

Acked-by: Alistair Francis 

Alistair

> ---
>  target/riscv/csr.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> index ea7585329e..495ff6a9c2 100644
> --- a/target/riscv/csr.c
> +++ b/target/riscv/csr.c
> @@ -1834,8 +1834,11 @@ static RISCVException write_mcountinhibit(CPURISCVState *env, int csrno,
>  {
>  int cidx;
>  PMUCTRState *counter;
> +RISCVCPU *cpu = env_archcpu(env);
>
> -env->mcountinhibit = val;
> +/* WARL register - disable unavailable counters; TM bit is always 0 */
> +env->mcountinhibit =
> +val & (cpu->pmu_avail_ctrs | COUNTEREN_CY | COUNTEREN_IR);
>
>  /* Check if any other counter is also monitoring cycles/instructions */
>  for (cidx = 0; cidx < RV_MAX_MHPMCOUNTERS; cidx++) {
> @@ -1858,7 +1861,11 @@ static RISCVException read_mcounteren(CPURISCVState *env, int csrno,
>  static RISCVException write_mcounteren(CPURISCVState *env, int csrno,
> target_ulong val)
>  {
> -env->mcounteren = val;
> +RISCVCPU *cpu = env_archcpu(env);
> +
> +/* WARL register - disable unavailable counters */
> +env->mcounteren = val & (cpu->pmu_avail_ctrs | COUNTEREN_CY | COUNTEREN_TM |
> + COUNTEREN_IR);
>  return RISCV_EXCP_NONE;
>  }
>
> --
> 2.41.0
>
>


