date:20230322

Re: About the instance_finalize callback in VFIO PCI

2023-03-22 Thread Yang Zhong

On Wed, Mar 22, 2023 at 12:22:27PM -0600, Alex Williamson wrote:
> On Wed, 22 Mar 2023 09:10:20 -0400
> Yang Zhong  wrote:
> 
> > On Wed, Mar 22, 2023 at 01:56:13PM +0100, Cédric Le Goater wrote:
> > > On 3/22/23 13:28, Yang Zhong wrote:  
> > > > On Tue, Mar 21, 2023 at 06:30:14PM +0100, Cédric Le Goater wrote:  
> > > > > On 3/20/23 10:31, Yang Zhong wrote:  
> > > > > > Hello Alex and Paolo,
> > > > > > 
> > > > > > There is one instance_finalize callback definition in hw/vfio/pci.c,
> > > > > > but I find that this callback (vfio_instance_finalize()) is never
> > > > > > called during VM shutdown, whether the VM is closed or "init 0" is
> > > > > > run in the guest.
> > > > > > 
> > > > > > The related QEMU command line:
> > > > > >  ..
> > > > > >  -device vfio-pci,host=d9:00.0
> > > > > >  ..  
> > > > > 
> > > > > Well, the finalize op is correctly called for hot-unplugged devices,
> > > > > using device_add.
> > > > >   
> > > > Thanks Cédric, I can use the device_del command in the monitor to
> > > > trigger this instance_finalize callback function in VFIO PCI.
> > > > Thanks!
> > > 
> > > Yes. I think that in the shutdown case, QEMU simply relies on exit() to
> > > do the cleanup. On the kernel side, unmaps and closed fds trigger the
> > > release of any allocated resources.
> > > 
> > > Out of curiosity, what were you trying to achieve in the finalize op ?
> > >   
> >  
> >  We are working on a new feature that needs this callback to do some
> >  cleanup work with the VFIO/iommufd kernel module. Thanks!
> 
> This sounds dangerously like relying on userspace for cleanup.  Kernel
> drivers need to be able to perform all cleanup themselves when file
> descriptors are closed.  They must expect that userspace can be killed
> at any point in time w/o an opportunity to do cleanup work.  Thanks,
> 

  Thanks Alex, yes, we have moved this cleanup to the kernel driver side.
  I was just curious about the scenario in which this instance_finalize
  callback is used in VFIO PCI; now it is clear. Thanks a lot!

  Regards,
  Yang


> Alex
>
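A minimal POSIX sketch of the point Cédric and Alex make above: when a process exits, even via _exit() with no userspace cleanup at all, the kernel closes its file descriptors, and that is the moment kernel-side resources have to be released. The sketch is not VFIO-specific; a pipe stands in for a device fd, and the helper name kernel_closes_fds_on_exit() is hypothetical.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns true if the parent sees EOF on a pipe after
 * the only process holding the write end exits without closing it.
 */
static bool kernel_closes_fds_on_exit(void)
{
    int fds[2];
    char byte;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) != 0) {
        return false;
    }

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        /* Child: keep the write end open and exit with no cleanup at all. */
        close(fds[0]);
        _exit(0);
    }

    /* Parent: drop its own copy so only the child holds the write end. */
    close(fds[1]);

    /* read() returns 0 (EOF) once every write end has been closed. */
    n = read(fds[0], &byte, 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 0;
}
```

If the parent sees EOF, the child's write end was closed by the kernel's exit path, the same path a killed QEMU process goes through.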

Re: [BUG][KVM_SET_USER_MEMORY_REGION] KVM_SET_USER_MEMORY_REGION failed

2023-03-22 Thread Simon Jones

This happened on Ubuntu 22.04.

QEMU was installed with apt like this:
apt install -y qemu qemu-kvm qemu-system

and the QEMU version is 6.2.0.


Simon Jones


Simon Jones wrote on Tue, Mar 21, 2023 at 08:40:

>
>
> Hi all,
>
> I started a VM in OpenStack, which uses libvirt to start the QEMU VM, but
> now the log shows this ERROR.
> Does anyone know about this?
>
> The ERROR log from /var/log/libvirt/qemu/instance-000e.log
> ```
> 2023-03-14T10:09:17.674114Z qemu-system-x86_64:
> kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4,
> start=0xfe00, size=0x2000: Invalid argument
> kvm_set_phys_mem: error registering slot: Invalid argument
> 2023-03-14 10:09:18.198+: shutting down, reason=crashed
> ```
>
> The xml file
> ```
> root@c1c2:~# cat /etc/libvirt/qemu/instance-000e.xml
> 
>
> 
>   instance-000e
>   ff91d2dc-69a1-43ef-abde-c9e4e9a0305b
>   
> http://openstack.org/xmlns/libvirt/nova/1.1
> ">
>   
>   provider-instance
>   2023-03-14 10:09:13
>   
> 64
> 1
> 0
> 0
> 1
>   
>   
>  uuid="ff627ad39ed94479b9c5033bc462cf78">admin
>  uuid="512866f9994f4ad8916d8539a7cdeec9">admin
>   
>   
>   
> 
>   
> 
>   
> 
>   
>   65536
>   65536
>   1
>   
> 
>   OpenStack Foundation
>   OpenStack Nova
>   25.1.0
>   ff91d2dc-69a1-43ef-abde-c9e4e9a0305b
>   ff91d2dc-69a1-43ef-abde-c9e4e9a0305b
>   Virtual Machine
> 
>   
>   
> hvm
> 
> 
>   
>   
> 
> 
> 
>   
>   
> 
>   
>   
> 
> 
> 
>   
>   destroy
>   restart
>   destroy
>   
> /usr/bin/qemu-system-x86_64
> 
>   
>file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/>
>   
>function='0x0'/>
> 
> 
>function='0x2'/>
> 
> 
> 
>   
>   
>  function='0x5'/>
>   
>function='0x0'/>
> 
> 
>file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log'
> append='off'/>
>   
> 
>   
> 
> 
>file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log'
> append='off'/>
>   
> 
> 
>   
> 
> 
> 
> 
>   
> 
> 
> 
>   
>function='0x0'/>
> 
> 
>   
> 
>   
>function='0x0'/>
> 
> 
>   
>function='0x0'/>
> 
> 
>   /dev/urandom
>function='0x0'/>
> 
>   
> 
> ```
>
>
> 
> Simon Jones
>
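For context, KVM_SET_USER_MEMORY_REGION returns EINVAL from several argument checks. A hedged sketch of a few of them, as I understand the Linux KVM code: memory_size, guest_phys_addr and userspace_addr must be page aligned, and the guest physical range must not wrap. The struct is a hypothetical mirror of struct kvm_userspace_memory_region, a 4 KiB page size is assumed, and the list is not exhaustive (slot number and flag checks are omitted). The start and size values in the log above look truncated by the archive, so this does not by itself explain the failure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB host pages; KVM itself uses the host's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical mirror of struct kvm_userspace_memory_region. */
struct sketch_mem_region {
    uint32_t slot;
    uint32_t flags;
    uint64_t guest_phys_addr;
    uint64_t memory_size;      /* bytes */
    uint64_t userspace_addr;   /* start of the userspace allocation */
};

/* Returns false where (as sketched here) KVM would fail with EINVAL. */
static bool sketch_region_is_valid(const struct sketch_mem_region *mem)
{
    const uint64_t mask = SKETCH_PAGE_SIZE - 1;

    if (mem->memory_size & mask) {
        return false;
    }
    if (mem->guest_phys_addr & mask) {
        return false;
    }
    if (mem->userspace_addr & mask) {
        return false;
    }
    /* The guest physical range must not wrap past the top of the space. */
    if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr) {
        return false;
    }
    return true;
}
```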

Re: [PATCH v4 1/2] target/riscv: separate priv from mmu_idx

2023-03-22 Thread LIU Zhiwei




On 2023/3/23 10:44, Fei Wu wrote:

Currently it's assumed that the 2 low bits of mmu_idx map to the privilege
mode; this assumption won't last as we are about to add more mmu_idx values.

A patch set with more than one patch usually has a cover letter.


Signed-off-by: Fei Wu 
---
  target/riscv/cpu.h | 1 -
  target/riscv/cpu_helper.c  | 2 +-
  target/riscv/insn_trans/trans_privileged.c.inc | 2 +-
  target/riscv/insn_trans/trans_xthead.c.inc | 7 +--
  target/riscv/translate.c   | 3 +++
  5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 638e47c75a..66f7e3d1ba 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -623,7 +623,6 @@ G_NORETURN void riscv_raise_exception(CPURISCVState *env,
  target_ulong riscv_cpu_get_fflags(CPURISCVState *env);
  void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
  
-#define TB_FLAGS_PRIV_MMU_MASK   3

  #define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
  #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
  #define TB_FLAGS_MSTATUS_VS MSTATUS_VS
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index f88c503cf4..76e1b0100e 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -762,7 +762,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical,
   * (riscv_cpu_do_interrupt) is correct */
  MemTxResult res;
  MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
-int mode = mmu_idx & TB_FLAGS_PRIV_MMU_MASK;
+int mode = env->priv;
  bool use_background = false;
  hwaddr ppn;
  RISCVCPU *cpu = env_archcpu(env);
diff --git a/target/riscv/insn_trans/trans_privileged.c.inc b/target/riscv/insn_trans/trans_privileged.c.inc
index 59501b2780..9305b18299 100644
--- a/target/riscv/insn_trans/trans_privileged.c.inc
+++ b/target/riscv/insn_trans/trans_privileged.c.inc
@@ -52,7 +52,7 @@ static bool trans_ebreak(DisasContext *ctx, arg_ebreak *a)
   * that no exception will be raised when fetching them.
   */
  
-if (semihosting_enabled(ctx->mem_idx < PRV_S) &&
+if (semihosting_enabled(ctx->priv < PRV_S) &&
  (pre_addr & TARGET_PAGE_MASK) == (post_addr & TARGET_PAGE_MASK)) {
  pre    = opcode_at(&ctx->base, pre_addr);
  ebreak = opcode_at(&ctx->base, ebreak_addr);
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc b/target/riscv/insn_trans/trans_xthead.c.inc
index df504c3f2c..adfb53cb4c 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -265,12 +265,7 @@ static bool trans_th_tst(DisasContext *ctx, arg_th_tst *a)
  
  static inline int priv_level(DisasContext *ctx)
  {
-#ifdef CONFIG_USER_ONLY
-return PRV_U;
-#else
- /* Priv level is part of mem_idx. */
-return ctx->mem_idx & TB_FLAGS_PRIV_MMU_MASK;
-#endif
+return ctx->priv;
  }
  
  /* Test if priv level is M, S, or U (cannot fail). */

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0ee8ee147d..e8880f9423 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -69,6 +69,7 @@ typedef struct DisasContext {
  uint32_t mstatus_hs_fs;
  uint32_t mstatus_hs_vs;
  uint32_t mem_idx;
+uint32_t priv;
  /* Remember the rounding mode encoded in the previous fp instruction,
 which we have already installed into env->fp_status.  Or -1 for
 no previous fp instruction.  Note that we exit the TB when writing
@@ -1162,8 +1163,10 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
  } else {
  ctx->virt_enabled = false;
  }
+ctx->priv = env->priv;


This is not right. You should put env->priv into tb flags before you use 
it in translation.


Zhiwei


  #else
  ctx->virt_enabled = false;
+ctx->priv = PRV_U;
  #endif
  ctx->misa_ext = env->misa_ext;
  ctx->frm = -1;  /* unknown rounding mode */
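Zhiwei's point, sketched: translated blocks are looked up and reused by pc and TB flags, so anything the translator reads must be captured in those flags. Previously priv travelled implicitly, because mmu_idx (part of the flags) encoded it in its two low bits; once priv is split out, it needs its own field. The layout below (a 2-bit field at a hypothetical shift of 8) and all sk_* names are assumptions for illustration; QEMU targets usually describe such fields with the FIELD() helpers from hw/registerfields.h.

```c
#include <stdint.h>

/* RISC-V privilege encodings: U = 0, S = 1, M = 3. */
enum { SK_PRV_U = 0, SK_PRV_S = 1, SK_PRV_M = 3 };

/* Hypothetical TB-flags layout: a 2-bit priv field at bits [9:8]. */
#define SK_TB_PRIV_SHIFT 8
#define SK_TB_PRIV_MASK  (3u << SK_TB_PRIV_SHIFT)

/* When computing TB flags for lookup: record the current privilege level. */
static uint32_t sk_pack_priv(uint32_t flags, unsigned priv)
{
    flags &= ~SK_TB_PRIV_MASK;
    return flags | ((priv << SK_TB_PRIV_SHIFT) & SK_TB_PRIV_MASK);
}

/* At translation time: take priv from the flags, not from env->priv. */
static unsigned sk_unpack_priv(uint32_t flags)
{
    return (flags & SK_TB_PRIV_MASK) >> SK_TB_PRIV_SHIFT;
}
```

With such a field, riscv_tr_init_disas_context() would read priv from the flags it is given, so a block translated in M mode can never be reused for U-mode execution.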

[PATCH v8 2/4] virtio-blk: add zoned storage emulation for zoned devices

2023-03-22 Thread Sam Li

This patch extends virtio-blk emulation to handle zoned device commands
by calling the new block layer APIs to perform zoned device I/O on
behalf of the guest. It supports Report Zone, four zone operations (open,
close, finish, reset), and Append Zone.
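For Report Zone, the number of descriptors the device can return is bounded by the driver's input buffer. A sketch of that arithmetic, assuming the layout in the virtio spec and the Linux uapi header as I understand it: a 64-byte struct virtio_blk_zone_report header holding nr_zones, followed by 64-byte struct virtio_blk_zone_descriptor entries, with the one-byte struct virtio_blk_inhdr status at the end of the buffer. The sk_* structs are hypothetical mirrors, and the underflow guard in sk_max_zones() is an addition of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct virtio_blk_zone_report's fixed header. */
struct sk_zone_report_hdr {
    uint64_t nr_zones;
    uint8_t reserved[56];
};

/* Hypothetical mirror of struct virtio_blk_zone_descriptor. */
struct sk_zone_descriptor {
    uint64_t z_cap;
    uint64_t z_start;
    uint64_t z_wp;
    uint8_t z_type;
    uint8_t z_state;
    uint8_t reserved[38];
};

/* struct virtio_blk_inhdr carries a single status byte. */
#define SK_INHDR_SIZE 1

/* How many descriptors fit in a driver-supplied input buffer of in_len bytes. */
static size_t sk_max_zones(size_t in_len)
{
    size_t fixed = SK_INHDR_SIZE + sizeof(struct sk_zone_report_hdr);

    if (in_len < fixed) {
        return 0;
    }
    return (in_len - fixed) / sizeof(struct sk_zone_descriptor);
}

/* Bytes written back for nz zones: header plus descriptors, no status. */
static size_t sk_report_size(size_t nz)
{
    return sizeof(struct sk_zone_report_hdr) +
           nz * sizeof(struct sk_zone_descriptor);
}
```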

The VIRTIO_BLK_F_ZONED feature bit will only be set if the host supports
zoned block devices. It will not be set for regular block devices
(conventional zones).

The guest OS can use blktests and fio to test these commands on zoned
devices. Testing zone append writes with zonefs is also supported.
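check_zoned_request() in this patch first reports VIRTIO_BLK_S_UNSUPP when VIRTIO_BLK_F_ZONED was not offered, then bounds-checks the request. Its range test is written as offset > capacity - len rather than offset + len > capacity, so no sum can overflow int64_t. A standalone sketch of just that check, with a hypothetical function name and the capacity given in bytes:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the range test: true if
 * [offset, offset + len) lies within a device of capacity bytes.
 */
static bool sk_zoned_range_ok(int64_t offset, int64_t len, int64_t capacity)
{
    if (offset < 0 || len < 0 || len > capacity) {
        return false;
    }
    /* capacity - len cannot overflow here because 0 <= len <= capacity. */
    return offset <= capacity - len;
}
```

A caller passing offset = INT64_MAX and len = 1 is rejected cleanly instead of wrapping to a negative end offset.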

Signed-off-by: Sam Li 
---
 hw/block/virtio-blk-common.c |   2 +
 hw/block/virtio-blk.c| 389 +++
 2 files changed, 391 insertions(+)

diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
index ac52d7c176..e2f8e2f6da 100644
--- a/hw/block/virtio-blk-common.c
+++ b/hw/block/virtio-blk-common.c
@@ -29,6 +29,8 @@ static const VirtIOFeature feature_sizes[] = {
  .end = endof(struct virtio_blk_config, discard_sector_alignment)},
 {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
  .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
+{.flags = 1ULL << VIRTIO_BLK_F_ZONED,
+ .end = endof(struct virtio_blk_config, zoned)},
 {}
 };
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index cefca93b31..66c2bc4b16 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -17,6 +17,7 @@
 #include "qemu/module.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "block/block_int.h"
 #include "trace.h"
 #include "hw/block/block.h"
 #include "hw/qdev-properties.h"
@@ -601,6 +602,335 @@ err:
 return err_status;
 }
 
+typedef struct ZoneCmdData {
+VirtIOBlockReq *req;
+struct iovec *in_iov;
+unsigned in_num;
+union {
+struct {
+unsigned int nr_zones;
+BlockZoneDescriptor *zones;
+} zone_report_data;
+struct {
+int64_t offset;
+} zone_append_data;
+};
+} ZoneCmdData;
+
+/*
+ * check zoned_request: error checking before issuing requests. If all checks
+ * passed, return true.
+ * append: true if only zone append requests issued.
+ */
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
+ bool append, uint8_t *status) {
+BlockDriverState *bs = blk_bs(s->blk);
+int index;
+
+if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
+*status = VIRTIO_BLK_S_UNSUPP;
+return false;
+}
+
+if (offset < 0 || len < 0 || len > (bs->total_sectors << BDRV_SECTOR_BITS)
+|| offset > (bs->total_sectors << BDRV_SECTOR_BITS) - len) {
+*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+return false;
+}
+
+if (append) {
+if (bs->bl.write_granularity) {
+if ((offset % bs->bl.write_granularity) != 0) {
+*status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
+return false;
+}
+}
+
+index = offset / bs->bl.zone_size;
+if (BDRV_ZT_IS_CONV(bs->bl.wps->wp[index])) {
+*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+return false;
+}
+
+if (len / 512 > bs->bl.max_append_sectors) {
+if (bs->bl.max_append_sectors == 0) {
+*status = VIRTIO_BLK_S_UNSUPP;
+} else {
+*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+}
+return false;
+}
+}
+return true;
+}
+
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
+{
+ZoneCmdData *data = opaque;
+VirtIOBlockReq *req = data->req;
+VirtIOBlock *s = req->dev;
+VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
+struct iovec *in_iov = data->in_iov;
+unsigned in_num = data->in_num;
+int64_t zrp_size, n, j = 0;
+int64_t nz = data->zone_report_data.nr_zones;
+int8_t err_status = VIRTIO_BLK_S_OK;
+
+if (ret) {
+err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+goto out;
+}
+
+struct virtio_blk_zone_report zrp_hdr = (struct virtio_blk_zone_report) {
+.nr_zones = cpu_to_le64(nz),
+};
+zrp_size = sizeof(struct virtio_blk_zone_report)
+   + sizeof(struct virtio_blk_zone_descriptor) * nz;
+n = iov_from_buf(in_iov, in_num, 0, &zrp_hdr, sizeof(zrp_hdr));
+if (n != sizeof(zrp_hdr)) {
+virtio_error(vdev, "Driver provided input buffer that is too small!");
+err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+goto out;
+}
+
+for (size_t i = sizeof(zrp_hdr); i < zrp_size;
+i += sizeof(struct virtio_blk_zone_descriptor), ++j) {
+struct virtio_blk_zone_descriptor desc =
+(struct virtio_blk_zone_descriptor) {
+.z_start = cpu_to_le64(data->zone_report_data.zones[j].start
+>> BDRV_SECTOR_BITS),
+.z_cap = cpu_to_le64(data->zone_report_data.zones[j].cap
+

[PATCH v8 4/4] virtio-blk: add some trace events for zoned emulation

2023-03-22 Thread Sam Li

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 hw/block/trace-events |  7 +++
 hw/block/virtio-blk.c | 12 
 2 files changed, 19 insertions(+)

diff --git a/hw/block/trace-events b/hw/block/trace-events
index 2c45a62bd5..34be8b9135 100644
--- a/hw/block/trace-events
+++ b/hw/block/trace-events
@@ -44,9 +44,16 @@ pflash_write_unknown(const char *name, uint8_t cmd) "%s: unknown command 0x%02x"
 # virtio-blk.c
 virtio_blk_req_complete(void *vdev, void *req, int status) "vdev %p req %p status %d"
 virtio_blk_rw_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
+virtio_blk_zone_report_complete(void *vdev, void *req, unsigned int nr_zones, int ret) "vdev %p req %p nr_zones %u ret %d"
+virtio_blk_zone_mgmt_complete(void *vdev, void *req, int ret) "vdev %p req %p ret %d"
+virtio_blk_zone_append_complete(void *vdev, void *req, int64_t sector, int ret) "vdev %p req %p, append sector 0x%" PRIx64 " ret %d"
 virtio_blk_handle_write(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
 virtio_blk_handle_read(void *vdev, void *req, uint64_t sector, size_t nsectors) "vdev %p req %p sector %"PRIu64" nsectors %zu"
 virtio_blk_submit_multireq(void *vdev, void *mrb, int start, int num_reqs, uint64_t offset, size_t size, bool is_write) "vdev %p mrb %p start %d num_reqs %d offset %"PRIu64" size %zu is_write %d"
+virtio_blk_handle_zone_report(void *vdev, void *req, int64_t sector, unsigned int nr_zones) "vdev %p req %p sector 0x%" PRIx64 " nr_zones %u"
+virtio_blk_handle_zone_mgmt(void *vdev, void *req, uint8_t op, int64_t sector, int64_t len) "vdev %p req %p op 0x%x sector 0x%" PRIx64 " len 0x%" PRIx64 ""
+virtio_blk_handle_zone_reset_all(void *vdev, void *req, int64_t sector, int64_t len) "vdev %p req %p sector 0x%" PRIx64 " cap 0x%" PRIx64 ""
+virtio_blk_handle_zone_append(void *vdev, void *req, int64_t sector) "vdev %p req %p, append sector 0x%" PRIx64 ""
 
 # hd-geometry.c
 hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p LCHS %d %d %d"
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 0d85c2c9b0..2afd5cf96c 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -676,6 +676,7 @@ static void virtio_blk_zone_report_complete(void *opaque, int ret)
 int64_t nz = data->zone_report_data.nr_zones;
 int8_t err_status = VIRTIO_BLK_S_OK;
 
+trace_virtio_blk_zone_report_complete(vdev, req, nz, ret);
 if (ret) {
 err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
 goto out;
@@ -792,6 +793,8 @@ static void virtio_blk_handle_zone_report(VirtIOBlockReq *req,
 nr_zones = (req->in_len - sizeof(struct virtio_blk_inhdr) -
 sizeof(struct virtio_blk_zone_report)) /
sizeof(struct virtio_blk_zone_descriptor);
+trace_virtio_blk_handle_zone_report(vdev, req,
+offset >> BDRV_SECTOR_BITS, nr_zones);
 
 zone_size = sizeof(BlockZoneDescriptor) * nr_zones;
 data = g_malloc(sizeof(ZoneCmdData));
@@ -814,7 +817,9 @@ static void virtio_blk_zone_mgmt_complete(void *opaque, int ret)
 {
 VirtIOBlockReq *req = opaque;
 VirtIOBlock *s = req->dev;
+VirtIODevice *vdev = VIRTIO_DEVICE(s);
 int8_t err_status = VIRTIO_BLK_S_OK;
+trace_virtio_blk_zone_mgmt_complete(vdev, req,ret);
 
 if (ret) {
 err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
@@ -841,6 +846,8 @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
 /* Entire drive capacity */
 offset = 0;
 len = capacity;
+trace_virtio_blk_handle_zone_reset_all(vdev, req, 0,
+   bs->total_sectors);
 } else {
 if (bs->bl.zone_size > capacity - offset) {
 /* The zoned device allows the last smaller zone. */
@@ -848,6 +855,9 @@ static int virtio_blk_handle_zone_mgmt(VirtIOBlockReq *req, BlockZoneOp op)
 } else {
 len = bs->bl.zone_size;
 }
+trace_virtio_blk_handle_zone_mgmt(vdev, req, op,
+  offset >> BDRV_SECTOR_BITS,
+  len >> BDRV_SECTOR_BITS);
 }
 
 if (!check_zoned_request(s, offset, len, false, &err_status)) {
@@ -888,6 +898,7 @@ static void virtio_blk_zone_append_complete(void *opaque, int ret)
 err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
 goto out;
 }
+trace_virtio_blk_zone_append_complete(vdev, req, append_sector, ret);
 
 out:
 aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
@@ -909,6 +920,7 @@ static int virtio_blk_handle_zone_append(VirtIOBlockReq *req,
 int64_t offset = virtio_ldq_p(vdev, &req->out.sector) << BDRV_SECTOR_BITS;
 int64_t len = iov_size(out_iov, out_num);
 
+trace_virtio_blk_handle_zone_append(vdev, req, offset >> BDRV_SECTOR_BITS);
 if (!check_zoned_request(s, offset, len, true, &err_status)) {
 goto

[PATCH v8 3/4] block: add accounting for zone append operation

2023-03-22 Thread Sam Li

To account for the new zone append write operation on zoned devices, a
BLOCK_ACCT_ZONE_APPEND enum value is introduced alongside the other I/O
request types (read, write, flush).
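Block accounting keeps its counters in arrays indexed by the request type, which is why this patch mostly adds one enum value and then copies the new slot out in bdrv_query_blk_stats(). A reduced sketch of that pattern; the enum ordering, the struct and the sk_* names are hypothetical and not QEMU's actual definitions:

```c
#include <stdint.h>

/* Hypothetical request types; only the idea of one slot per type matters. */
enum sk_acct_type {
    SK_ACCT_NONE = 0,
    SK_ACCT_READ,
    SK_ACCT_WRITE,
    SK_ACCT_ZONE_APPEND,
    SK_ACCT_FLUSH,
    SK_ACCT_UNMAP,
    SK_ACCT_MAX,
};

/* Per-type counters, sized by the enum so a new type gets its own slot. */
struct sk_acct_stats {
    uint64_t nr_bytes[SK_ACCT_MAX];
    uint64_t nr_ops[SK_ACCT_MAX];
    uint64_t failed_ops[SK_ACCT_MAX];
};

/* Record one finished request of the given type. */
static void sk_acct_done(struct sk_acct_stats *s, enum sk_acct_type type,
                         uint64_t bytes, int failed)
{
    if (failed) {
        s->failed_ops[type]++;
        return;
    }
    s->nr_bytes[type] += bytes;
    s->nr_ops[type]++;
}
```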

Signed-off-by: Sam Li 
---
 block/qapi-sysemu.c| 11 +++
 block/qapi.c   | 18 +++
 hw/block/virtio-blk.c  |  4 +++
 include/block/accounting.h |  1 +
 qapi/block-core.json   | 62 +++---
 qapi/block.json|  4 +++
 6 files changed, 89 insertions(+), 11 deletions(-)

diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index 7bd7554150..cec3c1afb4 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -517,6 +517,7 @@ void qmp_block_latency_histogram_set(
 bool has_boundaries, uint64List *boundaries,
 bool has_boundaries_read, uint64List *boundaries_read,
 bool has_boundaries_write, uint64List *boundaries_write,
+bool has_boundaries_append, uint64List *boundaries_append,
 bool has_boundaries_flush, uint64List *boundaries_flush,
 Error **errp)
 {
@@ -557,6 +558,16 @@ void qmp_block_latency_histogram_set(
 }
 }
 
+if (has_boundaries || has_boundaries_append) {
+ret = block_latency_histogram_set(
+stats, BLOCK_ACCT_ZONE_APPEND,
+has_boundaries_append ? boundaries_append : boundaries);
+if (ret) {
+error_setg(errp, "Device '%s' set append write boundaries fail", id);
+return;
+}
+}
+
 if (has_boundaries || has_boundaries_flush) {
 ret = block_latency_histogram_set(
 stats, BLOCK_ACCT_FLUSH,
diff --git a/block/qapi.c b/block/qapi.c
index c84147849d..2684484e9d 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -533,27 +533,36 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 
 ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
 ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
+ds->zone_append_bytes = stats->nr_bytes[BLOCK_ACCT_ZONE_APPEND];
 ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
 ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
 ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
+ds->zone_append_operations = stats->nr_ops[BLOCK_ACCT_ZONE_APPEND];
 ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];
 
 ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
 ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
+ds->failed_zone_append_operations =
+stats->failed_ops[BLOCK_ACCT_ZONE_APPEND];
 ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
 ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];
 
 ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
 ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
+ds->invalid_zone_append_operations =
+stats->invalid_ops[BLOCK_ACCT_ZONE_APPEND];
 ds->invalid_flush_operations =
 stats->invalid_ops[BLOCK_ACCT_FLUSH];
 ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];
 
 ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
 ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
+ds->zone_append_merged = stats->merged[BLOCK_ACCT_ZONE_APPEND];
 ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
 ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
 ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
+ds->zone_append_total_time_ns =
+stats->total_time_ns[BLOCK_ACCT_ZONE_APPEND];
 ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
 ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
 ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
@@ -571,6 +580,7 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 
 TimedAverage *rd = &ts->latency[BLOCK_ACCT_READ];
 TimedAverage *wr = &ts->latency[BLOCK_ACCT_WRITE];
+TimedAverage *zap = &ts->latency[BLOCK_ACCT_ZONE_APPEND];
 TimedAverage *fl = &ts->latency[BLOCK_ACCT_FLUSH];
 
 dev_stats->interval_length = ts->interval_length;
@@ -583,6 +593,10 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 dev_stats->max_wr_latency_ns = timed_average_max(wr);
 dev_stats->avg_wr_latency_ns = timed_average_avg(wr);
 
+dev_stats->min_zone_append_latency_ns = timed_average_min(zap);
+dev_stats->max_zone_append_latency_ns = timed_average_max(zap);
+dev_stats->avg_zone_append_latency_ns = timed_average_avg(zap);
+
 dev_stats->min_flush_latency_ns = timed_average_min(fl);
 dev_stats->max_flush_latency_ns = timed_average_max(fl);
 dev_stats->avg_flush_latency_ns = timed_average_avg(fl);
@@ -591,6 +605,8 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 block_acct_queue_depth(ts, BLOCK_ACCT_READ);
 dev_stats->avg_wr_queue_depth =

[PATCH v8 1/4] include: update virtio_blk headers to v6.3-rc1

2023-03-22 Thread Sam Li

Use scripts/update-linux-headers.sh to update headers to 6.3-rc1.
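One change pulled in by this header update (in ethtool.h) replaces the GNU zero-length array rule_locs[0] with a C99 flexible array member rule_locs[]. A small sketch of how such a trailing array is allocated and used; the struct is a hypothetical reduction of struct ethtool_rxnfc, not its real layout:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduction: a count followed by a flexible array member. */
struct sk_rule_list {
    uint32_t rule_cnt;
    uint32_t rule_locs[];   /* C99 flexible array member (was rule_locs[0]) */
};

/* Allocate the header plus room for n trailing entries in one block. */
static struct sk_rule_list *sk_rule_list_new(uint32_t n)
{
    struct sk_rule_list *l =
        malloc(sizeof(*l) + (size_t)n * sizeof(l->rule_locs[0]));

    if (l) {
        l->rule_cnt = n;
        for (uint32_t i = 0; i < n; i++) {
            l->rule_locs[i] = i;
        }
    }
    return l;
}
```

The flexible array contributes nothing to sizeof(struct sk_rule_list), so the allocation adds space for the entries explicitly.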

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Dmitry Fomichev 
---
 include/standard-headers/drm/drm_fourcc.h|  12 +++
 include/standard-headers/linux/ethtool.h |  48 -
 include/standard-headers/linux/fuse.h|  45 +++-
 include/standard-headers/linux/pci_regs.h|   1 +
 include/standard-headers/linux/vhost_types.h |   2 +
 include/standard-headers/linux/virtio_blk.h  | 105 +++
 linux-headers/asm-arm64/kvm.h|   1 +
 linux-headers/asm-x86/kvm.h  |  34 +-
 linux-headers/linux/kvm.h|   9 ++
 linux-headers/linux/vfio.h   |  15 +--
 linux-headers/linux/vhost.h  |   8 ++
 11 files changed, 270 insertions(+), 10 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h b/include/standard-headers/drm/drm_fourcc.h
index 69cab17b38..dc3e6112c1 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -87,6 +87,18 @@ extern "C" {
  *
  * The authoritative list of format modifier codes is found in
  * `include/uapi/drm/drm_fourcc.h`
+ *
+ * Open Source User Waiver
+ * ---
+ *
+ * Because this is the authoritative source for pixel formats and modifiers
+ * referenced by GL, Vulkan extensions and other standards and hence used both
+ * by open source and closed source driver stacks, the usual requirement for an
+ * upstream in-kernel or open source userspace user does not apply.
+ *
+ * To ensure, as much as feasible, compatibility across stacks and avoid
+ * confusion with incompatible enumerations stakeholders for all relevant driver
+ * stacks should approve additions.
  */
 
 #define fourcc_code(a, b, c, d) ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
diff --git a/include/standard-headers/linux/ethtool.h b/include/standard-headers/linux/ethtool.h
index 87176ab075..99fcddf04f 100644
--- a/include/standard-headers/linux/ethtool.h
+++ b/include/standard-headers/linux/ethtool.h
@@ -711,6 +711,24 @@ enum ethtool_stringset {
ETH_SS_COUNT
 };
 
+/**
+ * enum ethtool_mac_stats_src - source of ethtool MAC statistics
+ * @ETHTOOL_MAC_STATS_SRC_AGGREGATE:
+ * if device supports a MAC merge layer, this retrieves the aggregate
+ * statistics of the eMAC and pMAC. Otherwise, it retrieves just the
+ * statistics of the single (express) MAC.
+ * @ETHTOOL_MAC_STATS_SRC_EMAC:
+ * if device supports a MM layer, this retrieves the eMAC statistics.
+ * Otherwise, it retrieves the statistics of the single (express) MAC.
+ * @ETHTOOL_MAC_STATS_SRC_PMAC:
+ * if device supports a MM layer, this retrieves the pMAC statistics.
+ */
+enum ethtool_mac_stats_src {
+   ETHTOOL_MAC_STATS_SRC_AGGREGATE,
+   ETHTOOL_MAC_STATS_SRC_EMAC,
+   ETHTOOL_MAC_STATS_SRC_PMAC,
+};
+
 /**
  * enum ethtool_module_power_mode_policy - plug-in module power mode policy
  * @ETHTOOL_MODULE_POWER_MODE_POLICY_HIGH: Module is always in high power mode.
@@ -779,6 +797,31 @@ enum ethtool_podl_pse_pw_d_status {
ETHTOOL_PODL_PSE_PW_D_STATUS_ERROR,
 };
 
+/**
+ * enum ethtool_mm_verify_status - status of MAC Merge Verify function
+ * @ETHTOOL_MM_VERIFY_STATUS_UNKNOWN:
+ * verification status is unknown
+ * @ETHTOOL_MM_VERIFY_STATUS_INITIAL:
+ * the 802.3 Verify State diagram is in the state INIT_VERIFICATION
+ * @ETHTOOL_MM_VERIFY_STATUS_VERIFYING:
+ * the Verify State diagram is in the state VERIFICATION_IDLE,
+ * SEND_VERIFY or WAIT_FOR_RESPONSE
+ * @ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED:
+ * indicates that the Verify State diagram is in the state VERIFIED
+ * @ETHTOOL_MM_VERIFY_STATUS_FAILED:
+ * the Verify State diagram is in the state VERIFY_FAIL
+ * @ETHTOOL_MM_VERIFY_STATUS_DISABLED:
+ * verification of preemption operation is disabled
+ */
+enum ethtool_mm_verify_status {
+   ETHTOOL_MM_VERIFY_STATUS_UNKNOWN,
+   ETHTOOL_MM_VERIFY_STATUS_INITIAL,
+   ETHTOOL_MM_VERIFY_STATUS_VERIFYING,
+   ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED,
+   ETHTOOL_MM_VERIFY_STATUS_FAILED,
+   ETHTOOL_MM_VERIFY_STATUS_DISABLED,
+};
+
 /**
  * struct ethtool_gstrings - string set for data tagging
  * @cmd: Command number = %ETHTOOL_GSTRINGS
@@ -1183,7 +1226,7 @@ struct ethtool_rxnfc {
 uint32_t rule_cnt;
 uint32_t rss_context;
 };
-   uint32_t rule_locs[0];
+   uint32_t rule_locs[];
 };
 
 
@@ -1741,6 +1784,9 @@ enum ethtool_link_mode_bit_indices {
ETHTOOL_LINK_MODE_80baseDR8_2_Full_BIT   = 96,
ETHTOOL_LINK_MODE_80baseSR8_Full_BIT = 97,
ETHTOOL_LINK_MODE_80baseVR8_Full_BIT = 98,
+   ETHTOOL_LINK_MODE_10baseT1S_Full_BIT = 99,
+   ETHTOOL_LINK_MODE_10baseT1S_Half_BIT

[PATCH v8 0/4] Add zoned storage emulation to virtio-blk driver

2023-03-22 Thread Sam Li

This patch adds zoned storage emulation to the virtio-blk driver.

The patch series implements the virtio-blk ZBD support standardization
that was recently accepted into virtio-spec. The related commit is at

https://github.com/oasis-tcs/virtio-spec/commit/b4e8efa0fa6c8d844328090ad15db65af8d7d981

The Linux zoned device code implemented by Dmitry Fomichev has been
released in the latest Linux version, v6.3-rc1.

Aside: adding a zoned=on-like option to the virtio-blk device will be
considered as a follow-up.

v7:
- address Stefan's review comments
  * rm aio_context_acquire/release in handle_req
  * rename function return type
  * rename BLOCK_ACCT_APPEND to BLOCK_ACCT_ZONE_APPEND for clarity

v6:
- update headers to v6.3-rc1

v5:
- address Stefan's review comments
  * restore the way writing zone append result to buffer
  * fix error checking case and other errands

v4:
- change the way writing zone append request result to buffer
- change zone state, zone type value of virtio_blk_zone_descriptor
- add trace events for new zone APIs

v3:
- use qemuio_from_buffer to write status bit [Stefan]
- avoid using req->elem directly [Stefan]
- fix error checking and a memory leak [Stefan]

v2:
- change units of emulated zone op corresponding to block layer APIs
- modify error checking cases [Stefan, Damien]

v1:
- add zoned storage emulation

Sam Li (4):
  include: update virtio_blk headers to v6.3-rc1
  virtio-blk: add zoned storage emulation for zoned devices
  block: add accounting for zone append operation
  virtio-blk: add some trace events for zoned emulation

 block/qapi-sysemu.c  |  11 +
 block/qapi.c |  18 +
 hw/block/trace-events|   7 +
 hw/block/virtio-blk-common.c |   2 +
 hw/block/virtio-blk.c| 405 +++
 include/block/accounting.h   |   1 +
 include/standard-headers/drm/drm_fourcc.h|  12 +
 include/standard-headers/linux/ethtool.h |  48 ++-
 include/standard-headers/linux/fuse.h|  45 ++-
 include/standard-headers/linux/pci_regs.h|   1 +
 include/standard-headers/linux/vhost_types.h |   2 +
 include/standard-headers/linux/virtio_blk.h  | 105 +
 linux-headers/asm-arm64/kvm.h|   1 +
 linux-headers/asm-x86/kvm.h  |  34 +-
 linux-headers/linux/kvm.h|   9 +
 linux-headers/linux/vfio.h   |  15 +-
 linux-headers/linux/vhost.h  |   8 +
 qapi/block-core.json |  62 ++-
 qapi/block.json  |   4 +
 19 files changed, 769 insertions(+), 21 deletions(-)

-- 
2.39.2

[PATCH v7 3/4] qemu-iotests: test zone append operation

2023-03-22 Thread Sam Li

This patch tests zone append writes by reporting the zone wp after the
call completes. The "zap -p" option prints the sector offset after
completion, which should be the start sector where the append write
began.
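The expected output in zoned.out follows from zone append semantics: data lands at the current write pointer, that position is reported back, and the pointer advances by the write length. A sketch of that bookkeeping, with hypothetical names, byte-based state reported in 512-byte sectors, and a 256 MiB zone assumed from the 268435456-byte second-zone offset used by the test:

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_SECTOR_BITS 9   /* 512-byte sectors, like BDRV_SECTOR_BITS */

/* Hypothetical per-zone state, tracked in bytes. */
struct sk_zone {
    uint64_t start;
    uint64_t cap;
    uint64_t wp;
};

/*
 * Emulate a zone append of len bytes: report the current write pointer as
 * the sector where the data lands, then advance the pointer by len.
 */
static bool sk_zone_append(struct sk_zone *z, uint64_t len, uint64_t *sector)
{
    if (len > z->start + z->cap - z->wp) {
        return false; /* would cross the zone's writable capacity */
    }
    *sector = z->wp >> SK_SECTOR_BITS;
    z->wp += len;
    return true;
}
```

Two appends of 0x1000 + 0x2000 bytes (0x18 sectors each) to the first zone report sectors 0x0 and 0x18 and leave the wp at 0x30, matching the first-zone lines of zoned.out.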

Signed-off-by: Sam Li 
---
 qemu-io-cmds.c | 75 ++
 tests/qemu-iotests/tests/zoned | 16 +++
 tests/qemu-iotests/tests/zoned.out | 16 +++
 3 files changed, 107 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index f35ea627d7..3f75d2f5a6 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1874,6 +1874,80 @@ static const cmdinfo_t zone_reset_cmd = {
 .oneline = "reset a zone write pointer in zone block device",
 };
 
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
+  int64_t *offset, int flags, int *total)
+{
+int async_ret = NOT_DONE;
+
+blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
+while (async_ret == NOT_DONE) {
+main_loop_wait(false);
+}
+
+*total = qiov->size;
+return async_ret < 0 ? async_ret : 1;
+}
+
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
+{
+int ret;
+bool pflag = false;
+int flags = 0;
+int total = 0;
+int64_t offset;
+char *buf;
+int c, nr_iov;
+int pattern = 0xcd;
+QEMUIOVector qiov;
+
+if (optind > argc - 3) {
+return -EINVAL;
+}
+
+if ((c = getopt(argc, argv, "p")) != -1) {
+pflag = true;
+}
+
+offset = cvtnum(argv[optind]);
+if (offset < 0) {
+print_cvtnum_err(offset, argv[optind]);
+return offset;
+}
+optind++;
+nr_iov = argc - optind;
+buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern,
+   flags & BDRV_REQ_REGISTERED_BUF);
+if (buf == NULL) {
+return -EINVAL;
+}
+ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
+if (ret < 0) {
+printf("zone append failed: %s\n", strerror(-ret));
+goto out;
+}
+
+if (pflag) {
+printf("After zap done, the append sector is 0x%" PRIx64 "\n",
+   tosector(offset));
+}
+
+out:
+qemu_io_free(blk, buf, qiov.size,
+ flags & BDRV_REQ_REGISTERED_BUF);
+qemu_iovec_destroy(&qiov);
+return ret;
+}
+
+static const cmdinfo_t zone_append_cmd = {
+.name = "zone_append",
+.altname = "zap",
+.cfunc = zone_append_f,
+.argmin = 3,
+.argmax = 4,
+.args = "offset len [len..]",
+.oneline = "append write a number of bytes at a specified offset",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
 .name   = "truncate",
@@ -2672,6 +2746,7 @@ static void __attribute((constructor)) init_qemuio_commands(void)
 qemuio_add_command(&zone_close_cmd);
 qemuio_add_command(&zone_finish_cmd);
 qemuio_add_command(&zone_reset_cmd);
+qemuio_add_command(&zone_append_cmd);
 qemuio_add_command(_cmd);
 qemuio_add_command(_cmd);
 qemuio_add_command(_cmd);
diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
index 53097e44d9..46e4f25919 100755
--- a/tests/qemu-iotests/tests/zoned
+++ b/tests/qemu-iotests/tests/zoned
@@ -79,6 +79,22 @@ echo "(5) resetting the second zone"
 $QEMU_IO $IMG -c "zrs 268435456 268435456"
 echo "After resetting a zone:"
 $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(6) append write" # the physical block size of the device is 4096
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone firstly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 0 0x1000 0x2000"
+echo "After appending the first zone secondly:"
+$QEMU_IO $IMG -c "zrp 0 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone firstly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+$QEMU_IO $IMG -c "zap -p 268435456 0x1000 0x2000"
+echo "After appending the second zone secondly:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
index b2d061da49..fe53ba4744 100644
--- a/tests/qemu-iotests/tests/zoned.out
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -50,4 +50,20 @@ start: 0x8, len 0x8, cap 0x8, wptr 0x10, zcond:14, [type: 2]
 (5) resetting the second zone
 After resetting a zone:
 start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2]
+
+
+(6) append write
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
+After zap done, the append sector is 0x0
+After appending the first zone firstly:
+start: 0x0, len 0x8, cap 0x8, wptr 0x18, zcond:2, [type: 2]
+After zap done, the append sector is 0x18
+After appending the first zone secondly:
+start: 0x0, len 0x8, cap 0x8, wptr 0x30, zcond:2, [type: 2]
+After zap done, the append sector is 0x8
+After appending the second zone firstly:

[PATCH v7 1/4] file-posix: add tracking of the zone write pointers

2023-03-22 Thread Sam Li

Since Linux doesn't have a user API to issue zone append operations to
zoned devices from user space, the file-posix driver is modified to add
zone append emulation using regular writes. To do this, the file-posix
driver tracks the wp location of all zones of the device. It uses an
array of uint64_t. The most significant bit of each wp location indicates
whether the zone is a conventional zone.

A zone's wp can change as a result of the following operations:
- zone reset: change the wp to the start offset of that zone
- zone finish: change the wp to the end location of that zone
- write to a zone
- zone append
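The tracking scheme above can be modeled in a few lines of C. This is a minimal sketch under stated assumptions, not the file-posix code: `struct zone_wps`, `WP_CONV_BIT` and the `wp_*` helpers are hypothetical names, write pointers are byte offsets (as in the patch, which shifts sector values by `BDRV_SECTOR_BITS`), all zones are assumed to be the same size, and a conventional zone's entry is simply left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: MSB set = conventional zone, clear = SWR/SWP zone. */
#define WP_CONV_BIT (1ULL << 63)

/* Hypothetical wp table; the patch allocates its array dynamically. */
struct zone_wps {
    uint64_t zone_size;     /* bytes per zone, assumed uniform */
    unsigned int nr_zones;  /* number of valid entries in wp[] */
    uint64_t wp[8];         /* write pointer of each zone, in bytes */
};

static bool wp_is_conv(uint64_t wp)
{
    return (wp & WP_CONV_BIT) != 0;
}

/* Index of the zone containing byte `offset`. */
static unsigned int zone_index(const struct zone_wps *w, uint64_t offset)
{
    assert(offset / w->zone_size < w->nr_zones);
    return (unsigned int)(offset / w->zone_size);
}

/* Zone reset: the wp goes back to the start offset of the zone. */
static void wp_reset(struct zone_wps *w, unsigned int idx)
{
    if (!wp_is_conv(w->wp[idx])) {
        w->wp[idx] = (uint64_t)idx * w->zone_size;
    }
}

/* Zone finish: the wp moves to the end of the zone. */
static void wp_finish(struct zone_wps *w, unsigned int idx)
{
    if (!wp_is_conv(w->wp[idx])) {
        w->wp[idx] = (uint64_t)(idx + 1) * w->zone_size;
    }
}

/* Write or append of `bytes` at `offset`: advance the wp if the data ends past it. */
static void wp_write(struct zone_wps *w, uint64_t offset, uint64_t bytes)
{
    unsigned int idx = zone_index(w, offset);

    if (!wp_is_conv(w->wp[idx]) && offset + bytes > w->wp[idx]) {
        w->wp[idx] = offset + bytes;
    }
}
```

Using the top bit as a type flag only works because real byte offsets stay far below 2^63; in this sketch a conventional zone's entry is never advanced, mirroring the patch's choice to track only SWR/SWP zones.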

Signed-off-by: Sam Li 
---
 block/file-posix.c   | 168 ++-
 include/block/block-common.h |  14 +++
 include/block/block_int-common.h |   5 +
 3 files changed, 183 insertions(+), 4 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 65efe5147e..0fb425dcae 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1324,6 +1324,85 @@ static int hdev_get_max_segments(int fd, struct stat *st)
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
+/*
+ * If the ra (reset_all) flag > 0, then the wp of that zone should be reset to
+ * the start sector. Else, take the real wp of the device.
+ */
+static int get_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
+unsigned int nrz, int ra) {
+struct blk_zone *blkz;
+size_t rep_size;
+uint64_t sector = offset >> BDRV_SECTOR_BITS;
+int ret, n = 0, i = 0;
+rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+g_autofree struct blk_zone_report *rep = NULL;
+
+rep = g_malloc(rep_size);
+blkz = (struct blk_zone *)(rep + 1);
+while (n < nrz) {
+memset(rep, 0, rep_size);
+rep->sector = sector;
+rep->nr_zones = nrz - n;
+
+do {
+ret = ioctl(fd, BLKREPORTZONE, rep);
+} while (ret != 0 && errno == EINTR);
+if (ret != 0) {
+error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+fd, offset, errno);
+return -errno;
+}
+
+if (!rep->nr_zones) {
+break;
+}
+
+for (i = 0; i < rep->nr_zones; i++, n++) {
+/*
+ * The wp tracking cares only about sequential writes required and
+ * sequential write preferred zones so that the wp can advance to
+ * the right location.
+ * Use the most significant bit of the wp location to indicate the
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
+ */
+if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+wps->wp[i] |= 1ULL << 63;
+} else {
+switch(blkz[i].cond) {
+case BLK_ZONE_COND_FULL:
+case BLK_ZONE_COND_READONLY:
+/* Zone not writable */
+wps->wp[i] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
+break;
+case BLK_ZONE_COND_OFFLINE:
+/* Zone not writable nor readable */
+wps->wp[i] = (blkz[i].start) << BDRV_SECTOR_BITS;
+break;
+default:
+if (ra > 0) {
+wps->wp[i] = blkz[i].start << BDRV_SECTOR_BITS;
+} else {
+wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;
+}
+break;
+}
+}
+}
+sector = blkz[i - 1].start + blkz[i - 1].len;
+}
+
+return 0;
+}
+
+static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
+unsigned int nrz) {
+if (get_zones_wp(fd, wps, offset, nrz, 0) < 0) {
+error_report("update zone wp failed");
+}
+}
+#endif
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
@@ -1413,6 +1492,21 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 if (ret >= 0) {
 bs->bl.max_active_zones = ret;
 }
+
+ret = get_sysfs_long_val(&st, "physical_block_size");
+if (ret >= 0) {
+bs->bl.write_granularity = ret;
+}
+
+bs->bl.wps = g_malloc(sizeof(BlockZoneWps) +
+sizeof(int64_t) * bs->bl.nr_zones);
+ret = get_zones_wp(s->fd, bs->bl.wps, 0, bs->bl.nr_zones, 0);
+if (ret < 0) {
+error_setg_errno(errp, -ret, "report wps failed");
+g_free(bs->bl.wps);
+return;
+}
+qemu_co_mutex_init(&bs->bl.wps->colock);
 return;
 }
 out:
@@ -2338,9 +2432,15 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
 {
 BDRVRawState *s = bs->opaque;
 RawPosixAIOData acb;
+int ret;
 
 if (fd_open(bs) < 0)
 return -EIO;
+#if

[PATCH v7 4/4] block: add some trace events for zone append

2023-03-22 Thread Sam Li

Signed-off-by: Sam Li 
Reviewed-by: Dmitry Fomichev 
---
 block/file-posix.c | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 60ad3970f3..9866d073f5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2497,6 +2497,8 @@ out:
 if (!BDRV_ZT_IS_CONV(*wp)) {
 if (type & QEMU_AIO_ZONE_APPEND) {
 *s->offset = *wp;
+trace_zbd_zone_append_complete(bs, *s->offset
+>> BDRV_SECTOR_BITS);
 }
 /* Advance the wp if needed */
 if (offset + bytes > *wp) {
@@ -3544,6 +3546,7 @@ static int coroutine_fn raw_co_zone_append(BlockDriverState *bs,
 len += iov_len;
 }
 
+trace_zbd_zone_append(bs, *offset >> BDRV_SECTOR_BITS);
 return raw_co_prw(bs, *offset, len, qiov, QEMU_AIO_ZONE_APPEND);
 }
 #endif
diff --git a/block/trace-events b/block/trace-events
index 3f4e1d088a..32665158d6 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -211,6 +211,8 @@ file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
 file_flush_fdatasync_failed(int err) "errno %d"
 zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
 zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
+zbd_zone_append(void *bs, int64_t sector) "bs %p append at sector offset 0x%" PRIx64 ""
+zbd_zone_append_complete(void *bs, int64_t sector) "bs %p returns append sector 0x%" PRIx64 ""
 
 # ssh.c
 sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
-- 
2.39.2

[PATCH v7 2/4] block: introduce zone append write for zoned devices

2023-03-22 Thread Sam Li

A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of the call may point at
any position within the zone to which the data is being appended. Upon
completion the device will respond with the position where the data has
been written in the zone.
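To make the append semantics concrete, here is a hypothetical in-memory model, not QEMU's implementation: `zone_append()` accepts any byte offset inside a zone, writes at that zone's current write pointer, and reports back where the data landed. It assumes 512-byte sectors and the 268435456-byte zones used by the qemu-iotests `zoned` test; `zone_wp`, `zones_init()` and `zone_append()` are invented names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SECTOR_BITS 9               /* 512-byte sectors */
#define ZONE_SIZE   268435456ULL    /* bytes per zone, as in the zoned test */
#define NR_ZONES    4               /* hypothetical device size in zones */

/* Hypothetical write pointers, as byte offsets from the device start. */
static uint64_t zone_wp[NR_ZONES];

/* Start with every zone empty: wp == zone start. */
static void zones_init(void)
{
    for (unsigned int i = 0; i < NR_ZONES; i++) {
        zone_wp[i] = (uint64_t)i * ZONE_SIZE;
    }
}

/*
 * Append `bytes` to the zone that contains *offset. *offset may point
 * anywhere inside the zone; on success it is overwritten with the byte
 * offset where the data was actually written (the old wp).
 * Returns 0 or a negative errno value.
 */
static int zone_append(int64_t *offset, uint64_t bytes)
{
    if (*offset < 0 || (uint64_t)*offset >= NR_ZONES * ZONE_SIZE) {
        return -EINVAL;
    }

    unsigned int idx = (unsigned int)((uint64_t)*offset / ZONE_SIZE);
    uint64_t zone_end = (uint64_t)(idx + 1) * ZONE_SIZE;

    if (zone_wp[idx] + bytes > zone_end) {
        return -ENOSPC;             /* sketch's choice: never cross a zone */
    }
    *offset = (int64_t)zone_wp[idx];    /* the data lands at the wp */
    zone_wp[idx] += bytes;              /* and the wp moves past the data */
    return 0;
}
```

With the `zap -p 0 0x1000 0x2000` command from the `zoned` test (0x3000 bytes per append), two appends to the first zone report sectors 0x0 and 0x18 and leave the wp at sector 0x30, which matches the expected output for that zone.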

Signed-off-by: Sam Li 
Reviewed-by: Dmitry Fomichev 
---
 block/block-backend.c | 60 +++
 block/file-posix.c| 58 ++
 block/io.c| 21 +++
 block/io_uring.c  |  4 +++
 block/linux-aio.c |  3 ++
 block/raw-format.c|  8 +
 include/block/block-io.h  |  4 +++
 include/block/block_int-common.h  |  3 ++
 include/block/raw-aio.h   |  4 ++-
 include/sysemu/block-backend-io.h |  9 +
 10 files changed, 166 insertions(+), 8 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index f70b08e3f6..bcb3a1eff0 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1888,6 +1888,45 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
 return &acb->common;
 }
 
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_append(rwco->blk, (int64_t *)acb->bytes,
+   rwco->iobuf, rwco->flags);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags,
+BlockCompletionFunc *cb, void *opaque) {
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.ret= NOT_DONE,
+.flags  = flags,
+.iobuf  = qiov,
+};
+acb->bytes = (int64_t)offset;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
+aio_co_enter(blk_get_aio_context(blk), co);
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
 /*
  * Send a zone_report command.
  * offset is a byte offset from the start of the device. No alignment
@@ -1939,6 +1978,27 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
 return ret;
 }
 
+/*
+ * Send a zone_append command.
+ */
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+blk_wait_while_drained(blk);
+if (!blk_is_available(blk)) {
+blk_dec_in_flight(blk);
+return -ENOMEDIUM;
+}
+
+ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
+blk_dec_in_flight(blk);
+return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
 BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index 0fb425dcae..60ad3970f3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -160,6 +160,7 @@ typedef struct BDRVRawState {
 bool has_write_zeroes:1;
 bool use_linux_aio:1;
 bool use_linux_io_uring:1;
+int64_t *offset; /* offset of zone append operation */
 int page_cache_inconsistent; /* errno from fdatasync failure */
 bool has_fallocate;
 bool needs_alignment;
@@ -1680,7 +1681,7 @@ static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
 ssize_t len;
 
 len = RETRY_ON_EINTR(
-(aiocb->aio_type & QEMU_AIO_WRITE) ?
+(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
 qemu_pwritev(aiocb->aio_fildes,
aiocb->io.iov,
aiocb->io.niov,
@@ -1709,7 +1710,7 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
 ssize_t len;
 
 while (offset < aiocb->aio_nbytes) {
-if (aiocb->aio_type & QEMU_AIO_WRITE) {
+if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
 len = pwrite(aiocb->aio_fildes,
  (const char *)buf + offset,
  aiocb->aio_nbytes - offset,
@@ -1802,7 +1803,7 @@ static int handle_aiocb_rw(void *opaque)
 }
 
 nbytes = handle_aiocb_rw_linear(aiocb, buf);
-if (!(aiocb->aio_type & QEMU_AIO_WRITE)) {
+if (!(aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))) {
 char *p = buf;
 size_t count = aiocb->aio_nbytes, copy;
 int i;
@@ -2437,8 +2438,12 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,

[PATCH v7 0/4] Add zone append write for zoned device

2023-03-22 Thread Sam Li

This patch series adds the zone append operation, building on the
previous zoned device support series. The file-posix driver is modified
to add zone append emulation using regular writes.

v7:
- address review comments
  * fix wp assignment [Stefan]
  * fix reset_all cases, skip R/O & offline zones [Dmitry, Damien]
  * fix locking on non-zap related cases [Stefan]
  * cleanups and typo corrections
- add "zap -p" option to qemuio-cmds [Stefan]

v6:
- add small fixes

v5:
- fix locking conditions and error handling
- drop some trivial optimizations
- add tracing points for zone append

v4:
- fix lock-related issues [Damien]
- drop all field in zone_mgmt op [Damien]
- fix state checks in zone_mgmt command [Damien]
- return start sector of wp when issuing zap req [Damien]

v3:
- only read wps when it is locked [Damien]
- allow last smaller zone case [Damien]
- add zone type and state checks in zone_mgmt command [Damien]
- fix RESET_ALL related problems

v2:
- split the patch into two patches for easier review
- change BlockZoneWps's structure to an array of integers
- use only mutex lock on locking conditions of zone wps
- coding styles and clean-ups

v1:
- introduce zone append write

Sam Li (4):
  file-posix: add tracking of the zone write pointers
  block: introduce zone append write for zoned devices
  qemu-iotests: test zone append operation
  block: add some trace events for zone append

 block/block-backend.c  |  60 
 block/file-posix.c | 221 -
 block/io.c |  21 +++
 block/io_uring.c   |   4 +
 block/linux-aio.c  |   3 +
 block/raw-format.c |   8 ++
 block/trace-events |   2 +
 include/block/block-common.h   |  14 ++
 include/block/block-io.h   |   4 +
 include/block/block_int-common.h   |   8 ++
 include/block/raw-aio.h|   4 +-
 include/sysemu/block-backend-io.h  |   9 ++
 qemu-io-cmds.c |  75 ++
 tests/qemu-iotests/tests/zoned |  16 +++
 tests/qemu-iotests/tests/zoned.out |  16 +++
 15 files changed, 457 insertions(+), 8 deletions(-)

-- 
2.39.2

[PATCH v17 3/8] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2023-03-22 Thread Sam Li

Add a zoned device option to the host_device BlockDriver. It will be
presented only for zoned host block devices. By adding zone management
operations to the host_device BlockDriver, users can use the new block
layer APIs, including Report Zone and four zone management operations
(open, close, finish and reset, where reset also covers reset_all).

qemu-io uses the new APIs to perform zoned storage commands on the device:
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
zone_finish(zf).

For example, to test zone_report, use the following command:
$ ./build/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
-c "zrp offset nr_zones"
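The pairing between the qemu-io management commands and the zone operations can be written down as a small lookup table. This is a hypothetical sketch: qemu-io itself registers one `cmdinfo_t` per command with `.name`/`.altname` (as `zone_append_cmd` does in the v7 series), and here we assume each management command passes the matching `BlockZoneOp` to the block layer. The enum copies the values from `include/block/block-common.h` in patch 1/8; `zone_cmds` and `lookup_zone_op()` are invented. `zrp` is not listed because zone report is a separate API, not a management operation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values mirror BlockZoneOp in include/block/block-common.h. */
typedef enum BlockZoneOp {
    BLK_ZO_OPEN,
    BLK_ZO_CLOSE,
    BLK_ZO_FINISH,
    BLK_ZO_RESET,
} BlockZoneOp;

/* Hypothetical table: qemu-io command name, its alias, and the zone op. */
static const struct {
    const char *name;
    const char *alias;
    BlockZoneOp op;
} zone_cmds[] = {
    { "zone_open",   "zo",  BLK_ZO_OPEN },
    { "zone_close",  "zc",  BLK_ZO_CLOSE },
    { "zone_finish", "zf",  BLK_ZO_FINISH },
    { "zone_reset",  "zrs", BLK_ZO_RESET },
};

/* Returns 0 and sets *op if `cmd` names a zone management command, else -1. */
static int lookup_zone_op(const char *cmd, BlockZoneOp *op)
{
    for (size_t i = 0; i < sizeof(zone_cmds) / sizeof(zone_cmds[0]); i++) {
        if (strcmp(cmd, zone_cmds[i].name) == 0 ||
            strcmp(cmd, zone_cmds[i].alias) == 0) {
            *op = zone_cmds[i].op;
            return 0;
        }
    }
    return -1;
}
```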

Signed-off-by: Sam Li 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Dmitry Fomichev 
Acked-by: Kevin Wolf 
---
 block/block-backend.c | 133 +
 block/file-posix.c| 307 +-
 block/io.c|  41 
 include/block/block-io.h  |   9 +
 include/block/block_int-common.h  |  21 ++
 include/block/raw-aio.h   |   6 +-
 include/sysemu/block-backend-io.h |  18 ++
 meson.build   |   4 +
 qemu-io-cmds.c| 149 +++
 9 files changed, 685 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 278b04ce69..f70b08e3f6 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1806,6 +1806,139 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
 return ret;
 }
 
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
+   (unsigned int *)acb->bytes, rwco->iobuf);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor  *zones,
+BlockCompletionFunc *cb, void *opaque)
+{
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.offset = offset,
+.iobuf  = zones,
+.ret= NOT_DONE,
+};
+acb->bytes = (int64_t)nr_zones;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
+aio_co_enter(blk_get_aio_context(blk), co);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_mgmt(rwco->blk, (BlockZoneOp)rwco->iobuf,
+ rwco->offset, acb->bytes);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+  int64_t offset, int64_t len,
+  BlockCompletionFunc *cb, void *opaque) {
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.offset = offset,
+.iobuf  = (void *)op,
+.ret= NOT_DONE,
+};
+acb->bytes = len;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
+aio_co_enter(blk_get_aio_context(blk), co);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor *zones)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk); /* increase before waiting */
+blk_wait_while_drained(blk);
+if (!blk_is_available(blk)) {
+blk_dec_in_flight(blk);
+return -ENOMEDIUM;
+}
+ret = bdrv_co_zone_report(blk_bs(blk), offset, nr_zones, zones);
+blk_dec_in_flight(blk);
+return ret;
+}
+
+/*
+ * Send a zone_management command.
+ * op is the zone operation;
+ * offset is the byte offset from the start of the zoned device;
+ * len is the maximum number of bytes the command should operate on. It
+ * should be aligned with the

[PATCH v17 8/8] docs/zoned-storage: add zoned device documentation

2023-03-22 Thread Sam Li

Add documentation about zoned device support in the virtio-blk
emulation.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Dmitry Fomichev 
---
 docs/devel/zoned-storage.rst   | 43 ++
 docs/system/qemu-block-drivers.rst.inc |  6 
 2 files changed, 49 insertions(+)
 create mode 100644 docs/devel/zoned-storage.rst

diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
new file mode 100644
index 00..6a36133e51
--- /dev/null
+++ b/docs/devel/zoned-storage.rst
@@ -0,0 +1,43 @@
+=============
+zoned-storage
+=============
+
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
+that are larger than the LBA size. Zones may only allow sequential writes, which
+can reduce write amplification in SSDs and potentially lead to higher
+throughput and increased capacity. More details about ZBDs can be found at:
+
+https://zonedstorage.io/docs/introduction/zoned-storage
+
+1. Block layer APIs for zoned storage
+-------------------------------------
+The QEMU block layer supports three zoned storage models:
+
+- BLK_Z_HM: The host-managed zoned model only allows sequential write access
+  to zones. It supports ZBD-specific I/O commands that can be used by a host to
+  manage the zones of a device.
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
+  zones, making it backward compatible with regular block devices.
+- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
+  regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
+  supported.
+
+The block device information resides inside BlockDriverState. QEMU uses the
+BlockLimits struct (BlockDriverState::bl), which is continuously accessed by the
+block layer while processing I/O requests. A BlockBackend has a root pointer to
+a BlockDriverState graph (for example, raw format on top of file-posix). The
+zoned storage information can be propagated from the leaf BlockDriverState all
+the way up to the BlockBackend. If the zoned storage model in file-posix is
+set to BLK_Z_HM, then block drivers will declare support for zoned host devices.
+
+The block layer APIs support commands needed for zoned storage devices,
+including report zones, four zone operations, and zone append.
+
+2. Emulating zoned storage controllers
+--------------------------------------
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
+APIs for zoned storage emulation or testing.
+
+For example, to test zone_report on a null_blk device using qemu-io::
+
+  $ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0
+    -c "zrp offset nr_zones"
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index dfe5d2293d..105cb9679c 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -430,6 +430,12 @@ Hard disks
   you may corrupt your host data (use the ``-snapshot`` command
   line option or modify the device permissions accordingly).
 
+Zoned block devices
+  Zoned block devices can be passed through to the guest if the emulated storage
+  controller supports zoned storage. Use
+  ``--blockdev host_device,node-name=drive0,filename=/dev/nullb0,cache.direct=on``
+  to pass through ``/dev/nullb0`` as ``drive0``.
+
 Windows
 ^^^^^^^
 
-- 
2.39.2
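The three zoned models described in `docs/devel/zoned-storage.rst` differ mainly in which writes a sequential zone accepts. A hypothetical sketch of that difference: the enum mirrors `BlockZoneModel` from this series, while `write_allowed()` is invented for illustration and ignores conventional zones, zone states and open-zone limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror BlockZoneModel in include/block/block-common.h. */
typedef enum BlockZoneModel {
    BLK_Z_NONE = 0x0,  /* regular or drive-managed device: no zones exposed */
    BLK_Z_HM = 0x1,    /* host-managed: sequential writes only */
    BLK_Z_HA = 0x2,    /* host-aware: random writes also accepted */
} BlockZoneModel;

/*
 * Hypothetical check: may a write of `bytes` at `offset` go to a sequential
 * zone [zone_start, zone_start + zone_size) whose write pointer is `wp`?
 */
static bool write_allowed(BlockZoneModel model, uint64_t zone_start,
                          uint64_t zone_size, uint64_t wp,
                          uint64_t offset, uint64_t bytes)
{
    assert(wp >= zone_start && wp <= zone_start + zone_size);

    switch (model) {
    case BLK_Z_HM:
        /* Must start exactly at the wp and stay inside the zone. */
        return offset == wp && offset + bytes <= zone_start + zone_size;
    case BLK_Z_HA:
    case BLK_Z_NONE:
    default:
        /* Random writes are accepted (HA) or no zone rules apply (NONE). */
        return true;
    }
}
```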

[PATCH v17 5/8] config: add check to block layer

2023-03-22 Thread Sam Li

Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.
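The rule added to `bdrv_add_child()` below reduces to a small predicate. This is a standalone sketch, not QEMU code: the enum mirrors `BlockZoneModel` from this series, and `can_attach_child()` is a hypothetical helper that takes the parent driver's `supports_zoned_children` flag as a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror BlockZoneModel in include/block/block-common.h. */
typedef enum BlockZoneModel {
    BLK_Z_NONE = 0x0,   /* regular block device */
    BLK_Z_HM = 0x1,     /* host-managed zoned block device */
    BLK_Z_HA = 0x2,     /* host-aware zoned block device */
} BlockZoneModel;

/*
 * Hypothetical predicate: may a child node of model `child` be attached
 * to a parent whose driver does (or does not) support zoned children?
 * Only host-managed children are refused by non-zoned parents; host-aware
 * devices accept random writes and can be used as regular devices.
 */
static bool can_attach_child(bool parent_supports_zoned_children,
                             BlockZoneModel child)
{
    assert(child == BLK_Z_NONE || child == BLK_Z_HM || child == BLK_Z_HA);
    return parent_supports_zoned_children || child != BLK_Z_HM;
}
```

In this patch only `bdrv_raw` sets `supports_zoned_children`, so raw on top of a host-managed `host_device` node is accepted, while a parent driver without the flag is refused for the same child.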

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Dmitry Fomichev 
---
 block.c  | 19 +++
 block/file-posix.c   | 12 
 block/raw-format.c   |  1 +
 include/block/block_int-common.h |  5 +
 4 files changed, 37 insertions(+)

diff --git a/block.c b/block.c
index 0dd604d0f6..4ebf7bbc90 100644
--- a/block.c
+++ b/block.c
@@ -7953,6 +7953,25 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 return;
 }
 
+/*
+ * Non-zoned block drivers do not follow zoned storage constraints
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
+ * drivers in a graph.
+ */
+if (!parent_bs->drv->supports_zoned_children &&
+child_bs->bl.zoned == BLK_Z_HM) {
+/*
+ * The host-aware model allows zoned storage constraints as well as random
+ * writes, so mixing host-aware and non-zoned drivers is allowed: the
+ * host-aware device is then used as a regular device.
+ */
+error_setg(errp, "Cannot add a %s child to a %s parent",
+   child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
+   parent_bs->drv->supports_zoned_children ?
+   "support zoned children" : "not support zoned children");
+return;
+}
+
 if (!QLIST_EMPTY(&child_bs->parents)) {
 error_setg(errp, "The node %s already has a parent",
child_bs->node_name);
diff --git a/block/file-posix.c b/block/file-posix.c
index 0c19cfb5cc..5fa80933c9 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -776,6 +776,18 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 goto fail;
 }
 }
+#ifdef CONFIG_BLKZONED
+/*
+ * The kernel page cache does not reliably work for writes to SWR zones
+ * of a zoned block device because it cannot guarantee the order of writes.
+ */
+if ((bs->bl.zoned != BLK_Z_NONE) &&
+(!(s->open_flags & O_DIRECT))) {
+error_setg(errp, "The driver supports zoned devices, and it requires "
+ "cache.direct=on, which was not specified.");
+return -EINVAL; /* No host kernel page cache */
+}
+#endif
 
 if (S_ISBLK(st.st_mode)) {
 #ifdef __linux__
diff --git a/block/raw-format.c b/block/raw-format.c
index 6e1b9394c8..72e23e7b55 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -621,6 +621,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
 BlockDriver bdrv_raw = {
 .format_name  = "raw",
 .instance_size= sizeof(BDRVRawState),
+.supports_zoned_children = true,
 .bdrv_probe   = &raw_probe,
 .bdrv_reopen_prepare  = &raw_reopen_prepare,
 .bdrv_reopen_commit   = &raw_reopen_commit,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index a3efb385e0..1bd2aef4d5 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -137,6 +137,11 @@ struct BlockDriver {
  */
 bool is_format;
 
+/*
+ * Set to true if the BlockDriver supports zoned children.
+ */
+bool supports_zoned_children;
+
 /*
  * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
  * this field set to true, except ones that are defined only by their
-- 
2.39.2

[PATCH v17 6/8] qemu-iotests: test new zone operations

2023-03-22 Thread Sam Li

The new block layer APIs of zoned block devices can be tested by:
$ tests/qemu-iotests/check zoned
Each zone operation is run on a newly created null_blk device, and the
reported zone information is compared against the expected output.
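When reading `zoned.out`, it helps to decode the `zcond` column. This sketch assumes the column prints the zone's `BlockZoneState` value (the enum added in patch 1/8, copied below) and that `[type: 2]` is `BLK_ZT_SWR`; `zone_state_name()` is a hypothetical helper for reading the output, not part of qemu-io.

```c
#include <assert.h>
#include <string.h>

/* Values copied from BlockZoneState in include/block/block-common.h. */
typedef enum BlockZoneState {
    BLK_ZS_NOT_WP = 0x0,
    BLK_ZS_EMPTY = 0x1,
    BLK_ZS_IOPEN = 0x2,
    BLK_ZS_EOPEN = 0x3,
    BLK_ZS_CLOSED = 0x4,
    BLK_ZS_RDONLY = 0xD,
    BLK_ZS_FULL = 0xE,
    BLK_ZS_OFFLINE = 0xF,
} BlockZoneState;

/* Hypothetical helper: human-readable name for a zcond value. */
static const char *zone_state_name(unsigned int zcond)
{
    switch (zcond) {
    case BLK_ZS_NOT_WP:  return "not write pointer";
    case BLK_ZS_EMPTY:   return "empty";
    case BLK_ZS_IOPEN:   return "implicitly open";
    case BLK_ZS_EOPEN:   return "explicitly open";
    case BLK_ZS_CLOSED:  return "closed";
    case BLK_ZS_RDONLY:  return "read-only";
    case BLK_ZS_FULL:    return "full";
    case BLK_ZS_OFFLINE: return "offline";
    default:             return "unknown";
    }
}
```

Read this way, `zcond:3` after `zo` is an explicitly opened zone, `zcond:14` after `zf` is a full zone, and `zcond:1` after `zc` on an unwritten zone or after `zrs` is an empty zone again.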

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/tests/zoned | 86 ++
 tests/qemu-iotests/tests/zoned.out | 53 ++
 2 files changed, 139 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/zoned
 create mode 100644 tests/qemu-iotests/tests/zoned.out

diff --git a/tests/qemu-iotests/tests/zoned b/tests/qemu-iotests/tests/zoned
new file mode 100755
index 00..53097e44d9
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+_cleanup()
+{
+  _cleanup_test_img
+  sudo rmmod null_blk
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ../common.rc
+. ../common.filter
+. ../common.qemu
+
+# This test only runs on Linux hosts with raw image files.
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+IMG="--image-opts -n driver=host_device,filename=/dev/nullb0"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo "Testing a null_blk device:"
+echo "case 1: if the operations work"
+sudo modprobe null_blk nr_devices=1 zoned=1
+sudo chmod 0666 /dev/nullb0
+
+echo "(1) report the first zone:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report the first 10 zones"
+$QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+$QEMU_IO $IMG -c "zrp 0x3e7000 2" # 0x3e7000 / 512 = 0x1f38
+echo
+echo
+echo "(2) opening the first zone"
+$QEMU_IO $IMG -c "zo 0 268435456"  # 268435456 / 512 = 524288
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "opening the second zone"
+$QEMU_IO $IMG -c "zo 268435456 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo "opening the last zone"
+$QEMU_IO $IMG -c "zo 0x3e7000 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0x3e7000 2"
+echo
+echo
+echo "(3) closing the first zone"
+$QEMU_IO $IMG -c "zc 0 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "closing the last zone"
+$QEMU_IO $IMG -c "zc 0x3e7000 268435456"
+echo "report after:"
+$QEMU_IO $IMG -c "zrp 0x3e7000 2"
+echo
+echo
+echo "(4) finishing the second zone"
+$QEMU_IO $IMG -c "zf 268435456 268435456"
+echo "After finishing a zone:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(5) resetting the second zone"
+$QEMU_IO $IMG -c "zrs 268435456 268435456"
+echo "After resetting a zone:"
+$QEMU_IO $IMG -c "zrp 268435456 1"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-iotests/tests/zoned.out
new file mode 100644
index 00..b2d061da49
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -0,0 +1,53 @@
+QA output created by zoned
+Testing a null_blk device:
+case 1: if the operations work
+(1) report the first zone:
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
+
+report the first 10 zones
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
+start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2]
+start: 0x10, len 0x8, cap 0x8, wptr 0x10, zcond:1, [type: 2]
+start: 0x18, len 0x8, cap 0x8, wptr 0x18, zcond:1, [type: 2]
+start: 0x20, len 0x8, cap 0x8, wptr 0x20, zcond:1, [type: 2]
+start: 0x28, len 0x8, cap 0x8, wptr 0x28, zcond:1, [type: 2]
+start: 0x30, len 0x8, cap 0x8, wptr 0x30, zcond:1, [type: 2]
+start: 0x38, len 0x8, cap 0x8, wptr 0x38, zcond:1, [type: 2]
+start: 0x40, len 0x8, cap 0x8, wptr 0x40, zcond:1, [type: 2]
+start: 0x48, len 0x8, cap 0x8, wptr 0x48, zcond:1, [type: 2]
+
+report the last zone:
+start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:1, [type: 2]
+
+
+(2) opening the first zone
+report after:
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:3, [type: 2]
+
+opening the second zone
+report after:
+start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:3, [type: 2]
+
+opening the last zone
+report after:
+start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:3, [type: 2]
+
+
+(3) closing the first zone
+report after:
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
+
+closing the last zone
+report after:
+start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:1, [type: 2]
+
+
+(4) finishing the second zone
+After finishing a zone:
+start: 0x8, len 0x8, cap 0x8, wptr 0x10, zcond:14, [type: 2]
+
+
+(5) resetting the second zone
+After resetting a zone:
+start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1,

[PATCH v17 4/8] raw-format: add zone operations to pass through requests

2023-03-22 Thread Sam Li

The raw-format driver usually sits on top of the file-posix driver, so it
needs to pass zone command requests through to its child.
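The pass-through pattern is simply forwarding each zone callback to the child node unchanged. A self-contained sketch of that shape, with hypothetical `Node`/`Driver` types standing in for `BlockDriverState`/`BlockDriver`; the real `raw_co_zone_report()` and `raw_co_zone_mgmt()` are coroutines that call `bdrv_co_zone_report()` / `bdrv_co_zone_mgmt()` on `bs->file->bs`. Like the patch, the sketch does no offset translation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Node Node;

/* Hypothetical driver table with a single zone-management callback. */
typedef struct Driver {
    const char *name;
    int (*zone_mgmt)(Node *n, int op, int64_t offset, int64_t len);
} Driver;

/* Hypothetical graph node: a driver plus an optional child. */
struct Node {
    const Driver *drv;
    Node *file;           /* child node, NULL for a leaf */
    int last_op;          /* request seen by the leaf, for checking */
    int64_t last_offset;
    int64_t last_len;
};

/* Leaf driver (stand-in for file-posix): records the request. */
static int leaf_zone_mgmt(Node *n, int op, int64_t offset, int64_t len)
{
    n->last_op = op;
    n->last_offset = offset;
    n->last_len = len;
    return 0;
}

/* Format driver (stand-in for raw): forwards the request unchanged. */
static int raw_zone_mgmt(Node *n, int op, int64_t offset, int64_t len)
{
    assert(n->file != NULL);
    return n->file->drv->zone_mgmt(n->file, op, offset, len);
}

static const Driver leaf_drv = { "leaf", leaf_zone_mgmt };
static const Driver raw_drv = { "raw", raw_zone_mgmt };
```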

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Dmitry Fomichev 
---
 block/raw-format.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index 66783ed8e7..6e1b9394c8 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -317,6 +317,21 @@ raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 return bdrv_co_pdiscard(bs->file, offset, bytes);
 }
 
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+   unsigned int *nr_zones,
+   BlockZoneDescriptor *zones)
+{
+return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
+}
+
+static int coroutine_fn GRAPH_RDLOCK
+raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+ int64_t offset, int64_t len)
+{
+return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
+}
+
 static int64_t coroutine_fn GRAPH_RDLOCK
 raw_co_getlength(BlockDriverState *bs)
 {
@@ -617,6 +632,8 @@ BlockDriver bdrv_raw = {
 .bdrv_co_pwritev  = &raw_co_pwritev,
 .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
 .bdrv_co_pdiscard = &raw_co_pdiscard,
+.bdrv_co_zone_report  = &raw_co_zone_report,
+.bdrv_co_zone_mgmt  = &raw_co_zone_mgmt,
 .bdrv_co_block_status = &raw_co_block_status,
 .bdrv_co_copy_range_from = &raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = &raw_co_copy_range_to,
-- 
2.39.2

[PATCH v17 7/8] block: add some trace events for new block layer APIs

2023-03-22 Thread Sam Li

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Dmitry Fomichev 
---
 block/file-posix.c | 3 +++
 block/trace-events | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 5fa80933c9..65efe5147e 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3266,6 +3266,7 @@ static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
 },
 };
 
+trace_zbd_zone_report(bs, *nr_zones, offset >> BDRV_SECTOR_BITS);
 return raw_thread_pool_submit(bs, handle_aiocb_zone_report, &acb);
 }
 #endif
@@ -3332,6 +,8 @@ static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
 },
 };
 
+trace_zbd_zone_mgmt(bs, op_name, offset >> BDRV_SECTOR_BITS,
+len >> BDRV_SECTOR_BITS);
 ret = raw_thread_pool_submit(bs, handle_aiocb_zone_mgmt, &acb);
 if (ret != 0) {
 ret = -errno;
diff --git a/block/trace-events b/block/trace-events
index 48dbf10c66..3f4e1d088a 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -209,6 +209,8 @@ file_FindEjectableOpticalMedia(const char *media) "Matching using %s"
 file_setup_cdrom(const char *partition) "Using %s as optical disc"
 file_hdev_is_sg(int type, int version) "SG device found: type=%d, version=%d"
 file_flush_fdatasync_failed(int err) "errno %d"
+zbd_zone_report(void *bs, unsigned int nr_zones, int64_t sector) "bs %p report %d zones starting at sector offset 0x%" PRIx64 ""
+zbd_zone_mgmt(void *bs, const char *op_name, int64_t sector, int64_t len) "bs %p %s starts at sector offset 0x%" PRIx64 " over a range of 0x%" PRIx64 " sectors"
 
 # ssh.c
 sftp_error(const char *op, const char *ssh_err, int ssh_err_code, int sftp_err_code) "%s failed: %s (libssh error code: %d, sftp error code: %d)"
-- 
2.39.2

[PATCH v17 1/8] include: add zoned device structs

2023-03-22 Thread Sam Li

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Dmitry Fomichev 
---
 include/block/block-common.h | 43 
 1 file changed, 43 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index b5122ef8ab..1576fcf2ed 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -75,6 +75,49 @@ typedef struct BlockDriver BlockDriver;
 typedef struct BdrvChild BdrvChild;
 typedef struct BdrvChildClass BdrvChildClass;
 
+typedef enum BlockZoneOp {
+BLK_ZO_OPEN,
+BLK_ZO_CLOSE,
+BLK_ZO_FINISH,
+BLK_ZO_RESET,
+} BlockZoneOp;
+
+typedef enum BlockZoneModel {
+BLK_Z_NONE = 0x0, /* Regular block device */
+BLK_Z_HM = 0x1, /* Host-managed zoned block device */
+BLK_Z_HA = 0x2, /* Host-aware zoned block device */
+} BlockZoneModel;
+
+typedef enum BlockZoneState {
+BLK_ZS_NOT_WP = 0x0,
+BLK_ZS_EMPTY = 0x1,
+BLK_ZS_IOPEN = 0x2,
+BLK_ZS_EOPEN = 0x3,
+BLK_ZS_CLOSED = 0x4,
+BLK_ZS_RDONLY = 0xD,
+BLK_ZS_FULL = 0xE,
+BLK_ZS_OFFLINE = 0xF,
+} BlockZoneState;
+
+typedef enum BlockZoneType {
+BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
+BLK_ZT_SWR = 0x2, /* Sequential writes required */
+BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
+} BlockZoneType;
+
+/*
+ * Zone descriptor data structure.
+ * Provides information on a zone with all position and size values in bytes.
+ */
+typedef struct BlockZoneDescriptor {
+uint64_t start;
+uint64_t length;
+uint64_t cap;
+uint64_t wp;
+BlockZoneType type;
+BlockZoneState state;
+} BlockZoneDescriptor;
+
 typedef struct BlockDriverInfo {
 /* in bytes, 0 if irrelevant */
 int cluster_size;
-- 
2.39.2

[PATCH v17 2/8] file-posix: introduce helper functions for sysfs attributes

2023-03-22 Thread Sam Li

Use get_sysfs_str_val() to get the string value of the device's
zoned model. Then get_sysfs_zoned_model() can convert it to
QEMU's BlockZoneModel type.

Use get_sysfs_long_val() to get the long value of zoned device
information.
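The helpers below show the parsing step in isolation. The patch reads the `zoned` attribute as a string, strips the trailing newline, and maps it to `BlockZoneModel`. Numeric attributes such as `max_segments` are parsed with an end-pointer check. This is a minimal standalone sketch that assumes the file contents are already in a buffer. The QEMU code uses `g_file_get_contents()` and `qemu_strtol()` instead of plain libc, and the names `parse_zoned_model()` and `parse_sysfs_long()` are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the BlockZoneModel values from include/block/block-common.h. */
typedef enum { BLK_Z_NONE = 0x0, BLK_Z_HM = 0x1, BLK_Z_HA = 0x2 } BlockZoneModel;

/* Drop the single trailing '\n' that sysfs appends to attribute values. */
static void strip_newline(char *s)
{
    size_t len = strlen(s);

    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';
    }
}

/* Hypothetical helper: map the contents of queue/zoned to a zoned model. */
static int parse_zoned_model(char *val, BlockZoneModel *zoned)
{
    strip_newline(val);
    if (strcmp(val, "host-managed") == 0) {
        *zoned = BLK_Z_HM;
    } else if (strcmp(val, "host-aware") == 0) {
        *zoned = BLK_Z_HA;
    } else if (strcmp(val, "none") == 0) {
        *zoned = BLK_Z_NONE;
    } else {
        return -ENOTSUP;
    }
    return 0;
}

/*
 * Hypothetical helper: parse a numeric attribute such as queue/max_segments.
 * Returns the value, or a negative errno if the text is not a whole number.
 */
static long parse_sysfs_long(char *val)
{
    char *end;
    long n;

    strip_newline(val);
    errno = 0;
    n = strtol(val, &end, 10);
    if (errno != 0) {
        return -errno;
    }
    if (end == val || *end != '\0') {
        return -EINVAL;  /* empty string or trailing garbage */
    }
    return n;
}
```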

Signed-off-by: Sam Li 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Dmitry Fomichev 
---
 block/file-posix.c   | 122 ++-
 include/block/block_int-common.h |   3 +
 2 files changed, 91 insertions(+), 34 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 5760cf22d1..496edc644c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1202,64 +1202,112 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
 #endif
 }
 
-static int hdev_get_max_segments(int fd, struct stat *st)
-{
+/*
+ * Get a sysfs attribute value as character string.
+ */
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
+ char **val) {
 #ifdef CONFIG_LINUX
-char buf[32];
-const char *end;
-char *sysfspath = NULL;
+g_autofree char *sysfspath = NULL;
 int ret;
-int sysfd = -1;
-long max_segments;
+size_t len;
 
-if (S_ISCHR(st->st_mode)) {
-if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
-return ret;
-}
+if (!S_ISBLK(st->st_mode)) {
 return -ENOTSUP;
 }
 
-if (!S_ISBLK(st->st_mode)) {
-return -ENOTSUP;
+sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
+major(st->st_rdev), minor(st->st_rdev),
+attribute);
+ret = g_file_get_contents(sysfspath, val, &len, NULL);
+if (ret == -1) {
+return -ENOENT;
 }
 
-sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-major(st->st_rdev), minor(st->st_rdev));
-sysfd = open(sysfspath, O_RDONLY);
-if (sysfd == -1) {
-ret = -errno;
-goto out;
+/* The file is ended with '\n' */
+char *p;
+p = *val;
+if (*(p + len - 1) == '\n') {
+*(p + len - 1) = '\0';
 }
-ret = RETRY_ON_EINTR(read(sysfd, buf, sizeof(buf) - 1));
+return ret;
+#else
+return -ENOTSUP;
+#endif
+}
+
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
+{
+g_autofree char *val = NULL;
+int ret;
+
+ret = get_sysfs_str_val(st, "zoned", &val);
 if (ret < 0) {
-ret = -errno;
-goto out;
-} else if (ret == 0) {
-ret = -EIO;
-goto out;
+return ret;
 }
-buf[ret] = 0;
-/* The file is ended with '\n', pass 'end' to accept that. */
-ret = qemu_strtol(buf, &end, 10, &max_segments);
-if (ret == 0 && end && *end == '\n') {
-ret = max_segments;
+
+if (strcmp(val, "host-managed") == 0) {
+*zoned = BLK_Z_HM;
+} else if (strcmp(val, "host-aware") == 0) {
+*zoned = BLK_Z_HA;
+} else if (strcmp(val, "none") == 0) {
+*zoned = BLK_Z_NONE;
+} else {
+return -ENOTSUP;
+}
+return 0;
+}
+
+/*
+ * Get a sysfs attribute value as a long integer.
+ */
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
+{
+#ifdef CONFIG_LINUX
+g_autofree char *str = NULL;
+const char *end;
+long val;
+int ret;
+
+ret = get_sysfs_str_val(st, attribute, &str);
+if (ret < 0) {
+return ret;
 }
 
-out:
-if (sysfd != -1) {
-close(sysfd);
+/* The file is ended with '\n', pass 'end' to accept that. */
+ret = qemu_strtol(str, &end, 10, &val);
+if (ret == 0 && end && *end == '\0') {
+ret = val;
 }
-g_free(sysfspath);
 return ret;
 #else
 return -ENOTSUP;
 #endif
 }
 
+static int hdev_get_max_segments(int fd, struct stat *st)
+{
+#ifdef CONFIG_LINUX
+int ret;
+
+if (S_ISCHR(st->st_mode)) {
+if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
+return ret;
+}
+return -ENOTSUP;
+}
+return get_sysfs_long_val(st, "max_segments");
+#else
+return -ENOTSUP;
+#endif
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 struct stat st;
+int ret;
+BlockZoneModel zoned;
 
 s->needs_alignment = raw_needs_alignment(bs);
 raw_probe_alignment(bs, s->fd, errp);
@@ -1297,6 +1345,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 bs->bl.max_hw_iov = ret;
 }
 }
+
+ret = get_sysfs_zoned_model(&st, &zoned);
+if (ret < 0) {
+zoned = BLK_Z_NONE;
+}
+bs->bl.zoned = zoned;
 }
 
 static int check_for_dasd(int fd)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index d419017328..6d0f470626 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -855,6 +855,9 @@ typedef struct BlockLimits {
 
 /* maximum number of iovec elements */
 int

[PATCH v17 0/8] Add support for zoned device

2023-03-22 Thread Sam Li

Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
that are larger than the LBA size. Zones may only allow sequential writes,
which reduces write amplification in SSDs, leading to higher throughput and
increased capacity. More details about ZBDs can be found at:

https://zonedstorage.io/docs/introduction/zoned-storage
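The sequential-write rule can be stated precisely. In a sequential-write-required zone, a write is accepted only if it starts exactly at the zone's write pointer and stays within the zone's capacity. This is a minimal sketch of that rule. It reuses the byte-based `start`, `length`, `cap` and `wp` fields of the `BlockZoneDescriptor` struct added in patch 1/8. `zone_write_allowed()` and `zone_advance_wp()` are hypothetical helpers, not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of BlockZoneDescriptor from include/block/block-common.h (bytes). */
typedef struct {
    uint64_t start;  /* first byte of the zone */
    uint64_t length; /* zone size */
    uint64_t cap;    /* writable capacity, cap <= length */
    uint64_t wp;     /* write pointer, assumed to satisfy wp <= start + cap */
} ZoneSketch;

/*
 * Hypothetical check for a sequential-write-required zone: the write must
 * begin at the write pointer and must not cross start + cap.
 */
static bool zone_write_allowed(const ZoneSketch *z, uint64_t offset,
                               uint64_t len)
{
    uint64_t limit = z->start + z->cap;

    if (offset != z->wp) {
        return false;
    }
    return len <= limit - offset; /* avoids overflow in offset + len */
}

/* After a successful write the write pointer advances by the bytes written. */
static void zone_advance_wp(ZoneSketch *z, uint64_t len)
{
    z->wp += len;
}
```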

The zoned device support aims to let guests (virtual machines) access zoned
storage devices on the host (hypervisor) through a virtio-blk device. This
involves extending QEMU's block layer and virtio-blk emulation code. In its
current state, the virtio-blk device is not aware of ZBDs, but the guest sees
host-managed drives as regular drives that will run correctly under the most
common write workloads.

This patch series extends the block layer APIs with the minimum set of zoned
commands that are necessary to support zoned devices. The commands are Report
Zones, four zone operations, and Zone Append.
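The four zone operations (`BLK_ZO_OPEN`, `BLK_ZO_CLOSE`, `BLK_ZO_FINISH`, `BLK_ZO_RESET`) move a zone between the states listed in `BlockZoneState`. This is a simplified sketch of those transitions for a single sequential zone, following the usual ZBC/ZNS conventions. It deliberately ignores error cases, such as operating on read-only or offline zones. `zone_apply_op()` is hypothetical and not part of the series.

```c
#include <stdint.h>

typedef enum { BLK_ZO_OPEN, BLK_ZO_CLOSE, BLK_ZO_FINISH, BLK_ZO_RESET } BlockZoneOp;

/* Subset of BlockZoneState from include/block/block-common.h. */
typedef enum {
    BLK_ZS_EMPTY = 0x1,
    BLK_ZS_EOPEN = 0x3,
    BLK_ZS_CLOSED = 0x4,
    BLK_ZS_FULL = 0xE,
} ZoneStateSketch;

typedef struct {
    uint64_t start, length, wp;  /* bytes */
    ZoneStateSketch state;
} Zone;

/* Hypothetical, error-free model of the zone management operations. */
static void zone_apply_op(Zone *z, BlockZoneOp op)
{
    switch (op) {
    case BLK_ZO_OPEN:   /* explicitly open the zone for writing */
        z->state = BLK_ZS_EOPEN;
        break;
    case BLK_ZO_CLOSE:  /* a zone with nothing written goes back to empty */
        z->state = (z->wp == z->start) ? BLK_ZS_EMPTY : BLK_ZS_CLOSED;
        break;
    case BLK_ZO_FINISH: /* move the write pointer to the end of the zone */
        z->wp = z->start + z->length;
        z->state = BLK_ZS_FULL;
        break;
    case BLK_ZO_RESET:  /* discard the data and rewind the write pointer */
        z->wp = z->start;
        z->state = BLK_ZS_EMPTY;
        break;
    }
}
```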

There has been a debate on whether to introduce a new zoned_host_device
BlockDriver specifically for zoned devices. In the end, it was decided to stick
to the existing host_device BlockDriver interface and only add the new zoned
operations inside it. The benefit is that this avoids further changes, for
example to the command line syntax, in applications such as Libvirt that use
QEMU zoned emulation.

It can be tested on a null_blk device using qemu-io or qemu-iotests. For
example, to test zone report using qemu-io:
$ path/to/qemu-io --image-opts -n driver=host_device,filename=/dev/nullb0 \
  -c "zrp offset nr_zones"

v17:
- fix qemuiotests for zoned support patches [Dmitry]

v16:
- update zoned_host device name to host_device [Stefan]
- fix probing zoned device blocksizes [Stefan]
- Use empty fields instead of changing struct size of BlkRwCo [Kevin, Stefan]

v15:
- drop zoned_host_device BlockDriver
- add zoned device option to host_device driver instead of introducing a new
  zoned_host_device BlockDriver [Stefan]

v14:
- address Stefan's comments of probing block sizes

v13:
- add some tracing points for new zone APIs [Dmitry]
- change error handling in zone_mgmt [Damien, Stefan]

v12:
- address review comments
  * drop BLK_ZO_RESET_ALL bit [Damien]
  * fix error messages, style, and typos [Damien, Hannes]

v11:
- address review comments
  * fix possible BLKZONED config compiling warnings [Stefan]
  * fix capacity field compiling warnings on older kernel [Stefan,Damien]

v10:
- address review comments
  * deal with the last small zone case in zone_mgmt operations [Damien]
  * handle the outdated capacity field in old kernels (before 5.9) [Damien]
  * use byte unit in block layer to be consistent with QEMU [Eric]
  * fix coding style related problems [Stefan]

v9:
- address review comments
  * specify units of zone commands requests [Stefan]
  * fix some error handling in file-posix [Stefan]
  * introduce zoned_host_device in the commit message [Markus]

v8:
- address review comments
  * solve patch conflicts and merge sysfs helper functions into one patch
  * add cache.direct=on check in config

v7:
- address review comments
  * modify sysfs attribute helper functions
  * move the input validation and error checking into raw_co_zone_* function
  * fix checks in config

v6:
- drop virtio-blk emulation changes
- address Stefan's review comments
  * fix CONFIG_BLKZONED configs in related functions
  * replace reading fd by g_file_get_contents() in get_sysfs_str_val()
  * rewrite documentation for zoned storage

v5:
- add zoned storage emulation to virtio-blk device
- add documentation for zoned storage
- address review comments
  * fix qemu-iotests
  * fix check to block layer
  * modify interfaces of sysfs helper functions
  * rename zoned device structs according to QEMU styles
  * reorder patches

v4:
- add virtio-blk headers for zoned device
- add configurations for zoned host device
- add zone operations for raw-format
- address review comments
  * fix memory leak bug in zone_report
  * add checks to block layers
  * fix qemu-iotests format
  * fix sysfs helper functions

v3:
- add helper functions to get sysfs attributes
- address review comments
  * fix zone report bugs
  * fix the qemu-io code path
  * use thread pool to avoid blocking ioctl() calls

v2:
- add qemu-io sub-commands
- address review comments
  * modify interfaces of APIs

v1:
- add block layer APIs resembling Linux ZoneBlockDevice ioctls

Sam Li (8):
  include: add zoned device structs
  file-posix: introduce helper functions for sysfs attributes
  block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  raw-format: add zone operations to pass through requests
  config: add check to block layer
  qemu-iotests: test new zone operations
  block: add some trace events for new block layer APIs
  docs/zoned-storage: add zoned device documentation

 block.c|  19 ++
 block/block-backend.c  | 133 
 block/file-posix.c | 444

Re: [PATCH for-8.0] aio-posix: fix race between epoll upgrade and aio_set_fd_handler()

2023-03-22 Thread Paolo Bonzini

On Wed, 22 Mar 2023, 15:55 Stefan Hajnoczi wrote:

> +/* The list must not change while we add fds to epoll */
> +if (!qemu_lockcnt_dec_if_lock(&ctx->list_lock)) {
> +return false;
> +}
> +
> +ok = fdmon_epoll_try_enable(ctx);
> +
> +qemu_lockcnt_unlock(&ctx->list_lock);
>

Shouldn't this be inc_and_unlock to balance the change made by dec_if_lock?

Paolo

> +
> +if (!ok) {
> +fdmon_epoll_disable(ctx);
> +}
> +return ok;
>  }
>
>  void fdmon_epoll_setup(AioContext *ctx)
> --
> 2.39.2
>
>
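Paolo's point is about keeping the `QemuLockCnt` counter balanced. `qemu_lockcnt_dec_if_lock()` only succeeds when the count is 1. It then drops the count to zero and takes the mutex. Suppose the caller took a reference with `qemu_lockcnt_inc()` beforehand, as a list walker does, and later drops it with `qemu_lockcnt_dec()`. In that case, releasing with plain `qemu_lockcnt_unlock()` leaves the counter one short, while `qemu_lockcnt_inc_and_unlock()` restores it. Below is a toy, single-threaded model of that bookkeeping. It is not QEMU's atomic/futex implementation, and the `toy_*` names are hypothetical.

```c
#include <stdbool.h>

typedef struct {
    int count;   /* number of active list walkers */
    bool locked; /* stands in for the embedded mutex */
} ToyLockCnt;

static void toy_inc(ToyLockCnt *c) { c->count++; }
static void toy_dec(ToyLockCnt *c) { c->count--; }

/* If the count is 1, drop it to zero, take the lock and return true. */
static bool toy_dec_if_lock(ToyLockCnt *c)
{
    if (c->count == 1) {
        c->count = 0;
        c->locked = true;
        return true;
    }
    return false;
}

/* Release the lock without touching the count. */
static void toy_unlock(ToyLockCnt *c) { c->locked = false; }

/* Give back the reference dropped by dec_if_lock, then release the lock. */
static void toy_inc_and_unlock(ToyLockCnt *c)
{
    c->count++;
    c->locked = false;
}
```

With `toy_unlock()`, the sequence inc, dec_if_lock, unlock, dec ends at a count of -1. With `toy_inc_and_unlock()`, it ends back at 0.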

RE: [PATCH 1/2] ui/gtk: use widget size for cursor motion event

2023-03-22 Thread Kasireddy, Vivek

Hi Erico,

> >
> >>
> >> The gd_motion_event function has some calculations for the cursor position,
> >> which also take into account things like a different size of the
> >> framebuffer compared to the window size.
> >> The use of the window size makes things more difficult though, as, at least
> >> in the case of Wayland, it includes the size of UI elements like a menu bar
> >> at the top of the window. This leads to a wrong position calculation by
> >> a few pixels.
> >> Fix it by using the size of the widget, which already returns the size
> >> of the actual space to render the framebuffer.
> >>
> >> Signed-off-by: Erico Nunes 
> >> ---
> >>  ui/gtk.c | 8 +++-
> >>  1 file changed, 3 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/ui/gtk.c b/ui/gtk.c
> >> index fd82e9b1ca..d1b2a80c2b 100644
> >> --- a/ui/gtk.c
> >> +++ b/ui/gtk.c
> >> @@ -868,7 +868,6 @@ static gboolean gd_motion_event(GtkWidget *widget, GdkEventMotion *motion,
> >>  {
> >>  VirtualConsole *vc = opaque;
> >>  GtkDisplayState *s = vc->s;
> >> -GdkWindow *window;
> >>  int x, y;
> >>  int mx, my;
> >>  int fbh, fbw;
> >> @@ -881,10 +880,9 @@ static gboolean gd_motion_event(GtkWidget *widget, GdkEventMotion *motion,
> >>  fbw = surface_width(vc->gfx.ds) * vc->gfx.scale_x;
> >>  fbh = surface_height(vc->gfx.ds) * vc->gfx.scale_y;
> >>
> >> -window = gtk_widget_get_window(vc->gfx.drawing_area);
> >> -ww = gdk_window_get_width(window);
> >> -wh = gdk_window_get_height(window);
> >> -ws = gdk_window_get_scale_factor(window);
> >> +ww = gtk_widget_get_allocated_width(widget);
> >> +wh = gtk_widget_get_allocated_height(widget);
> > [Kasireddy, Vivek] Could you please confirm if this works in X-based compositor
> > environments as well? Last time I checked (with Fedora 36 and Gnome + X), the
> > get_allocated_xxx APIs were not accurate in X-based environments. Therefore,
> > I restricted the above change to Wayland-based environments only:
> > https://lists.nongnu.org/archive/html/qemu-devel/2022-11/msg03100.html
> 
> Yes, I tested again and it seems to work fine for me even with the gtk
> ui running on X. I'm using Fedora 37.
[Kasireddy, Vivek] Ok, in that case, this patch is 
Acked-by: Vivek Kasireddy 

> 
> I was not aware of that patch series though and just spent some time
> debugging these ui issues. It looks like your series was missed?
[Kasireddy, Vivek] Yeah, not sure why my series was missed but in 
retrospect, I probably should have separated out bug fix patches
from new feature enablement patches.

> 
> I'm still debugging additional issues with cursor position calculation,
> especially in wayland environments (and in particular with
> vhost-user-gpu now). Do those patches address more cursor issues?
[Kasireddy, Vivek] They do address more cursor issues but not sure how
helpful they would be for you as most of them deal with relative mode +
Wayland environment. However, there is another one that deals with
cursor/pointer in absolute mode + multiple monitors:
https://lists.nongnu.org/archive/html/qemu-devel/2022-11/msg03097.html

Thanks,
Vivek
> 
> Thank you
> 
> Erico
>
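The motion handler centres the scaled framebuffer inside a reference area. It subtracts the resulting margin from the pointer position and then undoes the scale. This is a simplified sketch of that mapping. It ignores the HiDPI scale factor `ws` used in the original code, and `map_pointer()` is a hypothetical function. The sketch shows why the choice of reference size matters. Suppose the height passed in includes, say, a 30-pixel menu bar that is not part of the drawing area. The vertical margin then grows by 15 pixels, and the guest y coordinate is off by the same amount at scale 1.

```c
/* Hypothetical, simplified version of the gd_motion_event() arithmetic. */
typedef struct {
    int x, y;
} GuestPoint;

static GuestPoint map_pointer(double px, double py,
                              int area_w, int area_h,  /* reference size */
                              int fb_w, int fb_h,      /* guest framebuffer */
                              double scale_x, double scale_y)
{
    /* Size of the framebuffer once scaled onto the host. */
    int fbw = (int)(fb_w * scale_x);
    int fbh = (int)(fb_h * scale_y);
    /* Margins when the scaled framebuffer is centred in a larger area. */
    int mx = area_w > fbw ? (area_w - fbw) / 2 : 0;
    int my = area_h > fbh ? (area_h - fbh) / 2 : 0;
    GuestPoint p = {
        (int)((px - mx) / scale_x),
        (int)((py - my) / scale_y),
    };
    return p;
}
```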

Re: [PATCH v15 1/4] vhost: expose function vhost_dev_has_iommu()

2023-03-22 Thread Jason Wang

On Tue, Mar 21, 2023 at 10:23 PM Cindy Lu  wrote:
>
> To support vIOMMU in vdpa, we need to expose the function
> vhost_dev_has_iommu(); vdpa will use this function to check
> whether vIOMMU is enabled.
>
> Signed-off-by: Cindy Lu 

It looks like you missed my acks for patches 1 - 3.

Thanks

> ---
>  hw/virtio/vhost.c | 2 +-
>  include/hw/virtio/vhost.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index a266396576..fd746b085b 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -107,7 +107,7 @@ static void vhost_dev_sync_region(struct vhost_dev *dev,
>  }
>  }
>
> -static bool vhost_dev_has_iommu(struct vhost_dev *dev)
> +bool vhost_dev_has_iommu(struct vhost_dev *dev)
>  {
>  VirtIODevice *vdev = dev->vdev;
>
> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
> index a52f273347..f7f10c8fb7 100644
> --- a/include/hw/virtio/vhost.h
> +++ b/include/hw/virtio/vhost.h
> @@ -336,4 +336,5 @@ int vhost_dev_set_inflight(struct vhost_dev *dev,
> struct vhost_inflight *inflight);
>  int vhost_dev_get_inflight(struct vhost_dev *dev, uint16_t queue_size,
> struct vhost_inflight *inflight);
> +bool vhost_dev_has_iommu(struct vhost_dev *dev);
>  #endif
> --
> 2.34.3
>

Re: [PATCH v15 4/4] vhost-vdpa: Add support for vIOMMU.

2023-03-22 Thread Jason Wang

On Tue, Mar 21, 2023 at 10:24 PM Cindy Lu  wrote:
>
> 1. The vIOMMU support will let vDPA work in IOMMU mode. This
> will fix security issues when using the no-IOMMU mode.
> To support this feature we need to add new functions for IOMMU MR add and
> delete.
>
> Also, since SVQ does not support vIOMMU yet, add a check for IOMMU
> in vhost_vdpa_dev_start(): if SVQ and IOMMU are enabled at the same time,
> the function will fail.
>
> 2. Skip the iova_max check in vhost_vdpa_listener_skipped_section() when
> the MR is an IOMMU region, and move this check to vhost_vdpa_iommu_map_notify().
>
> Verified with the vp_vdpa and vdpa_sim_net drivers.
>
> Signed-off-by: Cindy Lu 
> ---
>  hw/virtio/trace-events |   2 +-
>  hw/virtio/vhost-vdpa.c | 159 ++---
>  include/hw/virtio/vhost-vdpa.h |  11 +++
>  3 files changed, 161 insertions(+), 11 deletions(-)
>
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index 8f8d05cf9b..de4da2c65c 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -33,7 +33,7 @@ vhost_user_create_notifier(int idx, void *n) "idx:%d n:%p"
>  vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
>  vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint32_t asid, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" asid: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
>  vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> -vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
> +vhost_vdpa_iotlb_batch_end_once(void *v, int fd, uint32_t msg_type, uint8_t type)  "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
>  vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
>  vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
>  vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 0c8c37e786..39720d12a6 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -26,6 +26,7 @@
>  #include "cpu.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "hw/virtio/virtio-access.h"
>
>  /*
>   * Return one past the end of the end of section. Be careful with uint64_t
> @@ -60,13 +61,21 @@ static bool vhost_vdpa_listener_skipped_section(MemoryRegionSection *section,
>   iova_min, section->offset_within_address_space);
>  return true;
>  }
> +/*
> + * While using vIOMMU, sometimes the section will be larger than iova_max,
> + * but the memory that actually maps is smaller, so move the check to
> + * function vhost_vdpa_iommu_map_notify(). That function will use the actual
> + * size that maps to the kernel
> + */
>
> -llend = vhost_vdpa_section_end(section);
> -if (int128_gt(llend, int128_make64(iova_max))) {
> -error_report("RAM section out of device range (max=0x%" PRIx64
> - ", end addr=0x%" PRIx64 ")",
> - iova_max, int128_get64(llend));
> -return true;
> +if (!memory_region_is_iommu(section->mr)) {
> +llend = vhost_vdpa_section_end(section);
> +if (int128_gt(llend, int128_make64(iova_max))) {
> +error_report("RAM section out of device range (max=0x%" PRIx64
> + ", end addr=0x%" PRIx64 ")",
> + iova_max, int128_get64(llend));
> +return true;
> +}
>  }
>
>  return false;
> @@ -158,9 +167,8 @@ static void vhost_vdpa_iotlb_batch_begin_once(struct vhost_vdpa *v)
>  v->iotlb_batch_begin_sent = true;
>  }
>
> -static void vhost_vdpa_listener_commit(MemoryListener *listener)
> +static void vhost_vdpa_iotlb_batch_end_once(struct vhost_vdpa *v)
>  {
> -struct vhost_vdpa *v = container_of(listener, struct vhost_vdpa, listener);
>  struct vhost_dev *dev = v->dev;
>  struct vhost_msg_v2 msg = {};
>  int fd = v->device_fd;
> @@ -176,7 +184,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
>  msg.type = v->msg_type;
>  msg.iotlb.type = VHOST_IOTLB_BATCH_END;
>
> -trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
> +trace_vhost_vdpa_iotlb_batch_end_once(v, fd, msg.type, msg.iotlb.type);

I suggest keeping the commit trace. Commit and batch are different
things. If you want to trace the

Re: [PATCH for-8.1 v4 12/25] target/riscv/cpu.c: redesign register_cpu_props()

2023-03-22 Thread LIU Zhiwei


Hi Daniel,

I want to share my opinions about the cpu->cfg and misa.


Two suggestions:

1) cpu->cfg should be set only once, in the CPU initialization
phase (cpu_init_fn or cpu_realize_fn), and never change afterwards
(for example in write_misa).


2) Set the misa only when cpu->cfg is ready.


In my view, we should set misa and cfg in this way:

1) Set cfg and misa_mxl in xxx_cpu_init. Don't call set_misa here.

2) Register and set cfg for generic CPUs through the infrastructure.

3) Check cfg in the cpu_realize_fn stage in a dedicated function. Don't
change cpu->cfg; just pass it as a parameter.


4) Expand cpu->cfg, e.g. for RVG.

5) Set misa and misa_max.

6) In write_misa, construct a cfg for the new misa value. If that cfg
is legal after checking it against cpu->cfg, write the value directly
into misa. Don't change cpu->cfg here.
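One reading of step 6 goes like this. Derive a candidate extension set from the value being written to misa. Reject it if it enables anything the CPU was not configured with, or if it breaks a dependency. Otherwise update misa, leaving `cpu->cfg` untouched. Below is a minimal sketch of that reading. It only models a few single-letter extensions on I-based cores. The bit macros are redefined locally in the spirit of the `RVI`/`RVM`/... masks used in the removed code, and `CfgSketch`, `cfg_from_misa()` and `write_misa_sketch()` are hypothetical. The real `cpu->cfg` has many more fields and dependency rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* misa extension bits: bit N corresponds to letter 'A' + N. */
#define RV(x) ((uint32_t)1 << ((x) - 'A'))
#define RVA RV('A')
#define RVC RV('C')
#define RVD RV('D')
#define RVF RV('F')
#define RVI RV('I')
#define RVM RV('M')

typedef struct {
    bool ext_i, ext_m, ext_a, ext_f, ext_d, ext_c;
} CfgSketch;

static CfgSketch cfg_from_misa(uint32_t misa)
{
    CfgSketch c = {
        .ext_i = misa & RVI, .ext_m = misa & RVM, .ext_a = misa & RVA,
        .ext_f = misa & RVF, .ext_d = misa & RVD, .ext_c = misa & RVC,
    };
    return c;
}

/* The new cfg may only disable extensions, never enable new ones. */
static bool cfg_is_subset(const CfgSketch *n, const CfgSketch *cur)
{
    return (!n->ext_i || cur->ext_i) && (!n->ext_m || cur->ext_m) &&
           (!n->ext_a || cur->ext_a) && (!n->ext_f || cur->ext_f) &&
           (!n->ext_d || cur->ext_d) && (!n->ext_c || cur->ext_c);
}

/* Hypothetical write_misa(): validate against cpu->cfg, never modify it. */
static bool write_misa_sketch(uint32_t *misa, const CfgSketch *cpu_cfg,
                              uint32_t val)
{
    CfgSketch n = cfg_from_misa(val);

    /* I is required in this sketch, and D depends on F. */
    if (!n.ext_i || (n.ext_d && !n.ext_f) || !cfg_is_subset(&n, cpu_cfg)) {
        return false; /* illegal value: keep the old misa */
    }
    *misa = val;
    return true;
}
```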



Best Regards,
Zhiwei

On 2023/3/23 6:19, Daniel Henrique Barboza wrote:

Now that the function is a no-op if 'env.misa_ext != 0', and no caller
that sets misa_ext != 0 invokes it (because set_misa() sets the CPU cfg
accordingly), remove the now-deprecated code and rename the function to
register_generic_cpu_props().

This function is now doing exactly what the name says: it is creating
user-facing properties to allow changes in the CPU cfg via the QEMU
command line, setting default values if no user input is provided.

Note that a CPU could set a certain misa value and, at the same time,
also want user-facing flags and defaults from this function. This has
not been the case since commit 26b2bc58599c ("target/riscv:
Don't expose the CPU properties on names CPUs"), but since it remains
a possibility, document in the function that calling it will overwrite
existing values in cpu->cfg.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 48 ++
  1 file changed, 10 insertions(+), 38 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index df5c0bda70..0e56a1c01f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -221,7 +221,7 @@ static const char * const riscv_intr_names[] = {
  "reserved"
  };
  
-static void register_cpu_props(Object *obj);
+static void register_generic_cpu_props(Object *obj);
  
  const char *riscv_cpu_get_trap_name(target_ulong cause, bool async)
  {
@@ -386,7 +386,7 @@ static void rv64_base_cpu_init(Object *obj)
  CPURISCVState *env = &RISCV_CPU(obj)->env;
  /* We set this in the realise function */
  set_misa(env, MXL_RV64, 0);
-register_cpu_props(obj);
+register_generic_cpu_props(obj);
  /* Set latest version of privileged specification */
  env->priv_ver = PRIV_VERSION_LATEST;
  #ifndef CONFIG_USER_ONLY
@@ -472,7 +472,7 @@ static void rv128_base_cpu_init(Object *obj)
  CPURISCVState *env = &RISCV_CPU(obj)->env;
  /* We set this in the realise function */
  set_misa(env, MXL_RV128, 0);
-register_cpu_props(obj);
+register_generic_cpu_props(obj);
  /* Set latest version of privileged specification */
  env->priv_ver = PRIV_VERSION_LATEST;
  #ifndef CONFIG_USER_ONLY
@@ -485,7 +485,7 @@ static void rv32_base_cpu_init(Object *obj)
  CPURISCVState *env = &RISCV_CPU(obj)->env;
  /* We set this in the realise function */
  set_misa(env, MXL_RV32, 0);
-register_cpu_props(obj);
+register_generic_cpu_props(obj);
  /* Set latest version of privileged specification */
  env->priv_ver = PRIV_VERSION_LATEST;
  #ifndef CONFIG_USER_ONLY
@@ -572,7 +572,7 @@ static void riscv_host_cpu_init(Object *obj)
  #elif defined(TARGET_RISCV64)
  set_misa(env, MXL_RV64, 0);
  #endif
-register_cpu_props(obj);
+register_generic_cpu_props(obj);
  }
  #endif
  
@@ -1554,44 +1554,16 @@ static Property riscv_cpu_extensions[] = {
  };
  
  /*
- * Register CPU props based on env.misa_ext. If a non-zero
- * value was set, register only the required cpu->cfg.ext_*
- * properties and leave. env.misa_ext = 0 means that we want
- * all the default properties to be registered.
+ * Register generic CPU props with user-facing flags declared
+ * in riscv_cpu_extensions[].
+ *
+ * Note that this will overwrite existing values in cpu->cfg.
   */
-static void register_cpu_props(Object *obj)
+static void register_generic_cpu_props(Object *obj)
  {
-RISCVCPU *cpu = RISCV_CPU(obj);
-uint32_t misa_ext = cpu->env.misa_ext;
  Property *prop;
  DeviceState *dev = DEVICE(obj);
  
-/*
- * If misa_ext is not zero, set cfg properties now to
- * allow them to be read during riscv_cpu_realize()
- * later on.
- */
-if (cpu->env.misa_ext != 0) {
-cpu->cfg.ext_i = misa_ext & RVI;
-cpu->cfg.ext_e = misa_ext & RVE;
-cpu->cfg.ext_m = misa_ext & RVM;
-cpu->cfg.ext_a = misa_ext & RVA;
-cpu->cfg.ext_f = misa_ext & RVF;
-cpu->cfg.ext_d = misa_ext & RVD;
-cpu->cfg.ext_v =

[PATCH 0/3] Add support for TPM devices over I2C bus

2023-03-22 Thread Ninad Palsule

This drop adds support for TPM devices attached to the I2C bus. It
only supports the TPM2 protocol. You need to run it with an external
TPM emulator such as swtpm; I have tested it with swtpm.

I referred to the work done by zhdan...@meta.com, but at the core
level our implementation is different.
https://github.com/theopolis/qemu/commit/2e2e57cde9e419c36af8071bb85392ad1ed70966

Based-on: $MESSAGE_ID
---
V2:
 Incorporated Stephan's comments.

Ninad Palsule (3):
  docs: Add support for TPM devices over I2C bus
  TPM TIS: Add support for TPM devices over I2C bus
  New I2C: Add support for TPM devices over I2C bus

 docs/specs/tpm.rst  |  20 +-
 hw/arm/Kconfig  |   1 +
 hw/tpm/Kconfig  |   7 +
 hw/tpm/meson.build  |   1 +
 hw/tpm/tpm_tis.h|   3 +
 hw/tpm/tpm_tis_common.c |  32 +++
 hw/tpm/tpm_tis_i2c.c| 440 
 include/sysemu/tpm.h|   3 +
 8 files changed, 506 insertions(+), 1 deletion(-)
 create mode 100644 hw/tpm/tpm_tis_i2c.c

-- 
2.37.2

[PATCH 1/3] docs: Add support for TPM devices over I2C bus

2023-03-22 Thread Ninad Palsule

This is a documentation change for I2C TPM device support.

QEMU already supports TPM devices attached to ISA and sysbus.
This drop adds support for I2C-attached TPM devices.

Signed-off-by: Ninad Palsule 

---
V2:

Incorporated Stephen's review comments
- Added example in the document.
---
 docs/specs/tpm.rst | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/docs/specs/tpm.rst b/docs/specs/tpm.rst
index 535912a92b..bf7249b09c 100644
--- a/docs/specs/tpm.rst
+++ b/docs/specs/tpm.rst
@@ -21,11 +21,15 @@ QEMU files related to TPM TIS interface:
  - ``hw/tpm/tpm_tis_common.c``
  - ``hw/tpm/tpm_tis_isa.c``
  - ``hw/tpm/tpm_tis_sysbus.c``
+ - ``hw/tpm/tpm_tis_i2c.c``
  - ``hw/tpm/tpm_tis.h``
 
 Both an ISA device and a sysbus device are available. The former is
 used with pc/q35 machine while the latter can be instantiated in the
-Arm virt machine.
+Arm virt machine. Support for an I2C device has also been added,
+which can be instantiated in Arm-based emulated machines, including
+the Arm virt machine. This I2C device only supports the TPM 2
+protocol.
 
 CRB interface
 -
@@ -348,6 +352,20 @@ In case an Arm virt machine is emulated, use the following command line:
 -drive if=pflash,format=raw,file=flash0.img,readonly=on \
 -drive if=pflash,format=raw,file=flash1.img
 
+In case a Rainier bmc machine is emulated, use the following command line:
+
+.. code-block:: console
+
+  qemu-system-arm -M rainier-bmc -nographic \
+-kernel ${IMAGEPATH}/fitImage-linux.bin \
+-dtb ${IMAGEPATH}/aspeed-bmc-ibm-rainier.dtb \
+-initrd ${IMAGEPATH}/obmc-phosphor-initramfs.rootfs.cpio.xz \
+-drive file=${IMAGEPATH}/obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 \
+-net nic -net user,hostfwd=:127.0.0.1:-:22,hostfwd=:127.0.0.1:2443-:443 \
+-chardev socket,id=chrtpm,path=/tmp/mytpm1/swtpm-sock \
+-tpmdev emulator,id=tpm0,chardev=chrtpm \
+-device tpm-tis-i2c,tpmdev=tpm0,bus=aspeed.i2c.bus.12,address=0x2e
+
 In case SeaBIOS is used as firmware, it should show the TPM menu item
 after entering the menu with 'ESC'.
 
-- 
2.37.2

[PATCH 2/3] TPM TIS: Add support for TPM devices over I2C bus

2023-03-22 Thread Ninad Palsule

QEMU already supports TPM devices attached to ISA and sysbus. This drop
adds support for I2C-attached TPM devices.

This commit includes changes for the common code.
- Added support for the new checksum registers, which are required for
  the I2C support. The checksum calculation is handled in the QEMU
  common code.
- Added wrapper functions for reading and writing data so that the I2C
  code can call them without going through the MMIO interface.
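`tpm_tis_get_checksum()` runs QEMU's `crc_ccitt()` from `qemu/crc-ccitt.h` over the bytes accumulated in the data buffer. It starts from an initial value of 0 and returns the result converted with `cpu_to_be16()`. Below is a standalone bitwise sketch of the same CRC, written as an illustration rather than taken from QEMU. It uses the reflected form of polynomial 0x1021 (0x8408), processes bytes LSB first, and applies no final XOR, a variant also known as CRC-16/KERMIT. It returns the two checksum bytes most-significant first. `crc_ccitt_sketch()` and `tis_checksum_be()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-CCITT, LSB-first with the reflected polynomial 0x8408. */
static uint16_t crc_ccitt_sketch(uint16_t crc, const uint8_t *buf, size_t len)
{
    while (len--) {
        crc ^= *buf++;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1) {
                crc = (uint16_t)((crc >> 1) ^ 0x8408);
            } else {
                crc = (uint16_t)(crc >> 1);
            }
        }
    }
    return crc;
}

/* Checksum over the first rw_offset bytes, most-significant byte first. */
static void tis_checksum_be(const uint8_t *buffer, size_t rw_offset,
                            uint8_t out[2])
{
    uint16_t crc = crc_ccitt_sketch(0, buffer, rw_offset);

    out[0] = (uint8_t)(crc >> 8);
    out[1] = (uint8_t)(crc & 0xff);
}
```

The per-byte update matches the entries of the usual 256-entry CRC-CCITT lookup table; for example, a single byte 0x01 yields 0x1189.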

Signed-off-by: Ninad Palsule 
---
V2:

Incorporated Stephen's comments.

- Removed checksum enable and checksum get registers.
- Added checksum calculation function which can be called from
  i2c layer.
---
 hw/tpm/tpm_tis.h|  3 +++
 hw/tpm/tpm_tis_common.c | 32 
 2 files changed, 35 insertions(+)

diff --git a/hw/tpm/tpm_tis.h b/hw/tpm/tpm_tis.h
index f6b5872ba6..6f29a508dd 100644
--- a/hw/tpm/tpm_tis.h
+++ b/hw/tpm/tpm_tis.h
@@ -86,5 +86,8 @@ int tpm_tis_pre_save(TPMState *s);
 void tpm_tis_reset(TPMState *s);
 enum TPMVersion tpm_tis_get_tpm_version(TPMState *s);
 void tpm_tis_request_completed(TPMState *s, int ret);
+uint32_t tpm_tis_read_data(TPMState *s, hwaddr addr, unsigned size);
+void tpm_tis_write_data(TPMState *s, hwaddr addr, uint64_t val, uint32_t size);
+uint16_t tpm_tis_get_checksum(TPMState *s);
 
 #endif /* TPM_TPM_TIS_H */
diff --git a/hw/tpm/tpm_tis_common.c b/hw/tpm/tpm_tis_common.c
index 503be2a541..b1acde74cb 100644
--- a/hw/tpm/tpm_tis_common.c
+++ b/hw/tpm/tpm_tis_common.c
@@ -26,6 +26,8 @@
 #include "hw/irq.h"
 #include "hw/isa/isa.h"
 #include "qapi/error.h"
+#include "qemu/bswap.h"
+#include "qemu/crc-ccitt.h"
 #include "qemu/module.h"
 
 #include "hw/acpi/tpm.h"
@@ -447,6 +449,27 @@ static uint64_t tpm_tis_mmio_read(void *opaque, hwaddr addr,
 return val;
 }
 
+/*
+ * A wrapper read function so that it can be directly called without
+ * mmio.
+ */
+uint32_t tpm_tis_read_data(TPMState *s, hwaddr addr, unsigned size)
+{
+return tpm_tis_mmio_read(s, addr, size);
+}
+
+/*
+ * Calculate current data buffer checksum
+ */
+uint16_t tpm_tis_get_checksum(TPMState *s)
+{
+    uint16_t val;
+
+    val = cpu_to_be16(crc_ccitt(0, s->buffer, s->rw_offset));
+
+    return val;
+}
+
 /*
  * Write a value to a register of the TIS interface
  * See specs pages 33-63 for description of the registers
@@ -767,6 +790,15 @@ static void tpm_tis_mmio_write(void *opaque, hwaddr addr,
 }
 }
 
+/*
+ * A wrapper write function so that it can be directly called without
+ * mmio.
+ */
+void tpm_tis_write_data(TPMState *s, hwaddr addr, uint64_t val, uint32_t size)
+{
+    tpm_tis_mmio_write(s, addr, val, size);
+}
+
 const MemoryRegionOps tpm_tis_memory_ops = {
     .read = tpm_tis_mmio_read,
     .write = tpm_tis_mmio_write,
-- 
2.37.2

[PATCH 3/3] New I2C: Add support for TPM devices over I2C bus

2023-03-22 Thread Ninad Palsule

QEMU already supports devices attached to the ISA bus and sysbus. This series
adds support for TPM devices attached to the I2C bus. The I2C model only
supports the TPM2 protocol.

This commit includes the following changes:
- Added an I2C emulation model. Logic was added to the model to temporarily
  cache the data, because the I2C interface works on a per-byte basis.
- Added a new TPM type, "tpm-tis-i2c", for I2C support. Users specify this
  string on the command line.
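Because I2C delivers one byte per transfer, a register access has to be reassembled from a register-address byte followed by data bytes. The following is a minimal sketch of that accumulation, assuming a one-byte register address and little-endian data bytes as in the TIS register layout. The struct and function names are hypothetical, and since the v2 notes say FIFO data is no longer cached in the I2C layer, treat this as illustrating register-sized accesses only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-transfer state for a byte-oriented register bus. */
typedef struct {
    bool have_reg;    /* has the register address byte been received? */
    uint8_t reg;      /* selected register */
    uint8_t data[4];  /* cached data bytes, least significant first */
    unsigned count;   /* number of cached data bytes */
} I2CRegXfer;

/* Called on START: forget any partially received access. */
static void xfer_start(I2CRegXfer *x)
{
    x->have_reg = false;
    x->count = 0;
}

/* Called for every byte the controller sends; false on overflow. */
static bool xfer_send(I2CRegXfer *x, uint8_t byte)
{
    if (!x->have_reg) {
        x->reg = byte;
        x->have_reg = true;
        return true;
    }
    if (x->count >= sizeof(x->data)) {
        return false; /* boundary check: more bytes than the register holds */
    }
    x->data[x->count++] = byte;
    return true;
}

/* Called on STOP: assemble the cached bytes into one register value. */
static uint32_t xfer_value(const I2CRegXfer *x)
{
    uint32_t val = 0;

    assert(x->have_reg);
    for (unsigned i = 0; i < x->count; i++) {
        val |= (uint32_t)x->data[i] << (8 * i);
    }
    return val;
}
```

A write of register 0x18 followed by bytes 01 02 03 04 would then be assembled into the value 0x04030201.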

Testing:
  The TPM I2C device module was tested using SWTPM (a software-based TPM
  package). QEMU used the rainier machine and was connected to swtpm over
  the socket interface.

  The command to start swtpm is as follows:
  $ swtpm socket --tpmstate dir=/tmp/mytpm1 \
      --ctrl type=unixio,path=/tmp/mytpm1/swtpm-sock \
      --tpm2 --log level=100

  The command to start qemu is as follows:
  $ qemu-system-arm -M rainier-bmc -nographic \
      -kernel ${IMAGEPATH}/fitImage-linux.bin \
      -dtb ${IMAGEPATH}/aspeed-bmc-ibm-rainier.dtb \
      -initrd ${IMAGEPATH}/obmc-phosphor-initramfs.rootfs.cpio.xz \
      -drive file=${IMAGEPATH}/obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 \
      -net nic -net user,hostfwd=:127.0.0.1:-:22,hostfwd=:127.0.0.1:2443-:443 \
      -chardev socket,id=chrtpm,path=/tmp/mytpm1/swtpm-sock \
      -tpmdev emulator,id=tpm0,chardev=chrtpm \
      -device tpm-tis-i2c,tpmdev=tpm0,bus=aspeed.i2c.bus.12,address=0x2e

  Note: Currently the I2C bus and device address must be specified on the
  command line. In the future, the device could be added at the board level.

Signed-off-by: Ninad Palsule 
---
V2:
Incorporated Stephen's review comments.
- Handled the checksum-related registers in the I2C layer.
- Defined the I2C interface capabilities and returned those instead of
  the capabilities from TPM TIS. Added the required capabilities from TIS.
- Do not cache FIFO data in the I2C layer.
- Make sure that the device address change register is not passed to the
  I2C layer, as the capabilities indicate that it is not supported.
- Added boundary checks.
- Make sure that bits 26-31 are zeroed for the TPM_STS register on read.
- Updated Kconfig files for the new define.
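The TPM_STS rule in the list above amounts to clearing bits 26-31 of the 32-bit value returned by the TIS core before it goes out on the bus. A minimal sketch, with a hypothetical helper and macro name:

```c
#include <assert.h>
#include <stdint.h>

/* Bits 26..31 of TPM_STS must read as zero on the I2C interface. */
#define TPM_STS_I2C_RESERVED_MASK (0x3fu << 26)

static_assert(TPM_STS_I2C_RESERVED_MASK == 0xfc000000u,
              "mask covers exactly bits 26-31");

/* Hypothetical helper: filter a TPM_STS value read from the TIS core. */
static inline uint32_t tpm_sts_i2c_filter(uint32_t sts)
{
    return sts & ~TPM_STS_I2C_RESERVED_MASK;
}
```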
---
 hw/arm/Kconfig   |   1 +
 hw/tpm/Kconfig   |   7 +
 hw/tpm/meson.build   |   1 +
 hw/tpm/tpm_tis_i2c.c | 440 +++
 include/sysemu/tpm.h |   3 +
 5 files changed, 452 insertions(+)
 create mode 100644 hw/tpm/tpm_tis_i2c.c

diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index b5aed4aff5..05d6ef1a31 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -6,6 +6,7 @@ config ARM_VIRT
 imply VFIO_PLATFORM
 imply VFIO_XGMAC
 imply TPM_TIS_SYSBUS
+imply TPM_TIS_I2C
 imply NVDIMM
 select ARM_GIC
 select ACPI
diff --git a/hw/tpm/Kconfig b/hw/tpm/Kconfig
index 29e82f3c92..a46663288c 100644
--- a/hw/tpm/Kconfig
+++ b/hw/tpm/Kconfig
@@ -1,3 +1,10 @@
+config TPM_TIS_I2C
+    bool
+    depends on TPM
+    select TPM_BACKEND
+    select I2C
+    select TPM_TIS
+
 config TPM_TIS_ISA
 bool
 depends on TPM && ISA_BUS
diff --git a/hw/tpm/meson.build b/hw/tpm/meson.build
index 7abc2d794a..76fe3cb098 100644
--- a/hw/tpm/meson.build
+++ b/hw/tpm/meson.build
@@ -1,6 +1,7 @@
 softmmu_ss.add(when: 'CONFIG_TPM_TIS', if_true: files('tpm_tis_common.c'))
 softmmu_ss.add(when: 'CONFIG_TPM_TIS_ISA', if_true: files('tpm_tis_isa.c'))
 softmmu_ss.add(when: 'CONFIG_TPM_TIS_SYSBUS', if_true: files('tpm_tis_sysbus.c'))
+softmmu_ss.add(when: 'CONFIG_TPM_TIS_I2C', if_true: files('tpm_tis_i2c.c'))
 softmmu_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_crb.c'))
 softmmu_ss.add(when: 'CONFIG_TPM_TIS', if_true: files('tpm_ppi.c'))
 softmmu_ss.add(when: 'CONFIG_TPM_CRB', if_true: files('tpm_ppi.c'))
diff --git a/hw/tpm/tpm_tis_i2c.c b/hw/tpm/tpm_tis_i2c.c
new file mode 100644
index 00..5cec5f7806
--- /dev/null
+++ b/hw/tpm/tpm_tis_i2c.c
@@ -0,0 +1,440 @@
+/*
+ * tpm_tis_i2c.c - QEMU's TPM TIS I2C Device
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ * Implementation of the TIS interface according to specs found at
+ * http://www.trustedcomputinggroup.org. This implementation currently
+ * supports version 1.3, 21 March 2013
+ * In the developers menu choose the PC Client section then find the TIS
+ * specification.
+ *
+ * TPM TIS for TPM 2 implementation following TCG PC Client Platform
+ * TPM Profile (PTP) Specification, Family 2.0, Revision 00.43
+ *
+ * TPM I2C implementation follows TCG TPM I2C Interface specification,
+ * Family 2.0, Level 00, Revision 1.00
+ */
+
+#include "qemu/osdep.h"
+#include "hw/i2c/i2c.h"
+#include "hw/qdev-properties.h"
+#include "hw/acpi/tpm.h"
+#include "migration/vmstate.h"
+#include "tpm_prop.h"
+#include "tpm_tis.h"
+#include "qom/object.h"
+#include "block/aio.h"
+#include "qemu/main-loop.h"
+
+/* TPM TIS I2C registers */
+#define TPM_TIS_I2C_REG_LOC_SEL  0x00
+#define

Re: [PATCH v2] tests/avocado: re-factor igb test to avoid timeouts

2023-03-22 Thread Akihiko Odaki


On 2023/03/22 23:55, Alex Bennée wrote:

The core of the test was utilising "ethtool -t eth1 offline" to run
through a test sequence. For reasons unknown, the test hangs under some
configurations of the build on centos8-stream. Fundamentally, running
the old fedora-31 cloud-init is just too much for something that is
directed at testing one device. So we:

   - replace fedora with a custom kernel + buildroot rootfs
   - rename the test from IGB to NetDevEthtool
   - re-factor the common code, add (currently skipped) tests for other
  devices which support ethtool
   - remove the KVM limitation as it's fast enough to run in KVM or TCG

Signed-off-by: Alex Bennée 
Reviewed-by: Philippe Mathieu-Daudé 
Cc: Akihiko Odaki 

---
v2
   - use squashfs instead of largely empty ext4 device
   - use read-only cdrom
   - don't bother with login in favour of a direct call from init
   - kill VM once test is passed
   - add explicit kvm option


Why did you add an explicit kvm option? Is there something that is likely
not covered by TCG?


Regards,
Akihiko Odaki


   - add tags for device type
---
  tests/avocado/igb.py|  38 ---
  tests/avocado/netdev-ethtool.py | 116 
  2 files changed, 116 insertions(+), 38 deletions(-)
  delete mode 100644 tests/avocado/igb.py
  create mode 100644 tests/avocado/netdev-ethtool.py

diff --git a/tests/avocado/igb.py b/tests/avocado/igb.py
deleted file mode 100644
index abf5dfa07f..00
--- a/tests/avocado/igb.py
+++ /dev/null
@@ -1,38 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0-or-later
-# ethtool tests for igb registers, interrupts, etc
-
-from avocado_qemu import LinuxTest
-
-class IGB(LinuxTest):
-    """
-    :avocado: tags=accel:kvm
-    :avocado: tags=arch:x86_64
-    :avocado: tags=distro:fedora
-    :avocado: tags=distro_version:31
-    :avocado: tags=machine:q35
-    """
-
-    timeout = 180
-
-    def test(self):
-        self.require_accelerator('kvm')
-        kernel_url = self.distro.pxeboot_url + 'vmlinuz'
-        kernel_hash = '5b6f6876e1b5bda314f93893271da0d5777b1f3c'
-        kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
-        initrd_url = self.distro.pxeboot_url + 'initrd.img'
-        initrd_hash = 'dd0340a1b39bd28f88532babd4581c67649ec5b1'
-        initrd_path = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
-
-        # Ideally we want to test MSI as well, but it is blocked by a bug
-        # fixed with:
-        # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28e96556baca7056d11d9fb3cdd0aba4483e00d8
-        kernel_params = self.distro.default_kernel_params + ' pci=nomsi'
-
-        self.vm.add_args('-kernel', kernel_path,
-                         '-initrd', initrd_path,
-                         '-append', kernel_params,
-                         '-accel', 'kvm',
-                         '-device', 'igb')
-        self.launch_and_wait()
-        self.ssh_command('dnf -y install ethtool')
-        self.ssh_command('ethtool -t eth1 offline')
diff --git a/tests/avocado/netdev-ethtool.py b/tests/avocado/netdev-ethtool.py
new file mode 100644
index 00..f7e9464184
--- /dev/null
+++ b/tests/avocado/netdev-ethtool.py
@@ -0,0 +1,116 @@
+# ethtool tests for emulated network devices
+#
+# This test leverages ethtool's --test sequence to validate network
+# device behaviour.
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+from avocado import skip
+from avocado_qemu import QemuSystemTest
+from avocado_qemu import exec_command, exec_command_and_wait_for_pattern
+from avocado_qemu import wait_for_console_pattern
+
+class NetDevEthtool(QemuSystemTest):
+    """
+    :avocado: tags=arch:x86_64
+    :avocado: tags=machine:q35
+    """
+
+    # Runs in about 17s under KVM, 19s under TCG, 25s under GCOV
+    timeout = 45
+
+    # Fetch assets from the netdev-ethtool subdir of my shared test
+    # images directory on fileserver.linaro.org.
+    def get_asset(self, name, sha1):
+        base_url = ('https://fileserver.linaro.org/s/'
+                    'kE4nCFLdQcoBF9t/download?'
+                    'path=%2Fnetdev-ethtool=' )
+        url = base_url + name
+        # use explicit name rather than failing to neatly parse the
+        # URL into a unique one
+        return self.fetch_asset(name=name, locations=(url), asset_hash=sha1)
+
+    def common_test_code(self, netdev, extra_args=None, kvm=False):
+
+        # This custom kernel has drivers for all the supported network
+        # devices we can emulate in QEMU
+        kernel = self.get_asset("bzImage",
+                                "33469d7802732d5815226166581442395cb289e2")
+
+        rootfs = self.get_asset("rootfs.squashfs",
+                                "9793cea7021414ae844bda51f558bd6565b50cdc")
+
+        append = 'printk.time=0 console=ttyS0 '
+        append += 'root=/dev/sr0 rootfstype=squashfs '
+
+        # any additional kernel tweaks for the test
+        if extra_args:
+

[PATCH v4 0/2] target/riscv: reduce MSTATUS_SUM overhead

2023-03-22 Thread Fei Wu

v3 -> v4:
* separate priv from mmu_idx
* use index 2 for S+SUM mmu_idx
* no tlb_flush for MPRV / MPP changes

Fei Wu (2):
  target/riscv: separate priv from mmu_idx
  target/riscv: reduce overhead of MSTATUS_SUM change

 target/riscv/cpu.h|  2 --
 target/riscv/cpu_helper.c | 19 ---
 target/riscv/csr.c|  3 +--
 .../riscv/insn_trans/trans_privileged.c.inc   |  2 +-
 target/riscv/insn_trans/trans_rvh.c.inc   |  4 ++--
 target/riscv/insn_trans/trans_xthead.c.inc|  7 +--
 target/riscv/internals.h  | 14 ++
 target/riscv/op_helper.c  |  5 +++--
 target/riscv/translate.c  |  3 +++
 9 files changed, 41 insertions(+), 18 deletions(-)

-- 
2.25.1

[PATCH v4 2/2] target/riscv: reduce overhead of MSTATUS_SUM change

2023-03-22 Thread Fei Wu

The kernel needs to access user-mode memory, e.g. during syscalls. The
window is usually opened for a very short time through MSTATUS.SUM, so
calling tlb_flush() on every SUM change adds too much overhead.

This patch creates a separate MMU index for S+SUM, so that the TLB no
longer needs to be flushed when SUM changes. This is similar to how
ARM handles Privileged Access Never (PAN).
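The mapping described above can be sketched as a standalone function that mirrors the data-access path of riscv_cpu_mmu_index() in this patch. The struct, the enums and the mstatus bit macros are simplified stand-ins (assumptions of the sketch, with bit positions taken from the RISC-V privileged spec), not QEMU's CPURISCVState and get_field() helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Privilege levels as encoded by the RISC-V privileged spec. */
enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };

/* mmu_idx values as defined in internals.h by this patch. */
enum { MMUIdx_U = 0, MMUIdx_S = 1, MMUIdx_S_SUM = 2, MMUIdx_M = 3 };

/* mstatus fields (bit positions from the privileged spec). */
#define MSTATUS_MPP_SHIFT 11            /* MPP occupies bits 12:11 */
#define MSTATUS_MPRV      (1ULL << 17)
#define MSTATUS_SUM       (1ULL << 18)

/* Simplified stand-in for the relevant parts of CPURISCVState. */
typedef struct {
    int priv;
    uint64_t mstatus;
} SketchCPU;

static int sketch_mmu_index(const SketchCPU *env, bool ifetch)
{
    int mode = env->priv;

    assert(mode == PRV_U || mode == PRV_S || mode == PRV_M);
    if (ifetch) {
        return mode;            /* fetches use the plain privilege level */
    }
    if (mode == PRV_M && (env->mstatus & MSTATUS_MPRV)) {
        mode = (int)((env->mstatus >> MSTATUS_MPP_SHIFT) & 3);
    }
    if (mode == PRV_S && (env->mstatus & MSTATUS_SUM)) {
        return MMUIdx_S_SUM;    /* own TLB index, so no flush on SUM flips */
    }
    return mode;                /* U -> 0, S -> 1, M -> 3 */
}
```

Because S with SUM set and S with SUM clear now select different indices, their TLB entries never alias, which is why the tlb_flush() on SUM changes can be dropped.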

The unixbench 'pipe 10' result improves from 223656 to 1705006. Many
other syscalls benefit significantly from this as well.

Signed-off-by: Fei Wu 
---
 target/riscv/cpu.h  |  1 -
 target/riscv/cpu_helper.c   | 17 +++--
 target/riscv/csr.c  |  3 +--
 target/riscv/insn_trans/trans_rvh.c.inc |  4 ++--
 target/riscv/internals.h| 14 ++
 target/riscv/op_helper.c|  5 +++--
 6 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 66f7e3d1ba..d65eeb3c85 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -623,7 +623,6 @@ G_NORETURN void riscv_raise_exception(CPURISCVState *env,
 target_ulong riscv_cpu_get_fflags(CPURISCVState *env);
 void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
 
-#define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
 #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
 #define TB_FLAGS_MSTATUS_VS MSTATUS_VS
 
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 76e1b0100e..bbc612badf 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -21,6 +21,7 @@
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
 #include "cpu.h"
+#include "internals.h"
 #include "pmu.h"
 #include "exec/exec-all.h"
 #include "instmap.h"
@@ -36,7 +37,19 @@ int riscv_cpu_mmu_index(CPURISCVState *env, bool ifetch)
 #ifdef CONFIG_USER_ONLY
     return 0;
 #else
-    return env->priv;
+    if (ifetch) {
+        return env->priv;
+    }
+
+    /* All priv -> mmu_idx mappings are here */
+    int mode = env->priv;
+    if (mode == PRV_M && get_field(env->mstatus, MSTATUS_MPRV)) {
+        mode = get_field(env->mstatus, MSTATUS_MPP);
+    }
+    if (mode == PRV_S && get_field(env->mstatus, MSTATUS_SUM)) {
+        return MMUIdx_S_SUM;
+    }
+    return mode;
 #endif
 }
 
@@ -596,7 +609,7 @@ void riscv_cpu_set_virt_enabled(CPURISCVState *env, bool enable)
 
 bool riscv_cpu_two_stage_lookup(int mmu_idx)
 {
-    return mmu_idx & TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+    return mmu_idx & MMU_HYP_ACCESS_BIT;
 }
 
 int riscv_cpu_claim_interrupts(RISCVCPU *cpu, uint64_t interrupts)
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d522efc0b6..f74e40e66d 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1246,8 +1246,7 @@ static RISCVException write_mstatus(CPURISCVState *env, int csrno,
     RISCVMXL xl = riscv_cpu_mxl(env);
 
     /* flush tlb on mstatus fields that affect VM */
-    if ((val ^ mstatus) & (MSTATUS_MXR | MSTATUS_MPP | MSTATUS_MPV |
-            MSTATUS_MPRV | MSTATUS_SUM)) {
+    if ((val ^ mstatus) & (MSTATUS_MXR | MSTATUS_MPV)) {
         tlb_flush(env_cpu(env));
     }
 mask = MSTATUS_SIE | MSTATUS_SPIE | MSTATUS_MIE | MSTATUS_MPIE |
diff --git a/target/riscv/insn_trans/trans_rvh.c.inc b/target/riscv/insn_trans/trans_rvh.c.inc
index 9248b48c36..15842f4282 100644
--- a/target/riscv/insn_trans/trans_rvh.c.inc
+++ b/target/riscv/insn_trans/trans_rvh.c.inc
@@ -40,7 +40,7 @@ static bool do_hlv(DisasContext *ctx, arg_r2 *a, MemOp mop)
     if (check_access(ctx)) {
         TCGv dest = dest_gpr(ctx, a->rd);
         TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
-        int mem_idx = ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+        int mem_idx = ctx->mem_idx | MMU_HYP_ACCESS_BIT;
         tcg_gen_qemu_ld_tl(dest, addr, mem_idx, mop);
         gen_set_gpr(ctx, a->rd, dest);
     }
@@ -87,7 +87,7 @@ static bool do_hsv(DisasContext *ctx, arg_r2_s *a, MemOp mop)
     if (check_access(ctx)) {
         TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
         TCGv data = get_gpr(ctx, a->rs2, EXT_NONE);
-        int mem_idx = ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+        int mem_idx = ctx->mem_idx | MMU_HYP_ACCESS_BIT;
         tcg_gen_qemu_st_tl(data, addr, mem_idx, mop);
     }
     return true;
diff --git a/target/riscv/internals.h b/target/riscv/internals.h
index 5620fbffb6..b55152a7dc 100644
--- a/target/riscv/internals.h
+++ b/target/riscv/internals.h
@@ -21,6 +21,20 @@
 
 #include "hw/registerfields.h"
 
+/*
+ * The current MMU Modes are:
+ *  - U                 0b000
+ *  - S                 0b001
+ *  - S+SUM             0b010
+ *  - M                 0b011
+ *  - HLV/HLVX/HSV adds 0b100
+ */
+#define MMUIdx_U            0
+#define MMUIdx_S            1
+#define MMUIdx_S_SUM        2
+#define MMUIdx_M            3
+#define MMU_HYP_ACCESS_BIT  (1 << 2)
+
 /* share data between vector helpers and decode code */
 FIELD(VDATA, VM, 0, 1)
 FIELD(VDATA, LMUL, 1, 3)
diff --git a/target/riscv/op_helper.c

[PATCH v4 1/2] target/riscv: separate priv from mmu_idx

2023-03-22 Thread Fei Wu

Currently it is assumed that the two low bits of mmu_idx map to the
privilege mode. This assumption will no longer hold, as more mmu_idx
values are about to be added.
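A small sketch of why the mask stops working, using the mmu_idx values that patch 2/2 of this series defines in internals.h (S+SUM = 2); the helper names are hypothetical. Once S+SUM takes index 2, masking the low two bits yields 2, which is not a valid privilege level, and once patch 2/2 folds MPRV into the index, even the U/S/M indices can differ from env->priv. Hence the separate DisasContext.priv field.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege levels (RISC-V privileged spec); level 2 is reserved. */
enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };

/* mmu_idx values as proposed in patch 2/2 of this series. */
enum { MMUIdx_U = 0, MMUIdx_S = 1, MMUIdx_S_SUM = 2, MMUIdx_M = 3 };

/* The assumption removed here: priv == mmu_idx & TB_FLAGS_PRIV_MMU_MASK. */
static int old_priv_from_mmu_idx(int mmu_idx)
{
    return mmu_idx & 3;
}

static bool is_priv_level(int p)
{
    return p == PRV_U || p == PRV_S || p == PRV_M;
}
```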

Signed-off-by: Fei Wu 
---
 target/riscv/cpu.h | 1 -
 target/riscv/cpu_helper.c  | 2 +-
 target/riscv/insn_trans/trans_privileged.c.inc | 2 +-
 target/riscv/insn_trans/trans_xthead.c.inc | 7 +--
 target/riscv/translate.c   | 3 +++
 5 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 638e47c75a..66f7e3d1ba 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -623,7 +623,6 @@ G_NORETURN void riscv_raise_exception(CPURISCVState *env,
 target_ulong riscv_cpu_get_fflags(CPURISCVState *env);
 void riscv_cpu_set_fflags(CPURISCVState *env, target_ulong);
 
-#define TB_FLAGS_PRIV_MMU_MASK                3
 #define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
 #define TB_FLAGS_MSTATUS_FS MSTATUS_FS
 #define TB_FLAGS_MSTATUS_VS MSTATUS_VS
diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index f88c503cf4..76e1b0100e 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -762,7 +762,7 @@ static int get_physical_address(CPURISCVState *env, hwaddr *physical,
      * (riscv_cpu_do_interrupt) is correct */
     MemTxResult res;
     MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
-    int mode = mmu_idx & TB_FLAGS_PRIV_MMU_MASK;
+    int mode = env->priv;
     bool use_background = false;
     hwaddr ppn;
     RISCVCPU *cpu = env_archcpu(env);
diff --git a/target/riscv/insn_trans/trans_privileged.c.inc b/target/riscv/insn_trans/trans_privileged.c.inc
index 59501b2780..9305b18299 100644
--- a/target/riscv/insn_trans/trans_privileged.c.inc
+++ b/target/riscv/insn_trans/trans_privileged.c.inc
@@ -52,7 +52,7 @@ static bool trans_ebreak(DisasContext *ctx, arg_ebreak *a)
      * that no exception will be raised when fetching them.
      */
 
-    if (semihosting_enabled(ctx->mem_idx < PRV_S) &&
+    if (semihosting_enabled(ctx->priv < PRV_S) &&
         (pre_addr & TARGET_PAGE_MASK) == (post_addr & TARGET_PAGE_MASK)) {
         pre    = opcode_at(&ctx->base, pre_addr);
         ebreak = opcode_at(&ctx->base, ebreak_addr);
diff --git a/target/riscv/insn_trans/trans_xthead.c.inc b/target/riscv/insn_trans/trans_xthead.c.inc
index df504c3f2c..adfb53cb4c 100644
--- a/target/riscv/insn_trans/trans_xthead.c.inc
+++ b/target/riscv/insn_trans/trans_xthead.c.inc
@@ -265,12 +265,7 @@ static bool trans_th_tst(DisasContext *ctx, arg_th_tst *a)
 
 static inline int priv_level(DisasContext *ctx)
 {
-#ifdef CONFIG_USER_ONLY
-    return PRV_U;
-#else
-    /* Priv level is part of mem_idx. */
-    return ctx->mem_idx & TB_FLAGS_PRIV_MMU_MASK;
-#endif
+    return ctx->priv;
 }
 
 /* Test if priv level is M, S, or U (cannot fail). */
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 0ee8ee147d..e8880f9423 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -69,6 +69,7 @@ typedef struct DisasContext {
     uint32_t mstatus_hs_fs;
     uint32_t mstatus_hs_vs;
     uint32_t mem_idx;
+    uint32_t priv;
     /* Remember the rounding mode encoded in the previous fp instruction,
        which we have already installed into env->fp_status.  Or -1 for
        no previous fp instruction.  Note that we exit the TB when writing
@@ -1162,8 +1163,10 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     } else {
         ctx->virt_enabled = false;
     }
+    ctx->priv = env->priv;
 #else
     ctx->virt_enabled = false;
+    ctx->priv = PRV_U;
 #endif
     ctx->misa_ext = env->misa_ext;
     ctx->frm = -1;  /* unknown rounding mode */
-- 
2.25.1

[PATCH 3/6] target/ppc: Fix instruction loading endianness in alignment interrupt

2023-03-22 Thread Nicholas Piggin

PowerPC instruction fetch endianness depends on MSR[LE], so the fetched
word has to be byteswapped after cpu_ldl_code(). This corrects the DSISR
bits in alignment interrupts when running in little-endian mode.
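The swap rule can be sketched outside QEMU: cpu_ldl_code() returns the word in the target's default byte order, so a swap is needed exactly when MSR[LE] disagrees with TARGET_BIG_ENDIAN. In this sketch the boolean parameters stand in for the compile-time TARGET_BIG_ENDIAN and the MSR[LE] bit of env->msr, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable 32-bit byte swap (QEMU itself uses bswap32()). */
static uint32_t sketch_bswap32(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) | ((x & 0x0000ff00u) << 8) |
           ((x & 0x00ff0000u) >> 8)  | ((x & 0xff000000u) >> 24);
}

/*
 * raw: word as returned by the default-order code load.
 * target_big_endian: stand-in for TARGET_BIG_ENDIAN.
 * msr_le: stand-in for MSR[LE].
 */
static uint32_t sketch_fixup_insn(uint32_t raw, bool target_big_endian,
                                  bool msr_le)
{
    bool need_swap = target_big_endian ? msr_le : !msr_le;

    return need_swap ? sketch_bswap32(raw) : raw;
}
```

This is the same decision ppc_ldl_code() makes with its two #if branches, folded into one expression.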

Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 287659c74d..5f0e363363 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -133,6 +133,31 @@ static void dump_hcall(CPUPPCState *env)
   env->nip);
 }
 
+/* Return true iff byteswap is needed in a scalar memop */
+static inline bool need_byteswap(CPUArchState *env)
+{
+#if TARGET_BIG_ENDIAN
+    return !!(env->msr & ((target_ulong)1 << MSR_LE));
+#else
+    return !(env->msr & ((target_ulong)1 << MSR_LE));
+#endif
+}
+
+static uint32_t ppc_ldl_code(CPUArchState *env, abi_ptr addr)
+{
+    uint32_t insn = cpu_ldl_code(env, addr);
+#if TARGET_BIG_ENDIAN
+    if (env->msr & ((target_ulong)1 << MSR_LE)) {
+        insn = bswap32(insn);
+    }
+#else
+    if (!(env->msr & ((target_ulong)1 << MSR_LE))) {
+        insn = bswap32(insn);
+    }
+#endif
+    return insn;
+}
+
 static void ppc_excp_debug_sw_tlb(CPUPPCState *env, int excp)
 {
 const char *es;
@@ -3097,7 +3122,7 @@ void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
 
     /* Restore state and reload the insn we executed, for filling in DSISR.  */
     cpu_restore_state(cs, retaddr);
-    insn = cpu_ldl_code(env, env->nip);
+    insn = ppc_ldl_code(env, env->nip);
 
     switch (env->mmu_model) {
     case POWERPC_MMU_SOFT_4xx:
-- 
2.37.2

[PATCH 4/6] target/ppc: Alignment faults do not set DSISR in ISA v3.0 onward

2023-03-22 Thread Nicholas Piggin

This optional behavior was removed from the ISA in v3.0; see the
Summary of Changes preface:

  Data Storage Interrupt Status Register for Alignment Interrupt:
  Simplifies the Alignment interrupt by removing the Data Storage
  Interrupt Status Register (DSISR) from the set of registers modified
  by the Alignment interrupt.
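The resulting behaviour can be summarised as a small decision function. The enum is a simplified stand-in for env->excp_model, the names are hypothetical, and the 0x03FF0000 mask (rS/rD and rA opcode fields carried in bits 16-25 of error_code) is an assumption of this sketch about how the alignment handler packs them.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for env->excp_model. */
typedef enum {
    EXCP_970, EXCP_POWER7, EXCP_POWER8, EXCP_POWER9, EXCP_POWER10
} SketchExcpModel;

/* DSISR bits OR-ed in by an Alignment interrupt. */
static uint32_t sketch_align_dsisr_bits(SketchExcpModel model,
                                        uint32_t error_code)
{
    switch (model) {
    case EXCP_970:
    case EXCP_POWER7:
    case EXCP_POWER8:
        /* Pre-v3.0: copy the opcode fields (assumed in bits 16-25). */
        return (error_code & 0x03FF0000u) >> 16;
    default:
        /* ISA v3.0 onward: the Alignment interrupt leaves DSISR alone. */
        return 0;
    }
}
```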

Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 5f0e363363..c8b8eca3b1 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1456,13 +1456,22 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
         break;
     }
     case POWERPC_EXCP_ALIGN: /* Alignment exception  */
-        /* Get rS/rD and rA from faulting opcode */
-        /*
-         * Note: the opcode fields will not be set properly for a
-         * direct store load/store, but nobody cares as nobody
-         * actually uses direct store segments.
-         */
-        env->spr[SPR_DSISR] |= (env->error_code & 0x03FF0000) >> 16;
+        switch (env->excp_model) {
+        case POWERPC_EXCP_970:
+        case POWERPC_EXCP_POWER7:
+        case POWERPC_EXCP_POWER8:
+            /* Get rS/rD and rA from faulting opcode */
+            /*
+             * Note: the opcode fields will not be set properly for a
+             * direct store load/store, but nobody cares as nobody
+             * actually uses direct store segments.
+             */
+            env->spr[SPR_DSISR] |= (env->error_code & 0x03FF0000) >> 16;
+            break;
+        default:
+            /* Optional DSISR update was removed from ISA v3.0 */
+            break;
+        }
         break;
     case POWERPC_EXCP_PROGRAM:   /* Program exception*/
         switch (env->error_code & ~0xF) {
-- 
2.37.2

[PATCH 5/6] target/ppc: Add SRR1 prefix indication to interrupt handlers

2023-03-22 Thread Nicholas Piggin

ISA v3.1 introduced prefixed instructions. Among the changes, various
synchronous interrupts report in (H)SRR1 whether they were caused by a
prefixed instruction.
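The check added below can be sketched on its own: a prefixed instruction word has primary opcode 1 in its top six bits, and the indication is SRR1 bit 34 in IBM bit numbering (bit 0 is the most significant bit of the 64-bit register). SKETCH_PPC_BIT is a stand-in for QEMU's PPC_BIT, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit register. */
#define SKETCH_PPC_BIT(n) (1ULL << (63 - (n)))

/* A prefixed instruction word has primary opcode 1 (top six bits). */
static bool sketch_is_prefix(uint32_t insn)
{
    return (insn & 0xfc000000u) == 0x04000000u;
}

/* Apply the ISA v3.1 prefix indication to an SRR1 value. */
static uint64_t sketch_srr1_with_prefix(uint64_t srr1, uint32_t insn)
{
    if (sketch_is_prefix(insn)) {
        srr1 |= SKETCH_PPC_BIT(34);
    }
    return srr1;
}
```

SRR1 bit 34 therefore corresponds to the value 0x20000000 when the register is read as a plain 64-bit integer.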

Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index c8b8eca3b1..2e0321ab69 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1353,12 +1353,26 @@ static bool books_vhyp_handles_hv_excp(PowerPCCPU *cpu)
     return false;
 }
 
+static bool is_prefix_excp(CPUPPCState *env, uint32_t insn)
+{
+    switch (env->excp_model) {
+    case POWERPC_EXCP_970:
+    case POWERPC_EXCP_POWER7:
+    case POWERPC_EXCP_POWER8:
+    case POWERPC_EXCP_POWER9:
+        return false;
+    default: /* POWER10 / ISAv3.1 onward */
+        return ((insn & 0xfc000000) == 0x04000000);
+    }
+}
+
 static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
 {
     CPUState *cs = CPU(cpu);
     CPUPPCState *env = &cpu->env;
     target_ulong msr, new_msr, vector;
     int srr0, srr1, lev = -1;
+    uint32_t insn = 0;
 
     /* new srr1 value excluding must-be-zero bits */
     msr = env->msr & ~0x783f0000ULL;
@@ -1397,6 +1411,29 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
 
     vector |= env->excp_prefix;
 
+    switch (excp) {
+    case POWERPC_EXCP_MCHECK:
+    case POWERPC_EXCP_DSI:
+    case POWERPC_EXCP_DSEG:
+    case POWERPC_EXCP_ALIGN:
+    case POWERPC_EXCP_PROGRAM:
+    case POWERPC_EXCP_FPU:
+    case POWERPC_EXCP_TRACE:
+    case POWERPC_EXCP_HDSI:
+    case POWERPC_EXCP_HV_EMU:
+    case POWERPC_EXCP_VPU:
+    case POWERPC_EXCP_VSXU:
+    case POWERPC_EXCP_FU:
+    case POWERPC_EXCP_HV_FU:
+        insn = ppc_ldl_code(env, env->nip);
+        if (is_prefix_excp(env, insn)) {
+            msr |= PPC_BIT(34);
+        }
+        break;
+    default:
+        break;
+    }
+
     switch (excp) {
     case POWERPC_EXCP_MCHECK:/* Machine check exception  */
         if (!FIELD_EX64(env->msr, MSR, ME)) {
-- 
2.37.2

[PATCH 1/6] target/ppc: Fix width of some 32-bit SPRs

2023-03-22 Thread Nicholas Piggin

Some 32-bit SPRs are incorrectly implemented as 64-bit registers on 64-bit
targets.

This changes VRSAVE, DSISR, HDSISR, DAWRX0, PIDR, LPIDR, DEXCR,
HDEXCR, CTRL, TSCR, MMCRH, and PMC[1-6] to be 32-bit registers.

This only goes by the 32/64 classification in the architecture, it
does not try to implement finer details of SPR implementation (e.g.,
not all bits implemented as simple read/write storage).
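The effect of switching the write callback can be modelled in plain C. This sketch assumes that spr_write_generic32 keeps only the zero-extended low 32 bits of the source GPR on 64-bit targets (as tcg_gen_ext32u_tl does), while spr_write_generic stores the full register; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed effect of spr_write_generic32 on a 64-bit target. */
static uint64_t sketch_spr_write32(uint64_t gpr)
{
    return (uint64_t)(uint32_t)gpr;   /* high word is dropped */
}

/* Effect of spr_write_generic: the full 64-bit GPR is stored. */
static uint64_t sketch_spr_write64(uint64_t gpr)
{
    return gpr;
}
```

With the 64-bit callback, a guest could leave bits in the upper half of a register that the architecture defines as 32 bits wide; the 32-bit callback prevents that.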

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu_init.c| 18 +-
 target/ppc/helper_regs.c |  2 +-
 target/ppc/misc_helper.c |  4 ++--
 target/ppc/power8-pmu.c  |  2 +-
 target/ppc/translate.c   |  2 +-
 5 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 0ce2e3c91d..5aa0b3f0f1 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5085,8 +5085,8 @@ static void register_book3s_altivec_sprs(CPUPPCState *env)
 }
 
     spr_register_kvm(env, SPR_VRSAVE, "VRSAVE",
-                     &spr_read_generic, &spr_write_generic,
-                     &spr_read_generic, &spr_write_generic,
+                     &spr_read_generic, &spr_write_generic32,
+                     &spr_read_generic, &spr_write_generic32,
                      KVM_REG_PPC_VRSAVE, 0x);
 
 }
@@ -5120,7 +5120,7 @@ static void register_book3s_207_dbg_sprs(CPUPPCState *env)
     spr_register_kvm_hv(env, SPR_DAWRX0, "DAWRX0",
                         SPR_NOACCESS, SPR_NOACCESS,
                         SPR_NOACCESS, SPR_NOACCESS,
-                        &spr_read_generic, &spr_write_generic,
+                        &spr_read_generic, &spr_write_generic32,
                         KVM_REG_PPC_DAWRX, 0x);
 spr_register_kvm_hv(env, SPR_CIABR, "CIABR",
 SPR_NOACCESS, SPR_NOACCESS,
@@ -5376,7 +5376,7 @@ static void register_book3s_ids_sprs(CPUPPCState *env)
     spr_register_hv(env, SPR_TSCR, "TSCR",
                     SPR_NOACCESS, SPR_NOACCESS,
                     SPR_NOACCESS, SPR_NOACCESS,
-                    &spr_read_generic, &spr_write_generic,
+                    &spr_read_generic, &spr_write_generic32,
                     0x);
 spr_register_hv(env, SPR_HMER, "HMER",
  SPR_NOACCESS, SPR_NOACCESS,
@@ -5406,7 +5406,7 @@ static void register_book3s_ids_sprs(CPUPPCState *env)
     spr_register_hv(env, SPR_MMCRC, "MMCRC",
                     SPR_NOACCESS, SPR_NOACCESS,
                     SPR_NOACCESS, SPR_NOACCESS,
-                    &spr_read_generic, &spr_write_generic,
+                    &spr_read_generic, &spr_write_generic32,
                     0x);
 spr_register_hv(env, SPR_MMCRH, "MMCRH",
  SPR_NOACCESS, SPR_NOACCESS,
@@ -5441,7 +5441,7 @@ static void register_book3s_ids_sprs(CPUPPCState *env)
     spr_register_hv(env, SPR_HDSISR, "HDSISR",
                     SPR_NOACCESS, SPR_NOACCESS,
                     SPR_NOACCESS, SPR_NOACCESS,
-                    &spr_read_generic, &spr_write_generic,
+                    &spr_read_generic, &spr_write_generic32,
                     0x);
 spr_register_hv(env, SPR_HRMOR, "HRMOR",
  SPR_NOACCESS, SPR_NOACCESS,
@@ -5665,7 +5665,7 @@ static void register_power7_book4_sprs(CPUPPCState *env)
  KVM_REG_PPC_ACOP, 0);
     spr_register_kvm(env, SPR_BOOKS_PID, "PID",
                      SPR_NOACCESS, SPR_NOACCESS,
-                     &spr_read_generic, &spr_write_generic,
+                     &spr_read_generic, &spr_write_generic32,
                      KVM_REG_PPC_PID, 0);
 #endif
 }
@@ -5730,7 +5730,7 @@ static void register_power10_dexcr_sprs(CPUPPCState *env)
 {
     spr_register(env, SPR_DEXCR, "DEXCR",
                  SPR_NOACCESS, SPR_NOACCESS,
-                 &spr_read_generic, &spr_write_generic,
+                 &spr_read_generic, &spr_write_generic32,
                  0);
 
 spr_register(env, SPR_UDEXCR, "DEXCR",
@@ -5741,7 +5741,7 @@ static void register_power10_dexcr_sprs(CPUPPCState *env)
     spr_register_hv(env, SPR_HDEXCR, "HDEXCR",
                     SPR_NOACCESS, SPR_NOACCESS,
                     SPR_NOACCESS, SPR_NOACCESS,
-                    &spr_read_generic, &spr_write_generic,
+                    &spr_read_generic, &spr_write_generic32,
                     0);
 
 spr_register(env, SPR_UHDEXCR, "HDEXCR",
diff --git a/target/ppc/helper_regs.c b/target/ppc/helper_regs.c
index 779e7db513..fb351c303f 100644
--- a/target/ppc/helper_regs.c
+++ b/target/ppc/helper_regs.c
@@ -448,7 +448,7 @@ void register_non_embedded_sprs(CPUPPCState *env)
 /* Exception processing */
     spr_register_kvm(env, SPR_DSISR, "DSISR",
                      SPR_NOACCESS, SPR_NOACCESS,
-                     &spr_read_generic, &spr_write_generic,
+                     &spr_read_generic, &spr_write_generic32,
                      KVM_REG_PPC_DSISR, 0x);
 spr_register_kvm(env, SPR_DAR, "DAR",
  SPR_NOACCESS, SPR_NOACCESS,
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index a9bc1522e2..40ddc5c08c 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -190,13 +190,13 @@ void helper_store_dpdes(CPUPPCState *env, target_ulong

[PATCH 2/6] target/ppc: Better CTRL SPR implementation

2023-03-22 Thread Nicholas Piggin

Bit zero (RUN) of the CTRL register is writable, and its value is
reflected in a field of the register that shows the state of all threads
in the core.

TCG does not implement SMT, so this only requires mirroring that bit into
the first bit of the thread state field.
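In plain C, the TCG sequence that spr_write_CTRL() emits in this patch reduces to the following computation for a single-thread core. This is a sketch with a hypothetical name; the bit positions (RUN in bit 0, first thread-state bit in bit 8) are taken from the extract/shift in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored into CTRL for a GPR write, single-thread core assumed. */
static uint64_t sketch_ctrl_value(uint64_t gpr)
{
    uint64_t run = gpr & 1;     /* tcg_gen_extract_tl(t0, gpr, 0, 1) */

    return (run << 8) | run;    /* tcg_gen_shli_tl + tcg_gen_or_tl */
}
```

All other bits of the written value are discarded, so writing 0xfffe clears the register.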

Signed-off-by: Nicholas Piggin 
---
 target/ppc/translate.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 58fa509057..d699acb3d0 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -413,7 +413,14 @@ void spr_write_generic(DisasContext *ctx, int sprn, int gprn)
 
 void spr_write_CTRL(DisasContext *ctx, int sprn, int gprn)
 {
-    spr_write_generic32(ctx, sprn, gprn);
+    /* This does not implement >1 thread */
+    TCGv t0 = tcg_temp_new();
+    TCGv t1 = tcg_temp_new();
+    tcg_gen_extract_tl(t0, cpu_gpr[gprn], 0, 1); /* Extract RUN field */
+    tcg_gen_shli_tl(t1, t0, 8); /* Duplicate the bit in TS */
+    tcg_gen_or_tl(t1, t1, t0);
+    gen_store_spr(sprn, t1);
+    spr_store_dump_spr(sprn);
 
     /*
      * SPR_CTRL writes must force a new translation block,
-- 
2.37.2

[PATCH 6/6] target/ppc: Implement HEIR SPR

2023-03-22 Thread Nicholas Piggin

The hypervisor emulation assistance interrupt modifies HEIR to
contain the value of the instruction which caused the exception.
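The HEIR value that powerpc_excp_books() composes in this patch can be modelled in a few lines. This sketch only mirrors the diff (faulting word in the low 32 bits, and for a prefixed instruction the following word in the high 32 bits); the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Prefixed instructions have primary opcode 1 in the top six bits. */
static bool sketch_is_prefix(uint32_t insn)
{
    return (insn & 0xfc000000u) == 0x04000000u;
}

/* insn: word at NIP; next_word: word at NIP + 4 (only used if prefixed). */
static uint64_t sketch_heir(uint32_t insn, uint32_t next_word)
{
    uint64_t heir = insn;

    if (sketch_is_prefix(insn)) {
        heir |= (uint64_t)next_word << 32;
    }
    return heir;
}
```

For the non-prefixed case the register simply holds the 32-bit instruction, which is why POWER7 to POWER9 can keep a 32-bit HEIR.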

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h |  1 +
 target/ppc/cpu_init.c| 23 +++
 target/ppc/excp_helper.c | 12 +++-
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 557d736dab..8c4a203ecb 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1653,6 +1653,7 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_HMER  (0x150)
 #define SPR_HMEER (0x151)
 #define SPR_PCR   (0x152)
+#define SPR_HEIR  (0x153)
 #define SPR_BOOKE_LPIDR   (0x152)
 #define SPR_BOOKE_TCR (0x154)
 #define SPR_BOOKE_TLB0PS  (0x158)
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 5aa0b3f0f1..ff73be1812 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -1629,6 +1629,7 @@ static void register_8xx_sprs(CPUPPCState *env)
  * HSRR0   => SPR 314 (Power 2.04 hypv)
  * HSRR1   => SPR 315 (Power 2.04 hypv)
  * LPIDR   => SPR 317 (970)
+ * HEIR    => SPR 339 (Power 2.05 hypv) (64-bit reg from 3.1)
  * EPR     => SPR 702 (Power 2.04 emb)
  * perf    => 768-783 (Power 2.04)
  * perf    => 784-799 (Power 2.04)
@@ -5522,6 +5523,24 @@ static void register_power6_common_sprs(CPUPPCState *env)
  0x);
 }
 
+static void register_HEIR32_spr(CPUPPCState *env)
+{
+    spr_register_hv(env, SPR_HEIR, "HEIR",
+                    SPR_NOACCESS, SPR_NOACCESS,
+                    SPR_NOACCESS, SPR_NOACCESS,
+                    &spr_read_generic, &spr_write_generic32,
+                    0x);
+}
+
+static void register_HEIR64_spr(CPUPPCState *env)
+{
+    spr_register_hv(env, SPR_HEIR, "HEIR",
+                    SPR_NOACCESS, SPR_NOACCESS,
+                    SPR_NOACCESS, SPR_NOACCESS,
+                    &spr_read_generic, &spr_write_generic,
+                    0x);
+}
+
 static void register_power8_tce_address_control_sprs(CPUPPCState *env)
 {
 spr_register_kvm(env, SPR_TAR, "TAR",
@@ -5950,6 +5969,7 @@ static void init_proc_POWER7(CPUPPCState *env)
     register_power5p_ear_sprs(env);
     register_power5p_tb_sprs(env);
     register_power6_common_sprs(env);
+    register_HEIR32_spr(env);
     register_power6_dbg_sprs(env);
     register_power7_book4_sprs(env);
 
@@ -6072,6 +6092,7 @@ static void init_proc_POWER8(CPUPPCState *env)
     register_power5p_ear_sprs(env);
     register_power5p_tb_sprs(env);
     register_power6_common_sprs(env);
+    register_HEIR32_spr(env);
     register_power6_dbg_sprs(env);
     register_power8_tce_address_control_sprs(env);
     register_power8_ids_sprs(env);
@@ -6234,6 +6255,7 @@ static void init_proc_POWER9(CPUPPCState *env)
     register_power5p_ear_sprs(env);
     register_power5p_tb_sprs(env);
     register_power6_common_sprs(env);
+    register_HEIR32_spr(env);
     register_power6_dbg_sprs(env);
     register_power8_tce_address_control_sprs(env);
     register_power8_ids_sprs(env);
@@ -6409,6 +6431,7 @@ static void init_proc_POWER10(CPUPPCState *env)
     register_power5p_ear_sprs(env);
     register_power5p_tb_sprs(env);
     register_power6_common_sprs(env);
+    register_HEIR64_spr(env);
     register_power6_dbg_sprs(env);
     register_power8_tce_address_control_sprs(env);
     register_power8_ids_sprs(env);
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 2e0321ab69..d206903562 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -1614,13 +1614,23 @@ static void powerpc_excp_books(PowerPCCPU *cpu, int excp)
     case POWERPC_EXCP_HDECR: /* Hypervisor decrementer exception */
     case POWERPC_EXCP_HDSI:  /* Hypervisor data storage exception*/
     case POWERPC_EXCP_SDOOR_HV:  /* Hypervisor Doorbell interrupt*/
-    case POWERPC_EXCP_HV_EMU:
     case POWERPC_EXCP_HVIRT: /* Hypervisor virtualization*/
         srr0 = SPR_HSRR0;
         srr1 = SPR_HSRR1;
         new_msr |= (target_ulong)MSR_HVB;
         new_msr |= env->msr & ((target_ulong)1 << MSR_RI);
         break;
+    case POWERPC_EXCP_HV_EMU:
+        env->spr[SPR_HEIR] = insn;
+        if (is_prefix_excp(env, insn)) {
+            uint32_t insn2 = ppc_ldl_code(env, env->nip + 4);
+            env->spr[SPR_HEIR] |= (uint64_t)insn2 << 32;
+        }
+        srr0 = SPR_HSRR0;
+        srr1 = SPR_HSRR1;
+        new_msr |= (target_ulong)MSR_HVB;
+        new_msr |= env->msr & ((target_ulong)1 << MSR_RI);
+        break;
     case POWERPC_EXCP_VPU:   /* Vector unavailable exception */
     case POWERPC_EXCP_VSXU:   /* VSX unavailable exception   */
     case POWERPC_EXCP_FU: /* Facility unavailable exception  */
-- 
2.37.2

Re: [PATCH for-8.1 v4 11/25] target/riscv/cpu.c: set cpu config in set_misa()

2023-03-22 Thread LIU Zhiwei




On 2023/3/23 10:14, LIU Zhiwei wrote:


On 2023/3/23 6:19, Daniel Henrique Barboza wrote:

set_misa() is setting all 'misa' related env states and nothing else.
But other functions, namely riscv_cpu_validate_set_extensions(), use
the config object to do their job.

This creates a need to set the single-letter extensions in the cfg
object to keep both in sync. At the moment this is being done by
register_cpu_props(), forcing every CPU to call this function.

Let's beef up set_misa() and make the function do the sync for us. This
will relieve named CPUs from having to call register_cpu_props(), which
will then be redesigned for a more specialized role next.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 43 ---
  target/riscv/cpu.h |  4 ++--
  2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 36c55abda0..df5c0bda70 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -236,8 +236,40 @@ const char *riscv_cpu_get_trap_name(target_ulong cause, bool async)

    static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
  {
+    RISCVCPU *cpu;
+
  env->misa_mxl_max = env->misa_mxl = mxl;
  env->misa_ext_mask = env->misa_ext = ext;
+
+    /*
+ * ext = 0 will only be a thing during cpu_init() functions
+ * as a way of setting an extension-agnostic CPU. We do
+ * not support clearing misa_ext* and the ext_N flags in
+ * RISCVCPUConfig in regular circumstances.
+ */
+    if (ext == 0) {
+    return;
+    }
+
+    /*
+ * We can't use riscv_cpu_cfg() in this case because it is
+ * a read-only inline and we're going to change the values
+ * of cpu->cfg.
+ */
+    cpu = env_archcpu(env);
+
+    cpu->cfg.ext_i = ext & RVI;
+    cpu->cfg.ext_e = ext & RVE;
+    cpu->cfg.ext_m = ext & RVM;
+    cpu->cfg.ext_a = ext & RVA;
+    cpu->cfg.ext_f = ext & RVF;
+    cpu->cfg.ext_d = ext & RVD;
+    cpu->cfg.ext_v = ext & RVV;
+    cpu->cfg.ext_c = ext & RVC;
+    cpu->cfg.ext_s = ext & RVS;
+    cpu->cfg.ext_u = ext & RVU;
+    cpu->cfg.ext_h = ext & RVH;
+    cpu->cfg.ext_j = ext & RVJ;
  }
    #ifndef CONFIG_USER_ONLY
@@ -340,7 +372,6 @@ static void riscv_any_cpu_init(Object *obj)
  #endif
    env->priv_ver = PRIV_VERSION_LATEST;
-    register_cpu_props(obj);


This patch will break the original logic. We can only have an empty CPU here.


Oops, I mistook it for a general CPU. Just ignore this comment.

Zhiwei




    /* inherited from parent obj via riscv_cpu_init() */
  cpu->cfg.ext_ifencei = true;
@@ -368,7 +399,6 @@ static void rv64_sifive_u_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
  CPURISCVState *env = &cpu->env;
  set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);

-    register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV39);
@@ -387,7 +417,6 @@ static void rv64_sifive_e_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
    set_misa(env, MXL_RV64, RVI | RVM | RVA | RVC | RVU);
-    register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -408,9 +437,6 @@ static void rv64_thead_c906_cpu_init(Object *obj)
  env->priv_ver = PRIV_VERSION_1_11_0;
    cpu->cfg.ext_g = true;
-    cpu->cfg.ext_c = true;
-    cpu->cfg.ext_u = true;
-    cpu->cfg.ext_s = true;


Why only these configurations in particular?

Zhiwei


  cpu->cfg.ext_icsr = true;
  cpu->cfg.ext_zfh = true;
  cpu->cfg.mmu = true;
@@ -472,8 +498,6 @@ static void rv32_sifive_u_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
  CPURISCVState *env = &cpu->env;
  set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);

-
-    register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV32);
@@ -492,7 +516,6 @@ static void rv32_sifive_e_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
    set_misa(env, MXL_RV32, RVI | RVM | RVA | RVC | RVU);
-    register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -510,7 +533,6 @@ static void rv32_ibex_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
    set_misa(env, MXL_RV32, RVI | RVM | RVC | RVU);
-    register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_11_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -529,7 +551,6 @@ static void rv32_imafcu_nommu_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
    set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVC | RVU);
-    register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef

Re: [PATCH for-8.1 v4 11/25] target/riscv/cpu.c: set cpu config in set_misa()

2023-03-22 Thread LIU Zhiwei




On 2023/3/23 6:19, Daniel Henrique Barboza wrote:

set_misa() is setting all 'misa' related env states and nothing else.
But other functions, namely riscv_cpu_validate_set_extensions(), use
the config object to do their job.

This creates a need to set the single-letter extensions in the cfg
object to keep both in sync. At the moment this is being done by
register_cpu_props(), forcing every CPU to call this function.

Let's beef up set_misa() and make the function do the sync for us. This
will relieve named CPUs from having to call register_cpu_props(), which
will then be redesigned for a more specialized role next.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 43 ---
  target/riscv/cpu.h |  4 ++--
  2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 36c55abda0..df5c0bda70 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -236,8 +236,40 @@ const char *riscv_cpu_get_trap_name(target_ulong cause, bool async)
  
  static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)

  {
+RISCVCPU *cpu;
+
  env->misa_mxl_max = env->misa_mxl = mxl;
  env->misa_ext_mask = env->misa_ext = ext;
+
+/*
+ * ext = 0 will only be a thing during cpu_init() functions
+ * as a way of setting an extension-agnostic CPU. We do
+ * not support clearing misa_ext* and the ext_N flags in
+ * RISCVCPUConfig in regular circumstances.
+ */
+if (ext == 0) {
+return;
+}
+
+/*
+ * We can't use riscv_cpu_cfg() in this case because it is
+ * a read-only inline and we're going to change the values
+ * of cpu->cfg.
+ */
+cpu = env_archcpu(env);
+
+cpu->cfg.ext_i = ext & RVI;
+cpu->cfg.ext_e = ext & RVE;
+cpu->cfg.ext_m = ext & RVM;
+cpu->cfg.ext_a = ext & RVA;
+cpu->cfg.ext_f = ext & RVF;
+cpu->cfg.ext_d = ext & RVD;
+cpu->cfg.ext_v = ext & RVV;
+cpu->cfg.ext_c = ext & RVC;
+cpu->cfg.ext_s = ext & RVS;
+cpu->cfg.ext_u = ext & RVU;
+cpu->cfg.ext_h = ext & RVH;
+cpu->cfg.ext_j = ext & RVJ;
  }
  
  #ifndef CONFIG_USER_ONLY

@@ -340,7 +372,6 @@ static void riscv_any_cpu_init(Object *obj)
  #endif
  
  env->priv_ver = PRIV_VERSION_LATEST;

-register_cpu_props(obj);


This patch will break the original logic. We can only have an empty CPU here.

  
  /* inherited from parent obj via riscv_cpu_init() */

  cpu->cfg.ext_ifencei = true;
@@ -368,7 +399,6 @@ static void rv64_sifive_u_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
  CPURISCVState *env = &cpu->env;
  set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
-register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV39);
@@ -387,7 +417,6 @@ static void rv64_sifive_e_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
  
  set_misa(env, MXL_RV64, RVI | RVM | RVA | RVC | RVU);

-register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -408,9 +437,6 @@ static void rv64_thead_c906_cpu_init(Object *obj)
  env->priv_ver = PRIV_VERSION_1_11_0;
  
  cpu->cfg.ext_g = true;

-cpu->cfg.ext_c = true;
-cpu->cfg.ext_u = true;
-cpu->cfg.ext_s = true;


Why only these configurations in particular?

Zhiwei


  cpu->cfg.ext_icsr = true;
  cpu->cfg.ext_zfh = true;
  cpu->cfg.mmu = true;
@@ -472,8 +498,6 @@ static void rv32_sifive_u_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
  CPURISCVState *env = &cpu->env;
  set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
-
-register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV32);
@@ -492,7 +516,6 @@ static void rv32_sifive_e_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
  
  set_misa(env, MXL_RV32, RVI | RVM | RVA | RVC | RVU);

-register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -510,7 +533,6 @@ static void rv32_ibex_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
  
  set_misa(env, MXL_RV32, RVI | RVM | RVC | RVU);

-register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_11_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -529,7 +551,6 @@ static void rv32_imafcu_nommu_cpu_init(Object *obj)
  RISCVCPU *cpu = RISCV_CPU(obj);
  
  set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVC | RVU);

-register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
diff --git a/target/riscv/cpu.h

Re: [PATCH for-8.1 v4 10/25] target/riscv/cpu.c: avoid set_misa() in validate_set_extensions()

2023-03-22 Thread LIU Zhiwei




On 2023/3/23 6:19, Daniel Henrique Barboza wrote:

set_misa() will be tuned up to do more than it's already doing and it
will be redundant to what riscv_cpu_validate_set_extensions() does.

Note that we don't ever change env->misa_mlx

typo.

in this function, so
set_misa() can be replaced by just assigning env->misa_ext and
env->misa_ext_mask to 'ext'.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c7b6e7b84b..36c55abda0 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -949,7 +949,8 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
  
  /*

   * Check consistency between chosen extensions while setting
- * cpu->cfg accordingly, doing a set_misa() in the end.
+ * cpu->cfg accordingly, setting env->misa_ext and
+ * misa_ext_mask in the end.
   */
  static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
  {
@@ -1168,7 +1169,7 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
  ext |= RVJ;
  }
  
-set_misa(env, env->misa_mxl, ext);
+env->misa_ext_mask = env->misa_ext = ext;


Reviewed-by: LIU Zhiwei 

Zhiwei


  }
  
  #ifndef CONFIG_USER_ONLY

Re: [PATCH for-8.1 v4 09/25] target/riscv/cpu.c: remove cfg setup from riscv_cpu_init()

2023-03-22 Thread LIU Zhiwei




On 2023/3/23 6:19, Daniel Henrique Barboza wrote:

We have 4 config settings being done in riscv_cpu_init(): ext_ifencei,
ext_icsr, mmu and pmp. This is also the constructor of the "riscv-cpu"
device, which happens to be the parent device of every RISC-V cpu.

The result is that these 4 configs are being set every time, and every
other CPU must always account for them. CPUs such as sifive_e need to
disable settings they don't support simply because the parent class
happens to enable them.

Moving all configurations from the parent class to each CPU will
centralize the config of each CPU into its own init(), which is clearer
than having to account for whatever happens to be set in the parent
device. These settings are also being set in register_cpu_props() when
no 'misa_ext' is set, so those CPUs need no changes. Named CPUs will
receive in their init() all the cfgs that the parent was setting.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 60 --
  1 file changed, 48 insertions(+), 12 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index fef55d7d79..c7b6e7b84b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -325,7 +325,8 @@ static void set_satp_mode_default_map(RISCVCPU *cpu)
  
  static void riscv_any_cpu_init(Object *obj)

  {
-CPURISCVState *env = &RISCV_CPU(obj)->env;
+RISCVCPU *cpu = RISCV_CPU(obj);
+CPURISCVState *env = &cpu->env;
  #if defined(TARGET_RISCV32)
  set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
  #elif defined(TARGET_RISCV64)
@@ -340,6 +341,12 @@ static void riscv_any_cpu_init(Object *obj)
  
  env->priv_ver = PRIV_VERSION_LATEST;

  register_cpu_props(obj);
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.mmu = true;
+cpu->cfg.pmp = true;
  }
  
  #if defined(TARGET_RISCV64)

@@ -358,13 +365,20 @@ static void rv64_base_cpu_init(Object *obj)
  
  static void rv64_sifive_u_cpu_init(Object *obj)

  {
-CPURISCVState *env = &RISCV_CPU(obj)->env;
+RISCVCPU *cpu = RISCV_CPU(obj);
+CPURISCVState *env = &cpu->env;
  set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
  register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV39);
  #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.mmu = true;
+cpu->cfg.pmp = true;
  }
  
  static void rv64_sifive_e_cpu_init(Object *obj)

@@ -375,10 +389,14 @@ static void rv64_sifive_e_cpu_init(Object *obj)
  set_misa(env, MXL_RV64, RVI | RVM | RVA | RVC | RVU);
  register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
-cpu->cfg.mmu = false;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
  #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.pmp = true;
  }
  
  static void rv64_thead_c906_cpu_init(Object *obj)

@@ -411,6 +429,10 @@ static void rv64_thead_c906_cpu_init(Object *obj)
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_SV39);
  #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.pmp = true;
  }
  
  static void rv128_base_cpu_init(Object *obj)

@@ -447,7 +469,8 @@ static void rv32_base_cpu_init(Object *obj)
  
  static void rv32_sifive_u_cpu_init(Object *obj)

  {
-CPURISCVState *env = &RISCV_CPU(obj)->env;
+RISCVCPU *cpu = RISCV_CPU(obj);
+CPURISCVState *env = &cpu->env;
  set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
  
  register_cpu_props(obj);

@@ -455,6 +478,12 @@ static void rv32_sifive_u_cpu_init(Object *obj)
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV32);
  #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.mmu = true;
+cpu->cfg.pmp = true;
  }
  
  static void rv32_sifive_e_cpu_init(Object *obj)

@@ -465,10 +494,14 @@ static void rv32_sifive_e_cpu_init(Object *obj)
  set_misa(env, MXL_RV32, RVI | RVM | RVA | RVC | RVU);
  register_cpu_props(obj);
  env->priv_ver = PRIV_VERSION_1_10_0;
-cpu->cfg.mmu = false;
  #ifndef CONFIG_USER_ONLY
  set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
  #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.pmp = true;
  }
  
  static void rv32_ibex_cpu_init(Object *obj)

@@ -479,11 +512,15 @@ static void rv32_ibex_cpu_init(Object *obj)
  set_misa(env, MXL_RV32, RVI | RVM | RVC | RVU);
  register_cpu_props(obj);

Re: [PATCH for-8.1 v4 08/25] target/riscv/cpu.c: validate extensions before riscv_timer_init()

2023-03-22 Thread LIU Zhiwei




On 2023/3/23 6:19, Daniel Henrique Barboza wrote:

There is no need to init timers if we're not even sure that our
extensions are valid. Execute riscv_cpu_validate_set_extensions() before
riscv_timer_init().

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 10 --
  1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 7458845fec..fef55d7d79 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1237,12 +1237,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
  return;
  }
  
-#ifndef CONFIG_USER_ONLY

-if (cpu->cfg.ext_sstc) {
-riscv_timer_init(cpu);
-}
-#endif /* CONFIG_USER_ONLY */
-
  riscv_cpu_validate_set_extensions(cpu, &local_err);
  if (local_err != NULL) {
  error_propagate(errp, local_err);
@@ -1250,6 +1244,10 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
  }
  
  #ifndef CONFIG_USER_ONLY

+if (cpu->cfg.ext_sstc) {
+riscv_timer_init(cpu);
+}
+


Reviewed-by: LIU Zhiwei 

Zhiwei


  if (cpu->cfg.pmu_num) {
  if (!riscv_pmu_init(cpu, cpu->cfg.pmu_num) && cpu->cfg.ext_sscofpmf) {
  cpu->pmu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,

Re: [PATCH for-8.1 v4 07/25] target/riscv: move pmp and epmp validations to validate_set_extensions()

2023-03-22 Thread LIU Zhiwei




On 2023/3/23 6:19, Daniel Henrique Barboza wrote:

In the near future, write_misa() will use a variation of what we have
now as riscv_cpu_validate_set_extensions(). The pmp and epmp validation
will be required in write_misa()


I don't know why pmp and epmp should be checked in write_misa().
Since write_misa() can't alter the pmp and epmp settings, the pmp/epmp
check should happen only once, at CPU object initialization time.


Zhiwei


and it's already required here in
riscv_cpu_realize(), so move it to riscv_cpu_validate_set_extensions().

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 19 +--
  1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1a298e5e55..7458845fec 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -916,6 +916,15 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
  Error *local_err = NULL;
  uint32_t ext = 0;
  
+if (cpu->cfg.epmp && !cpu->cfg.pmp) {

+/*
+ * Enhanced PMP should only be available
+ * on harts with PMP support
+ */
+error_setg(errp, "Invalid configuration: EPMP requires PMP support");
+return;
+}
+
  /* Do some ISA extension error checking */
  if (cpu->cfg.ext_g && !(cpu->cfg.ext_i && cpu->cfg.ext_m &&
  cpu->cfg.ext_a && cpu->cfg.ext_f &&
@@ -1228,16 +1237,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
  return;
  }
  
-if (cpu->cfg.epmp && !cpu->cfg.pmp) {

-/*
- * Enhanced PMP should only be available
- * on harts with PMP support
- */
-error_setg(errp, "Invalid configuration: EPMP requires PMP support");
-return;
-}
-
-
  #ifndef CONFIG_USER_ONLY
  if (cpu->cfg.ext_sstc) {
  riscv_timer_init(cpu);

Re: [PATCH for-8.1 v4 06/25] target/riscv/cpu.c: add riscv_cpu_validate_misa_mxl()

2023-03-22 Thread LIU Zhiwei




On 2023/3/23 6:19, Daniel Henrique Barboza wrote:

Let's remove more code that is open coded in riscv_cpu_realize() and put
it into a helper. Let's also add an error message instead of just
asserting out if env->misa_mxl_max != env->misa_mxl.

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 51 ++
  1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 17b301967c..1a298e5e55 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -879,6 +879,33 @@ static void riscv_cpu_disable_priv_spec_isa_exts(RISCVCPU *cpu)
  }
  }
  
+static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)

+{
+RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
+CPUClass *cc = CPU_CLASS(mcc);
+CPURISCVState *env = &cpu->env;
+
+/* Validate that MISA_MXL is set properly. */
+switch (env->misa_mxl_max) {
+#ifdef TARGET_RISCV64
+case MXL_RV64:
+case MXL_RV128:
+cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
+break;
+#endif
+case MXL_RV32:
+cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
+break;
+default:
+g_assert_not_reached();
+}
+
+if (env->misa_mxl_max != env->misa_mxl) {
+error_setg(errp, "misa_mxl_max must be equal to misa_mxl");
+return;
+}
+}
+
  /*
   * Check consistency between chosen extensions while setting
   * cpu->cfg accordingly, doing a set_misa() in the end.
@@ -1180,9 +1207,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
  {
  CPUState *cs = CPU(dev);
  RISCVCPU *cpu = RISCV_CPU(dev);
-CPURISCVState *env = &cpu->env;
  RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
-CPUClass *cc = CPU_CLASS(mcc);
  Error *local_err = NULL;
  
  cpu_exec_realizefn(cs, &local_err);

@@ -1197,6 +1222,12 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
  return;
  }
  
+riscv_cpu_validate_misa_mxl(cpu, &local_err);

+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+
  if (cpu->cfg.epmp && !cpu->cfg.pmp) {
  /*
   * Enhanced PMP should only be available
@@ -1213,22 +1244,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
  }
  #endif /* CONFIG_USER_ONLY */
  
-/* Validate that MISA_MXL is set properly. */

-switch (env->misa_mxl_max) {
-#ifdef TARGET_RISCV64
-case MXL_RV64:
-case MXL_RV128:
-cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
-break;
-#endif
-case MXL_RV32:
-cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
-break;
-default:
-g_assert_not_reached();
-}
-assert(env->misa_mxl_max == env->misa_mxl);
-


Reviewed-by: LIU Zhiwei 

Zhiwei


  riscv_cpu_validate_set_extensions(cpu, &local_err);
  if (local_err != NULL) {
  error_propagate(errp, local_err);

Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM

2023-03-22 Thread Michael Roth

On Tue, Feb 21, 2023 at 08:11:35PM +0800, Chao Peng wrote:
> > Hi Sean,
> > 
> > We've rebased the SEV+SNP support onto your updated UPM base support
> > tree and things seem to be working okay, but we needed some fixups on
> > top of the base support to get things working, along with 1 workaround
> > for an issue that hasn't been root-caused yet:
> > 
> >   https://github.com/mdroth/linux/commits/upmv10b-host-snp-v8-wip
> > 
> >   *stash (upm_base_support): mm: restrictedmem: Kirill's pinning implementation
> >   *workaround (use_base_support): mm: restrictedmem: loosen exclusivity check
> 
> What I'm seeing is that Slot#3 gets added first and then deleted. When it
> gets added, Slot#0 already has the same range bound to restrictedmem, which
> triggers the exclusivity check. Catching this is exactly what the check is for.

With the following change in QEMU, we no longer trigger this check:

  diff --git a/hw/pci-host/q35.c b/hw/pci-host/q35.c
  index 20da121374..849b5de469 100644
  --- a/hw/pci-host/q35.c
  +++ b/hw/pci-host/q35.c
  @@ -588,9 +588,9 @@ static void mch_realize(PCIDevice *d, Error **errp)
       memory_region_init_alias(&mch->open_high_smram, OBJECT(mch), "smram-open-high",
                                mch->ram_memory, MCH_HOST_BRIDGE_SMRAM_C_BASE,
                                MCH_HOST_BRIDGE_SMRAM_C_SIZE);
  +    memory_region_set_enabled(&mch->open_high_smram, false);
       memory_region_add_subregion_overlap(mch->system_memory, 0xfeda0000,
                                           &mch->open_high_smram, 1);
  -    memory_region_set_enabled(&mch->open_high_smram, false);

I'm not sure if QEMU is actually doing something wrong here though or if
this check is putting tighter restrictions on userspace than what was
expected before. Will look into it more.

> 
> >   *fixup (upm_base_support): KVM: use inclusive ranges for restrictedmem binding/unbinding
> >   *fixup (upm_base_support): mm: restrictedmem: use inclusive ranges for issuing invalidations
> 
> As many kernel APIs treat 'end' as exclusive, I would rather keep using
> an exclusive 'end' for these APIs (restrictedmem_bind/restrictedmem_unbind
> and notifier callbacks) but fix it internally in restrictedmem, e.g. in
> all the places where the xarray API needs a 'last'/'max', we use 'end - 1'.
> See below for the change.

Yes I did feel like I was fighting the kernel a bit on that; your
suggestion seems like it would be a better fit.

> 
> >   *fixup (upm_base_support): KVM: fix restrictedmem GFN range calculations
> 
> Subtracting slot->restrictedmem.index for start/end in
> restrictedmem_get_gfn_range() is the correct fix.
> 
> >   *fixup (upm_base_support): KVM: selftests: CoCo compilation fixes
> > 
> > We plan to post an updated RFC for v8 soon, but also wanted to share
> > the staging tree in case you end up looking at the UPM integration aspects
> > before then.
> > 
> > -Mike
> 
> This is the restrictedmem fix to solve 'end' being stored and checked in xarray:

Looks good.

Thanks!

-Mike

> 
> --- a/mm/restrictedmem.c
> +++ b/mm/restrictedmem.c
> @@ -46,12 +46,12 @@ static long restrictedmem_punch_hole(struct restrictedmem *rm, int mode,
>  */
> down_read(&rm->lock);
>  
> -   xa_for_each_range(&rm->bindings, index, notifier, start, end)
> +   xa_for_each_range(&rm->bindings, index, notifier, start, end - 1)
> notifier->ops->invalidate_start(notifier, start, end);
>  
> ret = memfd->f_op->fallocate(memfd, mode, offset, len);
>  
> -   xa_for_each_range(&rm->bindings, index, notifier, start, end)
> +   xa_for_each_range(&rm->bindings, index, notifier, start, end - 1)
> notifier->ops->invalidate_end(notifier, start, end);
>  
> up_read(&rm->lock);
> @@ -224,7 +224,7 @@ static int restricted_error_remove_page(struct address_space *mapping,
> }
> spin_unlock(>i_lock);
>  
> -   xa_for_each_range(&rm->bindings, index, notifier, start, end)
> +   xa_for_each_range(&rm->bindings, index, notifier, start, end - 1)
> notifier->ops->error(notifier, start, end);
> break;
> }
> @@ -301,11 +301,12 @@ int restrictedmem_bind(struct file *file, pgoff_t start, pgoff_t end,
> if (exclusive != rm->exclusive)
> goto out_unlock;
>  
> -   if (exclusive && xa_find(&rm->bindings, &start, end, XA_PRESENT))
> +   if (exclusive &&
> +   xa_find(&rm->bindings, &start, end - 1, XA_PRESENT))
> goto out_unlock;
> }
>  
> -   xa_store_range(&rm->bindings, start, end, notifier, GFP_KERNEL);
> +   xa_store_range(&rm->bindings, start, end - 1, notifier, GFP_KERNEL);
> rm->exclusive = exclusive;
> ret = 0;
>  out_unlock:
> @@ -320,7 +321,7 @@ void restrictedmem_unbind(struct file *file, pgoff_t start, pgoff_t end,
> struct restrictedmem *rm = file->f_mapping->private_data;
>  
>

Re: [PATCH v3] target/riscv: reduce overhead of MSTATUS_SUM change

2023-03-22 Thread Wu, Fei

On 3/23/2023 8:38 AM, Wu, Fei wrote:
> On 3/22/2023 9:19 PM, Richard Henderson wrote:
>> On 3/22/23 05:12, Fei Wu wrote:
>>> Kernel needs to access user mode memory e.g. during syscalls, the window
>>> is usually opened up for a very limited time through MSTATUS.SUM, the
>>> overhead is too much if tlb_flush() gets called for every SUM change.
>>>
>>> This patch creates a separate MMU index for S+SUM, so that it's not
>>> necessary to flush tlb anymore when SUM changes. This is similar to how
>>> ARM handles Privileged Access Never (PAN).
>>>
>>> Result of 'pipe 10' from unixbench boosts from 223656 to 1705006. Many
>>> other syscalls benefit a lot from this too.
>>>
>>> Signed-off-by: Fei Wu 
>>> ---
>>>   target/riscv/cpu-param.h  |  2 +-
>>>   target/riscv/cpu.h    |  2 +-
>>>   target/riscv/cpu_bits.h   |  1 +
>>>   target/riscv/cpu_helper.c | 11 +++
>>>   target/riscv/csr.c    |  2 +-
>>>   5 files changed, 15 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/target/riscv/cpu-param.h b/target/riscv/cpu-param.h
>>> index ebaf26d26d..9e21b943f9 100644
>>> --- a/target/riscv/cpu-param.h
>>> +++ b/target/riscv/cpu-param.h
>>> @@ -27,6 +27,6 @@
>>>    *  - S mode HLV/HLVX/HSV 0b101
>>>    *  - M mode HLV/HLVX/HSV 0b111
>>>    */
>>> -#define NB_MMU_MODES 8
>>> +#define NB_MMU_MODES 16
>>
>> This line no longer exists on master.
>> The comment above should be updated, and perhaps moved.
>>
>>>   #define TB_FLAGS_PRIV_MMU_MASK    3
>>> -#define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
>>> +#define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 3)
>>
>> You can't do this, as you're now overlapping
>>
> As you mention below, HYP_ACCESS_MASK is set directly by the hyp
> instruction translation, so there is no overlap if it's not part of
> TB_FLAGS.
> 
>> FIELD(TB_FLAGS, LMUL, 3, 3)
>>
>> You'd need to shift all other fields up to do this.
>> There is room, to be sure.
>>
>> Or you could reuse MMU mode number 2.  For that you'd need to separate
>> DisasContext.mem_idx from priv.  Which should probably be done anyway,
>> because tests such as
>>
> Yes, reusing number 2 looks good. I tried this v3 patch again with
> different MMUIdx_S_SUM numbers: below 8, only 5 works; with any other
> number, the guest prints no kernel messages after the OpenSBI output. I
> need to find out why.
> 
In get_physical_address():
int mode = mmu_idx & TB_FLAGS_PRIV_MMU_MASK;

We do need to separate priv from idx.

Thanks,
Fei.

>> insn_trans/trans_privileged.c.inc:    if
>> (semihosting_enabled(ctx->mem_idx < PRV_S) &&
>>
>> are already borderline wrong.
>> Yes, it's better not to compare idx to priv.
> 
>> I suggest
>>
>> - #define TB_FLAGS_PRIV_MMU_MASK    3
>> - #define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
>>
>> HYP_ACCESS_MASK never needed to be part of TB_FLAGS; it is only set
>> directly by the hyp access instruction translation.  Drop the PRIV mask
>> and represent that directly:
>>
>> - FIELD(TB_FLAGS, MEM_IDX, 0, 3)
>> + FIELD(TB_FLAGS, PRIV, 0, 2)
>> + FIELD(TB_FLAGS, SUM, 2, 1)
>>
>> Let SUM occupy the released bit.
>>
>> In internals.h,
>>
>> /*
>>  * The current MMU Modes are:
>>  *  - U 0b000
>>  *  - S 0b001
>>  *  - S+SUM 0b010
>>  *  - M 0b011
>>  *  - HLV/HLVX/HSV adds 0b100
>>  */
>> #define MMUIdx_U    0
>> #define MMUIdx_S    1
>> #define MMUIdx_S_SUM    2
>> #define MMUIdx_M    3
>> #define MMU_HYP_ACCESS_BIT  (1 << 2)
>>
>>
>> In riscv_tr_init_disas_context:
>>
>>     ctx->priv = FIELD_EX32(tb_flags, TB_FLAGS, PRIV);
>>     ctx->mmu_idx = ctx->priv;
>>     if (ctx->mmu_idx == PRV_S && FIELD_EX32(tb_flags, TB_FLAGS, SUM)) {
>>     ctx->mmu_idx = MMUIdx_S_SUM;
>>     }
>>
> There are also MSTATUS_MPRV and MSTATUS_MPP to consider, so priv+sum cannot
> represent the full state. Perhaps we can just add an extra 'priv' field at
> the end of TB_FLAGS?
> 
> Thanks,
> Fei.
> 
>> and similarly in riscv_cpu_mmu_index.
>>
>> Fix all uses of ctx->mmu_idx that are not specifically for memory
>> operations.
>>
>>
>> r~
>

Re: [PATCH 3/3] Add support for TPM devices over I2C bus

2023-03-22 Thread Ninad Palsule

On 3/22/23 8:04 AM, Stefan Berger wrote:
> On 3/22/23 07:50, Stefan Berger wrote:
>> On 3/22/23 07:28, Ninad Palsule wrote:
>>> On 3/21/23 8:30 PM, Stefan Berger wrote:
>>>> I think there should be a tpm_tis_set_data_buffer function that you
>>>> can call rather than transferring the data byte-by-byte.
>>>>
>>>> Thanks for the series!
>>>>
>>>>    Stefan
>>>
>>> I thought about it, but the FIFO case performs multiple operations,
>>> so I did not want to change it. Currently there is no function to
>>> set the data buffer in the common code.
>>
>> It may not be correct to transfer it in one go, either. I just
>> printed the I2C specs and I am going to look at them now.
>> When one writes TPM command data to the TIS, the STS register has its
>> TPM_TIS_STS_VALID bit set and its TPM_TIS_STS_EXPECT bit reset once the
>> command is complete. This would imply that you should not have a
>> holding area for the command bytes but pass them on to the TIS
>> immediately to get the effect of the STS register...
>
> Regarding the registers defined for the I2C: you can pass the data
> on to the TIS, but you should mask out input flags that are not defined
> for I2C, and if the return value has flags not defined for I2C you
> should mask those out as well. This applies to the TPM_INT_ENABLE
> and TPM_STS registers on read and write, and to TPM_INT_CAPABILITY on
> read. You should also implement support for
> TPM_I2C_INTERFACE_CAPABILITY on the I2C layer and return sensible
> values for the defined bits. The TPM_I2C_DEVICE_ADDRESS register
> should probably be handled assuming fixed-address support only.

Good catch.

- Added capability conversion for TPM_I2C_INTERFACE_CAPABILITY.

- Added clearing of bits in the TPM_STS register.

- Added a check to reject the TPM_I2C_DEVICE_ADDRESS register.

- No changes are required for TPM_INT_ENABLE and TPM_INT_CAPABILITY, as
  they have the same bits in TPM TIS and TPM I2C.

> Ideally there would be a test case similar to this one here
> https://github.com/qemu/qemu/blob/master/tests/qtest/tpm-tis-util.c .
> However, I am not sure how easy it is to talk to I2C without a driver
> for it.

Ok, thanks.

>   Stefan

Thanks for the review!

Ninad Palsule

Re: [PATCH v10 9/9] KVM: Enable and expose KVM_MEM_PRIVATE

2023-03-22 Thread Isaku Yamahata

On Wed, Mar 08, 2023 at 03:40:26PM +0800,
Chao Peng  wrote:

> On Wed, Mar 08, 2023 at 12:13:24AM +, Ackerley Tng wrote:
> > Chao Peng  writes:
> > 
> > > On Sat, Jan 14, 2023 at 12:01:01AM +, Sean Christopherson wrote:
> > > > On Fri, Dec 02, 2022, Chao Peng wrote:
> > > ...
> > > > Strongly prefer to use similar logic to existing code that detects wraps:
> > > >
> > > > mem->restricted_offset + mem->memory_size < mem->restricted_offset
> > > >
> > > > This is also where I'd like to add the "gfn is aligned to offset" check, though
> > > > my brain is too fried to figure that out right now.
> > 
> > > Used count_trailing_zeros() for this TODO, unsure we have other better
> > > approach.
> > 
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index afc8c26fa652..fd34c5f7cd2f 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -56,6 +56,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > 
> > >   #include "coalesced_mmio.h"
> > >   #include "async_pf.h"
> > > @@ -2087,6 +2088,19 @@ static bool kvm_check_memslot_overlap(struct
> > > kvm_memslots *slots, int id,
> > >   return false;
> > >   }
> > 
> > > +/*
> > > + * Return true when ALIGNMENT(offset) >= ALIGNMENT(gpa).
> > > + */
> > > +static bool kvm_check_rmem_offset_alignment(u64 offset, u64 gpa)
> > > +{
> > > + if (!offset)
> > > + return true;
> > > + if (!gpa)
> > > + return false;
> > > +
> > > + return !!(count_trailing_zeros(offset) >= count_trailing_zeros(gpa));

This check doesn't work as expected. For example, with offset = 2GB and
gpa = 4GB, the check fails.
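The failing case can be reproduced with a small userspace sketch of both checks. It assumes GCC/Clang's `__builtin_ctzll()` as a stand-in for the kernel's `count_trailing_zeros()` and hard-codes the x86 4K/2M/1G page shifts; the function names are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the x86 page shifts (4K, 2M, 1G). */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_SHIFT_2M   21
#define SKETCH_SHIFT_1G   30

/* count_trailing_zeros() stand-in; undefined for 0, hence the assert. */
static int ctz64(uint64_t v)
{
	assert(v != 0);
	return __builtin_ctzll(v);
}

/* Original rule: offset must be at least as aligned as gpa. */
static bool old_offset_check(uint64_t offset, uint64_t gpa)
{
	if (!offset)
		return true;
	if (!gpa)
		return false;
	return ctz64(offset) >= ctz64(gpa);
}

/* Mirrors the x86 kvm_arch_required_alignment() proposed in this mail. */
static int required_alignment(uint64_t gpa)
{
	int zeros = ctz64(gpa);

	if (zeros >= SKETCH_SHIFT_1G)
		return SKETCH_SHIFT_1G;
	if (zeros >= SKETCH_SHIFT_2M)
		return SKETCH_SHIFT_2M;
	return SKETCH_PAGE_SHIFT;
}

/* Relaxed rule: offset only needs the largest page size gpa can map. */
static bool new_offset_check(uint64_t offset, uint64_t gpa)
{
	if (!offset)
		return true;
	if (!gpa)
		return false;
	return ctz64(offset) >= required_alignment(gpa);
}
```

With offset = 2GB and gpa = 4GB, the old rule compares 31 trailing zeros against 32 and rejects the slot, while the relaxed rule only requires 30 (1GB) and accepts it.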
I came up with the following.

From ec87e25082f0497431b732702fae82c6a05071bf Mon Sep 17 00:00:00 2001
Message-Id: 

From: Isaku Yamahata 
Date: Wed, 22 Mar 2023 15:32:56 -0700
Subject: [PATCH] KVM: Relax alignment check for restricted mem

kvm_check_rmem_offset_alignment() only compares the offset alignment
with the GPA alignment.  However, the alignment actually required for
the offset depends on the architecture.  On x86, it can be 1G, 2M or 4K,
so even if the GPA is aligned to more than 1G, only 1G alignment is
required for the offset.

Without this patch, creating a memory slot with gpa=4G and offset=2G
fails.

Fixes: edc8814b2c77 ("KVM: Require gfn be aligned with restricted offset")
Signed-off-by: Isaku Yamahata 
---
 arch/x86/include/asm/kvm_host.h | 15 +++
 virt/kvm/kvm_main.c |  9 -
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 88e11dd3afde..03af44650f24 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -143,6 +144,20 @@
 #define KVM_HPAGE_MASK(x)  (~(KVM_HPAGE_SIZE(x) - 1))
 #define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE)
 
+#define kvm_arch_required_alignment kvm_arch_required_alignment
+static inline int kvm_arch_required_alignment(u64 gpa)
+{
+	int zeros = count_trailing_zeros(gpa);
+
+	WARN_ON_ONCE(!PAGE_ALIGNED(gpa));
+	if (zeros >= KVM_HPAGE_SHIFT(PG_LEVEL_1G))
+		return KVM_HPAGE_SHIFT(PG_LEVEL_1G);
+	else if (zeros >= KVM_HPAGE_SHIFT(PG_LEVEL_2M))
+		return KVM_HPAGE_SHIFT(PG_LEVEL_2M);
+
+	return PAGE_SHIFT;
+}
+
 #define KVM_MEMSLOT_PAGES_TO_MMU_PAGES_RATIO 50
 #define KVM_MIN_ALLOC_MMU_PAGES 64UL
 #define KVM_MMU_HASH_SHIFT 12
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c9c4eef457b0..f4ff96171d24 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2113,6 +2113,13 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
 	return false;
 }
 
+#ifndef kvm_arch_required_alignment
+__weak int kvm_arch_required_alignment(u64 gpa)
+{
+	return PAGE_SHIFT;
+}
+#endif
+
 /*
  * Return true when ALIGNMENT(offset) >= ALIGNMENT(gpa).
  */
@@ -2123,7 +2130,7 @@ static bool kvm_check_rmem_offset_alignment(u64 offset, u64 gpa)
 	if (!gpa)
 		return false;
 
-	return !!(count_trailing_zeros(offset) >= count_trailing_zeros(gpa));
+	return !!(count_trailing_zeros(offset) >= kvm_arch_required_alignment(gpa));
 }
 
 /*
-- 
2.25.1



-- 
Isaku Yamahata

Re: [PATCH v3] target/riscv: reduce overhead of MSTATUS_SUM change

2023-03-22 Thread Wu, Fei

On 3/22/2023 9:19 PM, Richard Henderson wrote:
> On 3/22/23 05:12, Fei Wu wrote:
>> Kernel needs to access user mode memory e.g. during syscalls, the window
>> is usually opened up for a very limited time through MSTATUS.SUM, the
>> overhead is too much if tlb_flush() gets called for every SUM change.
>>
>> This patch creates a separate MMU index for S+SUM, so that it's not
>> necessary to flush tlb anymore when SUM changes. This is similar to how
>> ARM handles Privileged Access Never (PAN).
>>
>> Result of 'pipe 10' from unixbench boosts from 223656 to 1705006. Many
>> other syscalls benefit a lot from this too.
>>
>> Signed-off-by: Fei Wu 
>> ---
>>   target/riscv/cpu-param.h  |  2 +-
>>   target/riscv/cpu.h    |  2 +-
>>   target/riscv/cpu_bits.h   |  1 +
>>   target/riscv/cpu_helper.c | 11 +++
>>   target/riscv/csr.c    |  2 +-
>>   5 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/riscv/cpu-param.h b/target/riscv/cpu-param.h
>> index ebaf26d26d..9e21b943f9 100644
>> --- a/target/riscv/cpu-param.h
>> +++ b/target/riscv/cpu-param.h
>> @@ -27,6 +27,6 @@
>>    *  - S mode HLV/HLVX/HSV 0b101
>>    *  - M mode HLV/HLVX/HSV 0b111
>>    */
>> -#define NB_MMU_MODES 8
>> +#define NB_MMU_MODES 16
> 
> This line no longer exists on master.
> The comment above should be updated, and perhaps moved.
> 
>>   #define TB_FLAGS_PRIV_MMU_MASK    3
>> -#define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
>> +#define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 3)
> 
> You can't do this, as you're now overlapping
> 
As you mention below, HYP_ACCESS_MASK is set directly by the hyp access
instruction translation, so there is no overlap as long as it is not
part of TB_FLAGS.

> FIELD(TB_FLAGS, LMUL, 3, 3)
> 
> You'd need to shift all other fields up to do this.
> There is room, to be sure.
> 
> Or you could reuse MMU mode number 2.  For that you'd need to separate
> DisasContext.mem_idx from priv.  Which should probably be done anyway,
> because tests such as
> 
Yes, it looks good to reuse number 2. I tried this v3 patch again with a
different MMUIdx_S_SUM number: below 8, only 5 works. With any other
number the guest prints no kernel messages after the OpenSBI output. I
need to find out why.

> insn_trans/trans_privileged.c.inc:    if (semihosting_enabled(ctx->mem_idx < PRV_S) &&
>
> are already borderline wrong.

Yes, it's better not to compare idx to priv.

> I suggest
> 
> - #define TB_FLAGS_PRIV_MMU_MASK    3
> - #define TB_FLAGS_PRIV_HYP_ACCESS_MASK   (1 << 2)
> 
> HYP_ACCESS_MASK never needed to be part of TB_FLAGS; it is only set
> directly by the hyp access instruction translation.  Drop the PRIV mask
> and represent that directly:
> 
> - FIELD(TB_FLAGS, MEM_IDX, 0, 3)
> + FIELD(TB_FLAGS, PRIV, 0, 2)
> + FIELD(TB_FLAGS, SUM, 2, 1)
> 
> Let SUM occupy the released bit.
> 
> In internals.h,
> 
> /*
>  * The current MMU Modes are:
>  *  - U 0b000
>  *  - S 0b001
>  *  - S+SUM 0b010
>  *  - M 0b011
>  *  - HLV/HLVX/HSV adds 0b100
>  */
> #define MMUIdx_U    0
> #define MMUIdx_S    1
> #define MMUIdx_S_SUM    2
> #define MMUIdx_M    3
> #define MMU_HYP_ACCESS_BIT  (1 << 2)
> 
> 
> In riscv_tr_init_disas_context:
> 
>     ctx->priv = FIELD_EX32(tb_flags, TB_FLAGS, PRIV);
>     ctx->mmu_idx = ctx->priv;
>     if (ctx->mmu_idx == PRV_S && FIELD_EX32(tb_flags, TB_FLAGS, SUM)) {
>         ctx->mmu_idx = MMUIdx_S_SUM;
>     }
> 
MSTATUS_MPRV and MSTATUS_MPP also affect the privilege used for memory
accesses, so priv+sum cannot represent all of the state. Perhaps we can
just add an extra 'priv' field at the end of TB_FLAGS?

Thanks,
Fei.

> and similarly in riscv_cpu_mmu_index.
> 
> Fix all uses of ctx->mmu_idx that are not specifically for memory
> operations.
> 
> 
> r~
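The TB_FLAGS layout suggested in the review can be modelled in plain C. This sketch uses explicit shifts and masks in place of QEMU's `FIELD()`/`FIELD_EX32()` macros from `hw/registerfields.h`, and covers only the PRIV and SUM fields; it deliberately ignores the MPRV/MPP concern raised in the reply and the hypervisor access bit. `tb_flags_pack()` and `mmu_idx_from_tb_flags()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout from the suggestion: PRIV in bits [1:0], SUM in bit 2. */
#define TB_PRIV_SHIFT 0
#define TB_PRIV_MASK  0x3u
#define TB_SUM_SHIFT  2
#define TB_SUM_MASK   0x1u

enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };
enum { MMUIdx_U = 0, MMUIdx_S = 1, MMUIdx_S_SUM = 2, MMUIdx_M = 3 };

/* Build the two fields the way cpu_get_tb_cpu_state() might. */
static uint32_t tb_flags_pack(uint32_t priv, uint32_t sum)
{
    assert(priv <= TB_PRIV_MASK && sum <= TB_SUM_MASK);
    return (priv << TB_PRIV_SHIFT) | (sum << TB_SUM_SHIFT);
}

/* Mirrors the suggested riscv_tr_init_disas_context() logic. */
static int mmu_idx_from_tb_flags(uint32_t tb_flags, int *priv_out)
{
    int priv = (int)((tb_flags >> TB_PRIV_SHIFT) & TB_PRIV_MASK);
    int mmu_idx = priv;

    /* SUM only matters in S-mode; it selects the dedicated index 2. */
    if (mmu_idx == PRV_S && ((tb_flags >> TB_SUM_SHIFT) & TB_SUM_MASK)) {
        mmu_idx = MMUIdx_S_SUM;
    }
    *priv_out = priv;
    return mmu_idx;
}
```

Keeping `priv` as a separate output is the point of the suggestion: privilege checks read `priv`, and only memory operations use `mmu_idx`.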

[PATCH v6 3/3] qga: test: Add tests for `merged` flag

2023-03-22 Thread Daniel Xu

This commit adds a test to ensure `merged` functions as expected.
We also add a negative test to ensure we haven't regressed previous
functionality.

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Daniel Xu 
---
 tests/unit/test-qga.c | 158 +-
 1 file changed, 141 insertions(+), 17 deletions(-)

diff --git a/tests/unit/test-qga.c b/tests/unit/test-qga.c
index b4e0a14573..360b4cab23 100644
--- a/tests/unit/test-qga.c
+++ b/tests/unit/test-qga.c
@@ -755,6 +755,31 @@ static void test_qga_fsfreeze_status(gconstpointer fix)
     g_assert_cmpstr(status, ==, "thawed");
 }
 
+static QDict *wait_for_guest_exec_completion(int fd, int64_t pid)
+{
+    QDict *ret = NULL;
+    int64_t now;
+    bool exited;
+    QDict *val;
+
+    now = g_get_monotonic_time();
+    do {
+        ret = qmp_fd(fd,
+                     "{'execute': 'guest-exec-status',"
+                     " 'arguments': { 'pid': %" PRId64 " } }", pid);
+        g_assert_nonnull(ret);
+        val = qdict_get_qdict(ret, "return");
+        exited = qdict_get_bool(val, "exited");
+        if (!exited) {
+            qobject_unref(ret);
+        }
+    } while (!exited &&
+             g_get_monotonic_time() < now + 5 * G_TIME_SPAN_SECOND);
+    g_assert(exited);
+
+    return ret;
+}
+
 static void test_qga_guest_exec(gconstpointer fix)
 {
     const TestFixture *fixture = fix;
@@ -762,9 +787,8 @@ static void test_qga_guest_exec(gconstpointer fix)
     QDict *val;
     const gchar *out;
     g_autofree guchar *decoded = NULL;
-    int64_t pid, now, exitcode;
+    int64_t pid, exitcode;
     gsize len;
-    bool exited;
 
     /* exec 'echo foo bar' */
     ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec', 'arguments': {"
@@ -777,23 +801,10 @@ static void test_qga_guest_exec(gconstpointer fix)
     g_assert_cmpint(pid, >, 0);
     qobject_unref(ret);
 
-    /* wait for completion */
-    now = g_get_monotonic_time();
-    do {
-        ret = qmp_fd(fixture->fd,
-                     "{'execute': 'guest-exec-status',"
-                     " 'arguments': { 'pid': %" PRId64 " } }", pid);
-        g_assert_nonnull(ret);
-        val = qdict_get_qdict(ret, "return");
-        exited = qdict_get_bool(val, "exited");
-        if (!exited) {
-            qobject_unref(ret);
-        }
-    } while (!exited &&
-             g_get_monotonic_time() < now + 5 * G_TIME_SPAN_SECOND);
-    g_assert(exited);
+    ret = wait_for_guest_exec_completion(fixture->fd, pid);
 
     /* check stdout */
+    val = qdict_get_qdict(ret, "return");
     exitcode = qdict_get_int(val, "exitcode");
     g_assert_cmpint(exitcode, ==, 0);
     out = qdict_get_str(val, "out-data");
@@ -802,6 +813,115 @@ static void test_qga_guest_exec(gconstpointer fix)
     g_assert_cmpstr((char *)decoded, ==, "\" test_str \"");
 }
 
+#if defined(G_OS_WIN32)
+static void test_qga_guest_exec_separated(gconstpointer fix)
+{
+}
+static void test_qga_guest_exec_merged(gconstpointer fix)
+{
+    const TestFixture *fixture = fix;
+    g_autoptr(QDict) ret = NULL;
+    QDict *val;
+    const gchar *class, *desc;
+    g_autofree guchar *decoded = NULL;
+
+    /* 'merged' is not available on Windows, so guest-exec must fail */
+    ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec', 'arguments': {"
+                 " 'path': 'echo',"
+                 " 'arg': [ 'execution never reaches here' ],"
+                 " 'capture-output': 'merged' } }");
+
+    g_assert_nonnull(ret);
+    val = qdict_get_qdict(ret, "error");
+    g_assert_nonnull(val);
+    class = qdict_get_str(val, "class");
+    desc = qdict_get_str(val, "desc");
+    g_assert_cmpstr(class, ==, "GenericError");
+    g_assert_cmpint(strlen(desc), >, 0);
+}
+#else
+static void test_qga_guest_exec_separated(gconstpointer fix)
+{
+    const TestFixture *fixture = fix;
+    g_autoptr(QDict) ret = NULL;
+    QDict *val;
+    const gchar *out, *err;
+    g_autofree guchar *out_decoded = NULL;
+    g_autofree guchar *err_decoded = NULL;
+    int64_t pid, exitcode;
+    gsize len;
+
+    /* exec a bash loop that alternates between stdout and stderr */
+    ret = qmp_fd(fixture->fd, "{'execute': 'guest-exec', 'arguments': {"
+                 " 'path': '/bin/bash',"
+                 " 'arg': [ '-c', 'for i in $(seq 4); do if (( $i %% 2 )); then echo stdout; else echo stderr 1>&2; fi; done;' ],"
+                 " 'capture-output': 'separated' } }");
+    g_assert_nonnull(ret);
+    qmp_assert_no_error(ret);
+    val = qdict_get_qdict(ret, "return");
+    pid = qdict_get_int(val, "pid");
+    g_assert_cmpint(pid, >, 0);
+    qobject_unref(ret);
+
+    ret = wait_for_guest_exec_completion(fixture->fd, pid);
+
+    val = qdict_get_qdict(ret, "return");
+    exitcode = qdict_get_int(val, "exitcode");
+    g_assert_cmpint(exitcode, ==, 0);
+
+    /* check stdout */
+    out = qdict_get_str(val, "out-data");
+    out_decoded = g_base64_decode(out, &len);
+    g_assert_cmpint(len, ==, 14);
+    g_assert_cmpstr((char *)out_decoded, ==, "stdout\nstdout\n");
+
+    /* check stderr */
+    err =

[PATCH v6 2/3] qga: Add `merged` variant to GuestExecCaptureOutputMode

2023-03-22 Thread Daniel Xu

Currently, any captured output (via `capture-output`) is segregated into
separate GuestExecStatus fields (`out-data` and `err-data`). This means
that downstream consumers have no way to reassemble the captured data
back into the original stream.

This is relevant for chatty and semi-interactive (i.e. read-only) CLI
tools.  Such tools may deliberately interleave stdout and stderr for
visual effect. If segregated, the output becomes harder to understand
visually.

This commit adds a new enum variant to the GuestExecCaptureOutputMode
qapi to merge the output streams such that consumers can have a pristine
view of the original command output.

Signed-off-by: Daniel Xu 
---
 qga/commands.c   | 25 +++--
 qga/qapi-schema.json |  5 -
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/qga/commands.c b/qga/commands.c
index 01f68b45ab..09c683e263 100644
--- a/qga/commands.c
+++ b/qga/commands.c
@@ -270,12 +270,26 @@ static void guest_exec_child_watch(GPid pid, gint status, 
gpointer data)
 g_spawn_close_pid(pid);
 }
 
-/** Reset ignored signals back to default. */
 static void guest_exec_task_setup(gpointer data)
 {
 #if !defined(G_OS_WIN32)
+    bool has_merge = *(bool *)data;
     struct sigaction sigact;
 
+    if (has_merge) {
+        /*
+         * FIXME: When `GLIB_VERSION_MIN_REQUIRED` is bumped to 2.58+, use
+         * g_spawn_async_with_fds() to be portable on windows. The current
+         * logic does not work on windows b/c `GSpawnChildSetupFunc` is run
+         * inside the parent, not the child.
+         */
+        if (dup2(STDOUT_FILENO, STDERR_FILENO) < 0) {
+            slog("dup2() failed to merge stderr into stdout: %s",
+                 strerror(errno));
+        }
+    }
+
+    /* Reset ignored signals back to default. */
     memset(&sigact, 0, sizeof(struct sigaction));
     sigact.sa_handler = SIG_DFL;
 
@@ -409,6 +423,7 @@ GuestExec *qmp_guest_exec(const char *path,
     GIOChannel *in_ch, *out_ch, *err_ch;
     GSpawnFlags flags;
     bool has_output = false;
+    bool has_merge = false;
     GuestExecCaptureOutputMode output_mode;
     g_autofree uint8_t *input = NULL;
     size_t ninput = 0;
@@ -445,13 +460,19 @@ GuestExec *qmp_guest_exec(const char *path,
     case GUEST_EXEC_CAPTURE_OUTPUT_MODE_SEPARATED:
         has_output = true;
         break;
+#if !defined(G_OS_WIN32)
+    case GUEST_EXEC_CAPTURE_OUTPUT_MODE_MERGED:
+        has_output = true;
+        has_merge = true;
+        break;
+#endif
     case GUEST_EXEC_CAPTURE_OUTPUT_MODE__MAX:
         /* Silence warning; impossible branch */
         break;
     }
 
     ret = g_spawn_async_with_pipes(NULL, argv, envp, flags,
-            guest_exec_task_setup, NULL, &pid, input_data ? &in_fd : NULL,
+            guest_exec_task_setup, &has_merge, &pid, input_data ? &in_fd : NULL,
             has_output ? &out_fd : NULL, has_output ? &err_fd : NULL, &gerr);
     if (!ret) {
         error_setg(errp, QERR_QGA_COMMAND_FAILED, gerr->message);
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index d1e00a4234..39dd006d16 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1210,11 +1210,14 @@
 # @stderr: only capture stderr
 # @separated: capture both stdout and stderr, but separated into
 # GuestExecStatus out-data and err-data, respectively
+# @merged: capture both stdout and stderr, but merge them together
+#  into out-data. Not available on Windows guests.
 #
 # Since: 8.0
 ##
  { 'enum': 'GuestExecCaptureOutputMode',
-   'data': [ 'none', 'stdout', 'stderr', 'separated' ] }
+   'data': [ 'none', 'stdout', 'stderr', 'separated',
+ { 'name': 'merged', 'if': { 'not': 'CONFIG_WIN32' } } ] }
 
 ##
 # @GuestExecCaptureOutput:
-- 
2.39.1
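The merge relies on a POSIX detail: `dup2()` returns the new descriptor (here `STDERR_FILENO`, i.e. 2) on success and -1 on failure, so a success check must test for a negative return rather than `!= 0`. Below is a standalone sketch of the same technique, using plain `fork()`/`pipe()` instead of GLib's child-setup callback; `run_merged()` is a hypothetical name, and the child mimics the bash loop used in the test patch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run a child whose stderr is merged into stdout, alternating writes to the
 * two streams, and collect the merged output from one pipe into buf.
 * Returns the number of bytes read, or -1 on error.
 */
static ssize_t run_merged(char *buf, size_t cap)
{
    int fds[2];
    size_t used = 0;
    pid_t pid;

    assert(cap > 0);
    if (pipe(fds) == -1) {
        return -1;
    }
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: stdout -> pipe, then stderr -> wherever stdout points. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) == -1 ||
            dup2(STDOUT_FILENO, STDERR_FILENO) == -1) {
            _exit(127);
        }
        if (fds[1] != STDOUT_FILENO) {
            close(fds[1]);
        }
        for (int i = 1; i <= 4; i++) {
            int fd = (i % 2) ? STDOUT_FILENO : STDERR_FILENO;
            const char *msg = (i % 2) ? "stdout\n" : "stderr\n";

            if (write(fd, msg, strlen(msg)) < 0) {
                _exit(1);
            }
        }
        _exit(0);
    }

    /* Parent: read the merged stream until EOF or buf is full. */
    close(fds[1]);
    while (used < cap - 1) {
        ssize_t n = read(fds[0], buf + used, cap - 1 - used);

        if (n > 0) {
            used += (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;
        } else {
            break;
        }
    }
    close(fds[0]);
    waitpid(pid, NULL, 0);
    buf[used] = '\0';
    return (ssize_t)used;
}
```

Because both streams share one pipe, the parent sees the writes in the order the child issued them, which is the interleaving `merged` mode is meant to preserve.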

[PATCH v6 1/3] qga: Refactor guest-exec capture-output to take enum

2023-03-22 Thread Daniel Xu

Previously, capture-output was an optional boolean flag that either
captured all output or captured none. While this is OK in most cases, it
lacks flexibility for more advanced capture cases, such as wanting to
capture only stdout.

This commit refactors the guest-exec QAPI to take an enum for the capture
mode instead, while preserving backwards compatibility.

Suggested-by: Daniel P. Berrangé 
Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Daniel Xu 
---
 qga/commands.c   | 37 ++---
 qga/qapi-schema.json | 33 -
 2 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/qga/commands.c b/qga/commands.c
index 172826f8f8..01f68b45ab 100644
--- a/qga/commands.c
+++ b/qga/commands.c
@@ -379,11 +379,23 @@ close:
 return false;
 }
 
+static GuestExecCaptureOutputMode ga_parse_capture_output(
+    GuestExecCaptureOutput *capture_output)
+{
+    if (!capture_output)
+        return GUEST_EXEC_CAPTURE_OUTPUT_MODE_NONE;
+    else if (capture_output->type == QTYPE_QBOOL)
+        return capture_output->u.flag ? GUEST_EXEC_CAPTURE_OUTPUT_MODE_SEPARATED
+                                      : GUEST_EXEC_CAPTURE_OUTPUT_MODE_NONE;
+    else
+        return capture_output->u.mode;
+}
+
 GuestExec *qmp_guest_exec(const char *path,
                           bool has_arg, strList *arg,
                           bool has_env, strList *env,
                           const char *input_data,
-                          bool has_capture_output, bool capture_output,
+                          GuestExecCaptureOutput *capture_output,
                           Error **errp)
 {
     GPid pid;
@@ -396,7 +408,8 @@ GuestExec *qmp_guest_exec(const char *path,
     gint in_fd, out_fd, err_fd;
     GIOChannel *in_ch, *out_ch, *err_ch;
     GSpawnFlags flags;
-    bool has_output = (has_capture_output && capture_output);
+    bool has_output = false;
+    GuestExecCaptureOutputMode output_mode;
     g_autofree uint8_t *input = NULL;
     size_t ninput = 0;
 
@@ -415,8 +428,26 @@ GuestExec *qmp_guest_exec(const char *path,
 
     flags = G_SPAWN_SEARCH_PATH | G_SPAWN_DO_NOT_REAP_CHILD |
         G_SPAWN_SEARCH_PATH_FROM_ENVP;
-    if (!has_output) {
+
+    output_mode = ga_parse_capture_output(capture_output);
+    switch (output_mode) {
+    case GUEST_EXEC_CAPTURE_OUTPUT_MODE_NONE:
         flags |= G_SPAWN_STDOUT_TO_DEV_NULL | G_SPAWN_STDERR_TO_DEV_NULL;
+        break;
+    case GUEST_EXEC_CAPTURE_OUTPUT_MODE_STDOUT:
+        has_output = true;
+        flags |= G_SPAWN_STDERR_TO_DEV_NULL;
+        break;
+    case GUEST_EXEC_CAPTURE_OUTPUT_MODE_STDERR:
+        has_output = true;
+        flags |= G_SPAWN_STDOUT_TO_DEV_NULL;
+        break;
+    case GUEST_EXEC_CAPTURE_OUTPUT_MODE_SEPARATED:
+        has_output = true;
+        break;
+    case GUEST_EXEC_CAPTURE_OUTPUT_MODE__MAX:
+        /* Silence warning; impossible branch */
+        break;
     }
 
     ret = g_spawn_async_with_pipes(NULL, argv, envp, flags,
diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
index 796434ed34..d1e00a4234 100644
--- a/qga/qapi-schema.json
+++ b/qga/qapi-schema.json
@@ -1200,6 +1200,37 @@
 { 'struct': 'GuestExec',
   'data': { 'pid': 'int'} }
 
+##
+# @GuestExecCaptureOutputMode:
+#
+# An enumeration of guest-exec capture modes.
+#
+# @none: do not capture any output
+# @stdout: only capture stdout
+# @stderr: only capture stderr
+# @separated: capture both stdout and stderr, but separated into
+# GuestExecStatus out-data and err-data, respectively
+#
+# Since: 8.0
+##
+ { 'enum': 'GuestExecCaptureOutputMode',
+   'data': [ 'none', 'stdout', 'stderr', 'separated' ] }
+
+##
+# @GuestExecCaptureOutput:
+#
+# Controls what guest-exec output gets captured.
+#
+# @flag: captures both stdout and stderr if true. Equivalent
+#        to GuestExecCaptureOutputMode::separated. (since 2.5)
+# @mode: capture mode; preferred interface
+#
+# Since: 8.0
+##
+ { 'alternate': 'GuestExecCaptureOutput',
+   'data': { 'flag': 'bool',
+ 'mode': 'GuestExecCaptureOutputMode'} }
+
 ##
 # @guest-exec:
 #
@@ -1218,7 +1249,7 @@
 ##
 { 'command': 'guest-exec',
   'data':{ 'path': 'str', '*arg': ['str'], '*env': ['str'],
-   '*input-data': 'str', '*capture-output': 'bool' },
+   '*input-data': 'str', '*capture-output': 'GuestExecCaptureOutput' },
   'returns': 'GuestExec' }
 
 
-- 
2.39.1
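The backwards-compatibility rule in `ga_parse_capture_output()` can be illustrated with simplified stand-ins for the QAPI-generated alternate type. The type and enum names below are hypothetical and do not match the generated QAPI code; only the mapping logic follows the patch: an omitted argument means no capture, a legacy `true` keeps the old "capture both, separately" behaviour, and an enum value is passed through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the QAPI-generated types (assumptions). */
typedef enum {
    MODE_NONE,
    MODE_STDOUT,
    MODE_STDERR,
    MODE_SEPARATED,
} CaptureMode;

/* Which branch of the alternate the client sent: bool or enum string. */
typedef enum { ALT_BOOL, ALT_ENUM } AltKind;

typedef struct {
    AltKind type;
    union {
        bool flag;
        CaptureMode mode;
    } u;
} CaptureOutput;

static CaptureMode parse_capture_output(const CaptureOutput *co)
{
    if (!co) {
        return MODE_NONE;                       /* argument omitted */
    }
    if (co->type == ALT_BOOL) {
        /* legacy boolean: true keeps the pre-8.0 "capture both" behaviour */
        return co->u.flag ? MODE_SEPARATED : MODE_NONE;
    }
    return co->u.mode;                          /* preferred interface */
}
```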

[PATCH v6 0/3] qga: Support merging output streams in guest-exec

2023-03-22 Thread Daniel Xu

Currently, the captured output (via `capture-output`) is segregated into
separate GuestExecStatus fields (`out-data` and `err-data`). This means
that downstream consumers have no way to reassemble the captured data
back into the original stream.

This is relevant for chatty and semi-interactive (i.e. read-only) CLI
tools.  Such tools may deliberately interleave stdout and stderr for
visual effect. If segregated, the output becomes harder to understand
visually.

This patchset adds support for merging the stderr and stdout output
streams via a backwards-compatible refactor and a new enum variant,
`merged`.

---

Changes from v5:
* Add qapi conditional for `merged` flag
* Remove error check for windows as above conditional enforces it

Changes from v4:
* Rename `all` -> `separated`
* Rename `all-merge` -> `merged`

Changes from v3:
* Split out ASAN fixes into separate patch series
* Refactor `capture-output` flag into an enum
* Avoid using /bin/bash on windows

Changes from v2:
* Error out if `merge-output` on windows guests
* Add FIXMEs for when glib is updated
* Fix memory leaks in qemu-keymap

Changes from v1:
* Drop invalid test fix
* Do not support `merge-output` on windows guests
* Fix a UAF in tests

Daniel Xu (3):
  qga: Refactor guest-exec capture-output to take enum
  qga: Add `merged` variant to GuestExecCaptureOutputMode
  qga: test: Add tests for `merged` flag

 qga/commands.c|  62 +++--
 qga/qapi-schema.json  |  36 +-
 tests/unit/test-qga.c | 158 +-
 3 files changed, 233 insertions(+), 23 deletions(-)

-- 
2.39.1

Re: [PATCH v5 2/3] qga: Add `merged` variant to GuestExecCaptureOutputMode

2023-03-22 Thread Daniel Xu

Hi Daniel,

Sorry about the delay -- was out of town the past week.

On Fri, Mar 10, 2023, at 2:24 AM, Daniel P. Berrangé wrote:
> On Thu, Mar 09, 2023 at 03:40:57PM -0700, Daniel Xu wrote:
>> Currently, any captured output (via `capture-output`) is segregated into
>> separate GuestExecStatus fields (`out-data` and `err-data`). This means
>> that downstream consumers have no way to reassemble the captured data
>> back into the original stream.
>> 
>> This is relevant for chatty and semi-interactive (ie. read only) CLI
>> tools.  Such tools may deliberately interleave stdout and stderr for
>> visual effect. If segregated, the output becomes harder to visually
>> understand.
>> 
>> This commit adds a new enum variant to the GuestExecCaptureOutputMode
>> qapi to merge the output streams such that consumers can have a pristine
>> view of the original command output.
>> 
>> Signed-off-by: Daniel Xu 
>> ---
>>  qga/commands.c   | 31 +--
>>  qga/qapi-schema.json |  4 +++-
>>  2 files changed, 32 insertions(+), 3 deletions(-)
>> 
>> diff --git a/qga/commands.c b/qga/commands.c
>> index 01f68b45ab..c347d434ed 100644
>> --- a/qga/commands.c
>> +++ b/qga/commands.c
>> @@ -270,12 +270,26 @@ static void guest_exec_child_watch(GPid pid, gint 
>> status, gpointer data)
>>  g_spawn_close_pid(pid);
>>  }
>>  
>> -/** Reset ignored signals back to default. */
>>  static void guest_exec_task_setup(gpointer data)
>>  {
>>  #if !defined(G_OS_WIN32)
>> +bool has_merge = *(bool *)data;
>>  struct sigaction sigact;
>>  
>> +if (has_merge) {
>> +/*
>> + * FIXME: When `GLIB_VERSION_MIN_REQUIRED` is bumped to 2.58+, use
>> + * g_spawn_async_with_fds() to be portable on windows. The current
>> + * logic does not work on windows b/c `GSpawnChildSetupFunc` is run
>> + * inside the parent, not the child.
>> + */
>> +if (dup2(STDOUT_FILENO, STDERR_FILENO) != 0) {
>> +slog("dup2() failed to merge stderr into stdout: %s",
>> + strerror(errno));
>> +}
>> +}
>> +
>> +/* Reset ignored signals back to default. */
>>  memset(&sigact, 0, sizeof(struct sigaction));
>>  sigact.sa_handler = SIG_DFL;
>>  
>> @@ -409,6 +423,7 @@ GuestExec *qmp_guest_exec(const char *path,
>>  GIOChannel *in_ch, *out_ch, *err_ch;
>>  GSpawnFlags flags;
>>  bool has_output = false;
>> +bool has_merge = false;
>
> Wrap in  #ifndef _WIN32

I think it would be better to leave this variable un-gated, because gating
it would make the later call to g_spawn_async_with_pipes() less clean.
I don't think it'll trigger any unused-variable warnings either, since we
are technically always using it.

>
>>  GuestExecCaptureOutputMode output_mode;
>>  g_autofree uint8_t *input = NULL;
>>  size_t ninput = 0;
>> @@ -445,13 +460,25 @@ GuestExec *qmp_guest_exec(const char *path,
>>  case GUEST_EXEC_CAPTURE_OUTPUT_MODE_SEPARATED:
>>  has_output = true;
>>  break;
>> +case GUEST_EXEC_CAPTURE_OUTPUT_MODE_MERGED:
>> +has_output = true;
>> +has_merge = true;
>> +break;
>
> Wrap in  #ifndef _WIN32
>
>>  case GUEST_EXEC_CAPTURE_OUTPUT_MODE__MAX:
>>  /* Silence warning; impossible branch */
>>  break;
>>  }
>>  
>> +#if defined(G_OS_WIN32)
>> +/* FIXME: see comment in guest_exec_task_setup() */
>> +if (has_merge) {
>> +error_setg(errp, "merged unsupported on windows");
>> +return NULL;
>> +}
>> +#endif
>
> This can be dropped, since 'has_merge' won't exist for
> Win32 builds.
>
>> +
>>  ret = g_spawn_async_with_pipes(NULL, argv, envp, flags,
>> -guest_exec_task_setup, NULL, &pid, input_data ? &in_fd : NULL,
>> +guest_exec_task_setup, &has_merge, &pid, input_data ? &in_fd : NULL,
>>  has_output ? &out_fd : NULL, has_output ? &err_fd : NULL, &gerr);
>>  if (!ret) {
>>  error_setg(errp, QERR_QGA_COMMAND_FAILED, gerr->message);
>> diff --git a/qga/qapi-schema.json b/qga/qapi-schema.json
>> index d1e00a4234..b4782525ae 100644
>> --- a/qga/qapi-schema.json
>> +++ b/qga/qapi-schema.json
>> @@ -1210,11 +1210,13 @@
>>  # @stderr: only capture stderr
>>  # @separated: capture both stdout and stderr, but separated into
>>  # GuestExecStatus out-data and err-data, respectively
>> +# @merged: capture both stdout and stderr, but merge together
>> +#  into out-data. not effective on windows guests.
>>  #
>>  # Since: 8.0
>>  ##
>>   { 'enum': 'GuestExecCaptureOutputMode',
>> -   'data': [ 'none', 'stdout', 'stderr', 'separated' ] }
>> +   'data': [ 'none', 'stdout', 'stderr', 'separated', 'merged' ] }
>
> Actually, I've just realized we can make this conditional:
>
>
>  'data': [ 'none', 'stdout', 'stderr', 'separated',
>{ 'name': 'merged', 'if': 'CONFIG_WIN32' } ] }
>
> so the constant doesn't even exist in Win32 builds.

Ack, that looks cleaner.

[...]

Thanks,
Daniel

[PATCH for-8.1 v4 15/25] target/riscv/cpu.c: split RVG code from validate_set_extensions()

2023-03-22 Thread Daniel Henrique Barboza

We can set all RVG-related extensions at realize time, before
validate_set_extensions() itself runs. Put this in a separate function
so that the validate function already sees the updated state.

Note that we're setting both cfg->ext_N and the env->misa_ext bits,
instead of just setting cfg->ext_N. The intention here is to start
syncing all misa_ext operations with the corresponding cpu->cfg flags,
in preparation for allowing the validate function to operate on
misa_ext. This makes no difference in the current code, but will be
required for write_misa() later on.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 60 ++
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f41888baa0..a7bad518be 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -281,6 +281,36 @@ static uint32_t riscv_get_misa_ext_with_cpucfg(RISCVCPUConfig *cfg)
     return ext;
 }
 
+static void riscv_cpu_enable_g(RISCVCPU *cpu, Error **errp)
+{
+    CPURISCVState *env = &cpu->env;
+    RISCVCPUConfig *cfg = &cpu->cfg;
+
+    if (!(cfg->ext_i && cfg->ext_m && cfg->ext_a &&
+          cfg->ext_f && cfg->ext_d &&
+          cfg->ext_icsr && cfg->ext_ifencei)) {
+
+        warn_report("Setting G will also set IMAFD_Zicsr_Zifencei");
+        cfg->ext_i = true;
+        env->misa_ext |= RVI;
+
+        cfg->ext_m = true;
+        env->misa_ext |= RVM;
+
+        cfg->ext_a = true;
+        env->misa_ext |= RVA;
+
+        cfg->ext_f = true;
+        env->misa_ext |= RVF;
+
+        cfg->ext_d = true;
+        env->misa_ext |= RVD;
+
+        cfg->ext_icsr = true;
+        cfg->ext_ifencei = true;
+    }
+}
+
 static void riscv_set_cpucfg_with_misa(RISCVCPUConfig *cfg,
                                        uint32_t misa_ext)
 {
@@ -1033,21 +1063,6 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
         return;
     }
 
-    /* Do some ISA extension error checking */
-    if (cpu->cfg.ext_g && !(cpu->cfg.ext_i && cpu->cfg.ext_m &&
-                            cpu->cfg.ext_a && cpu->cfg.ext_f &&
-                            cpu->cfg.ext_d &&
-                            cpu->cfg.ext_icsr && cpu->cfg.ext_ifencei)) {
-        warn_report("Setting G will also set IMAFD_Zicsr_Zifencei");
-        cpu->cfg.ext_i = true;
-        cpu->cfg.ext_m = true;
-        cpu->cfg.ext_a = true;
-        cpu->cfg.ext_f = true;
-        cpu->cfg.ext_d = true;
-        cpu->cfg.ext_icsr = true;
-        cpu->cfg.ext_ifencei = true;
-    }
-
     if (cpu->cfg.ext_i && cpu->cfg.ext_e) {
         error_setg(errp,
                    "I and E extensions are incompatible");
@@ -1290,6 +1305,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
     CPUState *cs = CPU(dev);
     RISCVCPU *cpu = RISCV_CPU(dev);
     RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
+    CPURISCVState *env = &cpu->env;
     Error *local_err = NULL;
 
     cpu_exec_realizefn(cs, &local_err);
@@ -1310,6 +1326,20 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    if (cpu->cfg.ext_g) {
+        riscv_cpu_enable_g(cpu, &local_err);
+        if (local_err != NULL) {
+            error_propagate(errp, local_err);
+            return;
+        }
+
+        /*
+         * Sync env->misa_ext_mask with the new
+         * env->misa_ext val.
+         */
+        env->misa_ext_mask = env->misa_ext;
+    }
+
     riscv_cpu_validate_set_extensions(cpu, &local_err);
     if (local_err != NULL) {
         error_propagate(errp, local_err);
-- 
2.39.2

[PATCH for-8.1 v4 25/25] target/riscv: handle RVG updates in write_misa()

2023-03-22 Thread Daniel Henrique Barboza

RVG can only be enabled when IMAFD_Zicsr_Zifencei are also enabled.
Change write_misa() to enable IMAFD if G is being set in the CSR.

Likewise, RVG should be disabled if any of IMAFD got disabled during the
process. Clear RVG in this case.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/csr.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 839862f1a8..1c0f438dfb 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1381,6 +1381,14 @@ static RISCVException write_misa(CPURISCVState *env, int csrno,
 val &= RVE;
 }
 
+if (val & RVG && !(env->misa_ext & RVG)) {
+/*
+ * If the write wants to enable RVG, enable all its
+ * dependencies as well.
+ */
+val |= RVI | RVM | RVA | RVF | RVD;
+}
+
 /*
  * This flow is similar to what riscv_cpu_realize() does,
  * with the difference that we will update env->misa_ext
@@ -1396,6 +1404,12 @@ static RISCVException write_misa(CPURISCVState *env, int csrno,
 return RISCV_EXCP_NONE;
 }
 
+if (!(val & RVI && val & RVM && val & RVA &&
+  val & RVF && val & RVD)) {
+/* Disable RVG if any of its dependencies were disabled */
+val &= ~RVG;
+}
+
 riscv_cpu_commit_cpu_cfg(cpu, val);
 
 if (!(val & RVF)) {
-- 
2.39.2
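The two G rules this patch adds to write_misa() can be modeled in isolation. The sketch below is a hypothetical standalone helper, not QEMU code: `apply_rvg_rules()` and `RVG_DEPS` are invented names, the masks are plain `uint32_t` rather than `target_ulong`, and the validation step that write_misa() runs between the two rules is left out.

```c
#include <stdint.h>

/* Same bit encoding as the RV() macro in target/riscv/cpu.h. */
#define RV(x) ((uint32_t)1 << ((x) - 'A'))
#define RVI RV('I')
#define RVM RV('M')
#define RVA RV('A')
#define RVF RV('F')
#define RVD RV('D')
#define RVG RV('G')

/* Hypothetical: the MISA bits G depends on (Zicsr/Zifencei have no MISA bit). */
#define RVG_DEPS (RVI | RVM | RVA | RVF | RVD)

/*
 * Hypothetical model of the G handling in write_misa():
 * 'cur' is the current misa_ext, 'val' is the value being written.
 */
static uint32_t apply_rvg_rules(uint32_t cur, uint32_t val)
{
    /* Turning G on (it was off before) also turns on all of IMAFD. */
    if ((val & RVG) && !(cur & RVG)) {
        val |= RVG_DEPS;
    }

    /* If any of IMAFD ends up cleared, G cannot stay set. */
    if ((val & RVG_DEPS) != RVG_DEPS) {
        val &= ~RVG;
    }
    return val;
}
```

With `cur = 0` and `val = RVG`, the result is `RVG | RVI | RVM | RVA | RVF | RVD`; writing a value that keeps G but drops D, while G is already set, returns that value with G cleared.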

[PATCH for-8.1 v4 13/25] target/riscv: put env->misa_ext <-> cpu->cfg code into helpers

2023-03-22 Thread Daniel Henrique Barboza

The extremely tedious code that sets cpu->cfg based on misa_ext, and
vice-versa, is scattered around riscv_cpu_validate_set_extensions() and
set_misa().

Introduce helpers to do this work, cleaning up the logic of both
functions a bit. While we're at it, add a note in cpu.h stating that
any future change to the MISA RV* bits should also be reflected in the
helpers.

We'll want to keep env->misa_ext changes in sync with cpu->cfg during
realize() in the next patches, and both helpers will have a role to play
in that.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 120 -
 target/riscv/cpu.h |   3 +-
 2 files changed, 65 insertions(+), 58 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 0e56a1c01f..c4f18d0436 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -234,10 +234,69 @@ const char *riscv_cpu_get_trap_name(target_ulong cause, bool async)
 }
 }
 
-static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
+static uint32_t riscv_get_misa_ext_with_cpucfg(RISCVCPUConfig *cfg)
 {
-RISCVCPU *cpu;
+uint32_t ext = 0;
 
+if (cfg->ext_i) {
+ext |= RVI;
+}
+if (cfg->ext_e) {
+ext |= RVE;
+}
+if (cfg->ext_m) {
+ext |= RVM;
+}
+if (cfg->ext_a) {
+ext |= RVA;
+}
+if (cfg->ext_f) {
+ext |= RVF;
+}
+if (cfg->ext_d) {
+ext |= RVD;
+}
+if (cfg->ext_c) {
+ext |= RVC;
+}
+if (cfg->ext_s) {
+ext |= RVS;
+}
+if (cfg->ext_u) {
+ext |= RVU;
+}
+if (cfg->ext_h) {
+ext |= RVH;
+}
+if (cfg->ext_v) {
+ext |= RVV;
+}
+if (cfg->ext_j) {
+ext |= RVJ;
+}
+
+return ext;
+}
+
+static void riscv_set_cpucfg_with_misa(RISCVCPUConfig *cfg,
+   uint32_t misa_ext)
+{
+cfg->ext_i = misa_ext & RVI;
+cfg->ext_e = misa_ext & RVE;
+cfg->ext_m = misa_ext & RVM;
+cfg->ext_a = misa_ext & RVA;
+cfg->ext_f = misa_ext & RVF;
+cfg->ext_d = misa_ext & RVD;
+cfg->ext_v = misa_ext & RVV;
+cfg->ext_c = misa_ext & RVC;
+cfg->ext_s = misa_ext & RVS;
+cfg->ext_u = misa_ext & RVU;
+cfg->ext_h = misa_ext & RVH;
+cfg->ext_j = misa_ext & RVJ;
+}
+
+static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
+{
 env->misa_mxl_max = env->misa_mxl = mxl;
 env->misa_ext_mask = env->misa_ext = ext;
 
@@ -251,25 +310,7 @@ static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
 return;
 }
 
-/*
- * We can't use riscv_cpu_cfg() in this case because it is
- * a read-only inline and we're going to change the values
- * of cpu->cfg.
- */
-cpu = env_archcpu(env);
-
-cpu->cfg.ext_i = ext & RVI;
-cpu->cfg.ext_e = ext & RVE;
-cpu->cfg.ext_m = ext & RVM;
-cpu->cfg.ext_a = ext & RVA;
-cpu->cfg.ext_f = ext & RVF;
-cpu->cfg.ext_d = ext & RVD;
-cpu->cfg.ext_v = ext & RVV;
-cpu->cfg.ext_c = ext & RVC;
-cpu->cfg.ext_s = ext & RVS;
-cpu->cfg.ext_u = ext & RVU;
-cpu->cfg.ext_h = ext & RVH;
-cpu->cfg.ext_j = ext & RVJ;
+riscv_set_cpucfg_with_misa(&env_archcpu(env)->cfg, ext);
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -1153,42 +1194,7 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
  */
 riscv_cpu_disable_priv_spec_isa_exts(cpu);
 
-if (cpu->cfg.ext_i) {
-ext |= RVI;
-}
-if (cpu->cfg.ext_e) {
-ext |= RVE;
-}
-if (cpu->cfg.ext_m) {
-ext |= RVM;
-}
-if (cpu->cfg.ext_a) {
-ext |= RVA;
-}
-if (cpu->cfg.ext_f) {
-ext |= RVF;
-}
-if (cpu->cfg.ext_d) {
-ext |= RVD;
-}
-if (cpu->cfg.ext_c) {
-ext |= RVC;
-}
-if (cpu->cfg.ext_s) {
-ext |= RVS;
-}
-if (cpu->cfg.ext_u) {
-ext |= RVU;
-}
-if (cpu->cfg.ext_h) {
-ext |= RVH;
-}
-if (cpu->cfg.ext_v) {
-ext |= RVV;
-}
-if (cpu->cfg.ext_j) {
-ext |= RVJ;
-}
+ext = riscv_get_misa_ext_with_cpucfg(&cpu->cfg);
 
 env->misa_ext_mask = env->misa_ext = ext;
 }
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index ebe0fff668..2263629332 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -66,7 +66,8 @@
 #define RV(x) ((target_ulong)1 << (x - 'A'))
 
 /*
- * Consider updating set_misa() when adding new
+ * Consider updating riscv_get_misa_ext_with_cpucfg()
+ * and riscv_set_cpucfg_with_misa() when adding new
  * MISA bits here.
  */
 #define RVI RV('I')
-- 
2.39.2
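The same pair of conversions can also be driven from a single table, so that a new MISA bit is added in one place rather than in two helpers. This is a swapped-in, table-driven variant, not what the patch does (the patch spells out each bit): `MisaCfg`, `misa_map` and both function names are hypothetical, and the struct keeps only the MISA-letter flags of RISCVCPUConfig.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RV(x) ((uint32_t)1 << ((x) - 'A'))

/* Hypothetical subset of RISCVCPUConfig: only the MISA-letter flags. */
typedef struct {
    bool ext_i, ext_e, ext_m, ext_a, ext_f, ext_d;
    bool ext_c, ext_s, ext_u, ext_h, ext_v, ext_j;
} MisaCfg;

/* One row per MISA letter; both helpers below walk this table. */
static const struct {
    char letter;
    size_t offset;
} misa_map[] = {
    { 'I', offsetof(MisaCfg, ext_i) }, { 'E', offsetof(MisaCfg, ext_e) },
    { 'M', offsetof(MisaCfg, ext_m) }, { 'A', offsetof(MisaCfg, ext_a) },
    { 'F', offsetof(MisaCfg, ext_f) }, { 'D', offsetof(MisaCfg, ext_d) },
    { 'C', offsetof(MisaCfg, ext_c) }, { 'S', offsetof(MisaCfg, ext_s) },
    { 'U', offsetof(MisaCfg, ext_u) }, { 'H', offsetof(MisaCfg, ext_h) },
    { 'V', offsetof(MisaCfg, ext_v) }, { 'J', offsetof(MisaCfg, ext_j) },
};

#define MISA_MAP_LEN (sizeof(misa_map) / sizeof(misa_map[0]))

/* cfg -> misa_ext: set the RV() bit for every enabled flag. */
static uint32_t misa_ext_from_cfg(const MisaCfg *cfg)
{
    uint32_t ext = 0;

    for (size_t i = 0; i < MISA_MAP_LEN; i++) {
        const bool *flag =
            (const bool *)((const char *)cfg + misa_map[i].offset);
        if (*flag) {
            ext |= RV(misa_map[i].letter);
        }
    }
    return ext;
}

/* misa_ext -> cfg: each flag becomes true iff its bit is set. */
static void cfg_from_misa_ext(MisaCfg *cfg, uint32_t misa_ext)
{
    for (size_t i = 0; i < MISA_MAP_LEN; i++) {
        bool *flag = (bool *)((char *)cfg + misa_map[i].offset);
        *flag = (misa_ext & RV(misa_map[i].letter)) != 0;
    }
}
```

Bits without a table row (for example RV('X')) are dropped on the way into the config, so a round trip only preserves the letters the table knows about.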

[PATCH for-8.1 v4 07/25] target/riscv: move pmp and epmp validations to validate_set_extensions()

2023-03-22 Thread Daniel Henrique Barboza

In the near future, write_misa() will use a variation of what we have
now as riscv_cpu_validate_set_extensions(). The pmp and epmp validation
will be required in write_misa() and it's already required here in
riscv_cpu_realize(), so move it to riscv_cpu_validate_set_extensions().

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 19 +--
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1a298e5e55..7458845fec 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -916,6 +916,15 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 Error *local_err = NULL;
 uint32_t ext = 0;
 
+if (cpu->cfg.epmp && !cpu->cfg.pmp) {
+/*
+ * Enhanced PMP should only be available
+ * on harts with PMP support
+ */
+error_setg(errp, "Invalid configuration: EPMP requires PMP support");
+return;
+}
+
 /* Do some ISA extension error checking */
 if (cpu->cfg.ext_g && !(cpu->cfg.ext_i && cpu->cfg.ext_m &&
 cpu->cfg.ext_a && cpu->cfg.ext_f &&
@@ -1228,16 +1237,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-if (cpu->cfg.epmp && !cpu->cfg.pmp) {
-/*
- * Enhanced PMP should only be available
- * on harts with PMP support
- */
-error_setg(errp, "Invalid configuration: EPMP requires PMP support");
-return;
-}
-
-
 #ifndef CONFIG_USER_ONLY
 if (cpu->cfg.ext_sstc) {
 riscv_timer_init(cpu);
-- 
2.39.2

Re: [PATCH RESEND v2] hw/i2c: Enable an id for the pca954x devices

2023-03-22 Thread Philippe Mathieu-Daudé


On 22/3/23 22:19, Corey Minyard wrote:

On Wed, Mar 22, 2023 at 10:21:36AM -0700, Patrick Venture wrote:

This allows the devices to be more readily found and specified.
Without setting the name field, they can only be found by device type
name, which doesn't let you specify the second of the same device type
behind a bus.

Tested: Verified that by default the device was findable with the name
'pca954x[77]', for an instance attached at that address.


This looks good to me.

Acked-by: Corey Minyard 

if you are taking this in through another tree.  Or do you want me to
take this?


Since I have to send a MIPS PR, I'll take this one,
to spare you the work and save CI minutes.
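The naming rule in this patch (use the `name` property if it is set, otherwise `pca954x[<address in hex>]`) can be sketched as a plain C helper. `pca954x_pick_id()` is a hypothetical name; it uses malloc() instead of GLib and does not touch DeviceState. The Tested: note expects `pca954x[77]` for a mux at address 0x77, which is what the `%x` format produces.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the id choice made in pca954x_realize():
 * prefer the user-supplied name, else derive one from the I2C address.
 * Returns a malloc'd string (or NULL on allocation failure); caller frees.
 */
static char *pca954x_pick_id(const char *name, uint8_t i2c_addr)
{
    char *id;

    if (name) {
        size_t len = strlen(name) + 1;
        id = malloc(len);
        if (id) {
            memcpy(id, name, len);
        }
        return id;
    }

    /* "pca954x[" + at most 2 hex digits + "]" + NUL fits in 16 bytes. */
    id = malloc(16);
    if (id) {
        snprintf(id, 16, "pca954x[%x]", (unsigned)i2c_addr);
    }
    return id;
}
```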

[PATCH for-8.1 v4 19/25] target/riscv: write env->misa_ext* in register_generic_cpu_props()

2023-03-22 Thread Daniel Henrique Barboza

In the process of creating the user-facing flags in
register_generic_cpu_props() we're also setting default values for the
cpu->cfg flags that represent MISA bits.

Leaving it as is will cause a discrepancy between users of this function
(at this moment the non-named CPUs) and named CPUs. Named CPUs are using
set_misa() with a non-zero 'ext' value, writing cpu->cfg in the process.
They'll reach riscv_cpu_realize() in a state where env->misa_ext will
reflect cpu->cfg, allowing functions to choose whether to use
env->misa_ext or cpu->cfg to validate MISA bits.

If we guarantee that env->misa_ext will always reflect cpu->cfg at the
start of riscv_cpu_realize(), functions will no longer need to rely on
cpu->cfg for MISA validation. This happens to be one of the blockers for
properly supporting write_misa().

Sync env->misa_ext* in register_generic_cpu_props(). After that, there
will be no more places where env->misa_ext needs to be synced back with
cpu->cfg, so remove the now obsolete code at the end of
riscv_cpu_validate_set_extensions().

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index d2eb2b3ba1..f1e82a8dda 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1107,14 +1107,10 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
 
 /*
  * Check consistency between chosen extensions while setting
- * cpu->cfg accordingly, setting env->misa_ext and
- * misa_ext_mask in the end.
+ * cpu->cfg accordingly.
  */
 static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 {
-CPURISCVState *env = &cpu->env;
-uint32_t ext = 0;
-
 if (cpu->cfg.epmp && !cpu->cfg.pmp) {
 /*
  * Enhanced PMP should only be available
@@ -1231,10 +1227,6 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
  * validated and set everything we need.
  */
 riscv_cpu_disable_priv_spec_isa_exts(cpu);
-
-ext = riscv_get_misa_ext_with_cpucfg(&cpu->cfg);
-
-env->misa_ext_mask = env->misa_ext = ext;
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -1345,6 +1337,10 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+/*
+ * This is the last point where env->misa_ext* can
+ * be changed.
+ */
 if (cpu->cfg.ext_g) {
 riscv_cpu_enable_g(cpu, &local_err);
 if (local_err != NULL) {
@@ -1622,10 +1618,12 @@ static Property riscv_cpu_extensions[] = {
  * Register generic CPU props with user-facing flags declared
  * in riscv_cpu_extensions[].
  *
- * Note that this will overwrite existing values in cpu->cfg.
+ * Note that this will overwrite existing values in cpu->cfg
+ * and MISA.
  */
 static void register_generic_cpu_props(Object *obj)
 {
+RISCVCPU *cpu = RISCV_CPU(obj);
 Property *prop;
 DeviceState *dev = DEVICE(obj);
 
@@ -1636,6 +1634,10 @@ static void register_generic_cpu_props(Object *obj)
 #ifndef CONFIG_USER_ONLY
 riscv_add_satp_mode_properties(obj);
 #endif
+
+/* Keep env->misa_ext and misa_ext_mask updated */
+cpu->env.misa_ext = riscv_get_misa_ext_with_cpucfg(&cpu->cfg);
+cpu->env.misa_ext_mask = cpu->env.misa_ext;
 }
 
 static Property riscv_cpu_properties[] = {
-- 
2.39.2

[PATCH for-8.1 v4 12/25] target/riscv/cpu.c: redesign register_cpu_props()

2023-03-22 Thread Daniel Henrique Barboza

Now that the function is a no-op if 'env.misa_ext != 0', and none of the
callers that set misa_ext != 0 call it anymore (set_misa() now sets the
cpu cfg accordingly), remove the now deprecated code and rename the
function to register_generic_cpu_props().

This function is now doing exactly what the name says: it is creating
user-facing properties to allow changes in the CPU cfg via the QEMU
command line, setting default values if no user input is provided.

Note that it is possible for a CPU to set a certain misa value and, at
the same time, also want user-facing flags and defaults from this
function. This has not been the case since commit 26b2bc58599c
("target/riscv: Don't expose the CPU properties on names CPUs"), but
given that it is still a possibility, clarify in the function that
using it will overwrite existing values in cpu->cfg.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 48 ++
 1 file changed, 10 insertions(+), 38 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index df5c0bda70..0e56a1c01f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -221,7 +221,7 @@ static const char * const riscv_intr_names[] = {
 "reserved"
 };
 
-static void register_cpu_props(Object *obj);
+static void register_generic_cpu_props(Object *obj);
 
 const char *riscv_cpu_get_trap_name(target_ulong cause, bool async)
 {
@@ -386,7 +386,7 @@ static void rv64_base_cpu_init(Object *obj)
 CPURISCVState *env = &RISCV_CPU(obj)->env;
 /* We set this in the realise function */
 set_misa(env, MXL_RV64, 0);
-register_cpu_props(obj);
+register_generic_cpu_props(obj);
 /* Set latest version of privileged specification */
 env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
@@ -472,7 +472,7 @@ static void rv128_base_cpu_init(Object *obj)
 CPURISCVState *env = &RISCV_CPU(obj)->env;
 /* We set this in the realise function */
 set_misa(env, MXL_RV128, 0);
-register_cpu_props(obj);
+register_generic_cpu_props(obj);
 /* Set latest version of privileged specification */
 env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
@@ -485,7 +485,7 @@ static void rv32_base_cpu_init(Object *obj)
 CPURISCVState *env = &RISCV_CPU(obj)->env;
 /* We set this in the realise function */
 set_misa(env, MXL_RV32, 0);
-register_cpu_props(obj);
+register_generic_cpu_props(obj);
 /* Set latest version of privileged specification */
 env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
@@ -572,7 +572,7 @@ static void riscv_host_cpu_init(Object *obj)
 #elif defined(TARGET_RISCV64)
 set_misa(env, MXL_RV64, 0);
 #endif
-register_cpu_props(obj);
+register_generic_cpu_props(obj);
 }
 #endif
 
@@ -1554,44 +1554,16 @@ static Property riscv_cpu_extensions[] = {
 };
 
 /*
- * Register CPU props based on env.misa_ext. If a non-zero
- * value was set, register only the required cpu->cfg.ext_*
- * properties and leave. env.misa_ext = 0 means that we want
- * all the default properties to be registered.
+ * Register generic CPU props with user-facing flags declared
+ * in riscv_cpu_extensions[].
+ *
+ * Note that this will overwrite existing values in cpu->cfg.
  */
-static void register_cpu_props(Object *obj)
+static void register_generic_cpu_props(Object *obj)
 {
-RISCVCPU *cpu = RISCV_CPU(obj);
-uint32_t misa_ext = cpu->env.misa_ext;
 Property *prop;
 DeviceState *dev = DEVICE(obj);
 
-/*
- * If misa_ext is not zero, set cfg properties now to
- * allow them to be read during riscv_cpu_realize()
- * later on.
- */
-if (cpu->env.misa_ext != 0) {
-cpu->cfg.ext_i = misa_ext & RVI;
-cpu->cfg.ext_e = misa_ext & RVE;
-cpu->cfg.ext_m = misa_ext & RVM;
-cpu->cfg.ext_a = misa_ext & RVA;
-cpu->cfg.ext_f = misa_ext & RVF;
-cpu->cfg.ext_d = misa_ext & RVD;
-cpu->cfg.ext_v = misa_ext & RVV;
-cpu->cfg.ext_c = misa_ext & RVC;
-cpu->cfg.ext_s = misa_ext & RVS;
-cpu->cfg.ext_u = misa_ext & RVU;
-cpu->cfg.ext_h = misa_ext & RVH;
-cpu->cfg.ext_j = misa_ext & RVJ;
-
-/*
- * We don't want to set the default riscv_cpu_extensions
- * in this case.
- */
-return;
-}
-
 for (prop = riscv_cpu_extensions; prop && prop->name; prop++) {
 qdev_property_add_static(dev, prop);
 }
-- 
2.39.2

RE: [PATCH] vfio/pci: add support for VF token

2023-03-22 Thread Minwoo Im

> On Mon, 20 Mar 2023 11:03:40 +0100
> Cédric Le Goater  wrote:
> 
> > On 3/20/23 08:35, Minwoo Im wrote:
> > > VF token was introduced [1] to kernel vfio-pci along with SR-IOV
> > > support [2].  This patch adds support for the VF token shared between a PF
> > > and its VF(s). Kernels >= v5.7 need this to pass a PCIe VF through to a VM.
> > >
> > > It can be configured with UUID like:
> > >
> > >-device vfio-pci,host=:BB:DD:F,vf-token=,...
> > >
> > > [1] https://lore.kernel.org/linux-pci/158396393244.5601.10297430724964025753.st...@gimli.home/
> > > [2] https://lore.kernel.org/linux-pci/158396044753.5601.14804870681174789709.st...@gimli.home/
> > >
> > > Cc: Alex Williamson 
> > > Signed-off-by: Minwoo Im 
> > > Reviewed-by: Klaus Jensen 
> > > ---
> > >   hw/vfio/pci.c | 13 -
> > >   hw/vfio/pci.h |  1 +
> > >   2 files changed, 13 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index ec9a854361..cf27f28936 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -2856,6 +2856,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> > >   int groupid;
> > >   int i, ret;
> > >   bool is_mdev;
> > > +char uuid[UUID_FMT_LEN];
> > > +char *name;
> > >
> > >   if (!vbasedev->sysfsdev) {
> > >   if (!(~vdev->host.domain || ~vdev->host.bus ||
> > > @@ -2936,7 +2938,15 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> > >   goto error;
> > >   }
> > >
> > > -ret = vfio_get_device(group, vbasedev->name, vbasedev, errp);
> > > +if (!qemu_uuid_is_null(&vdev->vf_token)) {
> > > +qemu_uuid_unparse(&vdev->vf_token, uuid);
> > > +name = g_strdup_printf("%s vf_token=%s", vbasedev->name, uuid);
> > > +} else {
> > > +name = vbasedev->name;
> > > +}
> > > +
> > > +ret = vfio_get_device(group, name, vbasedev, errp);
> > > +g_free(name);
> > >   if (ret) {
> > >   vfio_put_group(group);
> > >   goto error;
> >
> > Shouldn't we set the VF token in the kernel also ? See this QEMU 
> > implementation
> >
> >https://lore.kernel.org/lkml/20200204161737.34696...@w520.home/
> >
> > May be I misunderstood.
> >
> 
> I think you're referring to the part there that calls
> VFIO_DEVICE_FEATURE in order to set a VF token.  I don't think that's
> necessarily applicable here.  I believe this patch is only trying to
> make it so that QEMU can consume a VF associated with a PF owned by a
> userspace vfio driver, ie. not QEMU.

Yes, that's what this patch exactly does.

> 
> Setting the VF token is only relevant to PFs, which would require
> significantly more SR-IOV infrastructure in QEMU than sketched out in
> that proof-of-concept patch.  Even if we did have a QEMU owned PF where
> we wanted to generate VFs, the token we use in that case would likely
> need to be kept private to QEMU, not passed on the command line.
> Thanks,

For the case where QEMU owns a PF, could we also take the VF token as a
command line property on the PF?  I think whoever wants QEMU to own the PF
or VF should know the VF token.  If I've missed anything, please let me know.

Thanks!
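The patch passes the token to the kernel by handing vfio_get_device() the string `"<device> vf_token=<uuid>"` instead of the bare device name. A minimal sketch of that string construction, assuming a plain C helper (`vfio_device_open_name()` is hypothetical, and it takes the UUID already formatted as text):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build the name used to open a VFIO device.
 * With a non-empty token, append " vf_token=<uuid>"; otherwise copy the
 * plain name. Always returns a fresh malloc'd string (NULL on failure).
 */
static char *vfio_device_open_name(const char *dev_name, const char *vf_token)
{
    size_t len;
    char *out;

    if (vf_token && vf_token[0] != '\0') {
        len = strlen(dev_name) + strlen(" vf_token=") + strlen(vf_token) + 1;
        out = malloc(len);
        if (out) {
            snprintf(out, len, "%s vf_token=%s", dev_name, vf_token);
        }
    } else {
        len = strlen(dev_name) + 1;
        out = malloc(len);
        if (out) {
            memcpy(out, dev_name, len);
        }
    }
    return out;
}
```

Because the helper always allocates, the caller can free() the result unconditionally. In the quoted patch, the else branch aliases `vbasedev->name`, so the later `g_free(name)` would release the device's own name.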

[PATCH for-8.1 v4 05/25] target/riscv/cpu.c: add priv_spec validate/disable_exts helpers

2023-03-22 Thread Daniel Henrique Barboza

We're doing env->priv_spec validation and assignment at the start of
riscv_cpu_realize(), which is fine, but then we're doing a force disable
on extensions that aren't compatible with the priv version.

This second step is being done too early. The disabled extensions might be
re-enabled in riscv_cpu_validate_set_extensions() by accident. A
better place to put this code is at the end of
riscv_cpu_validate_set_extensions() after all the validations are
completed.

Add a new helper, riscv_cpu_disable_priv_spec_isa_exts(), to disable the
extensions after the validation is done. While we're at it, create a
riscv_cpu_validate_priv_spec() helper to host all env->priv_spec related
validation to unclog riscv_cpu_realize a bit.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/cpu.c | 91 --
 1 file changed, 56 insertions(+), 35 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1ee322001b..17b301967c 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -833,6 +833,52 @@ static void riscv_cpu_validate_v(CPURISCVState *env, RISCVCPUConfig *cfg,
 env->vext_ver = vext_version;
 }
 
+static void riscv_cpu_validate_priv_spec(RISCVCPU *cpu, Error **errp)
+{
+CPURISCVState *env = &cpu->env;
+int priv_version = -1;
+
+if (cpu->cfg.priv_spec) {
+if (!g_strcmp0(cpu->cfg.priv_spec, "v1.12.0")) {
+priv_version = PRIV_VERSION_1_12_0;
+} else if (!g_strcmp0(cpu->cfg.priv_spec, "v1.11.0")) {
+priv_version = PRIV_VERSION_1_11_0;
+} else if (!g_strcmp0(cpu->cfg.priv_spec, "v1.10.0")) {
+priv_version = PRIV_VERSION_1_10_0;
+} else {
+error_setg(errp,
+   "Unsupported privilege spec version '%s'",
+   cpu->cfg.priv_spec);
+return;
+}
+
+env->priv_ver = priv_version;
+}
+}
+
+static void riscv_cpu_disable_priv_spec_isa_exts(RISCVCPU *cpu)
+{
+CPURISCVState *env = &cpu->env;
+int i;
+
+/* Force disable extensions if priv spec version does not match */
+for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
+if (isa_ext_is_enabled(cpu, &isa_edata_arr[i]) &&
+(env->priv_ver < isa_edata_arr[i].min_version)) {
+isa_ext_update_enabled(cpu, &isa_edata_arr[i], false);
+#ifndef CONFIG_USER_ONLY
+warn_report("disabling %s extension for hart 0x" TARGET_FMT_lx
+" because privilege spec version does not match",
+isa_edata_arr[i].name, env->mhartid);
+#else
+warn_report("disabling %s extension because "
+"privilege spec version does not match",
+isa_edata_arr[i].name);
+#endif
+}
+}
+}
+
 /*
  * Check consistency between chosen extensions while setting
  * cpu->cfg accordingly, doing a set_misa() in the end.
@@ -1002,6 +1048,12 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 cpu->cfg.ext_zksh = true;
 }
 
+/*
+ * Disable isa extensions based on priv spec after we
+ * validated and set everything we need.
+ */
+riscv_cpu_disable_priv_spec_isa_exts(cpu);
+
 if (cpu->cfg.ext_i) {
 ext |= RVI;
 }
@@ -1131,7 +1183,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 CPURISCVState *env = &cpu->env;
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
 CPUClass *cc = CPU_CLASS(mcc);
-int i, priv_version = -1;
 Error *local_err = NULL;
 
 cpu_exec_realizefn(cs, &local_err);
@@ -1140,40 +1191,10 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-if (cpu->cfg.priv_spec) {
-if (!g_strcmp0(cpu->cfg.priv_spec, "v1.12.0")) {
-priv_version = PRIV_VERSION_1_12_0;
-} else if (!g_strcmp0(cpu->cfg.priv_spec, "v1.11.0")) {
-priv_version = PRIV_VERSION_1_11_0;
-} else if (!g_strcmp0(cpu->cfg.priv_spec, "v1.10.0")) {
-priv_version = PRIV_VERSION_1_10_0;
-} else {
-error_setg(errp,
-   "Unsupported privilege spec version '%s'",
-   cpu->cfg.priv_spec);
-return;
-}
-}
-
-if (priv_version >= PRIV_VERSION_1_10_0) {
-env->priv_ver = priv_version;
-}
-
-/* Force disable extensions if priv spec version does not match */
-for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
-if (isa_ext_is_enabled(cpu, &isa_edata_arr[i]) &&
-(env->priv_ver < isa_edata_arr[i].min_version)) {
-isa_ext_update_enabled(cpu, &isa_edata_arr[i], false);
-#ifndef CONFIG_USER_ONLY
-warn_report("disabling %s extension for hart 0x" TARGET_FMT_lx
-" because privilege spec version does not match",
-isa_edata_arr[i].name, env->mhartid);
-#else
-warn_report("disabling %s

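The split this patch introduces (parse priv_spec early, force-disable incompatible extensions only after every other validation has run) can be sketched outside QEMU as two small functions. Everything here is hypothetical: the enum stands in for PRIV_VERSION_*, `ExtEntry` is a simplified stand-in for isa_edata_arr[], the warn_report() calls are omitted, and callers are assumed to skip parsing when no priv_spec string was given.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PRIV_VERSION_*; only their ordering matters. */
enum { PRIV_1_10_0 = 0, PRIV_1_11_0, PRIV_1_12_0 };

/* Map a non-NULL priv_spec string to a version, or -1 if unsupported. */
static int parse_priv_spec(const char *spec)
{
    if (strcmp(spec, "v1.12.0") == 0) {
        return PRIV_1_12_0;
    } else if (strcmp(spec, "v1.11.0") == 0) {
        return PRIV_1_11_0;
    } else if (strcmp(spec, "v1.10.0") == 0) {
        return PRIV_1_10_0;
    }
    return -1;
}

/* Hypothetical, simplified extension table entry. */
typedef struct {
    const char *name;
    int min_version;
    bool enabled;
} ExtEntry;

/*
 * Meant to run last: anything still enabled that needs a newer spec than
 * priv_ver is switched off. Returns how many entries were disabled.
 */
static size_t disable_priv_spec_exts(ExtEntry *exts, size_t n, int priv_ver)
{
    size_t disabled = 0;

    for (size_t i = 0; i < n; i++) {
        if (exts[i].enabled && priv_ver < exts[i].min_version) {
            exts[i].enabled = false;
            disabled++;
        }
    }
    return disabled;
}
```

Running the disable pass last means no later validation step can turn one of these extensions back on, which is the ordering problem the commit message describes.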
Re: [PULL v2 for 8.0 00/35] various fixes (testing, plugins, gitdm)

2023-03-22 Thread Alex Bennée



Peter Maydell  writes:

> On Wed, 22 Mar 2023 at 16:33, Alex Bennée  wrote:
>>
>> The following changes since commit c283ff89d11ff123efc9af49128ef58511f73012:
>>
>>   Update version for v8.0.0-rc1 release (2023-03-21 17:15:43 +)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/stsquad/qemu.git tags/pull-for-8.0-220323-1
>>
>> for you to fetch changes up to e35b9a2e81ccce86db6f1417b1d73bb97d7cbc17:
>>
>>   qtests: avoid printing comments before g_test_init() (2023-03-22
>>   15:08:26 +)
>>
>> Note you will need to remove the old openbsd disk image to trigger a
>> rebuild that avoids the issues with -ENOSPC. My pipeline can be seen
>> here:
>>
>>   https://gitlab.com/stsquad/qemu/-/pipelines/814624909
>>
>> 
>> Misc fixes for 8.0 (testing, plugins, gitdm)
>>
>>   - update Alpine image used for testing images
>>   - include libslirp in custom runner build env
>>   - update gitlab-runner recipe for CentOS
>>   - update docker calls for better caching behaviour
>>   - document some plugin callbacks
>>   - don't use tags to define drives for lkft baseline tests
>>   - fix missing clear of plugin_mem_cbs
>>   - fix iotests to report individual results
>>   - update the gitdm metadata for contributors
>>   - avoid printing comments before g_test_init()
>>   - probe for multiprocess support before running avocado test
>>   - refactor igb.py into netdev-ethtool.py avocado test
>>   - rebuild openbsd to have more space for iotests
>
> I saw this on ppc64. I suspect it of being a pre-existing
> intermittent -- I'm retrying it.

On what platform is that?

>
> ▶ 737/761 qcow2 copy-before-write
>FAIL
> 737/761 qemu:block / io-qcow2-copy-before-write
>ERROR   6.80s   exit status 1
> ― ✀  ―
> stderr:
> --- /home/pm215/qemu/tests/qemu-iotests/tests/copy-before-write.out
> +++ /home/pm215/qemu/build/all/scratch/qcow2-file-copy-before-write/copy-before-write.out.bad
> @@ -1,5 +1,21 @@
> -
> +..F.
> +==
> +FAIL: test_timeout_break_guest (__main__.TestCbwError.test_timeout_break_guest)
> +--
> +Traceback (most recent call last):
> +  File "/home/pm215/qemu/tests/qemu-iotests/tests/copy-before-write", line 200, in test_timeout_break_guest
> +self.assertEqual(log, """\
> +AssertionError: 'wrot[90 chars])\nwrote 524288/524288 bytes at offset 524288\[151 chars]c)\n' != 'wrot[90 chars])\nwrite failed: Connection timed out\nread 10[85 chars]c)\n'
> +  wrote 524288/524288 bytes at offset 0
> +  512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> ++ write failed: Connection timed out
> +- wrote 524288/524288 bytes at offset 524288
> +- 512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +  read 1048576/1048576 bytes at offset 0
> +  1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
> +
> +
>  --
>  Ran 4 tests
>
> -OK
> +FAILED (failures=1)
>
> -- PMM


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro

[Bug 1703506] Re: SMT not supported by QEMU on AMD Ryzen CPU

2023-03-22 Thread Nelo

This affected me. Took me several days.


The solution posted by asd fghjkl (ryzen27) worked for me as well:

sudo nano /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1
and then rebooted

I'm very glad I found this thread. I don't know where to report this or
whether it's even a bug, but I hope it gets fixed!

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1703506

Title:
  SMT not supported by QEMU on AMD Ryzen CPU

Status in QEMU:
  Expired

Bug description:
  HyperThreading/SMT is supported by AMD Ryzen CPUs but results in this
  message when setting the topology to threads=2:

  qemu-system-x86_64: AMD CPU doesn't support hyperthreading. Please
  configure -smp options properly.

  Checking in a Windows 10 guest reveals that SMT is not enabled, and
  from what I understand, QEMU converts the topology from threads to
  cores internally on AMD CPUs. This appears to cause performance
  problems in the guest perhaps because programs are assuming that these
  threads are actual cores.

  Software: Linux 4.12, qemu 2.9.0 host with KVM enabled, Windows 10 pro
  guest

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1703506/+subscriptions

[PATCH for-8.1 v4 22/25] target/riscv: use misa_ext val in riscv_cpu_validate_extensions()

2023-03-22 Thread Daniel Henrique Barboza

Similar to what we did with riscv_cpu_validate_misa_ext(), let's read
all MISA bits from a misa_ext val instead of reading from the cpu->cfg
object.

This will allow write_misa() to use riscv_cpu_validate_extensions().

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ed02332093..0e6b8fb45e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1109,10 +1109,13 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
 }
 
 /*
- * Check consistency between chosen extensions. No changes
- * in env->misa_ext are made.
+ * Check consistency between cpu->cfg extensions and a
+ * candidate misa_ext value. No changes in env->misa_ext
+ * are made.
  */
-static void riscv_cpu_validate_extensions(RISCVCPU *cpu, Error **errp)
+static void riscv_cpu_validate_extensions(RISCVCPU *cpu,
+  uint32_t misa_ext,
+  Error **errp)
 {
 if (cpu->cfg.epmp && !cpu->cfg.pmp) {
 /*
@@ -1123,12 +1126,12 @@ static void riscv_cpu_validate_extensions(RISCVCPU *cpu, Error **errp)
 return;
 }
 
-if (cpu->cfg.ext_f && !cpu->cfg.ext_icsr) {
+if (misa_ext & RVF && !cpu->cfg.ext_icsr) {
 error_setg(errp, "F extension requires Zicsr");
 return;
 }
 
-if ((cpu->cfg.ext_zawrs) && !cpu->cfg.ext_a) {
+if ((cpu->cfg.ext_zawrs) && !(misa_ext & RVA)) {
 error_setg(errp, "Zawrs extension requires A extension");
 return;
 }
@@ -1137,13 +1140,13 @@ static void riscv_cpu_validate_extensions(RISCVCPU *cpu, Error **errp)
 cpu->cfg.ext_zfhmin = true;
 }
 
-if (cpu->cfg.ext_zfhmin && !cpu->cfg.ext_f) {
+if (cpu->cfg.ext_zfhmin && !(misa_ext & RVF)) {
 error_setg(errp, "Zfh/Zfhmin extensions require F extension");
 return;
 }
 
 /* The V vector extension depends on the Zve64d extension */
-if (cpu->cfg.ext_v) {
+if (misa_ext & RVV) {
 cpu->cfg.ext_zve64d = true;
 }
 
@@ -1157,12 +1160,12 @@ static void riscv_cpu_validate_extensions(RISCVCPU *cpu, Error **errp)
 cpu->cfg.ext_zve32f = true;
 }
 
-if (cpu->cfg.ext_zve64d && !cpu->cfg.ext_d) {
+if (cpu->cfg.ext_zve64d && !(misa_ext & RVD)) {
 error_setg(errp, "Zve64d/V extensions require D extension");
 return;
 }
 
-if (cpu->cfg.ext_zve32f && !cpu->cfg.ext_f) {
+if (cpu->cfg.ext_zve32f && !(misa_ext & RVF)) {
 error_setg(errp, "Zve32f/Zve64f extensions require F extension");
 return;
 }
@@ -1195,7 +1198,7 @@ static void riscv_cpu_validate_extensions(RISCVCPU *cpu, Error **errp)
 error_setg(errp, "Zfinx extension requires Zicsr");
 return;
 }
-if (cpu->cfg.ext_f) {
+if (misa_ext & RVF) {
 error_setg(errp,
"Zfinx cannot be supported together with F extension");
 return;
@@ -1367,7 +1370,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-riscv_cpu_validate_extensions(cpu, &local_err);
+riscv_cpu_validate_extensions(cpu, env->misa_ext, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
 return;
-- 
2.39.2
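The point of the new `misa_ext` parameter is that MISA letters are read from a candidate value, while Z* flags still come from cpu->cfg, so the same check can vet a value before it is committed. A reduced sketch of that shape, covering only a few of the checks shown in the diff (`ZCfg` and `validate_with_misa()` are hypothetical names; the implicit Zve64d enablement from V is omitted):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RV(x) ((uint32_t)1 << ((x) - 'A'))
#define RVA RV('A')
#define RVD RV('D')
#define RVF RV('F')

/* Hypothetical subset of the non-MISA flags that stay in cpu->cfg. */
typedef struct {
    bool ext_icsr, ext_zawrs, ext_zfinx, ext_zve64d;
} ZCfg;

/*
 * Return NULL if 'misa_ext' is consistent with 'cfg', else an error
 * message. MISA letters are taken only from the candidate value.
 */
static const char *validate_with_misa(const ZCfg *cfg, uint32_t misa_ext)
{
    if ((misa_ext & RVF) && !cfg->ext_icsr) {
        return "F extension requires Zicsr";
    }
    if (cfg->ext_zawrs && !(misa_ext & RVA)) {
        return "Zawrs extension requires A extension";
    }
    if (cfg->ext_zve64d && !(misa_ext & RVD)) {
        return "Zve64d/V extensions require D extension";
    }
    if (cfg->ext_zfinx && (misa_ext & RVF)) {
        return "Zfinx cannot be supported together with F extension";
    }
    return NULL;
}
```

A caller such as write_misa() could pass the value being written and reject it on a non-NULL result, leaving env->misa_ext untouched.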

[PATCH for-8.1 v4 14/25] target/riscv: add RVG

2023-03-22 Thread Daniel Henrique Barboza

The 'G' bit in misa_ext is a virtual extension that enables a set of
extensions (i, m, a, f, d, icsr and ifencei). We already have code to
handle it but no bit definition. Add it.

Add RVG to set_misa() in rv64_thead_c906_cpu_init() and remove the
manual cpu->cfg.ext_g assignment while we're at it.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 8 ++--
 target/riscv/cpu.h | 1 +
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c4f18d0436..f41888baa0 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -274,6 +274,9 @@ static uint32_t riscv_get_misa_ext_with_cpucfg(RISCVCPUConfig *cfg)
 if (cfg->ext_j) {
 ext |= RVJ;
 }
+if (cfg->ext_g) {
+ext |= RVG;
+}
 
 return ext;
 }
@@ -293,6 +296,7 @@ static void riscv_set_cpucfg_with_misa(RISCVCPUConfig *cfg,
 cfg->ext_u = misa_ext & RVU;
 cfg->ext_h = misa_ext & RVH;
 cfg->ext_j = misa_ext & RVJ;
+cfg->ext_g = misa_ext & RVG;
 }
 
 static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
@@ -474,10 +478,10 @@ static void rv64_thead_c906_cpu_init(Object *obj)
 CPURISCVState *env = &RISCV_CPU(obj)->env;
 RISCVCPU *cpu = RISCV_CPU(obj);
 
-set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
+set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD |
+RVC | RVS | RVU | RVG);
 env->priv_ver = PRIV_VERSION_1_11_0;
 
-cpu->cfg.ext_g = true;
 cpu->cfg.ext_icsr = true;
 cpu->cfg.ext_zfh = true;
 cpu->cfg.mmu = true;
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 2263629332..dbb4df9df0 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -82,6 +82,7 @@
 #define RVU RV('U')
 #define RVH RV('H')
 #define RVJ RV('J')
+#define RVG RV('G')
 
 
 /* Privileged specification version */
-- 
2.39.2
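With the `RV()` encoding, each letter maps to bit `letter - 'A'`, so the new RVG is bit 6 (0x40). The sketch below decodes a misa extension value back into its letters; `misa_ext_to_letters()` is a hypothetical helper, and `RV()` is re-declared on `uint32_t` instead of `target_ulong`.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as RV() in target/riscv/cpu.h, on uint32_t. */
#define RV(x) ((uint32_t)1 << ((x) - 'A'))

/*
 * Hypothetical helper: write the letters set in 'ext' into 'buf' in bit
 * (alphabetical) order. 'buf' must hold at least 27 bytes.
 */
static void misa_ext_to_letters(uint32_t ext, char *buf)
{
    size_t n = 0;

    for (int bit = 0; bit < 26; bit++) {
        if (ext & ((uint32_t)1 << bit)) {
            buf[n++] = (char)('A' + bit);
        }
    }
    buf[n] = '\0';
}
```

For the value rv64_thead_c906_cpu_init() now passes to set_misa() (I, M, A, F, D, C, S, U and G), the raw mask is 0x14116D and the helper yields "ACDFGIMSU".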

Re: [PATCH RESEND v2] hw/i2c: Enable an id for the pca954x devices

2023-03-22 Thread Corey Minyard

On Wed, Mar 22, 2023 at 10:21:36AM -0700, Patrick Venture wrote:
> This allows the devices to be more readily found and specified.
> Without setting the name field, they can only be found by device type
> name, which doesn't let you specify the second of the same device type
> behind a bus.
> 
> Tested: Verified that by default the device was findable with the name
> 'pca954x[77]', for an instance attached at that address.

This looks good to me.

Acked-by: Corey Minyard 

if you are taking this in through another tree.  Or do you want me to
take this?

-corey

> 
> Signed-off-by: Patrick Venture 
> Reviewed-by: Hao Wu 
> Reviewed-by: Philippe Mathieu-Daudé 
> ---
> v2: s/id/name/g to use name as the identifier field. left 'id' in subject for 
> email chain.
> ---
>  hw/i2c/i2c_mux_pca954x.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/hw/i2c/i2c_mux_pca954x.c b/hw/i2c/i2c_mux_pca954x.c
> index 3945de795c..76e69bebc5 100644
> --- a/hw/i2c/i2c_mux_pca954x.c
> +++ b/hw/i2c/i2c_mux_pca954x.c
> @@ -20,6 +20,7 @@
>  #include "hw/i2c/i2c_mux_pca954x.h"
>  #include "hw/i2c/smbus_slave.h"
>  #include "hw/qdev-core.h"
> +#include "hw/qdev-properties.h"
>  #include "hw/sysbus.h"
>  #include "qemu/log.h"
>  #include "qemu/module.h"
> @@ -43,6 +44,8 @@ typedef struct Pca954xState {
>  
>  bool enabled[PCA9548_CHANNEL_COUNT];
>  I2CBus *bus[PCA9548_CHANNEL_COUNT];
> +
> +char *name;
>  } Pca954xState;
>  
>  /*
> @@ -181,6 +184,17 @@ static void pca9548_class_init(ObjectClass *klass, void *data)
>  s->nchans = PCA9548_CHANNEL_COUNT;
>  }
>  
> +static void pca954x_realize(DeviceState *dev, Error **errp)
> +{
> +Pca954xState *s = PCA954X(dev);
> +DeviceState *d = DEVICE(s);
> +if (s->name) {
> +d->id = g_strdup(s->name);
> +} else {
> +d->id = g_strdup_printf("pca954x[%x]", s->parent.i2c.address);
> +}
> +}
> +
>  static void pca954x_init(Object *obj)
>  {
>  Pca954xState *s = PCA954X(obj);
> @@ -197,6 +211,11 @@ static void pca954x_init(Object *obj)
>  }
>  }
>  
> +static Property pca954x_props[] = {
> +DEFINE_PROP_STRING("name", Pca954xState, name),
> +DEFINE_PROP_END_OF_LIST()
> +};
> +
>  static void pca954x_class_init(ObjectClass *klass, void *data)
>  {
>  I2CSlaveClass *sc = I2C_SLAVE_CLASS(klass);
> @@ -209,9 +228,12 @@ static void pca954x_class_init(ObjectClass *klass, void *data)
>  rc->phases.enter = pca954x_enter_reset;
>  
>  dc->desc = "Pca954x i2c-mux";
> +dc->realize = pca954x_realize;
>  
>  k->write_data = pca954x_write_data;
>  k->receive_byte = pca954x_read_byte;
> +
> +device_class_set_props(dc, pca954x_props);
>  }
>  
>  static const TypeInfo pca954x_info[] = {
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
> 
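
The Tested: line above implies a simple naming rule. A plain-C sketch of it follows, assuming the same format string as the patch ("pca954x[%x]"); pca954x_default_id() is a hypothetical helper, not part of QEMU. Because %x prints the I2C address in hex without a prefix, a mux at 0x77 gets the id pca954x[77].

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the patch's id rule: use the name string when
 * one was given, otherwise derive "pca954x[<hex address>]". */
static void pca954x_default_id(char *buf, size_t len,
                               const char *name, unsigned address)
{
    if (name) {
        snprintf(buf, len, "%s", name);
    } else {
        snprintf(buf, len, "pca954x[%x]", address);
    }
}
```

One design consequence worth noting: an address-only default may give two muxes at the same address on different buses the same id, which is where an explicit name helps.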



[PATCH] hw/arm/virt: support both pl011 and 16550 uart

2023-03-22 Thread Patrick Venture

From: Shu-Chun Weng 

Allow the UART model of the virt machine to be selected between pl011
(the default) and ns16550a with -M virt,uart={pl011|ns16550a}.

Signed-off-by: Shu-Chun Weng 
Signed-off-by: Patrick Venture 
---
 hw/arm/virt.c | 85 ++-
 include/hw/arm/virt.h |  6 +++
 2 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ac626b3bef..84b335a5d7 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -80,6 +80,7 @@
 #include "hw/virtio/virtio-iommu.h"
 #include "hw/char/pl011.h"
 #include "qemu/guest-random.h"
+#include "hw/char/serial.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
 static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -847,8 +848,37 @@ static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
 }
 }
 
-static void create_uart(const VirtMachineState *vms, int uart,
-MemoryRegion *mem, Chardev *chr)
+static void create_uart_ns16550a(const VirtMachineState *vms, int uart,
+ MemoryRegion *mem, Chardev *chr)
+{
+char *nodename;
+hwaddr base = vms->memmap[uart].base;
+hwaddr size = vms->memmap[uart].size;
+int irq = vms->irqmap[uart];
+const char compat[] = "ns16550a";
+
+serial_mm_init(mem, base, 0,
+   qdev_get_gpio_in(vms->gic, irq), 19200, chr,
+   DEVICE_LITTLE_ENDIAN);
+
+nodename = g_strdup_printf("/serial@%" PRIx64, base);
+
+MachineState *ms = MACHINE(vms);
+
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop(ms->fdt, nodename, "compatible", compat, sizeof(compat));
+qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg", 2, base, 2, size);
+qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "clock-frequency",
+ 1, 0x825f0);
+qemu_fdt_setprop_cells(ms->fdt, nodename, "interrupts",
+   GIC_FDT_IRQ_TYPE_SPI, irq,
+   GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+
+g_free(nodename);
+}
+
+static void create_uart_pl011(const VirtMachineState *vms, int uart,
+  MemoryRegion *mem, Chardev *chr)
 {
 char *nodename;
 hwaddr base = vms->memmap[uart].base;
@@ -895,6 +925,16 @@ static void create_uart(const VirtMachineState *vms, int uart,
 g_free(nodename);
 }
 
+static void create_uart(const VirtMachineState *vms, int uart,
+MemoryRegion *mem, Chardev *chr)
+{
+if (vms->uart == UART_NS16550A) {
+create_uart_ns16550a(vms, uart, mem, chr);
+} else {
+create_uart_pl011(vms, uart, mem, chr);
+}
+}
+
 static void create_rtc(const VirtMachineState *vms)
 {
 char *nodename;
@@ -2601,6 +2641,39 @@ static void virt_set_gic_version(Object *obj, const char *value, Error **errp)
 }
 }
 
+static char *virt_get_uart_type(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+const char *val = NULL;
+
+switch (vms->uart) {
+case UART_PL011:
+val = "pl011";
+break;
+case UART_NS16550A:
+val = "ns16550a";
+break;
+default:
+error_setg(errp, "Invalid uart value");
+}
+
+return g_strdup(val);
+}
+
+static void virt_set_uart_type(Object *obj, const char *value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+if (!strcmp(value, "pl011")) {
+vms->uart = UART_PL011;
+} else if (!strcmp(value, "ns16550a")) {
+vms->uart = UART_NS16550A;
+} else {
+error_setg(errp, "Invalid uart type");
+error_append_hint(errp, "Valid values are pl011 and ns16550a.\n");
+}
+}
+
 static char *virt_get_iommu(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -3172,6 +3245,14 @@ static void virt_instance_init(Object *obj)
 vms->highmem_compact = !vmc->no_highmem_compact;
 vms->gic_version = VIRT_GIC_VERSION_NOSEL;
 
+/* Default uart type is pl011 */
+vms->uart = UART_PL011;
+object_property_add_str(obj, "uart", virt_get_uart_type,
+virt_set_uart_type);
+object_property_set_description(obj, "uart",
+"Set uart type. "
+"Valid values are pl011 and ns16550a");
+
 vms->highmem_ecam = !vmc->no_highmem_ecam;
 vms->highmem_mmio = true;
 vms->highmem_redists = true;
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index e1ddbea96b..04539f347d 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -122,6 +122,11 @@ typedef enum VirtGICType {
 #define VIRT_GIC_VERSION_3_MASK BIT(VIRT_GIC_VERSION_3)
 #define VIRT_GIC_VERSION_4_MASK BIT(VIRT_GIC_VERSION_4)
 
+typedef enum UartType {
+UART_PL011,
+UART_NS16550A,
+} UartType;
+
 struct VirtMachineClass {
 MachineClass parent;
 bool disallow_affinity_adjustment;
@@ -183,6 +188,7 @@ struct VirtMachineState {
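
The new uart machine property boils down to a string-to-enum mapping. The sketch below reproduces that mapping in plain C so it can be exercised outside QEMU; parse_uart_type() and uart_type_name() are hypothetical stand-ins for virt_set_uart_type() and virt_get_uart_type(), and the -1 return replaces QEMU's Error ** reporting.

```c
#include <string.h>

/* Mirrors the patch's UartType enum. */
typedef enum UartType {
    UART_PL011,
    UART_NS16550A,
} UartType;

/* Plain-C model of virt_set_uart_type(): store the type and return 0,
 * or return -1 for an unknown value (leaving *out untouched). */
static int parse_uart_type(const char *value, UartType *out)
{
    if (!strcmp(value, "pl011")) {
        *out = UART_PL011;
    } else if (!strcmp(value, "ns16550a")) {
        *out = UART_NS16550A;
    } else {
        return -1;
    }
    return 0;
}

/* Inverse mapping, in the spirit of virt_get_uart_type(). */
static const char *uart_type_name(UartType t)
{
    switch (t) {
    case UART_PL011:
        return "pl011";
    case UART_NS16550A:
        return "ns16550a";
    }
    return NULL;
}
```

With the patch applied, the equivalent selection on the command line would be -M virt,uart=ns16550a.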

Re: [PULL v2 for 8.0 00/35] various fixes (testing, plugins, gitdm)

2023-03-22 Thread Peter Maydell

On Wed, 22 Mar 2023 at 21:54, Alex Bennée  wrote:
>
>
> Peter Maydell  writes:
>
> > On Wed, 22 Mar 2023 at 16:33, Alex Bennée  wrote:
> >>
> >> The following changes since commit 
> >> c283ff89d11ff123efc9af49128ef58511f73012:
> >>
> >>   Update version for v8.0.0-rc1 release (2023-03-21 17:15:43 +)
> >>
> >> are available in the Git repository at:
> >>
> >>   https://gitlab.com/stsquad/qemu.git tags/pull-for-8.0-220323-1
> >>
> >> for you to fetch changes up to e35b9a2e81ccce86db6f1417b1d73bb97d7cbc17:
> >>
> >>   qtests: avoid printing comments before g_test_init() (2023-03-22
> >>   15:08:26 +)
> >>
> >> Note you will need to remove the old openbsd disk image to trigger a
> >> rebuild that avoids the issues with -ENOSPC. My pipeline can be seen
> >> here:
> >>
> >>   https://gitlab.com/stsquad/qemu/-/pipelines/814624909
> >>
> >> 
> >> Misc fixes for 8.0 (testing, plugins, gitdm)
> >>
> >>   - update Alpine image used for testing images
> >>   - include libslirp in custom runner build env
> >>   - update gitlab-runner recipe for CentOS
> >>   - update docker calls for better caching behaviour
> >>   - document some plugin callbacks
> >>   - don't use tags to define drives for lkft baseline tests
> >>   - fix missing clear of plugin_mem_cbs
> >>   - fix iotests to report individual results
> >>   - update the gitdm metadata for contributors
> >>   - avoid printing comments before g_test_init()
> >>   - probe for multiprocess support before running avocado test
> >>   - refactor igb.py into netdev-ethtool.py avocado test
> >>   - rebuild openbsd to have more space for iotests
> >
> > I saw this on ppc64. I suspect it of being a pre-existing
> > intermittent -- I'm retrying it.
>
> On what platform is that?

ppc64be Linux (it's one of the gcc compile farm machines).
It was indeed intermittent, in that it didn't happen on retry. So:

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.

-- PMM

Re: [PATCH v2 6/7] target/arm: Implement v8.3 FPAC and FPACCOMBINE

2023-03-22 Thread Richard Henderson


On 3/22/23 13:33, Aaron Lindsay wrote:

> On Feb 22 11:37, Richard Henderson wrote:
>> On 2/22/23 09:35, Aaron Lindsay wrote:
>>> @@ -406,6 +421,16 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
>>>   uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
>>>   ~MAKE_64BIT_MASK(55, 1);
>>>   result = ((ptr ^ pac) & xor_mask) | (ptr & ~xor_mask);
>>> +if (cpu_isar_feature(aa64_fpac_combine, env_archcpu(env)) ||
>>> +(cpu_isar_feature(aa64_fpac, env_archcpu(env)) &&
>>> + !is_combined)) {
>>
>> Indentation is off.
>
> I pulled `env_archcpu(env)` out of this if-statement in my latest
> patchset in addition to the indentation, but am not confident I have
> done what you intended. The QEMU Coding Style guide doesn't seem to
> address longer statements like this in its section on indentation, so I
> attempted to follow other examples in the code, but I'll take further
> direction here.



if (function(a) ||
    (function(b) &&
     function(c))) {
    ...
1234567890


r~
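
Independent of the indentation question, the condition being formatted decides when a failed pointer authentication should fault. As read from the quoted hunk, FEAT_FPACCOMBINE makes every failure fault, while FEAT_FPAC alone exempts combined instructions (those that authenticate and then branch or load in one step). The hypothetical helper below restates that logic with continuation lines aligned to the parenthesis they belong to, as in the example above; it is not QEMU code.

```c
#include <stdbool.h>

/* Hypothetical restatement of the pauth_auth() condition:
 * should a failed authentication raise a fault? */
static bool pauth_fail_should_fault(bool have_fpac, bool have_fpac_combine,
                                    bool is_combined)
{
    return have_fpac_combine ||
           (have_fpac &&
            !is_combined);
}
```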

[PATCH for-8.1 v4 09/25] target/riscv/cpu.c: remove cfg setup from riscv_cpu_init()

2023-03-22 Thread Daniel Henrique Barboza

We have 4 config settings being done in riscv_cpu_init(): ext_ifencei,
ext_icsr, mmu and pmp. This is also the constructor of the "riscv-cpu"
device, which happens to be the parent device of every RISC-V cpu.

The result is that these 4 configs are being set every time, and every
other CPU should always account for them. CPUs such as sifive_e need to
disable settings they don't support simply because the parent class
happens to enable them.

Moving all configurations from the parent class to each CPU will
centralize the config of each CPU into its own init(), which is clearer
than having to account for whatever happens to be set in the parent
device. These settings are also being set in register_cpu_props() when
no 'misa_ext' is set, so for these CPUs we don't need changes. Named
CPUs will receive all cfgs that the parent was setting into their
init().

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 60 --
 1 file changed, 48 insertions(+), 12 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index fef55d7d79..c7b6e7b84b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -325,7 +325,8 @@ static void set_satp_mode_default_map(RISCVCPU *cpu)
 
 static void riscv_any_cpu_init(Object *obj)
 {
-CPURISCVState *env = &RISCV_CPU(obj)->env;
+RISCVCPU *cpu = RISCV_CPU(obj);
+CPURISCVState *env = &cpu->env;
 #if defined(TARGET_RISCV32)
 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVU);
 #elif defined(TARGET_RISCV64)
@@ -340,6 +341,12 @@ static void riscv_any_cpu_init(Object *obj)
 
 env->priv_ver = PRIV_VERSION_LATEST;
 register_cpu_props(obj);
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.mmu = true;
+cpu->cfg.pmp = true;
 }
 
 #if defined(TARGET_RISCV64)
@@ -358,13 +365,20 @@ static void rv64_base_cpu_init(Object *obj)
 
 static void rv64_sifive_u_cpu_init(Object *obj)
 {
-CPURISCVState *env = &RISCV_CPU(obj)->env;
+RISCVCPU *cpu = RISCV_CPU(obj);
+CPURISCVState *env = &cpu->env;
 set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
 register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV39);
 #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.mmu = true;
+cpu->cfg.pmp = true;
 }
 
 static void rv64_sifive_e_cpu_init(Object *obj)
@@ -375,10 +389,14 @@ static void rv64_sifive_e_cpu_init(Object *obj)
 set_misa(env, MXL_RV64, RVI | RVM | RVA | RVC | RVU);
 register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_10_0;
-cpu->cfg.mmu = false;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
 #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.pmp = true;
 }
 
 static void rv64_thead_c906_cpu_init(Object *obj)
@@ -411,6 +429,10 @@ static void rv64_thead_c906_cpu_init(Object *obj)
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_SV39);
 #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.pmp = true;
 }
 
 static void rv128_base_cpu_init(Object *obj)
@@ -447,7 +469,8 @@ static void rv32_base_cpu_init(Object *obj)
 
 static void rv32_sifive_u_cpu_init(Object *obj)
 {
-CPURISCVState *env = &RISCV_CPU(obj)->env;
+RISCVCPU *cpu = RISCV_CPU(obj);
+CPURISCVState *env = &cpu->env;
 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
 
 register_cpu_props(obj);
@@ -455,6 +478,12 @@ static void rv32_sifive_u_cpu_init(Object *obj)
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV32);
 #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.mmu = true;
+cpu->cfg.pmp = true;
 }
 
 static void rv32_sifive_e_cpu_init(Object *obj)
@@ -465,10 +494,14 @@ static void rv32_sifive_e_cpu_init(Object *obj)
 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVC | RVU);
 register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_10_0;
-cpu->cfg.mmu = false;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
 #endif
+
+/* inherited from parent obj via riscv_cpu_init() */
+cpu->cfg.ext_ifencei = true;
+cpu->cfg.ext_icsr = true;
+cpu->cfg.pmp = true;
 }
 
 static void rv32_ibex_cpu_init(Object *obj)
@@ -479,11 +512,15 @@ static void rv32_ibex_cpu_init(Object *obj)
 set_misa(env, MXL_RV32, RVI | RVM | RVC | RVU);
 register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_11_0;
-cpu->cfg.mmu = false;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu,

Re: [PATCH] hw/acpi/cxl: Drop device-memory support from CFMWS entries

2023-03-22 Thread Dan Williams

Jonathan Cameron wrote:
> On Mon, 20 Mar 2023 23:08:31 -0700
> Dan Williams  wrote:
> 
> > While it was a reasonable idea to specify no window restrictions at the
> > outset of the CXL emulation support, it turns out that in practice a
> > platform will never follow the QEMU example of specifying simultaneous
> > support for HDM-H and HDM-D[B] in a single window.
> > 
> > HDM-D mandates extra bus cycles for host/device bias protocol, and HDM-DB
> > mandates extra bus cycles for back-invalidate protocol, so hardware must
> > be explicitly prepared for device-memory unlike host-only memory
> > (HDM-H).
> > 
> > In preparation for the kernel dropping support for windows that do not
> > select between device and host-only memory, move QEMU exclusively to
> > declaring host-only windows.
> > 
> > Signed-off-by: Dan Williams 
> Hi Dan,
> 
> Can we have some spec references? I think the Protocol tables in
> appendix C would work for that - but more specific examples called
> out from them would be good.

I presume our messages crossed in the ether:

https://lore.kernel.org/linux-cxl/641a018ed7fb8_269929...@dwillia2-xfh.jf.intel.com.notmuch/

...but yes, if I were still committed to this change, which I am not at
this point, some references from Appendix C and "3.3.11 Forward Progress
and Ordering Rules" would be appropriate.

> I'm also not totally convinced it isn't a host implementation detail
> - key here is that host bridge's are still part of the host so can
> do impdef stuff as long as they look correct to CXL side and to
> host side.
> 
> Time for some wild implementation conjecturing.
> 
> Imagine a host that has host bridges of above average smartness.
> Those host bridges have HDM decoders (this doesn't work if not)
> 
> Host is using a single HPA window for HDM-D[B] and HDM-H.
> The host bridge knows the target is HDM-H - it can get that from
> the HDM decoder Target Type bit etc.  The HB can send (to the
> rest of the Host) whatever replies are necessary / fill in extra
> info to make it look like HDM-D[B] to the host interconnect protocol.
> 
> (after some fun with a white board we think you can make this efficient
>  by potentially making the Host bridge HDM decoder setup visible to
>  other parts of the host - relatively easy given lots of time allowed
>  for a decoder commit).
> 
> Why would you do this?  Limited HPA space availability on the host
> and wanting to be very flexible about use of the CXL windows.

Limited HPA space and fewer decode rules at the root level indeed sounds
compelling.

> Obviously this is all moot if there is a constraint we can point to
> in a specification.

At this time I don't have such a reference.

> BTW. I'm carrying a patch (it's on the gitlab tree) that I haven't
> posted yet that lets you configure this restriction at runtime as
> a similar potential host implementation restriction occurs for
> PMEM vs Volatile.  That is also needed to exercise the fun corners of
> QTG etc.

Sounds good.
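
For reference, the restriction under discussion lives in the Window Restrictions field of each CFMWS entry. The sketch below assumes the bit order from the CXL specification's CFMWS definition (bit 0 device-coherent memory, i.e. HDM-D/HDM-DB; bit 1 host-only coherent memory, i.e. HDM-H; then volatile, persistent and fixed device configuration). The macro and function names are illustrative only, not taken from QEMU or Linux. It checks whether a window mixes both coherence types and shows what a host-only declaration would look like, even though, as noted above, the change is no longer being pursued.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the CFMWS Window Restrictions field. */
#define CFMWS_DEVICE_COHERENT    (1u << 0) /* HDM-D / HDM-DB */
#define CFMWS_HOST_ONLY_COHERENT (1u << 1) /* HDM-H */
#define CFMWS_VOLATILE           (1u << 2)
#define CFMWS_PERSISTENT         (1u << 3)
#define CFMWS_FIXED_CONFIG       (1u << 4)

/* True when one window advertises both device-coherent and
 * host-only memory. */
static bool cfmws_mixes_coherence(uint16_t restrictions)
{
    return (restrictions & CFMWS_DEVICE_COHERENT) &&
           (restrictions & CFMWS_HOST_ONLY_COHERENT);
}

/* Drop device-coherent support, keep host-only and the other bits. */
static uint16_t cfmws_host_only(uint16_t restrictions)
{
    return (uint16_t)((restrictions & ~CFMWS_DEVICE_COHERENT) |
                      CFMWS_HOST_ONLY_COHERENT);
}
```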

[PATCH] Hexagon (translate.c): avoid redundant PC updates on COF

2023-03-22 Thread Matheus Tavares Bernardino

When there is a conditional change of flow or an endloop instruction, we
preload HEX_REG_PC with ctx->next_PC at gen_start_packet(). Nonetheless,
we still generate TCG code to do this update again at gen_goto_tb() when
the condition for the COF is not met, thus producing redundant
instructions. This can be seen with the following packet:

 0x004002e4:  0x5c20d000 {   if (!P0) jump:t PC+0 }

Which generates this TCG code:

    004002e4
-> mov_i32 pc,$0x4002e8
   and_i32 loc9,p0,$0x1
   mov_i32 branch_taken,loc9
   add_i32 pkt_cnt,pkt_cnt,$0x2
   add_i32 insn_cnt,insn_cnt,$0x2
   brcond_i32 branch_taken,$0x0,ne,$L1
   goto_tb $0x0
   mov_i32 pc,$0x4002e4
   exit_tb $0x7fb0c36e5200
   set_label $L1
   goto_tb $0x1
-> mov_i32 pc,$0x4002e8
   exit_tb $0x7fb0c36e5201
   set_label $L0
   exit_tb $0x7fb0c36e5203

Note that even after optimizations, the redundant PC update is still
present:

    004002e4
-> mov_i32 pc,$0x4002e8 sync: 0  dead: 0 1  pref=0x
   mov_i32 branch_taken,$0x1sync: 0  dead: 0 1  pref=0x
   add_i32 pkt_cnt,pkt_cnt,$0x2 sync: 0  dead: 0 1  pref=0x
   add_i32 insn_cnt,insn_cnt,$0x2   sync: 0  dead: 0 1 2  pref=0x
   goto_tb $0x1
-> mov_i32 pc,$0x4002e8 sync: 0  dead: 0 1  pref=0x
   exit_tb $0x7fb0c36e5201
   set_label $L0
   exit_tb $0x7fb0c36e5203

With this patch, the second redundant update is properly discarded.

Note that we need the additional "move_to_pc" flag instead of just
avoiding the update whenever `dest == ctx->next_PC`, as that could
potentially skip updates from a COF with met condition, whose
ctx->branch_dest just happens to be equal to ctx->next_PC.

Signed-off-by: Matheus Tavares Bernardino 
---
 target/hexagon/translate.c | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git target/hexagon/translate.c target/hexagon/translate.c
index 665476ab48..58d638f734 100644
--- target/hexagon/translate.c
+++ target/hexagon/translate.c
@@ -128,14 +128,19 @@ static bool use_goto_tb(DisasContext *ctx, target_ulong dest)
 return translator_use_goto_tb(&ctx->base, dest);
 }
 
-static void gen_goto_tb(DisasContext *ctx, int idx, target_ulong dest)
+static void gen_goto_tb(DisasContext *ctx, int idx, target_ulong dest, bool
+move_to_pc)
 {
 if (use_goto_tb(ctx, dest)) {
 tcg_gen_goto_tb(idx);
-tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], dest);
+if (move_to_pc) {
+tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], dest);
+}
 tcg_gen_exit_tb(ctx->base.tb, idx);
 } else {
-tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], dest);
+if (move_to_pc) {
+tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], dest);
+}
 tcg_gen_lookup_and_goto_ptr();
 }
 }
@@ -150,11 +155,11 @@ static void gen_end_tb(DisasContext *ctx)
 if (ctx->branch_cond != TCG_COND_ALWAYS) {
 TCGLabel *skip = gen_new_label();
 tcg_gen_brcondi_tl(ctx->branch_cond, hex_branch_taken, 0, skip);
-gen_goto_tb(ctx, 0, ctx->branch_dest);
+gen_goto_tb(ctx, 0, ctx->branch_dest, true);
 gen_set_label(skip);
-gen_goto_tb(ctx, 1, ctx->next_PC);
+gen_goto_tb(ctx, 1, ctx->next_PC, false);
 } else {
-gen_goto_tb(ctx, 0, ctx->branch_dest);
+gen_goto_tb(ctx, 0, ctx->branch_dest, true);
 }
 } else if (ctx->is_tight_loop &&
pkt->insn[pkt->num_insns - 1].opcode == J2_endloop0) {
@@ -165,9 +170,9 @@ static void gen_end_tb(DisasContext *ctx)
 TCGLabel *skip = gen_new_label();
 tcg_gen_brcondi_tl(TCG_COND_LEU, hex_gpr[HEX_REG_LC0], 1, skip);
 tcg_gen_subi_tl(hex_gpr[HEX_REG_LC0], hex_gpr[HEX_REG_LC0], 1);
-gen_goto_tb(ctx, 0, ctx->base.tb->pc);
+gen_goto_tb(ctx, 0, ctx->base.tb->pc, true);
 gen_set_label(skip);
-gen_goto_tb(ctx, 1, ctx->next_PC);
+gen_goto_tb(ctx, 1, ctx->next_PC, false);
 } else {
 tcg_gen_lookup_and_goto_ptr();
 }
-- 
2.37.2
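
To make the effect of the move_to_pc flag concrete, here is a toy model (not TCG code) that counts PC writes on the not-taken exit of the packet shown above. The PC is preloaded with next_PC at packet start, so passing move_to_pc = false for the fall-through exit removes the second, redundant write. PcModel and the *_model functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: records the PC value and how many times it was written. */
typedef struct PcModel {
    uint32_t pc;
    int pc_writes;
} PcModel;

static void write_pc(PcModel *m, uint32_t val)
{
    m->pc = val;
    m->pc_writes++;
}

/* Analogue of gen_goto_tb(ctx, idx, dest, move_to_pc). */
static void goto_tb_model(PcModel *m, uint32_t dest, bool move_to_pc)
{
    if (move_to_pc) {
        write_pc(m, dest);
    }
}

/* Not-taken exit of a conditional COF: preload, then leave the TB. */
static void fallthrough_exit(PcModel *m, uint32_t next_pc, bool move_to_pc)
{
    write_pc(m, next_pc);                  /* gen_start_packet() preload */
    goto_tb_model(m, next_pc, move_to_pc); /* gen_goto_tb(ctx, 1, ...) */
}
```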

[PATCH for-8.1 v4 04/25] target/riscv: add PRIV_VERSION_LATEST

2023-03-22 Thread Daniel Henrique Barboza

All these generic CPUs are using the latest priv available, at this
moment PRIV_VERSION_1_12_0:

- riscv_any_cpu_init()
- rv32_base_cpu_init()
- rv64_base_cpu_init()
- rv128_base_cpu_init()

Create a new PRIV_VERSION_LATEST enum and use it in those cases. It'll
make it easier to update everything at once when a new priv version is
available.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: Richard Henderson 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/cpu.c | 8 
 target/riscv/cpu.h | 2 ++
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 18032dfd4e..1ee322001b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -338,7 +338,7 @@ static void riscv_any_cpu_init(Object *obj)
 VM_1_10_SV32 : VM_1_10_SV57);
 #endif
 
-env->priv_ver = PRIV_VERSION_1_12_0;
+env->priv_ver = PRIV_VERSION_LATEST;
 register_cpu_props(obj);
 }
 
@@ -350,7 +350,7 @@ static void rv64_base_cpu_init(Object *obj)
 set_misa(env, MXL_RV64, 0);
 register_cpu_props(obj);
 /* Set latest version of privileged specification */
-env->priv_ver = PRIV_VERSION_1_12_0;
+env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV57);
 #endif
@@ -426,7 +426,7 @@ static void rv128_base_cpu_init(Object *obj)
 set_misa(env, MXL_RV128, 0);
 register_cpu_props(obj);
 /* Set latest version of privileged specification */
-env->priv_ver = PRIV_VERSION_1_12_0;
+env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV57);
 #endif
@@ -439,7 +439,7 @@ static void rv32_base_cpu_init(Object *obj)
 set_misa(env, MXL_RV32, 0);
 register_cpu_props(obj);
 /* Set latest version of privileged specification */
-env->priv_ver = PRIV_VERSION_1_12_0;
+env->priv_ver = PRIV_VERSION_LATEST;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV32);
 #endif
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 638e47c75a..76f81c6b68 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -88,6 +88,8 @@ enum {
 PRIV_VERSION_1_10_0 = 0,
 PRIV_VERSION_1_11_0,
 PRIV_VERSION_1_12_0,
+
+PRIV_VERSION_LATEST = PRIV_VERSION_1_12_0,
 };
 
 #define VEXT_VERSION_1_00_0 0x0001
-- 
2.39.2
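
Because PRIV_VERSION_LATEST is an alias for an existing enumerator rather than a new value, the numbering of the real versions is unchanged and bumping it later is a one-line edit. A standalone copy of the enum, plus a hypothetical generic-CPU helper, shows the idea.

```c
/* Same shape as the enum in target/riscv/cpu.h after this patch. */
enum {
    PRIV_VERSION_1_10_0 = 0,
    PRIV_VERSION_1_11_0,
    PRIV_VERSION_1_12_0,

    PRIV_VERSION_LATEST = PRIV_VERSION_1_12_0,
};

/* Hypothetical stand-in for what the generic CPU init functions do. */
static int generic_cpu_priv_ver(void)
{
    return PRIV_VERSION_LATEST;
}
```

One consequence of the alias: a switch on priv_ver cannot list both PRIV_VERSION_LATEST and PRIV_VERSION_1_12_0 as case labels, since they share a value and C rejects duplicate case values.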

[PATCH for-8.1 v4 11/25] target/riscv/cpu.c: set cpu config in set_misa()

2023-03-22 Thread Daniel Henrique Barboza

set_misa() is setting all 'misa' related env states and nothing else.
But other functions, namely riscv_cpu_validate_set_extensions(), use
the config object to do their job.

This creates a need to set the single letter extensions in the cfg
object to keep both in sync. At this moment this is being done by
register_cpu_props(), forcing every CPU to do a call to this function.

Let's beef up set_misa() and make the function do the sync for us. This
will relieve named CPUs from having to call register_cpu_props(), which
will then be redesigned to a more specialized role next.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 43 ---
 target/riscv/cpu.h |  4 ++--
 2 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 36c55abda0..df5c0bda70 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -236,8 +236,40 @@ const char *riscv_cpu_get_trap_name(target_ulong cause, bool async)
 
 static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
 {
+RISCVCPU *cpu;
+
 env->misa_mxl_max = env->misa_mxl = mxl;
 env->misa_ext_mask = env->misa_ext = ext;
+
+/*
+ * ext = 0 will only be a thing during cpu_init() functions
+ * as a way of setting an extension-agnostic CPU. We do
+ * not support clearing misa_ext* and the ext_N flags in
+ * RISCVCPUConfig in regular circumstances.
+ */
+if (ext == 0) {
+return;
+}
+
+/*
+ * We can't use riscv_cpu_cfg() in this case because it is
+ * a read-only inline and we're going to change the values
+ * of cpu->cfg.
+ */
+cpu = env_archcpu(env);
+
+cpu->cfg.ext_i = ext & RVI;
+cpu->cfg.ext_e = ext & RVE;
+cpu->cfg.ext_m = ext & RVM;
+cpu->cfg.ext_a = ext & RVA;
+cpu->cfg.ext_f = ext & RVF;
+cpu->cfg.ext_d = ext & RVD;
+cpu->cfg.ext_v = ext & RVV;
+cpu->cfg.ext_c = ext & RVC;
+cpu->cfg.ext_s = ext & RVS;
+cpu->cfg.ext_u = ext & RVU;
+cpu->cfg.ext_h = ext & RVH;
+cpu->cfg.ext_j = ext & RVJ;
 }
 
 #ifndef CONFIG_USER_ONLY
@@ -340,7 +372,6 @@ static void riscv_any_cpu_init(Object *obj)
 #endif
 
 env->priv_ver = PRIV_VERSION_LATEST;
-register_cpu_props(obj);
 
 /* inherited from parent obj via riscv_cpu_init() */
 cpu->cfg.ext_ifencei = true;
@@ -368,7 +399,6 @@ static void rv64_sifive_u_cpu_init(Object *obj)
 RISCVCPU *cpu = RISCV_CPU(obj);
 CPURISCVState *env = &cpu->env;
 set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
-register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV39);
@@ -387,7 +417,6 @@ static void rv64_sifive_e_cpu_init(Object *obj)
 RISCVCPU *cpu = RISCV_CPU(obj);
 
 set_misa(env, MXL_RV64, RVI | RVM | RVA | RVC | RVU);
-register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -408,9 +437,6 @@ static void rv64_thead_c906_cpu_init(Object *obj)
 env->priv_ver = PRIV_VERSION_1_11_0;
 
 cpu->cfg.ext_g = true;
-cpu->cfg.ext_c = true;
-cpu->cfg.ext_u = true;
-cpu->cfg.ext_s = true;
 cpu->cfg.ext_icsr = true;
 cpu->cfg.ext_zfh = true;
 cpu->cfg.mmu = true;
@@ -472,8 +498,6 @@ static void rv32_sifive_u_cpu_init(Object *obj)
 RISCVCPU *cpu = RISCV_CPU(obj);
 CPURISCVState *env = &cpu->env;
 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
-
-register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV32);
@@ -492,7 +516,6 @@ static void rv32_sifive_e_cpu_init(Object *obj)
 RISCVCPU *cpu = RISCV_CPU(obj);
 
 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVC | RVU);
-register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -510,7 +533,6 @@ static void rv32_ibex_cpu_init(Object *obj)
 RISCVCPU *cpu = RISCV_CPU(obj);
 
 set_misa(env, MXL_RV32, RVI | RVM | RVC | RVU);
-register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_11_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -529,7 +551,6 @@ static void rv32_imafcu_nommu_cpu_init(Object *obj)
 RISCVCPU *cpu = RISCV_CPU(obj);
 
 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVC | RVU);
-register_cpu_props(obj);
 env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 76f81c6b68..ebe0fff668 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -66,8 +66,8 @@
 #define RV(x) ((target_ulong)1 << (x - 'A'))
 
 /*
- * Consider updating register_cpu_props() when adding
- * new MISA bits here.
+ *

[PATCH for-8.1 v4 08/25] target/riscv/cpu.c: validate extensions before riscv_timer_init()

2023-03-22 Thread Daniel Henrique Barboza

There is no need to init timers if we're not even sure that our
extensions are valid. Execute riscv_cpu_validate_set_extensions() before
riscv_timer_init().

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 7458845fec..fef55d7d79 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1237,12 +1237,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-#ifndef CONFIG_USER_ONLY
-if (cpu->cfg.ext_sstc) {
-riscv_timer_init(cpu);
-}
-#endif /* CONFIG_USER_ONLY */
-
 riscv_cpu_validate_set_extensions(cpu, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
@@ -1250,6 +1244,10 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 }
 
 #ifndef CONFIG_USER_ONLY
+if (cpu->cfg.ext_sstc) {
+riscv_timer_init(cpu);
+}
+
 if (cpu->cfg.pmu_num) {
 if (!riscv_pmu_init(cpu, cpu->cfg.pmu_num) && cpu->cfg.ext_sscofpmf) {
 cpu->pmu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
-- 
2.39.2
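
The reordering follows a common validate-then-allocate pattern: nothing is initialized until the configuration has been accepted, so an early error return has nothing to tear down, and any extension that validation may turn off is already off when ext_sstc is checked. ToyCpu, validate_extensions() and toy_realize() are hypothetical stand-ins for the pieces of riscv_cpu_realize().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant CPU state. */
typedef struct ToyCpu {
    bool ext_sstc;
    bool exts_valid;
    int timers_inited;
} ToyCpu;

static int validate_extensions(const ToyCpu *cpu)
{
    return cpu->exts_valid ? 0 : -1;
}

/* Validate first; create timer state only for an accepted config. */
static int toy_realize(ToyCpu *cpu)
{
    if (validate_extensions(cpu) < 0) {
        return -1; /* nothing to undo: no timers were created */
    }
    if (cpu->ext_sstc) {
        cpu->timers_inited++;
    }
    return 0;
}
```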

[PATCH for-8.1 v4 06/25] target/riscv/cpu.c: add riscv_cpu_validate_misa_mxl()

2023-03-22 Thread Daniel Henrique Barboza

Let's remove more code that is open coded in riscv_cpu_realize() and put
it into a helper. Let's also add an error message instead of just
asserting out if env->misa_mxl_max != env->misa_mxl.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 51 ++
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 17b301967c..1a298e5e55 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -879,6 +879,33 @@ static void riscv_cpu_disable_priv_spec_isa_exts(RISCVCPU *cpu)
 }
 }
 
+static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
+{
+RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
+CPUClass *cc = CPU_CLASS(mcc);
+CPURISCVState *env = &cpu->env;
+
+/* Validate that MISA_MXL is set properly. */
+switch (env->misa_mxl_max) {
+#ifdef TARGET_RISCV64
+case MXL_RV64:
+case MXL_RV128:
+cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
+break;
+#endif
+case MXL_RV32:
+cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
+break;
+default:
+g_assert_not_reached();
+}
+
+if (env->misa_mxl_max != env->misa_mxl) {
+error_setg(errp, "misa_mxl_max must be equal to misa_mxl");
+return;
+}
+}
+
 /*
  * Check consistency between chosen extensions while setting
  * cpu->cfg accordingly, doing a set_misa() in the end.
@@ -1180,9 +1207,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 {
 CPUState *cs = CPU(dev);
 RISCVCPU *cpu = RISCV_CPU(dev);
-CPURISCVState *env = &cpu->env;
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
-CPUClass *cc = CPU_CLASS(mcc);
 Error *local_err = NULL;
 
 cpu_exec_realizefn(cs, &local_err);
@@ -1197,6 +1222,12 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+riscv_cpu_validate_misa_mxl(cpu, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+
 if (cpu->cfg.epmp && !cpu->cfg.pmp) {
 /*
  * Enhanced PMP should only be available
@@ -1213,22 +1244,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 }
 #endif /* CONFIG_USER_ONLY */
 
-/* Validate that MISA_MXL is set properly. */
-switch (env->misa_mxl_max) {
-#ifdef TARGET_RISCV64
-case MXL_RV64:
-case MXL_RV128:
-cc->gdb_core_xml_file = "riscv-64bit-cpu.xml";
-break;
-#endif
-case MXL_RV32:
-cc->gdb_core_xml_file = "riscv-32bit-cpu.xml";
-break;
-default:
-g_assert_not_reached();
-}
-assert(env->misa_mxl_max == env->misa_mxl);
-
 riscv_cpu_validate_set_extensions(cpu, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
-- 
2.39.2

[PATCH for-8.1 v4 18/25] target/riscv: error out on priv failure for RVH

2023-03-22 Thread Daniel Henrique Barboza

riscv_cpu_disable_priv_spec_isa_exts(), at the end of
riscv_cpu_validate_set_extensions(), will disable cpu->cfg.ext_h and
cpu->cfg.ext_v if the priv_ver check fails.

This check can be done in riscv_cpu_validate_misa_ext(). The difference
here is that we're not silently disabling it: we'll error out. Silently
disabling a MISA extension after all the validation is completed can
cause inconsistencies that we don't have to deal with. Verify earlier and
fail faster.

Note that we're skipping RVV priv_ver validation since its minimal priv
version is also the minimal value we support. RVH will error out if
enabled with a priv_ver below 1_12_0.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 399f63b42f..d2eb2b3ba1 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1055,6 +1055,20 @@ static void riscv_cpu_validate_misa_ext(RISCVCPU *cpu, Error **errp)
 return;
 }
 
+/*
+ * Check for priv spec version. RVH is 1_12_0, RVV is 1_10_0.
+ * We don't support anything under 1_10_0 so skip checking
+ * priv for RVV.
+ *
+ * We're hardcoding it here to avoid looping into the
+ * 50+ entries of isa_edata_arr[] just to check the RVH
+ * entry.
+ */
+if (cpu->cfg.ext_h && env->priv_ver < PRIV_VERSION_1_12_0) {
+error_setg(errp, "H extension requires priv spec 1.12.0");
+return;
+}
+
 if (cpu->cfg.ext_v) {
 riscv_cpu_validate_v(env, &cpu->cfg, &local_err);
 if (local_err != NULL) {
-- 
2.39.2

[PATCH for-8.1 v4 20/25] target/riscv: make validate_misa_ext() use a misa_ext val

2023-03-22 Thread Daniel Henrique Barboza

We have all MISA specific validations in riscv_cpu_validate_misa_ext(),
and we have a guarantee that env->misa_ext will always be in sync with
cpu->cfg at this point during realize time, so let's convert it to use a
'misa_ext' parameter instead of reading cpu->cfg.

This will prepare the function to be used in write_misa() where we won't
have an updated cpu->cfg object to work with. riscv_cpu_validate_v() is
changed to receive a const pointer to the cpu->cfg object via
riscv_cpu_cfg().
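The point of working on a plain 'misa_ext' value is that every check becomes a bit test, so the same function can validate a candidate value that has not been committed anywhere yet. The standalone sketch below reproduces the checks from this patch plus the RVH priv spec check from patch 18. The `RV()` macro is assumed to mirror QEMU's one-bit-per-letter convention, and the priv version enum is assumed to be ordered oldest to newest (as the `<` comparison requires); `validate_misa_ext_sketch()` is a hypothetical name that returns a message instead of filling `Error **errp`.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed MISA convention: one bit per letter, 'A' is bit 0. */
#define RV(x) ((uint32_t)1 << ((x) - 'A'))
#define RVI RV('I')
#define RVE RV('E')
#define RVS RV('S')
#define RVU RV('U')
#define RVH RV('H')
#define RVD RV('D')
#define RVF RV('F')

/* Assumed ordering of priv spec versions, oldest first. */
enum {
    PRIV_VERSION_1_10_0 = 0,
    PRIV_VERSION_1_11_0,
    PRIV_VERSION_1_12_0,
};

/* Returns NULL if the candidate misa_ext is consistent, else a message. */
static const char *validate_misa_ext_sketch(uint32_t misa_ext, int priv_ver)
{
    if ((misa_ext & RVI) && (misa_ext & RVE)) {
        return "I and E extensions are incompatible";
    }
    if (!(misa_ext & (RVI | RVE))) {
        return "Either I or E extension must be set";
    }
    if ((misa_ext & RVS) && !(misa_ext & RVU)) {
        return "Setting S extension without U extension is illegal";
    }
    if ((misa_ext & RVH) && !(misa_ext & RVI)) {
        return "H depends on an I base integer ISA with 32 x registers";
    }
    if ((misa_ext & RVH) && !(misa_ext & RVS)) {
        return "H extension implicitly requires S-mode";
    }
    if ((misa_ext & RVD) && !(misa_ext & RVF)) {
        return "D extension requires F extension";
    }
    if ((misa_ext & RVH) && priv_ver < PRIV_VERSION_1_12_0) {
        return "H extension requires priv spec 1.12.0";
    }
    return NULL;
}
```

Since `&` binds tighter than `&&` in C, the unparenthesized `misa_ext & RVI && misa_ext & RVE` form used in the patch is equivalent; the extra parentheses here are only for readability.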

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 29 -
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f1e82a8dda..bd90e1d329 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -930,7 +930,8 @@ static void riscv_cpu_disas_set_info(CPUState *s, disassemble_info *info)
 }
 }
 
-static void riscv_cpu_validate_v(CPURISCVState *env, RISCVCPUConfig *cfg,
+static void riscv_cpu_validate_v(CPURISCVState *env,
+ const RISCVCPUConfig *cfg,
  Error **errp)
 {
 int vext_version = VEXT_VERSION_1_00_0;
@@ -1016,41 +1017,43 @@ static void riscv_cpu_disable_priv_spec_isa_exts(RISCVCPU *cpu)
 }
 }
 
-static void riscv_cpu_validate_misa_ext(RISCVCPU *cpu, Error **errp)
+
+static void riscv_cpu_validate_misa_ext(CPURISCVState *env,
+uint32_t misa_ext,
+Error **errp)
 {
-CPURISCVState *env = &cpu->env;
 Error *local_err = NULL;
 
-if (cpu->cfg.ext_i && cpu->cfg.ext_e) {
+if (misa_ext & RVI && misa_ext & RVE) {
 error_setg(errp,
"I and E extensions are incompatible");
 return;
 }
 
-if (!cpu->cfg.ext_i && !cpu->cfg.ext_e) {
+if (!(misa_ext & RVI) && !(misa_ext & RVE)) {
 error_setg(errp,
"Either I or E extension must be set");
 return;
 }
 
-if (cpu->cfg.ext_s && !cpu->cfg.ext_u) {
+if (misa_ext & RVS && !(misa_ext & RVU)) {
 error_setg(errp,
"Setting S extension without U extension is illegal");
 return;
 }
 
-if (cpu->cfg.ext_h && !cpu->cfg.ext_i) {
+if (misa_ext & RVH && !(misa_ext & RVI)) {
 error_setg(errp,
"H depends on an I base integer ISA with 32 x registers");
 return;
 }
 
-if (cpu->cfg.ext_h && !cpu->cfg.ext_s) {
+if (misa_ext & RVH && !(misa_ext & RVS)) {
 error_setg(errp, "H extension implicitly requires S-mode");
 return;
 }
 
-if (cpu->cfg.ext_d && !cpu->cfg.ext_f) {
+if (misa_ext & RVD && !(misa_ext & RVF)) {
 error_setg(errp, "D extension requires F extension");
 return;
 }
@@ -1064,13 +1067,13 @@ static void riscv_cpu_validate_misa_ext(RISCVCPU *cpu, Error **errp)
  * 50+ entries of isa_edata_arr[] just to check the RVH
  * entry.
  */
-if (cpu->cfg.ext_h && env->priv_ver < PRIV_VERSION_1_12_0) {
+if (misa_ext & RVH && env->priv_ver < PRIV_VERSION_1_12_0) {
 error_setg(errp, "H extension requires priv spec 1.12.0");
 return;
 }
 
-if (cpu->cfg.ext_v) {
-riscv_cpu_validate_v(env, &cpu->cfg, &local_err);
+if (misa_ext & RVV) {
+riscv_cpu_validate_v(env, riscv_cpu_cfg(env), &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
 return;
@@ -1355,7 +1358,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 env->misa_ext_mask = env->misa_ext;
 }
 
-riscv_cpu_validate_misa_ext(cpu, &local_err);
+riscv_cpu_validate_misa_ext(env, env->misa_ext, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
 return;
-- 
2.39.2

[PATCH for-8.1 v4 21/25] target/riscv: split riscv_cpu_validate_set_extensions()

2023-03-22 Thread Daniel Henrique Barboza

We're now ready to split riscv_cpu_validate_set_extensions() in two.
None of these steps are going to touch env->misa_ext*.

riscv_cpu_validate_extensions() will take care of all validations based
on cpu->cfg values. cpu->cfg changes that are required for the
validation are being tolerated here. This is the case of extensions such
as ext_zfh enabling ext_zfhmin.

The RVV chain enablement (ext_zve64d, ext_zve64f and ext_zve32f) is also
being tolerated because the risk of failure is being mitigated by the
RVV -> RVD && RVF dependency in validate_misa_ext() done prior.
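The tolerated cfg changes are implication chains: enabling one extension forces another on, and the implications must be applied in order so that they propagate. The standalone sketch below models that with a hypothetical `ExtCfg` struct holding a handful of cpu->cfg-like flags. The chain (V -> Zve64d -> Zve64f -> Zve32f, and Zfh -> Zfhmin) follows the extensions named above; the F/D requirement checks and their messages are illustrative assumptions, not QEMU's exact code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of cpu->cfg flags, for illustration only. */
typedef struct {
    bool ext_f, ext_d, ext_v;
    bool ext_zve64d, ext_zve64f, ext_zve32f;
    bool ext_zfh, ext_zfhmin;
} ExtCfg;

/*
 * Enable implied extensions, then check their base requirements.
 * Returns NULL on success or an error message.
 */
static const char *apply_implied_exts(ExtCfg *cfg)
{
    /* Order matters: each step feeds the next one. */
    if (cfg->ext_v) {
        cfg->ext_zve64d = true;
    }
    if (cfg->ext_zve64d) {
        cfg->ext_zve64f = true;
    }
    if (cfg->ext_zve64f) {
        cfg->ext_zve32f = true;
    }
    if (cfg->ext_zfh) {
        cfg->ext_zfhmin = true;
    }

    /* Requirement checks run after the chain has settled. */
    if (cfg->ext_zve64d && !cfg->ext_d) {
        return "Zve64d/V require the D extension";
    }
    if ((cfg->ext_zve32f || cfg->ext_zfhmin) && !cfg->ext_f) {
        return "Zve32f and Zfhmin require the F extension";
    }
    return NULL;
}
```

If RVV -> RVD && RVF has already been enforced on the MISA bits, the D/F errors above cannot fire for a V-enabled CPU, which is why mutating cfg at this stage is considered low risk.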

In an ideal world we would have all these extensions declared as object
properties, with getters and setters, and we would be able to, e.g.,
enable ext_zfhmin as soon as ext_zfh is enabled. This would avoid
cpu->cfg changes during riscv_cpu_validate_extensions(). Easier said
than done, not just because of the hundreds of lines involved in it, but
also because we want these properties to be available just for generic
CPUs (named CPUs don't want these properties exposed for users). For now
we'll work with what we have.

riscv_cpu_commit_cpu_cfg() is the last step of the validation where more
cpu->cfg properties are set and disabling of extensions due to priv spec
happens. We've already validated everything we wanted, so any cpu->cfg
change made here is valid.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index bd90e1d329..ed02332093 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1109,10 +1109,10 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
 }
 
 /*
- * Check consistency between chosen extensions while setting
- * cpu->cfg accordingly.
+ * Check consistency between chosen extensions. No changes
+ * in env->misa_ext are made.
  */
-static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
+static void riscv_cpu_validate_extensions(RISCVCPU *cpu, Error **errp)
 {
 if (cpu->cfg.epmp && !cpu->cfg.pmp) {
 /*
@@ -1201,7 +1201,10 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 return;
 }
 }
+}
 
+static void riscv_cpu_commit_cpu_cfg(RISCVCPU *cpu)
+{
 if (cpu->cfg.ext_zk) {
 cpu->cfg.ext_zkn = true;
 cpu->cfg.ext_zkr = true;
@@ -1364,12 +1367,14 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-riscv_cpu_validate_set_extensions(cpu, &local_err);
+riscv_cpu_validate_extensions(cpu, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
 return;
 }
 
+riscv_cpu_commit_cpu_cfg(cpu);
+
 #ifndef CONFIG_USER_ONLY
 if (cpu->cfg.ext_sstc) {
 riscv_timer_init(cpu);
-- 
2.39.2

[PATCH for-8.1 v4 24/25] target/riscv: update cpu->cfg misa bits in commit_cpu_cfg()

2023-03-22 Thread Daniel Henrique Barboza

write_misa() is able to use the same validation workflow
riscv_cpu_realize() uses. But it's still not capable of updating
cpu->cfg misa props yet.

We have no way of blocking future (and current) code from checking
env->misa_ext (via riscv_has_ext()) or reading cpu->cfg directly, so our
best alternative is to keep everything in sync.

riscv_cpu_commit_cpu_cfg() now receives an extra 'misa_ext' parameter.
If this value differs from the existing env->misa_ext, update
env->misa_ext and cpu->cfg with the new value. In riscv_cpu_realize()
this code is a no-op, since env->misa_ext isn't touched during
validation, but write_misa() will use it to keep cpu->cfg in sync with
the new env->misa_ext value.
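The sync rule can be sketched in isolation: the MISA bitmask and the per-extension booleans are two copies of the same information, so the commit step rewrites both together, and only when the value actually changed. `FakeCpu`, `MisaCfg`, `set_cfg_from_misa()` and `commit_cfg()` below are hypothetical stand-ins (the patch itself uses env->misa_ext, cpu->cfg and riscv_set_cpucfg_with_misa()), and only a few extensions are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MISA convention: one bit per letter, 'A' is bit 0. */
#define RV(x) ((uint32_t)1 << ((x) - 'A'))

/* Hypothetical, trimmed-down versions of cpu->cfg and the CPU. */
typedef struct {
    bool ext_i, ext_m, ext_a, ext_f, ext_d, ext_c;
} MisaCfg;

typedef struct {
    uint32_t misa_ext;
    MisaCfg cfg;
} FakeCpu;

/* Mirror each MISA bit into the matching cfg flag (nonzero -> true). */
static void set_cfg_from_misa(MisaCfg *cfg, uint32_t misa_ext)
{
    cfg->ext_i = misa_ext & RV('I');
    cfg->ext_m = misa_ext & RV('M');
    cfg->ext_a = misa_ext & RV('A');
    cfg->ext_f = misa_ext & RV('F');
    cfg->ext_d = misa_ext & RV('D');
    cfg->ext_c = misa_ext & RV('C');
}

/*
 * Commit step: nothing to do for the MISA part when the value is
 * unchanged (realize path); otherwise (write_misa path) update both
 * copies so they cannot drift apart.
 */
static void commit_cfg(FakeCpu *cpu, uint32_t misa_ext)
{
    if (cpu->misa_ext != misa_ext) {
        cpu->misa_ext = misa_ext;
        set_cfg_from_misa(&cpu->cfg, misa_ext);
    }
}
```

Keeping the two representations in one place means code that calls riscv_has_ext() and code that reads cpu->cfg directly see the same answer after a CSR write.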

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 16 ++--
 target/riscv/cpu.h |  2 +-
 target/riscv/csr.c |  3 +--
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 41b17ba0c3..88806d1050 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1204,8 +1204,20 @@ void riscv_cpu_validate_extensions(RISCVCPU *cpu, uint32_t misa_ext,
 }
 }
 
-void riscv_cpu_commit_cpu_cfg(RISCVCPU *cpu)
+void riscv_cpu_commit_cpu_cfg(RISCVCPU *cpu, uint32_t misa_ext)
 {
+CPURISCVState *env = &cpu->env;
+
+/*
+ * write_misa() needs to update cpu->cfg with the new
+ * MISA bits. This is a no-op for the riscv_cpu_realize()
+ * path.
+ */
+if (env->misa_ext != misa_ext) {
+env->misa_ext = misa_ext;
+riscv_set_cpucfg_with_misa(&cpu->cfg, misa_ext);
+}
+
 if (cpu->cfg.ext_zk) {
 cpu->cfg.ext_zkn = true;
 cpu->cfg.ext_zkr = true;
@@ -1374,7 +1386,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
-riscv_cpu_commit_cpu_cfg(cpu);
+riscv_cpu_commit_cpu_cfg(cpu, env->misa_ext);
 
 #ifndef CONFIG_USER_ONLY
 if (cpu->cfg.ext_sstc) {
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index ca2ba6a647..befc3b8fff 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -597,7 +597,7 @@ void riscv_cpu_validate_misa_ext(CPURISCVState *env, uint32_t misa_ext,
  Error **errp);
 void riscv_cpu_validate_extensions(RISCVCPU *cpu, uint32_t misa_ext,
Error **errp);
-void riscv_cpu_commit_cpu_cfg(RISCVCPU *cpu);
+void riscv_cpu_commit_cpu_cfg(RISCVCPU *cpu, uint32_t misa_ext);
 
 #define cpu_list riscv_cpu_list
 #define cpu_mmu_index riscv_cpu_mmu_index
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 8d5e8f9ad1..839862f1a8 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1396,7 +1396,7 @@ static RISCVException write_misa(CPURISCVState *env, int csrno,
 return RISCV_EXCP_NONE;
 }
 
-riscv_cpu_commit_cpu_cfg(cpu);
+riscv_cpu_commit_cpu_cfg(cpu, val);
 
 if (!(val & RVF)) {
 env->mstatus &= ~MSTATUS_FS;
@@ -1404,7 +1404,6 @@ static RISCVException write_misa(CPURISCVState *env, int csrno,
 
 /* flush translation cache */
 tb_flush(env_cpu(env));
-env->misa_ext = val;
 env->xl = riscv_cpu_mxl(env);
 return RISCV_EXCP_NONE;
 }
-- 
2.39.2

[PATCH for-8.1 v4 23/25] target/riscv: rework write_misa()

2023-03-22 Thread Daniel Henrique Barboza

write_misa() must use as much common logic as possible. We want to open
code just the bits that are exclusive to the CSR write operation and TCG
internals.

Rewrite write_misa() to work as follows:

- mask the write using misa_ext_mask to avoid enabling unsupported
  extensions;

- suppress RVC if the next insn isn't aligned;

- handle RVE. This is done by filtering all bits but RVE from 'val'.
  Setting RVE will forcefully set only RVE - assuming it gets
  validated afterwards;

- emulate the steps done by realize(): validate the candidate misa_ext
  val, then validate the configuration with the candidate misa_ext val,
  and finally commit the changes to cpu->cfg.

If any of the validation steps fails, the write operation is a no-op.
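The steps above can be modeled as a pure function from (current value, requested value) to the value that ends up in misa. This is a hedged sketch, not the patch's write_misa(): `write_misa_sketch()` and `candidate_is_valid()` are hypothetical, the validator only enforces "exactly one of I or E" as a stand-in for the real validate_misa_ext() and validate_extensions() calls, and the RVC alignment check is passed in as a boolean instead of being computed from the PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MISA convention: one bit per letter, 'A' is bit 0. */
#define RV(x) ((uint32_t)1 << ((x) - 'A'))
#define RVC RV('C')
#define RVE RV('E')
#define RVI RV('I')

/* Hypothetical stand-in for the real validation: exactly one of I/E. */
static bool candidate_is_valid(uint32_t val)
{
    return ((val & RVI) != 0) != ((val & RVE) != 0);
}

/*
 * Returns the misa_ext value after the CSR write. If validation fails,
 * the current value is returned unchanged, i.e. the write is a no-op.
 */
static uint32_t write_misa_sketch(uint32_t cur, uint32_t mask,
                                  uint32_t val, bool next_insn_misaligned)
{
    /* Never enable extensions this hart does not support. */
    val &= mask;

    /* Drop RVC if the next instruction isn't aligned. */
    if ((val & RVC) && next_insn_misaligned) {
        val &= ~RVC;
    }

    /* Enabling RVE keeps only RVE; validation decides if that is legal. */
    if ((val & RVE) && !(cur & RVE)) {
        val &= RVE;
    }

    /* Validate the candidate before any state is touched. */
    if (!candidate_is_valid(val)) {
        return cur;
    }

    /* Commit: the caller would now sync cpu->cfg with the new value. */
    return val;
}
```

Because the candidate is checked before anything is written, a rejected write leaves both misa_ext and cpu->cfg untouched, which is what the split into validate and commit steps is meant to guarantee.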

Let's keep write_misa() as experimental for now until this logic gains
enough mileage.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 12 --
 target/riscv/cpu.h |  6 +
 target/riscv/csr.c | 59 ++
 3 files changed, 45 insertions(+), 32 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 0e6b8fb45e..41b17ba0c3 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1018,9 +1018,8 @@ static void riscv_cpu_disable_priv_spec_isa_exts(RISCVCPU *cpu)
 }
 
 
-static void riscv_cpu_validate_misa_ext(CPURISCVState *env,
-uint32_t misa_ext,
-Error **errp)
+void riscv_cpu_validate_misa_ext(CPURISCVState *env, uint32_t misa_ext,
+ Error **errp)
 {
 Error *local_err = NULL;
 
@@ -1113,9 +1112,8 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
  * candidate misa_ext value. No changes in env->misa_ext
  * are made.
  */
-static void riscv_cpu_validate_extensions(RISCVCPU *cpu,
-  uint32_t misa_ext,
-  Error **errp)
+void riscv_cpu_validate_extensions(RISCVCPU *cpu, uint32_t misa_ext,
+   Error **errp)
 {
 if (cpu->cfg.epmp && !cpu->cfg.pmp) {
 /*
@@ -1206,7 +1204,7 @@ static void riscv_cpu_validate_extensions(RISCVCPU *cpu,
 }
 }
 
-static void riscv_cpu_commit_cpu_cfg(RISCVCPU *cpu)
+void riscv_cpu_commit_cpu_cfg(RISCVCPU *cpu)
 {
 if (cpu->cfg.ext_zk) {
 cpu->cfg.ext_zkn = true;
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index dbb4df9df0..ca2ba6a647 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -593,6 +593,12 @@ bool riscv_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 char *riscv_isa_string(RISCVCPU *cpu);
 void riscv_cpu_list(void);
 
+void riscv_cpu_validate_misa_ext(CPURISCVState *env, uint32_t misa_ext,
+ Error **errp);
+void riscv_cpu_validate_extensions(RISCVCPU *cpu, uint32_t misa_ext,
+   Error **errp);
+void riscv_cpu_commit_cpu_cfg(RISCVCPU *cpu);
+
 #define cpu_list riscv_cpu_list
 #define cpu_mmu_index riscv_cpu_mmu_index
 
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d522efc0b6..8d5e8f9ad1 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1343,39 +1343,17 @@ static RISCVException read_misa(CPURISCVState *env, int csrno,
 static RISCVException write_misa(CPURISCVState *env, int csrno,
  target_ulong val)
 {
+RISCVCPU *cpu = env_archcpu(env);
+Error *local_err = NULL;
+
 if (!riscv_cpu_cfg(env)->misa_w) {
 /* drop write to misa */
 return RISCV_EXCP_NONE;
 }
 
-/* 'I' or 'E' must be present */
-if (!(val & (RVI | RVE))) {
-/* It is not, drop write to misa */
-return RISCV_EXCP_NONE;
-}
-
-/* 'E' excludes all other extensions */
-if (val & RVE) {
-/*
- * when we support 'E' we can do "val = RVE;" however
- * for now we just drop writes if 'E' is present.
- */
-return RISCV_EXCP_NONE;
-}
-
-/*
- * misa.MXL writes are not supported by QEMU.
- * Drop writes to those bits.
- */
-
 /* Mask extensions that are not supported by this hart */
 val &= env->misa_ext_mask;
 
-/* 'D' depends on 'F', so clear 'D' if 'F' is not present */
-if ((val & RVD) && !(val & RVF)) {
-val &= ~RVD;
-}
-
 /*
  * Suppress 'C' if next instruction is not aligned
  * TODO: this should check next_pc
@@ -1389,6 +1367,37 @@ static RISCVException write_misa(CPURISCVState *env, int csrno,
 return RISCV_EXCP_NONE;
 }
 
+/*
+ * We'll handle special cases separately. If one
+ * of these bits is enabled we'll handle it and
+ * end the CSR write.
+ */
+if (val & RVE && !(env->misa_ext & RVE)) {
+/*
+ * RVE must be enabled by itself. Clear all other
+ * misa_ext bits and let the validation do its
+ * job.
+ */
+val &= RVE;
+}
+
+/*
+

[PATCH for-8.1 v4 01/25] target/riscv/cpu.c: add riscv_cpu_validate_v()

2023-03-22 Thread Daniel Henrique Barboza

The RVV verification will error out if it fails, and it's being done at
the end of riscv_cpu_validate_set_extensions(). Let's put it in its own
function and do it earlier.

We'll move it out of riscv_cpu_validate_set_extensions() in the near future,
but for now this is enough to clean the code a bit.
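Pulled out of QEMU, the checks in riscv_cpu_validate_v() reduce to a few range and power-of-two tests plus a spec string comparison. The sketch below restates them as a standalone function that returns an error message instead of filling `Error **errp`. `SKETCH_VLEN_MAX` stands in for QEMU's RV_VLEN_MAX and its value of 1024 is an assumption for this example, as are the shortened messages and the helper names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed stand-in for RV_VLEN_MAX; illustration only. */
#define SKETCH_VLEN_MAX 1024

static bool is_pow2(unsigned v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Returns NULL when the vector config is acceptable, else a message. */
static const char *validate_v_sketch(unsigned vlen, unsigned elen,
                                     const char *vext_spec)
{
    if (!is_pow2(vlen)) {
        return "Vector extension VLEN must be power of 2";
    }
    if (vlen > SKETCH_VLEN_MAX || vlen < 128) {
        return "VLEN out of the supported range";
    }
    if (!is_pow2(elen)) {
        return "Vector extension ELEN must be power of 2";
    }
    if (elen > 64 || elen < 8) {
        return "ELEN out of the range [8, 64]";
    }
    /* No spec given means the default (v1.0) is used. */
    if (vext_spec != NULL && strcmp(vext_spec, "v1.0") != 0) {
        return "Unsupported vector spec version";
    }
    return NULL;
}
```

Checking power-of-two before the range keeps the error messages specific; a VLEN of 96, for example, fails the first test rather than being reported as out of range.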

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/cpu.c | 86 ++
 1 file changed, 49 insertions(+), 37 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1e97473af2..18591aa53a 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -802,6 +802,46 @@ static void riscv_cpu_disas_set_info(CPUState *s, disassemble_info *info)
 }
 }
 
+static void riscv_cpu_validate_v(CPURISCVState *env, RISCVCPUConfig *cfg,
+ Error **errp)
+{
+int vext_version = VEXT_VERSION_1_00_0;
+
+if (!is_power_of_2(cfg->vlen)) {
+error_setg(errp, "Vector extension VLEN must be power of 2");
+return;
+}
+if (cfg->vlen > RV_VLEN_MAX || cfg->vlen < 128) {
+error_setg(errp,
+   "Vector extension implementation only supports VLEN "
+   "in the range [128, %d]", RV_VLEN_MAX);
+return;
+}
+if (!is_power_of_2(cfg->elen)) {
+error_setg(errp, "Vector extension ELEN must be power of 2");
+return;
+}
+if (cfg->elen > 64 || cfg->elen < 8) {
+error_setg(errp,
+   "Vector extension implementation only supports ELEN "
+   "in the range [8, 64]");
+return;
+}
+if (cfg->vext_spec) {
+if (!g_strcmp0(cfg->vext_spec, "v1.0")) {
+vext_version = VEXT_VERSION_1_00_0;
+} else {
+error_setg(errp, "Unsupported vector spec version '%s'",
+   cfg->vext_spec);
+return;
+}
+} else {
+qemu_log("vector version is not specified, "
+ "use the default value v1.0\n");
+}
+set_vext_version(env, vext_version);
+}
+
 /*
  * Check consistency between chosen extensions while setting
  * cpu->cfg accordingly, doing a set_misa() in the end.
@@ -809,6 +849,7 @@ static void riscv_cpu_disas_set_info(CPUState *s, disassemble_info *info)
 static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 {
 CPURISCVState *env = &cpu->env;
+Error *local_err = NULL;
 uint32_t ext = 0;
 
 /* Do some ISA extension error checking */
@@ -939,6 +980,14 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 }
 }
 
+if (cpu->cfg.ext_v) {
+riscv_cpu_validate_v(env, &cpu->cfg, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+}
+
 if (cpu->cfg.ext_zk) {
 cpu->cfg.ext_zkn = true;
 cpu->cfg.ext_zkr = true;
@@ -993,44 +1042,7 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 ext |= RVH;
 }
 if (cpu->cfg.ext_v) {
-int vext_version = VEXT_VERSION_1_00_0;
 ext |= RVV;
-if (!is_power_of_2(cpu->cfg.vlen)) {
-error_setg(errp,
-   "Vector extension VLEN must be power of 2");
-return;
-}
-if (cpu->cfg.vlen > RV_VLEN_MAX || cpu->cfg.vlen < 128) {
-error_setg(errp,
-   "Vector extension implementation only supports VLEN "
-   "in the range [128, %d]", RV_VLEN_MAX);
-return;
-}
-if (!is_power_of_2(cpu->cfg.elen)) {
-error_setg(errp,
-   "Vector extension ELEN must be power of 2");
-return;
-}
-if (cpu->cfg.elen > 64 || cpu->cfg.elen < 8) {
-error_setg(errp,
-   "Vector extension implementation only supports ELEN "
-   "in the range [8, 64]");
-return;
-}
-if (cpu->cfg.vext_spec) {
-if (!g_strcmp0(cpu->cfg.vext_spec, "v1.0")) {
-vext_version = VEXT_VERSION_1_00_0;
-} else {
-error_setg(errp,
-   "Unsupported vector spec version '%s'",
-   cpu->cfg.vext_spec);
-return;
-}
-} else {
-qemu_log("vector version is not specified, "
- "use the default value v1.0\n");
-}
-set_vext_version(env, vext_version);
 }
 if (cpu->cfg.ext_j) {
 ext |= RVJ;
-- 
2.39.2

[PATCH for-8.1 v4 16/25] target/riscv/cpu.c: add riscv_cpu_validate_misa_ext()

2023-03-22 Thread Daniel Henrique Barboza

Even after taking RVG off from riscv_cpu_validate_set_extensions(), the
function is still doing too much. It is validating misa bits, then
validating named extensions, and if the validation succeeds it's doing
more changes in both cpu->cfg and MISA bits.

It works for the support we have today, since we'll error out during
realize() time. This is not enough to support write_misa() though - we
don't want to error out if userspace writes something odd in the CSR.

This patch starts splitting riscv_cpu_validate_set_extensions() into a
three-step process: validate misa_ext, validate cpu->cfg, then commit
the configuration. This separation will allow us to use these functions
in write_misa() without having to save and restore CPU state at runtime
when a function changes state but then fails validation.

riscv_cpu_validate_misa_ext() will host all validations related to misa
bits only. Validations using misa bits + named extensions will remain in
validate_set_extensions().

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 77 ++
 1 file changed, 43 insertions(+), 34 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a7bad518be..f9710dd786 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1016,6 +1016,43 @@ static void riscv_cpu_disable_priv_spec_isa_exts(RISCVCPU *cpu)
 }
 }
 
+static void riscv_cpu_validate_misa_ext(RISCVCPU *cpu, Error **errp)
+{
+if (cpu->cfg.ext_i && cpu->cfg.ext_e) {
+error_setg(errp,
+   "I and E extensions are incompatible");
+return;
+}
+
+if (!cpu->cfg.ext_i && !cpu->cfg.ext_e) {
+error_setg(errp,
+   "Either I or E extension must be set");
+return;
+}
+
+if (cpu->cfg.ext_s && !cpu->cfg.ext_u) {
+error_setg(errp,
+   "Setting S extension without U extension is illegal");
+return;
+}
+
+if (cpu->cfg.ext_h && !cpu->cfg.ext_i) {
+error_setg(errp,
+   "H depends on an I base integer ISA with 32 x registers");
+return;
+}
+
+if (cpu->cfg.ext_h && !cpu->cfg.ext_s) {
+error_setg(errp, "H extension implicitly requires S-mode");
+return;
+}
+
+if (cpu->cfg.ext_d && !cpu->cfg.ext_f) {
+error_setg(errp, "D extension requires F extension");
+return;
+}
+}
+
 static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
 {
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(cpu);
@@ -1063,35 +1100,6 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 return;
 }
 
-if (cpu->cfg.ext_i && cpu->cfg.ext_e) {
-error_setg(errp,
-   "I and E extensions are incompatible");
-return;
-}
-
-if (!cpu->cfg.ext_i && !cpu->cfg.ext_e) {
-error_setg(errp,
-   "Either I or E extension must be set");
-return;
-}
-
-if (cpu->cfg.ext_s && !cpu->cfg.ext_u) {
-error_setg(errp,
-   "Setting S extension without U extension is illegal");
-return;
-}
-
-if (cpu->cfg.ext_h && !cpu->cfg.ext_i) {
-error_setg(errp,
-   "H depends on an I base integer ISA with 32 x registers");
-return;
-}
-
-if (cpu->cfg.ext_h && !cpu->cfg.ext_s) {
-error_setg(errp, "H extension implicitly requires S-mode");
-return;
-}
-
 if (cpu->cfg.ext_f && !cpu->cfg.ext_icsr) {
 error_setg(errp, "F extension requires Zicsr");
 return;
@@ -,11 +1119,6 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 return;
 }
 
-if (cpu->cfg.ext_d && !cpu->cfg.ext_f) {
-error_setg(errp, "D extension requires F extension");
-return;
-}
-
 /* The V vector extension depends on the Zve64d extension */
 if (cpu->cfg.ext_v) {
 cpu->cfg.ext_zve64d = true;
@@ -1340,6 +1343,12 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 env->misa_ext_mask = env->misa_ext;
 }
 
+riscv_cpu_validate_misa_ext(cpu, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+
 riscv_cpu_validate_set_extensions(cpu, &local_err);
 if (local_err != NULL) {
 error_propagate(errp, local_err);
-- 
2.39.2

[PATCH for-8.1 v4 00/25] target/riscv: rework CPU extensions validation

2023-03-22 Thread Daniel Henrique Barboza

Hi,

In this version I simplified the logic used in write_misa() after
reviews from Weiwei Li. The patch that handled RVV activation was
removed, making RVV a regular MISA bit to activate/deactivate.

We're also checking whether one of the IMAFD extensions got disabled
during write_misa() and, if that's the case, we'll clear RVG.
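The RVG rule can be expressed as a single mask comparison: G is only kept if every one of I, M, A, F and D is still present. This is a minimal sketch assuming RVG is a MISA bit defined as `RV('G')` (as introduced by the "add RVG" patch); `fixup_rvg()` is a hypothetical helper, and it deliberately ignores Zicsr and Zifencei, which are also part of G but live in cpu->cfg rather than in misa.

```c
#include <stdint.h>

/* Assumed MISA convention: one bit per letter, 'A' is bit 0. */
#define RV(x) ((uint32_t)1 << ((x) - 'A'))
#define RVG RV('G')
#define IMAFD_BITS (RV('I') | RV('M') | RV('A') | RV('F') | RV('D'))

/* If G is set but any of I, M, A, F or D is missing, drop G. */
static uint32_t fixup_rvg(uint32_t val)
{
    if ((val & RVG) && (val & IMAFD_BITS) != IMAFD_BITS) {
        val &= ~RVG;
    }
    return val;
}
```

Comparing against the full mask, rather than testing each bit separately, makes it easy to extend the rule if more MISA letters are ever folded into G.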

Series is based on top of Alistair's riscv-to-apply.next.

Patches acked: 1-5.

Changes from v3:
- patch 11:
  - remove c/u/s cpu->cfg assignment from rv64_thead_c906_cpu_init()
- patch 14:
  - add RVG in set_misa() call inside rv64_thead_c906_cpu_init()
  - remove cpu->cfg.ext_g assignment from rv64_thead_c906_cpu_init()
- patch 15:
  - remove ext_zfinx verification from riscv_cpu_enable_g()
- patch 25:
  - do not call riscv_cpu_enable_g() in write_misa()
  - enable/disable RVG extensions manually in write_misa()
- patch 26: removed
- v3 link: https://lists.gnu.org/archive/html/qemu-devel/2023-03/msg05097.html


Daniel Henrique Barboza (25):
  target/riscv/cpu.c: add riscv_cpu_validate_v()
  target/riscv/cpu.c: remove set_vext_version()
  target/riscv/cpu.c: remove set_priv_version()
  target/riscv: add PRIV_VERSION_LATEST
  target/riscv/cpu.c: add priv_spec validate/disable_exts helpers
  target/riscv/cpu.c: add riscv_cpu_validate_misa_mxl()
  target/riscv: move pmp and epmp validations to
validate_set_extensions()
  target/riscv/cpu.c: validate extensions before riscv_timer_init()
  target/riscv/cpu.c: remove cfg setup from riscv_cpu_init()
  target/riscv/cpu.c: avoid set_misa() in validate_set_extensions()
  target/riscv/cpu.c: set cpu config in set_misa()
  target/riscv/cpu.c: redesign register_cpu_props()
  target/riscv: put env->misa_ext <-> cpu->cfg code into helpers
  target/riscv: add RVG
  target/riscv/cpu.c: split RVG code from validate_set_extensions()
  target/riscv/cpu.c: add riscv_cpu_validate_misa_ext()
  target/riscv: move riscv_cpu_validate_v() to validate_misa_ext()
  target/riscv: error out on priv failure for RVH
  target/riscv: write env->misa_ext* in register_generic_cpu_props()
  target/riscv: make validate_misa_ext() use a misa_ext val
  target/riscv: split riscv_cpu_validate_set_extensions()
  target/riscv: use misa_ext val in riscv_cpu_validate_extensions()
  target/riscv: rework write_misa()
  target/riscv: update cpu->cfg misa bits in commit_cpu_cfg()
  target/riscv: handle RVG updates in write_misa()

 target/riscv/cpu.c | 654 -
 target/riscv/cpu.h |  14 +-
 target/riscv/csr.c |  72 +++--
 3 files changed, 463 insertions(+), 277 deletions(-)

-- 
2.39.2

[PATCH for-8.1 v4 17/25] target/riscv: move riscv_cpu_validate_v() to validate_misa_ext()

2023-03-22 Thread Daniel Henrique Barboza

riscv_cpu_validate_v() consists of checking RVV related attributes, such
as vlen and elen, and setting env->vext_ver.

This can be done during riscv_cpu_validate_misa_ext() time, allowing us
to fail earlier if RVV constraints are not met.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f9710dd786..399f63b42f 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1018,6 +1018,9 @@ static void riscv_cpu_disable_priv_spec_isa_exts(RISCVCPU *cpu)
 
 static void riscv_cpu_validate_misa_ext(RISCVCPU *cpu, Error **errp)
 {
+CPURISCVState *env = &cpu->env;
+Error *local_err = NULL;
+
 if (cpu->cfg.ext_i && cpu->cfg.ext_e) {
 error_setg(errp,
"I and E extensions are incompatible");
@@ -1051,6 +1054,14 @@ static void riscv_cpu_validate_misa_ext(RISCVCPU *cpu, Error **errp)
 error_setg(errp, "D extension requires F extension");
 return;
 }
+
+if (cpu->cfg.ext_v) {
+riscv_cpu_validate_v(env, &cpu->cfg, &local_err);
+if (local_err != NULL) {
+error_propagate(errp, local_err);
+return;
+}
+}
 }
 
 static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
@@ -1088,7 +1099,6 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
 static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 {
 CPURISCVState *env = &cpu->env;
-Error *local_err = NULL;
 uint32_t ext = 0;
 
 if (cpu->cfg.epmp && !cpu->cfg.pmp) {
@@ -1179,14 +1189,6 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 }
 }
 
-if (cpu->cfg.ext_v) {
-riscv_cpu_validate_v(env, &cpu->cfg, &local_err);
-if (local_err != NULL) {
-error_propagate(errp, local_err);
-return;
-}
-}
-
 if (cpu->cfg.ext_zk) {
 cpu->cfg.ext_zkn = true;
 cpu->cfg.ext_zkr = true;
-- 
2.39.2

Re: [PATCH v2 7/7] target/arm: Add CPU properties for most v8.3 PAC features

2023-03-22 Thread Richard Henderson


On 3/22/23 13:36, Aaron Lindsay wrote:

I have not played around with this further. Do you feel this is
important to look into prior to merging this patchset (since QARMA3
isn't the default)?


No, a mere curiosity.


r~

[PATCH for-8.1 v4 02/25] target/riscv/cpu.c: remove set_vext_version()

2023-03-22 Thread Daniel Henrique Barboza

This setter does nothing but set env->vext_ver. Assign the value
directly.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/cpu.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 18591aa53a..2752efe1eb 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -245,11 +245,6 @@ static void set_priv_version(CPURISCVState *env, int priv_ver)
 env->priv_ver = priv_ver;
 }
 
-static void set_vext_version(CPURISCVState *env, int vext_ver)
-{
-env->vext_ver = vext_ver;
-}
-
 #ifndef CONFIG_USER_ONLY
 static uint8_t satp_mode_from_str(const char *satp_mode_str)
 {
@@ -839,7 +834,7 @@ static void riscv_cpu_validate_v(CPURISCVState *env, RISCVCPUConfig *cfg,
 qemu_log("vector version is not specified, "
  "use the default value v1.0\n");
 }
-set_vext_version(env, vext_version);
+env->vext_ver = vext_version;
 }
 
 /*
-- 
2.39.2

[PATCH for-8.1 v4 10/25] target/riscv/cpu.c: avoid set_misa() in validate_set_extensions()

2023-03-22 Thread Daniel Henrique Barboza

set_misa() will be extended to do more than it does today, and that
extra work would be redundant with what
riscv_cpu_validate_set_extensions() does.

Note that we never change env->misa_mxl in this function, so
set_misa() can be replaced by simply assigning 'ext' to env->misa_ext
and env->misa_ext_mask.

Signed-off-by: Daniel Henrique Barboza 
---
 target/riscv/cpu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c7b6e7b84b..36c55abda0 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -949,7 +949,8 @@ static void riscv_cpu_validate_misa_mxl(RISCVCPU *cpu, Error **errp)
 
 /*
  * Check consistency between chosen extensions while setting
- * cpu->cfg accordingly, doing a set_misa() in the end.
+ * cpu->cfg accordingly, setting env->misa_ext and
+ * misa_ext_mask in the end.
  */
 static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 {
@@ -1168,7 +1169,7 @@ static void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 ext |= RVJ;
 }
 
-set_misa(env, env->misa_mxl, ext);
+env->misa_ext_mask = env->misa_ext = ext;
 }
 
 #ifndef CONFIG_USER_ONLY
-- 
2.39.2

[PATCH for-8.1 v4 03/25] target/riscv/cpu.c: remove set_priv_version()

2023-03-22 Thread Daniel Henrique Barboza

The setter is doing nothing special. Just set env->priv_ver directly.

Signed-off-by: Daniel Henrique Barboza 
Reviewed-by: LIU Zhiwei 
---
 target/riscv/cpu.c | 30 +-
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 2752efe1eb..18032dfd4e 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -240,11 +240,6 @@ static void set_misa(CPURISCVState *env, RISCVMXL mxl, uint32_t ext)
 env->misa_ext_mask = env->misa_ext = ext;
 }
 
-static void set_priv_version(CPURISCVState *env, int priv_ver)
-{
-env->priv_ver = priv_ver;
-}
-
 #ifndef CONFIG_USER_ONLY
 static uint8_t satp_mode_from_str(const char *satp_mode_str)
 {
@@ -343,7 +338,7 @@ static void riscv_any_cpu_init(Object *obj)
 VM_1_10_SV32 : VM_1_10_SV57);
 #endif
 
-set_priv_version(env, PRIV_VERSION_1_12_0);
+env->priv_ver = PRIV_VERSION_1_12_0;
 register_cpu_props(obj);
 }
 
@@ -355,7 +350,7 @@ static void rv64_base_cpu_init(Object *obj)
 set_misa(env, MXL_RV64, 0);
 register_cpu_props(obj);
 /* Set latest version of privileged specification */
-set_priv_version(env, PRIV_VERSION_1_12_0);
+env->priv_ver = PRIV_VERSION_1_12_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV57);
 #endif
@@ -366,7 +361,7 @@ static void rv64_sifive_u_cpu_init(Object *obj)
 CPURISCVState *env = &RISCV_CPU(obj)->env;
 set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
 register_cpu_props(obj);
-set_priv_version(env, PRIV_VERSION_1_10_0);
+env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV39);
 #endif
@@ -379,7 +374,7 @@ static void rv64_sifive_e_cpu_init(Object *obj)
 
 set_misa(env, MXL_RV64, RVI | RVM | RVA | RVC | RVU);
 register_cpu_props(obj);
-set_priv_version(env, PRIV_VERSION_1_10_0);
+env->priv_ver = PRIV_VERSION_1_10_0;
 cpu->cfg.mmu = false;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -392,7 +387,7 @@ static void rv64_thead_c906_cpu_init(Object *obj)
 RISCVCPU *cpu = RISCV_CPU(obj);
 
 set_misa(env, MXL_RV64, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
-set_priv_version(env, PRIV_VERSION_1_11_0);
+env->priv_ver = PRIV_VERSION_1_11_0;
 
 cpu->cfg.ext_g = true;
 cpu->cfg.ext_c = true;
@@ -431,7 +426,7 @@ static void rv128_base_cpu_init(Object *obj)
 set_misa(env, MXL_RV128, 0);
 register_cpu_props(obj);
 /* Set latest version of privileged specification */
-set_priv_version(env, PRIV_VERSION_1_12_0);
+env->priv_ver = PRIV_VERSION_1_12_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV57);
 #endif
@@ -444,7 +439,7 @@ static void rv32_base_cpu_init(Object *obj)
 set_misa(env, MXL_RV32, 0);
 register_cpu_props(obj);
 /* Set latest version of privileged specification */
-set_priv_version(env, PRIV_VERSION_1_12_0);
+env->priv_ver = PRIV_VERSION_1_12_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV32);
 #endif
@@ -454,8 +449,9 @@ static void rv32_sifive_u_cpu_init(Object *obj)
 {
 CPURISCVState *env = &RISCV_CPU(obj)->env;
 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVD | RVC | RVS | RVU);
+
 register_cpu_props(obj);
-set_priv_version(env, PRIV_VERSION_1_10_0);
+env->priv_ver = PRIV_VERSION_1_10_0;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(RISCV_CPU(obj), VM_1_10_SV32);
 #endif
@@ -468,7 +464,7 @@ static void rv32_sifive_e_cpu_init(Object *obj)
 
 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVC | RVU);
 register_cpu_props(obj);
-set_priv_version(env, PRIV_VERSION_1_10_0);
+env->priv_ver = PRIV_VERSION_1_10_0;
 cpu->cfg.mmu = false;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -482,7 +478,7 @@ static void rv32_ibex_cpu_init(Object *obj)
 
 set_misa(env, MXL_RV32, RVI | RVM | RVC | RVU);
 register_cpu_props(obj);
-set_priv_version(env, PRIV_VERSION_1_11_0);
+env->priv_ver = PRIV_VERSION_1_11_0;
 cpu->cfg.mmu = false;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -497,7 +493,7 @@ static void rv32_imafcu_nommu_cpu_init(Object *obj)
 
 set_misa(env, MXL_RV32, RVI | RVM | RVA | RVF | RVC | RVU);
 register_cpu_props(obj);
-set_priv_version(env, PRIV_VERSION_1_10_0);
+env->priv_ver = PRIV_VERSION_1_10_0;
 cpu->cfg.mmu = false;
 #ifndef CONFIG_USER_ONLY
 set_satp_mode_max_supported(cpu, VM_1_10_MBARE);
@@ -1160,7 +1156,7 @@ static void riscv_cpu_realize(DeviceState *dev, Error **errp)
 }
 
 if (priv_version >= PRIV_VERSION_1_10_0) {
-set_priv_version(env, priv_version);
+env->priv_ver = priv_version;
 }
 
 /* Force disable extensions if priv spec version does not
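The realize hunk keeps the `priv_version >= PRIV_VERSION_1_10_0` guard and only swaps the setter call for a plain assignment. Below is a minimal sketch of that pattern. It assumes, as the guard suggests, that a negative value means no spec version was requested. The enum, the accepted strings and the `toy_*` helpers are hypothetical stand-ins, not QEMU's definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum keeping the ordering the guard relies on:
 * newer spec versions compare greater than older ones. */
enum {
    TOY_PRIV_1_10_0 = 0,
    TOY_PRIV_1_11_0,
    TOY_PRIV_1_12_0,
};

typedef struct {
    int priv_ver;
} ToyEnv;

/* Map a user-supplied spec string to a version; -1 means "not set",
 * -2 means "unsupported". The strings are illustrative only. */
static int toy_parse_priv_spec(const char *spec)
{
    if (spec == NULL) {
        return -1;
    }
    if (strcmp(spec, "v1.12.0") == 0) {
        return TOY_PRIV_1_12_0;
    }
    if (strcmp(spec, "v1.11.0") == 0) {
        return TOY_PRIV_1_11_0;
    }
    if (strcmp(spec, "v1.10.0") == 0) {
        return TOY_PRIV_1_10_0;
    }
    return -2; /* a real caller would report an error here */
}

/* Same guard as the realize hunk: only override the per-model default
 * when a valid version was requested, assigning the field directly. */
static void toy_apply_priv_spec(ToyEnv *env, const char *spec)
{
    int priv_version = toy_parse_priv_spec(spec);

    if (priv_version >= TOY_PRIV_1_10_0) {
        env->priv_ver = priv_version;
    }
}
```

With no setter in between, the guard is the only logic left. The per-model defaults assigned in the `*_cpu_init()` functions survive unless a valid version is supplied.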

Re: [PATCH 4/4] hw/pci: Ensure pci_add_capability() is called before device is realized

2023-03-22 Thread Philippe Mathieu-Daudé


On 22/3/23 09:52, Philippe Mathieu-Daudé wrote:
> On 22/3/23 03:18, Michael S. Tsirkin wrote:
> > On Tue, Mar 14, 2023 at 12:14:35PM +0100, Philippe Mathieu-Daudé wrote:
> > > PCI capabilities can't appear magically at runtime.
> > > Guests aren't expecting that. Assert all capabilities
> > > are added _before_ a device instance is realized.
> > >
> > > Signed-off-by: Philippe Mathieu-Daudé 
> > > ---
> > >  hw/pci/pci.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > index ac41fcbf6a..ed60b352e4 100644
> > > --- a/hw/pci/pci.c
> > > +++ b/hw/pci/pci.c
> > > @@ -2397,7 +2397,7 @@ static void pci_del_option_rom(PCIDevice *pdev)
> > >   * On success, pci_add_capability() returns a positive value
> > >   * that the offset of the pci capability.
> > >   * On failure, it sets an error and returns a negative error
> > > - * code.
> > > + * code. @pdev must be unrealized.
> > >   */
> > >  int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> > >  uint8_t offset, uint8_t size,
> > > @@ -2406,6 +2406,8 @@ int pci_add_capability(PCIDevice *pdev, uint8_t cap_id,
> > >
> > >  uint8_t *config;
> > >  int i, overlapping_cap;
> > > +    assert(!DEVICE(pdev)->realized);
> > > +
> > >  if (!offset) {
> > >  offset = pci_find_space(pdev, size);
> > >  /* out of PCI config space is programming error */
> >
> > Fails in CI:
> >
> > https://gitlab.com/mstredhat/qemu/-/jobs/3976974199
> >
> > qemu-system-i386: ../hw/pci/pci.c:2409: pci_add_capability: Assertion `!DEVICE(pdev)->realized' failed.
> >
> > Broken pipe
> > ../tests/qtest/libqtest.c:193: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)
> >
> > TAP parsing error: Too few tests run (expected 49, got 40)
> > (test program exited with status code -6)
>
> Thanks for testing!
>
> Likely the AMD-Vi device, see on the cover this series is
> Based-on: <20230313153031.86107-1-phi...@linaro.org>
>    "hw/i386/amd_iommu: Orphanize & QDev cleanups"
> https://lore.kernel.org/qemu-devel/20230313153031.86107-1-phi...@linaro.org/


I confirm this is the AMD-Vi device, so you are missing the
previous (based-on) series:

#1 0x102d4e5b0 in pci_add_capability pci.c:2354
#2 0x102d2ff28 in msi_init msi.c:227
#3 0x10371a340 in amdvi_sysbus_realize amd_iommu.c:1553
#4 0x1037194e8 in x86_iommu_realize x86-iommu.c:124
#5 0x10409db88 in device_set_realized+0x788 (qemu-system-i386:arm64+0x101d91b88)
#6 0x1040cb248 in property_set_bool+0x2a0 (qemu-system-i386:arm64+0x101dbf248)
#7 0x1040c31f4 in object_property_set+0x4bc (qemu-system-i386:arm64+0x101db71f4)
#8 0x1040d9990 in object_property_set_qobject+0x38 (qemu-system-i386:arm64+0x101dcd990)
#9 0x1040c40f8 in object_property_set_bool+0xfc (qemu-system-i386:arm64+0x101db80f8)
#10 0x10409639c in qdev_realize+0x3bc (qemu-system-i386:arm64+0x101d8a39c)
#11 0x10334f8e8 in qdev_device_add_from_qdict qdev-monitor.c:714
#12 0x103352114 in qdev_device_add qdev-monitor.c:733
#13 0x10337906c in device_init_func vl.c:1140
#14 0x104a84200 in qemu_opts_foreach qemu-option.c:1135
#15 0x103364fcc in qemu_create_cli_devices vl.c:2541
#16 0x103364ab8 in qmp_x_exit_preconfig vl.c:2609
#17 0x10336c0dc in qemu_init vl.c:3611
#18 0x1040812e4 in main main.c:47
#19 0x1a025be4c  ()

Due to the required base series, this is 8.1 material.
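The sketch below illustrates the invariant the patch asserts. It uses a hypothetical `ToyPCIDevice` rather than QEMU's `PCIDevice`. Capabilities are laid out while the device is unrealized, and adding one afterwards trips the assertion. That is the situation in the backtrace, where `amdvi_sysbus_realize()` reaches `pci_add_capability()` through `msi_init()` on an already-realized device. The linear allocator starting at 0x40 and the prepend-to-list behaviour are simplifications chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_CAP_LIST_HEAD 0x34   /* Capabilities Pointer register */

/* Hypothetical model of a PCI function: 256 bytes of config space,
 * a realized flag, and a naive allocator for capability offsets. */
typedef struct {
    uint8_t config[256];
    bool realized;
    unsigned next_free;
} ToyPCIDevice;

static void toy_pci_init(ToyPCIDevice *d)
{
    memset(d, 0, sizeof(*d));
    d->next_free = 0x40;         /* first byte after the standard header */
}

/* Add a capability and return its offset, or -1 if it does not fit.
 * Like the patched pci_add_capability(), it insists the device is not
 * yet realized: guests do not expect the capability list to change. */
static int toy_add_capability(ToyPCIDevice *d, uint8_t cap_id, uint8_t size)
{
    unsigned offset;

    assert(!d->realized);
    if (size < 2 || d->next_free + size > sizeof(d->config)) {
        return -1;
    }
    offset = d->next_free;
    d->config[offset] = cap_id;                              /* ID */
    d->config[offset + 1] = d->config[TOY_CAP_LIST_HEAD];    /* next */
    d->config[TOY_CAP_LIST_HEAD] = (uint8_t)offset;          /* new head */
    d->next_free = offset + size;
    return (int)offset;
}

static void toy_realize(ToyPCIDevice *d)
{
    d->realized = true;
}
```

Calling `toy_add_capability()` after `toy_realize()` aborts (unless built with `NDEBUG`), which is the same failure mode as the CI job quoted above.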

Re: [PATCH v2 7/7] target/arm: Add CPU properties for most v8.3 PAC features

2023-03-22 Thread Aaron Lindsay

On Feb 22 12:14, Richard Henderson wrote:
> On 2/22/23 09:35, Aaron Lindsay wrote:
> > +static Property arm_cpu_pauth2_property =
> > +DEFINE_PROP_BOOL("pauth2", ARMCPU, prop_pauth2, false);
> > +static Property arm_cpu_pauth_fpac_property =
> > +DEFINE_PROP_BOOL("pauth-fpac", ARMCPU, prop_pauth_fpac, false);
> > +static Property arm_cpu_pauth_fpac_combine_property =
> > +DEFINE_PROP_BOOL("pauth-fpac-combine", ARMCPU, prop_pauth_fpac_combine, false);
> 
> For -cpu max, I would expect these to default on.
> Or perhaps not expose these or epac as properties at all.

I've removed these properties, and epac's as well. It now defaults to
the equivalent of prop_pauth_fpac_combine==true in my previous patch.

> I see that qarma3 does about half the work of qarma5, so it would be
> interesting to measure the relative speed of the 3 implementations on a boot
> of kernel + selftests.
> 
> You may want to look at the code generated and play with flatten and noinline
> attributes around pauth_computepac and subroutines.  E.g.
>
> static uint64_t __attribute__((flatten, noinline))
> pauth_computepac_qarma5(uint64_t data, uint64_t modifier, ARMPACKey key)
> {
>     return pauth_computepac_architected(data, modifier, key, false);
> }
>
> static uint64_t __attribute__((flatten, noinline))
> pauth_computepac_qarma3(uint64_t data, uint64_t modifier, ARMPACKey key)
> {
>     return pauth_computepac_architected(data, modifier, key, true);
> }
>
> static uint64_t __attribute__((flatten, noinline))
> pauth_computepac_impdef(uint64_t data, uint64_t modifier, ARMPACKey key)
> {
>     return qemu_xxhash64_4(data, modifier, key.lo, key.hi);
> }
>
> static uint64_t pauth_computepac(CPUARMState *env, uint64_t data,
>                                  uint64_t modifier, ARMPACKey key)
> {
>     if (cpu_isar_feature(aa64_pauth_arch_qarma5, env_archcpu(env))) {
>         return pauth_computepac_qarma5(data, modifier, key);
>     } else if (cpu_isar_feature(aa64_pauth_arch_qarma3, env_archcpu(env))) {
>         return pauth_computepac_qarma3(data, modifier, key);
>     } else {
>         return pauth_computepac_impdef(data, modifier, key);
>     }
> }

I have not played around with this further. Do you feel this is
important to look into prior to merging this patchset (since QARMA3
isn't the default)?

-Aaron
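Richard's suggestion keeps each PAC algorithm in its own non-inlined function behind a single dispatcher. That makes the variants easy to tell apart in a profiler or in the disassembly. The sketch below shows only that shape, not QEMU's implementation:

- The three `toy_pac_*` bodies are placeholder mixing functions, not QARMA5, QARMA3 or QEMU's impdef hash.
- `ToyPacAlgo` stands in for the `cpu_isar_feature()` checks.
- `toy_bench_ns()` is a crude POSIX `clock_gettime()` timing loop.

The `flatten` and `noinline` attributes are GCC/Clang extensions.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical selector standing in for the ISAR feature checks. */
typedef enum { TOY_PAC_QARMA5, TOY_PAC_QARMA3, TOY_PAC_IMPDEF } ToyPacAlgo;

/* Placeholders only: distinct cheap mixes so dispatch is observable.
 * noinline keeps each variant a separate symbol for profiling. */
static uint64_t __attribute__((flatten, noinline))
toy_pac_qarma5(uint64_t data, uint64_t modifier)
{
    return (data ^ modifier) * 0x9e3779b97f4a7c15ULL;
}

static uint64_t __attribute__((flatten, noinline))
toy_pac_qarma3(uint64_t data, uint64_t modifier)
{
    return (data + modifier) * 0xbf58476d1ce4e5b9ULL;
}

static uint64_t __attribute__((flatten, noinline))
toy_pac_impdef(uint64_t data, uint64_t modifier)
{
    uint64_t x = data ^ (modifier << 1);
    return x ^ (x >> 31);
}

/* Same if/else-if shape as the quoted pauth_computepac(). */
static uint64_t toy_computepac(ToyPacAlgo algo, uint64_t data,
                               uint64_t modifier)
{
    if (algo == TOY_PAC_QARMA5) {
        return toy_pac_qarma5(data, modifier);
    } else if (algo == TOY_PAC_QARMA3) {
        return toy_pac_qarma3(data, modifier);
    } else {
        return toy_pac_impdef(data, modifier);
    }
}

/* Crude wall-clock measurement; a real comparison would time a guest
 * kernel boot plus selftests, as suggested in the review. */
static int64_t toy_bench_ns(ToyPacAlgo algo, int iterations)
{
    struct timespec t0, t1;
    volatile uint64_t sink = 0;  /* keeps the loop from being elided */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        sink ^= toy_computepac(algo, (uint64_t)i, sink);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
           + (t1.tv_nsec - t0.tv_nsec);
}
```

Timings from a loop like this are only indicative. With real implementations plugged in, the per-variant symbols let a tool such as `perf` attribute time to each algorithm directly.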

Re: [PATCH v2 6/7] target/arm: Implement v8.3 FPAC and FPACCOMBINE

2023-03-22 Thread Aaron Lindsay

On Feb 22 11:37, Richard Henderson wrote:
> On 2/22/23 09:35, Aaron Lindsay wrote:
> > @@ -406,6 +421,16 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
> >   uint64_t xor_mask = MAKE_64BIT_MASK(bot_bit, top_bit - bot_bit + 1) &
> >   ~MAKE_64BIT_MASK(55, 1);
> >   result = ((ptr ^ pac) & xor_mask) | (ptr & ~xor_mask);
> > +if (cpu_isar_feature(aa64_fpac_combine, env_archcpu(env)) ||
> > +(cpu_isar_feature(aa64_fpac, env_archcpu(env)) &&
> > + !is_combined)) {
> 
> Indentation is off.

In my latest patchset I fixed the indentation and also pulled
`env_archcpu(env)` out of this if-statement, but I am not confident I
have done what you intended. The QEMU Coding Style guide doesn't seem
to cover long statements like this in its section on indentation, so I
tried to follow other examples in the code. I'm happy to take further
direction here.

-Aaron
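For reference, one possible layout of the condition after hoisting the CPU pointer places the continuation line just inside the `if (` parenthesis. This is a hedged sketch with a hypothetical `ToyARMCPU` holding the two feature bits; it is not a statement of what the reviewer intended. The truth table comes from the quoted hunk. On the authentication-failure path, FPACCOMBINE faults for every instruction, while FPAC alone faults only when the instruction is not a combined one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ARMCPU with only the two feature bits. */
typedef struct {
    bool fpac;
    bool fpac_combine;
} ToyARMCPU;

/* 'cpu' is looked up once by the caller (as env_archcpu(env) would be),
 * so the condition stays short enough to wrap cleanly. */
static bool toy_auth_fail_faults(const ToyARMCPU *cpu, bool is_combined)
{
    if (cpu->fpac_combine ||
        (cpu->fpac && !is_combined)) {
        return true;
    }
    return false;
}
```

Hoisting the lookup into a local variable, as the EnhancedPAC patch also does with `ARMCPU *cpu = env_archcpu(env);`, keeps each operand short enough that the parenthesised second clause fits on one continuation line.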

[PATCH v3 4/8] target/arm: Implement v8.3 EnhancedPAC

2023-03-22 Thread Aaron Lindsay

Signed-off-by: Aaron Lindsay 
Reviewed-by: Peter Maydell 
Reviewed-by: Richard Henderson 
---
 target/arm/tcg/pauth_helper.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/target/arm/tcg/pauth_helper.c b/target/arm/tcg/pauth_helper.c
index 122c208de2..7682f139ef 100644
--- a/target/arm/tcg/pauth_helper.c
+++ b/target/arm/tcg/pauth_helper.c
@@ -326,6 +326,7 @@ static uint64_t pauth_computepac(CPUARMState *env, uint64_t data,
 static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
  ARMPACKey *key, bool data)
 {
+ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
 ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
 uint64_t pac, ext_ptr, ext, test;
@@ -351,11 +352,15 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
  */
 test = sextract64(ptr, bot_bit, top_bit - bot_bit);
 if (test != 0 && test != -1) {
-/*
- * Note that our top_bit is one greater than the pseudocode's
- * version, hence "- 2" here.
- */
-pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+if (cpu_isar_feature(aa64_pauth_epac, cpu)) {
+pac = 0;
+} else {
+/*
+ * Note that our top_bit is one greater than the pseudocode's
+ * version, hence "- 2" here.
+ */
+pac ^= MAKE_64BIT_MASK(top_bit - 2, 1);
+}
 }
 
 /*
-- 
2.25.1
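The new branch can be exercised in isolation. This sketch re-implements `MAKE_64BIT_MASK()` and `sextract64()` locally under `TOY_` names so it compiles outside QEMU. It also takes `bot_bit`/`top_bit` as plain parameters instead of deriving them from `ARMVAParameters`; the 48/56 values in the tests are just an example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the two bit helpers used in the hunk. */
#define TOY_MAKE_64BIT_MASK(shift, length) \
    ((~0ULL >> (64 - (length))) << (shift))

/* Sign-extend bits [start, start + length) of value. Relies on the
 * usual two's-complement conversion and arithmetic right shift. */
static int64_t toy_sextract64(uint64_t value, int start, int length)
{
    return ((int64_t)(value << (64 - length - start))) >> (64 - length);
}

/* Mirrors the patched branch of pauth_addpac(): if the bits between
 * bot_bit and top_bit are neither all 0s nor all 1s, the pointer is
 * non-canonical. With EnhancedPAC the PAC is zeroed; otherwise one PAC
 * bit is flipped, as before the patch. */
static uint64_t toy_adjust_pac(uint64_t pac, uint64_t ptr, int bot_bit,
                               int top_bit, bool have_epac)
{
    int64_t test = toy_sextract64(ptr, bot_bit, top_bit - bot_bit);

    if (test != 0 && test != -1) {
        if (have_epac) {
            pac = 0;
        } else {
            /* top_bit is one greater than the pseudocode's, hence "- 2" */
            pac ^= TOY_MAKE_64BIT_MASK(top_bit - 2, 1);
        }
    }
    return pac;
}
```

For example, with `bot_bit = 48` and `top_bit = 56`, a pointer whose bits 48..55 are `0x01` is non-canonical. The PAC then becomes 0 with EnhancedPAC, or has bit 54 flipped without it. Pointers whose bits 48..55 are all zeros or all ones leave the PAC untouched.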

[PATCH v3 1/8] target/arm: Add ID_AA64ISAR2_EL1

2023-03-22 Thread Aaron Lindsay

Signed-off-by: Aaron Lindsay 
---
 target/arm/cpu.h     | 1 +
 target/arm/helper.c  | 4 ++--
 target/arm/hvf/hvf.c | 1 +
 target/arm/kvm64.c   | 2 ++
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c097cae988..f0f27f259d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1015,6 +1015,7 @@ struct ArchCPU {
 uint32_t dbgdevid1;
 uint64_t id_aa64isar0;
 uint64_t id_aa64isar1;
+uint64_t id_aa64isar2;
 uint64_t id_aa64pfr0;
 uint64_t id_aa64pfr1;
 uint64_t id_aa64mmfr0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 2297626bfb..32426495c0 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8204,11 +8204,11 @@ void register_cp_regs_for_features(ARMCPU *cpu)
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
   .resetvalue = cpu->isar.id_aa64isar1 },
-{ .name = "ID_AA64ISAR2_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+{ .name = "ID_AA64ISAR2_EL1", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 2,
   .access = PL1_R, .type = ARM_CP_CONST,
   .accessfn = access_aa64_tid3,
-  .resetvalue = 0 },
+  .resetvalue = cpu->isar.id_aa64isar2 },
 { .name = "ID_AA64ISAR3_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
   .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 6, .opc2 = 3,
   .access = PL1_R, .type = ARM_CP_CONST,
diff --git a/target/arm/hvf/hvf.c b/target/arm/hvf/hvf.c
index ad65603445..4d7366b761 100644
--- a/target/arm/hvf/hvf.c
+++ b/target/arm/hvf/hvf.c
@@ -507,6 +507,7 @@ static bool hvf_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
 { HV_SYS_REG_ID_AA64DFR1_EL1, &host_isar.id_aa64dfr1 },
 { HV_SYS_REG_ID_AA64ISAR0_EL1, &host_isar.id_aa64isar0 },
 { HV_SYS_REG_ID_AA64ISAR1_EL1, &host_isar.id_aa64isar1 },
+{ HV_SYS_REG_ID_AA64ISAR2_EL1, &host_isar.id_aa64isar2 },
 { HV_SYS_REG_ID_AA64MMFR0_EL1, &host_isar.id_aa64mmfr0 },
 { HV_SYS_REG_ID_AA64MMFR1_EL1, &host_isar.id_aa64mmfr1 },
 { HV_SYS_REG_ID_AA64MMFR2_EL1, &host_isar.id_aa64mmfr2 },
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 1197253d12..4b71306f92 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -590,6 +590,8 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
   ARM64_SYS_REG(3, 0, 0, 6, 0));
 err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar1,
  ARM64_SYS_REG(3, 0, 0, 6, 1));
+err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar2,
+  ARM64_SYS_REG(3, 0, 0, 6, 2));
 err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr0,
  ARM64_SYS_REG(3, 0, 0, 7, 0));
 err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr1,
-- 
2.25.1
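The patch adds ID_AA64ISAR2_EL1 to two host-probing tables that follow the same idea. hvf.c lists {register, destination} pairs, and kvm64.c issues one more `err |= read_sys_reg64(...)` call with the (op0, op1, CRn, CRm, op2) = (3, 0, 0, 6, 2) encoding. The following is a hedged, self-contained sketch of that table-driven pattern. `ToyIsar`, `ToySysRegSlot` and `toy_read_sys_reg64()` are hypothetical, and the reader fabricates its values from the encoding purely so the example runs without KVM or HVF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the ISAR register block. */
typedef struct {
    uint64_t id_aa64isar0;
    uint64_t id_aa64isar1;
    uint64_t id_aa64isar2;
} ToyIsar;

/* One table entry: the (op0, op1, CRn, CRm, op2) encoding plus the
 * field that should receive the value. */
typedef struct {
    uint8_t op0, op1, crn, crm, op2;
    uint64_t *dest;
} ToySysRegSlot;

/* Hypothetical host accessor: 0 on success, nonzero on error. The
 * returned value is fabricated from the encoding for illustration. */
static int toy_read_sys_reg64(uint8_t op0, uint8_t op1, uint8_t crn,
                              uint8_t crm, uint8_t op2, uint64_t *out)
{
    if (op0 != 3) {
        return 1;
    }
    *out = ((uint64_t)op1 << 16) | ((uint64_t)crn << 8) |
           ((uint64_t)crm << 4) | op2;
    return 0;
}

/* Read every slot, OR-ing the error codes together like kvm64.c, so
 * supporting another ID register is a one-line table change. */
static int toy_read_isar(ToyIsar *isar)
{
    ToySysRegSlot regs[] = {
        { 3, 0, 0, 6, 0, &isar->id_aa64isar0 },
        { 3, 0, 0, 6, 1, &isar->id_aa64isar1 },
        { 3, 0, 0, 6, 2, &isar->id_aa64isar2 },
    };
    int err = 0;

    for (size_t i = 0; i < sizeof(regs) / sizeof(regs[0]); i++) {
        err |= toy_read_sys_reg64(regs[i].op0, regs[i].op1, regs[i].crn,
                                  regs[i].crm, regs[i].op2, regs[i].dest);
    }
    return err;
}
```

Accumulating with `|=` means one failed read marks the whole probe as failed without stopping the remaining reads.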
