[PATCH v3 1/2] hw/registerfields: add `FIELDx_1CLEAR()` macro

2022-10-16 Thread Wilfred Mallawa
From: Wilfred Mallawa 

Adds a helper macro that implements the register `w1c`
functionality.

Ex:
  uint32_t data = FIELD32_1CLEAR(val, REG, FIELD);

If ANY bit of the specified `FIELD` is set, the field is cleared
and the result is returned in `data`.

If the field is already clear (0), nothing changes and
`val` is returned.

Signed-off-by: Wilfred Mallawa 
---
 include/hw/registerfields.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/include/hw/registerfields.h b/include/hw/registerfields.h
index 1330ca77de..0b8404c2f7 100644
--- a/include/hw/registerfields.h
+++ b/include/hw/registerfields.h
@@ -115,6 +115,28 @@
   R_ ## reg ## _ ## field ## _LENGTH, _v.v);  \
 _d; })
 
+/*
+ * Clear the specified field in storage if
+ * any field bits are set, else no changes made. Implements
+ * single/multi-bit `w1c`
+ *
+ */
+#define FIELD8_1CLEAR(storage, reg, field)\
+(FIELD_EX8(storage, reg, field) ? \
+FIELD_DP8(storage, reg, field, 0x00) : storage)
+
+#define FIELD16_1CLEAR(storage, reg, field)   \
+(FIELD_EX16(storage, reg, field) ?\
+FIELD_DP16(storage, reg, field, 0x00) : storage)
+
+#define FIELD32_1CLEAR(storage, reg, field)   \
+(FIELD_EX32(storage, reg, field) ?\
+FIELD_DP32(storage, reg, field, 0x00) : storage)
+
+#define FIELD64_1CLEAR(storage, reg, field)   \
+(FIELD_EX64(storage, reg, field) ?\
+FIELD_DP64(storage, reg, field, 0x00) : storage)
+
 #define FIELD_SDP8(storage, reg, field, val) ({   \
 struct {  \
 signed int v:R_ ## reg ## _ ## field ## _LENGTH;  \
-- 
2.37.3
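
For illustration, a minimal standalone sketch of the semantics described in the commit message. The `extract32()`/`deposit32()` helpers and the `FIELD_EX32`/`FIELD_DP32` macros below are simplified stand-ins for the QEMU ones, and the `DEMO` register with its `FLAGS` field is hypothetical; only `FIELD32_1CLEAR` has the same shape as the macro in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for QEMU's bit helpers (assume 0 < length < 32). */
static inline uint32_t extract32(uint32_t value, int start, int length)
{
    return (value >> start) & ((1u << length) - 1);
}

static inline uint32_t deposit32(uint32_t value, int start, int length,
                                 uint32_t fieldval)
{
    uint32_t mask = ((1u << length) - 1) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Hypothetical register DEMO with a 3-bit field FLAGS at bits [6:4]. */
#define R_DEMO_FLAGS_SHIFT  4
#define R_DEMO_FLAGS_LENGTH 3

#define FIELD_EX32(storage, reg, field) \
    extract32((storage), R_ ## reg ## _ ## field ## _SHIFT, \
              R_ ## reg ## _ ## field ## _LENGTH)

#define FIELD_DP32(storage, reg, field, val) \
    deposit32((storage), R_ ## reg ## _ ## field ## _SHIFT, \
              R_ ## reg ## _ ## field ## _LENGTH, (val))

/* Same shape as the macro added by the patch. */
#define FIELD32_1CLEAR(storage, reg, field) \
    (FIELD_EX32(storage, reg, field) ? \
     FIELD_DP32(storage, reg, field, 0x00) : storage)

static inline void field32_1clear_demo(void)
{
    /* One FLAGS bit is set (bit 5): the whole field is cleared. */
    assert(FIELD32_1CLEAR(0x00000123u, DEMO, FLAGS) == 0x00000103u);
    /* FLAGS is already zero: the value is returned unchanged. */
    assert(FIELD32_1CLEAR(0x00000103u, DEMO, FLAGS) == 0x00000103u);
}
```

Bits outside the field are never touched, which is what lets a caller apply the macro to the stored register value one field at a time.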




[PATCH v3 2/2] hw/ssi/ibex_spi: implement `FIELD32_1CLEAR` macro

2022-10-16 Thread Wilfred Mallawa
From: Wilfred Mallawa 

Use the `FIELD32_1CLEAR` macro to implement register
`rw1c` functionality in `ibex_spi`.

This change was tested by running the `SPI_HOST` from TockOS.

Signed-off-by: Wilfred Mallawa 
---
 hw/ssi/ibex_spi_host.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/hw/ssi/ibex_spi_host.c b/hw/ssi/ibex_spi_host.c
index 57df462e3c..0a456cd1ed 100644
--- a/hw/ssi/ibex_spi_host.c
+++ b/hw/ssi/ibex_spi_host.c
@@ -342,7 +342,7 @@ static void ibex_spi_host_write(void *opaque, hwaddr addr,
 {
 IbexSPIHostState *s = opaque;
 uint32_t val32 = val64;
-uint32_t shift_mask = 0xff, status = 0, data = 0;
+uint32_t shift_mask = 0xff, status = 0;
 uint8_t txqd_len;
 
 trace_ibex_spi_host_write(addr, size, val64);
@@ -355,12 +355,11 @@ static void ibex_spi_host_write(void *opaque, hwaddr addr,
 case IBEX_SPI_HOST_INTR_STATE:
 /* rw1c status register */
 if (FIELD_EX32(val32, INTR_STATE, ERROR)) {
-data = FIELD_DP32(data, INTR_STATE, ERROR, 0);
+s->regs[addr] = FIELD32_1CLEAR(s->regs[addr], INTR_STATE, ERROR);
 }
 if (FIELD_EX32(val32, INTR_STATE, SPI_EVENT)) {
-data = FIELD_DP32(data, INTR_STATE, SPI_EVENT, 0);
+s->regs[addr] = FIELD32_1CLEAR(s->regs[addr], INTR_STATE, SPI_EVENT);
 }
-s->regs[addr] = data;
 break;
 case IBEX_SPI_HOST_INTR_ENABLE:
 s->regs[addr] = val32;
@@ -505,27 +504,25 @@ static void ibex_spi_host_write(void *opaque, hwaddr addr,
  *  When an error occurs, the corresponding bit must be cleared
  *  here before issuing any further commands
  */
-status = s->regs[addr];
 /* rw1c status register */
 if (FIELD_EX32(val32, ERROR_STATUS, CMDBUSY)) {
-status = FIELD_DP32(status, ERROR_STATUS, CMDBUSY, 0);
+s->regs[addr] = FIELD32_1CLEAR(s->regs[addr], ERROR_STATUS, CMDBUSY);
 }
 if (FIELD_EX32(val32, ERROR_STATUS, OVERFLOW)) {
-status = FIELD_DP32(status, ERROR_STATUS, OVERFLOW, 0);
+s->regs[addr] = FIELD32_1CLEAR(s->regs[addr], ERROR_STATUS, OVERFLOW);
 }
 if (FIELD_EX32(val32, ERROR_STATUS, UNDERFLOW)) {
-status = FIELD_DP32(status, ERROR_STATUS, UNDERFLOW, 0);
+s->regs[addr] = FIELD32_1CLEAR(s->regs[addr], ERROR_STATUS, UNDERFLOW);
 }
 if (FIELD_EX32(val32, ERROR_STATUS, CMDINVAL)) {
-status = FIELD_DP32(status, ERROR_STATUS, CMDINVAL, 0);
+s->regs[addr] = FIELD32_1CLEAR(s->regs[addr], ERROR_STATUS, CMDINVAL);
 }
 if (FIELD_EX32(val32, ERROR_STATUS, CSIDINVAL)) {
-status = FIELD_DP32(status, ERROR_STATUS, CSIDINVAL, 0);
+s->regs[addr] = FIELD32_1CLEAR(s->regs[addr], ERROR_STATUS, CSIDINVAL);
 }
 if (FIELD_EX32(val32, ERROR_STATUS, ACCESSINVAL)) {
-status = FIELD_DP32(status, ERROR_STATUS, ACCESSINVAL, 0);
+s->regs[addr] = FIELD32_1CLEAR(s->regs[addr], ERROR_STATUS, ACCESSINVAL);
 }
-s->regs[addr] = status;
 break;
 case IBEX_SPI_HOST_EVENT_ENABLE:
 /* Controls which classes of SPI events raise an interrupt. */
-- 
2.37.3
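
The rw1c handling in this patch follows a common pattern: a status field in the stored register is cleared only when the guest writes 1 to it, and writing 0 leaves it alone. For single-bit fields the same behaviour can be written with plain masks. A minimal sketch, assuming a hypothetical register with two single-bit status fields (the `DEMO_*` names are illustrative and not taken from the Ibex model):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-bit rw1c status fields. */
#define DEMO_ERROR    (1u << 0)
#define DEMO_EVENT    (1u << 1)
#define DEMO_W1C_MASK (DEMO_ERROR | DEMO_EVENT)

/* Return the new register contents after the guest writes 'written'. */
static inline uint32_t demo_rw1c_write(uint32_t stored, uint32_t written)
{
    /* Only bits that are both rw1c and written as 1 get cleared. */
    return stored & ~(written & DEMO_W1C_MASK);
}

static inline void demo_rw1c_check(void)
{
    /* Writing 1 to ERROR clears ERROR and keeps EVENT pending. */
    assert(demo_rw1c_write(DEMO_ERROR | DEMO_EVENT, DEMO_ERROR) == DEMO_EVENT);
    /* Writing 0 changes nothing. */
    assert(demo_rw1c_write(DEMO_ERROR | DEMO_EVENT, 0) ==
           (DEMO_ERROR | DEMO_EVENT));
}
```

The per-field `FIELD32_1CLEAR()` form used in the patch also covers multi-bit fields, which the mask form above does not attempt to model.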




[PATCH v3 0/2] implement `FIELDx_1CLEAR()` macro

2022-10-16 Thread Wilfred Mallawa
From: Wilfred Mallawa 

This patch series adds a `FIELDx_1CLEAR()` macro and uses it
in the `hw/ssi/ibex_spi_host.c` model.

*** Changelog ***
Since v2:
- change the macro arguments name to match
  the existing macros.
 (reg_val, reg, field) -> (storage, reg, field)

- Add the use of this macro to `ibex_spi`

Since v1: 
- Instead of needing all field bits to be set 
  we clear the field if any are set.
  If the field is 0/clear then no change.

Wilfred Mallawa (2):
  hw/registerfields: add `FIELDx_1CLEAR()` macro
  hw/ssi/ibex_spi:  implement `FIELD32_1CLEAR` macro

 hw/ssi/ibex_spi_host.c  | 21 +
 include/hw/registerfields.h | 22 ++
 2 files changed, 31 insertions(+), 12 deletions(-)

-- 
2.37.3




Re: [PATCH v4 2/3] block: introduce zone append write for zoned devices

2022-10-16 Thread Damien Le Moal
On 10/16/22 23:56, Sam Li wrote:
> A zone append command is a write operation that specifies the first
> logical block of a zone as the write position. When writing to a zoned
> block device using zone append, the byte offset of writes is pointing
> to the write pointer of that zone. Upon completion the device will
> respond with the position the data has been written in the zone.
> 
> Signed-off-by: Sam Li 
> ---
>  block/block-backend.c | 65 ++
>  block/file-posix.c| 89 +--
>  block/io.c| 21 
>  block/raw-format.c|  8 +++
>  include/block/block-io.h  |  3 ++
>  include/block/block_int-common.h  |  5 ++
>  include/block/raw-aio.h   |  4 +-
>  include/sysemu/block-backend-io.h |  9 
>  8 files changed, 198 insertions(+), 6 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 1c618e9c68..06931ddd24 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1439,6 +1439,9 @@ typedef struct BlkRwCo {
>  struct {
>  unsigned long op;
>  } zone_mgmt;
> +struct {
> +int64_t *append_sector;

As mentioned previously, call this sector. "append" is already in the
zone_append struct member name.

> +} zone_append;
>  };
>  } BlkRwCo;
>  
> @@ -1871,6 +1874,47 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, 
> BlockZoneOp op,
>  return &acb->common;
>  }
>  
> +static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
> +{
> +BlkAioEmAIOCB *acb = opaque;
> +BlkRwCo *rwco = &acb->rwco;
> +
> +rwco->ret = blk_co_zone_append(rwco->blk, 
> rwco->zone_append.append_sector,

...so you avoid awkward repetitions of "append" like here. You'll have:
rwco->zone_append.sector, which is shorter and more natural.

> +   rwco->iobuf, rwco->flags);
> +blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
> +QEMUIOVector *qiov, BdrvRequestFlags flags,
> +BlockCompletionFunc *cb, void *opaque) {
> +BlkAioEmAIOCB *acb;
> +Coroutine *co;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk);
> +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +acb->rwco = (BlkRwCo) {
> +.blk= blk,
> +.ret= NOT_DONE,
> +.flags  = flags,
> +.iobuf  = qiov,
> +.zone_append = {
> +.append_sector = offset,
> +},
> +};
> +acb->has_returned = false;
> +
> +co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
> +bdrv_coroutine_enter(blk_bs(blk), co);
> +acb->has_returned = true;
> +if (acb->rwco.ret != NOT_DONE) {
> +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +}
> +
> +return &acb->common;
> +}
> +
>  /*
>   * Send a zone_report command.
>   * offset is a byte offset from the start of the device. No alignment
> @@ -1923,6 +1967,27 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, 
> BlockZoneOp op,
>  return ret;
>  }
>  
> +/*
> + * Send a zone_append command.
> + */
> +int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
> +QEMUIOVector *qiov, BdrvRequestFlags flags)
> +{
> +int ret;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk);
> +blk_wait_while_drained(blk);
> +if (!blk_is_available(blk)) {
> +blk_dec_in_flight(blk);
> +return -ENOMEDIUM;
> +}
> +
> +ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
> +blk_dec_in_flight(blk);
> +return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>  BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 5ff5500301..3d0cc33d02 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -205,6 +205,7 @@ typedef struct RawPosixAIOData {
>  struct {
>  struct iovec *iov;
>  int niov;
> +int64_t *offset;
>  } io;
>  struct {
>  uint64_t cmd;
> @@ -1475,6 +1476,11 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  bs->bl.max_active_zones = ret;
>  }
>  
> +ret = get_sysfs_long_val(&st, "physical_block_size");
> +if (ret >= 0) {
> +bs->bl.write_granularity = ret;
> +}
> +
>  bs->bl.wps = g_malloc(sizeof(BlockZoneWps) + sizeof(int64_t) * ret);
>  if (get_zones_wp(s->fd, bs->bl.wps, 0, ret) < 0) {
>  error_report("report wps failed");
> @@ -1647,9 +1653,18 @@ qemu_pwritev(int fd, const struct iovec *iov, int 
> nr_iov, off_t offset)
>  static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
>  {
>  ssize_t len;
> +

Re: [PATCH v4 1/3] file-posix: add the tracking of the zones write pointers

2022-10-16 Thread Damien Le Moal
On 10/16/22 23:56, Sam Li wrote:
> Since Linux doesn't have a user API to issue zone append operations to
> zoned devices from user space, the file-posix driver is modified to add
> zone append emulation using regular writes. To do this, the file-posix
> driver tracks the wp location of all zones of the device. It uses an
> array of uint64_t. The most significant bit of each wp location indicates
> if the zone type is conventional zones.
> 
> The zones wp can be changed due to the following operations issued:
> - zone reset: change the wp to the start offset of that zone
> - zone finish: change to the end location of that zone
> - write to a zone
> - zone append
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c   | 144 +++
>  include/block/block-common.h |  14 +++
>  include/block/block_int-common.h |   3 +
>  3 files changed, 161 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 7c5a330fc1..5ff5500301 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1324,6 +1324,66 @@ static int hdev_get_max_segments(int fd, struct stat 
> *st)
>  #endif
>  }
>  
> +#if defined(CONFIG_BLKZONED)
> +static int get_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> +unsigned int nrz) {
> +struct blk_zone *blkz;
> +int64_t rep_size;
> +int64_t sector = offset >> BDRV_SECTOR_BITS;
> +int ret, n = 0, i = 0;
> +rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct 
> blk_zone);
> +g_autofree struct blk_zone_report *rep = NULL;
> +
> +rep = g_malloc(rep_size);
> +blkz = (struct blk_zone *)(rep + 1);
> +while (n < nrz) {
> +memset(rep, 0, rep_size);
> +rep->sector = sector;
> +rep->nr_zones = nrz - n;
> +
> +do {
> +ret = ioctl(fd, BLKREPORTZONE, rep);
> +} while (ret != 0 && errno == EINTR);
> +if (ret != 0) {
> +error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
> +fd, offset, errno);
> +return -errno;
> +}
> +
> +if (!rep->nr_zones) {
> +break;
> +}
> +
> +for (i = 0; i < rep->nr_zones; i++, n++) {
> +/*
> + * The wp tracking cares only about sequential writes required 
> and
> + * sequential write preferred zones so that the wp can advance to
> + * the right location.
> + * Use the most significant bit of the wp location to indicate 
> the
> + * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
> + */
> +if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> +wps->wp[i] = 1ULL << 63;
> +} else {
> +wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;

Nit: For full, read-only and offline zones, the wp of a zone is undefined,
that is, its value may be total garbage and should not be used. The kernel
will normally report a wp set to zone start + zone len for these cases,
but better do the same here too. So this single line should be something
like this:

    switch (blkz[i].cond) {
    case BLK_ZONE_COND_FULL:
    case BLK_ZONE_COND_READONLY:
        /* Zone not writeable */
        wps->wp[i] = (blkz[i].start + blkz[i].len) << BDRV_SECTOR_BITS;
        break;
    case BLK_ZONE_COND_OFFLINE:
        /* Zone not writable nor readable */
        wps->wp[i] = blkz[i].start << BDRV_SECTOR_BITS;
        break;
    default:
        wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;
        break;
    }
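
Combined with the conventional-zone check from the patch, the suggestion amounts to something like the following standalone sketch. The `demo_zone` struct and the `DEMO_*` constants are stand-ins for `struct blk_zone` and the `BLK_ZONE_*` values, so that the example compiles without kernel headers; `BDRV_SECTOR_BITS` is 9 as in QEMU.

```c
#include <assert.h>
#include <stdint.h>

#define BDRV_SECTOR_BITS 9

/* Stand-ins for the kernel's zone type and condition values. */
enum demo_zone_type { DEMO_TYPE_CONVENTIONAL, DEMO_TYPE_SEQ };
enum demo_zone_cond {
    DEMO_COND_EMPTY,
    DEMO_COND_OPEN,
    DEMO_COND_FULL,
    DEMO_COND_READONLY,
    DEMO_COND_OFFLINE,
};

/* Stand-in for struct blk_zone; start, len and wp are in 512-byte sectors. */
struct demo_zone {
    uint64_t start;
    uint64_t len;
    uint64_t wp;
    enum demo_zone_type type;
    enum demo_zone_cond cond;
};

/* Compute the tracked wp (in bytes) without trusting an undefined wp. */
static inline uint64_t demo_tracked_wp(const struct demo_zone *z)
{
    if (z->type == DEMO_TYPE_CONVENTIONAL) {
        return 1ULL << 63;              /* MSB marks a conventional zone */
    }
    switch (z->cond) {
    case DEMO_COND_FULL:
    case DEMO_COND_READONLY:
        return (z->start + z->len) << BDRV_SECTOR_BITS;
    case DEMO_COND_OFFLINE:
        return z->start << BDRV_SECTOR_BITS;
    default:
        return z->wp << BDRV_SECTOR_BITS;
    }
}

static inline void demo_tracked_wp_check(void)
{
    /* A full zone reports a garbage wp; the zone end is used instead. */
    struct demo_zone z = { 1024, 512, 0xdead, DEMO_TYPE_SEQ, DEMO_COND_FULL };
    assert(demo_tracked_wp(&z) == (uint64_t)1536 << BDRV_SECTOR_BITS);
}
```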

> +}
> +}
> +sector = blkz[i - 1].start + blkz[i - 1].len;
> +}
> +
> +return 0;
> +}
> +
> +static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> +unsigned int nrz) {
> +qemu_mutex_lock(&wps->lock);
> +if (get_zones_wp(fd, wps, offset, nrz) < 0) {
> +error_report("update zone wp failed");
> +}
> +qemu_mutex_unlock(&wps->lock);
> +}
> +#endif
> +
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
> @@ -1414,6 +1474,14 @@ static void raw_refresh_limits(BlockDriverState *bs, 
> Error **errp)
>  if (ret >= 0) {
>  bs->bl.max_active_zones = ret;
>  }
> +
> +bs->bl.wps = g_malloc(sizeof(BlockZoneWps) + sizeof(int64_t) * ret);
> +if (get_zones_wp(s->fd, bs->bl.wps, 0, ret) < 0) {
> +error_report("report wps failed");
> +g_free(bs->bl.wps);
> +return;
> +}
> +qemu_mutex_init(&bs->bl.wps->lock);
>  }
>  }
>  
> @@ -1725,6 +1793,25 @@ static int handle_aiocb_rw(void *opaque)
>  
>  out:
>  if (nbytes == aiocb->aio_nbytes) {
> +#if defined(CONFIG_BLKZONED)
> +if (aiocb->aio_type & QEMU_AIO_WRITE) {
> +BlockZoneWps *wps = aiocb->bs->bl.wps;
> +int index = aiocb->aio_offset / aiocb->bs->bl.zone_size;
> +if (wps) {
> +
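
The write pointer updates listed in the commit message (reset, finish, write, append) can be sketched as follows. This is a simplified, single-threaded model and not the patch's implementation: the `demo_zones` struct and the function names are hypothetical, offsets are in bytes, conventional zones are not modelled, and the mutex the patch places around the wp array is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NR_ZONES 4

struct demo_zones {
    uint64_t zone_size;            /* bytes per zone */
    uint64_t wp[DEMO_NR_ZONES];    /* byte offset of each zone's wp */
};

/* Zone reset: the wp goes back to the start of the zone. */
static inline void demo_zone_reset(struct demo_zones *z, unsigned idx)
{
    z->wp[idx] = idx * z->zone_size;
}

/* Zone finish: the wp moves to the end of the zone. */
static inline void demo_zone_finish(struct demo_zones *z, unsigned idx)
{
    z->wp[idx] = (idx + 1) * z->zone_size;
}

/*
 * Emulated zone append: the data lands at the current wp, the caller is
 * told where it was written, and the wp advances by the write length.
 * Returns 0 on success, -1 if the data does not fit in the zone.
 */
static inline int demo_zone_append(struct demo_zones *z, unsigned idx,
                                   uint64_t len, uint64_t *written_at)
{
    uint64_t end = (idx + 1) * z->zone_size;

    if (len > end - z->wp[idx]) {
        return -1;
    }
    *written_at = z->wp[idx];
    z->wp[idx] += len;
    return 0;
}

static inline void demo_zone_append_check(void)
{
    struct demo_zones z = { .zone_size = 4096 };
    uint64_t at = 0;

    demo_zone_reset(&z, 1);
    assert(demo_zone_append(&z, 1, 512, &at) == 0);
    assert(at == 4096 && z.wp[1] == 4608);
}
```

A regular write to a sequential zone updates the wp the same way as the append path, except that the caller supplies the offset instead of receiving it.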

Re: [PATCH v12 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-10-16 Thread Damien Le Moal
On 10/16/22 23:51, Sam Li wrote:
> Add a new zoned_host_device BlockDriver. The zoned_host_device option
> accepts only zoned host block devices. By adding zone management
> operations in this new BlockDriver, users can use the new block
> layer APIs including Report Zone and four zone management operations
> (open, close, finish, reset, reset_all).
> 
> Qemu-io uses the new APIs to perform zoned storage commands of the device:
> zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
> zone_finish(zf).
> 
> For example, to test zone_report, use following command:
> $ ./build/qemu-io --image-opts -n driver=zoned_host_device, 
> filename=/dev/nullb0
> -c "zrp offset nr_zones"
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> ---
>  block/block-backend.c | 148 +
>  block/file-posix.c| 335 ++
>  block/io.c|  41 
>  include/block/block-io.h  |   7 +
>  include/block/block_int-common.h  |  24 +++
>  include/block/raw-aio.h   |   6 +-
>  include/sysemu/block-backend-io.h |  18 ++
>  meson.build   |   4 +
>  qapi/block-core.json  |   8 +-
>  qemu-io-cmds.c| 149 +
>  10 files changed, 737 insertions(+), 3 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index aa4adf06ae..1c618e9c68 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo {
>  void *iobuf;
>  int ret;
>  BdrvRequestFlags flags;
> +union {
> +struct {
> +unsigned int *nr_zones;
> +BlockZoneDescriptor *zones;
> +} zone_report;
> +struct {
> +unsigned long op;
> +} zone_mgmt;
> +};
>  } BlkRwCo;
>  
>  int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
> @@ -1775,6 +1784,145 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>  return ret;
>  }
>  
> +static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
> +{
> +BlkAioEmAIOCB *acb = opaque;
> +BlkRwCo *rwco = &acb->rwco;
> +
> +rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
> +   rwco->zone_report.nr_zones,
> +   rwco->zone_report.zones);
> +blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
> +unsigned int *nr_zones,
> +BlockZoneDescriptor  *zones,
> +BlockCompletionFunc *cb, void *opaque)
> +{
> +BlkAioEmAIOCB *acb;
> +Coroutine *co;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk);
> +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +acb->rwco = (BlkRwCo) {
> +.blk= blk,
> +.offset = offset,
> +.ret= NOT_DONE,
> +.zone_report = {
> +.zones = zones,
> +.nr_zones = nr_zones,
> +},
> +};
> +acb->has_returned = false;
> +
> +co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
> +bdrv_coroutine_enter(blk_bs(blk), co);
> +
> +acb->has_returned = true;
> +if (acb->rwco.ret != NOT_DONE) {
> +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +}
> +
> +return &acb->common;
> +}
> +
> +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
> +{
> +BlkAioEmAIOCB *acb = opaque;
> +BlkRwCo *rwco = &acb->rwco;
> +
> +rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op,
> + rwco->offset, acb->bytes);
> +blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +  int64_t offset, int64_t len,
> +  BlockCompletionFunc *cb, void *opaque) {
> +BlkAioEmAIOCB *acb;
> +Coroutine *co;
> +IO_CODE();
> +
> +blk_inc_in_flight(blk);
> +acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +acb->rwco = (BlkRwCo) {
> +.blk= blk,
> +.offset = offset,
> +.ret= NOT_DONE,
> +.zone_mgmt = {
> +.op = op,
> +},
> +};
> +acb->bytes = len;
> +acb->has_returned = false;
> +
> +co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
> +bdrv_coroutine_enter(blk_bs(blk), co);
> +
> +acb->has_returned = true;
> +if (acb->rwco.ret != NOT_DONE) {
> +replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +}
> +
> +return &acb->common;
> +}
> +
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OUT actual.
> + */
> +int 

Re: [PULL 0/2] M68k for 7.2 patches

2022-10-16 Thread Jason A. Donenfeld
On Sun, Oct 16, 2022 at 03:50:54PM -0400, Stefan Hajnoczi wrote:
> On Fri, 14 Oct 2022 at 03:26, Laurent Vivier  wrote:
> >
> > The following changes since commit f1d33f55c47dfdaf8daacd618588ad3ae4c452d1:
> >
> >   Merge tag 'pull-testing-gdbstub-plugins-gitdm-061022-3' of 
> > https://github.com/stsquad/qemu into staging (2022-10-06 07:11:56 -0400)
> >
> > are available in the Git repository at:
> >
> >   https://github.com/vivier/qemu-m68k.git tags/m68k-for-7.2-pull-request
> >
> > for you to fetch changes up to fa327be58280f76d2565ff0bdb9b0010ac97c3b0:
> >
> >   m68k: write bootinfo as rom section and re-randomize on reboot 
> > (2022-10-11 23:02:46 +0200)
> >
> > 
> > Pull request m68k branch 20221014
> >
> > Update rng seed boot parameter
> >
> > 
> >
> > Jason A. Donenfeld (2):
> >   m68k: rework BI_VIRT_RNG_SEED as BI_RNG_SEED
> >   m68k: write bootinfo as rom section and re-randomize on reboot
> 
> This commit breaks mingw64 due to the Windows LLP64 data model where
> pointers don't fit into unsigned long
> (https://en.wikipedia.org/wiki/LP64#64-bit_data_models). Please use
> uintptr_t instead of unsigned long:

Holy smokes; I didn't realize that qemu was ever compiled this way.

Laurent - do you want me to send you a follow-up commit fixing that, a
new commit fixing that, or do you want to adjust the current commit
yourself? Any choice is fine with me.

Jason
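
For reference, a minimal sketch of the 4-byte alignment step from the failing macros, written with `uintptr_t` as suggested. The helper name `demo_align4` is hypothetical; the actual fix would go into the `BOOTINFO*` macros in `hw/m68k/bootinfo.h`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a pointer up to the next 4-byte boundary. uintptr_t can hold a
 * pointer on both LP64 and LLP64 targets, whereas unsigned long is only
 * 32 bits wide on 64-bit Windows.
 */
static inline void *demo_align4(void *base)
{
    return (void *)(((uintptr_t)base + 3) & ~(uintptr_t)3);
}

static inline void demo_align4_check(void)
{
    static uint32_t buf[4];                 /* 4-byte aligned storage */
    unsigned char *p = (unsigned char *)buf;

    assert(demo_align4(p) == (void *)p);             /* already aligned */
    assert(demo_align4(p + 1) == (void *)(p + 4));   /* rounded up */
    assert(demo_align4(p + 4) == (void *)(p + 4));
}
```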

> 
> x86_64-w64-mingw32-gcc -m64 -mcx16 -Ilibqemu-m68k-softmmu.fa.p -I.
> -I.. -Itarget/m68k -I../target/m68k -Iqapi -Itrace -Iui -Iui/shader
> -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/pixman-1
> -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/glib-2.0
> -I/usr/x86_64-w64-mingw32/sys-root/mingw/lib/glib-2.0/include
> -fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g
> -iquote . -iquote /builds/qemu-project/qemu -iquote
> /builds/qemu-project/qemu/include -iquote
> /builds/qemu-project/qemu/tcg/i386 -mms-bitfields -U_FORTIFY_SOURCE
> -D_FORTIFY_SOURCE=2 -fno-pie -no-pie -D_GNU_SOURCE
> -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes
> -Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes
> -fno-strict-aliasing -fno-common -fwrapv -Wold-style-declaration
> -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k
> -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs
> -Wendif-labels -Wexpansion-to-defined -Wimplicit-fallthrough=2
> -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi
> -fstack-protector-strong -DNEED_CPU_H
> '-DCONFIG_TARGET="m68k-softmmu-config-target.h"'
> '-DCONFIG_DEVICES="m68k-softmmu-config-devices.h"' -MD -MQ
> libqemu-m68k-softmmu.fa.p/hw_m68k_virt.c.obj -MF
> libqemu-m68k-softmmu.fa.p/hw_m68k_virt.c.obj.d -o
> libqemu-m68k-softmmu.fa.p/hw_m68k_virt.c.obj -c ../hw/m68k/virt.c
> In file included from ../hw/m68k/virt.c:23:
> ../hw/m68k/virt.c: In function 'virt_init':
> ../hw/m68k/bootinfo.h:58:26: error: cast from pointer to integer of
> different size [-Werror=pointer-to-int-cast]
> 58 | base = (void *)(((unsigned long)base + 3) & ~3); \
> | ^
> ../hw/m68k/virt.c:261:13: note: in expansion of macro 'BOOTINFOSTR'
> 261 | BOOTINFOSTR(param_ptr, BI_COMMAND_LINE,
> | ^~~
> ../hw/m68k/bootinfo.h:58:16: error: cast to pointer from integer of
> different size [-Werror=int-to-pointer-cast]
> 58 | base = (void *)(((unsigned long)base + 3) & ~3); \
> | ^
> ../hw/m68k/virt.c:261:13: note: in expansion of macro 'BOOTINFOSTR'
> 261 | BOOTINFOSTR(param_ptr, BI_COMMAND_LINE,
> | ^~~
> ../hw/m68k/bootinfo.h:75:26: error: cast from pointer to integer of
> different size [-Werror=pointer-to-int-cast]
> 75 | base = (void *)(((unsigned long)base + 3) & ~3); \
> | ^
> ../hw/m68k/virt.c:268:9: note: in expansion of macro 'BOOTINFODATA'
> 268 | BOOTINFODATA(param_ptr, BI_RNG_SEED,
> | ^~~~
> ../hw/m68k/bootinfo.h:75:16: error: cast to pointer from integer of
> different size [-Werror=int-to-pointer-cast]
> 75 | base = (void *)(((unsigned long)base + 3) & ~3); \
> | ^
> ../hw/m68k/virt.c:268:9: note: in expansion of macro 'BOOTINFODATA'
> 268 | BOOTINFODATA(param_ptr, BI_RNG_SEED,
> | ^~~~
> cc1: all warnings being treated as errors
> 
> https://gitlab.com/qemu-project/qemu/-/jobs/3179717070
> 
> >
> >  hw/m68k/bootinfo.h| 48 ++--
> >  .../standard-headers/asm-m68k/bootinfo-virt.h |  4 +-
> >  include/standard-headers/asm-m68k/bootinfo.h  |  8 +-
> >  hw/m68k/q800.c| 76 ++-
> >  hw/m68k/virt.c| 57 +-
> >  5 files changed, 130 insertions(+), 63 deletions(-)
> >
> > --
> > 2.37.3
> >
> >
> 



RE: [PATCH] migration: Fix a potential guest memory corruption

2022-10-16 Thread Duan, Zhenzhong



>-Original Message-
>From: Dr. David Alan Gilbert 
>Sent: Tuesday, October 11, 2022 7:06 PM
>To: Duan, Zhenzhong 
>Cc: qemu-devel@nongnu.org; quint...@redhat.com
>Subject: Re: [PATCH] migration: Fix a potential guest memory corruption
>
>* Zhenzhong Duan (zhenzhong.d...@intel.com) wrote:
>
>Hi,
Hi,
Sorry for the late response; I just got back from vacation.

>
>> Imagine a rare case, after a dirty page is sent to compression
>> threads's ring, dirty bitmap sync trigger right away and mark the same
>> page dirty again and sent. Then the new page may be overwritten by
>> stale page in compression threads's ring in the destination.
>
>Yes, I think we had a similar problem in multifd.
The multifd flush operation, multifd_send_sync_main(), is called in each memory
iteration, which is more aggressive than in compression, so I think this is not
an issue in multifd?

>
>> So we need to ensure there is only one copy of the same dirty page
>> either by flushing the ring after every bitmap sync or avoiding
>> processing same dirty page continuously.
>>
>> I choose the 2nd which avoids the time consuming flush operation.
>
>I'm not sure this guarantees it; it makes it much less likely but if only a few
>pages are dirtied and you have lots of threads, I think the same thing could
>still happen.
I didn't get it. Imagine there are dirty pages "A B C D E F G" in the current
RAMBLOCK.
1. Pages "A B C D" are sent to the compression threads.
2. Dirty page sync triggers and updates the dirty map to "A B D E F G".
3. Page D is checked and sent to a compression thread again, so there may be two
copies of page D in the compression threads: corruption!
4. Pages "E F G" are sent to the compression threads.
5. The flush operation is triggered at the end of the current RAMBLOCK.
6. In the next iteration, pages "A B" are sent.

After the patch:
1. Pages "A B C D" are sent to the compression threads.
2. Dirty page sync triggers and updates the dirty map to "A B D E F G".
3. The pages after page D, which are pages "E F G", are checked and sent to the
compression threads.
4. The flush operation is triggered at the end of the current RAMBLOCK, which
ensures page D is flushed.
5. In the next iteration, pages "A B D" are sent.
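
The difference between the two sequences can be modelled with a toy bitmap scan. This is only a sketch: `demo_find_dirty()` is a hypothetical stand-in for `migration_bitmap_find_dirty()`, pages A..G are indices 0..6, and the compression ring and the flush are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NPAGES 7   /* pages A..G */

/* Return the first dirty page at or after 'start', or DEMO_NPAGES if none. */
static inline size_t demo_find_dirty(const bool *dirty, size_t start)
{
    for (size_t i = start; i < DEMO_NPAGES; i++) {
        if (dirty[i]) {
            return i;
        }
    }
    return DEMO_NPAGES;
}

static inline void demo_scan_check(void)
{
    /* State after step 2: A, B and D are dirty again, E, F, G still pending. */
    const bool dirty[DEMO_NPAGES] = {
        true, true, false, true, true, true, true
    };
    size_t last = 3;    /* page D was the last page queued for compression */

    /* Before the patch the search resumes at D, so D is queued twice. */
    assert(demo_find_dirty(dirty, last) == 3);
    /* After the patch the search resumes at D + 1, so E is queued next. */
    assert(demo_find_dirty(dirty, last + 1) == 4);
}
```

In this model pages A, B and D stay dirty until the next iteration, which is the behaviour described in the last step of the second sequence.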

Thanks
Zhenzhong
>
>I think you're going to need to flush the ring after each sync.
>
>Dave
>
>> Signed-off-by: Zhenzhong Duan 
>> ---
>>  migration/ram.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/migration/ram.c b/migration/ram.c index
>> dc1de9ddbc68..67b2035586bd 100644
>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -1551,7 +1551,7 @@ static bool find_dirty_block(RAMState *rs,
>PageSearchStatus *pss, bool *again)
>>  pss->postcopy_requested = false;
>>  pss->postcopy_target_channel = RAM_CHANNEL_PRECOPY;
>>
>> -pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page);
>> +pss->page = migration_bitmap_find_dirty(rs, pss->block, pss->page
>> + + 1);
>>  if (pss->complete_round && pss->block == rs->last_seen_block &&
>>  pss->page >= rs->last_page) {
>>  /*
>> @@ -1564,7 +1564,7 @@ static bool find_dirty_block(RAMState *rs,
>PageSearchStatus *pss, bool *again)
>>  if (!offset_in_ramblock(pss->block,
>>  ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
>>  /* Didn't find anything in this RAM Block */
>> -pss->page = 0;
>> +pss->page = -1;
>>  pss->block = QLIST_NEXT_RCU(pss->block, next);
>>  if (!pss->block) {
>>  /*
>> @@ -2694,7 +2694,7 @@ static void ram_state_reset(RAMState *rs)  {
>>  rs->last_seen_block = NULL;
>>  rs->last_sent_block = NULL;
>> -rs->last_page = 0;
>> +rs->last_page = -1;
>>  rs->last_version = ram_list.version;
>>  rs->xbzrle_enabled = false;
>>  postcopy_preempt_reset(rs);
>> @@ -2889,7 +2889,7 @@ void
>ram_postcopy_send_discard_bitmap(MigrationState *ms)
>>  /* Easiest way to make sure we don't resume in the middle of a host-page
>*/
>>  rs->last_seen_block = NULL;
>>  rs->last_sent_block = NULL;
>> -rs->last_page = 0;
>> +rs->last_page = -1;
>>
>>  postcopy_each_ram_send_discard(ms);
>>
>> --
>> 2.25.1
>>
>--
>Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH] qemu-config: extract same logic in *_add_opts() to fill_config_groups()

2022-10-16 Thread Wang, Lei
Kindly ping for any comments.

BR,
Lei

On 9/2/2022 3:57 PM, Markus Armbruster wrote:
> Cc: Gerd & Kevin, because they were involved with the code that gets
> refactored here, and no good deed shall go unpunished.
> 
> "Wang, Lei"  writes:
> 
>> QEMU uses qemu_add_opts() and qemu_add_drive_opts() to add config options
>> during initialization. Extract the logic common to both functions into a
>> separate function, fill_config_groups(), to reduce code redundancy.
>>
>> Signed-off-by: Wang, Lei 
>> ---
>>  util/qemu-config.c | 39 ---
>>  1 file changed, 20 insertions(+), 19 deletions(-)
>>
>> diff --git a/util/qemu-config.c b/util/qemu-config.c
>> index 433488aa56..3a1c85223a 100644
>> --- a/util/qemu-config.c
>> +++ b/util/qemu-config.c
>> @@ -282,36 +282,37 @@ QemuOptsList *qemu_find_opts_err(const char *group, 
>> Error **errp)
>>  return find_list(vm_config_groups, group, errp);
>>  }
>>  
>> -void qemu_add_drive_opts(QemuOptsList *list)
>> +static int fill_config_groups(QemuOptsList *groups[], int entries,
>> +  QemuOptsList *list)
>>  {
>> -int entries, i;
>> +int i;
>>  
>> -entries = ARRAY_SIZE(drive_config_groups);
>>  entries--; /* keep list NULL terminated */
>>  for (i = 0; i < entries; i++) {
>> -if (drive_config_groups[i] == NULL) {
>> -drive_config_groups[i] = list;
>> -return;
>> +if (groups[i] == NULL) {
>> +groups[i] = list;
>> +return 0;
>>  }
>>  }
>> -fprintf(stderr, "ran out of space in drive_config_groups");
>> -abort();
>> +return -1;
>>  }
>>  
>> -void qemu_add_opts(QemuOptsList *list)
>> +void qemu_add_drive_opts(QemuOptsList *list)
>>  {
>> -int entries, i;
>> +if (fill_config_groups(drive_config_groups, 
>> ARRAY_SIZE(drive_config_groups),
>> +   list) < 0) {
>> +fprintf(stderr, "ran out of space in drive_config_groups");
>> +abort();
>> +}
>> +}
>>  
>> -entries = ARRAY_SIZE(vm_config_groups);
>> -entries--; /* keep list NULL terminated */
>> -for (i = 0; i < entries; i++) {
>> -if (vm_config_groups[i] == NULL) {
>> -vm_config_groups[i] = list;
>> -return;
>> -}
>> +void qemu_add_opts(QemuOptsList *list)
>> +{
>> +if (fill_config_groups(vm_config_groups, ARRAY_SIZE(vm_config_groups),
>> +   list) < 0) {
>> +fprintf(stderr, "ran out of space in vm_config_groups");
>> +abort();
>>  }
>> -fprintf(stderr, "ran out of space in vm_config_groups");
>> -abort();
>>  }
>>  
>>  /* Returns number of config groups on success, -errno on error */
> 



Re: [RFC v3 1/2] include: update virtio_blk headers from Linux 5.19-rc2+

2022-10-16 Thread Damien Le Moal
On 10/17/22 09:53, Dmitry Fomichev wrote:
> On Sun, 2022-10-16 at 23:05 +0800, Sam Li wrote:
>> Use scripts/update-linux-headers.sh to update virtio-blk headers
>> from Dmitry's "virtio-blk:add support for zoned block devices"
>> linux patch. There is a link for more information:
>> https://github.com/dmitry-fomichev/virtblk-zbd
>>
>> Signed-off-by: Sam Li 
>> Reviewed-by: Stefan Hajnoczi 
>> Signed-off-by: Sam Li 
> 
> the duplicate sign-off is not needed. With this,
> 
> Reviewed-by: Dmitry Fomichev 

The mention of the linux kernel version should be removed from the patch
title as the changes are not included in any upstream kernel yet.

> 
>> ---
>>  include/standard-headers/linux/virtio_blk.h | 109 
>>  1 file changed, 109 insertions(+)
>>
>> diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-
>> headers/linux/virtio_blk.h
>> index 2dcc90826a..490bd21c76 100644
>> --- a/include/standard-headers/linux/virtio_blk.h
>> +++ b/include/standard-headers/linux/virtio_blk.h
>> @@ -40,6 +40,7 @@
>>  #define VIRTIO_BLK_F_MQ12  /* support more than one vq 
>> */
>>  #define VIRTIO_BLK_F_DISCARD   13  /* DISCARD is supported */
>>  #define VIRTIO_BLK_F_WRITE_ZEROES  14  /* WRITE ZEROES is supported 
>> */
>> +#define VIRTIO_BLK_F_ZONED 17  /* Zoned block device */
>>  
>>  /* Legacy feature bits */
>>  #ifndef VIRTIO_BLK_NO_LEGACY
>> @@ -119,6 +120,20 @@ struct virtio_blk_config {
>> uint8_t write_zeroes_may_unmap;
>>  
>> uint8_t unused1[3];
>> +
>> +   /* Secure erase fields that are defined in the virtio spec */
>> +   uint8_t sec_erase[12];
>> +
>> +   /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
>> +   struct virtio_blk_zoned_characteristics {
>> +   __virtio32 zone_sectors;
>> +   __virtio32 max_open_zones;
>> +   __virtio32 max_active_zones;
>> +   __virtio32 max_append_sectors;
>> +   __virtio32 write_granularity;
>> +   uint8_t model;
>> +   uint8_t unused2[3];
>> +   } zoned;
>>  } QEMU_PACKED;
>>  
>>  /*
>> @@ -153,6 +168,27 @@ struct virtio_blk_config {
>>  /* Write zeroes command */
>>  #define VIRTIO_BLK_T_WRITE_ZEROES  13
>>  
>> +/* Zone append command */
>> +#define VIRTIO_BLK_T_ZONE_APPEND    15
>> +
>> +/* Report zones command */
>> +#define VIRTIO_BLK_T_ZONE_REPORT    16
>> +
>> +/* Open zone command */
>> +#define VIRTIO_BLK_T_ZONE_OPEN  18
>> +
>> +/* Close zone command */
>> +#define VIRTIO_BLK_T_ZONE_CLOSE 20
>> +
>> +/* Finish zone command */
>> +#define VIRTIO_BLK_T_ZONE_FINISH    22
>> +
>> +/* Reset zone command */
>> +#define VIRTIO_BLK_T_ZONE_RESET 24
>> +
>> +/* Reset All zones command */
>> +#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
>> +
>>  #ifndef VIRTIO_BLK_NO_LEGACY
>>  /* Barrier before this op. */
>>  #define VIRTIO_BLK_T_BARRIER   0x8000
>> @@ -172,6 +208,72 @@ struct virtio_blk_outhdr {
>> __virtio64 sector;
>>  };
>>  
>> +/*
>> + * Supported zoned device models.
>> + */
>> +
>> +/* Regular block device */
>> +#define VIRTIO_BLK_Z_NONE  0
>> +/* Host-managed zoned device */
>> +#define VIRTIO_BLK_Z_HM    1
>> +/* Host-aware zoned device */
>> +#define VIRTIO_BLK_Z_HA    2
>> +
>> +/*
>> + * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
>> + */
>> +struct virtio_blk_zone_descriptor {
>> +   /* Zone capacity */
>> +   __virtio64 z_cap;
>> +   /* The starting sector of the zone */
>> +   __virtio64 z_start;
>> +   /* Zone write pointer position in sectors */
>> +   __virtio64 z_wp;
>> +   /* Zone type */
>> +   uint8_t z_type;
>> +   /* Zone state */
>> +   uint8_t z_state;
>> +   uint8_t reserved[38];
>> +};
>> +
>> +struct virtio_blk_zone_report {
>> +   __virtio64 nr_zones;
>> +   uint8_t reserved[56];
>> +   struct virtio_blk_zone_descriptor zones[];
>> +};
>> +
>> +/*
>> + * Supported zone types.
>> + */
>> +
>> +/* Conventional zone */
>> +#define VIRTIO_BLK_ZT_CONV 1
>> +/* Sequential Write Required zone */
>> +#define VIRTIO_BLK_ZT_SWR  2
>> +/* Sequential Write Preferred zone */
>> +#define VIRTIO_BLK_ZT_SWP  3
>> +
>> +/*
>> + * Zone states that are available for zones of all types.
>> + */
>> +
>> +/* Not a write pointer (conventional zones only) */
>> +#define VIRTIO_BLK_ZS_NOT_WP   0
>> +/* Empty */
>> +#define VIRTIO_BLK_ZS_EMPTY    1
>> +/* Implicitly Open */
>> +#define VIRTIO_BLK_ZS_IOPEN    2
>> +/* Explicitly Open */
>> +#define VIRTIO_BLK_ZS_EOPEN    3
>> +/* Closed */
>> +#define VIRTIO_BLK_ZS_CLOSED   4
>> +/* Read-Only */
>> +#define VIRTIO_BLK_ZS_RDONLY   13
>> +/* Full */
>> +#define VIRTIO_BLK_ZS_FULL 14
>> +/* Offline */
>> +#define VIRTIO_BLK_ZS_OFFLINE  15
>> +
>>  /* Unmap this range (only valid for write zeroes command) */
>>  #define 

[PATCH] tcg/aarch64: Remove unused code in tcg_out_op

2022-10-16 Thread Qi Hu
AArch64 defines TCG_TARGET_HAS_direct_jump, so the "else" block in the
"INDEX_op_goto_tb" case of "tcg_out_op" is never executed. Add an
assertion and delete the dead code for clarity.

Suggested-by: WANG Xuerui 
Signed-off-by: Qi Hu 
---
 tcg/aarch64/tcg-target.c.inc | 31 ++-
 1 file changed, 14 insertions(+), 17 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c.inc b/tcg/aarch64/tcg-target.c.inc
index d997f7922a..344b63e20f 100644
--- a/tcg/aarch64/tcg-target.c.inc
+++ b/tcg/aarch64/tcg-target.c.inc
@@ -1916,24 +1916,21 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 
 case INDEX_op_goto_tb:
-if (s->tb_jmp_insn_offset != NULL) {
-/* TCG_TARGET_HAS_direct_jump */
-/* Ensure that ADRP+ADD are 8-byte aligned so that an atomic
-   write can be used to patch the target address. */
-if ((uintptr_t)s->code_ptr & 7) {
-tcg_out32(s, NOP);
-}
-s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
-/* actual branch destination will be patched by
-   tb_target_set_jmp_target later. */
-tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
-tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, 
TCG_REG_TMP, 0);
-} else {
-/* !TCG_TARGET_HAS_direct_jump */
-tcg_debug_assert(s->tb_jmp_target_addr != NULL);
-intptr_t offset = tcg_pcrel_diff(s, (s->tb_jmp_target_addr + a0)) 
>> 2;
-tcg_out_insn(s, 3305, LDR, offset, TCG_REG_TMP);
+tcg_debug_assert(s->tb_jmp_insn_offset != NULL);
+/*
+ * Ensure that ADRP+ADD are 8-byte aligned so that an atomic
+ * write can be used to patch the target address.
+ */
+if ((uintptr_t)s->code_ptr & 7) {
+tcg_out32(s, NOP);
 }
+s->tb_jmp_insn_offset[a0] = tcg_current_code_size(s);
+/*
+ * actual branch destination will be patched by
+ * tb_target_set_jmp_target later
+ */
+tcg_out_insn(s, 3406, ADRP, TCG_REG_TMP, 0);
+tcg_out_insn(s, 3401, ADDI, TCG_TYPE_I64, TCG_REG_TMP, TCG_REG_TMP, 0);
 tcg_out_insn(s, 3207, BR, TCG_REG_TMP);
 set_jmp_reset_offset(s, a0);
 break;
-- 
2.38.0




Re: [PATCH v11 3/5] target/riscv: generate virtual instruction exception

2022-10-16 Thread weiwei



On 2022/10/16 20:47, Mayuresh Chitale wrote:

This patch adds a mechanism to generate a virtual instruction
exception instead of an illegal instruction exception
during instruction decode when virt is enabled.

Signed-off-by: Mayuresh Chitale 
---
  target/riscv/translate.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index db123da5ec..8b0bd38bb2 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -76,6 +76,7 @@ typedef struct DisasContext {
 to reset this known value.  */
  int frm;
  RISCVMXL ol;
+bool virt_inst_excp;
  bool virt_enabled;
  const RISCVCPUConfig *cfg_ptr;
  bool hlsx;
@@ -243,7 +244,11 @@ static void gen_exception_illegal(DisasContext *ctx)
  {
  tcg_gen_st_i32(tcg_constant_i32(ctx->opcode), cpu_env,
 offsetof(CPURISCVState, bins));
-generate_exception(ctx, RISCV_EXCP_ILLEGAL_INST);
+if (ctx->virt_inst_excp) {
+generate_exception(ctx, RISCV_EXCP_VIRT_INSTRUCTION_FAULT);
+} else {
+generate_exception(ctx, RISCV_EXCP_ILLEGAL_INST);
+}
  }
  
  static void gen_exception_inst_addr_mis(DisasContext *ctx)

@@ -1062,6 +1067,7 @@ static void decode_opc(CPURISCVState *env, DisasContext 
*ctx, uint16_t opcode)
  { has_XVentanaCondOps_p,  decode_XVentanaCodeOps },
  };
  
+ctx->virt_inst_excp = false;

  /* Check for compressed insn */
  if (insn_len(opcode) == 2) {
  if (!has_ext(ctx, RVC)) {

Reviewed-by: Weiwei Li 
Regards,
Weiwei Li




Re: [PATCH v4 2/3] block: introduce zone append write for zoned devices

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 22:56 +0800, Sam Li wrote:
> A zone append command is a write operation that specifies the first
> logical block of a zone as the write position. When writing to a zoned
> block device using zone append, the byte offset of writes is pointing
> to the write pointer of that zone. Upon completion the device will
> respond with the position the data has been written in the zone.
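The zone append semantics described above can be sketched in isolation. This is a minimal standalone illustration, not the code from the patch: `struct zone` and `zone_append` are hypothetical names, and all offsets are assumed to be byte offsets. The caller only names the zone; the data lands at the zone's write pointer and the written position is reported back.

```c
#include <stdint.h>

/* Hypothetical per-zone state used only by this sketch. */
struct zone {
    uint64_t start; /* byte offset of the first byte of the zone */
    uint64_t size;  /* zone size in bytes */
    uint64_t wp;    /* write pointer, as a byte offset on the device */
};

/*
 * Emulated zone append: the data is placed at the current write pointer.
 * On success, returns 0, stores the position that was written in *pos and
 * advances the write pointer by len. Returns -1 without changing the zone
 * if the write pointer is out of range or the data does not fit.
 */
static int zone_append(struct zone *z, uint64_t len, uint64_t *pos)
{
    uint64_t end = z->start + z->size;

    if (z->wp < z->start || z->wp > end || len > end - z->wp) {
        return -1;
    }
    *pos = z->wp;
    z->wp += len;
    return 0;
}
```

A caller would pass the zone it wants to append to and read the resulting position from `pos` once the call returns 0.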
> 
> Signed-off-by: Sam Li 
> ---
>  block/block-backend.c | 65 ++
>  block/file-posix.c    | 89 +--
>  block/io.c    | 21 
>  block/raw-format.c    |  8 +++
>  include/block/block-io.h  |  3 ++
>  include/block/block_int-common.h  |  5 ++
>  include/block/raw-aio.h   |  4 +-
>  include/sysemu/block-backend-io.h |  9 
>  8 files changed, 198 insertions(+), 6 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 1c618e9c68..06931ddd24 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1439,6 +1439,9 @@ typedef struct BlkRwCo {
>  struct {
>  unsigned long op;
>  } zone_mgmt;
> +    struct {
> +    int64_t *append_sector;
> +    } zone_append;
>  };
>  } BlkRwCo;
>  
> @@ -1871,6 +1874,47 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk,
> BlockZoneOp op,
>  return &acb->common;
>  }
>  
> +static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
> +{
> +    BlkAioEmAIOCB *acb = opaque;
> +    BlkRwCo *rwco = &acb->rwco;
> +
> +    rwco->ret = blk_co_zone_append(rwco->blk, 
> rwco->zone_append.append_sector,
> +   rwco->iobuf, rwco->flags);
> +    blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
> +    QEMUIOVector *qiov, BdrvRequestFlags flags,
> +    BlockCompletionFunc *cb, void *opaque) {
> +    BlkAioEmAIOCB *acb;
> +    Coroutine *co;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +    acb->rwco = (BlkRwCo) {
> +    .blk    = blk,
> +    .ret    = NOT_DONE,
> +    .flags  = flags,
> +    .iobuf  = qiov,
> +    .zone_append = {
> +    .append_sector = offset,
> +    },
> +    };
> +    acb->has_returned = false;
> +
> +    co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
> +    bdrv_coroutine_enter(blk_bs(blk), co);
> +    acb->has_returned = true;
> +    if (acb->rwco.ret != NOT_DONE) {
> +    replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +    }
> +
> +    return &acb->common;
> +}
> +
>  /*
>   * Send a zone_report command.
>   * offset is a byte offset from the start of the device. No alignment
> @@ -1923,6 +1967,27 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk,
> BlockZoneOp op,
>  return ret;
>  }
>  
> +/*
> + * Send a zone_append command.
> + */
> +int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
> +    QEMUIOVector *qiov, BdrvRequestFlags flags)
> +{
> +    int ret;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    blk_wait_while_drained(blk);
> +    if (!blk_is_available(blk)) {
> +    blk_dec_in_flight(blk);
> +    return -ENOMEDIUM;
> +    }
> +
> +    ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
> +    blk_dec_in_flight(blk);
> +    return ret;
> +}
> +
>  void blk_drain(BlockBackend *blk)
>  {
>  BlockDriverState *bs = blk_bs(blk);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 5ff5500301..3d0cc33d02 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -205,6 +205,7 @@ typedef struct RawPosixAIOData {
>  struct {
>  struct iovec *iov;
>  int niov;
> +    int64_t *offset;
>  } io;
>  struct {
>  uint64_t cmd;
> @@ -1475,6 +1476,11 @@ static void raw_refresh_limits(BlockDriverState *bs,
> Error **errp)
>  bs->bl.max_active_zones = ret;
>  }
>  
> +    ret = get_sysfs_long_val(&st, "physical_block_size");
> +    if (ret >= 0) {
> +    bs->bl.write_granularity = ret;
> +    }
> +
>  bs->bl.wps = g_malloc(sizeof(BlockZoneWps) + sizeof(int64_t) * ret);
>  if (get_zones_wp(s->fd, bs->bl.wps, 0, ret) < 0) {
>  error_report("report wps failed");
> @@ -1647,9 +1653,18 @@ qemu_pwritev(int fd, const struct iovec *iov, int
> nr_iov, off_t offset)
>  static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
>  {
>  ssize_t len;
> +    BlockZoneWps *wps = aiocb->bs->bl.wps;
> +    int index = aiocb->aio_offset / aiocb->bs->bl.zone_size;

Can this code ever be called for a non-zoned device with 0 zone size?
If yes, you need to avoid division by zero here...

> +
> +    if 

Re: [RFC v3 2/2] virtio-blk: add zoned storage emulation for zoned devices

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 23:05 +0800, Sam Li wrote:
> This patch extends virtio-blk emulation to handle zoned device commands
> by calling the new block layer APIs to perform zoned device I/O on
> behalf of the guest. It supports Report Zone, four zone operations (open,
> close, finish, reset), and Append Zone.
> 
> The VIRTIO_BLK_F_ZONED feature bit will only be set if the host
> supports zoned block devices. It will not be set for regular block
> devices (conventional zones).
> 
> Then the guest OS can use blkzone(8) to test those commands on zoned devices.
> Furthermore, using zonefs to test zone append write is also supported.
> 
> Signed-off-by: Sam Li 
> ---
>  hw/block/virtio-blk-common.c   |   2 +
>  hw/block/virtio-blk.c  | 412 -
>  include/hw/virtio/virtio-blk.h |  11 +-
>  3 files changed, 422 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
> index ac52d7c176..e2f8e2f6da 100644
> --- a/hw/block/virtio-blk-common.c
> +++ b/hw/block/virtio-blk-common.c
> @@ -29,6 +29,8 @@ static const VirtIOFeature feature_sizes[] = {
>   .end = endof(struct virtio_blk_config, discard_sector_alignment)},
>  {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
>   .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
> +    {.flags = 1ULL << VIRTIO_BLK_F_ZONED,
> + .end = endof(struct virtio_blk_config, zoned)},
>  {}
>  };
>  
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 8131ec2dbc..58891aea31 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -26,6 +26,9 @@
>  #include "hw/virtio/virtio-blk.h"
>  #include "dataplane/virtio-blk.h"
>  #include "scsi/constants.h"
> +#if defined(CONFIG_BLKZONED)
> +#include 
> +#endif
>  #ifdef __linux__
>  # include 
>  #endif
> @@ -55,10 +58,29 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req,
> unsigned char status)
>  {
>  VirtIOBlock *s = req->dev;
>  VirtIODevice *vdev = VIRTIO_DEVICE(s);
> +    int64_t inhdr_len, n;
> +    void *buf;
>  
>  trace_virtio_blk_req_complete(vdev, req, status);
>  
> -    stb_p(&req->in->status, status);
> +    iov_discard_undo(&req->inhdr_undo);
> +    if (virtio_ldl_p(vdev, &req->out.type) == VIRTIO_BLK_T_ZONE_APPEND) {
> +    inhdr_len = sizeof(struct virtio_blk_zone_append_inhdr);
> +    req->in.in_hdr->status = status;
> +    buf = req->in.in_hdr;
> +    } else {
> +    inhdr_len = sizeof(struct virtio_blk_inhdr);
> +    req->in.zone_append_inhdr->status = status;
> +    buf = req->in.zone_append_inhdr;
> +    }
> +
> +    n = iov_from_buf(req->elem.in_sg, req->elem.in_num,
> + req->in_len - inhdr_len, buf, inhdr_len);
> +    if (n != inhdr_len) {
> +    virtio_error(vdev, "Driver provided input buffer less than size of "
> + "in header");
> +    }
> +
>  iov_discard_undo(&req->inhdr_undo);
>  iov_discard_undo(&req->outhdr_undo);
>  virtqueue_push(req->vq, &req->elem, req->in_len);
> @@ -592,6 +614,334 @@ err:
>  return err_status;
>  }
>  
> +typedef struct ZoneCmdData {
> +    VirtIOBlockReq *req;
> +    union {
> +    struct {
> +    unsigned int nr_zones;
> +    BlockZoneDescriptor *zones;
> +    } zone_report_data;
> +    struct {
> +    int64_t offset;
> +    } zone_append_data;
> +    };
> +} ZoneCmdData;
> +
> +/*
> + * check zoned_request: error checking before issuing requests. If all checks
> + * passed, return true.
> + * append: true if only zone append requests issued.
> + */
> +static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
> + bool append, uint8_t *status) {
> +    BlockDriverState *bs = blk_bs(s->blk);
> +    int index = offset / bs->bl.zone_size;
> +
> +    if (offset < 0 || len < 0 || offset > bs->bl.capacity - len) {
> +    *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> +    return false;
> +    }
> +
> +    if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
> +    *status = VIRTIO_BLK_S_UNSUPP;
> +    return false;
> +    }
> +
> +    if (append) {
> +    if ((offset % bs->bl.write_granularity) != 0) {
> +    *status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
> +    return false;
> +    }
> +
> +    if (BDRV_ZT_IS_CONV(bs->bl.wps->wp[index])) {
> +    *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> +    return false;
> +    }
> +
> +    if (len / 512 > bs->bl.max_append_sectors) {
> +    if (bs->bl.max_append_sectors == 0) {
> +    *status = VIRTIO_BLK_S_UNSUPP;
> +    } else {
> +    *status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
> +    }
> +    return false;
> +    }
> +    }
> +    return true;
> +}
> +
> +static void virtio_blk_zone_report_complete(void *opaque, int ret)
> +{
> +    ZoneCmdData *data = opaque;
> +    VirtIOBlockReq *req = data->req;
> +    VirtIOBlock *s = 

Re: [PATCH v4 1/3] file-posix: add the tracking of the zones write pointers

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 22:56 +0800, Sam Li wrote:
> Since Linux doesn't have a user API to issue zone append operations to
> zoned devices from user space, the file-posix driver is modified to add
> zone append emulation using regular writes. To do this, the file-posix
> driver tracks the wp location of all zones of the device. It uses an
> array of uint64_t. The most significant bit of each wp location indicates
> whether the zone is a conventional zone.
> 
> A zone's wp can change as a result of the following operations:
> - zone reset: change the wp to the start offset of that zone
> - zone finish: change to the end location of that zone
> - write to a zone
> - zone append
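The write pointer bookkeeping listed above can be sketched as a few pure functions. This is a standalone illustration under stated assumptions, not the patch's code: `WP_CONV_FLAG` and the `wp_*` helpers are hypothetical names, offsets are byte offsets, and a write or append is assumed to advance the wp only when it ends beyond the current wp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: the most significant bit marks a conventional zone. */
#define WP_CONV_FLAG (1ULL << 63)

/* True if the tracked entry belongs to a conventional zone (no wp). */
static bool wp_is_conv(uint64_t wp)
{
    return (wp & WP_CONV_FLAG) != 0;
}

/* Zone reset: the wp returns to the start offset of the zone. */
static uint64_t wp_after_reset(uint64_t wp, uint64_t zone_start)
{
    return wp_is_conv(wp) ? wp : zone_start;
}

/* Zone finish: the wp moves to the end location of the zone. */
static uint64_t wp_after_finish(uint64_t wp, uint64_t zone_start,
                                uint64_t zone_size)
{
    return wp_is_conv(wp) ? wp : zone_start + zone_size;
}

/* Write or zone append ending at end_offset: advance the wp if needed. */
static uint64_t wp_after_write(uint64_t wp, uint64_t end_offset)
{
    if (wp_is_conv(wp) || end_offset <= wp) {
        return wp;
    }
    return end_offset;
}
```

Keeping the zone type in the top bit lets a single uint64_t array answer both "is this zone conventional?" and "where is its wp?", at the cost of one bit of offset range.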
> 
> Signed-off-by: Sam Li 
> ---
>  block/file-posix.c   | 144 +++
>  include/block/block-common.h |  14 +++
>  include/block/block_int-common.h |   3 +
>  3 files changed, 161 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 7c5a330fc1..5ff5500301 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1324,6 +1324,66 @@ static int hdev_get_max_segments(int fd, struct stat
> *st)
>  #endif
>  }
>  
> +#if defined(CONFIG_BLKZONED)
> +static int get_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> +    unsigned int nrz) {
> +    struct blk_zone *blkz;
> +    int64_t rep_size;

size_t

> +    int64_t sector = offset >> BDRV_SECTOR_BITS;

uint64_t

> +    int ret, n = 0, i = 0;
> +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct 
> blk_zone);
> +    g_autofree struct blk_zone_report *rep = NULL;
> +
> +    rep = g_malloc(rep_size);
> +    blkz = (struct blk_zone *)(rep + 1);
> +    while (n < nrz) {
> +    memset(rep, 0, rep_size);
> +    rep->sector = sector;
> +    rep->nr_zones = nrz - n;
> +
> +    do {
> +    ret = ioctl(fd, BLKREPORTZONE, rep);
> +    } while (ret != 0 && errno == EINTR);
> +    if (ret != 0) {
> +    error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
> +    fd, offset, errno);
> +    return -errno;
> +    }
> +
> +    if (!rep->nr_zones) {
> +    break;
> +    }
> +
> +    for (i = 0; i < rep->nr_zones; i++, n++) {
> +    /*
> + * The wp tracking cares only about sequential writes required 
> and
> + * sequential write preferred zones so that the wp can advance to
> + * the right location.
> + * Use the most significant bit of the wp location to indicate 
> the
> + * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
> + */
> +    if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
> +    wps->wp[i] = 1ULL << 63;
> +    } else {
> +    wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;
> +    }
> +    }
> +    sector = blkz[i - 1].start + blkz[i - 1].len;
> +    }
> +
> +    return 0;
> +}
> +
> +static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
> +    unsigned int nrz) {
> +    qemu_mutex_lock(&wps->lock);
> +    if (get_zones_wp(fd, wps, offset, nrz) < 0) {
> +    error_report("update zone wp failed");
> +    }
> +    qemu_mutex_unlock(&wps->lock);
> +}
> +#endif
> +
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
> @@ -1414,6 +1474,14 @@ static void raw_refresh_limits(BlockDriverState *bs,
> Error **errp)
>  if (ret >= 0) {
>  bs->bl.max_active_zones = ret;
>  }
> +
> +    bs->bl.wps = g_malloc(sizeof(BlockZoneWps) + sizeof(int64_t) * ret);
> +    if (get_zones_wp(s->fd, bs->bl.wps, 0, ret) < 0) {
> +    error_report("report wps failed");
> +    g_free(bs->bl.wps);
> +    return;
> +    }
> +    qemu_mutex_init(&bs->bl.wps->lock);
>  }
>  }
>  
> @@ -1725,6 +1793,25 @@ static int handle_aiocb_rw(void *opaque)
>  
>  out:
>  if (nbytes == aiocb->aio_nbytes) {
> +#if defined(CONFIG_BLKZONED)
> +    if (aiocb->aio_type & QEMU_AIO_WRITE) {
> +    BlockZoneWps *wps = aiocb->bs->bl.wps;
> +    int index = aiocb->aio_offset / aiocb->bs->bl.zone_size;
> +    if (wps) {

In my testing, I get a divide by zero exception in the "index"
calculation above. Changing this part as follows

-int index = aiocb->aio_offset / aiocb->bs->bl.zone_size;
-if (wps) {
+if (wps && aiocb->bs->bl.zone_size) {
+int index = aiocb->aio_offset / aiocb->bs->bl.zone_size;
+

fixes the crash.
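The same guard can be shown in isolation. This is a minimal standalone sketch, not QEMU code: `zone_index_of` and its parameters are hypothetical, and a zone size of zero is assumed to mean a non-zoned device.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of QEMU): map a byte offset to a zone
 * index. Returns -1 when zone_size is not positive, as it would be for a
 * non-zoned device, so the caller never divides by zero. A negative
 * offset is rejected the same way.
 */
static int64_t zone_index_of(int64_t offset, int64_t zone_size)
{
    if (zone_size <= 0 || offset < 0) {
        return -1;
    }
    return offset / zone_size;
}
```

Checking the divisor before computing the index, rather than after, is what avoids the exception; testing only the wps pointer does not help if the division has already happened.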

> +    qemu_mutex_lock(&wps->lock);
> +    if (!BDRV_ZT_IS_CONV(wps->wp[index])) {
> +    uint64_t wend_offset =
> +    aiocb->aio_offset + aiocb->aio_nbytes;
> +
> +    /* Advance the wp if needed */
> +    if (wend_offset > wps->wp[index]) {
> +  

Re: [PATCH v12 6/7] qemu-iotests: test new zone operations

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 22:51 +0800, Sam Li wrote:
> We have added new block layer APIs of zoned block devices.
> Test it with:
> Create a null_blk device, run each zone operation on it and see
> whether reporting right zone information.

change this to "whether the logs show the correct zone information"?

> 

Could you please describe how to run this specific set of tests
in more detail?
 
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Stefan Hajnoczi 
> ---
>  tests/qemu-iotests/tests/zoned.out | 53 ++
>  tests/qemu-iotests/tests/zoned.sh  | 86 ++
>  2 files changed, 139 insertions(+)
>  create mode 100644 tests/qemu-iotests/tests/zoned.out
>  create mode 100755 tests/qemu-iotests/tests/zoned.sh
> 
> diff --git a/tests/qemu-iotests/tests/zoned.out b/tests/qemu-
> iotests/tests/zoned.out
> new file mode 100644
> index 00..0c8f96deb9
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/zoned.out
> @@ -0,0 +1,53 @@
> +QA output created by zoned.sh
> +Testing a null_blk device:
> +Simple cases: if the operations work
> +(1) report the first zone:
> +start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
> +
> +report the first 10 zones
> +start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
> +start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2]
> +start: 0x10, len 0x8, cap 0x8, wptr 0x10, zcond:1, [type: 2]
> +start: 0x18, len 0x8, cap 0x8, wptr 0x18, zcond:1, [type: 2]
> +start: 0x20, len 0x8, cap 0x8, wptr 0x20, zcond:1, [type: 2]
> +start: 0x28, len 0x8, cap 0x8, wptr 0x28, zcond:1, [type: 2]
> +start: 0x30, len 0x8, cap 0x8, wptr 0x30, zcond:1, [type: 2]
> +start: 0x38, len 0x8, cap 0x8, wptr 0x38, zcond:1, [type: 2]
> +start: 0x40, len 0x8, cap 0x8, wptr 0x40, zcond:1, [type: 2]
> +start: 0x48, len 0x8, cap 0x8, wptr 0x48, zcond:1, [type: 2]
> +
> +report the last zone:
> +start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:1, [type:
> 2]
> +
> +
> +(2) opening the first zone
> +report after:
> +start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:3, [type: 2]
> +
> +opening the second zone
> +report after:
> +start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:3, [type: 2]
> +
> +opening the last zone
> +report after:
> +start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:3, [type:
> 2]
> +
> +
> +(3) closing the first zone
> +report after:
> +start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
> +
> +closing the last zone
> +report after:
> +start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:1, [type:
> 2]
> +
> +
> +(4) finishing the second zone
> +After finishing a zone:
> +start: 0x8, len 0x8, cap 0x8, wptr 0x10, zcond:14, [type: 2]
> +
> +
> +(5) resetting the second zone
> +After resetting a zone:
> +start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2]
> +*** done
> diff --git a/tests/qemu-iotests/tests/zoned.sh b/tests/qemu-
> iotests/tests/zoned.sh
> new file mode 100755
> index 00..fced0194c5
> --- /dev/null
> +++ b/tests/qemu-iotests/tests/zoned.sh
> @@ -0,0 +1,86 @@
> +#!/usr/bin/env bash
> +#
> +# Test zone management operations.
> +#
> +
> +seq="$(basename $0)"
> +echo "QA output created by $seq"
> +status=1 # failure is the default!
> +
> +_cleanup()
> +{
> +  _cleanup_test_img
> +  sudo rmmod null_blk
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +. ./common.qemu
> +
> +# This test only runs on Linux hosts with raw image files.
> +_supported_fmt raw
> +_supported_proto file
> +_supported_os Linux
> +
> +QEMU_IO="build/qemu-io"
> +IMG="--image-opts -n driver=zoned_host_device,filename=/dev/nullb0"
> +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
> +
> +echo "Testing a null_blk device:"
> +echo "case 1: if the operations work"
> +sudo modprobe null_blk nr_devices=1 zoned=1
> +
> +echo "(1) report the first zone:"
> +sudo $QEMU_IO $IMG -c "zrp 0 1"
> +echo
> +echo "report the first 10 zones"
> +sudo $QEMU_IO $IMG -c "zrp 0 10"
> +echo
> +echo "report the last zone:"
> +sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2" # 0x3e7000 / 512 = 0x1f38
> +echo
> +echo
> +echo "(2) opening the first zone"
> +sudo $QEMU_IO $IMG -c "zo 0 268435456"  # 268435456 / 512 = 524288
> +echo "report after:"
> +sudo $QEMU_IO $IMG -c "zrp 0 1"
> +echo
> +echo "opening the second zone"
> +sudo $QEMU_IO $IMG -c "zo 268435456 268435456" #
> +echo "report after:"
> +sudo $QEMU_IO $IMG -c "zrp 268435456 1"
> +echo
> +echo "opening the last zone"
> +sudo $QEMU_IO $IMG -c "zo 0x3e7000 268435456"
> +echo "report after:"
> +sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2"
> +echo
> +echo
> +echo "(3) closing the first zone"
> +sudo $QEMU_IO $IMG -c "zc 0 268435456"
> +echo "report after:"
> +sudo 

Re: [PATCH v12 5/7] config: add check to block layer

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 22:51 +0800, Sam Li wrote:
> Putting zoned/non-zoned BlockDrivers on top of each other is not
> allowed.
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Hannes Reinecke 

Reviewed-by: Dmitry Fomichev 

> ---
>  block.c  | 19 +++
>  block/file-posix.c   | 12 
>  block/raw-format.c   |  1 +
>  include/block/block_int-common.h |  5 +
>  4 files changed, 37 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 1fbf6b9e69..5d6fa4a25a 100644
> --- a/block.c
> +++ b/block.c
> @@ -7951,6 +7951,25 @@ void bdrv_add_child(BlockDriverState *parent_bs,
> BlockDriverState *child_bs,
>  return;
>  }
>  
> +    /*
> + * Non-zoned block drivers do not follow zoned storage constraints
> + * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
> + * drivers in a graph.
> + */
> +    if (!parent_bs->drv->supports_zoned_children &&
> +    child_bs->bl.zoned == BLK_Z_HM) {
> +    /*
> + * The host-aware model allows zoned storage constraints and random
> + * write. Allow mixing host-aware and non-zoned drivers. Using
> + * host-aware device as a regular device.
> + */
> +    error_setg(errp, "Cannot add a %s child to a %s parent",
> +   child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
> +   parent_bs->drv->supports_zoned_children ?
> +   "support zoned children" : "not support zoned children");
> +    return;
> +    }
> +
>  if (!QLIST_EMPTY(&child_bs->parents)) {
>  error_setg(errp, "The node %s already has a parent",
>     child_bs->node_name);
> diff --git a/block/file-posix.c b/block/file-posix.c
> index bd28e3eaea..7c5a330fc1 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -776,6 +776,18 @@ static int raw_open_common(BlockDriverState *bs, QDict
> *options,
>  goto fail;
>  }
>  }
> +#ifdef CONFIG_BLKZONED
> +    /*
> + * The kernel page cache does not reliably work for writes to SWR zones
> + * of zoned block device because it can not guarantee the order of 
> writes.
> + */
> +    if ((strcmp(bs->drv->format_name, "zoned_host_device") == 0) &&
> +    (!(s->open_flags & O_DIRECT))) {
> +    error_setg(errp, "driver=zoned_host_device was specified, but it "
> +   "requires cache.direct=on, which was not specified.");
> +    return -EINVAL; /* No host kernel page cache */
> +    }
> +#endif
>  
>  if (S_ISBLK(st.st_mode)) {
>  #ifdef __linux__
> diff --git a/block/raw-format.c b/block/raw-format.c
> index bac43f1d25..18dc52a150 100644
> --- a/block/raw-format.c
> +++ b/block/raw-format.c
> @@ -615,6 +615,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild
> *c,
>  BlockDriver bdrv_raw = {
>  .format_name  = "raw",
>  .instance_size    = sizeof(BDRVRawState),
> +    .supports_zoned_children = true,
>  .bdrv_probe   = &raw_probe,
>  .bdrv_reopen_prepare  = &raw_reopen_prepare,
>  .bdrv_reopen_commit   = &raw_reopen_commit,
> diff --git a/include/block/block_int-common.h b/include/block/block_int-
> common.h
> index cdc06e77a6..37dddc603c 100644
> --- a/include/block/block_int-common.h
> +++ b/include/block/block_int-common.h
> @@ -127,6 +127,11 @@ struct BlockDriver {
>   */
>  bool is_format;
>  
> +    /*
> + * Set to true if the BlockDriver supports zoned children.
> + */
> +    bool supports_zoned_children;
> +
>  /*
>   * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
>   * this field set to true, except ones that are defined only by their



Re: [PATCH v12 4/7] raw-format: add zone operations to pass through requests

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 22:51 +0800, Sam Li wrote:
> The raw-format driver usually sits on top of the file-posix driver, so it
> needs to pass zone command requests through to it.
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Damien Le Moal 
> Reviewed-by: Hannes Reinecke 

Reviewed-by: Dmitry Fomichev 

> ---
>  block/raw-format.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/block/raw-format.c b/block/raw-format.c
> index f337ac7569..bac43f1d25 100644
> --- a/block/raw-format.c
> +++ b/block/raw-format.c
> @@ -314,6 +314,17 @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState
> *bs,
>  return bdrv_co_pdiscard(bs->file, offset, bytes);
>  }
>  
> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t
> offset,
> +   unsigned int *nr_zones,
> +   BlockZoneDescriptor *zones) {
> +    return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
> +}
> +
> +static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp 
> op,
> + int64_t offset, int64_t len) {
> +    return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
> +}
> +
>  static int64_t raw_getlength(BlockDriverState *bs)
>  {
>  int64_t len;
> @@ -615,6 +626,8 @@ BlockDriver bdrv_raw = {
>  .bdrv_co_pwritev  = &raw_co_pwritev,
>  .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
>  .bdrv_co_pdiscard = &raw_co_pdiscard,
> +    .bdrv_co_zone_report  = &raw_co_zone_report,
> +    .bdrv_co_zone_mgmt  = &raw_co_zone_mgmt,
>  .bdrv_co_block_status = &raw_co_block_status,
>  .bdrv_co_copy_range_from = &raw_co_copy_range_from,
>  .bdrv_co_copy_range_to  = &raw_co_copy_range_to,



Re: [PATCH v12 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 22:51 +0800, Sam Li wrote:
> Add a new zoned_host_device BlockDriver. The zoned_host_device option
> accepts only zoned host block devices. By adding zone management
> operations in this new BlockDriver, users can use the new block
> layer APIs including Report Zone and four zone management operations
> (open, close, finish, reset, reset_all).
> 
> Qemu-io uses the new APIs to perform zoned storage commands of the device:
> zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
> zone_finish(zf).
> 
> For example, to test zone_report, use the following command:
> $ ./build/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0 \
>     -c "zrp offset nr_zones"
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> ---
>  block/block-backend.c | 148 +
>  block/file-posix.c    | 335 ++
>  block/io.c    |  41 
>  include/block/block-io.h  |   7 +
>  include/block/block_int-common.h  |  24 +++
>  include/block/raw-aio.h   |   6 +-
>  include/sysemu/block-backend-io.h |  18 ++
>  meson.build   |   4 +
>  qapi/block-core.json  |   8 +-
>  qemu-io-cmds.c    | 149 +
>  10 files changed, 737 insertions(+), 3 deletions(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index aa4adf06ae..1c618e9c68 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1431,6 +1431,15 @@ typedef struct BlkRwCo {
>  void *iobuf;
>  int ret;
>  BdrvRequestFlags flags;
> +    union {
> +    struct {
> +    unsigned int *nr_zones;
> +    BlockZoneDescriptor *zones;
> +    } zone_report;
> +    struct {
> +    unsigned long op;
> +    } zone_mgmt;
> +    };
>  } BlkRwCo;
>  
>  int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
> @@ -1775,6 +1784,145 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
>  return ret;
>  }
>  
> +static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
> +{
> +    BlkAioEmAIOCB *acb = opaque;
> +    BlkRwCo *rwco = &acb->rwco;
> +
> +    rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
> +   rwco->zone_report.nr_zones,
> +   rwco->zone_report.zones);
> +    blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
> +    unsigned int *nr_zones,
> +    BlockZoneDescriptor  *zones,
> +    BlockCompletionFunc *cb, void *opaque)
> +{
> +    BlkAioEmAIOCB *acb;
> +    Coroutine *co;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +    acb->rwco = (BlkRwCo) {
> +    .blk    = blk,
> +    .offset = offset,
> +    .ret    = NOT_DONE,
> +    .zone_report = {
> +    .zones = zones,
> +    .nr_zones = nr_zones,
> +    },
> +    };
> +    acb->has_returned = false;
> +
> +    co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
> +    bdrv_coroutine_enter(blk_bs(blk), co);
> +
> +    acb->has_returned = true;
> +    if (acb->rwco.ret != NOT_DONE) {
> +    replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +    }
> +
> +    return &acb->common;
> +}
> +
> +static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
> +{
> +    BlkAioEmAIOCB *acb = opaque;
> +    BlkRwCo *rwco = &acb->rwco;
> +
> +    rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op,
> + rwco->offset, acb->bytes);
> +    blk_aio_complete(acb);
> +}
> +
> +BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
> +  int64_t offset, int64_t len,
> +  BlockCompletionFunc *cb, void *opaque) {
> +    BlkAioEmAIOCB *acb;
> +    Coroutine *co;
> +    IO_CODE();
> +
> +    blk_inc_in_flight(blk);
> +    acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
> +    acb->rwco = (BlkRwCo) {
> +    .blk    = blk,
> +    .offset = offset,
> +    .ret    = NOT_DONE,
> +    .zone_mgmt = {
> +    .op = op,
> +    },
> +    };
> +    acb->bytes = len;
> +    acb->has_returned = false;
> +
> +    co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
> +    bdrv_coroutine_enter(blk_bs(blk), co);
> +
> +    acb->has_returned = true;
> +    if (acb->rwco.ret != NOT_DONE) {
> +    replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
> + blk_aio_complete_bh, acb);
> +    }
> +
> +    return &acb->common;
> +}
> +
> +/*
> + * Send a zone_report command.
> + * offset is a byte offset from the start of the device. No alignment
> + * required for offset.
> + * nr_zones represents IN maximum and OUT actual.
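
The "IN maximum and OUT actual" convention for nr_zones can be illustrated with a small standalone sketch. This is not the block-layer implementation: ZoneDesc, toy_zone_report and the fixed zone size are hypothetical, and starting the report at the zone that contains the offset is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a zone descriptor; all values in bytes. */
typedef struct ZoneDesc {
    uint64_t start;
    uint64_t length;
} ZoneDesc;

/*
 * Toy report over a device made of dev_zones equal-sized zones.
 * offset:   byte offset from the start of the device, no alignment needed.
 * nr_zones: IN  = number of entries available in zones[],
 *           OUT = number of entries actually filled.
 */
static int toy_zone_report(uint64_t zone_size, unsigned int dev_zones,
                           uint64_t offset, unsigned int *nr_zones,
                           ZoneDesc *zones)
{
    assert(zone_size > 0 && nr_zones && zones);

    uint64_t first = offset / zone_size;  /* zone containing the offset */
    unsigned int n = 0;

    while (n < *nr_zones && first + n < dev_zones) {
        zones[n].start = (first + n) * zone_size;
        zones[n].length = zone_size;
        n++;
    }
    *nr_zones = n;  /* tell the caller how many descriptors are valid */
    return 0;
}
```

A caller sizes the array, passes its capacity in through nr_zones, and afterwards reads back only that many entries.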

Re: [PATCH v12 1/7] include: add zoned device structs

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 22:51 +0800, Sam Li wrote:
> Signed-off-by: Sam Li 
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Damien Le Moal 
> Reviewed-by: Hannes Reinecke 
> ---
>  include/block/block-common.h | 43 
>  1 file changed, 43 insertions(+)
> 
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index fdb7306e78..36bd0e480e 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -49,6 +49,49 @@ typedef struct BlockDriver BlockDriver;
>  typedef struct BdrvChild BdrvChild;
>  typedef struct BdrvChildClass BdrvChildClass;
>  
> +typedef enum BlockZoneOp {
> +    BLK_ZO_OPEN,
> +    BLK_ZO_CLOSE,
> +    BLK_ZO_FINISH,
> +    BLK_ZO_RESET,
> +} BlockZoneOp;
> +
> +typedef enum BlockZoneModel {
> +    BLK_Z_NONE = 0x0, /* Regular block device */
> +    BLK_Z_HM = 0x1, /* Host-managed zoned block device */
> +    BLK_Z_HA = 0x2, /* Host-aware zoned block device */
> +} BlockZoneModel;
> +
> +typedef enum BlockZoneCondition {
> +    BLK_ZS_NOT_WP = 0x0,
> +    BLK_ZS_EMPTY = 0x1,
> +    BLK_ZS_IOPEN = 0x2,
> +    BLK_ZS_EOPEN = 0x3,
> +    BLK_ZS_CLOSED = 0x4,
> +    BLK_ZS_RDONLY = 0xD,
> +    BLK_ZS_FULL = 0xE,
> +    BLK_ZS_OFFLINE = 0xF,
> +} BlockZoneCondition;

The virtio-zbd specification doesn't define conditions, it uses the term
"state" instead, similar to ZNS. Please rename BlockZoneCondition to
BlockZoneState to follow the spec terminology.

> +
> +typedef enum BlockZoneType {
> +    BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
> +    BLK_ZT_SWR = 0x2, /* Sequential writes required */
> +    BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
> +} BlockZoneType;
> +
> +/*
> + * Zone descriptor data structure.
> + * Provides information on a zone with all position and size values in bytes.
> + */
> +typedef struct BlockZoneDescriptor {
> +    uint64_t start;
> +    uint64_t length;
> +    uint64_t cap;
> +    uint64_t wp;
> +    BlockZoneType type;
> +    BlockZoneCondition cond;

BlockZoneState state;

> +} BlockZoneDescriptor;
> +
>  typedef struct BlockDriverInfo {
>  /* in bytes, 0 if irrelevant */
>  int cluster_size;
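
To make the suggested rename concrete, here is a minimal sketch of how the enum and a small helper might look with the "state" terminology. The enumerator values are copied from the quoted patch; the helper toy_zone_state_name and its strings are hypothetical and only there for illustration.

```c
#include <stdint.h>

/* Same values as the quoted BlockZoneCondition, renamed to follow the spec. */
typedef enum BlockZoneState {
    BLK_ZS_NOT_WP = 0x0,
    BLK_ZS_EMPTY = 0x1,
    BLK_ZS_IOPEN = 0x2,
    BLK_ZS_EOPEN = 0x3,
    BLK_ZS_CLOSED = 0x4,
    BLK_ZS_RDONLY = 0xD,
    BLK_ZS_FULL = 0xE,
    BLK_ZS_OFFLINE = 0xF,
} BlockZoneState;

/* Hypothetical helper: printable name for a zone state. */
static const char *toy_zone_state_name(BlockZoneState state)
{
    switch (state) {
    case BLK_ZS_NOT_WP:  return "not-wp";
    case BLK_ZS_EMPTY:   return "empty";
    case BLK_ZS_IOPEN:   return "implicit-open";
    case BLK_ZS_EOPEN:   return "explicit-open";
    case BLK_ZS_CLOSED:  return "closed";
    case BLK_ZS_RDONLY:  return "read-only";
    case BLK_ZS_FULL:    return "full";
    case BLK_ZS_OFFLINE: return "offline";
    }
    return "reserved";  /* values 0x5..0xC are not defined by the patch */
}
```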



Re: [PATCH v12 7/7] docs/zoned-storage: add zoned device documentation

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 22:51 +0800, Sam Li wrote:
> Add the documentation about the zoned device support to virtio-blk
> emulation.
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Damien Le Moal 
> ---
>  docs/devel/zoned-storage.rst   | 43 ++
>  docs/system/qemu-block-drivers.rst.inc |  6 
>  2 files changed, 49 insertions(+)
>  create mode 100644 docs/devel/zoned-storage.rst
> 
> diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
> new file mode 100644
> index 00..cf169d029b
> --- /dev/null
> +++ b/docs/devel/zoned-storage.rst
> @@ -0,0 +1,43 @@
> +=============
> +zoned-storage
> +=============
> +
> +Zoned Block Devices (ZBDs) divide the LBA space into block regions called
> zones
> +that are larger than the LBA size. They can only allow sequential writes,
> which
> +can reduce write amplification in SSDs, and potentially lead to higher
> +throughput and increased capacity. More details about ZBDs can be found at:
> +
> +https://zonedstorage.io/docs/introduction/zoned-storage
> +
> +1. Block layer APIs for zoned storage
> +-------------------------------------
> +QEMU block layer has three zoned storage model:

replace it with

+QEMU block layer supports three zoned storage models:

? with this nit,

Reviewed-by: Dmitry Fomichev 

> +- BLK_Z_HM: The host-managed zoned model only allows sequential write access
> +to zones. It supports ZBD-specific I/O commands that can be used by a host to
> +manage the zones of a device.
> +- BLK_Z_HA: The host-aware zoned model allows random write operations in
> +zones, making it backward compatible with regular block devices.
> +- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
> +regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
> +supported.
> +
> +The block device information resides inside BlockDriverState. QEMU uses
> +BlockLimits struct(BlockDriverState::bl) that is continuously accessed by the
> +block layer while processing I/O requests. A BlockBackend has a root pointer
> to
> +a BlockDriverState graph(for example, raw format on top of file-posix). The
> +zoned storage information can be propagated from the leaf BlockDriverState 
> all
> +the way up to the BlockBackend. If the zoned storage model in file-posix is
> +set to BLK_Z_HM, then block drivers will declare support for zoned host
> device.
> +
> +The block layer APIs support commands needed for zoned storage devices,
> +including report zones, four zone operations, and zone append.
> +
> +2. Emulating zoned storage controllers
> +--------------------------------------
> +When the BlockBackend's BlockLimits model reports a zoned storage device,
> users
> +like the virtio-blk emulation or the qemu-io-cmds.c utility can use block
> layer
> +APIs for zoned storage emulation or testing.
> +
> +For example, to test zone_report on a null_blk device using qemu-io:
> +$ path/to/qemu-io --image-opts -n
> driver=zoned_host_device,filename=/dev/nullb0
> +-c "zrp offset nr_zones"
> diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-
> drivers.rst.inc
> index dfe5d2293d..0b97227fd9 100644
> --- a/docs/system/qemu-block-drivers.rst.inc
> +++ b/docs/system/qemu-block-drivers.rst.inc
> @@ -430,6 +430,12 @@ Hard disks
>    you may corrupt your host data (use the ``-snapshot`` command
>    line option or modify the device permissions accordingly).
>  
> +Zoned block devices
> +  Zoned block devices can be passed through to the guest if the emulated
> storage
> +  controller supports zoned storage. Use ``--blockdev zoned_host_device,
> +  node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0``
> +  as ``drive0``.
> +
>  Windows
>  ^^^
>  
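
As an illustration of the "sequential writes only" rule that the documentation describes for host-managed devices, here is a minimal write-pointer check. It is a sketch under assumptions, not QEMU code: ToyZone, toy_write and the rule that a sequential-write-required zone only accepts writes starting exactly at the write pointer are simplifications chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TOY_ZT_CONV = 1, TOY_ZT_SWR = 2, TOY_ZT_SWP = 3 } ToyZoneType;

/* Hypothetical zone bookkeeping; all values in bytes. */
typedef struct ToyZone {
    uint64_t start;  /* first byte of the zone */
    uint64_t cap;    /* writable capacity */
    uint64_t wp;     /* write pointer (unused for conventional zones) */
    ToyZoneType type;
} ToyZone;

/* Returns true and advances the write pointer if the write is acceptable. */
static bool toy_write(ToyZone *z, uint64_t offset, uint64_t len)
{
    /* The write must fit inside the zone's writable capacity. */
    if (offset < z->start || len > z->cap ||
        offset - z->start > z->cap - len) {
        return false;
    }
    /* Sequential-write-required zones only accept writes at the pointer. */
    if (z->type == TOY_ZT_SWR && offset != z->wp) {
        return false;
    }
    if (z->type != TOY_ZT_CONV && offset + len > z->wp) {
        z->wp = offset + len;
    }
    return true;
}
```

Conventional zones skip the write-pointer test entirely, which is what lets a host-aware device look like a regular block device to software that does not know about zones.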



Re: [RFC v3 1/2] include: update virtio_blk headers from Linux 5.19-rc2+

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 23:05 +0800, Sam Li wrote:
> Use scripts/update-linux-headers.sh to update virtio-blk headers
> from Dmitry's "virtio-blk:add support for zoned block devices"
> linux patch. There is a link for more information:
> https://github.com/dmitry-fomichev/virtblk-zbd
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Sam Li 

The duplicate sign-off is not needed. With that fixed,

Reviewed-by: Dmitry Fomichev 

> ---
>  include/standard-headers/linux/virtio_blk.h | 109 
>  1 file changed, 109 insertions(+)
> 
> diff --git a/include/standard-headers/linux/virtio_blk.h b/include/standard-
> headers/linux/virtio_blk.h
> index 2dcc90826a..490bd21c76 100644
> --- a/include/standard-headers/linux/virtio_blk.h
> +++ b/include/standard-headers/linux/virtio_blk.h
> @@ -40,6 +40,7 @@
>  #define VIRTIO_BLK_F_MQ            12  /* support more than one vq */
>  #define VIRTIO_BLK_F_DISCARD   13  /* DISCARD is supported */
>  #define VIRTIO_BLK_F_WRITE_ZEROES  14  /* WRITE ZEROES is supported */
> +#define VIRTIO_BLK_F_ZONED 17  /* Zoned block device */
>  
>  /* Legacy feature bits */
>  #ifndef VIRTIO_BLK_NO_LEGACY
> @@ -119,6 +120,20 @@ struct virtio_blk_config {
> uint8_t write_zeroes_may_unmap;
>  
> uint8_t unused1[3];
> +
> +   /* Secure erase fields that are defined in the virtio spec */
> +   uint8_t sec_erase[12];
> +
> +   /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
> +   struct virtio_blk_zoned_characteristics {
> +   __virtio32 zone_sectors;
> +   __virtio32 max_open_zones;
> +   __virtio32 max_active_zones;
> +   __virtio32 max_append_sectors;
> +   __virtio32 write_granularity;
> +   uint8_t model;
> +   uint8_t unused2[3];
> +   } zoned;
>  } QEMU_PACKED;
>  
>  /*
> @@ -153,6 +168,27 @@ struct virtio_blk_config {
>  /* Write zeroes command */
>  #define VIRTIO_BLK_T_WRITE_ZEROES  13
>  
> +/* Zone append command */
> +#define VIRTIO_BLK_T_ZONE_APPEND    15
> +
> +/* Report zones command */
> +#define VIRTIO_BLK_T_ZONE_REPORT    16
> +
> +/* Open zone command */
> +#define VIRTIO_BLK_T_ZONE_OPEN  18
> +
> +/* Close zone command */
> +#define VIRTIO_BLK_T_ZONE_CLOSE 20
> +
> +/* Finish zone command */
> +#define VIRTIO_BLK_T_ZONE_FINISH    22
> +
> +/* Reset zone command */
> +#define VIRTIO_BLK_T_ZONE_RESET 24
> +
> +/* Reset All zones command */
> +#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
> +
>  #ifndef VIRTIO_BLK_NO_LEGACY
>  /* Barrier before this op. */
>  #define VIRTIO_BLK_T_BARRIER   0x8000
> @@ -172,6 +208,72 @@ struct virtio_blk_outhdr {
> __virtio64 sector;
>  };
>  
> +/*
> + * Supported zoned device models.
> + */
> +
> +/* Regular block device */
> +#define VIRTIO_BLK_Z_NONE  0
> +/* Host-managed zoned device */
> +#define VIRTIO_BLK_Z_HM    1
> +/* Host-aware zoned device */
> +#define VIRTIO_BLK_Z_HA    2
> +
> +/*
> + * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
> + */
> +struct virtio_blk_zone_descriptor {
> +   /* Zone capacity */
> +   __virtio64 z_cap;
> +   /* The starting sector of the zone */
> +   __virtio64 z_start;
> +   /* Zone write pointer position in sectors */
> +   __virtio64 z_wp;
> +   /* Zone type */
> +   uint8_t z_type;
> +   /* Zone state */
> +   uint8_t z_state;
> +   uint8_t reserved[38];
> +};
> +
> +struct virtio_blk_zone_report {
> +   __virtio64 nr_zones;
> +   uint8_t reserved[56];
> +   struct virtio_blk_zone_descriptor zones[];
> +};
> +
> +/*
> + * Supported zone types.
> + */
> +
> +/* Conventional zone */
> +#define VIRTIO_BLK_ZT_CONV 1
> +/* Sequential Write Required zone */
> +#define VIRTIO_BLK_ZT_SWR  2
> +/* Sequential Write Preferred zone */
> +#define VIRTIO_BLK_ZT_SWP  3
> +
> +/*
> + * Zone states that are available for zones of all types.
> + */
> +
> +/* Not a write pointer (conventional zones only) */
> +#define VIRTIO_BLK_ZS_NOT_WP   0
> +/* Empty */
> +#define VIRTIO_BLK_ZS_EMPTY    1
> +/* Implicitly Open */
> +#define VIRTIO_BLK_ZS_IOPEN    2
> +/* Explicitly Open */
> +#define VIRTIO_BLK_ZS_EOPEN    3
> +/* Closed */
> +#define VIRTIO_BLK_ZS_CLOSED   4
> +/* Read-Only */
> +#define VIRTIO_BLK_ZS_RDONLY   13
> +/* Full */
> +#define VIRTIO_BLK_ZS_FULL 14
> +/* Offline */
> +#define VIRTIO_BLK_ZS_OFFLINE  15
> +
>  /* Unmap this range (only valid for write zeroes command) */
>  #define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP 0x0001
>  
> @@ -198,4 +300,11 @@ struct virtio_scsi_inhdr {
>  #define VIRTIO_BLK_S_OK            0
>  #define VIRTIO_BLK_S_IOERR         1
>  #define VIRTIO_BLK_S_UNSUPP        2
> +
> +/* Error codes that are specific to zoned block devices */
> +#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3
> +#define 
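
The zone descriptor in the header expresses positions in sectors, while the block-layer descriptor from this series uses bytes. A device model therefore needs a unit conversion along the following lines. This is only a sketch: the struct mirrors the quoted layout but uses host-endian fields instead of __virtio64, toy_block_zone and toy_to_virtio are hypothetical names, and the 512-byte sector size is stated here as an assumption.

```c
#include <stdint.h>

#define TOY_SECTOR_SIZE 512u  /* assumption: virtio-blk sectors are 512 bytes */

/* Mirror of the quoted virtio_blk_zone_descriptor, host-endian for brevity. */
struct toy_virtio_zone_descriptor {
    uint64_t z_cap;        /* zone capacity, in sectors */
    uint64_t z_start;      /* starting sector of the zone */
    uint64_t z_wp;         /* write pointer position, in sectors */
    uint8_t z_type;
    uint8_t z_state;
    uint8_t reserved[38];  /* pads the descriptor to 64 bytes */
};

/* Hypothetical byte-based descriptor, in the spirit of BlockZoneDescriptor. */
struct toy_block_zone {
    uint64_t start;
    uint64_t cap;
    uint64_t wp;
    uint8_t type;
    uint8_t state;
};

/* Convert byte units to sector units; type and state pass through. */
static struct toy_virtio_zone_descriptor
toy_to_virtio(const struct toy_block_zone *z)
{
    struct toy_virtio_zone_descriptor d = { 0 };

    d.z_cap = z->cap / TOY_SECTOR_SIZE;
    d.z_start = z->start / TOY_SECTOR_SIZE;
    d.z_wp = z->wp / TOY_SECTOR_SIZE;
    d.z_type = z->type;
    d.z_state = z->state;
    return d;
}
```

With three 8-byte fields, two 1-byte fields and 38 reserved bytes, the descriptor comes to 64 bytes, the same size as the report header (8 + 56).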

Re: [PATCH v12 2/7] file-posix: introduce helper functions for sysfs attributes

2022-10-16 Thread Dmitry Fomichev
On Sun, 2022-10-16 at 22:51 +0800, Sam Li wrote:
> Use get_sysfs_str_val() to get the string value of device
> zoned model. Then get_sysfs_zoned_model() can convert it to
> BlockZoneModel type of QEMU.
> 
> Use get_sysfs_long_val() to get the long value of zoned device
> information.
> 
> Signed-off-by: Sam Li 
> Reviewed-by: Hannes Reinecke 
> Reviewed-by: Stefan Hajnoczi 
> Reviewed-by: Damien Le Moal 

Reviewed-by: Dmitry Fomichev 

> ---
>  block/file-posix.c   | 124 ++-
>  include/block/block_int-common.h |   3 +
>  2 files changed, 91 insertions(+), 36 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 23acffb9a4..8cb07fdb8a 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1201,66 +1201,112 @@ static int hdev_get_max_hw_transfer(int fd, struct
> stat *st)
>  #endif
>  }
>  
> -static int hdev_get_max_segments(int fd, struct stat *st)
> -{
> +/*
> + * Get a sysfs attribute value as character string.
> + */
> +static int get_sysfs_str_val(struct stat *st, const char *attribute,
> + char **val) {
>  #ifdef CONFIG_LINUX
> -    char buf[32];
> -    const char *end;
> -    char *sysfspath = NULL;
> +    g_autofree char *sysfspath = NULL;
>  int ret;
> -    int sysfd = -1;
> -    long max_segments;
> +    size_t len;
>  
> -    if (S_ISCHR(st->st_mode)) {
> -    if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> -    return ret;
> -    }
> +    if (!S_ISBLK(st->st_mode)) {
>  return -ENOTSUP;
>  }
>  
> -    if (!S_ISBLK(st->st_mode)) {
> -    return -ENOTSUP;
> +    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
> +    major(st->st_rdev), minor(st->st_rdev),
> +    attribute);
> +    ret = g_file_get_contents(sysfspath, val, &len, NULL);
> +    if (ret == -1) {
> +    return -ENOENT;
>  }
>  
> -    sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
> -    major(st->st_rdev), minor(st->st_rdev));
> -    sysfd = open(sysfspath, O_RDONLY);
> -    if (sysfd == -1) {
> -    ret = -errno;
> -    goto out;
> +    /* The file is ended with '\n' */
> +    char *p;
> +    p = *val;
> +    if (*(p + len - 1) == '\n') {
> +    *(p + len - 1) = '\0';
>  }
> -    do {
> -    ret = read(sysfd, buf, sizeof(buf) - 1);
> -    } while (ret == -1 && errno == EINTR);
> +    return ret;
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
> +static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
> +{
> +    g_autofree char *val = NULL;
> +    int ret;
> +
> +    ret = get_sysfs_str_val(st, "zoned", &val);
>  if (ret < 0) {
> -    ret = -errno;
> -    goto out;
> -    } else if (ret == 0) {
> -    ret = -EIO;
> -    goto out;
> +    return ret;
>  }
> -    buf[ret] = 0;
> -    /* The file is ended with '\n', pass 'end' to accept that. */
> -    ret = qemu_strtol(buf, &end, 10, &max_segments);
> -    if (ret == 0 && end && *end == '\n') {
> -    ret = max_segments;
> +
> +    if (strcmp(val, "host-managed") == 0) {
> +    *zoned = BLK_Z_HM;
> +    } else if (strcmp(val, "host-aware") == 0) {
> +    *zoned = BLK_Z_HA;
> +    } else if (strcmp(val, "none") == 0) {
> +    *zoned = BLK_Z_NONE;
> +    } else {
> +    return -ENOTSUP;
>  }
> +    return 0;
> +}
>  
> -out:
> -    if (sysfd != -1) {
> -    close(sysfd);
> +/*
> + * Get a sysfs attribute value as a long integer.
> + */
> +static long get_sysfs_long_val(struct stat *st, const char *attribute)
> +{
> +#ifdef CONFIG_LINUX
> +    g_autofree char *str = NULL;
> +    const char *end;
> +    long val;
> +    int ret;
> +
> +    ret = get_sysfs_str_val(st, attribute, &str);
> +    if (ret < 0) {
> +    return ret;
> +    }
> +
> +    /* The file is ended with '\n', pass 'end' to accept that. */
> +    ret = qemu_strtol(str, &end, 10, &val);
> +    if (ret == 0 && end && *end == '\0') {
> +    ret = val;
>  }
> -    g_free(sysfspath);
>  return ret;
>  #else
>  return -ENOTSUP;
>  #endif
>  }
>  
> +static int hdev_get_max_segments(int fd, struct stat *st)
> +{
> +#ifdef CONFIG_LINUX
> +    int ret;
> +
> +    if (S_ISCHR(st->st_mode)) {
> +    if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
> +    return ret;
> +    }
> +    return -ENOTSUP;
> +    }
> +    return get_sysfs_long_val(st, "max_segments");
> +#else
> +    return -ENOTSUP;
> +#endif
> +}
> +
>  static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>  BDRVRawState *s = bs->opaque;
>  struct stat st;
> +    int ret;
> +    BlockZoneModel zoned;
>  
>  s->needs_alignment = raw_needs_alignment(bs);
>  raw_probe_alignment(bs, s->fd, errp);
> @@ -1298,6 +1344,12 @@ static void raw_refresh_limits(BlockDriverState *bs,
> Error **errp)
>  bs->bl.max_hw_iov = ret;
>  }
>  }
> +
> +    ret = 
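
The parsing steps used by these helpers (strip the trailing newline, then either match the zoned model string or convert to a long) can be tried outside QEMU with a standalone sketch. It uses plain strtol instead of qemu_strtol and operates on an in-memory string instead of a sysfs file; the toy_* names and the ToyZoneModel enum are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOY_Z_NONE = 0, TOY_Z_HM = 1, TOY_Z_HA = 2 } ToyZoneModel;

/* sysfs attributes end with '\n'; drop it if present (empty input is fine). */
static void toy_chomp(char *s)
{
    size_t len = strlen(s);

    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';
    }
}

/* Map the contents of a "zoned" attribute to a model; 0 on success. */
static int toy_parse_zoned_model(char *raw, ToyZoneModel *model)
{
    toy_chomp(raw);
    if (strcmp(raw, "host-managed") == 0) {
        *model = TOY_Z_HM;
    } else if (strcmp(raw, "host-aware") == 0) {
        *model = TOY_Z_HA;
    } else if (strcmp(raw, "none") == 0) {
        *model = TOY_Z_NONE;
    } else {
        return -ENOTSUP;
    }
    return 0;
}

/* Parse a decimal attribute; returns the value or -EINVAL on bad input. */
static long toy_parse_long(char *raw)
{
    char *end;
    long val;

    toy_chomp(raw);
    errno = 0;
    val = strtol(raw, &end, 10);
    if (errno != 0 || end == raw || *end != '\0') {
        return -EINVAL;
    }
    return val;
}
```

Unlike the quoted helper, the sketch checks for an empty string before looking at the last character, so a zero-length attribute cannot index before the start of the buffer.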

[PATCH] target/i386: Save and restore pc_save before tcg_remove_ops_after

2022-10-16 Thread Richard Henderson
Restore pc_save while undoing any state change that may have
happened while decoding the instruction.  Leave a TODO about
removing all of that when the table-based decoder is complete.

Cc: Paolo Bonzini 
Suggested-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 279a3ae999..75ca99084e 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -4817,6 +4817,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 int modrm, reg, rm, mod, op, opreg, val;
 bool orig_cc_op_dirty = s->cc_op_dirty;
 CCOp orig_cc_op = s->cc_op;
+target_ulong orig_pc_save = s->pc_save;
 
 s->pc = s->base.pc_next;
 s->override = -1;
@@ -4838,8 +4839,15 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 case 2:
 /* Restore state that may affect the next instruction. */
 s->pc = s->base.pc_next;
+/*
+ * TODO: These save/restore can be removed after the table-based
+ * decoder is complete; we will be decoding the insn completely
+ * before any code generation that might affect these variables.
+ */
 s->cc_op_dirty = orig_cc_op_dirty;
 s->cc_op = orig_cc_op;
+s->pc_save = orig_pc_save;
+/* END TODO */
 s->base.num_insns--;
 tcg_remove_ops_after(s->prev_insn_end);
 s->base.is_jmp = DISAS_TOO_MANY;
-- 
2.34.1
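
For readers unfamiliar with the pattern, the snapshot-and-restore around a discarded decode can be sketched in isolation. ToyDisasContext, toy_decode and toy_disas_insn are hypothetical stand-ins; the sketch only models the bookkeeping, not TCG op removal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the translator state touched while decoding. */
typedef struct ToyDisasContext {
    uint64_t pc;
    uint64_t pc_save;
    int cc_op;
    bool cc_op_dirty;
} ToyDisasContext;

/*
 * Hypothetical decoder: it changes translator state as a side effect
 * and fails when the instruction turns out to cross a page boundary.
 */
static bool toy_decode(ToyDisasContext *s, bool crosses_page)
{
    s->pc += 4;
    s->pc_save = s->pc;
    s->cc_op = 7;
    s->cc_op_dirty = true;
    return !crosses_page;
}

/* Returns false if the instruction was discarded and state rolled back. */
static bool toy_disas_insn(ToyDisasContext *s, uint64_t pc_next,
                           bool crosses_page)
{
    /* Snapshot everything the decoder may change. */
    bool orig_cc_op_dirty = s->cc_op_dirty;
    int orig_cc_op = s->cc_op;
    uint64_t orig_pc_save = s->pc_save;

    s->pc = pc_next;
    if (toy_decode(s, crosses_page)) {
        return true;
    }

    /* Undo the side effects so the next attempt starts from a clean state. */
    s->pc = pc_next;
    s->cc_op_dirty = orig_cc_op_dirty;
    s->cc_op = orig_cc_op;
    s->pc_save = orig_pc_save;
    return false;
}
```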




Re: [PULL 00/10] riscv-to-apply queue

2022-10-16 Thread Stefan Hajnoczi
Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/7.2 for any 
user-visible changes.




[PATCH v7 9/9] target/arm: Enable TARGET_TB_PCREL

2022-10-16 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
v7: Introduce DisasLabel to clean up pc_save frobbing.
Adjust pc_save around tcg_remove_ops_after.
---
 target/arm/cpu-param.h|   1 +
 target/arm/translate.h|  50 -
 target/arm/cpu.c  |  23 
 target/arm/translate-a64.c|  56 ---
 target/arm/translate-m-nocp.c |   2 +-
 target/arm/translate.c| 100 ++
 6 files changed, 161 insertions(+), 71 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 08681828ac..ae472cf330 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -30,6 +30,7 @@
  */
 # define TARGET_PAGE_BITS_VARY
 # define TARGET_PAGE_BITS_MIN  10
+# define TARGET_TB_PCREL 1
 #endif
 
 #define NB_MMU_MODES 8
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 4aa239e23c..3cdc7dbc2f 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -6,18 +6,42 @@
 
 
 /* internal defines */
+
+/*
+ * Save pc_save across a branch, so that we may restore the value from
+ * before the branch at the point the label is emitted.
+ */
+typedef struct DisasLabel {
+TCGLabel *label;
+target_ulong pc_save;
+} DisasLabel;
+
 typedef struct DisasContext {
 DisasContextBase base;
 const ARMISARegisters *isar;
 
 /* The address of the current instruction being translated. */
 target_ulong pc_curr;
+/*
+ * For TARGET_TB_PCREL, the full value of cpu_pc is not known
+ * (although the page offset is known).  For convenience, the
+ * translation loop uses the full virtual address that triggered
+ * the translation, from base.pc_start through pc_curr.
+ * For efficiency, we do not update cpu_pc for every instruction.
+ * Instead, pc_save has the value of pc_curr at the time of the
+ * last update to cpu_pc, which allows us to compute the addend
+ * needed to bring cpu_pc current: pc_curr - pc_save.
+ * If cpu_pc now contains the destination of an indirect branch,
+ * pc_save contains -1 to indicate that relative updates are no
+ * longer possible.
+ */
+target_ulong pc_save;
 target_ulong page_start;
 uint32_t insn;
 /* Nonzero if this instruction has been conditionally skipped.  */
 int condjmp;
 /* The label that will be jumped to when the instruction is skipped.  */
-TCGLabel *condlabel;
+DisasLabel condlabel;
 /* Thumb-2 conditional execution bits.  */
 int condexec_mask;
 int condexec_cond;
@@ -28,8 +52,6 @@ typedef struct DisasContext {
  * after decode (ie after any UNDEF checks)
  */
 bool eci_handled;
-/* TCG op to rewind to if this turns out to be an invalid ECI state */
-TCGOp *insn_eci_rewind;
 int sctlr_b;
 MemOp be_data;
 #if !defined(CONFIG_USER_ONLY)
@@ -566,6 +588,28 @@ static inline MemOp finalize_memop(DisasContext *s, MemOp 
opc)
  */
 uint64_t asimd_imm_const(uint32_t imm, int cmode, int op);
 
+/*
+ * gen_disas_label:
+ * Create a label and cache a copy of pc_save.
+ */
+static inline DisasLabel gen_disas_label(DisasContext *s)
+{
+return (DisasLabel){
+.label = gen_new_label(),
+.pc_save = s->pc_save,
+};
+}
+
+/*
+ * set_disas_label:
+ * Emit a label and restore the cached copy of pc_save.
+ */
+static inline void set_disas_label(DisasContext *s, DisasLabel l)
+{
+gen_set_label(l.label);
+s->pc_save = l.pc_save;
+}
+
 /*
  * Helpers for implementing sets of trans_* functions.
  * Defer the implementation of NAME to FUNC, with optional extra arguments.
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 94ca6f163f..0bc5e9b125 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -76,17 +76,18 @@ static vaddr arm_cpu_get_pc(CPUState *cs)
 void arm_cpu_synchronize_from_tb(CPUState *cs,
  const TranslationBlock *tb)
 {
-ARMCPU *cpu = ARM_CPU(cs);
-CPUARMState *env = &cpu->env;
-
-/*
- * It's OK to look at env for the current mode here, because it's
- * never possible for an AArch64 TB to chain to an AArch32 TB.
- */
-if (is_a64(env)) {
-env->pc = tb_pc(tb);
-} else {
-env->regs[15] = tb_pc(tb);
+/* The program counter is always up to date with TARGET_TB_PCREL. */
+if (!TARGET_TB_PCREL) {
+CPUARMState *env = cs->env_ptr;
+/*
+ * It's OK to look at env for the current mode here, because it's
+ * never possible for an AArch64 TB to chain to an AArch32 TB.
+ */
+if (is_a64(env)) {
+env->pc = tb_pc(tb);
+} else {
+env->regs[15] = tb_pc(tb);
+}
 }
 }
 #endif /* CONFIG_TCG */
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index f9f8559c01..9cf2f40a80 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -142,12 +142,18 @@ static void reset_btype(DisasContext *s)
 
 static void gen_pc_plus_diff(DisasContext 
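
The pc_save bookkeeping described in the DisasContext comment, and the way DisasLabel preserves it across a branch, can be modelled with plain integers. This is a sketch of the arithmetic only: ToyCtx, ToyLabel and the toy_* functions are hypothetical, and the assumption is that a PC update emits an add of (pc_curr - pc_save) + diff and then records the new value in pc_save.

```c
#include <assert.h>
#include <stdint.h>

typedef struct ToyCtx {
    uint64_t pc_curr;  /* translation-time address of the current insn */
    uint64_t pc_save;  /* pc_curr at the last cpu_pc update, or -1 */
} ToyCtx;

typedef struct ToyLabel {
    uint64_t pc_save;  /* pc_save as it was when the branch was emitted */
} ToyLabel;

/*
 * Addend that would be applied to the runtime PC to make it equal
 * pc_curr + diff; afterwards pc_save reflects the new runtime value.
 */
static int64_t toy_pc_addend(ToyCtx *s, int64_t diff)
{
    assert(s->pc_save != (uint64_t)-1);  /* relative update must be possible */

    int64_t addend = (int64_t)(s->pc_curr - s->pc_save) + diff;

    s->pc_save = s->pc_curr + diff;
    return addend;
}

/* Remember pc_save at the branch ... */
static ToyLabel toy_gen_label(const ToyCtx *s)
{
    return (ToyLabel){ .pc_save = s->pc_save };
}

/* ... and restore it where the label is placed. */
static void toy_set_label(ToyCtx *s, ToyLabel l)
{
    s->pc_save = l.pc_save;
}
```

Code reached through the label did not execute the updates emitted on the fall-through path, so it has to compute its addend against the value saved at the branch rather than the most recent one.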

[PATCH v7 7/9] target/arm: Introduce gen_pc_plus_diff for aarch64

2022-10-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 41 +++---
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 623f7e2e96..f9f8559c01 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -140,9 +140,14 @@ static void reset_btype(DisasContext *s)
 }
 }
 
+static void gen_pc_plus_diff(DisasContext *s, TCGv_i64 dest, target_long diff)
+{
+tcg_gen_movi_i64(dest, s->pc_curr + diff);
+}
+
 void gen_a64_update_pc(DisasContext *s, target_long diff)
 {
-tcg_gen_movi_i64(cpu_pc, s->pc_curr + diff);
+gen_pc_plus_diff(s, cpu_pc, diff);
 }
 
 /*
@@ -1360,7 +1365,7 @@ static void disas_uncond_b_imm(DisasContext *s, uint32_t 
insn)
 
 if (insn & (1U << 31)) {
 /* BL Branch with link */
-tcg_gen_movi_i64(cpu_reg(s, 30), s->base.pc_next);
+gen_pc_plus_diff(s, cpu_reg(s, 30), curr_insn_len(s));
 }
 
 /* B Branch / BL Branch with link */
@@ -2301,11 +2306,17 @@ static void disas_uncond_b_reg(DisasContext *s, 
uint32_t insn)
 default:
 goto do_unallocated;
 }
-gen_a64_set_pc(s, dst);
 /* BLR also needs to load return address */
 if (opc == 1) {
-tcg_gen_movi_i64(cpu_reg(s, 30), s->base.pc_next);
+TCGv_i64 lr = cpu_reg(s, 30);
+if (dst == lr) {
+TCGv_i64 tmp = new_tmp_a64(s);
+tcg_gen_mov_i64(tmp, dst);
+dst = tmp;
+}
+gen_pc_plus_diff(s, lr, curr_insn_len(s));
 }
+gen_a64_set_pc(s, dst);
 break;
 
 case 8: /* BRAA */
@@ -2328,11 +2339,17 @@ static void disas_uncond_b_reg(DisasContext *s, 
uint32_t insn)
 } else {
 dst = cpu_reg(s, rn);
 }
-gen_a64_set_pc(s, dst);
 /* BLRAA also needs to load return address */
 if (opc == 9) {
-tcg_gen_movi_i64(cpu_reg(s, 30), s->base.pc_next);
+TCGv_i64 lr = cpu_reg(s, 30);
+if (dst == lr) {
+TCGv_i64 tmp = new_tmp_a64(s);
+tcg_gen_mov_i64(tmp, dst);
+dst = tmp;
+}
+gen_pc_plus_diff(s, lr, curr_insn_len(s));
 }
+gen_a64_set_pc(s, dst);
 break;
 
 case 4: /* ERET */
@@ -2900,7 +2917,8 @@ static void disas_ld_lit(DisasContext *s, uint32_t insn)
 
 tcg_rt = cpu_reg(s, rt);
 
-clean_addr = tcg_constant_i64(s->pc_curr + imm);
+clean_addr = new_tmp_a64(s);
+gen_pc_plus_diff(s, clean_addr, imm);
 if (is_vector) {
 do_fp_ld(s, rt, clean_addr, size);
 } else {
@@ -4244,23 +4262,22 @@ static void disas_ldst(DisasContext *s, uint32_t insn)
 static void disas_pc_rel_adr(DisasContext *s, uint32_t insn)
 {
 unsigned int page, rd;
-uint64_t base;
-uint64_t offset;
+int64_t offset;
 
 page = extract32(insn, 31, 1);
 /* SignExtend(immhi:immlo) -> offset */
 offset = sextract64(insn, 5, 19);
 offset = offset << 2 | extract32(insn, 29, 2);
 rd = extract32(insn, 0, 5);
-base = s->pc_curr;
 
 if (page) {
 /* ADRP (page based) */
-base &= ~0xfff;
 offset <<= 12;
+/* The page offset is ok for TARGET_TB_PCREL. */
+offset -= s->pc_curr & 0xfff;
 }
 
-tcg_gen_movi_i64(cpu_reg(s, rd), base + offset);
+gen_pc_plus_diff(s, cpu_reg(s, rd), offset);
 }
 
 /*
-- 
2.34.1




[PATCH v7 5/9] target/arm: Remove gen_exception_internal_insn pc argument

2022-10-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.
Since we always pass dc->pc_curr, fold the arithmetic to zero displacement.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c |  6 +++---
 target/arm/translate.c | 10 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 49380e1cfe..623f7e2e96 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -332,9 +332,9 @@ static void gen_exception_internal(int excp)
 gen_helper_exception_internal(cpu_env, tcg_constant_i32(excp));
 }
 
-static void gen_exception_internal_insn(DisasContext *s, uint64_t pc, int excp)
+static void gen_exception_internal_insn(DisasContext *s, int excp)
 {
-gen_a64_update_pc(s, pc - s->pc_curr);
+gen_a64_update_pc(s, 0);
 gen_exception_internal(excp);
 s->base.is_jmp = DISAS_NORETURN;
 }
@@ -2211,7 +2211,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
  * Secondly, "HLT 0xf000" is the A64 semihosting syscall instruction.
  */
 if (semihosting_enabled(s->current_el == 0) && imm16 == 0xf000) {
-gen_exception_internal_insn(s, s->pc_curr, EXCP_SEMIHOST);
+gen_exception_internal_insn(s, EXCP_SEMIHOST);
 } else {
 unallocated_encoding(s);
 }
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 350f991649..9104ab8232 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1074,10 +1074,10 @@ static inline void gen_smc(DisasContext *s)
 s->base.is_jmp = DISAS_SMC;
 }
 
-static void gen_exception_internal_insn(DisasContext *s, uint32_t pc, int excp)
+static void gen_exception_internal_insn(DisasContext *s, int excp)
 {
 gen_set_condexec(s);
-gen_update_pc(s, pc - s->pc_curr);
+gen_update_pc(s, 0);
 gen_exception_internal(excp);
 s->base.is_jmp = DISAS_NORETURN;
 }
@@ -1169,7 +1169,7 @@ static inline void gen_hlt(DisasContext *s, int imm)
  */
 if (semihosting_enabled(s->current_el != 0) &&
 (imm == (s->thumb ? 0x3c : 0xf000))) {
-gen_exception_internal_insn(s, s->pc_curr, EXCP_SEMIHOST);
+gen_exception_internal_insn(s, EXCP_SEMIHOST);
 return;
 }
 
@@ -6556,7 +6556,7 @@ static bool trans_BKPT(DisasContext *s, arg_BKPT *a)
 if (arm_dc_feature(s, ARM_FEATURE_M) &&
 semihosting_enabled(s->current_el == 0) &&
 (a->imm == 0xab)) {
-gen_exception_internal_insn(s, s->pc_curr, EXCP_SEMIHOST);
+gen_exception_internal_insn(s, EXCP_SEMIHOST);
 } else {
 gen_exception_bkpt_insn(s, syn_aa32_bkpt(a->imm, false));
 }
@@ -8762,7 +8762,7 @@ static bool trans_SVC(DisasContext *s, arg_SVC *a)
 if (!arm_dc_feature(s, ARM_FEATURE_M) &&
 semihosting_enabled(s->current_el == 0) &&
 (a->imm == semihost_imm)) {
-gen_exception_internal_insn(s, s->pc_curr, EXCP_SEMIHOST);
+gen_exception_internal_insn(s, EXCP_SEMIHOST);
 } else {
 gen_update_pc(s, curr_insn_len(s));
 s->svc_imm = a->imm;
-- 
2.34.1




[PATCH v7 8/9] target/arm: Introduce gen_pc_plus_diff for aarch32

2022-10-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/translate.c | 38 +-
 1 file changed, 21 insertions(+), 17 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index ca128edab7..5f6bd9b5b7 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -260,23 +260,22 @@ static inline int get_a32_user_mem_index(DisasContext *s)
 }
 }
 
-/* The architectural value of PC.  */
-static uint32_t read_pc(DisasContext *s)
-{
-return s->pc_curr + (s->thumb ? 4 : 8);
-}
-
 /* The pc_curr difference for an architectural jump. */
 static target_long jmp_diff(DisasContext *s, target_long diff)
 {
 return diff + (s->thumb ? 4 : 8);
 }
 
+static void gen_pc_plus_diff(DisasContext *s, TCGv_i32 var, target_long diff)
+{
+tcg_gen_movi_i32(var, s->pc_curr + diff);
+}
+
 /* Set a variable to the value of a CPU register.  */
 void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
 {
 if (reg == 15) {
-tcg_gen_movi_i32(var, read_pc(s));
+gen_pc_plus_diff(s, var, jmp_diff(s, 0));
 } else {
 tcg_gen_mov_i32(var, cpu_R[reg]);
 }
@@ -292,7 +291,11 @@ TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs)
 TCGv_i32 tmp = tcg_temp_new_i32();
 
 if (reg == 15) {
-tcg_gen_movi_i32(tmp, (read_pc(s) & ~3) + ofs);
+/*
+ * This address is computed from an aligned PC:
+ * subtract off the low bits.
+ */
+gen_pc_plus_diff(s, tmp, jmp_diff(s, ofs - (s->pc_curr & 3)));
 } else {
 tcg_gen_addi_i32(tmp, cpu_R[reg], ofs);
 }
@@ -1155,7 +1158,7 @@ void unallocated_encoding(DisasContext *s)
 /* Force a TB lookup after an instruction that changes the CPU state.  */
 void gen_lookup_tb(DisasContext *s)
 {
-tcg_gen_movi_i32(cpu_R[15], s->base.pc_next);
+gen_pc_plus_diff(s, cpu_R[15], curr_insn_len(s));
 s->base.is_jmp = DISAS_EXIT;
 }
 
@@ -6479,7 +6482,7 @@ static bool trans_BLX_r(DisasContext *s, arg_BLX_r *a)
 return false;
 }
 tmp = load_reg(s, a->rm);
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | s->thumb);
 gen_bx(s, tmp);
 return true;
 }
@@ -8347,7 +8350,7 @@ static bool trans_B_cond_thumb(DisasContext *s, arg_ci *a)
 
 static bool trans_BL(DisasContext *s, arg_i *a)
 {
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | s->thumb);
 gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
@@ -8366,7 +8369,7 @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
 if (s->thumb && (a->imm & 2)) {
 return false;
 }
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | s->thumb);
 store_cpu_field_constant(!s->thumb, thumb);
 /* This jump is computed from an aligned PC: subtract off the low bits. */
 gen_jmp(s, jmp_diff(s, a->imm - (s->pc_curr & 3)));
@@ -8376,7 +8379,7 @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
 static bool trans_BL_BLX_prefix(DisasContext *s, arg_BL_BLX_prefix *a)
 {
 assert(!arm_dc_feature(s, ARM_FEATURE_THUMB2));
-tcg_gen_movi_i32(cpu_R[14], read_pc(s) + (a->imm << 12));
+gen_pc_plus_diff(s, cpu_R[14], jmp_diff(s, a->imm << 12));
 return true;
 }
 
@@ -8386,7 +8389,7 @@ static bool trans_BL_suffix(DisasContext *s, arg_BL_suffix *a)
 
 assert(!arm_dc_feature(s, ARM_FEATURE_THUMB2));
 tcg_gen_addi_i32(tmp, cpu_R[14], (a->imm << 1) | 1);
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | 1);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | 1);
 gen_bx(s, tmp);
 return true;
 }
@@ -8402,7 +8405,7 @@ static bool trans_BLX_suffix(DisasContext *s, arg_BLX_suffix *a)
 tmp = tcg_temp_new_i32();
 tcg_gen_addi_i32(tmp, cpu_R[14], a->imm << 1);
 tcg_gen_andi_i32(tmp, tmp, 0xfffffffc);
-tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | 1);
+gen_pc_plus_diff(s, cpu_R[14], curr_insn_len(s) | 1);
 gen_bx(s, tmp);
 return true;
 }
@@ -8725,10 +8728,11 @@ static bool op_tbranch(DisasContext *s, arg_tbranch *a, bool half)
 tcg_gen_add_i32(addr, addr, tmp);
 
 gen_aa32_ld_i32(s, tmp, addr, get_mem_index(s), half ? MO_UW : MO_UB);
-tcg_temp_free_i32(addr);
 
 tcg_gen_add_i32(tmp, tmp, tmp);
-tcg_gen_addi_i32(tmp, tmp, read_pc(s));
+gen_pc_plus_diff(s, addr, jmp_diff(s, 0));
+tcg_gen_add_i32(tmp, tmp, addr);
+tcg_temp_free_i32(addr);
 store_reg(s, 15, tmp);
 return true;
 }
-- 
2.34.1
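
The link-register updates in the patch above replace `s->base.pc_next | s->thumb` with a displacement of `curr_insn_len(s) | s->thumb` from `pc_curr`. A minimal sketch of why the two agree, assuming a simplified `Ctx` struct that is only a hypothetical stand-in for the relevant DisasContext fields (it is not QEMU code): because `pc_curr` and the instruction length are both even, OR-ing bit 0 into the length is the same as OR-ing it into `pc_next`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the DisasContext fields used below. */
typedef struct {
    uint32_t pc_curr;  /* address of the insn being translated */
    uint32_t pc_next;  /* address of the following insn */
    bool thumb;        /* true when translating Thumb code */
} Ctx;

/* Length of the current insn, mirroring curr_insn_len(). */
static int curr_insn_len(const Ctx *s)
{
    return (int)(s->pc_next - s->pc_curr);
}

/* Old form: absolute return address with the Thumb bit OR-ed in. */
static uint32_t lr_abs(const Ctx *s)
{
    return s->pc_next | s->thumb;
}

/* New form: displacement from pc_curr with the Thumb bit OR-ed in. */
static uint32_t lr_rel(const Ctx *s)
{
    /* Both values are even, so bit 0 of the sum is free for the flag. */
    assert((s->pc_curr & 1) == 0);
    assert((curr_insn_len(s) & 1) == 0);
    return s->pc_curr + (uint32_t)(curr_insn_len(s) | s->thumb);
}
```

For a 4-byte Thumb instruction at 0x8002, both forms yield 0x8007; for an Arm instruction at 0x1000, both yield 0x1004.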




[PATCH v7 6/9] target/arm: Change gen_jmp* to work on displacements

2022-10-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate.c | 37 +
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/target/arm/translate.c b/target/arm/translate.c
index 9104ab8232..ca128edab7 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -266,6 +266,12 @@ static uint32_t read_pc(DisasContext *s)
 return s->pc_curr + (s->thumb ? 4 : 8);
 }
 
+/* The pc_curr difference for an architectural jump. */
+static target_long jmp_diff(DisasContext *s, target_long diff)
+{
+return diff + (s->thumb ? 4 : 8);
+}
+
 /* Set a variable to the value of a CPU register.  */
 void load_reg_var(DisasContext *s, TCGv_i32 var, int reg)
 {
@@ -2592,7 +2598,7 @@ static void gen_goto_ptr(void)
  * cpu_loop_exec. Any live exit_requests will be processed as we
  * enter the next TB.
  */
-static void gen_goto_tb(DisasContext *s, int n, int diff)
+static void gen_goto_tb(DisasContext *s, int n, target_long diff)
 {
 target_ulong dest = s->pc_curr + diff;
 
@@ -2608,10 +2614,8 @@ static void gen_goto_tb(DisasContext *s, int n, int diff)
 }
 
 /* Jump, specifying which TB number to use if we gen_goto_tb() */
-static inline void gen_jmp_tb(DisasContext *s, uint32_t dest, int tbno)
+static void gen_jmp_tb(DisasContext *s, target_long diff, int tbno)
 {
-int diff = dest - s->pc_curr;
-
 if (unlikely(s->ss_active)) {
 /* An indirect jump so that we still trigger the debug exception.  */
 gen_update_pc(s, diff);
@@ -2653,9 +2657,9 @@ static inline void gen_jmp_tb(DisasContext *s, uint32_t dest, int tbno)
 }
 }
 
-static inline void gen_jmp(DisasContext *s, uint32_t dest)
+static inline void gen_jmp(DisasContext *s, target_long diff)
 {
-gen_jmp_tb(s, dest, 0);
+gen_jmp_tb(s, diff, 0);
 }
 
 static inline void gen_mulxy(TCGv_i32 t0, TCGv_i32 t1, int x, int y)
@@ -8322,7 +8326,7 @@ static bool trans_CLRM(DisasContext *s, arg_CLRM *a)
 
 static bool trans_B(DisasContext *s, arg_i *a)
 {
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
@@ -8337,14 +8341,14 @@ static bool trans_B_cond_thumb(DisasContext *s, arg_ci *a)
 return true;
 }
 arm_skip_unless(s, a->cond);
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
 static bool trans_BL(DisasContext *s, arg_i *a)
 {
 tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
@@ -8364,7 +8368,8 @@ static bool trans_BLX_i(DisasContext *s, arg_BLX_i *a)
 }
 tcg_gen_movi_i32(cpu_R[14], s->base.pc_next | s->thumb);
 store_cpu_field_constant(!s->thumb, thumb);
-gen_jmp(s, (read_pc(s) & ~3) + a->imm);
+/* This jump is computed from an aligned PC: subtract off the low bits. */
+gen_jmp(s, jmp_diff(s, a->imm - (s->pc_curr & 3)));
 return true;
 }
 
@@ -8525,10 +8530,10 @@ static bool trans_WLS(DisasContext *s, arg_WLS *a)
  * when we take this upcoming exit from this TB, so gen_jmp_tb() is OK.
  */
 }
-gen_jmp_tb(s, s->base.pc_next, 1);
+gen_jmp_tb(s, curr_insn_len(s), 1);
 
 gen_set_label(nextlabel);
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
@@ -8608,7 +8613,7 @@ static bool trans_LE(DisasContext *s, arg_LE *a)
 
 if (a->f) {
 /* Loop-forever: just jump back to the loop start */
-gen_jmp(s, read_pc(s) - a->imm);
+gen_jmp(s, jmp_diff(s, -a->imm));
 return true;
 }
 
@@ -8639,7 +8644,7 @@ static bool trans_LE(DisasContext *s, arg_LE *a)
 tcg_temp_free_i32(decr);
 }
 /* Jump back to the loop start */
-gen_jmp(s, read_pc(s) - a->imm);
+gen_jmp(s, jmp_diff(s, -a->imm));
 
 gen_set_label(loopend);
 if (a->tp) {
@@ -8647,7 +8652,7 @@ static bool trans_LE(DisasContext *s, arg_LE *a)
 store_cpu_field(tcg_constant_i32(4), v7m.ltpsize);
 }
 /* End TB, continuing to following insn */
-gen_jmp_tb(s, s->base.pc_next, 1);
+gen_jmp_tb(s, curr_insn_len(s), 1);
 return true;
 }
 
@@ -8746,7 +8751,7 @@ static bool trans_CBZ(DisasContext *s, arg_CBZ *a)
 tcg_gen_brcondi_i32(a->nz ? TCG_COND_EQ : TCG_COND_NE,
 tmp, 0, s->condlabel);
 tcg_temp_free_i32(tmp);
-gen_jmp(s, read_pc(s) + a->imm);
+gen_jmp(s, jmp_diff(s, a->imm));
 return true;
 }
 
-- 
2.34.1
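
The BLX (immediate) change above moves the PC alignment from the absolute value into the displacement. A minimal sketch of the arithmetic, assuming a simplified `Ctx` struct as a hypothetical stand-in for DisasContext and the illustrative names `blx_target_abs` and `blx_target_rel` (neither exists in QEMU): since the +4/+8 bias is a multiple of 4, subtracting `pc_curr & 3` from the displacement gives the same destination as masking the architectural PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the DisasContext fields used below. */
typedef struct {
    uint32_t pc_curr;  /* address of the insn being translated */
    bool thumb;        /* true when translating Thumb code */
} Ctx;

/* Absolute architectural PC, as read_pc() computes it. */
static uint32_t read_pc(const Ctx *s)
{
    return s->pc_curr + (s->thumb ? 4 : 8);
}

/* Displacement from pc_curr, as jmp_diff() computes it. */
static int32_t jmp_diff(const Ctx *s, int32_t diff)
{
    return diff + (s->thumb ? 4 : 8);
}

/* Old target: align the architectural PC, then add the immediate. */
static uint32_t blx_target_abs(const Ctx *s, int32_t imm)
{
    return (read_pc(s) & ~UINT32_C(3)) + (uint32_t)imm;
}

/* New target: fold the alignment into the displacement. */
static uint32_t blx_target_rel(const Ctx *s, int32_t imm)
{
    int32_t diff = jmp_diff(s, imm - (int32_t)(s->pc_curr & 3));
    uint32_t dest = s->pc_curr + (uint32_t)diff;

    /* The bias is a multiple of 4, so both forms must agree. */
    assert(dest == blx_target_abs(s, imm));
    return dest;
}
```

With a Thumb instruction at 0x1002 and an immediate of 16, both forms give 0x1014.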




[PATCH v7 4/9] target/arm: Change gen_exception_insn* to work on displacements

2022-10-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/translate.h|  5 +++--
 target/arm/translate-a64.c| 28 ++-
 target/arm/translate-m-nocp.c |  6 ++---
 target/arm/translate-mve.c|  2 +-
 target/arm/translate-vfp.c|  6 ++---
 target/arm/translate.c| 42 +--
 6 files changed, 43 insertions(+), 46 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index d651044855..4aa239e23c 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -281,9 +281,10 @@ void arm_jump_cc(DisasCompare *cmp, TCGLabel *label);
 void arm_gen_test_cc(int cc, TCGLabel *label);
 MemOp pow2_align(unsigned i);
 void unallocated_encoding(DisasContext *s);
-void gen_exception_insn_el(DisasContext *s, uint64_t pc, int excp,
+void gen_exception_insn_el(DisasContext *s, target_long pc_diff, int excp,
uint32_t syn, uint32_t target_el);
-void gen_exception_insn(DisasContext *s, uint64_t pc, int excp, uint32_t syn);
+void gen_exception_insn(DisasContext *s, target_long pc_diff,
+int excp, uint32_t syn);
 
 /* Return state of Alternate Half-precision flag, caller frees result */
 static inline TCGv_i32 get_ahp_flag(void)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 585d42d5b2..49380e1cfe 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1155,7 +1155,7 @@ static bool fp_access_check_only(DisasContext *s)
 assert(!s->fp_access_checked);
 s->fp_access_checked = true;
 
-gen_exception_insn_el(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn_el(s, 0, EXCP_UDEF,
   syn_fp_access_trap(1, 0xe, false, 0),
   s->fp_excp_el);
 return false;
@@ -1170,7 +1170,7 @@ static bool fp_access_check(DisasContext *s)
 return false;
 }
 if (s->sme_trap_nonstreaming && s->is_nonstreaming) {
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn(s, 0, EXCP_UDEF,
syn_smetrap(SME_ET_Streaming, false));
 return false;
 }
@@ -1190,7 +1190,7 @@ bool sve_access_check(DisasContext *s)
 goto fail_exit;
 }
 } else if (s->sve_excp_el) {
-gen_exception_insn_el(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn_el(s, 0, EXCP_UDEF,
   syn_sve_access_trap(), s->sve_excp_el);
 goto fail_exit;
 }
@@ -1212,7 +1212,7 @@ bool sve_access_check(DisasContext *s)
 static bool sme_access_check(DisasContext *s)
 {
 if (s->sme_excp_el) {
-gen_exception_insn_el(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn_el(s, 0, EXCP_UDEF,
   syn_smetrap(SME_ET_AccessTrap, false),
   s->sme_excp_el);
 return false;
@@ -1242,12 +1242,12 @@ bool sme_enabled_check_with_svcr(DisasContext *s, unsigned req)
 return false;
 }
 if (FIELD_EX64(req, SVCR, SM) && !s->pstate_sm) {
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn(s, 0, EXCP_UDEF,
syn_smetrap(SME_ET_NotStreaming, false));
 return false;
 }
 if (FIELD_EX64(req, SVCR, ZA) && !s->pstate_za) {
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
+gen_exception_insn(s, 0, EXCP_UDEF,
syn_smetrap(SME_ET_InactiveZA, false));
 return false;
 }
@@ -1907,7 +1907,7 @@ static void gen_sysreg_undef(DisasContext *s, bool isread,
 } else {
 syndrome = syn_uncategorized();
 }
-gen_exception_insn(s, s->pc_curr, EXCP_UDEF, syndrome);
+gen_exception_insn(s, 0, EXCP_UDEF, syndrome);
 }
 
 /* MRS - move from system register
@@ -2161,8 +2161,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 switch (op2_ll) {
 case 1: /* SVC */
 gen_ss_advance(s);
-gen_exception_insn(s, s->base.pc_next, EXCP_SWI,
-   syn_aa64_svc(imm16));
+gen_exception_insn(s, 4, EXCP_SWI, syn_aa64_svc(imm16));
 break;
 case 2: /* HVC */
 if (s->current_el == 0) {
@@ -2175,8 +2174,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 gen_a64_update_pc(s, 0);
 gen_helper_pre_hvc(cpu_env);
 gen_ss_advance(s);
-gen_exception_insn_el(s, s->base.pc_next, EXCP_HVC,
-  syn_aa64_hvc(imm16), 2);
+gen_exception_insn_el(s, 4, EXCP_HVC, syn_aa64_hvc(imm16), 2);
 break;
 case 3: /* SMC */
 if (s->current_el == 

[PATCH v7 2/9] target/arm: Change gen_goto_tb to work on displacements

2022-10-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on absolute values.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 40 --
 target/arm/translate.c | 10 ++
 2 files changed, 27 insertions(+), 23 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 5b67375f4e..6a372ed184 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -370,8 +370,10 @@ static inline bool use_goto_tb(DisasContext *s, uint64_t dest)
 return translator_use_goto_tb(&s->base, dest);
 }
 
-static inline void gen_goto_tb(DisasContext *s, int n, uint64_t dest)
+static void gen_goto_tb(DisasContext *s, int n, int64_t diff)
 {
+uint64_t dest = s->pc_curr + diff;
+
 if (use_goto_tb(s, dest)) {
 tcg_gen_goto_tb(n);
 gen_a64_set_pc_im(dest);
@@ -1354,7 +1356,7 @@ static inline AArch64DecodeFn *lookup_disas_fn(const AArch64DecodeTable *table,
  */
 static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
 {
-uint64_t addr = s->pc_curr + sextract32(insn, 0, 26) * 4;
+int64_t diff = sextract32(insn, 0, 26) * 4;
 
 if (insn & (1U << 31)) {
 /* BL Branch with link */
@@ -1363,7 +1365,7 @@ static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
 
 /* B Branch / BL Branch with link */
 reset_btype(s);
-gen_goto_tb(s, 0, addr);
+gen_goto_tb(s, 0, diff);
 }
 
 /* Compare and branch (immediate)
@@ -1375,14 +1377,14 @@ static void disas_uncond_b_imm(DisasContext *s, uint32_t insn)
 static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
 {
 unsigned int sf, op, rt;
-uint64_t addr;
+int64_t diff;
 TCGLabel *label_match;
 TCGv_i64 tcg_cmp;
 
 sf = extract32(insn, 31, 1);
 op = extract32(insn, 24, 1); /* 0: CBZ; 1: CBNZ */
 rt = extract32(insn, 0, 5);
-addr = s->pc_curr + sextract32(insn, 5, 19) * 4;
+diff = sextract32(insn, 5, 19) * 4;
 
 tcg_cmp = read_cpu_reg(s, rt, sf);
 label_match = gen_new_label();
@@ -1391,9 +1393,9 @@ static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
 tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
 tcg_cmp, 0, label_match);
 
-gen_goto_tb(s, 0, s->base.pc_next);
+gen_goto_tb(s, 0, 4);
 gen_set_label(label_match);
-gen_goto_tb(s, 1, addr);
+gen_goto_tb(s, 1, diff);
 }
 
 /* Test and branch (immediate)
@@ -1405,13 +1407,13 @@ static void disas_comp_b_imm(DisasContext *s, uint32_t insn)
 static void disas_test_b_imm(DisasContext *s, uint32_t insn)
 {
 unsigned int bit_pos, op, rt;
-uint64_t addr;
+int64_t diff;
 TCGLabel *label_match;
 TCGv_i64 tcg_cmp;
 
 bit_pos = (extract32(insn, 31, 1) << 5) | extract32(insn, 19, 5);
 op = extract32(insn, 24, 1); /* 0: TBZ; 1: TBNZ */
-addr = s->pc_curr + sextract32(insn, 5, 14) * 4;
+diff = sextract32(insn, 5, 14) * 4;
 rt = extract32(insn, 0, 5);
 
 tcg_cmp = tcg_temp_new_i64();
@@ -1422,9 +1424,9 @@ static void disas_test_b_imm(DisasContext *s, uint32_t insn)
 tcg_gen_brcondi_i64(op ? TCG_COND_NE : TCG_COND_EQ,
 tcg_cmp, 0, label_match);
 tcg_temp_free_i64(tcg_cmp);
-gen_goto_tb(s, 0, s->base.pc_next);
+gen_goto_tb(s, 0, 4);
 gen_set_label(label_match);
-gen_goto_tb(s, 1, addr);
+gen_goto_tb(s, 1, diff);
 }
 
 /* Conditional branch (immediate)
@@ -1436,13 +1438,13 @@ static void disas_test_b_imm(DisasContext *s, uint32_t insn)
 static void disas_cond_b_imm(DisasContext *s, uint32_t insn)
 {
 unsigned int cond;
-uint64_t addr;
+int64_t diff;
 
 if ((insn & (1 << 4)) || (insn & (1 << 24))) {
 unallocated_encoding(s);
 return;
 }
-addr = s->pc_curr + sextract32(insn, 5, 19) * 4;
+diff = sextract32(insn, 5, 19) * 4;
 cond = extract32(insn, 0, 4);
 
 reset_btype(s);
@@ -1450,12 +1452,12 @@ static void disas_cond_b_imm(DisasContext *s, uint32_t insn)
 /* genuinely conditional branches */
 TCGLabel *label_match = gen_new_label();
 arm_gen_test_cc(cond, label_match);
-gen_goto_tb(s, 0, s->base.pc_next);
+gen_goto_tb(s, 0, 4);
 gen_set_label(label_match);
-gen_goto_tb(s, 1, addr);
+gen_goto_tb(s, 1, diff);
 } else {
 /* 0xe and 0xf are both "always" conditions */
-gen_goto_tb(s, 0, addr);
+gen_goto_tb(s, 0, diff);
 }
 }
 
@@ -1629,7 +1631,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
  * any pending interrupts immediately.
  */
 reset_btype(s);
-gen_goto_tb(s, 0, s->base.pc_next);
+gen_goto_tb(s, 0, 4);
 return;
 
 case 7: /* SB */
@@ -1641,7 +1643,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
  * MB and end the TB instead.
  */
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
-gen_goto_tb(s, 0, s->base.pc_next);
+  

[PATCH v7 3/9] target/arm: Change gen_*set_pc_im to gen_*update_pc

2022-10-16 Thread Richard Henderson
In preparation for TARGET_TB_PCREL, reduce reliance on
absolute values by passing in pc difference.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a32.h |  2 +-
 target/arm/translate.h |  6 ++--
 target/arm/translate-a64.c | 32 +-
 target/arm/translate-vfp.c |  2 +-
 target/arm/translate.c | 68 --
 5 files changed, 56 insertions(+), 54 deletions(-)

diff --git a/target/arm/translate-a32.h b/target/arm/translate-a32.h
index 78a84c1414..5339c22f1e 100644
--- a/target/arm/translate-a32.h
+++ b/target/arm/translate-a32.h
@@ -40,7 +40,7 @@ void write_neon_element64(TCGv_i64 src, int reg, int ele, MemOp memop);
 TCGv_i32 add_reg_for_lit(DisasContext *s, int reg, int ofs);
 void gen_set_cpsr(TCGv_i32 var, uint32_t mask);
 void gen_set_condexec(DisasContext *s);
-void gen_set_pc_im(DisasContext *s, target_ulong val);
+void gen_update_pc(DisasContext *s, target_long diff);
 void gen_lookup_tb(DisasContext *s);
 long vfp_reg_offset(bool dp, unsigned reg);
 long neon_full_reg_offset(unsigned reg);
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 90bf7c57fc..d651044855 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -254,7 +254,7 @@ static inline int curr_insn_len(DisasContext *s)
  * For instructions which want an immediate exit to the main loop, as opposed
  * to attempting to use lookup_and_goto_ptr.  Unlike DISAS_UPDATE_EXIT, this
  * doesn't write the PC on exiting the translation loop so you need to ensure
- * something (gen_a64_set_pc_im or runtime helper) has done so before we reach
+ * something (gen_a64_update_pc or runtime helper) has done so before we reach
  * return from cpu_tb_exec.
  */
 #define DISAS_EXIT  DISAS_TARGET_9
@@ -263,14 +263,14 @@ static inline int curr_insn_len(DisasContext *s)
 
 #ifdef TARGET_AARCH64
 void a64_translate_init(void);
-void gen_a64_set_pc_im(uint64_t val);
+void gen_a64_update_pc(DisasContext *s, target_long diff);
 extern const TranslatorOps aarch64_translator_ops;
 #else
 static inline void a64_translate_init(void)
 {
 }
 
-static inline void gen_a64_set_pc_im(uint64_t val)
+static inline void gen_a64_update_pc(DisasContext *s, target_long diff)
 {
 }
 #endif
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 6a372ed184..585d42d5b2 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -140,9 +140,9 @@ static void reset_btype(DisasContext *s)
 }
 }
 
-void gen_a64_set_pc_im(uint64_t val)
+void gen_a64_update_pc(DisasContext *s, target_long diff)
 {
-tcg_gen_movi_i64(cpu_pc, val);
+tcg_gen_movi_i64(cpu_pc, s->pc_curr + diff);
 }
 
 /*
@@ -334,14 +334,14 @@ static void gen_exception_internal(int excp)
 
 static void gen_exception_internal_insn(DisasContext *s, uint64_t pc, int excp)
 {
-gen_a64_set_pc_im(pc);
+gen_a64_update_pc(s, pc - s->pc_curr);
 gen_exception_internal(excp);
 s->base.is_jmp = DISAS_NORETURN;
 }
 
 static void gen_exception_bkpt_insn(DisasContext *s, uint32_t syndrome)
 {
-gen_a64_set_pc_im(s->pc_curr);
+gen_a64_update_pc(s, 0);
 gen_helper_exception_bkpt_insn(cpu_env, tcg_constant_i32(syndrome));
 s->base.is_jmp = DISAS_NORETURN;
 }
@@ -376,11 +376,11 @@ static void gen_goto_tb(DisasContext *s, int n, int64_t diff)
 
 if (use_goto_tb(s, dest)) {
 tcg_gen_goto_tb(n);
-gen_a64_set_pc_im(dest);
+gen_a64_update_pc(s, diff);
 tcg_gen_exit_tb(s->base.tb, n);
 s->base.is_jmp = DISAS_NORETURN;
 } else {
-gen_a64_set_pc_im(dest);
+gen_a64_update_pc(s, diff);
 if (s->ss_active) {
 gen_step_complete_exception(s);
 } else {
@@ -1952,7 +1952,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
 uint32_t syndrome;
 
 syndrome = syn_aa64_sysregtrap(op0, op1, op2, crn, crm, rt, isread);
-gen_a64_set_pc_im(s->pc_curr);
+gen_a64_update_pc(s, 0);
 gen_helper_access_check_cp_reg(cpu_env,
tcg_constant_ptr(ri),
tcg_constant_i32(syndrome),
@@ -1962,7 +1962,7 @@ static void handle_sys(DisasContext *s, uint32_t insn, bool isread,
  * The readfn or writefn might raise an exception;
  * synchronize the CPU state in case it does.
  */
-gen_a64_set_pc_im(s->pc_curr);
+gen_a64_update_pc(s, 0);
 }
 
 /* Handle special cases first */
@@ -2172,7 +2172,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 /* The pre HVC helper handles cases when HVC gets trapped
  * as an undefined insn by runtime configuration.
  */
-gen_a64_set_pc_im(s->pc_curr);
+gen_a64_update_pc(s, 0);
 gen_helper_pre_hvc(cpu_env);
 gen_ss_advance(s);
 gen_exception_insn_el(s, s->base.pc_next, 

[PATCH v7 1/9] target/arm: Introduce curr_insn_len

2022-10-16 Thread Richard Henderson
A simple helper to retrieve the length of the current insn.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/arm/translate.h | 5 +
 target/arm/translate-vfp.c | 2 +-
 target/arm/translate.c | 5 ++---
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/target/arm/translate.h b/target/arm/translate.h
index af5d4a7086..90bf7c57fc 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -226,6 +226,11 @@ static inline void disas_set_insn_syndrome(DisasContext *s, uint32_t syn)
 s->insn_start = NULL;
 }
 
+static inline int curr_insn_len(DisasContext *s)
+{
+return s->base.pc_next - s->pc_curr;
+}
+
 /* is_jmp field values */
 #define DISAS_JUMP  DISAS_TARGET_0 /* only pc was modified dynamically */
 /* CPU state was modified dynamically; exit to main loop for interrupts. */
diff --git a/target/arm/translate-vfp.c b/target/arm/translate-vfp.c
index bd5ae27d09..94cc1e4b77 100644
--- a/target/arm/translate-vfp.c
+++ b/target/arm/translate-vfp.c
@@ -242,7 +242,7 @@ static bool vfp_access_check_a(DisasContext *s, bool ignore_vfp_enabled)
 if (s->sme_trap_nonstreaming) {
 gen_exception_insn(s, s->pc_curr, EXCP_UDEF,
syn_smetrap(SME_ET_Streaming,
-   s->base.pc_next - s->pc_curr == 2));
+   curr_insn_len(s) == 2));
 return false;
 }
 
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 2f72afe019..5752b7af5c 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -6650,7 +6650,7 @@ static ISSInfo make_issinfo(DisasContext *s, int rd, bool p, bool w)
 /* ISS not valid if writeback */
 if (p && !w) {
 ret = rd;
-if (s->base.pc_next - s->pc_curr == 2) {
+if (curr_insn_len(s) == 2) {
 ret |= ISSIs16Bit;
 }
 } else {
@@ -9812,8 +9812,7 @@ static void arm_tr_tb_stop(DisasContextBase *dcbase, CPUState *cpu)
 /* nothing more to generate */
 break;
 case DISAS_WFI:
-gen_helper_wfi(cpu_env,
-   tcg_constant_i32(dc->base.pc_next - dc->pc_curr));
+gen_helper_wfi(cpu_env, tcg_constant_i32(curr_insn_len(dc)));
 /*
  * The helper doesn't necessarily throw an exception, but we
  * must go back to the main loop to check for interrupts anyway.
-- 
2.34.1




[PATCH v7 0/9] target/arm: pc-relative translation blocks

2022-10-16 Thread Richard Henderson
These are the Arm-specific changes required to reduce the
amount of translation for address space randomization.

Changes for v7:
  * Remove read_pc in patch 8.
  * Add DisasLabel and inlines to generate/emit,
cleaning up the management of pc_save.
  * Restore pc_save after tcg_remove_ops_after.


r~


Richard Henderson (9):
  target/arm: Introduce curr_insn_len
  target/arm: Change gen_goto_tb to work on displacements
  target/arm: Change gen_*set_pc_im to gen_*update_pc
  target/arm: Change gen_exception_insn* to work on displacements
  target/arm: Remove gen_exception_internal_insn pc argument
  target/arm: Change gen_jmp* to work on displacements
  target/arm: Introduce gen_pc_plus_diff for aarch64
  target/arm: Introduce gen_pc_plus_diff for aarch32
  target/arm: Enable TARGET_TB_PCREL

 target/arm/cpu-param.h|   1 +
 target/arm/translate-a32.h|   2 +-
 target/arm/translate.h|  66 +++-
 target/arm/cpu.c  |  23 +--
 target/arm/translate-a64.c| 191 +--
 target/arm/translate-m-nocp.c |   8 +-
 target/arm/translate-mve.c|   2 +-
 target/arm/translate-vfp.c|  10 +-
 target/arm/translate.c| 276 --
 9 files changed, 351 insertions(+), 228 deletions(-)

-- 
2.34.1




Re: [PATCH v4 00/10] m25p80: Add SFDP support

2022-10-16 Thread Joel Stanley
On Thu, 13 Oct 2022 at 16:12, Cédric Le Goater  wrote:
>
> Hello,
>
> This patchset adds support for JEDEC STANDARD JESD216 Serial Flash
> Discovery Parameters (SFDP). SFDP describes the features of a serial
> flash device using a set of internal parameter tables. Support in
> Linux has been added some time ago and the spi-nor driver is using it
> more often to detect the flash settings and even flash models.

Reviewed-by: Joel Stanley 
Tested-by: Joel Stanley 

Thanks Cédric!



Re: [PULL 0/2] M68k for 7.2 patches

2022-10-16 Thread Stefan Hajnoczi
On Fri, 14 Oct 2022 at 03:26, Laurent Vivier  wrote:
>
> The following changes since commit f1d33f55c47dfdaf8daacd618588ad3ae4c452d1:
>
>   Merge tag 'pull-testing-gdbstub-plugins-gitdm-061022-3' of 
> https://github.com/stsquad/qemu into staging (2022-10-06 07:11:56 -0400)
>
> are available in the Git repository at:
>
>   https://github.com/vivier/qemu-m68k.git tags/m68k-for-7.2-pull-request
>
> for you to fetch changes up to fa327be58280f76d2565ff0bdb9b0010ac97c3b0:
>
>   m68k: write bootinfo as rom section and re-randomize on reboot (2022-10-11 
> 23:02:46 +0200)
>
> 
> Pull request m68k branch 20221014
>
> Update rng seed boot parameter
>
> 
>
> Jason A. Donenfeld (2):
>   m68k: rework BI_VIRT_RNG_SEED as BI_RNG_SEED
>   m68k: write bootinfo as rom section and re-randomize on reboot

This commit breaks mingw64 due to the Windows LLP64 data model where
pointers don't fit into unsigned long
(https://en.wikipedia.org/wiki/LP64#64-bit_data_models). Please use
uintptr_t instead of unsigned long:

x86_64-w64-mingw32-gcc -m64 -mcx16 -Ilibqemu-m68k-softmmu.fa.p -I.
-I.. -Itarget/m68k -I../target/m68k -Iqapi -Itrace -Iui -Iui/shader
-I/usr/x86_64-w64-mingw32/sys-root/mingw/include/pixman-1
-I/usr/x86_64-w64-mingw32/sys-root/mingw/include/glib-2.0
-I/usr/x86_64-w64-mingw32/sys-root/mingw/lib/glib-2.0/include
-fdiagnostics-color=auto -Wall -Winvalid-pch -Werror -std=gnu11 -O2 -g
-iquote . -iquote /builds/qemu-project/qemu -iquote
/builds/qemu-project/qemu/include -iquote
/builds/qemu-project/qemu/tcg/i386 -mms-bitfields -U_FORTIFY_SOURCE
-D_FORTIFY_SOURCE=2 -fno-pie -no-pie -D_GNU_SOURCE
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes
-Wredundant-decls -Wundef -Wwrite-strings -Wmissing-prototypes
-fno-strict-aliasing -fno-common -fwrapv -Wold-style-declaration
-Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k
-Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs
-Wendif-labels -Wexpansion-to-defined -Wimplicit-fallthrough=2
-Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi
-fstack-protector-strong -DNEED_CPU_H
'-DCONFIG_TARGET="m68k-softmmu-config-target.h"'
'-DCONFIG_DEVICES="m68k-softmmu-config-devices.h"' -MD -MQ
libqemu-m68k-softmmu.fa.p/hw_m68k_virt.c.obj -MF
libqemu-m68k-softmmu.fa.p/hw_m68k_virt.c.obj.d -o
libqemu-m68k-softmmu.fa.p/hw_m68k_virt.c.obj -c ../hw/m68k/virt.c
In file included from ../hw/m68k/virt.c:23:
../hw/m68k/virt.c: In function 'virt_init':
../hw/m68k/bootinfo.h:58:26: error: cast from pointer to integer of
different size [-Werror=pointer-to-int-cast]
58 | base = (void *)(((unsigned long)base + 3) & ~3); \
| ^
../hw/m68k/virt.c:261:13: note: in expansion of macro 'BOOTINFOSTR'
261 | BOOTINFOSTR(param_ptr, BI_COMMAND_LINE,
| ^~~
../hw/m68k/bootinfo.h:58:16: error: cast to pointer from integer of
different size [-Werror=int-to-pointer-cast]
58 | base = (void *)(((unsigned long)base + 3) & ~3); \
| ^
../hw/m68k/virt.c:261:13: note: in expansion of macro 'BOOTINFOSTR'
261 | BOOTINFOSTR(param_ptr, BI_COMMAND_LINE,
| ^~~
../hw/m68k/bootinfo.h:75:26: error: cast from pointer to integer of
different size [-Werror=pointer-to-int-cast]
75 | base = (void *)(((unsigned long)base + 3) & ~3); \
| ^
../hw/m68k/virt.c:268:9: note: in expansion of macro 'BOOTINFODATA'
268 | BOOTINFODATA(param_ptr, BI_RNG_SEED,
| ^~~~
../hw/m68k/bootinfo.h:75:16: error: cast to pointer from integer of
different size [-Werror=int-to-pointer-cast]
75 | base = (void *)(((unsigned long)base + 3) & ~3); \
| ^
../hw/m68k/virt.c:268:9: note: in expansion of macro 'BOOTINFODATA'
268 | BOOTINFODATA(param_ptr, BI_RNG_SEED,
| ^~~~
cc1: all warnings being treated as errors

https://gitlab.com/qemu-project/qemu/-/jobs/3179717070
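
A minimal sketch of the suggested fix, assuming a helper named `align_up` that is purely illustrative (the real code lives in the BOOTINFOSTR and BOOTINFODATA macros in hw/m68k/bootinfo.h): casting through `uintptr_t` keeps the full pointer width on LLP64 hosts, where `unsigned long` is only 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a pointer up to the next multiple of 'align' (a power of two).
 * uintptr_t can hold a pointer value on both LP64 and LLP64 hosts,
 * so the round trip through an integer does not truncate.
 */
static void *align_up(void *base, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)(((uintptr_t)base + (align - 1)) & ~(align - 1));
}
```

The macros could then compute `base = align_up(base, 4)`, or simply substitute `uintptr_t` for `unsigned long` in the existing cast.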

>
>  hw/m68k/bootinfo.h| 48 ++--
>  .../standard-headers/asm-m68k/bootinfo-virt.h |  4 +-
>  include/standard-headers/asm-m68k/bootinfo.h  |  8 +-
>  hw/m68k/q800.c| 76 ++-
>  hw/m68k/virt.c| 57 +-
>  5 files changed, 130 insertions(+), 63 deletions(-)
>
> --
> 2.37.3
>
>



Re: [PATCH v4] tcg/loongarch64: Add direct jump support

2022-10-16 Thread Richard Henderson

On 10/15/22 19:27, Qi Hu wrote:

Similar to ARM64, LoongArch has PC-relative instructions such as
PCADDU18I. These instructions can be used to support direct jumps on
LoongArch. Additionally, if a "B offset" instruction can reach the target
address (the target is within the ±128MB range), a single "B offset" plus
a nop will be used by "tb_target_set_jump_target".

Cc: Richard Henderson
Signed-off-by: Qi Hu
---
Changes since v3:
- Fix the offset check error pointed out by WANG Xuerui.
- Use TMP0 instead of T0.
- Remove useless block due to direct jump support.
- Add some assertions.
---
  tcg/loongarch64/tcg-target.c.inc | 48 +---
  tcg/loongarch64/tcg-target.h |  9 --
  2 files changed, 50 insertions(+), 7 deletions(-)


Reviewed-by: Richard Henderson 

r~
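
The ±128MB reach mentioned in the commit message follows from the LoongArch "B" encoding: a signed 26-bit immediate scaled by 4. A minimal sketch of such a range check, assuming an illustrative helper named `offset_fits_b` (this is not the code from the patch, which is not shown in this thread):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a byte offset can be encoded by a single "B offset" insn,
 * assuming a signed 26-bit immediate scaled by 4, i.e. a byte range
 * of [-2^27, 2^27 - 4].
 */
static bool offset_fits_b(int64_t offset)
{
    /* Instructions are 4-byte aligned, so the low two bits are zero. */
    assert((offset & 3) == 0);
    return offset >= -(INT64_C(1) << 27) && offset < (INT64_C(1) << 27);
}
```

When the check fails, the commit message indicates that a PC-relative sequence built around PCADDU18I is used instead.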



Re: [PATCH v1 06/12] xen-hvm: move common functions to hw/xen/xen-hvm-common.c

2022-10-16 Thread Julien Grall

Hi Vikram,

On 15/10/2022 06:07, Vikram Garhwal wrote:

+void xen_register_ioreq(XenIOState *state, unsigned int max_cpus,
+MemoryListener xen_memory_listener)
+{



[...]


+
+xen_bus_init();
+
+/* Initialize backend core & drivers */
+if (xen_be_init() != 0) {
+error_report("xen backend core setup failed");
+goto err;
+}
+xen_be_register_common();


Calling xen_be_init() and xen_be_register_common() from 
xen_register_ioreq() sounds wrong to me. There is no dependency between 
the two. I think it would be better to create a new function to register 
backends.


Cheers,

--
Julien Grall



Re: [PATCH v1 07/12] include/hw/xen/xen_common: return error from xen_create_ioreq_server

2022-10-16 Thread Julien Grall

Hi Vikram,

On 15/10/2022 06:07, Vikram Garhwal wrote:

From: Stefano Stabellini 

This is done to prepare for enabling xenpv support for ARM architecture.
On ARM it is possible to have a functioning xenpv machine with only the
PV backends and no IOREQ server. If the IOREQ server creation fails,
continue to the PV backends initialization.

Signed-off-by: Stefano Stabellini 
---
  include/hw/xen/xen_common.h | 12 +++-
  1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/include/hw/xen/xen_common.h b/include/hw/xen/xen_common.h
index 77ce17d8a4..c2d2f36bde 100644
--- a/include/hw/xen/xen_common.h
+++ b/include/hw/xen/xen_common.h
@@ -467,8 +467,8 @@ static inline void xen_unmap_pcidev(domid_t dom,
  {
  }
  
-static inline void xen_create_ioreq_server(domid_t dom,

-   ioservid_t *ioservid)
+static inline int xen_create_ioreq_server(domid_t dom,
+  ioservid_t *ioservid)
  {


I think there is a return missing here.
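
A minimal sketch of what the stub might look like once it returns a value. The typedefs are hypothetical stand-ins so the fragment compiles on its own, and returning 0 is an assumption about the intended behaviour, not something stated in the patch:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Xen types used by the real header. */
typedef uint16_t domid_t;
typedef uint16_t ioservid_t;

static inline int xen_create_ioreq_server(domid_t dom,
                                          ioservid_t *ioservid)
{
    (void)dom;
    (void)ioservid;
    /* Assumption: the stub has nothing to create, so report success. */
    return 0;
}
```

Without a return statement, a caller that reads the result of a non-void function invokes undefined behaviour.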


  }
  
@@ -600,8 +600,8 @@ static inline void xen_unmap_pcidev(domid_t dom,

PCI_FUNC(pci_dev->devfn));
  }
  
-static inline void xen_create_ioreq_server(domid_t dom,

-   ioservid_t *ioservid)
+static inline int xen_create_ioreq_server(domid_t dom,
+  ioservid_t *ioservid)
  {
  int rc = xendevicemodel_create_ioreq_server(xen_dmod, dom,
  HVM_IOREQSRV_BUFIOREQ_ATOMIC,
@@ -609,12 +609,14 @@ static inline void xen_create_ioreq_server(domid_t dom,
  
  if (rc == 0) {

  trace_xen_ioreq_server_create(*ioservid);
-return;
+return rc;
  }
  
  *ioservid = 0;

  use_default_ioreq_server = true;
  trace_xen_default_ioreq_server();
+
+return rc;
  }
  
  static inline void xen_destroy_ioreq_server(domid_t dom,


Cheers,

--
Julien Grall



Re: [PATCH v1 10/12] hw/arm: introduce xenpv machine

2022-10-16 Thread Julien Grall

Hi,

There seem to be some missing patches on xen-devel (including the cover 
letter). Is that expected?


On 15/10/2022 06:07, Vikram Garhwal wrote:

Add a new machine xenpv which creates an IOREQ server to register/connect with
the Xen hypervisor.


I don't like the name 'xenpv' because it doesn't convey the fact that 
some of the HW may be emulated rather than para-virtualized. In fact one 
may only want to use it for emulating devices.


A potential name would be 'xen-arm', or re-using 'virt' but with 
'accel=xen' to select a Xen layout.




The Xen IOREQ connection expects TARGET_PAGE_SIZE to be 4096, and the xenpv
machine on ARM will have no CPU definitions. We need to define TARGET_PAGE_SIZE
appropriately ourselves.

Optional: When CONFIG_TPM is enabled, it also creates a tpm-tis-device, adds a
TPM emulator, connects to swtpm running on the host machine via a chardev socket
and supports TPM functionality for a guest domain.

Extra command line for aarch64 xenpv QEMU to connect to swtpm:
 -chardev socket,id=chrtpm,path=/tmp/myvtpm2/swtpm-sock \
 -tpmdev emulator,id=tpm0,chardev=chrtpm \

swtpm implements a TPM software emulator (TPM 1.2 & TPM 2) built on libtpms and
provides access to TPM functionality over socket, chardev and CUSE interfaces.
Github repo: https://github.com/stefanberger/swtpm
Example for starting swtpm on host machine:
 mkdir /tmp/vtpm2
 swtpm socket --tpmstate dir=/tmp/vtpm2 \
 --ctrl type=unixio,path=/tmp/vtpm2/swtpm-sock &


I see patches for QEMU but not Xen. How can this be tested with existing 
Xen? Will libxl ever create QEMU?


[...]


+static int xen_init_ioreq(XenIOState *state, unsigned int max_cpus)
+{
+xen_dmod = xendevicemodel_open(0, 0);
+xen_xc = xc_interface_open(0, 0, 0);
+
+if (xen_xc == NULL) {


You are checking xen_xc but not xen_dmod. Why?


+perror("xen: can't open xen interface\n");
+return -1;
+}
+
+xen_fmem = xenforeignmemory_open(0, 0);
+if (xen_fmem == NULL) {
+perror("xen: can't open xen fmem interface\n");
+xc_interface_close(xen_xc);
+return -1;
+}
+
+xen_register_ioreq(state, max_cpus, xen_memory_listener);
+
+xenstore_record_dm_state(xenstore, "running");
+
+return 0;
+}
+
+static void xen_enable_tpm(void)
+{
+/* qemu_find_tpm_be is only available when CONFIG_TPM is enabled. */
+#ifdef CONFIG_TPM
+Error *errp = NULL;
+DeviceState *dev;
+SysBusDevice *busdev;
+
+TPMBackend *be = qemu_find_tpm_be("tpm0");
+if (be == NULL) {
+DPRINTF("Couldn't fine the backend for tpm0\n");
+return;
+}
+dev = qdev_new(TYPE_TPM_TIS_SYSBUS);
+object_property_set_link(OBJECT(dev), "tpmdev", OBJECT(be), &errp);
+object_property_set_str(OBJECT(dev), "tpmdev", be->id, &errp);
+busdev = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(busdev, &error_fatal);
+sysbus_mmio_map(busdev, 0, GUEST_TPM_BASE);


I can't find where GUEST_TPM_BASE is defined. But then the guest memory 
layout is not expected to be stable. With your current approach, it 
means QEMU would need to be rebuilt for every Xen version. Is it what we 
want?



+
+DPRINTF("Connected tpmdev at address 0x%lx\n", GUEST_TPM_BASE);
+#endif
+}
+
+static void xen_arm_init(MachineState *machine)
+{
+XenArmState *xam = XEN_ARM(machine);
+
+xam->state =  g_new0(XenIOState, 1);
+
+if (xen_init_ioreq(xam->state, machine->smp.cpus)) {
+return;


In another patch, you said the IOREQ would be optional. IMHO, it is a 
bad idea to register it by default because one may only want 
to use PV drivers. Registering IOREQ will add unnecessary overhead in Xen.


Furthermore, it means that someone selecting TPM when Xen is not built 
with CONFIG_IOREQ=y (BTW this is still a tech preview but there are 
security holes on Arm...) will not get an error. Instead, the OS will 
run until it crashes when trying to access the TPM.


Overall I think it would be better if IOREQ is only registered when a 
device (like TPM) requires it *and* an error is thrown if there is a problem 
during the initialization.
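
A rough sketch of that flow, to make the suggestion concrete. Every name here (struct example_machine, example_machine_setup(), the ioreq_init_fn hook) is hypothetical and stands in for the real QEMU/Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical machine configuration; not a QEMU structure. */
struct example_machine {
    bool wants_tpm;    /* an emulated device that needs IOREQ */
    bool ioreq_ready;  /* set once the IOREQ server is registered */
};

/* Hypothetical hook standing in for the real IOREQ registration. */
typedef int (*ioreq_init_fn)(void);

/*
 * Register the IOREQ server only when an emulated device needs it, and
 * report failure instead of silently continuing.
 * Returns 0 on success, -1 on error.
 */
static int example_machine_setup(struct example_machine *m, ioreq_init_fn init)
{
    assert(m != NULL && init != NULL);

    if (!m->wants_tpm) {
        return 0; /* PV-only: no IOREQ overhead in Xen */
    }
    if (init() != 0) {
        fprintf(stderr, "IOREQ server required by TPM but unavailable\n");
        return -1;
    }
    m->ioreq_ready = true;
    return 0;
}

/* Two fake registration hooks for trying the sketch out. */
static int example_ioreq_ok(void) { return 0; }
static int example_ioreq_fail(void) { return -1; }
```

With this shape, a PV-only configuration never touches IOREQ, and a TPM configuration fails early rather than letting the guest crash later.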



+}
+
+xen_enable_tpm();
+
+return;
+}
+
+static void xen_arm_machine_class_init(ObjectClass *oc, void *data)
+{
+
+MachineClass *mc = MACHINE_CLASS(oc);
+mc->desc = "Xen Para-virtualized PC";
+mc->init = xen_arm_init;
+mc->max_cpus = 1;
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);


Shouldn't this be protected with #ifdef CONFIG_TPM?


+}
+
+static const TypeInfo xen_arm_machine_type = {
+.name = TYPE_XEN_ARM,
+.parent = TYPE_MACHINE,
+.class_init = xen_arm_machine_class_init,
+.instance_size = sizeof(XenArmState),
+};
+
+static void xen_arm_machine_register_types(void)
+{
+type_register_static(&xen_arm_machine_type);
+}
+
+type_init(xen_arm_machine_register_types)
diff --git a/include/hw/arm/xen_arch_hvm.h b/include/hw/arm/xen_arch_hvm.h
new file mode 100644
index 00..f645dfec28
--- /dev/null
+++ 

Re: [PATCH] docs/devel: remove incorrect claim about git send-email

2022-10-16 Thread Alyssa Ross
Linus Heckemann  writes:

> Alyssa Ross  writes:
>
>> Alyssa Ross  writes:
>>
>>> Linus Heckemann  writes:
>>>
>>>> While it's unclear to me what git send-email actually does with the
>>>> -v2 parameter (it is not documented, but also not rejected), it does
>>>> not add a v2 tag to the email's subject, which is what led to the
>>>> mishap in [1].
>>>>
>>>> [1]: https://lists.nongnu.org/archive/html/qemu-devel/2022-09/msg00679.html
>>>
>>> It does for me!
>>>
>>> Tested with:
>>>
>>>git send-email -v2 --to h...@alyssa.is HEAD~
>>>
>>> X-Mailer: git-send-email 2.37.1
>>
>> I wouldn't be surprised if it only adds it when it's generating the
>> patch though.  Did you perhaps run git format-patch first to generate a
>> patch file, and then use git send-email to send it?
>
> Yes! I didn't realise that git send-email can be used without the
> intermediate format-patch step. I guess it's a git bug that git
> send-email will silently ignore -v when used with a patch file. I'll
> have a look at fixing that.

Yeah, that sounds like the best way to go.  I think it'll swallow /any/
format-patch options when used that way.  Would be nice if it warned.


signature.asc
Description: PGP signature


[PATCH v4 2/3] block: introduce zone append write for zoned devices

2022-10-16 Thread Sam Li
A zone append command is a write operation that specifies the first
logical block of a zone as the write position. When writing to a zoned
block device using zone append, the byte offset of the write points
to the write pointer of that zone. Upon completion the device will
respond with the position the data has been written in the zone.
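
The semantics described above can be sketched with a small in-memory model. This is not the QEMU implementation; the struct, the zone size and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NR_ZONES  4
#define EX_ZONE_SIZE 1024   /* bytes per zone, illustrative value */

/* Hypothetical in-memory model of a zoned device; not QEMU code. */
struct ex_zoned_dev {
    int64_t wp[EX_NR_ZONES];   /* write pointer of each zone, in bytes */
};

static void ex_dev_init(struct ex_zoned_dev *d)
{
    assert(d != NULL);
    for (int i = 0; i < EX_NR_ZONES; i++) {
        d->wp[i] = (int64_t)i * EX_ZONE_SIZE;   /* every zone starts empty */
    }
}

/*
 * Zone append: on entry *offset names the first byte of the target zone;
 * on success it holds the position the data was actually written to.
 * Returns 0 on success, -1 if the request is invalid or does not fit.
 */
static int ex_zone_append(struct ex_zoned_dev *d, int64_t *offset, int64_t len)
{
    int64_t idx, zone_end;

    assert(d != NULL && offset != NULL);
    if (*offset < 0 || len < 0 || *offset % EX_ZONE_SIZE != 0) {
        return -1;
    }
    idx = *offset / EX_ZONE_SIZE;
    if (idx >= EX_NR_ZONES) {
        return -1;
    }
    zone_end = (idx + 1) * EX_ZONE_SIZE;
    if (len > zone_end - d->wp[idx]) {
        return -1;              /* would cross the end of the zone */
    }
    *offset = d->wp[idx];       /* report where the data landed */
    d->wp[idx] += len;          /* advance the write pointer */
    return 0;
}
```

The caller never chooses the exact write position; it only names the zone and reads the resulting position back from *offset.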

Signed-off-by: Sam Li 
---
 block/block-backend.c | 65 ++
 block/file-posix.c| 89 +--
 block/io.c| 21 
 block/raw-format.c|  8 +++
 include/block/block-io.h  |  3 ++
 include/block/block_int-common.h  |  5 ++
 include/block/raw-aio.h   |  4 +-
 include/sysemu/block-backend-io.h |  9 
 8 files changed, 198 insertions(+), 6 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 1c618e9c68..06931ddd24 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1439,6 +1439,9 @@ typedef struct BlkRwCo {
 struct {
 unsigned long op;
 } zone_mgmt;
+struct {
+int64_t *append_sector;
+} zone_append;
 };
 } BlkRwCo;
 
@@ -1871,6 +1874,47 @@ BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, 
BlockZoneOp op,
 return &acb->common;
 }
 
+static void coroutine_fn blk_aio_zone_append_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_append(rwco->blk, rwco->zone_append.append_sector,
+   rwco->iobuf, rwco->flags);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_append(BlockBackend *blk, int64_t *offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags,
+BlockCompletionFunc *cb, void *opaque) {
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.ret= NOT_DONE,
+.flags  = flags,
+.iobuf  = qiov,
+.zone_append = {
+.append_sector = offset,
+},
+};
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_append_entry, acb);
+bdrv_coroutine_enter(blk_bs(blk), co);
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
 /*
  * Send a zone_report command.
  * offset is a byte offset from the start of the device. No alignment
@@ -1923,6 +1967,27 @@ int coroutine_fn blk_co_zone_mgmt(BlockBackend *blk, 
BlockZoneOp op,
 return ret;
 }
 
+/*
+ * Send a zone_append command.
+ */
+int coroutine_fn blk_co_zone_append(BlockBackend *blk, int64_t *offset,
+QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+blk_wait_while_drained(blk);
+if (!blk_is_available(blk)) {
+blk_dec_in_flight(blk);
+return -ENOMEDIUM;
+}
+
+ret = bdrv_co_zone_append(blk_bs(blk), offset, qiov, flags);
+blk_dec_in_flight(blk);
+return ret;
+}
+
 void blk_drain(BlockBackend *blk)
 {
 BlockDriverState *bs = blk_bs(blk);
diff --git a/block/file-posix.c b/block/file-posix.c
index 5ff5500301..3d0cc33d02 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -205,6 +205,7 @@ typedef struct RawPosixAIOData {
 struct {
 struct iovec *iov;
 int niov;
+int64_t *offset;
 } io;
 struct {
 uint64_t cmd;
@@ -1475,6 +1476,11 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 bs->bl.max_active_zones = ret;
 }
 
+ret = get_sysfs_long_val(&st, "physical_block_size");
+if (ret >= 0) {
+bs->bl.write_granularity = ret;
+}
+
 bs->bl.wps = g_malloc(sizeof(BlockZoneWps) + sizeof(int64_t) * ret);
 if (get_zones_wp(s->fd, bs->bl.wps, 0, ret) < 0) {
 error_report("report wps failed");
@@ -1647,9 +1653,18 @@ qemu_pwritev(int fd, const struct iovec *iov, int 
nr_iov, off_t offset)
 static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
 {
 ssize_t len;
+BlockZoneWps *wps = aiocb->bs->bl.wps;
+int index = aiocb->aio_offset / aiocb->bs->bl.zone_size;
+
+if (wps) {
+qemu_mutex_lock(&wps->lock);
+if (aiocb->aio_type & QEMU_AIO_ZONE_APPEND) {
+aiocb->aio_offset = wps->wp[index];
+}
+}
 
 do {
-if (aiocb->aio_type & QEMU_AIO_WRITE)
+if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND))
 len = qemu_pwritev(aiocb->aio_fildes,
aiocb->io.iov,
aiocb->io.niov,
@@ -1660,6 +1675,9 

[RFC v3 2/2] virtio-blk: add zoned storage emulation for zoned devices

2022-10-16 Thread Sam Li
This patch extends virtio-blk emulation to handle zoned device commands
by calling the new block layer APIs to perform zoned device I/O on
behalf of the guest. It supports Report Zone, four zone operations (open,
close, finish, reset), and Zone Append.

The VIRTIO_BLK_F_ZONED feature bit is only set if the host supports
zoned block devices. It is not set for regular block devices
(conventional zones).

The guest OS can then use blkzone(8) to test those commands on zoned devices.
Furthermore, using zonefs to test zone append write is also supported.
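
As a rough illustration of the feature-bit rule above: the bit number 17 comes from the header update in this series, while ex_has_feature() and ex_offer_features() are hypothetical helpers, not QEMU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_F_ZONED 17   /* value from the header update in this series */

/* Test a single feature bit in a 64-bit feature word. */
static bool ex_has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features & (1ULL << bit)) != 0;
}

/*
 * Hypothetical helper: offer VIRTIO_BLK_F_ZONED only when the backing
 * device is zoned, and clear it for regular block devices.
 */
static uint64_t ex_offer_features(uint64_t features, bool host_is_zoned)
{
    if (host_is_zoned) {
        features |= 1ULL << VIRTIO_BLK_F_ZONED;
    } else {
        features &= ~(1ULL << VIRTIO_BLK_F_ZONED);
    }
    return features;
}
```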

Signed-off-by: Sam Li 
---
 hw/block/virtio-blk-common.c   |   2 +
 hw/block/virtio-blk.c  | 412 -
 include/hw/virtio/virtio-blk.h |  11 +-
 3 files changed, 422 insertions(+), 3 deletions(-)

diff --git a/hw/block/virtio-blk-common.c b/hw/block/virtio-blk-common.c
index ac52d7c176..e2f8e2f6da 100644
--- a/hw/block/virtio-blk-common.c
+++ b/hw/block/virtio-blk-common.c
@@ -29,6 +29,8 @@ static const VirtIOFeature feature_sizes[] = {
  .end = endof(struct virtio_blk_config, discard_sector_alignment)},
 {.flags = 1ULL << VIRTIO_BLK_F_WRITE_ZEROES,
  .end = endof(struct virtio_blk_config, write_zeroes_may_unmap)},
+{.flags = 1ULL << VIRTIO_BLK_F_ZONED,
+ .end = endof(struct virtio_blk_config, zoned)},
 {}
 };
 
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 8131ec2dbc..58891aea31 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -26,6 +26,9 @@
 #include "hw/virtio/virtio-blk.h"
 #include "dataplane/virtio-blk.h"
 #include "scsi/constants.h"
+#if defined(CONFIG_BLKZONED)
+#include <linux/blkzoned.h>
+#endif
 #ifdef __linux__
 # include <scsi/sg.h>
 #endif
@@ -55,10 +58,29 @@ static void virtio_blk_req_complete(VirtIOBlockReq *req, 
unsigned char status)
 {
 VirtIOBlock *s = req->dev;
 VirtIODevice *vdev = VIRTIO_DEVICE(s);
+int64_t inhdr_len, n;
+void *buf;
 
 trace_virtio_blk_req_complete(vdev, req, status);
 
-stb_p(&req->in->status, status);
+iov_discard_undo(&req->inhdr_undo);
+if (virtio_ldl_p(vdev, &req->out.type) == VIRTIO_BLK_T_ZONE_APPEND) {
+inhdr_len = sizeof(struct virtio_blk_zone_append_inhdr);
+req->in.in_hdr->status = status;
+buf = req->in.in_hdr;
+} else {
+inhdr_len = sizeof(struct virtio_blk_inhdr);
+req->in.zone_append_inhdr->status = status;
+buf = req->in.zone_append_inhdr;
+}
+
+n = iov_from_buf(req->elem.in_sg, req->elem.in_num,
+ req->in_len - inhdr_len, buf, inhdr_len);
+if (n != inhdr_len) {
+virtio_error(vdev, "Driver provided input buffer less than size of "
+ "in header");
+}
+
 iov_discard_undo(&req->inhdr_undo);
 iov_discard_undo(&req->outhdr_undo);
 virtqueue_push(req->vq, &req->elem, req->in_len);
@@ -592,6 +614,334 @@ err:
 return err_status;
 }
 
+typedef struct ZoneCmdData {
+VirtIOBlockReq *req;
+union {
+struct {
+unsigned int nr_zones;
+BlockZoneDescriptor *zones;
+} zone_report_data;
+struct {
+int64_t offset;
+} zone_append_data;
+};
+} ZoneCmdData;
+
+/*
+ * check zoned_request: error checking before issuing requests. If all checks
+ * passed, return true.
+ * append: true if only zone append requests issued.
+ */
+static bool check_zoned_request(VirtIOBlock *s, int64_t offset, int64_t len,
+ bool append, uint8_t *status) {
+BlockDriverState *bs = blk_bs(s->blk);
+int index = offset / bs->bl.zone_size;
+
+if (offset < 0 || len < 0 || offset > bs->bl.capacity - len) {
+*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+return false;
+}
+
+if (!virtio_has_feature(s->host_features, VIRTIO_BLK_F_ZONED)) {
+*status = VIRTIO_BLK_S_UNSUPP;
+return false;
+}
+
+if (append) {
+if ((offset % bs->bl.write_granularity) != 0) {
+*status = VIRTIO_BLK_S_ZONE_UNALIGNED_WP;
+return false;
+}
+
+if (BDRV_ZT_IS_CONV(bs->bl.wps->wp[index])) {
+*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+return false;
+}
+
+if (len / 512 > bs->bl.max_append_sectors) {
+if (bs->bl.max_append_sectors == 0) {
+*status = VIRTIO_BLK_S_UNSUPP;
+} else {
+*status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+}
+return false;
+}
+}
+return true;
+}
+
+static void virtio_blk_zone_report_complete(void *opaque, int ret)
+{
+ZoneCmdData *data = opaque;
+VirtIOBlockReq *req = data->req;
+VirtIOBlock *s = req->dev;
+VirtIODevice *vdev = VIRTIO_DEVICE(req->dev);
+struct iovec *in_iov = req->elem.in_sg;
+unsigned in_num = req->elem.in_num;
+int64_t zrp_size, nz, n, j = 0;
+int8_t err_status = VIRTIO_BLK_S_OK;
+
+if (ret) {
+err_status = VIRTIO_BLK_S_ZONE_INVALID_CMD;
+goto out;
+}
+
+nz = 

[RFC v3 1/2] include: update virtio_blk headers from Linux 5.19-rc2+

2022-10-16 Thread Sam Li
Use scripts/update-linux-headers.sh to update the virtio-blk headers
from Dmitry's "virtio-blk:add support for zoned block devices"
Linux patch. More information is available at:
https://github.com/dmitry-fomichev/virtblk-zbd

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Sam Li 
---
 include/standard-headers/linux/virtio_blk.h | 109 
 1 file changed, 109 insertions(+)

diff --git a/include/standard-headers/linux/virtio_blk.h 
b/include/standard-headers/linux/virtio_blk.h
index 2dcc90826a..490bd21c76 100644
--- a/include/standard-headers/linux/virtio_blk.h
+++ b/include/standard-headers/linux/virtio_blk.h
@@ -40,6 +40,7 @@
 #define VIRTIO_BLK_F_MQ12  /* support more than one vq */
 #define VIRTIO_BLK_F_DISCARD   13  /* DISCARD is supported */
 #define VIRTIO_BLK_F_WRITE_ZEROES  14  /* WRITE ZEROES is supported */
+#define VIRTIO_BLK_F_ZONED 17  /* Zoned block device */
 
 /* Legacy feature bits */
 #ifndef VIRTIO_BLK_NO_LEGACY
@@ -119,6 +120,20 @@ struct virtio_blk_config {
uint8_t write_zeroes_may_unmap;
 
uint8_t unused1[3];
+
+   /* Secure erase fields that are defined in the virtio spec */
+   uint8_t sec_erase[12];
+
+   /* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
+   struct virtio_blk_zoned_characteristics {
+   __virtio32 zone_sectors;
+   __virtio32 max_open_zones;
+   __virtio32 max_active_zones;
+   __virtio32 max_append_sectors;
+   __virtio32 write_granularity;
+   uint8_t model;
+   uint8_t unused2[3];
+   } zoned;
 } QEMU_PACKED;
 
 /*
@@ -153,6 +168,27 @@ struct virtio_blk_config {
 /* Write zeroes command */
 #define VIRTIO_BLK_T_WRITE_ZEROES  13
 
+/* Zone append command */
+#define VIRTIO_BLK_T_ZONE_APPEND15
+
+/* Report zones command */
+#define VIRTIO_BLK_T_ZONE_REPORT16
+
+/* Open zone command */
+#define VIRTIO_BLK_T_ZONE_OPEN  18
+
+/* Close zone command */
+#define VIRTIO_BLK_T_ZONE_CLOSE 20
+
+/* Finish zone command */
+#define VIRTIO_BLK_T_ZONE_FINISH22
+
+/* Reset zone command */
+#define VIRTIO_BLK_T_ZONE_RESET 24
+
+/* Reset All zones command */
+#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
+
 #ifndef VIRTIO_BLK_NO_LEGACY
 /* Barrier before this op. */
 #define VIRTIO_BLK_T_BARRIER   0x8000
@@ -172,6 +208,72 @@ struct virtio_blk_outhdr {
__virtio64 sector;
 };
 
+/*
+ * Supported zoned device models.
+ */
+
+/* Regular block device */
+#define VIRTIO_BLK_Z_NONE  0
+/* Host-managed zoned device */
+#define VIRTIO_BLK_Z_HM1
+/* Host-aware zoned device */
+#define VIRTIO_BLK_Z_HA2
+
+/*
+ * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
+ */
+struct virtio_blk_zone_descriptor {
+   /* Zone capacity */
+   __virtio64 z_cap;
+   /* The starting sector of the zone */
+   __virtio64 z_start;
+   /* Zone write pointer position in sectors */
+   __virtio64 z_wp;
+   /* Zone type */
+   uint8_t z_type;
+   /* Zone state */
+   uint8_t z_state;
+   uint8_t reserved[38];
+};
+
+struct virtio_blk_zone_report {
+   __virtio64 nr_zones;
+   uint8_t reserved[56];
+   struct virtio_blk_zone_descriptor zones[];
+};
+
+/*
+ * Supported zone types.
+ */
+
+/* Conventional zone */
+#define VIRTIO_BLK_ZT_CONV 1
+/* Sequential Write Required zone */
+#define VIRTIO_BLK_ZT_SWR  2
+/* Sequential Write Preferred zone */
+#define VIRTIO_BLK_ZT_SWP  3
+
+/*
+ * Zone states that are available for zones of all types.
+ */
+
+/* Not a write pointer (conventional zones only) */
+#define VIRTIO_BLK_ZS_NOT_WP   0
+/* Empty */
+#define VIRTIO_BLK_ZS_EMPTY1
+/* Implicitly Open */
+#define VIRTIO_BLK_ZS_IOPEN2
+/* Explicitly Open */
+#define VIRTIO_BLK_ZS_EOPEN3
+/* Closed */
+#define VIRTIO_BLK_ZS_CLOSED   4
+/* Read-Only */
+#define VIRTIO_BLK_ZS_RDONLY   13
+/* Full */
+#define VIRTIO_BLK_ZS_FULL 14
+/* Offline */
+#define VIRTIO_BLK_ZS_OFFLINE  15
+
 /* Unmap this range (only valid for write zeroes command) */
 #define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP 0x0001
 
@@ -198,4 +300,11 @@ struct virtio_scsi_inhdr {
 #define VIRTIO_BLK_S_OK0
 #define VIRTIO_BLK_S_IOERR 1
 #define VIRTIO_BLK_S_UNSUPP2
+
+/* Error codes that are specific to zoned block devices */
+#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3
+#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP4
+#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE   5
+#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
+
 #endif /* _LINUX_VIRTIO_BLK_H */
-- 
2.37.3




[PATCH v4 1/3] file-posix: add the tracking of the zones write pointers

2022-10-16 Thread Sam Li
Linux has no user-space API for issuing zone append operations to
zoned devices, so the file-posix driver is modified to emulate zone
append using regular writes. To do this, the file-posix driver tracks
the write pointer (wp) location of every zone of the device in an
array of uint64_t. The most significant bit of each wp entry indicates
whether the zone is a conventional zone.

A zone's wp can change as a result of the following operations:
- zone reset: change the wp to the start offset of that zone
- zone finish: change to the end location of that zone
- write to a zone
- zone append
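
The bookkeeping described above can be sketched as follows. The EX_WP_CONV name and the ex_wp_* helpers are illustrative only; the patch itself uses its own macro and updates the array inside the file-posix driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Top bit of a wp entry marks a conventional zone (illustrative name). */
#define EX_WP_CONV (1ULL << 63)

static bool ex_wp_is_conv(uint64_t wp)
{
    return (wp & EX_WP_CONV) != 0;
}

/* Zone reset: the wp goes back to the start of the zone. */
static void ex_wp_reset(uint64_t *wp, uint64_t zone_start)
{
    assert(wp != NULL);
    if (!ex_wp_is_conv(*wp)) {
        *wp = zone_start;
    }
}

/* Zone finish: the wp moves to the end of the zone. */
static void ex_wp_finish(uint64_t *wp, uint64_t zone_start, uint64_t zone_len)
{
    assert(wp != NULL);
    if (!ex_wp_is_conv(*wp)) {
        *wp = zone_start + zone_len;
    }
}

/* Write or append that ended at end_offset: the wp only moves forward. */
static void ex_wp_advance(uint64_t *wp, uint64_t end_offset)
{
    assert(wp != NULL);
    if (!ex_wp_is_conv(*wp) && end_offset > *wp) {
        *wp = end_offset;
    }
}
```

Conventional zones have no write pointer, so every helper leaves a flagged entry untouched.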

Signed-off-by: Sam Li 
---
 block/file-posix.c   | 144 +++
 include/block/block-common.h |  14 +++
 include/block/block_int-common.h |   3 +
 3 files changed, 161 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 7c5a330fc1..5ff5500301 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1324,6 +1324,66 @@ static int hdev_get_max_segments(int fd, struct stat *st)
 #endif
 }
 
+#if defined(CONFIG_BLKZONED)
+static int get_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
+unsigned int nrz) {
+struct blk_zone *blkz;
+int64_t rep_size;
+int64_t sector = offset >> BDRV_SECTOR_BITS;
+int ret, n = 0, i = 0;
+rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
+g_autofree struct blk_zone_report *rep = NULL;
+
+rep = g_malloc(rep_size);
+blkz = (struct blk_zone *)(rep + 1);
+while (n < nrz) {
+memset(rep, 0, rep_size);
+rep->sector = sector;
+rep->nr_zones = nrz - n;
+
+do {
+ret = ioctl(fd, BLKREPORTZONE, rep);
+} while (ret != 0 && errno == EINTR);
+if (ret != 0) {
+error_report("%d: ioctl BLKREPORTZONE at %" PRId64 " failed %d",
+fd, offset, errno);
+return -errno;
+}
+
+if (!rep->nr_zones) {
+break;
+}
+
+for (i = 0; i < rep->nr_zones; i++, n++) {
+/*
+ * The wp tracking cares only about sequential writes required and
+ * sequential write preferred zones so that the wp can advance to
+ * the right location.
+ * Use the most significant bit of the wp location to indicate the
+ * zone type: 0 for SWR/SWP zones and 1 for conventional zones.
+ */
+if (blkz[i].type == BLK_ZONE_TYPE_CONVENTIONAL) {
+wps->wp[i] = 1ULL << 63;
+} else {
+wps->wp[i] = blkz[i].wp << BDRV_SECTOR_BITS;
+}
+}
+sector = blkz[i - 1].start + blkz[i - 1].len;
+}
+
+return 0;
+}
+
+static void update_zones_wp(int fd, BlockZoneWps *wps, int64_t offset,
+unsigned int nrz) {
+qemu_mutex_lock(&wps->lock);
+if (get_zones_wp(fd, wps, offset, nrz) < 0) {
+error_report("update zone wp failed");
+}
+qemu_mutex_unlock(&wps->lock);
+}
+#endif
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
@@ -1414,6 +1474,14 @@ static void raw_refresh_limits(BlockDriverState *bs, 
Error **errp)
 if (ret >= 0) {
 bs->bl.max_active_zones = ret;
 }
+
+bs->bl.wps = g_malloc(sizeof(BlockZoneWps) + sizeof(int64_t) * ret);
+if (get_zones_wp(s->fd, bs->bl.wps, 0, ret) < 0) {
+error_report("report wps failed");
+g_free(bs->bl.wps);
+return;
+}
+qemu_mutex_init(&bs->bl.wps->lock);
 }
 }
 
@@ -1725,6 +1793,25 @@ static int handle_aiocb_rw(void *opaque)
 
 out:
 if (nbytes == aiocb->aio_nbytes) {
+#if defined(CONFIG_BLKZONED)
+if (aiocb->aio_type & QEMU_AIO_WRITE) {
+BlockZoneWps *wps = aiocb->bs->bl.wps;
+int index = aiocb->aio_offset / aiocb->bs->bl.zone_size;
+if (wps) {
+qemu_mutex_lock(&wps->lock);
+if (!BDRV_ZT_IS_CONV(wps->wp[index])) {
+uint64_t wend_offset =
+aiocb->aio_offset + aiocb->aio_nbytes;
+
+/* Advance the wp if needed */
+if (wend_offset > wps->wp[index]) {
+wps->wp[index] = wend_offset;
+}
+}
+qemu_mutex_unlock(&wps->lock);
+}
+}
+#endif
 return 0;
 } else if (nbytes >= 0 && nbytes < aiocb->aio_nbytes) {
 if (aiocb->aio_type & QEMU_AIO_WRITE) {
@@ -1736,6 +1823,11 @@ out:
 }
 } else {
 assert(nbytes < 0);
+#if defined(CONFIG_BLKZONED)
+if (aiocb->aio_type & QEMU_AIO_WRITE) {
+update_zones_wp(aiocb->aio_fildes, aiocb->bs->bl.wps, 0, 1);
+}
+#endif
 return nbytes;
 }
 }
@@ -2022,14 +2114,29 @@ static int handle_aiocb_zone_report(void *opaque)
 #endif
 
 #if 

[RFC v3 0/2] Add zoned storage emulation to virtio-blk driver

2022-10-16 Thread Sam Li
Note: the virtio-blk header changes aren't upstream in the kernel yet, so
this series is marked as an RFC.

v3:
- use qemuio_from_buffer to write status bit [Stefan]
- avoid using req->elem directly [Stefan]
- fix error checking and memory leak [Stefan]

v2:
- change units of emulated zone op to correspond to block layer APIs
- modify error checking cases [Stefan, Damien]

v1:
- add zoned storage emulation

Sam Li (2):
  include: update virtio_blk headers from Linux 5.19-rc2+
  virtio-blk: add zoned storage emulation for zoned devices

 hw/block/virtio-blk-common.c|   2 +
 hw/block/virtio-blk.c   | 412 +++-
 include/hw/virtio/virtio-blk.h  |  11 +-
 include/standard-headers/linux/virtio_blk.h | 109 ++
 4 files changed, 531 insertions(+), 3 deletions(-)

-- 
2.37.3




[PATCH v4 3/3] qemu-iotests: test zone append operation

2022-10-16 Thread Sam Li
This test is mainly a helper to check that append writes in the block
layer behave as expected.

Signed-off-by: Sam Li 
---
 qemu-io-cmds.c | 63 ++
 tests/qemu-iotests/tests/zoned.out |  7 
 tests/qemu-iotests/tests/zoned.sh  |  9 +
 3 files changed, 79 insertions(+)

diff --git a/qemu-io-cmds.c b/qemu-io-cmds.c
index c1b28ea108..ca92291a44 100644
--- a/qemu-io-cmds.c
+++ b/qemu-io-cmds.c
@@ -1856,6 +1856,68 @@ static const cmdinfo_t zone_reset_cmd = {
 .oneline = "reset a zone write pointer in zone block device",
 };
 
+static int do_aio_zone_append(BlockBackend *blk, QEMUIOVector *qiov,
+  int64_t *offset, int flags, int *total)
+{
+int async_ret = NOT_DONE;
+
+blk_aio_zone_append(blk, offset, qiov, flags, aio_rw_done, &async_ret);
+while (async_ret == NOT_DONE) {
+main_loop_wait(false);
+}
+
+*total = qiov->size;
+return async_ret < 0 ? async_ret : 1;
+}
+
+static int zone_append_f(BlockBackend *blk, int argc, char **argv)
+{
+int ret;
+int flags = 0;
+int total = 0;
+int64_t offset;
+char *buf;
+int nr_iov;
+int pattern = 0xcd;
+QEMUIOVector qiov;
+
+if (optind > argc - 2) {
+return -EINVAL;
+}
+optind++;
+offset = cvtnum(argv[optind]);
+if (offset < 0) {
+print_cvtnum_err(offset, argv[optind]);
+return offset;
+}
+optind++;
+nr_iov = argc - optind;
+buf = create_iovec(blk, &qiov, &argv[optind], nr_iov, pattern);
+if (buf == NULL) {
+return -EINVAL;
+}
+ret = do_aio_zone_append(blk, &qiov, &offset, flags, &total);
+if (ret < 0) {
+printf("zone append failed: %s\n", strerror(-ret));
+goto out;
+}
+
+out:
+qemu_iovec_destroy(&qiov);
+qemu_io_free(buf);
+return ret;
+}
+
+static const cmdinfo_t zone_append_cmd = {
+.name = "zone_append",
+.altname = "zap",
+.cfunc = zone_append_f,
+.argmin = 3,
+.argmax = 3,
+.args = "offset len [len..]",
+.oneline = "append write a number of bytes at a specified offset",
+};
+
 static int truncate_f(BlockBackend *blk, int argc, char **argv);
 static const cmdinfo_t truncate_cmd = {
 .name   = "truncate",
@@ -2653,6 +2715,7 @@ static void __attribute((constructor)) 
init_qemuio_commands(void)
 qemuio_add_command(&zone_close_cmd);
 qemuio_add_command(&zone_finish_cmd);
 qemuio_add_command(&zone_reset_cmd);
+qemuio_add_command(&zone_append_cmd);
 qemuio_add_command(_cmd);
 qemuio_add_command(_cmd);
 qemuio_add_command(_cmd);
diff --git a/tests/qemu-iotests/tests/zoned.out 
b/tests/qemu-iotests/tests/zoned.out
index 0c8f96deb9..b3b139b4ec 100644
--- a/tests/qemu-iotests/tests/zoned.out
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -50,4 +50,11 @@ start: 0x8, len 0x8, cap 0x8, wptr 0x10, 
zcond:14, [type: 2]
 (5) resetting the second zone
 After resetting a zone:
 start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2]
+
+
+(6) append write
+After appending the first zone:
+start: 0x0, len 0x8, cap 0x8, wptr 0x18, zcond:2, [type: 2]
+After appending the second zone:
+start: 0x8, len 0x8, cap 0x8, wptr 0x80018, zcond:2, [type: 2]
 *** done
diff --git a/tests/qemu-iotests/tests/zoned.sh 
b/tests/qemu-iotests/tests/zoned.sh
index fced0194c5..888711eef2 100755
--- a/tests/qemu-iotests/tests/zoned.sh
+++ b/tests/qemu-iotests/tests/zoned.sh
@@ -79,6 +79,15 @@ echo "(5) resetting the second zone"
 sudo $QEMU_IO $IMG -c "zrs 268435456 268435456"
 echo "After resetting a zone:"
 sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(6) append write" # physical block size of the device is 4096
+sudo $QEMU_IO $IMG -c "zap 0 0x1000 0x2000"
+echo "After appending the first zone:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+sudo $QEMU_IO $IMG -c "zap 268435456 0x1000 0x2000"
+echo "After appending the second zone:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"
 
 # success, all done
 echo "*** done"
-- 
2.37.3




[PATCH v12 6/7] qemu-iotests: test new zone operations

2022-10-16 Thread Sam Li
We have added new block layer APIs for zoned block devices. Test them by
creating a null_blk device, running each zone operation on it and checking
whether the right zone information is reported.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
---
 tests/qemu-iotests/tests/zoned.out | 53 ++
 tests/qemu-iotests/tests/zoned.sh  | 86 ++
 2 files changed, 139 insertions(+)
 create mode 100644 tests/qemu-iotests/tests/zoned.out
 create mode 100755 tests/qemu-iotests/tests/zoned.sh

diff --git a/tests/qemu-iotests/tests/zoned.out 
b/tests/qemu-iotests/tests/zoned.out
new file mode 100644
index 00..0c8f96deb9
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.out
@@ -0,0 +1,53 @@
+QA output created by zoned.sh
+Testing a null_blk device:
+Simple cases: if the operations work
+(1) report the first zone:
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
+
+report the first 10 zones
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
+start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2]
+start: 0x10, len 0x8, cap 0x8, wptr 0x10, zcond:1, [type: 2]
+start: 0x18, len 0x8, cap 0x8, wptr 0x18, zcond:1, [type: 2]
+start: 0x20, len 0x8, cap 0x8, wptr 0x20, zcond:1, [type: 2]
+start: 0x28, len 0x8, cap 0x8, wptr 0x28, zcond:1, [type: 2]
+start: 0x30, len 0x8, cap 0x8, wptr 0x30, zcond:1, [type: 2]
+start: 0x38, len 0x8, cap 0x8, wptr 0x38, zcond:1, [type: 2]
+start: 0x40, len 0x8, cap 0x8, wptr 0x40, zcond:1, [type: 2]
+start: 0x48, len 0x8, cap 0x8, wptr 0x48, zcond:1, [type: 2]
+
+report the last zone:
+start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:1, [type: 
2]
+
+
+(2) opening the first zone
+report after:
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:3, [type: 2]
+
+opening the second zone
+report after:
+start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:3, [type: 2]
+
+opening the last zone
+report after:
+start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:3, [type: 
2]
+
+
+(3) closing the first zone
+report after:
+start: 0x0, len 0x8, cap 0x8, wptr 0x0, zcond:1, [type: 2]
+
+closing the last zone
+report after:
+start: 0x1f38, len 0x8, cap 0x8, wptr 0x1f38, zcond:1, [type: 
2]
+
+
+(4) finishing the second zone
+After finishing a zone:
+start: 0x8, len 0x8, cap 0x8, wptr 0x10, zcond:14, [type: 2]
+
+
+(5) resetting the second zone
+After resetting a zone:
+start: 0x8, len 0x8, cap 0x8, wptr 0x8, zcond:1, [type: 2]
+*** done
diff --git a/tests/qemu-iotests/tests/zoned.sh 
b/tests/qemu-iotests/tests/zoned.sh
new file mode 100755
index 00..fced0194c5
--- /dev/null
+++ b/tests/qemu-iotests/tests/zoned.sh
@@ -0,0 +1,86 @@
+#!/usr/bin/env bash
+#
+# Test zone management operations.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+status=1 # failure is the default!
+
+_cleanup()
+{
+  _cleanup_test_img
+  sudo rmmod null_blk
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+# This test only runs on Linux hosts with raw image files.
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+QEMU_IO="build/qemu-io"
+IMG="--image-opts -n driver=zoned_host_device,filename=/dev/nullb0"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
+
+echo "Testing a null_blk device:"
+echo "case 1: if the operations work"
+sudo modprobe null_blk nr_devices=1 zoned=1
+
+echo "(1) report the first zone:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "report the first 10 zones"
+sudo $QEMU_IO $IMG -c "zrp 0 10"
+echo
+echo "report the last zone:"
+sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2" # 0x3e7000 / 512 = 0x1f38
+echo
+echo
+echo "(2) opening the first zone"
+sudo $QEMU_IO $IMG -c "zo 0 268435456"  # 268435456 / 512 = 524288
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "opening the second zone"
+sudo $QEMU_IO $IMG -c "zo 268435456 268435456" #
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo "opening the last zone"
+sudo $QEMU_IO $IMG -c "zo 0x3e7000 268435456"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2"
+echo
+echo
+echo "(3) closing the first zone"
+sudo $QEMU_IO $IMG -c "zc 0 268435456"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0 1"
+echo
+echo "closing the last zone"
+sudo $QEMU_IO $IMG -c "zc 0x3e7000 268435456"
+echo "report after:"
+sudo $QEMU_IO $IMG -c "zrp 0x3e7000 2"
+echo
+echo
+echo "(4) finishing the second zone"
+sudo $QEMU_IO $IMG -c "zf 268435456 268435456"
+echo "After finishing a zone:"
+sudo $QEMU_IO $IMG -c "zrp 268435456 1"
+echo
+echo
+echo "(5) resetting the second zone"
+sudo $QEMU_IO $IMG -c "zrs 268435456 268435456"
+echo "After resetting a zone:"
+sudo $QEMU_IO 

[PATCH v12 7/7] docs/zoned-storage: add zoned device documentation

2022-10-16 Thread Sam Li
Add the documentation about the zoned device support to virtio-blk
emulation.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
---
 docs/devel/zoned-storage.rst   | 43 ++
 docs/system/qemu-block-drivers.rst.inc |  6 
 2 files changed, 49 insertions(+)
 create mode 100644 docs/devel/zoned-storage.rst

diff --git a/docs/devel/zoned-storage.rst b/docs/devel/zoned-storage.rst
new file mode 100644
index 00..cf169d029b
--- /dev/null
+++ b/docs/devel/zoned-storage.rst
@@ -0,0 +1,43 @@
+=============
+zoned-storage
+=============
+
+Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
+that are larger than the LBA size. They can only allow sequential writes, which
+can reduce write amplification in SSDs, and potentially lead to higher
+throughput and increased capacity. More details about ZBDs can be found at:
+
+https://zonedstorage.io/docs/introduction/zoned-storage
+
+1. Block layer APIs for zoned storage
+-------------------------------------
+QEMU block layer supports three zoned storage models:
+- BLK_Z_HM: The host-managed zoned model only allows sequential write access
+to zones. It supports ZBD-specific I/O commands that can be used by a host to
+manage the zones of a device.
+- BLK_Z_HA: The host-aware zoned model allows random write operations in
+zones, making it backward compatible with regular block devices.
+- BLK_Z_NONE: The non-zoned model has no zones support. It includes both
+regular and drive-managed ZBD devices. ZBD-specific I/O commands are not
+supported.
+
+The block device information resides inside BlockDriverState. QEMU uses
+BlockLimits struct (BlockDriverState::bl) that is continuously accessed by the
+block layer while processing I/O requests. A BlockBackend has a root pointer to
+a BlockDriverState graph (for example, raw format on top of file-posix). The
+zoned storage information can be propagated from the leaf BlockDriverState all
+the way up to the BlockBackend. If the zoned storage model in file-posix is
+set to BLK_Z_HM, then block drivers will declare support for zoned host device.
+
+The block layer APIs support commands needed for zoned storage devices,
+including report zones, four zone operations, and zone append.
+
+2. Emulating zoned storage controllers
+--------------------------------------
+When the BlockBackend's BlockLimits model reports a zoned storage device, users
+like the virtio-blk emulation or the qemu-io-cmds.c utility can use block layer
+APIs for zoned storage emulation or testing.
+
+For example, to test zone_report on a null_blk device using qemu-io:
+$ path/to/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0
+-c "zrp offset nr_zones"
diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc
index dfe5d2293d..0b97227fd9 100644
--- a/docs/system/qemu-block-drivers.rst.inc
+++ b/docs/system/qemu-block-drivers.rst.inc
@@ -430,6 +430,12 @@ Hard disks
   you may corrupt your host data (use the ``-snapshot`` command
   line option or modify the device permissions accordingly).
 
+Zoned block devices
+  Zoned block devices can be passed through to the guest if the emulated storage
+  controller supports zoned storage. Use ``--blockdev zoned_host_device,
+  node-name=drive0,filename=/dev/nullb0`` to pass through ``/dev/nullb0``
+  as ``drive0``.
+
 Windows
 ^^^
 
-- 
2.37.3




[PATCH v4 0/3] Add zone append write for zoned device

2022-10-16 Thread Sam Li
v4:
- fix lock related issues[Damien]
- drop all field in zone_mgmt op [Damien]
- fix state checks in zone_mgmt command [Damien]
- return start sector of wp when issuing zap req [Damien]

v3:
- only read wps when it is locked [Damien]
- allow last smaller zone case [Damien]
- add zone type and state checks in zone_mgmt command [Damien]
- fix RESET_ALL related problems

v2:
- split patch to two patches for better reviewing
- change BlockZoneWps's structure to an array of integers
- use only mutex lock on locking conditions of zone wps
- coding styles and clean-ups

v1:
- introduce zone append write

Sam Li (3):
  file-posix: add the tracking of the zones write pointers
  block: introduce zone append write for zoned devices
  qemu-iotests: test zone append operation

 block/block-backend.c  |  65 
 block/file-posix.c | 229 -
 block/io.c |  21 +++
 block/raw-format.c |   8 +
 include/block/block-common.h   |  14 ++
 include/block/block-io.h   |   3 +
 include/block/block_int-common.h   |   8 +
 include/block/raw-aio.h|   4 +-
 include/sysemu/block-backend-io.h  |   9 ++
 qemu-io-cmds.c |  63 
 tests/qemu-iotests/tests/zoned.out |   7 +
 tests/qemu-iotests/tests/zoned.sh  |   9 ++
 12 files changed, 436 insertions(+), 4 deletions(-)

-- 
2.37.3




[PATCH v12 5/7] config: add check to block layer

2022-10-16 Thread Sam Li
Putting zoned/non-zoned BlockDrivers on top of each other is not
allowed.
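
As a rough illustration of the rule this patch enforces, the check can be
sketched in standalone C. The names below (sketch_zone_model, sketch_driver,
sketch_child_allowed) are hypothetical stand-ins and not QEMU identifiers; only
the condition mirrors the hunk in block.c, where a host-managed child is
refused when the parent driver does not declare support for zoned children.

```c
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's BlockZoneModel. */
enum sketch_zone_model {
    SKETCH_Z_NONE, /* regular block device */
    SKETCH_Z_HM,   /* host-managed zoned device */
    SKETCH_Z_HA    /* host-aware zoned device */
};

/* Hypothetical stand-in for the one BlockDriver flag the check reads. */
struct sketch_driver {
    bool supports_zoned_children;
};

/*
 * Return true if a child with the given zoned model may be attached to a
 * parent using this driver. Only host-managed children are restricted.
 */
bool sketch_child_allowed(const struct sketch_driver *parent,
                          enum sketch_zone_model child_model)
{
    if (!parent->supports_zoned_children && child_model == SKETCH_Z_HM) {
        return false;
    }
    return true;
}
```

Host-aware and non-zoned children pass this check either way, which matches
the comment in the hunk about using a host-aware device as a regular device.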

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Hannes Reinecke 
---
 block.c  | 19 +++
 block/file-posix.c   | 12 
 block/raw-format.c   |  1 +
 include/block/block_int-common.h |  5 +
 4 files changed, 37 insertions(+)

diff --git a/block.c b/block.c
index 1fbf6b9e69..5d6fa4a25a 100644
--- a/block.c
+++ b/block.c
@@ -7951,6 +7951,25 @@ void bdrv_add_child(BlockDriverState *parent_bs, BlockDriverState *child_bs,
 return;
 }
 
+/*
+ * Non-zoned block drivers do not follow zoned storage constraints
+ * (i.e. sequential writes to zones). Refuse mixing zoned and non-zoned
+ * drivers in a graph.
+ */
+if (!parent_bs->drv->supports_zoned_children &&
+child_bs->bl.zoned == BLK_Z_HM) {
+/*
+ * The host-aware model allows zoned storage constraints and random
+ * write. Allow mixing host-aware and non-zoned drivers. Using
+ * host-aware device as a regular device.
+ */
+error_setg(errp, "Cannot add a %s child to a %s parent",
+   child_bs->bl.zoned == BLK_Z_HM ? "zoned" : "non-zoned",
+   parent_bs->drv->supports_zoned_children ?
+   "support zoned children" : "not support zoned children");
+return;
+}
+
 if (!QLIST_EMPTY(&child_bs->parents)) {
 error_setg(errp, "The node %s already has a parent",
child_bs->node_name);
diff --git a/block/file-posix.c b/block/file-posix.c
index bd28e3eaea..7c5a330fc1 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -776,6 +776,18 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
 goto fail;
 }
 }
+#ifdef CONFIG_BLKZONED
+/*
+ * The kernel page cache does not reliably work for writes to SWR zones
+ * of zoned block device because it can not guarantee the order of writes.
+ */
+if ((strcmp(bs->drv->format_name, "zoned_host_device") == 0) &&
+(!(s->open_flags & O_DIRECT))) {
+error_setg(errp, "driver=zoned_host_device was specified, but it "
+   "requires cache.direct=on, which was not specified.");
+return -EINVAL; /* No host kernel page cache */
+}
+#endif
 
 if (S_ISBLK(st.st_mode)) {
 #ifdef __linux__
diff --git a/block/raw-format.c b/block/raw-format.c
index bac43f1d25..18dc52a150 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -615,6 +615,7 @@ static void raw_child_perm(BlockDriverState *bs, BdrvChild *c,
 BlockDriver bdrv_raw = {
 .format_name  = "raw",
 .instance_size= sizeof(BDRVRawState),
+.supports_zoned_children = true,
 .bdrv_probe   = &raw_probe,
 .bdrv_reopen_prepare  = &raw_reopen_prepare,
 .bdrv_reopen_commit   = &raw_reopen_commit,
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index cdc06e77a6..37dddc603c 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -127,6 +127,11 @@ struct BlockDriver {
  */
 bool is_format;
 
+/*
+ * Set to true if the BlockDriver supports zoned children.
+ */
+bool supports_zoned_children;
+
 /*
  * Drivers not implementing bdrv_parse_filename nor bdrv_open should have
  * this field set to true, except ones that are defined only by their
-- 
2.37.3




[PATCH v12 2/7] file-posix: introduce helper functions for sysfs attributes

2022-10-16 Thread Sam Li
Use get_sysfs_str_val() to get the string value of the device's
zoned model. get_sysfs_zoned_model() then converts it to QEMU's
BlockZoneModel type.

Use get_sysfs_long_val() to get the long value of zoned device
information.
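
The conversion described above can be sketched in standalone C. This is a
minimal illustration and not the patch's code: sketch_zone_model,
sketch_strip_newline and sketch_parse_zoned_model are hypothetical names, and
the input string is assumed to have already been read from the sysfs "zoned"
queue attribute, which ends with a newline.

```c
#include <string.h>

/* Hypothetical stand-in for QEMU's BlockZoneModel. */
enum sketch_zone_model {
    SKETCH_Z_NONE,
    SKETCH_Z_HM,
    SKETCH_Z_HA
};

/* Remove a single trailing '\n' in place, as sysfs values end with one. */
void sketch_strip_newline(char *s)
{
    size_t len = strlen(s);

    if (len > 0 && s[len - 1] == '\n') {
        s[len - 1] = '\0';
    }
}

/*
 * Map the sysfs string to a zoned model.
 * Returns 0 on success and -1 for an unrecognized value.
 */
int sketch_parse_zoned_model(const char *val, enum sketch_zone_model *model)
{
    if (strcmp(val, "host-managed") == 0) {
        *model = SKETCH_Z_HM;
    } else if (strcmp(val, "host-aware") == 0) {
        *model = SKETCH_Z_HA;
    } else if (strcmp(val, "none") == 0) {
        *model = SKETCH_Z_NONE;
    } else {
        return -1;
    }
    return 0;
}
```

In the patch itself, a failed lookup in raw_refresh_limits() falls back to
BLK_Z_NONE, so a caller of a helper like this would treat an error as a
regular, non-zoned device.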

Signed-off-by: Sam Li 
Reviewed-by: Hannes Reinecke 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
---
 block/file-posix.c   | 124 ++-
 include/block/block_int-common.h |   3 +
 2 files changed, 91 insertions(+), 36 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 23acffb9a4..8cb07fdb8a 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1201,66 +1201,112 @@ static int hdev_get_max_hw_transfer(int fd, struct stat *st)
 #endif
 }
 
-static int hdev_get_max_segments(int fd, struct stat *st)
-{
+/*
+ * Get a sysfs attribute value as character string.
+ */
+static int get_sysfs_str_val(struct stat *st, const char *attribute,
+ char **val) {
 #ifdef CONFIG_LINUX
-char buf[32];
-const char *end;
-char *sysfspath = NULL;
+g_autofree char *sysfspath = NULL;
 int ret;
-int sysfd = -1;
-long max_segments;
+size_t len;
 
-if (S_ISCHR(st->st_mode)) {
-if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
-return ret;
-}
+if (!S_ISBLK(st->st_mode)) {
 return -ENOTSUP;
 }
 
-if (!S_ISBLK(st->st_mode)) {
-return -ENOTSUP;
+sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/%s",
+major(st->st_rdev), minor(st->st_rdev),
+attribute);
+ret = g_file_get_contents(sysfspath, val, &len, NULL);
+if (ret == -1) {
+return -ENOENT;
 }
 
-sysfspath = g_strdup_printf("/sys/dev/block/%u:%u/queue/max_segments",
-major(st->st_rdev), minor(st->st_rdev));
-sysfd = open(sysfspath, O_RDONLY);
-if (sysfd == -1) {
-ret = -errno;
-goto out;
+/* The file is ended with '\n' */
+char *p;
+p = *val;
+if (*(p + len - 1) == '\n') {
+*(p + len - 1) = '\0';
 }
-do {
-ret = read(sysfd, buf, sizeof(buf) - 1);
-} while (ret == -1 && errno == EINTR);
+return ret;
+#else
+return -ENOTSUP;
+#endif
+}
+
+static int get_sysfs_zoned_model(struct stat *st, BlockZoneModel *zoned)
+{
+g_autofree char *val = NULL;
+int ret;
+
+ret = get_sysfs_str_val(st, "zoned", &val);
 if (ret < 0) {
-ret = -errno;
-goto out;
-} else if (ret == 0) {
-ret = -EIO;
-goto out;
+return ret;
 }
-buf[ret] = 0;
-/* The file is ended with '\n', pass 'end' to accept that. */
-ret = qemu_strtol(buf, &end, 10, &max_segments);
-if (ret == 0 && end && *end == '\n') {
-ret = max_segments;
+
+if (strcmp(val, "host-managed") == 0) {
+*zoned = BLK_Z_HM;
+} else if (strcmp(val, "host-aware") == 0) {
+*zoned = BLK_Z_HA;
+} else if (strcmp(val, "none") == 0) {
+*zoned = BLK_Z_NONE;
+} else {
+return -ENOTSUP;
 }
+return 0;
+}
 
-out:
-if (sysfd != -1) {
-close(sysfd);
+/*
+ * Get a sysfs attribute value as a long integer.
+ */
+static long get_sysfs_long_val(struct stat *st, const char *attribute)
+{
+#ifdef CONFIG_LINUX
+g_autofree char *str = NULL;
+const char *end;
+long val;
+int ret;
+
+ret = get_sysfs_str_val(st, attribute, &str);
+if (ret < 0) {
+return ret;
+}
+
+/* The file is ended with '\n', pass 'end' to accept that. */
+ret = qemu_strtol(str, &end, 10, &val);
+if (ret == 0 && end && *end == '\0') {
+ret = val;
 }
-g_free(sysfspath);
 return ret;
 #else
 return -ENOTSUP;
 #endif
 }
 
+static int hdev_get_max_segments(int fd, struct stat *st)
+{
+#ifdef CONFIG_LINUX
+int ret;
+
+if (S_ISCHR(st->st_mode)) {
+if (ioctl(fd, SG_GET_SG_TABLESIZE, &ret) == 0) {
+return ret;
+}
+return -ENOTSUP;
+}
+return get_sysfs_long_val(st, "max_segments");
+#else
+return -ENOTSUP;
+#endif
+}
+
 static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 BDRVRawState *s = bs->opaque;
 struct stat st;
+int ret;
+BlockZoneModel zoned;
 
 s->needs_alignment = raw_needs_alignment(bs);
 raw_probe_alignment(bs, s->fd, errp);
@@ -1298,6 +1344,12 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 bs->bl.max_hw_iov = ret;
 }
 }
+
+ret = get_sysfs_zoned_model(&st, &zoned);
+if (ret < 0) {
+zoned = BLK_Z_NONE;
+}
+bs->bl.zoned = zoned;
 }
 
 static int check_for_dasd(int fd)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 8947abab76..7f7863cc9e 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -825,6 +825,9 @@ typedef struct BlockLimits {
 
 /* maximum number of iovec elements */
   

[PATCH v12 4/7] raw-format: add zone operations to pass through requests

2022-10-16 Thread Sam Li
The raw-format driver usually sits on top of the file-posix driver. It needs
to pass zone command requests through to it.

Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 block/raw-format.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/block/raw-format.c b/block/raw-format.c
index f337ac7569..bac43f1d25 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -314,6 +314,17 @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
 return bdrv_co_pdiscard(bs->file, offset, bytes);
 }
 
+static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
+   unsigned int *nr_zones,
+   BlockZoneDescriptor *zones) {
+return bdrv_co_zone_report(bs->file->bs, offset, nr_zones, zones);
+}
+
+static int coroutine_fn raw_co_zone_mgmt(BlockDriverState *bs, BlockZoneOp op,
+ int64_t offset, int64_t len) {
+return bdrv_co_zone_mgmt(bs->file->bs, op, offset, len);
+}
+
 static int64_t raw_getlength(BlockDriverState *bs)
 {
 int64_t len;
@@ -615,6 +626,8 @@ BlockDriver bdrv_raw = {
 .bdrv_co_pwritev  = &raw_co_pwritev,
 .bdrv_co_pwrite_zeroes = &raw_co_pwrite_zeroes,
 .bdrv_co_pdiscard = &raw_co_pdiscard,
+.bdrv_co_zone_report  = &raw_co_zone_report,
+.bdrv_co_zone_mgmt  = &raw_co_zone_mgmt,
 .bdrv_co_block_status = &raw_co_block_status,
 .bdrv_co_copy_range_from = &raw_co_copy_range_from,
 .bdrv_co_copy_range_to  = &raw_co_copy_range_to,
-- 
2.37.3




[PATCH v12 3/7] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls

2022-10-16 Thread Sam Li
Add a new zoned_host_device BlockDriver. The zoned_host_device option
accepts only zoned host block devices. By adding zone management
operations in this new BlockDriver, users can use the new block
layer APIs including Report Zone and four zone management operations
(open, close, finish, reset).

qemu-io uses the new APIs to perform zoned storage commands of the device:
zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs),
zone_finish(zf).

For example, to test zone_report, use the following command:
$ ./build/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0
-c "zrp offset nr_zones"
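
The offsets and lengths passed to these qemu-io commands are in bytes, while
the Linux zone ioctls describe zones in 512-byte sectors. A minimal sketch of
the conversion, using the same values that appear in the comments of the
zoned.sh test, is shown below. SKETCH_SECTOR_SHIFT and sketch_bytes_to_sectors
are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* 512-byte sectors: shifting right by 9 divides by 512. */
#define SKETCH_SECTOR_SHIFT 9

/* Convert a byte offset or length to a count of 512-byte sectors. */
uint64_t sketch_bytes_to_sectors(uint64_t bytes)
{
    return bytes >> SKETCH_SECTOR_SHIFT;
}
```

For example, the zone length 268435456 used in the test corresponds to 524288
sectors, and the offset 0x3e7000 corresponds to sector 0x1f38.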

Signed-off-by: Sam Li 
Reviewed-by: Hannes Reinecke 
---
 block/block-backend.c | 148 +
 block/file-posix.c| 335 ++
 block/io.c|  41 
 include/block/block-io.h  |   7 +
 include/block/block_int-common.h  |  24 +++
 include/block/raw-aio.h   |   6 +-
 include/sysemu/block-backend-io.h |  18 ++
 meson.build   |   4 +
 qapi/block-core.json  |   8 +-
 qemu-io-cmds.c| 149 +
 10 files changed, 737 insertions(+), 3 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index aa4adf06ae..1c618e9c68 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1431,6 +1431,15 @@ typedef struct BlkRwCo {
 void *iobuf;
 int ret;
 BdrvRequestFlags flags;
+union {
+struct {
+unsigned int *nr_zones;
+BlockZoneDescriptor *zones;
+} zone_report;
+struct {
+unsigned long op;
+} zone_mgmt;
+};
 } BlkRwCo;
 
 int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags)
@@ -1775,6 +1784,145 @@ int coroutine_fn blk_co_flush(BlockBackend *blk)
 return ret;
 }
 
+static void coroutine_fn blk_aio_zone_report_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_report(rwco->blk, rwco->offset,
+   rwco->zone_report.nr_zones,
+   rwco->zone_report.zones);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor  *zones,
+BlockCompletionFunc *cb, void *opaque)
+{
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.offset = offset,
+.ret= NOT_DONE,
+.zone_report = {
+.zones = zones,
+.nr_zones = nr_zones,
+},
+};
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_report_entry, acb);
+bdrv_coroutine_enter(blk_bs(blk), co);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
+static void coroutine_fn blk_aio_zone_mgmt_entry(void *opaque)
+{
+BlkAioEmAIOCB *acb = opaque;
+BlkRwCo *rwco = &acb->rwco;
+
+rwco->ret = blk_co_zone_mgmt(rwco->blk, rwco->zone_mgmt.op,
+ rwco->offset, acb->bytes);
+blk_aio_complete(acb);
+}
+
+BlockAIOCB *blk_aio_zone_mgmt(BlockBackend *blk, BlockZoneOp op,
+  int64_t offset, int64_t len,
+  BlockCompletionFunc *cb, void *opaque) {
+BlkAioEmAIOCB *acb;
+Coroutine *co;
+IO_CODE();
+
+blk_inc_in_flight(blk);
+acb = blk_aio_get(&blk_aio_em_aiocb_info, blk, cb, opaque);
+acb->rwco = (BlkRwCo) {
+.blk= blk,
+.offset = offset,
+.ret= NOT_DONE,
+.zone_mgmt = {
+.op = op,
+},
+};
+acb->bytes = len;
+acb->has_returned = false;
+
+co = qemu_coroutine_create(blk_aio_zone_mgmt_entry, acb);
+bdrv_coroutine_enter(blk_bs(blk), co);
+
+acb->has_returned = true;
+if (acb->rwco.ret != NOT_DONE) {
+replay_bh_schedule_oneshot_event(blk_get_aio_context(blk),
+ blk_aio_complete_bh, acb);
+}
+
+return &acb->common;
+}
+
+/*
+ * Send a zone_report command.
+ * offset is a byte offset from the start of the device. No alignment
+ * required for offset.
+ * nr_zones represents IN maximum and OUT actual.
+ */
+int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
+unsigned int *nr_zones,
+BlockZoneDescriptor *zones)
+{
+int ret;
+IO_CODE();
+
+blk_inc_in_flight(blk); /* increase before waiting */
+blk_wait_while_drained(blk);
+if 

[PATCH v12 1/7] include: add zoned device structs

2022-10-16 Thread Sam Li
Signed-off-by: Sam Li 
Reviewed-by: Stefan Hajnoczi 
Reviewed-by: Damien Le Moal 
Reviewed-by: Hannes Reinecke 
---
 include/block/block-common.h | 43 
 1 file changed, 43 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..36bd0e480e 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -49,6 +49,49 @@ typedef struct BlockDriver BlockDriver;
 typedef struct BdrvChild BdrvChild;
 typedef struct BdrvChildClass BdrvChildClass;
 
+typedef enum BlockZoneOp {
+BLK_ZO_OPEN,
+BLK_ZO_CLOSE,
+BLK_ZO_FINISH,
+BLK_ZO_RESET,
+} BlockZoneOp;
+
+typedef enum BlockZoneModel {
+BLK_Z_NONE = 0x0, /* Regular block device */
+BLK_Z_HM = 0x1, /* Host-managed zoned block device */
+BLK_Z_HA = 0x2, /* Host-aware zoned block device */
+} BlockZoneModel;
+
+typedef enum BlockZoneCondition {
+BLK_ZS_NOT_WP = 0x0,
+BLK_ZS_EMPTY = 0x1,
+BLK_ZS_IOPEN = 0x2,
+BLK_ZS_EOPEN = 0x3,
+BLK_ZS_CLOSED = 0x4,
+BLK_ZS_RDONLY = 0xD,
+BLK_ZS_FULL = 0xE,
+BLK_ZS_OFFLINE = 0xF,
+} BlockZoneCondition;
+
+typedef enum BlockZoneType {
+BLK_ZT_CONV = 0x1, /* Conventional random writes supported */
+BLK_ZT_SWR = 0x2, /* Sequential writes required */
+BLK_ZT_SWP = 0x3, /* Sequential writes preferred */
+} BlockZoneType;
+
+/*
+ * Zone descriptor data structure.
+ * Provides information on a zone with all position and size values in bytes.
+ */
+typedef struct BlockZoneDescriptor {
+uint64_t start;
+uint64_t length;
+uint64_t cap;
+uint64_t wp;
+BlockZoneType type;
+BlockZoneCondition cond;
+} BlockZoneDescriptor;
+
 typedef struct BlockDriverInfo {
 /* in bytes, 0 if irrelevant */
 int cluster_size;
-- 
2.37.3




[PATCH v12 0/7] Add support for zoned device

2022-10-16 Thread Sam Li
Zoned Block Devices (ZBDs) divide the LBA space into block regions called zones
that are larger than the LBA size. They only allow sequential writes, which
reduces write amplification in SSDs, leading to higher throughput and increased
capacity. More details about ZBDs can be found at:

https://zonedstorage.io/docs/introduction/zoned-storage

The zoned device support aims to let guests (virtual machines) access zoned
storage devices on the host (hypervisor) through a virtio-blk device. This
involves extending QEMU's block layer and virtio-blk emulation code.  In its
current status, the virtio-blk device is not aware of ZBDs, but the guest sees
host-managed drives as regular drives that will run correctly under the most
common write workloads.

This patch series extends the block layer APIs with the minimum set of zoned
commands that are necessary to support zoned devices. The commands are Report
Zones, four zone operations, and Zone Append (in development).

It can be tested on a null_blk device using qemu-io or qemu-iotests. For
example, to test zone report using qemu-io:
$ path/to/qemu-io --image-opts -n driver=zoned_host_device,filename=/dev/nullb0
-c "zrp offset nr_zones"

v12:
- address review comments
  * drop BLK_ZO_RESET_ALL bit [Damien]
  * fix error messages, style, and typos[Damien, Hannes]

v11:
- address review comments
  * fix possible BLKZONED config compiling warnings [Stefan]
  * fix capacity field compiling warnings on older kernel [Stefan,Damien]

v10:
- address review comments
  * deal with the last small zone case in zone_mgmt operations [Damien]
  * handle the capacity field outdated in old kernel(before 5.9) [Damien]
  * use byte unit in block layer to be consistent with QEMU [Eric]
  * fix coding style related problems [Stefan]

v9:
- address review comments
  * specify units of zone commands requests [Stefan]
  * fix some error handling in file-posix [Stefan]
  * introduce zoned_host_device in the commit message [Markus]

v8:
- address review comments
  * solve patch conflicts and merge sysfs helper functions into one patch
  * add cache.direct=on check in config

v7:
- address review comments
  * modify sysfs attribute helper functions
  * move the input validation and error checking into raw_co_zone_* function
  * fix checks in config

v6:
- drop virtio-blk emulation changes
- address Stefan's review comments
  * fix CONFIG_BLKZONED configs in related functions
  * replace reading fd by g_file_get_contents() in get_sysfs_str_val()
  * rewrite documentation for zoned storage

v5:
- add zoned storage emulation to virtio-blk device
- add documentation for zoned storage
- address review comments
  * fix qemu-iotests
  * fix check to block layer
  * modify interfaces of sysfs helper functions
  * rename zoned device structs according to QEMU styles
  * reorder patches

v4:
- add virtio-blk headers for zoned device
- add configurations for zoned host device
- add zone operations for raw-format
- address review comments
  * fix memory leak bug in zone_report
  * add checks to block layers
  * fix qemu-iotests format
  * fix sysfs helper functions

v3:
- add helper functions to get sysfs attributes
- address review comments
  * fix zone report bugs
  * fix the qemu-io code path
  * use thread pool to avoid blocking ioctl() calls

v2:
- add qemu-io sub-commands
- address review comments
  * modify interfaces of APIs

v1:
- add block layer APIs resembling Linux ZonedBlockDevice ioctls

Sam Li (7):
  include: add zoned device structs
  file-posix: introduce helper functions for sysfs attributes
  block: add block layer APIs resembling Linux ZonedBlockDevice ioctls
  raw-format: add zone operations to pass through requests
  config: add check to block layer
  qemu-iotests: test new zone operations
  docs/zoned-storage: add zoned device documentation

 block.c|  19 +
 block/block-backend.c  | 148 
 block/file-posix.c | 471 +++--
 block/io.c |  41 +++
 block/raw-format.c |  14 +
 docs/devel/zoned-storage.rst   |  43 +++
 docs/system/qemu-block-drivers.rst.inc |   6 +
 include/block/block-common.h   |  43 +++
 include/block/block-io.h   |   7 +
 include/block/block_int-common.h   |  32 ++
 include/block/raw-aio.h|   6 +-
 include/sysemu/block-backend-io.h  |  18 +
 meson.build|   4 +
 qapi/block-core.json   |   8 +-
 qemu-io-cmds.c | 149 
 tests/qemu-iotests/tests/zoned.out |  53 +++
 tests/qemu-iotests/tests/zoned.sh  |  86 +
 17 files changed, 1109 insertions(+), 39 deletions(-)
 create mode 100644 docs/devel/zoned-storage.rst
 create mode 100644 tests/qemu-iotests/tests/zoned.out
 create mode 100755 tests/qemu-iotests/tests/zoned.sh

-- 
2.37.3




Re: [PATCH v3 7/9] hw/ppc/e500: Implement pflash handling

2022-10-16 Thread BALATON Zoltan

On Sun, 16 Oct 2022, Bernhard Beschow wrote:

Allows e500 boards to have their root file system reside on flash using
only builtin devices located in the eLBC memory region.

Note that the flash memory area is only created when a -pflash argument is
given, and that the size is determined by the given file. The idea is to
put users into control.

Signed-off-by: Bernhard Beschow 
---
docs/system/ppc/ppce500.rst | 16 ++
hw/ppc/Kconfig  |  1 +
hw/ppc/e500.c   | 62 +
3 files changed, 79 insertions(+)

diff --git a/docs/system/ppc/ppce500.rst b/docs/system/ppc/ppce500.rst
index ba6bcb7314..99d2c680d6 100644
--- a/docs/system/ppc/ppce500.rst
+++ b/docs/system/ppc/ppce500.rst
@@ -165,3 +165,19 @@ if “-device eTSEC” is given to QEMU:
.. code-block:: bash

  -netdev tap,ifname=tap0,script=no,downscript=no,id=net0 -device eTSEC,netdev=net0
+
+Root file system on flash drive
+---
+
+Rather than using a root file system on ram disk, it is possible to have it on
+CFI flash. Given an ext2 image whose size must be a power of two, it can be used
+as follows:
+
+.. code-block:: bash
+
+  $ qemu-system-ppc{64|32} -M ppce500 -cpu e500mc -smp 4 -m 2G \


We have qemu-system-ppc and qemu-system-ppc64, not qemu-system-ppc32, so
maybe qemu-system-ppc[64]. That looks odd though, so maybe just
qemu-system-ppc, and people should know that ppc64 includes the ppc config
as well.


Regards,
BALATON Zoltan


+  -display none -serial stdio \
+  -kernel vmlinux \
+  -drive if=pflash,file=/path/to/rootfs.ext2,format=raw \
+  -append "rootwait root=/dev/mtdblock0"
+
diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index 791fe78a50..769a1ead1c 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -126,6 +126,7 @@ config E500
select ETSEC
select GPIO_MPC8XXX
select OPENPIC
+select PFLASH_CFI01
select PLATFORM_BUS
select PPCE500_PCI
select SERIAL
diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 3e950ea3ba..23d2c3451a 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -23,8 +23,10 @@
#include "e500-ccsr.h"
#include "net/net.h"
#include "qemu/config-file.h"
+#include "hw/block/flash.h"
#include "hw/char/serial.h"
#include "hw/pci/pci.h"
+#include "sysemu/block-backend-io.h"
#include "sysemu/sysemu.h"
#include "sysemu/kvm.h"
#include "sysemu/reset.h"
@@ -267,6 +269,31 @@ static void sysbus_device_create_devtree(SysBusDevice *sbdev, void *opaque)
}
}

+static void create_devtree_flash(SysBusDevice *sbdev,
+ PlatformDevtreeData *data)
+{
+g_autofree char *name = NULL;
+uint64_t num_blocks = object_property_get_uint(OBJECT(sbdev),
+   "num-blocks",
+   &error_fatal);
+uint64_t sector_length = object_property_get_uint(OBJECT(sbdev),
+  "sector-length",
+  &error_fatal);
+uint64_t bank_width = object_property_get_uint(OBJECT(sbdev),
+   "width",
+   &error_fatal);
+hwaddr flashbase = 0;
+hwaddr flashsize = num_blocks * sector_length;
+void *fdt = data->fdt;
+
+name = g_strdup_printf("%s/nor@%" PRIx64, data->node, flashbase);
+qemu_fdt_add_subnode(fdt, name);
+qemu_fdt_setprop_string(fdt, name, "compatible", "cfi-flash");
+qemu_fdt_setprop_sized_cells(fdt, name, "reg",
+ 1, flashbase, 1, flashsize);
+qemu_fdt_setprop_cell(fdt, name, "bank-width", bank_width);
+}
+
static void platform_bus_create_devtree(PPCE500MachineState *pms,
void *fdt, const char *mpic)
{
@@ -276,6 +303,8 @@ static void platform_bus_create_devtree(PPCE500MachineState *pms,
uint64_t addr = pmc->platform_bus_base;
uint64_t size = pmc->platform_bus_size;
int irq_start = pmc->platform_bus_first_irq;
+SysBusDevice *sbdev;
+bool ambiguous;

/* Create a /platform node that we can put all devices into */

@@ -302,6 +331,13 @@ static void platform_bus_create_devtree(PPCE500MachineState *pms,
/* Loop through all dynamic sysbus devices and create nodes for them */
foreach_dynamic_sysbus_device(sysbus_device_create_devtree, &data);

+sbdev = SYS_BUS_DEVICE(object_resolve_path_type("", TYPE_PFLASH_CFI01,
+&ambiguous));
+if (sbdev) {
+assert(!ambiguous);
+create_devtree_flash(sbdev, &data);
+}
+
g_free(node);
}

@@ -856,6 +892,7 @@ void ppce500_init(MachineState *machine)
unsigned int pci_irq_nrs[PCI_NUM_PINS] = {1, 2, 3, 4};
IrqLines *irqs;
DeviceState *dev, *mpicdev;
+DriveInfo *dinfo;
CPUPPCState *firstenv = NULL;
MemoryRegion *ccsr_addr_space;
SysBusDevice *s;
@@ -1024,6 +1061,31 @@ void 

Re: [PATCH v6 2/2] block: Refactor get_tmp_filename()

2022-10-16 Thread Bin Meng
On Mon, Oct 10, 2022 at 12:05 PM Bin Meng  wrote:
>
> At present there are two callers of get_tmp_filename() and they are
> inconsistent.
>
> One does:
>
> /* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
> char *tmp_filename = g_malloc0(PATH_MAX + 1);
> ...
> ret = get_tmp_filename(tmp_filename, PATH_MAX + 1);
>
> while the other does:
>
> s->qcow_filename = g_malloc(PATH_MAX);
> ret = get_tmp_filename(s->qcow_filename, PATH_MAX);
>
> As we can see different 'size' arguments are passed. There are also
> platform specific implementations inside the function, and the use
> of snprintf is really undesirable.
>
> The function name is also misleading. It creates a temporary file,
> not just a filename.
>
> Refactor this routine by changing its name and signature to:
>
> char *create_tmp_file(Error **errp)
>
> and use g_get_tmp_dir() / g_mkstemp() for a consistent implementation.
>
> While we are here, add some comments to mention that /var/tmp is
> preferred over /tmp on non-win32 hosts.
>
> Signed-off-by: Bin Meng 
> ---
>
> Changes in v6:
> - use g_mkstemp() and stick to use /var/tmp for non-win32 hosts
>
> Changes in v5:
> - minor change in the commit message
> - add some notes in the function comment block
> - add g_autofree for tmp_filename
>
> Changes in v4:
> - Rename the function to create_tmp_file() and take "Error **errp" as
>   a parameter, so that callers can pass errp all the way down to this
>   routine.
> - Commit message updated to reflect the latest change
>
> Changes in v3:
> - Do not use errno directly, instead still let get_tmp_filename() return
>   a negative number to indicate error
>
> Changes in v2:
> - Use g_autofree and g_steal_pointer
>
>  include/block/block_int-common.h |  2 +-
>  block.c  | 56 +---
>  block/vvfat.c|  7 ++--
>  3 files changed, 34 insertions(+), 31 deletions(-)
>

Any comments?



[PATCH v11 5/5] target/riscv: smstateen knobs

2022-10-16 Thread Mayuresh Chitale
Add knobs to allow users to enable smstateen and also export it via the
ISA extension string.

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Weiwei Li
Reviewed-by: Alistair Francis 
---
 target/riscv/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index e6d9c706bb..ae3f57a72b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -102,6 +102,7 @@ static const struct isa_ext_data isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zhinx, true, PRIV_VERSION_1_12_0, ext_zhinx),
 ISA_EXT_DATA_ENTRY(zhinxmin, true, PRIV_VERSION_1_12_0, ext_zhinxmin),
 ISA_EXT_DATA_ENTRY(smaia, true, PRIV_VERSION_1_12_0, ext_smaia),
+ISA_EXT_DATA_ENTRY(smstateen, true, PRIV_VERSION_1_12_0, ext_smstateen),
 ISA_EXT_DATA_ENTRY(ssaia, true, PRIV_VERSION_1_12_0, ext_ssaia),
 ISA_EXT_DATA_ENTRY(sscofpmf, true, PRIV_VERSION_1_12_0, ext_sscofpmf),
 ISA_EXT_DATA_ENTRY(sstc, true, PRIV_VERSION_1_12_0, ext_sstc),
@@ -1024,6 +1025,7 @@ static Property riscv_cpu_extensions[] = {
 DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
 DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
 
+DEFINE_PROP_BOOL("smstateen", RISCVCPU, cfg.ext_smstateen, false),
 DEFINE_PROP_BOOL("svinval", RISCVCPU, cfg.ext_svinval, false),
 DEFINE_PROP_BOOL("svnapot", RISCVCPU, cfg.ext_svnapot, false),
 DEFINE_PROP_BOOL("svpbmt", RISCVCPU, cfg.ext_svpbmt, false),
-- 
2.25.1




[PATCH v11 4/5] target/riscv: smstateen check for fcsr

2022-10-16 Thread Mayuresh Chitale
If smstateen is implemented and sstateen0.fcsr is clear then the floating point
operations must return illegal instruction exception or virtual instruction
trap, if relevant.
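
For readers who want to reason about the check without the QEMU tree at hand, the decision made by the helper added in this patch can be modelled in plain C roughly as below. This is only a sketch: the struct, the enums and the function name are hypothetical stand-ins for CPURISCVState/DisasContext, and only the bit position (1) is taken from SMSTATEEN0_FCSR in patch 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the stateen0 registers gates fcsr (SMSTATEEN0_FCSR in patch 1). */
#define FCSR_BIT (1ULL << 1)

/* Hypothetical privilege encoding, used by this sketch only. */
enum priv { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* Hypothetical outcome type; QEMU raises RISCVException values instead. */
enum fp_result { FP_OK, FP_ILLEGAL_INST, FP_VIRT_INST };

/* Hypothetical, flattened view of the state the check looks at. */
struct stateen_model {
    uint64_t mstateen0;
    uint64_t hstateen0;
    uint64_t sstateen0;
    bool has_smstateen;
    bool has_s_mode;
    bool virt_enabled;
    enum priv priv;
};

static enum fp_result fp_access_check(const struct stateen_model *m)
{
    uint64_t stateen = m->mstateen0;

    /* No gating without the extension, and M-mode is never gated. */
    if (!m->has_smstateen || m->priv == PRIV_M) {
        return FP_OK;
    }
    /* A guest is additionally limited by hstateen0. */
    if (m->virt_enabled) {
        stateen &= m->hstateen0;
    }
    /* U-mode is additionally limited by sstateen0 when S-mode exists. */
    if (m->priv == PRIV_U && m->has_s_mode) {
        stateen &= m->sstateen0;
    }
    if (!(stateen & FCSR_BIT)) {
        return m->virt_enabled ? FP_VIRT_INST : FP_ILLEGAL_INST;
    }
    return FP_OK;
}
```

Note that in the patch itself the check is only reached from the REQUIRE_* macros shown in the diff; the model above covers the bit test alone.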

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Weiwei Li 
---
 target/riscv/csr.c| 23 
 target/riscv/insn_trans/trans_rvf.c.inc   | 43 +--
 target/riscv/insn_trans/trans_rvzfh.c.inc | 12 +++
 3 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 71236f2b5d..8b25f885ec 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -84,6 +84,10 @@ static RISCVException fs(CPURISCVState *env, int csrno)
 !RISCV_CPU(env_cpu(env))->cfg.ext_zfinx) {
 return RISCV_EXCP_ILLEGAL_INST;
 }
+
+if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
+return smstateen_acc_ok(env, 0, SMSTATEEN0_FCSR);
+}
 #endif
 return RISCV_EXCP_NONE;
 }
@@ -2023,6 +2027,9 @@ static RISCVException write_mstateen0(CPURISCVState *env, int csrno,
   target_ulong new_val)
 {
 uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
+if (!riscv_has_ext(env, RVF)) {
+wr_mask |= SMSTATEEN0_FCSR;
+}
 
 return write_mstateen(env, csrno, wr_mask, new_val);
 }
@@ -2059,6 +2066,10 @@ static RISCVException write_mstateen0h(CPURISCVState *env, int csrno,
 {
 uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
 
+if (!riscv_has_ext(env, RVF)) {
+wr_mask |= SMSTATEEN0_FCSR;
+}
+
 return write_mstateenh(env, csrno, wr_mask, new_val);
 }
 
@@ -2096,6 +2107,10 @@ static RISCVException write_hstateen0(CPURISCVState *env, int csrno,
 {
 uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
 
+if (!riscv_has_ext(env, RVF)) {
+wr_mask |= SMSTATEEN0_FCSR;
+}
+
 return write_hstateen(env, csrno, wr_mask, new_val);
 }
 
@@ -2135,6 +2150,10 @@ static RISCVException write_hstateen0h(CPURISCVState *env, int csrno,
 {
 uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
 
+if (!riscv_has_ext(env, RVF)) {
+wr_mask |= SMSTATEEN0_FCSR;
+}
+
 return write_hstateenh(env, csrno, wr_mask, new_val);
 }
 
@@ -2182,6 +2201,10 @@ static RISCVException write_sstateen0(CPURISCVState *env, int csrno,
 {
 uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
 
+if (!riscv_has_ext(env, RVF)) {
+wr_mask |= SMSTATEEN0_FCSR;
+}
+
 return write_sstateen(env, csrno, wr_mask, new_val);
 }
 
diff --git a/target/riscv/insn_trans/trans_rvf.c.inc b/target/riscv/insn_trans/trans_rvf.c.inc
index a1d3eb52ad..93657680c6 100644
--- a/target/riscv/insn_trans/trans_rvf.c.inc
+++ b/target/riscv/insn_trans/trans_rvf.c.inc
@@ -24,9 +24,46 @@
 return false; \
 } while (0)
 
-#define REQUIRE_ZFINX_OR_F(ctx) do {\
-if (!ctx->cfg_ptr->ext_zfinx) { \
-REQUIRE_EXT(ctx, RVF); \
+#ifndef CONFIG_USER_ONLY
+static inline bool smstateen_fcsr_check(DisasContext *ctx, int index)
+{
+CPUState *cpu = ctx->cs;
+CPURISCVState *env = cpu->env_ptr;
+uint64_t stateen = env->mstateen[index];
+
+if (!ctx->cfg_ptr->ext_smstateen || env->priv == PRV_M) {
+return true;
+}
+
+if (ctx->virt_enabled) {
+stateen &= env->hstateen[index];
+}
+
+if (env->priv == PRV_U && has_ext(ctx, RVS)) {
+stateen &= env->sstateen[index];
+}
+
+if (!(stateen & SMSTATEEN0_FCSR)) {
+if (ctx->virt_enabled) {
+ctx->virt_inst_excp = true;
+}
+return false;
+}
+
+return true;
+}
+#else
+#define smstateen_fcsr_check(ctx, index) (true)
+#endif
+
+#define REQUIRE_ZFINX_OR_F(ctx) do { \
+if (!has_ext(ctx, RVF)) { \
+if (!ctx->cfg_ptr->ext_zfinx) { \
+return false; \
+} \
+if (!smstateen_fcsr_check(ctx, 0)) { \
+return false; \
+} \
 } \
 } while (0)
 
diff --git a/target/riscv/insn_trans/trans_rvzfh.c.inc b/target/riscv/insn_trans/trans_rvzfh.c.inc
index 5d07150cd0..6c2e338c0a 100644
--- a/target/riscv/insn_trans/trans_rvzfh.c.inc
+++ b/target/riscv/insn_trans/trans_rvzfh.c.inc
@@ -20,18 +20,27 @@
 if (!ctx->cfg_ptr->ext_zfh) {  \
 return false; \
 } \
+if (!smstateen_fcsr_check(ctx, 0)) { \
+return false; \
+} \
 } while (0)
 
 #define REQUIRE_ZHINX_OR_ZFH(ctx) do { \
 if (!ctx->cfg_ptr->ext_zhinx && !ctx->cfg_ptr->ext_zfh) { \
 return false;  \
 }  \
+if (!smstateen_fcsr_check(ctx, 0)) { \
+return false; \
+} \
 } while (0)
 
 #define REQUIRE_ZFH_OR_ZFHMIN(ctx) do {   \
 if (!(ctx->cfg_ptr->ext_zfh || ctx->cfg_ptr->ext_zfhmin)) { \
 return false; \
 } \
+if (!smstateen_fcsr_check(ctx, 0)) { \
+return false; \
+   

[PATCH v11 3/5] target/riscv: generate virtual instruction exception

2022-10-16 Thread Mayuresh Chitale
This patch adds a mechanism to generate a virtual instruction
exception instead of an illegal instruction exception during
instruction decode when virt is enabled.
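
To illustrate the flow, here is a minimal stand-alone sketch of the flag handling. The struct and function names are hypothetical and only mirror the virt_inst_excp field added to DisasContext below; the enum values are placeholders, not the real RISCV_EXCP_* numbers.

```c
#include <stdbool.h>

/* Placeholder exception identifiers, not the real QEMU constants. */
enum excp { EXCP_ILLEGAL_INST, EXCP_VIRT_INST };

/* Hypothetical cut-down decode context. */
struct decode_ctx {
    bool virt_enabled;   /* decoding for a guest (V=1) */
    bool virt_inst_excp; /* set by a check that wants a virtual fault */
};

/* Called at the start of every instruction decode: clear the request. */
static void decode_begin(struct decode_ctx *ctx)
{
    ctx->virt_inst_excp = false;
}

/* Called by an extension check that fails while virt is enabled. */
static void mark_virt_fault(struct decode_ctx *ctx)
{
    if (ctx->virt_enabled) {
        ctx->virt_inst_excp = true;
    }
}

/* Called when decode gives up: pick the exception to raise. */
static enum excp decode_failure_excp(const struct decode_ctx *ctx)
{
    return ctx->virt_inst_excp ? EXCP_VIRT_INST : EXCP_ILLEGAL_INST;
}
```

Resetting the flag per instruction keeps a request made for one instruction from leaking into the next one decoded in the same translation block.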

Signed-off-by: Mayuresh Chitale 
---
 target/riscv/translate.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index db123da5ec..8b0bd38bb2 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -76,6 +76,7 @@ typedef struct DisasContext {
to reset this known value.  */
 int frm;
 RISCVMXL ol;
+bool virt_inst_excp;
 bool virt_enabled;
 const RISCVCPUConfig *cfg_ptr;
 bool hlsx;
@@ -243,7 +244,11 @@ static void gen_exception_illegal(DisasContext *ctx)
 {
 tcg_gen_st_i32(tcg_constant_i32(ctx->opcode), cpu_env,
offsetof(CPURISCVState, bins));
-generate_exception(ctx, RISCV_EXCP_ILLEGAL_INST);
+if (ctx->virt_inst_excp) {
+generate_exception(ctx, RISCV_EXCP_VIRT_INSTRUCTION_FAULT);
+} else {
+generate_exception(ctx, RISCV_EXCP_ILLEGAL_INST);
+}
 }
 
 static void gen_exception_inst_addr_mis(DisasContext *ctx)
@@ -1062,6 +1067,7 @@ static void decode_opc(CPURISCVState *env, DisasContext *ctx, uint16_t opcode)
 { has_XVentanaCondOps_p,  decode_XVentanaCodeOps },
 };
 
+ctx->virt_inst_excp = false;
 /* Check for compressed insn */
 if (insn_len(opcode) == 2) {
 if (!has_ext(ctx, RVC)) {
-- 
2.25.1




[PATCH v3 9/9] hw/ppc/e500: Add Freescale eSDHC to e500plat

2022-10-16 Thread Bernhard Beschow
Adds missing functionality to the e500plat machine, which increases the
chance that "real" firmware images can access SD cards.

Signed-off-by: Bernhard Beschow 
---
 docs/system/ppc/ppce500.rst | 12 
 hw/ppc/Kconfig  |  1 +
 hw/ppc/e500.c   | 35 ++-
 hw/ppc/e500.h   |  1 +
 hw/ppc/e500plat.c   |  1 +
 5 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/docs/system/ppc/ppce500.rst b/docs/system/ppc/ppce500.rst
index 99d2c680d6..298ee9ee16 100644
--- a/docs/system/ppc/ppce500.rst
+++ b/docs/system/ppc/ppce500.rst
@@ -19,6 +19,7 @@ The ``ppce500`` machine supports the following devices:
 * Power-off functionality via one GPIO pin
 * 1 Freescale MPC8xxx PCI host controller
 * VirtIO devices via PCI bus
+* 1 Freescale Enhanced Secure Digital Host controller (eSDHC)
 * 1 Freescale Enhanced Triple Speed Ethernet controller (eTSEC)
 
 Hardware configuration information
@@ -181,3 +182,14 @@ as follows:
   -drive if=pflash,file=/path/to/rootfs.ext2,format=raw \
   -append "rootwait root=/dev/mtdblock0"
 
+Alternatively, the root file system can also reside on an emulated SD card
+whose size must again be a power of two:
+
+.. code-block:: bash
+
+  $ qemu-system-ppc{64|32} -M ppce500 -cpu e500mc -smp 4 -m 2G \
+  -display none -serial stdio \
+  -kernel vmlinux \
+  -device sd-card,drive=mydrive \
+  -drive id=mydrive,if=none,file=/path/to/rootfs.ext2,format=raw \
+  -append "rootwait root=/dev/mmcblk0"
diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index 769a1ead1c..6e31f568ba 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -129,6 +129,7 @@ config E500
 select PFLASH_CFI01
 select PLATFORM_BUS
 select PPCE500_PCI
+select SDHCI
 select SERIAL
 select MPC_I2C
 select FDT_PPC
diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 23d2c3451a..f43a21d8bb 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -48,6 +48,7 @@
 #include "hw/net/fsl_etsec/etsec.h"
 #include "hw/i2c/i2c.h"
 #include "hw/irq.h"
+#include "hw/sd/sdhci.h"
 
 #define EPAPR_MAGIC(0x45504150)
 #define DTC_LOAD_PAD   0x180
@@ -66,11 +67,14 @@
 #define MPC8544_SERIAL1_REGS_OFFSET 0x4600ULL
 #define MPC8544_PCI_REGS_OFFSET0x8000ULL
 #define MPC8544_PCI_REGS_SIZE  0x1000ULL
+#define MPC85XX_ESDHC_REGS_OFFSET  0x2e000ULL
+#define MPC85XX_ESDHC_REGS_SIZE0x1000ULL
 #define MPC8544_UTIL_OFFSET0xeULL
 #define MPC8XXX_GPIO_OFFSET0x000FF000ULL
 #define MPC8544_I2C_REGS_OFFSET0x3000ULL
 #define MPC8XXX_GPIO_IRQ   47
 #define MPC8544_I2C_IRQ43
+#define MPC85XX_ESDHC_IRQ  72
 #define RTC_REGS_OFFSET0x68
 
 #define PLATFORM_CLK_FREQ_HZ   (400 * 1000 * 1000)
@@ -203,6 +207,22 @@ static void dt_i2c_create(void *fdt, const char *soc, const char *mpic,
 g_free(i2c);
 }
 
+static void dt_sdhc_create(void *fdt, const char *parent, const char *mpic)
+{
+hwaddr mmio = MPC85XX_ESDHC_REGS_OFFSET;
+hwaddr size = MPC85XX_ESDHC_REGS_SIZE;
+int irq = MPC85XX_ESDHC_IRQ;
+g_autofree char *name = NULL;
+
+name = g_strdup_printf("%s/sdhc@%" PRIx64, parent, mmio);
+qemu_fdt_add_subnode(fdt, name);
+qemu_fdt_setprop(fdt, name, "sdhci,auto-cmd12", NULL, 0);
+qemu_fdt_setprop_phandle(fdt, name, "interrupt-parent", mpic);
+qemu_fdt_setprop_cells(fdt, name, "bus-width", 4);
+qemu_fdt_setprop_cells(fdt, name, "interrupts", irq, 0x2);
+qemu_fdt_setprop_cells(fdt, name, "reg", mmio, size);
+qemu_fdt_setprop_string(fdt, name, "compatible", "fsl,esdhc");
+}
 
 typedef struct PlatformDevtreeData {
 void *fdt;
@@ -553,6 +573,10 @@ static int ppce500_load_device_tree(PPCE500MachineState *pms,
 
 dt_rtc_create(fdt, "i2c", "rtc");
 
+/* sdhc */
+if (pmc->has_esdhc) {
+dt_sdhc_create(fdt, soc, mpic);
+}
 
 gutil = g_strdup_printf("%s/global-utilities@%llx", soc,
 MPC8544_UTIL_OFFSET);
@@ -982,7 +1006,8 @@ void ppce500_init(MachineState *machine)
0, qdev_get_gpio_in(mpicdev, 42), 399193,
serial_hd(1), DEVICE_BIG_ENDIAN);
 }
-/* I2C */
+
+/* I2C */
 dev = qdev_new("mpc-i2c");
 s = SYS_BUS_DEVICE(dev);
 sysbus_realize_and_unref(s, &error_fatal);
@@ -992,6 +1017,14 @@ void ppce500_init(MachineState *machine)
 i2c = (I2CBus *)qdev_get_child_bus(dev, "i2c");
 i2c_slave_create_simple(i2c, "ds1338", RTC_REGS_OFFSET);
 
+/* eSDHC */
+if (pmc->has_esdhc) {
+dev = qdev_new(TYPE_FSL_ESDHC);
+s = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(s, &error_fatal);
+sysbus_mmio_map(s, 0, pmc->ccsrbar_base + MPC85XX_ESDHC_REGS_OFFSET);
+sysbus_connect_irq(s, 0, qdev_get_gpio_in(mpicdev, MPC85XX_ESDHC_IRQ));
+}
 
 /* General Utility device */
 dev = qdev_new("mpc8544-guts");
diff --git a/hw/ppc/e500.h b/hw/ppc/e500.h
index 

[PATCH v11 2/5] target/riscv: smstateen check for h/s/envcfg

2022-10-16 Thread Mayuresh Chitale
Accesses to henvcfg, henvcfgh and senvcfg are allowed only if the corresponding
bit in mstateen0/hstateen0 is enabled. Otherwise an illegal instruction trap is
generated.
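
As a reading aid, the predicate added below can be modelled outside QEMU as follows. Everything here is a hypothetical sketch (the struct, enums and function name are not QEMU API); only the bit position 62 is taken from SMSTATEEN0_HSENVCFG in patch 1. Note that the code in the diff returns a virtual instruction fault rather than an illegal instruction exception when the access is blocked by hstateen0 or sstateen0 in a guest, which the model reproduces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 62 gates the *envcfg CSRs (SMSTATEEN0_HSENVCFG in patch 1). */
#define HSENVCFG_BIT (1ULL << 62)

/* Hypothetical privilege encoding, used by this sketch only. */
enum priv { PRIV_U = 0, PRIV_S = 1, PRIV_M = 3 };

/* Hypothetical outcome type standing in for RISCVException. */
enum acc_result { ACC_OK, ACC_ILLEGAL_INST, ACC_VIRT_INST };

/* Hypothetical, flattened view of the state the predicate reads. */
struct acc_model {
    uint64_t mstateen0;
    uint64_t hstateen0;
    uint64_t sstateen0;
    bool has_smstateen;
    bool has_s_mode;
    bool virt;
    enum priv priv;
};

static enum acc_result stateen_acc_check(const struct acc_model *m,
                                         uint64_t bit)
{
    if (m->priv == PRIV_M || !m->has_smstateen) {
        return ACC_OK;
    }
    /* mstateen0 is checked first and always yields an illegal instruction. */
    if (!(m->mstateen0 & bit)) {
        return ACC_ILLEGAL_INST;
    }
    /* In a guest, hstateen0 (and sstateen0 for VU) yield a virtual fault. */
    if (m->virt) {
        if (!(m->hstateen0 & bit)) {
            return ACC_VIRT_INST;
        }
        if (m->priv == PRIV_U && !(m->sstateen0 & bit)) {
            return ACC_VIRT_INST;
        }
    }
    /* Outside a guest, U-mode is limited by sstateen0 when S-mode exists. */
    if (m->priv == PRIV_U && m->has_s_mode && !(m->sstateen0 & bit)) {
        return ACC_ILLEGAL_INST;
    }
    return ACC_OK;
}
```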

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Weiwei Li
Reviewed-by: Alistair Francis 
---
 target/riscv/csr.c | 87 ++
 1 file changed, 80 insertions(+), 7 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index c861424e85..71236f2b5d 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -41,6 +41,42 @@ void riscv_set_csr_ops(int csrno, riscv_csr_operations *ops)
 }
 
 /* Predicates */
+#if !defined(CONFIG_USER_ONLY)
+static RISCVException smstateen_acc_ok(CPURISCVState *env, int index,
+   uint64_t bit)
+{
+bool virt = riscv_cpu_virt_enabled(env);
+CPUState *cs = env_cpu(env);
+RISCVCPU *cpu = RISCV_CPU(cs);
+
+if (env->priv == PRV_M || !cpu->cfg.ext_smstateen) {
+return RISCV_EXCP_NONE;
+}
+
+if (!(env->mstateen[index] & bit)) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if (virt) {
+if (!(env->hstateen[index] & bit)) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+}
+
+if (env->priv == PRV_U && !(env->sstateen[index] & bit)) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+}
+}
+
+if (env->priv == PRV_U && riscv_has_ext(env, RVS)) {
+if (!(env->sstateen[index] & bit)) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+}
+
+return RISCV_EXCP_NONE;
+}
+#endif
+
 static RISCVException fs(CPURISCVState *env, int csrno)
 {
 #if !defined(CONFIG_USER_ONLY)
@@ -1874,6 +1910,13 @@ static RISCVException write_menvcfgh(CPURISCVState *env, int csrno,
 static RISCVException read_senvcfg(CPURISCVState *env, int csrno,
  target_ulong *val)
 {
+RISCVException ret;
+
+ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
+
 *val = env->senvcfg;
 return RISCV_EXCP_NONE;
 }
@@ -1882,15 +1925,27 @@ static RISCVException write_senvcfg(CPURISCVState *env, int csrno,
   target_ulong val)
 {
 uint64_t mask = SENVCFG_FIOM | SENVCFG_CBIE | SENVCFG_CBCFE | SENVCFG_CBZE;
+RISCVException ret;
 
-env->senvcfg = (env->senvcfg & ~mask) | (val & mask);
+ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
 
+env->senvcfg = (env->senvcfg & ~mask) | (val & mask);
 return RISCV_EXCP_NONE;
 }
 
 static RISCVException read_henvcfg(CPURISCVState *env, int csrno,
  target_ulong *val)
 {
+RISCVException ret;
+
+ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
+
 *val = env->henvcfg;
 return RISCV_EXCP_NONE;
 }
@@ -1899,6 +1954,12 @@ static RISCVException write_henvcfg(CPURISCVState *env, int csrno,
   target_ulong val)
 {
 uint64_t mask = HENVCFG_FIOM | HENVCFG_CBIE | HENVCFG_CBCFE | HENVCFG_CBZE;
+RISCVException ret;
+
+ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
 
 if (riscv_cpu_mxl(env) == MXL_RV64) {
 mask |= HENVCFG_PBMTE | HENVCFG_STCE;
@@ -1912,6 +1973,13 @@ static RISCVException write_henvcfg(CPURISCVState *env, int csrno,
 static RISCVException read_henvcfgh(CPURISCVState *env, int csrno,
  target_ulong *val)
 {
+RISCVException ret;
+
+ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
+
 *val = env->henvcfg >> 32;
 return RISCV_EXCP_NONE;
 }
@@ -1921,9 +1989,14 @@ static RISCVException write_henvcfgh(CPURISCVState *env, int csrno,
 {
 uint64_t mask = HENVCFG_PBMTE | HENVCFG_STCE;
 uint64_t valh = (uint64_t)val << 32;
+RISCVException ret;
 
-env->henvcfg = (env->henvcfg & ~mask) | (valh & mask);
+ret = smstateen_acc_ok(env, 0, SMSTATEEN0_HSENVCFG);
+if (ret != RISCV_EXCP_NONE) {
+return ret;
+}
 
+env->henvcfg = (env->henvcfg & ~mask) | (valh & mask);
 return RISCV_EXCP_NONE;
 }
 
@@ -1949,7 +2022,7 @@ static RISCVException write_mstateen(CPURISCVState *env, int csrno,
 static RISCVException write_mstateen0(CPURISCVState *env, int csrno,
   target_ulong new_val)
 {
-uint64_t wr_mask = SMSTATEEN_STATEEN;
+uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
 
 return write_mstateen(env, csrno, wr_mask, new_val);
 }
@@ -1984,7 +2057,7 @@ static RISCVException write_mstateenh(CPURISCVState *env, int csrno,
 static RISCVException write_mstateen0h(CPURISCVState *env, int csrno,
   target_ulong new_val)
 {
-uint64_t wr_mask = 

Re: [PATCH] tests/docker: Add flex/bison to `debian-hexagon-cross`

2022-10-16 Thread Alex Bennée


Anton Johansson  writes:

> debian-hexagon-cross contains two images, one to build the toolchain
> used for building the Hexagon tests themselves, and one image to build
> QEMU and run the tests.
>
> This commit adds flex/bison to the final image that builds QEMU so that
> it can also build idef-parser.
>
> Note: This container is not built by the CI and needs to be rebuilt and
> updated manually.

Queued to testing/next, thanks.

-- 
Alex Bennée



[PATCH v3 7/9] hw/ppc/e500: Implement pflash handling

2022-10-16 Thread Bernhard Beschow
Allows e500 boards to have their root file system reside on flash using
only builtin devices located in the eLBC memory region.

Note that the flash memory area is only created when a -pflash argument is
given, and that the size is determined by the given file. The idea is to
put users in control.
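
Since the flash size follows the image file, and the documentation added by this patch requires that size to be a power of two, a tool preparing an image might check or pad the size first. The helpers below are a hypothetical sketch of that arithmetic only; they are not part of QEMU or of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if size is a non-zero power of two (exactly one bit set). */
static bool size_is_pow2(uint64_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}

/*
 * Smallest power of two that is >= size, i.e. the length an image would
 * have to be padded to. Returns 1 for size 0 and 0 if the result would
 * not fit in 64 bits.
 */
static uint64_t round_up_pow2(uint64_t size)
{
    uint64_t p = 1;

    if (size > (UINT64_C(1) << 63)) {
        return 0;
    }
    while (p < size) {
        p <<= 1;
    }
    return p;
}
```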

Signed-off-by: Bernhard Beschow 
---
 docs/system/ppc/ppce500.rst | 16 ++
 hw/ppc/Kconfig  |  1 +
 hw/ppc/e500.c   | 62 +
 3 files changed, 79 insertions(+)

diff --git a/docs/system/ppc/ppce500.rst b/docs/system/ppc/ppce500.rst
index ba6bcb7314..99d2c680d6 100644
--- a/docs/system/ppc/ppce500.rst
+++ b/docs/system/ppc/ppce500.rst
@@ -165,3 +165,19 @@ if “-device eTSEC” is given to QEMU:
 .. code-block:: bash
 
   -netdev tap,ifname=tap0,script=no,downscript=no,id=net0 -device eTSEC,netdev=net0
+
+Root file system on flash drive
+---
+
+Rather than using a root file system on ram disk, it is possible to have it on
+CFI flash. Given an ext2 image whose size must be a power of two, it can be used
+as follows:
+
+.. code-block:: bash
+
+  $ qemu-system-ppc{64|32} -M ppce500 -cpu e500mc -smp 4 -m 2G \
+  -display none -serial stdio \
+  -kernel vmlinux \
+  -drive if=pflash,file=/path/to/rootfs.ext2,format=raw \
+  -append "rootwait root=/dev/mtdblock0"
+
diff --git a/hw/ppc/Kconfig b/hw/ppc/Kconfig
index 791fe78a50..769a1ead1c 100644
--- a/hw/ppc/Kconfig
+++ b/hw/ppc/Kconfig
@@ -126,6 +126,7 @@ config E500
 select ETSEC
 select GPIO_MPC8XXX
 select OPENPIC
+select PFLASH_CFI01
 select PLATFORM_BUS
 select PPCE500_PCI
 select SERIAL
diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c
index 3e950ea3ba..23d2c3451a 100644
--- a/hw/ppc/e500.c
+++ b/hw/ppc/e500.c
@@ -23,8 +23,10 @@
 #include "e500-ccsr.h"
 #include "net/net.h"
 #include "qemu/config-file.h"
+#include "hw/block/flash.h"
 #include "hw/char/serial.h"
 #include "hw/pci/pci.h"
+#include "sysemu/block-backend-io.h"
 #include "sysemu/sysemu.h"
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
@@ -267,6 +269,31 @@ static void sysbus_device_create_devtree(SysBusDevice *sbdev, void *opaque)
 }
 }
 
+static void create_devtree_flash(SysBusDevice *sbdev,
+ PlatformDevtreeData *data)
+{
+g_autofree char *name = NULL;
+uint64_t num_blocks = object_property_get_uint(OBJECT(sbdev),
+   "num-blocks",
+   &error_fatal);
+uint64_t sector_length = object_property_get_uint(OBJECT(sbdev),
+  "sector-length",
+  &error_fatal);
+uint64_t bank_width = object_property_get_uint(OBJECT(sbdev),
+   "width",
+   &error_fatal);
+hwaddr flashbase = 0;
+hwaddr flashsize = num_blocks * sector_length;
+void *fdt = data->fdt;
+
+name = g_strdup_printf("%s/nor@%" PRIx64, data->node, flashbase);
+qemu_fdt_add_subnode(fdt, name);
+qemu_fdt_setprop_string(fdt, name, "compatible", "cfi-flash");
+qemu_fdt_setprop_sized_cells(fdt, name, "reg",
+ 1, flashbase, 1, flashsize);
+qemu_fdt_setprop_cell(fdt, name, "bank-width", bank_width);
+}
+
 static void platform_bus_create_devtree(PPCE500MachineState *pms,
 void *fdt, const char *mpic)
 {
@@ -276,6 +303,8 @@ static void platform_bus_create_devtree(PPCE500MachineState *pms,
 uint64_t addr = pmc->platform_bus_base;
 uint64_t size = pmc->platform_bus_size;
 int irq_start = pmc->platform_bus_first_irq;
+SysBusDevice *sbdev;
+bool ambiguous;
 
 /* Create a /platform node that we can put all devices into */
 
@@ -302,6 +331,13 @@ static void platform_bus_create_devtree(PPCE500MachineState *pms,
 /* Loop through all dynamic sysbus devices and create nodes for them */
 foreach_dynamic_sysbus_device(sysbus_device_create_devtree, &data);
 
+sbdev = SYS_BUS_DEVICE(object_resolve_path_type("", TYPE_PFLASH_CFI01,
+&ambiguous));
+if (sbdev) {
+assert(!ambiguous);
+create_devtree_flash(sbdev, &data);
+}
+
 g_free(node);
 }
 
@@ -856,6 +892,7 @@ void ppce500_init(MachineState *machine)
 unsigned int pci_irq_nrs[PCI_NUM_PINS] = {1, 2, 3, 4};
 IrqLines *irqs;
 DeviceState *dev, *mpicdev;
+DriveInfo *dinfo;
 CPUPPCState *firstenv = NULL;
 MemoryRegion *ccsr_addr_space;
 SysBusDevice *s;
@@ -1024,6 +1061,31 @@ void ppce500_init(MachineState *machine)
 pmc->platform_bus_base,
 >pbus_dev->mmio);
 
+dinfo = drive_get(IF_PFLASH, 0, 0);
+if (dinfo) {
+BlockBackend *blk = blk_by_legacy_dinfo(dinfo);
+

[PATCH v11 0/5] RISC-V Smstateen support

2022-10-16 Thread Mayuresh Chitale
This series adds support for the Smstateen specification which provides a
mechanism to plug the potential covert channels which are opened by extensions
that add to processor state that may not get context-switched. Currently access
to *envcfg registers and floating point (fcsr) is controlled via smstateen.

These patches can also be found on riscv_smstateen_v11 branch at:
https://github.com/mdchitale/qemu.git

Changes in v11:
- Rebase to latest riscv-to-apply.next
- Set virt_inst_excp at the beginning of decode_opc
- Add reviewed by in patch 4

Changes in v10:
- Add support to generate virt instruction exception after decode failure.
  Use this change for smstateen fcsr failure when virt is enabled.
- Implement single write function for *smstateen1 to *smstateen3 registers.

Changes in v9:
- Rebase to latest riscv-to-apply.next
- Add reviewed by in patches 2 and 4

Changes in v8:
- Rebase to latest riscv-to-apply.next
- Fix m-mode check for hstateen
- Fix return exception type for VU mode
- Improve commit description for patch3

Changes in v7:
- Update smstateen check as per discussion on the following issue:
  https://github.com/riscv/riscv-state-enable/issues/9
- Drop the smstateen AIA patch for now.
- Indentation and other fixes

Changes in v6:
- Sync with latest riscv-to-apply.next
- Make separate read/write ops for m/h/s/stateen1/2/3 regs
- Add check for mstateen.stateen when reading or using h/s/stateen regs
- Add smstateen fcsr check for all floating point operations
- Move knobs to enable smstateen in a separate patch.

Changes in v5:
- Fix the order in which smstateen extension is added to the
  isa_edata_arr as described in rule #3 of the comment.

Changes in v4:
- Fix build issue with riscv32/riscv64-linux-user targets

Changes in v3:
- Fix coding style issues
- Fix *stateen0h index calculation

Changes in v2:
- Make h/s/envcfg bits in m/h/stateen registers writeable by default.

Mayuresh Chitale (5):
  target/riscv: Add smstateen support
  target/riscv: smstateen check for h/s/envcfg
  target/riscv: generate virtual instruction exception
  target/riscv: smstateen check for fcsr
  target/riscv: smstateen knobs

 target/riscv/cpu.c|   2 +
 target/riscv/cpu.h|   4 +
 target/riscv/cpu_bits.h   |  37 ++
 target/riscv/csr.c| 414 +-
 target/riscv/insn_trans/trans_rvf.c.inc   |  43 ++-
 target/riscv/insn_trans/trans_rvzfh.c.inc |  12 +
 target/riscv/machine.c|  21 ++
 target/riscv/translate.c  |   8 +-
 8 files changed, 536 insertions(+), 5 deletions(-)

-- 
2.25.1




[PATCH v11 1/5] target/riscv: Add smstateen support

2022-10-16 Thread Mayuresh Chitale
Smstateen extension specifies a mechanism to close
the potential covert channels that could cause security issues.

This patch adds the CSRs defined in the specification and
the corresponding predicates and read/write functions.
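
The CSR numbers defined below are laid out so that a register number maps to an array index by subtracting the base of its group, which is how the predicates in this patch index env->mstateen[]. A minimal sketch for the machine-level group follows; the function name is hypothetical, while the CSR numbers are the ones added to cpu_bits.h in this patch.

```c
/* CSR numbers as added to target/riscv/cpu_bits.h by this patch. */
#define CSR_MSTATEEN0   0x30C
#define CSR_MSTATEEN3   0x30F
#define CSR_MSTATEEN0H  0x31C
#define CSR_MSTATEEN3H  0x31F

/*
 * Hypothetical helper: map an mstateenN or mstateenNh CSR number to the
 * index N (0..3) of the backing 64-bit array, or -1 for any other CSR.
 */
static int mstateen_index(int csrno)
{
    if (csrno >= CSR_MSTATEEN0 && csrno <= CSR_MSTATEEN3) {
        return csrno - CSR_MSTATEEN0;
    }
    if (csrno >= CSR_MSTATEEN0H && csrno <= CSR_MSTATEEN3H) {
        return csrno - CSR_MSTATEEN0H;
    }
    return -1;
}
```

The same subtraction works for the hstateen and sstateen groups with their own base numbers.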

Signed-off-by: Mayuresh Chitale 
Reviewed-by: Weiwei Li 
---
 target/riscv/cpu.h  |   4 +
 target/riscv/cpu_bits.h |  37 +
 target/riscv/csr.c  | 316 
 target/riscv/machine.c  |  21 +++
 4 files changed, 378 insertions(+)

diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 3a9e25053f..040ed13675 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -366,6 +366,9 @@ struct CPUArchState {
 
 /* CSRs for execution enviornment configuration */
 uint64_t menvcfg;
+uint64_t mstateen[SMSTATEEN_MAX_COUNT];
+uint64_t hstateen[SMSTATEEN_MAX_COUNT];
+uint64_t sstateen[SMSTATEEN_MAX_COUNT];
 target_ulong senvcfg;
 uint64_t henvcfg;
 #endif
@@ -441,6 +444,7 @@ struct RISCVCPUConfig {
 bool ext_ifencei;
 bool ext_icsr;
 bool ext_zihintpause;
+bool ext_smstateen;
 bool ext_sstc;
 bool ext_svinval;
 bool ext_svnapot;
diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index d8f5f0abed..8b0d7e20ea 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -197,6 +197,12 @@
 /* Supervisor Configuration CSRs */
 #define CSR_SENVCFG 0x10A
 
+/* Supervisor state CSRs */
+#define CSR_SSTATEEN0   0x10C
+#define CSR_SSTATEEN1   0x10D
+#define CSR_SSTATEEN2   0x10E
+#define CSR_SSTATEEN3   0x10F
+
 /* Supervisor Trap Handling */
 #define CSR_SSCRATCH0x140
 #define CSR_SEPC0x141
@@ -244,6 +250,16 @@
 #define CSR_HENVCFG 0x60A
 #define CSR_HENVCFGH0x61A
 
+/* Hypervisor state CSRs */
+#define CSR_HSTATEEN0   0x60C
+#define CSR_HSTATEEN0H  0x61C
+#define CSR_HSTATEEN1   0x60D
+#define CSR_HSTATEEN1H  0x61D
+#define CSR_HSTATEEN2   0x60E
+#define CSR_HSTATEEN2H  0x61E
+#define CSR_HSTATEEN3   0x60F
+#define CSR_HSTATEEN3H  0x61F
+
 /* Virtual CSRs */
 #define CSR_VSSTATUS0x200
 #define CSR_VSIE0x204
@@ -289,6 +305,27 @@
 #define CSR_MENVCFG 0x30A
 #define CSR_MENVCFGH0x31A
 
+/* Machine state CSRs */
+#define CSR_MSTATEEN0   0x30C
+#define CSR_MSTATEEN0H  0x31C
+#define CSR_MSTATEEN1   0x30D
+#define CSR_MSTATEEN1H  0x31D
+#define CSR_MSTATEEN2   0x30E
+#define CSR_MSTATEEN2H  0x31E
+#define CSR_MSTATEEN3   0x30F
+#define CSR_MSTATEEN3H  0x31F
+
+/* Common defines for all smstateen */
+#define SMSTATEEN_MAX_COUNT 4
+#define SMSTATEEN0_CS   (1ULL << 0)
+#define SMSTATEEN0_FCSR (1ULL << 1)
+#define SMSTATEEN0_HSCONTXT (1ULL << 57)
+#define SMSTATEEN0_IMSIC(1ULL << 58)
+#define SMSTATEEN0_AIA  (1ULL << 59)
+#define SMSTATEEN0_SVSLCT   (1ULL << 60)
+#define SMSTATEEN0_HSENVCFG (1ULL << 62)
+#define SMSTATEEN_STATEEN   (1ULL << 63)
+
 /* Enhanced Physical Memory Protection (ePMP) */
 #define CSR_MSECCFG 0x747
 #define CSR_MSECCFGH0x757
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 5c9a7ee287..c861424e85 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -283,6 +283,72 @@ static RISCVException umode32(CPURISCVState *env, int csrno)
 return umode(env, csrno);
 }
 
+static RISCVException mstateen(CPURISCVState *env, int csrno)
+{
+CPUState *cs = env_cpu(env);
+RISCVCPU *cpu = RISCV_CPU(cs);
+
+if (!cpu->cfg.ext_smstateen) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+return any(env, csrno);
+}
+
+static RISCVException hstateen_pred(CPURISCVState *env, int csrno, int base)
+{
+CPUState *cs = env_cpu(env);
+RISCVCPU *cpu = RISCV_CPU(cs);
+
+if (!cpu->cfg.ext_smstateen) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if (env->priv < PRV_M) {
+if (!(env->mstateen[csrno - base] & SMSTATEEN_STATEEN)) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+}
+
+return hmode(env, csrno);
+}
+
+static RISCVException hstateen(CPURISCVState *env, int csrno)
+{
+return hstateen_pred(env, csrno, CSR_HSTATEEN0);
+}
+
+static RISCVException hstateenh(CPURISCVState *env, int csrno)
+{
+return hstateen_pred(env, csrno, CSR_HSTATEEN0H);
+}
+
+static RISCVException sstateen(CPURISCVState *env, int csrno)
+{
+bool virt = riscv_cpu_virt_enabled(env);
+int index = csrno - CSR_SSTATEEN0;
+CPUState *cs = env_cpu(env);
+RISCVCPU *cpu = RISCV_CPU(cs);
+
+if (!cpu->cfg.ext_smstateen) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if (env->priv < PRV_M) {
+if (!(env->mstateen[index] & SMSTATEEN_STATEEN)) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if (virt) {
+if (!(env->hstateen[index] & SMSTATEEN_STATEEN)) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+}
+}
+}
+
+return smode(env, csrno);
+}
+
 /* 

Re: [PATCH v2 13/13] hw/ppc/e500: Add Freescale eSDHC to e500 boards

2022-10-16 Thread Bernhard Beschow
On 3 October 2022 21:06:57 UTC, "Philippe Mathieu-Daudé" wrote:
>On 3/10/22 22:31, Bernhard Beschow wrote:
>> Adds missing functionality to emulated e500 SOCs which increases the
>> chance of given "real" firmware images to access SD cards.
>> 
>> Signed-off-by: Bernhard Beschow 
>> ---
>>   docs/system/ppc/ppce500.rst | 13 +
>>   hw/ppc/Kconfig  |  1 +
>>   hw/ppc/e500.c   | 31 ++-
>>   3 files changed, 44 insertions(+), 1 deletion(-)
>
>> +static void dt_sdhc_create(void *fdt, const char *parent, const char *mpic)
>> +{
>> +hwaddr mmio = MPC85XX_ESDHC_REGS_OFFSET;
>> +hwaddr size = MPC85XX_ESDHC_REGS_SIZE;
>> +int irq = MPC85XX_ESDHC_IRQ;
>
>Why not pass these 3 variable as argument?

Besides looking for a way to derive these parameters from QOM properties, I
wanted to keep the code consistent with the existing code, e.g. dt_i2c_create().
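
For illustration, a parameterised variant along the lines of the suggestion could build the node name from its arguments as sketched below. This is a hypothetical stand-alone helper using plain snprintf() rather than g_strdup_printf(), and the parent path in the usage note is only an example value.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "<parent>/sdhc@<mmio in hex>" into buf.
 * Returns the snprintf() result, i.e. the length the full name needs.
 */
static int sdhc_node_name(char *buf, size_t len, const char *parent,
                          uint64_t mmio)
{
    return snprintf(buf, len, "%s/sdhc@%" PRIx64, parent, mmio);
}
```

With a parent of "/soc@e0000000" and the MPC85XX_ESDHC_REGS_OFFSET value 0x2e000, this yields "/soc@e0000000/sdhc@2e000".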

Best regards,
Bernhard
>
>> +g_autofree char *name = NULL;
>> +
>> +name = g_strdup_printf("%s/sdhc@%" PRIx64, parent, mmio);
>> +qemu_fdt_add_subnode(fdt, name);
>> +qemu_fdt_setprop(fdt, name, "sdhci,auto-cmd12", NULL, 0);
>> +qemu_fdt_setprop_phandle(fdt, name, "interrupt-parent", mpic);
>> +qemu_fdt_setprop_cells(fdt, name, "bus-width", 4);
>> +qemu_fdt_setprop_cells(fdt, name, "interrupts", irq, 0x2);
>> +qemu_fdt_setprop_cells(fdt, name, "reg", mmio, size);
>> +qemu_fdt_setprop_string(fdt, name, "compatible", "fsl,esdhc");
>> +}
>> typedef struct PlatformDevtreeData {
>>   void *fdt;
>> @@ -553,6 +573,8 @@ static int ppce500_load_device_tree(PPCE500MachineState *pms,
>> dt_rtc_create(fdt, "i2c", "rtc");
>>   +/* sdhc */
>> +dt_sdhc_create(fdt, soc, mpic);
>>   



[PATCH v3 6/9] hw/sd/sdhci: Rename ESDHC_* defines to USDHC_*

2022-10-16 Thread Bernhard Beschow
The device model's functions start with "usdhc_", so rename the defines
accordingly for consistency.

Signed-off-by: Bernhard Beschow 
Reviewed-by: Bin Meng 
---
 hw/sd/sdhci.c | 66 +--
 1 file changed, 33 insertions(+), 33 deletions(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 6da5e2c781..306070c872 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1577,24 +1577,24 @@ static const TypeInfo sdhci_bus_info = {
 
 /* --- qdev i.MX eSDHC --- */
 
-#define ESDHC_MIX_CTRL  0x48
+#define USDHC_MIX_CTRL  0x48
 
-#define ESDHC_VENDOR_SPEC   0xc0
-#define ESDHC_IMX_FRC_SDCLK_ON  (1 << 8)
+#define USDHC_VENDOR_SPEC   0xc0
+#define USDHC_IMX_FRC_SDCLK_ON  (1 << 8)
 
-#define ESDHC_DLL_CTRL  0x60
+#define USDHC_DLL_CTRL  0x60
 
-#define ESDHC_TUNING_CTRL   0xcc
-#define ESDHC_TUNE_CTRL_STATUS  0x68
-#define ESDHC_WTMK_LVL  0x44
+#define USDHC_TUNING_CTRL   0xcc
+#define USDHC_TUNE_CTRL_STATUS  0x68
+#define USDHC_WTMK_LVL  0x44
 
 /* Undocumented register used by guests working around erratum ERR004536 */
-#define ESDHC_UNDOCUMENTED_REG270x6c
+#define USDHC_UNDOCUMENTED_REG270x6c
 
-#define ESDHC_CTRL_4BITBUS  (0x1 << 1)
-#define ESDHC_CTRL_8BITBUS  (0x2 << 1)
+#define USDHC_CTRL_4BITBUS  (0x1 << 1)
+#define USDHC_CTRL_8BITBUS  (0x2 << 1)
 
-#define ESDHC_PRNSTS_SDSTB  (1 << 3)
+#define USDHC_PRNSTS_SDSTB  (1 << 3)
 
 static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
 {
@@ -1615,11 +1615,11 @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
 hostctl1 = SDHC_DMA_TYPE(s->hostctl1) << (8 - 3);
 
 if (s->hostctl1 & SDHC_CTRL_8BITBUS) {
-hostctl1 |= ESDHC_CTRL_8BITBUS;
+hostctl1 |= USDHC_CTRL_8BITBUS;
 }
 
 if (s->hostctl1 & SDHC_CTRL_4BITBUS) {
-hostctl1 |= ESDHC_CTRL_4BITBUS;
+hostctl1 |= USDHC_CTRL_4BITBUS;
 }
 
 ret  = hostctl1;
@@ -1630,21 +1630,21 @@ static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
 
 case SDHC_PRNSTS:
 /* Add SDSTB (SD Clock Stable) bit to PRNSTS */
-ret = sdhci_read(opaque, offset, size) & ~ESDHC_PRNSTS_SDSTB;
+ret = sdhci_read(opaque, offset, size) & ~USDHC_PRNSTS_SDSTB;
 if (s->clkcon & SDHC_CLOCK_INT_STABLE) {
-ret |= ESDHC_PRNSTS_SDSTB;
+ret |= USDHC_PRNSTS_SDSTB;
 }
 break;
 
-case ESDHC_VENDOR_SPEC:
+case USDHC_VENDOR_SPEC:
 ret = s->vendor_spec;
 break;
-case ESDHC_DLL_CTRL:
-case ESDHC_TUNE_CTRL_STATUS:
-case ESDHC_UNDOCUMENTED_REG27:
-case ESDHC_TUNING_CTRL:
-case ESDHC_MIX_CTRL:
-case ESDHC_WTMK_LVL:
+case USDHC_DLL_CTRL:
+case USDHC_TUNE_CTRL_STATUS:
+case USDHC_UNDOCUMENTED_REG27:
+case USDHC_TUNING_CTRL:
+case USDHC_MIX_CTRL:
+case USDHC_WTMK_LVL:
 ret = 0;
 break;
 }
@@ -1660,18 +1660,18 @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
 uint32_t value = (uint32_t)val;
 
 switch (offset) {
-case ESDHC_DLL_CTRL:
-case ESDHC_TUNE_CTRL_STATUS:
-case ESDHC_UNDOCUMENTED_REG27:
-case ESDHC_TUNING_CTRL:
-case ESDHC_WTMK_LVL:
+case USDHC_DLL_CTRL:
+case USDHC_TUNE_CTRL_STATUS:
+case USDHC_UNDOCUMENTED_REG27:
+case USDHC_TUNING_CTRL:
+case USDHC_WTMK_LVL:
 break;
 
-case ESDHC_VENDOR_SPEC:
+case USDHC_VENDOR_SPEC:
 s->vendor_spec = value;
 switch (s->vendor) {
 case SDHCI_VENDOR_IMX:
-if (value & ESDHC_IMX_FRC_SDCLK_ON) {
+if (value & USDHC_IMX_FRC_SDCLK_ON) {
 s->prnsts &= ~SDHC_IMX_CLOCK_GATE_OFF;
 } else {
 s->prnsts |= SDHC_IMX_CLOCK_GATE_OFF;
@@ -1740,12 +1740,12 @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
  * Second, split "Data Transfer Width" from bits 2 and 1 in to
  * bits 5 and 1
  */
-if (value & ESDHC_CTRL_8BITBUS) {
+if (value & USDHC_CTRL_8BITBUS) {
 hostctl1 |= SDHC_CTRL_8BITBUS;
 }
 
-if (value & ESDHC_CTRL_4BITBUS) {
-hostctl1 |= ESDHC_CTRL_4BITBUS;
+if (value & USDHC_CTRL_4BITBUS) {
+hostctl1 |= USDHC_CTRL_4BITBUS;
 }
 
 /*
@@ -1768,7 +1768,7 @@ usdhc_write(void *opaque, hwaddr offset, uint64_t val, unsigned size)
 sdhci_write(opaque, offset, value, size);
 break;
 
-case ESDHC_MIX_CTRL:
+case USDHC_MIX_CTRL:
 /*
  * So, when SD/MMC stack in Linux tries to write to "Transfer
  * Mode Register", ESDHC i.MX quirk code will translate it
-- 
2.38.0
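
Side note for readers of the rename above: the touched hunks translate the bus-width bits between the i.MX uSDHC register layout and the generic SDHCI layout. The following is a minimal standalone sketch of that translation, not the QEMU code itself. The USDHC_* values come from the patch; the SDHC_CTRL_* values are assumptions derived from the "bits 5 and 1" comment, and both helper names are hypothetical.

```c
#include <stdint.h>

/* uSDHC (i.MX) "Data Transfer Width" encoding, as defined in the patch */
#define USDHC_CTRL_4BITBUS  (0x1 << 1)
#define USDHC_CTRL_8BITBUS  (0x2 << 1)

/* Generic SDHCI encoding; assumed from the "bits 5 and 1" comment */
#define SDHC_CTRL_4BITBUS   (1 << 1)
#define SDHC_CTRL_8BITBUS   (1 << 5)

/* Hypothetical helper: guest-written uSDHC value -> generic SDHCI bits */
static uint32_t usdhc_to_sdhc_width(uint32_t value)
{
    uint32_t hostctl1 = 0;

    if (value & USDHC_CTRL_8BITBUS) {
        hostctl1 |= SDHC_CTRL_8BITBUS;
    }
    if (value & USDHC_CTRL_4BITBUS) {
        hostctl1 |= SDHC_CTRL_4BITBUS;
    }
    return hostctl1;
}

/* Hypothetical helper: generic SDHCI bits -> value the guest reads back */
static uint32_t sdhc_to_usdhc_width(uint32_t hostctl1)
{
    uint32_t value = 0;

    if (hostctl1 & SDHC_CTRL_8BITBUS) {
        value |= USDHC_CTRL_8BITBUS;
    }
    if (hostctl1 & SDHC_CTRL_4BITBUS) {
        value |= USDHC_CTRL_4BITBUS;
    }
    return value;
}
```

Under these assumptions the 4-bit flag keeps its position (bit 1) in both layouts, while the 8-bit flag moves between bit 2 and bit 5.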




Re: [PATCH v3 0/9] ppc/e500: Add support for two types of flash, cleanup

2022-10-16 Thread Bernhard Beschow
On 16 October 2022 at 12:27:28 UTC, Bernhard Beschow wrote:
>Cover letter:
>~
>
>This series adds support for -pflash and direct SD card access to the
>PPC e500 boards. The idea is to increase compatibility with "real" firmware
>images where only the bare minimum of drivers is compiled in.
>
>The series is structured as follows:
>
>Patches 1-6 perform some general cleanup which paves the way for the rest of
>the series.
>
>Patch 7 adds -pflash handling where memory-mapped flash can be added on
>user's behalf. That is, the flash memory region in the eLBC is only added if
>the -pflash argument is supplied. Note that the cfi01 device model becomes
>stricter in checking the size of the emulated flash space.
>
>Patches 8 and 9 add a new device model - the Freescale eSDHC - to the e500
>boards which was missing so far.
>
>User documentation is also added as the new features become available.
>
>Testing done:
>* `qemu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
>"console=ttyS0 rootwait root=/dev/mtdblock0 nokaslr" -drive
>if=pflash,file=rootfs.ext2,format=raw`
>* `qemu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
>"console=ttyS0 rootwait root=/dev/mmcblk0" -device sd-card,drive=mydrive -drive
>id=mydrive,if=none,file=rootfs.ext2,format=raw`
>
>The load was created using latest Buildroot with `make
>qemu_ppc_e500mc_defconfig` where the rootfs was configured to be of ext2 type.
>In both cases it was possible to log in and explore the root file system.
>
>v3:
>~~~
>Phil:
>- Also add power-of-2 fix to pflash_cfi02
>- Resolve cfi01-specific assertion in e500 code
>- Resolve unused define in eSDHC device model
>- Resolve redundant alignment checks in eSDHC device model
>
>Bin:
>- Add dedicated flash chapter to documentation
>
>Bernhard:
>- Use is_power_of_2() instead of ctpop64() for better readability
>- Only instantiate eSDHC device model in ppce500 (not used in MPC8544DS)
>- Rebase onto gitlab.com/danielhb/qemu/tree/ppc-next
- Move cfi0x memory region setup into board code to avoid cfi01-specific assertion there
- While at it, resolve unreachable code related to cfi01 device creation
- Reorder patches such that trivial patches come first

Best regards,
Bernhard

>
>v2:
>~~~
>Bin:
>- Add source for MPC8544DS platform bus' memory map in commit message.
>- Keep "ESDHC" in comment referring to Linux driver.
>- Use "qemu-system-ppc{64|32}" in documentation.
>- Use g_autofree in device tree code.
>- Remove unneeded device tree properties.
>- Error out if pflash size doesn't fit into eLBC memory window.
>- Remove unused ESDHC defines.
>- Define macro ESDHC_WML for register offset with magic constant.
>- Fix some whitespace issues when adding eSDHC device to e500.
>
>Phil:
>- Fix tense in commit message.
>
>Bernhard Beschow (9):
>  hw/block/pflash_cfi0{1,2}: Error out if device length isn't a power of
>two
>  hw/{arm,ppc}: Resolve unreachable code
>  hw/block/pflash_cfi01: Attach memory region in boards
>  hw/block/pflash_cfi02: Attach memory region in boards
>  hw/sd/sdhci-internal: Unexport ESDHC defines
>  hw/sd/sdhci: Rename ESDHC_* defines to USDHC_*
>  hw/ppc/e500: Implement pflash handling
>  hw/sd/sdhci: Implement Freescale eSDHC device model
>  hw/ppc/e500: Add Freescale eSDHC to e500plat
>
> docs/system/ppc/ppce500.rst  |  28
> hw/arm/collie.c  |  20 ++-
> hw/arm/digic_boards.c|  16 +-
> hw/arm/gumstix.c |  24 +--
> hw/arm/mainstone.c   |  15 +-
> hw/arm/musicpal.c|  15 +-
> hw/arm/omap_sx1.c|  25 ++--
> hw/arm/versatilepb.c |  14 +-
> hw/arm/xilinx_zynq.c |  12 +-
> hw/arm/z2.c  |  12 +-
> hw/block/pflash_cfi01.c  |  12 +-
> hw/block/pflash_cfi02.c  |  14 +-
> hw/microblaze/petalogix_ml605_mmu.c  |  16 +-
> hw/microblaze/petalogix_s3adsp1800_mmu.c |  10 +-
> hw/mips/malta.c  |   4 +-
> hw/ppc/Kconfig   |   2 +
> hw/ppc/e500.c|  97 +++-
> hw/ppc/e500.h|   1 +
> hw/ppc/e500plat.c|   1 +
> hw/ppc/sam460ex.c|  19 ++-
> hw/ppc/virtex_ml507.c|   5 +-
> hw/sd/sdhci-internal.h   |  20 ---
> hw/sd/sdhci.c| 183 ---
> hw/sh4/r2d.c |  11 +-
> include/hw/block/flash.h |   7 +-
> include/hw/sd/sdhci.h|   3 +
> 26 files changed, 433 insertions(+), 153 deletions(-)
>
>-- 
>2.38.0
>




[PATCH v3 8/9] hw/sd/sdhci: Implement Freescale eSDHC device model

2022-10-16 Thread Bernhard Beschow
Will allow e500 boards to access SD cards using just their own devices.

Signed-off-by: Bernhard Beschow 
---
 hw/sd/sdhci.c | 120 +-
 include/hw/sd/sdhci.h |   3 ++
 2 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 306070c872..8d8ad9ff24 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1369,6 +1369,7 @@ void sdhci_initfn(SDHCIState *s)
 s->transfer_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, sdhci_data_transfer, s);
 
 s->io_ops = &sdhci_mmio_ops;
+s->io_registers_map_size = SDHC_REGISTERS_MAP_SIZE;
 }
 
 void sdhci_uninitfn(SDHCIState *s)
@@ -1392,7 +1393,7 @@ void sdhci_common_realize(SDHCIState *s, Error **errp)
 s->fifo_buffer = g_malloc0(s->buf_maxsz);
 
 memory_region_init_io(&s->iomem, OBJECT(s), s->io_ops, s, "sdhci",
-  SDHC_REGISTERS_MAP_SIZE);
+  s->io_registers_map_size);
 }
 
 void sdhci_common_unrealize(SDHCIState *s)
@@ -1575,6 +1576,122 @@ static const TypeInfo sdhci_bus_info = {
 .class_init = sdhci_bus_class_init,
 };
 
+/* --- qdev Freescale eSDHC --- */
+
+/* Watermark Level Register */
+#define ESDHC_WML  0x44
+
+/* Control Register for DMA transfer */
+#define ESDHC_DMA_SYSCTL  0x40c
+
+#define ESDHC_REGISTERS_MAP_SIZE  0x410
+
+static uint64_t esdhci_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint64_t ret;
+
+switch (offset) {
+case SDHC_SYSAD:
+case SDHC_BLKSIZE:
+case SDHC_ARGUMENT:
+case SDHC_TRNMOD:
+case SDHC_RSPREG0:
+case SDHC_RSPREG1:
+case SDHC_RSPREG2:
+case SDHC_RSPREG3:
+case SDHC_BDATA:
+case SDHC_PRNSTS:
+case SDHC_HOSTCTL:
+case SDHC_CLKCON:
+case SDHC_NORINTSTS:
+case SDHC_NORINTSTSEN:
+case SDHC_NORINTSIGEN:
+case SDHC_ACMD12ERRSTS:
+case SDHC_CAPAB:
+case SDHC_SLOT_INT_STATUS:
+ret = sdhci_read(opaque, offset, size);
+break;
+
+case ESDHC_WML:
+case ESDHC_DMA_SYSCTL:
+ret = 0;
+qemu_log_mask(LOG_UNIMP, "ESDHC rd @0x%02" HWADDR_PRIx
+  " not implemented\n", offset);
+break;
+
+default:
+ret = 0;
+qemu_log_mask(LOG_GUEST_ERROR, "ESDHC rd @0x%02" HWADDR_PRIx
+  " unknown offset\n", offset);
+break;
+}
+
+return ret;
+}
+
+static void esdhci_write(void *opaque, hwaddr offset, uint64_t val,
+ unsigned size)
+{
+switch (offset) {
+case SDHC_SYSAD:
+case SDHC_BLKSIZE:
+case SDHC_ARGUMENT:
+case SDHC_TRNMOD:
+case SDHC_BDATA:
+case SDHC_HOSTCTL:
+case SDHC_CLKCON:
+case SDHC_NORINTSTS:
+case SDHC_NORINTSTSEN:
+case SDHC_NORINTSIGEN:
+case SDHC_FEAER:
+sdhci_write(opaque, offset, val, size);
+break;
+
+case ESDHC_WML:
+case ESDHC_DMA_SYSCTL:
+qemu_log_mask(LOG_UNIMP, "ESDHC wr @0x%02" HWADDR_PRIx " <- 0x%08" PRIx64
+  " not implemented\n", offset, val);
+break;
+
+default:
+qemu_log_mask(LOG_GUEST_ERROR, "ESDHC wr @0x%02" HWADDR_PRIx
+  " <- 0x%08" PRIx64 " unknown offset\n", offset, val);
+break;
+}
+}
+
+static const MemoryRegionOps esdhc_mmio_ops = {
+.read = esdhci_read,
+.write = esdhci_write,
+.valid = {
+.min_access_size = 4,
+.max_access_size = 4,
+.unaligned = false
+},
+.endianness = DEVICE_BIG_ENDIAN,
+};
+
+static void esdhci_init(Object *obj)
+{
+DeviceState *dev = DEVICE(obj);
+SDHCIState *s = SYSBUS_SDHCI(obj);
+
+s->io_ops = &esdhc_mmio_ops;
+s->io_registers_map_size = ESDHC_REGISTERS_MAP_SIZE;
+
+/*
+ * Compatible with:
+ * - SD Host Controller Specification Version 2.0 Part A2
+ */
+qdev_prop_set_uint8(dev, "sd-spec-version", 2);
+}
+
+static const TypeInfo esdhc_info = {
+.name = TYPE_FSL_ESDHC,
+.parent = TYPE_SYSBUS_SDHCI,
+.instance_init = esdhci_init,
+};
+
 /* --- qdev i.MX eSDHC --- */
 
 #define USDHC_MIX_CTRL  0x48
@@ -1907,6 +2024,7 @@ static void sdhci_register_types(void)
 {
 type_register_static(&sdhci_sysbus_info);
 type_register_static(&sdhci_bus_info);
+type_register_static(&esdhc_info);
 type_register_static(&imx_usdhc_info);
 type_register_static(&sdhci_s3c_info);
 }
diff --git a/include/hw/sd/sdhci.h b/include/hw/sd/sdhci.h
index 01a64c5442..5b32e83eee 100644
--- a/include/hw/sd/sdhci.h
+++ b/include/hw/sd/sdhci.h
@@ -45,6 +45,7 @@ struct SDHCIState {
 AddressSpace *dma_as;
 MemoryRegion *dma_mr;
 const MemoryRegionOps *io_ops;
+uint64_t io_registers_map_size;
 
 QEMUTimer *insert_timer;   /* timer for 'changing' sd card. */
 QEMUTimer *transfer_timer;
@@ -122,6 +123,8 @@ DECLARE_INSTANCE_CHECKER(SDHCIState, PCI_SDHCI,
 DECLARE_INSTANCE_CHECKER(SDHCIState, SYSBUS_SDHCI,
  TYPE_SYSBUS_SDHCI)
 
+#define TYPE_FSL_ESDHC "fsl-esdhc"
+
 

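Side note on the handlers in the (truncated) patch above: both esdhci_read() and esdhci_write() sort each offset into one of three classes: forwarded to the generic SDHCI code, known but unimplemented, or unknown. The following standalone sketch shows only that classification. The enum and function names are hypothetical, and just three of the forwarded registers are listed, with offsets assumed from the standard SD Host Controller register map.

```c
#include <stdint.h>

/* eSDHC-specific registers, offsets as defined in the patch */
#define ESDHC_WML         0x44
#define ESDHC_DMA_SYSCTL  0x40c

/* Three standard SDHCI registers (assumed offsets; the patch forwards more) */
#define SDHC_SYSAD        0x00
#define SDHC_BLKSIZE      0x04
#define SDHC_ARGUMENT     0x08

/* Hypothetical classification of a register access */
enum esdhc_class {
    ESDHC_FORWARDED,      /* passed on to the generic SDHCI handler */
    ESDHC_UNIMPLEMENTED,  /* known register, would be logged as LOG_UNIMP */
    ESDHC_UNKNOWN         /* anything else, would be logged as LOG_GUEST_ERROR */
};

static enum esdhc_class esdhc_classify(uint64_t offset)
{
    switch (offset) {
    case SDHC_SYSAD:
    case SDHC_BLKSIZE:
    case SDHC_ARGUMENT:
        return ESDHC_FORWARDED;

    case ESDHC_WML:
    case ESDHC_DMA_SYSCTL:
        return ESDHC_UNIMPLEMENTED;

    default:
        return ESDHC_UNKNOWN;
    }
}
```

Keeping unimplemented and unknown offsets apart lets the log distinguish gaps in the device model from guest accesses to offsets the model does not know about.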
[PATCH v3 2/9] hw/{arm,ppc}: Resolve unreachable code

2022-10-16 Thread Bernhard Beschow
pflash_cfi01_register() always returns with a non-NULL pointer (otherwise
it would crash internally). Therefore, the bodies of the if-statements
are unreachable.

Signed-off-by: Bernhard Beschow 
---
 hw/arm/gumstix.c | 18 ++
 hw/arm/mainstone.c   | 13 +
 hw/arm/omap_sx1.c| 22 --
 hw/arm/versatilepb.c |  6 ++
 hw/arm/z2.c  |  9 +++--
 hw/ppc/sam460ex.c| 12 
 6 files changed, 28 insertions(+), 52 deletions(-)

diff --git a/hw/arm/gumstix.c b/hw/arm/gumstix.c
index 3a4bc332c4..1296628ed9 100644
--- a/hw/arm/gumstix.c
+++ b/hw/arm/gumstix.c
@@ -65,12 +65,9 @@ static void connex_init(MachineState *machine)
 exit(1);
 }
 
-if (!pflash_cfi01_register(0x, "connext.rom", connex_rom,
-   dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-   sector_len, 2, 0, 0, 0, 0, 0)) {
-error_report("Error registering flash memory");
-exit(1);
-}
+pflash_cfi01_register(0x, "connext.rom", connex_rom,
+  dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
+  sector_len, 2, 0, 0, 0, 0, 0);
 
 /* Interrupt line of NIC is connected to GPIO line 36 */
 smc91c111_init(&nd_table[0], 0x04000300,
@@ -95,12 +92,9 @@ static void verdex_init(MachineState *machine)
 exit(1);
 }
 
-if (!pflash_cfi01_register(0x, "verdex.rom", verdex_rom,
-   dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-   sector_len, 2, 0, 0, 0, 0, 0)) {
-error_report("Error registering flash memory");
-exit(1);
-}
+pflash_cfi01_register(0x, "verdex.rom", verdex_rom,
+  dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
+  sector_len, 2, 0, 0, 0, 0, 0);
 
 /* Interrupt line of NIC is connected to GPIO line 99 */
 smc91c111_init(&nd_table[0], 0x04000300,
diff --git a/hw/arm/mainstone.c b/hw/arm/mainstone.c
index 8454b65458..40f708f2d3 100644
--- a/hw/arm/mainstone.c
+++ b/hw/arm/mainstone.c
@@ -130,14 +130,11 @@ static void mainstone_common_init(MemoryRegion *address_space_mem,
 /* There are two 32MiB flash devices on the board */
 for (i = 0; i < 2; i ++) {
 dinfo = drive_get(IF_PFLASH, 0, i);
-if (!pflash_cfi01_register(mainstone_flash_base[i],
-   i ? "mainstone.flash1" : "mainstone.flash0",
-   MAINSTONE_FLASH,
-   dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-   sector_len, 4, 0, 0, 0, 0, 0)) {
-error_report("Error registering flash memory");
-exit(1);
-}
+pflash_cfi01_register(mainstone_flash_base[i],
+  i ? "mainstone.flash1" : "mainstone.flash0",
+  MAINSTONE_FLASH,
+  dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
+  sector_len, 4, 0, 0, 0, 0, 0);
 }
 
 mst_irq = sysbus_create_simple("mainstone-fpga", MST_FPGA_PHYS,
diff --git a/hw/arm/omap_sx1.c b/hw/arm/omap_sx1.c
index 57829b3744..820652265b 100644
--- a/hw/arm/omap_sx1.c
+++ b/hw/arm/omap_sx1.c
@@ -153,13 +153,10 @@ static void sx1_init(MachineState *machine, const int version)
 
 fl_idx = 0;
 if ((dinfo = drive_get(IF_PFLASH, 0, fl_idx)) != NULL) {
-if (!pflash_cfi01_register(OMAP_CS0_BASE,
-   "omap_sx1.flash0-1", flash_size,
-   blk_by_legacy_dinfo(dinfo),
-   sector_size, 4, 0, 0, 0, 0, 0)) {
-fprintf(stderr, "qemu: Error registering flash memory %d.\n",
-   fl_idx);
-}
+pflash_cfi01_register(OMAP_CS0_BASE,
+  "omap_sx1.flash0-1", flash_size,
+  blk_by_legacy_dinfo(dinfo),
+  sector_size, 4, 0, 0, 0, 0, 0);
 fl_idx++;
 }
 
@@ -175,13 +172,10 @@ static void sx1_init(MachineState *machine, const int version)
 memory_region_add_subregion(address_space,
 OMAP_CS1_BASE + flash1_size, &cs[1]);
 
-if (!pflash_cfi01_register(OMAP_CS1_BASE,
-   "omap_sx1.flash1-1", flash1_size,
-   blk_by_legacy_dinfo(dinfo),
-   sector_size, 4, 0, 0, 0, 0, 0)) {
-fprintf(stderr, "qemu: Error registering flash memory %d.\n",
-   fl_idx);
-}
+pflash_cfi01_register(OMAP_CS1_BASE,
+  "omap_sx1.flash1-1", flash1_size,
+  blk_by_legacy_dinfo(dinfo),
+  sector_size, 4, 0, 0, 0, 0, 0);
 fl_idx++;
 } else {
 

[PATCH v3 4/9] hw/block/pflash_cfi02: Attach memory region in boards

2022-10-16 Thread Bernhard Beschow
pflash_cfi02_register() had a parameter which was only passed to
sysbus_mmio_map() but not used otherwise. Pulling out sysbus_mmio_map()
resolves that parameter and concentrates the memory region setup in
board code. Furthermore, it allows attaching cfi02 devices relative to
some parent bus rather than to the global "sysbus".

While at it, replace sysbus_mmio_map() with non-sysbus equivalents.

Signed-off-by: Bernhard Beschow 
---
 hw/arm/digic_boards.c| 16 ++--
 hw/arm/musicpal.c| 15 +--
 hw/arm/xilinx_zynq.c | 12 +++-
 hw/block/pflash_cfi02.c  |  9 ++---
 hw/sh4/r2d.c | 11 +++
 include/hw/block/flash.h |  4 ++--
 6 files changed, 41 insertions(+), 26 deletions(-)

diff --git a/hw/arm/digic_boards.c b/hw/arm/digic_boards.c
index 4093af09cb..d3c5426cf9 100644
--- a/hw/arm/digic_boards.c
+++ b/hw/arm/digic_boards.c
@@ -116,12 +116,16 @@ static void digic4_add_k8p3215uqb_rom(DigicState *s, hwaddr addr,
 #define FLASH_K8P3215UQB_SIZE (4 * 1024 * 1024)
 #define FLASH_K8P3215UQB_SECTOR_SIZE (64 * 1024)
 
-pflash_cfi02_register(addr, "pflash", FLASH_K8P3215UQB_SIZE,
-  NULL, FLASH_K8P3215UQB_SECTOR_SIZE,
-  DIGIC4_ROM_MAX_SIZE / FLASH_K8P3215UQB_SIZE,
-  4,
-  0x00EC, 0x007E, 0x0003, 0x0001,
-  0x0555, 0x2aa, 0);
+PFlashCFI02 *pfl;
+
+pfl = pflash_cfi02_register("pflash", FLASH_K8P3215UQB_SIZE,
+NULL, FLASH_K8P3215UQB_SECTOR_SIZE,
+DIGIC4_ROM_MAX_SIZE / FLASH_K8P3215UQB_SIZE,
+4,
+0x00EC, 0x007E, 0x0003, 0x0001,
+0x0555, 0x2aa, 0);
+memory_region_add_subregion(get_system_memory(), addr,
+pflash_cfi02_get_memory(pfl));
 
 digic_load_rom(s, addr, FLASH_K8P3215UQB_SIZE, filename);
 }
diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index b65c020115..efad741f6d 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -1261,6 +1261,7 @@ static void musicpal_init(MachineState *machine)
 /* Register flash */
 dinfo = drive_get(IF_PFLASH, 0, 0);
 if (dinfo) {
+PFlashCFI02 *pfl;
 BlockBackend *blk = blk_by_legacy_dinfo(dinfo);
 
 flash_size = blk_getlength(blk);
@@ -1275,12 +1276,14 @@ static void musicpal_init(MachineState *machine)
  * 0xFF80 (if there is 8 MB flash). So remap flash access if the
  * image is smaller than 32 MB.
  */
-pflash_cfi02_register(0x1ULL - MP_FLASH_SIZE_MAX,
-  "musicpal.flash", flash_size,
-  blk, 0x1,
-  MP_FLASH_SIZE_MAX / flash_size,
-  2, 0x00BF, 0x236D, 0x, 0x,
-  0x, 0x2AAA, 0);
+pfl = pflash_cfi02_register("musicpal.flash", flash_size,
+blk, 0x1,
+MP_FLASH_SIZE_MAX / flash_size,
+2, 0x00BF, 0x236D, 0x, 0x,
+0x, 0x2AAA, 0);
+memory_region_add_subregion(address_space_mem,
+0x1ULL - MP_FLASH_SIZE_MAX,
+pflash_cfi02_get_memory(pfl));
 }
 sysbus_create_simple(TYPE_MV88W8618_FLASHCFG, MP_FLASHCFG_BASE, NULL);
 
diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
index 3190cc0b8d..a2abb1cf31 100644
--- a/hw/arm/xilinx_zynq.c
+++ b/hw/arm/xilinx_zynq.c
@@ -182,6 +182,7 @@ static void zynq_init(MachineState *machine)
 MemoryRegion *ocm_ram = g_new(MemoryRegion, 1);
 DeviceState *dev, *slcr;
 SysBusDevice *busdev;
+PFlashCFI02 *pfl;
 qemu_irq pic[64];
 int n;
 
@@ -218,11 +219,12 @@ static void zynq_init(MachineState *machine)
 DriveInfo *dinfo = drive_get(IF_PFLASH, 0, 0);
 
 /* AMD */
-pflash_cfi02_register(0xe200, "zynq.pflash", FLASH_SIZE,
-  dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-  FLASH_SECTOR_SIZE, 1,
-  1, 0x0066, 0x0022, 0x, 0x, 0x0555, 0x2aa,
-  0);
+pfl = pflash_cfi02_register("zynq.pflash", FLASH_SIZE,
+dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
+FLASH_SECTOR_SIZE, 1, 1, 0x0066, 0x0022, 0x,
+0x, 0x0555, 0x2aa, 0);
+memory_region_add_subregion(address_space_mem, 0xe200,
+pflash_cfi02_get_memory(pfl));
 
 /* Create the main clock source, and feed slcr with it */
 zynq_machine->ps_clk = CLOCK(object_new(TYPE_CLOCK));
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 

[PATCH v3 1/9] hw/block/pflash_cfi0{1, 2}: Error out if device length isn't a power of two

2022-10-16 Thread Bernhard Beschow
According to the JEDEC standard the device length is communicated to an
OS as an exponent (power of two).

Signed-off-by: Bernhard Beschow 
Reviewed-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/block/pflash_cfi01.c | 8 ++--
 hw/block/pflash_cfi02.c | 5 +
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/block/pflash_cfi01.c b/hw/block/pflash_cfi01.c
index 0cbc2fb4cb..9c235bf66e 100644
--- a/hw/block/pflash_cfi01.c
+++ b/hw/block/pflash_cfi01.c
@@ -690,7 +690,7 @@ static const MemoryRegionOps pflash_cfi01_ops = {
 .endianness = DEVICE_NATIVE_ENDIAN,
 };
 
-static void pflash_cfi01_fill_cfi_table(PFlashCFI01 *pfl)
+static void pflash_cfi01_fill_cfi_table(PFlashCFI01 *pfl, Error **errp)
 {
 uint64_t blocks_per_device, sector_len_per_device, device_len;
 int num_devices;
@@ -708,6 +708,10 @@ static void pflash_cfi01_fill_cfi_table(PFlashCFI01 *pfl)
 sector_len_per_device = pfl->sector_len / num_devices;
 }
 device_len = sector_len_per_device * blocks_per_device;
+if (!is_power_of_2(device_len)) {
+error_setg(errp, "Device size must be a power of two.");
+return;
+}
 
 /* Hardcoded CFI table */
 /* Standard "QRY" string */
@@ -865,7 +869,7 @@ static void pflash_cfi01_realize(DeviceState *dev, Error **errp)
  */
 pfl->cmd = 0x00;
 pfl->status = 0x80; /* WSM ready */
-pflash_cfi01_fill_cfi_table(pfl);
+pflash_cfi01_fill_cfi_table(pfl, errp);
 }
 
 static void pflash_cfi01_system_reset(DeviceState *dev)
diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
index 2a99b286b0..ff2fe154c1 100644
--- a/hw/block/pflash_cfi02.c
+++ b/hw/block/pflash_cfi02.c
@@ -880,6 +880,11 @@ static void pflash_cfi02_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+if (!is_power_of_2(pfl->chip_len)) {
+error_setg(errp, "Device size must be a power of two.");
+return;
+}
+
 memory_region_init_rom_device(&pfl->orig_mem, OBJECT(pfl),
   &pflash_cfi02_ops, pfl, pfl->name,
   pfl->chip_len, errp);
-- 
2.38.0
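
Side note on the check added above: the commit message says the device length is communicated to the OS as an exponent, so a length that is not a power of two cannot be represented. The following standalone sketch illustrates that reasoning. The function names are hypothetical, and is_pow2() is a local stand-in for QEMU's is_power_of_2(), not its actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for is_power_of_2(): true for 1, 2, 4, 8, ... */
static bool is_pow2(uint64_t value)
{
    return value != 0 && (value & (value - 1)) == 0;
}

/*
 * Hypothetical helper: return n such that len == 2^n, or -1 if no such
 * n exists, i.e. the length cannot be expressed as an exponent.
 */
static int device_len_exponent(uint64_t len)
{
    int n = 0;

    if (!is_pow2(len)) {
        return -1;
    }
    while (len > 1) {
        len >>= 1;
        n++;
    }
    return n;
}
```

For instance, a 32 MiB device (0x02000000 bytes) maps to exponent 25, while a 3 MiB image has no valid exponent and would be rejected by the new check.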




[PATCH v3 3/9] hw/block/pflash_cfi01: Attach memory region in boards

2022-10-16 Thread Bernhard Beschow
pflash_cfi01_register() had a parameter which was only passed to
sysbus_mmio_map() but not used otherwise. Pulling out sysbus_mmio_map()
resolves that parameter and concentrates the memory region setup in
board code. Furthermore, it allows attaching cfi01 devices relative to
some parent bus rather than to the global "sysbus".

While at it, replace sysbus_mmio_map() with non-sysbus equivalents.

Signed-off-by: Bernhard Beschow 
---
 hw/arm/collie.c  | 20 +---
 hw/arm/gumstix.c | 18 --
 hw/arm/mainstone.c   | 16 ++--
 hw/arm/omap_sx1.c| 19 +++
 hw/arm/versatilepb.c | 12 +++-
 hw/arm/z2.c  |  9 ++---
 hw/block/pflash_cfi01.c  |  4 +---
 hw/microblaze/petalogix_ml605_mmu.c  | 16 ++--
 hw/microblaze/petalogix_s3adsp1800_mmu.c | 10 ++
 hw/mips/malta.c  |  4 ++--
 hw/ppc/sam460ex.c| 15 +--
 hw/ppc/virtex_ml507.c|  5 -
 include/hw/block/flash.h |  3 +--
 13 files changed, 92 insertions(+), 59 deletions(-)

diff --git a/hw/arm/collie.c b/hw/arm/collie.c
index 8df31e2793..25fb5f657b 100644
--- a/hw/arm/collie.c
+++ b/hw/arm/collie.c
@@ -37,8 +37,10 @@ static struct arm_boot_info collie_binfo = {
 static void collie_init(MachineState *machine)
 {
 DriveInfo *dinfo;
+PFlashCFI01 *pfl;
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 CollieMachineState *cms = COLLIE_MACHINE(machine);
+MemoryRegion *system_memory = get_system_memory();
 
 if (machine->ram_size != mc->default_ram_size) {
 char *sz = size_to_str(mc->default_ram_size);
@@ -49,17 +51,21 @@ static void collie_init(MachineState *machine)
 
 cms->sa1110 = sa1110_init(machine->cpu_type);
 
-memory_region_add_subregion(get_system_memory(), SA_SDCS0, machine->ram);
+memory_region_add_subregion(system_memory, SA_SDCS0, machine->ram);
 
 dinfo = drive_get(IF_PFLASH, 0, 0);
-pflash_cfi01_register(SA_CS0, "collie.fl1", 0x0200,
-dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-64 * KiB, 4, 0x00, 0x00, 0x00, 0x00, 0);
+pfl = pflash_cfi01_register("collie.fl1", 0x0200,
+dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
+64 * KiB, 4, 0x00, 0x00, 0x00, 0x00, 0);
+memory_region_add_subregion(system_memory, SA_CS0,
+pflash_cfi01_get_memory(pfl));
 
 dinfo = drive_get(IF_PFLASH, 0, 1);
-pflash_cfi01_register(SA_CS1, "collie.fl2", 0x0200,
-dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-64 * KiB, 4, 0x00, 0x00, 0x00, 0x00, 0);
+pfl = pflash_cfi01_register("collie.fl2", 0x0200,
+dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
+64 * KiB, 4, 0x00, 0x00, 0x00, 0x00, 0);
+memory_region_add_subregion(system_memory, SA_CS1,
+pflash_cfi01_get_memory(pfl));
 
 sysbus_create_simple("scoop", 0x4080, NULL);
 
diff --git a/hw/arm/gumstix.c b/hw/arm/gumstix.c
index 1296628ed9..d6c997ad8e 100644
--- a/hw/arm/gumstix.c
+++ b/hw/arm/gumstix.c
@@ -51,6 +51,7 @@ static void connex_init(MachineState *machine)
 {
 PXA2xxState *cpu;
 DriveInfo *dinfo;
+PFlashCFI01 *pfl;
 MemoryRegion *address_space_mem = get_system_memory();
 
 uint32_t connex_rom = 0x0100;
@@ -65,9 +66,11 @@ static void connex_init(MachineState *machine)
 exit(1);
 }
 
-pflash_cfi01_register(0x, "connext.rom", connex_rom,
-  dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-  sector_len, 2, 0, 0, 0, 0, 0);
+pfl = pflash_cfi01_register("connext.rom", connex_rom,
+dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
+sector_len, 2, 0, 0, 0, 0, 0);
+memory_region_add_subregion(address_space_mem, 0x,
+pflash_cfi01_get_memory(pfl));
 
 /* Interrupt line of NIC is connected to GPIO line 36 */
 smc91c111_init(&nd_table[0], 0x04000300,
@@ -78,6 +81,7 @@ static void verdex_init(MachineState *machine)
 {
 PXA2xxState *cpu;
 DriveInfo *dinfo;
+PFlashCFI01 *pfl;
 MemoryRegion *address_space_mem = get_system_memory();
 
 uint32_t verdex_rom = 0x0200;
@@ -92,9 +96,11 @@ static void verdex_init(MachineState *machine)
 exit(1);
 }
 
-pflash_cfi01_register(0x, "verdex.rom", verdex_rom,
-  dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-  sector_len, 2, 0, 0, 0, 0, 0);
+pfl = pflash_cfi01_register("verdex.rom", verdex_rom,
+dinfo ? 

[PATCH v3 5/9] hw/sd/sdhci-internal: Unexport ESDHC defines

2022-10-16 Thread Bernhard Beschow
These defines aren't used outside of sdhci.c, so can be defined there.

Signed-off-by: Bernhard Beschow 
Reviewed-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/sd/sdhci-internal.h | 20 
 hw/sd/sdhci.c  | 19 +++
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/hw/sd/sdhci-internal.h b/hw/sd/sdhci-internal.h
index e8c753d6d1..964570f8e8 100644
--- a/hw/sd/sdhci-internal.h
+++ b/hw/sd/sdhci-internal.h
@@ -288,26 +288,6 @@ enum {
 
 extern const VMStateDescription sdhci_vmstate;
 
-
-#define ESDHC_MIX_CTRL  0x48
-
-#define ESDHC_VENDOR_SPEC   0xc0
-#define ESDHC_IMX_FRC_SDCLK_ON  (1 << 8)
-
-#define ESDHC_DLL_CTRL  0x60
-
-#define ESDHC_TUNING_CTRL   0xcc
-#define ESDHC_TUNE_CTRL_STATUS  0x68
-#define ESDHC_WTMK_LVL  0x44
-
-/* Undocumented register used by guests working around erratum ERR004536 */
-#define ESDHC_UNDOCUMENTED_REG27  0x6c
-
-#define ESDHC_CTRL_4BITBUS  (0x1 << 1)
-#define ESDHC_CTRL_8BITBUS  (0x2 << 1)
-
-#define ESDHC_PRNSTS_SDSTB  (1 << 3)
-
 /*
  * Default SD/MMC host controller features information, which will be
  * presented in CAPABILITIES register of generic SD host controller at reset.
diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 0e5e988927..6da5e2c781 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1577,6 +1577,25 @@ static const TypeInfo sdhci_bus_info = {
 
 /* --- qdev i.MX eSDHC --- */
 
+#define ESDHC_MIX_CTRL  0x48
+
+#define ESDHC_VENDOR_SPEC   0xc0
+#define ESDHC_IMX_FRC_SDCLK_ON  (1 << 8)
+
+#define ESDHC_DLL_CTRL  0x60
+
+#define ESDHC_TUNING_CTRL   0xcc
+#define ESDHC_TUNE_CTRL_STATUS  0x68
+#define ESDHC_WTMK_LVL  0x44
+
+/* Undocumented register used by guests working around erratum ERR004536 */
+#define ESDHC_UNDOCUMENTED_REG27  0x6c
+
+#define ESDHC_CTRL_4BITBUS  (0x1 << 1)
+#define ESDHC_CTRL_8BITBUS  (0x2 << 1)
+
+#define ESDHC_PRNSTS_SDSTB  (1 << 3)
+
 static uint64_t usdhc_read(void *opaque, hwaddr offset, unsigned size)
 {
 SDHCIState *s = SYSBUS_SDHCI(opaque);
-- 
2.38.0




[PATCH v3 0/9] ppc/e500: Add support for two types of flash, cleanup

2022-10-16 Thread Bernhard Beschow
Cover letter:
~

This series adds support for -pflash and direct SD card access to the
PPC e500 boards. The idea is to increase compatibility with "real" firmware
images where only the bare minimum of drivers is compiled in.

The series is structured as follows:

Patches 1-6 perform some general cleanup which paves the way for the rest of
the series.

Patch 7 adds -pflash handling where memory-mapped flash can be added on
user's behalf. That is, the flash memory region in the eLBC is only added if
the -pflash argument is supplied. Note that the cfi01 device model becomes
stricter in checking the size of the emulated flash space.

Patches 8 and 9 add a new device model - the Freescale eSDHC - to the e500
boards which was missing so far.

User documentation is also added as the new features become available.

Testing done:
* `qemu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
"console=ttyS0 rootwait root=/dev/mtdblock0 nokaslr" -drive
if=pflash,file=rootfs.ext2,format=raw`
* `qemu-system-ppc -M ppce500 -cpu e500mc -m 256 -kernel uImage -append
"console=ttyS0 rootwait root=/dev/mmcblk0" -device sd-card,drive=mydrive -drive
id=mydrive,if=none,file=rootfs.ext2,format=raw`

The load was created using latest Buildroot with `make
qemu_ppc_e500mc_defconfig` where the rootfs was configured to be of ext2 type.
In both cases it was possible to log in and explore the root file system.

v3:
~~~
Phil:
- Also add power-of-2 fix to pflash_cfi02
- Resolve cfi01-specific assertion in e500 code
- Resolve unused define in eSDHC device model
- Resolve redundant alignment checks in eSDHC device model

Bin:
- Add dedicated flash chapter to documentation

Bernhard:
- Use is_power_of_2() instead of ctpop64() for better readability
- Only instantiate eSDHC device model in ppce500 (not used in MPC8544DS)
- Rebase onto gitlab.com/danielhb/qemu/tree/ppc-next

v2:
~~~
Bin:
- Add source for MPC8544DS platform bus' memory map in commit message.
- Keep "ESDHC" in comment referring to Linux driver.
- Use "qemu-system-ppc{64|32}" in documentation.
- Use g_autofree in device tree code.
- Remove unneeded device tree properties.
- Error out if pflash size doesn't fit into eLBC memory window.
- Remove unused ESDHC defines.
- Define macro ESDHC_WML for register offset with magic constant.
- Fix some whitespace issues when adding eSDHC device to e500.

Phil:
- Fix tense in commit message.

Bernhard Beschow (9):
  hw/block/pflash_cfi0{1,2}: Error out if device length isn't a power of
two
  hw/{arm,ppc}: Resolve unreachable code
  hw/block/pflash_cfi01: Attach memory region in boards
  hw/block/pflash_cfi02: Attach memory region in boards
  hw/sd/sdhci-internal: Unexport ESDHC defines
  hw/sd/sdhci: Rename ESDHC_* defines to USDHC_*
  hw/ppc/e500: Implement pflash handling
  hw/sd/sdhci: Implement Freescale eSDHC device model
  hw/ppc/e500: Add Freescale eSDHC to e500plat

 docs/system/ppc/ppce500.rst  |  28 
 hw/arm/collie.c  |  20 ++-
 hw/arm/digic_boards.c|  16 +-
 hw/arm/gumstix.c |  24 +--
 hw/arm/mainstone.c   |  15 +-
 hw/arm/musicpal.c|  15 +-
 hw/arm/omap_sx1.c|  25 ++--
 hw/arm/versatilepb.c |  14 +-
 hw/arm/xilinx_zynq.c |  12 +-
 hw/arm/z2.c  |  12 +-
 hw/block/pflash_cfi01.c  |  12 +-
 hw/block/pflash_cfi02.c  |  14 +-
 hw/microblaze/petalogix_ml605_mmu.c  |  16 +-
 hw/microblaze/petalogix_s3adsp1800_mmu.c |  10 +-
 hw/mips/malta.c  |   4 +-
 hw/ppc/Kconfig   |   2 +
 hw/ppc/e500.c|  97 +++-
 hw/ppc/e500.h|   1 +
 hw/ppc/e500plat.c|   1 +
 hw/ppc/sam460ex.c|  19 ++-
 hw/ppc/virtex_ml507.c|   5 +-
 hw/sd/sdhci-internal.h   |  20 ---
 hw/sd/sdhci.c| 183 ---
 hw/sh4/r2d.c |  11 +-
 include/hw/block/flash.h |   7 +-
 include/hw/sd/sdhci.h|   3 +
 26 files changed, 433 insertions(+), 153 deletions(-)

-- 
2.38.0




Re: [PATCH v3] qapi/qmp: Add timestamps to qmp command responses

2022-10-16 Thread Denis Plotnikov



On 14.10.2022 16:19, Daniel P. Berrangé wrote:

On Fri, Oct 14, 2022 at 02:57:06PM +0200, Markus Armbruster wrote:

Daniel P. Berrangé  writes:


On Fri, Oct 14, 2022 at 11:31:13AM +0200, Markus Armbruster wrote:

Daniel P. Berrangé  writes:


On Thu, Oct 13, 2022 at 05:00:26PM +0200, Markus Armbruster wrote:

Denis Plotnikov  writes:


Add "start" & "end" time values to qmp command responses.

Please spell it QMP.  More of the same below.


These time values are added to let the QEMU management layer get the exact
command execution time without any other time variance which might be brought by
other parts of the management layer or QEMU internals. This is particularly
useful for management-layer logging that is later used to resolve problems.

I'm still having difficulties seeing the value add over existing
tracepoints and logging.

Can you tell me about a problem you cracked (or could have cracked) with
the help of this?

Consider your QMP client is logging all commands and replies in its
own logfile (libvirt can do this). Having this start/end timestamps
included means the QMP client log is self contained.

A QMP client can include client-side timestamps in its log.  What value
is being added by server-side timestamps?  According to the commit
message, it's for getting "the exact command execution time without any
other time variance which might be brought by other parts of management
layer or qemu internals."  Why is that useful?  In particular, why is
excluding network and QEMU queueing delays (inbound and outbound)
useful?

Let's say some command normally runs in ~100ms, but occasionally
runs in 2 secs, and you want to understand why.

A first step is understanding whether a given command itself is
slow at executing, or whether its execution has merely been
delayed because some other aspect of QEMU has delayed its execution.
If the server timestamps show it was very fast, then that indicates
delayed processing. Thus instead of debugging the slow command, I
can think about what scenarios would be responsible for the delay.
Perhaps a previous QMP command was very slow, or maybe there is
simply a large volume of QMP commands backlogged, or some part of
QEMU got blocked.

Another case would be a command that is normally fast, and sometimes
is slower, but still relatively fast. The network and queueing side
might be a significant enough proportion of the total time to obscure
the slowdown. If you can eliminate the non-execution time, you can
see the performance trends over time to spot the subtle slowdowns
and detect abnormal behaviour before it becomes too terrible.

This is troubleshooting.  Asking for better troubleshooting tools is
fair.

However, the proposed timestamps provide much more limited insight than
existing tracepoints.  For instance, enabling

tracepoints are absolutely great and let you get a hell of a lot
more information, *provided* you are in a position to actually
use tracepoints. This is, unfortunately, frequently not the case
when supporting real world production deployments.

Exactly!!! Thanks for pointing that out!


Bug reports from customers typically include little more than a
log file they got from the mgmt client at the time the problem happened.
The problem experienced may no longer exist, so asking them to run
a tracepoint script is not possible. They may also be reluctant to
actually run tracepoint scripts on a production system, or simply
lack the ability to do so at all, due to constraints of the deployment
environment. Logs from libvirt are something that are collected by
default for many mgmt apps, or can be turned on by the user with
minimal risk of disruption.

Overall, there's a compelling desire to be proactive in collecting
information ahead of time, that might be useful in diagnosing
future bug reports.


This is the main reason. When you encounter a problem, one of the first
questions is "Was there something similar in the past?" Another question
is how often it happens.


With the timestamps, answering these questions becomes easier.

Another thing is that with the QMP command timestamps you can build a
monitoring system which will report the cases when
execution_time_from_mgmt_perspective - execution_time_qmp_command >
some_threshold, which in turn proactively tells you about potential
problems. Then you can start using the QMP tracepoints (and other
means) to figure out the real reason for the execution time variance.
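
A minimal sketch of that threshold check, in C. The Timestamp layout
(seconds plus microseconds) and every function name here are assumptions
made for illustration; they are not taken from the patch or from any
QEMU or libvirt API. Only differences within each side are used, so the
client and server clocks do not need to be synchronized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical timestamp layout; the real QMP reply format may differ. */
typedef struct {
    int64_t seconds;
    int64_t microseconds;
} Timestamp;

/* Convert a timestamp to microseconds since its clock's epoch. */
static int64_t ts_to_us(Timestamp t)
{
    assert(t.microseconds >= 0 && t.microseconds < 1000000);
    return t.seconds * 1000000 + t.microseconds;
}

/*
 * Time spent outside command execution (network, queueing, dispatch),
 * in microseconds: the duration seen by the management layer minus the
 * duration reported by the server-side start/end timestamps.
 */
static int64_t overhead_us(Timestamp client_sent, Timestamp client_received,
                           Timestamp server_start, Timestamp server_end)
{
    int64_t client_elapsed = ts_to_us(client_received) - ts_to_us(client_sent);
    int64_t server_elapsed = ts_to_us(server_end) - ts_to_us(server_start);

    return client_elapsed - server_elapsed;
}

/* True when the non-execution time is large enough to be worth reporting. */
static bool exceeds_threshold(int64_t overhead, int64_t threshold_us)
{
    return overhead > threshold_us;
}
```

A monitoring tool could call overhead_us() for every logged
command/reply pair and flag the entries for which exceeds_threshold()
returns true, leaving tracepoints for the follow-up investigation.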


Thanks, Denis



So it isn't an 'either / or' decision of QMP reply logs vs use of
tracepoints, both are beneficial, with their own pros/cons.

With regards,
Daniel




[PATCH] vfio/migration: Fix wrong enum usage

2022-10-16 Thread Avihai Horon
vfio_migration_init() initializes VFIOMigration->device_state using an enum
of VFIO migration protocol v2. The currently implemented protocol is v1, so
the v1 enum should be used. Fix it.

Fixes: 429c72800654 ("vfio/migration: Fix incorrect initialization value for parameters in VFIOMigration")
Signed-off-by: Avihai Horon 
---
 hw/vfio/migration.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index d9598ce070..8dbbfa2c56 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -803,7 +803,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 }
 
 vbasedev->migration = g_new0(VFIOMigration, 1);
-vbasedev->migration->device_state = VFIO_DEVICE_STATE_RUNNING;
+vbasedev->migration->device_state = VFIO_DEVICE_STATE_V1_RUNNING;
 vbasedev->migration->vm_running = runstate_is_running();
 
 ret = vfio_region_setup(obj, vbasedev, &vbasedev->migration->region,
-- 
2.21.3