Re: [PATCH-for-6.1 07/10] hw/block/pflash_cfi02: Simplify pflash_cfi02_register() prototype

2021-03-28 Thread David Gibson
On Fri, Mar 26, 2021 at 01:27:25AM +0100, Philippe Mathieu-Daudé wrote:
> The previous commit removed the mapping code from TYPE_PFLASH_CFI02.
> pflash_cfi02_register() doesn't use the 'nb_mappings' argument
> anymore. Simply remove it to simplify.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Revieed-by: David Gibson 

> ---
>  include/hw/block/flash.h | 1 -
>  hw/arm/digic_boards.c| 1 -
>  hw/arm/musicpal.c| 1 -
>  hw/arm/xilinx_zynq.c | 2 +-
>  hw/block/pflash_cfi02.c  | 3 +--
>  hw/lm32/lm32_boards.c| 4 ++--
>  hw/ppc/ppc405_boards.c   | 6 +++---
>  hw/sh4/r2d.c | 2 +-
>  8 files changed, 8 insertions(+), 12 deletions(-)
> 
> diff --git a/include/hw/block/flash.h b/include/hw/block/flash.h
> index 7dde0adcee7..0e5dd818a9d 100644
> --- a/include/hw/block/flash.h
> +++ b/include/hw/block/flash.h
> @@ -36,7 +36,6 @@ PFlashCFI02 *pflash_cfi02_register(hwaddr base,
> hwaddr size,
> BlockBackend *blk,
> uint32_t sector_len,
> -   int nb_mappings,
> int width,
> uint16_t id0, uint16_t id1,
> uint16_t id2, uint16_t id3,
> diff --git a/hw/arm/digic_boards.c b/hw/arm/digic_boards.c
> index 293402b1240..eb694c70d4c 100644
> --- a/hw/arm/digic_boards.c
> +++ b/hw/arm/digic_boards.c
> @@ -128,7 +128,6 @@ static void digic4_add_k8p3215uqb_rom(DigicState *s, 
> hwaddr addr,
>   FLASH_K8P3215UQB_SIZE / 
> FLASH_K8P3215UQB_SECTOR_SIZE);
>  qdev_prop_set_uint32(dev, "sector-length", FLASH_K8P3215UQB_SECTOR_SIZE);
>  qdev_prop_set_uint8(dev, "width", 4); /* 32-bit */
> -qdev_prop_set_uint8(dev, "mappings", 0);
>  qdev_prop_set_uint8(dev, "big-endian", 0);
>  qdev_prop_set_uint16(dev, "id0", 0x00ec);
>  qdev_prop_set_uint16(dev, "id1", 0x007e);
> diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
> index 7d1f2f3fb3f..e882e11df36 100644
> --- a/hw/arm/musicpal.c
> +++ b/hw/arm/musicpal.c
> @@ -1657,7 +1657,6 @@ static void musicpal_init(MachineState *machine)
>  qdev_prop_set_uint32(dev, "num-blocks", flash_size / sector_size);
>  qdev_prop_set_uint32(dev, "sector-length", sector_size);
>  qdev_prop_set_uint8(dev, "width", 2); /* 16-bit */
> -qdev_prop_set_uint8(dev, "mappings", 0);
>  qdev_prop_set_uint8(dev, "big-endian", 0);
>  qdev_prop_set_uint16(dev, "id0", 0x00bf);
>  qdev_prop_set_uint16(dev, "id1", 0x236d);
> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
> index 8db6cfd47f5..d12b00e7648 100644
> --- a/hw/arm/xilinx_zynq.c
> +++ b/hw/arm/xilinx_zynq.c
> @@ -220,7 +220,7 @@ static void zynq_init(MachineState *machine)
>  pflash_cfi02_register(0xe200, "zynq.pflash", FLASH_SIZE,
>dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
>FLASH_SECTOR_SIZE, 1,
> -  1, 0x0066, 0x0022, 0x, 0x, 0x0555, 0x2aa,
> +  0x0066, 0x0022, 0x, 0x, 0x0555, 0x2aa,
>0);
>  
>  /* Create the main clock source, and feed slcr with it */
> diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
> index 6f4b3e3c3fe..2b412402fac 100644
> --- a/hw/block/pflash_cfi02.c
> +++ b/hw/block/pflash_cfi02.c
> @@ -968,7 +968,7 @@ PFlashCFI02 *pflash_cfi02_register(hwaddr base,
> hwaddr size,
> BlockBackend *blk,
> uint32_t sector_len,
> -   int nb_mappings, int width,
> +   int width,
> uint16_t id0, uint16_t id1,
> uint16_t id2, uint16_t id3,
> uint16_t unlock_addr0,
> @@ -977,7 +977,6 @@ PFlashCFI02 *pflash_cfi02_register(hwaddr base,
>  {
>  DeviceState *dev = qdev_new(TYPE_PFLASH_CFI02);
>  
> -assert(nb_mappings <= 1);
>  if (blk) {
>  qdev_prop_set_drive(dev, "drive", blk);
>  }
> diff --git a/hw/lm32/lm32_boards.c b/hw/lm32/lm32_boards.c
> index b5d97dd53ed..96877ba7cfb 100644
> --- a/hw/lm32/lm32_boards.c
> +++ b/hw/lm32/lm32_boards.c
> @@ -121,7 +121,7 @@ static void lm32_evr_init(MachineState *machine)
>  pflash_cfi02_register(flash_base, "lm32_evr.flash", flash_size,
>dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
>flash_sector_size,
> -  1, 2, 0x01, 0x7e, 0x43, 0x00, 0x555, 0x2aa, 1);
> +  2, 0x01, 0x7e, 0x43, 0x00, 0x555, 0x2aa, 1);
>  
>  /* create irq lines */
>  env->pic_state = lm32_pic_init(qemu_allocate_irq(cpu_irq_handler, cpu, 
> 0));
> @@ -218,7 +218,7 @@ static void 

Re: [PATCH-for-6.1 06/10] hw/block/pflash_cfi02: Remove pflash_setup_mappings()

2021-03-28 Thread David Gibson
On Fri, Mar 26, 2021 at 01:27:24AM +0100, Philippe Mathieu-Daudé wrote:
> All boards calling pflash_cfi02_register() use nb_mappings=1,
> which does not do any mapping:
> 
>   $ git grep -wl pflash_cfi02_register hw/
>   hw/arm/xilinx_zynq.c
>   hw/block/pflash_cfi02.c
>   hw/lm32/lm32_boards.c
>   hw/ppc/ppc405_boards.c
>   hw/sh4/r2d.c
> 
> We can remove this now unneeded code.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: David Gibson 

> ---
>  hw/block/pflash_cfi02.c | 35 ++-
>  1 file changed, 2 insertions(+), 33 deletions(-)
> 
> diff --git a/hw/block/pflash_cfi02.c b/hw/block/pflash_cfi02.c
> index 02c514fb6e0..6f4b3e3c3fe 100644
> --- a/hw/block/pflash_cfi02.c
> +++ b/hw/block/pflash_cfi02.c
> @@ -75,7 +75,6 @@ struct PFlashCFI02 {
>  uint32_t nb_blocs[PFLASH_MAX_ERASE_REGIONS];
>  uint32_t sector_len[PFLASH_MAX_ERASE_REGIONS];
>  uint32_t chip_len;
> -uint8_t mappings;
>  uint8_t width;
>  uint8_t be;
>  int wcycle; /* if 0, the flash is read normally */
> @@ -92,13 +91,6 @@ struct PFlashCFI02 {
>  uint16_t unlock_addr1;
>  uint8_t cfi_table[0x4d];
>  QEMUTimer timer;
> -/*
> - * The device replicates the flash memory across its memory space.  
> Emulate
> - * that by having a container (.mem) filled with an array of aliases
> - * (.mem_mappings) pointing to the flash memory (.orig_mem).
> - */
> -MemoryRegion mem;
> -MemoryRegion *mem_mappings;/* array; one per mapping */
>  MemoryRegion orig_mem;
>  bool rom_mode;
>  int read_counter; /* used for lazy switch-back to rom mode */
> @@ -158,23 +150,6 @@ static inline void toggle_dq2(PFlashCFI02 *pfl)
>  pfl->status ^= 0x04;
>  }
>  
> -/*
> - * Set up replicated mappings of the same region.
> - */
> -static void pflash_setup_mappings(PFlashCFI02 *pfl)
> -{
> -unsigned i;
> -hwaddr size = memory_region_size(>orig_mem);
> -
> -memory_region_init(>mem, OBJECT(pfl), "pflash", pfl->mappings * 
> size);
> -pfl->mem_mappings = g_new(MemoryRegion, pfl->mappings);
> -for (i = 0; i < pfl->mappings; ++i) {
> -memory_region_init_alias(>mem_mappings[i], OBJECT(pfl),
> - "pflash-alias", >orig_mem, 0, size);
> -memory_region_add_subregion(>mem, i * size, 
> >mem_mappings[i]);
> -}
> -}
> -
>  static void pflash_reset_state_machine(PFlashCFI02 *pfl)
>  {
>  trace_pflash_reset(pfl->name);
> @@ -917,12 +892,7 @@ static void pflash_cfi02_realize(DeviceState *dev, Error 
> **errp)
>  pfl->sector_erase_map = bitmap_new(pfl->total_sectors);
>  
>  pfl->rom_mode = true;
> -if (pfl->mappings > 1) {
> -pflash_setup_mappings(pfl);
> -sysbus_init_mmio(SYS_BUS_DEVICE(dev), >mem);
> -} else {
> -sysbus_init_mmio(SYS_BUS_DEVICE(dev), >orig_mem);
> -}
> +sysbus_init_mmio(SYS_BUS_DEVICE(dev), >orig_mem);
>  
>  timer_init_ns(>timer, QEMU_CLOCK_VIRTUAL, pflash_timer, pfl);
>  pfl->status = 0;
> @@ -950,7 +920,6 @@ static Property pflash_cfi02_properties[] = {
>  DEFINE_PROP_UINT32("num-blocks3", PFlashCFI02, nb_blocs[3], 0),
>  DEFINE_PROP_UINT32("sector-length3", PFlashCFI02, sector_len[3], 0),
>  DEFINE_PROP_UINT8("width", PFlashCFI02, width, 0),
> -DEFINE_PROP_UINT8("mappings", PFlashCFI02, mappings, 0),
>  DEFINE_PROP_UINT8("big-endian", PFlashCFI02, be, 0),
>  DEFINE_PROP_UINT16("id0", PFlashCFI02, ident0, 0),
>  DEFINE_PROP_UINT16("id1", PFlashCFI02, ident1, 0),
> @@ -1008,6 +977,7 @@ PFlashCFI02 *pflash_cfi02_register(hwaddr base,
>  {
>  DeviceState *dev = qdev_new(TYPE_PFLASH_CFI02);
>  
> +assert(nb_mappings <= 1);
>  if (blk) {
>  qdev_prop_set_drive(dev, "drive", blk);
>  }
> @@ -1015,7 +985,6 @@ PFlashCFI02 *pflash_cfi02_register(hwaddr base,
>  qdev_prop_set_uint32(dev, "num-blocks", size / sector_len);
>  qdev_prop_set_uint32(dev, "sector-length", sector_len);
>  qdev_prop_set_uint8(dev, "width", width);
> -qdev_prop_set_uint8(dev, "mappings", nb_mappings);
>  qdev_prop_set_uint8(dev, "big-endian", !!be);
>  qdev_prop_set_uint16(dev, "id0", id0);
>  qdev_prop_set_uint16(dev, "id1", id1);

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [PATCH 0/6] Add debug interface to kick/call on purpose

2021-03-28 Thread Jason Wang



在 2021/3/27 上午5:16, Dongli Zhang 写道:

Hi Jason,

On 3/26/21 12:24 AM, Jason Wang wrote:

在 2021/3/26 下午1:44, Dongli Zhang 写道:

The virtio device/driver (e.g., vhost-scsi or vhost-net) may hang due to
the loss of doorbell kick, e.g.,

https://urldefense.com/v3/__https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg01711.html__;!!GqivPVa7Brio!KS3pAU2cKjz4wgI4QSlE-YsJPhPG71nkE5_tGhaOf7mi_xvNxbvKkfn03rk5BNDLSEU$

... or due to the loss of IRQ, e.g., as fixed by linux kernel commit
fe200ae48ef5 ("genirq: Mark polled irqs and defer the real handler").

This patch introduces a new debug interface 'DeviceEvent' to DeviceClass
to help narrow down if the issue is due to loss of irq/kick. So far the new
interface handles only two events: 'call' and 'kick'. Any device (e.g.,
virtio/vhost or VFIO) may implement the interface (e.g., via eventfd, MSI-X
or legacy IRQ).

The 'call' is to inject irq on purpose by admin for a specific device (e.g.,
vhost-scsi) from QEMU/host to VM, while the 'kick' is to kick the doorbell
on purpose by admin at QEMU/host side for a specific device.


This device can be used as a workaround if call/kick is lost due to
virtualization software (e.g., kernel or QEMU) issue.

We may also implement the interface for VFIO PCI, e.g., to write to
VFIOPCIDevice->msi_vectors[i].interrupt will be able to inject IRQ to VM
on purpose. This is considered future work once the virtio part is done.


Below is from live crash analysis. Initially, the queue=2 has count=15 for
'kick' eventfd_ctx. Suppose there is data in vring avail while there is no
used available. We suspect this is because vhost-scsi was not notified by
VM. In order to narrow down and analyze the issue, we use live crash to
dump the current counter of eventfd for queue=2.

crash> eventfd_ctx 8f67f6bbe700
struct eventfd_ctx {
    kref = {
  refcount = {
    refs = {
  counter = 4
    }
  }
    },
    wqh = {
  lock = {
    {
  rlock = {
    raw_lock = {
  val = {
    counter = 0
  }
    }
  }
    }
  },
  head = {
    next = 0x8f841dc08e18,
    prev = 0x8f841dc08e18
  }
    },
    count = 15, ---> eventfd is 15 !!!
    flags = 526336
}

Now we kick the doorbell for vhost-scsi queue=2 on purpose for diagnostic
with this interface.

{ "execute": "x-debug-device-event",
    "arguments": { "dev": "/machine/peripheral/vscsi0",
   "event": "kick", "queue": 2 } }

The counter is increased to 16. Suppose the hang issue is resolved, it
indicates something bad is in software that the 'kick' is lost.

What do you mean by "software" here? And it looks to me you're testing whether
event_notifier_set() is called by virtio_queue_notify() here. If so, I'm not
sure how much value could we gain from a dedicated debug interface like this
consider there're a lot of exisinting general purpose debugging method like
tracing or gdb. I'd say the path from virtio_queue_notify() to
event_notifier_set() is only a very small fraction of the process of virtqueue
kick which is unlikey to be buggy. Consider usually the ioeventfd will be
offloaded to KVM, it's more a chance that something is wrong in setuping
ioeventfd instead of here. Irq is even more complicated.

Thank you very much!

I am not testing whether event_notifier_set() is called by 
virtio_queue_notify().

The 'software' indicates the data processing and event notification mechanism
involved with virtio/vhost PV driver frontend. E.g., while VM is waiting for an
extra IRQ, vhost side did not trigger IRQ, suppose vring_need_event()
erroneously returns false due to corrupted ring buffer status.



So there could be several factors that may block the notification:

1) eventfd bug (ioeventfd vs irqfd)
2) wrong virtqueue state (either driver or device)
3) missing barriers (either driver or device)
4) Qemu bug (irqchip and routing)
...

Consider we want to debug virtio issue, only 2) or 3) is what we really 
cared.


So for kick you did (assume vhost is on):

virtio_device_event_kick()
    virtio_queue_notify()
        event_notifier_set()

It looks to me you're actaully testing if ioeventfd is correctly set by 
Qemu.


For call you did:

virtio_device_event_call()
    event_notifier_set()

A test of irqfd is correctly set by Qemu. So all of those are not virtio 
specific stuffs but you introduce virtio specific command to do debug 
non virtio functions.


In the case of what you mentioned for vring_need_event(), what we really 
want is to dump the virtqueue state from the guest. This might requries 
some work of extending virtio spec (e.g to dump device status like 
indices, event, wrap counters).




This was initially proposed for vhost only and I was going to export
ioeventfd/irqfd from vhost to admin via sysfs. Finally, I realized I would
better implement this at QEMU.

The QEMU inits the eventfd (ioeventfd and irqfd), and offloads them to
KVM/vhost. The VM