date:20240520

Re: [PATCH] hw/riscv/virt: Add hotplugging and virtio-md-pci support

2024-05-20 Thread Björn Töpel

Daniel Henrique Barboza  writes:

> On 5/20/24 15:51, Björn Töpel wrote:
>> Daniel/David,
>> 
>> Daniel Henrique Barboza  writes:
>> 
>>> On 5/18/24 16:50, David Hildenbrand wrote:

 Hi,


>> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
>> index 4fdb66052587..16c2bdbfe6b6 100644
>> --- a/hw/riscv/virt.c
>> +++ b/hw/riscv/virt.c
>> @@ -53,6 +53,8 @@
>>  #include "hw/pci-host/gpex.h"
>>  #include "hw/display/ramfb.h"
>>  #include "hw/acpi/aml-build.h"
>> +#include "hw/mem/memory-device.h"
>> +#include "hw/virtio/virtio-mem-pci.h"
>>  #include "qapi/qapi-visit-common.h"
>>  #include "hw/virtio/virtio-iommu.h"
>> @@ -1407,6 +1409,7 @@ static void virt_machine_init(MachineState *machine)
>>      DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
>>      int i, base_hartid, hart_count;
>>      int socket_count = riscv_socket_count(machine);
>> +    hwaddr device_memory_base, device_memory_size;
>>      /* Check socket count limit */
>>      if (VIRT_SOCKETS_MAX < socket_count) {
>> @@ -1553,6 +1556,25 @@ static void virt_machine_init(MachineState *machine)
>>      memory_region_add_subregion(system_memory, memmap[VIRT_MROM].base,
>>                                  mask_rom);
>> +    device_memory_base = ROUND_UP(s->memmap[VIRT_DRAM].base + machine->ram_size,
>> +                                  GiB);
>> +    device_memory_size = machine->maxram_size - machine->ram_size;
>> +
>> +    if (riscv_is_32bit(&s->soc[0])) {
>> +        hwaddr memtop = device_memory_base + ROUND_UP(device_memory_size, GiB);
>> +
>> +        if (memtop > UINT32_MAX) {
>> +            error_report("Memory exceeds 32-bit limit by %lu bytes",
>> +                         memtop - UINT32_MAX);
>> +            exit(EXIT_FAILURE);
>> +        }
>> +    }
>> +
>> +    if (device_memory_size > 0) {
>> +        machine_memory_devices_init(machine, device_memory_base,
>> +                                    device_memory_size);
>> +    }
>> +
>
> I think we need a design discussion before proceeding here. You're allocating all
> available memory as a memory device area, but in theory we might also support
> pc-dimm hotplugs (which would be the equivalent of adding physical RAM DIMMs to
> the board) in the future too. If you're not familiar with this feature, you can
> check out the docs in [1].

 Note that DIMMs are memory devices as well. You can plug both ACPI-based
 memory devices (DIMM, NVDIMM) and virtio-based memory devices (virtio-mem,
 virtio-pmem) into the memory device area.

>
> As an example, the 'virt' ARM board (hw/arm/virt.c) reserves space for this
> type of hotplug by checking how many 'ram_slots' we're allocating for it:
>
> device_memory_size = ms->maxram_size - ms->ram_size + ms->ram_slots * GiB;
>

 Note that we increased the region size to be able to fit most requests 
 even if alignment of memory devices is weird. See below.

 In sane setups, this is usually not required (adding a single additional
 GB for some flexibility might be good enough).

> Other boards do the same with ms->ram_slots. We should consider doing it as well,
> now, even if we're not yet at the point of supporting pc-dimm hotplug, to avoid
> having to change the memory layout later down the road and breaking existing
> setups.
>
> If we want to copy the ARM board, ram_slots is capped to ACPI_MAX_RAM_SLOTS (256).
> Each RAM slot is considered to be a 1GiB DIMM, i.e. we would reserve 256GiB for
> them.

 This only reserves some *additional* space to fix up weird alignment of
 memory devices, *not* the actual space for these devices.

 We don't consider each DIMM to be 1 GiB in size, but add an additional 1 
 GiB in case we have to align DIMMs in physical address space.

 I *think* this dates back to old x86 handling where we aligned the address
 of each DIMM to a 1 GiB boundary. So if you had plugged two 128 MiB
 DIMMs, you'd have required more than 256 MiB of space in the area
 after aligning inside the memory device area.
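To put numbers on the alignment argument, here is a standalone sketch (not QEMU code) that places DIMMs back to back in the device memory area, rounding each start address up to a fixed boundary, together with the ARM-style area size `maxram_size - ram_size + ram_slots * GiB`. The helper names are hypothetical, and the 1 GiB alignment models the old x86 behaviour described above, not anything RISC-V is known to require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t round_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: bytes of the device memory area consumed when each
 * DIMM is placed at the next 'align'-aligned offset, starting at offset 0.
 */
static uint64_t area_needed(const uint64_t *dimm_sizes, size_t n, uint64_t align)
{
    uint64_t next = 0;

    for (size_t i = 0; i < n; i++) {
        next = round_up(next, align) + dimm_sizes[i];
    }
    return next;
}

/* Area size as in the ARM formula: hotpluggable RAM plus 1 GiB slack per slot. */
static uint64_t area_size(uint64_t maxram, uint64_t ram, uint64_t ram_slots)
{
    return maxram - ram + ram_slots * GiB;
}
```

With 1 GiB alignment, two 128 MiB DIMMs end up needing 1152 MiB of area, which a bare 256 MiB of hotpluggable RAM cannot hold; the per-slot 1 GiB of slack covers it. With a small (e.g. 2 MiB) alignment they fit exactly in 256 MiB, which is why a single extra GiB is usually plenty in sane setups.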

>>>
>>> Thanks for the explanation. I missed the part where the ram_slots were being
>>> used just to solve potential alignment issues and pc-dimms could occupy the same
>>> space being allocated via machine_memory_devices_init().
>>>
>>> This patch isn't far off then. If we take care to avoid plugging unaligned memory
>>> we might not even need this spare area.
>> 
>> I'm a bit lost here, so please bear with me. We don't require the 1 GiB
>> alignment on RV AFAIU. I'm having a hard time figuring out what missing

Re: [PATCH v3 2/2] cxl/core: add poison creation event handler

2024-05-20 Thread Shiyang Ruan via





On 2024/5/3 19:32, Shiyang Ruan wrote:



On 2024/4/24 2:40, Dan Williams wrote:

Shiyang Ruan wrote:

Currently the driver only traces CXL events; poison creation (for both vmem
and pmem types) on a CXL memdev is silent.


As it should be.


The OS needs to be notified so that it can handle poisoned pages in time.


No, it was always the case that latent poison is an "action optional"
event. I am not understanding the justification for this approach. What
breaks if the kernel does not forward events to memory_failure_queue()?


I think a type 3 (pmem) device should be handled like an NVDIMM. If
there are processes or filesystems running on it, they could be notified
and then perform a graceful shutdown when poison occurs.




Consider that in the CPU consumption case the firmware first path
will do its own memory_failure_queue() and in the native case the MCE
handler will take care of this. So that leaves pages that are accessed
by DMA or background operation that encounter poison. Those are "action
optional" scenarios and it is not clear to me how the driver tells the
difference.


So for a real CXL device, is the FW-First path always used to report such
failure events? Then there is nothing for the OS-First path to do?




This needs more precision on which agent is responsible for what level
of reporting. The distribution of responsibility between ACPI GHES,
EDAC, and the CXL driver is messy and I expect this changelog to
demonstrate it understands all those considerations.


Ok, I'll try to understand them.


Hi Dan,

I checked the GHES and EDAC code. I think they belong to the FW-First path.
GHES polls memory errors, then:

 1. reports them via EDAC;
 2. constructs an MCE, calls mce_log(), and handles it in a work queue;
 3. queues it to memory_failure immediately if needed (sync, ...).

The community is also adding CXL FW-First trace support [1].

But in the OS-First path, the error record is sent to the CXL driver via MSI,
so it won't conflict with the FW-First path, I think.


[1] 
https://lore.kernel.org/linux-cxl/43ab39e9-c9c2-bfe4-7d1c-bad462221...@amd.com/T/#t







Per CXL spec, the device error event could be signaled through
FW-First and OS-First methods.

So, add poison creation event handler in OS-First method:
   - Qemu:


Why is QEMU relevant for this patch? QEMU is only a development vehicle;
the upstream enabling should reference shipping, or expected-to-be-shipping,
hardware implementations.


Yes, but we currently don't have a real CXL device, so development and
verification can only be done on QEMU with a simulated CXL device.





 - the CXL device reports a POISON creation event to the OS via MSI by
   sending a GMER/DER after a poison record is injected;


When you say "inject" here, do you mean "add to the poison list if
present"? Because "inject" to me means the "Inject Poison" Memory Device
Command.


It's a QEMU QMP command called "cxl-inject-poison", which only adds a
given (address, length) record to the CXL device's poison list; it doesn't
send the INJECT_POISON mailbox command.





   - CXL driver:
 a. parse the POISON event from GMER/DER;
 b. translate poisoned DPA to HPA (PFN);
 c. enqueue poisoned PFN to memory_failure's work queue;
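Steps b and c amount to an address translation followed by a page-frame shift. The sketch below assumes the simplest possible case, a region that maps one device's DPA range linearly with no interleaving; the real `cxl_trace_hpa()` also has to undo the interleave math, which is not modelled here. `struct demo_region`, `demo_dpa_to_hpa()` and `demo_phys_pfn()` are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical non-interleaved region: DPA [dpa_base, dpa_base + size) -> HPA [hpa_base, ...). */
struct demo_region {
    uint64_t hpa_base;
    uint64_t dpa_base;
    uint64_t size;
};

/* Translate a poisoned DPA to an HPA; returns false if the region does not map it. */
static bool demo_dpa_to_hpa(const struct demo_region *r, uint64_t dpa, uint64_t *hpa)
{
    assert(r != NULL && hpa != NULL);
    if (dpa < r->dpa_base || dpa - r->dpa_base >= r->size) {
        return false;
    }
    *hpa = r->hpa_base + (dpa - r->dpa_base);
    return true;
}

/* Same idea as PHYS_PFN(): drop the offset within the page. */
static unsigned long demo_phys_pfn(uint64_t hpa)
{
    return (unsigned long)(hpa >> DEMO_PAGE_SHIFT);
}
```

The resulting PFN is what step c would hand to the memory-failure queue; a DPA outside every committed region has no HPA and cannot be reported this way.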

Signed-off-by: Shiyang Ruan 
---
  drivers/cxl/core/mbox.c   | 119 +-
  drivers/cxl/cxlmem.h  |   8 +--
  include/linux/cxl-event.h |  18 +-
  3 files changed, 125 insertions(+), 20 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f0f54aeccc87..76af0d73859d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -837,25 +837,116 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)

  }
  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
-void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
-    enum cxl_event_log_type type,
-    enum cxl_event_type event_type,
-    const uuid_t *uuid, union cxl_event *evt)
+static void cxl_report_poison(struct cxl_memdev *cxlmd, struct cxl_region *cxlr,
+                              u64 dpa)
  {
-    if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
+    u64 hpa = cxl_trace_hpa(cxlr, cxlmd, dpa);
+    unsigned long pfn = PHYS_PFN(hpa);
+
+    if (!IS_ENABLED(CONFIG_MEMORY_FAILURE))
+        return;


No need for this check, memory_failure_queue() is already stubbed out in
the CONFIG_MEMORY_FAILURE=n case.

Yes, I'm overthinking it.
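The `IS_ENABLED()` guard is redundant because of the usual kernel header pattern: when the config option is off, the function is replaced by an empty `static inline` stub, so call sites can call it unconditionally. A self-contained illustration with hypothetical names (`CONFIG_DEMO_MEMORY_FAILURE`, `demo_failure_queue()`), not the actual `include/linux/mm.h` code:

```c
#include <assert.h>

/* Define this to model CONFIG_MEMORY_FAILURE=y; leaving it undefined models =n. */
/* #define CONFIG_DEMO_MEMORY_FAILURE 1 */

static int demo_queued; /* counts calls that reached the real implementation */

#ifdef CONFIG_DEMO_MEMORY_FAILURE
static void demo_failure_queue(unsigned long pfn, int flags)
{
    (void)pfn;
    (void)flags;
    demo_queued++;
}
#else
/* Stub: compiles to nothing, so callers need no config check. */
static inline void demo_failure_queue(unsigned long pfn, int flags)
{
    (void)pfn;
    (void)flags;
}
#endif

/* The caller stays free of #ifdefs and IS_ENABLED() checks. */
static void demo_report_poison(unsigned long pfn)
{
    demo_failure_queue(pfn, 0);
}
```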




+    memory_failure_queue(pfn, MF_ACTION_REQUIRED);


My expectation is MF_ACTION_REQUIRED is not appropriate for CXL event
reported errors since action is only required for direct consumption
events and those need not be reported through the device event queue.

Got it.


I'm not very sure about the 'Host write/read' types. In my opinion, these
two types of event are sent by the device when the CPU accesses a bad
memory address, so they could be treated as synchronous events that need
the 'MF_ACTION_REQUIRED' flag. Then we can determine the flag from the
type, like this:

- CXL_EVENT_TRANSACTION_READ | CXL_EVENT_TRANSACTION_WRITE
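The proposal above can be sketched as a small mapping from the event's transaction type to memory-failure flags. This only illustrates the idea under discussion, not an agreed design: per Dan's comment, direct CPU consumption is normally handled by the MCE or firmware-first path anyway. The enum values and `DEMO_MF_ACTION_REQUIRED` are hypothetical placeholders, not the kernel's `CXL_EVENT_TRANSACTION_*` or `MF_*` definitions.

```c
#include <assert.h>

/* Hypothetical transaction types, loosely modelled on the CXL event record. */
enum demo_transaction {
    DEMO_TRANSACTION_UNKNOWN,
    DEMO_TRANSACTION_HOST_READ,
    DEMO_TRANSACTION_HOST_WRITE,
    DEMO_TRANSACTION_SCAN_MEDIA,
    DEMO_TRANSACTION_INJECT_POISON,
};

#define DEMO_MF_ACTION_REQUIRED (1 << 0) /* placeholder flag bit */

/*
 * Host read/write are treated as synchronous ("action required");
 * everything else, e.g. background media scans, is "action optional".
 */
static int demo_mf_flags(enum demo_transaction t)
{
    switch (t) {
    case DEMO_TRANSACTION_HOST_READ:
    case DEMO_TRANSACTION_HOST_WRITE:
        return DEMO_MF_ACTION_REQUIRED;
    default:
        return 0;
    }
}
```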

Re: [PATCH 1/3] vl: Allow multiple -overcommit commands

2024-05-20 Thread Thomas Huth


On 21/05/2024 07.08, Thomas Huth wrote:

On 20/05/2024 19.47, Zide Chen wrote:

Both cpu-pm and mem-lock are related to system resource overcommit, but
they are separate from each other, in terms of how they are realized,
and of course, they are applied to different system resources.

It's tempting to use separate command-line options to specify their
behavior. But in the following example, the mem-lock=on setting is quietly
overwritten by the second option, and it's not easy to notice this without
careful inspection.

   --overcommit mem-lock=on
   --overcommit cpu-pm=on

Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
Signed-off-by: Zide Chen 
---
  system/vl.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/system/vl.c b/system/vl.c
index a3eede5fa5b8..ed682643805b 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -3545,8 +3545,12 @@ void qemu_init(int argc, char **argv)
  if (!opts) {
  exit(1);
  }
-    enable_mlock = qemu_opt_get_bool(opts, "mem-lock", false);
-    enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
+
+    /* Don't override the -overcommit option if set */
+    enable_mlock = enable_mlock ||
+    qemu_opt_get_bool(opts, "mem-lock", false);
+    enable_cpu_pm = enable_cpu_pm ||
+    qemu_opt_get_bool(opts, "cpu-pm", false);
  break;
  case QEMU_OPTION_compat:
  {


Reviewed-by: Thomas Huth 


Ah, wait, actually, this is a bad idea, too, since now you cannot disable an 
enabled option anymore. Imagine that you have for example enabled the option 
in the config file, and now you'd like to disable it on the command line 
again - you're stuck with the enabled setting in that case.


I think the better solution is to replace the "false" default value at the end:

enable_mlock = qemu_opt_get_bool(opts, "mem-lock", enable_mlock);
enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", enable_cpu_pm);

What do you think about this?
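The difference between the original code, the `||` patch and the suggested default-value variant can be seen with a tiny standalone model of the option loop. `parse_bool_opt()` stands in for `qemu_opt_get_bool()` (returning the given default when the key is absent, and only understanding "on"/"off" for brevity); the types and names are hypothetical simplifications, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One -overcommit occurrence: each key is NULL (absent), "on" or "off". */
struct overcommit_opt {
    const char *mem_lock;
    const char *cpu_pm;
};

/* Stand-in for qemu_opt_get_bool(): fall back to 'def' if the key is absent. */
static bool parse_bool_opt(const char *val, bool def)
{
    if (!val) {
        return def;
    }
    return strcmp(val, "on") == 0;
}

enum variant { ORIGINAL, OR_PATCH, DEFAULT_PATCH };

static void apply(const struct overcommit_opt *opts, int n, enum variant v,
                  bool *mlock, bool *cpu_pm)
{
    *mlock = false;
    *cpu_pm = false;
    for (int i = 0; i < n; i++) {
        switch (v) {
        case ORIGINAL:      /* each option resets unspecified keys to false */
            *mlock = parse_bool_opt(opts[i].mem_lock, false);
            *cpu_pm = parse_bool_opt(opts[i].cpu_pm, false);
            break;
        case OR_PATCH:      /* once on, a setting can never be turned off */
            *mlock = *mlock || parse_bool_opt(opts[i].mem_lock, false);
            *cpu_pm = *cpu_pm || parse_bool_opt(opts[i].cpu_pm, false);
            break;
        case DEFAULT_PATCH: /* previous value becomes the default */
            *mlock = parse_bool_opt(opts[i].mem_lock, *mlock);
            *cpu_pm = parse_bool_opt(opts[i].cpu_pm, *cpu_pm);
            break;
        }
    }
}
```

With the original loop, `-overcommit mem-lock=on -overcommit cpu-pm=on` ends with mem-lock off. The `||` variant keeps both, but a later `mem-lock=off` can no longer undo an earlier `on`, which is the objection raised above; passing the previous value as the default handles both cases.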

 Thomas

Re: [PATCH ats_vtd v2 20/25] intel_iommu: fill the PASID field when creating an instance of IOMMUTLBEntry

2024-05-20 Thread CLEMENT MATHIEU--DRIF


On 21/05/2024 05:11, Duan, Zhenzhong wrote:
>
>
>> -Original Message-
>> From: CLEMENT MATHIEU--DRIF 
>> Subject: Re: [PATCH ats_vtd v2 20/25] intel_iommu: fill the PASID field when
>> creating an instance of IOMMUTLBEntry
>>
>>
>> On 17/05/2024 12:40, Duan, Zhenzhong wrote:
>>>
 -Original Message-
 From: CLEMENT MATHIEU--DRIF 
 Subject: [PATCH ats_vtd v2 20/25] intel_iommu: fill the PASID field when
 creating an instance of IOMMUTLBEntry

 Signed-off-by: Clément Mathieu--Drif > d...@eviden.com>
 ---
 hw/i386/intel_iommu.c | 7 +++
 1 file changed, 7 insertions(+)

 diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
 index 53f17d66c0..c4ebd4569e 100644
 --- a/hw/i386/intel_iommu.c
 +++ b/hw/i386/intel_iommu.c
 @@ -2299,6 +2299,7 @@ out:
   entry->translated_addr = vtd_get_slpte_addr(pte, s->aw_bits) &
 page_mask;
   entry->addr_mask = ~page_mask;
   entry->perm = access_flags;
 +entry->pasid = pasid;
>>> For PCI_NO_PASID, do we want to assign PCI_NO_PASID or rid2pasid?
>> We have the following statement a few lines above:
>> if (rid2pasid) {
>>  pasid = VTD_CE_GET_RID2PASID();
>> }
>>
>> so we store rid2pasid if the feature is enabled.
>>
>> But maybe we should store PCI_NO_PASID because the rest of the world is
>> not supposed to be aware of what we are doing with rid2pasid.
>>
>> Does it look good to you?
> Yes, that makes sense.
ok, will do
>
>>> Thanks
>>> Zhenzhong
>>>
   return true;

 error:
 @@ -2307,6 +2308,7 @@ error:
   entry->translated_addr = 0;
   entry->addr_mask = 0;
   entry->perm = IOMMU_NONE;
 +entry->pasid = PCI_NO_PASID;
   return false;
 }

 @@ -3497,6 +3499,7 @@ static void
 vtd_piotlb_pasid_invalidate_notify(IntelIOMMUState *s,
   event.entry.target_as = &address_space_memory;
   event.entry.iova = notifier->start;
   event.entry.perm = IOMMU_NONE;
 +event.entry.pasid = pasid;
   event.entry.addr_mask = notifier->end - notifier->start;
   event.entry.translated_addr = 0;

 @@ -3678,6 +3681,7 @@ static void
 vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
   event.entry.target_as = &address_space_memory;
   event.entry.iova = addr;
   event.entry.perm = IOMMU_NONE;
 +event.entry.pasid = pasid;
   event.entry.addr_mask = size - 1;
   event.entry.translated_addr = 0;

 @@ -4335,6 +4339,7 @@ static void
 do_invalidate_device_tlb(VTDAddressSpace *vtd_dev_as,
   event.entry.iova = addr;
   event.entry.perm = IOMMU_NONE;
   event.entry.translated_addr = 0;
 +event.entry.pasid = vtd_dev_as->pasid;
   memory_region_notify_iommu(&vtd_dev_as->iommu, 0, event);
 }

 @@ -4911,6 +4916,7 @@ static IOMMUTLBEntry
 vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
   IOMMUTLBEntry iotlb = {
   /* We'll fill in the rest later. */
   .target_as = &address_space_memory,
 +.pasid = vtd_as->pasid,
   };
   bool success;

 @@ -4923,6 +4929,7 @@ static IOMMUTLBEntry
 vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
   iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
   iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
   iotlb.perm = IOMMU_RW;
 +iotlb.pasid = PCI_NO_PASID;
   success = true;
   }

 --
 2.44.0
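The outcome agreed above (report "no PASID" to the rest of QEMU even when the translation internally used the RID2PASID value) can be sketched as follows. The sentinel value, struct and function are hypothetical stand-ins; the real logic lives in `hw/i386/intel_iommu.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PCI_NO_PASID UINT32_MAX /* assumption: "no PASID" sentinel */

struct demo_tlb_entry {
    uint64_t iova;
    uint32_t pasid; /* what consumers of the entry see */
};

/*
 * Fill an entry for a request carrying 'req_pasid' (DEMO_PCI_NO_PASID if the
 * request had none). Returns the PASID actually used for the page-table walk.
 */
static uint32_t demo_fill_entry(struct demo_tlb_entry *e, uint64_t iova,
                                uint32_t req_pasid, bool rid2pasid_enabled,
                                uint32_t rid2pasid)
{
    /* Internally, no-PASID requests are walked with the RID2PASID value. */
    uint32_t walk_pasid = (req_pasid == DEMO_PCI_NO_PASID && rid2pasid_enabled)
                              ? rid2pasid : req_pasid;

    e->iova = iova;
    /* Externally, keep the request's own PASID so RID2PASID stays hidden. */
    e->pasid = req_pasid;
    return walk_pasid;
}
```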

Re: [PATCH 1/3] vl: Allow multiple -overcommit commands

2024-05-20 Thread Thomas Huth


On 20/05/2024 19.47, Zide Chen wrote:

Both cpu-pm and mem-lock are related to system resource overcommit, but
they are separate from each other, in terms of how they are realized,
and of course, they are applied to different system resources.

It's tempting to use separate command-line options to specify their
behavior. But in the following example, the mem-lock=on setting is quietly
overwritten by the second option, and it's not easy to notice this without
careful inspection.

   --overcommit mem-lock=on
   --overcommit cpu-pm=on

Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
Signed-off-by: Zide Chen 
---
  system/vl.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/system/vl.c b/system/vl.c
index a3eede5fa5b8..ed682643805b 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -3545,8 +3545,12 @@ void qemu_init(int argc, char **argv)
  if (!opts) {
  exit(1);
  }
-enable_mlock = qemu_opt_get_bool(opts, "mem-lock", false);
-enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
+
+/* Don't override the -overcommit option if set */
+enable_mlock = enable_mlock ||
+qemu_opt_get_bool(opts, "mem-lock", false);
+enable_cpu_pm = enable_cpu_pm ||
+qemu_opt_get_bool(opts, "cpu-pm", false);
  break;
  case QEMU_OPTION_compat:
  {


Reviewed-by: Thomas Huth

Re: [PATCH] hw/loongarch/virt: Fix FDT memory node address width

2024-05-20 Thread gaosong


On 2024/5/21 5:06 AM, Jiaxun Yang wrote:

The high 32 bits of the memory node's base address and size were dropped:
the qemu_fdt_setprop_cells() call hard-coded the high cells to 0.

Signed-off-by: Jiaxun Yang 
---
This should be stable backported, otherwise DT boot is totally broken.
---
  hw/loongarch/virt.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

Thank you.

Reviewed-by: Song Gao 

Thanks.
Song Gao

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index f0640d2d8035..f97626bacf65 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -463,7 +463,8 @@ static void fdt_add_memory_node(MachineState *ms,
  char *nodename = g_strdup_printf("/memory@%" PRIx64, base);
  
  qemu_fdt_add_subnode(ms->fdt, nodename);

-qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", 0, base, 0, size);
+qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", base >> 32, base,
+   size >> 32, size);
  qemu_fdt_setprop_string(ms->fdt, nodename, "device_type", "memory");
  
  if (ms->numa_state && ms->numa_state->num_nodes) {


---
base-commit: 85ef20f1673feaa083f4acab8cf054df77b0dbed
change-id: 20240520-loongarch-fdt-memnode-e36c01ae9b6e

Best regards,
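For a `reg` property with two address cells and two size cells (which the fix implies), each 64-bit value has to be split into a high and a low 32-bit cell; with the high cells hard-coded to 0, any base or size at or above 4 GiB is truncated. A standalone sketch of the split (the function names are hypothetical; values are shown in host order, whereas the FDT stores cells big-endian):

```c
#include <assert.h>
#include <stdint.h>

/* Split base/size into the four cells of a 2+2 'reg' property. */
static void fill_reg_cells(uint64_t base, uint64_t size, uint32_t cells[4])
{
    cells[0] = (uint32_t)(base >> 32); /* address, high cell */
    cells[1] = (uint32_t)base;         /* address, low cell */
    cells[2] = (uint32_t)(size >> 32); /* size, high cell */
    cells[3] = (uint32_t)size;         /* size, low cell */
}

/* What the old call effectively produced: high cells always 0. */
static void fill_reg_cells_old(uint64_t base, uint64_t size, uint32_t cells[4])
{
    cells[0] = 0;
    cells[1] = (uint32_t)base;
    cells[2] = 0;
    cells[3] = (uint32_t)size;
}

/* Reassemble the 64-bit value a device-tree consumer would read back. */
static uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```

A 4 GiB memory node, for instance, reads back with a size of 0 under the old encoding.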

RE: [PATCH ats_vtd v2 20/25] intel_iommu: fill the PASID field when creating an instance of IOMMUTLBEntry

2024-05-20 Thread Duan, Zhenzhong



>-Original Message-
>From: CLEMENT MATHIEU--DRIF 
>Subject: Re: [PATCH ats_vtd v2 20/25] intel_iommu: fill the PASID field when
>creating an instance of IOMMUTLBEntry
>
>
>On 17/05/2024 12:40, Duan, Zhenzhong wrote:
>>
>>
>>> -Original Message-
>>> From: CLEMENT MATHIEU--DRIF 
>>> Subject: [PATCH ats_vtd v2 20/25] intel_iommu: fill the PASID field when
>>> creating an instance of IOMMUTLBEntry
>>>
>>> Signed-off-by: Clément Mathieu--Drif d...@eviden.com>
>>> ---
>>> hw/i386/intel_iommu.c | 7 +++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index 53f17d66c0..c4ebd4569e 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -2299,6 +2299,7 @@ out:
>>>  entry->translated_addr = vtd_get_slpte_addr(pte, s->aw_bits) &
>>> page_mask;
>>>  entry->addr_mask = ~page_mask;
>>>  entry->perm = access_flags;
>>> +entry->pasid = pasid;
>> For PCI_NO_PASID, do we want to assign PCI_NO_PASID or rid2pasid?
>We have the following statement a few lines above:
>if (rid2pasid) {
>     pasid = VTD_CE_GET_RID2PASID();
>}
>
>so we store rid2pasid if the feature is enabled.
>
>But maybe we should store PCI_NO_PASID because the rest of the world is
>not supposed to be aware of what we are doing with rid2pasid.
>
>Does it look good to you?

Yes, that makes sense.

>>
>> Thanks
>> Zhenzhong
>>
>>>  return true;
>>>
>>> error:
>>> @@ -2307,6 +2308,7 @@ error:
>>>  entry->translated_addr = 0;
>>>  entry->addr_mask = 0;
>>>  entry->perm = IOMMU_NONE;
>>> +entry->pasid = PCI_NO_PASID;
>>>  return false;
>>> }
>>>
>>> @@ -3497,6 +3499,7 @@ static void
>>> vtd_piotlb_pasid_invalidate_notify(IntelIOMMUState *s,
>>>  event.entry.target_as = &address_space_memory;
>>>  event.entry.iova = notifier->start;
>>>  event.entry.perm = IOMMU_NONE;
>>> +event.entry.pasid = pasid;
>>>  event.entry.addr_mask = notifier->end - notifier->start;
>>>  event.entry.translated_addr = 0;
>>>
>>> @@ -3678,6 +3681,7 @@ static void
>>> vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id,
>>>  event.entry.target_as = &address_space_memory;
>>>  event.entry.iova = addr;
>>>  event.entry.perm = IOMMU_NONE;
>>> +event.entry.pasid = pasid;
>>>  event.entry.addr_mask = size - 1;
>>>  event.entry.translated_addr = 0;
>>>
>>> @@ -4335,6 +4339,7 @@ static void
>>> do_invalidate_device_tlb(VTDAddressSpace *vtd_dev_as,
>>>  event.entry.iova = addr;
>>>  event.entry.perm = IOMMU_NONE;
>>>  event.entry.translated_addr = 0;
>>> +event.entry.pasid = vtd_dev_as->pasid;
>>>  memory_region_notify_iommu(&vtd_dev_as->iommu, 0, event);
>>> }
>>>
>>> @@ -4911,6 +4916,7 @@ static IOMMUTLBEntry
>>> vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>>>  IOMMUTLBEntry iotlb = {
>>>  /* We'll fill in the rest later. */
>>>  .target_as = &address_space_memory,
>>> +.pasid = vtd_as->pasid,
>>>  };
>>>  bool success;
>>>
>>> @@ -4923,6 +4929,7 @@ static IOMMUTLBEntry
>>> vtd_iommu_translate(IOMMUMemoryRegion *iommu, hwaddr addr,
>>>  iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
>>>  iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
>>>  iotlb.perm = IOMMU_RW;
>>> +iotlb.pasid = PCI_NO_PASID;
>>>  success = true;
>>>  }
>>>
>>> --
>>> 2.44.0

Re: [PATCH] hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1

2024-05-20 Thread Jason Wang

On Tue, May 21, 2024 at 6:23 AM Fabiano Rosas  wrote:
>
> Fiona Ebner  writes:
>
> > Migration from an 8.2 or 9.0 binary to an 8.1 binary with machine
> > version 8.1 can fail with:
> >
> >> kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 0x10179bfffe7
> >> kvm: Failed to load virtio-net:virtio
> >> kvm: error while loading state for instance 0x0 of device ':00:12.0/virtio-net'
> >> kvm: load of migration failed: Operation not permitted
> >
> > The series
> >
> > 53da8b5a99 virtio-net: Add support for USO features
> > 9da1684954 virtio-net: Add USO flags to vhost support.
> > f03e0cf63b tap: Add check for USO features
> > 2ab0ec3121 tap: Add USO support to tap device.
> >
> > only landed in QEMU 8.2, so the compatibility flags should be part of
> > machine version 8.1.
> >
> > Moving the flags unfortunately breaks forward migration with machine
> > version 8.1 from a binary without this patch to a binary with this
> > patch.
> >
> > Fixes: 53da8b5a99 ("virtio-net: Add support for USO features")
> > Signed-off-by: Fiona Ebner 
>
> Reviewed-by: Fabiano Rosas 
>
> I'll get to it eventually, but is this another one where just having
> -device virtio-net in the command line when testing cross-version
> migration would already have caught the issue?

Yes, if you are using QEMU >= 8.2. QEMU has a default machine type for
each version.

Thanks

>
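Decoding the two feature masks from the error log shows exactly which bits the 8.1 destination rejects. A minimal sketch; bits 54 to 56 are the USO feature bits from the virtio specification (`VIRTIO_NET_F_GUEST_USO4`, `VIRTIO_NET_F_GUEST_USO6`, `VIRTIO_NET_F_HOST_USO`), defined locally here rather than taken from a header.

```c
#include <assert.h>
#include <stdint.h>

/* virtio-net USO feature bits (virtio spec), defined locally for this sketch. */
#define DEMO_NET_F_GUEST_USO4 54
#define DEMO_NET_F_GUEST_USO6 55
#define DEMO_NET_F_HOST_USO   56

/* Bits present in the migration stream that the destination does not allow. */
static uint64_t unsupported_features(uint64_t features, uint64_t allowed)
{
    return features & ~allowed;
}
```

For the values in the log, the only offending bits are the three USO bits, consistent with the compat flags belonging to machine version 8.1.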

Re: [PATCH] intel_iommu: Use the latest fault reasons defined by spec

2024-05-20 Thread Jason Wang

On Mon, May 20, 2024 at 12:15 PM Liu, Yi L  wrote:
>
> > From: Duan, Zhenzhong 
> > Sent: Monday, May 20, 2024 11:41 AM
> >
> >
> >
> > >-Original Message-
> > >From: Jason Wang 
> > >Sent: Monday, May 20, 2024 8:44 AM
> > >To: Duan, Zhenzhong 
> > >Cc: qemu-devel@nongnu.org; Liu, Yi L ; Peng, Chao P
> > >; Yu Zhang ; Michael
> > >S. Tsirkin ; Paolo Bonzini ;
> > >Richard Henderson ; Eduardo Habkost
> > >; Marcel Apfelbaum 
> > >Subject: Re: [PATCH] intel_iommu: Use the latest fault reasons defined by
> > >spec
> > >
> > >On Fri, May 17, 2024 at 6:26 PM Zhenzhong Duan
> > > wrote:
> > >>
> > >> From: Yu Zhang 
> > >>
> > >> Currently we use only VTD_FR_PASID_TABLE_INV as fault reason.
> > >> Update with more detailed fault reasons listed in VT-d spec 7.2.3.
> > >>
> > >> Signed-off-by: Yu Zhang 
> > >> Signed-off-by: Zhenzhong Duan 
> > >> ---
> > >
> > >I wonder whether this could be noticed by the guest. If so, should
> > >we consider adding something like a version to the vtd emulation code?
> >
> > Kernel only dumps the reason like below:
> >
> > DMAR: [DMA Write NO_PASID] Request device [20:00.0] fault addr 0x123460
> > [fault reason 0x71] SM: Present bit in first-level paging entry is clear
>
> Yes, guest kernel would notice it as the fault would be injected to vm.
>
> > Maybe bump 1.0 -> 1.1?
> > My understanding is that the version number is only informational and is far
> > from an accurate indicator of whether a feature is supported. The driver should
> > check the cap/ecap bits instead.
>
> Should the version ID here be aligned with VT-d spec?

Probably; this might be something that management software could check
for migration compatibility.

> If yes, it should
> be 3.0, as scalable mode was introduced in spec 3.0, and the fault
> codes were redefined together with the introduction of this translation
> mode. Below is a snippet from the change log of the VT-d spec.
>
> June 2018 3.0
> • Removed all text related to Extended-Mode.
> • Added support for scalable-mode translation for DMA Remapping, that enables
>   PASID-granular first-level, second-level, nested and pass-through translation
>   functions.
> • Widen invalidation queue descriptors and page request queue descriptors from
>   128 bits to 256 bits and redefined page-request and page-response descriptors.
> • Listed all fault conditions in a unified table and described DMA Remapping
>   hardware behavior under each condition. Assigned new code for each fault
>   condition in scalable-mode operation.
> • Added support for Accessed/Dirty (A/D) bits in second-level translation.
> • Added support for submitting commands and receiving response from virtual
>   DMA Remapping hardware.
> • Added a table on snooping behavior and memory type of hardware access to
>   various remapping structures as appendix.
> • Move Page Request Overflow (PRO) fault reporting from Fault Status register
>   (FSTS_REG) to Page Request Status register (PRS_REG).
>
> Regards.
> Yi Liu

Thanks
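For reference, my reading of the VT-d Version Register is that it encodes the major version in bits 7:4 and the minor version in bits 3:0 (worth double-checking against the register description in the spec), so the options discussed above map to small byte values: the current 1.0 is 0x10, 1.1 would be 0x11 and 3.0 would be 0x30. A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed VER_REG layout: major version in bits 7:4, minor in bits 3:0. */
static uint32_t vtd_ver_encode(unsigned major, unsigned minor)
{
    assert(major < 16 && minor < 16);
    return (major << 4) | minor;
}

static unsigned vtd_ver_major(uint32_t ver) { return (ver >> 4) & 0xf; }
static unsigned vtd_ver_minor(uint32_t ver) { return ver & 0xf; }
```

As noted in the thread, a guest driver should still key features off the CAP/ECAP bits; the version value is mainly informational, and possibly useful to management software when checking migration compatibility.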

Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-20 Thread Peter Xu

I was at a conference and then on PTO until today, so tomorrow will be my
first working day back. Sorry Steve, I will try my best to read it before
next week. I haven't dared to read much of my inbox yet. A bit scared, but
I need to face it tomorrow.

On Mon, May 20, 2024, 6:28 p.m. Fabiano Rosas  wrote:

> Steven Sistare  writes:
>
> > Hi Peter, Hi Fabiano,
> > Will you have time to review the migration guts of this series any time soon?
> > In particular:
> >
> > [PATCH V1 05/26] migration: precreate vmstate
> > [PATCH V1 06/26] migration: precreate vmstate for exec
> > [PATCH V1 12/26] migration: vmstate factory object
> > [PATCH V1 18/26] migration: cpr-exec-args parameter
> > [PATCH V1 20/26] migration: cpr-exec mode
> >
>
> I'll get to them this week. I'm trying to make some progress with my own
> code before I forget how to program. I'm also trying to find some time
> to implement the device options in the migration tests so we can stop
> these virtio-* breakages that have been popping up.
>
>

Re: [PATCH] hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1

2024-05-20 Thread Jason Wang

On Fri, May 17, 2024 at 3:54 PM Fiona Ebner  wrote:
>
> Migration from an 8.2 or 9.0 binary to an 8.1 binary with machine
> version 8.1 can fail with:
>
> > kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 0x10179bfffe7
> > kvm: Failed to load virtio-net:virtio
> > kvm: error while loading state for instance 0x0 of device ':00:12.0/virtio-net'
> > kvm: load of migration failed: Operation not permitted
>
> The series
>
> 53da8b5a99 virtio-net: Add support for USO features
> 9da1684954 virtio-net: Add USO flags to vhost support.
> f03e0cf63b tap: Add check for USO features
> 2ab0ec3121 tap: Add USO support to tap device.
>
> only landed in QEMU 8.2, so the compatibility flags should be part of
> machine version 8.1.
>
> Moving the flags unfortunately breaks forward migration with machine
> version 8.1 from a binary without this patch to a binary with this
> patch.
>
> Fixes: 53da8b5a99 ("virtio-net: Add support for USO features")
> Signed-off-by: Fiona Ebner 

Acked-by: Jason Wang 

Thanks

[PATCH v2 06/12] target/ppc: Add PPR32 SPR

2024-05-20 Thread Nicholas Piggin

PPR32 provides access to the upper half of PPR.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h|  1 +
 target/ppc/spr_common.h |  2 ++
 target/ppc/cpu_init.c   | 12 
 target/ppc/translate.c  | 16 
 4 files changed, 31 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 2532408be0..141cbefb4c 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2120,6 +2120,7 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_POWER_MMCRS   (0x37E)
 #define SPR_WORT  (0x37F)
 #define SPR_PPR   (0x380)
+#define SPR_PPR32 (0x382)
 #define SPR_750_GQR0  (0x390)
 #define SPR_440_DNV0  (0x390)
 #define SPR_750_GQR1  (0x391)
diff --git a/target/ppc/spr_common.h b/target/ppc/spr_common.h
index eb2561f593..9e40b3b608 100644
--- a/target/ppc/spr_common.h
+++ b/target/ppc/spr_common.h
@@ -203,6 +203,8 @@ void spr_read_tfmr(DisasContext *ctx, int gprn, int sprn);
 void spr_write_tfmr(DisasContext *ctx, int sprn, int gprn);
 void spr_write_lpcr(DisasContext *ctx, int sprn, int gprn);
 void spr_read_dexcr_ureg(DisasContext *ctx, int gprn, int sprn);
+void spr_read_ppr32(DisasContext *ctx, int gprn, int sprn);
+void spr_write_ppr32(DisasContext *ctx, int sprn, int gprn);
 #endif
 
 void register_low_BATs(CPUPPCState *env);
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 892fb6ce02..7684a59d75 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5623,6 +5623,14 @@ static void register_HEIR64_spr(CPUPPCState *env)
  0x);
 }
 
+static void register_power7_common_sprs(CPUPPCState *env)
+{
+    spr_register(env, SPR_PPR32, "PPR32",
+                 &spr_read_ppr32, &spr_write_ppr32,
+                 &spr_read_ppr32, &spr_write_ppr32,
+                 0x);
+}
+
 static void register_power8_tce_address_control_sprs(CPUPPCState *env)
 {
 spr_register_kvm(env, SPR_TAR, "TAR",
@@ -6118,6 +6126,7 @@ static void init_proc_POWER7(CPUPPCState *env)
 register_power6_common_sprs(env);
 register_HEIR32_spr(env);
 register_power6_dbg_sprs(env);
+register_power7_common_sprs(env);
 register_power7_book4_sprs(env);
 
 /* env variables */
@@ -6264,6 +6273,7 @@ static void init_proc_POWER8(CPUPPCState *env)
 register_power6_common_sprs(env);
 register_HEIR32_spr(env);
 register_power6_dbg_sprs(env);
+register_power7_common_sprs(env);
 register_power8_tce_address_control_sprs(env);
 register_power8_ids_sprs(env);
 register_power8_ebb_sprs(env);
@@ -6431,6 +6441,7 @@ static void init_proc_POWER9(CPUPPCState *env)
 register_power6_common_sprs(env);
 register_HEIR32_spr(env);
 register_power6_dbg_sprs(env);
+register_power7_common_sprs(env);
 register_power8_tce_address_control_sprs(env);
 register_power8_ids_sprs(env);
 register_power8_ebb_sprs(env);
@@ -6625,6 +6636,7 @@ static void init_proc_POWER10(CPUPPCState *env)
 register_power6_common_sprs(env);
 register_HEIR64_spr(env);
 register_power6_dbg_sprs(env);
+register_power7_common_sprs(env);
 register_power8_tce_address_control_sprs(env);
 register_power8_ids_sprs(env);
 register_power8_ebb_sprs(env);
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index ca4f4c9371..137370b649 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -1414,6 +1414,22 @@ void spr_read_dexcr_ureg(DisasContext *ctx, int gprn, int sprn)
 gen_load_spr(t0, sprn + 16);
 tcg_gen_ext32u_tl(cpu_gpr[gprn], t0);
 }
+
+/* The PPR32 SPR accesses the upper 32-bits of PPR */
+void spr_read_ppr32(DisasContext *ctx, int gprn, int sprn)
+{
+gen_load_spr(cpu_gpr[gprn], SPR_PPR);
+tcg_gen_shri_tl(cpu_gpr[gprn], cpu_gpr[gprn], 32);
+}
+
+void spr_write_ppr32(DisasContext *ctx, int sprn, int gprn)
+{
+TCGv t0 = tcg_temp_new();
+
+tcg_gen_shli_tl(t0, cpu_gpr[gprn], 32);
+gen_store_spr(SPR_PPR, t0);
+spr_store_dump_spr(SPR_PPR);
+}
 #endif
 
 #define GEN_HANDLER(name, opc1, opc2, opc3, inval, type)  \
-- 
2.43.0
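In plain C, the register semantics implemented by the two TCG helpers above look like this (a model of the values only, not TCG code; the function names are hypothetical). As written in this version, a PPR32 write replaces the whole PPR and leaves its low 32 bits zero; the `_preserve` variant shows the alternative of depositing into the upper half only.

```c
#include <assert.h>
#include <stdint.h>

/* mfspr PPR32: the upper 32 bits of PPR, zero-extended. */
static uint64_t ppr32_read(uint64_t ppr)
{
    return ppr >> 32;
}

/* mtspr PPR32 as in the patch: PPR = GPR << 32 (low half becomes zero). */
static uint64_t ppr32_write(uint64_t ppr, uint64_t gpr)
{
    (void)ppr; /* old value is not used */
    return gpr << 32;
}

/* Alternative: replace only the upper half and keep the low 32 bits. */
static uint64_t ppr32_write_preserve(uint64_t ppr, uint64_t gpr)
{
    return (ppr & 0xffffffffULL) | (gpr << 32);
}
```

Since the low half of PPR is reserved, either choice round-trips the PPR32 value; they differ only in whether the reserved low bits survive a write.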

[PATCH v2 02/12] target/ppc: improve checkstop logging

2024-05-20 Thread Nicholas Piggin

Change the logging so that it no longer also prints to stderr, because a
checkstop is a guest error (or perhaps a simulated machine error) rather
than a QEMU error; send it only to the log.

Update the checkstop message, and log CPU registers too.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index b2b51537b7..17bf8df9d7 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -430,17 +430,19 @@ static void powerpc_mcheck_checkstop(CPUPPCState *env)
 /* KVM guests always have MSR[ME] enabled */
 #ifdef CONFIG_TCG
 CPUState *cs = env_cpu(env);
+FILE *f;
 
 if (FIELD_EX64(env->msr, MSR, ME)) {
 return;
 }
 
-/* Machine check exception is not enabled. Enter checkstop state. */
-fprintf(stderr, "Machine check while not allowed. "
-"Entering checkstop state\n");
-if (qemu_log_separate()) {
-qemu_log("Machine check while not allowed. "
- "Entering checkstop state\n");
+f = qemu_log_trylock();
+if (f) {
+fprintf(f, "Entering checkstop state: "
+   "machine check with MSR[ME]=0\n");
+cpu_dump_state(cs, f, CPU_DUMP_FPU | CPU_DUMP_CCOP);
+qemu_log_unlock(f);
+}
 
 /*
  * This stops the machine and logs CPU state without killing QEMU
-- 
2.43.0

[PATCH v2 08/12] target/ppc: Add SMT support to simple SPRs

2024-05-20 Thread Nicholas Piggin

The AMOR, MMCRC, HRMOR, TSCR, HMEER and RPR SPRs are per-core or per-LPAR
registers with simple (generic) implementations. Switch their write
handlers to the SMT-aware core/LPAR variants.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu_init.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 7684a59d75..023b58a3ac 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -246,7 +246,7 @@ static void register_amr_sprs(CPUPPCState *env)
 spr_register_hv(env, SPR_AMOR, "AMOR",
 SPR_NOACCESS, SPR_NOACCESS,
 SPR_NOACCESS, SPR_NOACCESS,
-_read_generic, _write_generic,
+_read_generic, _core_lpar_write_generic,
 0);
 #endif /* !CONFIG_USER_ONLY */
 }
@@ -5489,7 +5489,7 @@ static void register_book3s_ids_sprs(CPUPPCState *env)
 spr_register_hv(env, SPR_MMCRC, "MMCRC",
  SPR_NOACCESS, SPR_NOACCESS,
  SPR_NOACCESS, SPR_NOACCESS,
- _read_generic, _write_generic32,
+ _read_generic, _core_write_generic32,
  0x);
 spr_register_hv(env, SPR_MMCRH, "MMCRH",
  SPR_NOACCESS, SPR_NOACCESS,
@@ -5529,7 +5529,7 @@ static void register_book3s_ids_sprs(CPUPPCState *env)
 spr_register_hv(env, SPR_HRMOR, "HRMOR",
  SPR_NOACCESS, SPR_NOACCESS,
  SPR_NOACCESS, SPR_NOACCESS,
- _read_generic, _write_generic,
+ _read_generic, _core_write_generic,
  0x);
 }
 
@@ -5757,7 +5757,7 @@ static void register_power_common_book4_sprs(CPUPPCState *env)
 spr_register_hv(env, SPR_TSCR, "TSCR",
  SPR_NOACCESS, SPR_NOACCESS,
  SPR_NOACCESS, SPR_NOACCESS,
- _read_generic, _write_generic32,
+ _read_generic, _core_write_generic32,
  0x);
 spr_register_hv(env, SPR_HMER, "HMER",
  SPR_NOACCESS, SPR_NOACCESS,
@@ -5767,7 +5767,7 @@ static void register_power_common_book4_sprs(CPUPPCState *env)
 spr_register_hv(env, SPR_HMEER, "HMEER",
  SPR_NOACCESS, SPR_NOACCESS,
  SPR_NOACCESS, SPR_NOACCESS,
- _read_generic, _write_generic,
+ _read_generic, _core_write_generic,
  0x);
 spr_register_hv(env, SPR_TFMR, "TFMR",
  SPR_NOACCESS, SPR_NOACCESS,
@@ -5843,7 +5843,7 @@ static void register_power8_rpr_sprs(CPUPPCState *env)
 spr_register_hv(env, SPR_RPR, "RPR",
 SPR_NOACCESS, SPR_NOACCESS,
 SPR_NOACCESS, SPR_NOACCESS,
-_read_generic, _write_generic,
+_read_generic, _core_write_generic,
 0x0103070F1F3F);
 #endif
 }
-- 
2.43.0

[PATCH v2 10/12] target/ppc: Implement LDBAR, TTR SPRs

2024-05-20 Thread Nicholas Piggin

LDBAR and TTR are POWER-specific SPRs. These simple implementations
are sufficient for IBM proprietary firmware for now.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h  |  2 ++
 target/ppc/cpu_init.c | 10 ++
 2 files changed, 12 insertions(+)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 141cbefb4c..823be85d03 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2098,6 +2098,7 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_DEXCR (0x33C)
 #define SPR_IC(0x350)
 #define SPR_VTB   (0x351)
+#define SPR_LDBAR (0x352)
 #define SPR_MMCRC (0x353)
 #define SPR_PSSCR (0x357)
 #define SPR_440_INV0  (0x370)
@@ -2144,6 +2145,7 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_440_IVLIM (0x399)
 #define SPR_TSCR  (0x399)
 #define SPR_750_DMAU  (0x39A)
+#define SPR_POWER_TTR (0x39A)
 #define SPR_750_DMAL  (0x39B)
 #define SPR_440_RSTCFG(0x39B)
 #define SPR_BOOKE_DCDBTRL (0x39C)
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 023b58a3ac..7f2f8e5a4a 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5784,6 +5784,16 @@ static void register_power_common_book4_sprs(CPUPPCState *env)
  _access_nop, _write_generic,
  _access_nop, _write_generic,
  0x);
+spr_register_hv(env, SPR_LDBAR, "LDBAR",
+ SPR_NOACCESS, SPR_NOACCESS,
+ SPR_NOACCESS, SPR_NOACCESS,
+ _read_generic, _core_lpar_write_generic,
+ 0x);
+spr_register_hv(env, SPR_POWER_TTR, "TTR",
+ SPR_NOACCESS, SPR_NOACCESS,
+ SPR_NOACCESS, SPR_NOACCESS,
+ _read_generic, _core_write_generic,
+ 0x);
 #endif
 }
 
-- 
2.43.0

[PATCH v2 07/12] target/ppc: add helper to write per-LPAR SPRs

2024-05-20 Thread Nicholas Piggin

An SPR can be per-thread, per-core, or per-LPAR. A per-LPAR SPR behaves
as per-core in 1LPAR mode and as per-thread otherwise.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/spr_common.h |  2 ++
 target/ppc/translate.c  | 28 
 2 files changed, 30 insertions(+)

diff --git a/target/ppc/spr_common.h b/target/ppc/spr_common.h
index 9e40b3b608..85f73b860b 100644
--- a/target/ppc/spr_common.h
+++ b/target/ppc/spr_common.h
@@ -83,6 +83,8 @@ void spr_read_generic(DisasContext *ctx, int gprn, int sprn);
 void spr_write_generic(DisasContext *ctx, int sprn, int gprn);
 void spr_write_generic32(DisasContext *ctx, int sprn, int gprn);
 void spr_core_write_generic(DisasContext *ctx, int sprn, int gprn);
+void spr_core_write_generic32(DisasContext *ctx, int sprn, int gprn);
+void spr_core_lpar_write_generic(DisasContext *ctx, int sprn, int gprn);
 void spr_write_MMCR0(DisasContext *ctx, int sprn, int gprn);
 void spr_write_MMCR1(DisasContext *ctx, int sprn, int gprn);
 void spr_write_MMCRA(DisasContext *ctx, int sprn, int gprn);
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 137370b649..c688551434 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -535,6 +535,34 @@ void spr_core_write_generic(DisasContext *ctx, int sprn, int gprn)
 spr_store_dump_spr(sprn);
 }
 
+void spr_core_write_generic32(DisasContext *ctx, int sprn, int gprn)
+{
+TCGv t0;
+
+if (!(ctx->flags & POWERPC_FLAG_SMT)) {
+spr_write_generic32(ctx, sprn, gprn);
+return;
+}
+
+if (!gen_serialize(ctx)) {
+return;
+}
+
+t0 = tcg_temp_new();
+tcg_gen_ext32u_tl(t0, cpu_gpr[gprn]);
+gen_helper_spr_core_write_generic(tcg_env, tcg_constant_i32(sprn), t0);
+spr_store_dump_spr(sprn);
+}
+
+void spr_core_lpar_write_generic(DisasContext *ctx, int sprn, int gprn)
+{
+if (ctx->flags & POWERPC_FLAG_SMT_1LPAR) {
+spr_core_write_generic(ctx, sprn, gprn);
+} else {
+spr_write_generic(ctx, sprn, gprn);
+}
+}
+
 static void spr_write_CTRL_ST(DisasContext *ctx, int sprn, int gprn)
 {
 /* This does not implement >1 thread */
-- 
2.43.0
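To make the three write flavours concrete, here is a minimal C model of one SMT core, assuming a hypothetical 4-thread core with one copy of a single SPR per thread. The struct and function names are illustrative, not QEMU's: write_thread() stands in for spr_write_generic(), write_core()/write_core32() for the per-core helpers, and write_core_lpar() for the dispatch in spr_core_lpar_write_generic().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_THREADS 4 /* hypothetical SMT4 core */

struct model_core {
    bool one_lpar;            /* models POWERPC_FLAG_SMT_1LPAR */
    uint64_t spr[NR_THREADS]; /* one copy of the SPR per thread */
};

/* Per-thread write: only the executing thread's copy changes. */
static void write_thread(struct model_core *c, int tid, uint64_t val)
{
    assert(tid >= 0 && tid < NR_THREADS);
    c->spr[tid] = val;
}

/* Per-core write: every thread of the core observes the new value. */
static void write_core(struct model_core *c, uint64_t val)
{
    for (int i = 0; i < NR_THREADS; i++) {
        c->spr[i] = val;
    }
}

/* Per-core 32-bit write: zero-extend the low word first (like ext32u). */
static void write_core32(struct model_core *c, uint64_t val)
{
    write_core(c, (uint32_t)val);
}

/* Per-LPAR write: per-core in 1LPAR mode, per-thread otherwise. */
static void write_core_lpar(struct model_core *c, int tid, uint64_t val)
{
    if (c->one_lpar) {
        write_core(c, val);
    } else {
        write_thread(c, tid, val);
    }
}
```

The model deliberately ignores the gen_serialize() step that the real per-core helpers perform before touching sibling threads' state.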

[PATCH v2 04/12] target/ppc: BookE DECAR SPR is 32-bit

2024-05-20 Thread Nicholas Piggin

The DECAR SPR is 32 bits wide.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index ee01415c32..927721d49a 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -792,7 +792,7 @@ static void register_BookE_sprs(CPUPPCState *env, uint64_t ivor_mask)
  0x);
 spr_register(env, SPR_BOOKE_DECAR, "DECAR",
  SPR_NOACCESS, SPR_NOACCESS,
- SPR_NOACCESS, _write_generic,
+ SPR_NOACCESS, _write_generic32,
  0x);
 /* SPRGs */
 spr_register(env, SPR_USPRG0, "USPRG0",
-- 
2.43.0
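On a 64-bit target, switching DECAR to spr_write_generic32 means only the low 32 bits of the source GPR are stored, zero-extended, assuming it behaves like the ext32u path used by spr_core_write_generic32() elsewhere in this series. A minimal sketch of that effect (decar_store is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a 32-bit SPR write on a 64-bit target:
 * keep the low word of the GPR and clear the high word.
 */
static uint64_t decar_store(uint64_t gpr)
{
    return (uint64_t)(uint32_t)gpr;
}
```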

[PATCH v2 12/12] target/ppc: add SMT support to msgsnd broadcast

2024-05-20 Thread Nicholas Piggin

msgsnd has a broadcast mode that sends hypervisor doorbells to all
threads belonging to the same core as the target. A "subcore" mode
sends to all threads in 1LPAR mode, and only to the target thread
otherwise.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h  |  6 +-
 target/ppc/helper.h   |  2 +-
 target/ppc/excp_helper.c  | 57 +--
 .../ppc/translate/processor-ctrl-impl.c.inc   |  2 +-
 4 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index e4c342b17d..e201b7f6c2 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1163,7 +1163,11 @@ FIELD(FPSCR, FI, FPSCR_FI, 1)
 
 #define DBELL_TYPE_DBELL_SERVER(0x05 << DBELL_TYPE_SHIFT)
 
-#define DBELL_BRDCAST  PPC_BIT(37)
+#define DBELL_BRDCAST_MASK PPC_BITMASK(37, 38)
+#define DBELL_BRDCAST_SHIFT25
+#define DBELL_BRDCAST_SUBPROC  (0x1 << DBELL_BRDCAST_SHIFT)
+#define DBELL_BRDCAST_CORE (0x2 << DBELL_BRDCAST_SHIFT)
+
 #define DBELL_LPIDTAG_SHIFT14
 #define DBELL_LPIDTAG_MASK (0xfff << DBELL_LPIDTAG_SHIFT)
 #define DBELL_PIRTAG_MASK  0x3fff
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 57bf8354e7..dd92c6a937 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -695,7 +695,7 @@ DEF_HELPER_FLAGS_3(store_sr, TCG_CALL_NO_RWG, void, env, tl, tl)
 
 DEF_HELPER_1(msgsnd, void, tl)
 DEF_HELPER_2(msgclr, void, env, tl)
-DEF_HELPER_1(book3s_msgsnd, void, tl)
+DEF_HELPER_2(book3s_msgsnd, void, env, tl)
 DEF_HELPER_2(book3s_msgclr, void, env, tl)
 #endif
 
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index e786a9044b..0a9e8539a4 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -2978,7 +2978,7 @@ void helper_msgsnd(target_ulong rb)
 PowerPCCPU *cpu = POWERPC_CPU(cs);
 CPUPPCState *cenv = >env;
 
-if ((rb & DBELL_BRDCAST) || (cenv->spr[SPR_BOOKE_PIR] == pir)) {
+if ((rb & DBELL_BRDCAST_MASK) || (cenv->spr[SPR_BOOKE_PIR] == pir)) {
 ppc_set_irq(cpu, irq, 1);
 }
 }
@@ -2997,6 +2997,16 @@ static bool dbell_type_server(target_ulong rb)
 return (rb & DBELL_TYPE_MASK) == DBELL_TYPE_DBELL_SERVER;
 }
 
+static inline bool dbell_bcast_core(target_ulong rb)
+{
+return (rb & DBELL_BRDCAST_MASK) == DBELL_BRDCAST_CORE;
+}
+
+static inline bool dbell_bcast_subproc(target_ulong rb)
+{
+return (rb & DBELL_BRDCAST_MASK) == DBELL_BRDCAST_SUBPROC;
+}
+
 void helper_book3s_msgclr(CPUPPCState *env, target_ulong rb)
 {
 if (!dbell_type_server(rb)) {
@@ -3006,32 +3016,43 @@ void helper_book3s_msgclr(CPUPPCState *env, target_ulong rb)
 ppc_set_irq(env_archcpu(env), PPC_INTERRUPT_HDOORBELL, 0);
 }
 
-static void book3s_msgsnd_common(int pir, int irq)
+void helper_book3s_msgsnd(CPUPPCState *env, target_ulong rb)
 {
-CPUState *cs;
+int pir = rb & DBELL_PROCIDTAG_MASK;
+bool brdcast = false;
+CPUState *cs, *ccs;
+PowerPCCPU *cpu;
 
-bql_lock();
-CPU_FOREACH(cs) {
-PowerPCCPU *cpu = POWERPC_CPU(cs);
-CPUPPCState *cenv = >env;
+if (!dbell_type_server(rb)) {
+return;
+}
 
-/* TODO: broadcast message to all threads of the same  processor */
-if (cenv->spr_cb[SPR_PIR].default_value == pir) {
-ppc_set_irq(cpu, irq, 1);
-}
+cpu = ppc_get_vcpu_by_pir(pir);
+if (!cpu) {
+return;
 }
-bql_unlock();
-}
+cs = CPU(cpu);
 
-void helper_book3s_msgsnd(target_ulong rb)
-{
-int pir = rb & DBELL_PROCIDTAG_MASK;
+if (dbell_bcast_core(rb) || (dbell_bcast_subproc(rb) &&
+ (env->flags & POWERPC_FLAG_SMT_1LPAR))) {
+brdcast = true;
+}
 
-if (!dbell_type_server(rb)) {
+if (cs->nr_threads == 1 || !brdcast) {
+ppc_set_irq(cpu, PPC_INTERRUPT_HDOORBELL, 1);
 return;
 }
 
-book3s_msgsnd_common(pir, PPC_INTERRUPT_HDOORBELL);
+/*
+ * Why is bql needed for walking CPU list? Answer seems to be because ppc
+ * irq handling needs it, but ppc_set_irq takes the lock itself if needed,
+ * so could this be removed?
+ */
+bql_lock();
+THREAD_SIBLING_FOREACH(cs, ccs) {
+ppc_set_irq(POWERPC_CPU(ccs), PPC_INTERRUPT_HDOORBELL, 1);
+}
+bql_unlock();
 }
 
 #ifdef TARGET_PPC64
diff --git a/target/ppc/translate/processor-ctrl-impl.c.inc b/target/ppc/translate/processor-ctrl-impl.c.inc
index 0142801985..8abbb89630 100644
--- a/target/ppc/translate/processor-ctrl-impl.c.inc
+++ b/target/ppc/translate/processor-ctrl-impl.c.inc
@@ -59,7 +59,7 @@ static bool trans_MSGSND(DisasContext *ctx, arg_X_rb *a)
 
 #if !defined(CONFIG_USER_ONLY)
 if (is_book3s_arch2x(ctx)) {
-gen_helper_book3s_msgsnd(cpu_gpr[a->rb]);
+gen_helper_book3s_msgsnd(tcg_env, cpu_gpr[a->rb]);
 } else {
 gen_helper_msgsnd(cpu_gpr[a->rb]);
 }
--
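The new DBELL_BRDCAST_* definitions use IBM (big-endian) bit numbering, so the two-bit broadcast field at bits 37-38 lands at conventional bits 25-26 of rb. A self-contained sketch of the decode in helper_book3s_msgsnd(), assuming PPC_BIT()/PPC_BITMASK() are defined as in QEMU; msgsnd_nr_targets() is a hypothetical helper that only counts recipients.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit. */
#define PPC_BIT(bit)        (0x8000000000000000ULL >> (bit))
#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

#define DBELL_BRDCAST_MASK    PPC_BITMASK(37, 38)
#define DBELL_BRDCAST_SHIFT   25
#define DBELL_BRDCAST_SUBPROC (0x1ULL << DBELL_BRDCAST_SHIFT)
#define DBELL_BRDCAST_CORE    (0x2ULL << DBELL_BRDCAST_SHIFT)

/* Core broadcast always widens; subprocessor broadcast only in 1LPAR mode. */
static bool msgsnd_is_broadcast(uint64_t rb, bool one_lpar)
{
    uint64_t b = rb & DBELL_BRDCAST_MASK;

    return b == DBELL_BRDCAST_CORE ||
           (b == DBELL_BRDCAST_SUBPROC && one_lpar);
}

/* Hypothetical: how many threads receive the hypervisor doorbell. */
static int msgsnd_nr_targets(uint64_t rb, bool one_lpar, int nr_threads)
{
    if (nr_threads == 1 || !msgsnd_is_broadcast(rb, one_lpar)) {
        return 1; /* only the thread whose PIR matches */
    }
    return nr_threads;
}
```

Note that the BookE helper_msgsnd() now tests rb & DBELL_BRDCAST_MASK, so either broadcast bit triggers a broadcast on that path.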

[PATCH v2 03/12] target/ppc: Implement attn instruction on BookS 64-bit processors

2024-05-20 Thread Nicholas Piggin

attn is an implementation-specific instruction that, on POWER (and
G5/970) processors, is enabled by a HID bit (when disabled, it is an
illegal instruction). Executing it stops the host processor and
notifies the service processor. It is generally used for debugging.

Implement attn and make it checkstop the system, which should be good
enough for QEMU debugging.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h | 12 +
 target/ppc/helper.h  |  1 +
 target/ppc/insn32.decode |  4 ++
 target/ppc/cpu_init.c| 69 
 target/ppc/excp_helper.c | 43 +
 target/ppc/translate/misc-impl.c.inc | 10 
 6 files changed, 130 insertions(+), 9 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c358927211..2532408be0 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1375,6 +1375,9 @@ struct CPUArchState {
 /* Power management */
 int (*check_pow)(CPUPPCState *env);
 
+/* attn instruction enable */
+int (*check_attn)(CPUPPCState *env);
+
 #if !defined(CONFIG_USER_ONLY)
 void *load_info;  /* holds boot loading state */
 #endif
@@ -1523,6 +1526,7 @@ struct PowerPCCPUClass {
 int n_host_threads;
 void (*init_proc)(CPUPPCState *env);
 int  (*check_pow)(CPUPPCState *env);
+int  (*check_attn)(CPUPPCState *env);
 };
 
 ObjectClass *ppc_cpu_class_by_name(const char *name);
@@ -2320,6 +2324,8 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define HID0_NAP(1 << 22)   /* pre-2.06 */
 #define HID0_HILE   PPC_BIT(19) /* POWER8 */
 #define HID0_POWER9_HILEPPC_BIT(4)
+#define HID0_ENABLE_ATTNPPC_BIT(31) /* POWER8 */
+#define HID0_POWER9_ENABLE_ATTN PPC_BIT(3)
 
 /*/
 /* PowerPC Instructions types definitions*/
@@ -3025,6 +3031,12 @@ static inline int check_pow_nocheck(CPUPPCState *env)
 return 1;
 }
 
+/* attn enable check */
+static inline int check_attn_none(CPUPPCState *env)
+{
+return 0;
+}
+
 /*/
 /* PowerPC implementations definitions   */
 
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 55293e20a9..09d50f9b76 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -825,5 +825,6 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
 #if defined(TARGET_PPC64)
 DEF_HELPER_1(clrbhrb, void, env)
 DEF_HELPER_FLAGS_2(mfbhrbe, TCG_CALL_NO_WG, i64, env, i32)
+DEF_HELPER_1(attn, noreturn, env)
 #endif
 #endif
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index d4dd022df4..ee33141476 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -1198,3 +1198,7 @@ EIEIO   01 - - - 1101010110 -
 
 MFBHRBE 01 . . . 0100101110 -   @XFX_bhrbe
 CLRBHRB 01 - - - 0110101110 -
+
+## Misc POWER instructions
+
+ATTN00 0 0 0 01 0
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 1ec84b5ddc..ee01415c32 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -2107,6 +2107,26 @@ static int check_pow_hid0_74xx(CPUPPCState *env)
 return 0;
 }
 
+#if defined(TARGET_PPC64)
+static int check_attn_hid0(CPUPPCState *env)
+{
+if (env->spr[SPR_HID0] & HID0_ENABLE_ATTN) {
+return 1;
+}
+
+return 0;
+}
+
+static int check_attn_hid0_power9(CPUPPCState *env)
+{
+if (env->spr[SPR_HID0] & HID0_POWER9_ENABLE_ATTN) {
+return 1;
+}
+
+return 0;
+}
+#endif
+
 static void init_proc_405(CPUPPCState *env)
 {
 register_40x_sprs(env);
@@ -2138,6 +2158,7 @@ POWERPC_FAMILY(405)(ObjectClass *oc, void *data)
 dc->desc = "PowerPC 405";
 pcc->init_proc = init_proc_405;
 pcc->check_pow = check_pow_nocheck;
+pcc->check_attn = check_attn_none;
 pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |
PPC_DCR | PPC_WRTEE |
PPC_CACHE | PPC_CACHE_ICBI | PPC_40x_ICBT |
@@ -2210,6 +2231,7 @@ POWERPC_FAMILY(440EP)(ObjectClass *oc, void *data)
 dc->desc = "PowerPC 440 EP";
 pcc->init_proc = init_proc_440EP;
 pcc->check_pow = check_pow_nocheck;
+pcc->check_attn = check_attn_none;
 pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING |
PPC_FLOAT | PPC_FLOAT_FRES | PPC_FLOAT_FSEL |
PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
@@ -2248,6 +2270,7 @@ POWERPC_FAMILY(460EX)(ObjectClass *oc, void *data)
 dc->desc = "PowerPC 460 EX";
 pcc->init_proc = init_proc_440EP;
 pcc->check_pow = check_pow_nocheck;
+pcc->check_attn = check_attn_none;
 pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING |
PPC_FLOAT |
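A standalone sketch of the enable checks added to cpu_init.c, assuming only a raw HID0 value is available: POWER8 uses HID0 bit 31 and POWER9 bit 3, both in IBM bit numbering, and a disabled attn is treated as illegal. The attn_result enum and attn_outcome() helper are hypothetical, written only to summarize the behaviour the commit message describes.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_BIT(bit)            (0x8000000000000000ULL >> (bit))
#define HID0_ENABLE_ATTN        PPC_BIT(31) /* POWER8 */
#define HID0_POWER9_ENABLE_ATTN PPC_BIT(3)

static int check_attn_hid0(uint64_t hid0)
{
    return (hid0 & HID0_ENABLE_ATTN) != 0;
}

static int check_attn_hid0_power9(uint64_t hid0)
{
    return (hid0 & HID0_POWER9_ENABLE_ATTN) != 0;
}

/* Hypothetical summary of what executing attn does. */
enum attn_result { ATTN_ILLEGAL, ATTN_CHECKSTOP };

static enum attn_result attn_outcome(int (*check_attn)(uint64_t),
                                     uint64_t hid0)
{
    return check_attn(hid0) ? ATTN_CHECKSTOP : ATTN_ILLEGAL;
}
```

Because the enable bit differs between generations, the per-family check_attn hook keeps the decision out of the instruction implementation itself.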

[PATCH v2 05/12] target/ppc: Wire up BookE ATB registers for e500 family

2024-05-20 Thread Nicholas Piggin

From the Freescale PowerPC Architecture Primer:

  Alternate time base APU. This APU, implemented on the e500v2, defines
  a 64-bit time base counter that differs from the PowerPC defined time
  base in that it is not writable and counts at a different, and
  typically much higher, frequency. The alternate time base always
  counts up, wrapping when the 64-bit count overflows.

This implementation of ATB counts at the same frequency as the TB. The
existing spr_read_atbu/l functions were unused until this patch wired
them into the SPRs.

RTEMS uses this SPR on the e6500, though this hasn't been tested.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu_init.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 927721d49a..892fb6ce02 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -920,6 +920,18 @@ static void register_BookE206_sprs(CPUPPCState *env, uint32_t mas_mask,
 #endif
 }
 
+static void register_atb_sprs(CPUPPCState *env)
+{
+spr_register(env, SPR_ATBL, "ATBL",
+ _read_atbl, SPR_NOACCESS,
+ _read_atbl, SPR_NOACCESS,
+ 0x);
+spr_register(env, SPR_ATBU, "ATBU",
+ _read_atbu, SPR_NOACCESS,
+ _read_atbu, SPR_NOACCESS,
+ 0x);
+}
+
 /* SPR specific to PowerPC 440 implementation */
 static void register_440_sprs(CPUPPCState *env)
 {
@@ -2927,6 +2939,11 @@ static void init_proc_e500(CPUPPCState *env, int version)
 register_BookE206_sprs(env, 0x00DF, tlbncfg, mmucfg);
 register_usprgh_sprs(env);
 
+if (version != fsl_e500v1) {
+/* e500v1 has no support for alternate timebase */
+register_atb_sprs(env);
+}
+
 spr_register(env, SPR_HID0, "HID0",
  SPR_NOACCESS, SPR_NOACCESS,
  _read_generic, _write_generic,
-- 
2.43.0
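Because the 64-bit ATB is exposed as the ATBU/ATBL pair, 32-bit software generally has to guard against the low half carrying into the high half between the two reads. The usual pattern (the same one commonly used for TBU/TBL) is sketched below; the accessor callbacks and the fake_atb simulator are hypothetical stand-ins for mfspr so the example runs anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SPR accessor: on hardware this would be an mfspr. */
typedef uint32_t (*spr_read_fn)(void *opaque);

/* Read a 64-bit counter exposed as upper/lower 32-bit halves. */
static uint64_t read_atb(spr_read_fn read_atbu, spr_read_fn read_atbl,
                         void *opaque)
{
    uint32_t hi, lo, hi2;

    do {
        hi = read_atbu(opaque);
        lo = read_atbl(opaque);
        hi2 = read_atbu(opaque);
    } while (hi != hi2); /* carry into the upper half: retry */

    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical simulator: the counter advances by 'step' on every read. */
struct fake_atb {
    uint64_t count;
    uint64_t step;
};

static uint32_t fake_atbu(void *opaque)
{
    struct fake_atb *f = opaque;
    uint32_t v = (uint32_t)(f->count >> 32);

    f->count += f->step;
    return v;
}

static uint32_t fake_atbl(void *opaque)
{
    struct fake_atb *f = opaque;
    uint32_t v = (uint32_t)f->count;

    f->count += f->step;
    return v;
}
```

Starting the simulator just below a carry (count = 0x1FFFFFFFF, step = 1), the first pass sees hi = 1, lo = 0 and hi2 = 2, so it retries; a naive single hi/lo read would return the torn value 0x100000000.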

[PATCH v2 11/12] target/ppc: Implement SPRC/SPRD SPRs

2024-05-20 Thread Nicholas Piggin

Implement the POWER SPRC/SPRD SPRs, and the SCRATCH0-7 registers that
can be accessed indirectly through them.

SCRATCH registers only provide storage, but firmware uses them for
low-level crash and progress data, so this implementation logs writes
to them to help with analysis.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/cpu.h |  7 +++--
 target/ppc/helper.h  |  3 ++
 target/ppc/spr_common.h  |  3 ++
 target/ppc/cpu_init.c| 10 ++
 target/ppc/misc_helper.c | 66 
 target/ppc/translate.c   | 18 +++
 6 files changed, 105 insertions(+), 2 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 823be85d03..e4c342b17d 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1264,6 +1264,9 @@ struct CPUArchState {
 ppc_slb_t slb[MAX_SLB_ENTRIES]; /* PowerPC 64 SLB area */
 struct CPUBreakpoint *ciabr_breakpoint;
 struct CPUWatchpoint *dawr0_watchpoint;
+
+/* POWER CPU regs/state */
+target_ulong scratch[8]; /* SCRATCH registers (shared across core) */
 #endif
 target_ulong sr[32];   /* segment registers */
 uint32_t nb_BATs;  /* number of BATs */
@@ -1806,9 +1809,9 @@ void ppc_compat_add_property(Object *obj, const char *name,
 #define SPR_SPRG2 (0x112)
 #define SPR_SPRG3 (0x113)
 #define SPR_SPRG4 (0x114)
-#define SPR_SCOMC (0x114)
+#define SPR_POWER_SPRC(0x114)
 #define SPR_SPRG5 (0x115)
-#define SPR_SCOMD (0x115)
+#define SPR_POWER_SPRD(0x115)
 #define SPR_SPRG6 (0x116)
 #define SPR_SPRG7 (0x117)
 #define SPR_ASR   (0x118)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 09d50f9b76..57bf8354e7 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -730,6 +730,9 @@ DEF_HELPER_2(book3s_msgsndp, void, env, tl)
 DEF_HELPER_2(book3s_msgclrp, void, env, tl)
 DEF_HELPER_1(load_tfmr, tl, env)
 DEF_HELPER_2(store_tfmr, void, env, tl)
+DEF_HELPER_FLAGS_2(store_sprc, TCG_CALL_NO_RWG, void, env, tl)
+DEF_HELPER_FLAGS_1(load_sprd, TCG_CALL_NO_RWG_SE, tl, env)
+DEF_HELPER_FLAGS_2(store_sprd, TCG_CALL_NO_RWG, void, env, tl)
 #endif
 DEF_HELPER_2(store_sdr1, void, env, tl)
 DEF_HELPER_2(store_pidr, void, env, tl)
diff --git a/target/ppc/spr_common.h b/target/ppc/spr_common.h
index 85f73b860b..01aff449bc 100644
--- a/target/ppc/spr_common.h
+++ b/target/ppc/spr_common.h
@@ -207,6 +207,9 @@ void spr_write_lpcr(DisasContext *ctx, int sprn, int gprn);
 void spr_read_dexcr_ureg(DisasContext *ctx, int gprn, int sprn);
 void spr_read_ppr32(DisasContext *ctx, int sprn, int gprn);
 void spr_write_ppr32(DisasContext *ctx, int sprn, int gprn);
+void spr_write_sprc(DisasContext *ctx, int sprn, int gprn);
+void spr_read_sprd(DisasContext *ctx, int sprn, int gprn);
+void spr_write_sprd(DisasContext *ctx, int sprn, int gprn);
 #endif
 
 void register_low_BATs(CPUPPCState *env);
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index 7f2f8e5a4a..f21dbcfefb 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5794,6 +5794,16 @@ static void register_power_common_book4_sprs(CPUPPCState *env)
  SPR_NOACCESS, SPR_NOACCESS,
  _read_generic, _core_write_generic,
  0x);
+spr_register_hv(env, SPR_POWER_SPRC, "SPRC",
+ SPR_NOACCESS, SPR_NOACCESS,
+ SPR_NOACCESS, SPR_NOACCESS,
+ _read_generic, _write_sprc,
+ 0x);
+spr_register_hv(env, SPR_POWER_SPRD, "SPRD",
+ SPR_NOACCESS, SPR_NOACCESS,
+ SPR_NOACCESS, SPR_NOACCESS,
+ _read_sprd, _write_sprd,
+ 0x);
 #endif
 }
 
diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index a67930d031..fa47be2298 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -307,6 +307,72 @@ void helper_store_dpdes(CPUPPCState *env, target_ulong val)
 }
 bql_unlock();
 }
+
+/* Indirect SCOM (SPRC/SPRD) access to SCRATCH0-7 are implemented. */
+void helper_store_sprc(CPUPPCState *env, target_ulong val)
+{
+if (val & ~0x3f8ULL) {
+qemu_log_mask(LOG_GUEST_ERROR, "Invalid SPRC register value "
+  TARGET_FMT_lx"\n", val);
+return;
+}
+env->spr[SPR_POWER_SPRC] = val;
+}
+
+target_ulong helper_load_sprd(CPUPPCState *env)
+{
+target_ulong sprc = env->spr[SPR_POWER_SPRC];
+
+switch (sprc & 0x3c0) {
+case 0: /* SCRATCH0-7 */
+return env->scratch[(sprc >> 3) & 0x7];
+default:
+qemu_log_mask(LOG_UNIMP, "mfSPRD: Unimplemented SPRC:0x"
+  TARGET_FMT_lx"\n", sprc);
+break;
+}
+return 0;
+}
+
+static void do_store_scratch(CPUPPCState *env, int nr, target_ulong val)
+{
+CPUState *cs = env_cpu(env);
+CPUState *ccs;
+uint32_t nr_threads = cs->nr_threads;
+
+/*
+

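A plain-C model of the indirect access path, following helper_store_sprc() and helper_load_sprd() above: SPRC may only have bits within 0x3f8 set, values with (sprc & 0x3c0) == 0 select SCRATCH0-7 through bits 3-5, and other selections are unimplemented. The struct and function names are hypothetical, and sprd_store() is an assumption because the store path is truncated in this message; it simply writes the selected core-wide SCRATCH slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-core state: SCRATCH is shared across the core. */
struct power_core {
    uint64_t sprc;
    uint64_t scratch[8];
};

/* Mirrors helper_store_sprc(): reject values with bits outside 0x3f8. */
static bool sprc_store(struct power_core *c, uint64_t val)
{
    if (val & ~0x3f8ULL) {
        return false; /* QEMU logs LOG_GUEST_ERROR and ignores the write */
    }
    c->sprc = val;
    return true;
}

/* Mirrors helper_load_sprd(): only SCRATCH0-7 are implemented. */
static uint64_t sprd_load(const struct power_core *c)
{
    if ((c->sprc & 0x3c0) == 0) {
        return c->scratch[(c->sprc >> 3) & 0x7];
    }
    return 0; /* QEMU logs LOG_UNIMP and returns 0 */
}

/* Assumed store path: write the selected SCRATCH register. */
static void sprd_store(struct power_core *c, uint64_t val)
{
    if ((c->sprc & 0x3c0) == 0) {
        c->scratch[(c->sprc >> 3) & 0x7] = val;
    }
}
```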
[PATCH v2 09/12] target/ppc: Add SMT support to PTCR SPR

2024-05-20 Thread Nicholas Piggin

PTCR is a per-core register.

Signed-off-by: Nicholas Piggin 
---
 target/ppc/misc_helper.c | 16 ++--
 target/ppc/translate.c   |  4 
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/target/ppc/misc_helper.c b/target/ppc/misc_helper.c
index 6f419c9346..a67930d031 100644
--- a/target/ppc/misc_helper.c
+++ b/target/ppc/misc_helper.c
@@ -173,6 +173,7 @@ void helper_store_sdr1(CPUPPCState *env, target_ulong val)
 void helper_store_ptcr(CPUPPCState *env, target_ulong val)
 {
 if (env->spr[SPR_PTCR] != val) {
+CPUState *cs = env_cpu(env);
 PowerPCCPU *cpu = env_archcpu(env);
 target_ulong ptcr_mask = PTCR_PATB | PTCR_PATS;
 target_ulong patbsize = val & PTCR_PATS;
@@ -194,8 +195,19 @@ void helper_store_ptcr(CPUPPCState *env, target_ulong val)
 return;
 }
 
-env->spr[SPR_PTCR] = val;
-tlb_flush(env_cpu(env));
+if (cs->nr_threads == 1 || !(env->flags & POWERPC_FLAG_SMT_1LPAR)) {
+env->spr[SPR_PTCR] = val;
+tlb_flush(cs);
+} else {
+CPUState *ccs;
+
+THREAD_SIBLING_FOREACH(cs, ccs) {
+PowerPCCPU *ccpu = POWERPC_CPU(ccs);
+CPUPPCState *cenv = >env;
+cenv->spr[SPR_PTCR] = val;
+tlb_flush(ccs);
+}
+}
 }
 }
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c688551434..76f829ad12 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -971,6 +971,10 @@ void spr_write_hior(DisasContext *ctx, int sprn, int gprn)
 }
 void spr_write_ptcr(DisasContext *ctx, int sprn, int gprn)
 {
+if (!gen_serialize_core(ctx)) {
+return;
+}
+
 gen_helper_store_ptcr(tcg_env, cpu_gpr[gprn]);
 }
 
-- 
2.43.0

[PATCH v2 00/12] target/ppc: Various TCG emulation patches

2024-05-20 Thread Nicholas Piggin

This series adds a number of instructions and registers, improves SMT
support, and makes other TCG emulation fixes.

Since v1:
- Not reposting the trivial memop patches that got reviews.
- Fix checkstop reason printing (Richard)
- Fix the attn instruction checks (Richard)
- Don't allocate tcg temp before SMT and serialization checks
  in spr_core_write_generic32() (Richard)
- Move attn to decodetree.

Thanks
Nick

Nicholas Piggin (12):
  target/ppc: Make checkstop actually stop the system
  target/ppc: improve checkstop logging
  target/ppc: Implement attn instruction on BookS 64-bit processors
  target/ppc: BookE DECAR SPR is 32-bit
  target/ppc: Wire up BookE ATB registers for e500 family
  target/ppc: Add PPR32 SPR
  target/ppc: add helper to write per-LPAR SPRs
  target/ppc: Add SMT support to simple SPRs
  target/ppc: Add SMT support to PTCR SPR
  target/ppc: Implement LDBAR, TTR SPRs
  target/ppc: Implement SPRC/SPRD SPRs
  target/ppc: add SMT support to msgsnd broadcast

 target/ppc/cpu.h  |  28 +++-
 target/ppc/helper.h   |   6 +-
 target/ppc/spr_common.h   |   7 +
 target/ppc/insn32.decode  |   4 +
 target/ppc/cpu_init.c | 132 +-
 target/ppc/excp_helper.c  | 114 +++
 target/ppc/misc_helper.c  |  82 ++-
 target/ppc/translate.c|  66 +
 target/ppc/translate/misc-impl.c.inc  |  10 ++
 .../ppc/translate/processor-ctrl-impl.c.inc   |   2 +-
 10 files changed, 409 insertions(+), 42 deletions(-)

-- 
2.43.0

[PATCH v2 01/12] target/ppc: Make checkstop actually stop the system

2024-05-20 Thread Nicholas Piggin

The checkstop state does not currently halt the system: interrupts
continue to be serviced and other CPUs keep running. Make it stop the
machine with qemu_system_guest_panicked().

Signed-off-by: Nicholas Piggin 
---
 target/ppc/excp_helper.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 3be086d10b..b2b51537b7 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -19,6 +19,8 @@
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
 #include "qemu/log.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "internal.h"
@@ -425,6 +427,8 @@ static void powerpc_set_excp_state(PowerPCCPU *cpu, target_ulong vector,
 
 static void powerpc_mcheck_checkstop(CPUPPCState *env)
 {
+/* KVM guests always have MSR[ME] enabled */
+#ifdef CONFIG_TCG
 CPUState *cs = env_cpu(env);
 
 if (FIELD_EX64(env->msr, MSR, ME)) {
@@ -437,9 +441,15 @@ static void powerpc_mcheck_checkstop(CPUPPCState *env)
 if (qemu_log_separate()) {
 qemu_log("Machine check while not allowed. "
  "Entering checkstop state\n");
-}
-cs->halted = 1;
-cpu_interrupt_exittb(cs);
+
+/*
+ * This stops the machine and logs CPU state without killing QEMU
+ * (like cpu_abort()) so the machine can still be debugged (because
+ * it is often a guest error).
+ */
+qemu_system_guest_panicked(NULL);
+cpu_loop_exit_noexc(cs);
+#endif
 }
 
 static void powerpc_excp_40x(PowerPCCPU *cpu, int excp)
-- 
2.43.0

RE: [RISC-V][tech-server-soc] [RFC v2 2/2] hw/riscv: Add server platform reference machine

2024-05-20 Thread Xu, Haibo1

> -Original Message-
> From: tech-server-...@lists.riscv.org  On
> Behalf Of Andrew Jones
> Sent: Monday, May 20, 2024 11:56 PM
> To: Wu, Fei2 
> Cc: pbonz...@redhat.com; pal...@dabbelt.com; alistair.fran...@wdc.com;
> Meng, Bin ; liwei1...@gmail.com;
> dbarb...@ventanamicro.com; zhiwei_...@linux.alibaba.com; qemu-
> de...@nongnu.org; qemu-ri...@nongnu.org; Warkentin, Andrei
> ; shaolin@alibaba-inc.com;
> v...@rivosinc.com; suni...@ventanamicro.com; Xu, Haibo1
> ; Chai, Evan ; Wang, Yin
> ; tech-server-platf...@lists.riscv.org; tech-server-
> s...@lists.riscv.org; ati...@rivosinc.com; co...@kernel.org;
> heinrich.schucha...@canonical.com; marcin.juszkiew...@linaro.org
> Subject: Re: [RISC-V][tech-server-soc] [RFC v2 2/2] hw/riscv: Add server platform reference machine
> 
> On Tue, Mar 12, 2024 at 09:52:21PM GMT, Fei Wu wrote:
> > The RISC-V Server Platform specification[1] defines a standardized set
> > of hardware and software capabilities that portable system software,
> > such as OSes and hypervisors, can rely on being present in a RISC-V
> > server platform.
> >
> > A corresponding QEMU RISC-V server platform reference (rvsp-ref for
> > short) machine type is added to provide an environment for firmware/OS
> > development and testing. The main features included in rvsp-ref are:
> >
> >  - Based on the riscv virt machine type
> >  - A new memory map, as close to the virt machine's as possible
> >  - A new virt CPU type rvsp-ref-cpu for server platform compliance
> >  - AIA
> >  - PCIe AHCI
> >  - PCIe NIC
> 
> We should rebase on the IOMMU series [1] and add an IOMMU to the platform,
> as it's required by the Server Soc spec (which is required by the server
> platform spec).
> 
> [1] https://lore.kernel.org/qemu-devel/20240307160319.675044-1-dbarb...@ventanamicro.com/
> 
 
Good point! Then we can also include the Linux IOMMU driver in integration
testing.

> Thanks,
> drew
> 

[PATCH V10 8/8] docs/specs/acpi_hw_reduced_hotplug: Add the CPU Hotplug Event Bit

2024-05-20 Thread Salil Mehta via

The GED interface is used by several hotplug events, such as memory hotplug
and NVDIMM hotplug, and by non-hotplug events such as the system power down
event. Each event is selected by a bit in the 32-bit GED IO interface.
Reserve bit 3 for the CPU hotplug event.

Signed-off-by: Salil Mehta 
Reviewed-by: Gavin Shan 
---
 docs/specs/acpi_hw_reduced_hotplug.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/specs/acpi_hw_reduced_hotplug.rst b/docs/specs/acpi_hw_reduced_hotplug.rst
index 0bd3f9399f..3acd6fcd8b 100644
--- a/docs/specs/acpi_hw_reduced_hotplug.rst
+++ b/docs/specs/acpi_hw_reduced_hotplug.rst
@@ -64,7 +64,8 @@ GED IO interface (4 byte access)
0: Memory hotplug event
1: System power down event
2: NVDIMM hotplug event
-3-31: Reserved
+   3: CPU hotplug event
+4-31: Reserved
 
 **write_access:**
 
-- 
2.34.1
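A small sketch of how a consumer might test the GED event selector bits listed above, assuming a 32-bit value read from the GED IO interface. The enum constants and helper names are hypothetical and are not QEMU's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions from the GED IO interface table (hypothetical names). */
enum ged_event_bit {
    GED_EVT_MEM_HOTPLUG    = 0,
    GED_EVT_PWR_DOWN       = 1,
    GED_EVT_NVDIMM_HOTPLUG = 2,
    GED_EVT_CPU_HOTPLUG    = 3,
};

/* True if the given event bit is set; out-of-range bits are never set. */
static bool ged_event_pending(uint32_t sel, unsigned bit)
{
    return bit < 32 && ((sel >> bit) & 1u);
}

/* Returns NULL for the reserved bits 4-31. */
static const char *ged_event_name(unsigned bit)
{
    static const char *const names[] = {
        "Memory hotplug", "System power down",
        "NVDIMM hotplug", "CPU hotplug",
    };

    return bit < 4 ? names[bit] : NULL;
}
```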

[PATCH V10 7/8] gdbstub: Add helper function to unregister GDB register space

2024-05-20 Thread Salil Mehta via

Add a common function to unregister the GDB register space. This is done
as part of CPU unrealization.

Signed-off-by: Salil Mehta 
Tested-by: Vishnu Pajjuri 
Reviewed-by: Gavin Shan 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Shaoqin Huang 
Reviewed-by: Vishnu Pajjuri 
---
 gdbstub/gdbstub.c  | 13 +
 hw/core/cpu-common.c   |  1 -
 include/exec/gdbstub.h |  6 ++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index b3574997ea..1949b09240 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -617,6 +617,19 @@ void gdb_register_coprocessor(CPUState *cpu,
 }
 }
 
+void gdb_unregister_coprocessor_all(CPUState *cpu)
+{
+/*
+ * Safe to nuke everything. GDBRegisterState::xml is static const char so
+ * it won't be freed
+ */
+g_array_free(cpu->gdb_regs, true);
+
+cpu->gdb_regs = NULL;
+cpu->gdb_num_regs = 0;
+cpu->gdb_num_g_regs = 0;
+}
+
 static void gdb_process_breakpoint_remove_all(GDBProcess *p)
 {
 CPUState *cpu = gdb_get_first_cpu_in_process(p);
diff --git a/hw/core/cpu-common.c b/hw/core/cpu-common.c
index 0f0a247f56..e5140b4bc1 100644
--- a/hw/core/cpu-common.c
+++ b/hw/core/cpu-common.c
@@ -274,7 +274,6 @@ static void cpu_common_finalize(Object *obj)
 {
 CPUState *cpu = CPU(obj);
 
-g_array_free(cpu->gdb_regs, TRUE);
 qemu_lockcnt_destroy(&cpu->in_ioctl_lock);
 qemu_mutex_destroy(&cpu->work_mutex);
 }
diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index eb14b91139..249d4d4bc8 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -49,6 +49,12 @@ void gdb_register_coprocessor(CPUState *cpu,
   gdb_get_reg_cb get_reg, gdb_set_reg_cb set_reg,
   const GDBFeature *feature, int g_pos);
 
+/**
+ * gdb_unregister_coprocessor_all() - unregisters supplemental set of registers
+ * @cpu - the CPU associated with registers
+ */
+void gdb_unregister_coprocessor_all(CPUState *cpu);
+
 /**
  * gdbserver_start: start the gdb server
  * @port_or_device: connection spec for gdb
-- 
2.34.1
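
A simplified, standalone model of what gdb_unregister_coprocessor_all() does, using toy structures in place of QEMU's CPUState and GDBRegisterState (all names here are hypothetical). Plain realloc()/free() stand in for the GLib GArray. Only the heap array is released; the XML descriptions it points at are static data, which is why the patch comment calls it safe to free everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for GDBRegisterState and CPUState. */
typedef struct {
    int base_reg;
    const char *xml;               /* static const data, never freed */
} ToyGDBRegisterState;

typedef struct {
    ToyGDBRegisterState *gdb_regs; /* heap array, like cpu->gdb_regs */
    size_t nr_regsets;
    int gdb_num_regs;
    int gdb_num_g_regs;
} ToyCPU;

static const char toy_fpu_xml[] = "<feature name=\"toy.fpu\"/>";

/* Append one register set whose registers start after the existing ones. */
static int toy_register_coprocessor(ToyCPU *cpu, const char *xml, int nregs)
{
    ToyGDBRegisterState *regs;

    assert(xml != NULL && nregs > 0);
    regs = realloc(cpu->gdb_regs, (cpu->nr_regsets + 1) * sizeof(*regs));
    if (!regs) {
        return -1;
    }
    regs[cpu->nr_regsets].base_reg = cpu->gdb_num_regs;
    regs[cpu->nr_regsets].xml = xml;
    cpu->gdb_regs = regs;
    cpu->nr_regsets++;
    cpu->gdb_num_regs += nregs;
    return 0;
}

/* Same idea as the new helper: free the array, keep the XML, reset counts. */
static void toy_unregister_coprocessor_all(ToyCPU *cpu)
{
    free(cpu->gdb_regs);           /* entries only point at static strings */
    cpu->gdb_regs = NULL;
    cpu->nr_regsets = 0;
    cpu->gdb_num_regs = 0;
    cpu->gdb_num_g_regs = 0;
}
```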

[PATCH V10 6/8] physmem: Add helper function to destroy CPU AddressSpace

2024-05-20 Thread Salil Mehta via

Virtual CPU hot-unplug leads to the unrealization of a CPU object, which also
involves destroying the CPU AddressSpace. Add a common function to destroy the
CPU AddressSpace.

Signed-off-by: Salil Mehta 
Tested-by: Vishnu Pajjuri 
Reviewed-by: Gavin Shan 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Shaoqin Huang 
---
 include/exec/cpu-common.h |  8 
 include/hw/core/cpu.h |  1 +
 system/physmem.c  | 29 +
 3 files changed, 38 insertions(+)

diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index 815342d043..240ee04369 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -129,6 +129,14 @@ size_t qemu_ram_pagesize_largest(void);
  */
 void cpu_address_space_init(CPUState *cpu, int asidx,
 const char *prefix, MemoryRegion *mr);
+/**
+ * cpu_address_space_destroy:
+ * @cpu: CPU for which address space needs to be destroyed
+ * @asidx: integer index of this address space
+ *
+ * Note that with KVM only one address space is supported.
+ */
+void cpu_address_space_destroy(CPUState *cpu, int asidx);
 
 void cpu_physical_memory_rw(hwaddr addr, void *buf,
 hwaddr len, bool is_write);
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index bb398e8237..60b160d0b4 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -486,6 +486,7 @@ struct CPUState {
 QSIMPLEQ_HEAD(, qemu_work_item) work_list;
 
 struct CPUAddressSpace *cpu_ases;
+int cpu_ases_count;
 int num_ases;
 AddressSpace *as;
 MemoryRegion *memory;
diff --git a/system/physmem.c b/system/physmem.c
index 342b7a8fd4..146f17826a 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -763,6 +763,7 @@ void cpu_address_space_init(CPUState *cpu, int asidx,
 
 if (!cpu->cpu_ases) {
 cpu->cpu_ases = g_new0(CPUAddressSpace, cpu->num_ases);
+cpu->cpu_ases_count = cpu->num_ases;
 }
 
 newas = &cpu->cpu_ases[asidx];
@@ -776,6 +777,34 @@ void cpu_address_space_init(CPUState *cpu, int asidx,
 }
 }
 
+void cpu_address_space_destroy(CPUState *cpu, int asidx)
+{
+CPUAddressSpace *cpuas;
+
+assert(cpu->cpu_ases);
+assert(asidx >= 0 && asidx < cpu->num_ases);
+/* KVM cannot currently support multiple address spaces. */
+assert(asidx == 0 || !kvm_enabled());
+
+cpuas = &cpu->cpu_ases[asidx];
+if (tcg_enabled()) {
+memory_listener_unregister(&cpuas->tcg_as_listener);
+}
+
+address_space_destroy(cpuas->as);
+g_free_rcu(cpuas->as, rcu);
+
+if (asidx == 0) {
+/* reset the convenience alias for address space 0 */
+cpu->as = NULL;
+}
+
+if (--cpu->cpu_ases_count == 0) {
+g_free(cpu->cpu_ases);
+cpu->cpu_ases = NULL;
+}
+}
+
 AddressSpace *cpu_get_address_space(CPUState *cpu, int asidx)
 {
 /* Return the AddressSpace corresponding to the specified index */
-- 
2.34.1
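
The teardown bookkeeping in cpu_address_space_destroy() can be modelled in isolation. The sketch below is a hypothetical simplification: it omits the TCG listener, address_space_destroy() and RCU freeing, and keeps only the shared cpu_ases array, the cpu_ases_count countdown and the as alias for index 0. All type and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical simplified model of CPUState's address-space bookkeeping. */
typedef struct {
    bool live;                   /* stands in for cpuas->as being allocated */
} ToyCPUAddressSpace;

typedef struct {
    ToyCPUAddressSpace *cpu_ases;
    int cpu_ases_count;          /* destroy calls left before the array goes */
    int num_ases;
    ToyCPUAddressSpace *as;      /* convenience alias for index 0 */
} ToyCPU;

static void toy_as_init(ToyCPU *cpu, int asidx)
{
    assert(asidx >= 0 && asidx < cpu->num_ases);
    if (!cpu->cpu_ases) {
        /* Allocated once; the count starts at num_ases, as in the patch. */
        cpu->cpu_ases = calloc(cpu->num_ases, sizeof(*cpu->cpu_ases));
        assert(cpu->cpu_ases);
        cpu->cpu_ases_count = cpu->num_ases;
    }
    cpu->cpu_ases[asidx].live = true;
    if (asidx == 0) {
        cpu->as = &cpu->cpu_ases[0];
    }
}

static void toy_as_destroy(ToyCPU *cpu, int asidx)
{
    assert(cpu->cpu_ases);
    assert(asidx >= 0 && asidx < cpu->num_ases);
    cpu->cpu_ases[asidx].live = false;
    if (asidx == 0) {
        cpu->as = NULL;          /* reset the alias before the array can go */
    }
    if (--cpu->cpu_ases_count == 0) {
        free(cpu->cpu_ases);     /* last address space gone: drop the array */
        cpu->cpu_ases = NULL;
    }
}
```

Because the count starts at num_ases rather than counting initialized slots, this scheme frees the shared array only after num_ases destroy calls, one per index.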

[PATCH V10 5/8] hw/acpi: Update CPUs AML with cpu-(ctrl)dev change

2024-05-20 Thread Salil Mehta via

On x86, the register interface of the CPUs control device (\\_SB.PCI0) is IO-port
based, and the existing CPUs AML code assumes that the _CRS object evaluates to a
system resource describing an IO port address. On ARM, however, the register
interface of the CPUs control device (\\_SB.PRES) is memory-mapped, so the _CRS
object should evaluate to a system resource describing a memory-mapped base
address. Update build_cpus_aml() to accept either an IO or a memory region space
and build the _CRS object accordingly.

On x86, CPU hotplug uses the Generic ACPI GPE Block Bit 2 (GPE.2) event handler to
notify OSPM about CPU hot(un)plug events. The latest CPU hotplug implementation is
based on the ACPI Generic Event Device (GED) framework and uses the GED device for
this purpose. Not all architectures support a GPE-based CPU hotplug event handler,
so make the AML for the GPE.2 event handler conditional.

Co-developed-by: Keqian Zhu 
Signed-off-by: Keqian Zhu 
Signed-off-by: Salil Mehta 
Reviewed-by: Gavin Shan 
Tested-by: Vishnu Pajjuri 
Reviewed-by: Jonathan Cameron 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Shaoqin Huang 
---
 hw/acpi/cpu.c | 23 ---
 hw/i386/acpi-build.c  |  3 ++-
 include/hw/acpi/cpu.h |  5 +++--
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index af2b6655d2..4c63514b16 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -343,9 +343,10 @@ const VMStateDescription vmstate_cpu_hotplug = {
 #define CPU_FW_EJECT_EVENT "CEJF"
 
 void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
-build_madt_cpu_fn build_madt_cpu, hwaddr io_base,
+build_madt_cpu_fn build_madt_cpu, hwaddr base_addr,
 const char *res_root,
-const char *event_handler_method)
+const char *event_handler_method,
+AmlRegionSpace rs)
 {
 Aml *ifctx;
 Aml *field;
@@ -370,13 +371,19 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 aml_append(cpu_ctrl_dev, aml_mutex(CPU_LOCK, 0));
 
 crs = aml_resource_template();
-aml_append(crs, aml_io(AML_DECODE16, io_base, io_base, 1,
+if (rs == AML_SYSTEM_IO) {
+aml_append(crs, aml_io(AML_DECODE16, base_addr, base_addr, 1,
ACPI_CPU_HOTPLUG_REG_LEN));
+} else {
+aml_append(crs, aml_memory32_fixed(base_addr,
+   ACPI_CPU_HOTPLUG_REG_LEN, AML_READ_WRITE));
+}
+
 aml_append(cpu_ctrl_dev, aml_name_decl("_CRS", crs));
 
 /* declare CPU hotplug MMIO region with related access fields */
 aml_append(cpu_ctrl_dev,
-aml_operation_region("PRST", AML_SYSTEM_IO, aml_int(io_base),
+aml_operation_region("PRST", rs, aml_int(base_addr),
  ACPI_CPU_HOTPLUG_REG_LEN));
 
 field = aml_field("PRST", AML_BYTE_ACC, AML_NOLOCK,
@@ -700,9 +707,11 @@ void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
 aml_append(sb_scope, cpus_dev);
 aml_append(table, sb_scope);
 
-method = aml_method(event_handler_method, 0, AML_NOTSERIALIZED);
-aml_append(method, aml_call0("\\_SB.CPUS." CPU_SCAN_METHOD));
-aml_append(table, method);
+if (event_handler_method) {
+method = aml_method(event_handler_method, 0, AML_NOTSERIALIZED);
+aml_append(method, aml_call0("\\_SB.CPUS." CPU_SCAN_METHOD));
+aml_append(table, method);
+}
 
 g_free(cphp_res_path);
 }
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 53f804ac16..b73b136605 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1537,7 +1537,8 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 .fw_unplugs_cpu = pm->smi_on_cpu_unplug,
 };
 build_cpus_aml(dsdt, machine, opts, pc_madt_cpu_entry,
-   pm->cpu_hp_io_base, "\\_SB.PCI0", "\\_GPE._E02");
+   pm->cpu_hp_io_base, "\\_SB.PCI0", "\\_GPE._E02",
+   AML_SYSTEM_IO);
 }
 
 if (pcms->memhp_io_base && nr_mem) {
diff --git a/include/hw/acpi/cpu.h b/include/hw/acpi/cpu.h
index e6e1a9ef59..48cded697c 100644
--- a/include/hw/acpi/cpu.h
+++ b/include/hw/acpi/cpu.h
@@ -61,9 +61,10 @@ typedef void (*build_madt_cpu_fn)(int uid, const CPUArchIdList *apic_ids,
   GArray *entry, bool force_enabled);
 
 void build_cpus_aml(Aml *table, MachineState *machine, CPUHotplugFeatures opts,
-build_madt_cpu_fn build_madt_cpu, hwaddr io_base,
+build_madt_cpu_fn build_madt_cpu, hwaddr base_addr,
 const char *res_root,
-const char *event_handler_method);
+const char *event_handler_method,
+AmlRegionSpace rs);
 
 void acpi_cpu_ospm_status(CPUHotplugState *cpu_st, ACPIOSTInfoList ***list);
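
The _CRS change and the now-optional event handler method in the patch above can be sketched as a small text generator. This is a hypothetical model, not QEMU's AML builder: it prints ASL-style descriptors instead of calling aml_io()/aml_memory32_fixed(), and the enum, function names and example base addresses are made up for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the two AmlRegionSpace values used here. */
typedef enum { TOY_SYSTEM_MEMORY, TOY_SYSTEM_IO } ToyRegionSpace;

#define TOY_CPU_HOTPLUG_REG_LEN 12u  /* same value as ACPI_CPU_HOTPLUG_REG_LEN */

/*
 * Render the CPU control device's _CRS entry as ASL text: an IO descriptor
 * for a port-based register block (x86) or a Memory32Fixed descriptor for
 * an MMIO block (e.g. ARM). Returns the snprintf() result.
 */
static int toy_cpu_crs_asl(char *buf, size_t len, ToyRegionSpace rs,
                           uint32_t base)
{
    int n;

    if (rs == TOY_SYSTEM_IO) {
        assert(base <= 0xFFFF);      /* IO ports are 16-bit */
        n = snprintf(buf, len,
                     "IO (Decode16, 0x%04" PRIX32 ", 0x%04" PRIX32 ", 0x01, 0x%02X)",
                     base, base, TOY_CPU_HOTPLUG_REG_LEN);
    } else {
        n = snprintf(buf, len,
                     "Memory32Fixed (ReadWrite, 0x%08" PRIX32 ", 0x%08X)",
                     base, TOY_CPU_HOTPLUG_REG_LEN);
    }
    assert(n >= 0 && (size_t)n < len);   /* no truncation */
    return n;
}

/*
 * Emit the GPE handler only when a method name is supplied (x86 passes
 * "\\_GPE._E02"); GED-based callers pass NULL and get an empty string.
 */
static int toy_event_handler_asl(char *buf, size_t len, const char *method)
{
    if (method == NULL) {
        buf[0] = '\0';
        return 0;
    }
    return snprintf(buf, len,
                    "Method (%s, 0, NotSerialized) { \\_SB.CPUS.CSCN () }",
                    method);
}
```

Keeping a single builder parameterized by the region space, rather than separate x86 and ARM builders, mirrors the patch's choice to pass AmlRegionSpace into build_cpus_aml().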

[PATCH V10 4/8] hw/acpi: Update GED _EVT method AML with CPU scan

2024-05-20 Thread Salil Mehta via

OSPM evaluates the _EVT method to map the event. A CPU hotplug event eventually
starts a CPU scan, which determines the affected CPU and the kind of event
(plug/unplug) and notifies the guest. Update the GED AML _EVT method with a call
to \\_SB.CPUS.CSCN.

Also, the CPU_SCAN_METHOD macro may be referenced elsewhere, for example during
GED initialization, so it makes sense to define it in a common header file such
as cpu_hotplug.h. However, doing so breaks the build because of conflicting macro
definitions in cpu.c and cpu_hotplug.c: for historical x86 reasons both files are
compiled, since the choice between the legacy (GPE.2) and modern (GED) CPU
hotplug interface is made at runtime [1]. To work around this for now, declare a
new common macro, ACPI_CPU_SCAN_METHOD, for the CPU scan method instead.
(The clean-up needs a separate discussion later.)

Reference:
[1] https://lore.kernel.org/qemu-devel/1463496205-251412-24-git-send-email-imamm...@redhat.com/

Co-developed-by: Keqian Zhu 
Signed-off-by: Keqian Zhu 
Signed-off-by: Salil Mehta 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Gavin Shan 
Tested-by: Vishnu Pajjuri 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Shaoqin Huang 
---
 hw/acpi/cpu.c  | 2 +-
 hw/acpi/generic_event_device.c | 4 
 include/hw/acpi/cpu_hotplug.h  | 2 ++
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 473b37ba88..af2b6655d2 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -327,7 +327,7 @@ const VMStateDescription vmstate_cpu_hotplug = {
 #define CPUHP_RES_DEVICE  "PRES"
 #define CPU_LOCK  "CPLK"
 #define CPU_STS_METHOD"CSTA"
-#define CPU_SCAN_METHOD   "CSCN"
+#define CPU_SCAN_METHOD   ACPI_CPU_SCAN_METHOD
 #define CPU_NOTIFY_METHOD "CTFY"
 #define CPU_EJECT_METHOD  "CEJ0"
 #define CPU_OST_METHOD"COST"
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 54d3b4bf9d..63226b0040 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -109,6 +109,10 @@ void build_ged_aml(Aml *table, const char *name, HotplugHandler *hotplug_dev,
 aml_append(if_ctx, aml_call0(MEMORY_DEVICES_CONTAINER "."
  MEMORY_SLOT_SCAN_METHOD));
 break;
+case ACPI_GED_CPU_HOTPLUG_EVT:
+aml_append(if_ctx, aml_call0(ACPI_CPU_CONTAINER "."
+ ACPI_CPU_SCAN_METHOD));
+break;
 case ACPI_GED_PWR_DOWN_EVT:
 aml_append(if_ctx,
aml_notify(aml_name(ACPI_POWER_BUTTON_DEVICE),
diff --git a/include/hw/acpi/cpu_hotplug.h b/include/hw/acpi/cpu_hotplug.h
index 48b291e45e..ef631750b4 100644
--- a/include/hw/acpi/cpu_hotplug.h
+++ b/include/hw/acpi/cpu_hotplug.h
@@ -20,6 +20,8 @@
 #include "hw/acpi/cpu.h"
 
 #define ACPI_CPU_HOTPLUG_REG_LEN 12
+#define ACPI_CPU_SCAN_METHOD "CSCN"
+#define ACPI_CPU_CONTAINER "\\_SB.CPUS"
 
 typedef struct AcpiCpuHotplug {
 Object *device;
-- 
2.34.1
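
The macro handling in this patch relies on C string-literal concatenation. A minimal sketch, reusing the two macro values from the cpu_hotplug.h hunk above; the helper is_cpu_container_method() is hypothetical and only illustrates that the GED spelling and the cpu.c spelling resolve to the same AML path.

```c
#include <assert.h>
#include <string.h>

/* Values copied from the include/hw/acpi/cpu_hotplug.h hunk above. */
#define ACPI_CPU_SCAN_METHOD "CSCN"
#define ACPI_CPU_CONTAINER "\\_SB.CPUS"

/* cpu.c keeps its local name as an alias of the shared macro. */
#define CPU_SCAN_METHOD ACPI_CPU_SCAN_METHOD

/* ACPI NameSegs are exactly four characters long. */
static_assert(sizeof(ACPI_CPU_SCAN_METHOD) - 1 == 4, "NameSeg must be 4 chars");

/*
 * Adjacent string literals are joined at compile time, so the GED hunk's
 * ACPI_CPU_CONTAINER "." ACPI_CPU_SCAN_METHOD is one literal: \_SB.CPUS.CSCN
 */
static const char ged_cpu_scan_path[] = ACPI_CPU_CONTAINER "." ACPI_CPU_SCAN_METHOD;
static const char cpus_scan_path[] = "\\_SB.CPUS." CPU_SCAN_METHOD;

/* Hypothetical check: does @path name a 4-char method in the CPU container? */
static int is_cpu_container_method(const char *path)
{
    size_t n = strlen(ACPI_CPU_CONTAINER);

    return strncmp(path, ACPI_CPU_CONTAINER, n) == 0 && path[n] == '.' &&
           strlen(path + n + 1) == 4;
}
```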

[PATCH V10 2/8] hw/acpi: Move CPU ctrl-dev MMIO region len macro to common header file

2024-05-20 Thread Salil Mehta via

The CPU ctrl-dev MMIO region length may be used by ACPI GED and in various other
architecture-specific places. Move the ACPI_CPU_HOTPLUG_REG_LEN macro to a more
appropriate common header file.

Signed-off-by: Salil Mehta 
Reviewed-by: Alex Bennée 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Gavin Shan 
Reviewed-by: David Hildenbrand 
Reviewed-by: Shaoqin Huang 
Tested-by: Vishnu Pajjuri 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
---
 hw/acpi/cpu.c | 2 +-
 include/hw/acpi/cpu_hotplug.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 2d81c1e790..69aaa563db 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -1,13 +1,13 @@
 #include "qemu/osdep.h"
 #include "migration/vmstate.h"
 #include "hw/acpi/cpu.h"
+#include "hw/acpi/cpu_hotplug.h"
 #include "hw/core/cpu.h"
 #include "qapi/error.h"
 #include "qapi/qapi-events-acpi.h"
 #include "trace.h"
 #include "sysemu/numa.h"
 
-#define ACPI_CPU_HOTPLUG_REG_LEN 12
 #define ACPI_CPU_SELECTOR_OFFSET_WR 0
 #define ACPI_CPU_FLAGS_OFFSET_RW 4
 #define ACPI_CPU_CMD_OFFSET_WR 5
diff --git a/include/hw/acpi/cpu_hotplug.h b/include/hw/acpi/cpu_hotplug.h
index 3b932a..48b291e45e 100644
--- a/include/hw/acpi/cpu_hotplug.h
+++ b/include/hw/acpi/cpu_hotplug.h
@@ -19,6 +19,8 @@
 #include "hw/hotplug.h"
 #include "hw/acpi/cpu.h"
 
+#define ACPI_CPU_HOTPLUG_REG_LEN 12
+
 typedef struct AcpiCpuHotplug {
 Object *device;
 MemoryRegion io;
-- 
2.34.1

[PATCH V10 3/8] hw/acpi: Update ACPI GED framework to support vCPU Hotplug

2024-05-20 Thread Salil Mehta via

ACPI GED (as described in the ACPI 6.4 spec) uses an interrupt listed in the
GED's _CRS object to notify OSPM of an event. OSPM then demultiplexes the
notified event by evaluating the ACPI _EVT method to determine the event type.
Use ACPI GED to also notify the guest kernel about CPU hot(un)plug events.

ACPI CPU hotplug-related initialization should only happen if ACPI_CPU_HOTPLUG
support has been enabled for a particular architecture. Add a
cpu_hotplug_hw_init() stub to avoid breaking the build.

Co-developed-by: Keqian Zhu 
Signed-off-by: Keqian Zhu 
Signed-off-by: Salil Mehta 
Reviewed-by: Jonathan Cameron 
Reviewed-by: Gavin Shan 
Reviewed-by: David Hildenbrand 
Reviewed-by: Shaoqin Huang 
Tested-by: Vishnu Pajjuri 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Vishnu Pajjuri 
---
 hw/acpi/acpi-cpu-hotplug-stub.c|  6 ++
 hw/acpi/cpu.c  |  6 +-
 hw/acpi/generic_event_device.c | 17 +
 include/hw/acpi/generic_event_device.h |  4 
 4 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/acpi-cpu-hotplug-stub.c b/hw/acpi/acpi-cpu-hotplug-stub.c
index 3fc4b14c26..c6c61bb9cd 100644
--- a/hw/acpi/acpi-cpu-hotplug-stub.c
+++ b/hw/acpi/acpi-cpu-hotplug-stub.c
@@ -19,6 +19,12 @@ void legacy_acpi_cpu_hotplug_init(MemoryRegion *parent, Object *owner,
 return;
 }
 
+void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
+ CPUHotplugState *state, hwaddr base_addr)
+{
+return;
+}
+
 void acpi_cpu_ospm_status(CPUHotplugState *cpu_st, ACPIOSTInfoList ***list)
 {
 return;
diff --git a/hw/acpi/cpu.c b/hw/acpi/cpu.c
index 69aaa563db..473b37ba88 100644
--- a/hw/acpi/cpu.c
+++ b/hw/acpi/cpu.c
@@ -221,7 +221,11 @@ void cpu_hotplug_hw_init(MemoryRegion *as, Object *owner,
 const CPUArchIdList *id_list;
 int i;
 
-assert(mc->possible_cpu_arch_ids);
+/* hotplug might not be available for all types like x86/microvm etc. */
+if (!mc->possible_cpu_arch_ids) {
+return;
+}
+
 id_list = mc->possible_cpu_arch_ids(machine);
 state->dev_count = id_list->len;
 state->devs = g_new0(typeof(*state->devs), state->dev_count);
diff --git a/hw/acpi/generic_event_device.c b/hw/acpi/generic_event_device.c
index 2d6e91b124..54d3b4bf9d 100644
--- a/hw/acpi/generic_event_device.c
+++ b/hw/acpi/generic_event_device.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "hw/acpi/acpi.h"
+#include "hw/acpi/cpu.h"
 #include "hw/acpi/generic_event_device.h"
 #include "hw/irq.h"
 #include "hw/mem/pc-dimm.h"
@@ -25,6 +26,7 @@ static const uint32_t ged_supported_events[] = {
 ACPI_GED_MEM_HOTPLUG_EVT,
 ACPI_GED_PWR_DOWN_EVT,
 ACPI_GED_NVDIMM_HOTPLUG_EVT,
+ACPI_GED_CPU_HOTPLUG_EVT,
 };
 
 /*
@@ -234,6 +236,8 @@ static void acpi_ged_device_plug_cb(HotplugHandler *hotplug_dev,
 } else {
 acpi_memory_plug_cb(hotplug_dev, &s->memhp_state, dev, errp);
 }
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_plug_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
 } else {
 error_setg(errp, "virt: device plug request for unsupported device"
" type: %s", object_get_typename(OBJECT(dev)));
@@ -248,6 +252,8 @@ static void acpi_ged_unplug_request_cb(HotplugHandler *hotplug_dev,
 if ((object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
 !(object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)))) {
 acpi_memory_unplug_request_cb(hotplug_dev, &s->memhp_state, dev, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_unplug_request_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
 } else {
 error_setg(errp, "acpi: device unplug request for unsupported device"
" type: %s", object_get_typename(OBJECT(dev)));
@@ -261,6 +267,8 @@ static void acpi_ged_unplug_cb(HotplugHandler *hotplug_dev,
 
 if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
 acpi_memory_unplug_cb(&s->memhp_state, dev, errp);
+} else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+acpi_cpu_unplug_cb(&s->cpuhp_state, dev, errp);
 } else {
 error_setg(errp, "acpi: device unplug for unsupported device"
" type: %s", object_get_typename(OBJECT(dev)));
@@ -272,6 +280,7 @@ static void acpi_ged_ospm_status(AcpiDeviceIf *adev, ACPIOSTInfoList ***list)
 AcpiGedState *s = ACPI_GED(adev);
 
 acpi_memory_ospm_status(&s->memhp_state, list);
+acpi_cpu_ospm_status(&s->cpuhp_state, list);
 }
 
 static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
@@ -286,6 +295,8 @@ static void acpi_ged_send_event(AcpiDeviceIf *adev, AcpiEventStatusBits ev)
 sel = ACPI_GED_PWR_DOWN_EVT;
 } else if (ev & ACPI_NVDIMM_HOTPLUG_STATUS) {
 sel = ACPI_GED_NVDIMM_HOTPLUG_EVT;
+} else if (ev & ACPI_CPU_HOTPLUG_STATUS) {
+sel = ACPI_GED_CPU_HOTPLUG_EVT;
 }
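
The notification flow described in the commit message above (map an event to a selector bit, latch it, raise the GED interrupt, let the guest's _EVT method read the selector) can be modelled as follows. All names and status-bit values are hypothetical. The selector bit numbers follow the GED IO interface documentation, and the read-clears-selector behaviour is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical internal status bits; the values are illustrative only. */
enum {
    TOY_MEMORY_HOTPLUG_STATUS = 1 << 0,
    TOY_POWER_DOWN_STATUS     = 1 << 1,
    TOY_NVDIMM_HOTPLUG_STATUS = 1 << 2,
    TOY_CPU_HOTPLUG_STATUS    = 1 << 3,
};

/* GED selector bits, numbered as in the GED IO interface documentation. */
#define TOY_GED_MEM_HOTPLUG_EVT    (UINT32_C(1) << 0)
#define TOY_GED_PWR_DOWN_EVT       (UINT32_C(1) << 1)
#define TOY_GED_NVDIMM_HOTPLUG_EVT (UINT32_C(1) << 2)
#define TOY_GED_CPU_HOTPLUG_EVT    (UINT32_C(1) << 3)

typedef struct {
    uint32_t sel;          /* pending selector bits seen by the guest */
    unsigned irq_pulses;   /* times the interrupt listed in _CRS fired */
} ToyGed;

/* Device side: map the status bit to a selector bit, latch it, raise IRQ. */
static void toy_ged_send_event(ToyGed *g, unsigned status)
{
    uint32_t sel = 0;

    assert(status != 0);
    if (status & TOY_MEMORY_HOTPLUG_STATUS) {
        sel = TOY_GED_MEM_HOTPLUG_EVT;
    } else if (status & TOY_POWER_DOWN_STATUS) {
        sel = TOY_GED_PWR_DOWN_EVT;
    } else if (status & TOY_NVDIMM_HOTPLUG_STATUS) {
        sel = TOY_GED_NVDIMM_HOTPLUG_EVT;
    } else if (status & TOY_CPU_HOTPLUG_STATUS) {
        sel = TOY_GED_CPU_HOTPLUG_EVT;      /* the case this patch adds */
    }
    if (sel == 0) {
        return;                             /* unsupported: nothing latched */
    }
    g->sel |= sel;
    g->irq_pulses++;
}

/* Guest side (_EVT): read the selector; this sketch assumes reads clear it. */
static uint32_t toy_ged_read_sel(ToyGed *g)
{
    uint32_t v = g->sel;

    g->sel = 0;
    return v;
}
```

Latching with |= lets several events that arrive before the guest reads the selector be reported together in one _EVT evaluation.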

[PATCH V10 1/8] accel/kvm: Extract common KVM vCPU {creation, parking} code

2024-05-20 Thread Salil Mehta via

KVM vCPU creation is done once during vCPU realization, when the QEMU vCPU thread
is spawned. This is currently common to all architectures.

Hot-unplug of a vCPU destroys the vCPU object in QOM, but the corresponding KVM
vCPU object in the host KVM is not destroyed because KVM does not support vCPU
removal. Instead, its representative KVM vCPU object/context in QEMU is parked.

Refactor the architecture-common logic so that some of these APIs can be reused
by the vCPU hotplug code of architectures like ARM, Loongson, etc. Add trace
events to the new and existing APIs. No functional change is intended here.

Signed-off-by: Salil Mehta 
Reviewed-by: Gavin Shan 
Tested-by: Vishnu Pajjuri 
Reviewed-by: Jonathan Cameron 
Tested-by: Xianglai Li 
Tested-by: Miguel Luis 
Reviewed-by: Shaoqin Huang 
Reviewed-by: Vishnu Pajjuri 
---
 accel/kvm/kvm-all.c| 97 --
 accel/kvm/kvm-cpus.h   | 23 ++
 accel/kvm/trace-events |  5 ++-
 3 files changed, 92 insertions(+), 33 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c0be9f5eed..a8f93078dc 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -340,14 +340,73 @@ err:
 return ret;
 }
 
+void kvm_park_vcpu(CPUState *cpu)
+{
+struct KVMParkedVcpu *vcpu;
+
+trace_kvm_park_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
+
+vcpu = g_malloc0(sizeof(*vcpu));
+vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
+vcpu->kvm_fd = cpu->kvm_fd;
+QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+}
+
+int kvm_unpark_vcpu(KVMState *s, unsigned long vcpu_id)
+{
+struct KVMParkedVcpu *cpu;
+
+QLIST_FOREACH(cpu, &s->kvm_parked_vcpus, node) {
+if (cpu->vcpu_id == vcpu_id) {
+int kvm_fd;
+
+trace_kvm_unpark_vcpu(vcpu_id);
+
+QLIST_REMOVE(cpu, node);
+kvm_fd = cpu->kvm_fd;
+g_free(cpu);
+return kvm_fd;
+}
+}
+
+return -ENOENT;
+}
+
+int kvm_create_vcpu(CPUState *cpu)
+{
+unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
+KVMState *s = kvm_state;
+int kvm_fd;
+
+/* check if the KVM vCPU already exists but is parked */
+kvm_fd = kvm_unpark_vcpu(s, vcpu_id);
+if (kvm_fd < 0) {
+/* vCPU not parked: create a new KVM vCPU */
+kvm_fd = kvm_vm_ioctl(s, KVM_CREATE_VCPU, vcpu_id);
+if (kvm_fd < 0) {
+error_report("KVM_CREATE_VCPU IOCTL failed for vCPU %lu", vcpu_id);
+return kvm_fd;
+}
+}
+
+trace_kvm_create_vcpu(cpu->cpu_index, vcpu_id, kvm_fd);
+
+cpu->kvm_fd = kvm_fd;
+cpu->kvm_state = s;
+cpu->vcpu_dirty = true;
+cpu->dirty_pages = 0;
+cpu->throttle_us_per_full = 0;
+
+return 0;
+}
+
 static int do_kvm_destroy_vcpu(CPUState *cpu)
 {
 KVMState *s = kvm_state;
 long mmap_size;
-struct KVMParkedVcpu *vcpu = NULL;
 int ret = 0;
 
-trace_kvm_destroy_vcpu();
+trace_kvm_destroy_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
 ret = kvm_arch_destroy_vcpu(cpu);
 if (ret < 0) {
@@ -373,10 +432,7 @@ static int do_kvm_destroy_vcpu(CPUState *cpu)
 }
 }
 
-vcpu = g_malloc0(sizeof(*vcpu));
-vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
-vcpu->kvm_fd = cpu->kvm_fd;
-QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+kvm_park_vcpu(cpu);
 err:
 return ret;
 }
@@ -389,24 +445,6 @@ void kvm_destroy_vcpu(CPUState *cpu)
 }
 }
 
-static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
-{
-struct KVMParkedVcpu *cpu;
-
-QLIST_FOREACH(cpu, &s->kvm_parked_vcpus, node) {
-if (cpu->vcpu_id == vcpu_id) {
-int kvm_fd;
-
-QLIST_REMOVE(cpu, node);
-kvm_fd = cpu->kvm_fd;
-g_free(cpu);
-return kvm_fd;
-}
-}
-
-return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
-}
-
 int kvm_init_vcpu(CPUState *cpu, Error **errp)
 {
 KVMState *s = kvm_state;
@@ -415,19 +453,14 @@ int kvm_init_vcpu(CPUState *cpu, Error **errp)
 
 trace_kvm_init_vcpu(cpu->cpu_index, kvm_arch_vcpu_id(cpu));
 
-ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
+ret = kvm_create_vcpu(cpu);
 if (ret < 0) {
-error_setg_errno(errp, -ret, "kvm_init_vcpu: kvm_get_vcpu failed (%lu)",
+error_setg_errno(errp, -ret,
+ "kvm_init_vcpu: kvm_create_vcpu failed (%lu)",
  kvm_arch_vcpu_id(cpu));
 goto err;
 }
 
-cpu->kvm_fd = ret;
-cpu->kvm_state = s;
-cpu->vcpu_dirty = true;
-cpu->dirty_pages = 0;
-cpu->throttle_us_per_full = 0;
-
 mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
 if (mmap_size < 0) {
 ret = mmap_size;
diff --git a/accel/kvm/kvm-cpus.h b/accel/kvm/kvm-cpus.h
index ca40add32c..2e6bb38b5d 100644
--- a/accel/kvm/kvm-cpus.h
+++ b/accel/kvm/kvm-cpus.h
@@ -22,5 +22,28 @@ bool kvm_supports_guest_debug(void);
 int kvm_insert_breakpoint(CPUState *cpu, int type, vaddr addr,
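
The park/unpark/create logic refactored in the patch above can be illustrated with a standalone model. This is a hypothetical sketch: a hand-rolled singly linked list replaces QLIST, and an incrementing counter stands in for the KVM_CREATE_VCPU ioctl, so no real KVM file descriptors are involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical standalone model; a counter stands in for KVM_CREATE_VCPU. */
typedef struct ToyParkedVcpu {
    unsigned long vcpu_id;
    int kvm_fd;
    struct ToyParkedVcpu *next;
} ToyParkedVcpu;

static ToyParkedVcpu *parked;    /* list head, like kvm_parked_vcpus */
static int next_fd = 10;         /* fake fd allocator */

/* Hot-unplug path: remember the fd instead of destroying the KVM vCPU. */
static void toy_park_vcpu(unsigned long vcpu_id, int kvm_fd)
{
    ToyParkedVcpu *v = calloc(1, sizeof(*v));

    assert(v);
    v->vcpu_id = vcpu_id;
    v->kvm_fd = kvm_fd;
    v->next = parked;            /* insert at head */
    parked = v;
}

/* Remove a parked entry and hand back its fd, or -ENOENT if none. */
static int toy_unpark_vcpu(unsigned long vcpu_id)
{
    for (ToyParkedVcpu **pp = &parked; *pp; pp = &(*pp)->next) {
        if ((*pp)->vcpu_id == vcpu_id) {
            ToyParkedVcpu *v = *pp;
            int fd = v->kvm_fd;

            *pp = v->next;
            free(v);
            return fd;
        }
    }
    return -ENOENT;
}

/* Reuse a parked vCPU fd if one exists, otherwise "create" a new one. */
static int toy_create_vcpu(unsigned long vcpu_id)
{
    int fd = toy_unpark_vcpu(vcpu_id);

    if (fd < 0) {
        fd = next_fd++;          /* stands in for kvm_vm_ioctl(KVM_CREATE_VCPU) */
    }
    return fd;
}
```

A plug, unplug and re-plug cycle of the same vCPU ID therefore gets back the fd it had before, which is the reuse the commit message describes.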

[PATCH V10 0/8] Add architecture agnostic code to support vCPU Hotplug

2024-05-20 Thread Salil Mehta via

Virtual CPU hotplug support is being added across various architectures [1][3].
This series adds the code that is common to all architectures:

1. vCPU creation and Parking code refactor [Patch 1]
2. Update ACPI GED framework to support vCPU Hotplug [Patch 2,3]
3. ACPI CPUs AML code change [Patch 4,5]
4. Helper functions to support unrealization of CPU objects [Patch 6,7]
5. Docs [Patch 8]


Repository:

[*] https://github.com/salil-mehta/qemu.git virt-cpuhp-armv8/rfc-v3.arch.agnostic.v10

NOTE: For ARM, the above works in combination with the architecture-specific
part based on RFC V2 [1]. That architecture-specific patch-set (RFC V3) will be
posted soon and is available at the location below:

[*] https://github.com/salil-mehta/qemu/tree/virt-cpuhp-armv8/rfc-v3-rc1


Revision History:

Patch-set  V9 -> V10
1. Addressed Nicholas Piggin's (IBM) and Philippe Mathieu-Daudé's (Linaro) comments
   - Carved out kvm_unpark_vcpu and added its trace
   - Widened the scope of kvm_unpark_vcpu so that it can be used by the generic
     framework being thought out
Link: https://lore.kernel.org/qemu-devel/20240519210620.228342-1-salil.me...@huawei.com/
Link: https://lore.kernel.org/qemu-devel/e94b0e14-efee-4050-9c9f-08382a36b...@linaro.org/

Patch-set  V8 -> V9
1. Addressed Vishnu Pajjuri's (Ampere) comments
   - Added kvm_fd to trace in kvm_create_vcpu
   - Some clean-ups: arch vcpu-id and sbd variable
   - Added the missed initialization of cpu->gdb_num_regs
2. Addressed the comment from Zhao Liu (Intel)
   - Made initialization of CPU Hotplug state conditional
     (possible_cpu_arch_ids != NULL)
Link: https://lore.kernel.org/qemu-devel/2024031202.12992-1-salil.me...@huawei.com/

Patch-set V7 -> V8
1. Rebased and Fixed the conflicts

Patch-set  V6 -> V7
1. Addressed Alex Bennée's comments
   - Updated the docs
2. Addressed Igor Mammedov's comments
   - Merged patches [Patch V6 3/9] & [Patch V6 7/9] with [Patch V6 4/9]
   - Updated commit-log of [Patch V6 1/9] and [Patch V6 5/9] 
3. Added Shaoqin Huang's Reviewed-by tags for whole series.
Link: https://lore.kernel.org/qemu-devel/20231013105129.25648-1-salil.me...@huawei.com/

Patch-set  V5 -> V6
1. Addressed Gavin Shan's comments
   - Fixed the assert() ranges of address spaces
   - Rebased the patch-set to latest changes in the qemu.git
   - Added Reviewed-by tags for patches {8,9}
2. Addressed Jonathan Cameron's comments
   - Updated commit-log for [Patch V5 1/9] with mention of trace events
   - Added Reviewed-by tags for patches {1,5}
3. Added Tested-by tags from Xianglai Li
4. Fixed checkpatch.pl error "Qemu -> QEMU" in [Patch V5 1/9] 
Link: https://lore.kernel.org/qemu-devel/20231011194355.15628-1-salil.me...@huawei.com/

Patch-set  V4 -> V5
1. Addressed Gavin Shan's comments
   - Fixed the trace events print string for kvm_{create,get,park,destroy}_vcpu
   - Added Reviewed-by tag for patch {1}
2. Added Shaoqin Huang's Reviewed-by tags for Patches {2,3}
3. Added Tested-by Tag from Vishnu Pajjuri to the patch-set
4. Dropped the ARM specific [Patch V4 10/10]
Link: https://lore.kernel.org/qemu-devel/20231009203601.17584-1-salil.me...@huawei.com/

Patch-set  V3 -> V4
1. Addressed David Hildenbrand's comments
   - Fixed the wrong doc comment of kvm_park_vcpu API prototype
   - Added Reviewed-by tags for patches {2,4}
Link: https://lore.kernel.org/qemu-devel/20231009112812.10612-1-salil.me...@huawei.com/

Patch-set  V2 -> V3
1. Addressed Jonathan Cameron's comments
   - Fixed 'vcpu-id' type wrongly changed from 'unsigned long' to 'integer'
   - Removed unnecessary use of variable 'vcpu_id' in kvm_park_vcpu
   - Updated [Patch V2 3/10] commit-log with details of ACPI_CPU_SCAN_METHOD macro
   - Updated [Patch V2 5/10] commit-log with details of conditional event handler method
   - Added Reviewed-by tags for patches {2,3,4,6,7}
2. Addressed Gavin Shan's comments
   - Removed unnecessary use of variable 'vcpu_id' in kvm_park_vcpu
   - Fixed return value in kvm_get_vcpu from -1 to -ENOENT
   - Reset the value of 'gdb_num_g_regs' in gdb_unregister_coprocessor_all
   - Fixed the kvm_{create,park}_vcpu prototypes docs
   - Added Reviewed-by tags for patches {2,3,4,5,6,7,9,10}
3. Addressed one earlier missed comment by Alex Bennée in RFC V1
   - Added traces instead of DPRINTF in the newly added and some existing functions
Link: https://lore.kernel.org/qemu-devel/20230930001933.2660-1-salil.me...@huawei.com/

Patch-set V1 -> V2
1. Addressed Alex Bennée's comments
   - Refactored the kvm_create_vcpu logic to get rid of goto
   - Added the docs for kvm_{create,park}_vcpu prototypes
   - Split the gdbstub and AddressSpace destruction changes into separate patches
   - Added Reviewed-by tags for patches {2,10}
Link: https://lore.kernel.org/qemu-devel/20230929124304.13672-1-salil.me...@huawei.com/

References:

[1] https://lore.kernel.org/qemu-devel/20230926100436.28284-1-salil.me...@huawei.com/
[2]

[PATCH] hw/usb/hcd-ohci: Fix ohci_service_td: accept valid TDs

2024-05-20 Thread David Hubbard

From: Cord Amfmgm 

This changes the way the ohci emulation handles a Transfer Descriptor with
"Current Buffer Pointer" set to "Buffer End" + 1.

The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
to signal the buffer has zero length. Currently qemu only accepts zero-length
Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
accepts both cases.

The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
and earlier matched the spec. (I haven't taken the time to bisect exactly
where the logic was changed.)

With a tiny OS[1] that boots and executes a test, the issue can be seen:

* OS that sends USB requests to a USB mass storage device
  but sends td.cbp = td.be + 1
* qemu 4.2
* qemu HEAD (4e66a0854)
* Actual OHCI controller (hardware)

Command line:
qemu-system-x86_64 -m 20 \
 -device pci-ohci,id=ohci \
 -drive if=none,format=raw,id=d,file=testmbr.raw \
 -device usb-storage,bus=ohci.0,drive=d \
 --trace "usb_*" --trace "ohci_*" -D qemu.log

Results are:

 qemu 4.2   | qemu HEAD  | actual HW
------------+------------+------------
 works fine | ohci_die() | works fine

Tip: if the flags "-serial pty -serial stdio" are added to the command line,
the test will output USB requests like this:

Testing qemu HEAD:

> Free mem 2M ohci port2 conn FS
> setup { 80 6 0 1 0 0 8 0 }
> ED info=8 { mps=8 en=0 d=0 } tail=c20920
>   td0 c20880 nxt=c20960 f200 setup cbp=c20900 be=c20907
>   td1 c20960 nxt=c20980 f314in cbp=c20908 be=c2090f
>   td2 c20980 nxt=c20920 f308   out cbp=c20910 be=c2090f ohci20 host err
> usb stopped

And in qemu.log:

usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f

Testing qemu 4.2:

> Free mem 2M ohci port2 conn FS
> setup { 80 6 0 1 0 0 8 0 }
> ED info=8 { mps=8 en=0 d=0 } tail=620920
>   td0 620880 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 be=620907
>   td1 620960 nxt=620980 f314in cbp=620908 be=62090f   cbp=0 be=62090f
>   td2 620980 nxt=620920 f308   out cbp=620910 be=62090f   cbp=0 be=62090f
>rx { 12 1 0 2 0 0 0 8 }
> setup { 0 5 1 0 0 0 0 0 } tx {}
> ED info=8 { mps=8 en=0 d=0 } tail=620880
>   td0 620920 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 be=620907
>   td1 620960 nxt=620880 f310in cbp=620908 be=620907   cbp=0 be=620907
> setup { 80 6 0 1 0 0 12 0 }
> ED info=80001 { mps=8 en=0 d=1 } tail=620960
>   td0 620880 nxt=6209c0 f200 setup cbp=620920 be=620927   cbp=0 be=620927
>   td1 6209c0 nxt=6209e0 f314in cbp=620928 be=620939   cbp=0 be=620939
>   td2 6209e0 nxt=620960 f308   out cbp=62093a be=620939   cbp=0 be=620939
>rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> setup { 80 6 0 2 0 0 0 1 }
> ED info=80001 { mps=8 en=0 d=1 } tail=620880
>   td0 620960 nxt=6209a0 f200 setup cbp=620a20 be=620a27   cbp=0 be=620a27
>   td1 6209a0 nxt=6209c0 f3140004in cbp=620a28 be=620b27   cbp=620a48 be=620b27
>   td2 6209c0 nxt=620880 f308   out cbp=620b28 be=620b27   cbp=0 be=620b27
>rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 }
> setup { 0 9 1 0 0 0 0 0 } tx {}
> ED info=80001 { mps=8 en=0 d=1 } tail=620900
>   td0 620880 nxt=620940 f200 setup cbp=620a00 be=620a07   cbp=0 be=620a07
>   td1 620940 nxt=620900 f310in cbp=620a08 be=620a07   cbp=0 be=620a07

[1] The OS disk image has been emailed to phi...@linaro.org, m...@tls.msk.ru,
and kra...@redhat.com:

* testCbpOffBy1.img.xz
* sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed

Signed-off-by: Cord Amfmgm 
---
 hw/usb/hcd-ohci.c   | 4 ++--
 hw/usb/trace-events | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index acd6016980..71b54914d3 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
 if ((td.cbp & 0xfffff000) != (td.be & 0xfffff000)) {
 len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
 } else {
-if (td.cbp > td.be) {
-trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
+if (td.cbp - 1 > td.be) {  /* rely on td.cbp != 0 */
+trace_usb_ohci_td_bad_buf(td.cbp, td.be);
 ohci_die(ohci);
 return 1;
 }
diff --git a/hw/usb/trace-events b/hw/usb/trace-events
index fd7b90d70c..fe282e7876 100644
--- a/hw/usb/trace-events
+++ b/hw/usb/trace-events
@@ -29,6 +29,7 @@ usb_ohci_iso_td_data_underrun(int ret) "DataUnderrun %d"
 usb_ohci_iso_td_nak(int ret) "got NAK/STALL %d"
 usb_ohci_iso_td_bad_response(int ret) "Bad device response %d"
 usb_ohci_td_bad_pid(const char *s, uint32_t edf, uint32_t tdf) "Bad pid %s: ed.flags 0x%x td.flags 0x%x"
+usb_ohci_td_bad_buf(uint32_t cbp, uint32_t be) "Bad cbp = 0x%x > be = 0x%x"
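
The length check changed by this patch can be exercised in isolation. A minimal sketch, assuming the 4 KiB page mask 0xfffff000 and treating cbp == 0 as an empty buffer, as the commit message describes. The function name and the -1 error convention (standing in for the ohci_die() path) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the general-TD buffer length rule after the patch.
 * Returns the byte count, or -1 where the patched code calls ohci_die().
 */
static long toy_td_len(uint32_t cbp, uint32_t be)
{
    if (cbp == 0) {
        return 0;                    /* zero-length TD, cbp == 0 form */
    }
    if ((cbp & 0xfffff000u) != (be & 0xfffff000u)) {
        /* buffer crosses from one 4 KiB page into the next */
        return (long)(be & 0xfff) + 0x1001 - (long)(cbp & 0xfff);
    }
    if (cbp - 1 > be) {              /* safe: cbp != 0 here */
        return -1;                   /* cbp more than one past be: invalid */
    }
    return (long)be - (long)cbp + 1; /* 0 when cbp == be + 1 */
}
```

With the values from the failing td2 in the log above (cbp=0xc20910, be=0xc2090f), this sketch yields a length of 0 instead of an error.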

Re: [PATCH] hw/usb/hcd-ohci: Fix ohci_service_td: accept valid TDs

2024-05-20 Thread Cord Amfmgm

On Mon, May 20, 2024 at 6:24 PM David Hubbard  wrote:

> From: Cord Amfmgm 
>
> This changes the way the ohci emulation handles a Transfer Descriptor with
> "Current Buffer Pointer" set to "Buffer End" + 1.
>

Please disregard: this patch is no different from the previous one sent a
couple of weeks ago. Resending...


>
> The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than
> td.be to signal the buffer has zero length. Currently qemu only accepts
> zero-length Transfer Descriptors if the td.cbp is equal to 0, while actual
> OHCI hardware accepts both cases.
>
> The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
> and earlier matched the spec. (I haven't taken the time to bisect exactly
> where the logic was changed.)
>
> With a tiny OS[1] that boots and executes a test, the issue can be seen:
>
> * OS that sends USB requests to a USB mass storage device
>   but sends td.cbp = td.be + 1
> * qemu 4.2
> * qemu HEAD (4e66a0854)
> * Actual OHCI controller (hardware)
>
> Command line:
> qemu-system-x86_64 -m 20 \
>  -device pci-ohci,id=ohci \
>  -drive if=none,format=raw,id=d,file=testmbr.raw \
>  -device usb-storage,bus=ohci.0,drive=d \
>  --trace "usb_*" --trace "ohci_*" -D qemu.log
>
> Results are:
>
>  qemu 4.2   | qemu HEAD  | actual HW
> ------------+------------+------------
>  works fine | ohci_die() | works fine
>
> Tip: if the flags "-serial pty -serial stdio" are added to the command line
> the test will output USB requests like this:
>
> Testing qemu HEAD:
>
> > Free mem 2M ohci port2 conn FS
> > setup { 80 6 0 1 0 0 8 0 }
> > ED info=8 { mps=8 en=0 d=0 } tail=c20920
> >   td0 c20880 nxt=c20960 f200 setup cbp=c20900 be=c20907
> >   td1 c20960 nxt=c20980 f314in cbp=c20908 be=c2090f
> >   td2 c20980 nxt=c20920 f308   out cbp=c20910 be=c2090f ohci20 host err
> > usb stopped
>
> And in qemu.log:
>
> usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f
>
> Testing qemu 4.2:
>
> > Free mem 2M ohci port2 conn FS
> > setup { 80 6 0 1 0 0 8 0 }
> > ED info=8 { mps=8 en=0 d=0 } tail=620920
> >   td0 620880 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 be=620907
> >   td1 620960 nxt=620980 f314in cbp=620908 be=62090f   cbp=0 be=62090f
> >   td2 620980 nxt=620920 f308   out cbp=620910 be=62090f   cbp=0 be=62090f
> >rx { 12 1 0 2 0 0 0 8 }
> > setup { 0 5 1 0 0 0 0 0 } tx {}
> > ED info=8 { mps=8 en=0 d=0 } tail=620880
> >   td0 620920 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 be=620907
> >   td1 620960 nxt=620880 f310in cbp=620908 be=620907   cbp=0 be=620907
> > setup { 80 6 0 1 0 0 12 0 }
> > ED info=80001 { mps=8 en=0 d=1 } tail=620960
> >   td0 620880 nxt=6209c0 f200 setup cbp=620920 be=620927   cbp=0 be=620927
> >   td1 6209c0 nxt=6209e0 f314in cbp=620928 be=620939   cbp=0 be=620939
> >   td2 6209e0 nxt=620960 f308   out cbp=62093a be=620939   cbp=0 be=620939
> >rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> > setup { 80 6 0 2 0 0 0 1 }
> > ED info=80001 { mps=8 en=0 d=1 } tail=620880
> >   td0 620960 nxt=6209a0 f200 setup cbp=620a20 be=620a27   cbp=0 be=620a27
> >   td1 6209a0 nxt=6209c0 f3140004in cbp=620a28 be=620b27   cbp=620a48 be=620b27
> >   td2 6209c0 nxt=620880 f308   out cbp=620b28 be=620b27   cbp=0 be=620b27
> >rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 }
> > setup { 0 9 1 0 0 0 0 0 } tx {}
> > ED info=80001 { mps=8 en=0 d=1 } tail=620900
> >   td0 620880 nxt=620940 f200 setup cbp=620a00 be=620a07   cbp=0 be=620a07
> >   td1 620940 nxt=620900 f310in cbp=620a08 be=620a07   cbp=0 be=620a07
>
> [1] The OS disk image has been emailed to phi...@linaro.org, m...@tls.msk.ru,
> and kra...@redhat.com:
>
> * testCbpOffBy1.img.xz
> * sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed
>
> Signed-off-by: Cord Amfmgm 
> ---
>  hw/usb/hcd-ohci.c   | 4 ++--
>  hw/usb/trace-events | 1 +
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> index acd6016980..86caf5e43b 100644
> --- a/hw/usb/hcd-ohci.c
> +++ b/hw/usb/hcd-ohci.c
> @@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>  if ((td.cbp & 0xf000) != (td.be & 0xf000)) {
>  len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
>  } else {
> -if (td.cbp > td.be) {
> -trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
> +if (td.cbp > td.be + 1) {
> +trace_usb_ohci_td_bad_buf(td.cbp, td.be);
>  ohci_die(ohci);
>  return 1;
>  }
> diff --git a/hw/usb/trace-events b/hw/usb/trace-events
> index fd7b90d70c..fe282e7876 100644
> --- a/hw/usb/trace-events
> +++ b/hw/usb/trace-events
> @@ -29,6 +29,7 @@

[PATCH] hw/usb/hcd-ohci: Fix ohci_service_td: accept valid TDs

2024-05-20 Thread David Hubbard

From: Cord Amfmgm 

This changes the way the ohci emulation handles a Transfer Descriptor with
"Current Buffer Pointer" set to "Buffer End" + 1.

The OHCI spec 4.3.1.2 Table 4-2 allows td.cbp to be one byte more than td.be
to signal the buffer has zero length. Currently qemu only accepts zero-length
Transfer Descriptors if the td.cbp is equal to 0, while actual OHCI hardware
accepts both cases.
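As a rough illustration of the rule described above, the sketch below derives the byte count of a General TD from CBP and BE, accepting both CBP == 0 and CBP == BE + 1 as zero-length. It is modelled on the length computation in ohci_service_td() with the patched check; td_buffer_len() and its -1 "malformed TD" return value are hypothetical, and the page-crossing test comparing full 4 KiB page numbers (0xfffff000) is an assumption about the intended mask.

```c
#include <stdint.h>

/*
 * Hypothetical helper modelled on ohci_service_td(): number of bytes a
 * General TD describes, or -1 when CBP lies more than one byte past BE
 * on the same page.
 */
static int64_t td_buffer_len(uint32_t cbp, uint32_t be)
{
    if (cbp == 0) {
        return 0;   /* Table 4-2: CBP == 0 means a zero-length packet */
    }
    if ((cbp & 0xfffff000u) != (be & 0xfffff000u)) {
        /* Two pages: rest of CBP's page plus BE's page up to BE */
        return (int64_t)(be & 0xfff) + 0x1001 - (cbp & 0xfff);
    }
    if (cbp - 1 > be) {
        return -1;  /* cbp != 0 here, so cbp - 1 cannot wrap */
    }
    return (int64_t)be - cbp + 1;   /* 0 when cbp == be + 1 */
}
```

With this rule, td2 from the qemu HEAD log (cbp=c20910 be=c2090f) is treated as a zero-length buffer rather than a reason to call ohci_die().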

The qemu ohci emulation has a regression in ohci_service_td. Version 4.2
and earlier matched the spec. (I haven't taken the time to bisect exactly
where the logic was changed.)

With a tiny OS[1] that boots and executes a test, the issue can be seen:

* OS that sends USB requests to a USB mass storage device
  but sends td.cbp = td.be + 1
* qemu 4.2
* qemu HEAD (4e66a0854)
* Actual OHCI controller (hardware)

Command line:
qemu-system-x86_64 -m 20 \
 -device pci-ohci,id=ohci \
 -drive if=none,format=raw,id=d,file=testmbr.raw \
 -device usb-storage,bus=ohci.0,drive=d \
 --trace "usb_*" --trace "ohci_*" -D qemu.log

Results are:

 qemu 4.2   | qemu HEAD  | actual HW
------------+------------+------------
 works fine | ohci_die() | works fine

Tip: if the flags "-serial pty -serial stdio" are added to the command line
the test will output USB requests like this:

Testing qemu HEAD:

> Free mem 2M ohci port2 conn FS
> setup { 80 6 0 1 0 0 8 0 }
> ED info=8 { mps=8 en=0 d=0 } tail=c20920
>   td0 c20880 nxt=c20960 f200 setup cbp=c20900 be=c20907
>   td1 c20960 nxt=c20980 f314in cbp=c20908 be=c2090f
>   td2 c20980 nxt=c20920 f308   out cbp=c20910 be=c2090f ohci20 host err
> usb stopped

And in qemu.log:

usb_ohci_iso_td_bad_cc_overrun ISO_TD start_offset=0x00c20910 > next_offset=0x00c2090f

Testing qemu 4.2:

> Free mem 2M ohci port2 conn FS
> setup { 80 6 0 1 0 0 8 0 }
> ED info=8 { mps=8 en=0 d=0 } tail=620920
>   td0 620880 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 be=620907
>   td1 620960 nxt=620980 f314in cbp=620908 be=62090f   cbp=0 be=62090f
>   td2 620980 nxt=620920 f308   out cbp=620910 be=62090f   cbp=0 be=62090f
>rx { 12 1 0 2 0 0 0 8 }
> setup { 0 5 1 0 0 0 0 0 } tx {}
> ED info=8 { mps=8 en=0 d=0 } tail=620880
>   td0 620920 nxt=620960 f200 setup cbp=620900 be=620907   cbp=0 be=620907
>   td1 620960 nxt=620880 f310in cbp=620908 be=620907   cbp=0 be=620907
> setup { 80 6 0 1 0 0 12 0 }
> ED info=80001 { mps=8 en=0 d=1 } tail=620960
>   td0 620880 nxt=6209c0 f200 setup cbp=620920 be=620927   cbp=0 be=620927
>   td1 6209c0 nxt=6209e0 f314in cbp=620928 be=620939   cbp=0 be=620939
>   td2 6209e0 nxt=620960 f308   out cbp=62093a be=620939   cbp=0 be=620939
>rx { 12 1 0 2 0 0 0 8 f4 46 1 0 0 0 1 2 3 1 }
> setup { 80 6 0 2 0 0 0 1 }
> ED info=80001 { mps=8 en=0 d=1 } tail=620880
>   td0 620960 nxt=6209a0 f200 setup cbp=620a20 be=620a27   cbp=0 be=620a27
>   td1 6209a0 nxt=6209c0 f3140004in cbp=620a28 be=620b27   cbp=620a48 be=620b27
>   td2 6209c0 nxt=620880 f308   out cbp=620b28 be=620b27   cbp=0 be=620b27
>rx { 9 2 20 0 1 1 4 c0 0 9 4 0 0 2 8 6 50 0 7 5 81 2 40 0 0 7 5 2 2 40 0 0 }
> setup { 0 9 1 0 0 0 0 0 } tx {}
> ED info=80001 { mps=8 en=0 d=1 } tail=620900
>   td0 620880 nxt=620940 f200 setup cbp=620a00 be=620a07   cbp=0 be=620a07
>   td1 620940 nxt=620900 f310in cbp=620a08 be=620a07   cbp=0 be=620a07

[1] The OS disk image has been emailed to phi...@linaro.org, m...@tls.msk.ru,
and kra...@redhat.com:

* testCbpOffBy1.img.xz
* sha256: f87baddcb86de845de12f002c698670a426affb40946025cc32694f9daa3abed

Signed-off-by: Cord Amfmgm 
---
 hw/usb/hcd-ohci.c   | 4 ++--
 hw/usb/trace-events | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
index acd6016980..86caf5e43b 100644
--- a/hw/usb/hcd-ohci.c
+++ b/hw/usb/hcd-ohci.c
@@ -941,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
 if ((td.cbp & 0xf000) != (td.be & 0xf000)) {
 len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
 } else {
-if (td.cbp > td.be) {
-trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
+if (td.cbp > td.be + 1) {
+trace_usb_ohci_td_bad_buf(td.cbp, td.be);
 ohci_die(ohci);
 return 1;
 }
diff --git a/hw/usb/trace-events b/hw/usb/trace-events
index fd7b90d70c..fe282e7876 100644
--- a/hw/usb/trace-events
+++ b/hw/usb/trace-events
@@ -29,6 +29,7 @@ usb_ohci_iso_td_data_underrun(int ret) "DataUnderrun %d"
 usb_ohci_iso_td_nak(int ret) "got NAK/STALL %d"
 usb_ohci_iso_td_bad_response(int ret) "Bad device response %d"
 usb_ohci_td_bad_pid(const char *s, uint32_t edf, uint32_t tdf) "Bad pid %s: ed.flags 0x%x td.flags 0x%x"
+usb_ohci_td_bad_buf(uint32_t cbp, uint32_t be) "Bad cbp = 0x%x > be = 0x%x"
 usb_ohci_port_attach(int

Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-20 Thread Fabiano Rosas

Steven Sistare  writes:

> Hi Peter, Hi Fabiano,
> Will you have time to review the migration guts of this series any time soon?
> In particular:
>
> [PATCH V1 05/26] migration: precreate vmstate
> [PATCH V1 06/26] migration: precreate vmstate for exec
> [PATCH V1 12/26] migration: vmstate factory object
> [PATCH V1 18/26] migration: cpr-exec-args parameter
> [PATCH V1 20/26] migration: cpr-exec mode
>

I'll get to them this week. I'm trying to make some progress with my own
code before I forget how to program. I'm also trying to find some time
to implement the device options in the migration tests so we can stop
these virtio-* breakages that have been popping up.

Re: hw/usb/hcd-ohci: Fix #1510, #303: pid not IN or OUT

2024-05-20 Thread Cord Amfmgm

On Mon, May 20, 2024 at 12:05 PM Peter Maydell 
wrote:

> On Tue, 6 Feb 2024 at 13:25, Cord Amfmgm  wrote:
> >
> > This changes the ohci validation to not assert if invalid
> > data is fed to the ohci controller. The poc suggested in
> > https://bugs.launchpad.net/qemu/+bug/1907042
> > and then migrated to bug #303 does the following to
> > feed it a SETUP pid and EndPt of 1:
> >
> > uint32_t MaxPacket = 64;
> > uint32_t TDFormat = 0;
> > uint32_t Skip = 0;
> > uint32_t Speed = 0;
> > uint32_t Direction = 0;  /* #define OHCI_TD_DIR_SETUP 0 */
> > uint32_t EndPt = 1;
> > uint32_t FuncAddress = 0;
> > ed->attr = (MaxPacket << 16) | (TDFormat << 15) | (Skip << 14)
> >| (Speed << 13) | (Direction << 11) | (EndPt << 7)
> >| FuncAddress;
> > ed->tailp = /*TDQTailPntr= */ 0;
> > ed->headp = ((/*TDQHeadPntr= */ [0]) & 0xfff0)
> >| (/* ToggleCarry= */ 0 << 1);
> > ed->next_ed = (/* NextED= */ 0 & 0xfff0)
> >
> > qemu-fuzz also caught the same issue in #1510. They are
> > both fixed by this patch.
> >
> > The if (td.cbp > td.be) logic in ohci_service_td() causes an
> > ohci_die(). My understanding of the OHCI spec 4.3.1.2
> > Table 4-2 allows td.cbp to be one byte more than td.be to
> > signal the buffer has zero length. The new check in qemu
> > appears to have been added since qemu-4.2. This patch
> > includes both fixes since they are located very close
> > together.
>
> For the "zero length buffer" case, do you have a more detailed
> pointer to the bit of the spec that says that "cbp = be + 1" is a
> valid way to specify a zero length buffer? Table 4-2 in the copy I
> have says for CurrentBufferPointer "a value of 0 indicates
> a zero-length data packet or that all bytes have been transferred",
> and the sample host OS driver function QueueGeneralRequest()
> later in the spec handles a 0 length packet by setting
>   TD->HcTD.CBP = TD->HcTD.BE = 0;
> (which our emulation's code does handle).
>

Reading the spec carefully, a CBP set to 0 should always mean the
zero-length buffer case (or that all bytes have been transferred, so the
buffer is now zero-length - the same thing).

Table 4-2 is the correct reference, and this part is clear. It's the part
you quoted. "Contains the physical address of the next memory location that
will be accessed for transfer to/from the endpoint. A value of 0 indicates
a zero-length data packet or that all bytes have been transferred."

Table 4-2 has this additional nugget that may be confusingly worded, for
BufferEnd: "Contains physical address of the last byte in the buffer for
this TD"

As you say, QueueGeneralRequest() handles a 0 length packet by setting CBP
= BE = 0.

There's a little bit more right below Table 4-2 in section 4.3.1.3.1:

"The CurrentBufferPointer value in the General TD is the address of the
data buffer that will be used for a data packet transfer to/from the
endpoint addressed by the ED. When the transfer is completed without an
error of any kind, the Host Controller advances the value of
CurrentBufferPointer by the number of bytes transferred"

I'll put it in the context of an example buffer of length 8. Perhaps this
is the easiest answer about Table 4-2's BufferEnd definition...

char buf[8];
char *CurrentBufferPointer = &buf[0];
char *BufferEnd = &buf[7]; // "address of the last byte in the buffer"
// The OHCI Host Controller then advances CurrentBufferPointer like this:
CurrentBufferPointer += 8;
// After the transfer:
// CurrentBufferPointer = &buf[8];
// BufferEnd = &buf[7];

And here's an example buffer of length 0 -- you probably already know what
I'm going to do here:

char buf[0];
char *CurrentBufferPointer = &buf[0];
char *BufferEnd = &buf[-1]; // "address of the last byte in the buffer"
// The OHCI Host Controller then advances CurrentBufferPointer like this:
CurrentBufferPointer += 0;
// After the transfer:
// CurrentBufferPointer = &buf[0];
// BufferEnd = &buf[-1];


> > @@ -936,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct
> > ohci_ed *ed)
> >  if ((td.cbp & 0xf000) != (td.be & 0xf000)) {
> >  len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
> >  } else {
> > -if (td.cbp > td.be) {
> > -trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
> > +if (td.cbp > td.be + 1) {
>
> I think this has an overflow if td.be is 0xffffffff.
>

Oops, yes. I will submit a revised patch. Since this change is protected
inside a condition if (td.cbp && td.be), I plan to rewrite it as:

if (td.cbp - 1 > td.be) { // rely on td.cbp != 0
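To make the overflow concrete: in 32-bit unsigned arithmetic td.be + 1 wraps to 0 when td.be is UINT32_MAX, so the first form rejects every non-zero CBP, while td.cbp - 1 cannot wrap once the enclosing if (td.cbp && td.be) guarantees td.cbp != 0. The two predicates below are a hypothetical side-by-side comparison, not code from hcd-ohci.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* First form: be + 1 wraps to 0 when be == UINT32_MAX */
static bool bad_buf_v1(uint32_t cbp, uint32_t be)
{
    return cbp > be + 1;
}

/* Revised form: only valid when cbp != 0, so cbp - 1 cannot wrap */
static bool bad_buf_v2(uint32_t cbp, uint32_t be)
{
    return cbp - 1 > be;
}
```

Both forms agree on ordinary values (cbp == be + 1 accepted, cbp == be + 2 rejected); they differ only at the wraparound edge.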


>
> > +trace_usb_ohci_td_bad_buf(td.cbp, td.be);
> >  ohci_die(ohci);
> >  return 1;
> >  }
>
> (On the other hand having looked at the code I'm happy
> now that having a len of 0 passed into usb_packet_addbuf()
> is OK because we already do that for the "cbp =

Re: [PATCH] hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1

2024-05-20 Thread Fabiano Rosas

Fiona Ebner  writes:

> Migration from an 8.2 or 9.0 binary to an 8.1 binary with machine
> version 8.1 can fail with:
>
>> kvm: Features 0x1c0010130afffa7 unsupported. Allowed features: 0x10179bfffe7
>> kvm: Failed to load virtio-net:virtio
>> kvm: error while loading state for instance 0x0 of device ':00:12.0/virtio-net'
>> kvm: load of migration failed: Operation not permitted
>
> The series
>
> 53da8b5a99 virtio-net: Add support for USO features
> 9da1684954 virtio-net: Add USO flags to vhost support.
> f03e0cf63b tap: Add check for USO features
> 2ab0ec3121 tap: Add USO support to tap device.
>
> only landed in QEMU 8.2, so the compatibility flags should be part of
> machine version 8.1.
>
> Moving the flags unfortunately breaks forward migration with machine
> version 8.1 from a binary without this patch to a binary with this
> patch.
>
> Fixes: 53da8b5a99 ("virtio-net: Add support for USO features")
> Signed-off-by: Fiona Ebner 

Reviewed-by: Fabiano Rosas 

I'll get to it eventually, but is this another one where just having
-device virtio-net in the command line when testing cross-version
migration would already have caught the issue?
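The first log line in the quoted report can be decoded by masking the offered feature bits with the allowed ones. This small sketch (helper names are hypothetical; it assumes the two hex values are reproduced exactly from the log) isolates bits 54, 55 and 56. In the VIRTIO specification these virtio-net bits are VIRTIO_NET_F_GUEST_USO4, VIRTIO_NET_F_GUEST_USO6 and VIRTIO_NET_F_HOST_USO, consistent with the USO series cited in the commit message.

```c
#include <stdint.h>

/* Feature bits offered by the source that the destination does not allow */
static uint64_t unsupported_features(uint64_t features, uint64_t allowed)
{
    return features & ~allowed;
}

/* Store the indices of the set bits of v in out[]; return how many */
static int set_bit_indices(uint64_t v, int out[64])
{
    int n = 0;

    for (int bit = 0; bit < 64; bit++) {
        if (v & (UINT64_C(1) << bit)) {
            out[n++] = bit;
        }
    }
    return n;
}
```

For the logged values, unsupported_features(0x1c0010130afffa7, 0x10179bfffe7) yields 0x01c0000000000000.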

Re: [PATCH 1/2] hw/usb/hcd-ohci: Fix #1510, #303: pid not IN or OUT

2024-05-20 Thread Cord Amfmgm

On Mon, May 20, 2024 at 11:55 AM Peter Maydell 
wrote:

> On Thu, 9 May 2024 at 01:30, David Hubbard  wrote:
> >
> > From: Cord Amfmgm 
> >
> > This changes the ohci validation to not assert if invalid data is fed to
> > the ohci controller. The poc in https://bugs.launchpad.net/qemu/+bug/1907042
> > and migrated to bug #303 does the following to feed it a SETUP pid (valid)
> > at an EndPt of 1 (invalid - all SETUP pids must be addressed to EndPt 0):
> >
> > uint32_t MaxPacket = 64;
> > uint32_t TDFormat = 0;
> > uint32_t Skip = 0;
> > uint32_t Speed = 0;
> > uint32_t Direction = 0;  /* #define OHCI_TD_DIR_SETUP 0 */
> > uint32_t EndPt = 1;
> > uint32_t FuncAddress = 0;
> > ed->attr = (MaxPacket << 16) | (TDFormat << 15) | (Skip << 14)
> >| (Speed << 13) | (Direction << 11) | (EndPt << 7)
> >| FuncAddress;
> > ed->tailp = /*TDQTailPntr= */ 0;
> > ed->headp = ((/*TDQHeadPntr= */ [0]) & 0xfff0)
> >| (/* ToggleCarry= */ 0 << 1);
> > ed->next_ed = (/* NextED= */ 0 & 0xfff0)
> >
> > qemu-fuzz also caught the same issue in #1510. They are both fixed by this
> > patch.
> >
> > With a tiny OS[1] that boots and executes the poc the repro shows the issue:
> >
> > * OS that sends USB requests to a USB mass storage device
> >   but sends a SETUP with EndPt = 1
> > * qemu 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.19)
> > * qemu HEAD (4e66a0854)
> > * Actual OHCI controller (hardware)
> >
> > Command line:
> > qemu-system-x86_64 -m 20 \
> >  -device pci-ohci,id=ohci \
> >  -drive if=none,format=raw,id=d,file=testmbr.raw \
> >  -device usb-storage,bus=ohci.0,drive=d \
> >  --trace "usb_*" --trace "ohci_*" -D qemu.log
> >
> > Results are:
> >
> >  qemu 6.2.0 | qemu HEAD | actual HW
> > ------------+-----------+----------------
> >  assertion  | assertion | sets stall bit
> >
> > The assertion message is:
> >
> > > qemu-system-x86_64: ../../hw/usb/core.c:744: usb_ep_get: Assertion `pid == USB_TOKEN_IN || pid == USB_TOKEN_OUT' failed.
> > > Aborted (core dumped)
> >
> > Tip: if the flags "-serial pty -serial stdio" are added to the command line
> > the poc outputs its USB requests like this:
> >
> > > Free mem 2M ohci port0 conn FS
> > > setup { 80 6 0 1 0 0 8 0 }
> > > ED info=8 { mps=8 en=0 d=0 } tail=c20920
> > >   td0 c20880 nxt=c20960 f200 setup cbp=c20900 be=c20907   cbp=0 be=c20907
> > >   td1 c20960 nxt=c20980 f314in cbp=c20908 be=c2090f   cbp=0 be=c2090f
> > >   td2 c20980 nxt=c20920 f308   out cbp=0 be=0   cbp=0 be=0
> > >rx { 12 1 0 2 0 0 0 8 }
> > > setup { 0 5 1 0 0 0 0 0 } tx {}
> > > ED info=8 { mps=8 en=0 d=0 } tail=c20880
> > >   td0 c20920 nxt=c20960 f200 setup cbp=c20900 be=c20907   cbp=0 be=c20907
> > >   td1 c20960 nxt=c20880 f310in cbp=0 be=0   cbp=0 be=0
> > > setup { 80 6 0 1 0 0 12 0 }
> > > ED info=80081 { mps=8 en=0 d=1 } tail=c20960
> > >   td0 c20880 nxt=c209c0 f200 setup cbp=c20920 be=c20927
> > >   td1 c209c0 nxt=c209e0 f314in cbp=c20928 be=c20939
> > >   td2 c209e0 nxt=c20960 f308   out cbp=0 be=0qemu-system-x86_64: ../../hw/usb/core.c:744: usb_ep_get: Assertion `pid == USB_TOKEN_IN || pid == USB_TOKEN_OUT' failed.
> > > Aborted (core dumped)
> >
> > [1] The OS disk image has been emailed to phi...@linaro.org, m...@tls.msk.ru,
> > and kra...@redhat.com:
> >
> > * testBadSetup.img.xz
> > * sha256: 045b43f4396de02b149518358bf8025d5ba11091e86458875339fc649e6e5ac6
> >
> > Signed-off-by: Cord Amfmgm 
> > ---
> >  hw/usb/hcd-ohci.c   | 5 +
> >  hw/usb/trace-events | 1 +
> >  2 files changed, 6 insertions(+)
> >
> > diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> > index fc8fc91a1d..acd6016980 100644
> > --- a/hw/usb/hcd-ohci.c
> > +++ b/hw/usb/hcd-ohci.c
> > @@ -927,6 +927,11 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
> >  case OHCI_TD_DIR_SETUP:
> >  str = "setup";
> >  pid = USB_TOKEN_SETUP;
> > +if (OHCI_BM(ed->flags, ED_EN) > 0) {  /* setup only allowed to ep 0 */
> > +trace_usb_ohci_td_bad_pid(str, ed->flags, td.flags);
> > +ohci_die(ohci);
> > +return 1;
> > +}
> >  break;
> >  default:
> >  trace_usb_ohci_td_bad_direction(dir);
> > diff --git a/hw/usb/trace-events b/hw/usb/trace-events
> > index ed7dc210d3..fd7b90d70c 100644
> > --- a/hw/usb/trace-events
> > +++ b/hw/usb/trace-events
> > @@ -28,6 +28,7 @@ usb_ohci_iso_td_data_overrun(int ret, ssize_t len) "DataOverrun %d > %zu"
> >  usb_ohci_iso_td_data_underrun(int ret) "DataUnderrun %d"
> >  usb_ohci_iso_td_nak(int ret) "got NAK/STALL %d"
> >  usb_ohci_iso_td_bad_response(int ret) "Bad device response %d"
> > +usb_ohci_td_bad_pid(const char *s, uint32_t edf, uint32_t tdf) "Bad pid %s: ed.flags 0x%x td.flags 0x%x"
> >  usb_ohci_port_attach(int index) "port #%d"
> >
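The proof of concept above packs ED dword 0 by hand. Decoding it the same way shows the condition the patch adds for SETUP TDs (OHCI_BM(ed->flags, ED_EN) > 0). This is a sketch with hypothetical helper names; the field positions are only those implied by the shifts in the proof of concept (EndPt in bits 7..10), and the SETUP code 0 follows the OHCI_TD_DIR_SETUP comment quoted there.

```c
#include <stdbool.h>
#include <stdint.h>

#define TD_DIR_SETUP 0u   /* SETUP direction code, per the poc's comment */

/* Pack ED dword 0 the same way as the proof of concept */
static uint32_t ed_attr(uint32_t max_packet, uint32_t format, uint32_t skip,
                        uint32_t speed, uint32_t direction, uint32_t endpt,
                        uint32_t func_address)
{
    return (max_packet << 16) | (format << 15) | (skip << 14) |
           (speed << 13) | (direction << 11) | (endpt << 7) | func_address;
}

/* EndPt occupies bits 7..10 in that layout */
static uint32_t ed_endpoint(uint32_t attr)
{
    return (attr >> 7) & 0xf;
}

/* The condition the patch rejects: a SETUP TD on an ED whose EndPt != 0 */
static bool setup_on_nonzero_ep(uint32_t attr, uint32_t td_dir)
{
    return td_dir == TD_DIR_SETUP && ed_endpoint(attr) != 0;
}
```

For the poc's values (MaxPacket 64, EndPt 1) the packed word is 0x400080, and the check fires; the same ED with EndPt 0 passes.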

[PATCH] hw/loongarch/virt: Fix FDT memory node address width

2024-05-20 Thread Jiaxun Yang

The upper 32 bits of the memory node base and size were dropped in the
qemu_fdt_setprop_cells() call.

Signed-off-by: Jiaxun Yang 
---
This should be backported to stable; otherwise DT boot is completely broken.
---
 hw/loongarch/virt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index f0640d2d8035..f97626bacf65 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -463,7 +463,8 @@ static void fdt_add_memory_node(MachineState *ms,
 char *nodename = g_strdup_printf("/memory@%" PRIx64, base);
 
 qemu_fdt_add_subnode(ms->fdt, nodename);
-qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", 0, base, 0, size);
+qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", base >> 32, base,
+   size >> 32, size);
 qemu_fdt_setprop_string(ms->fdt, nodename, "device_type", "memory");
 
 if (ms->numa_state && ms->numa_state->num_nodes) {

---
base-commit: 85ef20f1673feaa083f4acab8cf054df77b0dbed
change-id: 20240520-loongarch-fdt-memnode-e36c01ae9b6e

Best regards,
-- 
Jiaxun Yang
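The fix passes base >> 32, base and size >> 32, size because, as I understand qemu_fdt_setprop_cells(), each argument becomes one 32-bit cell, so a 64-bit value passed once keeps only its low half. Below is a hypothetical helper showing the same high/low split; it assumes two-cell #address-cells/#size-cells for the memory node, and the example addresses are made up.

```c
#include <stdint.h>

/*
 * With #address-cells = <2> and #size-cells = <2>, each 64-bit value in
 * "reg" is written as two 32-bit cells: high word first, then low word.
 */
static void split_u64_cells(uint64_t value, uint32_t cells[2])
{
    cells[0] = (uint32_t)(value >> 32);   /* high cell */
    cells[1] = (uint32_t)value;           /* low cell */
}
```

A base of 0x100000000 becomes the cells <1 0>; truncated to a single 32-bit cell it would read as 0.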

Fwd: spapr-vlan hotplug

2024-05-20 Thread Marcos Jean Sampaio

---------- Forwarded message ---------
From: Marcos Jean Sampaio
Date: Sat, May 18, 2024 at 18:02
Subject: spapr-vlan hotplug
To: , 

Hello guys,

First, I would like to thank you for making this software possible! Many
thanks!

I just installed and ran AIX and PowerHA on QEMU successfully using virsh
and virt-manager, and I shared my experience in YouTube videos and on my
GitHub. It worked pretty well after some adjustments, but something I still
can't do is hotplug a network interface. I've tried both virtio and
spapr-vlan, but in neither case is the new device hotplugged. With virtio,
the device can be added, but it isn't recognized after cfgmgr. With
spapr-vlan, the device is only recognized and active after the next boot.

virsh attach-device --live aix_7200-04-02-2027_node01 netdev.xml
error: Failed to attach device from netdev.xml
error: internal error: unable to execute QEMU command 'device_add': Device
'spapr-vlan' can't go on PCI bus

netdev.xml content

I also tried add it using qemu-monitor and have the following message:

virsh qemu-monitor-command aix_7200-04-02-2027_node01 --hmp "device_add
spapr-vlan,netdev=hostnet5,id=net5,mac=56:44:45:30:31:55,reg=0x5000"
Error: Bus 'spapr-vio' does not support hotplugging

Is there another way to do this, or does hotplug not work for network
interfaces? For disks it works very well. My environment details are below.

My videos
https://www.youtube.com/playlist?list=PLWNnbCzUTMSY6c6rjKtGuSAzHCPONExv2

https://github.com/mjsamp/AIX-on-qemu-ppc64

Environment
Lenovo ThinkPad E480 Intel® Core™ i3-8130U CPU @ 2.20GHz × 4 8G Mem
XrayDisk 240GB SSD
Ubuntu 20.04.6 LTS (Focal Fossa) 64-bit
Kernel Linux 5.4.0-181-generic x86_64
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.28)
libvirt QEMU Driver 6.0.0-0ubuntu8.20
AIX aix_7200-04-02-2027_1of2_072020.iso

Regards,

Marcos Jean Sampaio

Re: [PATCH] hw/riscv/virt: Add hotplugging and virtio-md-pci support

2024-05-20 Thread Daniel Henrique Barboza





On 5/20/24 15:51, Björn Töpel wrote:

Daniel/David,

Daniel Henrique Barboza  writes:


On 5/18/24 16:50, David Hildenbrand wrote:


Hi,



diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
index 4fdb66052587..16c2bdbfe6b6 100644
--- a/hw/riscv/virt.c
+++ b/hw/riscv/virt.c
@@ -53,6 +53,8 @@
    #include "hw/pci-host/gpex.h"
    #include "hw/display/ramfb.h"
    #include "hw/acpi/aml-build.h"
+#include "hw/mem/memory-device.h"
+#include "hw/virtio/virtio-mem-pci.h"
    #include "qapi/qapi-visit-common.h"
    #include "hw/virtio/virtio-iommu.h"
@@ -1407,6 +1409,7 @@ static void virt_machine_init(MachineState *machine)
    DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
    int i, base_hartid, hart_count;
    int socket_count = riscv_socket_count(machine);
+    hwaddr device_memory_base, device_memory_size;
    /* Check socket count limit */
    if (VIRT_SOCKETS_MAX < socket_count) {
@@ -1553,6 +1556,25 @@ static void virt_machine_init(MachineState *machine)
    memory_region_add_subregion(system_memory, memmap[VIRT_MROM].base,
    mask_rom);
+    device_memory_base = ROUND_UP(s->memmap[VIRT_DRAM].base + machine->ram_size,
+  GiB);
+    device_memory_size = machine->maxram_size - machine->ram_size;
+
+    if (riscv_is_32bit(&s->soc[0])) {
+    hwaddr memtop = device_memory_base + ROUND_UP(device_memory_size, GiB);
+
+    if (memtop > UINT32_MAX) {
+    error_report("Memory exceeds 32-bit limit by %lu bytes",
+ memtop - UINT32_MAX);
+    exit(EXIT_FAILURE);
+    }
+    }
+
+    if (device_memory_size > 0) {
+    machine_memory_devices_init(machine, device_memory_base,
+    device_memory_size);
+    }
+
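For reference, the layout arithmetic in the patch can be sketched outside QEMU. GiB, MiB and ROUND_UP here are local stand-ins for QEMU's macros, compute_device_mem_layout() is a hypothetical helper, and the 0x80000000 DRAM base used in the example is an assumption about the RISC-V virt memory map. Like the patch, it compares the exclusive end address against UINT32_MAX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for QEMU's size and rounding macros */
#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)
#define ROUND_UP(x, a) (((x) + (a) - 1) / (a) * (a))

struct device_mem_layout {
    uint64_t base;      /* start of the hotpluggable device memory area */
    uint64_t size;      /* maxram_size - ram_size */
    bool fits_32bit;    /* false where the patch would exit() on RV32 */
};

/* Hypothetical helper mirroring the arithmetic in the patch above */
static struct device_mem_layout
compute_device_mem_layout(uint64_t dram_base, uint64_t ram_size,
                          uint64_t maxram_size)
{
    struct device_mem_layout l;
    uint64_t memtop;

    l.base = ROUND_UP(dram_base + ram_size, GiB);
    l.size = maxram_size - ram_size;
    /* Same test as the patch: end of the GiB-rounded area vs UINT32_MAX */
    memtop = l.base + ROUND_UP(l.size, GiB);
    l.fits_32bit = !(memtop > UINT32_MAX);
    return l;
}
```

With 2 GiB of RAM and maxram of 4 GiB, the device area starts at 0x100000000 and spans 2 GiB; with no hotplug headroom the size is 0 and the 32-bit check passes.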


I think we need a design discussion before proceeding here. You're allocating all
available memory as a memory device area, but in theory we might also support
pc-dimm hotplug (the equivalent of adding physical RAM DIMMs to the board) in
the future too. If you're not familiar with this feature, you can check out the
docs in [1].


Note that DIMMs are memory devices as well. You can plug both ACPI-based memory
devices (DIMM, NVDIMM) and virtio-based memory devices (virtio-mem, virtio-pmem)
into the memory device area.



As an example, the 'virt' ARM board (hw/arm/virt.c) reserves a space for this
type of hotplug by checking how much 'ram_slots' we're allocating for it:

device_memory_size = ms->maxram_size - ms->ram_size + ms->ram_slots * GiB;



Note that we increased the region size to be able to fit most requests even if
the alignment of memory devices is weird. See below.

In sane setups, this is usually not required (adding a single additional GB for
some flexibility might be good enough).


Other boards do the same with ms->ram_slots. We should consider doing it as well,
now, even if we're not yet at the point of supporting pc-dimm hotplug, to avoid
having to change the memory layout later down the road and breaking existing
setups.

If we want to copy the ARM board, ram_slots is capped at ACPI_MAX_RAM_SLOTS (256).
Each RAM slot is considered to be a 1 GiB DIMM, i.e. we would reserve 256 GiB for
them.


This only reserves some *additional* space to fix up weird alignment of memory
devices, *not* the actual space for these devices.

We don't consider each DIMM to be 1 GiB in size, but add an additional 1 GiB in
case we have to align DIMMs in physical address space.

I *think* this dates back to old x86 handling where we aligned the address of
each DIMM to a 1 GiB boundary. So if you had plugged two 128 MiB DIMMs, you'd
have required more than 256 MiB of space in the area after aligning inside the
memory device area.



Thanks for the explanation. I missed the part where the ram_slots were being
used just to solve potential alignment issues and pc-dimms could occupy the same
space being allocated via machine_memory_devices_init().

This patch isn't far off then. If we take care to avoid plugging unaligned 
memory
we might not even need this spare area.
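David's two-DIMM example and the per-slot slack behind the ARM ram_slots * GiB term can be checked with a toy placement model. It assumes DIMMs are placed back to back with each start rounded up to the alignment; this is a simplification for illustration, not QEMU's memory-device address allocator, and place_dimms() and alignment_slack() are hypothetical helpers.

```c
#include <stdint.h>

#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Round x up to the next multiple of align */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) / align * align;
}

/*
 * Toy model: place n DIMMs back to back starting at area_base, rounding
 * each start address up to align. Returns the end address.
 */
static uint64_t place_dimms(uint64_t area_base, const uint64_t *sizes, int n,
                            uint64_t align)
{
    uint64_t addr = area_base;

    for (int i = 0; i < n; i++) {
        addr = align_up(addr, align) + sizes[i];
    }
    return addr;
}

/* Each DIMM wastes less than one alignment unit, so this bounds the slack */
static uint64_t alignment_slack(uint64_t ram_slots, uint64_t align)
{
    return ram_slots * align;
}
```

Two 128 MiB DIMMs end at 1 GiB + 128 MiB with 1 GiB alignment but at exactly 256 MiB with 2 MiB alignment, and 256 slots at 2 MiB alignment need at most 512 MiB of slack.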


I'm a bit lost here, so please bear with me. We don't require the 1 GiB
alignment on RV AFAIU. I'm having a hard time figuring out what's missing
in my patch.


Forget about the 1 GiB size. This is something that we won't need to deal with
because we don't align to 1 GiB.

Let's say for example that we want to support pc-dimm hotplug of 256 slots like the
'virt' ARM machine does. Let's also say that we will allow users to hotplug any
DIMM size they want, taking care of any alignment issues by ourselves.

In hw/riscv/boot.c I see that our alignment sizes are 4 MiB for 32 bits and 2 MiB for
64 bits. Setting 32 bits aside for now, let's say that our alignment is 2 MiB.

So, in a worst-case scenario, a user could hotplug 256 slots, all of them unaligned,
and we would need to align each one of them by adding 2 MiB. So, to account for
this alignment

RE: [PATCH v2 3/4] target/hexagon: idef-parser fix leak of init_list

2024-05-20 Thread ltaylorsimpson




> -----Original Message-----
> From: Anton Johansson
> Sent: Friday, May 10, 2024 9:53 AM
> To: qemu-devel@nongnu.org
> Cc: a...@rev.ng; ltaylorsimp...@gmail.com; bc...@quicinc.com
> Subject: [PATCH v2 3/4] target/hexagon: idef-parser fix leak of init_list
>
> gen_inst_init_args() is called for instructions using a predicate as an
> rvalue. Upon first call, the list of arguments which might need
> initialization, init_list, is freed to indicate that they have been
> processed. For instructions without an rvalue predicate,
> gen_inst_init_args() isn't called and init_list will never be freed.
> 
> Free init_list from free_instruction() if it hasn't already been freed.
> A comment in free_instruction is also updated.
> 
> Signed-off-by: Anton Johansson 

Reviewed-by: Taylor Simpson
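The fix described above follows a common C ownership pattern: whichever path releases the list also clears the pointer, so an unconditional cleanup later is leak-free and cannot double-free. A minimal sketch with hypothetical names (struct instruction, release_list(), gen_init_args_sketch(), free_instruction_sketch()); the real idef-parser code may use different containers.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the parser's per-instruction state */
struct instruction {
    int *init_list;     /* arguments that may need initialization */
};

static int frees;       /* number of lists actually released */

/* Free the list once and clear the pointer so later calls are no-ops */
static void release_list(int **list)
{
    if (*list != NULL) {
        free(*list);
        frees++;
        *list = NULL;
    }
}

/* Only reached for instructions that use a predicate as an rvalue */
static void gen_init_args_sketch(struct instruction *insn)
{
    /* ... emit initialization code for each pending argument ... */
    release_list(&insn->init_list);
}

/* Always reached: frees init_list if gen_init_args_sketch() never ran */
static void free_instruction_sketch(struct instruction *insn)
{
    release_list(&insn->init_list);
}
```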

RE: [PATCH v2 4/4] target/hexagon: idef-parser simplify predicate init

2024-05-20 Thread ltaylorsimpson




> -----Original Message-----
> From: Anton Johansson
> Sent: Friday, May 10, 2024 9:53 AM
> To: qemu-devel@nongnu.org
> Cc: a...@rev.ng; ltaylorsimp...@gmail.com; bc...@quicinc.com
> Subject: [PATCH v2 4/4] target/hexagon: idef-parser simplify predicate init
>
> Only predicate instruction arguments need to be initialized by idef-parser.
> This commit removes registers from the init_list and simplifies
> gen_inst_init_args() slightly.
> 
> Signed-off-by: Anton Johansson 

Reviewed-by: Taylor Simpson

RE: [PATCH] Hexagon: fix HVX store new

2024-05-20 Thread ltaylorsimpson




> -----Original Message-----
> From: Matheus Tavares Bernardino
> Sent: Monday, May 20, 2024 10:53 AM
> To: qemu-devel@nongnu.org
> Cc: ltaylorsimp...@gmail.com; sidn...@quicinc.com; bc...@quicinc.com;
> richard.hender...@linaro.org; a...@rev.ng; a...@rev.ng
> Subject: [PATCH] Hexagon: fix HVX store new
>
> At 09a7e7db0f (Hexagon (target/hexagon) Remove uses of
> op_regs_generated.h.inc, 2024-03-06), we've changed the logic of
> check_new_value() to use the new pre-calculated
> packet->insn[...].dest_idx instead of calculating the index on the fly
> using opcode_reginfo[...]. The dest_idx index is calculated roughly like
> the following:
>
>     for reg in iset[tag]["syntax"]:
>         if reg.is_written():
>             dest_idx = regno
>             break
>
> Thus, we take the first register that is writable. Before that, however,
> we also used to follow an alphabetical order on the register type: 'd',
> 'e', 'x', and 'y'. No longer following that makes us select the wrong
> register index, and the HVX store new instruction does not update the
> memory as expected.
> 
> Signed-off-by: Matheus Tavares Bernardino 

Reviewed-by: Taylor Simpson
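The difference between the two selection rules described in the quoted commit message can be modelled in a few lines. This is not the Hexagon generator code (which is Python); it assumes a hypothetical operand list in syntax order and only illustrates how "first written register" and "first written register by type order d, e, x, y" can pick different indices.

```c
#include <stdbool.h>
#include <stddef.h>

struct operand {
    char type;      /* register letter from the instruction syntax */
    bool written;   /* true if the instruction writes this operand */
};

/* Rule described for the new code: first written operand in syntax order */
static int first_written(const struct operand *ops, int n)
{
    for (int i = 0; i < n; i++) {
        if (ops[i].written) {
            return i;
        }
    }
    return -1;
}

/* Rule described for the old code: try register types in order d, e, x, y */
static int by_type_order(const struct operand *ops, int n)
{
    static const char order[] = "dexy";

    for (size_t t = 0; order[t] != '\0'; t++) {
        for (int i = 0; i < n; i++) {
            if (ops[i].written && ops[i].type == order[t]) {
                return i;
            }
        }
    }
    return -1;
}
```

For a hypothetical operand list { y (written), x (written), t (read) }, the first rule picks index 0 and the second picks index 1.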

Re: [PATCH] hw/riscv/virt: Add hotplugging and virtio-md-pci support

2024-05-20 Thread Daniel Henrique Barboza





On 5/20/24 15:33, Björn Töpel wrote:

Daniel,

Thanks for taking a look!

Daniel Henrique Barboza  writes:


Hi Björn,

On 5/14/24 08:06, Björn Töpel wrote:

From: Björn Töpel 

Virtio-based memory devices allow for dynamic resizing of virtual
machine memory, and require proper hotplugging (add/remove) support
to work.

Enable virtio-md-pci with the corresponding missing hotplugging
callbacks for the RISC-V "virt" machine.

Signed-off-by: Björn Töpel 
---
This is basic support for MHP that works with DT. There is some minor
ACPI SRAT plumbing in there as well. Ideally we'd like proper ACPI MHP
support as well. I have a branch [1], where I've applied this patch,
plus ACPI GED/PC-DIMM MHP support on top of Sunil's QEMU branch
(contains some ACPI DSDT additions) [2], for the curious/brave ones.
However, the ACPI MHP support is not testable on upstream Linux
yet (ACPI AIA support, and ACPI NUMA SRAT series are ongoing).

I'll follow-up with proper ACPI GED/PC-DIMM MHP patches, once the
dependencies land (Linux kernel and QEMU).

I'll post the Linux MHP/virtio-mem v2 patches later this week!


Cheers,
Björn

[1] https://github.com/bjoto/qemu/commits/virtio-mem-pc-dimm-mhp-acpi/
[2] https://lore.kernel.org/linux-riscv/20240501121742.1215792-1-suni...@ventanamicro.com/
---
   hw/riscv/Kconfig           |  2 ++
   hw/riscv/virt-acpi-build.c |  7 +
   hw/riscv/virt.c            | 64 +-
   hw/virtio/virtio-mem.c     |  2 +-
   4 files changed, 73 insertions(+), 2 deletions(-)

diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
index a2030e3a6ff0..08f82dbb681a 100644
--- a/hw/riscv/Kconfig
+++ b/hw/riscv/Kconfig
@@ -56,6 +56,8 @@ config RISCV_VIRT
   select PLATFORM_BUS
   select ACPI
   select ACPI_PCI
+    select VIRTIO_MEM_SUPPORTED
+    select VIRTIO_PMEM_SUPPORTED
   
   config SHAKTI_C
   bool
diff --git a/hw/riscv/virt-acpi-build.c b/hw/riscv/virt-acpi-build.c
index 0925528160f8..6dc3baa9ec86 100644
--- a/hw/riscv/virt-acpi-build.c
+++ b/hw/riscv/virt-acpi-build.c
@@ -610,6 +610,13 @@ build_srat(GArray *table_data, BIOSLinker *linker, RISCVVirtState *vms)
   }
   }
   
+    if (ms->device_memory) {
+        build_srat_memory(table_data, ms->device_memory->base,
+                          memory_region_size(&ms->device_memory->mr),
+                          ms->numa_state->num_nodes - 1,
+                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
+    }
+
   acpi_table_end(linker, &table);


When the time comes, I believe we'll want this chunk in a separate ACPI patch.


Hmm, I first thought about adding this to the ACPI MHP series, but then
realized that virtio-mem relies on SRAT for ACPI boots (again -- RISC-V
Linux does not support that upstream yet...).

Do you mean that you'd prefer this chunk in a separate patch?


TBH I wouldn't mind keeping this ACPI chunk here, but I reckon that ACPI
subsystem review is usually done separately, with a different set of people
reviewing it and so on.

We might as well keep it here for now. If more ACPI changes end up being done
(e.g. ACPI unit test changes), then a separate ACPI patch makes more sense.


Thanks,

Daniel





Björn
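For readers less familiar with SRAT: the entry `build_srat_memory()` emits for the device-memory window is an ACPI Memory Affinity Structure (type 1, 40 bytes). The sketch below serializes one such entry into a little-endian buffer, following the field layout in the ACPI specification; it is an illustration, not QEMU's implementation, and the helper and macro names are hypothetical. The two flag bits correspond to what QEMU calls `MEM_AFFINITY_ENABLED` (bit 0) and `MEM_AFFINITY_HOTPLUGGABLE` (bit 1), and the patch assigns the window to the last NUMA node (`num_nodes - 1`).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    SRAT_MEM_AFFINITY_TYPE = 1,   /* Memory Affinity Structure */
    SRAT_MEM_AFFINITY_LEN  = 40,  /* total length in bytes */
};

#define AFF_ENABLED      (1u << 0)
#define AFF_HOTPLUGGABLE (1u << 1)

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* A 64-bit value stored as low dword followed by high dword. */
static void put_le64(uint8_t *p, uint64_t v)
{
    put_le32(p, (uint32_t)v);
    put_le32(p + 4, (uint32_t)(v >> 32));
}

/* Fill out[0..39] with one Memory Affinity Structure. */
static void srat_mem_affinity(uint8_t out[SRAT_MEM_AFFINITY_LEN], uint32_t node,
                              uint64_t base, uint64_t len, uint32_t flags)
{
    memset(out, 0, SRAT_MEM_AFFINITY_LEN);  /* all reserved fields are zero */
    out[0] = SRAT_MEM_AFFINITY_TYPE;
    out[1] = SRAT_MEM_AFFINITY_LEN;
    put_le32(out + 2, node);                /* proximity domain */
    put_le64(out + 8, base);                /* base address (low, high) */
    put_le64(out + 16, len);                /* length (low, high) */
    put_le32(out + 28, flags);              /* offset 24 is reserved */
}
```

A hotpluggable 1 GiB window at 4 GiB on node 1 would be encoded with `srat_mem_affinity(buf, 1, 0x100000000ULL, 0x40000000ULL, AFF_ENABLED | AFF_HOTPLUGGABLE)`.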

Re: [PATCH] hw/riscv/virt: Add hotplugging and virtio-md-pci support

2024-05-20 Thread Björn Töpel

Daniel/David,

Daniel Henrique Barboza  writes:

> On 5/18/24 16:50, David Hildenbrand wrote:
>> 
>> Hi,
>> 
>> 
 diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
 index 4fdb66052587..16c2bdbfe6b6 100644
 --- a/hw/riscv/virt.c
 +++ b/hw/riscv/virt.c
 @@ -53,6 +53,8 @@
    #include "hw/pci-host/gpex.h"
    #include "hw/display/ramfb.h"
    #include "hw/acpi/aml-build.h"
 +#include "hw/mem/memory-device.h"
 +#include "hw/virtio/virtio-mem-pci.h"
    #include "qapi/qapi-visit-common.h"
    #include "hw/virtio/virtio-iommu.h"
 @@ -1407,6 +1409,7 @@ static void virt_machine_init(MachineState *machine)
    DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
    int i, base_hartid, hart_count;
    int socket_count = riscv_socket_count(machine);
 +    hwaddr device_memory_base, device_memory_size;
    /* Check socket count limit */
    if (VIRT_SOCKETS_MAX < socket_count) {
 @@ -1553,6 +1556,25 @@ static void virt_machine_init(MachineState *machine)
    memory_region_add_subregion(system_memory, memmap[VIRT_MROM].base,
    mask_rom);
 +    device_memory_base = ROUND_UP(s->memmap[VIRT_DRAM].base + machine->ram_size,
 +                                  GiB);
 +    device_memory_size = machine->maxram_size - machine->ram_size;
 +
 +    if (riscv_is_32bit(&s->soc[0])) {
 +        hwaddr memtop = device_memory_base + ROUND_UP(device_memory_size, GiB);
 +
 +        if (memtop > UINT32_MAX) {
 +            error_report("Memory exceeds 32-bit limit by %lu bytes",
 +                         memtop - UINT32_MAX);
 +            exit(EXIT_FAILURE);
 +        }
 +    }
 +
 +    if (device_memory_size > 0) {
 +        machine_memory_devices_init(machine, device_memory_base,
 +                                    device_memory_size);
 +    }
 +
>>>
>>> I think we need a design discussion before proceeding here. You're allocating
>>> all available memory as a memory device area, but in theory we might also
>>> support pc-dimm hotplugs (which would be the equivalent of adding physical RAM
>>> dimms to the board) in the future too. If you're not familiar with this
>>> feature, you can check out the docs in [1].
>> 
>> Note that DIMMs are memory devices as well. You can plug both ACPI-based
>> memory devices (DIMM, NVDIMM) and virtio-based memory devices (virtio-mem,
>> virtio-pmem) into the memory device area.
>> 
>>>
>>> As an example, the 'virt' ARM board (hw/arm/virt.c) reserves a space for
>>> this type of hotplug by checking how much 'ram_slots' we're allocating for it:
>>>
>>> device_memory_size = ms->maxram_size - ms->ram_size + ms->ram_slots * GiB;
>>>
>> 
>> Note that we increased the region size to be able to fit most requests even
>> if the alignment of memory devices is weird. See below.
>> 
>> In sane setups, this is usually not required (adding a single additional GB
>> for some flexibility might be good enough).
>> 
>>> Other boards do the same with ms->ram_slots. We should consider doing it as
>>> well, now, even if we're not yet at the point of supporting pc-dimm hotplug,
>>> to avoid having to change the memory layout later down the road and breaking
>>> existing setups.
>>>
>>> If we want to copy the ARM board, ram_slots is capped to ACPI_MAX_RAM_SLOTS
>>> (256). Each RAM slot is considered to be a 1GiB dimm, i.e. we would reserve
>>> 256GiB for them.
>> 
>> This only reserves some *additional* space to fix up weird alignment of
>> memory devices, *not* the actual space for these devices.
>> 
>> We don't consider each DIMM to be 1 GiB in size, but add an additional 1 GiB
>> in case we have to align DIMMs in physical address space.
>> 
>> I *think* this dates back to old x86 handling where we aligned the address
>> of each DIMM to be at a 1 GiB boundary. So if you had plugged two 128 MiB
>> DIMMs, you'd have required more than 256 MiB of space in the area after
>> aligning inside the memory device area.
>> 
>
> Thanks for the explanation. I missed the part where the ram_slots were being
> used just to solve potential alignment issues and pc-dimms could occupy the
> same space being allocated via machine_memory_devices_init().
>
> This patch isn't far off then. If we take care to avoid plugging unaligned
> memory we might not even need this spare area.

I'm a bit lost here, so please bear with me. We don't require the 1 GiB
alignment on RV AFAIU. I'm having a hard time figuring out what's missing
in my patch.

[...]

>>> I see that David Hildenbrand is also CCed in the patch so he'll let us know
>>> if I'm out of line with what I'm asking.
>> 
>> Supporting PC-DIMMs might be required at some point when dealing with OSes
>> that don't support virtio-mem and friends.

...and also for testing the PC-DIMM ACPI patching path.
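The layout arithmetic discussed in this thread can be sketched in standalone C. `device_memory_window()` follows the patch (the window starts at the first GiB boundary after DRAM and spans `maxram_size - ram_size`), with an optional `ram_slots * GiB` of slack as on the ARM `virt` board. `fits_rv32()` mirrors the patch's 32-bit check, and `aligned_footprint()` illustrates David's point that aligning each DIMM to 1 GiB makes two 128 MiB DIMMs need more than 256 MiB. All names here are hypothetical, and the 0x80000000 DRAM base used in the usage note is an assumed example value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)
#define ROUND_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

typedef struct {
    uint64_t base;
    uint64_t size;
} MemWindow;

/* Device-memory window placed after DRAM, plus 1 GiB of alignment slack per slot. */
static MemWindow device_memory_window(uint64_t dram_base, uint64_t ram_size,
                                      uint64_t maxram_size, unsigned ram_slots)
{
    MemWindow w;
    w.base = ROUND_UP(dram_base + ram_size, GiB);
    w.size = (maxram_size - ram_size) + (uint64_t)ram_slots * GiB;
    return w;
}

/* Same comparison as the patch: the GiB-rounded window end must not exceed UINT32_MAX. */
static bool fits_rv32(MemWindow w)
{
    uint64_t memtop = w.base + ROUND_UP(w.size, GiB);
    return memtop <= UINT32_MAX;
}

/* End offset of the last DIMM when each DIMM starts on an 'align' boundary. */
static uint64_t aligned_footprint(const uint64_t *sizes, unsigned n, uint64_t align)
{
    uint64_t addr = 0;
    for (unsigned i = 0; i < n; i++) {
        addr = ROUND_UP(addr, align);
        addr += sizes[i];
    }
    return addr;
}
```

For example, with DRAM at 0x80000000, `ram_size` = 2 GiB and `maxram_size` = 4 GiB, the window starts at 0x100000000 and is 2 GiB long; and two 128 MiB DIMMs aligned to 1 GiB end at 1 GiB + 128 MiB rather than at 256 MiB.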

Re: [PATCH] physmem: allow debug writes to MMIO regions

2024-05-20 Thread Perry Hung


Philippe, Peter,

Thank you for the comments. I am not even sure what the semantics of putting a
breakpoint or watchpoint on device regions are supposed to be. I would imagine
it is architecture-specific whether this is even allowed.

It appears, for example, that armv8-a allows watchpoints to be set on any type
of memory, whereas armv7-a prohibits watchpoints on Device or Strongly-ordered
memory that might be accessed by instructions multiple times (e.g. LDM and LDC
instructions).

What is the current behavior for QEMU, and what should breakpoints/watchpoints
do when placed on IO memory?


-perry

On 5/20/24 10:22 AM, Peter Maydell wrote:

On Wed, 15 May 2024 at 13:49, Philippe Mathieu-Daudé  wrote:

Hi Perry,

On 14/5/24 01:33, Perry Hung wrote:

Writes from GDB to memory-mapped IO regions are currently silently
dropped. cpu_memory_rw_debug() calls address_space_write_rom(), which
calls address_space_write_rom_internal(), which ignores all non-ram/rom
regions.

Add a check for MMIO regions and direct those to address_space_rw()
instead.


Reported-by: Andreas Rasmusson 
BugLink: https://bugs.launchpad.net/qemu/+bug/1625216


Resolves: https://gitlab.com/qemu-project/qemu/-/issues/213
Signed-off-by: Perry Hung 
---
   system/physmem.c | 5 -
   1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/system/physmem.c b/system/physmem.c
index 342b7a8fd4..013cdd2ab1 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -3508,7 +3508,10 @@ int cpu_memory_rw_debug(CPUState *cpu, vaddr addr,
         if (l > len)
             l = len;
         phys_addr += (addr & ~TARGET_PAGE_MASK);
-        if (is_write) {
+        if (cpu_physical_memory_is_io(phys_addr)) {
+            res = address_space_rw(cpu->cpu_ases[asidx].as, phys_addr, attrs,
+                                   buf, l, is_write);
+        } else if (is_write) {
             res = address_space_write_rom(cpu->cpu_ases[asidx].as, phys_addr,
                                           attrs, buf, l);
         } else {

The other option is to make address_space_write_rom_internal()
also write to devices...


I wonder if we shouldn't be safer with a preliminary patch
adding a 'can_do_io' boolean argument to cpu_memory_rw_debug()
(updating the call sites), then this patch would become:

  if (can_do_io && cpu_physical_memory_is_io(phys_addr)) {

One of my worries, for example, is that if someone accidentally inserts
a breakpoint at an I/O address, the device might change its
state and return MEMTX_OK, which is confusing.

You can definitely do some silly things if we remove this
restriction.

On the other hand if you're using gdb as a debugger on real
(bare metal) hardware does anything stop you doing that?

-- PMM
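The dispatch added to `cpu_memory_rw_debug()`, combined with Philippe's suggested `can_do_io` guard, can be modelled as a small decision function. The enum and function names are hypothetical; the real code calls `address_space_rw()` or `address_space_write_rom()` (or the existing read path) instead of returning a tag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tags naming which path a debug access would take. */
typedef enum {
    DEBUG_PATH_DEVICE_RW,  /* forwarded to the device, as with address_space_rw() */
    DEBUG_PATH_WRITE_ROM,  /* address_space_write_rom(): only RAM/ROM is written */
    DEBUG_PATH_READ,       /* the existing read path */
} DebugPath;

/*
 * is_io:     the physical address is backed by an MMIO region
 * is_write:  the debugger is writing rather than reading
 * can_do_io: the caller accepts device side effects (the proposed guard)
 */
static DebugPath route_debug_access(bool is_io, bool is_write, bool can_do_io)
{
    if (can_do_io && is_io) {
        return DEBUG_PATH_DEVICE_RW;
    }
    /* Without the guard, an MMIO write lands here and is silently dropped. */
    return is_write ? DEBUG_PATH_WRITE_ROM : DEBUG_PATH_READ;
}
```

Passing `can_do_io = true` corresponds to the patch as posted, while `false` corresponds to the current behaviour, where an MMIO write falls through to `address_space_write_rom()` and is ignored.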

Re: [PATCH] hw/riscv/virt: Add hotplugging and virtio-md-pci support

2024-05-20 Thread Björn Töpel

Daniel,

Thanks for taking a look!

Daniel Henrique Barboza  writes:

> Hi Björn,
>
> On 5/14/24 08:06, Björn Töpel wrote:
>> From: Björn Töpel 
>> 
>> Virtio-based memory devices allow for dynamic resizing of virtual
>> machine memory, and require proper hotplugging (add/remove) support
>> to work.
>> 
>> Enable virtio-md-pci with the corresponding missing hotplugging
>> callbacks for the RISC-V "virt" machine.
>> 
>> Signed-off-by: Björn Töpel 
>> ---
>> This is basic support for MHP that works with DT. There is some minor
>> ACPI SRAT plumbing in there as well. Ideally we'd like proper ACPI MHP
>> support as well. I have a branch [1], where I've applied this patch,
>> plus ACPI GED/PC-DIMM MHP support on top of Sunil's QEMU branch
>> (contains some ACPI DSDT additions) [2], for the curious/brave ones.
>> However, the ACPI MHP support is not testable on upstream Linux
>> yet (ACPI AIA support, and ACPI NUMA SRAT series are ongoing).
>> 
>> I'll follow-up with proper ACPI GED/PC-DIMM MHP patches, once the
>> dependencies land (Linux kernel and QEMU).
>> 
>> I'll post the Linux MHP/virtio-mem v2 patches later this week!
>> 
>> 
>> Cheers,
>> Björn
>> 
>> [1] https://github.com/bjoto/qemu/commits/virtio-mem-pc-dimm-mhp-acpi/
>> [2] https://lore.kernel.org/linux-riscv/20240501121742.1215792-1-suni...@ventanamicro.com/
>> ---
>>   hw/riscv/Kconfig           |  2 ++
>>   hw/riscv/virt-acpi-build.c |  7 +
>>   hw/riscv/virt.c            | 64 +-
>>   hw/virtio/virtio-mem.c     |  2 +-
>>   4 files changed, 73 insertions(+), 2 deletions(-)
>> 
>> diff --git a/hw/riscv/Kconfig b/hw/riscv/Kconfig
>> index a2030e3a6ff0..08f82dbb681a 100644
>> --- a/hw/riscv/Kconfig
>> +++ b/hw/riscv/Kconfig
>> @@ -56,6 +56,8 @@ config RISCV_VIRT
>>   select PLATFORM_BUS
>>   select ACPI
>>   select ACPI_PCI
>> +    select VIRTIO_MEM_SUPPORTED
>> +    select VIRTIO_PMEM_SUPPORTED
>>   
>>   config SHAKTI_C
>>   bool
>> diff --git a/hw/riscv/virt-acpi-build.c b/hw/riscv/virt-acpi-build.c
>> index 0925528160f8..6dc3baa9ec86 100644
>> --- a/hw/riscv/virt-acpi-build.c
>> +++ b/hw/riscv/virt-acpi-build.c
>> @@ -610,6 +610,13 @@ build_srat(GArray *table_data, BIOSLinker *linker, RISCVVirtState *vms)
>>   }
>>   }
>>   
>> +    if (ms->device_memory) {
>> +        build_srat_memory(table_data, ms->device_memory->base,
>> +                          memory_region_size(&ms->device_memory->mr),
>> +                          ms->numa_state->num_nodes - 1,
>> +                          MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
>> +    }
>> +
>>   acpi_table_end(linker, &table);
>
> When the time comes, I believe we'll want this chunk in a separate ACPI patch.

Hmm, I first thought about adding this to the ACPI MHP series, but then
realized that virtio-mem relies on SRAT for ACPI boots (again -- RISC-V
Linux does not support that upstream yet...).

Do you mean that you'd prefer this chunk in a separate patch?


Björn

Re: [PATCH V1 00/26] Live update: cpr-exec

2024-05-20 Thread Steven Sistare


Hi Peter, Hi Fabiano,
  Will you have time to review the migration guts of this series any time soon?
In particular:

[PATCH V1 05/26] migration: precreate vmstate
[PATCH V1 06/26] migration: precreate vmstate for exec
[PATCH V1 12/26] migration: vmstate factory object
[PATCH V1 18/26] migration: cpr-exec-args parameter
[PATCH V1 20/26] migration: cpr-exec mode

- Steve

On 4/29/2024 11:55 AM, Steve Sistare wrote:

This patch series adds the live migration cpr-exec mode.  In this mode, QEMU
stops the VM, writes VM state to the migration URI, and directly exec's a
new version of QEMU on the same host, replacing the original process while
retaining its PID.  Guest RAM is preserved in place, albeit with new virtual
addresses.  The user completes the migration by specifying the -incoming
option, and by issuing the migrate-incoming command if necessary.  This
saves and restores VM state, with minimal guest pause time, so that QEMU may
be updated to a new version in between.

The new interfaces are:
   * cpr-exec (MigMode migration parameter)
   * cpr-exec-args (migration parameter)
   * memfd-alloc=on (command-line option for -machine)
   * only-migratable-modes (command-line argument)

The caller sets the mode parameter before invoking the migrate command.

Arguments for the new QEMU process are taken from the cpr-exec-args parameter.
The first argument should be the path of a new QEMU binary, or a prefix
command that exec's the new QEMU binary, and the arguments should include
the -incoming option.

Memory backend objects must have the share=on attribute, and must be mmap'able
in the new QEMU process.  For example, memory-backend-file is acceptable,
but memory-backend-ram is not.

QEMU must be started with the '-machine memfd-alloc=on' option.  This causes
implicit RAM blocks (those not explicitly described by a memory-backend
object) to be allocated by mmap'ing a memfd.  Examples include VGA, ROM,
and even guest RAM when it is specified without reference to a
memory-backend object.   The memfds are kept open across exec, their values
are saved in vmstate which is retrieved after exec, and they are re-mmap'd.
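The memfd lifecycle described above can be illustrated with a small Linux-only C sketch: create a memfd, map it `MAP_SHARED`, clear `FD_CLOEXEC` so the descriptor would survive `exec()`, then map the same descriptor again and find the data still there. It does not actually exec, the helper names are hypothetical, and QEMU's real code (for example the `qemu_clear_cloexec` helper listed in this series) may differ in detail.

```c
#define _GNU_SOURCE  /* memfd_create() and MFD_CLOEXEC (Linux, glibc >= 2.27) */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Clear FD_CLOEXEC so the descriptor would be inherited across exec(). */
static int clear_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/*
 * Create a memfd of len bytes, write msg through a first shared mapping,
 * unmap it, then map the same fd again and copy its contents to out.
 * Returns the still-open fd (the one that would be kept across exec) or -1.
 */
static int memfd_roundtrip(const char *msg, char *out, size_t len)
{
    int fd = memfd_create("guest-ram", MFD_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, (off_t)len) < 0 || clear_cloexec(fd) < 0) {
        close(fd);
        return -1;
    }

    /* First mapping: stands in for the old process writing guest RAM. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    strncpy(p, msg, len - 1);
    munmap(p, len);

    /* Second mapping of the same fd: stands in for the new process re-mmap'ing it. */
    char *q = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (q == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memcpy(out, q, len);
    munmap(q, len);
    return fd;
}
```

In the real flow, the fd number itself is what gets saved (in the precreate vmstate) so the new process knows which inherited descriptor to map.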

The '-only-migratable-modes cpr-exec' option guarantees that the
configuration supports cpr-exec.  QEMU will exit at start time if not.

Example:

In this example, we simply restart the same version of QEMU, but in
a real scenario one would set a new QEMU binary path in cpr-exec-args.

   # qemu-kvm -monitor stdio \
       -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
       -m 4G -machine memfd-alloc=on ...

   QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: running
   (qemu) migrate_set_parameter mode cpr-exec
   (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming file:vm.state
   (qemu) migrate -d file:vm.state
   (qemu) QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: running

cpr-exec mode preserves attributes of outgoing devices that must be known
before the device is created on the incoming side, such as the memfd descriptor
number, but currently the migration stream is read after all devices are
created.  To solve this problem, I add two VMStateDescription options:
precreate and factory.  precreate objects are saved to their own migration
stream, distinct from the main stream, and are read early by incoming QEMU,
before devices are created.  Factory objects are allocated on demand, without
relying on a pre-registered object's opaque address, which is necessary
because the devices to which the state will apply have not been created yet
and hence have not registered an opaque address to receive the state.
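The factory idea can be pictured with a toy registry: records from the precreate stream are loaded into freshly allocated objects before any device exists, and each device later claims the record saved under its name. Everything here (`FactoryEntry`, `factory_load()`, `factory_claim()`, the fixed-size table) is hypothetical and only illustrates the ordering problem; it is not the VMState API from this series, and cleanup is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FACTORY_MAX 16

/* Hypothetical state record: a name plus one preserved value (a memfd number). */
typedef struct {
    char name[32];
    int memfd;
    int claimed;
} FactoryEntry;

static FactoryEntry *factory_table[FACTORY_MAX];
static int factory_count;

/* Early phase: no device exists yet, so allocate a fresh record on demand
 * for each entry read from the precreate stream. Returns 0 on success. */
static int factory_load(const char *name, int memfd)
{
    if (factory_count == FACTORY_MAX) {
        return -1;
    }
    FactoryEntry *e = calloc(1, sizeof(*e));
    if (!e) {
        return -1;
    }
    strncpy(e->name, name, sizeof(e->name) - 1);  /* calloc keeps it NUL-terminated */
    e->memfd = memfd;
    factory_table[factory_count++] = e;
    return 0;
}

/* Later phase: a device being created looks up the state saved under its name.
 * Each record can be claimed once; returns NULL if nothing was saved. */
static const FactoryEntry *factory_claim(const char *name)
{
    for (int i = 0; i < factory_count; i++) {
        FactoryEntry *e = factory_table[i];
        if (!e->claimed && strcmp(e->name, name) == 0) {
            e->claimed = 1;
            return e;
        }
    }
    return NULL;
}
```

The key property is that `factory_load()` needs no pre-registered opaque pointer, which is exactly what is unavailable before the incoming devices are created.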

This patch series implements a minimal version of cpr-exec.  Future series
will add support for:
   * vfio
   * chardev's without loss of connectivity
   * vhost
   * fine-grained seccomp controls
   * hostmem-memfd
   * cpr-exec migration test


Steve Sistare (26):
   oslib: qemu_clear_cloexec
   vl: helper to request re-exec
   migration: SAVEVM_FOREACH
   migration: delete unused parameter mis
   migration: precreate vmstate
   migration: precreate vmstate for exec
   migration: VMStateId
   migration: vmstate_info_void_ptr
   migration: vmstate_register_named
   migration: vmstate_unregister_named
   migration: vmstate_register at init time
   migration: vmstate factory object
   physmem: ram_block_create
   physmem: hoist guest_memfd creation
   physmem: hoist host memory allocation
   physmem: set ram block idstr earlier
   machine: memfd-alloc option
   migration: cpr-exec-args parameter
   physmem: preserve ram blocks for cpr
   migration: cpr-exec mode
   migration: migrate_add_blocker_mode
   migration: ram block cpr-exec blockers
   migration: misc cpr-exec blockers
   seccomp: cpr-exec blocker
   migration: fix mismatched GPAs during cpr-exec
   migration: only-migratable-modes

  accel/xen/xen-all.c|   5 +
  backends/hostmem-epc.c |

Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-05-20 Thread fan

On Mon, May 20, 2024 at 05:50:12PM +0100, Jonathan Cameron wrote:
> On Wed, 1 May 2024 15:29:31 -0700
> fan  wrote:
> 
> > From 873f59ec06c38645768ada452d9b18920a34723e Mon Sep 17 00:00:00 2001
> > From: Fan Ni 
> > Date: Tue, 20 Feb 2024 09:48:31 -0800
> > Subject: [PATCH] hw/cxl/events: Add qmp interfaces to add/release dynamic
> >  capacity extents
> > 
> > To simulate FM functionalities for initiating Dynamic Capacity Add
> > (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> > r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> > add/release dynamic capacity extents requests.
> > 
> > With the change, we allow releasing an extent only when its DPA range
> > is contained in a single accepted extent in the device. That is to say,
> > extent superset release is not supported yet.
> > 
> > 1. Add dynamic capacity extents:
> > 
> > For example, the command to add two contiguous extents (each 128MiB long)
> > to region 0 (starting at DPA offset 0) looks like below:
> > 
> > { "execute": "qmp_capabilities" }
> > 
> > { "execute": "cxl-add-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "host-id": 0,
> >       "selection-policy": 2,
> >       "region": 0,
> >       "tag": "",
> >       "extents": [
> >           {
> >               "offset": 0,
> >               "len": 134217728
> >           },
> >           {
> >               "offset": 134217728,
> >               "len": 134217728
> >           }
> >       ]
> >   }
> > }
> > 
> > 2. Release dynamic capacity extents:
> > 
> > For example, the command to release an extent of size 128MiB from region 0
> > (DPA offset 128MiB) looks like below:
> > 
> > { "execute": "cxl-release-dynamic-capacity",
> >   "arguments": {
> >       "path": "/machine/peripheral/cxl-dcd0",
> >       "host-id": 0,
> >       "flags": 1,
> >       "region": 0,
> >       "tag": "",
> >       "extents": [
> >           {
> >               "offset": 134217728,
> >               "len": 134217728
> >           }
> >       ]
> >   }
> > }
> > 
> > Signed-off-by: Fan Ni 
> 
> Hi Fan,
> 
> A few trivial questions inline.  I don't feel particularly strongly
> about breaking up the flags fields, but I'd like to understand your
> reasoning for keeping them as single fields?
> 
> Is it mainly to keep aligned with the specification or something else?
> 
> Thanks,
> 
> Jonathan
> 
> 
> >  #endif /* CXL_EVENTS_H */
> > diff --git a/qapi/cxl.json b/qapi/cxl.json
> > index 4281726dec..27cf39f448 100644
> > --- a/qapi/cxl.json
> > +++ b/qapi/cxl.json
> > @@ -361,3 +361,93 @@
> >  ##
> >  {'command': 'cxl-inject-correctable-error',
> >   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> > +
> > +##
> > +# @CXLDynamicCapacityExtent:
> > +#
> > +# A single dynamic capacity extent
> > +#
> > +# @offset: The offset (in bytes) to the start of the region
> > +# where the extent belongs to
> > +#
> > +# @len: The length of the extent in bytes
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'struct': 'CXLDynamicCapacityExtent',
> > +  'data': {
> > +  'offset':'uint64',
> > +  'len': 'uint64'
> > +  }
> > +}
> > +
> > +##
> > +# @cxl-add-dynamic-capacity:
> > +#
> > +# Command to initiate adding dynamic capacity extents to a host.  It
> > +# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
> > +#
> > +# @path: CXL DCD canonical QOM path
> > +#
> > +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> > +# Table 7-70.
> > +#
> > +# @selection-policy: The "Selection Policy" bits as defined in
> > +# cxl spec r3.1 Table 7-70.  It specifies the policy to use for
> > +# selecting which extents comprise the added capacity.
> 
> Hmm. This one is defined as a selection of nameable choices.  Perhaps
> worth an enum?  If we did do that, we'd also need to break the flags
> out in the release flags below.

Initially, I defined an enum for the selection policy. But for users who are
not familiar with the CXL spec, I think the enum definition is not very clear
to them without reading the spec, so I removed it. Also, there are some
reserved bits there; leaving it as uint8 may help keep the interface unchanged
if some of those bits are used in the future?

> 
> 
> > +#
> > +# @region: The "Region Number" field as defined in cxl spec r3.1
> > +# Table 7-70.  The dynamic capacity region where the capacity
> > +# is being added.  Valid range is from 0-7.
> > +#
> > +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
> > +#
> > +# @extents: The "Extent List" field as defined in cxl spec r3.1
> > +# Table 7-70.
> > +#
> > +# Since: 9.1
> > +##
> > +{ 'command': 'cxl-add-dynamic-capacity',
> > +  'data': { 'path': 'str',
> > +'host-id': 'uint16',
> > +'selection-policy': 'uint8',
> > +'region': 'uint8',
> > +'tag': 'str',
> > +'extents': [ 'CXLDynamicCapacityExtent' ]
> > +   }
> > +}
> > +
> > +##
> > +#
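The constraints in this commit message can be checked with a short C sketch: the add example uses two contiguous 128 MiB extents, and a release is accepted only when its DPA range lies inside a single accepted extent. The `Extent` type and helper names are hypothetical, ranges are treated as half-open `[offset, offset + len)`, and overflow of `offset + len` is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t offset;  /* DPA offset within the region, in bytes */
    uint64_t len;     /* length in bytes */
} Extent;

/* True if [r.offset, r.offset + r.len) lies entirely inside one accepted extent. */
static bool release_allowed(const Extent *accepted, size_t n, Extent r)
{
    if (r.len == 0) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        uint64_t end = accepted[i].offset + accepted[i].len;
        if (r.offset >= accepted[i].offset && r.offset + r.len <= end) {
            return true;
        }
    }
    return false;  /* spans several extents (superset release) or none */
}

/* True if each extent starts exactly where the previous one ends. */
static bool contiguous(const Extent *e, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (e[i].offset != e[i - 1].offset + e[i - 1].len) {
            return false;
        }
    }
    return true;
}
```

With the two extents from the add example, releasing `{134217728, 134217728}` is allowed, while releasing `{0, 268435456}` is refused because it spans both extents (the unsupported superset case).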

[PATCH 2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features

2024-05-20 Thread Zide Chen

cpu_exec_realizefn, which calls the accel-specific realizefn, may expand
features.  For example, some accel-specific options may require extra features
to be enabled, and it's appropriate to expand these features in the
accel-specific realizefn.

One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.

Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
that it won't expose features not supported by the host.

Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
Suggested-by: Xiaoyao Li 
Signed-off-by: Zide Chen 
---
 target/i386/cpu.c | 24 
 target/i386/kvm/kvm-cpu.c |  1 -
 2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index cfe7c92d6bc6..da1ab7892d26 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7438,6 +7438,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 }
 }
 
+    /*
+     * note: the call to the framework needs to happen after feature expansion,
+     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
+     * These may be set by the accel-specific code,
+     * and the results are subsequently checked / assumed in this function.
+     */
+    cpu_exec_realizefn(cs, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
 x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
 
 if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) {
@@ -7459,18 +7471,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
 x86_cpu_set_sgxlepubkeyhash(env);
 
-    /*
-     * note: the call to the framework needs to happen after feature expansion,
-     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
-     * These may be set by the accel-specific code,
-     * and the results are subsequently checked / assumed in this function.
-     */
-    cpu_exec_realizefn(cs, &local_err);
-    if (local_err != NULL) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
 if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
 g_autofree char *name = x86_cpu_class_get_model_name(xcc);
 error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index f76972e47e61..3adcedf0dbc3 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -50,7 +50,6 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
  * nothing else has been set by the user (or by accelerators) in
  * cpu->ucode_rev and cpu->phys_bits, and updates the CPUID results in
  * mwait.ecx.
- * This accel realization code also assumes cpu features are already expanded.
  *
  * realize order:
  *
-- 
2.34.1
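The ordering problem can be reduced to two bitmask operations. In this hypothetical model, `accel_expand()` plays the role of the accelerator realize step adding MONITOR for cpu-pm=on, and `filter()` plays the role of `x86_cpu_filter_features()` masking against what the host supports. Bit 3 matches CPUID.01H:ECX.MONITOR, but the functions and masks are illustrative, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FEAT_MONITOR (1u << 3)  /* CPUID.01H:ECX bit 3 */

/* Stand-in for the accel realize step: cpu-pm=on requests MONITOR/MWAIT. */
static uint32_t accel_expand(uint32_t features, bool cpu_pm)
{
    return cpu_pm ? (features | FEAT_MONITOR) : features;
}

/* Stand-in for feature filtering: drop bits the host does not support. */
static uint32_t filter(uint32_t features, uint32_t host_supported)
{
    return features & host_supported;
}

/* Previous order: filter first, then let the accelerator add bits. */
static uint32_t old_order(uint32_t f, uint32_t host, bool cpu_pm)
{
    return accel_expand(filter(f, host), cpu_pm);
}

/* Order after this patch: expand first, then filter everything. */
static uint32_t new_order(uint32_t f, uint32_t host, bool cpu_pm)
{
    return filter(accel_expand(f, cpu_pm), host);
}
```

On a host without MONITOR, `old_order()` still reports the bit to the guest (the #UD case from the cover letter), while `new_order()` drops it.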

[PATCH 0/3] improve -overcommit cpu-pm=on|off

2024-05-20 Thread Zide Chen

Currently, if running "-overcommit cpu-pm=on" on hosts that don't
have MWAIT support, the MWAIT/MONITOR feature is advertised to the
guest and executing MWAIT/MONITOR on the guest triggers #UD.

Zide Chen (3):
  vl: Allow multiple -overcommit commands
  target/i386: call cpu_exec_realizefn before x86_cpu_filter_features
  target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()

 system/vl.c   |  8 ++--
 target/i386/cpu.c | 24 
 target/i386/host-cpu.c| 12 
 target/i386/kvm/kvm-cpu.c | 12 +---
 4 files changed, 27 insertions(+), 29 deletions(-)

-- 
2.34.1

[PATCH 1/3] vl: Allow multiple -overcommit commands

2024-05-20 Thread Zide Chen

Both cpu-pm and mem-lock are related to system resource overcommit, but
they are separate from each other, in terms of how they are realized,
and of course, they are applied to different system resources.

It's tempting to use separate command-line options to specify their behavior.
For example, in the following case, one setting quietly overrides the other:
the second option resets mem-lock to off, and it's not easy to notice without
careful inspection.

  --overcommit mem-lock=on
  --overcommit cpu-pm=on

Fixes: c8c9dc42b7ca ("Remove the deprecated -realtime option")
Signed-off-by: Zide Chen 
---
 system/vl.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/system/vl.c b/system/vl.c
index a3eede5fa5b8..ed682643805b 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -3545,8 +3545,12 @@ void qemu_init(int argc, char **argv)
 if (!opts) {
 exit(1);
 }
-enable_mlock = qemu_opt_get_bool(opts, "mem-lock", false);
-enable_cpu_pm = qemu_opt_get_bool(opts, "cpu-pm", false);
+
+/* Don't override the -overcommit option if set */
+enable_mlock = enable_mlock ||
+qemu_opt_get_bool(opts, "mem-lock", false);
+enable_cpu_pm = enable_cpu_pm ||
+qemu_opt_get_bool(opts, "cpu-pm", false);
 break;
 case QEMU_OPTION_compat:
 {
-- 
2.34.1
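The accumulation rule this patch introduces can be modelled without QemuOpts. In this hypothetical sketch each `-overcommit` argument is a comma-separated string, `opt_get_bool()` is a deliberately naive stand-in for `qemu_opt_get_bool()` (it only recognizes `key=on`), and each flag is ORed with its previous value as in the patched `qemu_init()` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Naive lookup: true only if "key=on" appears as a comma-separated item. */
static bool opt_get_bool(const char *opts, const char *key)
{
    size_t klen = strlen(key);
    const char *p = opts;

    for (;;) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == klen + 3 && strncmp(p, key, klen) == 0 &&
            strncmp(p + klen, "=on", 3) == 0) {
            return true;
        }
        if (!end) {
            return false;
        }
        p = end + 1;
    }
}

typedef struct {
    bool mlock;
    bool cpu_pm;
} Overcommit;

/* Patched behaviour: a later option never clears an earlier "on". */
static void apply_overcommit(Overcommit *oc, const char *opts)
{
    oc->mlock = oc->mlock || opt_get_bool(opts, "mem-lock");
    oc->cpu_pm = oc->cpu_pm || opt_get_bool(opts, "cpu-pm");
}
```

A side effect of the OR is that, with the patch, a later `cpu-pm=off` cannot turn off an earlier `cpu-pm=on`.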

[PATCH 3/3] target/i386: Move host_cpu_enable_cpu_pm into kvm_cpu_realizefn()

2024-05-20 Thread Zide Chen

It doesn't seem like a good idea to expand features in host_cpu_realizefn,
which is meant for the host CPU only.  Additionally, the cpu-pm option is
KVM-specific, and it's cleaner to put it in kvm_cpu_realizefn(), together
with the WAITPKG code.

Fixes: f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
Signed-off-by: Zide Chen 
---
 target/i386/host-cpu.c| 12 
 target/i386/kvm/kvm-cpu.c | 11 +--
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/i386/host-cpu.c b/target/i386/host-cpu.c
index 280e427c017c..8b8bf5afeccf 100644
--- a/target/i386/host-cpu.c
+++ b/target/i386/host-cpu.c
@@ -42,15 +42,6 @@ static uint32_t host_cpu_phys_bits(void)
 return host_phys_bits;
 }
 
-static void host_cpu_enable_cpu_pm(X86CPU *cpu)
-{
-    CPUX86State *env = &cpu->env;
-
-    host_cpuid(5, 0, &cpu->mwait.eax, &cpu->mwait.ebx,
-               &cpu->mwait.ecx, &cpu->mwait.edx);
-    env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
-}
-
 static uint32_t host_cpu_adjust_phys_bits(X86CPU *cpu)
 {
 uint32_t host_phys_bits = host_cpu_phys_bits();
@@ -83,9 +74,6 @@ bool host_cpu_realizefn(CPUState *cs, Error **errp)
 X86CPU *cpu = X86_CPU(cs);
 CPUX86State *env = &cpu->env;
 
-    if (cpu->max_features && enable_cpu_pm) {
-        host_cpu_enable_cpu_pm(cpu);
-    }
 if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
 uint32_t phys_bits = host_cpu_adjust_phys_bits(cpu);
 
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index 3adcedf0dbc3..197c892da89a 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -64,9 +64,16 @@ static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
  *   cpu_common_realizefn() (via xcc->parent_realize)
  */
 if (cpu->max_features) {
-        if (enable_cpu_pm && kvm_has_waitpkg()) {
-            env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
+        if (enable_cpu_pm) {
+            if (kvm_has_waitpkg()) {
+                env->features[FEAT_7_0_ECX] |= CPUID_7_0_ECX_WAITPKG;
+            }
+
+            host_cpuid(5, 0, &cpu->mwait.eax, &cpu->mwait.ebx,
+                       &cpu->mwait.ecx, &cpu->mwait.edx);
+            env->features[FEAT_1_ECX] |= CPUID_EXT_MONITOR;
 }
+
 if (cpu->ucode_rev == 0) {
 cpu->ucode_rev =
 kvm_arch_get_supported_msr_feature(kvm_state,
-- 
2.34.1

Re: [RFC PATCH 0/1] pci: allocate a PCI ID for RISC-V IOMMU

2024-05-20 Thread Daniel Henrique Barboza





On 5/10/24 07:47, Frank Chang wrote:

Hi Daniel,

Daniel Henrique Barboza  wrote on Wed, May 8, 2024 at 8:42 PM:




On 5/7/24 12:44, Peter Maydell wrote:

On Fri, 3 May 2024 at 13:43, Daniel Henrique Barboza
 wrote:


Hi,

In this RFC I want to check with Gerd and others if it's ok to add a PCI
id for the RISC-V IOMMU device. It's currently under review in [1]. The
idea is to fold this patch into the RISC-V IOMMU series if we're all ok
with this change.


My question here would be "why is this risc-v specific?" (and more
generally "what is this for?" -- the cover letter and patch and
documentation page provide almost no information about what this
device is and why it needs to exist rather than using either
virtio-iommu or else a model of a real hardware IOMMU.)


The RISC-V IOMMU device emulation under review ([1]) is a reference
implementation of the riscv-iommu spec [2]. AFAIK it is similar to what we
already have with 'smmuv3' on the aarch64 'virt' board, i.e. an implementation
of ARM's SMMUv3 that isn't tied to a specific vendor.

The difference here is that the riscv-iommu spec, ratified by RISC-V
International (RVI), anticipates that the device could be implemented as a PCIe
device. But RVI didn't bother assigning a PCI ID for their reference IOMMU. The
existing implementation in [1] is using a Rivos PCI ID that we're treating as a
placeholder only. We need an ID that reflects that this is a device that
adheres to the riscv-iommu spec, not to an IOMMU of any particular vendor.

Since RVI doesn't provide a PCI ID for it, we went to Red Hat, and they were
kind enough to give us a PCI ID for the RISC-V IOMMU reference device.


That's great. Thanks to Red Hat.
I'm wondering whether we plan to document the new PCI ID in the IOMMU spec
or somewhere else that's publicly accessible?


It will be documented in QEMU, as you've already seen in this patch. I'm sure
that this info will be cascaded to other databases, but I'm not sure how or
when. I think Gerd can give us more info about it.

I guess we'll end up using this same generic ID from QEMU on the kernel side
too. As of now the kernel IOMMU support is using a Rivos ID ([1], patch 3).
Assuming that [1] stays this way (I'm not sure if the kernel driver is a Rivos
implementation or a canonical implementation like we're doing here), we'll
need to add generic kernel support that uses the generic ID too.


Thanks,

Daniel

[1] 
https://lore.kernel.org/linux-riscv/cover.1715708679.git.tjezn...@rivosinc.com/





Regards,
Frank Chang



I'll do a proper job this time and add all this context in the commit msg,
including a proper shout-out to Gerd and Red Hat.



Thanks,


Daniel


[1] 
https://lore.kernel.org/qemu-riscv/20240307160319.675044-1-dbarb...@ventanamicro.com/
[2] https://github.com/riscv-non-isa/riscv-iommu/releases/tag/v1.0.0



thanks
-- PMM

Re: [PATCH] physmem: allow debug writes to MMIO regions

2024-05-20 Thread Peter Maydell

On Wed, 15 May 2024 at 13:49, Philippe Mathieu-Daudé  wrote:
>
> Hi Perry,
>
> On 14/5/24 01:33, Perry Hung wrote:
> > Writes from GDB to memory-mapped IO regions are currently silently
> > dropped. cpu_memory_rw_debug() calls address_space_write_rom(), which
> > calls address_space_write_rom_internal(), which ignores all non-ram/rom
> > regions.
> >
> > Add a check for MMIO regions and direct those to address_space_rw()
> > instead.
> >
>
> Reported-by: Andreas Rasmusson 
> BugLink: https://bugs.launchpad.net/qemu/+bug/1625216
>
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/213
> > Signed-off-by: Perry Hung 
> > ---
> >   system/physmem.c | 5 -
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/system/physmem.c b/system/physmem.c
> > index 342b7a8fd4..013cdd2ab1 100644
> > --- a/system/physmem.c
> > +++ b/system/physmem.c
> > @@ -3508,7 +3508,10 @@ int cpu_memory_rw_debug(CPUState *cpu, vaddr addr,
> >   if (l > len)
> >   l = len;
> >   phys_addr += (addr & ~TARGET_PAGE_MASK);
> > -if (is_write) {
> > +if (cpu_physical_memory_is_io(phys_addr)) {
> > +res = address_space_rw(cpu->cpu_ases[asidx].as, phys_addr, attrs,
> > +   buf, l, is_write);
> > +} else if (is_write) {
> >   res = address_space_write_rom(cpu->cpu_ases[asidx].as, phys_addr,
> > attrs, buf, l);
> >   } else {

The other option is to make address_space_write_rom_internal()
also write to devices...

> I wonder if we shouldn't be safer with a preliminary patch
> adding a 'can_do_io' boolean argument to cpu_memory_rw_debug()
> (updating the call sites), then this patch would become:
>
>  if (can_do_io && cpu_physical_memory_is_io(phys_addr)) {
>
> One of my worries, for example, is that if someone accidentally inserts
> a breakpoint at an I/O address, the device might change its
> state and return MEMTX_OK, which is confusing.

You can definitely do some silly things if we remove this
restriction.

On the other hand if you're using gdb as a debugger on real
(bare metal) hardware does anything stop you doing that?

-- PMM
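To make Philippe's suggestion concrete, here is a minimal sketch of how the branch in cpu_memory_rw_debug() would pick a path once a can_do_io argument exists. It models only the decision, not the memory API: DebugPath, debug_access_path() and the boolean is_io (standing in for cpu_physical_memory_is_io()) are hypothetical names, and the read path stands for the else branch that the quoted hunk cuts off.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three paths in cpu_memory_rw_debug(). */
typedef enum {
    DEBUG_PATH_RW,        /* address_space_rw(): can reach MMIO devices */
    DEBUG_PATH_WRITE_ROM, /* address_space_write_rom(): RAM/ROM only */
    DEBUG_PATH_READ,      /* read path in the elided else branch */
} DebugPath;

/*
 * Sketch of the dispatch with the proposed can_do_io argument.
 * is_io stands in for cpu_physical_memory_is_io(phys_addr).
 */
static DebugPath debug_access_path(bool can_do_io, bool is_io, bool is_write)
{
    if (can_do_io && is_io) {
        return DEBUG_PATH_RW;
    }
    return is_write ? DEBUG_PATH_WRITE_ROM : DEBUG_PATH_READ;
}
```

With can_do_io false, an MMIO write still falls through to address_space_write_rom(), which keeps today's behaviour of silently dropping it; only callers that opt in would reach devices.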

Re: hw/usb/hcd-ohci: Fix #1510, #303: pid not IN or OUT

2024-05-20 Thread Peter Maydell

On Tue, 6 Feb 2024 at 13:25, Cord Amfmgm  wrote:
>
> This changes the ohci validation to not assert if invalid
> data is fed to the ohci controller. The poc suggested in
> https://bugs.launchpad.net/qemu/+bug/1907042
> and then migrated to bug #303 does the following to
> feed it a SETUP pid and EndPt of 1:
>
> uint32_t MaxPacket = 64;
> uint32_t TDFormat = 0;
> uint32_t Skip = 0;
> uint32_t Speed = 0;
> uint32_t Direction = 0;  /* #define OHCI_TD_DIR_SETUP 0 */
> uint32_t EndPt = 1;
> uint32_t FuncAddress = 0;
> ed->attr = (MaxPacket << 16) | (TDFormat << 15) | (Skip << 14)
>| (Speed << 13) | (Direction << 11) | (EndPt << 7)
>| FuncAddress;
> ed->tailp = /*TDQTailPntr= */ 0;
> ed->headp = ((/*TDQHeadPntr= */ [0]) & 0xfff0)
>| (/* ToggleCarry= */ 0 << 1);
> ed->next_ed = (/* NextED= */ 0 & 0xfff0)
>
> qemu-fuzz also caught the same issue in #1510. They are
> both fixed by this patch.
>
> The if (td.cbp > td.be) logic in ohci_service_td() causes an
> ohci_die(). My understanding of the OHCI spec 4.3.1.2
> Table 4-2 allows td.cbp to be one byte more than td.be to
> signal the buffer has zero length. The new check in qemu
> appears to have been added since qemu-4.2. This patch
> includes both fixes since they are located very close
> together.

For the "zero length buffer" case, do you have a more detailed
pointer to the bit of the spec that says that "cbp = be + 1" is a
valid way to specify a zero length buffer? Table 4-2 in the copy I
have says for CurrentBufferPointer "a value of 0 indicates
a zero-length data packet or that all bytes have been transferred",
and the sample host OS driver function QueueGeneralRequest()
later in the spec handles a 0 length packet by setting
  TD->HcTD.CBP = TD->HcTD.BE = 0;
(which our emulation's code does handle).

> @@ -936,8 +941,8 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>  if ((td.cbp & 0xf000) != (td.be & 0xf000)) {
>  len = (td.be & 0xfff) + 0x1001 - (td.cbp & 0xfff);
>  } else {
> -if (td.cbp > td.be) {
> -trace_usb_ohci_iso_td_bad_cc_overrun(td.cbp, td.be);
> +if (td.cbp > td.be + 1) {

I think this has an overflow if td.be is 0x.

> +trace_usb_ohci_td_bad_buf(td.cbp, td.be);
>  ohci_die(ohci);
>  return 1;
>  }

(On the other hand having looked at the code I'm happy
now that having a len of 0 passed into usb_packet_addbuf()
is OK because we already do that for the "cbp = be = 0"
way of specifying a zero length packet.)

thanks
-- PMM
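Peter's overflow concern can be shown in isolation. Assuming td.cbp and td.be are uint32_t (as in QEMU's struct ohci_td) and that unsigned int is 32 bits on the host, td.be + 1 wraps to 0 when td.be is UINT32_MAX, so the proposed check then fires for any non-zero cbp. The sketch below compares that check with one that widens to 64 bits before adding one; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The check as written in the patch: be + 1 wraps to 0 for be == UINT32_MAX. */
static bool bad_buf_naive(uint32_t cbp, uint32_t be)
{
    return cbp > be + 1;
}

/* Wrap-free variant: widen both operands before adding one. */
static bool bad_buf_safe(uint32_t cbp, uint32_t be)
{
    return (uint64_t)cbp > (uint64_t)be + 1;
}
```

For example, cbp = 0xfffff000 with be = 0xffffffff describes a valid buffer whose bits 12-15 match, so it reaches the else branch quoted above, yet the unwidened check reports it as bad.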

Re: [PATCH] gitlab-ci: Replace Docker with Kaniko

2024-05-20 Thread Camilla Conte

On Fri, May 17, 2024 at 9:14 AM Daniel P. Berrangé  wrote:
>
> On Thu, May 16, 2024 at 07:24:04PM +0100, Daniel P. Berrangé wrote:
> > On Thu, May 16, 2024 at 05:52:43PM +0100, Camilla Conte wrote:
> > > Enables caching from the qemu-project repository.
> > >
> > > Uses a dedicated "$NAME-cache" tag for caching, to address limitations.
> > > See issue "when using --cache=true, kaniko fail to push cache layer [...]":
> > > https://github.com/GoogleContainerTools/kaniko/issues/1459
> >
> > After investigating, this is a result of a different design approach
> > for caching in kaniko.
> >
> > In docker, it can leverage any existing image as a cache source,
> > reusing individual layers that were present. IOW, there's no
> > difference between a cache and a final image, they're one and the
> > same thing
> >
> > In kaniko, the cache is a distinct object type. IIUC, it is not
> > populated with the individual layers, instead it has a custom
> > format for storing the cached content. Therefore the concept of
> > storing the cache at the same location as the final image, is
> > completely inappropriate - you can't store two completely different
> > kinds of content at the same place.
> >
> > That is also why you can't just "git pull" to fetch the cache
> > image(s) beforehand, and also why it doesn't look like you can
> > use multiple cache sources with kaniko.
> >
> > None of this is inherently a bad thing, except when it comes
> > to data storage. By using Kaniko we would, at minimum, double
> > the amount of data storage we consume in the gitlab registry.
>
> Double is actually just the initial case. The cache is storing layers
> using docker tags, whose names appear to be based on a hash of the "RUN"
> command.
>
> IOW, the first time we build a container we have double the usage.
> When a dockerfile is updated changing a 'RUN' command, we now have
> triple the storage usage for cache. Update the RUN command again,
> and we now have quadruple the storage. etc.
>
> Kaniko does not appear to purge cache entries itself, and will rely
> on something else to do the cache purging.
>
> GitLab has support for purging old docker tags, but I'm not an
> admin on the QEMU project namespace, so I can't tell whether it can be
> enabled or not. Many older projects have this permanently disabled
> due to historical compat issues in gitlab after they introduced the
> feature.

I'm pretty sure purging can be enabled. Gitlab itself proposes this
with a "set up cleanup" link on the registry page (1).
Can you recall what issues they were experiencing?

If this is the only issue blocking Kaniko adoption, and we can't solve
it by enabling the cleanup, I can write an additional step at the end
of the container build to explicitly remove old cache tags.

(1) https://gitlab.com/qemu-project/qemu/container_registry

>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>

Re: [PATCH 1/2] hw/usb/hcd-ohci: Fix #1510, #303: pid not IN or OUT

2024-05-20 Thread Peter Maydell

On Thu, 9 May 2024 at 01:30, David Hubbard  wrote:
>
> From: Cord Amfmgm 
>
> This changes the ohci validation to not assert if invalid data is fed to the
> ohci controller. The poc in https://bugs.launchpad.net/qemu/+bug/1907042 and
> migrated to bug #303 does the following to feed it a SETUP pid (valid)
> at an EndPt of 1 (invalid - all SETUP pids must be addressed to EndPt 0):
>
> uint32_t MaxPacket = 64;
> uint32_t TDFormat = 0;
> uint32_t Skip = 0;
> uint32_t Speed = 0;
> uint32_t Direction = 0;  /* #define OHCI_TD_DIR_SETUP 0 */
> uint32_t EndPt = 1;
> uint32_t FuncAddress = 0;
> ed->attr = (MaxPacket << 16) | (TDFormat << 15) | (Skip << 14)
>| (Speed << 13) | (Direction << 11) | (EndPt << 7)
>| FuncAddress;
> ed->tailp = /*TDQTailPntr= */ 0;
> ed->headp = ((/*TDQHeadPntr= */ [0]) & 0xfff0)
>| (/* ToggleCarry= */ 0 << 1);
> ed->next_ed = (/* NextED= */ 0 & 0xfff0)
>
> qemu-fuzz also caught the same issue in #1510. They are both fixed by this
> patch.
>
> With a tiny OS[1] that boots and executes the poc, the repro shows the issue:
>
> * OS that sends USB requests to a USB mass storage device
>   but sends a SETUP with EndPt = 1
> * qemu 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.19)
> * qemu HEAD (4e66a0854)
> * Actual OHCI controller (hardware)
>
> Command line:
> qemu-system-x86_64 -m 20 \
>  -device pci-ohci,id=ohci \
>  -drive if=none,format=raw,id=d,file=testmbr.raw \
>  -device usb-storage,bus=ohci.0,drive=d \
>  --trace "usb_*" --trace "ohci_*" -D qemu.log
>
> Results are:
>
>  qemu 6.2.0 | qemu HEAD | actual HW
> ------------+-----------+----------------
>  assertion  | assertion | sets stall bit
>
> The assertion message is:
>
> > qemu-system-x86_64: ../../hw/usb/core.c:744: usb_ep_get: Assertion `pid == USB_TOKEN_IN || pid == USB_TOKEN_OUT' failed.
> > Aborted (core dumped)
>
> Tip: if the flags "-serial pty -serial stdio" are added to the command line
> the poc outputs its USB requests like this:
>
> > Free mem 2M ohci port0 conn FS
> > setup { 80 6 0 1 0 0 8 0 }
> > ED info=8 { mps=8 en=0 d=0 } tail=c20920
> >   td0 c20880 nxt=c20960 f200 setup cbp=c20900 be=c20907   cbp=0 be=c20907
> >   td1 c20960 nxt=c20980 f314in cbp=c20908 be=c2090f   cbp=0 be=c2090f
> >   td2 c20980 nxt=c20920 f308   out cbp=0 be=0 cbp=0 be=0
> >rx { 12 1 0 2 0 0 0 8 }
> > setup { 0 5 1 0 0 0 0 0 } tx {}
> > ED info=8 { mps=8 en=0 d=0 } tail=c20880
> >   td0 c20920 nxt=c20960 f200 setup cbp=c20900 be=c20907   cbp=0 be=c20907
> >   td1 c20960 nxt=c20880 f310in cbp=0 be=0 cbp=0 be=0
> > setup { 80 6 0 1 0 0 12 0 }
> > ED info=80081 { mps=8 en=0 d=1 } tail=c20960
> >   td0 c20880 nxt=c209c0 f200 setup cbp=c20920 be=c20927
> >   td1 c209c0 nxt=c209e0 f314in cbp=c20928 be=c20939
> >   td2 c209e0 nxt=c20960 f308   out cbp=0 be=0qemu-system-x86_64: ../../hw/usb/core.c:744: usb_ep_get: Assertion `pid == USB_TOKEN_IN || pid == USB_TOKEN_OUT' failed.
> > Aborted (core dumped)
>
> [1] The OS disk image has been emailed to phi...@linaro.org, m...@tls.msk.ru,
> and kra...@redhat.com:
>
> * testBadSetup.img.xz
> * sha256: 045b43f4396de02b149518358bf8025d5ba11091e86458875339fc649e6e5ac6
>
> Signed-off-by: Cord Amfmgm 
> ---
>  hw/usb/hcd-ohci.c   | 5 +
>  hw/usb/trace-events | 1 +
>  2 files changed, 6 insertions(+)
>
> diff --git a/hw/usb/hcd-ohci.c b/hw/usb/hcd-ohci.c
> index fc8fc91a1d..acd6016980 100644
> --- a/hw/usb/hcd-ohci.c
> +++ b/hw/usb/hcd-ohci.c
> @@ -927,6 +927,11 @@ static int ohci_service_td(OHCIState *ohci, struct ohci_ed *ed)
>  case OHCI_TD_DIR_SETUP:
>  str = "setup";
>  pid = USB_TOKEN_SETUP;
> +if (OHCI_BM(ed->flags, ED_EN) > 0) {  /* setup only allowed to ep 0 */
> +trace_usb_ohci_td_bad_pid(str, ed->flags, td.flags);
> +ohci_die(ohci);
> +return 1;
> +}
>  break;
>  default:
>  trace_usb_ohci_td_bad_direction(dir);
> diff --git a/hw/usb/trace-events b/hw/usb/trace-events
> index ed7dc210d3..fd7b90d70c 100644
> --- a/hw/usb/trace-events
> +++ b/hw/usb/trace-events
> @@ -28,6 +28,7 @@ usb_ohci_iso_td_data_overrun(int ret, ssize_t len) "DataOverrun %d > %zu"
>  usb_ohci_iso_td_data_underrun(int ret) "DataUnderrun %d"
>  usb_ohci_iso_td_nak(int ret) "got NAK/STALL %d"
>  usb_ohci_iso_td_bad_response(int ret) "Bad device response %d"
> +usb_ohci_td_bad_pid(const char *s, uint32_t edf, uint32_t tdf) "Bad pid %s: ed.flags 0x%x td.flags 0x%x"
>  usb_ohci_port_attach(int index) "port #%d"
>  usb_ohci_port_detach(int index) "port #%d"
>  usb_ohci_port_wakeup(int index) "port #%d"
> --
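The new SETUP check relies on the EndpointNumber field of the ED's first dword. The sketch below packs that dword the same way the poc does (MaxPacketSize from bit 16, Direction at bits 11-12, EndpointNumber at bits 7-10, FunctionAddress at bits 0-6; the F, K and S bits are left at 0) and applies the "SETUP only on endpoint 0" rule. The macro and function names are illustrative and are not the ones used in hcd-ohci.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ED dword 0 field positions, matching the shifts used by the poc. */
#define SKETCH_ED_FA_MASK  0x7fu                        /* bits 0-6 */
#define SKETCH_ED_EN_SHIFT 7                            /* bits 7-10 */
#define SKETCH_ED_EN_MASK  (0xfu << SKETCH_ED_EN_SHIFT)
#define SKETCH_ED_D_SHIFT  11                           /* bits 11-12 */

/* Encode dword 0 like the poc (F, K and S bits left at 0). */
static uint32_t ed_encode(uint32_t mps, uint32_t dir, uint32_t ep, uint32_t fa)
{
    return ((mps & 0x7ffu) << 16) |
           ((dir & 0x3u) << SKETCH_ED_D_SHIFT) |
           ((ep & 0xfu) << SKETCH_ED_EN_SHIFT) |
           (fa & SKETCH_ED_FA_MASK);
}

/* Extract the EndpointNumber field. */
static uint32_t ed_endpoint(uint32_t flags)
{
    return (flags & SKETCH_ED_EN_MASK) >> SKETCH_ED_EN_SHIFT;
}

/* Hypothetical helper mirroring the new check: SETUP is only valid on EP 0. */
static bool setup_pid_allowed(uint32_t ed_flags)
{
    return ed_endpoint(ed_flags) == 0;
}
```

With the poc's values (MaxPacket 64, Direction 0, EndPt 1), ed_endpoint() returns 1 and the SETUP is rejected, which is the case the patch now reports via trace_usb_ohci_td_bad_pid() instead of asserting.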

For this patch,

Reviewed-by: Peter Maydell 

Are you happy for me to take this patch and apply it to
target-arm.next with the git Author and Signed-off-by:

Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-05-20 Thread Jonathan Cameron via

On Wed, 1 May 2024 15:29:31 -0700
fan  wrote:

> From 873f59ec06c38645768ada452d9b18920a34723e Mon Sep 17 00:00:00 2001
> From: Fan Ni 
> Date: Tue, 20 Feb 2024 09:48:31 -0800
> Subject: [PATCH] hw/cxl/events: Add qmp interfaces to add/release dynamic
>  capacity extents
> 
> To simulate FM functionalities for initiating Dynamic Capacity Add
> (Opcode 5604h) and Dynamic Capacity Release (Opcode 5605h) as in CXL spec
> r3.1 7.6.7.6.5 and 7.6.7.6.6, we implemented two QMP interfaces to issue
> add/release dynamic capacity extents requests.
> 
> With this change, we only allow releasing an extent when its DPA range
> is contained by a single accepted extent in the device. That is to say,
> extent superset release is not supported yet.
> 
> 1. Add dynamic capacity extents:
> 
> For example, the command to add two contiguous extents (each 128MiB long)
> to region 0 (starting at DPA offset 0) looks like below:
> 
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-add-dynamic-capacity",
>   "arguments": {
>   "path": "/machine/peripheral/cxl-dcd0",
>   "host-id": 0,
>   "selection-policy": 2,
>   "region": 0,
>   "tag": "",
>   "extents": [
>   {
>   "offset": 0,
>   "len": 134217728
>   },
>   {
>   "offset": 134217728,
>   "len": 134217728
>   }
>   ]
>   }
> }
> 
> 2. Release dynamic capacity extents:
> 
> For example, the command to release an extent of size 128MiB from region 0
> (DPA offset 128MiB) looks like below:
> 
> { "execute": "cxl-release-dynamic-capacity",
>   "arguments": {
>   "path": "/machine/peripheral/cxl-dcd0",
>   "host-id": 0,
>   "flags": 1,
>   "region": 0,
>   "tag": "",
>   "extents": [
>   {
>   "offset": 134217728,
>   "len": 134217728
>   }
>   ]
>   }
> }
> 
> Signed-off-by: Fan Ni 
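The release restriction described in the commit message (the requested range must fit inside one already-accepted extent) can be sketched as a containment test. This is a simplified model, not the QEMU implementation: Extent, extent_contains() and release_allowed() are hypothetical, and ranges are treated as byte offsets [start, start + len) within a region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A range [start, start + len); field names are illustrative. */
typedef struct {
    uint64_t start;
    uint64_t len;
} Extent;

/* True if req lies entirely inside ext; written to avoid start + len overflow. */
static bool extent_contains(const Extent *ext, const Extent *req)
{
    return req->start >= ext->start &&
           req->len <= ext->len &&
           req->start - ext->start <= ext->len - req->len;
}

/*
 * A release is accepted only if a single accepted extent covers the
 * whole request; a range spanning several extents is rejected.
 */
static bool release_allowed(const Extent *accepted, size_t n, const Extent *req)
{
    for (size_t i = 0; i < n; i++) {
        if (extent_contains(&accepted[i], req)) {
            return true;
        }
    }
    return false;
}
```

With the two 128 MiB extents from the add example accepted, releasing offset=134217728, len=134217728 passes, while a 128 MiB release starting at 64 MiB straddles both extents and is rejected because no single extent covers it.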

Hi Fan,

A few trivial questions inline.  I don't feel particularly strongly
about breaking up the flags fields, but I'd like to understand your
reasoning for keeping them as single fields?

Is it mainly to keep aligned with the specification or something else?

Thanks,

Jonathan


>  #endif /* CXL_EVENTS_H */
> diff --git a/qapi/cxl.json b/qapi/cxl.json
> index 4281726dec..27cf39f448 100644
> --- a/qapi/cxl.json
> +++ b/qapi/cxl.json
> @@ -361,3 +361,93 @@
>  ##
>  {'command': 'cxl-inject-correctable-error',
>   'data': {'path': 'str', 'type': 'CxlCorErrorType'}}
> +
> +##
> +# @CXLDynamicCapacityExtent:
> +#
> +# A single dynamic capacity extent
> +#
> +# @offset: The offset (in bytes) relative to the start of the region
> +# to which the extent belongs
> +#
> +# @len: The length of the extent in bytes
> +#
> +# Since: 9.1
> +##
> +{ 'struct': 'CXLDynamicCapacityExtent',
> +  'data': {
> +  'offset':'uint64',
> +  'len': 'uint64'
> +  }
> +}
> +
> +##
> +# @cxl-add-dynamic-capacity:
> +#
> +# Command to initiate adding dynamic capacity extents to a host.  It
> +# simulates operations defined in cxl spec r3.1 7.6.7.6.5.
> +#
> +# @path: CXL DCD canonical QOM path
> +#
> +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> +# Table 7-70.
> +#
> +# @selection-policy: The "Selection Policy" bits as defined in
> +# cxl spec r3.1 Table 7-70.  It specifies the policy to use for
> +# selecting which extents comprise the added capacity.

Hmm. This one is defined as a selection of nameable choices.  Perhaps
worth an enum?  If we did do that, we'd also need to break up the
release flags below.


> +#
> +# @region: The "Region Number" field as defined in cxl spec r3.1
> +# Table 7-70.  The dynamic capacity region where the capacity
> +# is being added.  Valid range is from 0-7.
> +#
> +# @tag: The "Tag" field as defined in cxl spec r3.1 Table 7-70.
> +#
> +# @extents: The "Extent List" field as defined in cxl spec r3.1
> +# Table 7-70.
> +#
> +# Since: 9.1
> +##
> +{ 'command': 'cxl-add-dynamic-capacity',
> +  'data': { 'path': 'str',
> +'host-id': 'uint16',
> +'selection-policy': 'uint8',
> +'region': 'uint8',
> +'tag': 'str',
> +'extents': [ 'CXLDynamicCapacityExtent' ]
> +   }
> +}
> +
> +##
> +# @cxl-release-dynamic-capacity:
> +#
> +# Command to initiate releasing dynamic capacity extents from a
> +# host.  It simulates operations defined in cxl spec r3.1 7.6.7.6.6.
> +#
> +# @path: CXL DCD canonical QOM path
> +#
> +# @host-id: The "Host ID" field as defined in cxl spec r3.1
> +# Table 7-71.
> +#
> +# @flags: The "Flags" field as defined in cxl spec r3.1 Table 7-71,
> +# with bit[3:0] for removal policy, bit[4] for forced removal,
> +# bit[5] for sanitize on release, bit[7:6] reserved.

This can be nicely broken up into removal policy enum plus two flags.
It might be worth doing so to give a nicer interface?
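Jonathan's suggestion to split flags can be illustrated by decoding the byte as the QAPI doc describes it: bits [3:0] removal policy, bit 4 forced removal, bit 5 sanitize on release, bits [7:6] reserved. ReleaseFlags and the two helpers are hypothetical; a QAPI version would presumably use an enum plus two booleans rather than a C struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of the release "flags" byte; bits [7:6] are reserved. */
typedef struct {
    uint8_t removal_policy;   /* bits [3:0] */
    bool forced_removal;      /* bit 4 */
    bool sanitize_on_release; /* bit 5 */
} ReleaseFlags;

static ReleaseFlags release_flags_decode(uint8_t flags)
{
    ReleaseFlags f = {
        .removal_policy = flags & 0xf,
        .forced_removal = (flags >> 4) & 1,
        .sanitize_on_release = (flags >> 5) & 1,
    };
    return f;
}

/* Re-encode; reserved bits are always written as 0. */
static uint8_t release_flags_encode(ReleaseFlags f)
{
    return (uint8_t)((f.removal_policy & 0xfu) |
                     (f.forced_removal ? 1u << 4 : 0u) |
                     (f.sanitize_on_release ? 1u << 5 : 0u));
}
```

The "flags": 1 in the release example decodes to removal policy 1 with neither forced removal nor sanitize on release set.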

> +#
> +# @region: The dynamic capacity region where the extents will be
> +#

Re: [PATCH v2 00/15] riscv: QEMU RISC-V IOMMU Support

2024-05-20 Thread Daniel Henrique Barboza





On 5/10/24 08:14, Frank Chang wrote:

Hi Daniel,

Thanks for the upstream work.
Sorry that it took a while for me to review the patchset.

Please let me know if you need any help from us to update the IOMMU model.
We would like to see it merged for QEMU 9.1.0.


Thanks for the help in the reviews!

I'll make some final changes to the riscv-iommu-pci device, and check whether
any DT changes have happened that we need to sync up with.

The plan is to send v3 in the next couple of days. Let's see how it goes.


Thanks,


Daniel




Regards,
Frank Chang

Daniel Henrique Barboza wrote on Fri, Mar 8, 2024 at 12:04 AM:


Hi,

This is the second version of the work Tomasz sent in July 2023 [1].
I'll be helping Tomasz upstream it.

The core emulation code is left unchanged but a few tweaks were made in
v2:

- The most notable difference in this version is that the code was split
   into smaller chunks. Patch 03 is still a 1700-line patch, which is an
   improvement over the 3800-line patch from v1, but we can only go so
   far when splitting the core components of the emulation. The reality
   is that the IOMMU emulation is a rather complex piece of software and
   there's not much we can do to alleviate it;

- I'm not contributing the HPM support that was present in v1. It shaved
   off 600 lines of code from the series, which is already large enough
   as is. We'll introduce HPM in later versions or as a follow-up;

- The riscv-iommu-header.h header was also trimmed. I cut 300 or so
   lines from it, all of them definitions that the emulation isn't
   using. The header will eventually be imported from the Linux
   driver (not upstream yet), so for now we can live with a trimmed
   header for the emulation usage alone;

- I added libqos tests for the riscv-iommu-pci device. The idea of these
   tests is to give us more confidence in the emulation code;

- 'edu' device support. The support was retrieved from Tomasz's EDU branch
   [2]. This device can then be used to test PCI passthrough to exercise
   the IOMMU.


Patches based on alistair/riscv-to-apply.next.

v1 link: 
https://lore.kernel.org/qemu-riscv/cover.1689819031.git.tjezn...@rivosinc.com/

[1] 
https://lore.kernel.org/qemu-riscv/cover.1689819031.git.tjezn...@rivosinc.com/
[2] https://github.com/tjeznach/qemu.git, branch 'riscv_iommu_edu_impl'

Andrew Jones (1):
   hw/riscv/riscv-iommu: Add another irq for mrif notifications

Daniel Henrique Barboza (2):
   test/qtest: add riscv-iommu-pci tests
   qtest/riscv-iommu-test: add init queues test

Tomasz Jeznach (12):
   exec/memtxattr: add process identifier to the transaction attributes
   hw/riscv: add riscv-iommu-bits.h
   hw/riscv: add RISC-V IOMMU base emulation
   hw/riscv: add riscv-iommu-pci device
   hw/riscv: add riscv-iommu-sys platform device
   hw/riscv/virt.c: support for RISC-V IOMMU PCIDevice hotplug
   hw/riscv/riscv-iommu: add Address Translation Cache (IOATC)
   hw/riscv/riscv-iommu: add s-stage and g-stage support
   hw/riscv/riscv-iommu: add ATS support
   hw/riscv/riscv-iommu: add DBG support
   hw/misc: EDU: added PASID support
   hw/misc: EDU: add ATS/PRI capability

  hw/misc/edu.c|  297 -
  hw/riscv/Kconfig |4 +
  hw/riscv/meson.build |1 +
  hw/riscv/riscv-iommu-bits.h  |  407 ++
  hw/riscv/riscv-iommu-pci.c   |  173 +++
  hw/riscv/riscv-iommu-sys.c   |   93 ++
  hw/riscv/riscv-iommu.c   | 2085 ++
  hw/riscv/riscv-iommu.h   |  146 +++
  hw/riscv/trace-events|   15 +
  hw/riscv/trace.h |2 +
  hw/riscv/virt.c  |   33 +-
  include/exec/memattrs.h  |5 +
  include/hw/riscv/iommu.h |   40 +
  meson.build  |1 +
  tests/qtest/libqos/meson.build   |4 +
  tests/qtest/libqos/riscv-iommu.c |   79 ++
  tests/qtest/libqos/riscv-iommu.h |   96 ++
  tests/qtest/meson.build  |1 +
  tests/qtest/riscv-iommu-test.c   |  234 
  19 files changed, 3704 insertions(+), 12 deletions(-)
  create mode 100644 hw/riscv/riscv-iommu-bits.h
  create mode 100644 hw/riscv/riscv-iommu-pci.c
  create mode 100644 hw/riscv/riscv-iommu-sys.c
  create mode 100644 hw/riscv/riscv-iommu.c
  create mode 100644 hw/riscv/riscv-iommu.h
  create mode 100644 hw/riscv/trace-events
  create mode 100644 hw/riscv/trace.h
  create mode 100644 include/hw/riscv/iommu.h
  create mode 100644 tests/qtest/libqos/riscv-iommu.c
  create mode 100644 tests/qtest/libqos/riscv-iommu.h
  create mode 100644 tests/qtest/riscv-iommu-test.c

--
2.43.2

Re: [PATCH] Hexagon: fix HVX store new

2024-05-20 Thread Brian Cain




On 5/20/2024 10:53 AM, Matheus Tavares Bernardino wrote:

At 09a7e7db0f (Hexagon (target/hexagon) Remove uses of
op_regs_generated.h.inc, 2024-03-06), we've changed the logic of
check_new_value() to use the new pre-calculated
packet->insn[...].dest_idx instead of calculating the index on the fly
using opcode_reginfo[...]. The dest_idx index is calculated roughly like
the following:

    for reg in iset[tag]["syntax"]:
        if reg.is_written():
            dest_idx = regno
            break

Thus, we take the first register that is writable. Before that,
however, we also used to follow an alphabetical order on the register
type: 'd', 'e', 'x', and 'y'. No longer following that makes us select
the wrong register index and the HVX store new instruction does not
update the memory as expected.

Signed-off-by: Matheus Tavares Bernardino 
---



Reviewed-by: Brian Cain 



  tests/tcg/hexagon/hvx_misc.c  | 23 +++
  target/hexagon/gen_trans_funcs.py |  9 ++---
  2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/tests/tcg/hexagon/hvx_misc.c b/tests/tcg/hexagon/hvx_misc.c
index 1fe14b5158..90c3733da0 100644
--- a/tests/tcg/hexagon/hvx_misc.c
+++ b/tests/tcg/hexagon/hvx_misc.c
@@ -474,6 +474,27 @@ static void test_vcombine(void)
  check_output_w(__LINE__, BUFSIZE);
  }
  
+void test_store_new()
+{
+asm volatile(
+"r0 = #0x12345678\n"
+"v0 = vsplat(r0)\n"
+"r0 = #0xff00ff00\n"
+"v1 = vsplat(r0)\n"
+"{\n"
+"   vdeal(v1,v0,r0)\n"
+"   vmem(%0) = v0.new\n"
+"}\n"
+:
+: "r"([0])
+: "r0", "v0", "v1", "memory"
+);
+for (int i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
+expect[0].w[i] = 0x12345678;
+}
+check_output_w(__LINE__, 1);
+}
+
  int main()
  {
  init_buffers();
@@ -515,6 +536,8 @@ int main()
  
  test_vcombine();
  
+test_store_new();
+
  puts(err ? "FAIL" : "PASS");
  return err ? 1 : 0;
  }
diff --git a/target/hexagon/gen_trans_funcs.py b/target/hexagon/gen_trans_funcs.py
index 9f86b4edbd..30f0c73e0c 100755
--- a/target/hexagon/gen_trans_funcs.py
+++ b/target/hexagon/gen_trans_funcs.py
@@ -89,6 +89,7 @@ def gen_trans_funcs(f):
  
  new_read_idx = -1
  dest_idx = -1
+dest_idx_reg_id = None
  has_pred_dest = "false"
  for regno, (reg_type, reg_id, *_) in enumerate(regs):
  reg = hex_common.get_register(tag, reg_type, reg_id)
@@ -97,9 +98,11 @@ def gen_trans_funcs(f):
  """))
  if reg.is_read() and reg.is_new():
  new_read_idx = regno
-# dest_idx should be the first destination, so check for -1
-if reg.is_written() and dest_idx == -1:
-dest_idx = regno
+if reg.is_written():
+# dest_idx should be the first destination alphabetically
+if dest_idx_reg_id is None or reg_id < dest_idx_reg_id:
+dest_idx = regno
+dest_idx_reg_id = reg_id
  if reg_type == "P" and reg.is_written() and not reg.is_read():
  has_pred_dest = "true"
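The difference between the old and the fixed dest_idx selection can be modelled in a few lines of C (the generator itself is Python). The operand list is an assumption based on the test's vdeal(v1,v0,r0): a 'y' vector register and an 'x' vector register, both written, followed by a read-only 't' scalar register. Operand and both functions are hypothetical stand-ins for the loop in gen_trans_funcs.py.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One operand as listed in the instruction syntax (illustrative type). */
typedef struct {
    char reg_type; /* e.g. 'V' or 'R' */
    char reg_id;   /* e.g. 'y', 'x', 't' */
    bool written;
} Operand;

/* Old behaviour: first written operand in syntax order. */
static int dest_idx_first(const Operand *ops, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ops[i].written) {
            return (int)i;
        }
    }
    return -1;
}

/* Patched behaviour: written operand whose reg_id sorts first (ties keep the earlier one). */
static int dest_idx_alpha(const Operand *ops, size_t n)
{
    int idx = -1;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].written && (idx < 0 || ops[i].reg_id < ops[idx].reg_id)) {
            idx = (int)i;
        }
    }
    return idx;
}
```

Under that assumption the old rule picks index 0 ('y', i.e. v1), while the patched rule picks index 1 ('x', i.e. v0), which is the register that v0.new in the test refers to.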

Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation

2024-05-20 Thread Daniel Henrique Barboza


Hi Frank,

On 5/16/24 04:13, Frank Chang wrote:

On Mon, May 13, 2024 at 8:37 PM Daniel Henrique Barboza mailto:dbarb...@ventanamicro.com>> wrote:

Hi Frank,


On 5/8/24 08:15, Daniel Henrique Barboza wrote:
 > Hi Frank,
 >
 > I'll reply with what I've done so far. Still missing some stuff:
 >
 > On 5/2/24 08:37, Frank Chang wrote:
 >> Hi Daniel,
 >>
 >> Daniel Henrique Barboza mailto:dbarb...@ventanamicro.com>> 於 2024年3月8日 週五 上午12:04寫道：
 >>>
 >>> From: Tomasz Jeznach mailto:tjezn...@rivosinc.com>>
 >>>
 >>> The RISC-V IOMMU specification is now ratified as-per the RISC-V
 >>> international process. The latest frozen specification can be found
 >>> at:
 >>>
 >>> 
https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf 

 >>>
 >>> Add the foundation of the device emulation for RISC-V IOMMU, which
 >>> includes an IOMMU that has no capabilities but MSI interrupt support and
 >>> fault queue interfaces. We'll add more features incrementally in the
 >>> next patches.
 >>>
 >>> Co-developed-by: Sebastien Boeuf mailto:s...@rivosinc.com>>
 >>> Signed-off-by: Sebastien Boeuf mailto:s...@rivosinc.com>>
 >>> Signed-off-by: Tomasz Jeznach mailto:tjezn...@rivosinc.com>>
 >>> Signed-off-by: Daniel Henrique Barboza mailto:dbarb...@ventanamicro.com>>
 >>> ---
 >>>   hw/riscv/Kconfig |    4 +

(...)

 >>> +
 >>> +    s->iommus.le_next = NULL;
 >>> +    s->iommus.le_prev = NULL;
 >>> +    QLIST_INIT(>spaces);
 >>> +    qemu_cond_init(>core_cond);
 >>> +    qemu_mutex_init(>core_lock);
 >>> +    qemu_spin_init(>regs_lock);
 >>> +    qemu_thread_create(>core_proc, "riscv-iommu-core",
 >>> +    riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
 >>
 >> In our experience, using QEMU thread increases the latency of command
 >> queue processing,
 >> which leads to the potential IOMMU fence timeout in the Linux driver
 >> when using IOMMU with KVM,
 >> e.g. booting the guest Linux.
 >>
 >> Is it possible to remove the thread from the IOMMU just like ARM, AMD,
 >> and Intel IOMMU models?
 >
 > Interesting. We've been using this emulation internally in Ventana, with
 > KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
 > more about it.
 >
 > That said, I don't mind this change, assuming it's feasible to make it for
 > this first version.  I'll need to check how other IOMMUs are doing it.


I removed the threading and it seems to be working fine without it. I'll
commit this change for v3.

 >
 >
 >
 >>
 >>> +}
 >>> +
 >
 > (...)
 >
 >>> +
 >>> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
 >>> +{
 >>> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
 >>> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
 >>> +    AddressSpace *as = NULL;
 >>> +
 >>> +    if (pdev && pci_is_iommu(pdev)) {
 >>> +    return s->target_as;
 >>> +    }
 >>> +
 >>> +    /* Find first registered IOMMU device */
 >>> +    while (s->iommus.le_prev) {
 >>> +    s = *(s->iommus.le_prev);
 >>> +    }
 >>> +
 >>> +    /* Find first matching IOMMU */
 >>> +    while (s != NULL && as == NULL) {
 >>> +    as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
 >>
 >> For pci_bus_num(),
 >> riscv_iommu_find_as() can be called at the very early stage
 >> where software has no chance to enumerate the bus numbers.

I investigated and this doesn't seem to be a problem. This function is called
at the last step of both riscv_iommu_pci_realize() and
riscv_iommu_sys_realize(), and by that time pci_bus_num() is already assigned.
Other IOMMUs use pci_bus_num() in their own get_address_space() callbacks
like this too.


Hi Daniel,

IIUC, pci_bus_num() by default is assigned to pcibus_num():

static int pcibus_num(PCIBus *bus)
{
     if (pci_bus_is_root(bus)) {
         return 0; /* pci host bridge */
     }
     return bus->parent_dev->config[PCI_SECONDARY_BUS];
}

If the bus is not the root bus, it reads the secondary bus number
(PCI_SECONDARY_BUS) field from the PCI configuration space of the bus's
parent device. This field is meant to be programmed by SW during PCIe
enumeration. But I don't think SW has a chance to execute before
riscv_iommu_sys_realize() is called, since that happens well before the CPUs
start executing, unless the RISC-V IOMMU is hot-plugged. Even if the RISC-V
IOMMU is hot-plugged, I think riscv_iommu_sys_realize() is still called
before SW is aware of the IOMMU's existence in the PCI topology tree.
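Frank's point can be reproduced with a toy model of the quoted pcibus_num(): until firmware writes each bridge's Secondary Bus Number register, every non-root bus reports the same number, so PCI_BUILD_BDF() cannot tell devices behind different bridges apart. ToyBus, ToyBridge and the helpers are hypothetical; the sketch assumes the register reads as 0 before enumeration and uses the standard Secondary Bus Number offset (0x19) of a type 1 configuration header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Secondary Bus Number offset in a type 1 (bridge) config header. */
#define SKETCH_PCI_SECONDARY_BUS 0x19

typedef struct {
    uint8_t config[256]; /* bridge config space; assumed zeroed at reset */
} ToyBridge;

typedef struct {
    bool is_root;
    ToyBridge *parent_dev; /* bridge that owns this bus (NULL for the root) */
} ToyBus;

/* Same logic as the quoted pcibus_num(). */
static int toy_bus_num(const ToyBus *bus)
{
    if (bus->is_root) {
        return 0;
    }
    return bus->parent_dev->config[SKETCH_PCI_SECONDARY_BUS];
}

/* Same packing as PCI_BUILD_BDF(bus, devfn). */
static uint16_t toy_bdf(const ToyBus *bus, uint8_t devfn)
{
    return (uint16_t)((toy_bus_num(bus) << 8) | devfn);
}
```

Before enumeration, devfn 0x08 behind two different bridges yields the same BDF; once firmware programs secondary bus numbers 1 and 2, the BDFs become 0x0108 and 0x0208.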

Do you think this

Re: [RFC v2 2/2] hw/riscv: Add server platform reference machine

2024-05-20 Thread Andrew Jones

On Tue, Mar 12, 2024 at 09:52:21PM GMT, Fei Wu wrote:
> The RISC-V Server Platform specification[1] defines a standardized set
> of hardware and software capabilities that portable system software,
> such as OSes and hypervisors, can rely on being present in a RISC-V server
> platform.
> 
> A corresponding QEMU RISC-V server platform reference (rvsp-ref for
> short) machine type is added to provide an environment for firmware/OS
> development and testing. The main features included in rvsp-ref are:
> 
>  - Based on riscv virt machine type
>  - A new memory map as close to the virt machine's as possible
>  - A new virt CPU type rvsp-ref-cpu for server platform compliance
>  - AIA
>  - PCIe AHCI
>  - PCIe NIC

We should rebase on the IOMMU series [1] and add an IOMMU to the
platform, as it's required by the Server Soc spec (which is required
by the server platform spec).

[1] 
https://lore.kernel.org/qemu-devel/20240307160319.675044-1-dbarb...@ventanamicro.com/

Thanks,
drew

[PATCH] Hexagon: fix HVX store new

2024-05-20 Thread Matheus Tavares Bernardino

At 09a7e7db0f (Hexagon (target/hexagon) Remove uses of
op_regs_generated.h.inc, 2024-03-06), we've changed the logic of
check_new_value() to use the new pre-calculated
packet->insn[...].dest_idx instead of calculating the index on the fly
using opcode_reginfo[...]. The dest_idx index is calculated roughly like
the following:

    for reg in iset[tag]["syntax"]:
        if reg.is_written():
            dest_idx = regno
            break

Thus, we take the first register that is writable. Before that,
however, we also used to follow an alphabetical order on the register
type: 'd', 'e', 'x', and 'y'. No longer following that makes us select
the wrong register index and the HVX store new instruction does not
update the memory as expected.

Signed-off-by: Matheus Tavares Bernardino 
---
 tests/tcg/hexagon/hvx_misc.c  | 23 +++
 target/hexagon/gen_trans_funcs.py |  9 ++---
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/tests/tcg/hexagon/hvx_misc.c b/tests/tcg/hexagon/hvx_misc.c
index 1fe14b5158..90c3733da0 100644
--- a/tests/tcg/hexagon/hvx_misc.c
+++ b/tests/tcg/hexagon/hvx_misc.c
@@ -474,6 +474,27 @@ static void test_vcombine(void)
 check_output_w(__LINE__, BUFSIZE);
 }
 
+void test_store_new()
+{
+asm volatile(
+"r0 = #0x12345678\n"
+"v0 = vsplat(r0)\n"
+"r0 = #0xff00ff00\n"
+"v1 = vsplat(r0)\n"
+"{\n"
+"   vdeal(v1,v0,r0)\n"
+"   vmem(%0) = v0.new\n"
+"}\n"
+:
+: "r"([0])
+: "r0", "v0", "v1", "memory"
+);
+for (int i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
+expect[0].w[i] = 0x12345678;
+}
+check_output_w(__LINE__, 1);
+}
+
 int main()
 {
 init_buffers();
@@ -515,6 +536,8 @@ int main()
 
 test_vcombine();
 
+test_store_new();
+
 puts(err ? "FAIL" : "PASS");
 return err ? 1 : 0;
 }
diff --git a/target/hexagon/gen_trans_funcs.py b/target/hexagon/gen_trans_funcs.py
index 9f86b4edbd..30f0c73e0c 100755
--- a/target/hexagon/gen_trans_funcs.py
+++ b/target/hexagon/gen_trans_funcs.py
@@ -89,6 +89,7 @@ def gen_trans_funcs(f):
 
 new_read_idx = -1
 dest_idx = -1
+dest_idx_reg_id = None
 has_pred_dest = "false"
 for regno, (reg_type, reg_id, *_) in enumerate(regs):
 reg = hex_common.get_register(tag, reg_type, reg_id)
@@ -97,9 +98,11 @@ def gen_trans_funcs(f):
 """))
 if reg.is_read() and reg.is_new():
 new_read_idx = regno
-# dest_idx should be the first destination, so check for -1
-if reg.is_written() and dest_idx == -1:
-dest_idx = regno
+if reg.is_written():
+# dest_idx should be the first destination alphabetically
+if dest_idx_reg_id is None or reg_id < dest_idx_reg_id:
+dest_idx = regno
+dest_idx_reg_id = reg_id
 if reg_type == "P" and reg.is_written() and not reg.is_read():
 has_pred_dest = "true"
 
-- 
2.37.2
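
The dest_idx selection rule this patch restores (among written operands,
prefer the alphabetically first register letter rather than the first one in
syntax order) can be sketched in standalone C. The struct and function names
are hypothetical, operand letters are modelled as single chars, and the
example operand list below is illustrative, not taken from a specific
instruction's encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one operand parsed from an instruction's syntax. */
struct reg_operand {
    char reg_id;      /* operand letter, e.g. 'd', 'e', 'x', 'y', 't' */
    bool is_written;  /* true if the instruction writes this operand */
};

/* The rule introduced by 09a7e7db0f (the regression): first written operand. */
static int pick_first_written(const struct reg_operand *regs, size_t n)
{
    for (size_t regno = 0; regno < n; regno++) {
        if (regs[regno].is_written) {
            return (int)regno;
        }
    }
    return -1;
}

/*
 * The rule the patch restores: among written operands, pick the one whose
 * reg_id sorts first alphabetically. Returns -1 if nothing is written.
 */
static int pick_dest_idx(const struct reg_operand *regs, size_t n)
{
    int dest_idx = -1;
    char best_id = 0;

    for (size_t regno = 0; regno < n; regno++) {
        if (!regs[regno].is_written) {
            continue;
        }
        if (dest_idx == -1 || regs[regno].reg_id < best_id) {
            dest_idx = (int)regno;
            best_id = regs[regno].reg_id;
        }
    }
    return dest_idx;
}
```

With an operand list ordered y, x, t where y and x are both written, the
regressed rule selects index 0 ('y') while the restored rule selects index 1
('x').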

Re: [Semihosting Tests PATCH 3/3] add SYS_GET_CMDLINE test

2024-05-20 Thread Peter Maydell

On Mon, 13 May 2024 at 12:35, Alex Bennée  wrote:
>
> We actually had the stubs to implement this. The main pain is getting
> the binary name into the program so we can validate the result.

Could you write the commit message so that it makes sense
without reading the Subject line, please ?

> index 5df95f3..268a9d9 100644
> --- a/usertest.c
> +++ b/usertest.c
> @@ -315,6 +315,26 @@ static int test_feature_detect(void)
>  return 0;
>  }
>
> +static int test_cmdline(void)
> +{
> +char cmdline[256];
> +int actual;
> +const char *s, *c;
> +
> +if (semi_get_cmdline([0], sizeof(cmdline), )) {
> +semi_write0("FAIL could recover command line\n");

"couldn't", I guess.

> +return 1;
> +}
> +
> +if (strcmp([0], BINARY_NAME) != 0) {

Why "[0]" and not just "cmdline" ?

> +semi_write0("FAIL unexpected command line:");

Space after the colon will make the error message a bit
more neatly formatted.

> +semi_write0([0]);

Missing "return 1" ?

> +}

Is it worth testing that the length value returned
by the semihosting function matches the length of
the string?

> +
> +semi_write0("PASS command line test\n");
> +return 0;
> +}
> +
>  int main(void)
>  {
>  void *bufp;
> @@ -366,6 +386,10 @@ int main(void)
>  return 1;
>  }
>
> +if (test_cmdline()) {
> +return 1;
> +}
> +
>  semi_write0("ALL TESTS PASSED\n");
>
>  /* If we have EXIT_EXTENDED then use it */
> --
> 2.39.2

thanks
-- PMM
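
A minimal sketch of the extra length check suggested above, written as a
standalone helper so it can be tested off-target. check_cmdline_result() is a
hypothetical name, and the sketch assumes the length reported by
SYS_GET_CMDLINE excludes the terminating NUL; confirm that against the Arm
semihosting specification before relying on it.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: validate what a SYS_GET_CMDLINE-style call returned.
 * buf is the NUL-terminated command line, reported_len the length value the
 * call wrote back, expected the binary name the test was built with.
 * Returns 0 on success, 1 on any mismatch.
 */
static int check_cmdline_result(const char *buf, int reported_len,
                                const char *expected)
{
    if (strcmp(buf, expected) != 0) {
        return 1;   /* wrong contents */
    }
    if (reported_len < 0 || (size_t)reported_len != strlen(buf)) {
        return 1;   /* reported length disagrees with the string returned */
    }
    return 0;
}
```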

Re: [Semihosting Tests PATCH 2/3] update includes for bare metal compiling

2024-05-20 Thread Peter Maydell

On Mon, 13 May 2024 at 12:35, Alex Bennée  wrote:
>
> We shouldn't use  for our own implementation. Also the base
> types we need live in  as  doesn't exist for the
> bare metal compilers.
>
> Signed-off-by: Alex Bennée 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [Semihosting Tests PATCH 1/3] .editorconfig: add code conventions for tooling

2024-05-20 Thread Peter Maydell

On Mon, 13 May 2024 at 12:35, Alex Bennée  wrote:
>
> It's a pain when you come back to a code base you haven't touched in a
> while and realise whatever indent settings you were using have
> carried over. Add an editorconfig and be done with it.
>
> Signed-off-by: Alex Bennée 
> ---
>  .editorconfig | 28 
>  1 file changed, 28 insertions(+)
>  create mode 100644 .editorconfig
>
> diff --git a/.editorconfig b/.editorconfig
> new file mode 100644
> index 000..e1540ae
> --- /dev/null
> +++ b/.editorconfig
> @@ -0,0 +1,28 @@
> +# EditorConfig is a file format and collection of text editor plugins
> +# for maintaining consistent coding styles between different editors
> +# and IDEs. Most popular editors support this either natively or via
> +# plugin.
> +#
> +# Check https://editorconfig.org for details.
> +#
> +# Emacs: you need https://github.com/10sr/editorconfig-custom-majormode-el
> +# to automatically enable the appropriate major-mode for your files
> +# that aren't already caught by your existing config.
> +#
> +
> +root = true
> +
> +[*]
> +end_of_line = lf
> +insert_final_newline = true
> +charset = utf-8
> +
> +[Makefile*]
> +indent_style = tab
> +indent_size = 8
> +emacs_mode = makefile
> +
> +[*.{c,h}]
> +indent_style = space
> +indent_size = 4
> +emacs_mode = c

The QEMU .editorconfig has a stanza for .s/.S files too:

[*.{s,S}]
indent_style = tab
indent_size = 8
emacs_mode = asm

thanks
-- PMM

Re: [PATCH] Fixes: Indentation using spaces instead of TABS and improve formatting

2024-05-20 Thread Tanmay

Sure! Thanks for the update.

~ Tanmay

On Mon, 20 May, 2024, 6:32 pm Peter Maydell wrote:

> On Wed, 8 May 2024 at 09:15, Tanmay Patil wrote:
> >
> > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/373
> >
> > Files changed:
> > - hw/arm/boot.c
> > - hw/char/omap_uart.c
> > - hw/gpio/zaurus.c
> > - hw/input/tsc2005.c
> >
> > Signed-off-by: Tanmay Patil 
>
> Thanks for this patch; I've applied it to my target-arm.next
> queue and it will get upstream within the next week or so.
> (I tweaked the commit message format a bit.)
>
> -- PMM
>

Re: [PATCH 0/4] Check clock connection between STM32L4x5 RCC and peripherals

2024-05-20 Thread Peter Maydell

On Tue, 7 May 2024 at 19:59, Inès Varhol  wrote:
>
> Among implemented STM32L4x5 devices, USART, GPIO and SYSCFG
> have a clock source, but none has a corresponding test in QEMU.
>
> This patch makes sure that all 3 devices create a clock,
> have a QOM property to access the clock frequency,
> and adds QTests checking that clock enable in RCC has the
> expected results.
>
> Philippe Mathieu-Daudé suggested the following:
> ".. We could add the clock properties
> directly in qdev_init_clock_in(). Seems useful for the QTest
> framework."
>
> However, Peter Maydell pointed out the following:
> "...Mostly "frequency" properties on devices are for the case
> where they *don't* have a Clock input and instead have
> ad-hoc legacy handling where the board/SoC that creates the
> device sets an integer property to define the input frequency
> because it doesn't model the clock tree with Clock objects."
>
> You both agree that replicating the code in the
> different devices is a bad idea; what should the
> alternative be?

I think we should use the approach discussed in the review
comments on Philippe's patch
https://patchew.org/QEMU/20240508141333.44610-1-phi...@linaro.org/
where if we're running a qtest then the core clock code creates a
QOM property which is the clock period; the test code can then use
that.

thanks
-- PMM

Re: [PATCH] hw/clock: Expose 'freq-hz' QOM property

2024-05-20 Thread Peter Maydell

On Wed, 8 May 2024 at 22:27, Philippe Mathieu-Daudé  wrote:
>
> On 8/5/24 19:46, Peter Maydell wrote:
> > On Wed, 8 May 2024 at 15:13, Philippe Mathieu-Daudé wrote:
> >>
> >> Expose the clock frequency via the QOM 'freq-hz' property,
> >> as it might be useful for QTests.
> >>
> >> HMP example:
> >>
> >>$ qemu-system-mips -S -monitor stdio -M mipssim
> >>(qemu) qom-get /machine/cpu-refclk freq-hz
> >>1200
> >>
> >> Inspired-by: Inès Varhol 
> >> Signed-off-by: Philippe Mathieu-Daudé 
> >
> > So I have a couple of thoughts here:
> >
> > (1) if this is intended for qtests, would exposing the period (i.e.
> > QOM equivalent of clock_get() rather than clock_get_hz()) be better?
> > A Hz figure has rounding so it's not as accurate.
>
> Indeed, simpler to compare from QTest perspective.
>
> > (2) We should document this in clocks.rst; I guess we want to say
> > "only intended for use in qtests" (i.e. if you're part of QEMU
> > use the existing function interface, not this).
>
> OK, and we can also only expose this for QTest using:
>
>if (qtest_enabled()) {
>object_property_add(obj, "[qtest-]clock-period", ...);
>}

Yes, that seems reasonable. (I don't know if we have any other
qtest-only properties but I don't see any reason why we
shouldn't have them if we want to expose stuff for tests only.)

thanks
-- PMM
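
To illustrate point (1) above, that a Hz figure loses precision compared with
the period, here is a standalone C sketch. It assumes the convention of
QEMU's include/hw/clock.h as we recall it (clock periods stored in units of
2^-32 ns); the PERIOD_* macros and periods_alias_in_hz() are local,
hypothetical re-implementations, not QEMU's own definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Periods in units of 2^-32 ns: one second is 10^9 << 32 such units. */
#define NSEC_PER_SEC        1000000000ULL
#define PERIOD_1SEC         (NSEC_PER_SEC << 32)
#define PERIOD_FROM_HZ(hz)  ((hz) != 0 ? PERIOD_1SEC / (hz) : 0u)
#define PERIOD_TO_HZ(per)   ((per) != 0 ? PERIOD_1SEC / (per) : 0u)

/* True if two distinct periods report the same (integer) Hz value. */
static int periods_alias_in_hz(uint64_t a, uint64_t b)
{
    return a != b && PERIOD_TO_HZ(a) == PERIOD_TO_HZ(b);
}
```

The period for exactly 1000 Hz and the period one unit shorter both report
1000 Hz, so a qtest comparing Hz values could not tell them apart, whereas
comparing the period property would.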

Re: [PATCH 1/3] hw/misc: In STM32L4x5 EXTI, consolidate 2 constants

2024-05-20 Thread Peter Maydell

On Sun, 12 May 2024 at 11:20, Inès Varhol  wrote:
>
> Up until now, the EXTI implementation had 16 inbound GPIOs connected to
> the 16 outbound GPIOs of STM32L4x5 SYSCFG.
> The EXTI actually handles 40 lines (namely 5 from STM32L4x5 USART
> devices which are already implemented in QEMU).
> In order to connect USART devices to EXTI, this commit consolidates
> constants `EXTI_NUM_INTERRUPT_OUT_LINES` (40) and
> `EXTI_NUM_GPIO_EVENT_IN_LINES` (16) into `EXTI_NUM_LINES` (40).
>
> Signed-off-by: Inès Varhol 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH 3/3] hw/arm: In STM32L4x5 SOC, connect USART devices to EXTI

2024-05-20 Thread Peter Maydell

On Sun, 12 May 2024 at 11:20, Inès Varhol  wrote:
>
> The USART devices were previously connecting their outbound IRQs
> directly to the CPU because the EXTI wasn't handling direct line
> interrupts.
> Now the USART connects to the EXTI inbound GPIOs, and the EXTI connects
> its IRQs to the CPU.
> The existing QTest for the USART (tests/qtest/stm32l4x5_usart-test.c)
> checks that USART1_IRQ in the CPU is pending when expected so it
> confirms that the connection through the EXTI still works.
>
> Signed-off-by: Inès Varhol 

Reviewed-by: Peter Maydell 

thanks
-- PMM

Re: [PATCH 2/3] hw/misc: In STM32L4x5 EXTI, handle direct line interrupts

2024-05-20 Thread Peter Maydell

On Sun, 12 May 2024 at 11:20, Inès Varhol  wrote:
>
> The previous implementation for EXTI interrupts only handled
> "configurable" interrupts, like those originating from STM32L4x5 SYSCFG
> (the only device currently connected to the EXTI up until now).
> In order to connect STM32L4x5 USART to the EXTI, this commit adds
> handling for direct interrupts (interrupts without configurable edge),
> as well as a comment that will be useful to connect other devices to the
> EXTI.
>
> Signed-off-by: Inès Varhol 
> ---
>  hw/misc/stm32l4x5_exti.c | 23 ++-
>  1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/hw/misc/stm32l4x5_exti.c b/hw/misc/stm32l4x5_exti.c
> index eebefc6cd3..1817bbdad2 100644
> --- a/hw/misc/stm32l4x5_exti.c
> +++ b/hw/misc/stm32l4x5_exti.c
> @@ -106,6 +106,27 @@ static void stm32l4x5_exti_set_irq(void *opaque, int irq, int level)
>  return;
>  }
>
> +/* In case of a direct line interrupt */
> +if (extract32(exti_romask[bank], irq, 1)) {
> +if (level) {
> +qemu_irq_raise(s->irq[oirq]);
> +} else {
> +qemu_irq_lower(s->irq[oirq]);
> +}

You can say this more concisely as
   qemu_set_irq(s->irq[oirq], level);

> +return;
> +}
> +
> +/*
> + * In case of a configurable interrupt
> + *
> + * Note that while the real EXTI uses edge detection to tell
> + * apart a line rising (the level changes from 0 to 1) and a line
> + * staying high (the level was 1 and is set to 1), the current
> + * implementation relies on the fact that this handler will only
> + * be called when there's a level change. That means that the
> + * devices creating a configurable interrupt (like STM32L4x5 GPIO)
> + * have to set their IRQs only on a change.
> + */

You cannot rely on this in QEMU's qemu_irq API. The set
function may be called multiple times with the same input
level value. If you need to detect rising edges then this
device needs to have a state field that records the current
value so it can compare the 'level' argument here against
what it was previously.

>  if (((1 << irq) & s->rtsr[bank]) && level) {
>  /* Rising Edge */
>  s->pr[bank] |= 1 << irq;
> @@ -116,7 +137,7 @@ static void stm32l4x5_exti_set_irq(void *opaque, int irq, int level)
>  qemu_irq_pulse(s->irq[oirq]);
>  }
>  /*
> - * In the following situations :
> + * In the following situations (for configurable interrupts) :
>   * - falling edge but rising trigger selected
>   * - rising edge but falling trigger selected
>   * - no trigger selected

thanks
-- PMM
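
A standalone sketch of the state-tracking approach described above: the
device remembers the last level of each input line and only treats a call as
an edge when the level actually differs, so repeated set calls with the same
level are harmless. The struct and function names are hypothetical, and the
sketch models a single 32-line bank without QEMU's qemu_irq plumbing. (For
the direct-line path, qemu_set_irq(s->irq[oirq], level) already forwards the
level unchanged.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-bank EXTI state, including the last level of each line. */
struct exti_bank {
    uint32_t rtsr;    /* rising trigger selection */
    uint32_t ftsr;    /* falling trigger selection */
    uint32_t pr;      /* pending register */
    uint32_t levels;  /* last input level seen on each line */
};

/* Returns true if this call latched the line as pending. */
static bool exti_bank_set_line(struct exti_bank *b, int line, int level)
{
    uint32_t bit = 1u << line;
    bool old_level = (b->levels & bit) != 0;
    bool new_level = level != 0;

    if (old_level == new_level) {
        return false;            /* same level as before: no edge */
    }
    if (new_level) {
        b->levels |= bit;
    } else {
        b->levels &= ~bit;
    }
    if ((new_level && (b->rtsr & bit)) || (!new_level && (b->ftsr & bit))) {
        b->pr |= bit;            /* selected edge seen: latch as pending */
        return true;
    }
    return false;
}
```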

Re: [PATCH] docs/system: Remove ADC from raspi documentation

2024-05-20 Thread Peter Maydell

On Sun, 12 May 2024 at 10:00, Rayhan Faizel  wrote:
>
> None of the RPi boards have ADC on-board. In real life, an external ADC chip
> is required to operate on analog signals.
>
> Signed-off-by: Rayhan Faizel 



Applied to target-arm.next, thanks.

-- PMM

[PATCH] docs/system/target-arm: Re-alphabetize board list

2024-05-20 Thread Peter Maydell

The board list in target-arm.rst is supposed to be in alphabetical
order by the title text of each file (which is not the same as
alphabetical order by filename).  A few items had got out of order;
correct them.

The entry for
"Facebook Yosemite v3.5 Platform and CraterLake Server (fby35)"
remains out-of-order, because this is not its own file
but is currently part of the aspeed.rst file.

Signed-off-by: Peter Maydell 
---
 docs/system/target-arm.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index c9d7c0dda7e..870d30e3502 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -86,16 +86,16 @@ undocumented; you can get a complete list by running
arm/bananapi_m2u.rst
arm/b-l475e-iot01a.rst
arm/sabrelite
+   arm/highbank
arm/digic
arm/cubieboard
arm/emcraft-sf2
-   arm/highbank
arm/musicpal
arm/gumstix
arm/mainstone
arm/kzm
-   arm/nrf
arm/nseries
+   arm/nrf
arm/nuvoton
arm/imx25-pdk
arm/orangepi
@@ -107,8 +107,8 @@ undocumented; you can get a complete list by running
arm/stellaris
arm/stm32
arm/virt
-   arm/xlnx-versal-virt
arm/xenpvh
+   arm/xlnx-versal-virt
 
 Emulated CPU architecture support
 =
-- 
2.34.1

[PATCH v2 5/6] vhost, vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits

2024-05-20 Thread Jonah Palmer via

Add support for the VIRTIO_F_IN_ORDER feature across a variety of vhost
devices.

The inclusion of VIRTIO_F_IN_ORDER in the feature bits arrays for these
devices ensures that the backend is capable of offering and providing
support for this feature, and that it can be disabled if the backend
does not support it.

Tested-by: Lei Yang 
Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c| 1 +
 hw/net/vhost_net.c   | 2 ++
 hw/scsi/vhost-scsi.c | 1 +
 hw/scsi/vhost-user-scsi.c| 1 +
 hw/virtio/vhost-user-fs.c| 1 +
 hw/virtio/vhost-user-vsock.c | 1 +
 net/vhost-vdpa.c | 1 +
 7 files changed, 8 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index 9e6bbc6950..1dd0a8ef63 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -51,6 +51,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index fd1a93701a..eb0b1c06e5 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -48,6 +48,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_HASH_REPORT,
 VHOST_INVALID_FEATURE_BIT
 };
@@ -76,6 +77,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_HASH_REPORT,
 VIRTIO_NET_F_GUEST_USO4,
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index ae26bc19a4..40e7630191 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -38,6 +38,7 @@ static const int kernel_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c
index a63b1f4948..1d59951ab7 100644
--- a/hw/scsi/vhost-user-scsi.c
+++ b/hw/scsi/vhost-user-scsi.c
@@ -36,6 +36,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_SCSI_F_HOTPLUG,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index cca2cd41be..9243dbb128 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -33,6 +33,7 @@ static const int user_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_IOMMU_PLATFORM,
 VIRTIO_F_RING_RESET,
+VIRTIO_F_IN_ORDER,
 
 VHOST_INVALID_FEATURE_BIT
 };
diff --git a/hw/virtio/vhost-user-vsock.c b/hw/virtio/vhost-user-vsock.c
index 9431b9792c..cc7e4e47b4 100644
--- a/hw/virtio/vhost-user-vsock.c
+++ b/hw/virtio/vhost-user-vsock.c
@@ -21,6 +21,7 @@ static const int user_feature_bits[] = {
 VIRTIO_RING_F_INDIRECT_DESC,
 VIRTIO_RING_F_EVENT_IDX,
 VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_F_IN_ORDER,
 VHOST_INVALID_FEATURE_BIT
 };
 
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 85e73dd6a7..ed3185acfa 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -62,6 +62,7 @@ const int vdpa_feature_bits[] = {
 VIRTIO_F_RING_PACKED,
 VIRTIO_F_RING_RESET,
 VIRTIO_F_VERSION_1,
+VIRTIO_F_IN_ORDER,
 VIRTIO_NET_F_CSUM,
 VIRTIO_NET_F_CTRL_GUEST_OFFLOADS,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
-- 
2.39.3
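
The commit message's point that listing VIRTIO_F_IN_ORDER lets it be masked
off when the backend lacks it can be sketched in C. mask_by_backend() and
FEATURE_LIST_END are hypothetical names; the loop is modelled on QEMU's
vhost_get_features() as we understand it, and only the feature bit numbers
are taken from the VIRTIO specification.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers from the VIRTIO specification. */
#define VIRTIO_F_VERSION_1   32
#define VIRTIO_F_IN_ORDER    35
/* Local terminator standing in for QEMU's VHOST_INVALID_FEATURE_BIT. */
#define FEATURE_LIST_END     (-1)

/*
 * Clear every bit listed in feature_bits that the backend does not offer;
 * bits that are not listed pass through untouched.
 */
static uint64_t mask_by_backend(const int *feature_bits,
                                uint64_t backend_features, uint64_t features)
{
    for (const int *bit = feature_bits; *bit != FEATURE_LIST_END; bit++) {
        uint64_t mask = 1ULL << *bit;
        if (!(backend_features & mask)) {
            features &= ~mask;
        }
    }
    return features;
}
```

A bit that is missing from the list passes through even if the backend does
not offer it, which is why each device's array needs the new entry.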

Re: [RISC-V][tech-server-soc] [RFC v2 1/2] target/riscv: Add server platform reference cpu

2024-05-20 Thread Andrew Jones

On Tue, Mar 12, 2024 at 09:52:20PM GMT, Wu, Fei2 wrote:
> The hart requirements of the RISC-V server platform [1] require RVA23 ISA
> profile support, plus Sv48, Svadu, H, Sscofpmf, etc. This patch provides
> a virt CPU type (rvsp-ref) that is as compliant as possible.

We should add the RVA23 profile cpu type first, and then base a reference
cpu type on that. But, I guess we should version the reference type, since
we shouldn't expect the reference type to be bound to only RVA23 forever.

Thanks,
drew

Re: [PATCH 2/2] hw/arm/xilinx_zynq: Support up to two CPU cores

2024-05-20 Thread Peter Maydell

On Tue, 7 May 2024 at 14:04, Sebastian Huber wrote:
>
> The Zynq 7000 SoCs contain a dual-core Arm Cortex-A9 MPCore (the Zynq 7000S
> has only one core).  Add support for up to two simulated cores.
>
> Signed-off-by: Sebastian Huber 
> ---
>  hw/arm/xilinx_zynq.c | 42 +++---
>  1 file changed, 27 insertions(+), 15 deletions(-)
>
> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
> index 078abd77bd..3b858e3e9a 100644
> --- a/hw/arm/xilinx_zynq.c
> +++ b/hw/arm/xilinx_zynq.c
> @@ -184,6 +184,8 @@ static void zynq_init(MachineState *machine)
>  SysBusDevice *busdev;
>  qemu_irq pic[64];
>  int n;
> +unsigned int smp_cpus = machine->smp.cpus;
> +qemu_irq cpu_irq[2];

We prefer not to have arrays of qemu_irq like this that are
just passing qemu_irqs from one place to another. Instead
at the point where you want the ARM_CPU_IRQ of a particular
CPU, call qdev_get_gpio_in() on the CPU object there.

I suggest dropping the "ARMCPU *cpu" local from this function
and instead adding an "ARMCPU *cpu[ZYNQ_MAX_CPUS]" array to
the ZynqMachineState struct.

>  /* max 2GB ram */
>  if (machine->ram_size > 2 * GiB) {
> @@ -191,21 +193,27 @@ static void zynq_init(MachineState *machine)
>  exit(EXIT_FAILURE);
>  }
>
> -cpu = ARM_CPU(object_new(machine->cpu_type));
> +for (n = 0; n < smp_cpus; n++) {
> +Object *cpuobj = object_new(machine->cpu_type);
>
> -/* By default A9 CPUs have EL3 enabled.  This board does not
> - * currently support EL3 so the CPU EL3 property is disabled before
> - * realization.
> - */
> -if (object_property_find(OBJECT(cpu), "has_el3")) {
> -object_property_set_bool(OBJECT(cpu), "has_el3", false, _fatal);
> -}
> +/* By default A9 CPUs have EL3 enabled.  This board does not
> + * currently support EL3 so the CPU EL3 property is disabled before
> + * realization.
> + */

If you're moving comment text around checkpatch will suggest that
you fix it up to our current coding standard, which is that
a multiline comment has the "/*" on a line of its own.

> +if (object_property_find(cpuobj, "has_el3")) {
> +object_property_set_bool(cpuobj, "has_el3", false, _fatal);
> +}
> +
> +object_property_set_int(cpuobj, "midr", ZYNQ_BOARD_MIDR,
> +_fatal);
> +object_property_set_int(cpuobj, "reset-cbar", MPCORE_PERIPHBASE,
> +_fatal);
>
> -object_property_set_int(OBJECT(cpu), "midr", ZYNQ_BOARD_MIDR,
> -_fatal);
> -object_property_set_int(OBJECT(cpu), "reset-cbar", MPCORE_PERIPHBASE,
> -_fatal);
> -qdev_realize(DEVICE(cpu), NULL, _fatal);
> +qdev_realize(DEVICE(cpuobj), NULL, _fatal);
> +
> +cpu_irq[n] = qdev_get_gpio_in(DEVICE(cpuobj), ARM_CPU_IRQ);
> +}
> +cpu = ARM_CPU(first_cpu);
>
>  /* DDR remapped to address zero.  */
>  memory_region_add_subregion(address_space_mem, 0, machine->ram);
> @@ -238,10 +246,14 @@ static void zynq_init(MachineState *machine)
>  sysbus_mmio_map(SYS_BUS_DEVICE(slcr), 0, 0xF800);
>
>  dev = qdev_new(TYPE_A9MPCORE_PRIV);
> -qdev_prop_set_uint32(dev, "num-cpu", 1);
> +qdev_prop_set_uint32(dev, "num-cpu", smp_cpus);
>  busdev = SYS_BUS_DEVICE(dev);
>  sysbus_realize_and_unref(busdev, _fatal);
>  sysbus_mmio_map(busdev, 0, MPCORE_PERIPHBASE);
> +for (n = 0; n < smp_cpus; n++) {
> +sysbus_connect_irq(busdev, n, cpu_irq[n]);
> +}

Looks like you have based this on a version of QEMU which doesn't
have commit 68a5827b80117973 which wires up the FIQ line of the
A9MPCORE_PRIV device to the CPUs.

> +zynq_binfo.gic_cpu_if_addr = MPCORE_PERIPHBASE + 0x100;
>  sysbus_create_varargs("l2x0", MPCORE_PERIPHBASE + 0x2000, NULL);
>  sysbus_connect_irq(busdev, 0,
> qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));
> @@ -357,7 +369,7 @@ static void zynq_machine_class_init(ObjectClass *oc, void *data)
>  MachineClass *mc = MACHINE_CLASS(oc);
>  mc->desc = "Xilinx Zynq Platform Baseboard for Cortex-A9";
>  mc->init = zynq_init;
> -mc->max_cpus = 1;
> +mc->max_cpus = 2;
>  mc->no_sdcard = 1;
>  mc->ignore_memory_transaction_failures = true;
>  mc->default_cpu_type = ARM_CPU_TYPE_NAME("cortex-a9");
> --

I'm not making this a condition for accepting this patch, but
since you're working on this board model would you consider
writing up some documentation for it? It's one of the boards
we do not currently have documented at all. This doesn't have to
be very extensive: a few paragraphs describing what the board
type is, maybe linking to a reference to the hardware, listing
what is and isn't implemented, and (if there are some convenient
examples available) perhaps listing some examples of use.
docs/system/arm/xlnx-versal-virt.rst is the docs

Re: [RFC PATCH v3 16/18] hw/arm/smmu: Refactor SMMU OAS

2024-05-20 Thread Eric Auger

Hi Mostafa,
On 4/29/24 05:24, Mostafa Saleh wrote:
> SMMUv3 OAS is hardcoded to 44 bits, for nested configurations that
is currently hardcoded in the code.
> can be a problem as stage-2 might be shared with the CPU which might
> have different PARANGE, and according to SMMU manual ARM IHI 0070F.b:
> 6.3.6 SMMU_IDR5, OAS must match the system physical address size.
>
> This patch doesn't change the SMMU OAS, but refactors the code to
> make it easier to do that:
> - Rely everywhere on IDR5 for reading OAS instead of using the macro so
instead of using the SMMU_IDR5_OAS macro.
Also add additional checks when OAS is greater than 48bits
>   it is easier to just change IDR5 and have it propagate correctly.
> - Remove unused functions/macros: pa_range/MAX_PA
>
> Signed-off-by: Mostafa Saleh 
> ---
>  hw/arm/smmu-common.c |  7 ---
>  hw/arm/smmuv3-internal.h | 13 -
>  hw/arm/smmuv3.c  | 35 ---
>  3 files changed, 32 insertions(+), 23 deletions(-)
>
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 3ed0be05ef..b559878aef 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -434,7 +434,8 @@ static int smmu_ptw_64_s1(SMMUTransCfg *cfg,
>  inputsize = 64 - tt->tsz;
>  level = 4 - (inputsize - 4) / stride;
>  indexmask = VMSA_IDXMSK(inputsize, stride, level);
> -baseaddr = extract64(tt->ttb, 0, 48);
> +
> +baseaddr = extract64(tt->ttb, 0, cfg->oas);
>  baseaddr &= ~indexmask;
>  
>  while (level < VMSA_LEVELS) {
> @@ -557,8 +558,8 @@ static int smmu_ptw_64_s2(SMMUTransCfg *cfg,
>   * Get the ttb from concatenated structure.
>   * The offset is the idx * size of each ttb(number of ptes * (sizeof(pte))
>   */
> -uint64_t baseaddr = extract64(cfg->s2cfg.vttb, 0, 48) + (1 << stride)
> -  idx * sizeof(uint64_t);
> +uint64_t baseaddr = extract64(cfg->s2cfg.vttb, 0, cfg->s2cfg.eff_ps) +
> +  (1 << stride) * idx * sizeof(uint64_t);
>  dma_addr_t indexmask = VMSA_IDXMSK(inputsize, stride, level);
>  
>  baseaddr &= ~indexmask;
> diff --git a/hw/arm/smmuv3-internal.h b/hw/arm/smmuv3-internal.h
> index 0f3ecec804..0ebf2eebcf 100644
> --- a/hw/arm/smmuv3-internal.h
> +++ b/hw/arm/smmuv3-internal.h
> @@ -602,19 +602,6 @@ static inline int oas2bits(int oas_field)
>  return -1;
>  }
>  
> -static inline int pa_range(STE *ste)
> -{
> -int oas_field = MIN(STE_S2PS(ste), SMMU_IDR5_OAS);
> -
> -if (!STE_S2AA64(ste)) {
> -return 40;
> -}
> -
> -return oas2bits(oas_field);
> -}
> -
> -#define MAX_PA(ste) ((1 << pa_range(ste)) - 1)
> -
>  /* CD fields */
>  
>  #define CD_VALID(x)   extract32((x)->word[0], 31, 1)
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 8a11e41144..4ac818cf7a 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -408,10 +408,10 @@ static bool s2t0sz_valid(SMMUTransCfg *cfg)
>  }
>  
>  if (cfg->s2cfg.granule_sz == 16) {
> -return (cfg->s2cfg.tsz >= 64 - oas2bits(SMMU_IDR5_OAS));
> +return (cfg->s2cfg.tsz >= 64 - cfg->s2cfg.eff_ps);
>  }
>  
> -return (cfg->s2cfg.tsz >= MAX(64 - oas2bits(SMMU_IDR5_OAS), 16));
> +return (cfg->s2cfg.tsz >= MAX(64 - cfg->s2cfg.eff_ps, 16));
>  }
>  
>  /*
> @@ -432,8 +432,11 @@ static bool s2_pgtable_config_valid(uint8_t sl0, uint8_t t0sz, uint8_t gran)
>  return nr_concat <= VMSA_MAX_S2_CONCAT;
>  }
>  
> -static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
> +static int decode_ste_s2_cfg(SMMUv3State *s, SMMUTransCfg *cfg,
> + STE *ste)
>  {
> +uint8_t oas = FIELD_EX32(s->idr[5], IDR5, OAS);
> +
>  if (STE_S2AA64(ste) == 0x0) {
>  qemu_log_mask(LOG_UNIMP,
>"SMMUv3 AArch32 tables not supported\n");
> @@ -466,7 +469,15 @@ static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
>  }
>  
>  /* For AA64, The effective S2PS size is capped to the OAS. */
> -cfg->s2cfg.eff_ps = oas2bits(MIN(STE_S2PS(ste), SMMU_IDR5_OAS));
> +cfg->s2cfg.eff_ps = oas2bits(MIN(STE_S2PS(ste), oas));
> +/*
> + * For SMMUv3.1 and later, when OAS == IAS == 52, the stage 2 input
> + * range is further limited to 48 bits unless STE.S2TG indicates a
> + * 64KB granule.
> + */
> +if (cfg->s2cfg.granule_sz != 16) {
> +cfg->s2cfg.eff_ps = MIN(cfg->s2cfg.eff_ps, 48);
> +}
>  /*
>   * It is ILLEGAL for the address in S2TTB to be outside the range
>   * described by the effective S2PS value.
> @@ -542,6 +553,7 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
>STE *ste, SMMUEventInfo *event)
>  {
>  uint32_t config;
> +uint8_t oas = FIELD_EX32(s->idr[5], IDR5, OAS);
>  int ret;
>  
>  if (!STE_VALID(ste)) {
> @@ -585,8 +597,8 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
>   * Stage-1 OAS defaults to OAS even if not

Re: [PATCH 1/2] hw/arm/xilinx_zynq: Add cache controller

2024-05-20 Thread Peter Maydell

On Tue, 7 May 2024 at 14:04, Sebastian Huber wrote:
>
> The Zynq 7000 SoCs contain a CoreLink L2C-310 cache controller.  Add the
> corresponding Qemu device to the xilinx-zynq-a9 machine.
>
> Signed-off-by: Sebastian Huber 
> ---
>  hw/arm/xilinx_zynq.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/hw/arm/xilinx_zynq.c b/hw/arm/xilinx_zynq.c
> index 3190cc0b8d..078abd77bd 100644
> --- a/hw/arm/xilinx_zynq.c
> +++ b/hw/arm/xilinx_zynq.c
> @@ -242,6 +242,7 @@ static void zynq_init(MachineState *machine)
>  busdev = SYS_BUS_DEVICE(dev);
>  sysbus_realize_and_unref(busdev, _fatal);
>  sysbus_mmio_map(busdev, 0, MPCORE_PERIPHBASE);
> +sysbus_create_varargs("l2x0", MPCORE_PERIPHBASE + 0x2000, NULL);
>  sysbus_connect_irq(busdev, 0,
> qdev_get_gpio_in(DEVICE(cpu), ARM_CPU_IRQ));

If we add the cache controller to this board we also need to
update the board's entry in hw/arm/Kconfig to add a
"select PL310" line. This ensures that if the user asks
to build the zynq board type then we will also compile in
the PL310 device.

thanks
-- PMM

Re: [PATCH 2/2] hw/intc/arm_gic: Fix writes to GICD_ITARGETSRn

2024-05-20 Thread Peter Maydell

On Tue, 7 May 2024 at 14:00, Sebastian Huber wrote:
>
> According to the GICv2 specification section 4.3.12, "Interrupt Processor
> Targets Registers, GICD_ITARGETSRn":
>
> "Any change to a CPU targets field value:
> [...]
> * Has an effect on any pending interrupts. This means:
>   - adding a CPU interface to the target list of a pending interrupt makes
> that interrupt pending on that CPU interface
>   - removing a CPU interface from the target list of a pending interrupt
> removes the pending state of that interrupt on that CPU interface."
>
> Signed-off-by: Sebastian Huber 
> ---
>  hw/intc/arm_gic.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
> index 20b3f701e0..79aee56053 100644
> --- a/hw/intc/arm_gic.c
> +++ b/hw/intc/arm_gic.c
> @@ -1397,6 +1397,13 @@ static void gic_dist_writeb(void *opaque, hwaddr offset,
>  value = ALL_CPU_MASK;
>  }
>  s->irq_target[irq] = value & ALL_CPU_MASK;
> +if (irq >= GIC_INTERNAL && s->irq_state[irq].pending) {
> +/*
> + * Changing the target of an interrupt that is currently
> + * pending updates the set of CPUs it is pending on.
> + */
> +GIC_DIST_SET_PENDING(irq, value);

Looking back at the 2021 thread this is the change I suggested then,
but I think I was wrong. This will set the pending bit for the new
specified set of targets, but it won't remove it from any CPUs that
previously were targeted and are not in the new target list (because
GIC_DIST_SET_PENDING does a logical OR into the pending field).
So I think what we want is
   s->irq_state[irq].pending = value & ALL_CPU_MASK;

> +}
>  }
>  } else if (offset < 0xf00) {
>  /* Interrupt Configuration.  */
> --

thanks
-- PMM
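
A small standalone C model of the difference described above between OR-ing
the new target list into the pending bits (what GIC_DIST_SET_PENDING does)
and assigning it. The struct and function names are hypothetical, and
ALL_CPU_MASK is assumed to be 0xff for eight CPU interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define ALL_CPU_MASK 0xffu  /* assumed: 8 CPU interfaces */

/* Hypothetical model of one SPI's per-CPU pending bits. */
struct spi_state {
    uint8_t pending;  /* bit n set: pending on CPU interface n */
};

/* OR in the new targets: CPUs removed from the list stay pending (wrong). */
static void retarget_or(struct spi_state *s, uint8_t new_targets)
{
    s->pending |= new_targets & ALL_CPU_MASK;
}

/* Assign the new targets: pending on exactly the new target list. */
static void retarget_assign(struct spi_state *s, uint8_t new_targets)
{
    if (s->pending) {
        s->pending = new_targets & ALL_CPU_MASK;
    }
}
```

Retargeting an interrupt pending on CPU 0 to CPU 1 leaves it pending on both
with the OR, but only on CPU 1 with the assignment.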

Re: [RFC PATCH v3 15/18] hw/arm/smmuv3: Advertise S2FWB

2024-05-20 Thread Eric Auger




On 4/29/24 05:23, Mostafa Saleh wrote:
> QEMU doesn's support memory attributes, so FWB is NOP, this
> might change in the future if memory attributre would be supported.
if mem attributes get supported
>
> Signed-off-by: Mostafa Saleh 
> ---
>  hw/arm/smmuv3.c | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 88f6473d33..8a11e41144 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -287,6 +287,14 @@ static void smmuv3_init_regs(SMMUv3State *s)
>  if (FIELD_EX32(s->idr[0], IDR0, S2P)) {
>  /* XNX is a stage-2-specific feature */
>  s->idr[3] = FIELD_DP32(s->idr[3], IDR3, XNX, 1);
> +if (FIELD_EX32(s->idr[0], IDR0, S1P)) {
> +/*
> + * QEMU doesn's support memory attributes, so FWB is NOP, this
> + * might change in the future if memory attributre would be
if mem attributes get supported
> + * supported.
> + */
> +   s->idr[3] = FIELD_DP32(s->idr[3], IDR3, FWB, 1);
spec says:
0b0    Stage 2 control of memory types and attributes is
not supported and the STE.S2FWB bit is RES 0.


Thanks

Eric
> +}
>  }
>  s->idr[3] = FIELD_DP32(s->idr[3], IDR3, RIL, 1);
>  s->idr[3] = FIELD_DP32(s->idr[3], IDR3, BBML, 2);

Re: [PATCH 1/2] hw/intc/arm_gic: Fix set pending of PPIs

2024-05-20 Thread Peter Maydell

On Tue, 7 May 2024 at 14:00, Sebastian Huber
 wrote:
>
> According to the GICv2 specification section 4.3.7, "Interrupt Set-Pending
> Registers, GICD_ISPENDRn":
>
> "In a multiprocessor implementation, GICD_ISPENDR0 is banked for each connected
> processor. This register holds the Set-pending bits for interrupts 0-31."

The commit message says it's only changing the handling of
setting the pending bit for a PPI, but...

> Signed-off-by: Sebastian Huber 
> ---
>  hw/intc/arm_gic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
> index 4da5326ed6..20b3f701e0 100644
> --- a/hw/intc/arm_gic.c
> +++ b/hw/intc/arm_gic.c
> @@ -1296,12 +1296,14 @@ static void gic_dist_writeb(void *opaque, hwaddr offset,
>
>  for (i = 0; i < 8; i++) {
>  if (value & (1 << i)) {
> +int cm = (irq < GIC_INTERNAL) ? (1 << cpu) : ALL_CPU_MASK;
> +
>  if (s->security_extn && !attrs.secure &&
>  !GIC_DIST_TEST_GROUP(irq + i, 1 << cpu)) {
>  continue; /* Ignore Non-secure access of Group0 IRQ */
>  }
>
> -GIC_DIST_SET_PENDING(irq + i, GIC_DIST_TARGET(irq + i));
> +GIC_DIST_SET_PENDING(irq + i, cm);

... the patch changes also the handling of set-pending for
SPIs (which previously were marked pending on the target CPU
and are now marked pending on all CPUs).
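The distinction can be expressed as a small standalone sketch. Writes for the banked interrupts (IDs 0-31, per the spec text quoted above) affect only the writing CPU, while SPIs keep using their configured target mask instead of all CPUs. The constants and the helper name are assumptions for illustration and are not taken from arm_gic.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: IDs 0-31 are banked per CPU; the model supports 8 CPUs. */
#define GIC_INTERNAL 32
#define NUM_CPUS 8

/*
 * Hypothetical helper: the set of CPUs that should become pending when
 * `cpu` writes a set-pending bit for interrupt `irq`.
 */
static uint8_t set_pending_mask(unsigned int irq, unsigned int cpu,
                                uint8_t spi_target)
{
    assert(cpu < NUM_CPUS);
    if (irq < GIC_INTERNAL) {
        return (uint8_t)(1u << cpu); /* banked: only the writing CPU */
    }
    return spi_target;               /* SPI: its configured targets */
}
```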

Looking back at the thread from your 2021 patch this was also
noted in that version as being wrong:
https://lore.kernel.org/qemu-devel/20210725080817.ivlkutnow7soj...@sekoia-pc.home.lmichel.fr/

PS: for multi-patch series, can you please also send a cover
letter? Our automated tooling gets confused if there isn't one.
It also looks like you sent these respins as followups to the
thread of the original patch you sent back in 2021. Can you send
new versions of patches as their own threads, please (and with a
"PATCH v2" (v3, etc) tag if they're respins)?

thanks
-- PMM

Re: [PATCH 3/7] linux-user: sparc: Remove unused struct 'target_mc_fq'

2024-05-20 Thread Dr. David Alan Gilbert

* Dr. David Alan Gilbert (d...@treblig.org) wrote:
> This struct is unused since Peter's
> Commit b8ae597f0e6d ("linux-user/sparc: Fix errors in target_ucontext
> structures")
> 
> However, hmm, I'm a bit confused since that commit modifies the
> structure and then removes it, was that intentional?

Ping on this.
(I think the others in the set have been reviewed and one picked up).

Dave

> Signed-off-by: Dr. David Alan Gilbert 
> ---
>  linux-user/sparc/signal.c | 5 -
>  1 file changed, 5 deletions(-)
> 
> diff --git a/linux-user/sparc/signal.c b/linux-user/sparc/signal.c
> index f164b74032..8181b8b92c 100644
> --- a/linux-user/sparc/signal.c
> +++ b/linux-user/sparc/signal.c
> @@ -546,11 +546,6 @@ void setup_sigtramp(abi_ulong sigtramp_page)
>  typedef abi_ulong target_mc_greg_t;
>  typedef target_mc_greg_t target_mc_gregset_t[SPARC_MC_NGREG];
>  
> -struct target_mc_fq {
> -abi_ulong mcfq_addr;
> -uint32_t mcfq_insn;
> -};
> -
>  /*
>   * Note the manual 16-alignment; the kernel gets this because it
>   * includes a "long double qregs[16]" in the mcpu_fregs union,
> -- 
> 2.45.0
> 
> 
-- 
 -Open up your eyes, open up your mind, open up your code ---   
/ Dr. David Alan Gilbert|   Running GNU/Linux   | Happy  \ 
\dave @ treblig.org |   | In Hex /
 \ _|_ http://www.treblig.org   |___/

Re: [RFC PATCH v3 14/18] hw/arm/smmuv3: Support and advertise nesting

2024-05-20 Thread Eric Auger

Hi Mostafa,

On 4/29/24 05:23, Mostafa Saleh wrote:
> Everything is in place, add the last missing bits:
> - Handle fault checking according to the actual PTW event and not the
>   the translation stage.
This is missing the "why". Could it be moved into a separate patch?
> - Consolidate parsing of STE cfg and setting translation stage.
>
> Advertise nesting if stage requested is "nested".
I would move the introduction of the nested option into a separate patch
and, in the associated commit message, properly document how the new option
should be used.
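One plausible "why", offered only as a sketch: if stages are encoded as bit flags (the `cfg->stage |= SMMU_STAGE_2` line in decode_ste_config suggests this), a nested config is neither stage 1 nor stage 2. The old `cfg->stage == SMMU_STAGE_1 ? ... : ...` test would then consult the stage-2 record flag even for a stage-1 fault. The enum values and the helper name below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed encoding: stages as bit flags, so nested = STAGE_1 | STAGE_2. */
enum {
    STAGE_1 = 1 << 0,
    STAGE_2 = 1 << 1,
    STAGE_NESTED = STAGE_1 | STAGE_2,
};

/*
 * Hypothetical model of the revised PTW_RECORD_FAULT: the decision is
 * keyed on the stage whose walk actually faulted, so a stage-2 fault
 * during a nested walk honours the stage-2 record flag.
 */
static bool record_fault(int faulting_stage, bool s1_record, bool s2_record)
{
    assert(faulting_stage == STAGE_1 || faulting_stage == STAGE_2);
    return (faulting_stage == STAGE_1 && s1_record) ||
           (faulting_stage == STAGE_2 && s2_record);
}
```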
>
> Signed-off-by: Mostafa Saleh 
> ---
>  hw/arm/smmuv3.c | 50 +
>  1 file changed, 34 insertions(+), 16 deletions(-)
>
> diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
> index 96d07234fe..88f6473d33 100644
> --- a/hw/arm/smmuv3.c
> +++ b/hw/arm/smmuv3.c
> @@ -34,9 +34,10 @@
>  #include "smmuv3-internal.h"
>  #include "smmu-internal.h"
>  
> -#define PTW_RECORD_FAULT(cfg)   (((cfg)->stage == SMMU_STAGE_1) ? \
> - (cfg)->record_faults : \
> - (cfg)->s2cfg.record_faults)
> +#define PTW_RECORD_FAULT(ptw_info, cfg) (((ptw_info).stage == SMMU_STAGE_1 && \
> +(cfg)->record_faults) || \
> +((ptw_info).stage == SMMU_STAGE_2 && \
> +(cfg)->s2cfg.record_faults))
>  
>  /**
>   * smmuv3_trigger_irq - pulse @irq if enabled and update
> @@ -260,6 +261,9 @@ static void smmuv3_init_regs(SMMUv3State *s)
>  /* Based on sys property, the stages supported in smmu will be advertised.*/
>  if (s->stage && !strcmp("2", s->stage)) {
>  s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S2P, 1);
> +} else if (s->stage && !strcmp("nested", s->stage)) {
> +s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S1P, 1);
> +s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S2P, 1);
>  } else {
>  s->idr[0] = FIELD_DP32(s->idr[0], IDR0, S1P, 1);
>  }
> @@ -422,8 +426,6 @@ static bool s2_pgtable_config_valid(uint8_t sl0, uint8_t t0sz, uint8_t gran)
>  
>  static int decode_ste_s2_cfg(SMMUTransCfg *cfg, STE *ste)
>  {
> -cfg->stage = SMMU_STAGE_2;
> -
>  if (STE_S2AA64(ste) == 0x0) {
>  qemu_log_mask(LOG_UNIMP,
>"SMMUv3 AArch32 tables not supported\n");
> @@ -506,6 +508,27 @@ bad_ste:
>  return -EINVAL;
>  }
>  
> +static void decode_ste_config(SMMUTransCfg *cfg, uint32_t config)
> +{
> +
> +if (STE_CFG_ABORT(config)) {
> +cfg->aborted = true;
> +return;
> +}
> +if (STE_CFG_BYPASS(config)) {
> +cfg->bypassed = true;
> +return;
> +}
> +
> +if (STE_CFG_S1_ENABLED(config)) {
> +cfg->stage = SMMU_STAGE_1;
> +}
> +
> +if (STE_CFG_S2_ENABLED(config)) {
> +cfg->stage |= SMMU_STAGE_2;
> +}
> +}
> +
>  /* Returns < 0 in case of invalid STE, 0 otherwise */
>  static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
>STE *ste, SMMUEventInfo *event)
> @@ -522,13 +545,9 @@ static int decode_ste(SMMUv3State *s, SMMUTransCfg *cfg,
>  
>  config = STE_CONFIG(ste);
>  
> -if (STE_CFG_ABORT(config)) {
> -cfg->aborted = true;
> -return 0;
> -}
> +decode_ste_config(cfg, config);
>  
> -if (STE_CFG_BYPASS(config)) {
> -cfg->bypassed = true;
> +if (cfg->aborted || cfg->bypassed) {
>  return 0;
>  }
>  
> @@ -701,7 +720,6 @@ static int decode_cd(SMMUv3State *s, SMMUTransCfg *cfg,
>  
>  /* we support only those at the moment */
>  cfg->aa64 = true;
> -cfg->stage = SMMU_STAGE_1;
>  
>  cfg->oas = oas2bits(CD_IPS(cd));
>  cfg->oas = MIN(oas2bits(SMMU_IDR5_OAS), cfg->oas);
> @@ -901,7 +919,7 @@ static SMMUTranslationStatus smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
>  event->u.f_walk_eabt.addr2 = ptw_info.addr;
>  break;
>  case SMMU_PTW_ERR_TRANSLATION:
> -if (PTW_RECORD_FAULT(cfg)) {
> +if (PTW_RECORD_FAULT(ptw_info, cfg)) {
>  event->type = SMMU_EVT_F_TRANSLATION;
>  event->u.f_translation.addr = addr;
>  event->u.f_translation.addr2 = ptw_info.addr;
> @@ -910,7 +928,7 @@ static SMMUTranslationStatus smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
>  }
>  break;
>  case SMMU_PTW_ERR_ADDR_SIZE:
> -if (PTW_RECORD_FAULT(cfg)) {
> +if (PTW_RECORD_FAULT(ptw_info, cfg)) {
>  event->type = SMMU_EVT_F_ADDR_SIZE;
>  event->u.f_addr_size.addr = addr;
>  event->u.f_addr_size.addr2 = ptw_info.addr;
> @@ -919,7 +937,7 @@ static SMMUTranslationStatus smmuv3_do_translate(SMMUv3State *s, hwaddr addr,
>  }
>  break;
>  case SMMU_PTW_ERR_ACCESS:
> -if (PTW_RECORD_FAULT(cfg)) {
> +if

[PATCH v2 2/6] virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support

2024-05-20 Thread Jonah Palmer

Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and
virtqueue_packed_pop.

VirtQueueElements popped from the available/descriptor ring are added to
the VirtQueue's used_elems array in-order and in the same fashion as
they would be added to the used and descriptor rings, respectively.

This lets us keep track of the current order, which elements have been
written, and each element's essential data after it has been
processed.
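The bookkeeping described above can be sketched in isolation. In this sketch a popped element is recorded in the slot matching its position in the available ring. `struct slot` is a hypothetical, trimmed-down stand-in for VirtQueueElement, and the ring size is assumed. Because the sketch takes a free-running 16-bit avail index, it reduces that index modulo the ring size before using it. Whether the patch needs the same reduction depends on how last_avail_idx is maintained for split rings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8 /* assumed queue size */

/* Hypothetical slot mirroring the fields the patch copies. */
struct slot {
    unsigned int index;  /* head descriptor id */
    unsigned int len;    /* filled in later, when the device completes it */
    unsigned int ndescs; /* descriptors consumed by this element */
    bool filled;
};

/* Record a popped element at the slot for its avail-ring position. */
static void record_popped(struct slot ring[RING_SIZE], uint16_t avail_idx,
                          unsigned int head, unsigned int ndescs)
{
    struct slot *s = &ring[avail_idx % RING_SIZE];

    assert(ndescs > 0);
    s->index = head;
    s->len = 0;
    s->ndescs = ndescs;
    s->filled = false;
}
```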

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 893a072c9d..7456d61bc8 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1506,7 +1506,7 @@ static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_nu
 
 static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 {
-unsigned int i, head, max;
+unsigned int i, head, max, prev_avail_idx;
 VRingMemoryRegionCaches *caches;
 MemoryRegionCache indirect_desc_cache;
 MemoryRegionCache *desc_cache;
@@ -1539,6 +1539,8 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 goto done;
 }
 
+prev_avail_idx = vq->last_avail_idx;
+
 if (!virtqueue_get_head(vq, vq->last_avail_idx++, &head)) {
 goto done;
 }
@@ -1630,6 +1632,12 @@ static void *virtqueue_split_pop(VirtQueue *vq, size_t sz)
 elem->in_sg[i] = iov[out_num + i];
 }
 
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[prev_avail_idx].index = elem->index;
+vq->used_elems[prev_avail_idx].len = elem->len;
+vq->used_elems[prev_avail_idx].ndescs = elem->ndescs;
+}
+
 vq->inuse++;
 
 trace_virtqueue_pop(vq, elem, elem->in_num, elem->out_num);
@@ -1758,6 +1766,13 @@ static void *virtqueue_packed_pop(VirtQueue *vq, size_t sz)
 
 elem->index = id;
 elem->ndescs = (desc_cache == &indirect_desc_cache) ? 1 : elem_entries;
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_IN_ORDER)) {
+vq->used_elems[vq->last_avail_idx].index = elem->index;
+vq->used_elems[vq->last_avail_idx].len = elem->len;
+vq->used_elems[vq->last_avail_idx].ndescs = elem->ndescs;
+}
+
 vq->last_avail_idx += elem->ndescs;
 vq->inuse += elem->ndescs;
 
-- 
2.39.3

[PATCH v2 4/6] virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support

2024-05-20 Thread Jonah Palmer

Add VIRTIO_F_IN_ORDER feature support for the virtqueue_flush operation.

The goal of the virtqueue_ordered_flush operation when the
VIRTIO_F_IN_ORDER feature has been negotiated is to write elements to
the used/descriptor ring in-order and then update used_idx.

The function iterates through the VirtQueueElement used_elems array
in-order starting at vq->used_idx. If the element is valid (filled), the
element is written to the used/descriptor ring. This process continues
until we find an invalid (not filled) element.

For packed VQs, the first entry (at vq->used_idx) is written to the
descriptor ring last so the guest doesn't see any invalid descriptors.

If any elements were written, the used_idx is updated.
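Here is a standalone sketch of the flush walk described above, modelling a split ring. A hypothetical `struct slot` takes the place of VirtQueueElement, and an output array takes the place of the used ring. The walk starts at the used index, emits each filled slot and clears its flag, then stops at the first unfilled slot. It advances the cursor by each element's own descriptor count and wraps modulo an assumed ring size. It illustrates the algorithm only and is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8 /* assumed queue size */

struct slot {
    unsigned int index;  /* head descriptor id */
    unsigned int len;
    unsigned int ndescs;
    bool filled;
};

/*
 * Write the contiguous run of filled slots starting at *used_idx into
 * out[], clear their flags, advance *used_idx past them and return how
 * many elements were written.
 */
static unsigned int ordered_flush(struct slot ring[RING_SIZE],
                                  unsigned int *used_idx,
                                  unsigned int out[RING_SIZE])
{
    unsigned int i = *used_idx;
    unsigned int written = 0;
    unsigned int total_descs = 0;

    assert(i < RING_SIZE);
    while (ring[i].filled && total_descs < RING_SIZE) {
        out[written++] = ring[i].index;
        ring[i].filled = false;
        total_descs += ring[i].ndescs;
        i = (i + ring[i].ndescs) % RING_SIZE; /* hop by this element only */
    }
    *used_idx = i;
    return written;
}
```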

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 66 +-
 1 file changed, 65 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 01b6b32460..39b91beece 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1016,6 +1016,68 @@ static void virtqueue_packed_flush(VirtQueue *vq, unsigned int count)
 }
 }
 
+static void virtqueue_ordered_flush(VirtQueue *vq)
+{
+unsigned int i = vq->used_idx;
+unsigned int ndescs = 0;
+uint16_t old = vq->used_idx;
+bool packed;
+VRingUsedElem uelem;
+
+packed = virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED);
+
+if (packed) {
+if (unlikely(!vq->vring.desc)) {
+return;
+}
+} else if (unlikely(!vq->vring.used)) {
+return;
+}
+
+/* First expected in-order element isn't ready, nothing to do */
+if (!vq->used_elems[i].in_order_filled) {
+return;
+}
+
+/* Search for filled elements in-order */
+while (vq->used_elems[i].in_order_filled) {
+/*
+ * First entry for packed VQs is written last so the guest
+ * doesn't see invalid descriptors.
+ */
+if (packed && i != vq->used_idx) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[i], ndescs, false);
+} else if (!packed) {
+uelem.id = vq->used_elems[i].index;
+uelem.len = vq->used_elems[i].len;
+vring_used_write(vq, &uelem, i);
+}
+
+vq->used_elems[i].in_order_filled = false;
+ndescs += vq->used_elems[i].ndescs;
+i += ndescs;
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+
+if (packed) {
+virtqueue_packed_fill_desc(vq, &vq->used_elems[vq->used_idx], 0, true);
+vq->used_idx += ndescs;
+if (vq->used_idx >= vq->vring.num) {
+vq->used_idx -= vq->vring.num;
+vq->used_wrap_counter ^= 1;
+vq->signalled_used_valid = false;
+}
+} else {
+vring_used_idx_set(vq, i);
+if (unlikely((int16_t)(i - vq->signalled_used) < (uint16_t)(i - old))) {
+vq->signalled_used_valid = false;
+}
+}
+vq->inuse -= ndescs;
+}
+
 void virtqueue_flush(VirtQueue *vq, unsigned int count)
 {
 if (virtio_device_disabled(vq->vdev)) {
@@ -1023,7 +1085,9 @@ void virtqueue_flush(VirtQueue *vq, unsigned int count)
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_flush(vq);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_flush(vq, count);
 } else {
 virtqueue_split_flush(vq, count);
-- 
2.39.3

Re: [PATCH] Fixes: Indentation using spaces instead of TABS and improve formatting

2024-05-20 Thread Peter Maydell

On Wed, 8 May 2024 at 09:15, Tanmay Patil  wrote:
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/373
>
> Files changed:
> - hw/arm/boot.c
> - hw/char/omap_uart.c
> - hw/gpio/zaurus.c
> - hw/input/tsc2005.c
>
> Signed-off-by: Tanmay Patil 

Thanks for this patch; I've applied it to my target-arm.next
queue and it will get upstream within the next week or so.
(I tweaked the commit message format a bit.)

-- PMM

[PATCH v2 0/6] virtio,vhost: Add VIRTIO_F_IN_ORDER support

2024-05-20 Thread Jonah Palmer

The goal of these patches is to add support to a variety of virtio and
vhost devices for the VIRTIO_F_IN_ORDER transport feature. This feature
indicates that all buffers are used by the device in the same order in
which they were made available by the driver.

These patches attempt to implement a generalized, non-device-specific
solution to support this feature.

The core feature behind this solution is a buffer mechanism in the form
of a VirtQueue's used_elems VirtQueueElement array. This allows devices
that always use buffers in-order by default to incur minimal overhead.
Devices that may not always use buffers in-order will likely take a
performance hit, whose size depends on how frequently elements are
completed out-of-order.

A VirtQueue whose device uses this feature will use its used_elems
VirtQueueElement array to hold used VirtQueueElements. The index at
which a used element is placed in used_elems is the same index on the
used/descriptor ring that would satisfy the in-order requirement. In
other words, used elements are placed in their in-order locations in
used_elems and are only written to the used/descriptor ring once every
element ahead of them in used_elems has also been filled.

To differentiate between a "used" and "unused" element on the used_elems
array (a "used" element being an element that has returned from
processing and an "unused" element being an element that has not yet
been processed), we added a boolean 'in_order_filled' member to the
VirtQueueElement struct. This flag is set to true when the element comes
back from processing (virtqueue_ordered_fill) and then set back to false
once it's been written to the used/descriptor ring
(virtqueue_ordered_flush).

---
v2: Make 'in_order_filled' more descriptive.
Change 'j' to more descriptive var name in virtqueue_split_pop.
Use more definitive search conditional in virtqueue_ordered_fill.
Avoid code duplication in virtqueue_ordered_flush.

v1: Move series from RFC to PATCH for submission.

Jonah Palmer (6):
  virtio: Add bool to VirtQueueElement
  virtio: virtqueue_pop - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support
  virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER support
  vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits
  virtio: Add VIRTIO_F_IN_ORDER property definition

 hw/block/vhost-user-blk.c|   1 +
 hw/net/vhost_net.c   |   2 +
 hw/scsi/vhost-scsi.c |   1 +
 hw/scsi/vhost-user-scsi.c|   1 +
 hw/virtio/vhost-user-fs.c|   1 +
 hw/virtio/vhost-user-vsock.c |   1 +
 hw/virtio/virtio.c   | 119 ++-
 include/hw/virtio/virtio.h   |   6 +-
 net/vhost-vdpa.c |   1 +
 9 files changed, 129 insertions(+), 4 deletions(-)

-- 
2.39.3

[PATCH v2 6/6] virtio: Add VIRTIO_F_IN_ORDER property definition

2024-05-20 Thread Jonah Palmer

Extend the virtio device property definitions to include the
VIRTIO_F_IN_ORDER feature.

The default state of this feature is disabled, allowing it to be
explicitly enabled where it's supported.

Tested-by: Lei Yang 
Acked-by: Eugenio Pérez 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 88e70c1ae1..d33345ecc5 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -371,7 +371,9 @@ typedef struct VirtIORNGConf VirtIORNGConf;
 DEFINE_PROP_BIT64("packed", _state, _field, \
   VIRTIO_F_RING_PACKED, false), \
 DEFINE_PROP_BIT64("queue_reset", _state, _field, \
-  VIRTIO_F_RING_RESET, true)
+  VIRTIO_F_RING_RESET, true), \
+DEFINE_PROP_BIT64("in_order", _state, _field, \
+  VIRTIO_F_IN_ORDER, false)
 
 hwaddr virtio_queue_get_desc_addr(VirtIODevice *vdev, int n);
 bool virtio_queue_enabled_legacy(VirtIODevice *vdev, int n);
-- 
2.39.3

[PATCH v2 3/6] virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support

2024-05-20 Thread Jonah Palmer

Add VIRTIO_F_IN_ORDER feature support for the virtqueue_fill operation.

The goal of the virtqueue_ordered_fill operation when the
VIRTIO_F_IN_ORDER feature has been negotiated is to search for this
now-used element, set its length, and mark the element as filled in
the VirtQueue's used_elems array.

Marking the element as filled indicates that it has been processed and
is ready to be flushed, provided it is in-order.
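The search described above can be sketched on its own. This sketch starts at the used index and hops from element to element by each element's descriptor count until it finds the slot whose head id matches, then sets its length and marks it filled. Unlike the patch, it takes the number of outstanding descriptors as an explicit parameter. This sidesteps the case where a full ring and an empty ring both give a modulo distance of 0. `struct slot`, the ring size and the parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8 /* assumed queue size */

struct slot {
    unsigned int index;  /* head descriptor id */
    unsigned int len;
    unsigned int ndescs;
    bool filled;
};

/*
 * Find the in-flight element with head id `id` (at most `outstanding`
 * descriptors past used_idx), record its length and mark it filled.
 * Returns true if the element was found.
 */
static bool ordered_fill(struct slot ring[RING_SIZE], unsigned int used_idx,
                         unsigned int outstanding, unsigned int id,
                         unsigned int len)
{
    unsigned int i = used_idx;
    unsigned int steps = 0;

    assert(used_idx < RING_SIZE && outstanding <= RING_SIZE);
    while (steps < outstanding) {
        unsigned int hop = ring[i].ndescs; /* read before moving */

        if (ring[i].index == id) {
            ring[i].len = len;
            ring[i].filled = true;
            return true;
        }
        if (hop == 0) {
            break; /* malformed slot; avoid looping forever */
        }
        steps += hop;
        i = (i + hop) % RING_SIZE;
    }
    return false;
}
```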

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7456d61bc8..01b6b32460 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -873,6 +873,38 @@ static void virtqueue_packed_fill(VirtQueue *vq, const VirtQueueElement *elem,
 vq->used_elems[idx].ndescs = elem->ndescs;
 }
 
+static void virtqueue_ordered_fill(VirtQueue *vq, const VirtQueueElement *elem,
+   unsigned int len)
+{
+unsigned int i, steps, max_steps;
+
+i = vq->used_idx;
+steps = 0;
+/*
+ * We shouldn't need to increase 'i' by more than the distance
+ * between used_idx and last_avail_idx.
+ */
+max_steps = (vq->last_avail_idx + vq->vring.num - vq->used_idx)
+% vq->vring.num;
+
+/* Search for element in vq->used_elems */
+while (steps <= max_steps) {
+/* Found element, set length and mark as filled */
+if (vq->used_elems[i].index == elem->index) {
+vq->used_elems[i].len = len;
+vq->used_elems[i].in_order_filled = true;
+break;
+}
+
+i += vq->used_elems[i].ndescs;
+steps += vq->used_elems[i].ndescs;
+
+if (i >= vq->vring.num) {
+i -= vq->vring.num;
+}
+}
+}
+
 static void virtqueue_packed_fill_desc(VirtQueue *vq,
const VirtQueueElement *elem,
unsigned int idx,
@@ -923,7 +955,9 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem,
 return;
 }
 
-if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
+if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_IN_ORDER)) {
+virtqueue_ordered_fill(vq, elem, len);
+} else if (virtio_vdev_has_feature(vq->vdev, VIRTIO_F_RING_PACKED)) {
 virtqueue_packed_fill(vq, elem, len, idx);
 } else {
 virtqueue_split_fill(vq, elem, len, idx);
-- 
2.39.3

[PATCH v2 1/6] virtio: Add bool to VirtQueueElement

2024-05-20 Thread Jonah Palmer

Add the boolean 'in_order_filled' member to the VirtQueueElement structure.
The use of this boolean will signify whether the element has been processed
and is ready to be flushed (so long as the element is in-order). This
boolean is used to support the VIRTIO_F_IN_ORDER feature.

Tested-by: Lei Yang 
Signed-off-by: Jonah Palmer 
---
 include/hw/virtio/virtio.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 7d5ffdc145..88e70c1ae1 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -69,6 +69,8 @@ typedef struct VirtQueueElement
 unsigned int ndescs;
 unsigned int out_num;
 unsigned int in_num;
+/* Element has been processed (VIRTIO_F_IN_ORDER) */
+bool in_order_filled;
 hwaddr *in_addr;
 hwaddr *out_addr;
 struct iovec *in_sg;
-- 
2.39.3

[PATCH v2 0/2] target/riscv: Minor fixes and improvements for Virtual IRQs

2024-05-20 Thread Rajnesh Kanwal

This series contains a few miscellaneous fixes for Virtual IRQs
and related code. The first patch changes CSR mask widths to 64 bits,
as AIA introduces half CSRs on 32-bit systems.

The second patch fixes the overlap between guest and core local IRQs.
QEMU creates a single IRQ range in riscv_cpu_init() that is shared
between core local interrupts and guests. Currently no device
generates interrupts in the 13:63 range, and the virtual IRQ logic in
QEMU doesn't go through the riscv_cpu_set_irq() path. Even so, it's
better to keep the local and guest ranges separate to avoid confusion
and future issues.

Patches can be found here on github [0] and v1 of the series
can be found here [1].

Patches are based on alistair/riscv-to-apply.next.

[0] https://github.com/rajnesh-kanwal/qemu/tree/dev/rkanwal/irq_fixes_v2
[1] https://lore.kernel.org/all/20240513114602.72098-1-rkan...@rivosinc.com/

Changes from v1->v2:
1. checkpatch fixes.
2. Removed commit title split from Fixes tags.

Rajnesh Kanwal (2):
  target/riscv: Extend virtual irq csrs masks to be 64 bit wide.
  target/riscv: Move Guest irqs out of the core local irqs range.

 target/riscv/cpu_bits.h |  3 ++-
 target/riscv/csr.c  | 23 +++
 2 files changed, 17 insertions(+), 9 deletions(-)

-- 
2.34.1

[PATCH v2 1/2] target/riscv: Extend virtual irq csrs masks to be 64 bit wide.

2024-05-20 Thread Rajnesh Kanwal

AIA extends the width of all IRQ CSRs to 64 bits, even
on 32-bit systems, by adding the missing half CSRs.

This seems to have been missed when support for
virtual IRQs was added. The logic itself appears correct
except for the width of the masks.
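A minimal sketch of why the width matters. On RV32, target_ulong is 32 bits, so a mask stored in that type cannot represent interrupt bits 32-63, which AIA makes reachable through the high-half CSRs. Here `uint32_t` stands in for an RV32 target_ulong. The local LOCAL_INTERRUPTS uses the 64-bit form that the second patch in this series introduces. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 13..63: all local interrupts except the fixed ones 0..12. */
#define LOCAL_INTERRUPTS (~0x1FFFULL)

/* Mask held in a 32-bit variable, as with target_ulong on RV32. */
static bool writable_narrow(unsigned int irq)
{
    const uint32_t mask = (uint32_t)LOCAL_INTERRUPTS; /* bits 32..63 lost */

    assert(irq < 64);
    return irq < 32 && ((mask >> irq) & 1u);
}

/* Same mask held in a 64-bit variable. */
static bool writable_wide(unsigned int irq)
{
    const uint64_t mask = LOCAL_INTERRUPTS;

    assert(irq < 64);
    return (mask >> irq) & 1u;
}
```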

Fixes: 1697837ed9 ("target/riscv: Add M-mode virtual interrupt and IRQ filtering support.")
Fixes: 40336d5b1d ("target/riscv: Add HS-mode virtual interrupt and IRQ filtering support.")

Signed-off-by: Rajnesh Kanwal 
Reviewed-by: Daniel Henrique Barboza 
---
 target/riscv/csr.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 6b460ee0e8..152796ebc0 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1200,18 +1200,18 @@ static const target_ulong sstatus_v1_10_mask = SSTATUS_SIE | SSTATUS_SPIE |
  */
 
 /* Bit STIP can be an alias of mip.STIP that's why it's writable in mvip. */
-static const target_ulong mvip_writable_mask = MIP_SSIP | MIP_STIP | MIP_SEIP |
+static const uint64_t mvip_writable_mask = MIP_SSIP | MIP_STIP | MIP_SEIP |
 LOCAL_INTERRUPTS;
-static const target_ulong mvien_writable_mask = MIP_SSIP | MIP_SEIP |
+static const uint64_t mvien_writable_mask = MIP_SSIP | MIP_SEIP |
 LOCAL_INTERRUPTS;
 
-static const target_ulong sip_writable_mask = SIP_SSIP | LOCAL_INTERRUPTS;
-static const target_ulong hip_writable_mask = MIP_VSSIP;
-static const target_ulong hvip_writable_mask = MIP_VSSIP | MIP_VSTIP |
+static const uint64_t sip_writable_mask = SIP_SSIP | LOCAL_INTERRUPTS;
+static const uint64_t hip_writable_mask = MIP_VSSIP;
+static const uint64_t hvip_writable_mask = MIP_VSSIP | MIP_VSTIP |
 MIP_VSEIP | LOCAL_INTERRUPTS;
-static const target_ulong hvien_writable_mask = LOCAL_INTERRUPTS;
+static const uint64_t hvien_writable_mask = LOCAL_INTERRUPTS;
 
-static const target_ulong vsip_writable_mask = MIP_VSSIP | LOCAL_INTERRUPTS;
+static const uint64_t vsip_writable_mask = MIP_VSSIP | LOCAL_INTERRUPTS;
 
 const bool valid_vm_1_10_32[16] = {
 [VM_1_10_MBARE] = true,
-- 
2.34.1

[PATCH v2 2/2] target/riscv: Move Guest irqs out of the core local irqs range.

2024-05-20 Thread Rajnesh Kanwal

QEMU maps IRQs 0:15 to core interrupts and 16 onward to
guest interrupts, which are later translated to hgeip in the
`riscv_cpu_set_irq()` function.

With virtual IRQ support added, software can now use the
whole local interrupt range without any actual
hardware attached.

This change moves the guest interrupt range after the
core local interrupt range to avoid a clash.

Fixes: 1697837ed9 ("target/riscv: Add M-mode virtual interrupt and IRQ filtering support.")
Fixes: 40336d5b1d ("target/riscv: Add HS-mode virtual interrupt and IRQ filtering support.")

Signed-off-by: Rajnesh Kanwal 
---
 target/riscv/cpu_bits.h | 3 ++-
 target/riscv/csr.c  | 9 -
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
index 74318a925c..a470fda9be 100644
--- a/target/riscv/cpu_bits.h
+++ b/target/riscv/cpu_bits.h
@@ -695,7 +695,8 @@ typedef enum RISCVException {
 #define IRQ_M_EXT  11
 #define IRQ_S_GEXT 12
 #define IRQ_PMU_OVF13
-#define IRQ_LOCAL_MAX  16
+#define IRQ_LOCAL_MAX  64
+/* -1 is due to bit zero of hgeip and hgeie being ROZ. */
 #define IRQ_LOCAL_GUEST_MAX(TARGET_LONG_BITS - 1)
 
 /* mip masks */
diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 152796ebc0..464e0e57a3 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1148,7 +1148,14 @@ static RISCVException write_stimecmph(CPURISCVState *env, int csrno,
 
 #define VSTOPI_NUM_SRCS 5
 
-#define LOCAL_INTERRUPTS (~0x1FFF)
+/*
+ * All core local interrupts except the fixed ones 0:12. This macro is for
+ * virtual interrupts logic so please don't change this to avoid messing up
+ * the whole support, For reference see AIA spec: `5.3 Interrupt filtering and
+ * virtual interrupts for supervisor level` and `6.3.2 Virtual interrupts for
+ * VS level`.
+ */
+#define LOCAL_INTERRUPTS   (~0x1FFFULL)
 
 static const uint64_t delegable_ints =
 S_MODE_INTERRUPTS | VS_MODE_INTERRUPTS | MIP_LCOFIP;
-- 
2.34.1

Re: [PATCH v2] hw/input/tsc2005: Fix -Wchar-subscripts warning in tsc2005_txrx()

2024-05-20 Thread Peter Maydell

On Wed, 8 May 2024 at 15:35, Philippe Mathieu-Daudé  wrote:
>
> Check the function index is in range and use an unsigned
> variable to avoid the following warning with GCC 13.2.0:
>
>   [666/5358] Compiling C object libcommon.fa.p/hw_input_tsc2005.c.o
>   hw/input/tsc2005.c: In function 'tsc2005_timer_tick':
>   hw/input/tsc2005.c:416:26: warning: array subscript has type 'char' [-Wchar-subscripts]
> 416 | s->dav |= mode_regs[s->function];
> | ~^~
>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> v2: Use Peter suggestion
> ---
>  hw/input/tsc2005.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/hw/input/tsc2005.c b/hw/input/tsc2005.c
> index 941f163d36..8d35892c09 100644
> --- a/hw/input/tsc2005.c
> +++ b/hw/input/tsc2005.c
> @@ -406,6 +406,9 @@ uint32_t tsc2005_txrx(void *opaque, uint32_t value, int len)
>  static void tsc2005_timer_tick(void *opaque)
>  {
>  TSC2005State *s = opaque;
> +unsigned int function = s->function;
> +
> +assert(function < ARRAY_SIZE(mode_regs);

Missing ')' -- this doesn't compile ;-)


Applied to target-arm.next with the typo fixed, thanks.
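For reference, here is a standalone sketch of the same pattern with the closing parenthesis in place. The char-typed field is copied into an unsigned index, and that index is range-checked before the table lookup. `ARRAY_SIZE` is defined locally (QEMU provides its own). The four-entry `mode_regs` table and the `lookup_mode` helper are hypothetical placeholders, not the real tsc2005 data.

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical placeholder table; the real mode_regs is not shown here. */
static const uint16_t mode_regs[4] = { 0xa, 0xb, 0xc, 0xd };

static uint16_t lookup_mode(char raw_function)
{
    /* A negative char converts to a huge unsigned value and trips the assert. */
    unsigned int function = raw_function;

    assert(function < ARRAY_SIZE(mode_regs));
    return mode_regs[function];
}
```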

-- PMM
