Re: [PATCH v6 7/7] hw/arm/virt: Add properties to disable high memory regions

2022-10-27 Thread Gavin Shan

Hi Connie,

On 10/26/22 7:10 PM, Cornelia Huck wrote:

On Wed, Oct 26 2022, Gavin Shan  wrote:


On 10/25/22 6:54 PM, Cornelia Huck wrote:

On Mon, Oct 24 2022, Gavin Shan  wrote:


These 3 high memory regions are usually enabled by default, but


s/These 3/The/ ?



Ok.


they may be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't
needed by GICv2. This leads to waste in the PA space.


When building the command line, do we have enough information on when
the regions provide something useful, and when they just waste space?



I think the help messages are already indicative. For example, the help
messages for 'highmem-redist2' indicate the region is only needed by
GICv3 or GICv4. 'highmem-ecam' and 'highmem-mmio' are needed by PCI ECAM
and MMIO and the key words 'high' indicates they're the corresponding
second window.

#./qemu-system-aarch64 -M virt,?
highmem-ecam=- Set on/off to enable/disable high memory region for 
PCI ECAM
highmem-mmio=- Set on/off to enable/disable high memory region for 
PCI MMIO
highmem-redists= - Set on/off to enable/disable high memory region for 
GICv3 or GICv4 redistributor


OK, hopefully this is enough for anyone building a command line
directly. (Do we want to encourage management software like libvirt to
switch off regions that are not needed?)



Yeah, to disable those regions aren't needed will make libvirt smart.
I think it's worthy for libvirt to do it. I'm not sure if arbitrary
machine properties have been supported by libvirt or not. For example,
'highmem-redists' can still be turned off from 'vm1.xml' even though
it's not explicitly supported in libvirt's code.





Add properties to allow users selectively disable them if needed:
"highmem-redists", "highmem-ecam", "highmem-mmio".

Suggested-by: Marc Zyngier 
Signed-off-by: Gavin Shan 
---
   docs/system/arm/virt.rst | 12 
   hw/arm/virt.c| 64 
   2 files changed, 76 insertions(+)

diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst
index 4454706392..a1668a969d 100644
--- a/docs/system/arm/virt.rst
+++ b/docs/system/arm/virt.rst
@@ -98,6 +98,18 @@ compact-highmem
 Set ``on``/``off`` to enable/disable the compact layout for high memory 
regions.
 The default is ``on`` for machine types later than ``virt-7.2``.
   
+highmem-redists

+  Set ``on``/``off`` to enable/disable the high memry region for GICv3/4


s/memry/memory/



Ok, copy-and-paste error. Will be fixed.


+  redistributor. The default is ``on``.


Do we need to add a note about what effects setting this to "off" may
have, e.g. "Setting this to ``off`` may limit the maximum number of
cpus." or so? And/or "Setting this to ``off`` when using GICv2 will save
some space."?



We may not mention GICv2 since GICv3/v4 are already mentioned. It's a
good idea to mention that the maximum number of CPUs is reduced when
it's turned off. I will have something like below in next respin if
you agree.

highmem-redists
Set ``on``/``off`` to enable/disable the high memroy region for GICv3 or
GICv4 redistributor. The default is ``on``. Setting this to ``off`` will
limit the maximum number of CPUs when GICv3 or GICv4 is used.


OK, sounds reasonable to me.



Ok, Thanks for your confirm.



Since 'vms->highmem_redists' is changeable, the 'virt_max_cpus' in
machvirt_init() needs to be recalculated based on that. The code change
will be included into next respin. Besides, the follow-up error message
will be improved to something like below.

error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
 "supported by machine 'mach-virt' (%d). The high memory "
 "region for GICv3 or GICv4 redistributor has been %s",
 max_cpus, virt_max_cpus,
 vms->highmem_redists ? "enabled" : "disabled");


Hm, the doc for error_report() states that "The resulting message should
be a single phrase, with no newline or trailing punctuation." Maybe

 if (max_cpus > virt_max_cpus) {
 error_report("Number of SMP CPUs requested (%d) exceeds max CPUs "
  "supported by machine 'mach-virt' (%d)",
  max_cpus, virt_max_cpus);
 if (vms->gic_version != VIRT_GIC_VERSION_2 &&
 !vms->higmem_redists) {
 error_printf("Try 'highmem-redists=on' for more CPUs\n");
 }
 exit(1);
 }



Ok, I didn't notice the message raised by error_report() has this sort
of requirements. Your suggested snippet looks good to me and it will
be included in next respin. However, I would hold next respin (v7) for
a while to see if I can receive comments from Peter/Marc on v6 :)

Thanks,
Gavin




Re: [PATCH 0/4] Shadow VirtQueue event index support

2022-10-27 Thread Eugenio Perez Martin
On Fri, Oct 28, 2022 at 4:44 AM Jason Wang  wrote:
>
>
> 在 2022/10/27 04:58, Michael S. Tsirkin 写道:
> > On Thu, Oct 20, 2022 at 05:52:47PM +0200, Eugenio Pérez wrote:
> >> Event idx helps to reduce the number of notifications between the device
> >> and the driver. It allows them to specify an index on the circular
> >> descriptors rings where to issue the notification, instead of a single
> >> binary indicator.
> >>
> >> Adding support for SVQ.
> >
> > Jason seems to be taking this through net
> >
> > Reviewed-by: Michael S. Tsirkin 
>
>
> Ok, I've queued this.
>
> Eugenio, please post the fix for endian on top.
>

I've got a v2 ready to send, would it be possible to send it right now
and send a v2 pull? That would save a few commits where the vdpa
devices do not work with big endian architectures.

Thanks!




Re: [PATCH v6 5/7] hw/arm/virt: Improve high memory region address assignment

2022-10-27 Thread Gavin Shan

Hi Connie,

On 10/26/22 6:43 PM, Cornelia Huck wrote:

On Wed, Oct 26 2022, Gavin Shan  wrote:

On 10/26/22 12:29 AM, Eric Auger wrote:

On 10/24/22 05:54, Gavin Shan wrote:

There are three high memory regions, which are VIRT_HIGH_REDIST2,
VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses
are floating on highest RAM address. However, they can be disabled
in several cases.

(1) One specific high memory region is likely to be disabled by
  code by toggling vms->highmem_{redists, ecam, mmio}.

(2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is
  'virt-2.12' or ealier than it.

(3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded
  on 32-bits system.

(4) One specific high memory region is disabled when it breaks the
  PA space limit.

The current implementation of virt_set_{memmap, high_memmap}() isn't
optimized because the high memory region's PA space is always reserved,
regardless of whatever the actual state in the corresponding
vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and
'vms->highest_gpa' are always increased for case (1), (2) and (3).
It's unnecessary since the assigned PA space for the disabled high
memory region won't be used afterwards.

Improve the address assignment for those three high memory region by
skipping the address assignment for one specific high memory region if
it has been disabled in case (1), (2) and (3). The memory layout may
be changed after the improvement is applied, which leads to potential
migration breakage. So 'vms->highmem_compact' is added to control if
the improvement should be applied. For now, 'vms->highmem_compact' is
set to false, meaning that we don't have memory layout change until it
becomes configurable through property 'compact-highmem' in next patch.

Signed-off-by: Gavin Shan 
Reviewed-by: Cornelia Huck 

the code has quite changed since Connie's R-b


Right. Connie, could you please check if the changes make sense to you
and I can regain your R-B? :)


My R-b still holds.



Thanks.




Tested-by: Zhenyu Zhang 
---
   hw/arm/virt.c | 15 ++-
   include/hw/arm/virt.h |  1 +
   2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ee98a8a3b6..4896f600b4 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms,
   vms->memmap[i].size = region_size;
   
   /*

- * Check each device to see if they fit in the PA space,
- * moving highest_gpa as we go.
+ * Check each device to see if it fits in the PA space,
+ * moving highest_gpa as we go. For compatibility, move
+ * highest_gpa for disabled fitting devices as well, if
+ * the compact layout has been disabled.
*
* For each device that doesn't fit, disable it.
*/
   fits = (region_base + region_size) <= BIT_ULL(pa_bits);
-if (fits) {
-vms->highest_gpa = region_base + region_size - 1;
+*region_enabled &= fits;
+if (vms->highmem_compact && !*region_enabled) {
+continue;
   }
   
-*region_enabled &= fits;

   base = region_base + region_size;
+if (fits) {
+vms->highest_gpa = region_base + region_size - 1;


vms->highest_gpa = base - 1;



It's personal taste actually. I was thinking of using 'base - 1', but
'region_base + region_size - 1' looks more like a direct way. I don't
have strong sense though and lets use 'base - 1' in next respin.


I don't really have a preference for one or the other.



Ok. Lets use 'base - 1' in next respin.

Thanks,
Gavin




Re: [PATCH v9 6/8] KVM: Update lpage info when private/shared memory are mixed

2022-10-27 Thread Chao Peng
On Wed, Oct 26, 2022 at 01:46:20PM -0700, Isaku Yamahata wrote:
> On Tue, Oct 25, 2022 at 11:13:42PM +0800,
> Chao Peng  wrote:
> 
> > When private/shared memory are mixed in a large page, the lpage_info may
> > not be accurate and should be updated with this mixed info. A large page
> > has mixed pages can't be really mapped as large page since its
> > private/shared pages are from different physical memory.
> > 
> > Update lpage_info when private/shared memory attribute is changed. If
> > both private and shared pages are within a large page region, it can't
> > be mapped as large page. It's a bit challenge to track the mixed
> > info in a 'count' like variable, this patch instead reserves a bit in
> > 'disallow_lpage' to indicate a large page has mixed private/share pages.
> > 
> > Signed-off-by: Chao Peng 
> > ---
> >  arch/x86/include/asm/kvm_host.h |   8 +++
> >  arch/x86/kvm/mmu/mmu.c  | 112 +++-
> >  arch/x86/kvm/x86.c  |   2 +
> >  include/linux/kvm_host.h|  19 ++
> >  virt/kvm/kvm_main.c |  16 +++--
> >  5 files changed, 152 insertions(+), 5 deletions(-)
> > 
> ...
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 33b1aec44fb8..67a9823a8c35 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> ...
> > @@ -6910,3 +6915,108 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
> > if (kvm->arch.nx_lpage_recovery_thread)
> > kthread_stop(kvm->arch.nx_lpage_recovery_thread);
> >  }
> > +
> > +static inline bool linfo_is_mixed(struct kvm_lpage_info *linfo)
> > +{
> > +   return linfo->disallow_lpage & KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > +}
> > +
> > +static inline void linfo_update_mixed(struct kvm_lpage_info *linfo, bool 
> > mixed)
> > +{
> > +   if (mixed)
> > +   linfo->disallow_lpage |= KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > +   else
> > +   linfo->disallow_lpage &= ~KVM_LPAGE_PRIVATE_SHARED_MIXED;
> > +}
> > +
> > +static bool mem_attr_is_mixed_2m(struct kvm *kvm, unsigned int attr,
> > +gfn_t start, gfn_t end)
> > +{
> > +   XA_STATE(xas, &kvm->mem_attr_array, start);
> > +   gfn_t gfn = start;
> > +   void *entry;
> > +   bool shared = attr == KVM_MEM_ATTR_SHARED;
> > +   bool mixed = false;
> > +
> > +   rcu_read_lock();
> > +   entry = xas_load(&xas);
> > +   while (gfn < end) {
> > +   if (xas_retry(&xas, entry))
> > +   continue;
> > +
> > +   KVM_BUG_ON(gfn != xas.xa_index, kvm);
> > +
> > +   if ((entry && !shared) || (!entry && shared)) {
> > +   mixed = true;
> > +   goto out;
> 
> nitpick: goto isn't needed. break should work.

Thanks.

> 
> > +   }
> > +
> > +   entry = xas_next(&xas);
> > +   gfn++;
> > +   }
> > +out:
> > +   rcu_read_unlock();
> > +   return mixed;
> > +}
> > +
> > +static bool mem_attr_is_mixed(struct kvm *kvm, struct kvm_memory_slot 
> > *slot,
> > + int level, unsigned int attr,
> > + gfn_t start, gfn_t end)
> > +{
> > +   unsigned long gfn;
> > +   void *entry;
> > +
> > +   if (level == PG_LEVEL_2M)
> > +   return mem_attr_is_mixed_2m(kvm, attr, start, end);
> > +
> > +   entry = xa_load(&kvm->mem_attr_array, start);
> > +   for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) {
> > +   if (linfo_is_mixed(lpage_info_slot(gfn, slot, level - 1)))
> > +   return true;
> > +   if (xa_load(&kvm->mem_attr_array, gfn) != entry)
> > +   return true;
> > +   }
> > +   return false;
> > +}
> > +
> > +void kvm_arch_update_mem_attr(struct kvm *kvm, struct kvm_memory_slot 
> > *slot,
> > + unsigned int attr, gfn_t start, gfn_t end)
> > +{
> > +
> > +   unsigned long lpage_start, lpage_end;
> > +   unsigned long gfn, pages, mask;
> > +   int level;
> > +
> > +   WARN_ONCE(!(attr & (KVM_MEM_ATTR_PRIVATE | KVM_MEM_ATTR_SHARED)),
> > +   "Unsupported mem attribute.\n");
> > +
> > +   /*
> > +* The sequence matters here: we update the higher level basing on the
> > +* lower level's scanning result.
> > +*/
> > +   for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
> > +   pages = KVM_PAGES_PER_HPAGE(level);
> > +   mask = ~(pages - 1);
> 
> nitpick: KVM_HPAGE_MASK(level).  Maybe matter of preference.

Yes, haven't noticed there is a KVM_HPAGE_MASK defined. Have no
strong preference here, since I already have KVM_PAGES_PER_HPAGE(level),
getting mask is straightforward.

A single KVM_HPAGE_MASK(level) will not give me what I need since here
is gfn, KVM_HPAGE_MASK(level)>> PAGE_SHIFT should be the right
equivalent.

Chao
> 
> 
> > +   lpage_start = max(start & mask, slot->base_gfn);
> > +   lpage_end = (end - 1) & mask;
> > +
> > +   /*
> > +* We only need to scan the head and tail 

Re: [PATCH v9 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit

2022-10-27 Thread Chao Peng
On Thu, Oct 27, 2022 at 11:27:05AM +0100, Fuad Tabba wrote:
> Hi,
> 
> On Tue, Oct 25, 2022 at 4:19 PM Chao Peng  wrote:
> >
> > This new KVM exit allows userspace to handle memory-related errors. It
> > indicates an error happens in KVM at guest memory range [gpa, gpa+size).
> > The flags includes additional information for userspace to handle the
> > error. Currently bit 0 is defined as 'private memory' where '1'
> > indicates error happens due to private memory access and '0' indicates
> > error happens due to shared memory access.
> >
> > When private memory is enabled, this new exit will be used for KVM to
> > exit to userspace for shared <-> private memory conversion in memory
> > encryption usage. In such usage, typically there are two kind of memory
> > conversions:
> >   - explicit conversion: happens when guest explicitly calls into KVM
> > to map a range (as private or shared), KVM then exits to userspace
> > to perform the map/unmap operations.
> >   - implicit conversion: happens in KVM page fault handler where KVM
> > exits to userspace for an implicit conversion when the page is in a
> > different state than requested (private or shared).
> >
> > Suggested-by: Sean Christopherson 
> > Co-developed-by: Yu Zhang 
> > Signed-off-by: Yu Zhang 
> > Signed-off-by: Chao Peng 
> > ---
> 
> Reviewed-by: Fuad Tabba 
> 
> I have tested the V8 version of this patch on arm64/qemu, and
> considering this hasn't changed:
> Tested-by: Fuad Tabba 

Appreciate your review and testing!

Chao
> 
> Cheers,
> /fuad
> 
> 
> 
> >  Documentation/virt/kvm/api.rst | 23 +++
> >  include/uapi/linux/kvm.h   |  9 +
> >  2 files changed, 32 insertions(+)
> >
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index f3fa75649a78..975688912b8c 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace 
> > should update the return
> >  values of SBI call before resuming the VCPU. For more details on RISC-V SBI
> >  spec refer, https://github.com/riscv/riscv-sbi-doc.
> >
> > +::
> > +
> > +   /* KVM_EXIT_MEMORY_FAULT */
> > +   struct {
> > +  #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0)
> > +   __u32 flags;
> > +   __u32 padding;
> > +   __u64 gpa;
> > +   __u64 size;
> > +   } memory;
> > +
> > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has
> > +encountered a memory error which is not handled by KVM kernel module and
> > +userspace may choose to handle it. The 'flags' field indicates the memory
> > +properties of the exit.
> > +
> > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by
> > +   private memory access when the bit is set. Otherwise the memory error is
> > +   caused by shared memory access when the bit is clear.
> > +
> > +'gpa' and 'size' indicate the memory range the error occurs at. The 
> > userspace
> > +may handle the error and return to KVM to retry the previous memory access.
> > +
> >  ::
> >
> >  /* KVM_EXIT_NOTIFY */
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index f1ae45c10c94..fa60b032a405 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -300,6 +300,7 @@ struct kvm_xen_exit {
> >  #define KVM_EXIT_RISCV_SBI35
> >  #define KVM_EXIT_RISCV_CSR36
> >  #define KVM_EXIT_NOTIFY   37
> > +#define KVM_EXIT_MEMORY_FAULT 38
> >
> >  /* For KVM_EXIT_INTERNAL_ERROR */
> >  /* Emulate instruction failed. */
> > @@ -538,6 +539,14 @@ struct kvm_run {
> >  #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0)
> > __u32 flags;
> > } notify;
> > +   /* KVM_EXIT_MEMORY_FAULT */
> > +   struct {
> > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE   (1 << 0)
> > +   __u32 flags;
> > +   __u32 padding;
> > +   __u64 gpa;
> > +   __u64 size;
> > +   } memory;
> > /* Fix the size of the union. */
> > char padding[256];
> > };
> > --
> > 2.25.1
> >



[PATCH V5 4/4] intel-iommu: PASID support

2022-10-27 Thread Jason Wang
This patch introduce ECAP_PASID via "x-pasid-mode". Based on the
existing support for scalable mode, we need to implement the following
missing parts:

1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation
   with PASID
2) tag IOTLB with PASID
3) PASID cache and its flush
4) PASID based IOTLB invalidation

For simplicity PASID cache is not implemented so we can simply
implement the PASID cache flush as a no and leave it to be implemented
in the future. For PASID based IOTLB invalidation, since we haven't
had L1 stage support, the PASID based IOTLB invalidation is not
implemented yet. For PASID based device IOTLB invalidation, it
requires the support for vhost so we forbid enabling device IOTLB when
PASID is enabled now. Those work could be done in the future.

Note that though PASID based IOMMU translation is ready but no device
can issue PASID DMA right now. In this case, PCI_NO_PASID is used as
PASID to identify the address without PASID. vtd_find_add_as() has
been extended to provision address space with PASID which could be
utilized by the future extension of PCI core to allow device model to
use PASID based DMA translation.

This feature would be useful for:

1) prototyping PASID support for devices like virtio
2) future vPASID work
3) future PRS and vSVA work

Reviewed-by: Peter Xu 
Signed-off-by: Jason Wang 
---
Changes since V3:
- rearrange the member for vtd_iotlb_key structure
- reorder the pasid parameter ahead of addr for vtd_lookup_iotlb()
- allow access size from 1 to 8 for vtd_mem_ir_fault_ops
Changes since V2:
- forbid device-iotlb with PASID
- report PASID based qualified fault
- log PASID during errors
---
 hw/i386/intel_iommu.c  | 416 +
 hw/i386/intel_iommu_internal.h |  16 +-
 hw/i386/trace-events   |   2 +
 include/hw/i386/intel_iommu.h  |   7 +-
 include/hw/pci/pci_bus.h   |   2 +
 5 files changed, 339 insertions(+), 104 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 9029ee98f4..7ca077b824 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -58,6 +58,14 @@
 struct vtd_as_key {
 PCIBus *bus;
 uint8_t devfn;
+uint32_t pasid;
+};
+
+struct vtd_iotlb_key {
+uint64_t gfn;
+uint32_t pasid;
+uint32_t level;
+uint16_t sid;
 };
 
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
@@ -199,14 +207,24 @@ static inline gboolean 
vtd_as_has_map_notifier(VTDAddressSpace *as)
 }
 
 /* GHashTable functions */
-static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2)
+static gboolean vtd_iotlb_equal(gconstpointer v1, gconstpointer v2)
 {
-return *((const uint64_t *)v1) == *((const uint64_t *)v2);
+const struct vtd_iotlb_key *key1 = v1;
+const struct vtd_iotlb_key *key2 = v2;
+
+return key1->sid == key2->sid &&
+   key1->pasid == key2->pasid &&
+   key1->level == key2->level &&
+   key1->gfn == key2->gfn;
 }
 
-static guint vtd_uint64_hash(gconstpointer v)
+static guint vtd_iotlb_hash(gconstpointer v)
 {
-return (guint)*(const uint64_t *)v;
+const struct vtd_iotlb_key *key = v;
+
+return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
+   (key->level) << VTD_IOTLB_LVL_SHIFT |
+   (key->pasid) << VTD_IOTLB_PASID_SHIFT;
 }
 
 static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
@@ -214,7 +232,8 @@ static gboolean vtd_as_equal(gconstpointer v1, 
gconstpointer v2)
 const struct vtd_as_key *key1 = v1;
 const struct vtd_as_key *key2 = v2;
 
-return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn) &&
+   (key1->pasid == key2->pasid);
 }
 
 /*
@@ -302,13 +321,6 @@ static void vtd_reset_caches(IntelIOMMUState *s)
 vtd_iommu_unlock(s);
 }
 
-static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id,
-  uint32_t level)
-{
-return gfn | ((uint64_t)(source_id) << VTD_IOTLB_SID_SHIFT) |
-   ((uint64_t)(level) << VTD_IOTLB_LVL_SHIFT);
-}
-
 static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level)
 {
 return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K;
@@ -316,15 +328,17 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t 
level)
 
 /* Must be called with IOMMU lock held */
 static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id,
-   hwaddr addr)
+   uint32_t pasid, hwaddr addr)
 {
+struct vtd_iotlb_key key;
 VTDIOTLBEntry *entry;
-uint64_t key;
 int level;
 
 for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) {
-key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level),
-source_id, level);
+key.gfn = vtd_get_iotlb_gfn(addr, level);
+key.level = level;
+key.sid = source_id;
+key.pasid = pasid;
  

[PATCH V5 0/4] PASID support for Intel IOMMU

2022-10-27 Thread Jason Wang
Hi All:

This series tries to introduce PASID support for Intel IOMMU. The work
is based on the previous scalabe mode support by implement the
ECAP_PASID. A new "x-pasid-mode" is introduced to enable this
mode. All internal vIOMMU codes were extended to support PASID instead
of the current RID2PASID method. The code is also capable of
provisiong address space with PASID. Note that no devices can issue
PASID DMA right now, this needs future work.

This will be used for prototying PASID based device like virtio or
future vPASID support for Intel IOMMU.

Test has been done with the Linux guest with scalalbe mode enabled and
disabled. A virtio prototype[1][2] that can issue PAISD based DMA
request were also tested, different PASID were used in TX and RX in
those testing drivers.

Changes since V4:

- rename vtd_report_qualify_fault() to vtd_report_fault()
- Tweak the code to avoid using ret variable when getting rid2pasid

Changes since V3:

- rearrange the member for vtd_iotlb_key structure
- reorder the pasid parameter ahead of addr for vtd_lookup_iotlb()
- allow access size from 1 to 8 for vtd_mem_ir_fault_ops

Changes since V2:

- use PCI_BUILD_BDF() instead of vtd_make_source_id()
- Tweak the comments above vtd_as_hash()
- use PCI_BUS_NUM() instead of open coding
- rename vtd_as to vtd_address_spaces
- rename vtd_qualify_report_fault() to vtd_report_qualify_fault()
- forbid device-iotlb with PASID
- report PASID based qualified fault
- log PASID during errors

Changes since V1:

- speed up IOMMU translation when RID2PASID is not used
- remove the unnecessary L1 PASID invalidation descriptor support
- adding support for catching the translation to interrupt range when
  in the case of PT and scalable mode
- refine the comments to explain the hash algorithm used in IOTLB
  lookups

Please review.

[1] https://github.com/jasowang/qemu.git virtio-pasid
[2] https://github.com/jasowang/linux.git virtio-pasid

Jason Wang (4):
  intel-iommu: don't warn guest errors when getting rid2pasid entry
  intel-iommu: drop VTDBus
  intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
  intel-iommu: PASID support

 hw/i386/intel_iommu.c  | 692 ++---
 hw/i386/intel_iommu_internal.h |  16 +-
 hw/i386/trace-events   |   2 +
 include/hw/i386/intel_iommu.h  |  18 +-
 include/hw/pci/pci_bus.h   |   2 +
 5 files changed, 485 insertions(+), 245 deletions(-)

-- 
2.25.1




[PATCH V5 3/4] intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function

2022-10-27 Thread Jason Wang
We used to have a macro for VTD_PE_GET_FPD_ERR() but it has an
internal goto which prevents it from being reused. This patch convert
that macro to a dedicated function and let the caller to decide what
to do (e.g using goto or not). This makes sure it can be re-used for
other function that requires fault reporting.

Reviewed-by: Peter Xu 
Signed-off-by: Jason Wang 
---
Changes since V4:
- rename vtd_report_qualify_fault() to vtd_report_fault()
Changes since V2:
- rename vtd_qualify_report_fault() to vtd_report_qualify_fault()
---
 hw/i386/intel_iommu.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 9fe5a222eb..9029ee98f4 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -49,17 +49,6 @@
 /* pe operations */
 #define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT)
 #define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & 
VTD_SM_PASID_ENTRY_AW))
-#define VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write) {\
-if (ret_fr) { \
-ret_fr = -ret_fr; \
-if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {   \
-trace_vtd_fault_disabled();   \
-} else {  \
-vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);  \
-} \
-goto error;   \
-} \
-}
 
 /*
  * PCI bus number (or SID) is not reliable since the device is usaully
@@ -1716,6 +1705,19 @@ out:
 trace_vtd_pt_enable_fast_path(source_id, success);
 }
 
+static void vtd_report_fault(IntelIOMMUState *s,
+ int err, bool is_fpd_set,
+ uint16_t source_id,
+ hwaddr addr,
+ bool is_write)
+{
+if (is_fpd_set && vtd_is_qualified_fault(err)) {
+trace_vtd_fault_disabled();
+} else {
+vtd_report_dmar_fault(s, source_id, addr, err, is_write);
+}
+}
+
 /* Map dev to context-entry then do a paging-structures walk to do a iommu
  * translation.
  *
@@ -1776,7 +1778,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
 if (!is_fpd_set && s->root_scalable) {
 ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, 
is_write);
+if (ret_fr) {
+vtd_report_fault(s, -ret_fr, is_fpd_set,
+ source_id, addr, is_write);
+goto error;
+}
 }
 } else {
 ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
@@ -1784,7 +1790,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 if (!ret_fr && !is_fpd_set && s->root_scalable) {
 ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
 }
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
+if (ret_fr) {
+vtd_report_fault(s, -ret_fr, is_fpd_set,
+ source_id, addr, is_write);
+goto error;
+}
 /* Update context-cache */
 trace_vtd_iotlb_cc_update(bus_num, devfn, ce.hi, ce.lo,
   cc_entry->context_cache_gen,
@@ -1820,7 +1830,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
*vtd_as, PCIBus *bus,
 
 ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level,
&reads, &writes, s->aw_bits);
-VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
+if (ret_fr) {
+vtd_report_fault(s, -ret_fr, is_fpd_set, source_id,
+ addr, is_write);
+goto error;
+}
 
 page_mask = vtd_slpt_level_page_mask(level);
 access_flags = IOMMU_ACCESS_FLAG(reads, writes);
-- 
2.25.1




[PATCH V5 2/4] intel-iommu: drop VTDBus

2022-10-27 Thread Jason Wang
We introduce VTDBus structure as an intermediate step for searching
the address space. This works well with SID based matching/lookup. But
when we want to support SID plus PASID based address space lookup,
this intermediate steps turns out to be a burden. So the patch simply
drops the VTDBus structure and use the PCIBus and devfn as the key for
the g_hash_table(). This simplifies the codes and the future PASID
extension.

To prevent being slower for past vtd_find_as_from_bus_num() callers, a
vtd_as cache indexed by the bus number is introduced to store the last
recent search result of a vtd_as belongs to a specific bus.

Reviewed-by: Peter Xu 
Signed-off-by: Jason Wang 
---
Changes since V2:
- use PCI_BUILD_BDF() instead of vtd_make_source_id()
- Tweak the comments above vtd_as_hash()
- use PCI_BUS_NUM() instead of open coding
- rename vtd_as to vtd_address_spaces
---
 hw/i386/intel_iommu.c | 234 +-
 include/hw/i386/intel_iommu.h |  11 +-
 2 files changed, 118 insertions(+), 127 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 271de995be..9fe5a222eb 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -61,6 +61,16 @@
 } \
 }
 
+/*
+ * PCI bus number (or SID) is not reliable since the device is usaully
+ * initalized before guest can configure the PCI bridge
+ * (SECONDARY_BUS_NUMBER).
+ */
+struct vtd_as_key {
+PCIBus *bus;
+uint8_t devfn;
+};
+
 static void vtd_address_space_refresh_all(IntelIOMMUState *s);
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
 
@@ -210,6 +220,27 @@ static guint vtd_uint64_hash(gconstpointer v)
 return (guint)*(const uint64_t *)v;
 }
 
+static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
+{
+const struct vtd_as_key *key1 = v1;
+const struct vtd_as_key *key2 = v2;
+
+return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+}
+
+/*
+ * Note that we use pointer to PCIBus as the key, so hashing/shifting
+ * based on the pointer value is intended. Note that we deal with
+ * collisions through vtd_as_equal().
+ */
+static guint vtd_as_hash(gconstpointer v)
+{
+const struct vtd_as_key *key = v;
+guint value = (guint)(uintptr_t)key->bus;
+
+return (guint)(value << 8 | key->devfn);
+}
+
 static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value,
   gpointer user_data)
 {
@@ -248,22 +279,14 @@ static gboolean vtd_hash_remove_by_page(gpointer key, 
gpointer value,
 static void vtd_reset_context_cache_locked(IntelIOMMUState *s)
 {
 VTDAddressSpace *vtd_as;
-VTDBus *vtd_bus;
-GHashTableIter bus_it;
-uint32_t devfn_it;
+GHashTableIter as_it;
 
 trace_vtd_context_cache_reset();
 
-g_hash_table_iter_init(&bus_it, s->vtd_as_by_busptr);
+g_hash_table_iter_init(&as_it, s->vtd_address_spaces);
 
-while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) {
-for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) {
-vtd_as = vtd_bus->dev_as[devfn_it];
-if (!vtd_as) {
-continue;
-}
-vtd_as->context_cache_entry.context_cache_gen = 0;
-}
+while (g_hash_table_iter_next (&as_it, NULL, (void**)&vtd_as)) {
+vtd_as->context_cache_entry.context_cache_gen = 0;
 }
 s->context_cache_gen = 1;
 }
@@ -993,32 +1016,6 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, 
uint32_t level)
 return slpte & rsvd_mask;
 }
 
-/* Find the VTD address space associated with a given bus number */
-static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num)
-{
-VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num];
-GHashTableIter iter;
-
-if (vtd_bus) {
-return vtd_bus;
-}
-
-/*
- * Iterate over the registered buses to find the one which
- * currently holds this bus number and update the bus_num
- * lookup table.
- */
-g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-if (pci_bus_num(vtd_bus->bus) == bus_num) {
-s->vtd_as_by_bus_num[bus_num] = vtd_bus;
-return vtd_bus;
-}
-}
-
-return NULL;
-}
-
 /* Given the @iova, get relevant @slptep. @slpte_level will be the last level
  * of the translation, can be used for deciding the size of large page.
  */
@@ -1632,24 +1629,13 @@ static bool vtd_switch_address_space(VTDAddressSpace 
*as)
 
 static void vtd_switch_address_space_all(IntelIOMMUState *s)
 {
+VTDAddressSpace *vtd_as;
 GHashTableIter iter;
-VTDBus *vtd_bus;
-int i;
-
-g_hash_table_iter_init(&iter, s->vtd_as_by_busptr);
-while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) {
-for (i = 0; i < PCI_DEVFN_MAX; i++) {
-if (!vtd_bus->dev_as[i]) {
- 

[PATCH V5 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry

2022-10-27 Thread Jason Wang
We use to warn on wrong rid2pasid entry. But this error could be
triggered by the guest and could happens during initialization. So
let's don't warn in this case.

Reviewed-by: Peter Xu 
Signed-off-by: Jason Wang 
---
Changes since v4:
- Tweak the code to avoid using ret variable
---
 hw/i386/intel_iommu.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6524c2ee32..271de995be 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1554,8 +1554,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, 
VTDContextEntry *ce)
 if (s->root_scalable) {
 ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
 if (ret) {
-error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32,
-  __func__, ret);
+/*
+ * This error is guest triggerable. We should assumt PT
+ * not enabled for safety.
+ */
 return false;
 }
 return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
@@ -1569,14 +1571,12 @@ static bool vtd_as_pt_enabled(VTDAddressSpace *as)
 {
 IntelIOMMUState *s;
 VTDContextEntry ce;
-int ret;
 
 assert(as);
 
 s = as->iommu_state;
-ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus),
-   as->devfn, &ce);
-if (ret) {
+if (vtd_dev_to_context_entry(s, pci_bus_num(as->bus), as->devfn,
+ &ce)) {
 /*
  * Possibly failed to parse the context entry for some reason
  * (e.g., during init, or any guest configuration errors on
-- 
2.25.1




[PULL 1/4] qom: Improve error messages when property has no getter or setter

2022-10-27 Thread Markus Armbruster
When you try to set a property that has no setter, the error message
blames "insufficient permission":

$ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio
QEMU 7.1.50 monitor - type 'help' for more information
(qemu) qom-set /machine type q35
Error: Insufficient permission to perform this operation

This implies it could work with "sufficient permission".  It can't.
Change the error message to:

Error: Property 'pc-i440fx-7.2-machine.type' is not writable

Do the same for getting a property that has no getter.

Signed-off-by: Markus Armbruster 
Message-Id: <20221012153801.2604340-2-arm...@redhat.com>
Reviewed-by: David Hildenbrand 
---
 qom/object.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/qom/object.c b/qom/object.c
index d34608558e..e5cef30f6d 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -1383,7 +1383,8 @@ bool object_property_get(Object *obj, const char *name, 
Visitor *v,
 }
 
 if (!prop->get) {
-error_setg(errp, QERR_PERMISSION_DENIED);
+error_setg(errp, "Property '%s.%s' is not readable",
+   object_get_typename(obj), name);
 return false;
 }
 prop->get(obj, v, name, prop->opaque, &err);
@@ -1402,7 +1403,8 @@ bool object_property_set(Object *obj, const char *name, 
Visitor *v,
 }
 
 if (!prop->set) {
-error_setg(errp, QERR_PERMISSION_DENIED);
+error_setg(errp, "Property '%s.%s' is not writable",
+   object_get_typename(obj), name);
 return false;
 }
 prop->set(obj, v, name, prop->opaque, errp);
-- 
2.37.3




[PULL 3/4] qtest: Improve error messages when property can not be set right now

2022-10-27 Thread Markus Armbruster
When you try to set qtest property "log" while the qtest object is
active, the error message blames "insufficient permission":

$ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -chardev 
socket,id=chrqt0,path=qtest.socket,server=on,wait=off -object 
qtest,id=qt0,chardev=chrqt0,log=/dev/null
QEMU 7.1.50 monitor - type 'help' for more information
(qemu) qom-set /objects/qt0 log qtest.log
Error: Insufficient permission to perform this operation

This implies it could work with "sufficient permission".  It can't.
Change the error message to:

Error: Property 'log' can not be set now

Same for property "chardev".

Signed-off-by: Markus Armbruster 
Message-Id: <20221012153801.2604340-4-arm...@redhat.com>
---
 softmmu/qtest.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/softmmu/qtest.c b/softmmu/qtest.c
index f8acef2628..afea7693d0 100644
--- a/softmmu/qtest.c
+++ b/softmmu/qtest.c
@@ -977,7 +977,7 @@ static void qtest_set_log(Object *obj, const char *value, 
Error **errp)
 QTest *q = QTEST(obj);
 
 if (qtest == q) {
-error_setg(errp, QERR_PERMISSION_DENIED);
+error_setg(errp, "Property 'log' can not be set now");
 } else {
 g_free(q->log);
 q->log = g_strdup(value);
@@ -997,7 +997,7 @@ static void qtest_set_chardev(Object *obj, const char 
*value, Error **errp)
 Chardev *chr;
 
 if (qtest == q) {
-error_setg(errp, QERR_PERMISSION_DENIED);
+error_setg(errp, "Property 'chardev' can not be set now");
 return;
 }
 
-- 
2.37.3




[PULL 2/4] backends: Improve error messages when property can no longer be set

2022-10-27 Thread Markus Armbruster
When you try to set virtio-rng property "filename" after the backend
has been completed with user_creatable_complete(), the error message
blames "insufficient permission":

$ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -object 
rng-random,id=rng0 -device virtio-rng,id=vrng0,rng=rng0
QEMU 7.1.50 monitor - type 'help' for more information
(qemu) qom-set /objects/rng0 filename /dev/random
Error: Insufficient permission to perform this operation

This implies it could work with "sufficient permission".  It can't.
Change the error message to:

Error: Property 'filename' can no longer be set

Same for cryptodev-vhost-user property "chardev", rng-egd property
"chardev", and vhost-user-backend property "chardev".

Signed-off-by: Markus Armbruster 
Message-Id: <20221012153801.2604340-3-arm...@redhat.com>
Acked-by: Michael S. Tsirkin 
[Commit message tidied up]
---
 backends/cryptodev-vhost-user.c | 2 +-
 backends/rng-egd.c  | 2 +-
 backends/rng-random.c   | 2 +-
 backends/vhost-user.c   | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c
index 5443a59153..f9c5867e38 100644
--- a/backends/cryptodev-vhost-user.c
+++ b/backends/cryptodev-vhost-user.c
@@ -339,7 +339,7 @@ static void cryptodev_vhost_user_set_chardev(Object *obj,
   CRYPTODEV_BACKEND_VHOST_USER(obj);
 
 if (s->opened) {
-error_setg(errp, QERR_PERMISSION_DENIED);
+error_setg(errp, "Property 'chardev' can no longer be set");
 } else {
 g_free(s->chr_name);
 s->chr_name = g_strdup(value);
diff --git a/backends/rng-egd.c b/backends/rng-egd.c
index 4de142b9dc..684c3cf3d6 100644
--- a/backends/rng-egd.c
+++ b/backends/rng-egd.c
@@ -116,7 +116,7 @@ static void rng_egd_set_chardev(Object *obj, const char 
*value, Error **errp)
 RngEgd *s = RNG_EGD(b);
 
 if (b->opened) {
-error_setg(errp, QERR_PERMISSION_DENIED);
+error_setg(errp, "Property 'chardev' can no longer be set");
 } else {
 g_free(s->chr_name);
 s->chr_name = g_strdup(value);
diff --git a/backends/rng-random.c b/backends/rng-random.c
index 7add272edd..80eb5be138 100644
--- a/backends/rng-random.c
+++ b/backends/rng-random.c
@@ -96,7 +96,7 @@ static void rng_random_set_filename(Object *obj, const char 
*filename,
 RngRandom *s = RNG_RANDOM(obj);
 
 if (b->opened) {
-error_setg(errp, QERR_PERMISSION_DENIED);
+error_setg(errp, "Property 'filename' can no longer be set");
 return;
 }
 
diff --git a/backends/vhost-user.c b/backends/vhost-user.c
index 10b39992d2..5dedb2d987 100644
--- a/backends/vhost-user.c
+++ b/backends/vhost-user.c
@@ -141,7 +141,7 @@ static void set_chardev(Object *obj, const char *value, 
Error **errp)
 Chardev *chr;
 
 if (b->completed) {
-error_setg(errp, QERR_PERMISSION_DENIED);
+error_setg(errp, "Property 'chardev' can no longer be set");
 return;
 }
 
-- 
2.37.3




[PULL 4/4] qerror: QERR_PERMISSION_DENIED is no longer used, drop

2022-10-27 Thread Markus Armbruster
Signed-off-by: Markus Armbruster 
Message-Id: <20221012153801.2604340-5-arm...@redhat.com>
---
 include/qapi/qmp/qerror.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h
index 596fce0c54..87ca83b155 100644
--- a/include/qapi/qmp/qerror.h
+++ b/include/qapi/qmp/qerror.h
@@ -50,9 +50,6 @@
 #define QERR_MISSING_PARAMETER \
 "Parameter '%s' is missing"
 
-#define QERR_PERMISSION_DENIED \
-"Insufficient permission to perform this operation"
-
 #define QERR_PROPERTY_VALUE_BAD \
 "Property '%s.%s' doesn't take value '%s'"
 
-- 
2.37.3




[PULL 0/4] Error reporting patches for 2022-10-28

2022-10-27 Thread Markus Armbruster
The following changes since commit 344744e148e6e865f5a57e745b02a87e5ea534ad:

  Merge tag 'dump-pull-request' of https://gitlab.com/marcandre.lureau/qemu 
into staging (2022-10-26 10:53:49 -0400)

are available in the Git repository at:

  https://repo.or.cz/qemu/armbru.git tags/pull-error-2022-10-28

for you to fetch changes up to 0dddb0fc80f83d3bb469dc220ba8e2496b27a205:

  qerror: QERR_PERMISSION_DENIED is no longer used, drop (2022-10-27 07:57:18 
+0200)


Error reporting patches for 2022-10-28


Markus Armbruster (4):
  qom: Improve error messages when property has no getter or setter
  backends: Improve error messages when property can no longer be set
  qtest: Improve error messages when property can not be set right now
  qerror: QERR_PERMISSION_DENIED is no longer used, drop

 include/qapi/qmp/qerror.h   | 3 ---
 backends/cryptodev-vhost-user.c | 2 +-
 backends/rng-egd.c  | 2 +-
 backends/rng-random.c   | 2 +-
 backends/vhost-user.c   | 2 +-
 qom/object.c| 6 --
 softmmu/qtest.c | 4 ++--
 7 files changed, 10 insertions(+), 11 deletions(-)

-- 
2.37.3




[PULL 14/26] qapi: net: introduce a way to bypass qemu_opts_parse_noisily()

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

As qemu_opts_parse_noisily() flattens the QAPI structures ("type" field
of Netdev structure can collides with "type" field of SocketAddress),
we introduce a way to bypass qemu_opts_parse_noisily() and use directly
visit_type_Netdev() to parse the backend parameters.

More details from Markus:

qemu_init() passes the argument of -netdev, -nic, and -net to
net_client_parse().

net_client_parse() parses with qemu_opts_parse_noisily(), passing
QemuOptsList qemu_netdev_opts for -netdev, qemu_nic_opts for -nic, and
qemu_net_opts for -net.  Their desc[] are all empty, which means any
keys are accepted.  The result of the parse (a QemuOpts) is stored in
the QemuOptsList.

Note that QemuOpts is flat by design.  In some places, we layer non-flat
on top using dotted keys convention, but not here.

net_init_clients() iterates over the stored QemuOpts, and passes them to
net_init_netdev(), net_param_nic(), or net_init_client(), respectively.

These functions pass the QemuOpts to net_client_init().  They also do
other things with the QemuOpts, which we can ignore here.

net_client_init() uses the opts visitor to convert the (flat) QemOpts to
a (non-flat) QAPI object Netdev.  Netdev is also the argument of QMP
command netdev_add.

The opts visitor was an early attempt to support QAPI in
(QemuOpts-based) CLI.  It restricts QAPI types to a certain shape; see
commit eb7ee2cbeb "qapi: introduce OptsVisitor".

A more modern way to support QAPI is qobject_input_visitor_new_str().
It uses keyval_parse() instead of QemuOpts for KEY=VALUE,... syntax, and
it also supports JSON syntax.  The former isn't quite as expressive as
JSON, but it's a lot closer than QemuOpts + opts visitor.

This commit paves the way to use of the modern way instead.

Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 include/net/net.h |  2 ++
 net/net.c | 57 +++
 softmmu/vl.c  |  6 +-
 3 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/net/net.h b/include/net/net.h
index 55023e7..025dbf1 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -220,6 +220,8 @@ extern NICInfo nd_table[MAX_NICS];
 extern const char *host_net_devices[];
 
 /* from net.c */
+bool netdev_is_modern(const char *optarg);
+void netdev_parse_modern(const char *optarg);
 void net_client_parse(QemuOptsList *opts_list, const char *str);
 void show_netdevs(void);
 void net_init_clients(void);
diff --git a/net/net.c b/net/net.c
index 178257a..128a90f 100644
--- a/net/net.c
+++ b/net/net.c
@@ -54,6 +54,7 @@
 #include "net/colo-compare.h"
 #include "net/filter.h"
 #include "qapi/string-output-visitor.h"
+#include "qapi/qobject-input-visitor.h"
 
 /* Net bridge is currently not supported for W32. */
 #if !defined(_WIN32)
@@ -63,6 +64,16 @@
 static VMChangeStateEntry *net_change_state_entry;
 static QTAILQ_HEAD(, NetClientState) net_clients;
 
+typedef struct NetdevQueueEntry {
+Netdev *nd;
+Location loc;
+QSIMPLEQ_ENTRY(NetdevQueueEntry) entry;
+} NetdevQueueEntry;
+
+typedef QSIMPLEQ_HEAD(, NetdevQueueEntry) NetdevQueue;
+
+static NetdevQueue nd_queue = QSIMPLEQ_HEAD_INITIALIZER(nd_queue);
+
 /***/
 /* network device redirectors */
 
@@ -1566,6 +1577,20 @@ out:
 return ret;
 }
 
+static void netdev_init_modern(void)
+{
+while (!QSIMPLEQ_EMPTY(&nd_queue)) {
+NetdevQueueEntry *nd = QSIMPLEQ_FIRST(&nd_queue);
+
+QSIMPLEQ_REMOVE_HEAD(&nd_queue, entry);
+loc_push_restore(&nd->loc);
+net_client_init1(nd->nd, true, &error_fatal);
+loc_pop(&nd->loc);
+qapi_free_Netdev(nd->nd);
+g_free(nd);
+}
+}
+
 void net_init_clients(void)
 {
 net_change_state_entry =
@@ -1573,6 +1598,8 @@ void net_init_clients(void)
 
 QTAILQ_INIT(&net_clients);
 
+netdev_init_modern();
+
 qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL,
   &error_fatal);
 
@@ -1583,6 +1610,36 @@ void net_init_clients(void)
   &error_fatal);
 }
 
+/*
+ * Does this -netdev argument use modern rather than traditional syntax?
+ * Modern syntax is to be parsed with netdev_parse_modern().
+ * Traditional syntax is to be parsed with net_client_parse().
+ */
+bool netdev_is_modern(const char *optarg)
+{
+return false;
+}
+
+/*
+ * netdev_parse_modern() uses modern, more expressive syntax than
+ * net_client_parse(), but supports only the -netdev option.
+ * netdev_parse_modern() appends to @nd_queue, whereas net_client_parse()
+ * appends to @qemu_netdev_opts.
+ */
+void netdev_parse_modern(const char *optarg)
+{
+Visitor *v;
+NetdevQueueEntry *nd;
+
+v = qobject_input_visitor_new_str(optarg, "type", &error_fatal);
+nd = g_new(NetdevQueueEntry, 1);
+visit_type_Netdev(v, NULL, &nd->nd, &error_fatal);
+visit_free(v);
+loc_save(&nd-

[PULL 15/26] net: introduce qemu_set_info_str() function

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

Embed the setting of info_str in a function.

Signed-off-by: Laurent Vivier 
Reviewed-by: David Gibson 
Acked-by: Michael S. Tsirkin 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Jason Wang 
---
 hw/net/xen_nic.c  |  5 ++---
 include/net/net.h |  1 +
 net/l2tpv3.c  |  3 +--
 net/net.c | 17 -
 net/slirp.c   |  5 ++---
 net/socket.c  | 33 ++---
 net/tap-win32.c   |  3 +--
 net/tap.c | 13 +
 net/vde.c |  3 +--
 net/vhost-user.c  |  3 +--
 net/vhost-vdpa.c  |  2 +-
 11 files changed, 41 insertions(+), 47 deletions(-)

diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c
index 5c815b4..7d92c2d 100644
--- a/hw/net/xen_nic.c
+++ b/hw/net/xen_nic.c
@@ -296,9 +296,8 @@ static int net_init(struct XenLegacyDevice *xendev)
 netdev->nic = qemu_new_nic(&net_xen_info, &netdev->conf,
"xen", NULL, netdev);
 
-snprintf(qemu_get_queue(netdev->nic)->info_str,
- sizeof(qemu_get_queue(netdev->nic)->info_str),
- "nic: xenbus vif macaddr=%s", netdev->mac);
+qemu_set_info_str(qemu_get_queue(netdev->nic),
+  "nic: xenbus vif macaddr=%s", netdev->mac);
 
 /* fill info */
 xenstore_write_be_int(&netdev->xendev, "feature-rx-copy", 1);
diff --git a/include/net/net.h b/include/net/net.h
index 025dbf1..3db75ff 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -177,6 +177,7 @@ ssize_t qemu_send_packet_async(NetClientState *nc, const 
uint8_t *buf,
 void qemu_purge_queued_packets(NetClientState *nc);
 void qemu_flush_queued_packets(NetClientState *nc);
 void qemu_flush_or_purge_queued_packets(NetClientState *nc, bool purge);
+void qemu_set_info_str(NetClientState *nc, const char *fmt, ...);
 void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]);
 bool qemu_has_ufo(NetClientState *nc);
 bool qemu_has_vnet_hdr(NetClientState *nc);
diff --git a/net/l2tpv3.c b/net/l2tpv3.c
index af373e5..350041a 100644
--- a/net/l2tpv3.c
+++ b/net/l2tpv3.c
@@ -723,8 +723,7 @@ int net_init_l2tpv3(const Netdev *netdev,
 
 l2tpv3_read_poll(s, true);
 
-snprintf(s->nc.info_str, sizeof(s->nc.info_str),
- "l2tpv3: connected");
+qemu_set_info_str(&s->nc, "l2tpv3: connected");
 return 0;
 outerr:
 qemu_del_net_client(nc);
diff --git a/net/net.c b/net/net.c
index 128a90f..f29a59c 100644
--- a/net/net.c
+++ b/net/net.c
@@ -141,13 +141,20 @@ char *qemu_mac_strdup_printf(const uint8_t *macaddr)
macaddr[3], macaddr[4], macaddr[5]);
 }
 
+void qemu_set_info_str(NetClientState *nc, const char *fmt, ...)
+{
+va_list ap;
+
+va_start(ap, fmt);
+vsnprintf(nc->info_str, sizeof(nc->info_str), fmt, ap);
+va_end(ap);
+}
+
 void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6])
 {
-snprintf(nc->info_str, sizeof(nc->info_str),
- "model=%s,macaddr=%02x:%02x:%02x:%02x:%02x:%02x",
- nc->model,
- macaddr[0], macaddr[1], macaddr[2],
- macaddr[3], macaddr[4], macaddr[5]);
+qemu_set_info_str(nc, "model=%s,macaddr=%02x:%02x:%02x:%02x:%02x:%02x",
+  nc->model, macaddr[0], macaddr[1], macaddr[2],
+  macaddr[3], macaddr[4], macaddr[5]);
 }
 
 static int mac_table[256] = {0};
diff --git a/net/slirp.c b/net/slirp.c
index 8679be6..14a8d59 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -611,9 +611,8 @@ static int net_slirp_init(NetClientState *peer, const char 
*model,
 
 nc = qemu_new_net_client(&net_slirp_info, peer, model, name);
 
-snprintf(nc->info_str, sizeof(nc->info_str),
- "net=%s,restrict=%s", inet_ntoa(net),
- restricted ? "on" : "off");
+qemu_set_info_str(nc, "net=%s,restrict=%s", inet_ntoa(net),
+  restricted ? "on" : "off");
 
 s = DO_UPCAST(SlirpState, nc, nc);
 
diff --git a/net/socket.c b/net/socket.c
index bfd8596..ade1ecf 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -179,7 +179,7 @@ static void net_socket_send(void *opaque)
 s->fd = -1;
 net_socket_rs_init(&s->rs, net_socket_rs_finalize, false);
 s->nc.link_down = true;
-memset(s->nc.info_str, 0, sizeof(s->nc.info_str));
+qemu_set_info_str(&s->nc, "");
 
 return;
 }
@@ -387,16 +387,15 @@ static NetSocketState 
*net_socket_fd_init_dgram(NetClientState *peer,
 /* mcast: save bound address as dst */
 if (is_connected && mcast != NULL) {
 s->dgram_dst = saddr;
-snprintf(nc->info_str, sizeof(nc->info_str),
- "socket: fd=%d (cloned mcast=%s:%d)",
- fd, inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
+qemu_set_info_str(nc, "socket: fd=%d (cloned mcast=%s:%d)", fd,
+  inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
 } else {
 if (sa_type == SOCKET_ADDRESS_TYPE_UNIX) {
 s->dgram_dst.sin_family =

[PULL 12/26] net: remove the @errp argument of net_client_inits()

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

The only caller passes &error_fatal, so use this directly in the function.

It's what we do for -blockdev, -device, and -object.

Suggested-by: Markus Armbruster 
Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
Reviewed-by: David Gibson 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 include/net/net.h |  2 +-
 net/net.c | 20 +++-
 softmmu/vl.c  |  2 +-
 3 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 81d0b21..c1c34a5 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -222,7 +222,7 @@ extern const char *host_net_devices[];
 /* from net.c */
 int net_client_parse(QemuOptsList *opts_list, const char *str);
 void show_netdevs(void);
-int net_init_clients(Error **errp);
+void net_init_clients(void);
 void net_check_clients(void);
 void net_cleanup(void);
 void hmp_host_net_add(Monitor *mon, const QDict *qdict);
diff --git a/net/net.c b/net/net.c
index 407bdd1..5e78880 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1566,27 +1566,21 @@ out:
 return ret;
 }
 
-int net_init_clients(Error **errp)
+void net_init_clients(void)
 {
 net_change_state_entry =
 qemu_add_vm_change_state_handler(net_vm_change_state_handler, NULL);
 
 QTAILQ_INIT(&net_clients);
 
-if (qemu_opts_foreach(qemu_find_opts("netdev"),
-  net_init_netdev, NULL, errp)) {
-return -1;
-}
-
-if (qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL, errp)) {
-return -1;
-}
+qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL,
+  &error_fatal);
 
-if (qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL, errp)) 
{
-return -1;
-}
+qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL,
+  &error_fatal);
 
-return 0;
+qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL,
+  &error_fatal);
 }
 
 int net_client_parse(QemuOptsList *opts_list, const char *optarg)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index b464da2..a4ae131 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1904,7 +1904,7 @@ static void qemu_create_late_backends(void)
 qtest_server_init(qtest_chrdev, qtest_log, &error_fatal);
 }
 
-net_init_clients(&error_fatal);
+net_init_clients();
 
 object_option_foreach_add(object_create_late);
 
-- 
2.7.4




[PULL 17/26] net: socket: Don't ignore EINVAL on netdev socket connection

2022-10-27 Thread Jason Wang
From: Stefano Brivio 

Other errors are treated as failure by net_socket_connect_init(),
but if connect() returns EINVAL, we'll fail silently. Remove the
related exception.

Signed-off-by: Stefano Brivio 
Signed-off-by: Laurent Vivier 
Reviewed-by: David Gibson 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 net/socket.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index ade1ecf..4944bb7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -577,8 +577,7 @@ static int net_socket_connect_init(NetClientState *peer,
 if (errno == EINTR || errno == EWOULDBLOCK) {
 /* continue */
 } else if (errno == EINPROGRESS ||
-   errno == EALREADY ||
-   errno == EINVAL) {
+   errno == EALREADY) {
 break;
 } else {
 error_setg_errno(errp, errno, "can't connect socket");
-- 
2.7.4




[PULL 11/26] net: introduce convert_host_port()

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
Reviewed-by: David Gibson 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 include/qemu/sockets.h |  2 ++
 net/net.c  | 62 ++
 2 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 036745e..db4bedb 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -65,6 +65,8 @@ void socket_listen_cleanup(int fd, Error **errp);
 int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp);
 
 /* Old, ipv4 only bits.  Don't use for new code. */
+int convert_host_port(struct sockaddr_in *saddr, const char *host,
+  const char *port, Error **errp);
 int parse_host_port(struct sockaddr_in *saddr, const char *str,
 Error **errp);
 int socket_init(void);
diff --git a/net/net.c b/net/net.c
index 8ddafac..407bdd1 100644
--- a/net/net.c
+++ b/net/net.c
@@ -66,55 +66,57 @@ static QTAILQ_HEAD(, NetClientState) net_clients;
 /***/
 /* network device redirectors */
 
-int parse_host_port(struct sockaddr_in *saddr, const char *str,
-Error **errp)
+int convert_host_port(struct sockaddr_in *saddr, const char *host,
+  const char *port, Error **errp)
 {
-gchar **substrings;
 struct hostent *he;
-const char *addr, *p, *r;
-int port, ret = 0;
+const char *r;
+long p;
 
 memset(saddr, 0, sizeof(*saddr));
 
-substrings = g_strsplit(str, ":", 2);
-if (!substrings || !substrings[0] || !substrings[1]) {
-error_setg(errp, "host address '%s' doesn't contain ':' "
-   "separating host from port", str);
-ret = -1;
-goto out;
-}
-
-addr = substrings[0];
-p = substrings[1];
-
 saddr->sin_family = AF_INET;
-if (addr[0] == '\0') {
+if (host[0] == '\0') {
 saddr->sin_addr.s_addr = 0;
 } else {
-if (qemu_isdigit(addr[0])) {
-if (!inet_aton(addr, &saddr->sin_addr)) {
+if (qemu_isdigit(host[0])) {
+if (!inet_aton(host, &saddr->sin_addr)) {
 error_setg(errp, "host address '%s' is not a valid "
-   "IPv4 address", addr);
-ret = -1;
-goto out;
+   "IPv4 address", host);
+return -1;
 }
 } else {
-he = gethostbyname(addr);
+he = gethostbyname(host);
 if (he == NULL) {
-error_setg(errp, "can't resolve host address '%s'", addr);
-ret = -1;
-goto out;
+error_setg(errp, "can't resolve host address '%s'", host);
+return -1;
 }
 saddr->sin_addr = *(struct in_addr *)he->h_addr;
 }
 }
-port = strtol(p, (char **)&r, 0);
-if (r == p) {
-error_setg(errp, "port number '%s' is invalid", p);
+if (qemu_strtol(port, &r, 0, &p) != 0) {
+error_setg(errp, "port number '%s' is invalid", port);
+return -1;
+}
+saddr->sin_port = htons(p);
+return 0;
+}
+
+int parse_host_port(struct sockaddr_in *saddr, const char *str,
+Error **errp)
+{
+gchar **substrings;
+int ret;
+
+substrings = g_strsplit(str, ":", 2);
+if (!substrings || !substrings[0] || !substrings[1]) {
+error_setg(errp, "host address '%s' doesn't contain ':' "
+   "separating host from port", str);
 ret = -1;
 goto out;
 }
-saddr->sin_port = htons(port);
+
+ret = convert_host_port(saddr, substrings[0], substrings[1], errp);
 
 out:
 g_strfreev(substrings);
-- 
2.7.4




[PULL 23/26] qemu-sockets: move and rename SocketAddress_to_str()

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

Rename SocketAddress_to_str() to socket_uri() and move it to
util/qemu-sockets.c close to socket_parse().

socket_uri() generates a string from a SocketAddress while
socket_parse() generates a SocketAddress from a string.

Signed-off-by: Laurent Vivier 
Reviewed-by: David Gibson 
Reviewed-by: Dr. David Alan Gilbert 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 include/qemu/sockets.h |  2 +-
 monitor/hmp-cmds.c | 23 +--
 util/qemu-sockets.c| 20 
 3 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index db4bedb..214058d 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -58,6 +58,7 @@ NetworkAddressFamily inet_netfamily(int family);
 int unix_listen(const char *path, Error **errp);
 int unix_connect(const char *path, Error **errp);
 
+char *socket_uri(SocketAddress *addr);
 SocketAddress *socket_parse(const char *str, Error **errp);
 int socket_connect(SocketAddress *addr, Error **errp);
 int socket_listen(SocketAddress *addr, int num, Error **errp);
@@ -141,5 +142,4 @@ SocketAddress *socket_address_flatten(SocketAddressLegacy 
*addr);
  * Return 0 on success.
  */
 int socket_address_parse_named_fd(SocketAddress *addr, Error **errp);
-
 #endif /* QEMU_SOCKETS_H */
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index bab86c5..01b789a 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -199,27 +199,6 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict)
 qapi_free_MouseInfoList(mice_list);
 }
 
-static char *SocketAddress_to_str(SocketAddress *addr)
-{
-switch (addr->type) {
-case SOCKET_ADDRESS_TYPE_INET:
-return g_strdup_printf("tcp:%s:%s",
-   addr->u.inet.host,
-   addr->u.inet.port);
-case SOCKET_ADDRESS_TYPE_UNIX:
-return g_strdup_printf("unix:%s",
-   addr->u.q_unix.path);
-case SOCKET_ADDRESS_TYPE_FD:
-return g_strdup_printf("fd:%s", addr->u.fd.str);
-case SOCKET_ADDRESS_TYPE_VSOCK:
-return g_strdup_printf("tcp:%s:%s",
-   addr->u.vsock.cid,
-   addr->u.vsock.port);
-default:
-return g_strdup("unknown address type");
-}
-}
-
 void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 {
 MigrationInfo *info;
@@ -382,7 +361,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict)
 monitor_printf(mon, "socket address: [\n");
 
 for (addr = info->socket_address; addr; addr = addr->next) {
-char *s = SocketAddress_to_str(addr->value);
+char *s = socket_uri(addr->value);
 monitor_printf(mon, "\t%s\n", s);
 g_free(s);
 }
diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 83f4bd6..9f6f655 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -1077,6 +1077,26 @@ int unix_connect(const char *path, Error **errp)
 return sock;
 }
 
+char *socket_uri(SocketAddress *addr)
+{
+switch (addr->type) {
+case SOCKET_ADDRESS_TYPE_INET:
+return g_strdup_printf("tcp:%s:%s",
+   addr->u.inet.host,
+   addr->u.inet.port);
+case SOCKET_ADDRESS_TYPE_UNIX:
+return g_strdup_printf("unix:%s",
+   addr->u.q_unix.path);
+case SOCKET_ADDRESS_TYPE_FD:
+return g_strdup_printf("fd:%s", addr->u.fd.str);
+case SOCKET_ADDRESS_TYPE_VSOCK:
+return g_strdup_printf("tcp:%s:%s",
+   addr->u.vsock.cid,
+   addr->u.vsock.port);
+default:
+return g_strdup("unknown address type");
+}
+}
 
 SocketAddress *socket_parse(const char *str, Error **errp)
 {
-- 
2.7.4




[PULL 25/26] net: stream: move to QIO to enable additional parameters

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

Use QIOChannel, QIOChannelSocket and QIONetListener.
This allows net/stream to use all the available parameters provided by
SocketAddress.

Signed-off-by: Laurent Vivier 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 net/stream.c| 492 
 qemu-options.hx |   4 +-
 2 files changed, 178 insertions(+), 318 deletions(-)

diff --git a/net/stream.c b/net/stream.c
index 884f473..54c67e1 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -35,48 +35,36 @@
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
 #include "qemu/cutils.h"
+#include "io/channel.h"
+#include "io/channel-socket.h"
+#include "io/net-listener.h"
 
 typedef struct NetStreamState {
 NetClientState nc;
-int listen_fd;
-int fd;
+QIOChannel *listen_ioc;
+QIONetListener *listener;
+QIOChannel *ioc;
+guint ioc_read_tag;
+guint ioc_write_tag;
 SocketReadState rs;
 unsigned int send_index;  /* number of bytes sent*/
-bool read_poll;   /* waiting to receive data? */
-bool write_poll;  /* waiting to transmit data? */
 } NetStreamState;
 
-static void net_stream_send(void *opaque);
-static void net_stream_accept(void *opaque);
-static void net_stream_writable(void *opaque);
+static void net_stream_listen(QIONetListener *listener,
+  QIOChannelSocket *cioc,
+  void *opaque);
 
-static void net_stream_update_fd_handler(NetStreamState *s)
+static gboolean net_stream_writable(QIOChannel *ioc,
+GIOCondition condition,
+gpointer data)
 {
-qemu_set_fd_handler(s->fd,
-s->read_poll ? net_stream_send : NULL,
-s->write_poll ? net_stream_writable : NULL,
-s);
-}
-
-static void net_stream_read_poll(NetStreamState *s, bool enable)
-{
-s->read_poll = enable;
-net_stream_update_fd_handler(s);
-}
-
-static void net_stream_write_poll(NetStreamState *s, bool enable)
-{
-s->write_poll = enable;
-net_stream_update_fd_handler(s);
-}
-
-static void net_stream_writable(void *opaque)
-{
-NetStreamState *s = opaque;
+NetStreamState *s = data;
 
-net_stream_write_poll(s, false);
+s->ioc_write_tag = 0;
 
 qemu_flush_queued_packets(&s->nc);
+
+return G_SOURCE_REMOVE;
 }
 
 static ssize_t net_stream_receive(NetClientState *nc, const uint8_t *buf,
@@ -93,13 +81,15 @@ static ssize_t net_stream_receive(NetClientState *nc, const 
uint8_t *buf,
 .iov_len  = size,
 },
 };
+struct iovec local_iov[2];
+unsigned int nlocal_iov;
 size_t remaining;
 ssize_t ret;
 
 remaining = iov_size(iov, 2) - s->send_index;
-ret = iov_send(s->fd, iov, 2, s->send_index, remaining);
-
-if (ret == -1 && errno == EAGAIN) {
+nlocal_iov = iov_copy(local_iov, 2, iov, 2, s->send_index, remaining);
+ret = qio_channel_writev(s->ioc, local_iov, nlocal_iov, NULL);
+if (ret == QIO_CHANNEL_ERR_BLOCK) {
 ret = 0; /* handled further down */
 }
 if (ret == -1) {
@@ -108,19 +98,25 @@ static ssize_t net_stream_receive(NetClientState *nc, 
const uint8_t *buf,
 }
 if (ret < (ssize_t)remaining) {
 s->send_index += ret;
-net_stream_write_poll(s, true);
+s->ioc_write_tag = qio_channel_add_watch(s->ioc, G_IO_OUT,
+ net_stream_writable, s, NULL);
 return 0;
 }
 s->send_index = 0;
 return size;
 }
 
+static gboolean net_stream_send(QIOChannel *ioc,
+GIOCondition condition,
+gpointer data);
+
 static void net_stream_send_completed(NetClientState *nc, ssize_t len)
 {
 NetStreamState *s = DO_UPCAST(NetStreamState, nc, nc);
 
-if (!s->read_poll) {
-net_stream_read_poll(s, true);
+if (!s->ioc_read_tag) {
+s->ioc_read_tag = qio_channel_add_watch(s->ioc, G_IO_IN,
+net_stream_send, s, NULL);
 }
 }
 
@@ -131,19 +127,24 @@ static void net_stream_rs_finalize(SocketReadState *rs)
 if (qemu_send_packet_async(&s->nc, rs->buf,
rs->packet_len,
net_stream_send_completed) == 0) {
-net_stream_read_poll(s, false);
+if (s->ioc_read_tag) {
+g_source_remove(s->ioc_read_tag);
+s->ioc_read_tag = 0;
+}
 }
 }
 
-static void net_stream_send(void *opaque)
+static gboolean net_stream_send(QIOChannel *ioc,
+GIOCondition condition,
+gpointer data)
 {
-NetStreamState *s = opaque;
+NetStreamState *s = data;
 int size;
 int ret;
-uint8_t buf1[NET_BUFSIZE];
-const uint8_t *buf;
+char buf1[NET_BUFSIZE];
+const char *buf;
 
-size = r

[PULL 04/26] vdpa: Remove shadow CVQ command check

2022-10-27 Thread Jason Wang
From: Eugenio Pérez 

The guest will see undefined behavior if it issue not negotiate
commands, bit it is expected somehow.

Simplify code deleting this check.

Signed-off-by: Eugenio Pérez 
Acked-by: Jason Wang 
Signed-off-by: Jason Wang 
---
 net/vhost-vdpa.c | 48 
 1 file changed, 48 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index eebf29f..6d64000 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -462,48 +462,6 @@ static NetClientInfo net_vhost_vdpa_cvq_info = {
 };
 
 /**
- * Do not forward commands not supported by SVQ. Otherwise, the device could
- * accept it and qemu would not know how to update the device model.
- */
-static bool vhost_vdpa_net_cvq_validate_cmd(const void *out_buf, size_t len)
-{
-struct virtio_net_ctrl_hdr ctrl;
-
-if (unlikely(len < sizeof(ctrl))) {
-qemu_log_mask(LOG_GUEST_ERROR,
-  "%s: invalid legnth of out buffer %zu\n", __func__, len);
-return false;
-}
-
-memcpy(&ctrl, out_buf, sizeof(ctrl));
-switch (ctrl.class) {
-case VIRTIO_NET_CTRL_MAC:
-switch (ctrl.cmd) {
-case VIRTIO_NET_CTRL_MAC_ADDR_SET:
-return true;
-default:
-qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac cmd %u\n",
-  __func__, ctrl.cmd);
-};
-break;
-case VIRTIO_NET_CTRL_MQ:
-switch (ctrl.cmd) {
-case VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET:
-return true;
-default:
-qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mq cmd %u\n",
-  __func__, ctrl.cmd);
-};
-break;
-default:
-qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid control class %u\n",
-  __func__, ctrl.class);
-};
-
-return false;
-}
-
-/**
  * Validate and copy control virtqueue commands.
  *
  * Following QEMU guidelines, we offer a copy of the buffers to the device to
@@ -526,16 +484,10 @@ static int 
vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq,
 .iov_len = sizeof(status),
 };
 ssize_t dev_written = -EINVAL;
-bool ok;
 
 out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0,
  s->cvq_cmd_out_buffer,
  vhost_vdpa_net_cvq_cmd_len());
-ok = vhost_vdpa_net_cvq_validate_cmd(s->cvq_cmd_out_buffer, out.iov_len);
-if (unlikely(!ok)) {
-goto out;
-}
-
 dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status));
 if (unlikely(dev_written < 0)) {
 goto out;
-- 
2.7.4




[PULL 05/26] vhost-vdpa: allow passing opened vhostfd to vhost-vdpa

2022-10-27 Thread Jason Wang
From: Si-Wei Liu 

Similar to other vhost backends, vhostfd can be passed to vhost-vdpa
backend as another parameter to instantiate vhost-vdpa net client.
This would benefit the use case where only open file descriptors, as
opposed to raw vhost-vdpa device paths, are accessible from the QEMU
process.

(qemu) netdev_add type=vhost-vdpa,vhostfd=61,id=vhost-vdpa1

Signed-off-by: Si-Wei Liu 
Acked-by: Eugenio Pérez 
Signed-off-by: Jason Wang 
---
 net/vhost-vdpa.c | 25 -
 qapi/net.json|  3 +++
 qemu-options.hx  |  6 --
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 6d64000..9c1b25d 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -634,14 +634,29 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
 
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
 opts = &netdev->u.vhost_vdpa;
-if (!opts->vhostdev) {
-error_setg(errp, "vdpa character device not specified with vhostdev");
+if (!opts->has_vhostdev && !opts->has_vhostfd) {
+error_setg(errp,
+   "vhost-vdpa: neither vhostdev= nor vhostfd= was specified");
 return -1;
 }
 
-vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
-if (vdpa_device_fd == -1) {
-return -errno;
+if (opts->has_vhostdev && opts->has_vhostfd) {
+error_setg(errp,
+   "vhost-vdpa: vhostdev= and vhostfd= are mutually 
exclusive");
+return -1;
+}
+
+if (opts->has_vhostdev) {
+vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp);
+if (vdpa_device_fd == -1) {
+return -errno;
+}
+} else if (opts->has_vhostfd) {
+vdpa_device_fd = monitor_fd_param(monitor_cur(), opts->vhostfd, errp);
+if (vdpa_device_fd == -1) {
+error_prepend(errp, "vhost-vdpa: unable to parse vhostfd: ");
+return -1;
+}
 }
 
 r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
diff --git a/qapi/net.json b/qapi/net.json
index dd088c0..926ecc8 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -442,6 +442,8 @@
 # @vhostdev: path of vhost-vdpa device
 #(default:'/dev/vhost-vdpa-0')
 #
+# @vhostfd: file descriptor of an already opened vhost vdpa device
+#
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #  (default: 1)
 #
@@ -456,6 +458,7 @@
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
 '*vhostdev': 'str',
+'*vhostfd':  'str',
 '*queues':   'int',
 '*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } }
 
diff --git a/qemu-options.hx b/qemu-options.hx
index eb38e5d..876a70a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2790,8 +2790,10 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 "configure a vhost-user network, backed by a chardev 
'dev'\n"
 #endif
 #ifdef __linux__
-"-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n"
+"-netdev vhost-vdpa,id=str[,vhostdev=/path/to/dev][,vhostfd=h]\n"
 "configure a vhost-vdpa network,Establish a vhost-vdpa 
netdev\n"
+"use 'vhostdev=/path/to/dev' to open a vhost vdpa device\n"
+"use 'vhostfd=h' to connect to an already opened vhost 
vdpa device\n"
 #endif
 #ifdef CONFIG_VMNET
 "-netdev vmnet-host,id=str[,isolated=on|off][,net-uuid=uuid]\n"
@@ -3296,7 +3298,7 @@ SRST
  -netdev type=vhost-user,id=net0,chardev=chr0 \
  -device virtio-net-pci,netdev=net0
 
-``-netdev vhost-vdpa,vhostdev=/path/to/dev``
+``-netdev vhost-vdpa[,vhostdev=/path/to/dev][,vhostfd=h]``
 Establish a vhost-vdpa netdev.
 
 vDPA device is a device that uses a datapath which complies with
-- 
2.7.4




[PULL 19/26] net: stream: add unix socket

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
Acked-by: Michael S. Tsirkin 
Acked-by: Markus Armbruster  (QAPI schema)
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Jason Wang 
---
 net/stream.c| 107 +---
 qapi/net.json   |   2 +-
 qemu-options.hx |   1 +
 3 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/net/stream.c b/net/stream.c
index e4388fe..884f473 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -235,7 +235,7 @@ static NetStreamState *net_stream_fd_init(NetClientState 
*peer,
 static void net_stream_accept(void *opaque)
 {
 NetStreamState *s = opaque;
-struct sockaddr_in saddr;
+struct sockaddr_storage saddr;
 socklen_t len;
 int fd;
 
@@ -253,8 +253,26 @@ static void net_stream_accept(void *opaque)
 s->fd = fd;
 s->nc.link_down = false;
 net_stream_connect(s);
-qemu_set_info_str(&s->nc, "connection from %s:%d",
-  inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
+switch (saddr.ss_family) {
+case AF_INET: {
+struct sockaddr_in *saddr_in = (struct sockaddr_in *)&saddr;
+
+qemu_set_info_str(&s->nc, "connection from %s:%d",
+  inet_ntoa(saddr_in->sin_addr),
+  ntohs(saddr_in->sin_port));
+break;
+}
+case AF_UNIX: {
+struct sockaddr_un saddr_un;
+
+len = sizeof(saddr_un);
+getsockname(s->listen_fd, (struct sockaddr *)&saddr_un, &len);
+qemu_set_info_str(&s->nc, "connect from %s", saddr_un.sun_path);
+break;
+}
+default:
+g_assert_not_reached();
+}
 }
 
 static int net_stream_server_init(NetClientState *peer,
@@ -294,6 +312,43 @@ static int net_stream_server_init(NetClientState *peer,
 }
 break;
 }
+case SOCKET_ADDRESS_TYPE_UNIX: {
+struct sockaddr_un saddr_un;
+
+ret = unlink(addr->u.q_unix.path);
+if (ret < 0 && errno != ENOENT) {
+error_setg_errno(errp, errno, "failed to unlink socket %s",
+ addr->u.q_unix.path);
+return -1;
+}
+
+saddr_un.sun_family = PF_UNIX;
+ret = snprintf(saddr_un.sun_path, sizeof(saddr_un.sun_path), "%s",
+   addr->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(saddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   addr->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(saddr_un.sun_path));
+return -1;
+}
+
+fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
+if (fd < 0) {
+error_setg_errno(errp, errno, "can't create stream socket");
+return -1;
+}
+qemu_socket_set_nonblock(fd);
+
+ret = bind(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un));
+if (ret < 0) {
+error_setg_errno(errp, errno, "can't create socket with path: %s",
+ saddr_un.sun_path);
+closesocket(fd);
+return -1;
+}
+break;
+}
 case SOCKET_ADDRESS_TYPE_FD:
 fd = monitor_fd_param(monitor_cur(), addr->u.fd.str, errp);
 if (fd == -1) {
@@ -337,6 +392,7 @@ static int net_stream_client_init(NetClientState *peer,
 {
 NetStreamState *s;
 struct sockaddr_in saddr_in;
+struct sockaddr_un saddr_un;
 int fd, connected, ret;
 
 switch (addr->type) {
@@ -373,6 +429,45 @@ static int net_stream_client_init(NetClientState *peer,
 }
 }
 break;
+case SOCKET_ADDRESS_TYPE_UNIX:
+saddr_un.sun_family = PF_UNIX;
+ret = snprintf(saddr_un.sun_path, sizeof(saddr_un.sun_path), "%s",
+   addr->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(saddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   addr->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(saddr_un.sun_path));
+return -1;
+}
+
+fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0);
+if (fd < 0) {
+error_setg_errno(errp, errno, "can't create stream socket");
+return -1;
+}
+qemu_socket_set_nonblock(fd);
+
+connected = 0;
+for (;;) {
+ret = connect(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un));
+if (ret < 0) {
+if (errno == EINTR || errno == EWOULDBLOCK) {
+/* continue */
+} else if (errno == EAGAIN ||
+   errno == EALREADY) {
+break;
+} else {
+error_setg_errno(errp, errno, "can't connect socket");
+closesocket(fd);
+ 

[PULL 18/26] net: stream: Don't ignore EINVAL on netdev socket connection

2022-10-27 Thread Jason Wang
From: Stefano Brivio 

Other errors are treated as failure by net_stream_client_init(),
but if connect() returns EINVAL, we'll fail silently. Remove the
related exception.

Signed-off-by: Stefano Brivio 
[lvivier: applied to net/stream.c]
Signed-off-by: Laurent Vivier 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: David Gibson 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 net/stream.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/stream.c b/net/stream.c
index 0a7e847..e4388fe 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -360,8 +360,7 @@ static int net_stream_client_init(NetClientState *peer,
 if (errno == EINTR || errno == EWOULDBLOCK) {
 /* continue */
 } else if (errno == EINPROGRESS ||
-   errno == EALREADY ||
-   errno == EINVAL) {
+   errno == EALREADY) {
 break;
 } else {
 error_setg_errno(errp, errno, "can't connect socket");
-- 
2.7.4




[PULL 24/26] qemu-sockets: update socket_uri() and socket_parse() to be consistent

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

To be consistent with socket_uri(), add 'tcp:' prefix for inet type in
socket_parse(), by default socket_parse() use tcp when no prefix is
provided (format is host:port).

In socket_uri(), use 'vsock:' prefix for vsock type rather than 'tcp:'
because it makes a vsock address look like an inet address with CID
misinterpreted as host.
Goes back to commit 9aca82ba31 "migration: Create socket-address parameter"

Signed-off-by: Laurent Vivier 
Reviewed-by: Dr. David Alan Gilbert 
Reviewed-by: Markus Armbruster 
Reviewed-by: David Gibson 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 util/qemu-sockets.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
index 9f6f655..a9926af 100644
--- a/util/qemu-sockets.c
+++ b/util/qemu-sockets.c
@@ -1090,7 +1090,7 @@ char *socket_uri(SocketAddress *addr)
 case SOCKET_ADDRESS_TYPE_FD:
 return g_strdup_printf("fd:%s", addr->u.fd.str);
 case SOCKET_ADDRESS_TYPE_VSOCK:
-return g_strdup_printf("tcp:%s:%s",
+return g_strdup_printf("vsock:%s:%s",
addr->u.vsock.cid,
addr->u.vsock.port);
 default:
@@ -1124,6 +1124,11 @@ SocketAddress *socket_parse(const char *str, Error 
**errp)
 if (vsock_parse(&addr->u.vsock, str + strlen("vsock:"), errp)) {
 goto fail;
 }
+} else if (strstart(str, "tcp:", NULL)) {
+addr->type = SOCKET_ADDRESS_TYPE_INET;
+if (inet_parse(&addr->u.inet, str + strlen("tcp:"), errp)) {
+goto fail;
+}
 } else {
 addr->type = SOCKET_ADDRESS_TYPE_INET;
 if (inet_parse(&addr->u.inet, str, errp)) {
-- 
2.7.4




[PULL 13/26] net: simplify net_client_parse() error management

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

All net_client_parse() callers exit in case of error.

Move exit(1) to net_client_parse() and remove error checking from
the callers.

Suggested-by: Markus Armbruster 
Signed-off-by: Laurent Vivier 
Reviewed-by: Markus Armbruster 
Reviewed-by: David Gibson 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 include/net/net.h |  2 +-
 net/net.c |  6 ++
 softmmu/vl.c  | 12 +++-
 3 files changed, 6 insertions(+), 14 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index c1c34a5..55023e7 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -220,7 +220,7 @@ extern NICInfo nd_table[MAX_NICS];
 extern const char *host_net_devices[];
 
 /* from net.c */
-int net_client_parse(QemuOptsList *opts_list, const char *str);
+void net_client_parse(QemuOptsList *opts_list, const char *str);
 void show_netdevs(void);
 void net_init_clients(void);
 void net_check_clients(void);
diff --git a/net/net.c b/net/net.c
index 5e78880..178257a 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1583,13 +1583,11 @@ void net_init_clients(void)
   &error_fatal);
 }
 
-int net_client_parse(QemuOptsList *opts_list, const char *optarg)
+void net_client_parse(QemuOptsList *opts_list, const char *optarg)
 {
 if (!qemu_opts_parse_noisily(opts_list, optarg, true)) {
-return -1;
+exit(1);
 }
-
-return 0;
 }
 
 /* From FreeBSD */
diff --git a/softmmu/vl.c b/softmmu/vl.c
index a4ae131..e69aa43 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2801,21 +2801,15 @@ void qemu_init(int argc, char **argv)
 break;
 case QEMU_OPTION_netdev:
 default_net = 0;
-if (net_client_parse(qemu_find_opts("netdev"), optarg) == -1) {
-exit(1);
-}
+net_client_parse(qemu_find_opts("netdev"), optarg);
 break;
 case QEMU_OPTION_nic:
 default_net = 0;
-if (net_client_parse(qemu_find_opts("nic"), optarg) == -1) {
-exit(1);
-}
+net_client_parse(qemu_find_opts("nic"), optarg);
 break;
 case QEMU_OPTION_net:
 default_net = 0;
-if (net_client_parse(qemu_find_opts("net"), optarg) == -1) {
-exit(1);
-}
+net_client_parse(qemu_find_opts("net"), optarg);
 break;
 #ifdef CONFIG_LIBISCSI
 case QEMU_OPTION_iscsi:
-- 
2.7.4




[PULL 03/26] vdpa: Delete duplicated vdpa_feature_bits entry

2022-10-27 Thread Jason Wang
From: Eugenio Pérez 

This entry was duplicated on referenced commit. Removing it.

Fixes: 402378407dbd ("vhost-vdpa: multiqueue support")
Signed-off-by: Eugenio Pérez 
Acked-by: Jason Wang 
Signed-off-by: Jason Wang 
---
 net/vhost-vdpa.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 4bc3fd0..eebf29f 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -63,7 +63,6 @@ const int vdpa_feature_bits[] = {
 VIRTIO_NET_F_CTRL_RX,
 VIRTIO_NET_F_CTRL_RX_EXTRA,
 VIRTIO_NET_F_CTRL_VLAN,
-VIRTIO_NET_F_GUEST_ANNOUNCE,
 VIRTIO_NET_F_CTRL_MAC_ADDR,
 VIRTIO_NET_F_RSS,
 VIRTIO_NET_F_MQ,
-- 
2.7.4




[PULL 07/26] vhost: allocate event_idx fields on vring

2022-10-27 Thread Jason Wang
From: Eugenio Pérez 

There was not enough room to accomodate them.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Eugenio Pérez 
Signed-off-by: Jason Wang 
---
 hw/virtio/vhost-shadow-virtqueue.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 596d443..a518f84 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -570,16 +570,16 @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue 
*svq,
 size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq)
 {
 size_t desc_size = sizeof(vring_desc_t) * svq->vring.num;
-size_t avail_size = offsetof(vring_avail_t, ring) +
- sizeof(uint16_t) * svq->vring.num;
+size_t avail_size = offsetof(vring_avail_t, ring[svq->vring.num]) +
+  sizeof(uint16_t);
 
 return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size());
 }
 
 size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq)
 {
-size_t used_size = offsetof(vring_used_t, ring) +
-sizeof(vring_used_elem_t) * svq->vring.num;
+size_t used_size = offsetof(vring_used_t, ring[svq->vring.num]) +
+  sizeof(uint16_t);
 return ROUND_UP(used_size, qemu_real_host_page_size());
 }
 
-- 
2.7.4




[PULL 08/26] vhost: toggle device callbacks using used event idx

2022-10-27 Thread Jason Wang
From: Eugenio Pérez 

Actually use the new field of the used ring and tell the device if SVQ
wants to be notified.

The code is not reachable at the moment.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Eugenio Pérez 
Signed-off-by: Jason Wang 
---
 hw/virtio/vhost-shadow-virtqueue.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index a518f84..f5c0fad 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -369,15 +369,27 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq)
  */
 static bool vhost_svq_enable_notification(VhostShadowVirtqueue *svq)
 {
-svq->vring.avail->flags &= ~cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
-/* Make sure the flag is written before the read of used_idx */
+if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
+uint16_t *used_event = (uint16_t 
*)&svq->vring.avail->ring[svq->vring.num];
+*used_event = svq->shadow_used_idx;
+} else {
+svq->vring.avail->flags &= ~cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
+}
+
+/* Make sure the event is enabled before the read of used_idx */
 smp_mb();
 return !vhost_svq_more_used(svq);
 }
 
 static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq)
 {
-svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
+/*
+ * No need to disable notification in the event idx case, since used event
+ * index is already an index too far away.
+ */
+if (!virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
+svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT);
+}
 }
 
 static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq,
-- 
2.7.4




[PULL 16/26] qapi: net: add stream and dgram netdevs

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

Copied from socket netdev file and modified to use SocketAddress
to be able to introduce new features like unix socket.

"udp" and "mcast" are squashed into dgram netdev, multicast is detected
according to the IP address type.
"listen" and "connect" modes are managed by stream netdev. An optional
parameter "server" defines the mode (off by default)

The two new types need to be parsed the modern way with -netdev, because
with the traditional way, the "type" field of netdev structure collides with
the "type" field of SocketAddress and prevents the correct evaluation of the
command line option. Moreover the traditional way doesn't allow to use
the same type (SocketAddress) several times with the -netdev option
(needed to specify "local" and "remote" addresses).

The previous commit paved the way for parsing the modern way, but
omitted one detail: how to pick modern vs. traditional, in
netdev_is_modern().

We want to pick based on the value of parameter "type".  But how to
extract it from the option argument?

Parsing the option argument, either the modern or the traditional way,
extracts it for us, but only if parsing succeeds.

If parsing fails, there is no good option.  No matter which parser we
pick, it'll be the wrong one for some arguments, and the error
reporting will be confusing.

Fortunately, the traditional parser accepts *anything* when called in
a certain way.  This maximizes our chance to extract the value of
"type", and in turn minimizes the risk of confusing error reporting.

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
Acked-by: Markus Armbruster 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 hmp-commands.hx |   2 +-
 net/clients.h   |   6 +
 net/dgram.c | 537 
 net/hub.c   |   2 +
 net/meson.build |   2 +
 net/net.c   |  30 +++-
 net/stream.c| 425 
 qapi/net.json   |  66 ++-
 qemu-options.hx |  12 ++
 9 files changed, 1078 insertions(+), 4 deletions(-)
 create mode 100644 net/dgram.c
 create mode 100644 net/stream.c

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 12b6d4e..673e39a 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1276,7 +1276,7 @@ ERST
 {
 .name   = "netdev_add",
 .args_type  = "netdev:O",
-.params = "[user|tap|socket|vde|bridge|hubport|netmap|vhost-user"
+.params = 
"[user|tap|socket|stream|dgram|vde|bridge|hubport|netmap|vhost-user"
 #ifdef CONFIG_VMNET
   "|vmnet-host|vmnet-shared|vmnet-bridged"
 #endif
diff --git a/net/clients.h b/net/clients.h
index c915778..ed8bdff 100644
--- a/net/clients.h
+++ b/net/clients.h
@@ -40,6 +40,12 @@ int net_init_hubport(const Netdev *netdev, const char *name,
 int net_init_socket(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp);
 
+int net_init_stream(const Netdev *netdev, const char *name,
+NetClientState *peer, Error **errp);
+
+int net_init_dgram(const Netdev *netdev, const char *name,
+   NetClientState *peer, Error **errp);
+
 int net_init_tap(const Netdev *netdev, const char *name,
  NetClientState *peer, Error **errp);
 
diff --git a/net/dgram.c b/net/dgram.c
new file mode 100644
index 000..5339585
--- /dev/null
+++ b/net/dgram.c
@@ -0,0 +1,537 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ * Copyright (c) 2022 Red Hat, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+
+#include "net/net.h"
+#include "clients.h"
+#include "monitor/monitor.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
+
+typedef struct NetDgramState {
+NetClientState nc;
+ 

[PULL 09/26] vhost: use avail event idx on vhost_svq_kick

2022-10-27 Thread Jason Wang
From: Eugenio Pérez 

So SVQ code knows if an event is needed.

The code is not reachable at the moment.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Eugenio Pérez 
Signed-off-by: Jason Wang 
---
 hw/virtio/vhost-shadow-virtqueue.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index f5c0fad..f306ebe 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -218,12 +218,22 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
 
 static void vhost_svq_kick(VhostShadowVirtqueue *svq)
 {
+bool needs_kick;
+
 /*
  * We need to expose the available array entries before checking the used
  * flags
  */
 smp_mb();
-if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) {
+
+if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) {
+uint16_t avail_event = *(uint16_t 
*)(&svq->vring.used->ring[svq->vring.num]);
+needs_kick = vring_need_event(avail_event, svq->shadow_avail_idx, 
svq->shadow_avail_idx - 1);
+} else {
+needs_kick = !(svq->vring.used->flags & VRING_USED_F_NO_NOTIFY);
+}
+
+if (!needs_kick) {
 return;
 }
 
-- 
2.7.4




[PULL 21/26] net: dgram: move mcast specific code from net_socket_fd_init_dgram()

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

It is less complex to manage special cases directly in
net_dgram_mcast_init() and net_dgram_udp_init().

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 net/dgram.c | 143 
 1 file changed, 76 insertions(+), 67 deletions(-)

diff --git a/net/dgram.c b/net/dgram.c
index e20be9c..e581cc6 100644
--- a/net/dgram.c
+++ b/net/dgram.c
@@ -259,52 +259,11 @@ static NetClientInfo net_dgram_socket_info = {
 static NetDgramState *net_dgram_fd_init(NetClientState *peer,
 const char *model,
 const char *name,
-int fd, int is_fd,
-SocketAddress *mcast,
+int fd,
 Error **errp)
 {
-struct sockaddr_in *saddr = NULL;
-int newfd;
 NetClientState *nc;
 NetDgramState *s;
-SocketAddress *sa;
-SocketAddressType sa_type;
-
-sa = socket_local_address(fd, errp);
-if (!sa) {
-return NULL;
-}
-sa_type = sa->type;
-qapi_free_SocketAddress(sa);
-
-/*
- * fd passed: multicast: "learn" dest_addr address from bound address and
- * save it. Because this may be "shared" socket from a "master" process,
- * datagrams would be recv() by ONLY ONE process: we must "clone" this
- * dgram socket --jjo
- */
-
-if (is_fd && mcast != NULL) {
-saddr = g_new(struct sockaddr_in, 1);
-
-if (convert_host_port(saddr, mcast->u.inet.host, 
mcast->u.inet.port,
-  errp) < 0) {
-goto err;
-}
-/* must be bound */
-if (saddr->sin_addr.s_addr == 0) {
-error_setg(errp, "can't setup multicast destination address");
-goto err;
-}
-/* clone dgram socket */
-newfd = net_dgram_mcast_create(saddr, NULL, errp);
-if (newfd < 0) {
-goto err;
-}
-/* clone newfd to fd, close newfd */
-dup2(newfd, fd);
-close(newfd);
-}
 
 nc = qemu_new_net_client(&net_dgram_socket_info, peer, model, name);
 
@@ -314,23 +273,7 @@ static NetDgramState *net_dgram_fd_init(NetClientState 
*peer,
 net_socket_rs_init(&s->rs, net_dgram_rs_finalize, false);
 net_dgram_read_poll(s, true);
 
-/* mcast: save bound address as dst */
-if (saddr) {
-g_assert(s->dest_addr == NULL);
-s->dest_addr = (struct sockaddr *)saddr;
-s->dest_len = sizeof(*saddr);
-qemu_set_info_str(nc, "fd=%d (cloned mcast=%s:%d)", fd,
-  inet_ntoa(saddr->sin_addr), ntohs(saddr->sin_port));
-} else {
-qemu_set_info_str(nc, "fd=%d %s", fd, SocketAddressType_str(sa_type));
-}
-
 return s;
-
-err:
-g_free(saddr);
-closesocket(fd);
-return NULL;
 }
 
 static int net_dgram_mcast_init(NetClientState *peer,
@@ -381,7 +324,9 @@ static int net_dgram_mcast_init(NetClientState *peer,
 }
 break;
 }
-case SOCKET_ADDRESS_TYPE_FD:
+case SOCKET_ADDRESS_TYPE_FD: {
+int newfd;
+
 fd = monitor_fd_param(monitor_cur(), local->u.fd.str, errp);
 if (fd == -1) {
 g_free(saddr);
@@ -394,7 +339,42 @@ static int net_dgram_mcast_init(NetClientState *peer,
  name, fd);
 return -1;
 }
+
+/*
+ * fd passed: multicast: "learn" dest_addr address from bound
+ * address and save it. Because this may be "shared" socket from a
+ * "master" process, datagrams would be recv() by ONLY ONE process:
+ * we must "clone" this dgram socket --jjo
+ */
+
+saddr = g_new(struct sockaddr_in, 1);
+
+if (convert_host_port(saddr, local->u.inet.host, 
local->u.inet.port,
+  errp) < 0) {
+g_free(saddr);
+closesocket(fd);
+return -1;
+}
+
+/* must be bound */
+if (saddr->sin_addr.s_addr == 0) {
+error_setg(errp, "can't setup multicast destination address");
+g_free(saddr);
+closesocket(fd);
+return -1;
+}
+/* clone dgram socket */
+newfd = net_dgram_mcast_create(saddr, NULL, errp);
+if (newfd < 0) {
+g_free(saddr);
+closesocket(fd);
+return -1;
+}
+/* clone newfd to fd, close newfd */
+dup2(newfd, fd);
+close(newfd);
 break;
+}
 default:
 g_free(saddr);
  

[PULL 10/26] vhost: Accept event idx flag

2022-10-27 Thread Jason Wang
From: Eugenio Pérez 

Enabling all the code path created before.

Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Eugenio Pérez 
Signed-off-by: Jason Wang 
---
 hw/virtio/vhost-shadow-virtqueue.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index f306ebe..5bd14ca 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -33,6 +33,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp)
  ++b) {
 switch (b) {
 case VIRTIO_F_ANY_LAYOUT:
+case VIRTIO_RING_F_EVENT_IDX:
 continue;
 
 case VIRTIO_F_ACCESS_PLATFORM:
-- 
2.7.4




[PULL 20/26] net: dgram: make dgram_dst generic

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

dgram_dst is a sockaddr_in structure. To be able to use it with
unix socket, use a pointer to a generic sockaddr structure.

Rename it dest_addr, and store socket length in dest_len.

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
Acked-by: Michael S. Tsirkin 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Jason Wang 
---
 net/dgram.c | 82 +++--
 1 file changed, 53 insertions(+), 29 deletions(-)

diff --git a/net/dgram.c b/net/dgram.c
index 5339585..e20be9c 100644
--- a/net/dgram.c
+++ b/net/dgram.c
@@ -40,9 +40,11 @@ typedef struct NetDgramState {
 NetClientState nc;
 int fd;
 SocketReadState rs;
-struct sockaddr_in dgram_dst; /* contains destination iff connectionless */
 bool read_poll;   /* waiting to receive data? */
 bool write_poll;  /* waiting to transmit data? */
+/* contains destination iff connectionless */
+struct sockaddr *dest_addr;
+socklen_t dest_len;
 } NetDgramState;
 
 static void net_dgram_send(void *opaque);
@@ -84,10 +86,8 @@ static ssize_t net_dgram_receive(NetClientState *nc,
 ssize_t ret;
 
 do {
-if (s->dgram_dst.sin_family != AF_UNIX) {
-ret = sendto(s->fd, buf, size, 0,
- (struct sockaddr *)&s->dgram_dst,
- sizeof(s->dgram_dst));
+if (s->dest_addr) {
+ret = sendto(s->fd, buf, size, 0, s->dest_addr, s->dest_len);
 } else {
 ret = send(s->fd, buf, size, 0);
 }
@@ -244,6 +244,9 @@ static void net_dgram_cleanup(NetClientState *nc)
 close(s->fd);
 s->fd = -1;
 }
+g_free(s->dest_addr);
+s->dest_addr = NULL;
+s->dest_len = 0;
 }
 
 static NetClientInfo net_dgram_socket_info = {
@@ -260,7 +263,7 @@ static NetDgramState *net_dgram_fd_init(NetClientState 
*peer,
 SocketAddress *mcast,
 Error **errp)
 {
-struct sockaddr_in saddr;
+struct sockaddr_in *saddr = NULL;
 int newfd;
 NetClientState *nc;
 NetDgramState *s;
@@ -275,31 +278,32 @@ static NetDgramState *net_dgram_fd_init(NetClientState 
*peer,
 qapi_free_SocketAddress(sa);
 
 /*
- * fd passed: multicast: "learn" dgram_dst address from bound address and
+ * fd passed: multicast: "learn" dest_addr address from bound address and
  * save it. Because this may be "shared" socket from a "master" process,
  * datagrams would be recv() by ONLY ONE process: we must "clone" this
  * dgram socket --jjo
  */
 
 if (is_fd && mcast != NULL) {
-if (convert_host_port(&saddr, mcast->u.inet.host,
-  mcast->u.inet.port, errp) < 0) {
+saddr = g_new(struct sockaddr_in, 1);
+
+if (convert_host_port(saddr, mcast->u.inet.host, 
mcast->u.inet.port,
+  errp) < 0) {
 goto err;
 }
 /* must be bound */
-if (saddr.sin_addr.s_addr == 0) {
+if (saddr->sin_addr.s_addr == 0) {
 error_setg(errp, "can't setup multicast destination address");
 goto err;
 }
 /* clone dgram socket */
-newfd = net_dgram_mcast_create(&saddr, NULL, errp);
+newfd = net_dgram_mcast_create(saddr, NULL, errp);
 if (newfd < 0) {
 goto err;
 }
 /* clone newfd to fd, close newfd */
 dup2(newfd, fd);
 close(newfd);
-
 }
 
 nc = qemu_new_net_client(&net_dgram_socket_info, peer, model, name);
@@ -311,21 +315,20 @@ static NetDgramState *net_dgram_fd_init(NetClientState 
*peer,
 net_dgram_read_poll(s, true);
 
 /* mcast: save bound address as dst */
-if (is_fd && mcast != NULL) {
-s->dgram_dst = saddr;
+if (saddr) {
+g_assert(s->dest_addr == NULL);
+s->dest_addr = (struct sockaddr *)saddr;
+s->dest_len = sizeof(*saddr);
 qemu_set_info_str(nc, "fd=%d (cloned mcast=%s:%d)", fd,
-  inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port));
+  inet_ntoa(saddr->sin_addr), ntohs(saddr->sin_port));
 } else {
-if (sa_type == SOCKET_ADDRESS_TYPE_UNIX) {
-s->dgram_dst.sin_family = AF_UNIX;
-}
-
 qemu_set_info_str(nc, "fd=%d %s", fd, SocketAddressType_str(sa_type));
 }
 
 return s;
 
 err:
+g_free(saddr);
 closesocket(fd);
 return NULL;
 }
@@ -339,21 +342,24 @@ static int net_dgram_mcast_init(NetClientState *peer,
 {
 NetDgramState *s;
 int fd, ret;
-struct sockaddr_in saddr;
+struct sockaddr_in *saddr;
 
 if (remote->type != SOCKET_ADDRESS_TYPE_INET) {
 error_setg(errp, "multicast only support inet type");
 return -1;
 }
 
-if (convert_host_port(&saddr, re

[PULL 00/26] Net patches

2022-10-27 Thread Jason Wang
The following changes since commit 344744e148e6e865f5a57e745b02a87e5ea534ad:

  Merge tag 'dump-pull-request' of https://gitlab.com/marcandre.lureau/qemu 
into staging (2022-10-26 10:53:49 -0400)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to e506fee8b1e092f6ac6f9459bf6a35b807644ad2:

  net: stream: add QAPI events to report connection state (2022-10-28 13:28:52 
+0800)




Daniel P. Berrangé (1):
  net: improve error message for missing netdev backend

Eugenio Pérez (6):
  vdpa: Delete duplicated vdpa_feature_bits entry
  vdpa: Remove shadow CVQ command check
  vhost: allocate event_idx fields on vring
  vhost: toggle device callbacks using used event idx
  vhost: use avail event idx on vhost_svq_kick
  vhost: Accept event idx flag

Laurent Vivier (16):
  virtio-net: fix bottom-half packet TX on asynchronous completion
  virtio-net: fix TX timer with tx_burst
  net: introduce convert_host_port()
  net: remove the @errp argument of net_client_inits()
  net: simplify net_client_parse() error management
  qapi: net: introduce a way to bypass qemu_opts_parse_noisily()
  net: introduce qemu_set_info_str() function
  qapi: net: add stream and dgram netdevs
  net: stream: add unix socket
  net: dgram: make dgram_dst generic
  net: dgram: move mcast specific code from net_socket_fd_init_dgram()
  net: dgram: add unix socket
  qemu-sockets: move and rename SocketAddress_to_str()
  qemu-sockets: update socket_uri() and socket_parse() to be consistent
  net: stream: move to QIO to enable additional parameters
  net: stream: add QAPI events to report connection state

Si-Wei Liu (1):
  vhost-vdpa: allow passing opened vhostfd to vhost-vdpa

Stefano Brivio (2):
  net: socket: Don't ignore EINVAL on netdev socket connection
  net: stream: Don't ignore EINVAL on netdev socket connection

 hmp-commands.hx|   2 +-
 hw/net/virtio-net.c|  59 +++-
 hw/net/xen_nic.c   |   5 +-
 hw/virtio/vhost-shadow-virtqueue.c |  39 ++-
 include/net/net.h  |   7 +-
 include/qemu/sockets.h |   4 +-
 monitor/hmp-cmds.c |  23 +-
 net/clients.h  |   6 +
 net/dgram.c| 623 +
 net/hub.c  |   2 +
 net/l2tpv3.c   |   3 +-
 net/meson.build|   2 +
 net/net.c  | 204 
 net/slirp.c|   5 +-
 net/socket.c   |  36 +--
 net/stream.c   | 386 +++
 net/tap-win32.c|   3 +-
 net/tap.c  |  13 +-
 net/vde.c  |   3 +-
 net/vhost-user.c   |   3 +-
 net/vhost-vdpa.c   |  76 ++---
 qapi/net.json  | 118 ++-
 qemu-options.hx|  20 +-
 softmmu/vl.c   |  16 +-
 util/qemu-sockets.c|  25 ++
 25 files changed, 1473 insertions(+), 210 deletions(-)
 create mode 100644 net/dgram.c
 create mode 100644 net/stream.c




[PULL 22/26] net: dgram: add unix socket

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

Signed-off-by: Laurent Vivier 
Reviewed-by: Stefano Brivio 
Reviewed-by: David Gibson 
Acked-by: Michael S. Tsirkin 
Acked-by: Markus Armbruster  (QAPI schema)
Signed-off-by: Jason Wang 
---
 net/dgram.c | 55 ++-
 qapi/net.json   |  2 +-
 qemu-options.hx |  1 +
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/net/dgram.c b/net/dgram.c
index e581cc6..9f7bf38 100644
--- a/net/dgram.c
+++ b/net/dgram.c
@@ -426,6 +426,7 @@ int net_init_dgram(const Netdev *netdev, const char *name,
 SocketAddress *remote, *local;
 struct sockaddr *dest_addr;
 struct sockaddr_in laddr_in, raddr_in;
+struct sockaddr_un laddr_un, raddr_un;
 socklen_t dest_len;
 
 assert(netdev->type == NET_CLIENT_DRIVER_DGRAM);
@@ -465,7 +466,8 @@ int net_init_dgram(const Netdev *netdev, const char *name,
 }
 } else {
 if (local->type != SOCKET_ADDRESS_TYPE_FD) {
-error_setg(errp, "type=inet requires remote parameter");
+error_setg(errp,
+   "type=inet or type=unix requires remote parameter");
 return -1;
 }
 }
@@ -508,6 +510,53 @@ int net_init_dgram(const Netdev *netdev, const char *name,
 dest_addr = g_malloc(dest_len);
 memcpy(dest_addr, &raddr_in, dest_len);
 break;
+case SOCKET_ADDRESS_TYPE_UNIX:
+ret = unlink(local->u.q_unix.path);
+if (ret < 0 && errno != ENOENT) {
+error_setg_errno(errp, errno, "failed to unlink socket %s",
+ local->u.q_unix.path);
+return -1;
+}
+
+laddr_un.sun_family = PF_UNIX;
+ret = snprintf(laddr_un.sun_path, sizeof(laddr_un.sun_path), "%s",
+   local->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(laddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   local->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(laddr_un.sun_path));
+}
+
+raddr_un.sun_family = PF_UNIX;
+ret = snprintf(raddr_un.sun_path, sizeof(raddr_un.sun_path), "%s",
+   remote->u.q_unix.path);
+if (ret < 0 || ret >= sizeof(raddr_un.sun_path)) {
+error_setg(errp, "UNIX socket path '%s' is too long",
+   remote->u.q_unix.path);
+error_append_hint(errp, "Path must be less than %zu bytes\n",
+  sizeof(raddr_un.sun_path));
+}
+
+fd = qemu_socket(PF_UNIX, SOCK_DGRAM, 0);
+if (fd < 0) {
+error_setg_errno(errp, errno, "can't create datagram socket");
+return -1;
+}
+
+ret = bind(fd, (struct sockaddr *)&laddr_un, sizeof(laddr_un));
+if (ret < 0) {
+error_setg_errno(errp, errno, "can't bind unix=%s to socket",
+ laddr_un.sun_path);
+closesocket(fd);
+return -1;
+}
+qemu_socket_set_nonblock(fd);
+
+dest_len = sizeof(raddr_un);
+dest_addr = g_malloc(dest_len);
+memcpy(dest_addr, &raddr_un, dest_len);
+break;
 case SOCKET_ADDRESS_TYPE_FD:
 fd = monitor_fd_param(monitor_cur(), local->u.fd.str, errp);
 if (fd == -1) {
@@ -546,6 +595,10 @@ int net_init_dgram(const Netdev *netdev, const char *name,
   inet_ntoa(raddr_in.sin_addr),
   ntohs(raddr_in.sin_port));
 break;
+case SOCKET_ADDRESS_TYPE_UNIX:
+qemu_set_info_str(&s->nc, "udp=%s:%s",
+  laddr_un.sun_path, raddr_un.sun_path);
+break;
 case SOCKET_ADDRESS_TYPE_FD: {
 SocketAddress *sa;
 SocketAddressType sa_type;
diff --git a/qapi/net.json b/qapi/net.json
index b4768ce..cbc8d77 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -603,7 +603,7 @@
 # @remote: remote address
 # @local: local address
 #
-# Only SocketAddress types 'inet' and 'fd' are supported.
+# Only SocketAddress types 'unix', 'inet' and 'fd' are supported.
 #
 # If remote address is present and it's a multicast address, local address
 # is optional. Otherwise local address is required and remote address is
diff --git a/qemu-options.hx b/qemu-options.hx
index 858f3dc..e76142b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2782,6 +2782,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev,
 "configure a network backend to connect to a multicast 
maddr and port\n"
 "use ``local.host=addr`` to specify the host address to 
send packets from\n"
 "-netdev 
dgram,id=str,local.type=inet,local.host=addr,local.port=port[,remote.type=inet,remote.host=addr,remote.port=port]\n"
+"-netdev 
dgram,id=str,local.type=unix,local.path=path[,remote.type=unix,remote.path=path]\n"
 "-netdev dgram,id=st

[PULL 06/26] net: improve error message for missing netdev backend

2022-10-27 Thread Jason Wang
From: Daniel P. Berrangé 

The current message when using '-net user...' with SLIRP disabled at
compile time is:

  qemu-system-x86_64: -net user: Parameter 'type' expects a net backend type 
(maybe it is not compiled into this binary)

An observation is that we're using the 'netdev->type' field here which
is an enum value, produced after QAPI has converted from its string
form.

IOW, at this point in the code, we know that the user's specified
type name was a valid network backend. The only possible scenario that
can make the backend init function be NULL, is if support for that
backend was disabled at build time. Given this, we don't need to caveat
our error message with a 'maybe' hint, we can be totally explicit.

The use of QERR_INVALID_PARAMETER_VALUE doesn't really lend itself to
user friendly error message text. Since this is not used to set a
specific QAPI error class, we can simply stop using this pre-formatted
error text and provide something better.

Thus the new message is:

  qemu-system-x86_64: -net user: network backend 'user' is not compiled into 
this binary

The case of passing 'hubport' for -net is also given a message reminding
people they should have used -netdev/-nic instead, as this backend type
is only valid for the modern syntax.

Reviewed-by: Marc-André Lureau 
Reviewed-by: Thomas Huth 
Signed-off-by: Daniel P. Berrangé 
Signed-off-by: Jason Wang 
---
 net/net.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/net.c b/net/net.c
index 2db160e..8ddafac 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1036,19 +1036,23 @@ static int net_client_init1(const Netdev *netdev, bool 
is_netdev, Error **errp)
 if (is_netdev) {
 if (netdev->type == NET_CLIENT_DRIVER_NIC ||
 !net_client_init_fun[netdev->type]) {
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type",
-   "a netdev backend type");
+error_setg(errp, "network backend '%s' is not compiled into this 
binary",
+   NetClientDriver_str(netdev->type));
 return -1;
 }
 } else {
 if (netdev->type == NET_CLIENT_DRIVER_NONE) {
 return 0; /* nothing to do */
 }
-if (netdev->type == NET_CLIENT_DRIVER_HUBPORT ||
-!net_client_init_fun[netdev->type]) {
-error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type",
-   "a net backend type (maybe it is not compiled "
-   "into this binary)");
+if (netdev->type == NET_CLIENT_DRIVER_HUBPORT) {
+error_setg(errp, "network backend '%s' is only supported with 
-netdev/-nic",
+   NetClientDriver_str(netdev->type));
+return -1;
+}
+
+if (!net_client_init_fun[netdev->type]) {
+error_setg(errp, "network backend '%s' is not compiled into this 
binary",
+   NetClientDriver_str(netdev->type));
 return -1;
 }
 
-- 
2.7.4




[PULL 02/26] virtio-net: fix TX timer with tx_burst

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

When virtio_net_flush_tx() reaches the tx_burst value all
the queue is not flushed and nothing restart the timer.

Fix that by doing for TX timer as we do for bottom half TX:
rearming the timer if we find any packet to send during the
virtio_net_flush_tx() call.

Fixes: e3f30488e5f8 ("virtio-net: Limit number of packets sent per TX flush")
Cc: alex.william...@redhat.com
Signed-off-by: Laurent Vivier 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 hw/net/virtio-net.c | 50 +-
 1 file changed, 41 insertions(+), 9 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 1fbf2f3..b6903ae 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2536,14 +2536,19 @@ static void virtio_net_tx_complete(NetClientState *nc, 
ssize_t len)
 
 virtio_queue_set_notification(q->tx_vq, 1);
 ret = virtio_net_flush_tx(q);
-if (q->tx_bh && ret >= n->tx_burst) {
+if (ret >= n->tx_burst) {
 /*
  * the flush has been stopped by tx_burst
  * we will not receive notification for the
  * remainining part, so re-schedule
  */
 virtio_queue_set_notification(q->tx_vq, 0);
-qemu_bh_schedule(q->tx_bh);
+if (q->tx_bh) {
+qemu_bh_schedule(q->tx_bh);
+} else {
+timer_mod(q->tx_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
+}
 q->tx_waiting = 1;
 }
 }
@@ -2644,6 +2649,8 @@ drop:
 return num_packets;
 }
 
+static void virtio_net_tx_timer(void *opaque);
+
 static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
@@ -2661,15 +2668,13 @@ static void virtio_net_handle_tx_timer(VirtIODevice 
*vdev, VirtQueue *vq)
 }
 
 if (q->tx_waiting) {
-virtio_queue_set_notification(vq, 1);
+/* We already have queued packets, immediately flush */
 timer_del(q->tx_timer);
-q->tx_waiting = 0;
-if (virtio_net_flush_tx(q) == -EINVAL) {
-return;
-}
+virtio_net_tx_timer(q);
 } else {
+/* re-arm timer to flush it (and more) on next tick */
 timer_mod(q->tx_timer,
-   qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
 q->tx_waiting = 1;
 virtio_queue_set_notification(vq, 0);
 }
@@ -2702,6 +2707,8 @@ static void virtio_net_tx_timer(void *opaque)
 VirtIONetQueue *q = opaque;
 VirtIONet *n = q->n;
 VirtIODevice *vdev = VIRTIO_DEVICE(n);
+int ret;
+
 /* This happens when device was stopped but BH wasn't. */
 if (!vdev->vm_running) {
 /* Make sure tx waiting is set, so we'll run when restarted. */
@@ -2716,8 +2723,33 @@ static void virtio_net_tx_timer(void *opaque)
 return;
 }
 
+ret = virtio_net_flush_tx(q);
+if (ret == -EBUSY || ret == -EINVAL) {
+return;
+}
+/*
+ * If we flush a full burst of packets, assume there are
+ * more coming and immediately rearm
+ */
+if (ret >= n->tx_burst) {
+q->tx_waiting = 1;
+timer_mod(q->tx_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
+return;
+}
+/*
+ * If less than a full burst, re-enable notification and flush
+ * anything that may have come in while we weren't looking.  If
+ * we find something, assume the guest is still active and rearm
+ */
 virtio_queue_set_notification(q->tx_vq, 1);
-virtio_net_flush_tx(q);
+ret = virtio_net_flush_tx(q);
+if (ret > 0) {
+virtio_queue_set_notification(q->tx_vq, 0);
+q->tx_waiting = 1;
+timer_mod(q->tx_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
+}
 }
 
 static void virtio_net_tx_bh(void *opaque)
-- 
2.7.4




[PULL 26/26] net: stream: add QAPI events to report connection state

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

The netdev reports NETDEV_STREAM_CONNECTED event when the backend
is connected, and NETDEV_STREAM_DISCONNECTED when it is disconnected.

The NETDEV_STREAM_CONNECTED event includes the destination address.

This allows a system manager like libvirt to detect when the server
fails.

For instance with passt:

{ 'execute': 'qmp_capabilities' }
{ "return": { } }
{ "timestamp": { "seconds": 1666341395, "microseconds": 505347 },
"event": "NETDEV_STREAM_CONNECTED",
"data": { "netdev-id": "netdev0",
"addr": { "path": "/tmp/passt_1.socket", "type": "unix" } } }

[killing passt here]

{ "timestamp": { "seconds": 1666341430, "microseconds": 968694 },
"event": "NETDEV_STREAM_DISCONNECTED",
"data": { "netdev-id": "netdev0" } }

Signed-off-by: Laurent Vivier 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 net/stream.c  |  5 +
 qapi/net.json | 49 +
 2 files changed, 54 insertions(+)

diff --git a/net/stream.c b/net/stream.c
index 54c67e1..53b7040 100644
--- a/net/stream.c
+++ b/net/stream.c
@@ -38,6 +38,7 @@
 #include "io/channel.h"
 #include "io/channel-socket.h"
 #include "io/net-listener.h"
+#include "qapi/qapi-events-net.h"
 
 typedef struct NetStreamState {
 NetClientState nc;
@@ -168,6 +169,8 @@ static gboolean net_stream_send(QIOChannel *ioc,
 s->nc.link_down = true;
 qemu_set_info_str(&s->nc, "");
 
+qapi_event_send_netdev_stream_disconnected(s->nc.name);
+
 return G_SOURCE_REMOVE;
 }
 buf = buf1;
@@ -244,6 +247,7 @@ static void net_stream_listen(QIONetListener *listener,
 uri = socket_uri(addr);
 qemu_set_info_str(&s->nc, uri);
 g_free(uri);
+qapi_event_send_netdev_stream_connected(s->nc.name, addr);
 qapi_free_SocketAddress(addr);
 }
 
@@ -335,6 +339,7 @@ static void net_stream_client_connected(QIOTask *task, 
gpointer opaque)
 s->ioc_read_tag = qio_channel_add_watch(s->ioc, G_IO_IN, net_stream_send,
 s, NULL);
 s->nc.link_down = false;
+qapi_event_send_netdev_stream_connected(s->nc.name, addr);
 qapi_free_SocketAddress(addr);
 
 return;
diff --git a/qapi/net.json b/qapi/net.json
index cbc8d77..522ac58 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -898,3 +898,52 @@
 ##
 { 'event': 'FAILOVER_NEGOTIATED',
   'data': {'device-id': 'str'} }
+
+##
+# @NETDEV_STREAM_CONNECTED:
+#
+# Emitted when the netdev stream backend is connected
+#
+# @netdev-id: QEMU netdev id that is connected
+# @addr: The destination address
+#
+# Since: 7.2
+#
+# Example:
+#
+# <- { "event": "NETDEV_STREAM_CONNECTED",
+#  "data": { "netdev-id": "netdev0",
+#"addr": { "port": "47666", "ipv6": true,
+#  "host": "::1", "type": "inet" } },
+#  "timestamp": { "seconds": 1666269863, "microseconds": 311222 } }
+#
+# or
+#
+# <- { "event": "NETDEV_STREAM_CONNECTED",
+#  "data": { "netdev-id": "netdev0",
+#"addr": { "path": "/tmp/qemu0", "type": "unix" } },
+#  "timestamp": { "seconds": 1666269706, "microseconds": 413651 } }
+#
+##
+{ 'event': 'NETDEV_STREAM_CONNECTED',
+  'data': { 'netdev-id': 'str',
+'addr': 'SocketAddress' } }
+
+##
+# @NETDEV_STREAM_DISCONNECTED:
+#
+# Emitted when the netdev stream backend is disconnected
+#
+# @netdev-id: QEMU netdev id that is disconnected
+#
+# Since: 7.2
+#
+# Example:
+#
+# <- { 'event': 'NETDEV_STREAM_DISCONNECTED',
+#  'data': {'netdev-id': 'netdev0'},
+#  'timestamp': {'seconds': 1663330937, 'microseconds': 526695} }
+#
+##
+{ 'event': 'NETDEV_STREAM_DISCONNECTED',
+  'data': { 'netdev-id': 'str' } }
-- 
2.7.4




[PULL 01/26] virtio-net: fix bottom-half packet TX on asynchronous completion

2022-10-27 Thread Jason Wang
From: Laurent Vivier 

When virtio-net is used with the socket netdev backend, the backend
can be busy and not able to collect new packets.

In this case, net_socket_receive() returns 0 and registers a poll function
to detect when the socket is ready again.

In virtio_net_tx_bh(), virtio_net_flush_tx() forwards the 0, the virtio
notifications are disabled and the function is not re-scheduled, waiting
for the backend to be ready.

When the socket netdev backend is again able to send packets, the poll
function re-starts to flush remaining packets. This is done by
calling virtio_net_tx_complete(). It re-enables notifications and calls
again virtio_net_flush_tx().

But it seems if virtio_net_flush_tx() reaches the tx_burst value all
the queue is not flushed and no new notification is sent to re-schedule
virtio_net_tx_bh(). Nothing re-start to flush the queue and remaining
packets are stuck in the queue.

To fix that, detect in virtio_net_tx_complete() if virtio_net_flush_tx()
has been stopped by tx_burst and if yes re-schedule the bottom half
function virtio_net_tx_bh() to flush the remaining packets.

This is what is done in virtio_net_tx_bh() when the virtio_net_flush_tx()
is synchronous, and completly by-passed when the operation needs to be
asynchronous.

Fixes: a697a334b3c4 ("virtio-net: Introduce a new bottom half packet TX")
Cc: alex.william...@redhat.com
Signed-off-by: Laurent Vivier 
Reviewed-by: Michael S. Tsirkin 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 hw/net/virtio-net.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index e9f696b..1fbf2f3 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2526,6 +2526,7 @@ static void virtio_net_tx_complete(NetClientState *nc, 
ssize_t len)
 VirtIONet *n = qemu_get_nic_opaque(nc);
 VirtIONetQueue *q = virtio_net_get_subqueue(nc);
 VirtIODevice *vdev = VIRTIO_DEVICE(n);
+int ret;
 
 virtqueue_push(q->tx_vq, q->async_tx.elem, 0);
 virtio_notify(vdev, q->tx_vq);
@@ -2534,7 +2535,17 @@ static void virtio_net_tx_complete(NetClientState *nc, 
ssize_t len)
 q->async_tx.elem = NULL;
 
 virtio_queue_set_notification(q->tx_vq, 1);
-virtio_net_flush_tx(q);
+ret = virtio_net_flush_tx(q);
+if (q->tx_bh && ret >= n->tx_burst) {
+/*
+ * the flush has been stopped by tx_burst
+ * we will not receive notification for the
+ * remainining part, so re-schedule
+ */
+virtio_queue_set_notification(q->tx_vq, 0);
+qemu_bh_schedule(q->tx_bh);
+q->tx_waiting = 1;
+}
 }
 
 /* TX */
-- 
2.7.4




megasa regression in 7.1?

2022-10-27 Thread John Johnson

I pulled 7.1, and the megasas driver stopped being able to do reads 
from a disk.
It looks to be related to this commit:

https://github.com/qemu/qemu/commit/fe9d8927e265fd723a6dc87cd6d220f4677dbe1f#diffe3f5f30efc54747e0624dca63e5f55f0012736c1875b6e85526b3514e6911be3

which added some command buffer bounds checking to the SCSI subsysem.  
Unfortunately,
when the megasas QEMU emulation receives a direct I/O command from the device 
driver
in megasas_handle_io(), it synthesizes a SCSI command from it in 
megasas_encode_lba(),
but passes the command buffer length from the driver frame instead of the 
length of the
buffer it synthesized the SCSI command in.  The driver (at least the Linux 4.14 
version
I’m using) does not fill in the command buffer length in direct I/O frames, so
scsi_req_new() sees a 0 length command and fails it.


I worked around this issue with:

diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c
index 7082456..6e11607 100644
--- a/hw/scsi/megasas.c
+++ b/hw/scsi/megasas.c
@@ -1823,7 +1823,7 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd 
*cmd, int frame_cmd)
 
 megasas_encode_lba(cdb, lba_start, lba_count, is_write);
 cmd->req = scsi_req_new(sdev, cmd->index,
-lun_id, cdb, cdb_len, cmd);
+lun_id, cdb, sizeof (cdb), cmd);
 if (!cmd->req) {
 trace_megasas_scsi_req_alloc_failed(
 mfi_frame_desc(frame_cmd), target_id, lun_id);

and the driver can read the disk again, but I’m not sure this is the correct
fix since cdb_len is used for bounds checking elsewhere in megagsas_handle_io(),
although a 0 won’t fail there.

Is there anyone with megasas experience who could comment on this?
Thanks

JJ



[PATCH v6 11/11] tests/qtest: Enable qtest build on Windows

2022-10-27 Thread Bin Meng
Now that we have fixed various test case issues as seen when running
on Windows, let's enable the qtest build on Windows.

Signed-off-by: Bin Meng 
Reviewed-by: Thomas Huth 

---

Changes in v5:
- Drop patches that are already merged

Changes in v3:
- Drop the host test

Changes in v2:
- new patch: "tests/qtest: Enable qtest build on Windows"

 tests/qtest/meson.build | 6 --
 1 file changed, 6 deletions(-)

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index c07a5b1a5f..f0ebb5fac6 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -1,9 +1,3 @@
-# All QTests for now are POSIX-only, but the dependencies are
-# really in libqtest, not in the testcases themselves.
-if not config_host.has_key('CONFIG_POSIX')
-  subdir_done()
-endif
-
 slow_qtests = {
   'ahci-test' : 60,
   'bios-tables-test' : 120,
-- 
2.25.1




[PATCH v6 09/11] tests/qtest: libqtest: Correct the timeout unit of blocking receive calls for win32

2022-10-27 Thread Bin Meng
Some qtest cases don't get response from the QEMU executable under
test in time on Windows. It turns out that the socket receive call
got timeout before it receive the complete response.

The timeout value is supposed to be set to 50 seconds via the
setsockopt() call, but there is a difference among platforms.
The timeout unit of blocking receive calls is measured in
seconds on non-Windows platforms but milliseconds on Windows.

Signed-off-by: Bin Meng 
Reviewed-by: Marc-André Lureau 
---

(no changes since v1)

 tests/qtest/libqtest.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index e1e2d39a6e..2fbc3b88f3 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -36,13 +36,14 @@
 #include "qapi/qmp/qstring.h"
 
 #define MAX_IRQ 256
-#define SOCKET_TIMEOUT 50
 
 #ifndef _WIN32
+# define SOCKET_TIMEOUT 50
 # define CMD_EXEC   "exec "
 # define DEV_STDERR "/dev/fd/2"
 # define DEV_NULL   "/dev/null"
 #else
+# define SOCKET_TIMEOUT 5
 # define CMD_EXEC   ""
 # define DEV_STDERR "2"
 # define DEV_NULL   "nul"
@@ -106,8 +107,16 @@ static int socket_accept(int sock)
 struct sockaddr_un addr;
 socklen_t addrlen;
 int ret;
+/*
+ * timeout unit of blocking receive calls is different among platfoms.
+ * It's in seconds on non-Windows platforms but milliseconds on Windows.
+ */
+#ifndef _WIN32
 struct timeval timeout = { .tv_sec = SOCKET_TIMEOUT,
.tv_usec = 0 };
+#else
+DWORD timeout = SOCKET_TIMEOUT;
+#endif
 
 if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO,
(void *)&timeout, sizeof(timeout))) {
-- 
2.25.1




[PATCH v6 00/11] tests/qtest: Enable running qtest on Windows

2022-10-27 Thread Bin Meng
In preparation to adding virtio-9p support on Windows, this series
enables running qtest on Windows, so that we can run the virtio-9p
tests on Windows to make sure it does not break accidently.

Changes in v6:
- drop patches that are already in Alex and Daniel's tree
- remove CONFIG_POSIX from meson.build
- include  in libqtest.c
- move documentation comments of qemu_send_full() from util/osdep.c
  to qemu/sockets.h
- save the "exit_code" in struct QTestState
- new patch: "tests/qtest: device-plug-test: Reverse the usage of double/single 
quotes"
- new patch: "tests/qtest: Use EXIT_FAILURE instead of magic number"
- new patch: "tests/qtest: libqtest: Introduce qtest_wait_qemu()"
- change to use qtest_wait_qemu() API
- new patch: "test/qtest/libqos: meson.build: Do not build virtio-9p 
unconditionally"

Changes in v5:
- restore to v1 version which does not touch the posix implementation
- Drop patches that are already merged

Changes in v3:
- Add a usleep(1) in the busy wait loop
- Drop the host test

Changes in v2:
- Introduce qemu_send_full() and use it
- Move the enabling of building qtests on Windows to a separate
  patch to keep bisectablity
- Call socket_init() unconditionally
- Add a missing CloseHandle() call
- Change to a busy wait after migration is canceled
- Change the timeout limit to 90 minutes
- new patch: "tests/qtest: Enable qtest build on Windows"

Bin Meng (8):
  tests/qtest: Support libqtest to build and run on Windows
  tests/qtest: device-plug-test: Reverse the usage of double/single
quotes
  tests/qtest: Use EXIT_FAILURE instead of magic number
  tests/qtest: libqtest: Introduce qtest_wait_qemu()
  tests/qtest: libqos: Do not build virtio-9p unconditionally
  tests/qtest: libqtest: Correct the timeout unit of blocking receive
calls for win32
  .gitlab-ci.d/windows.yml: Increase the timeout to 90 minutes
  tests/qtest: Enable qtest build on Windows

Xuzhou Cheng (3):
  accel/qtest: Support qtest accelerator for Windows
  tests/qtest: Use send/recv for socket communication
  tests/qtest: migration-test: Make sure QEMU process "to" exited after
migration is canceled

 include/hw/core/cpu.h   |   1 +
 include/qemu/sockets.h  |  13 +++
 tests/qtest/libqtest.h  |   9 ++
 accel/dummy-cpus.c  |  14 ++-
 softmmu/cpus.c  |   9 +-
 tests/qtest/dbus-vmstate-test.c |   2 +-
 tests/qtest/device-plug-test.c  |  16 ++--
 tests/qtest/libqmp.c|   5 +-
 tests/qtest/libqtest.c  | 151 
 tests/qtest/migration-test.c|   8 +-
 util/osdep.c|  22 +
 .gitlab-ci.d/windows.yml|   4 +-
 accel/meson.build   |   2 +-
 accel/qtest/meson.build |   3 +-
 tests/qtest/libqos/meson.build  |   6 +-
 tests/qtest/meson.build |   6 --
 16 files changed, 221 insertions(+), 50 deletions(-)

-- 
2.25.1




Re: [PATCH v14 16/17] tests/qtest: netdev: test stream and dgram backends

2022-10-27 Thread Jason Wang



在 2022/10/21 17:09, Laurent Vivier 写道:

Signed-off-by: Laurent Vivier 
Acked-by: Michael S. Tsirkin 
---



I got this:

63/63 ERROR:../tests/qtest/netdev-socket.c:139:test_stream_inet_ipv6: 
assertion failed (resp == expect): ("st0: index=0,type=stream,connection 
error\r\n" == "st0: index=0,type=stream,tcp:::1:40389\r\n") ERROR
63/63 qemu:qtest+qtest-x86_64 / 
qtest-x86_64/netdev-socket  ERROR 5.29s   killed by 
signal 6 SIGABRT
>>> QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 
MALLOC_PERTURB_=96 
QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon 
G_TEST_DBUS_DAEMON=/home/devel/git/qemu/tests/dbus-vmstate-daemon.sh 
/home/devel/git/qemu/build/tests/qtest/netdev-socket --tap -k
――― 
✀ 
―――

stderr:
**
ERROR:../tests/qtest/netdev-socket.c:139:test_stream_inet_ipv6: 
assertion failed (resp == expect): ("st0: index=0,type=stream,connection 
error\r\n" == "st0: index=0,type=stream,tcp:::1:40389\r\n")


(test program exited with status code -6)
――

The base is:

commit 344744e148e6e865f5a57e745b02a87e5ea534ad (HEAD -> master, 
origin/master, origin/HEAD)

Merge: 08a5d04606 e38c24cb58
Author: Stefan Hajnoczi 
Date:   Wed Oct 26 10:53:48 2022 -0400

    Merge tag 'dump-pull-request' of 
https://gitlab.com/marcandre.lureau/qemu into staging


So I dropped this patch from the queue and we can add it back after 
soft-freeze.


Thanks



  tests/qtest/meson.build |   1 +
  tests/qtest/netdev-socket.c | 420 
  2 files changed, 421 insertions(+)
  create mode 100644 tests/qtest/netdev-socket.c

diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build
index c07a5b1a5f43..6953797e4e3e 100644
--- a/tests/qtest/meson.build
+++ b/tests/qtest/meson.build
@@ -27,6 +27,7 @@ qtests_generic = [
'test-hmp',
'qos-test',
'readconfig-test',
+  'netdev-socket',
  ]
  if config_host.has_key('CONFIG_MODULES')
qtests_generic += [ 'modules-test' ]
diff --git a/tests/qtest/netdev-socket.c b/tests/qtest/netdev-socket.c
new file mode 100644
index ..b24c0819b9ac
--- /dev/null
+++ b/tests/qtest/netdev-socket.c
@@ -0,0 +1,420 @@
+/*
+ * QTest testcase for netdev stream and dgram
+ *
+ * Copyright (c) 2022 Red Hat, Inc.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+
+#define CONNECTION_TIMEOUT5
+
+#define EXPECT_STATE(q, e, t) \
+do {  \
+char *resp = qtest_hmp(q, "info network");\
+if (t) {  \
+strrchr(resp, t)[0] = 0;  \
+} \
+g_test_timer_start(); \
+while (g_test_timer_elapsed() < CONNECTION_TIMEOUT) { \
+if (strcmp(resp, e) == 0) {   \
+break;\
+} \
+g_free(resp); \
+resp = qtest_hmp(q, "info network");  \
+if (t) {  \
+strrchr(resp, t)[0] = 0;  \
+} \
+} \
+g_assert_cmpstr(resp, ==, e); \
+g_free(resp); \
+} while (0)
+
+static int inet_get_free_port_socket(int sock)
+{
+struct sockaddr_in addr;
+socklen_t len;
+
+memset(&addr, 0, sizeof(addr));
+addr.sin_family = AF_INET;
+addr.sin_addr.s_addr = INADDR_ANY;
+addr.sin_port = 0;
+if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
+return -1;
+}
+
+len = sizeof(addr);
+if (getsockname(sock,  (struct sockaddr *)&addr, &len) < 0) {
+return -1;
+}
+
+return ntohs(addr.sin_port);
+}
+
+static int inet_get_free_port_multiple(int nb, int *port)
+{
+int sock[nb];
+int i;
+
+for (i = 0; i < nb; i++) {
+sock[i] = socket(AF_INET, SOCK_STREAM, 0);
+if (sock[i] < 0) {
+break;
+}
+port[i] = inet_get_free_port_socket(sock[i]);
+}
+
+nb = i;
+for (i = 0; i < nb; i++) {
+closesocket(sock[i]);
+}
+
+return nb;
+}
+
+static int inet_get_free_port(void)
+{
+int nb, port;
+
+nb = inet_get_free_port_multiple(1, &port);
+g_assert_cmpint(nb, ==, 1);
+
+return port;
+}
+
+static void test_stream_inet_ipv4

[PATCH v6 07/11] tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled

2022-10-27 Thread Bin Meng
From: Xuzhou Cheng 

Make sure QEMU process "to" exited before launching another target
for migration in the test_multifd_tcp_cancel case.

Signed-off-by: Xuzhou Cheng 
Signed-off-by: Bin Meng 

---

Changes in v6:
- change to use qtest_wait_qemu() API

Changes in v3:
- Add a usleep(1) in the busy wait loop

Changes in v2:
- Change to a busy wait after migration is canceled

 tests/qtest/migration-test.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 28a06d8170..d2eb107f0c 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2141,6 +2141,10 @@ static void test_multifd_tcp_cancel(void)
 
 migrate_cancel(from);
 
+/* Make sure QEMU process "to" exited */
+qtest_set_expected_status(to, EXIT_FAILURE);
+qtest_wait_qemu(to);
+
 args = (MigrateStart){
 .only_target = true,
 };
-- 
2.25.1




[PATCH v6 01/11] accel/qtest: Support qtest accelerator for Windows

2022-10-27 Thread Bin Meng
From: Xuzhou Cheng 

Currently signal SIGIPI [=SIGUSR1] is used to kick the dummy CPU
when qtest accelerator is used. However SIGUSR1 is unsupported on
Windows. To support Windows, we add a QemuSemaphore CPUState::sem
to kick the dummy CPU instead for Windows.

Signed-off-by: Xuzhou Cheng 
Signed-off-by: Bin Meng 
Reviewed-by: Marc-André Lureau 

---

Changes in v6:
- remove CONFIG_POSIX from meson.build

Changes in v5:
- restore to v1 version which does not touch the posix implementation

 include/hw/core/cpu.h   |  1 +
 accel/dummy-cpus.c  | 14 --
 softmmu/cpus.c  |  9 +
 accel/meson.build   |  2 +-
 accel/qtest/meson.build |  3 +--
 5 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index f9b58773f7..8830546121 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -333,6 +333,7 @@ struct CPUState {
 struct QemuThread *thread;
 #ifdef _WIN32
 HANDLE hThread;
+QemuSemaphore sem;
 #endif
 int thread_id;
 bool running, has_waiter;
diff --git a/accel/dummy-cpus.c b/accel/dummy-cpus.c
index 10429fdfb2..d6a1b8d0a2 100644
--- a/accel/dummy-cpus.c
+++ b/accel/dummy-cpus.c
@@ -21,8 +21,6 @@
 static void *dummy_cpu_thread_fn(void *arg)
 {
 CPUState *cpu = arg;
-sigset_t waitset;
-int r;
 
 rcu_register_thread();
 
@@ -32,8 +30,13 @@ static void *dummy_cpu_thread_fn(void *arg)
 cpu->can_do_io = 1;
 current_cpu = cpu;
 
+#ifndef _WIN32
+sigset_t waitset;
+int r;
+
 sigemptyset(&waitset);
 sigaddset(&waitset, SIG_IPI);
+#endif
 
 /* signal CPU creation */
 cpu_thread_signal_created(cpu);
@@ -41,6 +44,7 @@ static void *dummy_cpu_thread_fn(void *arg)
 
 do {
 qemu_mutex_unlock_iothread();
+#ifndef _WIN32
 do {
 int sig;
 r = sigwait(&waitset, &sig);
@@ -49,6 +53,9 @@ static void *dummy_cpu_thread_fn(void *arg)
 perror("sigwait");
 exit(1);
 }
+#else
+qemu_sem_wait(&cpu->sem);
+#endif
 qemu_mutex_lock_iothread();
 qemu_wait_io_event(cpu);
 } while (!cpu->unplug);
@@ -69,4 +76,7 @@ void dummy_start_vcpu_thread(CPUState *cpu)
  cpu->cpu_index);
 qemu_thread_create(cpu->thread, thread_name, dummy_cpu_thread_fn, cpu,
QEMU_THREAD_JOINABLE);
+#ifdef _WIN32
+qemu_sem_init(&cpu->sem, 0);
+#endif
 }
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index 61b27ff59d..9dd1a4dc17 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -437,18 +437,19 @@ void qemu_wait_io_event(CPUState *cpu)
 
 void cpus_kick_thread(CPUState *cpu)
 {
-#ifndef _WIN32
-int err;
-
 if (cpu->thread_kicked) {
 return;
 }
 cpu->thread_kicked = true;
-err = pthread_kill(cpu->thread->thread, SIG_IPI);
+
+#ifndef _WIN32
+int err = pthread_kill(cpu->thread->thread, SIG_IPI);
 if (err && err != ESRCH) {
 fprintf(stderr, "qemu:%s: %s", __func__, strerror(err));
 exit(1);
 }
+#else
+qemu_sem_post(&cpu->sem);
 #endif
 }
 
diff --git a/accel/meson.build b/accel/meson.build
index b9a963cf80..259c35c4c8 100644
--- a/accel/meson.build
+++ b/accel/meson.build
@@ -16,5 +16,5 @@ dummy_ss.add(files(
   'dummy-cpus.c',
 ))
 
-specific_ss.add_all(when: ['CONFIG_SOFTMMU', 'CONFIG_POSIX'], if_true: 
dummy_ss)
+specific_ss.add_all(when: ['CONFIG_SOFTMMU'], if_true: dummy_ss)
 specific_ss.add_all(when: ['CONFIG_XEN'], if_true: dummy_ss)
diff --git a/accel/qtest/meson.build b/accel/qtest/meson.build
index 4c65600293..176d990ae1 100644
--- a/accel/qtest/meson.build
+++ b/accel/qtest/meson.build
@@ -1,2 +1 @@
-qtest_module_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_POSIX'],
-if_true: files('qtest.c'))
+qtest_module_ss.add(when: ['CONFIG_SOFTMMU'], if_true: files('qtest.c'))
-- 
2.25.1




[PATCH v6 08/11] tests/qtest: libqos: Do not build virtio-9p unconditionally

2022-10-27 Thread Bin Meng
At present the virtio-9p related codes are built into libqos
unconditionally. Change to build them conditionally by testing
the 'virtfs' config option.

Signed-off-by: Bin Meng 

---

Changes in v6:
- new patch: "test/qtest/libqos: meson.build: Do not build virtio-9p 
unconditionally"

 tests/qtest/libqos/meson.build | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/libqos/meson.build b/tests/qtest/libqos/meson.build
index 113c80b4e4..32f028872c 100644
--- a/tests/qtest/libqos/meson.build
+++ b/tests/qtest/libqos/meson.build
@@ -33,8 +33,6 @@ libqos_srcs = files(
 'sdhci.c',
 'tpci200.c',
 'virtio.c',
-'virtio-9p.c',
-'virtio-9p-client.c',
 'virtio-balloon.c',
 'virtio-blk.c',
 'vhost-user-blk.c',
@@ -62,6 +60,10 @@ libqos_srcs = files(
 'x86_64_pc-machine.c',
 )
 
+if have_virtfs
+  libqos_srcs += files('virtio-9p.c', 'virtio-9p-client.c')
+endif
+
 libqos = static_library('qos', libqos_srcs + genh,
 name_suffix: 'fa',
 build_by_default: false)
-- 
2.25.1




[PATCH v6 05/11] tests/qtest: Use EXIT_FAILURE instead of magic number

2022-10-27 Thread Bin Meng
When migration fails, QEMU exits with a status code EXIT_FAILURE.
Change qtests to use the well-defined macro instead of magic number.

Signed-off-by: Bin Meng 

---

Changes in v6:
- new patch: "tests/qtest: Use EXIT_FAILURE instead of magic number"

 tests/qtest/dbus-vmstate-test.c | 2 +-
 tests/qtest/migration-test.c| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/dbus-vmstate-test.c b/tests/qtest/dbus-vmstate-test.c
index 74ede651f6..6c990864e3 100644
--- a/tests/qtest/dbus-vmstate-test.c
+++ b/tests/qtest/dbus-vmstate-test.c
@@ -233,7 +233,7 @@ test_dbus_vmstate(Test *test)
 test->src_qemu = src_qemu;
 if (test->migrate_fail) {
 wait_for_migration_fail(src_qemu, true);
-qtest_set_expected_status(dst_qemu, 1);
+qtest_set_expected_status(dst_qemu, EXIT_FAILURE);
 } else {
 wait_for_migration_complete(src_qemu);
 }
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index aa1ba179fa..28a06d8170 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1342,7 +1342,7 @@ static void test_precopy_common(MigrateCommon *args)
 wait_for_migration_fail(from, allow_active);
 
 if (args->result == MIG_TEST_FAIL_DEST_QUIT_ERR) {
-qtest_set_expected_status(to, 1);
+qtest_set_expected_status(to, EXIT_FAILURE);
 }
 } else {
 if (args->iterations) {
@@ -1738,7 +1738,7 @@ static void do_test_validate_uuid(MigrateStart *args, 
bool should_fail)
 migrate_qmp(from, uri, "{}");
 
 if (should_fail) {
-qtest_set_expected_status(to, 1);
+qtest_set_expected_status(to, EXIT_FAILURE);
 wait_for_migration_fail(from, true);
 } else {
 wait_for_migration_complete(from);
-- 
2.25.1




[PATCH v6 04/11] tests/qtest: device-plug-test: Reverse the usage of double/single quotes

2022-10-27 Thread Bin Meng
The usage of double/single quotes in test_q35_pci_unplug_json_request()
should be reversed to work on both win32 and non-win32 platforms:

- The value of -device parameter needs to be surrounded by "" as
  Windows does not drop '' when passing it to QEMU which causes
  QEMU command line option parser failure.
- The JSON key/value pairs need to be surrounded by '' to make the
  JSON parser happy on Windows.

Fixes: a12f1a7e56b7 ("tests/x86: Add subtest with 'q35' machine type to 
device-plug-test")
Signed-off-by: Bin Meng 

---

Changes in v6:
- new patch: "tests/qtest: device-plug-test: Reverse the usage of double/single 
quotes"

 tests/qtest/device-plug-test.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c
index 3f44f731d1..5a6afa2b57 100644
--- a/tests/qtest/device-plug-test.c
+++ b/tests/qtest/device-plug-test.c
@@ -112,16 +112,16 @@ static void test_pci_unplug_json_request(void)
 
 static void test_q35_pci_unplug_json_request(void)
 {
-const char *port = "-device '{\"driver\": \"pcie-root-port\", "
-  "\"id\": \"p1\"}'";
+const char *port = "-device \"{'driver': 'pcie-root-port', "
+  "'id': 'p1'}\"";
 
-const char *bridge = "-device '{\"driver\": \"pcie-pci-bridge\", "
-   "\"id\": \"b1\", "
-   "\"bus\": \"p1\"}'";
+const char *bridge = "-device \"{'driver': 'pcie-pci-bridge', "
+"'id': 'b1', "
+"'bus': 'p1'}\"";
 
-const char *device = "-device '{\"driver\": \"virtio-mouse-pci\", "
-   "\"bus\": \"b1\", "
-   "\"id\": \"dev0\"}'";
+const char *device = "-device \"{'driver': 'virtio-mouse-pci', "
+"'bus': 'b1', "
+"'id': 'dev0'}\"";
 
 QTestState *qtest = qtest_initf("-machine q35 %s %s %s",
 port, bridge, device);
-- 
2.25.1




[PATCH v6 10/11] .gitlab-ci.d/windows.yml: Increase the timeout to 90 minutes

2022-10-27 Thread Bin Meng
commit 9f8e6cad65a6 ("gitlab-ci: Speed up the msys2-64bit job by using 
--without-default-devices"
changed to compile QEMU with the --without-default-devices switch for
the msys2-64bit job, due to the build could not complete within the
project timeout (1h), and also mentioned that a bigger timeout was
getting ignored on the shared Gitlab-CI Windows runners.

However as of today it seems the shared Gitlab-CI Windows runners does
honor the job timeout, and the runner has the timeout limit of 2h, so
let's increase the timeout to 90 minutes and drop the configure switch
"--without-default-devices" to get a larger build coverage.

Signed-off-by: Bin Meng 

---

(no changes since v2)

Changes in v2:
- Change the timeout limit to 90 minutes

 .gitlab-ci.d/windows.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml
index a3e7a37022..093276ddbc 100644
--- a/.gitlab-ci.d/windows.yml
+++ b/.gitlab-ci.d/windows.yml
@@ -10,7 +10,7 @@
   - ${CI_PROJECT_DIR}/msys64/var/cache
   needs: []
   stage: build
-  timeout: 70m
+  timeout: 90m
   before_script:
   - If ( !(Test-Path -Path msys64\var\cache ) ) {
   mkdir msys64\var\cache
@@ -60,7 +60,7 @@ msys2-64bit:
   - $env:MSYSTEM = 'MINGW64' # Start a 64 bit Mingw environment
   - $env:MSYS = 'winsymlinks:native' # Enable native Windows symlink
   - .\msys64\usr\bin\bash -lc './configure --target-list=x86_64-softmmu
-  --enable-capstone --without-default-devices'
+  --enable-capstone'
   - .\msys64\usr\bin\bash -lc 'make'
   - .\msys64\usr\bin\bash -lc 'make check || { cat 
build/meson-logs/testlog.txt; exit 1; } ;'
 
-- 
2.25.1




[PATCH v6 03/11] tests/qtest: Support libqtest to build and run on Windows

2022-10-27 Thread Bin Meng
At present the libqtest codes were written to depend on several
POSIX APIs, including fork(), kill() and waitpid(). Unfortunately
these APIs are not available on Windows.

This commit implements the corresponding functionalities using
win32 native APIs. With this change, all qtest cases can build
successfully on a Windows host, and we can start qtest testing
on Windows now.

Signed-off-by: Xuzhou Cheng 
Signed-off-by: Bin Meng 

---

Changes in v6:
- save the "exit_code" in struct QTestState

Changes in v2:
- Move the enabling of building qtests on Windows to a separate
  patch to keep bisectablity
- Call socket_init() unconditionally
- Add a missing CloseHandle() call

 tests/qtest/libqtest.c | 96 +-
 1 file changed, 94 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index b01846fd98..d12a604d78 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -16,9 +16,11 @@
 
 #include "qemu/osdep.h"
 
+#ifndef _WIN32
 #include 
 #include 
 #include 
+#endif /* _WIN32 */
 #ifdef __linux__
 #include 
 #endif /* __linux__ */
@@ -36,6 +38,16 @@
 #define MAX_IRQ 256
 #define SOCKET_TIMEOUT 50
 
+#ifndef _WIN32
+# define CMD_EXEC   "exec "
+# define DEV_STDERR "/dev/fd/2"
+# define DEV_NULL   "/dev/null"
+#else
+# define CMD_EXEC   ""
+# define DEV_STDERR "2"
+# define DEV_NULL   "nul"
+#endif
+
 typedef void (*QTestSendFn)(QTestState *s, const char *buf);
 typedef void (*ExternalSendFn)(void *s, const char *buf);
 typedef GString* (*QTestRecvFn)(QTestState *);
@@ -58,6 +70,9 @@ struct QTestState
 int qmp_fd;
 pid_t qemu_pid;  /* our child QEMU process */
 int wstatus;
+#ifdef _WIN32
+DWORD exit_code;
+#endif
 int expected_status;
 bool big_endian;
 bool irq_level[MAX_IRQ];
@@ -119,10 +134,18 @@ bool qtest_probe_child(QTestState *s)
 pid_t pid = s->qemu_pid;
 
 if (pid != -1) {
+#ifndef _WIN32
 pid = waitpid(pid, &s->wstatus, WNOHANG);
 if (pid == 0) {
 return true;
 }
+#else
+GetExitCodeProcess((HANDLE)pid, &s->exit_code);
+if (s->exit_code == STILL_ACTIVE) {
+return true;
+}
+CloseHandle((HANDLE)pid);
+#endif
 s->qemu_pid = -1;
 }
 return false;
@@ -136,13 +159,25 @@ void qtest_set_expected_status(QTestState *s, int status)
 void qtest_kill_qemu(QTestState *s)
 {
 pid_t pid = s->qemu_pid;
+#ifndef _WIN32
 int wstatus;
+#else
+DWORD ret;
+#endif
 
 /* Skip wait if qtest_probe_child already reaped.  */
 if (pid != -1) {
+#ifndef _WIN32
 kill(pid, SIGTERM);
 TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
 assert(pid == s->qemu_pid);
+#else
+TerminateProcess((HANDLE)pid, s->expected_status);
+ret = WaitForSingleObject((HANDLE)pid, INFINITE);
+assert(ret == WAIT_OBJECT_0);
+GetExitCodeProcess((HANDLE)pid, &s->exit_code);
+CloseHandle((HANDLE)pid);
+#endif
 s->qemu_pid = -1;
 }
 
@@ -150,6 +185,7 @@ void qtest_kill_qemu(QTestState *s)
  * Check whether qemu exited with expected exit status; anything else is
  * fishy and should be logged with as much detail as possible.
  */
+#ifndef _WIN32
 wstatus = s->wstatus;
 if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != s->expected_status) {
 fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
@@ -166,6 +202,14 @@ void qtest_kill_qemu(QTestState *s)
 __FILE__, __LINE__, sig, signame, dump);
 abort();
 }
+#else
+if (s->exit_code != s->expected_status) {
+fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
+"process but encountered exit status %ld (expected %d)\n",
+__FILE__, __LINE__, s->exit_code, s->expected_status);
+abort();
+}
+#endif
 }
 
 static void kill_qemu_hook_func(void *s)
@@ -244,6 +288,38 @@ static const char *qtest_qemu_binary(void)
 return qemu_bin;
 }
 
+#ifdef _WIN32
+static pid_t qtest_create_process(char *cmd)
+{
+STARTUPINFO si;
+PROCESS_INFORMATION pi;
+BOOL ret;
+
+ZeroMemory(&si, sizeof(si));
+si.cb = sizeof(si);
+ZeroMemory(&pi, sizeof(pi));
+
+ret = CreateProcess(NULL,   /* module name */
+cmd,/* command line */
+NULL,   /* process handle not inheritable */
+NULL,   /* thread handle not inheritable */
+FALSE,  /* set handle inheritance to FALSE */
+0,  /* No creation flags */
+NULL,   /* use parent's environment block */
+NULL,   /* use parent's starting directory */
+&si,/* pointer to STARTUPINFO structure */
+&pi /* pointer to PROCESS_INFORMATION structure */
+);
+if (ret == 0) {
+fprintf(stderr, "%s:%d: una

[PATCH v6 06/11] tests/qtest: libqtest: Introduce qtest_wait_qemu()

2022-10-27 Thread Bin Meng
Introduce an API for qtest to wait for the QEMU process to terminate.

Suggested-by: Marc-André Lureau 
Signed-off-by: Bin Meng 

---

Changes in v6:
- new patch: "tests/qtest: libqtest: Introduce qtest_wait_qemu()"

 tests/qtest/libqtest.h |  9 ++
 tests/qtest/libqtest.c | 63 +-
 2 files changed, 47 insertions(+), 25 deletions(-)

diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h
index 65c040e504..91a5f7edd9 100644
--- a/tests/qtest/libqtest.h
+++ b/tests/qtest/libqtest.h
@@ -75,6 +75,15 @@ QTestState *qtest_init_without_qmp_handshake(const char 
*extra_args);
  */
 QTestState *qtest_init_with_serial(const char *extra_args, int *sock_fd);
 
+/**
+ * qtest_wait_qemu:
+ * @s: #QTestState instance to operate on.
+ *
+ * Wait for the QEMU process to terminate. It is safe to call this function
+ * multiple times.
+ */
+void qtest_wait_qemu(QTestState *s);
+
 /**
  * qtest_kill_qemu:
  * @s: #QTestState instance to operate on.
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index d12a604d78..e1e2d39a6e 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -156,37 +156,14 @@ void qtest_set_expected_status(QTestState *s, int status)
 s->expected_status = status;
 }
 
-void qtest_kill_qemu(QTestState *s)
+static void qtest_check_status(QTestState *s)
 {
-pid_t pid = s->qemu_pid;
-#ifndef _WIN32
-int wstatus;
-#else
-DWORD ret;
-#endif
-
-/* Skip wait if qtest_probe_child already reaped.  */
-if (pid != -1) {
-#ifndef _WIN32
-kill(pid, SIGTERM);
-TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
-assert(pid == s->qemu_pid);
-#else
-TerminateProcess((HANDLE)pid, s->expected_status);
-ret = WaitForSingleObject((HANDLE)pid, INFINITE);
-assert(ret == WAIT_OBJECT_0);
-GetExitCodeProcess((HANDLE)pid, &s->exit_code);
-CloseHandle((HANDLE)pid);
-#endif
-s->qemu_pid = -1;
-}
-
 /*
  * Check whether qemu exited with expected exit status; anything else is
  * fishy and should be logged with as much detail as possible.
  */
 #ifndef _WIN32
-wstatus = s->wstatus;
+int wstatus = s->wstatus;
 if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != s->expected_status) {
 fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU "
 "process but encountered exit status %d (expected %d)\n",
@@ -212,6 +189,42 @@ void qtest_kill_qemu(QTestState *s)
 #endif
 }
 
+void qtest_wait_qemu(QTestState *s)
+{
+#ifndef _WIN32
+pid_t pid;
+
+TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
+assert(pid == s->qemu_pid);
+#else
+DWORD ret;
+
+ret = WaitForSingleObject((HANDLE)s->qemu_pid, INFINITE);
+assert(ret == WAIT_OBJECT_0);
+GetExitCodeProcess((HANDLE)s->qemu_pid, &s->exit_code);
+CloseHandle((HANDLE)s->qemu_pid);
+#endif
+
+qtest_check_status(s);
+}
+
+void qtest_kill_qemu(QTestState *s)
+{
+/* Skip wait if qtest_probe_child() already reaped */
+if (s->qemu_pid != -1) {
+#ifndef _WIN32
+kill(s->qemu_pid, SIGTERM);
+#else
+TerminateProcess((HANDLE)s->qemu_pid, s->expected_status);
+#endif
+qtest_wait_qemu(s);
+s->qemu_pid = -1;
+return;
+}
+
+qtest_check_status(s);
+}
+
 static void kill_qemu_hook_func(void *s)
 {
 qtest_kill_qemu(s);
-- 
2.25.1




[PATCH v6 02/11] tests/qtest: Use send/recv for socket communication

2022-10-27 Thread Bin Meng
From: Xuzhou Cheng 

Socket communication in the libqtest and libqmp codes uses read()
and write() which work on any file descriptor on *nix, and sockets
in *nix are an example of a file descriptor.

However sockets on Windows do not use *nix-style file descriptors,
so read() and write() cannot be used on sockets on Windows.
Switch over to use send() and recv() instead which work on both
Windows and *nix.

Signed-off-by: Xuzhou Cheng 
Signed-off-by: Bin Meng 
Reviewed-by: Marc-André Lureau 

---

Changes in v6:
- include  in libqtest.c
- move documentation comments of qemu_send_full() from util/osdep.c
  to qemu/sockets.h

Changes in v2:
- Introduce qemu_send_full() and use it

 include/qemu/sockets.h | 13 +
 tests/qtest/libqmp.c   |  5 +++--
 tests/qtest/libqtest.c |  5 +++--
 util/osdep.c   | 22 ++
 4 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 036745e586..61648f3f3c 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -33,6 +33,19 @@ int qemu_socketpair(int domain, int type, int protocol, int 
sv[2]);
 #endif
 
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
+/*
+ * A variant of send(2) which handles partial send.
+ *
+ * Return the number of bytes transferred over the socket.
+ * Set errno if fewer than `count' bytes are sent.
+ *
+ * This function don't work with non-blocking socket's.
+ * Any of the possibilities with non-blocking socket's is bad:
+ *   - return a short write (then name is wrong)
+ *   - busy wait adding (errno == EAGAIN) to the loop
+ */
+ssize_t qemu_send_full(int s, const void *buf, size_t count)
+G_GNUC_WARN_UNUSED_RESULT;
 int socket_set_cork(int fd, int v);
 int socket_set_nodelay(int fd);
 void qemu_socket_set_block(int fd);
diff --git a/tests/qtest/libqmp.c b/tests/qtest/libqmp.c
index ade26c15f0..2b08382e5d 100644
--- a/tests/qtest/libqmp.c
+++ b/tests/qtest/libqmp.c
@@ -23,6 +23,7 @@
 #endif
 
 #include "qemu/cutils.h"
+#include "qemu/sockets.h"
 #include "qapi/error.h"
 #include "qapi/qmp/json-parser.h"
 #include "qapi/qmp/qjson.h"
@@ -36,7 +37,7 @@ typedef struct {
 
 static void socket_send(int fd, const char *buf, size_t size)
 {
-size_t res = qemu_write_full(fd, buf, size);
+ssize_t res = qemu_send_full(fd, buf, size);
 
 assert(res == size);
 }
@@ -69,7 +70,7 @@ QDict *qmp_fd_receive(int fd)
 ssize_t len;
 char c;
 
-len = read(fd, &c, 1);
+len = recv(fd, &c, 1, 0);
 if (len == -1 && errno == EINTR) {
 continue;
 }
diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index b23eb3edc3..b01846fd98 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -27,6 +27,7 @@
 #include "libqmp.h"
 #include "qemu/ctype.h"
 #include "qemu/cutils.h"
+#include "qemu/sockets.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
 #include "qapi/qmp/qlist.h"
@@ -428,7 +429,7 @@ void qtest_quit(QTestState *s)
 
 static void socket_send(int fd, const char *buf, size_t size)
 {
-size_t res = qemu_write_full(fd, buf, size);
+ssize_t res = qemu_send_full(fd, buf, size);
 
 assert(res == size);
 }
@@ -460,7 +461,7 @@ static GString *qtest_client_socket_recv_line(QTestState *s)
 ssize_t len;
 char buffer[1024];
 
-len = read(s->fd, buffer, sizeof(buffer));
+len = recv(s->fd, buffer, sizeof(buffer), 0);
 if (len == -1 && errno == EINTR) {
 continue;
 }
diff --git a/util/osdep.c b/util/osdep.c
index 746d5f7d71..77c1a6c562 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -502,6 +502,28 @@ int qemu_accept(int s, struct sockaddr *addr, socklen_t 
*addrlen)
 return ret;
 }
 
+ssize_t qemu_send_full(int s, const void *buf, size_t count)
+{
+ssize_t ret = 0;
+ssize_t total = 0;
+
+while (count) {
+ret = send(s, buf, count, 0);
+if (ret < 0) {
+if (errno == EINTR) {
+continue;
+}
+break;
+}
+
+count -= ret;
+buf += ret;
+total += ret;
+}
+
+return total;
+}
+
 void qemu_set_hw_version(const char *version)
 {
 hw_version = version;
-- 
2.25.1




Re: [PATCH 0/3] target: Rename headers using .def extension to .h.inc

2022-10-27 Thread Markus Armbruster
Peter Maydell  writes:

> On Thu, 27 Oct 2022 at 18:17, Markus Armbruster  wrote:
>>
>> Peter Maydell  writes:
>>
>> > On Thu, 27 Oct 2022 at 15:40, Markus Armbruster  wrote:
>> >> I wonder why we use any of .def, .h.inc, .inc.h, .c.inc, .inc.c.  Why
>> >> not .h and call it a day?  No need to configure each and every editor to
>> >> tread these as C code.
>> >
>> > It says "this isn't actually a header in the usual sense". That's
>> > useful for automated scripted checks (eg we don't want
>> > scripts/clean-header-guards.pl to add the standard #include header
>> > guards to this sort of file) and for humans (if you see one of these
>> > files included as part of the normal #include block at the top of
>> > a .c file that's probably a mistake; if you see it being used then
>> > you know there's likely multiple-inclusion shenanigans going on.)
>>
>> scripts/clean-header-guards.pl needs exclude patterns anyway.
>
> Yes, in theory instead of having a systematic convention for
> filenames we could instead give the files names that don't
> let you easily distinguish them from plain old header files and
> require every use like this to update clean-header-guards.pl,
> but that seems to me to be clearly worse than maintaining the
> filename convention that we already have.

Handle that just like for plain .h: when you re-run the script, you fix
its complaints either by fixing the header or by adding it to the
exclude pattern.

>> Comments would likely work better for humans than obscure naming
>> conventions.
>>
>> Make them stylized, and they work for scripts, too.
>
> We already have a stylized convention, it's the filename...

True.  But its opaque both for tools (editors mostly) and humans.  Tools
need to be configured, and humans need to be taught.

Moreover, we don't have *a* stylized convention, we have *two*: funny
filenames (several patterns, but that's fixable), and exclude patterns.

> Comments inside the .inc file are also helpful, of course.

Let's add a third!  Stylized comments!

I'll shut up now :)




RE: [PATCH v2] softmmu/physmem: Fix input parameters for flatview_access_allowed()

2022-10-27 Thread Duan, Zhenzhong



>-Original Message-
>From: Duan, Zhenzhong 
>Sent: Friday, July 22, 2022 4:46 PM
>To: qemu-devel@nongnu.org
>Cc: pbonz...@redhat.com; pet...@redhat.com; da...@redhat.com;
>f4...@amsat.org
>Subject: [PATCH v2] softmmu/physmem: Fix input parameters for
>flatview_access_allowed()
>
>The comment of flatview_access_allowed() suggests to pass address within
>that memory region, this isn't true in some call sites.
>
>This makes qemu log in flatview_access_allowed() confusing and potential risk
>if the input parameter will be checked in the future.
>
>Signed-off-by: Zhenzhong Duan 
>Reviewed-by: Peter Xu 
>Reviewed-by: David Hildenbrand 
>---
>v2: Fix typo and removed Fixed-by per David
Hi Maintainers,

Can this patch be considered merged as it got reviewed-by and no objection for 
a long time. Thanks.

Zhenzhong

>
> softmmu/physmem.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/softmmu/physmem.c b/softmmu/physmem.c index
>fb16be57a6c6..214cb04c8fc3 100644
>--- a/softmmu/physmem.c
>+++ b/softmmu/physmem.c
>@@ -2850,7 +2850,7 @@ static MemTxResult flatview_write(FlatView *fv,
>hwaddr addr, MemTxAttrs attrs,
>
> l = len;
> mr = flatview_translate(fv, addr, &addr1, &l, true, attrs);
>-if (!flatview_access_allowed(mr, attrs, addr, len)) {
>+if (!flatview_access_allowed(mr, attrs, addr1, l)) {
> return MEMTX_ACCESS_ERROR;
> }
> return flatview_write_continue(fv, addr, attrs, buf, len, @@ -2917,7
>+2917,7 @@ static MemTxResult flatview_read(FlatView *fv, hwaddr addr,
>
> l = len;
> mr = flatview_translate(fv, addr, &addr1, &l, false, attrs);
>-if (!flatview_access_allowed(mr, attrs, addr, len)) {
>+if (!flatview_access_allowed(mr, attrs, addr1, l)) {
> return MEMTX_ACCESS_ERROR;
> }
> return flatview_read_continue(fv, addr, attrs, buf, len,
>--
>2.25.1




Re: [PATCH 0/4] Shadow VirtQueue event index support

2022-10-27 Thread Jason Wang



在 2022/10/27 04:58, Michael S. Tsirkin 写道:

On Thu, Oct 20, 2022 at 05:52:47PM +0200, Eugenio Pérez wrote:

Event idx helps to reduce the number of notifications between the device
and the driver. It allows them to specify an index on the circular
descriptors rings where to issue the notification, instead of a single
binary indicator.

Adding support for SVQ.


Jason seems to be taking this through net

Reviewed-by: Michael S. Tsirkin 



Ok, I've queued this.

Eugenio, please post the fix for endian on top.

Thanks





These patches are sent on top of [1] series, so trivial conflicts could arise
if it is applied directly on master. Future versions can be not based on
it is more convenient.

[1] https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg03280.html

Eugenio Pérez (4):
   vhost: allocate event_idx fields on vring
   vhost: toggle device callbacks using used event idx
   vhost: use avail event idx on vhost_svq_kick
   vhost: Accept event idx flag

  hw/virtio/vhost-shadow-virtqueue.c | 39 --
  1 file changed, 31 insertions(+), 8 deletions(-)

--
2.31.1






Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Alex Williamson
On Fri, 28 Oct 2022 10:23:16 +0800
Jason Wang  wrote:

> On Fri, Oct 28, 2022 at 10:08 AM Alex Williamson
>  wrote:
> >
> > On Fri, 28 Oct 2022 09:50:10 +0800
> > Jason Wang  wrote:
> >  
> > > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
> > >  wrote:  
> > > >
> > > > On Thu, 27 Oct 2022 15:40:31 +0800
> > > > Cindy Lu  wrote:
> > > >  
> > > > > Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> > > > > change the name to memory_get_xlat_addr().So we can use this
> > > > > function in other devices,such as vDPA device.
> > > > >
> > > > > Signed-off-by: Cindy Lu 
> > > > > ---
> > > > >  hw/vfio/common.c  | 92 
> > > > > ++-
> > > > >  include/exec/memory.h |  4 ++
> > > > >  softmmu/memory.c  | 84 +++
> > > > >  3 files changed, 92 insertions(+), 88 deletions(-)
> > > > >
> > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > > index ace9562a9b..2b5a9f3d8d 100644
> > > > > --- a/hw/vfio/common.c
> > > > > +++ b/hw/vfio/common.c
> > > > > @@ -574,92 +574,6 @@ static bool 
> > > > > vfio_listener_skipped_section(MemoryRegionSection *section)
> > > > > section->offset_within_address_space & (1ULL << 63);
> > > > >  }
> > > > >
> > > > > -/* Called with rcu_read_lock held.  */
> > > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > > -   ram_addr_t *ram_addr, bool *read_only)
> > > > > -{
> > > > > -MemoryRegion *mr;
> > > > > -hwaddr xlat;
> > > > > -hwaddr len = iotlb->addr_mask + 1;
> > > > > -bool writable = iotlb->perm & IOMMU_WO;
> > > > > -
> > > > > -/*
> > > > > - * The IOMMU TLB entry we have just covers translation through
> > > > > - * this IOMMU to its immediate target.  We need to translate
> > > > > - * it the rest of the way through to memory.
> > > > > - */
> > > > > -mr = address_space_translate(&address_space_memory,
> > > > > - iotlb->translated_addr,
> > > > > - &xlat, &len, writable,
> > > > > - MEMTXATTRS_UNSPECIFIED);
> > > > > -if (!memory_region_is_ram(mr)) {
> > > > > -error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > > > > - xlat);
> > > > > -return false;
> > > > > -} else if (memory_region_has_ram_discard_manager(mr)) {
> > > > > -RamDiscardManager *rdm = 
> > > > > memory_region_get_ram_discard_manager(mr);
> > > > > -MemoryRegionSection tmp = {
> > > > > -.mr = mr,
> > > > > -.offset_within_region = xlat,
> > > > > -.size = int128_make64(len),
> > > > > -};
> > > > > -
> > > > > -/*
> > > > > - * Malicious VMs can map memory into the IOMMU, which is 
> > > > > expected
> > > > > - * to remain discarded. vfio will pin all pages, populating 
> > > > > memory.
> > > > > - * Disallow that. vmstate priorities make sure any 
> > > > > RamDiscardManager
> > > > > - * were already restored before IOMMUs are restored.
> > > > > - */
> > > > > -if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > > > -error_report("iommu map to discarded memory (e.g., 
> > > > > unplugged via"
> > > > > - " virtio-mem): %"HWADDR_PRIx"",
> > > > > - iotlb->translated_addr);
> > > > > -return false;
> > > > > -}
> > > > > -
> > > > > -/*
> > > > > - * Malicious VMs might trigger discarding of IOMMU-mapped 
> > > > > memory. The
> > > > > - * pages will remain pinned inside vfio until unmapped, 
> > > > > resulting in a
> > > > > - * higher memory consumption than expected. If memory would 
> > > > > get
> > > > > - * populated again later, there would be an inconsistency 
> > > > > between pages
> > > > > - * pinned by vfio and pages seen by QEMU. This is the case 
> > > > > until
> > > > > - * unmapped from the IOMMU (e.g., during device reset).
> > > > > - *
> > > > > - * With malicious guests, we really only care about pinning 
> > > > > more memory
> > > > > - * than expected. RLIMIT_MEMLOCK set for the user/process 
> > > > > can never be
> > > > > - * exceeded and can be used to mitigate this problem.
> > > > > - */
> > > > > -warn_report_once("Using vfio with vIOMMUs and coordinated 
> > > > > discarding of"
> > > > > - " RAM (e.g., virtio-mem) works, however, 
> > > > > malicious"
> > > > > - " guests can trigger pinning of more memory 
> > > > > than"
> > > > > - " intended via an IOMMU. It's possible to 
> > > > > mitigate "
> > > > > - " by setting/adjusting RLIMIT_MEMLOCK.");
> > > > > -}
> > > > > -
> > > > > -/*
> > > >

Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Jason Wang
On Fri, Oct 28, 2022 at 10:35 AM Alex Williamson
 wrote:
>
> On Fri, 28 Oct 2022 10:23:16 +0800
> Jason Wang  wrote:
>
> > On Fri, Oct 28, 2022 at 10:08 AM Alex Williamson
> >  wrote:
> > >
> > > On Fri, 28 Oct 2022 09:50:10 +0800
> > > Jason Wang  wrote:
> > >
> > > > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
> > > >  wrote:
> > > > >
> > > > > On Thu, 27 Oct 2022 15:40:31 +0800
> > > > > Cindy Lu  wrote:
> > > > >
> > > > > > Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> > > > > > change the name to memory_get_xlat_addr().So we can use this
> > > > > > function in other devices,such as vDPA device.
> > > > > >
> > > > > > Signed-off-by: Cindy Lu 
> > > > > > ---
> > > > > >  hw/vfio/common.c  | 92 
> > > > > > ++-
> > > > > >  include/exec/memory.h |  4 ++
> > > > > >  softmmu/memory.c  | 84 +++
> > > > > >  3 files changed, 92 insertions(+), 88 deletions(-)
> > > > > >
> > > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > > > index ace9562a9b..2b5a9f3d8d 100644
> > > > > > --- a/hw/vfio/common.c
> > > > > > +++ b/hw/vfio/common.c
> > > > > > @@ -574,92 +574,6 @@ static bool 
> > > > > > vfio_listener_skipped_section(MemoryRegionSection *section)
> > > > > > section->offset_within_address_space & (1ULL << 63);
> > > > > >  }
> > > > > >
> > > > > > -/* Called with rcu_read_lock held.  */
> > > > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > > > -   ram_addr_t *ram_addr, bool 
> > > > > > *read_only)
> > > > > > -{
> > > > > > -MemoryRegion *mr;
> > > > > > -hwaddr xlat;
> > > > > > -hwaddr len = iotlb->addr_mask + 1;
> > > > > > -bool writable = iotlb->perm & IOMMU_WO;
> > > > > > -
> > > > > > -/*
> > > > > > - * The IOMMU TLB entry we have just covers translation through
> > > > > > - * this IOMMU to its immediate target.  We need to translate
> > > > > > - * it the rest of the way through to memory.
> > > > > > - */
> > > > > > -mr = address_space_translate(&address_space_memory,
> > > > > > - iotlb->translated_addr,
> > > > > > - &xlat, &len, writable,
> > > > > > - MEMTXATTRS_UNSPECIFIED);
> > > > > > -if (!memory_region_is_ram(mr)) {
> > > > > > -error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > > > > > - xlat);
> > > > > > -return false;
> > > > > > -} else if (memory_region_has_ram_discard_manager(mr)) {
> > > > > > -RamDiscardManager *rdm = 
> > > > > > memory_region_get_ram_discard_manager(mr);
> > > > > > -MemoryRegionSection tmp = {
> > > > > > -.mr = mr,
> > > > > > -.offset_within_region = xlat,
> > > > > > -.size = int128_make64(len),
> > > > > > -};
> > > > > > -
> > > > > > -/*
> > > > > > - * Malicious VMs can map memory into the IOMMU, which is 
> > > > > > expected
> > > > > > - * to remain discarded. vfio will pin all pages, 
> > > > > > populating memory.
> > > > > > - * Disallow that. vmstate priorities make sure any 
> > > > > > RamDiscardManager
> > > > > > - * were already restored before IOMMUs are restored.
> > > > > > - */
> > > > > > -if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > > > > -error_report("iommu map to discarded memory (e.g., 
> > > > > > unplugged via"
> > > > > > - " virtio-mem): %"HWADDR_PRIx"",
> > > > > > - iotlb->translated_addr);
> > > > > > -return false;
> > > > > > -}
> > > > > > -
> > > > > > -/*
> > > > > > - * Malicious VMs might trigger discarding of IOMMU-mapped 
> > > > > > memory. The
> > > > > > - * pages will remain pinned inside vfio until unmapped, 
> > > > > > resulting in a
> > > > > > - * higher memory consumption than expected. If memory 
> > > > > > would get
> > > > > > - * populated again later, there would be an inconsistency 
> > > > > > between pages
> > > > > > - * pinned by vfio and pages seen by QEMU. This is the case 
> > > > > > until
> > > > > > - * unmapped from the IOMMU (e.g., during device reset).
> > > > > > - *
> > > > > > - * With malicious guests, we really only care about 
> > > > > > pinning more memory
> > > > > > - * than expected. RLIMIT_MEMLOCK set for the user/process 
> > > > > > can never be
> > > > > > - * exceeded and can be used to mitigate this problem.
> > > > > > - */
> > > > > > -warn_report_once("Using vfio with vIOMMUs and coordinated 
> > > > > > discarding of"
> > > > > > - " RAM (e.g., virtio-mem) works, however, 
> > > > > > malicious"
> > > > > > - " guests can

Re: [PATCH 00/16] hw/9pfs: Add 9pfs support for Windows

2022-10-27 Thread Bin Meng
On Fri, Oct 28, 2022 at 12:31 AM Christian Schoenebeck
 wrote:
>
> On Thursday, October 27, 2022 6:19:27 PM CEST Bin Meng wrote:
> > Hi Christian,
> >
> > On Mon, Oct 24, 2022 at 1:16 PM Bin Meng  wrote:
> > >
> > > At present there is no Windows support for 9p file system.
> > > This series adds initial Windows support for 9p file system.
> > >
> > > 'local' file system backend driver is supported on Windows,
> > > including open, read, write, close, rename, remove, etc.
> > > All security models are supported. The mapped (mapped-xattr)
> > > security model is implemented using NTFS Alternate Data Stream
> > > (ADS) so the 9p export path shall be on an NTFS partition.
> > >
> > > 'synth' driver is adapted for Windows too so that we can now
> > > run qtests on Windows for 9p related regression testing.
> > >
> > > Example command line to test:
> > >
> > >   "-fsdev local,path=c:\msys64,security_model=mapped,id=p9 -device 
> > > virtio-9p-pci,fsdev=p9,mount_tag=p9fs"
> > >
> > >
> > > Bin Meng (5):
> > >   qemu/xattr.h: Exclude  for Windows
> > >   hw/9pfs: Drop unnecessary *xattr wrapper API declarations
> > >   hw/9pfs: Replace the direct call to xxxat() APIs with a wrapper
> > >   hw/9pfs: Introduce an opaque type 9P_FILE_ID
> > >   hw/9pfs: Update P9_FILE_ID to support Windows
> > >
> > > Guohuai Shi (11):
> > >   hw/9pfs: Add missing definitions for Windows
> > >   hw/9pfs: Implement Windows specific utilities functions for 9pfs
> > >   hw/9pfs: Handle current directory offset for Windows
> > >   hw/9pfs: Disable unsupported flags and features for Windows
> > >   hw/9pfs: Update the local fs driver to support Windows
> > >   hw/9pfs: Add Linux error number definition
> > >   hw/9pfs: Translate Windows errno to Linux value
> > >   fsdev: Disable proxy fs driver on Windows
> > >   hw/9pfs: Update synth fs driver for Windows
> > >   tests/qtest: virtio-9p-test: Adapt the case for win32
> > >   meson.build: Turn on virtfs for Windows
> > >
> >
> > With the latest 9p test case refactoring in the mainline, I will have
> > to cherry-pick the following 2 patches in this series, to v6 of
> > "tests/qtest: Enable running qtest on Windows" series [1], in order to
> > get qtest on Windows build successfully.
> >
> > [06/16] hw/9pfs: Add missing definitions for Windows
> > [15/16] tests/qtest: virtio-9p-test: Adapt the case for win32
> >
> > I will include the above 2 patches in the v6 qtest windows support series.
>
> No need to add those patches as they are already being queued separately. Just
> add appropriate tag(s) to the first patch:
>
> Based-on: 

Sure, but I really want to get the qtest windows support patch series
merged first.

I will disable the 9p test for Windows in my v6 then. We will enable
the 9p test later when it lands on the mainline.

>
> I already had a quick look on this version, will try to give feedback
> tomorrow.
>
> This feature won't make it into 7.2 release anyway, so patience please. ;-)
>
> Best regards,
> Christian Schoenebeck
>
> > [1] http://patchwork.ozlabs.org/project/qemu-devel/list/?series=321695

Regards,
Bin



Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Jason Wang
On Fri, Oct 28, 2022 at 10:08 AM Alex Williamson
 wrote:
>
> On Fri, 28 Oct 2022 09:50:10 +0800
> Jason Wang  wrote:
>
> > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
> >  wrote:
> > >
> > > On Thu, 27 Oct 2022 15:40:31 +0800
> > > Cindy Lu  wrote:
> > >
> > > > Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> > > > change the name to memory_get_xlat_addr().So we can use this
> > > > function in other devices,such as vDPA device.
> > > >
> > > > Signed-off-by: Cindy Lu 
> > > > ---
> > > >  hw/vfio/common.c  | 92 ++-
> > > >  include/exec/memory.h |  4 ++
> > > >  softmmu/memory.c  | 84 +++
> > > >  3 files changed, 92 insertions(+), 88 deletions(-)
> > > >
> > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > index ace9562a9b..2b5a9f3d8d 100644
> > > > --- a/hw/vfio/common.c
> > > > +++ b/hw/vfio/common.c
> > > > @@ -574,92 +574,6 @@ static bool 
> > > > vfio_listener_skipped_section(MemoryRegionSection *section)
> > > > section->offset_within_address_space & (1ULL << 63);
> > > >  }
> > > >
> > > > -/* Called with rcu_read_lock held.  */
> > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > > -   ram_addr_t *ram_addr, bool *read_only)
> > > > -{
> > > > -MemoryRegion *mr;
> > > > -hwaddr xlat;
> > > > -hwaddr len = iotlb->addr_mask + 1;
> > > > -bool writable = iotlb->perm & IOMMU_WO;
> > > > -
> > > > -/*
> > > > - * The IOMMU TLB entry we have just covers translation through
> > > > - * this IOMMU to its immediate target.  We need to translate
> > > > - * it the rest of the way through to memory.
> > > > - */
> > > > -mr = address_space_translate(&address_space_memory,
> > > > - iotlb->translated_addr,
> > > > - &xlat, &len, writable,
> > > > - MEMTXATTRS_UNSPECIFIED);
> > > > -if (!memory_region_is_ram(mr)) {
> > > > -error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > > > - xlat);
> > > > -return false;
> > > > -} else if (memory_region_has_ram_discard_manager(mr)) {
> > > > -RamDiscardManager *rdm = 
> > > > memory_region_get_ram_discard_manager(mr);
> > > > -MemoryRegionSection tmp = {
> > > > -.mr = mr,
> > > > -.offset_within_region = xlat,
> > > > -.size = int128_make64(len),
> > > > -};
> > > > -
> > > > -/*
> > > > - * Malicious VMs can map memory into the IOMMU, which is 
> > > > expected
> > > > - * to remain discarded. vfio will pin all pages, populating 
> > > > memory.
> > > > - * Disallow that. vmstate priorities make sure any 
> > > > RamDiscardManager
> > > > - * were already restored before IOMMUs are restored.
> > > > - */
> > > > -if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > > -error_report("iommu map to discarded memory (e.g., 
> > > > unplugged via"
> > > > - " virtio-mem): %"HWADDR_PRIx"",
> > > > - iotlb->translated_addr);
> > > > -return false;
> > > > -}
> > > > -
> > > > -/*
> > > > - * Malicious VMs might trigger discarding of IOMMU-mapped 
> > > > memory. The
> > > > - * pages will remain pinned inside vfio until unmapped, 
> > > > resulting in a
> > > > - * higher memory consumption than expected. If memory would get
> > > > - * populated again later, there would be an inconsistency 
> > > > between pages
> > > > - * pinned by vfio and pages seen by QEMU. This is the case 
> > > > until
> > > > - * unmapped from the IOMMU (e.g., during device reset).
> > > > - *
> > > > - * With malicious guests, we really only care about pinning 
> > > > more memory
> > > > - * than expected. RLIMIT_MEMLOCK set for the user/process can 
> > > > never be
> > > > - * exceeded and can be used to mitigate this problem.
> > > > - */
> > > > -warn_report_once("Using vfio with vIOMMUs and coordinated 
> > > > discarding of"
> > > > - " RAM (e.g., virtio-mem) works, however, 
> > > > malicious"
> > > > - " guests can trigger pinning of more memory 
> > > > than"
> > > > - " intended via an IOMMU. It's possible to 
> > > > mitigate "
> > > > - " by setting/adjusting RLIMIT_MEMLOCK.");
> > > > -}
> > > > -
> > > > -/*
> > > > - * Translation truncates length to the IOMMU page size,
> > > > - * check that it did not truncate too much.
> > > > - */
> > > > -if (len & iotlb->addr_mask) {
> > > > -error_report("iommu has granularity incompatible with target 
> > > > AS");
> > > > -return false;

RE: [PATCH] memory: Fix wrong end address dump

2022-10-27 Thread Duan, Zhenzhong


>-Original Message-
>From: David Hildenbrand 
>Sent: Friday, July 22, 2022 2:44 PM
>To: Duan, Zhenzhong ; qemu-
>de...@nongnu.org
>Cc: pbonz...@redhat.com; pet...@redhat.com; f4...@amsat.org
>Subject: Re: [PATCH] memory: Fix wrong end address dump
>
>On 22.06.22 11:59, Zhenzhong Duan wrote:
>> The end address of memory region section isn't correctly calculated
>> which leads to overflowed mtree dump:
>>
>>   Dispatch
>> Physical sections
>>   ..
>>   #70 @2000..00011fff io [ROOT]
>>   #71 @5000..5fff (noname)
>>   #72 @5000..00014fff io [ROOT]
>>   #73 @5658..5658 vmport
>>   #74 @5659..00015658 io [ROOT]
>>   #75 @6000..00015fff io [ROOT]
>>
>> After fix:
>>   #70 @2000..4fff io [ROOT]
>>   #71 @5000..5fff (noname)
>>   #72 @5000..5657 io [ROOT]
>>   #73 @5658..5658 vmport
>>   #74 @5659..5fff io [ROOT]
>>   #75 @6000.. io [ROOT]
>>
>> Fixes: 5e8fd947e2670 ("memory: Rework "info mtree" to print flat views
>> and dispatch trees")
>> Signed-off-by: Zhenzhong Duan 
>> ---
>>  softmmu/physmem.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c index
>> 214cb04c8fc3..cbabd10ac0bf 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -3701,7 +3701,7 @@ void mtree_print_dispatch(AddressSpaceDispatch
>*d, MemoryRegion *root)
>>  " %s%s%s%s%s",
>>  i,
>>  s->offset_within_address_space,
>> -s->offset_within_address_space + MR_SIZE(s->mr->size),
>> +s->offset_within_address_space + MR_SIZE(s->size),
>>  s->mr->name ? s->mr->name : "(noname)",
>>  i < ARRAY_SIZE(names) ? names[i] : "",
>>  s->mr == root ? " [ROOT]" : "",
>
>Reviewed-by: David Hildenbrand 
>
>I assume this should get picked up soonish.
Hi Maintainers,

Can this patch be considered merged as it got reviewed-by and no objection for 
a long time. Thanks.

Zhenzhong


Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Alex Williamson
On Fri, 28 Oct 2022 09:50:10 +0800
Jason Wang  wrote:

> On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
>  wrote:
> >
> > On Thu, 27 Oct 2022 15:40:31 +0800
> > Cindy Lu  wrote:
> >  
> > > Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> > > change the name to memory_get_xlat_addr().So we can use this
> > > function in other devices,such as vDPA device.
> > >
> > > Signed-off-by: Cindy Lu 
> > > ---
> > >  hw/vfio/common.c  | 92 ++-
> > >  include/exec/memory.h |  4 ++
> > >  softmmu/memory.c  | 84 +++
> > >  3 files changed, 92 insertions(+), 88 deletions(-)
> > >
> > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > index ace9562a9b..2b5a9f3d8d 100644
> > > --- a/hw/vfio/common.c
> > > +++ b/hw/vfio/common.c
> > > @@ -574,92 +574,6 @@ static bool 
> > > vfio_listener_skipped_section(MemoryRegionSection *section)
> > > section->offset_within_address_space & (1ULL << 63);
> > >  }
> > >
> > > -/* Called with rcu_read_lock held.  */
> > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > > -   ram_addr_t *ram_addr, bool *read_only)
> > > -{
> > > -MemoryRegion *mr;
> > > -hwaddr xlat;
> > > -hwaddr len = iotlb->addr_mask + 1;
> > > -bool writable = iotlb->perm & IOMMU_WO;
> > > -
> > > -/*
> > > - * The IOMMU TLB entry we have just covers translation through
> > > - * this IOMMU to its immediate target.  We need to translate
> > > - * it the rest of the way through to memory.
> > > - */
> > > -mr = address_space_translate(&address_space_memory,
> > > - iotlb->translated_addr,
> > > - &xlat, &len, writable,
> > > - MEMTXATTRS_UNSPECIFIED);
> > > -if (!memory_region_is_ram(mr)) {
> > > -error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > > - xlat);
> > > -return false;
> > > -} else if (memory_region_has_ram_discard_manager(mr)) {
> > > -RamDiscardManager *rdm = 
> > > memory_region_get_ram_discard_manager(mr);
> > > -MemoryRegionSection tmp = {
> > > -.mr = mr,
> > > -.offset_within_region = xlat,
> > > -.size = int128_make64(len),
> > > -};
> > > -
> > > -/*
> > > - * Malicious VMs can map memory into the IOMMU, which is expected
> > > - * to remain discarded. vfio will pin all pages, populating 
> > > memory.
> > > - * Disallow that. vmstate priorities make sure any 
> > > RamDiscardManager
> > > - * were already restored before IOMMUs are restored.
> > > - */
> > > -if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > > -error_report("iommu map to discarded memory (e.g., unplugged 
> > > via"
> > > - " virtio-mem): %"HWADDR_PRIx"",
> > > - iotlb->translated_addr);
> > > -return false;
> > > -}
> > > -
> > > -/*
> > > - * Malicious VMs might trigger discarding of IOMMU-mapped 
> > > memory. The
> > > - * pages will remain pinned inside vfio until unmapped, 
> > > resulting in a
> > > - * higher memory consumption than expected. If memory would get
> > > - * populated again later, there would be an inconsistency 
> > > between pages
> > > - * pinned by vfio and pages seen by QEMU. This is the case until
> > > - * unmapped from the IOMMU (e.g., during device reset).
> > > - *
> > > - * With malicious guests, we really only care about pinning more 
> > > memory
> > > - * than expected. RLIMIT_MEMLOCK set for the user/process can 
> > > never be
> > > - * exceeded and can be used to mitigate this problem.
> > > - */
> > > -warn_report_once("Using vfio with vIOMMUs and coordinated 
> > > discarding of"
> > > - " RAM (e.g., virtio-mem) works, however, 
> > > malicious"
> > > - " guests can trigger pinning of more memory 
> > > than"
> > > - " intended via an IOMMU. It's possible to 
> > > mitigate "
> > > - " by setting/adjusting RLIMIT_MEMLOCK.");
> > > -}
> > > -
> > > -/*
> > > - * Translation truncates length to the IOMMU page size,
> > > - * check that it did not truncate too much.
> > > - */
> > > -if (len & iotlb->addr_mask) {
> > > -error_report("iommu has granularity incompatible with target 
> > > AS");
> > > -return false;
> > > -}
> > > -
> > > -if (vaddr) {
> > > -*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > > -}
> > > -
> > > -if (ram_addr) {
> > > -*ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > > -}
> > > -
> > > -if (read_only) {
> > > -*read_only = 

Re: [PATCH V4 3/4] intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function

2022-10-27 Thread Jason Wang
On Thu, Oct 27, 2022 at 9:16 PM Yi Liu  wrote:
>
> On 2022/10/27 15:50, Jason Wang wrote:
> > We used to have a macro for VTD_PE_GET_FPD_ERR() but it has an
> > internal goto which prevents it from being reused. This patch convert
> > that macro to a dedicated function and let the caller to decide what
> > to do (e.g using goto or not). This makes sure it can be re-used for
> > other function that requires fault reporting.
> >
> > Signed-off-by: Jason Wang 
> > ---
> > Changes since V2:
> > - rename vtd_qualify_report_fault() to vtd_report_qualify_fault()
> > ---
> >   hw/i386/intel_iommu.c | 42 --
> >   1 file changed, 28 insertions(+), 14 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 6abe12a8c5..6c03ecf3cb 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -49,17 +49,6 @@
> >   /* pe operations */
> >   #define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT)
> >   #define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & 
> > VTD_SM_PASID_ENTRY_AW))
> > -#define VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, 
> > is_write) {\
> > -if (ret_fr) {  
> >\
> > -ret_fr = -ret_fr;  
> >\
> > -if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) {
> >\
> > -trace_vtd_fault_disabled();
> >\
> > -} else {   
> >\
> > -vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write);   
> >\
> > -}  
> >\
> > -goto error;
> >\
> > -}  
> >\
> > -}
> >
> >   /*
> >* PCI bus number (or SID) is not reliable since the device is usaully
> > @@ -1718,6 +1707,19 @@ out:
> >   trace_vtd_pt_enable_fast_path(source_id, success);
> >   }
> >
> > +static void vtd_report_qualify_fault(IntelIOMMUState *s,
> > + int err, bool is_fpd_set,
> > + uint16_t source_id,
> > + hwaddr addr,
> > + bool is_write)
> > +{
> > +if (is_fpd_set && vtd_is_qualified_fault(err)) {
> > +trace_vtd_fault_disabled();
> > +} else {
> > +vtd_report_dmar_fault(s, source_id, addr, err, is_write);
>
> seems like this will report non-qualified fault. so the naming is not
> most suit. :-) Otherwise, I'm ok with the change.

Right, let me rename it to vtd_report_fault().

Thanks

>
> > +}
> > +}
> > +
> >   /* Map dev to context-entry then do a paging-structures walk to do a iommu
> >* translation.
> >*
> > @@ -1778,7 +1780,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
> > *vtd_as, PCIBus *bus,
> >   is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD;
> >   if (!is_fpd_set && s->root_scalable) {
> >   ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
> > -VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, 
> > is_write);
> > +if (ret_fr) {
> > +vtd_report_qualify_fault(s, -ret_fr, is_fpd_set,
> > + source_id, addr, is_write);
> > +goto error;
> > +}
> >   }
> >   } else {
> >   ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce);
> > @@ -1786,7 +1792,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
> > *vtd_as, PCIBus *bus,
> >   if (!ret_fr && !is_fpd_set && s->root_scalable) {
> >   ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set);
> >   }
> > -VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, 
> > is_write);
> > +if (ret_fr) {
> > +vtd_report_qualify_fault(s, -ret_fr, is_fpd_set,
> > + source_id, addr, is_write);
> > +goto error;
> > +}
> >   /* Update context-cache */
> >   trace_vtd_iotlb_cc_update(bus_num, devfn, ce.hi, ce.lo,
> > cc_entry->context_cache_gen,
> > @@ -1822,7 +1832,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace 
> > *vtd_as, PCIBus *bus,
> >
> >   ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level,
> >  &reads, &writes, s->aw_bits);
> > -VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write);
> > +if (ret_fr) {
> > +vtd_report_qualify_fault(s, -ret_fr, is_fpd_set, source_id,
> > + addr, is_write);
> > +goto error;
> > +}
> >
> >   page_mask = 

Re: [PATCH V4 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry

2022-10-27 Thread Jason Wang
On Thu, Oct 27, 2022 at 9:20 PM Yi Liu  wrote:
>
> On 2022/10/27 15:50, Jason Wang wrote:
> > We use to warn on wrong rid2pasid entry. But this error could be
> > triggered by the guest and could happens during initialization. So
> > let's don't warn in this case.
> >
> > Signed-off-by: Jason Wang 
> > ---
> >   hw/i386/intel_iommu.c | 6 --
> >   1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> > index 6524c2ee32..796f924c06 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -1554,8 +1554,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, 
> > VTDContextEntry *ce)
> >   if (s->root_scalable) {
> >   ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe);
> >   if (ret) {
>
> ret is no more used in this branch. It may be changed to below. right?

Right.

>
>  if (vtd_ce_get_rid2pasid_entry(s, ce, &pe)) {
>  ...
>  }
>
> > -error_report_once("%s: vtd_ce_get_rid2pasid_entry error: 
> > %"PRId32,
> > -  __func__, ret);
> > +/*
> > + * This error is guest triggerable. We should assumt PT
>
> s/triggerable/trigger-able
> s/assumt/assume

Fixed.

Thanks

>
> > + * not enabled for safety.
> > + */
> >   return false;
> >   }
> >   return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT);
>
> --
> Regards,
> Yi Liu
>




Re: [PATCH] net: improve error message for missing netdev backend

2022-10-27 Thread Jason Wang
On Thu, Oct 27, 2022 at 6:52 PM Daniel P. Berrangé  wrote:
>
> ping: Jason, are you willing to queue this since it has two
> positive reviews.
>

Yes, I've queued this.

Thanks

> On Mon, Oct 03, 2022 at 11:06:12AM +0100, Daniel P. Berrangé wrote:
> > The current message when using '-net user...' with SLIRP disabled at
> > compile time is:
> >
> >   qemu-system-x86_64: -net user: Parameter 'type' expects a net backend 
> > type (maybe it is not compiled into this binary)
> >
> > An observation is that we're using the 'netdev->type' field here which
> > is an enum value, produced after QAPI has converted from its string
> > form.
> >
> > IOW, at this point in the code, we know that the user's specified
> > type name was a valid network backend. The only possible scenario that
> > can make the backend init function be NULL, is if support for that
> > backend was disabled at build time. Given this, we don't need to caveat
> > our error message with a 'maybe' hint, we can be totally explicit.
> >
> > The use of QERR_INVALID_PARAMETER_VALUE doesn't really lend itself to
> > user friendly error message text. Since this is not used to set a
> > specific QAPI error class, we can simply stop using this pre-formatted
> > error text and provide something better.
> >
> > Thus the new message is:
> >
> >   qemu-system-x86_64: -net user: network backend 'user' is not compiled 
> > into this binary
> >
> > The case of passing 'hubport' for -net is also given a message reminding
> > people they should have used -netdev/-nic instead, as this backend type
> > is only valid for the modern syntax.
> >
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >
> > NB, this does not make any difference to people who were relying on the
> > QEMU built-in default hub that was created if you don't list any -net /
> > -netdev / -nic argument, only those using explicit args.
> >
> >  net/net.c | 18 +++---
> >  1 file changed, 11 insertions(+), 7 deletions(-)
> >
> > diff --git a/net/net.c b/net/net.c
> > index 2db160e063..8ddafacf13 100644
> > --- a/net/net.c
> > +++ b/net/net.c
> > @@ -1036,19 +1036,23 @@ static int net_client_init1(const Netdev *netdev, 
> > bool is_netdev, Error **errp)
> >  if (is_netdev) {
> >  if (netdev->type == NET_CLIENT_DRIVER_NIC ||
> >  !net_client_init_fun[netdev->type]) {
> > -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type",
> > -   "a netdev backend type");
> > +error_setg(errp, "network backend '%s' is not compiled into 
> > this binary",
> > +   NetClientDriver_str(netdev->type));
> >  return -1;
> >  }
> >  } else {
> >  if (netdev->type == NET_CLIENT_DRIVER_NONE) {
> >  return 0; /* nothing to do */
> >  }
> > -if (netdev->type == NET_CLIENT_DRIVER_HUBPORT ||
> > -!net_client_init_fun[netdev->type]) {
> > -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type",
> > -   "a net backend type (maybe it is not compiled "
> > -   "into this binary)");
> > +if (netdev->type == NET_CLIENT_DRIVER_HUBPORT) {
> > +error_setg(errp, "network backend '%s' is only supported with 
> > -netdev/-nic",
> > +   NetClientDriver_str(netdev->type));
> > +return -1;
> > +}
> > +
> > +if (!net_client_init_fun[netdev->type]) {
> > +error_setg(errp, "network backend '%s' is not compiled into 
> > this binary",
> > +   NetClientDriver_str(netdev->type));
> >  return -1;
> >  }
> >
> > --
> > 2.37.3
> >
>
> With regards,
> Daniel
> --
> |: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
>




Re: [PATCH 3/3] vdpa: Expose VIRTIO_NET_F_STATUS unconditionally

2022-10-27 Thread Jason Wang
On Thu, Oct 27, 2022 at 6:18 PM Eugenio Perez Martin
 wrote:
>
> On Thu, Oct 27, 2022 at 8:54 AM Jason Wang  wrote:
> >
> > On Thu, Oct 27, 2022 at 2:47 PM Eugenio Perez Martin
> >  wrote:
> > >
> > > On Thu, Oct 27, 2022 at 6:32 AM Jason Wang  wrote:
> > > >
> > > >
> > > > 在 2022/10/26 17:53, Eugenio Pérez 写道:
> > > > > Now that qemu can handle and emulate it if the vdpa backend does not
> > > > > support it we can offer it always.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez 
> > > >
> > > >
> > > > I may miss something but isn't more easier to simply remove the
> > > > _F_STATUS from vdpa_feature_bits[]?
> > > >
> > >
> > > How is that? if we remove it, the guest cannot ack it so it cannot
> > > access the net status, isn't it?
> >
> > My understanding is that the bits stored in the vdpa_feature_bits[]
> > are the features that must be explicitly supported by the vhost
> > device.
>
> (Non English native here, so maybe I don't get what you mean :) ) The
> device may not support them. net simulator lacks some of them
> actually, and it works.

Speaking too fast, I think I meant that, if the bit doesn't belong to
vdpa_feature_bits[], it is assumed to be supported by the Qemu without
the support of the vhost. So Qemu won't even try to validate if vhost
has this support. E.g for vhost-net, we only have:

static const int kernel_feature_bits[] = {
VIRTIO_F_NOTIFY_ON_EMPTY,
VIRTIO_RING_F_INDIRECT_DESC,
VIRTIO_RING_F_EVENT_IDX,
VIRTIO_NET_F_MRG_RXBUF,
VIRTIO_F_VERSION_1,
VIRTIO_NET_F_MTU,
VIRTIO_F_IOMMU_PLATFORM,
VIRTIO_F_RING_PACKED,
VIRTIO_NET_F_HASH_REPORT,
VHOST_INVALID_FEATURE_BIT
};

You can see there's no STATUS bit there since it is emulated by Qemu.

>
> From what I see these are the only features that will be forwarded to
> the guest as device_features. If it is not in the list, the feature
> will be masked out,

Only when there's no support for this feature from the vhost.

> as if the device does not support it.
>
> So now _F_STATUS it was forwarded only if the device supports it. If
> we remove it from bit_mask, it will never be offered to the guest. But
> we want to offer it always, since we will need it for
> _F_GUEST_ANNOUNCE.
>
> Things get more complex because we actually need to ack it back if the
> device offers it, so the vdpa device can report link_down. We will
> only emulate LINK_UP always in the case the device does not support
> _F_STATUS.
>
> > So if we remove _F_STATUS, Qemu vhost code won't validate if
> > vhost-vdpa device has this support:
> >
> > uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits,
> > uint64_t features)
> > {
> > const int *bit = feature_bits;
> > while (*bit != VHOST_INVALID_FEATURE_BIT) {
> > uint64_t bit_mask = (1ULL << *bit);
> > if (!(hdev->features & bit_mask)) {
> > features &= ~bit_mask;
> > }
> > bit++;
> > }
> > return features;
> > }
> >
>
> Now maybe I'm the one missing something, but why is this not done as a
> masking directly?

Not sure, the code has been there since day 0.

But you can see from the code:

1) if STATUS is in feature_bits, we need validate the hdev->features
and mask it if the vhost doesn't have the support
2) if STATUS is not, we don't do the check and driver may still see STATUS

Thanks

>
> Instead of making feature_bits an array of ints, to declare it as a
> uint64_t with the valid feature bits and simply return features &
> feature_bits.
>
> Thanks!
>
> > Thanks
> >
> >
> >
> > >
> > > The goal with this patch series is to let the guest access the status
> > > always, even if the device doesn't support _F_STATUS.
> > >
> > > > Thanks
> > > >
> > > >
> > > > > ---
> > > > >   include/net/vhost-vdpa.h |  1 +
> > > > >   hw/net/vhost_net.c   | 16 ++--
> > > > >   net/vhost-vdpa.c |  3 +++
> > > > >   3 files changed, 18 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/include/net/vhost-vdpa.h b/include/net/vhost-vdpa.h
> > > > > index b81f9a6f2a..cfbcce6427 100644
> > > > > --- a/include/net/vhost-vdpa.h
> > > > > +++ b/include/net/vhost-vdpa.h
> > > > > @@ -17,5 +17,6 @@
> > > > >   struct vhost_net *vhost_vdpa_get_vhost_net(NetClientState *nc);
> > > > >
> > > > >   extern const int vdpa_feature_bits[];
> > > > > +extern const uint64_t vhost_vdpa_net_added_feature_bits;
> > > > >
> > > > >   #endif /* VHOST_VDPA_H */
> > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> > > > > index d28f8b974b..7c15cc6e8f 100644
> > > > > --- a/hw/net/vhost_net.c
> > > > > +++ b/hw/net/vhost_net.c
> > > > > @@ -109,10 +109,22 @@ static const int 
> > > > > *vhost_net_get_feature_bits(struct vhost_net *net)
> > > > >   return feature_bits;
> > > > >   }
> > > > >
> > > > > +static uint64_t vhost_net_add_feature_bits(struct vhost_net *net)
> > > > > +{
> > > > > +if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> > > > > +  

Re: [PATCH v2] vhost-vdpa: allow passing opened vhostfd to vhost-vdpa

2022-10-27 Thread Jason Wang
On Fri, Oct 28, 2022 at 5:56 AM Si-Wei Liu  wrote:
>
> Hi Jason,
>
> Sorry for top posting, but are you going to queue this patch? It looks
> like the discussion has been settled and no further comment I got for 2
> weeks for this patch.

Yes, I've queued this.

Thanks

>
> Thanks,
> -Siwei
>
> On 10/13/2022 4:12 PM, Si-Wei Liu wrote:
> > Jason,
> >
> > On 10/12/2022 10:02 PM, Jason Wang wrote:
> >>
> >> 在 2022/10/12 13:59, Si-Wei Liu 写道:
> >>>
> >>>
> >>> On 10/11/2022 8:09 PM, Jason Wang wrote:
>  On Tue, Oct 11, 2022 at 1:18 AM Si-Wei Liu
>  wrote:
> > On 10/8/2022 10:43 PM, Jason Wang wrote:
> >
> > On Sat, Oct 8, 2022 at 5:04 PM Si-Wei Liu
> > wrote:
> >
> > Similar to other vhost backends, vhostfd can be passed to vhost-vdpa
> > backend as another parameter to instantiate vhost-vdpa net client.
> > This would benefit the use case where only open file descriptors, as
> > opposed to raw vhost-vdpa device paths, are accessible from the QEMU
> > process.
> >
> > (qemu) netdev_add type=vhost-vdpa,vhostfd=61,id=vhost-vdpa1
> >
> > Adding Cindy.
> >
> > This has been discussed before, we've already had
> > vhostdev=/dev/fdset/$fd which should be functional equivalent to what
> > has been proposed here. (And this is how libvirt works if I
> > understand
> > correctly).
> >
> > Yes, I was aware of that discussion. However, our implementation
> > of the management software is a bit different from libvirt, in
> > which the paths in /dev/fdset/NNN can't be dynamically passed to
> > the container where QEMU is running. By using a specific vhostfd
> > property with existing code, it would allow our mgmt software
> > smooth adaption without having to add too much infra code to
> > support the /dev/fdset/NNN trick.
>  I think fdset has extra flexibility in e.g hot-plug to allow the file
>  descriptor to be passed with SCM_RIGHTS.
> >>> Yes, that's exactly the use case we'd like to support. Though the
> >>> difference in our mgmt software stack from libvirt is that any
> >>> dynamic path in /dev (like /dev/fdset/ABC or /dev/vhost-vdpa-XYZ)
> >>> can't be allowed to get passed through to the container running QEMU
> >>> on the fly for security reasons. fd passing is allowed, though, with
> >>> very strict security checks.
> >>
> >>
> >> Interesting, any reason for disallowing fd passing?
> > For our mgmt software stack, QEMU is running in a secured container
> > with its own namespace(s) with minimally well known and trusted
> > devices from root ns exposed (only) at the time when QEMU is being
> > started.  Direct fd passing via SCM_RIGHTS is allowed, but fdset
> > device node exposure is not allowed and not even considered useful to
> > us, as it adds an unwarranted attack surface to the QEMU's secured
> > container unnecessarily. This has been the case and our security model
> > for a while now w.r.t hot plugging vhost-net/tap and vhost-scsi
> > devices, so will do for vhost-vdpa with vhostfd. It's not an open
> > source project, though what I can share is that it's not a simple
> > script that can be easily changed, and allow passing extra devices
> > e.g. fdset especially on the fly is not even in consideration per
> > suggested security guideline. I think we don't do anything special
> > here as with other secured containers that disallow dynamic device
> > injection on the fly.
> >
> >> I'm asking since it's the way that libvirt work and it seems to me we
> >> don't get any complaints in the past.
> > I guess it was because libvirt doesn't run QEMU in a container with
> > very limited device exposure, otherwise this sort of constraints would
> > pop up. Anyway the point and the way I see it is that passing vhostfd
> > is proved to be working well and secure with other vhost devices, I
> > don't see why vhost-vdpa is treated special here that would need to
> > enforce the fdset usage. It's an edge case for libvirt maybe, but
> > supporting QEMU's vhost-vdpa device to run in a securely contained
> > environment with no dynamic device injection shouldn't be an odd or
> > bizarre use case.
> >
> >
> > Thanks,
> > -Siwei
> >
> >>
> >>
> >>> That's the main motivation for this direct vhostfd passing support
> >>> (noted fdset doesn't need to be used along with /dev/fdset node).
> >>>
> >>> Having it said, I found there's also nuance in the
> >>> vhostdev=/dev/fdset/XyZ interface besides the /dev node limitation:
> >>> the fd to open has to be dup'ed from the original one passed via
> >>> SCM_RIGHTS. This also has implication on security that any ioctl
> >>> call from QEMU can't be audited through the original fd.
> >>
> >>
> >> I'm not sure I get this, but management layer can enforce a ioctl
> >> whiltelist for safety.
> >>
> >> Thanks
> >>
> >>
> >>> With this regard, I think vhostfd offers more flexibility than work
> >>> around those qemu_open() specifics. Would these justify the use cas

Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Jason Wang
On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson
 wrote:
>
> On Thu, 27 Oct 2022 15:40:31 +0800
> Cindy Lu  wrote:
>
> > Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> > change the name to memory_get_xlat_addr().So we can use this
> > function in other devices,such as vDPA device.
> >
> > Signed-off-by: Cindy Lu 
> > ---
> >  hw/vfio/common.c  | 92 ++-
> >  include/exec/memory.h |  4 ++
> >  softmmu/memory.c  | 84 +++
> >  3 files changed, 92 insertions(+), 88 deletions(-)
> >
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index ace9562a9b..2b5a9f3d8d 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -574,92 +574,6 @@ static bool 
> > vfio_listener_skipped_section(MemoryRegionSection *section)
> > section->offset_within_address_space & (1ULL << 63);
> >  }
> >
> > -/* Called with rcu_read_lock held.  */
> > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > -   ram_addr_t *ram_addr, bool *read_only)
> > -{
> > -MemoryRegion *mr;
> > -hwaddr xlat;
> > -hwaddr len = iotlb->addr_mask + 1;
> > -bool writable = iotlb->perm & IOMMU_WO;
> > -
> > -/*
> > - * The IOMMU TLB entry we have just covers translation through
> > - * this IOMMU to its immediate target.  We need to translate
> > - * it the rest of the way through to memory.
> > - */
> > -mr = address_space_translate(&address_space_memory,
> > - iotlb->translated_addr,
> > - &xlat, &len, writable,
> > - MEMTXATTRS_UNSPECIFIED);
> > -if (!memory_region_is_ram(mr)) {
> > -error_report("iommu map to non memory area %"HWADDR_PRIx"",
> > - xlat);
> > -return false;
> > -} else if (memory_region_has_ram_discard_manager(mr)) {
> > -RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> > -MemoryRegionSection tmp = {
> > -.mr = mr,
> > -.offset_within_region = xlat,
> > -.size = int128_make64(len),
> > -};
> > -
> > -/*
> > - * Malicious VMs can map memory into the IOMMU, which is expected
> > - * to remain discarded. vfio will pin all pages, populating memory.
> > - * Disallow that. vmstate priorities make sure any 
> > RamDiscardManager
> > - * were already restored before IOMMUs are restored.
> > - */
> > -if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> > -error_report("iommu map to discarded memory (e.g., unplugged 
> > via"
> > - " virtio-mem): %"HWADDR_PRIx"",
> > - iotlb->translated_addr);
> > -return false;
> > -}
> > -
> > -/*
> > - * Malicious VMs might trigger discarding of IOMMU-mapped memory. 
> > The
> > - * pages will remain pinned inside vfio until unmapped, resulting 
> > in a
> > - * higher memory consumption than expected. If memory would get
> > - * populated again later, there would be an inconsistency between 
> > pages
> > - * pinned by vfio and pages seen by QEMU. This is the case until
> > - * unmapped from the IOMMU (e.g., during device reset).
> > - *
> > - * With malicious guests, we really only care about pinning more 
> > memory
> > - * than expected. RLIMIT_MEMLOCK set for the user/process can 
> > never be
> > - * exceeded and can be used to mitigate this problem.
> > - */
> > -warn_report_once("Using vfio with vIOMMUs and coordinated 
> > discarding of"
> > - " RAM (e.g., virtio-mem) works, however, 
> > malicious"
> > - " guests can trigger pinning of more memory than"
> > - " intended via an IOMMU. It's possible to 
> > mitigate "
> > - " by setting/adjusting RLIMIT_MEMLOCK.");
> > -}
> > -
> > -/*
> > - * Translation truncates length to the IOMMU page size,
> > - * check that it did not truncate too much.
> > - */
> > -if (len & iotlb->addr_mask) {
> > -error_report("iommu has granularity incompatible with target AS");
> > -return false;
> > -}
> > -
> > -if (vaddr) {
> > -*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > -}
> > -
> > -if (ram_addr) {
> > -*ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > -}
> > -
> > -if (read_only) {
> > -*read_only = !writable || mr->readonly;
> > -}
> > -
> > -return true;
> > -}
> > -
> >  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> >  {
> >  VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> > @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> > IOM

[PATCH v2 0/3] Improve FDT and support TPM for LoongArch

2022-10-27 Thread Xiaojuan Yang
This series load FDT table into dram memory space
and add uart,rtc info into FDT, Add TPM device for
LoongArch virt machine.

Changes for v2: 
1. Modify the title of the first patch("Change FDT base")
   to "Load FDT table into dram memory space".

Changes for v1: 
1. Change FDT base addr to 2 MiB.
2. Add uart and rtc info into FDT.
3. Add TPM device support.

Xiaojuan Yang (3):
  hw/loongarch: Load FDT table into dram memory space
  hw/loongarch: Improve fdt for LoongArch virt machine
  hw/loongarch: Add TPM device for LoongArch virt machine

 hw/loongarch/acpi-build.c   | 50 --
 hw/loongarch/virt.c | 53 -
 include/hw/loongarch/virt.h |  3 ---
 include/hw/pci-host/ls7a.h  |  1 +
 4 files changed, 95 insertions(+), 12 deletions(-)

-- 
2.31.1




[PATCH v2 3/3] hw/loongarch: Add TPM device for LoongArch virt machine

2022-10-27 Thread Xiaojuan Yang
Add TPM device for LoongArch virt machine, including
establish TPM acpi info and add TYPE_TPM_TIS_SYSBUS
to dynamic_sysbus_devices list.

Signed-off-by: Xiaojuan Yang 
---
 hw/loongarch/acpi-build.c | 50 +--
 hw/loongarch/virt.c   |  4 
 2 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c
index 378a6d9d38..1d0e562435 100644
--- a/hw/loongarch/acpi-build.c
+++ b/hw/loongarch/acpi-build.c
@@ -31,6 +31,9 @@
 
 #include "hw/acpi/generic_event_device.h"
 #include "hw/pci-host/gpex.h"
+#include "sysemu/tpm.h"
+#include "hw/platform-bus.h"
+#include "hw/acpi/aml-build.h"
 
 #define ACPI_BUILD_ALIGN_SIZE 0x1000
 #define ACPI_BUILD_TABLE_SIZE 0x2
@@ -275,6 +278,41 @@ static void build_pci_device_aml(Aml *scope, 
LoongArchMachineState *lams)
 acpi_dsdt_add_gpex(scope, &cfg);
 }
 
+#ifdef CONFIG_TPM
+static void acpi_dsdt_add_tpm(Aml *scope, LoongArchMachineState *vms)
+{
+PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(vms->platform_bus_dev);
+hwaddr pbus_base = VIRT_PLATFORM_BUS_BASEADDRESS;
+SysBusDevice *sbdev = SYS_BUS_DEVICE(tpm_find());
+MemoryRegion *sbdev_mr;
+hwaddr tpm_base;
+
+if (!sbdev) {
+return;
+}
+
+tpm_base = platform_bus_get_mmio_addr(pbus, sbdev, 0);
+assert(tpm_base != -1);
+
+tpm_base += pbus_base;
+
+sbdev_mr = sysbus_mmio_get_region(sbdev, 0);
+
+Aml *dev = aml_device("TPM0");
+aml_append(dev, aml_name_decl("_HID", aml_string("MSFT0101")));
+aml_append(dev, aml_name_decl("_STR", aml_string("TPM 2.0 Device")));
+aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+
+Aml *crs = aml_resource_template();
+aml_append(crs,
+   aml_memory32_fixed(tpm_base,
+  (uint32_t)memory_region_size(sbdev_mr),
+  AML_READ_WRITE));
+aml_append(dev, aml_name_decl("_CRS", crs));
+aml_append(scope, dev);
+}
+#endif
+
 /* build DSDT */
 static void
 build_dsdt(GArray *table_data, BIOSLinker *linker, MachineState *machine)
@@ -289,7 +327,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 build_uart_device_aml(dsdt);
 build_pci_device_aml(dsdt, lams);
 build_la_ged_aml(dsdt, machine);
-
+#ifdef CONFIG_TPM
+acpi_dsdt_add_tpm(dsdt, lams);
+#endif
 /* System State Package */
 scope = aml_scope("\\");
 pkg = aml_package(4);
@@ -358,7 +398,13 @@ static void acpi_build(AcpiBuildTables *tables, 
MachineState *machine)
 build_mcfg(tables_blob, tables->linker, &mcfg, lams->oem_id,
lams->oem_table_id);
 }
-
+/* TPM info */
+if (tpm_get_version(tpm_find()) == TPM_VERSION_2_0) {
+acpi_add_table(table_offsets, tables_blob);
+build_tpm2(tables_blob, tables->linker,
+   tables->tcpalog, lams->oem_id,
+   lams->oem_table_id);
+}
 /* Add tables supplied by user (if any) */
 for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
 unsigned len = acpi_table_len(u);
diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index eed9d591e7..c1612d5e05 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -41,6 +41,7 @@
 #include "hw/platform-bus.h"
 #include "hw/display/ramfb.h"
 #include "hw/mem/pc-dimm.h"
+#include "sysemu/tpm.h"
 
 static void fdt_add_rtc_node(LoongArchMachineState *lams)
 {
@@ -993,6 +994,9 @@ static void loongarch_class_init(ObjectClass *oc, void 
*data)
 object_class_property_set_description(oc, "acpi",
 "Enable ACPI");
 machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
+#ifdef CONFIG_TPM
+machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS);
+#endif
 }
 
 static const TypeInfo loongarch_machine_types[] = {
-- 
2.31.1




[PATCH v2 2/3] hw/loongarch: Improve fdt for LoongArch virt machine

2022-10-27 Thread Xiaojuan Yang
Add new items into LoongArch FDT, including rtc and uart info.

Signed-off-by: Xiaojuan Yang 
---
 hw/loongarch/virt.c| 31 +++
 include/hw/pci-host/ls7a.h |  1 +
 2 files changed, 32 insertions(+)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index fe33e7e3e4..eed9d591e7 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -42,6 +42,35 @@
 #include "hw/display/ramfb.h"
 #include "hw/mem/pc-dimm.h"
 
+static void fdt_add_rtc_node(LoongArchMachineState *lams)
+{
+char *nodename;
+hwaddr base = VIRT_RTC_REG_BASE;
+hwaddr size = VIRT_RTC_LEN;
+MachineState *ms = MACHINE(lams);
+
+nodename = g_strdup_printf("/rtc@%" PRIx64, base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible", 
"loongson,ls7a-rtc");
+qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg", 0x0, base, size);
+g_free(nodename);
+}
+
+static void fdt_add_uart_node(LoongArchMachineState *lams)
+{
+char *nodename;
+hwaddr base = VIRT_UART_BASE;
+hwaddr size = VIRT_UART_SIZE;
+MachineState *ms = MACHINE(lams);
+
+nodename = g_strdup_printf("/serial@%" PRIx64, base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop_string(ms->fdt, nodename, "compatible", "ns16550a");
+qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", 0x0, base, 0x0, size);
+qemu_fdt_setprop_cell(ms->fdt, nodename, "clock-frequency", 1);
+g_free(nodename);
+}
+
 static void create_fdt(LoongArchMachineState *lams)
 {
 MachineState *ms = MACHINE(lams);
@@ -422,6 +451,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, 
LoongArchMachineState *
qdev_get_gpio_in(pch_pic,
 VIRT_UART_IRQ - PCH_PIC_IRQ_OFFSET),
115200, serial_hd(0), DEVICE_LITTLE_ENDIAN);
+fdt_add_uart_node(lams);
 
 /* Network init */
 for (i = 0; i < nb_nics; i++) {
@@ -442,6 +472,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, 
LoongArchMachineState *
 sysbus_create_simple("ls7a_rtc", VIRT_RTC_REG_BASE,
  qdev_get_gpio_in(pch_pic,
  VIRT_RTC_IRQ - PCH_PIC_IRQ_OFFSET));
+fdt_add_rtc_node(lams);
 
 pm_mem = g_new(MemoryRegion, 1);
 memory_region_init_io(pm_mem, NULL, &loongarch_virt_pm_ops,
diff --git a/include/hw/pci-host/ls7a.h b/include/hw/pci-host/ls7a.h
index 9bd875ca8b..df7fa55a30 100644
--- a/include/hw/pci-host/ls7a.h
+++ b/include/hw/pci-host/ls7a.h
@@ -37,6 +37,7 @@
 #define VIRT_PCI_IRQS48
 #define VIRT_UART_IRQ(PCH_PIC_IRQ_OFFSET + 2)
 #define VIRT_UART_BASE   0x1fe001e0
+#define VIRT_UART_SIZE   0X100
 #define VIRT_RTC_IRQ (PCH_PIC_IRQ_OFFSET + 3)
 #define VIRT_MISC_REG_BASE   (VIRT_PCH_REG_BASE + 0x0008)
 #define VIRT_RTC_REG_BASE(VIRT_MISC_REG_BASE + 0x00050100)
-- 
2.31.1




[PATCH v2 1/3] hw/loongarch: Load FDT table into dram memory space

2022-10-27 Thread Xiaojuan Yang
Load FDT table into dram memory space, and the addr is 2 MiB.
Since lowmem region starts from 0, FDT base address is located
at 2 MiB to avoid NULL pointer access.

Signed-off-by: Xiaojuan Yang 
---
 hw/loongarch/virt.c | 18 +++---
 include/hw/loongarch/virt.h |  3 ---
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c
index 29df99727d..fe33e7e3e4 100644
--- a/hw/loongarch/virt.c
+++ b/hw/loongarch/virt.c
@@ -159,7 +159,6 @@ static void fdt_add_pcie_node(const LoongArchMachineState 
*lams)
  1, FDT_PCI_RANGE_MMIO, 2, base_mmio,
  2, base_mmio, 2, size_mmio);
 g_free(nodename);
-qemu_fdt_dumpdtb(ms->fdt, lams->fdt_size);
 }
 
 static void fdt_add_irqchip_node(LoongArchMachineState *lams)
@@ -689,6 +688,7 @@ static void loongarch_init(MachineState *machine)
 MemoryRegion *address_space_mem = get_system_memory();
 LoongArchMachineState *lams = LOONGARCH_MACHINE(machine);
 int i;
+hwaddr fdt_base;
 
 if (!cpu_model) {
 cpu_model = LOONGARCH_CPU_TYPE_NAME("la464");
@@ -793,12 +793,16 @@ static void loongarch_init(MachineState *machine)
 lams->machine_done.notify = virt_machine_done;
 qemu_add_machine_init_done_notifier(&lams->machine_done);
 fdt_add_pcie_node(lams);
-
-/* load fdt */
-MemoryRegion *fdt_rom = g_new(MemoryRegion, 1);
-memory_region_init_rom(fdt_rom, NULL, "fdt", VIRT_FDT_SIZE, &error_fatal);
-memory_region_add_subregion(get_system_memory(), VIRT_FDT_BASE, fdt_rom);
-rom_add_blob_fixed("fdt", machine->fdt, lams->fdt_size, VIRT_FDT_BASE);
+/*
+ * Since lowmem region starts from 0, FDT base address is located
+ * at 2 MiB to avoid NULL pointer access.
+ *
+ * Put the FDT into the memory map as a ROM image: this will ensure
+ * the FDT is copied again upon reset, even if addr points into RAM.
+ */
+fdt_base = 2 * MiB;
+qemu_fdt_dumpdtb(machine->fdt, lams->fdt_size);
+rom_add_blob_fixed("fdt", machine->fdt, lams->fdt_size, fdt_base);
 }
 
 bool loongarch_is_acpi_enabled(LoongArchMachineState *lams)
diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h
index 09f1c88ee5..45c383f5a7 100644
--- a/include/hw/loongarch/virt.h
+++ b/include/hw/loongarch/virt.h
@@ -28,9 +28,6 @@
 #define VIRT_GED_MEM_ADDR   (VIRT_GED_EVT_ADDR + ACPI_GED_EVT_SEL_LEN)
 #define VIRT_GED_REG_ADDR   (VIRT_GED_MEM_ADDR + MEMORY_HOTPLUG_IO_LEN)
 
-#define VIRT_FDT_BASE   0x1c40
-#define VIRT_FDT_SIZE   0x10
-
 struct LoongArchMachineState {
 /*< private >*/
 MachineState parent_obj;
-- 
2.31.1




Re: [PATCH] i386/cpu: Adjust cpuid addresable id count to match the spec

2022-10-27 Thread Wang, Lei


On 10/10/2022 10:49 AM, Ilya Oleinik wrote:
> For EBX bits 23 - 16 in CPUID[01] Intel's manual states:
> "
> *   The nearest power-of-2 integer that is not smaller than EBX[23:16]
> is the number of unique initial APICIDs reserved for addressing
> different logical processors in a physical package. This field is
> only valid if CPUID.1.EDX.HTT[bit 28]= 1.
> "
> Ensure this condition is met.
> 
> Additionally, apply efb3934adf9ee7794db7e0ade9f576c634592891 to
> non passthrough cache mode.
> 
> Signed-off-by: Ilya Oleinik 
> ---
>  target/i386/cpu.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index ad623d91e4..e793bcc03f 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -245,8 +245,8 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache,
>  *eax = CACHE_TYPE(cache->type) |
> CACHE_LEVEL(cache->level) |
> (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) |
> -   ((num_cores - 1) << 26) |
> -   ((num_apic_ids - 1) << 14);
> +   ((pow2ceil(num_cores) - 1) << 26) |
> +   ((pow2ceil(num_apic_ids) - 1) << 14);
>  
>  assert(cache->line_size > 0);
>  assert(cache->partitions > 0);
> @@ -5258,7 +5258,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>  }
>  *edx = env->features[FEAT_1_EDX];
>  if (cs->nr_cores * cs->nr_threads > 1) {
> -*ebx |= (cs->nr_cores * cs->nr_threads) << 16;
> +*ebx |= (pow2ceil(cs->nr_cores * cs->nr_threads)) << 16;
>  *edx |= CPUID_HT;
>  }
>  if (!cpu->enable_pmu) {
> @@ -5313,12 +5313,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>  switch (count) {
>  case 0: /* L1 dcache info */
>  encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache,
> -1, cs->nr_cores,

Hi Ilya,

Just curious why the origin implementation is hard-coded to 1 and is there any
bug reported related to this?

> +cs->nr_threads, cs->nr_cores,
>  eax, ebx, ecx, edx);
>  break;
>  case 1: /* L1 icache info */
>  encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache,
> -1, cs->nr_cores,
> +cs->nr_threads, cs->nr_cores,
>  eax, ebx, ecx, edx);
>  break;
>  case 2: /* L2 cache info */



Re: [PATCH v1 2/3] target/riscv: Extend isa_ext_data for single letter extensions

2022-10-27 Thread Alistair Francis
On Thu, Oct 27, 2022 at 3:50 PM Mayuresh Chitale
 wrote:
>
> Currently the ISA string for a CPU is generated from two different
> arrays, one for single letter extensions and another for multi letter
> extensions. Add all the single letter extensions to the isa_ext_data
> array and use it for generating the ISA string. Also drop 'P' and 'Q'
> extensions from the list of single letter extensions as those are not
> supported yet.
>
> Signed-off-by: Mayuresh Chitale 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  target/riscv/cpu.c | 41 +++--
>  1 file changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index e6d9c706bb..35320a8547 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -41,8 +41,6 @@
>   (QEMU_VERSION_MICRO))
>  #define RISCV_CPU_MIMPIDRISCV_CPU_MARCHID
>
> -static const char riscv_single_letter_exts[] = "IEMAFDQCPVH";
> -
>  struct isa_ext_data {
>  const char *name;
>  bool multi_letter;
> @@ -71,6 +69,13 @@ struct isa_ext_data {
>   *extensions by an underscore.
>   */
>  static const struct isa_ext_data isa_edata_arr[] = {
> +ISA_EXT_DATA_ENTRY(i, false, PRIV_VERSION_1_10_0, ext_i),
> +ISA_EXT_DATA_ENTRY(e, false, PRIV_VERSION_1_10_0, ext_e),
> +ISA_EXT_DATA_ENTRY(m, false, PRIV_VERSION_1_10_0, ext_m),
> +ISA_EXT_DATA_ENTRY(a, false, PRIV_VERSION_1_10_0, ext_a),
> +ISA_EXT_DATA_ENTRY(f, false, PRIV_VERSION_1_10_0, ext_f),
> +ISA_EXT_DATA_ENTRY(d, false, PRIV_VERSION_1_10_0, ext_d),
> +ISA_EXT_DATA_ENTRY(c, false, PRIV_VERSION_1_10_0, ext_c),
>  ISA_EXT_DATA_ENTRY(h, false, PRIV_VERSION_1_12_0, ext_h),
>  ISA_EXT_DATA_ENTRY(v, false, PRIV_VERSION_1_12_0, ext_v),
>  ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr),
> @@ -1182,16 +1187,23 @@ static void riscv_cpu_class_init(ObjectClass *c, void 
> *data)
>  device_class_set_props(dc, riscv_cpu_properties);
>  }
>
> -static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int 
> max_str_len)
> +static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str)
>  {
>  char *old = *isa_str;
>  char *new = *isa_str;
>  int i;
>
>  for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) {
> -if (isa_edata_arr[i].multi_letter &&
> -isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
> -new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL);
> +if (isa_ext_is_enabled(cpu, &isa_edata_arr[i])) {
> +if (isa_edata_arr[i].multi_letter) {
> +if (cpu->cfg.short_isa_string) {
> +continue;
> +}
> +new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL);
> +} else {
> +new = g_strconcat(old, isa_edata_arr[i].name, NULL);
> +}
> +
>  g_free(old);
>  old = new;
>  }
> @@ -1202,19 +1214,12 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char 
> **isa_str, int max_str_len)
>
>  char *riscv_isa_string(RISCVCPU *cpu)
>  {
> -int i;
> -const size_t maxlen = sizeof("rv128") + sizeof(riscv_single_letter_exts);
> +const size_t maxlen = sizeof("rv128");
>  char *isa_str = g_new(char, maxlen);
> -char *p = isa_str + snprintf(isa_str, maxlen, "rv%d", TARGET_LONG_BITS);
> -for (i = 0; i < sizeof(riscv_single_letter_exts) - 1; i++) {
> -if (cpu->env.misa_ext & RV(riscv_single_letter_exts[i])) {
> -*p++ = qemu_tolower(riscv_single_letter_exts[i]);
> -}
> -}
> -*p = '\0';
> -if (!cpu->cfg.short_isa_string) {
> -riscv_isa_string_ext(cpu, &isa_str, maxlen);
> -}
> +
> +snprintf(isa_str, maxlen, "rv%d", TARGET_LONG_BITS);
> +riscv_isa_string_ext(cpu, &isa_str);
> +
>  return isa_str;
>  }
>
> --
> 2.34.1
>
>



Re: [PATCH v5 00/13] Instantiate VT82xx functions in host device

2022-10-27 Thread Philippe Mathieu-Daudé

Hi Daniel, Bernhard,

On 27/10/22 11:47, Daniel Henrique Barboza wrote:

On 10/27/22 05:21, Bernhard Beschow wrote:
Am 16. September 2022 14:36:05 UTC schrieb "Philippe Mathieu-Daudé" 
:

On 12/9/22 21:50, Bernhard Beschow wrote:
Am 1. September 2022 11:41:14 UTC schrieb Bernhard Beschow 
:



Testing done:

* `qemu-system-ppc -machine pegasos2 -rtc base=localtime -device 
ati-vga,guest_hwcursor=true,romfile="" -cdrom morphos-3.17.iso 
-kernel morphos-3.17/boot.img`


   Boots successfully and it is possible to open games and tools.



* I was unable to test the fuloong2e board even before this series 
since it seems to be unfinished [1].


   A buildroot-baked kernel [2] booted but doesn't find its root 
partition, though the issues could be in the buildroot receipt I 
created.




[1] https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2

[2] https://github.com/shentok/buildroot/commits/fuloong2e



Copying from v2 (just found it in my spam folder :/):
Series:
Reviewed-by: Philippe Mathieu-Daudé 

Review seems complete, thanks to all who participated! Now we just 
need someone to queue this series.


Best regards,
Bernhard


Excellent cleanup! Series queued to mips-next.


Hi Phil,

would you mind doing a pull request in time for 7.2?


I believe Phil was having problems with his amsat.org email. It's
better to CC him using his work email phi...@linaro.org (just added
it).

Phil, since this has pegasos2 changes I can queue it up via ppc-next
if you like. I'll toss a PR tomorrow.


This series is already queued. I apologize for the lng delay, I am
trying to run my usual tests but various fileservers I was using to
fetch MIPS binaries disappeared over the last year, so I have to pull
these files from offline backups. The PR will be in time for 7.2
however :)

Regards,

Phil.



Re: [PATCH] target/i386: Expand eflags updates inline

2022-10-27 Thread Philippe Mathieu-Daudé

On 27/10/22 12:26, Richard Henderson wrote:

The helpers for reset_rf, cli, sti, clac, stac are
completely trivial; implement them inline.

Drop some nearby #if 0 code.

Signed-off-by: Richard Henderson 
---
  target/i386/helper.h|  5 -
  target/i386/tcg/cc_helper.c | 41 -
  target/i386/tcg/translate.c | 30 ++-
  3 files changed, 25 insertions(+), 51 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 3/4] hw/i386/acpi-build: Resolve PIIX ISA bridge rather than ACPI controller

2022-10-27 Thread Philippe Mathieu-Daudé

On 26/10/22 15:31, Bernhard Beschow wrote:

Resolving the PIIX ISA bridge rather than the PIIX ACPI controller mirrors
the ICH9 code one line below.

Signed-off-by: Bernhard Beschow 
---
  hw/i386/acpi-build.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 4/4] hw/i386/acpi-build: Generate AML for north and south bridges separately

2022-10-27 Thread Philippe Mathieu-Daudé

On 26/10/22 15:31, Bernhard Beschow wrote:

The code currently assumes Q35 iff ICH9 and i440fx iff PIIX. This is
slightly confusing when trying to understand the code. Split north and
south bridge code to communicate which piece of code assumes which type
of bridge.

Signed-off-by: Bernhard Beschow 
---
  hw/i386/acpi-build.c | 22 ++
  1 file changed, 14 insertions(+), 8 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2] vhost-vdpa: allow passing opened vhostfd to vhost-vdpa

2022-10-27 Thread Si-Wei Liu

Hi Jason,

Sorry for top posting, but are you going to queue this patch? It looks 
like the discussion has been settled and no further comment I got for 2 
weeks for this patch.


Thanks,
-Siwei

On 10/13/2022 4:12 PM, Si-Wei Liu wrote:

Jason,

On 10/12/2022 10:02 PM, Jason Wang wrote:


在 2022/10/12 13:59, Si-Wei Liu 写道:



On 10/11/2022 8:09 PM, Jason Wang wrote:
On Tue, Oct 11, 2022 at 1:18 AM Si-Wei Liu 
wrote:

On 10/8/2022 10:43 PM, Jason Wang wrote:

On Sat, Oct 8, 2022 at 5:04 PM Si-Wei Liu 
wrote:


Similar to other vhost backends, vhostfd can be passed to vhost-vdpa
backend as another parameter to instantiate vhost-vdpa net client.
This would benefit the use case where only open file descriptors, as
opposed to raw vhost-vdpa device paths, are accessible from the QEMU
process.

(qemu) netdev_add type=vhost-vdpa,vhostfd=61,id=vhost-vdpa1

Adding Cindy.

This has been discussed before, we've already had
vhostdev=/dev/fdset/$fd which should be functional equivalent to what
has been proposed here. (And this is how libvirt works if I 
understand

correctly).

Yes, I was aware of that discussion. However, our implementation 
of the management software is a bit different from libvirt, in 
which the paths in /dev/fdset/NNN can't be dynamically passed to 
the container where QEMU is running. By using a specific vhostfd 
property with existing code, it would allow our mgmt software 
smooth adaption without having to add too much infra code to 
support the /dev/fdset/NNN trick.

I think fdset has extra flexibility in e.g hot-plug to allow the file
descriptor to be passed with SCM_RIGHTS.
Yes, that's exactly the use case we'd like to support. Though the 
difference in our mgmt software stack from libvirt is that any 
dynamic path in /dev (like /dev/fdset/ABC or /dev/vhost-vdpa-XYZ) 
can't be allowed to get passed through to the container running QEMU 
on the fly for security reasons. fd passing is allowed, though, with 
very strict security checks.



Interesting, any reason for disallowing fd passing?
For our mgmt software stack, QEMU is running in a secured container 
with its own namespace(s) with minimally well known and trusted 
devices from root ns exposed (only) at the time when QEMU is being 
started.  Direct fd passing via SCM_RIGHTS is allowed, but fdset 
device node exposure is not allowed and not even considered useful to 
us, as it adds an unwarranted attack surface to the QEMU's secured 
container unnecessarily. This has been the case and our security model 
for a while now w.r.t hot plugging vhost-net/tap and vhost-scsi 
devices, so will do for vhost-vdpa with vhostfd. It's not an open 
source project, though what I can share is that it's not a simple 
script that can be easily changed, and allow passing extra devices 
e.g. fdset especially on the fly is not even in consideration per 
suggested security guideline. I think we don't do anything special 
here as with other secured containers that disallow dynamic device 
injection on the fly.


I'm asking since it's the way that libvirt work and it seems to me we 
don't get any complaints in the past.
I guess it was because libvirt doesn't run QEMU in a container with 
very limited device exposure, otherwise this sort of constraints would 
pop up. Anyway the point and the way I see it is that passing vhostfd 
is proved to be working well and secure with other vhost devices, I 
don't see why vhost-vdpa is treated special here that would need to 
enforce the fdset usage. It's an edge case for libvirt maybe, but 
supporting QEMU's vhost-vdpa device to run in a securely contained 
environment with no dynamic device injection shouldn't be an odd or 
bizarre use case.



Thanks,
-Siwei




That's the main motivation for this direct vhostfd passing support 
(noted fdset doesn't need to be used along with /dev/fdset node).


Having it said, I found there's also nuance in the 
vhostdev=/dev/fdset/XyZ interface besides the /dev node limitation: 
the fd to open has to be dup'ed from the original one passed via 
SCM_RIGHTS. This also has implication on security that any ioctl 
call from QEMU can't be audited through the original fd.



I'm not sure I get this, but management layer can enforce a ioctl 
whiltelist for safety.


Thanks


With this regard, I think vhostfd offers more flexibility than work 
around those qemu_open() specifics. Would these justify the use case 
of concern?


Thanks,
-Siwei


  It would still be good to add
the support.

On the other hand, the other vhost backends, e.g. tap (via 
vhost-net), vhost-scsi and vhost-vsock all accept vhostfd as 
parameter to instantiate device, although the /dev/fdset trick 
also works there. I think vhost-vdpa is not  unprecedented in this 
case?

Yes.

Thanks


Thanks,
-Siwei



Thanks

Signed-off-by: Si-Wei Liu
Acked-by: Eugenio Pérez

---
v2:
   - fixed typo in commit message
   - s/fd's/file descriptors/
---
  net/vhost-vdpa.c | 25 -
  qapi/net.json    |  3 +++
  qemu-option

Re: [PATCH 2/4] hw/i386/acpi-build: Resolve redundant attribute

2022-10-27 Thread Philippe Mathieu-Daudé

On 26/10/22 15:31, Bernhard Beschow wrote:

The is_piix4 attribute is set once in one location and read once in
another. Doing both in one location allows for removing the attribute
altogether.

Signed-off-by: Bernhard Beschow 
---
  hw/i386/acpi-build.c | 20 ++--
  1 file changed, 6 insertions(+), 14 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2 0/3] hw/isa/piix4: Remove MIPS Malta specific bits

2022-10-27 Thread Philippe Mathieu-Daudé

On 27/10/22 22:47, Philippe Mathieu-Daudé wrote:

Since v1:
- bswap -> tswap (Bernhard)

Bernhard posted a series merging both PIIX3/PIIX4 models
in one [1]. Due to Malta-specific board code forced into
the PIIX4 reset values, Bernhard had to include an array
of "register values at reset" as a class property. This
is not wrong, but to model properly the model, we should
simply use the hardware real reset values, not try to
bend the model to please the Malta board.

This series fix this issue by having the Malta bootloader
code setting the board-specific PIIX4 IRQ routing values.

Note patch 2 still misses an equivalent nanoMIPS code.


So this series won't be merged until this is added, but
it should be enough to let Bernhard keep working on the
"Consolidate PIIX series".



Re: [PATCH v4 6/7] hw/sd/sdhci: Implement Freescale eSDHC device model

2022-10-27 Thread Philippe Mathieu-Daudé

Hi Bernhard,

On 18/10/22 23:01, Bernhard Beschow wrote:

Will allow e500 boards to access SD cards using just their own devices.

Signed-off-by: Bernhard Beschow 
---
  hw/sd/sdhci.c | 120 +-
  include/hw/sd/sdhci.h |   3 ++
  2 files changed, 122 insertions(+), 1 deletion(-)

diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
index 306070c872..8d8ad9ff24 100644
--- a/hw/sd/sdhci.c
+++ b/hw/sd/sdhci.c
@@ -1369,6 +1369,7 @@ void sdhci_initfn(SDHCIState *s)
  s->transfer_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, sdhci_data_transfer, 
s);
  
  s->io_ops = &sdhci_mmio_ops;

+s->io_registers_map_size = SDHC_REGISTERS_MAP_SIZE;
  }
  
  void sdhci_uninitfn(SDHCIState *s)

@@ -1392,7 +1393,7 @@ void sdhci_common_realize(SDHCIState *s, Error **errp)
  s->fifo_buffer = g_malloc0(s->buf_maxsz);
  
  memory_region_init_io(&s->iomem, OBJECT(s), s->io_ops, s, "sdhci",

-  SDHC_REGISTERS_MAP_SIZE);
+  s->io_registers_map_size);


I don't think we want to change this region size. [see below]


  void sdhci_common_unrealize(SDHCIState *s)
@@ -1575,6 +1576,122 @@ static const TypeInfo sdhci_bus_info = {
  .class_init = sdhci_bus_class_init,
  };
  
+/* --- qdev Freescale eSDHC --- */

+
+/* Watermark Level Register */
+#define ESDHC_WML0x44
+
+/* Control Register for DMA transfer */
+#define ESDHC_DMA_SYSCTL0x40c
+
+#define ESDHC_REGISTERS_MAP_SIZE0x410


My preferred approach would be to create a container region with a
size of ESDHC_REGISTERS_MAP_SIZE. Map the SDHC_REGISTERS_MAP region
in the container at offset 0, priority -1. Add 2 register regions
for ESDHC_WML and ESDHC_DMA_SYSCTL, and map them with priority 1 in
the container. ...


+static uint64_t esdhci_read(void *opaque, hwaddr offset, unsigned size)
+{
+uint64_t ret;
+
+switch (offset) {
+case SDHC_SYSAD:
+case SDHC_BLKSIZE:
+case SDHC_ARGUMENT:
+case SDHC_TRNMOD:
+case SDHC_RSPREG0:
+case SDHC_RSPREG1:
+case SDHC_RSPREG2:
+case SDHC_RSPREG3:
+case SDHC_BDATA:
+case SDHC_PRNSTS:
+case SDHC_HOSTCTL:
+case SDHC_CLKCON:
+case SDHC_NORINTSTS:
+case SDHC_NORINTSTSEN:
+case SDHC_NORINTSIGEN:
+case SDHC_ACMD12ERRSTS:
+case SDHC_CAPAB:
+case SDHC_SLOT_INT_STATUS:
+ret = sdhci_read(opaque, offset, size);
+break;


... Then you don't need these cases.


+case ESDHC_WML:
+case ESDHC_DMA_SYSCTL:
+ret = 0;
+qemu_log_mask(LOG_UNIMP, "ESDHC rd @0x%02" HWADDR_PRIx
+  " not implemented\n", offset);


But then I realize you only treat these 2 registers as UNIMP.

So now, I'd create 1 UNIMP region for ESDHC_WML and map it
into SDHC_REGISTERS_MAP (s->iomem) with priority 1, and add
another UNIMP region of ESDHC_REGISTERS_MAP_SIZE - 
SDHC_REGISTERS_MAP_SIZE (= 0x310) and map it normally at offset

0x100 (SDHC_REGISTERS_MAP_SIZE). Look at create_unimp() in
hw/arm/bcm2835_peripherals.c.

But the ESDHC_WML register has address 0x44 and fits inside the
SDHC_REGISTERS_MAP region, so likely belong there. 0x44 is the
upper part of the SDHC_CAPAB register. These bits are undefined
on the spec v2, which I see you are setting in esdhci_init().
So this register should already return 0, otherwise we have
a bug. Thus we don't need to handle this ESDHC_WML particularly.

And your model is reduced to handling create_unimp() in esdhci_realize().

Am I missing something?

Regards,

Phil.



Re: [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking

2022-10-27 Thread Richard Henderson

On 10/28/22 01:59, Alex Bennée wrote:

I'm unwilling to put an expensive test like a function call
(have_mmap_lock) before an inexpensive test like pointer != NULL.


Is it really that more expensive?


Well, yes.  I mean, the function call isn't really slow, but it isn't single-cycle like a 
comparison against 0.



Sure, I guess I'm just trying to avoid having so many returns out of
the code at various levels of nesting. The page_get_target_data code is
harder to follow. What about:

int page_get_flags(target_ulong address)
{
 PageFlagsNode *p = pageflags_find(address, address);

 /*
  * See util/interval-tree.c re lockless lookups: no false positives but
  * there are false negatives.  If we had the lock and found
  * nothing we are done, otherwise retry with the mmap lock acquired.
  */
 if (p) {
 return p->flags;
 } else if (have_mmap_lock()) {
 return 0;
 }

 mmap_lock();
 p = pageflags_find(address, address);
 mmap_unlock();

 return p ? p->flags : 0;
}


Ok, I can use this.  In for v3.

As for page_get_target_data, see v2, in which there has been clean up.


r~



[PATCH v2] s390x/css: revert SCSW ctrl/flag bits on error

2022-10-27 Thread Peter Jin
Revert the control and flag bits in the subchannel status word in case
the SSCH operation fails with non-zero CC (ditto for CSCH and HSCH).
According to POPS, the control and flag bits are only changed if SSCH,
CSCH, and HSCH return CC 0, and no other action should be taken otherwise.
In order to simulate that after the fact, the bits need to be reverted on
non-zero CC.

This change is necessary due to the fact that the pwrite() in vfio-ccw
which triggers the SSCH can fail at any time. Previously, there was
only virtio-ccw, whose do_subchannel_work function was only able to
return CC0. However, once vfio-ccw went into the mix, it has become
necessary to handle errors in code paths that were previously assumed
to always return success.

In our case, we found that in case of pwrite() failure (which was
discovered by strace injection), the subchannel could be stuck in start
pending state, which could be problematic if the pwrite() call returns
CC2. Experimentation shows that the guest tries to retry the SSCH call as
normal for CC2, but it actually continously fails due to the fact that
the subchannel is stuck in start pending state even though no start
function is actually taking place.

Signed-off-by: Peter Jin 
---
 hw/s390x/css.c | 51 +++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/hw/s390x/css.c b/hw/s390x/css.c
index 7d9523f811..95d1b3a3ce 100644
--- a/hw/s390x/css.c
+++ b/hw/s390x/css.c
@@ -1522,21 +1522,37 @@ IOInstEnding css_do_xsch(SubchDev *sch)
 IOInstEnding css_do_csch(SubchDev *sch)
 {
 SCHIB *schib = &sch->curr_status;
+uint16_t old_scsw_ctrl;
+IOInstEnding ccode;
 
 if (~(schib->pmcw.flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
 return IOINST_CC_NOT_OPERATIONAL;
 }
 
+/*
+ * Save the current scsw.ctrl in case CSCH fails and we need
+ * to revert the scsw to the status quo ante.
+ */
+old_scsw_ctrl = schib->scsw.ctrl;
+
 /* Trigger the clear function. */
 schib->scsw.ctrl &= ~(SCSW_CTRL_MASK_FCTL | SCSW_CTRL_MASK_ACTL);
 schib->scsw.ctrl |= SCSW_FCTL_CLEAR_FUNC | SCSW_ACTL_CLEAR_PEND;
 
-return do_subchannel_work(sch);
+ccode = do_subchannel_work(sch);
+
+if (ccode != IOINST_CC_EXPECTED) {
+schib->scsw.ctrl = old_scsw_ctrl;
+}
+
+return ccode;
 }
 
 IOInstEnding css_do_hsch(SubchDev *sch)
 {
 SCHIB *schib = &sch->curr_status;
+uint16_t old_scsw_ctrl;
+IOInstEnding ccode;
 
 if (~(schib->pmcw.flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
 return IOINST_CC_NOT_OPERATIONAL;
@@ -1553,6 +1569,12 @@ IOInstEnding css_do_hsch(SubchDev *sch)
 return IOINST_CC_BUSY;
 }
 
+/*
+ * Save the current scsw.ctrl in case HSCH fails and we need
+ * to revert the scsw to the status quo ante.
+ */
+old_scsw_ctrl = schib->scsw.ctrl;
+
 /* Trigger the halt function. */
 schib->scsw.ctrl |= SCSW_FCTL_HALT_FUNC;
 schib->scsw.ctrl &= ~SCSW_FCTL_START_FUNC;
@@ -1564,7 +1586,13 @@ IOInstEnding css_do_hsch(SubchDev *sch)
 }
 schib->scsw.ctrl |= SCSW_ACTL_HALT_PEND;
 
-return do_subchannel_work(sch);
+ccode = do_subchannel_work(sch);
+
+if (ccode != IOINST_CC_EXPECTED) {
+schib->scsw.ctrl = old_scsw_ctrl;
+}
+
+return ccode;
 }
 
 static void css_update_chnmon(SubchDev *sch)
@@ -1605,6 +1633,8 @@ static void css_update_chnmon(SubchDev *sch)
 IOInstEnding css_do_ssch(SubchDev *sch, ORB *orb)
 {
 SCHIB *schib = &sch->curr_status;
+uint16_t old_scsw_ctrl, old_scsw_flags;
+IOInstEnding ccode;
 
 if (~(schib->pmcw.flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) {
 return IOINST_CC_NOT_OPERATIONAL;
@@ -1626,11 +1656,26 @@ IOInstEnding css_do_ssch(SubchDev *sch, ORB *orb)
 }
 sch->orb = *orb;
 sch->channel_prog = orb->cpa;
+
+/*
+ * Save the current scsw.ctrl and scsw.flags in case SSCH fails and we need
+ * to revert the scsw to the status quo ante.
+ */
+old_scsw_ctrl = schib->scsw.ctrl;
+old_scsw_flags = schib->scsw.flags;
+
 /* Trigger the start function. */
 schib->scsw.ctrl |= (SCSW_FCTL_START_FUNC | SCSW_ACTL_START_PEND);
 schib->scsw.flags &= ~SCSW_FLAGS_MASK_PNO;
 
-return do_subchannel_work(sch);
+ccode = do_subchannel_work(sch);
+
+if (ccode != IOINST_CC_EXPECTED) {
+schib->scsw.ctrl = old_scsw_ctrl;
+schib->scsw.flags = old_scsw_flags;
+}
+
+return ccode;
 }
 
 static void copy_irb_to_guest(IRB *dest, const IRB *src, const PMCW *pmcw,
-- 
2.37.3




Re: [PATCH v4 01/30] tests/docker: update fedora-win[32|64]-cross with lcitool

2022-10-27 Thread Richard Henderson

On 10/28/22 04:36, Alex Bennée wrote:

Convert another two dockerfiles to lcitool and update. I renamed the
helper because it is not Debian specific. We need an updated lcitool
for this to deal with the weirdness of a 32bit nsis tool for both 32
and 64 bit builds. As a result there are some minor whitespace and
re-order changes in a bunch of the docker files.

Signed-off-by: Alex Bennée
Message-Id:<20220929114231.583801-10-alex.ben...@linaro.org>
---
  tests/docker/dockerfiles/alpine.docker|   2 +-
  tests/docker/dockerfiles/centos8.docker   |   2 +-
  .../dockerfiles/debian-amd64-cross.docker | 234 -
  tests/docker/dockerfiles/debian-amd64.docker  | 236 +-
  .../dockerfiles/debian-arm64-cross.docker | 232 -
  .../dockerfiles/debian-armel-cross.docker | 230 -
  .../dockerfiles/debian-armhf-cross.docker | 232 -
  .../dockerfiles/debian-mips64el-cross.docker  | 226 -
  .../dockerfiles/debian-mipsel-cross.docker| 226 -
  .../dockerfiles/debian-ppc64el-cross.docker   | 230 -
  .../dockerfiles/debian-s390x-cross.docker | 228 -
  .../dockerfiles/fedora-win32-cross.docker | 139 ---
  .../dockerfiles/fedora-win64-cross.docker | 138 +++---
  tests/docker/dockerfiles/fedora.docker| 230 -
  tests/docker/dockerfiles/opensuse-leap.docker |   2 +-
  tests/docker/dockerfiles/ubuntu2004.docker| 234 -
  tests/lcitool/libvirt-ci  |   2 +-
  tests/lcitool/refresh |  48 ++--
  18 files changed, 1499 insertions(+), 1372 deletions(-)


Acked-by: Richard Henderson 


r~



Re: [PATCH v4 28/30] contrib/plugins: protect execlog's last_exec expansion

2022-10-27 Thread Richard Henderson

On 10/28/22 04:36, Alex Bennée wrote:

We originally naively treated expansion as safe because we expected
each new CPU/thread to appear in order. However the -M raspi2 model
triggered a case where a new high cpu_index thread started executing
just before a smaller one.

Clean this up by converting the GArray into the simpler GPtrArray and
then holding a lock for the expansion.

Signed-off-by: Alex Bennée
Cc: Alexandre Iooss
---
  contrib/plugins/execlog.c | 38 ++
  1 file changed, 30 insertions(+), 8 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v4 7/7] hw/ppc/e500: Add Freescale eSDHC to e500plat

2022-10-27 Thread Philippe Mathieu-Daudé

On 18/10/22 23:01, Bernhard Beschow wrote:

Adds missing functionality to e500plat machine which increases the
chance of given "real" firmware images to access SD cards.

Signed-off-by: Bernhard Beschow 
---
  docs/system/ppc/ppce500.rst | 12 
  hw/ppc/Kconfig  |  1 +
  hw/ppc/e500.c   | 35 ++-
  hw/ppc/e500.h   |  1 +
  hw/ppc/e500plat.c   |  1 +
  5 files changed, 49 insertions(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c

2022-10-27 Thread Alex Williamson
On Thu, 27 Oct 2022 15:40:31 +0800
Cindy Lu  wrote:

> Move the function vfio_get_xlat_addr to softmmu/memory.c, and
> change the name to memory_get_xlat_addr().So we can use this
> function in other devices,such as vDPA device.
> 
> Signed-off-by: Cindy Lu 
> ---
>  hw/vfio/common.c  | 92 ++-
>  include/exec/memory.h |  4 ++
>  softmmu/memory.c  | 84 +++
>  3 files changed, 92 insertions(+), 88 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ace9562a9b..2b5a9f3d8d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -574,92 +574,6 @@ static bool 
> vfio_listener_skipped_section(MemoryRegionSection *section)
> section->offset_within_address_space & (1ULL << 63);
>  }
>  
> -/* Called with rcu_read_lock held.  */
> -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> -   ram_addr_t *ram_addr, bool *read_only)
> -{
> -MemoryRegion *mr;
> -hwaddr xlat;
> -hwaddr len = iotlb->addr_mask + 1;
> -bool writable = iotlb->perm & IOMMU_WO;
> -
> -/*
> - * The IOMMU TLB entry we have just covers translation through
> - * this IOMMU to its immediate target.  We need to translate
> - * it the rest of the way through to memory.
> - */
> -mr = address_space_translate(&address_space_memory,
> - iotlb->translated_addr,
> - &xlat, &len, writable,
> - MEMTXATTRS_UNSPECIFIED);
> -if (!memory_region_is_ram(mr)) {
> -error_report("iommu map to non memory area %"HWADDR_PRIx"",
> - xlat);
> -return false;
> -} else if (memory_region_has_ram_discard_manager(mr)) {
> -RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr);
> -MemoryRegionSection tmp = {
> -.mr = mr,
> -.offset_within_region = xlat,
> -.size = int128_make64(len),
> -};
> -
> -/*
> - * Malicious VMs can map memory into the IOMMU, which is expected
> - * to remain discarded. vfio will pin all pages, populating memory.
> - * Disallow that. vmstate priorities make sure any RamDiscardManager
> - * were already restored before IOMMUs are restored.
> - */
> -if (!ram_discard_manager_is_populated(rdm, &tmp)) {
> -error_report("iommu map to discarded memory (e.g., unplugged via"
> - " virtio-mem): %"HWADDR_PRIx"",
> - iotlb->translated_addr);
> -return false;
> -}
> -
> -/*
> - * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> - * pages will remain pinned inside vfio until unmapped, resulting in 
> a
> - * higher memory consumption than expected. If memory would get
> - * populated again later, there would be an inconsistency between 
> pages
> - * pinned by vfio and pages seen by QEMU. This is the case until
> - * unmapped from the IOMMU (e.g., during device reset).
> - *
> - * With malicious guests, we really only care about pinning more 
> memory
> - * than expected. RLIMIT_MEMLOCK set for the user/process can never 
> be
> - * exceeded and can be used to mitigate this problem.
> - */
> -warn_report_once("Using vfio with vIOMMUs and coordinated discarding 
> of"
> - " RAM (e.g., virtio-mem) works, however, malicious"
> - " guests can trigger pinning of more memory than"
> - " intended via an IOMMU. It's possible to mitigate "
> - " by setting/adjusting RLIMIT_MEMLOCK.");
> -}
> -
> -/*
> - * Translation truncates length to the IOMMU page size,
> - * check that it did not truncate too much.
> - */
> -if (len & iotlb->addr_mask) {
> -error_report("iommu has granularity incompatible with target AS");
> -return false;
> -}
> -
> -if (vaddr) {
> -*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> -}
> -
> -if (ram_addr) {
> -*ram_addr = memory_region_get_ram_addr(mr) + xlat;
> -}
> -
> -if (read_only) {
> -*read_only = !writable || mr->readonly;
> -}
> -
> -return true;
> -}
> -
>  static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>  {
>  VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> IOMMUTLBEntry *iotlb)
>  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
>  bool read_only;
>  
> -if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> +if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only,
> +  &address_space_memory)) {
>

Re: [PATCH v4 5/7] hw/ppc/e500: Implement pflash handling

2022-10-27 Thread Philippe Mathieu-Daudé

On 18/10/22 23:01, Bernhard Beschow wrote:

Allows e500 boards to have their root file system reside on flash using
only builtin devices located in the eLBC memory region.

Note that the flash memory area is only created when a -pflash argument is
given, and that the size is determined by the given file. The idea is to
put users into control.

Signed-off-by: Bernhard Beschow 
---
  docs/system/ppc/ppce500.rst | 16 
  hw/ppc/Kconfig  |  1 +
  hw/ppc/e500.c   | 79 +
  3 files changed, 96 insertions(+)



@@ -1024,6 +1061,48 @@ void ppce500_init(MachineState *machine)
  pmc->platform_bus_base,
  &pms->pbus_dev->mmio);
  
+dinfo = drive_get(IF_PFLASH, 0, 0);

+if (dinfo) {
+BlockBackend *blk = blk_by_legacy_dinfo(dinfo);
+BlockDriverState *bs = blk_bs(blk);
+uint64_t size = bdrv_getlength(bs);
+uint64_t mmio_size = pms->pbus_dev->mmio.size;
+uint32_t sector_len = 64 * KiB;
+
+if (!is_power_of_2(size)) {
+error_report("Size of pflash file must be a power of two.");
+exit(1);
+}
+
+if (size > mmio_size) {
+error_report("Size of pflash file must not be bigger than %" PRIu64
+ " bytes.", mmio_size);
+exit(1);
+}
+
+if (!QEMU_IS_ALIGNED(size, sector_len)) {
+error_report("Size of pflash file must be a multiple of %" PRIu32
+ ".", sector_len);
+exit(1);
+}


Again, this check is unrelated to the board code and belong to the flash 
device (the board has no idea of the underlying flash restrictions).


(see below)


+dev = qdev_new(TYPE_PFLASH_CFI01);
+qdev_prop_set_drive(dev, "drive", blk);
+qdev_prop_set_uint32(dev, "num-blocks", size / sector_len);
+qdev_prop_set_uint64(dev, "sector-length", sector_len);
+qdev_prop_set_uint8(dev, "width", 2);
+qdev_prop_set_bit(dev, "big-endian", true);
+qdev_prop_set_uint16(dev, "id0", 0x89);
+qdev_prop_set_uint16(dev, "id1", 0x18);
+qdev_prop_set_uint16(dev, "id2", 0x);
+qdev_prop_set_uint16(dev, "id3", 0x0);
+qdev_prop_set_string(dev, "name", "e500.flash");
+sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal);


If you want to report the error differently, you can use a local Error*
and display it with error_report_err() before exiting.

Anyhow can be cleaned later, so:
Reviewed-by: Philippe Mathieu-Daudé 


+memory_region_add_subregion(&pms->pbus_dev->mmio, 0,
+
pflash_cfi01_get_memory(PFLASH_CFI01(dev)));
+}





Re: [PULL 15/20] include/hw/core: Create struct CPUJumpCache

2022-10-27 Thread Richard Henderson

On 10/28/22 00:44, Ilya Leoshkevich wrote:

Putting CPUJumpCache inside CPUState made problem go away:

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 18ca701b443..3ea528566c3 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -32,6 +32,7 @@
  #include "qemu/thread.h"
  #include "qemu/plugin.h"
  #include "qom/object.h"
+#include "accel/tcg/tb-jmp-cache.h"
  
  typedef int (*WriteCoreDumpFunction)(const void *buf, size_t size,

   void *opaque);
@@ -366,7 +367,7 @@ struct CPUState {
  CPUArchState *env_ptr;
  IcountDecr *icount_decr_ptr;
  
-CPUJumpCache *tb_jmp_cache;

+CPUJumpCache tb_jmp_cache;


Yes, well.  That structure is quite large (128kB?) and I had been hoping to (1) save that 
extra memory for e.g. KVM and (2) hide the tcg-specific stuff from core.


But clearly something went wrong during some threadedness with your test case.



  void tcg_flush_jmp_cache(CPUState *cpu)
  {
-CPUJumpCache *jc = cpu->tb_jmp_cache;
  
-if (likely(jc)) {

-for (int i = 0; i < TB_JMP_CACHE_SIZE; i++) {
-qatomic_set(&jc->array[i].tb, NULL);
-}
-} else {
-/* This should happen once during realize, and thus never race. */
-jc = g_new0(CPUJumpCache, 1);
-jc = qatomic_xchg(&cpu->tb_jmp_cache, jc);
-assert(jc == NULL);
  }
  }
  
So there must be a race in tcg_flush_jmp_cache() after all?


If there had been a race here, we would abort with the assert.
It must be something else...


r~




Re: [PATCH v2 09/43] hw/usb/hcd-uhci: Introduce TYPE_ defines for device models

2022-10-27 Thread Philippe Mathieu-Daudé

On 22/10/22 17:04, Bernhard Beschow wrote:

Suggested-by: Mark Cave-Ayland 
Signed-off-by: Bernhard Beschow 
---
  hw/i386/pc_piix.c |  3 ++-
  hw/i386/pc_q35.c  | 13 +++--
  hw/isa/piix4.c|  2 +-
  hw/usb/hcd-uhci.c | 16 
  hw/usb/hcd-uhci.h |  9 +
  5 files changed, 27 insertions(+), 16 deletions(-)



diff --git a/hw/usb/hcd-uhci.h b/hw/usb/hcd-uhci.h
index c85ab7868e..22f6e6fcfc 100644
--- a/hw/usb/hcd-uhci.h
+++ b/hw/usb/hcd-uhci.h
@@ -91,4 +91,13 @@ typedef struct UHCIInfo {
  void uhci_data_class_init(ObjectClass *klass, void *data);
  void usb_uhci_common_realize(PCIDevice *dev, Error **errp);
  
+#define TYPE_PIIX3_USB_UHCI "piix3-usb-uhci"

+#define TYPE_PIIX4_USB_UHCI "piix4-usb-uhci"
+#define TYPE_ICH9_USB_UHCI1 "ich9-usb-uhci1"
+#define TYPE_ICH9_USB_UHCI2 "ich9-usb-uhci2"
+#define TYPE_ICH9_USB_UHCI3 "ich9-usb-uhci3"
+#define TYPE_ICH9_USB_UHCI4 "ich9-usb-uhci4"
+#define TYPE_ICH9_USB_UHCI5 "ich9-usb-uhci5"
+#define TYPE_ICH9_USB_UHCI6 "ich9-usb-uhci6"


What about defining once:

  #define TYPE_ICH9_USB_UHCI(fn) "ich9-usb-uhci" #fn

?



Re: [PULL 00/23] 9p queue 2022-10-24

2022-10-27 Thread Warner Losh
On Thu, Oct 27, 2022, 1:45 PM Christian Schoenebeck 
wrote:

> On Thursday, October 27, 2022 7:37:07 PM CEST Stefan Hajnoczi wrote:
> > On Thu, 27 Oct 2022 at 12:38, Christian Schoenebeck
> >  wrote:
> > >
> > > On Thursday, October 27, 2022 5:53:47 PM CEST Thomas Huth wrote:
> > > > On 24/10/2022 12.54, Christian Schoenebeck wrote:
> > > > > The following changes since commit
> 0529245488865038344d64fff7ee05864d3d17f6:
> > > > >
> > > > >Merge tag 'pull-target-arm-20221020' of
> https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-10-20
> 14:36:12 -0400)
> > > > >
> > > > > are available in the Git repository at:
> > > > >
> > > > >https://github.com/cschoenebeck/qemu.git tags/pull-9p-20221024
> > > > >
> > > > > for you to fetch changes up to
> 3ce77865bf813f313cf79c00fd951bfc95a50165:
> > > > >
> > > > >tests/9p: remove unnecessary g_strdup() calls (2022-10-24
> 12:24:32 +0200)
> > > > >
> > > > > 
> > > > > 9pfs: performance, Windows host prep, tests restructure
> > > > >
> > > > > * Highlight of this PR is Linus Heckemann's GHashTable patch which
> > > > >brings massive general performance improvements of 9p server
> > > > >somewhere between factor 6 .. 12.
> > > > >
> > > > > * Bin Meng's g_mkdir patch is a preparatory patch for upcoming
> > > > >Windows host support of 9p server.
> > > > >
> > > > > * The rest of the patches in this PR are 9p test code restructuring
> > > > >and refactoring changes to improve readability and to ease
> > > > >maintenance of 9p test code on the long-term.
> > > >
> > > >   Hi Christian,
> > > >
> > > > I think this PR broke the FreeBSD CI jobs:
> > > >
> > > >   https://gitlab.com/qemu-project/qemu/-/jobs/3219611457#L3116
> > > >
> > > >   https://gitlab.com/qemu-project/qemu/-/jobs/3219611460#L3372
> > > >
> > > > Could you please have a look?
> > > >
> > > >   Thanks!
> > > >Thomas
> > >
> > > I try, but will certainly take some days, especially as I currently
> don't have
> > > a BSD installation at hand to try the changes.
> >
> > QEMU has the automation to run FreeBSD builds locally (in a VM):
> > $ make vm-build-freebsd
> >
> > Not sure if that FreeBSD environment matches the one in Cirrus CI
> > though. If they are different then maybe it won't reproduce locally.
>
> Something must be different, because
> e750a7ace492f0b450653d4ad368a77d6f660fb8
> compiles fine locally with `make vm-build-freebsd` and all tests pass, too.
>

I have a pull request I'm sending tomorrow that fixes a build issue on
FreeBSD for about 6 or so months on FreeBSD current. If it's hitting
that... but I didn't think our normal CI hit that... I'd submit it today,
but I'm traveling and there is an issue getting to my machines at home, but
I'll be home tomorrow.

Warner


Ideas?
>
> Best regards,
> Christian Schoenebeck
>
>
>
>


[PATCH v2 1/3] hw/mips/malta: Introduce PIIX4_PCI_DEVFN definition

2022-10-27 Thread Philippe Mathieu-Daudé
The PIIX4 PCI-ISA bridge function is always located at 10:0.
Since we want to re-use its address, add the PIIX4_PCI_DEVFN
definition.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/mips/malta.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index 272d93eea7..df0f448b67 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -72,6 +72,8 @@
 
 #define MAX_IDE_BUS 2
 
+#define PIIX4_PCI_DEVFN PCI_DEVFN(10, 0)
+
 typedef struct {
 MemoryRegion iomem;
 MemoryRegion iomem_lo; /* 0 - 0x900 */
@@ -1377,7 +1379,7 @@ void mips_malta_init(MachineState *machine)
 empty_slot_init("GT64120", 0, 0x2000);
 
 /* Southbridge */
-piix4 = pci_create_simple_multifunction(pci_bus, PCI_DEVFN(10, 0), true,
+piix4 = pci_create_simple_multifunction(pci_bus, PIIX4_PCI_DEVFN, true,
 TYPE_PIIX4_PCI_DEVICE);
 dev = DEVICE(piix4);
 isa_bus = ISA_BUS(qdev_get_child_bus(dev, "isa.0"));
-- 
2.37.3




[PATCH v2 2/3] hw/mips/malta: Set PIIX4 IRQ routes in embedded bootloader

2022-10-27 Thread Philippe Mathieu-Daudé
Linux kernel expects the northbridge & southbridge chipsets
configured by the BIOS firmware. We emulate that by writing
a tiny bootloader code in write_bootloader().

Upon introduction in commit 5c2b87e34d ("PIIX4 support"),
the PIIX4 configuration space included values specific to
the Malta board.

Set the Malta-specific IRQ routing values in the embedded
bootloader, so the next commit can remove the Malta specific
bits from the PIIX4 PCI-ISA bridge and make it generic
(matching the real hardware).

Signed-off-by: Philippe Mathieu-Daudé 
---
FIXME: Missing the nanoMIPS counter-part!
---
 hw/mips/malta.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/hw/mips/malta.c b/hw/mips/malta.c
index df0f448b67..4403028778 100644
--- a/hw/mips/malta.c
+++ b/hw/mips/malta.c
@@ -804,6 +804,8 @@ static void write_bootloader_nanomips(uint8_t *base, 
uint64_t run_addr,
 stw_p(p++, 0x8422); stw_p(p++, 0x9088);
 /* sw t0, 0x88(t1)  */
 
+/* TODO set PIIX IRQC[A:D] routing values! */
+
 stw_p(p++, 0xe320 | NM_HI1(kernel_entry));
 
 stw_p(p++, NM_HI2(kernel_entry));
@@ -841,6 +843,9 @@ static void write_bootloader_nanomips(uint8_t *base, 
uint64_t run_addr,
 static void write_bootloader(uint8_t *base, uint64_t run_addr,
  uint64_t kernel_entry)
 {
+const char pci_pins_cfg[PCI_NUM_PINS] = {
+10, 10, 11, 11 /* PIIX IRQRC[A:D] */
+};
 uint32_t *p;
 
 /* Small bootloader */
@@ -915,6 +920,20 @@ static void write_bootloader(uint8_t *base, uint64_t 
run_addr,
 
 #undef cpu_to_gt32
 
+/*
+ * The PIIX ISA bridge is on PCI bus 0 dev 10 func 0.
+ * Load the PIIX IRQC[A:D] routing config address, then
+ * write routing configuration to the config data register.
+ */
+bl_gen_write_u32(&p, /* GT_PCI0_CFGADDR */
+ cpu_mips_phys_to_kseg1(NULL, 0x1be0 + 0xcf8),
+ tswap32((1 << 31) /* ConfigEn */
+ | PCI_BUILD_BDF(0, PIIX4_PCI_DEVFN) << 8
+ | PIIX_PIRQCA));
+bl_gen_write_u32(&p, /* GT_PCI0_CFGDATA */
+ cpu_mips_phys_to_kseg1(NULL, 0x1be0 + 0xcfc),
+ tswap32(ldl_be_p(pci_pins_cfg)));
+
 bl_gen_jump_kernel(&p,
true, ENVP_VADDR - 64,
/*
-- 
2.37.3




[PATCH v2 3/3] hw/isa/piix4: Correct IRQRC[A:D] reset values

2022-10-27 Thread Philippe Mathieu-Daudé
IRQRC[A:D] registers reset value is 0x80. We were forcing
the MIPS Malta machine routing to be able to boot a Linux
kernel without any bootloader.
We now have these registers initialized in the Malta machine
write_bootloader(), so we can use the correct reset values.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/isa/piix4.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/isa/piix4.c b/hw/isa/piix4.c
index 15f344dbb7..a2165c6a49 100644
--- a/hw/isa/piix4.c
+++ b/hw/isa/piix4.c
@@ -115,10 +115,10 @@ static void piix4_isa_reset(DeviceState *dev)
 pci_conf[0x4c] = 0x4d;
 pci_conf[0x4e] = 0x03;
 pci_conf[0x4f] = 0x00;
-pci_conf[0x60] = 0x0a; // PCI A -> IRQ 10
-pci_conf[0x61] = 0x0a; // PCI B -> IRQ 10
-pci_conf[0x62] = 0x0b; // PCI C -> IRQ 11
-pci_conf[0x63] = 0x0b; // PCI D -> IRQ 11
+pci_conf[0x60] = 0x80;
+pci_conf[0x61] = 0x80;
+pci_conf[0x62] = 0x80;
+pci_conf[0x63] = 0x80;
 pci_conf[0x69] = 0x02;
 pci_conf[0x70] = 0x80;
 pci_conf[0x76] = 0x0c;
-- 
2.37.3




[PATCH v2 0/3] hw/isa/piix4: Remove MIPS Malta specific bits

2022-10-27 Thread Philippe Mathieu-Daudé
Since v1:
- bswap -> tswap (Bernhard)

Bernhard posted a series merging both PIIX3/PIIX4 models
in one [1]. Due to Malta-specific board code forced into
the PIIX4 reset values, Bernhard had to include an array
of "register values at reset" as a class property. This
is not wrong, but to model properly the model, we should
simply use the hardware real reset values, not try to
bend the model to please the Malta board.

This series fix this issue by having the Malta bootloader
code setting the board-specific PIIX4 IRQ routing values.

Note patch 2 still misses an equivalent nanoMIPS code.

Regards,

Phil.

[1] https://lore.kernel.org/qemu-devel/20221022150508.26830-1-shen...@gmail.com/

Based-on: <20221026191821.28167-1-phi...@linaro.org>
https://lore.kernel.org/qemu-devel/20221026191821.28167-1-phi...@linaro.org/

Philippe Mathieu-Daudé (3):
  hw/mips/malta: Introduce PIIX4_PCI_DEVFN definition
  hw/mips/malta: Set PIIX4 IRQ routes in embedded bootloader
  hw/isa/piix4: Correct IRQRC[A:D] reset values

 hw/isa/piix4.c  |  8 
 hw/mips/malta.c | 23 ++-
 2 files changed, 26 insertions(+), 5 deletions(-)

-- 
2.37.3




Re: [PULL 15/20] include/hw/core: Create struct CPUJumpCache

2022-10-27 Thread Richard Henderson

On 10/28/22 00:18, Ilya Leoshkevich wrote:

in one of the wasmtime tests (host=x86_64, guest=s390x).
GDB shows that the root cause is actually this:

 Thread 181 "wasi_tokio::pat" received signal SIGSEGV, Segmentation fault.
 [Switching to Thread 0x76c54640 (LWP 168352)]
 0x55626736 in do_tb_phys_invalidate (tb=tb@entry=0x7fffea4b8500 
, rm_from_page_list=rm_from_page_list@entry=true) at 
qemu/accel/tcg/translate-all.c:1192
 1192   if (qatomic_read(&jc->array[h].tb) == tb) {
 (gdb) bt
 #0  0x55626736 in do_tb_phys_invalidate (tb=tb@entry=0x7fffea4b8500 
, rm_from_page_list=rm_from_page_list@entry=true) at 
qemu/accel/tcg/translate-all.c:1192
 #1  0x55626b98 in tb_phys_invalidate__locked (tb=0x7fffea4b8500 
) at qemu/accel/tcg/translate-all.c:1211
 #2  tb_invalidate_phys_page_range__locked (p=, 
start=start@entry=836716683264, end=end@entry=836716687360, retaddr=0, pages=0x0) at 
qemu/accel/tcg/translate-all.c:1678
 #3  0x55626dfb in tb_invalidate_phys_range (start=836716683264, 
start@entry=836716584960, end=end@entry=836716982272) at 
qemu/accel/tcg/translate-all.c:1753
 #4  0x55639e43 in target_munmap (start=start@entry=836716584960, 
len=len@entry=397312) at qemu/linux-user/mmap.c:769

Let me know if you need more information, I can try to extract a
minimal reproducer.


A reproducer would be helpful.


r~




  1   2   3   4   5   >