Re: [PATCH v6 7/7] hw/arm/virt: Add properties to disable high memory regions
Hi Connie, On 10/26/22 7:10 PM, Cornelia Huck wrote: On Wed, Oct 26 2022, Gavin Shan wrote: On 10/25/22 6:54 PM, Cornelia Huck wrote: On Mon, Oct 24 2022, Gavin Shan wrote: These 3 high memory regions are usually enabled by default, but s/These 3/The/ ? Ok. they may be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't needed by GICv2. This leads to waste in the PA space. When building the command line, do we have enough information on when the regions provide something useful, and when they just waste space? I think the help messages are already indicative. For example, the help messages for 'highmem-redist2' indicate the region is only needed by GICv3 or GICv4. 'highmem-ecam' and 'highmem-mmio' are needed by PCI ECAM and MMIO and the key words 'high' indicates they're the corresponding second window. #./qemu-system-aarch64 -M virt,? highmem-ecam=- Set on/off to enable/disable high memory region for PCI ECAM highmem-mmio=- Set on/off to enable/disable high memory region for PCI MMIO highmem-redists= - Set on/off to enable/disable high memory region for GICv3 or GICv4 redistributor OK, hopefully this is enough for anyone building a command line directly. (Do we want to encourage management software like libvirt to switch off regions that are not needed?) Yeah, to disable those regions aren't needed will make libvirt smart. I think it's worthy for libvirt to do it. I'm not sure if arbitrary machine properties have been supported by libvirt or not. For example, 'highmem-redists' can still be turned off from 'vm1.xml' even though it's not explicitly supported in libvirt's code. Add properties to allow users selectively disable them if needed: "highmem-redists", "highmem-ecam", "highmem-mmio". Suggested-by: Marc Zyngier Signed-off-by: Gavin Shan --- docs/system/arm/virt.rst | 12 hw/arm/virt.c| 64 2 files changed, 76 insertions(+) diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index 4454706392..a1668a969d 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -98,6 +98,18 @@ compact-highmem Set ``on``/``off`` to enable/disable the compact layout for high memory regions. The default is ``on`` for machine types later than ``virt-7.2``. +highmem-redists + Set ``on``/``off`` to enable/disable the high memry region for GICv3/4 s/memry/memory/ Ok, copy-and-paste error. Will be fixed. + redistributor. The default is ``on``. Do we need to add a note about what effects setting this to "off" may have, e.g. "Setting this to ``off`` may limit the maximum number of cpus." or so? And/or "Setting this to ``off`` when using GICv2 will save some space."? We may not mention GICv2 since GICv3/v4 are already mentioned. It's a good idea to mention that the maximum number of CPUs is reduced when it's turned off. I will have something like below in next respin if you agree. highmem-redists Set ``on``/``off`` to enable/disable the high memroy region for GICv3 or GICv4 redistributor. The default is ``on``. Setting this to ``off`` will limit the maximum number of CPUs when GICv3 or GICv4 is used. OK, sounds reasonable to me. Ok, Thanks for your confirm. Since 'vms->highmem_redists' is changeable, the 'virt_max_cpus' in machvirt_init() needs to be recalculated based on that. The code change will be included into next respin. Besides, the follow-up error message will be improved to something like below. error_report("Number of SMP CPUs requested (%d) exceeds max CPUs " "supported by machine 'mach-virt' (%d). The high memory " "region for GICv3 or GICv4 redistributor has been %s", max_cpus, virt_max_cpus, vms->highmem_redists ? "enabled" : "disabled"); Hm, the doc for error_report() states that "The resulting message should be a single phrase, with no newline or trailing punctuation." Maybe if (max_cpus > virt_max_cpus) { error_report("Number of SMP CPUs requested (%d) exceeds max CPUs " "supported by machine 'mach-virt' (%d)", max_cpus, virt_max_cpus); if (vms->gic_version != VIRT_GIC_VERSION_2 && !vms->higmem_redists) { error_printf("Try 'highmem-redists=on' for more CPUs\n"); } exit(1); } Ok, I didn't notice the message raised by error_report() has this sort of requirements. Your suggested snippet looks good to me and it will be included in next respin. However, I would hold next respin (v7) for a while to see if I can receive comments from Peter/Marc on v6 :) Thanks, Gavin
Re: [PATCH 0/4] Shadow VirtQueue event index support
On Fri, Oct 28, 2022 at 4:44 AM Jason Wang wrote: > > > 在 2022/10/27 04:58, Michael S. Tsirkin 写道: > > On Thu, Oct 20, 2022 at 05:52:47PM +0200, Eugenio Pérez wrote: > >> Event idx helps to reduce the number of notifications between the device > >> and the driver. It allows them to specify an index on the circular > >> descriptors rings where to issue the notification, instead of a single > >> binary indicator. > >> > >> Adding support for SVQ. > > > > Jason seems to be taking this through net > > > > Reviewed-by: Michael S. Tsirkin > > > Ok, I've queued this. > > Eugenio, please post the fix for endian on top. > I've got a v2 ready to send, would it be possible to send it right now and send a v2 pull? That would save a few commits where the vdpa devices do not work with big endian architectures. Thanks!
Re: [PATCH v6 5/7] hw/arm/virt: Improve high memory region address assignment
Hi Connie, On 10/26/22 6:43 PM, Cornelia Huck wrote: On Wed, Oct 26 2022, Gavin Shan wrote: On 10/26/22 12:29 AM, Eric Auger wrote: On 10/24/22 05:54, Gavin Shan wrote: There are three high memory regions, which are VIRT_HIGH_REDIST2, VIRT_HIGH_PCIE_ECAM and VIRT_HIGH_PCIE_MMIO. Their base addresses are floating on highest RAM address. However, they can be disabled in several cases. (1) One specific high memory region is likely to be disabled by code by toggling vms->highmem_{redists, ecam, mmio}. (2) VIRT_HIGH_PCIE_ECAM region is disabled on machine, which is 'virt-2.12' or ealier than it. (3) VIRT_HIGH_PCIE_ECAM region is disabled when firmware is loaded on 32-bits system. (4) One specific high memory region is disabled when it breaks the PA space limit. The current implementation of virt_set_{memmap, high_memmap}() isn't optimized because the high memory region's PA space is always reserved, regardless of whatever the actual state in the corresponding vms->highmem_{redists, ecam, mmio} flag. In the code, 'base' and 'vms->highest_gpa' are always increased for case (1), (2) and (3). It's unnecessary since the assigned PA space for the disabled high memory region won't be used afterwards. Improve the address assignment for those three high memory region by skipping the address assignment for one specific high memory region if it has been disabled in case (1), (2) and (3). The memory layout may be changed after the improvement is applied, which leads to potential migration breakage. So 'vms->highmem_compact' is added to control if the improvement should be applied. For now, 'vms->highmem_compact' is set to false, meaning that we don't have memory layout change until it becomes configurable through property 'compact-highmem' in next patch. Signed-off-by: Gavin Shan Reviewed-by: Cornelia Huck the code has quite changed since Connie's R-b Right. Connie, could you please check if the changes make sense to you and I can regain your R-B? :) My R-b still holds. Thanks. Tested-by: Zhenyu Zhang --- hw/arm/virt.c | 15 ++- include/hw/arm/virt.h | 1 + 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index ee98a8a3b6..4896f600b4 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1721,18 +1721,23 @@ static void virt_set_high_memmap(VirtMachineState *vms, vms->memmap[i].size = region_size; /* - * Check each device to see if they fit in the PA space, - * moving highest_gpa as we go. + * Check each device to see if it fits in the PA space, + * moving highest_gpa as we go. For compatibility, move + * highest_gpa for disabled fitting devices as well, if + * the compact layout has been disabled. * * For each device that doesn't fit, disable it. */ fits = (region_base + region_size) <= BIT_ULL(pa_bits); -if (fits) { -vms->highest_gpa = region_base + region_size - 1; +*region_enabled &= fits; +if (vms->highmem_compact && !*region_enabled) { +continue; } -*region_enabled &= fits; base = region_base + region_size; +if (fits) { +vms->highest_gpa = region_base + region_size - 1; vms->highest_gpa = base - 1; It's personal taste actually. I was thinking of using 'base - 1', but 'region_base + region_size - 1' looks more like a direct way. I don't have strong sense though and lets use 'base - 1' in next respin. I don't really have a preference for one or the other. Ok. Lets use 'base - 1' in next respin. Thanks, Gavin
Re: [PATCH v9 6/8] KVM: Update lpage info when private/shared memory are mixed
On Wed, Oct 26, 2022 at 01:46:20PM -0700, Isaku Yamahata wrote: > On Tue, Oct 25, 2022 at 11:13:42PM +0800, > Chao Peng wrote: > > > When private/shared memory are mixed in a large page, the lpage_info may > > not be accurate and should be updated with this mixed info. A large page > > has mixed pages can't be really mapped as large page since its > > private/shared pages are from different physical memory. > > > > Update lpage_info when private/shared memory attribute is changed. If > > both private and shared pages are within a large page region, it can't > > be mapped as large page. It's a bit challenge to track the mixed > > info in a 'count' like variable, this patch instead reserves a bit in > > 'disallow_lpage' to indicate a large page has mixed private/share pages. > > > > Signed-off-by: Chao Peng > > --- > > arch/x86/include/asm/kvm_host.h | 8 +++ > > arch/x86/kvm/mmu/mmu.c | 112 +++- > > arch/x86/kvm/x86.c | 2 + > > include/linux/kvm_host.h| 19 ++ > > virt/kvm/kvm_main.c | 16 +++-- > > 5 files changed, 152 insertions(+), 5 deletions(-) > > > ... > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c > > index 33b1aec44fb8..67a9823a8c35 100644 > > --- a/arch/x86/kvm/mmu/mmu.c > > +++ b/arch/x86/kvm/mmu/mmu.c > ... > > @@ -6910,3 +6915,108 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm) > > if (kvm->arch.nx_lpage_recovery_thread) > > kthread_stop(kvm->arch.nx_lpage_recovery_thread); > > } > > + > > +static inline bool linfo_is_mixed(struct kvm_lpage_info *linfo) > > +{ > > + return linfo->disallow_lpage & KVM_LPAGE_PRIVATE_SHARED_MIXED; > > +} > > + > > +static inline void linfo_update_mixed(struct kvm_lpage_info *linfo, bool > > mixed) > > +{ > > + if (mixed) > > + linfo->disallow_lpage |= KVM_LPAGE_PRIVATE_SHARED_MIXED; > > + else > > + linfo->disallow_lpage &= ~KVM_LPAGE_PRIVATE_SHARED_MIXED; > > +} > > + > > +static bool mem_attr_is_mixed_2m(struct kvm *kvm, unsigned int attr, > > +gfn_t start, gfn_t end) > > +{ > > + XA_STATE(xas, &kvm->mem_attr_array, start); > > + gfn_t gfn = start; > > + void *entry; > > + bool shared = attr == KVM_MEM_ATTR_SHARED; > > + bool mixed = false; > > + > > + rcu_read_lock(); > > + entry = xas_load(&xas); > > + while (gfn < end) { > > + if (xas_retry(&xas, entry)) > > + continue; > > + > > + KVM_BUG_ON(gfn != xas.xa_index, kvm); > > + > > + if ((entry && !shared) || (!entry && shared)) { > > + mixed = true; > > + goto out; > > nitpick: goto isn't needed. break should work. Thanks. > > > + } > > + > > + entry = xas_next(&xas); > > + gfn++; > > + } > > +out: > > + rcu_read_unlock(); > > + return mixed; > > +} > > + > > +static bool mem_attr_is_mixed(struct kvm *kvm, struct kvm_memory_slot > > *slot, > > + int level, unsigned int attr, > > + gfn_t start, gfn_t end) > > +{ > > + unsigned long gfn; > > + void *entry; > > + > > + if (level == PG_LEVEL_2M) > > + return mem_attr_is_mixed_2m(kvm, attr, start, end); > > + > > + entry = xa_load(&kvm->mem_attr_array, start); > > + for (gfn = start; gfn < end; gfn += KVM_PAGES_PER_HPAGE(level - 1)) { > > + if (linfo_is_mixed(lpage_info_slot(gfn, slot, level - 1))) > > + return true; > > + if (xa_load(&kvm->mem_attr_array, gfn) != entry) > > + return true; > > + } > > + return false; > > +} > > + > > +void kvm_arch_update_mem_attr(struct kvm *kvm, struct kvm_memory_slot > > *slot, > > + unsigned int attr, gfn_t start, gfn_t end) > > +{ > > + > > + unsigned long lpage_start, lpage_end; > > + unsigned long gfn, pages, mask; > > + int level; > > + > > + WARN_ONCE(!(attr & (KVM_MEM_ATTR_PRIVATE | KVM_MEM_ATTR_SHARED)), > > + "Unsupported mem attribute.\n"); > > + > > + /* > > +* The sequence matters here: we update the higher level basing on the > > +* lower level's scanning result. > > +*/ > > + for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) { > > + pages = KVM_PAGES_PER_HPAGE(level); > > + mask = ~(pages - 1); > > nitpick: KVM_HPAGE_MASK(level). Maybe matter of preference. Yes, haven't noticed there is a KVM_HPAGE_MASK defined. Have no strong preference here, since I already have KVM_PAGES_PER_HPAGE(level), getting mask is straightforward. A single KVM_HPAGE_MASK(level) will not give me what I need since here is gfn, KVM_HPAGE_MASK(level)>> PAGE_SHIFT should be the right equivalent. Chao > > > > + lpage_start = max(start & mask, slot->base_gfn); > > + lpage_end = (end - 1) & mask; > > + > > + /* > > +* We only need to scan the head and tail
Re: [PATCH v9 3/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit
On Thu, Oct 27, 2022 at 11:27:05AM +0100, Fuad Tabba wrote: > Hi, > > On Tue, Oct 25, 2022 at 4:19 PM Chao Peng wrote: > > > > This new KVM exit allows userspace to handle memory-related errors. It > > indicates an error happens in KVM at guest memory range [gpa, gpa+size). > > The flags includes additional information for userspace to handle the > > error. Currently bit 0 is defined as 'private memory' where '1' > > indicates error happens due to private memory access and '0' indicates > > error happens due to shared memory access. > > > > When private memory is enabled, this new exit will be used for KVM to > > exit to userspace for shared <-> private memory conversion in memory > > encryption usage. In such usage, typically there are two kind of memory > > conversions: > > - explicit conversion: happens when guest explicitly calls into KVM > > to map a range (as private or shared), KVM then exits to userspace > > to perform the map/unmap operations. > > - implicit conversion: happens in KVM page fault handler where KVM > > exits to userspace for an implicit conversion when the page is in a > > different state than requested (private or shared). > > > > Suggested-by: Sean Christopherson > > Co-developed-by: Yu Zhang > > Signed-off-by: Yu Zhang > > Signed-off-by: Chao Peng > > --- > > Reviewed-by: Fuad Tabba > > I have tested the V8 version of this patch on arm64/qemu, and > considering this hasn't changed: > Tested-by: Fuad Tabba Appreciate your review and testing! Chao > > Cheers, > /fuad > > > > > Documentation/virt/kvm/api.rst | 23 +++ > > include/uapi/linux/kvm.h | 9 + > > 2 files changed, 32 insertions(+) > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > > index f3fa75649a78..975688912b8c 100644 > > --- a/Documentation/virt/kvm/api.rst > > +++ b/Documentation/virt/kvm/api.rst > > @@ -6537,6 +6537,29 @@ array field represents return values. The userspace > > should update the return > > values of SBI call before resuming the VCPU. For more details on RISC-V SBI > > spec refer, https://github.com/riscv/riscv-sbi-doc. > > > > +:: > > + > > + /* KVM_EXIT_MEMORY_FAULT */ > > + struct { > > + #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > + __u32 flags; > > + __u32 padding; > > + __u64 gpa; > > + __u64 size; > > + } memory; > > + > > +If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has > > +encountered a memory error which is not handled by KVM kernel module and > > +userspace may choose to handle it. The 'flags' field indicates the memory > > +properties of the exit. > > + > > + - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by > > + private memory access when the bit is set. Otherwise the memory error is > > + caused by shared memory access when the bit is clear. > > + > > +'gpa' and 'size' indicate the memory range the error occurs at. The > > userspace > > +may handle the error and return to KVM to retry the previous memory access. > > + > > :: > > > > /* KVM_EXIT_NOTIFY */ > > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h > > index f1ae45c10c94..fa60b032a405 100644 > > --- a/include/uapi/linux/kvm.h > > +++ b/include/uapi/linux/kvm.h > > @@ -300,6 +300,7 @@ struct kvm_xen_exit { > > #define KVM_EXIT_RISCV_SBI35 > > #define KVM_EXIT_RISCV_CSR36 > > #define KVM_EXIT_NOTIFY 37 > > +#define KVM_EXIT_MEMORY_FAULT 38 > > > > /* For KVM_EXIT_INTERNAL_ERROR */ > > /* Emulate instruction failed. */ > > @@ -538,6 +539,14 @@ struct kvm_run { > > #define KVM_NOTIFY_CONTEXT_INVALID (1 << 0) > > __u32 flags; > > } notify; > > + /* KVM_EXIT_MEMORY_FAULT */ > > + struct { > > +#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0) > > + __u32 flags; > > + __u32 padding; > > + __u64 gpa; > > + __u64 size; > > + } memory; > > /* Fix the size of the union. */ > > char padding[256]; > > }; > > -- > > 2.25.1 > >
[PATCH V5 4/4] intel-iommu: PASID support
This patch introduce ECAP_PASID via "x-pasid-mode". Based on the existing support for scalable mode, we need to implement the following missing parts: 1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation with PASID 2) tag IOTLB with PASID 3) PASID cache and its flush 4) PASID based IOTLB invalidation For simplicity PASID cache is not implemented so we can simply implement the PASID cache flush as a no and leave it to be implemented in the future. For PASID based IOTLB invalidation, since we haven't had L1 stage support, the PASID based IOTLB invalidation is not implemented yet. For PASID based device IOTLB invalidation, it requires the support for vhost so we forbid enabling device IOTLB when PASID is enabled now. Those work could be done in the future. Note that though PASID based IOMMU translation is ready but no device can issue PASID DMA right now. In this case, PCI_NO_PASID is used as PASID to identify the address without PASID. vtd_find_add_as() has been extended to provision address space with PASID which could be utilized by the future extension of PCI core to allow device model to use PASID based DMA translation. This feature would be useful for: 1) prototyping PASID support for devices like virtio 2) future vPASID work 3) future PRS and vSVA work Reviewed-by: Peter Xu Signed-off-by: Jason Wang --- Changes since V3: - rearrange the member for vtd_iotlb_key structure - reorder the pasid parameter ahead of addr for vtd_lookup_iotlb() - allow access size from 1 to 8 for vtd_mem_ir_fault_ops Changes since V2: - forbid device-iotlb with PASID - report PASID based qualified fault - log PASID during errors --- hw/i386/intel_iommu.c | 416 + hw/i386/intel_iommu_internal.h | 16 +- hw/i386/trace-events | 2 + include/hw/i386/intel_iommu.h | 7 +- include/hw/pci/pci_bus.h | 2 + 5 files changed, 339 insertions(+), 104 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 9029ee98f4..7ca077b824 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -58,6 +58,14 @@ struct vtd_as_key { PCIBus *bus; uint8_t devfn; +uint32_t pasid; +}; + +struct vtd_iotlb_key { +uint64_t gfn; +uint32_t pasid; +uint32_t level; +uint16_t sid; }; static void vtd_address_space_refresh_all(IntelIOMMUState *s); @@ -199,14 +207,24 @@ static inline gboolean vtd_as_has_map_notifier(VTDAddressSpace *as) } /* GHashTable functions */ -static gboolean vtd_uint64_equal(gconstpointer v1, gconstpointer v2) +static gboolean vtd_iotlb_equal(gconstpointer v1, gconstpointer v2) { -return *((const uint64_t *)v1) == *((const uint64_t *)v2); +const struct vtd_iotlb_key *key1 = v1; +const struct vtd_iotlb_key *key2 = v2; + +return key1->sid == key2->sid && + key1->pasid == key2->pasid && + key1->level == key2->level && + key1->gfn == key2->gfn; } -static guint vtd_uint64_hash(gconstpointer v) +static guint vtd_iotlb_hash(gconstpointer v) { -return (guint)*(const uint64_t *)v; +const struct vtd_iotlb_key *key = v; + +return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) | + (key->level) << VTD_IOTLB_LVL_SHIFT | + (key->pasid) << VTD_IOTLB_PASID_SHIFT; } static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2) @@ -214,7 +232,8 @@ static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2) const struct vtd_as_key *key1 = v1; const struct vtd_as_key *key2 = v2; -return (key1->bus == key2->bus) && (key1->devfn == key2->devfn); +return (key1->bus == key2->bus) && (key1->devfn == key2->devfn) && + (key1->pasid == key2->pasid); } /* @@ -302,13 +321,6 @@ static void vtd_reset_caches(IntelIOMMUState *s) vtd_iommu_unlock(s); } -static uint64_t vtd_get_iotlb_key(uint64_t gfn, uint16_t source_id, - uint32_t level) -{ -return gfn | ((uint64_t)(source_id) << VTD_IOTLB_SID_SHIFT) | - ((uint64_t)(level) << VTD_IOTLB_LVL_SHIFT); -} - static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level) { return (addr & vtd_slpt_level_page_mask(level)) >> VTD_PAGE_SHIFT_4K; @@ -316,15 +328,17 @@ static uint64_t vtd_get_iotlb_gfn(hwaddr addr, uint32_t level) /* Must be called with IOMMU lock held */ static VTDIOTLBEntry *vtd_lookup_iotlb(IntelIOMMUState *s, uint16_t source_id, - hwaddr addr) + uint32_t pasid, hwaddr addr) { +struct vtd_iotlb_key key; VTDIOTLBEntry *entry; -uint64_t key; int level; for (level = VTD_SL_PT_LEVEL; level < VTD_SL_PML4_LEVEL; level++) { -key = vtd_get_iotlb_key(vtd_get_iotlb_gfn(addr, level), -source_id, level); +key.gfn = vtd_get_iotlb_gfn(addr, level); +key.level = level; +key.sid = source_id; +key.pasid = pasid;
[PATCH V5 0/4] PASID support for Intel IOMMU
Hi All: This series tries to introduce PASID support for Intel IOMMU. The work is based on the previous scalabe mode support by implement the ECAP_PASID. A new "x-pasid-mode" is introduced to enable this mode. All internal vIOMMU codes were extended to support PASID instead of the current RID2PASID method. The code is also capable of provisiong address space with PASID. Note that no devices can issue PASID DMA right now, this needs future work. This will be used for prototying PASID based device like virtio or future vPASID support for Intel IOMMU. Test has been done with the Linux guest with scalalbe mode enabled and disabled. A virtio prototype[1][2] that can issue PAISD based DMA request were also tested, different PASID were used in TX and RX in those testing drivers. Changes since V4: - rename vtd_report_qualify_fault() to vtd_report_fault() - Tweak the code to avoid using ret variable when getting rid2pasid Changes since V3: - rearrange the member for vtd_iotlb_key structure - reorder the pasid parameter ahead of addr for vtd_lookup_iotlb() - allow access size from 1 to 8 for vtd_mem_ir_fault_ops Changes since V2: - use PCI_BUILD_BDF() instead of vtd_make_source_id() - Tweak the comments above vtd_as_hash() - use PCI_BUS_NUM() instead of open coding - rename vtd_as to vtd_address_spaces - rename vtd_qualify_report_fault() to vtd_report_qualify_fault() - forbid device-iotlb with PASID - report PASID based qualified fault - log PASID during errors Changes since V1: - speed up IOMMU translation when RID2PASID is not used - remove the unnecessary L1 PASID invalidation descriptor support - adding support for catching the translation to interrupt range when in the case of PT and scalable mode - refine the comments to explain the hash algorithm used in IOTLB lookups Please review. [1] https://github.com/jasowang/qemu.git virtio-pasid [2] https://github.com/jasowang/linux.git virtio-pasid Jason Wang (4): intel-iommu: don't warn guest errors when getting rid2pasid entry intel-iommu: drop VTDBus intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function intel-iommu: PASID support hw/i386/intel_iommu.c | 692 ++--- hw/i386/intel_iommu_internal.h | 16 +- hw/i386/trace-events | 2 + include/hw/i386/intel_iommu.h | 18 +- include/hw/pci/pci_bus.h | 2 + 5 files changed, 485 insertions(+), 245 deletions(-) -- 2.25.1
[PATCH V5 3/4] intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
We used to have a macro for VTD_PE_GET_FPD_ERR() but it has an internal goto which prevents it from being reused. This patch convert that macro to a dedicated function and let the caller to decide what to do (e.g using goto or not). This makes sure it can be re-used for other function that requires fault reporting. Reviewed-by: Peter Xu Signed-off-by: Jason Wang --- Changes since V4: - rename vtd_report_qualify_fault() to vtd_report_fault() Changes since V2: - rename vtd_qualify_report_fault() to vtd_report_qualify_fault() --- hw/i386/intel_iommu.c | 42 -- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 9fe5a222eb..9029ee98f4 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -49,17 +49,6 @@ /* pe operations */ #define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT) #define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & VTD_SM_PASID_ENTRY_AW)) -#define VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write) {\ -if (ret_fr) { \ -ret_fr = -ret_fr; \ -if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) { \ -trace_vtd_fault_disabled(); \ -} else { \ -vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write); \ -} \ -goto error; \ -} \ -} /* * PCI bus number (or SID) is not reliable since the device is usaully @@ -1716,6 +1705,19 @@ out: trace_vtd_pt_enable_fast_path(source_id, success); } +static void vtd_report_fault(IntelIOMMUState *s, + int err, bool is_fpd_set, + uint16_t source_id, + hwaddr addr, + bool is_write) +{ +if (is_fpd_set && vtd_is_qualified_fault(err)) { +trace_vtd_fault_disabled(); +} else { +vtd_report_dmar_fault(s, source_id, addr, err, is_write); +} +} + /* Map dev to context-entry then do a paging-structures walk to do a iommu * translation. * @@ -1776,7 +1778,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus, is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD; if (!is_fpd_set && s->root_scalable) { ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set); -VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write); +if (ret_fr) { +vtd_report_fault(s, -ret_fr, is_fpd_set, + source_id, addr, is_write); +goto error; +} } } else { ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce); @@ -1784,7 +1790,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus, if (!ret_fr && !is_fpd_set && s->root_scalable) { ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set); } -VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write); +if (ret_fr) { +vtd_report_fault(s, -ret_fr, is_fpd_set, + source_id, addr, is_write); +goto error; +} /* Update context-cache */ trace_vtd_iotlb_cc_update(bus_num, devfn, ce.hi, ce.lo, cc_entry->context_cache_gen, @@ -1820,7 +1830,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus, ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level, &reads, &writes, s->aw_bits); -VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write); +if (ret_fr) { +vtd_report_fault(s, -ret_fr, is_fpd_set, source_id, + addr, is_write); +goto error; +} page_mask = vtd_slpt_level_page_mask(level); access_flags = IOMMU_ACCESS_FLAG(reads, writes); -- 2.25.1
[PATCH V5 2/4] intel-iommu: drop VTDBus
We introduce VTDBus structure as an intermediate step for searching the address space. This works well with SID based matching/lookup. But when we want to support SID plus PASID based address space lookup, this intermediate steps turns out to be a burden. So the patch simply drops the VTDBus structure and use the PCIBus and devfn as the key for the g_hash_table(). This simplifies the codes and the future PASID extension. To prevent being slower for past vtd_find_as_from_bus_num() callers, a vtd_as cache indexed by the bus number is introduced to store the last recent search result of a vtd_as belongs to a specific bus. Reviewed-by: Peter Xu Signed-off-by: Jason Wang --- Changes since V2: - use PCI_BUILD_BDF() instead of vtd_make_source_id() - Tweak the comments above vtd_as_hash() - use PCI_BUS_NUM() instead of open coding - rename vtd_as to vtd_address_spaces --- hw/i386/intel_iommu.c | 234 +- include/hw/i386/intel_iommu.h | 11 +- 2 files changed, 118 insertions(+), 127 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 271de995be..9fe5a222eb 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -61,6 +61,16 @@ } \ } +/* + * PCI bus number (or SID) is not reliable since the device is usaully + * initalized before guest can configure the PCI bridge + * (SECONDARY_BUS_NUMBER). + */ +struct vtd_as_key { +PCIBus *bus; +uint8_t devfn; +}; + static void vtd_address_space_refresh_all(IntelIOMMUState *s); static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n); @@ -210,6 +220,27 @@ static guint vtd_uint64_hash(gconstpointer v) return (guint)*(const uint64_t *)v; } +static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2) +{ +const struct vtd_as_key *key1 = v1; +const struct vtd_as_key *key2 = v2; + +return (key1->bus == key2->bus) && (key1->devfn == key2->devfn); +} + +/* + * Note that we use pointer to PCIBus as the key, so hashing/shifting + * based on the pointer value is intended. Note that we deal with + * collisions through vtd_as_equal(). + */ +static guint vtd_as_hash(gconstpointer v) +{ +const struct vtd_as_key *key = v; +guint value = (guint)(uintptr_t)key->bus; + +return (guint)(value << 8 | key->devfn); +} + static gboolean vtd_hash_remove_by_domain(gpointer key, gpointer value, gpointer user_data) { @@ -248,22 +279,14 @@ static gboolean vtd_hash_remove_by_page(gpointer key, gpointer value, static void vtd_reset_context_cache_locked(IntelIOMMUState *s) { VTDAddressSpace *vtd_as; -VTDBus *vtd_bus; -GHashTableIter bus_it; -uint32_t devfn_it; +GHashTableIter as_it; trace_vtd_context_cache_reset(); -g_hash_table_iter_init(&bus_it, s->vtd_as_by_busptr); +g_hash_table_iter_init(&as_it, s->vtd_address_spaces); -while (g_hash_table_iter_next (&bus_it, NULL, (void**)&vtd_bus)) { -for (devfn_it = 0; devfn_it < PCI_DEVFN_MAX; ++devfn_it) { -vtd_as = vtd_bus->dev_as[devfn_it]; -if (!vtd_as) { -continue; -} -vtd_as->context_cache_entry.context_cache_gen = 0; -} +while (g_hash_table_iter_next (&as_it, NULL, (void**)&vtd_as)) { +vtd_as->context_cache_entry.context_cache_gen = 0; } s->context_cache_gen = 1; } @@ -993,32 +1016,6 @@ static bool vtd_slpte_nonzero_rsvd(uint64_t slpte, uint32_t level) return slpte & rsvd_mask; } -/* Find the VTD address space associated with a given bus number */ -static VTDBus *vtd_find_as_from_bus_num(IntelIOMMUState *s, uint8_t bus_num) -{ -VTDBus *vtd_bus = s->vtd_as_by_bus_num[bus_num]; -GHashTableIter iter; - -if (vtd_bus) { -return vtd_bus; -} - -/* - * Iterate over the registered buses to find the one which - * currently holds this bus number and update the bus_num - * lookup table. - */ -g_hash_table_iter_init(&iter, s->vtd_as_by_busptr); -while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) { -if (pci_bus_num(vtd_bus->bus) == bus_num) { -s->vtd_as_by_bus_num[bus_num] = vtd_bus; -return vtd_bus; -} -} - -return NULL; -} - /* Given the @iova, get relevant @slptep. @slpte_level will be the last level * of the translation, can be used for deciding the size of large page. */ @@ -1632,24 +1629,13 @@ static bool vtd_switch_address_space(VTDAddressSpace *as) static void vtd_switch_address_space_all(IntelIOMMUState *s) { +VTDAddressSpace *vtd_as; GHashTableIter iter; -VTDBus *vtd_bus; -int i; - -g_hash_table_iter_init(&iter, s->vtd_as_by_busptr); -while (g_hash_table_iter_next(&iter, NULL, (void **)&vtd_bus)) { -for (i = 0; i < PCI_DEVFN_MAX; i++) { -if (!vtd_bus->dev_as[i]) { -
[PATCH V5 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry
We use to warn on wrong rid2pasid entry. But this error could be triggered by the guest and could happens during initialization. So let's don't warn in this case. Reviewed-by: Peter Xu Signed-off-by: Jason Wang --- Changes since v4: - Tweak the code to avoid using ret variable --- hw/i386/intel_iommu.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 6524c2ee32..271de995be 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -1554,8 +1554,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce) if (s->root_scalable) { ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe); if (ret) { -error_report_once("%s: vtd_ce_get_rid2pasid_entry error: %"PRId32, - __func__, ret); +/* + * This error is guest triggerable. We should assumt PT + * not enabled for safety. + */ return false; } return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT); @@ -1569,14 +1571,12 @@ static bool vtd_as_pt_enabled(VTDAddressSpace *as) { IntelIOMMUState *s; VTDContextEntry ce; -int ret; assert(as); s = as->iommu_state; -ret = vtd_dev_to_context_entry(s, pci_bus_num(as->bus), - as->devfn, &ce); -if (ret) { +if (vtd_dev_to_context_entry(s, pci_bus_num(as->bus), as->devfn, + &ce)) { /* * Possibly failed to parse the context entry for some reason * (e.g., during init, or any guest configuration errors on -- 2.25.1
[PULL 1/4] qom: Improve error messages when property has no getter or setter
When you try to set a property that has no setter, the error message blames "insufficient permission": $ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio QEMU 7.1.50 monitor - type 'help' for more information (qemu) qom-set /machine type q35 Error: Insufficient permission to perform this operation This implies it could work with "sufficient permission". It can't. Change the error message to: Error: Property 'pc-i440fx-7.2-machine.type' is not writable Do the same for getting a property that has no getter. Signed-off-by: Markus Armbruster Message-Id: <20221012153801.2604340-2-arm...@redhat.com> Reviewed-by: David Hildenbrand --- qom/object.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/qom/object.c b/qom/object.c index d34608558e..e5cef30f6d 100644 --- a/qom/object.c +++ b/qom/object.c @@ -1383,7 +1383,8 @@ bool object_property_get(Object *obj, const char *name, Visitor *v, } if (!prop->get) { -error_setg(errp, QERR_PERMISSION_DENIED); +error_setg(errp, "Property '%s.%s' is not readable", + object_get_typename(obj), name); return false; } prop->get(obj, v, name, prop->opaque, &err); @@ -1402,7 +1403,8 @@ bool object_property_set(Object *obj, const char *name, Visitor *v, } if (!prop->set) { -error_setg(errp, QERR_PERMISSION_DENIED); +error_setg(errp, "Property '%s.%s' is not writable", + object_get_typename(obj), name); return false; } prop->set(obj, v, name, prop->opaque, errp); -- 2.37.3
[PULL 3/4] qtest: Improve error messages when property can not be set right now
When you try to set qtest property "log" while the qtest object is active, the error message blames "insufficient permission": $ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -chardev socket,id=chrqt0,path=qtest.socket,server=on,wait=off -object qtest,id=qt0,chardev=chrqt0,log=/dev/null QEMU 7.1.50 monitor - type 'help' for more information (qemu) qom-set /objects/qt0 log qtest.log Error: Insufficient permission to perform this operation This implies it could work with "sufficient permission". It can't. Change the error message to: Error: Property 'log' can not be set now Same for property "chardev". Signed-off-by: Markus Armbruster Message-Id: <20221012153801.2604340-4-arm...@redhat.com> --- softmmu/qtest.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/softmmu/qtest.c b/softmmu/qtest.c index f8acef2628..afea7693d0 100644 --- a/softmmu/qtest.c +++ b/softmmu/qtest.c @@ -977,7 +977,7 @@ static void qtest_set_log(Object *obj, const char *value, Error **errp) QTest *q = QTEST(obj); if (qtest == q) { -error_setg(errp, QERR_PERMISSION_DENIED); +error_setg(errp, "Property 'log' can not be set now"); } else { g_free(q->log); q->log = g_strdup(value); @@ -997,7 +997,7 @@ static void qtest_set_chardev(Object *obj, const char *value, Error **errp) Chardev *chr; if (qtest == q) { -error_setg(errp, QERR_PERMISSION_DENIED); +error_setg(errp, "Property 'chardev' can not be set now"); return; } -- 2.37.3
[PULL 2/4] backends: Improve error messages when property can no longer be set
When you try to set virtio-rng property "filename" after the backend has been completed with user_creatable_complete(), the error message blames "insufficient permission": $ qemu-system-x86_64 -S -display none -nodefaults -monitor stdio -object rng-random,id=rng0 -device virtio-rng,id=vrng0,rng=rng0 QEMU 7.1.50 monitor - type 'help' for more information (qemu) qom-set /objects/rng0 filename /dev/random Error: Insufficient permission to perform this operation This implies it could work with "sufficient permission". It can't. Change the error message to: Error: Property 'filename' can no longer be set Same for cryptodev-vhost-user property "chardev", rng-egd property "chardev", and vhost-user-backend property "chardev". Signed-off-by: Markus Armbruster Message-Id: <20221012153801.2604340-3-arm...@redhat.com> Acked-by: Michael S. Tsirkin [Commit message tidied up] --- backends/cryptodev-vhost-user.c | 2 +- backends/rng-egd.c | 2 +- backends/rng-random.c | 2 +- backends/vhost-user.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/backends/cryptodev-vhost-user.c b/backends/cryptodev-vhost-user.c index 5443a59153..f9c5867e38 100644 --- a/backends/cryptodev-vhost-user.c +++ b/backends/cryptodev-vhost-user.c @@ -339,7 +339,7 @@ static void cryptodev_vhost_user_set_chardev(Object *obj, CRYPTODEV_BACKEND_VHOST_USER(obj); if (s->opened) { -error_setg(errp, QERR_PERMISSION_DENIED); +error_setg(errp, "Property 'chardev' can no longer be set"); } else { g_free(s->chr_name); s->chr_name = g_strdup(value); diff --git a/backends/rng-egd.c b/backends/rng-egd.c index 4de142b9dc..684c3cf3d6 100644 --- a/backends/rng-egd.c +++ b/backends/rng-egd.c @@ -116,7 +116,7 @@ static void rng_egd_set_chardev(Object *obj, const char *value, Error **errp) RngEgd *s = RNG_EGD(b); if (b->opened) { -error_setg(errp, QERR_PERMISSION_DENIED); +error_setg(errp, "Property 'chardev' can no longer be set"); } else { g_free(s->chr_name); s->chr_name = g_strdup(value); diff --git a/backends/rng-random.c b/backends/rng-random.c index 7add272edd..80eb5be138 100644 --- a/backends/rng-random.c +++ b/backends/rng-random.c @@ -96,7 +96,7 @@ static void rng_random_set_filename(Object *obj, const char *filename, RngRandom *s = RNG_RANDOM(obj); if (b->opened) { -error_setg(errp, QERR_PERMISSION_DENIED); +error_setg(errp, "Property 'filename' can no longer be set"); return; } diff --git a/backends/vhost-user.c b/backends/vhost-user.c index 10b39992d2..5dedb2d987 100644 --- a/backends/vhost-user.c +++ b/backends/vhost-user.c @@ -141,7 +141,7 @@ static void set_chardev(Object *obj, const char *value, Error **errp) Chardev *chr; if (b->completed) { -error_setg(errp, QERR_PERMISSION_DENIED); +error_setg(errp, "Property 'chardev' can no longer be set"); return; } -- 2.37.3
[PULL 4/4] qerror: QERR_PERMISSION_DENIED is no longer used, drop
Signed-off-by: Markus Armbruster Message-Id: <20221012153801.2604340-5-arm...@redhat.com> --- include/qapi/qmp/qerror.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/include/qapi/qmp/qerror.h b/include/qapi/qmp/qerror.h index 596fce0c54..87ca83b155 100644 --- a/include/qapi/qmp/qerror.h +++ b/include/qapi/qmp/qerror.h @@ -50,9 +50,6 @@ #define QERR_MISSING_PARAMETER \ "Parameter '%s' is missing" -#define QERR_PERMISSION_DENIED \ -"Insufficient permission to perform this operation" - #define QERR_PROPERTY_VALUE_BAD \ "Property '%s.%s' doesn't take value '%s'" -- 2.37.3
[PULL 0/4] Error reporting patches for 2022-10-28
The following changes since commit 344744e148e6e865f5a57e745b02a87e5ea534ad: Merge tag 'dump-pull-request' of https://gitlab.com/marcandre.lureau/qemu into staging (2022-10-26 10:53:49 -0400) are available in the Git repository at: https://repo.or.cz/qemu/armbru.git tags/pull-error-2022-10-28 for you to fetch changes up to 0dddb0fc80f83d3bb469dc220ba8e2496b27a205: qerror: QERR_PERMISSION_DENIED is no longer used, drop (2022-10-27 07:57:18 +0200) Error reporting patches for 2022-10-28 Markus Armbruster (4): qom: Improve error messages when property has no getter or setter backends: Improve error messages when property can no longer be set qtest: Improve error messages when property can not be set right now qerror: QERR_PERMISSION_DENIED is no longer used, drop include/qapi/qmp/qerror.h | 3 --- backends/cryptodev-vhost-user.c | 2 +- backends/rng-egd.c | 2 +- backends/rng-random.c | 2 +- backends/vhost-user.c | 2 +- qom/object.c| 6 -- softmmu/qtest.c | 4 ++-- 7 files changed, 10 insertions(+), 11 deletions(-) -- 2.37.3
[PULL 14/26] qapi: net: introduce a way to bypass qemu_opts_parse_noisily()
From: Laurent Vivier As qemu_opts_parse_noisily() flattens the QAPI structures ("type" field of Netdev structure can collides with "type" field of SocketAddress), we introduce a way to bypass qemu_opts_parse_noisily() and use directly visit_type_Netdev() to parse the backend parameters. More details from Markus: qemu_init() passes the argument of -netdev, -nic, and -net to net_client_parse(). net_client_parse() parses with qemu_opts_parse_noisily(), passing QemuOptsList qemu_netdev_opts for -netdev, qemu_nic_opts for -nic, and qemu_net_opts for -net. Their desc[] are all empty, which means any keys are accepted. The result of the parse (a QemuOpts) is stored in the QemuOptsList. Note that QemuOpts is flat by design. In some places, we layer non-flat on top using dotted keys convention, but not here. net_init_clients() iterates over the stored QemuOpts, and passes them to net_init_netdev(), net_param_nic(), or net_init_client(), respectively. These functions pass the QemuOpts to net_client_init(). They also do other things with the QemuOpts, which we can ignore here. net_client_init() uses the opts visitor to convert the (flat) QemOpts to a (non-flat) QAPI object Netdev. Netdev is also the argument of QMP command netdev_add. The opts visitor was an early attempt to support QAPI in (QemuOpts-based) CLI. It restricts QAPI types to a certain shape; see commit eb7ee2cbeb "qapi: introduce OptsVisitor". A more modern way to support QAPI is qobject_input_visitor_new_str(). It uses keyval_parse() instead of QemuOpts for KEY=VALUE,... syntax, and it also supports JSON syntax. The former isn't quite as expressive as JSON, but it's a lot closer than QemuOpts + opts visitor. This commit paves the way to use of the modern way instead. Signed-off-by: Laurent Vivier Reviewed-by: Markus Armbruster Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- include/net/net.h | 2 ++ net/net.c | 57 +++ softmmu/vl.c | 6 +- 3 files changed, 64 insertions(+), 1 deletion(-) diff --git a/include/net/net.h b/include/net/net.h index 55023e7..025dbf1 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -220,6 +220,8 @@ extern NICInfo nd_table[MAX_NICS]; extern const char *host_net_devices[]; /* from net.c */ +bool netdev_is_modern(const char *optarg); +void netdev_parse_modern(const char *optarg); void net_client_parse(QemuOptsList *opts_list, const char *str); void show_netdevs(void); void net_init_clients(void); diff --git a/net/net.c b/net/net.c index 178257a..128a90f 100644 --- a/net/net.c +++ b/net/net.c @@ -54,6 +54,7 @@ #include "net/colo-compare.h" #include "net/filter.h" #include "qapi/string-output-visitor.h" +#include "qapi/qobject-input-visitor.h" /* Net bridge is currently not supported for W32. */ #if !defined(_WIN32) @@ -63,6 +64,16 @@ static VMChangeStateEntry *net_change_state_entry; static QTAILQ_HEAD(, NetClientState) net_clients; +typedef struct NetdevQueueEntry { +Netdev *nd; +Location loc; +QSIMPLEQ_ENTRY(NetdevQueueEntry) entry; +} NetdevQueueEntry; + +typedef QSIMPLEQ_HEAD(, NetdevQueueEntry) NetdevQueue; + +static NetdevQueue nd_queue = QSIMPLEQ_HEAD_INITIALIZER(nd_queue); + /***/ /* network device redirectors */ @@ -1566,6 +1577,20 @@ out: return ret; } +static void netdev_init_modern(void) +{ +while (!QSIMPLEQ_EMPTY(&nd_queue)) { +NetdevQueueEntry *nd = QSIMPLEQ_FIRST(&nd_queue); + +QSIMPLEQ_REMOVE_HEAD(&nd_queue, entry); +loc_push_restore(&nd->loc); +net_client_init1(nd->nd, true, &error_fatal); +loc_pop(&nd->loc); +qapi_free_Netdev(nd->nd); +g_free(nd); +} +} + void net_init_clients(void) { net_change_state_entry = @@ -1573,6 +1598,8 @@ void net_init_clients(void) QTAILQ_INIT(&net_clients); +netdev_init_modern(); + qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL, &error_fatal); @@ -1583,6 +1610,36 @@ void net_init_clients(void) &error_fatal); } +/* + * Does this -netdev argument use modern rather than traditional syntax? + * Modern syntax is to be parsed with netdev_parse_modern(). + * Traditional syntax is to be parsed with net_client_parse(). + */ +bool netdev_is_modern(const char *optarg) +{ +return false; +} + +/* + * netdev_parse_modern() uses modern, more expressive syntax than + * net_client_parse(), but supports only the -netdev option. + * netdev_parse_modern() appends to @nd_queue, whereas net_client_parse() + * appends to @qemu_netdev_opts. + */ +void netdev_parse_modern(const char *optarg) +{ +Visitor *v; +NetdevQueueEntry *nd; + +v = qobject_input_visitor_new_str(optarg, "type", &error_fatal); +nd = g_new(NetdevQueueEntry, 1); +visit_type_Netdev(v, NULL, &nd->nd, &error_fatal); +visit_free(v); +loc_save(&nd-
[PULL 15/26] net: introduce qemu_set_info_str() function
From: Laurent Vivier Embed the setting of info_str in a function. Signed-off-by: Laurent Vivier Reviewed-by: David Gibson Acked-by: Michael S. Tsirkin Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Jason Wang --- hw/net/xen_nic.c | 5 ++--- include/net/net.h | 1 + net/l2tpv3.c | 3 +-- net/net.c | 17 - net/slirp.c | 5 ++--- net/socket.c | 33 ++--- net/tap-win32.c | 3 +-- net/tap.c | 13 + net/vde.c | 3 +-- net/vhost-user.c | 3 +-- net/vhost-vdpa.c | 2 +- 11 files changed, 41 insertions(+), 47 deletions(-) diff --git a/hw/net/xen_nic.c b/hw/net/xen_nic.c index 5c815b4..7d92c2d 100644 --- a/hw/net/xen_nic.c +++ b/hw/net/xen_nic.c @@ -296,9 +296,8 @@ static int net_init(struct XenLegacyDevice *xendev) netdev->nic = qemu_new_nic(&net_xen_info, &netdev->conf, "xen", NULL, netdev); -snprintf(qemu_get_queue(netdev->nic)->info_str, - sizeof(qemu_get_queue(netdev->nic)->info_str), - "nic: xenbus vif macaddr=%s", netdev->mac); +qemu_set_info_str(qemu_get_queue(netdev->nic), + "nic: xenbus vif macaddr=%s", netdev->mac); /* fill info */ xenstore_write_be_int(&netdev->xendev, "feature-rx-copy", 1); diff --git a/include/net/net.h b/include/net/net.h index 025dbf1..3db75ff 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -177,6 +177,7 @@ ssize_t qemu_send_packet_async(NetClientState *nc, const uint8_t *buf, void qemu_purge_queued_packets(NetClientState *nc); void qemu_flush_queued_packets(NetClientState *nc); void qemu_flush_or_purge_queued_packets(NetClientState *nc, bool purge); +void qemu_set_info_str(NetClientState *nc, const char *fmt, ...); void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]); bool qemu_has_ufo(NetClientState *nc); bool qemu_has_vnet_hdr(NetClientState *nc); diff --git a/net/l2tpv3.c b/net/l2tpv3.c index af373e5..350041a 100644 --- a/net/l2tpv3.c +++ b/net/l2tpv3.c @@ -723,8 +723,7 @@ int net_init_l2tpv3(const Netdev *netdev, l2tpv3_read_poll(s, true); -snprintf(s->nc.info_str, sizeof(s->nc.info_str), - "l2tpv3: connected"); +qemu_set_info_str(&s->nc, "l2tpv3: connected"); return 0; outerr: qemu_del_net_client(nc); diff --git a/net/net.c b/net/net.c index 128a90f..f29a59c 100644 --- a/net/net.c +++ b/net/net.c @@ -141,13 +141,20 @@ char *qemu_mac_strdup_printf(const uint8_t *macaddr) macaddr[3], macaddr[4], macaddr[5]); } +void qemu_set_info_str(NetClientState *nc, const char *fmt, ...) +{ +va_list ap; + +va_start(ap, fmt); +vsnprintf(nc->info_str, sizeof(nc->info_str), fmt, ap); +va_end(ap); +} + void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]) { -snprintf(nc->info_str, sizeof(nc->info_str), - "model=%s,macaddr=%02x:%02x:%02x:%02x:%02x:%02x", - nc->model, - macaddr[0], macaddr[1], macaddr[2], - macaddr[3], macaddr[4], macaddr[5]); +qemu_set_info_str(nc, "model=%s,macaddr=%02x:%02x:%02x:%02x:%02x:%02x", + nc->model, macaddr[0], macaddr[1], macaddr[2], + macaddr[3], macaddr[4], macaddr[5]); } static int mac_table[256] = {0}; diff --git a/net/slirp.c b/net/slirp.c index 8679be6..14a8d59 100644 --- a/net/slirp.c +++ b/net/slirp.c @@ -611,9 +611,8 @@ static int net_slirp_init(NetClientState *peer, const char *model, nc = qemu_new_net_client(&net_slirp_info, peer, model, name); -snprintf(nc->info_str, sizeof(nc->info_str), - "net=%s,restrict=%s", inet_ntoa(net), - restricted ? "on" : "off"); +qemu_set_info_str(nc, "net=%s,restrict=%s", inet_ntoa(net), + restricted ? "on" : "off"); s = DO_UPCAST(SlirpState, nc, nc); diff --git a/net/socket.c b/net/socket.c index bfd8596..ade1ecf 100644 --- a/net/socket.c +++ b/net/socket.c @@ -179,7 +179,7 @@ static void net_socket_send(void *opaque) s->fd = -1; net_socket_rs_init(&s->rs, net_socket_rs_finalize, false); s->nc.link_down = true; -memset(s->nc.info_str, 0, sizeof(s->nc.info_str)); +qemu_set_info_str(&s->nc, ""); return; } @@ -387,16 +387,15 @@ static NetSocketState *net_socket_fd_init_dgram(NetClientState *peer, /* mcast: save bound address as dst */ if (is_connected && mcast != NULL) { s->dgram_dst = saddr; -snprintf(nc->info_str, sizeof(nc->info_str), - "socket: fd=%d (cloned mcast=%s:%d)", - fd, inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port)); +qemu_set_info_str(nc, "socket: fd=%d (cloned mcast=%s:%d)", fd, + inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port)); } else { if (sa_type == SOCKET_ADDRESS_TYPE_UNIX) { s->dgram_dst.sin_family =
[PULL 12/26] net: remove the @errp argument of net_client_inits()
From: Laurent Vivier The only caller passes &error_fatal, so use this directly in the function. It's what we do for -blockdev, -device, and -object. Suggested-by: Markus Armbruster Signed-off-by: Laurent Vivier Reviewed-by: Markus Armbruster Reviewed-by: David Gibson Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- include/net/net.h | 2 +- net/net.c | 20 +++- softmmu/vl.c | 2 +- 3 files changed, 9 insertions(+), 15 deletions(-) diff --git a/include/net/net.h b/include/net/net.h index 81d0b21..c1c34a5 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -222,7 +222,7 @@ extern const char *host_net_devices[]; /* from net.c */ int net_client_parse(QemuOptsList *opts_list, const char *str); void show_netdevs(void); -int net_init_clients(Error **errp); +void net_init_clients(void); void net_check_clients(void); void net_cleanup(void); void hmp_host_net_add(Monitor *mon, const QDict *qdict); diff --git a/net/net.c b/net/net.c index 407bdd1..5e78880 100644 --- a/net/net.c +++ b/net/net.c @@ -1566,27 +1566,21 @@ out: return ret; } -int net_init_clients(Error **errp) +void net_init_clients(void) { net_change_state_entry = qemu_add_vm_change_state_handler(net_vm_change_state_handler, NULL); QTAILQ_INIT(&net_clients); -if (qemu_opts_foreach(qemu_find_opts("netdev"), - net_init_netdev, NULL, errp)) { -return -1; -} - -if (qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL, errp)) { -return -1; -} +qemu_opts_foreach(qemu_find_opts("netdev"), net_init_netdev, NULL, + &error_fatal); -if (qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL, errp)) { -return -1; -} +qemu_opts_foreach(qemu_find_opts("nic"), net_param_nic, NULL, + &error_fatal); -return 0; +qemu_opts_foreach(qemu_find_opts("net"), net_init_client, NULL, + &error_fatal); } int net_client_parse(QemuOptsList *opts_list, const char *optarg) diff --git a/softmmu/vl.c b/softmmu/vl.c index b464da2..a4ae131 100644 --- a/softmmu/vl.c +++ b/softmmu/vl.c @@ -1904,7 +1904,7 @@ static void qemu_create_late_backends(void) qtest_server_init(qtest_chrdev, qtest_log, &error_fatal); } -net_init_clients(&error_fatal); +net_init_clients(); object_option_foreach_add(object_create_late); -- 2.7.4
[PULL 17/26] net: socket: Don't ignore EINVAL on netdev socket connection
From: Stefano Brivio Other errors are treated as failure by net_socket_connect_init(), but if connect() returns EINVAL, we'll fail silently. Remove the related exception. Signed-off-by: Stefano Brivio Signed-off-by: Laurent Vivier Reviewed-by: David Gibson Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- net/socket.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/socket.c b/net/socket.c index ade1ecf..4944bb7 100644 --- a/net/socket.c +++ b/net/socket.c @@ -577,8 +577,7 @@ static int net_socket_connect_init(NetClientState *peer, if (errno == EINTR || errno == EWOULDBLOCK) { /* continue */ } else if (errno == EINPROGRESS || - errno == EALREADY || - errno == EINVAL) { + errno == EALREADY) { break; } else { error_setg_errno(errp, errno, "can't connect socket"); -- 2.7.4
[PULL 11/26] net: introduce convert_host_port()
From: Laurent Vivier Signed-off-by: Laurent Vivier Reviewed-by: Stefano Brivio Reviewed-by: David Gibson Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- include/qemu/sockets.h | 2 ++ net/net.c | 62 ++ 2 files changed, 34 insertions(+), 30 deletions(-) diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h index 036745e..db4bedb 100644 --- a/include/qemu/sockets.h +++ b/include/qemu/sockets.h @@ -65,6 +65,8 @@ void socket_listen_cleanup(int fd, Error **errp); int socket_dgram(SocketAddress *remote, SocketAddress *local, Error **errp); /* Old, ipv4 only bits. Don't use for new code. */ +int convert_host_port(struct sockaddr_in *saddr, const char *host, + const char *port, Error **errp); int parse_host_port(struct sockaddr_in *saddr, const char *str, Error **errp); int socket_init(void); diff --git a/net/net.c b/net/net.c index 8ddafac..407bdd1 100644 --- a/net/net.c +++ b/net/net.c @@ -66,55 +66,57 @@ static QTAILQ_HEAD(, NetClientState) net_clients; /***/ /* network device redirectors */ -int parse_host_port(struct sockaddr_in *saddr, const char *str, -Error **errp) +int convert_host_port(struct sockaddr_in *saddr, const char *host, + const char *port, Error **errp) { -gchar **substrings; struct hostent *he; -const char *addr, *p, *r; -int port, ret = 0; +const char *r; +long p; memset(saddr, 0, sizeof(*saddr)); -substrings = g_strsplit(str, ":", 2); -if (!substrings || !substrings[0] || !substrings[1]) { -error_setg(errp, "host address '%s' doesn't contain ':' " - "separating host from port", str); -ret = -1; -goto out; -} - -addr = substrings[0]; -p = substrings[1]; - saddr->sin_family = AF_INET; -if (addr[0] == '\0') { +if (host[0] == '\0') { saddr->sin_addr.s_addr = 0; } else { -if (qemu_isdigit(addr[0])) { -if (!inet_aton(addr, &saddr->sin_addr)) { +if (qemu_isdigit(host[0])) { +if (!inet_aton(host, &saddr->sin_addr)) { error_setg(errp, "host address '%s' is not a valid " - "IPv4 address", addr); -ret = -1; -goto out; + "IPv4 address", host); +return -1; } } else { -he = gethostbyname(addr); +he = gethostbyname(host); if (he == NULL) { -error_setg(errp, "can't resolve host address '%s'", addr); -ret = -1; -goto out; +error_setg(errp, "can't resolve host address '%s'", host); +return -1; } saddr->sin_addr = *(struct in_addr *)he->h_addr; } } -port = strtol(p, (char **)&r, 0); -if (r == p) { -error_setg(errp, "port number '%s' is invalid", p); +if (qemu_strtol(port, &r, 0, &p) != 0) { +error_setg(errp, "port number '%s' is invalid", port); +return -1; +} +saddr->sin_port = htons(p); +return 0; +} + +int parse_host_port(struct sockaddr_in *saddr, const char *str, +Error **errp) +{ +gchar **substrings; +int ret; + +substrings = g_strsplit(str, ":", 2); +if (!substrings || !substrings[0] || !substrings[1]) { +error_setg(errp, "host address '%s' doesn't contain ':' " + "separating host from port", str); ret = -1; goto out; } -saddr->sin_port = htons(port); + +ret = convert_host_port(saddr, substrings[0], substrings[1], errp); out: g_strfreev(substrings); -- 2.7.4
[PULL 23/26] qemu-sockets: move and rename SocketAddress_to_str()
From: Laurent Vivier Rename SocketAddress_to_str() to socket_uri() and move it to util/qemu-sockets.c close to socket_parse(). socket_uri() generates a string from a SocketAddress while socket_parse() generates a SocketAddress from a string. Signed-off-by: Laurent Vivier Reviewed-by: David Gibson Reviewed-by: Dr. David Alan Gilbert Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- include/qemu/sockets.h | 2 +- monitor/hmp-cmds.c | 23 +-- util/qemu-sockets.c| 20 3 files changed, 22 insertions(+), 23 deletions(-) diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h index db4bedb..214058d 100644 --- a/include/qemu/sockets.h +++ b/include/qemu/sockets.h @@ -58,6 +58,7 @@ NetworkAddressFamily inet_netfamily(int family); int unix_listen(const char *path, Error **errp); int unix_connect(const char *path, Error **errp); +char *socket_uri(SocketAddress *addr); SocketAddress *socket_parse(const char *str, Error **errp); int socket_connect(SocketAddress *addr, Error **errp); int socket_listen(SocketAddress *addr, int num, Error **errp); @@ -141,5 +142,4 @@ SocketAddress *socket_address_flatten(SocketAddressLegacy *addr); * Return 0 on success. */ int socket_address_parse_named_fd(SocketAddress *addr, Error **errp); - #endif /* QEMU_SOCKETS_H */ diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c index bab86c5..01b789a 100644 --- a/monitor/hmp-cmds.c +++ b/monitor/hmp-cmds.c @@ -199,27 +199,6 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict) qapi_free_MouseInfoList(mice_list); } -static char *SocketAddress_to_str(SocketAddress *addr) -{ -switch (addr->type) { -case SOCKET_ADDRESS_TYPE_INET: -return g_strdup_printf("tcp:%s:%s", - addr->u.inet.host, - addr->u.inet.port); -case SOCKET_ADDRESS_TYPE_UNIX: -return g_strdup_printf("unix:%s", - addr->u.q_unix.path); -case SOCKET_ADDRESS_TYPE_FD: -return g_strdup_printf("fd:%s", addr->u.fd.str); -case SOCKET_ADDRESS_TYPE_VSOCK: -return g_strdup_printf("tcp:%s:%s", - addr->u.vsock.cid, - addr->u.vsock.port); -default: -return g_strdup("unknown address type"); -} -} - void hmp_info_migrate(Monitor *mon, const QDict *qdict) { MigrationInfo *info; @@ -382,7 +361,7 @@ void hmp_info_migrate(Monitor *mon, const QDict *qdict) monitor_printf(mon, "socket address: [\n"); for (addr = info->socket_address; addr; addr = addr->next) { -char *s = SocketAddress_to_str(addr->value); +char *s = socket_uri(addr->value); monitor_printf(mon, "\t%s\n", s); g_free(s); } diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c index 83f4bd6..9f6f655 100644 --- a/util/qemu-sockets.c +++ b/util/qemu-sockets.c @@ -1077,6 +1077,26 @@ int unix_connect(const char *path, Error **errp) return sock; } +char *socket_uri(SocketAddress *addr) +{ +switch (addr->type) { +case SOCKET_ADDRESS_TYPE_INET: +return g_strdup_printf("tcp:%s:%s", + addr->u.inet.host, + addr->u.inet.port); +case SOCKET_ADDRESS_TYPE_UNIX: +return g_strdup_printf("unix:%s", + addr->u.q_unix.path); +case SOCKET_ADDRESS_TYPE_FD: +return g_strdup_printf("fd:%s", addr->u.fd.str); +case SOCKET_ADDRESS_TYPE_VSOCK: +return g_strdup_printf("tcp:%s:%s", + addr->u.vsock.cid, + addr->u.vsock.port); +default: +return g_strdup("unknown address type"); +} +} SocketAddress *socket_parse(const char *str, Error **errp) { -- 2.7.4
[PULL 25/26] net: stream: move to QIO to enable additional parameters
From: Laurent Vivier Use QIOChannel, QIOChannelSocket and QIONetListener. This allows net/stream to use all the available parameters provided by SocketAddress. Signed-off-by: Laurent Vivier Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- net/stream.c| 492 qemu-options.hx | 4 +- 2 files changed, 178 insertions(+), 318 deletions(-) diff --git a/net/stream.c b/net/stream.c index 884f473..54c67e1 100644 --- a/net/stream.c +++ b/net/stream.c @@ -35,48 +35,36 @@ #include "qemu/iov.h" #include "qemu/main-loop.h" #include "qemu/cutils.h" +#include "io/channel.h" +#include "io/channel-socket.h" +#include "io/net-listener.h" typedef struct NetStreamState { NetClientState nc; -int listen_fd; -int fd; +QIOChannel *listen_ioc; +QIONetListener *listener; +QIOChannel *ioc; +guint ioc_read_tag; +guint ioc_write_tag; SocketReadState rs; unsigned int send_index; /* number of bytes sent*/ -bool read_poll; /* waiting to receive data? */ -bool write_poll; /* waiting to transmit data? */ } NetStreamState; -static void net_stream_send(void *opaque); -static void net_stream_accept(void *opaque); -static void net_stream_writable(void *opaque); +static void net_stream_listen(QIONetListener *listener, + QIOChannelSocket *cioc, + void *opaque); -static void net_stream_update_fd_handler(NetStreamState *s) +static gboolean net_stream_writable(QIOChannel *ioc, +GIOCondition condition, +gpointer data) { -qemu_set_fd_handler(s->fd, -s->read_poll ? net_stream_send : NULL, -s->write_poll ? net_stream_writable : NULL, -s); -} - -static void net_stream_read_poll(NetStreamState *s, bool enable) -{ -s->read_poll = enable; -net_stream_update_fd_handler(s); -} - -static void net_stream_write_poll(NetStreamState *s, bool enable) -{ -s->write_poll = enable; -net_stream_update_fd_handler(s); -} - -static void net_stream_writable(void *opaque) -{ -NetStreamState *s = opaque; +NetStreamState *s = data; -net_stream_write_poll(s, false); +s->ioc_write_tag = 0; qemu_flush_queued_packets(&s->nc); + +return G_SOURCE_REMOVE; } static ssize_t net_stream_receive(NetClientState *nc, const uint8_t *buf, @@ -93,13 +81,15 @@ static ssize_t net_stream_receive(NetClientState *nc, const uint8_t *buf, .iov_len = size, }, }; +struct iovec local_iov[2]; +unsigned int nlocal_iov; size_t remaining; ssize_t ret; remaining = iov_size(iov, 2) - s->send_index; -ret = iov_send(s->fd, iov, 2, s->send_index, remaining); - -if (ret == -1 && errno == EAGAIN) { +nlocal_iov = iov_copy(local_iov, 2, iov, 2, s->send_index, remaining); +ret = qio_channel_writev(s->ioc, local_iov, nlocal_iov, NULL); +if (ret == QIO_CHANNEL_ERR_BLOCK) { ret = 0; /* handled further down */ } if (ret == -1) { @@ -108,19 +98,25 @@ static ssize_t net_stream_receive(NetClientState *nc, const uint8_t *buf, } if (ret < (ssize_t)remaining) { s->send_index += ret; -net_stream_write_poll(s, true); +s->ioc_write_tag = qio_channel_add_watch(s->ioc, G_IO_OUT, + net_stream_writable, s, NULL); return 0; } s->send_index = 0; return size; } +static gboolean net_stream_send(QIOChannel *ioc, +GIOCondition condition, +gpointer data); + static void net_stream_send_completed(NetClientState *nc, ssize_t len) { NetStreamState *s = DO_UPCAST(NetStreamState, nc, nc); -if (!s->read_poll) { -net_stream_read_poll(s, true); +if (!s->ioc_read_tag) { +s->ioc_read_tag = qio_channel_add_watch(s->ioc, G_IO_IN, +net_stream_send, s, NULL); } } @@ -131,19 +127,24 @@ static void net_stream_rs_finalize(SocketReadState *rs) if (qemu_send_packet_async(&s->nc, rs->buf, rs->packet_len, net_stream_send_completed) == 0) { -net_stream_read_poll(s, false); +if (s->ioc_read_tag) { +g_source_remove(s->ioc_read_tag); +s->ioc_read_tag = 0; +} } } -static void net_stream_send(void *opaque) +static gboolean net_stream_send(QIOChannel *ioc, +GIOCondition condition, +gpointer data) { -NetStreamState *s = opaque; +NetStreamState *s = data; int size; int ret; -uint8_t buf1[NET_BUFSIZE]; -const uint8_t *buf; +char buf1[NET_BUFSIZE]; +const char *buf; -size = r
[PULL 04/26] vdpa: Remove shadow CVQ command check
From: Eugenio Pérez The guest will see undefined behavior if it issue not negotiate commands, bit it is expected somehow. Simplify code deleting this check. Signed-off-by: Eugenio Pérez Acked-by: Jason Wang Signed-off-by: Jason Wang --- net/vhost-vdpa.c | 48 1 file changed, 48 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index eebf29f..6d64000 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -462,48 +462,6 @@ static NetClientInfo net_vhost_vdpa_cvq_info = { }; /** - * Do not forward commands not supported by SVQ. Otherwise, the device could - * accept it and qemu would not know how to update the device model. - */ -static bool vhost_vdpa_net_cvq_validate_cmd(const void *out_buf, size_t len) -{ -struct virtio_net_ctrl_hdr ctrl; - -if (unlikely(len < sizeof(ctrl))) { -qemu_log_mask(LOG_GUEST_ERROR, - "%s: invalid legnth of out buffer %zu\n", __func__, len); -return false; -} - -memcpy(&ctrl, out_buf, sizeof(ctrl)); -switch (ctrl.class) { -case VIRTIO_NET_CTRL_MAC: -switch (ctrl.cmd) { -case VIRTIO_NET_CTRL_MAC_ADDR_SET: -return true; -default: -qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mac cmd %u\n", - __func__, ctrl.cmd); -}; -break; -case VIRTIO_NET_CTRL_MQ: -switch (ctrl.cmd) { -case VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET: -return true; -default: -qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid mq cmd %u\n", - __func__, ctrl.cmd); -}; -break; -default: -qemu_log_mask(LOG_GUEST_ERROR, "%s: invalid control class %u\n", - __func__, ctrl.class); -}; - -return false; -} - -/** * Validate and copy control virtqueue commands. * * Following QEMU guidelines, we offer a copy of the buffers to the device to @@ -526,16 +484,10 @@ static int vhost_vdpa_net_handle_ctrl_avail(VhostShadowVirtqueue *svq, .iov_len = sizeof(status), }; ssize_t dev_written = -EINVAL; -bool ok; out.iov_len = iov_to_buf(elem->out_sg, elem->out_num, 0, s->cvq_cmd_out_buffer, vhost_vdpa_net_cvq_cmd_len()); -ok = vhost_vdpa_net_cvq_validate_cmd(s->cvq_cmd_out_buffer, out.iov_len); -if (unlikely(!ok)) { -goto out; -} - dev_written = vhost_vdpa_net_cvq_add(s, out.iov_len, sizeof(status)); if (unlikely(dev_written < 0)) { goto out; -- 2.7.4
[PULL 05/26] vhost-vdpa: allow passing opened vhostfd to vhost-vdpa
From: Si-Wei Liu Similar to other vhost backends, vhostfd can be passed to vhost-vdpa backend as another parameter to instantiate vhost-vdpa net client. This would benefit the use case where only open file descriptors, as opposed to raw vhost-vdpa device paths, are accessible from the QEMU process. (qemu) netdev_add type=vhost-vdpa,vhostfd=61,id=vhost-vdpa1 Signed-off-by: Si-Wei Liu Acked-by: Eugenio Pérez Signed-off-by: Jason Wang --- net/vhost-vdpa.c | 25 - qapi/net.json| 3 +++ qemu-options.hx | 6 -- 3 files changed, 27 insertions(+), 7 deletions(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 6d64000..9c1b25d 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -634,14 +634,29 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name, assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA); opts = &netdev->u.vhost_vdpa; -if (!opts->vhostdev) { -error_setg(errp, "vdpa character device not specified with vhostdev"); +if (!opts->has_vhostdev && !opts->has_vhostfd) { +error_setg(errp, + "vhost-vdpa: neither vhostdev= nor vhostfd= was specified"); return -1; } -vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp); -if (vdpa_device_fd == -1) { -return -errno; +if (opts->has_vhostdev && opts->has_vhostfd) { +error_setg(errp, + "vhost-vdpa: vhostdev= and vhostfd= are mutually exclusive"); +return -1; +} + +if (opts->has_vhostdev) { +vdpa_device_fd = qemu_open(opts->vhostdev, O_RDWR, errp); +if (vdpa_device_fd == -1) { +return -errno; +} +} else if (opts->has_vhostfd) { +vdpa_device_fd = monitor_fd_param(monitor_cur(), opts->vhostfd, errp); +if (vdpa_device_fd == -1) { +error_prepend(errp, "vhost-vdpa: unable to parse vhostfd: "); +return -1; +} } r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp); diff --git a/qapi/net.json b/qapi/net.json index dd088c0..926ecc8 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -442,6 +442,8 @@ # @vhostdev: path of vhost-vdpa device #(default:'/dev/vhost-vdpa-0') # +# @vhostfd: file descriptor of an already opened vhost vdpa device +# # @queues: number of queues to be created for multiqueue vhost-vdpa # (default: 1) # @@ -456,6 +458,7 @@ { 'struct': 'NetdevVhostVDPAOptions', 'data': { '*vhostdev': 'str', +'*vhostfd': 'str', '*queues': 'int', '*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } } diff --git a/qemu-options.hx b/qemu-options.hx index eb38e5d..876a70a 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2790,8 +2790,10 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev, "configure a vhost-user network, backed by a chardev 'dev'\n" #endif #ifdef __linux__ -"-netdev vhost-vdpa,id=str,vhostdev=/path/to/dev\n" +"-netdev vhost-vdpa,id=str[,vhostdev=/path/to/dev][,vhostfd=h]\n" "configure a vhost-vdpa network,Establish a vhost-vdpa netdev\n" +"use 'vhostdev=/path/to/dev' to open a vhost vdpa device\n" +"use 'vhostfd=h' to connect to an already opened vhost vdpa device\n" #endif #ifdef CONFIG_VMNET "-netdev vmnet-host,id=str[,isolated=on|off][,net-uuid=uuid]\n" @@ -3296,7 +3298,7 @@ SRST -netdev type=vhost-user,id=net0,chardev=chr0 \ -device virtio-net-pci,netdev=net0 -``-netdev vhost-vdpa,vhostdev=/path/to/dev`` +``-netdev vhost-vdpa[,vhostdev=/path/to/dev][,vhostfd=h]`` Establish a vhost-vdpa netdev. vDPA device is a device that uses a datapath which complies with -- 2.7.4
[PULL 19/26] net: stream: add unix socket
From: Laurent Vivier Signed-off-by: Laurent Vivier Reviewed-by: Stefano Brivio Acked-by: Michael S. Tsirkin Acked-by: Markus Armbruster (QAPI schema) Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Jason Wang --- net/stream.c| 107 +--- qapi/net.json | 2 +- qemu-options.hx | 1 + 3 files changed, 104 insertions(+), 6 deletions(-) diff --git a/net/stream.c b/net/stream.c index e4388fe..884f473 100644 --- a/net/stream.c +++ b/net/stream.c @@ -235,7 +235,7 @@ static NetStreamState *net_stream_fd_init(NetClientState *peer, static void net_stream_accept(void *opaque) { NetStreamState *s = opaque; -struct sockaddr_in saddr; +struct sockaddr_storage saddr; socklen_t len; int fd; @@ -253,8 +253,26 @@ static void net_stream_accept(void *opaque) s->fd = fd; s->nc.link_down = false; net_stream_connect(s); -qemu_set_info_str(&s->nc, "connection from %s:%d", - inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port)); +switch (saddr.ss_family) { +case AF_INET: { +struct sockaddr_in *saddr_in = (struct sockaddr_in *)&saddr; + +qemu_set_info_str(&s->nc, "connection from %s:%d", + inet_ntoa(saddr_in->sin_addr), + ntohs(saddr_in->sin_port)); +break; +} +case AF_UNIX: { +struct sockaddr_un saddr_un; + +len = sizeof(saddr_un); +getsockname(s->listen_fd, (struct sockaddr *)&saddr_un, &len); +qemu_set_info_str(&s->nc, "connect from %s", saddr_un.sun_path); +break; +} +default: +g_assert_not_reached(); +} } static int net_stream_server_init(NetClientState *peer, @@ -294,6 +312,43 @@ static int net_stream_server_init(NetClientState *peer, } break; } +case SOCKET_ADDRESS_TYPE_UNIX: { +struct sockaddr_un saddr_un; + +ret = unlink(addr->u.q_unix.path); +if (ret < 0 && errno != ENOENT) { +error_setg_errno(errp, errno, "failed to unlink socket %s", + addr->u.q_unix.path); +return -1; +} + +saddr_un.sun_family = PF_UNIX; +ret = snprintf(saddr_un.sun_path, sizeof(saddr_un.sun_path), "%s", + addr->u.q_unix.path); +if (ret < 0 || ret >= sizeof(saddr_un.sun_path)) { +error_setg(errp, "UNIX socket path '%s' is too long", + addr->u.q_unix.path); +error_append_hint(errp, "Path must be less than %zu bytes\n", + sizeof(saddr_un.sun_path)); +return -1; +} + +fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0); +if (fd < 0) { +error_setg_errno(errp, errno, "can't create stream socket"); +return -1; +} +qemu_socket_set_nonblock(fd); + +ret = bind(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un)); +if (ret < 0) { +error_setg_errno(errp, errno, "can't create socket with path: %s", + saddr_un.sun_path); +closesocket(fd); +return -1; +} +break; +} case SOCKET_ADDRESS_TYPE_FD: fd = monitor_fd_param(monitor_cur(), addr->u.fd.str, errp); if (fd == -1) { @@ -337,6 +392,7 @@ static int net_stream_client_init(NetClientState *peer, { NetStreamState *s; struct sockaddr_in saddr_in; +struct sockaddr_un saddr_un; int fd, connected, ret; switch (addr->type) { @@ -373,6 +429,45 @@ static int net_stream_client_init(NetClientState *peer, } } break; +case SOCKET_ADDRESS_TYPE_UNIX: +saddr_un.sun_family = PF_UNIX; +ret = snprintf(saddr_un.sun_path, sizeof(saddr_un.sun_path), "%s", + addr->u.q_unix.path); +if (ret < 0 || ret >= sizeof(saddr_un.sun_path)) { +error_setg(errp, "UNIX socket path '%s' is too long", + addr->u.q_unix.path); +error_append_hint(errp, "Path must be less than %zu bytes\n", + sizeof(saddr_un.sun_path)); +return -1; +} + +fd = qemu_socket(PF_UNIX, SOCK_STREAM, 0); +if (fd < 0) { +error_setg_errno(errp, errno, "can't create stream socket"); +return -1; +} +qemu_socket_set_nonblock(fd); + +connected = 0; +for (;;) { +ret = connect(fd, (struct sockaddr *)&saddr_un, sizeof(saddr_un)); +if (ret < 0) { +if (errno == EINTR || errno == EWOULDBLOCK) { +/* continue */ +} else if (errno == EAGAIN || + errno == EALREADY) { +break; +} else { +error_setg_errno(errp, errno, "can't connect socket"); +closesocket(fd); +
[PULL 18/26] net: stream: Don't ignore EINVAL on netdev socket connection
From: Stefano Brivio Other errors are treated as failure by net_stream_client_init(), but if connect() returns EINVAL, we'll fail silently. Remove the related exception. Signed-off-by: Stefano Brivio [lvivier: applied to net/stream.c] Signed-off-by: Laurent Vivier Reviewed-by: Daniel P. Berrangé Reviewed-by: David Gibson Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- net/stream.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/stream.c b/net/stream.c index 0a7e847..e4388fe 100644 --- a/net/stream.c +++ b/net/stream.c @@ -360,8 +360,7 @@ static int net_stream_client_init(NetClientState *peer, if (errno == EINTR || errno == EWOULDBLOCK) { /* continue */ } else if (errno == EINPROGRESS || - errno == EALREADY || - errno == EINVAL) { + errno == EALREADY) { break; } else { error_setg_errno(errp, errno, "can't connect socket"); -- 2.7.4
[PULL 24/26] qemu-sockets: update socket_uri() and socket_parse() to be consistent
From: Laurent Vivier To be consistent with socket_uri(), add 'tcp:' prefix for inet type in socket_parse(), by default socket_parse() use tcp when no prefix is provided (format is host:port). In socket_uri(), use 'vsock:' prefix for vsock type rather than 'tcp:' because it makes a vsock address look like an inet address with CID misinterpreted as host. Goes back to commit 9aca82ba31 "migration: Create socket-address parameter" Signed-off-by: Laurent Vivier Reviewed-by: Dr. David Alan Gilbert Reviewed-by: Markus Armbruster Reviewed-by: David Gibson Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- util/qemu-sockets.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c index 9f6f655..a9926af 100644 --- a/util/qemu-sockets.c +++ b/util/qemu-sockets.c @@ -1090,7 +1090,7 @@ char *socket_uri(SocketAddress *addr) case SOCKET_ADDRESS_TYPE_FD: return g_strdup_printf("fd:%s", addr->u.fd.str); case SOCKET_ADDRESS_TYPE_VSOCK: -return g_strdup_printf("tcp:%s:%s", +return g_strdup_printf("vsock:%s:%s", addr->u.vsock.cid, addr->u.vsock.port); default: @@ -1124,6 +1124,11 @@ SocketAddress *socket_parse(const char *str, Error **errp) if (vsock_parse(&addr->u.vsock, str + strlen("vsock:"), errp)) { goto fail; } +} else if (strstart(str, "tcp:", NULL)) { +addr->type = SOCKET_ADDRESS_TYPE_INET; +if (inet_parse(&addr->u.inet, str + strlen("tcp:"), errp)) { +goto fail; +} } else { addr->type = SOCKET_ADDRESS_TYPE_INET; if (inet_parse(&addr->u.inet, str, errp)) { -- 2.7.4
[PULL 13/26] net: simplify net_client_parse() error management
From: Laurent Vivier All net_client_parse() callers exit in case of error. Move exit(1) to net_client_parse() and remove error checking from the callers. Suggested-by: Markus Armbruster Signed-off-by: Laurent Vivier Reviewed-by: Markus Armbruster Reviewed-by: David Gibson Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- include/net/net.h | 2 +- net/net.c | 6 ++ softmmu/vl.c | 12 +++- 3 files changed, 6 insertions(+), 14 deletions(-) diff --git a/include/net/net.h b/include/net/net.h index c1c34a5..55023e7 100644 --- a/include/net/net.h +++ b/include/net/net.h @@ -220,7 +220,7 @@ extern NICInfo nd_table[MAX_NICS]; extern const char *host_net_devices[]; /* from net.c */ -int net_client_parse(QemuOptsList *opts_list, const char *str); +void net_client_parse(QemuOptsList *opts_list, const char *str); void show_netdevs(void); void net_init_clients(void); void net_check_clients(void); diff --git a/net/net.c b/net/net.c index 5e78880..178257a 100644 --- a/net/net.c +++ b/net/net.c @@ -1583,13 +1583,11 @@ void net_init_clients(void) &error_fatal); } -int net_client_parse(QemuOptsList *opts_list, const char *optarg) +void net_client_parse(QemuOptsList *opts_list, const char *optarg) { if (!qemu_opts_parse_noisily(opts_list, optarg, true)) { -return -1; +exit(1); } - -return 0; } /* From FreeBSD */ diff --git a/softmmu/vl.c b/softmmu/vl.c index a4ae131..e69aa43 100644 --- a/softmmu/vl.c +++ b/softmmu/vl.c @@ -2801,21 +2801,15 @@ void qemu_init(int argc, char **argv) break; case QEMU_OPTION_netdev: default_net = 0; -if (net_client_parse(qemu_find_opts("netdev"), optarg) == -1) { -exit(1); -} +net_client_parse(qemu_find_opts("netdev"), optarg); break; case QEMU_OPTION_nic: default_net = 0; -if (net_client_parse(qemu_find_opts("nic"), optarg) == -1) { -exit(1); -} +net_client_parse(qemu_find_opts("nic"), optarg); break; case QEMU_OPTION_net: default_net = 0; -if (net_client_parse(qemu_find_opts("net"), optarg) == -1) { -exit(1); -} +net_client_parse(qemu_find_opts("net"), optarg); break; #ifdef CONFIG_LIBISCSI case QEMU_OPTION_iscsi: -- 2.7.4
[PULL 03/26] vdpa: Delete duplicated vdpa_feature_bits entry
From: Eugenio Pérez This entry was duplicated on referenced commit. Removing it. Fixes: 402378407dbd ("vhost-vdpa: multiqueue support") Signed-off-by: Eugenio Pérez Acked-by: Jason Wang Signed-off-by: Jason Wang --- net/vhost-vdpa.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c index 4bc3fd0..eebf29f 100644 --- a/net/vhost-vdpa.c +++ b/net/vhost-vdpa.c @@ -63,7 +63,6 @@ const int vdpa_feature_bits[] = { VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_RX_EXTRA, VIRTIO_NET_F_CTRL_VLAN, -VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_CTRL_MAC_ADDR, VIRTIO_NET_F_RSS, VIRTIO_NET_F_MQ, -- 2.7.4
[PULL 07/26] vhost: allocate event_idx fields on vring
From: Eugenio Pérez There was not enough room to accomodate them. Reviewed-by: Michael S. Tsirkin Signed-off-by: Eugenio Pérez Signed-off-by: Jason Wang --- hw/virtio/vhost-shadow-virtqueue.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index 596d443..a518f84 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -570,16 +570,16 @@ void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq, size_t vhost_svq_driver_area_size(const VhostShadowVirtqueue *svq) { size_t desc_size = sizeof(vring_desc_t) * svq->vring.num; -size_t avail_size = offsetof(vring_avail_t, ring) + - sizeof(uint16_t) * svq->vring.num; +size_t avail_size = offsetof(vring_avail_t, ring[svq->vring.num]) + + sizeof(uint16_t); return ROUND_UP(desc_size + avail_size, qemu_real_host_page_size()); } size_t vhost_svq_device_area_size(const VhostShadowVirtqueue *svq) { -size_t used_size = offsetof(vring_used_t, ring) + -sizeof(vring_used_elem_t) * svq->vring.num; +size_t used_size = offsetof(vring_used_t, ring[svq->vring.num]) + + sizeof(uint16_t); return ROUND_UP(used_size, qemu_real_host_page_size()); } -- 2.7.4
[PULL 08/26] vhost: toggle device callbacks using used event idx
From: Eugenio Pérez Actually use the new field of the used ring and tell the device if SVQ wants to be notified. The code is not reachable at the moment. Reviewed-by: Michael S. Tsirkin Signed-off-by: Eugenio Pérez Signed-off-by: Jason Wang --- hw/virtio/vhost-shadow-virtqueue.c | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index a518f84..f5c0fad 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -369,15 +369,27 @@ static bool vhost_svq_more_used(VhostShadowVirtqueue *svq) */ static bool vhost_svq_enable_notification(VhostShadowVirtqueue *svq) { -svq->vring.avail->flags &= ~cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT); -/* Make sure the flag is written before the read of used_idx */ +if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) { +uint16_t *used_event = (uint16_t *)&svq->vring.avail->ring[svq->vring.num]; +*used_event = svq->shadow_used_idx; +} else { +svq->vring.avail->flags &= ~cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT); +} + +/* Make sure the event is enabled before the read of used_idx */ smp_mb(); return !vhost_svq_more_used(svq); } static void vhost_svq_disable_notification(VhostShadowVirtqueue *svq) { -svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT); +/* + * No need to disable notification in the event idx case, since used event + * index is already an index too far away. + */ +if (!virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) { +svq->vring.avail->flags |= cpu_to_le16(VRING_AVAIL_F_NO_INTERRUPT); +} } static uint16_t vhost_svq_last_desc_of_chain(const VhostShadowVirtqueue *svq, -- 2.7.4
[PULL 16/26] qapi: net: add stream and dgram netdevs
From: Laurent Vivier Copied from socket netdev file and modified to use SocketAddress to be able to introduce new features like unix socket. "udp" and "mcast" are squashed into dgram netdev, multicast is detected according to the IP address type. "listen" and "connect" modes are managed by stream netdev. An optional parameter "server" defines the mode (off by default) The two new types need to be parsed the modern way with -netdev, because with the traditional way, the "type" field of netdev structure collides with the "type" field of SocketAddress and prevents the correct evaluation of the command line option. Moreover the traditional way doesn't allow to use the same type (SocketAddress) several times with the -netdev option (needed to specify "local" and "remote" addresses). The previous commit paved the way for parsing the modern way, but omitted one detail: how to pick modern vs. traditional, in netdev_is_modern(). We want to pick based on the value of parameter "type". But how to extract it from the option argument? Parsing the option argument, either the modern or the traditional way, extracts it for us, but only if parsing succeeds. If parsing fails, there is no good option. No matter which parser we pick, it'll be the wrong one for some arguments, and the error reporting will be confusing. Fortunately, the traditional parser accepts *anything* when called in a certain way. This maximizes our chance to extract the value of "type", and in turn minimizes the risk of confusing error reporting. Signed-off-by: Laurent Vivier Reviewed-by: Stefano Brivio Acked-by: Markus Armbruster Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- hmp-commands.hx | 2 +- net/clients.h | 6 + net/dgram.c | 537 net/hub.c | 2 + net/meson.build | 2 + net/net.c | 30 +++- net/stream.c| 425 qapi/net.json | 66 ++- qemu-options.hx | 12 ++ 9 files changed, 1078 insertions(+), 4 deletions(-) create mode 100644 net/dgram.c create mode 100644 net/stream.c diff --git a/hmp-commands.hx b/hmp-commands.hx index 12b6d4e..673e39a 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1276,7 +1276,7 @@ ERST { .name = "netdev_add", .args_type = "netdev:O", -.params = "[user|tap|socket|vde|bridge|hubport|netmap|vhost-user" +.params = "[user|tap|socket|stream|dgram|vde|bridge|hubport|netmap|vhost-user" #ifdef CONFIG_VMNET "|vmnet-host|vmnet-shared|vmnet-bridged" #endif diff --git a/net/clients.h b/net/clients.h index c915778..ed8bdff 100644 --- a/net/clients.h +++ b/net/clients.h @@ -40,6 +40,12 @@ int net_init_hubport(const Netdev *netdev, const char *name, int net_init_socket(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp); +int net_init_stream(const Netdev *netdev, const char *name, +NetClientState *peer, Error **errp); + +int net_init_dgram(const Netdev *netdev, const char *name, + NetClientState *peer, Error **errp); + int net_init_tap(const Netdev *netdev, const char *name, NetClientState *peer, Error **errp); diff --git a/net/dgram.c b/net/dgram.c new file mode 100644 index 000..5339585 --- /dev/null +++ b/net/dgram.c @@ -0,0 +1,537 @@ +/* + * QEMU System Emulator + * + * Copyright (c) 2003-2008 Fabrice Bellard + * Copyright (c) 2022 Red Hat, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "qemu/osdep.h" + +#include "net/net.h" +#include "clients.h" +#include "monitor/monitor.h" +#include "qapi/error.h" +#include "qemu/error-report.h" +#include "qemu/option.h" +#include "qemu/sockets.h" +#include "qemu/iov.h" +#include "qemu/main-loop.h" +#include "qemu/cutils.h" + +typedef struct NetDgramState { +NetClientState nc; +
[PULL 09/26] vhost: use avail event idx on vhost_svq_kick
From: Eugenio Pérez So SVQ code knows if an event is needed. The code is not reachable at the moment. Reviewed-by: Michael S. Tsirkin Signed-off-by: Eugenio Pérez Signed-off-by: Jason Wang --- hw/virtio/vhost-shadow-virtqueue.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index f5c0fad..f306ebe 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -218,12 +218,22 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, static void vhost_svq_kick(VhostShadowVirtqueue *svq) { +bool needs_kick; + /* * We need to expose the available array entries before checking the used * flags */ smp_mb(); -if (svq->vring.used->flags & VRING_USED_F_NO_NOTIFY) { + +if (virtio_vdev_has_feature(svq->vdev, VIRTIO_RING_F_EVENT_IDX)) { +uint16_t avail_event = *(uint16_t *)(&svq->vring.used->ring[svq->vring.num]); +needs_kick = vring_need_event(avail_event, svq->shadow_avail_idx, svq->shadow_avail_idx - 1); +} else { +needs_kick = !(svq->vring.used->flags & VRING_USED_F_NO_NOTIFY); +} + +if (!needs_kick) { return; } -- 2.7.4
[PULL 21/26] net: dgram: move mcast specific code from net_socket_fd_init_dgram()
From: Laurent Vivier It is less complex to manage special cases directly in net_dgram_mcast_init() and net_dgram_udp_init(). Signed-off-by: Laurent Vivier Reviewed-by: Stefano Brivio Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- net/dgram.c | 143 1 file changed, 76 insertions(+), 67 deletions(-) diff --git a/net/dgram.c b/net/dgram.c index e20be9c..e581cc6 100644 --- a/net/dgram.c +++ b/net/dgram.c @@ -259,52 +259,11 @@ static NetClientInfo net_dgram_socket_info = { static NetDgramState *net_dgram_fd_init(NetClientState *peer, const char *model, const char *name, -int fd, int is_fd, -SocketAddress *mcast, +int fd, Error **errp) { -struct sockaddr_in *saddr = NULL; -int newfd; NetClientState *nc; NetDgramState *s; -SocketAddress *sa; -SocketAddressType sa_type; - -sa = socket_local_address(fd, errp); -if (!sa) { -return NULL; -} -sa_type = sa->type; -qapi_free_SocketAddress(sa); - -/* - * fd passed: multicast: "learn" dest_addr address from bound address and - * save it. Because this may be "shared" socket from a "master" process, - * datagrams would be recv() by ONLY ONE process: we must "clone" this - * dgram socket --jjo - */ - -if (is_fd && mcast != NULL) { -saddr = g_new(struct sockaddr_in, 1); - -if (convert_host_port(saddr, mcast->u.inet.host, mcast->u.inet.port, - errp) < 0) { -goto err; -} -/* must be bound */ -if (saddr->sin_addr.s_addr == 0) { -error_setg(errp, "can't setup multicast destination address"); -goto err; -} -/* clone dgram socket */ -newfd = net_dgram_mcast_create(saddr, NULL, errp); -if (newfd < 0) { -goto err; -} -/* clone newfd to fd, close newfd */ -dup2(newfd, fd); -close(newfd); -} nc = qemu_new_net_client(&net_dgram_socket_info, peer, model, name); @@ -314,23 +273,7 @@ static NetDgramState *net_dgram_fd_init(NetClientState *peer, net_socket_rs_init(&s->rs, net_dgram_rs_finalize, false); net_dgram_read_poll(s, true); -/* mcast: save bound address as dst */ -if (saddr) { -g_assert(s->dest_addr == NULL); -s->dest_addr = (struct sockaddr *)saddr; -s->dest_len = sizeof(*saddr); -qemu_set_info_str(nc, "fd=%d (cloned mcast=%s:%d)", fd, - inet_ntoa(saddr->sin_addr), ntohs(saddr->sin_port)); -} else { -qemu_set_info_str(nc, "fd=%d %s", fd, SocketAddressType_str(sa_type)); -} - return s; - -err: -g_free(saddr); -closesocket(fd); -return NULL; } static int net_dgram_mcast_init(NetClientState *peer, @@ -381,7 +324,9 @@ static int net_dgram_mcast_init(NetClientState *peer, } break; } -case SOCKET_ADDRESS_TYPE_FD: +case SOCKET_ADDRESS_TYPE_FD: { +int newfd; + fd = monitor_fd_param(monitor_cur(), local->u.fd.str, errp); if (fd == -1) { g_free(saddr); @@ -394,7 +339,42 @@ static int net_dgram_mcast_init(NetClientState *peer, name, fd); return -1; } + +/* + * fd passed: multicast: "learn" dest_addr address from bound + * address and save it. Because this may be "shared" socket from a + * "master" process, datagrams would be recv() by ONLY ONE process: + * we must "clone" this dgram socket --jjo + */ + +saddr = g_new(struct sockaddr_in, 1); + +if (convert_host_port(saddr, local->u.inet.host, local->u.inet.port, + errp) < 0) { +g_free(saddr); +closesocket(fd); +return -1; +} + +/* must be bound */ +if (saddr->sin_addr.s_addr == 0) { +error_setg(errp, "can't setup multicast destination address"); +g_free(saddr); +closesocket(fd); +return -1; +} +/* clone dgram socket */ +newfd = net_dgram_mcast_create(saddr, NULL, errp); +if (newfd < 0) { +g_free(saddr); +closesocket(fd); +return -1; +} +/* clone newfd to fd, close newfd */ +dup2(newfd, fd); +close(newfd); break; +} default: g_free(saddr);
[PULL 10/26] vhost: Accept event idx flag
From: Eugenio Pérez Enabling all the code path created before. Reviewed-by: Michael S. Tsirkin Signed-off-by: Eugenio Pérez Signed-off-by: Jason Wang --- hw/virtio/vhost-shadow-virtqueue.c | 1 + 1 file changed, 1 insertion(+) diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c index f306ebe..5bd14ca 100644 --- a/hw/virtio/vhost-shadow-virtqueue.c +++ b/hw/virtio/vhost-shadow-virtqueue.c @@ -33,6 +33,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp) ++b) { switch (b) { case VIRTIO_F_ANY_LAYOUT: +case VIRTIO_RING_F_EVENT_IDX: continue; case VIRTIO_F_ACCESS_PLATFORM: -- 2.7.4
[PULL 20/26] net: dgram: make dgram_dst generic
From: Laurent Vivier dgram_dst is a sockaddr_in structure. To be able to use it with unix socket, use a pointer to a generic sockaddr structure. Rename it dest_addr, and store socket length in dest_len. Signed-off-by: Laurent Vivier Reviewed-by: Stefano Brivio Acked-by: Michael S. Tsirkin Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Jason Wang --- net/dgram.c | 82 +++-- 1 file changed, 53 insertions(+), 29 deletions(-) diff --git a/net/dgram.c b/net/dgram.c index 5339585..e20be9c 100644 --- a/net/dgram.c +++ b/net/dgram.c @@ -40,9 +40,11 @@ typedef struct NetDgramState { NetClientState nc; int fd; SocketReadState rs; -struct sockaddr_in dgram_dst; /* contains destination iff connectionless */ bool read_poll; /* waiting to receive data? */ bool write_poll; /* waiting to transmit data? */ +/* contains destination iff connectionless */ +struct sockaddr *dest_addr; +socklen_t dest_len; } NetDgramState; static void net_dgram_send(void *opaque); @@ -84,10 +86,8 @@ static ssize_t net_dgram_receive(NetClientState *nc, ssize_t ret; do { -if (s->dgram_dst.sin_family != AF_UNIX) { -ret = sendto(s->fd, buf, size, 0, - (struct sockaddr *)&s->dgram_dst, - sizeof(s->dgram_dst)); +if (s->dest_addr) { +ret = sendto(s->fd, buf, size, 0, s->dest_addr, s->dest_len); } else { ret = send(s->fd, buf, size, 0); } @@ -244,6 +244,9 @@ static void net_dgram_cleanup(NetClientState *nc) close(s->fd); s->fd = -1; } +g_free(s->dest_addr); +s->dest_addr = NULL; +s->dest_len = 0; } static NetClientInfo net_dgram_socket_info = { @@ -260,7 +263,7 @@ static NetDgramState *net_dgram_fd_init(NetClientState *peer, SocketAddress *mcast, Error **errp) { -struct sockaddr_in saddr; +struct sockaddr_in *saddr = NULL; int newfd; NetClientState *nc; NetDgramState *s; @@ -275,31 +278,32 @@ static NetDgramState *net_dgram_fd_init(NetClientState *peer, qapi_free_SocketAddress(sa); /* - * fd passed: multicast: "learn" dgram_dst address from bound address and + * fd passed: multicast: "learn" dest_addr address from bound address and * save it. Because this may be "shared" socket from a "master" process, * datagrams would be recv() by ONLY ONE process: we must "clone" this * dgram socket --jjo */ if (is_fd && mcast != NULL) { -if (convert_host_port(&saddr, mcast->u.inet.host, - mcast->u.inet.port, errp) < 0) { +saddr = g_new(struct sockaddr_in, 1); + +if (convert_host_port(saddr, mcast->u.inet.host, mcast->u.inet.port, + errp) < 0) { goto err; } /* must be bound */ -if (saddr.sin_addr.s_addr == 0) { +if (saddr->sin_addr.s_addr == 0) { error_setg(errp, "can't setup multicast destination address"); goto err; } /* clone dgram socket */ -newfd = net_dgram_mcast_create(&saddr, NULL, errp); +newfd = net_dgram_mcast_create(saddr, NULL, errp); if (newfd < 0) { goto err; } /* clone newfd to fd, close newfd */ dup2(newfd, fd); close(newfd); - } nc = qemu_new_net_client(&net_dgram_socket_info, peer, model, name); @@ -311,21 +315,20 @@ static NetDgramState *net_dgram_fd_init(NetClientState *peer, net_dgram_read_poll(s, true); /* mcast: save bound address as dst */ -if (is_fd && mcast != NULL) { -s->dgram_dst = saddr; +if (saddr) { +g_assert(s->dest_addr == NULL); +s->dest_addr = (struct sockaddr *)saddr; +s->dest_len = sizeof(*saddr); qemu_set_info_str(nc, "fd=%d (cloned mcast=%s:%d)", fd, - inet_ntoa(saddr.sin_addr), ntohs(saddr.sin_port)); + inet_ntoa(saddr->sin_addr), ntohs(saddr->sin_port)); } else { -if (sa_type == SOCKET_ADDRESS_TYPE_UNIX) { -s->dgram_dst.sin_family = AF_UNIX; -} - qemu_set_info_str(nc, "fd=%d %s", fd, SocketAddressType_str(sa_type)); } return s; err: +g_free(saddr); closesocket(fd); return NULL; } @@ -339,21 +342,24 @@ static int net_dgram_mcast_init(NetClientState *peer, { NetDgramState *s; int fd, ret; -struct sockaddr_in saddr; +struct sockaddr_in *saddr; if (remote->type != SOCKET_ADDRESS_TYPE_INET) { error_setg(errp, "multicast only support inet type"); return -1; } -if (convert_host_port(&saddr, re
[PULL 00/26] Net patches
The following changes since commit 344744e148e6e865f5a57e745b02a87e5ea534ad: Merge tag 'dump-pull-request' of https://gitlab.com/marcandre.lureau/qemu into staging (2022-10-26 10:53:49 -0400) are available in the git repository at: https://github.com/jasowang/qemu.git tags/net-pull-request for you to fetch changes up to e506fee8b1e092f6ac6f9459bf6a35b807644ad2: net: stream: add QAPI events to report connection state (2022-10-28 13:28:52 +0800) Daniel P. Berrangé (1): net: improve error message for missing netdev backend Eugenio Pérez (6): vdpa: Delete duplicated vdpa_feature_bits entry vdpa: Remove shadow CVQ command check vhost: allocate event_idx fields on vring vhost: toggle device callbacks using used event idx vhost: use avail event idx on vhost_svq_kick vhost: Accept event idx flag Laurent Vivier (16): virtio-net: fix bottom-half packet TX on asynchronous completion virtio-net: fix TX timer with tx_burst net: introduce convert_host_port() net: remove the @errp argument of net_client_inits() net: simplify net_client_parse() error management qapi: net: introduce a way to bypass qemu_opts_parse_noisily() net: introduce qemu_set_info_str() function qapi: net: add stream and dgram netdevs net: stream: add unix socket net: dgram: make dgram_dst generic net: dgram: move mcast specific code from net_socket_fd_init_dgram() net: dgram: add unix socket qemu-sockets: move and rename SocketAddress_to_str() qemu-sockets: update socket_uri() and socket_parse() to be consistent net: stream: move to QIO to enable additional parameters net: stream: add QAPI events to report connection state Si-Wei Liu (1): vhost-vdpa: allow passing opened vhostfd to vhost-vdpa Stefano Brivio (2): net: socket: Don't ignore EINVAL on netdev socket connection net: stream: Don't ignore EINVAL on netdev socket connection hmp-commands.hx| 2 +- hw/net/virtio-net.c| 59 +++- hw/net/xen_nic.c | 5 +- hw/virtio/vhost-shadow-virtqueue.c | 39 ++- include/net/net.h | 7 +- include/qemu/sockets.h | 4 +- monitor/hmp-cmds.c | 23 +- net/clients.h | 6 + net/dgram.c| 623 + net/hub.c | 2 + net/l2tpv3.c | 3 +- net/meson.build| 2 + net/net.c | 204 net/slirp.c| 5 +- net/socket.c | 36 +-- net/stream.c | 386 +++ net/tap-win32.c| 3 +- net/tap.c | 13 +- net/vde.c | 3 +- net/vhost-user.c | 3 +- net/vhost-vdpa.c | 76 ++--- qapi/net.json | 118 ++- qemu-options.hx| 20 +- softmmu/vl.c | 16 +- util/qemu-sockets.c| 25 ++ 25 files changed, 1473 insertions(+), 210 deletions(-) create mode 100644 net/dgram.c create mode 100644 net/stream.c
[PULL 22/26] net: dgram: add unix socket
From: Laurent Vivier Signed-off-by: Laurent Vivier Reviewed-by: Stefano Brivio Reviewed-by: David Gibson Acked-by: Michael S. Tsirkin Acked-by: Markus Armbruster (QAPI schema) Signed-off-by: Jason Wang --- net/dgram.c | 55 ++- qapi/net.json | 2 +- qemu-options.hx | 1 + 3 files changed, 56 insertions(+), 2 deletions(-) diff --git a/net/dgram.c b/net/dgram.c index e581cc6..9f7bf38 100644 --- a/net/dgram.c +++ b/net/dgram.c @@ -426,6 +426,7 @@ int net_init_dgram(const Netdev *netdev, const char *name, SocketAddress *remote, *local; struct sockaddr *dest_addr; struct sockaddr_in laddr_in, raddr_in; +struct sockaddr_un laddr_un, raddr_un; socklen_t dest_len; assert(netdev->type == NET_CLIENT_DRIVER_DGRAM); @@ -465,7 +466,8 @@ int net_init_dgram(const Netdev *netdev, const char *name, } } else { if (local->type != SOCKET_ADDRESS_TYPE_FD) { -error_setg(errp, "type=inet requires remote parameter"); +error_setg(errp, + "type=inet or type=unix requires remote parameter"); return -1; } } @@ -508,6 +510,53 @@ int net_init_dgram(const Netdev *netdev, const char *name, dest_addr = g_malloc(dest_len); memcpy(dest_addr, &raddr_in, dest_len); break; +case SOCKET_ADDRESS_TYPE_UNIX: +ret = unlink(local->u.q_unix.path); +if (ret < 0 && errno != ENOENT) { +error_setg_errno(errp, errno, "failed to unlink socket %s", + local->u.q_unix.path); +return -1; +} + +laddr_un.sun_family = PF_UNIX; +ret = snprintf(laddr_un.sun_path, sizeof(laddr_un.sun_path), "%s", + local->u.q_unix.path); +if (ret < 0 || ret >= sizeof(laddr_un.sun_path)) { +error_setg(errp, "UNIX socket path '%s' is too long", + local->u.q_unix.path); +error_append_hint(errp, "Path must be less than %zu bytes\n", + sizeof(laddr_un.sun_path)); +} + +raddr_un.sun_family = PF_UNIX; +ret = snprintf(raddr_un.sun_path, sizeof(raddr_un.sun_path), "%s", + remote->u.q_unix.path); +if (ret < 0 || ret >= sizeof(raddr_un.sun_path)) { +error_setg(errp, "UNIX socket path '%s' is too long", + remote->u.q_unix.path); +error_append_hint(errp, "Path must be less than %zu bytes\n", + sizeof(raddr_un.sun_path)); +} + +fd = qemu_socket(PF_UNIX, SOCK_DGRAM, 0); +if (fd < 0) { +error_setg_errno(errp, errno, "can't create datagram socket"); +return -1; +} + +ret = bind(fd, (struct sockaddr *)&laddr_un, sizeof(laddr_un)); +if (ret < 0) { +error_setg_errno(errp, errno, "can't bind unix=%s to socket", + laddr_un.sun_path); +closesocket(fd); +return -1; +} +qemu_socket_set_nonblock(fd); + +dest_len = sizeof(raddr_un); +dest_addr = g_malloc(dest_len); +memcpy(dest_addr, &raddr_un, dest_len); +break; case SOCKET_ADDRESS_TYPE_FD: fd = monitor_fd_param(monitor_cur(), local->u.fd.str, errp); if (fd == -1) { @@ -546,6 +595,10 @@ int net_init_dgram(const Netdev *netdev, const char *name, inet_ntoa(raddr_in.sin_addr), ntohs(raddr_in.sin_port)); break; +case SOCKET_ADDRESS_TYPE_UNIX: +qemu_set_info_str(&s->nc, "udp=%s:%s", + laddr_un.sun_path, raddr_un.sun_path); +break; case SOCKET_ADDRESS_TYPE_FD: { SocketAddress *sa; SocketAddressType sa_type; diff --git a/qapi/net.json b/qapi/net.json index b4768ce..cbc8d77 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -603,7 +603,7 @@ # @remote: remote address # @local: local address # -# Only SocketAddress types 'inet' and 'fd' are supported. +# Only SocketAddress types 'unix', 'inet' and 'fd' are supported. # # If remote address is present and it's a multicast address, local address # is optional. Otherwise local address is required and remote address is diff --git a/qemu-options.hx b/qemu-options.hx index 858f3dc..e76142b 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2782,6 +2782,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev, "configure a network backend to connect to a multicast maddr and port\n" "use ``local.host=addr`` to specify the host address to send packets from\n" "-netdev dgram,id=str,local.type=inet,local.host=addr,local.port=port[,remote.type=inet,remote.host=addr,remote.port=port]\n" +"-netdev dgram,id=str,local.type=unix,local.path=path[,remote.type=unix,remote.path=path]\n" "-netdev dgram,id=st
[PULL 06/26] net: improve error message for missing netdev backend
From: Daniel P. Berrangé The current message when using '-net user...' with SLIRP disabled at compile time is: qemu-system-x86_64: -net user: Parameter 'type' expects a net backend type (maybe it is not compiled into this binary) An observation is that we're using the 'netdev->type' field here which is an enum value, produced after QAPI has converted from its string form. IOW, at this point in the code, we know that the user's specified type name was a valid network backend. The only possible scenario that can make the backend init function be NULL, is if support for that backend was disabled at build time. Given this, we don't need to caveat our error message with a 'maybe' hint, we can be totally explicit. The use of QERR_INVALID_PARAMETER_VALUE doesn't really lend itself to user friendly error message text. Since this is not used to set a specific QAPI error class, we can simply stop using this pre-formatted error text and provide something better. Thus the new message is: qemu-system-x86_64: -net user: network backend 'user' is not compiled into this binary The case of passing 'hubport' for -net is also given a message reminding people they should have used -netdev/-nic instead, as this backend type is only valid for the modern syntax. Reviewed-by: Marc-André Lureau Reviewed-by: Thomas Huth Signed-off-by: Daniel P. Berrangé Signed-off-by: Jason Wang --- net/net.c | 18 +++--- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/net/net.c b/net/net.c index 2db160e..8ddafac 100644 --- a/net/net.c +++ b/net/net.c @@ -1036,19 +1036,23 @@ static int net_client_init1(const Netdev *netdev, bool is_netdev, Error **errp) if (is_netdev) { if (netdev->type == NET_CLIENT_DRIVER_NIC || !net_client_init_fun[netdev->type]) { -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type", - "a netdev backend type"); +error_setg(errp, "network backend '%s' is not compiled into this binary", + NetClientDriver_str(netdev->type)); return -1; } } else { if (netdev->type == NET_CLIENT_DRIVER_NONE) { return 0; /* nothing to do */ } -if (netdev->type == NET_CLIENT_DRIVER_HUBPORT || -!net_client_init_fun[netdev->type]) { -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type", - "a net backend type (maybe it is not compiled " - "into this binary)"); +if (netdev->type == NET_CLIENT_DRIVER_HUBPORT) { +error_setg(errp, "network backend '%s' is only supported with -netdev/-nic", + NetClientDriver_str(netdev->type)); +return -1; +} + +if (!net_client_init_fun[netdev->type]) { +error_setg(errp, "network backend '%s' is not compiled into this binary", + NetClientDriver_str(netdev->type)); return -1; } -- 2.7.4
[PULL 02/26] virtio-net: fix TX timer with tx_burst
From: Laurent Vivier When virtio_net_flush_tx() reaches the tx_burst value all the queue is not flushed and nothing restart the timer. Fix that by doing for TX timer as we do for bottom half TX: rearming the timer if we find any packet to send during the virtio_net_flush_tx() call. Fixes: e3f30488e5f8 ("virtio-net: Limit number of packets sent per TX flush") Cc: alex.william...@redhat.com Signed-off-by: Laurent Vivier Reviewed-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/net/virtio-net.c | 50 +- 1 file changed, 41 insertions(+), 9 deletions(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 1fbf2f3..b6903ae 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2536,14 +2536,19 @@ static void virtio_net_tx_complete(NetClientState *nc, ssize_t len) virtio_queue_set_notification(q->tx_vq, 1); ret = virtio_net_flush_tx(q); -if (q->tx_bh && ret >= n->tx_burst) { +if (ret >= n->tx_burst) { /* * the flush has been stopped by tx_burst * we will not receive notification for the * remainining part, so re-schedule */ virtio_queue_set_notification(q->tx_vq, 0); -qemu_bh_schedule(q->tx_bh); +if (q->tx_bh) { +qemu_bh_schedule(q->tx_bh); +} else { +timer_mod(q->tx_timer, + qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout); +} q->tx_waiting = 1; } } @@ -2644,6 +2649,8 @@ drop: return num_packets; } +static void virtio_net_tx_timer(void *opaque); + static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq) { VirtIONet *n = VIRTIO_NET(vdev); @@ -2661,15 +2668,13 @@ static void virtio_net_handle_tx_timer(VirtIODevice *vdev, VirtQueue *vq) } if (q->tx_waiting) { -virtio_queue_set_notification(vq, 1); +/* We already have queued packets, immediately flush */ timer_del(q->tx_timer); -q->tx_waiting = 0; -if (virtio_net_flush_tx(q) == -EINVAL) { -return; -} +virtio_net_tx_timer(q); } else { +/* re-arm timer to flush it (and more) on next tick */ timer_mod(q->tx_timer, - qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout); + qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout); q->tx_waiting = 1; virtio_queue_set_notification(vq, 0); } @@ -2702,6 +2707,8 @@ static void virtio_net_tx_timer(void *opaque) VirtIONetQueue *q = opaque; VirtIONet *n = q->n; VirtIODevice *vdev = VIRTIO_DEVICE(n); +int ret; + /* This happens when device was stopped but BH wasn't. */ if (!vdev->vm_running) { /* Make sure tx waiting is set, so we'll run when restarted. */ @@ -2716,8 +2723,33 @@ static void virtio_net_tx_timer(void *opaque) return; } +ret = virtio_net_flush_tx(q); +if (ret == -EBUSY || ret == -EINVAL) { +return; +} +/* + * If we flush a full burst of packets, assume there are + * more coming and immediately rearm + */ +if (ret >= n->tx_burst) { +q->tx_waiting = 1; +timer_mod(q->tx_timer, + qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout); +return; +} +/* + * If less than a full burst, re-enable notification and flush + * anything that may have come in while we weren't looking. If + * we find something, assume the guest is still active and rearm + */ virtio_queue_set_notification(q->tx_vq, 1); -virtio_net_flush_tx(q); +ret = virtio_net_flush_tx(q); +if (ret > 0) { +virtio_queue_set_notification(q->tx_vq, 0); +q->tx_waiting = 1; +timer_mod(q->tx_timer, + qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout); +} } static void virtio_net_tx_bh(void *opaque) -- 2.7.4
[PULL 26/26] net: stream: add QAPI events to report connection state
From: Laurent Vivier The netdev reports NETDEV_STREAM_CONNECTED event when the backend is connected, and NETDEV_STREAM_DISCONNECTED when it is disconnected. The NETDEV_STREAM_CONNECTED event includes the destination address. This allows a system manager like libvirt to detect when the server fails. For instance with passt: { 'execute': 'qmp_capabilities' } { "return": { } } { "timestamp": { "seconds": 1666341395, "microseconds": 505347 }, "event": "NETDEV_STREAM_CONNECTED", "data": { "netdev-id": "netdev0", "addr": { "path": "/tmp/passt_1.socket", "type": "unix" } } } [killing passt here] { "timestamp": { "seconds": 1666341430, "microseconds": 968694 }, "event": "NETDEV_STREAM_DISCONNECTED", "data": { "netdev-id": "netdev0" } } Signed-off-by: Laurent Vivier Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- net/stream.c | 5 + qapi/net.json | 49 + 2 files changed, 54 insertions(+) diff --git a/net/stream.c b/net/stream.c index 54c67e1..53b7040 100644 --- a/net/stream.c +++ b/net/stream.c @@ -38,6 +38,7 @@ #include "io/channel.h" #include "io/channel-socket.h" #include "io/net-listener.h" +#include "qapi/qapi-events-net.h" typedef struct NetStreamState { NetClientState nc; @@ -168,6 +169,8 @@ static gboolean net_stream_send(QIOChannel *ioc, s->nc.link_down = true; qemu_set_info_str(&s->nc, ""); +qapi_event_send_netdev_stream_disconnected(s->nc.name); + return G_SOURCE_REMOVE; } buf = buf1; @@ -244,6 +247,7 @@ static void net_stream_listen(QIONetListener *listener, uri = socket_uri(addr); qemu_set_info_str(&s->nc, uri); g_free(uri); +qapi_event_send_netdev_stream_connected(s->nc.name, addr); qapi_free_SocketAddress(addr); } @@ -335,6 +339,7 @@ static void net_stream_client_connected(QIOTask *task, gpointer opaque) s->ioc_read_tag = qio_channel_add_watch(s->ioc, G_IO_IN, net_stream_send, s, NULL); s->nc.link_down = false; +qapi_event_send_netdev_stream_connected(s->nc.name, addr); qapi_free_SocketAddress(addr); return; diff --git a/qapi/net.json b/qapi/net.json index cbc8d77..522ac58 100644 --- a/qapi/net.json +++ b/qapi/net.json @@ -898,3 +898,52 @@ ## { 'event': 'FAILOVER_NEGOTIATED', 'data': {'device-id': 'str'} } + +## +# @NETDEV_STREAM_CONNECTED: +# +# Emitted when the netdev stream backend is connected +# +# @netdev-id: QEMU netdev id that is connected +# @addr: The destination address +# +# Since: 7.2 +# +# Example: +# +# <- { "event": "NETDEV_STREAM_CONNECTED", +# "data": { "netdev-id": "netdev0", +#"addr": { "port": "47666", "ipv6": true, +# "host": "::1", "type": "inet" } }, +# "timestamp": { "seconds": 1666269863, "microseconds": 311222 } } +# +# or +# +# <- { "event": "NETDEV_STREAM_CONNECTED", +# "data": { "netdev-id": "netdev0", +#"addr": { "path": "/tmp/qemu0", "type": "unix" } }, +# "timestamp": { "seconds": 1666269706, "microseconds": 413651 } } +# +## +{ 'event': 'NETDEV_STREAM_CONNECTED', + 'data': { 'netdev-id': 'str', +'addr': 'SocketAddress' } } + +## +# @NETDEV_STREAM_DISCONNECTED: +# +# Emitted when the netdev stream backend is disconnected +# +# @netdev-id: QEMU netdev id that is disconnected +# +# Since: 7.2 +# +# Example: +# +# <- { 'event': 'NETDEV_STREAM_DISCONNECTED', +# 'data': {'netdev-id': 'netdev0'}, +# 'timestamp': {'seconds': 1663330937, 'microseconds': 526695} } +# +## +{ 'event': 'NETDEV_STREAM_DISCONNECTED', + 'data': { 'netdev-id': 'str' } } -- 2.7.4
[PULL 01/26] virtio-net: fix bottom-half packet TX on asynchronous completion
From: Laurent Vivier When virtio-net is used with the socket netdev backend, the backend can be busy and not able to collect new packets. In this case, net_socket_receive() returns 0 and registers a poll function to detect when the socket is ready again. In virtio_net_tx_bh(), virtio_net_flush_tx() forwards the 0, the virtio notifications are disabled and the function is not re-scheduled, waiting for the backend to be ready. When the socket netdev backend is again able to send packets, the poll function re-starts to flush remaining packets. This is done by calling virtio_net_tx_complete(). It re-enables notifications and calls again virtio_net_flush_tx(). But it seems if virtio_net_flush_tx() reaches the tx_burst value all the queue is not flushed and no new notification is sent to re-schedule virtio_net_tx_bh(). Nothing re-start to flush the queue and remaining packets are stuck in the queue. To fix that, detect in virtio_net_tx_complete() if virtio_net_flush_tx() has been stopped by tx_burst and if yes re-schedule the bottom half function virtio_net_tx_bh() to flush the remaining packets. This is what is done in virtio_net_tx_bh() when the virtio_net_flush_tx() is synchronous, and completly by-passed when the operation needs to be asynchronous. Fixes: a697a334b3c4 ("virtio-net: Introduce a new bottom half packet TX") Cc: alex.william...@redhat.com Signed-off-by: Laurent Vivier Reviewed-by: Michael S. Tsirkin Acked-by: Michael S. Tsirkin Signed-off-by: Jason Wang --- hw/net/virtio-net.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index e9f696b..1fbf2f3 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -2526,6 +2526,7 @@ static void virtio_net_tx_complete(NetClientState *nc, ssize_t len) VirtIONet *n = qemu_get_nic_opaque(nc); VirtIONetQueue *q = virtio_net_get_subqueue(nc); VirtIODevice *vdev = VIRTIO_DEVICE(n); +int ret; virtqueue_push(q->tx_vq, q->async_tx.elem, 0); virtio_notify(vdev, q->tx_vq); @@ -2534,7 +2535,17 @@ static void virtio_net_tx_complete(NetClientState *nc, ssize_t len) q->async_tx.elem = NULL; virtio_queue_set_notification(q->tx_vq, 1); -virtio_net_flush_tx(q); +ret = virtio_net_flush_tx(q); +if (q->tx_bh && ret >= n->tx_burst) { +/* + * the flush has been stopped by tx_burst + * we will not receive notification for the + * remainining part, so re-schedule + */ +virtio_queue_set_notification(q->tx_vq, 0); +qemu_bh_schedule(q->tx_bh); +q->tx_waiting = 1; +} } /* TX */ -- 2.7.4
megasa regression in 7.1?
I pulled 7.1, and the megasas driver stopped being able to do reads from a disk. It looks to be related to this commit: https://github.com/qemu/qemu/commit/fe9d8927e265fd723a6dc87cd6d220f4677dbe1f#diffe3f5f30efc54747e0624dca63e5f55f0012736c1875b6e85526b3514e6911be3 which added some command buffer bounds checking to the SCSI subsysem. Unfortunately, when the megasas QEMU emulation receives a direct I/O command from the device driver in megasas_handle_io(), it synthesizes a SCSI command from it in megasas_encode_lba(), but passes the command buffer length from the driver frame instead of the length of the buffer it synthesized the SCSI command in. The driver (at least the Linux 4.14 version I’m using) does not fill in the command buffer length in direct I/O frames, so scsi_req_new() sees a 0 length command and fails it. I worked around this issue with: diff --git a/hw/scsi/megasas.c b/hw/scsi/megasas.c index 7082456..6e11607 100644 --- a/hw/scsi/megasas.c +++ b/hw/scsi/megasas.c @@ -1823,7 +1823,7 @@ static int megasas_handle_io(MegasasState *s, MegasasCmd *cmd, int frame_cmd) megasas_encode_lba(cdb, lba_start, lba_count, is_write); cmd->req = scsi_req_new(sdev, cmd->index, -lun_id, cdb, cdb_len, cmd); +lun_id, cdb, sizeof (cdb), cmd); if (!cmd->req) { trace_megasas_scsi_req_alloc_failed( mfi_frame_desc(frame_cmd), target_id, lun_id); and the driver can read the disk again, but I’m not sure this is the correct fix since cdb_len is used for bounds checking elsewhere in megagsas_handle_io(), although a 0 won’t fail there. Is there anyone with megasas experience who could comment on this? Thanks JJ
[PATCH v6 11/11] tests/qtest: Enable qtest build on Windows
Now that we have fixed various test case issues as seen when running on Windows, let's enable the qtest build on Windows. Signed-off-by: Bin Meng Reviewed-by: Thomas Huth --- Changes in v5: - Drop patches that are already merged Changes in v3: - Drop the host test Changes in v2: - new patch: "tests/qtest: Enable qtest build on Windows" tests/qtest/meson.build | 6 -- 1 file changed, 6 deletions(-) diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build index c07a5b1a5f..f0ebb5fac6 100644 --- a/tests/qtest/meson.build +++ b/tests/qtest/meson.build @@ -1,9 +1,3 @@ -# All QTests for now are POSIX-only, but the dependencies are -# really in libqtest, not in the testcases themselves. -if not config_host.has_key('CONFIG_POSIX') - subdir_done() -endif - slow_qtests = { 'ahci-test' : 60, 'bios-tables-test' : 120, -- 2.25.1
[PATCH v6 09/11] tests/qtest: libqtest: Correct the timeout unit of blocking receive calls for win32
Some qtest cases don't get response from the QEMU executable under test in time on Windows. It turns out that the socket receive call got timeout before it receive the complete response. The timeout value is supposed to be set to 50 seconds via the setsockopt() call, but there is a difference among platforms. The timeout unit of blocking receive calls is measured in seconds on non-Windows platforms but milliseconds on Windows. Signed-off-by: Bin Meng Reviewed-by: Marc-André Lureau --- (no changes since v1) tests/qtest/libqtest.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c index e1e2d39a6e..2fbc3b88f3 100644 --- a/tests/qtest/libqtest.c +++ b/tests/qtest/libqtest.c @@ -36,13 +36,14 @@ #include "qapi/qmp/qstring.h" #define MAX_IRQ 256 -#define SOCKET_TIMEOUT 50 #ifndef _WIN32 +# define SOCKET_TIMEOUT 50 # define CMD_EXEC "exec " # define DEV_STDERR "/dev/fd/2" # define DEV_NULL "/dev/null" #else +# define SOCKET_TIMEOUT 5 # define CMD_EXEC "" # define DEV_STDERR "2" # define DEV_NULL "nul" @@ -106,8 +107,16 @@ static int socket_accept(int sock) struct sockaddr_un addr; socklen_t addrlen; int ret; +/* + * timeout unit of blocking receive calls is different among platfoms. + * It's in seconds on non-Windows platforms but milliseconds on Windows. + */ +#ifndef _WIN32 struct timeval timeout = { .tv_sec = SOCKET_TIMEOUT, .tv_usec = 0 }; +#else +DWORD timeout = SOCKET_TIMEOUT; +#endif if (setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, (void *)&timeout, sizeof(timeout))) { -- 2.25.1
[PATCH v6 00/11] tests/qtest: Enable running qtest on Windows
In preparation to adding virtio-9p support on Windows, this series enables running qtest on Windows, so that we can run the virtio-9p tests on Windows to make sure it does not break accidently. Changes in v6: - drop patches that are already in Alex and Daniel's tree - remove CONFIG_POSIX from meson.build - include in libqtest.c - move documentation comments of qemu_send_full() from util/osdep.c to qemu/sockets.h - save the "exit_code" in struct QTestState - new patch: "tests/qtest: device-plug-test: Reverse the usage of double/single quotes" - new patch: "tests/qtest: Use EXIT_FAILURE instead of magic number" - new patch: "tests/qtest: libqtest: Introduce qtest_wait_qemu()" - change to use qtest_wait_qemu() API - new patch: "test/qtest/libqos: meson.build: Do not build virtio-9p unconditionally" Changes in v5: - restore to v1 version which does not touch the posix implementation - Drop patches that are already merged Changes in v3: - Add a usleep(1) in the busy wait loop - Drop the host test Changes in v2: - Introduce qemu_send_full() and use it - Move the enabling of building qtests on Windows to a separate patch to keep bisectablity - Call socket_init() unconditionally - Add a missing CloseHandle() call - Change to a busy wait after migration is canceled - Change the timeout limit to 90 minutes - new patch: "tests/qtest: Enable qtest build on Windows" Bin Meng (8): tests/qtest: Support libqtest to build and run on Windows tests/qtest: device-plug-test: Reverse the usage of double/single quotes tests/qtest: Use EXIT_FAILURE instead of magic number tests/qtest: libqtest: Introduce qtest_wait_qemu() tests/qtest: libqos: Do not build virtio-9p unconditionally tests/qtest: libqtest: Correct the timeout unit of blocking receive calls for win32 .gitlab-ci.d/windows.yml: Increase the timeout to 90 minutes tests/qtest: Enable qtest build on Windows Xuzhou Cheng (3): accel/qtest: Support qtest accelerator for Windows tests/qtest: Use send/recv for socket communication tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled include/hw/core/cpu.h | 1 + include/qemu/sockets.h | 13 +++ tests/qtest/libqtest.h | 9 ++ accel/dummy-cpus.c | 14 ++- softmmu/cpus.c | 9 +- tests/qtest/dbus-vmstate-test.c | 2 +- tests/qtest/device-plug-test.c | 16 ++-- tests/qtest/libqmp.c| 5 +- tests/qtest/libqtest.c | 151 tests/qtest/migration-test.c| 8 +- util/osdep.c| 22 + .gitlab-ci.d/windows.yml| 4 +- accel/meson.build | 2 +- accel/qtest/meson.build | 3 +- tests/qtest/libqos/meson.build | 6 +- tests/qtest/meson.build | 6 -- 16 files changed, 221 insertions(+), 50 deletions(-) -- 2.25.1
Re: [PATCH v14 16/17] tests/qtest: netdev: test stream and dgram backends
在 2022/10/21 17:09, Laurent Vivier 写道: Signed-off-by: Laurent Vivier Acked-by: Michael S. Tsirkin --- I got this: 63/63 ERROR:../tests/qtest/netdev-socket.c:139:test_stream_inet_ipv6: assertion failed (resp == expect): ("st0: index=0,type=stream,connection error\r\n" == "st0: index=0,type=stream,tcp:::1:40389\r\n") ERROR 63/63 qemu:qtest+qtest-x86_64 / qtest-x86_64/netdev-socket ERROR 5.29s killed by signal 6 SIGABRT >>> QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=96 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon G_TEST_DBUS_DAEMON=/home/devel/git/qemu/tests/dbus-vmstate-daemon.sh /home/devel/git/qemu/build/tests/qtest/netdev-socket --tap -k ――― ✀ ――― stderr: ** ERROR:../tests/qtest/netdev-socket.c:139:test_stream_inet_ipv6: assertion failed (resp == expect): ("st0: index=0,type=stream,connection error\r\n" == "st0: index=0,type=stream,tcp:::1:40389\r\n") (test program exited with status code -6) ―― The base is: commit 344744e148e6e865f5a57e745b02a87e5ea534ad (HEAD -> master, origin/master, origin/HEAD) Merge: 08a5d04606 e38c24cb58 Author: Stefan Hajnoczi Date: Wed Oct 26 10:53:48 2022 -0400 Merge tag 'dump-pull-request' of https://gitlab.com/marcandre.lureau/qemu into staging So I dropped this patch from the queue and we can add it back after soft-freeze. Thanks tests/qtest/meson.build | 1 + tests/qtest/netdev-socket.c | 420 2 files changed, 421 insertions(+) create mode 100644 tests/qtest/netdev-socket.c diff --git a/tests/qtest/meson.build b/tests/qtest/meson.build index c07a5b1a5f43..6953797e4e3e 100644 --- a/tests/qtest/meson.build +++ b/tests/qtest/meson.build @@ -27,6 +27,7 @@ qtests_generic = [ 'test-hmp', 'qos-test', 'readconfig-test', + 'netdev-socket', ] if config_host.has_key('CONFIG_MODULES') qtests_generic += [ 'modules-test' ] diff --git a/tests/qtest/netdev-socket.c b/tests/qtest/netdev-socket.c new file mode 100644 index ..b24c0819b9ac --- /dev/null +++ b/tests/qtest/netdev-socket.c @@ -0,0 +1,420 @@ +/* + * QTest testcase for netdev stream and dgram + * + * Copyright (c) 2022 Red Hat, Inc. + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include "qemu/osdep.h" +#include "libqtest.h" + +#define CONNECTION_TIMEOUT5 + +#define EXPECT_STATE(q, e, t) \ +do { \ +char *resp = qtest_hmp(q, "info network");\ +if (t) { \ +strrchr(resp, t)[0] = 0; \ +} \ +g_test_timer_start(); \ +while (g_test_timer_elapsed() < CONNECTION_TIMEOUT) { \ +if (strcmp(resp, e) == 0) { \ +break;\ +} \ +g_free(resp); \ +resp = qtest_hmp(q, "info network"); \ +if (t) { \ +strrchr(resp, t)[0] = 0; \ +} \ +} \ +g_assert_cmpstr(resp, ==, e); \ +g_free(resp); \ +} while (0) + +static int inet_get_free_port_socket(int sock) +{ +struct sockaddr_in addr; +socklen_t len; + +memset(&addr, 0, sizeof(addr)); +addr.sin_family = AF_INET; +addr.sin_addr.s_addr = INADDR_ANY; +addr.sin_port = 0; +if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) { +return -1; +} + +len = sizeof(addr); +if (getsockname(sock, (struct sockaddr *)&addr, &len) < 0) { +return -1; +} + +return ntohs(addr.sin_port); +} + +static int inet_get_free_port_multiple(int nb, int *port) +{ +int sock[nb]; +int i; + +for (i = 0; i < nb; i++) { +sock[i] = socket(AF_INET, SOCK_STREAM, 0); +if (sock[i] < 0) { +break; +} +port[i] = inet_get_free_port_socket(sock[i]); +} + +nb = i; +for (i = 0; i < nb; i++) { +closesocket(sock[i]); +} + +return nb; +} + +static int inet_get_free_port(void) +{ +int nb, port; + +nb = inet_get_free_port_multiple(1, &port); +g_assert_cmpint(nb, ==, 1); + +return port; +} + +static void test_stream_inet_ipv4
[PATCH v6 07/11] tests/qtest: migration-test: Make sure QEMU process "to" exited after migration is canceled
From: Xuzhou Cheng Make sure QEMU process "to" exited before launching another target for migration in the test_multifd_tcp_cancel case. Signed-off-by: Xuzhou Cheng Signed-off-by: Bin Meng --- Changes in v6: - change to use qtest_wait_qemu() API Changes in v3: - Add a usleep(1) in the busy wait loop Changes in v2: - Change to a busy wait after migration is canceled tests/qtest/migration-test.c | 4 1 file changed, 4 insertions(+) diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index 28a06d8170..d2eb107f0c 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -2141,6 +2141,10 @@ static void test_multifd_tcp_cancel(void) migrate_cancel(from); +/* Make sure QEMU process "to" exited */ +qtest_set_expected_status(to, EXIT_FAILURE); +qtest_wait_qemu(to); + args = (MigrateStart){ .only_target = true, }; -- 2.25.1
[PATCH v6 01/11] accel/qtest: Support qtest accelerator for Windows
From: Xuzhou Cheng Currently signal SIGIPI [=SIGUSR1] is used to kick the dummy CPU when qtest accelerator is used. However SIGUSR1 is unsupported on Windows. To support Windows, we add a QemuSemaphore CPUState::sem to kick the dummy CPU instead for Windows. Signed-off-by: Xuzhou Cheng Signed-off-by: Bin Meng Reviewed-by: Marc-André Lureau --- Changes in v6: - remove CONFIG_POSIX from meson.build Changes in v5: - restore to v1 version which does not touch the posix implementation include/hw/core/cpu.h | 1 + accel/dummy-cpus.c | 14 -- softmmu/cpus.c | 9 + accel/meson.build | 2 +- accel/qtest/meson.build | 3 +-- 5 files changed, 20 insertions(+), 9 deletions(-) diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h index f9b58773f7..8830546121 100644 --- a/include/hw/core/cpu.h +++ b/include/hw/core/cpu.h @@ -333,6 +333,7 @@ struct CPUState { struct QemuThread *thread; #ifdef _WIN32 HANDLE hThread; +QemuSemaphore sem; #endif int thread_id; bool running, has_waiter; diff --git a/accel/dummy-cpus.c b/accel/dummy-cpus.c index 10429fdfb2..d6a1b8d0a2 100644 --- a/accel/dummy-cpus.c +++ b/accel/dummy-cpus.c @@ -21,8 +21,6 @@ static void *dummy_cpu_thread_fn(void *arg) { CPUState *cpu = arg; -sigset_t waitset; -int r; rcu_register_thread(); @@ -32,8 +30,13 @@ static void *dummy_cpu_thread_fn(void *arg) cpu->can_do_io = 1; current_cpu = cpu; +#ifndef _WIN32 +sigset_t waitset; +int r; + sigemptyset(&waitset); sigaddset(&waitset, SIG_IPI); +#endif /* signal CPU creation */ cpu_thread_signal_created(cpu); @@ -41,6 +44,7 @@ static void *dummy_cpu_thread_fn(void *arg) do { qemu_mutex_unlock_iothread(); +#ifndef _WIN32 do { int sig; r = sigwait(&waitset, &sig); @@ -49,6 +53,9 @@ static void *dummy_cpu_thread_fn(void *arg) perror("sigwait"); exit(1); } +#else +qemu_sem_wait(&cpu->sem); +#endif qemu_mutex_lock_iothread(); qemu_wait_io_event(cpu); } while (!cpu->unplug); @@ -69,4 +76,7 @@ void dummy_start_vcpu_thread(CPUState *cpu) cpu->cpu_index); qemu_thread_create(cpu->thread, thread_name, dummy_cpu_thread_fn, cpu, QEMU_THREAD_JOINABLE); +#ifdef _WIN32 +qemu_sem_init(&cpu->sem, 0); +#endif } diff --git a/softmmu/cpus.c b/softmmu/cpus.c index 61b27ff59d..9dd1a4dc17 100644 --- a/softmmu/cpus.c +++ b/softmmu/cpus.c @@ -437,18 +437,19 @@ void qemu_wait_io_event(CPUState *cpu) void cpus_kick_thread(CPUState *cpu) { -#ifndef _WIN32 -int err; - if (cpu->thread_kicked) { return; } cpu->thread_kicked = true; -err = pthread_kill(cpu->thread->thread, SIG_IPI); + +#ifndef _WIN32 +int err = pthread_kill(cpu->thread->thread, SIG_IPI); if (err && err != ESRCH) { fprintf(stderr, "qemu:%s: %s", __func__, strerror(err)); exit(1); } +#else +qemu_sem_post(&cpu->sem); #endif } diff --git a/accel/meson.build b/accel/meson.build index b9a963cf80..259c35c4c8 100644 --- a/accel/meson.build +++ b/accel/meson.build @@ -16,5 +16,5 @@ dummy_ss.add(files( 'dummy-cpus.c', )) -specific_ss.add_all(when: ['CONFIG_SOFTMMU', 'CONFIG_POSIX'], if_true: dummy_ss) +specific_ss.add_all(when: ['CONFIG_SOFTMMU'], if_true: dummy_ss) specific_ss.add_all(when: ['CONFIG_XEN'], if_true: dummy_ss) diff --git a/accel/qtest/meson.build b/accel/qtest/meson.build index 4c65600293..176d990ae1 100644 --- a/accel/qtest/meson.build +++ b/accel/qtest/meson.build @@ -1,2 +1 @@ -qtest_module_ss.add(when: ['CONFIG_SOFTMMU', 'CONFIG_POSIX'], -if_true: files('qtest.c')) +qtest_module_ss.add(when: ['CONFIG_SOFTMMU'], if_true: files('qtest.c')) -- 2.25.1
[PATCH v6 08/11] tests/qtest: libqos: Do not build virtio-9p unconditionally
At present the virtio-9p related codes are built into libqos unconditionally. Change to build them conditionally by testing the 'virtfs' config option. Signed-off-by: Bin Meng --- Changes in v6: - new patch: "test/qtest/libqos: meson.build: Do not build virtio-9p unconditionally" tests/qtest/libqos/meson.build | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tests/qtest/libqos/meson.build b/tests/qtest/libqos/meson.build index 113c80b4e4..32f028872c 100644 --- a/tests/qtest/libqos/meson.build +++ b/tests/qtest/libqos/meson.build @@ -33,8 +33,6 @@ libqos_srcs = files( 'sdhci.c', 'tpci200.c', 'virtio.c', -'virtio-9p.c', -'virtio-9p-client.c', 'virtio-balloon.c', 'virtio-blk.c', 'vhost-user-blk.c', @@ -62,6 +60,10 @@ libqos_srcs = files( 'x86_64_pc-machine.c', ) +if have_virtfs + libqos_srcs += files('virtio-9p.c', 'virtio-9p-client.c') +endif + libqos = static_library('qos', libqos_srcs + genh, name_suffix: 'fa', build_by_default: false) -- 2.25.1
[PATCH v6 05/11] tests/qtest: Use EXIT_FAILURE instead of magic number
When migration fails, QEMU exits with a status code EXIT_FAILURE. Change qtests to use the well-defined macro instead of magic number. Signed-off-by: Bin Meng --- Changes in v6: - new patch: "tests/qtest: Use EXIT_FAILURE instead of magic number" tests/qtest/dbus-vmstate-test.c | 2 +- tests/qtest/migration-test.c| 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/qtest/dbus-vmstate-test.c b/tests/qtest/dbus-vmstate-test.c index 74ede651f6..6c990864e3 100644 --- a/tests/qtest/dbus-vmstate-test.c +++ b/tests/qtest/dbus-vmstate-test.c @@ -233,7 +233,7 @@ test_dbus_vmstate(Test *test) test->src_qemu = src_qemu; if (test->migrate_fail) { wait_for_migration_fail(src_qemu, true); -qtest_set_expected_status(dst_qemu, 1); +qtest_set_expected_status(dst_qemu, EXIT_FAILURE); } else { wait_for_migration_complete(src_qemu); } diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index aa1ba179fa..28a06d8170 100644 --- a/tests/qtest/migration-test.c +++ b/tests/qtest/migration-test.c @@ -1342,7 +1342,7 @@ static void test_precopy_common(MigrateCommon *args) wait_for_migration_fail(from, allow_active); if (args->result == MIG_TEST_FAIL_DEST_QUIT_ERR) { -qtest_set_expected_status(to, 1); +qtest_set_expected_status(to, EXIT_FAILURE); } } else { if (args->iterations) { @@ -1738,7 +1738,7 @@ static void do_test_validate_uuid(MigrateStart *args, bool should_fail) migrate_qmp(from, uri, "{}"); if (should_fail) { -qtest_set_expected_status(to, 1); +qtest_set_expected_status(to, EXIT_FAILURE); wait_for_migration_fail(from, true); } else { wait_for_migration_complete(from); -- 2.25.1
[PATCH v6 04/11] tests/qtest: device-plug-test: Reverse the usage of double/single quotes
The usage of double/single quotes in test_q35_pci_unplug_json_request() should be reversed to work on both win32 and non-win32 platforms: - The value of -device parameter needs to be surrounded by "" as Windows does not drop '' when passing it to QEMU which causes QEMU command line option parser failure. - The JSON key/value pairs need to be surrounded by '' to make the JSON parser happy on Windows. Fixes: a12f1a7e56b7 ("tests/x86: Add subtest with 'q35' machine type to device-plug-test") Signed-off-by: Bin Meng --- Changes in v6: - new patch: "tests/qtest: device-plug-test: Reverse the usage of double/single quotes" tests/qtest/device-plug-test.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/tests/qtest/device-plug-test.c b/tests/qtest/device-plug-test.c index 3f44f731d1..5a6afa2b57 100644 --- a/tests/qtest/device-plug-test.c +++ b/tests/qtest/device-plug-test.c @@ -112,16 +112,16 @@ static void test_pci_unplug_json_request(void) static void test_q35_pci_unplug_json_request(void) { -const char *port = "-device '{\"driver\": \"pcie-root-port\", " - "\"id\": \"p1\"}'"; +const char *port = "-device \"{'driver': 'pcie-root-port', " + "'id': 'p1'}\""; -const char *bridge = "-device '{\"driver\": \"pcie-pci-bridge\", " - "\"id\": \"b1\", " - "\"bus\": \"p1\"}'"; +const char *bridge = "-device \"{'driver': 'pcie-pci-bridge', " +"'id': 'b1', " +"'bus': 'p1'}\""; -const char *device = "-device '{\"driver\": \"virtio-mouse-pci\", " - "\"bus\": \"b1\", " - "\"id\": \"dev0\"}'"; +const char *device = "-device \"{'driver': 'virtio-mouse-pci', " +"'bus': 'b1', " +"'id': 'dev0'}\""; QTestState *qtest = qtest_initf("-machine q35 %s %s %s", port, bridge, device); -- 2.25.1
[PATCH v6 10/11] .gitlab-ci.d/windows.yml: Increase the timeout to 90 minutes
commit 9f8e6cad65a6 ("gitlab-ci: Speed up the msys2-64bit job by using --without-default-devices" changed to compile QEMU with the --without-default-devices switch for the msys2-64bit job, due to the build could not complete within the project timeout (1h), and also mentioned that a bigger timeout was getting ignored on the shared Gitlab-CI Windows runners. However as of today it seems the shared Gitlab-CI Windows runners does honor the job timeout, and the runner has the timeout limit of 2h, so let's increase the timeout to 90 minutes and drop the configure switch "--without-default-devices" to get a larger build coverage. Signed-off-by: Bin Meng --- (no changes since v2) Changes in v2: - Change the timeout limit to 90 minutes .gitlab-ci.d/windows.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.gitlab-ci.d/windows.yml b/.gitlab-ci.d/windows.yml index a3e7a37022..093276ddbc 100644 --- a/.gitlab-ci.d/windows.yml +++ b/.gitlab-ci.d/windows.yml @@ -10,7 +10,7 @@ - ${CI_PROJECT_DIR}/msys64/var/cache needs: [] stage: build - timeout: 70m + timeout: 90m before_script: - If ( !(Test-Path -Path msys64\var\cache ) ) { mkdir msys64\var\cache @@ -60,7 +60,7 @@ msys2-64bit: - $env:MSYSTEM = 'MINGW64' # Start a 64 bit Mingw environment - $env:MSYS = 'winsymlinks:native' # Enable native Windows symlink - .\msys64\usr\bin\bash -lc './configure --target-list=x86_64-softmmu - --enable-capstone --without-default-devices' + --enable-capstone' - .\msys64\usr\bin\bash -lc 'make' - .\msys64\usr\bin\bash -lc 'make check || { cat build/meson-logs/testlog.txt; exit 1; } ;' -- 2.25.1
[PATCH v6 03/11] tests/qtest: Support libqtest to build and run on Windows
At present the libqtest codes were written to depend on several POSIX APIs, including fork(), kill() and waitpid(). Unfortunately these APIs are not available on Windows. This commit implements the corresponding functionalities using win32 native APIs. With this change, all qtest cases can build successfully on a Windows host, and we can start qtest testing on Windows now. Signed-off-by: Xuzhou Cheng Signed-off-by: Bin Meng --- Changes in v6: - save the "exit_code" in struct QTestState Changes in v2: - Move the enabling of building qtests on Windows to a separate patch to keep bisectablity - Call socket_init() unconditionally - Add a missing CloseHandle() call tests/qtest/libqtest.c | 96 +- 1 file changed, 94 insertions(+), 2 deletions(-) diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c index b01846fd98..d12a604d78 100644 --- a/tests/qtest/libqtest.c +++ b/tests/qtest/libqtest.c @@ -16,9 +16,11 @@ #include "qemu/osdep.h" +#ifndef _WIN32 #include #include #include +#endif /* _WIN32 */ #ifdef __linux__ #include #endif /* __linux__ */ @@ -36,6 +38,16 @@ #define MAX_IRQ 256 #define SOCKET_TIMEOUT 50 +#ifndef _WIN32 +# define CMD_EXEC "exec " +# define DEV_STDERR "/dev/fd/2" +# define DEV_NULL "/dev/null" +#else +# define CMD_EXEC "" +# define DEV_STDERR "2" +# define DEV_NULL "nul" +#endif + typedef void (*QTestSendFn)(QTestState *s, const char *buf); typedef void (*ExternalSendFn)(void *s, const char *buf); typedef GString* (*QTestRecvFn)(QTestState *); @@ -58,6 +70,9 @@ struct QTestState int qmp_fd; pid_t qemu_pid; /* our child QEMU process */ int wstatus; +#ifdef _WIN32 +DWORD exit_code; +#endif int expected_status; bool big_endian; bool irq_level[MAX_IRQ]; @@ -119,10 +134,18 @@ bool qtest_probe_child(QTestState *s) pid_t pid = s->qemu_pid; if (pid != -1) { +#ifndef _WIN32 pid = waitpid(pid, &s->wstatus, WNOHANG); if (pid == 0) { return true; } +#else +GetExitCodeProcess((HANDLE)pid, &s->exit_code); +if (s->exit_code == STILL_ACTIVE) { +return true; +} +CloseHandle((HANDLE)pid); +#endif s->qemu_pid = -1; } return false; @@ -136,13 +159,25 @@ void qtest_set_expected_status(QTestState *s, int status) void qtest_kill_qemu(QTestState *s) { pid_t pid = s->qemu_pid; +#ifndef _WIN32 int wstatus; +#else +DWORD ret; +#endif /* Skip wait if qtest_probe_child already reaped. */ if (pid != -1) { +#ifndef _WIN32 kill(pid, SIGTERM); TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0)); assert(pid == s->qemu_pid); +#else +TerminateProcess((HANDLE)pid, s->expected_status); +ret = WaitForSingleObject((HANDLE)pid, INFINITE); +assert(ret == WAIT_OBJECT_0); +GetExitCodeProcess((HANDLE)pid, &s->exit_code); +CloseHandle((HANDLE)pid); +#endif s->qemu_pid = -1; } @@ -150,6 +185,7 @@ void qtest_kill_qemu(QTestState *s) * Check whether qemu exited with expected exit status; anything else is * fishy and should be logged with as much detail as possible. */ +#ifndef _WIN32 wstatus = s->wstatus; if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != s->expected_status) { fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU " @@ -166,6 +202,14 @@ void qtest_kill_qemu(QTestState *s) __FILE__, __LINE__, sig, signame, dump); abort(); } +#else +if (s->exit_code != s->expected_status) { +fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU " +"process but encountered exit status %ld (expected %d)\n", +__FILE__, __LINE__, s->exit_code, s->expected_status); +abort(); +} +#endif } static void kill_qemu_hook_func(void *s) @@ -244,6 +288,38 @@ static const char *qtest_qemu_binary(void) return qemu_bin; } +#ifdef _WIN32 +static pid_t qtest_create_process(char *cmd) +{ +STARTUPINFO si; +PROCESS_INFORMATION pi; +BOOL ret; + +ZeroMemory(&si, sizeof(si)); +si.cb = sizeof(si); +ZeroMemory(&pi, sizeof(pi)); + +ret = CreateProcess(NULL, /* module name */ +cmd,/* command line */ +NULL, /* process handle not inheritable */ +NULL, /* thread handle not inheritable */ +FALSE, /* set handle inheritance to FALSE */ +0, /* No creation flags */ +NULL, /* use parent's environment block */ +NULL, /* use parent's starting directory */ +&si,/* pointer to STARTUPINFO structure */ +&pi /* pointer to PROCESS_INFORMATION structure */ +); +if (ret == 0) { +fprintf(stderr, "%s:%d: una
[PATCH v6 06/11] tests/qtest: libqtest: Introduce qtest_wait_qemu()
Introduce an API for qtest to wait for the QEMU process to terminate. Suggested-by: Marc-André Lureau Signed-off-by: Bin Meng --- Changes in v6: - new patch: "tests/qtest: libqtest: Introduce qtest_wait_qemu()" tests/qtest/libqtest.h | 9 ++ tests/qtest/libqtest.c | 63 +- 2 files changed, 47 insertions(+), 25 deletions(-) diff --git a/tests/qtest/libqtest.h b/tests/qtest/libqtest.h index 65c040e504..91a5f7edd9 100644 --- a/tests/qtest/libqtest.h +++ b/tests/qtest/libqtest.h @@ -75,6 +75,15 @@ QTestState *qtest_init_without_qmp_handshake(const char *extra_args); */ QTestState *qtest_init_with_serial(const char *extra_args, int *sock_fd); +/** + * qtest_wait_qemu: + * @s: #QTestState instance to operate on. + * + * Wait for the QEMU process to terminate. It is safe to call this function + * multiple times. + */ +void qtest_wait_qemu(QTestState *s); + /** * qtest_kill_qemu: * @s: #QTestState instance to operate on. diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c index d12a604d78..e1e2d39a6e 100644 --- a/tests/qtest/libqtest.c +++ b/tests/qtest/libqtest.c @@ -156,37 +156,14 @@ void qtest_set_expected_status(QTestState *s, int status) s->expected_status = status; } -void qtest_kill_qemu(QTestState *s) +static void qtest_check_status(QTestState *s) { -pid_t pid = s->qemu_pid; -#ifndef _WIN32 -int wstatus; -#else -DWORD ret; -#endif - -/* Skip wait if qtest_probe_child already reaped. */ -if (pid != -1) { -#ifndef _WIN32 -kill(pid, SIGTERM); -TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0)); -assert(pid == s->qemu_pid); -#else -TerminateProcess((HANDLE)pid, s->expected_status); -ret = WaitForSingleObject((HANDLE)pid, INFINITE); -assert(ret == WAIT_OBJECT_0); -GetExitCodeProcess((HANDLE)pid, &s->exit_code); -CloseHandle((HANDLE)pid); -#endif -s->qemu_pid = -1; -} - /* * Check whether qemu exited with expected exit status; anything else is * fishy and should be logged with as much detail as possible. */ #ifndef _WIN32 -wstatus = s->wstatus; +int wstatus = s->wstatus; if (WIFEXITED(wstatus) && WEXITSTATUS(wstatus) != s->expected_status) { fprintf(stderr, "%s:%d: kill_qemu() tried to terminate QEMU " "process but encountered exit status %d (expected %d)\n", @@ -212,6 +189,42 @@ void qtest_kill_qemu(QTestState *s) #endif } +void qtest_wait_qemu(QTestState *s) +{ +#ifndef _WIN32 +pid_t pid; + +TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0)); +assert(pid == s->qemu_pid); +#else +DWORD ret; + +ret = WaitForSingleObject((HANDLE)s->qemu_pid, INFINITE); +assert(ret == WAIT_OBJECT_0); +GetExitCodeProcess((HANDLE)s->qemu_pid, &s->exit_code); +CloseHandle((HANDLE)s->qemu_pid); +#endif + +qtest_check_status(s); +} + +void qtest_kill_qemu(QTestState *s) +{ +/* Skip wait if qtest_probe_child() already reaped */ +if (s->qemu_pid != -1) { +#ifndef _WIN32 +kill(s->qemu_pid, SIGTERM); +#else +TerminateProcess((HANDLE)s->qemu_pid, s->expected_status); +#endif +qtest_wait_qemu(s); +s->qemu_pid = -1; +return; +} + +qtest_check_status(s); +} + static void kill_qemu_hook_func(void *s) { qtest_kill_qemu(s); -- 2.25.1
[PATCH v6 02/11] tests/qtest: Use send/recv for socket communication
From: Xuzhou Cheng Socket communication in the libqtest and libqmp codes uses read() and write() which work on any file descriptor on *nix, and sockets in *nix are an example of a file descriptor. However sockets on Windows do not use *nix-style file descriptors, so read() and write() cannot be used on sockets on Windows. Switch over to use send() and recv() instead which work on both Windows and *nix. Signed-off-by: Xuzhou Cheng Signed-off-by: Bin Meng Reviewed-by: Marc-André Lureau --- Changes in v6: - include in libqtest.c - move documentation comments of qemu_send_full() from util/osdep.c to qemu/sockets.h Changes in v2: - Introduce qemu_send_full() and use it include/qemu/sockets.h | 13 + tests/qtest/libqmp.c | 5 +++-- tests/qtest/libqtest.c | 5 +++-- util/osdep.c | 22 ++ 4 files changed, 41 insertions(+), 4 deletions(-) diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h index 036745e586..61648f3f3c 100644 --- a/include/qemu/sockets.h +++ b/include/qemu/sockets.h @@ -33,6 +33,19 @@ int qemu_socketpair(int domain, int type, int protocol, int sv[2]); #endif int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen); +/* + * A variant of send(2) which handles partial send. + * + * Return the number of bytes transferred over the socket. + * Set errno if fewer than `count' bytes are sent. + * + * This function don't work with non-blocking socket's. + * Any of the possibilities with non-blocking socket's is bad: + * - return a short write (then name is wrong) + * - busy wait adding (errno == EAGAIN) to the loop + */ +ssize_t qemu_send_full(int s, const void *buf, size_t count) +G_GNUC_WARN_UNUSED_RESULT; int socket_set_cork(int fd, int v); int socket_set_nodelay(int fd); void qemu_socket_set_block(int fd); diff --git a/tests/qtest/libqmp.c b/tests/qtest/libqmp.c index ade26c15f0..2b08382e5d 100644 --- a/tests/qtest/libqmp.c +++ b/tests/qtest/libqmp.c @@ -23,6 +23,7 @@ #endif #include "qemu/cutils.h" +#include "qemu/sockets.h" #include "qapi/error.h" #include "qapi/qmp/json-parser.h" #include "qapi/qmp/qjson.h" @@ -36,7 +37,7 @@ typedef struct { static void socket_send(int fd, const char *buf, size_t size) { -size_t res = qemu_write_full(fd, buf, size); +ssize_t res = qemu_send_full(fd, buf, size); assert(res == size); } @@ -69,7 +70,7 @@ QDict *qmp_fd_receive(int fd) ssize_t len; char c; -len = read(fd, &c, 1); +len = recv(fd, &c, 1, 0); if (len == -1 && errno == EINTR) { continue; } diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c index b23eb3edc3..b01846fd98 100644 --- a/tests/qtest/libqtest.c +++ b/tests/qtest/libqtest.c @@ -27,6 +27,7 @@ #include "libqmp.h" #include "qemu/ctype.h" #include "qemu/cutils.h" +#include "qemu/sockets.h" #include "qapi/qmp/qdict.h" #include "qapi/qmp/qjson.h" #include "qapi/qmp/qlist.h" @@ -428,7 +429,7 @@ void qtest_quit(QTestState *s) static void socket_send(int fd, const char *buf, size_t size) { -size_t res = qemu_write_full(fd, buf, size); +ssize_t res = qemu_send_full(fd, buf, size); assert(res == size); } @@ -460,7 +461,7 @@ static GString *qtest_client_socket_recv_line(QTestState *s) ssize_t len; char buffer[1024]; -len = read(s->fd, buffer, sizeof(buffer)); +len = recv(s->fd, buffer, sizeof(buffer), 0); if (len == -1 && errno == EINTR) { continue; } diff --git a/util/osdep.c b/util/osdep.c index 746d5f7d71..77c1a6c562 100644 --- a/util/osdep.c +++ b/util/osdep.c @@ -502,6 +502,28 @@ int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen) return ret; } +ssize_t qemu_send_full(int s, const void *buf, size_t count) +{ +ssize_t ret = 0; +ssize_t total = 0; + +while (count) { +ret = send(s, buf, count, 0); +if (ret < 0) { +if (errno == EINTR) { +continue; +} +break; +} + +count -= ret; +buf += ret; +total += ret; +} + +return total; +} + void qemu_set_hw_version(const char *version) { hw_version = version; -- 2.25.1
Re: [PATCH 0/3] target: Rename headers using .def extension to .h.inc
Peter Maydell writes: > On Thu, 27 Oct 2022 at 18:17, Markus Armbruster wrote: >> >> Peter Maydell writes: >> >> > On Thu, 27 Oct 2022 at 15:40, Markus Armbruster wrote: >> >> I wonder why we use any of .def, .h.inc, .inc.h, .c.inc, .inc.c. Why >> >> not .h and call it a day? No need to configure each and every editor to >> >> tread these as C code. >> > >> > It says "this isn't actually a header in the usual sense". That's >> > useful for automated scripted checks (eg we don't want >> > scripts/clean-header-guards.pl to add the standard #include header >> > guards to this sort of file) and for humans (if you see one of these >> > files included as part of the normal #include block at the top of >> > a .c file that's probably a mistake; if you see it being used then >> > you know there's likely multiple-inclusion shenanigans going on.) >> >> scripts/clean-header-guards.pl needs exclude patterns anyway. > > Yes, in theory instead of having a systematic convention for > filenames we could instead give the files names that don't > let you easily distinguish them from plain old header files and > require every use like this to update clean-header-guards.pl, > but that seems to me to be clearly worse than maintaining the > filename convention that we already have. Handle that just like for plain .h: when you re-run the script, you fix its complaints either by fixing the header or by adding it to the exclude pattern. >> Comments would likely work better for humans than obscure naming >> conventions. >> >> Make them stylized, and they work for scripts, too. > > We already have a stylized convention, it's the filename... True. But its opaque both for tools (editors mostly) and humans. Tools need to be configured, and humans need to be taught. Moreover, we don't have *a* stylized convention, we have *two*: funny filenames (several patterns, but that's fixable), and exclude patterns. > Comments inside the .inc file are also helpful, of course. Let's add a third! Stylized comments! I'll shut up now :)
RE: [PATCH v2] softmmu/physmem: Fix input parameters for flatview_access_allowed()
>-Original Message- >From: Duan, Zhenzhong >Sent: Friday, July 22, 2022 4:46 PM >To: qemu-devel@nongnu.org >Cc: pbonz...@redhat.com; pet...@redhat.com; da...@redhat.com; >f4...@amsat.org >Subject: [PATCH v2] softmmu/physmem: Fix input parameters for >flatview_access_allowed() > >The comment of flatview_access_allowed() suggests to pass address within >that memory region, this isn't true in some call sites. > >This makes qemu log in flatview_access_allowed() confusing and potential risk >if the input parameter will be checked in the future. > >Signed-off-by: Zhenzhong Duan >Reviewed-by: Peter Xu >Reviewed-by: David Hildenbrand >--- >v2: Fix typo and removed Fixed-by per David Hi Maintainers, Can this patch be considered merged as it got reviewed-by and no objection for a long time. Thanks. Zhenzhong > > softmmu/physmem.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/softmmu/physmem.c b/softmmu/physmem.c index >fb16be57a6c6..214cb04c8fc3 100644 >--- a/softmmu/physmem.c >+++ b/softmmu/physmem.c >@@ -2850,7 +2850,7 @@ static MemTxResult flatview_write(FlatView *fv, >hwaddr addr, MemTxAttrs attrs, > > l = len; > mr = flatview_translate(fv, addr, &addr1, &l, true, attrs); >-if (!flatview_access_allowed(mr, attrs, addr, len)) { >+if (!flatview_access_allowed(mr, attrs, addr1, l)) { > return MEMTX_ACCESS_ERROR; > } > return flatview_write_continue(fv, addr, attrs, buf, len, @@ -2917,7 >+2917,7 @@ static MemTxResult flatview_read(FlatView *fv, hwaddr addr, > > l = len; > mr = flatview_translate(fv, addr, &addr1, &l, false, attrs); >-if (!flatview_access_allowed(mr, attrs, addr, len)) { >+if (!flatview_access_allowed(mr, attrs, addr1, l)) { > return MEMTX_ACCESS_ERROR; > } > return flatview_read_continue(fv, addr, attrs, buf, len, >-- >2.25.1
Re: [PATCH 0/4] Shadow VirtQueue event index support
在 2022/10/27 04:58, Michael S. Tsirkin 写道: On Thu, Oct 20, 2022 at 05:52:47PM +0200, Eugenio Pérez wrote: Event idx helps to reduce the number of notifications between the device and the driver. It allows them to specify an index on the circular descriptors rings where to issue the notification, instead of a single binary indicator. Adding support for SVQ. Jason seems to be taking this through net Reviewed-by: Michael S. Tsirkin Ok, I've queued this. Eugenio, please post the fix for endian on top. Thanks These patches are sent on top of [1] series, so trivial conflicts could arise if it is applied directly on master. Future versions can be not based on it is more convenient. [1] https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg03280.html Eugenio Pérez (4): vhost: allocate event_idx fields on vring vhost: toggle device callbacks using used event idx vhost: use avail event idx on vhost_svq_kick vhost: Accept event idx flag hw/virtio/vhost-shadow-virtqueue.c | 39 -- 1 file changed, 31 insertions(+), 8 deletions(-) -- 2.31.1
Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
On Fri, 28 Oct 2022 10:23:16 +0800 Jason Wang wrote: > On Fri, Oct 28, 2022 at 10:08 AM Alex Williamson > wrote: > > > > On Fri, 28 Oct 2022 09:50:10 +0800 > > Jason Wang wrote: > > > > > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson > > > wrote: > > > > > > > > On Thu, 27 Oct 2022 15:40:31 +0800 > > > > Cindy Lu wrote: > > > > > > > > > Move the function vfio_get_xlat_addr to softmmu/memory.c, and > > > > > change the name to memory_get_xlat_addr().So we can use this > > > > > function in other devices,such as vDPA device. > > > > > > > > > > Signed-off-by: Cindy Lu > > > > > --- > > > > > hw/vfio/common.c | 92 > > > > > ++- > > > > > include/exec/memory.h | 4 ++ > > > > > softmmu/memory.c | 84 +++ > > > > > 3 files changed, 92 insertions(+), 88 deletions(-) > > > > > > > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > > > index ace9562a9b..2b5a9f3d8d 100644 > > > > > --- a/hw/vfio/common.c > > > > > +++ b/hw/vfio/common.c > > > > > @@ -574,92 +574,6 @@ static bool > > > > > vfio_listener_skipped_section(MemoryRegionSection *section) > > > > > section->offset_within_address_space & (1ULL << 63); > > > > > } > > > > > > > > > > -/* Called with rcu_read_lock held. */ > > > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, > > > > > - ram_addr_t *ram_addr, bool *read_only) > > > > > -{ > > > > > -MemoryRegion *mr; > > > > > -hwaddr xlat; > > > > > -hwaddr len = iotlb->addr_mask + 1; > > > > > -bool writable = iotlb->perm & IOMMU_WO; > > > > > - > > > > > -/* > > > > > - * The IOMMU TLB entry we have just covers translation through > > > > > - * this IOMMU to its immediate target. We need to translate > > > > > - * it the rest of the way through to memory. > > > > > - */ > > > > > -mr = address_space_translate(&address_space_memory, > > > > > - iotlb->translated_addr, > > > > > - &xlat, &len, writable, > > > > > - MEMTXATTRS_UNSPECIFIED); > > > > > -if (!memory_region_is_ram(mr)) { > > > > > -error_report("iommu map to non memory area %"HWADDR_PRIx"", > > > > > - xlat); > > > > > -return false; > > > > > -} else if (memory_region_has_ram_discard_manager(mr)) { > > > > > -RamDiscardManager *rdm = > > > > > memory_region_get_ram_discard_manager(mr); > > > > > -MemoryRegionSection tmp = { > > > > > -.mr = mr, > > > > > -.offset_within_region = xlat, > > > > > -.size = int128_make64(len), > > > > > -}; > > > > > - > > > > > -/* > > > > > - * Malicious VMs can map memory into the IOMMU, which is > > > > > expected > > > > > - * to remain discarded. vfio will pin all pages, populating > > > > > memory. > > > > > - * Disallow that. vmstate priorities make sure any > > > > > RamDiscardManager > > > > > - * were already restored before IOMMUs are restored. > > > > > - */ > > > > > -if (!ram_discard_manager_is_populated(rdm, &tmp)) { > > > > > -error_report("iommu map to discarded memory (e.g., > > > > > unplugged via" > > > > > - " virtio-mem): %"HWADDR_PRIx"", > > > > > - iotlb->translated_addr); > > > > > -return false; > > > > > -} > > > > > - > > > > > -/* > > > > > - * Malicious VMs might trigger discarding of IOMMU-mapped > > > > > memory. The > > > > > - * pages will remain pinned inside vfio until unmapped, > > > > > resulting in a > > > > > - * higher memory consumption than expected. If memory would > > > > > get > > > > > - * populated again later, there would be an inconsistency > > > > > between pages > > > > > - * pinned by vfio and pages seen by QEMU. This is the case > > > > > until > > > > > - * unmapped from the IOMMU (e.g., during device reset). > > > > > - * > > > > > - * With malicious guests, we really only care about pinning > > > > > more memory > > > > > - * than expected. RLIMIT_MEMLOCK set for the user/process > > > > > can never be > > > > > - * exceeded and can be used to mitigate this problem. > > > > > - */ > > > > > -warn_report_once("Using vfio with vIOMMUs and coordinated > > > > > discarding of" > > > > > - " RAM (e.g., virtio-mem) works, however, > > > > > malicious" > > > > > - " guests can trigger pinning of more memory > > > > > than" > > > > > - " intended via an IOMMU. It's possible to > > > > > mitigate " > > > > > - " by setting/adjusting RLIMIT_MEMLOCK."); > > > > > -} > > > > > - > > > > > -/* > > > >
Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
On Fri, Oct 28, 2022 at 10:35 AM Alex Williamson wrote: > > On Fri, 28 Oct 2022 10:23:16 +0800 > Jason Wang wrote: > > > On Fri, Oct 28, 2022 at 10:08 AM Alex Williamson > > wrote: > > > > > > On Fri, 28 Oct 2022 09:50:10 +0800 > > > Jason Wang wrote: > > > > > > > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson > > > > wrote: > > > > > > > > > > On Thu, 27 Oct 2022 15:40:31 +0800 > > > > > Cindy Lu wrote: > > > > > > > > > > > Move the function vfio_get_xlat_addr to softmmu/memory.c, and > > > > > > change the name to memory_get_xlat_addr().So we can use this > > > > > > function in other devices,such as vDPA device. > > > > > > > > > > > > Signed-off-by: Cindy Lu > > > > > > --- > > > > > > hw/vfio/common.c | 92 > > > > > > ++- > > > > > > include/exec/memory.h | 4 ++ > > > > > > softmmu/memory.c | 84 +++ > > > > > > 3 files changed, 92 insertions(+), 88 deletions(-) > > > > > > > > > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > > > > index ace9562a9b..2b5a9f3d8d 100644 > > > > > > --- a/hw/vfio/common.c > > > > > > +++ b/hw/vfio/common.c > > > > > > @@ -574,92 +574,6 @@ static bool > > > > > > vfio_listener_skipped_section(MemoryRegionSection *section) > > > > > > section->offset_within_address_space & (1ULL << 63); > > > > > > } > > > > > > > > > > > > -/* Called with rcu_read_lock held. */ > > > > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, > > > > > > - ram_addr_t *ram_addr, bool > > > > > > *read_only) > > > > > > -{ > > > > > > -MemoryRegion *mr; > > > > > > -hwaddr xlat; > > > > > > -hwaddr len = iotlb->addr_mask + 1; > > > > > > -bool writable = iotlb->perm & IOMMU_WO; > > > > > > - > > > > > > -/* > > > > > > - * The IOMMU TLB entry we have just covers translation through > > > > > > - * this IOMMU to its immediate target. We need to translate > > > > > > - * it the rest of the way through to memory. > > > > > > - */ > > > > > > -mr = address_space_translate(&address_space_memory, > > > > > > - iotlb->translated_addr, > > > > > > - &xlat, &len, writable, > > > > > > - MEMTXATTRS_UNSPECIFIED); > > > > > > -if (!memory_region_is_ram(mr)) { > > > > > > -error_report("iommu map to non memory area %"HWADDR_PRIx"", > > > > > > - xlat); > > > > > > -return false; > > > > > > -} else if (memory_region_has_ram_discard_manager(mr)) { > > > > > > -RamDiscardManager *rdm = > > > > > > memory_region_get_ram_discard_manager(mr); > > > > > > -MemoryRegionSection tmp = { > > > > > > -.mr = mr, > > > > > > -.offset_within_region = xlat, > > > > > > -.size = int128_make64(len), > > > > > > -}; > > > > > > - > > > > > > -/* > > > > > > - * Malicious VMs can map memory into the IOMMU, which is > > > > > > expected > > > > > > - * to remain discarded. vfio will pin all pages, > > > > > > populating memory. > > > > > > - * Disallow that. vmstate priorities make sure any > > > > > > RamDiscardManager > > > > > > - * were already restored before IOMMUs are restored. > > > > > > - */ > > > > > > -if (!ram_discard_manager_is_populated(rdm, &tmp)) { > > > > > > -error_report("iommu map to discarded memory (e.g., > > > > > > unplugged via" > > > > > > - " virtio-mem): %"HWADDR_PRIx"", > > > > > > - iotlb->translated_addr); > > > > > > -return false; > > > > > > -} > > > > > > - > > > > > > -/* > > > > > > - * Malicious VMs might trigger discarding of IOMMU-mapped > > > > > > memory. The > > > > > > - * pages will remain pinned inside vfio until unmapped, > > > > > > resulting in a > > > > > > - * higher memory consumption than expected. If memory > > > > > > would get > > > > > > - * populated again later, there would be an inconsistency > > > > > > between pages > > > > > > - * pinned by vfio and pages seen by QEMU. This is the case > > > > > > until > > > > > > - * unmapped from the IOMMU (e.g., during device reset). > > > > > > - * > > > > > > - * With malicious guests, we really only care about > > > > > > pinning more memory > > > > > > - * than expected. RLIMIT_MEMLOCK set for the user/process > > > > > > can never be > > > > > > - * exceeded and can be used to mitigate this problem. > > > > > > - */ > > > > > > -warn_report_once("Using vfio with vIOMMUs and coordinated > > > > > > discarding of" > > > > > > - " RAM (e.g., virtio-mem) works, however, > > > > > > malicious" > > > > > > - " guests can
Re: [PATCH 00/16] hw/9pfs: Add 9pfs support for Windows
On Fri, Oct 28, 2022 at 12:31 AM Christian Schoenebeck wrote: > > On Thursday, October 27, 2022 6:19:27 PM CEST Bin Meng wrote: > > Hi Christian, > > > > On Mon, Oct 24, 2022 at 1:16 PM Bin Meng wrote: > > > > > > At present there is no Windows support for 9p file system. > > > This series adds initial Windows support for 9p file system. > > > > > > 'local' file system backend driver is supported on Windows, > > > including open, read, write, close, rename, remove, etc. > > > All security models are supported. The mapped (mapped-xattr) > > > security model is implemented using NTFS Alternate Data Stream > > > (ADS) so the 9p export path shall be on an NTFS partition. > > > > > > 'synth' driver is adapted for Windows too so that we can now > > > run qtests on Windows for 9p related regression testing. > > > > > > Example command line to test: > > > > > > "-fsdev local,path=c:\msys64,security_model=mapped,id=p9 -device > > > virtio-9p-pci,fsdev=p9,mount_tag=p9fs" > > > > > > > > > Bin Meng (5): > > > qemu/xattr.h: Exclude for Windows > > > hw/9pfs: Drop unnecessary *xattr wrapper API declarations > > > hw/9pfs: Replace the direct call to xxxat() APIs with a wrapper > > > hw/9pfs: Introduce an opaque type 9P_FILE_ID > > > hw/9pfs: Update P9_FILE_ID to support Windows > > > > > > Guohuai Shi (11): > > > hw/9pfs: Add missing definitions for Windows > > > hw/9pfs: Implement Windows specific utilities functions for 9pfs > > > hw/9pfs: Handle current directory offset for Windows > > > hw/9pfs: Disable unsupported flags and features for Windows > > > hw/9pfs: Update the local fs driver to support Windows > > > hw/9pfs: Add Linux error number definition > > > hw/9pfs: Translate Windows errno to Linux value > > > fsdev: Disable proxy fs driver on Windows > > > hw/9pfs: Update synth fs driver for Windows > > > tests/qtest: virtio-9p-test: Adapt the case for win32 > > > meson.build: Turn on virtfs for Windows > > > > > > > With the latest 9p test case refactoring in the mainline, I will have > > to cherry-pick the following 2 patches in this series, to v6 of > > "tests/qtest: Enable running qtest on Windows" series [1], in order to > > get qtest on Windows build successfully. > > > > [06/16] hw/9pfs: Add missing definitions for Windows > > [15/16] tests/qtest: virtio-9p-test: Adapt the case for win32 > > > > I will include the above 2 patches in the v6 qtest windows support series. > > No need to add those patches as they are already being queued separately. Just > add appropriate tag(s) to the first patch: > > Based-on: Sure, but I really want to get the qtest windows support patch series merged first. I will disable the 9p test for Windows in my v6 then. We will enable the 9p test later when it lands on the mainline. > > I already had a quick look on this version, will try to give feedback > tomorrow. > > This feature won't make it into 7.2 release anyway, so patience please. ;-) > > Best regards, > Christian Schoenebeck > > > [1] http://patchwork.ozlabs.org/project/qemu-devel/list/?series=321695 Regards, Bin
Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
On Fri, Oct 28, 2022 at 10:08 AM Alex Williamson wrote: > > On Fri, 28 Oct 2022 09:50:10 +0800 > Jason Wang wrote: > > > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson > > wrote: > > > > > > On Thu, 27 Oct 2022 15:40:31 +0800 > > > Cindy Lu wrote: > > > > > > > Move the function vfio_get_xlat_addr to softmmu/memory.c, and > > > > change the name to memory_get_xlat_addr().So we can use this > > > > function in other devices,such as vDPA device. > > > > > > > > Signed-off-by: Cindy Lu > > > > --- > > > > hw/vfio/common.c | 92 ++- > > > > include/exec/memory.h | 4 ++ > > > > softmmu/memory.c | 84 +++ > > > > 3 files changed, 92 insertions(+), 88 deletions(-) > > > > > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > > index ace9562a9b..2b5a9f3d8d 100644 > > > > --- a/hw/vfio/common.c > > > > +++ b/hw/vfio/common.c > > > > @@ -574,92 +574,6 @@ static bool > > > > vfio_listener_skipped_section(MemoryRegionSection *section) > > > > section->offset_within_address_space & (1ULL << 63); > > > > } > > > > > > > > -/* Called with rcu_read_lock held. */ > > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, > > > > - ram_addr_t *ram_addr, bool *read_only) > > > > -{ > > > > -MemoryRegion *mr; > > > > -hwaddr xlat; > > > > -hwaddr len = iotlb->addr_mask + 1; > > > > -bool writable = iotlb->perm & IOMMU_WO; > > > > - > > > > -/* > > > > - * The IOMMU TLB entry we have just covers translation through > > > > - * this IOMMU to its immediate target. We need to translate > > > > - * it the rest of the way through to memory. > > > > - */ > > > > -mr = address_space_translate(&address_space_memory, > > > > - iotlb->translated_addr, > > > > - &xlat, &len, writable, > > > > - MEMTXATTRS_UNSPECIFIED); > > > > -if (!memory_region_is_ram(mr)) { > > > > -error_report("iommu map to non memory area %"HWADDR_PRIx"", > > > > - xlat); > > > > -return false; > > > > -} else if (memory_region_has_ram_discard_manager(mr)) { > > > > -RamDiscardManager *rdm = > > > > memory_region_get_ram_discard_manager(mr); > > > > -MemoryRegionSection tmp = { > > > > -.mr = mr, > > > > -.offset_within_region = xlat, > > > > -.size = int128_make64(len), > > > > -}; > > > > - > > > > -/* > > > > - * Malicious VMs can map memory into the IOMMU, which is > > > > expected > > > > - * to remain discarded. vfio will pin all pages, populating > > > > memory. > > > > - * Disallow that. vmstate priorities make sure any > > > > RamDiscardManager > > > > - * were already restored before IOMMUs are restored. > > > > - */ > > > > -if (!ram_discard_manager_is_populated(rdm, &tmp)) { > > > > -error_report("iommu map to discarded memory (e.g., > > > > unplugged via" > > > > - " virtio-mem): %"HWADDR_PRIx"", > > > > - iotlb->translated_addr); > > > > -return false; > > > > -} > > > > - > > > > -/* > > > > - * Malicious VMs might trigger discarding of IOMMU-mapped > > > > memory. The > > > > - * pages will remain pinned inside vfio until unmapped, > > > > resulting in a > > > > - * higher memory consumption than expected. If memory would get > > > > - * populated again later, there would be an inconsistency > > > > between pages > > > > - * pinned by vfio and pages seen by QEMU. This is the case > > > > until > > > > - * unmapped from the IOMMU (e.g., during device reset). > > > > - * > > > > - * With malicious guests, we really only care about pinning > > > > more memory > > > > - * than expected. RLIMIT_MEMLOCK set for the user/process can > > > > never be > > > > - * exceeded and can be used to mitigate this problem. > > > > - */ > > > > -warn_report_once("Using vfio with vIOMMUs and coordinated > > > > discarding of" > > > > - " RAM (e.g., virtio-mem) works, however, > > > > malicious" > > > > - " guests can trigger pinning of more memory > > > > than" > > > > - " intended via an IOMMU. It's possible to > > > > mitigate " > > > > - " by setting/adjusting RLIMIT_MEMLOCK."); > > > > -} > > > > - > > > > -/* > > > > - * Translation truncates length to the IOMMU page size, > > > > - * check that it did not truncate too much. > > > > - */ > > > > -if (len & iotlb->addr_mask) { > > > > -error_report("iommu has granularity incompatible with target > > > > AS"); > > > > -return false;
RE: [PATCH] memory: Fix wrong end address dump
>-Original Message- >From: David Hildenbrand >Sent: Friday, July 22, 2022 2:44 PM >To: Duan, Zhenzhong ; qemu- >de...@nongnu.org >Cc: pbonz...@redhat.com; pet...@redhat.com; f4...@amsat.org >Subject: Re: [PATCH] memory: Fix wrong end address dump > >On 22.06.22 11:59, Zhenzhong Duan wrote: >> The end address of memory region section isn't correctly calculated >> which leads to overflowed mtree dump: >> >> Dispatch >> Physical sections >> .. >> #70 @2000..00011fff io [ROOT] >> #71 @5000..5fff (noname) >> #72 @5000..00014fff io [ROOT] >> #73 @5658..5658 vmport >> #74 @5659..00015658 io [ROOT] >> #75 @6000..00015fff io [ROOT] >> >> After fix: >> #70 @2000..4fff io [ROOT] >> #71 @5000..5fff (noname) >> #72 @5000..5657 io [ROOT] >> #73 @5658..5658 vmport >> #74 @5659..5fff io [ROOT] >> #75 @6000.. io [ROOT] >> >> Fixes: 5e8fd947e2670 ("memory: Rework "info mtree" to print flat views >> and dispatch trees") >> Signed-off-by: Zhenzhong Duan >> --- >> softmmu/physmem.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/softmmu/physmem.c b/softmmu/physmem.c index >> 214cb04c8fc3..cbabd10ac0bf 100644 >> --- a/softmmu/physmem.c >> +++ b/softmmu/physmem.c >> @@ -3701,7 +3701,7 @@ void mtree_print_dispatch(AddressSpaceDispatch >*d, MemoryRegion *root) >> " %s%s%s%s%s", >> i, >> s->offset_within_address_space, >> -s->offset_within_address_space + MR_SIZE(s->mr->size), >> +s->offset_within_address_space + MR_SIZE(s->size), >> s->mr->name ? s->mr->name : "(noname)", >> i < ARRAY_SIZE(names) ? names[i] : "", >> s->mr == root ? " [ROOT]" : "", > >Reviewed-by: David Hildenbrand > >I assume this should get picked up soonish. Hi Maintainers, Can this patch be considered merged as it got reviewed-by and no objection for a long time. Thanks. Zhenzhong
Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
On Fri, 28 Oct 2022 09:50:10 +0800 Jason Wang wrote: > On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson > wrote: > > > > On Thu, 27 Oct 2022 15:40:31 +0800 > > Cindy Lu wrote: > > > > > Move the function vfio_get_xlat_addr to softmmu/memory.c, and > > > change the name to memory_get_xlat_addr().So we can use this > > > function in other devices,such as vDPA device. > > > > > > Signed-off-by: Cindy Lu > > > --- > > > hw/vfio/common.c | 92 ++- > > > include/exec/memory.h | 4 ++ > > > softmmu/memory.c | 84 +++ > > > 3 files changed, 92 insertions(+), 88 deletions(-) > > > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > > index ace9562a9b..2b5a9f3d8d 100644 > > > --- a/hw/vfio/common.c > > > +++ b/hw/vfio/common.c > > > @@ -574,92 +574,6 @@ static bool > > > vfio_listener_skipped_section(MemoryRegionSection *section) > > > section->offset_within_address_space & (1ULL << 63); > > > } > > > > > > -/* Called with rcu_read_lock held. */ > > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, > > > - ram_addr_t *ram_addr, bool *read_only) > > > -{ > > > -MemoryRegion *mr; > > > -hwaddr xlat; > > > -hwaddr len = iotlb->addr_mask + 1; > > > -bool writable = iotlb->perm & IOMMU_WO; > > > - > > > -/* > > > - * The IOMMU TLB entry we have just covers translation through > > > - * this IOMMU to its immediate target. We need to translate > > > - * it the rest of the way through to memory. > > > - */ > > > -mr = address_space_translate(&address_space_memory, > > > - iotlb->translated_addr, > > > - &xlat, &len, writable, > > > - MEMTXATTRS_UNSPECIFIED); > > > -if (!memory_region_is_ram(mr)) { > > > -error_report("iommu map to non memory area %"HWADDR_PRIx"", > > > - xlat); > > > -return false; > > > -} else if (memory_region_has_ram_discard_manager(mr)) { > > > -RamDiscardManager *rdm = > > > memory_region_get_ram_discard_manager(mr); > > > -MemoryRegionSection tmp = { > > > -.mr = mr, > > > -.offset_within_region = xlat, > > > -.size = int128_make64(len), > > > -}; > > > - > > > -/* > > > - * Malicious VMs can map memory into the IOMMU, which is expected > > > - * to remain discarded. vfio will pin all pages, populating > > > memory. > > > - * Disallow that. vmstate priorities make sure any > > > RamDiscardManager > > > - * were already restored before IOMMUs are restored. > > > - */ > > > -if (!ram_discard_manager_is_populated(rdm, &tmp)) { > > > -error_report("iommu map to discarded memory (e.g., unplugged > > > via" > > > - " virtio-mem): %"HWADDR_PRIx"", > > > - iotlb->translated_addr); > > > -return false; > > > -} > > > - > > > -/* > > > - * Malicious VMs might trigger discarding of IOMMU-mapped > > > memory. The > > > - * pages will remain pinned inside vfio until unmapped, > > > resulting in a > > > - * higher memory consumption than expected. If memory would get > > > - * populated again later, there would be an inconsistency > > > between pages > > > - * pinned by vfio and pages seen by QEMU. This is the case until > > > - * unmapped from the IOMMU (e.g., during device reset). > > > - * > > > - * With malicious guests, we really only care about pinning more > > > memory > > > - * than expected. RLIMIT_MEMLOCK set for the user/process can > > > never be > > > - * exceeded and can be used to mitigate this problem. > > > - */ > > > -warn_report_once("Using vfio with vIOMMUs and coordinated > > > discarding of" > > > - " RAM (e.g., virtio-mem) works, however, > > > malicious" > > > - " guests can trigger pinning of more memory > > > than" > > > - " intended via an IOMMU. It's possible to > > > mitigate " > > > - " by setting/adjusting RLIMIT_MEMLOCK."); > > > -} > > > - > > > -/* > > > - * Translation truncates length to the IOMMU page size, > > > - * check that it did not truncate too much. > > > - */ > > > -if (len & iotlb->addr_mask) { > > > -error_report("iommu has granularity incompatible with target > > > AS"); > > > -return false; > > > -} > > > - > > > -if (vaddr) { > > > -*vaddr = memory_region_get_ram_ptr(mr) + xlat; > > > -} > > > - > > > -if (ram_addr) { > > > -*ram_addr = memory_region_get_ram_addr(mr) + xlat; > > > -} > > > - > > > -if (read_only) { > > > -*read_only =
Re: [PATCH V4 3/4] intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
On Thu, Oct 27, 2022 at 9:16 PM Yi Liu wrote: > > On 2022/10/27 15:50, Jason Wang wrote: > > We used to have a macro for VTD_PE_GET_FPD_ERR() but it has an > > internal goto which prevents it from being reused. This patch convert > > that macro to a dedicated function and let the caller to decide what > > to do (e.g using goto or not). This makes sure it can be re-used for > > other function that requires fault reporting. > > > > Signed-off-by: Jason Wang > > --- > > Changes since V2: > > - rename vtd_qualify_report_fault() to vtd_report_qualify_fault() > > --- > > hw/i386/intel_iommu.c | 42 -- > > 1 file changed, 28 insertions(+), 14 deletions(-) > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > index 6abe12a8c5..6c03ecf3cb 100644 > > --- a/hw/i386/intel_iommu.c > > +++ b/hw/i386/intel_iommu.c > > @@ -49,17 +49,6 @@ > > /* pe operations */ > > #define VTD_PE_GET_TYPE(pe) ((pe)->val[0] & VTD_SM_PASID_ENTRY_PGTT) > > #define VTD_PE_GET_LEVEL(pe) (2 + (((pe)->val[0] >> 2) & > > VTD_SM_PASID_ENTRY_AW)) > > -#define VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, > > is_write) {\ > > -if (ret_fr) { > >\ > > -ret_fr = -ret_fr; > >\ > > -if (is_fpd_set && vtd_is_qualified_fault(ret_fr)) { > >\ > > -trace_vtd_fault_disabled(); > >\ > > -} else { > >\ > > -vtd_report_dmar_fault(s, source_id, addr, ret_fr, is_write); > >\ > > -} > >\ > > -goto error; > >\ > > -} > >\ > > -} > > > > /* > >* PCI bus number (or SID) is not reliable since the device is usaully > > @@ -1718,6 +1707,19 @@ out: > > trace_vtd_pt_enable_fast_path(source_id, success); > > } > > > > +static void vtd_report_qualify_fault(IntelIOMMUState *s, > > + int err, bool is_fpd_set, > > + uint16_t source_id, > > + hwaddr addr, > > + bool is_write) > > +{ > > +if (is_fpd_set && vtd_is_qualified_fault(err)) { > > +trace_vtd_fault_disabled(); > > +} else { > > +vtd_report_dmar_fault(s, source_id, addr, err, is_write); > > seems like this will report non-qualified fault. so the naming is not > most suit. :-) Otherwise, I'm ok with the change. Right, let me rename it to vtd_report_fault(). Thanks > > > +} > > +} > > + > > /* Map dev to context-entry then do a paging-structures walk to do a iommu > >* translation. > >* > > @@ -1778,7 +1780,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace > > *vtd_as, PCIBus *bus, > > is_fpd_set = ce.lo & VTD_CONTEXT_ENTRY_FPD; > > if (!is_fpd_set && s->root_scalable) { > > ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set); > > -VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, > > is_write); > > +if (ret_fr) { > > +vtd_report_qualify_fault(s, -ret_fr, is_fpd_set, > > + source_id, addr, is_write); > > +goto error; > > +} > > } > > } else { > > ret_fr = vtd_dev_to_context_entry(s, bus_num, devfn, &ce); > > @@ -1786,7 +1792,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace > > *vtd_as, PCIBus *bus, > > if (!ret_fr && !is_fpd_set && s->root_scalable) { > > ret_fr = vtd_ce_get_pasid_fpd(s, &ce, &is_fpd_set); > > } > > -VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, > > is_write); > > +if (ret_fr) { > > +vtd_report_qualify_fault(s, -ret_fr, is_fpd_set, > > + source_id, addr, is_write); > > +goto error; > > +} > > /* Update context-cache */ > > trace_vtd_iotlb_cc_update(bus_num, devfn, ce.hi, ce.lo, > > cc_entry->context_cache_gen, > > @@ -1822,7 +1832,11 @@ static bool vtd_do_iommu_translate(VTDAddressSpace > > *vtd_as, PCIBus *bus, > > > > ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level, > > &reads, &writes, s->aw_bits); > > -VTD_PE_GET_FPD_ERR(ret_fr, is_fpd_set, s, source_id, addr, is_write); > > +if (ret_fr) { > > +vtd_report_qualify_fault(s, -ret_fr, is_fpd_set, source_id, > > + addr, is_write); > > +goto error; > > +} > > > > page_mask =
Re: [PATCH V4 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry
On Thu, Oct 27, 2022 at 9:20 PM Yi Liu wrote: > > On 2022/10/27 15:50, Jason Wang wrote: > > We use to warn on wrong rid2pasid entry. But this error could be > > triggered by the guest and could happens during initialization. So > > let's don't warn in this case. > > > > Signed-off-by: Jason Wang > > --- > > hw/i386/intel_iommu.c | 6 -- > > 1 file changed, 4 insertions(+), 2 deletions(-) > > > > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c > > index 6524c2ee32..796f924c06 100644 > > --- a/hw/i386/intel_iommu.c > > +++ b/hw/i386/intel_iommu.c > > @@ -1554,8 +1554,10 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, > > VTDContextEntry *ce) > > if (s->root_scalable) { > > ret = vtd_ce_get_rid2pasid_entry(s, ce, &pe); > > if (ret) { > > ret is no more used in this branch. It may be changed to below. right? Right. > > if (vtd_ce_get_rid2pasid_entry(s, ce, &pe)) { > ... > } > > > -error_report_once("%s: vtd_ce_get_rid2pasid_entry error: > > %"PRId32, > > - __func__, ret); > > +/* > > + * This error is guest triggerable. We should assumt PT > > s/triggerable/trigger-able > s/assumt/assume Fixed. Thanks > > > + * not enabled for safety. > > + */ > > return false; > > } > > return (VTD_PE_GET_TYPE(&pe) == VTD_SM_PASID_ENTRY_PT); > > -- > Regards, > Yi Liu >
Re: [PATCH] net: improve error message for missing netdev backend
On Thu, Oct 27, 2022 at 6:52 PM Daniel P. Berrangé wrote: > > ping: Jason, are you willing to queue this since it has two > positive reviews. > Yes, I've queued this. Thanks > On Mon, Oct 03, 2022 at 11:06:12AM +0100, Daniel P. Berrangé wrote: > > The current message when using '-net user...' with SLIRP disabled at > > compile time is: > > > > qemu-system-x86_64: -net user: Parameter 'type' expects a net backend > > type (maybe it is not compiled into this binary) > > > > An observation is that we're using the 'netdev->type' field here which > > is an enum value, produced after QAPI has converted from its string > > form. > > > > IOW, at this point in the code, we know that the user's specified > > type name was a valid network backend. The only possible scenario that > > can make the backend init function be NULL, is if support for that > > backend was disabled at build time. Given this, we don't need to caveat > > our error message with a 'maybe' hint, we can be totally explicit. > > > > The use of QERR_INVALID_PARAMETER_VALUE doesn't really lend itself to > > user friendly error message text. Since this is not used to set a > > specific QAPI error class, we can simply stop using this pre-formatted > > error text and provide something better. > > > > Thus the new message is: > > > > qemu-system-x86_64: -net user: network backend 'user' is not compiled > > into this binary > > > > The case of passing 'hubport' for -net is also given a message reminding > > people they should have used -netdev/-nic instead, as this backend type > > is only valid for the modern syntax. > > > > Signed-off-by: Daniel P. Berrangé > > --- > > > > NB, this does not make any difference to people who were relying on the > > QEMU built-in default hub that was created if you don't list any -net / > > -netdev / -nic argument, only those using explicit args. > > > > net/net.c | 18 +++--- > > 1 file changed, 11 insertions(+), 7 deletions(-) > > > > diff --git a/net/net.c b/net/net.c > > index 2db160e063..8ddafacf13 100644 > > --- a/net/net.c > > +++ b/net/net.c > > @@ -1036,19 +1036,23 @@ static int net_client_init1(const Netdev *netdev, > > bool is_netdev, Error **errp) > > if (is_netdev) { > > if (netdev->type == NET_CLIENT_DRIVER_NIC || > > !net_client_init_fun[netdev->type]) { > > -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type", > > - "a netdev backend type"); > > +error_setg(errp, "network backend '%s' is not compiled into > > this binary", > > + NetClientDriver_str(netdev->type)); > > return -1; > > } > > } else { > > if (netdev->type == NET_CLIENT_DRIVER_NONE) { > > return 0; /* nothing to do */ > > } > > -if (netdev->type == NET_CLIENT_DRIVER_HUBPORT || > > -!net_client_init_fun[netdev->type]) { > > -error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "type", > > - "a net backend type (maybe it is not compiled " > > - "into this binary)"); > > +if (netdev->type == NET_CLIENT_DRIVER_HUBPORT) { > > +error_setg(errp, "network backend '%s' is only supported with > > -netdev/-nic", > > + NetClientDriver_str(netdev->type)); > > +return -1; > > +} > > + > > +if (!net_client_init_fun[netdev->type]) { > > +error_setg(errp, "network backend '%s' is not compiled into > > this binary", > > + NetClientDriver_str(netdev->type)); > > return -1; > > } > > > > -- > > 2.37.3 > > > > With regards, > Daniel > -- > |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o-https://fstop138.berrange.com :| > |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :| >
Re: [PATCH 3/3] vdpa: Expose VIRTIO_NET_F_STATUS unconditionally
On Thu, Oct 27, 2022 at 6:18 PM Eugenio Perez Martin wrote: > > On Thu, Oct 27, 2022 at 8:54 AM Jason Wang wrote: > > > > On Thu, Oct 27, 2022 at 2:47 PM Eugenio Perez Martin > > wrote: > > > > > > On Thu, Oct 27, 2022 at 6:32 AM Jason Wang wrote: > > > > > > > > > > > > 在 2022/10/26 17:53, Eugenio Pérez 写道: > > > > > Now that qemu can handle and emulate it if the vdpa backend does not > > > > > support it we can offer it always. > > > > > > > > > > Signed-off-by: Eugenio Pérez > > > > > > > > > > > > I may miss something but isn't more easier to simply remove the > > > > _F_STATUS from vdpa_feature_bits[]? > > > > > > > > > > How is that? if we remove it, the guest cannot ack it so it cannot > > > access the net status, isn't it? > > > > My understanding is that the bits stored in the vdpa_feature_bits[] > > are the features that must be explicitly supported by the vhost > > device. > > (Non English native here, so maybe I don't get what you mean :) ) The > device may not support them. net simulator lacks some of them > actually, and it works. Speaking too fast, I think I meant that, if the bit doesn't belong to vdpa_feature_bits[], it is assumed to be supported by the Qemu without the support of the vhost. So Qemu won't even try to validate if vhost has this support. E.g for vhost-net, we only have: static const int kernel_feature_bits[] = { VIRTIO_F_NOTIFY_ON_EMPTY, VIRTIO_RING_F_INDIRECT_DESC, VIRTIO_RING_F_EVENT_IDX, VIRTIO_NET_F_MRG_RXBUF, VIRTIO_F_VERSION_1, VIRTIO_NET_F_MTU, VIRTIO_F_IOMMU_PLATFORM, VIRTIO_F_RING_PACKED, VIRTIO_NET_F_HASH_REPORT, VHOST_INVALID_FEATURE_BIT }; You can see there's no STATUS bit there since it is emulated by Qemu. > > From what I see these are the only features that will be forwarded to > the guest as device_features. If it is not in the list, the feature > will be masked out, Only when there's no support for this feature from the vhost. > as if the device does not support it. > > So now _F_STATUS it was forwarded only if the device supports it. If > we remove it from bit_mask, it will never be offered to the guest. But > we want to offer it always, since we will need it for > _F_GUEST_ANNOUNCE. > > Things get more complex because we actually need to ack it back if the > device offers it, so the vdpa device can report link_down. We will > only emulate LINK_UP always in the case the device does not support > _F_STATUS. > > > So if we remove _F_STATUS, Qemu vhost code won't validate if > > vhost-vdpa device has this support: > > > > uint64_t vhost_get_features(struct vhost_dev *hdev, const int *feature_bits, > > uint64_t features) > > { > > const int *bit = feature_bits; > > while (*bit != VHOST_INVALID_FEATURE_BIT) { > > uint64_t bit_mask = (1ULL << *bit); > > if (!(hdev->features & bit_mask)) { > > features &= ~bit_mask; > > } > > bit++; > > } > > return features; > > } > > > > Now maybe I'm the one missing something, but why is this not done as a > masking directly? Not sure, the code has been there since day 0. But you can see from the code: 1) if STATUS is in feature_bits, we need validate the hdev->features and mask it if the vhost doesn't have the support 2) if STATUS is not, we don't do the check and driver may still see STATUS Thanks > > Instead of making feature_bits an array of ints, to declare it as a > uint64_t with the valid feature bits and simply return features & > feature_bits. > > Thanks! > > > Thanks > > > > > > > > > > > > The goal with this patch series is to let the guest access the status > > > always, even if the device doesn't support _F_STATUS. > > > > > > > Thanks > > > > > > > > > > > > > --- > > > > > include/net/vhost-vdpa.h | 1 + > > > > > hw/net/vhost_net.c | 16 ++-- > > > > > net/vhost-vdpa.c | 3 +++ > > > > > 3 files changed, 18 insertions(+), 2 deletions(-) > > > > > > > > > > diff --git a/include/net/vhost-vdpa.h b/include/net/vhost-vdpa.h > > > > > index b81f9a6f2a..cfbcce6427 100644 > > > > > --- a/include/net/vhost-vdpa.h > > > > > +++ b/include/net/vhost-vdpa.h > > > > > @@ -17,5 +17,6 @@ > > > > > struct vhost_net *vhost_vdpa_get_vhost_net(NetClientState *nc); > > > > > > > > > > extern const int vdpa_feature_bits[]; > > > > > +extern const uint64_t vhost_vdpa_net_added_feature_bits; > > > > > > > > > > #endif /* VHOST_VDPA_H */ > > > > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c > > > > > index d28f8b974b..7c15cc6e8f 100644 > > > > > --- a/hw/net/vhost_net.c > > > > > +++ b/hw/net/vhost_net.c > > > > > @@ -109,10 +109,22 @@ static const int > > > > > *vhost_net_get_feature_bits(struct vhost_net *net) > > > > > return feature_bits; > > > > > } > > > > > > > > > > +static uint64_t vhost_net_add_feature_bits(struct vhost_net *net) > > > > > +{ > > > > > +if (net->nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { > > > > > +
Re: [PATCH v2] vhost-vdpa: allow passing opened vhostfd to vhost-vdpa
On Fri, Oct 28, 2022 at 5:56 AM Si-Wei Liu wrote: > > Hi Jason, > > Sorry for top posting, but are you going to queue this patch? It looks > like the discussion has been settled and no further comment I got for 2 > weeks for this patch. Yes, I've queued this. Thanks > > Thanks, > -Siwei > > On 10/13/2022 4:12 PM, Si-Wei Liu wrote: > > Jason, > > > > On 10/12/2022 10:02 PM, Jason Wang wrote: > >> > >> 在 2022/10/12 13:59, Si-Wei Liu 写道: > >>> > >>> > >>> On 10/11/2022 8:09 PM, Jason Wang wrote: > On Tue, Oct 11, 2022 at 1:18 AM Si-Wei Liu > wrote: > > On 10/8/2022 10:43 PM, Jason Wang wrote: > > > > On Sat, Oct 8, 2022 at 5:04 PM Si-Wei Liu > > wrote: > > > > Similar to other vhost backends, vhostfd can be passed to vhost-vdpa > > backend as another parameter to instantiate vhost-vdpa net client. > > This would benefit the use case where only open file descriptors, as > > opposed to raw vhost-vdpa device paths, are accessible from the QEMU > > process. > > > > (qemu) netdev_add type=vhost-vdpa,vhostfd=61,id=vhost-vdpa1 > > > > Adding Cindy. > > > > This has been discussed before, we've already had > > vhostdev=/dev/fdset/$fd which should be functional equivalent to what > > has been proposed here. (And this is how libvirt works if I > > understand > > correctly). > > > > Yes, I was aware of that discussion. However, our implementation > > of the management software is a bit different from libvirt, in > > which the paths in /dev/fdset/NNN can't be dynamically passed to > > the container where QEMU is running. By using a specific vhostfd > > property with existing code, it would allow our mgmt software > > smooth adaption without having to add too much infra code to > > support the /dev/fdset/NNN trick. > I think fdset has extra flexibility in e.g hot-plug to allow the file > descriptor to be passed with SCM_RIGHTS. > >>> Yes, that's exactly the use case we'd like to support. Though the > >>> difference in our mgmt software stack from libvirt is that any > >>> dynamic path in /dev (like /dev/fdset/ABC or /dev/vhost-vdpa-XYZ) > >>> can't be allowed to get passed through to the container running QEMU > >>> on the fly for security reasons. fd passing is allowed, though, with > >>> very strict security checks. > >> > >> > >> Interesting, any reason for disallowing fd passing? > > For our mgmt software stack, QEMU is running in a secured container > > with its own namespace(s) with minimally well known and trusted > > devices from root ns exposed (only) at the time when QEMU is being > > started. Direct fd passing via SCM_RIGHTS is allowed, but fdset > > device node exposure is not allowed and not even considered useful to > > us, as it adds an unwarranted attack surface to the QEMU's secured > > container unnecessarily. This has been the case and our security model > > for a while now w.r.t hot plugging vhost-net/tap and vhost-scsi > > devices, so will do for vhost-vdpa with vhostfd. It's not an open > > source project, though what I can share is that it's not a simple > > script that can be easily changed, and allow passing extra devices > > e.g. fdset especially on the fly is not even in consideration per > > suggested security guideline. I think we don't do anything special > > here as with other secured containers that disallow dynamic device > > injection on the fly. > > > >> I'm asking since it's the way that libvirt work and it seems to me we > >> don't get any complaints in the past. > > I guess it was because libvirt doesn't run QEMU in a container with > > very limited device exposure, otherwise this sort of constraints would > > pop up. Anyway the point and the way I see it is that passing vhostfd > > is proved to be working well and secure with other vhost devices, I > > don't see why vhost-vdpa is treated special here that would need to > > enforce the fdset usage. It's an edge case for libvirt maybe, but > > supporting QEMU's vhost-vdpa device to run in a securely contained > > environment with no dynamic device injection shouldn't be an odd or > > bizarre use case. > > > > > > Thanks, > > -Siwei > > > >> > >> > >>> That's the main motivation for this direct vhostfd passing support > >>> (noted fdset doesn't need to be used along with /dev/fdset node). > >>> > >>> Having it said, I found there's also nuance in the > >>> vhostdev=/dev/fdset/XyZ interface besides the /dev node limitation: > >>> the fd to open has to be dup'ed from the original one passed via > >>> SCM_RIGHTS. This also has implication on security that any ioctl > >>> call from QEMU can't be audited through the original fd. > >> > >> > >> I'm not sure I get this, but management layer can enforce a ioctl > >> whiltelist for safety. > >> > >> Thanks > >> > >> > >>> With this regard, I think vhostfd offers more flexibility than work > >>> around those qemu_open() specifics. Would these justify the use cas
Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
On Fri, Oct 28, 2022 at 5:11 AM Alex Williamson wrote: > > On Thu, 27 Oct 2022 15:40:31 +0800 > Cindy Lu wrote: > > > Move the function vfio_get_xlat_addr to softmmu/memory.c, and > > change the name to memory_get_xlat_addr().So we can use this > > function in other devices,such as vDPA device. > > > > Signed-off-by: Cindy Lu > > --- > > hw/vfio/common.c | 92 ++- > > include/exec/memory.h | 4 ++ > > softmmu/memory.c | 84 +++ > > 3 files changed, 92 insertions(+), 88 deletions(-) > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > > index ace9562a9b..2b5a9f3d8d 100644 > > --- a/hw/vfio/common.c > > +++ b/hw/vfio/common.c > > @@ -574,92 +574,6 @@ static bool > > vfio_listener_skipped_section(MemoryRegionSection *section) > > section->offset_within_address_space & (1ULL << 63); > > } > > > > -/* Called with rcu_read_lock held. */ > > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, > > - ram_addr_t *ram_addr, bool *read_only) > > -{ > > -MemoryRegion *mr; > > -hwaddr xlat; > > -hwaddr len = iotlb->addr_mask + 1; > > -bool writable = iotlb->perm & IOMMU_WO; > > - > > -/* > > - * The IOMMU TLB entry we have just covers translation through > > - * this IOMMU to its immediate target. We need to translate > > - * it the rest of the way through to memory. > > - */ > > -mr = address_space_translate(&address_space_memory, > > - iotlb->translated_addr, > > - &xlat, &len, writable, > > - MEMTXATTRS_UNSPECIFIED); > > -if (!memory_region_is_ram(mr)) { > > -error_report("iommu map to non memory area %"HWADDR_PRIx"", > > - xlat); > > -return false; > > -} else if (memory_region_has_ram_discard_manager(mr)) { > > -RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr); > > -MemoryRegionSection tmp = { > > -.mr = mr, > > -.offset_within_region = xlat, > > -.size = int128_make64(len), > > -}; > > - > > -/* > > - * Malicious VMs can map memory into the IOMMU, which is expected > > - * to remain discarded. vfio will pin all pages, populating memory. > > - * Disallow that. vmstate priorities make sure any > > RamDiscardManager > > - * were already restored before IOMMUs are restored. > > - */ > > -if (!ram_discard_manager_is_populated(rdm, &tmp)) { > > -error_report("iommu map to discarded memory (e.g., unplugged > > via" > > - " virtio-mem): %"HWADDR_PRIx"", > > - iotlb->translated_addr); > > -return false; > > -} > > - > > -/* > > - * Malicious VMs might trigger discarding of IOMMU-mapped memory. > > The > > - * pages will remain pinned inside vfio until unmapped, resulting > > in a > > - * higher memory consumption than expected. If memory would get > > - * populated again later, there would be an inconsistency between > > pages > > - * pinned by vfio and pages seen by QEMU. This is the case until > > - * unmapped from the IOMMU (e.g., during device reset). > > - * > > - * With malicious guests, we really only care about pinning more > > memory > > - * than expected. RLIMIT_MEMLOCK set for the user/process can > > never be > > - * exceeded and can be used to mitigate this problem. > > - */ > > -warn_report_once("Using vfio with vIOMMUs and coordinated > > discarding of" > > - " RAM (e.g., virtio-mem) works, however, > > malicious" > > - " guests can trigger pinning of more memory than" > > - " intended via an IOMMU. It's possible to > > mitigate " > > - " by setting/adjusting RLIMIT_MEMLOCK."); > > -} > > - > > -/* > > - * Translation truncates length to the IOMMU page size, > > - * check that it did not truncate too much. > > - */ > > -if (len & iotlb->addr_mask) { > > -error_report("iommu has granularity incompatible with target AS"); > > -return false; > > -} > > - > > -if (vaddr) { > > -*vaddr = memory_region_get_ram_ptr(mr) + xlat; > > -} > > - > > -if (ram_addr) { > > -*ram_addr = memory_region_get_ram_addr(mr) + xlat; > > -} > > - > > -if (read_only) { > > -*read_only = !writable || mr->readonly; > > -} > > - > > -return true; > > -} > > - > > static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > > { > > VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n); > > @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, > > IOM
[PATCH v2 0/3] Improve FDT and support TPM for LoongArch
This series load FDT table into dram memory space and add uart,rtc info into FDT, Add TPM device for LoongArch virt machine. Changes for v2: 1. Modify the title of the first patch("Change FDT base") to "Load FDT table into dram memory space". Changes for v1: 1. Change FDT base addr to 2 MiB. 2. Add uart and rtc info into FDT. 3. Add TPM device support. Xiaojuan Yang (3): hw/loongarch: Load FDT table into dram memory space hw/loongarch: Improve fdt for LoongArch virt machine hw/loongarch: Add TPM device for LoongArch virt machine hw/loongarch/acpi-build.c | 50 -- hw/loongarch/virt.c | 53 - include/hw/loongarch/virt.h | 3 --- include/hw/pci-host/ls7a.h | 1 + 4 files changed, 95 insertions(+), 12 deletions(-) -- 2.31.1
[PATCH v2 3/3] hw/loongarch: Add TPM device for LoongArch virt machine
Add TPM device for LoongArch virt machine, including establish TPM acpi info and add TYPE_TPM_TIS_SYSBUS to dynamic_sysbus_devices list. Signed-off-by: Xiaojuan Yang --- hw/loongarch/acpi-build.c | 50 +-- hw/loongarch/virt.c | 4 2 files changed, 52 insertions(+), 2 deletions(-) diff --git a/hw/loongarch/acpi-build.c b/hw/loongarch/acpi-build.c index 378a6d9d38..1d0e562435 100644 --- a/hw/loongarch/acpi-build.c +++ b/hw/loongarch/acpi-build.c @@ -31,6 +31,9 @@ #include "hw/acpi/generic_event_device.h" #include "hw/pci-host/gpex.h" +#include "sysemu/tpm.h" +#include "hw/platform-bus.h" +#include "hw/acpi/aml-build.h" #define ACPI_BUILD_ALIGN_SIZE 0x1000 #define ACPI_BUILD_TABLE_SIZE 0x2 @@ -275,6 +278,41 @@ static void build_pci_device_aml(Aml *scope, LoongArchMachineState *lams) acpi_dsdt_add_gpex(scope, &cfg); } +#ifdef CONFIG_TPM +static void acpi_dsdt_add_tpm(Aml *scope, LoongArchMachineState *vms) +{ +PlatformBusDevice *pbus = PLATFORM_BUS_DEVICE(vms->platform_bus_dev); +hwaddr pbus_base = VIRT_PLATFORM_BUS_BASEADDRESS; +SysBusDevice *sbdev = SYS_BUS_DEVICE(tpm_find()); +MemoryRegion *sbdev_mr; +hwaddr tpm_base; + +if (!sbdev) { +return; +} + +tpm_base = platform_bus_get_mmio_addr(pbus, sbdev, 0); +assert(tpm_base != -1); + +tpm_base += pbus_base; + +sbdev_mr = sysbus_mmio_get_region(sbdev, 0); + +Aml *dev = aml_device("TPM0"); +aml_append(dev, aml_name_decl("_HID", aml_string("MSFT0101"))); +aml_append(dev, aml_name_decl("_STR", aml_string("TPM 2.0 Device"))); +aml_append(dev, aml_name_decl("_UID", aml_int(0))); + +Aml *crs = aml_resource_template(); +aml_append(crs, + aml_memory32_fixed(tpm_base, + (uint32_t)memory_region_size(sbdev_mr), + AML_READ_WRITE)); +aml_append(dev, aml_name_decl("_CRS", crs)); +aml_append(scope, dev); +} +#endif + /* build DSDT */ static void build_dsdt(GArray *table_data, BIOSLinker *linker, MachineState *machine) @@ -289,7 +327,9 @@ build_dsdt(GArray *table_data, BIOSLinker *linker, MachineState *machine) build_uart_device_aml(dsdt); build_pci_device_aml(dsdt, lams); build_la_ged_aml(dsdt, machine); - +#ifdef CONFIG_TPM +acpi_dsdt_add_tpm(dsdt, lams); +#endif /* System State Package */ scope = aml_scope("\\"); pkg = aml_package(4); @@ -358,7 +398,13 @@ static void acpi_build(AcpiBuildTables *tables, MachineState *machine) build_mcfg(tables_blob, tables->linker, &mcfg, lams->oem_id, lams->oem_table_id); } - +/* TPM info */ +if (tpm_get_version(tpm_find()) == TPM_VERSION_2_0) { +acpi_add_table(table_offsets, tables_blob); +build_tpm2(tables_blob, tables->linker, + tables->tcpalog, lams->oem_id, + lams->oem_table_id); +} /* Add tables supplied by user (if any) */ for (u = acpi_table_first(); u; u = acpi_table_next(u)) { unsigned len = acpi_table_len(u); diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c index eed9d591e7..c1612d5e05 100644 --- a/hw/loongarch/virt.c +++ b/hw/loongarch/virt.c @@ -41,6 +41,7 @@ #include "hw/platform-bus.h" #include "hw/display/ramfb.h" #include "hw/mem/pc-dimm.h" +#include "sysemu/tpm.h" static void fdt_add_rtc_node(LoongArchMachineState *lams) { @@ -993,6 +994,9 @@ static void loongarch_class_init(ObjectClass *oc, void *data) object_class_property_set_description(oc, "acpi", "Enable ACPI"); machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE); +#ifdef CONFIG_TPM +machine_class_allow_dynamic_sysbus_dev(mc, TYPE_TPM_TIS_SYSBUS); +#endif } static const TypeInfo loongarch_machine_types[] = { -- 2.31.1
[PATCH v2 2/3] hw/loongarch: Improve fdt for LoongArch virt machine
Add new items into LoongArch FDT, including rtc and uart info. Signed-off-by: Xiaojuan Yang --- hw/loongarch/virt.c| 31 +++ include/hw/pci-host/ls7a.h | 1 + 2 files changed, 32 insertions(+) diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c index fe33e7e3e4..eed9d591e7 100644 --- a/hw/loongarch/virt.c +++ b/hw/loongarch/virt.c @@ -42,6 +42,35 @@ #include "hw/display/ramfb.h" #include "hw/mem/pc-dimm.h" +static void fdt_add_rtc_node(LoongArchMachineState *lams) +{ +char *nodename; +hwaddr base = VIRT_RTC_REG_BASE; +hwaddr size = VIRT_RTC_LEN; +MachineState *ms = MACHINE(lams); + +nodename = g_strdup_printf("/rtc@%" PRIx64, base); +qemu_fdt_add_subnode(ms->fdt, nodename); +qemu_fdt_setprop_string(ms->fdt, nodename, "compatible", "loongson,ls7a-rtc"); +qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg", 0x0, base, size); +g_free(nodename); +} + +static void fdt_add_uart_node(LoongArchMachineState *lams) +{ +char *nodename; +hwaddr base = VIRT_UART_BASE; +hwaddr size = VIRT_UART_SIZE; +MachineState *ms = MACHINE(lams); + +nodename = g_strdup_printf("/serial@%" PRIx64, base); +qemu_fdt_add_subnode(ms->fdt, nodename); +qemu_fdt_setprop_string(ms->fdt, nodename, "compatible", "ns16550a"); +qemu_fdt_setprop_cells(ms->fdt, nodename, "reg", 0x0, base, 0x0, size); +qemu_fdt_setprop_cell(ms->fdt, nodename, "clock-frequency", 1); +g_free(nodename); +} + static void create_fdt(LoongArchMachineState *lams) { MachineState *ms = MACHINE(lams); @@ -422,6 +451,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, LoongArchMachineState * qdev_get_gpio_in(pch_pic, VIRT_UART_IRQ - PCH_PIC_IRQ_OFFSET), 115200, serial_hd(0), DEVICE_LITTLE_ENDIAN); +fdt_add_uart_node(lams); /* Network init */ for (i = 0; i < nb_nics; i++) { @@ -442,6 +472,7 @@ static void loongarch_devices_init(DeviceState *pch_pic, LoongArchMachineState * sysbus_create_simple("ls7a_rtc", VIRT_RTC_REG_BASE, qdev_get_gpio_in(pch_pic, VIRT_RTC_IRQ - PCH_PIC_IRQ_OFFSET)); +fdt_add_rtc_node(lams); pm_mem = g_new(MemoryRegion, 1); memory_region_init_io(pm_mem, NULL, &loongarch_virt_pm_ops, diff --git a/include/hw/pci-host/ls7a.h b/include/hw/pci-host/ls7a.h index 9bd875ca8b..df7fa55a30 100644 --- a/include/hw/pci-host/ls7a.h +++ b/include/hw/pci-host/ls7a.h @@ -37,6 +37,7 @@ #define VIRT_PCI_IRQS48 #define VIRT_UART_IRQ(PCH_PIC_IRQ_OFFSET + 2) #define VIRT_UART_BASE 0x1fe001e0 +#define VIRT_UART_SIZE 0X100 #define VIRT_RTC_IRQ (PCH_PIC_IRQ_OFFSET + 3) #define VIRT_MISC_REG_BASE (VIRT_PCH_REG_BASE + 0x0008) #define VIRT_RTC_REG_BASE(VIRT_MISC_REG_BASE + 0x00050100) -- 2.31.1
[PATCH v2 1/3] hw/loongarch: Load FDT table into dram memory space
Load FDT table into dram memory space, and the addr is 2 MiB. Since lowmem region starts from 0, FDT base address is located at 2 MiB to avoid NULL pointer access. Signed-off-by: Xiaojuan Yang --- hw/loongarch/virt.c | 18 +++--- include/hw/loongarch/virt.h | 3 --- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/hw/loongarch/virt.c b/hw/loongarch/virt.c index 29df99727d..fe33e7e3e4 100644 --- a/hw/loongarch/virt.c +++ b/hw/loongarch/virt.c @@ -159,7 +159,6 @@ static void fdt_add_pcie_node(const LoongArchMachineState *lams) 1, FDT_PCI_RANGE_MMIO, 2, base_mmio, 2, base_mmio, 2, size_mmio); g_free(nodename); -qemu_fdt_dumpdtb(ms->fdt, lams->fdt_size); } static void fdt_add_irqchip_node(LoongArchMachineState *lams) @@ -689,6 +688,7 @@ static void loongarch_init(MachineState *machine) MemoryRegion *address_space_mem = get_system_memory(); LoongArchMachineState *lams = LOONGARCH_MACHINE(machine); int i; +hwaddr fdt_base; if (!cpu_model) { cpu_model = LOONGARCH_CPU_TYPE_NAME("la464"); @@ -793,12 +793,16 @@ static void loongarch_init(MachineState *machine) lams->machine_done.notify = virt_machine_done; qemu_add_machine_init_done_notifier(&lams->machine_done); fdt_add_pcie_node(lams); - -/* load fdt */ -MemoryRegion *fdt_rom = g_new(MemoryRegion, 1); -memory_region_init_rom(fdt_rom, NULL, "fdt", VIRT_FDT_SIZE, &error_fatal); -memory_region_add_subregion(get_system_memory(), VIRT_FDT_BASE, fdt_rom); -rom_add_blob_fixed("fdt", machine->fdt, lams->fdt_size, VIRT_FDT_BASE); +/* + * Since lowmem region starts from 0, FDT base address is located + * at 2 MiB to avoid NULL pointer access. + * + * Put the FDT into the memory map as a ROM image: this will ensure + * the FDT is copied again upon reset, even if addr points into RAM. + */ +fdt_base = 2 * MiB; +qemu_fdt_dumpdtb(machine->fdt, lams->fdt_size); +rom_add_blob_fixed("fdt", machine->fdt, lams->fdt_size, fdt_base); } bool loongarch_is_acpi_enabled(LoongArchMachineState *lams) diff --git a/include/hw/loongarch/virt.h b/include/hw/loongarch/virt.h index 09f1c88ee5..45c383f5a7 100644 --- a/include/hw/loongarch/virt.h +++ b/include/hw/loongarch/virt.h @@ -28,9 +28,6 @@ #define VIRT_GED_MEM_ADDR (VIRT_GED_EVT_ADDR + ACPI_GED_EVT_SEL_LEN) #define VIRT_GED_REG_ADDR (VIRT_GED_MEM_ADDR + MEMORY_HOTPLUG_IO_LEN) -#define VIRT_FDT_BASE 0x1c40 -#define VIRT_FDT_SIZE 0x10 - struct LoongArchMachineState { /*< private >*/ MachineState parent_obj; -- 2.31.1
Re: [PATCH] i386/cpu: Adjust cpuid addresable id count to match the spec
On 10/10/2022 10:49 AM, Ilya Oleinik wrote: > For EBX bits 23 - 16 in CPUID[01] Intel's manual states: > " > * The nearest power-of-2 integer that is not smaller than EBX[23:16] > is the number of unique initial APICIDs reserved for addressing > different logical processors in a physical package. This field is > only valid if CPUID.1.EDX.HTT[bit 28]= 1. > " > Ensure this condition is met. > > Additionally, apply efb3934adf9ee7794db7e0ade9f576c634592891 to > non passthrough cache mode. > > Signed-off-by: Ilya Oleinik > --- > target/i386/cpu.c | 10 +- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/target/i386/cpu.c b/target/i386/cpu.c > index ad623d91e4..e793bcc03f 100644 > --- a/target/i386/cpu.c > +++ b/target/i386/cpu.c > @@ -245,8 +245,8 @@ static void encode_cache_cpuid4(CPUCacheInfo *cache, > *eax = CACHE_TYPE(cache->type) | > CACHE_LEVEL(cache->level) | > (cache->self_init ? CACHE_SELF_INIT_LEVEL : 0) | > - ((num_cores - 1) << 26) | > - ((num_apic_ids - 1) << 14); > + ((pow2ceil(num_cores) - 1) << 26) | > + ((pow2ceil(num_apic_ids) - 1) << 14); > > assert(cache->line_size > 0); > assert(cache->partitions > 0); > @@ -5258,7 +5258,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, > uint32_t count, > } > *edx = env->features[FEAT_1_EDX]; > if (cs->nr_cores * cs->nr_threads > 1) { > -*ebx |= (cs->nr_cores * cs->nr_threads) << 16; > +*ebx |= (pow2ceil(cs->nr_cores * cs->nr_threads)) << 16; > *edx |= CPUID_HT; > } > if (!cpu->enable_pmu) { > @@ -5313,12 +5313,12 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, > uint32_t count, > switch (count) { > case 0: /* L1 dcache info */ > encode_cache_cpuid4(env->cache_info_cpuid4.l1d_cache, > -1, cs->nr_cores, Hi Ilya, Just curious why the origin implementation is hard-coded to 1 and is there any bug reported related to this? > +cs->nr_threads, cs->nr_cores, > eax, ebx, ecx, edx); > break; > case 1: /* L1 icache info */ > encode_cache_cpuid4(env->cache_info_cpuid4.l1i_cache, > -1, cs->nr_cores, > +cs->nr_threads, cs->nr_cores, > eax, ebx, ecx, edx); > break; > case 2: /* L2 cache info */
Re: [PATCH v1 2/3] target/riscv: Extend isa_ext_data for single letter extensions
On Thu, Oct 27, 2022 at 3:50 PM Mayuresh Chitale wrote: > > Currently the ISA string for a CPU is generated from two different > arrays, one for single letter extensions and another for multi letter > extensions. Add all the single letter extensions to the isa_ext_data > array and use it for generating the ISA string. Also drop 'P' and 'Q' > extensions from the list of single letter extensions as those are not > supported yet. > > Signed-off-by: Mayuresh Chitale Reviewed-by: Alistair Francis Alistair > --- > target/riscv/cpu.c | 41 +++-- > 1 file changed, 23 insertions(+), 18 deletions(-) > > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c > index e6d9c706bb..35320a8547 100644 > --- a/target/riscv/cpu.c > +++ b/target/riscv/cpu.c > @@ -41,8 +41,6 @@ > (QEMU_VERSION_MICRO)) > #define RISCV_CPU_MIMPIDRISCV_CPU_MARCHID > > -static const char riscv_single_letter_exts[] = "IEMAFDQCPVH"; > - > struct isa_ext_data { > const char *name; > bool multi_letter; > @@ -71,6 +69,13 @@ struct isa_ext_data { > *extensions by an underscore. > */ > static const struct isa_ext_data isa_edata_arr[] = { > +ISA_EXT_DATA_ENTRY(i, false, PRIV_VERSION_1_10_0, ext_i), > +ISA_EXT_DATA_ENTRY(e, false, PRIV_VERSION_1_10_0, ext_e), > +ISA_EXT_DATA_ENTRY(m, false, PRIV_VERSION_1_10_0, ext_m), > +ISA_EXT_DATA_ENTRY(a, false, PRIV_VERSION_1_10_0, ext_a), > +ISA_EXT_DATA_ENTRY(f, false, PRIV_VERSION_1_10_0, ext_f), > +ISA_EXT_DATA_ENTRY(d, false, PRIV_VERSION_1_10_0, ext_d), > +ISA_EXT_DATA_ENTRY(c, false, PRIV_VERSION_1_10_0, ext_c), > ISA_EXT_DATA_ENTRY(h, false, PRIV_VERSION_1_12_0, ext_h), > ISA_EXT_DATA_ENTRY(v, false, PRIV_VERSION_1_12_0, ext_v), > ISA_EXT_DATA_ENTRY(zicsr, true, PRIV_VERSION_1_10_0, ext_icsr), > @@ -1182,16 +1187,23 @@ static void riscv_cpu_class_init(ObjectClass *c, void > *data) > device_class_set_props(dc, riscv_cpu_properties); > } > > -static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str, int > max_str_len) > +static void riscv_isa_string_ext(RISCVCPU *cpu, char **isa_str) > { > char *old = *isa_str; > char *new = *isa_str; > int i; > > for (i = 0; i < ARRAY_SIZE(isa_edata_arr); i++) { > -if (isa_edata_arr[i].multi_letter && > -isa_ext_is_enabled(cpu, &isa_edata_arr[i])) { > -new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL); > +if (isa_ext_is_enabled(cpu, &isa_edata_arr[i])) { > +if (isa_edata_arr[i].multi_letter) { > +if (cpu->cfg.short_isa_string) { > +continue; > +} > +new = g_strconcat(old, "_", isa_edata_arr[i].name, NULL); > +} else { > +new = g_strconcat(old, isa_edata_arr[i].name, NULL); > +} > + > g_free(old); > old = new; > } > @@ -1202,19 +1214,12 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char > **isa_str, int max_str_len) > > char *riscv_isa_string(RISCVCPU *cpu) > { > -int i; > -const size_t maxlen = sizeof("rv128") + sizeof(riscv_single_letter_exts); > +const size_t maxlen = sizeof("rv128"); > char *isa_str = g_new(char, maxlen); > -char *p = isa_str + snprintf(isa_str, maxlen, "rv%d", TARGET_LONG_BITS); > -for (i = 0; i < sizeof(riscv_single_letter_exts) - 1; i++) { > -if (cpu->env.misa_ext & RV(riscv_single_letter_exts[i])) { > -*p++ = qemu_tolower(riscv_single_letter_exts[i]); > -} > -} > -*p = '\0'; > -if (!cpu->cfg.short_isa_string) { > -riscv_isa_string_ext(cpu, &isa_str, maxlen); > -} > + > +snprintf(isa_str, maxlen, "rv%d", TARGET_LONG_BITS); > +riscv_isa_string_ext(cpu, &isa_str); > + > return isa_str; > } > > -- > 2.34.1 > >
Re: [PATCH v5 00/13] Instantiate VT82xx functions in host device
Hi Daniel, Bernhard, On 27/10/22 11:47, Daniel Henrique Barboza wrote: On 10/27/22 05:21, Bernhard Beschow wrote: Am 16. September 2022 14:36:05 UTC schrieb "Philippe Mathieu-Daudé" : On 12/9/22 21:50, Bernhard Beschow wrote: Am 1. September 2022 11:41:14 UTC schrieb Bernhard Beschow : Testing done: * `qemu-system-ppc -machine pegasos2 -rtc base=localtime -device ati-vga,guest_hwcursor=true,romfile="" -cdrom morphos-3.17.iso -kernel morphos-3.17/boot.img` Boots successfully and it is possible to open games and tools. * I was unable to test the fuloong2e board even before this series since it seems to be unfinished [1]. A buildroot-baked kernel [2] booted but doesn't find its root partition, though the issues could be in the buildroot receipt I created. [1] https://osdn.net/projects/qmiga/wiki/SubprojectPegasos2 [2] https://github.com/shentok/buildroot/commits/fuloong2e Copying from v2 (just found it in my spam folder :/): Series: Reviewed-by: Philippe Mathieu-Daudé Review seems complete, thanks to all who participated! Now we just need someone to queue this series. Best regards, Bernhard Excellent cleanup! Series queued to mips-next. Hi Phil, would you mind doing a pull request in time for 7.2? I believe Phil was having problems with his amsat.org email. It's better to CC him using his work email phi...@linaro.org (just added it). Phil, since this has pegasos2 changes I can queue it up via ppc-next if you like. I'll toss a PR tomorrow. This series is already queued. I apologize for the lng delay, I am trying to run my usual tests but various fileservers I was using to fetch MIPS binaries disappeared over the last year, so I have to pull these files from offline backups. The PR will be in time for 7.2 however :) Regards, Phil.
Re: [PATCH] target/i386: Expand eflags updates inline
On 27/10/22 12:26, Richard Henderson wrote: The helpers for reset_rf, cli, sti, clac, stac are completely trivial; implement them inline. Drop some nearby #if 0 code. Signed-off-by: Richard Henderson --- target/i386/helper.h| 5 - target/i386/tcg/cc_helper.c | 41 - target/i386/tcg/translate.c | 30 ++- 3 files changed, 25 insertions(+), 51 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 3/4] hw/i386/acpi-build: Resolve PIIX ISA bridge rather than ACPI controller
On 26/10/22 15:31, Bernhard Beschow wrote: Resolving the PIIX ISA bridge rather than the PIIX ACPI controller mirrors the ICH9 code one line below. Signed-off-by: Bernhard Beschow --- hw/i386/acpi-build.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH 4/4] hw/i386/acpi-build: Generate AML for north and south bridges separately
On 26/10/22 15:31, Bernhard Beschow wrote: The code currently assumes Q35 iff ICH9 and i440fx iff PIIX. This is slightly confusing when trying to understand the code. Split north and south bridge code to communicate which piece of code assumes which type of bridge. Signed-off-by: Bernhard Beschow --- hw/i386/acpi-build.c | 22 ++ 1 file changed, 14 insertions(+), 8 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v2] vhost-vdpa: allow passing opened vhostfd to vhost-vdpa
Hi Jason, Sorry for top posting, but are you going to queue this patch? It looks like the discussion has been settled and no further comment I got for 2 weeks for this patch. Thanks, -Siwei On 10/13/2022 4:12 PM, Si-Wei Liu wrote: Jason, On 10/12/2022 10:02 PM, Jason Wang wrote: 在 2022/10/12 13:59, Si-Wei Liu 写道: On 10/11/2022 8:09 PM, Jason Wang wrote: On Tue, Oct 11, 2022 at 1:18 AM Si-Wei Liu wrote: On 10/8/2022 10:43 PM, Jason Wang wrote: On Sat, Oct 8, 2022 at 5:04 PM Si-Wei Liu wrote: Similar to other vhost backends, vhostfd can be passed to vhost-vdpa backend as another parameter to instantiate vhost-vdpa net client. This would benefit the use case where only open file descriptors, as opposed to raw vhost-vdpa device paths, are accessible from the QEMU process. (qemu) netdev_add type=vhost-vdpa,vhostfd=61,id=vhost-vdpa1 Adding Cindy. This has been discussed before, we've already had vhostdev=/dev/fdset/$fd which should be functional equivalent to what has been proposed here. (And this is how libvirt works if I understand correctly). Yes, I was aware of that discussion. However, our implementation of the management software is a bit different from libvirt, in which the paths in /dev/fdset/NNN can't be dynamically passed to the container where QEMU is running. By using a specific vhostfd property with existing code, it would allow our mgmt software smooth adaption without having to add too much infra code to support the /dev/fdset/NNN trick. I think fdset has extra flexibility in e.g hot-plug to allow the file descriptor to be passed with SCM_RIGHTS. Yes, that's exactly the use case we'd like to support. Though the difference in our mgmt software stack from libvirt is that any dynamic path in /dev (like /dev/fdset/ABC or /dev/vhost-vdpa-XYZ) can't be allowed to get passed through to the container running QEMU on the fly for security reasons. fd passing is allowed, though, with very strict security checks. Interesting, any reason for disallowing fd passing? For our mgmt software stack, QEMU is running in a secured container with its own namespace(s) with minimally well known and trusted devices from root ns exposed (only) at the time when QEMU is being started. Direct fd passing via SCM_RIGHTS is allowed, but fdset device node exposure is not allowed and not even considered useful to us, as it adds an unwarranted attack surface to the QEMU's secured container unnecessarily. This has been the case and our security model for a while now w.r.t hot plugging vhost-net/tap and vhost-scsi devices, so will do for vhost-vdpa with vhostfd. It's not an open source project, though what I can share is that it's not a simple script that can be easily changed, and allow passing extra devices e.g. fdset especially on the fly is not even in consideration per suggested security guideline. I think we don't do anything special here as with other secured containers that disallow dynamic device injection on the fly. I'm asking since it's the way that libvirt work and it seems to me we don't get any complaints in the past. I guess it was because libvirt doesn't run QEMU in a container with very limited device exposure, otherwise this sort of constraints would pop up. Anyway the point and the way I see it is that passing vhostfd is proved to be working well and secure with other vhost devices, I don't see why vhost-vdpa is treated special here that would need to enforce the fdset usage. It's an edge case for libvirt maybe, but supporting QEMU's vhost-vdpa device to run in a securely contained environment with no dynamic device injection shouldn't be an odd or bizarre use case. Thanks, -Siwei That's the main motivation for this direct vhostfd passing support (noted fdset doesn't need to be used along with /dev/fdset node). Having it said, I found there's also nuance in the vhostdev=/dev/fdset/XyZ interface besides the /dev node limitation: the fd to open has to be dup'ed from the original one passed via SCM_RIGHTS. This also has implication on security that any ioctl call from QEMU can't be audited through the original fd. I'm not sure I get this, but management layer can enforce a ioctl whiltelist for safety. Thanks With this regard, I think vhostfd offers more flexibility than work around those qemu_open() specifics. Would these justify the use case of concern? Thanks, -Siwei It would still be good to add the support. On the other hand, the other vhost backends, e.g. tap (via vhost-net), vhost-scsi and vhost-vsock all accept vhostfd as parameter to instantiate device, although the /dev/fdset trick also works there. I think vhost-vdpa is not unprecedented in this case? Yes. Thanks Thanks, -Siwei Thanks Signed-off-by: Si-Wei Liu Acked-by: Eugenio Pérez --- v2: - fixed typo in commit message - s/fd's/file descriptors/ --- net/vhost-vdpa.c | 25 - qapi/net.json | 3 +++ qemu-option
Re: [PATCH 2/4] hw/i386/acpi-build: Resolve redundant attribute
On 26/10/22 15:31, Bernhard Beschow wrote: The is_piix4 attribute is set once in one location and read once in another. Doing both in one location allows for removing the attribute altogether. Signed-off-by: Bernhard Beschow --- hw/i386/acpi-build.c | 20 ++-- 1 file changed, 6 insertions(+), 14 deletions(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v2 0/3] hw/isa/piix4: Remove MIPS Malta specific bits
On 27/10/22 22:47, Philippe Mathieu-Daudé wrote: Since v1: - bswap -> tswap (Bernhard) Bernhard posted a series merging both PIIX3/PIIX4 models in one [1]. Due to Malta-specific board code forced into the PIIX4 reset values, Bernhard had to include an array of "register values at reset" as a class property. This is not wrong, but to model properly the model, we should simply use the hardware real reset values, not try to bend the model to please the Malta board. This series fix this issue by having the Malta bootloader code setting the board-specific PIIX4 IRQ routing values. Note patch 2 still misses an equivalent nanoMIPS code. So this series won't be merged until this is added, but it should be enough to let Bernhard keep working on the "Consolidate PIIX series".
Re: [PATCH v4 6/7] hw/sd/sdhci: Implement Freescale eSDHC device model
Hi Bernhard, On 18/10/22 23:01, Bernhard Beschow wrote: Will allow e500 boards to access SD cards using just their own devices. Signed-off-by: Bernhard Beschow --- hw/sd/sdhci.c | 120 +- include/hw/sd/sdhci.h | 3 ++ 2 files changed, 122 insertions(+), 1 deletion(-) diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c index 306070c872..8d8ad9ff24 100644 --- a/hw/sd/sdhci.c +++ b/hw/sd/sdhci.c @@ -1369,6 +1369,7 @@ void sdhci_initfn(SDHCIState *s) s->transfer_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, sdhci_data_transfer, s); s->io_ops = &sdhci_mmio_ops; +s->io_registers_map_size = SDHC_REGISTERS_MAP_SIZE; } void sdhci_uninitfn(SDHCIState *s) @@ -1392,7 +1393,7 @@ void sdhci_common_realize(SDHCIState *s, Error **errp) s->fifo_buffer = g_malloc0(s->buf_maxsz); memory_region_init_io(&s->iomem, OBJECT(s), s->io_ops, s, "sdhci", - SDHC_REGISTERS_MAP_SIZE); + s->io_registers_map_size); I don't think we want to change this region size. [see below] void sdhci_common_unrealize(SDHCIState *s) @@ -1575,6 +1576,122 @@ static const TypeInfo sdhci_bus_info = { .class_init = sdhci_bus_class_init, }; +/* --- qdev Freescale eSDHC --- */ + +/* Watermark Level Register */ +#define ESDHC_WML0x44 + +/* Control Register for DMA transfer */ +#define ESDHC_DMA_SYSCTL0x40c + +#define ESDHC_REGISTERS_MAP_SIZE0x410 My preferred approach would be to create a container region with a size of ESDHC_REGISTERS_MAP_SIZE. Map the SDHC_REGISTERS_MAP region in the container at offset 0, priority -1. Add 2 register regions for ESDHC_WML and ESDHC_DMA_SYSCTL, and map them with priority 1 in the container. ... +static uint64_t esdhci_read(void *opaque, hwaddr offset, unsigned size) +{ +uint64_t ret; + +switch (offset) { +case SDHC_SYSAD: +case SDHC_BLKSIZE: +case SDHC_ARGUMENT: +case SDHC_TRNMOD: +case SDHC_RSPREG0: +case SDHC_RSPREG1: +case SDHC_RSPREG2: +case SDHC_RSPREG3: +case SDHC_BDATA: +case SDHC_PRNSTS: +case SDHC_HOSTCTL: +case SDHC_CLKCON: +case SDHC_NORINTSTS: +case SDHC_NORINTSTSEN: +case SDHC_NORINTSIGEN: +case SDHC_ACMD12ERRSTS: +case SDHC_CAPAB: +case SDHC_SLOT_INT_STATUS: +ret = sdhci_read(opaque, offset, size); +break; ... Then you don't need these cases. +case ESDHC_WML: +case ESDHC_DMA_SYSCTL: +ret = 0; +qemu_log_mask(LOG_UNIMP, "ESDHC rd @0x%02" HWADDR_PRIx + " not implemented\n", offset); But then I realize you only treat these 2 registers as UNIMP. So now, I'd create 1 UNIMP region for ESDHC_WML and map it into SDHC_REGISTERS_MAP (s->iomem) with priority 1, and add another UNIMP region of ESDHC_REGISTERS_MAP_SIZE - SDHC_REGISTERS_MAP_SIZE (= 0x310) and map it normally at offset 0x100 (SDHC_REGISTERS_MAP_SIZE). Look at create_unimp() in hw/arm/bcm2835_peripherals.c. But the ESDHC_WML register has address 0x44 and fits inside the SDHC_REGISTERS_MAP region, so likely belong there. 0x44 is the upper part of the SDHC_CAPAB register. These bits are undefined on the spec v2, which I see you are setting in esdhci_init(). So this register should already return 0, otherwise we have a bug. Thus we don't need to handle this ESDHC_WML particularly. And your model is reduced to handling create_unimp() in esdhci_realize(). Am I missing something? Regards, Phil.
Re: [PATCH 22/24] accel/tcg: Use interval tree for user-only page tracking
On 10/28/22 01:59, Alex Bennée wrote: I'm unwilling to put an expensive test like a function call (have_mmap_lock) before an inexpensive test like pointer != NULL. Is it really that more expensive? Well, yes. I mean, the function call isn't really slow, but it isn't single-cycle like a comparison against 0. Sure, I guess I'm just trying to avoid having so many returns out of the code at various levels of nesting. The page_get_target_data code is harder to follow. What about: int page_get_flags(target_ulong address) { PageFlagsNode *p = pageflags_find(address, address); /* * See util/interval-tree.c re lockless lookups: no false positives but * there are false negatives. If we had the lock and found * nothing we are done, otherwise retry with the mmap lock acquired. */ if (p) { return p->flags; } else if (have_mmap_lock()) { return 0; } mmap_lock(); p = pageflags_find(address, address); mmap_unlock(); return p ? p->flags : 0; } Ok, I can use this. In for v3. As for page_get_target_data, see v2, in which there has been clean up. r~
[PATCH v2] s390x/css: revert SCSW ctrl/flag bits on error
Revert the control and flag bits in the subchannel status word in case the SSCH operation fails with non-zero CC (ditto for CSCH and HSCH). According to POPS, the control and flag bits are only changed if SSCH, CSCH, and HSCH return CC 0, and no other action should be taken otherwise. In order to simulate that after the fact, the bits need to be reverted on non-zero CC. This change is necessary due to the fact that the pwrite() in vfio-ccw which triggers the SSCH can fail at any time. Previously, there was only virtio-ccw, whose do_subchannel_work function was only able to return CC0. However, once vfio-ccw went into the mix, it has become necessary to handle errors in code paths that were previously assumed to always return success. In our case, we found that in case of pwrite() failure (which was discovered by strace injection), the subchannel could be stuck in start pending state, which could be problematic if the pwrite() call returns CC2. Experimentation shows that the guest tries to retry the SSCH call as normal for CC2, but it actually continously fails due to the fact that the subchannel is stuck in start pending state even though no start function is actually taking place. Signed-off-by: Peter Jin --- hw/s390x/css.c | 51 +++--- 1 file changed, 48 insertions(+), 3 deletions(-) diff --git a/hw/s390x/css.c b/hw/s390x/css.c index 7d9523f811..95d1b3a3ce 100644 --- a/hw/s390x/css.c +++ b/hw/s390x/css.c @@ -1522,21 +1522,37 @@ IOInstEnding css_do_xsch(SubchDev *sch) IOInstEnding css_do_csch(SubchDev *sch) { SCHIB *schib = &sch->curr_status; +uint16_t old_scsw_ctrl; +IOInstEnding ccode; if (~(schib->pmcw.flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) { return IOINST_CC_NOT_OPERATIONAL; } +/* + * Save the current scsw.ctrl in case CSCH fails and we need + * to revert the scsw to the status quo ante. + */ +old_scsw_ctrl = schib->scsw.ctrl; + /* Trigger the clear function. */ schib->scsw.ctrl &= ~(SCSW_CTRL_MASK_FCTL | SCSW_CTRL_MASK_ACTL); schib->scsw.ctrl |= SCSW_FCTL_CLEAR_FUNC | SCSW_ACTL_CLEAR_PEND; -return do_subchannel_work(sch); +ccode = do_subchannel_work(sch); + +if (ccode != IOINST_CC_EXPECTED) { +schib->scsw.ctrl = old_scsw_ctrl; +} + +return ccode; } IOInstEnding css_do_hsch(SubchDev *sch) { SCHIB *schib = &sch->curr_status; +uint16_t old_scsw_ctrl; +IOInstEnding ccode; if (~(schib->pmcw.flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) { return IOINST_CC_NOT_OPERATIONAL; @@ -1553,6 +1569,12 @@ IOInstEnding css_do_hsch(SubchDev *sch) return IOINST_CC_BUSY; } +/* + * Save the current scsw.ctrl in case HSCH fails and we need + * to revert the scsw to the status quo ante. + */ +old_scsw_ctrl = schib->scsw.ctrl; + /* Trigger the halt function. */ schib->scsw.ctrl |= SCSW_FCTL_HALT_FUNC; schib->scsw.ctrl &= ~SCSW_FCTL_START_FUNC; @@ -1564,7 +1586,13 @@ IOInstEnding css_do_hsch(SubchDev *sch) } schib->scsw.ctrl |= SCSW_ACTL_HALT_PEND; -return do_subchannel_work(sch); +ccode = do_subchannel_work(sch); + +if (ccode != IOINST_CC_EXPECTED) { +schib->scsw.ctrl = old_scsw_ctrl; +} + +return ccode; } static void css_update_chnmon(SubchDev *sch) @@ -1605,6 +1633,8 @@ static void css_update_chnmon(SubchDev *sch) IOInstEnding css_do_ssch(SubchDev *sch, ORB *orb) { SCHIB *schib = &sch->curr_status; +uint16_t old_scsw_ctrl, old_scsw_flags; +IOInstEnding ccode; if (~(schib->pmcw.flags) & (PMCW_FLAGS_MASK_DNV | PMCW_FLAGS_MASK_ENA)) { return IOINST_CC_NOT_OPERATIONAL; @@ -1626,11 +1656,26 @@ IOInstEnding css_do_ssch(SubchDev *sch, ORB *orb) } sch->orb = *orb; sch->channel_prog = orb->cpa; + +/* + * Save the current scsw.ctrl and scsw.flags in case SSCH fails and we need + * to revert the scsw to the status quo ante. + */ +old_scsw_ctrl = schib->scsw.ctrl; +old_scsw_flags = schib->scsw.flags; + /* Trigger the start function. */ schib->scsw.ctrl |= (SCSW_FCTL_START_FUNC | SCSW_ACTL_START_PEND); schib->scsw.flags &= ~SCSW_FLAGS_MASK_PNO; -return do_subchannel_work(sch); +ccode = do_subchannel_work(sch); + +if (ccode != IOINST_CC_EXPECTED) { +schib->scsw.ctrl = old_scsw_ctrl; +schib->scsw.flags = old_scsw_flags; +} + +return ccode; } static void copy_irb_to_guest(IRB *dest, const IRB *src, const PMCW *pmcw, -- 2.37.3
Re: [PATCH v4 01/30] tests/docker: update fedora-win[32|64]-cross with lcitool
On 10/28/22 04:36, Alex Bennée wrote: Convert another two dockerfiles to lcitool and update. I renamed the helper because it is not Debian specific. We need an updated lcitool for this to deal with the weirdness of a 32bit nsis tool for both 32 and 64 bit builds. As a result there are some minor whitespace and re-order changes in a bunch of the docker files. Signed-off-by: Alex Bennée Message-Id:<20220929114231.583801-10-alex.ben...@linaro.org> --- tests/docker/dockerfiles/alpine.docker| 2 +- tests/docker/dockerfiles/centos8.docker | 2 +- .../dockerfiles/debian-amd64-cross.docker | 234 - tests/docker/dockerfiles/debian-amd64.docker | 236 +- .../dockerfiles/debian-arm64-cross.docker | 232 - .../dockerfiles/debian-armel-cross.docker | 230 - .../dockerfiles/debian-armhf-cross.docker | 232 - .../dockerfiles/debian-mips64el-cross.docker | 226 - .../dockerfiles/debian-mipsel-cross.docker| 226 - .../dockerfiles/debian-ppc64el-cross.docker | 230 - .../dockerfiles/debian-s390x-cross.docker | 228 - .../dockerfiles/fedora-win32-cross.docker | 139 --- .../dockerfiles/fedora-win64-cross.docker | 138 +++--- tests/docker/dockerfiles/fedora.docker| 230 - tests/docker/dockerfiles/opensuse-leap.docker | 2 +- tests/docker/dockerfiles/ubuntu2004.docker| 234 - tests/lcitool/libvirt-ci | 2 +- tests/lcitool/refresh | 48 ++-- 18 files changed, 1499 insertions(+), 1372 deletions(-) Acked-by: Richard Henderson r~
Re: [PATCH v4 28/30] contrib/plugins: protect execlog's last_exec expansion
On 10/28/22 04:36, Alex Bennée wrote: We originally naively treated expansion as safe because we expected each new CPU/thread to appear in order. However the -M raspi2 model triggered a case where a new high cpu_index thread started executing just before a smaller one. Clean this up by converting the GArray into the simpler GPtrArray and then holding a lock for the expansion. Signed-off-by: Alex Bennée Cc: Alexandre Iooss --- contrib/plugins/execlog.c | 38 ++ 1 file changed, 30 insertions(+), 8 deletions(-) Reviewed-by: Richard Henderson r~
Re: [PATCH v4 7/7] hw/ppc/e500: Add Freescale eSDHC to e500plat
On 18/10/22 23:01, Bernhard Beschow wrote: Adds missing functionality to e500plat machine which increases the chance of given "real" firmware images to access SD cards. Signed-off-by: Bernhard Beschow --- docs/system/ppc/ppce500.rst | 12 hw/ppc/Kconfig | 1 + hw/ppc/e500.c | 35 ++- hw/ppc/e500.h | 1 + hw/ppc/e500plat.c | 1 + 5 files changed, 49 insertions(+), 1 deletion(-) Reviewed-by: Philippe Mathieu-Daudé
Re: [PATCH v4 1/2] vfio: move the function vfio_get_xlat_addr() to memory.c
On Thu, 27 Oct 2022 15:40:31 +0800 Cindy Lu wrote: > Move the function vfio_get_xlat_addr to softmmu/memory.c, and > change the name to memory_get_xlat_addr().So we can use this > function in other devices,such as vDPA device. > > Signed-off-by: Cindy Lu > --- > hw/vfio/common.c | 92 ++- > include/exec/memory.h | 4 ++ > softmmu/memory.c | 84 +++ > 3 files changed, 92 insertions(+), 88 deletions(-) > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c > index ace9562a9b..2b5a9f3d8d 100644 > --- a/hw/vfio/common.c > +++ b/hw/vfio/common.c > @@ -574,92 +574,6 @@ static bool > vfio_listener_skipped_section(MemoryRegionSection *section) > section->offset_within_address_space & (1ULL << 63); > } > > -/* Called with rcu_read_lock held. */ > -static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr, > - ram_addr_t *ram_addr, bool *read_only) > -{ > -MemoryRegion *mr; > -hwaddr xlat; > -hwaddr len = iotlb->addr_mask + 1; > -bool writable = iotlb->perm & IOMMU_WO; > - > -/* > - * The IOMMU TLB entry we have just covers translation through > - * this IOMMU to its immediate target. We need to translate > - * it the rest of the way through to memory. > - */ > -mr = address_space_translate(&address_space_memory, > - iotlb->translated_addr, > - &xlat, &len, writable, > - MEMTXATTRS_UNSPECIFIED); > -if (!memory_region_is_ram(mr)) { > -error_report("iommu map to non memory area %"HWADDR_PRIx"", > - xlat); > -return false; > -} else if (memory_region_has_ram_discard_manager(mr)) { > -RamDiscardManager *rdm = memory_region_get_ram_discard_manager(mr); > -MemoryRegionSection tmp = { > -.mr = mr, > -.offset_within_region = xlat, > -.size = int128_make64(len), > -}; > - > -/* > - * Malicious VMs can map memory into the IOMMU, which is expected > - * to remain discarded. vfio will pin all pages, populating memory. > - * Disallow that. vmstate priorities make sure any RamDiscardManager > - * were already restored before IOMMUs are restored. > - */ > -if (!ram_discard_manager_is_populated(rdm, &tmp)) { > -error_report("iommu map to discarded memory (e.g., unplugged via" > - " virtio-mem): %"HWADDR_PRIx"", > - iotlb->translated_addr); > -return false; > -} > - > -/* > - * Malicious VMs might trigger discarding of IOMMU-mapped memory. The > - * pages will remain pinned inside vfio until unmapped, resulting in > a > - * higher memory consumption than expected. If memory would get > - * populated again later, there would be an inconsistency between > pages > - * pinned by vfio and pages seen by QEMU. This is the case until > - * unmapped from the IOMMU (e.g., during device reset). > - * > - * With malicious guests, we really only care about pinning more > memory > - * than expected. RLIMIT_MEMLOCK set for the user/process can never > be > - * exceeded and can be used to mitigate this problem. > - */ > -warn_report_once("Using vfio with vIOMMUs and coordinated discarding > of" > - " RAM (e.g., virtio-mem) works, however, malicious" > - " guests can trigger pinning of more memory than" > - " intended via an IOMMU. It's possible to mitigate " > - " by setting/adjusting RLIMIT_MEMLOCK."); > -} > - > -/* > - * Translation truncates length to the IOMMU page size, > - * check that it did not truncate too much. > - */ > -if (len & iotlb->addr_mask) { > -error_report("iommu has granularity incompatible with target AS"); > -return false; > -} > - > -if (vaddr) { > -*vaddr = memory_region_get_ram_ptr(mr) + xlat; > -} > - > -if (ram_addr) { > -*ram_addr = memory_region_get_ram_addr(mr) + xlat; > -} > - > -if (read_only) { > -*read_only = !writable || mr->readonly; > -} > - > -return true; > -} > - > static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb) > { > VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n); > @@ -682,7 +596,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, > IOMMUTLBEntry *iotlb) > if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) { > bool read_only; > > -if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) { > +if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, > + &address_space_memory)) { >
Re: [PATCH v4 5/7] hw/ppc/e500: Implement pflash handling
On 18/10/22 23:01, Bernhard Beschow wrote: Allows e500 boards to have their root file system reside on flash using only builtin devices located in the eLBC memory region. Note that the flash memory area is only created when a -pflash argument is given, and that the size is determined by the given file. The idea is to put users into control. Signed-off-by: Bernhard Beschow --- docs/system/ppc/ppce500.rst | 16 hw/ppc/Kconfig | 1 + hw/ppc/e500.c | 79 + 3 files changed, 96 insertions(+) @@ -1024,6 +1061,48 @@ void ppce500_init(MachineState *machine) pmc->platform_bus_base, &pms->pbus_dev->mmio); +dinfo = drive_get(IF_PFLASH, 0, 0); +if (dinfo) { +BlockBackend *blk = blk_by_legacy_dinfo(dinfo); +BlockDriverState *bs = blk_bs(blk); +uint64_t size = bdrv_getlength(bs); +uint64_t mmio_size = pms->pbus_dev->mmio.size; +uint32_t sector_len = 64 * KiB; + +if (!is_power_of_2(size)) { +error_report("Size of pflash file must be a power of two."); +exit(1); +} + +if (size > mmio_size) { +error_report("Size of pflash file must not be bigger than %" PRIu64 + " bytes.", mmio_size); +exit(1); +} + +if (!QEMU_IS_ALIGNED(size, sector_len)) { +error_report("Size of pflash file must be a multiple of %" PRIu32 + ".", sector_len); +exit(1); +} Again, this check is unrelated to the board code and belong to the flash device (the board has no idea of the underlying flash restrictions). (see below) +dev = qdev_new(TYPE_PFLASH_CFI01); +qdev_prop_set_drive(dev, "drive", blk); +qdev_prop_set_uint32(dev, "num-blocks", size / sector_len); +qdev_prop_set_uint64(dev, "sector-length", sector_len); +qdev_prop_set_uint8(dev, "width", 2); +qdev_prop_set_bit(dev, "big-endian", true); +qdev_prop_set_uint16(dev, "id0", 0x89); +qdev_prop_set_uint16(dev, "id1", 0x18); +qdev_prop_set_uint16(dev, "id2", 0x); +qdev_prop_set_uint16(dev, "id3", 0x0); +qdev_prop_set_string(dev, "name", "e500.flash"); +sysbus_realize_and_unref(SYS_BUS_DEVICE(dev), &error_fatal); If you want to report the error differently, you can use a local Error* and display it with error_report_err() before exiting. Anyhow can be cleaned later, so: Reviewed-by: Philippe Mathieu-Daudé +memory_region_add_subregion(&pms->pbus_dev->mmio, 0, + pflash_cfi01_get_memory(PFLASH_CFI01(dev))); +}
Re: [PULL 15/20] include/hw/core: Create struct CPUJumpCache
On 10/28/22 00:44, Ilya Leoshkevich wrote: Putting CPUJumpCache inside CPUState made problem go away: diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h index 18ca701b443..3ea528566c3 100644 --- a/include/hw/core/cpu.h +++ b/include/hw/core/cpu.h @@ -32,6 +32,7 @@ #include "qemu/thread.h" #include "qemu/plugin.h" #include "qom/object.h" +#include "accel/tcg/tb-jmp-cache.h" typedef int (*WriteCoreDumpFunction)(const void *buf, size_t size, void *opaque); @@ -366,7 +367,7 @@ struct CPUState { CPUArchState *env_ptr; IcountDecr *icount_decr_ptr; -CPUJumpCache *tb_jmp_cache; +CPUJumpCache tb_jmp_cache; Yes, well. That structure is quite large (128kB?) and I had been hoping to (1) save that extra memory for e.g. KVM and (2) hide the tcg-specific stuff from core. But clearly something went wrong during some threadedness with your test case. void tcg_flush_jmp_cache(CPUState *cpu) { -CPUJumpCache *jc = cpu->tb_jmp_cache; -if (likely(jc)) { -for (int i = 0; i < TB_JMP_CACHE_SIZE; i++) { -qatomic_set(&jc->array[i].tb, NULL); -} -} else { -/* This should happen once during realize, and thus never race. */ -jc = g_new0(CPUJumpCache, 1); -jc = qatomic_xchg(&cpu->tb_jmp_cache, jc); -assert(jc == NULL); } } So there must be a race in tcg_flush_jmp_cache() after all? If there had been a race here, we would abort with the assert. It must be something else... r~
Re: [PATCH v2 09/43] hw/usb/hcd-uhci: Introduce TYPE_ defines for device models
On 22/10/22 17:04, Bernhard Beschow wrote: Suggested-by: Mark Cave-Ayland Signed-off-by: Bernhard Beschow --- hw/i386/pc_piix.c | 3 ++- hw/i386/pc_q35.c | 13 +++-- hw/isa/piix4.c| 2 +- hw/usb/hcd-uhci.c | 16 hw/usb/hcd-uhci.h | 9 + 5 files changed, 27 insertions(+), 16 deletions(-) diff --git a/hw/usb/hcd-uhci.h b/hw/usb/hcd-uhci.h index c85ab7868e..22f6e6fcfc 100644 --- a/hw/usb/hcd-uhci.h +++ b/hw/usb/hcd-uhci.h @@ -91,4 +91,13 @@ typedef struct UHCIInfo { void uhci_data_class_init(ObjectClass *klass, void *data); void usb_uhci_common_realize(PCIDevice *dev, Error **errp); +#define TYPE_PIIX3_USB_UHCI "piix3-usb-uhci" +#define TYPE_PIIX4_USB_UHCI "piix4-usb-uhci" +#define TYPE_ICH9_USB_UHCI1 "ich9-usb-uhci1" +#define TYPE_ICH9_USB_UHCI2 "ich9-usb-uhci2" +#define TYPE_ICH9_USB_UHCI3 "ich9-usb-uhci3" +#define TYPE_ICH9_USB_UHCI4 "ich9-usb-uhci4" +#define TYPE_ICH9_USB_UHCI5 "ich9-usb-uhci5" +#define TYPE_ICH9_USB_UHCI6 "ich9-usb-uhci6" What about defining once: #define TYPE_ICH9_USB_UHCI(fn) "ich9-usb-uhci" #fn ?
Re: [PULL 00/23] 9p queue 2022-10-24
On Thu, Oct 27, 2022, 1:45 PM Christian Schoenebeck wrote: > On Thursday, October 27, 2022 7:37:07 PM CEST Stefan Hajnoczi wrote: > > On Thu, 27 Oct 2022 at 12:38, Christian Schoenebeck > > wrote: > > > > > > On Thursday, October 27, 2022 5:53:47 PM CEST Thomas Huth wrote: > > > > On 24/10/2022 12.54, Christian Schoenebeck wrote: > > > > > The following changes since commit > 0529245488865038344d64fff7ee05864d3d17f6: > > > > > > > > > >Merge tag 'pull-target-arm-20221020' of > https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-10-20 > 14:36:12 -0400) > > > > > > > > > > are available in the Git repository at: > > > > > > > > > >https://github.com/cschoenebeck/qemu.git tags/pull-9p-20221024 > > > > > > > > > > for you to fetch changes up to > 3ce77865bf813f313cf79c00fd951bfc95a50165: > > > > > > > > > >tests/9p: remove unnecessary g_strdup() calls (2022-10-24 > 12:24:32 +0200) > > > > > > > > > > > > > > > 9pfs: performance, Windows host prep, tests restructure > > > > > > > > > > * Highlight of this PR is Linus Heckemann's GHashTable patch which > > > > >brings massive general performance improvements of 9p server > > > > >somewhere between factor 6 .. 12. > > > > > > > > > > * Bin Meng's g_mkdir patch is a preparatory patch for upcoming > > > > >Windows host support of 9p server. > > > > > > > > > > * The rest of the patches in this PR are 9p test code restructuring > > > > >and refactoring changes to improve readability and to ease > > > > >maintenance of 9p test code on the long-term. > > > > > > > > Hi Christian, > > > > > > > > I think this PR broke the FreeBSD CI jobs: > > > > > > > > https://gitlab.com/qemu-project/qemu/-/jobs/3219611457#L3116 > > > > > > > > https://gitlab.com/qemu-project/qemu/-/jobs/3219611460#L3372 > > > > > > > > Could you please have a look? > > > > > > > > Thanks! > > > >Thomas > > > > > > I try, but will certainly take some days, especially as I currently > don't have > > > a BSD installation at hand to try the changes. > > > > QEMU has the automation to run FreeBSD builds locally (in a VM): > > $ make vm-build-freebsd > > > > Not sure if that FreeBSD environment matches the one in Cirrus CI > > though. If they are different then maybe it won't reproduce locally. > > Something must be different, because > e750a7ace492f0b450653d4ad368a77d6f660fb8 > compiles fine locally with `make vm-build-freebsd` and all tests pass, too. > I have a pull request I'm sending tomorrow that fixes a build issue on FreeBSD for about 6 or so months on FreeBSD current. If it's hitting that... but I didn't think our normal CI hit that... I'd submit it today, but I'm traveling and there is an issue getting to my machines at home, but I'll be home tomorrow. Warner Ideas? > > Best regards, > Christian Schoenebeck > > > >
[PATCH v2 1/3] hw/mips/malta: Introduce PIIX4_PCI_DEVFN definition
The PIIX4 PCI-ISA bridge function is always located at 10:0. Since we want to re-use its address, add the PIIX4_PCI_DEVFN definition. Signed-off-by: Philippe Mathieu-Daudé --- hw/mips/malta.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/hw/mips/malta.c b/hw/mips/malta.c index 272d93eea7..df0f448b67 100644 --- a/hw/mips/malta.c +++ b/hw/mips/malta.c @@ -72,6 +72,8 @@ #define MAX_IDE_BUS 2 +#define PIIX4_PCI_DEVFN PCI_DEVFN(10, 0) + typedef struct { MemoryRegion iomem; MemoryRegion iomem_lo; /* 0 - 0x900 */ @@ -1377,7 +1379,7 @@ void mips_malta_init(MachineState *machine) empty_slot_init("GT64120", 0, 0x2000); /* Southbridge */ -piix4 = pci_create_simple_multifunction(pci_bus, PCI_DEVFN(10, 0), true, +piix4 = pci_create_simple_multifunction(pci_bus, PIIX4_PCI_DEVFN, true, TYPE_PIIX4_PCI_DEVICE); dev = DEVICE(piix4); isa_bus = ISA_BUS(qdev_get_child_bus(dev, "isa.0")); -- 2.37.3
[PATCH v2 2/3] hw/mips/malta: Set PIIX4 IRQ routes in embedded bootloader
Linux kernel expects the northbridge & southbridge chipsets configured by the BIOS firmware. We emulate that by writing a tiny bootloader code in write_bootloader(). Upon introduction in commit 5c2b87e34d ("PIIX4 support"), the PIIX4 configuration space included values specific to the Malta board. Set the Malta-specific IRQ routing values in the embedded bootloader, so the next commit can remove the Malta specific bits from the PIIX4 PCI-ISA bridge and make it generic (matching the real hardware). Signed-off-by: Philippe Mathieu-Daudé --- FIXME: Missing the nanoMIPS counter-part! --- hw/mips/malta.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/hw/mips/malta.c b/hw/mips/malta.c index df0f448b67..4403028778 100644 --- a/hw/mips/malta.c +++ b/hw/mips/malta.c @@ -804,6 +804,8 @@ static void write_bootloader_nanomips(uint8_t *base, uint64_t run_addr, stw_p(p++, 0x8422); stw_p(p++, 0x9088); /* sw t0, 0x88(t1) */ +/* TODO set PIIX IRQC[A:D] routing values! */ + stw_p(p++, 0xe320 | NM_HI1(kernel_entry)); stw_p(p++, NM_HI2(kernel_entry)); @@ -841,6 +843,9 @@ static void write_bootloader_nanomips(uint8_t *base, uint64_t run_addr, static void write_bootloader(uint8_t *base, uint64_t run_addr, uint64_t kernel_entry) { +const char pci_pins_cfg[PCI_NUM_PINS] = { +10, 10, 11, 11 /* PIIX IRQRC[A:D] */ +}; uint32_t *p; /* Small bootloader */ @@ -915,6 +920,20 @@ static void write_bootloader(uint8_t *base, uint64_t run_addr, #undef cpu_to_gt32 +/* + * The PIIX ISA bridge is on PCI bus 0 dev 10 func 0. + * Load the PIIX IRQC[A:D] routing config address, then + * write routing configuration to the config data register. + */ +bl_gen_write_u32(&p, /* GT_PCI0_CFGADDR */ + cpu_mips_phys_to_kseg1(NULL, 0x1be0 + 0xcf8), + tswap32((1 << 31) /* ConfigEn */ + | PCI_BUILD_BDF(0, PIIX4_PCI_DEVFN) << 8 + | PIIX_PIRQCA)); +bl_gen_write_u32(&p, /* GT_PCI0_CFGDATA */ + cpu_mips_phys_to_kseg1(NULL, 0x1be0 + 0xcfc), + tswap32(ldl_be_p(pci_pins_cfg))); + bl_gen_jump_kernel(&p, true, ENVP_VADDR - 64, /* -- 2.37.3
[PATCH v2 3/3] hw/isa/piix4: Correct IRQRC[A:D] reset values
IRQRC[A:D] registers reset value is 0x80. We were forcing the MIPS Malta machine routing to be able to boot a Linux kernel without any bootloader. We now have these registers initialized in the Malta machine write_bootloader(), so we can use the correct reset values. Signed-off-by: Philippe Mathieu-Daudé --- hw/isa/piix4.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/isa/piix4.c b/hw/isa/piix4.c index 15f344dbb7..a2165c6a49 100644 --- a/hw/isa/piix4.c +++ b/hw/isa/piix4.c @@ -115,10 +115,10 @@ static void piix4_isa_reset(DeviceState *dev) pci_conf[0x4c] = 0x4d; pci_conf[0x4e] = 0x03; pci_conf[0x4f] = 0x00; -pci_conf[0x60] = 0x0a; // PCI A -> IRQ 10 -pci_conf[0x61] = 0x0a; // PCI B -> IRQ 10 -pci_conf[0x62] = 0x0b; // PCI C -> IRQ 11 -pci_conf[0x63] = 0x0b; // PCI D -> IRQ 11 +pci_conf[0x60] = 0x80; +pci_conf[0x61] = 0x80; +pci_conf[0x62] = 0x80; +pci_conf[0x63] = 0x80; pci_conf[0x69] = 0x02; pci_conf[0x70] = 0x80; pci_conf[0x76] = 0x0c; -- 2.37.3
[PATCH v2 0/3] hw/isa/piix4: Remove MIPS Malta specific bits
Since v1: - bswap -> tswap (Bernhard) Bernhard posted a series merging both PIIX3/PIIX4 models in one [1]. Due to Malta-specific board code forced into the PIIX4 reset values, Bernhard had to include an array of "register values at reset" as a class property. This is not wrong, but to model properly the model, we should simply use the hardware real reset values, not try to bend the model to please the Malta board. This series fix this issue by having the Malta bootloader code setting the board-specific PIIX4 IRQ routing values. Note patch 2 still misses an equivalent nanoMIPS code. Regards, Phil. [1] https://lore.kernel.org/qemu-devel/20221022150508.26830-1-shen...@gmail.com/ Based-on: <20221026191821.28167-1-phi...@linaro.org> https://lore.kernel.org/qemu-devel/20221026191821.28167-1-phi...@linaro.org/ Philippe Mathieu-Daudé (3): hw/mips/malta: Introduce PIIX4_PCI_DEVFN definition hw/mips/malta: Set PIIX4 IRQ routes in embedded bootloader hw/isa/piix4: Correct IRQRC[A:D] reset values hw/isa/piix4.c | 8 hw/mips/malta.c | 23 ++- 2 files changed, 26 insertions(+), 5 deletions(-) -- 2.37.3
Re: [PULL 15/20] include/hw/core: Create struct CPUJumpCache
On 10/28/22 00:18, Ilya Leoshkevich wrote: in one of the wasmtime tests (host=x86_64, guest=s390x). GDB shows that the root cause is actually this: Thread 181 "wasi_tokio::pat" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x76c54640 (LWP 168352)] 0x55626736 in do_tb_phys_invalidate (tb=tb@entry=0x7fffea4b8500 , rm_from_page_list=rm_from_page_list@entry=true) at qemu/accel/tcg/translate-all.c:1192 1192 if (qatomic_read(&jc->array[h].tb) == tb) { (gdb) bt #0 0x55626736 in do_tb_phys_invalidate (tb=tb@entry=0x7fffea4b8500 , rm_from_page_list=rm_from_page_list@entry=true) at qemu/accel/tcg/translate-all.c:1192 #1 0x55626b98 in tb_phys_invalidate__locked (tb=0x7fffea4b8500 ) at qemu/accel/tcg/translate-all.c:1211 #2 tb_invalidate_phys_page_range__locked (p=, start=start@entry=836716683264, end=end@entry=836716687360, retaddr=0, pages=0x0) at qemu/accel/tcg/translate-all.c:1678 #3 0x55626dfb in tb_invalidate_phys_range (start=836716683264, start@entry=836716584960, end=end@entry=836716982272) at qemu/accel/tcg/translate-all.c:1753 #4 0x55639e43 in target_munmap (start=start@entry=836716584960, len=len@entry=397312) at qemu/linux-user/mmap.c:769 Let me know if you need more information, I can try to extract a minimal reproducer. A reproducer would be helpful. r~