Re: [RFC PATCH] python: add qmp-send program to send raw qmp commands to qemu
John Snow writes: > On Tue, Apr 5, 2022, 5:03 AM Damien Hedde > wrote: [...] >> If it stays in QEMU tree, what licensing should I use ? LGPL does not >> hurt, no ? >> > > Whichever you please. GPLv2+ would be convenient and harmonizes well with > other tools. LGPL is only something I started doing so that the "qemu.qmp" > package would be LGPL. Licensing the tools as LGPL was just a sin of > convenience so I could claim a single license for the whole wheel/egg/tgz. > > (I didn't want to make separate qmp and qmp-tools packages.) > > Go with what you feel is best. Any license other than GPLv2+ needs justification in the commit message. [...]
[PATCH qemu] ppc/vof: Fix uninitialized string tracing
There are error paths which do not initialize propname, but the trace_exit label prints it anyway. This initializes the problematic string.

Spotted by Coverity CID 1487241.

Signed-off-by: Alexey Kardashevskiy
---
 hw/ppc/vof.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/vof.c b/hw/ppc/vof.c
index 2b63a6287561..5ce3ca32c998 100644
--- a/hw/ppc/vof.c
+++ b/hw/ppc/vof.c
@@ -294,7 +294,7 @@ static uint32_t vof_setprop(MachineState *ms, void *fdt, Vof *vof,
                             uint32_t nodeph, uint32_t pname,
                             uint32_t valaddr, uint32_t vallen)
 {
-    char propname[OF_PROPNAME_LEN_MAX + 1];
+    char propname[OF_PROPNAME_LEN_MAX + 1] = "";
     uint32_t ret = PROM_ERROR;
     int offset, rc;
     char trval[64] = "";
--
2.30.2
RE: [PATCH V2 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry
> From: Jason Wang > Sent: Wednesday, April 6, 2022 11:33 AM > To: Tian, Kevin > Cc: Liu, Yi L ; m...@redhat.com; pet...@redhat.com; > yi.y@linux.intel.com; qemu-devel@nongnu.org > Subject: Re: [PATCH V2 1/4] intel-iommu: don't warn guest errors when > getting rid2pasid entry > > On Sat, Apr 2, 2022 at 3:34 PM Tian, Kevin wrote: > > > > > From: Jason Wang > > > Sent: Wednesday, March 30, 2022 4:37 PM > > > On Wed, Mar 30, 2022 at 4:16 PM Tian, Kevin > wrote: > > > > > > > > > From: Jason Wang > > > > > Sent: Tuesday, March 29, 2022 12:52 PM > > > > > > > > > > > >>> > > > > > >>> Currently the implementation of vtd_ce_get_rid2pasid_entry() is > also > > > > > >>> problematic. According to VT-d spec, RID2PASID field is effective > only > > > > > >>> when ecap.rps is true otherwise PASID#0 is used for RID2PASID. I > > > didn't > > > > > >>> see ecap.rps is set, neither is it checked in that function. It > > > > > >>> works possibly > > > > > >>> just because Linux currently programs 0 to RID2PASID... > > > > > >> > > > > > >> This seems to be another issue since the introduction of scalable > mode. > > > > > > > > > > > > yes. this is not introduced in this series. The current scalable > > > > > > mode > > > > > > vIOMMU support was following 3.0 spec, while RPS is added in 3.1. > > > Needs > > > > > > to be fixed. > > > > > > > > > > > > > > > Interesting, so this is more complicated when dealing with migration > > > > > compatibility. So what I suggest is probably something like: > > > > > > > > > > -device intel-iommu,version=$version > > > > > > > > > > Then we can maintain migration compatibility correctly. For 3.0 we > can > > > > > go without RPS and 3.1 and above we need to implement RPS. > > > > > > > > This is sensible. Probably a new version number is created only when > > > > it breaks compatibility with an old version, i.e. not necessarily to > > > > follow > > > > every release from VT-d spec. 
In this case we definitely need one from > > > > 3.0 to 3.1+ given RID2PASID working on a 3.0 implementation will > > > > trigger a reserved fault due to RPS not set on a 3.1 implementation. > > > > > > 3.0 should be fine, but I need to check whether there's another > > > difference for PASID mode. > > > > > > It would be helpful if there's a chapter in the spec to describe the > > > difference of behaviours. > > > > There is a section called 'Revision History' in the start of the VT-d spec. > > It talks about changes in each revision, e.g.: > > -- > > June 2019, 3.1: > > > > Added support for RID-PASID capability (RPS field in ECAP_REG). > > Good to know that, does it mean, except for this revision history, all > the other semantics keep backward compatibility across the version? Yes and if you find anything not clarified properly I can help forward to the spec owner. Thanks Kevin
Re: [PATCH V2 1/4] intel-iommu: don't warn guest errors when getting rid2pasid entry
On Sat, Apr 2, 2022 at 3:34 PM Tian, Kevin wrote: > > > From: Jason Wang > > Sent: Wednesday, March 30, 2022 4:37 PM > > On Wed, Mar 30, 2022 at 4:16 PM Tian, Kevin wrote: > > > > > > > From: Jason Wang > > > > Sent: Tuesday, March 29, 2022 12:52 PM > > > > > > > > > >>> > > > > >>> Currently the implementation of vtd_ce_get_rid2pasid_entry() is also > > > > >>> problematic. According to VT-d spec, RID2PASID field is effective > > > > >>> only > > > > >>> when ecap.rps is true otherwise PASID#0 is used for RID2PASID. I > > didn't > > > > >>> see ecap.rps is set, neither is it checked in that function. It > > > > >>> works possibly > > > > >>> just because Linux currently programs 0 to RID2PASID... > > > > >> > > > > >> This seems to be another issue since the introduction of scalable > > > > >> mode. > > > > > > > > > > yes. this is not introduced in this series. The current scalable mode > > > > > vIOMMU support was following 3.0 spec, while RPS is added in 3.1. > > Needs > > > > > to be fixed. > > > > > > > > > > > > Interesting, so this is more complicated when dealing with migration > > > > compatibility. So what I suggest is probably something like: > > > > > > > > -device intel-iommu,version=$version > > > > > > > > Then we can maintain migration compatibility correctly. For 3.0 we can > > > > go without RPS and 3.1 and above we need to implement RPS. > > > > > > This is sensible. Probably a new version number is created only when > > > it breaks compatibility with an old version, i.e. not necessarily to > > > follow > > > every release from VT-d spec. In this case we definitely need one from > > > 3.0 to 3.1+ given RID2PASID working on a 3.0 implementation will > > > trigger a reserved fault due to RPS not set on a 3.1 implementation. > > > > 3.0 should be fine, but I need to check whether there's another > > difference for PASID mode. > > > > It would be helpful if there's a chapter in the spec to describe the > > difference of behaviours. 
> > There is a section called 'Revision History' in the start of the VT-d spec. > It talks about changes in each revision, e.g.: > -- > June 2019, 3.1: > > Added support for RID-PASID capability (RPS field in ECAP_REG). Good to know that, does it mean, except for this revision history, all the other semantics keep backward compatibility across the version? > -- > > > > > > > > > > > > > > Since most of the advanced features has not been implemented, we may > > > > probably start just from 3.4 (assuming it's the latest version). And all > > > > of the following effort should be done for 3.4 in order to productize > > > > it. > > > > > > > > > > Agree. btw in your understanding is intel-iommu in a production quality > > > now? > > > > Red Hat supports vIOMMU for the guest DPDK path now. > > > > For scalable-mode we need to see some use cases then we can evaluate. > > virtio SVA could be a possible use case, but it requires more work e.g > > PRS queue. > > Yes it's not ready for full evaluation yet. > > The current state before your change is exactly feature-on-par with the > legacy mode, except using scalable format in certain structures. That alone > is not worthy of a formal evaluation. Right. Thanks > > > > > > If not, do we want to apply this version scheme only when it > > > reaches the production quality or also in the experimental phase? > > > > Yes. E.g if we think scalable mode is mature, we can enable 3.0. > > > > Nice to know. > > Thanks > Kevin
Re: [PATCH V2 4/4] intel-iommu: PASID support
On Sat, Apr 2, 2022 at 3:27 PM Tian, Kevin wrote: > > > From: Jason Wang > > Sent: Wednesday, March 30, 2022 4:32 PM > > > > > > > > > > > > > > If there is certain fault > > > > > triggered by a request with PASID, we do want to report this > > information > > > > > upward. > > > > > > > > I tend to do it increasingly on top of this series (anyhow at least > > > > RID2PASID is introduced before this series) > > > > > > Yes, RID2PASID should have been recorded too but it's not done correctly. > > > > > > If you do it in separate series, it implies that you will introduce > > > another > > > "x-pasid-fault' to guard the new logic related to PASID fault recording? > > > > Something like this, as said previously, if it's a real problem, it > > exists since the introduction of rid2pasid, not specific to this > > patch. > > > > But I can add the fault recording if you insist. > > I prefer to including the fault recording given it's simple and makes this > change more complete in concept. That's fine. Thanks > > > > > > > > > > > Earlier when Yi proposed Qemu changes for guest SVA [1] he aimed for > > a > > > > > coarse-grained knob design: > > > > > -- > > > > > Intel VT-d 3.0 introduces scalable mode, and it has a bunch of > > capabilities > > > > > related to scalable mode translation, thus there are multiple > > combinations. > > > > > While this vIOMMU implementation wants simplify it for user by > > providing > > > > > typical combinations. User could config it by "x-scalable-mode" > > > > > option. > > > > The > > > > > usage is as below: > > > > > "-device intel-iommu,x-scalable-mode=["legacy"|"modern"]" > > > > > > > > > > - "legacy": gives support for SL page table > > > > > - "modern": gives support for FL page table, pasid, virtual > > > > > command > > > > > - if not configured, means no scalable mode support, if not > > > > > proper > > > > >configured, will throw error > > > > > -- > > > > > > > > > > Which way do you prefer to? 
> > > > > > > > > > [1] https://lists.gnu.org/archive/html/qemu-devel/2020- > > 02/msg02805.html > > > > > > > > My understanding is that, if we want to deploy Qemu in a production > > > > environment, we can't use the "x-" prefix. We need a full > > > > implementation of each cap. > > > > > > > > E.g > > > > -device intel-iommu,first-level=on,scalable-mode=on etc. > > > > > > > > > > You meant each cap will get a separate control option? > > > > > > But that way requires the management stack or admin to have deep > > > knowledge about how combinations of different capabilities work, e.g. > > > if just turning on scalable mode w/o first-level cannot support vSVA > > > on assigned devices. Is this a common practice when defining Qemu > > > parameters? > > > > We can have a safe and good default value for each cap. E.g > > > > In qemu 8.0 we think scalable is mature, we can make scalable to be > > enabled by default > > in qemu 8.1 we think first-level is mature, we can make first level to > > be enabled by default. > > > > OK, that is a workable way. > > Thanks > Kevin
Re: [PATCH V2 4/4] intel-iommu: PASID support
On Sat, Apr 2, 2022 at 3:24 PM Tian, Kevin wrote: > > > From: Jason Wang > > Sent: Wednesday, March 30, 2022 4:32 PM > > > > On Wed, Mar 30, 2022 at 4:02 PM Tian, Kevin wrote: > > > > > > > From: Jason Wang > > > > Sent: Tuesday, March 29, 2022 12:49 PM > > > > > > > > On Mon, Mar 28, 2022 at 3:03 PM Tian, Kevin > > wrote: > > > > > > > > > > > From: Jason Wang > > > > > > Sent: Monday, March 21, 2022 1:54 PM > > > > > > > > > > > > +/* > > > > > > + * vtd-spec v3.4 3.14: > > > > > > + * > > > > > > + * """ > > > > > > + * Requests-with-PASID with input address in range 0xFEEx_ > > are > > > > > > + * translated normally like any other request-with-PASID > > > > > > through > > > > > > + * DMA-remapping hardware. However, if such a request is > > processed > > > > > > + * using pass-through translation, it will be blocked as > > > > > > described > > > > > > + * in the paragraph below. > > > > > > > > > > While PASID+PT is blocked as described in the below paragraph, the > > > > > paragraph itself applies to all situations: > > > > > > > > > > 1) PT + noPASID > > > > > 2) translation + noPASID > > > > > 3) PT + PASID > > > > > 4) translation + PASID > > > > > > > > > > because... > > > > > > > > > > > + * > > > > > > + * Software must not program paging-structure entries to remap > > any > > > > > > + * address to the interrupt address range. Untranslated > > > > > > requests > > > > > > + * and translation requests that result in an address in the > > > > > > + * interrupt range will be blocked with condition code LGN.4 or > > > > > > + * SGN.8. > > > > > > > > > > ... if you look at the definition of LGN.4 or SGN.8: > > > > > > > > > > LGN.4: When legacy mode (RTADDR_REG.TTM=00b) is enabled, > > hardware > > > > > detected an output address (i.e. address after remapping) in > > > > > the > > > > > interrupt address range (0xFEEx_). 
For Translated > > > > > requests and > > > > > requests with pass-through translation type (TT=10), the > > > > > output > > > > > address is the same as the address in the request > > > > > > > > > > The last sentence in the first paragraph above just highlights the > > > > > fact > > that > > > > > when input address of PT is in interrupt range then it is blocked by > > LGN.4 > > > > > or SGN.8 due to output address also in interrupt range. > > > > > > > > > > > + * """ > > > > > > + * > > > > > > + * We enable per-as memory region (iommu_ir_fault) for catching > > > > > > + * the translation for interrupt range through PASID + PT. > > > > > > + */ > > > > > > +if (pt && as->pasid != PCI_NO_PASID) { > > > > > > +memory_region_set_enabled(&as->iommu_ir_fault, true); > > > > > > +} else { > > > > > > +memory_region_set_enabled(&as->iommu_ir_fault, false); > > > > > > +} > > > > > > + > > > > > > > > > > Given above this should be a bug fix for nopasid first and then apply > > > > > it > > > > > to pasid path too. > > > > Actually, nopasid path patches were posted here. > > > > > > > > https://www.mail-archive.com/qemu-de...@nongnu.org/msg867878.html > > > > > > > > Thanks > > > > > > > > > > Can you elaborate why they are handled differently? > > > > It's because that patch is for the case where pasid mode is not > > implemented. We might need it for -stable. > > > > So will that patch be replaced after this one goes in? That patch will be merged first if I understand correctly. Then this patch could be applied on top. > By any means > the new iommu_ir_fault region could be applied to both nopasid > and pasid i.e. no need toggle it when address space is switched. Actually it's needed only when PT is enabled. When PT is disabled, the translation is done via iommu_translate. Considering the previous patch will be merged, I will fix this !PT in the next version. Thanks > > Thanks > Kevin
Re: [PATCH] vdpa: Add missing tracing to batch mapping functions
On 2022/4/5 2:36 PM, Eugenio Pérez wrote:

These functions were not traced properly.

Signed-off-by: Eugenio Pérez

Acked-by: Jason Wang

---
 hw/virtio/vhost-vdpa.c | 2 ++
 hw/virtio/trace-events | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 8adf7c0b92..9e5fe15d03 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -129,6 +129,7 @@ static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
         .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
     };

+    trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)", fd, errno, strerror(errno));
@@ -163,6 +164,7 @@ static void vhost_vdpa_listener_commit(MemoryListener *listener)
     msg.type = v->msg_type;
     msg.iotlb.type = VHOST_IOTLB_BATCH_END;

+    trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
     if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
         error_report("failed to write, fd=%d, errno=%d (%s)", fd, errno, strerror(errno));
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a5102eac9e..48d9d5 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -25,6 +25,8 @@ vhost_user_postcopy_waker_nomatch(const char *rb, uint64_t rb_offset) "%s + 0x%"
 # vhost-vdpa.c
 vhost_vdpa_dma_map(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint64_t uaddr, uint8_t perm, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" uaddr: 0x%"PRIx64" perm: 0x%"PRIx8" type: %"PRIu8
 vhost_vdpa_dma_unmap(void *vdpa, int fd, uint32_t msg_type, uint64_t iova, uint64_t size, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" iova: 0x%"PRIx64" size: 0x%"PRIx64" type: %"PRIu8
+vhost_vdpa_listener_begin_batch(void *v, int fd, uint32_t msg_type, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
+vhost_vdpa_listener_commit(void *v, int fd, uint32_t msg_type, uint8_t type) "vdpa:%p fd: %d msg_type: %"PRIu32" type: %"PRIu8
 vhost_vdpa_listener_region_add(void *vdpa, uint64_t iova, uint64_t llend, void *vaddr, bool readonly) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64" vaddr: %p read-only: %d"
 vhost_vdpa_listener_region_del(void *vdpa, uint64_t iova, uint64_t llend) "vdpa: %p iova 0x%"PRIx64" llend 0x%"PRIx64
 vhost_vdpa_add_status(void *dev, uint8_t status) "dev: %p status: 0x%"PRIx8
--
2.27.0
Re: [PATCH v4] vdpa: reset the backend device in the end of vhost_net_stop()
On 4/1/2022 7:20 PM, Jason Wang wrote: Adding Michael. On Sat, Apr 2, 2022 at 7:08 AM Si-Wei Liu wrote: On 3/31/2022 7:53 PM, Jason Wang wrote: On Fri, Apr 1, 2022 at 9:31 AM Michael Qiu wrote: Currently, when VM poweroff, it will trigger vdpa device(such as mlx bluefield2 VF) reset many times(with 1 datapath queue pair and one control queue, triggered 3 times), this leads to below issue: vhost VQ 2 ring restore failed: -22: Invalid argument (22) This because in vhost_net_stop(), it will stop all vhost device bind to this virtio device, and in vhost_dev_stop(), qemu tries to stop the device , then stop the queue: vhost_virtqueue_stop(). In vhost_dev_stop(), it resets the device, which clear some flags in low level driver, and in next loop(stop other vhost backends), qemu try to stop the queue corresponding to the vhost backend, the driver finds that the VQ is invalied, this is the root cause. To solve the issue, vdpa should set vring unready, and remove reset ops in device stop: vhost_dev_start(hdev, false). and implement a new function vhost_dev_reset, only reset backend device after all vhost(per-queue) stoped. Typo. Signed-off-by: Michael Qiu Acked-by: Jason Wang Rethink this patch, consider there're devices that don't support set_vq_ready(). I wonder if we need 1) uAPI to tell the user space whether or not it supports set_vq_ready() I guess what's more relevant here is to define the uAPI semantics for unready i.e. set_vq_ready(0) for resuming/stopping virtqueue processing, as starting vq is comparatively less ambiguous. Yes. Considering the likelihood that this interface may be used for live migration, it would be nice to come up with variants such as 1) discard inflight request v.s. 2) waiting for inflight processing to be done, Or inflight descriptor reporting (which seems to be tricky). But we can start from net that a discarding may just work. and 3) timeout in waiting. 
Actually, that's the plan and Eugenio is proposing something like this via virtio spec: https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html Thanks for the pointer, I seem to recall I saw it some time back though I wonder if there's follow-up for the v3? My impression was that this is still a work-in-progress spec proposal, while the semantics of various F_STOP scenario is unclear yet and not all of the requirements (ex: STOP_FAILED, rewind & !IN_ORDER) for live migration do seem to get accommodated? 2) userspace will call SET_VRING_ENABLE() when the device supports otherwise it will use RESET. Are you looking to making virtqueue resume-able through the new SET_VRING_ENABLE() uAPI? I think RESET is inevitable in some case, i.e. when guest initiates device reset by writing 0 to the status register. Yes, that's all my plan. For suspend/resume and live migration use cases, indeed RESET can be substituted with SET_VRING_ENABLE. Again, it'd need quite some code refactoring to accommodate this change. Although I'm all for it, it'd be the best to lay out the plan for multiple phases rather than overload this single patch too much. You can count my time on this endeavor if you don't mind. :) You're welcome, I agree we should choose a way to go first: 1) manage to use SET_VRING_ENABLE (more like a workaround anyway) For networking device and the vq suspend/resume and live migration use cases to support, I thought it might suffice? We may drop inflight or unused ones for Ethernet... What other part do you think may limit its extension to become a general uAPI or add new uAPI to address similar VQ stop requirement if need be? Or we might well define subsystem specific uAPI to stop the virtqueue, for vdpa device specifically?
I think the point here is given that we would like to avoid guest side modification to support live migration, we can define specific uAPI for specific live migration requirement without having to involve guest driver change. It'd be easy to get started this way and generalize them all to a full blown _S_STOP when things are eventually settled. 2) go with virtio-spec (may take a while) I feel it might be still quite early for now to get to a full blown _S_STOP spec level amendment that works for all types of virtio (vendor) devices. Generally there can be very specific subsystem-dependent ways to stop each type of virtio devices that satisfies the live migration of virtio subsystem devices. For now the discussion mostly concerns with vq index rewind, inflight handling, notification interrupt and configuration space such kind of virtio level things, but real device backend has implication on the other parts such as the order of IO/DMA quiescing and interrupt masking. If the subsystem virtio guest drivers today somehow don't support any of those _S_STOP new behaviors, I guess it's with little point to introduce the same
Re: [PATCH v5 11/13] KVM: Zap existing KVM mappings when pages changed in the private fd
On Thu, Mar 10, 2022 at 10:09:09PM +0800, Chao Peng wrote: > KVM gets notified when memory pages changed in the memory backing store. > When userspace allocates the memory with fallocate() or frees memory > with fallocate(FALLOC_FL_PUNCH_HOLE), memory backing store calls into > KVM fallocate/invalidate callbacks respectively. To ensure KVM never > maps both the private and shared variants of a GPA into the guest, in > the fallocate callback, we should zap the existing shared mapping and > in the invalidate callback we should zap the existing private mapping. > > In the callbacks, KVM firstly converts the offset range into the > gfn_range and then calls existing kvm_unmap_gfn_range() which will zap > the shared or private mapping. Both callbacks pass in a memslot > reference but we need 'kvm' so add a reference in memslot structure. > > Signed-off-by: Yu Zhang > Signed-off-by: Chao Peng > --- > include/linux/kvm_host.h | 3 ++- > virt/kvm/kvm_main.c | 36 > 2 files changed, 38 insertions(+), 1 deletion(-) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 9b175aeca63f..186b9b981a65 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -236,7 +236,7 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t > cr2_or_gpa, > int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu); > #endif > > -#ifdef KVM_ARCH_WANT_MMU_NOTIFIER > +#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_MEMFILE_NOTIFIER) > struct kvm_gfn_range { > struct kvm_memory_slot *slot; > gfn_t start; > @@ -568,6 +568,7 @@ struct kvm_memory_slot { > loff_t private_offset; > struct memfile_pfn_ops *pfn_ops; > struct memfile_notifier notifier; > + struct kvm *kvm; > }; > > static inline bool kvm_slot_is_private(const struct kvm_memory_slot *slot) > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 67349421eae3..52319f49d58a 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -841,8 +841,43 @@ static int 
kvm_init_mmu_notifier(struct kvm *kvm) > #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */ > > #ifdef CONFIG_MEMFILE_NOTIFIER > +static void kvm_memfile_notifier_handler(struct memfile_notifier *notifier, > + pgoff_t start, pgoff_t end) > +{ > + int idx; > + struct kvm_memory_slot *slot = container_of(notifier, > + struct kvm_memory_slot, > + notifier); > + struct kvm_gfn_range gfn_range = { > + .slot = slot, > + .start = start - (slot->private_offset >> PAGE_SHIFT), > + .end = end - (slot->private_offset >> PAGE_SHIFT), > + .may_block = true, > + }; > + struct kvm *kvm = slot->kvm; > + > + gfn_range.start = max(gfn_range.start, slot->base_gfn); > + gfn_range.end = min(gfn_range.end, slot->base_gfn + slot->npages); > + > + if (gfn_range.start >= gfn_range.end) > + return; > + > + idx = srcu_read_lock(&kvm->srcu); > + KVM_MMU_LOCK(kvm); > + kvm_unmap_gfn_range(kvm, &gfn_range); > + kvm_flush_remote_tlbs(kvm); > + KVM_MMU_UNLOCK(kvm); > + srcu_read_unlock(&kvm->srcu, idx); Should this also invalidate gfn_to_pfn_cache mappings? Otherwise it seems possible the kernel might end up inadvertently writing to now-private guest memory via a now-stale gfn_to_pfn_cache entry.
Re: [PATCH 1/7] virtio-net: align ctrl_vq index for non-mq guest for vhost_vdpa
On 4/1/2022 7:10 PM, Jason Wang wrote:
On Sat, Apr 2, 2022 at 6:32 AM Si-Wei Liu wrote:
On 3/31/2022 1:39 AM, Jason Wang wrote:
On Wed, Mar 30, 2022 at 11:48 PM Si-Wei Liu wrote:
On 3/30/2022 2:00 AM, Jason Wang wrote:
On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu wrote:

With MQ enabled vdpa device and non-MQ supporting guest e.g. booting vdpa with mq=on over OVMF of single vqp, below assert failure is seen:

../hw/virtio/vhost-vdpa.c:560: vhost_vdpa_get_vq_index: Assertion `idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs' failed.

0  0x7f8ce3ff3387 in raise () at /lib64/libc.so.6
1  0x7f8ce3ff4a78 in abort () at /lib64/libc.so.6
2  0x7f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6
3  0x7f8ce3fec252 in () at /lib64/libc.so.6
4  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:563
5  0x558f52d79421 in vhost_vdpa_get_vq_index (dev=<optimized out>, idx=<optimized out>) at ../hw/virtio/vhost-vdpa.c:558
6  0x558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=<optimized out>) at ../hw/virtio/vhost.c:1557
7  0x558f52c6b89a in virtio_pci_set_guest_notifier (d=d@entry=0x558f568f0f60, n=n@entry=2, assign=assign@entry=true, with_irqfd=with_irqfd@entry=false) at ../hw/virtio/virtio-pci.c:974
8  0x558f52c6c0d8 in virtio_pci_set_guest_notifiers (d=0x558f568f0f60, nvqs=3, assign=true) at ../hw/virtio/virtio-pci.c:1019
9  0x558f52bf091d in vhost_net_start (dev=dev@entry=0x558f568f91f0, ncs=0x558f56937cd0, data_queue_pairs=data_queue_pairs@entry=1, cvq=cvq@entry=1) at ../hw/net/vhost_net.c:361
10 0x558f52d4e5e7 in virtio_net_set_status (status=<optimized out>, n=0x558f568f91f0) at ../hw/net/virtio-net.c:289
11 0x558f52d4e5e7 in virtio_net_set_status (vdev=0x558f568f91f0, status=15 '\017') at ../hw/net/virtio-net.c:370
12 0x558f52d6c4b2 in virtio_set_status (vdev=vdev@entry=0x558f568f91f0, val=val@entry=15 '\017') at ../hw/virtio/virtio.c:1945
13 0x558f52c69eff in virtio_pci_common_write (opaque=0x558f568f0f60, addr=<optimized out>, val=<optimized out>, size=<optimized out>) at ../hw/virtio/virtio-pci.c:1292
14 0x558f52d15d6e in memory_region_write_accessor (mr=0x558f568f19d0, addr=20, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized out>, attrs=...) at ../softmmu/memory.c:492
15 0x558f52d127de in access_with_adjusted_size (addr=addr@entry=20, value=value@entry=0x7f8cdbffe748, size=size@entry=1, access_size_min=<optimized out>, access_size_max=<optimized out>, access_fn=0x558f52d15cf0 <memory_region_write_accessor>, mr=0x558f568f19d0, attrs=...) at ../softmmu/memory.c:554
16 0x558f52d157ef in memory_region_dispatch_write (mr=mr@entry=0x558f568f19d0, addr=20, data=<optimized out>, op=<optimized out>, attrs=attrs@entry=...) at ../softmmu/memory.c:1504
17 0x558f52d078e7 in flatview_write_continue (fv=fv@entry=0x7f8accbc3b90, addr=addr@entry=103079215124, attrs=..., ptr=ptr@entry=0x7f8ce6300028, len=len@entry=1, addr1=<optimized out>, l=<optimized out>, mr=0x558f568f19d0) at /home/opc/qemu-upstream/include/qemu/host-utils.h:165
18 0x558f52d07b06 in flatview_write (fv=0x7f8accbc3b90, addr=103079215124, attrs=..., buf=0x7f8ce6300028, len=1) at ../softmmu/physmem.c:2822
19 0x558f52d0b36b in address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>) at ../softmmu/physmem.c:2914
20 0x558f52d0b3da in address_space_rw (as=<optimized out>, addr=<optimized out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7f8ce6300028, len=<optimized out>, is_write=<optimized out>) at ../softmmu/physmem.c:2924
21 0x558f52dced09 in kvm_cpu_exec (cpu=cpu@entry=0x558f55c2da60) at ../accel/kvm/kvm-all.c:2903
22 0x558f52dcfabd in kvm_vcpu_thread_fn (arg=arg@entry=0x558f55c2da60) at ../accel/kvm/kvm-accel-ops.c:49
23 0x558f52f9f04a in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:556
24 0x7f8ce4392ea5 in start_thread () at /lib64/libpthread.so.0
25 0x7f8ce40bb9fd in clone () at /lib64/libc.so.6

The cause for the assert failure is due to that the vhost_dev index for the ctrl vq was not aligned with actual one in use by the guest. Upon multiqueue feature negotiation in virtio_net_set_multiqueue(), if guest doesn't support multiqueue, the guest vq layout would shrink to a single queue pair, consisting of 3 vqs in total (rx, tx and ctrl).
This results in ctrl_vq taking a different vhost_dev group index than the default. We can map vq to the correct vhost_dev group by checking if MQ is supported by guest and successfully negotiated. Since the MQ feature is only present along with CTRL_VQ, we make sure the index 2 is only meant for the control vq while MQ is not supported by guest. Be noted if QEMU or guest doesn't support control vq, there's no bother exposing vhost_dev and guest notifier for the control vq. Since vhost_net_start/stop implies DRIVER_OK is set in device status, feature negotiation should be completed when reaching virtio_net_vhost_status(). Fixes: 22288fe ("virtio-net: vhost control virtqueue support") Suggested-by: Jason Wang Signed-off-by: Si-Wei Liu --- hw/net/virtio-net.c | 19 --- 1 file changed, 16 insertions(+), 3
[PATCH for-7.1 09/11] pc-bios: Add NPCM8xx Bootrom
The bootrom is a minimal bootrom that can be used to bring up an NPCM845 Linux kernel. Its source code can be found at github.com/google/vbootrom/tree/master/npcm8xx Signed-off-by: Hao Wu Reviwed-by: Titus Rwantare --- pc-bios/npcm8xx_bootrom.bin | Bin 0 -> 608 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 pc-bios/npcm8xx_bootrom.bin diff --git a/pc-bios/npcm8xx_bootrom.bin b/pc-bios/npcm8xx_bootrom.bin new file mode 100644 index ..6370d6475635c4d445d2b927311edcd591949c82 GIT binary patch literal 608 zcmdUrKTE?<6vfX=0{*3B5ET?nwWA^;qEk()n=Xb9-4dxoSBrz#p|QJQL~zokn{Eyc z?PBXUkU+aB?k?IbNQftG5ej|*FC2c{bKkr7zLy3jhNxj`gc_y5h=Ru)PgZC)Y`f zTqA9Am28qLHlr*^#;re-)dpxT0U42|O+cWOcx=B;{6xXH04vx?cjm z+%U{oFx!aPpV3>ZKz0i$XA-yq{f}x4;|pbw;l#@9zGd|z-rs*H@V-o%PEV)D-)8n2%DyH5@w_^Y8 LH5R3RMV#gjxYTW} literal 0 HcmV?d1 -- 2.35.1.1094.g7c7d902a7c-goog
[PATCH for-7.1 11/11] hw/arm: Add NPCM845 Evaluation board
Signed-off-by: Hao Wu Reviwed-by: Patrick Venture --- hw/arm/meson.build | 2 +- hw/arm/npcm8xx_boards.c | 257 +++ include/hw/arm/npcm8xx.h | 20 +++ 3 files changed, 278 insertions(+), 1 deletion(-) create mode 100644 hw/arm/npcm8xx_boards.c diff --git a/hw/arm/meson.build b/hw/arm/meson.build index cf824241c5..e813cd72fa 100644 --- a/hw/arm/meson.build +++ b/hw/arm/meson.build @@ -14,7 +14,7 @@ arm_ss.add(when: 'CONFIG_MUSICPAL', if_true: files('musicpal.c')) arm_ss.add(when: 'CONFIG_NETDUINO2', if_true: files('netduino2.c')) arm_ss.add(when: 'CONFIG_NETDUINOPLUS2', if_true: files('netduinoplus2.c')) arm_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx.c', 'npcm7xx_boards.c')) -arm_ss.add(when: 'CONFIG_NPCM8XX', if_true: files('npcm8xx.c')) +arm_ss.add(when: 'CONFIG_NPCM8XX', if_true: files('npcm8xx.c', 'npcm8xx_boards.c')) arm_ss.add(when: 'CONFIG_NSERIES', if_true: files('nseries.c')) arm_ss.add(when: 'CONFIG_SX1', if_true: files('omap_sx1.c')) arm_ss.add(when: 'CONFIG_CHEETAH', if_true: files('palm.c')) diff --git a/hw/arm/npcm8xx_boards.c b/hw/arm/npcm8xx_boards.c new file mode 100644 index 00..2290473d12 --- /dev/null +++ b/hw/arm/npcm8xx_boards.c @@ -0,0 +1,257 @@ +/* + * Machine definitions for boards featuring an NPCM8xx SoC. + * + * Copyright 2022 Google LLC + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. 
+ */ + +#include "qemu/osdep.h" + +#include "chardev/char.h" +#include "hw/arm/npcm8xx.h" +#include "hw/core/cpu.h" +#include "hw/loader.h" +#include "hw/qdev-core.h" +#include "hw/qdev-properties.h" +#include "qapi/error.h" +#include "qemu-common.h" +#include "qemu/datadir.h" +#include "qemu/units.h" +#include "sysemu/block-backend.h" + +#define NPCM845_EVB_POWER_ON_STRAPS 0x17ff + +static const char npcm8xx_default_bootrom[] = "npcm8xx_bootrom.bin"; + +static void npcm8xx_load_bootrom(MachineState *machine, NPCM8xxState *soc) +{ +const char *bios_name = machine->firmware ?: npcm8xx_default_bootrom; +g_autofree char *filename = NULL; +int ret; + +filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name); +if (!filename) { +error_report("Could not find ROM image '%s'", bios_name); +if (!machine->kernel_filename) { +/* We can't boot without a bootrom or a kernel image. */ +exit(1); +} +return; +} +ret = load_image_mr(filename, machine->ram); +if (ret < 0) { +error_report("Failed to load ROM image '%s'", filename); +exit(1); +} +} + +static void npcm8xx_connect_flash(NPCM7xxFIUState *fiu, int cs_no, + const char *flash_type, DriveInfo *dinfo) +{ +DeviceState *flash; +qemu_irq flash_cs; + +flash = qdev_new(flash_type); +if (dinfo) { +qdev_prop_set_drive(flash, "drive", blk_by_legacy_dinfo(dinfo)); +} +qdev_realize_and_unref(flash, BUS(fiu->spi), &error_fatal); + +flash_cs = qdev_get_gpio_in_named(flash, SSI_GPIO_CS, 0); +qdev_connect_gpio_out_named(DEVICE(fiu), "cs", cs_no, flash_cs); +} + +static void npcm8xx_connect_dram(NPCM8xxState *soc, MemoryRegion *dram) +{ +memory_region_add_subregion(get_system_memory(), NPCM8XX_DRAM_BA, dram); + +object_property_set_link(OBJECT(soc), "dram-mr", OBJECT(dram), + &error_abort); +} + +static NPCM8xxState *npcm8xx_create_soc(MachineState *machine, +uint32_t hw_straps) +{ +NPCM8xxMachineClass *nmc = NPCM8XX_MACHINE_GET_CLASS(machine); +MachineClass *mc = MACHINE_CLASS(nmc); +Object *obj; + +if (strcmp(machine->cpu_type, mc->default_cpu_type)
!= 0) { +error_report("This board can only be used with %s", + mc->default_cpu_type); +exit(1); +} + +obj = object_new_with_props(nmc->soc_type, OBJECT(machine), "soc", +&error_abort, NULL); +object_property_set_uint(obj, "power-on-straps", hw_straps, &error_abort); + +return NPCM8XX(obj); +} + +static I2CBus *npcm8xx_i2c_get_bus(NPCM8xxState *soc, uint32_t num) +{ +g_assert(num < ARRAY_SIZE(soc->smbus)); +return I2C_BUS(qdev_get_child_bus(DEVICE(&soc->smbus[num]), "i2c-bus")); +} + +static void npcm8xx_init_pwm_splitter(NPCM8xxMachine *machine, + NPCM8xxState *soc, const int *fan_counts) +{ +SplitIRQ *splitters = machine->fan_splitter; + +/* + * PWM 0~3 belong to module 0 output 0~3. + * PWM 4~7 belong to
[PATCH for-7.1 05/11] hw/misc: Store DRAM size in NPCM8XX GCR Module
The NPCM8XX boot block stores the DRAM size in the SCRPAD_B register of the GCR module. Since we don't simulate a detailed memory controller, we need to store this information directly, similar to the NPCM7XX's INTCR3 register. Signed-off-by: Hao Wu Reviewed-by: Titus Rwantare --- hw/misc/npcm_gcr.c | 33 ++--- include/hw/misc/npcm_gcr.h | 1 + 2 files changed, 31 insertions(+), 3 deletions(-) diff --git a/hw/misc/npcm_gcr.c b/hw/misc/npcm_gcr.c index 2349949599..14c298602a 100644 --- a/hw/misc/npcm_gcr.c +++ b/hw/misc/npcm_gcr.c @@ -267,7 +267,7 @@ static const struct MemoryRegionOps npcm_gcr_ops = { }, }; -static void npcm_gcr_enter_reset(Object *obj, ResetType type) +static void npcm7xx_gcr_enter_reset(Object *obj, ResetType type) { NPCMGCRState *s = NPCM_GCR(obj); NPCMGCRClass *c = NPCM_GCR_GET_CLASS(obj); @@ -283,6 +283,23 @@ static void npcm_gcr_enter_reset(Object *obj, ResetType type) } } +static void npcm8xx_gcr_enter_reset(Object *obj, ResetType type) +{ +NPCMGCRState *s = NPCM_GCR(obj); +NPCMGCRClass *c = NPCM_GCR_GET_CLASS(obj); + +switch (type) { +case RESET_TYPE_COLD: +memcpy(s->regs, c->cold_reset_values, c->nr_regs * sizeof(uint32_t)); +/* These 3 registers are at the same location in both 7xx and 8xx. */ +s->regs[NPCM8XX_GCR_PWRON] = s->reset_pwron; +s->regs[NPCM8XX_GCR_MDLR] = s->reset_mdlr; +s->regs[NPCM8XX_GCR_INTCR3] = s->reset_intcr3; +s->regs[NPCM8XX_GCR_SCRPAD_B] = s->reset_scrpad_b; +break; +} +} + static void npcm_gcr_realize(DeviceState *dev, Error **errp) { ERRP_GUARD(); @@ -326,6 +343,14 @@ static void npcm_gcr_realize(DeviceState *dev, Error **errp) * https://github.com/Nuvoton-Israel/u-boot/blob/2aef993bd2aafeb5408dbaad0f3ce099ee40c4aa/board/nuvoton/poleg/poleg.c#L244 */ s->reset_intcr3 |= ctz64(dram_size / NPCM7XX_GCR_MIN_DRAM_SIZE) << 8; + +/* + * The boot block starting from 0.0.6 for NPCM8xx SoCs stores the DRAM size + * in the SCRPAD2 registers. We need to set this field correctly since + * the initialization is skipped as we mentioned above.
+ * https://github.com/Nuvoton-Israel/u-boot/blob/npcm8mnx-v2019.01_tmp/board/nuvoton/arbel/arbel.c#L737 + */ +s->reset_scrpad_b = dram_size; } static void npcm_gcr_init(Object *obj) @@ -355,12 +380,10 @@ static Property npcm_gcr_properties[] = { static void npcm_gcr_class_init(ObjectClass *klass, void *data) { -ResettableClass *rc = RESETTABLE_CLASS(klass); DeviceClass *dc = DEVICE_CLASS(klass); dc->realize = npcm_gcr_realize; dc->vmsd = _npcm_gcr; -rc->phases.enter = npcm_gcr_enter_reset; device_class_set_props(dc, npcm_gcr_properties); } @@ -369,24 +392,28 @@ static void npcm7xx_gcr_class_init(ObjectClass *klass, void *data) { NPCMGCRClass *c = NPCM_GCR_CLASS(klass); DeviceClass *dc = DEVICE_CLASS(klass); +ResettableClass *rc = RESETTABLE_CLASS(klass); QEMU_BUILD_BUG_ON(NPCM7XX_GCR_REGS_END > NPCM_GCR_MAX_NR_REGS); QEMU_BUILD_BUG_ON(NPCM7XX_GCR_REGS_END != NPCM7XX_GCR_NR_REGS); dc->desc = "NPCM7xx System Global Control Registers"; c->nr_regs = NPCM7XX_GCR_NR_REGS; c->cold_reset_values = npcm7xx_cold_reset_values; +rc->phases.enter = npcm7xx_gcr_enter_reset; } static void npcm8xx_gcr_class_init(ObjectClass *klass, void *data) { NPCMGCRClass *c = NPCM_GCR_CLASS(klass); DeviceClass *dc = DEVICE_CLASS(klass); +ResettableClass *rc = RESETTABLE_CLASS(klass); QEMU_BUILD_BUG_ON(NPCM8XX_GCR_REGS_END > NPCM_GCR_MAX_NR_REGS); QEMU_BUILD_BUG_ON(NPCM8XX_GCR_REGS_END != NPCM8XX_GCR_NR_REGS); dc->desc = "NPCM8xx System Global Control Registers"; c->nr_regs = NPCM8XX_GCR_NR_REGS; c->cold_reset_values = npcm8xx_cold_reset_values; +rc->phases.enter = npcm8xx_gcr_enter_reset; } static const TypeInfo npcm_gcr_info[] = { diff --git a/include/hw/misc/npcm_gcr.h b/include/hw/misc/npcm_gcr.h index ac3d781c2e..bd69199d51 100644 --- a/include/hw/misc/npcm_gcr.h +++ b/include/hw/misc/npcm_gcr.h @@ -39,6 +39,7 @@ typedef struct NPCMGCRState { uint32_t reset_pwron; uint32_t reset_mdlr; uint32_t reset_intcr3; +uint32_t reset_scrpad_b; } NPCMGCRState; typedef struct NPCMGCRClass { -- 
2.35.1.1094.g7c7d902a7c-goog
[PATCH for-7.1 08/11] hw/net: Add NPCM8XX PCS Module
The PCS exists in NPCM8XX's GMAC1 and is used to control the SGMII PHY. This implementation contains all the default registers and the soft reset feature that are required to load the Linux kernel driver. Further features have not been implemented yet. Signed-off-by: Hao Wu Reviewed-by: Titus Rwantare --- hw/net/meson.build| 1 + hw/net/npcm_pcs.c | 409 ++ hw/net/trace-events | 4 + include/hw/net/npcm_pcs.h | 42 4 files changed, 456 insertions(+) create mode 100644 hw/net/npcm_pcs.c create mode 100644 include/hw/net/npcm_pcs.h diff --git a/hw/net/meson.build b/hw/net/meson.build index 685b75badb..4cba3e66db 100644 --- a/hw/net/meson.build +++ b/hw/net/meson.build @@ -37,6 +37,7 @@ softmmu_ss.add(when: 'CONFIG_SUNHME', if_true: files('sunhme.c')) softmmu_ss.add(when: 'CONFIG_FTGMAC100', if_true: files('ftgmac100.c')) softmmu_ss.add(when: 'CONFIG_SUNGEM', if_true: files('sungem.c')) softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx_emc.c')) +softmmu_ss.add(when: 'CONFIG_NPCM8XX', if_true: files('npcm_pcs.c')) softmmu_ss.add(when: 'CONFIG_ETRAXFS', if_true: files('etraxfs_eth.c')) softmmu_ss.add(when: 'CONFIG_COLDFIRE', if_true: files('mcf_fec.c')) diff --git a/hw/net/npcm_pcs.c b/hw/net/npcm_pcs.c new file mode 100644 index 00..efe5f68d9c --- /dev/null +++ b/hw/net/npcm_pcs.c @@ -0,0 +1,409 @@ +/* + * Nuvoton NPCM8xx PCS Module + * + * Copyright 2022 Google LLC + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. 
+ */ + +/* + * Disclaimer: + * Currently we only implemented the default values of the registers and + * the soft reset feature. These are required to boot up the GMAC module + * in Linux kernel for NPCM845 boards. Other functionalities are not modeled. + */ + +#include "qemu/osdep.h" + +#include "exec/hwaddr.h" +#include "hw/registerfields.h" +#include "hw/net/npcm_pcs.h" +#include "migration/vmstate.h" +#include "qemu/log.h" +#include "qemu/units.h" +#include "trace.h" + +#define NPCM_PCS_IND_AC_BA 0x1fe +#define NPCM_PCS_IND_SR_CTL 0x1e00 +#define NPCM_PCS_IND_SR_MII 0x1f00 +#define NPCM_PCS_IND_SR_TIM 0x1f07 +#define NPCM_PCS_IND_VR_MII 0x1f80 + +REG16(NPCM_PCS_SR_CTL_ID1, 0x08) +REG16(NPCM_PCS_SR_CTL_ID2, 0x0a) +REG16(NPCM_PCS_SR_CTL_STS, 0x10) + +REG16(NPCM_PCS_SR_MII_CTRL, 0x00) +REG16(NPCM_PCS_SR_MII_STS, 0x02) +REG16(NPCM_PCS_SR_MII_DEV_ID1, 0x04) +REG16(NPCM_PCS_SR_MII_DEV_ID2, 0x06) +REG16(NPCM_PCS_SR_MII_AN_ADV, 0x08) +REG16(NPCM_PCS_SR_MII_LP_BABL, 0x0a) +REG16(NPCM_PCS_SR_MII_AN_EXPN, 0x0c) +REG16(NPCM_PCS_SR_MII_EXT_STS, 0x1e) + +REG16(NPCM_PCS_SR_TIM_SYNC_ABL, 0x10) +REG16(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_LWR, 0x12) +REG16(NPCM_PCS_SR_TIM_SYNC_TX_MAX_DLY_UPR, 0x14) +REG16(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_LWR, 0x16) +REG16(NPCM_PCS_SR_TIM_SYNC_TX_MIN_DLY_UPR, 0x18) +REG16(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_LWR, 0x1a) +REG16(NPCM_PCS_SR_TIM_SYNC_RX_MAX_DLY_UPR, 0x1c) +REG16(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_LWR, 0x1e) +REG16(NPCM_PCS_SR_TIM_SYNC_RX_MIN_DLY_UPR, 0x20) + +REG16(NPCM_PCS_VR_MII_MMD_DIG_CTRL1, 0x000) +REG16(NPCM_PCS_VR_MII_AN_CTRL, 0x002) +REG16(NPCM_PCS_VR_MII_AN_INTR_STS, 0x004) +REG16(NPCM_PCS_VR_MII_TC, 0x006) +REG16(NPCM_PCS_VR_MII_DBG_CTRL, 0x00a) +REG16(NPCM_PCS_VR_MII_EEE_MCTRL0, 0x00c) +REG16(NPCM_PCS_VR_MII_EEE_TXTIMER, 0x010) +REG16(NPCM_PCS_VR_MII_EEE_RXTIMER, 0x012) +REG16(NPCM_PCS_VR_MII_LINK_TIMER_CTRL, 0x014) +REG16(NPCM_PCS_VR_MII_EEE_MCTRL1, 0x016) +REG16(NPCM_PCS_VR_MII_DIG_STS, 0x020) +REG16(NPCM_PCS_VR_MII_ICG_ERRCNT1, 
0x022) +REG16(NPCM_PCS_VR_MII_MISC_STS, 0x030) +REG16(NPCM_PCS_VR_MII_RX_LSTS, 0x040) +REG16(NPCM_PCS_VR_MII_MP_TX_BSTCTRL0, 0x070) +REG16(NPCM_PCS_VR_MII_MP_TX_LVLCTRL0, 0x074) +REG16(NPCM_PCS_VR_MII_MP_TX_GENCTRL0, 0x07a) +REG16(NPCM_PCS_VR_MII_MP_TX_GENCTRL1, 0x07c) +REG16(NPCM_PCS_VR_MII_MP_TX_STS, 0x090) +REG16(NPCM_PCS_VR_MII_MP_RX_GENCTRL0, 0x0b0) +REG16(NPCM_PCS_VR_MII_MP_RX_GENCTRL1, 0x0b2) +REG16(NPCM_PCS_VR_MII_MP_RX_LOS_CTRL0, 0x0ba) +REG16(NPCM_PCS_VR_MII_MP_MPLL_CTRL0, 0x0f0) +REG16(NPCM_PCS_VR_MII_MP_MPLL_CTRL1, 0x0f2) +REG16(NPCM_PCS_VR_MII_MP_MPLL_STS, 0x110) +REG16(NPCM_PCS_VR_MII_MP_MISC_CTRL2, 0x126) +REG16(NPCM_PCS_VR_MII_MP_LVL_CTRL, 0x130) +REG16(NPCM_PCS_VR_MII_MP_MISC_CTRL0, 0x132) +REG16(NPCM_PCS_VR_MII_MP_MISC_CTRL1, 0x134) +REG16(NPCM_PCS_VR_MII_DIG_CTRL2, 0x1c2) +REG16(NPCM_PCS_VR_MII_DIG_ERRCNT_SEL, 0x1c4) + +/* Register Fields */ +#define NPCM_PCS_SR_MII_CTRL_RSTBIT(15) + +static const uint16_t
[PATCH for-7.1 07/11] hw/misc: Support 8-bytes memop in NPCM GCR module
The NPCM8xx GCR device can be accessed with 64-bit memory operations. This patch supports that. Signed-off-by: Hao Wu Reviewed-by: Patrick Venture --- hw/misc/npcm_gcr.c | 98 +--- hw/misc/trace-events | 4 +- 2 files changed, 77 insertions(+), 25 deletions(-) diff --git a/hw/misc/npcm_gcr.c b/hw/misc/npcm_gcr.c index 14c298602a..aa81db23d7 100644 --- a/hw/misc/npcm_gcr.c +++ b/hw/misc/npcm_gcr.c @@ -201,6 +201,7 @@ static uint64_t npcm_gcr_read(void *opaque, hwaddr offset, unsigned size) uint32_t reg = offset / sizeof(uint32_t); NPCMGCRState *s = opaque; NPCMGCRClass *c = NPCM_GCR_GET_CLASS(s); +uint64_t value; if (reg >= c->nr_regs) { qemu_log_mask(LOG_GUEST_ERROR, @@ -209,9 +210,23 @@ static uint64_t npcm_gcr_read(void *opaque, hwaddr offset, unsigned size) return 0; } -trace_npcm_gcr_read(offset, s->regs[reg]); +switch (size) { +case 4: +value = s->regs[reg]; +break; + +case 8: +value = s->regs[reg] + (((uint64_t)s->regs[reg + 1]) << 32); +break; + +default: +g_assert_not_reached(); +} -return s->regs[reg]; +if (s->regs[reg] != 0) { +trace_npcm_gcr_read(offset, value); +} +return value; } static void npcm_gcr_write(void *opaque, hwaddr offset, @@ -222,7 +237,7 @@ static void npcm_gcr_write(void *opaque, hwaddr offset, NPCMGCRClass *c = NPCM_GCR_GET_CLASS(s); uint32_t value = v; -trace_npcm_gcr_write(offset, value); +trace_npcm_gcr_write(offset, v); if (reg >= c->nr_regs) { qemu_log_mask(LOG_GUEST_ERROR, @@ -231,29 +246,65 @@ static void npcm_gcr_write(void *opaque, hwaddr offset, return; } -switch (reg) { -case NPCM7XX_GCR_PDID: -case NPCM7XX_GCR_PWRON: -case NPCM7XX_GCR_INTSR: -qemu_log_mask(LOG_GUEST_ERROR, - "%s: register @ 0x%04" HWADDR_PRIx " is read-only\n", - __func__, offset); -return; - -case NPCM7XX_GCR_RESSR: -case NPCM7XX_GCR_CP2BST: -/* Write 1 to clear */ -value = s->regs[reg] & ~value; +switch (size) { +case 4: +switch (reg) { +case NPCM7XX_GCR_PDID: +case NPCM7XX_GCR_PWRON: +case NPCM7XX_GCR_INTSR: +qemu_log_mask(LOG_GUEST_ERROR, + "%s: register @ 
0x%04" HWADDR_PRIx " is read-only\n", + __func__, offset); +return; + +case NPCM7XX_GCR_RESSR: +case NPCM7XX_GCR_CP2BST: +/* Write 1 to clear */ +value = s->regs[reg] & ~value; +break; + +case NPCM7XX_GCR_RLOCKR1: +case NPCM7XX_GCR_MDLR: +/* Write 1 to set */ +value |= s->regs[reg]; +break; +}; +s->regs[reg] = value; break; -case NPCM7XX_GCR_RLOCKR1: -case NPCM7XX_GCR_MDLR: -/* Write 1 to set */ -value |= s->regs[reg]; +case 8: +s->regs[reg] = value; +s->regs[reg + 1] = v >> 32; break; -}; -s->regs[reg] = value; +default: +g_assert_not_reached(); +} +} + +static bool npcm_gcr_check_mem_op(void *opaque, hwaddr offset, + unsigned size, bool is_write, + MemTxAttrs attrs) +{ +NPCMGCRClass *c = NPCM_GCR_GET_CLASS(opaque); + +if (offset >= c->nr_regs * sizeof(uint32_t)) { +return false; +} + +switch (size) { +case 4: +return true; +case 8: +if (offset >= NPCM8XX_GCR_SCRPAD_00 * sizeof(uint32_t) && +offset < (NPCM8XX_GCR_NR_REGS - 1) * sizeof(uint32_t)) { +return true; +} else { +return false; +} +default: +return false; +} } static const struct MemoryRegionOps npcm_gcr_ops = { @@ -262,7 +313,8 @@ static const struct MemoryRegionOps npcm_gcr_ops = { .endianness = DEVICE_LITTLE_ENDIAN, .valid = { .min_access_size= 4, -.max_access_size= 4, +.max_access_size= 8, +.accepts= npcm_gcr_check_mem_op, .unaligned = false, }, }; diff --git a/hw/misc/trace-events b/hw/misc/trace-events index 02650acfff..2ffec963e7 100644 --- a/hw/misc/trace-events +++ b/hw/misc/trace-events @@ -103,8 +103,8 @@ npcm_clk_read(uint64_t offset, uint32_t value) " offset: 0x%04" PRIx64 " value: npcm_clk_write(uint64_t offset, uint32_t value) "offset: 0x%04" PRIx64 " value: 0x%08" PRIx32 # npcm_gcr.c -npcm_gcr_read(uint64_t offset, uint32_t value) " offset: 0x%04" PRIx64 " value: 0x%08" PRIx32 -npcm_gcr_write(uint64_t offset, uint32_t value) "offset: 0x%04" PRIx64 " value: 0x%08" PRIx32 +npcm_gcr_read(uint64_t offset, uint64_t value) " offset: 0x%04" PRIx64 " value: 0x%08" PRIx64 +npcm_gcr_write(uint64_t 
offset, uint64_t
[PATCH for-7.1 06/11] hw/intc: Add a property to allow GIC to reset into non secure mode
This property allows certain boards like NPCM8xx to boot the kernel directly into non-secure mode. This is necessary since we do not support secure boot features for NPCM8xx yet. Signed-off-by: Hao Wu Reviewed-by: Patrick Venture --- hw/intc/arm_gic_common.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/hw/intc/arm_gic_common.c b/hw/intc/arm_gic_common.c index 7b44d5625b..7ddc5cfbd0 100644 --- a/hw/intc/arm_gic_common.c +++ b/hw/intc/arm_gic_common.c @@ -358,6 +358,8 @@ static Property arm_gic_common_properties[] = { /* True if the GIC should implement the virtualization extensions */ DEFINE_PROP_BOOL("has-virtualization-extensions", GICState, virt_extn, 0), DEFINE_PROP_UINT32("num-priority-bits", GICState, n_prio_bits, 8), +/* True if we want to boot the kernel directly into NonSecure */ +DEFINE_PROP_BOOL("irq-reset-nonsecure", GICState, irq_reset_nonsecure, 0), DEFINE_PROP_END_OF_LIST(), }; -- 2.35.1.1094.g7c7d902a7c-goog
[PATCH for-7.1 03/11] hw/misc: Support NPCM8XX GCR module
The NPCM8XX has a different set of global control registers than the 7XX. This patch supports that. Signed-off-by: Hao Wu Reviewed-by: Titus Rwantare --- MAINTAINERS | 9 +- hw/misc/meson.build | 2 +- hw/misc/npcm7xx_gcr.c | 269 hw/misc/npcm_gcr.c| 413 ++ hw/misc/trace-events | 6 +- include/hw/arm/npcm7xx.h | 4 +- include/hw/misc/{npcm7xx_gcr.h => npcm_gcr.h} | 29 +- 7 files changed, 445 insertions(+), 287 deletions(-) delete mode 100644 hw/misc/npcm7xx_gcr.c create mode 100644 hw/misc/npcm_gcr.c rename include/hw/misc/{npcm7xx_gcr.h => npcm_gcr.h} (56%) diff --git a/MAINTAINERS b/MAINTAINERS index 4ad2451e03..c31ed09527 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -791,14 +791,15 @@ F: hw/net/mv88w8618_eth.c F: include/hw/net/mv88w8618_eth.h F: docs/system/arm/musicpal.rst -Nuvoton NPCM7xx +Nuvoton NPCM M: Havard Skinnemoen M: Tyrone Ting +M: Hao Wu L: qemu-...@nongnu.org S: Supported -F: hw/*/npcm7xx* -F: include/hw/*/npcm7xx* -F: tests/qtest/npcm7xx* +F: hw/*/npcm* +F: include/hw/*/npcm* +F: tests/qtest/npcm* F: pc-bios/npcm7xx_bootrom.bin F: roms/vbootrom F: docs/system/arm/nuvoton.rst diff --git a/hw/misc/meson.build b/hw/misc/meson.build index 6fb69612e0..13f8fee5b6 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -61,7 +61,7 @@ softmmu_ss.add(when: 'CONFIG_IMX', if_true: files( softmmu_ss.add(when: 'CONFIG_MAINSTONE', if_true: files('mst_fpga.c')) softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: files( 'npcm7xx_clk.c', - 'npcm7xx_gcr.c', + 'npcm_gcr.c', 'npcm7xx_mft.c', 'npcm7xx_pwm.c', 'npcm7xx_rng.c', diff --git a/hw/misc/npcm7xx_gcr.c b/hw/misc/npcm7xx_gcr.c deleted file mode 100644 index eace9e1967..00 --- a/hw/misc/npcm7xx_gcr.c +++ /dev/null @@ -1,269 +0,0 @@ -/* - * Nuvoton NPCM7xx System Global Control Registers.
- * - * Copyright 2020 Google LLC - * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License as published by the - * Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, but WITHOUT - * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or - * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License - * for more details. - */ - -#include "qemu/osdep.h" - -#include "hw/misc/npcm7xx_gcr.h" -#include "hw/qdev-properties.h" -#include "migration/vmstate.h" -#include "qapi/error.h" -#include "qemu/cutils.h" -#include "qemu/log.h" -#include "qemu/module.h" -#include "qemu/units.h" - -#include "trace.h" - -#define NPCM7XX_GCR_MIN_DRAM_SIZE (128 * MiB) -#define NPCM7XX_GCR_MAX_DRAM_SIZE (2 * GiB) - -enum NPCM7xxGCRRegisters { -NPCM7XX_GCR_PDID, -NPCM7XX_GCR_PWRON, -NPCM7XX_GCR_MFSEL1 = 0x0c / sizeof(uint32_t), -NPCM7XX_GCR_MFSEL2, -NPCM7XX_GCR_MISCPE, -NPCM7XX_GCR_SPSWC = 0x038 / sizeof(uint32_t), -NPCM7XX_GCR_INTCR, -NPCM7XX_GCR_INTSR, -NPCM7XX_GCR_HIFCR = 0x050 / sizeof(uint32_t), -NPCM7XX_GCR_INTCR2 = 0x060 / sizeof(uint32_t), -NPCM7XX_GCR_MFSEL3, -NPCM7XX_GCR_SRCNT, -NPCM7XX_GCR_RESSR, -NPCM7XX_GCR_RLOCKR1, -NPCM7XX_GCR_FLOCKR1, -NPCM7XX_GCR_DSCNT, -NPCM7XX_GCR_MDLR, -NPCM7XX_GCR_SCRPAD3, -NPCM7XX_GCR_SCRPAD2, -NPCM7XX_GCR_DAVCLVLR= 0x098 / sizeof(uint32_t), -NPCM7XX_GCR_INTCR3, -NPCM7XX_GCR_VSINTR = 0x0ac / sizeof(uint32_t), -NPCM7XX_GCR_MFSEL4, -NPCM7XX_GCR_CPBPNTR = 0x0c4 / sizeof(uint32_t), -NPCM7XX_GCR_CPCTL = 0x0d0 / sizeof(uint32_t), -NPCM7XX_GCR_CP2BST, -NPCM7XX_GCR_B2CPNT, -NPCM7XX_GCR_CPPCTL, -NPCM7XX_GCR_I2CSEGSEL, -NPCM7XX_GCR_I2CSEGCTL, -NPCM7XX_GCR_VSRCR, -NPCM7XX_GCR_MLOCKR, -NPCM7XX_GCR_SCRPAD = 0x013c / sizeof(uint32_t), -NPCM7XX_GCR_USB1PHYCTL, -NPCM7XX_GCR_USB2PHYCTL, -NPCM7XX_GCR_REGS_END, -}; - -static const uint32_t 
cold_reset_values[NPCM7XX_GCR_NR_REGS] = { -[NPCM7XX_GCR_PDID] = 0x04a92750, /* Poleg A1 */ -[NPCM7XX_GCR_MISCPE]= 0x, -[NPCM7XX_GCR_SPSWC] = 0x0003, -[NPCM7XX_GCR_INTCR] = 0x035e, -[NPCM7XX_GCR_HIFCR] = 0x004e, -[NPCM7XX_GCR_INTCR2]= (1U << 19), /* DDR initialized */ -[NPCM7XX_GCR_RESSR] = 0x8000, -[NPCM7XX_GCR_DSCNT] = 0x00c0, -[NPCM7XX_GCR_DAVCLVLR] = 0x5a00f3cf, -[NPCM7XX_GCR_SCRPAD]= 0x0008, -[NPCM7XX_GCR_USB1PHYCTL]= 0x034730e4, -[NPCM7XX_GCR_USB2PHYCTL]= 0x034730e4, -}; - -static uint64_t npcm7xx_gcr_read(void *opaque, hwaddr offset, unsigned size) -{ -uint32_t reg = offset /
[PATCH for-7.1 01/11] docs/system/arm: Add Description for NPCM8XX SoC
The NPCM8XX SoC is the successor of the NPCM7XX. It features quad-core Cortex-A35 (Armv8, 64-bit) CPUs and some additional peripherals. Signed-off-by: Hao Wu Reviewed-by: Patrick Venture --- docs/system/arm/nuvoton.rst | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/docs/system/arm/nuvoton.rst b/docs/system/arm/nuvoton.rst index ef2792076a..bead17fa7e 100644 --- a/docs/system/arm/nuvoton.rst +++ b/docs/system/arm/nuvoton.rst @@ -1,12 +1,13 @@ Nuvoton iBMC boards (``*-bmc``, ``npcm750-evb``, ``quanta-gsj``) -The `Nuvoton iBMC`_ chips (NPCM7xx) are a family of ARM-based SoCs that are +The `Nuvoton iBMC`_ chips are a family of ARM-based SoCs that are designed to be used as Baseboard Management Controllers (BMCs) in various -servers. They all feature one or two ARM Cortex-A9 CPU cores, as well as an -assortment of peripherals targeted for either Enterprise or Data Center / -Hyperscale applications. The former is a superset of the latter, so NPCM750 has -all the peripherals of NPCM730 and more. +servers. Currently there are two families: the NPCM7XX series and the +NPCM8XX series. The NPCM7XX series features one or two ARM Cortex-A9 CPU cores, +while the NPCM8XX series features four ARM Cortex-A35 CPU cores. Both series contain a +different assortment of peripherals targeted for either Enterprise or Data +Center / Hyperscale applications. .. _Nuvoton iBMC: https://www.nuvoton.com/products/cloud-computing/ibmc/ @@ -27,6 +28,8 @@ There are also two more SoCs, NPCM710 and NPCM705, which are single-core variants of NPCM750 and NPCM730, respectively. These are currently not supported by QEMU. +The NPCM8xx SoC is the successor of the NPCM7xx SoC.
+ Supported devices - @@ -61,6 +64,8 @@ Missing devices * System Wake-up Control (SWC) * Shared memory (SHM) * eSPI slave interface + * Block-transfer interface (8XX only) + * Virtual UART (8XX only) * Ethernet controller (GMAC) * USB device (USBD) @@ -76,6 +81,11 @@ Missing devices * Video capture * Encoding compression engine * Security features + * I3C buses (8XX only) + * Temperature sensor interface (8XX only) + * Virtual UART (8XX only) + * Flash monitor (8XX only) + * JTAG master (8XX only) Boot options -- 2.35.1.1094.g7c7d902a7c-goog
[PATCH for-7.1 04/11] hw/misc: Support NPCM8XX CLK Module Registers
The NPCM8XX adds a few new registers to the CLK module and has a different set of reset values. This patch supports them. It doesn't yet support the new clock values generated by these registers: currently no modules use these new clock values, so they are not necessary at this point. Implementing these clocks might be required when implementing those modules. Signed-off-by: Hao Wu Reviewed-by: Titus Rwantare --- hw/misc/meson.build | 2 +- hw/misc/{npcm7xx_clk.c => npcm_clk.c} | 238 ++ hw/misc/trace-events | 6 +- include/hw/arm/npcm7xx.h | 4 +- include/hw/misc/{npcm7xx_clk.h => npcm_clk.h} | 43 ++-- 5 files changed, 219 insertions(+), 74 deletions(-) rename hw/misc/{npcm7xx_clk.c => npcm_clk.c} (81%) rename include/hw/misc/{npcm7xx_clk.h => npcm_clk.h} (83%) diff --git a/hw/misc/meson.build b/hw/misc/meson.build index 13f8fee5b6..b4e9d3f857 100644 --- a/hw/misc/meson.build +++ b/hw/misc/meson.build @@ -60,7 +60,7 @@ softmmu_ss.add(when: 'CONFIG_IMX', if_true: files( )) softmmu_ss.add(when: 'CONFIG_MAINSTONE', if_true: files('mst_fpga.c')) softmmu_ss.add(when: 'CONFIG_NPCM7XX', if_true: files( - 'npcm7xx_clk.c', + 'npcm_clk.c', 'npcm_gcr.c', 'npcm7xx_mft.c', 'npcm7xx_pwm.c', diff --git a/hw/misc/npcm7xx_clk.c b/hw/misc/npcm_clk.c similarity index 81% rename from hw/misc/npcm7xx_clk.c rename to hw/misc/npcm_clk.c index bc2b879feb..f4601a3e9a 100644 --- a/hw/misc/npcm7xx_clk.c +++ b/hw/misc/npcm_clk.c @@ -1,5 +1,5 @@ /* - * Nuvoton NPCM7xx Clock Control Registers. + * Nuvoton NPCM7xx/8xx Clock Control Registers.
* * Copyright 2020 Google LLC * @@ -16,7 +16,7 @@ #include "qemu/osdep.h" -#include "hw/misc/npcm7xx_clk.h" +#include "hw/misc/npcm_clk.h" #include "hw/timer/npcm7xx_timer.h" #include "hw/qdev-clock.h" #include "migration/vmstate.h" @@ -75,13 +75,65 @@ enum NPCM7xxCLKRegisters { NPCM7XX_CLK_REGS_END, }; +enum NPCM8xxCLKRegisters { +NPCM8XX_CLK_CLKEN1, +NPCM8XX_CLK_CLKSEL, +NPCM8XX_CLK_CLKDIV1, +NPCM8XX_CLK_PLLCON0, +NPCM8XX_CLK_PLLCON1, +NPCM8XX_CLK_SWRSTR, +NPCM8XX_CLK_IPSRST1 = 0x20 / sizeof(uint32_t), +NPCM8XX_CLK_IPSRST2, +NPCM8XX_CLK_CLKEN2, +NPCM8XX_CLK_CLKDIV2, +NPCM8XX_CLK_CLKEN3, +NPCM8XX_CLK_IPSRST3, +NPCM8XX_CLK_WD0RCR, +NPCM8XX_CLK_WD1RCR, +NPCM8XX_CLK_WD2RCR, +NPCM8XX_CLK_SWRSTC1, +NPCM8XX_CLK_SWRSTC2, +NPCM8XX_CLK_SWRSTC3, +NPCM8XX_CLK_TIPRSTC, +NPCM8XX_CLK_PLLCON2, +NPCM8XX_CLK_CLKDIV3, +NPCM8XX_CLK_CORSTC, +NPCM8XX_CLK_PLLCONG, +NPCM8XX_CLK_AHBCKFI, +NPCM8XX_CLK_SECCNT, +NPCM8XX_CLK_CNTR25M, +/* Registers unique to NPCM8XX SoC */ +NPCM8XX_CLK_CLKEN4, +NPCM8XX_CLK_IPSRST4, +NPCM8XX_CLK_BUSTO, +NPCM8XX_CLK_CLKDIV4, +NPCM8XX_CLK_WD0RCRB, +NPCM8XX_CLK_WD1RCRB, +NPCM8XX_CLK_WD2RCRB, +NPCM8XX_CLK_SWRSTC1B, +NPCM8XX_CLK_SWRSTC2B, +NPCM8XX_CLK_SWRSTC3B, +NPCM8XX_CLK_TIPRSTCB, +NPCM8XX_CLK_CORSTCB, +NPCM8XX_CLK_IPSRSTDIS1, +NPCM8XX_CLK_IPSRSTDIS2, +NPCM8XX_CLK_IPSRSTDIS3, +NPCM8XX_CLK_IPSRSTDIS4, +NPCM8XX_CLK_CLKENDIS1, +NPCM8XX_CLK_CLKENDIS2, +NPCM8XX_CLK_CLKENDIS3, +NPCM8XX_CLK_CLKENDIS4, +NPCM8XX_CLK_THRTL_CNT, +NPCM8XX_CLK_REGS_END, +}; + /* * These reset values were taken from version 0.91 of the NPCM750R data sheet. * * All are loaded on power-up reset. CLKENx and SWRSTR should also be loaded on * core domain reset, but this reset type is not yet supported by QEMU. 
*/ -static const uint32_t cold_reset_values[NPCM7XX_CLK_NR_REGS] = { +static const uint32_t npcm7xx_cold_reset_values[NPCM7XX_CLK_NR_REGS] = { [NPCM7XX_CLK_CLKEN1]= 0x, [NPCM7XX_CLK_CLKSEL]= 0x004a, [NPCM7XX_CLK_CLKDIV1] = 0x5413f855, @@ -103,6 +155,46 @@ static const uint32_t cold_reset_values[NPCM7XX_CLK_NR_REGS] = { [NPCM7XX_CLK_AHBCKFI] = 0x00c8, }; +/* + * These reset values were taken from version 0.92 of the NPCM8xx data sheet. + */ +static const uint32_t npcm8xx_cold_reset_values[NPCM8XX_CLK_NR_REGS] = { +[NPCM8XX_CLK_CLKEN1]= 0x, +[NPCM8XX_CLK_CLKSEL]= 0x154a, +[NPCM8XX_CLK_CLKDIV1] = 0x5413f855, +[NPCM8XX_CLK_PLLCON0] = 0x00222101 | PLLCON_LOKI, +[NPCM8XX_CLK_PLLCON1] = 0x00202101 | PLLCON_LOKI, +[NPCM8XX_CLK_IPSRST1] = 0x1000, +[NPCM8XX_CLK_IPSRST2] = 0x8000, +[NPCM8XX_CLK_CLKEN2]= 0x, +[NPCM8XX_CLK_CLKDIV2] = 0xaa4f8f9f, +[NPCM8XX_CLK_CLKEN3]= 0x, +[NPCM8XX_CLK_IPSRST3] = 0x0300, +[NPCM8XX_CLK_WD0RCR]= 0x, +[NPCM8XX_CLK_WD1RCR]= 0x, +[NPCM8XX_CLK_WD2RCR]= 0x, +[NPCM8XX_CLK_SWRSTC1] = 0x0003, +[NPCM8XX_CLK_SWRSTC2] = 0x0001, +
[PATCH for-7.1 10/11] hw/arm: Add NPCM8XX SoC
This file contains a basic NPCM8XX SoC implementation. It's forked from the NPCM7XX SoC with some changes. Signed-off-by: Hao Wu Reviewed-by: Patrick Venture Reviewed-by: Titus Rwantare --- configs/devices/aarch64-softmmu/default.mak | 1 + hw/arm/Kconfig | 11 + hw/arm/meson.build | 1 + hw/arm/npcm8xx.c| 806 include/hw/arm/npcm8xx.h| 106 +++ 5 files changed, 925 insertions(+) create mode 100644 hw/arm/npcm8xx.c create mode 100644 include/hw/arm/npcm8xx.h diff --git a/configs/devices/aarch64-softmmu/default.mak b/configs/devices/aarch64-softmmu/default.mak index cf43ac8da1..1c3cf6dda1 100644 --- a/configs/devices/aarch64-softmmu/default.mak +++ b/configs/devices/aarch64-softmmu/default.mak @@ -6,3 +6,4 @@ include ../arm-softmmu/default.mak CONFIG_XLNX_ZYNQMP_ARM=y CONFIG_XLNX_VERSAL=y CONFIG_SBSA_REF=y +CONFIG_NPCM8XX=y diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig index 97f3b38019..ed5d37ba01 100644 --- a/hw/arm/Kconfig +++ b/hw/arm/Kconfig @@ -408,6 +408,17 @@ config NPCM7XX select UNIMP select PCA954X +config NPCM8XX +bool +select ARM_GIC +select SMBUS +select PL310 # cache controller +select NPCM7XX +select SERIAL +select SSI +select UNIMP + + config FSL_IMX25 bool imply I2C_DEVICES diff --git a/hw/arm/meson.build b/hw/arm/meson.build index 721a8eb8be..cf824241c5 100644 --- a/hw/arm/meson.build +++ b/hw/arm/meson.build @@ -14,6 +14,7 @@ arm_ss.add(when: 'CONFIG_MUSICPAL', if_true: files('musicpal.c')) arm_ss.add(when: 'CONFIG_NETDUINO2', if_true: files('netduino2.c')) arm_ss.add(when: 'CONFIG_NETDUINOPLUS2', if_true: files('netduinoplus2.c')) arm_ss.add(when: 'CONFIG_NPCM7XX', if_true: files('npcm7xx.c', 'npcm7xx_boards.c')) +arm_ss.add(when: 'CONFIG_NPCM8XX', if_true: files('npcm8xx.c')) arm_ss.add(when: 'CONFIG_NSERIES', if_true: files('nseries.c')) arm_ss.add(when: 'CONFIG_SX1', if_true: files('omap_sx1.c')) arm_ss.add(when: 'CONFIG_CHEETAH', if_true: files('palm.c')) diff --git a/hw/arm/npcm8xx.c b/hw/arm/npcm8xx.c new file mode 100644 index 00..afcf8330d5 --- /dev/null
+++ b/hw/arm/npcm8xx.c @@ -0,0 +1,806 @@ +/* + * Nuvoton NPCM8xx SoC family. + * + * Copyright 2022 Google LLC + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + * for more details. + */ + +#include "qemu/osdep.h" + +#include "hw/arm/boot.h" +#include "hw/arm/npcm8xx.h" +#include "hw/char/serial.h" +#include "hw/intc/arm_gic.h" +#include "hw/loader.h" +#include "hw/misc/unimp.h" +#include "hw/qdev-clock.h" +#include "hw/qdev-properties.h" +#include "qapi/error.h" +#include "qemu/units.h" +#include "sysemu/sysemu.h" + +#define ARM_PHYS_TIMER_PPI 30 +#define ARM_VIRT_TIMER_PPI 27 +#define ARM_HYP_TIMER_PPI 26 +#define ARM_SEC_TIMER_PPI 29 + +/* + * This covers the whole MMIO space. We'll use this to catch any MMIO accesses + * that aren't handled by a device. + */ +#define NPCM8XX_MMIO_BA (0x8000) +#define NPCM8XX_MMIO_SZ (0x7ffd) + +/* OTP fuse array */ +#define NPCM8XX_OTP_BA (0xf0189000) + +/* GIC Distributor */ +#define NPCM8XX_GICD_BA (0xdfff9000) +#define NPCM8XX_GICC_BA (0xdfffa000) + +/* Core system modules. 
*/ +#define NPCM8XX_CPUP_BA (0xf03fe000) +#define NPCM8XX_GCR_BA (0xf080) +#define NPCM8XX_CLK_BA (0xf0801000) +#define NPCM8XX_MC_BA (0xf0824000) +#define NPCM8XX_RNG_BA (0xf000b000) + +/* ADC Module */ +#define NPCM8XX_ADC_BA (0xf000c000) + +/* Internal AHB SRAM */ +#define NPCM8XX_RAM3_BA (0xc0008000) +#define NPCM8XX_RAM3_SZ (4 * KiB) + +/* Memory blocks at the end of the address space */ +#define NPCM8XX_RAM2_BA (0xfffb) +#define NPCM8XX_RAM2_SZ (256 * KiB) +#define NPCM8XX_ROM_BA (0x0100) +#define NPCM8XX_ROM_SZ (64 * KiB) + +/* SDHCI Modules */ +#define NPCM8XX_MMC_BA (0xf0842000) + +/* Run PLL1 at 1600 MHz */ +#define NPCM8XX_PLLCON1_FIXUP_VAL (0x00402101) +/* Run the CPU from PLL1 and UART from PLL2 */ +#define NPCM8XX_CLKSEL_FIXUP_VAL(0x004aaba9) + +/* Clock configuration values to be fixed up when bypassing bootloader */ + +/* + * Interrupt lines going into the GIC. This does not include internal Cortex-A9 + * interrupts. + */ +enum NPCM8xxInterrupt { +NPCM8XX_ADC_IRQ = 0, +NPCM8XX_KCS_HIB_IRQ = 9, +NPCM8XX_MMC_IRQ = 26, +NPCM8XX_TIMER0_IRQ =
[PATCH for-7.1 02/11] hw/ssi: Make flash size a property in NPCM7XX FIU
This allows different FIUs to have different flash sizes, useful for the NPCM8XX, which has multiple FIU modules of different sizes.

Signed-off-by: Hao Wu
Reviewed-by: Patrick Venture
---
 hw/arm/npcm7xx.c             | 6 ++
 hw/ssi/npcm7xx_fiu.c         | 6 ++
 include/hw/ssi/npcm7xx_fiu.h | 1 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/arm/npcm7xx.c b/hw/arm/npcm7xx.c
index d85cc02765..9946b94120 100644
--- a/hw/arm/npcm7xx.c
+++ b/hw/arm/npcm7xx.c
@@ -274,17 +274,21 @@ static const struct {
     hwaddr regs_addr;
     int cs_count;
     const hwaddr *flash_addr;
+    size_t flash_size;
 } npcm7xx_fiu[] = {
     {
         .name = "fiu0",
         .regs_addr = 0xfb000000,
         .cs_count = ARRAY_SIZE(npcm7xx_fiu0_flash_addr),
         .flash_addr = npcm7xx_fiu0_flash_addr,
+        .flash_size = 128 * MiB,
+
     }, {
         .name = "fiu3",
         .regs_addr = 0xc0000000,
         .cs_count = ARRAY_SIZE(npcm7xx_fiu3_flash_addr),
         .flash_addr = npcm7xx_fiu3_flash_addr,
+        .flash_size = 128 * MiB,
     },
 };
@@ -686,6 +690,8 @@ static void npcm7xx_realize(DeviceState *dev, Error **errp)
         object_property_set_int(OBJECT(sbd), "cs-count",
                                 npcm7xx_fiu[i].cs_count, &error_abort);
+        object_property_set_int(OBJECT(sbd), "flash-size",
+                                npcm7xx_fiu[i].flash_size, &error_abort);
         sysbus_realize(sbd, &error_abort);
         sysbus_mmio_map(sbd, 0, npcm7xx_fiu[i].regs_addr);

diff --git a/hw/ssi/npcm7xx_fiu.c b/hw/ssi/npcm7xx_fiu.c
index 4eedb2927e..ea490f1332 100644
--- a/hw/ssi/npcm7xx_fiu.c
+++ b/hw/ssi/npcm7xx_fiu.c
@@ -28,9 +28,6 @@
 
 #include "trace.h"
 
-/* Up to 128 MiB of flash may be accessed directly as memory. */
-#define NPCM7XX_FIU_FLASH_WINDOW_SIZE (128 * MiB)
-
 /* Each module has 4 KiB of register space. Only a fraction of it is used. */
 #define NPCM7XX_FIU_CTRL_REGS_SIZE (4 * KiB)
 
@@ -525,7 +522,7 @@ static void npcm7xx_fiu_realize(DeviceState *dev, Error **errp)
         flash->fiu = s;
         memory_region_init_io(&flash->direct_access, OBJECT(s),
                               &npcm7xx_fiu_flash_ops, &s->flash[i], "flash",
-                              NPCM7XX_FIU_FLASH_WINDOW_SIZE);
+                              s->flash_size);
         sysbus_init_mmio(sbd, &flash->direct_access);
     }
 }
@@ -543,6 +540,7 @@ static const VMStateDescription vmstate_npcm7xx_fiu = {
 
 static Property npcm7xx_fiu_properties[] = {
     DEFINE_PROP_INT32("cs-count", NPCM7xxFIUState, cs_count, 0),
+    DEFINE_PROP_SIZE("flash-size", NPCM7xxFIUState, flash_size, 0),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/ssi/npcm7xx_fiu.h b/include/hw/ssi/npcm7xx_fiu.h
index a3a1704289..1785ea16f4 100644
--- a/include/hw/ssi/npcm7xx_fiu.h
+++ b/include/hw/ssi/npcm7xx_fiu.h
@@ -60,6 +60,7 @@ struct NPCM7xxFIUState {
     int32_t cs_count;
     int32_t active_cs;
     qemu_irq *cs_lines;
+    size_t flash_size;
     NPCM7xxFIUFlash *flash;
     SSIBus *spi;
 
-- 
2.35.1.1094.g7c7d902a7c-goog
[PATCH for-7.1 00/11] hw/arm: Add NPCM8XX support
NPCM8XX BMCs are the successors of the NPCM7XX BMCs. They feature a quad-core Arm Cortex-A35 that supports both 32-bit and 64-bit operation. This patch set aims to support basic functionality of the NPCM8XX BMCs.

The patch set includes:
1. Most devices are derived from the 7XX models, with some modifications.
2. A minimal vBootROM similar to the 7XX one, constructed at https://github.com/google/vbootrom/tree/master/npcm8xx and included in the patch set.
3. A new NPCM8XX SoC and an evaluation board machine, npcm845-evb.

The OpenBMC for the NPCM845 evaluation board can be found at:
https://github.com/Nuvoton-Israel/openbmc/tree/npcm-v2.10/meta-evb/meta-evb-nuvoton/meta-evb-npcm845

The patch set can boot the evaluation board image built from the source above to the login prompt.

Hao Wu (11):
  docs/system/arm: Add Description for NPCM8XX SoC
  hw/ssi: Make flash size a property in NPCM7XX FIU
  hw/misc: Support NPCM8XX GCR module
  hw/misc: Support NPCM8XX CLK Module Registers
  hw/misc: Store DRAM size in NPCM8XX GCR Module
  hw/intc: Add a property to allow GIC to reset into non secure mode
  hw/misc: Support 8-bytes memop in NPCM GCR module
  hw/net: Add NPCM8XX PCS Module
  pc-bios: Add NPCM8xx Bootrom
  hw/arm: Add NPCM8XX SoC
  hw/arm: Add NPCM845 Evaluation board

 MAINTAINERS                                   |   9 +-
 configs/devices/aarch64-softmmu/default.mak   |   1 +
 docs/system/arm/nuvoton.rst                   |  20 +-
 hw/arm/Kconfig                                |  11 +
 hw/arm/meson.build                            |   1 +
 hw/arm/npcm7xx.c                              |   6 +
 hw/arm/npcm8xx.c                              | 806 ++
 hw/arm/npcm8xx_boards.c                       | 257 ++
 hw/intc/arm_gic_common.c                      |   2 +
 hw/misc/meson.build                           |   4 +-
 hw/misc/npcm7xx_gcr.c                         | 269 --
 hw/misc/{npcm7xx_clk.c => npcm_clk.c}         | 238 --
 hw/misc/npcm_gcr.c                            | 492 +++
 hw/misc/trace-events                          |  12 +-
 hw/net/meson.build                            |   1 +
 hw/net/npcm_pcs.c                             | 409 +
 hw/net/trace-events                           |   4 +
 hw/ssi/npcm7xx_fiu.c                          |   6 +-
 include/hw/arm/npcm7xx.h                      |   8 +-
 include/hw/arm/npcm8xx.h                      | 126 +++
 include/hw/misc/{npcm7xx_clk.h => npcm_clk.h} |  43 +-
 include/hw/misc/{npcm7xx_gcr.h => npcm_gcr.h} |  30 +-
 include/hw/net/npcm_pcs.h                     |  42 +
 include/hw/ssi/npcm7xx_fiu.h                  |   1 +
 pc-bios/npcm8xx_bootrom.bin                   | Bin 0 -> 608 bytes
 25 files changed, 2428 insertions(+), 370 deletions(-)
 create mode 100644 hw/arm/npcm8xx.c
 create mode 100644 hw/arm/npcm8xx_boards.c
 delete mode 100644 hw/misc/npcm7xx_gcr.c
 rename hw/misc/{npcm7xx_clk.c => npcm_clk.c} (81%)
 create mode 100644 hw/misc/npcm_gcr.c
 create mode 100644 hw/net/npcm_pcs.c
 create mode 100644 include/hw/arm/npcm8xx.h
 rename include/hw/misc/{npcm7xx_clk.h => npcm_clk.h} (83%)
 rename include/hw/misc/{npcm7xx_gcr.h => npcm_gcr.h} (55%)
 create mode 100644 include/hw/net/npcm_pcs.h
 create mode 100644 pc-bios/npcm8xx_bootrom.bin
-- 
2.35.1.1094.g7c7d902a7c-goog
Re: [PATCH] block/stream: Drain subtree around graph change
05.04.2022 17:41, Kevin Wolf wrote: Am 05.04.2022 um 14:12 hat Vladimir Sementsov-Ogievskiy geschrieben: Thanks Kevin! I have already run out of arguments in the battle against using subtree-drains to isolate graph modification operations from each other in different threads in the mailing list) (Note also, that the top-most version of this patch is "[PATCH v2] block/stream: Drain subtree around graph change") Oops, I completely missed the v2. Thanks! About avoiding polling during graph-modifying operations, there is a problem: some IO operations are involved into block-graph modifying operations. At least it's rewriting "backing_file_offset" and "backing_file_size" fields in qcow2 header. We can't just separate rewriting metadata from graph modifying operation: this way another graph-modifying operation may interleave and we'll write outdated metadata. Hm, generally we don't update image metadata when we reconfigure the graph. Most changes are temporary (like insertion of filter nodes) and the image header only contains a "default configuration" to be used on the next start. There are only a few places that update the image header; I think it's generally block job completions. They obviously update the in-memory graph, too, but they don't write to the image file (and therefore potentially poll) in the middle of updating the in-memory graph, but they do both in separate steps. I think this is okay. We must just avoid polling in the middle of graph updates because if something else changes the graph there, it's not clear any more that we're really doing what the caller had in mind. Hmm, interesting where is polling in described case? First possible place I can find is bdrv_parent_drained_begin_single() in bdrv_replace_child_noperm(). Another is bdrv_apply_subtree_drain() in bdrv_child_cb_attach(). No idea how to get rid of them. Hmm. 
I think the core problem here is that when we wait in drained_begin(), nobody protects us from attaching one more node to the drained subgraph. And we should handle this; that's the complexity. So I still think we need a kind of global lock for graph-modifying operations. Or per-BDS locks, as you propose. But in that case we need to be sure that by taking all needed per-BDS locks we'll avoid deadlocking.

I guess this depends on the exact granularity of the locks we're using. If you take the lock only while updating a single edge, I don't think you could easily deadlock. If you hold it for more complex operations, it becomes harder to tell without checking the code.

I think keeping the whole operation, like reopen_multiple, or some job's .prepare(), etc., under one critical section is simplest to analyze. Could this be something like this?

uint8_t graph_locked;

void graph_lock(AioContext *ctx)
{
    AIO_POLL_WHILE(ctx, qatomic_cmpxchg(&graph_locked, 0, 1) == 1);
}

void graph_unlock(void)
{
    qatomic_set(&graph_locked, 0);
    aio_wait_kick();
}

-- 
Best regards,
Vladimir
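Outside QEMU, the acquire/release protocol Vladimir sketches can be modeled with C11 atomics. This is a minimal sketch under the assumption that the event-loop integration (the AIO polling macro and aio_wait_kick()) is replaced by a plain compare-and-swap loop; the names mirror the proposal but are not QEMU API.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal model of the proposed global graph lock: a single flag
 * acquired with a compare-and-swap loop.  In QEMU the busy-wait would
 * instead poll the AioContext so pending I/O keeps making progress
 * while a graph modification is waiting for the lock. */
static atomic_uchar graph_locked;

static void graph_lock(void)
{
    unsigned char expected = 0;

    /* Loop until we atomically flip the flag from 0 to 1. */
    while (!atomic_compare_exchange_weak(&graph_locked, &expected, 1)) {
        expected = 0; /* cmpxchg stored the observed value; reset it */
    }
}

static void graph_unlock(void)
{
    /* Release the lock; in QEMU this would also kick any waiters
     * blocked in the polling loop. */
    atomic_store(&graph_locked, 0);
}
```

The key property is the same as in the proposed snippet: only one thread at a time can move the flag from 0 to 1, so whole graph-modifying operations can be serialized under one critical section.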
Re: [PATCH] acpi: Bodge acpi_index migration
On Tue, Apr 05, 2022 at 08:06:58PM +0100, Dr. David Alan Gilbert (git) wrote: The patch is fine but pls repost as text not as application/octet-stream. Thanks! -- MST
Re: [PATCH] acpi: Bodge acpi_index migration
On Tue, 5 Apr 2022 20:06:58 +0100 "Dr. David Alan Gilbert (git)" wrote: > From: "Dr. David Alan Gilbert" > > The 'acpi_index' field is a statically configured field, which for > some reason is migrated; this never makes much sense because it's > command line static. > > However, on piix4 it's conditional, and the condition/test function > ends up having the wrong pointer passed to it (it gets a PIIX4PMState > not the AcpiPciHpState it was expecting, because VMSTATE_PCI_HOTPLUG > is a macro and not another struct). This means the field is randomly > loaded/saved based on a random pointer. In 6.x this random pointer > randomly seems to get 0 for everyone (!); in 7.0rc it's getting junk > and trying to load a field that the source didn't send. FWIW, after some hunting and pecking, 6.2 (64bit): (gdb) p &((struct AcpiPciHpState *)0)->acpi_index $1 = (uint32_t *) 0xc04 (gdb) p &((struct PIIX4PMState *)0)->ar.tmr.io.addr $2 = (hwaddr *) 0xc00 f53faa70bb63: (gdb) p &((struct AcpiPciHpState *)0)->acpi_index $1 = (uint32_t *) 0xc04 (gdb) p &((struct PIIX4PMState *)0)->io_gpe.coalesced.tqh_circ.tql_prev $2 = (struct QTailQLink **) 0xc00 So yeah, it seems 0xc04 will always be part of a pointer on current mainline. I can't really speak to the ACPIPMTimer MemoryRegion in the PIIX4PMState, maybe if there's a hwaddr it's always 32bit and the upper dword is reliably zero? Thanks, Alex > The migration > stream gets out of line and hits the section footer. > > The bodge is on piix4 never to load the field: > a) Most 6.x builds never send it, so most of the time the migration > will work. > b) We can backport this fix to 6.x to remove the boobytrap. > c) It should never have made a difference anyway since the acpi-index > is command line configured and should be correct on the destination > anyway > d) ich9 is still sending/receiving this (unconditionally all the time) > but due to (c) should never notice. We could follow up to make it > skip. 
> > It worries me just when (a) actually happens. > > Fixes: b32bd76 ("pci: introduce acpi-index property for PCI device") > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/932 > > Signed-off-by: Dr. David Alan Gilbert > --- > hw/acpi/acpi-pci-hotplug-stub.c | 4 > hw/acpi/pcihp.c | 6 -- > hw/acpi/piix4.c | 11 ++- > include/hw/acpi/pcihp.h | 2 -- > 4 files changed, 10 insertions(+), 13 deletions(-) > > diff --git a/hw/acpi/acpi-pci-hotplug-stub.c b/hw/acpi/acpi-pci-hotplug-stub.c > index 734e4c5986..a43f6dafc9 100644 > --- a/hw/acpi/acpi-pci-hotplug-stub.c > +++ b/hw/acpi/acpi-pci-hotplug-stub.c > @@ -41,7 +41,3 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool > acpihp_root_off) > return; > } > > -bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id) > -{ > -return false; > -} > diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c > index 6351bd3424..bf65bbea49 100644 > --- a/hw/acpi/pcihp.c > +++ b/hw/acpi/pcihp.c > @@ -554,12 +554,6 @@ void acpi_pcihp_init(Object *owner, AcpiPciHpState *s, > PCIBus *root_bus, > OBJ_PROP_FLAG_READ); > } > > -bool vmstate_acpi_pcihp_use_acpi_index(void *opaque, int version_id) > -{ > - AcpiPciHpState *s = opaque; > - return s->acpi_index; > -} > - > const VMStateDescription vmstate_acpi_pcihp_pci_status = { > .name = "acpi_pcihp_pci_status", > .version_id = 1, > diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c > index cc37fa3416..48aeedd5f0 100644 > --- a/hw/acpi/piix4.c > +++ b/hw/acpi/piix4.c > @@ -267,6 +267,15 @@ static bool piix4_vmstate_need_smbus(void *opaque, int > version_id) > return pm_smbus_vmstate_needed(); > } > > +/* > + * This is a fudge to turn off the acpi_index field, whose > + * test was always broken on piix4. 
> + */ > +static bool vmstate_test_never(void *opaque, int version_id) > +{ > +return false; > +} > + > /* qemu-kvm 1.2 uses version 3 but advertised as 2 > * To support incoming qemu-kvm 1.2 migration, change version_id > * and minimum_version_id to 2 below (which breaks migration from > @@ -297,7 +306,7 @@ static const VMStateDescription vmstate_acpi = { > struct AcpiPciHpPciStatus), > VMSTATE_PCI_HOTPLUG(acpi_pci_hotplug, PIIX4PMState, > vmstate_test_use_acpi_hotplug_bridge, > -vmstate_acpi_pcihp_use_acpi_index), > +vmstate_test_never), > VMSTATE_END_OF_LIST() > }, > .subsections = (const VMStateDescription*[]) { > diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h > index af1a169fc3..7e268c2c9c 100644 > --- a/include/hw/acpi/pcihp.h > +++ b/include/hw/acpi/pcihp.h > @@ -73,8 +73,6 @@ void acpi_pcihp_reset(AcpiPciHpState *s, bool > acpihp_root_off); > > extern const VMStateDescription vmstate_acpi_pcihp_pci_status; > > -bool
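The failure mode Alex is probing with gdb (a field-test callback written for one struct receiving a pointer to another, so it reads whatever happens to sit at the same byte offset) can be reproduced in miniature. The structs below are hypothetical stand-ins, not the real QEMU types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for AcpiPciHpState: the type the test function expects. */
struct Inner {
    char pad[8];
    uint32_t acpi_index;     /* the field the test means to inspect */
};

/* Stand-in for PIIX4PMState: the type actually passed in. */
struct Outer {
    char pad[8];
    uint32_t unrelated;      /* a different field at the same offset */
};

/* Field-test callback, as used for conditional vmstate fields:
 * returns nonzero if the field should be saved/loaded.  The void*
 * parameter is what hides the wrong-type bug from the compiler. */
static int use_acpi_index(void *opaque)
{
    struct Inner *s = opaque;

    return s->acpi_index != 0;
}
```

Because the opaque pointer erases the type, handing `use_acpi_index()` a `struct Outer *` compiles cleanly and silently tests `unrelated` instead, which is exactly why the field was "randomly" loaded or saved depending on what lived at that offset in each build.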
Re: [RFC PATCH 0/4] hw/i2c: i2c slave mode support
> On Mar 31, 2022, at 9:57 AM, Klaus Jensen wrote: > > From: Klaus Jensen > > Hi all, > > This RFC series adds I2C "slave mode" support for the Aspeed I2C > controller as well as the necessary infrastructure in the i2c core to > support this. > > Background > ~~ > We are working on an emulated NVM Express Management Interface[1] for > testing and validation purposes. NVMe-MI is based on the MCTP > protocol[2] which may use a variety of underlying transports. The one we > are interested in is I2C[3]. > > The first general trickery here is that all MCTP transactions are based > on the SMBus Block Write bus protocol[4]. This means that the slave must > be able to master the bus to communicate. As you know, hw/i2c/core.c > currently does not support this use case. This is great, I’m attempting to use your changes right now for the same thing (MCTP). > > The second issue is how to interact with these mastering devices. Jeremy > and Matt (CC'ed) have been working on an MCTP stack for the Linux Kernel > (already upstream) and an I2C binding driver[5] is currently under > review. This binding driver relies on I2C slave mode support in the I2C > controller. > > This series > ~~~ > Patch 1 adds support for multiple masters in the i2c core, allowing > slaves to master the bus and safely issue i2c_send/recv(). Patch 2 adds > an asynchronous send i2c_send_async(I2CBus *, uint8) on the bus that > must be paired with an explicit ack using i2c_ack(I2CBus *). > > Patch 3 adds the slave mode functionality to the emulated Aspeed I2C > controller. The implementation is probably buggy since I had to rely on > the implementation of the kernel driver to reverse engineer the behavior > of the controller slave mode (I do not have access to a spec sheet for > the Aspeed, but maybe someone can help me out with that?). > > Finally, patch 4 adds an example device using this new API. 
The device > is a simple "echo" device that upon being sent a set of bytes uses the > first byte as the address of the slave to echo to. > > With this combined I am able to boot up Linux on an emulated Aspeed 2600 > evaluation board and have the i2c echo device write into a Linux slave > EEPROM. Assuming the echo device is on address 0x42: > > # echo slave-24c02 0x1064 > /sys/bus/i2c/devices/i2c-15/new_device > i2c i2c-15: new_device: Instantiated device slave-24c02 at 0x64 > # i2cset -y 15 0x42 0x64 0x00 0xaa i > # hexdump /sys/bus/i2c/devices/15-1064/slave-eeprom > 000 ffaa > 010 > * > 100 When I try this with my system, it seems like the i2c-echo device takes over the bus but never echoes the data to the EEPROM. Am I missing something to make this work? It seems like the “i2c_send_async” calls aren’t happening, which must be because the bottom half isn’t being scheduled, right? After the i2c_do_start_transfer, how is the bottom half supposed to be scheduled again? Is the slave receiving (the EEPROM) supposed to call i2c_ack or something? root@bmc-oob:~# echo 24c02 0x1064 > /sys/bus/i2c/devices/i2c-8/new_device [ 135.559719] at24 8-1064: 256 byte 24c02 EEPROM, writable, 1 bytes/write [ 135.562661] i2c i2c-8: new_device: Instantiated device 24c02 at 0x64 root@bmc-oob:~# i2cset -y 8 0x42 0x64 0x00 0xaa i i2c_echo_event: start send i2c_echo_send: data[0] = 0x64 i2c_echo_send: data[1] = 0x00 i2c_echo_send: data[2] = 0xaa i2c_echo_event: scheduling bottom-half i2c_echo_bh: attempting to gain mastery of bus i2c_echo_bh: starting a send to address 0x64 root@bmc-oob:~# hexdump -C /sys/bus/i2c/devices/8-1064/eeprom 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 || * 0100 Thanks again for this, it’s exactly what I needed. 
> > [1]: https://nvmexpress.org/developers/nvme-mi-specification/ > [2]: > https://www.dmtf.org/sites/default/files/standards/documents/DSP0236_1.3.1.pdf > > [3]: > https://www.dmtf.org/sites/default/files/standards/documents/DSP0237_1.2.0.pdf > > [4]: http://www.smbus.org/specs/SMBus_3_1_20180319.pdf > [5]: > https://lore.kernel.org/linux-i2c/20220218055106.1944485-1-m...@codeconstruct.com.au/ > > Klaus Jensen (4): > hw/i2c: support multiple masters > hw/i2c: add async send > hw/i2c: add slave mode for aspeed_i2c > hw/misc: add a toy i2c echo device > > hw/i2c/aspeed_i2c.c | 95 +--- > hw/i2c/core.c | 57 +- > hw/i2c/trace-events | 2 +- > hw/misc/i2c-echo.c | 144 > hw/misc/meson.build | 2 + > include/hw/i2c/aspeed_i2c.h | 8 ++ > include/hw/i2c/i2c.h| 19 + > 7 files changed, 316 insertions(+), 11 deletions(-) > create mode 100644 hw/misc/i2c-echo.c > > -- > 2.35.1 > >
[PATCH for-7.1 1/1] hw/ppc: check if spapr_drc_index() returns NULL in spapr_nvdimm.c
spapr_nvdimm_flush_completion_cb() and flush_worker_cb() are using the DRC object returned by spapr_drc_index() without checking it for NULL. In this case we would be dereferencing a NULL pointer when doing SPAPR_NVDIMM(drc->dev) and PC_DIMM(drc->dev). This can happen if, during a scm_flush(), the DRC object is wrongly freed/released by another part of the code (i.e. hotunplug the device). spapr_drc_index() would then return NULL in the callbacks. Fixes: Coverity CID 1487108, 1487178 Signed-off-by: Daniel Henrique Barboza --- hw/ppc/spapr_nvdimm.c | 26 ++ 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/hw/ppc/spapr_nvdimm.c b/hw/ppc/spapr_nvdimm.c index c4c97da5de..e92d92fdae 100644 --- a/hw/ppc/spapr_nvdimm.c +++ b/hw/ppc/spapr_nvdimm.c @@ -447,9 +447,19 @@ static int flush_worker_cb(void *opaque) { SpaprNVDIMMDeviceFlushState *state = opaque; SpaprDrc *drc = spapr_drc_by_index(state->drcidx); -PCDIMMDevice *dimm = PC_DIMM(drc->dev); -HostMemoryBackend *backend = MEMORY_BACKEND(dimm->hostmem); -int backend_fd = memory_region_get_fd(>mr); +PCDIMMDevice *dimm; +HostMemoryBackend *backend; +int backend_fd; + +if (!drc) { +error_report("papr_scm: Could not find nvdimm device with DRC 0x%u", + state->drcidx); +return H_HARDWARE; +} + +dimm = PC_DIMM(drc->dev); +backend = MEMORY_BACKEND(dimm->hostmem); +backend_fd = memory_region_get_fd(>mr); if (object_property_get_bool(OBJECT(backend), "pmem", NULL)) { MemoryRegion *mr = host_memory_backend_get_memory(dimm->hostmem); @@ -475,7 +485,15 @@ static void spapr_nvdimm_flush_completion_cb(void *opaque, int hcall_ret) { SpaprNVDIMMDeviceFlushState *state = opaque; SpaprDrc *drc = spapr_drc_by_index(state->drcidx); -SpaprNVDIMMDevice *s_nvdimm = SPAPR_NVDIMM(drc->dev); +SpaprNVDIMMDevice *s_nvdimm; + +if (!drc) { +error_report("papr_scm: Could not find nvdimm device with DRC 0x%u", + state->drcidx); +return; +} + +s_nvdimm = SPAPR_NVDIMM(drc->dev); state->hcall_ret = hcall_ret; QLIST_REMOVE(state, node); -- 2.35.1
[PATCH for-7.1 0/1] Coverity fixes in hw/ppc/spapr_nvdimm.c
Hi, This is a simple patch to fix 2 Coverity issues in hw/ppc/spapr_nvdimm.c. Aiming it to 7.1 because it's not critical enough for 7.0. Daniel Henrique Barboza (1): hw/ppc: check if spapr_drc_index() returns NULL in spapr_nvdimm.c hw/ppc/spapr_nvdimm.c | 26 ++ 1 file changed, 22 insertions(+), 4 deletions(-) -- 2.35.1
[PATCH v2 8/9] target/ppc: Implemented vector module word/doubleword
From: "Lucas Mateus Castro (alqotel)" Implement the following PowerISA v3.1 instructions: vmodsw: Vector Modulo Signed Word vmoduw: Vector Modulo Unsigned Word vmodsd: Vector Modulo Signed Doubleword vmodud: Vector Modulo Unsigned Doubleword Signed-off-by: Lucas Mateus Castro (alqotel) --- target/ppc/insn32.decode| 5 + target/ppc/translate/vmx-impl.c.inc | 10 ++ 2 files changed, 15 insertions(+) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 3eb920ac76..36b42e41d2 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -719,3 +719,8 @@ VDIVESD 000100 . . . 0001011@VX VDIVEUD 000100 . . . 01011001011@VX VDIVESQ 000100 . . . 0111011@VX VDIVEUQ 000100 . . . 0101011@VX + +VMODSW 000100 . . . 0001011@VX +VMODUW 000100 . . . 11010001011@VX +VMODSD 000100 . . . 1001011@VX +VMODUD 000100 . . . 11011001011@VX diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index 23f215dbea..c5178a0f1e 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -3340,6 +3340,11 @@ static void do_diveu_i32(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b) DO_VDIV_VMOD(do_divesw, 32, do_dives_i32, true) DO_VDIV_VMOD(do_diveuw, 32, do_diveu_i32, false) +DO_VDIV_VMOD(do_modsw, 32, tcg_gen_rem_i32, true) +DO_VDIV_VMOD(do_moduw, 32, tcg_gen_remu_i32, false) +DO_VDIV_VMOD(do_modsd, 64, tcg_gen_rem_i64, true) +DO_VDIV_VMOD(do_modud, 64, tcg_gen_remu_i64, false) + TRANS_VDIV_VMOD(ISA310, VDIVESW, MO_32, do_divesw, NULL) TRANS_VDIV_VMOD(ISA310, VDIVEUW, MO_32, do_diveuw, NULL) TRANS_FLAGS2(ISA310, VDIVESD, do_vx_helper, gen_helper_VDIVESD) @@ -3347,6 +3352,11 @@ TRANS_FLAGS2(ISA310, VDIVEUD, do_vx_helper, gen_helper_VDIVEUD) TRANS_FLAGS2(ISA310, VDIVESQ, do_vx_helper, gen_helper_VDIVESQ) TRANS_FLAGS2(ISA310, VDIVEUQ, do_vx_helper, gen_helper_VDIVEUQ) +TRANS_VDIV_VMOD(ISA310, VMODSW, MO_32, do_modsw , NULL) +TRANS_VDIV_VMOD(ISA310, VMODUW, MO_32, do_moduw, NULL) +TRANS_VDIV_VMOD(ISA310, VMODSD, MO_64, 
NULL, do_modsd) +TRANS_VDIV_VMOD(ISA310, VMODUD, MO_64, NULL, do_modud) + #undef DO_VDIV_VMOD #undef GEN_VR_LDX -- 2.31.1
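For reference, the semantics of one lane of vmodsw can be modeled in plain C. The guard mirrors the quadword helper in the next patch, which yields zero for the architecturally undefined cases (division by zero, and the INT32_MIN % -1 overflow, which would trap on most hosts); the function name is invented for illustration:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* One lane of Vector Modulo Signed Word, modeled as a scalar.
 * C's % operator already matches PowerISA's truncating remainder, but
 * the two trapping cases must be guarded explicitly. */
static int32_t vmodsw_lane(int32_t a, int32_t b)
{
    if (b == 0 || (a == INT32_MIN && b == -1)) {
        return 0; /* result is undefined by the architecture */
    }
    return a % b;
}
```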
[PATCH v2 9/9] target/ppc: Implemented vector module quadword
From: "Lucas Mateus Castro (alqotel)" Implement the following PowerISA v3.1 instructions: vmodsq: Vector Modulo Signed Quadword vmoduq: Vector Modulo Unsigned Quadword Signed-off-by: Lucas Mateus Castro (alqotel) --- target/ppc/helper.h | 2 ++ target/ppc/insn32.decode| 2 ++ target/ppc/int_helper.c | 21 + target/ppc/translate/vmx-impl.c.inc | 2 ++ 4 files changed, 27 insertions(+) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 67ecff2c9a..881e03959a 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -177,6 +177,8 @@ DEF_HELPER_FLAGS_3(VDIVESD, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(VDIVEUD, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(VDIVESQ, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(VDIVEUQ, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_3(VMODSQ, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_3(VMODUQ, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_3(vslo, void, avr, avr, avr) DEF_HELPER_3(vsro, void, avr, avr, avr) DEF_HELPER_3(vsrv, void, avr, avr, avr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 36b42e41d2..b53efe1915 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -724,3 +724,5 @@ VMODSW 000100 . . . 0001011@VX VMODUW 000100 . . . 11010001011@VX VMODSD 000100 . . . 1001011@VX VMODUD 000100 . . . 11011001011@VX +VMODSQ 000100 . . . 1111011@VX +VMODUQ 000100 . . . 
1101011    @VX

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 17a10c4412..72b2b06078 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1121,6 +1121,27 @@ void helper_VDIVEUQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
     }
 }
 
+void helper_VMODSQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+    Int128 neg1 = int128_makes64(-1);
+    Int128 int128_min = int128_make128(0, INT64_MIN);
+    if (likely(int128_nz(b->s128) &&
+               (int128_ne(a->s128, int128_min) || int128_ne(b->s128, neg1)))) {
+        t->s128 = int128_rems(a->s128, b->s128);
+    } else {
+        t->s128 = int128_zero(); /* Undefined behavior */
+    }
+}
+
+void helper_VMODUQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+    if (likely(int128_nz(b->s128))) {
+        t->s128 = int128_remu(a->s128, b->s128);
+    } else {
+        t->s128 = int128_zero(); /* Undefined behavior */
+    }
+}
+
 void helper_VPERM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
     ppc_avr_t result;

diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index c5178a0f1e..7ced7ad655 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -3356,6 +3356,8 @@ TRANS_VDIV_VMOD(ISA310, VMODSW, MO_32, do_modsw, NULL)
 TRANS_VDIV_VMOD(ISA310, VMODUW, MO_32, do_moduw, NULL)
 TRANS_VDIV_VMOD(ISA310, VMODSD, MO_64, NULL, do_modsd)
 TRANS_VDIV_VMOD(ISA310, VMODUD, MO_64, NULL, do_modud)
+TRANS_FLAGS2(ISA310, VMODSQ, do_vx_helper, gen_helper_VMODSQ)
+TRANS_FLAGS2(ISA310, VMODUQ, do_vx_helper, gen_helper_VMODUQ)
 
 #undef DO_VDIV_VMOD
 
-- 
2.31.1
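On GCC/Clang hosts the same guard logic can be checked against the compiler's native 128-bit type instead of QEMU's Int128 wrappers. This is a scalar sketch of helper_VMODSQ's logic, not the helper itself:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of VMODSQ using __int128 (a GCC/Clang extension).
 * Guards the two undefined cases: b == 0, and INT128_MIN % -1 (the
 * only signed overflow case for remainder). */
static __int128 vmodsq_scalar(__int128 a, __int128 b)
{
    /* Build INT128_MIN without shifting a signed value (which is UB). */
    const __int128 int128_min = (__int128)((unsigned __int128)1 << 127);

    if (b == 0 || (a == int128_min && b == -1)) {
        return 0; /* architecturally undefined; mirror the helper */
    }
    return a % b;
}
```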
[PATCH v2 7/9] target/ppc: Implemented remaining vector divide extended
From: "Lucas Mateus Castro (alqotel)" Implement the following PowerISA v3.1 instructions: vdivesd: Vector Divide Extended Signed Doubleword vdiveud: Vector Divide Extended Unsigned Doubleword vdivesq: Vector Divide Extended Signed Quadword vdiveuq: Vector Divide Extended Unsigned Quadword Signed-off-by: Lucas Mateus Castro (alqotel) --- target/ppc/helper.h | 4 ++ target/ppc/insn32.decode| 4 ++ target/ppc/int_helper.c | 64 + target/ppc/translate/vmx-impl.c.inc | 4 ++ 4 files changed, 76 insertions(+) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 4cfdf7b3ec..67ecff2c9a 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -173,6 +173,10 @@ DEF_HELPER_FLAGS_3(VMULOUH, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(VMULOUW, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(VDIVSQ, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(VDIVUQ, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_3(VDIVESD, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_3(VDIVEUD, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_3(VDIVESQ, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_3(VDIVEUQ, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_3(vslo, void, avr, avr, avr) DEF_HELPER_3(vsro, void, avr, avr, avr) DEF_HELPER_3(vsrv, void, avr, avr, avr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 8c115c9c60..3eb920ac76 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -715,3 +715,7 @@ VDIVUQ 000100 . . . 0001011@VX VDIVESW 000100 . . . 01110001011@VX VDIVEUW 000100 . . . 01010001011@VX +VDIVESD 000100 . . . 0001011@VX +VDIVEUD 000100 . . . 01011001011@VX +VDIVESQ 000100 . . . 0111011@VX +VDIVEUQ 000100 . . . 
0101011    @VX

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index ba5d4193ff..17a10c4412 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1057,6 +1057,70 @@ void helper_VDIVUQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
     }
 }
 
+void helper_VDIVESD(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+    int i;
+    int64_t high;
+    uint64_t low;
+    for (i = 0; i < 2; i++) {
+        high = a->s64[i];
+        low = 0;
+        if (unlikely((high == INT64_MIN && b->s64[i] == -1) || !b->s64[i])) {
+            t->s64[i] = a->s64[i]; /* Undefined behavior */
+        } else {
+            divs128(&low, &high, b->s64[i]);
+            t->s64[i] = low;
+        }
+    }
+}
+
+void helper_VDIVEUD(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+    int i;
+    uint64_t high, low;
+    for (i = 0; i < 2; i++) {
+        high = a->u64[i];
+        low = 0;
+        if (unlikely(!b->u64[i])) {
+            t->u64[i] = a->u64[i]; /* Undefined behavior */
+        } else {
+            divu128(&low, &high, b->u64[i]);
+            t->u64[i] = low;
+        }
+    }
+}
+
+void helper_VDIVESQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+    Int128 high, low;
+    Int128 int128_min = int128_make128(0, INT64_MIN);
+    Int128 neg1 = int128_makes64(-1);
+
+    high = a->s128;
+    low = int128_zero();
+    if (unlikely(!int128_nz(b->s128) ||
+                 (int128_eq(b->s128, neg1) && int128_eq(high, int128_min)))) {
+        t->s128 = a->s128; /* Undefined behavior */
+    } else {
+        divs256(&low, &high, b->s128);
+        t->s128 = low;
+    }
+}
+
+void helper_VDIVEUQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b)
+{
+    Int128 high, low;
+
+    high = a->s128;
+    low = int128_zero();
+    if (unlikely(!int128_nz(b->s128))) {
+        t->s128 = a->s128; /* Undefined behavior */
+    } else {
+        divu256(&low, &high, b->s128);
+        t->s128 = low;
+    }
+}
+
 void helper_VPERM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
     ppc_avr_t result;

diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 8799e945bd..23f215dbea 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -3342,6 +3342,10 @@ DO_VDIV_VMOD(do_diveuw, 32, do_diveu_i32, false)
 
 TRANS_VDIV_VMOD(ISA310, VDIVESW, MO_32, do_divesw, NULL)
 TRANS_VDIV_VMOD(ISA310, VDIVEUW, MO_32, do_diveuw, NULL)
+TRANS_FLAGS2(ISA310, VDIVESD, do_vx_helper, gen_helper_VDIVESD)
+TRANS_FLAGS2(ISA310, VDIVEUD, do_vx_helper, gen_helper_VDIVEUD)
+TRANS_FLAGS2(ISA310, VDIVESQ, do_vx_helper, gen_helper_VDIVESQ)
+TRANS_FLAGS2(ISA310, VDIVEUQ, do_vx_helper, gen_helper_VDIVEUQ)
 
 #undef DO_VDIV_VMOD
 
-- 
2.31.1
[PATCH v2 4/9] target/ppc: Implemented vector divide extended word
From: "Lucas Mateus Castro (alqotel)" Implement the following PowerISA v3.1 instructions: vdivesw: Vector Divide Extended Signed Word vdiveuw: Vector Divide Extended Unsigned Word Signed-off-by: Lucas Mateus Castro (alqotel) --- target/ppc/insn32.decode| 3 ++ target/ppc/translate/vmx-impl.c.inc | 48 + 2 files changed, 51 insertions(+) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 3a88a0b5bc..8c115c9c60 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -712,3 +712,6 @@ VDIVSD 000100 . . . 00111001011@VX VDIVUD 000100 . . . 00011001011@VX VDIVSQ 000100 . . . 0011011@VX VDIVUQ 000100 . . . 0001011@VX + +VDIVESW 000100 . . . 01110001011@VX +VDIVEUW 000100 . . . 01010001011@VX diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index bac0db7128..8799e945bd 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -3295,6 +3295,54 @@ TRANS_VDIV_VMOD(ISA310, VDIVUD, MO_64, NULL, do_divud) TRANS_FLAGS2(ISA310, VDIVSQ, do_vx_helper, gen_helper_VDIVSQ) TRANS_FLAGS2(ISA310, VDIVUQ, do_vx_helper, gen_helper_VDIVUQ) +static void do_dives_i32(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b) +{ +TCGv_i64 val1, val2; + +val1 = tcg_temp_new_i64(); +val2 = tcg_temp_new_i64(); + +tcg_gen_ext_i32_i64(val1, a); +tcg_gen_ext_i32_i64(val2, b); + +/* (a << 32)/b */ +tcg_gen_shli_i64(val1, val1, 32); +tcg_gen_div_i64(val1, val1, val2); + +/* if quotient doesn't fit in 32 bits the result is undefined */ +tcg_gen_extrl_i64_i32(t, val1); + +tcg_temp_free_i64(val1); +tcg_temp_free_i64(val2); +} + +static void do_diveu_i32(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b) +{ +TCGv_i64 val1, val2; + +val1 = tcg_temp_new_i64(); +val2 = tcg_temp_new_i64(); + +tcg_gen_extu_i32_i64(val1, a); +tcg_gen_extu_i32_i64(val2, b); + +/* (a << 32)/b */ +tcg_gen_shli_i64(val1, val1, 32); +tcg_gen_divu_i64(val1, val1, val2); + +/* if quotient doesn't fit in 32 bits the result is undefined */ +tcg_gen_extrl_i64_i32(t, 
val1); + +tcg_temp_free_i64(val1); +tcg_temp_free_i64(val2); +} + +DO_VDIV_VMOD(do_divesw, 32, do_dives_i32, true) +DO_VDIV_VMOD(do_diveuw, 32, do_diveu_i32, false) + +TRANS_VDIV_VMOD(ISA310, VDIVESW, MO_32, do_divesw, NULL) +TRANS_VDIV_VMOD(ISA310, VDIVEUW, MO_32, do_diveuw, NULL) + #undef DO_VDIV_VMOD #undef GEN_VR_LDX -- 2.31.1
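The "extended" divide implemented by do_diveu_i32 above widens the dividend to 64 bits, shifts it left by 32, and divides, so one lane can be modeled as a scalar for checking the arithmetic. The function name is invented, and the divide-by-zero fallback (returning a, as the doubleword helpers do) is an assumption for the sketch:

```c
#include <assert.h>
#include <stdint.h>

/* One lane of Vector Divide Extended Unsigned Word: (a << 32) / b,
 * computed with a 64-bit intermediate and truncated back to 32 bits.
 * When the true quotient does not fit in 32 bits the architecture
 * leaves the result undefined, so the truncated value is arbitrary. */
static uint32_t vdiveuw_lane(uint32_t a, uint32_t b)
{
    uint64_t num = (uint64_t)a << 32;

    if (b == 0) {
        return a; /* undefined; mirrors the doubleword helper's choice */
    }
    return (uint32_t)(num / b);
}
```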
[PATCH v2 5/9] host-utils: Implemented unsigned 256-by-128 division
From: "Lucas Mateus Castro (alqotel)" Based on already existing QEMU implementation, created an unsigned 256 bit by 128 bit division needed to implement the vector divide extended unsigned instruction from PowerISA3.1 Signed-off-by: Lucas Mateus Castro (alqotel) --- include/qemu/host-utils.h | 15 + include/qemu/int128.h | 20 ++ util/host-utils.c | 128 ++ 3 files changed, 163 insertions(+) diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h index ca979dc6cc..6da6a93f69 100644 --- a/include/qemu/host-utils.h +++ b/include/qemu/host-utils.h @@ -32,6 +32,7 @@ #include "qemu/compiler.h" #include "qemu/bswap.h" +#include "qemu/int128.h" #ifdef CONFIG_INT128 static inline void mulu64(uint64_t *plow, uint64_t *phigh, @@ -153,6 +154,19 @@ static inline int clo64(uint64_t val) return clz64(~val); } +/* + * clz128 - count leading zeros in a 128-bit value. + * @val: The value to search + */ +static inline int clz128(Int128 a) +{ +if (int128_gethi(a)) { +return clz64(int128_gethi(a)); +} else { +return clz64(int128_getlo(a)) + 64; +} +} + /** * ctz32 - count trailing zeros in a 32-bit value. 
* @val: The value to search @@ -849,4 +863,5 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1, #endif } +Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor); #endif diff --git a/include/qemu/int128.h b/include/qemu/int128.h index 3af01f38cd..2a9ee956aa 100644 --- a/include/qemu/int128.h +++ b/include/qemu/int128.h @@ -128,11 +128,21 @@ static inline bool int128_ge(Int128 a, Int128 b) return a >= b; } +static inline bool int128_uge(Int128 a, Int128 b) +{ +return ((__uint128_t)a) >= ((__uint128_t)b); +} + static inline bool int128_lt(Int128 a, Int128 b) { return a < b; } +static inline bool int128_ult(Int128 a, Int128 b) +{ +return (__uint128_t)a < (__uint128_t)b; +} + static inline bool int128_le(Int128 a, Int128 b) { return a <= b; @@ -373,11 +383,21 @@ static inline bool int128_ge(Int128 a, Int128 b) return a.hi > b.hi || (a.hi == b.hi && a.lo >= b.lo); } +static inline bool int128_uge(Int128 a, Int128 b) +{ +return (uint64_t)a.hi > (uint64_t)b.hi || (a.hi == b.hi && a.lo >= b.lo); +} + static inline bool int128_lt(Int128 a, Int128 b) { return !int128_ge(a, b); } +static inline bool int128_ult(Int128 a, Int128 b) +{ +return !int128_uge(a, b); +} + static inline bool int128_le(Int128 a, Int128 b) { return int128_ge(b, a); diff --git a/util/host-utils.c b/util/host-utils.c index bcc772b8ec..c6a01638c7 100644 --- a/util/host-utils.c +++ b/util/host-utils.c @@ -266,3 +266,131 @@ void ulshift(uint64_t *plow, uint64_t *phigh, int32_t shift, bool *overflow) *plow = *plow << shift; } } +/* + * Unsigned 256-by-128 division. + * Returns the remainder via r. + * Returns lower 128 bit of quotient. + * Needs a normalized divisor (most significant bit set to 1). 
+ * + * Adapted from include/qemu/host-utils.h udiv_qrnnd, + * from the GNU Multi Precision Library - longlong.h __udiv_qrnnd + * (https://gmplib.org/repo/gmp/file/tip/longlong.h) + * + * Licensed under the GPLv2/LGPLv3 + */ +static Int128 udiv256_qrnnd(Int128 *r, Int128 n1, Int128 n0, Int128 d) +{ +Int128 d0, d1, q0, q1, r1, r0, m; +uint64_t mp0, mp1; + +d0 = int128_make64(int128_getlo(d)); +d1 = int128_make64(int128_gethi(d)); + +r1 = int128_remu(n1, d1); +q1 = int128_divu(n1, d1); +mp0 = int128_getlo(q1); +mp1 = int128_gethi(q1); +mulu128(&mp0, &mp1, int128_getlo(d0)); +m = int128_make128(mp0, mp1); +r1 = int128_make128(int128_gethi(n0), int128_getlo(r1)); +if (int128_ult(r1, m)) { +q1 = int128_sub(q1, int128_one()); +r1 = int128_add(r1, d); +if (int128_uge(r1, d)) { +if (int128_ult(r1, m)) { +q1 = int128_sub(q1, int128_one()); +r1 = int128_add(r1, d); +} +} +} +r1 = int128_sub(r1, m); + +r0 = int128_remu(r1, d1); +q0 = int128_divu(r1, d1); +mp0 = int128_getlo(q0); +mp1 = int128_gethi(q0); +mulu128(&mp0, &mp1, int128_getlo(d0)); +m = int128_make128(mp0, mp1); +r0 = int128_make128(int128_getlo(n0), int128_getlo(r0)); +if (int128_ult(r0, m)) { +q0 = int128_sub(q0, int128_one()); +r0 = int128_add(r0, d); +if (int128_uge(r0, d)) { +if (int128_ult(r0, m)) { +q0 = int128_sub(q0, int128_one()); +r0 = int128_add(r0, d); +} +} +} +r0 = int128_sub(r0, m); + +*r = r0; +return int128_or(int128_lshift(q1, 64), q0); +} + +/* + * Unsigned 256-by-128 division. + * Returns quotient via plow and phigh. + * Also returns the remainder via the function return value. + */ +Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor) +{ +Int128 dhi = *phigh; +Int128 dlo = *plow;
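The estimate-and-correct structure of udiv256_qrnnd() is easier to see at word level. Below is a Python model of the same GMP-style __udiv_qrnnd step (parameter names are mine): divide the double-wide value (n1:n0) by a normalized divisor, estimating each quotient "digit" with the divisor's high half and correcting the estimate at most twice. All arithmetic is done modulo 2**bits to mirror the C code's wrapping behaviour, which is what makes the `r >= d` carry test work:

```python
def udiv_qrnnd(n1: int, n0: int, d: int, bits: int = 64):
    """Divide the 2*bits-wide value (n1:n0) by a normalized divisor d
    (top bit set, n1 < d), returning (quotient, remainder)."""
    assert d >> (bits - 1) and n1 < d      # preconditions of the C code
    half, mask = bits // 2, (1 << bits) - 1
    hmask = (1 << half) - 1
    d1, d0 = d >> half, d & hmask

    # First quotient digit: estimate with the divisor's high half...
    q1, r1 = divmod(n1, d1)
    m = (q1 * d0) & mask
    r1 = ((r1 << half) | (n0 >> half)) & mask
    if r1 < m:                             # ...then correct at most twice
        q1, r1 = q1 - 1, (r1 + d) & mask
        if r1 >= d and r1 < m:             # r1 >= d <=> the add didn't wrap
            q1, r1 = q1 - 1, (r1 + d) & mask
    r1 = (r1 - m) & mask

    # Second quotient digit, same scheme, bringing in the low half of n0.
    q0, r0 = divmod(r1, d1)
    m = (q0 * d0) & mask
    r0 = ((r0 << half) | (n0 & hmask)) & mask
    if r0 < m:
        q0, r0 = q0 - 1, (r0 + d) & mask
        if r0 >= d and r0 < m:
            q0, r0 = q0 - 1, (r0 + d) & mask
    r0 = (r0 - m) & mask

    return (((q1 << half) | q0) & mask, r0)
```

Because Python integers are arbitrary precision, the model can be checked directly against exact division, which is a convenient way to convince oneself the two correction steps are sufficient.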
[PATCH v2 2/9] target/ppc: Implemented vector divide instructions
From: "Lucas Mateus Castro (alqotel)" Implement the following PowerISA v3.1 instructions: vdivsw: Vector Divide Signed Word vdivuw: Vector Divide Unsigned Word vdivsd: Vector Divide Signed Doubleword vdivud: Vector Divide Unsigned Doubleword Signed-off-by: Lucas Mateus Castro (alqotel) --- target/ppc/insn32.decode| 7 target/ppc/translate/vmx-impl.c.inc | 59 + 2 files changed, 66 insertions(+) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index ac2d3da9a7..597768558b 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -703,3 +703,10 @@ XVTLSBB 00 ... -- 00010 . 111011011 . - @XX2_bf_xb _s s:uint8_t @XL_s ..-- s:1 .. - _s RFEBB 010011-- . 0010010010 - @XL_s + +## Vector Division Instructions + +VDIVSW 000100 . . . 00110001011@VX +VDIVUW 000100 . . . 00010001011@VX +VDIVSD 000100 . . . 00111001011@VX +VDIVUD 000100 . . . 00011001011@VX diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index 6101bca3fd..be35d6fdf3 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -3236,6 +3236,65 @@ TRANS(VMULHSD, do_vx_mulh, true , do_vx_vmulhd_i64) TRANS(VMULHUW, do_vx_mulh, false, do_vx_vmulhw_i64) TRANS(VMULHUD, do_vx_mulh, false, do_vx_vmulhd_i64) +#define TRANS_VDIV_VMOD(FLAGS, NAME, VECE, FNI4_FUNC, FNI8_FUNC)\ +static bool trans_##NAME(DisasContext *ctx, arg_VX *a) \ +{ \ +static const GVecGen3 op = {\ +.fni4 = FNI4_FUNC, \ +.fni8 = FNI8_FUNC, \ +.vece = VECE\ +}; \ +\ +REQUIRE_VECTOR(ctx);\ +REQUIRE_INSNS_FLAGS2(ctx, FLAGS); \ +\ +tcg_gen_gvec_3(avr_full_offset(a->vrt), avr_full_offset(a->vra),\ + avr_full_offset(a->vrb), 16, 16, ); \ +\ +return true;\ +} + +#define DO_VDIV_VMOD(NAME, SZ, DIV, SIGNED) \ +static void NAME(TCGv_i##SZ t, TCGv_i##SZ a, TCGv_i##SZ b) \ +{ \ +/* \ + * If N/0 the instruction used by the backend might deliver\ + * an invalid division signal to the process, so if b = 0 return \ + * N/1 and if signed instruction, the same for a = 
int_min, b = -1 \ + */ \ +if (SIGNED) { \ +TCGv_i##SZ t0 = tcg_temp_new_i##SZ(); \ +TCGv_i##SZ t1 = tcg_temp_new_i##SZ(); \ +tcg_gen_setcondi_i##SZ(TCG_COND_EQ, t0, a, INT##SZ##_MIN); \ +tcg_gen_setcondi_i##SZ(TCG_COND_EQ, t1, b, -1); \ +tcg_gen_and_i##SZ(t0, t0, t1); \ +tcg_gen_setcondi_i##SZ(TCG_COND_EQ, t1, b, 0); \ +tcg_gen_or_i##SZ(t0, t0, t1); \ +tcg_gen_movi_i##SZ(t1, 0); \ +tcg_gen_movcond_i##SZ(TCG_COND_NE, b, t0, t1, t0, b); \ +DIV(t, a, b); \ +tcg_temp_free_i##SZ(t0);\ +tcg_temp_free_i##SZ(t1);\ +} else {\ +TCGv_i##SZ zero = tcg_constant_i##SZ(0);\ +TCGv_i##SZ one = tcg_constant_i##SZ(1); \ +tcg_gen_movcond_i##SZ(TCG_COND_EQ, b, b, zero, one, b); \ +DIV(t, a, b); \ +} \ +} + +DO_VDIV_VMOD(do_divsw, 32, tcg_gen_div_i32, true) +DO_VDIV_VMOD(do_divuw, 32, tcg_gen_divu_i32, false) +DO_VDIV_VMOD(do_divsd, 64, tcg_gen_div_i64, true) +DO_VDIV_VMOD(do_divud, 64, tcg_gen_divu_i64, false) + +TRANS_VDIV_VMOD(ISA310, VDIVSW,
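The point of the DO_VDIV_VMOD guard above is that the host division instruction the TCG backend emits may trap (e.g. SIGFPE on x86) for N/0 or INT_MIN/-1, even though the PowerISA result for those inputs is merely undefined. The movcond therefore swaps in a harmless divisor of 1 before the real division runs. A rough Python model of that logic (names are mine; vdivsw here models one 32-bit lane):

```python
INT32_MIN = -(1 << 31)

def safe_divisor(a: int, b: int, signed: bool) -> int:
    """Model of the DO_VDIV_VMOD guard: divide by 1 instead of executing
    a division the host might trap on (N/0, or INT_MIN/-1 for signed
    ops); the architectural result is undefined in those cases anyway."""
    invalid = (b == 0) or (signed and a == INT32_MIN and b == -1)
    return 1 if invalid else b

def vdivsw(a: int, b: int) -> int:
    """One signed 32-bit lane of vdivsw, with C-style truncation."""
    b = safe_divisor(a, b, signed=True)
    q = abs(a) // abs(b)            # truncating (C-style) division
    return -q if (a < 0) != (b < 0) else q
```

Note that after the guard, N/0 yields N and INT_MIN/-1 yields INT_MIN, both valid instances of "undefined result".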
[PATCH v2 3/9] target/ppc: Implemented vector divide quadword
From: "Lucas Mateus Castro (alqotel)" Implement the following PowerISA v3.1 instructions: vdivsq: Vector Divide Signed Quadword vdivuq: Vector Divide Unsigned Quadword Signed-off-by: Lucas Mateus Castro (alqotel) --- target/ppc/helper.h | 2 ++ target/ppc/insn32.decode| 2 ++ target/ppc/int_helper.c | 21 + target/ppc/translate/vmx-impl.c.inc | 2 ++ 4 files changed, 27 insertions(+) diff --git a/target/ppc/helper.h b/target/ppc/helper.h index 57da11c77e..4cfdf7b3ec 100644 --- a/target/ppc/helper.h +++ b/target/ppc/helper.h @@ -171,6 +171,8 @@ DEF_HELPER_FLAGS_3(VMULOSW, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(VMULOUB, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(VMULOUH, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_FLAGS_3(VMULOUW, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_3(VDIVSQ, TCG_CALL_NO_RWG, void, avr, avr, avr) +DEF_HELPER_FLAGS_3(VDIVUQ, TCG_CALL_NO_RWG, void, avr, avr, avr) DEF_HELPER_3(vslo, void, avr, avr, avr) DEF_HELPER_3(vsro, void, avr, avr, avr) DEF_HELPER_3(vsrv, void, avr, avr, avr) diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode index 597768558b..3a88a0b5bc 100644 --- a/target/ppc/insn32.decode +++ b/target/ppc/insn32.decode @@ -710,3 +710,5 @@ VDIVSW 000100 . . . 00110001011@VX VDIVUW 000100 . . . 00010001011@VX VDIVSD 000100 . . . 00111001011@VX VDIVUD 000100 . . . 00011001011@VX +VDIVSQ 000100 . . . 0011011@VX +VDIVUQ 000100 . . . 
0001011@VX diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c index 492f34c499..ba5d4193ff 100644 --- a/target/ppc/int_helper.c +++ b/target/ppc/int_helper.c @@ -1036,6 +1036,27 @@ void helper_XXPERMX(ppc_vsr_t *t, ppc_vsr_t *s0, ppc_vsr_t *s1, ppc_vsr_t *pcv, *t = tmp; } +void helper_VDIVSQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b) +{ +Int128 neg1 = int128_makes64(-1); +Int128 int128_min = int128_make128(0, INT64_MIN); +if (likely(int128_nz(b->s128) && + (int128_ne(a->s128, int128_min) || int128_ne(b->s128, neg1 { +t->s128 = int128_divs(a->s128, b->s128); +} else { +t->s128 = a->s128; /* Undefined behavior */ +} +} + +void helper_VDIVUQ(ppc_avr_t *t, ppc_avr_t *a, ppc_avr_t *b) +{ +if (int128_nz(b->s128)) { +t->s128 = int128_divu(a->s128, b->s128); +} else { +t->s128 = a->s128; /* Undefined behavior */ +} +} + void helper_VPERM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c) { ppc_avr_t result; diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc index be35d6fdf3..bac0db7128 100644 --- a/target/ppc/translate/vmx-impl.c.inc +++ b/target/ppc/translate/vmx-impl.c.inc @@ -3292,6 +3292,8 @@ TRANS_VDIV_VMOD(ISA310, VDIVSW, MO_32, do_divsw, NULL) TRANS_VDIV_VMOD(ISA310, VDIVUW, MO_32, do_divuw, NULL) TRANS_VDIV_VMOD(ISA310, VDIVSD, MO_64, NULL, do_divsd) TRANS_VDIV_VMOD(ISA310, VDIVUD, MO_64, NULL, do_divud) +TRANS_FLAGS2(ISA310, VDIVSQ, do_vx_helper, gen_helper_VDIVSQ) +TRANS_FLAGS2(ISA310, VDIVUQ, do_vx_helper, gen_helper_VDIVUQ) #undef DO_VDIV_VMOD -- 2.31.1
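The quadword case goes through a helper instead of inline TCG, so helper_VDIVSQ above can check the two undefined cases explicitly before calling int128_divs(). A Python model of the helper's behaviour (the "return a" fallback mirrors the patch's choice for the undefined cases):

```python
INT128_MIN = -(1 << 127)

def vdivsq(a: int, b: int) -> int:
    """Model of helper_VDIVSQ: truncating 128-bit signed division.
    For the two cases the ISA leaves undefined (b == 0, and
    INT128_MIN / -1, whose true quotient 2**127 is unrepresentable),
    the helper simply returns a."""
    if b == 0 or (a == INT128_MIN and b == -1):
        return a
    q = abs(a) // abs(b)            # truncating (C-style) division
    return -q if (a < 0) != (b < 0) else q
```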
[PATCH v2 1/9] qemu/int128: add int128_urshift
From: Matheus Ferst Implement an unsigned right shift for Int128 values and add the same tests cases of int128_rshift in the unit test. Signed-off-by: Matheus Ferst Signed-off-by: Lucas Mateus Castro (alqotel) --- include/qemu/int128.h| 19 +++ tests/unit/test-int128.c | 32 2 files changed, 51 insertions(+) diff --git a/include/qemu/int128.h b/include/qemu/int128.h index 2c4064256c..3af01f38cd 100644 --- a/include/qemu/int128.h +++ b/include/qemu/int128.h @@ -83,6 +83,11 @@ static inline Int128 int128_rshift(Int128 a, int n) return a >> n; } +static inline Int128 int128_urshift(Int128 a, int n) +{ +return (__uint128_t)a >> n; +} + static inline Int128 int128_lshift(Int128 a, int n) { return a << n; @@ -299,6 +304,20 @@ static inline Int128 int128_rshift(Int128 a, int n) } } +static inline Int128 int128_urshift(Int128 a, int n) +{ +uint64_t h = a.hi; +if (!n) { +return a; +} +h = h >> (n & 63); +if (n >= 64) { +return int128_make64(h); +} else { +return int128_make128((a.lo >> n) | ((uint64_t)a.hi << (64 - n)), h); +} +} + static inline Int128 int128_lshift(Int128 a, int n) { uint64_t l = a.lo << (n & 63); diff --git a/tests/unit/test-int128.c b/tests/unit/test-int128.c index b86a3c76e6..ae0f552193 100644 --- a/tests/unit/test-int128.c +++ b/tests/unit/test-int128.c @@ -206,6 +206,37 @@ static void test_rshift(void) test_rshift_one(0xFFFE8000U, 0, 0xFFFEULL, 0x8000ULL); } +static void __attribute__((__noinline__)) ATTRIBUTE_NOCLONE +test_urshift_one(uint32_t x, int n, uint64_t h, uint64_t l) +{ +Int128 a = expand(x); +Int128 r = int128_urshift(a, n); +g_assert_cmpuint(int128_getlo(r), ==, l); +g_assert_cmpuint(int128_gethi(r), ==, h); +} + +static void test_urshift(void) +{ +test_urshift_one(0x0001U, 64, 0xULL, 0x0001ULL); +test_urshift_one(0x8001U, 64, 0xULL, 0x8001ULL); +test_urshift_one(0x7FFEU, 64, 0xULL, 0x7FFEULL); +test_urshift_one(0xFFFEU, 64, 0xULL, 0xFFFEULL); +test_urshift_one(0x0001U, 60, 0xULL, 0x0010ULL); +test_urshift_one(0x8001U, 60, 0x0008ULL, 
0x0010ULL); +test_urshift_one(0x00018000U, 60, 0xULL, 0x0018ULL); +test_urshift_one(0x80018000U, 60, 0x0008ULL, 0x0018ULL); +test_urshift_one(0x7FFEU, 60, 0x0007ULL, 0xFFE0ULL); +test_urshift_one(0xFFFEU, 60, 0x000FULL, 0xFFE0ULL); +test_urshift_one(0x7FFE8000U, 60, 0x0007ULL, 0xFFE8ULL); +test_urshift_one(0xFFFE8000U, 60, 0x000FULL, 0xFFE8ULL); +test_urshift_one(0x00018000U, 0, 0x0001ULL, 0x8000ULL); +test_urshift_one(0x80018000U, 0, 0x8001ULL, 0x8000ULL); +test_urshift_one(0x7FFEU, 0, 0x7FFEULL, 0xULL); +test_urshift_one(0xFFFEU, 0, 0xFFFEULL, 0xULL); +test_urshift_one(0x7FFE8000U, 0, 0x7FFEULL, 0x8000ULL); +test_urshift_one(0xFFFE8000U, 0, 0xFFFEULL, 0x8000ULL); +} + int main(int argc, char **argv) { g_test_init(, , NULL); @@ -219,5 +250,6 @@ int main(int argc, char **argv) g_test_add_func("/int128/int128_ge", test_ge); g_test_add_func("/int128/int128_gt", test_gt); g_test_add_func("/int128/int128_rshift", test_rshift); +g_test_add_func("/int128/int128_urshift", test_urshift); return g_test_run(); } -- 2.31.1
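The non-CONFIG_INT128 fallback above has three cases: no shift, a shift of 64 or more (only the high limb survives), and a shift that mixes bits from both limbs. A Python model of the same limb arithmetic, checked against a plain big-integer shift (the model assumes 0 <= n <= 127, as the C code does):

```python
MASK64 = (1 << 64) - 1

def int128_urshift(lo: int, hi: int, n: int):
    """Model of the fallback int128_urshift(): logical right shift of a
    128-bit value kept as two 64-bit limbs, returning (lo, hi)."""
    if n == 0:
        return lo, hi
    h = hi >> (n & 63)              # n & 63 == n - 64 when 64 <= n < 128
    if n >= 64:
        return h, 0
    return ((lo >> n) | (hi << (64 - n))) & MASK64, h
```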
[PATCH v2 6/9] host-utils: Implemented signed 256-by-128 division
From: "Lucas Mateus Castro (alqotel)" Based on already existing QEMU implementation created a signed 256 bit by 128 bit division needed to implement the vector divide extended signed quadword instruction from PowerISA 3.1 Signed-off-by: Lucas Mateus Castro (alqotel) Reviewed-by: Richard Henderson --- include/qemu/host-utils.h | 1 + util/host-utils.c | 51 +++ 2 files changed, 52 insertions(+) diff --git a/include/qemu/host-utils.h b/include/qemu/host-utils.h index 6da6a93f69..d0b444a40f 100644 --- a/include/qemu/host-utils.h +++ b/include/qemu/host-utils.h @@ -864,4 +864,5 @@ static inline uint64_t udiv_qrnnd(uint64_t *r, uint64_t n1, } Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor); +Int128 divs256(Int128 *plow, Int128 *phigh, Int128 divisor); #endif diff --git a/util/host-utils.c b/util/host-utils.c index c6a01638c7..d221657e43 100644 --- a/util/host-utils.c +++ b/util/host-utils.c @@ -394,3 +394,54 @@ Int128 divu256(Int128 *plow, Int128 *phigh, Int128 divisor) return rem; } } + +/* + * Signed 256-by-128 division. + * Returns quotient via plow and phigh. + * Also returns the remainder via the function return value. 
+ */ +Int128 divs256(Int128 *plow, Int128 *phigh, Int128 divisor) +{ +bool neg_quotient = false, neg_remainder = false; +Int128 unsig_hi = *phigh, unsig_lo = *plow; +Int128 rem; + +if (!int128_nonneg(*phigh)) { +neg_quotient = !neg_quotient; +neg_remainder = !neg_remainder; + +if (!int128_nz(unsig_lo)) { +unsig_hi = int128_neg(unsig_hi); +} else { +unsig_hi = int128_not(unsig_hi); +unsig_lo = int128_neg(unsig_lo); +} +} + +if (!int128_nonneg(divisor)) { +neg_quotient = !neg_quotient; + +divisor = int128_neg(divisor); +} + +rem = divu256(&unsig_lo, &unsig_hi, divisor); + +if (neg_quotient) { +if (!int128_nz(unsig_lo)) { +*phigh = int128_neg(unsig_hi); +*plow = int128_zero(); +} else { +*phigh = int128_not(unsig_hi); +*plow = int128_neg(unsig_lo); +} +} else { +*phigh = unsig_hi; +*plow = unsig_lo; +} + +if (neg_remainder) { +return int128_neg(rem); +} else { +return rem; +} +} -- 2.31.1
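divs256() follows the classic signed-via-unsigned scheme: negate negative operands, divide unsigned, then restore the signs of quotient and remainder separately (so the result is truncating division, matching C semantics). The only fiddly part is negating a 256-bit value held as two limbs, which needs the carry special case when the low limb is zero. A Python model (function names and the tuple return are mine):

```python
M128 = (1 << 128) - 1

def neg256(lo: int, hi: int):
    """Two's-complement negate of a 256-bit value held as two 128-bit
    limbs, with the same carry handling as divs256() above."""
    if lo == 0:
        return 0, (-hi) & M128
    return (-lo) & M128, (~hi) & M128

def divs256(lo: int, hi: int, d: int):
    """Model of divs256(): the dividend is the 256-bit two's-complement
    value (hi:lo); d is a Python signed int.  Returns
    (quot_lo, quot_hi, remainder)."""
    neg_q = neg_r = False
    if hi >> 127:                       # dividend negative
        neg_q = neg_r = True
        lo, hi = neg256(lo, hi)
    if d < 0:
        neg_q = not neg_q
        d = -d
    q, r = divmod((hi << 128) | lo, d)  # both operands now non-negative
    qlo, qhi = q & M128, (q >> 128) & M128
    if neg_q:
        qlo, qhi = neg256(qlo, qhi)
    return qlo, qhi, -r if neg_r else r
```

As in the C code, the remainder takes the sign of the dividend and the quotient is negative exactly when the operand signs differ.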
[PATCH v2 0/9] VDIV/VMOD Implementation
From: "Lucas Mateus Castro (alqotel)" This patch series is an implementation of the vector divide, vector divide extended and vector modulo instructions from PowerISA 3.1 The first patch is Matheus' patch, used here since the divs256 and divu256 functions use int128_urshift. v2 changes: - Dropped int128_lshift patch - Added missing int_min/-1 check - Changed invalid division to a division by 1 - Created new macro responsible for invalid division check (replacing DIV_VEC, REM_VEC and the check in dives_i32/diveu_i32) - Turned GVecGen3 array into single element Lucas Mateus Castro (alqotel) (8): target/ppc: Implemented vector divide instructions target/ppc: Implemented vector divide quadword target/ppc: Implemented vector divide extended word host-utils: Implemented unsigned 256-by-128 division host-utils: Implemented signed 256-by-128 division target/ppc: Implemented remaining vector divide extended target/ppc: Implemented vector module word/doubleword target/ppc: Implemented vector module quadword Matheus Ferst (1): qemu/int128: add int128_urshift include/qemu/host-utils.h | 16 +++ include/qemu/int128.h | 39 ++ target/ppc/helper.h | 8 ++ target/ppc/insn32.decode| 23 target/ppc/int_helper.c | 106 target/ppc/translate/vmx-impl.c.inc | 125 +++ tests/unit/test-int128.c| 32 + util/host-utils.c | 179 8 files changed, 528 insertions(+) -- 2.31.1
Re: [PATCH v2 2/4] target/ppc: init 'lpcr' in kvmppc_enable_cap_large_decr()
On 4/1/22 00:40, David Gibson wrote: On Thu, Mar 31, 2022 at 03:46:57PM -0300, Daniel Henrique Barboza wrote: On 3/31/22 14:36, Richard Henderson wrote: On 3/31/22 11:17, Daniel Henrique Barboza wrote: Hmm... this is seeming a bit like whack-a-mole. Could we instead use one of the valgrind hinting mechanisms to inform it that kvm_get_one_reg() writes the variable at *target? I didn't find a way of doing that looking in the memcheck helpers (https://valgrind.org/docs/manual/mc-manual.html section 4.7). That would be a good way of solving this warning because we would put stuff inside a specific function X and all callers of X would be covered by it. What I did find instead is a memcheck macro called VALGRIND_MAKE_MEM_DEFINED that tells Valgrind that the var was initialized. This patch would then be something as follows: diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c index dc93b99189..b0e22fa283 100644 --- a/target/ppc/kvm.c +++ b/target/ppc/kvm.c @@ -56,6 +56,10 @@ #define DEBUG_RETURN_GUEST 0 #define DEBUG_RETURN_GDB 1 +#ifdef CONFIG_VALGRIND_H +#include <valgrind/memcheck.h> +#endif + const KVMCapabilityInfo kvm_arch_required_capabilities[] = { KVM_CAP_LAST_INFO }; @@ -2539,6 +2543,10 @@ int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable) CPUState *cs = CPU(cpu); uint64_t lpcr; +#ifdef CONFIG_VALGRIND_H + VALGRIND_MAKE_MEM_DEFINED(&lpcr, sizeof(uint64_t)); +#endif + kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr); /* Do we need to modify the LPCR? */ CONFIG_VALGRIND_H needs 'valgrind-devel' installed. I agree that this "Valgrind is complaining about variable initialization" is a whack-a-mole situation that will keep happening in the future if we keep adding this same code pattern (passing as reference an uninitialized var). For now, given that we have only 4 instances to fix it in ppc code (as far as I'm aware of), and we don't have a better way of telling Valgrind that we know what we're doing, I think we're better off initializing these vars.
I would instead put this annotation inside kvm_get_one_reg, so that it covers all kvm hosts. But it's too late to do this for 7.0. I wasn't planning on pushing these changes for 7.0 since they aren't fixing mem leaks or anything really bad. It's more of a quality of life improvement when using Valgrind. I also tried to put this annotation in kvm_get_one_reg() and it didn't solve the warning. That's weird, I'm pretty sure that should work. I'd double check to make sure you had all the parameters right (e.g. could you have marked the pointer itself as initialized, rather than the memory it points to). You're right. I got confused with different setups here and there and thought that it didn't work. I sent a patch to kvm-all.c that tries to do that: https://lists.gnu.org/archive/html/qemu-devel/2022-04/msg00507.html As for this series, for now I'm willing to take it since it improves the situation with simple initializations. We can reconsider it if we make good progress through the common code. At any rate these are 7.1 patches, so we have time. Thanks, Daniel I didn't find a way of telling Valgrind "consider that every time this function is called with parameter X it initializes X". That would be a good solution to put in the common KVM files and fix the problem for everybody. Daniel r~
Re: [RFC PATCH 1/1] kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory
On 4/5/22 11:30, Peter Maydell wrote: On Tue, 5 Apr 2022 at 14:07, Daniel Henrique Barboza wrote: There are a lot of Valgrind warnings about conditional jumps depending on uninitialized values like this one (taken from a pSeries guest): Conditional jump or move depends on uninitialised value(s) at 0xB011DC: kvmppc_enable_cap_large_decr (kvm.c:2544) by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523) by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921) by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73) (...) Uninitialised value was created by a stack allocation at 0xB01150: kvmppc_enable_cap_large_decr (kvm.c:2538) In this case, the alleged uninitialized value is the 'lpcr' variable that is written by kvm_get_one_reg() and then used in an if clause: int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable) { CPUState *cs = CPU(cpu); uint64_t lpcr; kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr); /* Do we need to modify the LPCR? */ if (!!(lpcr & LPCR_LD) != !!enable) { < Valgrind warns here (...) A quick fix is to init the variable that kvm_get_one_reg() is going to write ('lpcr' in the example above). Another idea is to convince Valgrind that kvm_get_one_reg() inits the 'void *target' memory in case the ioctl() is successful. This will put some boilerplate in the function but it will bring benefit for its other callers. Doesn't Valgrind have a way of modelling ioctls where it knows what data is read and written ? In general ioctl-using programs don't need to have special case "I am running under valgrind" handling, so this seems to me like valgrind is missing support for this particular ioctl. I don't know if Valgrind is capable of doing that. Guess it's worth a look. More generally, how much use is running QEMU with KVM enabled under valgrind anyway? Valgrind has no way of knowing about writes to memory that the guest vCPUs do... At least in the hosts I have access to, I wasn't able to get a pSeries guest booting up to prompt with Valgrind + TCG.
It was painfully slow. Valgrind + KVM is slow but doable. Granted, vCPUs reads/writes can't be profiled with it when using KVM, but for everything else is alright. Thanks, Daniel thanks -- PMM
Re: [PATCH 4/7] virtio: don't read pending event on host notifier if disabled
On 4/1/2022 7:00 PM, Jason Wang wrote: On Sat, Apr 2, 2022 at 4:37 AM Si-Wei Liu wrote: On 3/31/2022 1:36 AM, Jason Wang wrote: On Thu, Mar 31, 2022 at 12:41 AM Si-Wei Liu wrote: On 3/30/2022 2:14 AM, Jason Wang wrote: On Wed, Mar 30, 2022 at 2:33 PM Si-Wei Liu wrote: Previous commit prevents vhost-user and vhost-vdpa from using userland vq handler via disable_ioeventfd_handler. The same needs to be done for host notifier cleanup too, as the virtio_queue_host_notifier_read handler still tends to read pending event left behind on ioeventfd and attempts to handle outstanding kicks from QEMU userland vq. If vq handler is not disabled on cleanup, it may lead to sigsegv with recursive virtio_net_set_status call on the control vq: 0 0x7f8ce3ff3387 in raise () at /lib64/libc.so.6 1 0x7f8ce3ff4a78 in abort () at /lib64/libc.so.6 2 0x7f8ce3fec1a6 in __assert_fail_base () at /lib64/libc.so.6 3 0x7f8ce3fec252 in () at /lib64/libc.so.6 4 0x558f52d79421 in vhost_vdpa_get_vq_index (dev=, idx=) at ../hw/virtio/vhost-vdpa.c:563 5 0x558f52d79421 in vhost_vdpa_get_vq_index (dev=, idx=) at ../hw/virtio/vhost-vdpa.c:558 6 0x558f52d7329a in vhost_virtqueue_mask (hdev=0x558f55c01800, vdev=0x558f568f91f0, n=2, mask=) at ../hw/virtio/vhost.c:1557 I feel it's probably a bug elsewhere e.g when we fail to start vhost-vDPA, it's the charge of the Qemu to poll host notifier and we will fallback to the userspace vq handler. Apologies, an incorrect stack trace was pasted which actually came from patch #1. 
I will post a v2 with the corresponding one as below: 0 0x55f800df1780 in qdev_get_parent_bus (dev=0x0) at ../hw/core/qdev.c:376 1 0x55f800c68ad8 in virtio_bus_device_iommu_enabled (vdev=vdev@entry=0x0) at ../hw/virtio/virtio-bus.c:331 2 0x55f800d70d7f in vhost_memory_unmap (dev=) at ../hw/virtio/vhost.c:318 3 0x55f800d70d7f in vhost_memory_unmap (dev=, buffer=0x7fc19bec5240, len=2052, is_write=1, access_len=2052) at ../hw/virtio/vhost.c:336 4 0x55f800d71867 in vhost_virtqueue_stop (dev=dev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590, vq=0x55f8037cceb0, idx=0) at ../hw/virtio/vhost.c:1241 5 0x55f800d7406c in vhost_dev_stop (hdev=hdev@entry=0x55f8037ccc30, vdev=vdev@entry=0x55f8044ec590) at ../hw/virtio/vhost.c:1839 6 0x55f800bf00a7 in vhost_net_stop_one (net=0x55f8037ccc30, dev=0x55f8044ec590) at ../hw/net/vhost_net.c:315 7 0x55f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590, ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7, cvq=cvq@entry=1) at ../hw/net/vhost_net.c:423 8 0x55f800d4e628 in virtio_net_set_status (status=, n=0x55f8044ec590) at ../hw/net/virtio-net.c:296 9 0x55f800d4e628 in virtio_net_set_status (vdev=vdev@entry=0x55f8044ec590, status=15 '\017') at ../hw/net/virtio-net.c:370 I don't understand why virtio_net_handle_ctrl() call virtio_net_set_stauts()... The pending request left over on the ctrl vq was a VIRTIO_NET_CTRL_MQ command, i.e. in virtio_net_handle_mq(): Completely forget that the code was actually written by me :\ 1413 n->curr_queue_pairs = queue_pairs; 1414 /* stop the backend before changing the number of queue_pairs to avoid handling a 1415 * disabled queue */ 1416 virtio_net_set_status(vdev, vdev->status); 1417 virtio_net_set_queue_pairs(n); Noted before the vdpa multiqueue support, there was never a vhost_dev for ctrl_vq exposed, i.e. there's no host notifier set up for the ctrl_vq on vhost_kernel as it is emulated in QEMU software. 
10 0x55f800d534d8 in virtio_net_handle_ctrl (iov_cnt=, iov=, cmd=0 '\000', n=0x55f8044ec590) at ../hw/net/virtio-net.c:1408 11 0x55f800d534d8 in virtio_net_handle_ctrl (vdev=0x55f8044ec590, vq=0x7fc1a7e888d0) at ../hw/net/virtio-net.c:1452 12 0x55f800d69f37 in virtio_queue_host_notifier_read (vq=0x7fc1a7e888d0) at ../hw/virtio/virtio.c:2331 13 0x55f800d69f37 in virtio_queue_host_notifier_read (n=n@entry=0x7fc1a7e8894c) at ../hw/virtio/virtio.c:3575 14 0x55f800c688e6 in virtio_bus_cleanup_host_notifier (bus=, n=n@entry=14) at ../hw/virtio/virtio-bus.c:312 15 0x55f800d73106 in vhost_dev_disable_notifiers (hdev=hdev@entry=0x55f8035b51b0, vdev=vdev@entry=0x55f8044ec590) at ../../../include/hw/virtio/virtio-bus.h:35 16 0x55f800bf00b2 in vhost_net_stop_one (net=0x55f8035b51b0, dev=0x55f8044ec590) at ../hw/net/vhost_net.c:316 17 0x55f800bf0678 in vhost_net_stop (dev=dev@entry=0x55f8044ec590, ncs=0x55f80452bae0, data_queue_pairs=data_queue_pairs@entry=7, cvq=cvq@entry=1) at ../hw/net/vhost_net.c:423 18 0x55f800d4e628 in virtio_net_set_status (status=, n=0x55f8044ec590) at ../hw/net/virtio-net.c:296 19 0x55f800d4e628 in virtio_net_set_status (vdev=0x55f8044ec590, status=15 '\017') at ../hw/net/virtio-net.c:370 20 0x55f800d6c4b2 in virtio_set_status (vdev=0x55f8044ec590, val=) at ../hw/virtio/virtio.c:1945 21 0x55f800d11d9d in vm_state_notify
[PATCH] acpi: Bodge acpi_index migration
Re: [PULL 0/3] Misc changes for 2022-04-05
On Tue, 5 Apr 2022 at 10:25, Paolo Bonzini wrote: > > The following changes since commit 20661b75ea6093f5e59079d00a778a972d6732c5: > > Merge tag 'pull-ppc-20220404' of https://github.com/legoater/qemu into > staging (2022-04-04 15:48:55 +0100) > > are available in the Git repository at: > > https://gitlab.com/bonzini/qemu.git tags/for-upstream > > for you to fetch changes up to 776a6a32b4982a68d3b7a77cbfaae6c2b363a0b8: > > docs/system/i386: Add measurement calculation details to > amd-memory-encryption (2022-04-05 10:42:06 +0200) > > > * fix vss-win32 compilation with clang++ > > * update Coverity model > > * add measurement calculation to amd-memory-encryption docs > > > Dov Murik (1): > docs/system/i386: Add measurement calculation details to > amd-memory-encryption > > Helge Konetzka (1): > qga/vss-win32: fix compilation with clang++ > > Paolo Bonzini (1): > coverity: update model for latest tools Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0 for any user-visible changes. -- PMM
Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
On Tue, Apr 05, 2022, Andy Lutomirski wrote: > On Tue, Apr 5, 2022, at 3:36 AM, Quentin Perret wrote: > > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote: > >> The best I can come up with is a special type of shared page that is not > >> GUP-able and maybe not even mmappable, having a clear option for > >> transitions to fail, and generally preventing the nasty cases from > >> happening in the first place. > > > > Right, that sounds reasonable to me. > > At least as a v1, this is probably more straightforward than allowing mmap(). > Also, there's much to be said for a simpler, limited API, to be expanded if > genuinely needed, as opposed to starting out with a very featureful API. Regarding "genuinely needed", IMO the same applies to supporting this at all. Without numbers from something at least approximating a real use case, we're just speculating on which will be the most performant approach. > >> Maybe there could be a special mode for the private memory fds in which > >> specific pages are marked as "managed by this fd but actually shared". > >> pread() and pwrite() would work on those pages, but not mmap(). (Or maybe > >> mmap() but the resulting mappings would not permit GUP.) And > >> transitioning them would be a special operation on the fd that is specific > >> to pKVM and wouldn't work on TDX or SEV. > > > > Aha, didn't think of pread()/pwrite(). Very interesting. > > There are plenty of use cases for which pread()/pwrite()/splice() will be as > fast or even much faster than mmap()+memcpy(). ... > resume guest > *** host -> hypervisor -> guest *** > Guest unshares the page. > *** guest -> hypervisor *** > Hypervisor removes PTE. TLBI. > *** hypervisor -> guest *** > > Obviously considerable cleverness is needed to make a virt IOMMU like this > work well, but still. 
> > Anyway, my suggestion is that the fd backing proposal get slightly modified > to get it ready for multiple subtypes of backing object, which should be a > pretty minimal change. Then, if someone actually needs any of this > cleverness, it can be added later. In the mean time, the > pread()/pwrite()/splice() scheme is pretty good. Tangentially related to getting private-fd ready for multiple things, what about implementing the pread()/pwrite()/splice() scheme in pKVM itself? I.e. read() on the VM fd, with the offset corresponding to gfn in some way. Ditto for mmap() on the VM fd, though that would require additional changes outside of pKVM. That would allow pKVM to support in-place conversions without the private-fd having to differentiate between the type of protected VM, and without having to provide new APIs from the private-fd. TDX, SNP, etc... Just Work by not supporting the pKVM APIs. And assuming we get multiple consumers down the road, pKVM will need to be able to communicate the "true" state of a page to other consumers, because in addition to being a consumer, pKVM is also an owner/enforcer analogous to the TDX Module and the SEV PSP.
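The pread()/pwrite()/splice() scheme discussed above amounts to the host touching guest-private memory only through positioned I/O on the backing fd, never through a CPU mapping of the pages. A minimal Python sketch of that access pattern (using an ordinary temp file as a stand-in for the private-memory fd, which is an assumption purely for illustration):

```python
import os
import tempfile

def fd_copy_in_out(payload: bytes, offset: int) -> bytes:
    """Sketch of the fd-based access model: data moves via positioned
    reads/writes (os.pwrite/os.pread) on the backing fd, so the host
    never needs mmap() on the pages at all."""
    fd, path = tempfile.mkstemp()
    try:
        os.pwrite(fd, payload, offset)             # host -> "guest" page
        return os.pread(fd, len(payload), offset)  # "guest" page -> host
    finally:
        os.close(fd)
        os.unlink(path)
```

Positioned I/O is also naturally offset-addressed, which matches the gfn-to-offset mapping the proposal relies on.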
Re: [PATCH] block/stream: Drain subtree around graph change
Am 05/04/2022 um 19:53 schrieb Emanuele Giuseppe Esposito: > > > Am 05/04/2022 um 17:04 schrieb Kevin Wolf: >> Am 05.04.2022 um 15:09 hat Emanuele Giuseppe Esposito geschrieben: >>> Am 05/04/2022 um 12:14 schrieb Kevin Wolf: I think all of this is really relevant for Emanuele's work, which involves adding AIO_WAIT_WHILE() deep inside graph update functions. I fully expect that we would see very similar problems, and just stacking drain sections over drain sections that might happen to usually fix things, but aren't guaranteed to, doesn't look like a good solution. >>> >>> Yes, I think at this point we all agreed to drop subtree_drain as >>> replacement for AioContext. >>> >>> The alternative is what Paolo proposed in the other thread " Removal of >>> AioContext lock, bs->parents and ->children: proof of concept" >>> I am not sure which thread you replied first :) >> >> This one, I think. :-) >> >>> I think that proposal is not far from your idea, and it avoids to >>> introduce or even use drains at all. >>> Not sure why you called it a "step backwards even from AioContext locks". >> >> I was only referring to the lock locality there. AioContext locks are >> really coarse, but still a finer granularity than a single global lock. >> >> In the big picture, it's still be better than the AioContext lock, but >> that's because it's a different type of lock, not because it has better >> locality. >> >> So I was just wondering if we can't have the different type of lock and >> make it local to the BDS, too. > > I guess this is the right time to discuss this. > > I think that a global lock will be easier to handle, and we already have > a concrete implementation (cpus-common). > > I think that the reads in some sense are already BDS-specific, because > each BDS that is reading has an internal a flag. > Writes, on the other hand, are global. If a write is happening, no other > read at all can run, even if it has nothing to do with it. 
> > The question then is: how difficult would it be to implement a BDS-specific > write? > From the API perspective, change > bdrv_graph_wrlock(void); > into > bdrv_graph_wrlock(BlockDriverState *parent, BlockDriverState *child); > I am not sure if/how complicated it will be. For sure all the global > variables would end up in the BDS struct. > > On the other side, making the read generic instead could also be interesting. > Think about drain: it is a recursive function, and it doesn't really > make sense to take the rdlock for each node it traverses. Otherwise, a simple solution for drains that requires no change at all is to just take the rdlock on the bs calling drain, and since each write waits for all reads to complete, it will work anyway. The only detail is that assert_bdrv_graph_readable() will then need to iterate through all nodes to be sure that at least one of them is actually reading. So yeah, I know this might be hard to realize without an implementation, but my conclusion is to leave the lock as it is for now. > Even though I don't know an easy way to replace the ->has_waiter and > ->reading_graph flags... > > Emanuele >
Re: [RFC PATCH] python: add qmp-send program to send raw qmp commands to qemu
On Tue, Apr 5, 2022, 5:03 AM Damien Hedde wrote: > > > On 4/4/22 22:34, John Snow wrote: > > On Wed, Mar 16, 2022 at 5:55 AM Damien Hedde > wrote: > >> > >> It takes an input file containing raw qmp commands (concatenated json > >> dicts) and sends all commands one by one to a qmp server. When one > >> command fails, it exits. > >> > >> As a convenience, it can also wrap the qemu process to avoid having > >> to start qemu in the background. When wrapping qemu, the program returns > >> only when the qemu process terminates. > >> > >> Signed-off-by: Damien Hedde > >> --- > >> > >> Hi all, > >> > >> Following our discussion, I've started this. What do you think? > >> > >> I tried to follow Daniel's qmp-shell-wrap. I think it is > >> better to have similar options (eg: logging). There is also room > >> for factorizing code if we want to keep them aligned and ease > >> maintenance. > >> > >> There are still some pylint issues (too many branches in main and it > >> does not like my context manager if/else line). But it's kind of a > >> mess to fix these, so I think it's enough for a first version. > > > > Yeah, don't worry about these. You can just tell pylint to shut up > > while you prototype. Sometimes it's just not worth spending more time > > on a more beautiful factoring. Oh well. > > > >> > >> I named it qmp-send as Daniel proposed; maybe qmp-test better matches > >> what I'm doing there? > >> > > > > I think I agree with Dan's response. > > > >> Thanks, > >> Damien > >> --- > >> python/qemu/aqmp/qmp_send.py | 229 +++ > > > > I recommend putting this in qemu/util/qmp_send.py instead. > > > > I'm in the process of pulling out the AQMP lib and hosting it > > separately. Scripts like this I think should stay in the QEMU tree, so > > moving it to util instead is probably best. Otherwise, I'll *really* > > have to commit to the syntax, and that's probably a bigger hurdle than > > you want to deal with. > > If it stays in the QEMU tree, what licensing should I use ?
LGPL does not > hurt, no ? > Whichever you please. GPLv2+ would be convenient and harmonizes well with other tools. LGPL is only something I started doing so that the "qemu.qmp" package would be LGPL. Licensing the tools as LGPL was just a sin of convenience so I could claim a single license for the whole wheel/egg/tgz. (I didn't want to make separate qmp and qmp-tools packages.) Go with what you feel is best. > > > >> scripts/qmp/qmp-send | 11 ++ > >> 2 files changed, 240 insertions(+) > >> create mode 100644 python/qemu/aqmp/qmp_send.py > >> create mode 100755 scripts/qmp/qmp-send > >> > >> diff --git a/python/qemu/aqmp/qmp_send.py b/python/qemu/aqmp/qmp_send.py > >> new file mode 100644 > >> index 00..cbca1d0205 > >> --- /dev/null > >> +++ b/python/qemu/aqmp/qmp_send.py > > > > Seems broadly fine to me, but I didn't review closely this time. If it > > works for you, it works for me. > > > > As for making QEMU hang: there's a few things you could do, take a > > look at iotests and see how they handle timeout blocks in synchronous > > code -- iotests.py line 696 or so, "class Timeout". When writing async > > code, you can also do stuff like this: > > > > async def foo(): > > await asyncio.wait_for(qmp.execute("some-command", args_etc), > timeout=30) > > > > See https://docs.python.org/3/library/asyncio-task.html#asyncio.wait_for > > > > --js > > > > Thanks for the tip, > -- > Damien > Oh, and one more. the legacy.py bindings for AQMP also support a configurable timeout that applies to most API calls by default. see https://gitlab.com/jsnow/qemu.qmp/-/blob/main/qemu/qmp/legacy.py#L285 (Branch still in limbo here, but it should still be close to the same in qemu.git) I believe this is used by iotests.py when it sets up its machine.py subclass ("VM", iirc) so that most qmp invocations in iotests have a default timeout and won't hang tests indefinitely. --js >
Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
On Tue, Apr 05, 2022, Quentin Perret wrote: > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote: > > >> - it can be very useful for protected VMs to do shared=>private > > >>conversions. Think of a VM receiving some data from the host in a > > >>shared buffer, and then it wants to operate on that buffer without > > >>risking to leak confidential informations in a transient state. In > > >>that case the most logical thing to do is to convert the buffer back > > >>to private, do whatever needs to be done on that buffer (decrypting a > > >>frame, ...), and then share it back with the host to consume it; > > > > > > If performance is a motivation, why would the guest want to do two > > > conversions instead of just doing internal memcpy() to/from a private > > > page? I would be quite surprised if multiple exits and TLB shootdowns is > > > actually faster, especially at any kind of scale where zapping stage-2 > > > PTEs will cause lock contention and IPIs. > > > > I don't know the numbers or all the details, but this is arm64, which is a > > rather better architecture than x86 in this regard. So maybe it's not so > > bad, at least in very simple cases, ignoring all implementation details. > > (But see below.) Also the systems in question tend to have fewer CPUs than > > some of the massive x86 systems out there. > > Yep. I can try and do some measurements if that's really necessary, but > I'm really convinced the cost of the TLBI for the shared->private > conversion is going to be significantly smaller than the cost of memcpy > the buffer twice in the guest for us. It's not just the TLB shootdown, the VM-Exits aren't free. And barring non-trivial improvements to KVM's MMU, e.g. sharding of mmu_lock, modifying the page tables will block all other updates and MMU operations. Taking mmu_lock for read, should arm64 ever convert to a rwlock, is not an option because KVM needs to block other conversions to avoid races. 
Hmm, though batching multiple pages into a single request would mitigate most of the overhead. > There are variations of that idea: e.g. allow userspace to mmap the > entire private fd but w/o taking a reference on pages mapped with > PROT_NONE. And then the VMM can use mprotect() in response to > share/unshare requests. I think Marc liked that idea as it keeps the > userspace API closer to normal KVM -- there actually is a > straightforward gpa->hva relation. Not sure how much that would impact > the implementation at this point. > > For the shared=>private conversion, this would be something like so: > > - the guest issues a hypercall to unshare a page; > > - the hypervisor forwards the request to the host; > > - the host kernel forwards the request to userspace; > > - userspace then munmap()s the shared page; > > - KVM then tries to take a reference to the page. If it succeeds, it >re-enters the guest with a flag of some sort saying that the share >succeeded, and the hypervisor will adjust pgtables accordingly. If >KVM failed to take a reference, it flags this and the hypervisor will >be responsible for communicating that back to the guest. This means >the guest must handle failures (possibly fatal). > > (There are probably many ways in which we can optimize this, e.g. by > having the host proactively munmap() pages it no longer needs so that > the unshare hypercall from the guest doesn't need to exit all the way > back to host userspace.) ... > > Maybe there could be a special mode for the private memory fds in which > > specific pages are marked as "managed by this fd but actually shared". > > pread() and pwrite() would work on those pages, but not mmap(). (Or maybe > > mmap() but the resulting mappings would not permit GUP.) Unless I misunderstand what you intend by pread()/pwrite(), I think we'd need to allow mmap(), otherwise e.g. uaccess from the kernel wouldn't work. 
> > And transitioning them would be a special operation on the fd that is > > specific to pKVM and wouldn't work on TDX or SEV. To keep things feature agnostic (IMO, baking TDX vs SEV vs pKVM info into private-fd is a really bad idea), this could be handled by adding a flag and/or callback into the notifier/client stating whether or not it supports mapping a private-fd, and then mapping would be allowed if and only if all consumers support/allow mapping. > > Hmm. Sean and Chao, are we making a bit of a mistake by making these fds > > technology-agnostic? That is, would we want to distinguish between a TDX > > backing fd, a SEV backing fd, a software-based backing fd, etc? API-wise > > this could work by requiring the fd to be bound to a KVM VM instance and > > possibly even configured a bit before any other operations would be > > allowed. I really don't want to distinguish between between each exact feature, but I've no objection to adding flags/callbacks to track specific properties of the downstream consumers, e.g. "can this
Re: [PATCH] block/stream: Drain subtree around graph change
Am 05/04/2022 um 17:04 schrieb Kevin Wolf: > Am 05.04.2022 um 15:09 hat Emanuele Giuseppe Esposito geschrieben: >> Am 05/04/2022 um 12:14 schrieb Kevin Wolf: >>> I think all of this is really relevant for Emanuele's work, which >>> involves adding AIO_WAIT_WHILE() deep inside graph update functions. I >>> fully expect that we would see very similar problems, and just stacking >>> drain sections over drain sections that might happen to usually fix >>> things, but aren't guaranteed to, doesn't look like a good solution. >> >> Yes, I think at this point we all agreed to drop subtree_drain as >> replacement for AioContext. >> >> The alternative is what Paolo proposed in the other thread "Removal of >> AioContext lock, bs->parents and ->children: proof of concept" >> I am not sure which thread you replied to first :) > > This one, I think. :-) > >> I think that proposal is not far from your idea, and it avoids >> introducing or even using drains at all. >> Not sure why you called it a "step backwards even from AioContext locks". > > I was only referring to the lock locality there. AioContext locks are > really coarse, but still a finer granularity than a single global lock. > > In the big picture, it'd still be better than the AioContext lock, but > that's because it's a different type of lock, not because it has better > locality. > > So I was just wondering if we can't have the different type of lock and > make it local to the BDS, too. I guess this is the right time to discuss this. I think that a global lock will be easier to handle, and we already have a concrete implementation (cpus-common). I think that the reads in some sense are already BDS-specific, because each BDS that is reading has an internal flag. Writes, on the other hand, are global. If a write is happening, no other read at all can run, even if it has nothing to do with it. The question then is: how difficult would it be to implement a BDS-specific write?
From the API perspective, change bdrv_graph_wrlock(void); into bdrv_graph_wrlock(BlockDriverState *parent, BlockDriverState *child); I am not sure if/how complicated it will be. For sure all the global variables would end up in the BDS struct. On the other side, making the read generic instead could also be interesting. Think about drain: it is a recursive function, and it doesn't really make sense to take the rdlock for each node it traverses. Even though I don't know an easy way to replace the ->has_waiter and ->reading_graph flags... Emanuele
Re: [PATCH v5 00/13] KVM: mm: fd-based approach for supporting KVM guest private memory
On Tue, Apr 5, 2022, at 3:36 AM, Quentin Perret wrote: > On Monday 04 Apr 2022 at 15:04:17 (-0700), Andy Lutomirski wrote: >> >> >> On Mon, Apr 4, 2022, at 10:06 AM, Sean Christopherson wrote: >> > On Mon, Apr 04, 2022, Quentin Perret wrote: >> >> On Friday 01 Apr 2022 at 12:56:50 (-0700), Andy Lutomirski wrote: >> >> FWIW, there are a couple of reasons why I'd like to have in-place >> >> conversions: >> >> >> >> - one goal of pKVM is to migrate some things away from the Arm >> >>Trustzone environment (e.g. DRM and the likes) and into protected VMs >> >>instead. This will give Linux a fighting chance to defend itself >> >>against these things -- they currently have access to _all_ memory. >> >>And transitioning pages between Linux and Trustzone (donations and >> >>shares) is fast and non-destructive, so we really do not want pKVM to >> >>regress by requiring the hypervisor to memcpy things; >> > >> > Is there actually a _need_ for the conversion to be non-destructive? >> > E.g. I assume >> > the "trusted" side of things will need to be reworked to run as a pKVM >> > guest, at >> > which point reworking its logic to understand that conversions are >> > destructive and >> > slow-ish doesn't seem too onerous. >> > >> >> - it can be very useful for protected VMs to do shared=>private >> >>conversions. Think of a VM receiving some data from the host in a >> >>shared buffer, and then it wants to operate on that buffer without >> >>risking to leak confidential informations in a transient state. In >> >>that case the most logical thing to do is to convert the buffer back >> >>to private, do whatever needs to be done on that buffer (decrypting a >> >>frame, ...), and then share it back with the host to consume it; >> > >> > If performance is a motivation, why would the guest want to do two >> > conversions >> > instead of just doing internal memcpy() to/from a private page? 
I >> > would be quite >> > surprised if multiple exits and TLB shootdowns is actually faster, >> > especially at >> > any kind of scale where zapping stage-2 PTEs will cause lock contention >> > and IPIs. >> >> I don't know the numbers or all the details, but this is arm64, which is a >> rather better architecture than x86 in this regard. So maybe it's not so >> bad, at least in very simple cases, ignoring all implementation details. >> (But see below.) Also the systems in question tend to have fewer CPUs than >> some of the massive x86 systems out there. > > Yep. I can try and do some measurements if that's really necessary, but > I'm really convinced the cost of the TLBI for the shared->private > conversion is going to be significantly smaller than the cost of memcpy > the buffer twice in the guest for us. To be fair, although the cost for > the CPU update is going to be low, the cost for IOMMU updates _might_ be > higher, but that very much depends on the hardware. On systems that use > e.g. the Arm SMMU, the IOMMUs can use the CPU page-tables directly, and > the iotlb invalidation is done on the back of the CPU invalidation. So, > on systems with sane hardware the overhead is *really* quite small. > > Also, memcpy requires double the memory, it is pretty bad for power, and > it causes memory traffic which can't be a good thing for things running > concurrently. > >> If we actually wanted to support transitioning the same page between shared >> and private, though, we have a bit of an awkward situation. Private to >> shared is conceptually easy -- do some bookkeeping, reconstitute the direct >> map entry, and it's done. The other direction is a mess: all existing uses >> of the page need to be torn down. If the page has been recently used for >> DMA, this includes IOMMU entries. >> >> Quentin: let's ignore any API issues for now. Do you have a concept of how >> a nondestructive shared -> private transition could work well, even in >> principle? 
> > I had a high level idea for the workflow, but I haven't looked into the > implementation details. > > The idea would be to allow KVM *or* userspace to take a reference > to a page in the fd in an exclusive manner. KVM could take a reference > on a page (which would be necessary before donating it to a guest) > using some kind of memfile_notifier as proposed in this series, and > userspace could do the same some other way (mmap presumably?). In both > cases, the operation might fail. > > I would imagine the boot and private->shared flow as follows: > > - the VMM uses fallocate on the private fd, and associates the <offset, size> with a memslot; > > - the guest boots, and as part of that KVM takes references to all the > pages that are donated to the guest. If userspace happens to have a > mapping to a page, KVM will fail to take the reference, which would > be fatal for the guest. > > - once the guest has booted, it issues a hypercall to share a page back > with the host; > > - KVM is notified,
Re: [PATCH v4 10/11] tests/tcg/s390x: Tests for Vector Enhancements Facility 2
Recommendation for comment? /* vri-d encoding matches vrr for 4b imm. .insn does not handle this encoding variant. */ Christian: I will push another patch version as soon as that's decided. (unless you prefer to choose the comment and edit during staging) On Tue, Apr 5, 2022 at 6:13 AM David Hildenbrand wrote: > > On 01.04.22 17:25, Christian Borntraeger wrote: > > Am 01.04.22 um 17:02 schrieb David Miller: > >> vrr is almost a perfect match (it is for this, larger than imm4 would > >> need to be split). > >> > >> .long : this would be uglier. > >> use enough to be filled with nops after ? > >> or use a 32b and 16b instead if it's in .text it should make no difference. > > > > I will let Richard or David decide what they prefer. > > > > I don't particularly care as long as there is a comment stating why we > need this hack. > > -- > Thanks, > > David / dhildenb >
Re: [PATCH v5 0/9] Add support for AST1030 SoC
Hello Jamin, On 4/1/22 10:38, Jamin Lin wrote: Changes from v5: - remove TYPE_ASPEED_MINIBMC_MACHINE and ASPEED_MINIBMC_MACHINE - remove ast1030_machine_instance_init function Changes from v4: - drop the ASPEED_SMC_FEATURE_WDT_CONTROL flag in hw/ssi/aspeed_smc.c Changes from v3: - remove AspeedMiniBmcMachineState state structure and AspeedMiniBmcMachineClass class - remove redundant new line in hw/arm/aspeed_ast10xx.c Do we want to be in sync with the zephyr naming and use ast10x0.c ? https://github.com/zephyrproject-rtos/zephyr/tree/main/soc/arm/aspeed This is just a question. Don't resend for this. Thanks, C.
Re: [PATCH v3 3/3] qcow2: Add errp to rebuild_refcount_structure()
On Tue, Apr 05, 2022 at 03:46:52PM +0200, Hanna Reitz wrote: > Instead of fprint()-ing error messages in rebuild_refcount_structure() > and its rebuild_refcounts_write_refblocks() helper, pass them through an > Error object to qcow2_check_refcounts() (which will then print it). > > Suggested-by: Eric Blake > Signed-off-by: Hanna Reitz > --- > block/qcow2-refcount.c | 33 +++-- > 1 file changed, 19 insertions(+), 14 deletions(-) > > diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c > index c5669eaa51..ed0ecfaa89 100644 > --- a/block/qcow2-refcount.c > +++ b/block/qcow2-refcount.c > @@ -2465,7 +2465,8 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs, > static int rebuild_refcounts_write_refblocks( > BlockDriverState *bs, void **refcount_table, int64_t *nb_clusters, > int64_t first_cluster, int64_t end_cluster, > -uint64_t **on_disk_reftable_ptr, uint32_t > *on_disk_reftable_entries_ptr > +uint64_t **on_disk_reftable_ptr, uint32_t > *on_disk_reftable_entries_ptr, > +Error **errp > ) > { > BDRVQcow2State *s = bs->opaque; > @@ -2516,8 +2517,8 @@ static int rebuild_refcounts_write_refblocks( >nb_clusters, >_free_cluster); > if (refblock_offset < 0) { > -fprintf(stderr, "ERROR allocating refblock: %s\n", > -strerror(-refblock_offset)); > +error_setg_errno(errp, -refblock_offset, > + "ERROR allocating refblock"); Most uses of error_setg* don't ALL_CAPS the first word. But this is pre-existing, so I'm not insisting you change it here. > return refblock_offset; > } > > @@ -2539,6 +2540,7 @@ static int rebuild_refcounts_write_refblocks( >on_disk_reftable_entries * >REFTABLE_ENTRY_SIZE); > if (!on_disk_reftable) { > +error_setg(errp, "ERROR allocating reftable memory"); > return -ENOMEM; Ah, so this is also a corner case bug fix, where we didn't have a message on all error paths. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [qemu.qmp PATCH 10/13] docs: add versioning policy to README
On Tue, Apr 5, 2022, 5:16 AM Damien Hedde wrote: > > > On 3/30/22 20:24, John Snow wrote: > > The package is in an alpha state, but there's a method to the madness. > > > > Signed-off-by: John Snow > > --- > > README.rst | 21 + > > 1 file changed, 21 insertions(+) > > > > diff --git a/README.rst b/README.rst > > index 8593259..88efe84 100644 > > --- a/README.rst > > +++ b/README.rst > > @@ -154,6 +154,27 @@ fail. These checks use their own virtual > environments and won't pollute > > your working space. > > > > > > +Stability and Versioning > > + > > + > > +This package uses a major.minor.micro SemVer versioning, with the > > +following additional semantics during the alpha/beta period (Major > > +version 0): > > + > > +This package treats 0.0.z versions as "alpha" versions. Each micro > > +version update may change the API incompatibly. Early users are advised > > +to pin against explicit versions, but check for updates often. > > + > > +A planned 0.1.z version will introduce the first "beta", whereafter each > > +micro update will be backwards compatible, but each minor update will > > +not be. The first beta version will be released after legacy.py is > > +removed, and the API is tentatively "stable". > > + > > +Thereafter, normal SemVer/PEP440 rules will apply; micro updates will > > +always be bugfixes, and minor updates will be reserved for backwards > > +compatible feature changes. > > + > > + > > Changelog > > - > > > > Looks reasonable to me. > Reviewed-by: Damien Hedde > Thanks! I'm hoping to make it easier to spin up more dev tooling outside of the qemu tree. If you've got any wishlist items, feel free to let me know. It's still early days for Python packages outside of the qemu tree, so nearly everything is on the table still. (the jsnow/python staging branch has some 17 patches in it that will be checked in to QEMU when development re-opens. The forked qemu.qmp repo will be based off of qemu.git after those patches go in. 
There's a bit of shakeup where I delete the old qmp lib and replace it with what's currently aqmp. It should hopefully not be a huge nuisance to your work, but if there's issues, let me know.) Thanks, --John Snow
Re: [RFC PATCH] docs/devel: start documenting writing VirtIO devices
Cornelia Huck writes: > On Wed, Mar 16 2022, Alex Bennée wrote: > >> Cornelia Huck writes: >> >>> On Wed, Mar 09 2022, Alex Bennée wrote: > +Writing VirtIO backends for QEMU + + +This document attempts to outline the information a developer needs to +know to write backends for QEMU. It is specifically focused on +implementing VirtIO devices. >>> >>> I think you first need to define a bit more clearly what you consider a >>> "backend". For virtio, it is probably "everything a device needs to >>> function as a specific device type like net, block, etc., which may be >>> implemented by different methods" (as you describe further below). >> >> How about: >> >> This document attempts to outline the information a developer needs to >> know to write device emulations in QEMU. It is specifically focused on >> implementing VirtIO devices. For VirtIO, the frontend is the driver >> running on the guest. The backend is everything that QEMU needs to >> do to handle the emulation of the VirtIO device. This can be done >> entirely in QEMU, divided between QEMU and the kernel (vhost), or >> handled by a separate process which is configured by QEMU >> (vhost-user). > > I'm afraid that confuses me even more :) > > This sounds to me like frontend == driver (in virtio spec terminology) > and backend == device. Is that really what you meant? I think so. To be honest it's the different types of backend (in QEMU, vhost and vhost-user) I'm trying to be clear about here. The frontend/driver is just mentioned for completeness. > >> >>> + +Front End Transports + + +VirtIO supports a number of different front end transports. The +details of the device remain the same but there are differences in +command line for specifying the device (e.g. -device virtio-foo +and -device virtio-foo-pci). For example: + +.. 
code:: c + + static const TypeInfo vhost_user_blk_info = { + .name = TYPE_VHOST_USER_BLK, + .parent = TYPE_VIRTIO_DEVICE, + .instance_size = sizeof(VHostUserBlk), + .instance_init = vhost_user_blk_instance_init, + .class_init = vhost_user_blk_class_init, + }; + +defines ``TYPE_VHOST_USER_BLK`` as a child of the generic +``TYPE_VIRTIO_DEVICE``. >>> >>> That's not what I'd consider a "front end", though? >> >> Yeah, clumsy wording. I'm trying to find a good example to show how >> QOM can be used to abstract the core device operation and the wrappers >> for different transports. However, in the code base there seems to be >> considerable variation in how this is done. Any advice as to the >> best exemplary device to follow is greatly welcomed. > > I'm not sure which of the examples we can really consider a "good" > device; the normal modus operandi when writing a new device seems to be > "pick the first device you can think of and copy whatever it > does". Yeah, the QEMU curse. Hence trying to document the "best" approach, or at least make the picking of a reference a little less random ;-) > Personally, I usually look at blk or net, but those carry a lot of > legacy baggage; so maybe a modern virtio-1-only device like gpu? That > one also has the advantage of not being pci-only. > > Does anyone else have a good suggestion here? Sorry, I totally forgot to include you in the Cc of the v1 posting: Subject: [PATCH v1 09/13] docs/devel: start documenting writing VirtIO devices Date: Mon, 21 Mar 2022 15:30:33 + Message-Id: <20220321153037.3622127-10-alex.ben...@linaro.org> although expect a v2 soonish (once I can get a reasonable qos-test vhost-user test working). > >> And then for the PCI device it wraps around the +base device (although explicitly initialising via +virtio_instance_init_common): + +.. 
code:: c + + struct VHostUserBlkPCI { + VirtIOPCIProxy parent_obj; + VHostUserBlk vdev; + }; >>> >>> The VirtIOPCIProxy seems to materialize a bit out of thin air >>> here... maybe the information simply needs to be structured in a >>> different way? Perhaps: >>> >>> - describe that virtio devices consist of a part that implements the >>> device functionality, which ultimately derives from VirtIODevice (the >>> "backend"), and a part that exposes a way for the operating system to >>> discover and use the device (the "frontend", what the virtio spec >>> calls a "transport") >>> - decribe how the "frontend" part works (maybe mention VirtIOPCIProxy, >>> VirtIOMMIOProxy, and VirtioCcwDevice as specialized proxy devices for >>> PCI, MMIO, and CCW devices) >>> - list the different types of "backends" (as you did below), and give >>> two examples of how VirtIODevice is extended (a plain one, and a >>> vhost-user one) >>> - explain how frontend and backend together create an actual device >>> (with the two device
Re: [PATCH] docs/ccid: convert to restructuredText
On 4/5/22 16:29, oxr...@gmx.us wrote: From: Lucas Ramage Buglink: https://gitlab.com/qemu-project/qemu/-/issues/527 Signed-off-by: Lucas Ramage Provided 2 minors tweaks (see below: missing empty line, and empty line at EOF), Reviewed-by: Damien Hedde Note that I'm not competent regarding the content of this doc. But it corresponds to the previous version and the doc generation works. --- docs/ccid.txt| 182 --- docs/system/device-emulation.rst | 1 + docs/system/devices/ccid.rst | 171 + 3 files changed, 172 insertions(+), 182 deletions(-) delete mode 100644 docs/ccid.txt create mode 100644 docs/system/devices/ccid.rst diff --git a/docs/ccid.txt b/docs/ccid.txt deleted file mode 100644 index 2b85b1bd42..00 --- a/docs/ccid.txt +++ /dev/null @@ -1,182 +0,0 @@ -QEMU CCID Device Documentation. - -Contents -1. USB CCID device -2. Building -3. Using ccid-card-emulated with hardware -4. Using ccid-card-emulated with certificates -5. Using ccid-card-passthru with client side hardware -6. Using ccid-card-passthru with client side certificates -7. Passthrough protocol scenario -8. libcacard - -1. USB CCID device - -The USB CCID device is a USB device implementing the CCID specification, which -lets one connect smart card readers that implement the same spec. For more -information see the specification: - - Universal Serial Bus - Device Class: Smart Card - CCID - Specification for - Integrated Circuit(s) Cards Interface Devices - Revision 1.1 - April 22rd, 2005 - -Smartcards are used for authentication, single sign on, decryption in -public/private schemes and digital signatures. A smartcard reader on the client -cannot be used on a guest with simple usb passthrough since it will then not be -available on the client, possibly locking the computer when it is "removed". On -the other hand this device can let you use the smartcard on both the client and -the guest machine. It is also possible to have a completely virtual smart card -reader and smart card (i.e. 
not backed by a physical device) using this device. - -2. Building - -The cryptographic functions and access to the physical card is done via the -libcacard library, whose development package must be installed prior to -building QEMU: - -In redhat/fedora: -yum install libcacard-devel -In ubuntu: -apt-get install libcacard-dev - -Configuring and building: -./configure --enable-smartcard && make - - -3. Using ccid-card-emulated with hardware - -Assuming you have a working smartcard on the host with the current -user, using libcacard, QEMU acts as another client using ccid-card-emulated: - -qemu -usb -device usb-ccid -device ccid-card-emulated - - -4. Using ccid-card-emulated with certificates stored in files - -You must create the CA and card certificates. This is a one time process. -We use NSS certificates: - -mkdir fake-smartcard -cd fake-smartcard -certutil -N -d sql:$PWD -certutil -S -d sql:$PWD -s "CN=Fake Smart Card CA" -x -t TC,TC,TC -n fake-smartcard-ca -certutil -S -d sql:$PWD -t ,, -s "CN=John Doe" -n id-cert -c fake-smartcard-ca -certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (signing)" --nsCertType smime -n signing-cert -c fake-smartcard-ca -certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (encryption)" --nsCertType sslClient -n encryption-cert -c fake-smartcard-ca - -Note: you must have exactly three certificates. 
- -You can use the emulated card type with the certificates backend: - -qemu -usb -device usb-ccid -device ccid-card-emulated,backend=certificates,db=sql:$PWD,cert1=id-cert,cert2=signing-cert,cert3=encryption-cert - -To use the certificates in the guest, export the CA certificate: - -certutil -L -r -d sql:$PWD -o fake-smartcard-ca.cer -n fake-smartcard-ca - -and import it in the guest: - -certutil -A -d /etc/pki/nssdb -i fake-smartcard-ca.cer -t TC,TC,TC -n fake-smartcard-ca - -In a Linux guest you can then use the CoolKey PKCS #11 module to access -the card: - -certutil -d /etc/pki/nssdb -L -h all - -It will prompt you for the PIN (which is the password you assigned to the -certificate database early on), and then show you all three certificates -together with the manually imported CA cert: - -Certificate NicknameTrust Attributes -fake-smartcard-ca CT,C,C -John Doe:CAC ID Certificate u,u,u -John Doe:CAC Email Signature Certificateu,u,u -John Doe:CAC Email Encryption Certificate u,u,u - -If this does not happen, CoolKey is not installed or not registered with -NSS. Registration can be done from Firefox or the command line: - -modutil -dbdir /etc/pki/nssdb -add "CAC Module" -libfile /usr/lib64/pkcs11/libcoolkeypk11.so -modutil -dbdir /etc/pki/nssdb -list - - -5. Using ccid-card-passthru with client side hardware - -on the host specify the ccid-card-passthru device
[RFC v2 7/8] blkio: implement BDRV_REQ_REGISTERED_BUF optimization
Avoid bounce buffers when QEMUIOVector elements are within previously registered bdrv_register_buf() buffers. The idea is that emulated storage controllers will register guest RAM using bdrv_register_buf() and set the BDRV_REQ_REGISTERED_BUF flag on I/O requests. Therefore no blkio_add_mem_region() calls are necessary in the performance-critical I/O code path. This optimization doesn't apply if the I/O buffer is internally allocated by QEMU (e.g. qcow2 metadata). There we still take the slow path because BDRV_REQ_REGISTERED_BUF is not set. Signed-off-by: Stefan Hajnoczi --- block/blkio.c | 106 +++--- 1 file changed, 101 insertions(+), 5 deletions(-) diff --git a/block/blkio.c b/block/blkio.c index 562e972003..41894c7015 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -1,7 +1,9 @@ #include "qemu/osdep.h" #include <blkio.h> #include "block/block_int.h" +#include "exec/memory.h" #include "qapi/error.h" +#include "qemu/error-report.h" #include "qapi/qmp/qdict.h" #include "qemu/module.h" @@ -25,6 +27,9 @@ typedef struct { /* Can we skip adding/deleting blkio_mem_regions? */ bool needs_mem_regions; + +/* Are file descriptors necessary for blkio_mem_regions?
*/ +bool needs_mem_region_fd; } BDRVBlkioState; static void blkio_aiocb_complete(BlkioAIOCB *acb, int ret) @@ -157,6 +162,8 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset, BlockCompletionFunc *cb, void *opaque) { BDRVBlkioState *s = bs->opaque; +bool needs_mem_regions = +s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF); struct iovec *iov = qiov->iov; int iovcnt = qiov->niov; BlkioAIOCB *acb; @@ -166,7 +173,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset, acb = blkio_aiocb_get(bs, cb, opaque); -if (s->needs_mem_regions) { +if (needs_mem_regions) { if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) { qemu_aio_unref(&acb->common); return NULL; @@ -181,7 +188,7 @@ static BlockAIOCB *blkio_aio_preadv(BlockDriverState *bs, int64_t offset, ret = blkioq_readv(s->blkioq, offset, iov, iovcnt, acb, 0); if (ret < 0) { -if (s->needs_mem_regions) { +if (needs_mem_regions) { blkio_free_mem_region(s->blkio, &acb->mem_region); qemu_iovec_destroy(&acb->qiov); } @@ -202,6 +209,8 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset, { uint32_t blkio_flags = (flags & BDRV_REQ_FUA) ?
BLKIO_REQ_FUA : 0; BDRVBlkioState *s = bs->opaque; +bool needs_mem_regions = +s->needs_mem_regions && !(flags & BDRV_REQ_REGISTERED_BUF); struct iovec *iov = qiov->iov; int iovcnt = qiov->niov; BlkioAIOCB *acb; @@ -211,7 +220,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset, acb = blkio_aiocb_get(bs, cb, opaque); -if (s->needs_mem_regions) { +if (needs_mem_regions) { if (blkio_aiocb_init_mem_region_locked(acb, bytes) < 0) { qemu_aio_unref(&acb->common); return NULL; @@ -225,7 +234,7 @@ static BlockAIOCB *blkio_aio_pwritev(BlockDriverState *bs, int64_t offset, ret = blkioq_writev(s->blkioq, offset, iov, iovcnt, acb, blkio_flags); if (ret < 0) { -if (s->needs_mem_regions) { +if (needs_mem_regions) { blkio_free_mem_region(s->blkio, &acb->mem_region); } qemu_aio_unref(&acb->common); @@ -273,6 +282,80 @@ static void blkio_io_unplug(BlockDriverState *bs) } } +static void blkio_register_buf(BlockDriverState *bs, void *host, size_t size) +{ +BDRVBlkioState *s = bs->opaque; +int ret; +struct blkio_mem_region region = (struct blkio_mem_region){ +.addr = host, +.len = size, +.fd = -1, +}; + +if (((uintptr_t)host | size) % s->mem_region_alignment) { +error_report_once("%s: skipping unaligned buf %p with size %zu", + __func__, host, size); +return; /* skip unaligned */ +} + +/* Attempt to find the fd for a MemoryRegion */ +if (s->needs_mem_region_fd) { +int fd = -1; +ram_addr_t offset; +MemoryRegion *mr; + +/* + * bdrv_register_buf() is called with the BQL held so mr lives at least + * until this function returns. + */ +mr = memory_region_from_host(host, &offset); +if (mr) { +fd = memory_region_get_fd(mr); +} +if (fd == -1) { +error_report_once("%s: skipping fd-less buf %p with size %zu", + __func__, host, size); +return; /* skip if there is no fd */ +} + +region.fd = fd; +region.fd_offset = offset; +} + +WITH_QEMU_LOCK_GUARD(&s->lock) { +ret = blkio_add_mem_region(s->blkio, &region); +} + +if (ret < 0) { +error_report_once("Failed to add blkio mem
[RFC v2 8/8] virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint
Register guest RAM using BlockRAMRegistrar and set the BDRV_REQ_REGISTERED_BUF flag so block drivers can optimize memory accesses in I/O requests. This is for vdpa-blk, vhost-user-blk, and other I/O interfaces that rely on DMA mapping/unmapping. Signed-off-by: Stefan Hajnoczi --- include/hw/virtio/virtio-blk.h | 2 ++ hw/block/virtio-blk.c | 13 + 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h index d311c57cca..7f589b4146 100644 --- a/include/hw/virtio/virtio-blk.h +++ b/include/hw/virtio/virtio-blk.h @@ -19,6 +19,7 @@ #include "hw/block/block.h" #include "sysemu/iothread.h" #include "sysemu/block-backend.h" +#include "sysemu/block-ram-registrar.h" #include "qom/object.h" #define TYPE_VIRTIO_BLK "virtio-blk-device" @@ -64,6 +65,7 @@ struct VirtIOBlock { struct VirtIOBlockDataPlane *dataplane; uint64_t host_features; size_t config_size; +BlockRAMRegistrar blk_ram_registrar; }; typedef struct VirtIOBlockReq { diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index 540c38f829..a18cf05f14 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -21,6 +21,7 @@ #include "hw/block/block.h" #include "hw/qdev-properties.h" #include "sysemu/blockdev.h" +#include "sysemu/block-ram-registrar.h" #include "sysemu/sysemu.h" #include "sysemu/runstate.h" #include "hw/virtio/virtio-blk.h" @@ -421,11 +422,13 @@ static inline void submit_requests(BlockBackend *blk, MultiReqBuffer *mrb, } if (is_write) { -blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0, -virtio_blk_rw_complete, mrb->reqs[start]); +blk_aio_pwritev(blk, sector_num << BDRV_SECTOR_BITS, qiov, +BDRV_REQ_REGISTERED_BUF, virtio_blk_rw_complete, +mrb->reqs[start]); } else { -blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov, 0, - virtio_blk_rw_complete, mrb->reqs[start]); +blk_aio_preadv(blk, sector_num << BDRV_SECTOR_BITS, qiov, + BDRV_REQ_REGISTERED_BUF, virtio_blk_rw_complete, + mrb->reqs[start]); } } @@ 
-1228,6 +1231,7 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) } s->change = qemu_add_vm_change_state_handler(virtio_blk_dma_restart_cb, s); +blk_ram_registrar_init(&s->blk_ram_registrar, s->blk); blk_set_dev_ops(s->blk, &virtio_block_ops, s); blk_set_guest_block_size(s->blk, s->conf.conf.logical_block_size); @@ -1255,6 +1259,7 @@ static void virtio_blk_device_unrealize(DeviceState *dev) } qemu_coroutine_decrease_pool_batch_size(conf->num_queues * conf->queue_size / 2); +blk_ram_registrar_destroy(&s->blk_ram_registrar); qemu_del_vm_change_state_handler(s->change); blockdev_mark_auto_del(s->blk); virtio_cleanup(vdev); -- 2.35.1
[RFC v2 5/8] block: add BlockRAMRegistrar
Emulated devices and other BlockBackend users wishing to take advantage of blk_register_buf() all have the same repetitive job: register RAMBlocks with the BlockBackend using RAMBlockNotifier. Add a BlockRAMRegistrar API to do this. A later commit will use this from hw/block/virtio-blk.c. Signed-off-by: Stefan Hajnoczi --- MAINTAINERS | 1 + include/sysemu/block-ram-registrar.h | 30 + block/block-ram-registrar.c | 39 block/meson.build| 1 + 4 files changed, 71 insertions(+) create mode 100644 include/sysemu/block-ram-registrar.h create mode 100644 block/block-ram-registrar.c diff --git a/MAINTAINERS b/MAINTAINERS index d839301f68..655f79c9f7 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -2462,6 +2462,7 @@ F: block* F: block/ F: hw/block/ F: include/block/ +F: include/sysemu/block-*.h F: qemu-img* F: docs/tools/qemu-img.rst F: qemu-io* diff --git a/include/sysemu/block-ram-registrar.h b/include/sysemu/block-ram-registrar.h new file mode 100644 index 00..09d63f64b2 --- /dev/null +++ b/include/sysemu/block-ram-registrar.h @@ -0,0 +1,30 @@ +/* + * BlockBackend RAM Registrar + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#ifndef BLOCK_RAM_REGISTRAR_H +#define BLOCK_RAM_REGISTRAR_H + +#include "exec/ramlist.h" + +/** + * struct BlockRAMRegistrar: + * + * Keeps RAMBlock memory registered with a BlockBackend using + * blk_register_buf() including hotplugged memory. + * + * Emulated devices or other BlockBackend users initialize a BlockRAMRegistrar + * with blk_ram_registrar_init() before submitting I/O requests with the + * BLK_REQ_REGISTERED_BUF flag set. 
+ */ +typedef struct { +BlockBackend *blk; +RAMBlockNotifier notifier; +} BlockRAMRegistrar; + +void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk); +void blk_ram_registrar_destroy(BlockRAMRegistrar *r); + +#endif /* BLOCK_RAM_REGISTRAR_H */ diff --git a/block/block-ram-registrar.c b/block/block-ram-registrar.c new file mode 100644 index 00..32a14b69ae --- /dev/null +++ b/block/block-ram-registrar.c @@ -0,0 +1,39 @@ +/* + * BlockBackend RAM Registrar + * + * SPDX-License-Identifier: GPL-2.0-or-later + */ + +#include "qemu/osdep.h" +#include "sysemu/block-backend.h" +#include "sysemu/block-ram-registrar.h" + +static void ram_block_added(RAMBlockNotifier *n, void *host, size_t size, +size_t max_size) +{ +BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier); +blk_register_buf(r->blk, host, max_size); +} + +static void ram_block_removed(RAMBlockNotifier *n, void *host, size_t size, + size_t max_size) +{ +BlockRAMRegistrar *r = container_of(n, BlockRAMRegistrar, notifier); +blk_unregister_buf(r->blk, host, max_size); +} + +void blk_ram_registrar_init(BlockRAMRegistrar *r, BlockBackend *blk) +{ +r->blk = blk; +r->notifier = (RAMBlockNotifier){ +.ram_block_added = ram_block_added, +.ram_block_removed = ram_block_removed, +}; + +ram_block_notifier_add(&r->notifier); +} + +void blk_ram_registrar_destroy(BlockRAMRegistrar *r) +{ +ram_block_notifier_remove(&r->notifier); +} diff --git a/block/meson.build b/block/meson.build index 787667384a..b315593054 100644 --- a/block/meson.build +++ b/block/meson.build @@ -46,6 +46,7 @@ block_ss.add(files( ), zstd, zlib, gnutls) softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c')) +softmmu_ss.add(files('block-ram-registrar.c')) if get_option('qcow1').allowed() block_ss.add(files('qcow.c')) -- 2.35.1
[RFC v2 2/8] numa: call ->ram_block_removed() in ram_block_notifer_remove()
When a RAMBlockNotifier is added, ->ram_block_added() is called with all existing RAMBlocks. There is no equivalent ->ram_block_removed() call when a RAMBlockNotifier is removed. The util/vfio-helpers.c code (the sole user of RAMBlockNotifier) is fine with this asymmetry because it does not rely on RAMBlockNotifier for cleanup. It walks its internal list of DMA mappings and unmaps them by itself. Future users of RAMBlockNotifier may not have an internal data structure that records added RAMBlocks so they will need ->ram_block_removed() callbacks. This patch makes ram_block_notifier_remove() symmetric with respect to callbacks. Now util/vfio-helpers.c needs to unmap remaining DMA mappings after ram_block_notifier_remove() has been called. This is necessary since users like block/nvme.c may create additional DMA mappings that do not originate from the RAMBlockNotifier. Reviewed-by: David Hildenbrand Signed-off-by: Stefan Hajnoczi --- hw/core/numa.c | 17 + util/vfio-helpers.c | 5 - 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/hw/core/numa.c b/hw/core/numa.c index 1aa05dcf42..6bf9694d20 100644 --- a/hw/core/numa.c +++ b/hw/core/numa.c @@ -822,6 +822,19 @@ static int ram_block_notify_add_single(RAMBlock *rb, void *opaque) return 0; } +static int ram_block_notify_remove_single(RAMBlock *rb, void *opaque) +{ +const ram_addr_t max_size = qemu_ram_get_max_length(rb); +const ram_addr_t size = qemu_ram_get_used_length(rb); +void *host = qemu_ram_get_host_addr(rb); +RAMBlockNotifier *notifier = opaque; + +if (host) { +notifier->ram_block_removed(notifier, host, size, max_size); +} +return 0; +} + void ram_block_notifier_add(RAMBlockNotifier *n) { QLIST_INSERT_HEAD(&ram_list.ramblock_notifiers, n, next); @@ -835,6 +848,10 @@ void ram_block_notifier_add(RAMBlockNotifier *n) void ram_block_notifier_remove(RAMBlockNotifier *n) { QLIST_REMOVE(n, next); + +if (n->ram_block_removed) { +qemu_ram_foreach_block(ram_block_notify_remove_single, n); +} } void
ram_block_notify_add(void *host, size_t size, size_t max_size) diff --git a/util/vfio-helpers.c b/util/vfio-helpers.c index b037d5faa5..dc90496592 100644 --- a/util/vfio-helpers.c +++ b/util/vfio-helpers.c @@ -847,10 +847,13 @@ void qemu_vfio_close(QEMUVFIOState *s) if (!s) { return; } + +ram_block_notifier_remove(&s->ram_notifier); + for (i = 0; i < s->nr_mappings; ++i) { qemu_vfio_undo_mapping(s, &s->mappings[i], NULL); } -ram_block_notifier_remove(&s->ram_notifier); + g_free(s->usable_iova_ranges); s->nb_iova_ranges = 0; qemu_vfio_reset(s); -- 2.35.1
[RFC v2 6/8] stubs: add memory_region_from_host() and memory_region_get_fd()
The blkio block driver will need to look up the file descriptor for a given pointer. This is possible in softmmu builds where the memory API is available for querying guest RAM. Add stubs so tools like qemu-img that link the block layer still build successfully. In this case there is no guest RAM but that is fine. Bounce buffers and their file descriptors will be allocated with libblkio's blkio_alloc_mem_region() so we won't rely on QEMU's memory_region_get_fd() in that case. Signed-off-by: Stefan Hajnoczi --- stubs/memory.c| 13 + stubs/meson.build | 1 + 2 files changed, 14 insertions(+) create mode 100644 stubs/memory.c diff --git a/stubs/memory.c b/stubs/memory.c new file mode 100644 index 00..e9ec4e384b --- /dev/null +++ b/stubs/memory.c @@ -0,0 +1,13 @@ +#include "qemu/osdep.h" +#include "exec/memory.h" + +MemoryRegion *memory_region_from_host(void *host, ram_addr_t *offset) +{ +return NULL; +} + +int memory_region_get_fd(MemoryRegion *mr) +{ +return -1; +} + diff --git a/stubs/meson.build b/stubs/meson.build index 6f80fec761..1e274d2db2 100644 --- a/stubs/meson.build +++ b/stubs/meson.build @@ -25,6 +25,7 @@ stub_ss.add(files('is-daemonized.c')) if libaio.found() stub_ss.add(files('linux-aio.c')) endif +stub_ss.add(files('memory.c')) stub_ss.add(files('migr-blocker.c')) stub_ss.add(files('module-opts.c')) stub_ss.add(files('monitor.c')) -- 2.35.1
[RFC v2 4/8] block: add BDRV_REQ_REGISTERED_BUF request flag
Block drivers may optimize I/O requests accessing buffers previously registered with bdrv_register_buf(). Checking whether all elements of a request's QEMUIOVector are within previously registered buffers is expensive, so we need a hint from the user to avoid costly checks. Add a BDRV_REQ_REGISTERED_BUF request flag to indicate that all QEMUIOVector elements in an I/O request are known to be within previously registered buffers. bdrv_aligned_preadv() is strict in validating supported read flags and its assertions fail when it sees BDRV_REQ_REGISTERED_BUF. There is no harm in passing BDRV_REQ_REGISTERED_BUF to block drivers that do not support it, so update the assertions to ignore BDRV_REQ_REGISTERED_BUF. Care must be taken to clear the flag when the block layer or filter drivers replace QEMUIOVector elements with bounce buffers since these have not been registered with bdrv_register_buf(). A lot of the changes in this commit deal with clearing the flag in those cases. Ensuring that the flag is cleared properly is somewhat invasive to implement across the block layer and it's hard to spot when future code changes accidentally break it. Another option might be to add a flag to QEMUIOVector itself and clear it in qemu_iovec_*() functions that modify elements. That is more robust but somewhat of a layering violation, so I haven't attempted that. Signed-off-by: Stefan Hajnoczi --- include/block/block-common.h | 9 + block/blkverify.c| 4 ++-- block/crypto.c | 2 ++ block/io.c | 30 +++--- block/mirror.c | 2 ++ block/raw-format.c | 2 ++ 6 files changed, 40 insertions(+), 9 deletions(-) diff --git a/include/block/block-common.h b/include/block/block-common.h index fdb7306e78..061606e867 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -80,6 +80,15 @@ typedef enum { */ BDRV_REQ_MAY_UNMAP = 0x4, +/* + * An optimization hint when all QEMUIOVector elements are within + * previously registered bdrv_register_buf() memory ranges. 
+ * + * Code that replaces the user's QEMUIOVector elements with bounce buffers + * must take care to clear this flag. + */ +BDRV_REQ_REGISTERED_BUF = 0x8, + BDRV_REQ_FUA= 0x10, BDRV_REQ_WRITE_COMPRESSED = 0x20, diff --git a/block/blkverify.c b/block/blkverify.c index e4a37af3b2..d624f4fd05 100644 --- a/block/blkverify.c +++ b/block/blkverify.c @@ -235,8 +235,8 @@ blkverify_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes, qemu_iovec_init(&raw_qiov, qiov->niov); qemu_iovec_clone(&raw_qiov, qiov, buf); -ret = blkverify_co_prwv(bs, &r, offset, bytes, qiov, &raw_qiov, flags, -false); +ret = blkverify_co_prwv(bs, &r, offset, bytes, qiov, &raw_qiov, +flags & ~BDRV_REQ_REGISTERED_BUF, false); cmp_offset = qemu_iovec_compare(qiov, &raw_qiov); if (cmp_offset != -1) { diff --git a/block/crypto.c b/block/crypto.c index 1ba82984ef..c900355adb 100644 --- a/block/crypto.c +++ b/block/crypto.c @@ -473,6 +473,8 @@ block_crypto_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes, uint64_t sector_size = qcrypto_block_get_sector_size(crypto->block); uint64_t payload_offset = qcrypto_block_get_payload_offset(crypto->block); +flags &= ~BDRV_REQ_REGISTERED_BUF; + assert(!(flags & ~BDRV_REQ_FUA)); assert(payload_offset < INT64_MAX); assert(QEMU_IS_ALIGNED(offset, sector_size)); diff --git a/block/io.c b/block/io.c index a8a7920e29..139e36c2e1 100644 --- a/block/io.c +++ b/block/io.c @@ -1556,11 +1556,14 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX), align); -/* TODO: We would need a per-BDS .supported_read_flags and +/* + * TODO: We would need a per-BDS .supported_read_flags and * potential fallback support, if we ever implement any read flags * to pass through to drivers. For now, there aren't any - * passthrough flags. */ -assert(!(flags & ~(BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH))); + * passthrough flags except the BDRV_REQ_REGISTERED_BUF optimization hint.
+ */ +assert(!(flags & ~(BDRV_REQ_COPY_ON_READ | BDRV_REQ_PREFETCH | + BDRV_REQ_REGISTERED_BUF))); /* Handle Copy on Read and associated serialisation */ if (flags & BDRV_REQ_COPY_ON_READ) { @@ -1601,7 +1604,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, goto out; } -assert(!(flags & ~bs->supported_read_flags)); +assert(!(flags & ~(bs->supported_read_flags | BDRV_REQ_REGISTERED_BUF))); max_bytes = ROUND_UP(MAX(0, total_bytes - offset), align); if (bytes <= max_bytes && bytes <= max_transfer) { @@ -1790,7 +1793,8 @@ static void
[RFC v2 3/8] block: pass size to bdrv_unregister_buf()
The only implementor of bdrv_register_buf() is block/nvme.c, where the size is not needed when unregistering a buffer. This is because util/vfio-helpers.c can look up mappings by address. Future block drivers that implement bdrv_register_buf() may not be able to do their job given only the buffer address. Add a size argument to bdrv_unregister_buf(). Also document the assumptions about bdrv_register_buf()/bdrv_unregister_buf() calls. The same values that were given to bdrv_register_buf() must be given to bdrv_unregister_buf(). gcc 11.2.1 emits a spurious warning that img_bench()'s buf_size local variable might be uninitialized, so it's necessary to silence the compiler. Signed-off-by: Stefan Hajnoczi --- include/block/block-global-state.h | 5 - include/block/block_int-common.h| 2 +- include/sysemu/block-backend-global-state.h | 2 +- block/block-backend.c | 4 ++-- block/io.c | 6 +++--- block/nvme.c| 2 +- qemu-img.c | 4 ++-- 7 files changed, 14 insertions(+), 11 deletions(-) diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h index 25bb69bbef..2295a7c767 100644 --- a/include/block/block-global-state.h +++ b/include/block/block-global-state.h @@ -244,9 +244,12 @@ void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp); * Register/unregister a buffer for I/O. For example, VFIO drivers are * interested to know the memory areas that would later be used for I/O, so * that they can prepare IOMMU mapping etc., to get better performance. + * + * Buffers must not overlap and they must be unregistered with the same values that they were registered with. 
*/ void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size); -void bdrv_unregister_buf(BlockDriverState *bs, void *host); +void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size); void bdrv_cancel_in_flight(BlockDriverState *bs); diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index 8947abab76..b7a7cbd3a5 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -435,7 +435,7 @@ struct BlockDriver { * DMA mapping for hot buffers. */ void (*bdrv_register_buf)(BlockDriverState *bs, void *host, size_t size); -void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host); +void (*bdrv_unregister_buf)(BlockDriverState *bs, void *host, size_t size); /* * This field is modified only under the BQL, and is part of diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h index 2e93a74679..989ec0364b 100644 --- a/include/sysemu/block-backend-global-state.h +++ b/include/sysemu/block-backend-global-state.h @@ -107,7 +107,7 @@ void blk_io_limits_update_group(BlockBackend *blk, const char *group); void blk_set_force_allow_inactivate(BlockBackend *blk); void blk_register_buf(BlockBackend *blk, void *host, size_t size); -void blk_unregister_buf(BlockBackend *blk, void *host); +void blk_unregister_buf(BlockBackend *blk, void *host, size_t size); const BdrvChild *blk_root(BlockBackend *blk); diff --git a/block/block-backend.c b/block/block-backend.c index e0e1aff4b1..8af00d8a36 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2591,10 +2591,10 @@ void blk_register_buf(BlockBackend *blk, void *host, size_t size) bdrv_register_buf(blk_bs(blk), host, size); } -void blk_unregister_buf(BlockBackend *blk, void *host) +void blk_unregister_buf(BlockBackend *blk, void *host, size_t size) { GLOBAL_STATE_CODE(); -bdrv_unregister_buf(blk_bs(blk), host); +bdrv_unregister_buf(blk_bs(blk), host, size); } int coroutine_fn 
blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, diff --git a/block/io.c b/block/io.c index 3280144a17..a8a7920e29 100644 --- a/block/io.c +++ b/block/io.c @@ -3365,16 +3365,16 @@ void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size) } } -void bdrv_unregister_buf(BlockDriverState *bs, void *host) +void bdrv_unregister_buf(BlockDriverState *bs, void *host, size_t size) { BdrvChild *child; GLOBAL_STATE_CODE(); if (bs->drv && bs->drv->bdrv_unregister_buf) { -bs->drv->bdrv_unregister_buf(bs, host); +bs->drv->bdrv_unregister_buf(bs, host, size); } QLIST_FOREACH(child, &bs->children, next) { -bdrv_unregister_buf(child->bs, host); +bdrv_unregister_buf(child->bs, host, size); } } diff --git a/block/nvme.c b/block/nvme.c index 552029931d..88485e77f1 100644 --- a/block/nvme.c +++ b/block/nvme.c @@ -1592,7 +1592,7 @@ static void nvme_register_buf(BlockDriverState *bs, void *host, size_t size) } } -static void nvme_unregister_buf(BlockDriverState *bs, void *host) +static void nvme_unregister_buf(BlockDriverState *bs, void *host, size_t
[RFC v2 1/8] blkio: add io_uring block driver using libblkio
libblkio (https://gitlab.com/libblkio/libblkio/) is a library for high-performance disk I/O. It currently supports io_uring with additional drivers planned. One of the reasons for developing libblkio is that other applications besides QEMU can use it. This will be particularly useful for vhost-user-blk which applications may wish to use for connecting to qemu-storage-daemon. libblkio also gives us an opportunity to develop in Rust behind a C API that is easy to consume from QEMU. This commit adds an io_uring BlockDriver to QEMU using libblkio. For now I/O buffers are copied through bounce buffers if the libblkio driver requires it. Later commits add an optimization for pre-registering guest RAM to avoid bounce buffers. It will be easy to add other libblkio drivers since they will share the majority of code. Signed-off-by: Stefan Hajnoczi --- MAINTAINERS | 6 + meson_options.txt | 2 + qapi/block-core.json | 18 +- meson.build | 9 + block/blkio.c | 537 ++ tests/qtest/modules-test.c| 3 + block/meson.build | 1 + scripts/meson-buildoptions.sh | 3 + 8 files changed, 578 insertions(+), 1 deletion(-) create mode 100644 block/blkio.c diff --git a/MAINTAINERS b/MAINTAINERS index 4ad2451e03..d839301f68 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3349,6 +3349,12 @@ L: qemu-bl...@nongnu.org S: Maintained F: block/vdi.c +blkio +M: Stefan Hajnoczi +L: qemu-bl...@nongnu.org +S: Maintained +F: block/blkio.c + iSCSI M: Ronnie Sahlberg M: Paolo Bonzini diff --git a/meson_options.txt b/meson_options.txt index 52b11cead4..1e82e770e7 100644 --- a/meson_options.txt +++ b/meson_options.txt @@ -101,6 +101,8 @@ option('bzip2', type : 'feature', value : 'auto', description: 'bzip2 support for DMG images') option('cap_ng', type : 'feature', value : 'auto', description: 'cap_ng support') +option('blkio', type : 'feature', value : 'auto', + description: 'libblkio block device driver') option('bpf', type : 'feature', value : 'auto', description: 'eBPF support') option('cocoa', type : 'feature', 
value : 'auto', diff --git a/qapi/block-core.json b/qapi/block-core.json index 4a7a6940a3..c04e1e325b 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -2924,7 +2924,9 @@ 'file', 'snapshot-access', 'ftp', 'ftps', 'gluster', {'name': 'host_cdrom', 'if': 'HAVE_HOST_BLOCK_DEVICE' }, {'name': 'host_device', 'if': 'HAVE_HOST_BLOCK_DEVICE' }, -'http', 'https', 'iscsi', +'http', 'https', +{ 'name': 'io_uring', 'if': 'CONFIG_BLKIO' }, +'iscsi', 'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels', 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd', { 'name': 'replication', 'if': 'CONFIG_REPLICATION' }, @@ -3656,6 +3658,18 @@ '*debug': 'int', '*logfile': 'str' } } +## +# @BlockdevOptionsIoUring: +# +# Driver specific block device options for the io_uring backend. +# +# @filename: path to the image file +# +# Since: 6.3 +## +{ 'struct': 'BlockdevOptionsIoUring', + 'data': { 'filename': 'str' } } + ## # @IscsiTransport: # @@ -4254,6 +4268,8 @@ 'if': 'HAVE_HOST_BLOCK_DEVICE' }, 'http': 'BlockdevOptionsCurlHttp', 'https': 'BlockdevOptionsCurlHttps', + 'io_uring': { 'type': 'BlockdevOptionsIoUring', + 'if': 'CONFIG_BLKIO' }, 'iscsi': 'BlockdevOptionsIscsi', 'luks': 'BlockdevOptionsLUKS', 'nbd':'BlockdevOptionsNbd', diff --git a/meson.build b/meson.build index 861de93c4f..0ab17c8767 100644 --- a/meson.build +++ b/meson.build @@ -636,6 +636,13 @@ if not get_option('virglrenderer').auto() or have_system or have_vhost_user_gpu required: get_option('virglrenderer'), kwargs: static_kwargs) endif +blkio = not_found +if not get_option('blkio').auto() or have_block + blkio = dependency('blkio', + method: 'pkg-config', + required: get_option('blkio'), + kwargs: static_kwargs) +endif curl = not_found if not get_option('curl').auto() or have_block curl = dependency('libcurl', version: '>=7.29.0', @@ -1519,6 +1526,7 @@ config_host_data.set('CONFIG_LIBUDEV', libudev.found()) config_host_data.set('CONFIG_LZO', lzo.found()) 
config_host_data.set('CONFIG_MPATH', mpathpersist.found()) config_host_data.set('CONFIG_MPATH_NEW_API', mpathpersist_new_api) +config_host_data.set('CONFIG_BLKIO', blkio.found()) config_host_data.set('CONFIG_CURL', curl.found()) config_host_data.set('CONFIG_CURSES', curses.found()) config_host_data.set('CONFIG_GBM', gbm.found()) @@ -3672,6 +3680,7 @@ summary_info += {'PAM': pam} summary_info += {'iconv support': iconv}
[RFC v2 0/8] blkio: add libblkio BlockDriver
v2:
- Add BDRV_REQ_REGISTERED_BUF to bs.supported_write_flags [Stefano]
- Use new blkioq_get_num_completions() API
- Implement .bdrv_refresh_limits()

This patch series adds a QEMU BlockDriver for libblkio (https://gitlab.com/libblkio/libblkio/), a library for high-performance block device I/O. Currently libblkio has basic io_uring support with additional drivers in development. The first patch adds the core BlockDriver and most of the libblkio API usage. The remainder of the patch series reworks the existing QEMU bdrv_register_buf() API so that virtio-blk emulation can efficiently map guest RAM for libblkio - some libblkio drivers require that I/O buffer memory is pre-registered (think VFIO, vhost, etc). This block driver is functional enough to boot guests. See the BlockDriver struct in block/blkio.c for a list of APIs that still need to be implemented (write_zeroes and discard are in development, the others are not). I'm also waiting for libblkio to define queuing behavior and iovec lifetime requirements before sending this as a non-RFC patch. Regarding the design: each libblkio driver is a separately named BlockDriver. That means there is an "io_uring" BlockDriver and not a generic "libblkio" BlockDriver. In the future there will be additional BlockDrivers, all defined in block/blkio.c. This way QAPI and open parameters are type-safe and mandatory parameters can be checked by QEMU.
Stefan Hajnoczi (8): blkio: add io_uring block driver using libblkio numa: call ->ram_block_removed() in ram_block_notifer_remove() block: pass size to bdrv_unregister_buf() block: add BDRV_REQ_REGISTERED_BUF request flag block: add BlockRAMRegistrar stubs: add memory_region_from_host() and memory_region_get_fd() blkio: implement BDRV_REQ_REGISTERED_BUF optimization virtio-blk: use BDRV_REQ_REGISTERED_BUF optimization hint MAINTAINERS | 7 + meson_options.txt | 2 + qapi/block-core.json| 18 +- meson.build | 9 + include/block/block-common.h| 9 + include/block/block-global-state.h | 5 +- include/block/block_int-common.h| 2 +- include/hw/virtio/virtio-blk.h | 2 + include/sysemu/block-backend-global-state.h | 2 +- include/sysemu/block-ram-registrar.h| 30 + block/blkio.c | 633 block/blkverify.c | 4 +- block/block-backend.c | 4 +- block/block-ram-registrar.c | 39 ++ block/crypto.c | 2 + block/io.c | 36 +- block/mirror.c | 2 + block/nvme.c| 2 +- block/raw-format.c | 2 + hw/block/virtio-blk.c | 13 +- hw/core/numa.c | 17 + qemu-img.c | 4 +- stubs/memory.c | 13 + tests/qtest/modules-test.c | 3 + util/vfio-helpers.c | 5 +- block/meson.build | 2 + scripts/meson-buildoptions.sh | 3 + stubs/meson.build | 1 + 28 files changed, 845 insertions(+), 26 deletions(-) create mode 100644 include/sysemu/block-ram-registrar.h create mode 100644 block/blkio.c create mode 100644 block/block-ram-registrar.c create mode 100644 stubs/memory.c -- 2.35.1
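For reference, with QEMU configured against the blkio dependency, the new driver would be selected by name on the command line per the BlockdevOptionsIoUring schema in patch 1. A hypothetical invocation, assuming an existing raw image test.img (node and drive names are arbitrary):

```
qemu-system-x86_64 \
    -blockdev driver=io_uring,node-name=disk0,filename=test.img \
    -device virtio-blk-pci,drive=disk0
```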
Re: [qemu.qmp PATCH 02/13] fork qemu.qmp from qemu.git
On Tue, Apr 5, 2022, 4:51 AM Kashyap Chamarthy wrote: > On Mon, Apr 04, 2022 at 02:56:10PM -0400, John Snow wrote: > > On Mon, Apr 4, 2022 at 2:54 PM John Snow wrote: > > [...] > > > > > > .gitignore | 2 +- > > > > > Makefile | 16 > > > > > setup.cfg | 24 +--- > > > > > setup.py | 2 +- > > > > > 4 files changed, 11 insertions(+), 33 deletions(-) > > > > > > > > The changes here look fine to me (and thanks for making it a "micro > > > > change"). I'll let sharper eyes than mine to give a closer look at > the > > > > `git filter-repo` surgery. Although, that looks fine to me too. > > > > > > > > [...] > > > > > > > > > .PHONY: distclean > > > > > distclean: clean > > > > > - rm -rf qemu.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/ > > > > > + rm -rf qemu.qmp.egg-info/ .venv/ .tox/ $(QEMU_VENV_DIR) dist/ > > > > > rm -f .coverage .coverage.* > > > > > rm -rf htmlcov/ > > > > > diff --git a/setup.cfg b/setup.cfg > > > > > index e877ea5..4ffab73 100644 > > > > > --- a/setup.cfg > > > > > +++ b/setup.cfg > > > > > @@ -1,5 +1,5 @@ > > > > > [metadata] > > > > > -name = qemu > > > > > +name = qemu.qmp > > > > > version = file:VERSION > > > > > maintainer = QEMU Developer Team > > > > > > > > In the spirit of patch 04 ("update maintainer metadata"), do you also > > > > want to update here too? s/QEMU Developer Team/QEMU Project? > > > > > > > > > > Good spot. > > > > ...Or, uh. That's exactly what I update in patch 04. Are you asking me > > to fold in that change earlier? I'm confused now. > > Oops, perils of reviewing late in the day. I missed to notice it's the > same file. You're right; please ignore my remark. Sorry for the noise. > I made the same mistake upon reading the feedback, so we're both guilty Thanks Kashyap, I appreciate the review. There's three more series here to apply to the new forked package (not yet re-sent to the ML): (2) Adding GitLab CI configuration. Not relevant for you, probably. (3) Adding Sphinx documentation. 
This builds jsnow.gitlab.io/qemu.qmp/ - I'd be appreciative of your feedback on this. I'm interested both in proofreading and in design feedback here. All comments welcome. [More rigorous changes to the design might be a "later" thing, but the feedback is welcome all the same.]

(4) Adding automatic package builds and git-based versioning to GitLab. Maybe also not too relevant for you.

> > > -- > /kashyap > Thanks for your time!
Re: [PULL 00/10] QAPI patches patches for 2022-04-05
On Tue, 5 Apr 2022 at 11:35, Markus Armbruster wrote: > > I double-checked these patches affect *only* generated documentation. > Safe enough for 7.0, I think. But I'm quite content to hold on to > them until after the release, if that's preferred. > > The following changes since commit 20661b75ea6093f5e59079d00a778a972d6732c5: > > Merge tag 'pull-ppc-20220404' of https://github.com/legoater/qemu into > staging (2022-04-04 15:48:55 +0100) > > are available in the Git repository at: > > git://repo.or.cz/qemu/armbru.git tags/pull-qapi-2022-04-05 > > for you to fetch changes up to 8230f3389c7d7215d0c3946d415f54b3e9c07f73: > > qapi: Fix calc-dirty-rate example (2022-04-05 12:30:45 +0200) > > > QAPI patches patches for 2022-04-05 > > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0 for any user-visible changes. -- PMM
Re: [RFC PATCH] docs/devel: start documenting writing VirtIO devices
On Wed, Mar 16 2022, Alex Bennée wrote: > Cornelia Huck writes: > >> On Wed, Mar 09 2022, Alex Bennée wrote: >>> +Writing VirtIO backends for QEMU >>> + >>> + >>> +This document attempts to outline the information a developer needs to >>> +know to write backends for QEMU. It is specifically focused on >>> +implementing VirtIO devices. >> >> I think you first need to define a bit more clearly what you consider a >> "backend". For virtio, it is probably "everything a device needs to >> function as a specific device type like net, block, etc., which may be >> implemented by different methods" (as you describe further below). > > How about: > > This document attempts to outline the information a developer needs to > know to write device emulations in QEMU. It is specifically focused on > implementing VirtIO devices. For VirtIO the frontend is the driver > running on the guest. The backend is everything that QEMU needs to > do to handle the emulation of the VirtIO device. This can be done > entirely in QEMU, divided between QEMU and the kernel (vhost) or > handled by a separate process which is configured by QEMU > (vhost-user).

I'm afraid that confuses me even more :) This sounds to me like frontend == driver (in virtio spec terminology) and backend == device. Is that really what you meant?

> >> >>> + >>> +Front End Transports >>> + >>> + >>> +VirtIO supports a number of different front end transports. The >>> +details of the device remain the same but there are differences in >>> +command line for specifying the device (e.g. -device virtio-foo >>> +and -device virtio-foo-pci). For example: >>> + >>> +..
code:: c >>> + >>> + static const TypeInfo vhost_user_blk_info = { >>> + .name = TYPE_VHOST_USER_BLK, >>> + .parent = TYPE_VIRTIO_DEVICE, >>> + .instance_size = sizeof(VHostUserBlk), >>> + .instance_init = vhost_user_blk_instance_init, >>> + .class_init = vhost_user_blk_class_init, >>> + }; >>> + >>> +defines ``TYPE_VHOST_USER_BLK`` as a child of the generic >>> +``TYPE_VIRTIO_DEVICE``. >> >> That's not what I'd consider a "front end", though? > > Yeah, clumsy wording. I'm trying to find a good example to show how > QOM can be used to abstract the core device operation and the wrappers > for different transports. However in the code base there seems to be > considerable variation about how this is done. Any advice as to the > best exemplary device to follow is greatly welcomed.

I'm not sure which of the examples we can really consider a "good" device; the normal modus operandi when writing a new device seems to be "pick the first device you can think of and copy whatever it does". Personally, I usually look at blk or net, but those carry a lot of legacy baggage; so maybe a modern virtio-1-only device like gpu? That one also has the advantage of not being pci-only. Does anyone else have a good suggestion here?

> >>> And then for the PCI device it wraps around the >>> +base device (although explicitly initialising via >>> +virtio_instance_init_common): >>> + >>> +.. code:: c >>> + >>> + struct VHostUserBlkPCI { >>> + VirtIOPCIProxy parent_obj; >>> + VHostUserBlk vdev; >>> + }; >> >> The VirtIOPCIProxy seems to materialize a bit out of thin air >> here... maybe the information simply needs to be structured in a >> different way?
Perhaps: >> >> - describe that virtio devices consist of a part that implements the >> device functionality, which ultimately derives from VirtIODevice (the >> "backend"), and a part that exposes a way for the operating system to >> discover and use the device (the "frontend", what the virtio spec >> calls a "transport") >> - describe how the "frontend" part works (maybe mention VirtIOPCIProxy, >> VirtIOMMIOProxy, and VirtioCcwDevice as specialized proxy devices for >> PCI, MMIO, and CCW devices) >> - list the different types of "backends" (as you did below), and give >> two examples of how VirtIODevice is extended (a plain one, and a >> vhost-user one) >> - explain how frontend and backend together create an actual device >> (with the two device examples, and maybe also with the plain one >> plugged as both PCI and CCW?); maybe also mention that MMIO is a bit >> different? (it always confuses me) > > OK I'll see how I can restructure things to make it clearer. Do we also > have to take into account the object hierarchy for different types of > device (i.e. block or net)? Or is that all plumbing into QEMU's > sub-system internals done in the VirtIO device objects?

An example of how a device plugs into a bigger infrastructure like the block layer might be helpful, but it also might complicate the documentation (as you probably won't need to do anything like that if you write a device that does not use any established infrastructure.) Maybe just gloss over it for now?

> >>> + >>> +Back End Implementations >>> + >>> + >>>
Re: [PATCH] block/stream: Drain subtree around graph change
On 05.04.2022 at 15:09, Emanuele Giuseppe Esposito wrote: > On 05/04/2022 at 12:14, Kevin Wolf wrote: > > I think all of this is really relevant for Emanuele's work, which > > involves adding AIO_WAIT_WHILE() deep inside graph update functions. I > > fully expect that we would see very similar problems, and just stacking > > drain sections over drain sections that might happen to usually fix > > things, but aren't guaranteed to, doesn't look like a good solution. > > Yes, I think at this point we all agreed to drop subtree_drain as > replacement for AioContext. > > The alternative is what Paolo proposed in the other thread "Removal of > AioContext lock, bs->parents and ->children: proof of concept" > I am not sure which thread you replied to first :)

This one, I think. :-)

> I think that proposal is not far from your idea, and it avoids > introducing or even using drains at all. > Not sure why you called it a "step backwards even from AioContext locks".

I was only referring to the lock locality there. AioContext locks are really coarse, but still a finer granularity than a single global lock. In the big picture, it'd still be better than the AioContext lock, but that's because it's a different type of lock, not because it has better locality. So I was just wondering if we can't have the different type of lock and make it local to the BDS, too.

Kevin
Re: [PATCH] ui/cursor: fix integer overflow in cursor_alloc (CVE-2022-4206)
On Tue, Apr 5, 2022 at 1:10 PM Gerd Hoffmann wrote: > > > > +++ b/ui/cursor.c > > > @@ -46,6 +46,13 @@ static QEMUCursor *cursor_parse_xpm(const char *xpm[]) > > > > > > /* parse pixel data */ > > > c = cursor_alloc(width, height); > > > + > > > +if (!c) { > > > +fprintf(stderr, "%s: cursor %ux%u alloc error\n", > > > +__func__, width, height); > > > +return NULL; > > > +} > > > > > > > I think you could simply abort() in this function. It is used with static > > data (ui/cursor*.xpm) > > Yes, that should never happen. > > Missing: vmsvga_cursor_define() calls cursor_alloc() with guest-supplied > values too.

I skipped that because the check (cursor.width > 256 || cursor.height > 256) is already done in vmsvga_fifo_run before calling vmsvga_cursor_define. Do you want me to add another check in vmsvga_cursor_define and return NULL if cursor_alloc fails?

> take care, > Gerd

-- Mauro Matteo Cascella Red Hat Product Security PGP-Key ID: BB3410B0
[PATCH v1] configure: judge build dir permission
If this patch is applied, issue:
https://gitlab.com/qemu-project/qemu/-/issues/321
can be closed.

Signed-off-by: Guo Zhi
---
 configure | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index 7c08c18358..9cfa78efd2 100755
--- a/configure
+++ b/configure
@@ -24,7 +24,13 @@ then
     then
         if test -f $MARKER
         then
-            rm -rf build
+            if test -w $MARKER
+            then
+                rm -rf build
+            else
+                echo "ERROR: ./build dir already exists and can not be removed due to permission"
+                exit 1
+            fi
         else
             echo "ERROR: ./build dir already exists and was not previously created by configure"
             exit 1
--
2.35.1
Re: [PATCH] block/stream: Drain subtree around graph change
On 05.04.2022 at 14:12, Vladimir Sementsov-Ogievskiy wrote: > Thanks Kevin! I have already run out of arguments in the battle > against using subtree-drains to isolate graph modification operations > from each other in different threads in the mailing list) > > (Note also, that the top-most version of this patch is "[PATCH v2] > block/stream: Drain subtree around graph change")

Oops, I completely missed the v2. Thanks!

> About avoiding polling during graph-modifying operations, there is a > problem: some IO operations are involved in block-graph modifying > operations. At least it's rewriting "backing_file_offset" and > "backing_file_size" fields in the qcow2 header. > > We can't just separate rewriting metadata from the graph modifying > operation: this way another graph-modifying operation may interleave > and we'll write outdated metadata.

Hm, generally we don't update image metadata when we reconfigure the graph. Most changes are temporary (like insertion of filter nodes) and the image header only contains a "default configuration" to be used on the next start. There are only a few places that update the image header; I think it's generally block job completions. They obviously update the in-memory graph, too, but they don't write to the image file (and therefore potentially poll) in the middle of updating the in-memory graph, but they do both in separate steps. I think this is okay. We must just avoid polling in the middle of graph updates because if something else changes the graph there, it's not clear any more that we're really doing what the caller had in mind.

> So I still think we need a kind of global lock for graph-modifying > operations. Or a kind of per-BDS locks as you propose. But in this case > we need to be sure that, taking all the needed per-BDS locks, we avoid > deadlocking.

I guess this depends on the exact granularity of the locks we're using. If you take the lock only while updating a single edge, I don't think you could easily deadlock.
If you hold it for more complex operations, it becomes harder to tell without checking the code. Kevin
[PATCH v1] hw/ppc: change indentation to spaces from TABs
There are still some files in the QEMU PPC code base that use TABs for indentation instead of using spaces. The TABs should be replaced so that we have a consistent coding style. If this patch is applied, issue: https://gitlab.com/qemu-project/qemu/-/issues/374 can be closed. Signed-off-by: Guo Zhi --- hw/core/uboot_image.h | 185 - hw/ppc/ppc440_bamboo.c | 6 +- hw/ppc/spapr_rtas.c| 18 ++-- include/hw/ppc/ppc.h | 10 +-- 4 files changed, 109 insertions(+), 110 deletions(-) diff --git a/hw/core/uboot_image.h b/hw/core/uboot_image.h index 608022de6e..980e9cc014 100644 --- a/hw/core/uboot_image.h +++ b/hw/core/uboot_image.h @@ -12,7 +12,7 @@ * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License along @@ -32,128 +32,127 @@ /* * Operating System Codes */ -#define IH_OS_INVALID 0 /* Invalid OS */ -#define IH_OS_OPENBSD 1 /* OpenBSD */ -#define IH_OS_NETBSD 2 /* NetBSD */ -#define IH_OS_FREEBSD 3 /* FreeBSD */ -#define IH_OS_4_4BSD 4 /* 4.4BSD */ -#define IH_OS_LINUX5 /* Linux*/ -#define IH_OS_SVR4 6 /* SVR4 */ -#define IH_OS_ESIX 7 /* Esix */ -#define IH_OS_SOLARIS 8 /* Solaris */ -#define IH_OS_IRIX 9 /* Irix */ -#define IH_OS_SCO 10 /* SCO */ -#define IH_OS_DELL 11 /* Dell */ -#define IH_OS_NCR 12 /* NCR */ -#define IH_OS_LYNXOS 13 /* LynxOS */ -#define IH_OS_VXWORKS 14 /* VxWorks */ -#define IH_OS_PSOS 15 /* pSOS */ -#define IH_OS_QNX 16 /* QNX */ -#define IH_OS_U_BOOT 17 /* Firmware */ -#define IH_OS_RTEMS18 /* RTEMS*/ -#define IH_OS_ARTOS19 /* ARTOS*/ -#define IH_OS_UNITY20 /* Unity OS */ +#define IH_OS_INVALID 0 /* Invalid OS */ +#define IH_OS_OPENBSD 1 /* OpenBSD */ +#define IH_OS_NETBSD 2 /* NetBSD */ +#define IH_OS_FREEBSD 3 /* FreeBSD */ 
+#define IH_OS_4_4BSD 4 /* 4.4BSD */ +#define IH_OS_LINUX 5 /* Linux */ +#define IH_OS_SVR46 /* SVR4 */ +#define IH_OS_ESIX7 /* Esix */ +#define IH_OS_SOLARIS 8 /* Solaris */ +#define IH_OS_IRIX9 /* Irix */ +#define IH_OS_SCO 10 /* SCO */ +#define IH_OS_DELL11 /* Dell */ +#define IH_OS_NCR 12 /* NCR */ +#define IH_OS_LYNXOS 13 /* LynxOS */ +#define IH_OS_VXWORKS 14 /* VxWorks */ +#define IH_OS_PSOS15 /* pSOS */ +#define IH_OS_QNX 16 /* QNX */ +#define IH_OS_U_BOOT 17 /* Firmware */ +#define IH_OS_RTEMS 18 /* RTEMS */ +#define IH_OS_ARTOS 19 /* ARTOS */ +#define IH_OS_UNITY 20 /* Unity OS */ /* * CPU Architecture Codes (supported by Linux) */ -#define IH_CPU_INVALID 0 /* Invalid CPU */ -#define IH_CPU_ALPHA 1 /* Alpha*/ -#define IH_CPU_ARM 2 /* ARM */ -#define IH_CPU_I3863 /* Intel x86*/ -#define IH_CPU_IA644 /* IA64 */ -#define IH_CPU_MIPS5 /* MIPS */ -#define IH_CPU_MIPS64 6 /* MIPS 64 Bit */ -#define IH_CPU_PPC 7 /* PowerPC */ -#define IH_CPU_S3908 /* IBM S390 */ -#define IH_CPU_SH 9 /* SuperH */ -#define IH_CPU_SPARC 10 /* Sparc*/ -#define IH_CPU_SPARC64 11 /* Sparc 64 Bit */ -#define IH_CPU_M68K12 /* M68K */ -#define IH_CPU_NIOS13 /* Nios-32 */ -#define IH_CPU_MICROBLAZE 14 /* MicroBlaze */ -#define IH_CPU_NIOS2 15 /* Nios-II */ -#define IH_CPU_BLACKFIN16 /* Blackfin */ -#define IH_CPU_AVR32 17 /* AVR32*/ +#define IH_CPU_INVALID0 /* Invalid CPU */ +#define IH_CPU_ALPHA 1 /* Alpha */ +#define IH_CPU_ARM2 /* ARM */ +#define IH_CPU_I386 3 /* Intel x86 */ +#define IH_CPU_IA64 4 /* IA64 */ +#define IH_CPU_MIPS 5 /* MIPS */ +#define IH_CPU_MIPS64 6 /* MIPS 64 Bit */ +#define IH_CPU_PPC7 /* PowerPC */ +#define IH_CPU_S390 8 /* IBM S390 */ +#define IH_CPU_SH 9 /* SuperH */ +#define IH_CPU_SPARC 10 /* Sparc */ +#define IH_CPU_SPARC6411 /* Sparc 64 Bit */ +#define IH_CPU_M68K 12 /* M68K */ +#define IH_CPU_NIOS 13 /* Nios-32 */ +#define IH_CPU_MICROBLAZE 14 /* MicroBlaze */
Re: [PATCH v3 3/5] tests/qtest/libqos: Skip hotplug tests if pci root bus is not hotpluggable
Eric Auger writes: > ARM does not support hotplug on pcie.0. Add a flag on the bus > which tells if devices can be hotplugged and skip hotplug tests > if the bus cannot be hotplugged. This is a temporary solution to > enable the other pci tests on aarch64. > > Signed-off-by: Eric Auger > Acked-by: Thomas Huth Reviewed-by: Alex Bennée -- Alex Bennée
Re: [RFC PATCH 1/1] kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory
On Tue, 5 Apr 2022 at 14:07, Daniel Henrique Barboza wrote: > > There are a lot of Valgrind warnings about conditional jumps depending on > uninitialized values, like this one (taken from a pSeries guest): > > Conditional jump or move depends on uninitialised value(s) > at 0xB011DC: kvmppc_enable_cap_large_decr (kvm.c:2544) > by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523) > by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921) > by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73) > (...) > Uninitialised value was created by a stack allocation > at 0xB01150: kvmppc_enable_cap_large_decr (kvm.c:2538) > > In this case, the alleged uninitialized value is the 'lpcr' variable that > is written by kvm_get_one_reg() and then used in an if clause: > > int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable) > { > CPUState *cs = CPU(cpu); > uint64_t lpcr; > > kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr); > /* Do we need to modify the LPCR? */ > if (!!(lpcr & LPCR_LD) != !!enable) { < Valgrind warns here > (...) > > A quick fix is to init the variable that kvm_get_one_reg() is going to > write ('lpcr' in the example above). Another idea is to convince > Valgrind that kvm_get_one_reg() inits the 'void *target' memory in case > the ioctl() is successful. This will put some boilerplate in the > function but it will bring benefit for its other callers.

Doesn't Valgrind have a way of modelling ioctls where it knows what data is read and written? In general ioctl-using programs don't need to have special-case "I am running under valgrind" handling, so this seems to me like valgrind is missing support for this particular ioctl.

More generally, how much use is running QEMU with KVM enabled under valgrind anyway? Valgrind has no way of knowing about writes to memory that the guest vCPUs do...

thanks -- PMM
[PATCH] docs/ccid: convert to restructuredText
From: Lucas Ramage Buglink: https://gitlab.com/qemu-project/qemu/-/issues/527 Signed-off-by: Lucas Ramage --- docs/ccid.txt| 182 --- docs/system/device-emulation.rst | 1 + docs/system/devices/ccid.rst | 171 + 3 files changed, 172 insertions(+), 182 deletions(-) delete mode 100644 docs/ccid.txt create mode 100644 docs/system/devices/ccid.rst diff --git a/docs/ccid.txt b/docs/ccid.txt deleted file mode 100644 index 2b85b1bd42..00 --- a/docs/ccid.txt +++ /dev/null @@ -1,182 +0,0 @@ -QEMU CCID Device Documentation. - -Contents -1. USB CCID device -2. Building -3. Using ccid-card-emulated with hardware -4. Using ccid-card-emulated with certificates -5. Using ccid-card-passthru with client side hardware -6. Using ccid-card-passthru with client side certificates -7. Passthrough protocol scenario -8. libcacard - -1. USB CCID device - -The USB CCID device is a USB device implementing the CCID specification, which -lets one connect smart card readers that implement the same spec. For more -information see the specification: - - Universal Serial Bus - Device Class: Smart Card - CCID - Specification for - Integrated Circuit(s) Cards Interface Devices - Revision 1.1 - April 22rd, 2005 - -Smartcards are used for authentication, single sign on, decryption in -public/private schemes and digital signatures. A smartcard reader on the client -cannot be used on a guest with simple usb passthrough since it will then not be -available on the client, possibly locking the computer when it is "removed". On -the other hand this device can let you use the smartcard on both the client and -the guest machine. It is also possible to have a completely virtual smart card -reader and smart card (i.e. not backed by a physical device) using this device. - -2. 
Building - -The cryptographic functions and access to the physical card is done via the -libcacard library, whose development package must be installed prior to -building QEMU: - -In redhat/fedora: -yum install libcacard-devel -In ubuntu: -apt-get install libcacard-dev - -Configuring and building: -./configure --enable-smartcard && make - - -3. Using ccid-card-emulated with hardware - -Assuming you have a working smartcard on the host with the current -user, using libcacard, QEMU acts as another client using ccid-card-emulated: - -qemu -usb -device usb-ccid -device ccid-card-emulated - - -4. Using ccid-card-emulated with certificates stored in files - -You must create the CA and card certificates. This is a one time process. -We use NSS certificates: - -mkdir fake-smartcard -cd fake-smartcard -certutil -N -d sql:$PWD -certutil -S -d sql:$PWD -s "CN=Fake Smart Card CA" -x -t TC,TC,TC -n fake-smartcard-ca -certutil -S -d sql:$PWD -t ,, -s "CN=John Doe" -n id-cert -c fake-smartcard-ca -certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (signing)" --nsCertType smime -n signing-cert -c fake-smartcard-ca -certutil -S -d sql:$PWD -t ,, -s "CN=John Doe (encryption)" --nsCertType sslClient -n encryption-cert -c fake-smartcard-ca - -Note: you must have exactly three certificates. 
- -You can use the emulated card type with the certificates backend: - -qemu -usb -device usb-ccid -device ccid-card-emulated,backend=certificates,db=sql:$PWD,cert1=id-cert,cert2=signing-cert,cert3=encryption-cert - -To use the certificates in the guest, export the CA certificate: - -certutil -L -r -d sql:$PWD -o fake-smartcard-ca.cer -n fake-smartcard-ca - -and import it in the guest: - -certutil -A -d /etc/pki/nssdb -i fake-smartcard-ca.cer -t TC,TC,TC -n fake-smartcard-ca - -In a Linux guest you can then use the CoolKey PKCS #11 module to access -the card: - -certutil -d /etc/pki/nssdb -L -h all - -It will prompt you for the PIN (which is the password you assigned to the -certificate database early on), and then show you all three certificates -together with the manually imported CA cert: - -Certificate NicknameTrust Attributes -fake-smartcard-ca CT,C,C -John Doe:CAC ID Certificate u,u,u -John Doe:CAC Email Signature Certificateu,u,u -John Doe:CAC Email Encryption Certificate u,u,u - -If this does not happen, CoolKey is not installed or not registered with -NSS. Registration can be done from Firefox or the command line: - -modutil -dbdir /etc/pki/nssdb -add "CAC Module" -libfile /usr/lib64/pkcs11/libcoolkeypk11.so -modutil -dbdir /etc/pki/nssdb -list - - -5. Using ccid-card-passthru with client side hardware - -on the host specify the ccid-card-passthru device with a suitable chardev: - -qemu -chardev socket,server=on,host=0.0.0.0,port=2001,id=ccid,wait=off \ - -usb -device usb-ccid -device ccid-card-passthru,chardev=ccid - -on the client run vscclient, built when you built QEMU: - -vscclient 2001 - - -6. Using ccid-card-passthru with client
Re: [PATCH 2/2] hw/xen/xen_pt: Resolve igd_passthrough_isa_bridge_create() indirection
On Sat, Mar 26, 2022 at 05:58:24PM +0100, Bernhard Beschow wrote: > Now that igd_passthrough_isa_bridge_create() is implemented within the > xen context it may use Xen* data types directly and become > xen_igd_passthrough_isa_bridge_create(). This resolves an indirection. > > Signed-off-by: Bernhard Beschow Acked-by: Anthony PERARD Thanks, -- Anthony PERARD
Re: [PATCH 1/2] hw/xen/xen_pt: Confine igd-passthrough-isa-bridge to XEN
On Sat, Mar 26, 2022 at 05:58:23PM +0100, Bernhard Beschow wrote: > igd-passthrough-isa-bridge is only requested in xen_pt but was > implemented in pc_piix.c. This caused xen_pt to depend on i386/pc, > which is hereby resolved. > > Signed-off-by: Bernhard Beschow Acked-by: Anthony PERARD Thanks, -- Anthony PERARD
Re: [PATCH] block/stream: Drain subtree around graph change
On 05.04.2022 at 13:47, Hanna Reitz wrote: > On 05.04.22 12:14, Kevin Wolf wrote: > > On 24.03.2022 at 13:57, Hanna Reitz wrote: > > > When the stream block job cuts out the nodes between top and base in > > > stream_prepare(), it does not drain the subtree manually; it fetches the > > > base node, and tries to insert it as the top node's backing node with > > > bdrv_set_backing_hd(). bdrv_set_backing_hd() however will drain, and so > > > the actual base node might change (because the base node is actually not > > > part of the stream job) before the old base node passed to > > > bdrv_set_backing_hd() is installed. > > > > > > This has two implications: > > > > > > First, the stream job does not keep a strong reference to the base node. > > > Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g. > > > because some other block job is drained to finish), we will get a > > > use-after-free. We should keep a strong reference to that node. > > > > > > Second, even with such a strong reference, the problem remains that the > > > base node might change before bdrv_set_backing_hd() actually runs and as > > > a result the wrong base node is installed. > > > > > > Both effects can be seen in 030's TestParallelOps.test_overlapping_5() > > > case, which has five nodes, and simultaneously streams from the middle > > > node to the top node, and commits the middle node down to the base node. > > > As it is, this will sometimes crash, namely when we encounter the > > > above-described use-after-free. > > > > > > Taking a strong reference to the base node, we no longer get a crash, > > > but the resulting block graph is less than ideal: The expected result is > > > obviously that all middle nodes are cut out and the base node is the > > > immediate backing child of the top node.
However, if stream_prepare() > > > takes a strong reference to its base node (the middle node), and then > > > the commit job finishes in bdrv_set_backing_hd(), supposedly dropping > > > that middle node, the stream job will just reinstall it again. > > > > > > Therefore, we need to keep the whole subtree drained in > > > stream_prepare() > > That doesn't sound right. I think in reality it's "if we take the really > > big hammer and drain the whole subtree, then the bit that we really need > > usually happens to be covered, too". > > > > When you have a long backing chain and merge the two topmost overlays > > with streaming, then it's none of the stream job's business whether > > there is I/O going on for the base image way down the chain. Subtree > > drains do much more than they should in this case. > > Yes, see the discussion I had with Vladimir. He convinced me that this > can’t be an indefinite solution, but that we need locking for graph changes > that’s separate from draining, because (1) those are different things, and > (2) changing the graph should influence I/O as little as possible. > > I found this the best solution to fix a known case of a use-after-free for > 7.1, though.

I'm not arguing against a short-term band-aid solution (I assume you mean for 7.0?) as long as we agree that this is what it is. The commit message just sounded as if this were the right solution rather than a hack, so I wanted to make the point.

> > At the same time they probably do too little, because what you're > > describing you're protecting against is not I/O, but graph modifications > > done by callbacks invoked in the AIO_WAIT_WHILE() when replacing the > > backing file. The callback could be invoked by I/O on an entirely different subgraph (maybe if the other thing is a mirror job) or it > > could be a BH or anything else really.
bdrv_drain_all() would increase > > your chances, but I'm not sure if even that would be guaranteed to be > > enough - because it's really another instance of abusing drain for > > locking, we're not really interested in the _I/O_ of the node. > > The most common instances of graph modification I see are QMP and block jobs > finishing. The former will not be deterred by draining, and we do know of > one instance where that is a problem (see the bdrv_next() discussion). > Generally, it isn’t though. (If it is, this case here won’t be the only > thing that breaks.) To be honest, I would be surprised if other things weren't broken if QMP commands come in with unfortunate timing. > As for the latter, most block jobs are parents of the nodes they touch > (stream is one notable exception with how it handles its base, and changing > that did indeed cause us headache before), and so will at least be paused > when a drain occurs on a node they touch. Since pausing doesn’t affect jobs > that have exited their main loop, there might be some problem with > concurrent jobs that are also finished but yielding, but I couldn’t find > such a case. True, the way that we implement drain in the block job actually means that they fully pause and therefore can't complete even if they wouldn't
Re: [PATCH v3 2/5] tests/qtest/libqos/pci: Introduce pio_limit
Eric Auger writes: > At the moment the IO space limit is hardcoded to > QPCI_PIO_LIMIT = 0x1. When accesses are performed to a bar, > the base address of the latter is compared against the limit > to decide whether we perform an IO or a memory access. > > On ARM, we cannot keep this PIO limit as the arm-virt machine > uses [0x3eff, 0x3f00 ] for the IO space map and we > are mandated to allocate at 0x0. > > Add a new flag in QPCIBar indicating whether it is an IO bar > or a memory bar. This flag is set on QPCIBar allocation and > provisioned based on the BAR configuration. Then the new flag > is used in access functions and in the iomap() function. > > Signed-off-by: Eric Auger > Reviewed-by: Thomas Huth Reviewed-by: Alex Bennée -- Alex Bennée
Re: [RFC PATCH] tests/qtest: attempt to enable tests for virtio-gpio (!working)
"Dr. David Alan Gilbert" writes: > * Alex Bennée (alex.ben...@linaro.org) wrote: >> >> (expanding the CC list for help, anyone have a better idea about how >> vhost-user qtests should work/see obvious issues with this patch?) > > How exactly does it fail?

➜ env QTEST_QEMU_BINARY=./qemu-system-aarch64 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon G_TEST_DBUS_DAEMON=/home/alex/lsrc/qemu.git/tests/dbus-vmstate-daemon.sh QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=137 ./tests/qtest/qos-test -p /aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile
# random seed: R02S5d7667675b4f6dd3b8559f8db621296c
# starting QEMU: exec ./qemu-system-aarch64 -qtest unix:/tmp/qtest-1245871.sock -qtest-log /dev/null -chardev socket,path=/tmp/qtest-1245871.qmp,id=char0 -mon chardev=char0,mode=control -display none -machine none -accel qtest
# Start of aarch64 tests
# Start of virt tests
# Start of generic-pcihost tests
# Start of pci-bus-generic tests
# Start of pci-bus tests
# Start of vhost-user-gpio-pci tests
# Start of vhost-user-gpio tests
# Start of vhost-user-gpio-tests tests
# Start of read-guest-mem tests
# child process (/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess [1245877]) exit status: 1 (error)
# child process (/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess [1245877]) stdout: ""
# child process (/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess [1245877]) stderr: "qemu-system-aarch64: -device vhost-user-gpio-pci,id=gpio0,chardev=chr-vhost-user-test,vhostforce=on: Duplicate ID 'gpio0' for device\nsocket_accept failed: Resource temporarily unavailable\n**\nERROR:../../tests/qtest/libqtest.c:321:qtest_init_without_qmp_handshake: assertion failed: (s->fd >= 0 && s->qmp_fd >= 0)\n"
** ERROR:../../tests/qtest/qos-test.c:189:subprocess_run_one_test: child process (/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess [1245877]) failed unexpectedly
Bail out! ERROR:../../tests/qtest/qos-test.c:189:subprocess_run_one_test: child process (/aarch64/virt/generic-pcihost/pci-bus-generic/pci-bus/vhost-user-gpio-pci/vhost-user-gpio/vhost-user-gpio-tests/read-guest-mem/memfile/subprocess [1245877]) failed unexpectedly
fish: “env QTEST_QEMU_BINARY=./qemu-sy…” terminated by signal SIGABRT (Abort)

Although it would be nice if I could individually run qos-tests with all the make machinery setting things up.

> Dave

>> Alex Bennée writes: >> >> > We don't have a virtio-gpio implementation in QEMU and only >> > support a vhost-user backend. The QEMU side of the code is minimal so >> > it should be enough to instantiate the device and pass some vhost-user >> > messages over the control socket. To do this we hook into the existing >> > vhost-user-test code and just add the bits required for gpio. >> > >> > Based-on: 20220118203833.316741-1-eric.au...@redhat.com >> > Signed-off-by: Alex Bennée >> > Cc: Viresh Kumar >> > Cc: Paolo Bonzini >> > >> > --- >> > >> > This goes as far as to add things to the QOS tree but so far it's >> > failing to properly start QEMU with the chardev socket needed to >> > communicate between the mock vhost-user daemon and QEMU itself.
>> > --- >> > tests/qtest/libqos/virtio-gpio.h | 34 +++ >> > tests/qtest/libqos/virtio-gpio.c | 98 >> > tests/qtest/vhost-user-test.c| 34 +++ >> > tests/qtest/libqos/meson.build | 1 + >> > 4 files changed, 167 insertions(+) >> > create mode 100644 tests/qtest/libqos/virtio-gpio.h >> > create mode 100644 tests/qtest/libqos/virtio-gpio.c >> > >> > diff --git a/tests/qtest/libqos/virtio-gpio.h >> > b/tests/qtest/libqos/virtio-gpio.h >> > new file mode 100644 >> > index 00..abe6967ae9 >> > --- /dev/null >> > +++ b/tests/qtest/libqos/virtio-gpio.h >> > @@ -0,0 +1,34 @@ >> > +/* >> > + * virtio-gpio structures >> > + * >> > + * Copyright (c) 2022 Linaro Ltd >> > + * >> > + * SPDX-License-Identifier: GPL-2.0-or-later >> > + */ >> > + >> > +#ifndef TESTS_LIBQOS_VIRTIO_GPIO_H >> > +#define TESTS_LIBQOS_VIRTIO_GPIO_H >> > + >> > +#include "qgraph.h" >> > +#include "virtio.h" >> > +#include "virtio-pci.h" >> > + >> > +typedef struct QVhostUserGPIO QVhostUserGPIO; >> > +typedef struct QVhostUserGPIOPCI QVhostUserGPIOPCI; >> > +typedef struct QVhostUserGPIODevice QVhostUserGPIODevice; >> > + >> > +struct QVhostUserGPIO { >> > +QVirtioDevice *vdev; >> > +}; >> > + >> > +struct QVhostUserGPIOPCI { >> > +QVirtioPCIDevice pci_vdev; >> > +QVhostUserGPIO gpio; >> > +}; >> > + >> > +struct
[PATCH v3 2/3] iotests/108: Test new refcount rebuild algorithm
One clear problem with how qcow2's refcount structure rebuild algorithm used to be before "qcow2: Improve refcount structure rebuilding" was that it is prone to failure for qcow2 images on block devices: There is generally unused space after the actual image, and if that exceeds what one refblock covers, the old algorithm would invariably write the reftable past the block device's end, which cannot work. The new algorithm does not have this problem. Test it with three tests: (1) Create an image with more empty space at the end than what one refblock covers, see whether rebuilding the refcount structures results in a change in the image file length. (It should not.) (2) Leave precisely enough space somewhere at the beginning of the image for the new reftable (and the refblock for that place), see whether the new algorithm puts the reftable there. (It should.) (3) Test the original problem: Create (something like) a block device with a fixed size, then create a qcow2 image in there, write some data, and then have qemu-img check rebuild the refcount structures. Before HEAD^, the reftable would have been written past the image file end, i.e. outside of what the block device provides, which cannot work. HEAD^ should have fixed that. ("Something like a block device" means a loop device if we can use one ("sudo -n losetup" works), or a FUSE block export with growable=false otherwise.) Reviewed-by: Eric Blake Signed-off-by: Hanna Reitz --- tests/qemu-iotests/108 | 259 - tests/qemu-iotests/108.out | 81 2 files changed, 339 insertions(+), 1 deletion(-) diff --git a/tests/qemu-iotests/108 b/tests/qemu-iotests/108 index 56339ab2c5..ed02b3267b 100755 --- a/tests/qemu-iotests/108 +++ b/tests/qemu-iotests/108 @@ -30,13 +30,20 @@ status=1# failure is the default! 
_cleanup() { - _cleanup_test_img +_cleanup_test_img +if [ -f "$TEST_DIR/qsd.pid" ]; then +qsd_pid=$(cat "$TEST_DIR/qsd.pid") +kill -KILL "$qsd_pid" +fusermount -u "$TEST_DIR/fuse-export" &>/dev/null +fi +rm -f "$TEST_DIR/fuse-export" } trap "_cleanup; exit \$status" 0 1 2 3 15 # get standard environment, filters and checks . ./common.rc . ./common.filter +. ./common.qemu # This tests qcow2-specific low-level functionality _supported_fmt qcow2 @@ -47,6 +54,22 @@ _supported_os Linux # files _unsupported_imgopts 'refcount_bits=\([^1]\|.\([^6]\|$\)\)' data_file +# This test either needs sudo -n losetup or FUSE exports to work +if sudo -n losetup &>/dev/null; then +loopdev=true +else +loopdev=false + +# QSD --export fuse will either yield "Parameter 'id' is missing" +# or "Invalid parameter 'fuse'", depending on whether there is +# FUSE support or not. +error=$($QSD --export fuse 2>&1) +if [[ $error = *"Invalid parameter 'fuse'" ]]; then +_notrun 'Passwordless sudo for losetup or FUSE support required, but' \ +'neither is available' +fi +fi + echo echo '=== Repairing an image without any refcount table ===' echo @@ -138,6 +161,240 @@ _make_test_img 64M poke_file "$TEST_IMG" $((0x10008)) "\xff\xff\xff\xff\xff\xff\x00\x00" _check_test_img -r all +echo +echo '=== Check rebuilt reftable location ===' + +# In an earlier version of the refcount rebuild algorithm, the +# reftable was generally placed at the image end (unless something was +# allocated in the area covered by the refblock right before the image +# file end, then we would try to place the reftable in that refblock). +# This was later changed so the reftable would be placed in the +# earliest possible location. Test this. + +echo +echo '--- Does the image size increase? ---' +echo + +# First test: Just create some image, write some data to it, and +# resize it so there is free space at the end of the image (enough +# that it spans at least one full refblock, which for cluster_size=512 +# images, spans 128k). 
With the old algorithm, the reftable would +# have then been placed at the end of the image file, but with the new +# one, it will be put in that free space. +# We want to check whether the size of the image file increases due to +# rebuilding the refcount structures (it should not). + +_make_test_img -o 'cluster_size=512' 1M +# Write something +$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io + +# Add free space +file_len=$(stat -c '%s' "$TEST_IMG") +truncate -s $((file_len + 256 * 1024)) "$TEST_IMG" + +# Corrupt the image by saying the image header was not allocated +rt_offset=$(peek_file_be "$TEST_IMG" 48 8) +rb_offset=$(peek_file_be "$TEST_IMG" $rt_offset 8) +poke_file "$TEST_IMG" $rb_offset "\x00\x00" + +# Check whether rebuilding the refcount structures increases the image +# file size +file_len=$(stat -c '%s' "$TEST_IMG") +echo +# The only leaks there can be are the old refcount structures that are +# leaked during rebuilding, no need to clutter
[PATCH v3 3/3] qcow2: Add errp to rebuild_refcount_structure()
Instead of fprintf()-ing error messages in rebuild_refcount_structure() and its rebuild_refcounts_write_refblocks() helper, pass them through an Error object to qcow2_check_refcounts() (which will then print it). Suggested-by: Eric Blake Signed-off-by: Hanna Reitz --- block/qcow2-refcount.c | 33 +++-- 1 file changed, 19 insertions(+), 14 deletions(-) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index c5669eaa51..ed0ecfaa89 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -2465,7 +2465,8 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs, static int rebuild_refcounts_write_refblocks( BlockDriverState *bs, void **refcount_table, int64_t *nb_clusters, int64_t first_cluster, int64_t end_cluster, -uint64_t **on_disk_reftable_ptr, uint32_t *on_disk_reftable_entries_ptr +uint64_t **on_disk_reftable_ptr, uint32_t *on_disk_reftable_entries_ptr, +Error **errp ) { BDRVQcow2State *s = bs->opaque; @@ -2516,8 +2517,8 @@ static int rebuild_refcounts_write_refblocks( nb_clusters, &first_free_cluster); if (refblock_offset < 0) { -fprintf(stderr, "ERROR allocating refblock: %s\n", -strerror(-refblock_offset)); +error_setg_errno(errp, -refblock_offset, + "ERROR allocating refblock"); return refblock_offset; } @@ -2539,6 +2540,7 @@ static int rebuild_refcounts_write_refblocks( on_disk_reftable_entries * REFTABLE_ENTRY_SIZE); if (!on_disk_reftable) { +error_setg(errp, "ERROR allocating reftable memory"); return -ENOMEM; } @@ -2562,7 +2564,7 @@ static int rebuild_refcounts_write_refblocks( ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset, s->cluster_size, false); if (ret < 0) { -fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret)); +error_setg_errno(errp, -ret, "ERROR writing refblock"); return ret; } @@ -2578,7 +2580,7 @@ static int rebuild_refcounts_write_refblocks( ret = bdrv_pwrite(bs->file, refblock_offset, on_disk_refblock, s->cluster_size); if (ret < 0) { -fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+error_setg_errno(errp, -ret, "ERROR writing refblock"); return ret; } @@ -2601,7 +2603,8 @@ static int rebuild_refcounts_write_refblocks( static int rebuild_refcount_structure(BlockDriverState *bs, BdrvCheckResult *res, void **refcount_table, - int64_t *nb_clusters) + int64_t *nb_clusters, + Error **errp) { BDRVQcow2State *s = bs->opaque; int64_t reftable_offset = -1; @@ -2652,7 +2655,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs, rebuild_refcounts_write_refblocks(bs, refcount_table, nb_clusters, 0, *nb_clusters, &on_disk_reftable, - &on_disk_reftable_entries); + &on_disk_reftable_entries, errp); if (reftable_size_changed < 0) { res->check_errors++; ret = reftable_size_changed; @@ -2676,8 +2679,8 @@ static int rebuild_refcount_structure(BlockDriverState *bs, refcount_table, nb_clusters, &first_free_cluster); if (reftable_offset < 0) { -fprintf(stderr, "ERROR allocating reftable: %s\n", -strerror(-reftable_offset)); +error_setg_errno(errp, -reftable_offset, + "ERROR allocating reftable"); res->check_errors++; ret = reftable_offset; goto fail; @@ -2695,7 +2698,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs, reftable_start_cluster, reftable_end_cluster, &on_disk_reftable, - &on_disk_reftable_entries); + &on_disk_reftable_entries, errp); if (reftable_size_changed < 0) { res->check_errors++; ret = reftable_size_changed; @@ -2725,7 +2728,7 @@ static int rebuild_refcount_structure(BlockDriverState *bs, ret =
[PATCH v3 1/3] qcow2: Improve refcount structure rebuilding
When rebuilding the refcount structures (when qemu-img check -r found errors with refcount = 0, but reference count > 0), the new refcount table defaults to being put at the image file end[1]. There is no good reason for that except that it means we will not have to rewrite any refblocks we already wrote to disk. Changing the code to rewrite those refblocks is not too difficult, though, so let us do that. That is beneficial for images on block devices, where we cannot really write beyond the end of the image file. Use this opportunity to add extensive comments to the code, and refactor it a bit, getting rid of the backwards-jumping goto. [1] Unless there is something allocated in the area pointed to by the last refblock, so we have to write that refblock. In that case, we try to put the reftable in there. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071 Closes: https://gitlab.com/qemu-project/qemu/-/issues/941 Reviewed-by: Eric Blake Signed-off-by: Hanna Reitz --- block/qcow2-refcount.c | 332 + 1 file changed, 235 insertions(+), 97 deletions(-) diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index b91499410c..c5669eaa51 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -2438,111 +2438,140 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs, } /* - * Creates a new refcount structure based solely on the in-memory information - * given through *refcount_table. All necessary allocations will be reflected - * in that array. + * Helper function for rebuild_refcount_structure(). * - * On success, the old refcount structure is leaked (it will be covered by the - * new refcount structure). + * Scan the range of clusters [first_cluster, end_cluster) for allocated + * clusters and write all corresponding refblocks to disk. The refblock + * and allocation data is taken from the in-memory refcount table + * *refcount_table[] (of size *nb_clusters), which is basically one big + * (unlimited size) refblock for the whole image. 
+ * + * For these refblocks, clusters are allocated using said in-memory + * refcount table. Care is taken that these allocations are reflected + * in the refblocks written to disk. + * + * The refblocks' offsets are written into a reftable, which is + * *on_disk_reftable_ptr[] (of size *on_disk_reftable_entries_ptr). If + * that reftable is of insufficient size, it will be resized to fit. + * This reftable is not written to disk. + * + * (If *on_disk_reftable_ptr is not NULL, the entries within are assumed + * to point to existing valid refblocks that do not need to be allocated + * again.) + * + * Return whether the on-disk reftable array was resized (true/false), + * or -errno on error. */ -static int rebuild_refcount_structure(BlockDriverState *bs, - BdrvCheckResult *res, - void **refcount_table, - int64_t *nb_clusters) +static int rebuild_refcounts_write_refblocks( +BlockDriverState *bs, void **refcount_table, int64_t *nb_clusters, +int64_t first_cluster, int64_t end_cluster, +uint64_t **on_disk_reftable_ptr, uint32_t *on_disk_reftable_entries_ptr +) { BDRVQcow2State *s = bs->opaque; -int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0; +int64_t cluster; int64_t refblock_offset, refblock_start, refblock_index; -uint32_t reftable_size = 0; -uint64_t *on_disk_reftable = NULL; +int64_t first_free_cluster = 0; +uint64_t *on_disk_reftable = *on_disk_reftable_ptr; +uint32_t on_disk_reftable_entries = *on_disk_reftable_entries_ptr; void *on_disk_refblock; -int ret = 0; -struct { -uint64_t reftable_offset; -uint32_t reftable_clusters; -} QEMU_PACKED reftable_offset_and_clusters; - -qcow2_cache_empty(bs, s->refcount_block_cache); +bool reftable_grown = false; +int ret; -write_refblocks: -for (; cluster < *nb_clusters; cluster++) { +for (cluster = first_cluster; cluster < end_cluster; cluster++) { +/* Check all clusters to find refblocks that contain non-zero entries */ if (!s->get_refcount(*refcount_table, cluster)) { continue; } +/* + * This cluster is 
allocated, so we need to create a refblock + * for it. The data we will write to disk is just the + * respective slice from *refcount_table, so it will contain + * accurate refcounts for all clusters belonging to this + * refblock. After we have written it, we will therefore skip + * all remaining clusters in this refblock. + */ + refblock_index = cluster >> s->refcount_block_bits; refblock_start = refblock_index << s->refcount_block_bits; -/* Don't allocate a cluster in a refblock already written to disk */ -if (first_free_cluster < refblock_start)
[PATCH v3 0/3] qcow2: Improve refcount structure rebuilding
Hi, v2 cover letter: https://lists.nongnu.org/archive/html/qemu-block/2022-03/msg01260.html v1 cover letter: https://lists.nongnu.org/archive/html/qemu-block/2021-03/msg00651.html This series fixes the qcow2 refcount structure rebuilding mechanism for when the qcow2 image file doesn’t allow writes beyond the end of file (e.g. because it’s on an LVM block device). v3: - Added patch 3 (didn’t squash this into patch 1, because (a) Eric gave his R-b on 1 as-is, and (b) I ended up retouching rebuild_refcount_structure() as a whole, not just the new helper, so a dedicated patch made more sense) - In patch 1: Changed `assert(reftable_size_changed == true)` to just `assert(reftable_size_changed)` - In patch 2: In comments, replaced “were” by “was” git-backport-diff against v2: Key: [] : patches are identical [] : number of functional differences between upstream/downstream patch [down] : patch is downstream-only The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively 001/3:[0002] [FC] 'qcow2: Improve refcount structure rebuilding' 002/3:[0006] [FC] 'iotests/108: Test new refcount rebuild algorithm' 003/3:[down] 'qcow2: Add errp to rebuild_refcount_structure()' Hanna Reitz (3): qcow2: Improve refcount structure rebuilding iotests/108: Test new refcount rebuild algorithm qcow2: Add errp to rebuild_refcount_structure() block/qcow2-refcount.c | 353 ++--- tests/qemu-iotests/108 | 259 ++- tests/qemu-iotests/108.out | 81 + 3 files changed, 587 insertions(+), 106 deletions(-) -- 2.35.1
Re: [PATCH v9 27/45] hw/cxl/host: Add support for CXL Fixed Memory Windows.
Jonathan Cameron writes: > From: Jonathan Cameron > > The concept of these is introduced in [1] in terms of the > description provided by the CEDT ACPI table. The principle is more general. > Unlike the routing once traffic hits the CXL root bridges, the host system > memory address routing is implementation defined and effectively > static once observable by standard / generic system software. > Each CXL Fixed Memory Window (CFMW) is a region of PA space > which has fixed system dependent routing configured so that > accesses can be routed to the CXL devices below a set of target > root bridges. The accesses may be interleaved across multiple > root bridges. > > For QEMU we could have fully specified these regions in terms > of a base PA + size, but as the absolute address does not matter > it is simpler to let individual platforms place the memory regions. > > Examples: > -cxl-fixed-memory-window targets.0=cxl.0,size=128G > -cxl-fixed-memory-window targets.0=cxl.1,size=128G > -cxl-fixed-memory-window > targets.0=cxl.0,targets.1=cxl.1,size=256G,interleave-granularity=2k > > Specifies > * 2x 128G regions not interleaved across root bridges, one for each of > the root bridges with ids cxl.0 and cxl.1 > * 256G region interleaved across root bridges with ids cxl.0 and cxl.1 > with a 2k interleave granularity. > > When system software enumerates the devices below a given root bridge > it can then decide which CFMW to use. If non-interleaved operation is desired > (or possible) it can use the appropriate CFMW for the root bridge in > question. If there are suitable devices to interleave across the > two root bridges then it may use the 3rd CFMW. > > A number of other designs were considered but the following constraints > made it hard to adapt existing QEMU approaches to this particular problem. > 1) The size must be known before a specific architecture / board brings >up its PA memory map. We need to set up an appropriate region.
> 2) Using links to the host bridges provides a clean command line interface >but these links cannot be established until command line devices have >been added. > > Hence the two-step process used here of first establishing the size, > interleave-ways and granularity + caching the ids of the host bridges > and then, once available, finding the actual host bridges so they can > be used later to support interleave decoding. > > [1] CXL 2.0 ECN: CEDT CFMWS & QTG DSM (computeexpresslink.org / > specifications) > > Signed-off-by: Jonathan Cameron QAPI schema Acked-by: Markus Armbruster
Re: [PATCH] block/stream: Drain subtree around graph change
Am 05/04/2022 um 12:14 schrieb Kevin Wolf: > I think all of this is really relevant for Emanuele's work, which > involves adding AIO_WAIT_WHILE() deep inside graph update functions. I > fully expect that we would see very similar problems, and just stacking > drain sections over drain sections that might happen to usually fix > things, but aren't guaranteed to, doesn't look like a good solution. Yes, I think at this point we all agreed to drop subtree_drain as a replacement for the AioContext lock. The alternative is what Paolo proposed in the other thread "Removal of AioContext lock, bs->parents and ->children: proof of concept" I am not sure which thread you replied to first :) I think that proposal is not far from your idea, and it avoids introducing or even using drains at all. Not sure why you called it a "step backwards even from AioContext locks". Emanuele
[RFC PATCH 1/1] kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory
There are a lot of Valgrind warnings about conditional jumps depending on uninitialized values like this one (taken from a pSeries guest): Conditional jump or move depends on uninitialised value(s) at 0xB011DC: kvmppc_enable_cap_large_decr (kvm.c:2544) by 0x92F28F: cap_large_decr_cpu_apply (spapr_caps.c:523) by 0x930C37: spapr_caps_cpu_apply (spapr_caps.c:921) by 0x955D3B: spapr_reset_vcpu (spapr_cpu_core.c:73) (...) Uninitialised value was created by a stack allocation at 0xB01150: kvmppc_enable_cap_large_decr (kvm.c:2538) In this case, the alleged uninitialized value is the 'lpcr' variable that is written by kvm_get_one_reg() and then used in an if clause: int kvmppc_enable_cap_large_decr(PowerPCCPU *cpu, int enable) { CPUState *cs = CPU(cpu); uint64_t lpcr; kvm_get_one_reg(cs, KVM_REG_PPC_LPCR_64, &lpcr); /* Do we need to modify the LPCR? */ if (!!(lpcr & LPCR_LD) != !!enable) { < Valgrind warns here (...) A quick fix is to init the variable that kvm_get_one_reg() is going to write ('lpcr' in the example above). Another idea is to convince Valgrind that kvm_get_one_reg() inits the 'void *target' memory in case the ioctl() is successful. This will put some boilerplate in the function but it will bring benefit for its other callers. This patch uses the memcheck VALGRIND_MAKE_MEM_DEFINED() macro to mark the 'target' variable as initialized if the ioctl is successful. Cc: Paolo Bonzini Signed-off-by: Daniel Henrique Barboza --- accel/kvm/kvm-all.c | 17 + 1 file changed, 17 insertions(+) diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c index 5f1377ca04..d9acba23c7 100644 --- a/accel/kvm/kvm-all.c +++ b/accel/kvm/kvm-all.c @@ -53,6 +53,10 @@ #include <sys/eventfd.h> #endif +#ifdef CONFIG_VALGRIND_H +#include <valgrind/memcheck.h> +#endif + /* KVM uses PAGE_SIZE in its definition of KVM_COALESCED_MMIO_MAX. We * need to use the real host PAGE_SIZE, as that's what KVM will use.
*/ @@ -3504,6 +3508,19 @@ int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target) if (r) { trace_kvm_failed_reg_get(id, strerror(-r)); } + +#ifdef CONFIG_VALGRIND_H +if (r == 0) { +switch (id & KVM_REG_SIZE_MASK) { +case KVM_REG_SIZE_U32: +VALGRIND_MAKE_MEM_DEFINED(target, sizeof(uint32_t)); +break; +case KVM_REG_SIZE_U64: +VALGRIND_MAKE_MEM_DEFINED(target, sizeof(uint64_t)); +break; +} +} +#endif return r; } -- 2.35.1
[RFC PATCH 0/1] add Valgrind hint in kvm_get_one_reg()
Hi, Valgrind is not happy with how we're using KVM functions that receive a parameter by reference and write to it. This results in a lot of complaints about uninitialized values when using these functions because, by default, Valgrind doesn't know that the variable is being initialized in the function. This is the overall pattern that Valgrind does not like: --- uint64_t val; (...) kvm_get_one_reg(..., &val); if (val) {...} --- Valgrind complains that the 'if' clause is using an uninitialized variable. A quick fix is to init 'val' and be done with it. The drawback is that every single caller of kvm_get_one_reg() must also be bothered with initializing these variables to avoid the warnings. David suggested in [1] that, instead, we should add a Valgrind hint in the common KVM functions to fix this issue for everyone. This is what this patch accomplishes. kvm_get_one_reg() has 20+ callers so I believe this extra boilerplate is worth the benefits. There are more common instances of KVM functions that Valgrind complains about. If we're good with the approach taken here we can think about adding this hint for more functions. [1] https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg07351.html Daniel Henrique Barboza (1): kvm-all.c: hint Valgrind that kvm_get_one_reg() inits memory accel/kvm/kvm-all.c | 17 + 1 file changed, 17 insertions(+) -- 2.35.1
Re: [PULL 0/2] target-arm queue
On Tue, 5 Apr 2022 at 10:26, Peter Maydell wrote: > > Couple of trivial fixes for rc3... > > The following changes since commit 20661b75ea6093f5e59079d00a778a972d6732c5: > > Merge tag 'pull-ppc-20220404' of https://github.com/legoater/qemu into > staging (2022-04-04 15:48:55 +0100) > > are available in the Git repository at: > > https://git.linaro.org/people/pmaydell/qemu-arm.git > tags/pull-target-arm-20220405 > > for you to fetch changes up to 80b952bb694a90f7e530d407b01066894e64a443: > > docs/system/devices/can.rst: correct links to CTU CAN FD IP core > documentation. (2022-04-05 09:29:28 +0100) > > > target-arm queue: > * docs/system/devices/can.rst: correct links to CTU CAN FD IP core > documentation. > * xlnx-bbram: hw/nvram: Fix uninitialized Error * > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/7.0 for any user-visible changes. -- PMM
Re: [RFC PATCH] python: add qmp-send program to send raw qmp commands to qemu
On 4/5/22 07:41, Markus Armbruster wrote: Daniel P. Berrangé writes: On Wed, Mar 16, 2022 at 10:54:55AM +0100, Damien Hedde wrote: It takes an input file containing raw QMP commands (concatenated JSON dicts) and sends all commands one by one to a QMP server. When one command fails, it exits. As a convenience, it can also wrap the qemu process to avoid having to start qemu in the background. When wrapping qemu, the program returns only when the qemu process terminates. Signed-off-by: Damien Hedde [...] I named it qmp-send as Daniel proposed; maybe qmp-test better matches what I'm doing there? 'qmp-test' is a use case specific name. I think it is better to name it based on functionality provided rather than anticipated use case, since use cases evolve over time, hence 'qmp-send'. Well, it doesn't just send, it also receives. qmpcat, like netcat and socat? Anyone against qmpcat? -- Damien
Re: [PATCH v2] hw/ppc/ppc405_boards: Initialize g_autofree pointer
On Tue, 5 Apr 2022 at 13:40, Bernhard Beschow wrote: > > Resolves the only compiler warning when building a full QEMU under Arch Linux: > > Compiling C object libqemu-ppc-softmmu.fa.p/hw_ppc_ppc405_boards.c.o > In file included from /usr/include/glib-2.0/glib.h:114, >from qemu/include/glib-compat.h:32, >from qemu/include/qemu/osdep.h:132, >from ../src/hw/ppc/ppc405_boards.c:25: > ../src/hw/ppc/ppc405_boards.c: In function ‘ref405ep_init’: > /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: warning: ‘filename’ > may be used uninitialized in this function [-Wmaybe-uninitialized] > 28 | g_free (*pp); > | ^~~~ > ../src/hw/ppc/ppc405_boards.c:265:26: note: ‘filename’ was declared here > 265 | g_autofree char *filename; > | ^~~~ > > Signed-off-by: Bernhard Beschow > --- Reviewed-by: Peter Maydell thanks -- PMM
[PATCH v2] hw/ppc/ppc405_boards: Initialize g_autofree pointer
Resolves the only compiler warning when building a full QEMU under Arch Linux: Compiling C object libqemu-ppc-softmmu.fa.p/hw_ppc_ppc405_boards.c.o In file included from /usr/include/glib-2.0/glib.h:114, from qemu/include/glib-compat.h:32, from qemu/include/qemu/osdep.h:132, from ../src/hw/ppc/ppc405_boards.c:25: ../src/hw/ppc/ppc405_boards.c: In function ‘ref405ep_init’: /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: warning: ‘filename’ may be used uninitialized in this function [-Wmaybe-uninitialized] 28 | g_free (*pp); | ^~~~ ../src/hw/ppc/ppc405_boards.c:265:26: note: ‘filename’ was declared here 265 | g_autofree char *filename; | ^~~~ Signed-off-by: Bernhard Beschow --- hw/ppc/ppc405_boards.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c index 7e1a4ac955..3bed7002d2 100644 --- a/hw/ppc/ppc405_boards.c +++ b/hw/ppc/ppc405_boards.c @@ -262,13 +262,13 @@ static void ref405ep_init(MachineState *machine) /* allocate and load BIOS */ if (machine->firmware) { MemoryRegion *bios = g_new(MemoryRegion, 1); -g_autofree char *filename; +g_autofree char *filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, + machine->firmware); long bios_size; memory_region_init_rom(bios, NULL, "ef405ep.bios", BIOS_SIZE, _fatal); -filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, machine->firmware); if (!filename) { error_report("Could not find firmware '%s'", machine->firmware); exit(1); -- 2.35.1
Re: [PATCH] hw/ppc/ppc405_boards: Initialize g_autofree pointer
Am 5. April 2022 12:00:19 UTC schrieb Peter Maydell : >On Tue, 5 Apr 2022 at 12:32, Bernhard Beschow wrote: >> >> Resolves the only compiler warning when building a full QEMU under Arch >> Linux: >> >> Compiling C object libqemu-ppc-softmmu.fa.p/hw_ppc_ppc405_boards.c.o >> In file included from /usr/include/glib-2.0/glib.h:114, >>from qemu/include/glib-compat.h:32, >>from qemu/include/qemu/osdep.h:132, >>from ../src/hw/ppc/ppc405_boards.c:25: >> ../src/hw/ppc/ppc405_boards.c: In function ‘ref405ep_init’: >> /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: warning: ‘filename’ >> may be used uninitialized in this function [-Wmaybe-uninitialized] >> 28 | g_free (*pp); >> | ^~~~ >> ../src/hw/ppc/ppc405_boards.c:265:26: note: ‘filename’ was declared here >> 265 | g_autofree char *filename; >> | ^~~~ >> >> Signed-off-by: Bernhard Beschow >> --- >> hw/ppc/ppc405_boards.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c >> index 7e1a4ac955..326353ea25 100644 >> --- a/hw/ppc/ppc405_boards.c >> +++ b/hw/ppc/ppc405_boards.c >> @@ -262,7 +262,7 @@ static void ref405ep_init(MachineState *machine) >> /* allocate and load BIOS */ >> if (machine->firmware) { >> MemoryRegion *bios = g_new(MemoryRegion, 1); >> -g_autofree char *filename; >> +g_autofree char *filename = NULL; >> long bios_size; >> >> memory_region_init_rom(bios, NULL, "ef405ep.bios", BIOS_SIZE, > >The compiler's wrong here, because there's no way to get to the free >without passing through the actual initialization: Yep. It breaks compilation with -Werror, though, which is useful for development. 
> >filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, machine->firmware); > >I think I would prefer a fix which hoisted that up to the declaration, >rather than setting it to NULL and then unconditionally overwriting that >(which some future compiler version might notice and warn about): > > g_autofree char *filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, >machine->firmware); Ack - I prefer that solution and I'll submit v2. I'm often confused as to when to use RAII in QEMU and when not to. Best regards, Bernhard > >thanks >-- PMM
[PATCH] [PATCH RFC v3] Implements Backend Program conventions for vhost-user-scsi
Signed-off-by: Sakshi Kaushik --- contrib/vhost-user-scsi/vhost-user-scsi.c | 76 +++ 1 file changed, 51 insertions(+), 25 deletions(-) diff --git a/contrib/vhost-user-scsi/vhost-user-scsi.c b/contrib/vhost-user-scsi/vhost-user-scsi.c index 4f6e3e2a24..74ec44d190 100644 --- a/contrib/vhost-user-scsi/vhost-user-scsi.c +++ b/contrib/vhost-user-scsi/vhost-user-scsi.c @@ -351,34 +351,58 @@ fail: /** vhost-user-scsi **/ +int opt_fdnum = -1; +char *opt_socket_path; +gboolean opt_print_caps; +char *iscsi_uri; + +static GOptionEntry entries[] = { +{ "print-capabilities", 'c', 0, G_OPTION_ARG_NONE, &opt_print_caps, + "Print capabilities", NULL }, +{ "fd", 'f', 0, G_OPTION_ARG_INT, &opt_fdnum, + "Use inherited fd socket", "FDNUM" }, +{ "iscsi_uri", 'i', 0, G_OPTION_ARG_FILENAME, &iscsi_uri, + "Use inherited fd socket", "FDNUM" }, +{ "socket-path", 's', 0, G_OPTION_ARG_FILENAME, &opt_socket_path, + "Use UNIX socket path", "PATH" } +}; + int main(int argc, char **argv) { VusDev *vdev_scsi = NULL; -char *unix_fn = NULL; -char *iscsi_uri = NULL; -int lsock = -1, csock = -1, opt, err = EXIT_SUCCESS; - -while ((opt = getopt(argc, argv, "u:i:")) != -1) { -switch (opt) { -case 'h': -goto help; -case 'u': -unix_fn = g_strdup(optarg); -break; -case 'i': -iscsi_uri = g_strdup(optarg); -break; -default: -goto help; -} +int lsock = -1, csock = -1, err = EXIT_SUCCESS; + +GError *error = NULL; +GOptionContext *context; + +context = g_option_context_new(NULL); +g_option_context_add_main_entries(context, entries, NULL); +if (!g_option_context_parse(context, &argc, &argv, &error)) { +g_printerr("Option parsing failed: %s\n", error->message); +exit(EXIT_FAILURE); } -if (!unix_fn || !iscsi_uri) { + +if (opt_print_caps) { +g_print("{\n"); +g_print(" \"type\": \"scsi\",\n"); +g_print("}\n"); +goto out; +} + +if (!opt_socket_path || !iscsi_uri) { goto help; } -lsock = unix_sock_new(unix_fn); -if (lsock < 0) { -goto err; +if (opt_socket_path) { +lsock = unix_sock_new(opt_socket_path); +if (lsock < 0) { +exit(EXIT_FAILURE); +} }
else if (opt_fdnum < 0) { +g_print("%s\n", g_option_context_get_help(context, true, NULL)); +exit(EXIT_FAILURE); +} else { +lsock = opt_fdnum; } csock = accept(lsock, NULL, NULL); @@ -408,7 +432,7 @@ out: if (vdev_scsi) { g_main_loop_unref(vdev_scsi->loop); g_free(vdev_scsi); -unlink(unix_fn); +unlink(opt_socket_path); } if (csock >= 0) { close(csock); @@ -416,7 +440,7 @@ out: if (lsock >= 0) { close(lsock); } -g_free(unix_fn); +g_free(opt_socket_path); g_free(iscsi_uri); return err; @@ -426,10 +450,12 @@ err: goto out; help: -fprintf(stderr, "Usage: %s [ -u unix_sock_path -i iscsi_uri ] | [ -h ]\n", +fprintf(stderr, "Usage: %s [ -s socket-path -i iscsi_uri -f fd -p print-capabilities ] | [ -h ]\n", argv[0]); -fprintf(stderr, " -u path to unix socket\n"); +fprintf(stderr, " -s path to unix socket\n"); fprintf(stderr, " -i iscsi uri for lun 0\n"); +fprintf(stderr, " -f fd, file-descriptor\n"); +fprintf(stderr, " -p denotes print-capabilities\n"); fprintf(stderr, " -h print help and quit\n"); goto err; -- 2.17.1
Re: [PATCH] block/stream: Drain subtree around graph change
05.04.2022 13:14, Kevin Wolf wrote:

Am 24.03.2022 um 13:57 hat Hanna Reitz geschrieben:

When the stream block job cuts out the nodes between top and base in stream_prepare(), it does not drain the subtree manually; it fetches the base node, and tries to insert it as the top node's backing node with bdrv_set_backing_hd(). bdrv_set_backing_hd() however will drain, and so the actual base node might change (because the base node is actually not part of the stream job) before the old base node passed to bdrv_set_backing_hd() is installed.

This has two implications: First, the stream job does not keep a strong reference to the base node. Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g. because some other block job is drained to finish), we will get a use-after-free. We should keep a strong reference to that node.

Second, even with such a strong reference, the problem remains that the base node might change before bdrv_set_backing_hd() actually runs and as a result the wrong base node is installed.

Both effects can be seen in 030's TestParallelOps.test_overlapping_5() case, which has five nodes, and simultaneously streams from the middle node to the top node, and commits the middle node down to the base node. As it is, this will sometimes crash, namely when we encounter the above-described use-after-free.

Taking a strong reference to the base node, we no longer get a crash, but the resulting block graph is less than ideal: The expected result is obviously that all middle nodes are cut out and the base node is the immediate backing child of the top node. However, if stream_prepare() takes a strong reference to its base node (the middle node), and then the commit job finishes in bdrv_set_backing_hd(), supposedly dropping that middle node, the stream job will just reinstall it again.

Therefore, we need to keep the whole subtree drained in stream_prepare()

That doesn't sound right.
I think in reality it's "if we take the really big hammer and drain the whole subtree, then the bit that we really need usually happens to be covered, too".

When you have a long backing chain and merge the two topmost overlays with streaming, then it's none of the stream job's business whether there is I/O going on for the base image way down the chain. Subtree drains do much more than they should in this case.

At the same time they probably do too little, because what you're describing you're protecting against is not I/O, but graph modifications done by callbacks invoked in the AIO_WAIT_WHILE() when replacing the backing file. The callback could be invoked by I/O on an entirely different subgraph (maybe if the other thing is a mirror job) or it could be a BH or anything else really. bdrv_drain_all() would increase your chances, but I'm not sure if even that would be guaranteed to be enough - because it's really another instance of abusing drain for locking, we're not really interested in the _I/O_ of the node.

so that the graph modification it performs is effectively atomic, i.e. that the base node it fetches is still the base node when bdrv_set_backing_hd() sets it as the top node's backing node.

I think the way to keep graph modifications atomic is avoid polling in the middle. Not even running any callbacks is a lot safer than trying to make sure there can't be undesired callbacks that want to run. So probably adding drain (or anything else that polls) in bdrv_set_backing_hd() was a bad idea. It could assert that the parent node is drained, but it should be the caller's responsibility to do so.

What streaming completion should look like is probably something like this:

1. Drain above_base, this also drains all parents up to the top node (needed because in-flight I/O using an edge that is removed isn't going to end well)
2. Without any polling involved:
   a. Find base (it can't change without polling)
   b. Update top->backing to point to base
3. End of drain.
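The three steps above can be sketched as pseudocode. The function names are assumptions for illustration, not the real QEMU API; the only point is that nothing between steps 2a and 2b polls or runs callbacks:

```
/* Pseudocode -- names are assumed, not actual QEMU functions */
stream_complete_atomically(top, above_base):
    bdrv_drained_begin(above_base)     /* 1. also quiesces parents up to top */
    base = backing_of(above_base)      /* 2a. cannot change: no polling here */
    set_backing_edge(top, base)        /* 2b. pure pointer update, no drain  */
    bdrv_drained_end(above_base)       /* 3. resume I/O                      */
```

Because no AIO_WAIT_WHILE() runs between 2a and 2b, no concurrently finishing job can swap base out from under the update, which is exactly the race the patch under discussion tries to paper over with a subtree drain.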
You don't have to keep extra references or deal with surprise removals of nodes because the whole thing is atomic when you don't poll. Other threads can't interfere either because graph modification requires the BQL.

There is no reason to keep base drained because its I/O doesn't interfere with the incoming edge that we're changing.

I think all of this is really relevant for Emanuele's work, which involves adding AIO_WAIT_WHILE() deep inside graph update functions. I fully expect that we would see very similar problems, and just stacking drain sections over drain sections that might happen to usually fix things, but aren't guaranteed to, doesn't look like a good solution.

Thanks Kevin! I have already run out of arguments in the battle against using subtree-drains to isolate graph modification operations from each other in different threads in the mailing list)

(Note also, that the top-most version of this patch is "[PATCH v2] block/stream: Drain subtree around graph change")

About
Re: [PATCH] block/stream: Drain subtree around graph change
On 05.04.22 13:47, Hanna Reitz wrote:

On 05.04.22 12:14, Kevin Wolf wrote:

[...]

At the same time they probably do too little, because what you're describing you're protecting against is not I/O, but graph modifications done by callbacks invoked in the AIO_WAIT_WHILE() when replacing the backing file. The callback could be invoked by I/O on an entirely different subgraph (maybe if the other thing is a mirror job) or it could be a BH or anything else really. bdrv_drain_all() would increase your chances, but I'm not sure if even that would be guaranteed to be enough - because it's really another instance of abusing drain for locking, we're not really interested in the _I/O_ of the node.

[...]

I’m not sure what you’re arguing for, so I can only assume. Perhaps you’re arguing for reverting this patch, which I wouldn’t want to do, because at least it fixes the one known use-after-free case. Perhaps you’re arguing that we need something better, and then I completely agree.

Perhaps I should also note that what actually fixes the use-after-free is the bdrv_ref()/unref() pair. The drained section is just there to ensure that the graph is actually correct (i.e. if a concurrently finishing job removes @base before the stream job’s bdrv_set_backing_hd() can set it as the top node’s backing node, that we won’t reinstate this @base that the other job just removed). So even if this does too little, at least there won’t be a use-after-free.

OTOH, if it does much too much, we can drop the drain and keep the ref/unref. I don’t want to have a release with a use-after-free that I know of, but I’d be fine if the block graph is “just” outdated.

Hanna
Re: [PATCH] hw/ppc/ppc405_boards: Initialize g_autofree pointer
On Tue, 5 Apr 2022 at 12:32, Bernhard Beschow wrote:
>
> Resolves the only compiler warning when building a full QEMU under Arch Linux:
>
>   Compiling C object libqemu-ppc-softmmu.fa.p/hw_ppc_ppc405_boards.c.o
> In file included from /usr/include/glib-2.0/glib.h:114,
>                  from qemu/include/glib-compat.h:32,
>                  from qemu/include/qemu/osdep.h:132,
>                  from ../src/hw/ppc/ppc405_boards.c:25:
> ../src/hw/ppc/ppc405_boards.c: In function ‘ref405ep_init’:
> /usr/include/glib-2.0/glib/glib-autocleanups.h:28:3: warning: ‘filename’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>    28 |   g_free (*pp);
>       |   ^~~~
> ../src/hw/ppc/ppc405_boards.c:265:26: note: ‘filename’ was declared here
>   265 |     g_autofree char *filename;
>       |                      ^~~~
>
> Signed-off-by: Bernhard Beschow
> ---
>  hw/ppc/ppc405_boards.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c
> index 7e1a4ac955..326353ea25 100644
> --- a/hw/ppc/ppc405_boards.c
> +++ b/hw/ppc/ppc405_boards.c
> @@ -262,7 +262,7 @@ static void ref405ep_init(MachineState *machine)
>      /* allocate and load BIOS */
>      if (machine->firmware) {
>          MemoryRegion *bios = g_new(MemoryRegion, 1);
> -        g_autofree char *filename;
> +        g_autofree char *filename = NULL;
>          long bios_size;
>
>          memory_region_init_rom(bios, NULL, "ef405ep.bios", BIOS_SIZE,

The compiler's wrong here, because there's no way to get to the free without passing through the actual initialization:

    filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, machine->firmware);

I think I would prefer a fix which hoisted that up to the declaration, rather than setting it to NULL and then unconditionally overwriting that (which some future compiler version might notice and warn about):

    g_autofree char *filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, machine->firmware);

thanks
-- PMM
Re: [PATCH] ui/cursor: fix integer overflow in cursor_alloc (CVE-2022-4206)
On Tue, 5 Apr 2022 at 11:50, Mauro Matteo Cascella wrote:
>
> Prevent potential integer overflow by limiting 'width' and 'height' to
> 512x512. Also change 'datasize' type to size_t. Refer to security
> advisory https://starlabs.sg/advisories/22-4206/ for more information.
>
> Fixes: CVE-2022-4206
> Signed-off-by: Mauro Matteo Cascella
>
> diff --git a/ui/cursor.c b/ui/cursor.c
> index 1d62ddd4d0..7cfb08a030 100644
> --- a/ui/cursor.c
> +++ b/ui/cursor.c
> @@ -46,6 +46,13 @@ static QEMUCursor *cursor_parse_xpm(const char *xpm[])
>
>      /* parse pixel data */
>      c = cursor_alloc(width, height);
> +
> +    if (!c) {
> +        fprintf(stderr, "%s: cursor %ux%u alloc error\n",
> +                __func__, width, height);
> +        return NULL;

Side note, we should probably clean up the error handling in this file to not be "print to stderr" at some point...

> +    }
> +
>      for (pixel = 0, y = 0; y < height; y++, line++) {
>          for (x = 0; x < height; x++, pixel++) {
>              idx = xpm[line][x];
> @@ -91,7 +98,10 @@ QEMUCursor *cursor_builtin_left_ptr(void)
>  QEMUCursor *cursor_alloc(int width, int height)
>  {
>      QEMUCursor *c;
> -    int datasize = width * height * sizeof(uint32_t);
> +    size_t datasize = width * height * sizeof(uint32_t);
> +
> +    if (width > 512 || height > 512)
> +        return NULL;

Coding style requires braces on if statements.

thanks
-- PMM
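To see the overflow being guarded against: in `width * height * sizeof(uint32_t)`, the two int operands are multiplied first, so a large attacker-controlled cursor size can wrap before the result is ever widened, yielding a short allocation followed by out-of-bounds writes. A sketch of the checked computation (a hypothetical helper for illustration, not the QEMU function; 0 here stands in for cursor_alloc()'s new NULL return):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Reject out-of-range dimensions before multiplying, and do the
 * multiplication in size_t.  With both dimensions capped at 512 the
 * product is at most 512 * 512 * 4 = 1 MiB, nowhere near overflow.
 */
static size_t cursor_datasize_checked(int width, int height)
{
    if (width < 1 || height < 1 || width > 512 || height > 512) {
        return 0;   /* caller treats this like cursor_alloc() returning NULL */
    }
    return (size_t)width * height * sizeof(uint32_t);
}
```

The cast on the first operand is what forces the whole chain into size_t arithmetic; the explicit bound check is still needed on 32-bit hosts, where size_t is no wider than the int product.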
Re: [PATCH] block/stream: Drain subtree around graph change
On 05.04.22 12:14, Kevin Wolf wrote:

Am 24.03.2022 um 13:57 hat Hanna Reitz geschrieben:

When the stream block job cuts out the nodes between top and base in stream_prepare(), it does not drain the subtree manually; it fetches the base node, and tries to insert it as the top node's backing node with bdrv_set_backing_hd(). bdrv_set_backing_hd() however will drain, and so the actual base node might change (because the base node is actually not part of the stream job) before the old base node passed to bdrv_set_backing_hd() is installed.

This has two implications: First, the stream job does not keep a strong reference to the base node. Therefore, if it is deleted in bdrv_set_backing_hd()'s drain (e.g. because some other block job is drained to finish), we will get a use-after-free. We should keep a strong reference to that node.

Second, even with such a strong reference, the problem remains that the base node might change before bdrv_set_backing_hd() actually runs and as a result the wrong base node is installed.

Both effects can be seen in 030's TestParallelOps.test_overlapping_5() case, which has five nodes, and simultaneously streams from the middle node to the top node, and commits the middle node down to the base node. As it is, this will sometimes crash, namely when we encounter the above-described use-after-free.

Taking a strong reference to the base node, we no longer get a crash, but the resulting block graph is less than ideal: The expected result is obviously that all middle nodes are cut out and the base node is the immediate backing child of the top node. However, if stream_prepare() takes a strong reference to its base node (the middle node), and then the commit job finishes in bdrv_set_backing_hd(), supposedly dropping that middle node, the stream job will just reinstall it again.

Therefore, we need to keep the whole subtree drained in stream_prepare()

That doesn't sound right.
I think in reality it's "if we take the really big hammer and drain the whole subtree, then the bit that we really need usually happens to be covered, too".

When you have a long backing chain and merge the two topmost overlays with streaming, then it's none of the stream job's business whether there is I/O going on for the base image way down the chain. Subtree drains do much more than they should in this case.

Yes, see the discussion I had with Vladimir. He convinced me that this can’t be an indefinite solution, but that we need locking for graph changes that’s separate from draining, because (1) those are different things, and (2) changing the graph should influence I/O as little as possible. I found this the best solution to fix a known case of a use-after-free for 7.1, though.

At the same time they probably do too little, because what you're describing you're protecting against is not I/O, but graph modifications done by callbacks invoked in the AIO_WAIT_WHILE() when replacing the backing file. The callback could be invoked by I/O on an entirely different subgraph (maybe if the other thing is a mirror job) or it could be a BH or anything else really. bdrv_drain_all() would increase your chances, but I'm not sure if even that would be guaranteed to be enough - because it's really another instance of abusing drain for locking, we're not really interested in the _I/O_ of the node.

The most common instances of graph modification I see are QMP and block jobs finishing. The former will not be deterred by draining, and we do know of one instance where that is a problem (see the bdrv_next() discussion). Generally, it isn’t though. (If it is, this case here won’t be the only thing that breaks.)

As for the latter, most block jobs are parents of the nodes they touch (stream is one notable exception with how it handles its base, and changing that did indeed cause us headache before), and so will at least be paused when a drain occurs on a node they touch.
Since pausing doesn’t affect jobs that have exited their main loop, there might be some problem with concurrent jobs that are also finished but yielding, but I couldn’t find such a case.

I’m not sure what you’re arguing for, so I can only assume. Perhaps you’re arguing for reverting this patch, which I wouldn’t want to do, because at least it fixes the one known use-after-free case. Perhaps you’re arguing that we need something better, and then I completely agree.

so that the graph modification it performs is effectively atomic, i.e. that the base node it fetches is still the base node when bdrv_set_backing_hd() sets it as the top node's backing node.

I think the way to keep graph modifications atomic is avoid polling in the middle. Not even running any callbacks is a lot safer than trying to make sure there can't be undesired callbacks that want to run. So probably adding drain (or anything else that polls) in bdrv_set_backing_hd() was a bad idea. It could assert that the parent node is drained, but it should be the
Re: [PULL 0/3] Misc changes for 2022-04-05
On Tue, 5 Apr 2022 at 10:25, Paolo Bonzini wrote:
>
> The following changes since commit 20661b75ea6093f5e59079d00a778a972d6732c5:
>
>   Merge tag 'pull-ppc-20220404' of https://github.com/legoater/qemu into staging (2022-04-04 15:48:55 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 776a6a32b4982a68d3b7a77cbfaae6c2b363a0b8:
>
>   docs/system/i386: Add measurement calculation details to amd-memory-encryption (2022-04-05 10:42:06 +0200)
>
> * fix vss-win32 compilation with clang++
> * update Coverity model
> * add measurement calculation to amd-memory-encryption docs

Hi; this tag doesn't match what your pullreq cover letter claims it is -- it is pointing at 267b85d4e3d15, not 776a6a32b49, and it has way more than 3 patches in it.

thanks
-- PMM