Re: [PATCH 12/21] vhost-vdpa: introduce uAPI to get the number of virtqueue groups
On 2020/12/29 8:24 PM, Eli Cohen wrote:
> On Wed, Dec 16, 2020 at 02:48:09PM +0800, Jason Wang wrote:
>> Following the vDPA support for multiple address spaces, this patch
>> introduces a uAPI for userspace to know the number of virtqueue
>> groups supported by the vDPA device.
>
> Can you explain what exactly you mean by userspace?

It's the userspace that uses the uAPI introduced in this patch.

> Is it just qemu or is it destined to the virtio_net driver run by the
> qemu process?

It could be Qemu, DPDK or another userspace program. The guest
virtio-net driver will not use this but talks to the virtio device
emulated by Qemu.

> Also can you say for what purpose?

This can be used to facilitate checking whether the control vq can be
supported. E.g. if the device supports fewer than 2 groups, qemu won't
advertise the control vq.

Yes, #groups could be inferred from GET_VRING_GROUP, but it's not as
straightforward as this.

Thanks

>> Signed-off-by: Jason Wang
>> ---
>>  drivers/vhost/vdpa.c       | 4 ++++
>>  include/uapi/linux/vhost.h | 3 +++
>>  2 files changed, 7 insertions(+)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index 060d5b5b7e64..1ba5901b28e7 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -536,6 +536,10 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>>  	case VHOST_VDPA_GET_VRING_NUM:
>>  		r = vhost_vdpa_get_vring_num(v, argp);
>>  		break;
>> +	case VHOST_VDPA_GET_GROUP_NUM:
>> +		r = copy_to_user(argp, &v->vdpa->ngroups,
>> +				 sizeof(v->vdpa->ngroups));
>> +		break;
>>  	case VHOST_SET_LOG_BASE:
>>  	case VHOST_SET_LOG_FD:
>>  		r = -ENOIOCTLCMD;
>> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
>> index 59c6c0fbaba1..8a4e6e426bbf 100644
>> --- a/include/uapi/linux/vhost.h
>> +++ b/include/uapi/linux/vhost.h
>> @@ -145,4 +145,7 @@
>>  /* Get the valid iova range */
>>  #define VHOST_VDPA_GET_IOVA_RANGE	_IOR(VHOST_VIRTIO, 0x78, \
>>  					     struct vhost_vdpa_iova_range)
>> +/* Get the number of virtqueue groups. */
>> +#define VHOST_VDPA_GET_GROUP_NUM	_IOR(VHOST_VIRTIO, 0x79, unsigned int)
>> +
>>  #endif
>> --
>> 2.25.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
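For reference, a minimal userspace sketch of how a consumer (QEMU, DPDK, or any other vhost-vDPA client) might use the new ioctl. The device path and helper names are illustrative assumptions; the macro mirrors the value added by the patch.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef VHOST_VIRTIO
#define VHOST_VIRTIO 0xAF
#endif
/* Mirrors the uAPI added by this patch. */
#ifndef VHOST_VDPA_GET_GROUP_NUM
#define VHOST_VDPA_GET_GROUP_NUM _IOR(VHOST_VIRTIO, 0x79, unsigned int)
#endif

/* Query the number of virtqueue groups from a vhost-vdpa device node
 * such as "/dev/vhost-vdpa-0" (assumed path). 0 on success, -1 on
 * failure. */
static int query_ngroups(const char *path, unsigned int *ngroups)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (ioctl(fd, VHOST_VDPA_GET_GROUP_NUM, ngroups) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}

/* The check discussed above: advertise a control vq only when the
 * device has at least 2 virtqueue groups. */
static int ctrl_vq_possible(unsigned int ngroups)
{
	return ngroups >= 2;
}
```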
Re: [PATCH 11/21] vhost-vdpa: introduce asid based IOTLB
On 2020/12/29 7:53 PM, Eli Cohen wrote:
>> +
>> +static struct vhost_vdpa_as *vhost_vdpa_alloc_as(struct vhost_vdpa *v, u32 asid)
>> +{
>> +	struct hlist_head *head = &v->as[asid % VHOST_VDPA_IOTLB_BUCKETS];
>> +	struct vhost_vdpa_as *as;
>> +
>> +	if (asid_to_as(v, asid))
>> +		return NULL;
>> +
>> +	as = kmalloc(sizeof(*as), GFP_KERNEL);
>
> kzalloc()? See comment below.
>
>> +	if (!as)
>> +		return NULL;
>> +
>> +	vhost_iotlb_init(&as->iotlb, 0, 0);
>> +	as->id = asid;
>> +	hlist_add_head(&as->hash_link, head);
>> +	++v->used_as;
>
> Although you eventually ended up removing used_as, this is a bug since
> you're incrementing a random value. Maybe it's better to be on the safe
> side and use kzalloc() for as above.

Yes, and used_as needs to be removed.

Thanks
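A userspace analogue of the bug discussed above, as a sketch: incrementing a counter inside plain malloc()'ed memory (the kmalloc() analogue) reads an indeterminate value, while calloc() (the kzalloc() analogue) guarantees the increment starts from zero. Names are illustrative, not the kernel's.

```c
#include <stdlib.h>

struct as_sketch {
	unsigned int id;
	int used;	/* stands in for used_as */
};

/* Allocate zeroed, as the review suggests (kzalloc() in the kernel):
 * the ++ below is then well-defined and yields 1 for a fresh object.
 * With malloc() the increment would operate on random heap contents. */
static int alloc_and_bump(unsigned int id)
{
	struct as_sketch *as = calloc(1, sizeof(*as));
	int v;

	if (!as)
		return -1;
	as->id = id;
	++as->used;	/* starts from 0, not from stale memory */
	v = as->used;
	free(as);
	return v;
}
```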
Re: [PATCH 11/21] vhost-vdpa: introduce asid based IOTLB
On 2020/12/29 8:05 PM, Eli Cohen wrote:
>> +
>> +static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
>
> The return value is never interpreted. I think it should either be made
> void or the return values checked.

Right, will make it void.

>> +{
>> +	struct vhost_vdpa_as *as = asid_to_as(v, asid);
>> +
>> +	/* Remove default address space is not allowed */
>> +	if (asid == 0)
>> +		return -EINVAL;
>
> Can you explain why? I think you have a memory leak due to this, as no
> one will ever free the as with id 0.

Looks like a bug. Will remove this.

Thanks
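The two agreed fixes (a void return, and no special-casing of asid 0) can be sketched with a userspace analogue over a single list bucket; names and structure are simplified stand-ins for the kernel's hlist code.

```c
#include <stdlib.h>

struct asentry {
	struct asentry *next;
	unsigned int id;
};

static struct asentry *as_head;	/* a single bucket, for brevity */

static struct asentry *add_as(unsigned int asid)
{
	struct asentry *a = calloc(1, sizeof(*a));

	if (!a)
		return NULL;
	a->id = asid;
	a->next = as_head;
	as_head = a;
	return a;
}

static int as_exists(unsigned int asid)
{
	struct asentry *a;

	for (a = as_head; a; a = a->next)
		if (a->id == asid)
			return 1;
	return 0;
}

/* void, as agreed: callers never check the result. asid 0 gets no
 * special case, so the default address space is freed like any other
 * and cannot leak. */
static void remove_as(unsigned int asid)
{
	struct asentry **pp;

	for (pp = &as_head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == asid) {
			struct asentry *victim = *pp;

			*pp = victim->next;
			free(victim);
			return;
		}
	}
}
```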
Re: [PATCH 11/21] vhost-vdpa: introduce asid based IOTLB
On 2020/12/29 7:41 PM, Eli Cohen wrote:
> On Wed, Dec 16, 2020 at 02:48:08PM +0800, Jason Wang wrote:
>> This patch converts the vhost-vDPA device to support multiple IOTLBs
>> tagged via ASID via hlist. This will be used for supporting multiple
>> address spaces in the following patches.
>>
>> Signed-off-by: Jason Wang
>> ---
>>  drivers/vhost/vdpa.c | 106
>>  1 file changed, 80 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index feb6a58df22d..060d5b5b7e64 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -33,13 +33,21 @@ enum {
>>
>>  #define VHOST_VDPA_DEV_MAX (1U << MINORBITS)
>>
>> +#define VHOST_VDPA_IOTLB_BUCKETS 16
>> +
>> +struct vhost_vdpa_as {
>> +	struct hlist_node hash_link;
>> +	struct vhost_iotlb iotlb;
>> +	u32 id;
>> +};
>> +
>>  struct vhost_vdpa {
>>  	struct vhost_dev vdev;
>>  	struct iommu_domain *domain;
>>  	struct vhost_virtqueue *vqs;
>>  	struct completion completion;
>>  	struct vdpa_device *vdpa;
>> -	struct vhost_iotlb *iotlb;
>> +	struct hlist_head as[VHOST_VDPA_IOTLB_BUCKETS];
>>  	struct device dev;
>>  	struct cdev cdev;
>>  	atomic_t opened;
>> @@ -49,12 +57,64 @@ struct vhost_vdpa {
>>  	struct eventfd_ctx *config_ctx;
>>  	int in_batch;
>>  	struct vdpa_iova_range range;
>> +	int used_as;
>
> This is not really used. Not in this patch, and later removed.

Right, will remove this.

Thanks
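The bucketed asid-to-address-space lookup introduced by the patch can be sketched in userspace like this (plain singly linked chains instead of hlist; the `_sk` names are illustrative, not the kernel's):

```c
#include <stdlib.h>

#define IOTLB_BUCKETS 16	/* mirrors VHOST_VDPA_IOTLB_BUCKETS */

struct as_node {
	struct as_node *next;	/* hash chain, like hlist_node */
	unsigned int id;	/* the asid */
};

static struct as_node *as_buckets[IOTLB_BUCKETS];

/* Walk the chain of the bucket the asid hashes into. */
static struct as_node *asid_to_as_sk(unsigned int asid)
{
	struct as_node *a;

	for (a = as_buckets[asid % IOTLB_BUCKETS]; a; a = a->next)
		if (a->id == asid)
			return a;
	return NULL;
}

static struct as_node *alloc_as_sk(unsigned int asid)
{
	struct as_node *a;

	if (asid_to_as_sk(asid))	/* one address space per asid */
		return NULL;
	a = calloc(1, sizeof(*a));	/* zeroed, cf. the kzalloc() remark */
	if (!a)
		return NULL;
	a->id = asid;
	a->next = as_buckets[asid % IOTLB_BUCKETS];
	as_buckets[asid % IOTLB_BUCKETS] = a;
	return a;
}
```

Note how asids 0 and 16 land in the same bucket but remain distinct entries on the chain.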
Re: [RFC v2 09/13] vduse: Add support for processing vhost iotlb message
On 2020/12/29 6:26 PM, Yongji Xie wrote:
> On Tue, Dec 29, 2020 at 5:11 PM Jason Wang wrote:
>>>>>> I see. So all the above two questions are because
>>>>>> VHOST_IOTLB_INVALIDATE is expected to be synchronous. This needs
>>>>>> to be solved by tweaking the current VDUSE API, or we can
>>>>>> re-visit going with descriptor relaying first.
>>>>>
>>>>> Actually all vdpa related operations are synchronous in the
>>>>> current implementation. The ops.set_map/dma_map/dma_unmap should
>>>>> not return until the VDUSE_UPDATE_IOTLB/VDUSE_INVALIDATE_IOTLB
>>>>> message is replied to by userspace. Could it solve this problem?
>>>>
>>>> I was thinking whether or not we need to generate an
>>>> IOTLB_INVALIDATE message to VDUSE during dma_unmap
>>>> (vduse_dev_unmap_page). If we don't, we're probably fine.
>>>
>>> It seems not feasible. This message will also be used in the
>>> virtio-vdpa case to notify userspace to unmap some pages during
>>> consistent dma unmapping. Maybe we can document it to make sure the
>>> users can handle the message correctly.
>>
>> Just to make sure I understand your point. Do you mean you plan to
>> notify the unmap of 1) streaming DMA or 2) coherent DMA?
>>
>> For 1) you probably need a workqueue to do that, since dma unmap can
>> be done in irq or bh context. And if userspace doesn't do the unmap,
>> it can still access the bounce buffer (if you don't zap the pte)?
>
> I plan to do it in the coherent DMA case.

Any reason for treating coherent DMA differently?

> It's true that userspace can access the dma buffer if userspace
> doesn't do the unmap. But the dma pages would not be freed and reused
> unless userspace called munmap() for them.

I wonder whether or not we could recycle the IOVA in this case to avoid
the IOTLB_UMAP message.

Thanks

> Thanks,
> Yongji
Re: [PATCH 10/21] vhost: support ASID in IOTLB API
On 2020/12/29 6:20 PM, Eli Cohen wrote:
>> -static int vhost_process_iotlb_msg(struct vhost_dev *dev,
>> +static int vhost_process_iotlb_msg(struct vhost_dev *dev, u16 asid,
>>  				   struct vhost_iotlb_msg *msg)
>>  {
>>  	int ret = 0;
>>
>> +	if (asid != 0)
>> +		return -EINVAL;
>> +
>>  	mutex_lock(&dev->mutex);
>>  	vhost_dev_lock_vqs(dev);
>>  	switch (msg->type) {
>> @@ -1135,6 +1138,7 @@ ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
>>  	struct vhost_iotlb_msg msg;
>>  	size_t offset;
>>  	int type, ret;
>> +	u16 asid = 0;
>
> You assume asid occupies just 16 bits. So maybe you should reserve the
> other 16 bits for future extension:
>
>  struct vhost_msg_v2 {
>  	__u32 type;
> -	__u32 reserved;
> +	__u16 asid;
> +	__u16 reserved;
>  	union {
>
> Moreover, maybe this should be reflected in previous patches that use
> the asid:
>
> -static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb)
> +static int mlx5_vdpa_set_map(struct vdpa_device *vdev, u16 asid,
> +			     struct vhost_iotlb *iotlb)
>
> -static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
> +static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, u16 asid,
>  					struct vhost_iotlb_msg *msg)
>
> etc.

Good catch. This is actually a bug in the code, since I want to stick to
32 bits to be large enough for e.g. a PASID. Will fix.

Thanks
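The layout Jason settles on — a full 32-bit asid taking over the existing reserved word — can be sanity-checked with userspace mock structs whose field widths mirror the uapi header (the `_sk` types are local stand-ins, not the real uapi definitions):

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in for struct vhost_iotlb_msg (same field widths). */
struct iotlb_msg_sk {
	uint64_t iova;
	uint64_t size;
	uint64_t uaddr;
	uint8_t perm;
	uint8_t type;
};

/* Local stand-in for struct vhost_msg_v2 with the 32-bit asid that
 * replaces the old "reserved" word, keeping the overall size and the
 * offset of the union intact. */
struct msg_v2_sk {
	uint32_t type;
	uint32_t asid;	/* 32 bits: large enough for e.g. a PASID */
	union {
		struct iotlb_msg_sk iotlb;
		uint8_t padding[64];
	};
};
```

Because the asid reuses the reserved word rather than stealing half of it, the message stays binary-compatible in size and in the placement of the iotlb payload.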
Re: [PATCH 07/21] vdpa: multiple address spaces support
On 2020/12/29 3:28 PM, Eli Cohen wrote:
>> @@ -43,6 +43,8 @@ struct vdpa_vq_state {
>>   * @index: device index
>>   * @features_valid: were features initialized? for legacy guests
>>   * @nvqs: the number of virtqueues
>> + * @ngroups: the number of virtqueue groups
>> + * @nas: the number of address spaces
>
> I am not sure these can be categorised as part of the state of the VQ.
> It's more of a property, so maybe we can have a callback to get the
> properties of the VQ?

Or maybe there's a misunderstanding of the patch. Those two attributes
belong to the vdpa_device instead of the vdpa_vq_state actually.

Thanks
Re: [PATCH 07/21] vdpa: multiple address spaces support
On 2020/12/29 3:28 PM, Eli Cohen wrote:
>> @@ -43,6 +43,8 @@ struct vdpa_vq_state {
>>   * @index: device index
>>   * @features_valid: were features initialized? for legacy guests
>>   * @nvqs: the number of virtqueues
>> + * @ngroups: the number of virtqueue groups
>> + * @nas: the number of address spaces
>
> I am not sure these can be categorised as part of the state of the VQ.
> It's more of a property, so maybe we can have a callback to get the
> properties of the VQ? Moreover, I think they should be handled in the
> hardware drivers to return 1 for both ngroups and nas.

We can, but those are static attributes that are not expected to be
changed by the driver. Introducing callbacks for such static attributes
does not give us any advantage.

For the group id and the notification area, since we don't have an
abstraction of a vdpa_virtqueue, the only choice is to introduce
callbacks for them. Maybe it's time to introduce the vdpa_virtqueue.

Thanks
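Jason's point can be illustrated with a tiny sketch (a simplified, hypothetical stand-in for struct vdpa_device): static properties live in the device struct, filled in once at registration, so no ops callback is needed for them, and a driver without multi-AS support simply registers 1 group and 1 address space.

```c
/* Simplified stand-in for struct vdpa_device; only the attributes
 * under discussion. */
struct vdpa_dev_sk {
	unsigned int nvqs;
	unsigned int ngroups;	/* number of virtqueue groups */
	unsigned int nas;	/* number of address spaces */
};

/* Hypothetical registration helper: attributes are set once here and
 * never changed by the driver afterwards. Passing 0 means "no
 * multi-group/multi-AS support", which degrades to 1/1. */
static void register_dev(struct vdpa_dev_sk *d, unsigned int nvqs,
			 unsigned int ngroups, unsigned int nas)
{
	d->nvqs = nvqs;
	d->ngroups = ngroups ? ngroups : 1;
	d->nas = nas ? nas : 1;
}
```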
Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
On Tue, Dec 29, 2020 at 4:23 AM Jason Wang wrote: > > > > - Original Message - > > On Mon, Dec 28, 2020 at 7:55 PM Michael S. Tsirkin wrote: > > > > > > On Mon, Dec 28, 2020 at 02:30:31PM -0500, Willem de Bruijn wrote: > > > > On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin > > > > wrote: > > > > > > > > > > On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote: > > > > > > From: Willem de Bruijn > > > > > > > > > > > > Add optional PTP hardware timestamp offload for virtio-net. > > > > > > > > > > > > Accurate RTT measurement requires timestamps close to the wire. > > > > > > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the > > > > > > virtio-net header is expanded with room for a timestamp. A host may > > > > > > pass receive timestamps for all or some packets. A timestamp is > > > > > > valid > > > > > > if non-zero. > > > > > > > > > > > > The timestamp straddles (virtual) hardware domains. Like PTP, use > > > > > > international atomic time (CLOCK_TAI) as global clock base. It is > > > > > > guest responsibility to sync with host, e.g., through kvm-clock. 
> > > > > > > > > > > > Signed-off-by: Willem de Bruijn > > > > > > diff --git a/include/uapi/linux/virtio_net.h > > > > > > b/include/uapi/linux/virtio_net.h > > > > > > index f6881b5b77ee..0ffe2eeebd4a 100644 > > > > > > --- a/include/uapi/linux/virtio_net.h > > > > > > +++ b/include/uapi/linux/virtio_net.h > > > > > > @@ -57,6 +57,7 @@ > > > > > >* Steering */ > > > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */ > > > > > > > > > > > > +#define VIRTIO_NET_F_RX_TSTAMP 55/* Host sends TAI > > > > > > receive time */ > > > > > > #define VIRTIO_NET_F_TX_HASH 56/* Guest sends hash report */ > > > > > > #define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */ > > > > > > #define VIRTIO_NET_F_RSS 60/* Supports RSS RX steering */ > > > > > > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash { > > > > > > }; > > > > > > }; > > > > > > > > > > > > +struct virtio_net_hdr_v12 { > > > > > > + struct virtio_net_hdr_v1 hdr; > > > > > > + struct { > > > > > > + __le32 value; > > > > > > + __le16 report; > > > > > > + __le16 flow_state; > > > > > > + } hash; > > > > > > + __virtio32 reserved; > > > > > > + __virtio64 tstamp; > > > > > > +}; > > > > > > + > > > > > > #ifndef VIRTIO_NET_NO_LEGACY > > > > > > /* This header comes first in the scatter-gather list. > > > > > > * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it > > > > > > must > > > > > > > > > > > > > > > So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both > > > > > VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then? > > > > > > > > Do you mean VIRTIO_NET_F_TX_TSTAMP depends on VIRTIO_NET_F_RX_TSTAMP? > > > > > > > > I think if either is enabled we need to enable the extended layout. > > > > Regardless of whether TX_HASH or HASH_REPORT are enabled. If they are > > > > not, then those fields are ignored. > > > > > > I mean do we waste the 8 bytes for hash if TSTAMP is set but HASH is not? > > > If TSTAMP depends on HASH then point is moot. 
>> True, but the two features really are independent.
>>
>> I did consider using a configurable metadata layout depending on the
>> features negotiated. If there are tons of optional extensions, that
>> makes sense. But it is more complex and parsing error prone. With a
>> handful of options each of a few bytes, that did not seem worth the
>> cost to me at this point.
>
> Consider NFV workloads (64B packets). Most fields of the current vnet
> header are even a burden.

Such workloads will not negotiate these features, I imagine. I think
this could only become an issue if a small packet workload would want
to negotiate one optional feature, without the others. Even then, the
actual cost of a few untouched bytes is negligible, as long as the
headers don't span cache-lines. I expect it to be cheaper than having
to parse a dynamic layout.

> It might be the time to introduce a configurable or self-descriptive
> vnet header.

I did briefly toy with a type-val encoding. Not type-val-len, as all
options have fixed length (so far), so length can be left implicit.
But as each feature is negotiated beforehand, the type can be left
implicit too. Leaving just the data elements tightly packed, as long
as order is defined.

>> And importantly, such a mode can always be added later as a separate
>> VIRTIO_NET_F_PACKED_HEADER option.
>
> What will this feature provide?

The tightly packed data elements. Strip out the elements for
non-negotiated features. So if either tstamp option is negotiated, but
hash is not, exclude the 64b of hash fields.

Note that I'm not actually arguing for this (at this point).
Re: [PATCH rfc 2/3] virtio-net: support receive timestamp
- Original Message - > On Mon, Dec 28, 2020 at 7:55 PM Michael S. Tsirkin wrote: > > > > On Mon, Dec 28, 2020 at 02:30:31PM -0500, Willem de Bruijn wrote: > > > On Mon, Dec 28, 2020 at 12:29 PM Michael S. Tsirkin > > > wrote: > > > > > > > > On Mon, Dec 28, 2020 at 11:22:32AM -0500, Willem de Bruijn wrote: > > > > > From: Willem de Bruijn > > > > > > > > > > Add optional PTP hardware timestamp offload for virtio-net. > > > > > > > > > > Accurate RTT measurement requires timestamps close to the wire. > > > > > Introduce virtio feature VIRTIO_NET_F_RX_TSTAMP. If negotiated, the > > > > > virtio-net header is expanded with room for a timestamp. A host may > > > > > pass receive timestamps for all or some packets. A timestamp is valid > > > > > if non-zero. > > > > > > > > > > The timestamp straddles (virtual) hardware domains. Like PTP, use > > > > > international atomic time (CLOCK_TAI) as global clock base. It is > > > > > guest responsibility to sync with host, e.g., through kvm-clock. > > > > > > > > > > Signed-off-by: Willem de Bruijn > > > > > diff --git a/include/uapi/linux/virtio_net.h > > > > > b/include/uapi/linux/virtio_net.h > > > > > index f6881b5b77ee..0ffe2eeebd4a 100644 > > > > > --- a/include/uapi/linux/virtio_net.h > > > > > +++ b/include/uapi/linux/virtio_net.h > > > > > @@ -57,6 +57,7 @@ > > > > >* Steering */ > > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23/* Set MAC address */ > > > > > > > > > > +#define VIRTIO_NET_F_RX_TSTAMP 55/* Host sends TAI > > > > > receive time */ > > > > > #define VIRTIO_NET_F_TX_HASH 56/* Guest sends hash report */ > > > > > #define VIRTIO_NET_F_HASH_REPORT 57 /* Supports hash report */ > > > > > #define VIRTIO_NET_F_RSS 60/* Supports RSS RX steering */ > > > > > @@ -182,6 +183,17 @@ struct virtio_net_hdr_v1_hash { > > > > > }; > > > > > }; > > > > > > > > > > +struct virtio_net_hdr_v12 { > > > > > + struct virtio_net_hdr_v1 hdr; > > > > > + struct { > > > > > + __le32 value; > > > > > + __le16 report; > > > > > + 
__le16 flow_state;
> > > > > +	} hash;
> > > > > +	__virtio32 reserved;
> > > > > +	__virtio64 tstamp;
> > > > > +};
> > > > > +
> > > > >  #ifndef VIRTIO_NET_NO_LEGACY
> > > > >  /* This header comes first in the scatter-gather list.
> > > > >   * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it
> > > > >   * must
> > > >
> > > > So it looks like VIRTIO_NET_F_RX_TSTAMP should depend on both
> > > > VIRTIO_NET_F_RX_TSTAMP and VIRTIO_NET_F_HASH_REPORT then?
> > >
> > > Do you mean VIRTIO_NET_F_TX_TSTAMP depends on VIRTIO_NET_F_RX_TSTAMP?
> > >
> > > I think if either is enabled we need to enable the extended layout,
> > > regardless of whether TX_HASH or HASH_REPORT are enabled. If they
> > > are not, then those fields are ignored.
> >
> > I mean do we waste the 8 bytes for hash if TSTAMP is set but HASH is
> > not? If TSTAMP depends on HASH then the point is moot.
>
> True, but the two features really are independent.
>
> I did consider using a configurable metadata layout depending on the
> features negotiated. If there are tons of optional extensions, that
> makes sense. But it is more complex and parsing error prone. With a
> handful of options each of a few bytes, that did not seem worth the
> cost to me at this point.

Consider NFV workloads (64B packets). Most fields of the current vnet
header are even a burden. It might be the time to introduce a
configurable or self-descriptive vnet header.

> And importantly, such a mode can always be added later as a separate
> VIRTIO_NET_F_PACKED_HEADER option.

What will this feature provide?

Thanks

> If anything, perhaps if we increase the virtio_net_hdr_* allocation,
> we should allocate some additional reserved space now? As each new
> size introduces quite a bit of boilerplate. Also, e.g., in qemu just
> to pass the sizes between the virtio-net driver and the vhost-net
> device.
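A back-of-envelope for the "8 bytes for hash" question in the thread above, using local mock structs whose field widths mirror the uapi header. The packed variant models the hypothetical VIRTIO_NET_F_PACKED_HEADER layout (tstamp negotiated, hash not) and is not an existing definition.

```c
#include <stdint.h>

/* Mirrors struct virtio_net_hdr_v1 field widths: 12 bytes. */
struct hdr_v1_sk {
	uint8_t flags, gso_type;
	uint16_t hdr_len, gso_size, csum_start, csum_offset, num_buffers;
};

/* Mirrors the proposed virtio_net_hdr_v12: the hash block is always
 * present, even when the hash features are not negotiated. */
struct hdr_v12_sk {
	struct hdr_v1_sk hdr;
	struct { uint32_t value; uint16_t report, flow_state; } hash;
	uint32_t reserved;
	uint64_t tstamp;
};

/* Hypothetical "packed" layout with tstamp but without hash: the
 * 8 hash bytes are simply absent. */
struct hdr_packed_sk {
	struct hdr_v1_sk hdr;
	uint32_t reserved;
	uint64_t tstamp;
};
```

So the fixed v12 layout spends exactly 8 extra bytes per packet when hash is not negotiated — the savings a packed header would recover.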
Re: [RFC v2 09/13] vduse: Add support for processing vhost iotlb message
- Original Message -
> On Mon, Dec 28, 2020 at 4:43 PM Jason Wang wrote:
>> On 2020/12/28 4:14 PM, Yongji Xie wrote:
>>>> I see. So all the above two questions are because
>>>> VHOST_IOTLB_INVALIDATE is expected to be synchronous. This needs to
>>>> be solved by tweaking the current VDUSE API, or we can re-visit
>>>> going with descriptor relaying first.
>>>
>>> Actually all vdpa related operations are synchronous in the current
>>> implementation. The ops.set_map/dma_map/dma_unmap should not return
>>> until the VDUSE_UPDATE_IOTLB/VDUSE_INVALIDATE_IOTLB message is
>>> replied to by userspace. Could it solve this problem?
>>
>> I was thinking whether or not we need to generate an IOTLB_INVALIDATE
>> message to VDUSE during dma_unmap (vduse_dev_unmap_page). If we
>> don't, we're probably fine.
>
> It seems not feasible. This message will also be used in the
> virtio-vdpa case to notify userspace to unmap some pages during
> consistent dma unmapping. Maybe we can document it to make sure the
> users can handle the message correctly.

Just to make sure I understand your point. Do you mean you plan to
notify the unmap of 1) streaming DMA or 2) coherent DMA?

For 1) you probably need a workqueue to do that, since dma unmap can be
done in irq or bh context. And if userspace doesn't do the unmap, it
can still access the bounce buffer (if you don't zap the pte)?

Thanks

> Thanks,
> Yongji
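The workqueue point above can be sketched as a userspace model: the unmap path (which in the kernel may run in irq/bh context) only enqueues the range, and a worker later sends the messages to userspace. All names here (queue_invalidate, drain_invalidations) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PENDING_MAX 64

struct inv_msg {
	uint64_t iova;
	uint64_t size;
};

static struct inv_msg pending[PENDING_MAX];
static size_t npending;

/* Unmap path: must not block (irq/bh context in the kernel), so it
 * only records the range that needs an invalidate message. */
static int queue_invalidate(uint64_t iova, uint64_t size)
{
	if (npending == PENDING_MAX)
		return -1;
	pending[npending].iova = iova;
	pending[npending].size = size;
	npending++;
	return 0;
}

/* Worker (workqueue) context: allowed to block while each queued
 * invalidate message round-trips to userspace. Here it just drains
 * the queue and reports how many messages would have been sent. */
static size_t drain_invalidations(void)
{
	size_t n = npending;

	npending = 0;
	return n;
}
```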
Re: [PATCH] virtio-mem: use false for bool variable
On 29.12.20 03:44, Tian Tao wrote: > Fixes coccicheck warnings: > drivers/virtio/virtio_mem.c:2580:2-25: WARNING: Assignment of 0/1 to > bool variable. > > Signed-off-by: Tian Tao > --- > drivers/virtio/virtio_mem.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c > index 9fc9ec4..85a272c 100644 > --- a/drivers/virtio/virtio_mem.c > +++ b/drivers/virtio/virtio_mem.c > @@ -2577,7 +2577,7 @@ static int virtio_mem_probe(struct virtio_device *vdev) >* actually in use (e.g., trying to reload the driver). >*/ > if (vm->plugged_size) { > - vm->unplug_all_required = 1; > + vm->unplug_all_required = true; > dev_info(&vm->vdev->dev, "unplugging all memory is required\n"); > } > > Thanks - the patch subject is a little weird ("false"). I suggest "virtio-mem: use boolean value when setting vm->unplug_all_required" Apart from that Acked-by: David Hildenbrand @Mst, do you want a resend or can you fixup the subject? Thanks! -- Thanks, David / dhildenb ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2 0/6] KVM: arm64: VCPU preempted check support
On 2020/1/15 22:14, Marc Zyngier wrote:
> On 2020-01-13 12:12, Will Deacon wrote:
>> [+PeterZ]
>>
>> On Thu, Dec 26, 2019 at 09:58:27PM +0800, Zengruan Ye wrote:
>>> This patch set aims to support the vcpu_is_preempted() functionality
>>> under KVM/arm64, which allows the guest to find out whether a VCPU
>>> is currently running or not. This will enhance lock performance on
>>> overcommitted hosts (more runnable VCPUs than physical CPUs in the
>>> system) as doing busy waits for preempted VCPUs will hurt system
>>> performance far worse than early yielding.
>>>
>>> We have observed some performance improvements in unix benchmark
>>> tests.
>>>
>>> unix benchmark result:
>>>   host:  kernel 5.5.0-rc1, HiSilicon Kunpeng920, 8 CPUs
>>>   guest: kernel 5.5.0-rc1, 16 VCPUs
>>>
>>>                 test-case               |   after-patch     |  before-patch
>>> ----------------------------------------+-------------------+------------------
>>>  Dhrystone 2 using register variables   | 334600751.0 lps   | 335319028.3 lps
>>>  Double-Precision Whetstone             |     32856.1 MWIPS |     32849.6 MWIPS
>>>  Execl Throughput                       |      3662.1 lps   |      2718.0 lps
>>>  File Copy 1024 bufsize 2000 maxblocks  |    432906.4 KBps  |    158011.8 KBps
>>>  File Copy 256 bufsize 500 maxblocks    |    116023.0 KBps  |     37664.0 KBps
>>>  File Copy 4096 bufsize 8000 maxblocks  |   1432769.8 KBps  |    441108.8 KBps
>>>  Pipe Throughput                        |   6405029.6 lps   |   6021457.6 lps
>>>  Pipe-based Context Switching           |    185872.7 lps   |    184255.3 lps
>>>  Process Creation                       |      4025.7 lps   |      3706.6 lps
>>>  Shell Scripts (1 concurrent)           |      6745.6 lpm   |      6436.1 lpm
>>>  Shell Scripts (8 concurrent)           |       998.7 lpm   |       931.1 lpm
>>>  System Call Overhead                   |   3913363.1 lps   |   3883287.8 lps
>>> ----------------------------------------+-------------------+------------------
>>>  System Benchmarks Index Score          |      1835.1       |      1327.6
>>
>> Interesting, thanks for the numbers.
>>
>> So it looks like there is a decent improvement to be had from targetted vCPU
>> wakeup, but I really dislike the explicit PV interface and it's already been
>> shown to interact badly with the WFE-based polling in smp_cond_load_*().
>>
>> Rather than expose a divergent interface, I would instead like to explore an
>> improvement to smp_cond_load_*() and see how that performs before we commit
>> to something more intrusive. Marc and I looked at this very briefly in the
>> past, and the basic idea is to register all of the WFE sites with the
>> hypervisor, indicating which register contains the address being spun on
>> and which register contains the "bad" value. That way, you don't bother
>> rescheduling a vCPU if the value at the address is still bad, because you
>> know it will exit immediately.
>>
>> Of course, the devil is in the details because when I say "address", that's
>> a guest virtual address, so you need to play some tricks in the hypervisor
>> so that you have a separate mapping for the lockword (it's enough to keep
>> track of the physical address).
>>
>> Our hacks are here but we basically ran out of time to work on them beyond
>> an unoptimised and hacky prototype:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy
>>
>> Marc -- how would you prefer to handle this?
>
> Let me try and rebase this thing to a modern kernel (I doubt it applies
> without conflicts to mainline). We can then have a discussion about its
> merit on the list once I post it. It'd be good to have a pointer to the
> benchmarks that have been used here.
>
> Thanks,
>
> M.

Hi Marc, Will,

My apologies for the slow reply. Just checking, what is the latest on
this PV cond yield prototype?

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/pvcy

The following are the unixbench test results of the PV cond yield
prototype:

unix benchmark result:
  host:  kernel 5.10.0-rc6, HiSilicon Kunpeng920, 8 CPUs
  guest: kernel 5.10.0-rc6, 16 VCPUs

                                       | 5.10.0-rc6 | pv_cond_yield | vcpu_is_preempted
 System Benchmarks Index Values        |   INDEX    |     INDEX     |       INDEX
---------------------------------------+------------+---------------+-------------------
 Dhrystone 2 using register variables  |  29164.0   |    29156.9    |      29207.2
 Double-Precision Whetstone            |   6807.6   |     6789.2    |       6912.1
 Execl Throughput                      |    856.7   |     1195.6    |        863.1
 File Copy 1024 bufsize 2000 maxblocks |    189.9   |      923.5    |       1094.2
 File Copy 256 bufsize 500 maxblocks   |    121.9   |      578.4    |        588.7
 File Copy 4096 bufsize 8000 maxblocks |    419.9   |     1992.0    |       2733.7
 Pipe