Re: [PATCH V2 2/3] vdpa_sim_net: support feature provisioning
On 9/28/2022 9:55 PM, Jason Wang wrote:
On Tue, Sep 27, 2022 at 5:41 PM Si-Wei Liu wrote:
On 9/26/2022 8:59 PM, Jason Wang wrote:
On Tue, Sep 27, 2022 at 9:02 AM Si-Wei Liu wrote:
On 9/26/2022 12:11 AM, Jason Wang wrote:
On Sat, Sep 24, 2022 at 4:01 AM Si-Wei Liu wrote:
On 9/21/2022 7:43 PM, Jason Wang wrote:

This patch implements feature provisioning for vdpa_sim_net:
1) validating the provisioned features to be a subset of the parent features,
2) clearing the features that are not wanted by the userspace.

For example:

# vdpa mgmtdev show
vdpasim_net:
  supported_classes net
  max_supported_vqs 3
  dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM

Sighs, not to blame anyone, and it's perhaps too late, but this "dev_features" attr in the "mgmtdev show" command output should have been called "supported_features" in the first place.

Not sure I get this, but I guess this is the negotiated features actually.

Actually no, that is why I said the name is a bit confusing and "supported_features" might sound better. You're right, it's an mgmtdev show actually. This attribute in the parent device (mgmtdev) denotes the real device capability for what virtio features can be supported by the parent device. Any unprivileged user can check this field to learn the parent device's capability without having to create a child vDPA device at all. The features that a child vDPA device may support should be a subset of, at most up to, what the parent device offers. For example, the vdpa device dev1 you created below can expose fewer or equal device_features bits than 0x308820028 (MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 ACCESS_PLATFORM), but no more than what the parent device can actually support.

Yes, I didn't see anything wrong with "dev_features".

Yep, it didn't appear to me that anything was wrong either at first sight, which is why I gave my R-b on the series that introduced this attribute. But it's not a perfect name either, on the other hand.
Parav later pointed out that the corresponding enum definition for this attribute should follow the pre-existing naming convention, i.e. we should perhaps do s/VDPA_ATTR_DEV_SUPPORTED_FEATURES/VDPA_ATTR_MGMTDEV_SUPPORTED_FEATURES/ to get it renamed, as this is a mgmtdev-level attribute, which I agree with. Now, with the upcoming "device_features" attribute (vdpa dev level) from this series, it's subject to yet another confusion between these two similar names, which actually represent things at different levels. All other attributes in "mgmtdev dev show" seem to be aligned with the "supported_" prefix, e.g. supported_classes, max_supported_vqs, from which I think the stance of the device is already implied through "mgmtdev" in the command. From the perspective of clarity and easy distinction, "supported_features" seems to be a better name than "dev_features".

See another reply. I think I get your point:
1) VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES (Lingshan's series) and VDPA_ATTR_VDPA_DEV_FEATURES should be equivalent and unified into a single attribute.
2) A better name, "supported_features", should be fine; patches are welcome. It aligns with the virtio spec, which means the features could be used to create a vdpa device. But if everyone agrees on the renaming, I'm fine.

Never mind; if it's late we don't have to bother. I think Ling Shan is working on reporting both the negotiated features and the device features.

Does it imply this series is connected to another work in parallel? Is it possible to add a reference in the cover letter?

I'm not sure. I remember Ling Shan did some work to not block the config show in this commit:

commit a34bed37fc9d3da319bb75dfbf02a7d3e95e12de
Author: Zhu Lingshan
Date:   Fri Jul 22 19:53:07 2022 +0800

    vDPA: !FEATURES_OK should not block querying device config space

We need some changes in the vdpa tool to show device_features unconditionally in the "dev config show" command.
That's true. I think I pointed it out to Lingshan before that there is no need to bother exposing those config space fields in the "dev config show" output, if the only intent is live migration of device features between nodes. For vDPA live migration, what matters most is the configuration parameters specified at vdpa creation, and the userspace VMM (QEMU) is supposed to take care of saving and restoring live device states. I think it's easier to extend the "vdpa dev show" output to include device_features and other config params as well, rather than count on the validity of various config space fields.

Probably, but for migration it's more about the ability of the mgmtdev instead of the vDPA device itself, I think.

If picking the appropriate device for migration is the concern, it's subject to the capability that the mgmtdev offers. That's true, for sure. On the other hand, the mgmt software would also need to take care of reconstructing the destination device with the same configuration.
Re: [PATCH v4 2/7] Enable balloon drivers to report inflated memory
On Oct 6, 2022, at 12:34 AM, Alexander Atanasov wrote:
> Hello,
>
> On 5.10.22 20:25, Nadav Amit wrote:
>> On Oct 5, 2022, at 2:01 AM, Alexander Atanasov wrote:
>>> Add counters to be updated by the balloon drivers.
>>> Create balloon notifier to propagate changes.
>>
>> I missed the other patches before (including this one). Sorry, but next
>> time, please cc me.
>
> You are CCed in the cover letter since the version. I will add CC to you
> in the individual patches if you want.

Thanks. Just to clarify - I am not attacking you. It's more of me making an excuse for not addressing some issues in earlier versions.

>> I was looking through the series and I did not see actual users of the
>> notifier. Usually, it is not great to build an API without users.
>
> You are right. I hope to get some feedback/interest from the potential users
> that I mentioned in the cover letter. I will probably split the notifier
> into a separate series. To make it useful will require more changes.
> See below for more about them.

Fair, but this is something that is more suitable for RFC. Otherwise, more likely than not, your patches would go in as is.

>> [snip]
>>> +
>>> +static int balloon_notify(unsigned long val)
>>> +{
>>> +	return srcu_notifier_call_chain(&balloon_chain, val, NULL);
>>
>> Since you know the inflated_kb value here, why not use it as an argument
>> to the callback? I think casting to (void *) and back is best. But you can
>> also provide a pointer to the value. Doesn't it sound better than having
>> potentially different notifiers reading different values?
>
> My current idea is to have a struct with the current and previous value,
> maybe the change in percents. The actual value does not matter to anyone,
> but the size of the change does. When a user gets notified it can act upon
> the change - if it is small it can ignore it, if it is above some threshold
> it can act - if it makes sense for some receiver, it can accumulate changes
> from several notifications.
> Another option/addition is to have
> si_meminfo_current(..) and totalram_pages_current(..) that return values
> adjusted with the balloon values.
>
> Going further - there are a few places that calculate something based on
> available memory that do not have a sysfs/proc interface for setting limits.
> Most of them work in percents, so they can be converted to do the calculations
> when they get a notification.
>
> The ones that have an interface for configuration but use memory values can be
> handled in two ways - convert them to use percents of what is available, or
> extend the notifier to notify userspace, which in turn does the calculations
> and updates the configuration.

I really need to see code to fully understand what you have in mind. Division, as you know, is not something that we really want to do very frequently.

>> Anyhow, without users (actual notifiers) it's kind of hard to know how
>> reasonable it all is. For instance, is balloon_notify() supposed to
>> prevent further balloon inflating/deflating until the notifier completes?
>
> No, we must avoid that at any cost.
>
>> Accordingly, are callers of balloon_notify() expected to relinquish locks
>> before calling balloon_notify() to prevent deadlocks and high latency?
>
> My goal is to avoid any possible impact on performance. Drivers are free to
> delay notifications if they get in the way. (I see that I need to move the
> notification after the semaphore in the vmw driver - I missed that - will fix
> in the next iteration.)
> Deadlocks - depends on the users, but few to none will possibly have to deal
> with common locks.

I will need to see the next version to give better feedback. One more thing that comes to mind, though, is whether saving the balloon size in multiple places (both mem_balloon_inflated_total_kb and each balloon's accounting) is the right way. It does not sound very clean.
Two other options are to move *all* the accounting to your new mem_balloon_inflated_total_kb-like interface, or to expose some per-balloon function to get the balloon size (an indirect function call would likely have some overhead, though). Anyhow, I am not crazy about having the same data replicated. Even from a code-reading standpoint it is not intuitive.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v2] vsock: replace virtio_vsock_pkt with sk_buff
On Thu, Oct 06, 2022 at 03:08:12AM -0400, Michael S. Tsirkin wrote:
On Wed, Oct 05, 2022 at 06:19:44PM -0700, Bobby Eshleman wrote:

This patch replaces the struct virtio_vsock_pkt with struct sk_buff.

Using sk_buff in vsock benefits it by a) allowing vsock to be extended for socket-related features like sockmap, b) vsock may in the future use other sk_buff-dependent kernel capabilities, and c) vsock shares commonality with other socket types.

This patch is taken from the original series found here:
https://lore.kernel.org/all/cover.1660362668.git.bobby.eshle...@bytedance.com/

Small-sized packet throughput improved by ~5% (from 18.53 Mb/s to 19.51 Mb/s). Tested using uperf, 16B payloads, 64 threads, 100s, averaged from 10 test runs (n=10). This improvement is likely due to packet merging.

Large-sized packet throughput decreases ~9% (from 27.25 Gb/s to 25.04 Gb/s). Tested using uperf, 64KB payloads, 64 threads, 100s, averaged from 10 test runs (n=10).

Medium-sized packet throughput decreases ~5% (from 4.0 Gb/s to 3.81 Gb/s). Tested using uperf, 4k to 8k payload sizes picked randomly according to normal distribution, 64 threads, 100s, averaged from 10 test runs (n=10).

It is surprising to me that the original vsock code managed to outperform the new one, given that to my knowledge we did not focus on optimizing it.

Yeah, me too. From these numbers, maybe the allocation cost has been reduced, as it performs better with small packets. But with medium to large packets we perform worse, perhaps because previously we were allocating a contiguous buffer of up to 64k? Instead, alloc_skb() could allocate non-contiguous pages? (Which would solve the problems we saw a few days ago.)

@Bobby Are these numbers for guest -> host communication? Can we try the reverse path as well?

I will review the patch in the next few days!

Thanks,
Stefano
Re: [PATCH v2] vsock: replace virtio_vsock_pkt with sk_buff
On Wed, Oct 05, 2022 at 06:19:44PM -0700, Bobby Eshleman wrote:
> This patch replaces the struct virtio_vsock_pkt with struct sk_buff.
>
> Using sk_buff in vsock benefits it by a) allowing vsock to be extended
> for socket-related features like sockmap, b) vsock may in the future
> use other sk_buff-dependent kernel capabilities, and c) vsock shares
> commonality with other socket types.
>
> This patch is taken from the original series found here:
> https://lore.kernel.org/all/cover.1660362668.git.bobby.eshle...@bytedance.com/
>
> Small-sized packet throughput improved by ~5% (from 18.53 Mb/s to 19.51
> Mb/s). Tested using uperf, 16B payloads, 64 threads, 100s, averaged from
> 10 test runs (n=10). This improvement is likely due to packet merging.
>
> Large-sized packet throughput decreases ~9% (from 27.25 Gb/s to 25.04
> Gb/s). Tested using uperf, 64KB payloads, 64 threads, 100s, averaged
> from 10 test runs (n=10).
>
> Medium-sized packet throughput decreases ~5% (from 4.0 Gb/s to 3.81
> Gb/s). Tested using uperf, 4k to 8k payload sizes picked randomly
> according to normal distribution, 64 threads, 100s, averaged from 10
> test runs (n=10).

It is surprising to me that the original vsock code managed to outperform the new one, given that to my knowledge we did not focus on optimizing it. Is there a chance a bit more tuning before merging would get rid of the performance regressions?

> All tests done in nested VMs (virtual host and virtual guest).
>
> Signed-off-by: Bobby Eshleman
> ---
> Changes in v2:
> - Use alloc_skb() directly instead of sock_alloc_send_pskb() to minimize
>   uAPI changes.
> - Do not marshal errors to -ENOMEM for non-virtio implementations.
> - No longer a part of the original series
> - Some code cleanup and refactoring
> - Include performance stats
>
>  drivers/vhost/vsock.c                   | 215 +--
>  include/linux/virtio_vsock.h            |  64 +++-
>  net/vmw_vsock/af_vsock.c                |   1 +
>  net/vmw_vsock/virtio_transport.c        | 206 +-
>  net/vmw_vsock/virtio_transport_common.c | 483
>  net/vmw_vsock/vsock_loopback.c          |  51 +--
>  6 files changed, 504 insertions(+), 516 deletions(-)
>
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 368330417bde..6179e637602f 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -51,8 +51,7 @@ struct vhost_vsock {
>  	struct hlist_node hash;
>
>  	struct vhost_work send_pkt_work;
> -	spinlock_t send_pkt_list_lock;
> -	struct list_head send_pkt_list;	/* host->guest pending packets */
> +	struct sk_buff_head send_pkt_queue; /* host->guest pending packets */
>
>  	atomic_t queued_replies;
>
> @@ -108,7 +107,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  	vhost_disable_notify(&vsock->dev, vq);
>
>  	do {
> -		struct virtio_vsock_pkt *pkt;
> +		struct sk_buff *skb;
> +		struct virtio_vsock_hdr *hdr;
>  		struct iov_iter iov_iter;
>  		unsigned out, in;
>  		size_t nbytes;
> @@ -116,31 +116,22 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  		int head;
>  		u32 flags_to_restore = 0;
>
> -		spin_lock_bh(&vsock->send_pkt_list_lock);
> -		if (list_empty(&vsock->send_pkt_list)) {
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +		skb = skb_dequeue(&vsock->send_pkt_queue);
> +
> +		if (!skb) {
>  			vhost_enable_notify(&vsock->dev, vq);
>  			break;
>  		}
>
> -		pkt = list_first_entry(&vsock->send_pkt_list,
> -				       struct virtio_vsock_pkt, list);
> -		list_del_init(&pkt->list);
> -		spin_unlock_bh(&vsock->send_pkt_list_lock);
> -
>  		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>  					 &out, &in, NULL, NULL);
>  		if (head < 0) {
> -			spin_lock_bh(&vsock->send_pkt_list_lock);
> -			list_add(&pkt->list, &vsock->send_pkt_list);
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +			skb_queue_head(&vsock->send_pkt_queue, skb);
>  			break;
>  		}
>
>  		if (head == vq->num) {
> -			spin_lock_bh(&vsock->send_pkt_list_lock);
> -			list_add(&pkt->list, &vsock->send_pkt_list);
> -			spin_unlock_bh(&vsock->send_pkt_list_lock);
> +			skb_queue_head(&vsock->send_pkt_queue, skb);
>
>  			/* We cannot finish yet if more buffers snuck in while
>  			 * re-enabling notify.
> @@ -153,26 +144,27 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>  		}
>
>  		if (out) {
> -			virtio_transport_free_pkt(pkt);
> +			kfree_skb(skb);
>  			vq_err(vq, "Expected 0 output buffers, got %u\n",