Re: [PATCH V2 2/3] vdpa_sim_net: support feature provisioning

2022-10-06 Thread Si-Wei Liu



On 9/28/2022 9:55 PM, Jason Wang wrote:

On Tue, Sep 27, 2022 at 5:41 PM Si-Wei Liu  wrote:



On 9/26/2022 8:59 PM, Jason Wang wrote:

On Tue, Sep 27, 2022 at 9:02 AM Si-Wei Liu  wrote:


On 9/26/2022 12:11 AM, Jason Wang wrote:

On Sat, Sep 24, 2022 at 4:01 AM Si-Wei Liu  wrote:


On 9/21/2022 7:43 PM, Jason Wang wrote:

This patch implements features provisioning for vdpa_sim_net.

1) validating the provisioned features to be a subset of the parent
 features.
2) clearing the features that are not wanted by the userspace.

For example:

# vdpa mgmtdev show
vdpasim_net:
supported_classes net
max_supported_vqs 3
dev_features MTU MAC CTRL_VQ CTRL_MAC_ADDR ANY_LAYOUT VERSION_1 
ACCESS_PLATFORM
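
As a rough sketch of the two provisioning steps described above (hypothetical
helper and parameter names, not the actual patch code), the subset check and
the masking could look like this:

#include <linux/errno.h>
#include <linux/types.h>

/* Reject anything the parent cannot offer, then strip the parent
 * features that the user did not ask for.
 */
static int provision_features(u64 parent_features, u64 requested,
			      u64 *effective)
{
	if (requested & ~parent_features)
		return -EINVAL;		/* not a subset of the parent */

	*effective = parent_features & requested;
	return 0;
}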

Sigh, not to blame anyone, and it's perhaps too late, but this
"dev_features" attr in the "mgmtdev show" command output should have been
called "supported_features" in the first place.

Not sure I get this, but I guess this is the negotiated features actually.

Actually no, that is why I said the name is a bit confusing and 
"supported_features" might sound better.

You're right, it's an mgmtdev show actually.

This attribute in the parent device (mgmtdev) denotes the real device
capability, i.e. which virtio features the parent device can support. Any
unprivileged user can check this field to learn the parent device's capability
without having to create a child vDPA device at all. The features that a child
vDPA device may support should be a subset of, at most equal to, what the parent
device offers. For example, the vdpa device dev1 you created below can expose fewer
or equal device_features bits than 0x308820028 (MTU MAC CTRL_VQ CTRL_MAC_ADDR
ANY_LAYOUT VERSION_1 ACCESS_PLATFORM), but no more than what the parent device can
actually support.

Yes, I didn't see anything wrong with "dev_features",

Yep, nothing appeared wrong to me either at first sight, so I gave my R-b on the series that introduced
this attribute. On the other hand, it's not a perfect name either. Parav later pointed out that the corresponding
enum definition for this attribute should follow the pre-existing naming convention, i.e. we should perhaps do
s/VDPA_ATTR_DEV_SUPPORTED_FEATURES/VDPA_ATTR_MGMTDEV_SUPPORTED_FEATURES/ to get it renamed, as this is a
mgmtdev-level attribute, which I agree with. Now, with the upcoming "device_features" attribute (vdpa dev
level) from this series, it's subject to further confusion between these two similar names, which actually represent things at
different levels. Meanwhile, all other attributes in "mgmtdev dev show" seem to be aligned with the "supported_" prefix, e.g.
supported_classes and max_supported_vqs, from which I think the standpoint of the device is already implied by "mgmtdev" in the
command. From the perspective of clarity and easy distinction, "supported_features" seems to be a better name than
"dev_features".

See another reply, I think I get your point,

1) VDPA_ATTR_VDPA_DEV_SUPPORTED_FEATURES (lingshan's series) and
VDPA_ATTR_VDPA_DEV_FEATURES should be equivalent and unified into a
single attribute.
2) Renaming it to "supported_features" should be fine, patches are welcome


  it aligns with the
virtio spec, which means the features could be used to create a vdpa
device. But if everyone agrees on the renaming, I'm fine.

Never mind; if it's too late, we don't have to bother.



I think Ling Shan is working on reporting both the negotiated features
and the device features.

Does it imply this series is connected to other work in parallel? Is it
possible to add a reference in the cover letter?

I'm not sure, I remember Ling Shan did some work to not block the
config show in this commit:

commit a34bed37fc9d3da319bb75dfbf02a7d3e95e12de
Author: Zhu Lingshan 
Date:   Fri Jul 22 19:53:07 2022 +0800

 vDPA: !FEATURES_OK should not block querying device config space

We need some changes in the vdpa tool to show device_features
unconditionally in the "dev config show" command.

That's true. I think I pointed it out to Lingshan before that there's no need to bother
exposing those config space fields in the "dev config show" output, if the only intent is
live migration of device features between nodes. For vDPA live migration, what matters most are the
configuration parameters specified at vdpa creation, and the userspace VMM (QEMU) is supposed to take
care of saving and restoring live device states. I think it's easier to extend the "vdpa dev
show" output to include device_features and other config params as well, rather than count on the
validity of various config space fields.

Probably, but for migration it's more about the ability of the
mgmtdev than of the vDPA device itself, I think.

If picking the appropriate device for migration is the concern, then yes,
it's subject to the capability that the mgmtdev offers. That's true, for sure.


On the other hand, mgmt software would also need to take care of 
reconstructing the destination device with the same configuration 

Re: [PATCH v4 2/7] Enable balloon drivers to report inflated memory

2022-10-06 Thread Nadav Amit
On Oct 6, 2022, at 12:34 AM, Alexander Atanasov wrote:

> Hello,
> 
> 
> On 5.10.22 20:25, Nadav Amit wrote:
>> On Oct 5, 2022, at 2:01 AM, Alexander Atanasov wrote:
>>> Add counters to be updated by the balloon drivers.
>>> Create balloon notifier to propagate changes.
>> I missed the other patches before (including this one). Sorry, but next
>> time, please cc me.
> 
> You have been CCed in the cover letter since the first version. I will CC you
> on the individual patches if you want.

Thanks.

Just to clarify - I am not attacking you. It’s more of me making an excuse
for not addressing some issues in earlier versions.

>> I was looking through the series and I did not see actual users of the
>> notifier. Usually, it is not great to build an API without users.
> 
> 
> You are right. I hope to get some feedback/interest from the potential users that
> I mentioned in the cover letter. I will probably split the notifier
> into a separate series. Making it useful will require more changes.
> See below for more about them.

Fair, but this is something that is more suitable for RFC. Otherwise, more
likely than not - your patches would go in as is.

>> [snip]
>>> +
>>> +static int balloon_notify(unsigned long val)
>>> +{
>>> +   return srcu_notifier_call_chain(_chain, val, NULL);
>> Since you know the inflated_kb value here, why not use it as an argument
>> to the callback? I think casting to (void *) and back is best. But you can
>> also provide a pointer to the value. Doesn't that sound better than having
>> potentially different notifiers reading different values?
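
A minimal sketch of the value-as-argument idea, assuming a notifier chain and
callbacks with made-up names (not from the patch):

#include <linux/notifier.h>
#include <linux/printk.h>

/* Hypothetical chain; the real patch defines its own. */
SRCU_NOTIFIER_HEAD_STATIC(example_balloon_chain);

/* Pack inflated_kb into the data pointer... */
static int example_balloon_notify(unsigned long val, unsigned long inflated_kb)
{
	return srcu_notifier_call_chain(&example_balloon_chain, val,
					(void *)inflated_kb);
}

/* ...and unpack it in the receiver, so every notifier sees the same value. */
static int example_balloon_cb(struct notifier_block *nb, unsigned long val,
			      void *data)
{
	unsigned long inflated_kb = (unsigned long)data;

	pr_info("balloon is now %lu kB\n", inflated_kb);
	return NOTIFY_OK;
}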
> 
> My current idea is to have a struct with the current and previous value,
> and maybe the change in percent. The actual value does not matter to anyone,
> but the size of the change does. When a user gets notified it can act upon
> the change - if it is small it can ignore it, if it is above some threshold
> it can act - and if it makes sense for some receiver, it can accumulate changes
> from several notifications. Another option/addition is to have
> si_meminfo_current(..) and totalram_pages_current(..) that return values
> adjusted with the balloon values.
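
A minimal sketch of the struct-with-delta idea from the quoted paragraph above
(names and the threshold are made up for illustration):

#include <linux/kernel.h>
#include <linux/notifier.h>

/* Hypothetical payload: receivers compare the previous and current size
 * and may simply ignore small changes.
 */
struct example_balloon_delta {
	unsigned long old_kb;
	unsigned long new_kb;
};

#define EXAMPLE_THRESHOLD_KB	(64 * 1024)	/* made-up threshold: 64 MB */

static int example_threshold_cb(struct notifier_block *nb, unsigned long val,
				void *data)
{
	struct example_balloon_delta *d = data;
	long delta_kb = (long)d->new_kb - (long)d->old_kb;

	if (abs(delta_kb) < EXAMPLE_THRESHOLD_KB)
		return NOTIFY_DONE;	/* change too small to care about */

	/* recompute whatever limit depends on d->new_kb here */
	return NOTIFY_OK;
}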
> 
> Going further - there are a few places that calculate something based on
> available memory and do not have a sysfs/proc interface for setting limits.
> Most of them work in percent, so they can be converted to redo the calculation
> when they get a notification.
> 
> The ones that have an interface for configuration but use memory values can be
> handled in two ways - convert them to use a percentage of what is available, or
> extend the notifier to notify userspace, which in turn does the calculations and
> updates the configuration.

I really need to see code to fully understand what you have in mind.
Division, as you know, is not something that we really want to do very
frequently.

>> Anyhow, without users (actual notifiers) it’s kind of hard to know how
>> reasonable it all is. For instance, is balloon_notify() supposed to
>> prevent further balloon inflating/deflating until the notifier completes?
> 
> No, we must avoid that at any cost.
> 
>> Accordingly, are callers to balloon_notify() expected to relinquish locks
>> before calling balloon_notify() to prevent deadlocks and high latency?
> 
> My goal is to avoid any possible impact on performance. Drivers are free to
> delay notifications if they get in the way. (I see that I need to move the
> notification after the semaphore in the vmw driver - I missed that - will fix
> it in the next iteration.)
> Deadlocks - it depends on the users, but few to none will likely have to deal
> with common locks.

I will need to see the next version to give better feedback. One more thing
that comes to mind though is whether saving the balloon size in multiple
places (both mem_balloon_inflated_total_kb and each balloon’s accounting) is
the right way. It does not sound very clean.

Two other options are to move *all* the accounting to your new
mem_balloon_inflated_total_kb-like interface, or to expose some per-balloon
function to get the balloon size (an indirect function call would likely have
some overhead, though).
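
For what it's worth, the first option could be as simple as a single central
counter that the drivers adjust; a sketch of that shape (hypothetical helper
names, reusing the counter name from the series only for illustration):

#include <linux/atomic.h>

/* One counter owned by the generic code; balloon drivers call the adjust
 * helper instead of keeping their own copy of the total.
 */
static atomic_long_t mem_balloon_inflated_total_kb = ATOMIC_LONG_INIT(0);

void example_balloon_account(long delta_kb)
{
	atomic_long_add(delta_kb, &mem_balloon_inflated_total_kb);
}

long example_balloon_inflated_total_kb(void)
{
	return atomic_long_read(&mem_balloon_inflated_total_kb);
}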

Anyhow, I am not crazy about having the same data replicated. Even from a
code-reading standpoint it is not intuitive.


Re: [PATCH v2] vsock: replace virtio_vsock_pkt with sk_buff

2022-10-06 Thread Stefano Garzarella

On Thu, Oct 06, 2022 at 03:08:12AM -0400, Michael S. Tsirkin wrote:

On Wed, Oct 05, 2022 at 06:19:44PM -0700, Bobby Eshleman wrote:

This patch replaces the struct virtio_vsock_pkt with struct sk_buff.

Using sk_buff in vsock benefits it by a) allowing vsock to be extended
for socket-related features like sockmap, b) letting vsock use other
sk_buff-dependent kernel capabilities in the future, and c) aligning
vsock with other socket types.

This patch is taken from the original series found here:
https://lore.kernel.org/all/cover.1660362668.git.bobby.eshle...@bytedance.com/

Small-sized packet throughput improved by ~5% (from 18.53 Mb/s to 19.51
Mb/s). Tested using uperf, 16B payloads, 64 threads, 100s, averaged from
10 test runs (n=10). This improvement is likely due to packet merging.

Large-sized packet throughput decreases ~9% (from 27.25 Gb/s to 25.04
Gb/s). Tested using uperf, 64KB payloads, 64 threads, 100s, averaged
from 10 test runs (n=10).

Medium-sized packet throughput decreases ~5% (from 4.0 Gb/s to 3.81
Gb/s). Tested using uperf, 4k to 8k payload sizes picked randomly
according to normal distribution, 64 threads, 100s, averaged from 10
test runs (n=10).


It is surprising to me that the original vsock code managed to outperform
the new one, given that to my knowledge we did not focus on optimizing it.


Yeah, me too.

From these numbers, maybe the allocation cost has been reduced, as it
performs better with small packets. But with medium to large packets we
perform worse, perhaps because previously we were allocating a
contiguous buffer of up to 64k?
Could alloc_skb() instead allocate non-contiguous pages? (That would
also solve the problems we saw a few days ago.)


@Bobby Are these numbers for guest -> host communication? Can we try the 
reverse path as well?


I will review the patch in the next few days!

Thanks,
Stefano



Re: [PATCH v2] vsock: replace virtio_vsock_pkt with sk_buff

2022-10-06 Thread Michael S. Tsirkin
On Wed, Oct 05, 2022 at 06:19:44PM -0700, Bobby Eshleman wrote:
> This patch replaces the struct virtio_vsock_pkt with struct sk_buff.
> 
> Using sk_buff in vsock benefits it by a) allowing vsock to be extended
> for socket-related features like sockmap, b) letting vsock use other
> sk_buff-dependent kernel capabilities in the future, and c) aligning
> vsock with other socket types.
> 
> This patch is taken from the original series found here:
> https://lore.kernel.org/all/cover.1660362668.git.bobby.eshle...@bytedance.com/
> 
> Small-sized packet throughput improved by ~5% (from 18.53 Mb/s to 19.51
> Mb/s). Tested using uperf, 16B payloads, 64 threads, 100s, averaged from
> 10 test runs (n=10). This improvement is likely due to packet merging.
> 
> Large-sized packet throughput decreases ~9% (from 27.25 Gb/s to 25.04
> Gb/s). Tested using uperf, 64KB payloads, 64 threads, 100s, averaged
> from 10 test runs (n=10).
> 
> Medium-sized packet throughput decreases ~5% (from 4.0 Gb/s to 3.81
> Gb/s). Tested using uperf, 4k to 8k payload sizes picked randomly
> according to normal distribution, 64 threads, 100s, averaged from 10
> test runs (n=10).

It is surprising to me that the original vsock code managed to outperform
the new one, given that to my knowledge we did not focus on optimizing it.
Is there a chance a bit more tuning before merging would get
rid of the performance regressions?

> All tests done in nested VMs (virtual host and virtual guest).
> 
> Signed-off-by: Bobby Eshleman 
> ---
> Changes in v2:
> - Use alloc_skb() directly instead of sock_alloc_send_pskb() to minimize
>   uAPI changes.
> - Do not marshal errors to -ENOMEM for non-virtio implementations.
> - No longer a part of the original series
> - Some code cleanup and refactoring
> - Include performance stats
> 
>  drivers/vhost/vsock.c   | 215 +--
>  include/linux/virtio_vsock.h|  64 +++-
>  net/vmw_vsock/af_vsock.c|   1 +
>  net/vmw_vsock/virtio_transport.c| 206 +-
>  net/vmw_vsock/virtio_transport_common.c | 483 
>  net/vmw_vsock/vsock_loopback.c  |  51 +--
>  6 files changed, 504 insertions(+), 516 deletions(-)
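
The essence of the conversion in the diff below is that struct sk_buff_head
embeds its own spinlock, so the open-coded list_head plus spinlock pair can be
replaced by the standard queue helpers. A minimal sketch of the pattern (not
the patch itself, hypothetical function names):

#include <linux/skbuff.h>

static struct sk_buff_head example_queue;

static void example_init(void)
{
	skb_queue_head_init(&example_queue);	/* replaces INIT_LIST_HEAD + spin_lock_init */
}

static void example_produce(struct sk_buff *skb)
{
	skb_queue_tail(&example_queue, skb);	/* replaces lock + list_add_tail + unlock */
}

static struct sk_buff *example_consume(void)
{
	return skb_dequeue(&example_queue);	/* replaces lock + list_first_entry + list_del + unlock */
}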
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index 368330417bde..6179e637602f 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -51,8 +51,7 @@ struct vhost_vsock {
>   struct hlist_node hash;
>  
>   struct vhost_work send_pkt_work;
> - spinlock_t send_pkt_list_lock;
> - struct list_head send_pkt_list; /* host->guest pending packets */
> + struct sk_buff_head send_pkt_queue; /* host->guest pending packets */
>  
>   atomic_t queued_replies;
>  
> @@ -108,7 +107,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>   vhost_disable_notify(&vsock->dev, vq);
>  
>   do {
> - struct virtio_vsock_pkt *pkt;
> + struct sk_buff *skb;
> + struct virtio_vsock_hdr *hdr;
>   struct iov_iter iov_iter;
>   unsigned out, in;
>   size_t nbytes;
> @@ -116,31 +116,22 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>   int head;
>   u32 flags_to_restore = 0;
>  
> - spin_lock_bh(&vsock->send_pkt_list_lock);
> - if (list_empty(&vsock->send_pkt_list)) {
> - spin_unlock_bh(&vsock->send_pkt_list_lock);
> + skb = skb_dequeue(&vsock->send_pkt_queue);
> +
> + if (!skb) {
>   vhost_enable_notify(&vsock->dev, vq);
>   break;
>   }
>  
> - pkt = list_first_entry(&vsock->send_pkt_list,
> -        struct virtio_vsock_pkt, list);
> - list_del_init(&pkt->list);
> - spin_unlock_bh(&vsock->send_pkt_list_lock);
> -
>   head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
>        &out, &in, NULL, NULL);
>   if (head < 0) {
> - spin_lock_bh(&vsock->send_pkt_list_lock);
> - list_add(&pkt->list, &vsock->send_pkt_list);
> - spin_unlock_bh(&vsock->send_pkt_list_lock);
> + skb_queue_head(&vsock->send_pkt_queue, skb);
>   break;
>   }
>  
>   if (head == vq->num) {
> - spin_lock_bh(&vsock->send_pkt_list_lock);
> - list_add(&pkt->list, &vsock->send_pkt_list);
> - spin_unlock_bh(&vsock->send_pkt_list_lock);
> + skb_queue_head(&vsock->send_pkt_queue, skb);
>  
>   /* We cannot finish yet if more buffers snuck in while
>* re-enabling notify.
> @@ -153,26 +144,27 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
>   }
>  
>   if (out) {
> - virtio_transport_free_pkt(pkt);
> + kfree_skb(skb);
>   vq_err(vq, "Expected 0 output buffers, got %u\n",