Re: [RFC PATCH v2] vdpa: Do not count the pages that were already pinned in the vhost-vDPA

2022-05-31 Thread Jason Wang
On Wed, Jun 1, 2022 at 9:20 AM Cindy Lu  wrote:
>
> We count pinned_vm as follow in vhost-vDPA
>
> lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
>  ret = -ENOMEM;
>  goto unlock;
> }
> This means if we have two vDPA devices for the same VM the pages would be 
> counted twice
> So we add a tree to save the page that counted and we will not count it
> again.

The code is not easy to be reviewed, some suggestions:

- It's better to explain in general the algorithm you used here
- Add more comment in the codes to explain the rationale

And I still see the above check against the RLIMIT in the code, is it
intentional?

> Signed-off-by: Cindy Lu 
> ---
>  drivers/vhost/vdpa.c  | 542 +-
>  drivers/vhost/vhost.h |   1 +
>  2 files changed, 539 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 05f5fd2af58f..1b0da0735efd 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -24,6 +24,10 @@
>  #include 
>
>  #include "vhost.h"
> +#include 
> +#include 
> +#include 
> +#include 
>
>  enum {
> VHOST_VDPA_BACKEND_FEATURES =
> @@ -506,12 +510,478 @@ static long vhost_vdpa_unlocked_ioctl(struct file 
> *filep,
> return r;
>  }
>
> +struct vdpa_tree_node {
> +   struct interval_tree_node tree_node;

Can we simply reuse the vhost_iotlb tree? Note that vhost_iotlb_map
can be associated with a opaque as token which could be used as
reference count.

> +   int ref;

If it's a refcount, let's use unsigned here.

> +};
> +struct vdpa_link_node {
> +   struct vdpa_tree_node *vdpa_node;
> +   struct vdpa_link_node *next;

Need to explain why we need a linked list here considering we've
already structured it as an interval tree.

Btw, unless it's performance critical, let's try to re-use kernel list.h.

> +   u64 node_start;
> +   u64 node_last;

Let's add a comment to explain each member here.

> +};
> +
> +int vhost_vdpa_add_range_ctx(struct rb_root_cached *root, u64 start, u64 
> last,
> +int ref)

We don't want to export this symbol, so let's make the function
static, so did other functions.

> +{
> +   struct interval_tree_node *new_node;
> +   struct vdpa_tree_node *vdpa_node;
> +
> +   if (last < start)
> +   return -EFAULT;
> +
> +   /* If the range being mapped is [0, ULONG_MAX], split it into two 
> entries
> +* otherwise its size would overflow u64.
> +*/
> +   if (start == 0 && last == ULONG_MAX) {
> +   u64 mid = last / 2;
> +
> +   vhost_vdpa_add_range_ctx(root, start, mid, ref);
> +   start = mid + 1;
> +   }
> +   vdpa_node = kmalloc(sizeof(struct vdpa_tree_node), GFP_ATOMIC);
> +

Let's check if kmalloc succeeds here.


> +   new_node = &vdpa_node->tree_node;
> +   if (!new_node)
> +   return -ENOMEM;
> +
> +   new_node->start = start;
> +   new_node->last = last;
> +   vdpa_node->ref = ref;
> +
> +   interval_tree_insert(new_node, root);
> +
> +   return 0;
> +}
> +
> +u64 vhost_vdpa_range_ref_add(struct rb_root_cached *root,
> +struct vdpa_link_node *link_head, int 
> node_number,
> +u64 start, u64 last)
> +{
> +   int i = 0;
> +   u64 size = 0;
> +   int new_ref;
> +   u64 node_start;
> +   u64 node_last;
> +   u64 range_start;
> +   u64 range_last;
> +   int range_size;
> +   struct vdpa_link_node *link_node;
> +   struct vdpa_tree_node *vdpa_node = NULL;
> +   struct interval_tree_node *node = NULL;
> +
> +   if (node_number == 0) {
> +   vhost_vdpa_add_range_ctx(root, start, last, 1);
> +
> +   size = last - start + 1;
> +   return size;
> +   }
> +
> +   link_node = link_head;
> +   range_start = start;
> +   range_last = last;
> +   range_size = range_start - range_last;
> +   for (i = 0; i < node_number; i++) {
> +   vdpa_node = link_node->vdpa_node;
> +   link_node = link_node->next;
> +   node = &vdpa_node->tree_node;
> +   new_ref = vdpa_node->ref;
> +   node_start = node->start;
> +   node_last = node->last;
> +
> +   if (range_start == node_start) {
> +   if (node_last < range_last) {
> +   /* range_start= node->start--- 
> node->last--range_last*/
> +   vhost_vdpa_add_range_ctx(root, node_start,
> +node_last,
> +new_ref + 1);
> +   /*count the next range */
> +   } else if (node_last > range_last) {
> +   /* range

Re: [PATCH v4 0/4] Implement vdpasim stop operation

2022-05-31 Thread Jason Wang
On Wed, Jun 1, 2022 at 4:19 AM Parav Pandit  wrote:
>
>
> > From: Jason Wang 
> > Sent: Sunday, May 29, 2022 11:39 PM
> >
> > On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin  wrote:
> > >
> > > On Thu, May 26, 2022 at 12:54:32PM +, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Eugenio Pérez 
> > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > >
> > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will
> > > > > offer
> > > > >
> > > > > that backend feature and userspace can effectively stop the device.
> > > > >
> > > > >
> > > > >
> > > > > This is a must before get virtqueue indexes (base) for live
> > > > > migration,
> > > > >
> > > > > since the device could modify them after userland gets them. There
> > > > > are
> > > > >
> > > > > individual ways to perform that action for some devices
> > > > >
> > > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but
> > there
> > > > > was no
> > > > >
> > > > > way to perform it for any vhost device (and, in particular, 
> > > > > vhost-vdpa).
> > > > >
> > > > >
> > > > >
> > > > > After the return of ioctl with stop != 0, the device MUST finish
> > > > > any
> > > > >
> > > > > pending operations like in flight requests. It must also preserve
> > > > > all
> > > > >
> > > > > the necessary state (the virtqueue vring base plus the possible
> > > > > device
> > > > >
> > > > > specific states) that is required for restoring in the future. The
> > > > >
> > > > > device must not change its configuration after that point.
> > > > >
> > > > >
> > > > >
> > > > > After the return of ioctl with stop == 0, the device can continue
> > > > >
> > > > > processing buffers as long as typical conditions are met (vq is
> > > > > enabled,
> > > > >
> > > > > DRIVER_OK status bit is enabled, etc).
> > > >
> > > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map 
> > > > to
> > any mechanism in the virtio spec.
> > > >
> > > > Why can't we use this ioctl() to indicate driver to start/stop the 
> > > > device
> > instead of driving it through the driver_ok?
> > > > This is in the context of other discussion we had in the LM series.
> > >
> > > If there's something in the spec that does this then let's use that.
> >
> > Actually, we try to propose a independent feature here:
> >
> > https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> >
> This will stop the device for all the operations.

Well, the ability to query the virtqueue state was proposed as another
feature (Eugenio, please correct me). This should be sufficient for
making virtio-net to be live migrated.

https://lists.oasis-open.org/archives/virtio-comment/202103/msg8.html

> Once the device is stopped, its state cannot be queried further as device 
> won't respond.
> It has limited use case.
> What we need is to stop non admin queue related portion of the device.

See above.

Thanks

>
> > Does it make sense to you?
> >
> > Thanks
> >
> > > Unfortunately the LM series seems to be stuck on moving bits around
> > > with the admin virtqueue ...
> > >
> > > --
> > > MST
> > >
>

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: Re: Re: [PATCH 0/3] recover hardware corrupted page by virtio balloon

2022-05-31 Thread zhenwei pi

On 5/31/22 12:08, Jue Wang wrote:

On Mon, May 30, 2022 at 8:49 AM Peter Xu  wrote:


On Mon, May 30, 2022 at 07:33:35PM +0800, zhenwei pi wrote:

A VM uses RAM of 2M huge page. Once a MCE(@HVAy in [HVAx,HVAz)) occurs, the
2M([HVAx,HVAz)) of hypervisor becomes unaccessible, but the guest poisons 4K
(@GPAy in [GPAx, GPAz)) only, it may hit another 511 MCE ([GPAx, GPAz)
except GPAy). This is the worse case, so I want to add
  '__le32 corrupted_pages' in struct virtio_balloon_config, it is used in the
next step: reporting 512 * 4K 'corrupted_pages' to the guest, the guest has
a chance to isolate the other 511 pages ahead of time. And the guest
actually loses 2M, fixing 512*4K seems to help significantly.


It sounds hackish to teach a virtio device to assume one page will always
be poisoned in huge page granule.  That's only a limitation to host kernel
not virtio itself.

E.g. there're upstream effort ongoing with enabling doublemap on hugetlbfs
pages so hugetlb pages can be mapped in 4k with it.  It provides potential
possibility to do page poisoning with huge pages in 4k too.  When that'll
be ready the assumption can go away, and that does sound like a better
approach towards this problem.


+1.

A hypervisor should always strive to minimize the guest memory loss.

The HugeTLB double mapping enlightened memory poisoning behavior (only
poison 4K out of a 2MB huge page and 4K in guest) is a much better
solution here. To be completely transparent, it's not _strictly_
required to poison the page (whatever the granularity it is) on the
host side, as long as the following are true:

1. A hypervisor can emulate the _minimized_ (e.g., 4K) the poison to the guest.
2. The host page with the UC error is "isolated" (could be PG_HWPOISON
or in some other way) and prevented from being reused by other
processes.

For #2, PG_HWPOISON and HugeTLB double mapping enlightened memory
poisoning is a good solution.







I assume when talking about "the performance memory drops a lot", you
imply that this patch set can mitigate that performance drop?

But why do you see a performance drop? Because we might lose some
possible THP candidates (in the host or the guest) and you want to plug
does holes? I assume you'll see a performance drop simply because
poisoning memory is expensive, including migrating pages around on CE.

If you have some numbers to share, especially before/after this change,
that would be great.



The CE storm leads 2 problems I have even seen:
1, the memory bandwidth slows down to 10%~20%, and the cycles per
instruction of CPU increases a lot.
2, the THR (/proc/interrupts) interrupts frequently, the CPU has to use a
lot time to handle IRQ.


Totally no good knowledge on CMCI, but if 2) is true then I'm wondering
whether it's necessary to handle the interrupts that frequently.  When I
was reading the Intel CMCI vector handler I stumbled over this comment:

/*
  * The interrupt handler. This is called on every event.
  * Just call the poller directly to log any events.
  * This could in theory increase the threshold under high load,
  * but doesn't for now.
  */
static void intel_threshold_interrupt(void)

I think that matches with what I was thinking..  I mean for 2) not sure
whether it can be seen as a CMCI problem and potentially can be optimized
by adjust the cmci threshold dynamically.


The CE storm caused performance drop is caused by the extra cycles
spent by the ECC steps in memory controller, not in CMCI handling.
This is observed in the Google fleet as well. A good solution is to
monitor the CE rate closely in user space via /dev/mcelog and migrate
all VMs to another host once the CE rate exceeds some threshold.

CMCI is a _background_ interrupt that is not handled in the process
execution context and its handler is setup to switch to poll (1 / 5
min) mode if there are more than ~ a dozen CEs reported via CMCI per
second.


--
Peter Xu



Hi, Andrew, David, Naoya

According to the suggestions, I'd give up the improvement of memory 
failure on huge page in this series.


Is it worth recovering corrupted pages for the guest kernel? I'd follow 
your decision.


--
zhenwei pi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: RE: RE: [PATCH v8 1/1] crypto: Introduce RSA algorithm

2022-05-31 Thread zhenwei pi

On 5/31/22 20:08, Gonglei (Arei) wrote:




-Original Message-
From: zhenwei pi [mailto:pizhen...@bytedance.com]
Sent: Tuesday, May 31, 2022 9:48 AM
To: Gonglei (Arei) 
Cc: qemu-de...@nongnu.org; m...@redhat.com;
virtualization@lists.linux-foundation.org; helei.si...@bytedance.com;
berra...@redhat.com
Subject: Re: RE: [PATCH v8 1/1] crypto: Introduce RSA algorithm

On 5/30/22 21:31, Gonglei (Arei) wrote:




-Original Message-
From: zhenwei pi [mailto:pizhen...@bytedance.com]
Sent: Friday, May 27, 2022 4:48 PM
To: m...@redhat.com; Gonglei (Arei) 
Cc: qemu-de...@nongnu.org; virtualization@lists.linux-foundation.org;
helei.si...@bytedance.com; berra...@redhat.com; zhenwei pi

Subject: [PATCH v8 1/1] crypto: Introduce RSA algorithm



Skip...





OK. For the patch:

Reviewed-by: Gonglei 


Regards,
-Gonglei
 



Hi, Michael & Lei,

The other patches of this series has been already merged into QEMU, this 
patch is the last one. With this patch, we can test virtio-crypto 
akcipher end-to-end.


Thanks a lot!

--
zhenwei pi
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[mst-vhost:vhost 62/65] drivers/virtio/virtio_ring.c:1783:9: error: use of undeclared identifier 'vq'

2022-05-31 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git vhost
head:   5e83df49b4993a11b01399f6ce402f775940f965
commit: a50f09346a341984d34ff41f03dbd14dea6f20fe [62/65] virtio_ring: remove 
unused variable in virtqueue_add()
config: mips-randconfig-c004-20220531 
(https://download.01.org/0day-ci/archive/20220601/202206010444.egbxgpmj-...@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 
c825abd6b0198fb088d9752f556a70705bc99dfd)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install mips cross compiling tool for clang build
# apt-get install binutils-mipsel-linux-gnu
# 
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?id=a50f09346a341984d34ff41f03dbd14dea6f20fe
git remote add mst-vhost 
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git
git fetch --no-tags mst-vhost vhost
git checkout a50f09346a341984d34ff41f03dbd14dea6f20fe
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 
O=build_dir ARCH=mips SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

>> drivers/virtio/virtio_ring.c:1783:9: error: use of undeclared identifier 'vq'
   return vq->packed_ring ? virtqueue_add_packed(_vq, sgs, total_sg,
  ^
   1 error generated.


vim +/vq +1783 drivers/virtio/virtio_ring.c

1ce9e6055fa0a9 Tiwei Bie 2018-11-21  1768  
1ce9e6055fa0a9 Tiwei Bie 2018-11-21  1769  
e6f633e5beab65 Tiwei Bie 2018-11-21  1770  /*
e6f633e5beab65 Tiwei Bie 2018-11-21  1771   * Generic functions and exported 
symbols.
e6f633e5beab65 Tiwei Bie 2018-11-21  1772   */
e6f633e5beab65 Tiwei Bie 2018-11-21  1773  
e6f633e5beab65 Tiwei Bie 2018-11-21  1774  static inline int 
virtqueue_add(struct virtqueue *_vq,
e6f633e5beab65 Tiwei Bie 2018-11-21  1775   struct 
scatterlist *sgs[],
e6f633e5beab65 Tiwei Bie 2018-11-21  1776   
unsigned int total_sg,
e6f633e5beab65 Tiwei Bie 2018-11-21  1777   
unsigned int out_sgs,
e6f633e5beab65 Tiwei Bie 2018-11-21  1778   
unsigned int in_sgs,
e6f633e5beab65 Tiwei Bie 2018-11-21  1779   void 
*data,
e6f633e5beab65 Tiwei Bie 2018-11-21  1780   void 
*ctx,
e6f633e5beab65 Tiwei Bie 2018-11-21  1781   gfp_t 
gfp)
e6f633e5beab65 Tiwei Bie 2018-11-21  1782  {
1ce9e6055fa0a9 Tiwei Bie 2018-11-21 @1783   return vq->packed_ring ? 
virtqueue_add_packed(_vq, sgs, total_sg,
1ce9e6055fa0a9 Tiwei Bie 2018-11-21  1784   
out_sgs, in_sgs, data, ctx, gfp) :
1ce9e6055fa0a9 Tiwei Bie 2018-11-21  1785
virtqueue_add_split(_vq, sgs, total_sg,
e6f633e5beab65 Tiwei Bie 2018-11-21  1786   
out_sgs, in_sgs, data, ctx, gfp);
e6f633e5beab65 Tiwei Bie 2018-11-21  1787  }
e6f633e5beab65 Tiwei Bie 2018-11-21  1788  

:: The code at line 1783 was first introduced by commit
:: 1ce9e6055fa0a9043405c5604cf19169ec5379ff virtio_ring: introduce packed 
ring support

:: TO: Tiwei Bie 
:: CC: David S. Miller 

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v4 0/4] Implement vdpasim stop operation

2022-05-31 Thread Parav Pandit via Virtualization


> From: Eugenio Perez Martin 
> Sent: Friday, May 27, 2022 3:55 AM
> 
> On Fri, May 27, 2022 at 4:26 AM Jason Wang  wrote:
> >
> > On Thu, May 26, 2022 at 8:54 PM Parav Pandit  wrote:
> > >
> > >
> > >
> > > > From: Eugenio Pérez 
> > > > Sent: Thursday, May 26, 2022 8:44 AM
> > >
> > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will
> > > > offer
> > > >
> > > > that backend feature and userspace can effectively stop the device.
> > > >
> > > >
> > > >
> > > > This is a must before get virtqueue indexes (base) for live
> > > > migration,
> > > >
> > > > since the device could modify them after userland gets them. There
> > > > are
> > > >
> > > > individual ways to perform that action for some devices
> > > >
> > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but
> there
> > > > was no
> > > >
> > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > >
> > > >
> > > >
> > > > After the return of ioctl with stop != 0, the device MUST finish
> > > > any
> > > >
> > > > pending operations like in flight requests. It must also preserve
> > > > all
> > > >
> > > > the necessary state (the virtqueue vring base plus the possible
> > > > device
> > > >
> > > > specific states) that is required for restoring in the future. The
> > > >
> > > > device must not change its configuration after that point.
> > > >
> > > >
> > > >
> > > > After the return of ioctl with stop == 0, the device can continue
> > > >
> > > > processing buffers as long as typical conditions are met (vq is
> > > > enabled,
> > > >
> > > > DRIVER_OK status bit is enabled, etc).
> > >
> > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to
> any mechanism in the virtio spec.
> >
> > We try to provide forward compatibility to VIRTIO_CONFIG_S_STOP. That
> > means it is expected to implement at least a subset of
> > VIRTIO_CONFIG_S_STOP.
> >
> 
> Appending a link to the proposal, just for reference [1].
> 
> > >
> > > Why can't we use this ioctl() to indicate driver to start/stop the device
> instead of driving it through the driver_ok?
> >
> 
> Parav, I'm not sure I follow you here.
> 
> By the proposal, the resume of the device is (From qemu POV):
> 1. To configure all data vqs and cvq (addr, num, ...) 2. To enable only CVQ, 
> not
> data vqs 3. To send DRIVER_OK 4. Wait for all buffers of CVQ to be used 5. To
> enable all others data vqs (individual ioctl at the moment)
> 
> Where can we fit the resume (as "stop(false)") here? If the device is stopped
> (as if we send stop(true) before DRIVER_OK), we don't read CVQ first. If we
> send it right after (or instead) DRIVER_OK, data buffers can reach data vqs
> before configuring RSS.
> 
It doesn’t make sense with currently proposed way of using cvq to replay the 
config.
Need to continue with currently proposed temporary method that subsequently to 
be replaced with optimized flow as we discussed.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

RE: [PATCH v4 0/4] Implement vdpasim stop operation

2022-05-31 Thread Parav Pandit via Virtualization

> From: Jason Wang 
> Sent: Sunday, May 29, 2022 11:39 PM
> 
> On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, May 26, 2022 at 12:54:32PM +, Parav Pandit wrote:
> > >
> > >
> > > > From: Eugenio Pérez 
> > > > Sent: Thursday, May 26, 2022 8:44 AM
> > >
> > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will
> > > > offer
> > > >
> > > > that backend feature and userspace can effectively stop the device.
> > > >
> > > >
> > > >
> > > > This is a must before get virtqueue indexes (base) for live
> > > > migration,
> > > >
> > > > since the device could modify them after userland gets them. There
> > > > are
> > > >
> > > > individual ways to perform that action for some devices
> > > >
> > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but
> there
> > > > was no
> > > >
> > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > >
> > > >
> > > >
> > > > After the return of ioctl with stop != 0, the device MUST finish
> > > > any
> > > >
> > > > pending operations like in flight requests. It must also preserve
> > > > all
> > > >
> > > > the necessary state (the virtqueue vring base plus the possible
> > > > device
> > > >
> > > > specific states) that is required for restoring in the future. The
> > > >
> > > > device must not change its configuration after that point.
> > > >
> > > >
> > > >
> > > > After the return of ioctl with stop == 0, the device can continue
> > > >
> > > > processing buffers as long as typical conditions are met (vq is
> > > > enabled,
> > > >
> > > > DRIVER_OK status bit is enabled, etc).
> > >
> > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to
> any mechanism in the virtio spec.
> > >
> > > Why can't we use this ioctl() to indicate driver to start/stop the device
> instead of driving it through the driver_ok?
> > > This is in the context of other discussion we had in the LM series.
> >
> > If there's something in the spec that does this then let's use that.
> 
> Actually, we try to propose a independent feature here:
> 
> https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> 
This will stop the device for all the operations.
Once the device is stopped, its state cannot be queried further as device won't 
respond.
It has limited use case.
What we need is to stop non admin queue related portion of the device.

> Does it make sense to you?
> 
> Thanks
> 
> > Unfortunately the LM series seems to be stuck on moving bits around
> > with the admin virtqueue ...
> >
> > --
> > MST
> >

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH V9 0/8] Use copy_process in vhost layer

2022-05-31 Thread Michael S. Tsirkin
On Thu, May 12, 2022 at 04:46:56PM -0500, Mike Christie wrote:
> The following patches were made over Eric's tree:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
> 
> and his kthread-cleanups-for-v5.19 branch. They allow the vhost layer to
> do a copy_process on the thread that does the VHOST_SET_OWNER ioctl like how
> io_uring does a copy_process against its userspace app. This allows the
> vhost layer's worker threads to inherit cgroups, namespaces, address
> space, etc and this worker thread will also be accounted for against that
> owner/parent process's RLIMIT_NPROC limit.
> 
> If you are not familiar with qemu and vhost here is more detailed
> problem description:
> 
> Qemu will create vhost devices in the kernel which perform network, SCSI,
> etc IO and management operations from worker threads created by the
> kthread API. Because the kthread API does a copy_process on the kthreadd
> thread, the vhost layer has to use kthread_use_mm to access the Qemu
> thread's memory and cgroup_attach_task_all to add itself to the Qemu
> thread's cgroups.
> 
> The problem with this approach is that we then have to add new functions/
> args/functionality for every thing we want to inherit. I started doing
> that here:
> 
> https://lkml.org/lkml/2021/6/23/1233
> 
> for the RLIMIT_NPROC check, but it seems it might be easier to just
> inherit everything from the beginning, becuase I'd need to do something
> like that patch several times.

Series:

Acked-by: Michael S. Tsirkin 

Who is going to merge this?

> V9:
> - Rebase against Eric's kthread-cleanups-for-v5.19 branch. Drop patches
> no longer needed due to kernel clone arg and pf io worker patches in that
> branch.
> V8:
> - Fix kzalloc GFP use.
> - Fix email subject version number.
> V7:
> - Drop generic user_worker_* helpers and replace with vhost_task specific
>   ones.
> - Drop autoreap patch. Use kernel_wait4 instead.
> - Fix issue where vhost.ko could be removed while the worker function is
>   still running.
> V6:
> - Rename kernel_worker to user_worker and fix prefixes.
> - Add better patch descriptions.
> V5:
> - Handle kbuild errors by building patchset against current kernel that
>   has all deps merged. Also add patch to remove create_io_thread code as
>   it's not used anymore.
> - Rebase patchset against current kernel and handle a new vm PF_IO_WORKER
>   case added in 5.16-rc1.
> - Add PF_USER_WORKER flag so we can check it later after the initial
>   thread creation for the wake up, vm and singal cses.
> - Added patch to auto reap the worker thread.
> V4:
> - Drop NO_SIG patch and replaced with Christian's SIG_IGN patch.
> - Merged Christian's kernel_worker_flags_valid helpers into patch 5 that
>   added the new kernel worker functions.
> - Fixed extra "i" issue.
> - Added PF_USER_WORKER flag and added check that kernel_worker_start users
>   had that flag set. Also dropped patches that passed worker flags to
>   copy_thread and replaced with PF_USER_WORKER check.
> V3:
> - Add parentheses in p->flag and work_flags check in copy_thread.
> - Fix check in arm/arm64 which was doing the reverse of other archs
>   where it did likely(!flags) instead of unlikely(flags).
> V2:
> - Rename kernel_copy_process to kernel_worker.
> - Instead of exporting functions, make kernel_worker() a proper
>   function/API that does common work for the caller.
> - Instead of adding new fields to kernel_clone_args for each option
>   make it flag based similar to CLONE_*.
> - Drop unused completion struct in vhost.
> - Fix compile warnings by merging vhost cgroup cleanup patch and
>   vhost conversion patch.
> 
> 
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


ACM HPDC 2022 Call for Participation

2022-05-31 Thread Ali Anwar
HPDC'22 Call for Participation
https://www.hpdc.org/2022/program/call-for-participation/

Overview:
The ACM International Symposium on High-Performance Parallel and
Distributed Computing (HPDC) is the premier annual conference for
presenting the latest research on the design, implementation,
evaluation, and the use of parallel and distributed systems for
high-end computing.

Registration:
HPDC'22 will be held in both in-person and virtual mode. HPDC
registrants can attend the main conference and workshops. There is a
separate workshops only registration. For more information, visit
Registration.

Highlights:
1) 3 Keynotes: Franck Cappello (Argonne National Lab), Sudhanva
Gurumurthi (AMD), Manish Parashar (University of Utah). For more
information, visit Keynote Speakers.
2) HPDC Achievement Award: Franck Cappello. For more information,
visit Achievement Award.
3) 21 research papers and 5 poster presentations across 6 sessions,
namely, {Data Centers and HPC Systems}, {HPC Memory, I/O, and Storage
Systems}, {Reliability and Scheduling}, {HPC Algorithms}, {HPC
Toolchains, Traces, and More} and {Cloud Computing and Machine
Learning}. For more information, visit Technical Sessions and Accepted
Papers/Posters.
4) This year we have 7 workshops to be held in-conjunction with
HPDC'22. For more information, visit Workshops.

Main Conference:
The 21 contributed papers accepted for the main conference will be
presented in hybrid(virtual and in-person) technical sessions on June
27, 28 and 29. Details on the presentation format to follow.

Poster Sessions:
The 5 accepted posters will be presented on the evening of June 28.
Details on the presentation format to follow.

Workshops:
The 7 workshops will be held in two sessions:

June 30: SNTA, HiPS'22, PERMAVOST, P-RECS'22
July 1: FRAME'22, EMOSS'22, FlexScience
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3] x86/vmware: use unsigned integer for shifting

2022-05-31 Thread Srivatsa S. Bhat
On 5/28/22 6:52 AM, Peter Zijlstra wrote:
> On Fri, May 27, 2022 at 11:27:37PM +0530, Shreenidhi Shedi wrote:
>> From: Shreenidhi Shedi 
>>
>> Shifting signed 32-bit value by 31 bits is implementation-defined
>> behaviour. Using unsigned is better option for this.
> 
> The kernel builds with -fno-strict-overflow and as such this behaviour
> is well defined.
>

Ah, I see. Thank you, Peter!
 
>> Fixes: 4cca6ea04d31 ("x86/apic: Allow x2apic without IR on VMware platform")
> 
> Nothing broken, therefore nothing fixed.
> 

Agreed.

I think using the BIT() macro still provides a nice readability
improvement. So, Shreenidhi, could you spin a new version of the patch
with the same code changes but with a different commit message about
using the BIT() macro to simplify the code, and also include a
clarification as to why the existing code is correct (which Peter
pointed out), please?

Thank you!

Regards,
Srivatsa
VMware Photon OS
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: RE: [PATCH v8 1/1] crypto: Introduce RSA algorithm

2022-05-31 Thread Gonglei (Arei) via Virtualization



> -Original Message-
> From: zhenwei pi [mailto:pizhen...@bytedance.com]
> Sent: Tuesday, May 31, 2022 9:48 AM
> To: Gonglei (Arei) 
> Cc: qemu-de...@nongnu.org; m...@redhat.com;
> virtualization@lists.linux-foundation.org; helei.si...@bytedance.com;
> berra...@redhat.com
> Subject: Re: RE: [PATCH v8 1/1] crypto: Introduce RSA algorithm
> 
> On 5/30/22 21:31, Gonglei (Arei) wrote:
> >
> >
> >> -Original Message-
> >> From: zhenwei pi [mailto:pizhen...@bytedance.com]
> >> Sent: Friday, May 27, 2022 4:48 PM
> >> To: m...@redhat.com; Gonglei (Arei) 
> >> Cc: qemu-de...@nongnu.org; virtualization@lists.linux-foundation.org;
> >> helei.si...@bytedance.com; berra...@redhat.com; zhenwei pi
> >> 
> >> Subject: [PATCH v8 1/1] crypto: Introduce RSA algorithm
> >>
> >>
> > Skip...
> >
> >> +static int64_t
> >> +virtio_crypto_create_asym_session(VirtIOCrypto *vcrypto,
> >> +   struct virtio_crypto_akcipher_create_session_req
> >> *sess_req,
> >> +   uint32_t queue_id, uint32_t opcode,
> >> +   struct iovec *iov, unsigned int out_num) {
> >> +VirtIODevice *vdev = VIRTIO_DEVICE(vcrypto);
> >> +CryptoDevBackendSessionInfo info = {0};
> >> +CryptoDevBackendAsymSessionInfo *asym_info;
> >> +int64_t session_id;
> >> +int queue_index;
> >> +uint32_t algo, keytype, keylen;
> >> +g_autofree uint8_t *key = NULL;
> >> +Error *local_err = NULL;
> >> +
> >> +algo = ldl_le_p(&sess_req->para.algo);
> >> +keytype = ldl_le_p(&sess_req->para.keytype);
> >> +keylen = ldl_le_p(&sess_req->para.keylen);
> >> +
> >> +if ((keytype != VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PUBLIC)
> >> + && (keytype !=
> VIRTIO_CRYPTO_AKCIPHER_KEY_TYPE_PRIVATE)) {
> >> +error_report("unsupported asym keytype: %d", keytype);
> >> +return -VIRTIO_CRYPTO_NOTSUPP;
> >> +}
> >> +
> >> +if (keylen) {
> >> +key = g_malloc(keylen);
> >> +if (iov_to_buf(iov, out_num, 0, key, keylen) != keylen) {
> >> +virtio_error(vdev, "virtio-crypto asym key incorrect");
> >> +return -EFAULT;
> >
> > Memory leak.
> >
> >> +}
> >> +iov_discard_front(&iov, &out_num, keylen);
> >> +}
> >> +
> >> +info.op_code = opcode;
> >> +asym_info = &info.u.asym_sess_info;
> >> +asym_info->algo = algo;
> >> +asym_info->keytype = keytype;
> >> +asym_info->keylen = keylen;
> >> +asym_info->key = key;
> >> +switch (asym_info->algo) {
> >> +case VIRTIO_CRYPTO_AKCIPHER_RSA:
> >> +asym_info->u.rsa.padding_algo =
> >> +ldl_le_p(&sess_req->para.u.rsa.padding_algo);
> >> +asym_info->u.rsa.hash_algo =
> >> +ldl_le_p(&sess_req->para.u.rsa.hash_algo);
> >> +break;
> >> +
> >> +/* TODO DSA&ECDSA handling */
> >> +
> >> +default:
> >> +return -VIRTIO_CRYPTO_ERR;
> >> +}
> >> +
> >> +queue_index = virtio_crypto_vq2q(queue_id);
> >> +session_id =
> >> + cryptodev_backend_create_session(vcrypto->cryptodev,
> >> &info,
> >> + queue_index, &local_err);
> >> +if (session_id < 0) {
> >> +if (local_err) {
> >> +error_report_err(local_err);
> >> +}
> >> +return -VIRTIO_CRYPTO_ERR;
> >> +}
> >> +
> >> +return session_id;
> >
> > Where to free the key at both normal and exceptional paths?
> >
> 
> Hi, Lei
> 
> The key is declared with g_autofree:
> g_autofree uint8_t *key = NULL;
> 

OK. For the patch:

Reviewed-by: Gonglei 


Regards,
-Gonglei


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/4] Implement vdpasim stop operation

2022-05-31 Thread Michael S. Tsirkin
On Tue, May 31, 2022 at 09:13:38AM +0200, Eugenio Perez Martin wrote:
> On Tue, May 31, 2022 at 7:42 AM Michael S. Tsirkin  wrote:
> >
> > On Thu, May 26, 2022 at 02:43:34PM +0200, Eugenio Pérez wrote:
> > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > that backend feature and userspace can effectively stop the device.
> > >
> > > This is a must before get virtqueue indexes (base) for live migration,
> > > since the device could modify them after userland gets them. There are
> > > individual ways to perform that action for some devices
> > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > >
> > > After the return of ioctl with stop != 0, the device MUST finish any
> > > pending operations like in flight requests. It must also preserve all
> > > the necessary state (the virtqueue vring base plus the possible device
> > > specific states) that is required for restoring in the future. The
> > > device must not change its configuration after that point.
> > >
> > > After the return of ioctl with stop == 0, the device can continue
> > > processing buffers as long as typical conditions are met (vq is enabled,
> > > DRIVER_OK status bit is enabled, etc).
> > >
> > > In the future, we will provide features similar to 
> > > VHOST_USER_GET_INFLIGHT_FD
> > > so the device can save pending operations.
> > >
> > > Comments are welcome.
> >
> >
> > So given this is just for simulator and affects UAPI I think it's fine
> > to make it wait for the next merge window, until there's a consensus.
> > Right?
> >
> 
> While the change is only implemented in the simulator at this moment,
> it's just the very last missing piece in the kernel to implement
> complete live migration for net devices with cvq :). All vendor
> drivers can implement this call with current code, just a little bit
> of plumbing is needed. And it was accepted in previous meetings.
> 
> If it proves it works for every configuration (nested, etc), the
> implementation can forward the call to the admin vq for example. At
> the moment, it follows the proposed stop status bit sematic to stop
> the device, which POC has been tested in these circumstances.
> 
> Thanks!

Oh absolutely, but I am guessing this plumbing won't
be ready for this merge window.

> > > v4:
> > > * Replace VHOST_STOP to VHOST_VDPA_STOP in vhost ioctl switch case too.
> > >
> > > v3:
> > > * s/VHOST_STOP/VHOST_VDPA_STOP/
> > > * Add documentation and requirements of the ioctl above its definition.
> > >
> > > v2:
> > > * Replace raw _F_STOP with BIT_ULL(_F_STOP).
> > > * Fix obtaining of stop ioctl arg (it was not obtained but written).
> > > * Add stop to vdpa_sim_blk.
> > >
> > > Eugenio Pérez (4):
> > >   vdpa: Add stop operation
> > >   vhost-vdpa: introduce STOP backend feature bit
> > >   vhost-vdpa: uAPI to stop the device
> > >   vdpa_sim: Implement stop vdpa op
> > >
> > >  drivers/vdpa/vdpa_sim/vdpa_sim.c | 21 +
> > >  drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
> > >  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 +++
> > >  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 +++
> > >  drivers/vhost/vdpa.c | 34 +++-
> > >  include/linux/vdpa.h |  6 +
> > >  include/uapi/linux/vhost.h   | 14 
> > >  include/uapi/linux/vhost_types.h |  2 ++
> > >  8 files changed, 83 insertions(+), 1 deletion(-)
> > >
> > > --
> > > 2.31.1
> > >
> >

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization