Re: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-01 Thread Jason Wang
On Thu, Jun 2, 2022 at 10:59 AM Parav Pandit  wrote:
>
>
> > From: Jason Wang 
> > Sent: Wednesday, June 1, 2022 10:00 PM
> >
> > On Thu, Jun 2, 2022 at 2:58 AM Parav Pandit  wrote:
> > >
> > >
> > > > From: Jason Wang 
> > > > Sent: Tuesday, May 31, 2022 10:42 PM
> > > >
> > > > Well, the ability to query the virtqueue state was proposed as
> > > > another feature (Eugenio, please correct me). This should be
> > > > sufficient to make virtio-net live migratable.
> > > >
> > > The device is stopped; it won't answer this special vq config done
> > > here.
> >
> > This depends on the definition of stop. Any query to the device state
> > should be allowed; otherwise it's meaningless for us.
> >
> > > Programming all of these using cfg registers doesn't scale in terms of
> > > on-chip memory or speed.
> >
> > Well, they are orthogonal and what I want to say is, we should first define
> > the semantics of stop and state of the virtqueue.
> >
> > Such a facility could be accessed by either transport specific method or 
> > admin
> > virtqueue, it totally depends on the hardware architecture of the vendor.
> >
> I find it hard to believe that a vendor can implement a CVQ but not an AQ,
> and would choose to expose hundreds of registers.
> But maybe it fits some specific hw.

You can have a look at the ifcvf dpdk driver as an example.

But another thing that is unrelated to hardware architecture is
nesting support. Having an admin virtqueue in a nesting environment looks
like overkill. Presenting a register in L1 and mapping it to L0's admin
should be good enough.

>
> I'd like to learn the advantages of such a method other than simplicity.
>
> We can clearly see that we are shifting away from such PCI registers with
> SIOV, IMS and other scalable solutions.
> virtio is drifting in the reverse direction by introducing more registers as
> a transport.
> I expect it to be an optional transport like AQ.

Actually, I had a proposal to use the admin virtqueue as a transport;
it's designed to be SIOV/IMS capable. And it's not hard to extend it
with state/stop support, etc.

>
> > >
> > > Next would be to program hundreds of statistics of the 64 VQs through a
> > giant PCI config space register in some busy polling scheme.
> >
> > We don't need giant config space, and this method has been implemented
> > by some vDPA vendors.
> >
> There are tens of 64-bit counters per VQ. These need to be programmed on the
> destination side.
> Programming these via registers requires exposing them as registers.
> In one of the proposals, I see them being queried via CVQ from the device.

I didn't see a proposal like this. And I don't think querying general
virtio state like idx with a device-specific CVQ is a good design.

>
> Programming them via cfg registers requires a large cfg space or synchronous
> programming until receiving an ACK from it.
> This means one entry at a time...
>
> Programming them via CVQ needs replicating and aligning cmd values etc. on
> all device types. All duplicated and hard to maintain.
>
>
> > >
> > > I can clearly see how all these are inefficient for faster LM.
> > > We need an efficient AQ to proceed with at minimum.
> >
> > I'm fine with admin virtqueue, but the stop and state are orthogonal to 
> > that.
> > And using admin virtqueue for stop/state will be more natural if we use
> > admin virtqueue as a transport.
> Ok.
> We should have defined it a bit earlier so that all vendors can use it. :(

I agree.

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-01 Thread Parav Pandit via Virtualization


> From: Jason Wang 
> Sent: Wednesday, June 1, 2022 10:00 PM
> 
> On Thu, Jun 2, 2022 at 2:58 AM Parav Pandit  wrote:
> >
> >
> > > From: Jason Wang 
> > > Sent: Tuesday, May 31, 2022 10:42 PM
> > >
> > > Well, the ability to query the virtqueue state was proposed as
> > > another feature (Eugenio, please correct me). This should be
> > > sufficient to make virtio-net live migratable.
> > >
> > The device is stopped; it won't answer this special vq config done here.
> 
> This depends on the definition of stop. Any query to the device state
> should be allowed; otherwise it's meaningless for us.
> 
> > Programming all of these using cfg registers doesn't scale in terms of
> > on-chip memory or speed.
> 
> Well, they are orthogonal and what I want to say is, we should first define
> the semantics of stop and state of the virtqueue.
> 
> Such a facility could be accessed by either transport specific method or admin
> virtqueue, it totally depends on the hardware architecture of the vendor.
> 
I find it hard to believe that a vendor can implement a CVQ but not an AQ,
and would choose to expose hundreds of registers.
But maybe it fits some specific hw.

I'd like to learn the advantages of such a method other than simplicity.

We can clearly see that we are shifting away from such PCI registers with SIOV,
IMS and other scalable solutions.
virtio is drifting in the reverse direction by introducing more registers as a
transport.
I expect it to be an optional transport like AQ.

> >
> > Next would be to program hundreds of statistics of the 64 VQs through a
> giant PCI config space register in some busy polling scheme.
> 
> We don't need giant config space, and this method has been implemented
> by some vDPA vendors.
> 
There are tens of 64-bit counters per VQ. These need to be programmed on the
destination side.
Programming these via registers requires exposing them as registers.
In one of the proposals, I see them being queried via CVQ from the device.

Programming them via cfg registers requires a large cfg space or synchronous
programming until receiving an ACK from it.
This means one entry at a time...

Programming them via CVQ needs replicating and aligning cmd values etc. on all
device types. All duplicated and hard to maintain.


> >
> > I can clearly see how all these are inefficient for faster LM.
> > We need an efficient AQ to proceed with at minimum.
> 
> I'm fine with admin virtqueue, but the stop and state are orthogonal to that.
> And using admin virtqueue for stop/state will be more natural if we use
> admin virtqueue as a transport.
Ok.
We should have defined it a bit earlier so that all vendors can use it. :(


Re: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-01 Thread Jason Wang


在 2022/5/26 20:43, Eugenio Pérez 写道:

Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
that backend feature and userspace can effectively stop the device.

This is a must before getting virtqueue indexes (base) for live migration,
since the device could modify them after userland gets them. There are
individual ways to perform that action for some devices
(VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
way to perform it for any vhost device (and, in particular, vhost-vdpa).

After the return of ioctl with stop != 0, the device MUST finish any
pending operations like in flight requests. It must also preserve all
the necessary state (the virtqueue vring base plus the possible device
specific states) that is required for restoring in the future. The
device must not change its configuration after that point.



I think we probably need a more accurate definition of the state, as Parav 
suggested.


Besides this, we should also clarify when stop is allowed. E.g. should we 
allow setting stop without DRIVER_OK?


Thanks




After the return of ioctl with stop == 0, the device can continue
processing buffers as long as typical conditions are met (vq is enabled,
DRIVER_OK status bit is enabled, etc).

In the future, we will provide features similar to VHOST_USER_GET_INFLIGHT_FD
so the device can save pending operations.

Comments are welcome.

v4:
* Replace VHOST_STOP with VHOST_VDPA_STOP in the vhost ioctl switch case too.

v3:
* s/VHOST_STOP/VHOST_VDPA_STOP/
* Add documentation and requirements of the ioctl above its definition.

v2:
* Replace raw _F_STOP with BIT_ULL(_F_STOP).
* Fix obtaining of stop ioctl arg (it was not obtained but written).
* Add stop to vdpa_sim_blk.

Eugenio Pérez (4):
   vdpa: Add stop operation
   vhost-vdpa: introduce STOP backend feature bit
   vhost-vdpa: uAPI to stop the device
   vdpa_sim: Implement stop vdpa op

  drivers/vdpa/vdpa_sim/vdpa_sim.c | 21 +
  drivers/vdpa/vdpa_sim/vdpa_sim.h |  1 +
  drivers/vdpa/vdpa_sim/vdpa_sim_blk.c |  3 +++
  drivers/vdpa/vdpa_sim/vdpa_sim_net.c |  3 +++
  drivers/vhost/vdpa.c | 34 +++-
  include/linux/vdpa.h |  6 +
  include/uapi/linux/vhost.h   | 14 
  include/uapi/linux/vhost_types.h |  2 ++
  8 files changed, 83 insertions(+), 1 deletion(-)




Re: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-01 Thread Jason Wang
On Thu, Jun 2, 2022 at 3:30 AM Parav Pandit  wrote:
>
>
>
> > From: Eugenio Perez Martin 
> > Sent: Wednesday, June 1, 2022 5:50 AM
> >
> > On Tue, May 31, 2022 at 10:19 PM Parav Pandit  wrote:
> > >
> > >
> > > > From: Jason Wang 
> > > > Sent: Sunday, May 29, 2022 11:39 PM
> > > >
> > > > On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin 
> > wrote:
> > > > >
> > > > > On Thu, May 26, 2022 at 12:54:32PM +, Parav Pandit wrote:
> > > > > >
> > > > > >
> > > > > > > From: Eugenio Pérez 
> > > > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > > > >
> > > > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa
> > > > > > > will offer that backend feature and userspace can effectively
> > > > > > > stop the device.
> > > > > > >
> > > > > > > This is a must before getting virtqueue indexes (base) for live
> > > > > > > migration, since the device could modify them after userland
> > > > > > > gets them. There are individual ways to perform that action for
> > > > > > > some devices (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING,
> > > > > > > ...) but there was no way to perform it for any vhost device
> > > > > > > (and, in particular, vhost-vdpa).
> > > > > > >
> > > > > > > After the return of ioctl with stop != 0, the device MUST finish
> > > > > > > any pending operations like in flight requests. It must also
> > > > > > > preserve all the necessary state (the virtqueue vring base plus
> > > > > > > the possible device specific states) that is required for
> > > > > > > restoring in the future. The device must not change its
> > > > > > > configuration after that point.
> > > > > > >
> > > > > > > After the return of ioctl with stop == 0, the device can
> > > > > > > continue processing buffers as long as typical conditions are
> > > > > > > met (vq is enabled, DRIVER_OK status bit is enabled, etc).
> > > > > >
> > > > > > Just to be clear, we are adding vdpa level new ioctl() that
> > > > > > doesn’t map to
> > > > any mechanism in the virtio spec.
> > > > > >
> > > > > > Why can't we use this ioctl() to indicate driver to start/stop
> > > > > > the device
> > > > instead of driving it through the driver_ok?
> > > > > > This is in the context of other discussion we had in the LM series.
> > > > >
> > > > > If there's something in the spec that does this then let's use that.
> > > >
> > > > Actually, we try to propose an independent feature here:
> > > >
> > > > https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> > > >
> > > This will stop the device for all operations.
> > > Once the device is stopped, its state cannot be queried further as the
> > > device won't respond.
> > > It has a limited use case.
> > > What we need is to stop the non-admin-queue-related portion of the device.
> > >
> >
> > Still don't follow this, sorry.
> Once a device is stopped, its state etc. cannot be queried.

This is not what is proposed here.

> If you want to stop and still allow certain operations, a better spec
> definition is needed that says:
>
> stop A, B, C, but allow D, E, F, G.
> A = stop CVQs and save their state somewhere
> B = stop data VQs and save their state somewhere
> C = stop the generic config interrupt

Actually, it's the stop of config space changes.
What's more, any guest-visible state must not be changed.

>
> D = query state of multiple VQs
> E = query device statistics and other elements/objects in future

This is the device state I believe.

> F = setup/config/restore certain fields

This is the reverse of D and E, that is setting the state.

> G = resume the device
>

Thanks

> >
> > Adding the admin vq to the mix, this would stop a device of a device group,
> > but not the whole virtqueue group. If the admin VQ is offered by the PF
> > (since it's not exposed to the guest), it will continue accepting requests 
> > as
> > normal. If it's exposed in the VF, I think the best bet is to shadow it, 
> > since
> > guest and host requests could conflict.
> >
> > Since this is offered through vdpa, the device backend driver can route it 
> > to
> > whatever method works better for the hardware. For example, to send an
> > admin vq command to the PF. That's why it's important to keep the feature
> > as self-contained and orthogonal to others as possible.
> >
>
> I replied in other thread to continue there.


Re: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-01 Thread Jason Wang
On Thu, Jun 2, 2022 at 2:58 AM Parav Pandit  wrote:
>
>
> > From: Jason Wang 
> > Sent: Tuesday, May 31, 2022 10:42 PM
> >
> > Well, the ability to query the virtqueue state was proposed as another
> > feature (Eugenio, please correct me). This should be sufficient to make
> > virtio-net live migratable.
> >
> The device is stopped; it won't answer this special vq config done here.

This depends on the definition of stop. Any query to the device
state should be allowed; otherwise it's meaningless for us.

> Programming all of these using cfg registers doesn't scale in terms of
> on-chip memory or speed.

Well, they are orthogonal and what I want to say is, we should first
define the semantics of stop and state of the virtqueue.

Such a facility could be accessed by either transport specific method
or admin virtqueue, it totally depends on the hardware architecture of
the vendor.

>
> Next would be to program hundreds of statistics of the 64 VQs through a giant 
> PCI config space register in some busy polling scheme.

We don't need giant config space, and this method has been implemented
by some vDPA vendors.

>
> I can clearly see how all these are inefficient for faster LM.
> We need an efficient AQ to proceed with at minimum.

I'm fine with admin virtqueue, but the stop and state are orthogonal
to that. And using admin virtqueue for stop/state will be more natural
if we use admin virtqueue as a transport.

Thanks

>
> > https://lists.oasis-open.org/archives/virtio-comment/202103/msg8.html
> >
> > > Once the device is stopped, its state cannot be queried further as the
> > > device won't respond.
> > > It has a limited use case.
> > > What we need is to stop the non-admin-queue-related portion of the device.



Re: [PATCH V9 0/8] Use copy_process in vhost layer

2022-06-01 Thread michael . christie
On 5/12/22 4:46 PM, Mike Christie wrote:
> The following patches were made over Eric's tree:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git
> 
> and his kthread-cleanups-for-v5.19 branch.

Hi Eric,

Did you have more review comments for the patches or are you ok with them
now? I haven't seen any comments in the last couple of postings. I've
been watching your trees, so I know you are doing cleanup/improvements
in the areas I needed help with so I know it's still on your radar, but
now I'm worried I'm missing emails/reviews from you :)

Are you also going to push the kthread-cleanups for 5.19 still, or is that
going to be 5.20? I have patches that depend on and/or have conflicts with
this set, so I'm trying to figure out what I should be rebasing and testing
against.


RE: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-01 Thread Parav Pandit via Virtualization


> From: Eugenio Perez Martin 
> Sent: Wednesday, June 1, 2022 5:50 AM
> 
> On Tue, May 31, 2022 at 10:19 PM Parav Pandit  wrote:
> >
> >
> > > From: Jason Wang 
> > > Sent: Sunday, May 29, 2022 11:39 PM
> > >
> > > On Fri, May 27, 2022 at 6:56 PM Michael S. Tsirkin 
> wrote:
> > > >
> > > > On Thu, May 26, 2022 at 12:54:32PM +, Parav Pandit wrote:
> > > > >
> > > > >
> > > > > > From: Eugenio Pérez 
> > > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > > >
> > > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will
> > > > > > offer that backend feature and userspace can effectively stop the
> > > > > > device.
> > > > > >
> > > > > > This is a must before getting virtqueue indexes (base) for live
> > > > > > migration, since the device could modify them after userland gets
> > > > > > them. There are individual ways to perform that action for some
> > > > > > devices (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but
> > > > > > there was no way to perform it for any vhost device (and, in
> > > > > > particular, vhost-vdpa).
> > > > > >
> > > > > > After the return of ioctl with stop != 0, the device MUST finish
> > > > > > any pending operations like in flight requests. It must also
> > > > > > preserve all the necessary state (the virtqueue vring base plus
> > > > > > the possible device specific states) that is required for
> > > > > > restoring in the future. The device must not change its
> > > > > > configuration after that point.
> > > > > >
> > > > > > After the return of ioctl with stop == 0, the device can continue
> > > > > > processing buffers as long as typical conditions are met (vq is
> > > > > > enabled, DRIVER_OK status bit is enabled, etc).
> > > > >
> > > > > Just to be clear, we are adding vdpa level new ioctl() that
> > > > > doesn’t map to
> > > any mechanism in the virtio spec.
> > > > >
> > > > > Why can't we use this ioctl() to indicate driver to start/stop
> > > > > the device
> > > instead of driving it through the driver_ok?
> > > > > This is in the context of other discussion we had in the LM series.
> > > >
> > > > If there's something in the spec that does this then let's use that.
> > >
> > > Actually, we try to propose an independent feature here:
> > >
> > > https://lists.oasis-open.org/archives/virtio-dev/202111/msg00020.html
> > >
> > This will stop the device for all operations.
> > Once the device is stopped, its state cannot be queried further as the
> > device won't respond.
> > It has a limited use case.
> > What we need is to stop the non-admin-queue-related portion of the device.
> >
> 
> Still don't follow this, sorry.
Once a device is stopped, its state etc. cannot be queried.
If you want to stop and still allow certain operations, a better spec
definition is needed that says:

stop A, B, C, but allow D, E, F, G.
A = stop CVQs and save their state somewhere
B = stop data VQs and save their state somewhere
C = stop the generic config interrupt

D = query state of multiple VQs
E = query device statistics and other elements/objects in future
F = setup/config/restore certain fields
G = resume the device

> 
> Adding the admin vq to the mix, this would stop a device of a device group,
> but not the whole virtqueue group. If the admin VQ is offered by the PF
> (since it's not exposed to the guest), it will continue accepting requests as
> normal. If it's exposed in the VF, I think the best bet is to shadow it, since
> guest and host requests could conflict.
> 
> Since this is offered through vdpa, the device backend driver can route it to
> whatever method works better for the hardware. For example, to send an
> admin vq command to the PF. That's why it's important to keep the feature
> as self-contained and orthogonal to others as possible.
> 

I replied in other thread to continue there.

RE: [PATCH v4 3/4] vhost-vdpa: uAPI to stop the device

2022-06-01 Thread Parav Pandit via Virtualization


> From: Eugenio Perez Martin 
> Sent: Wednesday, June 1, 2022 7:15 AM
> 
> On Wed, Jun 1, 2022 at 1:03 PM Michael S. Tsirkin  wrote:
> >
> > On Thu, May 26, 2022 at 02:43:37PM +0200, Eugenio Pérez wrote:
> > > The ioctl adds support for stopping the device from userspace.
> > >
> > > Signed-off-by: Eugenio Pérez 
> > > ---
> > >  drivers/vhost/vdpa.c   | 18 ++
> > >  include/uapi/linux/vhost.h | 14 ++
> > >  2 files changed, 32 insertions(+)
> > >
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index 32713db5831d..d1d19555c4b7 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -478,6 +478,21 @@ static long vhost_vdpa_get_vqs_count(struct vhost_vdpa *v, u32 __user *argp)
> > >   return 0;
> > >  }
> > >
> > > +static long vhost_vdpa_stop(struct vhost_vdpa *v, u32 __user *argp)
> > > +{
> > > + struct vdpa_device *vdpa = v->vdpa;
> > > + const struct vdpa_config_ops *ops = vdpa->config;
> > > + int stop;
> > > +
> > > + if (!ops->stop)
> > > + return -EOPNOTSUPP;
> > > +
> > > + if (copy_from_user(&stop, argp, sizeof(stop)))
> > > + return -EFAULT;
> > > +
> > > + return ops->stop(vdpa, stop);
> > > +}
> > > +
> > >  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> > >  void __user *argp)
> > >  {
> > > @@ -650,6 +665,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> > >   case VHOST_VDPA_GET_VQS_COUNT:
> > >   r = vhost_vdpa_get_vqs_count(v, argp);
> > >   break;
> > > + case VHOST_VDPA_STOP:
> > > + r = vhost_vdpa_stop(v, argp);
> > > + break;
> > >   default:
> > >   r = vhost_dev_ioctl(&v->vdev, cmd, argp);
> > >   if (r == -ENOIOCTLCMD)
> > > diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> > > index cab645d4a645..c7e47b29bf61 100644
> > > --- a/include/uapi/linux/vhost.h
> > > +++ b/include/uapi/linux/vhost.h
> > > @@ -171,4 +171,18 @@
> > >  #define VHOST_VDPA_SET_GROUP_ASID _IOW(VHOST_VIRTIO, 0x7C, \
> > >struct vhost_vring_state)
> > >
> > > +/* Stop or resume a device so it does not process virtqueue requests anymore
> > > + *
> > > + * After the return of ioctl with stop != 0, the device must finish any
> > > + * pending operations like in flight requests. It must also preserve all
> > > + * the necessary state (the virtqueue vring base plus the possible device
> > > + * specific states) that is required for restoring in the future. The
> > > + * device must not change its configuration after that point.
> > > + *
> > > + * After the return of ioctl with stop == 0, the device can continue
> > > + * processing buffers as long as typical conditions are met (vq is enabled,
> > > + * DRIVER_OK status bit is enabled, etc).
> > > + */
> > > +#define VHOST_VDPA_STOP  _IOW(VHOST_VIRTIO, 0x7D, int)
> > > +
A better name is VHOST_VDPA_SET_STATE
State = stop/suspend
State = start/resume

Suspend/resume seems more logical, as opposed to start/stop, because it more 
clearly indicates that the resume (start) is from some programmed beginning 
(and not a first boot).

> > >  #endif
> >
> > I wonder how does this interact with the admin vq idea.
> > I.e. if we stop all VQs then apparently admin vq can't work either ...
> > Thoughts?
> >
> 
> Copying here the answer to Parav, feel free to answer to any thread or
> highlight if I missed something :). Using the admin vq proposal terminology of
> "device group".
> 
> --
> This would stop a device of a device
> group, but not the whole virtqueue group. If the admin VQ is offered by the
> PF (since it's not exposed to the guest), it will continue accepting requests 
> as
> normal. If it's exposed in the VF, I think the best bet is to shadow it, since
> guest and host requests could conflict.
> 

The vhost-vdpa device is exposed for a VF through the vp-vdpa driver to
userland. Now the vp-vdpa driver will have to choose between using a config
register vs. using the AQ to suspend/resume the device.

Why not always begin with the superior interface of the AQ, which addresses
multiple of these needs for the LM case?

For the LM case, the more you explore, the more it makes sense for a VF to
either rely on the PF's AQ for query/config/setup/restore or have its own
dedicated AQ.

VM's suspend/resume operation can be handled through the shadow Q.

> Since this is offered through vdpa, the device backend driver can route it to
> whatever method works better for the hardware. For example, to send an
> admin vq command to the PF. That's why it's important to keep the feature
> as self-contained and orthogonal to others as possible.
> --
> 
> > > --
> > > 2.31.1
> >


RE: [PATCH v4 0/4] Implement vdpasim stop operation

2022-06-01 Thread Parav Pandit via Virtualization


> From: Jason Wang 
> Sent: Tuesday, May 31, 2022 10:42 PM
> 
> Well, the ability to query the virtqueue state was proposed as another
> feature (Eugenio, please correct me). This should be sufficient to make
> virtio-net live migratable.
> 
The device is stopped; it won't answer this special vq config done here.
Programming all of these using cfg registers doesn't scale in terms of on-chip
memory or speed.

Next would be to program hundreds of statistics of the 64 VQs through a giant
PCI config space register in some busy-polling scheme.

I can clearly see how all these are inefficient for faster LM.
We need an efficient AQ to proceed with at minimum.

> https://lists.oasis-open.org/archives/virtio-comment/202103/msg8.html
> 
> > Once the device is stopped, its state cannot be queried further as the
> > device won't respond.
> > It has a limited use case.
> > What we need is to stop the non-admin-queue-related portion of the device.


Re: [PATCH v4 3/4] vhost-vdpa: uAPI to stop the device

2022-06-01 Thread Michael S. Tsirkin
On Thu, May 26, 2022 at 02:43:37PM +0200, Eugenio Pérez wrote:
> The ioctl adds support for stopping the device from userspace.
> 
> Signed-off-by: Eugenio Pérez 
> ---
>  drivers/vhost/vdpa.c   | 18 ++
>  include/uapi/linux/vhost.h | 14 ++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 32713db5831d..d1d19555c4b7 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -478,6 +478,21 @@ static long vhost_vdpa_get_vqs_count(struct vhost_vdpa *v, u32 __user *argp)
>   return 0;
>  }
>  
> +static long vhost_vdpa_stop(struct vhost_vdpa *v, u32 __user *argp)
> +{
> + struct vdpa_device *vdpa = v->vdpa;
> + const struct vdpa_config_ops *ops = vdpa->config;
> + int stop;
> +
> + if (!ops->stop)
> + return -EOPNOTSUPP;
> +
> + if (copy_from_user(&stop, argp, sizeof(stop)))
> + return -EFAULT;
> +
> + return ops->stop(vdpa, stop);
> +}
> +
>  static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>  void __user *argp)
>  {
> @@ -650,6 +665,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
>   case VHOST_VDPA_GET_VQS_COUNT:
>   r = vhost_vdpa_get_vqs_count(v, argp);
>   break;
> + case VHOST_VDPA_STOP:
> + r = vhost_vdpa_stop(v, argp);
> + break;
>   default:
>   r = vhost_dev_ioctl(&v->vdev, cmd, argp);
>   if (r == -ENOIOCTLCMD)
> diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
> index cab645d4a645..c7e47b29bf61 100644
> --- a/include/uapi/linux/vhost.h
> +++ b/include/uapi/linux/vhost.h
> @@ -171,4 +171,18 @@
>  #define VHOST_VDPA_SET_GROUP_ASID _IOW(VHOST_VIRTIO, 0x7C, \
>struct vhost_vring_state)
>  
> +/* Stop or resume a device so it does not process virtqueue requests anymore
> + *
> + * After the return of ioctl with stop != 0, the device must finish any
> + * pending operations like in flight requests. It must also preserve all
> + * the necessary state (the virtqueue vring base plus the possible device
> + * specific states) that is required for restoring in the future. The
> + * device must not change its configuration after that point.
> + *
> + * After the return of ioctl with stop == 0, the device can continue
> + * processing buffers as long as typical conditions are met (vq is enabled,
> + * DRIVER_OK status bit is enabled, etc).
> + */
> +#define VHOST_VDPA_STOP  _IOW(VHOST_VIRTIO, 0x7D, int)
> +
>  #endif

I wonder how does this interact with the admin vq idea.
I.e. if we stop all VQs then apparently admin vq can't
work either ...
Thoughts?

> -- 
> 2.31.1



Re: [PATCH 0/3] recover hardware corrupted page by virtio balloon

2022-06-01 Thread David Hildenbrand
On 01.06.22 04:17, zhenwei pi wrote:
> On 5/31/22 12:08, Jue Wang wrote:
>> On Mon, May 30, 2022 at 8:49 AM Peter Xu  wrote:
>>>
>>> On Mon, May 30, 2022 at 07:33:35PM +0800, zhenwei pi wrote:
 A VM uses RAM backed by 2M huge pages. Once an MCE (@HVAy in [HVAx,HVAz))
 occurs, the 2M ([HVAx,HVAz)) of the hypervisor becomes inaccessible, but the
 guest poisons only 4K (@GPAy in [GPAx, GPAz)), so it may hit another 511 MCEs
 ([GPAx, GPAz) except GPAy). This is the worst case, so I want to add
 '__le32 corrupted_pages' in struct virtio_balloon_config; it is used in the
 next step: reporting 512 * 4K 'corrupted_pages' to the guest, so the guest has
 a chance to isolate the other 511 pages ahead of time. The guest actually
 loses 2M; fixing 512*4K seems to help significantly.
>>>
>>> It sounds hackish to teach a virtio device to assume one page will always
>>> be poisoned at huge-page granularity.  That's only a limitation of the host
>>> kernel, not of virtio itself.
>>>
>>> E.g. there's an upstream effort ongoing to enable doublemap on hugetlbfs
>>> pages so hugetlb pages can be mapped in 4k with it.  It provides potential
>>> possibility to do page poisoning with huge pages in 4k too.  When that'll
>>> be ready the assumption can go away, and that does sound like a better
>>> approach towards this problem.
>>
>> +1.
>>
>> A hypervisor should always strive to minimize the guest memory loss.
>>
>> The HugeTLB double mapping enlightened memory poisoning behavior (only
>> poison 4K out of a 2MB huge page and 4K in guest) is a much better
>> solution here. To be completely transparent, it's not _strictly_
>> required to poison the page (whatever the granularity it is) on the
>> host side, as long as the following are true:
>>
>> 1. A hypervisor can emulate the _minimized_ (e.g., 4K) the poison to the 
>> guest.
>> 2. The host page with the UC error is "isolated" (could be PG_HWPOISON
>> or in some other way) and prevented from being reused by other
>> processes.
>>
>> For #2, PG_HWPOISON and HugeTLB double mapping enlightened memory
>> poisoning is a good solution.
>>
>>>

>
> I assume when talking about "the performance memory drops a lot", you
> imply that this patch set can mitigate that performance drop?
>
> But why do you see a performance drop? Because we might lose some
> possible THP candidates (in the host or the guest) and you want to plug
> those holes? I assume you'll see a performance drop simply because
> poisoning memory is expensive, including migrating pages around on CE.
>
> If you have some numbers to share, especially before/after this change,
> that would be great.
>

 The CE storm leads to 2 problems I have seen:
 1. The memory bandwidth slows down to 10%~20%, and the cycles per
 instruction of the CPU increase a lot.
 2. The THR (/proc/interrupts) interrupts fire frequently; the CPU has to
 spend a lot of time handling IRQs.
>>>
>>> Totally no good knowledge on CMCI, but if 2) is true then I'm wondering
>>> whether it's necessary to handle the interrupts that frequently.  When I
>>> was reading the Intel CMCI vector handler I stumbled over this comment:
>>>
>>> /*
>>>   * The interrupt handler. This is called on every event.
>>>   * Just call the poller directly to log any events.
>>>   * This could in theory increase the threshold under high load,
>>>   * but doesn't for now.
>>>   */
>>> static void intel_threshold_interrupt(void)
>>>
>>> I think that matches with what I was thinking..  I mean for 2) not sure
>>> whether it can be seen as a CMCI problem and potentially can be optimized
>>> by adjust the cmci threshold dynamically.
>>
>> The CE storm caused performance drop is caused by the extra cycles
>> spent by the ECC steps in memory controller, not in CMCI handling.
>> This is observed in the Google fleet as well. A good solution is to
>> monitor the CE rate closely in user space via /dev/mcelog and migrate
>> all VMs to another host once the CE rate exceeds some threshold.
>>
>> CMCI is a _background_ interrupt that is not handled in the process
>> execution context and its handler is setup to switch to poll (1 / 5
>> min) mode if there are more than ~ a dozen CEs reported via CMCI per
>> second.
>>>
>>> --
>>> Peter Xu
>>>
> 
> Hi, Andrew, David, Naoya
> 
> According to the suggestions, I'll give up on improving memory failure
> handling for huge pages in this series.
> 
> Is it worth recovering corrupted pages for the guest kernel? I'd follow 
> your decision.

Well, as I said, I am not sure if we really need/want this for a handful
of 4k poisoned pages in a VM. As I suspected, doing so might primarily
be interesting for some sort of de-fragmentation (allow again a higher
order page to be placed at the affected PFNs), not because of the slight
reduction of available memory. A simple VM reboot would get the job
similarly done.

As the poisoning refcount code is already a bit shaky as I learned
recently in