Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit

2018-01-17 Thread Alexander Duyck
On Wed, Jan 17, 2018 at 11:57 AM, Michael S. Tsirkin  wrote:
> On Wed, Jan 17, 2018 at 11:25:41AM -0800, Samudrala, Sridhar wrote:
>>
>>
>> On 1/17/2018 11:02 AM, Michael S. Tsirkin wrote:
>> > On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote:
>> > > On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala
>> > >  wrote:
>> > > > This feature bit can be used by hypervisor to indicate virtio_net 
>> > > > device to
>> > > > act as a backup for another device with the same MAC address.
>> > > >
>> > > > Signed-off-by: Sridhar Samudrala 
>> > > > ---
>> > > >   drivers/net/virtio_net.c| 2 +-
>> > > >   include/uapi/linux/virtio_net.h | 3 +++
>> > > >   2 files changed, 4 insertions(+), 1 deletion(-)
>> > > >
>> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> > > > index 12dfc5fee58e..f149a160a8c5 100644
>> > > > --- a/drivers/net/virtio_net.c
>> > > > +++ b/drivers/net/virtio_net.c
>> > > > @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = {
>> > > >  VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>> > > >  VIRTIO_NET_F_CTRL_MAC_ADDR, \
>> > > >  VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
>> > > > -   VIRTIO_NET_F_SPEED_DUPLEX
>> > > > +   VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP
>> > > >
>> > > >   static unsigned int features[] = {
>> > > >  VIRTNET_FEATURES,
>> > > > diff --git a/include/uapi/linux/virtio_net.h 
>> > > > b/include/uapi/linux/virtio_net.h
>> > > > index 5de6ed37695b..c7c35fd1a5ed 100644
>> > > > --- a/include/uapi/linux/virtio_net.h
>> > > > +++ b/include/uapi/linux/virtio_net.h
>> > > > @@ -57,6 +57,9 @@
>> > > >   * Steering */
>> > > >   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
>> > > >
>> > > > +#define VIRTIO_NET_F_BACKUP  62/* Act as backup for another 
>> > > > device
>> > > > +* with the same MAC.
>> > > > +*/
>> > > >   #define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and 
>> > > > duplex */
>> > > >
>> > > >   #ifndef VIRTIO_NET_NO_LEGACY
>> > > I'm not a huge fan of the name "backup" since that implies that the
>> > > Virtio interface is only used if the VF is not present, and there are
>> > > multiple instances such as dealing with east/west or
>> > > broadcast/multicast traffic where it may be desirable to use the
>> > > para-virtual interface rather then deal with PCI overhead/bottleneck
>> > > to send the packet.
>> > Right now hypervisors mostly expect that yes, only one at a time is
>> > used.  E.g. if you try to do multicast sending packets on both VF and
>> > virtio then you will end up with two copies of each packet.
>>
>> I think we want to use only 1 interface to  send out any packet. In case of
>> broadcast/multicasts it would be an optimization to send them via virtio and
>> this patch series adds that optimization.
>
> Right that's what I think we should rather avoid for now.
>
> It's *not* an optimization if there's a single VM on this host,
> or if a specific multicast group does not have any VMs on same
> host.

Agreed. In my mind this is something that is controlled by the
pass-thru interface once it is enslaved.

> I'd rather we just sent everything out on the PT if that's
> there. The reason we have virtio in the picture is just so
> we can migrate without downtime.

I wasn't saying we do that in all cases. That would be something that
would have to be decided by the pass-thru interface. Ideally the
virtio would provide just enough information to get itself into the
bond and I see this being the mechanism for it to do so. From there
the complexity mostly lies in the pass-thru interface to configure the
correct transmit modes if for example you have multiple pass-thru
interfaces or a more complex traffic setup due to things like
SwitchDev.

In my mind we go the bonding route and there are few use cases for all
of this. First is the backup case that is being addressed here. That
becomes your basic "copy netvsc" approach for this which would be
default. It is how we would handle basic pass-thru back-up paths. If
the host decides to send multicast/broadcast traffic from the host up
through it that is a host side decision. I am okay with our default
transmit behavior from the guest being to send everything through the
pass-thru interface. All the virtio would be doing is advertising that
it is a side channel for some sort of bond with another interface. By
calling it a side channel it implies that it isn't the direct channel
for the interface, but it is an alternative communications channel for
things like backup, and other future uses.

There end up being a few different "phase 2" concepts I have floating
around which I want to avoid limiting. Solving the east/west problem
is an example. In my mind that just becomes 

Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit

2018-01-17 Thread Michael S. Tsirkin
On Wed, Jan 17, 2018 at 11:25:41AM -0800, Samudrala, Sridhar wrote:
> 
> 
> On 1/17/2018 11:02 AM, Michael S. Tsirkin wrote:
> > On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote:
> > > On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala
> > >  wrote:
> > > > This feature bit can be used by hypervisor to indicate virtio_net 
> > > > device to
> > > > act as a backup for another device with the same MAC address.
> > > > 
> > > > Signed-off-by: Sridhar Samudrala 
> > > > ---
> > > >   drivers/net/virtio_net.c| 2 +-
> > > >   include/uapi/linux/virtio_net.h | 3 +++
> > > >   2 files changed, 4 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 12dfc5fee58e..f149a160a8c5 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = {
> > > >  VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> > > >  VIRTIO_NET_F_CTRL_MAC_ADDR, \
> > > >  VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> > > > -   VIRTIO_NET_F_SPEED_DUPLEX
> > > > +   VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP
> > > > 
> > > >   static unsigned int features[] = {
> > > >  VIRTNET_FEATURES,
> > > > diff --git a/include/uapi/linux/virtio_net.h 
> > > > b/include/uapi/linux/virtio_net.h
> > > > index 5de6ed37695b..c7c35fd1a5ed 100644
> > > > --- a/include/uapi/linux/virtio_net.h
> > > > +++ b/include/uapi/linux/virtio_net.h
> > > > @@ -57,6 +57,9 @@
> > > >   * Steering */
> > > >   #define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
> > > > 
> > > > +#define VIRTIO_NET_F_BACKUP  62/* Act as backup for another 
> > > > device
> > > > +* with the same MAC.
> > > > +*/
> > > >   #define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and 
> > > > duplex */
> > > > 
> > > >   #ifndef VIRTIO_NET_NO_LEGACY
> > > I'm not a huge fan of the name "backup" since that implies that the
> > > Virtio interface is only used if the VF is not present, and there are
> > > multiple instances such as dealing with east/west or
> > > broadcast/multicast traffic where it may be desirable to use the
> > > para-virtual interface rather then deal with PCI overhead/bottleneck
> > > to send the packet.
> > Right now hypervisors mostly expect that yes, only one at a time is
> > used.  E.g. if you try to do multicast sending packets on both VF and
> > virtio then you will end up with two copies of each packet.
> 
> I think we want to use only 1 interface to  send out any packet. In case of
> broadcast/multicasts it would be an optimization to send them via virtio and
> this patch series adds that optimization.

Right that's what I think we should rather avoid for now.

It's *not* an optimization if there's a single VM on this host,
or if a specific multicast group does not have any VMs on same
host.

I'd rather we just sent everything out on the PT if that's
there. The reason we have virtio in the picture is just so
we can migrate without downtime.


> In the receive path,  the broadcasts should only go the PF and reach the VM
> via vitio so that the VM doesn't see duplicate broadcasts.
> 
> 
> > 
> > To me the east/west scenario looks like you want something
> > more similar to a bridge on top of the virtio/PT pair.
> > 
> > So I suspect that use-case will need a separate configuration bit,
> > and possibly that's when you will want something more powerful
> > such as a full bridge.
> 
> east-west optimization of unicasts would be a harder problem to solve as
> somehow the hypervisor needs to indicate the VM about the local dest. macs

Using a bridge with a dedicated device for east/west would let
bridge use standard learning techniques for that perhaps?


> > 
> > 
> > > What if instead of BACKUP we used the name SIDE_CHANNEL? Basically it
> > > is a bit of double entendre as we are using the physical MAC address
> > > to provide configuration information, and then in addition this
> > > interface acts as a secondary channel for passing frames to and from
> > > the guest rather than just using the VF.
> > > 
> > > Just a thought.
> > > 
> > > Thanks.
> > > 
> > > - Alex
> > I just feel that's a very generic name, not conveying enough information
> > about how they hypervisor expects the pair of devices to be used.
> > 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit

2018-01-17 Thread Samudrala, Sridhar



On 1/17/2018 11:02 AM, Michael S. Tsirkin wrote:

On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote:

On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala
 wrote:

This feature bit can be used by hypervisor to indicate virtio_net device to
act as a backup for another device with the same MAC address.

Signed-off-by: Sridhar Samudrala 
---
  drivers/net/virtio_net.c| 2 +-
  include/uapi/linux/virtio_net.h | 3 +++
  2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 12dfc5fee58e..f149a160a8c5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = {
 VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
 VIRTIO_NET_F_CTRL_MAC_ADDR, \
 VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
-   VIRTIO_NET_F_SPEED_DUPLEX
+   VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP

  static unsigned int features[] = {
 VIRTNET_FEATURES,
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index 5de6ed37695b..c7c35fd1a5ed 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -57,6 +57,9 @@
  * Steering */
  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */

+#define VIRTIO_NET_F_BACKUP  62/* Act as backup for another device
+* with the same MAC.
+*/
  #define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and duplex */

  #ifndef VIRTIO_NET_NO_LEGACY

I'm not a huge fan of the name "backup" since that implies that the
Virtio interface is only used if the VF is not present, and there are
multiple instances such as dealing with east/west or
broadcast/multicast traffic where it may be desirable to use the
para-virtual interface rather then deal with PCI overhead/bottleneck
to send the packet.

Right now hypervisors mostly expect that yes, only one at a time is
used.  E.g. if you try to do multicast sending packets on both VF and
virtio then you will end up with two copies of each packet.


I think we want to use only 1 interface to  send out any packet. In case of
broadcast/multicasts it would be an optimization to send them via virtio and
this patch series adds that optimization.

In the receive path,  the broadcasts should only go the PF and reach the VM
via vitio so that the VM doesn't see duplicate broadcasts.




To me the east/west scenario looks like you want something
more similar to a bridge on top of the virtio/PT pair.

So I suspect that use-case will need a separate configuration bit,
and possibly that's when you will want something more powerful
such as a full bridge.


east-west optimization of unicasts would be a harder problem to solve as
somehow the hypervisor needs to indicate the VM about the local dest. macs





What if instead of BACKUP we used the name SIDE_CHANNEL? Basically it
is a bit of double entendre as we are using the physical MAC address
to provide configuration information, and then in addition this
interface acts as a secondary channel for passing frames to and from
the guest rather than just using the VF.

Just a thought.

Thanks.

- Alex

I just feel that's a very generic name, not conveying enough information
about how they hypervisor expects the pair of devices to be used.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit

2018-01-17 Thread Michael S. Tsirkin
On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote:
> On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala
>  wrote:
> > This feature bit can be used by hypervisor to indicate virtio_net device to
> > act as a backup for another device with the same MAC address.
> >
> > Signed-off-by: Sridhar Samudrala 
> > ---
> >  drivers/net/virtio_net.c| 2 +-
> >  include/uapi/linux/virtio_net.h | 3 +++
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 12dfc5fee58e..f149a160a8c5 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = {
> > VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> > VIRTIO_NET_F_CTRL_MAC_ADDR, \
> > VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> > -   VIRTIO_NET_F_SPEED_DUPLEX
> > +   VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP
> >
> >  static unsigned int features[] = {
> > VIRTNET_FEATURES,
> > diff --git a/include/uapi/linux/virtio_net.h 
> > b/include/uapi/linux/virtio_net.h
> > index 5de6ed37695b..c7c35fd1a5ed 100644
> > --- a/include/uapi/linux/virtio_net.h
> > +++ b/include/uapi/linux/virtio_net.h
> > @@ -57,6 +57,9 @@
> >  * Steering */
> >  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
> >
> > +#define VIRTIO_NET_F_BACKUP  62/* Act as backup for another device
> > +* with the same MAC.
> > +*/
> >  #define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and duplex 
> > */
> >
> >  #ifndef VIRTIO_NET_NO_LEGACY
> 
> I'm not a huge fan of the name "backup" since that implies that the
> Virtio interface is only used if the VF is not present, and there are
> multiple instances such as dealing with east/west or
> broadcast/multicast traffic where it may be desirable to use the
> para-virtual interface rather then deal with PCI overhead/bottleneck
> to send the packet.

Right now hypervisors mostly expect that yes, only one at a time is
used.  E.g. if you try to do multicast sending packets on both VF and
virtio then you will end up with two copies of each packet.

To me the east/west scenario looks like you want something
more similar to a bridge on top of the virtio/PT pair.

So I suspect that use-case will need a separate configuration bit,
and possibly that's when you will want something more powerful
such as a full bridge.


> What if instead of BACKUP we used the name SIDE_CHANNEL? Basically it
> is a bit of double entendre as we are using the physical MAC address
> to provide configuration information, and then in addition this
> interface acts as a secondary channel for passing frames to and from
> the guest rather than just using the VF.
> 
> Just a thought.
> 
> Thanks.
> 
> - Alex

I just feel that's a very generic name, not conveying enough information
about how they hypervisor expects the pair of devices to be used.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit

2018-01-17 Thread Alexander Duyck
On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala
 wrote:
> This feature bit can be used by hypervisor to indicate virtio_net device to
> act as a backup for another device with the same MAC address.
>
> Signed-off-by: Sridhar Samudrala 
> ---
>  drivers/net/virtio_net.c| 2 +-
>  include/uapi/linux/virtio_net.h | 3 +++
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 12dfc5fee58e..f149a160a8c5 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = {
> VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> VIRTIO_NET_F_CTRL_MAC_ADDR, \
> VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \
> -   VIRTIO_NET_F_SPEED_DUPLEX
> +   VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP
>
>  static unsigned int features[] = {
> VIRTNET_FEATURES,
> diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
> index 5de6ed37695b..c7c35fd1a5ed 100644
> --- a/include/uapi/linux/virtio_net.h
> +++ b/include/uapi/linux/virtio_net.h
> @@ -57,6 +57,9 @@
>  * Steering */
>  #define VIRTIO_NET_F_CTRL_MAC_ADDR 23  /* Set MAC address */
>
> +#define VIRTIO_NET_F_BACKUP  62/* Act as backup for another device
> +* with the same MAC.
> +*/
>  #define VIRTIO_NET_F_SPEED_DUPLEX 63   /* Device set linkspeed and duplex */
>
>  #ifndef VIRTIO_NET_NO_LEGACY

I'm not a huge fan of the name "backup" since that implies that the
Virtio interface is only used if the VF is not present, and there are
multiple instances such as dealing with east/west or
broadcast/multicast traffic where it may be desirable to use the
para-virtual interface rather then deal with PCI overhead/bottleneck
to send the packet.

What if instead of BACKUP we used the name SIDE_CHANNEL? Basically it
is a bit of double entendre as we are using the physical MAC address
to provide configuration information, and then in addition this
interface acts as a secondary channel for passing frames to and from
the guest rather than just using the VF.

Just a thought.

Thanks.

- Alex
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

2018-01-17 Thread Michael S. Tsirkin
On Wed, Jan 17, 2018 at 01:10:11PM +0800, Wei Wang wrote:
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free pages by sending a new cmd
> id to the guest via the free_page_report_cmd_id configuration register.
> 
> When the guest starts to report, the first element added to the free page
> vq is the cmd id given by host. When the guest finishes the reporting
> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> to the vq to tell host that the reporting is done. Host may also requests
> the guest to stop the reporting in advance by sending the stop cmd id to
> the guest via the configuration register.
> 
> Signed-off-by: Wei Wang 
> Signed-off-by: Liang Li 
> Cc: Michael S. Tsirkin 
> Cc: Michal Hocko 
> ---
>  drivers/virtio/virtio_balloon.c | 242 
> +++-
>  include/uapi/linux/virtio_balloon.h |   4 +
>  2 files changed, 214 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index a1fb52c..b9561a5 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -53,7 +53,12 @@ static struct vfsmount *balloon_mnt;
>  
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +68,13 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* Start to report free pages */
> + bool report_free_page;
> + /* Stores the cmd id given by host to start the free page reporting */
> + uint32_t start_cmd_id;
> + /* Stores STOP_ID as a sign to tell host that the reporting is done */
> + uint32_t stop_cmd_id;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  
> @@ -281,6 +293,71 @@ static unsigned int update_balloon_stats(struct 
> virtio_balloon *vb)
>   return idx;
>  }
>  
> +static void add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t len)
> +{
> + struct scatterlist sg;
> + unsigned int unused;
> + int err;
> +
> + sg_init_table(, 1);
> + sg_set_page(, pfn_to_page(pfn), len, 0);
> +
> + /* Detach all the used buffers from the vq */
> + while (virtqueue_get_buf(vq, ))
> + ;
> +
> + /*
> +  * Since this is an optimization feature, losing a couple of free
> +  * pages to report isn't important.
> We simply resturn

return

> without adding
> +  * the page if the vq is full. We are adding one entry each time,
> +  * which essentially results in no memory allocation, so the
> +  * GFP_KERNEL flag below can be ignored.
> +  */
> + if (vq->num_free) {
> + err = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);

Should we kick here? At least when ring is close to
being full. Kick at half way full?
Otherwise it's unlikely ring will
ever be cleaned until we finish the scan.

> + /*
> +  * This is expected to never fail, because there is always an
> +  * entry available on the vq.
> +  */
> + BUG_ON(err);
> + }
> +}
> +
> +static void batch_free_page_sg(struct virtqueue *vq,
> +unsigned long pfn,
> +uint32_t len)

Not sure what does batch refer to here.
I'd just open-code this.

> +{
> + add_one_sg(vq, pfn, len);
> +
> + /* Batch till the vq is full */
> + if (!vq->num_free)
> + virtqueue_kick(vq);
> +}
> +
> +static void send_cmd_id(struct virtqueue *vq, void *addr)

Why void *? Should be a specific type.
then you can use sizeof *addr as size.

> +{
> + struct scatterlist sg;
> + unsigned int unused;
> + int err;
> +
> + sg_init_one(, addr, sizeof(uint32_t));

This passes a guest-endian value to host. This is a problem:
should always pass LE values.

> +
> + /*
> +  * This handles the cornercase that the vq happens to be full when
> +  * adding a cmd id. Rarely happen in practice.
> +  */
> + while (!vq->num_free)
> + virtqueue_get_buf(vq, );

I dislike this busy-waiting. It's a hint after all -
why not just retry later - hopefully after getting an
interrupt?

Alternatively, stop adding more entries when we have a single
ring entry left, making sure 

Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

2018-01-17 Thread Pankaj Gupta

> On 01/17/2018 04:21 PM, Pankaj Gupta wrote:
> >> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the
> >> support of reporting hints of guest free pages to host via virtio-balloon.
> >>
> >> Host requests the guest to report free pages by sending a new cmd
> >> id to the guest via the free_page_report_cmd_id configuration register.
> >>
> >> When the guest starts to report, the first element added to the free page
> >> vq is the cmd id given by host. When the guest finishes the reporting
> >> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> >> to the vq to tell host that the reporting is done. Host may also requests
> >> the guest to stop the reporting in advance by sending the stop cmd id to
> >> the guest via the configuration register.
> >>
> >> Signed-off-by: Wei Wang 
> >> Signed-off-by: Liang Li 
> >> Cc: Michael S. Tsirkin 
> >> Cc: Michal Hocko 
> >> ---
> >>   drivers/virtio/virtio_balloon.c | 242
> >>   +++-
> >>   include/uapi/linux/virtio_balloon.h |   4 +
> >>   2 files changed, 214 insertions(+), 32 deletions(-)
> >>
> >> diff --git a/drivers/virtio/virtio_balloon.c
> >> b/drivers/virtio/virtio_balloon.c
> >> index a1fb52c..b9561a5 100644
> >> --- a/drivers/virtio/virtio_balloon.c
> >> +++ b/drivers/virtio/virtio_balloon.c
> >> @@ -53,7 +53,12 @@ static struct vfsmount *balloon_mnt;
> >>   
> >>   struct virtio_balloon {
> >>   struct virtio_device *vdev;
> >> -struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> >> +struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, 
> >> *free_page_vq;
> >> +
> >> +/* Balloon's own wq for cpu-intensive work items */
> >> +struct workqueue_struct *balloon_wq;
> >> +/* The free page reporting work item submitted to the balloon wq 
> >> */
> >> +struct work_struct report_free_page_work;
> >>   
> >>   /* The balloon servicing is delegated to a freezable workqueue. 
> >> */
> >>   struct work_struct update_balloon_stats_work;
> >> @@ -63,6 +68,13 @@ struct virtio_balloon {
> >>   spinlock_t stop_update_lock;
> >>   bool stop_update;
> >>   
> >> +/* Start to report free pages */
> >> +bool report_free_page;
> >> +/* Stores the cmd id given by host to start the free page 
> >> reporting */
> >> +uint32_t start_cmd_id;
> >> +/* Stores STOP_ID as a sign to tell host that the reporting is 
> >> done */
> >> +uint32_t stop_cmd_id;
> >> +
> >>   /* Waiting for host to ack the pages we released. */
> >>   wait_queue_head_t acked;
> >>   
> >> @@ -281,6 +293,71 @@ static unsigned int update_balloon_stats(struct
> >> virtio_balloon *vb)
> >>   return idx;
> >>   }
> >>   
> >> +static void add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t
> >> len)
> >> +{
> >> +struct scatterlist sg;
> >> +unsigned int unused;
> >> +int err;
> >> +
> >> +sg_init_table(, 1);
> >> +sg_set_page(, pfn_to_page(pfn), len, 0);
> >> +
> >> +/* Detach all the used buffers from the vq */
> >> +while (virtqueue_get_buf(vq, ))
> >> +;
> >> +
> >> +/*
> >> + * Since this is an optimization feature, losing a couple of free
> >> + * pages to report isn't important. We simply resturn without 
> >> adding
> >> + * the page if the vq is full. We are adding one entry each time,
> >> + * which essentially results in no memory allocation, so the
> >> + * GFP_KERNEL flag below can be ignored.
> >> + */
> >> +if (vq->num_free) {
> >> +err = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);
> >> +/*
> >> + * This is expected to never fail, because there is 
> >> always an
> >> + * entry available on the vq.
> >> + */
> >> +BUG_ON(err);
> >> +}
> >> +}
> >> +
> >> +static void batch_free_page_sg(struct virtqueue *vq,
> >> +   unsigned long pfn,
> >> +   uint32_t len)
> >> +{
> >> +add_one_sg(vq, pfn, len);
> >> +
> >> +/* Batch till the vq is full */
> >> +if (!vq->num_free)
> >> +virtqueue_kick(vq);
> >> +}
> >> +
> >> +static void send_cmd_id(struct virtqueue *vq, void *addr)
> >> +{
> >> +struct scatterlist sg;
> >> +unsigned int unused;
> >> +int err;
> >> +
> >> +sg_init_one(, addr, sizeof(uint32_t));
> >> +
> >> +/*
> >> + * This handles the cornercase that the vq happens to be full when
> >> + * adding a cmd id. Rarely happen in practice.
> >> + */
> >> +while (!vq->num_free)
> >> +virtqueue_get_buf(vq, );
> >> +
> >> +err = virtqueue_add_outbuf(vq, 

Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

2018-01-17 Thread Wei Wang

On 01/17/2018 04:21 PM, Pankaj Gupta wrote:

Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the
support of reporting hints of guest free pages to host via virtio-balloon.

Host requests the guest to report free pages by sending a new cmd
id to the guest via the free_page_report_cmd_id configuration register.

When the guest starts to report, the first element added to the free page
vq is the cmd id given by host. When the guest finishes the reporting
of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
to the vq to tell host that the reporting is done. Host may also requests
the guest to stop the reporting in advance by sending the stop cmd id to
the guest via the configuration register.

Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
Cc: Michael S. Tsirkin 
Cc: Michal Hocko 
---
  drivers/virtio/virtio_balloon.c | 242
  +++-
  include/uapi/linux/virtio_balloon.h |   4 +
  2 files changed, 214 insertions(+), 32 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c
b/drivers/virtio/virtio_balloon.c
index a1fb52c..b9561a5 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -53,7 +53,12 @@ static struct vfsmount *balloon_mnt;
  
  struct virtio_balloon {

struct virtio_device *vdev;
-   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
+   struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
+
+   /* Balloon's own wq for cpu-intensive work items */
+   struct workqueue_struct *balloon_wq;
+   /* The free page reporting work item submitted to the balloon wq */
+   struct work_struct report_free_page_work;
  
  	/* The balloon servicing is delegated to a freezable workqueue. */

struct work_struct update_balloon_stats_work;
@@ -63,6 +68,13 @@ struct virtio_balloon {
spinlock_t stop_update_lock;
bool stop_update;
  
+	/* Start to report free pages */

+   bool report_free_page;
+   /* Stores the cmd id given by host to start the free page reporting */
+   uint32_t start_cmd_id;
+   /* Stores STOP_ID as a sign to tell host that the reporting is done */
+   uint32_t stop_cmd_id;
+
/* Waiting for host to ack the pages we released. */
wait_queue_head_t acked;
  
@@ -281,6 +293,71 @@ static unsigned int update_balloon_stats(struct

virtio_balloon *vb)
return idx;
  }
  
+static void add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t

len)
+{
+   struct scatterlist sg;
+   unsigned int unused;
+   int err;
+
+   sg_init_table(, 1);
+   sg_set_page(, pfn_to_page(pfn), len, 0);
+
+   /* Detach all the used buffers from the vq */
+   while (virtqueue_get_buf(vq, ))
+   ;
+
+   /*
+* Since this is an optimization feature, losing a couple of free
+* pages to report isn't important. We simply resturn without adding
+* the page if the vq is full. We are adding one entry each time,
+* which essentially results in no memory allocation, so the
+* GFP_KERNEL flag below can be ignored.
+*/
+   if (vq->num_free) {
+   err = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);
+   /*
+* This is expected to never fail, because there is always an
+* entry available on the vq.
+*/
+   BUG_ON(err);
+   }
+}
+
+static void batch_free_page_sg(struct virtqueue *vq,
+  unsigned long pfn,
+  uint32_t len)
+{
+   add_one_sg(vq, pfn, len);
+
+   /* Batch till the vq is full */
+   if (!vq->num_free)
+   virtqueue_kick(vq);
+}
+
+static void send_cmd_id(struct virtqueue *vq, void *addr)
+{
+   struct scatterlist sg;
+   unsigned int unused;
+   int err;
+
+   sg_init_one(, addr, sizeof(uint32_t));
+
+   /*
+* This handles the cornercase that the vq happens to be full when
+* adding a cmd id. Rarely happen in practice.
+*/
+   while (!vq->num_free)
+   virtqueue_get_buf(vq, );
+
+   err = virtqueue_add_outbuf(vq, , 1, vq, GFP_KERNEL);
+   /*
+* This is expected to never fail, because there is always an
+* entry available on the vq.
+*/
+   BUG_ON(err);
+   virtqueue_kick(vq);
+}
+
  /*
   * While most virtqueues communicate guest-initiated requests to the
   hypervisor,
   * the stats queue operates in reverse.  The driver initializes the
   virtqueue
@@ -316,17 +393,6 @@ static void stats_handle_request(struct virtio_balloon
*vb)
virtqueue_kick(vq);
  }
  
-static void virtballoon_changed(struct virtio_device *vdev)

-{
-   struct virtio_balloon *vb = vdev->priv;
-   unsigned long flags;
-
-   spin_lock_irqsave(>stop_update_lock, flags);
-   if 

Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ

2018-01-17 Thread Pankaj Gupta

> 
> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the
> support of reporting hints of guest free pages to host via virtio-balloon.
> 
> Host requests the guest to report free pages by sending a new cmd
> id to the guest via the free_page_report_cmd_id configuration register.
> 
> When the guest starts to report, the first element added to the free page
> vq is the cmd id given by host. When the guest finishes the reporting
> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added
> to the vq to tell host that the reporting is done. Host may also requests
> the guest to stop the reporting in advance by sending the stop cmd id to
> the guest via the configuration register.
> 
> Signed-off-by: Wei Wang 
> Signed-off-by: Liang Li 
> Cc: Michael S. Tsirkin 
> Cc: Michal Hocko 
> ---
>  drivers/virtio/virtio_balloon.c | 242
>  +++-
>  include/uapi/linux/virtio_balloon.h |   4 +
>  2 files changed, 214 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c
> b/drivers/virtio/virtio_balloon.c
> index a1fb52c..b9561a5 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -53,7 +53,12 @@ static struct vfsmount *balloon_mnt;
>  
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The free page reporting work item submitted to the balloon wq */
> + struct work_struct report_free_page_work;
>  
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -63,6 +68,13 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
>  
> + /* Start to report free pages */
> + bool report_free_page;
> + /* Stores the cmd id given by host to start the free page reporting */
> + uint32_t start_cmd_id;
> + /* Stores STOP_ID as a sign to tell host that the reporting is done */
> + uint32_t stop_cmd_id;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
>  
> @@ -281,6 +293,71 @@ static unsigned int update_balloon_stats(struct
> virtio_balloon *vb)
>   return idx;
>  }
>  
> +static void add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t
> len)
> +{
> + struct scatterlist sg;
> + unsigned int unused;
> + int err;
> +
> + sg_init_table(, 1);
> + sg_set_page(, pfn_to_page(pfn), len, 0);
> +
> + /* Detach all the used buffers from the vq */
> + while (virtqueue_get_buf(vq, ))
> + ;
> +
> + /*
> +  * Since this is an optimization feature, losing a couple of free
> +  * pages to report isn't important. We simply resturn without adding
> +  * the page if the vq is full. We are adding one entry each time,
> +  * which essentially results in no memory allocation, so the
> +  * GFP_KERNEL flag below can be ignored.
> +  */
> + if (vq->num_free) {
> + err = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL);
> + /*
> +  * This is expected to never fail, because there is always an
> +  * entry available on the vq.
> +  */
> + BUG_ON(err);
> + }
> +}
> +
> +static void batch_free_page_sg(struct virtqueue *vq,
> +unsigned long pfn,
> +uint32_t len)
> +{
> + add_one_sg(vq, pfn, len);
> +
> + /* Batch till the vq is full */
> + if (!vq->num_free)
> + virtqueue_kick(vq);
> +}
> +
> +static void send_cmd_id(struct virtqueue *vq, void *addr)
> +{
> + struct scatterlist sg;
> + unsigned int unused;
> + int err;
> +
> + sg_init_one(, addr, sizeof(uint32_t));
> +
> + /*
> +  * This handles the cornercase that the vq happens to be full when
> +  * adding a cmd id. Rarely happen in practice.
> +  */
> + while (!vq->num_free)
> + virtqueue_get_buf(vq, );
> +
> + err = virtqueue_add_outbuf(vq, , 1, vq, GFP_KERNEL);
> + /*
> +  * This is expected to never fail, because there is always an
> +  * entry available on the vq.
> +  */
> + BUG_ON(err);
> + virtqueue_kick(vq);
> +}
> +
>  /*
>   * While most virtqueues communicate guest-initiated requests to the
>   hypervisor,
>   * the stats queue operates in reverse.  The driver initializes the
>   virtqueue
> @@ -316,17 +393,6 @@ static void stats_handle_request(struct virtio_balloon
> *vb)
>   virtqueue_kick(vq);
>  }
>  
> -static void virtballoon_changed(struct virtio_device *vdev)
> -{
> - struct virtio_balloon *vb = vdev->priv;
> -