Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit
On Wed, Jan 17, 2018 at 11:57 AM, Michael S. Tsirkinwrote: > On Wed, Jan 17, 2018 at 11:25:41AM -0800, Samudrala, Sridhar wrote: >> >> >> On 1/17/2018 11:02 AM, Michael S. Tsirkin wrote: >> > On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote: >> > > On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala >> > > wrote: >> > > > This feature bit can be used by hypervisor to indicate virtio_net >> > > > device to >> > > > act as a backup for another device with the same MAC address. >> > > > >> > > > Signed-off-by: Sridhar Samudrala >> > > > --- >> > > > drivers/net/virtio_net.c| 2 +- >> > > > include/uapi/linux/virtio_net.h | 3 +++ >> > > > 2 files changed, 4 insertions(+), 1 deletion(-) >> > > > >> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >> > > > index 12dfc5fee58e..f149a160a8c5 100644 >> > > > --- a/drivers/net/virtio_net.c >> > > > +++ b/drivers/net/virtio_net.c >> > > > @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = { >> > > > VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \ >> > > > VIRTIO_NET_F_CTRL_MAC_ADDR, \ >> > > > VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \ >> > > > - VIRTIO_NET_F_SPEED_DUPLEX >> > > > + VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP >> > > > >> > > > static unsigned int features[] = { >> > > > VIRTNET_FEATURES, >> > > > diff --git a/include/uapi/linux/virtio_net.h >> > > > b/include/uapi/linux/virtio_net.h >> > > > index 5de6ed37695b..c7c35fd1a5ed 100644 >> > > > --- a/include/uapi/linux/virtio_net.h >> > > > +++ b/include/uapi/linux/virtio_net.h >> > > > @@ -57,6 +57,9 @@ >> > > > * Steering */ >> > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ >> > > > >> > > > +#define VIRTIO_NET_F_BACKUP 62/* Act as backup for another >> > > > device >> > > > +* with the same MAC. >> > > > +*/ >> > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and >> > > > duplex */ >> > > > >> > > > #ifndef VIRTIO_NET_NO_LEGACY >> > > I'm not a huge fan of the name "backup" since that implies that the >> > > Virtio interface is only used if the VF is not present, and there are >> > > multiple instances such as dealing with east/west or >> > > broadcast/multicast traffic where it may be desirable to use the >> > > para-virtual interface rather then deal with PCI overhead/bottleneck >> > > to send the packet. >> > Right now hypervisors mostly expect that yes, only one at a time is >> > used. E.g. if you try to do multicast sending packets on both VF and >> > virtio then you will end up with two copies of each packet. >> >> I think we want to use only 1 interface to send out any packet. In case of >> broadcast/multicasts it would be an optimization to send them via virtio and >> this patch series adds that optimization. > > Right that's what I think we should rather avoid for now. > > It's *not* an optimization if there's a single VM on this host, > or if a specific multicast group does not have any VMs on same > host. Agreed. In my mind this is something that is controlled by the pass-thru interface once it is enslaved. > I'd rather we just sent everything out on the PT if that's > there. The reason we have virtio in the picture is just so > we can migrate without downtime. I wasn't saying we do that in all cases. That would be something that would have to be decided by the pass-thru interface. Ideally the virtio would provide just enough information to get itself into the bond and I see this being the mechanism for it to do so. From there the complexity mostly lies in the pass-thru interface to configure the correct transmit modes if for example you have multiple pass-thru interfaces or a more complex traffic setup due to things like SwitchDev. In my mind we go the bonding route and there are few use cases for all of this. First is the backup case that is being addressed here. That becomes your basic "copy netvsc" approach for this which would be default. It is how we would handle basic pass-thru back-up paths. If the host decides to send multicast/broadcast traffic from the host up through it that is a host side decision. I am okay with our default transmit behavior from the guest being to send everything through the pass-thru interface. All the virtio would be doing is advertising that it is a side channel for some sort of bond with another interface. By calling it a side channel it implies that it isn't the direct channel for the interface, but it is an alternative communications channel for things like backup, and other future uses. There end up being a few different "phase 2" concepts I have floating around which I want to avoid limiting. Solving the east/west problem is an example. In my mind that just becomes
Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit
On Wed, Jan 17, 2018 at 11:25:41AM -0800, Samudrala, Sridhar wrote: > > > On 1/17/2018 11:02 AM, Michael S. Tsirkin wrote: > > On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote: > > > On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala > > >wrote: > > > > This feature bit can be used by hypervisor to indicate virtio_net > > > > device to > > > > act as a backup for another device with the same MAC address. > > > > > > > > Signed-off-by: Sridhar Samudrala > > > > --- > > > > drivers/net/virtio_net.c| 2 +- > > > > include/uapi/linux/virtio_net.h | 3 +++ > > > > 2 files changed, 4 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > > index 12dfc5fee58e..f149a160a8c5 100644 > > > > --- a/drivers/net/virtio_net.c > > > > +++ b/drivers/net/virtio_net.c > > > > @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = { > > > > VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \ > > > > VIRTIO_NET_F_CTRL_MAC_ADDR, \ > > > > VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \ > > > > - VIRTIO_NET_F_SPEED_DUPLEX > > > > + VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP > > > > > > > > static unsigned int features[] = { > > > > VIRTNET_FEATURES, > > > > diff --git a/include/uapi/linux/virtio_net.h > > > > b/include/uapi/linux/virtio_net.h > > > > index 5de6ed37695b..c7c35fd1a5ed 100644 > > > > --- a/include/uapi/linux/virtio_net.h > > > > +++ b/include/uapi/linux/virtio_net.h > > > > @@ -57,6 +57,9 @@ > > > > * Steering */ > > > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > > > > > > > +#define VIRTIO_NET_F_BACKUP 62/* Act as backup for another > > > > device > > > > +* with the same MAC. > > > > +*/ > > > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and > > > > duplex */ > > > > > > > > #ifndef VIRTIO_NET_NO_LEGACY > > > I'm not a huge fan of the name "backup" since that implies that the > > > Virtio interface is only used if the VF is not present, and there are > > > multiple instances such as dealing with east/west or > > > broadcast/multicast traffic where it may be desirable to use the > > > para-virtual interface rather then deal with PCI overhead/bottleneck > > > to send the packet. > > Right now hypervisors mostly expect that yes, only one at a time is > > used. E.g. if you try to do multicast sending packets on both VF and > > virtio then you will end up with two copies of each packet. > > I think we want to use only 1 interface to send out any packet. In case of > broadcast/multicasts it would be an optimization to send them via virtio and > this patch series adds that optimization. Right that's what I think we should rather avoid for now. It's *not* an optimization if there's a single VM on this host, or if a specific multicast group does not have any VMs on same host. I'd rather we just sent everything out on the PT if that's there. The reason we have virtio in the picture is just so we can migrate without downtime. > In the receive path, the broadcasts should only go the PF and reach the VM > via vitio so that the VM doesn't see duplicate broadcasts. > > > > > > To me the east/west scenario looks like you want something > > more similar to a bridge on top of the virtio/PT pair. > > > > So I suspect that use-case will need a separate configuration bit, > > and possibly that's when you will want something more powerful > > such as a full bridge. > > east-west optimization of unicasts would be a harder problem to solve as > somehow the hypervisor needs to indicate the VM about the local dest. macs Using a bridge with a dedicated device for east/west would let bridge use standard learning techniques for that perhaps? > > > > > > > What if instead of BACKUP we used the name SIDE_CHANNEL? Basically it > > > is a bit of double entendre as we are using the physical MAC address > > > to provide configuration information, and then in addition this > > > interface acts as a secondary channel for passing frames to and from > > > the guest rather than just using the VF. > > > > > > Just a thought. > > > > > > Thanks. > > > > > > - Alex > > I just feel that's a very generic name, not conveying enough information > > about how they hypervisor expects the pair of devices to be used. > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit
On 1/17/2018 11:02 AM, Michael S. Tsirkin wrote: On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote: On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudralawrote: This feature bit can be used by hypervisor to indicate virtio_net device to act as a backup for another device with the same MAC address. Signed-off-by: Sridhar Samudrala --- drivers/net/virtio_net.c| 2 +- include/uapi/linux/virtio_net.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 12dfc5fee58e..f149a160a8c5 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = { VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \ VIRTIO_NET_F_CTRL_MAC_ADDR, \ VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \ - VIRTIO_NET_F_SPEED_DUPLEX + VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP static unsigned int features[] = { VIRTNET_FEATURES, diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h index 5de6ed37695b..c7c35fd1a5ed 100644 --- a/include/uapi/linux/virtio_net.h +++ b/include/uapi/linux/virtio_net.h @@ -57,6 +57,9 @@ * Steering */ #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ +#define VIRTIO_NET_F_BACKUP 62/* Act as backup for another device +* with the same MAC. +*/ #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ #ifndef VIRTIO_NET_NO_LEGACY I'm not a huge fan of the name "backup" since that implies that the Virtio interface is only used if the VF is not present, and there are multiple instances such as dealing with east/west or broadcast/multicast traffic where it may be desirable to use the para-virtual interface rather then deal with PCI overhead/bottleneck to send the packet. Right now hypervisors mostly expect that yes, only one at a time is used. E.g. if you try to do multicast sending packets on both VF and virtio then you will end up with two copies of each packet. I think we want to use only 1 interface to send out any packet. In case of broadcast/multicasts it would be an optimization to send them via virtio and this patch series adds that optimization. In the receive path, the broadcasts should only go the PF and reach the VM via vitio so that the VM doesn't see duplicate broadcasts. To me the east/west scenario looks like you want something more similar to a bridge on top of the virtio/PT pair. So I suspect that use-case will need a separate configuration bit, and possibly that's when you will want something more powerful such as a full bridge. east-west optimization of unicasts would be a harder problem to solve as somehow the hypervisor needs to indicate the VM about the local dest. macs What if instead of BACKUP we used the name SIDE_CHANNEL? Basically it is a bit of double entendre as we are using the physical MAC address to provide configuration information, and then in addition this interface acts as a secondary channel for passing frames to and from the guest rather than just using the VF. Just a thought. Thanks. - Alex I just feel that's a very generic name, not conveying enough information about how they hypervisor expects the pair of devices to be used. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit
On Wed, Jan 17, 2018 at 10:15:52AM -0800, Alexander Duyck wrote: > On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudrala >wrote: > > This feature bit can be used by hypervisor to indicate virtio_net device to > > act as a backup for another device with the same MAC address. > > > > Signed-off-by: Sridhar Samudrala > > --- > > drivers/net/virtio_net.c| 2 +- > > include/uapi/linux/virtio_net.h | 3 +++ > > 2 files changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 12dfc5fee58e..f149a160a8c5 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = { > > VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \ > > VIRTIO_NET_F_CTRL_MAC_ADDR, \ > > VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \ > > - VIRTIO_NET_F_SPEED_DUPLEX > > + VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP > > > > static unsigned int features[] = { > > VIRTNET_FEATURES, > > diff --git a/include/uapi/linux/virtio_net.h > > b/include/uapi/linux/virtio_net.h > > index 5de6ed37695b..c7c35fd1a5ed 100644 > > --- a/include/uapi/linux/virtio_net.h > > +++ b/include/uapi/linux/virtio_net.h > > @@ -57,6 +57,9 @@ > > * Steering */ > > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > > > +#define VIRTIO_NET_F_BACKUP 62/* Act as backup for another device > > +* with the same MAC. > > +*/ > > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex > > */ > > > > #ifndef VIRTIO_NET_NO_LEGACY > > I'm not a huge fan of the name "backup" since that implies that the > Virtio interface is only used if the VF is not present, and there are > multiple instances such as dealing with east/west or > broadcast/multicast traffic where it may be desirable to use the > para-virtual interface rather then deal with PCI overhead/bottleneck > to send the packet. Right now hypervisors mostly expect that yes, only one at a time is used. E.g. if you try to do multicast sending packets on both VF and virtio then you will end up with two copies of each packet. To me the east/west scenario looks like you want something more similar to a bridge on top of the virtio/PT pair. So I suspect that use-case will need a separate configuration bit, and possibly that's when you will want something more powerful such as a full bridge. > What if instead of BACKUP we used the name SIDE_CHANNEL? Basically it > is a bit of double entendre as we are using the physical MAC address > to provide configuration information, and then in addition this > interface acts as a secondary channel for passing frames to and from > the guest rather than just using the VF. > > Just a thought. > > Thanks. > > - Alex I just feel that's a very generic name, not conveying enough information about how they hypervisor expects the pair of devices to be used. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [virtio-dev] [RFC PATCH net-next v2 1/2] virtio_net: Introduce VIRTIO_NET_F_BACKUP feature bit
On Thu, Jan 11, 2018 at 9:58 PM, Sridhar Samudralawrote: > This feature bit can be used by hypervisor to indicate virtio_net device to > act as a backup for another device with the same MAC address. > > Signed-off-by: Sridhar Samudrala > --- > drivers/net/virtio_net.c| 2 +- > include/uapi/linux/virtio_net.h | 3 +++ > 2 files changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 12dfc5fee58e..f149a160a8c5 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -2829,7 +2829,7 @@ static struct virtio_device_id id_table[] = { > VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \ > VIRTIO_NET_F_CTRL_MAC_ADDR, \ > VIRTIO_NET_F_MTU, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS, \ > - VIRTIO_NET_F_SPEED_DUPLEX > + VIRTIO_NET_F_SPEED_DUPLEX, VIRTIO_NET_F_BACKUP > > static unsigned int features[] = { > VIRTNET_FEATURES, > diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h > index 5de6ed37695b..c7c35fd1a5ed 100644 > --- a/include/uapi/linux/virtio_net.h > +++ b/include/uapi/linux/virtio_net.h > @@ -57,6 +57,9 @@ > * Steering */ > #define VIRTIO_NET_F_CTRL_MAC_ADDR 23 /* Set MAC address */ > > +#define VIRTIO_NET_F_BACKUP 62/* Act as backup for another device > +* with the same MAC. > +*/ > #define VIRTIO_NET_F_SPEED_DUPLEX 63 /* Device set linkspeed and duplex */ > > #ifndef VIRTIO_NET_NO_LEGACY I'm not a huge fan of the name "backup" since that implies that the Virtio interface is only used if the VF is not present, and there are multiple instances such as dealing with east/west or broadcast/multicast traffic where it may be desirable to use the para-virtual interface rather then deal with PCI overhead/bottleneck to send the packet. What if instead of BACKUP we used the name SIDE_CHANNEL? Basically it is a bit of double entendre as we are using the physical MAC address to provide configuration information, and then in addition this interface acts as a secondary channel for passing frames to and from the guest rather than just using the VF. Just a thought. Thanks. - Alex ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
On Wed, Jan 17, 2018 at 01:10:11PM +0800, Wei Wang wrote: > Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the > support of reporting hints of guest free pages to host via virtio-balloon. > > Host requests the guest to report free pages by sending a new cmd > id to the guest via the free_page_report_cmd_id configuration register. > > When the guest starts to report, the first element added to the free page > vq is the cmd id given by host. When the guest finishes the reporting > of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added > to the vq to tell host that the reporting is done. Host may also requests > the guest to stop the reporting in advance by sending the stop cmd id to > the guest via the configuration register. > > Signed-off-by: Wei Wang> Signed-off-by: Liang Li > Cc: Michael S. Tsirkin > Cc: Michal Hocko > --- > drivers/virtio/virtio_balloon.c | 242 > +++- > include/uapi/linux/virtio_balloon.h | 4 + > 2 files changed, 214 insertions(+), 32 deletions(-) > > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c > index a1fb52c..b9561a5 100644 > --- a/drivers/virtio/virtio_balloon.c > +++ b/drivers/virtio/virtio_balloon.c > @@ -53,7 +53,12 @@ static struct vfsmount *balloon_mnt; > > struct virtio_balloon { > struct virtio_device *vdev; > - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; > + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq; > + > + /* Balloon's own wq for cpu-intensive work items */ > + struct workqueue_struct *balloon_wq; > + /* The free page reporting work item submitted to the balloon wq */ > + struct work_struct report_free_page_work; > > /* The balloon servicing is delegated to a freezable workqueue. */ > struct work_struct update_balloon_stats_work; > @@ -63,6 +68,13 @@ struct virtio_balloon { > spinlock_t stop_update_lock; > bool stop_update; > > + /* Start to report free pages */ > + bool report_free_page; > + /* Stores the cmd id given by host to start the free page reporting */ > + uint32_t start_cmd_id; > + /* Stores STOP_ID as a sign to tell host that the reporting is done */ > + uint32_t stop_cmd_id; > + > /* Waiting for host to ack the pages we released. */ > wait_queue_head_t acked; > > @@ -281,6 +293,71 @@ static unsigned int update_balloon_stats(struct > virtio_balloon *vb) > return idx; > } > > +static void add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t len) > +{ > + struct scatterlist sg; > + unsigned int unused; > + int err; > + > + sg_init_table(, 1); > + sg_set_page(, pfn_to_page(pfn), len, 0); > + > + /* Detach all the used buffers from the vq */ > + while (virtqueue_get_buf(vq, )) > + ; > + > + /* > + * Since this is an optimization feature, losing a couple of free > + * pages to report isn't important. > We simply resturn return > without adding > + * the page if the vq is full. We are adding one entry each time, > + * which essentially results in no memory allocation, so the > + * GFP_KERNEL flag below can be ignored. > + */ > + if (vq->num_free) { > + err = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL); Should we kick here? At least when ring is close to being full. Kick at half way full? Otherwise it's unlikely ring will ever be cleaned until we finish the scan. > + /* > + * This is expected to never fail, because there is always an > + * entry available on the vq. > + */ > + BUG_ON(err); > + } > +} > + > +static void batch_free_page_sg(struct virtqueue *vq, > +unsigned long pfn, > +uint32_t len) Not sure what does batch refer to here. I'd just open-code this. > +{ > + add_one_sg(vq, pfn, len); > + > + /* Batch till the vq is full */ > + if (!vq->num_free) > + virtqueue_kick(vq); > +} > + > +static void send_cmd_id(struct virtqueue *vq, void *addr) Why void *? Should be a specific type. then you can use sizeof *addr as size. > +{ > + struct scatterlist sg; > + unsigned int unused; > + int err; > + > + sg_init_one(, addr, sizeof(uint32_t)); This passes a guest-endian value to host. This is a problem: should always pass LE values. > + > + /* > + * This handles the cornercase that the vq happens to be full when > + * adding a cmd id. Rarely happen in practice. > + */ > + while (!vq->num_free) > + virtqueue_get_buf(vq, ); I dislike this busy-waiting. It's a hint after all - why not just retry later - hopefully after getting an interrupt? Alternatively, stop adding more entries when we have a single ring entry left, making sure
Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
> On 01/17/2018 04:21 PM, Pankaj Gupta wrote: > >> Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the > >> support of reporting hints of guest free pages to host via virtio-balloon. > >> > >> Host requests the guest to report free pages by sending a new cmd > >> id to the guest via the free_page_report_cmd_id configuration register. > >> > >> When the guest starts to report, the first element added to the free page > >> vq is the cmd id given by host. When the guest finishes the reporting > >> of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added > >> to the vq to tell host that the reporting is done. Host may also requests > >> the guest to stop the reporting in advance by sending the stop cmd id to > >> the guest via the configuration register. > >> > >> Signed-off-by: Wei Wang> >> Signed-off-by: Liang Li > >> Cc: Michael S. Tsirkin > >> Cc: Michal Hocko > >> --- > >> drivers/virtio/virtio_balloon.c | 242 > >> +++- > >> include/uapi/linux/virtio_balloon.h | 4 + > >> 2 files changed, 214 insertions(+), 32 deletions(-) > >> > >> diff --git a/drivers/virtio/virtio_balloon.c > >> b/drivers/virtio/virtio_balloon.c > >> index a1fb52c..b9561a5 100644 > >> --- a/drivers/virtio/virtio_balloon.c > >> +++ b/drivers/virtio/virtio_balloon.c > >> @@ -53,7 +53,12 @@ static struct vfsmount *balloon_mnt; > >> > >> struct virtio_balloon { > >> struct virtio_device *vdev; > >> -struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; > >> +struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, > >> *free_page_vq; > >> + > >> +/* Balloon's own wq for cpu-intensive work items */ > >> +struct workqueue_struct *balloon_wq; > >> +/* The free page reporting work item submitted to the balloon wq > >> */ > >> +struct work_struct report_free_page_work; > >> > >> /* The balloon servicing is delegated to a freezable workqueue. > >> */ > >> struct work_struct update_balloon_stats_work; > >> @@ -63,6 +68,13 @@ struct virtio_balloon { > >> spinlock_t stop_update_lock; > >> bool stop_update; > >> > >> +/* Start to report free pages */ > >> +bool report_free_page; > >> +/* Stores the cmd id given by host to start the free page > >> reporting */ > >> +uint32_t start_cmd_id; > >> +/* Stores STOP_ID as a sign to tell host that the reporting is > >> done */ > >> +uint32_t stop_cmd_id; > >> + > >> /* Waiting for host to ack the pages we released. */ > >> wait_queue_head_t acked; > >> > >> @@ -281,6 +293,71 @@ static unsigned int update_balloon_stats(struct > >> virtio_balloon *vb) > >> return idx; > >> } > >> > >> +static void add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t > >> len) > >> +{ > >> +struct scatterlist sg; > >> +unsigned int unused; > >> +int err; > >> + > >> +sg_init_table(, 1); > >> +sg_set_page(, pfn_to_page(pfn), len, 0); > >> + > >> +/* Detach all the used buffers from the vq */ > >> +while (virtqueue_get_buf(vq, )) > >> +; > >> + > >> +/* > >> + * Since this is an optimization feature, losing a couple of free > >> + * pages to report isn't important. We simply resturn without > >> adding > >> + * the page if the vq is full. We are adding one entry each time, > >> + * which essentially results in no memory allocation, so the > >> + * GFP_KERNEL flag below can be ignored. > >> + */ > >> +if (vq->num_free) { > >> +err = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL); > >> +/* > >> + * This is expected to never fail, because there is > >> always an > >> + * entry available on the vq. > >> + */ > >> +BUG_ON(err); > >> +} > >> +} > >> + > >> +static void batch_free_page_sg(struct virtqueue *vq, > >> + unsigned long pfn, > >> + uint32_t len) > >> +{ > >> +add_one_sg(vq, pfn, len); > >> + > >> +/* Batch till the vq is full */ > >> +if (!vq->num_free) > >> +virtqueue_kick(vq); > >> +} > >> + > >> +static void send_cmd_id(struct virtqueue *vq, void *addr) > >> +{ > >> +struct scatterlist sg; > >> +unsigned int unused; > >> +int err; > >> + > >> +sg_init_one(, addr, sizeof(uint32_t)); > >> + > >> +/* > >> + * This handles the cornercase that the vq happens to be full when > >> + * adding a cmd id. Rarely happen in practice. > >> + */ > >> +while (!vq->num_free) > >> +virtqueue_get_buf(vq, ); > >> + > >> +err = virtqueue_add_outbuf(vq,
Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
On 01/17/2018 04:21 PM, Pankaj Gupta wrote: Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the support of reporting hints of guest free pages to host via virtio-balloon. Host requests the guest to report free pages by sending a new cmd id to the guest via the free_page_report_cmd_id configuration register. When the guest starts to report, the first element added to the free page vq is the cmd id given by host. When the guest finishes the reporting of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added to the vq to tell host that the reporting is done. Host may also requests the guest to stop the reporting in advance by sending the stop cmd id to the guest via the configuration register. Signed-off-by: Wei WangSigned-off-by: Liang Li Cc: Michael S. Tsirkin Cc: Michal Hocko --- drivers/virtio/virtio_balloon.c | 242 +++- include/uapi/linux/virtio_balloon.h | 4 + 2 files changed, 214 insertions(+), 32 deletions(-) diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c index a1fb52c..b9561a5 100644 --- a/drivers/virtio/virtio_balloon.c +++ b/drivers/virtio/virtio_balloon.c @@ -53,7 +53,12 @@ static struct vfsmount *balloon_mnt; struct virtio_balloon { struct virtio_device *vdev; - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq; + + /* Balloon's own wq for cpu-intensive work items */ + struct workqueue_struct *balloon_wq; + /* The free page reporting work item submitted to the balloon wq */ + struct work_struct report_free_page_work; /* The balloon servicing is delegated to a freezable workqueue. */ struct work_struct update_balloon_stats_work; @@ -63,6 +68,13 @@ struct virtio_balloon { spinlock_t stop_update_lock; bool stop_update; + /* Start to report free pages */ + bool report_free_page; + /* Stores the cmd id given by host to start the free page reporting */ + uint32_t start_cmd_id; + /* Stores STOP_ID as a sign to tell host that the reporting is done */ + uint32_t stop_cmd_id; + /* Waiting for host to ack the pages we released. */ wait_queue_head_t acked; @@ -281,6 +293,71 @@ static unsigned int update_balloon_stats(struct virtio_balloon *vb) return idx; } +static void add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t len) +{ + struct scatterlist sg; + unsigned int unused; + int err; + + sg_init_table(, 1); + sg_set_page(, pfn_to_page(pfn), len, 0); + + /* Detach all the used buffers from the vq */ + while (virtqueue_get_buf(vq, )) + ; + + /* +* Since this is an optimization feature, losing a couple of free +* pages to report isn't important. We simply resturn without adding +* the page if the vq is full. We are adding one entry each time, +* which essentially results in no memory allocation, so the +* GFP_KERNEL flag below can be ignored. +*/ + if (vq->num_free) { + err = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL); + /* +* This is expected to never fail, because there is always an +* entry available on the vq. +*/ + BUG_ON(err); + } +} + +static void batch_free_page_sg(struct virtqueue *vq, + unsigned long pfn, + uint32_t len) +{ + add_one_sg(vq, pfn, len); + + /* Batch till the vq is full */ + if (!vq->num_free) + virtqueue_kick(vq); +} + +static void send_cmd_id(struct virtqueue *vq, void *addr) +{ + struct scatterlist sg; + unsigned int unused; + int err; + + sg_init_one(, addr, sizeof(uint32_t)); + + /* +* This handles the cornercase that the vq happens to be full when +* adding a cmd id. Rarely happen in practice. +*/ + while (!vq->num_free) + virtqueue_get_buf(vq, ); + + err = virtqueue_add_outbuf(vq, , 1, vq, GFP_KERNEL); + /* +* This is expected to never fail, because there is always an +* entry available on the vq. +*/ + BUG_ON(err); + virtqueue_kick(vq); +} + /* * While most virtqueues communicate guest-initiated requests to the hypervisor, * the stats queue operates in reverse. The driver initializes the virtqueue @@ -316,17 +393,6 @@ static void stats_handle_request(struct virtio_balloon *vb) virtqueue_kick(vq); } -static void virtballoon_changed(struct virtio_device *vdev) -{ - struct virtio_balloon *vb = vdev->priv; - unsigned long flags; - - spin_lock_irqsave(>stop_update_lock, flags); - if
Re: [PATCH v22 2/3] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_VQ
> > Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_VQ feature indicates the > support of reporting hints of guest free pages to host via virtio-balloon. > > Host requests the guest to report free pages by sending a new cmd > id to the guest via the free_page_report_cmd_id configuration register. > > When the guest starts to report, the first element added to the free page > vq is the cmd id given by host. When the guest finishes the reporting > of all the free pages, VIRTIO_BALLOON_FREE_PAGE_REPORT_STOP_ID is added > to the vq to tell host that the reporting is done. Host may also requests > the guest to stop the reporting in advance by sending the stop cmd id to > the guest via the configuration register. > > Signed-off-by: Wei Wang> Signed-off-by: Liang Li > Cc: Michael S. Tsirkin > Cc: Michal Hocko > --- > drivers/virtio/virtio_balloon.c | 242 > +++- > include/uapi/linux/virtio_balloon.h | 4 + > 2 files changed, 214 insertions(+), 32 deletions(-) > > diff --git a/drivers/virtio/virtio_balloon.c > b/drivers/virtio/virtio_balloon.c > index a1fb52c..b9561a5 100644 > --- a/drivers/virtio/virtio_balloon.c > +++ b/drivers/virtio/virtio_balloon.c > @@ -53,7 +53,12 @@ static struct vfsmount *balloon_mnt; > > struct virtio_balloon { > struct virtio_device *vdev; > - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq; > + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *free_page_vq; > + > + /* Balloon's own wq for cpu-intensive work items */ > + struct workqueue_struct *balloon_wq; > + /* The free page reporting work item submitted to the balloon wq */ > + struct work_struct report_free_page_work; > > /* The balloon servicing is delegated to a freezable workqueue. */ > struct work_struct update_balloon_stats_work; > @@ -63,6 +68,13 @@ struct virtio_balloon { > spinlock_t stop_update_lock; > bool stop_update; > > + /* Start to report free pages */ > + bool report_free_page; > + /* Stores the cmd id given by host to start the free page reporting */ > + uint32_t start_cmd_id; > + /* Stores STOP_ID as a sign to tell host that the reporting is done */ > + uint32_t stop_cmd_id; > + > /* Waiting for host to ack the pages we released. */ > wait_queue_head_t acked; > > @@ -281,6 +293,71 @@ static unsigned int update_balloon_stats(struct > virtio_balloon *vb) > return idx; > } > > +static void add_one_sg(struct virtqueue *vq, unsigned long pfn, uint32_t > len) > +{ > + struct scatterlist sg; > + unsigned int unused; > + int err; > + > + sg_init_table(, 1); > + sg_set_page(, pfn_to_page(pfn), len, 0); > + > + /* Detach all the used buffers from the vq */ > + while (virtqueue_get_buf(vq, )) > + ; > + > + /* > + * Since this is an optimization feature, losing a couple of free > + * pages to report isn't important. We simply resturn without adding > + * the page if the vq is full. We are adding one entry each time, > + * which essentially results in no memory allocation, so the > + * GFP_KERNEL flag below can be ignored. > + */ > + if (vq->num_free) { > + err = virtqueue_add_inbuf(vq, , 1, vq, GFP_KERNEL); > + /* > + * This is expected to never fail, because there is always an > + * entry available on the vq. > + */ > + BUG_ON(err); > + } > +} > + > +static void batch_free_page_sg(struct virtqueue *vq, > +unsigned long pfn, > +uint32_t len) > +{ > + add_one_sg(vq, pfn, len); > + > + /* Batch till the vq is full */ > + if (!vq->num_free) > + virtqueue_kick(vq); > +} > + > +static void send_cmd_id(struct virtqueue *vq, void *addr) > +{ > + struct scatterlist sg; > + unsigned int unused; > + int err; > + > + sg_init_one(, addr, sizeof(uint32_t)); > + > + /* > + * This handles the cornercase that the vq happens to be full when > + * adding a cmd id. Rarely happen in practice. > + */ > + while (!vq->num_free) > + virtqueue_get_buf(vq, ); > + > + err = virtqueue_add_outbuf(vq, , 1, vq, GFP_KERNEL); > + /* > + * This is expected to never fail, because there is always an > + * entry available on the vq. > + */ > + BUG_ON(err); > + virtqueue_kick(vq); > +} > + > /* > * While most virtqueues communicate guest-initiated requests to the > hypervisor, > * the stats queue operates in reverse. The driver initializes the > virtqueue > @@ -316,17 +393,6 @@ static void stats_handle_request(struct virtio_balloon > *vb) > virtqueue_kick(vq); > } > > -static void virtballoon_changed(struct virtio_device *vdev) > -{ > - struct virtio_balloon *vb = vdev->priv; > -