[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-13 Thread Siwei Liu
On Tue, Mar 13, 2018 at 5:28 PM, Samudrala, Sridhar
 wrote:
> On 3/12/2018 3:44 PM, Siwei Liu wrote:
>>
>> Apologies, still some comments going. Please see inline.
>>
>> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
>>  wrote:
>>>
>>> This patch enables virtio_net to switch over to a VF datapath when a VF
>>> netdev is present with the same MAC address. It allows live migration
>>> of a VM with a direct attached VF without the need to setup a bond/team
>>> between a VF and virtio net device in the guest.
>>>
>>> The hypervisor needs to enable only one datapath at any time so that
>>> packets don't get looped back to the VM over the other datapath. When a
>>> VF
>>> is plugged, the virtio datapath link state can be marked as down. The
>>> hypervisor needs to unplug the VF device from the guest on the source
>>> host
>>> and reset the MAC filter of the VF to initiate failover of datapath to
>>> virtio before starting the migration. After the migration is completed,
>>> the destination hypervisor sets the MAC filter on the VF and plugs it
>>> back
>>> to the guest to switch over to VF datapath.
>>>
>>> When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>>> created that acts as a master device and tracks the state of the 2 lower
>>> netdevs. The original virtio_net netdev is marked as 'backup' netdev and
>>> a
>>> passthru device with the same MAC is registered as 'active' netdev.
>>>
>>> This patch is based on the discussion initiated by Jesse on this thread.
>>> https://marc.info/?l=linux-virtualization=151189725224231=2
>>>
>>> Signed-off-by: Sridhar Samudrala 
>>> Signed-off-by: Alexander Duyck 
>>> Reviewed-by: Jesse Brandeburg 
>>> ---
>>>   drivers/net/virtio_net.c | 683
>>> ++-
>>>   1 file changed, 682 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index bcd13fe906ca..f2860d86c952 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -30,6 +30,8 @@
>>>   #include 
>>>   #include 
>>>   #include 
>>> +#include 
>>> +#include 
>>>   #include 
>>>   #include 
>>>
>>> @@ -206,6 +208,9 @@ struct virtnet_info {
>>>  u32 speed;
>>>
>>>  unsigned long guest_offloads;
>>> +
>>> +   /* upper netdev created when BACKUP feature enabled */
>>> +   struct net_device *bypass_netdev;
>>>   };
>>>
>>>   struct padded_vnet_hdr {
>>> @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev,
>>> struct netdev_bpf *xdp)
>>>  }
>>>   }
>>>
>>> +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>>> + size_t len)
>>> +{
>>> +   struct virtnet_info *vi = netdev_priv(dev);
>>> +   int ret;
>>> +
>>> +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
>>> +   return -EOPNOTSUPP;
>>> +
>>> +   ret = snprintf(buf, len, "_bkup");
>>> +   if (ret >= len)
>>> +   return -EOPNOTSUPP;
>>> +
>>> +   return 0;
>>> +}
>>> +
>>>   static const struct net_device_ops virtnet_netdev = {
>>>  .ndo_open= virtnet_open,
>>>  .ndo_stop= virtnet_close,
>>> @@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev =
>>> {
>>>  .ndo_xdp_xmit   = virtnet_xdp_xmit,
>>>  .ndo_xdp_flush  = virtnet_xdp_flush,
>>>  .ndo_features_check = passthru_features_check,
>>> +   .ndo_get_phys_port_name = virtnet_get_phys_port_name,
>>>   };
>>>
>>>   static void virtnet_config_changed_work(struct work_struct *work)
>>> @@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device
>>> *vdev)
>>>  return 0;
>>>   }
>>>
>>> +/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
>>> + * When BACKUP feature is enabled, an additional netdev(bypass netdev)
>>> + * is created that acts as a master device and tracks the state of the
>>> + * 2 lower netdevs. The original virtio_net netdev is registered as
>>> + * 'backup' netdev and a passthru device with the same MAC is registered
>>> + * as 'active' netdev.
>>> + */
>>> +
>>> +/* bypass state maintained when BACKUP feature is enabled */
>>> +struct virtnet_bypass_info {
>>> +   /* passthru netdev with same MAC */
>>> +   struct net_device __rcu *active_netdev;
>>> +
>>> +   /* virtio_net netdev */
>>> +   struct net_device __rcu *backup_netdev;
>>> +
>>> +   /* active netdev stats */
>>> +   struct rtnl_link_stats64 active_stats;
>>> +
>>> +   /* backup netdev stats */
>>> +   struct rtnl_link_stats64 backup_stats;
>>> +
>>> +   /* aggregated stats */
>>> +   struct rtnl_link_stats64 bypass_stats;
>>> +
>>> +   /* spinlock while updating stats */
>>> +   spinlock_t stats_lock;
>>> +};
>>> +
>>> 

[virtio-dev] [PATCH] github: PULL_REQUEST_TEMPLATE.md

2018-03-13 Thread Michael S. Tsirkin
Document that we don't accept pull requests at this point.

Signed-off-by: Michael S. Tsirkin 
---
 .github/PULL_REQUEST_TEMPLATE.md | 10 ++
 1 file changed, 10 insertions(+)
 create mode 100644 .github/PULL_REQUEST_TEMPLATE.md

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index 000..69d41fc
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,10 @@
+# Thanks for proposing a change to the Virtual I/O Device (VIRTIO) 
specification!
+The VIRTIO TC is not yet accepting pull requests at this time as they are not
+integrated with our voting system.
+
+Instead, please
+- [] Propose the spec change (preferably as a patch) on the [mailing 
list](https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio#feedback).
+- [] Open an [issue](https://github.com/oasis-tcs/virtio-spec/issues),
+ including the link to the proposal in the [mailing list 
archives](https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=virtio#feedback).
+
+The TC will vote and apply the change.
-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-03-13 Thread Michael S. Tsirkin
On Wed, Mar 14, 2018 at 10:43:01AM +0800, Wei Wang wrote:
> On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:
> > On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
> > 
> > > Signed-off-by: Wei Wang 
> > > Signed-off-by: Liang Li 
> > > CC: Michael S. Tsirkin 
> > > CC: Dr. David Alan Gilbert 
> > > CC: Juan Quintela 
> > I find it suspicious that neither unrealize nor reset
> > functions have been touched at all.
> > Are you sure you have thought through scenarious like
> > hot-unplug or disabling the device by guest?
> 
> OK. I think we can call balloon_free_page_stop in unrealize and reset.
> 
> 
> > +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> > +{
> > +VirtQueueElement *elem;
> > +VirtIOBalloon *dev = opaque;
> > +VirtQueue *vq = dev->free_page_vq;
> > +uint32_t id;
> > +size_t size;
> > What makes it safe to poke at this device from multiple threads?
> > I think that it would be safer to do it from e.g. BH.
> > 
> 
> Actually the free_page_optimization thread is the only user of free_page_vq,
> and there is only one optimization thread each time. Would this be safe
> enough?
> 
> Best,
> Wei

Aren't there other fields there? Also things like reset affect all VQs.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 4/4] migration: use the free page hint feature from balloon

2018-03-13 Thread Michael S. Tsirkin
On Wed, Mar 14, 2018 at 10:41:36AM +0800, Wei Wang wrote:
> On 03/14/2018 12:35 AM, Michael S. Tsirkin wrote:
> > On Wed, Mar 07, 2018 at 08:34:25PM +0800, Wei Wang wrote:
> > > Start the free page optimization after the migration bitmap is
> > > synchronized. This can't be used in the stop phase since the guest
> > > is paused. Make sure the guest reporting has stopped before
> > > synchronizing the migration dirty bitmap. Currently, the optimization is
> > > added to precopy only.
> > > 
> > > Signed-off-by: Wei Wang 
> > > CC: Dr. David Alan Gilbert 
> > > CC: Juan Quintela 
> > > CC: Michael S. Tsirkin 
> > > ---
> > >   migration/ram.c | 19 ++-
> > >   1 file changed, 18 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/migration/ram.c b/migration/ram.c
> > > index e172798..7b4c9b1 100644
> > > --- a/migration/ram.c
> > > +++ b/migration/ram.c
> > > @@ -51,6 +51,8 @@
> > >   #include "qemu/rcu_queue.h"
> > >   #include "migration/colo.h"
> > >   #include "migration/block.h"
> > > +#include "sysemu/balloon.h"
> > > +#include "sysemu/sysemu.h"
> > >   /***/
> > >   /* ram save/restore */
> > > @@ -208,6 +210,8 @@ struct RAMState {
> > >   uint32_t last_version;
> > >   /* We are in the first round */
> > >   bool ram_bulk_stage;
> > > +/* The free pages optimization feature is supported */
> > > +bool free_page_support;
> > >   /* How many times we have dirty too many pages */
> > >   int dirty_rate_high_cnt;
> > >   /* these variables are used for bitmap sync */
> > > @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState 
> > > *rs, RAMBlock *rb,
> > >   unsigned long *bitmap = rb->bmap;
> > >   unsigned long next;
> > > -if (rs->ram_bulk_stage && start > 0) {
> > > +if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
> > >   next = start + 1;
> > >   } else {
> > >   next = find_next_bit(bitmap, size, start);
> > > @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
> > >   int64_t end_time;
> > >   uint64_t bytes_xfer_now;
> > > +if (rs->free_page_support) {
> > > +balloon_free_page_stop();
> > > +}
> > > +
> > >   ram_counters.dirty_sync_count++;
> > >   if (!rs->time_last_bitmap_sync) {
> > > @@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
> > >   if (migrate_use_events()) {
> > >   qapi_event_send_migration_pass(ram_counters.dirty_sync_count, 
> > > NULL);
> > >   }
> > > +
> > > +if (rs->free_page_support && runstate_is_running()) {
> > > +balloon_free_page_start();
> > > +}
> > >   }
> > I think some of these conditions should go into
> > balloon_free_page_start/stop.
> > 
> > Checking runstate is generally problematic unless you
> > also handle run state change notifiers as it can
> > be manipulated from QMP.
> 
> How about moving the check of runstate to
> virtio_balloon_poll_free_page_hints:
> 
> while (dev->free_page_report_status < FREE_PAGE_REPORT_S_STOP &&
> runstate_is_running()) {
> ...
> }

Hard to tell on the outset. E.g. why is just stop affected?  Pls add
comments explaining what happens if VM is not running when start or stop
is called.


> In this case, I think we won't need a notifier - if the run state is changed
> by qmp, the optimization thread will just exist.

But you need to wake it up and notify the guest presumably?

> 
> > >   /**
> > > @@ -1656,6 +1668,8 @@ static void ram_state_reset(RAMState *rs)
> > >   rs->last_page = 0;
> > >   rs->last_version = ram_list.version;
> > >   rs->ram_bulk_stage = true;
> > > +rs->free_page_support = balloon_free_page_support() &
> > > +!migration_in_postcopy();
> > Probably &&?
> > 
> 
> OK, will use &&. (Both work well here actually, since all of the values here
> are boolean)
> 
> 
> Best,
> Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



Re: [virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-03-13 Thread Wei Wang

On 03/14/2018 12:49 AM, Michael S. Tsirkin wrote:

On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:


Signed-off-by: Wei Wang 
Signed-off-by: Liang Li 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 

I find it suspicious that neither unrealize nor reset
functions have been touched at all.
Are you sure you have thought through scenarious like
hot-unplug or disabling the device by guest?
   


OK. I think we can call balloon_free_page_stop in unrealize and reset.


  
+static void *virtio_balloon_poll_free_page_hints(void *opaque)

+{
+VirtQueueElement *elem;
+VirtIOBalloon *dev = opaque;
+VirtQueue *vq = dev->free_page_vq;
+uint32_t id;
+size_t size;
What makes it safe to poke at this device from multiple threads?
I think that it would be safer to do it from e.g. BH.



Actually the free_page_optimization thread is the only user of 
free_page_vq, and there is only one optimization thread each time. Would 
this be safe enough?


Best,
Wei

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-13 Thread Michael S. Tsirkin
On Tue, Mar 13, 2018 at 05:28:07PM -0700, Samudrala, Sridhar wrote:
> > I am not sure if it's a good idea to leave the
> > virtio_bypass around if running into failure: the guest is not
> > migratable as the VF doesn't have a backup path,
> 
> Are you talking about a failure when registering backup netdev?  This should 
> not
> happen, but i guess we can improve error handling in such scenario.

A nice way to do this would be to clear the backup feature bit.

> 
> > And perhaps the worse
> > part is that, it now has two interfaces with identical MAC address but
> > one of them is invalid (user cannot use the virtio interface as it has
> > a dampened datapath). IMHO the virtio_bypass and its lower netdev
> > should be destroyed at all when it fails to bind the VF, and
> > technically, there should be some way to propogate the failure status
> > to the hypervisor/backend, indicating that the VM is not migratable
> > because of guest software errors (maybe by clearing out the backup
> > feature from the guest virtio driver so host can see/learn it).
> 
> In BACKUP mode, user can only use the upper virtio_bypass netdev and that will
> always be there. Any failure to enslave VF netdev is not fatal, but i will see
> if we can improve the error handling of failure to enslave backup netdev.
> Also, i don't think the BACKUP feature bit is negotiable with the host.
> 
> Thanks
> Sridhar

All bits are negotiable.  It's up to the host whether to support
a device with this bit clear, or to fail negotiation.

-- 
MST

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-13 Thread Samudrala, Sridhar

On 3/12/2018 2:08 PM, Jiri Pirko wrote:

Mon, Mar 12, 2018 at 09:58:06PM CET, sridhar.samudr...@intel.com wrote:


On 3/12/2018 1:12 PM, Jiri Pirko wrote:

Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudr...@intel.com wrote:

This patch enables virtio_net to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration
of a VM with a direct attached VF without the need to setup a bond/team
between a VF and virtio net device in the guest.

The hypervisor needs to enable only one datapath at any time so that
packets don't get looped back to the VM over the other datapath. When a VF
is plugged, the virtio datapath link state can be marked as down. The
hypervisor needs to unplug the VF device from the guest on the source host
and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

When BACKUP feature is enabled, an additional netdev(bypass netdev) is
created that acts as a master device and tracks the state of the 2 lower
netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
passthru device with the same MAC is registered as 'active' netdev.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization=151189725224231=2

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Alexander Duyck 
Reviewed-by: Jesse Brandeburg 

[...]



+static int virtnet_bypass_register_child(struct net_device *child_netdev)
+{
+   struct virtnet_bypass_info *vbi;
+   struct net_device *dev;
+   bool backup;
+   int ret;
+
+   if (child_netdev->addr_len != ETH_ALEN)
+   return NOTIFY_DONE;
+
+   /* We will use the MAC address to locate the virtnet_bypass netdev
+* to associate with the child netdev. If we don't find a matching
+* bypass netdev, move on.
+*/
+   dev = get_virtnet_bypass_bymac(dev_net(child_netdev),
+  child_netdev->perm_addr);
+   if (!dev)
+   return NOTIFY_DONE;
+
+   vbi = netdev_priv(dev);
+   backup = (child_netdev->dev.parent == dev->dev.parent);
+   if (backup ? rtnl_dereference(vbi->backup_netdev) :
+   rtnl_dereference(vbi->active_netdev)) {
+   netdev_info(dev,
+   "%s attempting to join bypass dev when %s already 
present\n",
+   child_netdev->name, backup ? "backup" : "active");
+   return NOTIFY_DONE;
+   }
+
+   /* Avoid non pci devices as active netdev */
+   if (!backup && (!child_netdev->dev.parent ||
+   !dev_is_pci(child_netdev->dev.parent)))
+   return NOTIFY_DONE;
+
+   ret = netdev_rx_handler_register(child_netdev,
+virtnet_bypass_handle_frame, dev);
+   if (ret != 0) {
+   netdev_err(child_netdev,
+  "can not register bypass receive handler (err = 
%d)\n",
+  ret);
+   goto rx_handler_failed;
+   }
+
+   ret = netdev_upper_dev_link(child_netdev, dev, NULL);
+   if (ret != 0) {
+   netdev_err(child_netdev,
+  "can not set master device %s (err = %d)\n",
+  dev->name, ret);
+   goto upper_link_failed;
+   }
+
+   child_netdev->flags |= IFF_SLAVE;
+
+   if (netif_running(dev)) {
+   ret = dev_open(child_netdev);
+   if (ret && (ret != -EBUSY)) {
+   netdev_err(dev, "Opening child %s failed ret:%d\n",
+  child_netdev->name, ret);
+   goto err_interface_up;
+   }
+   }

Much of this function is copy of netvsc_vf_join, should be shared with
netvsc.

Yes. This is based on netvsc_register_vf() and netvsc_vf_join(). But modified
to handle registration of 2 child netdevs(backup virtio and active vf).  In case
of netvsc, they only register VF. There is no backup netdev.
I think trying to make this code shareable with netvsc will introduce additional
checks and make it convoluted.

No problem.


Not clear what you meant here?
Want to confirm that you are agreeing that it is OK to not share this function
with netvsc.










+
+   /* Align MTU of child with master */
+   ret = dev_set_mtu(child_netdev, dev->mtu);
+   if (ret) {
+   netdev_err(dev,
+  "unable to change mtu of %s to %u register failed\n",
+  child_netdev->name, dev->mtu);
+   goto err_set_mtu;
+   }
+
+   call_netdevice_notifiers(NETDEV_JOIN, child_netdev);
+
+ 

[virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-13 Thread Samudrala, Sridhar

On 3/12/2018 3:44 PM, Siwei Liu wrote:

Apologies, still some comments going. Please see inline.

On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
 wrote:

This patch enables virtio_net to switch over to a VF datapath when a VF
netdev is present with the same MAC address. It allows live migration
of a VM with a direct attached VF without the need to setup a bond/team
between a VF and virtio net device in the guest.

The hypervisor needs to enable only one datapath at any time so that
packets don't get looped back to the VM over the other datapath. When a VF
is plugged, the virtio datapath link state can be marked as down. The
hypervisor needs to unplug the VF device from the guest on the source host
and reset the MAC filter of the VF to initiate failover of datapath to
virtio before starting the migration. After the migration is completed,
the destination hypervisor sets the MAC filter on the VF and plugs it back
to the guest to switch over to VF datapath.

When BACKUP feature is enabled, an additional netdev(bypass netdev) is
created that acts as a master device and tracks the state of the 2 lower
netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
passthru device with the same MAC is registered as 'active' netdev.

This patch is based on the discussion initiated by Jesse on this thread.
https://marc.info/?l=linux-virtualization=151189725224231=2

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Alexander Duyck 
Reviewed-by: Jesse Brandeburg 
---
  drivers/net/virtio_net.c | 683 ++-
  1 file changed, 682 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bcd13fe906ca..f2860d86c952 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -30,6 +30,8 @@
  #include 
  #include 
  #include 
+#include 
+#include 
  #include 
  #include 

@@ -206,6 +208,9 @@ struct virtnet_info {
 u32 speed;

 unsigned long guest_offloads;
+
+   /* upper netdev created when BACKUP feature enabled */
+   struct net_device *bypass_netdev;
  };

  struct padded_vnet_hdr {
@@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct 
netdev_bpf *xdp)
 }
  }

+static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
+ size_t len)
+{
+   struct virtnet_info *vi = netdev_priv(dev);
+   int ret;
+
+   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
+   return -EOPNOTSUPP;
+
+   ret = snprintf(buf, len, "_bkup");
+   if (ret >= len)
+   return -EOPNOTSUPP;
+
+   return 0;
+}
+
  static const struct net_device_ops virtnet_netdev = {
 .ndo_open= virtnet_open,
 .ndo_stop= virtnet_close,
@@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev = {
 .ndo_xdp_xmit   = virtnet_xdp_xmit,
 .ndo_xdp_flush  = virtnet_xdp_flush,
 .ndo_features_check = passthru_features_check,
+   .ndo_get_phys_port_name = virtnet_get_phys_port_name,
  };

  static void virtnet_config_changed_work(struct work_struct *work)
@@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device *vdev)
 return 0;
  }

+/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
+ * When BACKUP feature is enabled, an additional netdev(bypass netdev)
+ * is created that acts as a master device and tracks the state of the
+ * 2 lower netdevs. The original virtio_net netdev is registered as
+ * 'backup' netdev and a passthru device with the same MAC is registered
+ * as 'active' netdev.
+ */
+
+/* bypass state maintained when BACKUP feature is enabled */
+struct virtnet_bypass_info {
+   /* passthru netdev with same MAC */
+   struct net_device __rcu *active_netdev;
+
+   /* virtio_net netdev */
+   struct net_device __rcu *backup_netdev;
+
+   /* active netdev stats */
+   struct rtnl_link_stats64 active_stats;
+
+   /* backup netdev stats */
+   struct rtnl_link_stats64 backup_stats;
+
+   /* aggregated stats */
+   struct rtnl_link_stats64 bypass_stats;
+
+   /* spinlock while updating stats */
+   spinlock_t stats_lock;
+};
+
+static void virtnet_bypass_child_open(struct net_device *dev,
+ struct net_device *child_netdev)
+{
+   int err = dev_open(child_netdev);
+
+   if (err)
+   netdev_warn(dev, "unable to open slave: %s: %d\n",
+   child_netdev->name, err);
+}

Why ignoring the error?? i.e. virtnet_bypass_child_open should have
return value. I don't believe the caller is in a safe context to
guarantee the dev_open always succeeds.


OK.  Will handle this in the next revision.





+
+static int virtnet_bypass_open(struct 

[virtio-dev] [PULL 03/22] virtio-net: add linkspeed and duplex settings to virtio-net

2018-03-13 Thread Michael S. Tsirkin
From: Jason Baron 

Although linkspeed and duplex can be set in a linux guest via 'ethtool -s',
this requires custom ethtool commands for virtio-net by default.

Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
the hypervisor to export a linkspeed and duplex setting. The user can
subsequently overwrite it later if desired via: 'ethtool -s'.

Linkspeed and duplex settings can be set as:
'-device virtio-net,speed=1,duplex=full'

where speed is [0...INT_MAX], and duplex is ["half"|"full"].

Signed-off-by: Jason Baron 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: virtio-dev@lists.oasis-open.org
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/virtio-net.h |  3 +++
 hw/net/virtio-net.c| 26 ++
 2 files changed, 29 insertions(+)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index e7634c9..02484dc 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -38,6 +38,9 @@ typedef struct virtio_net_conf
 uint16_t rx_queue_size;
 uint16_t tx_queue_size;
 uint16_t mtu;
+int32_t speed;
+char *duplex_str;
+uint8_t duplex;
 } virtio_net_conf;
 
 /* Maximum packet size we can receive from tap device: header + 64k */
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index ab06f93..67ad38c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -26,6 +26,7 @@
 #include "qapi/qapi-events-net.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "standard-headers/linux/ethtool.h"
 
 #define VIRTIO_NET_VM_VERSION11
 
@@ -61,6 +62,8 @@ static VirtIOFeature feature_sizes[] = {
  .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
 {.flags = 1ULL << VIRTIO_NET_F_MTU,
  .end = endof(struct virtio_net_config, mtu)},
+{.flags = 1ULL << VIRTIO_NET_F_SPEED_DUPLEX,
+ .end = endof(struct virtio_net_config, duplex)},
 {}
 };
 
@@ -89,6 +92,8 @@ static void virtio_net_get_config(VirtIODevice *vdev, uint8_t 
*config)
 virtio_stw_p(vdev, _virtqueue_pairs, n->max_queues);
 virtio_stw_p(vdev, , n->net_conf.mtu);
 memcpy(netcfg.mac, n->mac, ETH_ALEN);
+virtio_stl_p(vdev, , n->net_conf.speed);
+netcfg.duplex = n->net_conf.duplex;
 memcpy(config, , n->config_size);
 }
 
@@ -1941,6 +1946,25 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 n->host_features |= (1ULL << VIRTIO_NET_F_MTU);
 }
 
+if (n->net_conf.duplex_str) {
+if (strncmp(n->net_conf.duplex_str, "half", 5) == 0) {
+n->net_conf.duplex = DUPLEX_HALF;
+} else if (strncmp(n->net_conf.duplex_str, "full", 5) == 0) {
+n->net_conf.duplex = DUPLEX_FULL;
+} else {
+error_setg(errp, "'duplex' must be 'half' or 'full'");
+}
+n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
+} else {
+n->net_conf.duplex = DUPLEX_UNKNOWN;
+}
+
+if (n->net_conf.speed < SPEED_UNKNOWN) {
+error_setg(errp, "'speed' must be between 0 and INT_MAX");
+} else if (n->net_conf.speed >= 0) {
+n->host_features |= (1ULL << VIRTIO_NET_F_SPEED_DUPLEX);
+}
+
 virtio_net_set_config_size(n, n->host_features);
 virtio_init(vdev, "virtio-net", VIRTIO_ID_NET, n->config_size);
 
@@ -2161,6 +2185,8 @@ static Property virtio_net_properties[] = {
 DEFINE_PROP_UINT16("host_mtu", VirtIONet, net_conf.mtu, 0),
 DEFINE_PROP_BOOL("x-mtu-bypass-backend", VirtIONet, mtu_bypass_backend,
  true),
+DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
+DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
MST


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [PULL 02/22] virtio-net: use 64-bit values for feature flags

2018-03-13 Thread Michael S. Tsirkin
From: Jason Baron 

In prepartion for using some of the high order feature bits, make sure that
virtio-net uses 64-bit values everywhere.

Signed-off-by: Jason Baron 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: virtio-dev@lists.oasis-open.org
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/virtio/virtio-net.h |  2 +-
 hw/net/virtio-net.c| 55 +-
 2 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index b81b6a4..e7634c9 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -67,7 +67,7 @@ typedef struct VirtIONet {
 uint32_t has_vnet_hdr;
 size_t host_hdr_len;
 size_t guest_hdr_len;
-uint32_t host_features;
+uint64_t host_features;
 uint8_t has_ufo;
 uint32_t mergeable_rx_bufs;
 uint8_t promisc;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 188744e..ab06f93 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -48,18 +48,18 @@
 (offsetof(container, field) + sizeof(((container *)0)->field))
 
 typedef struct VirtIOFeature {
-uint32_t flags;
+uint64_t flags;
 size_t end;
 } VirtIOFeature;
 
 static VirtIOFeature feature_sizes[] = {
-{.flags = 1 << VIRTIO_NET_F_MAC,
+{.flags = 1ULL << VIRTIO_NET_F_MAC,
  .end = endof(struct virtio_net_config, mac)},
-{.flags = 1 << VIRTIO_NET_F_STATUS,
+{.flags = 1ULL << VIRTIO_NET_F_STATUS,
  .end = endof(struct virtio_net_config, status)},
-{.flags = 1 << VIRTIO_NET_F_MQ,
+{.flags = 1ULL << VIRTIO_NET_F_MQ,
  .end = endof(struct virtio_net_config, max_virtqueue_pairs)},
-{.flags = 1 << VIRTIO_NET_F_MTU,
+{.flags = 1ULL << VIRTIO_NET_F_MTU,
  .end = endof(struct virtio_net_config, mtu)},
 {}
 };
@@ -1938,7 +1938,7 @@ static void virtio_net_device_realize(DeviceState *dev, 
Error **errp)
 int i;
 
 if (n->net_conf.mtu) {
-n->host_features |= (0x1 << VIRTIO_NET_F_MTU);
+n->host_features |= (1ULL << VIRTIO_NET_F_MTU);
 }
 
 virtio_net_set_config_size(n, n->host_features);
@@ -2109,45 +2109,46 @@ static const VMStateDescription vmstate_virtio_net = {
 };
 
 static Property virtio_net_properties[] = {
-DEFINE_PROP_BIT("csum", VirtIONet, host_features, VIRTIO_NET_F_CSUM, true),
-DEFINE_PROP_BIT("guest_csum", VirtIONet, host_features,
+DEFINE_PROP_BIT64("csum", VirtIONet, host_features,
+VIRTIO_NET_F_CSUM, true),
+DEFINE_PROP_BIT64("guest_csum", VirtIONet, host_features,
 VIRTIO_NET_F_GUEST_CSUM, true),
-DEFINE_PROP_BIT("gso", VirtIONet, host_features, VIRTIO_NET_F_GSO, true),
-DEFINE_PROP_BIT("guest_tso4", VirtIONet, host_features,
+DEFINE_PROP_BIT64("gso", VirtIONet, host_features, VIRTIO_NET_F_GSO, true),
+DEFINE_PROP_BIT64("guest_tso4", VirtIONet, host_features,
 VIRTIO_NET_F_GUEST_TSO4, true),
-DEFINE_PROP_BIT("guest_tso6", VirtIONet, host_features,
+DEFINE_PROP_BIT64("guest_tso6", VirtIONet, host_features,
 VIRTIO_NET_F_GUEST_TSO6, true),
-DEFINE_PROP_BIT("guest_ecn", VirtIONet, host_features,
+DEFINE_PROP_BIT64("guest_ecn", VirtIONet, host_features,
 VIRTIO_NET_F_GUEST_ECN, true),
-DEFINE_PROP_BIT("guest_ufo", VirtIONet, host_features,
+DEFINE_PROP_BIT64("guest_ufo", VirtIONet, host_features,
 VIRTIO_NET_F_GUEST_UFO, true),
-DEFINE_PROP_BIT("guest_announce", VirtIONet, host_features,
+DEFINE_PROP_BIT64("guest_announce", VirtIONet, host_features,
 VIRTIO_NET_F_GUEST_ANNOUNCE, true),
-DEFINE_PROP_BIT("host_tso4", VirtIONet, host_features,
+DEFINE_PROP_BIT64("host_tso4", VirtIONet, host_features,
 VIRTIO_NET_F_HOST_TSO4, true),
-DEFINE_PROP_BIT("host_tso6", VirtIONet, host_features,
+DEFINE_PROP_BIT64("host_tso6", VirtIONet, host_features,
 VIRTIO_NET_F_HOST_TSO6, true),
-DEFINE_PROP_BIT("host_ecn", VirtIONet, host_features,
+DEFINE_PROP_BIT64("host_ecn", VirtIONet, host_features,
 VIRTIO_NET_F_HOST_ECN, true),
-DEFINE_PROP_BIT("host_ufo", VirtIONet, host_features,
+DEFINE_PROP_BIT64("host_ufo", VirtIONet, host_features,
 VIRTIO_NET_F_HOST_UFO, true),
-DEFINE_PROP_BIT("mrg_rxbuf", VirtIONet, host_features,
+DEFINE_PROP_BIT64("mrg_rxbuf", VirtIONet, host_features,
 VIRTIO_NET_F_MRG_RXBUF, true),
-DEFINE_PROP_BIT("status", VirtIONet, host_features,
+DEFINE_PROP_BIT64("status", VirtIONet, host_features,
 VIRTIO_NET_F_STATUS, true),
-DEFINE_PROP_BIT("ctrl_vq", VirtIONet, host_features,
+DEFINE_PROP_BIT64("ctrl_vq", VirtIONet, host_features,
 

[virtio-dev] [PULL 01/22] scripts/update-linux-headers: add ethtool.h and update to 4.16.0-rc4

2018-03-13 Thread Michael S. Tsirkin
From: Jason Baron 

A subsequent patch to add support for setting linkspeed/duplex in
virtio-net, requires a few definitions from ethtool.h, which ends up
pulling in kernel.h and sysinfo.h as well.

Signed-off-by: Jason Baron 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: virtio-dev@lists.oasis-open.org
---
 include/standard-headers/linux/ethtool.h | 1821 ++
 include/standard-headers/linux/kernel.h  |   15 +
 include/standard-headers/linux/sysinfo.h |   25 +
 scripts/update-linux-headers.sh  |   11 +-
 4 files changed, 1871 insertions(+), 1 deletion(-)
 create mode 100644 include/standard-headers/linux/ethtool.h
 create mode 100644 include/standard-headers/linux/kernel.h
 create mode 100644 include/standard-headers/linux/sysinfo.h

diff --git a/include/standard-headers/linux/ethtool.h 
b/include/standard-headers/linux/ethtool.h
new file mode 100644
index 000..94aacb7
--- /dev/null
+++ b/include/standard-headers/linux/ethtool.h
@@ -0,0 +1,1821 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * ethtool.h: Defines for Linux ethtool.
+ *
+ * Copyright (C) 1998 David S. Miller (da...@redhat.com)
+ * Copyright 2001 Jeff Garzik 
+ * Portions Copyright 2001 Sun Microsystems (thoc...@sun.com)
+ * Portions Copyright 2002 Intel (eli.kuperm...@intel.com,
+ *christopher.le...@intel.com,
+ *scott.feld...@intel.com)
+ * Portions Copyright (C) Sun Microsystems 2008
+ */
+
+#ifndef _LINUX_ETHTOOL_H
+#define _LINUX_ETHTOOL_H
+
+#include "net/eth.h"
+
+#include "standard-headers/linux/kernel.h"
+#include "standard-headers/linux/types.h"
+#include "standard-headers/linux/if_ether.h"
+
+#include  /* for INT_MAX */
+
+/* All structures exposed to userland should be defined such that they
+ * have the same layout for 32-bit and 64-bit userland.
+ */
+
+/**
+ * struct ethtool_cmd - DEPRECATED, link control and status
+ * This structure is DEPRECATED, please use struct ethtool_link_settings.
+ * @cmd: Command number = %ETHTOOL_GSET or %ETHTOOL_SSET
+ * @supported: Bitmask of %SUPPORTED_* flags for the link modes,
+ * physical connectors and other link features for which the
+ * interface supports autonegotiation or auto-detection.
+ * Read-only.
+ * @advertising: Bitmask of %ADVERTISED_* flags for the link modes,
+ * physical connectors and other link features that are
+ * advertised through autonegotiation or enabled for
+ * auto-detection.
+ * @speed: Low bits of the speed, 1Mb units, 0 to INT_MAX or SPEED_UNKNOWN
+ * @duplex: Duplex mode; one of %DUPLEX_*
+ * @port: Physical connector type; one of %PORT_*
+ * @phy_address: MDIO address of PHY (transceiver); 0 or 255 if not
+ * applicable.  For clause 45 PHYs this is the PRTAD.
+ * @transceiver: Historically used to distinguish different possible
+ * PHY types, but not in a consistent way.  Deprecated.
+ * @autoneg: Enable/disable autonegotiation and auto-detection;
+ * either %AUTONEG_DISABLE or %AUTONEG_ENABLE
+ * @mdio_support: Bitmask of %ETH_MDIO_SUPPORTS_* flags for the MDIO
+ * protocols supported by the interface; 0 if unknown.
+ * Read-only.
+ * @maxtxpkt: Historically used to report TX IRQ coalescing; now
+ * obsoleted by  ethtool_coalesce.  Read-only; deprecated.
+ * @maxrxpkt: Historically used to report RX IRQ coalescing; now
+ * obsoleted by  ethtool_coalesce.  Read-only; deprecated.
+ * @speed_hi: High bits of the speed, 1Mb units, 0 to INT_MAX or SPEED_UNKNOWN
+ * @eth_tp_mdix: Ethernet twisted-pair MDI(-X) status; one of
+ * %ETH_TP_MDI_*.  If the status is unknown or not applicable, the
+ * value will be %ETH_TP_MDI_INVALID.  Read-only.
+ * @eth_tp_mdix_ctrl: Ethernet twisted pair MDI(-X) control; one of
+ * %ETH_TP_MDI_*.  If MDI(-X) control is not implemented, reads
+ * yield %ETH_TP_MDI_INVALID and writes may be ignored or rejected.
+ * When written successfully, the link should be renegotiated if
+ * necessary.
+ * @lp_advertising: Bitmask of %ADVERTISED_* flags for the link modes
+ * and other link features that the link partner advertised
+ * through autonegotiation; 0 if unknown or not applicable.
+ * Read-only.
+ *
+ * The link speed in Mbps is split between @speed and @speed_hi.  Use
+ * the ethtool_cmd_speed() and ethtool_cmd_speed_set() functions to
+ * access it.
+ *
+ * If autonegotiation is disabled, the speed and @duplex represent the
+ * fixed link mode and are writable if the driver supports multiple
+ * link modes.  If it is enabled then they are read-only; if the link
+ * is up they represent the negotiated link mode; if the link is down,
+ * the speed is 0, %SPEED_UNKNOWN or the highest enabled speed and
+ * @duplex is %DUPLEX_UNKNOWN or the best enabled duplex mode.
+ *
+ * Some hardware interfaces may have multiple PHYs and/or 

[virtio-dev] [pci PATCH v6 4/5] nvme: Migrate over to unmanaged SR-IOV support

2018-03-13 Thread Alexander Duyck
From: Alexander Duyck 

Instead of implementing our own version of a SR-IOV configuration stub in
the nvme driver we can just reuse the existing
pci_sriov_configure_simple function.

Signed-off-by: Alexander Duyck 
---

v5: Replaced call to pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition

 drivers/nvme/host/pci.c |   20 +---
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5933a5c732e8..5e963058882a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2580,24 +2580,6 @@ static void nvme_remove(struct pci_dev *pdev)
nvme_put_ctrl(>ctrl);
 }
 
-static int nvme_pci_sriov_configure(struct pci_dev *pdev, int numvfs)
-{
-   int ret = 0;
-
-   if (numvfs == 0) {
-   if (pci_vfs_assigned(pdev)) {
-   dev_warn(>dev,
-   "Cannot disable SR-IOV VFs while assigned\n");
-   return -EPERM;
-   }
-   pci_disable_sriov(pdev);
-   return 0;
-   }
-
-   ret = pci_enable_sriov(pdev, numvfs);
-   return ret ? ret : numvfs;
-}
-
 #ifdef CONFIG_PM_SLEEP
 static int nvme_suspend(struct device *dev)
 {
@@ -2716,7 +2698,7 @@ static void nvme_error_resume(struct pci_dev *pdev)
.driver = {
.pm = _dev_pm_ops,
},
-   .sriov_configure = nvme_pci_sriov_configure,
+   .sriov_configure = pci_sriov_configure_simple,
.err_handler= _err_handler,
 };
 


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [pci PATCH v6 5/5] pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs

2018-03-13 Thread Alexander Duyck
From: Alexander Duyck 

Add a new driver called "pci-pf-stub" to act as a "white-list" for PF
devices that provide no other functionality other then acting as a means of
allocating a set of VFs. For now I only have one example ID provided by
Amazon in terms of devices that require this functionality. The general
idea is that in the future we will see other devices added as vendors come
up with devices where the PF is more or less just a lightweight shim used
to allocate VFs.

Signed-off-by: Alexander Duyck 
---

v6: New driver to address concerns about Amazon devices left unsupported

 drivers/pci/Kconfig   |   12 +
 drivers/pci/Makefile  |2 ++
 drivers/pci/pci-pf-stub.c |   60 +
 include/linux/pci_ids.h   |2 ++
 4 files changed, 76 insertions(+)
 create mode 100644 drivers/pci/pci-pf-stub.c

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 34b56a8f8480..cdef2a2a9bc5 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -71,6 +71,18 @@ config PCI_STUB
 
  When in doubt, say N.
 
+config PCI_PF_STUB
+   tristate "PCI PF Stub driver"
+   depends on PCI
+   depends on PCI_IOV
+   help
+ Say Y or M here if you want to enable support for devices that
+ require SR-IOV support, while at the same time the PF itself is
+ not providing any actual services on the host itself such as
+ storage or networking.
+
+ When in doubt, say N.
+
 config XEN_PCIDEV_FRONTEND
 tristate "Xen PCI Frontend"
 depends on PCI && X86 && XEN
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 941970936840..4e133d3df403 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -43,6 +43,8 @@ obj-$(CONFIG_PCI_SYSCALL) += syscall.o
 
 obj-$(CONFIG_PCI_STUB) += pci-stub.o
 
+obj-$(CONFIG_PCI_PF_STUB) += pci-pf-stub.o
+
 obj-$(CONFIG_PCI_ECAM) += ecam.o
 
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
diff --git a/drivers/pci/pci-pf-stub.c b/drivers/pci/pci-pf-stub.c
new file mode 100644
index ..d218924d9bdb
--- /dev/null
+++ b/drivers/pci/pci-pf-stub.c
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: GPL-2.0
+/* pci-pf-stub - simple stub driver for PCI SR-IOV PF device
+ *
+ * This driver is meant to act as a "white-list" for devices that provde
+ * SR-IOV functionality while at the same time not actually needing a
+ * driver of their own.
+ */
+
+#include 
+#include 
+
+/**
+ * pci_pf_stub_white_list - White list of devices to bind pci-pf-stub onto
+ *
+ * This table provides the list of IDs this driver is supposed to bind
+ * onto. You could think of this as a list of "quirked" devices where we
+ * are adding support for SR-IOV here since there are no other drivers
+ * that they would be running under.
+ *
+ * Layout of the table below is as follows:
+ * { Vendor ID, Device ID,
+ *   SubVendor ID, SubDevice ID,
+ *   Class, Class Mask,
+ *   private data (not used) }
+ */
+static const struct pci_device_id pci_pf_stub_white_list[] = {
+   { PCI_VDEVICE(AMAZON, 0x0053) },
+   /* required last entry */
+   { 0 }
+};
+MODULE_DEVICE_TABLE(pci, pci_pf_stub_white_list);
+
+static int pci_pf_stub_probe(struct pci_dev *dev,
+const struct pci_device_id *id)
+{
+   pci_info(dev, "claimed by pci-pf-stub\n");
+   return 0;
+}
+
+static struct pci_driver pf_stub_driver = {
+   .name   = "pci-pf-stub",
+   .id_table   = pci_pf_stub_white_list,
+   .probe  = pci_pf_stub_probe,
+   .sriov_configure= pci_sriov_configure_simple,
+};
+
+static int __init pci_pf_stub_init(void)
+{
+   return pci_register_driver(_stub_driver);
+}
+
+static void __exit pci_pf_stub_exit(void)
+{
+   pci_unregister_driver(_stub_driver);
+}
+
+module_init(pci_pf_stub_init);
+module_exit(pci_pf_stub_exit);
+
+MODULE_LICENSE("GPL");
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index a6b30667a331..b10621896017 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2548,6 +2548,8 @@
 #define PCI_VENDOR_ID_CIRCUITCO0x1cc8
 #define PCI_SUBSYSTEM_ID_CIRCUITCO_MINNOWBOARD 0x0001
 
+#define PCI_VENDOR_ID_AMAZON   0x1d0f
+
 #define PCI_VENDOR_ID_TEKRAM   0x1de1
 #define PCI_DEVICE_ID_TEKRAM_DC290 0xdc29
 


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [pci PATCH v6 3/5] ena: Migrate over to unmanaged SR-IOV support

2018-03-13 Thread Alexander Duyck
From: Alexander Duyck 

Instead of implementing our own version of a SR-IOV configuration stub in
the ena driver we can just reuse the existing
pci_sriov_configure_simple function.

Signed-off-by: Alexander Duyck 
---

v5: Replaced call to pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition

 drivers/net/ethernet/amazon/ena/ena_netdev.c |   28 +-
 1 file changed, 1 insertion(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c 
b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 6975150d144e..6054deb1e6aa 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3385,32 +3385,6 @@ static int ena_probe(struct pci_dev *pdev, const struct 
pci_device_id *ent)
 }
 
 /*/
-static int ena_sriov_configure(struct pci_dev *dev, int numvfs)
-{
-   int rc;
-
-   if (numvfs > 0) {
-   rc = pci_enable_sriov(dev, numvfs);
-   if (rc != 0) {
-   dev_err(>dev,
-   "pci_enable_sriov failed to enable: %d vfs with 
the error: %d\n",
-   numvfs, rc);
-   return rc;
-   }
-
-   return numvfs;
-   }
-
-   if (numvfs == 0) {
-   pci_disable_sriov(dev);
-   return 0;
-   }
-
-   return -EINVAL;
-}
-
-/*/
-/*/
 
 /* ena_remove - Device Removal Routine
  * @pdev: PCI device information struct
@@ -3525,7 +3499,7 @@ static int ena_resume(struct pci_dev *pdev)
.suspend= ena_suspend,
.resume = ena_resume,
 #endif
-   .sriov_configure = ena_sriov_configure,
+   .sriov_configure = pci_sriov_configure_simple,
 };
 
 static int __init ena_init(void)


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [pci PATCH v6 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices

2018-03-13 Thread Alexander Duyck
From: Alexander Duyck 

Hardware-realized virtio_pci devices can implement SR-IOV, so this
patch enables its use. The device in question is an upcoming Intel
NIC that implements both a virtio_net PF and virtio_net VFs. These
are hardware realizations of what has been up to now been a software
interface.

The device in question has the following 4-part PCI IDs:

PF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 15fe
VF: vendor: 1af4 device: 1041 subvendor: 8086 subdevice: 05fe

The patch currently needs no check for device ID, because the callback
will never be made for devices that do not assert the capability or
when run on a platform incapable of SR-IOV.

One reason for this patch is because the hardware requires the
vendor ID of a VF to be the same as the vendor ID of the PF that
created it. So it seemed logical to simply have a fully-functioning
virtio_net PF create the VFs. This patch makes that possible.

Signed-off-by: Mark Rustad 
Signed-off-by: Alexander Duyck 
---

v4: Dropped call to pci_disable_sriov in virtio_pci_remove function
v5: Replaced call to pci_sriov_configure_unmanaged with
pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition

 drivers/virtio/virtio_pci_common.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/virtio/virtio_pci_common.c 
b/drivers/virtio/virtio_pci_common.c
index 48d4d1cf1cb6..67a227fd7aa0 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -596,6 +596,7 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
 #ifdef CONFIG_PM_SLEEP
.driver.pm  = _pci_pm_ops,
 #endif
+   .sriov_configure = pci_sriov_configure_simple,
 };
 
 module_pci_driver(virtio_pci_driver);


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [pci PATCH v6 1/5] pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources

2018-03-13 Thread Alexander Duyck
From: Alexander Duyck 

This patch adds a common configuration function called
pci_sriov_configure_simple that will allow for managing VFs on devices
where the PF is not capable of managing VF resources.

Signed-off-by: Alexander Duyck 
---

v5: New patch replacing pci_sriov_configure_unmanaged with
  pci_sriov_configure_simple
Dropped bits related to autoprobe changes
v6: Defined pci_sriov_configure_simple as NULL if IOV is disabled

 drivers/pci/iov.c   |   32 
 include/linux/pci.h |3 +++
 2 files changed, 35 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 677924ae0350..bd7021491fdb 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -807,3 +807,35 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
return dev->sriov->total_VFs;
 }
 EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
+
+/**
+ * pci_sriov_configure_simple - helper to configure unmanaged SR-IOV
+ * @dev: the PCI device
+ * @nr_virtfn: number of virtual functions to enable, 0 to disable
+ *
+ * Used to provide generic enable/disable SR-IOV option for devices
+ * that do not manage the VFs generated by their driver
+ */
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
+{
+   int err = -EINVAL;
+
+   might_sleep();
+
+   if (!dev->is_physfn)
+   return -ENODEV;
+
+   if (pci_vfs_assigned(dev)) {
+   pci_warn(dev,
+"Cannot modify SR-IOV while VFs are assigned\n");
+   err = -EPERM;
+   } else if (!nr_virtfn) {
+   sriov_disable(dev);
+   err = 0;
+   } else if (!dev->sriov->num_VFs) {
+   err = sriov_enable(dev, nr_virtfn);
+   }
+
+   return err ? err : nr_virtfn;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_configure_simple);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 024a1beda008..f3099e940cda 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1953,6 +1953,7 @@ static inline void pci_mmcfg_late_init(void) { }
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe);
 #else
@@ -1980,6 +1981,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev 
*dev, u16 numvfs)
 { return 0; }
 static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
+/* since this expected to be used as a function pointer just define as NULL */
+#define pci_sriov_configure_simple NULL
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int 
resno)
 { return 0; }
 static inline void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe) { 
}


-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] [pci PATCH v6 0/5] Add support for unmanaged SR-IOV

2018-03-13 Thread Alexander Duyck
This series is meant to add support for SR-IOV on devices when the VFs are
not managed by the kernel. Examples of recent patches attempting to do this
include:
virto - https://patchwork.kernel.org/patch/10241225/
pci-stub - https://patchwork.kernel.org/patch/10109935/
vfio - https://patchwork.kernel.org/patch/10103353/
uio - https://patchwork.kernel.org/patch/9974031/

Since this is quickly blowing up into a multi-driver problem it is probably
best to implement this solution as generically as possible.

This series is an attempt to do that. What we do with this patch set is
provide a generic framework to enable SR-IOV in the case that the PF driver
doesn't support managing the VFs itself.

I based my patch set originally on the patch by Mark Rustad but there isn't
much left after going through and cleaning out the bits that were no longer
needed, and after incorporating the feedback from David Miller. At this point
the only items to be fully reused was his patch description which is now
present in patch 3 of the set.

This solution is limited in scope to just adding support for devices that
provide no functionality for SR-IOV other than allocating the VFs by
calling pci_enable_sriov. Previous sets had included patches for VFIO, but
for now I am dropping that as the scope of that work is larger then I
think I can take on at this time.

v2: Reduced scope back to just virtio_pci and vfio-pci
Broke into 3 patch set from single patch
Changed autoprobe behavior to always set when num_vfs is set non-zero
v3: Updated Documentation to clarify when sriov_unmanaged_autoprobe is used
Wrapped vfio_pci_sriov_configure to fix build errors w/o SR-IOV in kernel
v4: Dropped vfio-pci patch
Added ena and nvme to drivers now using pci_sriov_configure_unmanaged
Dropped pci_disable_sriov call in virtio_pci to be consistent with ena
v5: Dropped sriov_unmanaged_autoprobe and pci_sriov_conifgure_unmanaged
Added new patch that enables pci_sriov_configure_simple
Updated drivers to use pci_sriov_configure_simple
v6: Defined pci_sriov_configure_simple as NULL when SR-IOV is not enabled
Updated drivers to drop "#ifdef" checks for IOV
Added pci-pf-stub as place for PF-only drivers to add support

Cc: Mark Rustad 
Cc: Maximilian Heyne 
Cc: Liang-Min Wang 
Cc: David Woodhouse 

---

Alexander Duyck (5):
  pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
  virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
  ena: Migrate over to unmanaged SR-IOV support
  nvme: Migrate over to unmanaged SR-IOV support
  pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs


 drivers/net/ethernet/amazon/ena/ena_netdev.c |   28 
 drivers/nvme/host/pci.c  |   20 -
 drivers/pci/Kconfig  |   12 +
 drivers/pci/Makefile |2 +
 drivers/pci/iov.c|   32 ++
 drivers/pci/pci-pf-stub.c|   60 ++
 drivers/virtio/virtio_pci_common.c   |1 
 include/linux/pci.h  |3 +
 include/linux/pci_ids.h  |2 +
 9 files changed, 114 insertions(+), 46 deletions(-)
 create mode 100644 drivers/pci/pci-pf-stub.c

--

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [PATCH v4 3/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-03-13 Thread Michael S. Tsirkin
On Wed, Mar 07, 2018 at 08:34:24PM +0800, Wei Wang wrote:
> The new feature enables the virtio-balloon device to receive hints of
> guest free pages from the free page vq.
> 
> balloon_free_page_start - start guest free page hint reporting.
> balloon_free_page_stop - stop guest free page hint reporting.
> 
> Note: balloon will report pages which were free at the time
> of this call. As the reporting happens asynchronously, dirty bit logging
> must be enabled before this call is made. Guest reporting must be
> disabled before the migration dirty bitmap is synchronized.

Add this usage documentation in code as well, not just
in the commit log.

Another limitation seems to be that machine needs
to be running. I think ideally you should not teach callers
about this limitation though.

> Signed-off-by: Wei Wang 
> Signed-off-by: Liang Li 
> CC: Michael S. Tsirkin 
> CC: Dr. David Alan Gilbert 
> CC: Juan Quintela 

I find it suspicious that neither unrealize nor reset
functions have been touched at all.
Are you sure you have thought through scenarious like
hot-unplug or disabling the device by guest?
  
> ---
>  balloon.c   |  49 +--
>  hw/virtio/virtio-balloon.c  | 183 
> +---
>  include/hw/virtio/virtio-balloon.h  |  15 +-
>  include/standard-headers/linux/virtio_balloon.h |   7 +
>  include/sysemu/balloon.h|  15 +-
>  5 files changed, 240 insertions(+), 29 deletions(-)
> 
> diff --git a/balloon.c b/balloon.c
> index d8dd6fe..b0b0749 100644
> --- a/balloon.c
> +++ b/balloon.c
> @@ -36,6 +36,9 @@
>  
>  static QEMUBalloonEvent *balloon_event_fn;
>  static QEMUBalloonStatus *balloon_stat_fn;
> +static QEMUBalloonFreePageSupport *balloon_free_page_support_fn;
> +static QEMUBalloonFreePageStart *balloon_free_page_start_fn;
> +static QEMUBalloonFreePageStop *balloon_free_page_stop_fn;
>  static void *balloon_opaque;
>  static bool balloon_inhibited;
>  
> @@ -64,19 +67,42 @@ static bool have_balloon(Error **errp)
>  return true;
>  }
>  
> -int qemu_add_balloon_handler(QEMUBalloonEvent *event_func,
> - QEMUBalloonStatus *stat_func, void *opaque)
> +bool balloon_free_page_support(void)
>  {
> -if (balloon_event_fn || balloon_stat_fn || balloon_opaque) {
> -/* We're already registered one balloon handler.  How many can
> - * a guest really have?
> - */
> -return -1;
> +return balloon_free_page_support_fn &&
> +   balloon_free_page_support_fn(balloon_opaque);
> +}
> +
> +void balloon_free_page_start(void)
> +{
> +balloon_free_page_start_fn(balloon_opaque);
> +}
> +
> +void balloon_free_page_stop(void)
> +{
> +balloon_free_page_stop_fn(balloon_opaque);
> +}
> +
> +void qemu_add_balloon_handler(QEMUBalloonEvent *event_fn,
> +  QEMUBalloonStatus *stat_fn,
> +  QEMUBalloonFreePageSupport 
> *free_page_support_fn,
> +  QEMUBalloonFreePageStart *free_page_start_fn,
> +  QEMUBalloonFreePageStop *free_page_stop_fn,
> +  void *opaque)
> +{
> +if (balloon_event_fn || balloon_stat_fn || balloon_free_page_support_fn 
> ||
> +balloon_free_page_start_fn || balloon_free_page_stop_fn ||
> +balloon_opaque) {
> +/* We already registered one balloon handler. */
> +return;
>  }
> -balloon_event_fn = event_func;
> -balloon_stat_fn = stat_func;
> +
> +balloon_event_fn = event_fn;
> +balloon_stat_fn = stat_fn;
> +balloon_free_page_support_fn = free_page_support_fn;
> +balloon_free_page_start_fn = free_page_start_fn;
> +balloon_free_page_stop_fn = free_page_stop_fn;
>  balloon_opaque = opaque;
> -return 0;
>  }
>  
>  void qemu_remove_balloon_handler(void *opaque)
> @@ -86,6 +112,9 @@ void qemu_remove_balloon_handler(void *opaque)
>  }
>  balloon_event_fn = NULL;
>  balloon_stat_fn = NULL;
> +balloon_free_page_support_fn = NULL;
> +balloon_free_page_start_fn = NULL;
> +balloon_free_page_stop_fn = NULL;
>  balloon_opaque = NULL;
>  }
>  
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 4822449..48ed2ec 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -31,6 +31,7 @@
>  
>  #include "hw/virtio/virtio-bus.h"
>  #include "hw/virtio/virtio-access.h"
> +#include "migration/misc.h"
>  
>  #define BALLOON_PAGE_SIZE  (1 << VIRTIO_BALLOON_PFN_SHIFT)
>  
> @@ -308,6 +309,111 @@ out:
>  }
>  }
>  
> +static void *virtio_balloon_poll_free_page_hints(void *opaque)
> +{
> +VirtQueueElement *elem;
> +VirtIOBalloon *dev = opaque;
> +VirtQueue *vq = dev->free_page_vq;
> +uint32_t id;
> +size_t size;

What makes it safe to poke at 

[virtio-dev] Re: [PATCH v4 4/4] migration: use the free page hint feature from balloon

2018-03-13 Thread Michael S. Tsirkin
On Wed, Mar 07, 2018 at 08:34:25PM +0800, Wei Wang wrote:
> Start the free page optimization after the migration bitmap is
> synchronized. This can't be used in the stop phase since the guest
> is paused. Make sure the guest reporting has stopped before
> synchronizing the migration dirty bitmap. Currently, the optimization is
> added to precopy only.
> 
> Signed-off-by: Wei Wang 
> CC: Dr. David Alan Gilbert 
> CC: Juan Quintela 
> CC: Michael S. Tsirkin 
> ---
>  migration/ram.c | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index e172798..7b4c9b1 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -51,6 +51,8 @@
>  #include "qemu/rcu_queue.h"
>  #include "migration/colo.h"
>  #include "migration/block.h"
> +#include "sysemu/balloon.h"
> +#include "sysemu/sysemu.h"
>  
>  /***/
>  /* ram save/restore */
> @@ -208,6 +210,8 @@ struct RAMState {
>  uint32_t last_version;
>  /* We are in the first round */
>  bool ram_bulk_stage;
> +/* The free pages optimization feature is supported */
> +bool free_page_support;
>  /* How many times we have dirty too many pages */
>  int dirty_rate_high_cnt;
>  /* these variables are used for bitmap sync */
> @@ -775,7 +779,7 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, 
> RAMBlock *rb,
>  unsigned long *bitmap = rb->bmap;
>  unsigned long next;
>  
> -if (rs->ram_bulk_stage && start > 0) {
> +if (rs->ram_bulk_stage && start > 0 && !rs->free_page_support) {
>  next = start + 1;
>  } else {
>  next = find_next_bit(bitmap, size, start);
> @@ -833,6 +837,10 @@ static void migration_bitmap_sync(RAMState *rs)
>  int64_t end_time;
>  uint64_t bytes_xfer_now;
>  
> +if (rs->free_page_support) {
> +balloon_free_page_stop();
> +}
> +
>  ram_counters.dirty_sync_count++;
>  
>  if (!rs->time_last_bitmap_sync) {
> @@ -899,6 +907,10 @@ static void migration_bitmap_sync(RAMState *rs)
>  if (migrate_use_events()) {
>  qapi_event_send_migration_pass(ram_counters.dirty_sync_count, NULL);
>  }
> +
> +if (rs->free_page_support && runstate_is_running()) {
> +balloon_free_page_start();
> +}
>  }

I think some of these conditions should go into
balloon_free_page_start/stop.

Checking runstate is generally problematic unless you
also handle run state change notifiers as it can
be manipulated from QMP.

>  
>  /**
> @@ -1656,6 +1668,8 @@ static void ram_state_reset(RAMState *rs)
>  rs->last_page = 0;
>  rs->last_version = ram_list.version;
>  rs->ram_bulk_stage = true;
> +rs->free_page_support = balloon_free_page_support() &
> +!migration_in_postcopy();

Probably &&?

>  }
>  
>  #define MAX_WAIT 50 /* ms, half buffered_file limit */
> @@ -2330,6 +2344,9 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
>  
>  ret = qemu_file_get_error(f);
>  if (ret < 0) {
> +if (rs->free_page_support) {
> +balloon_free_page_stop();
> +}
>  return ret;
>  }
>  
> -- 
> 1.8.3.1

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org



[virtio-dev] Re: [pci PATCH v5 3/4] ena: Migrate over to unmanaged SR-IOV support

2018-03-13 Thread Alexander Duyck
On Tue, Mar 13, 2018 at 1:12 AM, David Woodhouse  wrote:
> On Mon, 2018-03-12 at 10:23 -0700, Alexander Duyck wrote:
>>
>> -   .sriov_configure = ena_sriov_configure,
>> +#ifdef CONFIG_PCI_IOV
>> +   .sriov_configure = pci_sriov_configure_simple,
>> +#endif
>>  };
>
> I'd like to see that ifdef go away, as discussed. I agree that just
> #define pci_sriov_configure_simple NULL
> should suffice. As Christoph points out, it's not going to compile if
> people try to just invoke it directly.
>
> I'd also *really* like to see a way to enable this for PFs which don't
> have (and don't need) a driver. We seem to have lost that along the
> way.

Actually the suggestion I had from Don Dutile was that we should be
looking at creating a pci-stub like driver specifically for those type
of devices, but without the ability to arbitrarily assign devices.
Basically we have to white-list it in one device at a time for those
kind of things.

If you have the device ID of the thing you wanted to have work with
pci-stub before I could look at putting together a quick driver and
adding it to this set.

Thanks.

- Alex

-
To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org