Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Siwei Liu
On Tue, Jan 23, 2018 at 2:58 PM, Michael S. Tsirkin  wrote:
> On Tue, Jan 23, 2018 at 12:24:47PM -0800, Siwei Liu wrote:
>> On Mon, Jan 22, 2018 at 1:41 PM, Michael S. Tsirkin  wrote:
>> > On Mon, Jan 22, 2018 at 12:27:14PM -0800, Siwei Liu wrote:
>> >> First off, as mentioned in another thread, the model of stacking up
>> >> virt-bond functionality over virtio seems a wrong direction to me.
>> >> Essentially the migration process would need to carry over all guest
>> >> side configurations previously done on the VF/PT and get them moved to
>> >> the new device, be it virtio or VF/PT.
>> >
>> > I might be wrong but I don't see why we should worry about this usecase.
>> > Whoever has a bond configured already has working config for migration.
>> > We are trying to help people who don't, not convert existing users.
>>
>> That has been said from the point of view of cloud providers: the imported
>> images from the store must be able to run unmodified, thus no
>> additional setup script is allowed (just as Stephen mentioned in
>> another mail). Cloud users don't care about live migration themselves,
>> but the providers are required to implement such an automation mechanism
>> to make this process transparent if at all possible. The user does not
>> care about the device underneath being a VF or not, but they do care
>> about consistency all across and the resulting performance
>> acceleration in making the VF the preferred datapath. These are not
>> particularly peculiar use cases, but IMHO *any* approach proposed for live
>> migration should be able to persist the state, including network config,
>> e.g. as simple as the MTU. Actually this requirement has nothing to do
>> with virtio, but our target users are live migration agnostic: be it
>> tracking DMA through dirty pages, using virtio as the helper, or
>> whatsoever, the goal of persisting configs across the move remains the same.
>
> So the patch being discussed here will mostly do exactly that if your
> original config was simply a single virtio net device.
>

True, but I don't see that the patch being discussed starts from a good
foundation of supporting the same for the VF/PT device. That is the core
of the issue.

>
> What kind of configs do your users have right now?

Any configs, be they generic or driver specific, that the VF/PT device
supports and that have been enabled/configured. General network configs
(MAC, IP address, VLAN, MTU, iptables rules), ethtool settings
(hardware offload, # of queues and ring entries, RSC options, rss
rxfh-indir table, rx-flow-hash, et al.), bpf/XDP programs being run, tc
flower offload, just to name a few. As cloud providers we don't limit
users from applying driver specific tuning to the NIC/VF, and
sometimes this is essential to achieving the best performance for their
workload. We've seen cases like tuning coalescing parameters to get
low latency, changing the rx-flow-hash function for better VXLAN
throughput, or even adopting quite advanced NIC features such as flow
director or cloud filter. We don't expect users to compromise even a
little bit on these. That is, once we turn on live migration for the VF
or pass-through devices in the VM, it all takes place under the hood, and
users (guest admins, applications) don't have to react to it or even
notice the change. I should note that the majority of live migrations
take place between machines with completely identical hardware, so it is
all the more critical to keep the config as-is across the move,
silently and without disruption.

As you can see, a generic bond or bridge cannot satisfy this need. That's why
we need a new, customized virt-bond driver, tailored for VM live
migration specifically. Leveraging a para-virtual device, e.g. a virtio-net
device, as the backup path is one thing; tracking driver config
changes in order to re-configure as necessary is another. I would think that
without considering the latter, the proposal being discussed is rather
incomplete, and far from being useful in production.

>
>
>> >
>> >> Without the help of a new
>> >> upper layer bond driver that enslaves virtio and VF/PT devices
>> >> underneath, virtio will be overloaded with too many specifics of being a
>> >> VF/PT backup in the future.
>> >
>> > So this paragraph already includes at least two conflicting
>> > proposals. On the one hand you want a separate device for
>> > the virtual bond, on the other you are saying a separate
>> > driver.
>>
>> Just to be crystal clear: separate virtual bond device (netdev ops,
>> not necessarily bus device) for VM migration specifically with a
>> separate driver.
>
> Okay, but note that any config someone had on a virtio device won't
> propagate to that bond.
>
>> >
>> > Further, the reason to have a separate *driver* was that
>> > some people wanted to share code with netvsc - and that
>> > one does not create a separate device, which you can't
>> > change without breaking existing configs.
>>
>> I'm not sure I understand this statement. netvsc already creates another
>> netdev besides the enslaved VF netdev, so why is this a problem?
>

Re: [virtio-dev] Re: [PATCH v25 2/2] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-01-26 Thread Tetsuo Handa
On 2018/01/26 12:31, Wei Wang wrote:
> On 01/26/2018 10:42 AM, Michael S. Tsirkin wrote:
>> On Fri, Jan 26, 2018 at 09:40:44AM +0800, Wei Wang wrote:
>>> On 01/25/2018 09:49 PM, Michael S. Tsirkin wrote:
 On Thu, Jan 25, 2018 at 05:14:06PM +0800, Wei Wang wrote:

> 
>>> The controversy is that the free list is not static
>>> once the lock is dropped, so everything is dynamically changing, including
>>> the state that was recorded. The method we are using is more prudent, IMHO.
>>> How about taking the fundamental solution, and seeking to improve incrementally
>>> in the future?
>>>
>>>
>>> Best,
>>> Wei
>> I'd like to see kicks happen outside the spinlock. kick with a spinlock
>> taken looks like a scalability issue that won't be easy to
>> reproduce but hurt workloads at random unexpected times.
>>
> 
> Is that "kick inside the spinlock" the only concern you have? I think we can 
> remove the kick actually. If we check how the host side works, it is 
> worthwhile to let the host poll the virtqueue after it receives the cmd id 
> from the guest (kick for cmd id isn't within the lock).

We should start from the worst case.

+ * The callback itself must not sleep or perform any operations which would
+ * require any memory allocations directly (not even GFP_NOWAIT/GFP_ATOMIC)
+ * or via any lock dependency. It is generally advisable to implement
+ * the callback as simple as possible and defer any heavy lifting to a
+ * different context.

Making decisions based on performance numbers of idle guests is dangerous.
There might be busy CPUs waiting for zone->lock.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next 12/12] tools/virtio: fix smp_mb on x86

2018-01-26 Thread Michael S. Tsirkin
On Fri, Jan 26, 2018 at 11:56:14AM +0800, Jason Wang wrote:
> 
> 
> On 2018年01月26日 07:36, Michael S. Tsirkin wrote:
> > Offset 128 overlaps the last word of the redzone.
> > Use 132 which is always beyond that.
> > 
> > Signed-off-by: Michael S. Tsirkin 
> > ---
> >   tools/virtio/ringtest/main.h | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/tools/virtio/ringtest/main.h b/tools/virtio/ringtest/main.h
> > index 593a328..301d59b 100644
> > --- a/tools/virtio/ringtest/main.h
> > +++ b/tools/virtio/ringtest/main.h
> > @@ -111,7 +111,7 @@ static inline void busy_wait(void)
> >   }
> >   #if defined(__x86_64__) || defined(__i386__)
> > -#define smp_mb() asm volatile("lock; addl $0,-128(%%rsp)" ::: 
> > "memory", "cc")
> 
> Just wonder did "rsp" work for __i386__ ?
> 
> Thanks

Oh you are right of course. Probably no one ever ran this one on i386 :)
I'll add a patch on top as this is not a new bug.

> > +#define smp_mb() asm volatile("lock; addl $0,-132(%%rsp)" ::: 
> > "memory", "cc")
> >   #else
> >   /*
> >* Not using __ATOMIC_SEQ_CST since gcc docs say they are only 
> > synchronized

[PATCH v3 1/2] drm/virtio: Add window server support

2018-01-26 Thread Tomeu Vizoso
This is to allow clients running within VMs to be able to communicate
with a compositor in the host. Clients will use the communication
protocol that the compositor supports, and virtio-gpu will assist with
making buffers available on both sides and copying content as needed.

It is expected that a service in the guest will act as a proxy,
interacting with virtio-gpu to support unmodified clients. For some
features of the protocol, the hypervisor might have to intervene and
also parse the protocol data to properly bridge resources. The following
IOCTLs have been added to this effect:

*_WINSRV_CONNECT: Opens a connection to the compositor in the host,
returns a FD that represents this connection and on which the following
IOCTLs can be used. Callers are expected to poll this FD for new
messages from the compositor.

*_WINSRV_TX: Asks the hypervisor to forward a message to the compositor

*_WINSRV_RX: Returns all queued messages

Alongside protocol data that is opaque to the kernel, the client can
send file descriptors that reference GEM buffers allocated by
virtio-gpu. The guest proxy is expected to figure out when a client is
passing a FD that refers to shared memory in the guest and allocate a
virtio-gpu buffer of the same size with DRM_VIRTGPU_RESOURCE_CREATE.

When the client notifies the compositor that it can read from that buffer,
the proxy should copy the contents from the SHM region to the virtio-gpu
resource and call DRM_VIRTGPU_TRANSFER_TO_HOST.

This has been tested with Wayland clients that make use of wl_shm to
pass buffers to the compositor, but is expected to work similarly for X
clients that make use of MIT-SHM with FD passing.

v2: * Add padding to two virtio command structs
* Properly cast two __user pointers (kbuild test robot)

v3: * Handle absence of winsrv support in QEMU (Dave Airlie)

Signed-off-by: Tomeu Vizoso 
---

 drivers/gpu/drm/virtio/virtgpu_drv.c   |   1 +
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  39 -
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 165 +++
 drivers/gpu/drm/virtio/virtgpu_kms.c   |  66 ++--
 drivers/gpu/drm/virtio/virtgpu_vq.c| 285 -
 include/uapi/drm/virtgpu_drm.h |  29 
 include/uapi/linux/virtio_gpu.h|  43 +
 7 files changed, 613 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index 49a3d8d5a249..a528ddedd09f 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -79,6 +79,7 @@ static unsigned int features[] = {
 */
VIRTIO_GPU_F_VIRGL,
 #endif
+   VIRTIO_GPU_F_WINSRV,
 };
 static struct virtio_driver virtio_gpu_driver = {
.feature_table = features,
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index da2fb585fea4..268b386e1232 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -178,6 +178,8 @@ struct virtio_gpu_device {
 
struct virtio_gpu_queue ctrlq;
struct virtio_gpu_queue cursorq;
+   struct virtio_gpu_queue winsrv_rxq;
+   struct virtio_gpu_queue winsrv_txq;
struct kmem_cache *vbufs;
bool vqs_ready;
 
@@ -205,10 +207,32 @@ struct virtio_gpu_device {
 
 struct virtio_gpu_fpriv {
uint32_t ctx_id;
+
+   struct list_head winsrv_conns; /* list of virtio_gpu_winsrv_conn */
+   spinlock_t winsrv_lock;
+};
+
+struct virtio_gpu_winsrv_rx_qentry {
+   struct virtio_gpu_winsrv_rx *cmd;
+   struct list_head next;
+};
+
+struct virtio_gpu_winsrv_conn {
+   struct virtio_gpu_device *vgdev;
+
+   spinlock_t lock;
+
+   int fd;
+   struct drm_file *drm_file;
+
+   struct list_head cmdq; /* queue of virtio_gpu_winsrv_rx_qentry */
+   wait_queue_head_t cmdwq;
+
+   struct list_head next;
 };
 
 /* virtio_ioctl.c */
-#define DRM_VIRTIO_NUM_IOCTLS 10
+#define DRM_VIRTIO_NUM_IOCTLS 11
 extern struct drm_ioctl_desc virtio_gpu_ioctls[DRM_VIRTIO_NUM_IOCTLS];
 
 /* virtio_kms.c */
@@ -318,9 +342,22 @@ virtio_gpu_cmd_resource_create_3d(struct virtio_gpu_device 
*vgdev,
 void virtio_gpu_ctrl_ack(struct virtqueue *vq);
 void virtio_gpu_cursor_ack(struct virtqueue *vq);
 void virtio_gpu_fence_ack(struct virtqueue *vq);
+void virtio_gpu_winsrv_tx_ack(struct virtqueue *vq);
+void virtio_gpu_winsrv_rx_read(struct virtqueue *vq);
 void virtio_gpu_dequeue_ctrl_func(struct work_struct *work);
 void virtio_gpu_dequeue_cursor_func(struct work_struct *work);
+void virtio_gpu_dequeue_winsrv_rx_func(struct work_struct *work);
+void virtio_gpu_dequeue_winsrv_tx_func(struct work_struct *work);
 void virtio_gpu_dequeue_fence_func(struct work_struct *work);
+void virtio_gpu_fill_winsrv_rx(struct virtio_gpu_device *vgdev);
+void virtio_gpu_queue_winsrv_rx_in(struct virtio_gpu_device *vgdev,
+  struct virtio_gpu_winsrv_rx *cmd);
+int virtio_gpu_cmd_winsrv_connect

[PATCH v3 2/2] drm/virtio: Handle buffers from the compositor

2018-01-26 Thread Tomeu Vizoso
When retrieving queued messages from the compositor in the host for
clients in the guest, handle buffers that may be passed.

These buffers should have been mapped to the guest's address space, for
example via the KVM_SET_USER_MEMORY_REGION ioctl.

Signed-off-by: Tomeu Vizoso 
---

 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 54 ++
 1 file changed, 54 insertions(+)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ioctl.c 
b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
index d4230b1fa91d..57b1ad51d251 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ioctl.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ioctl.c
@@ -545,14 +545,58 @@ static unsigned int winsrv_poll(struct file *filp,
return mask;
 }
 
+struct virtio_gpu_winsrv_region {
+   uint64_t pfn;
+   size_t size;
+};
+
+static int winsrv_fd_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+   struct virtio_gpu_winsrv_region *region = filp->private_data;
+   unsigned long vm_size = vma->vm_end - vma->vm_start;
+   int ret = 0;
+
+   if (vm_size +
+   (vma->vm_pgoff << PAGE_SHIFT) > PAGE_ALIGN(region->size))
+   return -EINVAL;
+
+   ret = io_remap_pfn_range(vma, vma->vm_start, region->pfn, vm_size,
+vma->vm_page_prot);
+   if (ret)
+   return ret;
+
+   vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+
+   return ret;
+}
+
+static int winsrv_fd_release(struct inode *inodep, struct file *filp)
+{
+   struct virtio_gpu_winsrv_region *region = filp->private_data;
+
+   kfree(region);
+
+   return 0;
+}
+
+static const struct file_operations winsrv_fd_fops = {
+   .mmap = winsrv_fd_mmap,
+   .release = winsrv_fd_release,
+};
+
 static int winsrv_ioctl_rx(struct virtio_gpu_device *vgdev,
   struct virtio_gpu_winsrv_conn *conn,
   struct drm_virtgpu_winsrv *cmd)
 {
struct virtio_gpu_winsrv_rx_qentry *qentry, *tmp;
struct virtio_gpu_winsrv_rx *virtio_cmd;
+   struct virtio_gpu_winsrv_region *region;
int available_len = cmd->len;
int read_count = 0;
+   int i;
+
+   for (i = 0; i < VIRTGPU_WINSRV_MAX_ALLOCS; i++)
+   cmd->fds[i] = -1;
 
list_for_each_entry_safe(qentry, tmp, &conn->cmdq, next) {
virtio_cmd = qentry->cmd;
@@ -567,6 +611,16 @@ static int winsrv_ioctl_rx(struct virtio_gpu_device *vgdev,
return -EFAULT;
}
 
+   for (i = 0; virtio_cmd->pfns[i]; i++) {
+   region = kmalloc(sizeof(*region), GFP_KERNEL);
+   region->pfn = virtio_cmd->pfns[i];
+   region->size = virtio_cmd->lens[i];
+   cmd->fds[i] = anon_inode_getfd("[winsrv_fd]",
+  &winsrv_fd_fops,
+  region,
+  O_CLOEXEC | O_RDWR);
+   }
+
available_len -= virtio_cmd->len;
read_count += virtio_cmd->len;
 
-- 
2.14.3



[PATCH v3 0/2] drm/virtio: Add window server support

2018-01-26 Thread Tomeu Vizoso
Hi,

this work is based on the virtio_wl driver in the ChromeOS kernel by
Zach Reizner, currently at:

https://chromium.googlesource.com/chromiumos/third_party/kernel/+/chromeos-4.4/drivers/virtio/virtio_wl.c

There's one feature missing currently, which is letting clients write
directly to the host part of a resource, so the extra copy in
TRANSFER_TO_HOST isn't needed.

Have pushed the QEMU counterpart to this branch, though it isn't as
polished atm:

https://gitlab.collabora.com/tomeu/qemu/commits/winsrv-wip

Thanks,

Tomeu


Tomeu Vizoso (2):
  drm/virtio: Add window server support
  drm/virtio: Handle buffers from the compositor

 drivers/gpu/drm/virtio/virtgpu_drv.c   |   1 +
 drivers/gpu/drm/virtio/virtgpu_drv.h   |  39 -
 drivers/gpu/drm/virtio/virtgpu_ioctl.c | 219 +
 drivers/gpu/drm/virtio/virtgpu_kms.c   |  66 ++--
 drivers/gpu/drm/virtio/virtgpu_vq.c| 285 -
 include/uapi/drm/virtgpu_drm.h |  29 
 include/uapi/linux/virtio_gpu.h|  43 +
 7 files changed, 667 insertions(+), 15 deletions(-)

-- 
2.14.3



Re: [PATCH v24 1/2] mm: support reporting free page blocks

2018-01-26 Thread Michael S. Tsirkin
On Fri, Jan 26, 2018 at 11:29:15AM +0800, Wei Wang wrote:
> On 01/25/2018 09:41 PM, Michael S. Tsirkin wrote:
> > On Wed, Jan 24, 2018 at 06:42:41PM +0800, Wei Wang wrote:
> > > This patch adds support to walk through the free page blocks in the
> > > system and report them via a callback function. Some page blocks may
> > > leave the free list after zone->lock is released, so it is the caller's
> > > responsibility to either detect or prevent the use of such pages.
> > > 
> > > One use example of this patch is to accelerate live migration by skipping
> > > the transfer of free pages reported from the guest. A popular method used
> > > by the hypervisor to track which part of memory is written during live
> > > migration is to write-protect all the guest memory. So, those pages that
> > > are reported as free pages but are written after the report function
> > > returns will be captured by the hypervisor, and they will be added to the
> > > next round of memory transfer.
> > > 
> > > Signed-off-by: Wei Wang 
> > > Signed-off-by: Liang Li 
> > > Cc: Michal Hocko 
> > > Cc: Michael S. Tsirkin 
> > > Acked-by: Michal Hocko 
> > > ---
> > >   include/linux/mm.h |  6 
> > >   mm/page_alloc.c| 91 
> > > ++
> > >   2 files changed, 97 insertions(+)
> > > 
> > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > index ea818ff..b3077dd 100644
> > > --- a/include/linux/mm.h
> > > +++ b/include/linux/mm.h
> > > @@ -1938,6 +1938,12 @@ extern void free_area_init_node(int nid, unsigned 
> > > long * zones_size,
> > >   unsigned long zone_start_pfn, unsigned long 
> > > *zholes_size);
> > >   extern void free_initmem(void);
> > > +extern void walk_free_mem_block(void *opaque,
> > > + int min_order,
> > > + bool (*report_pfn_range)(void *opaque,
> > > +  unsigned long pfn,
> > > +  unsigned long num));
> > > +
> > >   /*
> > >* Free reserved pages within range [PAGE_ALIGN(start), end & PAGE_MASK)
> > >* into the buddy system. The freed pages will be poisoned with pattern
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 76c9688..705de22 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -4899,6 +4899,97 @@ void show_free_areas(unsigned int filter, 
> > > nodemask_t *nodemask)
> > >   show_swap_cache_info();
> > >   }
> > > +/*
> > > + * Walk through a free page list and report the found pfn range via the
> > > + * callback.
> > > + *
> > > + * Return false if the callback requests to stop reporting. Otherwise,
> > > + * return true.
> > > + */
> > > +static bool walk_free_page_list(void *opaque,
> > > + struct zone *zone,
> > > + int order,
> > > + enum migratetype mt,
> > > + bool (*report_pfn_range)(void *,
> > > +  unsigned long,
> > > +  unsigned long))
> > > +{
> > > + struct page *page;
> > > + struct list_head *list;
> > > + unsigned long pfn, flags;
> > > + bool ret;
> > > +
> > > + spin_lock_irqsave(&zone->lock, flags);
> > > + list = &zone->free_area[order].free_list[mt];
> > > + list_for_each_entry(page, list, lru) {
> > > + pfn = page_to_pfn(page);
> > > + ret = report_pfn_range(opaque, pfn, 1 << order);
> > > + if (!ret)
> > > + break;
> > > + }
> > > + spin_unlock_irqrestore(&zone->lock, flags);
> > > +
> > > + return ret;
> > > +}
> > There are two issues with this API. One is that it is not
> > restarteable: if you return false, you start from the
> > beginning. So no way to drop lock, do something slow
> > and then proceed.
> > 
> > Another is that you are using it to report free page hints. Presumably
> > the point is to drop these pages - keeping them near head of the list
> > and reusing the reported ones will just make everything slower
> > invalidating the hint.
> > 
> > How about rotating these pages towards the end of the list?
> > Probably not on each call; collect reported pages and then
> > move them to tail when we exit.
> 
> 
> I'm not sure how this would help. For example, we have a list of 2M free
> page blocks:
> A-->B-->C-->D-->E-->F-->G--H
> 
> After reporting A and B, and put them to the end and exit, when the caller
> comes back,
> 1) if the list remains unchanged, then it will be
> C-->D-->E-->F-->G-->H-->A-->B

Right. So here we can just scan until we see A, right?  It's a harder
question what to do if A, and only A, has been consumed.  Ideally we don't
want B to be sent twice. OTOH maybe that isn't a big deal if it's only
twice. The host might know the page is already gone - how about the host
gives us a hint after using the buffer?

> 2) If worse, all the blocks have been split into smaller blo

Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Samudrala, Sridhar



On 1/26/2018 12:14 AM, Siwei Liu wrote:

On Tue, Jan 23, 2018 at 2:58 PM, Michael S. Tsirkin  wrote:

On Tue, Jan 23, 2018 at 12:24:47PM -0800, Siwei Liu wrote:

On Mon, Jan 22, 2018 at 1:41 PM, Michael S. Tsirkin  wrote:

On Mon, Jan 22, 2018 at 12:27:14PM -0800, Siwei Liu wrote:

First off, as mentioned in another thread, the model of stacking up
virt-bond functionality over virtio seems a wrong direction to me.
Essentially the migration process would need to carry over all guest
side configurations previously done on the VF/PT and get them moved to
the new device, be it virtio or VF/PT.

I might be wrong but I don't see why we should worry about this usecase.
Whoever has a bond configured already has working config for migration.
We are trying to help people who don't, not convert existing users.

That has been said from the point of view of cloud providers: the imported
images from the store must be able to run unmodified, thus no
additional setup script is allowed (just as Stephen mentioned in
another mail). Cloud users don't care about live migration themselves,
but the providers are required to implement such an automation mechanism
to make this process transparent if at all possible. The user does not
care about the device underneath being a VF or not, but they do care
about consistency all across and the resulting performance
acceleration in making the VF the preferred datapath. These are not
particularly peculiar use cases, but IMHO *any* approach proposed for live
migration should be able to persist the state, including network config,
e.g. as simple as the MTU. Actually this requirement has nothing to do
with virtio, but our target users are live migration agnostic: be it
tracking DMA through dirty pages, using virtio as the helper, or
whatsoever, the goal of persisting configs across the move remains the same.

So the patch being discussed here will mostly do exactly that if your
original config was simply a single virtio net device.


True, but I don't see that the patch being discussed starts from a good
foundation of supporting the same for the VF/PT device. That is the core
of the issue.



What kind of configs do your users have right now?

Any configs, be they generic or driver specific, that the VF/PT device
supports and that have been enabled/configured. General network configs
(MAC, IP address, VLAN, MTU, iptables rules), ethtool settings
(hardware offload, # of queues and ring entries, RSC options, rss
rxfh-indir table, rx-flow-hash, et al.), bpf/XDP programs being run, tc
flower offload, just to name a few. As cloud providers we don't limit
users from applying driver specific tuning to the NIC/VF, and
sometimes this is essential to achieving the best performance for their
workload. We've seen cases like tuning coalescing parameters to get
low latency, changing the rx-flow-hash function for better VXLAN
throughput, or even adopting quite advanced NIC features such as flow
director or cloud filter. We don't expect users to compromise even a
little bit on these. That is, once we turn on live migration for the VF
or pass-through devices in the VM, it all takes place under the hood, and
users (guest admins, applications) don't have to react to it or even
notice the change. I should note that the majority of live migrations
take place between machines with completely identical hardware, so it is
all the more critical to keep the config as-is across the move,
silently and without disruption.


This usecase is much more complicated than, and different from, what this
patch is trying to address. Also, your usecase seems to assume that the
source and destination hosts are identical and have the same HW.

That makes it pretty hard to transparently migrate all these settings with
live migration when we are looking at a solution that unplugs the VF
interface from the host and unloads the VF driver.


As you can see, a generic bond or bridge cannot satisfy this need. That's why
we need a new, customized virt-bond driver, tailored for VM live
migration specifically. Leveraging a para-virtual device, e.g. a virtio-net
device, as the backup path is one thing; tracking driver config
changes in order to re-configure as necessary is another. I would think that
without considering the latter, the proposal being discussed is rather
incomplete, and far from being useful in production.




Without the help of a new
upper layer bond driver that enslaves virtio and VF/PT devices
underneath, virtio will be overloaded with too many specifics of being a
VF/PT backup in the future.

So this paragraph already includes at least two conflicting
proposals. On the one hand you want a separate device for
the virtual bond, on the other you are saying a separate
driver.

Just to be crystal clear: separate virtual bond device (netdev ops,
not necessarily bus device) for VM migration specifically with a
separate driver.

Okay, but note that any config someone had on a virtio device won't
propagate to that bond.


Further, the reason to have a separate *driver* was that
some people wanted to share code with netvsc -

Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Michael S. Tsirkin
On Thu, Jan 11, 2018 at 09:58:39PM -0800, Sridhar Samudrala wrote:
> @@ -2859,6 +3123,42 @@ static struct virtio_driver virtio_net_driver = {
>  #endif
>  };
>  
> +static int virtio_netdev_event(struct notifier_block *this,
> +unsigned long event, void *ptr)
> +{
> + struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
> +
> + /* Skip our own events */
> + if (event_dev->netdev_ops == &virtnet_netdev)
> + return NOTIFY_DONE;
> +
> + /* Avoid non-Ethernet type devices */
> + if (event_dev->type != ARPHRD_ETHER)
> + return NOTIFY_DONE;
> +
> + /* Avoid Vlan dev with same MAC registering as VF */
> + if (is_vlan_dev(event_dev))
> + return NOTIFY_DONE;
> +
> + /* Avoid Bonding master dev with same MAC registering as VF */
> + if ((event_dev->priv_flags & IFF_BONDING) &&
> + (event_dev->flags & IFF_MASTER))
> + return NOTIFY_DONE;
> +
> + switch (event) {
> + case NETDEV_REGISTER:
> + return virtnet_register_vf(event_dev);
> + case NETDEV_UNREGISTER:
> + return virtnet_unregister_vf(event_dev);
> + default:
> + return NOTIFY_DONE;
> + }
> +}
> +
> +static struct notifier_block virtio_netdev_notifier = {
> + .notifier_call = virtio_netdev_event,
> +};
> +
>  static __init int virtio_net_driver_init(void)
>  {
>   int ret;
> @@ -2877,6 +3177,8 @@ static __init int virtio_net_driver_init(void)
>  ret = register_virtio_driver(&virtio_net_driver);
>   if (ret)
>   goto err_virtio;
> +
> + register_netdevice_notifier(&virtio_netdev_notifier);
>   return 0;
>  err_virtio:
>   cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD);
> @@ -2889,6 +3191,7 @@ module_init(virtio_net_driver_init);
>  
>  static __exit void virtio_net_driver_exit(void)
>  {
> + unregister_netdevice_notifier(&virtio_netdev_notifier);
>   unregister_virtio_driver(&virtio_net_driver);
>   cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD);
>   cpuhp_remove_multi_state(virtionet_online);

I have a question here: what if PT device driver module loads
and creates a device before virtio?


> -- 
> 2.14.3


Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Samudrala, Sridhar



On 1/26/2018 8:58 AM, Michael S. Tsirkin wrote:

On Thu, Jan 11, 2018 at 09:58:39PM -0800, Sridhar Samudrala wrote:

@@ -2859,6 +3123,42 @@ static struct virtio_driver virtio_net_driver = {
  #endif
  };
  
+static int virtio_netdev_event(struct notifier_block *this,

+  unsigned long event, void *ptr)
+{
+   struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
+
+   /* Skip our own events */
+   if (event_dev->netdev_ops == &virtnet_netdev)
+   return NOTIFY_DONE;
+
+   /* Avoid non-Ethernet type devices */
+   if (event_dev->type != ARPHRD_ETHER)
+   return NOTIFY_DONE;
+
+   /* Avoid Vlan dev with same MAC registering as VF */
+   if (is_vlan_dev(event_dev))
+   return NOTIFY_DONE;
+
+   /* Avoid Bonding master dev with same MAC registering as VF */
+   if ((event_dev->priv_flags & IFF_BONDING) &&
+   (event_dev->flags & IFF_MASTER))
+   return NOTIFY_DONE;
+
+   switch (event) {
+   case NETDEV_REGISTER:
+   return virtnet_register_vf(event_dev);
+   case NETDEV_UNREGISTER:
+   return virtnet_unregister_vf(event_dev);
+   default:
+   return NOTIFY_DONE;
+   }
+}
+
+static struct notifier_block virtio_netdev_notifier = {
+   .notifier_call = virtio_netdev_event,
+};
+
  static __init int virtio_net_driver_init(void)
  {
int ret;
@@ -2877,6 +3177,8 @@ static __init int virtio_net_driver_init(void)
  ret = register_virtio_driver(&virtio_net_driver);
if (ret)
goto err_virtio;
+
+   register_netdevice_notifier(&virtio_netdev_notifier);
return 0;
  err_virtio:
cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD);
@@ -2889,6 +3191,7 @@ module_init(virtio_net_driver_init);
  
  static __exit void virtio_net_driver_exit(void)

  {
+   unregister_netdevice_notifier(&virtio_netdev_notifier);
unregister_virtio_driver(&virtio_net_driver);
cpuhp_remove_multi_state(CPUHP_VIRT_NET_DEAD);
cpuhp_remove_multi_state(virtionet_online);

I have a question here: what if PT device driver module loads
and creates a device before virtio?


Initially I also had this question of whether we get NETDEV_REGISTER events
for netdevs that are already present. But it looks like
register_netdevice_notifier() will cause a replay of all the registration
and up events of existing devices.

/**
 * register_netdevice_notifier - register a network notifier block
 * @nb: notifier
 *
 * Register a notifier to be called when network device events occur.
 * The notifier passed is linked into the kernel structures and must
 * not be reused until it has been unregistered. A negative errno code
 * is returned on a failure.
 *
 * When registered all registration and up events are replayed
 * to the new notifier to allow device to have a race free
 * view of the network device list.
 */

Thanks
Sridhar
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

[PATCH v2 1/6] Documentation: crypto: document crypto engine API

2018-01-26 Thread Corentin Labbe
Signed-off-by: Corentin Labbe 
---
 Documentation/crypto/crypto_engine.rst | 48 ++
 1 file changed, 48 insertions(+)
 create mode 100644 Documentation/crypto/crypto_engine.rst

diff --git a/Documentation/crypto/crypto_engine.rst 
b/Documentation/crypto/crypto_engine.rst
new file mode 100644
index ..8272ac92a14f
--- /dev/null
+++ b/Documentation/crypto/crypto_engine.rst
@@ -0,0 +1,48 @@
+=============
+CRYPTO ENGINE
+=============
+
+Overview
+--------
+The crypto engine API (CE) is a crypto queue manager.
+
+Requirement
+-----------
+You have to put the struct crypto_engine_ctx at the start of your tfm_ctx:
+struct your_tfm_ctx {
+struct crypto_engine_ctx enginectx;
+...
+};
+Why: since the CE manages only crypto_async_request, it cannot know the
+underlying request type and so has access only to the TFM.
+So using container_of() to access __ctx is impossible.
+Furthermore, the crypto engine cannot know the "struct your_tfm_ctx",
+so it must assume that crypto_engine_ctx is at the start of it.
+
+Order of operations
+-------------------
+You have to obtain a struct crypto_engine via crypto_engine_alloc_init()
+and start it via crypto_engine_start().
+
+Before transferring any request, you have to fill the enginectx:
+- prepare_request: (function pointer) if you need to do some processing
+before doing the request
+- unprepare_request: (function pointer) undo what was done in
+prepare_request
+- do_one_request: (function pointer) do the encryption of the current request
+
+Note: those three functions get the crypto_async_request associated with
+the received request, so you need to get the original request via
+container_of(areq, struct yourrequesttype_request, base);
+
+When your driver receives a crypto_request, you have to transfer it to
+the crypto engine via one of:
+- crypto_transfer_ablkcipher_request_to_engine()
+- crypto_transfer_aead_request_to_engine()
+- crypto_transfer_akcipher_request_to_engine()
+- crypto_transfer_hash_request_to_engine()
+- crypto_transfer_skcipher_request_to_engine()
+
+At the end of request processing, a call to one of the following functions
+is needed:
+- crypto_finalize_ablkcipher_request
+- crypto_finalize_aead_request
+- crypto_finalize_akcipher_request
+- crypto_finalize_hash_request
+- crypto_finalize_skcipher_request
-- 
2.13.6



[PATCH v2 0/6] crypto: engine - Permit to enqueue all async requests

2018-01-26 Thread Corentin Labbe
Hello

The current crypto_engine supports only ahash and ablkcipher requests.
My first patch, which tried to add skcipher, was nacked: it would add too
many functions, and adding other algs (aead, asymmetric_key) would make the
situation worse.

This patchset removes all alg-specific stuff; the engine now only processes
generic crypto_async_request.

The request handler function pointers are now moved out of struct
crypto_engine and stored directly in a crypto_engine_ctx.

The original proposal of Herbert [1] cannot be done completely, since the
crypto_engine can only dequeue a crypto_async_request and it is impossible
to access any request_ctx without knowing the underlying request type.

So I do something close to what was requested: adding a crypto_engine_ctx
in the TFM context.
Note that the current implementation expects crypto_engine_ctx to be
the first member of the context.

The first patch is an attempt to document the crypto engine API.
The second patch converts the crypto engine to the new way,
while the following patches convert the 4 existing users of crypto_engine.
Note that this split breaks bisection, so the final commits will probably
all be merged.

Apart from virtio, the 4 latest patches were compile-tested only,
but the crypto engine is tested with my new sun8i-ce driver.

Regards

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1474434.html

Changes since V1:
- renamed crypto_engine_reqctx to crypto_engine_ctx
- indentation fix in function parameter
- do not export crypto_transfer_request
- Add aead support
- crypto_finalize_request is now static

Changes since RFC:
- Added a documentation patch
- Added patch for stm32-cryp
- Changed parameter of all crypto_engine_op functions from
crypto_async_request to void*
- Reintroduced crypto_transfer_xxx_request_to_engine functions

Corentin Labbe (6):
  Documentation: crypto: document crypto engine API
  crypto: engine - Permit to enqueue all async requests
  crypto: omap: convert to new crypto engine API
  crypto: virtio: convert to new crypto engine API
  crypto: stm32-hash: convert to the new crypto engine API
  crypto: stm32-cryp: convert to the new crypto engine API

 Documentation/crypto/crypto_engine.rst   |  48 +
 crypto/crypto_engine.c   | 301 +++
 drivers/crypto/omap-aes.c|  21 +-
 drivers/crypto/omap-aes.h|   3 +
 drivers/crypto/omap-des.c|  24 ++-
 drivers/crypto/stm32/stm32-cryp.c|  29 ++-
 drivers/crypto/stm32/stm32-hash.c|  20 +-
 drivers/crypto/virtio/virtio_crypto_algs.c   |  16 +-
 drivers/crypto/virtio/virtio_crypto_common.h |   3 +-
 drivers/crypto/virtio/virtio_crypto_core.c   |   3 -
 include/crypto/engine.h  |  68 +++---
 11 files changed, 332 insertions(+), 204 deletions(-)
 create mode 100644 Documentation/crypto/crypto_engine.rst

-- 
2.13.6



[PATCH v2 4/6] crypto: virtio: convert to new crypto engine API

2018-01-26 Thread Corentin Labbe
This patch converts the driver to the new crypto engine API.

Signed-off-by: Corentin Labbe 
---
 drivers/crypto/virtio/virtio_crypto_algs.c   | 16 ++--
 drivers/crypto/virtio/virtio_crypto_common.h |  3 +--
 drivers/crypto/virtio/virtio_crypto_core.c   |  3 ---
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/virtio/virtio_crypto_algs.c 
b/drivers/crypto/virtio/virtio_crypto_algs.c
index abe8c15450df..ba190cfa7aa1 100644
--- a/drivers/crypto/virtio/virtio_crypto_algs.c
+++ b/drivers/crypto/virtio/virtio_crypto_algs.c
@@ -29,6 +29,7 @@
 
 
 struct virtio_crypto_ablkcipher_ctx {
+   struct crypto_engine_ctx enginectx;
struct virtio_crypto *vcrypto;
struct crypto_tfm *tfm;
 
@@ -491,7 +492,7 @@ static int virtio_crypto_ablkcipher_encrypt(struct 
ablkcipher_request *req)
vc_sym_req->ablkcipher_req = req;
vc_sym_req->encrypt = true;
 
-   return crypto_transfer_cipher_request_to_engine(data_vq->engine, req);
+   return crypto_transfer_ablkcipher_request_to_engine(data_vq->engine, 
req);
 }
 
 static int virtio_crypto_ablkcipher_decrypt(struct ablkcipher_request *req)
@@ -511,7 +512,7 @@ static int virtio_crypto_ablkcipher_decrypt(struct 
ablkcipher_request *req)
vc_sym_req->ablkcipher_req = req;
vc_sym_req->encrypt = false;
 
-   return crypto_transfer_cipher_request_to_engine(data_vq->engine, req);
+   return crypto_transfer_ablkcipher_request_to_engine(data_vq->engine, 
req);
 }
 
 static int virtio_crypto_ablkcipher_init(struct crypto_tfm *tfm)
@@ -521,6 +522,9 @@ static int virtio_crypto_ablkcipher_init(struct crypto_tfm 
*tfm)
tfm->crt_ablkcipher.reqsize = sizeof(struct virtio_crypto_sym_request);
ctx->tfm = tfm;
 
+   ctx->enginectx.op.do_one_request = virtio_crypto_ablkcipher_crypt_req;
+   ctx->enginectx.op.prepare_request = NULL;
+   ctx->enginectx.op.unprepare_request = NULL;
return 0;
 }
 
@@ -538,9 +542,9 @@ static void virtio_crypto_ablkcipher_exit(struct crypto_tfm 
*tfm)
 }
 
 int virtio_crypto_ablkcipher_crypt_req(
-   struct crypto_engine *engine,
-   struct ablkcipher_request *req)
+   struct crypto_engine *engine, void *vreq)
 {
+   struct ablkcipher_request *req = container_of(vreq, struct 
ablkcipher_request, base);
struct virtio_crypto_sym_request *vc_sym_req =
ablkcipher_request_ctx(req);
struct virtio_crypto_request *vc_req = &vc_sym_req->base;
@@ -561,8 +565,8 @@ static void virtio_crypto_ablkcipher_finalize_req(
struct ablkcipher_request *req,
int err)
 {
-   crypto_finalize_cipher_request(vc_sym_req->base.dataq->engine,
-   req, err);
+   crypto_finalize_ablkcipher_request(vc_sym_req->base.dataq->engine,
+  req, err);
kzfree(vc_sym_req->iv);
virtcrypto_clear_request(&vc_sym_req->base);
 }
diff --git a/drivers/crypto/virtio/virtio_crypto_common.h 
b/drivers/crypto/virtio/virtio_crypto_common.h
index e976539a05d9..72621bd67211 100644
--- a/drivers/crypto/virtio/virtio_crypto_common.h
+++ b/drivers/crypto/virtio/virtio_crypto_common.h
@@ -107,8 +107,7 @@ struct virtio_crypto *virtcrypto_get_dev_node(int node);
 int virtcrypto_dev_start(struct virtio_crypto *vcrypto);
 void virtcrypto_dev_stop(struct virtio_crypto *vcrypto);
 int virtio_crypto_ablkcipher_crypt_req(
-   struct crypto_engine *engine,
-   struct ablkcipher_request *req);
+   struct crypto_engine *engine, void *vreq);
 
 void
 virtcrypto_clear_request(struct virtio_crypto_request *vc_req);
diff --git a/drivers/crypto/virtio/virtio_crypto_core.c 
b/drivers/crypto/virtio/virtio_crypto_core.c
index ff1410a32c2b..83326986c113 100644
--- a/drivers/crypto/virtio/virtio_crypto_core.c
+++ b/drivers/crypto/virtio/virtio_crypto_core.c
@@ -111,9 +111,6 @@ static int virtcrypto_find_vqs(struct virtio_crypto *vi)
ret = -ENOMEM;
goto err_engine;
}
-
-   vi->data_vq[i].engine->cipher_one_request =
-   virtio_crypto_ablkcipher_crypt_req;
}
 
kfree(names);
-- 
2.13.6



[PATCH v2 2/6] crypto: engine - Permit to enqueue all async requests

2018-01-26 Thread Corentin Labbe
The crypto engine currently can only enqueue hash and ablkcipher requests.
This patch permits it to enqueue any type of crypto_async_request.

Signed-off-by: Corentin Labbe 
Tested-by: Fabien Dessenne 
---
 crypto/crypto_engine.c  | 301 ++--
 include/crypto/engine.h |  68 ++-
 2 files changed, 203 insertions(+), 166 deletions(-)

diff --git a/crypto/crypto_engine.c b/crypto/crypto_engine.c
index 61e7c4e02fd2..992e8d8dcdd9 100644
--- a/crypto/crypto_engine.c
+++ b/crypto/crypto_engine.c
@@ -15,13 +15,50 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include "internal.h"
 
 #define CRYPTO_ENGINE_MAX_QLEN 10
 
 /**
+ * crypto_finalize_request - finalize one request if the request is done
+ * @engine: the hardware engine
+ * @req: the request that needs to be finalized
+ * @err: error number
+ */
+static void crypto_finalize_request(struct crypto_engine *engine,
+struct crypto_async_request *req, int err)
+{
+   unsigned long flags;
+   bool finalize_cur_req = false;
+   int ret;
+   struct crypto_engine_ctx *enginectx;
+
+   spin_lock_irqsave(&engine->queue_lock, flags);
+   if (engine->cur_req == req)
+   finalize_cur_req = true;
+   spin_unlock_irqrestore(&engine->queue_lock, flags);
+
+   if (finalize_cur_req) {
+   enginectx = crypto_tfm_ctx(req->tfm);
+   if (engine->cur_req_prepared &&
+   enginectx->op.unprepare_request) {
+   ret = enginectx->op.unprepare_request(engine, req);
+   if (ret)
+   dev_err(engine->dev, "failed to unprepare 
request\n");
+   }
+   spin_lock_irqsave(&engine->queue_lock, flags);
+   engine->cur_req = NULL;
+   engine->cur_req_prepared = false;
+   spin_unlock_irqrestore(&engine->queue_lock, flags);
+   }
+
+   req->complete(req, err);
+
+   kthread_queue_work(engine->kworker, &engine->pump_requests);
+}
+
+/**
  * crypto_pump_requests - dequeue one request from engine queue to process
  * @engine: the hardware engine
  * @in_kthread: true if we are in the context of the request pump thread
@@ -34,11 +71,10 @@ static void crypto_pump_requests(struct crypto_engine 
*engine,
 bool in_kthread)
 {
struct crypto_async_request *async_req, *backlog;
-   struct ahash_request *hreq;
-   struct ablkcipher_request *breq;
unsigned long flags;
bool was_busy = false;
-   int ret, rtype;
+   int ret;
+   struct crypto_engine_ctx *enginectx;
 
spin_lock_irqsave(&engine->queue_lock, flags);
 
@@ -94,7 +130,6 @@ static void crypto_pump_requests(struct crypto_engine 
*engine,
 
spin_unlock_irqrestore(&engine->queue_lock, flags);
 
-   rtype = crypto_tfm_alg_type(engine->cur_req->tfm);
/* Until here we get the request need to be encrypted successfully */
if (!was_busy && engine->prepare_crypt_hardware) {
ret = engine->prepare_crypt_hardware(engine);
@@ -104,57 +139,31 @@ static void crypto_pump_requests(struct crypto_engine 
*engine,
}
}
 
-   switch (rtype) {
-   case CRYPTO_ALG_TYPE_AHASH:
-   hreq = ahash_request_cast(engine->cur_req);
-   if (engine->prepare_hash_request) {
-   ret = engine->prepare_hash_request(engine, hreq);
-   if (ret) {
-   dev_err(engine->dev, "failed to prepare 
request: %d\n",
-   ret);
-   goto req_err;
-   }
-   engine->cur_req_prepared = true;
-   }
-   ret = engine->hash_one_request(engine, hreq);
-   if (ret) {
-   dev_err(engine->dev, "failed to hash one request from 
queue\n");
-   goto req_err;
-   }
-   return;
-   case CRYPTO_ALG_TYPE_ABLKCIPHER:
-   breq = ablkcipher_request_cast(engine->cur_req);
-   if (engine->prepare_cipher_request) {
-   ret = engine->prepare_cipher_request(engine, breq);
-   if (ret) {
-   dev_err(engine->dev, "failed to prepare 
request: %d\n",
-   ret);
-   goto req_err;
-   }
-   engine->cur_req_prepared = true;
-   }
-   ret = engine->cipher_one_request(engine, breq);
+   enginectx = crypto_tfm_ctx(async_req->tfm);
+
+   if (enginectx->op.prepare_request) {
+   ret = enginectx->op.prepare_request(engine, async_req);
if (ret) {
-   dev_err(engine->dev, "failed to cipher one request from 
queue\n");
+  

[PATCH v2 5/6] crypto: stm32-hash: convert to the new crypto engine API

2018-01-26 Thread Corentin Labbe
This patch converts the stm32-hash driver to the new crypto engine API.

Signed-off-by: Corentin Labbe 
Tested-by: Fabien Dessenne 
---
 drivers/crypto/stm32/stm32-hash.c | 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/stm32/stm32-hash.c 
b/drivers/crypto/stm32/stm32-hash.c
index 4ca4a264a833..89b0c2490d80 100644
--- a/drivers/crypto/stm32/stm32-hash.c
+++ b/drivers/crypto/stm32/stm32-hash.c
@@ -122,6 +122,7 @@ enum stm32_hash_data_format {
 #define HASH_DMA_THRESHOLD 50
 
 struct stm32_hash_ctx {
+   struct crypto_engine_ctx enginectx;
struct stm32_hash_dev   *hdev;
unsigned long   flags;
 
@@ -828,15 +829,19 @@ static int stm32_hash_hw_init(struct stm32_hash_dev *hdev,
return 0;
 }
 
+static int stm32_hash_one_request(struct crypto_engine *engine, void *areq);
+static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq);
+
 static int stm32_hash_handle_queue(struct stm32_hash_dev *hdev,
   struct ahash_request *req)
 {
return crypto_transfer_hash_request_to_engine(hdev->engine, req);
 }
 
-static int stm32_hash_prepare_req(struct crypto_engine *engine,
- struct ahash_request *req)
+static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq)
 {
+   struct ahash_request *req = container_of(areq, struct ahash_request,
+base);
struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
struct stm32_hash_request_ctx *rctx;
@@ -854,9 +859,10 @@ static int stm32_hash_prepare_req(struct crypto_engine 
*engine,
return stm32_hash_hw_init(hdev, rctx);
 }
 
-static int stm32_hash_one_request(struct crypto_engine *engine,
- struct ahash_request *req)
+static int stm32_hash_one_request(struct crypto_engine *engine, void *areq)
 {
+   struct ahash_request *req = container_of(areq, struct ahash_request,
+base);
struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
struct stm32_hash_request_ctx *rctx;
@@ -1033,6 +1039,9 @@ static int stm32_hash_cra_init_algs(struct crypto_tfm 
*tfm,
if (algs_hmac_name)
ctx->flags |= HASH_FLAGS_HMAC;
 
+   ctx->enginectx.op.do_one_request = stm32_hash_one_request;
+   ctx->enginectx.op.prepare_request = stm32_hash_prepare_req;
+   ctx->enginectx.op.unprepare_request = NULL;
return 0;
 }
 
@@ -1493,9 +1502,6 @@ static int stm32_hash_probe(struct platform_device *pdev)
goto err_engine;
}
 
-   hdev->engine->prepare_hash_request = stm32_hash_prepare_req;
-   hdev->engine->hash_one_request = stm32_hash_one_request;
-
ret = crypto_engine_start(hdev->engine);
if (ret)
goto err_engine_start;
-- 
2.13.6



[PATCH v2 3/6] crypto: omap: convert to new crypto engine API

2018-01-26 Thread Corentin Labbe
This patch converts the driver to the new crypto engine API.

Signed-off-by: Corentin Labbe 
---
 drivers/crypto/omap-aes.c | 21 +++--
 drivers/crypto/omap-aes.h |  3 +++
 drivers/crypto/omap-des.c | 24 ++--
 3 files changed, 36 insertions(+), 12 deletions(-)

diff --git a/drivers/crypto/omap-aes.c b/drivers/crypto/omap-aes.c
index fbec0a2e76dd..5bd383ed3dec 100644
--- a/drivers/crypto/omap-aes.c
+++ b/drivers/crypto/omap-aes.c
@@ -388,7 +388,7 @@ static void omap_aes_finish_req(struct omap_aes_dev *dd, 
int err)
 
pr_debug("err: %d\n", err);
 
-   crypto_finalize_cipher_request(dd->engine, req, err);
+   crypto_finalize_ablkcipher_request(dd->engine, req, err);
 
pm_runtime_mark_last_busy(dd->dev);
pm_runtime_put_autosuspend(dd->dev);
@@ -408,14 +408,15 @@ static int omap_aes_handle_queue(struct omap_aes_dev *dd,
 struct ablkcipher_request *req)
 {
if (req)
-   return crypto_transfer_cipher_request_to_engine(dd->engine, 
req);
+   return crypto_transfer_ablkcipher_request_to_engine(dd->engine, 
req);
 
return 0;
 }
 
 static int omap_aes_prepare_req(struct crypto_engine *engine,
-   struct ablkcipher_request *req)
+   void *areq)
 {
+   struct ablkcipher_request *req = container_of(areq, struct 
ablkcipher_request, base);
struct omap_aes_ctx *ctx = crypto_ablkcipher_ctx(
crypto_ablkcipher_reqtfm(req));
struct omap_aes_reqctx *rctx = ablkcipher_request_ctx(req);
@@ -468,8 +469,9 @@ static int omap_aes_prepare_req(struct crypto_engine 
*engine,
 }
 
 static int omap_aes_crypt_req(struct crypto_engine *engine,
- struct ablkcipher_request *req)
+ void *areq)
 {
+   struct ablkcipher_request *req = container_of(areq, struct 
ablkcipher_request, base);
struct omap_aes_reqctx *rctx = ablkcipher_request_ctx(req);
struct omap_aes_dev *dd = rctx->dd;
 
@@ -601,6 +603,11 @@ static int omap_aes_ctr_decrypt(struct ablkcipher_request 
*req)
return omap_aes_crypt(req, FLAGS_CTR);
 }
 
+static int omap_aes_prepare_req(struct crypto_engine *engine,
+   void *req);
+static int omap_aes_crypt_req(struct crypto_engine *engine,
+ void *req);
+
 static int omap_aes_cra_init(struct crypto_tfm *tfm)
 {
const char *name = crypto_tfm_alg_name(tfm);
@@ -616,6 +623,10 @@ static int omap_aes_cra_init(struct crypto_tfm *tfm)
 
tfm->crt_ablkcipher.reqsize = sizeof(struct omap_aes_reqctx);
 
+   ctx->enginectx.op.prepare_request = omap_aes_prepare_req;
+   ctx->enginectx.op.unprepare_request = NULL;
+   ctx->enginectx.op.do_one_request = omap_aes_crypt_req;
+
return 0;
 }
 
@@ -1119,8 +1130,6 @@ static int omap_aes_probe(struct platform_device *pdev)
goto err_engine;
}
 
-   dd->engine->prepare_cipher_request = omap_aes_prepare_req;
-   dd->engine->cipher_one_request = omap_aes_crypt_req;
err = crypto_engine_start(dd->engine);
if (err)
goto err_engine;
diff --git a/drivers/crypto/omap-aes.h b/drivers/crypto/omap-aes.h
index 8906342e2b9a..fc3b46a85809 100644
--- a/drivers/crypto/omap-aes.h
+++ b/drivers/crypto/omap-aes.h
@@ -13,6 +13,8 @@
 #ifndef __OMAP_AES_H__
 #define __OMAP_AES_H__
 
+#include 
+
 #define DST_MAXBURST   4
 #define DMA_MIN(DST_MAXBURST * sizeof(u32))
 
@@ -95,6 +97,7 @@ struct omap_aes_gcm_result {
 };
 
 struct omap_aes_ctx {
+   struct crypto_engine_ctx enginectx;
int keylen;
u32 key[AES_KEYSIZE_256 / sizeof(u32)];
u8  nonce[4];
diff --git a/drivers/crypto/omap-des.c b/drivers/crypto/omap-des.c
index ebc5c0f11f03..eb95b0d7f184 100644
--- a/drivers/crypto/omap-des.c
+++ b/drivers/crypto/omap-des.c
@@ -86,6 +86,7 @@
 #define FLAGS_OUT_DATA_ST_SHIFT10
 
 struct omap_des_ctx {
+   struct crypto_engine_ctx enginectx;
struct omap_des_dev *dd;
 
int keylen;
@@ -498,7 +499,7 @@ static void omap_des_finish_req(struct omap_des_dev *dd, 
int err)
 
pr_debug("err: %d\n", err);
 
-   crypto_finalize_cipher_request(dd->engine, req, err);
+   crypto_finalize_ablkcipher_request(dd->engine, req, err);
 
pm_runtime_mark_last_busy(dd->dev);
pm_runtime_put_autosuspend(dd->dev);
@@ -520,14 +521,15 @@ static int omap_des_handle_queue(struct omap_des_dev *dd,
 struct ablkcipher_request *req)
 {
if (req)
-   return crypto_transfer_cipher_request_to_engine(dd->engine, 
req);
+   return crypto_transfer_ablkcipher_request_to_engine(dd->engine, 
req);
 
return 0;
 }
 
 static int omap_des_prepare_req(struct crypto_engine *

[PATCH v2 6/6] crypto: stm32-cryp: convert to the new crypto engine API

2018-01-26 Thread Corentin Labbe
This patch converts the stm32-cryp driver to the new crypto engine API.

Signed-off-by: Corentin Labbe 
Tested-by: Fabien Dessenne 
---
 drivers/crypto/stm32/stm32-cryp.c | 29 +
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/crypto/stm32/stm32-cryp.c 
b/drivers/crypto/stm32/stm32-cryp.c
index cf1dddbeaa2c..a816b2ffcaad 100644
--- a/drivers/crypto/stm32/stm32-cryp.c
+++ b/drivers/crypto/stm32/stm32-cryp.c
@@ -91,6 +91,7 @@
 #define _walked_out (cryp->out_walk.offset - cryp->out_sg->offset)
 
 struct stm32_cryp_ctx {
+   struct crypto_engine_ctx enginectx;
struct stm32_cryp   *cryp;
int keylen;
u32 key[AES_KEYSIZE_256 / sizeof(u32)];
@@ -478,7 +479,7 @@ static void stm32_cryp_finish_req(struct stm32_cryp *cryp)
free_pages((unsigned long)buf_out, pages);
}
 
-   crypto_finalize_cipher_request(cryp->engine, cryp->req, err);
+   crypto_finalize_ablkcipher_request(cryp->engine, cryp->req, err);
cryp->req = NULL;
 
memset(cryp->ctx->key, 0, cryp->ctx->keylen);
@@ -494,10 +495,19 @@ static int stm32_cryp_cpu_start(struct stm32_cryp *cryp)
return 0;
 }
 
+static int stm32_cryp_cipher_one_req(struct crypto_engine *engine, void *areq);
+static int stm32_cryp_prepare_cipher_req(struct crypto_engine *engine,
+void *areq);
+
 static int stm32_cryp_cra_init(struct crypto_tfm *tfm)
 {
+   struct stm32_cryp_ctx *ctx = crypto_tfm_ctx(tfm);
+
tfm->crt_ablkcipher.reqsize = sizeof(struct stm32_cryp_reqctx);
 
+   ctx->enginectx.op.do_one_request = stm32_cryp_cipher_one_req;
+   ctx->enginectx.op.prepare_request = stm32_cryp_prepare_cipher_req;
+   ctx->enginectx.op.unprepare_request = NULL;
return 0;
 }
 
@@ -513,7 +523,7 @@ static int stm32_cryp_crypt(struct ablkcipher_request *req, 
unsigned long mode)
 
rctx->mode = mode;
 
-   return crypto_transfer_cipher_request_to_engine(cryp->engine, req);
+   return crypto_transfer_ablkcipher_request_to_engine(cryp->engine, req);
 }
 
 static int stm32_cryp_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
@@ -695,14 +705,20 @@ static int stm32_cryp_prepare_req(struct crypto_engine 
*engine,
 }
 
 static int stm32_cryp_prepare_cipher_req(struct crypto_engine *engine,
-struct ablkcipher_request *req)
+void *areq)
 {
+   struct ablkcipher_request *req = container_of(areq,
+ struct ablkcipher_request,
+ base);
+
return stm32_cryp_prepare_req(engine, req);
 }
 
-static int stm32_cryp_cipher_one_req(struct crypto_engine *engine,
-struct ablkcipher_request *req)
+static int stm32_cryp_cipher_one_req(struct crypto_engine *engine, void *areq)
 {
+   struct ablkcipher_request *req = container_of(areq,
+ struct ablkcipher_request,
+ base);
struct stm32_cryp_ctx *ctx = crypto_ablkcipher_ctx(
crypto_ablkcipher_reqtfm(req));
struct stm32_cryp *cryp = ctx->cryp;
@@ -1104,9 +1120,6 @@ static int stm32_cryp_probe(struct platform_device *pdev)
goto err_engine1;
}
 
-   cryp->engine->prepare_cipher_request = stm32_cryp_prepare_cipher_req;
-   cryp->engine->cipher_one_request = stm32_cryp_cipher_one_req;
-
ret = crypto_engine_start(cryp->engine);
if (ret) {
dev_err(dev, "Could not start crypto engine\n");
-- 
2.13.6



Re: [PATCH v24 1/2] mm: support reporting free page blocks

2018-01-26 Thread Michael S. Tsirkin
On Fri, Jan 26, 2018 at 05:00:09PM +0200, Michael S. Tsirkin wrote:
> On Fri, Jan 26, 2018 at 11:29:15AM +0800, Wei Wang wrote:
> > On 01/25/2018 09:41 PM, Michael S. Tsirkin wrote:
> > > On Wed, Jan 24, 2018 at 06:42:41PM +0800, Wei Wang wrote:
> > > > This patch adds support to walk through the free page blocks in the
> > > > system and report them via a callback function. Some page blocks may
> > > > leave the free list after zone->lock is released, so it is the caller's
> > > > responsibility to either detect or prevent the use of such pages.
> > > > 
> > > > One use example of this patch is to accelerate live migration by 
> > > > skipping
> > > > the transfer of free pages reported from the guest. A popular method 
> > > > used
> > > > by the hypervisor to track which part of memory is written during live
> > > > migration is to write-protect all the guest memory. So, those pages that
> > > > are reported as free pages but are written after the report function
> > > > returns will be captured by the hypervisor, and they will be added to 
> > > > the
> > > > next round of memory transfer.
> > > > 
> > > > Signed-off-by: Wei Wang 
> > > > Signed-off-by: Liang Li 
> > > > Cc: Michal Hocko 
> > > > Cc: Michael S. Tsirkin 
> > > > Acked-by: Michal Hocko 
> > > > ---
> > > >   include/linux/mm.h |  6 
> > > >   mm/page_alloc.c| 91 
> > > > ++
> > > >   2 files changed, 97 insertions(+)
> > > > 
> > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > index ea818ff..b3077dd 100644
> > > > --- a/include/linux/mm.h
> > > > +++ b/include/linux/mm.h
> > > > @@ -1938,6 +1938,12 @@ extern void free_area_init_node(int nid, 
> > > > unsigned long * zones_size,
> > > > unsigned long zone_start_pfn, unsigned long 
> > > > *zholes_size);
> > > >   extern void free_initmem(void);
> > > > +extern void walk_free_mem_block(void *opaque,
> > > > +   int min_order,
> > > > +   bool (*report_pfn_range)(void *opaque,
> > > > +unsigned long 
> > > > pfn,
> > > > +unsigned long 
> > > > num));
> > > > +
> > > >   /*
> > > >* Free reserved pages within range [PAGE_ALIGN(start), end & 
> > > > PAGE_MASK)
> > > >* into the buddy system. The freed pages will be poisoned with 
> > > > pattern
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 76c9688..705de22 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -4899,6 +4899,97 @@ void show_free_areas(unsigned int filter, 
> > > > nodemask_t *nodemask)
> > > > show_swap_cache_info();
> > > >   }
> > > > +/*
> > > > + * Walk through a free page list and report the found pfn range via the
> > > > + * callback.
> > > > + *
> > > > + * Return false if the callback requests to stop reporting. Otherwise,
> > > > + * return true.
> > > > + */
> > > > +static bool walk_free_page_list(void *opaque,
> > > > +   struct zone *zone,
> > > > +   int order,
> > > > +   enum migratetype mt,
> > > > +   bool (*report_pfn_range)(void *,
> > > > +unsigned long,
> > > > +unsigned long))
> > > > +{
> > > > +   struct page *page;
> > > > +   struct list_head *list;
> > > > +   unsigned long pfn, flags;
> > > > +   bool ret;
> > > > +
> > > > +   spin_lock_irqsave(&zone->lock, flags);
> > > > +   list = &zone->free_area[order].free_list[mt];
> > > > +   list_for_each_entry(page, list, lru) {
> > > > +   pfn = page_to_pfn(page);
> > > > +   ret = report_pfn_range(opaque, pfn, 1 << order);
> > > > +   if (!ret)
> > > > +   break;
> > > > +   }
> > > > +   spin_unlock_irqrestore(&zone->lock, flags);
> > > > +
> > > > +   return ret;
> > > > +}
> > > There are two issues with this API. One is that it is not
> > > restartable: if you return false, you start from the
> > > beginning. So no way to drop lock, do something slow
> > > and then proceed.
> > > 
> > > Another is that you are using it to report free page hints. Presumably
> > > the point is to drop these pages - keeping them near head of the list
> > > and reusing the reported ones will just make everything slower
> > > invalidating the hint.
> > > 
> > > How about rotating these pages towards the end of the list?
> > > Probably not on each call; collect reported pages and then
> > > move them to tail when we exit.
> > 
> > 
> > I'm not sure how this would help. For example, we have a list of 2M free
> > page blocks:
> > A-->B-->C-->D-->E-->F-->G--H
> > 
> > After reporting A and B, and put them to the end and exit, when the cal

Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Siwei Liu
On Fri, Jan 26, 2018 at 8:51 AM, Samudrala, Sridhar
 wrote:
>
>
> On 1/26/2018 12:14 AM, Siwei Liu wrote:
>>
>> On Tue, Jan 23, 2018 at 2:58 PM, Michael S. Tsirkin 
>> wrote:
>>>
>>> On Tue, Jan 23, 2018 at 12:24:47PM -0800, Siwei Liu wrote:

 On Mon, Jan 22, 2018 at 1:41 PM, Michael S. Tsirkin 
 wrote:
>
> On Mon, Jan 22, 2018 at 12:27:14PM -0800, Siwei Liu wrote:
>>
>> First off, as mentioned in another thread, the model of stacking up
>> virt-bond functionality over virtio seems a wrong direction to me.
>> Essentially the migration process would need to carry over all guest
>> side configurations previously done on the VF/PT and get them moved to
>> the new device being it virtio or VF/PT.
>
> I might be wrong but I don't see why we should worry about this
> usecase.
> Whoever has a bond configured already has working config for migration.
> We are trying to help people who don't, not convert existig users.

 That has been placed in the view of cloud providers that the imported
 images from the store must be able to run unmodified thus no
 additional setup script is allowed (just as Stephen mentioned in
 another mail). Cloud users don't care about live migration themselves
 but the providers are required to implement such automation mechanism
 to make this process transparent if at all possible. The user does not
 care about the device underneath being VF or not, but they do care
 about consistency all across and the resulting performance
 acceleration in making VF the prefered datapath. It is not quite
 peculiar user cases but IMHO *any* approach proposed for live
 migration should be able to persist the state including network config
 e.g. as simple as MTU. Actually this requirement has nothing to do
 with virtio but our target users are live migration agnostic, being it
 tracking DMA through dirty pages, using virtio as the helper, or
 whatsoever, the goal of persisting configs across remains same.
>>>
>>> So the patching being discussed here will mostly do exactly that if your
>>> original config was simply a single virtio net device.
>>>
>> True, but I don't see the patch being discussed starts with good
>> foundation of supporting the same for VF/PT device. That is the core
>> of the issue.
>
>
>>> What kind of configs do your users have right now?
>>
>> Any configs, be they generic or driver specific, that the VF/PT device
>> supports and that have been enabled/configured: general network configs
>> (MAC, IP address, VLAN, MTU, iptables rules), ethtool settings
>> (hardware offload, # of queues and ring entries, RSC options, rss
>> rxfh-indir table, rx-flow-hash, et al.), bpf/XDP programs being run, tc
>> flower offload, just to name a few. As cloud providers we don't limit
>> users from applying driver-specific tuning to the NIC/VF, and
>> sometimes this is essential to achieving the best performance for their
>> workload. We've seen cases like tuning coalescing parameters to get
>> low latency, changing the rx-flow-hash function for better VXLAN
>> throughput, or even adopting quite advanced NIC features such as flow
>> director or cloud filter. We don't expect users to compromise even a
>> little bit on these. That is, once we turn on live migration for the VF
>> or pass-through devices in the VM, it all takes place under the hood;
>> users (guest admins, applications) don't have to react to it or even
>> notice the change. I should note that the majority of live migrations
>> take place between machines with completely identical hardware, which
>> makes it all the more critical to keep the config as-is across the
>> move, quietly and seamlessly.
>
>
> This use case is much more complicated than, and different from, what
> this patch is trying to address.

Yep, it is technically difficult, but as cloud providers we would like
to take action to address our own use case if no one else is willing to
do so. However, we're not seeking a complicated design or messing up
others' use cases such as yours. This is the first time a real patch of
the PV failover approach, although discussed for years, has been posted
to the mailing list. All voices suddenly came out at once, with various
parties wishing their specific needs added to the todo list; it's
indeed hard to accommodate all of them in the first place. I went
through the same tough period of time while doing similar work, so I
completely understand. The task is not easy for sure. :)

The attempt I made was to consolidate all potential use cases into one
single solution rather than diverge from the very beginning. This is
the RFC phase, and I don't want to wait until very late to express our
interest.
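To make the requirement concrete, here is a minimal, purely illustrative sketch (all names are hypothetical; this is not from the patch under discussion) of the snapshot-and-restore step a migration agent would need so that settings such as MTU, queue count, offloads, or the RSS indirection table survive the switch from the VF datapath to the virtio one:

```python
# Conceptual sketch only: persisting guest-visible NIC settings when the
# datapath moves from a VF to virtio. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class NetdevConfig:
    """A subset of the state a migration agent would carry over."""
    mtu: int = 1500
    num_queues: int = 1
    offloads: dict = field(default_factory=dict)    # e.g. {"tso": True}
    rxfh_indir: list = field(default_factory=list)  # RSS indirection table


def snapshot(cfg: NetdevConfig) -> dict:
    """Capture config from the outgoing VF datapath."""
    return {
        "mtu": cfg.mtu,
        "num_queues": cfg.num_queues,
        "offloads": dict(cfg.offloads),
        "rxfh_indir": list(cfg.rxfh_indir),
    }


def restore(saved: dict) -> NetdevConfig:
    """Re-apply config to the device backing the new datapath."""
    return NetdevConfig(
        mtu=saved["mtu"],
        num_queues=saved["num_queues"],
        offloads=saved["offloads"],
        rxfh_indir=saved["rxfh_indir"],
    )


vf = NetdevConfig(mtu=9000, num_queues=8,
                  offloads={"tso": True, "rx-vlan-filter": True},
                  rxfh_indir=[0, 1, 2, 3] * 2)
virtio = restore(snapshot(vf))
assert virtio.mtu == 9000 and virtio.num_queues == 8
```

The point of the sketch is only that the config must be captured and re-applied as a unit, regardless of which migration mechanism is underneath.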

>  Also your usecase seems to be assuming that source and
> destination
> hosts are identical and have the same HW.

Not exactly. This will be positioned as an optimization, but it is
crucial to our use case. While for the generic case with 

Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Michael S. Tsirkin
On Fri, Jan 26, 2018 at 01:46:42PM -0800, Siwei Liu wrote:
> > and the VM is not expected to do any tuning/optimizations on the VF
> > driver directly, i think the current patch that follows the netvsc
> > model of 2 netdevs (virtio and vf) should work fine.
> 
> OK. For your use case that's fine. But that's too specific a scenario
> with lots of restrictions IMHO, perhaps very few users will benefit
> from it, I'm not sure. If you're unwilling to move towards it, we'd
> take this one and come back with a generic solution that is able to
> address general use cases for VF/PT live migration.

I think that's a fine approach. Scratch your own itch!  I imagine a very
generic virtio-switchdev providing host routing info to guests could
address lots of usecases. A driver could bind to that one and enslave
arbitrary other devices.  Sounds reasonable.

But given the fundamental idea of a failover was floated at least as
early as 2013, and made 0 progress since precisely because it kept
trying to address more and more features, and given netvsc is already
using the basic solution with some success, I'm not inclined to block
this specific effort waiting for the generic one.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Jakub Kicinski
On Sat, 27 Jan 2018 00:14:20 +0200, Michael S. Tsirkin wrote:
> On Fri, Jan 26, 2018 at 01:46:42PM -0800, Siwei Liu wrote:
> > > and the VM is not expected to do any tuning/optimizations on the VF
> > > driver directly, i think the current patch that follows the netvsc
> > > model of 2 netdevs (virtio and vf) should work fine.
> > 
> > OK. For your use case that's fine. But that's too specific a scenario
> > with lots of restrictions IMHO, perhaps very few users will benefit
> > from it, I'm not sure. If you're unwilling to move towards it, we'd
> > take this one and come back with a generic solution that is able to
> > address general use cases for VF/PT live migration.
> 
> I think that's a fine approach. Scratch your own itch!  I imagine a very
> generic virtio-switchdev providing host routing info to guests could
> address lots of usecases. A driver could bind to that one and enslave
> arbitrary other devices.  Sounds reasonable.
> 
> But given the fundamental idea of a failover was floated at least as
> early as 2013, and made 0 progress since precisely because it kept
> trying to address more and more features, and given netvsc is already
> using the basic solution with some success, I'm not inclined to block
> this specific effort waiting for the generic one.

I think there is an agreement that the extra netdev will be useful for
more advanced use cases, and is generally preferable.  What is the
argument for not doing that from the start?  If it was made I must have
missed it.  Is it just unwillingness to write the extra 300 lines of
code?  Sounds like a pretty weak argument when adding kernel ABI is at
stake...


Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Samudrala, Sridhar


On 1/26/2018 2:47 PM, Jakub Kicinski wrote:

On Sat, 27 Jan 2018 00:14:20 +0200, Michael S. Tsirkin wrote:

On Fri, Jan 26, 2018 at 01:46:42PM -0800, Siwei Liu wrote:

and the VM is not expected to do any tuning/optimizations on the VF
driver directly, i think the current patch that follows the netvsc
model of 2 netdevs (virtio and vf) should work fine.

OK. For your use case that's fine. But that's too specific a scenario
with lots of restrictions IMHO, perhaps very few users will benefit
from it, I'm not sure. If you're unwilling to move towards it, we'd
take this one and come back with a generic solution that is able to
address general use cases for VF/PT live migration.

I think that's a fine approach. Scratch your own itch!  I imagine a very
generic virtio-switchdev providing host routing info to guests could
address lots of usecases. A driver could bind to that one and enslave
arbitrary other devices.  Sounds reasonable.

But given the fundamental idea of a failover was floated at least as
early as 2013, and made 0 progress since precisely because it kept
trying to address more and more features, and given netvsc is already
using the basic solution with some success, I'm not inclined to block
this specific effort waiting for the generic one.

I think there is an agreement that the extra netdev will be useful for
more advanced use cases, and is generally preferable.  What is the
argument for not doing that from the start?  If it was made I must have
missed it.  Is it just unwillingness to write the extra 300 lines of
code?  Sounds like a pretty weak argument when adding kernel ABI is at
stake...


I am still not clear on the need for the extra netdev created by
virtio_net. The only advantage I can see is that the stats can be
broken out between the VF and virtio datapaths, compared to the
aggregated stats on the virtio netdev as seen with the 2-netdev
approach.


With the 2-netdev model, any VM image that has a working network
configuration will transparently get VF-based acceleration without any
changes. The 3-netdev model breaks this configuration, starting with
the creation and naming of the two devices, through udev needing to be
aware of master and slave virtio-net devices.

Also, from a user experience point of view, loading a virtio-net with
the BACKUP feature enabled will now show 2 virtio-net netdevs.

For live migration with the advanced use cases that Siwei is
suggesting, I think we need a new driver with a new device type that
can track the VF-specific feature settings even when the VF driver is
unloaded.


Thanks
Sridhar



Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Jakub Kicinski
On Fri, 26 Jan 2018 15:30:35 -0800, Samudrala, Sridhar wrote:
> On 1/26/2018 2:47 PM, Jakub Kicinski wrote:
> > On Sat, 27 Jan 2018 00:14:20 +0200, Michael S. Tsirkin wrote:  
> >> On Fri, Jan 26, 2018 at 01:46:42PM -0800, Siwei Liu wrote:  
>  and the VM is not expected to do any tuning/optimizations on the VF 
>  driver
>  directly,
>  i think the current patch that follows the netvsc model of 2 
>  netdevs(virtio
>  and vf) should
>  work fine.  
> >>> OK. For your use case that's fine. But that's too specific a scenario
> >>> with lots of restrictions IMHO, perhaps very few users will benefit
> >>> from it, I'm not sure. If you're unwilling to move towards it, we'd
> >>> take this one and come back with a generic solution that is able to
> >>> address general use cases for VF/PT live migration.
> >> I think that's a fine approach. Scratch your own itch!  I imagine a very
> >> generic virtio-switchdev providing host routing info to guests could
> >> address lots of usecases. A driver could bind to that one and enslave
> >> arbitrary other devices.  Sounds reasonable.
> >>
> >> But given the fundamental idea of a failover was floated at least as
> >> early as 2013, and made 0 progress since precisely because it kept
> >> trying to address more and more features, and given netvsc is already
> >> using the basic solution with some success, I'm not inclined to block
> >> this specific effort waiting for the generic one.  
> > I think there is an agreement that the extra netdev will be useful for
> > more advanced use cases, and is generally preferable.  What is the
> > argument for not doing that from the start?  If it was made I must have
> > missed it.  Is it just unwillingness to write the extra 300 lines of
> > code?  Sounds like a pretty weak argument when adding kernel ABI is at
> > stake...  
> 
> I am still not clear on the need for the extra netdev created by 
> virtio_net. The only advantage i can see is that the stats can be
> broken between VF and virtio datapaths compared to the aggregated
> stats on virtio netdev as seen with the 2 netdev approach.

Maybe you're not convinced but multiple arguments were made.

> With 2 netdev model, any VM image that has a working network 
> configuration will transparently get VF based acceleration without
> any changes. 

Nothing happens transparently.  Things may happen automatically.  The
VF netdev doesn't disappear with netvsc.  The PV netdev transforms into
something it did not use to be.  And it configures and reports some
information from the PV (e.g. speed), but the PV doesn't pass traffic
any longer.

> 3 netdev model breaks this configuration starting with the creation
> and naming of the 2 devices to udev needing to be aware of master and
> slave virtio-net devices.

I don't understand this comment.  There is one virtio-net device and
one "virtio-bond" netdev.  And user space has to be aware of the special
automatic arrangement anyway, because it can't touch the VF.  It
doesn't make any difference whether it ignores the VF or PV and VF.
It simply can't touch the slaves, no matter how many there are.

> Also, from a user experience point of view, loading a virtio-net with
> BACKUP feature enabled will now show 2 virtio-net netdevs.

One virtio-net and one virtio-bond, which represents what's happening.

> For live migration with advanced usecases that Siwei is suggesting, i 
> think we need a new driver with a new device type that can track the
> VF specific feature settings even when the VF driver is unloaded.

Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Samudrala, Sridhar

On 1/26/2018 6:30 PM, Jakub Kicinski wrote:

On Fri, 26 Jan 2018 15:30:35 -0800, Samudrala, Sridhar wrote:

On 1/26/2018 2:47 PM, Jakub Kicinski wrote:

On Sat, 27 Jan 2018 00:14:20 +0200, Michael S. Tsirkin wrote:

On Fri, Jan 26, 2018 at 01:46:42PM -0800, Siwei Liu wrote:

and the VM is not expected to do any tuning/optimizations on the VF
driver directly, i think the current patch that follows the netvsc
model of 2 netdevs (virtio and vf) should work fine.

OK. For your use case that's fine. But that's too specific a scenario
with lots of restrictions IMHO, perhaps very few users will benefit
from it, I'm not sure. If you're unwilling to move towards it, we'd
take this one and come back with a generic solution that is able to
address general use cases for VF/PT live migration.

I think that's a fine approach. Scratch your own itch!  I imagine a very
generic virtio-switchdev providing host routing info to guests could
address lots of usecases. A driver could bind to that one and enslave
arbitrary other devices.  Sounds reasonable.

But given the fundamental idea of a failover was floated at least as
early as 2013, and made 0 progress since precisely because it kept
trying to address more and more features, and given netvsc is already
using the basic solution with some success, I'm not inclined to block
this specific effort waiting for the generic one.

I think there is an agreement that the extra netdev will be useful for
more advanced use cases, and is generally preferable.  What is the
argument for not doing that from the start?  If it was made I must have
missed it.  Is it just unwillingness to write the extra 300 lines of
code?  Sounds like a pretty weak argument when adding kernel ABI is at
stake...

I am still not clear on the need for the extra netdev created by
virtio_net. The only advantage i can see is that the stats can be
broken between VF and virtio datapaths compared to the aggregated
stats on virtio netdev as seen with the 2 netdev approach.

Maybe you're not convinced but multiple arguments were made.
All the arguments seem to either say that semantically this doesn't
look like a bond, or suggest use cases that this patch is not trying to
solve.

This approach should help cloud environments where the guest networking
is fully controlled from the hypervisor via the PF driver, or via a
port representor when switchdev mode is enabled. The guest admin is not
expected or allowed to make any networking changes from the VM.




With 2 netdev model, any VM image that has a working network
configuration will transparently get VF based acceleration without
any changes.

Nothing happens transparently.  Things may happen automatically.  The
VF netdev doesn't disappear with netvsc.  The PV netdev transforms into
something it did not use to be.  And it configures and reports some
information from the PV (e.g. speed), but the PV doesn't pass traffic
any longer.



3 netdev model breaks this configuration starting with the creation
and naming of the 2 devices to udev needing to be aware of master and
slave virtio-net devices.

I don't understand this comment.  There is one virtio-net device and
one "virtio-bond" netdev.  And user space has to be aware of the special
automatic arrangement anyway, because it can't touch the VF.  It
doesn't make any difference whether it ignores the VF or PV and VF.
It simply can't touch the slaves, no matter how many there are.


If the userspace is not expected to touch the slaves, then why do we
need to take the extra effort to expose a netdev that is just not
really useful?




Also, from a user experience point of view, loading a virtio-net with
BACKUP feature enabled will now show 2 virtio-net netdevs.

One virtio-net and one virtio-bond, which represents what's happening.
This again assumes that we want to represent a bond setup. Can't we
treat this as virtio-net providing an alternate low-latency datapath by
taking over the VF datapath?



For live migration with advanced usecases that Siwei is suggesting, i
think we need a new driver with a new device type that can track the
VF specific feature settings even when the VF driver is unloaded.



Re: [virtio-dev] Re: [RFC PATCH net-next v2 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-01-26 Thread Jakub Kicinski
On Fri, 26 Jan 2018 21:33:01 -0800, Samudrala, Sridhar wrote:
> >> 3 netdev model breaks this configuration starting with the creation
> >> and naming of the 2 devices to udev needing to be aware of master and
> >> slave virtio-net devices.  
> > I don't understand this comment.  There is one virtio-net device and
> > one "virtio-bond" netdev.  And user space has to be aware of the special
> > automatic arrangement anyway, because it can't touch the VF.  It
> > doesn't make any difference whether it ignores the VF or PV and VF.
> > It simply can't touch the slaves, no matter how many there are.  
> 
> If the userspace is not expected to touch the slaves, then why do we need to
> take extra effort to expose a netdev that is just not really useful.

You said:
"[user space] needs to be aware of master and slave virtio-net devices."

I'm saying:
It has to be aware of the special arrangement whether there is an
explicit bond netdev or not.

> >> Also, from a user experience point of view, loading a virtio-net with
> >> BACKUP feature enabled will now show 2 virtio-net netdevs.  
> > One virtio-net and one virtio-bond, which represents what's happening.  
> This again assumes that we want to represent a bond setup. Can't we 
> treat this
> as virtio-net providing an alternate low-latency datapath by taking over 
> VF datapath?

Bond is just a familiar name, we can call it something else if you
prefer.  The point is there are two data paths which can have
independent low-level settings and a higher level entity with
global settings which represents any path to the outside world.

Hiding low-level netdevs from a lay user requires a generic solution.
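As a rough illustration of the split described above (hypothetical names only; this does not model the actual kernel code), the arrangement of a higher-level entity with global settings steering between two slave datapaths could be sketched as:

```python
# Conceptual sketch: a "virtio-bond"-style master holds the global
# settings and steers traffic to whichever slave datapath currently has
# carrier, preferring the accelerated VF path. Names are hypothetical.

class Slave:
    """A low-level datapath netdev (VF or PV) with its own link state."""
    def __init__(self, name: str, carrier: bool = True):
        self.name = name
        self.carrier = carrier

    def xmit(self, pkt: str) -> str:
        return f"{self.name} sent {pkt}"


class FailoverMaster:
    """Global settings (e.g. MTU) live here; slaves keep low-level ones."""
    def __init__(self, vf: Slave, pv: Slave, mtu: int = 1500):
        self.vf, self.pv, self.mtu = vf, pv, mtu

    def active(self) -> Slave:
        # Prefer the VF path; fall back to the PV path when the VF
        # loses carrier (e.g. it is unplugged for live migration).
        return self.vf if self.vf.carrier else self.pv

    def xmit(self, pkt: str) -> str:
        return self.active().xmit(pkt)


m = FailoverMaster(Slave("vf0"), Slave("virtio0"))
assert m.xmit("p1") == "vf0 sent p1"
m.vf.carrier = False  # VF removed ahead of migration
assert m.xmit("p2") == "virtio0 sent p2"
```

User space only ever touches the master here; whether it is ignoring one slave or two makes no difference to that rule.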
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization