RE: [PATCH v15 5/5] virtio-balloon: VIRTIO_BALLOON_F_CTRL_VQ

2017-09-05 Thread Wang, Wei W
Ping for comments if possible. Thanks.

On Monday, August 28, 2017 6:09 PM, Wang, Wei W wrote:
> [PATCH v15 5/5] virtio-balloon: VIRTIO_BALLOON_F_CTRL_VQ
> 
> Add a new vq, ctrl_vq, to handle commands between the host and guest.
> With this feature, we will be able to have the control plane and the data
> plane separated: the control-related data of each feature is sent via
> ctrl_vq commands, while each feature may have its own data plane vq.
> 
> Free page report is the first new feature controlled via ctrl_vq, and a new
> cmd class, VIRTIO_BALLOON_CTRLQ_CLASS_FREE_PAGE, is added.
> Currently, this feature has two cmds:
> VIRTIO_BALLOON_FREE_PAGE_F_START: This cmd is sent from the host to the
> guest to start the free page reporting work.
> VIRTIO_BALLOON_FREE_PAGE_F_STOP: This cmd is used in both directions. The
> guest sends it to the host to indicate that the reporting work is done;
> the host sends it to the guest to actively request that reporting stop.
> 
> The free_page_vq is used to transmit the guest free page blocks to the host.
> 
> Signed-off-by: Wei Wang 
> Signed-off-by: Liang Li 
> ---
>  drivers/virtio/virtio_balloon.c | 247 +---
>  include/uapi/linux/virtio_balloon.h |  15 +++
>  2 files changed, 242 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 8ecc1d4..1d384a4 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -55,7 +55,13 @@ static struct vfsmount *balloon_mnt;
> 
>  struct virtio_balloon {
>   struct virtio_device *vdev;
> - struct virtqueue *inflate_vq, *deflate_vq, *stats_vq;
> + struct virtqueue *inflate_vq, *deflate_vq, *stats_vq, *ctrl_vq,
> +  *free_page_vq;
> +
> + /* Balloon's own wq for cpu-intensive work items */
> + struct workqueue_struct *balloon_wq;
> + /* The work items submitted to the balloon wq are listed here */
> + struct work_struct report_free_page_work;
> 
>   /* The balloon servicing is delegated to a freezable workqueue. */
>   struct work_struct update_balloon_stats_work;
> @@ -65,6 +71,9 @@ struct virtio_balloon {
>   spinlock_t stop_update_lock;
>   bool stop_update;
> 
> + /* Stop reporting free pages */
> + bool report_free_page_stop;
> +
>   /* Waiting for host to ack the pages we released. */
>   wait_queue_head_t acked;
> 
> @@ -93,6 +102,11 @@ struct virtio_balloon {
> 
>   /* To register callback in oom notifier call chain */
>   struct notifier_block nb;
> +
> + /* Host to guest ctrlq cmd buf for free page report */
> + struct virtio_balloon_ctrlq_cmd free_page_cmd_in;
> + /* Guest to Host ctrlq cmd buf for free page report */
> + struct virtio_balloon_ctrlq_cmd free_page_cmd_out;
>  };
> 
>  static struct virtio_device_id id_table[] = {
> @@ -177,6 +191,26 @@ static void send_balloon_page_sg(struct virtio_balloon *vb,
>   }
>  }
> 
> +static void send_free_page_sg(struct virtqueue *vq, void *addr,
> +                              uint32_t size)
> +{
> + unsigned int len;
> + int err = -ENOSPC;
> +
> + do {
> + if (vq->num_free) {
> + err = add_one_sg(vq, addr, size);
> + /* Sanity check: this can't really happen */
> + WARN_ON(err);
> + if (!err)
> + virtqueue_kick(vq);
> + }
> +
> + /* Release entries if there are */
> + while (virtqueue_get_buf(vq, &len))
> + ;
> + } while (err == -ENOSPC && vq->num_free);
> +}
> +
>  /*
>   * Send balloon pages in sgs to host. The balloon pages are recorded in the
>   * page xbitmap. Each bit in the bitmap corresponds to a page of PAGE_SIZE.
> @@ -525,42 +559,206 @@ static void update_balloon_size_func(struct work_struct *work)
>   queue_work(system_freezable_wq, work);
>  }
> 
> -static int init_vqs(struct virtio_balloon *vb)
> +static bool virtio_balloon_send_free_pages(void *opaque, unsigned long pfn,
> +unsigned long nr_pages)
> +{
> + struct virtio_balloon *vb = (struct virtio_balloon *)opaque;
> + void *addr = (void *)pfn_to_kaddr(pfn);
> + uint32_t len = nr_pages << PAGE_SHIFT;
> +
> + if (vb->report_free_page_stop)
> + return 1;
> +
> + send_free_page_sg(vb->free_page_vq, addr, len);
> +
> + return 0;
> +}
> +
> +static void ctrlq_add_cmd(struct virtqueue *vq,
> +   struct virtio_balloon_ctrlq_cmd *cmd,
> +   bool inbuf)
>  {
> - struct virtqueue *vqs[3];
> - vq_callback_t *callbacks[] = { balloon_ack, balloon_ack, stats_request };
> - static const char * const names[] = { "inflate", "deflate", "stats" };
> - int err, nvqs;
> + struct scatterlist sg;
> + int err;
> +
> + sg_init_on
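
For reference while reading the free page report flow described above: a
minimal sketch of the ctrlq command pair and of how the guest might post the
STOP command back to the host. The struct layout and the helper below are
illustrative assumptions, not the actual uapi/driver code of this patch.

/* Illustrative only: assumed layout of a ctrlq command descriptor. */
struct virtio_balloon_ctrlq_cmd {
        __virtio32 class;       /* e.g. VIRTIO_BALLOON_CTRLQ_CLASS_FREE_PAGE */
        __virtio32 cmd;         /* e.g. VIRTIO_BALLOON_FREE_PAGE_F_STOP */
};

/* Hypothetical helper: guest tells the host that reporting is done. */
static void send_free_page_stop(struct virtio_balloon *vb)
{
        struct virtio_balloon_ctrlq_cmd *cmd = &vb->free_page_cmd_out;
        struct scatterlist sg;

        cmd->class = cpu_to_virtio32(vb->vdev,
                                     VIRTIO_BALLOON_CTRLQ_CLASS_FREE_PAGE);
        cmd->cmd = cpu_to_virtio32(vb->vdev,
                                   VIRTIO_BALLOON_FREE_PAGE_F_STOP);
        sg_init_one(&sg, cmd, sizeof(*cmd));
        if (!virtqueue_add_outbuf(vb->ctrl_vq, &sg, 1, cmd, GFP_KERNEL))
                virtqueue_kick(vb->ctrl_vq);
}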

[PATCH 0/4] make virt_spin_lock() a pvops function

2017-09-05 Thread Juergen Gross
With virt_spin_lock() being a pvops function, the bare metal case can be
optimized by patching the call away completely. A kernel running as a
guest can decide whether to use paravirtualized spinlocks, the current
fallback to the unfair test-and-set scheme, or to mimic the bare metal
behavior.
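
As a rough illustration (a sketch based on patches 3 and 4 below, not code
quoted from them), the choice comes down to what ends up in the new pvops
hook:

/* Sketch only: possible settings of the new virt_spin_lock hook. */

/* Guest on a hypervisor without PARAVIRT_SPINLOCKS support:
 * keep the unfair test-and-set fallback (the default in patch 3). */
pv_lock_ops.virt_spin_lock = PV_CALLEE_SAVE(__native_virt_spin_lock);

/* Bare metal, or a guest asking for bare metal behavior
 * (e.g. xen_nopvspin in patch 4): always return false, so the regular
 * fair qspinlock slowpath is used and the call can be patched away. */
pv_lock_ops.virt_spin_lock = __PV_IS_CALLEE_SAVE(_paravirt_false);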

Juergen Gross (4):
  paravirt: add generic _paravirt_false() function
  paravirt: switch vcpu_is_preempted to use _paravirt_false() on bare
metal
  paravirt: add virt_spin_lock pvops function
  paravirt,xen: correct xen_nopvspin case

 arch/x86/include/asm/paravirt.h   |  5 
 arch/x86/include/asm/paravirt_types.h |  3 +++
 arch/x86/include/asm/qspinlock.h  | 48 ---
 arch/x86/kernel/paravirt-spinlocks.c  | 22 
 arch/x86/kernel/paravirt.c|  7 +
 arch/x86/kernel/paravirt_patch_32.c   | 18 ++---
 arch/x86/kernel/paravirt_patch_64.c   | 17 +
 arch/x86/kernel/smpboot.c |  2 ++
 arch/x86/xen/spinlock.c   |  2 ++
 9 files changed, 79 insertions(+), 45 deletions(-)

-- 
2.12.3



[PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Juergen Gross
There are cases where a guest tries to switch spinlocks to bare metal
behavior (e.g. by setting the "xen_nopvspin" boot parameter). Today this
has the downside of falling back to the unfair test-and-set scheme for
qspinlocks, because virt_spin_lock() detects the virtualized
environment.

Make virt_spin_lock() a paravirt operation in order to enable users
to select an explicit behavior, such as the bare metal one.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/paravirt.h   |  5 
 arch/x86/include/asm/paravirt_types.h |  1 +
 arch/x86/include/asm/qspinlock.h  | 48 ---
 arch/x86/kernel/paravirt-spinlocks.c  | 14 ++
 arch/x86/kernel/smpboot.c |  2 ++
 5 files changed, 55 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index c25dd22f7c70..d9e954fb37df 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -725,6 +725,11 @@ static __always_inline bool pv_vcpu_is_preempted(long cpu)
return PVOP_CALLEE1(bool, pv_lock_ops.vcpu_is_preempted, cpu);
 }
 
+static __always_inline bool pv_virt_spin_lock(struct qspinlock *lock)
+{
+   return PVOP_CALLEE1(bool, pv_lock_ops.virt_spin_lock, lock);
+}
+
 #endif /* SMP && PARAVIRT_SPINLOCKS */
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 19efefc0e27e..928f5e7953a7 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -319,6 +319,7 @@ struct pv_lock_ops {
void (*kick)(int cpu);
 
struct paravirt_callee_save vcpu_is_preempted;
+   struct paravirt_callee_save virt_spin_lock;
 } __no_randomize_layout;
 
 /* This contains all the paravirt structures: we get a convenient
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 48a706f641f2..fbd98896385c 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -17,6 +17,25 @@ static inline void native_queued_spin_unlock(struct 
qspinlock *lock)
smp_store_release((u8 *)lock, 0);
 }
 
+static inline bool native_virt_spin_lock(struct qspinlock *lock)
+{
+   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
+   return false;
+
+   /*
+* On hypervisors without PARAVIRT_SPINLOCKS support we fall
+* back to a Test-and-Set spinlock, because fair locks have
+* horrible lock 'holder' preemption issues.
+*/
+
+   do {
+   while (atomic_read(&lock->val) != 0)
+   cpu_relax();
+   } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
+
+   return true;
+}
+
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __pv_init_lock_hash(void);
@@ -38,33 +57,32 @@ static inline bool vcpu_is_preempted(long cpu)
 {
return pv_vcpu_is_preempted(cpu);
 }
+
+void native_pv_lock_init(void) __init;
 #else
 static inline void queued_spin_unlock(struct qspinlock *lock)
 {
native_queued_spin_unlock(lock);
 }
+
+static inline void native_pv_lock_init(void)
+{
+}
 #endif
 
 #ifdef CONFIG_PARAVIRT
 #define virt_spin_lock virt_spin_lock
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
 static inline bool virt_spin_lock(struct qspinlock *lock)
 {
-   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
-   return false;
-
-   /*
-* On hypervisors without PARAVIRT_SPINLOCKS support we fall
-* back to a Test-and-Set spinlock, because fair locks have
-* horrible lock 'holder' preemption issues.
-*/
-
-   do {
-   while (atomic_read(&lock->val) != 0)
-   cpu_relax();
-   } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
-
-   return true;
+   return pv_virt_spin_lock(lock);
+}
+#else
+static inline bool virt_spin_lock(struct qspinlock *lock)
+{
+   return native_virt_spin_lock(lock);
 }
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
 #endif /* CONFIG_PARAVIRT */
 
 #include 
diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index 26e4bd92f309..1be187ef8a38 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -20,6 +20,12 @@ bool pv_is_native_spin_unlock(void)
__raw_callee_save___native_queued_spin_unlock;
 }
 
+__visible bool __native_virt_spin_lock(struct qspinlock *lock)
+{
+   return native_virt_spin_lock(lock);
+}
+PV_CALLEE_SAVE_REGS_THUNK(__native_virt_spin_lock);
+
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
.queued_spin_lock_slowpath = native_queued_spin_lock_slowpath,
@@ -27,6 +33,14 @@ struct pv_lock_ops pv_lock_ops = {
.wait = paravirt_nop,
.kick = paravirt_nop,
.vcpu_is_preempted = __PV_IS_CALLEE_SAVE(_paravirt_false),
+   .virt_spin_lock = PV_CALLEE_SAVE(__native_virt_spin_lock),
 #en

[PATCH 2/4] paravirt: switch vcpu_is_preempted to use _paravirt_false() on bare metal

2017-09-05 Thread Juergen Gross
Instead of special-casing pv_lock_ops.vcpu_is_preempted when patching,
use _paravirt_false() on bare metal.

Signed-off-by: Juergen Gross 
---
 arch/x86/kernel/paravirt-spinlocks.c | 14 +-
 arch/x86/kernel/paravirt_patch_32.c  | 10 --
 arch/x86/kernel/paravirt_patch_64.c  | 10 --
 3 files changed, 1 insertion(+), 33 deletions(-)

diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
b/arch/x86/kernel/paravirt-spinlocks.c
index 8f2d1c9d43a8..26e4bd92f309 100644
--- a/arch/x86/kernel/paravirt-spinlocks.c
+++ b/arch/x86/kernel/paravirt-spinlocks.c
@@ -20,25 +20,13 @@ bool pv_is_native_spin_unlock(void)
__raw_callee_save___native_queued_spin_unlock;
 }
 
-__visible bool __native_vcpu_is_preempted(long cpu)
-{
-   return false;
-}
-PV_CALLEE_SAVE_REGS_THUNK(__native_vcpu_is_preempted);
-
-bool pv_is_native_vcpu_is_preempted(void)
-{
-   return pv_lock_ops.vcpu_is_preempted.func ==
-   __raw_callee_save___native_vcpu_is_preempted;
-}
-
 struct pv_lock_ops pv_lock_ops = {
 #ifdef CONFIG_SMP
.queued_spin_lock_slowpath = native_queued_spin_lock_slowpath,
.queued_spin_unlock = PV_CALLEE_SAVE(__native_queued_spin_unlock),
.wait = paravirt_nop,
.kick = paravirt_nop,
-   .vcpu_is_preempted = PV_CALLEE_SAVE(__native_vcpu_is_preempted),
+   .vcpu_is_preempted = __PV_IS_CALLEE_SAVE(_paravirt_false),
 #endif /* SMP */
 };
 EXPORT_SYMBOL(pv_lock_ops);
diff --git a/arch/x86/kernel/paravirt_patch_32.c 
b/arch/x86/kernel/paravirt_patch_32.c
index 287c7b9735de..ea311a3563e3 100644
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -13,7 +13,6 @@ DEF_NATIVE(, xor, "xor %eax, %eax");
 
 #if defined(CONFIG_PARAVIRT_SPINLOCKS)
 DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%eax)");
-DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "xor %eax, %eax");
 #endif
 
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
@@ -35,7 +34,6 @@ unsigned paravirt_patch_false(void *insnbuf, unsigned len)
 }
 
 extern bool pv_is_native_spin_unlock(void);
-extern bool pv_is_native_vcpu_is_preempted(void);
 
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
  unsigned long addr, unsigned len)
@@ -65,14 +63,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
goto patch_site;
}
goto patch_default;
-
-   case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
-   if (pv_is_native_vcpu_is_preempted()) {
-   start = start_pv_lock_ops_vcpu_is_preempted;
-   end   = end_pv_lock_ops_vcpu_is_preempted;
-   goto patch_site;
-   }
-   goto patch_default;
 #endif
 
default:
diff --git a/arch/x86/kernel/paravirt_patch_64.c 
b/arch/x86/kernel/paravirt_patch_64.c
index 8ab4379ceea9..64dffe4499b4 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -21,7 +21,6 @@ DEF_NATIVE(, xor, "xor %rax, %rax");
 
 #if defined(CONFIG_PARAVIRT_SPINLOCKS)
 DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%rdi)");
-DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "xor %rax, %rax");
 #endif
 
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len)
@@ -43,7 +42,6 @@ unsigned paravirt_patch_false(void *insnbuf, unsigned len)
 }
 
 extern bool pv_is_native_spin_unlock(void);
-extern bool pv_is_native_vcpu_is_preempted(void);
 
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
  unsigned long addr, unsigned len)
@@ -76,14 +74,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
goto patch_site;
}
goto patch_default;
-
-   case PARAVIRT_PATCH(pv_lock_ops.vcpu_is_preempted):
-   if (pv_is_native_vcpu_is_preempted()) {
-   start = start_pv_lock_ops_vcpu_is_preempted;
-   end   = end_pv_lock_ops_vcpu_is_preempted;
-   goto patch_site;
-   }
-   goto patch_default;
 #endif
 
default:
-- 
2.12.3



[PATCH 1/4] paravirt: add generic _paravirt_false() function

2017-09-05 Thread Juergen Gross
Add a _paravirt_false() default function, always returning false, which
can be used for cases where a boolean pvops replacement should just
say "no".

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/paravirt_types.h | 2 ++
 arch/x86/kernel/paravirt.c| 7 +++
 arch/x86/kernel/paravirt_patch_32.c   | 8 
 arch/x86/kernel/paravirt_patch_64.c   | 7 +++
 4 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 6b64fc6367f2..19efefc0e27e 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -377,6 +377,7 @@ extern struct pv_lock_ops pv_lock_ops;
 
 unsigned paravirt_patch_ident_32(void *insnbuf, unsigned len);
 unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len);
+unsigned paravirt_patch_false(void *insnbuf, unsigned len);
 unsigned paravirt_patch_call(void *insnbuf,
 const void *target, u16 tgt_clobbers,
 unsigned long addr, u16 site_clobbers,
@@ -682,6 +683,7 @@ void paravirt_flush_lazy_mmu(void);
 void _paravirt_nop(void);
 u32 _paravirt_ident_32(u32);
 u64 _paravirt_ident_64(u64);
+bool _paravirt_false(void);
 
 #define paravirt_nop   ((void *)_paravirt_nop)
 
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index a14df9eecfed..94105773c00c 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -66,6 +66,11 @@ u64 notrace _paravirt_ident_64(u64 x)
return x;
 }
 
+bool notrace _paravirt_false(void)
+{
+   return false;
+}
+
 void __init default_banner(void)
 {
printk(KERN_INFO "Booting paravirtualized kernel on %s\n",
@@ -149,6 +154,8 @@ unsigned paravirt_patch_default(u8 type, u16 clobbers, void 
*insnbuf,
ret = paravirt_patch_ident_32(insnbuf, len);
else if (opfunc == _paravirt_ident_64)
ret = paravirt_patch_ident_64(insnbuf, len);
+   else if (opfunc == _paravirt_false)
+   ret = paravirt_patch_false(insnbuf, len);
 
else if (type == PARAVIRT_PATCH(pv_cpu_ops.iret) ||
 type == PARAVIRT_PATCH(pv_cpu_ops.usergs_sysret64))
diff --git a/arch/x86/kernel/paravirt_patch_32.c 
b/arch/x86/kernel/paravirt_patch_32.c
index 553acbbb4d32..287c7b9735de 100644
--- a/arch/x86/kernel/paravirt_patch_32.c
+++ b/arch/x86/kernel/paravirt_patch_32.c
@@ -9,6 +9,8 @@ DEF_NATIVE(pv_mmu_ops, read_cr2, "mov %cr2, %eax");
 DEF_NATIVE(pv_mmu_ops, write_cr3, "mov %eax, %cr3");
 DEF_NATIVE(pv_mmu_ops, read_cr3, "mov %cr3, %eax");
 
+DEF_NATIVE(, xor, "xor %eax, %eax");
+
 #if defined(CONFIG_PARAVIRT_SPINLOCKS)
 DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%eax)");
 DEF_NATIVE(pv_lock_ops, vcpu_is_preempted, "xor %eax, %eax");
@@ -26,6 +28,12 @@ unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
return 0;
 }
 
+unsigned paravirt_patch_false(void *insnbuf, unsigned len)
+{
+   return paravirt_patch_insns(insnbuf, len,
+   start__xor, end__xor);
+}
+
 extern bool pv_is_native_spin_unlock(void);
 extern bool pv_is_native_vcpu_is_preempted(void);
 
diff --git a/arch/x86/kernel/paravirt_patch_64.c 
b/arch/x86/kernel/paravirt_patch_64.c
index 11aaf1eaa0e4..8ab4379ceea9 100644
--- a/arch/x86/kernel/paravirt_patch_64.c
+++ b/arch/x86/kernel/paravirt_patch_64.c
@@ -17,6 +17,7 @@ DEF_NATIVE(pv_cpu_ops, swapgs, "swapgs");
 
 DEF_NATIVE(, mov32, "mov %edi, %eax");
 DEF_NATIVE(, mov64, "mov %rdi, %rax");
+DEF_NATIVE(, xor, "xor %rax, %rax");
 
 #if defined(CONFIG_PARAVIRT_SPINLOCKS)
 DEF_NATIVE(pv_lock_ops, queued_spin_unlock, "movb $0, (%rdi)");
@@ -35,6 +36,12 @@ unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
start__mov64, end__mov64);
 }
 
+unsigned paravirt_patch_false(void *insnbuf, unsigned len)
+{
+   return paravirt_patch_insns(insnbuf, len,
+   start__xor, end__xor);
+}
+
 extern bool pv_is_native_spin_unlock(void);
 extern bool pv_is_native_vcpu_is_preempted(void);
 
-- 
2.12.3



[PATCH 4/4] paravirt,xen: correct xen_nopvspin case

2017-09-05 Thread Juergen Gross
With the boot parameter "xen_nopvspin" specified, a Xen guest should not
make use of paravirt spinlocks, but should behave as if running on bare
metal. This is currently not the case, however, as the qspinlock code
will fall back to a test-and-set scheme when it detects a hypervisor.

In order to avoid this, set the virt_spin_lock pvops function to
_paravirt_false() in case xen_nopvspin has been specified.

Signed-off-by: Juergen Gross 
---
 arch/x86/xen/spinlock.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
index 25a7c4302ce7..c01cedfa745d 100644
--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -129,6 +129,8 @@ void __init xen_init_spinlocks(void)
 
if (!xen_pvspin) {
printk(KERN_DEBUG "xen: PV spinlocks disabled\n");
+   pv_lock_ops.virt_spin_lock =
+   __PV_IS_CALLEE_SAVE(_paravirt_false);
return;
}
printk(KERN_DEBUG "xen: PV spinlocks enabled\n");
-- 
2.12.3



Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Peter Zijlstra
On Tue, Sep 05, 2017 at 03:24:43PM +0200, Juergen Gross wrote:
> diff --git a/arch/x86/include/asm/qspinlock.h 
> b/arch/x86/include/asm/qspinlock.h
> index 48a706f641f2..fbd98896385c 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -17,6 +17,25 @@ static inline void native_queued_spin_unlock(struct 
> qspinlock *lock)
>   smp_store_release((u8 *)lock, 0);
>  }
>  


Should this not have:

#ifdef CONFIG_PARAVIRT

?

> +static inline bool native_virt_spin_lock(struct qspinlock *lock)
> +{
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
> + return false;
> +
> + /*
> +  * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> +  * back to a Test-and-Set spinlock, because fair locks have
> +  * horrible lock 'holder' preemption issues.
> +  */
> +
> + do {
> + while (atomic_read(&lock->val) != 0)
> + cpu_relax();
> + } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> +
> + return true;
> +}

#endif

> +
>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>  extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 
> val);
>  extern void __pv_init_lock_hash(void);


>  #ifdef CONFIG_PARAVIRT
>  #define virt_spin_lock virt_spin_lock
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
>  static inline bool virt_spin_lock(struct qspinlock *lock)
>  {
> + return pv_virt_spin_lock(lock);
> +}
> +#else
> +static inline bool virt_spin_lock(struct qspinlock *lock)
> +{
> + return native_virt_spin_lock(lock);
>  }
> +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
>  #endif /* CONFIG_PARAVIRT */

Because I think the above only ever uses native_virt_spin_lock() when
PARAVIRT.

> @@ -1381,6 +1382,7 @@ void __init native_smp_prepare_boot_cpu(void)
>   /* already set me in cpu_online_mask in boot_cpu_init() */
>   cpumask_set_cpu(me, cpu_callout_mask);
>   cpu_set_state_online(me);
> + native_pv_lock_init();
>  }

Aah, this is where that goes.. OK that works too.


Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Juergen Gross
On 05/09/17 15:55, Peter Zijlstra wrote:
> On Tue, Sep 05, 2017 at 03:24:43PM +0200, Juergen Gross wrote:
>> diff --git a/arch/x86/include/asm/qspinlock.h 
>> b/arch/x86/include/asm/qspinlock.h
>> index 48a706f641f2..fbd98896385c 100644
>> --- a/arch/x86/include/asm/qspinlock.h
>> +++ b/arch/x86/include/asm/qspinlock.h
>> @@ -17,6 +17,25 @@ static inline void native_queued_spin_unlock(struct 
>> qspinlock *lock)
>>  smp_store_release((u8 *)lock, 0);
>>  }
>>  
> 
> 
> Should this not have:
> 
> #ifdef CONFIG_PARAVIRT
> 
> ?

I can add it if you want.


Juergen


Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Waiman Long
On 09/05/2017 09:24 AM, Juergen Gross wrote:
> There are cases where a guest tries to switch spinlocks to bare metal
> behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this
> has the downside of falling back to unfair test and set scheme for
> qspinlocks due to virt_spin_lock() detecting the virtualized
> environment.
>
> Make virt_spin_lock() a paravirt operation in order to enable users
> to select an explicit behavior like bare metal.
>
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/include/asm/paravirt.h   |  5 
>  arch/x86/include/asm/paravirt_types.h |  1 +
>  arch/x86/include/asm/qspinlock.h  | 48 
> ---
>  arch/x86/kernel/paravirt-spinlocks.c  | 14 ++
>  arch/x86/kernel/smpboot.c |  2 ++
>  5 files changed, 55 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index c25dd22f7c70..d9e954fb37df 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -725,6 +725,11 @@ static __always_inline bool pv_vcpu_is_preempted(long 
> cpu)
>   return PVOP_CALLEE1(bool, pv_lock_ops.vcpu_is_preempted, cpu);
>  }
>  
> +static __always_inline bool pv_virt_spin_lock(struct qspinlock *lock)
> +{
> + return PVOP_CALLEE1(bool, pv_lock_ops.virt_spin_lock, lock);
> +}
> +
>  #endif /* SMP && PARAVIRT_SPINLOCKS */
>  
>  #ifdef CONFIG_X86_32
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 19efefc0e27e..928f5e7953a7 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -319,6 +319,7 @@ struct pv_lock_ops {
>   void (*kick)(int cpu);
>  
>   struct paravirt_callee_save vcpu_is_preempted;
> + struct paravirt_callee_save virt_spin_lock;
>  } __no_randomize_layout;
>  
>  /* This contains all the paravirt structures: we get a convenient
> diff --git a/arch/x86/include/asm/qspinlock.h 
> b/arch/x86/include/asm/qspinlock.h
> index 48a706f641f2..fbd98896385c 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -17,6 +17,25 @@ static inline void native_queued_spin_unlock(struct 
> qspinlock *lock)
>   smp_store_release((u8 *)lock, 0);
>  }
>  
> +static inline bool native_virt_spin_lock(struct qspinlock *lock)
> +{
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
> + return false;
> +

I think you can take the above if statement out, as you have done the
test in native_pv_lock_init(). So the test will also be false here.
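
(The body of native_pv_lock_init() is not visible in the quoted hunks; a
sketch consistent with this remark, as an assumption rather than the actual
patch, would be:)

void __init native_pv_lock_init(void)
{
        /* On bare metal, have the new hook always say "no" right away. */
        if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
                pv_lock_ops.virt_spin_lock =
                        __PV_IS_CALLEE_SAVE(_paravirt_false);
}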

As this patch series is x86 specific, you should probably add "x86/" in
front of paravirt in the patch titles.

Cheers,
Longman



Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Peter Zijlstra
On Tue, Sep 05, 2017 at 10:02:57AM -0400, Waiman Long wrote:
> On 09/05/2017 09:24 AM, Juergen Gross wrote:

> > +static inline bool native_virt_spin_lock(struct qspinlock *lock)
> > +{
> > +   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
> > +   return false;
> > +
> 
> I think you can take the above if statement out as you has done test in
> native_pv_lock_init(). So the test will also be false here.

That does mean we'll run a test-and-set spinlock until paravirt patching
happens though. I prefer to not do that.

One important point: we must not be holding any locks when we switch
over between the two locks. Back then I spent some time making sure that
didn't happen with the X86 feature flag muck.


Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi

2017-09-05 Thread Willem de Bruijn
On Mon, Sep 4, 2017 at 5:03 AM, Jason Wang  wrote:
>
>
> On 2017年09月02日 00:17, Willem de Bruijn wrote:
>
> This is not a 50/50 split, which implies that some packets from the
> large
> packet flow are still converted to copying. Without the change the rate
> without queue was 80k zerocopy vs 80k copy, so this choice of
> (vq->num >> 2) appears too conservative.
>
> However, testing with (vq->num >> 1) was not as effective at mitigating
> stalls. I did not save that data, unfortunately. Can run more tests on
> fine
> tuning this variable, if the idea sounds good.


 Looks like there're still two cases were left:
>>>
>>> To be clear, this patch is not intended to fix all issues. It is a small
>>> improvement to avoid HoL blocking due to queued zerocopy skbs.
>
>
> Right, just want to see if there's anything left.
>
>>>
>>> The trade-off is that reverting to copying in these cases increases
>>> cycle cost. I think that that is a trade-off worth making compared to
>>> the alternative drop in throughput. It probably would be good to be
>>> able to measure this without kernel instrumentation: export
>>> counters similar to net->tx_zcopy_err and net->tx_packets (though
>>> without reset to zero, as in vhost_net_tx_packet).
>
>
> I think it's acceptable if extra cycles were spent if we detect HOL anyhow.
>
>>>
 1) sndbuf is not INT_MAX
>>>
>>> You mean the case where the device stalls, later zerocopy notifications
>>> are queued, but these are never cleaned in free_old_xmit_skbs,
>>> because it requires a start_xmit and by now the (only) socket is out of
>>> descriptors?
>>
>> Typo, sorry. I meant out of sndbuf.
>
>
> I mean e.g. for tun. If its sndbuf is smaller than e.g. (vq->num >> 1) *
> $pkt_size and if all packets were held by some modules, a limitation like
> vq->num >> 1 won't work since we hit sndbuf before it.

Good point.

>>
>>> A watchdog would help somewhat. With tx-napi, this case cannot occur,
>>> either, as free_old_xmit_skbs no longer depends on a call to start_xmit.
>>>
 2) tx napi is used for virtio-net
>>>
>>> I am not aware of any issue specific to the use of tx-napi?
>
>
> Might not be clear here, I mean e.g virtio_net (tx-napi) in guest +
> vhost_net (zerocopy) in host. In this case, even if we switch to datacopy if
> ubuf counts exceeds vq->num >> 1, we still complete tx buffers in order, tx
> interrupt could be delayed for indefinite time.

Copied buffers are completed immediately in handle_tx.

Do you mean when a process sends fewer packets than vq->num >> 1,
so that all are queued? Yes, then the latency is indeed that of the last
element leaving the qdisc.

>>>
 1) could be a corner case, and for 2) what your suggest here may not
 solve
 the issue since it still do in order completion.
>>>
>>> Somewhat tangential, but it might also help to break the in-order
>>> completion processing in vhost_zerocopy_signal_used. Complete
>>> all descriptors between done_idx and upend_idx. done_idx should
>>> then only be forward to the oldest still not-completed descriptor.
>>>
>>> In the test I ran, where the oldest descriptors are held in a queue and
>>> all newer ones are tail-dropped,
>
>
> Do you mean the descriptors were tail-dropped by vhost?

Tail-dropped by netem. The dropped items are completed out of
order by vhost before the held items.
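
(For readers outside the thread, the knob being tuned above is roughly of
this shape; a sketch assuming outstanding zerocopy buffers are tracked via
upend_idx/done_idx as in vhost_net. The helper name and exact form are not
from the patch:)

/* Sketch: revert to copying when too many zerocopy completions are
 * still outstanding, to avoid head-of-line blocking on the vq. */
static bool tx_can_zerocopy(struct vhost_net_virtqueue *nvq,
                            struct vhost_virtqueue *vq)
{
        int pending = (nvq->upend_idx - nvq->done_idx + UIO_MAXIOV)
                        % UIO_MAXIOV;

        return pending < (vq->num >> 2);  /* the threshold under discussion */
}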

Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Waiman Long
On 09/05/2017 09:24 AM, Juergen Gross wrote:
> There are cases where a guest tries to switch spinlocks to bare metal
> behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this
> has the downside of falling back to unfair test and set scheme for
> qspinlocks due to virt_spin_lock() detecting the virtualized
> environment.
>
> Make virt_spin_lock() a paravirt operation in order to enable users
> to select an explicit behavior like bare metal.
>
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/include/asm/paravirt.h   |  5 
>  arch/x86/include/asm/paravirt_types.h |  1 +
>  arch/x86/include/asm/qspinlock.h  | 48 
> ---
>  arch/x86/kernel/paravirt-spinlocks.c  | 14 ++
>  arch/x86/kernel/smpboot.c |  2 ++
>  5 files changed, 55 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
> index c25dd22f7c70..d9e954fb37df 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -725,6 +725,11 @@ static __always_inline bool pv_vcpu_is_preempted(long 
> cpu)
>   return PVOP_CALLEE1(bool, pv_lock_ops.vcpu_is_preempted, cpu);
>  }
>  
> +static __always_inline bool pv_virt_spin_lock(struct qspinlock *lock)
> +{
> + return PVOP_CALLEE1(bool, pv_lock_ops.virt_spin_lock, lock);
> +}
> +
>  #endif /* SMP && PARAVIRT_SPINLOCKS */
>  
>  #ifdef CONFIG_X86_32
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 19efefc0e27e..928f5e7953a7 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -319,6 +319,7 @@ struct pv_lock_ops {
>   void (*kick)(int cpu);
>  
>   struct paravirt_callee_save vcpu_is_preempted;
> + struct paravirt_callee_save virt_spin_lock;
>  } __no_randomize_layout;
>  
>  /* This contains all the paravirt structures: we get a convenient
> diff --git a/arch/x86/include/asm/qspinlock.h 
> b/arch/x86/include/asm/qspinlock.h
> index 48a706f641f2..fbd98896385c 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -17,6 +17,25 @@ static inline void native_queued_spin_unlock(struct 
> qspinlock *lock)
>   smp_store_release((u8 *)lock, 0);
>  }
>  
> +static inline bool native_virt_spin_lock(struct qspinlock *lock)
> +{
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
> + return false;
> +
> + /*
> +  * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> +  * back to a Test-and-Set spinlock, because fair locks have
> +  * horrible lock 'holder' preemption issues.
> +  */
> +
> + do {
> + while (atomic_read(&lock->val) != 0)
> + cpu_relax();
> + } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> +
> + return true;
> +}
> +
>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>  extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 
> val);
>  extern void __pv_init_lock_hash(void);
> @@ -38,33 +57,32 @@ static inline bool vcpu_is_preempted(long cpu)
>  {
>   return pv_vcpu_is_preempted(cpu);
>  }
> +
> +void native_pv_lock_init(void) __init;
>  #else
>  static inline void queued_spin_unlock(struct qspinlock *lock)
>  {
>   native_queued_spin_unlock(lock);
>  }
> +
> +static inline void native_pv_lock_init(void)
> +{
> +}
>  #endif
>  
>  #ifdef CONFIG_PARAVIRT
>  #define virt_spin_lock virt_spin_lock
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
>  static inline bool virt_spin_lock(struct qspinlock *lock)
>  {
> - if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
> - return false;

Have you considered just adding one more jump label here to skip
virt_spin_lock when KVM or Xen wants to do so?

> -
> - /*
> -  * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> -  * back to a Test-and-Set spinlock, because fair locks have
> -  * horrible lock 'holder' preemption issues.
> -  */
> -
> - do {
> - while (atomic_read(&lock->val) != 0)
> - cpu_relax();
> - } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> -
> - return true;
> + return pv_virt_spin_lock(lock);
> +}
> +#else
> +static inline bool virt_spin_lock(struct qspinlock *lock)
> +{
> + return native_virt_spin_lock(lock);
>  }
> +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
>  #endif /* CONFIG_PARAVIRT */
>  
>  #include 
> diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
> b/arch/x86/kernel/paravirt-spinlocks.c
> index 26e4bd92f309..1be187ef8a38 100644
> --- a/arch/x86/kernel/paravirt-spinlocks.c
> +++ b/arch/x86/kernel/paravirt-spinlocks.c
> @@ -20,6 +20,12 @@ bool pv_is_native_spin_unlock(void)
>   __raw_callee_save___native_queued_spin_unlock;
>  }
>  
> +__visible bool __native_virt_spin_lock(struct qspinlock *lock)
> +{
> + return native_virt_spin_lock(lock);
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(__native_virt_spin_lock);

I have so

Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Waiman Long
On 09/05/2017 10:08 AM, Peter Zijlstra wrote:
> On Tue, Sep 05, 2017 at 10:02:57AM -0400, Waiman Long wrote:
>> On 09/05/2017 09:24 AM, Juergen Gross wrote:
>>> +static inline bool native_virt_spin_lock(struct qspinlock *lock)
>>> +{
>>> +   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
>>> +   return false;
>>> +
>> I think you can take the above if statement out as you has done test in
>> native_pv_lock_init(). So the test will also be false here.
> That does mean we'll run a test-and-set spinlock until paravirt patching
> happens though. I prefer to not do that.
>
> One important point.. we must not be holding any locks when we switch
> over between the two locks. Back then I spend some time making sure that
> didn't happen with the X86 feature flag muck.

AFAICT, native_pv_lock_init() is called before SMP init. So it shouldn't
matter.

Cheers,
Longman



Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Juergen Gross
On 05/09/17 16:10, Waiman Long wrote:
> On 09/05/2017 09:24 AM, Juergen Gross wrote:
>> There are cases where a guest tries to switch spinlocks to bare metal
>> behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this
>> has the downside of falling back to unfair test and set scheme for
>> qspinlocks due to virt_spin_lock() detecting the virtualized
>> environment.
>>
>> Make virt_spin_lock() a paravirt operation in order to enable users
>> to select an explicit behavior like bare metal.
>>
>> Signed-off-by: Juergen Gross 
>> ---
>>  arch/x86/include/asm/paravirt.h   |  5 
>>  arch/x86/include/asm/paravirt_types.h |  1 +
>>  arch/x86/include/asm/qspinlock.h  | 48 
>> ---
>>  arch/x86/kernel/paravirt-spinlocks.c  | 14 ++
>>  arch/x86/kernel/smpboot.c |  2 ++
>>  5 files changed, 55 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/paravirt.h 
>> b/arch/x86/include/asm/paravirt.h
>> index c25dd22f7c70..d9e954fb37df 100644
>> --- a/arch/x86/include/asm/paravirt.h
>> +++ b/arch/x86/include/asm/paravirt.h
>> @@ -725,6 +725,11 @@ static __always_inline bool pv_vcpu_is_preempted(long 
>> cpu)
>>  return PVOP_CALLEE1(bool, pv_lock_ops.vcpu_is_preempted, cpu);
>>  }
>>  
>> +static __always_inline bool pv_virt_spin_lock(struct qspinlock *lock)
>> +{
>> +return PVOP_CALLEE1(bool, pv_lock_ops.virt_spin_lock, lock);
>> +}
>> +
>>  #endif /* SMP && PARAVIRT_SPINLOCKS */
>>  
>>  #ifdef CONFIG_X86_32
>> diff --git a/arch/x86/include/asm/paravirt_types.h 
>> b/arch/x86/include/asm/paravirt_types.h
>> index 19efefc0e27e..928f5e7953a7 100644
>> --- a/arch/x86/include/asm/paravirt_types.h
>> +++ b/arch/x86/include/asm/paravirt_types.h
>> @@ -319,6 +319,7 @@ struct pv_lock_ops {
>>  void (*kick)(int cpu);
>>  
>>  struct paravirt_callee_save vcpu_is_preempted;
>> +struct paravirt_callee_save virt_spin_lock;
>>  } __no_randomize_layout;
>>  
>>  /* This contains all the paravirt structures: we get a convenient
>> diff --git a/arch/x86/include/asm/qspinlock.h 
>> b/arch/x86/include/asm/qspinlock.h
>> index 48a706f641f2..fbd98896385c 100644
>> --- a/arch/x86/include/asm/qspinlock.h
>> +++ b/arch/x86/include/asm/qspinlock.h
>> @@ -17,6 +17,25 @@ static inline void native_queued_spin_unlock(struct 
>> qspinlock *lock)
>>  smp_store_release((u8 *)lock, 0);
>>  }
>>  
>> +static inline bool native_virt_spin_lock(struct qspinlock *lock)
>> +{
>> +if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
>> +return false;
>> +
>> +/*
>> + * On hypervisors without PARAVIRT_SPINLOCKS support we fall
>> + * back to a Test-and-Set spinlock, because fair locks have
>> + * horrible lock 'holder' preemption issues.
>> + */
>> +
>> +do {
>> +while (atomic_read(&lock->val) != 0)
>> +cpu_relax();
>> +} while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
>> +
>> +return true;
>> +}
>> +
>>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>>  extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 
>> val);
>>  extern void __pv_init_lock_hash(void);
>> @@ -38,33 +57,32 @@ static inline bool vcpu_is_preempted(long cpu)
>>  {
>>  return pv_vcpu_is_preempted(cpu);
>>  }
>> +
>> +void native_pv_lock_init(void) __init;
>>  #else
>>  static inline void queued_spin_unlock(struct qspinlock *lock)
>>  {
>>  native_queued_spin_unlock(lock);
>>  }
>> +
>> +static inline void native_pv_lock_init(void)
>> +{
>> +}
>>  #endif
>>  
>>  #ifdef CONFIG_PARAVIRT
>>  #define virt_spin_lock virt_spin_lock
>> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
>>  static inline bool virt_spin_lock(struct qspinlock *lock)
>>  {
>> -if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
>> -return false;
> 
> Have you consider just add one more jump label here to skip
> virt_spin_lock when KVM or Xen want to do so?

Why? Did you look at patch 4? This is the way to do it...

> 
>> -
>> -/*
>> - * On hypervisors without PARAVIRT_SPINLOCKS support we fall
>> - * back to a Test-and-Set spinlock, because fair locks have
>> - * horrible lock 'holder' preemption issues.
>> - */
>> -
>> -do {
>> -while (atomic_read(&lock->val) != 0)
>> -cpu_relax();
>> -} while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
>> -
>> -return true;
>> +return pv_virt_spin_lock(lock);
>> +}
>> +#else
>> +static inline bool virt_spin_lock(struct qspinlock *lock)
>> +{
>> +return native_virt_spin_lock(lock);
>>  }
>> +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
>>  #endif /* CONFIG_PARAVIRT */
>>  
>>  #include 
>> diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
>> b/arch/x86/kernel/paravirt-spinlocks.c
>> index 26e4bd92f309..1be187ef8a38 100644
>> --- a/arch/x86/kernel/paravirt-spinlocks.c
>> +++ b/arch/x86/kernel/paravirt-spinlocks.c
>> @@ -20,6 +20,12 @@ bool pv_is_native_spin_unlock(void)
>>  __raw_callee_save___native_

Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Waiman Long
On 09/05/2017 10:18 AM, Juergen Gross wrote:
> On 05/09/17 16:10, Waiman Long wrote:
>> On 09/05/2017 09:24 AM, Juergen Gross wrote:
>>> There are cases where a guest tries to switch spinlocks to bare metal
>>> behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this
>>> has the downside of falling back to unfair test and set scheme for
>>> qspinlocks due to virt_spin_lock() detecting the virtualized
>>> environment.
>>>
>>> Make virt_spin_lock() a paravirt operation in order to enable users
>>> to select an explicit behavior like bare metal.
>>>
>>> Signed-off-by: Juergen Gross 
>>> ---
>>>  arch/x86/include/asm/paravirt.h   |  5 
>>>  arch/x86/include/asm/paravirt_types.h |  1 +
>>>  arch/x86/include/asm/qspinlock.h  | 48 
>>> ---
>>>  arch/x86/kernel/paravirt-spinlocks.c  | 14 ++
>>>  arch/x86/kernel/smpboot.c |  2 ++
>>>  5 files changed, 55 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/arch/x86/include/asm/paravirt.h 
>>> b/arch/x86/include/asm/paravirt.h
>>> index c25dd22f7c70..d9e954fb37df 100644
>>> --- a/arch/x86/include/asm/paravirt.h
>>> +++ b/arch/x86/include/asm/paravirt.h
>>> @@ -725,6 +725,11 @@ static __always_inline bool pv_vcpu_is_preempted(long 
>>> cpu)
>>> return PVOP_CALLEE1(bool, pv_lock_ops.vcpu_is_preempted, cpu);
>>>  }
>>>  
>>> +static __always_inline bool pv_virt_spin_lock(struct qspinlock *lock)
>>> +{
>>> +   return PVOP_CALLEE1(bool, pv_lock_ops.virt_spin_lock, lock);
>>> +}
>>> +
>>>  #endif /* SMP && PARAVIRT_SPINLOCKS */
>>>  
>>>  #ifdef CONFIG_X86_32
>>> diff --git a/arch/x86/include/asm/paravirt_types.h 
>>> b/arch/x86/include/asm/paravirt_types.h
>>> index 19efefc0e27e..928f5e7953a7 100644
>>> --- a/arch/x86/include/asm/paravirt_types.h
>>> +++ b/arch/x86/include/asm/paravirt_types.h
>>> @@ -319,6 +319,7 @@ struct pv_lock_ops {
>>> void (*kick)(int cpu);
>>>  
>>> struct paravirt_callee_save vcpu_is_preempted;
>>> +   struct paravirt_callee_save virt_spin_lock;
>>>  } __no_randomize_layout;
>>>  
>>>  /* This contains all the paravirt structures: we get a convenient
>>> diff --git a/arch/x86/include/asm/qspinlock.h 
>>> b/arch/x86/include/asm/qspinlock.h
>>> index 48a706f641f2..fbd98896385c 100644
>>> --- a/arch/x86/include/asm/qspinlock.h
>>> +++ b/arch/x86/include/asm/qspinlock.h
>>> @@ -17,6 +17,25 @@ static inline void native_queued_spin_unlock(struct 
>>> qspinlock *lock)
>>> smp_store_release((u8 *)lock, 0);
>>>  }
>>>  
>>> +static inline bool native_virt_spin_lock(struct qspinlock *lock)
>>> +{
>>> +   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
>>> +   return false;
>>> +
>>> +   /*
>>> +* On hypervisors without PARAVIRT_SPINLOCKS support we fall
>>> +* back to a Test-and-Set spinlock, because fair locks have
>>> +* horrible lock 'holder' preemption issues.
>>> +*/
>>> +
>>> +   do {
>>> +   while (atomic_read(&lock->val) != 0)
>>> +   cpu_relax();
>>> +   } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
>>> +
>>> +   return true;
>>> +}
>>> +
>>>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>>>  extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 
>>> val);
>>>  extern void __pv_init_lock_hash(void);
>>> @@ -38,33 +57,32 @@ static inline bool vcpu_is_preempted(long cpu)
>>>  {
>>> return pv_vcpu_is_preempted(cpu);
>>>  }
>>> +
>>> +void native_pv_lock_init(void) __init;
>>>  #else
>>>  static inline void queued_spin_unlock(struct qspinlock *lock)
>>>  {
>>> native_queued_spin_unlock(lock);
>>>  }
>>> +
>>> +static inline void native_pv_lock_init(void)
>>> +{
>>> +}
>>>  #endif
>>>  
>>>  #ifdef CONFIG_PARAVIRT
>>>  #define virt_spin_lock virt_spin_lock
>>> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
>>>  static inline bool virt_spin_lock(struct qspinlock *lock)
>>>  {
>>> -   if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
>>> -   return false;
>> Have you consider just add one more jump label here to skip
>> virt_spin_lock when KVM or Xen want to do so?
> Why? Did you look at patch 4? This is the way to do it...

I asked this because of my performance concern as stated later in the email.

>>> -
>>> -   /*
>>> -* On hypervisors without PARAVIRT_SPINLOCKS support we fall
>>> -* back to a Test-and-Set spinlock, because fair locks have
>>> -* horrible lock 'holder' preemption issues.
>>> -*/
>>> -
>>> -   do {
>>> -   while (atomic_read(&lock->val) != 0)
>>> -   cpu_relax();
>>> -   } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
>>> -
>>> -   return true;
>>> +   return pv_virt_spin_lock(lock);
>>> +}
>>> +#else
>>> +static inline bool virt_spin_lock(struct qspinlock *lock)
>>> +{
>>> +   return native_virt_spin_lock(lock);
>>>  }
>>> +#endif /* CONFIG_PARAVIRT_SPINLOCKS */
>>>  #endif /* CONFIG_PARAVIRT */
>>>  
>>>  #include 
>>> diff --git a/arch/x86/kernel/paravirt-spinlocks.c 
>>> b/arch/x86/kernel/paravirt-spinlocks.c
>>> index 26e4

Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Waiman Long
On 09/05/2017 10:24 AM, Waiman Long wrote:
> On 09/05/2017 10:18 AM, Juergen Gross wrote:
>> On 05/09/17 16:10, Waiman Long wrote:
>>> On 09/05/2017 09:24 AM, Juergen Gross wrote:
 There are cases where a guest tries to switch spinlocks to bare metal
 behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this
 has the downside of falling back to unfair test and set scheme for
 qspinlocks due to virt_spin_lock() detecting the virtualized
 environment.

 Make virt_spin_lock() a paravirt operation in order to enable users
 to select an explicit behavior like bare metal.

 Signed-off-by: Juergen Gross 
 ---
  arch/x86/include/asm/paravirt.h   |  5 
  arch/x86/include/asm/paravirt_types.h |  1 +
  arch/x86/include/asm/qspinlock.h  | 48 
 ---
  arch/x86/kernel/paravirt-spinlocks.c  | 14 ++
  arch/x86/kernel/smpboot.c |  2 ++
  5 files changed, 55 insertions(+), 15 deletions(-)

 diff --git a/arch/x86/include/asm/paravirt.h 
 b/arch/x86/include/asm/paravirt.h
 index c25dd22f7c70..d9e954fb37df 100644
 --- a/arch/x86/include/asm/paravirt.h
 +++ b/arch/x86/include/asm/paravirt.h
 @@ -725,6 +725,11 @@ static __always_inline bool pv_vcpu_is_preempted(long 
 cpu)
return PVOP_CALLEE1(bool, pv_lock_ops.vcpu_is_preempted, cpu);
  }
  
 +static __always_inline bool pv_virt_spin_lock(struct qspinlock *lock)
 +{
 +  return PVOP_CALLEE1(bool, pv_lock_ops.virt_spin_lock, lock);
 +}
 +
  #endif /* SMP && PARAVIRT_SPINLOCKS */
  
  #ifdef CONFIG_X86_32
 diff --git a/arch/x86/include/asm/paravirt_types.h 
 b/arch/x86/include/asm/paravirt_types.h
 index 19efefc0e27e..928f5e7953a7 100644
 --- a/arch/x86/include/asm/paravirt_types.h
 +++ b/arch/x86/include/asm/paravirt_types.h
 @@ -319,6 +319,7 @@ struct pv_lock_ops {
void (*kick)(int cpu);
  
struct paravirt_callee_save vcpu_is_preempted;
 +  struct paravirt_callee_save virt_spin_lock;
  } __no_randomize_layout;
  
  /* This contains all the paravirt structures: we get a convenient
 diff --git a/arch/x86/include/asm/qspinlock.h 
 b/arch/x86/include/asm/qspinlock.h
 index 48a706f641f2..fbd98896385c 100644
 --- a/arch/x86/include/asm/qspinlock.h
 +++ b/arch/x86/include/asm/qspinlock.h
 @@ -17,6 +17,25 @@ static inline void native_queued_spin_unlock(struct 
 qspinlock *lock)
smp_store_release((u8 *)lock, 0);
  }
  
 +static inline bool native_virt_spin_lock(struct qspinlock *lock)
 +{
 +  if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
 +  return false;
 +
 +  /*
 +   * On hypervisors without PARAVIRT_SPINLOCKS support we fall
 +   * back to a Test-and-Set spinlock, because fair locks have
 +   * horrible lock 'holder' preemption issues.
 +   */
 +
 +  do {
 +  while (atomic_read(&lock->val) != 0)
 +  cpu_relax();
 +  } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
 +
 +  return true;
 +}
 +
  #ifdef CONFIG_PARAVIRT_SPINLOCKS
  extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 
 val);
  extern void __pv_init_lock_hash(void);
 @@ -38,33 +57,32 @@ static inline bool vcpu_is_preempted(long cpu)
  {
return pv_vcpu_is_preempted(cpu);
  }
 +
 +void native_pv_lock_init(void) __init;
  #else
  static inline void queued_spin_unlock(struct qspinlock *lock)
  {
native_queued_spin_unlock(lock);
  }
 +
 +static inline void native_pv_lock_init(void)
 +{
 +}
  #endif
  
  #ifdef CONFIG_PARAVIRT
  #define virt_spin_lock virt_spin_lock
 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
  static inline bool virt_spin_lock(struct qspinlock *lock)
  {
 -  if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
 -  return false;
>>> Have you consider just add one more jump label here to skip
>>> virt_spin_lock when KVM or Xen want to do so?
>> Why? Did you look at patch 4? This is the way to do it...
> I asked this because of my performance concern as stated later in the email.

For clarification, I was actually asking whether you had considered just
adding one more jump label to skip it for Xen/KVM instead of making
virt_spin_lock() a pv-op.

Cheers,
Longman
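
(A rough sketch of the jump-label alternative being suggested here, as an
assumption about the idea rather than code from this series; the
DEFINE_STATIC_KEY_TRUE() would live in a C file, with the key switched off
for bare metal or xen_nopvspin:)

DEFINE_STATIC_KEY_TRUE(virt_spin_lock_key);     /* e.g. in a paravirt C file */

static inline bool virt_spin_lock(struct qspinlock *lock)
{
        if (!static_branch_likely(&virt_spin_lock_key))
                return false;

        /* unfair test-and-set fallback, as before */
        do {
                while (atomic_read(&lock->val) != 0)
                        cpu_relax();
        } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);

        return true;
}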


[PATCH] paravirt: switch maintainer

2017-09-05 Thread Juergen Gross
Jeremy Fitzhardinge is stepping down as a paravirt maintainer. I'll
replace him.

While at it, update the file list to the actual pattern.

Signed-off-by: Juergen Gross 
---
 MAINTAINERS | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8ef4694af6e8..d7565683aab3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9956,7 +9956,7 @@ F:include/uapi/linux/ppdev.h
 F: Documentation/parport*.txt
 
 PARAVIRT_OPS INTERFACE
-M: Jeremy Fitzhardinge 
+M: Juergen Gross 
 M: Chris Wright 
 M: Alok Kataria 
 M: Rusty Russell 
@@ -9964,7 +9964,7 @@ L:virtualization@lists.linux-foundation.org
 S: Supported
 F: Documentation/virtual/paravirt_ops.txt
 F: arch/*/kernel/paravirt*
-F: arch/*/include/asm/paravirt.h
+F: arch/*/include/asm/paravirt*.h
 F: include/linux/hypervisor.h
 
 PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
-- 
2.12.3



Re: [PATCH 3/4] paravirt: add virt_spin_lock pvops function

2017-09-05 Thread Juergen Gross
On 05/09/17 16:31, Waiman Long wrote:
> On 09/05/2017 10:24 AM, Waiman Long wrote:
>> On 09/05/2017 10:18 AM, Juergen Gross wrote:
>>> On 05/09/17 16:10, Waiman Long wrote:
 On 09/05/2017 09:24 AM, Juergen Gross wrote:
> There are cases where a guest tries to switch spinlocks to bare metal
> behavior (e.g. by setting "xen_nopvspin" boot parameter). Today this
> has the downside of falling back to unfair test and set scheme for
> qspinlocks due to virt_spin_lock() detecting the virtualized
> environment.
>
> Make virt_spin_lock() a paravirt operation in order to enable users
> to select an explicit behavior like bare metal.
>
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/include/asm/paravirt.h   |  5 
>  arch/x86/include/asm/paravirt_types.h |  1 +
>  arch/x86/include/asm/qspinlock.h  | 48 
> ---
>  arch/x86/kernel/paravirt-spinlocks.c  | 14 ++
>  arch/x86/kernel/smpboot.c |  2 ++
>  5 files changed, 55 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/paravirt.h 
> b/arch/x86/include/asm/paravirt.h
> index c25dd22f7c70..d9e954fb37df 100644
> --- a/arch/x86/include/asm/paravirt.h
> +++ b/arch/x86/include/asm/paravirt.h
> @@ -725,6 +725,11 @@ static __always_inline bool 
> pv_vcpu_is_preempted(long cpu)
>   return PVOP_CALLEE1(bool, pv_lock_ops.vcpu_is_preempted, cpu);
>  }
>  
> +static __always_inline bool pv_virt_spin_lock(struct qspinlock *lock)
> +{
> + return PVOP_CALLEE1(bool, pv_lock_ops.virt_spin_lock, lock);
> +}
> +
>  #endif /* SMP && PARAVIRT_SPINLOCKS */
>  
>  #ifdef CONFIG_X86_32
> diff --git a/arch/x86/include/asm/paravirt_types.h 
> b/arch/x86/include/asm/paravirt_types.h
> index 19efefc0e27e..928f5e7953a7 100644
> --- a/arch/x86/include/asm/paravirt_types.h
> +++ b/arch/x86/include/asm/paravirt_types.h
> @@ -319,6 +319,7 @@ struct pv_lock_ops {
>   void (*kick)(int cpu);
>  
>   struct paravirt_callee_save vcpu_is_preempted;
> + struct paravirt_callee_save virt_spin_lock;
>  } __no_randomize_layout;
>  
>  /* This contains all the paravirt structures: we get a convenient
> diff --git a/arch/x86/include/asm/qspinlock.h 
> b/arch/x86/include/asm/qspinlock.h
> index 48a706f641f2..fbd98896385c 100644
> --- a/arch/x86/include/asm/qspinlock.h
> +++ b/arch/x86/include/asm/qspinlock.h
> @@ -17,6 +17,25 @@ static inline void native_queued_spin_unlock(struct 
> qspinlock *lock)
>   smp_store_release((u8 *)lock, 0);
>  }
>  
> +static inline bool native_virt_spin_lock(struct qspinlock *lock)
> +{
> + if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
> + return false;
> +
> + /*
> +  * On hypervisors without PARAVIRT_SPINLOCKS support we fall
> +  * back to a Test-and-Set spinlock, because fair locks have
> +  * horrible lock 'holder' preemption issues.
> +  */
> +
> + do {
> + while (atomic_read(&lock->val) != 0)
> + cpu_relax();
> + } while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);
> +
> + return true;
> +}
> +
>  #ifdef CONFIG_PARAVIRT_SPINLOCKS
>  extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 
> val);
>  extern void __pv_init_lock_hash(void);
> @@ -38,33 +57,32 @@ static inline bool vcpu_is_preempted(long cpu)
>  {
>   return pv_vcpu_is_preempted(cpu);
>  }
> +
> +void native_pv_lock_init(void) __init;
>  #else
>  static inline void queued_spin_unlock(struct qspinlock *lock)
>  {
>   native_queued_spin_unlock(lock);
>  }
> +
> +static inline void native_pv_lock_init(void)
> +{
> +}
>  #endif
>  
>  #ifdef CONFIG_PARAVIRT
>  #define virt_spin_lock virt_spin_lock
> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
>  static inline bool virt_spin_lock(struct qspinlock *lock)
>  {
> - if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
> - return false;
 Have you consider just add one more jump label here to skip
 virt_spin_lock when KVM or Xen want to do so?
>>> Why? Did you look at patch 4? This is the way to do it...
>> I asked this because of my performance concern as stated later in the email.
> 
> For clarification, I was actually asking if you consider just adding one
> more jump label to skip it for Xen/KVM instead of making
> virt_spin_lock() a pv-op.

Aah, okay.

Bare metal could make use of this jump label, too.

I'll give it a try and see how it looks.


Juergen



Re: [PATCH net V2] vhost_net: correctly check tx avail during rx busy polling

2017-09-05 Thread David Miller
From: Jason Wang 
Date: Tue,  5 Sep 2017 09:22:05 +0800

> We used to check tx avail through vhost_enable_notify(), which is
> wrong since it only checks whether the guest has filled more
> available buffers since the last avail idx synchronization, which was
> just done by vhost_vq_avail_empty() before. What we really want is to
> check for pending buffers in the avail ring. Fix this by calling
> vhost_vq_avail_empty() instead.
> 
> This issue can be noticed by running a netperf TCP_RR benchmark as
> client from the guest (but not the host). With this fix, TCP_RR from
> guest to localhost recovers from 1375.91 trans per sec to 55235.28
> trans per sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz).
> 
> Fixes: 030881372460 ("vhost_net: basic polling support")
> Signed-off-by: Jason Wang 

Applied and queued up for -stable, thanks.
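
(For illustration, a simplified sketch of what the described check looks like
in the rx busy-poll loop; the surrounding loop and helper names are assumed
context, not quoted from the patch:)

/* Keep busy-polling for rx only while the tx avail ring is really
 * empty: vhost_vq_avail_empty() looks at pending buffers in the ring,
 * whereas vhost_enable_notify() only reports buffers added since the
 * last avail idx sync. */
while (vhost_can_busy_poll(&net->dev, endtime) &&
       !sk_has_rx_data(sk) &&
       vhost_vq_avail_empty(&net->dev, tvq))
        cpu_relax();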


Re: [PATCH net-next] virtio-net: invoke zerocopy callback on xmit path if no tx napi

2017-09-05 Thread Michael S. Tsirkin
On Tue, Sep 05, 2017 at 04:09:19PM +0200, Willem de Bruijn wrote:
> On Mon, Sep 4, 2017 at 5:03 AM, Jason Wang  wrote:
> >
> >
> > On 2017年09月02日 00:17, Willem de Bruijn wrote:
> >
> > This is not a 50/50 split, which implies that some packets from the
> > large
> > packet flow are still converted to copying. Without the change the rate
> > without queue was 80k zerocopy vs 80k copy, so this choice of
> > (vq->num >> 2) appears too conservative.
> >
> > However, testing with (vq->num >> 1) was not as effective at mitigating
> > stalls. I did not save that data, unfortunately. Can run more tests on
> > fine
> > tuning this variable, if the idea sounds good.
> 
> 
>  Looks like there're still two cases were left:
> >>>
> >>> To be clear, this patch is not intended to fix all issues. It is a small
> >>> improvement to avoid HoL blocking due to queued zerocopy skbs.
> >
> >
> > Right, just want to see if there's anything left.
> >
> >>>
> >>> The trade-off is that reverting to copying in these cases increases
> >>> cycle cost. I think that that is a trade-off worth making compared to
> >>> the alternative drop in throughput. It probably would be good to be
> >>> able to measure this without kernel instrumentation: export
> >>> counters similar to net->tx_zcopy_err and net->tx_packets (though
> >>> without reset to zero, as in vhost_net_tx_packet).
> >
> >
> > I think it's acceptable if extra cycles were spent if we detect HOL anyhow.
> >
> >>>
>  1) sndbuf is not INT_MAX
> >>>
> >>> You mean the case where the device stalls, later zerocopy notifications
> >>> are queued, but these are never cleaned in free_old_xmit_skbs,
> >>> because it requires a start_xmit and by now the (only) socket is out of
> >>> descriptors?
> >>
> >> Typo, sorry. I meant out of sndbuf.
> >
> >
> > I mean e.g for tun. If its sndbuf is smaller than e.g (vq->num >> 1) *
> > $pkt_size and if all packet were held by some modules, limitation like
> > vq->num >> 1 won't work since we hit sudbuf before it.
> 
> Good point.

I agree, however, anyone using SNDBUF < infinity already hits HOQ blocking
in some scenarios.


> >>
> >>> A watchdog would help somewhat. With tx-napi, this case cannot occur,
> >>> either, as free_old_xmit_skbs no longer depends on a call to start_xmit.
> >>>
>  2) tx napi is used for virtio-net
> >>>
> >>> I am not aware of any issue specific to the use of tx-napi?
> >
> >
> > Might not be clear here, I mean e.g virtio_net (tx-napi) in guest +
> > vhost_net (zerocopy) in host. In this case, even if we switch to datacopy if
> > ubuf counts exceeds vq->num >> 1, we still complete tx buffers in order, tx
> > interrupt could be delayed for indefinite time.
> 
> Copied buffers are completed immediately in handle_tx.
> 
> Do you mean when a process sends fewer packets than vq->num >> 1,
> so that all are queued? Yes, then the latency is indeed that of the last
> element leaving the qdisc.
> 
> >>>
>  1) could be a corner case, and for 2) what your suggest here may not
>  solve
>  the issue since it still do in order completion.
> >>>
> >>> Somewhat tangential, but it might also help to break the in-order
> >>> completion processing in vhost_zerocopy_signal_used. Complete
> >>> all descriptors between done_idx and upend_idx. done_idx should
> >>> then only be forward to the oldest still not-completed descriptor.
> >>>
> >>> In the test I ran, where the oldest descriptors are held in a queue and
> >>> all newer ones are tail-dropped,
> >
> >
> > Do you mean the descriptors were tail-dropped by vhost?
> 
> Tail-dropped by netem. The dropped items are completed out of
> order by vhost before the held items.