Re: vhost + multiqueue + RSS question.

2014-11-16 Thread Gleb Natapov
On Sun, Nov 16, 2014 at 08:56:04PM +0200, Michael S. Tsirkin wrote:
> On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote:
> > Hi Michael,
> > 
> >  I am playing with vhost multiqueue capability and have a question about
> > vhost multiqueue and RSS (receive side scaling). My setup has Mellanox
> > ConnectX-3 NIC which supports multiqueue and RSS. Network related
> > parameters for qemu are:
> > 
> >-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4
> >-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10
> > 
> > In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue.
> > 
> > I am running one tcp stream into the guest using iperf. Since there is
> > only one tcp stream I expect it to be handled by one queue only, but
> > this seems not to be the case. ethtool -S on the host shows that the
> > stream is handled by one queue in the NIC, just as I would expect,
> > but in the guest all 4 virtio-input interrupts are incremented. Am I
> > missing any configuration?
> 
> I don't see anything obviously wrong with what you describe.
> Maybe, somehow, same irqfd got bound to multiple MSI vectors?
It does not look like that is what is happening, judging by the way
interrupts are distributed between the queues. They are not distributed
uniformly; often one queue gets most of the interrupts and the others get
far fewer, and then the pattern changes.

--
Gleb.


Re: vhost + multiqueue + RSS question.

2014-11-16 Thread Gleb Natapov
On Mon, Nov 17, 2014 at 01:30:06PM +0800, Jason Wang wrote:
> On 11/17/2014 02:56 AM, Michael S. Tsirkin wrote:
> > On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote:
> >> Hi Michael,
> >>
> >>  I am playing with vhost multiqueue capability and have a question about
> >> vhost multiqueue and RSS (receive side scaling). My setup has Mellanox
> >> ConnectX-3 NIC which supports multiqueue and RSS. Network related
> >> parameters for qemu are:
> >>
> >>-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4
> >>-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10
> >>
> >> In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue.
> >>
> >> I am running one tcp stream into the guest using iperf. Since there is
> >> only one tcp stream I expect it to be handled by one queue only, but
> >> this seems not to be the case. ethtool -S on the host shows that the
> >> stream is handled by one queue in the NIC, just as I would expect,
> >> but in the guest all 4 virtio-input interrupts are incremented. Am I
> >> missing any configuration?
> > I don't see anything obviously wrong with what you describe.
> > Maybe, somehow, same irqfd got bound to multiple MSI vectors?
> > To see, can you try dumping struct kvm_irqfd that's passed to kvm?
> >
> >
> >> --
> >>Gleb.
> 
> This sounds like a regression; which kernel/qemu versions did you use?
Sorry, I should have mentioned it from the start. The host is Fedora 20 with
kernel 3.16.6-200.fc20.x86_64 and qemu-system-x86-1.6.2-9.fc20.x86_64.
The guest is also Fedora 20, but with an older kernel, 3.11.10-301.

--
Gleb.


Re: nested KVM slower than QEMU with gnumach guest kernel

2014-11-16 Thread Jan Kiszka
On 2014-11-16 23:18, Samuel Thibault wrote:
> Hello,
> 
> Jan Kiszka, on Wed 12 Nov 2014 00:42:52 +0100, wrote:
>> On 2014-11-11 19:55, Samuel Thibault wrote:
>>> jenkins.debian.net is running inside a KVM VM, and it runs nested
>>> KVM guests for its installation attempts.  This goes fine with Linux
>>> kernels, but it is extremely slow with gnumach kernels.
> 
>> You can try to catch a trace (ftrace) on the physical host.
>>
>> I suspect the setup forces a lot of instruction emulation, either on L0
> >> or L1. And that is slower than QEMU, as KVM does not optimize the way QEMU does.
> 
> Here is a sample of trace-cmd output dump: the same kind of pattern
> repeats over and over, with EXTERNAL_INTERRUPT happening mostly
> every other microsecond:
> 
>  qemu-system-x86-9752  [003]  4106.187755: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xa02848b1 info 0 80f6
>  qemu-system-x86-9752  [003]  4106.187756: kvm_entry: vcpu 0
>  qemu-system-x86-9752  [003]  4106.187757: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xa02848b1 info 0 80f6
>  qemu-system-x86-9752  [003]  4106.187758: kvm_entry: vcpu 0
>  qemu-system-x86-9752  [003]  4106.187759: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xa02848b1 info 0 80f6
>  qemu-system-x86-9752  [003]  4106.187760: kvm_entry: vcpu 0

You may want to turn on more trace events, if not all of them, to possibly see
what Linux does then. The next level after that is function tracing (which may
require a kernel rebuild or the distro's tracing kernel).

> 
> The various functions being interrupted are vmx_vcpu_run
> (0xa02848b1 and 0xa0284972), handle_io
> (0xa027ee62), vmx_get_cpl (0xa027a7de),
> load_vmcs12_host_state (0xa027ea31), native_read_tscp
> (0x81050a84), native_write_msr_safe (0x81050aa6),
> vmx_decache_cr0_guest_bits (0xa027a384),
> vmx_handle_external_intr (0xa027a54d).
> 
> AIUI, the external interrupt is 0xf6, i.e. Linux' IRQ_WORK_VECTOR.  I
> however don't see any of them, neither in L0's /proc/interrupts, nor in
> L1's /proc/interrupts...

I suppose this is an SMP host and guest? Does reducing the CPUs to 1 change
the picture? If not, it may at least make cause and effect easier to understand.

Jan






Re: vhost + multiqueue + RSS question.

2014-11-16 Thread Jason Wang
On 11/17/2014 12:54 PM, Venkateswara Rao Nandigam wrote:
> I have a question related to this topic. How do you set the RSS key on the
> Mellanox NIC? I mean, from your guest?

I believe it's possible but not implemented currently. The issue is that the
implementation should not be vendor-specific.

TUN/TAP has its own automatic flow steering implementation (flow caches).
>
> If it is being set as part of the host driver, is there a way to set it from the
> guest? I mean my guest will choose an RSS key and will try to set it on the physical NIC.

Flow caches can cooperate with RFS/aRFS now, so I believe there is indeed
some kind of cooperation between the host NIC and the guest.
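
As a rough illustration of how such a flow cache keeps a single stream on a
single queue (a sketch in the spirit of the idea only, not the actual
drivers/net/tun.c code; all names and sizes here are made up):

/*
 * Sketch of hash-based flow steering in the spirit of the TUN/TAP
 * flow caches.  Not the actual drivers/net/tun.c code.
 */
#include <stdio.h>
#include <stdint.h>

#define NUM_QUEUES   4
#define FLOW_ENTRIES 256                  /* power of two for cheap masking */

struct flow_entry {
	uint32_t hash;                    /* flow hash, e.g. over the 5-tuple */
	uint16_t queue;                   /* queue that last handled this flow */
	int      valid;
};

static struct flow_entry flow_cache[FLOW_ENTRIES];

/* Learning side: remember which queue a flow was last seen on. */
static void flow_update(uint32_t hash, uint16_t queue)
{
	struct flow_entry *e = &flow_cache[hash & (FLOW_ENTRIES - 1)];

	e->hash  = hash;
	e->queue = queue;
	e->valid = 1;
}

/* Steering side: pick the queue for an incoming packet of a flow. */
static uint16_t select_queue(uint32_t hash)
{
	struct flow_entry *e = &flow_cache[hash & (FLOW_ENTRIES - 1)];

	if (e->valid && e->hash == hash)
		return e->queue;          /* stick to the queue the flow used */

	return hash % NUM_QUEUES;         /* unknown flow: spread by hash */
}

int main(void)
{
	uint32_t stream_hash = 0xdeadbeef;  /* one TCP stream, one hash */

	flow_update(stream_hash, 2);        /* pretend the guest sent from queue 2 */
	printf("queue for the stream: %u\n", select_queue(stream_hash));
	return 0;
}

The point is that as long as the flow hash is stable, all packets of one TCP
stream should land on one virtio queue, which is what makes the interrupt
pattern reported at the start of this thread surprising.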
>
> Thanks,
> Venkatesh
>



Re: vhost + multiqueue + RSS question.

2014-11-16 Thread Jason Wang
On 11/17/2014 02:56 AM, Michael S. Tsirkin wrote:
> On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote:
>> Hi Michael,
>>
>>  I am playing with vhost multiqueue capability and have a question about
>> vhost multiqueue and RSS (receive side scaling). My setup has Mellanox
>> ConnectX-3 NIC which supports multiqueue and RSS. Network related
>> parameters for qemu are:
>>
>>-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4
>>-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10
>>
>> In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue.
>>
>> I am running one tcp stream into the guest using iperf. Since there is
>> only one tcp stream I expect it to be handled by one queue only, but
>> this seems not to be the case. ethtool -S on the host shows that the
>> stream is handled by one queue in the NIC, just as I would expect,
>> but in the guest all 4 virtio-input interrupts are incremented. Am I
>> missing any configuration?
> I don't see anything obviously wrong with what you describe.
> Maybe, somehow, same irqfd got bound to multiple MSI vectors?
> To see, can you try dumping struct kvm_irqfd that's passed to kvm?
>
>
>> --
>>  Gleb.

This sounds like a regression; which kernel/qemu versions did you use?


RE: vhost + multiqueue + RSS question.

2014-11-16 Thread Venkateswara Rao Nandigam
I have a question related to this topic. How do you set the RSS key on the
Mellanox NIC? I mean, from your guest?

If it is being set as part of the host driver, is there a way to set it from the
guest? I mean my guest will choose an RSS key and will try to set it on the physical NIC.

Thanks,
Venkatesh

-Original Message-
From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On Behalf Of 
Michael S. Tsirkin
Sent: Monday, November 17, 2014 12:26 AM
To: Gleb Natapov
Cc: kvm@vger.kernel.org; Jason Wang; virtualizat...@lists.linux-foundation.org
Subject: Re: vhost + multiqueue + RSS question.

On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote:
> Hi Michael,
> 
>  I am playing with vhost multiqueue capability and have a question 
> about vhost multiqueue and RSS (receive side scaling). My setup has 
> Mellanox
> ConnectX-3 NIC which supports multiqueue and RSS. Network related 
> parameters for qemu are:
> 
>-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4
>-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10
> 
> In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue.
> 
> I am running one tcp stream into the guest using iperf. Since there is 
> only one tcp stream I expect it to be handled by one queue only, but 
> this seems not to be the case. ethtool -S on the host shows that the 
> stream is handled by one queue in the NIC, just as I would expect, 
> but in the guest all 4 virtio-input interrupts are incremented. Am I 
> missing any configuration?

I don't see anything obviously wrong with what you describe.
Maybe, somehow, same irqfd got bound to multiple MSI vectors?
To see, can you try dumping struct kvm_irqfd that's passed to kvm?


> --
>   Gleb.


Re: [PATCH 0/3] KVM: simplification to the memslots code

2014-11-16 Thread Takuya Yoshikawa
On 2014/11/14 20:11, Paolo Bonzini wrote:
> Hi Igor and Takuya,
> 
> here are a few small patches that simplify __kvm_set_memory_region
> and associated code.  Can you please review them?

Ah, already queued.  Sorry for being late to respond.

Takuya

> 
> Thanks,
> 
> Paolo
> 
> Paolo Bonzini (3):
>kvm: memslots: track id_to_index changes during the insertion sort
>kvm: commonize allocation of the new memory slots
>kvm: simplify update_memslots invocation
> 
>   virt/kvm/kvm_main.c | 87 
> ++---
>   1 file changed, 36 insertions(+), 51 deletions(-)
> 




Re: [RFC][PATCH 2/2] kvm: x86: mmio: fix setting the present bit of mmio spte

2014-11-16 Thread Chen, Tiejun

On 2014/11/14 18:11, Paolo Bonzini wrote:



On 14/11/2014 10:31, Tiejun Chen wrote:

In the PAE case maxphyaddr may be 52 bits as well, so we also need to
disable the mmio page fault. Here we can check MMIO_SPTE_GEN_HIGH_SHIFT
directly to determine whether we should set the present bit, and do a
little cleanup as well.

Signed-off-by: Tiejun Chen 
---
  arch/x86/include/asm/kvm_host.h |  1 +
  arch/x86/kvm/mmu.c  | 23 +++
  arch/x86/kvm/x86.c  | 30 --
  3 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dc932d3..667f2b6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -809,6 +809,7 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 struct kvm_memory_slot *slot,
 gfn_t gfn_offset, unsigned long mask);
  void kvm_mmu_zap_all(struct kvm *kvm);
+void kvm_set_mmio_spte_mask(void);
  void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ac1c4de..8e4be36 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -295,6 +295,29 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte)
return likely(kvm_gen == spte_gen);
  }

+/*
+ * Set the reserved bits and the present bit of a paging-structure
+ * entry to generate a page fault with PFER.RSV = 1.
+ */
+void kvm_set_mmio_spte_mask(void)
+{
+   u64 mask;
+   int maxphyaddr = boot_cpu_data.x86_phys_bits;
+
+   /* Mask the reserved physical address bits. */
+   mask = rsvd_bits(maxphyaddr, MMIO_SPTE_GEN_HIGH_SHIFT - 1);
+
+   /* Magic bits are always reserved for 32bit host. */
+   mask |= 0x3ull << 62;


This should be enough to trigger the page fault on PAE systems.

The problem is specific to non-EPT 64-bit hosts, where the PTEs have no
reserved bits beyond 51:MAXPHYADDR.  On EPT we use WX- permissions to
trigger an EPT misconfig; on 32-bit systems we have bit 62.


Thanks for your explanation.




+   /* Set the present bit to enable mmio page fault. */
+   if (maxphyaddr < MMIO_SPTE_GEN_HIGH_SHIFT)
+   mask = PT_PRESENT_MASK;


Shouldn't this be "|=" anyway, instead of "="?



Yeah, I just missed this. Thanks a lot, I will fix it in the next revision.

Thanks
Tiejun
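
For what it's worth, a small user-space sketch of the mask computation
(with the constants copied in by hand, so values such as
MMIO_SPTE_GEN_HIGH_SHIFT = 52 are assumptions here, not a quote of the
kernel headers) shows what "=" versus "|=" changes:

/*
 * Sketch of the mask computation discussed above, with the constants
 * reproduced locally; treat the values as illustrative only.
 */
#include <stdio.h>
#include <stdint.h>

#define MMIO_SPTE_GEN_HIGH_SHIFT 52          /* assumed value */
#define PT_PRESENT_MASK          (1ULL << 0)

static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

static uint64_t mmio_spte_mask(int maxphyaddr, int use_or_assign)
{
	uint64_t mask;

	/* Mask the reserved physical address bits. */
	mask = rsvd_bits(maxphyaddr, MMIO_SPTE_GEN_HIGH_SHIFT - 1);

	/* Magic bits are always reserved for the 32-bit host. */
	mask |= 0x3ull << 62;

	if (maxphyaddr < MMIO_SPTE_GEN_HIGH_SHIFT) {
		if (use_or_assign)
			mask |= PT_PRESENT_MASK;   /* keeps the reserved bits */
		else
			mask = PT_PRESENT_MASK;    /* the bug: throws them away */
	}
	return mask;
}

int main(void)
{
	printf("maxphyaddr=46, '=' : %#llx\n",
	       (unsigned long long)mmio_spte_mask(46, 0));
	printf("maxphyaddr=46, '|=': %#llx\n",
	       (unsigned long long)mmio_spte_mask(46, 1));
	return 0;
}

With "=" the reserved-bits part computed just above is silently discarded
whenever maxphyaddr is small, which is the point of Paolo's "|=" remark.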


Re: [PATCH 2/3] kvm: commonize allocation of the new memory slots

2014-11-16 Thread Takuya Yoshikawa
On 2014/11/14 20:12, Paolo Bonzini wrote:
> The two kmemdup invocations can be unified.  I find that the new
> placement of the comment makes it easier to see what happens.

A lot easier to follow the logic.

Reviewed-by: Takuya Yoshikawa 

> 
> Signed-off-by: Paolo Bonzini 
> ---
>   virt/kvm/kvm_main.c | 28 +++-
>   1 file changed, 11 insertions(+), 17 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c8ff99cc0ccb..7bfc842b96d7 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -865,11 +865,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   goto out_free;
>   }
>   
> + slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
> + GFP_KERNEL);
> + if (!slots)
> + goto out_free;
> +
>   if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
> - slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
> - GFP_KERNEL);
> - if (!slots)
> - goto out_free;
>   slot = id_to_memslot(slots, mem->slot);
>   slot->flags |= KVM_MEMSLOT_INVALID;
>   
> @@ -885,6 +886,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
>*  - kvm_is_visible_gfn (mmu_check_roots)
>*/
>   kvm_arch_flush_shadow_memslot(kvm, slot);
> +
> + /*
> +  * We can re-use the old_memslots from above, the only 
> difference
> +  * from the currently installed memslots is the invalid flag.  
> This
> +  * will get overwritten by update_memslots anyway.
> +  */
>   slots = old_memslots;
>   }
>   
> @@ -892,19 +899,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
>   if (r)
>   goto out_slots;
>   
> - r = -ENOMEM;
> - /*
> -  * We can re-use the old_memslots from above, the only difference
> -  * from the currently installed memslots is the invalid flag.  This
> -  * will get overwritten by update_memslots anyway.
> -  */
> - if (!slots) {
> - slots = kmemdup(kvm->memslots, sizeof(struct kvm_memslots),
> - GFP_KERNEL);
> - if (!slots)
> - goto out_free;
> - }
> -
>   /* actual memory is freed via old in kvm_free_physmem_slot below */
>   if (change == KVM_MR_DELETE) {
>   new.dirty_bitmap = NULL;
> 




Re: [RFC][PATCH 1/2] kvm: x86: mmu: return zero if s > e in rsvd_bits()

2014-11-16 Thread Chen, Tiejun

On 2014/11/14 18:06, Paolo Bonzini wrote:



On 14/11/2014 10:31, Tiejun Chen wrote:

In some real scenarios 'start' may not be less than 'end', for example
when maxphyaddr = 52.

Signed-off-by: Tiejun Chen 
---
  arch/x86/kvm/mmu.h | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index bde8ee7..0e98b5e 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -58,6 +58,8 @@

  static inline u64 rsvd_bits(int s, int e)
  {
+   if (unlikely(s > e))
+   return 0;
return ((1ULL << (e - s + 1)) - 1) << s;
  }




s == e + 1 is supported:

((1ULL << (e - (e + 1) + 1)) - 1) << s ==


((1ULL << (e - (e + 1) + 1)) - 1) << s
= ((1ULL << ((e - e - 1) + 1)) - 1) << s
= ((1ULL << ((-1) + 1)) - 1) << s
= (1ULL << (0) - 1) << s
= (1ULL << (-1)) << s

Am I missing something?

Thanks
Tiejun


((1ULL << 0) - 1) << s ==
0

Is there any case where s is even bigger?

Paolo
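
The disagreement above comes down to operator precedence: in C, '-' binds
tighter than '<<', so "(1ULL << 0) - 1" is "1 - 1", not "1ULL << (0 - 1)".
A standalone copy of the helper (as quoted above) confirms the s == e + 1
case Paolo describes:

/*
 * Standalone copy of rsvd_bits() to check the s == e + 1 case.
 */
#include <stdio.h>
#include <stdint.h>

static uint64_t rsvd_bits(int s, int e)
{
	return ((1ULL << (e - s + 1)) - 1) << s;
}

int main(void)
{
	/* s == e + 1, e.g. maxphyaddr == 52 with e == 51:
	 * ((1ULL << 0) - 1) << 52  ==  0ULL << 52  ==  0
	 */
	printf("rsvd_bits(52, 51) = %#llx\n",
	       (unsigned long long)rsvd_bits(52, 51));

	/* The misreading 1ULL << (0 - 1) would be a negative shift
	 * (undefined behaviour), but that is not what the expression
	 * says, since '-' is evaluated before '<<'.
	 */
	return 0;
}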




Re: nested KVM slower than QEMU with gnumach guest kernel

2014-11-16 Thread Samuel Thibault
Hello,

Jan Kiszka, on Wed 12 Nov 2014 00:42:52 +0100, wrote:
> On 2014-11-11 19:55, Samuel Thibault wrote:
> > jenkins.debian.net is running inside a KVM VM, and it runs nested
> > KVM guests for its installation attempts.  This goes fine with Linux
> > kernels, but it is extremely slow with gnumach kernels.

> You can try to catch a trace (ftrace) on the physical host.
> 
> I suspect the setup forces a lot of instruction emulation, either on L0
> or L1. And that is slower than QEMU, as KVM does not optimize the way QEMU does.

Here is a sample of trace-cmd output dump: the same kind of pattern
repeats over and over, with EXTERNAL_INTERRUPT happening mostly
every other microsecond:

 qemu-system-x86-9752  [003]  4106.187755: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xa02848b1 info 0 80f6
 qemu-system-x86-9752  [003]  4106.187756: kvm_entry: vcpu 0
 qemu-system-x86-9752  [003]  4106.187757: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xa02848b1 info 0 80f6
 qemu-system-x86-9752  [003]  4106.187758: kvm_entry: vcpu 0
 qemu-system-x86-9752  [003]  4106.187759: kvm_exit: reason EXTERNAL_INTERRUPT rip 0xa02848b1 info 0 80f6
 qemu-system-x86-9752  [003]  4106.187760: kvm_entry: vcpu 0

The various functions being interrupted are vmx_vcpu_run
(0xa02848b1 and 0xa0284972), handle_io
(0xa027ee62), vmx_get_cpl (0xa027a7de),
load_vmcs12_host_state (0xa027ea31), native_read_tscp
(0x81050a84), native_write_msr_safe (0x81050aa6),
vmx_decache_cr0_guest_bits (0xa027a384),
vmx_handle_external_intr (0xa027a54d).

AIUI, the external interrupt is 0xf6, i.e. Linux' IRQ_WORK_VECTOR.  I
however don't see any of them, neither in L0's /proc/interrupts, nor in
L1's /proc/interrupts...

Samuel


trace.bz2
Description: Binary data


[PATCH] KVM: x86: Fix lost interrupt on irr_pending race

2014-11-16 Thread Nadav Amit
apic_find_highest_irr assumes irr_pending is set if any vector in APIC_IRR is
set.  If this assumption is broken and apicv is disabled, the injection of
interrupts may be deferred until another interrupt is delivered to the guest.
Ultimately, if no other interrupt should be injected to that vCPU, the pending
interrupt may be lost.

commit 56cc2406d68c ("KVM: nVMX: fix "acknowledge interrupt on exit" when APICv
is in use") changed the behavior of apic_clear_irr so irr_pending is cleared
after setting APIC_IRR vector. After this commit, if apic_set_irr and
apic_clear_irr run simultaneously, a race may occur, resulting in APIC_IRR
vector set, and irr_pending cleared. In the following example, assume a single
vector is set in IRR prior to calling apic_clear_irr:

apic_set_irr                          apic_clear_irr
------------------------------------------------------------------------
apic->irr_pending = true;
                                      apic_clear_vector(...);
                                      vec = apic_search_irr(apic);
                                      // => vec == -1
apic_set_vector(...);
                                      apic->irr_pending = (vec != -1);
                                      // => apic->irr_pending == false

Nonetheless, it appears the race might even occur prior to this commit:

apic_set_irr                          apic_clear_irr
------------------------------------------------------------------------
apic->irr_pending = true;
                                      apic->irr_pending = false;
                                      apic_clear_vector(...);
                                      if (apic_search_irr(apic) != -1)
                                              apic->irr_pending = true;
                                      // => apic->irr_pending == false
apic_set_vector(...);

Fixing this issue by:
1. Restoring the previous behavior of apic_clear_irr: clear irr_pending, call
   apic_clear_vector, and then if APIC_IRR is non-zero, set irr_pending.
2. On apic_set_irr: first call apic_set_vector, then set irr_pending.

Signed-off-by: Nadav Amit 
---
 arch/x86/kvm/lapic.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 6e8ce5a..e0e5642 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -341,8 +341,12 @@ EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
 static inline void apic_set_irr(int vec, struct kvm_lapic *apic)
 {
-   apic->irr_pending = true;
apic_set_vector(vec, apic->regs + APIC_IRR);
+   /*
+* irr_pending must be true if any interrupt is pending; set it after
+* APIC_IRR to avoid race with apic_clear_irr
+*/
+   apic->irr_pending = true;
 }
 
 static inline int apic_search_irr(struct kvm_lapic *apic)
@@ -374,13 +378,15 @@ static inline void apic_clear_irr(int vec, struct kvm_lapic *apic)
 
vcpu = apic->vcpu;
 
-   apic_clear_vector(vec, apic->regs + APIC_IRR);
-   if (unlikely(kvm_apic_vid_enabled(vcpu->kvm)))
+   if (unlikely(kvm_apic_vid_enabled(vcpu->kvm))) {
/* try to update RVI */
+   apic_clear_vector(vec, apic->regs + APIC_IRR);
kvm_make_request(KVM_REQ_EVENT, vcpu);
-   else {
-   vec = apic_search_irr(apic);
-   apic->irr_pending = (vec != -1);
+   } else {
+   apic->irr_pending = false;
+   apic_clear_vector(vec, apic->regs + APIC_IRR);
+   if (apic_search_irr(apic) != -1)
+   apic->irr_pending = true;
}
 }
 
-- 
1.9.1



Re: vhost + multiqueue + RSS question.

2014-11-16 Thread Michael S. Tsirkin
On Sun, Nov 16, 2014 at 06:18:18PM +0200, Gleb Natapov wrote:
> Hi Michael,
> 
>  I am playing with vhost multiqueue capability and have a question about
> vhost multiqueue and RSS (receive side scaling). My setup has Mellanox
> ConnectX-3 NIC which supports multiqueue and RSS. Network related
> parameters for qemu are:
> 
>-netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4
>-device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10
> 
> In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue.
> 
> I am running one tcp stream into the guest using iperf. Since there is
> only one tcp stream I expect it to be handled by one queue only, but
> this seems not to be the case. ethtool -S on the host shows that the
> stream is handled by one queue in the NIC, just as I would expect,
> but in the guest all 4 virtio-input interrupts are incremented. Am I
> missing any configuration?

I don't see anything obviously wrong with what you describe.
Maybe, somehow, same irqfd got bound to multiple MSI vectors?
To see, can you try dumping struct kvm_irqfd that's passed to kvm?
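
One way to do that (purely a debugging sketch against the irqfd setup path
in virt/kvm/eventfd.c; field names are taken from the uapi struct kvm_irqfd,
and the exact placement depends on the kernel version) is a one-off printk
when the KVM_IRQFD ioctl comes in, e.g. at the top of kvm_irqfd_assign():

/* Debugging sketch only: log every KVM_IRQFD request so a duplicate
 * binding (the same eventfd showing up for several GSIs, or several
 * MSI vectors sharing one irqfd) becomes visible in dmesg.
 */
static void dump_irqfd_args(const struct kvm_irqfd *args)
{
	pr_info("KVM_IRQFD: fd=%u gsi=%u flags=%#x resamplefd=%u\n",
		args->fd, args->gsi, args->flags, args->resamplefd);
}

With queues=4 and vectors=10 one would expect four distinct fd/gsi pairs,
one per virtio-input queue; the same fd showing up more than once would
point at the kind of binding problem suspected above.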


> --
>   Gleb.


vhost + multiqueue + RSS question.

2014-11-16 Thread Gleb Natapov
Hi Michael,

 I am playing with vhost multiqueue capability and have a question about
vhost multiqueue and RSS (receive side scaling). My setup has Mellanox
ConnectX-3 NIC which supports multiqueue and RSS. Network related
parameters for qemu are:

   -netdev tap,id=hn0,script=qemu-ifup.sh,vhost=on,queues=4
   -device virtio-net-pci,netdev=hn0,id=nic1,mq=on,vectors=10

In a guest I ran "ethtool -L eth0 combined 4" to enable multiqueue.

I am running one tcp stream into the guest using iperf. Since there is
only one tcp stream I expect it to be handled by one queue only, but
this seems not to be the case. ethtool -S on the host shows that the
stream is handled by one queue in the NIC, just as I would expect,
but in the guest all 4 virtio-input interrupts are incremented. Am I
missing any configuration?

--
Gleb.


Re: Benchmarking for vhost polling patch

2014-11-16 Thread Michael S. Tsirkin
On Sun, Nov 16, 2014 at 02:08:49PM +0200, Razya Ladelsky wrote:
> Razya Ladelsky/Haifa/IBM@IBMIL wrote on 29/10/2014 02:38:31 PM:
> 
> > From: Razya Ladelsky/Haifa/IBM@IBMIL
> > To: m...@redhat.com
> > Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, 
> > Eran Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, 
> > Joel Nider/Haifa/IBM@IBMIL, abel.gor...@gmail.com, kvm@vger.kernel.org
> > Date: 29/10/2014 02:38 PM
> > Subject: Benchmarking for vhost polling patch
> > 
> > Hi Michael,
> > 
> > Following the polling patch thread: http://marc.info/?
> > l=kvm&m=140853271510179&w=2, 
> > I changed poll_stop_idle to be counted in microseconds, and carried out
> > experiments using varying values of this parameter.
> > 
> > If it makes sense to you, I will continue with the other changes 
> > requested for 
> > the patch.
> > 
> > Thank you,
> > Razya
> > 
> > 
> 
> Dear Michael,
> I'm still interested in hearing your opinion about these numbers 
> http://marc.info/?l=kvm&m=141458631532669&w=2, 
> and whether it is worthwhile to continue with the polling patch.
> Thank you,
> Razya 
> 
> 
> > 
> > 

Hi Razya,
On the netperf benchmark, it looks like polling=10 gives a modest but
measurable gain.  So from that perspective it might be worth it if it's
not too much code, though we'll need to spend more time checking the
macro effect - we barely moved the needle on the macro benchmark and
that is suspicious.
Is there a chance you are actually trading latency for throughput?
Do you observe any effect on latency?
How about trying some other benchmark, e.g. NFS?


Also, I am wondering:

Since the vhost thread is polling in the kernel anyway, shouldn't
we try to poll the host NIC as well?
That would likely reduce at least the latency significantly,
wouldn't it?
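
For context, the mechanism being benchmarked is roughly the following (a
schematic user-space sketch with made-up names and stubbed-out virtqueue
hooks, not the actual vhost patch; poll_stop_idle is in microseconds, as in
Razya's experiments):

/*
 * Schematic sketch of the vhost polling idea under discussion: after
 * serving work, keep polling the virtqueue for up to poll_stop_idle
 * microseconds before going back to sleeping on the guest's kick.
 */
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static uint64_t poll_stop_idle_us = 10;     /* idle polling budget */

static uint64_t now_usecs(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000ULL + ts.tv_nsec / 1000;
}

/* Stand-ins for the real virtqueue hooks. */
static bool vq_has_work(void)    { return false; }  /* pretend: no work */
static void vq_handle_work(void) { }
static void vq_wait_for_kick(void) { }              /* would sleep here */

static void vhost_worker_iteration(void)
{
	uint64_t idle_start = now_usecs();

	/* Polling phase: spin while work keeps arriving, or until we
	 * have been idle for poll_stop_idle_us microseconds. */
	while (now_usecs() - idle_start < poll_stop_idle_us) {
		if (vq_has_work()) {
			vq_handle_work();
			idle_start = now_usecs();   /* reset the idle timer */
		}
	}

	/* Idle budget exhausted: fall back to kick/interrupt driven mode. */
	vq_wait_for_kick();
}

int main(void)
{
	vhost_worker_iteration();               /* one iteration for demo */
	return 0;
}

The trade-off raised above is visible in the sketch: the polling window
burns CPU in exchange for skipping the kick/wakeup path, which is why the
latency and macro-benchmark numbers matter.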


-- 
MST


Benchmarking for vhost polling patch

2014-11-16 Thread Razya Ladelsky
Razya Ladelsky/Haifa/IBM@IBMIL wrote on 29/10/2014 02:38:31 PM:

> From: Razya Ladelsky/Haifa/IBM@IBMIL
> To: m...@redhat.com
> Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, 
> Eran Raichstein/Haifa/IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, 
> Joel Nider/Haifa/IBM@IBMIL, abel.gor...@gmail.com, kvm@vger.kernel.org
> Date: 29/10/2014 02:38 PM
> Subject: Benchmarking for vhost polling patch
> 
> Hi Michael,
> 
> Following the polling patch thread: http://marc.info/?
> l=kvm&m=140853271510179&w=2, 
> I changed poll_stop_idle to be counted in microseconds, and carried out
> experiments using varying values of this parameter.
> 
> If it makes sense to you, I will continue with the other changes 
> requested for 
> the patch.
> 
> Thank you,
> Razya
> 
> 

Dear Michael,
I'm still interested in hearing your opinion about these numbers 
http://marc.info/?l=kvm&m=141458631532669&w=2, 
and whether it is worthwhile to continue with the polling patch.
Thank you,
Razya 


> 
> 
