Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2012-01-15 Thread Michael S. Tsirkin
On Thu, Jan 12, 2012 at 04:09:17PM +0200, Avi Kivity wrote:
 On 10/11/2011 08:38 PM, Michael S. Tsirkin wrote:
  To forward an interrupt to a vcpu that runs on
  a host cpu different from the current one,
  we need an ipi which likely will cost us as much
  as delivering the interrupt directly to that cpu would.
 
  Set irq affinity hint to point there, irq balancer
  can then take this into accound and balance
  interrupts accordingly.
 
   
  +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq)
  +{
  +   const struct cpumask *mask;
  +   /* raw_smp_processor_id() is ok here: if we get preempted we can get a
  +* wrong value but we don't mind much. */
  +   if (host_irq = 0  unlikely(vcpu-cpu != raw_smp_processor_id())) {
  +   mask = get_cpu_mask(vcpu-cpu);
  +   irq_set_affinity_hint(host_irq, mask);
  +   }
  +}
  +   
   int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
  struct kvm_lapic_irq *irq, int host_irq)
   {
  @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
  kvm_lapic *src,
  if (r  0)
  r = 0;
  r += kvm_apic_set_irq(vcpu, irq);
  +   kvm_vcpu_host_irq_hint(vcpu, host_irq);
 
 Doing this every time seems excessive.

Please note this doesn't modify the mapping in the hardware:
it only updates the hint for the irqbalancer,
and only if we target a CPU which is different
from the current one.

 How about doing it every N interrupts?

That's certainly easy.

 We can even collect information about which vcpus were
 targeted, and then use a mask instead of just one vcpu.

Yes, we can, but it certainly becomes much more complex.
The motivatin behind using the same host cpu is reducing the
number of IPIs. If the interrupt is bouncing between
guest VCPUs, this optimizaion won't work, so what's the
theory behind using a mask?
How about detecting an unstable vcpu mapping and
completely disabling the optimization, instead?

 -- 
 error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2012-01-12 Thread Avi Kivity
On 10/11/2011 08:38 PM, Michael S. Tsirkin wrote:
 To forward an interrupt to a vcpu that runs on
 a host cpu different from the current one,
 we need an ipi which likely will cost us as much
 as delivering the interrupt directly to that cpu would.

 Set irq affinity hint to point there, irq balancer
 can then take this into accound and balance
 interrupts accordingly.

  
 +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq)
 +{
 + const struct cpumask *mask;
 + /* raw_smp_processor_id() is ok here: if we get preempted we can get a
 +  * wrong value but we don't mind much. */
 + if (host_irq = 0  unlikely(vcpu-cpu != raw_smp_processor_id())) {
 + mask = get_cpu_mask(vcpu-cpu);
 + irq_set_affinity_hint(host_irq, mask);
 + }
 +}
 + 
  int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
   struct kvm_lapic_irq *irq, int host_irq)
  {
 @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
 kvm_lapic *src,
   if (r  0)
   r = 0;
   r += kvm_apic_set_irq(vcpu, irq);
 + kvm_vcpu_host_irq_hint(vcpu, host_irq);

Doing this every time seems excessive.  How about doing it every N
interrupts?  We can even collect information about which vcpus were
targeted, and then use a mask instead of just one vcpu.

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2012-01-11 Thread Marcelo Tosatti
On Mon, Oct 17, 2011 at 07:04:40PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 02:07:41PM -0200, Marcelo Tosatti wrote:
Configurations to consider, all common ones used for assigned devices?
   
   I mean, besides round robin, any other modes that
   have an issue? Interrupts can also be multicast,
   I think, but we probably don't care what happens
   to affinity then, as msi interrupts are probably never
   broadcast ...
  
  There is also lowest priority, which can be used with MSI.
 
 
 So the following will probably address that comment?

Yes, it does. Patch looks fine.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-17 Thread Marcelo Tosatti
On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote:
 On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote:
  On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote:
   To forward an interrupt to a vcpu that runs on
   a host cpu different from the current one,
   we need an ipi which likely will cost us as much
   as delivering the interrupt directly to that cpu would.
   
   Set irq affinity hint to point there, irq balancer
   can then take this into accound and balance
   interrupts accordingly.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
virt/kvm/assigned-dev.c |8 +---
virt/kvm/irq_comm.c |   17 -
2 files changed, 21 insertions(+), 4 deletions(-)
   
   diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
   index f89f138..b579777 100644
   --- a/virt/kvm/assigned-dev.c
   +++ b/virt/kvm/assigned-dev.c
   @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm,
 for (i = 0; i  assigned_dev-entries_nr; i++)
 disable_irq(assigned_dev-host_msix_entries[i].vector);

   - for (i = 0; i  assigned_dev-entries_nr; i++)
   - free_irq(assigned_dev-host_msix_entries[i].vector,
   -  (void *)assigned_dev);
   + for (i = 0; i  assigned_dev-entries_nr; i++) {
   + u32 vector = assigned_dev-host_msix_entries[i].vector;
   + irq_set_affinity_hint(vector, NULL);
   + free_irq(vector, (void *)assigned_dev);
   + }

 assigned_dev-entries_nr = 0;
 kfree(assigned_dev-host_msix_entries);
   diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
   index ac8b629..68b1f7c 100644
   --- a/virt/kvm/irq_comm.c
   +++ b/virt/kvm/irq_comm.c
   @@ -22,6 +22,7 @@

#include linux/kvm_host.h
#include linux/slab.h
   +#include linux/interrupt.h
#include trace/events/kvm.h

#include asm/msidef.h
   @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct 
   kvm_lapic_irq *irq)
#endif
}

   +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq)
   +{
   + const struct cpumask *mask;
   + /* raw_smp_processor_id() is ok here: if we get preempted we can get a
   +  * wrong value but we don't mind much. */
   + if (host_irq = 0  unlikely(vcpu-cpu != raw_smp_processor_id())) {
   + mask = get_cpu_mask(vcpu-cpu);
   + irq_set_affinity_hint(host_irq, mask);
   + }
   +}
  
  Unsure about the internals of irq_set_affinity_hint, but AFAICS its
  exported so that irqbalance in userspace can make a decision.
 
 Yes. Pls note at the moment there's no hint so irqbalance
 will likely try to move the irq away from vcpu if that
 is doing a lot of work. My patch tries to correct that.
 
  If that is the case, then irqbalance update rate should be high enough
  to catch up with a vcpu migrating betweens cpus (which initially does
  not appear a sensible arrangement).
 
 At least for pinned vcpus, that's almost sure to be the case :)

What i mean is that the frequency of a vcpu migrating between cpus
might be higher than what irqbalance can cope with.

  The decision to have the host interrupt follow the vcpu seems a good
  one, given that it saves an IPI and is potentially more cache friendly
  overall.
 
  And AFAICS its more intelligent for the device assignment case than
  anything irqbalance can come up with
 
 Do you just propose overwriting affinity set by userspace then?

Yes.

 My concern would be to avoid breaking setups some users have,
 with carefully manually optimized affinity for vcpus and device irqs.

They can disable automatic in-kernel affinity.

 
  (note it depends on how the APIC is
  configured, your patch ignores that).
 
 Could you clarify please? What is meant by 'it' in 'it depends'?

It means the target vcpu selection. It depends on how the guest
APIC is programmed.

 Which APIC - host or guest - do you mean, and what are possible APIC
 configurations to consider?

Guest APIC. Guest APIC programmed with round robin would break the
static assignment on your patch.

Configurations to consider, all common ones used for assigned devices?


int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
 struct kvm_lapic_irq *irq, int host_irq)
{
   @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
   kvm_lapic *src,
 if (r  0)
 r = 0;
 r += kvm_apic_set_irq(vcpu, irq);
   + kvm_vcpu_host_irq_hint(vcpu, host_irq);
 } else if (kvm_lapic_enabled(vcpu)) {
 if (!lowest)
 lowest = vcpu;
   @@ -110,8 +123,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
   kvm_lapic *src,
 }
 }

   - if (lowest)
   + if (lowest) {
 r = 

Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-17 Thread Marcelo Tosatti
On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote:
 On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote:
  On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote:
   To forward an interrupt to a vcpu that runs on
   a host cpu different from the current one,
   we need an ipi which likely will cost us as much
   as delivering the interrupt directly to that cpu would.
   
   Set irq affinity hint to point there, irq balancer
   can then take this into accound and balance
   interrupts accordingly.
   
   Signed-off-by: Michael S. Tsirkin m...@redhat.com
   ---
virt/kvm/assigned-dev.c |8 +---
virt/kvm/irq_comm.c |   17 -
2 files changed, 21 insertions(+), 4 deletions(-)

Or maybe even automatic in-kernel irq affinity disabled by 
default (to avoid interference with userspace pinning). 
All of that if it makes sense performance wise, of course.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-17 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 08:58:59AM -0200, Marcelo Tosatti wrote:
 On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote:
  On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote:
   On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote:
To forward an interrupt to a vcpu that runs on
a host cpu different from the current one,
we need an ipi which likely will cost us as much
as delivering the interrupt directly to that cpu would.

Set irq affinity hint to point there, irq balancer
can then take this into accound and balance
interrupts accordingly.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 virt/kvm/assigned-dev.c |8 +---
 virt/kvm/irq_comm.c |   17 -
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index f89f138..b579777 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm,
for (i = 0; i  assigned_dev-entries_nr; i++)

disable_irq(assigned_dev-host_msix_entries[i].vector);
 
-   for (i = 0; i  assigned_dev-entries_nr; i++)
-   
free_irq(assigned_dev-host_msix_entries[i].vector,
-(void *)assigned_dev);
+   for (i = 0; i  assigned_dev-entries_nr; i++) {
+   u32 vector = 
assigned_dev-host_msix_entries[i].vector;
+   irq_set_affinity_hint(vector, NULL);
+   free_irq(vector, (void *)assigned_dev);
+   }
 
assigned_dev-entries_nr = 0;
kfree(assigned_dev-host_msix_entries);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index ac8b629..68b1f7c 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -22,6 +22,7 @@
 
 #include linux/kvm_host.h
 #include linux/slab.h
+#include linux/interrupt.h
 #include trace/events/kvm.h
 
 #include asm/msidef.h
@@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct 
kvm_lapic_irq *irq)
 #endif
 }
 
+static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq)
+{
+   const struct cpumask *mask;
+   /* raw_smp_processor_id() is ok here: if we get preempted we 
can get a
+* wrong value but we don't mind much. */
+   if (host_irq = 0  unlikely(vcpu-cpu != 
raw_smp_processor_id())) {
+   mask = get_cpu_mask(vcpu-cpu);
+   irq_set_affinity_hint(host_irq, mask);
+   }
+}
   
   Unsure about the internals of irq_set_affinity_hint, but AFAICS its
   exported so that irqbalance in userspace can make a decision.
  
  Yes. Pls note at the moment there's no hint so irqbalance
  will likely try to move the irq away from vcpu if that
  is doing a lot of work. My patch tries to correct that.
  
   If that is the case, then irqbalance update rate should be high enough
   to catch up with a vcpu migrating betweens cpus (which initially does
   not appear a sensible arrangement).
  
  At least for pinned vcpus, that's almost sure to be the case :)
 
 What i mean is that the frequency of a vcpu migrating between cpus
 might be higher than what irqbalance can cope with.
 
   The decision to have the host interrupt follow the vcpu seems a good
   one, given that it saves an IPI and is potentially more cache friendly
   overall.
  
   And AFAICS its more intelligent for the device assignment case than
   anything irqbalance can come up with
  
  Do you just propose overwriting affinity set by userspace then?
 
 Yes.
 
  My concern would be to avoid breaking setups some users have,
  with carefully manually optimized affinity for vcpus and device irqs.
 
 They can disable automatic in-kernel affinity.

This still means code needs to be changed ...
Anyway, what's the interface for that?

  
   (note it depends on how the APIC is
   configured, your patch ignores that).
  
  Could you clarify please? What is meant by 'it' in 'it depends'?
 
 It means the target vcpu selection. It depends on how the guest
 APIC is programmed.
 
  Which APIC - host or guest - do you mean, and what are possible APIC
  configurations to consider?
 
 Guest APIC. Guest APIC programmed with round robin would break the
 static assignment on your patch.

For round robin we might just want to disable this
automatic affinity?

 Configurations to consider, all common ones used for assigned devices?

I mean, besides round robin, any other modes that
have an issue? Interrupts can also be multicast,
I think, but we probably don't care what happens
to affinity then, as msi interrupts are probably never
broadcast ...

 
 int kvm_irq_delivery_to_apic(struct 

Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-17 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 09:09:46AM -0200, Marcelo Tosatti wrote:
 On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote:
  On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote:
   On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote:
To forward an interrupt to a vcpu that runs on
a host cpu different from the current one,
we need an ipi which likely will cost us as much
as delivering the interrupt directly to that cpu would.

Set irq affinity hint to point there, irq balancer
can then take this into accound and balance
interrupts accordingly.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 virt/kvm/assigned-dev.c |8 +---
 virt/kvm/irq_comm.c |   17 -
 2 files changed, 21 insertions(+), 4 deletions(-)
 
 Or maybe even automatic in-kernel irq affinity disabled by 
 default (to avoid interference with userspace pinning). 

Right. But what's the interface to enable it?
There's no standard interface, is there?

The advantage of hinting is that it works
with existing tools, without changing userspace at all.

 All of that if it makes sense performance wise, of course.

That's the point. Almost any benchmark I'm aware of is run with vcpus
pinned.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-17 Thread Marcelo Tosatti
On Mon, Oct 17, 2011 at 03:32:15PM +0200, Michael S. Tsirkin wrote:
 On Mon, Oct 17, 2011 at 08:58:59AM -0200, Marcelo Tosatti wrote:
  On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote:
   On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote:
On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote:
 To forward an interrupt to a vcpu that runs on
 a host cpu different from the current one,
 we need an ipi which likely will cost us as much
 as delivering the interrupt directly to that cpu would.
 
 Set irq affinity hint to point there, irq balancer
 can then take this into accound and balance
 interrupts accordingly.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  virt/kvm/assigned-dev.c |8 +---
  virt/kvm/irq_comm.c |   17 -
  2 files changed, 21 insertions(+), 4 deletions(-)
 
 diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
 index f89f138..b579777 100644
 --- a/virt/kvm/assigned-dev.c
 +++ b/virt/kvm/assigned-dev.c
 @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm,
   for (i = 0; i  assigned_dev-entries_nr; i++)
   
 disable_irq(assigned_dev-host_msix_entries[i].vector);
  
 - for (i = 0; i  assigned_dev-entries_nr; i++)
 - 
 free_irq(assigned_dev-host_msix_entries[i].vector,
 -  (void *)assigned_dev);
 + for (i = 0; i  assigned_dev-entries_nr; i++) {
 + u32 vector = 
 assigned_dev-host_msix_entries[i].vector;
 + irq_set_affinity_hint(vector, NULL);
 + free_irq(vector, (void *)assigned_dev);
 + }
  
   assigned_dev-entries_nr = 0;
   kfree(assigned_dev-host_msix_entries);
 diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
 index ac8b629..68b1f7c 100644
 --- a/virt/kvm/irq_comm.c
 +++ b/virt/kvm/irq_comm.c
 @@ -22,6 +22,7 @@
  
  #include linux/kvm_host.h
  #include linux/slab.h
 +#include linux/interrupt.h
  #include trace/events/kvm.h
  
  #include asm/msidef.h
 @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct 
 kvm_lapic_irq *irq)
  #endif
  }
  
 +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int 
 host_irq)
 +{
 + const struct cpumask *mask;
 + /* raw_smp_processor_id() is ok here: if we get preempted we 
 can get a
 +  * wrong value but we don't mind much. */
 + if (host_irq = 0  unlikely(vcpu-cpu != 
 raw_smp_processor_id())) {
 + mask = get_cpu_mask(vcpu-cpu);
 + irq_set_affinity_hint(host_irq, mask);
 + }
 +}

Unsure about the internals of irq_set_affinity_hint, but AFAICS its
exported so that irqbalance in userspace can make a decision.
   
   Yes. Pls note at the moment there's no hint so irqbalance
   will likely try to move the irq away from vcpu if that
   is doing a lot of work. My patch tries to correct that.
   
If that is the case, then irqbalance update rate should be high enough
to catch up with a vcpu migrating betweens cpus (which initially does
not appear a sensible arrangement).
   
   At least for pinned vcpus, that's almost sure to be the case :)
  
  What i mean is that the frequency of a vcpu migrating between cpus
  might be higher than what irqbalance can cope with.
  
The decision to have the host interrupt follow the vcpu seems a good
one, given that it saves an IPI and is potentially more cache friendly
overall.
   
And AFAICS its more intelligent for the device assignment case than
anything irqbalance can come up with
   
   Do you just propose overwriting affinity set by userspace then?
  
  Yes.
  
   My concern would be to avoid breaking setups some users have,
   with carefully manually optimized affinity for vcpus and device irqs.
  
  They can disable automatic in-kernel affinity.
 
 This still means code needs to be changed ...
 Anyway, what's the interface for that?
 
   
(note it depends on how the APIC is
configured, your patch ignores that).
   
   Could you clarify please? What is meant by 'it' in 'it depends'?
  
  It means the target vcpu selection. It depends on how the guest
  APIC is programmed.
  
   Which APIC - host or guest - do you mean, and what are possible APIC
   configurations to consider?
  
  Guest APIC. Guest APIC programmed with round robin would break the
  static assignment on your patch.
 
 For round robin we might just want to disable this
 automatic affinity?

OK.

  Configurations to consider, all common ones used for assigned devices?
 
 I mean, besides round robin, any other modes that
 have an issue? Interrupts can also be 

Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-17 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 02:07:41PM -0200, Marcelo Tosatti wrote:
 On Mon, Oct 17, 2011 at 03:32:15PM +0200, Michael S. Tsirkin wrote:
  On Mon, Oct 17, 2011 at 08:58:59AM -0200, Marcelo Tosatti wrote:
   On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote:
On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote:
 On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote:
  To forward an interrupt to a vcpu that runs on
  a host cpu different from the current one,
  we need an ipi which likely will cost us as much
  as delivering the interrupt directly to that cpu would.
  
  Set irq affinity hint to point there, irq balancer
  can then take this into accound and balance
  interrupts accordingly.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   virt/kvm/assigned-dev.c |8 +---
   virt/kvm/irq_comm.c |   17 -
   2 files changed, 21 insertions(+), 4 deletions(-)
  
  diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
  index f89f138..b579777 100644
  --- a/virt/kvm/assigned-dev.c
  +++ b/virt/kvm/assigned-dev.c
  @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm,
  for (i = 0; i  assigned_dev-entries_nr; i++)
  
  disable_irq(assigned_dev-host_msix_entries[i].vector);
   
  -   for (i = 0; i  assigned_dev-entries_nr; i++)
  -   
  free_irq(assigned_dev-host_msix_entries[i].vector,
  -(void *)assigned_dev);
  +   for (i = 0; i  assigned_dev-entries_nr; i++) {
  +   u32 vector = 
  assigned_dev-host_msix_entries[i].vector;
  +   irq_set_affinity_hint(vector, NULL);
  +   free_irq(vector, (void *)assigned_dev);
  +   }
   
  assigned_dev-entries_nr = 0;
  kfree(assigned_dev-host_msix_entries);
  diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
  index ac8b629..68b1f7c 100644
  --- a/virt/kvm/irq_comm.c
  +++ b/virt/kvm/irq_comm.c
  @@ -22,6 +22,7 @@
   
   #include linux/kvm_host.h
   #include linux/slab.h
  +#include linux/interrupt.h
   #include trace/events/kvm.h
   
   #include asm/msidef.h
  @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct 
  kvm_lapic_irq *irq)
   #endif
   }
   
  +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int 
  host_irq)
  +{
  +   const struct cpumask *mask;
  +   /* raw_smp_processor_id() is ok here: if we get preempted we 
  can get a
  +* wrong value but we don't mind much. */
  +   if (host_irq = 0  unlikely(vcpu-cpu != 
  raw_smp_processor_id())) {
  +   mask = get_cpu_mask(vcpu-cpu);
  +   irq_set_affinity_hint(host_irq, mask);
  +   }
  +}
 
 Unsure about the internals of irq_set_affinity_hint, but AFAICS its
 exported so that irqbalance in userspace can make a decision.

Yes. Pls note at the moment there's no hint so irqbalance
will likely try to move the irq away from vcpu if that
is doing a lot of work. My patch tries to correct that.

 If that is the case, then irqbalance update rate should be high enough
 to catch up with a vcpu migrating betweens cpus (which initially does
 not appear a sensible arrangement).

At least for pinned vcpus, that's almost sure to be the case :)
   
   What i mean is that the frequency of a vcpu migrating between cpus
   might be higher than what irqbalance can cope with.
   
 The decision to have the host interrupt follow the vcpu seems a good
 one, given that it saves an IPI and is potentially more cache friendly
 overall.

 And AFAICS its more intelligent for the device assignment case than
 anything irqbalance can come up with

Do you just propose overwriting affinity set by userspace then?
   
   Yes.
   
My concern would be to avoid breaking setups some users have,
with carefully manually optimized affinity for vcpus and device irqs.
   
   They can disable automatic in-kernel affinity.
  
  This still means code needs to be changed ...
  Anyway, what's the interface for that?
  

 (note it depends on how the APIC is
 configured, your patch ignores that).

Could you clarify please? What is meant by 'it' in 'it depends'?
   
   It means the target vcpu selection. It depends on how the guest
   APIC is programmed.
   
Which APIC - host or guest - do you mean, and what are possible APIC
configurations to consider?
   
   Guest APIC. Guest APIC programmed with round robin would break the
   static assignment on your patch.
  
  For round robin we might just want to disable this
  automatic affinity?
 
 OK.
 
   Configurations 

Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-17 Thread Michael S. Tsirkin
On Mon, Oct 17, 2011 at 02:07:41PM -0200, Marcelo Tosatti wrote:
   Configurations to consider, all common ones used for assigned devices?
  
  I mean, besides round robin, any other modes that
  have an issue? Interrupts can also be multicast,
  I think, but we probably don't care what happens
  to affinity then, as msi interrupts are probably never
  broadcast ...
 
 There is also lowest priority, which can be used with MSI.


So the following will probably address that comment?

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index f89f138..b579777 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm,
for (i = 0; i  assigned_dev-entries_nr; i++)
disable_irq(assigned_dev-host_msix_entries[i].vector);
 
-   for (i = 0; i  assigned_dev-entries_nr; i++)
-   free_irq(assigned_dev-host_msix_entries[i].vector,
-(void *)assigned_dev);
+   for (i = 0; i  assigned_dev-entries_nr; i++) {
+   u32 vector = assigned_dev-host_msix_entries[i].vector;
+   irq_set_affinity_hint(vector, NULL);
+   free_irq(vector, (void *)assigned_dev);
+   }
 
assigned_dev-entries_nr = 0;
kfree(assigned_dev-host_msix_entries);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index ac8b629..dabf89f 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -22,6 +22,7 @@
 
 #include linux/kvm_host.h
 #include linux/slab.h
+#include linux/interrupt.h
 #include trace/events/kvm.h
 
 #include asm/msidef.h
@@ -80,11 +81,23 @@ inline static bool kvm_is_dm_lowest_prio(struct 
kvm_lapic_irq *irq)
 #endif
 }
 
+static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq)
+{
+   const struct cpumask *mask;
+   /* raw_smp_processor_id() is ok here: if we get preempted we can get a
+* wrong value but we don't mind much. */
+   if (host_irq = 0  unlikely(vcpu-cpu != raw_smp_processor_id())) {
+   mask = get_cpu_mask(vcpu-cpu);
+   irq_set_affinity_hint(host_irq, mask);
+   }
+}
+   
 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int host_irq)
 {
int i, r = -1;
-   struct kvm_vcpu *vcpu, *lowest = NULL;
+   int matches = 0;
+   struct kvm_vcpu *vcpu, *lowest = NULL, *match;
 
if (irq-dest_mode == 0  irq-dest_id == 0xff 
kvm_is_dm_lowest_prio(irq))
@@ -98,10 +111,13 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
irq-dest_id, irq-dest_mode))
continue;
 
+   ++matches;
+
if (!kvm_is_dm_lowest_prio(irq)) {
if (r  0)
r = 0;
r += kvm_apic_set_irq(vcpu, irq);
+   match = vcpu;
} else if (kvm_lapic_enabled(vcpu)) {
if (!lowest)
lowest = vcpu;
@@ -110,8 +126,13 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
}
}
 
-   if (lowest)
+   if (lowest) {
r = kvm_apic_set_irq(lowest, irq);
+   match = lowest;
+   }
+
+   if (matches == 1)
+   kvm_vcpu_host_irq_hint(match, host_irq);
 
return r;
 }
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-16 Thread Michael S. Tsirkin
On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote:
 On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote:
  To forward an interrupt to a vcpu that runs on
  a host cpu different from the current one,
  we need an ipi which likely will cost us as much
  as delivering the interrupt directly to that cpu would.
  
  Set irq affinity hint to point there, irq balancer
  can then take this into accound and balance
  interrupts accordingly.
  
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   virt/kvm/assigned-dev.c |8 +---
   virt/kvm/irq_comm.c |   17 -
   2 files changed, 21 insertions(+), 4 deletions(-)
  
  diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
  index f89f138..b579777 100644
  --- a/virt/kvm/assigned-dev.c
  +++ b/virt/kvm/assigned-dev.c
  @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm,
  for (i = 0; i  assigned_dev-entries_nr; i++)
  disable_irq(assigned_dev-host_msix_entries[i].vector);
   
  -   for (i = 0; i  assigned_dev-entries_nr; i++)
  -   free_irq(assigned_dev-host_msix_entries[i].vector,
  -(void *)assigned_dev);
  +   for (i = 0; i  assigned_dev-entries_nr; i++) {
  +   u32 vector = assigned_dev-host_msix_entries[i].vector;
  +   irq_set_affinity_hint(vector, NULL);
  +   free_irq(vector, (void *)assigned_dev);
  +   }
   
  assigned_dev-entries_nr = 0;
  kfree(assigned_dev-host_msix_entries);
  diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
  index ac8b629..68b1f7c 100644
  --- a/virt/kvm/irq_comm.c
  +++ b/virt/kvm/irq_comm.c
  @@ -22,6 +22,7 @@
   
   #include linux/kvm_host.h
   #include linux/slab.h
  +#include linux/interrupt.h
   #include trace/events/kvm.h
   
   #include asm/msidef.h
  @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct 
  kvm_lapic_irq *irq)
   #endif
   }
   
  +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq)
  +{
  +   const struct cpumask *mask;
  +   /* raw_smp_processor_id() is ok here: if we get preempted we can get a
  +* wrong value but we don't mind much. */
  +   if (host_irq = 0  unlikely(vcpu-cpu != raw_smp_processor_id())) {
  +   mask = get_cpu_mask(vcpu-cpu);
  +   irq_set_affinity_hint(host_irq, mask);
  +   }
  +}
 
 Unsure about the internals of irq_set_affinity_hint, but AFAICS its
 exported so that irqbalance in userspace can make a decision.

Yes. Pls note at the moment there's no hint so irqbalance
will likely try to move the irq away from vcpu if that
is doing a lot of work. My patch tries to correct that.

 If that is the case, then irqbalance update rate should be high enough
 to catch up with a vcpu migrating betweens cpus (which initially does
 not appear a sensible arrangement).

At least for pinned vcpus, that's almost sure to be the case :)

 The decision to have the host interrupt follow the vcpu seems a good
 one, given that it saves an IPI and is potentially more cache friendly
 overall.

 And AFAICS its more intelligent for the device assignment case than
 anything irqbalance can come up with

Do you just propose overwriting affinity set by userspace then?
My concern would be to avoid breaking setups some users have,
with carefully manually optimized affinity for vcpus and device irqs.

 (note it depends on how the APIC is
 configured, your patch ignores that).

Could you clarify please? What is meant by 'it' in 'it depends'?
Which APIC - host or guest - do you mean, and what are possible APIC
configurations to consider?

   int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
  struct kvm_lapic_irq *irq, int host_irq)
   {
  @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
  kvm_lapic *src,
  if (r  0)
  r = 0;
  r += kvm_apic_set_irq(vcpu, irq);
  +   kvm_vcpu_host_irq_hint(vcpu, host_irq);
  } else if (kvm_lapic_enabled(vcpu)) {
  if (!lowest)
  lowest = vcpu;
  @@ -110,8 +123,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
  kvm_lapic *src,
  }
  }
   
  -   if (lowest)
  +   if (lowest) {
  r = kvm_apic_set_irq(lowest, irq);
  +   kvm_vcpu_host_irq_hint(vcpu, host_irq);
  +   }
   
  return r;
   }
  -- 
  1.7.5.53.gc233e
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-13 Thread Marcelo Tosatti
On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote:
 To forward an interrupt to a vcpu that runs on
 a host cpu different from the current one,
 we need an ipi which likely will cost us as much
 as delivering the interrupt directly to that cpu would.
 
 Set irq affinity hint to point there, irq balancer
 can then take this into accound and balance
 interrupts accordingly.
 
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  virt/kvm/assigned-dev.c |8 +---
  virt/kvm/irq_comm.c |   17 -
  2 files changed, 21 insertions(+), 4 deletions(-)
 
 diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
 index f89f138..b579777 100644
 --- a/virt/kvm/assigned-dev.c
 +++ b/virt/kvm/assigned-dev.c
 @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm,
   for (i = 0; i  assigned_dev-entries_nr; i++)
   disable_irq(assigned_dev-host_msix_entries[i].vector);
  
 - for (i = 0; i  assigned_dev-entries_nr; i++)
 - free_irq(assigned_dev-host_msix_entries[i].vector,
 -  (void *)assigned_dev);
 + for (i = 0; i  assigned_dev-entries_nr; i++) {
 + u32 vector = assigned_dev-host_msix_entries[i].vector;
 + irq_set_affinity_hint(vector, NULL);
 + free_irq(vector, (void *)assigned_dev);
 + }
  
   assigned_dev-entries_nr = 0;
   kfree(assigned_dev-host_msix_entries);
 diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
 index ac8b629..68b1f7c 100644
 --- a/virt/kvm/irq_comm.c
 +++ b/virt/kvm/irq_comm.c
 @@ -22,6 +22,7 @@
  
  #include linux/kvm_host.h
  #include linux/slab.h
 +#include linux/interrupt.h
  #include trace/events/kvm.h
  
  #include asm/msidef.h
 @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct 
 kvm_lapic_irq *irq)
  #endif
  }
  
 +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq)
 +{
 + const struct cpumask *mask;
 + /* raw_smp_processor_id() is ok here: if we get preempted we can get a
 +  * wrong value but we don't mind much. */
 + if (host_irq = 0  unlikely(vcpu-cpu != raw_smp_processor_id())) {
 + mask = get_cpu_mask(vcpu-cpu);
 + irq_set_affinity_hint(host_irq, mask);
 + }
 +}

Unsure about the internals of irq_set_affinity_hint, but AFAICS its
exported so that irqbalance in userspace can make a decision.

If that is the case, then irqbalance update rate should be high enough
to catch up with a vcpu migrating betweens cpus (which initially does
not appear a sensible arrangement).

The decision to have the host interrupt follow the vcpu seems a good
one, given that it saves an IPI and is potentially more cache friendly
overall.

And AFAICS its more intelligent for the device assignment case than
anything irqbalance can come up with (note it depends on how the APIC is
configured, your patch ignores that).

 + 
  int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
   struct kvm_lapic_irq *irq, int host_irq)
  {
 @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
 kvm_lapic *src,
   if (r  0)
   r = 0;
   r += kvm_apic_set_irq(vcpu, irq);
 + kvm_vcpu_host_irq_hint(vcpu, host_irq);
   } else if (kvm_lapic_enabled(vcpu)) {
   if (!lowest)
   lowest = vcpu;
 @@ -110,8 +123,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
 kvm_lapic *src,
   }
   }
  
 - if (lowest)
 + if (lowest) {
   r = kvm_apic_set_irq(lowest, irq);
 + kvm_vcpu_host_irq_hint(vcpu, host_irq);
 + }
  
   return r;
  }
 -- 
 1.7.5.53.gc233e
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH RFC 2/2] kvm: set affinity hint for assigned device msi

2011-10-11 Thread Michael S. Tsirkin
To forward an interrupt to a vcpu that runs on
a host cpu different from the current one,
we need an ipi which likely will cost us as much
as delivering the interrupt directly to that cpu would.

Set irq affinity hint to point there, irq balancer
can then take this into accound and balance
interrupts accordingly.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 virt/kvm/assigned-dev.c |8 +---
 virt/kvm/irq_comm.c |   17 -
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index f89f138..b579777 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm,
for (i = 0; i  assigned_dev-entries_nr; i++)
disable_irq(assigned_dev-host_msix_entries[i].vector);
 
-   for (i = 0; i  assigned_dev-entries_nr; i++)
-   free_irq(assigned_dev-host_msix_entries[i].vector,
-(void *)assigned_dev);
+   for (i = 0; i  assigned_dev-entries_nr; i++) {
+   u32 vector = assigned_dev-host_msix_entries[i].vector;
+   irq_set_affinity_hint(vector, NULL);
+   free_irq(vector, (void *)assigned_dev);
+   }
 
assigned_dev-entries_nr = 0;
kfree(assigned_dev-host_msix_entries);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index ac8b629..68b1f7c 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -22,6 +22,7 @@
 
 #include linux/kvm_host.h
 #include linux/slab.h
+#include linux/interrupt.h
 #include trace/events/kvm.h
 
 #include asm/msidef.h
@@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct 
kvm_lapic_irq *irq)
 #endif
 }
 
+static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq)
+{
+   const struct cpumask *mask;
+   /* raw_smp_processor_id() is ok here: if we get preempted we can get a
+* wrong value but we don't mind much. */
+   if (host_irq = 0  unlikely(vcpu-cpu != raw_smp_processor_id())) {
+   mask = get_cpu_mask(vcpu-cpu);
+   irq_set_affinity_hint(host_irq, mask);
+   }
+}
+   
 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, int host_irq)
 {
@@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
if (r  0)
r = 0;
r += kvm_apic_set_irq(vcpu, irq);
+   kvm_vcpu_host_irq_hint(vcpu, host_irq);
} else if (kvm_lapic_enabled(vcpu)) {
if (!lowest)
lowest = vcpu;
@@ -110,8 +123,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
}
}
 
-   if (lowest)
+   if (lowest) {
r = kvm_apic_set_irq(lowest, irq);
+   kvm_vcpu_host_irq_hint(vcpu, host_irq);
+   }
 
return r;
 }
-- 
1.7.5.53.gc233e
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html