Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Thu, Jan 12, 2012 at 04:09:17PM +0200, Avi Kivity wrote: On 10/11/2011 08:38 PM, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a +* wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} + int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int host_irq) { @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); + kvm_vcpu_host_irq_hint(vcpu, host_irq); Doing this every time seems excessive. Please note this doesn't modify the mapping in the hardware: it only updates the hint for the irqbalancer, and only if we target a CPU which is different from the current one. How about doing it every N interrupts? That's certainly easy. We can even collect information about which vcpus were targeted, and then use a mask instead of just one vcpu. Yes, we can, but it certainly becomes much more complex. The motivatin behind using the same host cpu is reducing the number of IPIs. If the interrupt is bouncing between guest VCPUs, this optimizaion won't work, so what's the theory behind using a mask? How about detecting an unstable vcpu mapping and completely disabling the optimization, instead? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On 10/11/2011 08:38 PM, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a + * wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} + int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int host_irq) { @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); + kvm_vcpu_host_irq_hint(vcpu, host_irq); Doing this every time seems excessive. How about doing it every N interrupts? We can even collect information about which vcpus were targeted, and then use a mask instead of just one vcpu. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Mon, Oct 17, 2011 at 07:04:40PM +0200, Michael S. Tsirkin wrote: On Mon, Oct 17, 2011 at 02:07:41PM -0200, Marcelo Tosatti wrote: Configurations to consider, all common ones used for assigned devices? I mean, besides round robin, any other modes that have an issue? Interrupts can also be multicast, I think, but we probably don't care what happens to affinity then, as msi interrupts are probably never broadcast ... There is also lowest priority, which can be used with MSI. So the following will probably address that comment? Yes, it does. Patch looks fine. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote: On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote: On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- virt/kvm/assigned-dev.c |8 +--- virt/kvm/irq_comm.c | 17 - 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index f89f138..b579777 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm, for (i = 0; i assigned_dev-entries_nr; i++) disable_irq(assigned_dev-host_msix_entries[i].vector); - for (i = 0; i assigned_dev-entries_nr; i++) - free_irq(assigned_dev-host_msix_entries[i].vector, - (void *)assigned_dev); + for (i = 0; i assigned_dev-entries_nr; i++) { + u32 vector = assigned_dev-host_msix_entries[i].vector; + irq_set_affinity_hint(vector, NULL); + free_irq(vector, (void *)assigned_dev); + } assigned_dev-entries_nr = 0; kfree(assigned_dev-host_msix_entries); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index ac8b629..68b1f7c 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -22,6 +22,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/interrupt.h #include trace/events/kvm.h #include asm/msidef.h @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) #endif } +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a + * wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} Unsure about the internals of irq_set_affinity_hint, but AFAICS its exported so that irqbalance in userspace can make a decision. Yes. Pls note at the moment there's no hint so irqbalance will likely try to move the irq away from vcpu if that is doing a lot of work. My patch tries to correct that. If that is the case, then irqbalance update rate should be high enough to catch up with a vcpu migrating betweens cpus (which initially does not appear a sensible arrangement). At least for pinned vcpus, that's almost sure to be the case :) What i mean is that the frequency of a vcpu migrating between cpus might be higher than what irqbalance can cope with. The decision to have the host interrupt follow the vcpu seems a good one, given that it saves an IPI and is potentially more cache friendly overall. And AFAICS its more intelligent for the device assignment case than anything irqbalance can come up with Do you just propose overwriting affinity set by userspace then? Yes. My concern would be to avoid breaking setups some users have, with carefully manually optimized affinity for vcpus and device irqs. They can disable automatic in-kernel affinity. (note it depends on how the APIC is configured, your patch ignores that). Could you clarify please? What is meant by 'it' in 'it depends'? It means the target vcpu selection. It depends on how the guest APIC is programmed. Which APIC - host or guest - do you mean, and what are possible APIC configurations to consider? Guest APIC. Guest APIC programmed with round robin would break the static assignment on your patch. Configurations to consider, all common ones used for assigned devices? int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int host_irq) { @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); + kvm_vcpu_host_irq_hint(vcpu, host_irq); } else if (kvm_lapic_enabled(vcpu)) { if (!lowest) lowest = vcpu; @@ -110,8 +123,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, } } - if (lowest) + if (lowest) { r =
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote: On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote: On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- virt/kvm/assigned-dev.c |8 +--- virt/kvm/irq_comm.c | 17 - 2 files changed, 21 insertions(+), 4 deletions(-) Or maybe even automatic in-kernel irq affinity disabled by default (to avoid interference with userspace pinning). All of that if it makes sense performance wise, of course. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Mon, Oct 17, 2011 at 08:58:59AM -0200, Marcelo Tosatti wrote: On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote: On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote: On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- virt/kvm/assigned-dev.c |8 +--- virt/kvm/irq_comm.c | 17 - 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index f89f138..b579777 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm, for (i = 0; i assigned_dev-entries_nr; i++) disable_irq(assigned_dev-host_msix_entries[i].vector); - for (i = 0; i assigned_dev-entries_nr; i++) - free_irq(assigned_dev-host_msix_entries[i].vector, -(void *)assigned_dev); + for (i = 0; i assigned_dev-entries_nr; i++) { + u32 vector = assigned_dev-host_msix_entries[i].vector; + irq_set_affinity_hint(vector, NULL); + free_irq(vector, (void *)assigned_dev); + } assigned_dev-entries_nr = 0; kfree(assigned_dev-host_msix_entries); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index ac8b629..68b1f7c 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -22,6 +22,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/interrupt.h #include trace/events/kvm.h #include asm/msidef.h @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) #endif } +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a +* wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} Unsure about the internals of irq_set_affinity_hint, but AFAICS its exported so that irqbalance in userspace can make a decision. Yes. Pls note at the moment there's no hint so irqbalance will likely try to move the irq away from vcpu if that is doing a lot of work. My patch tries to correct that. If that is the case, then irqbalance update rate should be high enough to catch up with a vcpu migrating betweens cpus (which initially does not appear a sensible arrangement). At least for pinned vcpus, that's almost sure to be the case :) What i mean is that the frequency of a vcpu migrating between cpus might be higher than what irqbalance can cope with. The decision to have the host interrupt follow the vcpu seems a good one, given that it saves an IPI and is potentially more cache friendly overall. And AFAICS its more intelligent for the device assignment case than anything irqbalance can come up with Do you just propose overwriting affinity set by userspace then? Yes. My concern would be to avoid breaking setups some users have, with carefully manually optimized affinity for vcpus and device irqs. They can disable automatic in-kernel affinity. This still means code needs to be changed ... Anyway, what's the interface for that? (note it depends on how the APIC is configured, your patch ignores that). Could you clarify please? What is meant by 'it' in 'it depends'? It means the target vcpu selection. It depends on how the guest APIC is programmed. Which APIC - host or guest - do you mean, and what are possible APIC configurations to consider? Guest APIC. Guest APIC programmed with round robin would break the static assignment on your patch. For round robin we might just want to disable this automatic affinity? Configurations to consider, all common ones used for assigned devices? I mean, besides round robin, any other modes that have an issue? Interrupts can also be multicast, I think, but we probably don't care what happens to affinity then, as msi interrupts are probably never broadcast ... int kvm_irq_delivery_to_apic(struct
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Mon, Oct 17, 2011 at 09:09:46AM -0200, Marcelo Tosatti wrote: On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote: On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote: On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- virt/kvm/assigned-dev.c |8 +--- virt/kvm/irq_comm.c | 17 - 2 files changed, 21 insertions(+), 4 deletions(-) Or maybe even automatic in-kernel irq affinity disabled by default (to avoid interference with userspace pinning). Right. But what's the interface to enable it? There's no standard interface, is there? The advantage of hinting is that it works with existing tools, without changing userspace at all. All of that if it makes sense performance wise, of course. That's the point. Almost any benchmark I'm aware of is run with vcpus pinned. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Mon, Oct 17, 2011 at 03:32:15PM +0200, Michael S. Tsirkin wrote: On Mon, Oct 17, 2011 at 08:58:59AM -0200, Marcelo Tosatti wrote: On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote: On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote: On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- virt/kvm/assigned-dev.c |8 +--- virt/kvm/irq_comm.c | 17 - 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index f89f138..b579777 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm, for (i = 0; i assigned_dev-entries_nr; i++) disable_irq(assigned_dev-host_msix_entries[i].vector); - for (i = 0; i assigned_dev-entries_nr; i++) - free_irq(assigned_dev-host_msix_entries[i].vector, - (void *)assigned_dev); + for (i = 0; i assigned_dev-entries_nr; i++) { + u32 vector = assigned_dev-host_msix_entries[i].vector; + irq_set_affinity_hint(vector, NULL); + free_irq(vector, (void *)assigned_dev); + } assigned_dev-entries_nr = 0; kfree(assigned_dev-host_msix_entries); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index ac8b629..68b1f7c 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -22,6 +22,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/interrupt.h #include trace/events/kvm.h #include asm/msidef.h @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) #endif } +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a + * wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} Unsure about the internals of irq_set_affinity_hint, but AFAICS its exported so that irqbalance in userspace can make a decision. Yes. Pls note at the moment there's no hint so irqbalance will likely try to move the irq away from vcpu if that is doing a lot of work. My patch tries to correct that. If that is the case, then irqbalance update rate should be high enough to catch up with a vcpu migrating betweens cpus (which initially does not appear a sensible arrangement). At least for pinned vcpus, that's almost sure to be the case :) What i mean is that the frequency of a vcpu migrating between cpus might be higher than what irqbalance can cope with. The decision to have the host interrupt follow the vcpu seems a good one, given that it saves an IPI and is potentially more cache friendly overall. And AFAICS its more intelligent for the device assignment case than anything irqbalance can come up with Do you just propose overwriting affinity set by userspace then? Yes. My concern would be to avoid breaking setups some users have, with carefully manually optimized affinity for vcpus and device irqs. They can disable automatic in-kernel affinity. This still means code needs to be changed ... Anyway, what's the interface for that? (note it depends on how the APIC is configured, your patch ignores that). Could you clarify please? What is meant by 'it' in 'it depends'? It means the target vcpu selection. It depends on how the guest APIC is programmed. Which APIC - host or guest - do you mean, and what are possible APIC configurations to consider? Guest APIC. Guest APIC programmed with round robin would break the static assignment on your patch. For round robin we might just want to disable this automatic affinity? OK. Configurations to consider, all common ones used for assigned devices? I mean, besides round robin, any other modes that have an issue? Interrupts can also be
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Mon, Oct 17, 2011 at 02:07:41PM -0200, Marcelo Tosatti wrote: On Mon, Oct 17, 2011 at 03:32:15PM +0200, Michael S. Tsirkin wrote: On Mon, Oct 17, 2011 at 08:58:59AM -0200, Marcelo Tosatti wrote: On Sun, Oct 16, 2011 at 03:12:23PM +0200, Michael S. Tsirkin wrote: On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote: On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- virt/kvm/assigned-dev.c |8 +--- virt/kvm/irq_comm.c | 17 - 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index f89f138..b579777 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm, for (i = 0; i assigned_dev-entries_nr; i++) disable_irq(assigned_dev-host_msix_entries[i].vector); - for (i = 0; i assigned_dev-entries_nr; i++) - free_irq(assigned_dev-host_msix_entries[i].vector, -(void *)assigned_dev); + for (i = 0; i assigned_dev-entries_nr; i++) { + u32 vector = assigned_dev-host_msix_entries[i].vector; + irq_set_affinity_hint(vector, NULL); + free_irq(vector, (void *)assigned_dev); + } assigned_dev-entries_nr = 0; kfree(assigned_dev-host_msix_entries); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index ac8b629..68b1f7c 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -22,6 +22,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/interrupt.h #include trace/events/kvm.h #include asm/msidef.h @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) #endif } +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a +* wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} Unsure about the internals of irq_set_affinity_hint, but AFAICS its exported so that irqbalance in userspace can make a decision. Yes. Pls note at the moment there's no hint so irqbalance will likely try to move the irq away from vcpu if that is doing a lot of work. My patch tries to correct that. If that is the case, then irqbalance update rate should be high enough to catch up with a vcpu migrating betweens cpus (which initially does not appear a sensible arrangement). At least for pinned vcpus, that's almost sure to be the case :) What i mean is that the frequency of a vcpu migrating between cpus might be higher than what irqbalance can cope with. The decision to have the host interrupt follow the vcpu seems a good one, given that it saves an IPI and is potentially more cache friendly overall. And AFAICS its more intelligent for the device assignment case than anything irqbalance can come up with Do you just propose overwriting affinity set by userspace then? Yes. My concern would be to avoid breaking setups some users have, with carefully manually optimized affinity for vcpus and device irqs. They can disable automatic in-kernel affinity. This still means code needs to be changed ... Anyway, what's the interface for that? (note it depends on how the APIC is configured, your patch ignores that). Could you clarify please? What is meant by 'it' in 'it depends'? It means the target vcpu selection. It depends on how the guest APIC is programmed. Which APIC - host or guest - do you mean, and what are possible APIC configurations to consider? Guest APIC. Guest APIC programmed with round robin would break the static assignment on your patch. For round robin we might just want to disable this automatic affinity? OK. Configurations
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Mon, Oct 17, 2011 at 02:07:41PM -0200, Marcelo Tosatti wrote: Configurations to consider, all common ones used for assigned devices? I mean, besides round robin, any other modes that have an issue? Interrupts can also be multicast, I think, but we probably don't care what happens to affinity then, as msi interrupts are probably never broadcast ... There is also lowest priority, which can be used with MSI. So the following will probably address that comment? diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index f89f138..b579777 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm, for (i = 0; i assigned_dev-entries_nr; i++) disable_irq(assigned_dev-host_msix_entries[i].vector); - for (i = 0; i assigned_dev-entries_nr; i++) - free_irq(assigned_dev-host_msix_entries[i].vector, -(void *)assigned_dev); + for (i = 0; i assigned_dev-entries_nr; i++) { + u32 vector = assigned_dev-host_msix_entries[i].vector; + irq_set_affinity_hint(vector, NULL); + free_irq(vector, (void *)assigned_dev); + } assigned_dev-entries_nr = 0; kfree(assigned_dev-host_msix_entries); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index ac8b629..dabf89f 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -22,6 +22,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/interrupt.h #include trace/events/kvm.h #include asm/msidef.h @@ -80,11 +81,23 @@ inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) #endif } +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a +* wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} + int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int host_irq) { int i, r = -1; - struct kvm_vcpu *vcpu, *lowest = NULL; + int matches = 0; + struct kvm_vcpu *vcpu, *lowest = NULL, *match; if (irq-dest_mode == 0 irq-dest_id == 0xff kvm_is_dm_lowest_prio(irq)) @@ -98,10 +111,13 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, irq-dest_id, irq-dest_mode)) continue; + ++matches; + if (!kvm_is_dm_lowest_prio(irq)) { if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); + match = vcpu; } else if (kvm_lapic_enabled(vcpu)) { if (!lowest) lowest = vcpu; @@ -110,8 +126,13 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, } } - if (lowest) + if (lowest) { r = kvm_apic_set_irq(lowest, irq); + match = lowest; + } + + if (matches == 1) + kvm_vcpu_host_irq_hint(match, host_irq); return r; } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Thu, Oct 13, 2011 at 11:54:50AM -0300, Marcelo Tosatti wrote: On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- virt/kvm/assigned-dev.c |8 +--- virt/kvm/irq_comm.c | 17 - 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index f89f138..b579777 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm, for (i = 0; i assigned_dev-entries_nr; i++) disable_irq(assigned_dev-host_msix_entries[i].vector); - for (i = 0; i assigned_dev-entries_nr; i++) - free_irq(assigned_dev-host_msix_entries[i].vector, -(void *)assigned_dev); + for (i = 0; i assigned_dev-entries_nr; i++) { + u32 vector = assigned_dev-host_msix_entries[i].vector; + irq_set_affinity_hint(vector, NULL); + free_irq(vector, (void *)assigned_dev); + } assigned_dev-entries_nr = 0; kfree(assigned_dev-host_msix_entries); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index ac8b629..68b1f7c 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -22,6 +22,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/interrupt.h #include trace/events/kvm.h #include asm/msidef.h @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) #endif } +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a +* wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} Unsure about the internals of irq_set_affinity_hint, but AFAICS its exported so that irqbalance in userspace can make a decision. Yes. Pls note at the moment there's no hint so irqbalance will likely try to move the irq away from vcpu if that is doing a lot of work. My patch tries to correct that. If that is the case, then irqbalance update rate should be high enough to catch up with a vcpu migrating betweens cpus (which initially does not appear a sensible arrangement). At least for pinned vcpus, that's almost sure to be the case :) The decision to have the host interrupt follow the vcpu seems a good one, given that it saves an IPI and is potentially more cache friendly overall. And AFAICS its more intelligent for the device assignment case than anything irqbalance can come up with Do you just propose overwriting affinity set by userspace then? My concern would be to avoid breaking setups some users have, with carefully manually optimized affinity for vcpus and device irqs. (note it depends on how the APIC is configured, your patch ignores that). Could you clarify please? What is meant by 'it' in 'it depends'? Which APIC - host or guest - do you mean, and what are possible APIC configurations to consider? int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int host_irq) { @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); + kvm_vcpu_host_irq_hint(vcpu, host_irq); } else if (kvm_lapic_enabled(vcpu)) { if (!lowest) lowest = vcpu; @@ -110,8 +123,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, } } - if (lowest) + if (lowest) { r = kvm_apic_set_irq(lowest, irq); + kvm_vcpu_host_irq_hint(vcpu, host_irq); + } return r; } -- 1.7.5.53.gc233e -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
On Tue, Oct 11, 2011 at 08:38:28PM +0200, Michael S. Tsirkin wrote: To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- virt/kvm/assigned-dev.c |8 +--- virt/kvm/irq_comm.c | 17 - 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index f89f138..b579777 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm, for (i = 0; i assigned_dev-entries_nr; i++) disable_irq(assigned_dev-host_msix_entries[i].vector); - for (i = 0; i assigned_dev-entries_nr; i++) - free_irq(assigned_dev-host_msix_entries[i].vector, - (void *)assigned_dev); + for (i = 0; i assigned_dev-entries_nr; i++) { + u32 vector = assigned_dev-host_msix_entries[i].vector; + irq_set_affinity_hint(vector, NULL); + free_irq(vector, (void *)assigned_dev); + } assigned_dev-entries_nr = 0; kfree(assigned_dev-host_msix_entries); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index ac8b629..68b1f7c 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -22,6 +22,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/interrupt.h #include trace/events/kvm.h #include asm/msidef.h @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) #endif } +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a + * wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} Unsure about the internals of irq_set_affinity_hint, but AFAICS its exported so that irqbalance in userspace can make a decision. If that is the case, then irqbalance update rate should be high enough to catch up with a vcpu migrating betweens cpus (which initially does not appear a sensible arrangement). The decision to have the host interrupt follow the vcpu seems a good one, given that it saves an IPI and is potentially more cache friendly overall. And AFAICS its more intelligent for the device assignment case than anything irqbalance can come up with (note it depends on how the APIC is configured, your patch ignores that). + int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int host_irq) { @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); + kvm_vcpu_host_irq_hint(vcpu, host_irq); } else if (kvm_lapic_enabled(vcpu)) { if (!lowest) lowest = vcpu; @@ -110,8 +123,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, } } - if (lowest) + if (lowest) { r = kvm_apic_set_irq(lowest, irq); + kvm_vcpu_host_irq_hint(vcpu, host_irq); + } return r; } -- 1.7.5.53.gc233e -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH RFC 2/2] kvm: set affinity hint for assigned device msi
To forward an interrupt to a vcpu that runs on a host cpu different from the current one, we need an ipi which likely will cost us as much as delivering the interrupt directly to that cpu would. Set irq affinity hint to point there, irq balancer can then take this into accound and balance interrupts accordingly. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- virt/kvm/assigned-dev.c |8 +--- virt/kvm/irq_comm.c | 17 - 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index f89f138..b579777 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -142,9 +142,11 @@ static void deassign_host_irq(struct kvm *kvm, for (i = 0; i assigned_dev-entries_nr; i++) disable_irq(assigned_dev-host_msix_entries[i].vector); - for (i = 0; i assigned_dev-entries_nr; i++) - free_irq(assigned_dev-host_msix_entries[i].vector, -(void *)assigned_dev); + for (i = 0; i assigned_dev-entries_nr; i++) { + u32 vector = assigned_dev-host_msix_entries[i].vector; + irq_set_affinity_hint(vector, NULL); + free_irq(vector, (void *)assigned_dev); + } assigned_dev-entries_nr = 0; kfree(assigned_dev-host_msix_entries); diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index ac8b629..68b1f7c 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -22,6 +22,7 @@ #include linux/kvm_host.h #include linux/slab.h +#include linux/interrupt.h #include trace/events/kvm.h #include asm/msidef.h @@ -80,6 +81,17 @@ inline static bool kvm_is_dm_lowest_prio(struct kvm_lapic_irq *irq) #endif } +static void kvm_vcpu_host_irq_hint(struct kvm_vcpu *vcpu, int host_irq) +{ + const struct cpumask *mask; + /* raw_smp_processor_id() is ok here: if we get preempted we can get a +* wrong value but we don't mind much. */ + if (host_irq = 0 unlikely(vcpu-cpu != raw_smp_processor_id())) { + mask = get_cpu_mask(vcpu-cpu); + irq_set_affinity_hint(host_irq, mask); + } +} + int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, struct kvm_lapic_irq *irq, int host_irq) { @@ -102,6 +114,7 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, if (r 0) r = 0; r += kvm_apic_set_irq(vcpu, irq); + kvm_vcpu_host_irq_hint(vcpu, host_irq); } else if (kvm_lapic_enabled(vcpu)) { if (!lowest) lowest = vcpu; @@ -110,8 +123,10 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, } } - if (lowest) + if (lowest) { r = kvm_apic_set_irq(lowest, irq); + kvm_vcpu_host_irq_hint(vcpu, host_irq); + } return r; } -- 1.7.5.53.gc233e -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html