Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On 30/10/2015 07:16, Yunhong Jiang wrote:
> And with this change, we don't even need the module option anymore: we first
> try the primary handler, which runs in hard irq context, and if that fails,
> fall back to the threaded irq handler. Am I right?

Yes.

> Paolo/Alex, do you want to work on the patch yourself? If not, I will be
> happy to try this method.

Of course you can do it yourself. Thanks!

Paolo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Thu, Oct 29, 2015 at 10:45:44AM +0100, Paolo Bonzini wrote:
>
>
> On 29/10/2015 04:11, Alex Williamson wrote:
> > > The irqfd is already able to schedule a work item, because it runs with
> > > interrupts disabled, so I think we can always return IRQ_HANDLED.
> >
> > I'm confused by this. The problem with adding IRQF_NO_THREAD to our
> > current handler is that it hits the spinlock that can sleep in
> > eventfd_signal() and the waitqueue further down the stack before we get
> > to the irqfd. So if we split to a non-threaded handler vs a threaded
> > handler, where the non-threaded handler either returns IRQ_HANDLED or
> > IRQ_WAKE_THREAD to queue the threaded handler, there's only so much that
> > the non-threaded handler can do before we start running into the same
> > problem.
>
> You're right. I thought schedule_work used raw spinlocks (and then
> everything would be done in the inject callback), but I was wrong.
>
> Basically where irqfd_wakeup now does schedule_work, it would need to
> return IRQ_WAKE_THREAD. The threaded handler then can just do the
> eventfd_signal.
>

And with this change, we don't even need the module option anymore: we first
try the primary handler, which runs in hard irq context, and if that fails,
fall back to the threaded irq handler. Am I right?

Paolo/Alex, do you want to work on the patch yourself? If not, I will be
happy to try this method.

Thanks
--jyh

> Paolo
>
> > I think that means that the non-threaded handler needs to
> > return IRQ_WAKE_THREAD if we need to use the current eventfd_signal()
> > path, such as if the bypass path is not available. If we can get
> > through the bypass path and the KVM irqfd side is safe for the
> > non-threaded handler, inject succeeds and we return IRQ_HANDLED, right?
> > Thanks,
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On 29/10/2015 04:11, Alex Williamson wrote:
> > The irqfd is already able to schedule a work item, because it runs with
> > interrupts disabled, so I think we can always return IRQ_HANDLED.
>
> I'm confused by this. The problem with adding IRQF_NO_THREAD to our
> current handler is that it hits the spinlock that can sleep in
> eventfd_signal() and the waitqueue further down the stack before we get
> to the irqfd. So if we split to a non-threaded handler vs a threaded
> handler, where the non-threaded handler either returns IRQ_HANDLED or
> IRQ_WAKE_THREAD to queue the threaded handler, there's only so much that
> the non-threaded handler can do before we start running into the same
> problem.

You're right. I thought schedule_work used raw spinlocks (and then
everything would be done in the inject callback), but I was wrong.

Basically where irqfd_wakeup now does schedule_work, it would need to
return IRQ_WAKE_THREAD. The threaded handler then can just do the
eventfd_signal.

Paolo

> I think that means that the non-threaded handler needs to
> return IRQ_WAKE_THREAD if we need to use the current eventfd_signal()
> path, such as if the bypass path is not available. If we can get
> through the bypass path and the KVM irqfd side is safe for the
> non-threaded handler, inject succeeds and we return IRQ_HANDLED, right?
> Thanks,
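The split discussed in the message above can be modelled in plain userspace C. This is only a sketch under stated assumptions, not kernel code and not the eventual patch: `enum irq_ret`, `struct vector_ctx`, `primary_handler`, `threaded_handler`, `deliver_irq` and `inject_ok` are all hypothetical names, and the two counters merely stand in for the bypass injection and for eventfd_signal(). It shows the primary handler returning "handled" when the bypass path succeeds and asking for the thread otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the kernel's IRQ_HANDLED / IRQ_WAKE_THREAD (hypothetical, userspace only). */
enum irq_ret { IRQ_RET_HANDLED, IRQ_RET_WAKE_THREAD };

/* Hypothetical per-vector context; the kernel's real structures differ. */
struct vector_ctx {
    bool (*inject)(struct vector_ctx *ctx); /* bypass path, NULL if unavailable */
    int injected;                           /* deliveries through the bypass path */
    int eventfd_signals;                    /* deliveries through the eventfd path */
};

/* Primary handler: only does work assumed safe in hard irq context. */
static enum irq_ret primary_handler(struct vector_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->inject && ctx->inject(ctx))
        return IRQ_RET_HANDLED;
    return IRQ_RET_WAKE_THREAD;
}

/* Threaded handler: may take sleeping locks; stands in for eventfd_signal(). */
static void threaded_handler(struct vector_ctx *ctx)
{
    ctx->eventfd_signals++;
}

/* Models what the IRQ core does with the primary handler's return value. */
static void deliver_irq(struct vector_ctx *ctx)
{
    if (primary_handler(ctx) == IRQ_RET_WAKE_THREAD)
        threaded_handler(ctx);
}

/* Example bypass callback that always succeeds. */
static bool inject_ok(struct vector_ctx *ctx)
{
    ctx->injected++;
    return true;
}
```

With `.inject = inject_ok` a call to `deliver_irq()` never reaches the threaded handler; with `.inject = NULL` every interrupt falls back to it, which is the case where the bypass path is not available.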
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Wed, 2015-10-28 at 18:05 +0100, Paolo Bonzini wrote:
>
> On 28/10/2015 17:00, Alex Williamson wrote:
> > > Alex, would it make sense to use the IRQ bypass infrastructure always,
> > > not just for VT-d, to do the MSI injection directly from the VFIO
> > > interrupt handler and bypass the eventfd? Basically this would add an
> > > RCU-protected list of consumers matching the token to struct
> > > irq_bypass_producer, and a
> > >
> > >     int (*inject)(struct irq_bypass_consumer *);
> > >
> > > callback to struct irq_bypass_consumer. If any callback returns true,
> > > the eventfd is not signaled.
> >
> > Yeah, that might be a good idea, it's probably more plausible than
> > making the eventfd_signal() code friendly to call from hard interrupt
> > context. On the vfio side can we use request_threaded_irq() directly
> > for this?
>
> I don't know if that gives you a non-threaded IRQ with the real-time
> kernel... CCing Marcelo to get some insight.
>
> > Making the hard irq handler return IRQ_HANDLED if we can use
> > the irq bypass manager or IRQ_WAKE_THREAD if we need to use the eventfd.
> > I think we need some way to get back to irq thread context to use
> > eventfd_signal().
>
> The irqfd is already able to schedule a work item, because it runs with
> interrupts disabled, so I think we can always return IRQ_HANDLED.

I'm confused by this. The problem with adding IRQF_NO_THREAD to our
current handler is that it hits the spinlock that can sleep in
eventfd_signal() and the waitqueue further down the stack before we get
to the irqfd. So if we split to a non-threaded handler vs a threaded
handler, where the non-threaded handler either returns IRQ_HANDLED or
IRQ_WAKE_THREAD to queue the threaded handler, there's only so much that
the non-threaded handler can do before we start running into the same
problem.

I think that means that the non-threaded handler needs to
return IRQ_WAKE_THREAD if we need to use the current eventfd_signal()
path, such as if the bypass path is not available. If we can get
through the bypass path and the KVM irqfd side is safe for the
non-threaded handler, inject succeeds and we return IRQ_HANDLED, right?

Thanks,
Alex
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Wed, Oct 28, 2015 at 06:05:00PM +0100, Paolo Bonzini wrote:
>
>
> On 28/10/2015 17:00, Alex Williamson wrote:
> > > Alex, would it make sense to use the IRQ bypass infrastructure always,
> > > not just for VT-d, to do the MSI injection directly from the VFIO
> > > interrupt handler and bypass the eventfd? Basically this would add an
> > > RCU-protected list of consumers matching the token to struct
> > > irq_bypass_producer, and a
> > >
> > >     int (*inject)(struct irq_bypass_consumer *);
> > >
> > > callback to struct irq_bypass_consumer. If any callback returns true,
> > > the eventfd is not signaled.
> >
> > Yeah, that might be a good idea, it's probably more plausible than
> > making the eventfd_signal() code friendly to call from hard interrupt
> > context. On the vfio side can we use request_threaded_irq() directly
> > for this?
>
> I don't know if that gives you a non-threaded IRQ with the real-time
> kernel... CCing Marcelo to get some insight.

The vfio interrupt handler (threaded or not) runs at a higher priority than
the vcpu thread. So don't worry about -RT.

About bypass: the fewer instructions between the device ISR and the
injection of the interrupt into the guest, the better, as that translates
directly into a reduction in interrupt latency, which is important because
it determines

1. how often you can switch from pollmode to ACPI C-states.
2. whether the realtime workload is virtualizable.

The answer to properties of request_threaded_irq() is: don't know.

> > Making the hard irq handler return IRQ_HANDLED if we can use
> > the irq bypass manager or IRQ_WAKE_THREAD if we need to use the eventfd.
> > I think we need some way to get back to irq thread context to use
> > eventfd_signal().
>
> The irqfd is already able to schedule a work item, because it runs with
> interrupts disabled, so I think we can always return IRQ_HANDLED.
>
> There's another little complication. Right now, only x86 has
> kvm_set_msi_inatomic. We should merge kvm_set_msi_inatomic,
> kvm_set_irq_inatomic and kvm_arch_set_irq.
>
> Some cleanups are needed there; the flow between the functions is really
> badly structured because the API grew somewhat by accretion. I'll get
> to it next week or on the way back to Italy.
>
> > Would we ever not want to use the direct bypass
> > manager path if available? Thanks,
>
> I don't think so. KVM always registers itself as a consumer, even if
> there are no VT-d posted interrupts. add_producer simply returns -EINVAL
> then.
>
> Paolo
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Wed, Oct 28, 2015 at 12:18:48PM -0600, Alex Williamson wrote:
> On Wed, 2015-10-28 at 10:50 -0700, Yunhong Jiang wrote:
> > On Wed, Oct 28, 2015 at 01:44:55AM +0100, Paolo Bonzini wrote:
>
> It's in linux-next via the kvm.git next branch:
>
> git://git.kernel.org/pub/scm/virt/kvm/kvm.git
>
> Thanks,
> Alex

Thanks
--jyh
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On 28/10/2015 18:50, Yunhong Jiang wrote:
> > No, I don't think you can use raw_spinlock there. The problem is not
> > just eventfd_signal, it is especially wake_up_locked_poll. You cannot
> > convert the whole workqueue infrastructure to use raw_spinlock.
>
> You mean the waitqueue, instead of workqueue, right?

Yes.

> One choice is to change
> the eventfd to use a simple wait queue, which uses raw_spinlock. But using
> a simple waitqueue on eventfd may in fact impact real-time latency in
> scenarios other than this one.

Userspace can put an arbitrary number of tasks on the wait queue, so it's
not possible to use a simple wait queue. It would also touch multiple
subsystems, so it's much better to bypass the eventfd completely.

Paolo
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Wed, 2015-10-28 at 10:50 -0700, Yunhong Jiang wrote:
> On Wed, Oct 28, 2015 at 01:44:55AM +0100, Paolo Bonzini wrote:
> >
> >
> > On 27/10/2015 22:26, Yunhong Jiang wrote:
> > >> > On RT kernels however can you call eventfd_signal from interrupt
> > >> > context? You cannot call spin_lock_irqsave (which can sleep) from a
> > >> > non-threaded interrupt handler, can you? You would need a raw spin
> > >> > lock.
> > > Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on
> > > an RT kernel. Will do it this way in the next patch. But I'm not sure
> > > if it's overkill to use raw_spinlock there, since eventfd_signal is
> > > used by other callers as well.
> >
> > No, I don't think you can use raw_spinlock there. The problem is not
> > just eventfd_signal, it is especially wake_up_locked_poll. You cannot
> > convert the whole workqueue infrastructure to use raw_spinlock.
>
> You mean the waitqueue, instead of workqueue, right? One choice is to change
> the eventfd to use a simple wait queue, which uses raw_spinlock. But using
> a simple waitqueue on eventfd may in fact impact real-time latency in
> scenarios other than this one.
>
> >
> > Alex, would it make sense to use the IRQ bypass infrastructure always,
> > not just for VT-d, to do the MSI injection directly from the VFIO
> > interrupt handler and bypass the eventfd? Basically this would add an
> > RCU-protected list of consumers matching the token to struct
> > irq_bypass_producer, and a
> >
> >     int (*inject)(struct irq_bypass_consumer *);
> >
> > callback to struct irq_bypass_consumer. If any callback returns true,
> > the eventfd is not signaled. The KVM implementation would be like this
> > (compare with virt/kvm/eventfd.c):
> >
> >     /* Extracted out of irqfd_wakeup */
> >     static int
> >     irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
> >     {
> >         ...
> >     }
> >
> >     /* Extracted out of irqfd_wakeup */
> >     static int
> >     irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
> >     {
> >         ...
> >     }
> >
> >     static int
> >     irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
> >                  void *key)
> >     {
> >         struct kvm_kernel_irqfd *irqfd = container_of(wait,
> >             struct kvm_kernel_irqfd, wait);
> >         unsigned long flags = (unsigned long)key;
> >
> >         if (flags & POLLIN)
> >             irqfd_wakeup_pollin(irqfd);
> >         if (flags & POLLHUP)
> >             irqfd_wakeup_pollhup(irqfd);
> >
> >         return 0;
> >     }
> >
> >     static int kvm_arch_irq_bypass_inject(
> >         struct irq_bypass_consumer *cons)
> >     {
> >         struct kvm_kernel_irqfd *irqfd =
> >             container_of(cons, struct kvm_kernel_irqfd,
> >                          consumer);
> >
> >         return irqfd_wakeup_pollin(irqfd);
> >     }
> >
> This is a good idea IMHO. So for MSI interrupts, kvm_arch_irq_bypass_inject
> will be used, and irqfd_wakeup will not be invoked anymore, am I right?
>
> I noticed the irq bypass manager is not merged yet; is there a git branch
> for it?

It's in linux-next via the kvm.git next branch:

git://git.kernel.org/pub/scm/virt/kvm/kvm.git

Thanks,
Alex
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Wed, Oct 28, 2015 at 01:44:55AM +0100, Paolo Bonzini wrote:
>
>
> On 27/10/2015 22:26, Yunhong Jiang wrote:
> >> > On RT kernels however can you call eventfd_signal from interrupt
> >> > context? You cannot call spin_lock_irqsave (which can sleep) from a
> >> > non-threaded interrupt handler, can you? You would need a raw spin lock.
> > Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on
> > an RT kernel. Will do it this way in the next patch. But I'm not sure
> > if it's overkill to use raw_spinlock there, since eventfd_signal is
> > used by other callers as well.
>
> No, I don't think you can use raw_spinlock there. The problem is not
> just eventfd_signal, it is especially wake_up_locked_poll. You cannot
> convert the whole workqueue infrastructure to use raw_spinlock.

You mean the waitqueue, instead of workqueue, right? One choice is to change
the eventfd to use a simple wait queue, which uses raw_spinlock. But using
a simple waitqueue on eventfd may in fact impact real-time latency in
scenarios other than this one.

>
> Alex, would it make sense to use the IRQ bypass infrastructure always,
> not just for VT-d, to do the MSI injection directly from the VFIO
> interrupt handler and bypass the eventfd? Basically this would add an
> RCU-protected list of consumers matching the token to struct
> irq_bypass_producer, and a
>
>     int (*inject)(struct irq_bypass_consumer *);
>
> callback to struct irq_bypass_consumer. If any callback returns true,
> the eventfd is not signaled. The KVM implementation would be like this
> (compare with virt/kvm/eventfd.c):
>
>     /* Extracted out of irqfd_wakeup */
>     static int
>     irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
>     {
>         ...
>     }
>
>     /* Extracted out of irqfd_wakeup */
>     static int
>     irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
>     {
>         ...
>     }
>
>     static int
>     irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
>                  void *key)
>     {
>         struct kvm_kernel_irqfd *irqfd = container_of(wait,
>             struct kvm_kernel_irqfd, wait);
>         unsigned long flags = (unsigned long)key;
>
>         if (flags & POLLIN)
>             irqfd_wakeup_pollin(irqfd);
>         if (flags & POLLHUP)
>             irqfd_wakeup_pollhup(irqfd);
>
>         return 0;
>     }
>
>     static int kvm_arch_irq_bypass_inject(
>         struct irq_bypass_consumer *cons)
>     {
>         struct kvm_kernel_irqfd *irqfd =
>             container_of(cons, struct kvm_kernel_irqfd,
>                          consumer);
>
>         return irqfd_wakeup_pollin(irqfd);
>     }
>

This is a good idea IMHO. So for MSI interrupts, kvm_arch_irq_bypass_inject
will be used, and irqfd_wakeup will not be invoked anymore, am I right?

I noticed the irq bypass manager is not merged yet; is there a git branch
for it?

> Or do you think it would be a hack? The latency improvement might
> actually be even better than what Yunhong is already reporting.

I will be glad to try it.

Thanks
--jyh

>
> Paolo
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On 28/10/2015 17:00, Alex Williamson wrote:
> > Alex, would it make sense to use the IRQ bypass infrastructure always,
> > not just for VT-d, to do the MSI injection directly from the VFIO
> > interrupt handler and bypass the eventfd? Basically this would add an
> > RCU-protected list of consumers matching the token to struct
> > irq_bypass_producer, and a
> >
> >     int (*inject)(struct irq_bypass_consumer *);
> >
> > callback to struct irq_bypass_consumer. If any callback returns true,
> > the eventfd is not signaled.
>
> Yeah, that might be a good idea, it's probably more plausible than
> making the eventfd_signal() code friendly to call from hard interrupt
> context. On the vfio side can we use request_threaded_irq() directly
> for this?

I don't know if that gives you a non-threaded IRQ with the real-time
kernel... CCing Marcelo to get some insight.

> Making the hard irq handler return IRQ_HANDLED if we can use
> the irq bypass manager or IRQ_WAKE_THREAD if we need to use the eventfd.
> I think we need some way to get back to irq thread context to use
> eventfd_signal().

The irqfd is already able to schedule a work item, because it runs with
interrupts disabled, so I think we can always return IRQ_HANDLED.

There's another little complication. Right now, only x86 has
kvm_set_msi_inatomic. We should merge kvm_set_msi_inatomic,
kvm_set_irq_inatomic and kvm_arch_set_irq.

Some cleanups are needed there; the flow between the functions is really
badly structured because the API grew somewhat by accretion. I'll get
to it next week or on the way back to Italy.

> Would we ever not want to use the direct bypass
> manager path if available? Thanks,

I don't think so. KVM always registers itself as a consumer, even if
there are no VT-d posted interrupts. add_producer simply returns -EINVAL
then.

Paolo
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Wed, 2015-10-28 at 01:44 +0100, Paolo Bonzini wrote:
>
> On 27/10/2015 22:26, Yunhong Jiang wrote:
> >> > On RT kernels however can you call eventfd_signal from interrupt
> >> > context? You cannot call spin_lock_irqsave (which can sleep) from a
> >> > non-threaded interrupt handler, can you? You would need a raw spin lock.
> > Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on
> > an RT kernel. Will do it this way in the next patch. But I'm not sure
> > if it's overkill to use raw_spinlock there, since eventfd_signal is
> > used by other callers as well.
>
> No, I don't think you can use raw_spinlock there. The problem is not
> just eventfd_signal, it is especially wake_up_locked_poll. You cannot
> convert the whole workqueue infrastructure to use raw_spinlock.
>
> Alex, would it make sense to use the IRQ bypass infrastructure always,
> not just for VT-d, to do the MSI injection directly from the VFIO
> interrupt handler and bypass the eventfd? Basically this would add an
> RCU-protected list of consumers matching the token to struct
> irq_bypass_producer, and a
>
>     int (*inject)(struct irq_bypass_consumer *);
>
> callback to struct irq_bypass_consumer. If any callback returns true,
> the eventfd is not signaled. The KVM implementation would be like this
> (compare with virt/kvm/eventfd.c):
>
>     /* Extracted out of irqfd_wakeup */
>     static int
>     irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
>     {
>         ...
>     }
>
>     /* Extracted out of irqfd_wakeup */
>     static int
>     irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
>     {
>         ...
>     }
>
>     static int
>     irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
>                  void *key)
>     {
>         struct kvm_kernel_irqfd *irqfd = container_of(wait,
>             struct kvm_kernel_irqfd, wait);
>         unsigned long flags = (unsigned long)key;
>
>         if (flags & POLLIN)
>             irqfd_wakeup_pollin(irqfd);
>         if (flags & POLLHUP)
>             irqfd_wakeup_pollhup(irqfd);
>
>         return 0;
>     }
>
>     static int kvm_arch_irq_bypass_inject(
>         struct irq_bypass_consumer *cons)
>     {
>         struct kvm_kernel_irqfd *irqfd =
>             container_of(cons, struct kvm_kernel_irqfd,
>                          consumer);
>
>         return irqfd_wakeup_pollin(irqfd);
>     }
>
> Or do you think it would be a hack? The latency improvement might
> actually be even better than what Yunhong is already reporting.

Yeah, that might be a good idea, it's probably more plausible than
making the eventfd_signal() code friendly to call from hard interrupt
context. On the vfio side can we use request_threaded_irq() directly
for this? Making the hard irq handler return IRQ_HANDLED if we can use
the irq bypass manager or IRQ_WAKE_THREAD if we need to use the eventfd.
I think we need some way to get back to irq thread context to use
eventfd_signal(). Would we ever not want to use the direct bypass
manager path if available? Thanks,

Alex
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On 27/10/2015 22:26, Yunhong Jiang wrote:
>> > On RT kernels however can you call eventfd_signal from interrupt
>> > context? You cannot call spin_lock_irqsave (which can sleep) from a
>> > non-threaded interrupt handler, can you? You would need a raw spin lock.
> Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on
> an RT kernel. Will do it this way in the next patch. But I'm not sure
> if it's overkill to use raw_spinlock there, since eventfd_signal is
> used by other callers as well.

No, I don't think you can use raw_spinlock there. The problem is not
just eventfd_signal, it is especially wake_up_locked_poll. You cannot
convert the whole workqueue infrastructure to use raw_spinlock.

Alex, would it make sense to use the IRQ bypass infrastructure always,
not just for VT-d, to do the MSI injection directly from the VFIO
interrupt handler and bypass the eventfd? Basically this would add an
RCU-protected list of consumers matching the token to struct
irq_bypass_producer, and a

    int (*inject)(struct irq_bypass_consumer *);

callback to struct irq_bypass_consumer. If any callback returns true,
the eventfd is not signaled. The KVM implementation would be like this
(compare with virt/kvm/eventfd.c):

    /* Extracted out of irqfd_wakeup */
    static int
    irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
    {
        ...
    }

    /* Extracted out of irqfd_wakeup */
    static int
    irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
    {
        ...
    }

    static int
    irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
                 void *key)
    {
        struct kvm_kernel_irqfd *irqfd = container_of(wait,
            struct kvm_kernel_irqfd, wait);
        unsigned long flags = (unsigned long)key;

        if (flags & POLLIN)
            irqfd_wakeup_pollin(irqfd);
        if (flags & POLLHUP)
            irqfd_wakeup_pollhup(irqfd);

        return 0;
    }

    static int kvm_arch_irq_bypass_inject(
        struct irq_bypass_consumer *cons)
    {
        struct kvm_kernel_irqfd *irqfd =
            container_of(cons, struct kvm_kernel_irqfd,
                         consumer);

        return irqfd_wakeup_pollin(irqfd);
    }

Or do you think it would be a hack? The latency improvement might
actually be even better than what Yunhong is already reporting.

Paolo
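The dispatch rule proposed above ("if any callback returns true, the eventfd is not signaled") can be tried out in a small userspace model. This is a sketch only: `struct consumer`, `struct producer`, `producer_fire`, `inject_accept` and `inject_decline` are hypothetical names, the consumer list is a plain singly linked list rather than an RCU-protected one, and calling every matching consumer instead of stopping at the first success is an assumption the thread does not settle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a bypass consumer. */
struct consumer {
    const void *token;                     /* matched against the producer's token */
    bool (*inject)(struct consumer *cons); /* returns true if the interrupt was injected */
    int hits;                              /* how many times inject accepted */
    struct consumer *next;
};

/* Hypothetical userspace model of a bypass producer. */
struct producer {
    const void *token;
    struct consumer *consumers; /* RCU-protected in the proposal; plain list here */
    int eventfd_signals;        /* stands in for the eventfd fallback */
};

/* Offer the interrupt to every matching consumer; fall back if none accepts. */
static void producer_fire(struct producer *prod)
{
    bool injected = false;

    assert(prod != NULL);
    for (struct consumer *c = prod->consumers; c != NULL; c = c->next) {
        if (c->token == prod->token && c->inject && c->inject(c))
            injected = true;
    }
    if (!injected)
        prod->eventfd_signals++;
}

/* Example callback that accepts the interrupt. */
static bool inject_accept(struct consumer *cons)
{
    cons->hits++;
    return true;
}

/* Example callback that declines, forcing the fallback path. */
static bool inject_decline(struct consumer *cons)
{
    (void)cons;
    return false;
}
```

A producer whose only consumer uses `inject_accept` never touches the fallback counter, while a consumer with a different token, or one using `inject_decline`, leaves the interrupt to the eventfd path.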
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Tue, Oct 27, 2015 at 10:29:28AM +0100, Paolo Bonzini wrote:
>
>
> On 27/10/2015 07:35, Yunhong Jiang wrote:
> > On Mon, Oct 26, 2015 at 09:37:14PM -0600, Alex Williamson wrote:
> >> On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
> >>> An option to force VFIO PCI MSI/MSI-X handler as non-threaded IRQ,
> >>> even when CONFIG_IRQ_FORCED_THREADING=y. This is useful when
> >>> assigning a device to a guest with a low latency requirement since it
> >>> reduces the context switches to/from the IRQ thread.
> >>
> >> Is there any way we can do this automatically? Perhaps detecting that
> >> we're on a RT kernel or maybe that the user is running with RT priority?
> >> I find that module options are mostly misunderstood and misused.
> >
> > Alex, thanks for review.
> >
> > It's not easy to detect if the user is running with RT priority, since
> > sometimes the user starts the thread and then sets the scheduler
> > priority later.
> >
> > Also, should we do this only for the in-kernel irqchip scenario and not
> > for the userspace handler, since the in-kernel irqchip has lower overhead?
>
> The overhead of the non-threaded IRQ handler is the same for kernel or
> userspace irqchip, since the handler just writes 1 to the eventfd.

IIUC, the handler not only writes 1 to the eventfd, it also invokes the wait
queue function, and the in-kernel irqchip has a different callback from the
userspace irqchip, am I right? But I should not have stated that the
in-kernel irqchip has lower overhead, since I have no data for it.

>
> On RT kernels however can you call eventfd_signal from interrupt
> context? You cannot call spin_lock_irqsave (which can sleep) from a
> non-threaded interrupt handler, can you? You would need a raw spin lock.

Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on
an RT kernel. Will do it this way in the next patch. But I'm not sure
if it's overkill to use raw_spinlock there, since eventfd_signal is
used by other callers as well.

Thanks
--jyh

>
> Paolo
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On 27/10/2015 07:35, Yunhong Jiang wrote:
> On Mon, Oct 26, 2015 at 09:37:14PM -0600, Alex Williamson wrote:
>> On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
>>> An option to force VFIO PCI MSI/MSI-X handler as non-threaded IRQ,
>>> even when CONFIG_IRQ_FORCED_THREADING=y. This is useful when
>>> assigning a device to a guest with a low latency requirement since it
>>> reduces the context switches to/from the IRQ thread.
>>
>> Is there any way we can do this automatically? Perhaps detecting that
>> we're on a RT kernel or maybe that the user is running with RT priority?
>> I find that module options are mostly misunderstood and misused.
>
> Alex, thanks for review.
>
> It's not easy to detect if the user is running with RT priority, since
> sometimes the user starts the thread and then sets the scheduler
> priority later.
>
> Also, should we do this only for the in-kernel irqchip scenario and not
> for the userspace handler, since the in-kernel irqchip has lower overhead?

The overhead of the non-threaded IRQ handler is the same for kernel or
userspace irqchip, since the handler just writes 1 to the eventfd.

On RT kernels however can you call eventfd_signal from interrupt
context? You cannot call spin_lock_irqsave (which can sleep) from a
non-threaded interrupt handler, can you? You would need a raw spin lock.

Paolo
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Mon, Oct 26, 2015 at 09:37:14PM -0600, Alex Williamson wrote:
> On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
> > An option to force VFIO PCI MSI/MSI-X handler as non-threaded IRQ,
> > even when CONFIG_IRQ_FORCED_THREADING=y. This is useful when
> > assigning a device to a guest with a low latency requirement since it
> > reduces the context switches to/from the IRQ thread.
>
> Is there any way we can do this automatically? Perhaps detecting that
> we're on a RT kernel or maybe that the user is running with RT priority?
> I find that module options are mostly misunderstood and misused.

Alex, thanks for review.

It's not easy to detect if the user is running with RT priority, since
sometimes the user starts the thread and then sets the scheduler priority
later.

Also, should we do this only for the in-kernel irqchip scenario and not for
the userspace handler, since the in-kernel irqchip has lower overhead?

>
> > An experiment was conducted on a HSW platform for 1 minute, with the
> > guest vCPU bound to isolated pCPU. The assigned device triggered the
> > interrupt every 1ms. The average EXTERNAL_INTERRUPT exit handling time
> > is dropped from 5.3us to 2.2us.
> >
> > Another choice is to change VFIO_DEVICE_SET_IRQS ioctl, to apply this
> > option only to specific devices when in kernel irq_chip is enabled. It
> > provides more flexibility but is more complex, not sure if we need go
> > through that way.
>
> Allowing the user to decide whether or not to use a threaded IRQ seems
> like a privilege violation; a chance for the user to game the system and
> give themselves better latency, maybe at the cost of others. I think

Yes, you are right. One benefit of the ioctl change is that it gives a
per-device option and is thus more flexible.

> we're better off trying to infer the privilege from the task priority or

I'd think the system admin may make the decision after some tuning; as you
said, it is "maybe at the cost of others", and I'm not sure we should make
the decision based on task priority or kernel config.

Thanks
--jyh

> kernel config or, if we run out of options, make a module option as you
> have here requiring the system admin to provide the privilege. Thanks,
>
> Alex
>
> > Signed-off-by: Yunhong Jiang
> > ---
> >  drivers/vfio/pci/vfio_pci_intrs.c | 10 +-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> > index 1f577b4..ca1f95a 100644
> > --- a/drivers/vfio/pci/vfio_pci_intrs.c
> > +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> > @@ -22,9 +22,13 @@
> >  #include
> >  #include
> >  #include
> > +#include
> >
> >  #include "vfio_pci_private.h"
> >
> > +static bool nonthread_msi = 1;
> > +module_param(nonthread_msi, bool, 0444);
> > +
> >  /*
> >   * INTx
> >   */
> > @@ -313,6 +317,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >      char *name = msix ? "vfio-msix" : "vfio-msi";
> >      struct eventfd_ctx *trigger;
> >      int ret;
> > +    unsigned long irqflags = 0;
> >
> >      if (vector >= vdev->num_ctx)
> >          return -EINVAL;
> > @@ -352,7 +357,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
> >          pci_write_msi_msg(irq, &msg);
> >      }
> >
> > -    ret = request_irq(irq, vfio_msihandler, 0,
> > +    if (nonthread_msi)
> > +        irqflags = IRQF_NO_THREAD;
> > +
> > +    ret = request_irq(irq, vfio_msihandler, irqflags,
> >                vdev->ctx[vector].name, trigger);
> >      if (ret) {
> >          kfree(vdev->ctx[vector].name);
> >
Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
On Mon, 2015-10-26 at 18:20 -0700, Yunhong Jiang wrote:
> An option to force VFIO PCI MSI/MSI-X handler as non-threaded IRQ,
> even when CONFIG_IRQ_FORCED_THREADING=y. This is useful when
> assigning a device to a guest with low latency requirement since it
> reduces the context switch to/from the IRQ thread.

Is there any way we can do this automatically? Perhaps detecting that
we're on a RT kernel or maybe that the user is running with RT priority?
I find that module options are mostly misunderstood and misused.

> An experiment was conducted on a HSW platform for 1 minute, with the
> guest vCPU bound to isolated pCPU. The assigned device triggered the
> interrupt every 1ms. The average EXTERNAL_INTERRUPT exit handling time
> dropped from 5.3us to 2.2us.
>
> Another choice is to change VFIO_DEVICE_SET_IRQS ioctl, to apply this
> option only to specific devices when in kernel irq_chip is enabled. It
> provides more flexibility but is more complex, not sure if we need go
> through that way.

Allowing the user to decide whether or not to use a threaded IRQ seems
like a privilege violation; a chance for the user to game the system and
give themselves better latency, maybe at the cost of others. I think
we're better off trying to infer the privilege from the task priority or
kernel config or, if we run out of options, make a module option as you
have here requiring the system admin to provide the privilege. Thanks,

Alex

> Signed-off-by: Yunhong Jiang
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 1f577b4..ca1f95a 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -22,9 +22,13 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "vfio_pci_private.h"
>
> +static bool nonthread_msi = 1;
> +module_param(nonthread_msi, bool, 0444);
> +
>  /*
>   * INTx
>   */
> @@ -313,6 +317,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  	char *name = msix ? "vfio-msix" : "vfio-msi";
>  	struct eventfd_ctx *trigger;
>  	int ret;
> +	unsigned long irqflags = 0;
>
>  	if (vector >= vdev->num_ctx)
>  		return -EINVAL;
> @@ -352,7 +357,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		pci_write_msi_msg(irq, &msg);
>  	}
>
> -	ret = request_irq(irq, vfio_msihandler, 0,
> +	if (nonthread_msi)
> +		irqflags = IRQF_NO_THREAD;
> +
> +	ret = request_irq(irq, vfio_msihandler, irqflags,
>  			  vdev->ctx[vector].name, trigger);
>  	if (ret) {
>  		kfree(vdev->ctx[vector].name);
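To put the quoted measurement in perspective, here is a small arithmetic sketch using only the figures stated in the patch description: a 1 minute run, one interrupt every 1 ms, and an average exit handling time going from 5.3us to 2.2us. The function names are made up for illustration, and the sketch assumes the interrupt rate was constant for the whole run. Under that assumption the run covers about 60000 interrupts, each handled roughly 3.1us faster (a reduction of about 58%), which adds up to roughly 186 ms over the minute.

```c
#include <assert.h>

/* Number of interrupts delivered over a run at a fixed period. */
long interrupt_count(long duration_ms, long period_ms)
{
	assert(period_ms > 0);
	return duration_ms / period_ms;
}

/* Time saved per interrupt, in microseconds. */
double saving_per_irq_us(double before_us, double after_us)
{
	return before_us - after_us;
}

/* Relative reduction, as a percentage of the original handling time. */
double reduction_percent(double before_us, double after_us)
{
	assert(before_us > 0.0);
	return 100.0 * (before_us - after_us) / before_us;
}

/* Total time saved across the whole run, in milliseconds. */
double total_saving_ms(long irqs, double before_us, double after_us)
{
	return (double)irqs * (before_us - after_us) / 1000.0;
}
```

For the numbers above, interrupt_count(60000, 1) gives the interrupt count and total_saving_ms() of that count with 5.3 and 2.2 gives the aggregate saving. These are averages from a single experiment, so they say nothing about worst-case latency.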
[RFC PATCH] VFIO: Add a parameter to force nonthread IRQ
An option to force VFIO PCI MSI/MSI-X handler as non-threaded IRQ,
even when CONFIG_IRQ_FORCED_THREADING=y. This is useful when
assigning a device to a guest with low latency requirement since it
reduces the context switch to/from the IRQ thread.

An experiment was conducted on a HSW platform for 1 minute, with the
guest vCPU bound to isolated pCPU. The assigned device triggered the
interrupt every 1ms. The average EXTERNAL_INTERRUPT exit handling time
dropped from 5.3us to 2.2us.

Another choice is to change the VFIO_DEVICE_SET_IRQS ioctl, to apply this
option only to specific devices when the in-kernel irq_chip is enabled. It
provides more flexibility but is more complex, and I am not sure we need
to go that way.

Signed-off-by: Yunhong Jiang
---
 drivers/vfio/pci/vfio_pci_intrs.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 1f577b4..ca1f95a 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -22,9 +22,13 @@
 #include
 #include
 #include
+#include

 #include "vfio_pci_private.h"

+static bool nonthread_msi = 1;
+module_param(nonthread_msi, bool, 0444);
+
 /*
  * INTx
  */
@@ -313,6 +317,7 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 	char *name = msix ? "vfio-msix" : "vfio-msi";
 	struct eventfd_ctx *trigger;
 	int ret;
+	unsigned long irqflags = 0;

 	if (vector >= vdev->num_ctx)
 		return -EINVAL;
@@ -352,7 +357,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		pci_write_msi_msg(irq, &msg);
 	}

-	ret = request_irq(irq, vfio_msihandler, 0,
+	if (nonthread_msi)
+		irqflags = IRQF_NO_THREAD;
+
+	ret = request_irq(irq, vfio_msihandler, irqflags,
 			  vdev->ctx[vector].name, trigger);
 	if (ret) {
 		kfree(vdev->ctx[vector].name);
-- 
1.8.3.1