Re: [PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
On Thu, 2008-07-24 at 14:18 +0200, Sebastien Dugue wrote:
> On Thu, 24 Jul 2008 21:11:34 +1000 Nick Piggin [EMAIL PROTECTED] wrote:
>
> > On Thursday 24 July 2008 20:50, Sebastien Dugue wrote:
> > > From: Sebastien Dugue [EMAIL PROTECTED]
> > > Date: Tue, 22 Jul 2008 11:56:41 +0200
> > > Subject: [PATCH][RT] powerpc - Make the irq reverse mapping radix tree lockless
> > >
> > > The radix tree used by interrupt controllers for their irq reverse
> > > mapping (currently only the XICS found on pSeries) has a complex
> > > locking scheme dating back to before the advent of the concurrent
> > > radix tree on preempt-rt.
> > >
> > > Take advantage of this and of the fact that the items of the tree
> > > are pointers to elements of a static array (irq_map) which can never
> > > go under us to simplify the locking.
> > >
> > > Concurrency between readers and writers is handled by the intrinsic
> > > properties of the concurrent radix tree. Concurrency between the
> > > tree initialization, which is done asynchronously with readers and
> > > writers access, is handled via an atomic variable
> > > (revmap_trees_allocated) set when the tree has been initialized and
> > > checked before any reader or writer access, just like we used to
> > > check for tree.gfp_mask != 0 before.
> >
> > Hmm, RCU radix tree has been in mainline too for quite a while. I
> > thought Ben had already converted this code over ages ago...
>
> Mainline does not have the concurrent radix tree which this patch is
> based on, but maybe it's overkill and the RCU radix tree is enough. Not
> sure, will have to think about it a bit more.

Should be. The model of the concurrent radix tree can be mapped to
spinlock + rcu radix tree.

So instead of:

+	DEFINE_RADIX_TREE_CONTEXT(ctx, tree);
+	radix_tree_lock(&ctx);
+	radix_tree_insert(ctx.tree, hwirq, &irq_map[virq]);
+	radix_tree_unlock(&ctx);

you then write:

	spin_lock(&host->revmap_data.tree_lock);
	radix_tree_insert(&host->revmap_data.tree, hwirq, &irq_map[virq]);
	spin_unlock(&host->revmap_data.tree_lock);

The only advantage of the concurrent radix tree over this model is that
it can potentially do multiple modification operations at the same time.

Still, cool that you used it ;-)

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
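The "one lock for writers, no lock for lookups" model described in this message can be sketched in userspace C. This is only an illustration under stated assumptions, not code from the patch: a flat array of atomic pointers (`revmap`) stands in for the RCU radix tree, a pthread mutex stands in for the spinlock, and `NR_HWIRQS`, `revmap_insert`, `revmap_delete` and `revmap_lookup` are hypothetical names. It relies on the property the patch description states: the stored items point into a static array (`irq_map`) that is never freed, so a reader can never dereference a dead entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_IRQS   16	/* hypothetical size of the static virq table */
#define NR_HWIRQS 64	/* hypothetical hwirq range covered by the map */
#define NO_IRQ    0	/* virq 0 means "no mapping" in this sketch */

struct irq_map_entry {
	unsigned long hwirq;
};

/* Static array: entries are never freed, so stale pointers cannot occur. */
static struct irq_map_entry irq_map[NR_IRQS];

/* Assumption: a flat table of atomic pointers replaces the radix tree. */
static _Atomic(struct irq_map_entry *) revmap[NR_HWIRQS];

/* Assumption: a mutex plays the role of the spinlock serializing writers. */
static pthread_mutex_t revmap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: insert hwirq -> virq under the lock. Returns 0 on success. */
static int revmap_insert(unsigned long hwirq, unsigned int virq)
{
	if (hwirq >= NR_HWIRQS || virq == NO_IRQ || virq >= NR_IRQS)
		return -1;

	pthread_mutex_lock(&revmap_lock);
	irq_map[virq].hwirq = hwirq;
	/* Release store: publish the entry only after it is filled in. */
	atomic_store_explicit(&revmap[hwirq], &irq_map[virq],
			      memory_order_release);
	pthread_mutex_unlock(&revmap_lock);
	return 0;
}

/* Writer: remove a mapping under the same lock. */
static void revmap_delete(unsigned long hwirq)
{
	if (hwirq >= NR_HWIRQS)
		return;

	pthread_mutex_lock(&revmap_lock);
	atomic_store_explicit(&revmap[hwirq], NULL, memory_order_release);
	pthread_mutex_unlock(&revmap_lock);
}

/* Reader: takes no lock, only an acquire load of the slot. */
static unsigned int revmap_lookup(unsigned long hwirq)
{
	struct irq_map_entry *ptr;

	if (hwirq >= NR_HWIRQS)
		return NO_IRQ;

	ptr = atomic_load_explicit(&revmap[hwirq], memory_order_acquire);
	/* The virq is the index of the entry within the static array. */
	return ptr ? (unsigned int)(ptr - irq_map) : NO_IRQ;
}
```

Calling `revmap_insert(5, 3)` makes `revmap_lookup(5)` return 3 until `revmap_delete(5)` runs. In kernel code the lookup side would additionally sit inside an RCU read-side critical section; the sketch has no equivalent because nothing is ever freed.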
Re: [PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
On Fri, 2008-07-25 at 09:49 +0200, Peter Zijlstra wrote:
> The only advantage of the concurrent radix tree over this model is that
> it can potentially do multiple modification operations at the same time.

Yup, we do not need that for the irq revmap... concurrent lookup is all
we need.

Cheers,
Ben.
Re: [PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
Hi Peter,

On Fri, 25 Jul 2008 09:49:37 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote:

> On Thu, 2008-07-24 at 14:18 +0200, Sebastien Dugue wrote:
> > On Thu, 24 Jul 2008 21:11:34 +1000 Nick Piggin [EMAIL PROTECTED] wrote:
> >
> > > On Thursday 24 July 2008 20:50, Sebastien Dugue wrote:
> > > > From: Sebastien Dugue [EMAIL PROTECTED]
> > > > Date: Tue, 22 Jul 2008 11:56:41 +0200
> > > > Subject: [PATCH][RT] powerpc - Make the irq reverse mapping radix tree lockless
> > > >
> > > > The radix tree used by interrupt controllers for their irq reverse
> > > > mapping (currently only the XICS found on pSeries) has a complex
> > > > locking scheme dating back to before the advent of the concurrent
> > > > radix tree on preempt-rt.
> > > >
> > > > Take advantage of this and of the fact that the items of the tree
> > > > are pointers to elements of a static array (irq_map) which can
> > > > never go under us to simplify the locking.
> > > >
> > > > Concurrency between readers and writers is handled by the
> > > > intrinsic properties of the concurrent radix tree. Concurrency
> > > > between the tree initialization, which is done asynchronously with
> > > > readers and writers access, is handled via an atomic variable
> > > > (revmap_trees_allocated) set when the tree has been initialized
> > > > and checked before any reader or writer access, just like we used
> > > > to check for tree.gfp_mask != 0 before.
> > >
> > > Hmm, RCU radix tree has been in mainline too for quite a while. I
> > > thought Ben had already converted this code over ages ago...
> >
> > Mainline does not have the concurrent radix tree which this patch is
> > based on, but maybe it's overkill and the RCU radix tree is enough.
> > Not sure, will have to think about it a bit more.
>
> Should be. The model of the concurrent radix tree can be mapped to
> spinlock + rcu radix tree.
>
> So instead of:
>
> +	DEFINE_RADIX_TREE_CONTEXT(ctx, tree);
> +	radix_tree_lock(&ctx);
> +	radix_tree_insert(ctx.tree, hwirq, &irq_map[virq]);
> +	radix_tree_unlock(&ctx);
>
> you then write:
>
> 	spin_lock(&host->revmap_data.tree_lock);
> 	radix_tree_insert(&host->revmap_data.tree, hwirq, &irq_map[virq]);
> 	spin_unlock(&host->revmap_data.tree_lock);

Cool, that will indeed make it much easier to have something applicable
to mainline which works with preempt-rt.

> The only advantage of the concurrent radix tree over this model is that
> it can potentially do multiple modification operations at the same time.

Well, in theory that can happen if a module is loaded which creates a
mapping while another one is unloaded at the same time. The time window
is pretty narrow, but still present nonetheless. That's why I chose to
use the concurrent version.

> Still, cool that you used it ;-)

Yep, looked like what was needed until I realized it was not available
in mainline.

Nice work though and good paper for explaining it all.

Sebastien.
Re: [PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
On Fri, 25 Jul 2008 18:27:20 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:

> On Fri, 2008-07-25 at 09:49 +0200, Peter Zijlstra wrote:
> > The only advantage of the concurrent radix tree over this model is
> > that it can potentially do multiple modification operations at the
> > same time.
>
> Yup, we do not need that for the irq revmap... concurrent lookup is all
> we need.

Shouldn't we care about concurrent insertion and deletion in the tree?

I agree that concern might be a bit artificial but in theory that can
happen.

Sebastien.
Re: [PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
On Fri, 25 Jul 2008 18:40:21 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:

> On Fri, 2008-07-25 at 10:36 +0200, Sebastien Dugue wrote:
> > On Fri, 25 Jul 2008 18:27:20 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
> >
> > > On Fri, 2008-07-25 at 09:49 +0200, Peter Zijlstra wrote:
> > > > The only advantage of the concurrent radix tree over this model
> > > > is that it can potentially do multiple modification operations at
> > > > the same time.
> > >
> > > Yup, we do not need that for the irq revmap... concurrent lookup is
> > > all we need.
> >
> > Shouldn't we care about concurrent insertion and deletion in the tree?
> >
> > I agree that concern might be a bit artificial but in theory that can
> > happen.
>
> Yes, we just need to protect it with a big hammer, like a spinlock,
> it's not a performance critical code path.

Agreed. Will look into this in the next few days.

Thanks,

Sebastien.
Re: [PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
On Fri, 2008-07-25 at 10:36 +0200, Sebastien Dugue wrote:
> On Fri, 25 Jul 2008 18:27:20 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:
>
> > On Fri, 2008-07-25 at 09:49 +0200, Peter Zijlstra wrote:
> > > The only advantage of the concurrent radix tree over this model is
> > > that it can potentially do multiple modification operations at the
> > > same time.
> >
> > Yup, we do not need that for the irq revmap... concurrent lookup is
> > all we need.
>
> Shouldn't we care about concurrent insertion and deletion in the tree?
>
> I agree that concern might be a bit artificial but in theory that can
> happen.

Yes, we just need to protect it with a big hammer, like a spinlock,
it's not a performance critical code path.

Ben.
[PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
From: Sebastien Dugue [EMAIL PROTECTED]
Date: Tue, 22 Jul 2008 11:56:41 +0200
Subject: [PATCH][RT] powerpc - Make the irq reverse mapping radix tree lockless

The radix tree used by interrupt controllers for their irq reverse
mapping (currently only the XICS found on pSeries) has a complex locking
scheme dating back to before the advent of the concurrent radix tree on
preempt-rt.

Take advantage of this and of the fact that the items of the tree are
pointers to elements of a static array (irq_map) which can never go
under us to simplify the locking.

Concurrency between readers and writers is handled by the intrinsic
properties of the concurrent radix tree. Concurrency between the tree
initialization, which is done asynchronously with readers and writers
access, is handled via an atomic variable (revmap_trees_allocated) set
when the tree has been initialized and checked before any reader or
writer access, just like we used to check for tree.gfp_mask != 0 before.

Signed-off-by: Sebastien Dugue [EMAIL PROTECTED]
Cc: Benjamin Herrenschmidt [EMAIL PROTECTED]
Cc: Paul Mackerras [EMAIL PROTECTED]
---
 arch/powerpc/kernel/irq.c |  102 --
 1 file changed, 27 insertions(+), 75 deletions(-)

Index: linux-2.6.25.8-rt7/arch/powerpc/kernel/irq.c
===
--- linux-2.6.25.8-rt7.orig/arch/powerpc/kernel/irq.c
+++ linux-2.6.25.8-rt7/arch/powerpc/kernel/irq.c
@@ -403,8 +403,7 @@ void do_softirq(void)
 
 static LIST_HEAD(irq_hosts);
 static DEFINE_RAW_SPINLOCK(irq_big_lock);
-static DEFINE_PER_CPU(unsigned int, irq_radix_reader);
-static unsigned int irq_radix_writer;
+static atomic_t revmap_trees_allocated = ATOMIC_INIT(0);
 struct irq_map_entry irq_map[NR_IRQS];
 static unsigned int irq_virq_count = NR_IRQS;
 static struct irq_host *irq_default_host;
@@ -547,57 +546,6 @@ void irq_set_virq_count(unsigned int cou
 		irq_virq_count = count;
 }
 
-/* radix tree not lockless safe ! we use a brlock-type mecanism
- * for now, until we can use a lockless radix tree
- */
-static void irq_radix_wrlock(unsigned long *flags)
-{
-	unsigned int cpu, ok;
-
-	spin_lock_irqsave(&irq_big_lock, *flags);
-	irq_radix_writer = 1;
-	smp_mb();
-	do {
-		barrier();
-		ok = 1;
-		for_each_possible_cpu(cpu) {
-			if (per_cpu(irq_radix_reader, cpu)) {
-				ok = 0;
-				break;
-			}
-		}
-		if (!ok)
-			cpu_relax();
-	} while(!ok);
-}
-
-static void irq_radix_wrunlock(unsigned long flags)
-{
-	smp_wmb();
-	irq_radix_writer = 0;
-	spin_unlock_irqrestore(&irq_big_lock, flags);
-}
-
-static void irq_radix_rdlock(unsigned long *flags)
-{
-	local_irq_save(*flags);
-	__get_cpu_var(irq_radix_reader) = 1;
-	smp_mb();
-	if (likely(irq_radix_writer == 0))
-		return;
-	__get_cpu_var(irq_radix_reader) = 0;
-	smp_wmb();
-	spin_lock(&irq_big_lock);
-	__get_cpu_var(irq_radix_reader) = 1;
-	spin_unlock(&irq_big_lock);
-}
-
-static void irq_radix_rdunlock(unsigned long flags)
-{
-	__get_cpu_var(irq_radix_reader) = 0;
-	local_irq_restore(flags);
-}
-
 static int irq_setup_virq(struct irq_host *host, unsigned int virq,
 			    irq_hw_number_t hwirq)
 {
@@ -752,7 +700,6 @@ void irq_dispose_mapping(unsigned int vi
 {
 	struct irq_host *host;
 	irq_hw_number_t hwirq;
-	unsigned long flags;
 
 	if (virq == NO_IRQ)
 		return;
@@ -784,15 +731,20 @@ void irq_dispose_mapping(unsigned int vi
 		if (hwirq < host->revmap_data.linear.size)
 			host->revmap_data.linear.revmap[hwirq] = NO_IRQ;
 		break;
-	case IRQ_HOST_MAP_TREE:
+	case IRQ_HOST_MAP_TREE: {
+		DEFINE_RADIX_TREE_CONTEXT(ctx, &host->revmap_data.tree);
+
 		/* Check if radix tree allocated yet */
-		if (host->revmap_data.tree.gfp_mask == 0)
+		if (atomic_read(&revmap_trees_allocated) == 0)
 			break;
-		irq_radix_wrlock(&flags);
-		radix_tree_delete(&host->revmap_data.tree, hwirq);
-		irq_radix_wrunlock(flags);
+
+		radix_tree_lock(&ctx);
+		radix_tree_delete(ctx.tree, hwirq);
+		radix_tree_unlock(&ctx);
+
 		break;
 	}
+	}
 
 	/* Destroy map */
 	smp_mb();
@@ -845,22 +797,20 @@ unsigned int irq_radix_revmap(struct irq
 	struct radix_tree_root *tree;
 	struct irq_map_entry *ptr;
 	unsigned int virq;
-	unsigned long flags;
 
 	WARN_ON(host->revmap_type != IRQ_HOST_MAP_TREE);
 
-	/* Check if the radix tree exist yet. We test the value of
-	 * the gfp_mask for that. Sneaky but
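The initialization gate the patch description mentions (a flag set once the tree has been set up and checked before any reader or writer touches it) can be illustrated with a small userspace sketch. Everything here beyond the name `revmap_trees_allocated` is an assumption made for illustration: C11 atomics replace the kernel's `atomic_t`, a fixed array of slots replaces the radix tree, and `revmap_tree_init`, `revmap_set`, `revmap_find`, `NR_SLOTS` and `EMPTY` are hypothetical names. The release/acquire ordering is a choice made for this sketch, not something the patch is known to use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_SLOTS 8	/* hypothetical capacity of the stand-in "tree" */
#define EMPTY    (-1)	/* hypothetical marker for an unused slot */

/* Assumption: a fixed array of slots stands in for the radix tree. */
static atomic_int revmap_slots[NR_SLOTS];

/* 0 until initialization has completed, 1 afterwards. */
static atomic_int revmap_trees_allocated;

/* Runs asynchronously with readers and writers, e.g. late during boot. */
static void revmap_tree_init(void)
{
	for (size_t i = 0; i < NR_SLOTS; i++)
		atomic_store_explicit(&revmap_slots[i], EMPTY,
				      memory_order_relaxed);
	/* Release store: the slots are fully set up before the flag flips. */
	atomic_store_explicit(&revmap_trees_allocated, 1,
			      memory_order_release);
}

/* Writer: refuses to touch the structure until the flag is set. */
static bool revmap_set(unsigned int slot, int virq)
{
	if (!atomic_load_explicit(&revmap_trees_allocated,
				  memory_order_acquire))
		return false;
	if (slot >= NR_SLOTS)
		return false;
	atomic_store_explicit(&revmap_slots[slot], virq,
			      memory_order_release);
	return true;
}

/* Reader: same gate, then a plain atomic load of the slot. */
static bool revmap_find(unsigned int slot, int *virq)
{
	int v;

	if (!atomic_load_explicit(&revmap_trees_allocated,
				  memory_order_acquire))
		return false;
	if (slot >= NR_SLOTS)
		return false;
	v = atomic_load_explicit(&revmap_slots[slot], memory_order_acquire);
	if (v == EMPTY)
		return false;
	*virq = v;
	return true;
}
```

Before `revmap_tree_init()` has run, both `revmap_set()` and `revmap_find()` report failure without touching the slots, which mirrors the role the `tree.gfp_mask != 0` test played in the old code.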
Re: [PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
On Thursday 24 July 2008 20:50, Sebastien Dugue wrote:
> From: Sebastien Dugue [EMAIL PROTECTED]
> Date: Tue, 22 Jul 2008 11:56:41 +0200
> Subject: [PATCH][RT] powerpc - Make the irq reverse mapping radix tree lockless
>
> The radix tree used by interrupt controllers for their irq reverse
> mapping (currently only the XICS found on pSeries) has a complex
> locking scheme dating back to before the advent of the concurrent
> radix tree on preempt-rt.
>
> Take advantage of this and of the fact that the items of the tree are
> pointers to elements of a static array (irq_map) which can never go
> under us to simplify the locking.
>
> Concurrency between readers and writers is handled by the intrinsic
> properties of the concurrent radix tree. Concurrency between the tree
> initialization, which is done asynchronously with readers and writers
> access, is handled via an atomic variable (revmap_trees_allocated) set
> when the tree has been initialized and checked before any reader or
> writer access, just like we used to check for tree.gfp_mask != 0
> before.

Hmm, RCU radix tree has been in mainline too for quite a while. I
thought Ben had already converted this code over ages ago...

Nothing against the -rt patch, but mainline should probably be updated
to use RCU as well?
Re: [PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
On Thu, 24 Jul 2008 21:11:34 +1000 Nick Piggin [EMAIL PROTECTED] wrote:

> On Thursday 24 July 2008 20:50, Sebastien Dugue wrote:
> > From: Sebastien Dugue [EMAIL PROTECTED]
> > Date: Tue, 22 Jul 2008 11:56:41 +0200
> > Subject: [PATCH][RT] powerpc - Make the irq reverse mapping radix tree lockless
> >
> > The radix tree used by interrupt controllers for their irq reverse
> > mapping (currently only the XICS found on pSeries) has a complex
> > locking scheme dating back to before the advent of the concurrent
> > radix tree on preempt-rt.
> >
> > Take advantage of this and of the fact that the items of the tree
> > are pointers to elements of a static array (irq_map) which can never
> > go under us to simplify the locking.
> >
> > Concurrency between readers and writers is handled by the intrinsic
> > properties of the concurrent radix tree. Concurrency between the
> > tree initialization, which is done asynchronously with readers and
> > writers access, is handled via an atomic variable
> > (revmap_trees_allocated) set when the tree has been initialized and
> > checked before any reader or writer access, just like we used to
> > check for tree.gfp_mask != 0 before.
>
> Hmm, RCU radix tree has been in mainline too for quite a while. I
> thought Ben had already converted this code over ages ago...

Mainline does not have the concurrent radix tree which this patch is
based on, but maybe it's overkill and the RCU radix tree is enough. Not
sure, will have to think about it a bit more.

> Nothing against the -rt patch, but mainline should probably be updated
> to use RCU as well?

If rcu radix tree is enough, then definitely yes.

Sebastien.
Re: [PATCH 2/2][RT] powerpc - Make the irq reverse mapping radix tree lockless
> > Concurrency between readers and writers is handled by the intrinsic
> > properties of the concurrent radix tree. Concurrency between the
> > tree initialization, which is done asynchronously with readers and
> > writers access, is handled via an atomic variable
> > (revmap_trees_allocated) set when the tree has been initialized and
> > checked before any reader or writer access, just like we used to
> > check for tree.gfp_mask != 0 before.
>
> Hmm, RCU radix tree has been in mainline too for quite a while. I
> thought Ben had already converted this code over ages ago...
>
> Nothing against the -rt patch, but mainline should probably be updated
> to use RCU as well?

No, I haven't updated that code yet, and yes, we should do it :-)

Cheers,
Ben.