Re: [PATCH 0/3] irq/core: Fix and expand the irq affinity descriptor
On Wed, Dec 19, 2018 at 6:25 PM Sumit Saxena wrote:
>
> On Wed, Dec 19, 2018 at 4:23 PM Thomas Gleixner wrote:
> > [...]
> > Kashyap, is that what you were looking for and if so, does it work?
> Thomas,
> We could not test these patches as they did not apply cleanly to the
> latest linux-block tree.
>
> Our requirement is: 1. the extra interrupts should be un-managed, and
> 2. they should be spread to the CPUs of the local NUMA node.
> If the interrupts are un-managed but not spread as per our requirement,
> the driver or user-space applications can still spread them as required,
> e.g. by calling irq_set_affinity_hint().
>
> Thanks,
> Sumit

I tested this patchset, with some minor rework to apply it on the latest
linux-block tree (4.20-rc7). It worked as expected.
For the "pre_vectors" IRQs (the extra set of interrupts), the "is_managed"
flag is set to 0, and the driver can later affine these "pre_vectors" to
the CPUs of the local NUMA node through irq_set_affinity_hint(). The
regular set of interrupts (not pre_vectors/post_vectors) is managed, with
"is_managed" set to 1.

Below is some data from my test setup:

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
node 0 size: 31822 MB
node 0 free: 30241 MB
node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 1 size: 32248 MB
node 1 free: 31960 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

The MegaRAID controller (PCI device 86:00.0) is attached to node 1:

# find /sys -name *numa_node* | grep "86:00" | xargs cat
1

IRQ-CPU affinity of the extra 16 interrupts for PCI device 86:00.0:

irq 149, cpu list 18-35,54-71
irq 150, cpu list 18-35,54-71
irq 151, cpu list 18-35,54-71
irq 152, cpu list 18-35,54-71
irq 153, cpu list 18-35,54-71
irq 154, cpu list 18-35,54-71
irq 155, cpu list 18-35,54-71
irq 156, cpu list 18-35,54-71
irq 157, cpu list 18-35,54-71
irq 158, cpu list 18-35,54-71
irq 159, cpu list 18-35,54-71
irq 160, cpu list 18-35,54-71
irq 161, cpu list 18-35,54-71
irq 162, cpu list 18-35,54-71
irq 163, cpu list 18-35,54-71
irq 164, cpu list 18-35,54-71
---
# cat /sys/kernel/debug/irq/irqs/164 | grep is_managed
is_managed: 0

Tested-by: Sumit Saxena
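A small user-space sketch for repeating the NUMA-locality check on the IRQ list in the test report: it parses CPU lists in the "18-35,54-71" form and tests that all of an IRQ's CPUs belong to the device's node (node 1 on this machine). The helper names (parse_cpulist, cpuset_subset), the model_cpuset type and the 128-CPU limit are assumptions made for illustration; this is not a kernel or numactl API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 128 CPUs is enough for the 72-CPU test machine. */
#define MODEL_MAX_CPUS 128

/* Hypothetical fixed-size CPU set: CPU c is bit (c % 64) of word (c / 64). */
typedef struct {
	uint64_t w[MODEL_MAX_CPUS / 64];
} model_cpuset;

static void model_set_cpu(model_cpuset *s, unsigned int cpu)
{
	s->w[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

/*
 * Parse a CPU list such as "18-35,54-71" into *out.
 * Returns false on malformed input or a CPU number >= MODEL_MAX_CPUS.
 */
static bool parse_cpulist(const char *str, model_cpuset *out)
{
	memset(out, 0, sizeof(*out));
	while (*str) {
		char *end;
		unsigned long lo, hi;

		lo = strtoul(str, &end, 10);
		if (end == str)
			return false;
		hi = lo;
		if (*end == '-') {              /* a range "lo-hi" */
			str = end + 1;
			hi = strtoul(str, &end, 10);
			if (end == str)
				return false;
		}
		if (lo > hi || hi >= MODEL_MAX_CPUS)
			return false;
		for (unsigned long cpu = lo; cpu <= hi; cpu++)
			model_set_cpu(out, (unsigned int)cpu);

		if (*end == ',')
			end++;                  /* move on to the next range */
		else if (*end != '\0')
			return false;           /* trailing garbage */
		str = end;
	}
	return true;
}

/* True if every CPU in irq_cpus is also in node_cpus. */
static bool cpuset_subset(const model_cpuset *irq_cpus,
			  const model_cpuset *node_cpus)
{
	for (size_t i = 0; i < MODEL_MAX_CPUS / 64; i++)
		if (irq_cpus->w[i] & ~node_cpus->w[i])
			return false;
	return true;
}
```

Checking for a subset rather than equality is deliberate: the requirement is only that the extra vectors stay on the local node, not that each one covers the whole node.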
Re: [PATCH 0/3] irq/core: Fix and expand the irq affinity descriptor
On Wed, Dec 19, 2018 at 4:23 PM Thomas Gleixner wrote:
>
> On Tue, 4 Dec 2018, Dou Liyang wrote:
> > [...]
>
> So, I've applied that lot for 4.21 (or whatever number it will be). That's
> only the first step for solving Kashyap's problem.
>
> IIRC, Kashyap wanted to get initial interrupt spreading for these extra
> magic interrupts as well, but not have them marked managed.
> [...]
> Kashyap, is that what you were looking for and if so, does it work?

Thomas,
We could not test these patches as they did not apply cleanly to the
latest linux-block tree.

Our requirement is: 1. the extra interrupts should be un-managed, and
2. they should be spread to the CPUs of the local NUMA node.
If the interrupts are un-managed but not spread as per our requirement,
the driver or user-space applications can still spread them as required,
e.g. by calling irq_set_affinity_hint().

Thanks,
Sumit
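The fallback described in Sumit's reply relies on unmanaged vectors staying adjustable after the initial spreading, while the kernel rejects user-space affinity writes to managed interrupts (a write to /proc/irq/N/smp_affinity fails with EIO). Below is a simplified user-space model of that rule, not kernel code: struct model_irq, model_user_set_affinity() and model_pin_to_node() are hypothetical names, and a CPU mask is modeled as a single 64-bit word.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplification: CPU n == bit n, so at most 64 CPUs. */
typedef uint64_t model_cpumask;

/* Hypothetical per-interrupt state. */
struct model_irq {
	model_cpumask affinity;
	bool managed;
};

/*
 * Model of a user-space affinity write: refused with -EIO for a managed
 * interrupt, applied as-is for an unmanaged one. (The real kernel path
 * also treats masks without any online CPU specially; left out here.)
 */
static int model_user_set_affinity(struct model_irq *irq, model_cpumask mask)
{
	if (irq->managed)
		return -EIO;
	irq->affinity = mask;
	return 0;
}

/*
 * Point every adjustable interrupt at the device's local node; managed
 * interrupts keep the affinity the core assigned. Returns the number of
 * interrupts that were updated.
 */
static unsigned int model_pin_to_node(struct model_irq *irqs, unsigned int n,
				      model_cpumask node_mask)
{
	unsigned int updated = 0;

	for (unsigned int i = 0; i < n; i++)
		if (model_user_set_affinity(&irqs[i], node_mask) == 0)
			updated++;
	return updated;
}
```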
Re: [PATCH 0/3] irq/core: Fix and expand the irq affinity descriptor
On Tue, 4 Dec 2018, Dou Liyang wrote:
> Currently, conveying the interrupt affinity information through a cpumask
> pointer is not enough: it has run into a problem [1] and is hard to extend
> in the future.
>
> Fix it by:
> [...]

So, I've applied that lot for 4.21 (or whatever number it will be). That's
only the first step for solving Kashyap's problem.

IIRC, Kashyap wanted to get initial interrupt spreading for these extra
magic interrupts as well, but not have them marked managed.

That's trivial to do now with the two queued changes in that area:

  - The rework above

  - The support for interrupt sets from Jens

Just adding a small bitfield to struct irq_affinity, which allows telling
the core that a particular interrupt set is not managed, does the trick.

Untested patch below.

Kashyap, is that what you were looking for and if so, does it work?

Thanks,

	tglx

8<----------------

Subject: genirq/affinity: Add support for non-managed affinity sets
From: Thomas Gleixner
Date: Tue, 18 Dec 2018 16:46:47 +0100

Some drivers need an extra set of interrupts which are not marked managed,
but should get initial interrupt spreading.

Add a bitmap to struct irq_affinity which allows the driver to mark a
particular set of interrupts as non-managed. Check the bitmap during
spreading and use the result to mark the interrupts in the sets
accordingly.

The unmanaged interrupts get initial spreading, but user space can change
their affinity later on.

Usage example:

      struct irq_affinity affd = { .pre_vectors = 2 };
      int sets[2];

      /* Fill in sets[] */

      affd.nr_sets = 2;
      affd.sets = sets;
      affd.unmanaged_sets = 0x02;

      ..

So both sets are properly spread out, but the second set is not marked
managed.
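A user-space model of the bookkeeping the changelog above describes, using Thomas's usage example as the test case. It assumes the vector layout of the interrupt-sets support ([pre_vectors][set 0]...[set n-1][post_vectors]) and that pre/post vectors stay unmanaged, as Sumit's test reports. The struct and function names are hypothetical, and the patch itself passes a managed flag into the spreading code rather than filling a separate array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct irq_affinity fields used here. */
struct model_affinity_cfg {
	unsigned int pre_vectors;
	unsigned int post_vectors;
	unsigned int nr_sets;
	const unsigned int *sets;      /* number of vectors in each set */
	unsigned int unmanaged_sets;   /* bit i set => set i is not managed */
};

/*
 * Fill managed[] for every vector, laid out as
 * [pre_vectors][set 0]...[set nr_sets - 1][post_vectors].
 * Returns the total vector count, or 0 if managed[] (cap entries) is too
 * small or nr_sets does not fit in the bitmap.
 */
static size_t model_mark_managed(const struct model_affinity_cfg *cfg,
				 bool *managed, size_t cap)
{
	size_t total = (size_t)cfg->pre_vectors + cfg->post_vectors;
	size_t v = 0;

	if (cfg->nr_sets > 8 * sizeof(cfg->unmanaged_sets))
		return 0;
	for (unsigned int i = 0; i < cfg->nr_sets; i++)
		total += cfg->sets[i];
	if (total > cap)
		return 0;

	/* Pre-vectors are not spread and stay unmanaged. */
	for (unsigned int i = 0; i < cfg->pre_vectors; i++)
		managed[v++] = false;

	/* A set is managed unless its bit is set in unmanaged_sets. */
	for (unsigned int i = 0; i < cfg->nr_sets; i++) {
		bool m = !(cfg->unmanaged_sets & (1u << i));

		for (unsigned int j = 0; j < cfg->sets[i]; j++)
			managed[v++] = m;
	}

	/* Post-vectors are also left unmanaged. */
	for (unsigned int i = 0; i < cfg->post_vectors; i++)
		managed[v++] = false;

	return total;
}
```

With pre_vectors = 2, sets of 3 and 2 vectors, and unmanaged_sets = 0x02, the first set comes out managed and the second set, like the pre-vectors, does not.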
Signed-off-by: Thomas Gleixner
---
 include/linux/interrupt.h | 10 ++
 kernel/irq/affinity.c     | 24 ++--
 2 files changed, 20 insertions(+), 14 deletions(-)

--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -99,7 +99,8 @@ static int __irq_build_affinity_masks(co
 				      cpumask_var_t *node_to_cpumask,
 				      const struct cpumask *cpu_mask,
 				      struct cpumask *nmsk,
-				      struct irq_affinity_desc *masks)
+				      struct irq_affinity_desc *masks,
+				      bool managed)
 {
 	int n, nodes, cpus_per_vec, extra_vecs, done = 0;
 	int last_affv = firstvec + numvecs;
@@ -154,6 +155,7 @@ static int __irq_build_affinity_masks(co
 			}
 			irq_spread_init_one(&masks[curvec].mask, nmsk,
 					    cpus_per_vec);
+			masks[curvec].is_managed = managed;
 		}
 
 		done += v;
@@ -176,7 +178,8 @@ static int __irq_build_affinity_masks(co
 static int irq_build_affinity_masks(const struct irq_affinity *affd,
 				    int startvec, int numvecs, int firstvec,
 				    cpumask_var_t *node_to_cpumask,
-				    struct irq_affinity_desc *masks)
+				    struct irq_affinity_desc *masks,
+				    bool managed)
 {
 	int curvec = startvec, nr_present, nr_others;
 	int ret = -ENOMEM;
@@ -196,7 +199,8 @@ static int irq_build_affinity_masks(cons
 	/* Spread on present CPUs starting from affd->pre_vectors */
 	nr_present = __irq_build_affinity_masks(affd, curvec, numvecs,
 						firstvec, node_to_cpumask,
-						cpu_present_mask, nmsk, masks);
+						cpu_present_mask, nmsk, masks,
+						managed);
 
 	/*
 	 * Spread on non present CPUs starting from the next vector to be
@@ -211,7 +215,7 @@ static int irq_build_affinity_masks(cons
 	cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
 	nr_others = __irq_build_affinity_masks(affd, curvec, numvecs,
 					       firstvec, node_to_cpumask,
-					       npresmsk, nmsk, masks);
+
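The hunks above only thread a managed flag through the spreading code and stamp it on each vector. A simplified single-node model of that step, for orientation: it ignores NUMA grouping, non-present CPUs and SMT-sibling grouping, models a CPU mask as a 64-bit word, and uses hypothetical names throughout. The remainder handling follows the cpus_per_vec/extra_vecs idea in __irq_build_affinity_masks(), where the first vectors absorb the leftover CPUs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplification: CPU n == bit n, so at most 64 CPUs. */
typedef uint64_t model_cpumask;

/* Hypothetical stand-in for struct irq_affinity_desc. */
struct model_desc {
	model_cpumask mask;
	bool is_managed;
};

/* Number of CPUs in a mask. */
static unsigned int model_weight(model_cpumask m)
{
	unsigned int n = 0;

	for (; m; m &= m - 1)   /* clear the lowest set bit each round */
		n++;
	return n;
}

/* Remove up to n lowest-numbered CPUs from *src and return them. */
static model_cpumask model_take_cpus(model_cpumask *src, unsigned int n)
{
	model_cpumask out = 0;

	while (n > 0 && *src) {
		model_cpumask low = *src & (~*src + 1);   /* lowest set bit */

		out |= low;
		*src &= ~low;
		n--;
	}
	return out;
}

/*
 * Spread the CPUs in cpus over numvecs vectors starting at masks[startvec].
 * Each vector gets ncpus / numvecs CPUs and the first ncpus % numvecs
 * vectors get one extra, absorbing the rounding remainder up front.
 * Every vector is tagged with the caller's managed flag.
 * Returns -1 for inputs this simplified model does not handle.
 */
static int model_spread(model_cpumask cpus, unsigned int numvecs,
			struct model_desc *masks, unsigned int startvec,
			bool managed)
{
	unsigned int ncpus = model_weight(cpus);
	unsigned int cpus_per_vec, extra_vecs;

	if (numvecs == 0 || numvecs > ncpus)
		return -1;

	cpus_per_vec = ncpus / numvecs;
	extra_vecs = ncpus - numvecs * cpus_per_vec;

	for (unsigned int v = 0; v < numvecs; v++) {
		unsigned int n = cpus_per_vec + (v < extra_vecs ? 1 : 0);

		masks[startvec + v].mask = model_take_cpus(&cpus, n);
		masks[startvec + v].is_managed = managed;
	}
	return 0;
}
```

For example, spreading 8 CPUs over 3 vectors gives them 3, 3 and 2 CPUs; whether they end up managed is decided entirely by the caller's flag, which is the point of the patch.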
[PATCH 0/3] irq/core: Fix and expand the irq affinity descriptor
Currently, conveying the interrupt affinity information through a cpumask
pointer is not enough: it has run into a problem [1] and is hard to extend
in the future.

Fix it by replacing the pointer with a descriptor:

        +-----------------------------------+
        |                                   |
        |     struct cpumask *affinity      |
        |                                   |
        +-----------------------------------+
                          |
      +-------------------v-------------------+
      |                                       |
      | struct irq_affinity_desc {            |
      |     struct cpumask mask;              |
      |     unsigned int   is_managed : 1;    |
      | };                                    |
      |                                       |
      +---------------------------------------+

[1]: https://marc.info/?l=linux-kernel&m=153543887027997&w=2

Dou Liyang (3):
  genirq/affinity: Add a new interrupt affinity descriptor
  irq/affinity: Add is_managed into struct irq_affinity_desc
  irq/affinity: Fix a possible breakage

 drivers/pci/msi.c         |  9 -
 include/linux/interrupt.h | 15 +--
 include/linux/irq.h       |  6 --
 include/linux/irqdomain.h |  6 --
 include/linux/msi.h       |  4 ++--
 kernel/irq/affinity.c     | 38 +-
 kernel/irq/devres.c       |  4 ++--
 kernel/irq/irqdesc.c      | 25 +
 kernel/irq/irqdomain.c    |  4 ++--
 kernel/irq/msi.c          |  7 ---
 10 files changed, 77 insertions(+), 41 deletions(-)

--
2.17.2
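To make the before/after in the diagram concrete, here is a user-space model of the descriptor and of a consumer that needs both fields. Assumptions: a CPU mask is modeled as a 64-bit word instead of struct cpumask, and model_irq_state and model_apply_descs() are hypothetical. The only point shown is that a bare mask pointer cannot carry the managed bit, while the descriptor carries it next to the mask.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplification: CPU n == bit n, so at most 64 CPUs. */
typedef uint64_t model_cpumask;

/* Same shape as the new descriptor: a mask plus a managed flag. */
struct model_irq_affinity_desc {
	model_cpumask mask;
	unsigned int is_managed : 1;
};

/* Hypothetical per-interrupt state a consumer fills in. */
struct model_irq_state {
	model_cpumask affinity;
	bool managed;
};

/*
 * Consume one descriptor per interrupt: copy the initial affinity and
 * record whether the interrupt is managed. With only a cpumask pointer
 * the second field would have no source.
 */
static void model_apply_descs(const struct model_irq_affinity_desc *descs,
			      struct model_irq_state *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		irqs[i].affinity = descs[i].mask;
		irqs[i].managed = descs[i].is_managed;
	}
}
```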