Re: [PATCH 0/3] irq/core: Fix and expand the irq affinity descriptor

2018-12-28 Thread Sumit Saxena
On Wed, Dec 19, 2018 at 6:25 PM Sumit Saxena  wrote:
>
> On Wed, Dec 19, 2018 at 4:23 PM Thomas Gleixner  wrote:
> >
> > On Tue, 4 Dec 2018, Dou Liyang wrote:
> >
> > > Now, propagating the interrupt affinity info via a cpumask pointer is not
> > > enough: it runs into a problem[1] and is hard to extend in the future.
> > >
> > > Fix it by:
> > >
> > >   +------------------------------+
> > >   |                              |
> > >   |   struct cpumask *affinity   |
> > >   |                              |
> > >   +------------------------------+
> > >                  |
> > > +----------------v-----------------+
> > > |                                  |
> > > | struct irq_affinity_desc {       |
> > > |     struct cpumask mask;         |
> > > |     unsigned int is_managed : 1; |
> > > | };                               |
> > > |                                  |
> > > +----------------------------------+
> > >
> >
> > So, I've applied that lot for 4.21 (or whatever number it will be). That's
> > only the first step for solving Kashyap's problem.
> >
> > IIRC, Kashyap wanted to get initial interrupt spreading for these extra
> > magic interrupts as well, but without having them marked managed.
> >
> > That's trivial to do now with the two queued changes in that area:
> >
> >   - The rework above
> >
> >   - The support for interrupt sets from Jens
> >
> > Just adding a small bitfield to struct irq_affinity, which allows telling
> > the core that a particular interrupt set is not managed, does the trick.
> >
> > Untested patch below.
> >
> > Kashyap, is that what you were looking for and if so, does it work?
> Thomas,
> We could not test these patches as they did not apply cleanly to the latest
> linux-block tree.
>
> Our requirements are: 1. the extra interrupts should be unmanaged, and
> 2. they should be spread across the CPUs of the local NUMA node.
> If the interrupts are unmanaged but not spread as we require, the
> driver (or userspace) can still spread them as needed, e.g. by calling
> irq_set_affinity_hint().
>
> Thanks,
> Sumit
I tested this patchset, with some minor rework to make it apply, on the
latest linux-block tree (4.20-rc7).
It worked as we expected. For the "pre_vectors" IRQs (the extra set of
interrupts), the "is_managed" flag is set to 0, and the driver can later
affine these "pre_vectors" to the CPUs of the local NUMA node through
irq_set_affinity_hint().
The regular set of interrupts (not pre_vectors/post_vectors) is managed,
with "is_managed" set to 1.

Below are some data from my test setup-

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
node 0 size: 31822 MB
node 0 free: 30241 MB
node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 1 size: 32248 MB
node 1 free: 31960 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

The MegaRAID controller (PCI device 86:00.0) is attached to node 1.
# find /sys -name *numa_node* | grep "86:00" | xargs cat
1

IRQ-CPU affinity of the 16 extra interrupts for PCI device 86:00.0:
irq 149, cpu list 18-35,54-71
irq 150, cpu list 18-35,54-71
irq 151, cpu list 18-35,54-71
irq 152, cpu list 18-35,54-71
irq 153, cpu list 18-35,54-71
irq 154, cpu list 18-35,54-71
irq 155, cpu list 18-35,54-71
irq 156, cpu list 18-35,54-71
irq 157, cpu list 18-35,54-71
irq 158, cpu list 18-35,54-71
irq 159, cpu list 18-35,54-71
irq 160, cpu list 18-35,54-71
irq 161, cpu list 18-35,54-71
irq 162, cpu list 18-35,54-71
irq 163, cpu list 18-35,54-71
irq 164, cpu list 18-35,54-71
---
# cat /sys/kernel/debug/irq/irqs/164 | grep is_managed
   is_managed:   0
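One way to cross-check the listing above mechanically is to parse each IRQ's cpu list and the node 1 line from numactl into bitmaps and compare them. The helper below is only a userspace sketch: cpuset128, parse_cpulist() and cpuset_subset() are hypothetical names, and the 128-CPU limit is an assumption that comfortably covers this 72-CPU machine. It accepts both comma-separated ranges ("18-35,54-71") and the space-separated form that numactl prints.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size CPU set: CPUs 0-127 in two 64-bit words. */
typedef struct { uint64_t w[2]; } cpuset128;

/* Parse "18-35,54-71" or "18 19 20 ..." into *set.
 * Returns 0 on success, -1 on malformed or out-of-range input. */
static int parse_cpulist(const char *s, cpuset128 *set)
{
	set->w[0] = set->w[1] = 0;
	while (*s) {
		char *end;
		long lo = strtol(s, &end, 10), hi;

		if (end == s)
			return -1;
		hi = lo;
		if (*end == '-') {		/* a range such as 18-35 */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo < 0 || hi < lo || hi > 127)
			return -1;
		for (long c = lo; c <= hi; c++)
			set->w[c / 64] |= (uint64_t)1 << (c % 64);
		s = end;
		if (*s == ',' || *s == ' ')	/* either separator style */
			s++;
		else if (*s)
			return -1;
	}
	return 0;
}

/* Non-zero if every CPU in a is also in b. */
static int cpuset_subset(const cpuset128 *a, const cpuset128 *b)
{
	return (a->w[0] & ~b->w[0]) == 0 && (a->w[1] & ~b->w[1]) == 0;
}
```

For this box, "18-35,54-71" and the node 1 CPU list from numactl parse to identical bitmaps, and the IRQ set is not contained in node 0's "0-17,36-53", so every extra vector is confined to node 1.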

Tested-by: Sumit Saxena 

> >
> > Thanks,
> >
> > tglx
> >
> > 8<-
> >
> > Subject: genirq/affinity: Add support for non-managed affinity sets
> > From: Thomas Gleixner 
> > Date: Tue, 18 Dec 2018 16:46:47 +0100
> >
> > Some drivers need an extra set of interrupts which are not marked managed,
> > but should get initial interrupt spreading.
> >
> > Add a bitmap to struct irq_affinity which allows the driver to mark a
> > particular set of interrupts as non-managed. Check the bitmap during
> > spreading and use the result to mark the interrupts in the sets
> > accordingly.
> >
> > The unmanaged interrupts get initial spreading, but user space can change
> > their affinity later on.
> >
> > Usage example:
> >
> >   struct irq_affinity affd = { .pre_vectors = 2 };
> >   int sets[2];
> >
> >   /* Fill in sets[] */
> >
> >   affd.nr_sets = 2;
> >   affd.sets = sets;
> >   affd.unmanaged_sets = 0x02;
> >
> >   ..
> >
> > So both sets are properly spread out, but the second set is not marked
> > managed.
> >
> > Signed-off-by: Thomas Gleixner 
> > ---
> >  include/linux/interrupt.h |   10 ++
> >  kernel/irq/affinity.c 

Re: [PATCH 0/3] irq/core: Fix and expand the irq affinity descriptor

2018-12-19 Thread Sumit Saxena
On Wed, Dec 19, 2018 at 4:23 PM Thomas Gleixner  wrote:
>
> On Tue, 4 Dec 2018, Dou Liyang wrote:
>
> > Now, propagating the interrupt affinity info via a cpumask pointer is not
> > enough: it runs into a problem[1] and is hard to extend in the future.
> >
> > Fix it by:
> >
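> >   +------------------------------+
> >   |                              |
> >   |   struct cpumask *affinity   |
> >   |                              |
> >   +------------------------------+
> >                  |
> > +----------------v-----------------+
> > |                                  |
> > | struct irq_affinity_desc {       |
> > |     struct cpumask mask;         |
> > |     unsigned int is_managed : 1; |
> > | };                               |
> > |                                  |
> > +----------------------------------+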
> >
>
> So, I've applied that lot for 4.21 (or whatever number it will be). That's
> only the first step for solving Kashyap's problem.
>
> IIRC, Kashyap wanted to get initial interrupt spreading for these extra
> magic interrupts as well, but without having them marked managed.
>
> That's trivial to do now with the two queued changes in that area:
>
>   - The rework above
>
>   - The support for interrupt sets from Jens
>
> Just adding a small bitfield to struct irq_affinity, which allows telling
> the core that a particular interrupt set is not managed, does the trick.
>
> Untested patch below.
>
> Kashyap, is that what you were looking for and if so, does it work?
Thomas,
We could not test these patches as they did not apply cleanly to the latest
linux-block tree.

Our requirements are: 1. the extra interrupts should be unmanaged, and
2. they should be spread across the CPUs of the local NUMA node.
If the interrupts are unmanaged but not spread as we require, the
driver (or userspace) can still spread them as needed, e.g. by calling
irq_set_affinity_hint().

Thanks,
Sumit
>
> Thanks,
>
> tglx
>
> 8<-
>
> Subject: genirq/affinity: Add support for non-managed affinity sets
> From: Thomas Gleixner 
> Date: Tue, 18 Dec 2018 16:46:47 +0100
>
> Some drivers need an extra set of interrupts which are not marked managed,
> but should get initial interrupt spreading.
>
> Add a bitmap to struct irq_affinity which allows the driver to mark a
> particular set of interrupts as non-managed. Check the bitmap during
> spreading and use the result to mark the interrupts in the sets
> accordingly.
>
> The unmanaged interrupts get initial spreading, but user space can change
> their affinity later on.
>
> Usage example:
>
>   struct irq_affinity affd = { .pre_vectors = 2 };
>   int sets[2];
>
>   /* Fill in sets[] */
>
>   affd.nr_sets = 2;
>   affd.sets = sets;
>   affd.unmanaged_sets = 0x02;
>
>   ..
>
> So both sets are properly spread out, but the second set is not marked
> managed.
>
> Signed-off-by: Thomas Gleixner 
> ---
>  include/linux/interrupt.h |   10 ++
>  kernel/irq/affinity.c |   24 ++--
>  2 files changed, 20 insertions(+), 14 deletions(-)
>
> --- a/kernel/irq/affinity.c
> +++ b/kernel/irq/affinity.c
> @@ -99,7 +99,8 @@ static int __irq_build_affinity_masks(co
>   cpumask_var_t *node_to_cpumask,
>   const struct cpumask *cpu_mask,
>   struct cpumask *nmsk,
> - struct irq_affinity_desc *masks)
> + struct irq_affinity_desc *masks,
> + bool managed)
>  {
> int n, nodes, cpus_per_vec, extra_vecs, done = 0;
> int last_affv = firstvec + numvecs;
> @@ -154,6 +155,7 @@ static int __irq_build_affinity_masks(co
> }
> irq_spread_init_one(&masks[curvec].mask, nmsk,
> cpus_per_vec);
> +   masks[curvec].is_managed = managed;
> }
>
> done += v;
> @@ -176,7 +178,8 @@ static int __irq_build_affinity_masks(co
>  static int irq_build_affinity_masks(const struct irq_affinity *affd,
> int startvec, int numvecs, int firstvec,
> cpumask_var_t *node_to_cpumask,
> -   struct irq_affinity_desc *masks)
> +   struct irq_affinity_desc *masks,
> +   bool managed)
>  {
> int curvec = startvec, nr_present, nr_others;
> int ret = -ENOMEM;
> @@ -196,7 +199,8 @@ static int irq_build_affinity_masks(cons
> /* Spread on present CPUs starting from affd->pre_vectors */
> nr_present = __irq_build_affinity_masks(affd, curvec, numvecs,
> firstvec, 

Re: [PATCH 0/3] irq/core: Fix and expand the irq affinity descriptor

2018-12-19 Thread Thomas Gleixner
On Tue, 4 Dec 2018, Dou Liyang wrote:

> Now, propagating the interrupt affinity info via a cpumask pointer is not
> enough: it runs into a problem[1] and is hard to extend in the future.
> 
> Fix it by:
> 
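>   +------------------------------+
>   |                              |
>   |   struct cpumask *affinity   |
>   |                              |
>   +------------------------------+
>                  |
> +----------------v-----------------+
> |                                  |
> | struct irq_affinity_desc {       |
> |     struct cpumask mask;         |
> |     unsigned int is_managed : 1; |
> | };                               |
> |                                  |
> +----------------------------------+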
> 

So, I've applied that lot for 4.21 (or whatever number it will be). That's
only the first step for solving Kashyap's problem.

IIRC, Kashyap wanted to get initial interrupt spreading for these extra
magic interrupts as well, but without having them marked managed.

That's trivial to do now with the two queued changes in that area:

  - The rework above
  
  - The support for interrupt sets from Jens

Just adding a small bitfield to struct irq_affinity, which allows telling
the core that a particular interrupt set is not managed, does the trick.

Untested patch below.

Kashyap, is that what you were looking for and if so, does it work?

Thanks,

tglx

8<-

Subject: genirq/affinity: Add support for non-managed affinity sets
From: Thomas Gleixner 
Date: Tue, 18 Dec 2018 16:46:47 +0100

Some drivers need an extra set of interrupts which are not marked managed,
but should get initial interrupt spreading.

Add a bitmap to struct irq_affinity which allows the driver to mark a
particular set of interrupts as non-managed. Check the bitmap during
spreading and use the result to mark the interrupts in the sets
accordingly.

The unmanaged interrupts get initial spreading, but user space can change
their affinity later on.
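One spreading pass hands the CPUs of a node out to a range of vectors and, with this patch, stamps every resulting mask with the caller's managed flag. A heavily simplified userspace model, assuming a single node of at most 64 CPUs assigned lowest-first, might look like this. desc_model, count_cpus() and spread_cpus() are hypothetical; the rounding step is modeled on the cpus_per_vec/extra_vecs variables in the diff context, and the real __irq_build_affinity_masks() additionally iterates over NUMA nodes and handles present versus possible CPUs separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct irq_affinity_desc (64-CPU bitmap). */
struct desc_model {
	uint64_t mask;			/* CPUs assigned to this vector */
	unsigned int is_managed : 1;	/* copied from the caller's flag */
};

/* Number of set bits, i.e. CPUs in the bitmap. */
static int count_cpus(uint64_t cpus)
{
	int n = 0;

	for (; cpus; cpus &= cpus - 1)
		n++;
	return n;
}

/* Hand the CPUs in 'cpus' out to 'nvecs' vectors, lowest CPU first. When
 * ncpus is not a multiple of nvecs, the first extra_vecs vectors get one
 * additional CPU. Every mask is stamped with 'managed'. */
static void spread_cpus(uint64_t cpus, int nvecs, bool managed,
			struct desc_model *out)
{
	int ncpus = count_cpus(cpus);
	int extra_vecs;

	assert(nvecs > 0 && nvecs <= ncpus);
	extra_vecs = ncpus - nvecs * (ncpus / nvecs);

	for (int v = 0; v < nvecs; v++) {
		int cpus_per_vec = ncpus / nvecs;

		if (extra_vecs) {
			cpus_per_vec++;
			extra_vecs--;
		}
		out[v].mask = 0;
		while (cpus_per_vec-- > 0) {
			uint64_t lowest = cpus & (~cpus + 1);	/* lowest set bit */

			out[v].mask |= lowest;
			cpus &= ~lowest;
		}
		out[v].is_managed = managed;
	}
}
```

For example, spreading CPUs 0-4 (mask 0x1f) over two unmanaged vectors gives masks 0x07 and 0x18, both with is_managed == 0; the only difference the flag makes here is the value stamped into each descriptor.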

Usage example:

  struct irq_affinity affd = { .pre_vectors = 2 };
  int sets[2];

  /* Fill in sets[] */

  affd.nr_sets = 2;
  affd.sets = sets;
  affd.unmanaged_sets = 0x02;

  ..

So both sets are properly spread out, but the second set is not marked
managed.
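To make the bit semantics of the usage example concrete, here is a userspace model of how pre_vectors, sets[] and unmanaged_sets could translate into per-vector managed flags, with bit i of unmanaged_sets referring to sets[i]. affd_model and mark_managed() are hypothetical names; only the field names come from the changelog, and treating pre_vectors as unmanaged is an assumption consistent with the test report in this thread (pre_vectors end up with is_managed 0).

```c
#include <assert.h>

#define MAX_VECS 64

/* Hypothetical model of the proposed struct irq_affinity fields. */
struct affd_model {
	int pre_vectors;
	int nr_sets;
	const int *sets;		/* number of vectors in each set */
	unsigned int unmanaged_sets;	/* bit i set => set i is not managed */
};

/* Writes 1 (managed) or 0 (not managed) for every vector: the pre_vectors
 * first, then the vectors of each set in order. Returns the vector count. */
static int mark_managed(const struct affd_model *affd, int managed[MAX_VECS])
{
	int vec = 0;

	/* pre_vectors are not spread, hence never managed */
	for (int i = 0; i < affd->pre_vectors; i++) {
		assert(vec < MAX_VECS);
		managed[vec++] = 0;
	}

	for (int s = 0; s < affd->nr_sets; s++) {
		int is_managed = !(affd->unmanaged_sets & (1u << s));

		for (int i = 0; i < affd->sets[s]; i++) {
			assert(vec < MAX_VECS);
			managed[vec++] = is_managed;
		}
	}
	return vec;
}
```

With pre_vectors = 2, sets = {3, 2} and unmanaged_sets = 0x02, the seven vectors come out as 0 0 1 1 1 0 0: bit 1 clears the managed flag for the second set only, while the first set stays managed.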

Signed-off-by: Thomas Gleixner 
---
 include/linux/interrupt.h |   10 ++
 kernel/irq/affinity.c |   24 ++--
 2 files changed, 20 insertions(+), 14 deletions(-)

--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -99,7 +99,8 @@ static int __irq_build_affinity_masks(co
  cpumask_var_t *node_to_cpumask,
  const struct cpumask *cpu_mask,
  struct cpumask *nmsk,
- struct irq_affinity_desc *masks)
+ struct irq_affinity_desc *masks,
+ bool managed)
 {
int n, nodes, cpus_per_vec, extra_vecs, done = 0;
int last_affv = firstvec + numvecs;
@@ -154,6 +155,7 @@ static int __irq_build_affinity_masks(co
}
irq_spread_init_one(&masks[curvec].mask, nmsk,
cpus_per_vec);
+   masks[curvec].is_managed = managed;
}
 
done += v;
@@ -176,7 +178,8 @@ static int __irq_build_affinity_masks(co
 static int irq_build_affinity_masks(const struct irq_affinity *affd,
int startvec, int numvecs, int firstvec,
cpumask_var_t *node_to_cpumask,
-   struct irq_affinity_desc *masks)
+   struct irq_affinity_desc *masks,
+   bool managed)
 {
int curvec = startvec, nr_present, nr_others;
int ret = -ENOMEM;
@@ -196,7 +199,8 @@ static int irq_build_affinity_masks(cons
/* Spread on present CPUs starting from affd->pre_vectors */
nr_present = __irq_build_affinity_masks(affd, curvec, numvecs,
firstvec, node_to_cpumask,
-   cpu_present_mask, nmsk, masks);
+   cpu_present_mask, nmsk, masks,
+   managed);
 
/*
 * Spread on non present CPUs starting from the next vector to be
@@ -211,7 +215,7 @@ static int irq_build_affinity_masks(cons
cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
nr_others = __irq_build_affinity_masks(affd, curvec, numvecs,
   firstvec, node_to_cpumask,
-  npresmsk, nmsk, masks);
+

[PATCH 0/3] irq/core: Fix and expand the irq affinity descriptor

2018-12-04 Thread Dou Liyang
Now, propagating the interrupt affinity info via a cpumask pointer is not
enough: it runs into a problem[1] and is hard to extend in the future.

Fix it by:

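  +------------------------------+
  |                              |
  |   struct cpumask *affinity   |
  |                              |
  +------------------------------+
                 |
+----------------v-----------------+
|                                  |
| struct irq_affinity_desc {       |
|     struct cpumask mask;         |
|     unsigned int is_managed : 1; |
| };                               |
|                                  |
+----------------------------------+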
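A rough userspace model of what the descriptor buys: with one struct per vector, the managed/unmanaged decision travels together with the mask instead of being implied by the vector's position. This is an illustration under stated assumptions, not code from this series: cpumask_model, irq_affinity_desc_model and fill_descs() are hypothetical names, the 64-CPU bitmap stands in for the real variable-size struct cpumask, and marking only the spread vectors (not pre_vectors/post_vectors) as managed follows the behaviour Sumit reports in his test results.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct cpumask: one bit per CPU, up to 64 CPUs. */
typedef struct { uint64_t bits; } cpumask_model;

/* Same shape as the new descriptor in the diagram above. */
struct irq_affinity_desc_model {
	cpumask_model mask;
	unsigned int is_managed : 1;
};

/* Hypothetical helper: fill one descriptor per vector. Pre- and post-vectors
 * get the default mask and stay unmanaged; the vectors in between get the
 * caller-supplied spread masks and are marked managed. */
static void fill_descs(struct irq_affinity_desc_model *descs, int nvecs,
		       int pre_vectors, int post_vectors,
		       const cpumask_model *spread, cpumask_model dflt)
{
	assert(pre_vectors + post_vectors <= nvecs);
	for (int i = 0; i < nvecs; i++) {
		int spread_vec = i >= pre_vectors && i < nvecs - post_vectors;

		descs[i].mask = spread_vec ? spread[i - pre_vectors] : dflt;
		descs[i].is_managed = spread_vec;
	}
}
```

With four vectors, one pre_vector and one post_vector, vectors 0 and 3 keep the default mask and is_managed == 0, while vectors 1 and 2 receive the two spread masks with is_managed == 1. A bare cpumask array could carry the masks but not that last bit.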

[1]: https://marc.info/?l=linux-kernel&m=153543887027997&w=2

Dou Liyang (3):
  genirq/affinity: Add a new interrupt affinity descriptor
  irq/affinity: Add is_managed into struct irq_affinity_desc
  irq/affinity: Fix a possible breakage

 drivers/pci/msi.c |  9 -
 include/linux/interrupt.h | 15 +--
 include/linux/irq.h   |  6 --
 include/linux/irqdomain.h |  6 --
 include/linux/msi.h   |  4 ++--
 kernel/irq/affinity.c | 38 +-
 kernel/irq/devres.c   |  4 ++--
 kernel/irq/irqdesc.c  | 25 +
 kernel/irq/irqdomain.c|  4 ++--
 kernel/irq/msi.c  |  7 ---
 10 files changed, 77 insertions(+), 41 deletions(-)

-- 
2.17.2


