Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-04-03 Thread Thomas Gleixner
Ming,

On Fri, 30 Mar 2018, Ming Lei wrote:
> On Fri, Mar 09, 2018 at 04:08:19PM +0100, Thomas Gleixner wrote:
> > Thoughts?
> 
> Given this patchset doesn't have effect on normal machines without
> supporting physical CPU hotplug, it can fix performance regression on
> machines which might support physical CPU hotplug(cpu_present_mask !=
> cpu_possible_mask) with some extra memory allocation cost.
>
> So is there any chance to make it in v4.17?

Sorry, that thing fell through the cracks. I'll queue it now and try to do
a pull request late in the merge window.

Thanks,

tglx




Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-29 Thread Ming Lei
Hi Thomas,

On Fri, Mar 09, 2018 at 04:08:19PM +0100, Thomas Gleixner wrote:
> On Fri, 9 Mar 2018, Ming Lei wrote:
> > On Fri, Mar 09, 2018 at 11:08:54AM +0100, Thomas Gleixner wrote:
> > > > > So my understanding is that these irq patches are enhancements and
> > > > > not bug fixes. I'll queue them for 4.17 then.
> > > > 
> > > > Wrt. this IO hang issue, these patches shouldn't be bug fix, but they
> > > > may fix performance regression[1] for some systems caused by 84676c1f21
> > > > ("genirq/affinity: assign vectors to all possible CPUs").
> > > > 
> > > > [1] https://marc.info/?l=linux-block&m=152050347831149&w=2
> > > 
> > > Hmm. The patches are rather large for urgent and evtl. backporting. Is
> > > there a simpler way to address that performance issue?
> > 
> > Not thought of a simpler solution. The problem is that number of active
> > msix vector is decreased a lot by commit 84676c1f21.
> 
> It's reduced in cases where the number of possible CPUs is way larger than
> the number of online CPUs.
> 
> Now, if you look at the number of present CPUs on such systems it's
> probably the same as the number of online CPUs.
> 
> It only differs on machines which support physical hotplug, but that's not
> the normal case. Those systems are more special and less wide spread.
> 
> So the obvious simple fix for this regression issue is to spread out the
> vectors across present CPUs and not across possible CPUs.
> 
> I'm not sure if there is a clear indicator whether physical hotplug is
> supported or not, but the ACPI folks (x86) and architecture maintainers
> should be able to answer that question. I have a machine which says:
> 
>    smpboot: Allowing 128 CPUs, 96 hotplug CPUs
> 
> There is definitely no way to hotplug anything on that machine and sure the
> existing spread algorithm will waste vectors to no end.

Per-CPU variables may waste space too if ACPI does not report the number
of possible CPUs accurately.

> 
> Sure then there is virt, which can pretend to have a gazillion of possible
> hotpluggable CPUs, but virt is an insanity on its own. Though someone might
> come up with reasonable heuristics for that as well.

There is also IBM s390, where physical CPU hotplug is a normal use
case.

I have not seen any other solution posted for virt, and re-introducing
a CPU hotplug handler for blk-mq may cause complicated queue dependency
issues.

> 
> Thoughts?

This patchset has no effect on normal machines that don't support
physical CPU hotplug, and it can fix the performance regression on
machines that might support it (cpu_present_mask != cpu_possible_mask),
at the cost of some extra memory allocation.

So is there any chance to get it into v4.17?

Thanks,
Ming


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-28 Thread Artem Bityutskiy
On Mon, 2018-03-26 at 10:39 +0200, Thorsten Leemhuis wrote:
> Lo! Your friendly Linux regression tracker here ;-)
> 
> On 08.03.2018 14:18, Artem Bityutskiy wrote:
> > On Thu, 2018-03-08 at 18:53 +0800, Ming Lei wrote:
> > > This patchset tries to spread among online CPUs as far as possible, so
> > > that we can avoid to allocate too less irq vectors with online CPUs
> > > mapped.
> > 
> > […] 
> > Tested-by: Artem Bityutskiy 
> > Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com
> > 
> > this patchset fixes the v4.16-rcX regression that I reported few weeks
> > ago. I applied it and verified that Dell R640 server that I mentioned
> > in the bug report boots up and the disk works.
> 
> Artem (or anyone else), what's the status here? I have this on my list
> of regressions, but it looks like there wasn't any progress in the past
> week. Or was it discussed somewhere else or even fixed in the meantime
> and I missed it? Ciao, Thorsten

Hi, it is not fixed in upstream.

I got an e-mail from James that the fixes are in his tree in the
"fixes" branch. There is no word about when it will be merged. There is
also no stable tag.




Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-26 Thread Thorsten Leemhuis
Lo! Your friendly Linux regression tracker here ;-)

On 08.03.2018 14:18, Artem Bityutskiy wrote:
> On Thu, 2018-03-08 at 18:53 +0800, Ming Lei wrote:
>> This patchset tries to spread among online CPUs as far as possible, so
>> that we can avoid to allocate too less irq vectors with online CPUs
>> mapped.
> […] 
> Tested-by: Artem Bityutskiy 
> Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com
> 
> this patchset fixes the v4.16-rcX regression that I reported few weeks
> ago. I applied it and verified that Dell R640 server that I mentioned
> in the bug report boots up and the disk works.

Artem (or anyone else), what's the status here? I have this on my list
of regressions, but it looks like there wasn't any progress in the past
week. Or was it discussed somewhere else or even fixed in the meantime
and I missed it? Ciao, Thorsten


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-14 Thread Dou Liyang

Hi Artem,

At 03/14/2018 05:07 PM, Artem Bityutskiy wrote:
> On Wed, 2018-03-14 at 12:11 +0800, Dou Liyang wrote:
> > > At 03/13/2018 05:35 PM, Rafael J. Wysocki wrote:
> > > > On Tue, Mar 13, 2018 at 9:39 AM, Artem Bityutskiy
> > > > > Longer term, yeah, I agree. Kernel's notion of possible CPU
> > > > > count should be realistic.
> > >
> > > I did a patch for that, Artem, could you help me to test it.
> >
> > I didn't consider the nr_cpu_ids before. please ignore the old patch
> > and try the following RFC patch.
>
> Sure I can help with testing a patch, could we please:
>
> 1. Start a new thread for this
> 2. Include ACPI forum/folks

OK, I will do that right now.

Thanks,
dou

> Thanks,
> Artem.








Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-14 Thread Artem Bityutskiy
On Wed, 2018-03-14 at 12:11 +0800, Dou Liyang wrote:
> > At 03/13/2018 05:35 PM, Rafael J. Wysocki wrote:
> > > On Tue, Mar 13, 2018 at 9:39 AM, Artem Bityutskiy 
> > > > Longer term, yeah, I agree. Kernel's notion of possible CPU
> > > > count
> > > > should be realistic.
> > 
> > I did a patch for that, Artem, could you help me to test it.
> > 
> 
> I didn't consider the nr_cpu_ids before. please ignore the old patch
> and
> try the following RFC patch.

Sure I can help with testing a patch, could we please:

1. Start a new thread for this
2. Include ACPI forum/folks

Thanks,
Artem.


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-13 Thread Dou Liyang

Hi Artem,

At 03/14/2018 11:29 AM, Dou Liyang wrote:
> Hi All,
>
> At 03/13/2018 05:35 PM, Rafael J. Wysocki wrote:
> > On Tue, Mar 13, 2018 at 9:39 AM, Artem Bityutskiy wrote:
> > > On Tue, 2018-03-13 at 16:35 +0800, Ming Lei wrote:
> > > > Then looks this issue need to fix by making possible CPU count
> > > > accurate because there are other resources allocated according to
> > > > num_possible_cpus(), such as percpu variables.
> > >
> > > Short term the regression should be fixed. It is already v4.16-rc6, we
> > > have little time left.
> >
> > Right.
> >
> > > Longer term, yeah, I agree. Kernel's notion of possible CPU count
> > > should be realistic.
>
> I did a patch for that, Artem, could you help me to test it.

I didn't consider the nr_cpu_ids before. please ignore the old patch and
try the following RFC patch.

Thanks
dou

--->8-

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..96d568408515 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -671,6 +671,23 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
 
 }
 
+static void __init acpi_refill_possible_map(void)
+{
+	unsigned int cpu, nr = 0;
+
+	if (nr_cpu_ids <= nr_unique_ids)
+		return;
+
+	for_each_possible_cpu(cpu) {
+		if (nr >= nr_unique_ids)
+			set_cpu_possible(cpu, false);
+		nr++;
+	}
+
+	nr_cpu_ids = nr_unique_ids;
+	pr_info("Allowing %d possible CPUs\n", nr_cpu_ids);
+}
+
 static void __init acpi_processor_check_duplicates(void)
 {
 	/* check the correctness for all processors in ACPI namespace */
@@ -680,6 +697,9 @@ static void __init acpi_processor_check_duplicates(void)
 						NULL, NULL, NULL);
 	acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, acpi_processor_ids_walk,
 						NULL, NULL);
+
+	/* make possible CPU count more realistic */
+	acpi_refill_possible_map();
 }
 
 bool acpi_duplicate_processor_id(int proc_id)
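
The trimming step in the RFC patch above can be modelled outside the kernel. The following is only a userspace sketch under stated assumptions: a plain bool array stands in for cpu_possible_mask, and trim_possible_map() is a hypothetical helper name, not a kernel function. It keeps the first nr_unique_ids CPUs that are marked possible and clears the rest, which is the effect the patch's loop aims for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model, not kernel code: keep only the first nr_unique_ids
 * CPUs that are marked possible and clear every later one. Returns the
 * number of CPUs still marked possible afterwards.
 */
static size_t trim_possible_map(bool *possible, size_t nr_cpu_ids,
				size_t nr_unique_ids)
{
	size_t kept = 0;

	assert(possible != NULL);
	for (size_t cpu = 0; cpu < nr_cpu_ids; cpu++) {
		if (!possible[cpu])
			continue;
		if (kept >= nr_unique_ids)
			possible[cpu] = false;	/* beyond what ACPI enumerated */
		else
			kept++;
	}
	return kept;
}
```

With eight CPUs marked possible and three unique processor IDs, only CPUs 0-2 stay marked. If the map already holds no more CPUs than nr_unique_ids, nothing is cleared, which matches the early return in the patch.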





Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-13 Thread Dou Liyang

Hi Rafael,

Thank you so much for your reply.

At 03/13/2018 05:25 PM, Rafael J. Wysocki wrote:
> On Tue, Mar 13, 2018 at 4:11 AM, Dou Liyang  wrote:
> > Hi Thomas,
> >
> > At 03/09/2018 11:08 PM, Thomas Gleixner wrote:
> > [...]
> > > I'm not sure if there is a clear indicator whether physical hotplug is
> > > supported or not, but the ACPI folks (x86) and architecture maintainers
> >
> > +cc Rafael
> >
> > > should be able to answer that question. I have a machine which says:
> > >
> > >     smpboot: Allowing 128 CPUs, 96 hotplug CPUs
> > >
> > > There is definitely no way to hotplug anything on that machine and sure
> > > the
> >
> > AFAIK, in ACPI based dynamic reconfiguration, there is no clear
> > indicator. In theory, If the ACPI tables have the hotpluggable
> > CPU resources, the OS can support physical hotplug.
>
> In order for the ACPI-based CPU hotplug (I mean physical, not just the
> software offline/online we do in the kernel) to work, there have to be
> objects in the ACPI namespace corresponding to all of the processors
> in question.
>
> If they are not present, there is no way to signal insertion and eject
> the processors safely.

Yes, I see.

Thanks
dou










Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-13 Thread Dou Liyang

Hi All,

At 03/13/2018 05:35 PM, Rafael J. Wysocki wrote:
> On Tue, Mar 13, 2018 at 9:39 AM, Artem Bityutskiy  wrote:
> > On Tue, 2018-03-13 at 16:35 +0800, Ming Lei wrote:
> > > Then looks this issue need to fix by making possible CPU count
> > > accurate because there are other resources allocated according to
> > > num_possible_cpus(), such as percpu variables.
> >
> > Short term the regression should be fixed. It is already v4.16-rc6, we
> > have little time left.
>
> Right.
>
> > Longer term, yeah, I agree. Kernel's notion of possible CPU count
> > should be realistic.

I did a patch for that, Artem, could you help me to test it.

--->8-

diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 449d86d39965..878abfa0ce30 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -671,6 +671,18 @@ static acpi_status __init acpi_processor_ids_walk(acpi_handle handle,
 
 }
 
+static void __init acpi_refill_possible_map(void)
+{
+	int i;
+
+	reset_cpu_possible_mask();
+
+	for (i = 0; i < nr_unique_ids; i++)
+		set_cpu_possible(i, true);
+
+	pr_info("Allowing %d possible CPUs\n", nr_unique_ids);
+}
+
 static void __init acpi_processor_check_duplicates(void)
 {
 	/* check the correctness for all processors in ACPI namespace */
@@ -680,6 +692,9 @@ static void __init acpi_processor_check_duplicates(void)
 						NULL, NULL, NULL);
 	acpi_get_devices(ACPI_PROCESSOR_DEVICE_HID, acpi_processor_ids_walk,
 						NULL, NULL);
+
+	/* make possible CPU count more realistic */
+	acpi_refill_possible_map();
 }
 
 bool acpi_duplicate_processor_id(int proc_id)

--

> I agree.
>
> Moreover, there are not too many systems where physical CPU hotplug
> actually works in practice AFAICS, so IMO we should default to "no
> physical CPU hotplug" and only change that default in special cases
> (which may be hard to figure out, but that's a different matter).

Yes, I think so.

> What platform firmware tells us may be completely off.

Rafael,

Sorry, I am not sure what you mean :-) . Did you mean that no platform
firmware can tell us whether physical CPU hotplug is supported or not?

My colleagues also told me that there is no way for the OS to know whether
it is supported or not.

Thanks
dou








Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-13 Thread Rafael J. Wysocki
On Tue, Mar 13, 2018 at 4:11 AM, Dou Liyang  wrote:
> Hi Thomas,
>
> At 03/09/2018 11:08 PM, Thomas Gleixner wrote:
> [...]
>>
>>
>> I'm not sure if there is a clear indicator whether physical hotplug is
>> supported or not, but the ACPI folks (x86) and architecture maintainers
>
> +cc Rafael
>
>> should be able to answer that question. I have a machine which says:
>>
>> smpboot: Allowing 128 CPUs, 96 hotplug CPUs
>>
>> There is definitely no way to hotplug anything on that machine and sure
>> the
>
>
> AFAIK, in ACPI based dynamic reconfiguration, there is no clear
> indicator. In theory, If the ACPI tables have the hotpluggable
> CPU resources, the OS can support physical hotplug.

In order for the ACPI-based CPU hotplug (I mean physical, not just the
software offline/online we do in the kernel) to work, there have to be
objects in the ACPI namespace corresponding to all of the processors
in question.

If they are not present, there is no way to signal insertion and eject
the processors safely.


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-13 Thread Artem Bityutskiy
On Tue, 2018-03-13 at 16:35 +0800, Ming Lei wrote:
> Then looks this issue need to fix by making possible CPU count
> accurate
> because there are other resources allocated according to
> num_possible_cpus(),
> such as percpu variables.

Short term the regression should be fixed. It is already v4.16-rc6, we
have little time left.

Longer term, yeah, I agree. Kernel's notion of possible CPU count
should be realistic.

Artem.


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-13 Thread Ming Lei
On Tue, Mar 13, 2018 at 09:38:41AM +0200, Artem Bityutskiy wrote:
> On Tue, 2018-03-13 at 11:11 +0800, Dou Liyang wrote:
> >  I also
> > met the situation that BIOS told to ACPI that it could support
> > physical
> > CPUs hotplug, But actually, there was no hardware slots in the
> > machine.
> > the ACPI tables like user inputs which should be validated when we
> > use.
> 
> This is exactly what happens on Skylake Xeon systems. When I check
> dmesg or this file:
> 
> /sys/devices/system/cpu/possible
> 
> on 2S (two socket) and 4S (four socket) systems, I see the same number
> 432.
> 
> This number comes from ACPI MADT. I will speculate (did not see myself)
> that 8S systems will report the same number as well, because of the
> Skylake-SP (Scalable Platform) architecture.
> 
> Number 432 is good for 8S systems, but it is way too large for 2S and
> 4S systems - 4x or 2x larger than the theoretical maximum.
> 
> I do not know why BIOSes have to report unrealistically high numbers, I
> am just sharing my observation.
> 
> So yes, Linux kernel's possible CPU count knowledge may be too large.
> If we use that number to evenly spread IRQ vectors among the CPUs, we
> end up with wasted vectors, and even bugs, as I observe on a 2S
> Skylake.

Then it looks like this issue needs to be fixed by making the possible CPU
count accurate, because other resources, such as percpu variables, are also
allocated according to num_possible_cpus().

Thanks,
Ming
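
For anyone who wants to compare the numbers on their own machine: /sys/devices/system/cpu/possible (and the neighbouring present and online files) hold a CPU list such as "0-431". Below is a minimal sketch of counting the CPUs in such a string. cpulist_count() is a hypothetical helper written for this note, not a kernel or libc function, and it assumes the usual comma-separated "N" or "N-M" syntax.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Count the CPUs named in a CPU list string such as "0-431" or "3-4,6".
 * A trailing newline, as found in sysfs files, is accepted.
 * Returns -1 on malformed input.
 */
static long cpulist_count(const char *list)
{
	long count = 0;

	assert(list != NULL);
	while (*list != '\0' && *list != '\n') {
		char *end;
		long first = strtol(list, &end, 10);
		long last = first;

		if (end == list || first < 0)
			return -1;
		if (*end == '-') {
			const char *p = end + 1;

			last = strtol(p, &end, 10);
			if (end == p || last < first)
				return -1;
		}
		count += last - first + 1;
		if (*end == ',')
			end++;		/* step over the separator */
		else if (*end != '\0' && *end != '\n')
			return -1;
		list = end;
	}
	return count;
}
```

A possible list of "0-431" gives 432. Going by the "4x or 2x" figures quoted above, that would correspond to a realistic maximum of 432 / 4 = 108 CPUs on the 2S systems and 432 / 2 = 216 on the 4S systems.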


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-12 Thread Dou Liyang

Hi Thomas,

At 03/09/2018 11:08 PM, Thomas Gleixner wrote:
[...]
> I'm not sure if there is a clear indicator whether physical hotplug is
> supported or not, but the ACPI folks (x86) and architecture maintainers

+cc Rafael

> should be able to answer that question. I have a machine which says:
>
>    smpboot: Allowing 128 CPUs, 96 hotplug CPUs
>
> There is definitely no way to hotplug anything on that machine and sure the

AFAIK, in ACPI based dynamic reconfiguration, there is no clear
indicator. In theory, if the ACPI tables have the hotpluggable
CPU resources, the OS can support physical hotplug.

For your machine, do your CPUs support multi-threading without having it
enabled?

And sometimes we should not trust the number of possible CPUs. I have also
met the situation where the BIOS told ACPI that it could support physical
CPU hotplug, but actually there were no hardware slots in the machine.
The ACPI tables are like user input, which should be validated before use.

> existing spread algorithm will waste vectors to no end.
>
> Sure then there is virt, which can pretend to have a gazillion of possible
> hotpluggable CPUs, but virt is an insanity on its own. Though someone might
> come up with reasonable heuristics for that as well.
>
> Thoughts?

Do we have to map the vectors to CPUs statically? Can we map them when
we hotplug/enable the possible CPU?

Thanks,

dou




Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-09 Thread Thomas Gleixner
On Fri, 9 Mar 2018, Ming Lei wrote:
> On Fri, Mar 09, 2018 at 11:08:54AM +0100, Thomas Gleixner wrote:
> > > > So my understanding is that these irq patches are enhancements and not
> > > > bug fixes. I'll queue them for 4.17 then.
> > > 
> > > Wrt. this IO hang issue, these patches shouldn't be bug fix, but they may
> > > fix performance regression[1] for some systems caused by 84676c1f21
> > > ("genirq/affinity: assign vectors to all possible CPUs").
> > > 
> > > [1] https://marc.info/?l=linux-block&m=152050347831149&w=2
> > 
> > Hmm. The patches are rather large for urgent and evtl. backporting. Is
> > there a simpler way to address that performance issue?
> 
> Not thought of a simpler solution. The problem is that number of active msix
> vector is decreased a lot by commit 84676c1f21.

It's reduced in cases where the number of possible CPUs is way larger than
the number of online CPUs.

Now, if you look at the number of present CPUs on such systems it's
probably the same as the number of online CPUs.

It only differs on machines which support physical hotplug, but that's not
the normal case. Those systems are more special and less widespread.

So the obvious simple fix for this regression issue is to spread out the
vectors across present CPUs and not across possible CPUs.

I'm not sure if there is a clear indicator whether physical hotplug is
supported or not, but the ACPI folks (x86) and architecture maintainers
should be able to answer that question. I have a machine which says:

   smpboot: Allowing 128 CPUs, 96 hotplug CPUs

There is definitely no way to hotplug anything on that machine and sure the
existing spread algorithm will waste vectors to no end.

Sure then there is virt, which can pretend to have a gazillion of possible
hotpluggable CPUs, but virt is an insanity on its own. Though someone might
come up with reasonable heuristics for that as well.

Thoughts?

Thanks,

tglx
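
To make the "present first" idea above concrete, here is a deliberately simplified userspace sketch. It is not the kernel's spreading code and not the posted patchset: it ignores NUMA nodes, uses a 64-bit word as the CPU mask, and the function names are made up for illustration. It hands out present CPUs round-robin first and only then the possible-but-absent ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 64

/* One bit per CPU; bit n set means CPU n is in the mask. */
typedef uint64_t cpumask64;

/*
 * Hand out the CPUs in 'cpus' to the vectors round-robin, starting at
 * vector *next. On return *next is the vector the next CPU would get.
 */
static void spread_round_robin(cpumask64 cpus, cpumask64 *affinity,
			       size_t nvec, size_t *next)
{
	for (unsigned int cpu = 0; cpu < MAX_CPUS; cpu++) {
		if (!(cpus & ((cpumask64)1 << cpu)))
			continue;
		affinity[*next] |= (cpumask64)1 << cpu;
		*next = (*next + 1) % nvec;
	}
}

/*
 * Two-pass spread: present CPUs first, then the remaining possible ones.
 * With at least nvec present CPUs, every vector ends up with a present CPU.
 */
static void spread_two_pass(cpumask64 possible, cpumask64 present,
			    cpumask64 *affinity, size_t nvec)
{
	size_t next = 0;

	assert(affinity != NULL && nvec > 0);
	for (size_t i = 0; i < nvec; i++)
		affinity[i] = 0;

	spread_round_robin(possible & present, affinity, nvec, &next);
	spread_round_robin(possible & ~present, affinity, nvec, &next);
}
```

With 8 possible CPUs, CPUs 0-3 present and 4 vectors, this yields the lists 0,4 / 1,5 / 2,6 / 3,7, so each vector covers one present CPU. The exact pairing is an artifact of the round-robin order chosen here.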










Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-09 Thread Ming Lei
On Fri, Mar 09, 2018 at 11:08:54AM +0100, Thomas Gleixner wrote:
> On Fri, 9 Mar 2018, Ming Lei wrote:
> > On Fri, Mar 09, 2018 at 12:20:09AM +0100, Thomas Gleixner wrote:
> > > On Thu, 8 Mar 2018, Ming Lei wrote:
> > > > Actually, it isn't a real fix, the real one is in the following two:
> > > > 
> > > > 0c20244d458e scsi: megaraid_sas: fix selection of reply queue
> > > > ed6d043be8cd scsi: hpsa: fix selection of reply queue
> > > 
> > > Where are these commits? Neither Linus tree not -next know anything about
> > > them
> > 
> > Both aren't merged yet, but they should land V4.16, IMO.
> > 
> > > 
> > > > This patchset can't guarantee that all IRQ vectors are assigned by one
> > > > online CPU, for example, in a quad-socket system, if only one processor
> > > > is present, then some of vectors are still assigned by all offline CPUs,
> > > > and it is a valid case, but still may cause io hang if drivers(hpsa,
> > > > megaraid_sas) select reply queue in current way.
> > > 
> > > So my understanding is that these irq patches are enhancements and not bug
> > > fixes. I'll queue them for 4.17 then.
> > 
> > Wrt. this IO hang issue, these patches shouldn't be bug fix, but they may
> > fix performance regression[1] for some systems caused by 84676c1f21
> > ("genirq/affinity: assign vectors to all possible CPUs").
> > 
> > [1] https://marc.info/?l=linux-block&m=152050347831149&w=2
> 
> Hmm. The patches are rather large for urgent and evtl. backporting. Is
> there a simpler way to address that performance issue?

I haven't thought of a simpler solution. The problem is that the number of
active MSI-X vectors is decreased a lot by commit 84676c1f21.

However, if someone wants to backport, this patchset can be applied cleanly,
without any conflicts.

Thanks,
Ming


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-09 Thread Thomas Gleixner
On Fri, 9 Mar 2018, Ming Lei wrote:
> On Fri, Mar 09, 2018 at 12:20:09AM +0100, Thomas Gleixner wrote:
> > On Thu, 8 Mar 2018, Ming Lei wrote:
> > > Actually, it isn't a real fix, the real one is in the following two:
> > > 
> > >   0c20244d458e scsi: megaraid_sas: fix selection of reply queue
> > >   ed6d043be8cd scsi: hpsa: fix selection of reply queue
> > 
> > Where are these commits? Neither Linus tree not -next know anything about
> > them
> 
> Both aren't merged yet, but they should land V4.16, IMO.
> 
> > 
> > > This patchset can't guarantee that all IRQ vectors are assigned by one
> > > online CPU, for example, in a quad-socket system, if only one processor
> > > is present, then some of vectors are still assigned by all offline CPUs,
> > > and it is a valid case, but still may cause io hang if drivers(hpsa,
> > > megaraid_sas) select reply queue in current way.
> > 
> > So my understanding is that these irq patches are enhancements and not bug
> > fixes. I'll queue them for 4.17 then.
> 
> Wrt. this IO hang issue, these patches shouldn't be bug fix, but they may
> fix performance regression[1] for some systems caused by 84676c1f21
> ("genirq/affinity: assign vectors to all possible CPUs").
> 
> [1] https://marc.info/?l=linux-block&m=152050347831149&w=2

Hmm. The patches are rather large for urgent and evtl. backporting. Is
there a simpler way to address that performance issue?

Thanks,

tglx


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-08 Thread Ming Lei
On Fri, Mar 09, 2018 at 09:00:08AM +0200, Artem Bityutskiy wrote:
> On Fri, 2018-03-09 at 09:24 +0800, Ming Lei wrote:
> > Hi Thomas,
> > 
> > On Fri, Mar 09, 2018 at 12:20:09AM +0100, Thomas Gleixner wrote:
> > > On Thu, 8 Mar 2018, Ming Lei wrote:
> > > > Actually, it isn't a real fix, the real one is in the following
> > > > two:
> > > > 
> > > > 0c20244d458e scsi: megaraid_sas: fix selection of reply queue
> > > > ed6d043be8cd scsi: hpsa: fix selection of reply queue
> > > 
> > > Where are these commits? Neither Linus tree not -next know anything
> > > about
> > > them
> > 
> > Both aren't merged yet, but they should land V4.16, IMO.
> 
> Is it a secret where they are? If not, could you please give ma a
> pointer and I'll give them a test.

  https://marc.info/?l=linux-block&m=152056636717380&w=2

Thanks,
Ming


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-08 Thread Artem Bityutskiy
On Fri, 2018-03-09 at 09:24 +0800, Ming Lei wrote:
> Hi Thomas,
> 
> On Fri, Mar 09, 2018 at 12:20:09AM +0100, Thomas Gleixner wrote:
> > On Thu, 8 Mar 2018, Ming Lei wrote:
> > > Actually, it isn't a real fix, the real one is in the following
> > > two:
> > > 
> > >   0c20244d458e scsi: megaraid_sas: fix selection of reply queue
> > >   ed6d043be8cd scsi: hpsa: fix selection of reply queue
> > 
> > Where are these commits? Neither Linus tree not -next know anything
> > about
> > them
> 
> Both aren't merged yet, but they should land V4.16, IMO.

Is it a secret where they are? If not, could you please give me a
pointer and I'll give them a test.


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-08 Thread Thomas Gleixner
On Thu, 8 Mar 2018, Ming Lei wrote:
> Actually, it isn't a real fix, the real one is in the following two:
> 
>   0c20244d458e scsi: megaraid_sas: fix selection of reply queue
>   ed6d043be8cd scsi: hpsa: fix selection of reply queue

Where are these commits? Neither Linus' tree nor -next knows anything about
them.

> This patchset can't guarantee that all IRQ vectors are assigned by one
> online CPU, for example, in a quad-socket system, if only one processor
> is present, then some of vectors are still assigned by all offline CPUs,
> and it is a valid case, but still may cause io hang if drivers(hpsa,
> megaraid_sas) select reply queue in current way.

So my understanding is that these irq patches are enhancements and not bug
fixes. I'll queue them for 4.17 then.

Thanks,

tglx


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-08 Thread Ming Lei
On Thu, Mar 08, 2018 at 03:18:33PM +0200, Artem Bityutskiy wrote:
> On Thu, 2018-03-08 at 18:53 +0800, Ming Lei wrote:
> > Hi,
> > 
> > This patchset tries to spread among online CPUs as far as possible, so
> > that we can avoid to allocate too less irq vectors with online CPUs
> > mapped.
> > 
> > For example, in a 8cores system, 4 cpu cores(4~7) are offline/non present,
> > on a device with 4 queues:
> > 
> > 1) before this patchset
> > irq 39, cpu list 0-2
> > irq 40, cpu list 3-4,6
> > irq 41, cpu list 5
> > irq 42, cpu list 7
> > 
> > 2) after this patchset
> > irq 39, cpu list 0,4
> > irq 40, cpu list 1,6
> > irq 41, cpu list 2,5
> > irq 42, cpu list 3,7
> > 
> > Without this patchset, only two vectors(39, 40) can be active, but there
> > can be 4 active irq vectors after applying this patchset.
> 
> Tested-by: Artem Bityutskiy 
> Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com

Hi Artem,

Thanks for your test!

> 
> Ming,
> 
> this patchset fixes the v4.16-rcX regression that I reported few weeks
> ago. I applied it and verified that Dell R640 server that I mentioned
> in the bug report boots up and the disk works.
> 
> So this is not just an improvement, it also includes a bugfix. 

Actually, it isn't a real fix, the real one is in the following two:

0c20244d458e scsi: megaraid_sas: fix selection of reply queue
ed6d043be8cd scsi: hpsa: fix selection of reply queue

This patchset can't guarantee that every IRQ vector is assigned at least one
online CPU. For example, in a quad-socket system with only one processor
present, some vectors are still assigned only offline CPUs. That is a valid
case, but it may still cause an I/O hang if drivers (hpsa, megaraid_sas)
select the reply queue in the current way.

Thanks,
Ming
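
As an illustration of why the reply-queue choice matters, here is a small userspace sketch. It is not taken from the hpsa or megaraid_sas commits named above; build_reply_map() and the 8-CPU limit are assumptions made for this note. The idea shown is to derive a per-CPU table from the vectors' affinity masks, so that a request submitted on a given CPU uses the queue whose vector actually covers that CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_CPUS 8

/* One bit per CPU; bit n set means CPU n is in the mask. */
typedef uint64_t cpumask64;

/*
 * Fill reply_map[cpu] with the index of the first queue whose interrupt
 * affinity mask contains that CPU. CPUs covered by no mask fall back to
 * queue 0.
 */
static void build_reply_map(const cpumask64 *affinity, size_t nqueues,
			    unsigned int reply_map[DEMO_NR_CPUS])
{
	assert(affinity != NULL && nqueues > 0);
	for (unsigned int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		reply_map[cpu] = 0;
		for (size_t q = 0; q < nqueues; q++) {
			if (affinity[q] & ((cpumask64)1 << cpu)) {
				reply_map[cpu] = (unsigned int)q;
				break;
			}
		}
	}
}
```

Take the "before this patchset" layout from the cover letter (0-2 / 3-4,6 / 5 / 7, with only CPUs 0-3 online). The table sends CPU 3 to queue 1, whose vector includes CPU 3. A hypothetical driver that instead picked the queue as CPU number modulo queue count would send CPU 3 to queue 3, whose vector lists only the offline CPU 7, which is the kind of situation where a completion could be missed.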


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-08 Thread Artem Bityutskiy
On Thu, 2018-03-08 at 15:18 +0200, Artem Bityutskiy wrote:
> Tested-by: Artem Bityutskiy 
> Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com

And for completeness:
Linux-Regression-ID: lr#15a115


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-08 Thread Artem Bityutskiy
On Thu, 2018-03-08 at 18:53 +0800, Ming Lei wrote:
> Hi,
> 
> This patchset tries to spread among online CPUs as far as possible, so
> that we can avoid to allocate too less irq vectors with online CPUs
> mapped.
> 
> For example, in a 8cores system, 4 cpu cores(4~7) are offline/non present,
> on a device with 4 queues:
> 
> 1) before this patchset
>   irq 39, cpu list 0-2
>   irq 40, cpu list 3-4,6
>   irq 41, cpu list 5
>   irq 42, cpu list 7
> 
> 2) after this patchset
>   irq 39, cpu list 0,4
>   irq 40, cpu list 1,6
>   irq 41, cpu list 2,5
>   irq 42, cpu list 3,7
> 
> Without this patchset, only two vectors(39, 40) can be active, but there
> can be 4 active irq vectors after applying this patchset.

Tested-by: Artem Bityutskiy 
Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com

Ming,

this patchset fixes the v4.16-rcX regression that I reported a few weeks
ago. I applied it and verified that the Dell R640 server that I mentioned
in the bug report boots up and the disk works.

So this is not just an improvement, it also includes a bugfix. 

Thanks!
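
The "two active vectors before, four after" claim in the cover letter quoted above can be checked mechanically. This is a small userspace sketch, assuming that a vector counts as active when its CPU list contains at least one online CPU and that CPUs 0-3 are the online ones, as in the example. The type and helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One bit per CPU; bit n set means CPU n is in the mask. */
typedef uint64_t cpumask64;

#define CPU_BIT(n) ((cpumask64)1 << (n))

/* CPUs 0-3 are online; 4-7 are offline or not present. */
static const cpumask64 online_cpus =
	CPU_BIT(0) | CPU_BIT(1) | CPU_BIT(2) | CPU_BIT(3);

/* irq 39..42 before the patchset: 0-2 / 3-4,6 / 5 / 7 */
static const cpumask64 before_patchset[4] = {
	CPU_BIT(0) | CPU_BIT(1) | CPU_BIT(2),
	CPU_BIT(3) | CPU_BIT(4) | CPU_BIT(6),
	CPU_BIT(5),
	CPU_BIT(7),
};

/* irq 39..42 after the patchset: 0,4 / 1,6 / 2,5 / 3,7 */
static const cpumask64 after_patchset[4] = {
	CPU_BIT(0) | CPU_BIT(4),
	CPU_BIT(1) | CPU_BIT(6),
	CPU_BIT(2) | CPU_BIT(5),
	CPU_BIT(3) | CPU_BIT(7),
};

/* Count the vectors whose affinity mask has at least one online CPU. */
static size_t count_active_vectors(const cpumask64 *affinity, size_t nvec,
				   cpumask64 online)
{
	size_t active = 0;

	assert(affinity != NULL || nvec == 0);
	for (size_t i = 0; i < nvec; i++)
		if (affinity[i] & online)
			active++;
	return active;
}
```

Calling count_active_vectors() on before_patchset gives 2 (irq 39 and irq 40), and on after_patchset it gives 4, matching the numbers in the cover letter.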