Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2021-01-08 Thread Christopher William Snowhill
Replying to https://lkml.org/lkml/2019/2/19/516 from, yes, 2019.

My MSI B450 Tomahawk is exhibiting this bug now that I've updated the firmware 
to the latest beta BIOS with AGESA 1.1.0.0 patch D.


Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-07 Thread Hans de Goede

Hi,

On 06-03-19 11:14, Thomas Gleixner wrote:

> Hans,
>
> On Wed, 6 Mar 2019, Hans de Goede wrote:
>> On 05-03-19 20:54, Borislav Petkov wrote:
>>> On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
>>>> Finger pointing at the firmware if there are multiple vendors involved
>>>> is really not going to help here. Esp. since most OEMs will just respond
>>>> with "the machine works fine with Windows"
>>>
>>> Yes, because windoze simply doesn't report that spurious IRQ, most
>>> likely.
>>
>> So maybe we need to lower the priority of the do_IRQ error from pr_emerg
>> to pr_err then? That will stop throwing the errors in the user's face each
>> boot on distros which have chosen to set the quiet loglevel to such a level
>> that pr_err messages are not shown on the console (*).
>
> Well, we rather try to understand and fix the issue.
>
> So if Tom's theory holds, then the patch below should cure it.


Thank you for the patch. Unfortunately the messages still appear
with a kernel that has the patch applied:

[    0.741479] smp: Bringing up secondary CPUs ...
[    0.741654] x86: Booting SMP configuration:
[    0.741655]   node  #0, CPUs:  #1
[    0.742231] TSC synchronization [CPU#0 -> CPU#1]:
[    0.742231] Measured 3346474670 cycles TSC warp between CPUs, turning off TSC clock.
[    0.742231] tsc: Marking TSC unstable due to check_tsc_sync_source failed
[    0.321639] do_IRQ: 1.55 No irq handler for vector
[    0.743371]   #2
[    0.321639] do_IRQ: 2.55 No irq handler for vector
[    0.743598]   #3
[    0.321639] do_IRQ: 3.55 No irq handler for vector
[    0.744306]   #4
[    0.321639] do_IRQ: 4.55 No irq handler for vector
[    0.744531]   #5
[    0.321639] do_IRQ: 5.55 No irq handler for vector
[    0.745241]   #6
[    0.321639] do_IRQ: 6.55 No irq handler for vector
[    0.745467]   #7
[    0.321639] do_IRQ: 7.55 No irq handler for vector
[    0.745627] smp: Brought up 1 node, 8 CPUs
[    0.745627] smpboot: Max logical packages: 2
[    0.745627] smpboot: Total of 8 processors activated (35133.37 BogoMIPS)

I also tried suspend/resume. In that case no extra "No irq handler for
vector" messages are printed; this seems to trigger only once per CPU,
at boot.

I do get these messages during resume, but I guess these are unrelated:

[  167.034247] ACPI: Low-level resume complete
[  167.034247] ACPI: EC: EC started
[  167.034247] PM: Restoring platform NVS memory
[  167.034247] Enabling non-boot CPUs ...
[  167.034247] x86: Booting SMP configuration:
[  167.034247] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  167.034247]  cache: parent cpu1 should not be sleeping
[  167.034281] microcode: CPU1: patch_level=0x08101007
[  167.034542] CPU1 is up
[  167.034583] smpboot: Booting Node 0 Processor 2 APIC 0x2
[  167.035347]  cache: parent cpu2 should not be sleeping
[  167.035484] microcode: CPU2: patch_level=0x08101007
[  167.035690] CPU2 is up
[  167.035703] smpboot: Booting Node 0 Processor 3 APIC 0x3
[  167.036447]  cache: parent cpu3 should not be sleeping
[  167.036580] microcode: CPU3: patch_level=0x08101007
[  167.036819] CPU3 is up
[  167.036843] smpboot: Booting Node 0 Processor 4 APIC 0x4
[  167.038227]  cache: parent cpu4 should not be sleeping
[  167.038384] microcode: CPU4: patch_level=0x08101007
etc.

Regards,

Hans



> 8<-
>
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1642,6 +1642,7 @@ static void end_local_APIC_setup(void)
>   */
>  void apic_ap_setup(void)
>  {
> +	clear_local_APIC();
>  	setup_local_APIC();
>  	end_local_APIC_setup();
>  }



Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-06 Thread Thomas Gleixner
Hans,

On Wed, 6 Mar 2019, Hans de Goede wrote:
> On 05-03-19 20:54, Borislav Petkov wrote:
> > On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
> > > Finger pointing at the firmware if there are multiple vendors involved
> > > is really not going to help here. Esp. since most OEMs will just respond
> > > with "the machine works fine with Windows"
> > 
> > Yes, because windoze simply doesn't report that spurious IRQ, most
> > likely.
> 
> So maybe we need to lower the priority of the do_IRQ error from pr_emerg
> to pr_err then? That will stop throwing the errors in the user's face each
> boot on distros which have chosen to set the quiet loglevel to such a level
> that pr_err messages are not shown on the console (*).

Well, we rather try to understand and fix the issue.

So if Tom's theory holds, then the patch below should cure it.

Thanks,

tglx

8<-

--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1642,6 +1642,7 @@ static void end_local_APIC_setup(void)
  */
 void apic_ap_setup(void)
 {
+	clear_local_APIC();
 	setup_local_APIC();
 	end_local_APIC_setup();
 }


Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-06 Thread Hans de Goede

Hi,

On 05-03-19 20:54, Borislav Petkov wrote:

> On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
>> Finger pointing at the firmware if there are multiple vendors involved
>> is really not going to help here. Esp. since most OEMs will just respond
>> with "the machine works fine with Windows"
>
> Yes, because windoze simply doesn't report that spurious IRQ, most
> likely.


So maybe we need to lower the priority of the do_IRQ error from pr_emerg
to pr_err then? That will stop throwing the errors in the user's face each
boot on distros which have chosen to set the quiet loglevel to such a level
that pr_err messages are not shown on the console (*).

Regards,

Hans


(*) There are simply too many false-positive pr_err messages in the kernel;
try e.g. plugging in a USB stick and then running "dmesg --level=err".
Note that the messages will still be in dmesg and in the system logs.
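
For readers following the pr_emerg vs. pr_err argument, here is a minimal sketch of the console filtering rule it depends on. It assumes the printk behaviour where a message is printed on the console only when its level is numerically lower than `console_loglevel`. The message is always kept in the log buffer, and so in dmesg. The `LVL_*` enum and `shown_on_console()` are hypothetical local stand-ins for the kernel's `KERN_*` levels (`pr_emerg` is level 0, `pr_err` level 3). This is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the KERN_* levels: KERN_EMERG is 0 ... KERN_DEBUG is 7. */
enum lvl { LVL_EMERG, LVL_ALERT, LVL_CRIT, LVL_ERR,
           LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG };

/*
 * Assumed printk console rule: a message is printed on the console only if
 * its level is numerically lower than console_loglevel. Either way it is
 * still stored in the log buffer (dmesg / system logs).
 */
static bool shown_on_console(int msg_level, int console_loglevel)
{
    assert(msg_level >= LVL_EMERG && msg_level <= LVL_DEBUG);
    return msg_level < console_loglevel;
}
```

For example, with a console loglevel of 4, `shown_on_console(LVL_ERR, 4)` is true. A distro that sets its quiet loglevel to 3 gets false for `LVL_ERR` but still true for `LVL_EMERG`. Demoting the message would therefore hide it from the boot console on such distros without removing it from dmesg.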



Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-05 Thread Borislav Petkov
On Tue, Mar 05, 2019 at 08:40:02PM +0100, Hans de Goede wrote:
> Finger pointing at the firmware if there are multiple vendors involved
> is really not going to help here. Esp. since most OEMs will just respond
> with "the machine works fine with Windows"

Yes, because windoze simply doesn't report that spurious IRQ, most
likely.

Firmware is fiddling with some crap underneath and it ends up raising
IRQs. tglx told you that too.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-05 Thread Hans de Goede

Hi,

On 05-03-19 20:31, Lendacky, Thomas wrote:

> On 3/5/19 1:19 PM, Hans de Goede wrote:
>> A similar issue is also reported here:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
>>
>> There are multiple people with different vectors (so likely / possibly
>> different bugs) commenting on that bug, but I just got confirmation
>> that the vector 55 issue is also happening on an Acer system with an AMD
>> A8 processor (I suspect a Ryzen, but that still needs to be confirmed).
>>
>> So this seems to be a generic issue with (some) AMD laptops and
>> not specific to one OEM.
>
> I also see that comment 17 is for an Intel based machine, which to me
> implies that it really is a BIOS issue.


That user is seeing "No irq handler for vector" on vectors 33-35 so that
is likely / possibly another bug.

Finger pointing at the firmware if there are multiple vendors involved
is really not going to help here. Esp. since most OEMs will just respond
with "the machine works fine with Windows"

Regards,

Hans


Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-05 Thread Lendacky, Thomas
On 3/5/19 1:19 PM, Hans de Goede wrote:
> Hi,
> 
> A similar issue is also reported here:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
> 
> There are multiple people with different vectors (so likely / possibly
> different bugs) commenting on that bug, but I just got confirmation
> that the vector 55 issue is also happening on an Acer system with an AMD
> A8 processor (I suspect a Ryzen, but that still needs to 

Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-05 Thread Hans de Goede

Hi,

On 05-03-19 17:02, Hans de Goede wrote:

> Hi,
>
> On 05-03-19 15:06, Lendacky, Thomas wrote:
>> It's been a very long time since I dealt with this and I was only on the
>> periphery. You might be able to print the LVT entries from the APIC and
>> see if any of them have an un-masked ExtINT delivery mode.  You would need
>> to do this very early before Linux modifies any values.
>
> I'm afraid I'm not familiar enough with the interrupt / APIC parts of
> the kernel to do something like this myself.
>
>> Or you can report the issue to the OEM and have them check their BIOS
>> code to see if they are doing this.
>
> I will try to go this route, but I'm not really hopeful that will
> lead to a solution.


A similar issue is also reported here:

https://bugzilla.redhat.com/show_bug.cgi?id=1551605

There are multiple people with different vectors (so likely / possibly
different bugs) commenting on that bug, but I just got confirmation
that the vector 55 issue is also happening on an Acer system with an AMD
A8 processor (I suspect a Ryzen, but that still needs to be confirmed).

So this seems to be a generic issue with (some) AMD laptops and
not specific to one OEM.

Regards,

Hans


Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-05 Thread Hans de Goede

Hi,

On 05-03-19 15:06, Lendacky, Thomas wrote:

> On 3/3/19 4:57 AM, Hans de Goede wrote:
>> Ping? I like your theory, can you provide some help with debugging this
>> further (to prove that your theory is correct)?
>
> It's been a very long time since I dealt with this and I was only on the
> periphery. You might be able to print the LVT entries from the APIC and
> see if any of them have an un-masked ExtINT delivery mode.  You would need
> to do this very early before Linux modifies any values.

I'm afraid I'm not familiar enough with the interrupt / APIC parts of
the kernel to do something like this myself.

> Or you can report the issue to the OEM and have them check their BIOS
> code to see if they are doing this.

I will try to go this route, but I'm not really hopeful that will
lead to a solution.

Regards,

Hans


Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-05 Thread Lendacky, Thomas
On 3/3/19 4:57 AM, Hans de Goede wrote:
> Hi,
> 
> On 21-02-19 13:30, Hans de Goede wrote:
>> Hi,
>>
>> On 19-02-19 22:47, Lendacky, Thomas wrote:
>>> On 2/19/19 3:01 PM, Thomas Gleixner wrote:
>>>> Hans,
>>>>
>>>> On Tue, 19 Feb 2019, Hans de Goede wrote:
>>>>
>>>> Cc+: ACPI/AMD folks
>>>>
>>>>> Various people are reporting false positive "do_IRQ: #.55 No irq handler
>>>>> for vector" messages on AMD ryzen based laptops, see e.g.:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
>>>>>
>>>>> Which contains this dmesg snippet:
>>>>>
>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs ...
>>>>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration:
>>>>> Feb 07 20:14:29 localhost.localdomain kernel:  node  #0, CPUs:  #1
>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for vector
>>>>> Feb 07 20:14:29 localhost.localdomain kernel:  #2
>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for vector
>>>>> Feb 07 20:14:29 localhost.localdomain kernel:  #3
>>>>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for vector
>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs
>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1
>>>>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors activated (15968.49 BogoMIPS)
>>>>>
>>>>> It seems that we get an IRQ for each CPU as we bring it online,
>>>>> which feels to me like it is some sorta false-positive.
>>>>
>>>> Sigh, that looks like BIOS value add again.
>>>>
>>>> It's not a false positive. Something _IS_ sending a vector 55 to these CPUs
>>>> for whatever reason.
>>>>
>>>
>>> I remember seeing something like this in the past and it turned out to be
>>> a BIOS issue.  BIOS was enabling the APs to interact with the legacy 8259
>>> interrupt controller when only the BSP should. During POST the APs were
>>> exposed to ExtINT/INTR events as a result of the mis-configuration
>>> (probably due to a UEFI timer-tick using the 8259) and this left a pending
>>> ExtINT/INTR interrupt latched on the APs.
>>>
>>> When the APs were started by the OS, the latched ExtINT/INTR interrupt is
>>> processed shortly after the OS enables interrupts. The AP then queries the
>>> 8259 to identify the vector number (which is the value of the 8259's ICW2
>>> register + the IRQ level). The master 8259's ICW2 was set to 0x30 and,
>>> since no interrupts are actually pending, the 8259 will respond with IRQ7
>>> (spurious interrupt) yielding a vector of 0x37 or 55.
>>>
>>> The OS was not expecting vector 55 and printed the message.
>>>
>>>  From the Intel Developer's Manual: Vol 3a, Section 10.5.1:
>>> "Only one processor in the system should have an LVT entry configured to
>>> use the ExtINT delivery mode."
>>>
>>> Not saying this is the problem, but very well could be.
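
Here is a minimal sketch of the vector arithmetic in the quoted explanation. It assumes the 8259 runs in x86 mode, where ICW2 supplies bits 7..3 of the vector and the IRQ level supplies bits 2..0. `struct ap_state`, `pic_vector()` and `ap_enable_interrupts()` are hypothetical names used only to model the theory. The latch stands in for the ExtINT that firmware is suspected of leaving pending during POST. It is consumed on first delivery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IRQ level the master 8259 reports when nothing is really pending (spurious). */
#define PIC_SPURIOUS_IRQ 7u

/*
 * Vector returned by the 8259 during the interrupt-acknowledge cycle:
 * ICW2 provides bits 7..3, the IRQ level provides bits 2..0.
 * For an 8-aligned ICW2 base this equals "ICW2 + IRQ level".
 */
static uint8_t pic_vector(uint8_t icw2, uint8_t irq_level)
{
    assert(irq_level < 8);
    return (uint8_t)((icw2 & 0xF8u) | irq_level);
}

/* Hypothetical model of one AP: an ExtINT left latched by firmware during POST. */
struct ap_state {
    bool extint_latched;
};

/*
 * Models the OS enabling interrupts on the AP. Returns the vector that is
 * delivered, or -1 if nothing was latched. Delivery clears the latch, so
 * at most one spurious vector is seen per AP.
 */
static int ap_enable_interrupts(struct ap_state *ap, uint8_t icw2)
{
    if (!ap->extint_latched)
        return -1;
    ap->extint_latched = false;
    return pic_vector(icw2, PIC_SPURIOUS_IRQ);
}
```

With ICW2 = 0x30, `pic_vector(0x30, 7)` yields 0x37, i.e. vector 55. A second `ap_enable_interrupts()` call on the same AP returns -1. This fits the observation that the message appears once per CPU at boot.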
>>
>> That sounds like a likely candidate, esp. also since this only happens
>> once per CPU when we first online the CPU.
>>
>> Can you provide me with a patch with some printk-s / pr_debugs to
>> test for this, then I can build a kernel with that patch added and
>> we can see if your hypothesis is right.
> 
> Ping? I like your theory, can you provide some help with debugging this
> further (to prove that your theory is correct)?

It's been a very long time since I dealt with this and I was only on the
periphery. You might be able to print the LVT entries from the APIC and
see if any of them have an un-masked ExtINT delivery mode.  You would need
to do this very early before Linux modifies any values.

Or you can report the issue to the OEM and have them check their BIOS
code to see if they are doing this.

Thanks,
Tom

> 
> Regards,
> 
> Hans
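
Tom's suggestion to look for an unmasked ExtINT LVT entry comes down to decoding two fields of each LVT register. The sketch below assumes the Intel SDM layout: delivery mode in bits 10:8, with ExtINT = 111b, and the mask in bit 16. All names are hypothetical and defined locally. In the kernel, the raw values would presumably be read very early on each AP with `apic_read(APIC_LVT0)` / `apic_read(APIC_LVT1)`. This userspace sketch only decodes values and does not attempt that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* LVT fields, assuming the Intel SDM Vol. 3A "Local Vector Table" layout. */
#define LVT_DELIVERY_MODE(v) (((v) >> 8) & 0x7u) /* bits 10:8 */
#define LVT_MODE_EXTINT      0x7u                /* 111b = ExtINT */
#define LVT_MASKED           (1u << 16)          /* bit 16 = masked */

/* True if this LVT entry accepts ExtINT from the 8259: ExtINT mode and not masked. */
static bool lvt_is_unmasked_extint(uint32_t lvt)
{
    return LVT_DELIVERY_MODE(lvt) == LVT_MODE_EXTINT && !(lvt & LVT_MASKED);
}

/*
 * Given one raw LVT value per CPU (e.g. LINT0), count the CPUs that would
 * accept ExtINT. Per the SDM rule quoted in the thread, a count above 1
 * suggests the APs were left configured like the BSP.
 */
static int count_unmasked_extint(const uint32_t *lvt, int ncpus)
{
    assert(ncpus == 0 || lvt != NULL);
    int n = 0;
    for (int i = 0; i < ncpus; i++)
        if (lvt_is_unmasked_extint(lvt[i]))
            n++;
    return n;
}
```

For example, `lvt_is_unmasked_extint(0x700)` is true. The same value with bit 16 set (0x10700) is masked and returns false.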


Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-03-03 Thread Hans de Goede

Hi,

On 21-02-19 13:30, Hans de Goede wrote:

> Hi,
>
> That sounds like a likely candidate, esp. also since this only happens
> once per CPU when we first online the CPU.
>
> Can you provide me with a patch with some printk-s / pr_debugs to
> test for this, then I can build a kernel with that patch added and
> we can see if your hypothesis is right.


Ping? I like your theory, can you provide some help with debugging this
further (to prove that your theory is correct)?

Regards,

Hans


Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-02-21 Thread Hans de Goede

Hi,

On 19-02-19 22:47, Lendacky, Thomas wrote:

On 2/19/19 3:01 PM, Thomas Gleixner wrote:

Hans,

On Tue, 19 Feb 2019, Hans de Goede wrote:

Cc+: ACPI/AMD folks


Various people are reporting false positive "do_IRQ: #.55 No irq handler for vector"
messages on AMD ryzen based laptops, see e.g.:

https://bugzilla.redhat.com/show_bug.cgi?id=1551605

Which contains this dmesg snippet:

Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs ...
Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration:
Feb 07 20:14:29 localhost.localdomain kernel:  node  #0, CPUs:  #1
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel:  #2
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel:  #3
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs
Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1
Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors activated (15968.49 BogoMIPS)

It seems that we get an IRQ for each CPU as we bring it online,
which feels to me like it is some sorta false-positive.


Sigh, that looks like BIOS value add again.

It's not a false positive. Something _IS_ sending a vector 55 to these CPUs
for whatever reason.



I remember seeing something like this in the past and it turned out to be
a BIOS issue.  BIOS was enabling the APs to interact with the legacy 8259
interrupt controller when only the BSP should. During POST the APs were
exposed to ExtINT/INTR events as a result of the mis-configuration
(probably due to a UEFI timer-tick using the 8259) and this left a pending
ExtINT/INTR interrupt latched on the APs.

When the APs were started by the OS, the latched ExtINT/INTR interrupt was
processed shortly after the OS enabled interrupts. The AP then queried the
8259 to identify the vector number (which is the value of the 8259's ICW2
register + the IRQ level). The master 8259's ICW2 was set to 0x30 and,
since no interrupts are actually pending, the 8259 will respond with IRQ7
(spurious interrupt) yielding a vector of 0x37 or 55.

The OS was not expecting vector 55 and printed the message.

From the Intel Developer's Manual: Vol 3a, Section 10.5.1:
"Only one processor in the system should have an LVT entry configured to
use the ExtINT delivery mode."

Not saying this is the problem, but very well could be.


That sounds like a likely candidate, especially since this only happens
once per CPU, when we first bring the CPU online.

Can you provide me with a patch with some printk-s / pr_debugs to
test for this, then I can build a kernel with that patch added and
we can see if your hypothesis is right.

Regards,

Hans




Thanks,
Tom


I temporarily have access to a loaner laptop for a couple of weeks which shows
the same errors and I would like to fix this, but I don't really know how to
fix this.


Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and dig in the files there
whether vector 55 is used on CPU0 and which device is associated to that.

I bet it's a legacy IRQ and as that space starts at 48 (IRQ0) this should be
IRQ9 which is usually - DRUMROLL - the ACPI interrupt.

The kernel clearly sets that up to be delivered to CPU 0 only, but I've
seen before that the BIOS value add thinks this setup is not
relevant.

/me goes off and sings LALALA
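The legacy vector mapping mentioned above (IRQ0 at vector 48) can be sketched as follows. The macro is modelled on the x86 kernel's `ISA_IRQ_VECTOR()` and assumes `FIRST_EXTERNAL_VECTOR` is 0x20, as on x86 kernels of this era; `legacy_irq_for_vector()` is a hypothetical inverse helper. By this arithmetic vector 55 corresponds to legacy IRQ7 (IRQ9 would be vector 57), the same line an 8259 reports for a spurious interrupt.

```c
/* Assumed value, as on x86 kernels of this era. */
#define FIRST_EXTERNAL_VECTOR 0x20
#define NR_LEGACY_IRQS        16

/* Modelled on ISA_IRQ_VECTOR(): round FIRST_EXTERNAL_VECTOR + 16 down to
 * a multiple of 16 (0x30 = 48) and add the legacy IRQ number. */
#define ISA_IRQ_VECTOR(irq) (((FIRST_EXTERNAL_VECTOR + 16) & ~15) + (irq))

/* Hypothetical inverse: legacy IRQ for a vector, or -1 if the vector
 * lies outside the 16-entry legacy range. */
int legacy_irq_for_vector(int vector)
{
    int base = ISA_IRQ_VECTOR(0);

    if (vector < base || vector >= base + NR_LEGACY_IRQS)
        return -1;
    return vector - base;
}
```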


Note if you want I can set up root ssh-access to the laptop.


As a last resort. root ssh - SHUDDER - Ooops now I spilled my preferred
password for that :)

Thanks,

tglx



Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-02-21 Thread Hans de Goede

Hi,

On 19-02-19 22:01, Thomas Gleixner wrote:

Hans,

On Tue, 19 Feb 2019, Hans de Goede wrote:

Cc+: ACPI/AMD folks


Various people are reporting false positive "do_IRQ: #.55 No irq handler for vector"
messages on AMD ryzen based laptops, see e.g.:

https://bugzilla.redhat.com/show_bug.cgi?id=1551605

Which contains this dmesg snippet:

Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs ...
Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration:
Feb 07 20:14:29 localhost.localdomain kernel:  node  #0, CPUs:  #1
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel:  #2
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel:  #3
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs
Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1
Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors activated (15968.49 BogoMIPS)

It seems that we get an IRQ for each CPU as we bring it online,
which feels to me like it is some sorta false-positive.


Sigh, that looks like BIOS value add again.

It's not a false positive. Something _IS_ sending a vector 55 to these CPUs
for whatever reason.


I temporarily have access to a loaner laptop for a couple of weeks which shows
the same errors and I would like to fix this, but I don't really know how to
fix this.


Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and dig in the files there
whether vector 55 is used on CPU0 and which device is associated to that.


ls /sys/kernel/debug/irq/domains gives:

AMD-IR-0  IO-APIC-IR-0  PCI-MSI-3  default
AMD-IR-MSI-0-3  IO-APIC-IR-1  VECTOR

None of the files under /sys/kernel/debug/irq/domains lists 55 in the "vectors"
column of its output. That part of the output is identical for all of them
and looks like this:

 | CPU | avl | man | mac | act | vectors
 0   195 1 16  33-37,48
 1   195 1 16  33-38
 2   195 1 16  33-38
 3   195 1 16  33-38
 4   195 1 16  33-38
 5   195 1 16  33-38
 6   195 1 16  33-38
 7   195 1 16  33-38
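Checking whether a vector appears in a "vectors" column such as `33-37,48` amounts to parsing a comma-separated range list. A minimal sketch; `range_list_contains()` is a hypothetical helper, and it only handles the plain `a` and `a-b` forms shown above.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Return true if `value` falls inside a list such as "33-37,48". */
bool range_list_contains(const char *list, long value)
{
    const char *p = list;

    while (*p != '\0') {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi = lo;

        if (end == p)            /* not a number: malformed input */
            return false;
        if (*end == '-') {       /* "a-b" range */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p)
                return false;
        }
        if (value >= lo && value <= hi)
            return true;
        if (*end != ',')         /* end of list (or trailing junk) */
            return false;
        p = end + 1;
    }
    return false;
}
```

Neither the CPU0 list (`33-37,48`) nor the others (`33-38`) contain 55, consistent with the observation above.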

cat /sys/kernel/debug/irq/irqs/55

Gives:

handler:  handle_fasteoi_irq
device:   (null)
status:   0x4100
istate:   0x
ddepth:   1
wdepth:   0
dstate:   0x0503a000
IRQD_LEVEL
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_SINGLE_TARGET
IRQD_MOVE_PCNTXT
IRQD_CAN_RESERVE
node: -1
affinity: 0-15
effectiv: 0
pending:
domain:  IO-APIC-IR-1
 hwirq:   0x0
 chip:IR-IO-APIC
  flags:   0x10
 IRQCHIP_SKIP_SET_WAKE
 parent:
domain:  AMD-IR-0
 hwirq:   0x1
 chip:AMD-IR
  flags:   0x0
 parent:
domain:  VECTOR
 hwirq:   0x37
 chip:APIC
  flags:   0x0
 Vector: 0
 Target: 0
 move_in_progress: 0
 is_managed:   0
 can_reserve:  1
 has_reserved: 1
 cleanup_pending:  0
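The files under /sys/kernel/debug/irq/irqs/ are named by Linux IRQ number, so in a dump like the one above the CPU vector is reported on the "Vector:" line of the VECTOR domain section rather than by the file name. A minimal sketch that pulls a numeric field out of such a dump held in a string; `dump_field()` is a hypothetical helper that simply takes the first line whose text starts with the given key.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Find the first line whose text (after leading blanks) starts with `key`
 * and parse the number after it; base 0 accepts "0x37" as well as "55". */
bool dump_field(const char *dump, const char *key, long *out)
{
    const char *line = dump;
    size_t klen = strlen(key);

    while (line != NULL && *line != '\0') {
        const char *p = line;

        while (*p == ' ' || *p == '\t')
            p++;
        if (strncmp(p, key, klen) == 0) {
            char *end;
            *out = strtol(p + klen, &end, 0);
            return end != p + klen;   /* false if no number followed */
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                   /* move to the start of the next line */
    }
    return false;
}
```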

cat /proc/interrupts

Gives:

        CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
   0:    123      0      0      0      0      0      0      0  IR-IO-APIC     2-edge      timer
   1:      0      0      0      0      0      0    188      0  IR-IO-APIC     1-edge      i8042
   8:      0      0      0      0      0      0      0      1  IR-IO-APIC     8-edge      rtc0
   9:      0   6564      0      0      0      0      0      0  IR-IO-APIC     9-fasteoi   acpi
  12:      0      0      0      0      0    511      0      0  IR-IO-APIC    12-edge      i8042
  25:      0      0      0      0      0      0      0      0  PCI-MSI     4096-edge      AMD-Vi
  26:      0      0      0      0      0      0      0      0  IR-PCI-MSI 18432-edge      PCIe PME, aerdrv
  27:      0      0      0      0      0      0      0      0  IR-PCI-MSI 20480-edge      PCIe PME, aerdrv
  28:      0      0      0      0      0      0      0      0  IR-PCI-MSI 22528-edge      PCIe PME, aerdrv
  29:      0      0      0      0      0      0      0      0  IR-PCI-MSI 24576-edge      PCIe PME, aerdrv
  30:      0      0      0      0      0      0      0      0  IR-PCI-MSI 26624-edge      PCIe PME, aerdrv
  31:      0      0      0      0
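To see which CPU actually serviced a given line, each /proc/interrupts row can be split into its per-CPU counters. A minimal sketch, assuming the row layout shown above (IRQ number, colon, one counter per CPU, then chip and name); `busiest_cpu()` is a hypothetical helper and the CPU count is passed in by the caller.

```c
#include <stdlib.h>

/* Parse a row such as "  9:  0  6564  0 ..." with `ncpus` counters.
 * Stores the IRQ number in *irq_out and returns the index of the CPU
 * with the highest count (lowest index on ties), or -1 on error. */
int busiest_cpu(const char *row, int ncpus, long *irq_out)
{
    const char *p = row;
    char *end;
    long best = -1;
    int best_cpu = -1;

    *irq_out = strtol(p, &end, 10);
    if (end == p || *end != ':')
        return -1;
    p = end + 1;

    for (int cpu = 0; cpu < ncpus; cpu++) {
        long count = strtol(p, &end, 10);

        if (end == p)            /* fewer counters than expected */
            return -1;
        if (count > best) {
            best = count;
            best_cpu = cpu;
        }
        p = end;
    }
    return best_cpu;
}
```

Applied to the IRQ 9 (acpi) row above, it reports CPU1 rather than CPU0.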

Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-02-19 Thread Lendacky, Thomas
On 2/19/19 3:01 PM, Thomas Gleixner wrote:
> Hans,
> 
> On Tue, 19 Feb 2019, Hans de Goede wrote:
> 
> Cc+: ACPI/AMD folks
> 
>> Various people are reporting false positive "do_IRQ: #.55 No irq handler for vector"
>> messages on AMD ryzen based laptops, see e.g.:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
>>
>> Which contains this dmesg snippet:
>>
>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs ...
>> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration:
>> Feb 07 20:14:29 localhost.localdomain kernel:  node  #0, CPUs:  #1
>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for vector
>> Feb 07 20:14:29 localhost.localdomain kernel:  #2
>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for vector
>> Feb 07 20:14:29 localhost.localdomain kernel:  #3
>> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for vector
>> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs
>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1
>> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors activated (15968.49 BogoMIPS)
>>
>> It seems that we get an IRQ for each CPU as we bring it online,
>> which feels to me like it is some sorta false-positive.
> 
> Sigh, that looks like BIOS value add again.
> 
> It's not a false positive. Something _IS_ sending a vector 55 to these CPUs
> for whatever reason.
> 

I remember seeing something like this in the past and it turned out to be
a BIOS issue.  BIOS was enabling the APs to interact with the legacy 8259
interrupt controller when only the BSP should. During POST the APs were
exposed to ExtINT/INTR events as a result of the mis-configuration
(probably due to a UEFI timer-tick using the 8259) and this left a pending
ExtINT/INTR interrupt latched on the APs.

When the APs were started by the OS, the latched ExtINT/INTR interrupt was
processed shortly after the OS enabled interrupts. The AP then queried the
8259 to identify the vector number (which is the value of the 8259's ICW2
register + the IRQ level). The master 8259's ICW2 was set to 0x30 and,
since no interrupts are actually pending, the 8259 will respond with IRQ7
(spurious interrupt) yielding a vector of 0x37 or 55.

The OS was not expecting vector 55 and printed the message.

From the Intel Developer's Manual: Vol 3a, Section 10.5.1:
"Only one processor in the system should have an LVT entry configured to
use the ExtINT delivery mode."

Not saying this is the problem, but very well could be.
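The SDM rule quoted above can be checked mechanically against the raw LVT LINT0 value of each CPU. A minimal sketch, assuming the SDM register layout (delivery mode in bits 10:8, with 111b meaning ExtINT, and the mask bit at bit 16); how the per-CPU values are collected is left out, and the helper names are hypothetical. It counts only unmasked entries, on the reasoning that a masked entry cannot deliver the interrupt.

```c
#include <stddef.h>
#include <stdint.h>

#define LVT_DELIVERY_MODE_MASK 0x00000700u /* bits 10:8 */
#define LVT_DM_EXTINT          0x00000700u /* 111b = ExtINT */
#define LVT_MASKED             0x00010000u /* bit 16 */

/* True if this LVT entry would deliver ExtINT (8259) interrupts. */
static int lvt_is_unmasked_extint(uint32_t lvt)
{
    return (lvt & LVT_DELIVERY_MODE_MASK) == LVT_DM_EXTINT &&
           (lvt & LVT_MASKED) == 0;
}

/* Count CPUs whose LINT0 entry is an unmasked ExtINT; per the SDM this
 * should be at most one processor in the system. */
size_t count_extint_cpus(const uint32_t *lint0, size_t ncpus)
{
    size_t n = 0;

    for (size_t i = 0; i < ncpus; i++)
        n += (size_t)lvt_is_unmasked_extint(lint0[i]);
    return n;
}
```

A result greater than one would match the BIOS mis-configuration described above.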

Thanks,
Tom

>> I temporarily have access to a loaner laptop for a couple of weeks which shows
>> the same errors and I would like to fix this, but I don't really know how to
>> fix this.
> 
> Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and dig in the files there
> whether vector 55 is used on CPU0 and which device is associated to that.
> 
> I bet it's a legacy IRQ and as that space starts at 48 (IRQ0) this should be
> IRQ9 which is usually - DRUMROLL - the ACPI interrupt.
> 
> The kernel clearly sets that up to be delivered to CPU 0 only, but I've
> seen before that the BIOS value add thinks this setup is not
> relevant.
> 
> /me goes off and sings LALALA
> 
>> Note if you want I can set up root ssh-access to the laptop.
> 
> As a last resort. root ssh - SHUDDER - Ooops now I spilled my preferred
> password for that :)
> 
> Thanks,
> 
>   tglx
> 


Re: False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-02-19 Thread Thomas Gleixner
Hans,

On Tue, 19 Feb 2019, Hans de Goede wrote:

Cc+: ACPI/AMD folks

> Various people are reporting false positive "do_IRQ: #.55 No irq handler for vector"
> messages on AMD ryzen based laptops, see e.g.:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1551605
> 
> Which contains this dmesg snippet:
> 
> Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs ...
> Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration:
> Feb 07 20:14:29 localhost.localdomain kernel:  node  #0, CPUs:  #1
> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for vector
> Feb 07 20:14:29 localhost.localdomain kernel:  #2
> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for vector
> Feb 07 20:14:29 localhost.localdomain kernel:  #3
> Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for vector
> Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs
> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1
> Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors activated (15968.49 BogoMIPS)
> 
> It seems that we get an IRQ for each CPU as we bring it online,
> which feels to me like it is some sorta false-positive.

Sigh, that looks like BIOS value add again.

It's not a false positive. Something _IS_ sending a vector 55 to these CPUs
for whatever reason.

> I temporarily have access to a loaner laptop for a couple of weeks which shows
> the same errors and I would like to fix this, but I don't really know how to
> fix this.

Can you please enable CONFIG_GENERIC_IRQ_DEBUGFS and dig in the files there
whether vector 55 is used on CPU0 and which device is associated to that.

I bet it's a legacy IRQ and as that space starts at 48 (IRQ0) this should be
IRQ9 which is usually - DRUMROLL - the ACPI interrupt.

The kernel clearly sets that up to be delivered to CPU 0 only, but I've
seen before that the BIOS value add thinks this setup is not
relevant.

/me goes off and sings LALALA

> Note if you want I can set up root ssh-access to the laptop.

As a last resort. root ssh - SHUDDER - Ooops now I spilled my preferred
password for that :)

Thanks,

tglx


False positive "do_IRQ: #.55 No irq handler for vector" messages on AMD ryzen based laptops

2019-02-19 Thread Hans de Goede

Hi Thomas,

Various people are reporting false positive "do_IRQ: #.55 No irq handler for vector"
messages on AMD ryzen based laptops, see e.g.:

https://bugzilla.redhat.com/show_bug.cgi?id=1551605

Which contains this dmesg snippet:

Feb 07 20:14:29 localhost.localdomain kernel: smp: Bringing up secondary CPUs ...
Feb 07 20:14:29 localhost.localdomain kernel: x86: Booting SMP configuration:
Feb 07 20:14:29 localhost.localdomain kernel:  node  #0, CPUs:  #1
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 1.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel:  #2
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 2.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel:  #3
Feb 07 20:14:29 localhost.localdomain kernel: do_IRQ: 3.55 No irq handler for vector
Feb 07 20:14:29 localhost.localdomain kernel: smp: Brought up 1 node, 4 CPUs
Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Max logical packages: 1
Feb 07 20:14:29 localhost.localdomain kernel: smpboot: Total of 4 processors activated (15968.49 BogoMIPS)

It seems that we get an IRQ for each CPU as we bring it online,
which feels to me like it is some sorta false-positive.
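The messages follow the pattern `do_IRQ: <cpu>.<vector> No irq handler for vector`, so pulling the CPU and vector out of each log line takes one `sscanf()` call. A minimal sketch; `parse_do_irq()` is a hypothetical helper that locates the `do_IRQ:` marker itself, so any timestamp or host prefix is ignored.

```c
#include <stdio.h>
#include <string.h>

/* Extract CPU and vector from a line containing
 * "do_IRQ: <cpu>.<vector> No irq handler for vector".
 * Returns 1 on success, 0 if the line does not match. */
int parse_do_irq(const char *line, int *cpu, int *vector)
{
    const char *p = strstr(line, "do_IRQ:");
    int consumed = 0;

    if (p == NULL)
        return 0;
    /* %n is only stored if the whole literal tail matched. */
    if (sscanf(p, "do_IRQ: %d.%d No irq handler for vector%n",
               cpu, vector, &consumed) != 2 || consumed == 0)
        return 0;
    return 1;
}
```

On the log above this yields CPUs 1, 2 and 3, each with vector 55, one message per AP as it comes online.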

I temporarily have access to a loaner laptop for a couple of weeks which shows
the same errors and I would like to fix this, but I don't really know how to
fix this.

Note if you want I can set up root ssh-access to the laptop.

Regards,

Hans