Re: acpi interrupt storm on HP EliteBook 840 G1

2020-04-25 Thread Greg Steuck
I dug a bit more into this and hopefully it will be more actionable
this way and so +tech@. The original details and references are still
below.

Since last time, I have figured out that suspend/resume reliably
triggers the interrupt storm. So I recompiled 6.6-current with
ACPI_DEBUG. Now, once I wake the system up, it writes the following
non-stop into dmesg at ~1kHz (compared to 5kHz when the log is not
written):

--== Eval Method [\\_GPE._L2A, 0 args] to t ==--
= Stack \\_GPE._L2A:Method
parsename: \\_SB_.PCI0.RP06.WKEN 1

--==Finished evaluating method: \\_GPE._L2A t
= Stack \\_GPE._L2A:Method
acpi thread going to sleep...
GPE block: 28 dd 04
queue gpe: 2a
handle gpe: 2a
handling GPE 2a
EVALNODE: \\_GPE._L2A 105b7c

--== Eval Method [\\_GPE._L2A, 0 args] to t ==--
...
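
For readers decoding the log, here is a minimal sketch of how the
"GPE block: 28 dd 04" line can lead to "queue gpe: 2a". It assumes the
three values are the base GPE index, the status byte, and the enable
byte, in that order; that ordering is my reading of the output, not
something confirmed in this thread. The function name gpe_first_pending
is hypothetical.

```c
#include <stdint.h>

/*
 * Decode one "GPE block: BASE STS EN" debug line.
 * ASSUMPTION: the fields are base GPE index, status byte, enable byte.
 * Returns the lowest GPE number that is both pending and enabled,
 * or -1 if no such bit is set.
 */
int
gpe_first_pending(uint8_t base, uint8_t sts, uint8_t en)
{
	uint8_t active = sts & en;	/* pending AND enabled */
	int bit;

	for (bit = 0; bit < 8; bit++) {
		if (active & (1u << bit))
			return (base + bit);
	}
	return (-1);
}
```

Under that assumption, 0xdd & 0x04 leaves only bit 2 set, and
0x28 + 2 = 0x2a, which matches the GPE number queued in the log.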

Looking into the iasl-produced DSDT.dsl (full output below), this
seems to be relevant:

Scope (\_GPE)
{
Method (_L2A, 0, NotSerialized)  // _Lxx: Level-Triggered GPE, xx=0x00-0xFF
{
If (\_SB.PCI0.RP06.WKEN)
{
\_SB.GOWW (0x2A, 0x01)
Notify (\_SB.PCI0.RP06, 0x02) // Device Wake
}
}
}


Cursory searches indicate that people typically handle this kind of
problem by shutting off the interrupt, but I don't think a knob
corresponding to Linux's /sys/firmware/acpi/interrupts/gpe2A exists in
OpenBSD.

Any workarounds? Patches I can test?

Thanks
Greg

P.S. The relevant device to save indirection:

Device (RP06)
{
Method (_ADR, 0, NotSerialized)  // _ADR: Address
{
Return (RPA5) /* \_SB_.PCI0.RPA5 */
}

Method (_INI, 0, NotSerialized)  // _INI: Initialize
{
LTRE = LTR6 /* \LTR6 */
LMSL = PML6 /* \_SB_.PCI0.PML6 */
LNSL = PNL6 /* \_SB_.PCI0.PNL6 */
OBFF = OBF6 /* \OBF6 */
}

OperationRegion (PXCS, PCI_Config, 0x00, 0x0380)
Field (PXCS, AnyAcc, NoLock, WriteAsZeros)
{
VDID,   32,
Offset (0x50),
L0SE,   1,
,   3,
LDIS,   1,
Offset (0x51),
Offset (0x52),
,   13,
LASX,   1,
Offset (0x5A),
ABPX,   1,
,   2,
PDCX,   1,
,   2,
PDSX,   1,
Offset (0x5B),
LSCX,   1,
Offset (0x60),
Offset (0x62),
PSPX,   1,
Offset (0xA4),
D3HT,   2,
Offset (0xD8),
,   30,
HPEX,   1,
PMEX,   1,
,   30,
HPSX,   1,
PMSX,   1,
Offset (0xE2),
,   2,
L23E,   1,
L23R,   1,
Offset (0x324),
,   3,
LEDM,   1
}

Field (PXCS, AnyAcc, NoLock, Preserve)
{
Offset (0x42),
Offset (0x43),
SI, 1,
Offset (0x50),
,   4,
LD, 1,
Offset (0x58),
SCTL,   16,
SSTS,   16,
Offset (0xD8),
,   30,
HPCE,   1,
PMCE,   1
}

Method (HPLG, 0, Serialized)
{
If (_STA ())
{
If (HPSX)
{
Sleep (0x64)
If (PDCX)
{
PDCX = 0x01
HPSX = 0x01
Notify (^, 0x00) // Bus Check
}
Else
{
HPSX = 0x01
}
}
}
}

Method (PME, 0, Serialized)
{
If (_STA ())
{
If (PSPX)
{
While (PSPX)
{
PSPX = 0x01
}

PMSX = 0x01
Notify (^, 0x02) // Device Wake
}

Re: acpi interrupt storm on HP EliteBook 840 G1

2020-04-26 Thread Greg Steuck
I searched a bit more and found a few cases where other people reported
this behavior:

https://marc.info/?l=openbsd-misc&m=154348280502809&w=2
https://marc.info/?l=openbsd-bugs&m=152022260714390&w=2

I applied a variation of Thomas Merkel's patch (at the bottom), which
took care of the symptoms while also reporting the masked event.

I am facing a choice. This appears to be a rarely reported problem.  I
don't see a good reason to build any non-trivial mitigation (some
variation of event rate limiting comes to mind). I can just carry a
minimized patch forward for as long as I use the defective machine and
rebuild GENERIC.MP as part of my weekly sysupgrade ritual.

Anybody see a reason to prefer complicating the kernel? Will it stand a
chance of getting committed? Any design ideas for such a workaround?
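
To make the rate-limiting idea concrete, here is a minimal userland
sketch of one possible shape: count events per GPE in a fixed time
window and treat the GPE as masked once the count crosses a threshold.
Everything in it is hypothetical (the struct, the function name, and
both tuning constants); it is not taken from the OpenBSD tree and is
not wired into acpi_gpe().

```c
#include <stdint.h>

/* Hypothetical tuning values, not taken from the OpenBSD tree. */
#define GPE_STORM_WINDOW_MS	1000	/* length of one counting window */
#define GPE_STORM_THRESHOLD	1000	/* events per window treated as a storm */

struct gpe_ratelimit {
	uint64_t window_start_ms;	/* when the current window began */
	uint32_t count;			/* events seen in the current window */
	int	 masked;		/* nonzero once a storm was detected */
};

/*
 * Record one GPE event at time now_ms.
 * Returns 1 if the event should be ignored (GPE considered masked),
 * 0 if it should be handled normally.
 */
int
gpe_ratelimit_event(struct gpe_ratelimit *rl, uint64_t now_ms)
{
	if (rl->masked)
		return (1);

	/* Start a fresh window once the old one has expired. */
	if (now_ms - rl->window_start_ms >= GPE_STORM_WINDOW_MS) {
		rl->window_start_ms = now_ms;
		rl->count = 0;
	}

	if (++rl->count > GPE_STORM_THRESHOLD) {
		rl->masked = 1;
		return (1);
	}
	return (0);
}
```

With these values a 5kHz storm trips the limit after about 200ms, while
an event every 100ms never does. The sketch never unmasks; whether and
when to re-enable the GPE is the part that would need real design.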

Thanks
Greg

diff --git a/sys/dev/acpi/acpi.c b/sys/dev/acpi/acpi.c
index c3871e007a3..9c069fd5314 100644
--- a/sys/dev/acpi/acpi.c
+++ b/sys/dev/acpi/acpi.c
@@ -2296,6 +2296,15 @@ acpi_gpe(struct acpi_softc *sc, int gpe, void *arg)
 	struct aml_node *node = arg;
 	uint8_t mask, en;

+	if (gpe == 0x2a && (sc->gpe_table[gpe].flags & GPE_LEVEL)) {
+		static unsigned short i;
+		if (i == 0) {
+			i++;
+			printf("acpi_gpe %d %s IGNORING\n", gpe, node->name);
+		}
+		return (0);
+	}
+	printf("acpi_gpe %d %s\n", gpe, node->name);
 	dnprintf(10, "handling GPE %.2x\n", gpe);
 	aml_evalnode(sc, node, 0, NULL, NULL);