On 28.12.2018 07:39, Heiner Kallweit wrote:
> On 28.12.2018 07:34, Heiner Kallweit wrote:
>> On 28.12.2018 02:31, Frederic Weisbecker wrote:
>>> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>>>
>> [...]
>>>
>>> Interesting, the softirq is raised from hardirq but it's not handled in the end of
>>> the IRQ. Are you running threaded IRQS by any chance? If so I would expect ksoftirqd
>>> to handle the pending work before we go idle. However I can imagine a small window
>>> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
>>> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
>>> (CPUHP_TEARDOWN_CPU).
>>>
>> I have a network driver (r8169) using NAPI which runs in softirq context AFAIK.
>> For testing purposes I sometimes trigger system suspend via network, so there is
>> network adapter activity when system suspends. Apart from that nothing really exciting:
>>            CPU0       CPU1       CPU2       CPU3
>>    0:         43          0          0          0   IO-APIC     2-edge          timer
>>    1:          4          0          0          0   IO-APIC     1-edge          i8042
>>    8:          0          1          0          0   IO-APIC     8-fasteoi       rtc0
>>    9:          0          0          0          0   IO-APIC     9-fasteoi       acpi
>>   12:          0          0          0          5   IO-APIC     12-edge         i8042
>>  120:          0          0          0          0   PCI-MSI     311296-edge     PCIe PME
>>  121:          0          0          0          0   PCI-MSI     315392-edge     PCIe PME
>>  122:          0          0          0          0   PCI-MSI     327680-edge     PCIe PME
>>  123:          0          0       3328          0   PCI-MSI     294912-edge     ahci[0000:00:12.0]
>>  124:          0        133          0          0   PCI-MSI     344064-edge     xhci_hcd
>>  125:          0          0         32          0   PCI-MSI     245760-edge     mei_me
>>  127:        381          0          0          0   PCI-MSI     1572864-edge    enp3s0
>>  128:          0          0          0        236   PCI-MSI     32768-edge      i915
>>  129:          0        374          0          0   PCI-MSI     229376-edge     snd_hda_intel:card0
>>
>>> I don't know if we can afford to ignore a softirq even at this late stage. We should
>>> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
>>>
>> I tested your patch and at least in the first minutes of testing couldn't reproduce
>> the issue any longer. I tested manual system suspend and the following script you
>> sent when we started to analyze the issue.
>>
>
> Also after some more time the issue didn't occur again. So it seems your analysis
> was right and also the approach to fix it. Thanks!
> Will let you know in case the issue should pop up again under special circumstances.
>
Frederic, so far this fix didn't appear in linux-next, are you going to submit it?

>
>> Heiner
>>
>> --------------------------------------------------------------------------
>>
>> #!/bin/bash
>>
>> do_hotplug()
>> {
>>         for i in $(seq 1 $2)
>>         do
>>                 echo $1 > /sys/devices/system/cpu/cpu$i/online
>>         done
>> }
>>
>> LAST_CPU=$(($(nproc)-1))
>>
>> while true
>> do
>>         do_hotplug 0 $LAST_CPU
>>         do_hotplug 1 $LAST_CPU
>> done
>>
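
In case it helps anyone re-running the test: the quoted script loops forever and writes to every cpuN/online file unconditionally. A bounded variant could look like the sketch below. It is only an illustration and not part of the original script. CPU_SYSFS_ROOT and hotplug_rounds are names made up here, and skipping CPUs without a writable control file is an assumption about how the test should behave on systems where a CPU cannot be offlined.

```shell
#!/bin/bash
# Sketch only: bounded variant of the quoted hotplug loop.
# CPU_SYSFS_ROOT and hotplug_rounds are hypothetical additions.
CPU_SYSFS_ROOT="${CPU_SYSFS_ROOT:-/sys/devices/system/cpu}"

# Write state $1 (0 = offline, 1 = online) to cpu1..cpu$2.
do_hotplug()
{
        local state=$1 last=$2 i ctl
        for i in $(seq 1 "$last")
        do
                ctl="$CPU_SYSFS_ROOT/cpu$i/online"
                # Skip CPUs that have no writable control file.
                [ -w "$ctl" ] || continue
                echo "$state" > "$ctl"
        done
}

# Run $1 offline/online rounds over cpu1..cpu$2, then return.
hotplug_rounds()
{
        local rounds=$1 last=$2 r
        for r in $(seq 1 "$rounds")
        do
                do_hotplug 0 "$last"
                do_hotplug 1 "$last"
        done
}
```

Run as root, something like `hotplug_rounds 100 $(($(nproc)-1))` would do 100 offline/online cycles and leave the CPUs online at the end. As in the original, cpu0 is never touched.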