On 28.12.2018 07:34, Heiner Kallweit wrote:
> On 28.12.2018 02:31, Frederic Weisbecker wrote:
>> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>>
> [...]
>>
>> Interesting, the softirq is raised from hardirq but it's not handled in the end of
>> the IRQ. Are you running threaded IRQS by any chance? If so I would expect ksoftirqd
>> to handle the pending work before we go idle. However I can imagine a small window
>> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
>> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
>> (CPUHP_TEARDOWN_CPU).
>>
> I have a network driver (r8169) using NAPI which runs in softirq context AFAIK.
> For testing purposes I sometimes trigger system suspend via network, so there is
> network adapter activity when system suspends. Apart from that nothing really
> exciting:
>             CPU0       CPU1       CPU2       CPU3
>    0:         43          0          0          0   IO-APIC    2-edge      timer
>    1:          4          0          0          0   IO-APIC    1-edge      i8042
>    8:          0          1          0          0   IO-APIC    8-fasteoi   rtc0
>    9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
>   12:          0          0          0          5   IO-APIC   12-edge      i8042
>  120:          0          0          0          0   PCI-MSI 311296-edge      PCIe PME
>  121:          0          0          0          0   PCI-MSI 315392-edge      PCIe PME
>  122:          0          0          0          0   PCI-MSI 327680-edge      PCIe PME
>  123:          0          0       3328          0   PCI-MSI 294912-edge      ahci[0000:00:12.0]
>  124:          0        133          0          0   PCI-MSI 344064-edge      xhci_hcd
>  125:          0          0         32          0   PCI-MSI 245760-edge      mei_me
>  127:        381          0          0          0   PCI-MSI 1572864-edge      enp3s0
>  128:          0          0          0        236   PCI-MSI 32768-edge      i915
>  129:          0        374          0          0   PCI-MSI 229376-edge      snd_hda_intel:card0
> 
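Regarding the threaded-IRQ question: one quick way to rule out forced IRQ
threading is to look for the "threadirqs" boot parameter on the kernel command
line. The following is only a minimal sketch. The function name has_threadirqs
and its optional file argument are made up for illustration, and it does not
detect drivers that request a threaded handler on their own.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the kernel command line contains the
# "threadirqs" boot parameter (forced IRQ threading).
# The optional argument (an assumption, for dry runs) overrides the file
# that is inspected; the default is /proc/cmdline.
has_threadirqs()
{
      local cmdline_file=${1:-/proc/cmdline}

      # -w matches "threadirqs" only as a whole word, -q suppresses output
      grep -qw threadirqs "$cmdline_file"
}
```

Usage would be something like "has_threadirqs && echo forced threading is on".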
>> I don't know if we can afford to ignore a softirq even at this late stage. We should
>> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
>>
> I tested your patch and at least in the first minutes of testing couldn't reproduce
> the issue any longer. I tested manual system suspend and the following script you
> sent when we started to analyze the issue.
> 

After some more testing time the issue still hasn't occurred again, so it seems
both your analysis and your approach to fixing it were right. Thanks!
I'll let you know if the issue pops up again under special circumstances.


> Heiner
> 
> --------------------------------------------------------------------------
> 
> #!/bin/bash
> 
> do_hotplug()
> {
>       for i in $(seq 1 $2)
>       do
>               echo $1 > /sys/devices/system/cpu/cpu$i/online
>       done
> }
> 
> LAST_CPU=$(($(nproc)-1))
> 
> while true
> do
>       do_hotplug 0 $LAST_CPU
>       do_hotplug 1 $LAST_CPU
> done
> 
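For longer unattended runs, a bounded variant of the script above may be more
convenient than the endless loop. This is only a sketch under a few assumptions
of mine: the names CPU_SYSFS, set_cpus_online and hotplug_cycles are invented
here, CPUs without a writable "online" file are skipped, and the sysfs root can
be overridden so the loop can be dry-run against a scratch directory.

```shell
#!/bin/bash

# Assumption: root of the CPU sysfs tree, overridable for dry runs.
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

# Write state $1 (0 = offline, 1 = online) to cpu1..cpu$2.
# cpu0 is left alone, as in the original script.
set_cpus_online()
{
      local state=$1 last=$2 i f

      for i in $(seq 1 "$last")
      do
              f="$CPU_SYSFS/cpu$i/online"
              # Skip CPUs that cannot be hotplugged (no writable file)
              [ -w "$f" ] || continue
              echo "$state" > "$f"
      done
}

# Run $1 offline/online cycles over cpu1..cpu$2, then return.
hotplug_cycles()
{
      local cycles=$1 last=$2 n

      for n in $(seq 1 "$cycles")
      do
              set_cpus_online 0 "$last"
              set_cpus_online 1 "$last"
      done
}
```

Running "hotplug_cycles 1000 $(($(nproc)-1))" as root with all CPUs online
would then do 1000 offline/online rounds and leave all CPUs online at the end.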
