On 28.12.2018 07:39, Heiner Kallweit wrote:
> On 28.12.2018 07:34, Heiner Kallweit wrote:
>> On 28.12.2018 02:31, Frederic Weisbecker wrote:
>>> On Fri, Dec 28, 2018 at 12:11:12AM +0100, Heiner Kallweit wrote:
>>>>
>> [...]
>>>
>>> Interesting, the softirq is raised from hardirq but it's not handled in the end of
>>> the IRQ. Are you running threaded IRQS by any chance? If so I would expect ksoftirqd
>>> to handle the pending work before we go idle. However I can imagine a small window
>>> where such an expectation may not be met: if the softirq is raised after the ksoftirqd
>>> thread is parked (CPUHP_AP_SMPBOOT_THREADS), which is right before we disable the CPU
>>> (CPUHP_TEARDOWN_CPU).
>>>
>> I have a network driver (r8169) using NAPI which runs in softirq context AFAIK.
>> For testing purposes I sometimes trigger system suspend via network, so there is
>> network adapter activity when system suspends. Apart from that nothing really
>> exciting:
>>             CPU0       CPU1       CPU2       CPU3
>>    0:         43          0          0          0   IO-APIC    2-edge      timer
>>    1:          4          0          0          0   IO-APIC    1-edge      i8042
>>    8:          0          1          0          0   IO-APIC    8-fasteoi   rtc0
>>    9:          0          0          0          0   IO-APIC    9-fasteoi   acpi
>>   12:          0          0          0          5   IO-APIC   12-edge      i8042
>>  120:          0          0          0          0   PCI-MSI 311296-edge      PCIe PME
>>  121:          0          0          0          0   PCI-MSI 315392-edge      PCIe PME
>>  122:          0          0          0          0   PCI-MSI 327680-edge      PCIe PME
>>  123:          0          0       3328          0   PCI-MSI 294912-edge      ahci[0000:00:12.0]
>>  124:          0        133          0          0   PCI-MSI 344064-edge      xhci_hcd
>>  125:          0          0         32          0   PCI-MSI 245760-edge      mei_me
>>  127:        381          0          0          0   PCI-MSI 1572864-edge      enp3s0
>>  128:          0          0          0        236   PCI-MSI 32768-edge      i915
>>  129:          0        374          0          0   PCI-MSI 229376-edge      snd_hda_intel:card0
>>
>>> I don't know if we can afford to ignore a softirq even at this late stage. We should
>>> probably avoid leaking any. So here is a possible fix, if you don't mind trying:
>>>
>> I tested your patch and at least in the first minutes of testing couldn't reproduce
>> the issue any longer. I tested manual system suspend and the following script you
>> sent when we started to analyze the issue.
>>
> 
> Also after some more time the issue didn't occur again. So it seems your analysis
> was right and also the approach to fix it. Thanks!
> Will let you know in case the issue should pop up again under special
> circumstances.
> 
Frederic, so far this fix hasn't appeared in linux-next. Are you going to submit it?
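
For anyone else trying to reproduce this: the threaded-IRQ question quoted above can be checked from userspace. The following is only a rough sketch, not part of the fix. It assumes forced IRQ threading is enabled with the "threadirqs" kernel parameter and that IRQ handler threads show up as tasks named "irq/<nr>-<name>". The helper names has_threadirqs and list_irq_threads are made up for illustration. Drivers that request a threaded handler themselves also get such a thread, so a match alone doesn't prove that threadirqs is set.

```shell
#!/bin/bash

# Return 0 if the given kernel command line contains the "threadirqs" option.
# (Hypothetical helper; takes the command line as a string so it can be tested.)
has_threadirqs()
{
	case " $1 " in
	*" threadirqs "*) return 0 ;;
	esac
	return 1
}

# Print the IRQ handler threads ("irq/<nr>-<name>") found in a
# newline-separated list of task names. Prints nothing if there are none.
list_irq_threads()
{
	printf '%s\n' "$1" | grep -E '^irq/[0-9]+-' || true
}

# Possible usage on a live system:
#   has_threadirqs "$(cat /proc/cmdline)" && echo "threadirqs is set"
#   list_irq_threads "$(ps -e -o comm=)"
```

If list_irq_threads prints nothing for the network adapter's IRQ, its handler presumably runs in hardirq context.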

> 
>> Heiner
>>
>> --------------------------------------------------------------------------
>>
>> #!/bin/bash
>>
>> do_hotplug()
>> {
>>      for i in $(seq 1 $2)
>>      do
>>              echo $1 > /sys/devices/system/cpu/cpu$i/online
>>      done
>> }
>>
>> LAST_CPU=$(($(nproc)-1))
>>
>> while true
>> do
>>      do_hotplug 0 $LAST_CPU
>>      do_hotplug 1 $LAST_CPU
>> done
>>
> 
