On 10/17/18 20:12, Heiner Kallweit wrote:
On 16.10.2018 23:17, Holger Hoffstätte wrote:
On 10/16/18 22:37, Heiner Kallweit wrote:
rtl_rx() and rtl_tx() are called only if the respective bits are set
in the interrupt status register. Under high load NAPI may not be
able to process all data (work_done == budget) and it will schedule
subsequent calls to the poll callback.
rtl_ack_events() however resets the bits in the interrupt status
register, therefore subsequent calls to rtl8169_poll() won't call
rtl_rx() and rtl_tx() - chip interrupts are still disabled.
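
To make the sequence above concrete, here is a minimal toy model in plain C. It is not the r8169 code: `toy_nic`, `toy_rx()`, `toy_poll()` and `toy_run_napi()` are hypothetical names invented for illustration, and the "status register" is just an integer. It only assumes what the description states: the poll callback acks (clears) the status bits, RX work is gated on those bits, and a poll that uses its full budget gets called again. The ungated variant is there purely for contrast and is not claimed to be the actual fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: the names echo the mail, none of this is driver code. */
enum { TOY_RX_OK = 1u << 0 };

struct toy_nic {
    unsigned int status;  /* stand-in for the interrupt status register */
    int rx_pending;       /* packets waiting to be processed */
    bool irq_enabled;     /* chip interrupts stay off while polling */
};

/* Process at most `budget` packets and return how many were handled. */
static int toy_rx(struct toy_nic *nic, int budget)
{
    int done = nic->rx_pending < budget ? nic->rx_pending : budget;

    nic->rx_pending -= done;
    return done;
}

/*
 * One poll pass. With gate_on_status == true, RX work happens only if
 * the RX bit was set when the status was read, as described above.
 */
static int toy_poll(struct toy_nic *nic, int budget, bool gate_on_status)
{
    unsigned int status = nic->status;
    int work_done = 0;

    assert(budget > 0);
    nic->status = 0;  /* ack: the status bits are cleared */

    if (!gate_on_status || (status & TOY_RX_OK))
        work_done = toy_rx(nic, budget);

    if (work_done < budget)
        nic->irq_enabled = true;  /* polling done, interrupts back on */

    return work_done;
}

/*
 * Start with the RX bit set and `pending` packets queued, re-poll for
 * as long as the full budget is used, and return what is left over.
 */
static int toy_run_napi(int pending, int budget, bool gate_on_status)
{
    struct toy_nic nic = {
        .status = TOY_RX_OK,
        .rx_pending = pending,
        .irq_enabled = false,
    };

    while (toy_poll(&nic, budget, gate_on_status) == budget)
        ;  /* work_done == budget: the poll callback is scheduled again */

    return nic.rx_pending;
}
```

With 100 queued packets and a budget of 64, `toy_run_napi(100, 64, true)` leaves 36 packets behind, because the second poll sees cleared status bits and skips the RX work. `toy_run_napi(100, 64, false)` drains the queue.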

Very interesting! Could this be the reason for the mysterious
hangs & resets we experienced when enabling BQL for r8169?
They happened more often with TSO/GSO enabled and several people
attempted to fix those hangs unsuccessfully; it was later reverted
and has been since then (#87cda7cb43).
If this bug has been there "forever" it might be tempting to
re-apply BQL and see what happens. Any chance you could give that
a try? I'll gladly test patches, just like I'll run this one.

After reading through the old mail threads regarding BQL on r8169
I don't think the fix here is related.
It seems that BQL on r8169 worked fine for most people, just one
had problems on one of his systems. I assume the issue was specific

I continued to use the BQL patch in my private tree after it was reverted
and also had occasional timeouts, but *only* after I started playing
with ethtool to change offload settings. Without offloads or the BQL patch
everything has been rock-solid since then.
The other weird problem was that timeouts would occur on an otherwise
*completely idle* system. Since that occasionally borked my NFS server
overnight I ultimately removed BQL as well. Rock-solid since then.

I will apply the old BQL patch and see how it behaves on my system
(with GRO and SG enabled).

I don't think it still applies cleanly, but if you cook up an updated
version I'll gladly test it.

Thanks! :)
Holger
