On Sat, 21 Feb 2015 11:31:04 +0100, Florian Westphal wrote:
> Tomas Szepe <sz...@pinerecords.com> wrote:
>> > I tried to reproduce this without success so far on my RTL8168d/8111d
>> > device.
>> > I've been running 40 parallel netperf TCP_STREAM tests (1gbit) for the
>> > last 5 hours and so far I saw no watchdog tx timeouts.
>> >
>> > I'll keep this running for a day or so to see if it just takes more time
>> > to trigger.
>>
>> So, how's this coming along? Don't you think the patch should be reverted
>> until the problem is diagnosed/understood/fixed?
>
> Sorry.
>
> David, please consider reverting
>
> 1e918876853aa85435e0f17fd8b4a92dcfff53d6
> (r8169: add support for Byte Queue Limits)
>
> and
>
> 0bec3b700d106a8b0a34227b2976d1a582f1aab7
> (r8169: add support for xmit_more)
>
> I cannot reproduce any hangs (tried for 2days with 40 parallel
> netperfs using both 100mbit and 1gbit receiver).
>
> And I don't see anything wrong with the change either.
> Seems like some revisions of the HW are just dodgy?
>
> I hate giving up, but I have no means to diagnose this any further.
> Even reporter says it doesn't affect all of his r8169 nics.
>
> So I think the change is correct per se, but might be revealing some
> HW/firmware bug.

Florian, have you experimented with offload settings? The only time
r8169 seems to hiccup is with sg/tso enabled. I've reverted my NIC
settings back to mostly defaults (which do not enable sg/tso) and have
had no hangs, spurious timeouts or other problems ever since, despite
BQL, xmit_more and 24/7 client/server use.

Tomas never said whether his setup enabled any offload settings; it's
not inconceivable that a distribution might try to automatically
"optimize" things.

-h
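
For anyone who wants to check whether sg/tso are enabled on their own
box, here is a minimal sketch in Python. It parses text in the format
that `ethtool -k <iface>` prints and reports which of the two offloads
are on. The function names, the interface name and the SAMPLE text are
made up for illustration (SAMPLE is hand-written, not a capture from
any of the machines discussed in this thread).

```python
import subprocess


def parse_offloads(ethtool_output):
    """Map feature name -> True/False from `ethtool -k` style text."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skips the "Features for ...:" header and blank lines
        state = value.split()[0]  # drops trailing tags such as "[fixed]"
        if state in ("on", "off"):
            features[name] = (state == "on")
    return features


def enabled_suspects(features,
                     names=("scatter-gather", "tcp-segmentation-offload")):
    """Return the offloads from `names` that are currently switched on."""
    return [n for n in names if features.get(n)]


def read_offloads(ifname):
    """Run `ethtool -k` on a real interface (needs ethtool installed)."""
    out = subprocess.run(["ethtool", "-k", ifname], check=True,
                         capture_output=True, text=True).stdout
    return parse_offloads(out)


# Hand-written illustrative input, NOT real output from an r8169 NIC.
SAMPLE = """\
Features for eth0:
rx-checksumming: on
scatter-gather: on
\ttx-scatter-gather: on
\ttx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
generic-receive-offload: on
"""
```

Calling enabled_suspects(read_offloads("eth0")) on a live system would
list whichever of the two are on; `ethtool -K eth0 sg off tso off`
should then turn them off for a comparison run, assuming the driver
allows changing them.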