Too late at night to be doing this stuff. Clicked send instead of saving a draft. Sorry, please ignore.

On 10/10/2018 23:30, Chris Clayton wrote:
> OK, right kernel/module used this time. Please see findings below.
>
> On 10/10/2018 01:24, Maciej S. Szmigiero wrote:
>> On 09.10.2018 22:36, Heiner Kallweit wrote:
>>> On 09.10.2018 16:40, Chris Clayton wrote:
>>>> Thanks to Maciej and Heiner for their replies.
>>>>
>>>> On 09/10/2018 13:32, Maciej S. Szmigiero wrote:
>>>>> On 07.10.2018 21:36, Chris Clayton wrote:
>>>>>> Hi again,
>>>>>>
>>>>>> I didn't think there was anything in 4.19-rc7 to fix this regression,
>>>>>> but tried it anyway. I can confirm that the
>>>>>> regression is still present and my network still fails when, after a
>>>>>> resume from suspend (to ram or disk), I open my
>>>>>> browser or my mail client. In both those cases the failure is almost
>>>>>> immediate - e.g. my home page doesn't get displayed
>>>>>> in the browser. Pinging one of my ISPs name servers doesn't fail quite
>>>>>> so quickly but the reported time increases from
>>>>>> 14-15ms to more than 1000ms.
>>>>>
>>>>> You can try comparing chip registers (ethtool -d eth0) in the working
>>>>> state (before a suspend) and in the broken state (after a resume).
>>>>> Maybe there will be some obvious in the difference.
>>>>>
>>>>> The same goes for the PCI configuration (lspci -d :8168 -vv).
>>>>>
>>>> Maciej suggested comparing the output from lspci -vv for the ethernet
>>>> device. They are identical.
>>>>
>>>> Both Maciej and Heiner suggested comparing the output from "ethtool -d"
>>>> pre and post suspend. Again, they are identical.
>>>> Heiner specifically suggested looking at the RxConfig. The value of that
>>>> is 0x0002870e both pre and post suspend.
>>>>
>>> Hmm, this is very weird, especially taking into account that in your
>>> original
>>> report you state that removing the call to rtl_init_rxcfg() from
>>> rtl_hw_start()
>>> fixes the issue. rtl_init_rxcfg() deals with the RxConfig register only and
>>> register values seem to be the same before and after resume. So how can the
>>> chip behave differently?
>>> So far my best guess is that some chip quirk causes it to accept writes to
>>> register RxConfig, but to misinterpret or ignore the written value.
>>> So far your report is the only one (affecting RTL8411), but we don't know
>>> whether other chip versions are affected too.
>>
>> Also, it is interesting that even if one removes a call to
>> rtl_init_rxcfg() from rtl_hw_start() the RxConfig register will still get
>> written to moments later by rtl_set_rx_mode().
>>
>> The only chip accesses in the meantime seems to be a write to TxConfig by
>> rtl_set_tx_config_registers() and then a read of RxConfig plus two writes
>> to MAR0 earlier in rtl_set_rx_mode().
>>
>> My proposals are:
>> 1) Try swapping "rtl_init_rxcfg(tp);" and "rtl_set_tx_config_registers(tp);"
>> in rtl_hw_start().
>> Maybe the chip does not like sometimes that RxConfig is written before
>> TxConfig.
>>
>
> This change made no difference. Networking still dies if I open a browser or
> leave ping running long enough.
>
>> 2) Check the original value of RxConfig (after a resume) before
>> rtl_init_rxcfg() overwrites it (compile tested only):
>> --- r8169.c.ori
>> +++ r8169.c
>> @@ -5155,6 +5155,9 @@
>>  	/* Initially a 10 us delay. Turned it into a PCI commit. - FR */
>>  	RTL_R8(tp, IntrMask);
>>  	RTL_W8(tp, ChipCmd, CmdTxEnb | CmdRxEnb);
>> +
>> +	pr_notice("RxConfig before init was %.8x\n",
>> +		  (unsigned int)RTL_R32(tp, RxConfig));
>>  	rtl_init_rxcfg(tp);
>>  	rtl_set_tx_config_registers(tp);
>>
>>
>> This should be the value that you got when you removed the call to
>> rtl_init_rxcfg() for testing.
>> Now, knowing the "right" value you can experiment with what rtl_init_rxcfg()
>> writes (under the "default:" label for your NIC model).
>
> This might be more interesting. Through combination of viewing the output
> from pr_notice() and the output from "ethtool
> -d", I can see RxConfig with the following values
>
> During boot:    0x00028700
> Before suspend: 0x0002870e
> During resume:  0x00024000
> Post resume:    0x0002870e
>
> I then removed the call to rtl_init_rxcfg() from rtl_hw_start() and rebuilt,
> installed and rebooted. Now I see the
> following values:
>
> During boot:    0x00028700
> Before suspend: 0x0002870e
> During resume:  0x00024000
> Post resume:    0x0002870e
>
>>
>> Hope this helps,
>> Maciej
>>
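
For anyone comparing the RxConfig values quoted above by hand, an XOR of two snapshots shows which bits changed. The following is only a minimal userspace sketch, not driver code: reg_diff() and print_changed_bits() are hypothetical helper names, and it deliberately makes no assumption about what the individual bits mean on this chip.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bits that differ between two register snapshots. */
static uint32_t reg_diff(uint32_t before, uint32_t after)
{
	return before ^ after;
}

/* Hypothetical helper: list each changed bit and whether it ended up set. */
static void print_changed_bits(const char *label, uint32_t before,
			       uint32_t after)
{
	uint32_t diff = reg_diff(before, after);

	printf("%s: %.8x -> %.8x (diff %.8x)\n", label,
	       (unsigned int)before, (unsigned int)after, (unsigned int)diff);

	for (int bit = 31; bit >= 0; bit--) {
		uint32_t mask = UINT32_C(1) << bit;

		if (diff & mask)
			printf("  bit %2d %s\n", bit,
			       (after & mask) ? "set" : "cleared");
	}
}
```

Calling print_changed_bits("suspend -> resume", 0x0002870e, 0x00024000) reports a diff of 0000c70e: bit 14 set, and bits 15, 10, 9, 8, 3, 2 and 1 cleared. The boot and pre-suspend values (0x00028700 and 0x0002870e) differ only in bits 3, 2 and 1.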