Oliver <bir...@sernet.de> writes:

> Hello,
>
> after upgrading to debian buster with kernel 4.19 we also had problems.
>
> By adjusting net.ipv6.route.max_size we have fixed the following messages:
> watchdog: BUG: soft lockup - CPU#X stuck for 22s!
> and
> ixgbe 0000:02:00.0 ens2fX: initiating reset due to tx timeout
>
> But we still had a lot of jitter on the line. Downgrading to 4.9.0 fixed the
> problem, but this is not a permanent solution.
>
> What else did we tried:
> * Increasing gc_threshX
> net.ipv6.neigh.default.gc_thresh1 = 2048
> net.ipv6.neigh.default.gc_thresh2 = 4096
> net.ipv6.neigh.default.gc_thresh3 = 8192
> => Did not help

The Linux kernel is getting rid of IPv6 route caching, like it did with
IPv4, but it will take some time to get there. It seems that this kernel
ships with a small value for net.ipv6.route.max_size (4096!), and when we
increased it (e.g. to 1048576) the problem went away for us. I'm not 100%
clear on what units this value is in. I had around 89k IPv6 routes, so
1048576 is definitely higher than that. I'm sure that setting it too high
could result in some memory issues.

Additionally, you also want to raise net.ipv6.route.gc_thresh to avoid
running the garbage collector too often. I found that the rule of thumb
here is 1/4 the size of net.ipv6.route.max_size.

I did find that in Linux kernel 5.2 there is a message output to the
kernel ring buffer when net.ipv6.route.max_size is hit, so you at least
have a *clue* what is going on. In 4.19, which is what Debian Buster
ships, you don't get that clue.

-- 
micah
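
P.S. In case it helps, here is a rough sketch of the arithmetic behind the two route settings mentioned above. It only applies the 1/4 rule of thumb and prints lines in sysctl.conf syntax; it does not touch the running system. The function names (route_sysctls, render) are made up for illustration, and 1048576 is just the example value from this message, not a recommendation.

```python
def route_sysctls(max_size):
    """Return the two IPv6 route sysctls for a chosen max_size.

    Assumption: gc_thresh is set to 1/4 of max_size, which is only the
    rule of thumb from this thread, not a kernel requirement.
    """
    if max_size <= 0:
        raise ValueError("max_size must be a positive integer")
    return {
        "net.ipv6.route.max_size": max_size,
        "net.ipv6.route.gc_thresh": max_size // 4,
    }


def render(settings):
    """Format the settings as 'key = value' lines (sysctl.conf syntax)."""
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


if __name__ == "__main__":
    # Example value taken from the message above.
    print(render(route_sysctls(1048576)))
```

The printed lines could be pasted into a file under /etc/sysctl.d/ and loaded with "sysctl --system", or applied one at a time with "sysctl -w". Test on one box first, since I have not measured the memory cost of large values.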