Hi all, I am replying to this thread as I see some resemblance between the issue I am experiencing and the quickly rising netlivelocks value described here.
On 24/06/14 3:08 PM, Chris Cappuccio wrote:
> Kapetanakis Giannis [bil...@edu.physics.uoc.gr] wrote:
>> On 23/06/14 21:33, Henning Brauer wrote:
>>> * Chris Cappuccio <ch...@nmedia.net> [2014-06-23 20:24]:
>>>> I have a Sandy Bridge Xeon box with PF NAT that handles a daily 200
>>>> to 700Mbps. It has a single myx interface using OpenBSD 5.5 (not
>>>> current). It does nothing but PF NAT and related routing. No barrage
>>>> of vlans or interfaces. No dynamic routing. Nothing else. 60,000 to
>>>> 100,000 states.
>>>>
>>>> With an MP kernel, kern.netlivelocks increases by something like
>>>> 150,000 per day!! The packet loss was notable.
>>>>
>>>> With an SP kernel, the 'netlivelock' counter barely moves. Maybe
>>>> 100 per day on average, but for the past week, maybe 5.
>>
>> sysctl -a | grep netlive
>> kern.netlivelocks=50
>>
>> # pfctl -ss | wc -l
>> 73203
>>
>> # pfctl -sr | wc -l
>> 294
>>
>> routing/firewalling/some NAT at ~ 500Mbps

I am routing between 5 and 20 megabit/sec on an OpenBSD 5.5 box following mtier stable updates. No NAT, PF is disabled, just plain routing (~ 500k IPv4 routes, 20k IPv6 routes). A DMESG is available here: http://instituut.net/~job/dmesg-dcg-2.txt . The machines are a mixture of em(4) and bnx(4) NICs in Dell R610 chassis with an mfi(4)-powered PERC 6/i controller.

> I have some ideas. I'm going to do some troubleshooting when I have a
> chance to think clearly.
>
> I think the disk subsystem could be part of the issue. I see the most
> netlivelocks on a box with a USB key, mfi is in second place.
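For anyone wanting to quantify how fast the counter climbs without setting up graphing first, a rough sketch like the following might help (hypothetical helper, assumes an OpenBSD host exposing the kern.netlivelocks sysctl; interval and output format are arbitrary choices):

```shell
#!/bin/sh
# Sample kern.netlivelocks once a minute and print the per-interval
# delta, so a rate like "150,000 per day" becomes visible quickly.
prev=$(sysctl -n kern.netlivelocks)
while sleep 60; do
    cur=$(sysctl -n kern.netlivelocks)
    echo "$(date '+%F %T') delta=$((cur - prev)) total=$cur"
    prev=$cur
done
```

The same per-interval delta is what a munin plugin would compute from two successive samples.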
I am graphing netlivelocks in munin to get a grasp on things: http://sysadmin.coloclue.net/munin/router.nl.coloclue.net/eunetworks-2.router.nl.coloclue.net/index.html#kern (feel free to look at the other system metrics from the BSD routers, filed under "router.nl.coloclue.net" at http://sysadmin.coloclue.net/munin/index.html).

Until yesterday I was running GENERIC.MP and experienced between 1% and 2% packet loss on packets forwarded by the OpenBSD routers. sthen@ recommended I try the single-processor kernel, and magically most of the packet loss disappeared (but not all of it). With the GENERIC.MP kernel, netlivelocks was rising much faster.

During debugging (while still running MP) I tcpdumped for inbound ICMP traffic on one of our edge interfaces and initially thought one of our suppliers was to blame, as tcpdump didn't show some packets I expected to arrive. I now suspect they got lost on our side, since we don't see this behaviour with SP. I observed similar packet loss for both IPv4 and IPv6; I'm unsure whether that helps in assessing where in the system they get lost.

How can I assist in further debugging?

Kind regards,

Job