Hi all,

I am replying to this thread as I see some resemblance between the issue
I am experiencing and the quickly rising netlivelocks value.

On 24/06/14 3:08 PM, Chris Cappuccio wrote:
>Kapetanakis Giannis [bil...@edu.physics.uoc.gr] wrote:
>> On 23/06/14 21:33, Henning Brauer wrote:
>>>* Chris Cappuccio <ch...@nmedia.net> [2014-06-23 20:24]:
>>>> I have a sandy bridge Xeon box with PF NAT that handles a daily 200
>>>> to 700Mbps. It has a single myx interface using OpenBSD 5.5 (not
>>>> current). It does nothing but PF NAT and related routing. No barrage
>>>> of vlans or interfaces. No dynamic routing. Nothing else. 60,000 to
>>>> 100,000 states.
>>>>
>>>> With an MP kernel, kern.netlivelocks increases by something like
>>>> 150,000 per day!! The packet loss was notable.
>>>>
>>>> With an SP kernel, the 'netlivelock' counter barely moves. Maybe
>>>> 100 per day on average, but for the past week, maybe 5.
>>
>> sysctl -a|grep netlive
>> kern.netlivelocks=50
>> 
>> # pfctl -ss|wc -l
>>     73203
>> 
>> # pfctl -sr|wc -l
>>      294
>>
>> routing/firewalling/some NAT at ~ 500Mbps

I am routing between 5 and 20 megabit/sec on OpenBSD 5.5 boxes that
follow the mtier stable updates. No NAT, PF is disabled, just plain
routing (~ 500k IPv4 routes, 20k IPv6 routes).
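
For reference, the route counts above come from something along these
lines (just an example invocation, and the numbers are rough since the
header lines are included in the count):

    netstat -rn -f inet  | wc -l    # approximate IPv4 route count
    netstat -rn -f inet6 | wc -l    # approximate IPv6 route count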

A dmesg is available here: http://instituut.net/~job/dmesg-dcg-2.txt .
The boxes are Dell R610 chassis with a mixture of em(4) and bnx(4) NICs
and an mfi(4)-powered PERC 6/i controller.

> I have some ideas. I'm going to do some troubleshooting when I have a
> chance to think clearly.
>
> I think the disk subsystem could be part of the issue. I see the most
> netlivelocks on a box with a USB key, mfi is in second place.

I am graphing netlivelocks in munin to get a grasp on things:

http://sysadmin.coloclue.net/munin/router.nl.coloclue.net/eunetworks-2.router.nl.coloclue.net/index.html#kern

(feel free to look at the other system metrics from the BSD routers,
filed under "router.nl.coloclue.net" at
http://sysadmin.coloclue.net/munin/index.html)
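
In case it is useful, the counter is fed into munin with roughly the
following plugin (a minimal sketch; the plugin actually in use may
differ in details):

    #!/bin/sh
    # minimal munin plugin sketch for kern.netlivelocks
    # (assumption: graphed as DERIVE so munin shows the rate of increase)
    case $1 in
        config)
            echo 'graph_title kern.netlivelocks'
            echo 'graph_category kern'
            echo 'netlivelocks.label netlivelocks'
            echo 'netlivelocks.type DERIVE'
            echo 'netlivelocks.min 0'
            exit 0;;
    esac
    echo "netlivelocks.value $(sysctl -n kern.netlivelocks)"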

Until yesterday I was running GENERIC.MP and experienced between 1% and
2% packet loss on packets forwarded by the OpenBSD routers. sthen@
recommended I try the single-processor kernel, and magically most of the
packet loss disappeared (but not all of it). With the GENERIC.MP kernel
netlivelocks was rising much faster.
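
A crude way to watch how fast the counter rises on a given kernel (the
interval is arbitrary):

    while :; do date; sysctl kern.netlivelocks; sleep 60; done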

During debugging (when I was running MP) I tcpdumped for inbound ICMP
traffic on one of our edge interfaces and initially thought one of our
suppliers was to blame, as tcpdump didn't show some packets I expected
to arrive. Now I suspect they got lost on our side, because we don't see
the behaviour with SP. I observed similar packet loss for both IPv4 and
IPv6. I am unsure whether that helps in assessing where in the system
the packets get lost.
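
For completeness, the capture was roughly this (em0 is just an example
name for the edge interface here):

    tcpdump -n -i em0 icmp or icmp6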

How can I assist in further debugging? 

Kind regards,

Job
