Hello

We have a somewhat curious issue and have run out of ideas ;)

We have no reliable way to reproduce the issue, but we see, for example,
IRC disconnects for users behind our firewall.

What we have:
- two HP Proliant DL360 G5 with Broadcom BCM5708 NICs, 2GB RAM,
  Intel Xeon E5335@2.0GHz
- OpenBSD 5.5
- trunk between the two NICs
- 13 VLAN interfaces with carp failover
- one VLAN for pfsync
- ospfd and ospf6d
- approx. 200Mbit/s of traffic
- the initial pfsync takes quite a long time (~1h)

The setup looks like this (not sure if relevant):
- both servers have a failover trunk with two interfaces
- all traffic including pfsync is sent over this trunk
- the problem also occurs if we disable one box

What happens/what we tried:
The main issue is that we occasionally see broken SSH connections and
quite a lot of broken IRC connections during the day. The problem seems
to happen more often in the evening - however, we do not see a
correlation with the amount of traffic or the number of connections.
As a first reaction we updated to the latest stable OpenBSD release,
which didn't solve the issue. Afterwards we replaced the onboard
Broadcom NIC with a PCIe Intel 82576 (em driver) card; however, this card
seems to cause some new issues - i.e. we see quite a few input (rx)
errors in "netstat -i". Because we don't see such errors with the
Broadcom NICs, we decided not to investigate this any further and
switched back to the Broadcom setup.
Besides those steps we also disabled one of the boxes by stopping ospf
and removing the carp interfaces - however, the disconnects didn't go
away.
Furthermore, we checked whether any state tables are overflowing, and we
didn't find any suspicious kernel messages either.
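For reference, checks of this kind can be done roughly as follows. This
is only a sketch: the exact commands are an assumption, not a record of
what was run on these boxes, and pfctl needs root.

```shell
# pf status: current number of state entries, plus counters such as
# "state-limit" and "memory" that increase when pf hits a limit
pfctl -si

# configured hard limits (states, frags, src-nodes, ...) to compare
# against the current entry count shown above
pfctl -sm

# most recent kernel messages
dmesg | tail -n 50
```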

We have a quite similar setup which doesn't show these issues - however,
it doesn't carry the same amount of traffic.

I uploaded some information about the system here:
* sysctl -a http://dpaste.com/08VBA93
* pfctl (w/o rules and states) http://dpaste.com/2BBJG5P
Feel free to ask for more if needed.

Long story short: do you have any hints or ideas where we could look
next? Have you ever seen such a problem in another setup? At least to me,
it looks like long-lived sessions (like IRC) are somehow affected -
does this ring any bells?

I appreciate any hints and hope that I didn't miss any important
information - otherwise feel free to bug me.

Thanks in advance and have a nice day!

Kind regards,
Nicolas
