Re: Oerrs on vlan interfaces

2014-03-26 Thread Chris Cappuccio
Matt Carey [cvstealth2...@yahoo.com] wrote:
 I'm trying to track down the source of what is causing output errors on vlan
 interfaces for 2 separate physical systems. For example when looking at
 netstat between 2 different runs the values are always incrementing:

 # netstat -s -f inet -I vlan800 && echo && sleep 5 && netstat -s -f inet -I vlan800
 Name    Mtu   Network     Address               Ipkts  Ierrs     Opkts  Oerrs Colls
 vlan800 1500  Link        00:1c:23:e1:cf:48 187689428      0 148043392 262767     0


This output is mostly unreadable. Maybe the yahoo mailer is the problem.

 The same behavior is mimicked on both systems, as the counters start
 incrementing when failing over the carp interfaces between the peers. Another
 oddity is that the physical interfaces show no output errors, just input
 errors:

Take a look at /usr/src/sys/net/if_vlan.c, there are three places where it
increases if_oerrors. You could try to sprinkle unique printfs next to each
if_oerrors++ to see which one is getting triggered. Each one happens for
a different reason: parent interface not UP/RUNNING, parent interface doesn't
handle its own tagging and if_vlan can't allocate an mbuf, etc.
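
To illustrate the "unique printf per if_oerrors++" idea, here is a minimal
userland sketch. It is not the code from if_vlan.c: struct mock_vlan,
mock_vlan_output() and the site numbers are hypothetical names made up for
this example, and it models only the two conditions named above (the third
error site is not modeled). The point is just that each increment gets its
own distinguishable message, so the log shows which path is firing.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the vlan state; NOT the kernel's struct ifnet. */
struct mock_vlan {
	int parent_up_running;   /* parent interface is UP and RUNNING */
	int parent_hw_tagging;   /* parent inserts the 802.1Q tag itself */
	int mbuf_available;      /* software tagging could allocate an mbuf */
	unsigned long if_oerrors;
};

/*
 * Mock output path. Returns 0 if the packet would be sent, otherwise the
 * number of the error site that fired. Each site prints a unique message
 * right next to its if_oerrors++.
 */
static int
mock_vlan_output(struct mock_vlan *ifv)
{
	assert(ifv != NULL);

	if (!ifv->parent_up_running) {
		printf("vlan oerr site 1: parent not UP/RUNNING\n");
		ifv->if_oerrors++;
		return 1;
	}

	if (!ifv->parent_hw_tagging && !ifv->mbuf_available) {
		printf("vlan oerr site 2: no mbuf for software tag\n");
		ifv->if_oerrors++;
		return 2;
	}

	return 0;
}
```

Calling mock_vlan_output() on a struct with parent_up_running set to 0 prints
the "site 1" message and bumps if_oerrors by one. In a real kernel the printfs
would sit in the transmit path, so they are best kept short and removed once
the culprit is identified.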



Oerrs on vlan interfaces

2014-03-25 Thread Matt Carey
I'm trying to track down the source of what is causing output errors on vlan
interfaces for 2 separate physical systems.  For example when looking at
netstat between 2 different runs the values are always incrementing:

# netstat -s -f inet -I vlan800 && echo && sleep 5 && netstat -s -f inet -I vlan800
Name    Mtu   Network     Address               Ipkts  Ierrs     Opkts  Oerrs Colls
vlan800 1500  Link        00:1c:23:e1:cf:48 187689428      0 148043392 262767     0

Name    Mtu   Network     Address               Ipkts  Ierrs     Opkts  Oerrs Colls
vlan800 1500  Link        00:1c:23:e1:cf:48 187691085      0 148044677 262770     0
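
As a rough illustration of how the two vlan800 samples compare, the sketch
below subtracts the earlier counters from the later ones. The struct and
function names are hypothetical and exist only for this example; the numbers
are the ones from the two netstat runs shown above. It assumes the counters
did not wrap between samples.

```c
#include <assert.h>
#include <stdio.h>

/* One row of interface counters as printed by netstat. */
struct if_sample {
	unsigned long ipkts, ierrs, opkts, oerrs;
};

/* Per-counter difference between a later and an earlier sample. */
static struct if_sample
sample_delta(struct if_sample before, struct if_sample after)
{
	struct if_sample d;

	/* Assumption: counters only grow and did not wrap in between. */
	assert(after.ipkts >= before.ipkts && after.ierrs >= before.ierrs);
	assert(after.opkts >= before.opkts && after.oerrs >= before.oerrs);

	d.ipkts = after.ipkts - before.ipkts;
	d.ierrs = after.ierrs - before.ierrs;
	d.opkts = after.opkts - before.opkts;
	d.oerrs = after.oerrs - before.oerrs;
	return d;
}

/* Print the differences in a compact one-line form. */
static void
print_delta(const char *name, struct if_sample d)
{
	printf("%s: +%lu Ipkts, +%lu Ierrs, +%lu Opkts, +%lu Oerrs\n",
	    name, d.ipkts, d.ierrs, d.opkts, d.oerrs);
}
```

With the two vlan800 rows as input, the difference is 1657 input packets, 1285
output packets and 3 output errors between the runs.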


Name    Mtu   Network     Address               Ipkts  Ierrs     Opkts  Oerrs Colls
vlan200 1500  Link        00:1c:23:e1:cf:48  18139570      0  18645286  40217     0
vlan300 1500  Link        00:1c:23:e1:cf:48   2562460      0   3460373   2720     0
vlan500 1500  Link        00:1c:23:e1:cf:48 112356993      0 141163651 158443     0



The hardware is 2 Dell PowerEdge 860 servers using the onboard Broadcom
BCM5721 NICs. Each system is attached to a different Procurve switch with the
2 onboard NICs in a LACP trunk configuration. The 2 systems are set up in an
HA configuration using carp/pf that runs very well. When looking for any type
of issues on the switch ports, they come back clean for all uplinks to the
Dells:
  Errors (Since boot or last clear) :
   FCS Rx          : 0                  Drops Rx        : 0
   Alignment Rx    : 0                  Collisions Tx   : 0
   Runts Rx        : 0                  Late Colln Tx   : 0
   Giants Rx       : 0                  Excessive Colln : 0
   Total Rx Errors : 0                  Deferred Tx     : 0

The same behavior is mimicked on both systems, as the counters start
incrementing when failing over the carp interfaces between the peers. Another
oddity is that the physical interfaces show no output errors, just input
errors:

# netstat -ni
Name    Mtu   Network     Address               Ipkts  Ierrs     Opkts  Oerrs Colls
bge0    1500  Link        00:1c:23:e1:cf:48 284587839 103261 120699706      0     0
bge0    1500  fe80::%bge0 fe80::21c:23ff:fe 284587839 103261 120699706      0     0
bge1    1500  Link        00:1c:23:e1:cf:48  61755734    193 233219963      0     0
bge1    1500  fe80::%bge1 fe80::21c:23ff:fe  61755734    193 233219963      0     0
...
trunk0  1500  Link        00:1c:23:e1:cf:48 346220631      0 353798619    167     0
trunk0  1500  192.168.201 192.168.201.23    346220631      0 353798619    167     0
trunk0  1500  fe80::%trun fe80::21c:23ff:fe 346220631      0 353798619    167     0
...
...

Any advice would be appreciated on what else to look for that is causing
these errors.

Regards,
Matt

Additional info if it helps:
# netstat -sn
ip:
        461996032 total packets received
        21 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        0 fragments received
        0 fragments dropped (duplicates or out of space)
        0 malformed fragments dropped
        0 fragments dropped after timeout
        0 packets reassembled ok
        136026433 packets for this host
        0 packets for unknown/unsupported protocol
        324667376 packets forwarded
        550005 packets not forwardable
        0 redirects sent
        11050218 packets sent from this host
        192 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        0 output packets discarded due to no route
        187636 output datagrams fragmented
        187734 fragments created
        17589 datagrams that can't be fragmented
        0 fragment floods
        0 packets with ip length > max ip packet size
        0 tunneling packets that can't find gif
        0 datagrams with bad address in header
        347724405 input datagrams checksum-processed by hardware
        356771381 output datagrams checksum-processed by hardware
        410308 multicast packets which we don't join
icmp:
        1171928 calls to icmp_error
        0 errors not generated because old message was icmp
        Output packet histogram:
                echo reply: 295802
                destination unreachable: 1143260
                time exceeded: 4
        0 messages with bad code fields
        0 messages < minimum length
        46 bad checksums
        4 messages with bad length
        64 echo requests to broadcast/multicast rejected
        Input packet histogram:
                echo reply: 22496
                destination unreachable: 22342
                source quench: 17
                routing redirect: 172
                echo: 295866
                time exceeded: 1172
        295802 message responses generated
igmp:
        0 messages received