We have also been seeing this OOB behavior, but only on our Cisco 3750s.  We
are running v4.1.2.1 on the Clean Access servers in centrally deployed
virtual gateway OOB mode.  The switches were all upgraded to 12.2(40)SE over
winter break.  We currently have an open TAC case.

We have had four occurrences of widespread CCA system failures: one in
October, one in December, and two in January.

The first indication is an "OOB - unable to set vlan on switch port" in the
CCA manager events log.  (NOTE: there is NO indication of WHICH switch port,
which would be a handy piece of information!!!)

If you try to display the affected switch via the CCA manager GUI, CCA will
indicate it is unable to manage the switch.  Basically, the SNMP set/get
requests from the CCA manager are not processed by the switch.  Users
already in the access VLAN are fine and can remain online.  Those trying to
log in cannot: the CCA Agent will indicate they are "Successfully logged
in", but seconds later the login screen reappears because the manager cannot
change the port from the authentication VLAN to the access VLAN via an SNMP
set request.
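
For anyone who wants to try the same kind of VLAN change by hand during an
event, here is a rough sketch that builds a Net-SNMP "snmpset" command line.
It ASSUMES the port VLAN is written through vmVlan in
CISCO-VLAN-MEMBERSHIP-MIB; I have not confirmed that this is the object CCA
itself uses.  The host, community, ifIndex and VLAN values in the usage note
are placeholders, not our real settings.

```python
# vmVlan column of CISCO-VLAN-MEMBERSHIP-MIB, indexed by the port's ifIndex.
# ASSUMPTION: this is a plausible object for a manual test, not a confirmed
# description of what the CCA manager writes.
VM_VLAN_OID = "1.3.6.1.4.1.9.9.68.1.2.2.1.2"


def build_vlan_set_cmd(host, community, if_index, vlan_id,
                       timeout_s=2, retries=1):
    """Return the Net-SNMP snmpset argument list that moves one port
    to vlan_id using SNMPv2c."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("vlan_id must be between 1 and 4094")
    return [
        "snmpset", "-v2c", "-c", community,
        "-t", str(timeout_s), "-r", str(retries),
        host,
        f"{VM_VLAN_OID}.{if_index}",
        "i", str(vlan_id),  # "i" marks the value as an INTEGER
    ]
```

For example, build_vlan_set_cmd("192.0.2.10", "private", 10101, 20) gives
the argument list for moving the port with ifIndex 10101 into VLAN 20.  On
a healthy switch the command should return the new value; during one of
these events I would expect it to time out like every other SNMP request.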

This condition spreads like wildfire throughout our network to random switch
stacks.  Typically about two-thirds of our stacks go down.

SNMP queries sent from other systems or monitoring devices also time out, so
the SNMP service on the switch is truly not responding.  If you sniff the
traffic, you can see the queries go out with no response, though there did
appear to be some SNMP traffic coming from the switch.
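
To pin down exactly when each stack stops and resumes answering, a small
poller run from a separate host might help.  The following is only a sketch:
it assumes the Net-SNMP "snmpget" tool is installed, queries sysUpTime.0,
and the function names are my own.  The addresses and community string in
the usage note are placeholders.

```python
import subprocess
import time
from datetime import datetime

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0


def build_snmpget_cmd(host, community, timeout_s=2, retries=1):
    """Return the Net-SNMP snmpget argument list for one SNMPv2c query."""
    return ["snmpget", "-v2c", "-c", community,
            "-t", str(timeout_s), "-r", str(retries),
            host, SYS_UPTIME_OID]


def snmp_responds(host, community):
    """True if the switch answered, False on timeout or any other failure."""
    try:
        result = subprocess.run(build_snmpget_cmd(host, community),
                                capture_output=True, text=True)
    except OSError:  # e.g. snmpget is not installed
        return False
    return result.returncode == 0


def poll(hosts, community, last_state, check=snmp_responds):
    """Check each host once and return a log line for every host whose
    state changed since the previous call (last_state is updated)."""
    changes = []
    for host in hosts:
        up = check(host, community)
        if last_state.get(host) != up:
            stamp = datetime.now().isoformat(timespec="seconds")
            status = "responding" if up else "NOT responding"
            changes.append(f"{stamp} {host} SNMP {status}")
            last_state[host] = up
    return changes


def watch(hosts, community, interval_s=30):
    """Poll forever, printing only state changes."""
    state = {}
    while True:
        for line in poll(hosts, community, state):
            print(line, flush=True)
        time.sleep(interval_s)
```

Calling watch(["192.0.2.10", "192.0.2.11"], "public") prints one line each
time a switch changes state, which gives a timeline that can be lined up
against the CCA manager events log and any packet captures.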

This appears to be a switch issue rather than a Clean Access issue at this
point, but it is not known what triggers the event.  It has never occurred
on our 3550s in OOB mode.  All our other switches in OOB mode are 3750s.

If you use the "show snmp" command on the affected switch, it may indicate
an increase in the number of dropped SNMP packets, but not always. 
Increasing the SNMP queue size to the maximum has not resolved this issue
either. (I don't know what the default queue size is, so I don't know how
much we increased it.)

I initially opened a TAC case with the Clean Access team, but they turned it
over to the SNMP team, who have not been able to come up with a resolution
yet.  We are waiting for another general system failure so I can collect
packets to/from the affected switches & CCA manager for TAC to analyze.

We have tried a number of switch & CCA configuration changes (all are now
using SNMPv2), as well as removing the SNMP settings on the switch and
replacing them during the event, but nothing appears to make a difference.
Rebooting the switch does not help either.  However, after some amount of
time, the switches come back up on their own.  The first three events took
about an hour to clear on all the switches; the last event took 3 1/2 hours
to clear completely.

If anyone has any ideas, please let me know.  If interested, contact me
offline for more details and our Cisco TAC case #.

Regards,

-Bill Davis
Network Security Administrator
Housing Technology
Colorado State University
[EMAIL PROTECTED]
