RE: Very Strange Problem....Any Ideas? [7:59682]

Priscilla Oppenheimer Mon, 23 Dec 2002 10:43:52 -0800

You should probably look into the behavior of those UNIX servers that insist
on using a /24 mask. Is it possible that they also advertise such a mask
with ICMP? See RFC 1256 for more info about ICMP router solicitations and
advertisements.
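
To make the effect of a mismatched mask concrete, here is a minimal Python sketch of the on-link decision each host makes before sending a packet. The host addresses 172.31.4.20 and 172.31.5.10 are made up for illustration; only the 172.31.4.0/22 network and the /24 mask on the UNIX boxes come from the original post.

```python
import ipaddress


def on_link(host_ip: str, host_prefix_len: int, dest_ip: str) -> bool:
    """Return True if the host, using its own mask, treats dest_ip as local."""
    iface = ipaddress.ip_interface(f"{host_ip}/{host_prefix_len}")
    return ipaddress.ip_address(dest_ip) in iface.network


# Hypothetical addresses, chosen only to illustrate the two ranges.
unix_box = "172.31.4.20"   # vendor UNIX box configured with /24
client = "172.31.5.10"     # DHCP client configured with /22

# The UNIX box thinks the client is on another subnet and needs a gateway.
print(on_link(unix_box, 24, client))         # False
# The client thinks the UNIX box is local and ARPs for it directly.
print(on_link(client, 22, unix_box))         # True
# A server in 172.31.4.x looks local to the UNIX box even with the /24 mask.
print(on_link(unix_box, 24, "172.31.4.10"))  # True
```

This is consistent with the clients in 172.31.5.x failing to reach the UNIX boxes while the servers in 172.31.4.x could. It does not by itself explain the failures between clients, servers, and switches.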


I think I have seen Windows and Macintosh machines broadcast ICMP router
solicitation messages. Usually they are ignored, but perhaps not in your
case. The UNIX machines may have responded with the wrong mask.

Routers (and hosts that do routing) also periodically broadcast or multicast
router advertisements, which can wreak havoc even if the other hosts don't
ask for the info.
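
If you capture one of these advertisements with a sniffer, it helps to know what is in it. The following is a rough Python sketch that decodes the ICMP portion of a router advertisement according to the RFC 1256 layout (type 9, then address count, entry size, lifetime, and a list of router address / preference pairs). Note that the message carries router addresses and preference levels, not a subnet mask. The function name and the sample bytes are made up for illustration, and the checksum is not verified.

```python
import ipaddress
import struct

ICMP_ROUTER_ADVERTISEMENT = 9  # ICMP type assigned by RFC 1256


def parse_router_advertisement(icmp: bytes) -> dict:
    """Decode the ICMP payload of a router advertisement (no checksum check)."""
    if len(icmp) < 8:
        raise ValueError("message too short for a router advertisement")
    msg_type, _code, _checksum, num_addrs, entry_size, lifetime = struct.unpack(
        "!BBHBBH", icmp[:8]
    )
    if msg_type != ICMP_ROUTER_ADVERTISEMENT:
        raise ValueError(f"not a router advertisement (ICMP type {msg_type})")
    if entry_size < 2:
        raise ValueError("address entry size must be at least 2 words")

    routers = []
    offset = 8
    for _ in range(num_addrs):
        entry = icmp[offset:offset + entry_size * 4]  # entry size is in 32-bit words
        if len(entry) < 8:
            raise ValueError("truncated router address entry")
        addr, preference = struct.unpack("!4si", entry[:8])
        routers.append((str(ipaddress.IPv4Address(addr)), preference))
        offset += entry_size * 4
    return {"lifetime": lifetime, "routers": routers}


# Hand-built sample: one router, 172.31.4.1, preference 0, lifetime 1800 seconds.
sample = (
    struct.pack("!BBHBBH", 9, 0, 0, 1, 2, 1800)
    + ipaddress.IPv4Address("172.31.4.1").packed
    + struct.pack("!i", 0)
)
print(parse_router_advertisement(sample))
```

A host that honors such advertisements could install an unexpected default gateway, so any advertiser that shows up in the trace is worth identifying.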

You said you checked the mask on all your devices, though... But perhaps the
problem was intermittent, due to the intermittent nature of ICMP router
advertisements.

Another thought is that there could have been a rogue DHCP server somewhere.
These days DHCP server capability is showing up in all sorts of devices,
particularly wireless access points, but also other Internet toasters,
refrigerators, etc. ;-)
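
One way to check the rogue DHCP theory is to pull the DHCP offers out of a sniffer trace and compare them with what the real server should hand out. Below is a small Python sketch of that comparison. It assumes the offers have already been decoded into simple records. The `Offer` type, the function name, and all of the server and client addresses in the example are hypothetical; only the 172.31.4.0/22 network comes from the original post.

```python
import ipaddress
from typing import NamedTuple


class Offer(NamedTuple):
    server_id: str    # DHCP option 54 (server identifier)
    offered_ip: str   # yiaddr field of the offer
    subnet_mask: str  # DHCP option 1 (subnet mask)


def suspicious_offers(offers, authorized_servers, expected_network):
    """Return (offer, reasons) pairs for offers that do not match expectations."""
    net = ipaddress.ip_network(expected_network)
    flagged = []
    for offer in offers:
        reasons = []
        if offer.server_id not in authorized_servers:
            reasons.append("unknown server")
        if ipaddress.ip_address(offer.subnet_mask) != net.netmask:
            reasons.append("wrong mask")
        if ipaddress.ip_address(offer.offered_ip) not in net:
            reasons.append("address outside network")
        if reasons:
            flagged.append((offer, reasons))
    return flagged


# Hypothetical capture: one offer from the real server, one from a stray device.
captured = [
    Offer("172.31.4.10", "172.31.5.23", "255.255.252.0"),
    Offer("192.168.1.1", "192.168.1.50", "255.255.255.0"),
]
for offer, reasons in suspicious_offers(captured, {"172.31.4.10"}, "172.31.4.0/22"):
    print(offer.server_id, reasons)
```

Any offer flagged this way points at a device worth tracking down by its MAC address in the switch tables.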

Anyway, I guess you already know that you'll want to put a sniffer on this
network and figure out what is really going on before you go back to the
addressing that should, of course, work.

Priscilla

Craig Columbus wrote:
> 
> I worked on a network move for a brokerage company last week and
> encountered a VERY strange problem.
> 
> We moved a bunch of equipment to a new office building.  During the
> process, we changed the internal network from 192.168.100.0/24 to
> 172.31.4.0/22.
> The company has 4 Cisco 3500XL 48-port switches, with no VLANs and plain
> vanilla configurations.  The fanciest thing is portfast on the client
> machine ports.
> Switches are linked via GBICs in a cascade.  There is one client-maintained
> router that sits before the firewall with only static routes and no routing
> protocols.
> There are multiple outside vendor routers for specific applications
> (real-time quotes, clearinghouse mainframe, etc.), but these too have
> only static routes and no routing protocols.
> 
> After installing all of the network equipment and servers, we started to
> turn on clients and get new DHCP addresses.  Since the new network was
> 172.31.4.0/22, 172.31.4.1 - 172.31.4.255 was reserved for servers,
> printers, switches, and routers.  The remaining 172.31.5.0 - 172.31.7.254
> was reserved for clients...though there are only about 100 clients at the
> moment and thus they only took 5.0 - 5.100 or so in DHCP.
> 
> After installing maybe 20 clients or so, we started to see mass slowdowns
> on the network.  Pings between clients and servers were very irregular and
> intermittent.  There was no discernible pattern to when pings would succeed
> and when they'd fail.  We exhaustively went through all devices and made
> sure that they'd been correctly set to the new mask and that all server
> functions (DNS, WINS, AD, etc.) had been correctly set up for the new
> subnet.  Everything looked fine.  In an effort to troubleshoot, we unhooked
> the switch stack and put core servers and a few clients on a single
> switch.  Again, communication was irregular and unpredictable, whether with
> static or DHCP addresses on the clients.  Sometimes things would be fine,
> other times clients could ping the server, but not the switch to which they
> were attached.  Sometimes clients could ping the switch, but not the
> server.  Sometimes the clients could ping neither.  Again, there seemed to
> be no pattern.  Thinking there might have been some IOS bug, we erased
> nvram, upgraded the switches to current IOS code, and put in a completely
> plain configuration.  This had no effect on the problem.
> 
> After 4 of us (with probably 50 years of industry experience between us)
> spent 15 hours or so trying to resolve the issue, I finally suggested we
> try moving the clients from the 172.31.5.x/22 block to the 172.31.4.x/22
> block.  This solved all problems, and all clients were able to ping both
> switches and servers 100% of the time.  Again, we didn't change the mask on
> anything, only the third octet of the client IP range.  We then went back
> and triple checked every device attached to the network....servers,
> routers, switches, printers, clients, etc.  Every single device had the
> correct mask (/22) except for two vendor-maintained UNIX boxes...they had
> 172.31.4.x/24.  We suspected as much earlier since clients couldn't
> communicate with the UNIX boxes from the beginning, but the other servers
> could communicate with the UNIX boxes without issue.  These UNIX servers
> weren't running RIP (or any other routing protocol)...and besides, there
> aren't any other network devices listening for RIP....so we weren't really
> concerned about them causing the network connectivity issues.  At the time,
> I couldn't see how a bad mask on these boxes could effectively make the
> whole network unusable, so I didn't bother correcting it early in the day.
> 
> At this point, I've had a week to think about the issue and I still don't
> have a logical reason for why this problem might have occurred.  Anyone out
> there have any thoughts?
> I'm going back to put in a 3550EMI as the core in a couple of weeks.  At
> that point, we're going to investigate more and try to move the clients
> back to the 172.31.5.x range.  I'd like to test theories at that time if
> anyone can put one forward that we didn't already test....as I said, we
> spent a lot of time on this and I didn't put every test we did in this
> email.  All I can offer is that it wasn't IOS code (we tried more than one
> version), it wasn't the switches (we tried several, including non-Cisco),
> it wasn't DNS, WINS, DHCP, or any other server side issue (we thoroughly
> examined and ruled those out...besides, this was even happening at the IP
> level between switches).  Everything had worked correctly at the old
> building...the only two things that changed significantly during the move
> were the IP range and the building wiring.  AND, the wiring in the new
> building was brand new Cat6...I even dug out the WireScope and verified
> that the drops passed spec.
> 
> Thanks!
> Craig
> 
> 




Message Posted at:
http://www.groupstudy.com/form/read.php?f=7&i=59763&t=59682
