Re: [leaf-user] LRP router failing?

Ray Olszewski Sat, 09 Oct 2004 18:43:11 -0700

Dale -- This is a tough one to pin down. So far, you've been doing the right things, and your approach is about as systematic as mine would have been in the same place. So these comments are best read as things for you to look into when you are in Boise and can access the router either through a console or locally.

1. When you closed the Portland office, did you remove the VPN links to it from Seattle and Boise? If not, might the Boise router be spinning its wheels trying to establish a VPN connection to a vanished other end? This is one problem that I can see easily surviving an equipment change, since you used the same floppy and CD, so the same configuration.

2. The comment that the problem goes away on the weekend catches my eye, and it makes me wonder if the problem is not in the router but instead in some device on the LAN ... something is generating a huge pile of packets that get processed and blocked by the router ... enough traffic that it burns CPU cycles to the point where even light traffic like pings gets dropped. Could be a virus-infected host, or a bad port on a switch, or something I'm not thinking of, since I don't know your network. (I know you say you disconnected everything from the LAN side ... but *you* didn't really do that, since you weren't in Boise. This is enough of a possibility to make it worth checking that whoever was onsite didn't miss something "unimportant". And anyway, your description makes it sound like you did not disconnect the switches or hubs, and one of them might be the problem.)

3. From what you wrote, I (finally) realized it is not clear if this problem is occurring ONLY on the VPN links or with all traffic to/from Boise. For example, the test you did where you try (from Seattle, presumably) to ping the Boise router's internal address makes sense only in a VPN context ... but I don't know if your other tests were limited to this context as well.

4. You don't report the results of any connectivity tests done from the LAN side in Boise (or I don't think you do). From a host on that LAN, can one consistently ping the LEAF router's internal address? External address? The DSL router's address?

Anyway, if the load gets low enough that you can ssh in, see what you see from running "top" ... is there significant CPU load on the system? (You want load as measured by top, NOT as measured by uptime, for this calculation.) Check the ipchains rulesets ("ipchains -nvL", I think ... it's been a while since I worked with a 2.2.x kernel) and see if any rule has blocked, or otherwise processed, very large numbers of packets.
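Scanning that listing by eye is easy to get wrong when the counters are abbreviated. Here is a minimal Python sketch for flagging the busy rules; it is not a LEAF tool (a floppy-based router is unlikely to have Python), so the listing would be saved and copied to another machine first. It assumes that on each rule line the first two columns are the packet and byte counters, possibly abbreviated with a K, M or G suffix (treated here as multiples of 1000). `busy_rules`, `parse_count` and the 10,000-packet threshold are hypothetical choices for illustration.

```python
import re

# Assumed counter format: digits with an optional K/M/G suffix, e.g. "912", "12K".
_COUNT = re.compile(r"^(\d+)([KMG]?)$")
# Assumption: suffixes are multiples of 1000, not 1024.
_SCALE = {"": 1, "K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def parse_count(token):
    """Convert a counter such as '912' or '12K' to an int; None if it isn't one."""
    match = _COUNT.match(token)
    if match is None:
        return None
    return int(match.group(1)) * _SCALE[match.group(2)]


def busy_rules(listing, threshold=10_000):
    """Return (chain, packets, rule_line) for every rule at or above threshold.

    Assumes "Chain <name> ..." lines start each chain and that the first
    column of a rule line is its packet counter; header lines such as
    " pkts bytes target ..." are skipped because "pkts" is not a number.
    """
    chain = None
    hits = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Chain" and len(fields) > 1:
            chain = fields[1]
            continue
        if len(fields) < 3:
            continue
        packets = parse_count(fields[0])
        if packets is not None and packets >= threshold:
            hits.append((chain, packets, line.strip()))
    return hits
```

For example, redirect the verbose listing to a file on the router, copy it off, and run `print(busy_rules(open("counters.txt").read()))`. A DENY or REJECT rule with a huge count would fit the "something on the LAN is flooding the router" theory in point 2.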

Finally, in a test where only the LEAF router and the DSL router are connected, can each ping the other with no loss of packets? (Did the "changed every cable" piece include replacing the cable between them? Put a hub or switch between them and see if the interface on the DSL router is chattering.)
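When repeating these ping tests, it helps to reduce each transcript to a loss percentage plus the list of missing sequence numbers, since that shows whether packets vanish steadily or in bursts. Below is a minimal Python sketch, assuming the BSD-style reply lines that appear in the transcripts quoted below (`icmp_seq=N ttl=N time=N ms`, numbered from 0). `summarize_ping` is a hypothetical helper, not part of any LEAF package.

```python
import re

# A reply line looks like:
# "64 bytes from 64.113.213.14: icmp_seq=9 ttl=237 time=82.785 ms"
_REPLY = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")


def summarize_ping(transcript, sent):
    """Summarize one ping run.

    `sent` is the count given to `ping -c`; sequence numbers are assumed to
    run from 0 to sent - 1. Returns (loss_percent, missing_seqs, avg_rtt_ms),
    where avg_rtt_ms is None when no replies came back.
    """
    rtts = {}
    # finditer also copes with transcripts whose line breaks were lost.
    for match in _REPLY.finditer(transcript):
        rtts[int(match.group(1))] = float(match.group(2))
    missing = [seq for seq in range(sent) if seq not in rtts]
    loss_percent = 100.0 * len(missing) / sent
    avg_rtt = sum(rtts.values()) / len(rtts) if rtts else None
    return loss_percent, missing, avg_rtt


if __name__ == "__main__":
    run = """
64 bytes from 64.113.213.14: icmp_seq=0 ttl=237 time=81.561 ms
64 bytes from 64.113.213.14: icmp_seq=9 ttl=237 time=82.785 ms
64 bytes from 64.113.213.14: icmp_seq=11 ttl=237 time=83.254 ms
64 bytes from 64.113.213.14: icmp_seq=15 ttl=237 time=83.496 ms
64 bytes from 64.113.213.14: icmp_seq=19 ttl=237 time=84.834 ms
"""
    loss, missing, avg = summarize_ping(run, sent=20)
    print(f"{loss:.1f}% loss, missing {missing}, avg {avg:.3f} ms")
```

Fed the five replies from the 20-ping run against 64.113.213.14 quoted below, it reports 75.0% loss, the fifteen missing sequence numbers 1-8, 10, 12-14 and 16-18, and an average of 83.186 ms, which matches ping's own summary. The 200-ping run against 192.168.3.254 would come back as 100% loss with no average.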

At 04:47 PM 10/9/2004 -0700, Dale Mirenda wrote:

I've been using a set of identical Dachstein CD v.1.0.2 routers (2.2.19-3-LEAF-RAID) with ipsec VPN to link three small offices for several years. They have run literally flawlessly in all that time, and I've never had a problem with intrusion from the internet or virus attack from the private side. The network is very simple: three interconnected private networks, no DMZ:

192.168.1.0/24 in Seattle via T1 (384K data bandwidth) (that's where I am)
192.168.2.0/24 in Portland via T1 (768K data bandwidth)
192.168.3.0/24 in Boise via DSL (768K data bandwidth)

Two weeks ago, we had to close the Portland office so that router is no longer part of the network. About three weeks ago, the Boise network (three users, five desktops, two networked printers, a Linux fileserver, and a 12-port HP ProCurve 2424 switch) started dropping packets, no big deal to start with but the users noticed that in the mornings it took a long time to access the Seattle fileserver (identical to the one in Boise) and sometimes they could not send emails or access websites. Most afternoons, the problem would clear up by itself. Pings to the DSL router (Flowpoint 2200, 64.113.213.13) showed no dropped packets at all. Pings to the LEAF router outside address (64.113.213.14) would drop 3% to 5% in the mornings, and 0% to 5% in the afternoons. Pings to the inside network would drop 10% to 60% in the mornings, and 0% to 10% in the afternoons. Within a few days the problem worsened, with as much as 85% dropped packets to the inside addresses in the mornings, but still clearing up most days by afternoon. On the weekend, the problem all but disappeared but returned Monday morning.

I verified with the ISP (Transedge, great customer service, highly recommend) that there was no problem up to the DSL router. I had the Boise staff temporarily replace the LEAF router with a Win98 box set to the router outside address (64.113.213.14), and it dropped no packets at all. We replaced all network cables attached to the routers. I immediately tested and shipped a replacement router to them. I talked them through setting the new router up, using the same CD and floppy from the old router, and had them ship the old router to me.

The problems did not go away, and got worse as the days passed. I received and tested the old router, and it worked fine. Head-scratching time.

I had the Boise staff shut off all networked devices except one of the printers. The problem did not go away. I had them pull the network cable from the switch to the LEAF router. Still dropped packets.

As of today, the network is virtually inaccessible from the outside. Pings to the DSL router are still fine:
ITPB:~ dale$ ping -c 10 64.113.213.13
PING 64.113.213.13 (64.113.213.13): 56 data bytes
64 bytes from 64.113.213.13: icmp_seq=0 ttl=238 time=83.287 ms
64 bytes from 64.113.213.13: icmp_seq=1 ttl=238 time=82.428 ms
64 bytes from 64.113.213.13: icmp_seq=2 ttl=238 time=82.916 ms
64 bytes from 64.113.213.13: icmp_seq=3 ttl=238 time=82.382 ms
64 bytes from 64.113.213.13: icmp_seq=4 ttl=238 time=83.119 ms
64 bytes from 64.113.213.13: icmp_seq=5 ttl=238 time=82.121 ms
64 bytes from 64.113.213.13: icmp_seq=6 ttl=238 time=84.343 ms
64 bytes from 64.113.213.13: icmp_seq=7 ttl=238 time=83.358 ms
64 bytes from 64.113.213.13: icmp_seq=8 ttl=238 time=81.6 ms
64 bytes from 64.113.213.13: icmp_seq=9 ttl=238 time=80.802 ms
--- 64.113.213.13 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 80.802/82.635/84.343 ms
Pings to the LEAF router outside address drop the majority of packets:
ITPB:~ dale$ ping -c 20 64.113.213.14
PING 64.113.213.14 (64.113.213.14): 56 data bytes
64 bytes from 64.113.213.14: icmp_seq=0 ttl=237 time=81.561 ms
64 bytes from 64.113.213.14: icmp_seq=9 ttl=237 time=82.785 ms
64 bytes from 64.113.213.14: icmp_seq=11 ttl=237 time=83.254 ms
64 bytes from 64.113.213.14: icmp_seq=15 ttl=237 time=83.496 ms
64 bytes from 64.113.213.14: icmp_seq=19 ttl=237 time=84.834 ms
--- 64.113.213.14 ping statistics ---
20 packets transmitted, 5 packets received, 75% packet loss
round-trip min/avg/max = 81.561/83.186/84.834 ms
Pings to the inside LEAF router address (192.168.3.254) are never returned:
ITPB:~ dale$ ping -c 200 192.168.3.254
PING 192.168.3.254 (192.168.3.254): 56 data bytes
--- 192.168.3.254 ping statistics ---
200 packets transmitted, 0 packets received, 100% packet loss
nmap -sP 192.168.3.0/24 runs about two minutes (compared to the usual 10 - 15 seconds) and returns "0 hosts up."

As a consequence, I've lost the ability to ssh into the LEAF router or the Linux fileserver in Boise. I'm flying to Boise on Wednesday, but I really don't know what to look for as a solution:

1. I've never dropped a packet sent to the DSL router, so the ISP appears to be blameless.
2. The Boise LEAF hardware has been replaced with a tested machine, and it's been verified that there was nothing wrong with the one that was replaced.
3. I've never known a problem with LEAF software to survive a reboot.
4. The problem persists even with no client machines operating on the private side of the router.

I really don't know where to go from here. These machines were so easy to set up and they have worked so well that I have never had to troubleshoot them before. I know how to use ping and fping, and a bit about nmap (but not much). Mainly, apart from a bad network cable, a bad NIC in the router, or a virus or adware on the network, I have no idea what could cause something like this in the first place, and I have eliminated all of those possibilities to my satisfaction.


------------------------------------------------------------------------
leaf-user mailing list: [EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/leaf-user
SR FAQ: http://leaf-project.org/pub/doc/docmanager/docid_1891.html
