While setting up a cluster, I've stumbled across what appears to be a bug
in the IP masquerading of 2.2.17.

Here's my setup:
15 machines on 10.0.0.1-15. 10.0.0.1 (the "world" node) carries that address
as an IP alias and is also on the external net, so it can IP masq the other
14 machines. The machines are on a switch, and share a semi-switched network
segment with a bunch of other machines that have external IPs (we are all in
the same lab, actually).

The bug: 
When I make a connection from any internal node to one of the other
externally routed machines in my lab, then close it, that external machine
becomes unreachable to subsequent connects from that node.

ex:
[root@node2 /root]# telnet 128.174.21.2 22
Trying 128.174.21.2...
Connected to fake.ip.uiuc.edu (128.174.21.2).
Escape character is '^]'.
SSH-1.5-1.2.27
^]
telnet> q
Connection closed.

[root@node2 /root]# telnet 128.174.21.2 22
Trying 128.174.21.2...

...

The problem also happens if I telnet to a closed port a few times in a row.
Soon the machine is unreachable by any network traffic from that node. If I
switch to a new node, I can connect just once from that node, and then
silence.
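
To make the failure easier to reproduce, the manual telnet sequence above
could be scripted roughly as follows. This is only a sketch and was not part
of the original test: the function names try_connect and repeat_connect are
made up for illustration, and it assumes a netcat binary called `nc` that
supports the -z (connect only) and -w (timeout) options.

```shell
#!/bin/sh
# Hypothetical reproduction helper; names and the use of `nc` are assumptions.

# try_connect HOST PORT
# Succeeds if a TCP connection to HOST:PORT opens within 5 seconds.
try_connect() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# repeat_connect HOST PORT COUNT
# Makes up to COUNT successive connection attempts, printing the result of
# each. Stops and returns 1 at the first failed attempt, 0 if all succeed.
repeat_connect() {
    host=$1
    port=$2
    count=$3
    i=1
    while [ "$i" -le "$count" ]; do
        if try_connect "$host" "$port"; then
            echo "attempt $i: connected"
        else
            echo "attempt $i: FAILED"
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

Run from an internal node, something like `repeat_connect 128.174.21.2 22 3`
would be expected to fail on the second attempt if the behaviour described
above is present, and to succeed every time against a host outside the lab
network.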

This problem does NOT manifest itself when connecting to machines outside of
the local network. That seems to work fine.

More detailed setup info:
If it matters, all internal machines use eepro100's, and netboot via 
dhcp/PXE off of the 10.0.0.1 machine. 

Here's a sample routing table of the internal machines:
[root@node2 /root]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 eth0

And the world node:
[root@world /root]# route -n                                                   
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
128.174.21.50   0.0.0.0         255.255.255.255 UH    0      0        0 eth0
128.174.21.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         128.174.21.11   0.0.0.0         UG    0      0        0 eth0

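And the script to load the IP masq modules and set up ipchains:
#!/bin/sh
# Bring up the internal address as an alias on the external interface.
/sbin/ifconfig eth0:0 10.0.0.1
# Enable IP forwarding.
echo 1 > /proc/sys/net/ipv4/ip_forward
# Flush any existing rules from the three built-in chains.
/sbin/ipchains -F input
/sbin/ipchains -F output
/sbin/ipchains -F forward
# Accept input and output by default; deny forwarding unless a rule matches.
/sbin/ipchains -P input ACCEPT
/sbin/ipchains -P output ACCEPT
/sbin/ipchains -P forward DENY
# Masquerade everything forwarded from the internal 10.0.0.0/8 network.
/sbin/ipchains -A forward -s 10.0.0.0/8 -j MASQ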

All the IP masq modules are loaded:
[root@world /root]# lsmod
Module                  Size  Used by
ip_masq_vdolive         1336   0  (unused)
ip_masq_user            2632   0  (unused)
ip_masq_raudio          3000   0  (unused)
ip_masq_quake           1352   0  (unused)
ip_masq_portfw          2560   0  (unused)
ip_masq_mfw             3144   0  (unused)
ip_masq_irc             1592   0  (unused)
ip_masq_ftp             2616   0  (unused)
ip_masq_cuseeme         1080   0  (unused)
ip_masq_autofw          2480   0  (unused)

The problem still occurs with no modules loaded.

-- 
Mike Perry
http://so.fscked.org