It's good that you now have a workaround that doesn't require rebooting the client or the server. The IP alias might, or might not, be a problem. However, the real question is why the hang occurs after things have been working for a while with the server configured with an IP alias.

I think the mount with the real IP worked because the client used a different source port, 620, for the new connection. If you try to mount using the IP alias, I think the client will use port 664, which is already hung (the original problem), and this is why the mount failed. The client uses port 664 for the mount because that connection was already established to the server using the IP alias.

You can run these commands on the server to get a little more info on port 664:

# ps -ef | grep nfsd        --> to get the nfsd PID
# pfiles nfsd_PID           --> to see all sockets nfsd is using
# pstack nfsd_PID           --> to see what the nfsd threads are doing
# netstat -P tcp -f inet    --> to see what state the TCP sockets are in

-Dai

Jorgen Lundman wrote:
>
> Ok, a server was already hung when I got to work today.
>
>
> **********************************
>
> x4500-04: NFS Server, Sol 10 5/08
> Server IP (real) 172.20.12.226 netmask ffffff00
> NFS IP (alias)   172.20.12.227 netmask ffffff00
>
> x4500-04:~# netstat -in ; netstat -rn
> Name     Mtu   Net/Dest     Address        Ipkts       Ierrs  Opkts       Oerrs  Collis  Queue
> lo0      8232  127.0.0.0    127.0.0.1      1411        0      1411        0      0       0
> e1000g0  1500  172.20.12.0  172.20.12.226  2762497849  0      1789082372  0      0       0
> e1000g1  1500  172.20.19.0  172.20.19.226  96059758    0      52485074    0      0       0
>
>
> Routing Table: IPv4
> Destination          Gateway              Flags Ref   Use        Interface
> -------------------- -------------------- ----- ----- ---------- ---------
> default              172.20.12.1          UG    1     20456
> 172.20.12.0          172.20.12.226        U     1     45968      e1000g0
> 172.20.12.0          172.20.12.227        U     1     0          e1000g0:1
> 172.20.19.0          172.20.19.226        U     1     1662       e1000g1
> 224.0.0.0            172.20.12.226        U     1     0          e1000g0
> 127.0.0.1            127.0.0.1            UH    5     316        lo0
>
>
> **********************************
>
> NFS client: Sol 10 5/08
> Client IP 172.20.12.6 netmask ffffff00
>
> # netstat -in ; netstat -rn
> Name     Mtu   Net/Dest     Address      Ipkts     Ierrs  Opkts     Oerrs  Collis  Queue
> lo0      8232  127.0.0.0    127.0.0.1    2175      0      2175      0      0       0
> e1000g0  1500  172.20.12.0  172.20.12.6  43315618  0      41987515  0      0       0
> e1000g1  1500  172.20.11.0  172.20.11.6  19673254  0      13928826  0      0       0
>
>
> Routing Table: IPv4
> Destination          Gateway              Flags Ref   Use        Interface
> -------------------- -------------------- ----- ----- ---------- ---------
> default              172.20.11.4          UG    1     52386
> 10.0.0.0             172.20.12.1          UG    1     0
> 172.16.0.0           172.20.12.1          UG    1     193
> 172.20.11.0          172.20.11.6          U     1     2406       e1000g1
> 172.20.12.0          172.20.12.6          U     1     3163       e1000g0
> 192.168.0.0          172.20.12.1          UG    1     120
> 224.0.0.0            172.20.12.6          U     1     0          e1000g0
> 127.0.0.1            127.0.0.1            UH    4     2046       lo0
>
>
>
> *********************************
>
>
>
> Snoop running on NFS Client 172.20.12.6 attempting to (re)mount volume
> with TCP:
>
> # snoop -r host 172.20.12.227 or host 172.20.12.226 &
> # mount /export/www
> 172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT) vers=3 proto=UDP
> 172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=39049
> 172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
> 172.20.12.226 -> 172.20.12.6 MOUNT3 R Null
> 172.20.12.6 -> 172.20.12.227 MOUNT3 C Mount /export/www
> 172.20.12.226 -> 172.20.12.6 MOUNT3 R Mount OK FH=D402 Auth=unix
> 172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100003 (NFS) vers=3 proto=TCP
> 172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=2049
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Syn Seq=788700586 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Syn Ack=788700587 Seq=3596066619 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066620 Seq=788700587 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.227 NFS C NULL3
> 172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Ack=788700707 Seq=3596066620 Len=0 Win=49520
> 172.20.12.227 -> 172.20.12.6 NFS R NULL3
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066648 Seq=788700707 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Fin Ack=3596066648 Seq=788700707 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Ack=788700708 Seq=3596066648 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Fin Ack=788700708 Seq=3596066648 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066649 Seq=788700708 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
>
>
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
>
>
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
>
> Interesting, looks like x4500-04 is replying with the wrong IP.
>
>
>
> Packet capture on x4500-04:
>
> # snoop -r host 172.20.12.6
> Using device /dev/e1000g0 (promiscuous mode)
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Rst Ack=0 Seq=2924968134 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 Rst Win=49640
> 172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT) vers=3 proto=UDP
> 172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=39049
> 172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
> 172.20.12.226 -> 172.20.12.6 MOUNT3 R Null
> 172.20.12.6 -> 172.20.12.227 MOUNT3 C Mount /export/www
> 172.20.12.226 -> 172.20.12.6 MOUNT3 R Mount OK FH=D402 Auth=unix
> 172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100003 (NFS) vers=3 proto=TCP
> 172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=2049
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Syn Seq=788700586 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Syn Ack=788700587 Seq=3596066619 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066620 Seq=788700587 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.227 NFS C NULL3
> 172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Ack=788700707 Seq=3596066620 Len=0 Win=49520
> 172.20.12.227 -> 172.20.12.6 NFS R NULL3
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066648 Seq=788700707 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Fin Ack=3596066648 Seq=788700707 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Ack=788700708 Seq=3596066648 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Fin Ack=788700708 Seq=3596066648 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066649 Seq=788700708 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 Ack=2876021783 Seq=3544124023 Len=0 Win=49640
>
>
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 Ack=2876021783 Seq=3544124023 Len=0 Win=49640
>
>
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 Ack=2876021783 Seq=3544124023 Len=0 Win=49640
>
>
>
> *** Attempting mount using the real IP instead of the alias:
>
>
> # mount -o vers=3,hard,intr,quota 172.20.12.226:/export/www /export/www
> ssl01:/# 172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100005 (MOUNT) vers=3 proto=UDP
> 172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=39049
> 172.20.12.6 -> 172.20.12.226 MOUNT3 C Null
> 172.20.12.226 -> 172.20.12.6 MOUNT3 R Null
> 172.20.12.6 -> 172.20.12.226 MOUNT3 C Mount /export/www
> 172.20.12.226 -> 172.20.12.6 MOUNT3 R Mount OK FH=D402 Auth=unix
> 172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100003 (NFS) vers=3 proto=TCP
> 172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=2049
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Syn Seq=88322761 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.226 -> 172.20.12.6 TCP D=63802 S=2049 Syn Ack=88322762 Seq=3700270536 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270537 Seq=88322762 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.226 NFS C NULL3
> 172.20.12.226 -> 172.20.12.6 TCP D=63802 S=2049 Ack=88322882 Seq=3700270537 Len=0 Win=49520
> 172.20.12.226 -> 172.20.12.6 NFS R NULL3
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270565 Seq=88322882 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Fin Ack=3700270565 Seq=88322882 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6 TCP D=63802 S=2049 Ack=88322883 Seq=3700270565 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6 TCP D=63802 S=2049 Fin Ack=88322883 Seq=3700270565 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270566 Seq=88322883 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Rst Ack=0 Seq=3056789346 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Syn Seq=1932893789 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.226 -> 172.20.12.6 TCP D=620 S=2049 Syn Ack=1932893790 Seq=3700480396 Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480397 Seq=1932893790 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.226 NFS C FSINFO3 FH=D402
> 172.20.12.226 -> 172.20.12.6 TCP D=620 S=2049 Ack=1932893946 Seq=3700480397 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6 NFS R FSINFO3 OK
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480565 Seq=1932893946 Len=0 Win=49640
> 172.20.12.6 -> 172.20.12.226 NFS C FSSTAT3 FH=D402
> 172.20.12.226 -> 172.20.12.6 TCP D=620 S=2049 Ack=1932894102 Seq=3700480565 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6 NFS R FSSTAT3 OK
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480737 Seq=1932894102 Len=0 Win=49640
>
> Which works without issue. So it is not an NFS problem; it seems to be
> related to alias IPs.
>
> Do you know a way around this? Or perhaps you can suggest a place
> where I can go to ask. As a quick solution we will just forgo the
> alias IP and mount directly on the "real" IP. Why can I change
> protocol (TCP->UDP and vice versa) to get around it, and why does
> rebooting the NFS client work as well? Did we create the aliases wrong?
>
> I apologise for the noise on the NFS discussion list.
>
> Lund
>
>
>
> Dai Ngo wrote:
>> The problem seems to be on the TCP connection between the client and
>> the nfsd on the server. The portmap and mount requests used UDP and
>> they went OK.
>>
>> There are a number of TCP RST packets sent from both the client and
>> the server. This indicates there might be a problem with lost packets
>> causing both sides to be out of sync.
>>
>> Looks like the server has 2 NICs on the same subnet, 172.20.12.221
>> and 172.20.12.220. Have you tried disabling 172.20.12.220 and just
>> using 172.20.12.221 to see if it helps?
>> What does the output of 'netstat -in' and 'netstat -rn' on the server
>> and the client look like?
>>
>> By the way, where were the packets captured from: on the server or
>> on the client? It's more useful if you can capture the packets on
>> both sides and attach the raw capture files so they can be compared
>> and examined in more detail.
>
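
The address mismatch noted in the quoted capture (calls sent to the alias 172.20.12.227, UDP replies coming back from 172.20.12.226) can be picked out of saved snoop text mechanically. What follows is only a minimal sketch, not a tested diagnostic. The function name find_reply_mismatch is made up for illustration. It assumes the one-line summary format shown in this thread (source, "->", destination, protocol, then C or R for RPC calls and replies) with any "> " quote markers stripped, and it assumes a POSIX awk (on Solaris 10 that probably means nawk or /usr/xpg4/bin/awk rather than the default awk).

```shell
#!/bin/sh
# Hypothetical helper (not part of any Solaris tool): read snoop summary
# lines on stdin and report RPC replies whose source address differs from
# the address the client's most recent call for that protocol was sent to.
find_reply_mismatch() {
    awk '
        # RPC call: remember where this client sent its call for this protocol.
        $2 == "->" && $5 == "C" { sent_to[$1 SUBSEP $4] = $3; next }

        # RPC reply: compare the reply source with the remembered destination.
        $2 == "->" && $5 == "R" {
            key = $3 SUBSEP $4
            if ((key in sent_to) && sent_to[key] != $1)
                printf "%s reply to %s came from %s, call went to %s\n", $4, $3, $1, sent_to[key]
        }
    '
}
```

For example, with the client-side summary lines saved to a file (the name snoop.txt is arbitrary), `find_reply_mismatch < snoop.txt` should flag the PORTMAP and MOUNT3 replies and stay silent for the NFS NULL3 exchange, which was answered from the alias. Plain TCP segment lines are ignored because their fifth field is neither C nor R.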