It's good that you now have a workaround that doesn't require rebooting
the client or the server.
The IP alias might, or might not, be a problem. However, the real
question is why the hang occurs after things have been working for a
while with the server configured with an IP alias.

I think the mount with the real IP worked because the client used a
different (source) port, 620, for the new connection. If you try to
mount using the IP alias, I think the client will use port 664, which
is already hung (the original problem), and this is why the mount
failed. The client uses port 664 for the mount because that connection
was already established to the server using the IP alias.
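
One way to check this from a capture is to count, for each client source
port, how many SYNs were sent and how many SYN-ACKs came back. Below is a
rough sketch only. It assumes 'snoop -r' text output with one packet per
line (not wrapped by a mail client), and the function name syn_summary is
made up for illustration:

```shell
# syn_summary: read 'snoop -r' text on stdin and print, per client source
# port, the number of SYNs sent and the number of SYN-ACKs received.
# Assumed line layout: src -> dst TCP D=<port> S=<port> Syn [Ack=...] ...
syn_summary() {
    awk '
    $4 == "TCP" && $7 == "Syn" {
        split($5, d, "=")   # d[2] = destination port
        split($6, s, "=")   # s[2] = source port
        if ($8 ~ /^Ack=/) {
            # SYN-ACK from the server: its destination is the client port
            synack[d[2]]++
        } else {
            # plain SYN from the client
            syn[s[2]]++
            if (!(s[2] in seen)) { seen[s[2]] = 1; order[++n] = s[2] }
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            p = order[i]
            printf "port %s: syn=%d synack=%d\n", p, syn[p], synack[p] + 0
        }
    }'
}

# Possible usage (capture.snoop is a placeholder file name):
#   snoop -o capture.snoop host 172.20.12.227
#   snoop -i capture.snoop -r | syn_summary
```

A port that shows several SYNs and no SYN-ACK would be the one to look at
on the server side.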

You can run these commands on the server to get a little more info on
port 664:

# ps -ef | grep nfsd          --> to get the nfsd PID
# pfiles nfsd_PID             --> to see all sockets nfsd is using
# pstack nfsd_PID             --> to see what the nfsd threads are doing
# netstat -P tcp -f inet      --> to see what state the TCP sockets are in
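
To pick the port 664 connection out of the netstat output, something like
the following might help. It is only a sketch: the helper name port_state
is hypothetical, and it assumes the usual Solaris column layout, with the
remote address in the second column written as a.b.c.d.port and the state
in the last column:

```shell
# port_state: read netstat output on stdin and print local address,
# remote address and state for connections whose remote port is $1.
port_state() {
    awk -v sfx=".$1" '
    {
        n = length(sfx)
        # Match only when the second column ends in ".<port>", so that
        # e.g. port 40664 is not mistaken for port 664.
        if (length($2) > n && substr($2, length($2) - n + 1) == sfx)
            print $1, $2, $NF
    }'
}

# Possible usage on the server:
#   netstat -n -P tcp -f inet | port_state 664
```

The -n flag keeps addresses and ports numeric so the suffix match works.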

-Dai

Jorgen Lundman wrote:
>
> Ok, a server was already hung when I got to work today.
>
>
> **********************************
>
> x4500-04: NFS Server, Sol 10 5/08
>   Server IP (real) 172.20.12.226 netmask ffffff00
>   NFS IP   (alias) 172.20.12.227 netmask ffffff00
>
> x4500-04:~# netstat -in ; netstat -rn
> Name    Mtu  Net/Dest     Address        Ipkts      Ierrs Opkts      Oerrs Collis Queue
> lo0     8232 127.0.0.0    127.0.0.1      1411       0     1411       0     0      0
> e1000g0 1500 172.20.12.0  172.20.12.226  2762497849 0     1789082372 0     0      0
> e1000g1 1500 172.20.19.0  172.20.19.226  96059758   0     52485074   0     0      0
>
>
> Routing Table: IPv4
>   Destination           Gateway           Flags  Ref     Use     Interface
> -------------------- -------------------- ----- ----- ---------- ---------
> default              172.20.12.1          UG        1      20456
> 172.20.12.0          172.20.12.226        U         1      45968 e1000g0
> 172.20.12.0          172.20.12.227        U         1          0 e1000g0:1
> 172.20.19.0          172.20.19.226        U         1       1662 e1000g1
> 224.0.0.0            172.20.12.226        U         1          0 e1000g0
> 127.0.0.1            127.0.0.1            UH        5        316 lo0
>
>
> **********************************
>
> NFS client: Sol 10 5/08
>   Client IP        172.20.12.6 netmask ffffff00
>
> # netstat -in ; netstat -rn
> Name    Mtu  Net/Dest     Address        Ipkts      Ierrs Opkts      Oerrs Collis Queue
> lo0     8232 127.0.0.0    127.0.0.1      2175       0     2175       0     0      0
> e1000g0 1500 172.20.12.0  172.20.12.6    43315618   0     41987515   0     0      0
> e1000g1 1500 172.20.11.0  172.20.11.6    19673254   0     13928826   0     0      0
>
>
> Routing Table: IPv4
>   Destination           Gateway           Flags  Ref     Use     Interface
> -------------------- -------------------- ----- ----- ---------- ---------
> default              172.20.11.4          UG        1      52386
> 10.0.0.0             172.20.12.1          UG        1          0
> 172.16.0.0           172.20.12.1          UG        1        193
> 172.20.11.0          172.20.11.6          U         1       2406 e1000g1
> 172.20.12.0          172.20.12.6          U         1       3163 e1000g0
> 192.168.0.0          172.20.12.1          UG        1        120
> 224.0.0.0            172.20.12.6          U         1          0 e1000g0
> 127.0.0.1            127.0.0.1            UH        4       2046 lo0
>
>
>
> *********************************
>
>
>
> Snoop running on NFS Client 172.20.12.6 attempting to (re)mount volume 
> with TCP:
>
> # snoop -r host 172.20.12.227 or host 172.20.12.226 &
> # mount /export/www
>  172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT) 
> vers=3 proto=UDP
> 172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=39049
>  172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
> 172.20.12.226 -> 172.20.12.6  MOUNT3 R Null
>  172.20.12.6 -> 172.20.12.227 MOUNT3 C Mount /export/www
> 172.20.12.226 -> 172.20.12.6  MOUNT3 R Mount OK FH=D402 Auth=unix
>  172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100003 (NFS) 
> vers=3 proto=TCP
> 172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=2049
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Syn Seq=788700586 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Syn Ack=788700587 
> Seq=3596066619 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
> 0,nop,nop,sackOK>
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066620 
> Seq=788700587 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.227 NFS C NULL3
> 172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Ack=788700707 
> Seq=3596066620 Len=0 Win=49520
> 172.20.12.227 -> 172.20.12.6  NFS R NULL3
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066648 
> Seq=788700707 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Fin Ack=3596066648 
> Seq=788700707 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Ack=788700708 
> Seq=3596066648 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Fin Ack=788700708 
> Seq=3596066648 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066649 
> Seq=788700708 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
>
>
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
>
>
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
>
> Interesting, looks like x4500-04 is replying with the wrong IP.
>
>
>
> Packet capture on x4500-04:
>
> # snoop -r host 172.20.12.6
> Using device /dev/e1000g0 (promiscuous mode)
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Rst Ack=0 
> Seq=2924968134 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6  TCP D=664 S=2049 Rst Win=49640
>  172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT) 
> vers=3 proto=UDP
> 172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=39049
>  172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
> 172.20.12.226 -> 172.20.12.6  MOUNT3 R Null
>  172.20.12.6 -> 172.20.12.227 MOUNT3 C Mount /export/www
> 172.20.12.226 -> 172.20.12.6  MOUNT3 R Mount OK FH=D402 Auth=unix
>  172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100003 (NFS) 
> vers=3 proto=TCP
> 172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=2049
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Syn Seq=788700586 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Syn Ack=788700587 
> Seq=3596066619 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
> 0,nop,nop,sackOK>
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066620 
> Seq=788700587 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.227 NFS C NULL3
> 172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Ack=788700707 
> Seq=3596066620 Len=0 Win=49520
> 172.20.12.227 -> 172.20.12.6  NFS R NULL3
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066648 
> Seq=788700707 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Fin Ack=3596066648 
> Seq=788700707 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Ack=788700708 
> Seq=3596066648 Len=0 Win=49640
> 172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Fin Ack=788700708 
> Seq=3596066648 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066649 
> Seq=788700708 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6  TCP D=664 S=2049 Ack=2876021783 
> Seq=3544124023 Len=0 Win=49640
>
>
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6  TCP D=664 S=2049 Ack=2876021783 
> Seq=3544124023 Len=0 Win=49640
>
>
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.227 -> 172.20.12.6  TCP D=664 S=2049 Ack=2876021783 
> Seq=3544124023 Len=0 Win=49640
>
>
>
> *** Attempting mount using the real IP instead of the alias:
>
>
> # mount -o vers=3,hard,intr,quota 172.20.12.226:/export/www /export/www
> ssl01:/#  172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100005 
> (MOUNT) vers=3 proto=UDP
> 172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=39049
>  172.20.12.6 -> 172.20.12.226 MOUNT3 C Null
> 172.20.12.226 -> 172.20.12.6  MOUNT3 R Null
>  172.20.12.6 -> 172.20.12.226 MOUNT3 C Mount /export/www
> 172.20.12.226 -> 172.20.12.6  MOUNT3 R Mount OK FH=D402 Auth=unix
>  172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100003 (NFS) 
> vers=3 proto=TCP
> 172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=2049
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Syn Seq=88322761 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.226 -> 172.20.12.6  TCP D=63802 S=2049 Syn Ack=88322762 
> Seq=3700270536 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
> 0,nop,nop,sackOK>
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270537 
> Seq=88322762 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 NFS C NULL3
> 172.20.12.226 -> 172.20.12.6  TCP D=63802 S=2049 Ack=88322882 
> Seq=3700270537 Len=0 Win=49520
> 172.20.12.226 -> 172.20.12.6  NFS R NULL3
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270565 
> Seq=88322882 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Fin Ack=3700270565 
> Seq=88322882 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6  TCP D=63802 S=2049 Ack=88322883 
> Seq=3700270565 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6  TCP D=63802 S=2049 Fin Ack=88322883 
> Seq=3700270565 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270566 
> Seq=88322883 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Rst Ack=0 
> Seq=3056789346 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Syn Seq=1932893789 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.226 -> 172.20.12.6  TCP D=620 S=2049 Syn Ack=1932893790 
> Seq=3700480396 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
> 0,nop,nop,sackOK>
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480397 
> Seq=1932893790 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 NFS C FSINFO3 FH=D402
> 172.20.12.226 -> 172.20.12.6  TCP D=620 S=2049 Ack=1932893946 
> Seq=3700480397 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6  NFS R FSINFO3 OK
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480565 
> Seq=1932893946 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 NFS C FSSTAT3 FH=D402
> 172.20.12.226 -> 172.20.12.6  TCP D=620 S=2049 Ack=1932894102 
> Seq=3700480565 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6  NFS R FSSTAT3 OK
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480737 
> Seq=1932894102 Len=0 Win=49640
>
> Which works without issue. So it is not an NFS problem; it seems to be
> related to alias IPs.
>
> Do you know a way around this? Or perhaps you can suggest a place
> where I can go to ask. As a quick solution we will just forgo the
> alias IP and mount directly on the "real" IP. Why does changing the
> protocol (TCP->UDP and vice versa) get around it, and why does
> rebooting the NFS client work as well? Did we create the aliases wrong?
>
> I apologise for the noise on the NFS discussion list.
>
> Lund
>
>
>
> Dai Ngo wrote:
>> The problem seems to be on the TCP connection between the client and
>> the nfsd on the server. The portmap and mount requests used UDP and
>> they went OK.
>>
>> There are a number of TCP RST packets sent from both the client and
>> the server, which indicates there might be a problem with lost
>> packets causing both sides to be out of sync.
>>
>> Looks like the server has 2 NICs on the same subnet, 172.20.12.221
>> and 172.20.12.220. Have you tried disabling 172.20.12.220 and using
>> just 172.20.12.221 to see if it helps? What does the output of
>> 'netstat -in' and 'netstat -rn' on the server and the client look
>> like?
>>
>> By the way, where were the packets captured from, the server or the
>> client? It's more useful if you can capture the packets on both
>> sides and attach the raw capture files so they can be compared and
>> examined in more detail.
>

