Ok, a server was already hung when I got to work today.

**********************************

x4500-04: NFS Server, Sol 10 5/08
   Server IP (real) 172.20.12.226 netmask ffffff00
   NFS IP   (alias) 172.20.12.227 netmask ffffff00
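
For reference, a minimal sketch (Python, standard library only) that
converts the hex netmask shown by ifconfig into a prefix length and
confirms that the real IP, the alias and the client all sit on the same
subnet. The helper name network_of is made up for illustration.

```python
import ipaddress


def network_of(addr: str, hex_mask: str) -> ipaddress.IPv4Network:
    """Return the network an address belongs to, given a hex netmask
    in the form Solaris ifconfig prints (for example 'ffffff00')."""
    # Turn 'ffffff00' into the dotted form '255.255.255.0'.
    dotted = str(ipaddress.IPv4Address(int(hex_mask, 16)))
    # strict=False lets us pass a host address instead of a network address.
    return ipaddress.ip_network(f"{addr}/{dotted}", strict=False)


real = network_of("172.20.12.226", "ffffff00")
alias = network_of("172.20.12.227", "ffffff00")
client = network_of("172.20.12.6", "ffffff00")

print(real, alias, client)
print("same subnet:", real == alias == client)
```

All three addresses come out as 172.20.12.0/24, so no router sits
between the client and either server address.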

x4500-04:~# netstat -in ; netstat -rn
Name    Mtu  Net/Dest      Address        Ipkts      Ierrs Opkts      Oerrs Collis Queue
lo0     8232 127.0.0.0     127.0.0.1      1411       0     1411       0     0      0
e1000g0 1500 172.20.12.0   172.20.12.226  2762497849 0     1789082372 0     0      0
e1000g1 1500 172.20.19.0   172.20.19.226  96059758   0     52485074   0     0      0


Routing Table: IPv4
   Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              172.20.12.1          UG        1      20456
172.20.12.0          172.20.12.226        U         1      45968 e1000g0
172.20.12.0          172.20.12.227        U         1          0 e1000g0:1
172.20.19.0          172.20.19.226        U         1       1662 e1000g1
224.0.0.0            172.20.12.226        U         1          0 e1000g0
127.0.0.1            127.0.0.1            UH        5        316 lo0


**********************************

NFS client: Sol 10 5/08
   Client IP        172.20.12.6 netmask ffffff00

# netstat -in ; netstat -rn
Name    Mtu  Net/Dest      Address        Ipkts      Ierrs Opkts      Oerrs Collis Queue
lo0     8232 127.0.0.0     127.0.0.1      2175       0     2175       0     0      0
e1000g0 1500 172.20.12.0   172.20.12.6    43315618   0     41987515   0     0      0
e1000g1 1500 172.20.11.0   172.20.11.6    19673254   0     13928826   0     0      0


Routing Table: IPv4
   Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
default              172.20.11.4          UG        1      52386
10.0.0.0             172.20.12.1          UG        1          0
172.16.0.0           172.20.12.1          UG        1        193
172.20.11.0          172.20.11.6          U         1       2406 e1000g1
172.20.12.0          172.20.12.6          U         1       3163 e1000g0
192.168.0.0          172.20.12.1          UG        1        120
224.0.0.0            172.20.12.6          U         1          0 e1000g0
127.0.0.1            127.0.0.1            UH        4       2046 lo0



*********************************



Snoop running on NFS Client 172.20.12.6 attempting to (re)mount volume 
with TCP:

# snoop -r host 172.20.12.227 or host 172.20.12.226 &
# mount /export/www
  172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT) 
vers=3 proto=UDP
172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=39049
  172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
172.20.12.226 -> 172.20.12.6  MOUNT3 R Null
  172.20.12.6 -> 172.20.12.227 MOUNT3 C Mount /export/www
172.20.12.226 -> 172.20.12.6  MOUNT3 R Mount OK FH=D402 Auth=unix
  172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100003 (NFS) 
vers=3 proto=TCP
172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=2049
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Syn Seq=788700586 
Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Syn Ack=788700587 
Seq=3596066619 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
0,nop,nop,sackOK>
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066620 
Seq=788700587 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.227 NFS C NULL3
172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Ack=788700707 
Seq=3596066620 Len=0 Win=49520
172.20.12.227 -> 172.20.12.6  NFS R NULL3
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066648 
Seq=788700707 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Fin Ack=3596066648 
Seq=788700707 Len=0 Win=49640
172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Ack=788700708 
Seq=3596066648 Len=0 Win=49640
172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Fin Ack=788700708 
Seq=3596066648 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066649 
Seq=788700708 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>


  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>


  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>

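Interesting: x4500-04 answers the PORTMAP and MOUNT3 requests from
172.20.12.226 (the real IP), even though the client sent them to the
alias 172.20.12.227. The TCP replies, on the other hand, do come from
172.20.12.227.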
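
The mismatch is easy to spot mechanically. Below is a rough sketch that
pairs each RPC call ("C") in snoop summary output with the next reply
("R") of the same protocol and reports replies whose source differs
from the address the call was sent to. It assumes the wrapped lines
have been joined back into one line per packet; the function name and
the regular expression are my own, not part of snoop.

```python
import re

# "SRC -> DST PROTO C|R rest"; plain TCP segments have no C/R field
# and are skipped by this pattern.
LINE = re.compile(r"^\s*(\S+)\s+->\s+(\S+)\s+(\S+)\s+([CR])\s+(.*)$")


def mismatched_replies(lines):
    """Return (protocol, called_address, replying_address) tuples for
    replies that did not come from the address that was called."""
    pending = {}  # protocol -> destination of the most recent call
    mismatches = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        src, dst, proto, kind, _rest = m.groups()
        if kind == "C":
            pending[proto] = dst
        elif proto in pending:
            called = pending.pop(proto)
            if src != called:
                mismatches.append((proto, called, src))
    return mismatches


trace = [
    "  172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT) vers=3 proto=UDP",
    "172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=39049",
    "  172.20.12.6 -> 172.20.12.227 MOUNT3 C Null",
    "172.20.12.226 -> 172.20.12.6  MOUNT3 R Null",
    "  172.20.12.6 -> 172.20.12.227 NFS C NULL3",
    "172.20.12.227 -> 172.20.12.6  NFS R NULL3",
]

for proto, called, replied in mismatched_replies(trace):
    print(f"{proto}: called {called}, reply came from {replied}")
```

On the lines above it flags PORTMAP and MOUNT3 but not the NFS NULL3
exchange, which ran over TCP and was answered from the alias.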



Packet capture on x4500-04:

# snoop -r host 172.20.12.6
Using device /dev/e1000g0 (promiscuous mode)
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Rst Ack=0 Seq=2924968134 
Len=0 Win=49640
172.20.12.227 -> 172.20.12.6  TCP D=664 S=2049 Rst Win=49640
  172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT) 
vers=3 proto=UDP
172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=39049
  172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
172.20.12.226 -> 172.20.12.6  MOUNT3 R Null
  172.20.12.6 -> 172.20.12.227 MOUNT3 C Mount /export/www
172.20.12.226 -> 172.20.12.6  MOUNT3 R Mount OK FH=D402 Auth=unix
  172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100003 (NFS) 
vers=3 proto=TCP
172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=2049
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Syn Seq=788700586 
Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Syn Ack=788700587 
Seq=3596066619 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
0,nop,nop,sackOK>
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066620 
Seq=788700587 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.227 NFS C NULL3
172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Ack=788700707 
Seq=3596066620 Len=0 Win=49520
172.20.12.227 -> 172.20.12.6  NFS R NULL3
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066648 
Seq=788700707 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Fin Ack=3596066648 
Seq=788700707 Len=0 Win=49640
172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Ack=788700708 
Seq=3596066648 Len=0 Win=49640
172.20.12.227 -> 172.20.12.6  TCP D=63800 S=2049 Fin Ack=788700708 
Seq=3596066648 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066649 
Seq=788700708 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6  TCP D=664 S=2049 Ack=2876021783 
Seq=3544124023 Len=0 Win=49640


  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6  TCP D=664 S=2049 Ack=2876021783 
Seq=3544124023 Len=0 Win=49640


  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0 
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6  TCP D=664 S=2049 Ack=2876021783 
Seq=3544124023 Len=0 Win=49640
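
To make the difference between the two connection attempts explicit,
here is a small sketch of the check a client applies to the reply to
its SYN: a valid SYN-ACK must carry the SYN flag and acknowledge the
client's sequence number plus one. The function name and return strings
are made up for illustration. A bare ACK with an unrelated number is
what a TCP stack that still holds state for an earlier connection on
the same port pair would typically send, but that reading is an
assumption on my part, not something the capture proves.

```python
def classify_reply_to_syn(syn_seq, reply_flags, reply_ack):
    """Classify the server's reply to a client SYN.

    syn_seq     -- sequence number of the client's SYN
    reply_flags -- set of flag names seen on the reply, e.g. {"Syn", "Ack"}
    reply_ack   -- acknowledgement number carried by the reply
    """
    expected_ack = (syn_seq + 1) % 2**32  # sequence numbers wrap at 2^32
    if "Rst" in reply_flags:
        return "reset"
    if "Syn" in reply_flags and reply_ack == expected_ack:
        return "handshake continues"
    return "not an answer to this SYN"


# Port 63800 (the NULL3 ping), which worked:
print(classify_reply_to_syn(788700586, {"Syn", "Ack"}, 788700587))

# Port 664 (the real mount connection), which hangs:
print(classify_reply_to_syn(2946510831, {"Ack"}, 2876021783))
```

The first call reports that the handshake continues; the second does
not, because 2876021783 is not 2946510831 + 1 and the reply carries no
SYN flag.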



*** Attempting mount using the real IP instead of the alias:


# mount -o vers=3,hard,intr,quota 172.20.12.226:/export/www /export/www
ssl01:/#
  172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100005 
(MOUNT) vers=3 proto=UDP
172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=39049
  172.20.12.6 -> 172.20.12.226 MOUNT3 C Null
172.20.12.226 -> 172.20.12.6  MOUNT3 R Null
  172.20.12.6 -> 172.20.12.226 MOUNT3 C Mount /export/www
172.20.12.226 -> 172.20.12.6  MOUNT3 R Mount OK FH=D402 Auth=unix
  172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100003 (NFS) 
vers=3 proto=TCP
172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=2049
  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Syn Seq=88322761 Len=0 
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.226 -> 172.20.12.6  TCP D=63802 S=2049 Syn Ack=88322762 
Seq=3700270536 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
0,nop,nop,sackOK>
  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270537 
Seq=88322762 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.226 NFS C NULL3
172.20.12.226 -> 172.20.12.6  TCP D=63802 S=2049 Ack=88322882 
Seq=3700270537 Len=0 Win=49520
172.20.12.226 -> 172.20.12.6  NFS R NULL3
  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270565 
Seq=88322882 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Fin Ack=3700270565 
Seq=88322882 Len=0 Win=49640
172.20.12.226 -> 172.20.12.6  TCP D=63802 S=2049 Ack=88322883 
Seq=3700270565 Len=0 Win=49640
172.20.12.226 -> 172.20.12.6  TCP D=63802 S=2049 Fin Ack=88322883 
Seq=3700270565 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270566 
Seq=88322883 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Rst Ack=0 Seq=3056789346 
Len=0 Win=49640
  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Syn Seq=1932893789 Len=0 
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.226 -> 172.20.12.6  TCP D=620 S=2049 Syn Ack=1932893790 
Seq=3700480396 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
0,nop,nop,sackOK>
  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480397 
Seq=1932893790 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.226 NFS C FSINFO3 FH=D402
172.20.12.226 -> 172.20.12.6  TCP D=620 S=2049 Ack=1932893946 
Seq=3700480397 Len=0 Win=49640
172.20.12.226 -> 172.20.12.6  NFS R FSINFO3 OK
  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480565 
Seq=1932893946 Len=0 Win=49640
  172.20.12.6 -> 172.20.12.226 NFS C FSSTAT3 FH=D402
172.20.12.226 -> 172.20.12.6  TCP D=620 S=2049 Ack=1932894102 
Seq=3700480565 Len=0 Win=49640
172.20.12.226 -> 172.20.12.6  NFS R FSSTAT3 OK
  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480737 
Seq=1932894102 Len=0 Win=49640

That works without issue. So it is not an NFS problem as such; it seems
to be related to alias IPs.

Do you know a way around this, or can you suggest a better place to
ask? As a quick fix we will forgo the alias IP and mount directly on
the "real" IP. But why does changing the protocol (TCP to UDP, or vice
versa) get around it, and why does rebooting the NFS client help as
well? Did we create the aliases wrong?

I apologise for the noise on the NFS discussion list.

Lund



Dai Ngo wrote:
> The problem seems to be on the TCP connection between the client and
> the nfsd on the server. The portmap and mount requests used UDP and
> they went OK.
> 
> There are a number of TCP RST packets sent from both the client and
> the server, which indicates that packets may have been lost, leaving
> the two sides out of sync.
> 
> It looks like the server has 2 NICs on the same subnet, 172.20.12.221
> and 172.20.12.220. Have you tried disabling 172.20.12.220 and using
> just 172.20.12.221 to see if it helps? What does the output of
> 'netstat -in' and 'netstat -rn' look like on the server and the
> client?
> 
> By the way, where were the packets captured, on the server or on the
> client? It is more useful if you can capture the packets on both sides
> and attach the raw capture files, so they can be compared and examined
> in more detail.
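
As a first pass at comparing the two sides, here is a sketch that
treats each capture as a bag of summary lines and lists what one side
saw and the other did not. It works on the snoop text summaries above
(with wrapped lines joined), not on raw capture files, and the function
names are my own.

```python
from collections import Counter


def normalize(line):
    """Collapse runs of whitespace so alignment differences do not matter."""
    return " ".join(line.split())


def only_in_first(first, second):
    """Return a Counter of packets present in `first` but missing from
    `second`, counting repeated (retransmitted) packets separately."""
    a = Counter(normalize(l) for l in first if l.strip())
    b = Counter(normalize(l) for l in second if l.strip())
    return a - b  # Counter subtraction keeps only positive counts


syn = ("  172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 "
       "Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>")
ack = ("172.20.12.227 -> 172.20.12.6  TCP D=664 S=2049 Ack=2876021783 "
       "Seq=3544124023 Len=0 Win=49640")

server_capture = [syn, ack, syn, ack, syn, ack]
client_capture = [syn, syn, syn]

for packet, count in only_in_first(server_capture, client_capture).items():
    print(f"{count}x seen on server only: {packet}")
```

For the port 664 attempts it shows the three bare ACKs from
172.20.12.227 on the server side only; they do not appear in the
client capture.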

-- 
Jorgen Lundman       | <lundman at lundman.net>
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo    | +81 (0)90-5578-8500          (cell)
Japan                | +81 (0)3 -3375-1767          (home)
