I'm having a similar problem from a Linux client to a sNV_b103 server.

For me, though, the mount works fine; it's the NFS accesses that hang.

Here's a snoop that shows what the server is seeing:

releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM NFS C GETATTR3 
FH=6C39
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM TCP D=800 
S=2049 Ack=659557530 Seq=471989561 Len=0 Win=53688 
Options=<nop,nop,tstamp 181189 2635184551>
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R GETATTR3 OK
releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM TCP D=2049 
S=800 Ack=471989677 Seq=659557530 Len=0 Win=512 Options=<nop,nop,tstamp 
2635184551 181189>
releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM NFS C ACCESS3 
FH=6C39 (read,lookup,modify,extend,delete)
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R ACCESS3 
OK (read,lookup)
releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM NFS C 
READDIRPLUS3 FH=6C39 Cookie=0 for 512/4096
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R 
READDIRPLUS3 OK 12 entries (No more)
releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM NFS C 
READDIRPLUS3 FH=6C39 Cookie=0 for 512/4096 (retransmit)
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM TCP D=800 
S=2049 Ack=659557830 Seq=471991813 Len=0 Win=53688 
Options=<nop,nop,tstamp 181209 2635184552>
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R 
READDIRPLUS3 OK 12 entries (No more)
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R 
READDIRPLUS3 OK 12 entries (No more)
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R 
READDIRPLUS3 OK 12 entries (No more)
releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM NFS C 
READDIRPLUS3 FH=6C39 Cookie=0 for 512/4096 (retransmit)
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM TCP D=800 
S=2049 Ack=659557990 Seq=471991813 Len=0 Win=53688 
Options=<nop,nop,tstamp 187188 2635244552>
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R 
READDIRPLUS3 OK 12 entries (No more)
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R 
READDIRPLUS3 OK 12 entries (No more)
releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM NFS C FSSTAT3 
FH=6C39
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R FSSTAT3 OK
releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM TCP D=2049 
S=800 Ack=471989801 Seq=659558126 Len=0 Win=512 Options=<nop,nop,tstamp 
2635279692 181189,nop,nop,sack 471993825-471993997>
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R 
READDIRPLUS3 OK 12 entries (No more)
releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM NFS C 
READDIRPLUS3 FH=6C39 Cookie=0 for 512/4096 (retransmit)
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM NFS R 
READDIRPLUS3 OK 12 entries (No more)
releng1.RelEng.Egenera.COM -> Galileo.RelEng.Egenera.COM NFS C 
READDIRPLUS3 FH=6C39 Cookie=0 for 512/4096 (retransmit)
Galileo.RelEng.Egenera.COM -> releng1.RelEng.Egenera.COM TCP D=800 
S=2049 Ack=659558286 Seq=471996009 Len=0 Win=53688 
Options=<nop,nop,tstamp 199208 2635364552>

This (to my uneducated eye) shows the server replying multiple times, and 
the client retransmitting the READDIRPLUS3 multiple times.
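
To put numbers on that impression, here is a rough sketch that tallies calls, retransmitted calls, and replies per NFS procedure from snoop text like the trace above. It assumes the summary format shown here (a "host -> host" prefix, "NFS C" / "NFS R" markers, and a "(retransmit)" tag) and that the mailer wrapped long records onto continuation lines. The function names and the "client"/"server" hostnames in the sample are made up for illustration.

```python
import re
from collections import Counter

def join_wrapped(lines):
    """Re-join snoop records that were wrapped across several lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # A new record starts with "host -> host"; anything else is
        # treated as a continuation of the previous record.
        if re.match(r"^\S+ -> \S+", line):
            records.append(line)
        elif records:
            records[-1] += " " + line
    return records

def summarize(records):
    """Count calls, retransmitted calls, and replies per NFS procedure."""
    counts = Counter()
    for rec in records:
        m = re.search(r"\bNFS ([CR]) (\w+)", rec)
        if not m:
            continue  # plain TCP ACKs carry no NFS marker
        kind, proc = m.groups()
        if kind == "C":
            key = "retransmit" if "(retransmit)" in rec else "call"
        else:
            key = "reply"
        counts[(proc, key)] += 1
    return counts

# Hypothetical sample in the same shape as the trace above.
sample = """\
client -> server NFS C
READDIRPLUS3 FH=6C39 Cookie=0 for 512/4096
server -> client NFS R
READDIRPLUS3 OK 12 entries (No more)
client -> server NFS C
READDIRPLUS3 FH=6C39 Cookie=0 for 512/4096 (retransmit)
server -> client NFS R
READDIRPLUS3 OK 12 entries (No more)
""".splitlines()

counts = summarize(join_wrapped(sample))
for (proc, key), n in sorted(counts.items()):
    print(proc, key, n)
```

If the replies keep pace with or outnumber the calls plus retransmits, that would be consistent with the server answering and the answers not reaching the client, though it doesn't prove where they are lost.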

I'm not familiar enough with Linux (yet) to run the equivalent of snoop 
(what is it? ethereal?) or I'd include traces from the client also.

The client and server are both IBM x346 eServers. Like the machines in 
the link below, both have BMC (like an LOM) modules to manage the machine. 
Also like the link below, these modules share the Ethernet port with one 
of the Broadcom (not Intel) Ethernet interfaces built into the motherboard. 
However, in this case:
 
1) Neither the server nor the client is using the shared Broadcom 
interface on the motherboard.
2) The client is using the other Broadcom interface on the MB.
3) The server is using an LACP aggr group (set up with dladm [with 
mtu=9000]) built up from 4 Intel e1000g interfaces on a PCI card.

So if packets are being lost on the return trip from the server to the 
client, I don't think it's for the same reason, though it may be similar.
Note this is on a ZFS filesystem, but from the traces above I'm not inclined 
to think that has anything to do with the problem.
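
As a sanity check on the "not the same reason" point, a small sketch like the one below can scan snoop records for the two ports mentioned in the quoted thread (623 and 664, the ASF RMCP and secure RMCP ports that a BMC may claim). It assumes snoop's "D=<port> S=<port>" notation with wrapped records already joined onto one line; the function names are made up for illustration.

```python
import re

# Ports a BMC may intercept: 623 (RMCP) and 664 (secure RMCP).
# Both numbers come from the thread quoted below.
BMC_PORTS = {623, 664}

def tcp_ports(record):
    """Return (dest_port, src_port) from a snoop TCP record, or None."""
    m = re.search(r"\bD=(\d+)\s+S=(\d+)", record)
    return (int(m.group(1)), int(m.group(2))) if m else None

def bmc_conflicts(records):
    """Return the set of BMC ports that show up in any record."""
    hits = set()
    for rec in records:
        ports = tcp_ports(rec)
        if ports:
            hits |= BMC_PORTS.intersection(ports)
    return hits

# Two records in the shape of the trace above (client port 800),
# and one in the shape of the quoted trace (client port 664).
mine = [
    "Galileo -> releng1 TCP D=800 S=2049 Ack=659557530 Seq=471989561 Len=0",
    "releng1 -> Galileo TCP D=2049 S=800 Ack=471989677 Seq=659557530 Len=0",
]
quoted = ["172.20.12.6 -> 172.20.12.226 TCP D=2049 S=664 Syn Seq=3507413975"]

print(bmc_conflicts(mine))    # no BMC port in use here
print(bmc_conflicts(quoted))  # port 664 shows up
```

In my trace the client side of the connection is port 800, so a check like this comes back empty, which fits with the problem here having a different cause.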

The server does have other Ethernet interfaces on other subnets. However, 
the testing above was careful to do the NFS mount with only the IP of 
the one interface. That interface is also the one used for the 
default route, and snoop was running on the others and showed zero 
incoming or outgoing traffic (to or from the client's IP) during this 
same period.

Anyone got any ideas?

  -Kyle





Jorgen Lundman wrote:
>
> I stumbled across this entry:
>
> http://blogs.sun.com/shepler/entry/port_623_or_the_mount
>
> We do not see this issue with port 623, but rather with port 664. 
> But sure enough, it was sending SYN/ACK, then timing out until RST.
>
> I waited for the port to be released, told inetd to listen on port 
> 664 and voila, mount works fine again.
>
> We use Supermicros with Intel 82573V and 82573L.
>
> I would send Shepler my thanks but comments are disabled.
>
>
>
> Useless logs:
>
>
> # mount -o proto=tcp,vers=3 172.20.12.226:/export/src /mnt
>
>  172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100005 (MOUNT) 
> vers=3 proto=UDP
> 172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=39049
>  172.20.12.6 -> 172.20.12.226 MOUNT3 C Null
> 172.20.12.226 -> 172.20.12.6  MOUNT3 R Null
>  172.20.12.6 -> 172.20.12.226 MOUNT3 C Mount /export/src
> 172.20.12.226 -> 172.20.12.6  MOUNT3 R Mount OK FH=076E Auth=unix
>  172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100003 (NFS) 
> vers=3 proto=TCP
> 172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=2049
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38337 Syn Seq=592414549 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.226 -> 172.20.12.6  TCP D=38337 S=2049 Syn Ack=592414550 
> Seq=2210245643 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
> 0,nop,nop,sackOK>
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38337 Ack=2210245644 
> Seq=592414550 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 NFS C NULL3
> 172.20.12.226 -> 172.20.12.6  TCP D=38337 S=2049 Ack=592414670 
> Seq=2210245644 Len=0 Win=49520
> 172.20.12.226 -> 172.20.12.6  NFS R NULL3
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38337 Ack=2210245672 
> Seq=592414670 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38337 Fin Ack=2210245672 
> Seq=592414670 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6  TCP D=38337 S=2049 Ack=592414671 
> Seq=2210245672 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6  TCP D=38337 S=2049 Fin Ack=592414671 
> Seq=2210245672 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38337 Ack=2210245673 
> Seq=592414671 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100003 (NFS) 
> vers=3 proto=TCP
> 172.20.12.226 -> 172.20.12.6  PORTMAP R GETPORT port=2049
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38338 Syn Seq=3614232918 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
> 172.20.12.226 -> 172.20.12.6  TCP D=38338 S=2049 Syn Ack=3614232919 
> Seq=2210460804 Len=0 Win=49640 Options=<mss 1460,nop,wscale 
> 0,nop,nop,sackOK>
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38338 Ack=2210460805 
> Seq=3614232919 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 NFS C NULL3
> 172.20.12.226 -> 172.20.12.6  TCP D=38338 S=2049 Ack=3614233039 
> Seq=2210460805 Len=0 Win=49520
> 172.20.12.226 -> 172.20.12.6  NFS R NULL3
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38338 Ack=2210460833 
> Seq=3614233039 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38338 Fin Ack=2210460833 
> Seq=3614233039 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6  TCP D=38338 S=2049 Ack=3614233040 
> Seq=2210460833 Len=0 Win=49640
> 172.20.12.226 -> 172.20.12.6  TCP D=38338 S=2049 Fin Ack=3614233040 
> Seq=2210460833 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=38338 Ack=2210460834 
> Seq=3614233040 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=664 Rst Ack=0 
> Seq=3456416233 Len=0 Win=49640
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=664 Syn Seq=3507413975 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
>
>
>  172.20.12.6 -> 172.20.12.226 TCP D=2049 S=664 Syn Seq=3507413975 
> Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
>
>
> # netstat
> 172.20.12.6.664      172.20.12.226.2049       0      0 49640      0 
> SYN_SENT
>
>
> After inetd hack:
>
> # netstat
>       *.664                *.*                0      0 49152      0 BOUND
>
> # mount -o proto=tcp,vers=3 172.20.12.226:/export/src /mnt
> 172.20.12.6 -> 172.20.12.226 TCP D=2049 S=661 Syn Seq=1448210229 Len=0 
> Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
>
> # df
> 172.20.12.226:/export/src
>                         24T    11G    24T     1%    /mnt
>
>
>
> Jorgen Lundman wrote:
>>
>> Ok, it still happens even when not using aliases, it just took longer 
>> to turn up.
>>
>> Attempting to mount (snoop running on NFS client)
>
>


