On Thu, 9 Apr 2009, Steven J. Yellin wrote:

We have two SL5.1 x86_64 systems running kernel 2.6.18-128.1.1.el5. I'll call them "A" and "B". Each exports two filesystems, and each runs amd to mount whatever filesystems are requested from elsewhere.

Filesystems requested from SL3.0.9 systems mount without problem, and filesystems requested from the SL5.1 systems also mounted without problem until recently. But recently, attempts to access from "A" a filesystem exported by "B", or from "B" a filesystem exported by "A", started being met with the message "Input/output error". Similar requests on an SL3.0.9 system to view an SL5.1-exported filesystem give "Permission denied". I'd appreciate advice. I'll give some more information below, and will be glad to add more depending on what others think might be useful.

The /etc/hosts.allow files allow portmap, mountd, rquotad, and statd to a set of computers including "A" and "B". Unless I've made a mistake, the firewall is open between "A" and "B".

The following is what went into /var/log/messages on "A" and "B" at the time of an attempt to look from "A" at a filesystem exported by "B", with a perhaps ineffectual paranoid attempt to maintain a low profile by replacing computer names and IPs with "A" and "B". On "A" at the time of the "Input/output error", a set of lines went to /var/log/messages, all beginning with "Apr 9 12:04:34 "A" amd[12252]: " and otherwise containing

get_nfs_version: returning NFS(3,tcp) on host "B"
get_nfs_version: returning NFS(3,udp) on host "B"
Using NFS version 3, protocol tcp on host "B"
initializing "B"'s pinger to 30 sec
creating mountpoint directory '/.automount/"B"/root'
file server "B", type nfs, state starts up
Flushed /net/"B"; dependent on "B"
recompute_portmap: NFS version 3 on "B"
Using MOUNT version: 3
amfs_host_mount: NFS version 3
fetch_fhandle: NFS version 3
mountd rpc failed: RPC: Can't decode result
fetch_fhandle: NFS version 3
mountd rpc failed: RPC: Can't decode result
/net/"B": mount (amfs_cont): Input/output error

On "B" at that time lines in messages.log began with "Apr 9 12:04:34 "B" mountd[9831]: " and otherwise contained:

authenticated mount request from "A":1023 for /data (/data)
authenticated mount request from "A":1023 for /scratch (/scratch)

Steven Yellin

To narrow the search I'd suggest seeing if a manual NFS mount from A to B (and vice versa) works.

If the manual mount works then we need to look more closely at how amd is differing from the manual mount, and if it doesn't we have excluded amd from the equation and should look at the nfs setup...
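As a rough sketch of the manual test (run as root on A, then the same on B pointing the other way), something like the following might do. The function name, the DRY_RUN switch and the /mnt/nfstest mountpoint are my own inventions for illustration; "B" stands in for the real hostname, /data is one of the exports named in the mountd log, and nfsvers=3,tcp is chosen only to match what amd reported negotiating.

```shell
# Sketch only: manual_nfs_mount, DRY_RUN and the mountpoint are hypothetical.
# With DRY_RUN=1 the command is printed instead of run, so it can be
# inspected before anything is mounted.
manual_nfs_mount() {
    host=$1; export_path=$2; mountpoint=$3
    opts="nfsvers=3,tcp"    # same version/protocol amd logged for "B"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "mount -t nfs -o $opts $host:$export_path $mountpoint"
    else
        mkdir -p "$mountpoint" &&
            mount -t nfs -o "$opts" "$host:$export_path" "$mountpoint"
    fi
}

# Example (replace B with the real hostname):
#   DRY_RUN=1 manual_nfs_mount B /data /mnt/nfstest
#   manual_nfs_mount B /data /mnt/nfstest && ls /mnt/nfstest
#   umount /mnt/nfstest
```

If the mount fails, the error mount prints is usually more specific than amd's "Input/output error", which is the point of trying it by hand.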

The next step (whether the manual mount works or not) may well be to check /var/log/secure for relevant (e.g. blocking) messages and run

 rpcinfo -p

against A and B to see that all the expected sunrpc services are registered and what ports they are listening on (e.g. in case those are being blocked somewhere...)
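To make the rpcinfo output quicker to compare between A and B, a small filter along these lines could list which services are not registered. The function name and the list of "expected" services are assumptions on my part (the names are the ones rpcinfo normally prints in its last column for an NFS server); adjust the list to whatever you actually run, e.g. add rquotad.

```shell
# Sketch only: missing_rpc_services is a hypothetical helper.
# Reads `rpcinfo -p` output on stdin and prints each expected service
# name that does not appear in the fifth (service name) column.
missing_rpc_services() {
    awk -v want="portmapper mountd nfs nlockmgr status" '
        NF >= 5 { seen[$5] = 1 }          # remember every registered name
        END {
            n = split(want, names, " ")
            for (i = 1; i <= n; i++)
                if (!(names[i] in seen)) print names[i]
        }'
}

# Example (replace B with the real hostname):
#   rpcinfo -p B | missing_rpc_services
```

No output means everything in the list is registered; the port numbers in the raw rpcinfo output are still worth reading by eye to compare against the firewall rules.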

btw from the error '...mountd...RPC: Can't decode result' it *sounds* like amd doesn't like (or can't understand) the reply it is getting from mountd - but that could be a problem with either mountd or amd...

BTW do you have a spare box to try as a 3rd sl5 machine 'C'?

--
/--------------------------------------------------------------------\
| "Computers are different from telephones.  Computers do not ring." |
|       -- A. Tanenbaum, "Computer Networks", p. 32                  |
---------------------------------------------------------------------|
| Jon Peatfield, _Computer_ Officer, DAMTP,  University of Cambridge |
| Mail:  jp...@damtp.cam.ac.uk     Web:  http://www.damtp.cam.ac.uk/ |
\--------------------------------------------------------------------/
