This isn't exactly a Red Hat issue, but the box happens to be a Red Hat box
and I've had no luck looking elsewhere.  Bear with me, the problem is a little
convoluted.

We're running a Red Hat 6.0 box with the latest-but-one version of NetAtalk,
and serving Mac clients.  Some of the filesystems that the machine exports
are local, while some are mounted from a Solaris 2.6 machine via NFS.

This setup worked fine up until a few weeks ago, when we had a variety of
other network hiccups involving the NFS server.  I managed to fool around and
get most of the problems solved, but then gradually Atalk file sharing went
to hell.

The symptoms: one by one, users began to lose _all_ access to their Atalk-
shared files.  No write, no copy, no create, no read.  The error message is
either "file in use elsewhere" or "insufficient permissions".

In at least one case, I've verified that the user isn't trying to log in from
more than one Mac, and he does have all the right file permissions on the
server.  fuser doesn't report any of his files being in use.

I think this is the relevant section of /var/log/messages:

Oct  3 18:30:21 kgb-fs-pc afpd[15480]: login ejvanlen (uid 15951, gid 15951)
Oct  3 18:30:30 kgb-fs-pc kernel: RPC: doubly enqueued task! 
Oct  3 18:30:30 kgb-fs-pc kernel: RPC: failed to add task to queue: error: -11!
Oct  3 18:30:30 kgb-fs-pc kernel: statd: couldn't bind to server localhost - giving up.
Oct  3 18:30:30 kgb-fs-pc kernel: RPC: task of released request still queued! 
Oct  3 18:30:30 kgb-fs-pc kernel: RPC: (task is on xprt_pending) 
Oct  3 18:30:30 kgb-fs-pc kernel: lockd: failed to monitor 128.135.84.79 
Oct  3 18:30:38 kgb-fs-pc kernel: RPC: doubly enqueued task! 
Oct  3 18:30:38 kgb-fs-pc kernel: RPC: failed to add task to queue: error: -11! 

...and so forth.  This has all the cosmetic properties of an NFS problem,
but as it happens I can shell into the server and look around in the guy's
files with no trouble at all, so apparently it's some kind of NFS/NetAtalk
interaction.  The 128.135.84.79 address is our NFS server.
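
For anyone who wants to see how the kernel complaints break down, here is a rough sketch that tallies them by subsystem (RPC, statd, lockd). The sample lines are copied from the excerpt above; the regular expression and the tally() helper are my own illustration and assume the usual "date host tag: message" syslog layout, nothing more.

```python
import re
from collections import Counter

# Sample lines copied from the /var/log/messages excerpt in this post.
SAMPLE = """\
Oct  3 18:30:21 kgb-fs-pc afpd[15480]: login ejvanlen (uid 15951, gid 15951)
Oct  3 18:30:30 kgb-fs-pc kernel: RPC: doubly enqueued task!
Oct  3 18:30:30 kgb-fs-pc kernel: RPC: failed to add task to queue: error: -11!
Oct  3 18:30:30 kgb-fs-pc kernel: statd: couldn't bind to server localhost - giving up.
Oct  3 18:30:30 kgb-fs-pc kernel: RPC: task of released request still queued!
Oct  3 18:30:30 kgb-fs-pc kernel: RPC: (task is on xprt_pending)
Oct  3 18:30:30 kgb-fs-pc kernel: lockd: failed to monitor 128.135.84.79
Oct  3 18:30:38 kgb-fs-pc kernel: RPC: doubly enqueued task!
Oct  3 18:30:38 kgb-fs-pc kernel: RPC: failed to add task to queue: error: -11!
"""

# Assumed syslog layout: "Mon DD HH:MM:SS host tag[pid]: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) ([^:\[]+)(?:\[\d+\])?: (.*)$"
)


def tally(lines):
    """Count kernel messages by the subsystem named before the first colon."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a plain syslog line
        _when, _host, tag, message = match.groups()
        if tag != "kernel":
            continue  # afpd logins and the like are not interesting here
        subsystem = message.split(":", 1)[0]
        counts[subsystem] += 1
    return counts


if __name__ == "__main__":
    for subsystem, count in tally(SAMPLE.splitlines()).most_common():
        print(subsystem, count)
```

To run it against the real log, pass open("/var/log/messages") to tally() instead of the sample.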

Things I've tried:

- Massive reboot: the networking hiccups of a few weeks ago seemed to leave some
machines' networking stacks in a state of confusion, such that the trouble
didn't go away until the system was restarted.  We've tried that on the server
and on the clients with this problem, and so far no dice.

- Turning off lockd on the NFS server: tried that this afternoon at the wild-
guess suggestion of a colleague.  So far no visible effect.

- Relocating files: thinking that some process might have decided that the
files were actually in use, I copied them to a temp directory and had the user
try again; no luck.

Things I have not tried yet:

- Massive upgrade: Red Hat 6.0 is buggy, buggy, buggy, and Linux NFS has never
had a great rep.  I'd upgrade the box except that I'm up to my ears in other
stuff right now and I don't want to take that kind of time unless I'm fairly
sure I know what's wrong.

- Setting the whole kit on fire.

Actual questions:

- What does _any_ of that log file mean?  "statd: couldn't bind to server
localhost"??  What does that mean?  Failed to monitor the NFS server?  Wha?
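
My working guess, and it is only a guess, is that the kernel's lockd asks a status daemon (statd) on this same machine to monitor the NFS server, and that "couldn't bind to server localhost" means it couldn't reach that daemon through the local portmapper. Here is a minimal sketch of one sanity check under that assumption: it only tests whether anything accepts TCP connections on port 111 (the standard portmapper port) on localhost. The tcp_port_open() helper is my own, and a successful connect says nothing about whether statd itself is registered.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable host, and so on.
        return False


if __name__ == "__main__":
    # 111 is the well-known portmapper port; "localhost" is taken from
    # the statd message in the log.
    print("portmapper on localhost reachable:",
          tcp_port_open("localhost", 111))
```

If that comes back False, the statd message would at least make sense; if True, the problem presumably lies further along.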

Thanks for any thoughts, friends, I'm stumped here and feeling like an idiot.

-m

-- 
Michael Jinks, IB
Systems Administrator, CCCP
finger [EMAIL PROTECTED] for public key
Vote Duke! http://www.entertaindom.com/pages/duke2000/home.jsp


