Hi,

We're seeing odd behavior on our AFS cluster, which has 2 fileservers and
200+ active clients. Each client generally reads a couple hundred of the
same files every minute (so they should be cached). We have begun seeing
messages like this in FileLog:

Tue Apr  3 00:32:04 2012 CB: ProbeUuid for host 0x--- (---:7001) failed -01

Over time these errors become more and more frequent. The problem is that
a client that hits this issue experiences a 5-10s delay in accessing a
file, which hurts performance significantly. The clients are 1.6pre1, and
the server is 1.4.14.
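
To put numbers on "more and more frequent", the failures can be counted per
hour. Below is a minimal sketch, assuming every matching FileLog line has the
same ctime-style timestamp and wording as the example above, and English day
and month names. The function name is only illustrative, and other FileLog
formats would need a different pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape, taken from the example message above:
#   Tue Apr  3 00:32:04 2012 CB: ProbeUuid for host ... failed -01
PROBE_FAIL = re.compile(
    r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) "
    r"CB: ProbeUuid for host .* failed"
)


def count_probe_failures_per_hour(lines):
    """Return {hour_start: count} for ProbeUuid failure lines."""
    counts = Counter()
    for line in lines:
        match = PROBE_FAIL.match(line)
        if match is None:
            continue  # not a ProbeUuid failure line
        # Collapse the padded day ("Apr  3") to single spaces before parsing.
        stamp_text = " ".join(match.group(1).split())
        stamp = datetime.strptime(stamp_text, "%a %b %d %H:%M:%S %Y")
        counts[stamp.replace(minute=0, second=0)] += 1
    return dict(sorted(counts.items()))
```

Passing it the lines of FileLog (for example from an open file object) gives
one count per hour, which makes it easy to see how quickly the failure rate
is growing.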

Using afsmonitor, I see that one of the clients hitting this issue (I
haven't checked whether all clients have the problem, but many seem to) has
17M callbacks allocated. Could that be suspect? Are there any other
statistics I can provide to get to the bottom of this?

Here are the fileserver parameters in BosConfig:

parm /usr/lib/openafs/fileserver -L -p 200 -busyat 600 -rxpck 1000 -s 3000 -l 3000 -cb 1000000 -b 500 -vc 4800 -pctspare 5

Here are the client options:

OPTIONS="-cachedir /mnt/cache/openafs -daemons 16 -stat 50000 "

Best,
Ken
