david l goodrich wrote:
On Tue, Mar 24, 2009 at 10:39:24AM -0700, Russ Allbery wrote:
david l goodrich <[email protected]> writes:
The past two nights, I've had one of my AFS fileservers go "down".
I say "down" and not down because it's not totally nonfunctional.
It thinks it's running fine:
sprawl# bos status localhost -localauth
Instance fs, currently running normally.
Auxiliary status is: file server running.
bos status -long is generally more useful. However:
Can do:
sprawl# bos status localhost -localauth -long
Instance fs, (type is fs) currently running normally.
Auxiliary status is: file server running.
Process last started at Mon Mar 23 17:33:57 2009 (3 proc starts)
Last exit at Mon Mar 23 17:33:57 2009
Command 1 is '/usr/pkg/libexec/openafs/fileserver'
Command 2 is '/usr/pkg/libexec/openafs/volserver'
Command 3 is '/usr/pkg/libexec/openafs/salvager'
sprawl# ps auxw | grep /openafs/
root 376 0.0 0.0 2316 4 ? DW 5:33PM 0:00.83 /usr/pkg/libexec/openafs/volserver
root 727 0.0 0.0 8664 2384 ? IW<a 5:33PM 0:18.29 /usr/pkg/libexec/openafs/fileserver
root 6739 0.0 0.0 240 4 ttyp0 R+ 12:42PM 0:00.00 grep /openafs/ (ksh)
sprawl#
But none of the clients (running 1.4.8 and 1.4.6) are able to
connect to the volumes on the server, despite believing that it is up:
d...@chaos:~$ fs checkservers -fast -all
All servers are running.
d...@chaos:~$ vos listvol sprawl
Could not fetch the list of partitions from the server
Possible communication failure
Error in vos listvol command.
Possible communication failure
I suspect your volserver either died or went unresponsive. What version
of OpenAFS is the fileserver? Is there anything incriminating in
VolserLog or FileLog?
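
One way to tell a dead volserver from an unresponsive one is to probe its Rx port directly. The following is only a sketch, assuming rxdebug is installed on the machine running it and that the servers listen on the default OpenAFS ports (7000 for the fileserver, 7005 for the volserver). The function names afs_port and probe_afs_server are made up for illustration.

```shell
#!/bin/sh
# Sketch only: afs_port and probe_afs_server are hypothetical helper names.
# Assumes rxdebug is in PATH and the default OpenAFS ports are in use.

# Map a server process name to its default Rx port.
afs_port() {
    case $1 in
        fileserver) echo 7000 ;;
        volserver)  echo 7005 ;;
        *) return 1 ;;
    esac
}

# Ask the given service on the given host for its version string.
# A prompt answer suggests the process is alive and answering Rx packets;
# a timeout suggests it is hung or gone.
probe_afs_server() {
    host=$1
    service=$2
    port=$(afs_port "$service") || {
        echo "unknown service: $service" >&2
        return 2
    }
    rxdebug -servers "$host" -port "$port" -version
}
```

For the case in this thread that would be "probe_afs_server sprawl volserver", followed by the same probe for fileserver to compare.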
I should have been more clear - sprawl is the fileserver, it is
running 1.4.6. There doesn't seem to be anything incriminating
in FileLog, but let me turn up debugging on the volserver process
on sprawl.
Turning on debugging (pkill -TSTP volserver) didn't do much of
anything - VolserLog hasn't been updated since 17:34 yesterday.
It's short:
sprawl# cat VolserLog
Mon Mar 23 17:33:57 2009 Unable to connect to file server; will retry at need
Mon Mar 23 17:33:57 2009 Starting AFS Volserver 2.0 (/usr/pkg/libexec/openafs/volserver)
sprawl#
Did you run kill -TSTP volserver and fileserver 5 times each? That turns
on the maximum amount of debugging.
Thanks,
Jason
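
The repeated-signal step suggested above can be scripted. This is a minimal sketch, assuming (as the message says) that five TSTP signals reach the maximum debug level, and using pkill as earlier in the thread. The function name raise_debug and the SENDER and DELAY variables are invented for this example. The short pause between signals is a precaution in case back-to-back signals get coalesced.

```shell
#!/bin/sh
# Sketch only: raise_debug, SENDER and DELAY are hypothetical names.
# Assumption (from the thread): each TSTP raises the debug level one step,
# and five signals reach the maximum.

SENDER=${SENDER:-pkill}   # command used to deliver the signal
DELAY=${DELAY:-1}         # seconds to wait between signals

# Send TSTP to every process matching the given name, count times (default 5).
raise_debug() {
    proc=$1
    count=${2:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        "$SENDER" -TSTP "$proc" || return 1
        i=$((i + 1))
        sleep "$DELAY"
    done
}
```

Run as root on the fileserver machine: "raise_debug volserver" and then "raise_debug fileserver", after which VolserLog and FileLog should start receiving more detail if the processes are still handling signals.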
_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info