Josh Fiske <[EMAIL PROTECTED]> writes:

> We have a cell with three older AFS servers (1.2.11). They have been
> running great for quite some time. However, twice in the past two weeks
> the Volserver has stopped responding on one of the servers. When this
> happens, if I do a 'bos status' on the server, it tells me that
> everything is running normally. But, I know from trying to do a 'vos
> listvol' on the server, that things are not normal, because it times
> out. Both times this has happened, the server that the volserver died
> on was the sync site for the cell.

The volserver or the vlserver? I'm only confused because you mention sync
sites, and I'm used to this being a volserver problem, which doesn't have
a sync site.

If you do mean volserver, this is a 1.2.11 bug. I think it was fixed in
1.2.13; it's definitely fixed in 1.4.0.

> Also of note, we have quite a few volumes that are replicated. When the
> volserver died on the sync site, the read-only replicas were no longer
> accessible. If a read-only replica is unavailable on one server,
> shouldn't the client know to try one of the others? I thought this was
> the whole point of replication.

Clients fail over if the server is completely off-line, but don't always
fail over if the server responds to Rx pings but nothing else,
unfortunately.

--
Russ Allbery ([EMAIL PROTECTED]) <http://www.eyrie.org/~eagle/>
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
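
Since 'bos status' can report everything as normal while the volserver
is hung, one way to notice the hang is to probe the server with the same
'vos listvol' call described above and treat a timeout as a failure. The
following is only a rough sketch, not anything that ships with OpenAFS:
the function name, the 30-second timeout, and the host name in the usage
line are assumptions made for illustration, and it assumes the 'vos'
binary is on the PATH.

```python
import subprocess


def volserver_responds(server, timeout=30, run=subprocess.run):
    """Return True if 'vos listvol' against `server` completes in time.

    `timeout` (seconds) is an arbitrary illustrative value. `run` is
    injectable only so the check can be exercised without a real cell.
    """
    cmd = ["vos", "listvol", "-server", server]
    try:
        result = run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # The hang described in this thread: the command never returns.
        return False
    except FileNotFoundError:
        # 'vos' is not installed on this machine.
        return False
    return result.returncode == 0
```

Usage would be something like volserver_responds("afs1.example.org")
from a cron job or monitoring script, with an alert when it returns
False. It only detects the hang; it does nothing to make clients fail
over to another replica.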
