On our Debian buster servers, the OpenAFS client (1.8.2) has a problem obtaining an AFS token: when I attempt to get one, it very often takes a long time.

$ aklog      (this can take 30 seconds or more)

After some investigation, it looks like aklog is trying the AFS DB servers listed in /etc/openafs/CellServDB and timing out on some of them. Here are the relevant contents of that file:

>example.com           # My Company
192.168.1.102                    #afsdb1.example.com
192.168.1.104                    #afsdb2.example.com
192.168.1.106                    #afsdb3.example.com

Running aklog while sniffing the network, I see that the client attempts to contact one of the three afsdb servers. If the one it contacts first is afsdb2 or afsdb3, the connection attempt fails, and the client eventually gives up and tries another one. If the second one it tries is also afsdb2 or afsdb3, it gives up again and tries the only remaining one, afsdb1. In other words:

afsdb3 (fail), afsdb2 (fail), afsdb1 (succeeds)
afsdb2 (fail), afsdb3 (fail), afsdb1 (succeeds)
afsdb3 (fail), afsdb1 (succeeds)
afsdb2 (fail), afsdb1 (succeeds)
afsdb1 (succeeds)

This sounds like both afsdb2 and afsdb3 are simply not working. However...

If I remove afsdb1 and afsdb2 from CellServDB, leaving only afsdb3, it works instantly every time! That is, the following CellServDB works without delay:

>ir.example.com           # My Company
192.168.1.106                    #afsdb3.example.com

Similarly, if afsdb2 is the only entry in CellServDB, running aklog works without delay. So it cannot be that afsdb2 and afsdb3 are completely broken.
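
For anyone who wants to repeat the per-server check without editing CellServDB each time, here is a rough sketch of how each DB server could be probed on its own. The function names cell_db_servers and probe_db_servers are made up for illustration. Port 7002 (the protection server) is only my assumption about which service aklog is waiting on; I have not confirmed that from the packet capture.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of OpenAFS.

# Print the DB server addresses listed for one cell in a
# CellServDB-format file.
# Usage: cell_db_servers CELL FILE
cell_db_servers() {
    awk -v cell="$1" '
        /^>/ {
            # A ">cellname" line starts a new cell block.
            name = substr($1, 2)
            in_cell = (name == cell)
            next
        }
        # Inside the wanted cell, the first field is the server address.
        in_cell && NF > 0 { print $1 }
    ' "$2"
}

# Ask each DB server for its version, one server at a time, so a slow
# or unreachable server shows up individually.
# ASSUMPTION: port 7002 (ptserver) is the service worth probing.
# Usage: probe_db_servers CELL FILE
probe_db_servers() {
    for addr in $(cell_db_servers "$1" "$2"); do
        echo "== $addr"
        rxdebug -servers "$addr" -port 7002 -version
    done
}

# Example (not run automatically):
# probe_db_servers example.com /etc/openafs/CellServDB
```

If a server answers promptly here but aklog still stalls on it, the problem is more likely in how the client picks or reaches that server than in the server being down.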

The AFS DB servers are running OpenAFS version 1.6.9.

What the heck is going on?
_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
