On Thu, 23 Jan 2014 14:58:35 +0000 p...@afs.list.sabi.co.uk (Peter Grandi) wrote:
> The issue is that with 'server/CellServDB' update there is
> potentially a DB daemon (PT, VL) restart (even if the rekeying
> instructions hint that when the mtime of 'server/CellServDB'
> changes the DB daemons reread it) and in any case a sync site
> election.

The daemons do reread the local configuration if the CellServDB mtime
changes. But they don't reinitialize the voting algorithm data and rx
connections etc that would be required to incorporate a new dbserver
into the quorum. So, for that you need to restart, yes.

> > You would need to keep the server-side CellServDB accurate on
> > the dbservers in order for them to work, but the client
> > CellServDB files can be missing dbservers. [ ... ]
>
> It would be nice to know more about the details here to make
> planning easier in future updates.

I'm not sure what additional details you want. You just always make sure
the client CellServDB doesn't refer to dbservers that don't exist. So,
when you add a new dbserver, don't add it to the client CellServDB until
it's up and running. And when you remove a dbserver, remove it from the
client CellServDB before decommissioning it.

> For example in an ideal world putting more or less DB servers in
> the client 'CellServDB' should not matter, as long as one that
> belongs to the cell is up; again if the logic were for all types
> of client: "scan quickly the list of potential DB servers, find
> one that is up and belongs to the cell and reckons is part of
> the quorum, and if necessary get from it the address of the sync
> site".

There is an idea we had pending for performing a VL_ProbeServer multi_rx
call on 'vos' startup to see which servers are up before doing anything.
The possible argument against this is that it adds a little bit of load
and a little bit of delay on every operation, even if all of the servers
are up. But maybe it's worth it.

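To make the "probe everything first" idea concrete, here is a rough
sketch in Python. It is not how 'vos' or rx work internally: probe_all
and fake_probe are hypothetical names, the addresses are documentation
placeholders, and the probe callable stands in for whatever would
actually issue VL_ProbeServer. The only point is the shape of the
approach: ask every listed dbserver at once, bound the wait with a
timeout, and only talk to the ones that answered.

```python
from concurrent.futures import ThreadPoolExecutor


def probe_all(servers, probe, timeout=2.0):
    """Probe every server in parallel; return the ones that answered.

    `probe` is a hypothetical callable (server, timeout) -> bool that is
    assumed to honour the timeout itself. Any exception it raises is
    treated as "server is down". Input order is preserved.
    """
    def safe_probe(server):
        try:
            return bool(probe(server, timeout))
        except Exception:
            return False

    if not servers:
        return []
    # One worker per server, so total delay is roughly one timeout,
    # not one timeout per unreachable server.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        results = list(pool.map(safe_probe, servers))
    return [s for s, up in zip(servers, results) if up]


def fake_probe(server, timeout):
    """Stand-in probe for illustration: one server never answers."""
    if server == "192.0.2.12":
        raise TimeoutError("no answer")
    return True


if __name__ == "__main__":
    cell = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
    print(probe_all(cell, fake_probe))
```

The cost mentioned above shows up directly here: every invocation pays
for one round of probes, even when all of the servers are up.
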
Another possible optimization that can be made is that ubik-using
utilities could try the lowest-ip dbserver first when doing something
that requires db write access (or just randomly pick a site from the
lowest "half+1" of the quorum), which would speed up the process in a
majority of cases. The argument against that, of course, is that the
"lowest IP" heuristic may not always apply in future implementations of
ubik, and in general it can make the minority of cases worse (when the
lower IPs are unreachable).

-- 
Andrew Deason
adea...@sinenomine.net

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info