On 1/17/2017 3:45 PM, Stephen Joyce wrote:
> I know the current best-practice for changing the IP addresses of AFS
> database servers is don't do it.
> 
> But assuming that I want/need to change IPs and have available hardware,
> is the use of clone dbservers the preferred method? I can tolerate short
> service interruptions of up to a few minutes as long as they're planned
> for low-utilization times.

um, not really.

> Initial condition is 3 dbservers ("OLD") located via AFSDB & SRV,

I assume these are the servers listed in the CellServDB file
distributed from

  http://www.central.org/csdb.html

and included in every OpenAFS distribution.

> running 1.6.x. Desired final condition is 3 dbservers ("NEW") with
> different IP addresses, also running 1.6.x (for now).

The first thing to be aware of is that any entries in the CellServDB
file take precedence over information provided via DNS.  For recent
OpenAFS releases the precedence order is

 * CellServDB file
 * DNS SRV
 * DNS AFSDB
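
As a rough illustration of that precedence order, here is a minimal
Python sketch. The function name resolve_dbservers and the three
dictionaries are hypothetical stand-ins for the real lookups, not
OpenAFS code, and the addresses are documentation-range placeholders.

```python
def resolve_dbservers(cell, cellservdb, dns_srv, dns_afsdb):
    """Return (source, servers) from the first source that knows the cell.

    Each argument after `cell` is a hypothetical dict mapping a cell name
    to a list of server addresses; the sources are tried in precedence order.
    """
    for source, table in (("CellServDB", cellservdb),
                          ("DNS SRV", dns_srv),
                          ("DNS AFSDB", dns_afsdb)):
        servers = table.get(cell)
        if servers:
            return source, servers
    return None, []


# Placeholder data: example.org appears in both CellServDB and DNS.
cellservdb = {"example.org": ["192.0.2.10"]}
dns_srv = {"example.org": ["198.51.100.20"], "example.net": ["198.51.100.30"]}
dns_afsdb = {"example.net": ["203.0.113.40"]}

print(resolve_dbservers("example.org", cellservdb, dns_srv, dns_afsdb))
print(resolve_dbservers("example.net", cellservdb, dns_srv, dns_afsdb))
```

The point of the sketch is that a stale CellServDB entry hides whatever
DNS says, so changing DNS alone does not move such clients.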

The Unix cache manager only uses the IPv4 addresses that are provided in
the CellServDB file, whereas the Windows cache manager only uses the
host name and performs a DNS A query on the name to obtain the IP
address to use.
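
To make the difference concrete, here is a small Python sketch that
reads CellServDB-style text and pulls out the two views. The cell name,
host names and addresses are made up, and parse_cellservdb is a
hypothetical helper rather than anything shipped with OpenAFS.

```python
def parse_cellservdb(text):
    """Parse CellServDB-style text into {cell: [(address, hostname), ...]}.

    Assumes the usual layout: a '>cellname  #comment' line starts a cell,
    and each following 'address  #hostname' line names one dbserver.
    """
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split("#", 1)[0].split()
            current = fields[0] if fields else None
            if current is not None:
                cells[current] = []
        elif current is not None:
            addr, _, host = line.partition("#")
            cells[current].append((addr.strip(), host.strip()))
    return cells


SAMPLE = """\
>example.org            #Hypothetical example cell
192.0.2.10              #afsdb1.example.org
192.0.2.11              #afsdb2.example.org
"""

cells = parse_cellservdb(SAMPLE)
unix_view = [addr for addr, _ in cells["example.org"]]     # addresses only
windows_view = [host for _, host in cells["example.org"]]  # names, resolved via DNS A
print(unix_view)
print(windows_view)
```

So a Unix client keeps using the old address until its CellServDB entry
changes, while a Windows client follows whatever the A record for the
listed host name currently returns.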

The CellServDB file contains entries for physics.unc.edu but not
cas.unc.edu, although physics.unc.edu lists the same DB servers as
cas.unc.edu.

The second thing to be aware of is that a UBIK quorum is defined by the
set of dbservers that share a common configuration.  Running OpenAFS
UBIK servers with a mixture of configurations can lead to more than one
dbserver believing it is the master.

The UBIK clone servers are interesting because they are documented as
being non-voting.  That isn't exactly true.  Every UBIK dbserver must
maintain connectivity with every other UBIK dbserver in its
configuration.  What is special about clones is not that they don't vote
but that

 1. they cannot vote for themselves
 2. their votes for other servers are received and then discarded
 3. a clone cannot be the source of the best database.

Many sites have experienced problems with UBIK quorums consisting of
more than 3 servers.  Some sites have successfully run with as many as 5
servers.  It really depends on the number of clients and the
average rate of application RPCs (VL, PT, ...).

The primary benefit of using clones in OpenAFS is when you wish to
prevent a server with a low IPv4 address from being elected the
coordinator (aka sync site).
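
A deliberately simplified Python model of that effect follows. It only
captures the lowest-address preference among non-clone servers; real
UBIK elections also depend on votes, timing and connectivity, none of
which is modelled here. The function name and the addresses are
hypothetical.

```python
import ipaddress


def preferred_coordinator(servers):
    """Return the lowest IPv4 address among non-clone servers, or None.

    `servers` is a list of (ipv4_string, is_clone) pairs. Addresses are
    compared numerically, not as strings.
    """
    candidates = [ipaddress.IPv4Address(ip)
                  for ip, is_clone in servers if not is_clone]
    return str(min(candidates)) if candidates else None


# The server with the lowest address is marked as a clone,
# so it is skipped and the next lowest non-clone is preferred.
with_clone = [("192.0.2.5", True),
              ("198.51.100.7", False),
              ("198.51.100.9", False)]
without_clone = [(ip, False) for ip, _ in with_clone]

print(preferred_coordinator(with_clone))
print(preferred_coordinator(without_clone))
```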

> I'm roughing out a procedure, but my current thinking involves..
>
>  add 3 NEW dbservers as r/o clones (restarting db procs)

I don't believe that using clones at this stage is helpful.

Also, you should leave all of the DB servers shut down for at least 90
seconds when modifying the configuration.

>  modify DNS to show all 6 IPs.
>  'fs newcell' or restart all afsd's (including on servers)

You will also need to update the configuration and restart the
fileservers.  The fileservers are clients of the PT and VL servers but
use the server CellServDB file for their server info.

>  swap clone/non-clone roles so that NEW dbservers are r/w and OLD
> dbservers are r/o clones (restarting db procs). At this point, sync must
> be a non-clone, r/w "NEW" server. 

Using clones to prevent the old servers from becoming coordinator is the
proper use.  You might want to consider only leaving one of the old
servers running at this point.  Be sure to shut down all dbservers when
the configuration is changed.

> Verify with udebug. Any client afsd's
> not restarted/newcell'ed won't be able to make pt/vl changes.

When the fileservers start, they modify their VL entries. If their
CellServDB files are not updated as well, they won't be able to register.

>  modify DNS to show only 3 NEW IPs
>  'fs newcell' or restart of all afsd's (including on servers)
> 
>  remove 3 OLD dbservers which must be r/o clones (restarting db procs).
> Any client afsd's not restarted/newcell'ed won't be able to query
> pt/vlservers.

correct.

> Because it could take some time to restart/newcell all clients, I'm
> thinking of doing the clone addition/dns steps then waiting some time
> (week+) before doing the role swap and second dns change. Then waiting
> another period of time (week+) before doing the last removal.
> 
> I'm assuming that I can use -auditlog (or even a packet sniffer) to see
> what clients might still be using the OLD dbservers prior to the final
> decommissioning.

rxdebug <dbserver> <port> -peer

> Seems a bit too simple. What am I missing?

Good luck.

Jeffrey Altman
