[OpenAFS] Shutdown/startup of entire cell

2013-01-10 Thread Garance A Drosihn

Hi.

Due to circumstances way beyond my control (a major network
upgrade), I am going to need to shut down our entire AFS cell
this Saturday.  So, that is less than 36 hours from now.

Basically all our fileservers use disks which are connected
via iSCSI, and the network upgrade may sever all network
connectivity between the AFS fileservers and their vice*
partitions for at least one hour, and possibly two.  Thus I
expect it would be wise to shut down all fileservers.

And if I'm shutting down all our fileservers, I assume I
should also shut down all database servers.  (True?)

The one nice thing is that I can do a controlled shutdown, in
whatever order seems appropriate.  So I have two questions:

1) When shutting down, should all database servers be shut down
   before the fileservers, or should the fileservers be shut down
   first?

2) When starting up, should the fileservers be started first,
   or should the database servers be started first?

I realize this will disrupt many of our AFS clients.  We're
pretty much expecting we will reboot all of our systems by the
time we're done with this.  It's safe to assume that I'm not
happy about any of this, but I have no choice in the matter.

Apologies for running to the list on this.  I expect the answer
is somewhere in the documentation (or maybe even intuitively
obvious), but this issue didn't come up until early this morning,
and I've got about a dozen other (non-AFS) servers which are also
affected by this network upgrade, so I'd like some confirmation
of best practices for this case.

--
Garance Alistair Drosehn     =     dro...@rpi.edu
Senior Systems Programmer   or   g...@freebsd.org
Rensselaer Polytechnic Institute; Troy, NY;  USA
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


[OpenAFS] Rsync-ing a vice* partition

2013-01-10 Thread Garance A Drosihn

Consider a fileserver with the following partitions on it:

  /vicepa  (in production use)
  /vicepb  (in production use)
  /nextpa  (totally empty)

Assume that all the AFS processes will be shut down on this
fileserver for a few hours (for unrelated reasons).

As far as AFS is concerned, would it be safe and reasonable
to use rsync to duplicate all files on /vicepa to /nextpa,
dismount both partitions, and then mount what was /nextpa
as /vicepa?  Or is that playing with fire, such that it'd be
much safer to move the AFS volumes via standard AFS commands
while AFS is running?

It also happens that every volume on /vicepa is replicated
on multiple AFS fileservers.  (some are RO's for volumes
where the RW is on this /vicepa, and some are RO's for
volumes where the RW is on other fileservers).

This is not an urgent issue.  I'm just wondering.

--
Garance Alistair Drosehn     =     dro...@rpi.edu
Senior Systems Programmer   or   g...@freebsd.org
Rensselaer Polytechnic Institute; Troy, NY;  USA


RE: [OpenAFS] Shutdown/startup of entire cell

2013-01-10 Thread Brandon Allbery
> 1) When shutting down, should all database servers be shut down
> before the fileservers, or should the fileservers be shut down
> first?
>
> 2) When starting up, should the fileservers be started first,
> or should the database servers be started first?

You need the database servers to be up while you shut down or start up the
fileservers: one of the databases records which fileservers have which
volumes, and fileservers register and unregister volumes with that database
during startup and shutdown.

If there's no need for the database servers themselves to be shut down, you can 
leave them up, but I'm not sure that actually gains you anything.
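
A minimal sketch of that ordering, in case it helps to see it written
down.  It assumes the servers are stopped and started with `bos shutdown`
and `bos startup`, and the host names are made-up placeholders, not
anyone's real cell.  It only builds and prints the commands; it does not
run anything.

```python
# Hypothetical host names; substitute the machines in your own cell.
FILESERVERS = ["fs1.example.edu", "fs2.example.edu"]
DB_SERVERS = ["db1.example.edu", "db2.example.edu", "db3.example.edu"]


def shutdown_plan(fileservers, db_servers):
    """Fileservers go down first, while the database servers are still up."""
    ordered = list(fileservers) + list(db_servers)
    # -wait makes bos block until the server processes have actually stopped.
    return [["bos", "shutdown", host, "-wait"] for host in ordered]


def startup_plan(fileservers, db_servers):
    """Database servers come back first, then the fileservers."""
    ordered = list(db_servers) + list(fileservers)
    return [["bos", "startup", host] for host in ordered]


if __name__ == "__main__":
    for cmd in shutdown_plan(FILESERVERS, DB_SERVERS):
        print(" ".join(cmd))
    for cmd in startup_plan(FILESERVERS, DB_SERVERS):
        print(" ".join(cmd))
```

If the database servers stay up for the whole window, pass an empty list
for them and only the fileserver commands are produced.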

--
brandon s allbery kf8nh   sine nomine associates
allber...@gmail.com  ballb...@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad        http://sinenomine.net


Re: [OpenAFS] Shutdown/startup of entire cell

2013-01-10 Thread Garance A Drosihn

On 1/10/13 11:14 PM, Jeffrey Altman wrote:

> You can shut down the file servers without shutting down the
> database servers.  During the outage the database servers
> may lose the ability to elect a master.  Therefore you should
> avoid making any database changes during the outage window.


I should have added that the database servers are all on a
single network switch, so unless something goes REALLY wrong
they won't lose contact with each other.  I can't imagine we
would be making any database changes.  I'm just hoping to
survive the upgrade.



> I would run one file server with a single local disk partition
> containing a readonly site for the root.afs and root.cell volumes.
> This could be on one of the database servers.


Hmm.  Clever idea!



> I would shut down any file server with network attached storage
> for the length of the outage window.
>
> I would also place a README file in the root.cell root directory
> describing the outage for those who might check.
>
> If you are going to shut down the database servers, shut them
> down after the fileservers and restart them before the file
> servers.
>
> Jeffrey Altman


Thanks for the quick answers!


Re: [OpenAFS] Shutdown/startup of entire cell

2013-01-10 Thread Thomas Kula
On Thu, Jan 10, 2013 at 10:50:27PM -0500, Garance A Drosihn wrote:
> Hi.
>
> Due to circumstances way beyond my control (a major network
> upgrade), I am going to need to shut down our entire AFS cell
> this Saturday.  So, that is less than 36 hours from now.

Others have addressed the proper order for shutting down AFS
servers well, so I won't touch on that.

I will point out that when I was at UMich we ensured that all
of our AFS fileservers were restarted at least once a year.
I did advocate for emptying fileservers before doing that,
although that never took hold, for various reasons, but with
modern hardware it wasn't too onerous a process. The bulk
of the time was spent waiting for callbacks to be broken.

We did this so that we knew that a fileserver would properly
restart. After a while, you've upgraded various things,
fixed stuff, etc., and it was good to have a sanity check
that nothing had crept in during that time. And, with Murphy
around, you know that at some point your fileserver is
going to get its power cord yanked, be the victim of someone
hitting the wrong power switch, etc. It was nice to know that in
that case, things would just come back when power was restored,
and if something had crept in over the course of the year,
it was nice that it surfaced when a couple of sysadmins were
in the office, well-rested and well-caffeinated and ready
to handle weird problems, rather than getting a bleary-eyed
sleepy beep at 3am.

-- 
Thomas L. Kula | k...@tproa.net | http://kula.tproa.net/


Re: [OpenAFS] Shutdown/startup of entire cell

2013-01-10 Thread Garance A Drosihn

On 1/10/13 11:45 PM, Thomas Kula wrote:

> On Thu, Jan 10, 2013, Garance A Drosihn wrote:
>
> > Hi.
> >
> > Due to circumstances way beyond my control (a major network
> > upgrade), I am going to need to shut down our entire AFS cell
> > this Saturday.  So, that is less than 36 hours from now.
>
> Others have addressed the proper order for shutting down AFS
> servers well, so I won't touch on them.
>
> I will point out that when I was at UMich we ensured that all
> of our AFS fileservers were restarted at least once a year.
> I did advocate for emptying fileservers before doing that,
> although that never took hold, for various reasons, but with
> modern hardware it wasn't too onerous of a process. The bulk
> of the time was waiting for callbacks to be broken.



It happens that we have done controlled shutdowns and restarts
of all our fileservers recently, although that is more by
dumb luck than good planning.  We did one restart of them all
in the summer of 2012, and one in the summer of 2011.  Before
that, I think they had gone four or five years without a restart.

I expect we're going to make a point of doing such restarts on
a more regular and planned basis!

--
Garance Alistair Drosehn     =     dro...@rpi.edu
Senior Systems Programmer   or   g...@freebsd.org
Rensselaer Polytechnic Institute; Troy, NY;  USA


Re: [OpenAFS] Rsync-ing a vice* partition

2013-01-10 Thread Derrick Brashear
As long as you preserve owner, group, and mode, you're fine. -o (owner),
-g (group), and -p (perms) are needed, but -a (archive) implies all of
those, so the usual -auv that people use is fine.

On Thu, Jan 10, 2013 at 11:02 PM, Garance A Drosihn dro...@rpi.edu wrote:
> Consider a fileserver with the following partitions on it:
>
>   /vicepa  (in production use)
>   /vicepb  (in production use)
>   /nextpa  (totally empty)
>
> Assume that all the AFS processes will be shut down on this
> fileserver for a few hours (for unrelated reasons).
>
> As far as AFS is concerned, would it be safe and reasonable
> to use rsync to duplicate all files on /vicepa to /nextpa,
> dismount both partitions, and then mount what was /nextpa
> as /vicepa?  Or is that playing with fire, such that it'd be
> much safer to move the AFS volumes via standard AFS commands
> while AFS is running?
>
> It also happens that every volume on /vicepa is replicated
> on multiple AFS fileservers.  (some are RO's for volumes
> where the RW is on this /vicepa, and some are RO's for
> volumes where the RW is on other fileservers).
>
> This is not an urgent issue.  I'm just wondering.

-- 
Derrick


Re: [OpenAFS] Rsync-ing a vice* partition

2013-01-10 Thread Dirk Heinrichs
On Friday, 11 January 2013, 00:14:21, Derrick Brashear wrote:

> as long as you preserve owner, group and mode you're fine. -o (owner)
> -g (group) -p (perms) needed, but -a (archive)
> implies all those. so the usual -auv that people use is fine.

Usual for me is -acv (c = checksum); it will take a bit longer, though. And,
depending on the filesystem on /vicepx, I'd add --exclude lost+found.
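
For what it's worth, here is a rough sketch of the two variants as a small
command builder.  The helper name and its parameters are made up for
illustration; the paths are the ones from the original question.  It only
assembles the rsync argument list (run as root, with the AFS processes
stopped, if you actually execute it).

```python
def rsync_command(src, dst, checksum=False, exclude_lost_found=False):
    """Build an rsync argument list for copying one vice partition to another.

    -a (archive) implies -o, -g and -p, so owner, group and mode survive.
    checksum=True swaps the usual -auv for -acv, which compares file
    contents and therefore takes longer.
    """
    cmd = ["rsync", "-acv" if checksum else "-auv"]
    if exclude_lost_found:
        # Only relevant on filesystems that create a lost+found directory.
        cmd += ["--exclude", "lost+found"]
    # A trailing slash on the source copies its contents, not the directory.
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd


if __name__ == "__main__":
    print(" ".join(rsync_command("/vicepa", "/nextpa")))
    print(" ".join(rsync_command("/vicepa", "/nextpa",
                                 checksum=True, exclude_lost_found=True)))
```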

Bye...

Dirk
-- 
Dirk Heinrichs dirk.heinri...@altum.de
Tel: +49 (0)2471 209385 | Mobil: +49 (0)176 34473913
GPG Public Key C2E467BB | Jabber: dirk.heinri...@altum.de

