This is a quick note to discuss our experiences with shadows thus far. We'd hoped to be done long before now, but other work keeps getting in the way of pushing this forward. We are now in early pilot, and hope to have an initial set in production by end of summer.

We (well, Dan Hyde) found that the shadow code was largely complete. We did find one serious bug that could cause lossage of the original volume; I believe Dan has forwarded that fix to the group.

One of the biggest problems we bumped into was only semi-technical: the lack of a definition of what a shadow *should* be, as opposed to what a shadow currently is. We made decisions that suit us, but they necessarily reflect our intended use for shadows. Your mileage may vary, and we're certainly interested in and amenable to changes if the community comes to a decision on them.

Our purpose: disaster recovery by means of invisible replicated volumes. We envision a set of DR hosts with a shadow volume that replicates a production volume. If a host hard-fails and isn't likely to come back in a reasonable amount of time, we will go to the shadow server and promote the relevant volumes from shadow to production. At that time the vldb is modified to show the shadow host as the real host, and the on-server copy of the volume is changed from type 'shadow' to type 'production' (handwave, handwave). "A reasonable amount of time" is site-dependent, of course.

Shadows do not appear in the vldb. Their existence is known only to the host which contains a particular shadow. Thus one might have many shadows, up to and including one on each vice partition in a cell. There is no required relationship of name, parenthood, etc, between a shadow and the volume from which it was created. (For the rest of this note, we'll refer to the original volume as the parent, and a shadow of a parent as a child.)
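
To make the visibility point concrete, a small sketch (the volume and host names are placeholders; both subcommands are stock vos):

    # The vldb knows nothing about the shadow; this shows only the
    # production site:
    vos listvldb -name user.jdoe

    # Asking the shadow host directly does list the on-disk shadow:
    vos listvol -server afsdr1.example.com -partition /vicepa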

Simple shadowing of a parent onto a non-existent child creates a new volume identical to the parent in all but name and visibility. Incrementally shadowing a parent onto a child brings the child up-to-date with the parent, and is a proportionately faster operation.
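
For concreteness, the two operations look roughly like this with the shadow code we're running (host, partition, and volume names are placeholders; check the flags against your build, since this is all pre-release):

    # Full shadow: creates the child on the shadow host
    vos shadow -id user.jdoe \
        -fromserver afs1.example.com -frompartition /vicepa \
        -toserver afsdr1.example.com -topartition /vicepa

    # Incremental shadow: ships only what changed since the last shadow
    vos shadow -id user.jdoe \
        -fromserver afs1.example.com -frompartition /vicepa \
        -toserver afsdr1.example.com -topartition /vicepa \
        -incremental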

Bad things you can do:

Shadowing a volume onto another volume's child results in a jumbled and probably useless volume. We don't think it should be permitted, but lacking a more extensive and better-defined child/parent relationship we don't see a way to prevent it. Properly, that relationship should be recorded in the vldb, but that requires much more extensive changes than (a) we were willing to make and (b) we thought the community would accept without pre-agreement as to what that relationship would be.

Shadowing a shadow onto itself results in disaster. We have now forbidden that in the code.

Shadowing onto a production volume should and does fail. I don't recall if we had to modify the code for that, but if so, that'll be part of the patch when we release.

There is now a vos command which promotes a shadow to production. It does nothing to the parent, which will continue to exist on the original server/vice partition and could be re-promoted with the appropriate vos sync command.
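
We won't name the promotion subcommand here, since it's part of the not-yet-released patch, but re-promoting the old parent should just be the usual vldb resynchronization; something along these lines (placeholder names, and the exact sequence depends on what state the vldb is in after the promotion):

    # Re-register the original parent with the vldb once its server is back:
    vos syncvldb -server afs1.example.com -partition /vicepa -volume user.jdoe
    vos syncserv -server afs1.example.com -partition /vicepa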

When a shadow is created, there is a mark in its volume header which indicates it is a clone. During the salvage process shadows are handled properly. If I recall correctly, we had to make no changes to the salvager for this, but if shadows were to appear in the vldb that might be a different story.

I don't recall if you can have a shadow named after its parent on the same server and vice partition as the parent.

We found a great deal of code that implies a long-term relationship between parent and child was intended, but that code is clearly incomplete. Unfortunately it's incomplete to such a degree that it's not possible to tell what the author(s) intended that relationship to be.

More detail on our intended usage:

For every AFS server we have, we will have a shadow server. When a volume is created on a server, a shadow is quickly created (semi-automated process) on the designated shadow server. When a volume is moved from one server to another, the shadow is removed from the old shadow host and created on the new host. As often as we can manage without affecting server performance (i.e., TBD), we will incrementally refresh the children from their parents.
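
As an aside on the volume-move case: since the stale child was never in the vldb, cleaning it up means zapping it on the old shadow host by numeric ID, then shadowing afresh to the new one. A sketch with placeholder names (the ID is whatever vos listvol reports on the old shadow host):

    # Remove the stale shadow from the old shadow host:
    vos zap -server afsdr1.example.com -partition /vicepa -id 536870913

    # Create a fresh shadow on the new shadow host:
    vos shadow -id user.jdoe \
        -fromserver afs2.example.com -frompartition /vicepb \
        -toserver afsdr2.example.com -topartition /vicepb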

When a disaster occurs (an entire server is lost and not recoverable in a reasonable amount of time), the shadow server is brought on line. Assuming we've done our job correctly, user volumes simply reappear with a new location. The content of those volumes is as up-to-date as the most recent refresh of the shadow. Our seat-of-the-pants guess is that we can refresh each shadow about 4 times a day without affecting overall performance.

"A semi-automated process:" it happens out of cron. A shadow server gets the volumes list for the host it's shadowing, and does the creation/updating as needed. Since a shadow server knows what shadows it's got (think 'vos listvol'), it also can duplicate shadows it doesn't need any more. Note this means when a volume is moved, some interesting race conditions may ensue. The easiest way to fix those race conditions is by putting the shadows into the vldb, but again, that is a bigger change than we wanted to put in without a broad agreement from the community.

Some fallout/things discovered while testing the above: there's no real need to create a shadow at volume creation time; doing an incremental onto a non-existent shadow creates the shadow in exactly the same manner as doing a full shadow. Some might regard this as a bug; for the moment we're taking advantage of it.

Our new, second data center just went on line this week. With that in place, we can start the initial pilot work on shadows as disaster recovery.