This is a quick note to discuss our experiences with shadows thus far. We'd hoped to be done long before now, but other work keeps getting in the way of pushing this forward. We are now in early pilot, and hope to have an initial set in production by end of summer.

We (well, Dan Hyde) found that the shadow code was largely complete. We did find one serious bug that could cause lossage of the original volume; I believe Dan has forwarded that fix to the group.

One of the biggest problems we bumped into was only semi-technical: the lack of a definition of what a shadow *should* be, as opposed to what a shadow currently is. We made decisions that suit us, but they necessarily reflect our intended use for shadows. Your mileage may vary, and we're certainly interested in and amenable to changes if the community comes to a decision on them.

Our purpose: disaster recovery by means of invisible replicated volumes. We envision a set of DR hosts with a shadow volume that replicates a production volume. If a host hard-fails and isn't likely to come back in a reasonable amount of time, we will go to the shadow server and promote the relevant volumes from shadow to production. At that time the vldb is modified to show the shadow host as the real host, and the on-server copy of the volume is changed from type 'shadow' to type 'production' (handwave, handwave). "A reasonable amount of time" is site-dependent, of course.

Shadows do not appear in the vldb. Their existence is known only to the host which contains a particular shadow. Thus one might have many shadows, up to and including one on each vice partition in a cell. There is no required relationship of name, parenthood, etc, between a shadow and the volume from which it was created. (For the rest of this note, we'll refer to the original volume as the parent, and a shadow of a parent as a child.)
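
To make the visibility point concrete, a small sketch (the volume and host names are placeholders; both subcommands are stock vos):

    # The vldb knows nothing about the shadow; this shows only the
    # production site:
    vos listvldb -name user.jdoe

    # Asking the shadow host directly does list the on-disk shadow:
    vos listvol -server afsdr1.example.com -partition /vicepa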

Simple shadowing of a parent onto a non-existent child creates a new volume identical to the parent in all but name and visibility. Incrementally shadowing a parent onto a child brings the child up-to-date with the parent, and is a proportionately faster operation.
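
For concreteness, the two operations look roughly like this with the shadow code we're running (host, partition, and volume names are placeholders; check the flags against your build, since this is all pre-release):

    # Full shadow: creates the child on the shadow host
    vos shadow -id user.jdoe \
        -fromserver afs1.example.com -frompartition /vicepa \
        -toserver afsdr1.example.com -topartition /vicepa

    # Incremental shadow: ships only what changed since the last shadow
    vos shadow -id user.jdoe \
        -fromserver afs1.example.com -frompartition /vicepa \
        -toserver afsdr1.example.com -topartition /vicepa \
        -incremental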

Bad things you can do:

Shadowing a volume onto another volume's child results in a jumbled and probably useless volume. We don't think it should be permitted, but lacking a more extensive and better-defined child/parent relationship we don't see a way to prevent it. Properly, that relationship should be recorded in the vldb, but that requires much more extensive changes than (a) we were willing to make and (b) we thought the community would accept without pre-agreement as to what that relationship would be.

Shadowing a shadow onto itself results in disaster. We have now forbidden that in the code.

Shadowing onto a production volume should and does fail. I don't recall if we had to modify the code for that, but if so, that'll be part of the patch when we release.

There is now a vos command which promotes a shadow to production. It does nothing to the parent, which will continue to exist on the original server/vice partition and could be re-promoted with the appropriate vos sync command.
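
We won't name the promotion subcommand here, since it's part of the not-yet-released patch, but re-promoting the old parent should just be the usual vldb resynchronization; something along these lines (placeholder names, and the exact sequence depends on what state the vldb is in after the promotion):

    # Re-register the original parent with the vldb once its server is back:
    vos syncvldb -server afs1.example.com -partition /vicepa -volume user.jdoe
    vos syncserv -server afs1.example.com -partition /vicepa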

When a shadow is created, there is a mark in its volume header which indicates it is a clone. During the salvage process shadows are handled properly. If I recall correctly, we had to make no changes to the salvager for this, but if shadows were to appear in the vldb that might be a different story.

I don't recall if you can have a shadow named after its parent on the same server and vice partition as the parent.

We found a great deal of code that implies a long-term relationship between parent and child was intended, but that code is clearly incomplete. Unfortunately it's incomplete to such a degree that it's not possible to tell what the author(s) intended that relationship to be.

More detail on our intended usage:

For every AFS server we have, we will have a shadow server. When a volume is created on a server, a shadow is quickly created (semi-automated process) on the designated shadow server. When a volume is moved from one server to another, the shadow is removed from the old shadow host and created on the new host. As often as we can manage without affecting server performance (i.e., TBD), we will incrementally refresh the children from their parents.
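
As an aside on the volume-move case: since the stale child was never in the vldb, cleaning it up means zapping it on the old shadow host by numeric ID, then shadowing afresh to the new one. A sketch with placeholder names (the ID is whatever vos listvol reports on the old shadow host):

    # Remove the stale shadow from the old shadow host:
    vos zap -server afsdr1.example.com -partition /vicepa -id 536870913

    # Create a fresh shadow on the new shadow host:
    vos shadow -id user.jdoe \
        -fromserver afs2.example.com -frompartition /vicepb \
        -toserver afsdr2.example.com -topartition /vicepb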

When a disaster occurs (an entire server is lost and not recoverable in a reasonable amount of time), the shadow server is brought on line. Assuming we've done our job correctly, user volumes simply reappear with a new location. The content of those volumes is as up-to-date as the most recent refresh of the shadow. Our seat-of-the-pants guess is that we can refresh each shadow about 4 times a day without affecting overall performance.

"A semi-automated process:" it happens out of cron. A shadow server gets the volumes list for the host it's shadowing, and does the creation/updating as needed. Since a shadow server knows what shadows it's got (think 'vos listvol'), it also can duplicate shadows it doesn't need any more. Note this means when a volume is moved, some interesting race conditions may ensue. The easiest way to fix those race conditions is by putting the shadows into the vldb, but again, that is a bigger change than we wanted to put in without a broad agreement from the community.

Some fallout/things discovered while testing the above: there's no real need to create a shadow at volume creation time; doing an incremental onto a non-existent shadow creates the shadow in exactly the same manner as doing a full shadow. Some might regard this as a bug; for the moment we're taking advantage of it.

Our new, second data center just went on line this week. With that in place, we can start the initial pilot work on shadows as disaster recovery.