Re: [OpenAFS] Re: State of the Michigan shadow system (long)

2010-12-20 Thread Thomas Kula
On Mon, Dec 20, 2010 at 07:00:18PM -0500, Steve Simmons wrote:
> 
> On Dec 20, 2010, at 3:29 PM, Andrew Deason wrote:
> 
> > On Mon, 20 Dec 2010 14:46:38 -0500
> > Steve Simmons wrote:
> > 
> >> A shadow volume is a read-only remote clone of a primary volume. We
> >> had to create some terminology here, and 'primary' is what we called
> >> the real-time, in-use, r/w production volume. A remote clone closely
> >> resembles a read-only replica of a volume, but differs in several
> >> important respects.
> > 
> > By 'read-only' do you just mean in intended usage? I may be way off, but
> > my memory of shadow volumes (as implemented in openafs.org code) is that
> > they are virtually identical to the primary, and are not marked as
> > RO volumes or anything like that in the underlying namei metadata. So, a
> > fileserver could theoretically attach it and modify it, though it was
> > intended that the lack of an entry in the vldb would prevent clients
> > from accessing it.
> 
> Yes, 'read-only' is sloppy terminology on my part. 'Enforcement' of the 
> read-only nature was done by virtue of the shadow being invisible to most 
> things that access volumes.

Actually, if I'm remembering correctly, down in the bit of fileserver
code that determined access to a particular object we put in something
like

    if ( VolumeIsShadow && ! YouAreAnAdmin ) {
        return GOAWAYYOUHOSER;
    }

Just wrapped up more fancy. This was in the Michigan code,
not something we pulled in from what was in openafs.org code,
again if my memory is serving me well.
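
A minimal sketch of that kind of check, written out as compilable C. Every name here (the structs, the result codes, check_shadow_access) is hypothetical and only illustrates the idea; it is not the actual Michigan or openafs.org fileserver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; a real fileserver has its own error values. */
enum access_result { ACCESS_OK = 0, ACCESS_DENIED = 1 };

/* Hypothetical, heavily simplified views of a volume and a caller. */
struct volume {
    unsigned int id;
    bool is_shadow;   /* assumed to be derived from a flag in the volume header */
};

struct caller {
    bool is_admin;
};

/* Refuse any non-admin caller that reaches a shadow volume. */
static enum access_result check_shadow_access(const struct volume *vol,
                                              const struct caller *who)
{
    assert(vol != NULL && who != NULL);
    if (vol->is_shadow && !who->is_admin)
        return ACCESS_DENIED;
    return ACCESS_OK;
}
```

The point of a check like this is that it does not rely on the vldb at all: even if a client somehow learns where a shadow lives, the fileserver itself turns away ordinary users.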

-- 
Thomas L. Kula | k...@tproa.net | http://kula.tproa.net/
___
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info


Re: [OpenAFS] Re: State of the Michigan shadow system (long)

2010-12-20 Thread Steve Simmons

On Dec 20, 2010, at 3:29 PM, Andrew Deason wrote:

> On Mon, 20 Dec 2010 14:46:38 -0500
> Steve Simmons wrote:
> 
>> A shadow volume is a read-only remote clone of a primary volume. We
>> had to create some terminology here, and 'primary' is what we called
>> the real-time, in-use, r/w production volume. A remote clone closely
>> resembles a read-only replica of a volume, but differs in several
>> important respects.
> 
> By 'read-only' do you just mean in intended usage? I may be way off, but
> my memory of shadow volumes (as implemented in openafs.org code) is that
> they are virtually identical to the primary, and are not marked as
> RO volumes or anything like that in the underlying namei metadata. So, a
> fileserver could theoretically attach it and modify it, though it was
> intended that the lack of an entry in the vldb would prevent clients
> from accessing it.

Yes, 'read-only' is sloppy terminology on my part. 'Enforcement' of the 
read-only nature was done by virtue of the shadow being invisible to most 
things that access volumes.

> 
>> First and foremost, it does not appear in the vldb. Thus there is no
>> possibility of the read-only copy coming into production.
> 
> I understand this was probably the best way to do this at the time, but
> this alone does not prevent the volume from getting used. Since vldb
> results are cached by clients and an administrator could screw up vldb
> data somehow, it's possible for someone to access the wrong volume.

Correct.

>> Shadow volumes could be detected only on the server on which they
>> reside. Modifications were made to vos listvol for that purpose. A bit
>> in the volume header was selected for distinguishing a shadow from a
>> primary volume; I believe that was the only modification made to the
>> volume header file. This work is also done.
> 
> By "done" does this mean you just implemented it at umich, or it's in
> the openafs.org tree? Is the volume header bit you're referring to
> inService (or another existing flag), or did you use a separate field
> specifically for shadows?

That's how we implemented it, yes. I don't believe the source is in the public 
openafs.org source tree anywhere, tho I think Dan Hyde has it incorporated as a 
branch in his git archive. I'd ask him, but he's on vacation this week.

I don't know off the top of my head which bit he used. In our discussions at 
the time we used one of the reserved bits, but in full knowledge that such 
might have to change when/if time came to make the implementation more public.
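
To make the "one reserved bit" idea concrete, here is a rough sketch in C. The struct layout, the field names and the choice of bit 31 are all assumptions made up for illustration; as noted above, I don't know which bit was actually used, and the real volume header looks nothing like this toy struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: only a flags word matters for this sketch. */
struct vol_header_sketch {
    uint32_t volume_id;
    uint32_t flags;   /* bit field; some bits assumed to be reserved/unused */
};

/* Hypothetical choice of reserved bit, NOT the one used at Michigan. */
#define VOL_FLAG_SHADOW (UINT32_C(1) << 31)

/* Mark a volume as a shadow without disturbing any other flag bits. */
static void mark_shadow(struct vol_header_sketch *h)
{
    assert(h != NULL);
    h->flags |= VOL_FLAG_SHADOW;
}

/* Clear the shadow bit, e.g. if a shadow were promoted to a primary. */
static void mark_primary(struct vol_header_sketch *h)
{
    assert(h != NULL);
    h->flags &= ~VOL_FLAG_SHADOW;
}

/* What a modified listing tool could test to tell shadow from primary. */
static bool is_shadow(const struct vol_header_sketch *h)
{
    assert(h != NULL);
    return (h->flags & VOL_FLAG_SHADOW) != 0;
}
```

Since only the one bit is touched, servers that don't know about shadows would see the header as otherwise unchanged, which is also why the bit assignment would need to be agreed on before anything like this went public.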

>> I think we were sliding towards a transparent upward-compatible
>> replacement of the vldb as well. Based purely on how I imagine the
>> vldb to work :-), it should be possible to add shadow data to it and
>> define some additional rpcs. Users of the old rpcs would only get the
>> data that was in the 'legacy' vldb, users of the new rpcs would get
>> shadow data as well. That's a door folks may not want opened yet, but
>> it seems a better choice than bolting a separate shadow-oriented vldb
>> to the side.
> 
> I thought the bigger problem is not the compatibility of the
> client<->vlserver interface, but rather the vlserver<->vlserver
> interface; that is, the structure of the VL entries in ubik, since those
> structures don't have any spare fields (although LockAfsId is not
> used). You can probably play some games to keep enough compatibility
> with older vlservers, but it requires some thought.

Again, loose terminology on my part, 'cause I didn't want to drown folks in 
detail. But since you were kind enough to ask:

Yeah, it's hard. One chunk of what makes it hard is that the vldb format is 
fixed and there's little or no space to wedge new stuff into it. Another 
complicating factor is this whole idea of volume families and determining if, 
when and how we want to be tracking the inter-volume relationships and 
dependencies. As a particular example, in our existing implementation it's 
perfectly possible for shadow A' (A-prime) of volume A to be overwritten as a 
a shadow of volume B. Sometimes you want that: B could be a shadow of A, and 
we're reducing overhead on A by refreshing B from A'. In a sense, you might 
think of B as more properly A''. But how should such relationships be detected, 
and what if any limitations should be imposed on such refreshes? Lacking a good 
taxonomy of what a shadow volume is and how it relates to the primary, we can't 
come up with a good database definition to encode that. Lacking that 
definition, we can't come up with a proposal that would allow shadow data to be 
placed in the vldb in any upward-compatible way.
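
Purely as a strawman for what tracking those relationships might look like, here is a sketch in C under one possible policy: each shadow records the volume it was last refreshed from, and a refresh is allowed only when source and target trace back to the same primary. The record layout, the names and the policy itself are all hypothetical; nothing like this exists in our implementation, and it is not a proposal for the vldb format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_SOURCE 0u   /* hypothetical marker: this volume is a primary */

/* Hypothetical record: one per volume, primary or shadow. */
struct vol_rec {
    uint32_t id;
    uint32_t source_id;   /* volume last refreshed from, or NO_SOURCE */
};

/* Linear lookup by volume id; returns NULL if the id is unknown. */
static const struct vol_rec *find_vol(const struct vol_rec *tab, size_t n,
                                      uint32_t id)
{
    assert(tab != NULL);
    for (size_t i = 0; i < n; i++)
        if (tab[i].id == id)
            return &tab[i];
    return NULL;
}

/* Follow source links up to the primary at the root of the family.
 * Returns NO_SOURCE if the chain is broken or loops back on itself. */
static uint32_t family_root(const struct vol_rec *tab, size_t n, uint32_t id)
{
    for (size_t hops = 0; hops <= n; hops++) {
        const struct vol_rec *v = find_vol(tab, n, id);
        if (v == NULL)
            return NO_SOURCE;
        if (v->source_id == NO_SOURCE)
            return v->id;
        id = v->source_id;
    }
    return NO_SOURCE;   /* more hops than records: must be a cycle */
}

/* One possible policy: only refresh within a single volume family. */
static bool refresh_allowed(const struct vol_rec *tab, size_t n,
                            uint32_t target_id, uint32_t source_id)
{
    uint32_t root_target = family_root(tab, n, target_id);
    uint32_t root_source = family_root(tab, n, source_id);
    return root_target != NO_SOURCE && root_target == root_source;
}
```

With A as a primary, A' refreshed from A, and B refreshed from A', this policy lets B be refreshed from A' (both trace back to A) but refuses to overwrite A' from an unrelated primary. Whether that is the right limitation is exactly the open question.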

The decision to leave shadows outside of the vldb ultimately raises the question 
of how to manage shadows and volume families, and IMHO is acceptable only as a 
short-term case.

Coming to the more specific vlserver-vlserver-ubik questions - yeah, that's 
hard. If all we're thinking of is simple records that could (please, god, 
please!) be sh