Re: [OpenAFS-devel] meaning of VNOVOL, VOFFLINE, etc.

Jeffrey Altman Sat, 19 May 2012 00:35:04 -0700

On 5/8/2012 1:55 PM, Tom Keiser wrote:
> On Fri, May 4, 2012 at 6:35 PM, Jeffrey Altman
> <[email protected]> wrote:
>> Tom:
>>
>> The 1991 Zayas specifications are lacking in many regards.  For
>> starters, the Vxxx error codes are only defined for the Vol/VL RPCs and
>> not for the FS/CM RPCs.  The use of the Vxxx error codes in the FS/CM
>> RPCs is left undefined and yet those errors are reported to cache
>> managers by file servers.
>>
> Hi Jeff,
>
> I was wondering if you were going to raise this distinction.  It is
> indeed troubling how little the FS/CM document has to say about
> this--and many other--issues.


I suspect the protocol documents were intended as a guide for Transarc
developers to use as part of internal training on the code base.  They
certainly were not written at the level necessary to produce an
independent implementation.

>> I think it was 2004 or perhaps early 2005 when a large user was
>> concerned about VLDB scalability due to the introduction of tens of
>> thousands of Windows clients into the environment.  Each time a VNOVOL,
>> VMOVED, VOFFLINE, VSALVAGE or VNOSERVICE error was received the Windows
>> client would query the VLDB and retry the request after 2 seconds.  If a
>> volume couldn't be served from a file server this process would be
>> repeated.  This was exacerbated by the behavior of the Explorer Shell
>> which reads the contents of directories it displays searching for
>> various metadata.  As a result the VLDB servers were struggling under
>> the load.  It wasn't going to be possible to make the VLDB servers
>> process more requests so it was important to reduce the number of
>> requests that were sent.
>>
>> The discussions that took place came to the conclusion that the
>> description of VNOVOL was ambiguous and its meaning based upon usage
>> should be that the volume is not present.  With that interpretation a
>> client could restrict the number of VLDB lookups for a volume.  I do not
>> remember if these discussions took place at a hackathon, a workshop, or
>> on Zephyr.  Such use of the error codes didn't make a difference to
>> deployed clients, since they acted on all error codes in an identical
>> fashion, nor did it result in a protocol change, given existing use in
>> the file server.
>>
> Thanks for the explanation; this is precisely what I was hoping to
> find out.  Given this, I will push the VNOVOL changes.

I look forward to seeing them.
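
For readers following along, the VNOVOL interpretation quoted above
(the volume is not present, so repeated VLDB lookups can be limited)
might be sketched roughly as follows.  This is an illustration only,
not OpenAFS code: the class name, the should_query_vldb method, the
60-second interval and the enum values are all hypothetical, and
leaving the other error codes on the old "always look up" path is an
assumption made here for simplicity.

```python
from enum import Enum, auto


class VolError(Enum):
    """Symbolic error names only; these are NOT the on-the-wire values."""
    VNOVOL = auto()
    VMOVED = auto()
    VOFFLINE = auto()
    VSALVAGE = auto()
    VNOSERVICE = auto()


class VldbLookupLimiter:
    """Hypothetical per-volume limiter for VLDB lookups after VNOVOL."""

    def __init__(self, min_interval=60.0):
        # Assumed minimum number of seconds between VLDB lookups
        # for the same volume; the thread does not specify a value.
        self.min_interval = min_interval
        self._last_lookup = {}  # volume id -> time of last VLDB lookup

    def should_query_vldb(self, volume_id, error, now):
        """Return True if the client should ask the VLDB again."""
        if error is not VolError.VNOVOL:
            # Assumption: other errors keep the old behavior
            # (query the VLDB every time).
            return True
        last = self._last_lookup.get(volume_id)
        if last is not None and now - last < self.min_interval:
            # The volume was recently reported as not present;
            # skip another VLDB round trip.
            return False
        self._last_lookup[volume_id] = now
        return True
```

With this sketch, a client that sees VNOVOL for the same volume twice
within the interval performs one VLDB lookup rather than two, which is
the kind of request reduction the large site needed.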

>> Perhaps others can find a reference in Zephyr logs.  I no longer have
>> access to them.
>>
> I, personally, don't need anyone to go to that level of effort.
>
> This discussion raises a question: how do we want to clarify the
> documentation so that the various meanings of each error code (and in
> which context each meaning applies) are clear?  Can I just push
> changes to fs-cm-spec.h and vldb-vol-spec.h into gerrit?  Or, would
> the community prefer that afs3-stds have a chance to review any
> language changes before they are pushed?  Ideally, I'd want to codify
> this in an I-D describing how clients should behave in the face of rx
> aborts, but I don't foresee having time to do that anytime soon.

Speaking for myself, I would be happy with the changes being committed
to doc/protocol via gerrit, with a note sent to afs3-stds at the same
time to let the folks there know that a protocol documentation change
is pending review.

I would be happier if doc/protocol contained a Makefile and the
necessary doxygen configuration to produce readable documents from the
.h files.

Jeffrey Altman
