On Fri, May 4, 2012 at 6:35 PM, Jeffrey Altman
<[email protected]> wrote:
> Tom:
>
> The 1991 Zayas specifications are lacking in many regards.  For
> starters, the Vxxx error codes are only defined for the Vol/VL RPCs and
> not for the FS/CM RPCs.  The use of the Vxxx error codes in the FS/CM
> RPCs is left undefined and yet those errors are reported to cache
> managers by file servers.
>

Hi Jeff,

I was wondering if you were going to raise this distinction.  It is
indeed troubling how little the FS/CM document has to say about
this--and many other--issues.

> I think it was 2004 or perhaps early 2005 when a large user was
> concerned about VLDB scalability due to the introduction of tens of
> thousands of Windows clients into the environment.  Each time a VNOVOL,
> VMOVED, VOFFLINE, VSALVAGE or VNOSERVICE error was received the Windows
> client would query the VLDB and retry the request after 2 seconds.  If a
> volume couldn't be served from a file server this process would be
> repeated.  This is exacerbated by the behavior of the Explorer Shell
> which reads the contents of directories it displays searching for
> various metadata.  As a result the VLDB servers were struggling under
> the load.  It wasn't going to be possible to make the VLDB servers
> process more requests so it was important to reduce the number of
> requests that were sent.
>
> The discussions that took place came to the conclusion that the
> description of VNOVOL was ambiguous and its meaning based upon usage
> should be that the volume is not present.  With that interpretation a
> client could restrict the number of VLDB lookups for a volume.  I do not
> remember if these discussions took place at a hackathon, a workshop, or
> on Zephyr.   Such use of the error codes didn't make a difference to
> deployed clients since they acted on all error codes in an identical
> fashion nor did it result in a protocol change given existing use in the
> file server.
>

Thanks for the explanation; this is precisely what I was hoping to
find out.  Given this, I will push the VNOVOL changes.
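
For illustration only, here is a rough sketch of how a client might
restrict VLDB lookups once VNOVOL is read as "the volume is not present
on this server".  It is not taken from any existing cache manager: the
VolumeState class, the on_error method, and the 60-second
min_lookup_interval are all hypothetical, and the numeric wire values
of the error codes are deliberately left out.

```python
from enum import Enum, auto


class VolError(Enum):
    """Volume error names from the thread; numeric wire values omitted."""
    VNOVOL = auto()
    VMOVED = auto()
    VOFFLINE = auto()
    VSALVAGE = auto()
    VNOSERVICE = auto()


class VolumeState:
    """Per-volume bookkeeping for a hypothetical cache manager."""

    def __init__(self, min_lookup_interval=60.0):
        # Hypothetical throttle: minimum seconds between VLDB queries.
        self.min_lookup_interval = min_lookup_interval
        self.last_lookup = None
        self.absent_servers = set()
        self.vldb_queries = 0

    def on_error(self, server, error, now):
        """Record the error; return True if the VLDB should be queried now."""
        if error is VolError.VNOVOL:
            # Interpretation from the thread: the volume is not present
            # on this server, so there is no point retrying against it.
            self.absent_servers.add(server)
        due = (self.last_lookup is None
               or now - self.last_lookup >= self.min_lookup_interval)
        if due:
            self.last_lookup = now
            self.vldb_queries += 1
        return due


# Ten failures two seconds apart, as in the retry loop described above.
state = VolumeState(min_lookup_interval=60.0)
results = [state.on_error("fs1", VolError.VNOVOL, now=float(t))
           for t in range(0, 20, 2)]
```

With these assumptions, the ten failures cost one VLDB query, where
querying on every error would have cost ten.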

> Perhaps others can find a reference in Zephyr logs.  I no longer have
> access to them.
>

I, personally, don't need anyone to go to that level of effort.

This discussion raises a question: how do we want to clarify the
documentation so that the various meanings of each error code (and the
context in which each meaning applies) are clear?  Can I just push
changes to fs-cm-spec.h and vldb-vol-spec.h into Gerrit?  Or would
the community prefer that afs3-stds have a chance to review any
language changes before they are pushed?  Ideally, I'd want to codify
this in an I-D describing how clients should behave in the face of Rx
aborts, but I don't foresee having time to do that anytime soon.

Regards,

-Tom
