On Fri, May 4, 2012 at 6:35 PM, Jeffrey Altman <[email protected]> wrote:
> Tom:
>
> The 1991 Zayas specifications are lacking in many regards. For
> starters, the Vxxx error codes are only defined for the Vol/VL RPCs and
> not for the FS/CM RPCs. The use of the Vxxx error codes in the FS/CM
> RPCs is left undefined, and yet those errors are reported to cache
> managers by file servers.
>
Hi Jeff,

I was wondering if you were going to raise this distinction. It is indeed
troubling how little the FS/CM document has to say about this issue and
many others.

> I think it was 2004 or perhaps early 2005 when a large user was
> concerned about VLDB scalability due to the introduction of tens of
> thousands of Windows clients into the environment. Each time a VNOVOL,
> VMOVED, VOFFLINE, VSALVAGE or VNOSERVICE error was received, the Windows
> client would query the VLDB and retry the request after 2 seconds. If a
> volume couldn't be served from a file server, this process would be
> repeated. This was exacerbated by the behavior of the Explorer Shell,
> which reads the contents of the directories it displays in search of
> various metadata. As a result, the VLDB servers were struggling under
> the load. It wasn't going to be possible to make the VLDB servers
> process more requests, so it was important to reduce the number of
> requests that were sent.
>
> The discussions that took place concluded that the description of
> VNOVOL was ambiguous and that, based upon usage, its meaning should be
> that the volume is not present. With that interpretation a client
> could restrict the number of VLDB lookups for a volume. I do not
> remember if these discussions took place at a hackathon, a workshop, or
> on Zephyr. Such use of the error codes didn't make a difference to
> deployed clients, since they acted on all error codes in an identical
> fashion, nor did it result in a protocol change, given existing use in
> the file server.
>

Thanks for the explanation; this is precisely what I was hoping to find
out. Given this, I will push the VNOVOL changes.

> Perhaps others can find a reference in Zephyr logs. I no longer have
> access to them.
>

I, personally, don't need anyone to go to that level of effort.
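To make the VLDB-load argument concrete, here is a minimal Python sketch contrasting the two client policies described above. It is illustrative only: `VolumeState`, `legacy_action` and `revised_action` are hypothetical names, the enum values are placeholders rather than the real OpenAFS error numbers, and the "look up once, then fail" rule is just one way a client might restrict VLDB lookups under the "volume not present" reading of VNOVOL, not the Windows client's actual code.

```python
from dataclasses import dataclass
from enum import Enum, auto


class VolError(Enum):
    """Volume error codes named in the discussion. Values are placeholders,
    not the real OpenAFS numbers."""
    VNOVOL = auto()
    VMOVED = auto()
    VOFFLINE = auto()
    VSALVAGE = auto()
    VNOSERVICE = auto()
    OTHER = auto()  # stand-in for any unrelated error


VLDB_RETRY_CODES = {VolError.VNOVOL, VolError.VMOVED, VolError.VOFFLINE,
                    VolError.VSALVAGE, VolError.VNOSERVICE}


@dataclass
class VolumeState:
    """Hypothetical per-volume bookkeeping in a cache manager."""
    vldb_lookups: int = 0
    known_absent: bool = False


def legacy_action(code: VolError, vs: VolumeState) -> str:
    """Old behavior: every listed error means 'query the VLDB, then retry
    after 2 seconds', no matter how often it has happened before."""
    if code in VLDB_RETRY_CODES:
        vs.vldb_lookups += 1
        return "vldb_then_retry"
    return "fail"


def revised_action(code: VolError, vs: VolumeState) -> str:
    """Assumed revised behavior: VNOVOL means 'volume not present', so the
    VLDB is consulted once and repeated VNOVOL replies fail immediately."""
    if code is VolError.VNOVOL:
        if vs.known_absent:
            return "fail"
        vs.known_absent = True
        vs.vldb_lookups += 1
        return "vldb_then_retry"
    return legacy_action(code, vs)


if __name__ == "__main__":
    old, new = VolumeState(), VolumeState()
    for _ in range(5):  # a file server that keeps answering VNOVOL
        legacy_action(VolError.VNOVOL, old)
        revised_action(VolError.VNOVOL, new)
    print(old.vldb_lookups, new.vldb_lookups)  # prints: 5 1
```

Against a file server that keeps returning VNOVOL, the legacy policy issues one VLDB query per retry while the revised one issues a single query. A real cache manager would also need to clear `known_absent` when the volume's VLDB entry changes, which this sketch omits.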
This discussion raises a question: how do we want to clarify the
documentation so that the various meanings of each error code (and the
context in which each meaning applies) are clear? Can I just push changes
to fs-cm-spec.h and vldb-vol-spec.h into Gerrit, or would the community
prefer that afs3-stds have a chance to review any language changes before
they are pushed? Ideally, I'd want to codify this in an I-D describing how
clients should behave in the face of Rx aborts, but I don't foresee having
time to do that anytime soon.

Regards,
-Tom

_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
