On 5/8/2012 1:55 PM, Tom Keiser wrote:
> On Fri, May 4, 2012 at 6:35 PM, Jeffrey Altman
> <[email protected]> wrote:
>> Tom:
>>
>> The 1991 Zayas specifications are lacking in many regards. For
>> starters, the Vxxx error codes are only defined for the Vol/VL RPCs and
>> not for the FS/CM RPCs. The use of the Vxxxx error codes in the FS/CM
>> RPCs is left undefined and yet those errors are reported to cache
>> managers by file servers.
>>
> Hi Jeff,
>
> I was wondering if you were going to raise this distinction. It is
> indeed troubling how little the FS/CM document has to say about
> this--and many other--issues.

I suspect the protocol documents were intended as a guide for Transarc
developers to use as part of internal training on the code base. They
certainly were not written at the level necessary to produce an
independent implementation.

>> I think it was 2004 or perhaps early 2005 when a large user was
>> concerned about VLDB scalability due to the introduction of tens of
>> thousands of Windows clients into the environment. Each time a VNOVOL,
>> VMOVED, VOFFLINE, VSALVAGE or VNOSERVICE error was received the Windows
>> client would query the VLDB and retry the request after 2 seconds. If a
>> volume couldn't be served from a file server this process would be
>> repeated. This is exacerbated by the behavior of the Explorer Shell
>> which reads the contents of directories it displays searching for
>> various metadata. As a result the VLDB servers were struggling under
>> the load. It wasn't going to be possible to make the VLDB servers
>> process more requests so it was important to reduce the number of
>> requests that were sent.
>>
>> The discussions that took place came to the conclusion that the
>> description of VNOVOL was ambiguous and its meaning based upon usage
>> should be that the volume is not present. With that interpretation a
>> client could restrict the number of VLDB lookups for a volume. I do not
>> remember if these discussions took place at a hackathon, a workshop, or
>> on Zephyr. Such use of the error codes didn't make a difference to
>> deployed clients since they acted on all error codes in an identical
>> fashion nor did it result in a protocol change given existing use in the
>> file server.
>>
> Thanks for the explanation; this is precisely what I was hoping to
> find out. Given this, I will push the VNOVOL changes.

I look forward to seeing them.

>> Perhaps others can find a reference in Zephyr logs. I no longer have
>> access to them.
>>
> I, personally, don't need anyone to go to that level of effort.
>
> This discussion raises a question: how do we want to clarify the
> documentation so that the various meanings of each error code (and in
> which context each meaning applies) are clear? Can I just push
> changes to fs-cm-spec.h, and vldb-vol-spec.h into gerrit? Or, would
> the community prefer that afs3-stds have a chance to review any
> language changes before they are pushed? Ideally, I'd want to codify
> this in an I-D describing how clients should behave in the face of rx
> aborts, but I don't foresee having time to do that anytime soon.

Speaking for myself, I would personally be happy with the changes being
committed to doc/protocol via gerrit while at the same time dropping a
note to afs3-stds letting the folks there know that there is a protocol
documentation change pending review.

I would be happier if doc/protocol contained a Makefile and the
necessary doxygen configuration to produce readable documents from the
.h files.

Jeffrey Altman
