Hi,

As some of you already know, sites have recently run into trouble caused by the interpretation of various special error codes in the volume package. After looking at the Ed Zayas spec, and at how the Unix and Windows clients interpret the various codes in master and in OpenAFS 1.0, I want to start a discussion about the slight redefinition of protocol error-handling semantics over the past decade. According to the Zayas VVL spec, the relevant error codes have the following meanings:
- VSALVAGE: volume needs to be salvaged
- VNOVOL: the given volume is either not attached, doesn't exist, or is not online
- VNOSERVICE: the volume is currently not in service
- VOFFLINE: the specified volume is offline, for the reason given in the offline message field (a subfield within the volume field in struct volser_trans)
- VBUSY: the named volume is temporarily unavailable, and the client is encouraged to retry the operation shortly

By my reading of the above specification, VOFFLINE is strictly for use when offlineMessage is set in the VolumeDiskData file, whereas VNOVOL was intended to be the catch-all "it's not online" error code. Indeed, OpenAFS 1.0 volume.c more or less follows this rubric. When working on DAFS many years ago, I tried to follow these definitions (although, admittedly, I got it wrong in a number of cases).

Now, I must concede that the definitions in the Zayas spec are not terribly useful: they do not differentiate between "I don't have it" and "I won't give it to you", which is typically the fundamental question the CM is trying to answer. In this respect, I much prefer the way recent versions of the Windows CM use VNOVOL and VOFFLINE to answer the existence question. However, as much as I like the cleanliness of this approach, I am concerned about the apparent divergence between our implementations and our specification...

It's certainly possible that I'm not privy to protocol discussions where it was decided that redefining VNOVOL, VNOSERVICE[*], and VOFFLINE was OK (given that legacy CMs seem to make little distinction between VOFFLINE, VNOVOL, VSALVAGE, VNOSERVICE, etc.). If that is the case, could someone provide more information from those discussions? Obviously, the current mismatch in behavior between DAFS and the Windows CM needs to be resolved posthaste.
That we already have a wide deployment base of nodes that disagree about the meaning of certain critical error codes is troubling, to the point that pragmatism may preclude strict adherence to the extant AFS-3 specification. This leaves me with two questions:

1) Is there something OpenAFS can do to resolve this issue without requiring any standards involvement?
2) If not, what is our stop-gap until we can fix this at the afs3-stds level?

With regard to (1), I have some patches that modify DAFS to behave more like the Windows CM expects. However, before I consider pushing these patches to Gerrit, I want to solicit opinions on these underlying questions...

* I'm willing to grant up front that our reappropriation of VNOSERVICE does not require further discussion, as it was a previously unused error constant.

-- 
Tom Keiser
