[OpenAFS] 1.4.x quorum election process?
Can anyone point me at the docs where quorum election, IP address numbering as it pertains to election, etc. live? I can't find what I'm looking for on openafs.org. I seem to recall that the "highest IP is sync site" (if I have that right) nonsense was addressed, but again, I cannot find the modern info about the election logic. Thanks for any info!

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info
Re: [OpenAFS] 1.4.x quorum election process?
> Can anyone point me at the docs where quorum election, IP address
> numbering as it pertains to election, etc. live? I can't find what
> I'm looking for on openafs.org. I seem to recall that the "highest IP
> is sync site" (if I have that right) nonsense was addressed, but
> again, I cannot find the modern info about the election logic.

There are two sources of documentation that I know about: a long-ago paper by Mike Kazar, and the source code (which actually has reasonable comments). I have a copy of the paper if you care. The key source code you want is ${OPENAFS}/src/ubik/vote.c. And in my reading, other than the support for clone servers, nothing has changed in terms of the quorum selection (it's the lowest IP address, actually).

--Ken
Re: [OpenAFS] 1.4.x quorum election process?
> There are two sources of documentation that I know about: a long-ago
> paper by Mike Kazar, and the source code (which actually has
> reasonable comments). I have a copy of the paper if you care. The key
> source code you want is ${OPENAFS}/src/ubik/vote.c. And in my
> reading, other than the support for clone servers, nothing has
> changed in terms of the quorum selection (it's the lowest IP address,
> actually).

Thanks Ken,

Yes, lowest, of course (sorry). I can't view the .PS documents yet, but I'm not sure it's necessary to view them if nothing has changed (I was sure it had).

The lowest-IP-address favoritism decision is totally arbitrary, no? We're kind of screwed unless there's a way around it, and we really would not like to have to apply a local patch with every rollout.

Andrew, Simon, Jeffrey, Derrick, et al.: would a favor-highest patch be accepted if it was controlled via the configure script, defaulting to the traditional behavior?
Re: [OpenAFS] 1.4.x quorum election process?
On 10/26/2011 1:49 PM, Jeff Blaine wrote:
> Would a favor-highest patch be accepted if it was controlled via the
> configure script, defaulting to the traditional behavior?

I would object. A quorum requirement is that all servers are in agreement on the server configuration and the quorum algorithm. Any change to the quorum algorithm needs to be exposed as part of the negotiation, so that servers cannot get into a state where a misconfigured server, or a server executing an alternate algorithm, results in a failure to achieve quorum.

One of the requirements for pushing patches upstream is that they must not cause people to hang themselves through inadvertent use. Think about what you would need to do if you were running with this patch locally. Every sysadmin who upgrades these servers must remember that the patch is in place (or how the servers were built/configured) and not forget. If you leave tomorrow, is the next sysadmin going to be burned by this change when s/he attempts to install OpenAFS-distributed binaries in your cell?

That is not to say that we don't need to improve things. We know we do, and it has been talked about for nearly a decade. However, it is also hard, and since it is hard it has repeatedly been put off.

Jeffrey Altman
Re: [OpenAFS] 1.4.x quorum election process?
> The lowest-IP-address favoritism decision is totally arbitrary, no?

Absolutely, yes. I think ... looking at the source code, the comparison is done in three places in vote.c. You could replace that with anything else. I've always thought that an explicit ordering would make more sense, but I never cared enough to actually write the code.

> We're kind of screwed unless there's a way around it, and we really
> would not like to have to apply a local patch with every rollout.

Have you considered making the lowest server a clone? Clones are like other database servers, except that they can never be elected as a sync site. The (default) election winner then would be the next lowest.

Also, it's not commonly understood, but Ubik voting is what I like to call Chicago style; the incumbent always wins the election, even if he's not the best candidate. Thus, if you shut off the database server with the lowest IP address, once a new election takes place the winner will be sync-site-for-life (unless he's out of service past the Ubik voting interval).

Just trying to present possible solutions that don't involve code changes.

--Ken
Re: [OpenAFS] 1.4.x quorum election process?
> Think about what you would need to do if you were running with this
> patch locally. Every sysadmin who upgrades these servers must
> remember that the patch is in place (or how the servers were
> built/configured) and not forget. If you leave tomorrow, is the next
> sysadmin going to be burned by this change when s/he attempts to
> install OpenAFS-distributed binaries in your cell?

You could make the same argument (that you're making) with at least five other existing OpenAFS command-line or build-time options. Example: --enable-namei-fileserver vs. not, dropped on a server with existing vice partitions in the wrong style.

Build/implementation decisions are encapsulated in build scripts of ours. Additionally, those decisions are documented in our wiki. If he/she hasn't read our internal documentation about our cell, which is extensive and clear in our wiki, then yes, he/she will get burned. Just like he/she would with any other option for cell or server configuration.
Re: [OpenAFS] 1.4.x quorum election process?
> I would object. A quorum requirement is that all servers are in
> agreement on the server configuration and the quorum algorithm. Any
> change to the quorum algorithm needs to be exposed as part of the
> negotiation, so that servers cannot get into a state where a
> misconfigured server, or a server executing an alternate algorithm,
> results in a failure to achieve quorum.

While I agree with that in theory, we don't have that today; misconfigured servers can easily cause a quorum failure. Also, if the server clocks don't match up, that can easily cause a quorum failure (I'd classify that as a misconfigured server as well).

As an aside: you start to see why this problem has never been fixed. Fixing the basic problem is easy, but if you start talking about some huge negotiation framework ... gaaah, it's too much.

--Ken
Re: [OpenAFS] 1.4.x quorum election process?
On 10/26/2011 02:23 PM, Jeff Blaine wrote:
>> Have you considered making the lowest server a clone? Clones are
>> like other database servers, except that they can never be elected
>> as a sync site. The (default) election winner then would be the next
>> lowest.
>
> YES! Thank you! I knew there was something added related to this
> topic. CLONES. I will investigate.

FYI, the CellServDB man page has this info:
http://docs.openafs.org/Reference/5/CellServDB.html
Just search for "clone".

Jason
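To sketch what that looks like (hostnames made up; per my reading of that man page, enclosing a server's IP address in square brackets in the server-side CellServDB marks it as a clone, which votes but can never be elected sync site):

```
>example.com        #Example cell
[192.0.2.10]        #db1.example.com
192.0.2.11          #db2.example.com
192.0.2.12          #db3.example.com
```

With db1 marked as a clone, db2 (the next-lowest address) becomes the default election winner. Double-check the exact clone syntax against the man page before deploying.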
Re: [OpenAFS] 1.4.x quorum election process?
On 10/26/2011 2:21 PM, Jeff Blaine wrote:
>> Think about what you would need to do if you were running with this
>> patch locally. Every sysadmin who upgrades these servers must
>> remember that the patch is in place (or how the servers were
>> built/configured) and not forget. If you leave tomorrow, is the next
>> sysadmin going to be burned by this change when s/he attempts to
>> install OpenAFS-distributed binaries in your cell?
>
> You could make the same argument (that you're making) with at least
> five other existing OpenAFS command-line or build-time options.
> Example: --enable-namei-fileserver vs. not, dropped on a server with
> existing vice partitions in the wrong style.

We have spent the last five years removing compile-time options. Inode vs. namei is a particularly bad example, since two things have been happening in the file server back-end processing:

1. Consolidation of code trees to permit more run-time functionality selection.

2. A decision not to accept additional back-end implementations (such as POSIX extended attributes or HostAFS) without also abstracting the back-ends, so that a file server can choose which back-end to use at run time on a partition-by-partition basis. One of the rationales for this is to permit sites to migrate from one back-end to another on the same file server hardware without requiring that all volumes be relocated to another server as a transition.

> Build/implementation decisions are encapsulated in build scripts of
> ours. Additionally, those decisions are documented in our wiki. If
> he/she hasn't read our internal documentation about our cell, which
> is extensive and clear in our wiki, then yes, he/she will get burned.
> Just like he/she would with any other option for cell or server
> configuration.

Then you can happily maintain the patch locally, since it makes a change to three lines of source code.
There has been discussion over the last several years about what such a change should look like, especially as we move to a world that includes a mixture of IPv4 and IPv6, as well as the possibility that multiple service instances could exist on the same machine with different port numbers. Such a configuration could be deployed today using DNS SRV records for any of the database services.

I don't remember all of the details, but I believe the agreed-upon solution included:

* UUIDs for each database service instance.

* Configuration data, deployed in conjunction with a new CellServDB format, that would specify the ranking.

* Some hash of the configuration data that would be included in the votes, to ensure that only votes cast on the same ballot are included in the resulting decision by those that agree upon the ballot.

Where are we on this? Well:

* Simon Wilkinson [YFS] has spent time working on implementing the new CellServDB file format that was agreed to at the most recent AFS hackathon.

* There is agreement that mixed-version database servers are not supported within a cell, and that ubik is not an afs3-standard protocol and as such does not require protocol standardization for the purpose of making changes. That is not to say that OpenAFS will permit a change to be accepted without a solid protocol description, but it does make it easier for OpenAFS to accept and roll out changes as part of a version-number upgrade.

* Filling in the rest of the pieces, such as assigning UUIDs, is not an overwhelming amount of work.

Anyone interested in contributing to this work with code or financial support is welcome to contact me privately.

Jeffrey Altman