Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Mon, Jun 04, 2012 at 11:33:45AM +1000, Andrew Beekhof wrote:
> On Mon, Jun 4, 2012 at 11:28 AM, Andrew Beekhof wrote:
> > On Fri, May 25, 2012 at 7:48 PM, Florian Haas wrote:
> >> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg wrote:
> >>> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg wrote:
> > Sorry, sent too early.
> >
> > That would not catch the case of cluster partitions joining,
> > only the pacemaker startup with fully connected cluster communication
> > already up.
> >
> > I thought about a dc-priority default of 100,
> > and only triggering a re-election if I am DC,
> > my dc-priority is < 50, and I see a node joining.
> Hardcoded arbitrary defaults aren't that much fun. "You can use any
> number, but 100 is the magic threshold" is something I wouldn't want
> to explain to people over and over again.
> >>>
> >>> Then don't ;-)
> >>>
> >>> Not helping, and irrelevant to this case.
> >>>
> >>> Besides, that was an example.
> >>> Easily possible: move the "I want to lose" vs "I want to win"
> >>> magic number to be 0, and allow both positive and negative priorities.
> >>> You get to decide whether positive or negative is the "I'd rather lose"
> >>> side. Want to make that configurable as well? Right.
> >>
> >> Nope, 0 is used as a threshold value in Pacemaker all over the place.
> >> So allowing both positive and negative priorities and making 0 the
> >> default sounds perfectly sane to me.
> >>
> >>> I don't think this can be made part of the cib configuration;
> >>> DC election takes place before cibs are resynced, so if you have
> >>> diverging cibs, you possibly end up with a never-ending election?
> >>>
> >>> Then maybe the election is stable enough,
> >>> even after this change to the algorithm.
> >>
> >> Andrew?
> >
> > Probably. The preferences are not going to be rapidly changing, so
> > there is no reason to suspect it would destabilise things.
> > Oh, you mean if the values are stored in the CIB?
> Yeah, I guess you could have issues if you changed the CIB during a
> cluster partition... don't do that?

Right. That was my concern.

So I'd rather not add them to the cib, but get them from environment
variables. Which means that I would need to restart the local stack
if I wanted to change the preference. Good enough.

> Honestly though, given the number (1? 2? 0?) of sites in the world
> that actually need this, my main criterion for a successful patch is
> "not screwing it up for everyone else".
> Which certainly rules out starting elections just because someone
> joined. Although "I've just started and have a non-zero preference so
> I'm going to force an election" would be fine.

Thanks. I'll see what the current status of that patch is, and if we
can prepare a patch to be considered for upstream inclusion.
May take a while though, due to round-trip times ;-)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Mon, Jun 4, 2012 at 11:28 AM, Andrew Beekhof wrote:
> On Fri, May 25, 2012 at 7:48 PM, Florian Haas wrote:
>> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg wrote:
>>> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg wrote:
> Sorry, sent too early.
>
> That would not catch the case of cluster partitions joining,
> only the pacemaker startup with fully connected cluster communication
> already up.
>
> I thought about a dc-priority default of 100,
> and only triggering a re-election if I am DC,
> my dc-priority is < 50, and I see a node joining.
Hardcoded arbitrary defaults aren't that much fun. "You can use any
number, but 100 is the magic threshold" is something I wouldn't want
to explain to people over and over again.
>>>
>>> Then don't ;-)
>>>
>>> Not helping, and irrelevant to this case.
>>>
>>> Besides, that was an example.
>>> Easily possible: move the "I want to lose" vs "I want to win"
>>> magic number to be 0, and allow both positive and negative priorities.
>>> You get to decide whether positive or negative is the "I'd rather lose"
>>> side. Want to make that configurable as well? Right.
>>
>> Nope, 0 is used as a threshold value in Pacemaker all over the place.
>> So allowing both positive and negative priorities and making 0 the
>> default sounds perfectly sane to me.
>>
>>> I don't think this can be made part of the cib configuration;
>>> DC election takes place before cibs are resynced, so if you have
>>> diverging cibs, you possibly end up with a never-ending election?
>>>
>>> Then maybe the election is stable enough,
>>> even after this change to the algorithm.
>>
>> Andrew?
>
> Probably. The preferences are not going to be rapidly changing, so
> there is no reason to suspect it would destabilise things.

Oh, you mean if the values are stored in the CIB?
Yeah, I guess you could have issues if you changed the CIB during a
cluster partition... don't do that?

Honestly though, given the number (1? 2? 0?) of sites in the world
that actually need this, my main criterion for a successful patch is
"not screwing it up for everyone else".

Which certainly rules out starting elections just because someone
joined. Although "I've just started and have a non-zero preference so
I'm going to force an election" would be fine.

>
>>
>>> But you'd need to add another trigger on "dc-priority in configuration
>>> changed", complicating this stuff for no reason.
>>>
We actually discussed node defaults a while back. Those would be
similar to resource and op defaults which Pacemaker already has, and
set defaults for node attributes for newly joined nodes. At the time
the idea was to support putting new joiners in standby mode by
default, so when you added a node in a symmetric cluster, you wouldn't
need to be afraid that Pacemaker would shuffle resources around.[1]
This dc-priority would be another possibly useful use case for this.
>>>
>>> Not so sure about that.
>>>
[1] Yes, semi-doable with putting the cluster into maintenance mode
before firing up the new node, setting that node into standby, and
then unsetting maintenance mode. But that's just an additional step
that users can easily forget about.
>>>
>>> Why not simply add the node to the cib, and set it to standby,
>>> before it even joins for the first time.
>>
>> Haha, good one.
>>
>> Wait, you weren't joking?
>>
>> Florian
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 7:48 PM, Florian Haas wrote:
> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg wrote:
>> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
>>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg wrote:
>>> > Sorry, sent too early.
>>> >
>>> > That would not catch the case of cluster partitions joining,
>>> > only the pacemaker startup with fully connected cluster communication
>>> > already up.
>>> >
>>> > I thought about a dc-priority default of 100,
>>> > and only triggering a re-election if I am DC,
>>> > my dc-priority is < 50, and I see a node joining.
>>>
>>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
>>> number, but 100 is the magic threshold" is something I wouldn't want
>>> to explain to people over and over again.
>>
>> Then don't ;-)
>>
>> Not helping, and irrelevant to this case.
>>
>> Besides, that was an example.
>> Easily possible: move the "I want to lose" vs "I want to win"
>> magic number to be 0, and allow both positive and negative priorities.
>> You get to decide whether positive or negative is the "I'd rather lose"
>> side. Want to make that configurable as well? Right.
>
> Nope, 0 is used as a threshold value in Pacemaker all over the place.
> So allowing both positive and negative priorities and making 0 the
> default sounds perfectly sane to me.
>
>> I don't think this can be made part of the cib configuration;
>> DC election takes place before cibs are resynced, so if you have
>> diverging cibs, you possibly end up with a never-ending election?
>>
>> Then maybe the election is stable enough,
>> even after this change to the algorithm.
>
> Andrew?

Probably. The preferences are not going to be rapidly changing, so
there is no reason to suspect it would destabilise things.

>
>> But you'd need to add another trigger on "dc-priority in configuration
>> changed", complicating this stuff for no reason.
>>
>>> We actually discussed node defaults a while back. Those would be
>>> similar to resource and op defaults which Pacemaker already has, and
>>> set defaults for node attributes for newly joined nodes. At the time
>>> the idea was to support putting new joiners in standby mode by
>>> default, so when you added a node in a symmetric cluster, you wouldn't
>>> need to be afraid that Pacemaker would shuffle resources around.[1]
>>> This dc-priority would be another possibly useful use case for this.
>>
>> Not so sure about that.
>>
>>> [1] Yes, semi-doable with putting the cluster into maintenance mode
>>> before firing up the new node, setting that node into standby, and
>>> then unsetting maintenance mode. But that's just an additional step
>>> that users can easily forget about.
>>
>> Why not simply add the node to the cib, and set it to standby,
>> before it even joins for the first time.
>
> Haha, good one.
>
> Wait, you weren't joking?
>
> Florian
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 09:05:54PM +1000, Andrew Beekhof wrote:
> On Fri, May 25, 2012 at 7:48 PM, Florian Haas wrote:
> > On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg wrote:
> >> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
> >>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg wrote:
> >>> > Sorry, sent too early.
> >>> >
> >>> > That would not catch the case of cluster partitions joining,
> >>> > only the pacemaker startup with fully connected cluster communication
> >>> > already up.
> >>> >
> >>> > I thought about a dc-priority default of 100,
> >>> > and only triggering a re-election if I am DC,
> >>> > my dc-priority is < 50, and I see a node joining.
> >>>
> >>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
> >>> number, but 100 is the magic threshold" is something I wouldn't want
> >>> to explain to people over and over again.
> >>
> >> Then don't ;-)
> >>
> >> Not helping, and irrelevant to this case.
> >>
> >> Besides, that was an example.
> >> Easily possible: move the "I want to lose" vs "I want to win"
> >> magic number to be 0, and allow both positive and negative priorities.
> >> You get to decide whether positive or negative is the "I'd rather lose"
> >> side. Want to make that configurable as well? Right.
> >
> > Nope, 0 is used as a threshold value in Pacemaker all over the place.
> > So allowing both positive and negative priorities and making 0 the
> > default sounds perfectly sane to me.
> >
> >> I don't think this can be made part of the cib configuration;
> >> DC election takes place before cibs are resynced, so if you have
> >> diverging cibs, you possibly end up with a never-ending election?
> >>
> >> Then maybe the election is stable enough,
> >> even after this change to the algorithm.
> >
> > Andrew?
>
> This whole thread makes me want to hurt kittens.

Yep...
Sorry for that :(

    Lars
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 7:48 PM, Florian Haas wrote:
> On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg wrote:
>> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
>>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg wrote:
>>> > Sorry, sent too early.
>>> >
>>> > That would not catch the case of cluster partitions joining,
>>> > only the pacemaker startup with fully connected cluster communication
>>> > already up.
>>> >
>>> > I thought about a dc-priority default of 100,
>>> > and only triggering a re-election if I am DC,
>>> > my dc-priority is < 50, and I see a node joining.
>>>
>>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
>>> number, but 100 is the magic threshold" is something I wouldn't want
>>> to explain to people over and over again.
>>
>> Then don't ;-)
>>
>> Not helping, and irrelevant to this case.
>>
>> Besides, that was an example.
>> Easily possible: move the "I want to lose" vs "I want to win"
>> magic number to be 0, and allow both positive and negative priorities.
>> You get to decide whether positive or negative is the "I'd rather lose"
>> side. Want to make that configurable as well? Right.
>
> Nope, 0 is used as a threshold value in Pacemaker all over the place.
> So allowing both positive and negative priorities and making 0 the
> default sounds perfectly sane to me.
>
>> I don't think this can be made part of the cib configuration;
>> DC election takes place before cibs are resynced, so if you have
>> diverging cibs, you possibly end up with a never-ending election?
>>
>> Then maybe the election is stable enough,
>> even after this change to the algorithm.
>
> Andrew?

This whole thread makes me want to hurt kittens.

>
>> But you'd need to add another trigger on "dc-priority in configuration
>> changed", complicating this stuff for no reason.
>>
>>> We actually discussed node defaults a while back. Those would be
>>> similar to resource and op defaults which Pacemaker already has, and
>>> set defaults for node attributes for newly joined nodes. At the time
>>> the idea was to support putting new joiners in standby mode by
>>> default, so when you added a node in a symmetric cluster, you wouldn't
>>> need to be afraid that Pacemaker would shuffle resources around.[1]
>>> This dc-priority would be another possibly useful use case for this.
>>
>> Not so sure about that.
>>
>>> [1] Yes, semi-doable with putting the cluster into maintenance mode
>>> before firing up the new node, setting that node into standby, and
>>> then unsetting maintenance mode. But that's just an additional step
>>> that users can easily forget about.
>>
>> Why not simply add the node to the cib, and set it to standby,
>> before it even joins for the first time.
>
> Haha, good one.
>
> Wait, you weren't joking?
>
> Florian
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 11:38 AM, Lars Ellenberg wrote:
> On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
>> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg wrote:
>> > Sorry, sent too early.
>> >
>> > That would not catch the case of cluster partitions joining,
>> > only the pacemaker startup with fully connected cluster communication
>> > already up.
>> >
>> > I thought about a dc-priority default of 100,
>> > and only triggering a re-election if I am DC,
>> > my dc-priority is < 50, and I see a node joining.
>>
>> Hardcoded arbitrary defaults aren't that much fun. "You can use any
>> number, but 100 is the magic threshold" is something I wouldn't want
>> to explain to people over and over again.
>
> Then don't ;-)
>
> Not helping, and irrelevant to this case.
>
> Besides, that was an example.
> Easily possible: move the "I want to lose" vs "I want to win"
> magic number to be 0, and allow both positive and negative priorities.
> You get to decide whether positive or negative is the "I'd rather lose"
> side. Want to make that configurable as well? Right.

Nope, 0 is used as a threshold value in Pacemaker all over the place.
So allowing both positive and negative priorities and making 0 the
default sounds perfectly sane to me.

> I don't think this can be made part of the cib configuration;
> DC election takes place before cibs are resynced, so if you have
> diverging cibs, you possibly end up with a never-ending election?
>
> Then maybe the election is stable enough,
> even after this change to the algorithm.

Andrew?

> But you'd need to add another trigger on "dc-priority in configuration
> changed", complicating this stuff for no reason.
>
>> We actually discussed node defaults a while back. Those would be
>> similar to resource and op defaults which Pacemaker already has, and
>> set defaults for node attributes for newly joined nodes. At the time
>> the idea was to support putting new joiners in standby mode by
>> default, so when you added a node in a symmetric cluster, you wouldn't
>> need to be afraid that Pacemaker would shuffle resources around.[1]
>> This dc-priority would be another possibly useful use case for this.
>
> Not so sure about that.
>
>> [1] Yes, semi-doable with putting the cluster into maintenance mode
>> before firing up the new node, setting that node into standby, and
>> then unsetting maintenance mode. But that's just an additional step
>> that users can easily forget about.
>
> Why not simply add the node to the cib, and set it to standby,
> before it even joins for the first time.

Haha, good one.

Wait, you weren't joking?

Florian

--
Need help with High Availability?
http://www.hastexo.com/now
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 11:15:32AM +0200, Florian Haas wrote:
> On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg wrote:
> > Sorry, sent too early.
> >
> > That would not catch the case of cluster partitions joining,
> > only the pacemaker startup with fully connected cluster communication
> > already up.
> >
> > I thought about a dc-priority default of 100,
> > and only triggering a re-election if I am DC,
> > my dc-priority is < 50, and I see a node joining.
>
> Hardcoded arbitrary defaults aren't that much fun. "You can use any
> number, but 100 is the magic threshold" is something I wouldn't want
> to explain to people over and over again.

Then don't ;-)

Not helping, and irrelevant to this case.

Besides, that was an example.
Easily possible: move the "I want to lose" vs "I want to win"
magic number to be 0, and allow both positive and negative priorities.
You get to decide whether positive or negative is the "I'd rather lose"
side. Want to make that configurable as well? Right.

I don't think this can be made part of the cib configuration;
DC election takes place before cibs are resynced, so if you have
diverging cibs, you possibly end up with a never-ending election?

Then maybe the election is stable enough,
even after this change to the algorithm.

But you'd need to add another trigger on "dc-priority in configuration
changed", complicating this stuff for no reason.

> We actually discussed node defaults a while back. Those would be
> similar to resource and op defaults which Pacemaker already has, and
> set defaults for node attributes for newly joined nodes. At the time
> the idea was to support putting new joiners in standby mode by
> default, so when you added a node in a symmetric cluster, you wouldn't
> need to be afraid that Pacemaker would shuffle resources around.[1]
> This dc-priority would be another possibly useful use case for this.

Not so sure about that.

> [1] Yes, semi-doable with putting the cluster into maintenance mode
> before firing up the new node, setting that node into standby, and
> then unsetting maintenance mode. But that's just an additional step
> that users can easily forget about.

Why not simply add the node to the cib, and set it to standby,
before it even joins for the first time.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 10:45 AM, Lars Ellenberg wrote:
> Sorry, sent too early.
>
> That would not catch the case of cluster partitions joining,
> only the pacemaker startup with fully connected cluster communication
> already up.
>
> I thought about a dc-priority default of 100,
> and only triggering a re-election if I am DC,
> my dc-priority is < 50, and I see a node joining.

Hardcoded arbitrary defaults aren't that much fun. "You can use any
number, but 100 is the magic threshold" is something I wouldn't want
to explain to people over and over again.

We actually discussed node defaults a while back. Those would be
similar to resource and op defaults which Pacemaker already has, and
set defaults for node attributes for newly joined nodes. At the time
the idea was to support putting new joiners in standby mode by
default, so when you added a node in a symmetric cluster, you wouldn't
need to be afraid that Pacemaker would shuffle resources around.[1]
This dc-priority would be another possibly useful use case for this.

Just my two cents.

Florian

[1] Yes, semi-doable with putting the cluster into maintenance mode
before firing up the new node, setting that node into standby, and
then unsetting maintenance mode. But that's just an additional step
that users can easily forget about.

--
Need help with High Availability?
http://www.hastexo.com/now
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 10:29:58AM +0200, Lars Ellenberg wrote:
> On Fri, May 25, 2012 at 10:50:25AM +1000, Andrew Beekhof wrote:
> > On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg wrote:
> > > On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
> > >> On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg wrote:
> > >> >
> > >> > People sometimes think they have a use case
> > >> > for influencing which node will be the DC.
> > >>
> > >> Agreed :-)
> > >>
> > >> > Sometimes it is latency (certain cli commands work faster
> > >> > when done on the DC),
> > >>
> > >> Config changes can be run against any node, there is no reason to go
> > >> to the one on the DC.
> > >>
> > >> > sometimes they add a "mostly quorum"
> > >> > node which may be not quite up to the task of being DC.
> > >>
> > >> I'm not sure I buy that. Most of the load would come from the
> > >> resources themselves.
> > >>
> > >> > Prohibiting a node from becoming DC completely would
> > >> > mean it can not even be cleanly shut down (with 1.0.x, no MCP),
> > >> > or act on its own resources for certain no-quorum policies.
> > >> >
> > >> > So here is a patch I have been asked to present for discussion,
> > >>
> > >> May one ask where it originated?
> > >>
> > >> > against Pacemaker 1.0, that introduces a "dc-prio" configuration
> > >> > parameter, which will add some skew to the election algorithm.
> > >> >
> > >> > Open questions:
> > >> > * does it make sense at all?
> > >>
> > >> Doubtful :-)
> > >>
> > >> > * election algorithm compatibility, stability:
> > >> >   will the election be correct if some nodes have this patch,
> > >> >   and some don't?
> > >>
> > >> Unlikely, but you could easily make it so by placing it after the
> > >> version check (and bumping said version in the patch)
> > >>
> > >> > * How can it be improved so that a node with dc-prio=0 will
> > >> >   "give up" its DC role as soon as there is at least one other node
> > >> >   with dc-prio > 0?
> > >>
> > >> Short of causing an election every time a node joins... I doubt it.
> > >
> > > Where would be a suitable place in the code/fsa to do so?
> >
> > Just after the call to exit(0) :)
>
> Just what I thought ;-)
>
> > I'd do it at the end of do_started() but only if dc-priority* > 0.
> > That way you only cause an election if someone who is likely to win it
> > starts.
> > And people that don't enable this feature are unaffected.

Sorry, sent too early.

That would not catch the case of cluster partitions joining,
only the pacemaker startup with fully connected cluster communication
already up.

I thought about a dc-priority default of 100,
and only triggering a re-election if I am DC,
my dc-priority is < 50, and I see a node joining.

That would then happen in handle_request():

    /*== DC-Only Actions ==*/
    if (AM_I_DC) {
        if (strcmp(op, CRM_OP_JOIN_ANNOUNCE) == 0) {
            if ( *** new logic goes here *** )
                return I_ELECTION;
            else
                return I_NODE_JOIN;

Of course, we could even add the dc-priority to the
CRM_OP_JOIN_ANNOUNCE message, so we only trigger an election if we are
likely to lose.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 10:50:25AM +1000, Andrew Beekhof wrote:
> On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg wrote:
> > On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
> >> On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg wrote:
> >> >
> >> > People sometimes think they have a use case
> >> > for influencing which node will be the DC.
> >>
> >> Agreed :-)
> >>
> >> > Sometimes it is latency (certain cli commands work faster
> >> > when done on the DC),
> >>
> >> Config changes can be run against any node, there is no reason to go
> >> to the one on the DC.
> >>
> >> > sometimes they add a "mostly quorum"
> >> > node which may be not quite up to the task of being DC.
> >>
> >> I'm not sure I buy that. Most of the load would come from the
> >> resources themselves.
> >>
> >> > Prohibiting a node from becoming DC completely would
> >> > mean it can not even be cleanly shut down (with 1.0.x, no MCP),
> >> > or act on its own resources for certain no-quorum policies.
> >> >
> >> > So here is a patch I have been asked to present for discussion,
> >>
> >> May one ask where it originated?
> >>
> >> > against Pacemaker 1.0, that introduces a "dc-prio" configuration
> >> > parameter, which will add some skew to the election algorithm.
> >> >
> >> > Open questions:
> >> > * does it make sense at all?
> >>
> >> Doubtful :-)
> >>
> >> > * election algorithm compatibility, stability:
> >> >   will the election be correct if some nodes have this patch,
> >> >   and some don't?
> >>
> >> Unlikely, but you could easily make it so by placing it after the
> >> version check (and bumping said version in the patch)
> >>
> >> > * How can it be improved so that a node with dc-prio=0 will
> >> >   "give up" its DC role as soon as there is at least one other node
> >> >   with dc-prio > 0?
> >>
> >> Short of causing an election every time a node joins... I doubt it.
> >
> > Where would be a suitable place in the code/fsa to do so?
>
> Just after the call to exit(0) :)

Just what I thought ;-)

> I'd do it at the end of do_started() but only if dc-priority* > 0.
> That way you only cause an election if someone who is likely to win it
> starts.
> And people that don't enable this feature are unaffected.
>
> * Not dc-prio, it's 2012, there's no need to save the extra 4 chars :-)

Thanks,

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Fri, May 25, 2012 at 10:04 AM, Lars Ellenberg wrote:
> On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
>> On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg wrote:
>> >
>> > People sometimes think they have a use case
>> > for influencing which node will be the DC.
>>
>> Agreed :-)
>>
>> > Sometimes it is latency (certain cli commands work faster
>> > when done on the DC),
>>
>> Config changes can be run against any node, there is no reason to go
>> to the one on the DC.
>>
>> > sometimes they add a "mostly quorum"
>> > node which may be not quite up to the task of being DC.
>>
>> I'm not sure I buy that. Most of the load would come from the
>> resources themselves.
>>
>> > Prohibiting a node from becoming DC completely would
>> > mean it can not even be cleanly shut down (with 1.0.x, no MCP),
>> > or act on its own resources for certain no-quorum policies.
>> >
>> > So here is a patch I have been asked to present for discussion,
>>
>> May one ask where it originated?
>>
>> > against Pacemaker 1.0, that introduces a "dc-prio" configuration
>> > parameter, which will add some skew to the election algorithm.
>> >
>> > Open questions:
>> > * does it make sense at all?
>>
>> Doubtful :-)
>>
>> > * election algorithm compatibility, stability:
>> >   will the election be correct if some nodes have this patch,
>> >   and some don't?
>>
>> Unlikely, but you could easily make it so by placing it after the
>> version check (and bumping said version in the patch)
>>
>> > * How can it be improved so that a node with dc-prio=0 will
>> >   "give up" its DC role as soon as there is at least one other node
>> >   with dc-prio > 0?
>>
>> Short of causing an election every time a node joins... I doubt it.
>
> Where would be a suitable place in the code/fsa to do so?

Just after the call to exit(0) :)

I'd do it at the end of do_started() but only if dc-priority* > 0.
That way you only cause an election if someone who is likely to win it
starts. And people that don't enable this feature are unaffected.

* Not dc-prio, it's 2012, there's no need to save the extra 4 chars :-)
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Sun, May 06, 2012 at 09:45:09PM +1000, Andrew Beekhof wrote:
> On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg wrote:
> >
> > People sometimes think they have a use case
> > for influencing which node will be the DC.
>
> Agreed :-)
>
> > Sometimes it is latency (certain cli commands work faster
> > when done on the DC),
>
> Config changes can be run against any node; there is no reason to go
> to the DC.
>
> > sometimes they add a "mostly quorum"
> > node which may be not quite up to the task of being DC.
>
> I'm not sure I buy that. Most of the load would come from the
> resources themselves.
>
> > Prohibiting a node from becoming DC completely would
> > mean it can not even be cleanly shut down (with 1.0.x, no MCP),
> > or act on its own resources for certain no-quorum policies.
> >
> > So here is a patch I have been asked to present for discussion,
>
> May one ask where it originated?
>
> > against Pacemaker 1.0, that introduces a "dc-prio" configuration
> > parameter, which will add some skew to the election algorithm.
> >
> > Open questions:
> > * does it make sense at all?
>
> Doubtful :-)
>
> > * election algorithm compatibility, stability:
> >   will the election be correct if some nodes have this patch,
> >   and some don't?
>
> Unlikely, but you could easily make it so by placing it after the
> version check (and bumping said version in the patch)
>
> > * How can it be improved so that a node with dc-prio=0 will
> >   "give up" its DC role as soon as there is at least one other node
> >   with dc-prio > 0?
>
> Short of causing an election every time a node joins... I doubt it.

Where would be a suitable place in the code/fsa to do so?
Thanks,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
Re: [Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
On Thu, May 3, 2012 at 5:38 PM, Lars Ellenberg wrote:
>
> People sometimes think they have a use case
> for influencing which node will be the DC.

Agreed :-)

> Sometimes it is latency (certain cli commands work faster
> when done on the DC),

Config changes can be run against any node; there is no reason to go
to the DC.

> sometimes they add a "mostly quorum"
> node which may be not quite up to the task of being DC.

I'm not sure I buy that. Most of the load would come from the
resources themselves.

> Prohibiting a node from becoming DC completely would
> mean it can not even be cleanly shut down (with 1.0.x, no MCP),
> or act on its own resources for certain no-quorum policies.
>
> So here is a patch I have been asked to present for discussion,

May one ask where it originated?

> against Pacemaker 1.0, that introduces a "dc-prio" configuration
> parameter, which will add some skew to the election algorithm.
>
> Open questions:
> * does it make sense at all?

Doubtful :-)

> * election algorithm compatibility, stability:
>   will the election be correct if some nodes have this patch,
>   and some don't?

Unlikely, but you could easily make it so by placing it after the
version check (and bumping said version in the patch)

> * How can it be improved so that a node with dc-prio=0 will
>   "give up" its DC role as soon as there is at least one other node
>   with dc-prio > 0?

Short of causing an election every time a node joins... I doubt it.
> Lars
>
> --- ./crmd/election.c.orig	2011-05-11 11:36:05.577329600 +0200
> +++ ./crmd/election.c	2011-05-12 13:49:04.671484200 +0200
> @@ -29,6 +29,7 @@
>  GHashTable *voted = NULL;
>  uint highest_born_on = -1;
>  static int current_election_id = 1;
> +static int our_dc_prio = -1;
> 
>  /* A_ELECTION_VOTE */
>  void
> @@ -55,6 +56,20 @@
>          break;
>      }
> 
> +    if (our_dc_prio < 0) {
> +        char * dc_prio_str = getenv("HA_dc_prio");
> +
> +        if (dc_prio_str == NULL) {
> +            our_dc_prio = 1;
> +        } else {
> +            our_dc_prio = atoi(dc_prio_str);
> +        }
> +    }
> +
> +    if (!our_dc_prio) {
> +        not_voting = TRUE;
> +    }
> +
>      if(not_voting == FALSE) {
>          if(is_set(fsa_input_register, R_STARTING)) {
>              not_voting = TRUE;
> @@ -72,12 +87,13 @@
>      }
> 
>      vote = create_request(
> -        CRM_OP_VOTE, NULL, NULL,
> +        our_dc_prio?CRM_OP_VOTE:CRM_OP_NOVOTE, NULL, NULL,
>          CRM_SYSTEM_CRMD, CRM_SYSTEM_CRMD, NULL);
> 
>      current_election_id++;
>      crm_xml_add(vote, F_CRM_ELECTION_OWNER, fsa_our_uuid);
>      crm_xml_add_int(vote, F_CRM_ELECTION_ID, current_election_id);
> +    crm_xml_add_int(vote, F_CRM_DC_PRIO, our_dc_prio);
> 
>      send_cluster_message(NULL, crm_msg_crmd, vote, TRUE);
>      free_xml(vote);
> @@ -188,6 +204,7 @@
>               fsa_data_t *msg_data)
>  {
>      int election_id = -1;
> +    int your_dc_prio = 1;
>      int log_level = LOG_INFO;
>      gboolean done = FALSE;
>      gboolean we_loose = FALSE;
> @@ -216,6 +233,17 @@
>      your_version = crm_element_value(vote->msg, F_CRM_VERSION);
>      election_owner = crm_element_value(vote->msg, F_CRM_ELECTION_OWNER);
>      crm_element_value_int(vote->msg, F_CRM_ELECTION_ID, &election_id);
> +    crm_element_value_int(vote->msg, F_CRM_DC_PRIO, &your_dc_prio);
> +
> +    if (our_dc_prio < 0) {
> +        char * dc_prio_str = getenv("HA_dc_prio");
> +
> +        if (dc_prio_str == NULL) {
> +            our_dc_prio = 1;
> +        } else {
> +            our_dc_prio = atoi(dc_prio_str);
> +        }
> +    }
> 
>      CRM_CHECK(vote_from != NULL, vote_from = fsa_our_uname);
> 
> @@ -269,6 +297,13 @@
>          reason = "Recorded";
>          done = TRUE;
> 
> +    } else if(our_dc_prio < your_dc_prio) {
> +        reason = "DC Prio";
> +        we_loose = TRUE;
> +
> +    } else if(our_dc_prio > your_dc_prio) {
> +        reason = "DC Prio";
> +
>      } else if(compare_version(your_version, CRM_FEATURE_SET) < 0) {
>          reason = "Version";
>          we_loose = TRUE;
> @@ -328,6 +363,7 @@
> 
>      crm_xml_add(novote, F_CRM_ELECTION_OWNER, election_owner);
>      crm_xml_add_int(novote, F_CRM_ELECTION_ID, election_id);
> +    crm_xml_add_int(novote, F_CRM_DC_PRIO, our_dc_prio);
> 
>      send_cluster_message(vote_from, crm_msg_crmd, novote, TRUE);
>      free_xml(novote);
> --- ./include/crm/msg_xml.h.orig	2011-05-11 18:22:08.061726000 +0200
> +++ ./include/crm/msg_xml.h	2011-05-11 18:24:17.405132000 +0200
> @@ -32,6 +32,7 @@
>  #define F_CRM_ORIGIN		"origin"
>  #define F_CR
[Pacemaker] [RFC] [Patch] DC node preferences (dc-priority)
People sometimes think they have a use case for influencing which node
will be the DC. Sometimes it is latency (certain cli commands work
faster when done on the DC); sometimes they add a "mostly quorum" node
which may be not quite up to the task of being DC.

Prohibiting a node from becoming DC completely would mean it can not
even be cleanly shut down (with 1.0.x, no MCP), or act on its own
resources for certain no-quorum policies.

So here is a patch I have been asked to present for discussion, against
Pacemaker 1.0, that introduces a "dc-prio" configuration parameter,
which will add some skew to the election algorithm.

Open questions:
* does it make sense at all?
* election algorithm compatibility, stability:
  will the election be correct if some nodes have this patch, and some
  don't?
* How can it be improved so that a node with dc-prio=0 will "give up"
  its DC role as soon as there is at least one other node with
  dc-prio > 0?

Lars

--- ./crmd/election.c.orig	2011-05-11 11:36:05.577329600 +0200
+++ ./crmd/election.c	2011-05-12 13:49:04.671484200 +0200
@@ -29,6 +29,7 @@
 GHashTable *voted = NULL;
 uint highest_born_on = -1;
 static int current_election_id = 1;
+static int our_dc_prio = -1;
 
 /* A_ELECTION_VOTE */
 void
@@ -55,6 +56,20 @@
         break;
     }
 
+    if (our_dc_prio < 0) {
+        char * dc_prio_str = getenv("HA_dc_prio");
+
+        if (dc_prio_str == NULL) {
+            our_dc_prio = 1;
+        } else {
+            our_dc_prio = atoi(dc_prio_str);
+        }
+    }
+
+    if (!our_dc_prio) {
+        not_voting = TRUE;
+    }
+
     if(not_voting == FALSE) {
         if(is_set(fsa_input_register, R_STARTING)) {
             not_voting = TRUE;
@@ -72,12 +87,13 @@
     }
 
     vote = create_request(
-        CRM_OP_VOTE, NULL, NULL,
+        our_dc_prio?CRM_OP_VOTE:CRM_OP_NOVOTE, NULL, NULL,
         CRM_SYSTEM_CRMD, CRM_SYSTEM_CRMD, NULL);
 
     current_election_id++;
     crm_xml_add(vote, F_CRM_ELECTION_OWNER, fsa_our_uuid);
     crm_xml_add_int(vote, F_CRM_ELECTION_ID, current_election_id);
+    crm_xml_add_int(vote, F_CRM_DC_PRIO, our_dc_prio);
 
     send_cluster_message(NULL, crm_msg_crmd, vote, TRUE);
     free_xml(vote);
@@ -188,6 +204,7 @@
              fsa_data_t *msg_data)
 {
     int election_id = -1;
+    int your_dc_prio = 1;
     int log_level = LOG_INFO;
     gboolean done = FALSE;
     gboolean we_loose = FALSE;
@@ -216,6 +233,17 @@
     your_version = crm_element_value(vote->msg, F_CRM_VERSION);
     election_owner = crm_element_value(vote->msg, F_CRM_ELECTION_OWNER);
     crm_element_value_int(vote->msg, F_CRM_ELECTION_ID, &election_id);
+    crm_element_value_int(vote->msg, F_CRM_DC_PRIO, &your_dc_prio);
+
+    if (our_dc_prio < 0) {
+        char * dc_prio_str = getenv("HA_dc_prio");
+
+        if (dc_prio_str == NULL) {
+            our_dc_prio = 1;
+        } else {
+            our_dc_prio = atoi(dc_prio_str);
+        }
+    }
 
     CRM_CHECK(vote_from != NULL, vote_from = fsa_our_uname);
 
@@ -269,6 +297,13 @@
         reason = "Recorded";
         done = TRUE;
 
+    } else if(our_dc_prio < your_dc_prio) {
+        reason = "DC Prio";
+        we_loose = TRUE;
+
+    } else if(our_dc_prio > your_dc_prio) {
+        reason = "DC Prio";
+
     } else if(compare_version(your_version, CRM_FEATURE_SET) < 0) {
         reason = "Version";
         we_loose = TRUE;
@@ -328,6 +363,7 @@
 
     crm_xml_add(novote, F_CRM_ELECTION_OWNER, election_owner);
     crm_xml_add_int(novote, F_CRM_ELECTION_ID, election_id);
+    crm_xml_add_int(novote, F_CRM_DC_PRIO, our_dc_prio);
 
     send_cluster_message(vote_from, crm_msg_crmd, novote, TRUE);
     free_xml(novote);
--- ./include/crm/msg_xml.h.orig	2011-05-11 18:22:08.061726000 +0200
+++ ./include/crm/msg_xml.h	2011-05-11 18:24:17.405132000 +0200
@@ -32,6 +32,7 @@
 #define F_CRM_ORIGIN		"origin"
 #define F_CRM_JOIN_ID		"join_id"
 #define F_CRM_ELECTION_ID	"election-id"
+#define F_CRM_DC_PRIO		"dc-prio"
 #define F_CRM_ELECTION_OWNER	"election-owner"
 #define F_CRM_TGRAPH		"crm-tgraph"
 #define F_CRM_TGRAPH_INPUT	"crm-tgraph-in"
--- ./lib/ais/plugin.c.orig	2011-05-11 11:29:38.496116000 +0200
+++ ./lib/ais/plugin.c	2011-05-11 17:28:32.385425300 +0200
@@ -421,6 +421,9 @@
     get_config_opt(pcmk_api, local_handle, "use_logd", &value, "no");
     pcmk_env.use_logd = value;
 
+    get_config_opt(pcmk_api, local_handle, "dc_prio", &value, "1");
+    pcmk_