Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Fri, Oct 26, 2012 at 11:36:53AM +1100, Andrew Beekhof wrote:
> On Fri, Oct 26, 2012 at 12:52 AM, Dejan Muhamedagic de...@suse.de wrote:
>> On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
>>> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
>>>> Usually, we use crm_master command instead of crm_attribute to change master score in RA.
>>>> But PostgreSQL's slave can't get own replication status, so Master changes Slave's master-score using instance number on Pacemaker 1.0.x.
>>>> This probably is not ordinary usage.
>>>>
>>>>> Would the existing resource agent work with globally-unique=true ?
>>>>
>>>> I don't know if it works with true. I use it with false and it doesn't need true.
>>>
>>> I suggested that you actually should use globally-unique clones, as in that case you still get those instance numbers...
>>
>> Does using different clones make sense in pgsql? What is to be different between them? Or would it be just for the sake of getting instance numbers? If so, then it somehow looks wrong to me :)
>>
>>> But thinking about it once more, I'm not so sure anymore. Correct me where I'm wrong.
>>>
>>> This is about the master score. In case the Master instance fails, we preferably want to promote the slave instance that is as close as possible to the Master. We only know which *node* was best at the last monitoring interval, which may be good enough.
>>>
>>> We need to then change the master score for *all possible instances*, for all nodes, accordingly. Which is what that loop did. (I think skipping the current instance is actually a bug; if pacemaker relabels things in a bad way, you may hit it).
>>>
>>> Now, with pacemaker 1.1.8, all instances become equal (for anonymous clones, aka globally-unique=false), and we only need to set the score on the resource-id, not for all resource-id:instance combinations.
>>
>> OK.
>>
>>> Which is great. After all, the master score in this case is attached to the node (or, the data set accessible from that node), and not to the (arbitrary, potentially relabeled anytime) instance number pacemaker assigned to the clone instance running on that node.
>>>
>>> And that is exactly what your patch does:
>>>  * detect if a version of pacemaker is in use that attaches the instance number to the resource id
>>>  * if so, do the loop on all possible instance numbers as before
>>>  * if not, only set the master score on the resource-id
>>>
>>> Is my understanding correct? Then I think your patch is good.
>>
>> Yes, the patch seems good then. Though there is quite a bit of code repetition. The set attribute part should be moved to an extra function.
>>
>>> Still, other resource agents that use master scores (or any other attributes that reference instance numbers of anonymous clones) need to be reviewed.
>>>
>>> Though this "I'll set scores for other instances, not only myself" logic is unique to pgsql, so most other resource agents should just work with whatever is present in the environment, they typically treat the $OCF_RESOURCE_INSTANCE as opaque.
>>
>> Seems like no other RA uses instance numbers. However, quite a few use OCF_RESOURCE_INSTANCE which, in case of clone/ms resources, may potentially lead to unpredictable results on upgrade to 1.1.8.
>
> No. Otherwise all the regression tests would fail. The PE is smart enough to find promotion score and failcounts in either case.

Cool.

> Also, OCF_RESOURCE_INSTANCE contains whatever the local lrmd knows the resource as, not what we call it internally to the PE.

What I meant was that some RAs use OCF_RESOURCE_INSTANCE to name local files which keep some kind of state. If OCF_RESOURCE_INSTANCE changes on upgrade... Well, I guess the worst that can happen is for the probe to fail. But I didn't take a closer look.

Thanks,

Dejan

>>> Thanks, Lars
>>
>> Cheers, Dejan

___
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Mon, Oct 29, 2012 at 9:51 PM, Dejan Muhamedagic de...@suse.de wrote:
> On Fri, Oct 26, 2012 at 11:36:53AM +1100, Andrew Beekhof wrote:
>> On Fri, Oct 26, 2012 at 12:52 AM, Dejan Muhamedagic de...@suse.de wrote:
>>> On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
>>>> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
>>>>> Usually, we use crm_master command instead of crm_attribute to change master score in RA.
>>>>> But PostgreSQL's slave can't get own replication status, so Master changes Slave's master-score using instance number on Pacemaker 1.0.x.
>>>>> This probably is not ordinary usage.
>>>>>
>>>>>> Would the existing resource agent work with globally-unique=true ?
>>>>>
>>>>> I don't know if it works with true. I use it with false and it doesn't need true.
>>>>
>>>> I suggested that you actually should use globally-unique clones, as in that case you still get those instance numbers...
>>>
>>> Does using different clones make sense in pgsql? What is to be different between them? Or would it be just for the sake of getting instance numbers? If so, then it somehow looks wrong to me :)
>>>
>>>> But thinking about it once more, I'm not so sure anymore. Correct me where I'm wrong.
>>>>
>>>> This is about the master score. In case the Master instance fails, we preferably want to promote the slave instance that is as close as possible to the Master. We only know which *node* was best at the last monitoring interval, which may be good enough.
>>>>
>>>> We need to then change the master score for *all possible instances*, for all nodes, accordingly. Which is what that loop did. (I think skipping the current instance is actually a bug; if pacemaker relabels things in a bad way, you may hit it).
>>>>
>>>> Now, with pacemaker 1.1.8, all instances become equal (for anonymous clones, aka globally-unique=false), and we only need to set the score on the resource-id, not for all resource-id:instance combinations.
>>>
>>> OK.
>>>
>>>> Which is great. After all, the master score in this case is attached to the node (or, the data set accessible from that node), and not to the (arbitrary, potentially relabeled anytime) instance number pacemaker assigned to the clone instance running on that node.
>>>>
>>>> And that is exactly what your patch does:
>>>>  * detect if a version of pacemaker is in use that attaches the instance number to the resource id
>>>>  * if so, do the loop on all possible instance numbers as before
>>>>  * if not, only set the master score on the resource-id
>>>>
>>>> Is my understanding correct? Then I think your patch is good.
>>>
>>> Yes, the patch seems good then. Though there is quite a bit of code repetition. The set attribute part should be moved to an extra function.
>>>
>>>> Still, other resource agents that use master scores (or any other attributes that reference instance numbers of anonymous clones) need to be reviewed.
>>>>
>>>> Though this "I'll set scores for other instances, not only myself" logic is unique to pgsql, so most other resource agents should just work with whatever is present in the environment, they typically treat the $OCF_RESOURCE_INSTANCE as opaque.
>>>
>>> Seems like no other RA uses instance numbers. However, quite a few use OCF_RESOURCE_INSTANCE which, in case of clone/ms resources, may potentially lead to unpredictable results on upgrade to 1.1.8.
>>
>> No. Otherwise all the regression tests would fail. The PE is smart enough to find promotion score and failcounts in either case.
>
> Cool.
>
>> Also, OCF_RESOURCE_INSTANCE contains whatever the local lrmd knows the resource as, not what we call it internally to the PE.
>
> What I meant was that some RA use OCF_RESOURCE_INSTANCE to name local files which keep some kind of state. If OCF_RESOURCE_INSTANCE changes on upgrade... Well, I guess that the worst that can happen is for the probe to fail.

Right. But only for attach/reattach. And people should have maintenance-mode enabled at the point the probe is run, so there is time to fix things up before the cluster does anything about it.

> But I didn't take a closer look.
>
> Thanks,
>
> Dejan
>
>>>> Thanks, Lars
>>>
>>> Cheers, Dejan
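To make the state-file concern concrete, here is a minimal sketch, assuming a hypothetical agent that keeps one state file per resource under a directory held in a made-up STATE_DIR variable. Neither function comes from an existing resource agent. It only shows that a file name built directly from OCF_RESOURCE_INSTANCE differs between "pgsql:0" and "pgsql", while a name built from the stripped id stays the same.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from any shipped agent.
# STATE_DIR is an assumed variable, defaulting to /tmp for this sketch.
STATE_DIR=${STATE_DIR:-/tmp}

# Naive: the file name follows OCF_RESOURCE_INSTANCE verbatim, so it
# changes if the ":N" suffix disappears after an upgrade.
naive_state_file() {
    printf '%s/%s.state\n' "$STATE_DIR" "$OCF_RESOURCE_INSTANCE"
}

# Stable: strip everything from the first ":" before building the name.
# Only sensible for anonymous clones, where the suffix carries no meaning.
stable_state_file() {
    printf '%s/%s.state\n' "$STATE_DIR" "${OCF_RESOURCE_INSTANCE%%:*}"
}
```

Stripping the suffix would be wrong for globally-unique clones, where several instances may run on one node and the suffix is what tells them apart.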
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Thu, Oct 25, 2012 at 01:24:40AM -0700, Takatoshi MATSUO wrote:
> check existence of instance number in replication mode
> because Pacemaker 1.1.8 or higher do not append instance numbers.

I think this is wrong.

It seems this became necessary because of

commit 427c7fe6ea94a566aaa714daf8d214290632f837
Author: Andrew Beekhof and...@beekhof.net
Date:   Fri Jul 13 13:37:42 2012 +1000

    High: PE: Do not append instance numbers to anonymous clones

    Benefits:
    - they shouldnt have been exposed in the first place, but I didnt know how not to back then
    - if admins don't know what they are, they can't be misunderstood or misused
    - more reliable failcount and promotion scores (since you dont have to check for all possible permutations)
    - smaller status section since there cant be entries for each possible :N suffix
    - the name in the config corresponds to the resource in the logs

So if pgsql thinks it needs these instance numbers,
maybe it is not so anonymous a clone, after all?

Would the existing resource agent work with globally-unique=true ?

Lars

> You can merge this Pull Request by running:
>   git pull https://github.com/t-matsuo/resource-agents check-instance-number
> Or you can view, comment on it, or merge it online at:
>   https://github.com/ClusterLabs/resource-agents/pull/159
> -- Commit Summary --
>   * Low: pgsql: check existence of instance number in replication mode
> -- File Changes --
>   M heartbeat/pgsql (44)
> -- Patch Links --
>   https://github.com/ClusterLabs/resource-agents/pull/159.patch
>   https://github.com/ClusterLabs/resource-agents/pull/159.diff

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
Usually, we use the crm_master command instead of crm_attribute to change our own master score in an RA.
But a PostgreSQL Slave can't get its own replication status, so the Master changes the Slave's master score using the instance number on Pacemaker 1.0.x.
This is probably not ordinary usage.

> So if pgsql thinks it needs these instance numbers,
> maybe it is not so anonymous a clone, after all?
> Would the existing resource agent work with globally-unique=true ?

No, I use it with false and it doesn't need true.

--
Takatoshi MATSUO

2012/10/25 Lars Ellenberg lars.ellenb...@linbit.com:
> On Thu, Oct 25, 2012 at 01:24:40AM -0700, Takatoshi MATSUO wrote:
>> check existence of instance number in replication mode
>> because Pacemaker 1.1.8 or higher do not append instance numbers.
>
> I think this is wrong.
>
> It seems this became necessary because of
>
> commit 427c7fe6ea94a566aaa714daf8d214290632f837
> Author: Andrew Beekhof and...@beekhof.net
> Date:   Fri Jul 13 13:37:42 2012 +1000
>
>     High: PE: Do not append instance numbers to anonymous clones
>
>     Benefits:
>     - they shouldnt have been exposed in the first place, but I didnt know how not to back then
>     - if admins don't know what they are, they can't be misunderstood or misused
>     - more reliable failcount and promotion scores (since you dont have to check for all possible permutations)
>     - smaller status section since there cant be entries for each possible :N suffix
>     - the name in the config corresponds to the resource in the logs
>
> So if pgsql thinks it needs these instance numbers,
> maybe it is not so anonymous a clone, after all?
>
> Would the existing resource agent work with globally-unique=true ?
>
> Lars
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
>> Usually, we use crm_master command instead of crm_attribute to change master score in RA.
>> But PostgreSQL's slave can't get own replication status, so Master changes Slave's master-score using instance number on Pacemaker 1.0.x.
>> This probably is not ordinary usage.
>>
>>> Would the existing resource agent work with globally-unique=true ?
>>
>> I don't know if it works with true. I use it with false and it doesn't need true.
>
> I suggested that you actually should use globally-unique clones, as in that case you still get those instance numbers...

Does using different clones make sense in pgsql? What is to be different between them? Or would it be just for the sake of getting instance numbers? If so, then it somehow looks wrong to me :)

> But thinking about it once more, I'm not so sure anymore. Correct me where I'm wrong.
>
> This is about the master score. In case the Master instance fails, we preferably want to promote the slave instance that is as close as possible to the Master. We only know which *node* was best at the last monitoring interval, which may be good enough.
>
> We need to then change the master score for *all possible instances*, for all nodes, accordingly. Which is what that loop did. (I think skipping the current instance is actually a bug; if pacemaker relabels things in a bad way, you may hit it).
>
> Now, with pacemaker 1.1.8, all instances become equal (for anonymous clones, aka globally-unique=false), and we only need to set the score on the resource-id, not for all resource-id:instance combinations.

OK.

> Which is great. After all, the master score in this case is attached to the node (or, the data set accessible from that node), and not to the (arbitrary, potentially relabeled anytime) instance number pacemaker assigned to the clone instance running on that node.
>
> And that is exactly what your patch does:
>  * detect if a version of pacemaker is in use that attaches the instance number to the resource id
>  * if so, do the loop on all possible instance numbers as before
>  * if not, only set the master score on the resource-id
>
> Is my understanding correct? Then I think your patch is good.

Yes, the patch seems good then. Though there is quite a bit of code repetition. The set attribute part should be moved to an extra function.

> Still, other resource agents that use master scores (or any other attributes that reference instance numbers of anonymous clones) need to be reviewed.
>
> Though this "I'll set scores for other instances, not only myself" logic is unique to pgsql, so most other resource agents should just work with whatever is present in the environment, they typically treat the $OCF_RESOURCE_INSTANCE as opaque.

Seems like no other RA uses instance numbers. However, quite a few use OCF_RESOURCE_INSTANCE which, in case of clone/ms resources, may potentially lead to unpredictable results on upgrade to 1.1.8.

> Thanks, Lars

Cheers,

Dejan
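The three steps Lars lists for the patch can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from pull request #159: the function names has_instance_suffix and master_attr_names are made up for illustration, and the clone count is passed in explicitly (in a real agent it would presumably come from OCF_RESKEY_CRM_meta_clone_max). The loop deliberately covers every instance number, including the agent's own, following Lars's remark that skipping the current instance looks like a bug.

```shell
#!/bin/sh
# Sketch only: hypothetical function names, not the pgsql RA's own code.

# Succeeds if the resource id carries a ":N" instance suffix, as
# Pacemaker versions before 1.1.8 append to anonymous clones.
has_instance_suffix() {
    case "$1" in
        *:[0-9]*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print the master-score attribute names that would have to be set
# for one node.
#   $1 = value of OCF_RESOURCE_INSTANCE (e.g. "pgsql:1" or "pgsql")
#   $2 = number of clone instances (assumed to be known by the caller)
master_attr_names() {
    rsc=${1%%:*}
    if has_instance_suffix "$1"; then
        # Old behaviour: one attribute per possible instance number.
        i=0
        while [ "$i" -lt "$2" ]; do
            printf 'master-%s:%s\n' "$rsc" "$i"
            i=$((i + 1))
        done
    else
        # 1.1.8 and later: a single attribute on the plain resource id.
        printf 'master-%s\n' "$rsc"
    fi
}
```

Keeping the name generation in one function is also one way to address the code repetition Dejan points out: the caller then needs only a single place that actually sets the attribute for each name.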
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Thu, Oct 25, 2012 at 10:01 PM, Takatoshi MATSUO matsuo@gmail.com wrote:
> Usually, we use crm_master command instead of crm_attribute to change own master score in RA.
> But PostgreSQL's Slave can't get own replication status, so Master changes Slave's master-score using instance number on Pacemaker 1.0.x.
> This probably is not ordinary usage.

Ouch! No, not ordinary (or recommended) at all :-)
What does the crm_attribute command line look like?
Maybe the --node option could help?

>> So if pgsql thinks it needs these instance numbers,
>> maybe it is not so anonymous a clone, after all?
>> Would the existing resource agent work with globally-unique=true ?
>
> No, I use it with false and it doesn't need true.
>
> --
> Takatoshi MATSUO
>
> 2012/10/25 Lars Ellenberg lars.ellenb...@linbit.com:
>> On Thu, Oct 25, 2012 at 01:24:40AM -0700, Takatoshi MATSUO wrote:
>>> check existence of instance number in replication mode
>>> because Pacemaker 1.1.8 or higher do not append instance numbers.
>>
>> I think this is wrong.
>>
>> It seems this became necessary because of
>>
>> commit 427c7fe6ea94a566aaa714daf8d214290632f837
>> Author: Andrew Beekhof and...@beekhof.net
>> Date:   Fri Jul 13 13:37:42 2012 +1000
>>
>>     High: PE: Do not append instance numbers to anonymous clones
>>
>>     Benefits:
>>     - they shouldnt have been exposed in the first place, but I didnt know how not to back then
>>     - if admins don't know what they are, they can't be misunderstood or misused
>>     - more reliable failcount and promotion scores (since you dont have to check for all possible permutations)
>>     - smaller status section since there cant be entries for each possible :N suffix
>>     - the name in the config corresponds to the resource in the logs
>>
>> So if pgsql thinks it needs these instance numbers,
>> maybe it is not so anonymous a clone, after all?
>>
>> Would the existing resource agent work with globally-unique=true ?
>>
>> Lars
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Fri, Oct 26, 2012 at 12:52 AM, Dejan Muhamedagic de...@suse.de wrote:
> On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
>> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
>>> Usually, we use crm_master command instead of crm_attribute to change master score in RA.
>>> But PostgreSQL's slave can't get own replication status, so Master changes Slave's master-score using instance number on Pacemaker 1.0.x.
>>> This probably is not ordinary usage.
>>>
>>>> Would the existing resource agent work with globally-unique=true ?
>>>
>>> I don't know if it works with true. I use it with false and it doesn't need true.
>>
>> I suggested that you actually should use globally-unique clones, as in that case you still get those instance numbers...
>
> Does using different clones make sense in pgsql? What is to be different between them? Or would it be just for the sake of getting instance numbers? If so, then it somehow looks wrong to me :)
>
>> But thinking about it once more, I'm not so sure anymore. Correct me where I'm wrong.
>>
>> This is about the master score. In case the Master instance fails, we preferably want to promote the slave instance that is as close as possible to the Master. We only know which *node* was best at the last monitoring interval, which may be good enough.
>>
>> We need to then change the master score for *all possible instances*, for all nodes, accordingly. Which is what that loop did. (I think skipping the current instance is actually a bug; if pacemaker relabels things in a bad way, you may hit it).
>>
>> Now, with pacemaker 1.1.8, all instances become equal (for anonymous clones, aka globally-unique=false), and we only need to set the score on the resource-id, not for all resource-id:instance combinations.
>
> OK.
>
>> Which is great. After all, the master score in this case is attached to the node (or, the data set accessible from that node), and not to the (arbitrary, potentially relabeled anytime) instance number pacemaker assigned to the clone instance running on that node.
>>
>> And that is exactly what your patch does:
>>  * detect if a version of pacemaker is in use that attaches the instance number to the resource id
>>  * if so, do the loop on all possible instance numbers as before
>>  * if not, only set the master score on the resource-id
>>
>> Is my understanding correct? Then I think your patch is good.
>
> Yes, the patch seems good then. Though there is quite a bit of code repetition. The set attribute part should be moved to an extra function.
>
>> Still, other resource agents that use master scores (or any other attributes that reference instance numbers of anonymous clones) need to be reviewed.
>>
>> Though this "I'll set scores for other instances, not only myself" logic is unique to pgsql, so most other resource agents should just work with whatever is present in the environment, they typically treat the $OCF_RESOURCE_INSTANCE as opaque.
>
> Seems like no other RA uses instance numbers. However, quite a few use OCF_RESOURCE_INSTANCE which, in case of clone/ms resources, may potentially lead to unpredictable results on upgrade to 1.1.8.

No. Otherwise all the regression tests would fail.
The PE is smart enough to find promotion score and failcounts in either case.

Also, OCF_RESOURCE_INSTANCE contains whatever the local lrmd knows the resource as, not what we call it internally to the PE.

>> Thanks, Lars
>
> Cheers, Dejan
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
2012/10/26 Andrew Beekhof and...@beekhof.net:
> On Thu, Oct 25, 2012 at 10:01 PM, Takatoshi MATSUO matsuo@gmail.com wrote:
>> Usually, we use crm_master command instead of crm_attribute to change own master score in RA.
>> But PostgreSQL's Slave can't get own replication status, so Master changes Slave's master-score using instance number on Pacemaker 1.0.x.
>> This probably is not ordinary usage.
>
> Ouch! No, not ordinary (or recommended) at all :-)
> What does the crm_attribute command line look like?
> Maybe the --node option could help?

# crm_attribute -l reboot -N pm02 -n master-pgsql:1 -v 1000

This line is modeled on what crm_master does.
I would like crm_master to have a parameter for setting the hostname.
But these days crm_master gets the hostname from the crm_node -n command, so I think I should fix the method used to get the hostname for the next version.
It also needs compatibility code for Pacemaker 1.0.x :(

>>> So if pgsql thinks it needs these instance numbers,
>>> maybe it is not so anonymous a clone, after all?
>>> Would the existing resource agent work with globally-unique=true ?
>>
>> No, I use it with false and it doesn't need true.
>>
>> --
>> Takatoshi MATSUO
>>
>> 2012/10/25 Lars Ellenberg lars.ellenb...@linbit.com:
>>> On Thu, Oct 25, 2012 at 01:24:40AM -0700, Takatoshi MATSUO wrote:
>>>> check existence of instance number in replication mode
>>>> because Pacemaker 1.1.8 or higher do not append instance numbers.
>>>
>>> I think this is wrong.
>>>
>>> It seems this became necessary because of
>>>
>>> commit 427c7fe6ea94a566aaa714daf8d214290632f837
>>> Author: Andrew Beekhof and...@beekhof.net
>>> Date:   Fri Jul 13 13:37:42 2012 +1000
>>>
>>>     High: PE: Do not append instance numbers to anonymous clones
>>>
>>>     Benefits:
>>>     - they shouldnt have been exposed in the first place, but I didnt know how not to back then
>>>     - if admins don't know what they are, they can't be misunderstood or misused
>>>     - more reliable failcount and promotion scores (since you dont have to check for all possible permutations)
>>>     - smaller status section since there cant be entries for each possible :N suffix
>>>     - the name in the config corresponds to the resource in the logs
>>>
>>> So if pgsql thinks it needs these instance numbers,
>>> maybe it is not so anonymous a clone, after all?
>>>
>>> Would the existing resource agent work with globally-unique=true ?
>>>
>>> Lars

--
Thanks,
Takatoshi MATSUO
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
2012/10/25 Dejan Muhamedagic de...@suse.de:
> On Thu, Oct 25, 2012 at 06:09:38AM -0700, Lars Ellenberg wrote:
>> On Thu, Oct 25, 2012 at 03:38:47AM -0700, Takatoshi MATSUO wrote:
>>> Usually, we use crm_master command instead of crm_attribute to change master score in RA.
>>> But PostgreSQL's slave can't get own replication status, so Master changes Slave's master-score using instance number on Pacemaker 1.0.x.
>>> This probably is not ordinary usage.
>>>
>>>> Would the existing resource agent work with globally-unique=true ?
>>>
>>> I don't know if it works with true. I use it with false and it doesn't need true.
>>
>> I suggested that you actually should use globally-unique clones, as in that case you still get those instance numbers...
>
> Does using different clones make sense in pgsql? What is to be different between them? Or would it be just for the sake of getting instance numbers? If so, then it somehow looks wrong to me :)

It makes no sense to use different clones.
pgsql only uses instance numbers to change the master score on other nodes.
On Pacemaker 1.0.x, the master score needs them regardless of globally-unique.

>> But thinking about it once more, I'm not so sure anymore. Correct me where I'm wrong.
>>
>> This is about the master score. In case the Master instance fails, we preferably want to promote the slave instance that is as close as possible to the Master. We only know which *node* was best at the last monitoring interval, which may be good enough.
>>
>> We need to then change the master score for *all possible instances*, for all nodes, accordingly. Which is what that loop did. (I think skipping the current instance is actually a bug; if pacemaker relabels things in a bad way, you may hit it).
>>
>> Now, with pacemaker 1.1.8, all instances become equal (for anonymous clones, aka globally-unique=false), and we only need to set the score on the resource-id, not for all resource-id:instance combinations.
>
> OK.
>
>> Which is great. After all, the master score in this case is attached to the node (or, the data set accessible from that node), and not to the (arbitrary, potentially relabeled anytime) instance number pacemaker assigned to the clone instance running on that node.
>>
>> And that is exactly what your patch does:
>>  * detect if a version of pacemaker is in use that attaches the instance number to the resource id
>>  * if so, do the loop on all possible instance numbers as before
>>  * if not, only set the master score on the resource-id
>>
>> Is my understanding correct? Then I think your patch is good.
>
> Yes, the patch seems good then. Though there is quite a bit of code repetition. The set attribute part should be moved to an extra function.

I will improve it.

>> Still, other resource agents that use master scores (or any other attributes that reference instance numbers of anonymous clones) need to be reviewed.
>>
>> Though this "I'll set scores for other instances, not only myself" logic is unique to pgsql, so most other resource agents should just work with whatever is present in the environment, they typically treat the $OCF_RESOURCE_INSTANCE as opaque.
>
> Seems like no other RA uses instance numbers. However, quite a few use OCF_RESOURCE_INSTANCE which, in case of clone/ms resources, may potentially lead to unpredictable results on upgrade to 1.1.8.
>
>> Thanks, Lars
>
> Cheers, Dejan

Thanks,
Takatoshi MATSUO
Re: [Linux-ha-dev] [resource-agents] Low: pgsql: check existence of instance number in replication mode (#159)
On Fri, Oct 26, 2012 at 12:49 PM, Takatoshi MATSUO matsuo@gmail.com wrote:
> 2012/10/26 Andrew Beekhof and...@beekhof.net:
>> On Thu, Oct 25, 2012 at 10:01 PM, Takatoshi MATSUO matsuo@gmail.com wrote:
>>> Usually, we use crm_master command instead of crm_attribute to change own master score in RA.
>>> But PostgreSQL's Slave can't get own replication status, so Master changes Slave's master-score using instance number on Pacemaker 1.0.x.
>>> This probably is not ordinary usage.
>>
>> Ouch! No, not ordinary (or recommended) at all :-)
>> What does the crm_attribute command line look like?
>> Maybe the --node option could help?
>
> # crm_attribute -l reboot -N pm02 -n master-pgsql:1 -v 1000

That looks fine, just drop the :1 (or use whatever is in OCF_RESOURCE_INSTANCE).

> This line uses crm_master as a reference.
> I would like crm_master to have a parameter which can set hostname.

Probably not going to happen. crm_master is a convenience function for the common use case. It's fine to switch to crm_attribute for advanced usage.

> But crm_master gets hostname using crm_node -n command in these days, so I think that I should fix method to get hostname for next version.
> It also needs compatible code for Pacemaker 1.0.x :(
>
>>>> So if pgsql thinks it needs these instance numbers,
>>>> maybe it is not so anonymous a clone, after all?
>>>> Would the existing resource agent work with globally-unique=true ?
>>>
>>> No, I use it with false and it doesn't need true.
>>>
>>> --
>>> Takatoshi MATSUO
>>>
>>> 2012/10/25 Lars Ellenberg lars.ellenb...@linbit.com:
>>>> On Thu, Oct 25, 2012 at 01:24:40AM -0700, Takatoshi MATSUO wrote:
>>>>> check existence of instance number in replication mode
>>>>> because Pacemaker 1.1.8 or higher do not append instance numbers.
>>>>
>>>> I think this is wrong.
>>>>
>>>> It seems this became necessary because of
>>>>
>>>> commit 427c7fe6ea94a566aaa714daf8d214290632f837
>>>> Author: Andrew Beekhof and...@beekhof.net
>>>> Date:   Fri Jul 13 13:37:42 2012 +1000
>>>>
>>>>     High: PE: Do not append instance numbers to anonymous clones
>>>>
>>>>     Benefits:
>>>>     - they shouldnt have been exposed in the first place, but I didnt know how not to back then
>>>>     - if admins don't know what they are, they can't be misunderstood or misused
>>>>     - more reliable failcount and promotion scores (since you dont have to check for all possible permutations)
>>>>     - smaller status section since there cant be entries for each possible :N suffix
>>>>     - the name in the config corresponds to the resource in the logs
>>>>
>>>> So if pgsql thinks it needs these instance numbers,
>>>> maybe it is not so anonymous a clone, after all?
>>>>
>>>> Would the existing resource agent work with globally-unique=true ?
>>>>
>>>> Lars
>
> --
> Thanks,
> Takatoshi MATSUO
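Andrew's suggestion in this message, dropping the :1 and using whatever is in OCF_RESOURCE_INSTANCE, might look like this inside an agent. A minimal sketch: the function name set_master_score_on_node and the CRM_ATTRIBUTE override variable are hypothetical and exist only so the command can be inspected without a running cluster. The crm_attribute options are the ones from the command line Takatoshi posted.

```shell
#!/bin/sh
# Sketch only. CRM_ATTRIBUTE is a made-up override hook; setting it to
# "echo crm_attribute" prints the command instead of running it.
CRM_ATTRIBUTE=${CRM_ATTRIBUTE:-crm_attribute}

# Set the master score of this resource on another node.
#   $1 = node name, $2 = score
# The attribute name is built from OCF_RESOURCE_INSTANCE as provided by
# the cluster, rather than from a hard-coded ":N" suffix.
set_master_score_on_node() {
    $CRM_ATTRIBUTE -l reboot -N "$1" -n "master-${OCF_RESOURCE_INSTANCE}" -v "$2"
}
```

With CRM_ATTRIBUTE="echo crm_attribute" and OCF_RESOURCE_INSTANCE=pgsql, calling set_master_score_on_node pm02 1000 prints crm_attribute -l reboot -N pm02 -n master-pgsql -v 1000. On versions that still append :N, OCF_RESOURCE_INSTANCE names only the local instance, which is presumably why the patch keeps the loop over all instance numbers for those versions.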