Re: [ClusterLabs] Antw: Re: Antw: Re: 2-Node Cluster Pointless?
24.04.2017 09:15, Ulrich Windl wrote:
> >>> Andrei Borzenkov wrote on 22.04.2017 at 09:05 in message:
> > 18.04.2017 10:47, Ulrich Windl wrote:
> > ...
> >>> Now let me come back to quorum vs. stonith. Said simply: Quorum is a
> >>> tool for when everything is working. Fencing is a tool for when things
> >>> go wrong.
> >>
> >> I'd say: Quorum is the tool to decide who'll be alive and who's going to
> >> die, and STONITH is the tool to make nodes die.
> >
> > If I had PROD, QA and DEV in a cluster and PROD were separated from
> > QA+DEV, I'd be very sad if PROD were shut down.
> >
> > The notion of a simple node majority as the kill policy is not
> > appropriate, nor are simple node-based delays. I wish pacemaker supported
> > a scoring system for resources so that we could base stonith delays on
> > them (the most important sub-cluster starts fencing first).
>
> So your preference for a 2|1 node split-brain scenario is to make the one
> node survive if it runs the more important resources?

Correct. Except I'm accustomed to calling it an "application", which is a
collection of resources.
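Pacemaker does not (as of this thread) score resources for fencing priority,
but a similar effect can be approximated with static per-device fencing
delays. A rough sketch in crm syntax, with invented device names and one
fence device per node: delaying only the device that would kill the node
running the important application gives that node a head start in a fencing
race. pcmk_delay_max (random delay) is available in current releases;
pcmk_delay_base (fixed delay) is newer and may not exist in older versions.

    primitive fence_prod stonith:fence_ipmilan \
        params pcmk_host_list=prod-node pcmk_delay_base=10s
        # other device parameters (ipaddr, login, ...) omitted
    primitive fence_dev stonith:fence_ipmilan \
        params pcmk_host_list=dev-node
        # no delay: this node loses the race if both sides try to fence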
Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group
On 04/24/2017 02:33 PM, Lentes, Bernd wrote:
> - On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@redhat.com wrote:
>>>> primitive prim_vnc_ip_mausdb IPaddr \
>>>>     params ip=146.107.235.161 nic=br0 cidr_netmask=24 \
>>>>     meta is-managed=true
>>
>> I don't see allow-migrate on the IP. Is this a modified IPaddr? The
>> stock resource agent doesn't support migrate_from/migrate_to.
>
> Not modified. I can migrate the resource without the group easily between
> the nodes. And also if I try to live-migrate the whole group, the IP is
> migrated.

Unfortunately, migration is not live migration ... a resource (the VM)
can't be live-migrated if it depends on another resource (the IP) that
isn't live-migrateable.

If you modify IPaddr to be live-migrateable, it should work. It has to
support migrate_from and migrate_to actions, and advertise them in the
meta-data. It doesn't necessarily have to do anything different from
stop/start, as long as that meets your needs.

>>> What I found on the net:
>>> http://lists.clusterlabs.org/pipermail/pacemaker/2011-November/012088.html
>>>
>>> "Yes, migration only works without order-constraints the migrating
>>> service depends on ... and no way to force it."
>>
>> I believe this was true in pacemaker 1.1.11 and earlier.
>
> Then it should be possible:
>
> ha-idg-2:~ # rpm -q pacemaker
> pacemaker-1.1.12-11.12
>
> Bernd
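A rough sketch of what Ken describes, in the crm syntax used elsewhere in
this thread. The allow-migrate meta attribute is real; the agent-side
function names in the comments are purely illustrative, since the stock
IPaddr agent does not implement or advertise these actions.

    primitive prim_vnc_ip_mausdb IPaddr \
        params ip=146.107.235.161 nic=br0 cidr_netmask=24 \
        meta is-managed=true allow-migrate=true
    # ... and the modified agent would have to advertise
    # <action name="migrate_to"/> and <action name="migrate_from"/> in its
    # meta-data, e.g. simply mapping them to its existing code paths:
    #     migrate_to)   ip_stop  ;;   # illustrative function names
    #     migrate_from) ip_start ;;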
[ClusterLabs] Coming in Pacemaker 1.1.17: start a node in standby
Hi all,

Pacemaker 1.1.17 will have a feature that people have occasionally asked for
in the past: the ability to start a node in standby mode.

It will be controlled by an environment variable (set in
/etc/sysconfig/pacemaker, /etc/default/pacemaker, or wherever your distro
puts them):

# By default, nodes will join the cluster in an online state when they first
# start, unless they were previously put into standby mode. If this variable
# is set to "standby" or "online", it will force this node to join in the
# specified state when starting.
# (experimental; currently ignored for Pacemaker Remote nodes)
# PCMK_node_start_state=default

As described, it will be considered experimental in this release, mainly
because it doesn't work with Pacemaker Remote nodes yet. However, I don't
expect any problems using it with cluster nodes.

Example use cases:

You want fenced nodes to automatically start the cluster after a reboot, so
they contribute to quorum, but not run any resources, so the problem can be
investigated. You would leave PCMK_node_start_state=standby permanently.

You want to ensure a newly added node joins the cluster without problems
before allowing it to run resources. You would set this to "standby" when
deploying the node, and remove the setting once you're satisfied with the
node, so it can run resources at future reboots.

You want a standby setting to last only until the next boot. You would set
this permanently to "online", and any manual setting of standby mode would
be overwritten at the next boot.

Many thanks to developers Alexandra Zhuravleva and Sergey Mishin, who
contributed this feature as part of a project with EMC.
--
Ken Gaillot
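For the "newly added node" use case, the workflow might look roughly like
this. The node name and file location are placeholders, and crm_attribute is
just one way to clear the standby attribute; crm or pcs equivalents work as
well.

    # on the node being deployed (exact path depends on the distro):
    echo 'PCMK_node_start_state=standby' >> /etc/sysconfig/pacemaker

    # later, once the node looks healthy, let it run resources again:
    crm_attribute --node new-node1 --name standby --update off
    # and remove the sysconfig line (or keep it, for the "fenced nodes come
    # back in standby" use case above)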
Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group
- On Apr 24, 2017, at 9:11 PM, Ken Gaillot kgail...@redhat.com wrote: >>> >>> >>> primitive prim_vnc_ip_mausdb IPaddr \ >>>params ip=146.107.235.161 nic=br0 cidr_netmask=24 \ >>>meta is-managed=true > > I don't see allow-migrate on the IP. Is this a modified IPaddr? The > stock resource agent doesn't support migrate_from/migrate_to. Not modified. I can migrate the resource without the group easily between the nodes. And also if i try to live-migrate the whole group, the IP is migrated. >> What i found in the net: >> http://lists.clusterlabs.org/pipermail/pacemaker/2011-November/012088.html >> >> " Yes, migration only works without order-contraints the migrating service >> depends on ... and no way to force it." > > I believe this was true in pacemaker 1.1.11 and earlier. > Then it should be possible: ha-idg-2:~ # rpm -q pacemaker pacemaker-1.1.12-11.12 Bernd Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) Ingolstaedter Landstr. 1 85764 Neuherberg www.helmholtz-muenchen.de Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen Registergericht: Amtsgericht Muenchen HRB 6466 USt-IdNr: DE 129521671 ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group
- On Apr 24, 2017, at 8:26 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: > Hi, > > i have a primitive VirtualDomain resource which i can live migrate without any > problem. > Additionally i have an IP as a resource which i can live mirgate easily too. > If i combine them in a group, i can't live migrate the VirtualDomain anymore. > > It is shuted down on one node and rebooted on the other. :-( > > This is my config: > > primitive prim_vm_mausdb VirtualDomain \ >params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \ >params hypervisor="qemu:///system" \ >params migration_transport=ssh \ >params autoset_utilization_cpu=false \ >params autoset_utilization_hv_memory=false \ >op start interval=0 timeout=120 \ >op stop interval=0 timeout=130 \ >op monitor interval=30 timeout=30 \ >op migrate_from interval=0 timeout=180 \ >op migrate_to interval=0 timeout=190 \ >meta allow-migrate=true is-managed=true \ >utilization cpu=4 hv_memory=8005 > > > primitive prim_vnc_ip_mausdb IPaddr \ >params ip=146.107.235.161 nic=br0 cidr_netmask=24 \ >meta is-managed=true > > group group_vnc_vm_mausdb prim_vnc_ip_mausdb prim_vm_mausdb \ >meta target-role=Started > > > Why can't i live migrate the VirtualDomain primitive being part of a group ? > > Thanks. > > > Bernd > > What i found in the net: http://lists.clusterlabs.org/pipermail/pacemaker/2011-November/012088.html " Yes, migration only works without order-contraints the migrating service depends on ... and no way to force it." It's not possible ? Bernd Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) Ingolstaedter Landstr. 1 85764 Neuherberg www.helmholtz-muenchen.de Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen Registergericht: Amtsgericht Muenchen HRB 6466 USt-IdNr: DE 129521671 ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] can't live migrate VirtualDomain which is part of a group
-- Bernd Lentes Systemadministration institute of developmental genetics Gebäude 35.34 - Raum 208 HelmholtzZentrum München bernd.len...@helmholtz-muenchen.de phone: +49 (0)89 3187 1241 fax: +49 (0)89 3187 2294 Erst wenn man sich auf etwas festlegt kann man Unrecht haben Scott Adams - On Apr 24, 2017, at 8:26 PM, Bernd Lentes bernd.len...@helmholtz-muenchen.de wrote: > Hi, > > i have a primitive VirtualDomain resource which i can live migrate without any > problem. > Additionally i have an IP as a resource which i can live mirgate easily too. > If i combine them in a group, i can't live migrate the VirtualDomain anymore. > > It is shuted down on one node and rebooted on the other. :-( > > This is my config: > > primitive prim_vm_mausdb VirtualDomain \ >params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \ >params hypervisor="qemu:///system" \ >params migration_transport=ssh \ >params autoset_utilization_cpu=false \ >params autoset_utilization_hv_memory=false \ >op start interval=0 timeout=120 \ >op stop interval=0 timeout=130 \ >op monitor interval=30 timeout=30 \ >op migrate_from interval=0 timeout=180 \ >op migrate_to interval=0 timeout=190 \ >meta allow-migrate=true is-managed=true \ >utilization cpu=4 hv_memory=8005 > > > primitive prim_vnc_ip_mausdb IPaddr \ >params ip=146.107.235.161 nic=br0 cidr_netmask=24 \ >meta is-managed=true > > group group_vnc_vm_mausdb prim_vnc_ip_mausdb prim_vm_mausdb \ >meta target-role=Started > > > Why can't i live migrate the VirtualDomain primitive being part of a group ? > > Thanks. > > > Bernd > > > > -- > Bernd Lentes > > Systemadministration > institute of developmental genetics > Gebäude 35.34 - Raum 208 > HelmholtzZentrum München > bernd.len...@helmholtz-muenchen.de > phone: +49 (0)89 3187 1241 > fax: +49 (0)89 3187 2294 > > Erst wenn man sich auf etwas festlegt kann man Unrecht haben > Scott Adams > > > Helmholtz Zentrum Muenchen > Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) > Ingolstaedter Landstr. 1 > 85764 Neuherberg > www.helmholtz-muenchen.de > Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe > Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons > Enhsen > Registergericht: Amtsgericht Muenchen HRB 6466 > USt-IdNr: DE 129521671 > > > ___ > Users mailing list: Users@clusterlabs.org > http://lists.clusterlabs.org/mailman/listinfo/users > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org Helmholtz Zentrum Muenchen Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH) Ingolstaedter Landstr. 1 85764 Neuherberg www.helmholtz-muenchen.de Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons Enhsen Registergericht: Amtsgericht Muenchen HRB 6466 USt-IdNr: DE 129521671 ___ Users mailing list: Users@clusterlabs.org http://lists.clusterlabs.org/mailman/listinfo/users Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org
Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?
On Mon, 24 Apr 2017 11:27:51 -0500 Ken Gaillot wrote:

> On 04/24/2017 10:32 AM, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 24 Apr 2017 17:08:15 +0200 Lars Ellenberg wrote:
> >
> >> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais
> >> wrote:
> >>> Hi all,
> >>>
> >>> In the PostgreSQL Automatic Failover (PAF) project, one of the most
> >>> frequent pieces of negative feedback we get is how difficult it is to
> >>> experiment with it, because fencing occurs way too frequently. I am
> >>> currently hunting this kind of useless fencing to make life easier.
> >>>
> >>> It occurs to me that a frequent reason for fencing is that during the
> >>> stop action, we check the status of the PostgreSQL instance using our
> >>> monitor function before trying to stop the resource. If the function
> >>> does not return OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we
> >>> just raise an error, leading to a fencing. See:
> >>> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
> >>>
> >>> I am considering adding a check to define if the instance is stopped
> >>> even if the monitor action returns an error. The idea would be to parse
> >>> **all** the local processes looking for at least one pair of
> >>> "/proc/<pid>/{comm,cwd}" related to the PostgreSQL instance we want to
> >>> stop. If none are found, we consider the instance is not running.
> >>> Gracefully or not, we just know it is down and we can return
> >>> OCF_SUCCESS.
> >>>
> >>> Just for completeness, the piece of code would be:
> >>>
> >>>     my @pids;
> >>>     foreach my $f (glob "/proc/[0-9]*") {
> >>>         push @pids => basename($f)
> >>>             if -r $f
> >>>             and basename( readlink( "$f/exe" ) ) eq "postgres"
> >>>             and readlink( "$f/cwd" ) eq $pgdata;
> >>>     }
> >>>
> >>> It feels safe enough to me. The only risk I could think of is in a
> >>> shared disk cluster with multiple nodes accessing the same data in RW
> >>> (such a setup can fail in so many ways :)). However, PAF is not
> >>> supposed to work in such a context, so I can live with this.
> >>>
> >>> Do you guys have some advice? Do you see some drawbacks? Hazards?
> >>
> >> Isn't that the wrong place to "fix" it?
> >> Why did your _monitor return something "weird"?
> >
> > Because this _monitor is the one called by the monitor action. It is able
> > to determine whether an instance is running and whether it feels good.
> >
> > Take the scenario where the slave instance is crashed:
> > 1/ the monitor action raises an OCF_ERR_GENERIC
> > 2/ Pacemaker tries a recover of the resource (stop->start)
> > 3/ the stop action fails because _monitor says the resource is crashed
> > 4/ Pacemaker fences the node.
> >
> >> What did it return?
> >
> > Either OCF_ERR_GENERIC or OCF_FAILED_MASTER, for instance.
> >
> >> Should you not fix it there?
> >
> > Fixing this in the monitor action? This would bloat the code of this
> > function. We would have to add a special code path in there to define if
> > it is called as a real monitor action or just as a status check for other
> > actions.
> >
> > But anyway, here or there, I would have to add this piece of code looking
> > at each process. According to you, is it safe enough? Do you see some
> > hazard with it?
> >
> >> Just thinking out loud.
> >
> > Thank you, it helps :)
>
> It feels odd that there is a situation where monitor should return an
> error (instead of "not running"), but stop should return OK.
>
> I think the question is whether the service can be considered cleanly
> stopped at that point -- i.e. whether it's safe for another node to become
> master, and safe to try starting the crashed service again on the same
> node.
>
> If it's cleanly stopped, the monitor should probably return "not running".
> Pacemaker will already compare that result against the expected state, and
> recover appropriately if needed.

From the old OCF dev guide, the advice is to do everything possible to stop
the resource, even killing it:
http://www.linux-ha.org/doc/dev-guides/_literal_stop_literal_action.html

«It is important to understand that stop is a force operation — the resource
agent must do everything in its power to shut down the resource, short of
rebooting the node or shutting it off»

and

«a resource agent should make sure that it exits with an error only if all
avenues for proper resource shutdown have been exhausted.»

I know this guide is quite outdated though. Fresher information is welcome
if the Pacemaker PEngine/crm is not expecting this anymore.

Moreover, this (outdated) doc states that the stop action should not return
OCF_NOT_RUNNING, but OCF_SUCCESS.

> The PID check assumes there can only be one instance of postgresql on the
> machine. If there are instances bound to different IPs, or some user
> starts a private instance, it could be inaccurate. But that would err on
> the side of fencing, so it might still be useful, if you don't have a way
> of more narrowly identifying the expected instance.
[ClusterLabs] can't live migrate VirtualDomain which is part of a group
Hi,

I have a primitive VirtualDomain resource which I can live migrate without
any problem. Additionally I have an IP as a resource which I can live
migrate easily too. If I combine them in a group, I can't live migrate the
VirtualDomain anymore.

It is shut down on one node and rebooted on the other. :-(

This is my config:

primitive prim_vm_mausdb VirtualDomain \
    params config="/var/lib/libvirt/images/xml/mausdb_vm.xml" \
    params hypervisor="qemu:///system" \
    params migration_transport=ssh \
    params autoset_utilization_cpu=false \
    params autoset_utilization_hv_memory=false \
    op start interval=0 timeout=120 \
    op stop interval=0 timeout=130 \
    op monitor interval=30 timeout=30 \
    op migrate_from interval=0 timeout=180 \
    op migrate_to interval=0 timeout=190 \
    meta allow-migrate=true is-managed=true \
    utilization cpu=4 hv_memory=8005

primitive prim_vnc_ip_mausdb IPaddr \
    params ip=146.107.235.161 nic=br0 cidr_netmask=24 \
    meta is-managed=true

group group_vnc_vm_mausdb prim_vnc_ip_mausdb prim_vm_mausdb \
    meta target-role=Started

Why can't I live migrate the VirtualDomain primitive while it is part of a
group?

Thanks.

Bernd
Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?
On Mon, 24 Apr 2017 17:52:09 +0200 Jan Pokorný wrote:

> On 24/04/17 17:32 +0200, Jehan-Guillaume de Rorthais wrote:
> > On Mon, 24 Apr 2017 17:08:15 +0200 Lars Ellenberg wrote:
> >
> >> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais
> >> wrote:
> >>> Hi all,
> >>>
> >>> In the PostgreSQL Automatic Failover (PAF) project, one of the most
> >>> frequent pieces of negative feedback we get is how difficult it is to
> >>> experiment with it, because fencing occurs way too frequently. I am
> >>> currently hunting this kind of useless fencing to make life easier.
> >>>
> >>> It occurs to me that a frequent reason for fencing is that during the
> >>> stop action, we check the status of the PostgreSQL instance using our
> >>> monitor function before trying to stop the resource. If the function
> >>> does not return OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we
> >>> just raise an error, leading to a fencing. See:
> >>> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
> >>>
> >>> I am considering adding a check to define if the instance is stopped
> >>> even if the monitor action returns an error. The idea would be to parse
> >>> **all** the local processes looking for at least one pair of
> >>> "/proc/<pid>/{comm,cwd}" related to the PostgreSQL instance we want to
> >>> stop. If none are found, we consider the instance is not running.
> >>> Gracefully or not, we just know it is down and we can return
> >>> OCF_SUCCESS.
> >>>
> >>> Just for completeness, the piece of code would be:
> >>>
> >>>     my @pids;
> >>>     foreach my $f (glob "/proc/[0-9]*") {
> >>>         push @pids => basename($f)
> >>>             if -r $f
> >>>             and basename( readlink( "$f/exe" ) ) eq "postgres"
> >>>             and readlink( "$f/cwd" ) eq $pgdata;
> >>>     }
> >>>
> >>> It feels safe enough to me.
> >
> > [...]
> >
> > But anyway, here or there, I would have to add this piece of code
> > looking at each process. According to you, is it safe enough? Do you see
> > some hazard with it?
>
> Just for the sake of completeness, there's a race condition, indeed,
> in multiple repeated path traversals (without being fixed on a particular
> entry inode), which can be interleaved with a new postgres process being
> launched anew (or what not). But that may happen even before the code
> in question is executed -- naturally, not having a firm grip on the
> process is open to such possible issues, so this is just an aside.

Indeed, a new process can appear right after the glob lists them. However,
in a Pacemaker cluster, only Pacemaker should be responsible for starting
the resource. PostgreSQL is not able to restart itself on its own.

I don't want to rely on the existence or content of the postmaster.pid file
(the PostgreSQL pid file), nor track the postmaster pid from the RA itself.
Way too many race conditions and too much complexity appear when I start
thinking about it.

Thank you for your answer!
--
Jehan-Guillaume de Rorthais
Dalibo
Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?
On 04/24/2017 10:32 AM, Jehan-Guillaume de Rorthais wrote:
> On Mon, 24 Apr 2017 17:08:15 +0200 Lars Ellenberg wrote:
>
>> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais
>> wrote:
>>> Hi all,
>>>
>>> In the PostgreSQL Automatic Failover (PAF) project, one of the most
>>> frequent pieces of negative feedback we get is how difficult it is to
>>> experiment with it, because fencing occurs way too frequently. I am
>>> currently hunting this kind of useless fencing to make life easier.
>>>
>>> It occurs to me that a frequent reason for fencing is that during the
>>> stop action, we check the status of the PostgreSQL instance using our
>>> monitor function before trying to stop the resource. If the function
>>> does not return OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we
>>> just raise an error, leading to a fencing. See:
>>> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
>>>
>>> I am considering adding a check to define if the instance is stopped
>>> even if the monitor action returns an error. The idea would be to parse
>>> **all** the local processes looking for at least one pair of
>>> "/proc/<pid>/{comm,cwd}" related to the PostgreSQL instance we want to
>>> stop. If none are found, we consider the instance is not running.
>>> Gracefully or not, we just know it is down and we can return
>>> OCF_SUCCESS.
>>>
>>> Just for completeness, the piece of code would be:
>>>
>>>     my @pids;
>>>     foreach my $f (glob "/proc/[0-9]*") {
>>>         push @pids => basename($f)
>>>             if -r $f
>>>             and basename( readlink( "$f/exe" ) ) eq "postgres"
>>>             and readlink( "$f/cwd" ) eq $pgdata;
>>>     }
>>>
>>> It feels safe enough to me. The only risk I could think of is in a
>>> shared disk cluster with multiple nodes accessing the same data in RW
>>> (such a setup can fail in so many ways :)). However, PAF is not supposed
>>> to work in such a context, so I can live with this.
>>>
>>> Do you guys have some advice? Do you see some drawbacks? Hazards?
>>
>> Isn't that the wrong place to "fix" it?
>> Why did your _monitor return something "weird"?
>
> Because this _monitor is the one called by the monitor action. It is able
> to determine whether an instance is running and whether it feels good.
>
> Take the scenario where the slave instance is crashed:
> 1/ the monitor action raises an OCF_ERR_GENERIC
> 2/ Pacemaker tries a recover of the resource (stop->start)
> 3/ the stop action fails because _monitor says the resource is crashed
> 4/ Pacemaker fences the node.
>
>> What did it return?
>
> Either OCF_ERR_GENERIC or OCF_FAILED_MASTER, for instance.
>
>> Should you not fix it there?
>
> Fixing this in the monitor action? This would bloat the code of this
> function. We would have to add a special code path in there to define if
> it is called as a real monitor action or just as a status check for other
> actions.
>
> But anyway, here or there, I would have to add this piece of code looking
> at each process. According to you, is it safe enough? Do you see some
> hazard with it?
>
>> Just thinking out loud.
>
> Thank you, it helps :)

It feels odd that there is a situation where monitor should return an
error (instead of "not running"), but stop should return OK.

I think the question is whether the service can be considered cleanly
stopped at that point -- i.e. whether it's safe for another node to become
master, and safe to try starting the crashed service again on the same
node.

If it's cleanly stopped, the monitor should probably return "not running".
Pacemaker will already compare that result against the expected state, and
recover appropriately if needed.

The PID check assumes there can only be one instance of postgresql on the
machine. If there are instances bound to different IPs, or some user starts
a private instance, it could be inaccurate. But that would err on the side
of fencing, so it might still be useful, if you don't have a way of more
narrowly identifying the expected instance.
Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?
On 24/04/17 17:32 +0200, Jehan-Guillaume de Rorthais wrote:
> On Mon, 24 Apr 2017 17:08:15 +0200 Lars Ellenberg wrote:
>
>> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais
>> wrote:
>>> Hi all,
>>>
>>> In the PostgreSQL Automatic Failover (PAF) project, one of the most
>>> frequent pieces of negative feedback we get is how difficult it is to
>>> experiment with it, because fencing occurs way too frequently. I am
>>> currently hunting this kind of useless fencing to make life easier.
>>>
>>> It occurs to me that a frequent reason for fencing is that during the
>>> stop action, we check the status of the PostgreSQL instance using our
>>> monitor function before trying to stop the resource. If the function
>>> does not return OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we
>>> just raise an error, leading to a fencing. See:
>>> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
>>>
>>> I am considering adding a check to define if the instance is stopped
>>> even if the monitor action returns an error. The idea would be to parse
>>> **all** the local processes looking for at least one pair of
>>> "/proc/<pid>/{comm,cwd}" related to the PostgreSQL instance we want to
>>> stop. If none are found, we consider the instance is not running.
>>> Gracefully or not, we just know it is down and we can return
>>> OCF_SUCCESS.
>>>
>>> Just for completeness, the piece of code would be:
>>>
>>>     my @pids;
>>>     foreach my $f (glob "/proc/[0-9]*") {
>>>         push @pids => basename($f)
>>>             if -r $f
>>>             and basename( readlink( "$f/exe" ) ) eq "postgres"
>>>             and readlink( "$f/cwd" ) eq $pgdata;
>>>     }
>>>
>>> It feels safe enough to me.
>
> [...]
>
> But anyway, here or there, I would have to add this piece of code looking
> at each process. According to you, is it safe enough? Do you see some
> hazard with it?

Just for the sake of completeness, there's a race condition, indeed,
in multiple repeated path traversals (without being fixed on a particular
entry inode), which can be interleaved with a new postgres process being
launched anew (or what not). But that may happen even before the code
in question is executed -- naturally, not having a firm grip on the
process is open to such possible issues, so this is just an aside.

--
Jan (Poki)
Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?
On Mon, 24 Apr 2017 17:08:15 +0200 Lars Ellenberg wrote:

> On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais
> wrote:
> > Hi all,
> >
> > In the PostgreSQL Automatic Failover (PAF) project, one of the most
> > frequent pieces of negative feedback we get is how difficult it is to
> > experiment with it, because fencing occurs way too frequently. I am
> > currently hunting this kind of useless fencing to make life easier.
> >
> > It occurs to me that a frequent reason for fencing is that during the
> > stop action, we check the status of the PostgreSQL instance using our
> > monitor function before trying to stop the resource. If the function
> > does not return OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we
> > just raise an error, leading to a fencing. See:
> > https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
> >
> > I am considering adding a check to define if the instance is stopped
> > even if the monitor action returns an error. The idea would be to parse
> > **all** the local processes looking for at least one pair of
> > "/proc/<pid>/{comm,cwd}" related to the PostgreSQL instance we want to
> > stop. If none are found, we consider the instance is not running.
> > Gracefully or not, we just know it is down and we can return
> > OCF_SUCCESS.
> >
> > Just for completeness, the piece of code would be:
> >
> >     my @pids;
> >     foreach my $f (glob "/proc/[0-9]*") {
> >         push @pids => basename($f)
> >             if -r $f
> >             and basename( readlink( "$f/exe" ) ) eq "postgres"
> >             and readlink( "$f/cwd" ) eq $pgdata;
> >     }
> >
> > It feels safe enough to me. The only risk I could think of is in a
> > shared disk cluster with multiple nodes accessing the same data in RW
> > (such a setup can fail in so many ways :)). However, PAF is not supposed
> > to work in such a context, so I can live with this.
> >
> > Do you guys have some advice? Do you see some drawbacks? Hazards?
>
> Isn't that the wrong place to "fix" it?
> Why did your _monitor return something "weird"?

Because this _monitor is the one called by the monitor action. It is able
to determine whether an instance is running and whether it feels good.

Take the scenario where the slave instance is crashed:
1/ the monitor action raises an OCF_ERR_GENERIC
2/ Pacemaker tries a recover of the resource (stop->start)
3/ the stop action fails because _monitor says the resource is crashed
4/ Pacemaker fences the node.

> What did it return?

Either OCF_ERR_GENERIC or OCF_FAILED_MASTER, for instance.

> Should you not fix it there?

Fixing this in the monitor action? This would bloat the code of this
function. We would have to add a special code path in there to define if it
is called as a real monitor action or just as a status check for other
actions.

But anyway, here or there, I would have to add this piece of code looking
at each process. According to you, is it safe enough? Do you see some
hazard with it?

> Just thinking out loud.

Thank you, it helps :)
Re: [ClusterLabs] [ClusterLabs Developers] checking all procs on system enough during stop action?
On Mon, Apr 24, 2017 at 04:34:07PM +0200, Jehan-Guillaume de Rorthais wrote:
> Hi all,
>
> In the PostgreSQL Automatic Failover (PAF) project, one of the most
> frequent pieces of negative feedback we get is how difficult it is to
> experiment with it, because fencing occurs way too frequently. I am
> currently hunting this kind of useless fencing to make life easier.
>
> It occurs to me that a frequent reason for fencing is that during the
> stop action, we check the status of the PostgreSQL instance using our
> monitor function before trying to stop the resource. If the function does
> not return OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just
> raise an error, leading to a fencing. See:
> https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301
>
> I am considering adding a check to define if the instance is stopped even
> if the monitor action returns an error. The idea would be to parse
> **all** the local processes looking for at least one pair of
> "/proc/<pid>/{comm,cwd}" related to the PostgreSQL instance we want to
> stop. If none are found, we consider the instance is not running.
> Gracefully or not, we just know it is down and we can return OCF_SUCCESS.
>
> Just for completeness, the piece of code would be:
>
>     my @pids;
>     foreach my $f (glob "/proc/[0-9]*") {
>         push @pids => basename($f)
>             if -r $f
>             and basename( readlink( "$f/exe" ) ) eq "postgres"
>             and readlink( "$f/cwd" ) eq $pgdata;
>     }
>
> It feels safe enough to me. The only risk I could think of is in a shared
> disk cluster with multiple nodes accessing the same data in RW (such a
> setup can fail in so many ways :)). However, PAF is not supposed to work
> in such a context, so I can live with this.
>
> Do you guys have some advice? Do you see some drawbacks? Hazards?

Isn't that the wrong place to "fix" it?
Why did your _monitor return something "weird"?
What did it return?
Should you not fix it there?

Just thinking out loud.

Cheers,
Lars
[ClusterLabs] checking all procs on system enough during stop action?
Hi all,

In the PostgreSQL Automatic Failover (PAF) project, one of the most frequent
pieces of negative feedback we get is how difficult it is to experiment with
it, because fencing occurs way too frequently. I am currently hunting this
kind of useless fencing to make life easier.

It occurs to me that a frequent reason for fencing is that during the stop
action, we check the status of the PostgreSQL instance using our monitor
function before trying to stop the resource. If the function does not return
OCF_NOT_RUNNING, OCF_SUCCESS or OCF_RUNNING_MASTER, we just raise an error,
leading to a fencing. See:
https://github.com/dalibo/PAF/blob/d50d0d783cfdf5566c3b7c8bd7ef70b11e4d1043/script/pgsqlms#L1291-L1301

I am considering adding a check to define if the instance is stopped even if
the monitor action returns an error. The idea would be to parse **all** the
local processes looking for at least one pair of "/proc/<pid>/{comm,cwd}"
related to the PostgreSQL instance we want to stop. If none are found, we
consider the instance is not running. Gracefully or not, we just know it is
down and we can return OCF_SUCCESS.

Just for completeness, the piece of code would be:

    my @pids;
    foreach my $f (glob "/proc/[0-9]*") {
        push @pids => basename($f)
            if -r $f
            and basename( readlink( "$f/exe" ) ) eq "postgres"
            and readlink( "$f/cwd" ) eq $pgdata;
    }

It feels safe enough to me. The only risk I could think of is in a shared
disk cluster with multiple nodes accessing the same data in RW (such a setup
can fail in so many ways :)). However, PAF is not supposed to work in such a
context, so I can live with this.

Do you guys have some advice? Do you see some drawbacks? Hazards?

Thanks in advance!
--
Jehan-Guillaume de Rorthais
Dalibo
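For illustration, a minimal, self-contained sketch of how such a check could
be wrapped into the stop path. The function and variable names are invented,
not PAF's actual code, and the commented-out return only shows where it
would plug in.

    use strict;
    use warnings;
    use File::Basename qw(basename);

    # Returns true if at least one running process looks like a postmaster
    # whose working directory is the instance's data directory.
    sub _instance_has_procs {
        my ($pgdata) = @_;

        foreach my $f (glob "/proc/[0-9]*") {
            my $exe = readlink("$f/exe") // '';   # may fail for exiting PIDs
            my $cwd = readlink("$f/cwd") // '';
            return 1
                if -r $f
                and basename($exe) eq 'postgres'
                and $cwd eq $pgdata;
        }

        return 0;
    }

    # Inside the stop action, after _monitor() reported an unexpected status:
    #   return $OCF_SUCCESS unless _instance_has_procs($pgdata);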
Re: [ClusterLabs] Alert notifications on all nodes for any event type on one resource
On 04/21/2017 08:08 PM, Rekha Panda wrote:
> Hi,
> First, hopefully I am posting this in the right email thread! :)
>
> So in a 2-node cluster, we add resources for monitoring (2 for now, one
> located on each node). What we expect is that for any change in the event
> type (start/stop) of a resource, we get an alert (which currently is
> directed to run a shell script).
> What we notice is that the script runs only for an event change of a
> resource on a particular node. We would like to run the script on both
> nodes, and then decide in the script how we handle it.
>
> Is there a way to run the script on both nodes for one resource event
> change? If not, is there a plan to support it in the future?

Unless we see a really strong use case, this is not very likely to happen,
I guess ...

Currently the processes running the alert agents are spawned from the crmd
context. So if a resource is not to be started/stopped on a particular node,
and this node is not the DC, this type of information does not - at the
moment - pass through the crmd on that node.

Broadcasting this information to all nodes (no IP broadcast is used by
corosync, at least by default) would mean an enormous increase in cluster
communication. Thus it would rather have to be done just for certain
resources, based on rules. And that would mean quite some implementation
effort - at least without a smart idea ;-)

I don't know exactly what you are doing, but having a dependent, cloned,
non-colocated resource, where you put your stuff into the RA, might be
suitable for your scenario. The clones would be started after the observed
resource starts, and they would be stopped prior to stopping it. You
wouldn't see immediately on which node it is to happen, but you could still
use 'crm_resource --resource myResource --locate' to tell you.

> Thanks,
> Rekha

--
Klaus Wenninger
Senior Software Engineer, EMEA ENG Openstack Infrastructure
Red Hat
kwenn...@redhat.com
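A rough sketch, in crm syntax, of the workaround described above. All
resource names are invented, and ocf:pacemaker:Dummy merely stands in for an
agent carrying your own start/stop logic.

    primitive watch_myResource ocf:pacemaker:Dummy \
        op start interval=0 op stop interval=0 op monitor interval=30
    clone cl_watch_myResource watch_myResource
    # ordered after the observed resource, but deliberately not colocated
    # with it, so the clone's start/stop actions run on every node
    order ord_watch_after_myResource inf: myResource cl_watch_myResource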
Re: [ClusterLabs] Digest does not match
> Among the two cases where I have seen this error message, I solved one. On
> one cluster the dedicated interfaces were connected to a switch instead of
> being connected directly.

Ok

> Though I still don't know what caused these errors on another system (the
> logs in the previous email).

Actually there is really nothing in the logs that would indicate that the
nodes ever merged. I would suggest checking that corosync.conf and the
authkeys are equal on all nodes.

Honza
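One quick way to perform that check (node names are placeholders; the paths
are the usual defaults, adjust if your distribution differs):

    # compare checksums of the corosync configuration and authkey on every node
    for n in node1 node2; do
        ssh "$n" md5sum /etc/corosync/corosync.conf /etc/corosync/authkey
    done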
[ClusterLabs] Antw: Re: Antw: Re: 2-Node Cluster Pointless?
>>> Andrei Borzenkov wrote on 22.04.2017 at 09:05 in message:
> 18.04.2017 10:47, Ulrich Windl wrote:
> ...
>>>
>>> Now let me come back to quorum vs. stonith. Said simply: Quorum is a
>>> tool for when everything is working. Fencing is a tool for when things
>>> go wrong.
>>
>> I'd say: Quorum is the tool to decide who'll be alive and who's going to
>> die, and STONITH is the tool to make nodes die.
>
> If I had PROD, QA and DEV in a cluster and PROD were separated from
> QA+DEV, I'd be very sad if PROD were shut down.
>
> The notion of a simple node majority as the kill policy is not
> appropriate, nor are simple node-based delays. I wish pacemaker supported
> a scoring system for resources so that we could base stonith delays on
> them (the most important sub-cluster starts fencing first).

So your preference for a 2|1 node split-brain scenario is to make the one
node survive if it runs the more important resources?

>> If everything is working you need neither quorum nor STONITH.
>
> I wonder how SBD fits into this discussion. It is marketed as a stonith
> agent, but it is based on committing suicide, so it relies on well-behaving
> nodes - which we by definition cannot trust to behave well, otherwise we'd
> not need stonith in the first place.