Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover
Node level failure is detected on the communications layer, i.e. heartbeat or corosync. That software runs with realtime priority, so it keeps working just fine (use tcpdump on the healthy node to verify). As a result, pacemaker on the healthy node does not know that the other node has a problem and therefore does not initiate failover.

We had this discussion back in 2010; maybe you also want to refer to that:
http://oss.clusterlabs.org/pipermail/pacemaker/2010-February/004739.html

Regards
Dominik

On 07/08/2011 03:23 PM, Warnke, Eric E wrote:
> If the fork bomb is preventing the system from spawning a health check, it would seem like the most intelligent course of action would be to presume that it failed and act accordingly.
>
> -Eric
>
> On 7/8/11 8:38 AM, Lars Marowsky-Bree l...@suse.de wrote:
>> On 2011-07-08T14:10:09, Gianluca Cecchi gianluca.cec...@gmail.com wrote:
>>> So that each node has to write to its dedicated part of it and read from the other ones. If one node doesn't update its portion it is then detected by the others and it is fenced after a configurable number of misses... Does pacemaker provide some sort of this configuration?
>>
>> external/sbd as a fencing mechanism provides this, but that is not the same as a load system health check at all. Though tying into that would make sense, yes.
>>
>> Regards,
>> Lars
>>
>> --
>> Architect Storage/HA, OPS Engineering, Novell, Inc.
>> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
>> Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover
On 08/29/2011 09:51 AM, Dominik Klein wrote:
> Node level failure is detected on the communications layer, i.e. heartbeat or corosync. That software is run with realtime priority. So it keeps working just fine (use tcpdump on the healthy node to verify). So pacemaker on the healthy node does now know

Whoops, this was supposed to say "not know".

> that the other node has a problem and therefore does not initiate failover. We had this discussion back in 2010, maybe you also want to refer to that:
> http://oss.clusterlabs.org/pipermail/pacemaker/2010-February/004739.html
>
> Regards
> Dominik
Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover
This is essentially what I want, and I am surprised this isn't already the case.

Regards,
James Smith

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Eric Warnke
Sent: 11 July 2011 13:51
To: Florian Haas; General Linux-HA mailing list
Subject: Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

Failing to spawn a check should be the same as a check failing.

-Eric

On 7/11/11 3:38 AM, Florian Haas florian.h...@linbit.com wrote:
> On 2011-07-08 15:23, Warnke, Eric E wrote:
>> If the fork bomb is preventing the system from spawning a health check, it would seem like the most intelligent course of action would be to presume that it failed and act accordingly.
>
> Here we go again. Since the original poster did not address the following question of mine, maybe you are inclined to:
>
> Now please define how exactly Pacemaker would be handling this accordingly.
>
> Florian
Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover
On 2011-07-08 15:23, Warnke, Eric E wrote:
> If the fork bomb is preventing the system from spawning a health check, it would seem like the most intelligent course of action would be to presume that it failed and act accordingly.

Here we go again. Since the original poster did not address the following question of mine, maybe you are inclined to:

Now please define how exactly Pacemaker would be handling this accordingly.

Florian
Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover
Hi,

The slave should know the master is unable to complete monitor operations. If any monitor operations fail, it should initiate fencing. But because the OS is so rammed, it doesn't respond to anything. It appears that the slave doesn't initiate any health checks at all against the master, so it never knows the master has failed. This is crazy.

Regards,
James Smith

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Florian Haas
Sent: 11 July 2011 08:39
To: General Linux-HA mailing list
Cc: Warnke, Eric E
Subject: Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

On 2011-07-08 15:23, Warnke, Eric E wrote:
> If the fork bomb is preventing the system from spawning a health check, it would seem like the most intelligent course of action would be to presume that it failed and act accordingly.

Here we go again. Since the original poster did not address the following question of mine, maybe you are inclined to:

Now please define how exactly Pacemaker would be handling this accordingly.

Florian
Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover
Failing to spawn a check should be the same as a check failing.

-Eric

On 7/11/11 3:38 AM, Florian Haas florian.h...@linbit.com wrote:
> On 2011-07-08 15:23, Warnke, Eric E wrote:
>> If the fork bomb is preventing the system from spawning a health check, it would seem like the most intelligent course of action would be to presume that it failed and act accordingly.
>
> Here we go again. Since the original poster did not address the following question of mine, maybe you are inclined to:
>
> Now please define how exactly Pacemaker would be handling this accordingly.
>
> Florian
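To make Eric's proposal concrete, here is a minimal Python sketch of a check runner that reports "could not spawn" and "timed out" the same way as a non-zero exit. It is only an illustration of the idea, not how Pacemaker or its resource agents run monitor operations; the function name `check_passed` and the timeout value are assumptions made up for this example. It also leaves Florian's question open: the sketch only classifies the result, it does not say what the cluster should do with it.

```python
import subprocess
import sys


def check_passed(argv, timeout_s=5.0):
    """Return True only if the check started, finished in time, and exited 0.

    Hypothetical helper for illustration; not part of Pacemaker.
    """
    try:
        result = subprocess.run(
            argv,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The process could not be created at all, e.g. fork() returning
        # EAGAIN while a fork bomb has exhausted the process table, or the
        # check binary being missing. Treated the same as a failed check.
        return False
    except subprocess.TimeoutExpired:
        # The check started but did not finish in time; also a failure.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # A trivial check that exits 0, using the running interpreter.
    print(check_passed([sys.executable, "-c", "raise SystemExit(0)"]))
```

The point of the design is that the caller never has to distinguish "the check said no" from "the check could not be asked".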
Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover
If the fork bomb is preventing the system from spawning a health check, it would seem like the most intelligent course of action would be to presume that it failed and act accordingly.

-Eric

On 7/8/11 8:38 AM, Lars Marowsky-Bree l...@suse.de wrote:
> On 2011-07-08T14:10:09, Gianluca Cecchi gianluca.cec...@gmail.com wrote:
>> So that each node has to write to its dedicated part of it and read from the other ones. If one node doesn't update its portion it is then detected by the others and it is fenced after a configurable number of misses... Does pacemaker provide some sort of this configuration?
>
> external/sbd as a fencing mechanism provides this, but that is not the same as a load system health check at all. Though tying into that would make sense, yes.
>
> Regards,
> Lars
Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover
On 2011-07-08T08:51:41, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
> while sleep 10
> do
>     reset-watchdog-timeout
> done

Won't work with SBD of course, because there can only be one watchdog user. And sleep is not an external command. ;-)

It would make sense to actively run a system health check that is watchdog-protected.

Regards,
Lars

--
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde
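As a rough sketch of what a "watchdog-protected" health check could look like: the watchdog is only fed when a health check has just succeeded, so a node too loaded to run the check stops feeding it and gets reset by the hardware or softdog timer. The function name `pet_if_healthy` and the stand-in check are assumptions for illustration. On Linux, a write to `/dev/watchdog` resets the timer, but this sketch writes to any file-like object so it can be tried without a real device. As Lars notes, it cannot run alongside SBD, since only one process can hold the watchdog.

```python
import io


def pet_if_healthy(watchdog, health_check):
    """Run one health check and feed the watchdog only if it succeeds.

    `watchdog` is any writable binary file object; on a real system it
    would be /dev/watchdog opened unbuffered (assumption, see below).
    `health_check` is a callable returning a truthy value when healthy.
    """
    try:
        healthy = bool(health_check())
    except Exception:
        # A check that cannot even run (e.g. fork failure) counts as unhealthy.
        healthy = False
    if healthy:
        watchdog.write(b"\0")  # any write resets the timer on Linux
        watchdog.flush()
    return healthy


if __name__ == "__main__":
    # Demo against an in-memory stand-in instead of the real device.
    fake_device = io.BytesIO()
    print(pet_if_healthy(fake_device, lambda: True))   # fed once
    print(pet_if_healthy(fake_device, lambda: False))  # not fed
    print(len(fake_device.getvalue()))                 # number of feeds so far
```

On a real node the caller would open `/dev/watchdog` with `open("/dev/watchdog", "wb", buffering=0)` and call `pet_if_healthy` in a loop with an interval shorter than the watchdog timeout.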
Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover
On Fri, Jul 8, 2011 at 10:04 AM, Lars Marowsky-Bree wrote:
> On 2011-07-08T08:51:41, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
>> while sleep 10
>> do
>>     reset-watchdog-timeout
>> done
>
> Won't work with SBD of course, because there can only be one watchdog user. And sleep is not an external command. ;-)
>
> It would make sense to actively run a system health check that is watchdog-protected.
>
> Regards,
> Lars

On other cluster types (such as RHEL Cluster Suite) you can address this potential high-load scenario by defining a quorum disk (which typically involves some sort of shared storage), so that each node has to write to its dedicated part of it and read from the other ones. If one node doesn't update its portion, the others detect it and the node is fenced after a configurable number of misses.

Does Pacemaker provide some sort of this configuration?

Gianluca
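The quorum-disk scheme Gianluca describes can be sketched in a few lines of Python. This is only a toy model of the bookkeeping, under the assumption that each node's slot holds a counter it increments on every write; the class name `SlotMonitor`, the `max_misses` parameter and the dictionary standing in for the shared disk are all made up for illustration and do not reflect how qdiskd or sbd are actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class SlotMonitor:
    """Tracks per-node counters read from a shared disk (hypothetical model).

    A node whose counter has not changed for `max_misses` consecutive
    polls is reported as a fencing candidate.
    """
    max_misses: int = 3
    last_seen: dict = field(default_factory=dict)
    misses: dict = field(default_factory=dict)

    def observe(self, slots):
        """Process one poll. `slots` maps node name -> counter in its slot.

        Returns the sorted list of nodes that have reached max_misses.
        """
        to_fence = []
        for node, counter in slots.items():
            if node in self.last_seen and counter == self.last_seen[node]:
                # Slot was not updated since the previous poll: one more miss.
                self.misses[node] = self.misses.get(node, 0) + 1
            else:
                # First sighting or a fresh update: reset the miss count.
                self.misses[node] = 0
            self.last_seen[node] = counter
            if self.misses[node] >= self.max_misses:
                to_fence.append(node)
        return sorted(to_fence)


if __name__ == "__main__":
    monitor = SlotMonitor(max_misses=2)
    # Node "b" stops updating its slot after the first poll.
    for poll in ({"a": 1, "b": 1}, {"a": 2, "b": 1}, {"a": 3, "b": 1}):
        print(monitor.observe(poll))
```

A node caught in a fork bomb may still answer on the network layer, but if it can no longer schedule the process that writes its slot, its counter stalls and the peers notice.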