Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-08-29 Thread Dominik Klein
Node level failure is detected on the communications layer, i.e. heartbeat
or corosync. That software is run with realtime priority. So it keeps 
working just fine (use tcpdump on the healthy node to verify). So 
pacemaker on the healthy node does now know that the other node has a 
problem and therefore does not initiate failover.
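
To check the realtime-priority point on a given node, one could inspect the scheduling policy of the messaging daemon. The following is only a minimal sketch, assuming Linux and Python 3; the helper names are made up for illustration, and the PID to inspect is whatever heartbeat or corosync runs as on your system:

```python
import os

# Policies the Linux scheduler treats as realtime.
REALTIME_POLICIES = {os.SCHED_FIFO, os.SCHED_RR}


def is_realtime_policy(policy: int) -> bool:
    """Return True if the given scheduling policy constant is a realtime one."""
    return policy in REALTIME_POLICIES


def is_realtime(pid: int) -> bool:
    """Return True if process `pid` is scheduled realtime (pid 0 = this process)."""
    return is_realtime_policy(os.sched_getscheduler(pid))


if __name__ == "__main__":
    # Hypothetical usage: pass the PID of the messaging daemon instead of 0.
    print("realtime" if is_realtime(0) else "not realtime")
```

A process under SCHED_FIFO or SCHED_RR is scheduled ahead of ordinary processes, which fits the picture of the membership layer still answering while a fork bomb starves everything else.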

We had this discussion back in 2010, maybe you also want to refer to 
that: 
http://oss.clusterlabs.org/pipermail/pacemaker/2010-February/004739.html

Regards
Dominik

On 07/08/2011 03:23 PM, Warnke, Eric E wrote:

 If the fork bomb is preventing the system from spawning a health check, it
 would seem like the most intelligent course of action would be to presume
 that it failed and act accordingly.

 -Eric


 On 7/8/11 8:38 AM, Lars Marowsky-Bree l...@suse.de wrote:

 On 2011-07-08T14:10:09, Gianluca Cecchi gianluca.cec...@gmail.com wrote:

 So that each node has to write to its dedicated part of it and read
 from the other ones.
 If one node doesn't update its portion it is then detected by the
 others and it is fenced after a configurable number of misses...
 Does pacemaker provide some sort of this configuration?

 external/sbd as a fencing mechanism provides this, but that is not the
 same as a load  system health check at all.

 Though tying into that would make sense, yes.


 Regards,
 Lars

 --
 Architect Storage/HA, OPS Engineering, Novell, Inc.
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
 Imendörffer, HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems



Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-08-29 Thread Dominik Klein
On 08/29/2011 09:51 AM, Dominik Klein wrote:
 Node level failure is detected on the communications layer, i.e. heartbeat
 or corosync. That software is run with realtime priority. So it keeps
 working just fine (use tcpdump on the healthy node to verify). So
 pacemaker on the healthy node does now know

woops, this was supposed to say not know

 that the other node has a
 problem and therefore does not initiate failover.

 We had this discussion back in 2010, maybe you also want to refer to
 that:
 http://oss.clusterlabs.org/pipermail/pacemaker/2010-February/004739.html

 Regards
 Dominik


Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-08-28 Thread James Smith
This is essentially what I want, and I am surprised this isn't already the case.

Regards,
James Smith

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Eric Warnke
Sent: 11 July 2011 13:51
To: Florian Haas; General Linux-HA mailing list
Subject: Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover


Failing to spawn a check should be the same as a check failing.

-Eric


On 7/11/11 3:38 AM, Florian Haas florian.h...@linbit.com wrote:

On 2011-07-08 15:23, Warnke, Eric E wrote:
 
 If the fork bomb is preventing the system from spawning a health check, it
 would seem like the most intelligent course of action would be to presume
 that it failed and act accordingly.

Here we go again. Since the original poster did not address the 
following question of mine, maybe you are inclined to:

 Now please define how exactly Pacemaker would be handling this 
 accordingly.

Florian





Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-07-11 Thread Florian Haas
On 2011-07-08 15:23, Warnke, Eric E wrote:
 
 If the fork bomb is preventing the system from spawning a health check, it
 would seem like the most intelligent course of action would be to presume
 that it failed and act accordingly.

Here we go again. Since the original poster did not address the
following question of mine, maybe you are inclined to:

 Now please define how exactly Pacemaker would be handling this
 accordingly.

Florian




Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-07-11 Thread James Smith
Hi,

The slave should know when the master is unable to complete monitor operations. If
any monitor operation fails, it should initiate fencing. But because the OS is so
overloaded, the master doesn't respond to anything, and it appears that the slave
doesn't initiate any health checks against the master at all, so it never knows
that the master has failed. This is crazy.

Regards,

James Smith

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Florian Haas
Sent: 11 July 2011 08:39
To: General Linux-HA mailing list
Cc: Warnke, Eric E
Subject: Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

On 2011-07-08 15:23, Warnke, Eric E wrote:
 
 If the fork bomb is preventing the system from spawning a health 
 check, it would seem like the most intelligent course of action would 
 be to presume that it failed and act accordingly.

Here we go again. Since the original poster did not address the following 
question of mine, maybe you are inclined to:

 Now please define how exactly Pacemaker would be handling this 
 accordingly.

Florian



Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-07-11 Thread Eric Warnke

Failing to spawn a check should be the same as a check failing.
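
On the local side, that idea could look roughly like the sketch below. It is only an illustration in Python 3, not how Pacemaker or its resource agents actually run monitors; `run_check` and the stand-in command are hypothetical:

```python
import subprocess
import sys


def run_check(cmd, timeout=5.0):
    """Run a health check command; return True only if it ran and exited 0."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The check could not be started at all (for example fork() failing
        # with EAGAIN under a fork bomb, or a missing binary): count as failed.
        return False
    except subprocess.TimeoutExpired:
        # The check started but did not finish in time: also a failure.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Stand-in check: a child Python process that exits successfully.
    print(run_check([sys.executable, "-c", "raise SystemExit(0)"]))
```

This only decides that the check failed. What the cluster should do with that result is a separate question.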

-Eric


On 7/11/11 3:38 AM, Florian Haas florian.h...@linbit.com wrote:

On 2011-07-08 15:23, Warnke, Eric E wrote:
 
 If the fork bomb is preventing the system from spawning a health check, it
 would seem like the most intelligent course of action would be to presume
 that it failed and act accordingly.

Here we go again. Since the original poster did not address the
following question of mine, maybe you are inclined to:

 Now please define how exactly Pacemaker would be handling this
 accordingly.

Florian





Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-07-10 Thread Warnke, Eric E

If the fork bomb is preventing the system from spawning a health check, it
would seem like the most intelligent course of action would be to presume
that it failed and act accordingly.

-Eric


On 7/8/11 8:38 AM, Lars Marowsky-Bree l...@suse.de wrote:

On 2011-07-08T14:10:09, Gianluca Cecchi gianluca.cec...@gmail.com wrote:

 So that each node has to write to its dedicated part of it and read
 from the other ones.
 If one node doesn't update its portion it is then detected by the
 others and it is fenced after a configurable number of misses...
 Does pacemaker provide some sort of this configuration?

external/sbd as a fencing mechanism provides this, but that is not the
same as a load  system health check at all.

Though tying into that would make sense, yes.


Regards,
Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix
Imendörffer, HRB 21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-07-08 Thread Lars Marowsky-Bree
On 2011-07-08T08:51:41, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

 while sleep 10
 do
     reset-watchdog-timeout
 done

Won't work with SBD of course, because there can only be one watchdog
user.

And sleep is not an external command. ;-)

It would make sense to actively run a system health check that is
watchdog-protected.
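
A rough sketch of what such a watchdog-protected check loop might look like, in Python 3. The function name and the `check` callable are hypothetical, and the watchdog is passed in as a writable binary file object so the loop can be tried against an in-memory buffer. On a real node it would be an opened watchdog device, and this process would have to be its only user:

```python
import io
import time


def watchdog_guarded_loop(check, watchdog, interval=10.0, max_rounds=None):
    """Pet `watchdog` every `interval` seconds for as long as `check()` passes.

    Returns the number of times the watchdog was petted. Once a check fails,
    the loop stops writing, so a real hardware watchdog would expire and
    reset the node.
    """
    pets = 0
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        if not check():
            break  # stop petting; the timer is left to run out
        watchdog.write(b"\0")  # any write resets the timer on Linux watchdogs
        watchdog.flush()
        pets += 1
        time.sleep(interval)
    return pets


if __name__ == "__main__":
    # Dry run against an in-memory buffer instead of a real watchdog device.
    results = iter([True, True, False])
    fake = io.BytesIO()
    print(watchdog_guarded_loop(lambda: next(results), fake, interval=0))
```

Whether leaving the loop really ends in a reset depends on how the watchdog driver treats a close of the device, so that part is an assumption to verify on the actual hardware.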


Regards,
Lars

-- 
Architect Storage/HA, OPS Engineering, Novell, Inc.
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-HA] Antw: Re: Forkbomb not initiating failover

2011-07-08 Thread Gianluca Cecchi
On Fri, Jul 8, 2011 at 10:04 AM, Lars Marowsky-Bree  wrote:
 On 2011-07-08T08:51:41, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de 
 wrote:

 while sleep 10
 do
    reset-watchdog-timeout
 done

 Won't work with SBD of course, because there can only be one watchdog
 user.

 And sleep is not an external command. ;-)

 It would make sense to actively run a system health check that is
 watchdog-protected.


 Regards,
    Lars

On other cluster types (such as the RHEL Cluster Suite) you can address
this potential high-load scenario by defining a quorum disk (which
typically involves some sort of shared storage).
Each node has to write to its own dedicated part of the disk and read
the parts belonging to the other nodes.
If one node doesn't update its portion, the others detect this and
fence it after a configurable number of misses.
Does Pacemaker provide this sort of configuration?
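
To make the miss-counting idea concrete, here is a small simulation in Python 3. It is not the quorum disk implementation of any cluster product; `SlotMonitor`, the node names and the counter values are invented for illustration, and the shared disk is modelled as a plain dictionary:

```python
class SlotMonitor:
    """Track per-node update counters and report nodes that stopped updating."""

    def __init__(self, nodes, max_misses=3):
        self.max_misses = max_misses
        self.last_seen = {node: None for node in nodes}
        self.misses = {node: 0 for node in nodes}

    def observe(self, counters):
        """Take one reading of every node's slot; return nodes to fence.

        `counters` maps node name to the counter value currently in its slot.
        """
        to_fence = []
        for node, last in self.last_seen.items():
            current = counters.get(node)
            if current is not None and current != last:
                self.misses[node] = 0  # slot was updated since last reading
            else:
                self.misses[node] += 1  # unchanged or missing slot: one miss
            self.last_seen[node] = current
            if self.misses[node] >= self.max_misses:
                to_fence.append(node)
        return to_fence


if __name__ == "__main__":
    monitor = SlotMonitor(["node1", "node2"], max_misses=2)
    print(monitor.observe({"node1": 1, "node2": 1}))
    print(monitor.observe({"node1": 2, "node2": 1}))
    print(monitor.observe({"node1": 3, "node2": 1}))
```

In the dry run, node2 stops advancing its counter after the first reading and is reported for fencing once it has missed two readings in a row.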

Gianluca