Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-16 Thread Andrei Borzenkov
On 15.10.2021 13:24, Klaus Wenninger wrote:
> On Fri, Oct 15, 2021 at 12:01 PM Andrei Borzenkov 
> wrote:
> 
>> On Fri, Oct 15, 2021 at 9:25 AM Klaus Wenninger 
>> wrote:
>>
>>> Main pain-point here is that ping-RA allows us to configure the count of
>>> pings sent, but it is just using the exit-value from ping that becomes
>>> negative already when one of the answers is missing.
>>
>> Use fping instead? Which is supported by ping RA and should behave
>> exactly as needed - report host alive if at least one reply was
>> received.
>>
> I like fping, but since it has some reputation as a DoS tool, not everybody
> might be fine with installing it.
> And we will still have something that would be fine with at least a 50%
> packet loss, which as well might not be acceptable to qualify a host as
> reachable.
> But of course we still can tweak it even with the current implementation to
> let's say a loss <20% by giving the same host 5 times and having
> the limit set to 4.
> 
>>
>> Maybe when using ping RA could also parse ping output instead of
>> relying on exit status.
>>
> as the fence-agent referenced is doing ;-)
> 

Actually, simply having an inner loop from 1 to $OCF_RESKEY_attempts with
"ping -c 1" is simpler and more portable. But I am not convinced it is
worth the trouble.
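
A minimal sketch of such an inner loop, for illustration only. This is not
the code of the shipped ping RA: the function name host_alive and the
PING_CMD override are hypothetical, OCF_RESKEY_attempts is assumed to be
exported by the cluster (falling back to 3 here), and "-W 1" is the iputils
spelling of a one-second reply timeout, which may differ on other platforms.

```shell
#!/bin/sh
# Hypothetical helper, not part of the shipped ping RA.
# PING_CMD can be overridden (e.g. for testing); defaults to the system ping.
: "${PING_CMD:=ping}"

# host_alive HOST
# Succeed as soon as one single-packet ping is answered;
# fail only if all $OCF_RESKEY_attempts probes go unanswered.
host_alive() {
    host=$1
    attempts=${OCF_RESKEY_attempts:-3}
    n=1
    while [ "$n" -le "$attempts" ]; do
        if $PING_CMD -c 1 -W 1 "$host" >/dev/null 2>&1; then
            return 0
        fi
        n=$((n + 1))
    done
    return 1
}
```

Because each probe sends exactly one packet, the exit status of ping means
"answered or not" for that packet, so no output parsing is needed.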
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Trying to understand dampening (ping)

2021-10-16 Thread Andrei Borzenkov
On 14.10.2021 23:51, martin doc wrote:
> 
> 
> 
> From: Andrei Borzenkov ,  Friday, 15 October 2021 4:59 AM
> ...
>> Dampening defines delay before attributes are committed to CIB.
>> Private attributes are never ever written into CIB, so dampening
>> makes no sense here. Private attributes are managed by attrd
>> itself and you see the latest value.
> 
>> If you change transient attribute (without -p option) value you
>> will see different values reported by
> 
>> attrd_updater -n my_ping -Q
> 
>> and
> 
>> cibadmin -Q -A "//nvpair[@name='my_ping']"
> 
>> until dampening timeout expires.
> 
>> This applies even to deleting attribute.
> 
> Ok, now I understand what the dampen function does.
> 
> If I understand this correctly then this probably makes every documented 
> example of using ocf:pacemaker:ping with a colocation statement wrong because 
> the only way to see the effect of dampen is to use a rule that references the 
> value of pingd directly. That or the script for ping has a major flaw with 
> respect to dampen.
> 
> That is when I do this:
> 
> pcs resource create myPing ocf:pacemaker:ping host_list=192.168.1.1 failure_score=1
> pcs resource create database ocf:heartbeat:pgsql
> pcs group add pgrp myPing database
> 
> PCS will move everything to a new node if there is even 1 ping failure 
> because monitor in ping doesn't look at the dampened value, only the value of 
> the immediate returned value.
> 

failure_score is the number of hosts that must answer ping during a single
monitor invocation. If you have a single host, the only meaningful value
is 1.
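
A rough sketch of that check, under the reading given above and assuming the
default multiplier of 1. It is not the RA's actual code: monitor_ok and
PROBE_CMD are hypothetical names, and PROBE_CMD stands in for whatever
decides that one host answered.

```shell
#!/bin/sh
# Hypothetical sketch, not the shipped ping RA.
# PROBE_CMD decides whether a single host answered; overridable for testing.
: "${PROBE_CMD:=ping -c 1 -W 1}"

# monitor_ok "HOST_LIST" FAILURE_SCORE
# Count the hosts that answered and succeed only if
# that count is at least FAILURE_SCORE.
monitor_ok() {
    host_list=$1
    failure_score=$2
    active=0
    for host in $host_list; do
        if $PROBE_CMD "$host" >/dev/null 2>&1; then
            active=$((active + 1))
        fi
    done
    [ "$active" -ge "$failure_score" ]
}
```

With one entry in host_list the count can only be 0 or 1, so a threshold of
0 never fails and anything above 1 always fails.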

If you want to smooth out a single ping failure, use the "attempts"
parameter. It defaults to 3, which means every monitor operation does 3
pings and fails only if all of them fail. So it already does what you want
without any special configuration.

> The same is true with colocation statements - if a constraint is made with a 
> ping resource without using a rule that references pingd then  the dampen 
> behaviour is ignored completely.
> 

You completely misunderstand what dampen is used for. It is used to wait
for multiple nodes to record the results of their monitor actions, so that
when the policy engine is invoked it (hopefully) has the final picture. It
has nothing to do with individual ping results on any single node.

> Is the ping'er missing something that does this:
> 
> score=`cibadmin -Q -A "//nvpair[@name='ping']" | sed -e 's/.*value="\([^"]*\)".*/\1/'`
> 

The only effect would be to use the result of the previous monitor
invocation instead of the current one.

You cannot use dampening to smooth out ping results. You will still
have only one final value recorded, so in the sequence success, success,
failure it will be failure.

To do anything more sophisticated you need to actually record every
individual ping result. This is far more involved and I still do not see a
real use case.
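
To illustrate what "record every individual result" could look like, here is
a small sketch that keeps a sliding window of results. Nothing like this
exists in the ping RA as far as this thread shows: push_result,
count_failures and the window size are made up for the example, and the
history is simply a space-separated string of 1 (reply) and 0 (no reply).

```shell
#!/bin/sh
# Hypothetical sketch: sliding window over individual ping results.

# push_result HISTORY RESULT WINDOW
# Append RESULT to HISTORY and print only the newest WINDOW entries.
push_result() {
    hist="$1 $2"
    window=$3
    set -- $hist            # word splitting intended: one entry per argument
    while [ "$#" -gt "$window" ]; do
        shift               # drop the oldest entry
    done
    echo "$*"
}

# count_failures HISTORY
# Print how many entries in HISTORY are 0.
count_failures() {
    failures=0
    for r in $1; do
        if [ "$r" -eq 0 ]; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}
```

For the sequence success, success, failure the window holds "1 1 0" and
count_failures reports 1 out of 3, whereas a single recorded value can only
say "failure". A policy would then have to pick a threshold, and the history
would have to be stored somewhere between monitor runs, which is where it
gets involved.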