Re: [ClusterLabs] Pacemaker log showing time mismatch after

2019-02-11 Thread Jan Pokorný
On 11/02/19 15:03 -0600, Ken Gaillot wrote:
> On Fri, 2019-02-01 at 08:10 +0100, Jan Pokorný wrote:
>> On 28/01/19 09:47 -0600, Ken Gaillot wrote:
>>> On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote:
>>> Pacemaker can handle the clock jumping forward, but not backward.
>> 
>> I am rather surprised, are we not using monotonic time only, then?
>> If so, why?
> 
> The scheduler runs on a single node (the DC) but must take as input the
> resource history (including timestamps) on all nodes. We need wall
> clock time to compare against time-based rules.

Yep, was aware of the troubles with this.

> Also, if we get two resource history entries from a node, we don't
> know if it rebooted in between, so a monotonic timestamp alone
> wouldn't be sufficient.

Ah, that's along the lines of Ulrich's response, I see it now, thanks.

I am not sure whether that could be worked around using boot IDs where
the platform provides them (like the hashes in the case of systemd).
A plain equality test should be all that's needed, since both
histories cannot arrive at the same time and the FIFO ordering
presumably gets preserved -- it shall under normal circumstances,
i.e. no time travel and a sane, non-Byzantine process. Such
misbehaviour might even be partially detected and acted upon: two
differing histories with no recollection of the node having been
fenced by the cluster? Fence it, for sure and good measure!
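
Purely to illustrate the equality test (a minimal sketch with made-up
helper names, assuming Linux where the boot ID is exposed under
/proc/sys/kernel/random/boot_id):

  #include <stdio.h>
  #include <string.h>

  /* Read the current boot ID; returns 0 on success, -1 on failure. */
  static int get_boot_id(char *buf, size_t len)
  {
      FILE *f = fopen("/proc/sys/kernel/random/boot_id", "r");

      if (f == NULL || fgets(buf, (int) len, f) == NULL) {
          if (f != NULL) {
              fclose(f);
          }
          return -1;
      }
      fclose(f);
      buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
      return 0;
  }

  /* Equality is all that is needed: a history entry recorded under
   * a different boot ID than expected means the node rebooted. */
  static int same_boot(const char *a, const char *b)
  {
      return strcmp(a, b) == 0;
  }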

> However, it might be possible to store both time representations in the
> history (and possibly maintain some sort of cluster knowledge about
> monotonic clocks to compare them within and across nodes), and use one
> or the other depending on the context. I haven't tried to determine how
> feasible that would be, but it would be a major project.

Just thinking aloud: the DC node, once it has won the election, would
make the other nodes (including newcomers during its reign) (re)set
the offset between the DC's wall-clock time and their own (for
time-based rules, should they ever become DC themselves). All nodes
would then (re)establish a monotonic-to-wall-clock conversion based
on these one-off inputs and, from that point on, operate fully
detached from the wall clock, so that changing the wall clock would
have no immediate effect on the cluster unless re-harmonizing is
desired via some configured event handler, which could reschedule the
plan appropriately or delay the sync to a more suitable moment. This
admittedly counts on pacemaker being used in full-fledged cluster
mode, i.e. requiring quorum (not to speak of fencing).

As a corollary, with such a scheme, time-based rules would only be
allowed when quorum is fully honoured (after all, it otherwise makes
more sense to employ cron or systemd timer units, since they use
local time only, which is then the right fit, as opposed to
a distributed system).
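
To make the idea a bit more concrete, here is a rough sketch of the
per-node bookkeeping (nothing pacemaker implements today, names
invented): capture the wall-clock/monotonic offset once at the
synchronization event, then derive every later "agreed" wall-clock
reading from CLOCK_MONOTONIC plus that offset, so a subsequent change
of the system clock has no effect on it:

  #include <stdint.h>
  #include <time.h>

  /* Wall-clock minus monotonic, captured once (e.g. at DC election). */
  typedef struct {
      int64_t offset_ns;
  } clock_anchor_t;

  static int64_t ts_to_ns(const struct timespec *ts)
  {
      return (int64_t) ts->tv_sec * 1000000000LL + ts->tv_nsec;
  }

  static void anchor_capture(clock_anchor_t *a)
  {
      struct timespec wall, mono;

      clock_gettime(CLOCK_REALTIME, &wall);
      clock_gettime(CLOCK_MONOTONIC, &mono);
      a->offset_ns = ts_to_ns(&wall) - ts_to_ns(&mono);
  }

  /* "Agreed" wall-clock time derived from the monotonic clock only;
   * later changes to the system clock do not influence the result. */
  static int64_t anchor_now_ns(const clock_anchor_t *a)
  {
      struct timespec mono;

      clock_gettime(CLOCK_MONOTONIC, &mono);
      return ts_to_ns(&mono) + a->offset_ns;
  }

The DC-vs-local correction mentioned above would then just be one more
constant folded into offset_ns.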

>> We shall not need any explicit time synchronization across the nodes
>> since we are already backed by extended virtual synchrony from
>> corosync, even though it could introduce oddities when
>> time-based rules kick in.
> 
> Pacemaker determines the state of a resource by replaying its resource
> history in the CIB. A history entry can be replaced only by a newer
> event. Thus if there's a start event in the history, and a stop result
> comes in, we have to know which one is newer to determine whether the
> resource is started or stopped.

But for that, the boot ID might be a sufficient determinant: a result
keyed with boot ID ABC, for an action that was never triggered under
the boot ID XYZ currently deemed valid, means that something is off,
and fencing is perhaps the best choice.
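
Sketched as code (again purely illustrative, field names invented):
entries from the same boot can be ordered by a monotonic stamp, while
entries from different boots are not directly comparable and call for
escalation:

  #include <stdint.h>
  #include <string.h>

  struct hist_stamp {
      char    boot_id[40];   /* boot the entry was recorded under */
      int64_t mono_ns;       /* CLOCK_MONOTONIC at event completion */
  };

  enum hist_order { HIST_OLDER = -1, HIST_NEWER = 1, HIST_INCOMPARABLE = 0 };

  /* Which of two entries is newer?  Across different boots there is
   * no answer, which is exactly the "something is off, fence it"
   * case.  Ties within one boot count as newer. */
  static enum hist_order hist_compare(const struct hist_stamp *a,
                                      const struct hist_stamp *b)
  {
      if (strcmp(a->boot_id, b->boot_id) != 0) {
          return HIST_INCOMPARABLE;
      }
      return (a->mono_ns < b->mono_ns) ? HIST_OLDER : HIST_NEWER;
  }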

> Something along those lines is likely the cause of:
> 
> https://bugs.clusterlabs.org/show_bug.cgi?id=5246

I believe the "detached" scheme sketched above could solve this.
The problem is that the devil is in the details, so it can be
unpredictably hard to get right in a distributed environment.

-- 
Jan (Poki)




Re: [ClusterLabs] Pacemaker log showing time mismatch after

2019-02-11 Thread Ken Gaillot
On Fri, 2019-02-01 at 08:10 +0100, Jan Pokorný wrote:
> On 28/01/19 09:47 -0600, Ken Gaillot wrote:
> > On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote:
> > Pacemaker can handle the clock jumping forward, but not backward.
> 
> I am rather surprised, are we not using monotonic time only, then?
> If so, why?

The scheduler runs on a single node (the DC) but must take as input the
resource history (including timestamps) on all nodes. We need wall
clock time to compare against time-based rules. Also, if we get two
resource history entries from a node, we don't know if it rebooted in
between, so a monotonic timestamp alone wouldn't be sufficient.

However, it might be possible to store both time representations in the
history (and possibly maintain some sort of cluster knowledge about
monotonic clocks to compare them within and across nodes), and use one
or the other depending on the context. I haven't tried to determine how
feasible that would be, but it would be a major project.
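
As a rough illustration of what "both representations" could look
like per history entry (a sketch only, not the actual CIB schema;
names invented):

  #include <stdint.h>
  #include <stdio.h>
  #include <time.h>

  struct op_stamp {
      time_t  wall_secs;     /* for time-based rules and readable logs */
      int64_t mono_ns;       /* for ordering events from the same boot */
      char    boot_id[40];   /* to detect a reboot between two entries */
  };

  static void op_stamp_now(struct op_stamp *s, const char *boot_id)
  {
      struct timespec wall, mono;

      clock_gettime(CLOCK_REALTIME, &wall);
      clock_gettime(CLOCK_MONOTONIC, &mono);
      s->wall_secs = wall.tv_sec;
      s->mono_ns   = (int64_t) mono.tv_sec * 1000000000LL + mono.tv_nsec;
      snprintf(s->boot_id, sizeof(s->boot_id), "%s", boot_id);
  }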

> We shall not need any explicit time synchronization across the nodes
> since we are already backed by extended virtual synchrony from
> corosync, even though it could introduce oddities when
> time-based rules kick in.

Pacemaker determines the state of a resource by replaying its resource
history in the CIB. A history entry can be replaced only by a newer
event. Thus if there's a start event in the history, and a stop result
comes in, we have to know which one is newer to determine whether the
resource is started or stopped.

Something along those lines is likely the cause of:

https://bugs.clusterlabs.org/show_bug.cgi?id=5246
-- 
Ken Gaillot 



Re: [ClusterLabs] Pacemaker log showing time mismatch after

2019-01-31 Thread Jan Pokorný
On 28/01/19 09:47 -0600, Ken Gaillot wrote:
> On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote:
> Pacemaker can handle the clock jumping forward, but not backward.

I am rather surprised, are we not using monotonic time only, then?
If so, why?

We shall not need any explicit time synchronization across the nodes
since we are already backed by extended virtual synchrony from
corosync, even though it could introduce oddities when
time-based rules kick in.

-- 
Cheers,
Jan (Poki)




Re: [ClusterLabs] Pacemaker log showing time mismatch after

2019-01-28 Thread Dileep V Nair

Thanks, Ken, for the prompt response.

Yes, it was at system boot. I have yet to find out what caused the
reboot. There was no STONITH or any other error in the pacemaker
log.

Thanks & Regards

Dileep Nair
Squad Lead - SAP Base
Togaf Certified Enterprise Architect
IBM Services for Managed Applications
+91 98450 22258 Mobile
dilen...@in.ibm.com

IBM Services




From:   Ken Gaillot 
To: Cluster Labs - All topics related to open-source clustering
welcomed 
Date:   01/28/2019 09:18 PM
Subject:Re: [ClusterLabs] Pacemaker log showing time mismatch after
Sent by:"Users" 



On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote:
> Hi,
>
> I am seeing that there is a log entry showing that the Recheck Timer
> popped and the time in pacemaker.log went back in time. After some
> time, the time issue went away. Around the same time the resources
> also failed over (the slave became master). Does anyone know why
> this happens?
>
> Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished:
> db_cp1_monitor_2:32476:stderr [ /usr/bin/.: Permission denied. ]
> Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished:
> db_cp1_monitor_2:32476:stderr [ /usr/bin/.: Permission denied. ]
> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: crm_timer_popped:
> PEngine Recheck Timer (I_PE_CALC) just popped (90ms)

Pacemaker can handle the clock jumping forward, but not backward. The
recheck timer here is unrelated to the clock jump, it's just the first
log message to appear since it jumped.

You definitely want to find out what's changing the clock.

If this is at system boot, likely the hardware clock is wrong and some
time manager (ntp, etc.) is adjusting it. Pacemaker's systemd unit file
has "After=time-sync.target" to try to ensure that it doesn't start
until after this has happened, but unfortunately you often have to take
extra steps to make time managers use that target (e.g. enable chronyd-
wait.service if you're using chronyd), and of course if you're not
using systemd it's not any help. But the basic idea is you want to
ensure pacemaker starts after the time has been adjusted at boot.

If this isn't at boot, then your host has something weird going on.
Check the system log around the time of the jump, etc.

> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: notice:
> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE |
> input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: do_state_transition:
> Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> process_pe_message: Input has not changed since last time, not saving
> to disk
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: notice: unpack_config:
> Relying on watchdog integration for fencing
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status_fencing: Node pn4us7leccp1 is active
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status: Node pn4us7leccp1 is online
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status_fencing: Node pn4ushleccp1 is active
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status: Node pn4ushleccp1 is online
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource db_cp1:0 active
> on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource TSM_DB2 active
> on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource TSM_DB2 active
> on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource ip_cp1 active
> on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource db_cp1:1 active
> in master mode on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource TSM_DB2log
> active on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource KUD_DB2 active
> on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> stonith-sbd (stonith:external/sbd): Started pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> ip_cp1 (ocf::heartbeat:IPaddr2): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: clone_print:
> Master/Slave Set: ms_db2_cp1 [db_cp1]
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: short_print:
> Masters: [ pn4us7leccp1 ]
> Jan 22 20:17:03 [9385] p

Re: [ClusterLabs] Pacemaker log showing time mismatch after

2019-01-28 Thread Ken Gaillot
On Mon, 2019-01-28 at 18:04 +0530, Dileep V Nair wrote:
> Hi,
> 
> I am seeing that there is a log entry showing that the Recheck Timer
> popped and the time in pacemaker.log went back in time. After some
> time, the time issue went away. Around the same time the resources
> also failed over (the slave became master). Does anyone know why
> this happens?
> 
> Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished:
> db_cp1_monitor_2:32476:stderr [ /usr/bin/.: Permission denied. ]
> Jan 23 01:16:48 [9383] pn4ushleccp1 lrmd: notice: operation_finished:
> db_cp1_monitor_2:32476:stderr [ /usr/bin/.: Permission denied. ]
> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: crm_timer_popped:
> PEngine Recheck Timer (I_PE_CALC) just popped (90ms)

Pacemaker can handle the clock jumping forward, but not backward. The
recheck timer here is unrelated to the clock jump, it's just the first
log message to appear since it jumped.

You definitely want to find out what's changing the clock.

If this is at system boot, likely the hardware clock is wrong and some
time manager (ntp, etc.) is adjusting it. Pacemaker's systemd unit file
has "After=time-sync.target" to try to ensure that it doesn't start
until after this has happened, but unfortunately you often have to take
extra steps to make time managers use that target (e.g. enable chronyd-
wait.service if you're using chronyd), and of course if you're not
using systemd it's not any help. But the basic idea is you want to
ensure pacemaker starts after the time has been adjusted at boot.
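
For reference, one possible way to tighten this on a systemd-based
distribution (a sketch of a local drop-in, assuming chrony; the wait
service is named chrony-wait.service or chronyd-wait.service depending
on the packaging, and still has to be enabled separately):

  # /etc/systemd/system/pacemaker.service.d/wait-for-clock.conf
  # pacemaker.service already ships After=time-sync.target; this
  # drop-in additionally pulls the target in, so the ordering takes
  # effect once the time manager's wait service is enabled.
  [Unit]
  Wants=time-sync.target
  After=time-sync.target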

If this isn't at boot, then your host has something weird going on.
Check the system log around the time of the jump, etc.

> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: notice:
> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE |
> input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
> Jan 22 20:17:03 [9386] pn4ushleccp1 crmd: info: do_state_transition:
> Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> process_pe_message: Input has not changed since last time, not saving
> to disk
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: notice: unpack_config:
> Relying on watchdog integration for fencing
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status_fencing: Node pn4us7leccp1 is active
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status: Node pn4us7leccp1 is online
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status_fencing: Node pn4ushleccp1 is active
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_online_status: Node pn4ushleccp1 is online
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource db_cp1:0 active
> on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource TSM_DB2 active
> on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource TSM_DB2 active
> on pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource ip_cp1 active
> on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource db_cp1:1 active
> in master mode on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource TSM_DB2log
> active on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info:
> determine_op_status: Operation monitor found resource KUD_DB2 active
> on pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> stonith-sbd (stonith:external/sbd): Started pn4ushleccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> ip_cp1 (ocf::heartbeat:IPaddr2): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: clone_print:
> Master/Slave Set: ms_db2_cp1 [db_cp1]
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: short_print:
> Masters: [ pn4us7leccp1 ]
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: short_print:
> Slaves: [ pn4ushleccp1 ]
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> TSM_DB2 (systemd:dsmcad_db2): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> TSM_DB2log (systemd:dsmcad_db2log): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: native_print:
> KUD_DB2 (systemd:kuddb2_db2): Started pn4us7leccp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: rsc_merge_weights:
> ms_db2_cp1: Breaking dependency loop at ms_db2_cp1
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: master_color:
> Promoting db_cp1:0 (Master pn4us7leccp1)
> Jan 22 20:17:03 [9385] pn4ushleccp1 pengine: info: master_color:
> ms_db2_cp1: Promoted 1 instances of a possible 1 to master
> Jan 22 20:17:03 [9385] 

[ClusterLabs] Pacemaker log showing time mismatch after

2019-01-28 Thread Dileep V Nair


Hi,

I am seeing that there is a log entry showing that the Recheck Timer
popped and the time in pacemaker.log went back in time. After some time,
the time issue went away. Around the same time the resources also failed
over (the slave became master). Does anyone know why this happens?

Jan 23 01:16:48 [9383] pn4ushleccp1   lrmd:   notice:
operation_finished:   db_cp1_monitor_2:32476:stderr [ /usr/bin/.:
Permission denied. ]
Jan 23 01:16:48 [9383] pn4ushleccp1   lrmd:   notice:
operation_finished:   db_cp1_monitor_2:32476:stderr [ /usr/bin/.:
Permission denied. ]
Jan 22 20:17:03 [9386] pn4ushleccp1   crmd: info: crm_timer_popped:
PEngine Recheck Timer (I_PE_CALC) just popped (90ms)
Jan 22 20:17:03 [9386] pn4ushleccp1   crmd:   notice:
do_state_transition:  State transition S_IDLE -> S_POLICY_ENGINE |
input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
Jan 22 20:17:03 [9386] pn4ushleccp1   crmd: info:
do_state_transition:  Progressed to state S_POLICY_ENGINE after
C_TIMER_POPPED
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
process_pe_message:   Input has not changed since last time, not saving to
disk
Jan 22 20:17:03 [9385] pn4ushleccp1pengine:   notice: unpack_config:
Relying on watchdog integration for fencing
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_online_status_fencing:  Node pn4us7leccp1 is active
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_online_status:  Node pn4us7leccp1 is online
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_online_status_fencing:  Node pn4ushleccp1 is active
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_online_status:  Node pn4ushleccp1 is online
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_op_status:  Operation monitor found resource db_cp1:0 active on
pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_op_status:  Operation monitor found resource TSM_DB2 active on
pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_op_status:  Operation monitor found resource TSM_DB2 active on
pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_op_status:  Operation monitor found resource ip_cp1 active on
pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_op_status:  Operation monitor found resource db_cp1:1 active in
master mode on pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_op_status:  Operation monitor found resource TSM_DB2log active on
pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
determine_op_status:  Operation monitor found resource KUD_DB2 active on
pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: native_print:
stonith-sbd (stonith:external/sbd): Started pn4ushleccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: native_print:
ip_cp1  (ocf::heartbeat:IPaddr2):   Started pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: clone_print:
Master/Slave Set: ms_db2_cp1 [db_cp1]
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: short_print:
Masters: [ pn4us7leccp1 ]
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: short_print:
Slaves: [ pn4ushleccp1 ]
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: native_print:
TSM_DB2 (systemd:dsmcad_db2):   Started pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: native_print:
TSM_DB2log  (systemd:dsmcad_db2log):Started pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: native_print:
KUD_DB2 (systemd:kuddb2_db2):   Started pn4us7leccp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info:
rsc_merge_weights:ms_db2_cp1: Breaking dependency loop at ms_db2_cp1
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: master_color:
Promoting db_cp1:0 (Master pn4us7leccp1)
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: master_color:
ms_db2_cp1: Promoted 1 instances of a possible 1 to master
Jan 22 20:17:03 [9385] pn4ushleccp1pengine: info: LogActions:
Leave   ip_cp1  (Started pn4us7leccp1)


After the transition, the date was shifted back to normal

Jan 22 20:47:03 [9386] pn4ushleccp1   crmd: info: do_log:
Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Jan 22 20:47:03 [9386] pn4ushleccp1   crmd:   notice:
do_state_transition:  State transition S_TRANSITION_ENGINE -> S_IDLE |
input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Jan 23 01:47:22 [9383] pn4ushleccp1   lrmd:   notice:
operation_finished:   db_cp1_monitor_2:19518:stderr [ /usr/bin/.:
Permission denied. ]
Jan 23 01:47:22 [9383] pn4ushleccp1   lrmd:   notice:
operation_finished:   db_cp1_monitor_2:19518:stderr [ /usr/bin/.:
Permission denied. ]



Thanks & Regards

Dileep Nair
Squad Lead - SAP Base
Togaf Certified