[ClusterLabs] True time periods/CLOCK_MONOTONIC node vs. cluster wide (Was: Coming in Pacemaker 2.0.4: dependency on monotonic clock for systemd resources)

2020-03-11 Thread Jan Pokorný
On 11/03/20 09:04 -0500, Ken Gaillot wrote:
> On Wed, 2020-03-11 at 08:20 +0100, Ulrich Windl wrote:
>> You only have to take care not to compare CLOCK_MONOTONIC
>> timestamps between nodes or node restarts. 
> 
> Definitely :)
> 
> They are used only to calculate action queue and run durations

Both of these, though, from the isolated perspective of a single node
only.  E.g., run durations relate to the node currently responsible
for acting upon the resource in some way (the "atomic" operation is
always bound to a single host context, and when it is retried or
logically followed by another operation, it is measured anew on the
pertaining, perhaps different, node).

I feel that's a rather important detail, and just recently this
surface received some slight scratching on the conceptual level...

The current inability to synchronize CLOCK_MONOTONIC-like notions of
time amongst nodes in as lossless a way as possible (especially the
transfer from an old, possibly failed DC to a new DC, likely involving
some admitted loss of preciseness -- mind you, a cluster is never
fully synchronous, you'd need the help of specialized HW for that) is
what I believe is the main showstopper for being able to accurately
express the actual "availability score" for a given resource or
resource group --- yep, that famous number, the holy grail of anyone
taking HA seriously --- while at the same time, something the cluster
stack currently cannot readily present to users (despite having all
or most of the relevant information, just piecewise).

IOW, this sort of non-localized measurement is what calls for an
emulation of a cluster-wide CLOCK_MONOTONIC-like measurement, which is
not that trivial if you think about it.  Sort of a corollary of what
Ulrich said, because emulating that pushes you exactly into these
waters of relating CLOCK_MONOTONIC measurements from different nodes
together.

  Not to speak of evaluating whether any node is totally off in its
  own CLOCK_MONOTONIC measurements and hence shall rather be fenced
  as "brain damaged", and perhaps even using the measurements of the
  nodes keeping up together to somehow calculate the average rate of
  measured time progress so as to self-maintain time-bound
  cluster-wide integrity, which may just as well be important for
  sbd(!).  (Nope, this doesn't get anywhere close to near-light-speed
  concerns, just imprecise HW and, possibly, implied inter-VM
  differences.)

Perhaps the cheapest way out would be to use NTP-level algorithms to
synchronize two CLOCK_MONOTONIC timers between the worker node and the
DC at the point the worker node for the resource in question claims
"resource stopped", so that the DC can synchronize again like that
with a new worker node at the point in time when this new one claims
"resource started".  At that point, the DC would have rather accurate
knowledge of how long this fail-/move-over, hence down-time, lasted,
and would hence be able to reflect it in the "availability score"
equations.
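
  For illustration only, a toy single-process sketch of that NTP-style
  offset estimation (names here are made up; a real implementation
  would exchange the four timestamps over the wire between the two
  nodes):

    /* Classic NTP offset/delay estimation between two CLOCK_MONOTONIC
     * timelines; a hypothetical sketch, not actual cluster code. */
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static int64_t mono_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    /* t1: request sent (client clock), t2: request received (server),
     * t3: reply sent (server),         t4: reply received (client). */
    static void estimate(int64_t t1, int64_t t2, int64_t t3, int64_t t4)
    {
        int64_t offset = ((t2 - t1) + (t3 - t4)) / 2; /* server - client */
        int64_t delay  = (t4 - t1) - (t3 - t2);       /* round-trip time */
        printf("offset %lld ns, delay %lld ns\n",
               (long long) offset, (long long) delay);
    }

    int main(void)
    {
        /* Simulate a "server" clock running 5 ms ahead of ours. */
        const int64_t skew = 5000000LL;
        int64_t t1 = mono_ns();
        int64_t t2 = mono_ns() + skew;
        int64_t t3 = mono_ns() + skew;
        int64_t t4 = mono_ns();
        estimate(t1, t2, t3, t4);
        return 0;
    }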

  Hmm, no wonder that businesses with deep pockets and serious
  synchronicity requirements across the globe resort to using atomic
  clocks -- an incredibly precise CLOCK_MONOTONIC by default :-)

> For most resource types those are optional (for reporting only), but
> systemd resources require them (multiple status checks are usually
> necessary to verify a start or stop worked, and we need to check the
> remaining timeout each time).

Coincidentally, IIRC systemd alone strictly requires CLOCK_MONOTONIC
(and we shall get a lot stricter as well, to provide reasonable
expectations to the users, as mentioned recently[*]), so said
requirement is just a logical extension without corner cases.

[*] https://lists.clusterlabs.org/pipermail/users/2019-November/026647.html
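
  (For illustration, a minimal sketch of that remaining-timeout
  bookkeeping -- not Pacemaker's actual code, just the general pattern
  with clock_gettime()/CLOCK_MONOTONIC; unit_is_active() is a
  hypothetical stand-in for a real systemd unit-state query:)

    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static double elapsed_s(const struct timespec *start)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - start->tv_sec)
               + (now.tv_nsec - start->tv_nsec) / 1e9;
    }

    /* Hypothetical: a real check would query the unit over D-Bus. */
    static bool unit_is_active(void) { return false; }

    int main(void)
    {
        const double timeout_s = 5.0;
        struct timespec start;

        clock_gettime(CLOCK_MONOTONIC, &start);
        /* Repeated status checks, each charged against the one
         * overall timeout -- wall-clock jumps cannot distort this. */
        while (!unit_is_active()) {
            double remaining = timeout_s - elapsed_s(&start);
            if (remaining <= 0) {
                fprintf(stderr, "start not verified within timeout\n");
                return 1;
            }
            usleep(200000); /* poll again with the remaining budget */
        }
        puts("start verified");
        return 0;
    }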

-- 
Jan (Poki)




[ClusterLabs] Resource Parameter Change Not Honoring Constraints

2020-03-11 Thread Marc Smith
Hi,

I'm using Pacemaker 1.1.20 (yes, I know, a bit dated now). I noticed
that when I modify a resource parameter (e.g., update its value), this
causes the resource itself to restart. And that's fine, but when this
resource is restarted, it doesn't appear to honor the full set of
constraints for that resource.

I see the output like this (right after the resource parameter change):
...
Mar 11 20:43:25 localhost crmd[1943]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
Mar 11 20:43:25 localhost crmd[1943]:   notice: Current ping state: S_POLICY_ENGINE
Mar 11 20:43:25 localhost pengine[1942]:   notice: Clearing failure of p_bmd_140c58-1 on 140c58-1 because resource parameters have changed
Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_bmd_140c58-1 ( 140c58-1 ) due to resource definition change
Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_dummy_g_lvm_140c58-1 ( 140c58-1 ) due to required g_md_140c58-1 running
Mar 11 20:43:25 localhost pengine[1942]:   notice:  * Restart p_lvm_140c58_vg_01 ( 140c58-1 ) due to required p_dummy_g_lvm_140c58-1 start
Mar 11 20:43:25 localhost pengine[1942]:   notice: Calculated transition 41, saving inputs in /var/lib/pacemaker/pengine/pe-input-173.bz2
Mar 11 20:43:25 localhost crmd[1943]:   notice: Initiating stop operation p_lvm_140c58_vg_01_stop_0 on 140c58-1
Mar 11 20:43:25 localhost crmd[1943]:   notice: Transition aborted by deletion of lrm_rsc_op[@id='p_bmd_140c58-1_last_failure_0']: Resource operation removal
Mar 11 20:43:25 localhost crmd[1943]:   notice: Current ping state: S_TRANSITION_ENGINE
...

The stop on 'p_lvm_140c58_vg_01' then times out, because the other
constraint (to stop the service above LVM) is never executed. I can
see from the messages that it never even tries to demote the resource
above that.

Yet, if I use crmsh at the shell and do a restart on that same
resource, it works correctly, and all constraints are honored:
crm resource restart p_bmd_140c58-1

I can certainly provide my full cluster config if needed, but I'm
hoping to keep this email concise for clarity. =)

I guess my questions are: 1) Is this difference in restart behavior
expected -- i.e., are not all constraints followed when resource
parameters change (or on some other restart event that originated
internally like this)? 2) Or perhaps this is a known bug that was
already resolved in newer versions of Pacemaker?

I searched a bit for #2, but I didn't get many (well, any) hits on
other users experiencing this behavior.

Many thanks in advance.

--Marc


Re: [ClusterLabs] I want to have some resource monitored and based on that make an acton. Is it possible?

2020-03-11 Thread Roman Hershkovich
But won't the colocation of dbprobe pull the trigger on the webserver?
Or, because it is below in the order, will it just restart the
services?

On Wed, Mar 11, 2020, 18:41 Ken Gaillot  wrote:

> On Wed, 2020-03-11 at 16:08 +0200, Roman Hershkovich wrote:
> > Great, thank you very much for explanation. Regarding returning error
> > - i did not knew.
> > So, basically i can have a service, that will probe for master DB, in
> > case of its transfer - service will update /etc/hosts and return
> > error, which will be caught by pcs and it will restart whole
> > dependent set ? Sounds good.
> > But how i can do 2 "main resources" ? I have webserver AND
> > db_monitor. In case of failure of webserver - should all start on
> > node b, but in case of DB change - only underlying resources ...
> > Should i make webserver outside of set?
>
> If you want the webserver to move to another node after a single
> failure (of the webserver itself), set its migration-threshold to 1. If
> you want other resources to move with it, colocate them with the
> webserver.
>
> The db monitor won't affect that -- if the db monitor fails, anything
> ordered after it will restart.
>
> > On Wed, Mar 11, 2020 at 3:57 PM Ken Gaillot 
> > wrote:
> > > On Wed, 2020-03-11 at 02:27 +0200, Roman Hershkovich wrote:
> > > > Yes.
> > > > I have only 1 APP active at same time, and so I want this app to
> > > be
> > > > restarted whenever DB changes. Another one is a "standby" APP,
> > > where
> > > > all resources are shut.
> > > > So i thought about adding some "service" script, which will probe
> > > a
> > > > DB , and in case if it finds a CHANGE - will trigger pcs to
> > > reload a
> > > > set of resources, where one of resource would be a systemctl
> > > file,
> > > > which will continue to run a script, so in case of next change of
> > > DB
> > > > - it will restart APP set again. Is it sounds reasonable? (i
> > > don't
> > > > care of errors. I mean - i do, i want to log, but i'm ok to see
> > > them)
> > >
> > > That sounds fine, but I'd trigger the restart by returning an error
> > > code from the db-monitoring script, rather than directly attempt to
> > > restart the resources via pcs. If you order the other resources
> > > after
> > > the db-monitoring script, pacemaker will automatically restart them
> > > when the db-monitoring script returns an error.
> > >
> > > > In addition - i thought maybe bringing PAF here could be useful -
> > > but
> > > > this is even more complex ...
> > >
> > > If bringing the db into the cluster is a possibility, that would
> > > probably be more reliable, with a quicker response too.
> > >
> > > In that case you would simply order the dependent resources after
> > > the
> > > database master promotion. pcs example: pcs constraint order
> > > promote
> > > DB-RSC then start DEPENDENT-RSC
> > >
> > > > On Tue, Mar 10, 2020 at 10:28 PM Ken Gaillot  > > >
> > > > wrote:
> > > > > On Tue, 2020-03-10 at 21:03 +0200, Roman Hershkovich wrote:
> > > > > > DB servers are not in PCS cluster. Basically you say that i
> > > need
> > > > > to
> > > > > > add them to PCS cluster and then start them? but in case if
> > > DB1
> > > > > fails
> > > > > > - DB2 autopromoted and not required start of service again>
> > > > > >
> > > > > > Regarding colocation rule - i'm kind of missing logic how it
> > > > > works -
> > > > > > how i can "colocate" 1 of 2 APP servers to be around a master
> > > DB
> > > > > ?
> > > > >
> > > > > If I understand correctly, what you want is that both apps are
> > > > > restarted if the master changes?
> > > > >
> > > > > I'm thinking you'll need a custom OCF agent for the app
> > > servers.
> > > > > The
> > > > > monitor action, in addition to checking the app's status, could
> > > > > also
> > > > > check which db is master, and return an error if it's changed
> > > since
> > > > > the
> > > > > last monitor. (The start action would have to record the
> > > initial
> > > > > master.) Pacemaker will restart the app to recover from the
> > > error.
> > > > >
> > > > > That is a little hacky because you'll have errors in the status
> > > > > every
> > > > > time the master moves, but maybe that's worth knowing in your
> > > > > situation
> > > > > anyway.
> > > > >
> > > > > > On Tue, Mar 10, 2020 at 8:42 PM Strahil Nikolov <
> > > > > > hunter86...@yahoo.com> wrote:
> > > > > > > On March 10, 2020 7:31:27 PM GMT+02:00, Roman Hershkovich <
> > > > > > > war...@gmail.com> wrote:
> > > > > > > >I have 2 DB servers (master/slave with replica) and 2 APP
> > > > > servers.
> > > > > > > >2 APP servers managed by pacemaker  (active/passive) , but
> > > i
> > > > > want
> > > > > > > also
> > > > > > > >to
> > > > > > > >monitor "which DB is master".  I can't use VIP (which
> > > could be
> > > > > > > sticked
> > > > > > > >on
> > > > > > > >master DB) - it is very limited virtual environment.
> > > > > > > >
> > > > > > > >Is it possible to create a rule or some other scenario, so
> > > in
> > > > > case
> > > 

Re: [ClusterLabs] I want to have some resource monitored and based on that make an acton. Is it possible?

2020-03-11 Thread Ken Gaillot
On Wed, 2020-03-11 at 16:08 +0200, Roman Hershkovich wrote:
> Great, thank you very much for explanation. Regarding returning error
> - i did not knew.
> So, basically i can have a service, that will probe for master DB, in
> case of its transfer - service will update /etc/hosts and return
> error, which will be caught by pcs and it will restart whole
> dependent set ? Sounds good.
> But how i can do 2 "main resources" ? I have webserver AND
> db_monitor. In case of failure of webserver - should all start on
> node b, but in case of DB change - only underlying resources ...
> Should i make webserver outside of set? 

If you want the webserver to move to another node after a single
failure (of the webserver itself), set its migration-threshold to 1. If
you want other resources to move with it, colocate them with the
webserver.

The db monitor won't affect that -- if the db monitor fails, anything
ordered after it will restart.
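
pcs examples (resource names here are placeholders): pcs resource meta
WEBSERVER-RSC migration-threshold=1, and pcs constraint colocation add
OTHER-RSC with WEBSERVER-RSC INFINITY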

> On Wed, Mar 11, 2020 at 3:57 PM Ken Gaillot 
> wrote:
> > On Wed, 2020-03-11 at 02:27 +0200, Roman Hershkovich wrote:
> > > Yes.
> > > I have only 1 APP active at same time, and so I want this app to
> > be
> > > restarted whenever DB changes. Another one is a "standby" APP,
> > where
> > > all resources are shut.
> > > So i thought about adding some "service" script, which will probe
> > a
> > > DB , and in case if it finds a CHANGE - will trigger pcs to
> > reload a
> > > set of resources, where one of resource would be a systemctl
> > file,
> > > which will continue to run a script, so in case of next change of
> > DB
> > > - it will restart APP set again. Is it sounds reasonable? (i
> > don't
> > > care of errors. I mean - i do, i want to log, but i'm ok to see
> > them)
> > 
> > That sounds fine, but I'd trigger the restart by returning an error
> > code from the db-monitoring script, rather than directly attempt to
> > restart the resources via pcs. If you order the other resources
> > after
> > the db-monitoring script, pacemaker will automatically restart them
> > when the db-monitoring script returns an error.
> > 
> > > In addition - i thought maybe bringing PAF here could be useful -
> > but
> > > this is even more complex ... 
> > 
> > If bringing the db into the cluster is a possibility, that would
> > probably be more reliable, with a quicker response too.
> > 
> > In that case you would simply order the dependent resources after
> > the
> > database master promotion. pcs example: pcs constraint order
> > promote
> > DB-RSC then start DEPENDENT-RSC
> > 
> > > On Tue, Mar 10, 2020 at 10:28 PM Ken Gaillot  > >
> > > wrote:
> > > > On Tue, 2020-03-10 at 21:03 +0200, Roman Hershkovich wrote:
> > > > > DB servers are not in PCS cluster. Basically you say that i
> > need
> > > > to
> > > > > add them to PCS cluster and then start them? but in case if
> > DB1
> > > > fails
> > > > > - DB2 autopromoted and not required start of service again>
> > > > > 
> > > > > Regarding colocation rule - i'm kind of missing logic how it
> > > > works -
> > > > > how i can "colocate" 1 of 2 APP servers to be around a master
> > DB
> > > > ? 
> > > > 
> > > > If I understand correctly, what you want is that both apps are
> > > > restarted if the master changes?
> > > > 
> > > > I'm thinking you'll need a custom OCF agent for the app
> > servers.
> > > > The
> > > > monitor action, in addition to checking the app's status, could
> > > > also
> > > > check which db is master, and return an error if it's changed
> > since
> > > > the
> > > > last monitor. (The start action would have to record the
> > initial
> > > > master.) Pacemaker will restart the app to recover from the
> > error.
> > > > 
> > > > That is a little hacky because you'll have errors in the status
> > > > every
> > > > time the master moves, but maybe that's worth knowing in your
> > > > situation
> > > > anyway.
> > > > 
> > > > > On Tue, Mar 10, 2020 at 8:42 PM Strahil Nikolov <
> > > > > hunter86...@yahoo.com> wrote:
> > > > > > On March 10, 2020 7:31:27 PM GMT+02:00, Roman Hershkovich <
> > > > > > war...@gmail.com> wrote:
> > > > > > >I have 2 DB servers (master/slave with replica) and 2 APP
> > > > servers.
> > > > > > >2 APP servers managed by pacemaker  (active/passive) , but
> > i
> > > > want
> > > > > > also
> > > > > > >to
> > > > > > >monitor "which DB is master".  I can't use VIP (which
> > could be
> > > > > > sticked
> > > > > > >on
> > > > > > >master DB) - it is very limited virtual environment.
> > > > > > >
> > > > > > >Is it possible to create a rule or some other scenario, so
> > in
> > > > case
> > > > > > if
> > > > > > >master moved - pacemaker will restart APP (app does not
> > > > support
> > > > > > >failover) ?
> > > > > > 
> > > > > > Hi Roman,
> > > > > > 
> > > > > > If you set an order rule that  starts  first the master 
> > and
> > > > then
> > > > > > the app, during a failover  the app will be stoped  and
> > once
> > > > the
> > > > > > master  is switched  (slave is promoted) the  app 

Re: [ClusterLabs] iSCSILogicalUnit - scsi_id and multiple clusters

2020-03-11 Thread Oyvind Albrigtsen

On 06/03/20 13:22 +0200, Strahil Nikolov wrote:

On March 6, 2020 11:06:13 AM GMT+02:00, Oyvind Albrigtsen  wrote:

Hi Strahil,

It seems like it tries to set one based on the resource name, and from
a quick check it seems like it also did on RHEL 7.5.

https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/iSCSILogicalUnit.in#L57


Oyvind

On 05/03/20 23:15 +0200, Strahil Nikolov wrote:

Hey Community,

I finally got some time to report an issue with iSCSILogicalUnit and
scsi_id ( https://github.com/ClusterLabs/resource-agents/issues/1463 ).


The  issue was observed a while ago on RHEL 7.5 and SLES 12 SP4.

Do you know if any change was made to:
A) Either make 'scsi_id' a mandatory option
B) When 'scsi_id' is not provided by the admin, a random one is
   picked (permanently)


If not, then the github issue is relevant.

Sadly, my exam (EX436) is on Monday and I can't test it before that.

Best Regards,
Strahil Nikolov


Thanks for the reply, Oyvind

In an environment with a naming convention that allows globally
non-unique resource names, this could cause the behaviour I described
on GitHub (2 clusters, 2 separate LUNs, the client's multipath
aggregates them into 1 LUN with 4 paths).

Do you think a better approach would be to mark 'scsi_id' as a
mandatory option (of course, we can bypass that with 'pcs --force'),
or should we randomly set one on resource creation if the value is
not defined?  Of course, the algorithm could get a little
modification -- a random seed, for example.

The second & third options will be harder to implement, as we have
both pcs & crmsh actively used across distributions, but they would
add some 'dummy-proofness'.  For me, the first option is the easiest
to implement.

Adding info about it in the metadata might be the best way to go.

Making it mandatory might annoy users who use Ansible or other setup
solutions, as they would have to change their playbooks to set it.
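
  (For illustration only -- the parameter names below are from the
  agent's metadata, everything else is a placeholder: explicitly
  setting the ID at creation time sidesteps the collision, e.g.

    pcs resource create LUN1 ocf:heartbeat:iSCSILogicalUnit \
        target_iqn=<target iqn> lun=1 path=<backing device> \
        scsi_id=<globally unique id>)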



Best Regards,
Strahil Nikolov





Re: [ClusterLabs] I want to have some resource monitored and based on that make an acton. Is it possible?

2020-03-11 Thread Roman Hershkovich
Great, thank you very much for the explanation. Regarding returning an
error - I did not know that.
So, basically, I can have a service that will probe for the master DB;
in case of its transfer, the service will update /etc/hosts and return
an error, which will be caught by pcs, and it will restart the whole
dependent set? Sounds good.
But how can I have 2 "main resources"? I have a webserver AND
db_monitor. In case of failure of the webserver, everything should
start on node b, but in case of a DB change, only the underlying
resources ...
Should I make the webserver outside of the set?

On Wed, Mar 11, 2020 at 3:57 PM Ken Gaillot  wrote:

> On Wed, 2020-03-11 at 02:27 +0200, Roman Hershkovich wrote:
> > Yes.
> > I have only 1 APP active at same time, and so I want this app to be
> > restarted whenever DB changes. Another one is a "standby" APP, where
> > all resources are shut.
> > So i thought about adding some "service" script, which will probe a
> > DB , and in case if it finds a CHANGE - will trigger pcs to reload a
> > set of resources, where one of resource would be a systemctl file,
> > which will continue to run a script, so in case of next change of DB
> > - it will restart APP set again. Is it sounds reasonable? (i don't
> > care of errors. I mean - i do, i want to log, but i'm ok to see them)
>
> That sounds fine, but I'd trigger the restart by returning an error
> code from the db-monitoring script, rather than directly attempt to
> restart the resources via pcs. If you order the other resources after
> the db-monitoring script, pacemaker will automatically restart them
> when the db-monitoring script returns an error.
>
> > In addition - i thought maybe bringing PAF here could be useful - but
> > this is even more complex ...
>
> If bringing the db into the cluster is a possibility, that would
> probably be more reliable, with a quicker response too.
>
> In that case you would simply order the dependent resources after the
> database master promotion. pcs example: pcs constraint order promote
> DB-RSC then start DEPENDENT-RSC
>
> > On Tue, Mar 10, 2020 at 10:28 PM Ken Gaillot 
> > wrote:
> > > On Tue, 2020-03-10 at 21:03 +0200, Roman Hershkovich wrote:
> > > > DB servers are not in PCS cluster. Basically you say that i need
> > > to
> > > > add them to PCS cluster and then start them? but in case if DB1
> > > fails
> > > > - DB2 autopromoted and not required start of service again>
> > > >
> > > > Regarding colocation rule - i'm kind of missing logic how it
> > > works -
> > > > how i can "colocate" 1 of 2 APP servers to be around a master DB
> > > ?
> > >
> > > If I understand correctly, what you want is that both apps are
> > > restarted if the master changes?
> > >
> > > I'm thinking you'll need a custom OCF agent for the app servers.
> > > The
> > > monitor action, in addition to checking the app's status, could
> > > also
> > > check which db is master, and return an error if it's changed since
> > > the
> > > last monitor. (The start action would have to record the initial
> > > master.) Pacemaker will restart the app to recover from the error.
> > >
> > > That is a little hacky because you'll have errors in the status
> > > every
> > > time the master moves, but maybe that's worth knowing in your
> > > situation
> > > anyway.
> > >
> > > > On Tue, Mar 10, 2020 at 8:42 PM Strahil Nikolov <
> > > > hunter86...@yahoo.com> wrote:
> > > > > On March 10, 2020 7:31:27 PM GMT+02:00, Roman Hershkovich <
> > > > > war...@gmail.com> wrote:
> > > > > >I have 2 DB servers (master/slave with replica) and 2 APP
> > > servers.
> > > > > >2 APP servers managed by pacemaker  (active/passive) , but i
> > > want
> > > > > also
> > > > > >to
> > > > > >monitor "which DB is master".  I can't use VIP (which could be
> > > > > sticked
> > > > > >on
> > > > > >master DB) - it is very limited virtual environment.
> > > > > >
> > > > > >Is it possible to create a rule or some other scenario, so in
> > > case
> > > > > if
> > > > > >master moved - pacemaker will restart APP (app does not
> > > support
> > > > > >failover) ?
> > > > >
> > > > > Hi Roman,
> > > > >
> > > > > If you set an order rule that  starts  first the master  and
> > > then
> > > > > the app, during a failover  the app will be stoped  and once
> > > the
> > > > > master  is switched  (slave is promoted) the  app will be
> > > started
> > > > > again.
> > > > >
> > > > > Also you can consider  a  colocation rule that all  apps are
> > > > > started  where  the master  DB is running  -  so the lattency
> > > will
> > > > > be minimal.
> > > > >
> > > > > Best Regards,
> > > > > Strahil Nikolov
> --
> Ken Gaillot 
>


Re: [ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.4: dependency on monotonic clock for systemd resources

2020-03-11 Thread Ken Gaillot
On Wed, 2020-03-11 at 08:20 +0100, Ulrich Windl wrote:
> > > > Ken Gaillot  schrieb am 10.03.2020 um 18:49 in
> 
Nachricht
<3098_1583862581_5E67D335_3098_1270_1_91b728456223eea7c8a00516a91ede18ab094530.cm...@redhat.com>:
> > Hi all,
> > 
> > This is not a big deal but I wanted to give a heads-up for anyone
> > who
> > builds their own pacemaker packages.
> > 
> > With Pacemaker 2.0.4 (first release candidate expected next month),
> > we
> > are finally replacing our calls to the long-deprecated ftime()
> > system
> > call with the "modern" clock_gettime().
> > 
> > As part of this, building pacemaker with support for systemd-class
> > resources will now require that the underlying platform supports
> > clock_gettime() with CLOCK_MONOTONIC. Every platform we're aware of
> > that is used for pacemaker does, so this should not be an issue.
> > The
> > configure script will automatically determine whether support is
> > available.
> 
> You only have to take care not to compare CLOCK_MONOTONIC timestamps
> between nodes or node restarts.

Definitely :)

They are used only to calculate action queue and run durations. For
most resource types those are optional (for reporting only), but
systemd resources require them (multiple status checks are usually
necessary to verify a start or stop worked, and we need to check the
remaining timeout each time).
-- 
Ken Gaillot 



Re: [ClusterLabs] I want to have some resource monitored and based on that make an acton. Is it possible?

2020-03-11 Thread Ken Gaillot
On Wed, 2020-03-11 at 02:27 +0200, Roman Hershkovich wrote:
> Yes.
> I have only 1 APP active at same time, and so I want this app to be
> restarted whenever DB changes. Another one is a "standby" APP, where
> all resources are shut.
> So i thought about adding some "service" script, which will probe a
> DB , and in case if it finds a CHANGE - will trigger pcs to reload a
> set of resources, where one of resource would be a systemctl file,
> which will continue to run a script, so in case of next change of DB
> - it will restart APP set again. Is it sounds reasonable? (i don't
> care of errors. I mean - i do, i want to log, but i'm ok to see them)

That sounds fine, but I'd trigger the restart by returning an error
code from the db-monitoring script, rather than directly attempt to
restart the resources via pcs. If you order the other resources after
the db-monitoring script, pacemaker will automatically restart them
when the db-monitoring script returns an error.

> In addition - i thought maybe bringing PAF here could be useful - but
> this is even more complex ... 

If bringing the db into the cluster is a possibility, that would
probably be more reliable, with a quicker response too.

In that case you would simply order the dependent resources after the
database master promotion. pcs example: pcs constraint order promote
DB-RSC then start DEPENDENT-RSC
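
(To make the error-return idea concrete -- this is just the general
OCF convention, not specific code: the db-monitoring agent's monitor
action would record the current master somewhere like $HA_RSCTMP,
compare on each run, and exit with OCF_ERR_GENERIC (1) instead of
OCF_SUCCESS (0) once the master has changed; that failing monitor is
what makes Pacemaker restart everything ordered after it.)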

> On Tue, Mar 10, 2020 at 10:28 PM Ken Gaillot 
> wrote:
> > On Tue, 2020-03-10 at 21:03 +0200, Roman Hershkovich wrote:
> > > DB servers are not in PCS cluster. Basically you say that i need
> > to
> > > add them to PCS cluster and then start them? but in case if DB1
> > fails
> > > - DB2 autopromoted and not required start of service again>
> > > 
> > > Regarding colocation rule - i'm kind of missing logic how it
> > works -
> > > how i can "colocate" 1 of 2 APP servers to be around a master DB
> > ? 
> > 
> > If I understand correctly, what you want is that both apps are
> > restarted if the master changes?
> > 
> > I'm thinking you'll need a custom OCF agent for the app servers.
> > The
> > monitor action, in addition to checking the app's status, could
> > also
> > check which db is master, and return an error if it's changed since
> > the
> > last monitor. (The start action would have to record the initial
> > master.) Pacemaker will restart the app to recover from the error.
> > 
> > That is a little hacky because you'll have errors in the status
> > every
> > time the master moves, but maybe that's worth knowing in your
> > situation
> > anyway.
> > 
> > > On Tue, Mar 10, 2020 at 8:42 PM Strahil Nikolov <
> > > hunter86...@yahoo.com> wrote:
> > > > On March 10, 2020 7:31:27 PM GMT+02:00, Roman Hershkovich <
> > > > war...@gmail.com> wrote:
> > > > >I have 2 DB servers (master/slave with replica) and 2 APP
> > servers.
> > > > >2 APP servers managed by pacemaker  (active/passive) , but i
> > want
> > > > also
> > > > >to
> > > > >monitor "which DB is master".  I can't use VIP (which could be
> > > > sticked
> > > > >on
> > > > >master DB) - it is very limited virtual environment.
> > > > >
> > > > >Is it possible to create a rule or some other scenario, so in
> > case
> > > > if
> > > > >master moved - pacemaker will restart APP (app does not
> > support
> > > > >failover) ?
> > > > 
> > > > Hi Roman,
> > > > 
> > > > If you set an order rule that  starts  first the master  and
> > then
> > > > the app, during a failover  the app will be stoped  and once
> > the
> > > > master  is switched  (slave is promoted) the  app will be
> > started
> > > > again.
> > > > 
> > > > Also you can consider  a  colocation rule that all  apps are 
> > > > started  where  the master  DB is running  -  so the lattency
> > will
> > > > be minimal.
> > > > 
> > > > Best Regards,
> > > > Strahil Nikolov
-- 
Ken Gaillot 



[ClusterLabs] Antw: [EXT] Coming in Pacemaker 2.0.4: dependency on monotonic clock for systemd resources

2020-03-11 Thread Ulrich Windl
>>> Ken Gaillot  schrieb am 10.03.2020 um 18:49 in
Nachricht
<3098_1583862581_5E67D335_3098_1270_1_91b728456223eea7c8a00516a91ede18ab094530.cm...@redhat.com>:
> Hi all,
> 
> This is not a big deal but I wanted to give a heads-up for anyone who
> builds their own pacemaker packages.
> 
> With Pacemaker 2.0.4 (first release candidate expected next month), we
> are finally replacing our calls to the long-deprecated ftime() system
> call with the "modern" clock_gettime().
> 
> As part of this, building pacemaker with support for systemd-class
> resources will now require that the underlying platform supports
> clock_gettime() with CLOCK_MONOTONIC. Every platform we're aware of
> that is used for pacemaker does, so this should not be an issue. The
> configure script will automatically determine whether support is
> available.

You only have to take care not to compare CLOCK_MONOTONIC timestamps between
nodes or node restarts. 
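
(A minimal sketch of the replacement in question -- not Pacemaker's
actual code, just the pattern; ftime() returned wall-clock-derived
milliseconds, while CLOCK_MONOTONIC is immune to wall-clock jumps and,
per the caveat above, only meaningful within one boot of one node:)

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* Old, deprecated style:
     *   struct timeb tb; ftime(&tb);
     *   ms = tb.time * 1000 + tb.millitm;   // jumps with wall clock
     */

    static long long mono_ms(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
    }

    int main(void)
    {
        long long start = mono_ms();
        sleep(1);                  /* stand-in for the measured action */
        printf("took %lld ms\n", mono_ms() - start);
        return 0;
    }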

> --
> Ken Gaillot 
> 


