[ClusterLabs] Coming in Pacemaker 2.1.3: CIB colorization for ACLs

2022-03-09 Thread Ken Gaillot
Hi all,

I'm hoping to have the first release candidate for Pacemaker 2.1.3
available towards the end of next month.

One of the new features will be a cibadmin option to show the CIB
colorized according to a particular user's ACLs. For example:

  cibadmin --query --user=tony --show-access=color | less -r

That will show the CIB XML with the parts that the "tony" user account
can't see in red, the parts that tony can only read in blue, and the
parts that tony can read and write in green.

The command must be run as root or a user with full CIB access, to be
able to show the parts of the CIB that the specified user can't see.
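
The coloring reflects whatever ACLs are defined in the CIB. As a purely
illustrative sketch (the role ID, permission IDs, and xpaths here are
made up), an ACL for tony might look like:

  <acls>
    <acl_role id="read-most">
      <acl_permission id="read-most-read" kind="read" xpath="/cib"/>
      <acl_permission id="read-most-deny-pw" kind="deny"
          xpath="//nvpair[@name='password']"/>
      <acl_permission id="read-most-write-attrs" kind="write"
          xpath="//instance_attributes"/>
    </acl_role>
    <acl_target id="tony">
      <role id="read-most"/>
    </acl_target>
  </acls>

With something like that, the password nvpairs would show up red, the
instance attributes green, and everything else blue.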

This feature was initially developed by Jan Pokorný and completed by
Grace Chin.
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] PAF with postgresql 13?

2022-03-09 Thread Tomas Jelinek

On 08. 03. 22 at 23:08, Ken Gaillot wrote:

On Tue, 2022-03-08 at 17:20 +0100, Jehan-Guillaume de Rorthais wrote:

Hi,

Sorry, your mail was really hard to read on my side, but I think I
understood, and I will try to answer below.

On Tue, 8 Mar 2022 11:45:30 +
lejeczek via Users  wrote:


On 08/03/2022 10:21, Jehan-Guillaume de Rorthais wrote:

>> op start timeout=60s \ op stop timeout=60s \ op promote timeout=30s \
>> op demote timeout=120s \ op monitor interval=15s timeout=10s
>> role="Master" meta master-max=1 \ op monitor interval=16s timeout=10s
>> role="Slave" \ op notify timeout=60s meta notify=true
>
> Because "op" appears, we are back in resource ("pgsqld") context,
> anything after is interpreted as resource and operation attributes,
> even the "meta notify=true". That's why your pgsqld-clone doesn't
> have the meta attribute "notify=true" set.
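
For illustration, a shape that keeps the clone meta attributes in the
clone context could look something like this (a sketch only - the
agent's instance parameters are omitted, and the trailing "promotable"
keyword assumes pcs 0.10+):

  pcs resource create pgsqld ocf:heartbeat:pgsqlms \
      op start timeout=60s op stop timeout=60s \
      op promote timeout=30s op demote timeout=120s \
      op monitor interval=15s timeout=10s role="Master" \
      op monitor interval=16s timeout=10s role="Slave" \
      op notify timeout=60s \
      promotable notify=true master-max=1

Here, everything after "promotable" is applied to the clone rather than
to the pgsqld primitive.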
Here is a one-liner that should do it - adding, as per the 'debug-'
suggestion, 'master-max=1'


What debug- suggestion??

...

then do:

-> $ pcs resource delete pgsqld

'-clone' should get removed too, so now there are no 'pgsqld'
resources, but the cluster - weirdly, in my mind - leaves the node
attributes on.


indeed.


I see 'master-pgsqld' on each node and do not see why
'node attributes' should be kept (or at least shown) for
non-existent resources (to which alone those attributes are
intrinsic).
So, if you want to "clean" that up - perhaps because for now you are
not going to have/use 'pgsqlms' - you can do that with:

-> $ pcs node attribute node1 master-pgsqld="" # same for
remaining nodes


indeed.


now..! repeat your one-liner, which worked just a moment
ago, and you should get the exact same or similar errors (while
all nodes are stuck on 'slave').


You have no promotion because your PostgreSQL instances have been
stopped in standby mode. The cluster has no way and no score to promote
one of them.


-> $ pcs resource debug-promote pgsqld
crm_resource: Error performing operation: Error occurred
Operation force-promote for pgsqld (ocf:heartbeat:pgsqlms)
returned 1 (error: Can not get current node LSN location)
/tmp:5432 - accepting connections


NEVER use "debug-promote" or other "debug-*" command with pgsqlms, or
any other
cloned ressources. AFAIK, these commands works fine for "stateless"
ressource,
but do not (could not) create the required environnement for the
clone and multi-state ones.

So I repeat, NEVER use "debug-promote".

What you want to do is set the promotion score on the node where you
want the promotion to happen, e.g.:

   pcs node attribute srv1 master-pgsqld=1001

You can use "crm_attribute" or "crm_master" as well.
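
For example, with crm_attribute it might look like this (the node name
and score are examples; the promotion score is a transient attribute,
hence the reboot lifetime):

  crm_attribute --node srv1 --name master-pgsqld --lifetime reboot --update 1001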


ocf-exit-reason:Can not get current node LSN location


This one is probably because of "debug-promote".


You have to 'cib-push' to "fix" this very problem.
In my (an admin's) opinion, this is a 100% candidate for a bug -
whether in PCS or PAF - perhaps the authors may wish to comment?


Removing the node attributes with the resource might be legit from
the
Pacemaker point of view, but I'm not sure how they can track the
dependency
(ping Ken?).


Higher-level tools like pcs or crm shell could probably do it when
removing the resource (i.e. if the resource was a promotable clone,
check for and remove any node attributes of the form master-$RSC_ID).
That sounds like a good idea to me.
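
A rough sketch of what such a cleanup could look like, with
hypothetical node names (a real tool would discover the nodes and the
resource ID itself):

  for node in node1 node2 node3; do
      crm_attribute --node "$node" --name master-pgsqld \
          --lifetime reboot --delete
  done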


I put this on the pcs todo list.

Regards,
Tomas



Pacemaker would be a bad place to do it because Pacemaker only sees the
newly modified CIB with the resource configuration gone -- it can't
know for sure whether it was a promotable clone, and it can only know
it existed at all if there are leftover status entries (causing the
resource to be listed as "orphaned"), which isn't guaranteed.



PAF has no way to know the resource is being deleted and cannot remove
its node attribute beforehand.

Maybe pcs can look for promotion scores and remove them during the
"resource delete" command (ping Tomas)?

Regards,





[ClusterLabs] Antw: Re: Antw: [EXT] Cluster timeout

2022-03-09 Thread Ulrich Windl
>>> FLORAC Thierry wrote on 09.03.2022 at 16:56 in message:

> >>> FLORAC Thierry wrote on 09.03.2022 at 11:46 in message:
> 
>> Hi,
>>
>> I manage an active/passive PostgreSQL cluster using DRBD, LVM, Pacemaker
>> and Corosync on a Debian GNU/Linux operating system.
>> Everything is OK, but my platform seems to be quite "sensitive" to small
>> network timeouts which are generating a cluster migration start from
>> active to passive node; generally, the process doesn't go through to the
>> end: as soon as the connection is back again, the migration is cancelled
>> and the database restarts!
> 
> Could it be you run without fencing? Maybe show some logs!
> 
> Logs are quite verbose and not very easy to understand...
> What log would you need?

Those showing what happens when the network goes down, and what happens when
the network comes up.
Usually the DC writes some good "action summaries" (typically after
"pacemaker-controld[7236]:  notice: State transition S_IDLE ->
S_POLICY_ENGINE"). Those would be helpful.
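
For example, something like this should pull them out (the log path is
distribution-dependent):

  grep "State transition" /var/log/pacemaker/pacemaker.log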

> 
>> That should be OK but on the application side, some database connections
>> (on a Java WildFly server) can become "invalid"! So I would like to avoid
>> these migrations when this kind of small timeout occurs...
>>
>> So my question is: which cluster settings can I change to increase the
>> timeout before starting a cluster migration?
>>
>> Best regards,
>> Thierry
>>
>>
>>
>> Thierry Florac
>> Resp. Pôle Architecture Applicative et Mobile
>> DSI ‑ Dépt. Études et Solutions Tranverses
>> 2, avenue de Saint‑Mandé ‑ 75570 Paris cedex 12
>> Tél : 01 40 19 59 64
>> www.onf.fr





[ClusterLabs] Antw: [EXT] Filesystem resource agent w/ filesystem attribute 'noauto'

2022-03-09 Thread Ulrich Windl
>>> Asseel Sidique wrote on 09.03.2022 at 16:45 in message:
> Hi Team,
> 
> My question is regarding the filesystem resource agent. In the filesystem
> resource agent, there is a comment that states:
> 
> # Do not put this filesystem in /etc/fstab. This script manages all of
> # that for you.
> 
> From what I understand, this means that if you are using a filesystem that
> is being managed by Pacemaker, you should remove it from /etc/fstab to
> prevent the OS from also trying to automatically mount the filesystem.
> 
> Would it still be okay to add an entry for a filesystem in /etc/fstab if
> the 'noauto' option is specified? (This will disable the automatic mount
> for the filesystem.)

But WHY would you want to do that?

> 
> I did do a test with 'noauto' specified for a filesystem that was
> managed by Pacemaker and did not face any issues - but I still wanted to
> confirm that this was okay.
> 
> Best,
> Asseel





Re: [ClusterLabs] Filesystem resource agent w/ filesystem attribute 'noauto'

2022-03-09 Thread Andrei Borzenkov
On 09.03.2022 18:45, Asseel Sidique wrote:
> Hi Team,
> 
> My question is regarding the filesystem resource agent. In the filesystem
> resource agent, there is a comment that states:
> 
> # Do not put this filesystem in /etc/fstab. This script manages all of
> # that for you.
> 
> From what I understand, this means that if you are using a filesystem that
> is being managed by Pacemaker, you should remove it from /etc/fstab to
> prevent the OS from also trying to automatically mount the filesystem.
> 
> Would it still be okay to add an entry for a filesystem in /etc/fstab if
> the 'noauto' option is specified? (This will disable the automatic mount
> for the filesystem.)

Why? What exactly are you trying to achieve?

> 
> I did do a test with the 'noauto' specified for a filesystem that was managed 
> by Pacemaker and did not face any issues - but I still wanted to confirm if 
> this was okay.
> 

> Best,
> Asseel



Re: [ClusterLabs] Antw: [EXT] Cluster timeout

2022-03-09 Thread FLORAC Thierry
>>> FLORAC Thierry wrote on 09.03.2022 at 11:46 in message:

> Hi,
>
> I manage an active/passive PostgreSQL cluster using DRBD, LVM, Pacemaker
> and Corosync on a Debian GNU/Linux operating system.
> Everything is OK, but my platform seems to be quite "sensitive" to small
> network timeouts which are generating a cluster migration start from
> active to passive node; generally, the process doesn't go through to the
> end: as soon as the connection is back again, the migration is cancelled
> and the database restarts!

> Could it be you run without fencing? Maybe show some logs!

Logs are quite verbose and not very easy to understand...
What log would you need?

> That should be OK but on the application side, some database connections
> (on a Java WildFly server) can become "invalid"! So I would like to avoid
> these migrations when this kind of small timeout occurs...
>
> So my question is: which cluster settings can I change to increase the
> timeout before starting a cluster migration?
>
> Best regards,
> Thierry
>
>
>
> Thierry Florac
> Resp. Pôle Architecture Applicative et Mobile
> DSI ‑ Dépt. Études et Solutions Tranverses
> 2, avenue de Saint‑Mandé ‑ 75570 Paris cedex 12
> Tél : 01 40 19 59 64
> www.onf.fr



[ClusterLabs] Filesystem resource agent w/ filesystem attribute 'noauto'

2022-03-09 Thread Asseel Sidique
Hi Team,

My question is regarding the filesystem resource agent. In the filesystem
resource agent, there is a comment that states:

# Do not put this filesystem in /etc/fstab. This script manages all of
# that for you.

From what I understand, this means that if you are using a filesystem that
is being managed by Pacemaker, you should remove it from /etc/fstab to
prevent the OS from also trying to automatically mount the filesystem.

Would it still be okay to add an entry for a filesystem in /etc/fstab if the 
'noauto' option is specified? (This will disable the automatic mount for the 
filesystem).
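
For reference, the kind of entry I have in mind looks something like
this (device, mount point, and filesystem type are just placeholders):

  /dev/sdb1  /mnt/data  xfs  defaults,noauto  0 0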

I did do a test with 'noauto' specified for a filesystem that was managed
by Pacemaker and did not face any issues - but I still wanted to confirm
that this was okay.

Best,
Asseel


[ClusterLabs] Antw: [EXT] Cluster timeout

2022-03-09 Thread Ulrich Windl
>>> FLORAC Thierry wrote on 09.03.2022 at 11:46 in message:

> Hi,
> 
> I manage an active/passive PostgreSQL cluster using DRBD, LVM, Pacemaker
> and Corosync on a Debian GNU/Linux operating system.
> Everything is OK, but my platform seems to be quite "sensitive" to small
> network timeouts which are generating a cluster migration start from
> active to passive node; generally, the process doesn't go through to the
> end: as soon as the connection is back again, the migration is cancelled
> and the database restarts!

Could it be you run without fencing? Maybe show some logs!

> That should be OK but on the application side, some database connections
> (on a Java WildFly server) can become "invalid"! So I would like to avoid
> these migrations when this kind of small timeout occurs...
> 
> So my question is: which cluster settings can I change to increase the 
> timeout before starting a cluster migration?
> 
> Best regards,
> Thierry
> 
> 
> 
> Thierry Florac
> Resp. Pôle Architecture Applicative et Mobile
> DSI ‑ Dépt. Études et Solutions Tranverses
> 2, avenue de Saint‑Mandé ‑ 75570 Paris cedex 12
> Tél : 01 40 19 59 64
> www.onf.fr 
> 





Re: [ClusterLabs] Cluster timeout

2022-03-09 Thread Strahil Nikolov via Users
You can bump the 'token' to a higher value (for example 10s) and adjust
the consensus based on that value. See man 5 corosync.conf.
Don't forget to sync the nodes and reload the corosync stack.
Of course, proper testing on non-Prod is highly recommended.
Note: Both parameters use milliseconds (at least based on the manpage).
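
As a purely illustrative sketch (the values are examples; check
man 5 corosync.conf for your release):

  totem {
      version: 2
      cluster_name: mycluster
      token: 10000        # total token timeout, in milliseconds (10s)
      consensus: 12000    # defaults to token * 1.2, also in milliseconds
  }

Then something like 'pcs cluster sync' followed by 'pcs cluster reload
corosync' should push and apply the change.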
Best Regards,
Strahil Nikolov

On Wed, Mar 9, 2022 at 12:46, FLORAC Thierry wrote:

> Hi,
>
> I manage an active/passive PostgreSQL cluster using DRBD, LVM, Pacemaker
> and Corosync on a Debian GNU/Linux operating system.
> Everything is OK, but my platform seems to be quite "sensitive" to small
> network timeouts which are generating a cluster migration start from
> active to passive node; generally, the process doesn't go through to the
> end: as soon as the connection is back again, the migration is cancelled
> and the database restarts!
> That should be OK but on the application side, some database connections
> (on a Java WildFly server) can become "invalid"! So I would like to avoid
> these migrations when this kind of small timeout occurs...
>
> So my question is: which cluster settings can I change to increase the
> timeout before starting a cluster migration?
>
> Best regards,
> Thierry
>
> Thierry Florac
> Resp. Pôle Architecture Applicative et Mobile
> DSI - Dépt. Études et Solutions Tranverses
> 2, avenue de Saint-Mandé - 75570 Paris cedex 12
> Tél : 01 40 19 59 64
> www.onf.fr



[ClusterLabs] Cluster timeout

2022-03-09 Thread FLORAC Thierry
Hi,

I manage an active/passive PostgreSQL cluster using DRBD, LVM, Pacemaker and 
Corosync on a Debian GNU/Linux operating system.
Everything is OK, but my platform seems to be quite "sensitive" to small 
network timeouts which are generating a cluster migration start from active to 
passive node; generally, the process doesn't go through to the end: as soon as 
the connection is back again, the migration is cancelled and the database 
restarts!
That should be OK but on the application side, some database connections (on a 
Java WildFly server) can become "invalid"! So I would like to avoid these 
migrations when this kind of small timeout occurs...

So my question is: which cluster settings can I change to increase the timeout 
before starting a cluster migration?

Best regards,
Thierry



Thierry Florac
Resp. Pôle Architecture Applicative et Mobile
DSI - Dépt. Études et Solutions Tranverses
2, avenue de Saint-Mandé - 75570 Paris cedex 12
Tél : 01 40 19 59 64
www.onf.fr 

