Re: [ClusterLabs] VirtualDomain restart caused fencing.

2021-06-30 Thread kgaillot
On Wed, 2021-06-30 at 08:40 -0700, Matthew Schumacher wrote:
> Hello,
>
> I'm not sure how to fix this, but calling 'crm resource restart vm-name' this morning caused an entire node to get fenced, kicking the stool out from under a number of VMs.
>
> Looking at VirtualDomain it looks like

[ClusterLabs] VirtualDomain restart caused fencing.

2021-06-30 Thread Matthew Schumacher
Hello, I'm not sure how to fix this, but calling 'crm resource restart vm-name' this morning caused an entire node to get fenced, kicking the stool out from under a number of VMs. Looking at VirtualDomain it looks like the system defaults to a 90s timeout, and if it can't gracefully shut down

Re: [ClusterLabs] Antw: [EXT] Postgres Cluster PAF problems

2021-06-30 Thread Jehan-Guillaume de Rorthais
On Wed, 30 Jun 2021 14:36:29 +0200 damiano giuliani wrote:
> the replication is async; having a look into the postgres logs, it seems some updates failed because no master was available.

Not sure I understand what you mean. As Pacemaker recovered the primary on the same node, standbys and clients lost

Re: [ClusterLabs] Antw: [EXT] Postgres Cluster PAF problems

2021-06-30 Thread damiano giuliani
Hi Guys, thanks for the support, really hoped you were not on holidays yet! The replication is async; having a look into the postgres logs, it seems some updates failed because no master was available. I don't expect resource problems (I'm investigating anyway), the nodes have 200 GB RAM, 80 CPUs and a lot of

[ClusterLabs] Antw: [EXT] Postgres Cluster PAF problems

2021-06-30 Thread Ulrich Windl
>>> damiano giuliani wrote on 30.06.2021 at 13:44 in message:
> Hi Guys,
>
> sorry for bothering, unfortunately I was called for an issue related to a cluster I set up months ago, which was fully functional until last Saturday.
>
> It looks like some applications lost connection to the master

Re: [ClusterLabs] Postgres Cluster PAF problems

2021-06-30 Thread Jehan-Guillaume de Rorthais
Hi,

On Wed, 30 Jun 2021 13:44:28 +0200 damiano giuliani wrote:
> It looks like some applications lost connection to the master, losing some update/insert.
>
> I found the cause in the logs: the psqld-monitor timed out after 1ms and the master resource was demoted, the instance stopped and

[ClusterLabs] Postgres Cluster PAF problems

2021-06-30 Thread damiano giuliani
.x86_64 I attached the log, which could be useful to dig further. Could someone point me in the right direction? It would be really appreciated. Thanks for the support, Pepe

corosync.log-20210630.gz
Description: GNU Zip compressed data