lejeczek,

hi guys,

I have a very basic two-node cluster, not even a single resource on it, but it is very troublesome: it keeps breaking.
The journal for 'pacemaker' constantly shows (on both nodes):
...
warning: Input I_DC_TIMEOUT received in state S_PENDING from crm_timer_popped
  notice: State transition S_ELECTION -> S_PENDING
  notice: State transition S_PENDING -> S_NOT_DC
  notice: Lost attribute writer swir
  notice: Node swir state is now lost
  notice: Our peer on the DC (swir) is dead
 notice: Purged 1 peer with id=2 and/or uname=swir from the membership cache
  notice: Node swir state is now lost
  notice: State transition S_NOT_DC -> S_ELECTION
  notice: Removing all swir attributes for peer loss
 notice: Purged 1 peer with id=2 and/or uname=swir from the membership cache
  notice: Node swir state is now lost
  notice: Node swir state is now lost
  notice: Recorded local node as attribute writer (was unset)
 notice: Purged 1 peer with id=2 and/or uname=swir from the membership cache
  notice: State transition S_ELECTION -> S_INTEGRATION
  warning: Blind faith: not fencing unseen nodes
  notice: Delaying fencing operations until there are resources to manage
 notice: Calculated transition 0, saving inputs in /var/lib/pacemaker/pengine/pe-input-627.bz2
  notice: Transition 0 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-627.bz2): Complete
  notice: State transition S_TRANSITION_ENGINE -> S_IDLE
  notice: Node swir state is now member
  notice: Node swir state is now member
  notice: Node swir state is now member
  notice: Node swir state is now member
  notice: State transition S_IDLE -> S_INTEGRATION
  warning: Another DC detected: swir (op=noop)
  notice: Detected another attribute writer (swir), starting new election
  notice: Setting #attrd-protocol[swir]: (unset) -> 2
  notice: State transition S_ELECTION -> S_RELEASE_DC
  notice: State transition S_PENDING -> S_NOT_DC
  notice: Recorded local node as attribute writer (was unset)


Is there anything interesting in corosync.log?

It's the same hardware on which "this same" cluster ran okay. Then, only a couple of days ago, I upgraded CentOS on these two boxes to "Stream". I'm hoping it's something trivial I'm missing with the new version(s) of software that came with the upgrade, perhaps some (new) settings for a two-node cluster which I missed.

Actually, for Corosync there is one: an increase of the token timeout to 3 sec. This was not a problem during my testing, but just to be sure: have you restarted corosync on both of the nodes? Do they have the same token timeout? (You can check the token timeout in use by running "corosync-cmapctl -g runtime.config.totem.token".)
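For reference, a quick sketch of the checks described above, to run on each node (the 3000 ms value matches the 3 sec timeout mentioned; adjust for your corosync.conf):

```
# On each node, print the token timeout actually in effect (milliseconds).
# The values on both nodes should match.
corosync-cmapctl -g runtime.config.totem.token

# If the values differ, make sure corosync.conf is identical on both nodes,
# e.g. with an explicit token setting in the totem section:
#
#   totem {
#       token: 3000
#   }
#
# and then restart the stack on both nodes so it takes effect:
systemctl restart corosync pacemaker
```

Mismatched token timeouts between the two nodes (e.g. one node still running the pre-upgrade config) could plausibly produce the repeated membership loss and re-election seen in the journal above.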

Honza

Any suggestions greatly appreciated.
many thanks, L.
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/