Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up
On 14/11/14 11:01, Daniel Dehennin wrote:
> Christine Caulfield writes:
> [...]
>> If it's only happening at startup it could be the switch/router
>> learning the ports for the nodes and building its routing
>> tables. Switching to udpu will then get rid of the message if it's
>> annoying
>
> Switching to udpu made it work correctly.

Ahh, that's good. It sounds like it was something multicast related (if not exactly what I thought it might have been) ... these things usually are!

Chrissie

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
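For anyone landing on this thread with the same symptom: the fix discussed above amounts to changing the totem transport in corosync.conf from multicast to unicast UDP. A minimal sketch for corosync 2.x follows; the addresses are placeholders loosely based on the logs in this thread and must be adapted to your own network, and with udpu every member has to be listed explicitly in a nodelist:

#+begin_src
totem {
        version: 2
        # udpu = unicast UDP instead of the default multicast transport
        transport: udpu

        interface {
                ringnumber: 0
                bindnetaddr: 192.168.231.0
                mcastport: 5405
        }
}

nodelist {
        node {
                ring0_addr: 192.168.231.70
        }
        # ...one node block per cluster member...
}
#+end_src

The change has to be made on every node, and corosync restarted, before the new transport takes effect.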
Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up
Christine Caulfield writes:
[...]
> If it's only happening at startup it could be the switch/router
> learning the ports for the nodes and building its routing
> tables. Switching to udpu will then get rid of the message if it's
> annoying

Switching to udpu made it work correctly.

Thanks.
--
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

signature.asc
Description: PGP signature
Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up
Christine Caulfield writes:
[...]
> If it's only happening at startup it could be the switch/router
> learning the ports for the nodes and building its routing
> tables. Switching to udpu will then get rid of the message if it's
> annoying

When I start the corosync process on the VM, nothing happens in the logs except the new membership; here are the logs from the two bare-metal hosts (nebula3 is DC):

#+begin_src
Nov 14 08:29:29 nebula1 corosync[4242]: [TOTEM ] A new membership (192.168.231.70:80696) was formed. Members joined: 108489
Nov 14 08:29:29 nebula1 corosync[4242]: [QUORUM] Members[5]: 1084811078 1084811079 1084811080 108488 108489
Nov 14 08:29:29 nebula1 pacemakerd[4251]: notice: crm_update_peer_state: pcmk_quorum_notification: Node one-frontend[108489] - state is now member (was lost)
Nov 14 08:29:29 nebula1 crmd[4258]: notice: crm_update_peer_state: pcmk_quorum_notification: Node one-frontend[108489] - state is now member (was lost)
Nov 14 08:29:29 nebula1 corosync[4242]: [MAIN  ] Completed service synchronization, ready to provide service.
#+end_src

#+begin_src
Nov 14 08:29:29 nebula3 corosync[5345]: [TOTEM ] A new membership (192.168.231.70:80696) was formed. Members joined: 108489
Nov 14 08:29:29 nebula3 corosync[5345]: [QUORUM] Members[5]: 1084811078 1084811079 1084811080 108488 108489
Nov 14 08:29:29 nebula3 corosync[5345]: [MAIN  ] Completed service synchronization, ready to provide service.
Nov 14 08:29:29 nebula3 crmd[5423]: notice: crm_update_peer_state: pcmk_quorum_notification: Node one-frontend[108489] - state is now member (was lost)
Nov 14 08:29:29 nebula3 pacemakerd[5416]: notice: crm_update_peer_state: pcmk_quorum_notification: Node one-frontend[108489] - state is now member (was lost)
#+end_src

I left the corosync process running for some time and then started pacemaker; the Retransmit messages begin when pacemaker starts:

#+begin_src
Nov 14 09:26:30 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24
Nov 14 09:26:30 nebula1 corosync[4242]: message repeated 8 times: [ [TOTEM ] Retransmit List: 24 ]
Nov 14 09:26:30 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26
Nov 14 09:26:46 nebula1 corosync[4242]: message repeated 101 times: [ [TOTEM ] Retransmit List: 24 25 26 ]
Nov 14 09:26:46 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29
Nov 14 09:26:46 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b
Nov 14 09:26:46 nebula1 corosync[4242]: message repeated 44 times: [ [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b ]
Nov 14 09:26:46 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36
Nov 14 09:26:46 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38
Nov 14 09:26:47 nebula1 corosync[4242]: message repeated 35 times: [ [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 ]
Nov 14 09:26:47 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 3a
Nov 14 09:26:47 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 3a 3b 3c
Nov 14 09:26:48 nebula1 corosync[4242]: message repeated 43 times: [ [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 3a 3b 3c ]
Nov 14 09:26:48 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 3a 3b 3c 3f 40
Nov 14 09:26:48 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 3a 3b 3c 3f 40 41 42
Nov 14 09:27:08 nebula1 crmd[4258]: notice: do_state_transition: State transition S_NOT_DC -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Nov 14 09:27:08 nebula1 crmd[4258]: notice: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Nov 14 09:27:08 nebula1 corosync[4242]: message repeated 119 times: [ [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 3a 3b 3c 3f 40 41 42 ]
Nov 14 09:27:08 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 3a 3b 3c 3f 40 41 42 44 45 46
Nov 14 09:27:08 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 3a 3b 3c 3f 40 41 42 44 45 46 47 48 49 4a 4b 4c 4d 4e
Nov 14 09:27:08 nebula1 corosync[4242]: [TOTEM ] Retransmit List: 24 25 26 28 29 2a 2b 34 35 36 38 3a 3b 3c 3f 40 41 42 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
Nov 14 09:27:28 nebula1 crmd[4258]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Nov 14 09:27:28 nebula1 crmd[4258]: notice: do_state_transition: State transition S_ELECTION -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_election_count_vote ]
Nov 14 09:27:48 nebula1 crmd[4258]: warning: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
Nov 14 09:27:48 nebula1 crmd[4258]: notice: do_state
#+end_src
Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up
On 13/11/14 17:54, Daniel Dehennin wrote:

Digimer writes:

This generally happens if the network is slow or congested. It is corosync saying it needs to resend some messages. It is not uncommon for it to happen now and then, but that is a fairly large amount of retransmits.

Thanks for the explanation.

Is your network slow or saturated often? It might be that the traffic from the join is enough to push a congested network to the edge.

Not really:

- two physical hosts with:
  + one 1Gb/s network card for the OS (corosync network)
  + three 1Gb/s network cards in an LACP bond included in an Open vSwitch
- one physical host with:
  + one 10Gb/s network card for the OS (corosync network)
  + three 10Gb/s network cards in an LACP bond included in an Open vSwitch
- one KVM guest (quorum node) with:
  + one virtio card (corosync network)
- one KVM guest with:
  + one virtio card for service (OpenNebula web frontend)
  + one virtio card for corosync communications

With tcpdump I can see packets flying around on all nodes, but it looks like there is something wrong with my two-card KVM guest: when I start pacemaker on it I begin to see Retransmit messages in the other nodes' logs.

Is there a way to know which node is responsible for the resending of these messages?

If it's only happening at startup it could be the switch/router learning the ports for the nodes and building its routing tables. Switching to udpu will then get rid of the message if it's annoying.

Chrissie
Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up
Digimer writes:

> This generally happens if the network is slow or congested. It is
> corosync saying it needs to resend some messages. It is not uncommon
> for it to happen now and then, but that is a fairly large amount of
> retransmits.

Thanks for the explanation.

> Is your network slow or saturated often? It might be that the traffic
> from the join is enough to push a congested network to the edge.

Not really:

- two physical hosts with:
  + one 1Gb/s network card for the OS (corosync network)
  + three 1Gb/s network cards in an LACP bond included in an Open vSwitch
- one physical host with:
  + one 10Gb/s network card for the OS (corosync network)
  + three 10Gb/s network cards in an LACP bond included in an Open vSwitch
- one KVM guest (quorum node) with:
  + one virtio card (corosync network)
- one KVM guest with:
  + one virtio card for service (OpenNebula web frontend)
  + one virtio card for corosync communications

With tcpdump I can see packets flying around on all nodes, but it looks like there is something wrong with my two-card KVM guest: when I start pacemaker on it I begin to see Retransmit messages in the other nodes' logs.

Is there a way to know which node is responsible for the resending of these messages?

Regards.
--
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

signature.asc
Description: PGP signature
Re: [Pacemaker] TOTEM Retransmit list in logs when a node gets up
This generally happens if the network is slow or congested. It is corosync saying it needs to resend some messages. It is not uncommon for it to happen now and then, but that is a fairly large amount of retransmits.

Is your network slow or saturated often? It might be that the traffic from the join is enough to push a congested network to the edge.

On 13/11/14 08:07 AM, Daniel Dehennin wrote:

Hello,

My cluster seems to work correctly, but when I start corosync and pacemaker on one of the nodes[1] I start to see some TOTEM logs like this:

#+begin_src
Nov 13 14:00:10 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 46 47 48 49 4a 4b 4c 4d 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 47 48 49 4a 4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 47 48 49 4a 4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 47 48 49 4a 4b 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4a 4b 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4b 4c 4d 4e 4f
#+end_src

I do not understand what is happening; do you have any hints?

Regards.

Footnotes:
[1] the VM using two cards
    http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022962.html

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
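An aside for anyone reading these logs: the entries in a `Retransmit List` line appear to be TOTEM message sequence numbers printed in hexadecimal, so a run like `4b 4c 4d 4e 4f` is a block of consecutive messages awaiting retransmission. A small sketch to decode a line into decimal sequence numbers (the helper below is ours for illustration, not part of any corosync tooling):

```python
def retransmit_seqs(log_line):
    """Return the retransmit sequence numbers from a TOTEM log line as ints.

    The hex tokens after "Retransmit List:" are parsed base-16.
    """
    marker = "Retransmit List:"
    _, _, tail = log_line.partition(marker)
    return [int(tok, 16) for tok in tail.split()]

line = ("Nov 13 14:00:30 nebula3 corosync[5345]: "
        "[TOTEM ] Retransmit List: 4b 4c 4d 4e 4f")
print(retransmit_seqs(line))  # [75, 76, 77, 78, 79]
```

Consecutive, steadily growing lists like the ones above suggest the same block of messages is never being acknowledged by some node, rather than scattered one-off losses.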
[Pacemaker] TOTEM Retransmit list in logs when a node gets up
Hello,

My cluster seems to work correctly, but when I start corosync and pacemaker on one of the nodes[1] I start to see some TOTEM logs like this:

#+begin_src
Nov 13 14:00:10 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 46 47 48 49 4a 4b 4c 4d 4e 4f
Nov 13 14:00:10 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 47 48 49 4a 4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 47 48 49 4a 4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4b 4c 4d 4e 4f
Nov 13 14:00:30 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 47 48 49 4a 4b 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4a 4b 4c 4d 4e 4f
Nov 13 14:00:35 nebula3 corosync[5345]: [TOTEM ] Retransmit List: 4b 4c 4d 4e 4f
#+end_src

I do not understand what is happening; do you have any hints?

Regards.
Footnotes:
[1] the VM using two cards
    http://oss.clusterlabs.org/pipermail/pacemaker/2014-November/022962.html

--
Daniel Dehennin
Retrieve my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF

signature.asc
Description: PGP signature