Slava, please try to follow these steps:

- First, edit your config file, keep only ONE node there, and try to run it
  without Asterisk/Pacemaker, ... just pure corosync.
- Take a look at netstat -anop. Is corosync bound to the correct interface?
- Try to execute corosync-objctl. Can you see output like:

  compatibility=whitetank
  totem.version=2
  totem.secauth=off
  ...
  runtime.blackbox.dump_flight_data=no
  runtime.blackbox.dump_state=no

  ?
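The single-node test in the first step can be sketched with a stripped-down corosync.conf like the one below. This is a sketch only; it reuses the udpu transport and 10.10.10.x addresses that appear later in this thread, and assumes nodeid 1 for the node at 10.10.10.1:

```
totem {
        version: 2
        nodeid: 1                    # assumed: 1 on 10.10.10.1
        transport: udpu
        interface {
                ringnumber: 0
                bindnetaddr: 10.10.10.0
                mcastport: 5405
                member {
                        memberaddr: 10.10.10.1   # only the local node for this test
                }
        }
}
```

With only one node listed, corosync should form a single-node membership on its own; if it cannot, the problem is local (binding, firewall) rather than between the nodes.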
- If (and only if) corosync is bound to the correct interface and
  corosync-objctl doesn't report an error, try to do the same on the second node.
- If (and only if) corosync on the second node is bound to the correct
  interface and corosync-objctl doesn't report an error, add BOTH nodes to the
  config file.
- Make sure that corosync on BOTH nodes is bound to the correct interface.
- If corosync is still not able to create a membership (repeating messages like:

  Nov 25 17:58:09 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
  Nov 25 17:58:11 corosync [TOTEM ] A processor failed, forming new configuration.

  ), try tcpdump and see if any traffic is going over the corosync port.
- Try reducing the MTU (option netmtu) to something like 1000.

I believe that by following these steps exactly, we will be able to find out
what is happening.

Regards,
  Honza

Slava Bendersky wrote:
> Hello Honza,
> I corrected the config, but it didn't change much. The cluster is not forming
> properly.
> I shut down iptables.
> Log:
>
> Nov 25 17:58:05 corosync [CPG ] chosen downlist: sender r(0) ip(10.10.10.1) ; members(old:1 left:0)
> Nov 25 17:58:05 corosync [MAIN ] Completed service synchronization, ready to provide service.
> Nov 25 17:58:07 corosync [TOTEM ] A processor failed, forming new configuration.
> Nov 25 17:58:08 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 25 17:58:08 corosync [TOTEM ] A processor failed, forming new configuration.
> Nov 25 17:58:09 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
> Nov 25 17:58:11 corosync [TOTEM ] A processor failed, forming new configuration.
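Honza's netstat and tcpdump suggestions might be run roughly like this (the port range 5404-5407 and the interface name eth1 are assumptions taken from the iptables rules later in this thread):

```shell
# Is corosync bound to the expected address and port range?
netstat -anop 2>/dev/null | grep -E ':540[4-7]' || echo "no corosync sockets found"

# Is any totem traffic arriving on the cluster interface? (needs root; run on both nodes)
# tcpdump -i eth1 -n udp port 5405
```

If netstat shows corosync bound to the wrong address, revisit bindnetaddr before looking at anything else.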
>
> But right now I see both end members
>
> pbx01*CLI> corosync show members
>
> =============================================================
> === Cluster members =========================================
> =============================================================
> ===
> === Node 1
> === --> Group: asterisk
> === --> Address 1: 10.10.10.1
> === Node 2
> === --> Group: asterisk
> === --> Address 1: 10.10.10.2
> ===
> =============================================================
>
> And this message is still flooding the asterisk log.
>
> [2013-11-25 12:02:18] WARNING[2057]: res_corosync.c:316 ast_event_cb: CPG mcast failed (6)
> [2013-11-25 12:02:18] WARNING[2057]: res_corosync.c:316 ast_event_cb: CPG mcast failed (6)
>
> When I do a ping from asterisk it shows the MAC from eth0 and not eth3.
>
> pbx01*CLI> corosync ping
> [2013-11-25 12:03:38] NOTICE[2057]: res_corosync.c:303 ast_event_cb: (ast_event_cb) Got event PING from server with EID: 'mac of eth0'
>
> Slava.
>
> ----- Original Message -----
>
> From: "Jan Friesse" <[email protected]>
> To: "Slava Bendersky" <[email protected]>, "Steven Dake" <[email protected]>
> Cc: [email protected]
> Sent: Monday, November 25, 2013 3:10:51 AM
> Subject: Re: [corosync] information request
>
> Slava Bendersky wrote:
>> Hello Steven,
>> Here are the testing results.
>> Iptables is stopped on both ends.
>>
>> [root@eusipgw01 ~]# iptables -L -nv -x
>> Chain INPUT (policy ACCEPT 474551 packets, 178664760 bytes)
>> pkts bytes target prot opt in out source destination
>>
>> Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
>> pkts bytes target prot opt in out source destination
>>
>> Chain OUTPUT (policy ACCEPT 467510 packets, 169303071 bytes)
>> pkts bytes target prot opt in out source destination
>> [root@eusipgw01 ~]#
>>
>> First case is udpu transport and rrp: none
>>
>> totem {
>> version: 2
>> token: 160
>> token_retransmits_before_loss_const: 3
>> join: 250
>> consensus: 300
>> vsftype: none
>> max_messages: 20
>> threads: 0
>> nodeid: 2
>> rrp_mode: none
>> interface {
>> member {
>> memberaddr: 10.10.10.1
>> }
>
> ^^^ This is the problem. You must define BOTH nodes (not only the remote one)
> on BOTH sides.
>
>> ringnumber: 0
>> bindnetaddr: 10.10.10.0
>> mcastport: 5405
>> }
>> transport: udpu
>> }
>>
>> Error
>>
>> Nov 24 14:25:29 corosync [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>
> This is because you defined only the remote node, not the local one, in the
> member(s) section(s).
>
> Regards,
>   Honza
>
>> pbx01*CLI> corosync show members
>>
>> =============================================================
>> === Cluster members =========================================
>> =============================================================
>> ===
>> ===
>> =============================================================
>>
>> And the same with rrp: passive. I think the unicast issue is more related to
>> some incompatibility with vmware? Only multicast is going through, but even
>> then it is not forming the cluster completely.
>>
>> Slava.
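Honza's fix (listing BOTH nodes on both sides) would turn the member section above into something like the following sketch; the same member list goes into the config on both machines, with only nodeid differing:

```
totem {
        version: 2
        nodeid: 2                    # unique per node: e.g. 1 on 10.10.10.1, 2 on 10.10.10.2
        rrp_mode: none
        transport: udpu
        interface {
                ringnumber: 0
                bindnetaddr: 10.10.10.0
                mcastport: 5405
                member {
                        memberaddr: 10.10.10.1
                }
                member {
                        memberaddr: 10.10.10.2
                }
        }
}
```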
>>
>> ----- Original Message -----
>>
>> From: "Steven Dake" <[email protected]>
>> To: "Slava Bendersky" <[email protected]>, "Digimer" <[email protected]>
>> Cc: [email protected]
>> Sent: Sunday, November 24, 2013 12:01:09 PM
>> Subject: Re: [corosync] information request
>>
>> On 11/23/2013 11:20 PM, Slava Bendersky wrote:
>>
>> Hello Digimer,
>> Here is what I see from the asterisk box:
>> pbx01*CLI> corosync show members
>>
>> =============================================================
>> === Cluster members =========================================
>> =============================================================
>> ===
>> === Node 1
>> === --> Group: asterisk
>> === --> Address 1: 10.10.10.1
>> === Node 2
>> === --> Group: asterisk
>> === --> Address 1: 10.10.10.2
>> ===
>> =============================================================
>>
>> [2013-11-24 01:12:43] WARNING[2057]: res_corosync.c:316 ast_event_cb: CPG mcast failed (6)
>> [2013-11-24 01:12:43] WARNING[2057]: res_corosync.c:316 ast_event_cb: CPG mcast failed (6)
>>
>> These errors come from asterisk via the cpg libraries because corosync
>> cannot get a proper configuration. The first message on this thread contains
>> the scenarios under which those occur. In a past log you had the error
>> indicating a network fault. This network fault error IIRC indicates the
>> firewall is enabled. The error from asterisk is expected if your firewall is
>> enabled. This was suggested before by Digimer, but can you confirm you
>> totally disabled your firewall on the box (rather than just configured it as
>> you thought was correct)?
>>
>> Turn off the firewall - which will help us eliminate that as a source of the
>> problem.
>>
>> Next, use UDPU mode without RRP - confirm whether that works.
>>
>> Next use UDPU _passive_ rrp mode - confirm whether that works.
>>
>> One thing at a time in each step please.
>>
>> Regards
>> -steve
>>
>> Is it possible that the message is related to the permissions of the user
>> running corosync or asterisk?
>>
>> And another point: when I send a ping I see the MAC address of eth0, which
>> is the default gateway, and not the cluster interface.
>>
>> Corosync does not use the gateway address in any of its routing
>> calculations. Instead it physically binds to the interface specified as
>> detailed in corosync.conf.5. By physically binding, it avoids the gateway
>> entirely.
>>
>> Regards
>> -steve
>>
>> pbx01*CLI> corosync ping
>> [2013-11-24 01:16:54] NOTICE[2057]: res_corosync.c:303 ast_event_cb: (ast_event_cb) Got event PING from server with EID: 'MAC address of the eth0'
>> [2013-11-24 01:16:54] WARNING[2057]: res_corosync.c:316 ast_event_cb: CPG mcast failed (6)
>>
>> Slava.
>>
>> ----- Original Message -----
>>
>> From: "Slava Bendersky" <[email protected]>
>> To: "Digimer" <[email protected]>
>> Cc: [email protected]
>> Sent: Sunday, November 24, 2013 12:26:40 AM
>> Subject: Re: [corosync] information request
>>
>> Hello Digimer,
>> I am trying to find information about vmware multicast problems. On tcpdump
>> I see multicast traffic from the remote end, but I can't confirm whether the
>> packets arrive as they should.
>> Can you please confirm that memberaddr: is the IP address of the second node?
>>
>> 06:05:02.408204 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 221)
>> 10.10.10.1.5404 > 226.94.1.1.5405: [udp sum ok] UDP, length 193
>> 06:05:02.894935 IP (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto UDP (17), length 221)
>> 10.10.10.2.5404 > 226.94.1.1.5405: [bad udp cksum 1a8c!] UDP, length 193
>>
>> Slava.
>> >> >> >> ----- Original Message ----- >> >> From: "Digimer" <[email protected]> >> To: "Slava Bendersky" <[email protected]> >> Cc: [email protected] >> Sent: Saturday, November 23, 2013 11:54:55 PM >> Subject: Re: [corosync] information request >> >> If I recall correctly, VMWare doesn't do multicast properly. I'm not >> sure though, I don't use it. >> >> Try unicast with no RRP. See if that works. >> >> On 23/11/13 23:16, Slava Bendersky wrote: >>> Hello Digimer, >>> All machines are rhel 6.4 based on vmware , there not physical switch >>> only from vmware. I set rrp to none and cluster is formed. >>> With this config I am getting constant error messages. >>> >>> [root@eusipgw01 ~]# cat /etc/redhat-release >>> Red Hat Enterprise Linux Server release 6.4 (Santiago) >>> >>> [root@eusipgw01 ~]# rpm -qa | grep corosync >>> corosync-1.4.1-15.el6.x86_64 >>> corosynclib-1.4.1-15.el6.x86_64 >>> >>> >>> [2013-11-23 22:46:20] WARNING[2057] res_corosync.c: CPG mcast failed (6) >>> [2013-11-23 22:46:20] WARNING[2057] res_corosync.c: CPG mcast failed (6) >>> >>> iptables >>> >>> -A INPUT -i eth1 -p udp -m state --state NEW -m udp --dport 5404:5407 -j >>> NFLOG --nflog-prefix "dmz_ext2fw: " --nflog-group 2 >>> -A INPUT -i eth1 -m pkttype --pkt-type multicast -j NFLOG >>> --nflog-prefix "dmz_ext2fw: " --nflog-group 2 >>> -A INPUT -i eth1 -m pkttype --pkt-type unicast -j NFLOG --nflog-prefix >>> "dmz_ext2fw: " --nflog-group 2 >>> -A INPUT -i eth1 -p igmp -j NFLOG --nflog-prefix "dmz_ext2fw: " >>> --nflog-group 2 >>> -A INPUT -j ACCEPT >>> >>> >>> ------------------------------------------------------------------------ >>> *From: *"Digimer" <[email protected]> >>> *To: *"Slava Bendersky" <[email protected]> >>> *Cc: *[email protected] >>> *Sent: *Saturday, November 23, 2013 10:34:00 PM >>> *Subject: *Re: [corosync] information request >>> >>> I don't think you ever said what OS you have. I've never had to set >>> anything in sysctl.conf on RHEL/CentOS 6. 
>>> Did you try disabling RRP entirely? If you have a managed switch, make
>>> sure persistent multicast groups are enabled, or try a different switch
>>> entirely.
>>>
>>> *Something* is interrupting your network traffic. What does
>>> iptables-save show? Are these physical or virtual machines?
>>>
>>> The more information about your environment that you can share, the
>>> better we can help.
>>>
>>> On 23/11/13 22:29, Slava Bendersky wrote:
>>>> Hello Digimer,
>>>> As an idea, might it be some settings in sysctl.conf?
>>>>
>>>> Slava.
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From: *"Slava Bendersky" <[email protected]>
>>>> *To: *"Digimer" <[email protected]>
>>>> *Cc: *[email protected]
>>>> *Sent: *Saturday, November 23, 2013 10:27:22 PM
>>>> *Subject: *Re: [corosync] information request
>>>>
>>>> Hello Digimer,
>>>> Yes, I set it to passive, and selinux is disabled.
>>>>
>>>> [root@eusipgw01 ~]# sestatus
>>>> SELinux status: disabled
>>>> [root@eusipgw01 ~]# cat /etc/corosync/corosync.conf
>>>> totem {
>>>> version: 2
>>>> token: 160
>>>> token_retransmits_before_loss_const: 3
>>>> join: 250
>>>> consensus: 300
>>>> vsftype: none
>>>> max_messages: 20
>>>> threads: 0
>>>> nodeid: 2
>>>> rrp_mode: passive
>>>> interface {
>>>> ringnumber: 0
>>>> bindnetaddr: 10.10.10.0
>>>> mcastaddr: 226.94.1.1
>>>> mcastport: 5405
>>>> }
>>>> }
>>>>
>>>> logging {
>>>> fileline: off
>>>> to_stderr: yes
>>>> to_logfile: yes
>>>> to_syslog: off
>>>> logfile: /var/log/cluster/corosync.log
>>>> debug: off
>>>> timestamp: on
>>>> logger_subsys {
>>>> subsys: AMF
>>>> debug: off
>>>> }
>>>> }
>>>>
>>>> Slava.
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From: *"Digimer" <[email protected]>
>>>> *To: *"Slava Bendersky" <[email protected]>
>>>> *Cc: *"Steven Dake" <[email protected]>, [email protected]
>>>> *Sent: *Saturday, November 23, 2013 7:04:43 PM
>>>> *Subject: *Re: [corosync] information request
>>>>
>>>> First up, I'm not Steven. Secondly, did you follow Steven's
>>>> recommendation to not use active RRP? Does the cluster form with no RRP
>>>> at all? Is selinux enabled?
>>>>
>>>> On 23/11/13 18:29, Slava Bendersky wrote:
>>>>> Hello Steven,
>>>>> In multicast the log is filling with this message:
>>>>>
>>>>> Nov 24 00:26:28 corosync [TOTEM ] A processor failed, forming new configuration.
>>>>> Nov 24 00:26:28 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
>>>>> Nov 24 00:26:31 corosync [CPG ] chosen downlist: sender r(0) ip(10.10.10.1) ; members(old:2 left:0)
>>>>> Nov 24 00:26:31 corosync [MAIN ] Completed service synchronization, ready to provide service.
>>>>>
>>>>> In udpu it is not working at all.
>>>>>
>>>>> Slava.
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> *From: *"Digimer" <[email protected]>
>>>>> *To: *"Slava Bendersky" <[email protected]>
>>>>> *Cc: *"Steven Dake" <[email protected]>, [email protected]
>>>>> *Sent: *Saturday, November 23, 2013 6:05:56 PM
>>>>> *Subject: *Re: [corosync] information request
>>>>>
>>>>> So multicast works with the firewall disabled?
>>>>>
>>>>> On 23/11/13 17:28, Slava Bendersky wrote:
>>>>>> Hello Steven,
>>>>>> I disabled iptables and there is no difference, the error message is
>>>>>> the same, but at least in multicast it wasn't generating the error.
>>>>>>
>>>>>> Slava.
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> *From: *"Digimer" <[email protected]>
>>>>>> *To: *"Slava Bendersky" <[email protected]>, "Steven Dake" <[email protected]>
>>>>>> *Cc: *[email protected]
>>>>>> *Sent: *Saturday, November 23, 2013 4:37:36 PM
>>>>>> *Subject: *Re: [corosync] information request
>>>>>>
>>>>>> Does either mcast or unicast work if you disable the firewall? If so,
>>>>>> then at least you know for sure that iptables is the problem.
>>>>>>
>>>>>> The link here shows the iptables rules I use (for corosync in mcast and
>>>>>> other apps):
>>>>>>
>>>>>> https://alteeve.ca/w/AN!Cluster_Tutorial_2#Configuring_iptables
>>>>>>
>>>>>> digimer
>>>>>>
>>>>>> On 23/11/13 16:12, Slava Bendersky wrote:
>>>>>>> Hello Steven,
>>>>>>> This is what I see when set up through UDPU:
>>>>>>>
>>>>>>> Nov 23 22:08:13 corosync [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
>>>>>>> Nov 23 22:08:13 corosync [TOTEM ] adding new UDPU member {10.10.10.1}
>>>>>>> Nov 23 22:08:16 corosync [MAIN ] Totem is unable to form a cluster because of an operating system or network fault. The most common cause of this message is that the local firewall is configured improperly.
>>>>>>>
>>>>>>> Might I be missing some firewall rules? I allowed unicast.
>>>>>>>
>>>>>>> Slava.
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>> *From: *"Steven Dake" <[email protected]>
>>>>>>> *To: *"Slava Bendersky" <[email protected]>
>>>>>>> *Cc: *[email protected]
>>>>>>> *Sent: *Saturday, November 23, 2013 10:33:31 AM
>>>>>>> *Subject: *Re: [corosync] information request
>>>>>>>
>>>>>>> On 11/23/2013 08:23 AM, Slava Bendersky wrote:
>>>>>>>
>>>>>>> Hello Steven,
>>>>>>>
>>>>>>> My setup:
>>>>>>>
>>>>>>> 10.10.10.1 primary server ----- EoIP tunnel vpn ipsec ----- dr server 10.10.10.2
>>>>>>>
>>>>>>> On both servers there are 2 interfaces: eth0, which is the default gw
>>>>>>> out, and eth1, where corosync lives.
>>>>>>>
>>>>>>> Iptables:
>>>>>>>
>>>>>>> -A INPUT -i eth1 -p udp -m state --state NEW -m udp --dport 5404:5407
>>>>>>> -A INPUT -i eth1 -m pkttype --pkt-type multicast
>>>>>>> -A INPUT -i eth1 -p igmp
>>>>>>>
>>>>>>> Corosync.conf
>>>>>>>
>>>>>>> totem {
>>>>>>> version: 2
>>>>>>> token: 160
>>>>>>> token_retransmits_before_loss_const: 3
>>>>>>> join: 250
>>>>>>> consensus: 300
>>>>>>> vsftype: none
>>>>>>> max_messages: 20
>>>>>>> threads: 0
>>>>>>> nodeid: 2
>>>>>>> rrp_mode: active
>>>>>>> interface {
>>>>>>> ringnumber: 0
>>>>>>> bindnetaddr: 10.10.10.0
>>>>>>> mcastaddr: 226.94.1.1
>>>>>>> mcastport: 5405
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> Join message
>>>>>>>
>>>>>>> [root@eusipgw01 ~]# corosync-objctl | grep member
>>>>>>> runtime.totem.pg.mrp.srp.members.2.ip=r(0) ip(10.10.10.2)
>>>>>>> runtime.totem.pg.mrp.srp.members.2.join_count=1
>>>>>>> runtime.totem.pg.mrp.srp.members.2.status=joined
>>>>>>> runtime.totem.pg.mrp.srp.members.1.ip=r(0) ip(10.10.10.1)
>>>>>>> runtime.totem.pg.mrp.srp.members.1.join_count=254
>>>>>>> runtime.totem.pg.mrp.srp.members.1.status=joined
>>>>>>>
>>>>>>> Is it possible that the ping is sent out of the wrong interface?
>>>>>>>
>>>>>>> Slava,
>>>>>>>
>>>>>>> I wouldn't expect so.
>>>>>>>
>>>>>>> Which version?
>>>>>>>
>>>>>>> Have you tried udpu instead?
>>>>>>> If not, it is preferable to multicast unless you want absolute
>>>>>>> performance on cpg groups. In most cases the performance difference is
>>>>>>> very small and not worth the trouble of setting up multicast in your
>>>>>>> network.
>>>>>>>
>>>>>>> Fabio had indicated rrp active mode is broken. I don't know the
>>>>>>> details, but try passive RRP - it is actually better than active IMNSHO :)
>>>>>>>
>>>>>>> Regards
>>>>>>> -steve
>>>>>>>
>>>>>>> Slava.
>>>>>>>
>>>>>>> ------------------------------------------------------------------------
>>>>>>> *From: *"Steven Dake" <[email protected]>
>>>>>>> *To: *"Slava Bendersky" <[email protected]>, [email protected]
>>>>>>> *Sent: *Saturday, November 23, 2013 6:01:11 AM
>>>>>>> *Subject: *Re: [corosync] information request
>>>>>>>
>>>>>>> On 11/23/2013 12:29 AM, Slava Bendersky wrote:
>>>>>>>
>>>>>>> Hello Everyone,
>>>>>>> Corosync runs on a box with 2 Ethernet interfaces.
>>>>>>> I am getting this message:
>>>>>>> CPG mcast failed (6)
>>>>>>>
>>>>>>> Any information, thank you in advance.
>>>>>>>
>>>>>>> https://github.com/corosync/corosync/blob/master/include/corosync/corotypes.h#L84
>>>>>>>
>>>>>>> This can occur because:
>>>>>>> a) firewall is enabled - there should be something in the logs
>>>>>>> telling you to properly configure the firewall
>>>>>>> b) a config change is in progress - this is a normal response, and
>>>>>>> you should try the request again
>>>>>>> c) a bug in the synchronization code is resulting in a blocked
>>>>>>> unsynced cluster
>>>>>>>
>>>>>>> c is very unlikely at this point.
>>>>>>>
>>>>>>> 2 ethernet interfaces = rrp mode, bonding, or something else?
>>>>>>>
>>>>>>> Digimer needs moar infos :)
>>>>>>>
>>>>>>> Regards
>>>>>>> -steve
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> discuss mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.corosync.org/mailman/listinfo/discuss
>>>>>>
>>>>>> --
>>>>>> Digimer
>>>>>> Papers and Projects: https://alteeve.ca/w/
>>>>>> What if the cure for cancer is trapped in the mind of a person without
>>>>>> access to education?
