On 12/15/2016 12:37 PM, al...@amisw.com wrote:
> Hi,
>
> I have been having trouble for a week and can't find a solution by myself.
> Any help would be really appreciated!
> I have used corosync / pacemaker for 3 or 4 years and everything has
> worked well, for failover and load balancing.
>
> I have a shared IP between 3 servers, and needed to remove one for an
> upgrade. But after I removed the server from the cluster, I got random
> failures accessing my shared IP. At first I thought that some packets
> were still going to the old server, so I put it back in the cluster. I
> can reach it, but the random failures are still there :-/
>
> My test is just a curl http://my_ip (or ssh, same thing: random failures
> to connect).
> A ping doesn't lose any packets.
> I can reach each of the three servers, but sometimes the request hangs
> and times out.
> I see via tcpdump the packets arriving and being resent, but nothing
> responds. How can I diagnose this?
> I think about one request in five fails. But I don't see any messages in
> the firewall logs or /var/log/messages, nothing, just as if the switch
> chose to drop random packets. I don't see any error counters on the
> network interfaces. I checked the iptables settings, rechecked the logs,
> rechecked all the firewalls ... Where do these packets go??
>
> I tried with another new IP, and the same problem happened. I tried IPs
> on two different subnets (10.xxx and an external IP) with the same result.
>
> I have no problem with virtual IPs in failover mode.
>
> If someone has any clue ...

Seeing your configuration might help. Did you set globally-unique=true and
clone-node-max=3 on the clone? If not, the other nodes can't pick up the
lost node's share of requests.

_______________________________________________
Users mailing list: Users@clusterlabs.org
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
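
To illustrate why those two clone settings matter, here is a toy Python model of a load-balanced shared IP. It is only a sketch, not Pacemaker or kernel code: the CRC32 hash stands in for whatever hash the real implementation uses, and the node names, client address, and port range are made up. The idea it shows is that every incoming connection hashes to one "bucket", each clone instance answers for one bucket, and a bucket that no running instance has claimed is silently ignored by every node.

```python
import zlib

TOTAL_BUCKETS = 3  # assumption: one bucket per clone instance (clone-max=3)


def bucket_for(src_ip: str, src_port: int, total: int = TOTAL_BUCKETS) -> int:
    """Map a client connection to a bucket in 1..total (stand-in hash)."""
    key = f"{src_ip}:{src_port}".encode()
    return zlib.crc32(key) % total + 1


def count_dropped(claims: dict[str, set[int]], connections) -> int:
    """Count connections whose bucket no node has claimed."""
    claimed = set().union(*claims.values())
    return sum(1 for ip, port in connections
               if bucket_for(ip, port) not in claimed)


# Hypothetical client connections: one address, 3000 source ports.
connections = [("203.0.113.7", port) for port in range(20000, 23000)]

# node3 is gone and each remaining node may run only one instance
# (as with clone-node-max=1), so nobody answers for bucket 3.
stuck = {"node1": {1}, "node2": {2}}

# Same cluster, but node1 is allowed to run the orphaned instance too
# (as with clone-node-max=3), so bucket 3 is answered again.
moved = {"node1": {1, 3}, "node2": {2}}

dropped_stuck = count_dropped(stuck, connections)
dropped_moved = count_dropped(moved, connections)
print(f"unanswered with bucket 3 orphaned: {dropped_stuck} of {len(connections)}")
print(f"unanswered after bucket 3 moves:   {dropped_moved} of {len(connections)}")
```

In this model the first case leaves a share of connections with no node willing to answer, which would look like requests that hang and time out with nothing in the logs. Whether this is what is happening on your cluster depends on the actual clone configuration, which is why seeing it would help.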