Hi Yayati,

> as a 3 Machine quorum is required for making a HA cluster

It looks like Neo4j does not require a third machine to function, judging from "...When running Neo4j in HA mode there is always a single master and zero or more slaves...." on http://neo4j.com/docs/3.0.1/ha-architecture.html

Dennis

On Tuesday, May 17, 2016 at 12:56:40 AM UTC-4, Yayati Sule wrote:
>
> Hi Dennis,
> I am facing a similar problem, but in my case I have 2 machines as slaves
> and 1 master, and I cannot get them up and running. In your case maybe you
> can try to set up a third instance and try to start the cluster, as a 3
> Machine quorum is required for making a HA cluster.
>
> Regards,
> Yayati Sule
> Associate Data Scientist
> Innoplexus Consulting Services Pvt. Ltd.
> www.innoplexus.com
>
> On Tue, May 17, 2016 at 10:22 AM, Dennis O <dennis....@gmail.com> wrote:
>
>> Hi,
>>
>> Please advise on required configuration for the 2-instance "HA" setup
>> (AWS, neo4j Enterprise 3.0.1).
>>
>> Currently I have on both instances:
>> - dbms.mode=HA
>> - ha.initial_hosts=172.31.35.147:5001,172.31.33.173:5001
>> - ha.host.coordination is commented out
>> - ha.host.data is commented out
>> Ports 5001, 5002, 7474, 6001 are open on both.
>>
>> Differences:
>> 1. One node has ha.server_id=1 (172.31.33.173), the other has
>> ha.server_id=2
>> 2. Node with id=1 is Debian 8.4, id=2 is CentOS 7
>>
>> With this setup, the node with id=1 starts without problems and is elected
>> as master; the second one, however, fails.
>>
>> Some log extracts:
>>
>> 2016-05-17 04:50:51.781+0000 INFO [o.n.k.h.MasterClient214] MasterClient214 communication channel created towards /127.0.0.1:6001
>> 2016-05-17 04:50:51.790+0000 INFO [o.n.k.h.c.SwitchToSlave] Copying store from master
>> 2016-05-17 04:50:51.791+0000 INFO [o.n.k.h.MasterClient214] Thread[31, HA Mode switcher-1] Trying to open a new channel from /172.31.35.147:0 to /127.0.0.1:6001
>> 2016-05-17 04:50:51.791+0000 DEBUG [o.n.k.h.MasterClient214] MasterClient214 could not connect from /172.31.35.147:0 to /127.0.0.1:6001
>> 2016-05-17 04:50:51.796+0000 INFO [o.n.k.h.MasterClient214] MasterClient214[/127.0.0.1:6001] shutdown
>> 2016-05-17 04:50:51.796+0000 ERROR [o.n.k.h.c.m.HighAvailabilityModeSwitcher] Error while trying to switch to slave MasterClient214 could not connect from /172.31.35.147:0 to /127.0.0.1:6001
>> org.neo4j.com.ComException: MasterClient214 could not connect from /172.31.35.147:0 to /127.0.0.1:6001
>>     at org.neo4j.com.Client$2.create(Client.java:225)
>>     at org.neo4j.com.Client$2.create(Client.java:202)
>>     at org.neo4j.com.ResourcePool.acquire(ResourcePool.java:177)
>>     at org.neo4j.com.Client.acquireChannelContext(Client.java:390)
>>     at org.neo4j.com.Client.sendRequest(Client.java:296)
>>     at org.neo4j.com.Client.sendRequest(Client.java:289)
>>     at org.neo4j.kernel.ha.MasterClient210.copyStore(MasterClient210.java:311)
>>     at org.neo4j.kernel.ha.cluster.SwitchToSlave$1.copyStore(SwitchToSlave.java:531)
>>     at org.neo4j.com.storecopy.StoreCopyClient.copyStore(StoreCopyClient.java:191)
>>     at org.neo4j.kernel.ha.cluster.SwitchToSlave.copyStoreFromMaster(SwitchToSlave.java:525)
>>     at org.neo4j.kernel.ha.cluster.SwitchToSlave.copyStoreFromMasterIfNeeded(SwitchToSlave.java:348)
>>     at org.neo4j.kernel.ha.cluster.SwitchToSlave.switchToSlave(SwitchToSlave.java:272)
>>     at org.neo4j.kernel.ha.cluster.modeswitch.HighAvailabilityModeSwitcher$1.run(HighAvailabilityModeSwitcher.java:348)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>     at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:104)
>> Caused by: java.net.ConnectException: Connection refused
>>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>>     at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:148)
>>     at org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:104)
>>     at org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:78)
>>     at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>>     at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:41)
>>     ... 4 more
>> 2016-05-17 04:50:51.797+0000 INFO [o.n.k.h.c.m.HighAvailabilityModeSwitcher] Attempting to switch to slave in 7s
>> 2016-05-17 04:50:58.799+0000 INFO [o.n.k.i.f.CommunityFacadeFactory] No locking implementation specified, defaulting to 'forseti'
>> 2016-05-17 04:50:58.799+0000 INFO [o.n.k.h.c.SwitchToSlave] ServerId 2, moving to slave for master ha://0.0.0.0:6001?serverId=1
>>
>> 2016-05-17 04:30:57.535+0000 DEBUG [o.n.c.p.c.ClusterState$2] [AsyncLog @ 2016-05-17 04:30:57.534+0000] ClusterState: discovery-[configurationTimeout]->discovery conversation-id:2/13# payload:ConfigurationTimeoutState{remainingPings=3}
>> 2016-05-17 04:30:57.535+0000 DEBUG [o.n.c.p.h.HeartbeatState$1] [AsyncLog @ 2016-05-17 04:30:57.535+0000] HeartbeatState: start-[reset_send_heartbeat]->start conversation-id:2/13#
>> 2016-05-17 04:30:57.538+0000 INFO [o.n.c.c.NetworkSender] [AsyncLog @ 2016-05-17 04:30:57.537+0000] Attempting to connect from /172.31.35.147:0 to /172.31.33.173:5001
>> 2016-05-17 04:30:57.540+0000 INFO [o.n.c.c.NetworkSender] [AsyncLog @ 2016-05-17 04:30:57.540+0000] Failed to connect to /172.31.33.173:5001 due to: java.net.ConnectException: Connection refused
>> 2016-05-17 04:30:57.540+0000 DEBUG [o.n.c.p.c.ClusterState$2] [AsyncLog @ 2016-05-17 04:30:57.540+0000] ClusterState: discovery-[configurationRequest]->discovery from:cluster://172.31.35.147:5001 conversation-id:2/13# payload:ConfigurationRequestState{joiningId=2, joiningUri=cluster://172.31.35.147:5001}
>> 2016-05-17 04:30:58.420+0000 INFO [o.n.c.c.NetworkReceiver] [AsyncLog @ 2016-05-17 04:30:58.420+0000] cluster://172.31.35.147:47188 disconnected from me at cluster://172.31.35.147:5001
>> 2016-05-17 04:30:58.420+0000 INFO [o.n.c.c.NetworkReceiver] [AsyncLog @ 2016-05-17 04:30:58.420+0000] cluster://172.31.35.147:47188 disconnected from me at cluster://172.31.35.147:5001
>> 2016-05-17 04:30:58.434+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]: Starting check pointing...
>> 2016-05-17 04:30:58.438+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]: Starting store flush...
>> 2016-05-17 04:30:58.443+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]: Store flush completed
>> 2016-05-17 04:30:58.443+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]: Starting appending check point entry into the tx log...
>> 2016-05-17 04:30:58.447+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]: Appending check point entry into the tx log completed
>> 2016-05-17 04:30:58.447+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [1]: Check pointing completed
>> 2016-05-17 04:30:58.447+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [0]: Starting log pruning.
>> 2016-05-17 04:30:58.447+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [0]: Log pruning complete.
>> 2016-05-17 04:30:58.475+0000 INFO [o.n.k.i.DiagnosticsManager] --- STOPPING diagnostics START ---
>> 2016-05-17 04:30:58.475+0000 INFO [o.n.k.i.DiagnosticsManager] High Availability diagnostics
>> Member state:PENDING
>> State machines:
>>     AtomicBroadcastMessage:start
>>     AcceptorMessage:start
>>     ProposerMessage:start
>>     LearnerMessage:start
>>     HeartbeatMessage:start
>>     ElectionMessage:start
>>     SnapshotMessage:start
>>     ClusterMessage:discovery
>> Current timeouts:
>>     join:configurationTimeout{conversation-id=2/13#, timeout-count=29, created-by=2}
>> 2016-05-17 04:30:58.475+0000 INFO [o.n.k.i.DiagnosticsManager] --- STOPPING diagnostics END ---
>> 2016-05-17 04:30:58.475+0000 INFO [o.n.k.h.f.HighlyAvailableFacadeFactory] Shutdown started
>>
>> etc.
>>
>> Any insights are highly appreciated!!
>>
>> Thank you!
>> Dennis
>>
>> --
>> You received this message because you are subscribed to the Google Groups "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to neo4j+un...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
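
The "Connection refused" entries in the log extracts (towards /127.0.0.1:6001 and /172.31.33.173:5001) can be checked independently of Neo4j. Below is a minimal sketch in Python, assuming the addresses and ports given in the original message. The helper names `can_connect`, `check_all` and `format_report` are hypothetical and not part of Neo4j. It only tests whether a TCP connection can be opened and says nothing about the cluster protocol itself.

```python
import socket

# Addresses and ports copied from the original message; adjust for your setup.
HOSTS = ["172.31.33.173", "172.31.35.147"]
PORTS = [5001, 5002, 6001, 7474]


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_all(hosts, ports, timeout=2.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}


def format_report(results):
    """Turn the result mapping into one human-readable line per endpoint."""
    return [
        f"{host}:{port} {'open' if ok else 'refused or unreachable'}"
        for (host, port), ok in sorted(results.items())
    ]
```

Running `print("\n".join(format_report(check_all(HOSTS, PORTS))))` on each instance would show which ports accept connections from that machine. Note that the log shows the second instance trying to reach the master's data port at 127.0.0.1:6001 rather than at 172.31.33.173:6001, so a port that is open on the master may still be unreachable at the address the slave is actually using.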