What was puzzling me was:

Cluster Node [akka.tcp://...Node4...] - Marking node(s) as REACHABLE [Member(address = akka.tcp://....Node1..., status = Up)]

but looking at the code revealed that this is only a notification that Node4 thinks that Node1 is reachable again. The ReachableMember event is fired when all nodes think it is reachable again, and that will not happen until Node3 is back in business or removed.

/Patrik

On Tue, Mar 10, 2015 at 3:43 PM, michaels <michael.schr...@atos.net> wrote:

> Hello Patrik,
>
> added the output of ClusterStatus below,
>
>
>>> ====== Test Steps: ======
>>> 1.) Start 4 JVMs (all on local host) - Nodes form a cluster - Leader =
>>> 1 - same information on all nodes (VisualVM MBean akka.cluster)
>>> 2.) With Sys-Internal Process Explorer Suspend Process of Node 1
>>> 3.) Looking with Java VisualVM on akka.cluster MBeans
>>> 4.) Waiting for all other nodes (2,3,4) to mark 1 as Unreachable. New
>>> Leader is now 2.
>>> 5.) With Sys-Internal Process Explorer Suspend Process of Node 3
>>> 6.) Waiting for all other working nodes (2,4) to mark 3 also as
>>> Unreachable. Leader is still 2.
>>> 7.) With Sys-Internal Process Explorer Resume Process of Node 1
>>>
>>> Now strange things happen/can be seen:
>>> JMX MBean akka.cluster:
>>> - Node 1: MemberStatus=Up, Leader = 1 / Unreachable = Node 3
>>> - Node 2: MemberStatus=Up, Leader = 2 / Unreachable = Node 1, 3
>>> - Node 4: MemberStatus=Up, Leader = 2 / Unreachable = Node 1, 3
>>>
>>> It seems there are multiple leaders in the cluster.
>>> Node 1 thinks almost everything is fine and believes it is the leader of
>>> the cluster.
>>> This state does not change, even after a long time...(30 minutes+, no
>>> application load on cluster, just the cluster running.)
>>>
>>
> Additional info from JMX MBean akka.cluster - ClusterStatus
> Node 1:
> "unreachable" : [{
>     "node" : "akka.tcp://....Node3",
>     "observed-by" : ["akka.tcp://...Node1"]
>   }
>
> Node4:
> "unreachable" : [{
>     "node" : "akka.tcp://...Node1",
>     "observed-by" : ["akka.tcp://...Node3"]
>   }, {
>     "node" : "akka.tcp://...Node3",
>     "observed-by" : ["akka.tcp://...Node1", "akka.tcp://....Node2",
>       "akka.tcp://...Node4"]
>   }
> ]
>
> So...Node 4 still believes Node 1 is unreachable, because the - now
> unreachable - Node 3 has told it so.
>
>
>> There can be multiple leaders. The leader is simply the member with lowest
>> address among the currently reachable members (as seen from a specific
>> node). There are some more rules regarding member status, but that is
>> irrelevant for this.
>>
>
> Thanks for the clarification. If you don't see a problem with that, I will
> not do it either :-)
>
>
>
>>> - Is the JMX MBean akka.cluster showing wrong information in this case?
>>> As pointed out above there is no ReachableMember event after "the marking
>>> node as REACHABLE" trace in this case. Maybe the component preparing the
>>> MBean information is also missing the event?
>>>
>>
>> That is interesting. You should receive the ReachableMember. The MBean
>> also subscribes to these events. If you look at clusterStatus you should
>> see more information about who thinks that it is still unreachable.
>>
>
>> /Patrik
>>
>
> Thanks for the hint - I have not yet discovered the observed-by part.
>
> The event is not received. And I believe it is also not received by the
> MBean.
> However when Step 8 is performed (see example in original post), we
> immediately receive the event after the "marking node as REACHABLE"
> trace. (And also the MBean receives it, because afterwards there are no
> more unreachable nodes in the list anywhere).
>
> Maybe there might be reasons why Node 4 (and Node 2) keep the reachable
> Node 1 as Unreachable (so they don't want to emit the event to the
> listeners like our actor or the MBean) or....?
>
> Or the trace "Ignoring received gossip from unreachable" is a hint?
> Shouldn't the algorithm trust Node 1 more than Node 3 (Node 1 which was
> Unreachable but it is now in fact talking to me...than Node 3 who has told
> me something a long time ago but is now unreachable).
>
>
> Best regards,
>
> Michael
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to akka-user+unsubscr...@googlegroups.com.
> To post to this group, send email to akka-user@googlegroups.com.
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>

--

Patrik Nordwall
Typesafe <http://typesafe.com/> - Reactive apps on the JVM
Twitter: @patriknw
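
For readers who want to replay the situation from this thread on paper, here is a minimal sketch in Python. It is not Akka code and does not use any Akka API: the `View` class, its methods, and the short names `Node1`..`Node4` are hypothetical, and ordering members by plain string comparison is a simplifying assumption. It only models the two rules stated above: the leader is the lowest address among the members a given node considers reachable, and a member counts as reachable again only once no observer still reports it as unreachable.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class View:
    """One node's local picture of the cluster (illustrative model only)."""

    members: set[str]
    # subject -> observers that currently report the subject as unreachable
    unreachable: dict[str, set[str]] = field(default_factory=dict)

    def is_reachable(self, node: str) -> bool:
        # Reachable in the aggregate only if no observer record remains.
        return not self.unreachable.get(node)

    def leader(self) -> str:
        # Lowest address among the members this view considers reachable.
        return min(m for m in self.members if self.is_reachable(m))

    def remove_member(self, node: str) -> list[str]:
        """Remove a member together with every record it observed.

        Returns the members that became reachable as a result, i.e. the
        points at which this model would publish a reachability event.
        """
        before = {m for m in self.members if not self.is_reachable(m)}
        self.members.discard(node)
        self.unreachable.pop(node, None)
        for observers in self.unreachable.values():
            observers.discard(node)
        return sorted(
            m for m in before if m in self.members and self.is_reachable(m)
        )


# State after step 7, taken from the ClusterStatus output quoted above.
all_nodes = {"Node1", "Node2", "Node3", "Node4"}
node1 = View(set(all_nodes), {"Node3": {"Node1"}})
node4 = View(
    set(all_nodes),
    {"Node1": {"Node3"}, "Node3": {"Node1", "Node2", "Node4"}},
)

leader_seen_by_node1 = node1.leader()            # "Node1"
leader_seen_by_node4 = node4.leader()            # "Node2"
became_reachable = node4.remove_member("Node3")  # ["Node1"]
leader_after_removal = node4.leader()            # "Node1"
```

In this model Node 4 keeps Node 1 in its unreachable set for as long as the record observed by Node 3 exists, so the two views disagree about the leader. Once Node 3 is removed from Node 4's view, the stale record goes with it and both views agree on Node 1 again. The other way out mentioned above, Node 3 coming back and retracting its own observation, is not modeled here.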