Hi,

The way the cluster currently works is that the unreachable node has to be removed (by downing it) before a system with the same address/port is allowed to join the cluster. If you set auto-down to a low value and hold off restarting the "crashed" node until you see the master mark it as DOWN, does it work then?
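For that experiment, a minimal sketch of the config change could look like the fragment below. The 10s value is only an assumed example of "a low value", and the enclosing `akka { }` block is assumed because the quoted config shows only the `cluster { }` section.

```hocon
akka {
  cluster {
    seed-nodes = [
      "akka.tcp://adp@127.0.0.1:2552"
    ]
    # Assumed value for the test; the original config uses 120s.
    auto-down-unreachable-after = 10s
  }
}
```

With this in place, the crashed node would be restarted only after the log on 127.0.0.1:2552 shows it being marked as DOWN.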

What seems odd in your log is that 127.0.0.1:2552 suddenly marks the node as reachable again instead of just downing it. If the old node had been downed and removed correctly, the new one with the same address/port should be allowed to connect. There might be an issue with the failure detector and a mismatch between addresses and unique addresses (address:port:uid).

Would it be possible for you to package up a minimal project that we can use to reproduce this?

B/

On 4 November 2014 at 14:57:38, Behrad Zari (behr...@gmail.com) wrote:

In my three-node cluster (akka 2.3.6 - scala 2.10.4) with the config below

    cluster {
      seed-nodes = [
        "akka.tcp://adp@127.0.0.1:2552" // using one of the three as seed node
      ]
      auto-down-unreachable-after = 120s
    }

I `Ctrl+C` one of my nodes to simulate a crash/termination, and I see

    Remoting - Tried to associate with unreachable remote address [akka.tcp://adp@127.0.0.1:2553]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: /127.0.0.1:2553

but when I restart the process, its attempt to join is ignored and the nodes cannot interoperate. I keep seeing the following messages:

    Cluster Node [akka.tcp://adp@127.0.0.1:2552] - Existing member [UniqueAddress(akka.tcp://adp@127.0.0.1:2553,392261992)] is trying to join, ignoring
    13:36:18.964UTC INFO [adp-akka.actor.default-dispatcher-2] Cluster(akka://adp) - Cluster Node [akka.tcp://adp@127.0.0.1:2552] - Marking node(s) as REACHABLE [Member(address = akka.tcp://adp@127.0.0.1:2553, status = Up)]
    Cluster Node [akka.tcp://adp@127.0.0.1:2552] - Existing member [UniqueAddress(akka.tcp://adp@127.0.0.1:2553,392261992)] is trying to join, ignoring
    Cluster Node [akka.tcp://adp@127.0.0.1:2552] - Existing member [UniqueAddress(akka.tcp://adp@127.0.0.1:2553,392261992)] is trying to join, ignoring
    Cluster Node [akka.tcp://adp@127.0.0.1:2552] - Existing member [UniqueAddress(akka.tcp://adp@127.0.0.1:2553,392261992)] is trying to join, ignoring
    ...

I'd expect the cluster to reconnect after one of my nodes restarts :(

When I decrease "auto-down-unreachable-after", my crashed node is downed on my seed node, so it is quarantined and won't be able to rejoin after startup until both nodes restart.

I'm unsure what the correct pattern is for per-node restarts in a clustered deployment!?

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

--
Björn Antonsson
Typesafe – Reactive Apps on the JVM
twitter: @bantonsson