Are you aware of the importance of the first seed node, the one you have listed as the first element in the seed-nodes list? See the documentation.
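The documentation's point is that the first seed node is special: it is the only one allowed to join itself and form a new cluster, which is what prevents separate islands when starting from an empty cluster. Every node therefore has to agree on which address comes first. A minimal sketch of one way to get there with dynamically discovered addresses is below. `SeedNodeSelector`, its method names, and the plain `host:port` strings are hypothetical and not part of Akka or Hazelcast; the quiet period is whatever value suits your discovery mechanism.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not an Akka API): decides when to join and in what order. */
public class SeedNodeSelector {
    private final long quietMillis;
    private Set<String> lastSeen = new TreeSet<>();
    private long lastChangeMillis;

    public SeedNodeSelector(long quietMillis, long nowMillis) {
        this.quietMillis = quietMillis;
        this.lastChangeMillis = nowMillis;
    }

    /**
     * Record the latest discovery result. Returns true once the set of
     * discovered nodes is non-empty and has not changed for quietMillis.
     */
    public boolean observe(Collection<String> discovered, long nowMillis) {
        Set<String> current = new TreeSet<>(discovered);
        if (!current.equals(lastSeen)) {
            // The membership changed, so restart the quiet period.
            lastSeen = current;
            lastChangeMillis = nowMillis;
        }
        return !lastSeen.isEmpty() && nowMillis - lastChangeMillis >= quietMillis;
    }

    /**
     * Sorted snapshot of the last observed set. Two nodes that observed the
     * same set get the same list, and so the same first seed node.
     */
    public List<String> orderedSeedNodes() {
        return new ArrayList<>(lastSeen);
    }
}
```

Once `observe` returns true, each entry of `orderedSeedNodes()` would be turned into an `akka.actor.Address` and the whole list passed to `cluster.joinSeedNodes(...)`. This only helps if all nodes end up seeing the same set; a node that joins with a different view can still pick a different first seed node.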
You can get decent behavior if you wait with joining until the list of discovered nodes stabilizes, i.e. has not changed within X seconds. Then sort the list so that every node uses the same address as the first seed node, and call joinSeedNodes with that sorted list. To be completely safe, you must manually decide which node to use as the first seed node.

/Patrik

On Fri, 9 Sep 2016 at 20:32, kraythe <kray...@gmail.com> wrote:

> Greetings,
>
> We are having some problems with our cluster configuration that manifest
> themselves in the following log lines (redacted for confidentiality
> reasons).
>
> Sep 09 00:58:10 host1.mycompany.com application-9001.log: 2016-09-09 05:58:10 +0000 - [WARN] - [OrdersActor] akka://myCompany/user/OrdersActor/291 - (291) #recordTxns, sending 54 txns to UserActor took 0.0044229 seconds
> Sep 09 00:58:19 host1.mycompany.com application-9001.log: 2016-09-09 05:58:19 +0000 - [WARN] - [ShardRegion] akka.tcp://myCompany@10.8.1.169:2551/system/sharding/UserActor - Trying to register to coordinator at [None], but no acknowledgement. Total [54] buffered messages.
>
> I have traced this to the configuration of the cluster. We are running
> this on Amazon AWS and the code includes use of Hazelcast for finding the
> IPs of the other nodes (mostly because we have solved discovery for
> Hazelcast in our dynamic IP cluster). We retrieve the IPs of the other
> nodes in the cluster from Hazelcast and appropriately use them to create
> the Address object to use in the seed node. Once we have the seed nodes we
> have tried two mechanisms. First is to take the list of seed nodes and use
> them to join the cluster with cluster.joinSeedNodes(). Of course, not all
> machines come up and are discovered by Hazelcast at exactly the same
> instant so the first 3 nodes might come up first and use each other to join
> whereas by the time the 9th node comes up there are 9 seed nodes. When we
> start sending messages to cluster shared actors, we get the errors above.
> Also when a node goes down the system screams constantly that a seed node
> is gone. So I changed the code to pick a node at random and do a
> cluster.join() with that node instead. However, we have the same problem
> as above. However, when we first bring up one node then bring the others up
> one at a time, the problem goes away. Another symptom is that if we have the
> problem above and we terminate host1 then other nodes start propagating
> this behavior. Probably all the other nodes that were connected to host1.
> Apparently they can't heal to connect to another node. So this lends
> evidence to the multiple split brains theory.
>
> My theory is that by using all these seed nodes I am creating multiple
> split brains. So if you have 5 nodes, A, B, C, D, E and A connects to B, B
> to A, C to E, E to D, D to E then we have two clusters running that know
> nothing about each other. For some reason then the coordinators get
> confused about what is going on.
>
> Essentially the problem domain is this:
> 1) We don't know what ANY of the IPs are ahead of time.
> 2) We want the cluster to be whole.
> 3) If a single node leaves the cluster we would like the remaining nodes
> to recover.
>
> I would appreciate any insight anyone could provide on this and especially
> what may be the problem (I could be wrong), and how we can accomplish our
> goals. Note that I am not committed to using Hazelcast to find other nodes.
>
> Thanks in advance.
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to akka-user+unsubscr...@googlegroups.com.
> To post to this group, send email to akka-user@googlegroups.com.
> Visit this group at https://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.