Hi Johannes,

On 9 December 2014 at 15:29:53, Johannes Berg (jberg...@gmail.com) wrote:
> Hi!
>
> I'm doing some load tests in our system and getting problems that some of my nodes are marked as unreachable even though the processes are up. I'm seeing it going a few times from reachable to unreachable and back a few times before staying unreachable saying connection gated for 5000ms and staying silently that way.
>
> Looking at the connections made to one of the seed nodes I see that I have several hundreds of connections from other nodes except the failing ones. Is this normal? There are several (hundreds) just between two nodes. When are connections formed between cluster nodes and when are they taken down?

Several hundred connections between two nodes seems very wrong. There should only be one connection between two nodes that communicate over Akka remoting or are part of a cluster. How many nodes do you have in your cluster?

If you are using cluster aware routers then there should be one connection between the router node and each routee node (it can be the same connection that is used for the cluster communication).

The connections between the nodes don't get torn down. They stay open and are reused for all remoting communication between the nodes.

> Also is there some limit on how many connections a node with default settings will accept?
>
> We have auto-down-unreachable-after = 10s set in our config, does this mean if the node is busy and doesn't respond in 10 seconds it becomes unreachable?
>
> Is there any reason why it would stay unreachable and not re-try to join the cluster?

The auto-down setting does just what it says. If the node is considered unreachable for 10 seconds, it will be moved to DOWN and won't be able to come back into the cluster.

The different states of the cluster and the settings are explained in the documentation.
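
For reference, the setting discussed above would normally sit under `akka.cluster` in `application.conf`. This is only a minimal sketch, assuming Akka 2.3.x and the 10s value quoted in the question; the commented-out `off` alternative is an illustration added here, not something taken from the original configuration:

```hocon
akka.cluster {
  # Value from the question: a node that stays unreachable for 10 seconds
  # is automatically moved to DOWN and cannot come back into the cluster.
  auto-down-unreachable-after = 10s

  # Assumed alternative: with automatic downing switched off, an
  # unreachable node is not moved to DOWN by this timer.
  # auto-down-unreachable-after = off
}
```

Note that this timer only starts once the node is already considered unreachable; it does not decide when a node becomes unreachable in the first place.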
http://doc.akka.io/docs/akka/2.3.7/common/cluster.html
http://doc.akka.io/docs/akka/2.3.7/scala/cluster-usage.html

If you are having problems with nodes becoming unreachable then you could check if you are doing one of these things:

1) sending overly large blobs as messages, which effectively block out the heartbeats going over the same connection
2) having long GC pauses that trigger the failure detector since nodes don't reply to heartbeats

B/

> We are using Akka 2.3.6 and using cluster aware routers quite much with a lot of remote messages going around.
>
> Anyone that can shed some light on this or that can point me at some documentation about these things?

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

--
Björn Antonsson
Typesafe – Reactive Apps on the JVM
twitter: @bantonsson