Hi Johannes,

On 9 December 2014 at 15:29:53, Johannes Berg (jberg...@gmail.com) wrote:

Hi! I'm running some load tests on our system and some of my nodes get marked 
as unreachable even though the processes are up. I see a node flip between 
reachable and unreachable a few times before it stays unreachable, with a 
message saying the connection is gated for 5000 ms, and it then silently stays 
that way.

Looking at the connections made to one of the seed nodes, I see several hundred 
connections from the other nodes, except from the failing ones. Is this normal? 
There are several (hundreds) just between two nodes. When are connections 
formed between cluster nodes, and when are they taken down?


Several hundred connections between two nodes seems very wrong. There should 
only be one connection between two nodes that communicate over Akka remoting or 
are part of a cluster. How many nodes do you have in your cluster?

If you are using cluster-aware routers, then there should be one connection 
between the router node and each routee node (it can be the same connection 
that is used for the cluster communication).

The connections between the nodes are not torn down. They stay open and are 
reused for all remoting communication between the nodes.
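
One way to check how many connections a node really holds per peer is to count 
them from `netstat -tn` output. The following is only a minimal sketch: the 
class ConnectionCounter is a hypothetical helper (not part of Akka), it assumes 
the Linux netstat column layout, and the addresses in the sample are made up 
(2552 is used as the remoting port for illustration).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Akka: counts ESTABLISHED TCP connections
// per remote host from Linux-style `netstat -tn` output lines.
final class ConnectionCounter {

    static Map<String, Integer> countByRemoteHost(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Assumed columns: Proto Recv-Q Send-Q Local Foreign State
            boolean established = cols.length >= 6
                    && cols[0].startsWith("tcp")
                    && "ESTABLISHED".equals(cols[5]);
            if (!established) {
                continue; // skip header lines and sockets in other states
            }
            String foreign = cols[4];
            int portSep = foreign.lastIndexOf(':');
            String host = portSep > 0 ? foreign.substring(0, portSep) : foreign;
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample input; in practice, feed in real netstat output.
        List<String> sample = Arrays.asList(
                "Proto Recv-Q Send-Q Local Address Foreign Address State",
                "tcp 0 0 10.0.0.1:2552 10.0.0.2:40001 ESTABLISHED",
                "tcp 0 0 10.0.0.1:2552 10.0.0.2:40002 ESTABLISHED",
                "tcp 0 0 10.0.0.1:2552 10.0.0.3:40003 ESTABLISHED",
                "tcp 0 0 10.0.0.1:2552 10.0.0.3:40004 TIME_WAIT");
        System.out.println(countByRemoteHost(sample)); // prints {10.0.0.2=2, 10.0.0.3=1}
    }
}
```

If a single peer shows up with hundreds of established connections, that is 
worth investigating before looking at anything else.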

Also is there some limit on how many connections a node with default settings 
will accept?

We have auto-down-unreachable-after = 10s set in our config, does this mean if 
the node is busy and doesn't respond in 10 seconds it becomes unreachable?

Is there any reason why it would stay unreachable and not re-try to join the 
cluster?


The auto-down setting does just what it says. If the node is considered 
unreachable for 10 seconds, it will be moved to DOWN and won't be able to come 
back into the cluster. The different states of the cluster and the settings are 
explained in the documentation.

http://doc.akka.io/docs/akka/2.3.7/common/cluster.html
http://doc.akka.io/docs/akka/2.3.7/scala/cluster-usage.html
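
To make the auto-down behaviour concrete, here is a toy model of it in plain 
Java. It is a sketch of the semantics described above, not Akka's 
implementation: AutoDownModel, Observation and downedAt are hypothetical names, 
and the model assumes that the 10-second timer restarts whenever the node is 
seen as reachable again.

```java
import java.util.Arrays;
import java.util.List;

// Toy model, not Akka code: a node is moved to DOWN once it has been
// continuously unreachable for the configured duration, and DOWN is final.
final class AutoDownModel {

    /** One reachability observation for a single node. */
    static final class Observation {
        final long atMillis;
        final boolean reachable;

        Observation(long atMillis, boolean reachable) {
            this.atMillis = atMillis;
            this.reachable = reachable;
        }
    }

    /**
     * Returns the time (ms) at which the node would be marked DOWN, or -1 if
     * that never happens before endMillis. The timeline must be sorted by time.
     */
    static long downedAt(List<Observation> timeline, long autoDownAfterMillis, long endMillis) {
        long unreachableSince = -1;
        for (Observation o : timeline) {
            // The timer may already have fired before this observation arrived.
            if (unreachableSince >= 0 && o.atMillis - unreachableSince >= autoDownAfterMillis) {
                return unreachableSince + autoDownAfterMillis;
            }
            if (o.reachable) {
                unreachableSince = -1;          // reachable again: timer is cancelled
            } else if (unreachableSince < 0) {
                unreachableSince = o.atMillis;  // start of a new unreachable period
            }
        }
        if (unreachableSince >= 0 && endMillis - unreachableSince >= autoDownAfterMillis) {
            return unreachableSince + autoDownAfterMillis;
        }
        return -1;
    }

    public static void main(String[] args) {
        // A node that flaps twice and then stays unreachable from t = 12 s.
        List<Observation> flapping = Arrays.asList(
                new Observation(0, false),
                new Observation(4_000, true),
                new Observation(6_000, false),
                new Observation(9_000, true),
                new Observation(12_000, false));
        System.out.println(downedAt(flapping, 10_000, 30_000)); // prints 22000
    }
}
```

In this model the short unreachable periods do no harm, but the first one that 
lasts 10 seconds ends with the node being DOWN for good, which matches the 
"flapping, then staying unreachable" pattern.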

If you are having problems with nodes becoming unreachable, then you could 
check whether you are doing one of these things:
1) sending very large blobs as messages, which effectively blocks the 
heartbeats going over the same connection
2) having long GC pauses that trigger the failure detector, since the nodes 
don't reply to heartbeats in time
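
For the second point, a rough first check is to sample how much time the JVM 
spends in garbage collection while the load test runs. The sketch below uses 
the standard java.lang.management API; GcTimeProbe is a hypothetical helper 
name and the one-second sampling window is an arbitrary choice. It only shows 
accumulated GC time, so GC logging (for example -verbose:gc) is still needed 
to see the length of individual pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reports accumulated GC time for the current JVM.
final class GcTimeProbe {

    /** Sum of the accumulated collection time (ms) over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long before = totalGcMillis();
        Thread.sleep(1_000); // sampling window; run this inside the loaded JVM
        long spent = totalGcMillis() - before;
        System.out.println("GC time during the last second: " + spent + " ms");
    }
}
```

If a large share of a sampling window is spent in GC, missed heartbeats 
become a plausible explanation for the unreachable nodes.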

B/

We are using Akka 2.3.6 and make heavy use of cluster-aware routers, with a lot 
of remote messages going around.

Anyone that can shed some light on this or that can point me at some 
documentation about these things?
--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

-- 
Björn Antonsson
Typesafe – Reactive Apps on the JVM
twitter: @bantonsson
