Hi,

We have a "MonitoringMaster" actor system and N "Metrics" actor systems. They are deployed in AWS, and to make this work we substitute the public IP at runtime.
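For illustration, a runtime override of this kind can be sketched as below. This is a minimal example under assumptions, not our exact code: the object name `RemoteConfigSketch`, the helper `withPublicHost`, and the `PUBLIC_HOSTNAME` environment variable are hypothetical. It only uses `ConfigFactory.parseString`, `withFallback`, and `ConfigFactory.load` from the Typesafe Config library that ships with Akka.

```scala
import com.typesafe.config.{Config, ConfigFactory}

object RemoteConfigSketch {
  // Overrides only the remoting hostname; every other setting
  // falls back to the regular application.conf / reference.conf.
  def withPublicHost(publicHost: String): Config =
    ConfigFactory
      .parseString("akka.remote.netty.tcp.hostname = \"" + publicHost + "\"")
      .withFallback(ConfigFactory.load())

  def main(args: Array[String]): Unit = {
    // PUBLIC_HOSTNAME is a hypothetical variable used only in this sketch.
    val host = sys.env.getOrElse("PUBLIC_HOSTNAME", "127.0.0.1")
    val config = withPublicHost(host)
    // The resulting config would then be passed to ActorSystem("Metrics", config).
    println(config.getString("akka.remote.netty.tcp.hostname"))
  }
}
```

The point of the fallback chain is that the address each system advertises to its peers comes from this one overridden key, so it has to be an address the other side can actually reach.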
Akka version: 2.2.4 (we can't upgrade to 2.3.x due to a protobuf dependency)

Config file:

akka {
  loglevel = INFO
  log-config-on-start = on
  debug {
    receive = on
    lifecyle = off
  }
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    log-remote-lifecycle-events = INFO
    netty.tcp {
      hostname = "127.0.0.1" //but we substitute a real IP in runtime
    }
    secure-cookie = "#####"
    require-cookie = on
  }
}
remote {
  untrusted-mode = on
  log-received-messages = off
}

Everything works fine when we have fewer than 10 clients. The problem starts when more than 10 clients are "connecting" to the master (sometimes 11, sometimes 15, ...). In that case we observe a cascade of exceptions, and it affects all Metrics systems:

*MonitoringMaster*:

[INFO] [07/14/2014 15:02:06.386] [MonitoringMaster-akka.actor.default-dispatcher-3] [akka://MonitoringMaster/user/master] Added producer Actor[akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552/user/metric-producer#-1020796025] with meta InstanceMeta(InstanceGlobalId(us-east-1,i-14ffd83e),XXXX)
[WARN] [07/14/2014 15:03:03.023] [MonitoringMaster-akka.actor.default-dispatcher-19] [akka://MonitoringMaster/system/remote-watcher] Detected unreachable: [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]
[INFO] [07/14/2014 15:03:03.048] [MonitoringMaster-akka.actor.default-dispatcher-3] [Remoting] Address [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552] is now quarantined, all messages to this address will be delivered to dead letters.
[WARN] [07/14/2014 15:03:03.060] [MonitoringMaster-akka.actor.default-dispatcher-3] [akka://MonitoringMaster/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FMetrics%40ec2-54-88-77-195.compute-1.amazonaws.com%3A2552-1866/endpointWriter] AssociationError [akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551] -> [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]: Error [Invalid address: akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552] [
akka.remote.InvalidAssociation: Invalid address: akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system has a UID that has been quarantined. Association aborted.
]
[WARN] [07/14/2014 15:03:03.061] [MonitoringMaster-akka.actor.default-dispatcher-3] [Remoting] Tried to associate with unreachable remote address [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]. Address is now gated for 60000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.
[ERROR] [07/14/2014 15:03:06.205] [MonitoringMaster-akka.actor.default-dispatcher-19] [akka://MonitoringMaster/system/endpointManager/endpointWriter-akka.tcp%3A%2F%2FMetrics%40ec2-54-88-77-195.compute-1.amazonaws.com%3A2552-1867] AssociationError [akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551] <- [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]: Error [Invalid address: akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552] [
akka.remote.InvalidAssociation: Invalid address: akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.
]
[WARN] [07/14/2014 15:03:06.205] [MonitoringMaster-akka.actor.default-dispatcher-19] [Remoting] Tried to associate with unreachable remote address [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]. Address is now gated for 60000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.

Sometimes I also see this exception:

[ERROR] [07/14/2014 15:02:47.544] [MonitoringMaster-akka.actor.default-dispatcher-12] [Remoting] Error encountered while processing system message acknowledgement [2, 3] ACK[2, {1, 0}] (akka.remote.transport.Transport$InvalidAssociationException)

*Metrics*:

2014-07-14 15:02:06,381 INFO [Metrics-akka.actor.default-dispatcher-17] d.e.m.MetricProducerActor - Successfully connected to master Actor[akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551/user/master#-530936949]
2014-07-14 15:03:01,174 WARN [Metrics-akka.actor.default-dispatcher-15] a.r.RemoteWatcher - Detected unreachable: [akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551]
2014-07-14 15:03:01,174 INFO [Metrics-akka.actor.default-dispatcher-15] Remoting - Address [akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551] is now quarantined, all messages to this address will be delivered to dead letters.
2014-07-14 15:03:01,176 ERROR [Metrics-akka.actor.default-dispatcher-17] a.a.OneForOneStrategy - Master terminated, need to reconnect
java.lang.RuntimeException: Master terminated, need to reconnect //Got Terminated message
    at xxx.xxx.monitoring.MetricProducerActor$$anonfun$connected$1.applyOrElse(MetricProducerActor.scala:81)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:45)
    at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:338)
    at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:470)
    at akka.actor.ActorCell.invoke(ActorCell.scala:455)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:385)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
2014-07-14 15:03:06,204 WARN [Metrics-akka.actor.default-dispatcher-2] a.r.EndpointWriter - AssociationError [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552] -> [akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551]: Error [Invalid address: akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551] [
akka.remote.InvalidAssociation: Invalid address: akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system has a UID that has been quarantined. Association aborted.
]
2014-07-14 15:03:06,204 WARN [Metrics-akka.actor.default-dispatcher-2] Remoting - Tried to associate with unreachable remote address [akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551]. Address is now gated for 60000 ms, all messages to this address will be delivered to dead letters. Reason: The remote system has a UID that has been quarantined. Association aborted.

I'm curious why this happens. Our Metrics actor tries to reconnect to MonitoringMaster, but after it has been resolved successfully it becomes unreachable.

Regards,
Vitaliy