Hi Konrad,

The master is an m1.xlarge instance. I checked the CPU load: it is about 5-10%.

Unfortunately we can't upgrade to 2.3.x due to the protobuf dependency (I tried, 
but ran into reflection exceptions during Tomcat startup).


On Friday, July 18, 2014 1:04:33 PM UTC+3, Konrad Malawski wrote:
>
> Hi Vitaliy,
> It seems the master is overloaded.
> Do you have JVM monitoring in place to see whether it's in a state of agony?
>
> In other news, 2.3.4 prioritises heartbeats, so missing heartbeat messages 
> (and thus false positives in failure detection) because of an overloaded 
> machine are less likely.
> I would recommend trying to upgrade (maybe you can upgrade your protobuf 
> dependency somehow?).
>
> You can also tweak the failure detector's timeout, but if the master is 
> barely keeping up anyway, that's not really a solution.
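>
> For reference, a minimal sketch of what tweaking that timeout could look 
> like. It assumes the detector that fires is the remote death watch one (the 
> "remote-watcher" in the logs below) and that the key names match the 
> akka-remote reference.conf for 2.2.x; verify them against the 
> reference.conf shipped with your version. The values are illustrative 
> assumptions, not recommendations:
>
> ```hocon
> akka.remote.watch-failure-detector {
>   # Assumed value: tolerate longer gaps between heartbeats (for example
>   # GC pauses or network hiccups) before marking the remote unreachable.
>   acceptable-heartbeat-pause = 20 s
>   # Assumed value: a higher phi threshold gives fewer false positives.
>   threshold = 12.0
> }
> ```
>
> Larger values only trade false positives for slower detection of real 
> failures.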
>
>
>
> On Mon, Jul 14, 2014 at 5:53 PM, Vitaliy Morarian <vmor...@gmail.com> wrote:
>
>> Hi,
>>
>> We have "MonitoringMaster" actor system and N "Metrics" actor systems.
>> They are deployed on AWS, and to make this work we substitute the 
>> public IP at runtime.
>>
>> Akka version: 2.2.4 (can't upgrade to 2.3.x due to the protobuf dependency)
>>
>> Config file:
>> akka {
>>   loglevel = INFO
>>   log-config-on-start = on
>>   debug {
>>     receive = on
>>     lifecyle = off
>>   }
>>   actor {
>>     provider = "akka.remote.RemoteActorRefProvider"
>>   }
>>   remote {
>>     enabled-transports = ["akka.remote.netty.tcp"]
>>     log-remote-lifecycle-events = INFO
>>     netty.tcp {
>>       hostname = "127.0.0.1" // but we substitute the real IP at runtime
>>     }
>>     secure-cookie = "#####"
>>     require-cookie = on
>>   }
>> }
>>
>> remote {
>>   untrusted-mode = on
>>   log-received-messages = off
>> }
>>
>> Everything works fine when we have fewer than 10 clients. Problems start 
>> to occur when more than 10 clients are "connecting" to the master (sometimes 
>> at 11, sometimes at 15, ...).
>> In this case we observe a cascade of exceptions (and it affects all 
>> Metrics systems):
>>
>>
>> *MonitoringMaster*:
>> [INFO] [07/14/2014 15:02:06.386] 
>> [MonitoringMaster-akka.actor.default-dispatcher-3] 
>> [akka://MonitoringMaster/user/master] Added producer Actor[akka.tcp://
>> metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552/user/metric-producer#-1020796025]
>>  
>> with meta InstanceMeta(InstanceGlobalId(us-east-1,i-14ffd83e),XXXX)
>> [WARN] [07/14/2014 15:03:03.023] 
>> [MonitoringMaster-akka.actor.default-dispatcher-19] 
>> [akka://MonitoringMaster/system/remote-watcher] Detected unreachable: 
>> [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]
>> [INFO] [07/14/2014 15:03:03.048] 
>> [MonitoringMaster-akka.actor.default-dispatcher-3] [Remoting] Address 
>> [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552] is 
>> now quarantined, all messages to this address will be delivered to dead 
>> letters.
>> [WARN] [07/14/2014 15:03:03.060] 
>> [MonitoringMaster-akka.actor.default-dispatcher-3] 
>> [akka://MonitoringMaster/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FMetrics%
>> 40ec2-54-88-77-195.compute-1.amazonaws.com%3A2552-1866/endpointWriter] 
>> AssociationError [akka.tcp://
>> monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551] -> 
>> [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]: 
>> Error [Invalid address: akka.tcp://
>> metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552] [
>> akka.remote.InvalidAssociation: Invalid address: akka.tcp://
>> metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552
>> Caused by: akka.remote.transport.Transport$InvalidAssociationException: 
>> The remote system has a UID that has been quarantined. Association aborted.
>> ]
>> [WARN] [07/14/2014 15:03:03.061] 
>> [MonitoringMaster-akka.actor.default-dispatcher-3] [Remoting] Tried to 
>> associate with unreachable remote address [akka.tcp://
>> metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]. Address is now 
>> gated for 60000 ms, all messages to this address will be delivered to dead 
>> letters. Reason: The remote system has a UID that has been quarantined. 
>> Association aborted.
>> [ERROR] [07/14/2014 15:03:06.205] 
>> [MonitoringMaster-akka.actor.default-dispatcher-19] 
>> [akka://MonitoringMaster/system/endpointManager/endpointWriter-akka.tcp%3A%2F%2FMetrics%
>> 40ec2-54-88-77-195.compute-1.amazonaws.com%3A2552-1867] AssociationError 
>> [akka.tcp://monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551] 
>> <- [akka.tcp://metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]: 
>> Error [Invalid address: akka.tcp://
>> metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552] [
>> akka.remote.InvalidAssociation: Invalid address: akka.tcp://
>> metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552
>> Caused by: akka.remote.transport.Transport$InvalidAssociationException: 
>> The remote system has quarantined this system. No further associations to 
>> the remote system are possible until this system is restarted.
>> ]
>> [WARN] [07/14/2014 15:03:06.205] 
>> [MonitoringMaster-akka.actor.default-dispatcher-19] [Remoting] Tried to 
>> associate with unreachable remote address [akka.tcp://
>> metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552]. Address is now 
>> gated for 60000 ms, all messages to this address will be delivered to dead 
>> letters. Reason: The remote system has quarantined this system. No further 
>> associations to the remote system are possible until this system is 
>> restarted.
>>
>>
>>
>> Sometimes I also see this exception:
>> [ERROR] [07/14/2014 15:02:47.544] 
>> [MonitoringMaster-akka.actor.default-dispatcher-12] [Remoting] Error 
>> encountered while processing system message acknowledgement [2, 3] ACK[2, 
>> {1, 0}] (akka.remote.transport.Transport$InvalidAssociationException)
>>
>>
>>
>> *Metrics*:
>> 2014-07-14 15:02:06,381  INFO [Metrics-akka.actor.default-dispatcher-17] 
>> d.e.m.MetricProducerActor - Successfully connected to master 
>> Actor[akka.tcp://
>> monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551/user/master#-530936949
>> ]
>> 2014-07-14 15:03:01,174  WARN [Metrics-akka.actor.default-dispatcher-15] 
>> a.r.RemoteWatcher - Detected unreachable: [akka.tcp://
>> monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551]
>> 2014-07-14 15:03:01,174  INFO [Metrics-akka.actor.default-dispatcher-15] 
>> Remoting - Address [akka.tcp://
>> monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551] is now 
>> quarantined, all messages to this address will be delivered to dead letters.
>> 2014-07-14 15:03:01,176 ERROR [Metrics-akka.actor.default-dispatcher-17] 
>> a.a.OneForOneStrategy - Master terminated, need to reconnect
>> java.lang.RuntimeException: Master terminated, need to reconnect //Got 
>> Terminated message
>>  at 
>> xxx.xxx.monitoring.MetricProducerActor$$anonfun$connected$1.applyOrElse(MetricProducerActor.scala:81)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>  at 
>> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:45)
>> at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:338)
>>  at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:470)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:455)
>>  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> at 
>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:385)
>>  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at 
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>  at 
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at 
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> 2014-07-14 15:03:06,204  WARN [Metrics-akka.actor.default-dispatcher-2] 
>> a.r.EndpointWriter - AssociationError [akka.tcp://
>> metr...@ec2-54-88-77-195.compute-1.amazonaws.com:2552] -> [akka.tcp://
>> monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551]: Error 
>> [Invalid address: akka.tcp://
>> monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551] [
>> akka.remote.InvalidAssociation: Invalid address: akka.tcp://
>> monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551
>> Caused by: akka.remote.transport.Transport$InvalidAssociationException: 
>> The remote system has a UID that has been quarantined. Association aborted.
>> ]
>> 2014-07-14 15:03:06,204  WARN [Metrics-akka.actor.default-dispatcher-2] 
>> Remoting - Tried to associate with unreachable remote address [akka.tcp://
>> monitoringmas...@ec2-54-82-6-7.compute-1.amazonaws.com:2551]. Address is 
>> now gated for 60000 ms, all messages to this address will be delivered to 
>> dead letters. Reason: The remote system has a UID that has been 
>> quarantined. Association aborted.
>>
>>
>> I'm curious why this happens. Our Metrics actor tries to reconnect to 
>> MonitoringMaster, but after it is successfully resolved, the master becomes 
>> unreachable again.
>>
>>
>>  Regards,
>> Vitaliy
>>
>>  -- 
>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>> >>>>>>>>>> Check the FAQ: 
>> http://doc.akka.io/docs/akka/current/additional/faq.html
>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "Akka User List" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to akka-user+...@googlegroups.com.
>> To post to this group, send email to akka...@googlegroups.com.
>> Visit this group at http://groups.google.com/group/akka-user.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Cheers,
> Konrad 'ktoso' Malawski
> hAkker @ Typesafe
>
> <http://typesafe.com>
>  
