I'm following up on this topic after upgrading to Akka 2.3.15. I'm 
reasonably confident that the issue is the result of using Akka alongside 
another library that causes the netty dependency to be upgraded from 
3.9.2.Final to 3.10.0.Final. For now I have removed the dependency on the 
newer version of netty, but I thought I'd report what I was seeing in the 
logs. Five nodes run for a few hours with no issue, and then two 
nodes fall out of the cluster. Here are the logs from each node:

IP: 172.16.120.160
13:59:57.252 INFO  [geyser-akka.actor.default-dispatcher-6] AngelOfTheAbyss 
- Unreachable member (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Up)|Size:4)
13:59:58.541 INFO  [geyser-akka.actor.default-dispatcher-306] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Up)|Size:3)
14:00:11.540 INFO  [geyser-akka.actor.default-dispatcher-282] 
AngelOfTheAbyss - Member removed (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Removed)|Size:3)
14:00:11.541 INFO  [geyser-akka.actor.default-dispatcher-282] 
AngelOfTheAbyss - Member removed (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Removed)|Size:3)
14:00:11.545 WARN  [geyser-akka.remote.default-remote-dispatcher-8] 
Remoting - Association to [akka.tcp://geyser@172.16.119.42:7000] having UID 
[-477546934] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:11.546 WARN  [geyser-akka.remote.default-remote-dispatcher-8] 
Remoting - Association to [akka.tcp://geyser@172.16.125.13:7000] having UID 
[-1471771858] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.

IP: 172.16.119.42
13:59:57.326 WARN  [geyser-cluster-dispatcher-15] a.c.ClusterCoreDaemon - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Marking node(s) as 
UNREACHABLE [Member(address = akka.tcp://geyser@172.16.125.13:7000, status 
= Up)]
13:59:57.328 INFO  [geyser-akka.actor.default-dispatcher-46] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Up)|Size:4)
14:00:07.345 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Leader is 
auto-downing unreachable node [akka.tcp://geyser@172.16.125.13:7000]
14:00:07.346 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Marking unreachable 
node [akka.tcp://geyser@172.16.125.13:7000] as [Down]
14:00:07.694 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Shutting down...
14:00:07.695 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Successfully shut down
14:00:07.703 WARN  [geyser-akka.remote.default-remote-dispatcher-27] 
Remoting - Association to [akka.tcp://geyser@172.16.125.13:7000] having UID 
[-1471771858] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:10.360 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.16.119.46:7000] has failed, address is now gated for 
[5000] ms. Reason: [Disassociated]
14:00:11.361 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.17.110.139:7000] has failed, address is now gated 
for [5000] ms. Reason: [Disassociated]
14:00:11.544 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.16.120.160:7000] has failed, address is now gated 
for [5000] ms. Reason: [Disassociated]

IP: 172.16.125.13
13:59:57.244 WARN  [geyser-cluster-dispatcher-17] a.c.ClusterCoreDaemon - 
Cluster Node [akka.tcp://geyser@172.16.125.13:7000] - Marking node(s) as 
UNREACHABLE [Member(address = akka.tcp://geyser@172.16.119.42:7000, status 
= Up)]
13:59:57.245 INFO  [geyser-akka.actor.default-dispatcher-61] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Up)|Size:4)
13:59:57.326 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.125.13:7000] - Ignoring received 
gossip status from unreachable 
[UniqueAddress(akka.tcp://geyser@172.16.119.42:7000,-477546934)]
14:00:07.711 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.16.119.42:7000] has failed, address is now gated for 
[5000] ms. Reason: [Disassociated]
14:00:09.243 INFO  [geyser-cluster-dispatcher-17] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.125.13:7000] - Shutting down...
14:00:09.246 INFO  [geyser-cluster-dispatcher-17] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.125.13:7000] - Successfully shut down
14:00:09.253 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
Remoting - Association to [akka.tcp://geyser@172.16.119.42:7000] having UID 
[-477546934] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:10.361 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.16.119.46:7000] has failed, address is now gated for 
[5000] ms. Reason: [Disassociated]
14:00:10.394 ERROR [geyser-akka.remote.default-remote-dispatcher-26] 
a.r.EndpointWriter - AssociationError 
[akka.tcp://geyser@172.16.125.13:7000] <- 
[akka.tcp://geyser@172.16.119.46:7000]: Error [Invalid address: 
akka.tcp://geyser@172.16.119.46:7000] [
akka.remote.InvalidAssociation: Invalid address: 
akka.tcp://geyser@172.16.119.46:7000
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The 
remote system has quarantined this system. No further associations to the 
remote system are possible until this system is restarted.
]
14:00:10.394 WARN  [geyser-akka.remote.default-remote-dispatcher-26] 
Remoting - Tried to associate with unreachable remote address 
[akka.tcp://geyser@172.16.119.46:7000]. Address is now gated for 5000 ms, 
all messages to this address will be delivered to dead letters. Reason: 
[The remote system has quarantined this system. No further associations to 
the remote system are possible until this system is restarted.]
14:00:11.364 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.17.110.139:7000] has failed, address is now gated 
for [5000] ms. Reason: [Disassociated]
14:00:11.546 WARN  [geyser-akka.remote.default-remote-dispatcher-26] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.16.120.160:7000] has failed, address is now gated 
for [5000] ms. Reason: [Disassociated]


IP: 172.16.119.46
13:59:57.358 INFO  [geyser-akka.actor.default-dispatcher-2] AngelOfTheAbyss 
- Unreachable member (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Up)|Size:4)
13:59:58.329 INFO  [geyser-akka.actor.default-dispatcher-7] AngelOfTheAbyss 
- Unreachable member (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Up)|Size:3)
14:00:07.372 INFO  [geyser-cluster-dispatcher-21] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.46:7000] - Leader is 
auto-downing unreachable node [akka.tcp://geyser@172.16.119.42:7000]
14:00:07.373 INFO  [geyser-cluster-dispatcher-21] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.46:7000] - Marking unreachable 
node [akka.tcp://geyser@172.16.119.42:7000] as [Down]
14:00:08.342 INFO  [geyser-cluster-dispatcher-21] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.46:7000] - Leader is 
auto-downing unreachable node [akka.tcp://geyser@172.16.125.13:7000]
14:00:08.342 INFO  [geyser-cluster-dispatcher-21] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.46:7000] - Marking unreachable 
node [akka.tcp://geyser@172.16.125.13:7000] as [Down]
14:00:10.352 INFO  [geyser-cluster-dispatcher-21] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.46:7000] - Leader is removing 
unreachable node [akka.tcp://geyser@172.16.125.13:7000]
14:00:10.353 INFO  [geyser-cluster-dispatcher-21] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.46:7000] - Leader is removing 
unreachable node [akka.tcp://geyser@172.16.119.42:7000]
14:00:10.353 INFO  [geyser-akka.actor.default-dispatcher-2] AngelOfTheAbyss 
- Member removed (Member(address = akka.tcp://geyser@172.16.119.42:7000, 
status = Removed)|Size:3)
14:00:10.353 INFO  [geyser-akka.actor.default-dispatcher-2] AngelOfTheAbyss 
- Member removed (Member(address = akka.tcp://geyser@172.16.125.13:7000, 
status = Removed)|Size:3)
14:00:10.353 INFO  [geyser-akka.actor.default-dispatcher-5] 
a.c.p.ClusterSingletonManager - Member removed 
[akka.tcp://geyser@172.16.119.42:7000]
14:00:10.354 INFO  [geyser-akka.actor.default-dispatcher-5] 
a.c.p.ClusterSingletonManager - Member removed 
[akka.tcp://geyser@172.16.125.13:7000]
14:00:10.356 WARN  [geyser-akka.remote.default-remote-dispatcher-9] 
Remoting - Association to [akka.tcp://geyser@172.16.119.42:7000] having UID 
[-477546934] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:10.356 WARN  [geyser-akka.remote.default-remote-dispatcher-9] 
Remoting - Association to [akka.tcp://geyser@172.16.125.13:7000] having UID 
[-1471771858] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:10.385 WARN  [geyser-akka.remote.default-remote-dispatcher-10] 
a.r.EndpointWriter - AssociationError 
[akka.tcp://geyser@172.16.119.46:7000] -> 
[akka.tcp://geyser@172.16.125.13:7000]: Error [Invalid address: 
akka.tcp://geyser@172.16.125.13:7000] [
akka.remote.InvalidAssociation: Invalid address: 
akka.tcp://geyser@172.16.125.13:7000
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The 
remote system has a UID that has been quarantined. Association aborted.
]
14:00:10.386 INFO  [geyser-akka.remote.default-remote-dispatcher-27] 
Remoting - Quarantined address [akka.tcp://geyser@172.16.125.13:7000] is 
still unreachable or has not been restarted. Keeping it quarantined.


IP: 172.17.110.139
13:59:57.544 INFO  [geyser-akka.actor.default-dispatcher-187] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Up)|Size:4)
13:59:58.359 INFO  [geyser-akka.actor.default-dispatcher-178] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Up)|Size:3)
14:00:11.358 INFO  [geyser-akka.actor.default-dispatcher-32] 
AngelOfTheAbyss - Member removed (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Removed)|Size:3)
14:00:11.359 INFO  [geyser-akka.actor.default-dispatcher-32] 
AngelOfTheAbyss - Member removed (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Removed)|Size:3)
14:00:11.361 WARN  [geyser-akka.remote.default-remote-dispatcher-27] 
Remoting - Association to [akka.tcp://geyser@172.16.119.42:7000] having UID 
[-477546934] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:11.361 WARN  [geyser-akka.remote.default-remote-dispatcher-27] 
Remoting - Association to [akka.tcp://geyser@172.16.125.13:7000] having UID 
[-1471771858] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.

Is there anything abnormal in the logs?
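
For anyone wanting to test the same hypothesis, here is a minimal sketch of 
holding netty at 3.9.2.Final while the other library stays on the classpath. 
It assumes an sbt build (the thread does not say which build tool is in use) 
and the io.netty:netty coordinates that Netty 3.x is published under; adjust 
both to the actual build.

```scala
// build.sbt (hypothetical build file; sbt is assumed, not confirmed by the thread)

// Force the resolved netty version, even if another library transitively
// requests 3.10.0.Final. The override applies to the whole dependency graph.
dependencyOverrides += "io.netty" % "netty" % "3.9.2.Final"
```

Checking the resolved classpath afterwards should confirm that a single netty 
jar, at 3.9.2.Final, is present.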

Regards,
Ben


On Wednesday, March 23, 2016 at 9:33:02 AM UTC-4, Benjamin Black wrote:
>
> I look forward to trying out the new version. I'm not totally sure it is the 
> same issue, as I'm seeing this happen on a cluster where no node is being 
> restarted. I shall continue to investigate what has changed on my side, 
> because I wasn't seeing this before I upgraded other libraries.
>
> On Wednesday, March 23, 2016 at 2:08:10 AM UTC-4, Patrik Nordwall wrote:
>>
>> We have fixed the issue that shows up as 
>> "Error encountered while processing system message acknowledgement 
>> buffer: [-1 {}] ack: ACK[6, {}]"
>>
>> https://github.com/akka/akka/pull/20093
>>
>> It will be released in 2.4.3 and 2.3.15, probably by end of next week.
>>
>> /Patrik
>> On Tue, 22 March 2016 at 23:39, Guido Medina <oxy...@gmail.com> wrote:
>>
>>> Yeah, sorry, I thought it was related to a rolling restart.
>>>
>>> As for Netty, I'm using a *not-yet-published* Netty build with the following 
>>> fixes:
>>>
>>> https://github.com/netty/netty/issues?q=milestone%3A3.10.6.Final+is%3Aclosed
>>>
>>> You can just get it from Git and:
>>>
>>> $ git checkout 3.10
>>> $ mvn versions:set -DnewVersion=3.10.6.Final -DgenerateBackupPoms=false
>>> $ mvn clean install
>>>
>>> And see if your problem goes away,
>>>
>>> Guido.
>>>
>>> On Tuesday, March 22, 2016 at 10:27:26 PM UTC, Benjamin Black wrote:
>>>>
>>>> Hi Guido, yes, I'm aware of the leaving-cluster conversation, as I 
>>>> started it :-) This is a separate issue. I am observing this behavior whilst 
>>>> the cluster seems stable, with no nodes being added or removed. I suspect that 
>>>> this issue first appeared when I upgraded a different library that 
>>>> brought in a new version of the netty library.
>>>>
>>>> On Tuesday, March 22, 2016 at 6:23:14 PM UTC-4, Guido Medina wrote:
>>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> You have nodes with predefined ports. One thing I do that 
>>>>> eliminates that problem is to set the port only on my seed node(s); 
>>>>> the rest just get a dynamic, available port, so a node comes back on a 
>>>>> different port after a rolling restart.
>>>>>
>>>>> I suspect you are doing a rolling restart, right? If so, you need to wait 
>>>>> for the node with that address to completely leave the cluster (I'm also 
>>>>> doing that): basically, you terminate your system when you receive the 
>>>>> *MemberRemoved* message for the *_self_* address.
>>>>>
>>>>> I think I saw a discussion related to quarantined nodes when they are 
>>>>> re-joining using the same address, not sure if it was here or in an actual 
>>>>> Git ticket.
>>>>>
>>>>> HTH,
>>>>>
>>>>> Guido.
>>>>>
>>

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
