Re: [akka-user] Re: Node quarantined

2016-04-29 Thread Benjamin Black
This is the latest version of Akka for Java 7.


-- 
>>  Read the docs: http://akka.io/docs/
>>  Check the FAQ: 
>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>  Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to akka-user+unsubscr...@googlegroups.com.
To post to this group, send email to akka-user@googlegroups.com.
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.


Re: [akka-user] Re: Node quarantined

2016-04-29 Thread Patrik Nordwall
There can be several reasons, but a good start is to use the latest Akka
version.


Re: [akka-user] Re: Node quarantined

2016-04-28 Thread Guido Medina
Hi Ben,

In my experience Netty 3 doesn't get much love, and issues are barely 
fixed. As I mentioned before, I'm running my own internally built Netty 
3.10.6. Also, 3.10.0 is not even a good version; if you want, force your 
version to 3.10.5.Final until they release 3.10.6.Final, which has nice 
fixes.

or

you could get my branch, set the version to whatever is comfortable for you, 
and build your own Netty.

My branch: https://github.com/guidomedina/netty/commits/3.10-SFS

It includes the following milestone: 
https://github.com/netty/netty/issues?q=milestone%3A3.10.6.Final+is%3Aclosed

plus some minor fixes I added myself. Of interest, there is a race 
condition fixed in 3.10.6, and I saw another between 3.10.0 and 3.10.5 
which might be causing the issue you are experiencing.

HTH,

Guido.


Re: [akka-user] Re: Node quarantined

2016-04-28 Thread Benjamin Black
I'm following up on this topic after upgrading to Akka 2.3.15. I'm 
reasonably confident that the issue is the result of using Akka along with 
another library that causes the Netty dependency to be upgraded from 
3.9.2.Final to 3.10.0.Final. For now I have removed the dependency on the 
newer version of Netty, but I thought I'd report what I was seeing in the 
logs. The five nodes run for a few hours with no issue, and then two 
nodes fall out of the cluster. Here are the logs from each node:

IP: 160
13:59:57.252 INFO  [geyser-akka.actor.default-dispatcher-6] AngelOfTheAbyss 
- Unreachable member (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Up)|Size:4)
13:59:58.541 INFO  [geyser-akka.actor.default-dispatcher-306] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Up)|Size:3)
14:00:11.540 INFO  [geyser-akka.actor.default-dispatcher-282] 
AngelOfTheAbyss - Member removed (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Removed)|Size:3)
14:00:11.541 INFO  [geyser-akka.actor.default-dispatcher-282] 
AngelOfTheAbyss - Member removed (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Removed)|Size:3)
14:00:11.545 WARN  [geyser-akka.remote.default-remote-dispatcher-8] 
Remoting - Association to [akka.tcp://geyser@172.16.119.42:7000] having UID 
[-477546934] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:11.546 WARN  [geyser-akka.remote.default-remote-dispatcher-8] 
Remoting - Association to [akka.tcp://geyser@172.16.125.13:7000] having UID 
[-1471771858] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.

IP: 42
13:59:57.326 WARN  [geyser-cluster-dispatcher-15] a.c.ClusterCoreDaemon - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Marking node(s) as 
UNREACHABLE [Member(address = akka.tcp://geyser@172.16.125.13:7000, status 
= Up)]
13:59:57.328 INFO  [geyser-akka.actor.default-dispatcher-46] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.125.13:7000, status = Up)|Size:4)
14:00:07.345 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Leader is 
auto-downing unreachable node [akka.tcp://geyser@172.16.125.13:7000]
14:00:07.346 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Marking unreachable 
node [akka.tcp://geyser@172.16.125.13:7000] as [Down]
14:00:07.694 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Shutting down...
14:00:07.695 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.119.42:7000] - Successfully shut down
14:00:07.703 WARN  [geyser-akka.remote.default-remote-dispatcher-27] 
Remoting - Association to [akka.tcp://geyser@172.16.125.13:7000] having UID 
[-1471771858] is irrecoverably failed. UID is now quarantined and all 
messages to this UID will be delivered to dead letters. Remote actorsystem 
must be restarted to recover from this situation.
14:00:10.360 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.16.119.46:7000] has failed, address is now gated for 
[5000] ms. Reason: [Disassociated]
14:00:11.361 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.17.110.139:7000] has failed, address is now gated 
for [5000] ms. Reason: [Disassociated]
14:00:11.544 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.16.120.160:7000] has failed, address is now gated 
for [5000] ms. Reason: [Disassociated]

IP: 13
13:59:57.244 WARN  [geyser-cluster-dispatcher-17] a.c.ClusterCoreDaemon - 
Cluster Node [akka.tcp://geyser@172.16.125.13:7000] - Marking node(s) as 
UNREACHABLE [Member(address = akka.tcp://geyser@172.16.119.42:7000, status 
= Up)]
13:59:57.245 INFO  [geyser-akka.actor.default-dispatcher-61] 
AngelOfTheAbyss - Unreachable member (Member(address = 
akka.tcp://geyser@172.16.119.42:7000, status = Up)|Size:4)
13:59:57.326 INFO  [geyser-cluster-dispatcher-15] Cluster(akka://geyser) - 
Cluster Node [akka.tcp://geyser@172.16.125.13:7000] - Ignoring received 
gossip status from unreachable 
[UniqueAddress(akka.tcp://geyser@172.16.119.42:7000,-477546934)]
14:00:07.711 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
a.r.ReliableDeliverySupervisor - Association with remote system 
[akka.tcp://geyser@172.16.119.42:7000] has failed, address is now gated 

Re: [akka-user] Re: Node quarantined

2016-03-23 Thread Benjamin Black
I look forward to trying out the new version. I'm not totally sure it is the 
same issue, as I'm seeing this happen on a cluster where no node is being 
restarted. I shall continue to investigate what has changed on my side, 
because I wasn't seeing this before I upgraded other libraries.


Re: [akka-user] Re: Node quarantined

2016-03-23 Thread Guido Medina
*Correction:* Set akka.remote.netty.tcp.port = 0 only for non-seed nodes.


Re: [akka-user] Re: Node quarantined

2016-03-23 Thread Guido Medina
Hi Benjamin,

From what I could understand of the issue, this is happening only to nodes 
that rejoined the cluster under the same address (host and port), so I 
believe that setting

akka.remote.netty.tcp.port = 0

should solve the problem in the meantime.

Cheers,

Guido.
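
A minimal sketch of how that setting might look in the application.conf of a
non-seed node, assuming Akka 2.3 classic remoting. The host name `seed-host`
is a hypothetical placeholder; the system name `geyser` and port 7000 are
taken from the logs posted in this thread.

```hocon
akka {
  remote.netty.tcp {
    # 0 asks the OS for any free port, so a restarted node normally
    # comes back under a different address than the one it left with.
    port = 0
  }
  cluster {
    # Seed nodes keep a fixed, well-known port (hypothetical host name).
    seed-nodes = ["akka.tcp://geyser@seed-host:7000"]
  }
}
```

The seed nodes themselves would keep their fixed port, since the other nodes
need a known address to join through.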


Re: [akka-user] Re: Node quarantined

2016-03-23 Thread Patrik Nordwall
We have fixed the issue that shows up as
"Error encountered while processing system message acknowledgement buffer:
[-1 {}] ack: ACK[6, {}]"

https://github.com/akka/akka/pull/20093

It will be released in 2.4.3 and 2.3.15, probably by end of next week.

/Patrik


[akka-user] Re: Node quarantined

2016-03-22 Thread Guido Medina
Yeah, sorry, I thought it was related to a rolling restart.

As for Netty, I'm using a *not yet published* Netty with the following 
fixes:
https://github.com/netty/netty/issues?q=milestone%3A3.10.6.Final+is%3Aclosed

You can just get it from Git and run:

$ git checkout 3.10
$ mvn versions:set -DnewVersion=3.10.6.Final -DgenerateBackupPoms=false
$ mvn clean install

and see if your problem goes away.

Guido.


[akka-user] Re: Node quarantined

2016-03-22 Thread Benjamin Black
Hi Guido, yes, I'm aware of the leaving-cluster conversation, as I started it 
:-) This is a separate issue. I am observing this behavior while the cluster 
seems stable, with no nodes being added or removed. I suspect that this issue 
first appeared when I upgraded a different library that brought in a 
new version of the Netty library.


[akka-user] Re: Node quarantined

2016-03-22 Thread Guido Medina
Hi Benjamin,

You have nodes with predefined ports. One thing I do that eliminates that 
problem is to set the port only on my seed node(s); the rest just get a 
dynamic, available port, so each of them gets a different port when you 
do a rolling restart.

I suspect you are doing a rolling restart, right? If so, you need to wait 
for the node with that address to completely leave the cluster (I'm also 
doing that): basically, you terminate your system when you receive the 
*MemberRemoved* message for the *_self_* address.

I think I saw a discussion about nodes being quarantined when they 
re-join using the same address; I'm not sure if it was here or if it is an 
actual Git ticket.

HTH,

Guido.


[akka-user] Re: Node quarantined

2016-03-22 Thread Guido Medina
Hi Benjamin,

You have nodes with predefined ports and addresses. One thing I do that 
eliminates that problem is to set the port only on my seed node(s); the 
rest just get a dynamic, available port, so each of them gets a different 
port when you do a rolling restart.

I suspect you are doing a rolling restart, right? If so, you need to wait 
for the node with that address to completely leave the cluster (I'm also 
doing that): basically, you terminate your system when you receive the 
*MemberRemoved* message for the *_self_* address.

I think I saw a discussion about nodes being quarantined when they 
re-join using the same address; I'm not sure if it was here or if it is an 
actual Git ticket.

HTH,

Guido.


[akka-user] Re: Node quarantined

2016-03-22 Thread Benjamin Black
I see the same issue with 2.3.14.

On Tuesday, March 22, 2016 at 2:00:15 PM UTC-4, Guido Medina wrote:
>
> To eliminate noise, please update to 2.3.14, which has some cluster fixes 
> since 2.3.11; there are also several fixes in Scala 2.11.8 (not related).
>
> I don't know, I just have the habit of keeping my libs up to date.
>
> HTH,
>
> Guido.
>
> On Tuesday, March 22, 2016 at 5:34:23 PM UTC, Benjamin Black wrote:
>>
>> Hello,
>>
>> I'm trying to understand the cause of nodes being quarantined and 
>> possible solutions to fixing it. I'm using akka 2.3.11. On the quarantined 
>> node I see this logging:
>>
>> 12:45:44.204 ERROR [geyser-akka.remote.default-remote-dispatcher-6] 
>> a.r.EndpointWriter - AssociationError [akka.tcp://
>> geyser@172.16.120.174:7000] <- [akka.tcp://geyser@172.17.100.105:7000]: 
>> Error [Invalid address: akka.tcp://geyser@172.17.100.105:7000] [
>> akka.remote.InvalidAssociation: Invalid address: akka.tcp://
>> geyser@172.17.100.105:7000
>> Caused by: akka.remote.transport.Transport$InvalidAssociationException: 
>> The remote system has quarantined this system. No further associations to 
>> the remote system are possible until this system is restarted.
>> ]
>> 12:45:44.205 WARN  [geyser-akka.remote.default-remote-dispatcher-25] 
>> Remoting - Tried to associate with unreachable remote address [akka.tcp://
>> geyser@172.17.100.105:7000]. Address is now gated for 5000 ms, all 
>> messages to this address will be delivered to dead letters. Reason: [The 
>> remote system has quarantined this system. No further associations to the 
>> remote system are possible until this system is restarted.]
>>
>> And on the node that caused the box to be quarantined I see this logging:
>>
>> 12:45:44.194 WARN  [geyser-akka.remote.default-remote-dispatcher-6] 
>> Remoting - Association to [akka.tcp://geyser@172.16.120.174:7000] having 
>> UID [-450748474] is irrecoverably failed. UID is now quarantined and all 
>> messages to this UID will be delivered to dead letters. Remote actorsystem 
>> must be restarted to recover from this situation.
>> 12:45:44.202 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
>> a.r.EndpointWriter - AssociationError [akka.tcp://
>> geyser@172.17.100.105:7000] -> [akka.tcp://geyser@172.16.120.174:7000]: 
>> Error [Invalid address: akka.tcp://geyser@172.16.120.174:7000] [
>> akka.remote.InvalidAssociation: Invalid address: akka.tcp://
>> geyser@172.16.120.174:7000
>> Caused by: akka.remote.transport.Transport$InvalidAssociationException: 
>> The remote system has a UID that has been quarantined. Association aborted.
>> ]
>> 12:45:44.203 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
>> Remoting - Tried to associate with unreachable remote address [akka.tcp://
>> geyser@172.16.120.174:7000]. Address is now gated for 5000 ms, all 
>> messages to this address will be delivered to dead letters. Reason: [The 
>> remote system has a UID that has been quarantined. Association aborted.]
>> 12:45:44.221 ERROR [geyser-akka.remote.default-remote-dispatcher-7] 
>> Remoting - Association to [akka.tcp://geyser@172.16.120.174:7000] with 
>> UID [-450748474] irrecoverably failed. Quarantining address.
>> java.lang.IllegalStateException: Error encountered while processing 
>> system message acknowledgement buffer: [-1 {}] ack: ACK[6, {}]
>> at 
>> akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:288)
>>  
>> ~[geyser.jar:1.1.17-SNAPSHOT]
>> at akka.actor.Actor$class.aroundReceive(Actor.scala:467) 
>> ~[geyser.jar:1.1.17-SNAPSHOT]
>> Caused by: java.lang.IllegalArgumentException: Highest SEQ so far was -1 
>> but cumulative ACK is 6
>> at 
>> akka.remote.AckedSendBuffer.acknowledge(AckedDelivery.scala:103) 
>> ~[geyser.jar:1.1.17-SNAPSHOT]
>> at 
>> akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:284)
>>  
>> ~[geyser.jar:1.1.17-SNAPSHOT]
>> ... 11 common frames omitted
>> 12:45:44.221 WARN  [geyser-akka.remote.default-remote-dispatcher-7] 
>> Remoting - Association to [akka.tcp://geyser@172.16.120.174:7000] having 
>> UID [-450748474] is irrecoverably failed. UID is now quarantined and all 
>> messages to this UID will be delivered to dead letters. Remote actorsystem 
>> must be restarted to recover from this situation.
>>
>> Quite a bit of data can be passed between the nodes (~200 Mb/sec), and maybe 
>> the system is hitting a capacity issue, although I don't see any issue with 
>> CPU or memory. I noticed that the default-remote-dispatcher only has two 
>> threads. Are these threads being used to send the data? And if so, should I 
>> try increasing the thread count? Are there any other settings I could play 
>> with, or things I can look for in the logs, that might highlight what is 
>> wrong?
>>
>> Thanks,
>> Ben
>>
>
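
For reference, the thread count asked about in the quoted message comes from
the remoting dispatcher configuration. Below is a sketch of how it could be
raised, assuming the Akka 2.3 `akka.remote.default-remote-dispatcher` section
uses a fork-join executor capped at two threads by default (worth checking
against the reference.conf of the akka-remote version in use). The value 4 is
arbitrary, and nothing in this thread confirms that more threads would
prevent the quarantining.

```hocon
akka.remote {
  default-remote-dispatcher {
    fork-join-executor {
      # Assumed defaults in Akka 2.3 are 2 and 2; 4 is an arbitrary example.
      parallelism-min = 4
      parallelism-max = 4
    }
  }
}
```

The dispatcher name matches the `default-remote-dispatcher` thread names that
appear in the log lines above.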
