No. Following the suggestions in the docs, I am not running auto-downing.
Should I change that policy?
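
For reference, this is how I read Patrik's earlier suggestion (quoted further down): wait until the discovered node list has stopped changing for some time, sort it, and only then join. Below is a rough plain-Java sketch of just that waiting-and-sorting step. It makes no Akka or Hazelcast calls; the class SeedListStabilizer, its observe method, the quiet-period parameter, and the "host:port" string format are all names I made up for illustration, so treat it as a sketch under those assumptions rather than a tested solution.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

/** Hypothetical helper; not part of Akka or Hazelcast. */
final class SeedListStabilizer {
    private final long quietMillis;            // how long the list must stay unchanged
    private TreeSet<String> lastSeen = new TreeSet<>();
    private long lastChangeAt = -1;            // -1 means nothing observed yet

    SeedListStabilizer(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    /**
     * Feed in the latest discovery result (e.g. "10.8.1.169:2551" strings).
     * Returns the sorted list once membership has been unchanged for
     * quietMillis; returns empty while the list is still changing.
     */
    Optional<List<String>> observe(Collection<String> discovered, long nowMillis) {
        // TreeSet removes duplicates and sorts lexicographically. The order
        // does not need to be numeric, only identical on every node.
        TreeSet<String> current = new TreeSet<>(discovered);
        if (lastChangeAt < 0 || !current.equals(lastSeen)) {
            lastSeen = current;
            lastChangeAt = nowMillis;          // restart the quiet period
            return Optional.empty();
        }
        if (current.isEmpty() || nowMillis - lastChangeAt < quietMillis) {
            return Optional.empty();
        }
        return Optional.of(new ArrayList<>(current));
    }
}
```

Each node would poll discovery, call observe with the current time, and once a list comes back, convert it to Address objects and pass it to cluster.joinSeedNodes(). That last step is left out here on purpose. As Patrik notes, this only gives decent behavior; it does not guarantee every node sees the same first seed node.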

On Sunday, September 11, 2016 at 3:22:22 PM UTC-5, √ wrote:
>
> Are you running auto-downing?
>
> On Sat, Sep 10, 2016 at 11:43 PM, kraythe <kra...@gmail.com>
> wrote:
>
>> Thanks for the information. I have read most of these. Can I take it from
>> your responses that you agree that this could be a split brain problem? I
>> am just wondering if using all nodes as seed nodes is what is causing the
>> issue. In production our seed nodes are fixed IPs, but when we run in the
>> cloud we have to do auto-discovery. That is what makes the problem
>> complicated. I respect that ConductR may have solved this problem, and I
>> am all in favor of going commercial, but as I said, the project I am on
>> will have to be out and making a profit before I can even suggest tying us
>> to a particular platform purchase, especially since that is a rather large
>> recurring expenditure. Furthermore, the project has a decent amount of
>> legacy code that will have to be overcome. It's not a pure Actor program
>> just yet. I need ROI to justify those changes, and right now I have
>> other fish to fry on the development schedule, such as converting
>> some of that legacy transactional code into actor models.
>>
>> In the meantime, I need to make sure that the system is stable in
>> development. What worries me is the weird behavior where, when one node
>> goes down, other nodes start reporting problems connecting to the
>> coordinator. It's like they aren't in a cooperative. It's almost like A is
>> connected to B, C and D are connected to E, and then when one goes away a
>> cascade failure takes over. In the case of a genuine split brain I would
>> have assumed that if E goes away, C or D will take over the duties and life
>> will go on. However, that doesn't seem to happen. It seems almost like we
>> have a chain rather than a cluster. I hope I am making myself clear.
>>
>> On Saturday, September 10, 2016 at 12:42:06 PM UTC-5, Patrik Nordwall 
>> wrote:
>>>
>>> You asked for links
>>>
>>>
>>> http://doc.akka.io/docs/akka/2.4/scala/cluster-usage.html#Joining_to_Seed_Nodes
>>>
>>> http://doc.akka.io/docs/akka/2.4/scala/cluster-usage.html#Downing
>>>
>>> https://conductr.lightbend.com/docs/1.1.x/Home
>>> On Sat, Sep 10, 2016 at 19:27, Viktor Klang <viktor...@gmail.com> wrote:
>>>
>>>> ConductR* is designed to properly seed, and update, Akka Cluster based 
>>>> applications, and the Akka Split Brain Resolver provides deterministic 
>>>> partition handling.
>>>>
>>>> ConductR runs well on EC2: 
>>>> https://conductr.lightbend.com/docs/1.0.x/Install#EC2-Installation
>>>>
>>>> * ConductR and SBR are Lightbend products
>>>>
>>>> -- 
>>>> Cheers,
>>>> √
>>>>
>>>> On Sep 10, 2016 6:46 PM, "kraythe" <kra...@gmail.com> wrote:
>>>>
>>>>> I am not following you on this one. Is there a blog post or article
>>>>> you can refer me to?
>>>>>
>>>>> Thanks to you both. 
>>>>>
>>>>> On Saturday, September 10, 2016 at 11:34:26 AM UTC-5, √ wrote:
>>>>>>
>>>>>> There's also ConductR + SBR
>>>>>>
>>>>>> -- 
>>>>>> Cheers,
>>>>>> √
>>>>>>
>>>>>> On Sep 10, 2016 5:09 PM, "Patrik Nordwall" <patrik....@gmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> Are you aware of the importance of the first seed node, the one you
>>>>>>> have listed as the first element in the seed-nodes list? See the documentation.
>>>>>>>
>>>>>>> You can get decent behavior if you wait with joining until the list
>>>>>>> of discovered nodes stabilizes, i.e. has not changed within X seconds. Then
>>>>>>> sort them to make sure the same node is used as the first one everywhere.
>>>>>>> Then call joinSeedNodes with that sorted list.
>>>>>>>
>>>>>>> To be completely safe you must manually decide which one to use as 
>>>>>>> the first seed node.
>>>>>>>
>>>>>>> /Patrik
>>>>>>>
>>>>>>> On Fri, Sep 9, 2016 at 20:32, kraythe <kra...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Greetings, 
>>>>>>>>
>>>>>>>> We are having some problems with our cluster configuration that
>>>>>>>> manifest themselves in the following log lines (redacted for
>>>>>>>> confidentiality reasons):
>>>>>>>>
>>>>>>>> Sep 09 00:58:10 host1.mycompany.com application-9001.log:  2016-09-09 05:58:10 +0000 - [WARN] - [OrdersActor] akka://myCompany/user/OrdersActor/291 -  (291) #recordTxns, sending 54 txns to UserActor took 0.0044229 seconds
>>>>>>>> Sep 09 00:58:19 host1.mycompany.com application-9001.log:  2016-09-09 05:58:19 +0000 - [WARN] - [ShardRegion] akka.tcp://myCompany@10.8.1.169:2551/system/sharding/UserActor -  Trying to register to coordinator at [None], but no acknowledgement. Total [54] buffered messages.
>>>>>>>>
>>>>>>>> I have traced this to the configuration of the cluster. We are
>>>>>>>> running this on Amazon AWS, and the code uses Hazelcast to find
>>>>>>>> the IPs of the other nodes (mostly because we have already solved
>>>>>>>> discovery for Hazelcast in our dynamic IP cluster). We retrieve the
>>>>>>>> IPs of the other nodes in the cluster from Hazelcast and use them to
>>>>>>>> create the Address objects for the seed nodes.
>>>>>>>>
>>>>>>>> Once we have the seed nodes, we have tried two mechanisms. The first
>>>>>>>> is to take the list of seed nodes and use them to join the cluster
>>>>>>>> with cluster.joinSeedNodes(). Of course, not all machines come up and
>>>>>>>> are discovered by Hazelcast at exactly the same instant, so the first
>>>>>>>> 3 nodes might come up first and use each other to join, whereas by
>>>>>>>> the time the 9th node comes up there are 9 seed nodes. When we start
>>>>>>>> sending messages to cluster-sharded actors, we get the errors above.
>>>>>>>> Also, when a node goes down the system screams constantly that a seed
>>>>>>>> node is gone.
>>>>>>>>
>>>>>>>> So, as the second mechanism, I changed the code to pick a node at
>>>>>>>> random and do a cluster.join() with that node instead. However, we
>>>>>>>> have the same problem as above. When we first bring up one node and
>>>>>>>> then bring the others up one at a time, the problem goes away.
>>>>>>>>
>>>>>>>> Another symptom is that if we have the problem above and we terminate
>>>>>>>> host1, then other nodes start showing the same behavior, probably all
>>>>>>>> the nodes that were connected to host1. Apparently they can't heal by
>>>>>>>> connecting to another node. This lends evidence to the multiple
>>>>>>>> split brains theory.
>>>>>>>>
>>>>>>>> My theory is that by using all these seed nodes I am creating
>>>>>>>> multiple split brains. If you have 5 nodes, A, B, C, D, E, and A
>>>>>>>> connects to B, B to A, C to E, E to D, and D to E, then we have two
>>>>>>>> clusters running that know nothing about each other. For some reason
>>>>>>>> the coordinators then get confused about what is going on.
>>>>>>>>
>>>>>>>> Essentially the problem domain is this: 1) We don't know what ANY
>>>>>>>> of the IPs are ahead of time. 2) We want the cluster to be whole. 3)
>>>>>>>> If a single node leaves the cluster, we would like the remaining
>>>>>>>> nodes to recover.
>>>>>>>>
>>>>>>>> I would appreciate any insight anyone could provide on this,
>>>>>>>> especially on what the problem may be (I could be wrong) and how we can
>>>>>>>> accomplish our goals. Note that I am not committed to using Hazelcast
>>>>>>>> to find other nodes.
>>>>>>>>
>>>>>>>>  Thanks in advance.
>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>> >>>>>>>>>> Check the FAQ: 
>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>> >>>>>>>>>> Search the archives: 
>>>>>>>> https://groups.google.com/group/akka-user
>>>>>>>> --- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "Akka User List" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to akka-user+...@googlegroups.com.
>>>>>>>> To post to this group, send email to akka...@googlegroups.com.
>>>>>>>> Visit this group at https://groups.google.com/group/akka-user.
>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>
>
>
>
> -- 
> Cheers,
> √
>
