Re: Master keeps forgetting nodes

João Costa Tue, 07 Apr 2015 06:06:19 -0700

All machines are in the same region, but in different AZs.

When you say "check the _cat outputs", you mean making a call to 
_cat/indices or _cat/shards when I know that the cluster is down, correct?
I'll try to do that, then.
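
For reference, a minimal sketch of how those outputs could be captured in one go the next time the node drops out. It assumes the node's HTTP API is reachable at http://localhost:9200, which is not stated anywhere in this thread. The function names are made up for illustration, and _cat/master and _cat/nodes are added here on top of the two endpoints mentioned above.

```python
from urllib.request import urlopen

# _cat endpoints to capture; indices and shards are the ones mentioned in the
# thread, master and nodes are an extra assumption for a master/node problem.
CAT_ENDPOINTS = ("_cat/master", "_cat/nodes", "_cat/indices", "_cat/shards")


def cat_urls(base_url):
    """Build the _cat URLs for one node; '?v' asks for column headers."""
    base = base_url.rstrip("/")
    return [f"{base}/{endpoint}?v" for endpoint in CAT_ENDPOINTS]


def dump_cat(base_url, timeout=10):
    """Print each _cat output, or the error if the request fails."""
    for url in cat_urls(base_url):
        print(f"== {url}")
        try:
            with urlopen(url, timeout=timeout) as resp:
                print(resp.read().decode("utf-8"))
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print(f"request failed: {exc}")
```

Calling dump_cat("http://localhost:9200") on both the master and the other node at the moment of the failure would show whether the two disagree about cluster membership.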


On Monday, 6 April 2015 23:32:51 UTC+1, Mark Walkom wrote:
>
> The next time this happens can you check the _cat outputs, take a look at 
> https://github.com/elastic/elasticsearch/issues/10447 and see if it's 
> similar behaviour.
>
> On 7 April 2015 at 07:09, Mark Walkom <markw...@gmail.com> wrote:
>
>> Are you running across AZs, or regions?
>>
>> On 6 April 2015 at 21:01, João Costa <jdp...@gmail.com> wrote:
>>
>>> Slight update: The same problem also happens on another cluster with the 
>>> same configuration on another AWS account.
>>> While this does not happen on my test account, that's probably related 
>>> to the fact that those instances are regularly rebooted.
>>>
>>>
>>> On Monday, 6 April 2015 11:42:07 UTC+1, João Costa wrote:
>>>>
>>>> I have 2 EC2 instances in an AWS account where it appears that the 
>>>> master keeps forgetting about the slave node.
>>>>
>>>> In the slave node logs (I removed the IPs and time for simplicity, the 
>>>> master is "Cordelia Frost" and the slave is "Chronos"):
>>>>
>>>> [discovery.zen.fd] [Chronos] [master] pinging a master [Cordelia Frost] 
>>>> but we do not exists on it, act as if its master failure
>>>> [discovery.zen.fd] [Chronos] [master] stopping fault detection against 
>>>> master [Cordelia Frost], reason [master failure, do not exists on 
>>>> master, act as master failure]
>>>> [discovery.ec2] [Chronos] master_left [Cordelia Frost], reason [do not 
>>>> exists on master, act as master failure]
>>>> [discovery.ec2] [Chronos] master left (reason = do not exists on 
>>>> master, act as master failure), current nodes: {[Chronos]}
>>>> [cluster.service] [Chronos] removed {[Cordelia Frost]}, reason: 
>>>> zen-disco-master_failed ([Cordelia Frost])
>>>> [discovery.ec2] [Chronos] using dynamic discovery nodes
>>>> [discovery.ec2] [Chronos] using dynamic discovery nodes
>>>> [discovery.ec2] [Chronos] using dynamic discovery nodes
>>>> [discovery.ec2] [Chronos] filtered ping responses: 
>>>> (filter_client[true], filter_data[false])
>>>>         --> ping_response{node [Cordelia Frost], id[353], master [Cordelia 
>>>> Frost], hasJoinedOnce [true], cluster_name[cluster]}
>>>> [discovery.zen.publish] [Chronos] received cluster state version 232374
>>>> [discovery.zen.fd] [Chronos] [master] restarting fault detection 
>>>> against master [Cordelia Frost], reason [new cluster state received 
>>>> and we are monitoring the wrong master [null]]
>>>> [discovery.ec2] [Chronos] got first state from fresh master
>>>> [cluster.service] [Chronos] detected_master [Cordelia Frost], added 
>>>> {[Cordelia Frost]}, reason: zen-disco-receive(from master [Cordelia 
>>>> Frost])
>>>>
>>>> "Chronos" then receives the cluster state and everything goes back to 
>>>> normal.
>>>> This happens at fairly regular intervals (usually once per hour, 
>>>> although sometimes it takes longer). Any idea what could be causing 
>>>> this?
>>>>
>>>> I have a ping timeout of 15s on discovery.ec2, so I think that ping 
>>>> latency should not be the problem. I also do hourly snapshots with 
>>>> curator, in case that's relevant.
>>>> Finally, I also have another elasticsearch cluster with the same 
>>>> configuration on a different AWS account (used for testing purposes), and 
>>>> that problem has never occurred there. Can this be related to the AWS 
>>>> region?
>>>>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8c0407b1-bb5e-45b2-9cd9-214c55b53990%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
