All machines are in the same region, but in different AZs. When you say "check the _cat outputs", you mean calling _cat/indices or _cat/shards while I know the cluster is down, correct? I'll try that, then.
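
A minimal sketch of capturing those _cat outputs from one node, assuming Python 3 and a node reachable over HTTP. The base URL `http://localhost:9200`, the helper names, and the choice to also query `_cat/master` and `_cat/nodes` are assumptions for illustration, not something stated in the thread.

```python
import urllib.request

# Hypothetical base URL; point this at whichever node you want to ask.
BASE_URL = "http://localhost:9200"

# _cat endpoints to capture while the node is missing from the cluster.
# indices and shards come from the thread; master and nodes are an assumption.
CAT_ENDPOINTS = ["master", "nodes", "indices", "shards"]


def cat_url(base_url, endpoint):
    """Build a _cat URL; the ?v flag adds column headers to the output."""
    return "{}/_cat/{}?v".format(base_url.rstrip("/"), endpoint)


def fetch_cat(base_url, endpoint, timeout=10):
    """Return the plain-text body of one _cat call, or the error as text."""
    try:
        with urllib.request.urlopen(cat_url(base_url, endpoint), timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        return "request failed: {}".format(exc)


def dump_cat_outputs(base_url=BASE_URL):
    """Print every configured _cat output, one labelled section per endpoint."""
    for name in CAT_ENDPOINTS:
        print("== _cat/{} ==".format(name))
        print(fetch_cat(base_url, name))
```

Calling `dump_cat_outputs()` against both the master and the slave node during an incident would show whether the two nodes disagree about cluster membership; a failed request is printed rather than raised, so one unreachable node does not stop the rest of the capture.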

On Monday, 6 April 2015 at 23:32:51 UTC+1, Mark Walkom wrote:
>
> The next time this happens, can you check the _cat outputs, take a look at
> https://github.com/elastic/elasticsearch/issues/10447 and see if it's
> similar behaviour?
>
> On 7 April 2015 at 07:09, Mark Walkom <markw...@gmail.com> wrote:
>
>> Are you running across AZs, or regions?
>>
>> On 6 April 2015 at 21:01, João Costa <jdp...@gmail.com> wrote:
>>
>>> Slight update: The same problem also happens on another cluster with the
>>> same configuration on another AWS account.
>>> While this does not happen on my test account, that's probably related
>>> to the fact that those instances are regularly rebooted.
>>>
>>> On Monday, 6 April 2015 at 11:42:07 UTC+1, João Costa wrote:
>>>>
>>>> I have 2 EC2 instances in an AWS account where it appears that the
>>>> master keeps forgetting about the slave node.
>>>>
>>>> In the slave node logs (I removed the IPs and time for simplicity, the
>>>> master is "Cordelia Frost" and the slave is "Chronos"):
>>>>
>>>> [discovery.zen.fd] [Chronos] [master] pinging a master [Cordelia Frost] but we do not exists on it, act as if its master failure
>>>> [discovery.zen.fd] [Chronos] [master] stopping fault detection against master [Cordelia Frost], reason [master failure, do not exists on master, act as master failure]
>>>> [discovery.ec2] [Chronos] master_left [Cordelia Frost], reason [do not exists on master, act as master failure]
>>>> [discovery.ec2] [Chronos] master left (reason = do not exists on master, act as master failure), current nodes: {[Chronos]}
>>>> [cluster.service] [Chronos] removed {[Cordelia Frost]}, reason: zen-disco-master_failed ([Cordelia Frost])
>>>> [discovery.ec2] [Chronos] using dynamic discovery nodes
>>>> [discovery.ec2] [Chronos] using dynamic discovery nodes
>>>> [discovery.ec2] [Chronos] using dynamic discovery nodes
>>>> [discovery.ec2] [Chronos] filtered ping responses: (filter_client[true], filter_data[false])
>>>>     --> ping_response{node [Cordelia Frost], id[353], master [Cordelia Frost], hasJoinedOnce [true], cluster_name[cluster]}
>>>> [discovery.zen.publish] [Chronos] received cluster state version 232374
>>>> [discovery.zen.fd] [Chronos] [master] restarting fault detection against master [Cordelia Frost], reason [new cluster state received and we are monitoring the wrong master [null]]
>>>> [discovery.ec2] [Chronos] got first state from fresh master
>>>> [cluster.service] [Chronos] detected_master [Cordelia Frost], added {[Cordelia Frost]}, reason: zen-disco-receive(from master [Cordelia Frost])
>>>>
>>>> "Chronos" then receives the cluster state and everything goes back to
>>>> normal.
>>>> This happens at fairly regular intervals (usually once per hour,
>>>> although sometimes it takes longer). Any idea what could be causing
>>>> this?
>>>>
>>>> I have a ping timeout of 15s on discovery.ec2, so I think that ping
>>>> latency should not be the problem. I also do hourly snapshots with
>>>> curator, in case that's relevant.
>>>> Finally, I also have another elasticsearch cluster with the same
>>>> configuration on a different AWS account (used for testing purposes),
>>>> and that problem has never occurred there. Can this be related to the
>>>> AWS region?
>>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/ad32e277-d8a0-48f8-91a0-66f6868a08af%40googlegroups.com.