Did you set the same cluster name on both nodes?

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 13 March 2014 at 09:57:35, Guillaume Loetscher (sterfi...@gmail.com) wrote:

Hi,

First, thanks for the answers and remarks.

You are both right: the issue I'm currently facing leads to a "split-brain" 
situation, where Node #1 and Node #2 are both master and each carries on with 
its own life on its side. I'll look into changing my configuration and the 
number of nodes in order to limit this situation (I have already read this 
article about split-brain in ES).

However, this split-brain situation is the result of the problem with the 
discovery / multicast, which shows up in the log of Node #2 here:
[2014-03-12 22:03:52,709][WARN ][discovery.zen.ping.multicast] [Node ES #2] 
received ping response ping_response{target [[Node ES 
#1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}], master 
[[Node ES #1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}], 
cluster_name[logstash]} with no matching id [1]

So, connectivity between Node #1 (which came online first, and is therefore 
master) and Node #2 is established, as the log on Node #2 clearly shows 
"received ping response", but with an ID that didn't match.

This is apparently why Node #2 didn't join the cluster formed on Node #1, and 
it is this specific issue I want to resolve.
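
For reference, one way to side-step the multicast ping entirely is to disable 
it and list both nodes explicitly. This is only a sketch, assuming the 0.90.x 
zen discovery settings and the addresses shown in the logs:

# elasticsearch.yml on both nodes: disable multicast, ping the other node directly
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["172.16.0.100", "172.16.0.101"]

With unicast hosts configured, the nodes contact each other directly on port 
9300 instead of relying on multicast reaching both containers.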

Thanks,

On Thursday, 13 March 2014 at 07:03:35 UTC+1, David Pilato wrote:
Bonjour :-)

You should set minimum_master_nodes to 2, although I'd recommend having 3 nodes 
instead of 2.
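
For reference, in elasticsearch.yml this would be (assuming the 0.90.x zen 
discovery setting name):

discovery.zen.minimum_master_nodes: 2

With two master-eligible nodes and this value set to 2, a node that cannot see 
the other refuses to elect itself master, which prevents the split-brain at the 
cost of availability; with 3 nodes the value stays at 2 and the cluster 
tolerates the loss of one node.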

--
David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 12 March 2014 at 23:58, Guillaume Loetscher <ster...@gmail.com> wrote:

Hi,

I've recently begun testing Elasticsearch on a small mockup I've designed.

Currently, I'm running two nodes in two LXC (v0.9) containers. The containers 
are attached via veth interfaces to a bridge declared on the host.

When I start the first node, the cluster starts, but when I start the second 
node a bit later, it seems to get some information from the first node, yet it 
always ends with the same "no matching id" error.
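
One way to check whether the multicast pings actually cross the bridge is to 
watch for them on the host. This is only a sketch, assuming tcpdump is 
available and the default zen multicast group/port (224.2.2.4:54328):

# On the host, watch for zen discovery multicast pings on the bridge
tcpdump -ni br1 host 224.2.2.4 and port 54328

If pings from only one node show up, the multicast traffic is not making it 
from one container to the other.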

Here's what I'm doing:

I start the LXC container for the first node:
root@lada:~# date && lxc-start -n es_node1 -d
mercredi 12 mars 2014, 22:59:39 (UTC+0100)



I log on to the node and check the log file:
[2014-03-12 21:59:41,927][INFO ][node                     ] [Node ES #1] 
version[0.90.12], pid[1129], build[26feed7/2014-02-25T15:38:23Z]
[2014-03-12 21:59:41,928][INFO ][node                     ] [Node ES #1] 
initializing ...
[2014-03-12 21:59:41,944][INFO ][plugins                  ] [Node ES #1] loaded 
[], sites []
[2014-03-12 21:59:47,262][INFO ][node                     ] [Node ES #1] 
initialized
[2014-03-12 21:59:47,263][INFO ][node                     ] [Node ES #1] 
starting ...
[2014-03-12 21:59:47,485][INFO ][transport                ] [Node ES #1] 
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
{inet[/172.16.0.100:9300]}
[2014-03-12 21:59:57,573][INFO ][cluster.service          ] [Node ES #1] 
new_master [Node ES 
#1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}, reason: 
zen-disco-join (elected_as_master)
[2014-03-12 21:59:57,657][INFO ][discovery                ] [Node ES #1] 
logstash/LbMQazWXR9uB6Q7R2xLxGQ
[2014-03-12 21:59:57,733][INFO ][http                     ] [Node ES #1] 
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address 
{inet[/172.16.0.100:9200]}
[2014-03-12 21:59:57,735][INFO ][node                     ] [Node ES #1] started
[2014-03-12 21:59:59,569][INFO ][gateway                  ] [Node ES #1] 
recovered [2] indices into cluster_state



Then I start the second node:
root@lada:/var/lib/lxc/kibana# date && lxc-start -n es_node2 -d
mercredi 12 mars 2014, 23:02:59 (UTC+0100)



Log on to the second node and open the log:
[2014-03-12 22:03:02,126][INFO ][node                     ] [Node ES #2] 
version[0.90.12], pid[1128], build[26feed7/2014-02-25T15:38:23Z]
[2014-03-12 22:03:02,127][INFO ][node                     ] [Node ES #2] 
initializing ...
[2014-03-12 22:03:02,141][INFO ][plugins                  ] [Node ES #2] loaded 
[], sites []
[2014-03-12 22:03:07,352][INFO ][node                     ] [Node ES #2] 
initialized
[2014-03-12 22:03:07,352][INFO ][node                     ] [Node ES #2] 
starting ...
[2014-03-12 22:03:07,557][INFO ][transport                ] [Node ES #2] 
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address 
{inet[/172.16.0.101:9300]}
[2014-03-12 22:03:17,637][INFO ][cluster.service          ] [Node ES #2] 
new_master [Node ES 
#2][0nNCsZrFS6y95G1ld-v_rA][inet[/172.16.0.101:9300]]{master=true}, reason: 
zen-disco-join (elected_as_master)
[2014-03-12 22:03:17,718][INFO ][discovery                ] [Node ES #2] 
logstash/0nNCsZrFS6y95G1ld-v_rA
[2014-03-12 22:03:17,783][INFO ][http                     ] [Node ES #2] 
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address 
{inet[/172.16.0.101:9200]}
[2014-03-12 22:03:17,785][INFO ][node                     ] [Node ES #2] started
[2014-03-12 22:03:19,550][INFO ][gateway                  ] [Node ES #2] 
recovered [2] indices into cluster_state
[2014-03-12 22:03:52,709][WARN ][discovery.zen.ping.multicast] [Node ES #2] 
received ping response ping_response{target [[Node ES 
#1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}], master 
[[Node ES #1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}], 
cluster_name[logstash]} with no matching id [1]


At that point, each node considers itself master.

Here's my configuration for each node (identical on node 1, except for node.name):
cluster.name: logstash
node.name: "Node ES #2"
node.master: true
node.data: true
index.number_of_shards: 2
index.number_of_replicas: 1
discovery.zen.ping.timeout: 10s

The bridge on my host is set up to start forwarding immediately on any new 
interface, so I don't think the problem lies there (a couple of checks to rule 
it out are sketched after the config below). Here's the bridge config:
auto br1
iface br1 inet static
        address 172.16.0.254
        netmask 255.255.255.0
        bridge_ports regex veth_.*
        bridge_stp off
        bridge_maxwait 0
...
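
To rule the bridge out completely, its STP and forwarding-delay state can be 
inspected on the host. A sketch, assuming bridge-utils is installed:

# Show STP state and forward delay for the bridge
brctl showstp br1
# Confirm both veth interfaces are attached to it
brctl show br1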