Did you set the same cluster name on both nodes?

-- David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
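One way to answer David's question is to compare the cluster.name setting across the nodes' configuration files. The Python sketch below is a hypothetical helper, not an Elasticsearch tool. It only understands flat `key: value` lines such as `cluster.name: logstash` (not nested YAML), and the file paths you pass it (for example each container's elasticsearch.yml) are assumptions about your layout. When the key is absent it falls back to "elasticsearch", the default cluster name.

```python
from __future__ import annotations

import sys
from pathlib import Path

# Elasticsearch's default cluster name when cluster.name is not set.
DEFAULT_CLUSTER_NAME = "elasticsearch"


def read_flat_settings(text: str) -> dict[str, str]:
    """Parse flat `key: value` lines such as `cluster.name: logstash`.

    Simplification (assumption): this is not a YAML parser. Nested YAML blocks
    and inline comments are not handled; full-line comments and blank lines
    are skipped, and surrounding quotes are stripped from values.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def cluster_names(configs: dict[str, str]) -> dict[str, str]:
    """Map each config label to the cluster name that node would use."""
    return {
        label: read_flat_settings(text).get("cluster.name", DEFAULT_CLUSTER_NAME)
        for label, text in configs.items()
    }


def same_cluster_name(configs: dict[str, str]) -> bool:
    """True when at least one config is given and all share one cluster name."""
    return len(set(cluster_names(configs).values())) == 1


if __name__ == "__main__":
    # Hypothetical usage: python check_cluster_name.py node1.yml node2.yml
    paths = sys.argv[1:]
    if len(paths) < 2:
        print("usage: check_cluster_name.py CONFIG CONFIG [CONFIG ...]")
    else:
        configs = {p: Path(p).read_text(encoding="utf-8") for p in paths}
        for label, name in cluster_names(configs).items():
            print(f"{label}: cluster.name = {name}")
        print("same cluster name" if same_cluster_name(configs) else "cluster names differ")
```

Values such as `node.name: "Node ES #2"` keep their `#` because only lines that start with `#` are treated as comments.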
On 13 March 2014 at 09:57:35, Guillaume Loetscher (sterfi...@gmail.com) wrote:

Hi,

First, thanks for the answers and remarks. You are both right: the issue I'm currently facing leads to a "split-brain" situation, where Node #1 and Node #2 are both master, each living its own life on its side. I'll change my configuration and the number of nodes in order to limit this situation (I already checked this article about split-brain in ES).

However, this split-brain situation is the result of a problem with discovery / broadcast, which shows up in the log of Node #2:

[2014-03-12 22:03:52,709][WARN ][discovery.zen.ping.multicast] [Node ES #2] received ping response ping_response{target [[Node ES #1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}], master [[Node ES #1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}], cluster_name[logstash]} with no matching id [1]

So connectivity between Node #1 (which is the first one online, and therefore master) and Node #2 is established, since the log on Node #2 clearly says "received ping response", but the response carries an id that doesn't match. This is apparently why Node #2 didn't join the cluster of Node #1, and this specific issue is the one I want to resolve.

Thanks,

On Thursday 13 March 2014 07:03:35 UTC+1, David Pilato wrote:

Hello :-)

You should set discovery.zen.minimum_master_nodes to 2. Although I'd recommend having 3 nodes instead of 2.

-- David ;-)
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 12 March 2014 at 23:58, Guillaume Loetscher <ster...@gmail.com> wrote:

Hi,

I've recently begun to test Elasticsearch on a small mockup I've designed. Currently, I'm running two nodes in two LXC (v0.9) containers. Those containers are linked using veth to a bridge declared on the host.
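David's advice about the minimum number of master nodes follows the usual majority-quorum rule: with n master-eligible nodes, discovery.zen.minimum_master_nodes is set to (n / 2) + 1 using integer division. The short Python sketch below (helper names are hypothetical) works through that arithmetic and shows why 3 nodes are recommended: with 2 nodes the quorum is 2, which stops an isolated node from electing itself, but losing either node also prevents any master election; with 3 nodes the quorum is still 2, so one node can fail.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: a strict majority, (n // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a quorum can still form."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(
            f"{n} master-eligible nodes: "
            f"minimum_master_nodes={minimum_master_nodes(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
```

For the two-node setup in this thread, that corresponds to `discovery.zen.minimum_master_nodes: 2` in each node's elasticsearch.yml.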
When I start the first node, the cluster starts. But when I start the second node a bit later, it seems to get some information from the other node, yet it always ends with the same "no matching id" error.

Here's what I'm doing. I start the LXC container of the first node:

root@lada:~# date && lxc-start -n es_node1 -d
Wednesday 12 March 2014, 22:59:39 (UTC+0100)

I log on to the node and check the log file:

[2014-03-12 21:59:41,927][INFO ][node ] [Node ES #1] version[0.90.12], pid[1129], build[26feed7/2014-02-25T15:38:23Z]
[2014-03-12 21:59:41,928][INFO ][node ] [Node ES #1] initializing ...
[2014-03-12 21:59:41,944][INFO ][plugins ] [Node ES #1] loaded [], sites []
[2014-03-12 21:59:47,262][INFO ][node ] [Node ES #1] initialized
[2014-03-12 21:59:47,263][INFO ][node ] [Node ES #1] starting ...
[2014-03-12 21:59:47,485][INFO ][transport ] [Node ES #1] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.16.0.100:9300]}
[2014-03-12 21:59:57,573][INFO ][cluster.service ] [Node ES #1] new_master [Node ES #1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}, reason: zen-disco-join (elected_as_master)
[2014-03-12 21:59:57,657][INFO ][discovery ] [Node ES #1] logstash/LbMQazWXR9uB6Q7R2xLxGQ
[2014-03-12 21:59:57,733][INFO ][http ] [Node ES #1] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.16.0.100:9200]}
[2014-03-12 21:59:57,735][INFO ][node ] [Node ES #1] started
[2014-03-12 21:59:59,569][INFO ][gateway ] [Node ES #1] recovered [2] indices into cluster_state

Then I start the second node:

root@lada:/var/lib/lxc/kibana# date && lxc-start -n es_node2 -d
Wednesday 12 March 2014, 23:02:59 (UTC+0100)

I log on to the second node and open the log:

[2014-03-12 22:03:02,126][INFO ][node ] [Node ES #2] version[0.90.12], pid[1128], build[26feed7/2014-02-25T15:38:23Z]
[2014-03-12 22:03:02,127][INFO ][node ] [Node ES #2] initializing ...
[2014-03-12 22:03:02,141][INFO ][plugins ] [Node ES #2] loaded [], sites []
[2014-03-12 22:03:07,352][INFO ][node ] [Node ES #2] initialized
[2014-03-12 22:03:07,352][INFO ][node ] [Node ES #2] starting ...
[2014-03-12 22:03:07,557][INFO ][transport ] [Node ES #2] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/172.16.0.101:9300]}
[2014-03-12 22:03:17,637][INFO ][cluster.service ] [Node ES #2] new_master [Node ES #2][0nNCsZrFS6y95G1ld-v_rA][inet[/172.16.0.101:9300]]{master=true}, reason: zen-disco-join (elected_as_master)
[2014-03-12 22:03:17,718][INFO ][discovery ] [Node ES #2] logstash/0nNCsZrFS6y95G1ld-v_rA
[2014-03-12 22:03:17,783][INFO ][http ] [Node ES #2] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.16.0.101:9200]}
[2014-03-12 22:03:17,785][INFO ][node ] [Node ES #2] started
[2014-03-12 22:03:19,550][INFO ][gateway ] [Node ES #2] recovered [2] indices into cluster_state
[2014-03-12 22:03:52,709][WARN ][discovery.zen.ping.multicast] [Node ES #2] received ping response ping_response{target [[Node ES #1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}], master [[Node ES #1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}], cluster_name[logstash]} with no matching id [1]

At that point, each node considered itself master.

Here's my configuration for each node (the same on Node 1, except for node.name):

cluster.name: logstash
node.name: "Node ES #2"
node.master: true
node.data: true
index.number_of_shards: 2
index.number_of_replicas: 1
discovery.zen.ping.timeout: 10s

The bridge on my host is set up to forward every new interface immediately, so I don't think the problem is there. Here's the bridge config:

auto br1
iface br1 inet static
    address 172.16.0.254
    netmask 255.255.255.0
    bridge_ports regex veth_.*
    bridge_spt off
    bridge_maxwait 0
...

-- 
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d86aec61-e851-48ac-a7fb-fae757f3eebe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
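The warning in Node #2's log says a ping response arrived "with no matching id [1]". One plausible reading, offered as an assumption rather than a description of Elasticsearch's internals, is that discovery replies are matched to an open ping round by id and dropped once that round has closed. The log timestamps are consistent with this: Node #2 started at 22:03:07 and elected itself master at 22:03:17, which lines up with the configured discovery.zen.ping.timeout: 10s, while the reply from Node #1 is only logged at 22:03:52. The Python snippet below is a simplified, hypothetical model of that bookkeeping; the class and method names are invented for illustration.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field

log = logging.getLogger("ping-model")


@dataclass
class PingRounds:
    """Hypothetical model of discovery ping bookkeeping (not Elasticsearch code).

    Each round gets an id; responses are collected only while that round is open.
    """

    next_id: int = 1
    open_rounds: dict[int, list[str]] = field(default_factory=dict)

    def start_round(self) -> int:
        """Open a new round and return its id."""
        round_id = self.next_id
        self.next_id += 1
        self.open_rounds[round_id] = []
        return round_id

    def close_round(self, round_id: int) -> list[str]:
        """Close a round (e.g. when its timeout expires) and return the senders seen."""
        return self.open_rounds.pop(round_id, [])

    def on_response(self, round_id: int, sender: str) -> bool:
        """Record a reply if its round is still open; otherwise discard it."""
        responses = self.open_rounds.get(round_id)
        if responses is None:
            # Late reply: its round is already closed, so nothing can match it.
            log.warning("received ping response from %s with no matching id [%d]", sender, round_id)
            return False
        responses.append(sender)
        return True


if __name__ == "__main__":
    rounds = PingRounds()
    rid = rounds.start_round()          # the new node starts pinging (round 1)
    seen = rounds.close_round(rid)      # the timeout expires before any reply
    if not seen:
        print("no other node seen in time: the node elects itself master")
    rounds.on_response(rid, "Node ES #1")  # the reply for round 1 arrives too late
```

Under this model, a reply that is delayed past the ping timeout cannot influence the election, which is one way two nodes that can reach each other could still both end up as master.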