Re: (ES 0.90.1) Cannot connect to elasticsearch cluster after a node is removed

Hui Thu, 13 Mar 2014 02:16:49 -0700

Hi Dome, 

Do you mean the Elasticsearch service on 10.1.4.196 was not running? Yes, the 
service would have been stopped while the machine was rebooting.


But the master node 10.1.4.197 removed the problem node 10.1.4.196 from the 
cluster once it could no longer ping that machine.

The cluster should be fine after that removal. Am I misunderstanding something?
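
The telnet-style test Mark suggests (quoted below) can be approximated from the 
client machine with a plain TCP connect to each configured address. This is only 
a minimal sketch and not part of the original setup: the class name 
TransportProbe, the method canConnect and the 3 second timeout are assumptions 
made for illustration; the addresses and port 9300 are the ones from the client 
code quoted below.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Elasticsearch: a telnet-style TCP check.
final class TransportProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // NoRouteToHostException, ConnectException and SocketTimeoutException
            // are all IOExceptions, so every failure mode ends up here.
            System.out.println(host + ":" + port + " -> " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        // Addresses and port as configured in the client code quoted below.
        String[] hosts = {"10.1.4.195", "10.1.4.196", "10.1.4.197"};
        int port = 9300;
        for (String host : hosts) {
            System.out.println(host + ":" + port + " reachable: "
                    + canConnect(host, port, 3000));
        }
    }
}
```

The master log quoted below shows the node's transport address as 
inet[/10.1.4.196:9302], so it may be worth probing 9302 as well as the 9300 used 
in the client code.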

Thanks

On Thursday, March 13, 2014 4:48:17 PM UTC+8, Dome.C.Wei wrote:
>
> That must be because the service is not running.
>
> On Thursday, March 13, 2014 2:10:22 PM UTC+8, Hui wrote:
>>
>> Hi Mark,
>>
>> Thanks for replying.
>>
>> The master (10.1.4.197) and the other nodes were reachable, while the problem 
>> node (10.1.4.196) was not.
>> So we could still see the cluster status at that moment:
>>
>>   "status" : "yellow",
>>   "timed_out" : false,
>>   "unassigned_shards" : 0,
>>
>>
>> On Thursday, March 13, 2014 2:03:44 PM UTC+8, Mark Walkom wrote:
>>>
>>> It looks like a networking issue, at least based on "No route to host" 
>>> in the error.
>>> Can you ping the master when this is happening? What about doing a 
>>> telnet test?
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 13 March 2014 16:54, Hui <dannyh...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>>
>>>> This is the log for the case.
>>>>
>>>>
>>>> The node 10.1.4.196 was removed at 14:08 because the machine rebooted. The 
>>>> client kept trying to connect to the elasticsearch cluster but failed.
>>>>
>>>> Master Node : 
>>>> [2014-03-08 14:08:26,531][INFO ][cluster.service          ] 
>>>> [10.1.4.197:9202] removed 
>>>> {[10.1.4.196:9202][_sJrum34QWGqEkv8CvAtow][inet[/10.1.4.196:9302]],}, 
>>>> reason: 
>>>> zen-disco-node_failed([10.1.4.196:9202][_sJrum34QWGqEkv8CvAtow][inet[/10.1.4.196:9302]]),
>>>>  reason failed to ping, tried [3] times, each with maximum [30s] timeout
>>>>
>>>>
>>>> Client : 
>>>> 2014-03-08 14:15:36,184 WARN  org.elasticsearch.transport.netty - 
>>>> [Bulldozer] exception caught on transport layer [[id: 0x50dc218f]], 
>>>> closing connection
>>>> java.net.NoRouteToHostException: No route to host
>>>>
>>>>
>>>> (The cluster health at this moment is Yellow and there is no unassigned 
>>>> shard.)
>>>>
>>>>
>>>>
>>>>
>>>> The node came back at 14:25, and the client could successfully connect to the 
>>>> cluster again.
>>>>
>>>> Client :
>>>>
>>>> 2014-03-08 14:25:20,597 WARN  org.elasticsearch.transport.netty - 
>>>> [Bulldozer] exception caught on transport layer [[id: 0xf24d85d7]], 
>>>> closing connection
>>>> java.net.NoRouteToHostException: No route to host
>>>>
>>>>
>>>> Master Node :
>>>>
>>>> [2014-03-08 14:25:57,984][INFO ][cluster.service          ] 
>>>> [10.1.4.197:9202] added 
>>>> {[10.1.4.196:9202][rFZ7k7XSSY231EgPoDfmFw][inet[/10.1.4.196:9302]],}, 
>>>> reason: zen-disco-receive(join from 
>>>> node[[10.1.4.196:9202][rFZ7k7XSSY231EgPoDfmFw][inet[/10.1.4.196:9302]]])
>>>>
>>>>
>>>> (The cluster health at this moment is Green.)
>>>>
>>>> In the above case, the client should be able to connect to the cluster 
>>>> even when a node is removed from the cluster.
>>>>
>>>>
>>>> For the client, the connection is created as follows:
>>>>
>>>>
>>>>         Settings settings = ImmutableSettings.settingsBuilder()
>>>>                 .put("cluster.name", "clustername")
>>>>                 .put("client.transport.sniff", true)
>>>>                 .build();
>>>>
>>>>         TransportClient client = new TransportClient(settings);
>>>>
>>>>         client.addTransportAddress(new InetSocketTransportAddress(
>>>>                 "10.1.4.195" /* hostname */, 9300 /* port */));
>>>>         client.addTransportAddress(new InetSocketTransportAddress(
>>>>                 "10.1.4.196" /* hostname */, 9300 /* port */));
>>>>         client.addTransportAddress(new InetSocketTransportAddress(
>>>>                 "10.1.4.197" /* hostname */, 9300 /* port */));
>>>>
>>>> The master node is 10.1.4.197 while the node being removed is 
>>>> 10.1.4.196.
>>>>
>>>> For the cluster settings, everything uses the defaults except 
>>>> discovery.zen.minimum_master_nodes, which is set to 3.
>>>>
>>>> Is there any problem with the above settings that could cause this issue?
>>>>
>>>> Thanks.
>>>>
>>>>  -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to elasticsearc...@googlegroups.com.
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/elasticsearch/b1f3adf5-723b-49aa-bffe-674c5ce930e5%40googlegroups.com.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>
