Re: (ES 0.90.1) Cannot connect to elasticsearch cluster after a node is removed

Hui Thu, 13 Mar 2014 20:22:20 -0700

Hi Echin,

Since the problem node's IP is configured in the client's connection through the 
Java API, I guess the client will still try to connect to this node. That would 
explain the warnings.
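
To confirm which of the configured addresses still accepts connections, something like the following could be run from the client machine. It is only a sketch: the class name SeedProbe and the method isReachable are made up for illustration, and port 9300 is simply the port used in the client code from my original post. It does a plain TCP connect with a timeout, which is roughly what the telnet test suggested earlier in the thread would show.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the Elasticsearch API.
final class SeedProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers connection refused, timeouts, and NoRouteToHostException.
            return false;
        }
    }

    public static void main(String[] args) {
        // Addresses and port copied from the client code in the original post.
        String[] hosts = {"10.1.4.195", "10.1.4.196", "10.1.4.197"};
        int transportPort = 9300;
        for (String host : hosts) {
            boolean up = isReachable(host, transportPort, 2000);
            System.out.println(host + ":" + transportPort + " reachable=" + up);
        }
    }
}
```

If only 10.1.4.196 reports as unreachable while the node is down, the warnings are probably just the reconnect attempts to that one address.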


The client should be able to keep working with the cluster. However, in 
my case, the Java client is not reachable and times out (over the HTTP 
protocol).

I will set up a test cluster with the same settings to check whether the 
client works correctly in this situation.
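
One setting I want to double-check in the test cluster is discovery.zen.minimum_master_nodes, which is 3 in my current configuration. As far as I understand, the usual recommendation is a majority of the master-eligible nodes, that is (n / 2) + 1. The sketch below only does that arithmetic. The class and method names are hypothetical, and the node counts are assumptions for illustration, not measurements from the real cluster.

```java
// Hypothetical helper for reasoning about minimum_master_nodes.
final class QuorumCheck {

    /** Majority of master-eligible nodes: (n / 2) + 1, using integer division. */
    static int majority(int masterEligibleNodes) {
        return masterEligibleNodes / 2 + 1;
    }

    /** True if the live master-eligible nodes still satisfy minimum_master_nodes. */
    static boolean canElectMaster(int liveMasterEligibleNodes, int minimumMasterNodes) {
        return liveMasterEligibleNodes >= minimumMasterNodes;
    }

    public static void main(String[] args) {
        int assumedMasterEligible = 3; // assumption: three master-eligible nodes
        int configuredMinimum = 3;     // value from the original post
        int liveAfterOneFailure = assumedMasterEligible - 1;

        System.out.println("majority for " + assumedMasterEligible + " nodes = "
                + majority(assumedMasterEligible));
        System.out.println("master electable after one failure = "
                + canElectMaster(liveAfterOneFailure, configuredMinimum));
    }
}
```

Under that assumption, a minimum of 3 would leave no room for losing any master-eligible node, so the value is worth comparing against the real number of master-eligible nodes.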

Thanks.

On Thursday, March 13, 2014 11:54:53 PM UTC+8, echin1999 wrote:
>
> One more thing: I notice that functionally, the client is still able to 
> communicate with the remaining active node, so I guess this warning is just 
> a "warning". There must be some background thread that periodically looks for 
> the missing node, while the main Client instance can still communicate with 
> the active node. Would you be able to verify whether it's merely a warning for 
> you? If so, I might just not worry about this for now.
>
> On Thursday, March 13, 2014 5:15:36 AM UTC-4, Hui wrote:
>>
>> Hi Dome, 
>>
>> Do you mean the service on 10.1.4.196 is not running? Yes, the service 
>> would have been stopped while the machine was rebooting.
>>
>> But the master node 10.1.4.197 removed the problem node 10.1.4.196 
>> once it could no longer ping 10.1.4.196.
>>
>> The cluster should be fine after that. Am I misunderstanding 
>> something?
>>
>> Thanks
>>
>> On Thursday, March 13, 2014 4:48:17 PM UTC+8, Dome.C.Wei wrote:
>>>
>>> That must be because the service is not running.
>>>
>>> On Thursday, March 13, 2014 2:10:22 PM UTC+8, Hui wrote:
>>>>
>>>> Hi Mark,
>>>>
>>>> Thanks for replying.
>>>>
>>>> The master (10.1.4.197) and the other nodes can be reached, while the 
>>>> problem node (10.1.4.196) is not reachable.
>>>> So this is the cluster status at that moment:
>>>>
>>>>   "status" : "yellow",
>>>>   "timed_out" : false,
>>>>   "unassigned_shards" : 0,
>>>>
>>>>
>>>> On Thursday, March 13, 2014 2:03:44 PM UTC+8, Mark Walkom wrote:
>>>>>
>>>>> It looks like a networking issue, at least based on "No route to host" 
>>>>> in the error.
>>>>> Can you ping the master when this is happening? What about doing a 
>>>>> telnet test?
>>>>>
>>>>> Regards,
>>>>> Mark Walkom
>>>>>
>>>>> Infrastructure Engineer
>>>>> Campaign Monitor
>>>>> email: ma...@campaignmonitor.com
>>>>> web: www.campaignmonitor.com
>>>>>
>>>>>
>>>>> On 13 March 2014 16:54, Hui <dannyh...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>>
>>>>>> This is the log for the case.
>>>>>>
>>>>>>
>>>>>> The node 10.1.4.196 was removed at 14:08 due to a machine reboot. The 
>>>>>> client kept trying to connect to the Elasticsearch cluster but failed.
>>>>>>
>>>>>> Master Node : 
>>>>>> [2014-03-08 14:08:26,531][INFO ][cluster.service          ] 
>>>>>> [10.1.4.197:9202] removed 
>>>>>> {[10.1.4.196:9202][_sJrum34QWGqEkv8CvAtow][inet[/10.1.4.196:9302]],}, 
>>>>>> reason: 
>>>>>> zen-disco-node_failed([10.1.4.196:9202][_sJrum34QWGqEkv8CvAtow][inet[/10.1.4.196:9302]]),
>>>>>>  reason failed to ping, tried [3] times, each with maximum [30s] timeout
>>>>>>
>>>>>>
>>>>>> Client : 
>>>>>> 2014-03-08 14:15:36,184 WARN  org.elasticsearch.transport.netty - 
>>>>>> [Bulldozer] exception caught on transport layer [[id: 0x50dc218f]], 
>>>>>> closing connection
>>>>>> java.net.NoRouteToHostException: No route to host
>>>>>>
>>>>>>
>>>>>> (The cluster health at this moment is Yellow and there is no unassigned 
>>>>>> shard.)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> The node came back at 14:25, and the client could successfully connect 
>>>>>> to the cluster again.
>>>>>>
>>>>>> Client :
>>>>>>
>>>>>> 2014-03-08 14:25:20,597 WARN  org.elasticsearch.transport.netty - 
>>>>>> [Bulldozer] exception caught on transport layer [[id: 0xf24d85d7]], 
>>>>>> closing connection
>>>>>> java.net.NoRouteToHostException: No route to host
>>>>>>
>>>>>>
>>>>>> Master Node :
>>>>>>
>>>>>> [2014-03-08 14:25:57,984][INFO ][cluster.service          ] 
>>>>>> [10.1.4.197:9202] added 
>>>>>> {[10.1.4.196:9202][rFZ7k7XSSY231EgPoDfmFw][inet[/10.1.4.196:9302]],}, 
>>>>>> reason: zen-disco-receive(join from 
>>>>>> node[[10.1.4.196:9202][rFZ7k7XSSY231EgPoDfmFw][inet[/10.1.4.196:9302]]])
>>>>>>
>>>>>>
>>>>>> (The cluster health at this moment is Green.)
>>>>>>
>>>>>> In the above case, the client should be able to connect to the cluster 
>>>>>> even when a node is removed from the cluster.
>>>>>>
>>>>>>
>>>>>> For the client, the connection is created as follows:
>>>>>>
>>>>>>
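>>>>>>         import org.elasticsearch.client.transport.TransportClient;
>>>>>>         import org.elasticsearch.common.settings.ImmutableSettings;
>>>>>>         import org.elasticsearch.common.settings.Settings;
>>>>>>         import org.elasticsearch.common.transport.InetSocketTransportAddress;
>>>>>>
>>>>>>         // Enable sniffing so the client discovers the other cluster nodes.
>>>>>>         Settings settings = ImmutableSettings.settingsBuilder()
>>>>>>                 .put("cluster.name", "clustername")
>>>>>>                 .put("client.transport.sniff", true)
>>>>>>                 .build();
>>>>>>
>>>>>>         TransportClient client = new TransportClient(settings);
>>>>>>
>>>>>>         // Seed addresses; 10.1.4.196 is the node that was removed.
>>>>>>         client.addTransportAddress(new InetSocketTransportAddress(
>>>>>>                 "10.1.4.195" /* hostname */, 9300 /* port */));
>>>>>>         client.addTransportAddress(new InetSocketTransportAddress(
>>>>>>                 "10.1.4.196" /* hostname */, 9300 /* port */));
>>>>>>         client.addTransportAddress(new InetSocketTransportAddress(
>>>>>>                 "10.1.4.197" /* hostname */, 9300 /* port */));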
>>>>>>
>>>>>> The master node is 10.1.4.197 while the node being removed is 
>>>>>> 10.1.4.196.
>>>>>>
>>>>>> For the cluster settings, everything uses the defaults except 
>>>>>> discovery.zen.minimum_master_nodes, which is set to 3.
>>>>>>
>>>>>> Is there any problem with the above settings that could cause this issue?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>  -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "elasticsearch" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to elasticsearc...@googlegroups.com.
>>>>>> To view this discussion on the web visit 
>>>>>> https://groups.google.com/d/msgid/elasticsearch/b1f3adf5-723b-49aa-bffe-674c5ce930e5%40googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>>

