My last test ran into similar problems, even though a master was available. 
Let me briefly explain the scenario: a two-node ES cluster, where node 1 
(isetta) has less heap configured and node 2 (amnesia) has much more. The 
application event-collector@amnesia uses the node client and sends bulk 
requests. The test ran for several hours, but then isetta ran into a heap 
issue. Here is the event-collector application log:

isetta runs into the problem and the application hangs; the other node, 
amnesia, is still available:
2014-11-29 07:09:28,546 INFO  
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]] 
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed 
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],}, 
reason: zen-disco-receive(from master 
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 07:09:28,546 INFO  
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]] 
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed 
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],}, 
reason: zen-disco-receive(from master 
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 07:09:53,958 INFO  
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]] 
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] added 
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],}, 
reason: zen-disco-receive(from master 
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 07:09:53,958 INFO  
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]] 
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] added 
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],}, 
reason: zen-disco-receive(from master 
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])

Much later, I terminated node 1 (isetta) by killing the process:
2014-11-29 09:45:00,590 WARN  
[elasticsearch[event-collector/27768@amnesia][transport_client_worker][T#3]{New 
I/O worker #5}] org.elasticsearch.transport.netty: 
[event-collector/27768@amnesia] exception caught on transport layer [[id: 
0x36217255, /139.2.246.36:54716 => /139.2.247.65:9300]], closing connection
java.io.IOException: Eine vorhandene Verbindung wurde vom Remotehost 
geschlossen [German Windows message: an existing connection was closed by 
the remote host]
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
    at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at 
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at 
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at 
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at 
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
(...)
2014-11-29 09:45:02,509 INFO  
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]] 
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed 
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],}, 
reason: zen-disco-receive(from master 
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 09:45:02,571 INFO  
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]] 
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed 
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],}, 
reason: zen-disco-receive(from master 
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])

Now the failing node is removed, but the application still hangs. The ES 
default configuration is used (I changed the cluster name only), and there 
are no settings on the node client (except the cluster name). Can you give 
me a hint on how I should configure the application client?
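
For reference, this is roughly how the client is set up and used (a minimal 
sketch; the cluster name, index, type, and document source below are 
placeholders, not the real values):

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;

import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

public class EventCollectorClient {
    public static void main(String[] args) {
        // Node client joining the cluster; only the cluster name is set,
        // everything else is left at the ES defaults.
        Node node = nodeBuilder()
                .clusterName("event-collector-cluster")  // placeholder name
                .client(true)                            // client-only node, holds no data
                .node();
        Client client = node.client();

        // Bulk request as sent by the event-collector
        // (index, type, and source are placeholders).
        BulkRequestBuilder bulk = client.prepareBulk();
        bulk.add(client.prepareIndex("events", "event")
                .setSource("{\"field\":\"value\"}"));

        // This is the call that never returns once the cluster loses its master.
        BulkResponse response = bulk.execute().actionGet();
        System.out.println("bulk took " + response.getTookInMillis() + " ms");

        node.close();
    }
}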

Markus




> This is expected behavior.
>
> When there are not enough master nodes and the cluster nodes are waiting 
> for a new master, the cluster is blocked and all clients either hang or 
> get a SERVICE_UNAVAILABLE ClusterBlockException after a timeout.
>
> From the client side, you can play with the fault detection response 
> timeout in the discovery (node client) or the TCP timeouts (transport 
> client) in order to continue.
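
If I understand correctly, these are the settings in question; a sketch with 
example values (cluster name, host, and the concrete timeout values are 
placeholders, not recommendations):

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.node.Node;

import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

public class ClientTimeoutSettings {
    public static void main(String[] args) {
        // Node client: shorten the zen fault detection timeouts so a lost
        // master is detected sooner instead of the request blocking for long.
        Settings nodeSettings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "event-collector-cluster")  // placeholder
                .put("discovery.zen.fd.ping_timeout", "10s")     // default 30s
                .put("discovery.zen.fd.ping_retries", 3)         // default 3
                .build();
        Node node = nodeBuilder().settings(nodeSettings).client(true).node();

        // Transport client: tune the ping/connect timeouts instead.
        Settings transportSettings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "event-collector-cluster")
                .put("client.transport.ping_timeout", "5s")      // default 5s
                .put("transport.tcp.connect_timeout", "10s")
                .build();
        TransportClient transportClient = new TransportClient(transportSettings)
                .addTransportAddress(new InetSocketTransportAddress("amnesia", 9300));

        transportClient.close();
        node.close();
    }
}
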
>
> Jörg
>
>
> On Fri, Nov 28, 2014 at 3:09 PM, <[email protected]> wrote:
>
>> While testing how to handle ES cluster connectivity issues, I ran into a 
>> serious problem. The Java API node client is connected and then the ES 
>> server is killed. The application hangs in some bulk request, and this 
>> call never returns. It also does not return even after the cluster is 
>> started again. On the console this exception is shown:
>>
>> Exception in thread 
>> "elasticsearch[event-collector/12240@amnesia][generic][T#2]" 
>> org.elasticsearch.cluster.block.ClusterBlockException: blocked by: 
>> [SERVICE_UNAVAILABLE/1/state not recovered / 
>> initialized];[SERVICE_UNAVAILABLE/2/no master];
>>     at 
>> org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:138)
>>     at 
>> org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:128)
>>     at 
>> org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:197)
>>     at 
>> org.elasticsearch.action.bulk.TransportBulkAction.access$000(TransportBulkAction.java:65)
>>     at 
>> org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(TransportBulkAction.java:143)
>>     at 
>> org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(TransportAction.java:119)
>>     at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> I am surprised that this scenario does not work. Every other scenario, 
>> e.g. shutting down 1 of 2 nodes, is handled transparently. But here the 
>> client application seems to hang forever.
>>
>> Any ideas?
>>
>> regards,
>> markus
>>
>
>
