[2014-04-28 13:40:15,039][WARN ][cluster.action.shard] [eis05] [ds_clearcase-vob-heat-analyzer][2] sending failed shard for [ds_clearcase-vob-heat-analyzer][2], node[QyeTlW2YQbG27zrsdjBBGA], [R], s[INITIALIZING], indexUUID [ms7jQeuMQduNIHCmjxsKjQ], reason [Failed to start shard, message [RecoveryFailedException[[ds_clearcase-vob-heat-analyzer][2]: Recovery failed from [eis09][p8-_fzHeTR22pSlsBsYm8A][eis09.rnditlab.ericsson.se][inet[/137.58.184.239:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}];
  nested: RemoteTransportException[[eis09][inet[/137.58.184.239:9300]][index/shard/recovery/startRecovery]];
  nested: RecoveryEngineException[[ds_clearcase-vob-heat-analyzer][2] Phase[2] Execution failed];
  nested: ReceiveTimeoutTransportException[[eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [6809886] timed out after [900000ms]]; ]]

[2014-04-28 14:00:11,614][WARN ][indices.cluster] [eis05] [ds_clearcase-vob-heat-analyzer][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [ds_clearcase-vob-heat-analyzer][0]: Recovery failed from [eis07][Q8ZWgDIXRGiUej1oMoH8Jg][eis07.rnditlab.ericsson.se][inet[/137.58.184.237:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}
        at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:307)
        at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:65)
        at org.elasticsearch.indices.recovery.RecoveryTarget$3.run(RecoveryTarget.java:184)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.RemoteTransportException: [eis07][inet[/137.58.184.237:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [ds_clearcase-vob-heat-analyzer][0] Phase[2] Execution failed
        at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1098)
        at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:627)
        at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:117)
        at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:61)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:323)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [154592652] timed out after [900000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
        ... 3 more
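As a side note on the exceptions above: both stack traces end in the same ReceiveTimeoutTransportException on index/shard/recovery/prepareTranslog after 900000 ms, which is the 15-minute limit discussed below. On a 1.x cluster, the recovery settings that are dynamically adjustable can be changed through the cluster settings API; a sketch (the two setting names are from the 1.x recovery settings, the values are purely illustrative, not recommendations for this cluster):

```
# Sketch: speed up ongoing recoveries on a 1.x cluster.
# Values are examples only; tune to the hardware.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "100mb",
    "indices.recovery.concurrent_streams": 5
  }
}'
```

Faster recovery may let the phase finish inside the existing timeout even if the timeout itself cannot be raised.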
The river has been running for some time, copying new documents from the database into ES. The problem, as I see it, is that the shard is too big to copy within 15 minutes.

/Michael

On Monday, 28 April 2014 12:48:22 UTC+2, Jörg Prante wrote:
>
> Is it possible to post the full timeout exception?
>
> Do you run the JDBC river with replica level 0 and add replicas later,
> after the river completes?
>
> I saw this in the past and I'm not sure if this is related to tight
> resources.
>
> The next JDBC river version will offer more convenient control of
> bulk index settings (automatic replica level 0, refresh disabling,
> re-enabling of refresh & replicas afterwards).
>
> Jörg
>
>
> On Mon, Apr 28, 2014 at 12:22 PM, Michael Salmon <michael...@inovia.nu> wrote:
>
>> I have started getting some timeouts during replication and I am unsure
>> how to proceed. The index is about 500 million documents, or 45 GB, spread
>> over 8 shards, and was created by a JDBC river. The timeout occurs
>> during index/shard/recovery/prepareTranslog. The limit of 15 minutes
>> seems to be hard-coded, or I would have tried changing it.
>>
>> There are several parameters related to index recovery, but I'm not sure
>> how they affect performance. Has anyone any suggestions?
>>
>> /Michael
>>
>> --
>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6c15c79b-b9c5-4f7f-98f1-68b692f86fc6%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c41772d0-7251-4f74-81b1-7f1058ed24f6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
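The replica-0 pattern Jörg describes in the thread above can be applied by hand with the dynamic index settings API while waiting for the next JDBC river release. A sketch, using the index name from the logs (the replica count and refresh interval to restore afterwards are assumptions; substitute the values the index actually uses):

```
# Before the bulk load: no replicas, refresh disabled.
curl -XPUT 'http://localhost:9200/ds_clearcase-vob-heat-analyzer/_settings' -d '{
  "index": { "number_of_replicas": 0, "refresh_interval": "-1" }
}'

# ... run the JDBC river import ...

# Afterwards: restore refresh and add replicas back.
curl -XPUT 'http://localhost:9200/ds_clearcase-vob-heat-analyzer/_settings' -d '{
  "index": { "number_of_replicas": 1, "refresh_interval": "1s" }
}'
```

This sidesteps the replication timeout during the load itself, since replicas are only built once the bulk phase has finished.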