[2014-04-28 13:40:15,039][WARN ][cluster.action.shard] [eis05] [ds_clearcase-vob-heat-analyzer][2] sending failed shard for [ds_clearcase-vob-heat-analyzer][2], node[QyeTlW2YQbG27zrsdjBBGA], [R], s[INITIALIZING], indexUUID [ms7jQeuMQduNIHCmjxsKjQ], reason [Failed to start shard, message [RecoveryFailedException[[ds_clearcase-vob-heat-analyzer][2]: Recovery failed from [eis09][p8-_fzHeTR22pSlsBsYm8A][eis09.rnditlab.ericsson.se][inet[/137.58.184.239:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}];
  nested: RemoteTransportException[[eis09][inet[/137.58.184.239:9300]][index/shard/recovery/startRecovery]];
  nested: RecoveryEngineException[[ds_clearcase-vob-heat-analyzer][2] Phase[2] Execution failed];
  nested: ReceiveTimeoutTransportException[[eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [6809886] timed out after [900000ms]]; ]]

[2014-04-28 14:00:11,614][WARN ][indices.cluster] [eis05] [ds_clearcase-vob-heat-analyzer][0] failed to start shard
org.elasticsearch.indices.recovery.RecoveryFailedException: [ds_clearcase-vob-heat-analyzer][0]: Recovery failed from [eis07][Q8ZWgDIXRGiUej1oMoH8Jg][eis07.rnditlab.ericsson.se][inet[/137.58.184.237:9300]]{datacenter=PoCC} into [eis05][QyeTlW2YQbG27zrsdjBBGA][eis05.rnditlab.ericsson.se][inet[eis05.rnditlab.ericsson.se/137.58.184.235:9300]]{datacenter=PoCC}
        at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:307)
        at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:65)
        at org.elasticsearch.indices.recovery.RecoveryTarget$3.run(RecoveryTarget.java:184)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.RemoteTransportException: [eis07][inet[/137.58.184.237:9300]][index/shard/recovery/startRecovery]
Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [ds_clearcase-vob-heat-analyzer][0] Phase[2] Execution failed
        at org.elasticsearch.index.engine.internal.InternalEngine.recover(InternalEngine.java:1098)
        at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:627)
        at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:117)
        at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:61)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:337)
        at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:323)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.ReceiveTimeoutTransportException: [eis05][inet[/137.58.184.235:9300]][index/shard/recovery/prepareTranslog] request_id [154592652] timed out after [900000ms]
        at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:356)
        ... 3 more
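As a side note on the exceptions above: both stack traces end in the same ReceiveTimeoutTransportException on index/shard/recovery/prepareTranslog after 900000 ms, which is the 15-minute limit discussed below. On a 1.x cluster, the recovery settings that are dynamically adjustable can be changed through the cluster settings API; a sketch (the two setting names are from the 1.x recovery settings, the values are purely illustrative, not recommendations for this cluster):

```
# Sketch: speed up ongoing recoveries on a 1.x cluster.
# Values are examples only; tune to the hardware.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.recovery.max_bytes_per_sec": "100mb",
    "indices.recovery.concurrent_streams": 5
  }
}'
```

Faster recovery may let the phase finish inside the existing timeout even if the timeout itself cannot be raised.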
The river has been running for some time, copying new documents from the database into ES. The problem, as I see it, is that the shard is too big to copy within 15 minutes.

/Michael

On Monday, 28 April 2014 12:48:22 UTC+2, Jörg Prante wrote:
>
> Is it possible to post the full timeout exception?
>
> Do you run the JDBC river with replica level 0 and add replicas later,
> after the river completes?
>
> I saw this in the past and I'm not sure if this is related to tight
> resources.
>
> The next JDBC river version will offer more convenient control of
> bulk index settings (automatic replica level 0, refresh disabling,
> re-enabling of refresh & replicas afterwards).
>
> Jörg
>
>
> On Mon, Apr 28, 2014 at 12:22 PM, Michael Salmon <michael...@inovia.nu> wrote:
>
>> I have started getting some timeouts during replication and I am unsure
>> how to proceed. The index is about 500 million documents, or 45 GB, spread
>> over 8 shards, and was created by a JDBC river. The timeout occurs
>> during index/shard/recovery/prepareTranslog. The limit of 15 minutes
>> seems to be hard-coded, or I would have tried changing it.
>>
>> There are several parameters related to index recovery, but I'm not sure
>> how they affect performance. Has anyone any suggestions?
>>
>> /Michael
>>
>> --
>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6c15c79b-b9c5-4f7f-98f1-68b692f86fc6%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c41772d0-7251-4f74-81b1-7f1058ed24f6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
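The replica-0 pattern Jörg describes in the thread above can be applied by hand with the dynamic index settings API while waiting for the next JDBC river release. A sketch, using the index name from the logs (the replica count and refresh interval to restore afterwards are assumptions; substitute the values the index actually uses):

```
# Before the bulk load: no replicas, refresh disabled.
curl -XPUT 'http://localhost:9200/ds_clearcase-vob-heat-analyzer/_settings' -d '{
  "index": { "number_of_replicas": 0, "refresh_interval": "-1" }
}'

# ... run the JDBC river import ...

# Afterwards: restore refresh and add replicas back.
curl -XPUT 'http://localhost:9200/ds_clearcase-vob-heat-analyzer/_settings' -d '{
  "index": { "number_of_replicas": 1, "refresh_interval": "1s" }
}'
```

This sidesteps the replication timeout during the load itself, since replicas are only built once the bulk phase has finished.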