Hi Alexander,

We sent you the stack trace. Can you please enlighten us on this?
Thanks,
Rohit

On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal <rohit.jais...@gmail.com> wrote:

> Hi Alexander,
>
> Thanks for your reply. We plan to upgrade in the long run, however we need
> to fix the data loss problem on 0.90.2 in the immediate term.
>
> Here is the stack trace -
>
> 10:09:37.783 PM
>
> [22:09:37,783][WARN ][indices.cluster ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
> org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
>     at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
>     at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
>     at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
>     at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
>     at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.elasticsearch.transport.RemoteTransportException: [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
> Caused by: org.elasticsearch.indices.InvalidAliasNameException: [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter
>     at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> [22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed shard for [b7a76aa06cfd4048987d1117f3e0433a][0], node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested: RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed]; nested: RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter]; ]]
>
> [22:09:38,025][WARN ][indices.cluster ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
> org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
>     at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
>     at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
>     at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
>     at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
>     at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.elasticsearch.transport.RemoteTransportException: [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
> Caused by: org.elasticsearch.indices.InvalidAliasNameException: [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter
>     at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> [22:09:38,042][WARN ][cluster.action.shard ] [Storm] sending failed shard for [b7a76aa06cfd4048987d1117f3e0433a][0], node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested: RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed]; nested: RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter]; ]]
>
> Let us know..
>
> Thanks,
> Rohit
>
> On Mon, Jun 16, 2014 at 6:13 AM, Alexander Reelsen <a...@spinscale.de> wrote:
>
>> Hey,
>>
>> without stack traces it is pretty hard to see the actual problem, do you
>> have them around (on one node this exception has happened, so it should
>> have been logged into the elasticsearch logfile as well). Also, you should
>> really upgrade if possible, as releases after 0.90.2 have seen many many
>> improvements.
>>
>> --Alex
>>
>> On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal <rohit.jais...@gmail.com> wrote:
>>
>>> Hello Everyone,
>>>
>>> We lost data after restarting our Elasticsearch cluster. Restarting is a
>>> part of deploying our software stack.
>>>
>>> We have a 20-node cluster running 0.90.2, and we have Splunk configured
>>> to index the ES logs.
>>> Looking at the Splunk logs, we found the following error a day before
>>> the deployment (restart) -
>>>
>>> [cluster.action.shard ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]
>>>
>>> [cluster.action.shard ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]
>>>
>>> Further, a day after the deploy, we see the same errors on another node -
>>>
>>> [cluster.action.shard ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]
>>>
>>> Immediately after that, the following "failed to start shard" error
>>> appears, and it is seen repeatedly on a couple of other nodes as well -
>>>
>>> [cluster.action.shard ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw][inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle][inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed]; nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7] Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]
>>>
>>> During this time, we could not access previously indexed documents.
>>>
>>> I looked up the alias error; it appears to be related to
>>> https://github.com/elasticsearch/elasticsearch/issues/1198 (Delete By
>>> Query wrongly persisted to translog #1198), but that issue was fixed in
>>> ES 0.18.0 and we are running 0.90.2, so why is ES encountering it?
>>>
>>> What do we need to do to set this right and get back the lost data?
>>> Please help.
>>>
>>> Thanks.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
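
[Editor's note on the trace above] The InvalidAliasNameException is thrown while the recovering replica replays a delete-by-query from the primary's translog: the recorded operation carries a filtered-alias name that no longer exists on the index, so `IndexAliasesService.aliasFilter` cannot resolve it and phase 2 of recovery aborts. One possible (untested, hypothetical) workaround before retrying recovery is to re-create the missing alias so the replay can resolve the filter; the index and alias names below are copied from the log excerpt, and the match-nothing `term` filter on `_id` is an assumption chosen so the replayed delete removes no extra documents. A minimal sketch of the `_aliases` request body:

```python
import json

# Names copied from the InvalidAliasNameException in the log above.
index = "b7a76aa06cfd4048987d1117f3e0433a"
alias = "1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb"

# Hypothetical match-nothing filter: no document has this _id, so if the
# re-created alias is used by the replayed delete-by-query, it deletes nothing.
payload = {
    "actions": [
        {
            "add": {
                "index": index,
                "alias": alias,
                "filter": {"term": {"_id": "___no_such_doc___"}},
            }
        }
    ]
}

# POST this body to http://<any-node>:9200/_aliases (e.g. with curl), then
# let the failed shard retry allocation.
print(json.dumps(payload, indent=2))
```

Whether the shard then recovers cleanly depends on why the alias disappeared in the first place; upgrading past 0.90.2, as suggested earlier in the thread, remains the durable fix.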