Hi Alexander,

We sent you the stack trace. Can you please enlighten us on this?
Thanks,
Rohit

On Mon, Jun 16, 2014 at 10:25 AM, Rohit Jaiswal <rohit.jais...@gmail.com> wrote:

> Hi Alexander,
>
> Thanks for your reply. We plan to upgrade in the long run, however we need
> to fix the data loss problem on 0.90.2 in the immediate term.
>
> Here is the stack trace -
>
> 10:09:37.783 PM
>
> [22:09:37,783][WARN ][indices.cluster ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
> org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
>     at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
>     at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
>     at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
>     at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
>     at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.elasticsearch.transport.RemoteTransportException: [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
> Caused by: org.elasticsearch.indices.InvalidAliasNameException: [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter
>     at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> [22:09:37,799][WARN ][cluster.action.shard ] [Storm] sending failed shard for [b7a76aa06cfd4048987d1117f3e0433a][0], node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested: RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed]; nested: RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter]; ]]
>
> [22:09:38,025][WARN ][indices.cluster ] [Storm] [b7a76aa06cfd4048987d1117f3e0433a][0] failed to start shard
> org.elasticsearch.indices.recovery.RecoveryFailedException: [b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]
>     at org.elasticsearch.indices.recovery.RecoveryTarget.doRecovery(RecoveryTarget.java:293)
>     at org.elasticsearch.indices.recovery.RecoveryTarget.access$300(RecoveryTarget.java:62)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$2.run(RecoveryTarget.java:163)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.elasticsearch.transport.RemoteTransportException: [Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]
> Caused by: org.elasticsearch.index.engine.RecoveryEngineException: [b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed
>     at org.elasticsearch.index.engine.robin.RobinEngine.recover(RobinEngine.java:1147)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.recover(InternalIndexShard.java:526)
>     at org.elasticsearch.indices.recovery.RecoverySource.recover(RecoverySource.java:116)
>     at org.elasticsearch.indices.recovery.RecoverySource.access$1600(RecoverySource.java:60)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:328)
>     at org.elasticsearch.indices.recovery.RecoverySource$StartRecoveryTransportRequestHandler.messageReceived(RecoverySource.java:314)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.elasticsearch.transport.RemoteTransportException: [Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]
> Caused by: org.elasticsearch.indices.InvalidAliasNameException: [b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter
>     at org.elasticsearch.index.aliases.IndexAliasesService.aliasFilter(IndexAliasesService.java:99)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.prepareDeleteByQuery(InternalIndexShard.java:382)
>     at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryOperation(InternalIndexShard.java:628)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:447)
>     at org.elasticsearch.indices.recovery.RecoveryTarget$TranslogOperationsRequestHandler.messageReceived(RecoveryTarget.java:416)
>     at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:265)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
>
> [22:09:38,042][WARN ][cluster.action.shard ] [Storm] sending failed shard for [b7a76aa06cfd4048987d1117f3e0433a][0], node[FiW6mbR5ThqqSii5Wc28lQ], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[b7a76aa06cfd4048987d1117f3e0433a][0]: Recovery failed from [Jeffrey Mace][_jjr5BYJQjO6QzzheyDmhw][inet[/10.4.35.200:9300]] into [Storm][FiW6mbR5ThqqSii5Wc28lQ][inet[/10.4.40.95:9300]]]; nested: RemoteTransportException[[Jeffrey Mace][inet[/10.4.35.200:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[b7a76aa06cfd4048987d1117f3e0433a][0] Phase[2] Execution failed]; nested: RemoteTransportException[[Storm][inet[/10.4.40.95:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[b7a76aa06cfd4048987d1117f3e0433a] Invalid alias name [1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb], Unknown alias name was passed to alias Filter]; ]]
>
> Let us know..
>
> Thanks,
> Rohit
>
> On Mon, Jun 16, 2014 at 6:13 AM, Alexander Reelsen <a...@spinscale.de> wrote:
>
>> Hey,
>>
>> without stack traces it is pretty hard to see the actual problem, do you
>> have them around (on one node this exception has happened, so it should
>> have been logged into the elasticsearch logfile as well). Also, you should
>> really upgrade if possible, as releases after 0.90.2 have seen many many
>> improvements.
>>
>> --Alex
>>
>> On Mon, Jun 9, 2014 at 4:15 AM, Rohit Jaiswal <rohit.jais...@gmail.com> wrote:
>>
>>> Hello Everyone,
>>>
>>> We lost data after restarting our Elasticsearch cluster. Restarting is a
>>> part of deploying our software stack.
>>>
>>> We have a 20-node cluster running 0.90.2, and we have Splunk configured
>>> to index the ES logs.
>>> Looking at the Splunk logs, we found the following error a day before
>>> the deployment (restart) -
>>>
>>> [cluster.action.shard ] [Rictor] sending failed shard for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]
>>>
>>> [cluster.action.shard ] [Kiss] received shard failed for [c0a71ddaa70b463a9a179c36c7fc26e3][2], node[nJvnclczRNaLbETunjlcWw], [R], s[STARTED], reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]
>>>
>>> Further, a day after the deploy, we see the same errors on another node -
>>>
>>> [cluster.action.shard ] [Contrary] received shard failed for [a58f9413315048ecb0abea48f5f6aae7][1], node[3UbHwVCkQvO3XroIl-awPw], [R], s[STARTED], reason [Failed to perform [bulk/shard] on replica, message [RemoteTransportException; nested: ResponseHandlerFailureTransportException; nested: NullPointerException; ]]
>>>
>>> Immediately after that, the following "failed to start shard" error
>>> appears, and it is seen repeatedly on a couple of other nodes as well -
>>>
>>> [cluster.action.shard ] [Copperhead] sending failed shard for [a58f9413315048ecb0abea48f5f6aae7][0], node[EuRzr3MLQiSS6lzTZJbiKw], [R], s[INITIALIZING], reason [Failed to start shard, message [RecoveryFailedException[[a58f9413315048ecb0abea48f5f6aae7][0]: Recovery failed from [Frank Castle][dlv2mPypQaOxLPQhHQ67Fw][inet[/10.2.136.81:9300]] into [Copperhead][EuRzr3MLQiSS6lzTZJbiKw][inet[/10.3.207.55:9300]]]; nested: RemoteTransportException[[Frank Castle][inet[/10.2.136.81:9300]][index/shard/recovery/startRecovery]]; nested: RecoveryEngineException[[a58f9413315048ecb0abea48f5f6aae7][0] Phase[2] Execution failed]; nested: RemoteTransportException[[Copperhead][inet[/10.3.207.55:9300]][index/shard/recovery/translogOps]]; nested: InvalidAliasNameException[[a58f9413315048ecb0abea48f5f6aae7] Invalid alias name [fbf1e55418a2327d308e7632911f9bb8bfed58059dd7f1e4abd3467c5f8519c3], Unknown alias name was passed to alias Filter]; ]]
>>>
>>> During this time, we could not access previously indexed documents.
>>>
>>> I looked up the alias error; it appears to be related to
>>> https://github.com/elasticsearch/elasticsearch/issues/1198 (Delete By
>>> Query wrongly persisted to translog #1198), but that issue was fixed in
>>> ES 0.18.0 and we are running 0.90.2, so why is ES encountering it?
>>>
>>> What do we need to do to set this right and get back the lost data?
>>> Please help.
>>>
>>> Thanks.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/00e54753-ab89-4f63-a39e-0931e8f7e2f0%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAP_rV8FrhSb%2BuDQdb26t3WwUOykB1HEY0q0pkchtKb-6_hboMA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
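
[Editor's note on the trace above] The InvalidAliasNameException is thrown while the recovering replica replays a delete-by-query from the primary's translog: the recorded operation carries a filtered-alias name that no longer exists on the index, so `IndexAliasesService.aliasFilter` cannot resolve it and phase 2 of recovery aborts. One possible (untested, hypothetical) workaround before retrying recovery is to re-create the missing alias so the replay can resolve the filter; the index and alias names below are copied from the log excerpt, and the match-nothing `term` filter on `_id` is an assumption chosen so the replayed delete removes no extra documents. A minimal sketch of the `_aliases` request body:

```python
import json

# Names copied from the InvalidAliasNameException in the log above.
index = "b7a76aa06cfd4048987d1117f3e0433a"
alias = "1a4077872e41c0634cee780c1e5fc263bdd5f14b15ac9239480547ab2d3601eb"

# Hypothetical match-nothing filter: no document has this _id, so if the
# re-created alias is used by the replayed delete-by-query, it deletes nothing.
payload = {
    "actions": [
        {
            "add": {
                "index": index,
                "alias": alias,
                "filter": {"term": {"_id": "___no_such_doc___"}},
            }
        }
    ]
}

# POST this body to http://<any-node>:9200/_aliases (e.g. with curl), then
# let the failed shard retry allocation.
print(json.dumps(payload, indent=2))
```

Whether the shard then recovers cleanly depends on why the alias disappeared in the first place; upgrading past 0.90.2, as suggested earlier in the thread, remains the durable fix.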