Re: Failing Replica Shards

2014-11-30 Thread Jakub Podeszwik
Small mistake. Point 1 should read:
1. If a shard had more than one segment, optimizing it down to one segment 
usually worked.
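
For reference, a minimal sketch of what such an optimize call might look like on ES 1.x, assuming the `_optimize` endpoint with `max_num_segments=1`. The host `http://localhost:9200` and the helper name `optimize_request` are placeholders, not something taken from this thread. The call applies to every shard of the index, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request


def optimize_request(base_url, index, max_segments=1):
    """Build (but do not send) a POST to the ES 1.x _optimize endpoint."""
    query = urllib.parse.urlencode({"max_num_segments": max_segments})
    url = "{}/{}/_optimize?{}".format(
        base_url.rstrip("/"), urllib.parse.quote(index), query
    )
    # An empty body with an explicit POST method.
    return urllib.request.Request(url, data=b"", method="POST")


if __name__ == "__main__":
    # Hypothetical host; the index name is one of those in the original post.
    req = optimize_request("http://localhost:9200", "eventlog-2014.06")
    print(req.get_method(), req.full_url)
    # To run it against a real cluster: urllib.request.urlopen(req)
```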

On Sunday, 30 November 2014 12:00:37 UTC+1, Jakub Podeszwik wrote:
>
> I've had similar problems. Two things helped:
> 1. If an index had more than one shard, optimizing it down to one shard 
> usually worked.
> 2. Otherwise, manually copying the shard files from the node holding the 
> primary shard to one of the nodes that kept failing.

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/bef48895-f1ec-41d3-9f3c-6009723f103b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Failing Replica Shards

2014-11-30 Thread Jakub Podeszwik
I've had similar problems. Two things helped:
1. If an index had more than one shard, optimizing it down to one shard 
usually worked.
2. Otherwise, manually copying the shard files from the node holding the 
primary shard to one of the nodes that kept failing.
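
A rough sketch of the second approach follows. The directory layout `<node data root>/indices/<index>/<shard>` is inferred from the paths in the log lines of the original post (for example `/acc/ES/NBS/nodes/0/indices/eventlog-2014.06/0/index/...`). The helper names, the backup step, and the idea that both data roots are reachable from one machine (for instance over a mount) are assumptions. Presumably the receiving node should not be running while its files are replaced.

```python
import os
import shutil


def shard_dir(node_data_root, index, shard):
    """Path of one shard under a node data root such as .../nodes/0 (assumed layout)."""
    return os.path.join(node_data_root, "indices", index, str(shard))


def replace_shard_copy(src_root, dst_root, index, shard, backup_dir):
    """Copy a shard directory from a healthy node root over a failing one.

    The old copy on the failing node is moved into backup_dir first,
    so nothing is deleted outright.
    """
    src = shard_dir(src_root, index, shard)
    dst = shard_dir(dst_root, index, shard)
    if os.path.isdir(dst):
        # Keep the suspect files outside the data path for later inspection.
        shutil.move(dst, os.path.join(backup_dir, "{}-{}".format(index, shard)))
    shutil.copytree(src, dst)
    return dst
```

A call would look like `replace_shard_copy(good_root, bad_root, "eventlog-2014.06", 0, "/tmp/shard-backup")`, where all three paths are hypothetical.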

On Sunday, 30 November 2014 00:57:02 UTC+1, David Kleiner wrote:
>
> Hello Mehmet,
>
> For two indices with problematic shards (symptoms: a shard starts 
> recovering, recovery stops, and recovery is then attempted on a different 
> node), I manually changed the replica count to 1 and then to 2.  With a big 
> index (over 90G, I think), I was never able to recover the dual replica 
> set; thankfully it was OK to drop it.  Upgrading to a more recent ES 
> version helped too. 
>
> HTH,
>
> David



Re: Failing Replica Shards

2014-11-29 Thread David Kleiner
Hello Mehmet,

For two indices with problematic shards (symptoms: a shard starts 
recovering, recovery stops, and recovery is then attempted on a different 
node), I manually changed the replica count to 1 and then to 2.  With a big 
index (over 90G, I think), I was never able to recover the dual replica 
set; thankfully it was OK to drop it.  Upgrading to a more recent ES 
version helped too. 
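
The replica count change can be sketched as two calls to the index settings API (`PUT /<index>/_settings` with `index.number_of_replicas`). The host, the helper name `replica_count_request` and the choice of index are placeholders. The requests are only built here, and in practice one would presumably wait for the cluster to settle between the two steps.

```python
import json
import urllib.request


def replica_count_request(base_url, index, replicas):
    """Build (but do not send) a settings update that sets the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    url = "{}/{}/_settings".format(base_url.rstrip("/"), index)
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical host; first drop to one replica, then go back to two.
    for count in (1, 2):
        req = replica_count_request("http://localhost:9200", "eventlog-2014.07", count)
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
        # To run it against a real cluster: urllib.request.urlopen(req)
```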

HTH,

David

On Saturday, November 29, 2014 2:48:45 AM UTC-8, Mehmet Cem Güntürkün wrote:
>
> Hey David, I have the same problem now. Have you found a solution for 
> it?



Re: Failing Replica Shards

2014-11-29 Thread Mehmet Cem Güntürkün
Hey David, I have the same problem now. Have you found a solution for 
it?




Failing Replica Shards

2014-08-26 Thread David Kleiner
Hello,

In the past couple of days I've been getting a lot of error messages about 
corrupted replica shards.  The primary shards come up quickly after an ES 
process restart, but the replicas take a long time to come back.  Sometimes 
it takes a few node restarts to 'kick' the nodes into starting the replica 
shards.

ES version is 1.3.1 running on CentOS 6.5 hosted at Softlayer.  It's a 
3-way cluster with 4 logstash feeders hanging off it. 

Here are the errors:

[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 / Salvador Dali] [downloader-2014.08][4] received shard failed for [downloader-2014.08][4], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], indexUUID [2vRrb5YlQP6MTVr1chOezg], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[downloader-2014.08][4] Corrupted index [corrupted_SkU0-ZHZRxivSnGczABb_g] caused by: CorruptIndexException[codec footer mismatch: actual footer=-1676705023 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/acc/ES/NBS/nodes/0/indices/downloader-2014.08/4/index/_k9a_es090_0.doc"))
[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 / Salvador Dali] [eventlog-2014.06][0] received shard failed for [eventlog-2014.06][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], indexUUID [jbvChdRrRB6HTutxPvxMmQ], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[eventlog-2014.06][0] Corrupted index [corrupted__712QIBQQqafzpBoQwZtcg] caused by: CorruptIndexException[codec footer mismatch: actual footer=0 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/acc/ES/NBS/nodes/0/indices/eventlog-2014.06/0/index/_1k4x.nvd"))
[2014-08-26 15:01:18,684][WARN ][cluster.action.shard ] [log03 / Salvador Dali] [eventlog-2014.07][0] received shard failed for [eventlog-2014.07][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING], indexUUID [T4tTXkPjTaCdSVNTjHfOcg], reason [engine failure, message [corrupted preexisting index][CorruptIndexException[[eventlog-2014.07][0] Corrupted index [corrupted_OzfNRRGyTIq8a1PRhLYG2w] caused by: CorruptIndexException[codec footer mismatch: actual footer=0 vs expected footer=-1071082520 (resource: NIOFSIndexInput(path="/acc/ES/NBS/nodes/0/indices/eventlog-2014.07/0/index/_rqf.nvd"))
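
Each entry above carries the index, shard number, node id, the mismatching footer values and the affected file. A small sketch for pulling those fields out of a (possibly hard-wrapped) entry follows. The regular expression and the names `SHARD_FAILED` and `parse_shard_failure` are illustrative assumptions, tuned only to the three entries in this post. The expected value -1071082520 appears to be the bitwise complement of Lucene's codec magic number 0x3fd76c17, which the last line checks.

```python
import re

# Pattern tuned to the "received shard failed" entries quoted in this post.
SHARD_FAILED = re.compile(
    r'received shard failed for \[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\], '
    r'node\[(?P<node>[^\]]+)\], \[(?P<kind>[PR])\].*?'
    r'actual footer=(?P<actual>-?\d+) vs expected footer=(?P<expected>-?\d+)'
    r'.*?path="(?P<path>[^"]+)"'
)


def parse_shard_failure(entry):
    """Return a dict of fields from one log entry, or None if it does not match."""
    flat = " ".join(entry.split())  # undo mail-client line wrapping
    match = SHARD_FAILED.search(flat)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("shard", "actual", "expected"):
        info[key] = int(info[key])
    return info


# Second entry from the post, wrapped the way the mail client wrapped it.
SAMPLE = """[2014-08-26 15:01:18,682][WARN ][cluster.action.shard ] [log03 /
Salvador Dali] [eventlog-2014.06][0] received shard failed for
[eventlog-2014.06][0], node[l9-BQTHSSF-ElhgpPBZ24w], [R], s[INITIALIZING],
indexUUID [jbvChdRrRB6HTutxPvxMmQ], reason [engine failure, message
[corrupted preexisting index][CorruptIndexException[[eventlog-2014.06][0]
Corrupted index [corrupted__712QIBQQqafzpBoQwZtcg] caused by:
CorruptIndexException[codec footer mismatch: actual footer=0 vs expected
footer=-1071082520 (resource:
NIOFSIndexInput(path="/acc/ES/NBS/nodes/0/indices/eventlog-2014.06/0/index/_1k4x.nvd"))
"""

info = parse_shard_failure(SAMPLE)
# Complement of the codec magic, as a signed 32-bit value.
footer_magic = ~0x3FD76C17
```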



Thanks,

David
