Re: corrupted shard after optimize

2015-03-24 Thread mjdude5
Quick follow-up question: is it safe to run CheckIndex with -fix while ES is 
also running on the node?  I understand that some documents will be lost.
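
For readers weighing the same question, here is a cautious sketch rather than a confirmed answer. It assumes the node has been stopped first, since Lucene's CheckIndex documentation warns against repairing an index that a writer still has open; confirm that for your version. The helper name, the ".bak" suffix and both paths are hypothetical. It copies the shard's index directory aside and only prints the -fix command so it can be reviewed before anything is removed.

```shell
# Hypothetical helper: back up a shard's index directory, then print
# (not run) the CheckIndex -fix command for review.
# Assumption: Elasticsearch on this node is stopped before calling it.
backup_then_print_fix() {
  src="$1"                 # shard index directory (hypothetical path)
  lib="$2"                 # Elasticsearch lib directory (hypothetical path)
  backup="${src%/}.bak"    # sibling backup directory

  # Refuse to overwrite an earlier backup.
  [ -e "$backup" ] && return 1

  cp -R "$src" "$backup" || return 1

  # -fix drops references to segments that fail the check, so the
  # documents in those segments are lost.
  printf 'java -cp "%s/*" org.apache.lucene.index.CheckIndex %s -fix\n' "$lib" "$src"
}

# Usage (paths are placeholders):
#   backup_then_print_fix /path/to/shard/index /path/to/es-install/lib
```

Printing the command instead of executing it keeps the destructive step a deliberate, separate action.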

Re: corrupted shard after optimize

2015-03-24 Thread mjdude5
Thanks for the CheckIndex info, that worked!  It looks like only one of the 
segments in that shard has issues:

  1 of 20: name=_1om docCount=216683
    codec=Lucene3x
    compound=false
    numFiles=10
    size (MB)=5,111.421
    diagnostics = {os=Linux, os.version=3.5.7, mergeFactor=7, source=merge, lucene.version=3.6.0 1310449 - rmuir - 2012-04-06 11:31:16, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.6.0_26, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.OK
    test: check integrity.OK
    test: check live docs.OK
    test: fields..OK [31 fields]
    test: field norms.OK [20 fields]
    test: terms, freq, prox...ERROR: java.lang.AssertionError: index=216690, numBits=216683
java.lang.AssertionError: index=216690, numBits=216683
    at org.apache.lucene.util.FixedBitSet.set(FixedBitSet.java:252)
    at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:932)
    at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1325)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:631)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
    test: stored fields...OK [3033562 total field count; avg 14 fields per doc]
    test: term vectors...OK [0 total vector count; avg 0 term/freq vector fields per doc]
    test: docvalues...OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:646)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)

This is on ES 1.3.4, but the index I was running optimize on was likely 
created back in 0.9 or 1.0.
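
When a shard has many segments, a short summary of which ones failed, and how many documents each holds, helps estimate what -fix would drop. The sketch below assumes the CheckIndex output was saved to a file and that each segment header carries `name=` and `docCount=` on one line, followed later by a line starting with `FAILED`, as in the report above. The function name and file name are hypothetical.

```shell
# Hypothetical helper: list "segment-name docCount" for every segment
# that CheckIndex marked FAILED in a saved report.
failed_segments() {
  awk '
    # Segment header, e.g. "  1 of 20: name=_1om docCount=216683"
    /name=/ && /docCount=/ {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^name=/)     { name = substr($i, 6) }
        if ($i ~ /^docCount=/) { docs = substr($i, 10) }
      }
    }
    # A line starting with FAILED closes the report for the current segment.
    /^[ \t]*FAILED/ { print name, docs }
  ' "$1"
}

# Usage (file name is a placeholder):
#   failed_segments checkindex-report.txt
```

The document counts it prints are an upper bound on the loss per segment, since -fix removes the whole segment reference.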

Re: corrupted shard after optimize

2015-03-24 Thread Michael McCandless
Hmm, not good.

Which version of ES?  Do you have a full stack trace for the exception?

To run CheckIndex you need to add all ES jars to the classpath.  It's
easiest to just use a wildcard for this, e.g.:

  java -cp "/path/to/es-install/lib/*" org.apache.lucene.index.CheckIndex ...

Make sure you have the double quotes so the shell does not expand that
wildcard!
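
A minimal sketch of the quoting point, with hypothetical paths: the function only prints the command so the quoting can be inspected before anything runs. The shard index path and the `-ea:org.apache.lucene...` flag (taken from the original attempt in this thread) are assumptions about what follows the elided "..." above. The quoted wildcard puts every jar in the lib directory on the classpath, which presumably includes the one that supplies the 'es090' postings format.

```shell
# Hypothetical locations; adjust both for your install.
ES_LIB="/path/to/es-install/lib"
SHARD_INDEX="/path/to/shard/index"   # assumption: the shard's Lucene index directory

# Print the CheckIndex command instead of running it.
checkindex_cmd() {
  # The double quotes around the classpath keep the * literal, so the JVM
  # expands it to all jars in the directory rather than the shell.
  printf 'java -cp "%s/*" -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex %s\n' "$1" "$2"
}

checkindex_cmd "$ES_LIB" "$SHARD_INDEX"
```

Without -fix, CheckIndex only reports problems and does not modify the index.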

Mike McCandless

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKHUQPhMOJWkN9p_En%2BWDM98bEDHSWTi36B_TcQsZSw%2BBKorYQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


corrupted shard after optimize

2015-03-23 Thread mjdude5
I did an optimize on this index and it looks like it caused a shard to 
become corrupted.  Or maybe the optimize just brought the shard corruption 
to light?

On the node that reported the corrupted shard I tried shutting it down, 
moving the shard out and then restarting. Unfortunately the next node that 
got that shard then started with the same corruption issues.  The errors:

Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN ][indices.cluster  ] [Meteorite II] [1-2013][0] failed to start shard
Mar 24 01:40:17 localhost org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [1-2013][0] failed to fetch index version after copying it over
Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN ][cluster.action.shard ] [Meteorite II] [1-2013][0] sending failed shard for [1-2013][0], node[ZzXsIZCsTyWD2emFuU0idg], [P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[1-2013][0] failed to fetch index version after copying it over]; nested: CorruptIndexException[[1-2013][0] Corrupted index [corrupted_OahNymObSTyBzCCPu1FuJA] caused by: CorruptIndexException[docs out of order (1493829 <= 1493874 ) (docOut: org.apache.lucene.store.RateLimitedIndexOutput@2901a3e1)]]; ]]

I tried using CheckIndex, but had this issue:

java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Pulsing41, SimpleText, Memory, BloomFilter, Direct, FSTPulsing41, FSTOrdPulsing41, FST41, FSTOrd41, Lucene40, Lucene41]

When running with:

java -cp /usr/share/elasticsearch/lib/lucene-codecs-4.9.1.jar:/usr/share/elasticsearch/lib/lucene-core-4.9.1.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex

I'm not a Java programmer, so after trying other classpath combinations I 
was out of ideas.


Any tips?  Looking at _cat/shards the replica is currently marked 
"unassigned" while the primary is "initializing".  Thanks!
