Re: corrupted shard after optimize

2015-03-24 Thread mjdude5
Quick follow-up question: is it safe to run CheckIndex with -fix while ES is 
also running on the node?  I understand that some documents will be lost.


Re: corrupted shard after optimize

2015-03-24 Thread mjdude5
Thanks for the CheckIndex info, that worked!  It looks like only one of the 
segments in that shard has issues:

  1 of 20: name=_1om docCount=216683
    codec=Lucene3x
    compound=false
    numFiles=10
    size (MB)=5,111.421
    diagnostics = {os=Linux, os.version=3.5.7, mergeFactor=7, source=merge, lucene.version=3.6.0 1310449 - rmuir - 2012-04-06 11:31:16, os.arch=amd64, mergeMaxNumSegments=-1, java.version=1.6.0_26, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: check integrity.....OK
    test: check live docs.....OK
    test: fields..............OK [31 fields]
    test: field norms.........OK [20 fields]
    test: terms, freq, prox...ERROR: java.lang.AssertionError: index=216690, numBits=216683
java.lang.AssertionError: index=216690, numBits=216683
    at org.apache.lucene.util.FixedBitSet.set(FixedBitSet.java:252)
    at org.apache.lucene.index.CheckIndex.checkFields(CheckIndex.java:932)
    at org.apache.lucene.index.CheckIndex.testPostings(CheckIndex.java:1325)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:631)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)
    test: stored fields.......OK [3033562 total field count; avg 14 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
    test: docvalues...........OK [0 docvalues fields; 0 BINARY; 0 NUMERIC; 0 SORTED; 0 SORTED_NUMERIC; 0 SORTED_SET]
FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.RuntimeException: Term Index test failed
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:646)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2051)

This is on ES 1.3.4, but the index I was running optimize on was likely 
created back in 0.9 or 1.0.
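
With 20 segments in the report, it can help to pull out only the ones that failed. The following is a minimal sketch, assuming the report was saved to a file and that each failing segment's block contains a line consisting only of FAILED, as in the output above. The function name failed_segments and the second, healthy segment in the sample are made up for illustration.

```shell
# failed_segments is a hypothetical helper: it reads CheckIndex output on stdin
# and prints the name of every segment whose report contains a FAILED line.
failed_segments() {
  awk '
    # Segment header, e.g. "1 of 20: name=_1om docCount=216683":
    # remember the segment name (the text after "name=").
    / of [0-9]+: name=/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^name=/) seg = substr($i, 6)
    }
    # A line holding only the word FAILED marks the current segment as bad.
    NF == 1 && $1 == "FAILED" { print seg }
  '
}

# Abbreviated sample: the first segment is taken from the report above,
# the second (_1on) is an invented healthy segment for contrast.
sample='  1 of 20: name=_1om docCount=216683
    test: terms, freq, prox...ERROR: java.lang.AssertionError: index=216690, numBits=216683
FAILED
  2 of 20: name=_1on docCount=100
    test: open reader.........OK'

printf '%s\n' "$sample" | failed_segments
```

Against a real report the same function would be fed the saved file, for example `failed_segments < report.txt`, where report.txt is a placeholder name for wherever the CheckIndex output was redirected.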


Re: corrupted shard after optimize

2015-03-24 Thread Michael McCandless
Hmm, not good.

Which version of ES?  Do you have a full stack trace for the exception?

To run CheckIndex you need to add all ES jars to the classpath.  It's
easiest to just use a wildcard for this, e.g.:

  java -cp "/path/to/es-install/lib/*" org.apache.lucene.index.CheckIndex ...

Make sure you have the double quotes so the shell does not expand that
wildcard!
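
To see why the quotes matter, here is a small shell sketch that uses a throwaway directory with two empty placeholder files (the directory, the file names, and the count_args helper are all made up for the illustration). Unquoted, the shell expands the wildcard into one argument per file, so java would treat only the first file as the classpath. Quoted, the pattern reaches the JVM as a single argument, and the JVM itself expands a trailing /* into all jar files in that directory.

```shell
# count_args is a hypothetical helper that prints how many arguments it received.
count_args() { echo "$#"; }

# Throwaway directory with two empty placeholder files (names are illustrative).
demo_lib=$(mktemp -d)
touch "$demo_lib/a.jar" "$demo_lib/b.jar"

# Unquoted: the shell expands the wildcard before the command sees it.
unquoted=$(count_args $demo_lib/*)

# Quoted: the pattern is passed through as one literal argument.
quoted=$(count_args "$demo_lib/*")

echo "unquoted=$unquoted quoted=$quoted"
rm -r "$demo_lib"
```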

Mike McCandless

On Mon, Mar 23, 2015 at 9:50 PM,  wrote:

> I did an optimize on this index and it looks like it caused a shard to
> become corrupted.  Or maybe the optimize just brought the shard corruption
> to light?
>
> On the node that reported the corrupted shard I tried shutting it down,
> moving the shard out and then restarting. Unfortunately the next node that
> got that shard then started with the same corruption issues.  The errors:
>
> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN ][indices.cluster  ] [Meteorite II] [1-2013][0] failed to start shard
> Mar 24 01:40:17 localhost org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [1-2013][0] failed to fetch index version after copying it over
> Mar 24 01:40:17 localhost elasticsearch: [bma.0][WARN ][cluster.action.shard ] [Meteorite II] [1-2013][0] sending failed shard for [1-2013][0], node[ZzXsIZCsTyWD2emFuU0idg], [P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[1-2013][0] failed to fetch index version after copying it over]; nested: CorruptIndexException[[1-2013][0] Corrupted index [corrupted_OahNymObSTyBzCCPu1FuJA] caused by: CorruptIndexException[docs out of order (1493829 <= 1493874 ) (docOut: org.apache.lucene.store.RateLimitedIndexOutput@2901a3e1)]]; ]]
>
> I tried using CheckIndex, but had this issue:
>
> java.lang.IllegalArgumentException: A SPI class of type
> org.apache.lucene.codecs.PostingsFormat with name 'es090' does not exist.
> You need to add the corresponding JAR file supporting this SPI to your
> classpath.The current classpath supports the following names: [Pulsing41,
> SimpleText, Memory, BloomFilter, Direct, FSTPulsing41, FSTOrdPulsing41,
> FST41, FSTOrd41, Lucene40, Lucene41]
>
> When running with:
>
> java -cp /usr/share/elasticsearch/lib/lucene-codecs-4.9.1.jar:/usr/share/elasticsearch/lib/lucene-core-4.9.1.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex
>
> I'm not a Java programmer, so after trying other classpath combinations I
> was out of ideas.
>
>
> Any tips?  Looking at _cat/shards the replica is currently marked
> "unassigned" while the primary is "initializing".  Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to elasticsearch+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/31fa3d97-02fa-4d1c-b507-d413051f2ea3%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
