Re: ES upgrade 0.20.6 to 1.4.2 - CorruptIndexException and FileNotFoundException

2015-01-05 Thread Georgeta
Any ideas?


ES upgrade 0.20.6 to 1.4.2 - CorruptIndexException and FileNotFoundException

2014-12-31 Thread Georgeta
Hi All,

I have a five-node cluster that I upgraded from 0.20.6 to 1.4.2.
When I start the cluster with shard allocation disabled, it comes up and goes
into a yellow state, all good. As soon as I re-enable shard allocation, the
following WARN messages are logged:
 
INFO || elasticsearch[node1][clusterService#updateTask][T#1] 
org.elasticsearch.cluster.routing.allocation.decider  [node1] updating 
[cluster.routing.allocation.disable_allocation] from [true] to [false]
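For reference, toggling allocation through the cluster settings API looks roughly like this (host and port are the stock defaults, adjust to your setup). On 1.x the old `disable_allocation` flag shown in the log above still works, though `cluster.routing.allocation.enable` is the newer equivalent:

```shell
# Disable shard allocation before restarting nodes
# (transient settings are cleared on a full cluster restart)
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": true
  }
}'

# Re-enable allocation once all upgraded nodes have rejoined
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": false
  }
}'
```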

[2014-12-31 13:46:26.310 GMT] WARN || 
elasticsearch[node1][[transport_server_worker.default]][T#4]{New I/O worker 
#21} org.elasticsearch.cluster.action.shard  [node1] [index1][2] received 
shard failed for [index1][2], node[x6PqV8RMS8eA9GmBMZwjNQ], [P], 
s[STARTED], indexUUID [_na_], reason [engine failure, message [corrupt file 
detected source: [recovery phase 
1]][RecoverFilesRecoveryException[[index1][2] Failed to transfer [69] files 
with total size of [6.5mb]]; nested: CorruptIndexException[checksum failed 
(hardware problem?) : expected=17tw8li actual=1ig9y12 
resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@61297ce5)]; ]]

[2014-12-31 13:46:35.504 GMT] WARN || 
elasticsearch[node1][[transport_server_worker.default]][T#14]{New I/O 
worker #31} org.elasticsearch.cluster.action.shard  [node1] [index2][0] 
received shard failed for [index2][0], node[GORnFBrmQLOAvK294MUHgA], [P], 
s[STARTED], indexUUID [_na_], reason [engine failure, message [corrupt file 
detected source: [recovery phase 
1]][RecoverFilesRecoveryException[[index2][0] Failed to transfer [163] 
files with total size of [238.1mb]]; nested: CorruptIndexException[checksum 
failed (hardware problem?) : expected=ptu7cd actual=1jw7kx9 
resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@38c14092)]; ]]

[2014-12-31 13:46:36.777 GMT] WARN || 
elasticsearch[node1][[transport_server_worker.default]][T#15]{New I/O 
worker #32} org.elasticsearch.cluster.action.shard  [node1] [index2][0] 
received shard failed for [index2][0], node[GORnFBrmQLOAvK294MUHgA], [P], 
s[STARTED], indexUUID [_na_], reason [master 
[node1][8zFPkXuvQQWJvErc458tFA][dw1949demum.int.demandware.com][inet[/127.0.0.1:48003]]{local=false, power_zone=default} marked shard as started, but shard has not been 
created, mark shard as failed]

[2014-12-31 13:46:36.792 GMT] WARN || 
elasticsearch[node1][[transport_server_worker.default]][T#14]{New I/O 
worker #31} org.elasticsearch.cluster.action.shard  [node1] [index1][2] 
received shard failed for [index1][2], node[2mIDLcOcQJO4i73QHb7d6Q], [P], 
s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message 
[IndexShardGatewayRecoveryException[[index1][2] failed recovery]; nested: 
EngineCreationFailureException[[index1][2] failed to open reader on 
writer]; nested: FileNotFoundException[No such file [_5aa.tis]]; ]]
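When a shard reports a CorruptIndexException like the ones above, the on-disk Lucene segments can be inspected directly with Lucene's CheckIndex tool before deciding whether to restore from a backup or accept data loss. The jar path, data path, cluster name, and shard number below are only illustrative for a default 1.4.2 install (which bundles Lucene 4.10.2); adjust all of them:

```shell
# Path to the suspect shard's Lucene index directory (example layout)
SHARD_INDEX=/var/lib/elasticsearch/mycluster/nodes/0/indices/index1/2/index

# Read-only integrity check; lucene-core ships in the Elasticsearch lib directory
java -cp /usr/share/elasticsearch/lib/lucene-core-4.10.2.jar \
  org.apache.lucene.index.CheckIndex "$SHARD_INDEX"

# Only if you accept losing the documents in broken segments:
# java -cp /usr/share/elasticsearch/lib/lucene-core-4.10.2.jar \
#   org.apache.lucene.index.CheckIndex "$SHARD_INDEX" -fix
```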

[2014-12-31 13:46:47.261 GMT] WARN || 
elasticsearch[node1][[transport_server_worker.default]][T#6]{New I/O worker 
#23} org.elasticsearch.cluster.action.shard  [node1] [index1][2] received 
shard failed for [index1][2], node[x6PqV8RMS8eA9GmBMZwjNQ], [P], 
s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message 
[IndexShardGatewayRecoveryException[[index1][2] failed to fetch index 
version after copying it over]; nested: CorruptIndexException[[index1][2] 
Preexisting corrupted index [corrupted_gExs5fftSwmCWWgUKN6Wbg] caused by: 
CorruptIndexException[checksum failed (hardware problem?) : 
expected=17tw8li actual=1ig9y12 
resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@61297ce5)]
org.apache.lucene.index.CorruptIndexException: checksum failed (hardware 
problem?) : expected=17tw8li actual=1ig9y12 
resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@61297ce5)
at 
org.elasticsearch.index.store.LegacyVerification$Adler32VerifyingIndexOutput.verify(LegacyVerification.java:73)
at org.elasticsearch.index.store.Store.verify(Store.java:365)
at 
org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:599)
at 
org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:536)
at 
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Suppressed: org.elasticsearch.transport.RemoteTransportException: 
[node5][inet[/127.0.0.1:48043]][internal:index/shard/recovery/file_chunk]
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed 
(hardware problem?) : expected=55hiu actual=16i3yt2 
resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@108f1be6)
at 
org.elasticsearch.index.store.LegacyVerification$Adler32VerifyingIndexOutput.verify(LegacyVerification.java:73)
at
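One note on the "Preexisting corrupted index [corrupted_gExs5fftSwmCWWgUKN6Wbg]" message above: once a checksum failure is detected, Elasticsearch writes a `corrupted_<uuid>` marker file into the shard's index directory, and subsequent recovery attempts fail fast on that marker. Locating the markers shows which shard copies are affected (the data path below is a common package default; adjust):

```shell
# List the corruption marker files Elasticsearch leaves next to broken shard copies
find /var/lib/elasticsearch -name 'corrupted_*' -print
```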