Re: ES upgrade 0.20.6 to 1.4.2 - CorruptIndexException and FileNotFoundException
Any ideas?

On Wednesday, December 31, 2014 3:35:39 PM UTC+1, Georgeta wrote:

Hi All,

I have a 5-node cluster, which I upgraded from 0.20.6 to 1.4.2. When I start the cluster with shard allocation disabled, it starts and goes into a yellow state, all good. When I enable shard allocation, WARN messages are generated:

INFO || elasticsearch[node1][clusterService#updateTask][T#1] org.elasticsearch.cluster.routing.allocation.decider [node1] updating [cluster.routing.allocation.disable_allocation] from [true] to [false]

[2014-12-31 13:46:26.310 GMT] WARN || elasticsearch[node1][[transport_server_worker.default]][T#4]{New I/O worker #21} org.elasticsearch.cluster.action.shard [node1] [index1][2] received shard failed for [index1][2], node[x6PqV8RMS8eA9GmBMZwjNQ], [P], s[STARTED], indexUUID [_na_], reason [engine failure, message [corrupt file detected source: [recovery phase 1]][RecoverFilesRecoveryException[[index1][2] Failed to transfer [69] files with total size of [6.5mb]]; nested: CorruptIndexException[checksum failed (hardware problem?) : expected=17tw8li actual=1ig9y12 resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@61297ce5)]; ]]

[2014-12-31 13:46:35.504 GMT] WARN || elasticsearch[node1][[transport_server_worker.default]][T#14]{New I/O worker #31} org.elasticsearch.cluster.action.shard [node1] [index2][0] received shard failed for [index2][0], node[GORnFBrmQLOAvK294MUHgA], [P], s[STARTED], indexUUID [_na_], reason [engine failure, message [corrupt file detected source: [recovery phase 1]][RecoverFilesRecoveryException[[index2][0] Failed to transfer [163] files with total size of [238.1mb]]; nested: CorruptIndexException[checksum failed (hardware problem?) : expected=ptu7cd actual=1jw7kx9 resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@38c14092)]; ]]

[2014-12-31 13:46:36.777 GMT] WARN || elasticsearch[node1][[transport_server_worker.default]][T#15]{New I/O worker #32} org.elasticsearch.cluster.action.shard [node1] [index2][0] received shard failed for [index2][0], node[GORnFBrmQLOAvK294MUHgA], [P], s[STARTED], indexUUID [_na_], reason [master [node1][8zFPkXuvQQWJvErc458tFA][dw1949demum.int.demandware.com][inet[/127.0.0.1:48003]]{local=false, power_zone=default} marked shard as started, but shard has not been created, mark shard as failed]

[2014-12-31 13:46:36.792 GMT] WARN || elasticsearch[node1][[transport_server_worker.default]][T#14]{New I/O worker #31} org.elasticsearch.cluster.action.shard [node1] [index1][2] received shard failed for [index1][2], node[2mIDLcOcQJO4i73QHb7d6Q], [P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index1][2] failed recovery]; nested: EngineCreationFailureException[[index1][2] failed to open reader on writer]; nested: FileNotFoundException[No such file [_5aa.tis]]; ]]

[2014-12-31 13:46:47.261 GMT] WARN || elasticsearch[node1][[transport_server_worker.default]][T#6]{New I/O worker #23} org.elasticsearch.cluster.action.shard [node1] [index1][2] received shard failed for [index1][2], node[x6PqV8RMS8eA9GmBMZwjNQ], [P], s[INITIALIZING], indexUUID [_na_], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[index1][2] failed to fetch index version after copying it over]; nested: CorruptIndexException[[index1][2] Preexisting corrupted index [corrupted_gExs5fftSwmCWWgUKN6Wbg] caused by: CorruptIndexException[checksum failed (hardware problem?) : expected=17tw8li actual=1ig9y12 resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@61297ce5)]

org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=17tw8li actual=1ig9y12 resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@61297ce5)
    at org.elasticsearch.index.store.LegacyVerification$Adler32VerifyingIndexOutput.verify(LegacyVerification.java:73)
    at org.elasticsearch.index.store.Store.verify(Store.java:365)
    at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:599)
    at org.elasticsearch.indices.recovery.RecoveryTarget$FileChunkTransportRequestHandler.messageReceived(RecoveryTarget.java:536)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
    Suppressed: org.elasticsearch.transport.RemoteTransportException: [node5][inet[/127.0.0.1:48043]][internal:index/shard/recovery/file_chunk]
    Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=55hiu actual=16i3yt2 resource=(org.apache.lucene.store.FSDirectory$FSIndexOutput@108f1be6)
        at org.elasticsearch.index.store.LegacyVerification$Adler32VerifyingIndexOutput.verify(LegacyVerification.java:73)
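For context, the first log line is the cluster settings update that re-enables allocation. On a 1.x cluster that toggle is sent through the cluster settings API; a minimal sketch using the legacy `disable_allocation` key shown in the log (1.x also offers the newer `cluster.routing.allocation.enable` setting, and the host/port here are assumptions):

```shell
# Re-enable shard allocation after the rolling upgrade, as a transient
# setting, using the legacy key that appears in the log above.
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": false
  }
}'
```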
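The failing verifier in the trace is `Adler32VerifyingIndexOutput`, so the `expected`/`actual` values are Adler32 checksums; as far as I know Lucene prints them in base 36 (Java's `Long.toString(v, Character.MAX_RADIX)`). If that holds, a sketch like this can compute the same string for a local segment file, to check whether the on-disk copy really differs from what the checksum says it should be:

```python
import zlib

def adler32_base36(path):
    """Compute a file's Adler32 checksum and render it in base 36,
    which should match the expected=/actual= values in the log,
    assuming Lucene's base-36 formatting of legacy checksums."""
    checksum = 1  # Adler32's defined initial value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            checksum = zlib.adler32(chunk, checksum)
    checksum &= 0xFFFFFFFF
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"
    out = ""
    while checksum:
        out = digits[checksum % 36] + out
        checksum //= 36
    return out or "0"
```

Running it over the segment files of a shard that refuses to recover (e.g. `_5aa.tis` and friends under the shard's index directory) and comparing against the logged values would at least tell you whether the corruption is on the source node or introduced during transfer.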