MORE INFO: I grepped only the 'WARN' messages.
MASTER Node(ES1) logs:
[2014-06-30 09:02:36,942][WARN ][index.engine.internal ] [NES1] [logsjmeter14][2] failed engine [refresh failed]
[2014-06-30 09:02:37,715][WARN ][cluster.action.shard ] [NES1] [logsjmeter14][2] sending failed shard for [logsjmeter14][2], node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID [lXE8Wre0S3KxjGs9Jov1tw], reason [engine failure, message [refresh failed][CorruptIndexException[codec header mismatch: actual header=0 vs expected header=1071082519 (resource: BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1a_es090_0.blm in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs") slice=29488:29662)))]]]
[2014-06-30 09:02:37,717][WARN ][cluster.action.shard ] [NES1] [logsjmeter14][2] received shard failed for [logsjmeter14][2], node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID [lXE8Wre0S3KxjGs9Jov1tw], reason [engine failure, message [refresh failed][CorruptIndexException[codec header mismatch: actual header=0 vs expected header=1071082519 (resource: BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1a_es090_0.blm in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs") slice=29488:29662)))]]]
[2014-06-30 09:03:14,809][WARN ][cluster.action.shard ] [NES1] [logsjmeter87][4] received shard failed for [logsjmeter87][4], node[XVuxg7fzTT-Xy-ArWiapJQ], [R], s[STARTED], indexUUID [leU8sfPETKCeQFvntNY9sg], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=Lucene45DocValuesData vs expected codec=Lucene45ValuesMetadata (resource: BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_12_Lucene45_0.dvm in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs") slice=15224:15300)))]]]
[2014-06-30 09:03:24,021][WARN ][index.engine.internal ] [NES1] [logsjmeter65][1] failed engine [refresh failed]
[2014-06-30 09:03:24,371][WARN ][cluster.action.shard ] [NES1] [logsjmeter65][1] sending failed shard for [logsjmeter65][1], node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID [WXUHlSGVQ-GPGSKg0oWPIw], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=XBloomFilter vs expected codec=Lucene41NormsMetadata (resource: BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1b.nvm in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs") slice=15048:15209)))]]]
[2014-06-30 09:03:24,371][WARN ][cluster.action.shard ] [NES1] [logsjmeter65][1] received shard failed for [logsjmeter65][1], node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID [WXUHlSGVQ-GPGSKg0oWPIw], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=XBloomFilter vs expected codec=Lucene41NormsMetadata (resource: BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1b.nvm in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs") slice=15048:15209)))]]]
[2014-06-30 09:03:31,778][WARN ][index.engine.internal ] [NES1] [logsjmeter79][0] failed engine [refresh failed]
[2014-06-30 09:03:32,084][WARN ][cluster.action.shard ] [NES1] [logsjmeter79][0] sending failed shard for [logsjmeter79][0], node[dbPhRQoQQE-Tlgict_gfeg], [R], s[STARTED], indexUUID [NZgUPNQnT0Ss0Lhk9PUz1w], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=BLOCK_TREE_TERMS_INDEX vs expected codec=CompoundFileWriterEntries (resource: BufferedChecksumIndexInput(MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter79/0/index/_z.cfe")))]]]
[2014-06-30 09:03:32,086][WARN ][cluster.action.shard ] [NES1] [logsjmeter79][0] received shard failed for [logsjmeter79][0], node[dbPhRQoQQE-Tlgict_gfeg], [R], s[STARTED], indexUUID [NZgUPNQnT0Ss0Lhk9PUz1w], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=BLOCK_TREE_TERMS_INDEX vs expected codec=CompoundFileWriterEntries (resource: BufferedChecksumIndexInput(MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter79/0/index/_z.cfe")))]]]
[2014-06-30 09:03:33,865][WARN ][monitor.jvm ] [NES1] [gc][young][228848][7461] duration [1.7s], collections [1]/[2s], total [1.7s]/[4.4m], memory [3gb]->[2.8gb]/[3.9gb], all_pools {[young] [168.5mb]->[30.6mb]/[266.2mb]}{[survivor] [27.8mb]->[29.2mb]/[33.2mb]}{[old] [2.8gb]->[2.8gb]/[3.6gb]}
[2014-06-30 09:03:57,762][WARN ][cluster.action.shard ] [NES1] [logsjmeter39][1] received shard failed for [logsjmeter39][1], node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs") slice=17707:22401))]]]

ES2 logs:
[2014-06-30 09:03:14,785][WARN ][cluster.action.shard ] [NES2] [logsjmeter87][4] sending failed shard for [logsjmeter87][4], node[XVuxg7fzTT-Xy-ArWiapJQ], [R], s[STARTED], indexUUID [leU8sfPETKCeQFvntNY9sg], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=Lucene45DocValuesData vs expected codec=Lucene45ValuesMetadata (resource:
BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_12_Lucene45_0.dvm in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs") slice=15224:15300)))]]]

ES3 logs:
[2014-06-30 09:03:57,639][WARN ][cluster.action.shard ] [NES3] [logsjmeter39][1] sending failed shard for [logsjmeter39][1], node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs") slice=17707:22401))]]]

Thanks and Regards
Sri

On Monday, June 30, 2014 9:07:37 AM UTC-4, sri wrote:
>
> Hi Simon,
>
> I am currently using Elasticsearch 1.2.1, and I am getting the error on all
> my data nodes. Below are the errors:
>
> [2014-06-30 09:03:57,762][WARN ][cluster.action.shard ] [NES1] [logsjmeter39][1] received shard failed for [logsjmeter39][1], node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs") slice=17707:22401))]]]
>
> [2014-06-30 09:03:14,785][WARN ][cluster.action.shard ] [NES2] [logsjmeter87][4] sending failed shard for [logsjmeter87][4], node[XVuxg7fzTT-Xy-ArWiapJQ], [R], s[STARTED], indexUUID [leU8sfPETKCeQFvntNY9sg], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=Lucene45DocValuesData vs expected codec=Lucene45ValuesMetadata (resource: BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_12_Lucene45_0.dvm in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs") slice=15224:15300)))]]]
>
> [2014-06-30 09:03:57,639][WARN ][cluster.action.shard ] [NES3] [logsjmeter39][1] sending failed shard for [logsjmeter39][1], node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs") slice=17707:22401))]]]
>
> Thanks and Regards
> Sri
>
> On Monday, June 30, 2014 4:00:23 AM UTC-4, simonw wrote:
>>
>> Hey,
>>
>> Thanks for raising this. Can you give me more info, i.e. which version you
>> are using and whether this happens on only one shard or on all shards in
>> your system? It could be just what it says, an index corruption, maybe due
>> to HW failure, but there could be other reasons.
>>
>> simon
>>
>> On Friday, June 27, 2014 5:20:26 PM UTC+2, sri wrote:
>>>
>>> Hi
>>>
>>> I am getting the below error on my ES cluster quite frequently, but I am
>>> not able to understand why it is happening.
>>>
>>> [2014-06-27 11:12:50,014][WARN ][cluster.action.shard ] [NES1] [logsjmeter62][0] received shard failed for [logsjmeter62][0], node[ZqO9OQ8VQ0uGkvXdIeovRg], [P], s[STARTED], indexUUID [EfBgCRm8SWu4AtsNPYVXyA], reason [engine failure, message [refresh failed][CorruptIndexException[codec mismatch: actual codec=Lucene41PostingsWriterDoc vs expected codec=Lucene46FieldInfos (resource: BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_39.fnm in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter62/0/index/_39.cfs")) in MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter62/0/index/_39.cfs") slice=7371:8755)))]]]
>>>
>>> Thanks and Regards
>>> Sri
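
For anyone following the thread: the question of whether this hits one shard or many can be read straight off the WARN lines. Below is a minimal Python sketch, not part of the original messages, that groups the "sending failed shard" / "received shard failed" entries by index, shard, node id and primary/replica flag. The names `SHARD_FAILURE`, `summarize_failures` and `SAMPLE` are made up for illustration, and the regular expression is only fitted to the line layout quoted in this thread; the padding inside the brackets of a raw log file may differ, so it is matched loosely.

```python
import re
from collections import defaultdict

# Fitted to the WARN lines quoted in this thread (an assumption about the
# general layout); bracket padding is matched loosely with \s*.
SHARD_FAILURE = re.compile(
    r"\[(?P<time>[^\]]+)\]\[WARN\s*\]\[cluster\.action\.shard\s*\]\s*"
    r"\[(?P<reporter>[^\]]+)\]\s*\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]\s*"
    r"(?:sending failed shard|received shard failed) for .*?"
    r"node\[(?P<node>[^\]]+)\], \[(?P<role>[PR])\].*?"
    r"CorruptIndexException\[(?P<detail>.*?) \(resource:"
)


def summarize_failures(log_text):
    """Group corruption reports by (index, shard, node id, P/R flag).

    The same failure is logged by the data node ("sending") and by the
    master ("received"), so details are collected in a set to deduplicate.
    """
    summary = defaultdict(set)
    for m in SHARD_FAILURE.finditer(log_text):
        key = (m["index"], int(m["shard"]), m["node"], m["role"])
        summary[key].add(m["detail"])
    return dict(summary)


# Two entries copied from the NES1 log above (same failure, logged twice).
SAMPLE = (
    '[2014-06-30 09:03:32,084][WARN ][cluster.action.shard ] [NES1] '
    '[logsjmeter79][0] sending failed shard for [logsjmeter79][0], '
    'node[dbPhRQoQQE-Tlgict_gfeg], [R], s[STARTED], indexUUID '
    '[NZgUPNQnT0Ss0Lhk9PUz1w], reason [engine failure, message [refresh '
    'failed][CorruptIndexException[codec mismatch: actual '
    'codec=BLOCK_TREE_TERMS_INDEX vs expected codec=CompoundFileWriterEntries '
    '(resource: BufferedChecksumIndexInput(MMapIndexInput(path="/data/es/'
    'NESClus/nodes/0/indices/logsjmeter79/0/index/_z.cfe")))]]]\n'
    '[2014-06-30 09:03:32,086][WARN ][cluster.action.shard ] [NES1] '
    '[logsjmeter79][0] received shard failed for [logsjmeter79][0], '
    'node[dbPhRQoQQE-Tlgict_gfeg], [R], s[STARTED], indexUUID '
    '[NZgUPNQnT0Ss0Lhk9PUz1w], reason [engine failure, message [refresh '
    'failed][CorruptIndexException[codec mismatch: actual '
    'codec=BLOCK_TREE_TERMS_INDEX vs expected codec=CompoundFileWriterEntries '
    '(resource: BufferedChecksumIndexInput(MMapIndexInput(path="/data/es/'
    'NESClus/nodes/0/indices/logsjmeter79/0/index/_z.cfe")))]]]\n'
)

if __name__ == "__main__":
    for (index, shard, node, role), details in sorted(
        summarize_failures(SAMPLE).items()
    ):
        for detail in sorted(details):
            print(f"{index}[{shard}] node={node} [{role}]: {detail}")
```

Run against the full logs from all three nodes, a summary like this would show at a glance whether the corruption is confined to one index, one node, or spread across the cluster. It only reads what the WARN lines already say and makes no claim about the underlying cause.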