Apologies for this mess. There is one more thing: the document count toggles between 3748436 and 3748478, when the actual count should be 3750000.
I am unclear as to why this is happening.

Thanks a lot for the help.

Regards
Sri

On Monday, June 30, 2014 9:15:16 AM UTC-4, sri wrote:
>
> MORE INFO:
>
> I grepped only the 'WARN' messages.
>
> MASTER Node(ES1) logs:
> [2014-06-30 09:02:36,942][WARN ][index.engine.internal ] [NES1]
> [logsjmeter14][2] failed engine [refresh failed]
> [2014-06-30 09:02:37,715][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter14][2] sending failed shard for [logsjmeter14][2],
> node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID
> [lXE8Wre0S3KxjGs9Jov1tw], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec header mismatch: actual header=0 vs
> expected header=1071082519 (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1a_es090_0.blm
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs"))
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs")
> slice=29488:29662)))]]]
> [2014-06-30 09:02:37,717][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter14][2] received shard failed for [logsjmeter14][2],
> node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID
> [lXE8Wre0S3KxjGs9Jov1tw], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec header mismatch: actual header=0 vs
> expected header=1071082519 (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1a_es090_0.blm
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs"))
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter14/2/index/_1a.cfs")
> slice=29488:29662)))]]]
> [2014-06-30 09:03:14,809][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter87][4] received shard failed for [logsjmeter87][4],
> node[XVuxg7fzTT-Xy-ArWiapJQ], [R], s[STARTED], indexUUID
> [leU8sfPETKCeQFvntNY9sg], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=Lucene45DocValuesData vs expected codec=Lucene45ValuesMetadata
> (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_12_Lucene45_0.dvm
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs"))
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs")
> slice=15224:15300)))]]]
> [2014-06-30 09:03:24,021][WARN ][index.engine.internal ] [NES1]
> [logsjmeter65][1] failed engine [refresh failed]
> [2014-06-30 09:03:24,371][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter65][1] sending failed shard for [logsjmeter65][1],
> node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID
> [WXUHlSGVQ-GPGSKg0oWPIw], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual codec=XBloomFilter vs
> expected codec=Lucene41NormsMetadata (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1b.nvm in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs"))
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs")
> slice=15048:15209)))]]]
> [2014-06-30 09:03:24,371][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter65][1] received shard failed for [logsjmeter65][1],
> node[dbPhRQoQQE-Tlgict_gfeg], [P], s[STARTED], indexUUID
> [WXUHlSGVQ-GPGSKg0oWPIw], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual codec=XBloomFilter vs
> expected codec=Lucene41NormsMetadata (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_1b.nvm in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs"))
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter65/1/index/_1b.cfs")
> slice=15048:15209)))]]]
> [2014-06-30 09:03:31,778][WARN ][index.engine.internal ] [NES1]
> [logsjmeter79][0] failed engine [refresh failed]
> [2014-06-30 09:03:32,084][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter79][0] sending failed shard for [logsjmeter79][0],
> node[dbPhRQoQQE-Tlgict_gfeg], [R], s[STARTED], indexUUID
> [NZgUPNQnT0Ss0Lhk9PUz1w], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=BLOCK_TREE_TERMS_INDEX vs expected codec=CompoundFileWriterEntries
> (resource:
> BufferedChecksumIndexInput(MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter79/0/index/_z.cfe")))]]]
> [2014-06-30 09:03:32,086][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter79][0] received shard failed for [logsjmeter79][0],
> node[dbPhRQoQQE-Tlgict_gfeg], [R], s[STARTED], indexUUID
> [NZgUPNQnT0Ss0Lhk9PUz1w], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=BLOCK_TREE_TERMS_INDEX vs expected codec=CompoundFileWriterEntries
> (resource:
> BufferedChecksumIndexInput(MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter79/0/index/_z.cfe")))]]]
> [2014-06-30 09:03:33,865][WARN ][monitor.jvm ] [NES1]
> [gc][young][228848][7461] duration [1.7s], collections [1]/[2s], total
> [1.7s]/[4.4m], memory [3gb]->[2.8gb]/[3.9gb], all_pools {[young]
> [168.5mb]->[30.6mb]/[266.2mb]}{[survivor]
> [27.8mb]->[29.2mb]/[33.2mb]}{[old] [2.8gb]->[2.8gb]/[3.6gb]}
> [2014-06-30 09:03:57,762][WARN ][cluster.action.shard ] [NES1]
> [logsjmeter39][1] received shard failed for [logsjmeter39][1],
> node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID
> [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos
> (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs"))
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")
> slice=17707:22401))]]]
>
> ES2 logs:
> [2014-06-30 09:03:14,785][WARN ][cluster.action.shard ] [NES2]
> [logsjmeter87][4] sending failed shard for [logsjmeter87][4],
> node[XVuxg7fzTT-Xy-ArWiapJQ], [R], s[STARTED], indexUUID
> [leU8sfPETKCeQFvntNY9sg], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=Lucene45DocValuesData vs expected codec=Lucene45ValuesMetadata
> (resource:
> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_12_Lucene45_0.dvm
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs"))
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs")
> slice=15224:15300)))]]]
>
> ES3 logs:
> [2014-06-30 09:03:57,639][WARN ][cluster.action.shard ] [NES3]
> [logsjmeter39][1] sending failed shard for [logsjmeter39][1],
> node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID
> [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh
> failed][CorruptIndexException[codec mismatch: actual
> codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos
> (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs"))
> in
> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")
> slice=17707:22401))]]]
>
> Thanks and Regards
> Sri
>
> On Monday, June 30, 2014 9:07:37 AM UTC-4, sri wrote:
>>
>> Hi Simon,
>>
>> I am currently using Elasticsearch 1.2.1 and am getting the error on all
>> my data nodes. Below are the errors:
>>
>> [2014-06-30 09:03:57,762][WARN ][cluster.action.shard ] [NES1]
>> [logsjmeter39][1] received shard failed for [logsjmeter39][1],
>> node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID
>> [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh
>> failed][CorruptIndexException[codec mismatch: actual
>> codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos
>> (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs"))
>> in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")
>> slice=17707:22401))]]]
>>
>> [2014-06-30 09:03:14,785][WARN ][cluster.action.shard ] [NES2]
>> [logsjmeter87][4] sending failed shard for [logsjmeter87][4],
>> node[XVuxg7fzTT-Xy-ArWiapJQ], [R], s[STARTED], indexUUID
>> [leU8sfPETKCeQFvntNY9sg], reason [engine failure, message [refresh
>> failed][CorruptIndexException[codec mismatch: actual
>> codec=Lucene45DocValuesData vs expected codec=Lucene45ValuesMetadata
>> (resource:
>> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_12_Lucene45_0.dvm
>> in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs"))
>> in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter87/4/index/_12.cfs")
>> slice=15224:15300)))]]]
>>
>> [2014-06-30 09:03:57,639][WARN ][cluster.action.shard ] [NES3]
>> [logsjmeter39][1] sending failed shard for [logsjmeter39][1],
>> node[Jjvt3FxwSLWpSCHeIjOedQ], [P], s[STARTED], indexUUID
>> [_PGn7TPETEWllqz71M2ZBA], reason [engine failure, message [refresh
>> failed][CorruptIndexException[codec mismatch: actual
>> codec=Lucene46FieldInfos vs expected codec=Lucene41PostingsWriterPos
>> (resource: SlicedIndexInput(SlicedIndexInput(_1j_es090_0.pos in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs"))
>> in
>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter39/1/index/_1j.cfs")
>> slice=17707:22401))]]]
>>
>> Thanks and Regards
>> Sri
>>
>>
>> On Monday, June 30, 2014 4:00:23 AM UTC-4, simonw wrote:
>>>
>>> Hey,
>>>
>>> thanks for raising this. Can you give me more info, i.e. which version
>>> you are using and whether this happens only on one shard or on all
>>> shards in your system? It could just be what it says, an index
>>> corruption, maybe due to HW failure, but there could be other reasons....
>>>
>>> simon
>>>
>>> On Friday, June 27, 2014 5:20:26 PM UTC+2, sri wrote:
>>>>
>>>> Hi
>>>>
>>>> I am getting the error below on my ES cluster quite frequently, but I
>>>> am not able to understand why it's happening.
>>>>
>>>> [2014-06-27 11:12:50,014][WARN ][cluster.action.shard ] [NES1]
>>>> [logsjmeter62][0] received shard failed for [logsjmeter62][0],
>>>> node[ZqO9OQ8VQ0uGkvXdIeovRg], [P], s[STARTED], indexUUID
>>>> [EfBgCRm8SWu4AtsNPYVXyA], reason [engine failure, message [refresh
>>>> failed][CorruptIndexException[codec mismatch: actual
>>>> codec=Lucene41PostingsWriterDoc vs expected codec=Lucene46FieldInfos
>>>> (resource:
>>>> BufferedChecksumIndexInput(SlicedIndexInput(SlicedIndexInput(_39.fnm in
>>>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter62/0/index/_39.cfs"))
>>>> in
>>>> MMapIndexInput(path="/data/es/NESClus/nodes/0/indices/logsjmeter62/0/index/_39.cfs")
>>>> slice=7371:8755)))]]]
>>>>
>>>>
>>>> Thanks and Regards
>>>> Sri
>>>>
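
For anyone trying to answer the "one shard or all shards" question from logs like the ones quoted above, here is a rough sketch (not part of the original messages) that groups the CorruptIndexException warnings by index and shard. It assumes each log entry sits on a single line in the raw log file, in the `[timestamp][WARN ][logger ] [node] [index][shard] ...` layout seen in this thread; the names `ENTRY`, `corrupt_shards` and `SAMPLE` are made up for illustration, and the sample entries are abbreviated versions of the ones above.

```python
import re
from collections import defaultdict

# Assumed layout (taken from the entries quoted in this thread):
# [timestamp][WARN ][logger ] [node] [index][shard] message...
ENTRY = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[WARN\s*\]\[[^\]]+\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]"
)

def corrupt_shards(lines):
    """Map (index, shard) to the set of node names that logged a
    CorruptIndexException warning for that shard."""
    hits = defaultdict(set)
    for line in lines:
        if "CorruptIndexException" not in line:
            continue  # skips GC warnings and plain "failed engine" lines
        m = ENTRY.match(line)
        if m:
            hits[(m["index"], int(m["shard"]))].add(m["node"])
    return dict(hits)

# Abbreviated entries from this thread, one per line (hypothetical input).
SAMPLE = [
    "[2014-06-30 09:03:14,809][WARN ][cluster.action.shard ] [NES1] "
    "[logsjmeter87][4] received shard failed for [logsjmeter87][4], "
    "reason [engine failure, message [refresh failed]"
    "[CorruptIndexException[codec mismatch]]]",
    "[2014-06-30 09:03:14,785][WARN ][cluster.action.shard ] [NES2] "
    "[logsjmeter87][4] sending failed shard for [logsjmeter87][4], "
    "reason [engine failure, message [refresh failed]"
    "[CorruptIndexException[codec mismatch]]]",
    "[2014-06-30 09:03:33,865][WARN ][monitor.jvm ] [NES1] "
    "[gc][young][228848][7461] duration [1.7s]",
]

if __name__ == "__main__":
    for (index, shard), nodes in sorted(corrupt_shards(SAMPLE).items()):
        print(index, shard, sorted(nodes))
```

Note that the bracketed node name is the node that wrote the log line, which is not necessarily the node holding the failed shard copy; the `node[...]` id inside the message identifies that.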