We've definitely found the corrupted shard (shard 0 is corrupted in the same way across all nodes; all other shards check out fine).
Is it worth making a filesystem backup *first* and trying the vanilla CheckIndex -fix, or should we wait for your "index.shard.check_on_startup: fix" test? Also, can we assume that if one node is restarted with the fixed shard, the other nodes will replicate from it?

On Tuesday, December 17, 2013 5:43:20 PM UTC-8, Jörg Prante wrote:

> I know this exception from OOMs, too, when heap got low.
>
> You should identify the corrupted shard and make a filesystem copy of it so you do not lose files.
>
> I cannot recommend Lucene CheckIndex, because ES uses a modified Lucene 4 index and may not be able to simply pick up an index "repaired" by Lucene (the "repair" is dropping docs).
>
> I have to test whether "index.shard.check_on_startup: fix" works at all; it was back in the Lucene 3.6 days that it worked quite ok. A lot has changed since then.
>
> Jörg
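For reference, the backup-then-check sequence we're considering might look roughly like the sketch below. The paths are assumptions for a default single-node data layout (`mycluster`/`myindex` are placeholders), and per Jörg's warning, the `-fix` pass drops documents from bad segments, so it should only ever run against a copy that has already been backed up.

```shell
# Hypothetical paths -- adjust for your actual cluster/index/data directory layout.
SHARD=/var/lib/elasticsearch/data/mycluster/nodes/0/indices/myindex/0
BACKUP=/backup/myindex-shard0-$(date +%Y%m%d)

# 1. Stop the node, then copy the whole shard directory before touching anything.
cp -a "$SHARD" "$BACKUP"

# 2. Run Lucene's CheckIndex read-only first, to see what it thinks is broken.
java -cp lucene-core-4.6.0.jar org.apache.lucene.index.CheckIndex "$SHARD/index"

# 3. Only if the read-only report looks sane, re-run with -fix
#    (this rewrites the index and DROPS docs in corrupt segments).
java -cp lucene-core-4.6.0.jar org.apache.lucene.index.CheckIndex "$SHARD/index" -fix
```

The Lucene jar version must match what the ES node ships with, otherwise CheckIndex may refuse to read (or worse, misread) the segments.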
