It looks like we've found the corrupted shard (shard 0 is corrupted 
in the same way across all nodes; all other shards check out fine).

Is it worth making a filesystem backup *first* and trying the vanilla 
CheckIndex -fix, or should we wait for your "index.shard.check_on_startup: 
fix" test? Also, can we assume that if one node is restarted with the fixed 
shard, the other nodes will replicate from it?


On Tuesday, December 17, 2013 5:43:20 PM UTC-8, Jörg Prante wrote:
>
> I know this exception from OOMs too, when the heap ran low.
>
> You should identify the corrupted shard and make a filesystem copy of it 
> so you do not lose files.
>
> I cannot recommend Lucene CheckIndex, because ES uses a modified Lucene 4 
> index and may not be able to simply pick up an index "repaired" by Lucene 
> (the "repair" drops docs).
>
> I still have to test whether "index.shard.check_on_startup: fix" works at 
> all; it worked quite well back in the Lucene 3.6 days, but a lot has 
> changed since then.
>
> Jörg
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/27d18fab-f521-4143-8db5-73cbfef5d5b8%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.