Hey,

Great, you got it running again. The replica corruption explanation makes
sense, by the way.
Do you still have a stack trace of the OOM exception you found first? I'd
like to see what caused it and, if there is more information, what one can
do about it in the future.


--Alex


On Wed, Dec 18, 2013 at 9:12 AM, Bryan Helmig <[email protected]> wrote:

> Okay, a combination of CheckIndex -fix, careful manual allocation of shard
> 0, and restarts to clear the lock files has resulted in a green cluster.
>
>
> On Tuesday, December 17, 2013 11:51:04 PM UTC-8, Bryan Helmig wrote:
>>
>> So, a little more digging and it looks like it was holding onto a
>> write.lock that was gone.
>>
>> sudo lsof -uelasticsearch | grep 'legacy/0'
>> java    27517 elasticsearch 1042uW  REG  202,1  0  525279 /var/data/elasticsearch/Rage Against the Machine/nodes/0/indices/zapier_legacy/0/index/write.lock (deleted)
>>
>> We did delete some leftover lock files after the nodes powered down, but
>> that seems like it shouldn't have caused this (unless we made a mistake and
>> nuked it on a live instance). Somehow that plus the OOM corruption led to a
>> pretty crazy situation. We're almost back from it after some restarts, and we
>> should be able to publish a blog post on the situation afterwards. I'll follow up
>> with results and a link ASAP.
>>
>>
>> On Tuesday, December 17, 2013 8:13:39 PM UTC-8, Bryan Helmig wrote:
>>>
>>> We're also fine with losing a few docs, as we can reindex them from
>>> another source, so dropping the documents works for us.
>>>
>>>
>>> On Tuesday, December 17, 2013 7:47:21 PM UTC-8, Bryan Helmig wrote:
>>>>
>>>> All replicas have the same corruption, it seems. We can't get a primary
>>>> up for shard 0, therefore the replica never comes up. Does that make sense?
>>>>
>>>>
>>>> On Tuesday, December 17, 2013 6:33:19 PM UTC-8, Jörg Prante wrote:
>>>>>
>>>>> Hm, just wanted to clarify that I'm not familiar with the effects of
>>>>> "index.shard.check_on_startup: fix" in the latest ES on Lucene 4.
>>>>>
>>>>> Even if I can test it, there is no guarantee that it works for you.
>>>>> Different systems, different index, different corruptions... who knows.
>>>>>
>>>>> I'm quite puzzled: you don't have a replica shard? "CheckIndex"
>>>>> is really a last resort if there is no replica, and it is not the
>>>>> preferred method to ensure data integrity in ES...
>>>>>
>>>>> Jörg
>>>>>
>>>>>  --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/f5413107-6701-4081-9e2c-be7035865cfb%40googlegroups.com
> .
>
> For more options, visit https://groups.google.com/groups/opt_out.
>

