So, running on all nodes:

java -cp :/usr/share/elasticsearch/lib/elasticsearch-0.90.7.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* \
  -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex \
  "/var/data/elasticsearch/Rage Against the Machine/nodes/0/indices/zapier_legacy/0/index/"

Gives us the same result:

WARNING: 1 broken segments (containing 4444 documents) detected
WARNING: would write new segments file, and 4444 documents would be lost, if -fix were specified
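
To compare the result across nodes without reading the full report each time, the summary line can be reduced to two numbers. This is only a rough sketch: `broken_summary` is a hypothetical helper name, and it assumes CheckIndex prints its report to standard output in the wording shown above.

```shell
# Hypothetical helper: reads CheckIndex output on stdin and prints
# "<broken segments> <documents>" for each matching WARNING line.
# Prints nothing when no broken-segments warning is present.
broken_summary() {
  sed -n 's/^WARNING: \([0-9][0-9]*\) broken segments (containing \([0-9][0-9]*\) documents) detected.*$/\1 \2/p'
}
```

Piping the CheckIndex command above into `broken_summary` on each node should then give one short line per node that is easy to diff.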

Now we're trying to work out the proper order of operations to fix this 
(i.e., can CheckIndex be run on a live node, or should we shut down one node 
first, apply the fix, and bring it back, etc.)
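
For discussion, here is a dry-run sketch of the conservative ordering (stop the node, back up, fix, restart). It rests on Lucene's own warning that -fix should only be used when no writer has the index open and after taking a backup. The `service elasticsearch` commands, the backup location, and the `run`/`repair_shard` helper names are assumptions, not something we have verified on our cluster.

```shell
# Sketch only: offline repair of one shard copy on a single node.
# Paths mirror the CheckIndex command above; everything else is assumed.
ES_LIB=/usr/share/elasticsearch/lib
SHARD_INDEX="/var/data/elasticsearch/Rage Against the Machine/nodes/0/indices/zapier_legacy/0/index"
BACKUP_DIR=/var/backups/zapier_legacy-shard0   # hypothetical backup location

# With DRY_RUN=1 (the default) each step is printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

repair_shard() {
  run service elasticsearch stop            # assumed init script name
  run cp -a "$SHARD_INDEX" "$BACKUP_DIR"    # keep a copy before segments are dropped
  run java -cp "$ES_LIB/elasticsearch-0.90.7.jar:$ES_LIB/*:$ES_LIB/sigar/*" \
    -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex \
    "$SHARD_INDEX" -fix
  run service elasticsearch start
}
```

Calling `repair_shard` as-is only prints the four steps; setting `DRY_RUN=0` would actually execute them, and the -fix step would permanently drop the 4444 documents in the broken segment.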


On Tuesday, December 17, 2013 3:05:35 PM UTC-8, Bryan Helmig wrote:
>
> It looks like we might be able to try CheckIndex (à la 
> https://groups.google.com/forum/#!msg/elasticsearch/DT_MctUpiCM/4NKnCBthnI0J), 
> or perhaps that is the same as "index.shard.check_on_startup: true"...
>
>
> On Tuesday, December 17, 2013 2:37:36 PM UTC-8, Alexander Reelsen wrote:
>>
>> Hey,
>>
>> na, sent too early. What OOM exception were you hitting? Was this due to 
>> querying your data? Just trying to make sure, there was nothing 
>> out-of-ordinary which triggered that corruption.
>>
>>
>> --Alex
>>
>>
>> On Tue, Dec 17, 2013 at 11:36 PM, Alexander Reelsen <[email protected]> wrote:
>>
>>> Hey,
>>>
>>> somehow your index data is corrupt. You could set 
>>> 'index.shard.check_on_startup' to true and check its output. This triggers 
>>> a Lucene CheckIndex, which might rewrite your segments file if it runs 
>>> successfully.
>>>
>>>
>>> --Alex
>>>
>>>
>>> On Tue, Dec 17, 2013 at 8:25 PM, Bryan Helmig <[email protected]> wrote:
>>>
>>>> We're at "max_file_descriptors" : 65535 right now, and I haven't seen 
>>>> anything around file handles in the logs (and we have plenty of disk 
>>>> space). We did have an OOM exception due to a misconfigured heap a few 
>>>> days ago, but we did a rolling restart after a fix and it all seemed fine.
>>>>
>>>> https://gist.github.com/bryanhelmig/091839e6a48a4e103699 has the full 
>>>> log with the exception repeated over and over. 
>>>> https://gist.github.com/bryanhelmig/3c17edfe5c4e9065e5a3 was the first 
>>>> log that had an error, which has a few other interesting lines like:
>>>>
>>>> "MergeException[java.io.EOFException: read past EOF: NIOFSIndexInput 
>>>> ..."
>>>>
>>>>
>>>> On Tuesday, December 17, 2013 11:15:36 AM UTC-8, Alexander Reelsen 
>>>> wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> is it possible that there is actually an exception happening before 
>>>>> the NativeFSLock exception occurred? Running out of disk space or file 
>>>>> handles or something like that?
>>>>>
>>>>>
>>>>> --Alex
>>>>>
>>>>>
>>>>> On Tue, Dec 17, 2013 at 8:02 PM, Bryan Helmig <[email protected]> wrote:
>>>>>
>>>>>> Hey Alex!
>>>>>>
>>>>>> 1. We're using EBS.
>>>>>> 2. JVM is 1.7.0_25 across all nodes.
>>>>>> 3. Elasticsearch is 0.90.7 across all nodes.
>>>>>>
>>>>>> Right now the cluster is stable, albeit with one shard that has no 
>>>>>> primary or replica started (one copy is in a never-ending recovery and 
>>>>>> one is unassigned). It's weird because for a short period of time last 
>>>>>> night, it had a primary (broken replica though) and was growing in size. 
>>>>>> That good fortune has not returned...
>>>>>>
>>>>>> -bryan
>>>>>>
>>>>>>
>>>>>> On Tuesday, December 17, 2013 8:16:36 AM UTC-8, Alexander Reelsen 
>>>>>> wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> is your filesystem used to store data a network file system or just 
>>>>>>> a normal one? If not, any special file system type?
>>>>>>> Do you have an up-to-date JVM version?
>>>>>>> Do you have an up-to-date elasticsearch version?
>>>>>>>
>>>>>>>
>>>>>>> --Alex
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Dec 17, 2013 at 6:42 AM, Bryan Helmig <[email protected]> wrote:
>>>>>>>
>>>>>>>> It looks like we're back to not having a good primary here anymore, 
>>>>>>>> both shards are either RECOVERING/UNASSIGNED.
>>>>>>>>
>>>>>>>> (Sorry for the constant stream of updates, just trying to get to 
>>>>>>>> the bottom of this one.)
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "elasticsearch" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>> send an email to [email protected].
>>>>>>>> To view this discussion on the web visit 
>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/b4f88f23-1b51-45e9-afec-dcf0efa2c2fd%40googlegroups.com.
>>>>>>>>
>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>>>
>>>>>>>
>>>>>
>>>
>>>
>>

