It looks like we might be able to try running CheckIndex directly (à la https://groups.google.com/forum/#!msg/elasticsearch/DT_MctUpiCM/4NKnCBthnI0J), or perhaps that is the same thing as setting "index.shard.check_on_startup: true"...
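
For reference, here is a rough sketch of what running CheckIndex by hand might look like. Nothing below comes from our cluster: the jar name, the shard path, and the helper name build_checkindex_command are placeholders I made up for illustration. The main class org.apache.lucene.index.CheckIndex and the -fix option are what Lucene 4.x ships; later Lucene versions renamed that option, so check the jar you actually have. The helper only builds and prints the command, it does not run it.

```python
import shlex


def build_checkindex_command(lucene_jar, shard_index_dir, fix=False):
    """Build the argument list for Lucene's command-line CheckIndex tool.

    lucene_jar and shard_index_dir are caller-supplied placeholders.
    With fix=True the Lucene 4.x "-fix" option is appended, which drops
    any segment that cannot be read (the documents in it are lost).
    """
    cmd = [
        "java",
        "-cp", lucene_jar,
        # Enable assertions in Lucene code so the check is more thorough.
        "-ea:org.apache.lucene...",
        "org.apache.lucene.index.CheckIndex",
        shard_index_dir,
    ]
    if fix:
        cmd.append("-fix")
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; substitute the real jar and shard directory.
    cmd = build_checkindex_command(
        "lib/lucene-core-X.Y.Z.jar",
        "/path/to/data/<cluster>/nodes/0/indices/<index>/<shard>/index",
    )
    # Print a shell-safe version for review before running anything.
    print(" ".join(shlex.quote(part) for part in cmd))
```

I would run it read-only first (no fix=True), with the node that holds the shard stopped and a copy of the shard directory set aside, and only consider -fix once we have seen how many segments it reports as broken.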

On Tuesday, December 17, 2013 2:37:36 PM UTC-8, Alexander Reelsen wrote:
>
> Hey,
>
> na, sent too early. What OOM exception were you hitting? Was this due to
> querying your data? Just trying to make sure there was nothing
> out of the ordinary which triggered that corruption.
>
>
> --Alex
>
>
> On Tue, Dec 17, 2013 at 11:36 PM, Alexander Reelsen <[email protected]>
> wrote:
>
>> Hey,
>>
>> somehow your index data is corrupt. You could set
>> 'index.shard.check_on_startup' to true and check its output. This triggers
>> a lucene CheckIndex, which might rewrite your segments file if it runs
>> successfully.
>>
>>
>> --Alex
>>
>>
>> On Tue, Dec 17, 2013 at 8:25 PM, Bryan Helmig <[email protected]> wrote:
>>
>>> We're at "max_file_descriptors" : 65535 right now, and I haven't seen
>>> anything around file handles in the logs (and we have plenty of disk
>>> space). We did have an OOM exception due to a misconfigured heap a few days
>>> ago, but we did a rolling restart after a fix and it all seemed fine.
>>>
>>> https://gist.github.com/bryanhelmig/091839e6a48a4e103699 has the full
>>> log with the exception repeated over and over.
>>> https://gist.github.com/bryanhelmig/3c17edfe5c4e9065e5a3 was the first
>>> log that had an error, which has a few other interesting lines like:
>>>
>>> "MergeException[java.io.EOFException: read past EOF: NIOFSIndexInput ..."
>>>
>>>
>>> On Tuesday, December 17, 2013 11:15:36 AM UTC-8, Alexander Reelsen wrote:
>>>
>>>> Hey,
>>>>
>>>> is it possible that there is actually an exception happening before the
>>>> NativeFSLock exception occurred? Running out of disk space or file handles
>>>> or something like that?
>>>>
>>>>
>>>> --Alex
>>>>
>>>>
>>>> On Tue, Dec 17, 2013 at 8:02 PM, Bryan Helmig <[email protected]> wrote:
>>>>
>>>>> Hey Alex!
>>>>>
>>>>> 1. We're using EBS.
>>>>> 2. JVM is 1.7.0_25 across all nodes.
>>>>> 3. Elasticsearch is 0.90.7 across all nodes.
>>>>>
>>>>> Right now the cluster is stable, albeit with one shard with no
>>>>> primaries/replicas started (one is stuck in a never-ending recovery and
>>>>> one is unassigned). It's weird because for a short period of time last
>>>>> night, it had a primary (broken replica though) and was growing in size.
>>>>> That good fortune has not returned...
>>>>>
>>>>> -bryan
>>>>>
>>>>>
>>>>> On Tuesday, December 17, 2013 8:16:36 AM UTC-8, Alexander Reelsen
>>>>> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> is your filesystem used to store data a network file system or just a
>>>>>> normal one? If not, any special file system type?
>>>>>> Do you have an up-to-date JVM version?
>>>>>> Do you have an up-to-date elasticsearch version?
>>>>>>
>>>>>>
>>>>>> --Alex
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 17, 2013 at 6:42 AM, Bryan Helmig <[email protected]> wrote:
>>>>>>
>>>>>>> It looks like we're back to not having a good primary here anymore,
>>>>>>> both shards are either RECOVERING/UNASSIGNED.
>>>>>>>
>>>>>>> (Sorry for the constant stream of updates, just trying to get to the
>>>>>>> bottom of this one.)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a4dd96e3-9aad-4e2e-8952-a8642c37aaa8%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
