So, running this on all nodes:

java -cp :/usr/share/elasticsearch/lib/elasticsearch-0.90.7.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/* -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex "/var/data/elasticsearch/Rage Against the Machine/nodes/0/indices/zapier_legacy/0/index/"
gives us the same result:

WARNING: 1 broken segments (containing 4444 documents) detected
WARNING: would write new segments file, and 4444 documents would be lost, if -fix were specified

Now we're trying to work out the proper order of operations for the fix (i.e., can CheckIndex be run on a live node, or should we shut down one node first, apply the fix, and bring it back up, etc.).

On Tuesday, December 17, 2013 3:05:35 PM UTC-8, Bryan Helmig wrote:
>
> It looks like we might be able to try CheckIndex (à la
> https://groups.google.com/forum/#!msg/elasticsearch/DT_MctUpiCM/4NKnCBthnI0J),
> or perhaps that is the same as "index.shard.check_on_startup: true"...
>
> On Tuesday, December 17, 2013 2:37:36 PM UTC-8, Alexander Reelsen wrote:
>>
>> Hey,
>>
>> na, sent too early. What OOM exception were you hitting? Was this due to
>> querying your data? Just trying to make sure there was nothing out of the
>> ordinary that triggered the corruption.
>>
>> --Alex
>>
>> On Tue, Dec 17, 2013 at 11:36 PM, Alexander Reelsen <[email protected]> wrote:
>>
>>> Hey,
>>>
>>> somehow your index data is corrupt. You could set
>>> 'index.shard.check_on_startup' to true and check its output. This triggers
>>> a Lucene CheckIndex, which might rewrite your segments file if it runs
>>> successfully.
>>>
>>> --Alex
>>>
>>> On Tue, Dec 17, 2013 at 8:25 PM, Bryan Helmig <[email protected]> wrote:
>>>
>>>> We're at "max_file_descriptors" : 65535 right now, and I haven't seen
>>>> anything about file handles in the logs (and we have plenty of disk
>>>> space). We did have an OOM exception due to a misconfigured heap a few
>>>> days ago, but we did a rolling restart after a fix and it all seemed fine.
>>>>
>>>> https://gist.github.com/bryanhelmig/091839e6a48a4e103699 has the full
>>>> log with the exception repeated over and over.
>>>> https://gist.github.com/bryanhelmig/3c17edfe5c4e9065e5a3 was the first
>>>> log that had an error, which has a few other interesting lines like:
>>>>
>>>> "MergeException[java.io.EOFException: read past EOF: NIOFSIndexInput ..."
>>>>
>>>> On Tuesday, December 17, 2013 11:15:36 AM UTC-8, Alexander Reelsen wrote:
>>>>
>>>>> Hey,
>>>>>
>>>>> is it possible that there was actually an exception happening before
>>>>> the NativeFSLock exception occurred? Running out of disk space or file
>>>>> handles or something like that?
>>>>>
>>>>> --Alex
>>>>>
>>>>> On Tue, Dec 17, 2013 at 8:02 PM, Bryan Helmig <[email protected]> wrote:
>>>>>
>>>>>> Hey Alex!
>>>>>>
>>>>>> 1. We're using EBS.
>>>>>> 2. JVM is 1.7.0_25 across all nodes.
>>>>>> 3. Elasticsearch is 0.90.7 across all nodes.
>>>>>>
>>>>>> Right now the cluster is stable, albeit with one shard that has no
>>>>>> primary or replica started (one is stuck in a never-ending recovery and
>>>>>> one is unassigned). It's weird because for a short period last night it
>>>>>> had a primary (with a broken replica, though) and was growing in size.
>>>>>> That good fortune has not returned...
>>>>>>
>>>>>> -bryan
>>>>>>
>>>>>> On Tuesday, December 17, 2013 8:16:36 AM UTC-8, Alexander Reelsen wrote:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> is the filesystem you use to store data a network file system or just
>>>>>>> a normal one? If not, any special file system type?
>>>>>>> Do you have an up-to-date JVM version?
>>>>>>> Do you have an up-to-date Elasticsearch version?
>>>>>>>
>>>>>>> --Alex
>>>>>>>
>>>>>>> On Tue, Dec 17, 2013 at 6:42 AM, Bryan Helmig <[email protected]> wrote:
>>>>>>>
>>>>>>>> It looks like we're back to not having a good primary here anymore;
>>>>>>>> both shards are either RECOVERING or UNASSIGNED.
>>>>>>>>
>>>>>>>> (Sorry for the constant stream of updates, just trying to get to
>>>>>>>> the bottom of this one.)
>>>>>>>>
>>>>>>>> --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "elasticsearch" group.
>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>> send an email to [email protected].
>>>>>>>> To view this discussion on the web visit
>>>>>>>> https://groups.google.com/d/msgid/elasticsearch/b4f88f23-1b51-45e9-afec-dcf0efa2c2fd%40googlegroups.com.
>>>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
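[Editor's note] The CheckIndex warnings quoted at the top of the thread follow a fixed textual format, so the number of documents that -fix would discard can be read out programmatically before deciding whether to run it. A minimal Python sketch, assuming output shaped like the lines quoted above; the function name and return shape are illustrative, not part of CheckIndex itself:

```python
import re


def docs_at_risk(checkindex_output: str):
    """Parse Lucene CheckIndex output for the broken-segment summary.

    Returns (broken_segments, documents_lost), or None if no such
    warning appears. Format based on the WARNING lines quoted in this
    thread (Lucene 4.x era); other versions may word it differently.
    """
    m = re.search(
        r"WARNING: (\d+) broken segments \(containing (\d+) documents\) detected",
        checkindex_output,
    )
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


output = (
    "WARNING: 1 broken segments (containing 4444 documents) detected\n"
    "WARNING: would write new segments file, and 4444 documents would be "
    "lost, if -fix were specified\n"
)
print(docs_at_risk(output))  # (1, 4444)
```

As the warning itself says, -fix rewrites the segments file and permanently drops those documents, which is why it is generally run against a shut-down node, after copying the shard directory aside.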

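[Editor's note] The RECOVERING/UNASSIGNED states discussed in the thread come from the cluster routing table (GET /_cluster/state). A small sketch that lists every shard copy not in STARTED state; the JSON fragment below is a hand-made sample modelled on the routing_table structure, and the values in it are invented for illustration:

```python
import json

# Hypothetical fragment of a GET /_cluster/state response (routing_table
# section only), with shard states as described in this thread.
state = json.loads("""
{
  "routing_table": {
    "indices": {
      "zapier_legacy": {
        "shards": {
          "0": [
            {"state": "RECOVERING", "primary": true, "node": "abc"},
            {"state": "UNASSIGNED", "primary": false, "node": null}
          ]
        }
      }
    }
  }
}
""")


def unhealthy_copies(state: dict):
    """Return (index, shard_id, primary/replica, state) for every shard
    copy whose routing state is not STARTED."""
    bad = []
    for index, data in state["routing_table"]["indices"].items():
        for shard_id, copies in data["shards"].items():
            for copy in copies:
                if copy["state"] != "STARTED":
                    kind = "primary" if copy["primary"] else "replica"
                    bad.append((index, shard_id, kind, copy["state"]))
    return bad


for row in unhealthy_copies(state):
    print(row)
```

Polling this while a node is restarted (e.g. for an offline CheckIndex run) makes it easy to see whether the shard ever reaches STARTED or flips back to the recovery loop described above.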