On our 4-node test cluster (1.1.2), seemingly out of the blue, one node experienced very high CPU usage and became unresponsive. About 8 hours later, another node experienced the same issue. The processes themselves stayed alive, GC activity was normal, and neither node threw an OutOfMemoryError. The nodes did leave the cluster, though, perhaps because they were unresponsive. The only errors in the log files were a bunch of messages like:
org.elasticsearch.search.SearchContextMissingException: No search context found for id ...

as well as errors about the search queue being full. We occasionally see SearchContextMissingException during normal operation, but during the high-CPU period it occurred far more often. I don't think we had an unusually high number of queries during that time, because the other 2 nodes had normal CPU usage and things had run smoothly for the whole prior week.

We are going to restart testing, but is there anything we can do to better understand what happened? For example, should we raise a particular log level, or capture something while the problem is happening, assuming we can reproduce the issue?