Hi, I'd like to share my experience and at the same time hope I can get some tips.
The query was run against an index with about 700 million documents. Two things happened:

1. The node that ran this query crashed. It is a node configured not to hold data.
2. The data nodes went crazy on GC. Eventually old-generation GC could not reduce heap usage and the nodes became unresponsive. In some cases old-generation GC even increased the heap usage:

    [2014-12-20 07:21:03,370][WARN ][monitor.jvm] [******] [gc][young][2796041][224976] duration [1.1s], collections [1]/[1.3s], total [1.1s]/[3.4h], memory [21.5gb]->[21.2gb]/[29.8gb], all_pools {[young] [1.4gb]->[3.4mb]/[1.4gb]}{[survivor] [191.3mb]->[191.3mb]/[191.3mb]}{[old] [19.9gb]->[21gb]/[28.1gb]}

It is a bad query in itself, but I expected the ES cluster to handle it gracefully. It does throw this exception:

    Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [FIELDDATA] Data too large, data for [_uid] would be larger than limit of [19206989414/17.8gb]

I guess ES stopped at some point because the field data exceeded the default limit, but by then it was too late to stop the query that had already caused the heap memory issue.

I am wondering if there is anything obviously wrong with my ES cluster configuration. I have 5 boxes, each with 125 GB of RAM and 32 cores. I deploy two data nodes on each of them, with the heap fixed at 31 GB, and the configuration favors bulk ingestion. I have actually seen ingest throughput above 60K documents per second. It was working fine until that query came along.

Thanks,
Jack
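For context, the circuit breaker that threw the exception is tunable. A minimal sketch of the settings involved, with illustrative values that are assumptions on my part rather than recommendations (the 17.8 GB limit in the exception corresponds to the default fielddata breaker of 60% of a ~30 GB heap):

    # elasticsearch.yml -- illustrative values, not recommendations
    # Evict fielddata cache entries once they exceed this share of heap
    indices.fielddata.cache.size: 20%
    # Trip the fielddata circuit breaker before a request would exceed this
    # (default is 60% of heap)
    indices.breaker.fielddata.limit: 40%
    # Parent breaker capping fielddata + request breakers combined
    indices.breaker.total.limit: 60%

These can also be changed at runtime via the cluster settings API. Note that bounding the cache size only limits steady-state usage; the breaker is what rejects a single oversized request like the one above.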