Nils: It looks different from your issue, since the document counts are correct here?
Adrien: I think it is due to https://github.com/elastic/elasticsearch/issues/8688. Has the `metric_value` field behind your `max_val` aggregation been dynamically mapped? The only way to prevent this issue is to map fields explicitly instead of relying on dynamic mappings.

On Tue, Apr 14, 2015 at 3:55 PM, Nils Dijk <[email protected]> wrote:
> Hi,
>
> To me this sounds a lot like an issue that was happening to me a week
> before the release of 1.0.0. That issue was related to internal memory
> reuse within Elasticsearch before the result was read out. It is
> documented here: https://github.com/elastic/elasticsearch/issues/5021
>
> What I did back then was create a reproducible test that showed the issue.
>
> I doubt it has to do with your replicas being inconsistent, especially
> since you turned replication off and then turned it back on; this copies
> the files you have on your primary to the secondaries.
>
> Here is the test I created in the past:
> https://gist.github.com/thanodnl/8803745
>
> -- Nils
>
> On Wednesday, March 25, 2015 at 11:57:15 PM UTC+1, MC wrote:
>>
>> I am seeing some erroneous behavior in my ES cluster when performing
>> aggregations. Originally I thought this was specific to a histogram, as
>> that is where the error first appeared (in a K3 graph -- see my post
>> https://groups.google.com/forum/#!topic/elasticsearch/iY-lKjtW7PM for
>> reference), but I have been able to re-create the problem with a simple
>> max aggregation. The details are as follows:
>>
>> ES version: 1.4.4
>> Topology: 5 nodes, 5 shards per index, 2 replicas
>> OS: Red Hat Linux
>>
>> To reproduce the issue I execute the following query against the cluster:
>>
>> {
>>   "query": {
>>     "term": {
>>       "metric": "used"
>>     }
>>   },
>>   "aggs": {
>>     "max_val": {
>>       "max": {
>>         "field": "metric_value"
>>       }
>>     }
>>   }
>> }
>>
>> Upon executing this query multiple times, I get different responses. One
>> time I get the expected result:
>> ...
>> "took": 13,
>> "timed_out": false,
>> "_shards": {
>>   "total": 5,
>>   "successful": 5,
>>   "failed": 0
>> },
>> "hits": {
>>   "total": 11712,
>>   "max_score": 9.361205,
>>   ...
>> "aggregations": { "max_val": { "value": 18096380 } }
>>
>> whereas on another request with the same query I get the following bad
>> response:
>>
>> "took": 8,
>> "timed_out": false,
>> "_shards": {
>>   "total": 5,
>>   "successful": 5,
>>   "failed": 0
>> },
>> "hits": {
>>   "total": 11712,
>>   "max_score": 9.361205,
>>   ...
>> "aggregations": { "max_val": { "value": 4697741490703565000 } }
>>
>> Some possibly relevant observations:
>>
>> 1. In my first set of tests, I consistently got the correct result for
>>    the first two requests and the bad result on the third request (with
>>    no one else executing this query at that point in time).
>> 2. Flushing the cache did not correct the issue.
>> 3. I reduced the number of replicas to 0 and consistently got the same
>>    result (which happened to be the correct one).
>> 4. After increasing the replica count back to 2 and waiting until ES
>>    reported that replication was complete, I tried the same experiment.
>>    This time the first request returned the correct result and the next
>>    two requests returned incorrect results. In this case the incorrect
>>    results were not identical, but both were huge and of the same order
>>    of magnitude.
>>
>> Other info:
>>
>> - The index was about 3.3 GB with ~50M documents in it.
>> - This is one of many date-based indices (i.e. similar to the logstash
>>   index setup), but the only one in this installation that exhibited the
>>   issue. I believe we saw something similar in a UAT environment as
>>   well, where one or two of the indices acted in this weird manner.
>> - ES reported the entire cluster as green.
>>
>> It seems that some shard(s)/replica(s) were corrupted during replication
>> and we were being routed to one of them every third hit. (Is this
>> somehow correlated to the number of replicas?)
>>
>> So, my questions are:
>>
>> 1. Has anyone seen this type of behavior before?
>> 2. Can it somehow be data dependent?
>> 3. Is there any way to figure out what happened / what is happening?
>> 4. Why does ES report the cluster state as green?
>> 5. How can I debug this?
>> 6. How can I prevent/correct this?
>>
>> Any and all help/pointers would be greatly appreciated.
>>
>> Thanks in advance,
>> MC

-- Adrien
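[Editor's note] A sketch of the explicit mapping Adrien recommends, using the 1.x mapping API. The index and type names here are hypothetical; only the field names `metric` and `metric_value` come from the thread, and the choice of `double` is an assumption about the data:

```json
PUT /metrics-2015.03.25
{
  "mappings": {
    "events": {
      "properties": {
        "metric":       { "type": "string", "index": "not_analyzed" },
        "metric_value": { "type": "double" }
      }
    }
  }
}
```

For diagnosis, the mapping an existing index actually ended up with can be inspected with the get-field-mapping API, e.g. `GET /<index>/_mapping/field/metric_value`, to check whether dynamic mapping picked `long` on some of the date-based indices and `double` on others.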
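[Editor's note] One way to see why the bogus value is so enormous (my own sketch, not stated in the thread): the class of bug Adrien points at involves a dynamically mapped field ending up as `long` in one place and `double` in another, in which case a double's raw IEEE-754 bit pattern can surface as a long. Reinterpreting the correct maximum from this thread that way produces a number of exactly the same order of magnitude as the bad response:

```python
import struct


def double_bits_as_long(x):
    """Reinterpret the IEEE-754 bit pattern of a double as a signed 64-bit int."""
    return struct.unpack('<q', struct.pack('<d', x))[0]


# The correct max from the thread, read back as if it were a long:
print(double_bits_as_long(18096380.0))  # 4715622920246788096
# ...which is the same order of magnitude as the bogus response value
# 4697741490703565000 reported above.
```

This does not prove that is what happened here, but it makes the "huge value of consistent magnitude" symptom plausible.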
