Nils: It looks different from your issue, since the document counts are correct here?
Adrien: I think it is due to https://github.com/elastic/elasticsearch/issues/8688. Has the `metric_value` field behind your `max_val` aggregation been dynamically mapped? The only way to prevent this issue is to map fields explicitly instead of relying on dynamic mappings.

On Tue, Apr 14, 2015 at 3:55 PM, Nils Dijk <[email protected]> wrote:
> Hi,
>
> To me this sounds a lot like an issue that was happening to me a week
> before the release of 1.0.0. That issue was related to internal memory
> reuse within Elasticsearch before the result was read out. It is
> documented here: https://github.com/elastic/elasticsearch/issues/5021
>
> What I did back then was create a reproducible test that showed the issue.
>
> I doubt it has to do with your replicas being inconsistent, especially
> since you turned replication off and then turned it back on; this copies
> the files you have on your primary to the secondaries.
>
> Here is the test I created in the past:
> https://gist.github.com/thanodnl/8803745
>
> -- Nils
>
> On Wednesday, March 25, 2015 at 11:57:15 PM UTC+1, MC wrote:
>>
>> I am seeing some erroneous behavior in my ES cluster when performing
>> aggregations. Originally I thought this was specific to a histogram, as
>> that is where the error first appeared (in a K3 graph -- see my post
>> https://groups.google.com/forum/#!topic/elasticsearch/iY-lKjtW7PM for
>> reference), but I have been able to re-create the problem with a simple
>> max aggregation. The details are as follows:
>>
>> ES version: 1.4.4
>> Topology: 5 nodes, 5 shards per index, 2 replicas
>> OS: Red Hat Linux
>>
>> To reproduce the issue I execute the following query against the cluster:
>>
>> {
>>   "query": {
>>     "term": {
>>       "metric": "used"
>>     }
>>   },
>>   "aggs": {
>>     "max_val": {
>>       "max": {
>>         "field": "metric_value"
>>       }
>>     }
>>   }
>> }
>>
>> Upon executing this query multiple times, I get different responses. One
>> time I get the expected result:
>> ...
>> "took": 13,
>> "timed_out": false,
>> "_shards": {
>>   "total": 5,
>>   "successful": 5,
>>   "failed": 0
>> },
>> "hits": {
>>   "total": 11712,
>>   "max_score": 9.361205,
>>   ...
>> "aggregations": { "max_val": { "value": 18096380 } }
>>
>> whereas on another request with the same query I get the following bad
>> response:
>>
>> "took": 8,
>> "timed_out": false,
>> "_shards": {
>>   "total": 5,
>>   "successful": 5,
>>   "failed": 0
>> },
>> "hits": {
>>   "total": 11712,
>>   "max_score": 9.361205,
>>   ...
>> "aggregations": { "max_val": { "value": 4697741490703565000 } }
>>
>> Some possibly relevant observations:
>>
>> 1. In my first set of tests, I consistently got the correct result for
>>    the first two requests and the bad result on the third request (with
>>    no one else executing this query at that point in time).
>> 2. Flushing the cache did not correct the issue.
>> 3. I reduced the number of replicas to 0 and consistently got the same
>>    result (which happened to be the correct one).
>> 4. After increasing the replica count back to 2 and waiting until ES
>>    reported that replication was complete, I tried the same experiment.
>>    This time the first request returned the correct result and the next
>>    two requests returned incorrect results. In this case the incorrect
>>    results were not identical, but both were huge and of the same order
>>    of magnitude.
>>
>> Other info:
>>
>> - The index was about 3.3 GB with ~50M documents in it.
>> - This is one of many date-based indices (i.e. similar to the logstash
>>   index setup), but the only one in this installation that exhibited the
>>   issue. I believe we saw something similar in a UAT environment as
>>   well, where one or two of the indices acted in this weird manner.
>> - ES reported the entire cluster as green.
>>
>> It seems that some shard(s)/replica(s) were corrupted during replication
>> and we were being routed to one of them every third hit. (Is this
>> somehow correlated to the number of replicas?)
>>
>> So, my questions are:
>>
>> 1. Has anyone seen this type of behavior before?
>> 2. Can it somehow be data dependent?
>> 3. Is there any way to figure out what happened / what is happening?
>> 4. Why does ES report the cluster state as green?
>> 5. How can I debug this?
>> 6. How can I prevent/correct this?
>>
>> Any and all help/pointers would be greatly appreciated.
>>
>> Thanks in advance,
>> MC

-- Adrien
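[Editor's note] A sketch of the explicit mapping Adrien recommends, using the 1.x mapping API. The index and type names here are hypothetical; only the field names `metric` and `metric_value` come from the thread, and the choice of `double` is an assumption about the data:

```json
PUT /metrics-2015.03.25
{
  "mappings": {
    "events": {
      "properties": {
        "metric":       { "type": "string", "index": "not_analyzed" },
        "metric_value": { "type": "double" }
      }
    }
  }
}
```

For diagnosis, the mapping an existing index actually ended up with can be inspected with the get-field-mapping API, e.g. `GET /<index>/_mapping/field/metric_value`, to check whether dynamic mapping picked `long` on some of the date-based indices and `double` on others.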
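[Editor's note] One way to see why the bogus value is so enormous (my own sketch, not stated in the thread): the class of bug Adrien points at involves a dynamically mapped field ending up as `long` in one place and `double` in another, in which case a double's raw IEEE-754 bit pattern can surface as a long. Reinterpreting the correct maximum from this thread that way produces a number of exactly the same order of magnitude as the bad response:

```python
import struct


def double_bits_as_long(x):
    """Reinterpret the IEEE-754 bit pattern of a double as a signed 64-bit int."""
    return struct.unpack('<q', struct.pack('<d', x))[0]


# The correct max from the thread, read back as if it were a long:
print(double_bits_as_long(18096380.0))  # 4715622920246788096
# ...which is the same order of magnitude as the bogus response value
# 4697741490703565000 reported above.
```

This does not prove that is what happened here, but it makes the "huge value of consistent magnitude" symptom plausible.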
