Hi Jörg,

Glad you could reproduce the issue with my updated gist.

cb.

On Wednesday, February 5, 2014 8:18:39 PM UTC+1, Jörg Prante wrote:
>
> Nils, I ran the test on my Mac, and I can reproduce the issue. And also on 
> Linux.
>
> Unfortunately the Mac locked up and I had to cold reboot, and my 
> copy/paste logs are gone with all the numbers, but anyway.
>
> As a matter of fact, your aggregates demo is daunting.
>
> On the Mac, it shows different counts even between the first and the 
> subsequent executions. The counts of the first are lower, and also, even 
> different terms show up. On Linux, I do not observe different counts 
> between runs.
>

The behavior you describe on the Mac is exactly the issue I raised in this thread.

>
> But what's more bothering is that I observed different results depending on 
> the shard count, both on Mac and Linux. The closer a bucket is to the top 
> (the higher its hit count), the better the counts match; only the lower 
> buckets differ, so the deviating counts are somewhat hard to notice.
>

That the counts change when you change the number of shards is a long-known 
problem of Elasticsearch; faceting had it too. A long thread about the 
nature of this problem can be found here: 
https://github.com/elasticsearch/elasticsearch/issues/1305.
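To see why only the lower buckets drift, here is a small Python simulation of the per-shard top-N merge. It is a sketch, not Elasticsearch's actual code: each shard reports only its local top `shard_size` terms and the coordinating node sums what it receives. A term that misses one shard's local list silently loses that shard's contribution. The shard data is invented, and defaulting `shard_size` to `size` is an assumption; check the default for your version.

```python
from collections import Counter

def local_top(shard, shard_size):
    """The top `shard_size` terms of one shard, as that shard sees them."""
    return Counter(shard).most_common(shard_size)

def distributed_terms(shards, size, shard_size=None):
    """Approximate a distributed terms aggregation: each shard reports only its
    local top `shard_size` terms; the coordinating node sums whatever it
    receives and keeps the global top `size`."""
    if shard_size is None:
        shard_size = size  # assumption: shard_size defaults to size
    merged = Counter()
    for shard in shards:
        # dict() so Counter.update adds the counts instead of counting tuples
        merged.update(dict(local_top(shard, shard_size)))
    return merged.most_common(size)

def exact_terms(shards, size):
    """Ground truth: sum every term on every shard before ranking."""
    merged = Counter()
    for shard in shards:
        merged.update(shard)
    return merged.most_common(size)

# Hypothetical per-shard term counts (not taken from the data in this thread).
shards = [
    {"a": 10, "b": 8, "c": 6},
    {"a": 9, "c": 7, "b": 1},
]

print(exact_terms(shards, 2))                      # [('a', 19), ('c', 13)]
print(distributed_terms(shards, 2))                # [('a', 19), ('b', 8)]
print(distributed_terms(shards, 2, shard_size=3))  # [('a', 19), ('c', 13)]
```

With the default, term `b` is reported as 8 instead of 9 and the true second-ranked term `c` (13) disappears, while the top term stays correct. Raising `shard_size` until each shard's list covers the real candidates restores the exact answer.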

You can easily work around it in one of two ways:

   1. Use the term you aggregate on as the routing key. This forces all 
   documents with the same token onto the same shard, so the exact count is 
   always returned. However, this only works if you run this kind of 
   analytics on a single field.
   2. Increase the shard_size of the terms aggregation. This way each shard 
   returns a longer list of candidate terms, which then has a better chance 
   of containing the actual top terms. See:
   http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size
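Why option 1 gives exact counts can be checked in a toy Python model. Once every document is routed by its term, each term lives on exactly one shard, so a shard's local count is already the global count, and a term in the global top `size` is always in its own shard's local top `size`. The data is hypothetical, and `crc32` is only a stand-in for Elasticsearch's own routing hash; any deterministic hash keeps a term's documents together.

```python
import zlib
from collections import Counter

def route(term, num_shards):
    # Stand-in routing hash (assumption): Elasticsearch uses its own hash
    # function, but any deterministic hash puts all docs of a term together.
    return zlib.crc32(term.encode("utf-8")) % num_shards

def reindex_routed_by_term(shards):
    """Move every document to the shard chosen by routing on its term."""
    num_shards = len(shards)
    routed = [Counter() for _ in range(num_shards)]
    for shard in shards:
        for term, count in shard.items():
            routed[route(term, num_shards)][term] += count
    return routed

def distributed_terms(shards, size):
    """Each shard reports its local top `size`; the coordinator sums them."""
    merged = Counter()
    for shard in shards:
        merged.update(dict(Counter(shard).most_common(size)))
    return merged.most_common(size)

# Hypothetical term counts, spread over two shards without routing.
shards = [{"a": 10, "b": 8, "c": 6}, {"a": 9, "c": 7, "b": 1}]

print(distributed_terms(shards, 2))                          # [('a', 19), ('b', 8)]
print(distributed_terms(reindex_routed_by_term(shards), 2))  # [('a', 19), ('c', 13)]
```

In Elasticsearch itself this corresponds to passing the term as the `routing` value when indexing each document, which is also why it only helps for one field.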


> I use Java 8 FCS, but since you observe this issue also on Java 7, I think 
> it is not an issue of Java 8. And it's both on Mac and Linux, but with 
> different symptoms.
>

That leaves Mac OS X as the only factor common to every run that shows the 
problem, on every Java version: I tested both 1.7 and 1.6. It is 
unfortunate that Adrien wasn't able to reproduce it on OS X.

>
> ES 1.0.0.RC2
> Mac OS X 10.8.5
> Darwin Jorg-Prantes-MacBook-Pro.local 12.5.0 Darwin Kernel Version 12.5.0: 
> Sun Sep 29 13:33:47 PDT 2013; root:xnu-2050.48.12~1/RELEASE_X86_64 x86_64
> java version "1.8.0"
> Java(TM) SE Runtime Environment (build 1.8.0-b128)
> Java HotSpot(TM) 64-Bit Server VM (build 25.0-b69, mixed mode)
> G1GC enabled
>
> ES 1.0.0.RC2
> RHEL 6.3
> Linux zephyros 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 
> x86_64 x86_64 x86_64 GNU/Linux
> java version "1.8.0"
> Java(TM) SE Runtime Environment (build 1.8.0-b128)
> Java HotSpot(TM) 64-Bit Server VM (build 25.0-b69, mixed mode)
> G1GC enabled
>
> Here are two Linux examples. Note that the last three terms and counts 
> differ.
>
> shards=10
>
> {
>   "took" : 143,
>   "timed_out" : false,
>   "_shards" : {
>     "total" : 10,
>     "successful" : 10,
>     "failed" : 0
>   },
>   "hits" : {
>     "total" : 1060387,
>     "max_score" : 0.0,
>     "hits" : [ ]
>   },
>   "aggregations" : {
>     "a" : {
>       "buckets" : [ {
>         "key" : "totaltrafficbos",
>         "doc_count" : 3599
>       }, {
>         "key" : "mai93thm",
>         "doc_count" : 2517
>       }, {
>         "key" : "mai90thm",
>         "doc_count" : 2207
>       }, {
>         "key" : "mai95thm",
>         "doc_count" : 2207
>       }, {
>         "key" : "totaltrafficnyc",
>         "doc_count" : 1660
>       }, {
>         "key" : "confessions",
>         "doc_count" : 1534
>       }, {
>         "key" : "incidentreports",
>         "doc_count" : 1468
>       }, {
>         "key" : "nji80thm",
>         "doc_count" : 1071
>       }, {
>         "key" : "pai76thm",
>         "doc_count" : 1039
>       }, {
>         "key" : "txi35thm",
>         "doc_count" : 357
>       } ]
>     }
>   }
> }
>
> shards=5
>
> {
>   "took" : 172,
>   "timed_out" : false,
>   "_shards" : {
>     "total" : 5,
>     "successful" : 5,
>     "failed" : 0
>   },
>   "hits" : {
>     "total" : 1060387,
>     "max_score" : 0.0,
>     "hits" : [ ]
>   },
>   "aggregations" : {
>     "a" : {
>       "buckets" : [ {
>         "key" : "totaltrafficbos",
>         "doc_count" : 3599
>       }, {
>         "key" : "mai93thm",
>         "doc_count" : 2517
>       }, {
>         "key" : "mai90thm",
>         "doc_count" : 2207
>       }, {
>         "key" : "mai95thm",
>         "doc_count" : 2207
>       }, {
>         "key" : "totaltrafficnyc",
>         "doc_count" : 1660
>       }, {
>         "key" : "confessions",
>         "doc_count" : 1534
>       }, {
>         "key" : "incidentreports",
>         "doc_count" : 1468
>       }, {
>         "key" : "nji80thm",
>         "doc_count" : 1180
>       }, {
>         "key" : "pai76thm",
>         "doc_count" : 936
>       }, {
>         "key" : "nji78thm",
>         "doc_count" : 422
>       } ]
>     }
>   }
> }
>
>
> Jörg
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8b1df0c8-5ad2-4a08-9bda-4e20026756c0%40googlegroups.com.