Re: Terms aggregation scripts running slower than expected

2014-06-02 Thread Guillermo Arias del Río
Thanks! That is an even better solution. I have run some tests and it
works. The buckets, and their order, are almost always the same.

On Wednesday, April 9, 2014 21:36:16 UTC+2, Thomas S. wrote:
> [...]

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/7d7788b6-e33a-4859-8d6d-cd3be1a5006e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: Terms aggregation scripts running slower than expected

2014-04-09 Thread Adrien Grand
The terms aggregation relies on field data producing unique values for
each document in order to run efficiently. When you provide a script,
Elasticsearch by default wraps it in a layer that deduplicates the values
the script returns for each document. This makes sure the result is the
same as if the data were stored in the index.
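To illustrate the idea, here is a minimal Python model of what such a deduplicating wrapper does conceptually: each document's script values are sorted and de-duplicated before being counted into buckets. This is only a sketch of the behaviour described above, not Elasticsearch's actual (Java) implementation; `dedup_values` and `count_terms` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List


def dedup_values(values: Iterable[str]) -> List[str]:
    """Hypothetical wrapper: sort one document's script values and drop
    duplicates, mimicking what field data returns for an indexed field."""
    return sorted(set(values))


def count_terms(per_doc_values: Iterable[Iterable[str]],
                values_unique: bool = False) -> Counter:
    """Count documents per term. Unless values_unique is True, every
    document's values go through the extra deduplication step first."""
    counts: Counter = Counter()
    for values in per_doc_values:
        doc_values = list(values) if values_unique else dedup_values(values)
        counts.update(doc_values)
    return counts


docs = [["1"], ["1", "1"], ["2"]]
print(count_terms(docs))                      # Counter({'1': 2, '2': 1})
print(count_terms(docs, values_unique=True))  # Counter({'1': 3, '2': 1})
```

The second call shows the trade-off: skipping deduplication saves work, but if a script does emit the same value twice for one document, that document is counted twice in the bucket.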

You can tell Elasticsearch to assume that the values are already unique by
setting `"script_values_unique": true` on the terms aggregation. Can you
check whether that makes the aggregation faster?
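Applied to the terms request from the original post, the suggestion amounts to adding one key to the aggregation body. The sketch below only builds the request as a Python dict. `script_values_unique` is the option named above for Elasticsearch 1.x and may not be accepted by later versions; the commented-out `app.search.search` call is the poster's own client wrapper, not a standard API.

```python
import json

# Terms aggregation over the constant script "1", telling Elasticsearch (1.x)
# that the script already returns unique values per document, so the
# deduplicating wrapper can be skipped.
request = {
    "size": 0,
    "query": {"match_all": {}},
    "aggregations": {
        "test_script": {
            "terms": {
                "script": "1",
                "script_values_unique": True,
            }
        }
    },
}

print(json.dumps(request, indent=2))
# app.search.search(request)  # the poster's client wrapper (not defined here)
```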


On Wed, Apr 9, 2014 at 9:36 PM, Thomas S. wrote:

> [...]



-- 
Adrien Grand



Terms aggregation scripts running slower than expected

2014-04-09 Thread Thomas S.
Hi,

I am currently exploring the option of using scripts with aggregations, and
I noticed that, for some reason, scripts in terms aggregations execute much
more slowly than in other aggregations, even when the script doesn't access
any fields yet. This also happens with native Java scripts. I'm running
Elasticsearch 1.1.0.

For example, on my data set the trivial script "1" takes around 400ms in
the sum and histogram aggregations, but around 25s in a terms aggregation,
even on repeated runs. What is going on here? Terms aggregations without a
script are very fast, and histogram/sum aggregations with scripts that do
access the document are also very fast. As a workaround, I had to turn what
should have been a scripted terms aggregation into a histogram aggregation
and convert the numeric keys back into terms on the client, so that the
aggregation would run in a reasonable time.
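The client-side workaround described above could look roughly like the following sketch. It assumes, hypothetically, that the histogram script emits a numeric ID for each term and that the client holds an ID-to-term mapping; `TERM_BY_ID` and `histogram_to_terms` are illustrative names. The bucket shape follows the `{'key': ..., 'doc_count': ...}` format visible in the responses below.

```python
from typing import Dict, List

# Hypothetical mapping from the numeric IDs emitted by the histogram script
# back to the terms they stand for.
TERM_BY_ID: Dict[int, str] = {1: "red", 2: "green", 3: "blue"}


def histogram_to_terms(buckets: List[dict],
                       term_by_id: Dict[int, str]) -> List[dict]:
    """Convert histogram buckets ({'key': number, 'doc_count': n}) into
    terms-style buckets, sorted by descending doc_count."""
    converted = [
        {"key": term_by_id[int(b["key"])], "doc_count": b["doc_count"]}
        for b in buckets
        if b["doc_count"] > 0  # skip empty buckets, e.g. if min_doc_count is 0
    ]
    return sorted(converted, key=lambda b: b["doc_count"], reverse=True)


buckets = [{"key": 1, "doc_count": 5}, {"key": 2, "doc_count": 0},
           {"key": 3.0, "doc_count": 9}]
print(histogram_to_terms(buckets, TERM_BY_ID))
# [{'key': 'blue', 'doc_count': 9}, {'key': 'red', 'doc_count': 5}]
```

Keys are passed through `int()` because histogram keys may come back as floats, while the lookup table is keyed by integers.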


In [2]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 
'aggregations': { 'test_script': { 'terms': { 'script': '1' } } }})
Out[2]:
{u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
 u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
 u'key': u'1'}]}},
 u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
 u'timed_out': False,
 u'took': 24986}


In [10]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 
'aggregations': { 'test_script': { 'sum': { 'script': '1' } } }})
Out[10]:
{u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
 u'aggregations': {u'test_script': {u'value': 4231327.0}},
 u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
 u'timed_out': False,
 u'took': 363}


In [8]: app.search.search({'size': 0, 'query': { 'match_all': {} }, 
'aggregations': { 'test_script': { 'histogram': { 'script': '1', 
'interval': 1 } } }})
Out[8]:
{u'_shards': {u'failed': 0, u'successful': 246, u'total': 246},
 u'aggregations': {u'test_script': {u'buckets': [{u'doc_count': 4231327,
 u'key': 1}]}},
 u'hits': {u'hits': [], u'max_score': 0.0, u'total': 4231327},
 u'timed_out': False,
 u'took': 421}


Thomas
