Aggregation on nested document always takes 2-3 seconds?

2014-02-13 Thread Luke Scott
I have an index that uses one level of nested documents. When I run a query 
on it, the result comes back in about 20-200 milliseconds. When I add a 
facet or an aggregation involving the nested documents, the uncached 
response always takes 2-3 seconds, regardless of how many documents have 
been selected, even zero.

My mapping looks like this:

{
  "document": {
    "dynamic": "strict",
    "properties": {
      "account_id": {
        "type": "long"
      },
      "data": {
        "type": "nested",
        "properties": {
          "key": {
            "type": "string",
            "index": "not_analyzed"
          },
          "string": {
            "type": "string",
            "index": "not_analyzed",
            "fields": {
              "token": {
                "type": "string"
              }
            }
          },
          "integer": {
            "type": "long"
          },
          "date": {
            "type": "date",
            "format": "dateOptionalTime"
          }
        }
      }
    }
  }
}

There are 3.6 million documents in this index. My query looks like this:

{
  "query": {
    "bool": {
      "must": [
        {"term": {"account_id": 1}},
        {
          "nested": {
            "path": "data",
            "query": {"term": {"key": "amount"}}
          }
        }
      ]
    }
  }
}

The above query returns 0 documents because account_id 1 doesn't have any 
documents with a key of amount. Uncached, it returns in about 10-150 ms:

{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

When I add an aggregation to the query:

{
  ...
  "aggs": {
    "report": {
      "nested": {
        "path": "data"
      },
      "aggs": {
        "amount": {
          "filter": {
            "query": {"term": {"key": "amount"}}
          },
          "aggs": {
            "sum": {
              "sum": {"field": "integer"}
            }
          }
        }
      }
    }
  }
}

Uncached, the query returns in about 2-3 seconds:

{
  "took": 2770,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "report": {
      "doc_count": 0,
      "amount": {
        "doc_count": 0,
        "sum": {
          "value": 0
        }
      }
    }
  }
}

If I run the same thing a second time (cached) it runs in 26 milliseconds. 
If I clear the cache and run it again it takes 2 seconds.

Why is this aggregation always taking 2-3 seconds, even though the query is 
returning 0 documents? The same thing happens with a statistical facet.

-
Luke

-- 
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/a82323a6-9a81-436b-a2d2-cc26e918cb7c%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Aggregation on nested document always takes 2-3 seconds?

2014-02-13 Thread Adrien Grand
This problem is very likely not related to nested documents but to
fielddata loading for the integer field. Field data is a column-oriented
view of the data that is, by default, lazily loaded from the inverted index
the first time it is needed, and then cached until the end of life of the
segment it belongs to. So only the first request that needs it is supposed
to be slow.

It is possible to load field data eagerly[1] in order to make sure that
field data loading is never going to impact response times. This way you
should not get such slow response times on the first queries.
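
As a rough sketch of the eager-loading option, assuming the 1.0-era
fielddata "loading" mapping setting described in [1] and the "integer" field
from the mapping in the question, the relevant part of the mapping might look
like this (untested against this index; other fields omitted for brevity):

```json
{
  "document": {
    "properties": {
      "data": {
        "type": "nested",
        "properties": {
          "integer": {
            "type": "long",
            "fielddata": {
              "loading": "eager"
            }
          }
        }
      }
    }
  }
}
```

With this setting the loading cost is paid when a new segment becomes
searchable rather than on the first request that needs the field.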

Another option would be to use doc values[2], which store field data on
disk instead of loading it from the inverted index. Since the data will
already be stored in a column-oriented way, there will be no need to
uninvert it from the inverted index (which is costly and probably the
reason for your slow queries).
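
A minimal sketch of the doc values option, assuming the 1.0-era fielddata
"format" mapping setting described in [2] and the "integer" field from the
mapping in the question (untested against this index; other fields omitted
for brevity):

```json
{
  "document": {
    "properties": {
      "data": {
        "type": "nested",
        "properties": {
          "integer": {
            "type": "long",
            "fielddata": {
              "format": "doc_values"
            }
          }
        }
      }
    }
  }
}
```

Doc values are written at index time, so documents that are already indexed
would need to be reindexed before this takes effect.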

[1]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html#_fielddata_loading
[2]
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/fielddata-formats.html#_numeric_field_data_types


