Re: Something I am finding difficult, using Aggregations

2014-07-16 Thread Adrien Grand
On Thu, Jul 3, 2014 at 6:24 PM, mooky  wrote:

> By the way, it appears that doing a Terms sub-aggregation (as I suggested
> in (b)) can be a bit of a performance murderer...
> In my case I am already doing a Terms aggregation (on the id) - and the
> Terms sub-aggregation is turning a ~10ms response into a ~1ms response
> :-o
>
> Sure, obviously there exists an id-reference data mapping in the system.
> But it doesn't really scale having to dereference ids on read operations.
> Either :
> a) it's a remote call - and making 10s or 100s of remote calls to serve a
> single user request isn't going to perform or scale well.
> b) the reference data has to be all held in RAM - which doesn't scale well.
>
> The thing is that we have the data in the index - we already de-referenced
> it when we built the document to index it.
>
> I can try to make a token - but as you can imagine, trying to encode/decode
> all the location details into 1 token will make a big token.
>

There are no remote calls, but aggregations are indeed held in RAM. So if
the field that you are using for the first-level terms aggregation has a
high cardinality, adding a sub-aggregation certainly adds memory pressure
(and CPU overhead as well, but not enough to explain this slowdown).

Deferred aggregations might help with that issue:
https://github.com/elasticsearch/elasticsearch/pull/6128. They would allow
Elasticsearch to compute the top ownerIds first, take the top N, and only
then resolve their ownerName using a top_hits or a terms aggregation.
They will be available in Elasticsearch 1.3, which we expect to release soon.
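
For illustration, here is a rough Python sketch of what such a request body
might look like. The aggregation names ("owners", "label"), the helper
function names and the top_n parameter are made up for this example, and it
assumes that deferred collection is switched on with a "collect_mode" of
"breadth_first" on the terms aggregation and that top_hits accepts a
"_source" filter with an "include" list. Please check both against the
documentation for the version you run.

```python
import json


def owner_buckets_request(top_n=10):
    """Build a search body: top ownerIds first, then one hit per bucket."""
    return {
        "size": 0,  # only the aggregation results are wanted, not the hits
        "aggs": {
            "owners": {  # hypothetical aggregation name
                "terms": {
                    "field": "ownerId",
                    "size": top_n,
                    # Assumption: this is the switch that defers the
                    # sub-aggregation until the top N buckets are known.
                    "collect_mode": "breadth_first",
                },
                "aggs": {
                    "label": {  # hypothetical aggregation name
                        "top_hits": {
                            "size": 1,
                            # Assumption: later versions may spell this
                            # key "includes" instead of "include".
                            "_source": {"include": ["ownerName"]},
                        }
                    }
                },
            }
        },
    }


def labels_from_response(response):
    """Map each ownerId bucket key to the ownerName of its single top hit."""
    labels = {}
    for bucket in response["aggregations"]["owners"]["buckets"]:
        hits = bucket["label"]["hits"]["hits"]
        if hits:
            labels[bucket["key"]] = hits[0]["_source"]["ownerName"]
    return labels


if __name__ == "__main__":
    print(json.dumps(owner_buckets_request(top_n=5), indent=2))
```

The body can be sent with whichever client you already use; only the dict
structure matters here.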

-- 
Adrien Grand

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j5B88O%2BGS%2BjvX6wr44h3N91xSQtF8TT3vS66AzbECmPqg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: Something I am finding difficult, using Aggregations

2014-07-15 Thread mooky
bump.




Re: Something I am finding difficult, using Aggregations

2014-07-03 Thread mooky
By the way, it appears that doing a Terms sub-aggregation (as I suggested 
in (b)) can be a bit of a performance murderer...
In my case I am already doing a Terms aggregation (on the id) - and the 
Terms sub-aggregation is turning a ~10ms response into a ~1ms response 
:-o

Sure, obviously there exists an id-reference data mapping in the system.
But it doesn't really scale having to dereference ids on read operations. 
Either:
a) it's a remote call - and making 10s or 100s of remote calls to serve a
single user request isn't going to perform or scale well.
b) the reference data all has to be held in RAM - which doesn't scale well.

The thing is that we have the data in the index - we already de-referenced 
it when we built the document to index it.

I can try to make a token - but as you can imagine, trying to encode/decode
all the location details into 1 token will make a big token.






Re: Something I am finding difficult, using Aggregations

2014-07-03 Thread Mark Harwood
There is a universal truth that computers want IDs and people prefer 
looking at labels.
Almost every application has to handle this translation manually, and it
does feel like built-in platform knowledge of an id->reference-data
mapping would be of widespread use.

In the interim I guess one approach is to combine the unique ID and related 
non-unique label into a single token which would then satisfy the needs of 
having unique tokens for aggregation and readable tokens for display 
purposes (perhaps made more readable if you strip the ID from the token 
before display). 
Obviously this would add overheads over using a basic ID.
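
As a rough sketch of that combined-token idea in Python: the separator
character and the function names below are made up for illustration, and
the sketch assumes the separator can never occur inside an ID (it may occur
inside the label). The token would be built at index time, stored in a
not_analyzed field and aggregated on; the ID is stripped again before
display.

```python
from collections import Counter

SEPARATOR = "|"  # hypothetical delimiter; must never appear in an ID


def make_token(entity_id, label):
    """Combine a unique ID and its display label into one aggregatable token."""
    if SEPARATOR in entity_id:
        raise ValueError("ID must not contain the separator")
    return f"{entity_id}{SEPARATOR}{label}"


def split_token(token):
    """Recover (id, label); split once so the label may contain the separator."""
    entity_id, label = token.split(SEPARATOR, 1)
    return entity_id, label


def count_by_token(docs):
    """Stand-in for a terms aggregation on the combined token field."""
    return Counter(make_token(d["ownerId"], d["ownerName"]) for d in docs)


if __name__ == "__main__":
    docs = [
        {"ownerId": "a1", "ownerName": "Simon Chan"},
        {"ownerId": "a1", "ownerName": "Simon Chan"},
        {"ownerId": "b2", "ownerName": "Simon Chan"},  # same name, other owner
    ]
    for token, count in count_by_token(docs).items():
        owner_id, name = split_token(token)
        print(name, owner_id, count)
```

Two owners who share a name still land in separate buckets because the ID
is part of the token.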





Re: Something I am finding difficult, using Aggregations

2014-07-03 Thread mooky

Am I on my own with this problem? Have I got it all wrong?






Something I am finding difficult, using Aggregations

2014-07-02 Thread mooky

Having used elastic aggregations for a little bit (and having used Mongo 
aggregations previously), I have been finding a couple of things a bit 
difficult/awkward.
I am not sure if it's because I don't know how to do it properly - or we
are missing a feature/enhancement in elastic.

A common thing I want to do is aggregate on field x, but in the result, I 
also want field y & z (which are unique for a given x) - there doesn't seem 
to be an easy way to do that.

Let's say I have some data:
{
  "id" : "94538ef6-2998-4ddd-be00-1f5dc2654955",
  "quantity" : 1234567.2342,
  "commodityId" : "0e918fb8-6572-4663-a692-cbebe8aca7f2",
  "commodityName" : "Lead",
  "ownerId" : "53e0f816-8a0a-4659-b868-c48035676b25",
  "ownerName" : "Simon Chan",
  "locationId" : "1cdd4bc7-76d9-43fb-ac56-8f555164211a",
  "locationName" : "Shenyang - Shenyang Dongbei",
  "locationCode" : "W33",
  "locationCity" : "Shenyang",
  "locationCountry" : "China"
}

Let's say I want to do a (terms) aggregation on ownerId (because it's unique,
while ownerName obviously is not). I will get results where the bucket key
is the id. However, what I want to display to the user is the ownerName -
not the id. Looking up the name from the id could be very expensive - but
it's also unnecessary because the name will be unique for a given bucket -
we have the info to hand in the index. The same issue arises if I want to
aggregate by locationId or commodityId. We dereference the data associated
with an id so that we can search on it - but we also want to use this
information to create a label for a bucket when we aggregate.

Is there a simple way to retrieve ownerName while aggregating on ownerId?
The only way I know to do this is to:
a) make sure ownerName is not_analyzed and
b) do a terms sub-aggregation - which will give only 1 result.
Is there an easier way I have missed?
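
To make the workaround in (a) and (b) concrete, here is a minimal Python
sketch of the request body and of how the label could be read back out of
the response. The aggregation names ("by_owner", "owner_name") and the
function names are invented for this example, and it assumes ownerName is
indexed as not_analyzed so that the whole name comes back as one term.

```python
def owner_with_name_request(top_n=10):
    """Terms aggregation on ownerId with a size-1 terms sub-aggregation."""
    return {
        "size": 0,  # skip the hits, only the buckets are needed
        "aggs": {
            "by_owner": {  # hypothetical aggregation name
                "terms": {"field": "ownerId", "size": top_n},
                "aggs": {
                    # Each ownerId has exactly one ownerName, so a single
                    # term per bucket is enough to label it.
                    "owner_name": {
                        "terms": {"field": "ownerName", "size": 1}
                    }
                },
            }
        },
    }


def label_buckets(response):
    """Return (ownerId, label, doc_count) rows from an aggregation response."""
    rows = []
    for bucket in response["aggregations"]["by_owner"]["buckets"]:
        names = bucket["owner_name"]["buckets"]
        # Fall back to the raw id if no name was indexed for this owner.
        label = names[0]["key"] if names else bucket["key"]
        rows.append((bucket["key"], label, bucket["doc_count"]))
    return rows
```

The display code then only ever touches the rows returned by
label_buckets, never a separate id lookup.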

(FWIW doing the same thing in, say, a Mongo aggregation is simply a matter
of adding the ownerName as a key field - since it's unique for a given id,
it won't change the aggregation results - the ownerName info is simply
extracted from the key data in the result).
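
For comparison, a sketch of the Mongo-style grouping described above,
written in Python. The pipeline is shown as a plain list of dicts, as a
driver would take it. The "totalQuantity" field and the choice to sum
quantity are assumptions made for the example. The group_by_owner function
is a pure-Python stand-in that only mimics what the $group stage does with
a compound key.

```python
from collections import defaultdict

# Group on a compound key: ownerName rides along with ownerId and, being
# unique per id, does not change which documents end up in which group.
pipeline = [
    {
        "$group": {
            "_id": {"ownerId": "$ownerId", "ownerName": "$ownerName"},
            "totalQuantity": {"$sum": "$quantity"},  # assumed metric
        }
    }
]


def group_by_owner(docs):
    """Pure-Python stand-in for the $group stage above."""
    totals = defaultdict(float)
    for doc in docs:
        totals[(doc["ownerId"], doc["ownerName"])] += doc["quantity"]
    return [
        {
            "_id": {"ownerId": owner_id, "ownerName": owner_name},
            "totalQuantity": total,
        }
        for (owner_id, owner_name), total in totals.items()
    ]


if __name__ == "__main__":
    docs = [
        {"ownerId": "o1", "ownerName": "Simon Chan", "quantity": 1.5},
        {"ownerId": "o1", "ownerName": "Simon Chan", "quantity": 2.5},
        {"ownerId": "o2", "ownerName": "Another Owner", "quantity": 4.0},
    ]
    for row in group_by_owner(docs):
        print(row["_id"]["ownerName"], row["totalQuantity"])
```

The label is read straight from the group key in each result row, which is
the behaviour being asked for on the elastic side.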

Cheers,
M
