Yes. I can re-index the data or transform it in any way to make this query 
efficient. 

What would you suggest?


On Wednesday, June 4, 2014 2:14:09 PM UTC-7, Itamar Syn-Hershko wrote:
>
> This model is not efficient for this type of querying. You cannot do this 
> in one query using this model, and the pre-processing work you do now + 
> traversing all documents is very costly.
>
> Is it possible for you to index the data (even as a projection) into 
> Elasticsearch using a different model, so you can use ES properly using 
> queries or the aggregations framework?
>
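One possible projection of this kind is sketched below in Python. It is an illustration only; the thread does not propose it. Each user's events are collapsed into a single document that stores, per path, the earliest and latest visit timestamps. The field names `first_seen`, `last_seen` and `countries` are hypothetical. A user visited B after A exactly when their latest B visit is later than their earliest A visit. That turns the per-document check into one comparison of two stored values instead of a traversal of the whole path array. For a single fixed funnel, the result could instead be indexed as a boolean field and counted with an ordinary term query.

```python
from collections import defaultdict


def project_user_docs(events):
    """Collapse raw event dicts into one hypothetical document per userid.

    Assumes tstamp strings use the "YYYY-MM-DD HH:MM:SS" format shown in the
    thread, so plain string comparison is chronological.
    """
    docs = defaultdict(lambda: {"first_seen": {}, "last_seen": {}, "countries": set()})
    for e in events:
        doc = docs[e["userid"]]
        path, ts = e["path"], e["tstamp"]
        # Keep only the earliest and latest visit time for each path.
        if path not in doc["first_seen"] or ts < doc["first_seen"][path]:
            doc["first_seen"][path] = ts
        if path not in doc["last_seen"] or ts > doc["last_seen"][path]:
            doc["last_seen"][path] = ts
        # Other attributes are kept so ordinary filters (e.g. country) still apply.
        doc["countries"].add(e["country"])
    return [
        {"userid": uid, "first_seen": d["first_seen"],
         "last_seen": d["last_seen"], "countries": sorted(d["countries"])}
        for uid, d in docs.items()
    ]


def visited_after(doc, first, then):
    """True if some visit to `then` is strictly later than some visit to `first`.

    Visits with identical timestamps are treated as unordered (not "after").
    """
    a = doc["first_seen"].get(first)
    b = doc["last_seen"].get(then)
    return a is not None and b is not None and b > a
```

Usage: `sum(visited_after(d, "/promo/A", "/sale/B") for d in project_user_docs(events))` gives the funnel count. The same stored first/last timestamps answer any pair of paths without reprocessing the raw events.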
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
>
> On Thu, Jun 5, 2014 at 12:04 AM, Zennet Wheatcroft <zwhea...@atypon.com> wrote:
>
>> Hi,
>>
>> I am looking for an efficient way to do inter-document queries in 
>> Elasticsearch. Specifically, I want to count the number of users that went 
>> through an exit point B after visiting point A.
>>
>> In general terms, say we have some event log data about users' actions on 
>> a website:
>> ....
>> {"userid":"xyz", "machineid":"110530745", "path":"/promo/A", "country":"US", "tstamp":"2013-04-01 00:01:01"}
>> {"userid":"pdq", "machineid":"110519774", "path":"/page/1", "country":"CN", "tstamp":"2013-04-01 00:02:11"}
>> {"userid":"xyz", "machineid":"110530745", "path":"/promo/D", "country":"US", "tstamp":"2013-04-01 00:06:31"}
>> {"userid":"abc", "machineid":"110527022", "path":"/page/23", "country":"DE", "tstamp":"2013-04-01 00:08:00"}
>> {"userid":"pdq", "machineid":"110519774", "path":"/page/2", "country":"CN", "tstamp":"2013-04-01 00:08:55"}
>> {"userid":"xyz", "machineid":"110530745", "path":"/sale/B", "country":"US", "tstamp":"2013-04-01 00:09:46"}
>> {"userid":"abc", "machineid":"110527022", "path":"/promo/A", "country":"DE", "tstamp":"2013-04-01 00:10:46"}
>> ....
>> And we have 500+M such entries.
>>
>> We want a count of the number of userids that visited path=/sale/B after 
>> visiting path=/promo/A.
>>
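To pin down what is being counted: the query amounts to a single pass over events sorted by (userid, tstamp), counting each userid at most once. The Python sketch below assumes events are dicts with the fields shown above. The function name and the default paths are illustrative, not taken from the thread.

```python
def count_funnel_users(events, first="/promo/A", then="/sale/B"):
    """Count userids with a visit to `then` after a visit to `first`.

    Hypothetical helper. Sorts by (userid, tstamp); tstamp strings in the
    format above sort chronologically. Each userid is counted at most once.
    """
    ordered = sorted(events, key=lambda e: (e["userid"], e["tstamp"]))
    count = 0
    current_user = None
    seen_first = counted = False
    for e in ordered:
        if e["userid"] != current_user:
            # New user: reset the per-user state.
            current_user = e["userid"]
            seen_first = counted = False
        if counted:
            continue  # this user is already counted; skip their remaining events
        if e["path"] == first:
            seen_first = True
        elif seen_first and e["path"] == then:
            count += 1
            counted = True
    return count
```

At 500+M events the sort would have to happen outside memory (for example with an external sort). Once the data is sorted, each event needs only constant work.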
>> What I did was preprocess the data by sorting it by <userid, tstamp>, then 
>> compact all events with the same userid into a single document. I then 
>> wrote a script filter that traverses the path array of each document and 
>> returns true if it finds any occurrence of A followed by B. This, however, 
>> is inefficient. Most of our queries take 1 or 2 seconds on 100+M events, 
>> but this script filter query takes over 300 seconds. Specifically, it 
>> processes about 400K events per second. By comparison, a naive program I 
>> wrote that does a linear pass over the un-compacted data processes 11M 
>> events per second. From this I conclude that Elasticsearch does not do 
>> well on this type of query.
>>
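The compaction step and the per-document check described above might look like this in Python, with the check looking for /promo/A followed by /sale/B. This sketches the logic only. It is not the actual Elasticsearch script from the thread, and `compact` and `a_then_b` are hypothetical names.

```python
from itertools import groupby
from operator import itemgetter


def compact(events):
    """Hypothetical helper: one document per userid with a time-ordered path array."""
    ordered = sorted(events, key=itemgetter("userid", "tstamp"))
    return [
        {"userid": uid, "path": [e["path"] for e in group]}
        for uid, group in groupby(ordered, key=itemgetter("userid"))
    ]


def a_then_b(paths, a="/promo/A", b="/sale/B"):
    """True if `b` occurs somewhere after an occurrence of `a` in `paths`."""
    try:
        i = paths.index(a)  # first occurrence of a
    except ValueError:
        return False
    # If any b follows any a, it also follows the first a.
    return b in paths[i + 1:]


def count_users(docs, a="/promo/A", b="/sale/B"):
    """Count compacted documents whose path array contains a followed by b."""
    return sum(1 for d in docs if a_then_b(d["path"], a, b))
```

Searching only after the first occurrence of A is enough, so each document's array is scanned at most once.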
>> I am hoping someone can indicate a more efficient way to do this query in 
>> ES. Or else confirm that ES cannot do inter-document queries well. 
>>
>> Thanks,
>> Zennet
>>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/28c93f2d-e870-4347-8677-e9da41b6be62%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>

