Re: Fast exclusion of earlier queries

2014-11-10 Thread joergpra...@gmail.com
Can you move the limit to the end of the filter chain? Then you could apply
the filters as before, plus a limit at the end.

If not, you could experiment with
org.elasticsearch.common.lucene.search.LimitFilter. This filter lets only a
limited number of matches from a filter pass through.
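
As a rough illustration of "limit at the end of the filter chain", the sketch
below builds an Elasticsearch 1.x-style request fragment as a plain Python
dict. The helper name with_limit_last and the field name "city" are
hypothetical, and it assumes the `and` and `limit` filters of the 1.x query
DSL. As far as I know the limit filter is applied per shard, so with several
shards the total can exceed the given value.

```python
def with_limit_last(filters, limit):
    """Wrap the given filters in an `and` filter and append a `limit` filter last.

    Assumption: ES 1.x query DSL, where {"limit": {"value": n}} limits the
    number of documents considered on each shard.
    """
    return {"and": list(filters) + [{"limit": {"value": limit}}]}


# Hypothetical field name, used only for illustration.
stockholm = {"term": {"city": "stockholm"}}
chain = with_limit_last([stockholm], 1)
```

The resulting dict would be placed inside the filter part of a request body.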

Jörg

On Mon, Nov 10, 2014 at 11:42 AM, Kristoffer Johansson <
kristoffer.s.johans...@gmail.com> wrote:

> Hi
>
> I'm currently evaluating Elasticsearch for use in a "selection engine"
> at my company. The selection engine will be used to answer questions like
> "how many people between 20 and 30 years old are there in the city of
> Stockholm". The critical thing for this system is to give fast feedback on
> the counts (within a second); the extraction of the identities is not as
> time-critical.
>
> One requirement of the system is to make multiple queries but still keep
> the identities unique. For example, you should be able to make two queries,
> "all people in Stockholm" and "all males between 20 and 25", and the
> second query should then not include anyone living in Stockholm. We have
> solved this by negating the filter of the first query and using it in the
> second, and thanks to Elasticsearch filter caching this gives us really
> nice performance.
>
> Now to the real challenge: any of these queries can contain a limit, so the
> above example can become "all people in Stockholm limited to 1" and "all
> males between 20 and 25". In this case the result of the second query
> should contain documents that are "selected" by the first query but are not
> among the 1 chosen by that limit. We can no longer rely on negated filters,
> because we now have to inspect the result set to find out which
> documents actually "hit" the first query. And because one query can hit
> millions of documents, this is, of course, really slow.
>
> Has any of you considered this kind of requirement before, and do you
> have a suggestion for how we can solve it with reasonable performance?
>
> My team will now examine the possibility of building this functionality
> into Elasticsearch. We would like to be able to start a "transaction" in ES
> that keeps track of all document identities that have been selected by any
> query within the transaction. Then we can always exclude these identities
> from a query to create the described "uniqueness". Do any of you know if
> this is feasible, and do you have some suggestions for our implementation?
>
> Regards
> Kristoffer

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/CAKdsXoGPVaqsRXNsd7ipq5695F_5vhxVvXVNYW1o38WcYsx5jg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Fast exclusion of earlier queries

2014-11-10 Thread Kristoffer Johansson
Hi

I'm currently evaluating Elasticsearch for use in a "selection engine"
at my company. The selection engine will be used to answer questions like
"how many people between 20 and 30 years old are there in the city of
Stockholm". The critical thing for this system is to give fast feedback on
the counts (within a second); the extraction of the identities is not as
time-critical.

One requirement of the system is to make multiple queries but still keep
the identities unique. For example, you should be able to make two queries,
"all people in Stockholm" and "all males between 20 and 25", and the
second query should then not include anyone living in Stockholm. We have
solved this by negating the filter of the first query and using it in the
second, and thanks to Elasticsearch filter caching this gives us really
nice performance.
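
A minimal sketch of the negated-filter approach described above, written as a
Python function that builds an Elasticsearch 1.x-style request body. The
function name exclude_earlier and the field names "city", "gender" and "age"
are made up for illustration; the body assumes the `filtered` query and
`bool` filter of the 1.x query DSL.

```python
def exclude_earlier(current, earlier):
    """Build a request body that matches `current` but none of the `earlier` filters.

    Assumption: ES 1.x `filtered` query wrapping a `bool` filter.
    """
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [current],           # the new selection
                        "must_not": list(earlier),   # negated earlier selections
                    }
                }
            }
        }
    }


# Hypothetical field names, used only for illustration.
stockholm = {"term": {"city": "stockholm"}}
young_males = {
    "bool": {
        "must": [
            {"term": {"gender": "male"}},
            {"range": {"age": {"gte": 20, "lte": 25}}},
        ]
    }
}

body = exclude_earlier(young_males, [stockholm])
```

Because the earlier filters are reused unchanged, they remain eligible for
the filter cache.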

Now to the real challenge: any of these queries can contain a limit, so the
above example can become "all people in Stockholm limited to 1" and "all
males between 20 and 25". In this case the result of the second query
should contain documents that are "selected" by the first query but are not
among the 1 chosen by that limit. We can no longer rely on negated filters,
because we now have to inspect the result set to find out which
documents actually "hit" the first query. And because one query can hit
millions of documents, this is, of course, really slow.

Has any of you considered this kind of requirement before, and do you
have a suggestion for how we can solve it with reasonable performance?

My team will now examine the possibility of building this functionality
into Elasticsearch. We would like to be able to start a "transaction" in ES
that keeps track of all document identities that have been selected by any
query within the transaction. Then we can always exclude these identities
from a query to create the described "uniqueness". Do any of you know if
this is feasible, and do you have some suggestions for our implementation?
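
To make the intended semantics concrete, here is a small in-memory model of
such a "transaction" in Python. It does not use Elasticsearch at all; the
class name SelectionTransaction, the sample data and the choice of "lowest
id first" when a limit applies are all assumptions made for the sake of the
example.

```python
class SelectionTransaction:
    """Tracks ids selected so far and excludes them from later selections."""

    def __init__(self, people):
        self.people = people      # mapping: id -> record
        self.selected = set()     # ids already taken within this transaction

    def select(self, predicate, limit=None):
        # Match only documents that were not selected earlier.
        hits = [
            pid
            for pid, person in sorted(self.people.items())
            if pid not in self.selected and predicate(person)
        ]
        # Assumption: a limit keeps the first hits in id order.
        if limit is not None:
            hits = hits[:limit]
        self.selected.update(hits)
        return hits


people = {
    1: {"city": "Stockholm", "gender": "male", "age": 22},
    2: {"city": "Stockholm", "gender": "male", "age": 24},
    3: {"city": "Uppsala", "gender": "male", "age": 21},
    4: {"city": "Stockholm", "gender": "female", "age": 40},
}

tx = SelectionTransaction(people)
first = tx.select(lambda p: p["city"] == "Stockholm", limit=1)
second = tx.select(lambda p: p["gender"] == "male" and 20 <= p["age"] <= 25)
```

Here person 2 lives in Stockholm but was not among the one document chosen
by the limit, so the second selection still returns that person. A plain
negated "city is Stockholm" filter would have excluded them.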

Regards
Kristoffer

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2950add6-bb3b-4b42-a6c3-0148a783abd9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.