For queries with many terms, where each term matches few documents (actually a single document for the "ID filters" in my tests), I saw speedups between 4x and 8x: http://heliosearch.org/solr-terms-query/ (see the 3rd chart).

-Yonik
http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data

On Wed, Oct 29, 2014 at 9:42 AM, Michael McCandless <luc...@mikemccandless.com> wrote:
> I suggested TermsFilter, not TermFilter :) Note the sneaky extra s ....
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Oct 29, 2014 at 8:20 AM, Pawel Rog <pawelro...@gmail.com> wrote:
>> Hi,
>> I already tried to transform the queries into filters (TermQuery -> TermFilter)
>> but didn't see much speedup. As I wrote, I wrapped this filter in a
>> ConstantScoreQuery, and in another test I used FilteredQuery with
>> MatchAllDocsQuery and BooleanFilter. Both cases seem to perform about the
>> same as a simple BooleanQuery.
>> But of course I'll also try TermsFilter. Maybe it will speed up the
>> filters.
>>
>> Michael Sokolov, I haven't prepared any statistics about the number of
>> BooleanClauses used or whether there are repeating sets of terms. I think
>> I have to collect some stats to better understand what can be improved.
>>
>> --
>> Paweł Róg
>>
>>
>> On Wed, Oct 29, 2014 at 12:30 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
>>
>>> I'm curious to know more about your use case, because I have an idea for
>>> something that addresses this, but haven't found the opportunity to develop
>>> it yet - maybe somebody else wants to :). The basic idea is to reduce the
>>> number of terms that need to be looked up by collapsing commonly-occurring
>>> collections of terms into synthetic "tiles". If your queries have a lot of
>>> overlap, this could greatly reduce the number of terms in a query rewritten
>>> to use tiles. It's sort of complex, requires indexing support or a filter
>>> cache, and there's no working implementation as yet, so this is probably
>>> not really going to be helpful for you in the short term, but if you can
>>> share some information I'd love to know:
>>>
>>> what kind of things are you searching?
>>> how many terms do your larger queries have?
>>> do the query terms overlap among your queries?
>>>
>>> -Mike Sokolov
>>>
>>>
>>> On 10/28/14 9:40 PM, Pawel Rog wrote:
>>>
>>>> Hi,
>>>> I have to run a query with a lot of boolean SHOULD clauses. Queries like
>>>> these were of course slow, so I decided to change the query to a filter
>>>> wrapped in a ConstantScoreQuery, but that didn't help either.
>>>>
>>>> The profiler shows that most of the time is spent in seekExact in
>>>> BlockTreeTermsReader$FieldReader$SegmentTermsEnum
>>>>
>>>> Going deeper into the trace, I see that inside seekExact most of the time
>>>> is spent in loadBlock and, deeper still, in ByteBufferIndexInput.clone.
>>>>
>>>> Do you have any ideas how I can make it work faster, or is it not possible
>>>> and I have to live with it?
>>>>
>>>> --
>>>> Paweł Róg
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
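
The "tiles" idea Michael Sokolov describes in the quoted message above has no working implementation, so what follows is only a rough sketch of the query-rewriting half, under stated assumptions. It assumes the query is a pure disjunction (SHOULD clauses only) and that the index, or a filter cache, already holds one synthetic term per tile that matches the union of the documents matched by the tile's member terms. The class name TileRewriter, the method rewrite and the "tile:" naming are hypothetical and are not part of Lucene. Tiles are applied greedily in the order given, which is simple but not necessarily optimal.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: not part of Lucene and not code from this thread.
final class TileRewriter {
    private TileRewriter() {}

    /**
     * Rewrites a disjunctive set of query terms to use tiles where possible.
     * A tile is used only if ALL of its member terms are still present in the
     * remaining query; its members are then replaced by the tile's name.
     * Tiles are tried in the map's iteration order (greedy, not optimal).
     */
    static Set<String> rewrite(Set<String> queryTerms, Map<String, Set<String>> tiles) {
        Set<String> remaining = new TreeSet<>(queryTerms);
        Set<String> rewritten = new TreeSet<>();
        for (Map.Entry<String, Set<String>> tile : tiles.entrySet()) {
            Set<String> members = tile.getValue();
            if (!members.isEmpty() && remaining.containsAll(members)) {
                remaining.removeAll(members);   // these terms no longer need a lookup
                rewritten.add(tile.getKey());   // one synthetic term stands in for them
            }
        }
        rewritten.addAll(remaining);            // terms not covered by any tile
        return rewritten;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tiles = new LinkedHashMap<>();
        tiles.put("tile:A", new TreeSet<>(Arrays.asList("id1", "id2", "id3")));
        tiles.put("tile:B", new TreeSet<>(Arrays.asList("id3", "id4")));

        Set<String> query = new TreeSet<>(Arrays.asList("id1", "id2", "id3", "id4", "id5"));
        // tile:A covers id1..id3; tile:B cannot be used because id3 is already consumed.
        System.out.println(rewrite(query, tiles)); // prints [id4, id5, tile:A]
    }
}
```

In this example five term lookups shrink to three. Whether that helps in practice depends on how much the real queries overlap, which is what the three questions in the quoted message are trying to find out.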