I did some analysis with access-control lists and found that our customers have significant overlap in the documents they have access to, so we could realize very nice compression in the number of terms in access-control queries by indexing overlapping subsets. However, this is a fair amount of effort, since it requires analyzing all the access lists periodically and re-indexing some set of documents when they change. We're able to achieve good-enough performance by simply caching a filter we generate when a session starts - even though the initial query may be kind of slow, we only run it once, so the user is largely unaffected. Maybe you can play a similar trick?
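The session-level trick above could be sketched roughly like this - a generic Java sketch, not the Lucene Filter API; the class and names are illustrative only:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: compute the (expensive) access filter once per
// session and reuse it for the rest of that session.
class SessionFilterCache {
    private final Map<String, Set<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, Set<String>> buildFilter; // the slow initial query

    SessionFilterCache(Function<String, Set<String>> buildFilter) {
        this.buildFilter = buildFilter;
    }

    // First call for a session pays the cost; later calls hit the cache.
    Set<String> filterFor(String sessionId) {
        return cache.computeIfAbsent(sessionId, buildFilter);
    }
}
```

In a real system the cached value would be a Lucene filter/DocIdSet rather than a set of ids, and the cache would need eviction when ACLs change - this only shows the "pay once per session" shape.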

-Mike

On 10/29/2014 08:20 AM, Pawel Rog wrote:
Hi,
I already tried to transform Queries into Filters (TermQuery -> TermFilter)
but didn't see much speed-up. As I wrote, I wrapped this filter in a
ConstantScoreQuery, and in another test I used a FilteredQuery with a
MatchAllDocsQuery and a BooleanFilter. Both cases seem to perform quite
similarly to a simple BooleanQuery.
But of course I'll also try TermsFilter. Maybe it will speed up
the filters.

Michael Sokolov: I haven't prepared any statistics about the number of
BooleanClauses used or whether there are repeating sets of terms. I think
I have to collect some stats to better understand what can be improved.
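The kind of stats mentioned above could be collected with something like the following sketch - pairwise Jaccard overlap between queries' term sets, which indicates whether caching or tiling common subsets would pay off (class and method names are made up for illustration):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for measuring how much queries' term sets overlap.
class TermOverlapStats {
    // Jaccard similarity of two term sets: |A intersect B| / |A union B|.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    // Average pairwise overlap across a sample of queries' term sets;
    // a high value suggests repeated term sets worth caching or tiling.
    static double averageOverlap(List<Set<String>> queryTermSets) {
        double sum = 0;
        int pairs = 0;
        for (int i = 0; i < queryTermSets.size(); i++) {
            for (int j = i + 1; j < queryTermSets.size(); j++) {
                sum += jaccard(queryTermSets.get(i), queryTermSets.get(j));
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : sum / pairs;
    }
}
```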

--
Paweł Róg


On Wed, Oct 29, 2014 at 12:30 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:

I'm curious to know more about your use case, because I have an idea for
something that addresses this, but haven't found the opportunity to develop
it yet - maybe somebody else wants to :).  The basic idea is to reduce the
number of terms that need to be looked up by collapsing commonly occurring
collections of terms into synthetic "tiles".  If your queries have a lot of
overlap, this could greatly reduce the number of terms in a query rewritten
to use tiles. It's fairly complex, requires indexing support or a filter
cache, and there's no working implementation yet, so it's probably not
going to help you in the short term, but if you can share some information
I'd love to know:

what kind of things are you searching?
how many terms do your larger queries have?
do the query terms overlap among your queries?
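The tile rewrite described above could be sketched, very roughly, like this - no Lucene API involved, and the tile table, term names, and "#tile:" convention are all invented for the sketch (in a real system each tile would also have to be indexed as a synthetic term on every matching document):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical query rewriter: replace a frequently co-occurring set of
// terms with one synthetic "tile" term, shrinking the term count.
class TileRewriter {
    // tile name -> the set of terms that tile stands for
    private final Map<String, Set<String>> tiles;

    TileRewriter(Map<String, Set<String>> tiles) {
        this.tiles = tiles;
    }

    // Replace each tile fully covered by the query with its synthetic
    // term; leftover terms are kept as-is.
    List<String> rewrite(Set<String> queryTerms) {
        Set<String> remaining = new LinkedHashSet<>(queryTerms);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : tiles.entrySet()) {
            if (remaining.containsAll(e.getValue())) {
                out.add("#tile:" + e.getKey());
                remaining.removeAll(e.getValue());
            }
        }
        out.addAll(remaining);
        return out;
    }
}
```

A query for {a, b, c, d} with a tile t1 = {a, b, c} would rewrite to two lookups (#tile:t1 and d) instead of four.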

-Mike Sokolov


On 10/28/14 9:40 PM, Pawel Rog wrote:

Hi,
I have to run queries with a lot of boolean SHOULD clauses. Queries like
these were of course slow, so I decided to change the query to a filter
wrapped in a ConstantScoreQuery, but that didn't help either.

Profiler shows that most of the time is spent on seekExact in
BlockTreeTermsReader$FieldReader$SegmentTermsEnum

When I go deeper in the trace, I see that inside seekExact most time is
spent in loadBlock and, deeper still, in ByteBufferIndexInput.clone.

Do you have any ideas how I can make this faster, or is it not possible
and I just have to live with it?

--
Paweł Róg


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



