I did some analysis of our access-control lists and found that our
customers have significant overlap in the documents they have access to,
so we could realize very nice compression in the number of terms in
access-control queries by indexing overlapping subsets. However, this is
a fair amount of effort, since it requires periodically analyzing all the
access lists and re-indexing some set of documents whenever those lists
change. We're able to achieve good-enough performance by simply caching a
filter we generate when a session starts - even though the initial query
may be kind of slow, we only run it once, and the user is largely
unaffected. Maybe you can play some trick like that?
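
For what it's worth, a minimal sketch of that per-session cached filter on
the Lucene 4.x API might look like the code below. The "acl" field name and
the session-scoped wrapper class are assumptions for illustration only, not
a description of the actual setup.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.queries.TermsFilter;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

// Builds the access filter once per session and reuses it for every search.
public class SessionAclFilter {

  private final Filter cachedFilter;

  public SessionAclFilter(List<String> allowedAclTokens) {
    List<Term> terms = new ArrayList<Term>();
    for (String token : allowedAclTokens) {
      terms.add(new Term("acl", token));   // hypothetical "acl" field
    }
    // TermsFilter resolves all terms in one pass per segment;
    // CachingWrapperFilter keeps the resulting doc-id sets, so the
    // expensive term lookups happen only on the first query (per reader).
    this.cachedFilter = new CachingWrapperFilter(new TermsFilter(terms));
  }

  public TopDocs search(IndexSearcher searcher, Query userQuery, int n) throws IOException {
    return searcher.search(userQuery, cachedFilter, n);
  }
}
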
-Mike
On 10/29/2014 08:20 AM, Pawel Rog wrote:
Hi,
I already tried transforming the queries into filters (TermQuery ->
TermFilter) but didn't see much of a speed-up. As I wrote, I wrapped this
filter in a ConstantScoreQuery, and in another test I used a FilteredQuery
with a MatchAllDocsQuery and a BooleanFilter. Both cases seem to perform
about the same as the plain BooleanQuery.
But of course I'll also try TermsFilter. Maybe it will speed up the
filtering.
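
A rough sketch of that TermsFilter variant on Lucene 4.x, with the "acl"
field name and the token list as placeholder assumptions:

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.queries.TermsFilter;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;

public final class TermsFilterVariant {
  // One TermsFilter instead of a BooleanFilter full of TermFilters:
  // the terms are sorted and looked up segment by segment in a single
  // pass, with no scoring of the matching documents.
  static Query buildAclQuery(List<String> tokens) {
    List<Term> terms = new ArrayList<Term>();
    for (String token : tokens) {
      terms.add(new Term("acl", token));
    }
    return new ConstantScoreQuery(new TermsFilter(terms));
  }
}
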
Michael Sokolov, I haven't prepared any statistics about the number of
BooleanClauses used, or whether there are repeating sets of terms. I think
I have to collect some stats to better understand what can be improved.
--
Paweł Róg
On Wed, Oct 29, 2014 at 12:30 PM, Michael Sokolov <
msoko...@safaribooksonline.com> wrote:
I'm curious to know more about your use case, because I have an idea for
something that addresses this, but haven't found the opportunity to develop
it yet - maybe somebody else wants to :). The basic idea is to reduce the
number of terms that need to be looked up by collapsing commonly-occurring
collections of terms into synthetic "tiles". If your queries have a lot of
overlap, this could greatly reduce the number of terms in a query rewritten
to use tiles. It's somewhat complex, requires indexing support (or a filter
cache), and there's no working implementation as yet, so it probably won't
help you in the short term, but if you can share some information I'd love
to know:
what kind of things are you searching?
how many terms do your larger queries have?
do the query terms overlap among your queries?
-Mike Sokolov
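
Since no working implementation exists, the following is only a hand-wavy
illustration of how such tiles might be wired up on the indexing and query
side; every name here (the "acl_tile" field, the TileRegistry interface and
its methods) is hypothetical.

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.TermsFilter;
import org.apache.lucene.search.Filter;

public final class TileSketch {

  // Hypothetical registry that maps a commonly-occurring ACL set to a
  // stable tile id, and a user to the tiles they are a member of.
  interface TileRegistry {
    String tileFor(List<String> sortedAclTokens);
    List<String> tilesForUser(String userId);
  }

  // Index time: one synthetic term stands in for the whole ACL set.
  static void addTileField(Document doc, List<String> sortedAclTokens, TileRegistry tiles) {
    doc.add(new StringField("acl_tile", tiles.tileFor(sortedAclTokens), Field.Store.NO));
  }

  // Query time: the user's filter lists a few tile ids instead of
  // enumerating every individual ACL term.
  static Filter accessFilter(String userId, TileRegistry tiles) {
    List<Term> terms = new ArrayList<Term>();
    for (String tileId : tiles.tilesForUser(userId)) {
      terms.add(new Term("acl_tile", tileId));
    }
    return new TermsFilter(terms);
  }
}
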
On 10/28/14 9:40 PM, Pawel Rog wrote:
Hi,
I have to run queries with a lot of boolean SHOULD clauses. Queries like
these were of course slow, so I decided to change the query to a filter
wrapped in a ConstantScoreQuery, but that didn't help either.
The profiler shows that most of the time is spent in seekExact in
BlockTreeTermsReader$FieldReader$SegmentTermsEnum. When I go deeper in the
trace I see that inside seekExact most of the time is spent in loadBlock,
and deeper still in ByteBufferIndexInput.clone.
Do you have any ideas how I can make this faster, or is it not possible
and I just have to live with it?
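
For reference, a minimal Lucene 4.x sketch of the two variants described
here: many SHOULD clauses, and the same terms as a filter wrapped in a
ConstantScoreQuery. The "id" field name and the concrete BooleanFilter /
TermFilter combination are assumptions based on the later messages in this
thread.

import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.queries.BooleanFilter;
import org.apache.lucene.queries.TermFilter;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public final class ManyClausesExample {

  // The slow original: every SHOULD clause pays a seekExact() in the
  // terms dictionary when the query is executed.
  static Query asBooleanQuery(List<String> values) {
    BooleanQuery bq = new BooleanQuery();
    for (String v : values) {
      bq.add(new TermQuery(new Term("id", v)), BooleanClause.Occur.SHOULD);
    }
    return bq;
  }

  // The filter variant that was tried: scoring is skipped, but each
  // TermFilter still looks its term up, so the seekExact cost remains.
  static Query asFilterQuery(List<String> values) {
    BooleanFilter bf = new BooleanFilter();
    for (String v : values) {
      bf.add(new TermFilter(new Term("id", v)), BooleanClause.Occur.SHOULD);
    }
    return new ConstantScoreQuery(bf);
  }
}
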
--
Paweł Róg
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org