Re: unexpected performance TermsQuery Occur.SHOULD vs TermsInSetQuery?

Adrien Grand Tue, 13 Oct 2020 02:48:39 -0700

Can you give us a few more details:
 - What version of Lucene are you testing?
 - Are you benchmarking "restrictionQuery" on its own, or its conjunction
with another query?


You mentioned that you combine your "restrictionQuery" and the user query
with Occur.MUST. Occur.FILTER feels more appropriate for "restrictionQuery",
since it should not contribute to scoring.

TermInSetQuery automatically executes like a BooleanQuery when the number
of clauses is less than 16, so I would not expect major performance
differences between a TermInSetQuery over fewer than 16 terms and a
BooleanQuery wrapped in a ConstantScoreQuery.
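
To make the two suggestions above concrete, here is a minimal sketch of both
variants with the restriction attached as Occur.FILTER. It assumes a Lucene
8.x classpath; the class name RoleRestriction and both method names are
hypothetical and only serve the illustration, and "userQuery" stands for
whatever query the application already builds.

```java
import java.util.Collection;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermInSetQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper, not part of Lucene: builds the two restriction
// variants discussed in this thread so they can be benchmarked side by side.
public final class RoleRestriction {

    private RoleRestriction() {}

    // Variant 1: TermInSetQuery as a non-scoring FILTER clause.
    static Query restrictWithTermInSet(Query userQuery, Collection<String> roles) {
        Set<BytesRef> roleTerms =
            roles.stream().map(BytesRef::new).collect(Collectors.toSet());
        return new BooleanQuery.Builder()
            .add(userQuery, Occur.MUST)                                  // scored
            .add(new TermInSetQuery("roles", roleTerms), Occur.FILTER)  // not scored
            .build();
    }

    // Variant 2: a disjunction of TermQuery clauses wrapped in a
    // ConstantScoreQuery, also attached as a FILTER clause.
    static Query restrictWithBoolean(Query userQuery, Collection<String> roles) {
        BooleanQuery.Builder anyRole = new BooleanQuery.Builder();
        for (String role : roles) {
            anyRole.add(new TermQuery(new Term("roles", new BytesRef(role))), Occur.SHOULD);
        }
        return new BooleanQuery.Builder()
            .add(userQuery, Occur.MUST)
            .add(new ConstantScoreQuery(anyRole.build()), Occur.FILTER)
            .build();
    }
}
```

In the benchmark described below the user has 1 + 1 + 2 + 1 = 5 role terms,
which is under the threshold mentioned above, so I would expect both variants
to execute in much the same way. Under Occur.FILTER the ConstantScoreQuery
wrapper is probably redundant; it is kept here only to mirror the comparison.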

On Tue, Oct 13, 2020 at 11:35 AM Rob Audenaerde <[email protected]>
wrote:

> Hello,
>
> I'm benchmarking an application which implements security on Lucene by
> adding a multi-valued field "roles". If the user has one of these roles, he
> can find the document.
>
> I implemented this as a Boolean AND query, adding both the original query
> and the restriction with Occur.MUST.
>
> I'm having some performance issues when counting the index (>60M docs), so
> I thought about tweaking this restriction-implementation.
>
> I set up a benchmark like this:
>
> I generate 2M documents, each with a multi-valued "roles" field. The
> "roles" field in each document has 4 values, drawn from pools of
> (2, 2, 1000, 100) unique values respectively.
> The user has (1, 1, 2, 1) of those values as roles (so 1 out of 2 for the
> first value, 1 out of 2 for the second, 2 out of 1000 for the third, and
> 1 out of 100 for the fourth).
>
> I got a somewhat unexpected performance difference. At first, I implemented
> the restriction query like this:
>
> for (final String role : roles) {
>     restrictionQuery.add(
>         new TermQuery(new Term("roles", new BytesRef(role))), Occur.SHOULD);
> }
>
> I then switched to a TermInSetQuery, which I thought would be faster
> as it is using constant-scores.
>
> final Set<BytesRef> rolesSet =
>     roles.stream().map(BytesRef::new).collect(Collectors.toSet());
> restrictionQuery.add(new TermInSetQuery("roles", rolesSet), Occur.SHOULD);
>
>
> However, the TermInSetQuery version gets about 25% fewer ops/s. Is that to
> be expected? I did not expect it, as I thought constant scoring would be
> faster.
>


-- 
Adrien
