Hello,

I'm benchmarking an application that implements security on Lucene by
adding a multi-valued field "roles" to each document. If the user has at
least one of these roles, they can find the document.

I implemented this as a Boolean AND query: the original query and the
restriction are both added with Occur.MUST.

I'm having some performance issues when running counts against the index
(>60M docs), so I thought about tweaking this restriction implementation.

I set up a benchmark like this:

I generate 2M documents. Each document has a multi-valued "roles" field
with 4 values, drawn from pools of (2, 2, 1000, 100) unique values
respectively.
The user has (1, 1, 2, 1) of those values as roles (so 1 out of 2 for the
first value, 1 out of 2 for the second, 2 out of 1000 for the third, and
1 out of 100 for the fourth).
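
For reference, a minimal sketch of this setup in plain Java (no Lucene
involved). It assumes that each document draws one value uniformly at random
from each of the four pools and that the pools are disjoint; neither is
stated above. The class name, the "p<slot>_<index>" role naming and the
choice of which values the user holds are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class RolesBenchmarkSketch {
    // Pool sizes for the four role slots, as described above.
    static final int[] POOL_SIZES = {2, 2, 1000, 100};
    // How many values of each pool the user holds.
    static final int[] USER_COUNTS = {1, 1, 2, 1};

    // Hypothetical naming scheme; the slot prefix keeps the pools disjoint.
    static String role(int slot, int index) {
        return "p" + slot + "_" + index;
    }

    // One document: one uniformly drawn value per pool (an assumption).
    static List<String> randomDocRoles(Random rnd) {
        List<String> roles = new ArrayList<>(POOL_SIZES.length);
        for (int slot = 0; slot < POOL_SIZES.length; slot++) {
            roles.add(role(slot, rnd.nextInt(POOL_SIZES[slot])));
        }
        return roles;
    }

    // The user holds the first USER_COUNTS[slot] values of each pool.
    static Set<String> userRoles() {
        Set<String> roles = new HashSet<>();
        for (int slot = 0; slot < POOL_SIZES.length; slot++) {
            for (int i = 0; i < USER_COUNTS[slot]; i++) {
                roles.add(role(slot, i));
            }
        }
        return roles;
    }

    // A document is visible if it shares at least one role with the user.
    static boolean visible(List<String> docRoles, Set<String> userRoles) {
        for (String r : docRoles) {
            if (userRoles.contains(r)) {
                return true;
            }
        }
        return false;
    }

    // Expected fraction of visible documents under the uniform assumption:
    // 1 minus the probability that all four slots miss the user's roles.
    static double expectedVisibleFraction() {
        double missAll = 1.0;
        for (int slot = 0; slot < POOL_SIZES.length; slot++) {
            missAll *= 1.0 - (double) USER_COUNTS[slot] / POOL_SIZES[slot];
        }
        return 1.0 - missAll;
    }

    // Generates `docs` random documents and returns the visible fraction.
    static double simulateVisibleFraction(int docs, long seed) {
        Random rnd = new Random(seed);
        Set<String> user = userRoles();
        int hits = 0;
        for (int i = 0; i < docs; i++) {
            if (visible(randomDocRoles(rnd), user)) {
                hits++;
            }
        }
        return (double) hits / docs;
    }

    public static void main(String[] args) {
        System.out.println("expected:  " + expectedVisibleFraction());
        System.out.println("simulated: " + simulateVisibleFraction(200_000, 42L));
    }
}
```

Under these assumptions the restriction matches roughly 75% of the
documents, mostly because of the two pools of size 2. A different
distribution of role values would change that number.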

I got a somewhat unexpected performance difference. At first, I implemented
the restriction query like this:

for (final String role : roles) {
    restrictionQuery.add(
        new TermQuery(new Term("roles", new BytesRef(role))),
        Occur.SHOULD);
}

I then switched to a TermInSetQuery, which I thought would be faster
because it uses constant scoring.

final Set<BytesRef> rolesSet =
    roles.stream().map(BytesRef::new).collect(Collectors.toSet());
restrictionQuery.add(new TermInSetQuery("roles", rolesSet), Occur.SHOULD);


However, the TermInSetQuery achieves about 25% fewer ops/s. Is that to
be expected? I did not expect it, as I thought constant scoring would be
faster.
