Hello, I'm benchmarking an application which implements security on Lucene by adding a multi-valued field "roles" to each document. If the user has at least one of a document's roles, they can find that document.
I implemented this as a boolean AND query: a BooleanQuery containing the original query and the restriction, both added with Occur.MUST. I'm having some performance issues when counting on the index (>60M docs), so I thought about tweaking the implementation of this restriction.

I set up a benchmark like this: I generate 2M documents, each with a multi-valued "roles" field. The "roles" field in each document has 4 values, taken from (2, 2, 1000, 100) unique values. The user has (1, 1, 2, 1) values for roles (so 1 out of the 2 for the first value, 1 out of the 2 for the second, 2 out of the 1000 for the third, and 1 out of the 100 for the fourth).

I got a somewhat unexpected performance difference. At first, I implemented the restriction query like this:

    for (final String role : roles) {
        restrictionQuery.add(new TermQuery(new Term("roles", new BytesRef(role))), Occur.SHOULD);
    }

I then switched to a TermInSetQuery, which I thought would be faster as it uses constant scoring:

    final Set<BytesRef> rolesSet = roles.stream().map(BytesRef::new).collect(Collectors.toSet());
    restrictionQuery.add(new TermInSetQuery("roles", rolesSet), Occur.SHOULD);

However, the TermInSetQuery version achieves about 25% fewer ops/s. Is that to be expected? I did not expect it, as I thought constant scoring would be faster.
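
To make the benchmark data layout concrete, here is a minimal sketch in plain Java (no Lucene dependency) of how the role values could be generated. It rests on assumptions not stated above: each document takes exactly one value from each of the four pools, values are drawn uniformly and independently, and the class name, the "s<slot>_v<value>" naming scheme and the helper methods are all hypothetical. Under those assumptions the restriction alone matches roughly 75% of the documents (1 - 0.5 * 0.5 * 0.998 * 0.99), which may be worth keeping in mind when reading the numbers.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class RolesBenchmarkSketch {
    // Number of unique values per role slot, as described in the post.
    static final int[] UNIQUE_PER_SLOT = {2, 2, 1000, 100};
    // Number of values the user holds per slot, as described in the post.
    static final int[] USER_PER_SLOT = {1, 1, 2, 1};

    // Hypothetical naming scheme; the slot prefix keeps the four pools disjoint.
    static String role(int slot, int value) {
        return "s" + slot + "_v" + value;
    }

    // One document: one value per slot, drawn uniformly (an assumption).
    static Set<String> randomDocRoles(Random rnd) {
        Set<String> roles = new HashSet<>();
        for (int slot = 0; slot < UNIQUE_PER_SLOT.length; slot++) {
            roles.add(role(slot, rnd.nextInt(UNIQUE_PER_SLOT[slot])));
        }
        return roles;
    }

    // The user's roles: the first USER_PER_SLOT[slot] values of each pool.
    static Set<String> userRoles() {
        Set<String> roles = new HashSet<>();
        for (int slot = 0; slot < USER_PER_SLOT.length; slot++) {
            for (int value = 0; value < USER_PER_SLOT[slot]; value++) {
                roles.add(role(slot, value));
            }
        }
        return roles;
    }

    // Probability that a document shares at least one role with the user,
    // assuming uniform, independent draws per slot.
    static double expectedMatchFraction() {
        double miss = 1.0;
        for (int slot = 0; slot < UNIQUE_PER_SLOT.length; slot++) {
            miss *= 1.0 - (double) USER_PER_SLOT[slot] / UNIQUE_PER_SLOT[slot];
        }
        return 1.0 - miss;
    }

    // Empirical check: fraction of generated documents visible to the user.
    static double simulatedMatchFraction(int docs, long seed) {
        Random rnd = new Random(seed);
        Set<String> user = userRoles();
        int matches = 0;
        for (int i = 0; i < docs; i++) {
            if (!Collections.disjoint(randomDocRoles(rnd), user)) {
                matches++;
            }
        }
        return (double) matches / docs;
    }

    public static void main(String[] args) {
        System.out.println("expected:  " + expectedMatchFraction());
        System.out.println("simulated: " + simulatedMatchFraction(200_000, 42L));
    }
}
```

If the real generator picks values differently (for example with a skewed distribution), the matching fraction changes accordingly; the sketch only illustrates the setup as I read it.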