Hi,
I want to filter the results of a query by a set of Long values that apply to
one specific field (actually a DocValues field) in Lucene 6, as a replacement
for the Filters that were removed in Lucene 6.
The number of allowed Long values can range from just a few up to hundreds of
thousands.
What I do now is create a TermsQuery from generated Terms and add it to a
BooleanQuery as a FILTER clause, like this:
    public Query getFilteredQuery(Query query) {
        List<Term> terms = new ArrayList<>(getValueSize());
        String keyFieldName = getFieldName();
        for (Long value : getValues()) {
            // Convert directly to UTF-8 bytes, skipping the intermediate UTF-16 String
            BytesRef valueAsBytesRef = LongToUTF8Converter.toBytesRef(value);
            terms.add(new Term(keyFieldName, valueAsBytesRef));
        }
        TermsQuery termsQuery = new TermsQuery(terms);
        return new BooleanQuery.Builder()
                .add(query, Occur.MUST)        // original query
                .add(termsQuery, Occur.FILTER) // filter, does not affect scoring
                .build();
    }
However, I have a feeling that converting the Long values to Terms is
rather inefficient for large collections and also uses a lot of memory.
To reduce the conversion overhead somewhat, I created a class that converts a
Long value directly to a BytesRef instance (to avoid converting to UTF-16 and
then to UTF-8 again), and I pass that instance to the Term constructor.
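To illustrate the idea, a converter of this kind could look roughly like the
sketch below. This is not the actual LongToUTF8Converter: the class name
LongAsciiBytes is hypothetical, and it assumes the field is indexed as the
decimal text of the value, so that the term bytes are plain ASCII digits. It
returns a byte[] so that it compiles without Lucene on the classpath.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Lucene): writes the decimal digits of a
 * long straight into a byte array, without building a String first.
 * Assumes the indexed terms are the decimal text of the value.
 */
final class LongAsciiBytes {

    private LongAsciiBytes() {}

    /** Returns the same bytes as Long.toString(value).getBytes(UTF_8). */
    static byte[] toBytes(long value) {
        if (value == Long.MIN_VALUE) {
            // Long.MIN_VALUE cannot be negated; use the String path for this one value.
            return Long.toString(value).getBytes(StandardCharsets.US_ASCII);
        }
        boolean negative = value < 0;
        long v = negative ? -value : value;

        // Count the digits (plus one byte for a leading minus sign).
        int len = negative ? 1 : 0;
        long t = v;
        do {
            len++;
            t /= 10;
        } while (t != 0);

        // Fill the array from the least significant digit backwards.
        byte[] out = new byte[len];
        int pos = len;
        do {
            out[--pos] = (byte) ('0' + (v % 10));
            v /= 10;
        } while (v != 0);
        if (negative) {
            out[0] = '-';
        }
        return out;
    }
}
```

The resulting array can be wrapped with new BytesRef(bytes), which refers to
the array without copying it. If the field is indexed in some other encoding,
the bytes produced here will not match any term.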
I just wonder whether there is a better way to pass a large number of filter
criteria to a BooleanQuery Occur.FILTER clause, one that avoids excessive
object creation.
Or maybe there is a better approach than using BooleanQuery in this case?
I would be glad if you could share your thoughts on this.
Thanks a lot,
Josef