Hi,

I want to filter the results of a query by a set of Long values that apply to 
one specific field (a DocValues field, in fact) in Lucene 6, as a replacement 
for the Filters that were removed in Lucene 6.

The number of allowed Long values can range from just a few to hundreds of 
thousands.
What I do now is create a TermsQuery from generated Terms and add it to a 
BooleanQuery as a filter clause, like this:

    public Query getFilteredQuery(Query query) {
        List<Term> terms = new ArrayList<>(getValueSize());
        String keyFieldName = getFieldName();
        for (Long value : getValues()) {
            // direct conversion; avoids going through UTF-16 and then UTF-8
            BytesRef valueAsBytesRef = LongToUTF8Converter.toBytesRef(value);
            Term term = new Term(keyFieldName, valueAsBytesRef);
            terms.add(term);
        }
        TermsQuery termsQuery = new TermsQuery(terms);

        return new BooleanQuery.Builder()
                .add(query, Occur.MUST)  // original query
                .add(termsQuery, Occur.FILTER) // add filter
                .build();
    }

However, I have a feeling that the conversion from Long values to Terms is 
rather inefficient for large collections and also uses a lot of memory.
To ease the conversion overhead somewhat, I created a class that converts a 
Long value directly to a BytesRef instance (to avoid converting to UTF-16 and 
then to UTF-8 again), and I pass that instance to the Term constructor.
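
To illustrate the idea, here is a simplified sketch of such a direct 
conversion. It is not the actual LongToUTF8Converter class: the name 
LongToUtf8Sketch is made up, and it assumes the value is encoded as its 
decimal digits (which are plain ASCII and therefore valid UTF-8). It uses only 
the JDK and returns a byte[] rather than a BytesRef.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the converter used above.
final class LongToUtf8Sketch {

    private LongToUtf8Sketch() {}

    /**
     * Writes the decimal representation of the value straight into a byte
     * array, without creating an intermediate String (UTF-16).
     */
    static byte[] toUtf8Bytes(long value) {
        if (value == 0) {
            return new byte[] {'0'};
        }
        boolean negative = value < 0;
        // Work on the negative side so that Long.MIN_VALUE does not overflow.
        long n = negative ? value : -value;
        byte[] buf = new byte[20]; // at most 19 digits plus a sign
        int pos = buf.length;
        while (n != 0) {
            // n % 10 is in the range -9..0 here, so subtracting yields '0'..'9'.
            buf[--pos] = (byte) ('0' - (n % 10));
            n /= 10;
        }
        if (negative) {
            buf[--pos] = '-';
        }
        return Arrays.copyOfRange(buf, pos, buf.length);
    }
}
```

With Lucene on the classpath, the resulting array could be wrapped with 
new BytesRef(bytes) and passed to the Term constructor as in the method above.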

I wonder whether there is a better way to pass a large number of filter 
criteria to a BooleanQuery Occur.FILTER clause, one that avoids excessive 
object creation.
Or maybe there is a better approach than using a BooleanQuery in this case?

Would be glad if you could share your thoughts on this.

Thanks a lot,
Josef
