SpanQuery, Filter, BooleanQuery

Carsten Schnober Mon, 29 Oct 2012 05:41:23 -0700

Hi,
I've got a setup in which I would like to run an arbitrary query over
one field (typically realised through a WildcardQuery). The query is
wrapped as a SpanQuery because the payloads of the matches are
processed further using Spans.next() and Spans.getPayload(). This works
fine with the following code (extract), using Lucene 4.0.0:


---------------------------------------------------------------------
// these fields are initialized externally through public methods:
private final MultiReader reader;
private final String termString;
private final String fieldname;
private final int maxHits;

private Map<Term, TermContext> termContexts = new HashMap<>();
WildcardQuery wildcard;
Term term = new Term(fieldname, termString);
SpanQuery query;        // Lucene query
Spans luceneSpans;

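wildcard = new WildcardQuery(term);
// wrap and rewrite the wildcard so that it can be used as a SpanQuery
query = (SpanQuery) new
SpanMultiTermQueryWrapper<>(wildcard).rewrite(reader);

// iterate over the index segments; Spans are retrieved per segment
for (AtomicReaderContext atomic : reader.getContext().leaves()) {
  // unfiltered case: accept all live documents of this segment
  // (assumption; the filtered variant of this call is shown further below)
  luceneSpans = query.getSpans(atomic, atomic.reader().getLiveDocs(),
      termContexts);
  while (luceneSpans.next() && total <= maxHits) {
        ...
  }
}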
---------------------------------------------------------------------

Now, I'd like to add the option to filter the resulting Spans object by
another WildcardQuery on a different field that contains document
titles. My intuitive approach would have been to use a filter like this:

Filter filter = new QueryWrapperFilter(new WildcardQuery(new
Term(titlefield, titles)));

The filter is applied in a dedicated method with this line:

DocIdSet matchingTitleIDs = filter.getDocIdSet(context, new
Bits.MatchAllBits(0));

Subsequently, the getSpans() call from above is replaced by:

luceneSpans = query.getSpans(atomic, matchingTitleIDs.bits(), termContexts);

However, this either throws a NullPointerException when there are no
hits, or has no effect on the results compared to the unfiltered query.
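
One possible source of the NullPointerException, as far as the Lucene 4.0
javadocs go: Filter.getDocIdSet() may return null when no document is
accepted, and DocIdSet.bits() is optional and may return null as well. The
following is only a plain-Java model of a null-safe conversion, without any
Lucene classes; the class AcceptDocsSketch, the method toAcceptDocs() and the
int[] stand-in for the DocIdSet are hypothetical names chosen for
illustration, not part of the Lucene API.

```java
import java.util.BitSet;

/**
 * Plain-Java model (no Lucene classes) of turning an optional set of
 * matching documents into per-segment "accept" bits.
 */
public class AcceptDocsSketch {

    /**
     * @param matchingDocs segment-local document IDs that passed the title
     *                     filter, or null if the filter matched nothing
     *                     (models a null DocIdSet)
     * @param maxDoc       number of documents in the segment
     * @return a bit set with one bit set per accepted document
     */
    public static BitSet toAcceptDocs(int[] matchingDocs, int maxDoc) {
        BitSet accept = new BitSet(maxDoc);
        if (matchingDocs == null) {
            // no matches: return an empty bit set instead of dereferencing null
            return accept;
        }
        for (int doc : matchingDocs) {
            // ignore IDs outside this segment's range
            if (doc >= 0 && doc < maxDoc) {
                accept.set(doc);
            }
        }
        return accept;
    }

    public static void main(String[] args) {
        System.out.println(toAcceptDocs(new int[] {1, 3}, 5)); // prints {1, 3}
        System.out.println(toAcceptDocs(null, 5));             // prints {}
    }
}
```

In the real code, the same null checks would have to be made on the DocIdSet
and on the result of bits() before passing anything to getSpans(). This only
addresses the exception, not the case where the filter has no effect.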

I've come across the thread "lucene-4.0: QueryWrapperFilter & docBase"
[1] in which Uwe suggests, if I understand correctly, not to use
QueryWrapperFilter in such a scenario, but to use another Query and to
combine the two with a BooleanQuery. Does this still hold for Lucene 4.0?
However, I am not sure how to use a BooleanQuery here because I need the
Spans from the SpanQuery.

Any thoughts about what I'm doing wrong and how to fix this?
Thank you very much!
Carsten


[1]
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201210.mbox/%3CCABY_-Z7r=z0301yf1-1uvbqyw3jf48srpuhe6syt1eh28vn...@mail.gmail.com%3E

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform
