On 15.04.2013 at 11:27, Uwe Schindler wrote:

Hi again,

>>> You are somehow "misusing" acceptDocs and DocIdSet here, so you have
>>> to take care, semantics are different:
>>> - For acceptDocs "null" means "all documents allowed" -> no deleted
>>> documents
>>> - For DocIdSet "null" means "no documents matched"
>>
>> Okay, as described above, I would now pass either the result of
>> getLiveDocs() or Bits.MatchAllDocuments() as the acceptDocs argument to
>> getDocIdSet():
>>
>> Map<Term, TermContext> termContexts = new HashMap<>();
>> AtomicReaderContext atomic = ...
>> ChainedFilter filter = ...
> 
> You just pass getLiveDocs(), no null check needed. Using your code would 
> bring a slowdown for indexes without deletions.

This makes sense to me, but now I get zero matches in all searches using
the filter. I am pondering this remark in the documentation of
Filter.getDocIdSet(AtomicReaderContext context, Bits acceptDocs):
"acceptDocs - Bits that represent the allowable docs to match (typically
deleted docs but possibly filtering other documents)"

I understand that getLiveDocs() returns the bit set that represents the
NON-deleted documents, which seems to match the first part of the
description (allowable docs). However, why does it say in parentheses
"typically deleted docs"? I had ignored this so far, but as I now get
zero results, this might be relevant.
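
A minimal stand-alone sketch of the two null conventions described in the
quoted remarks above, in plain Java with java.util.BitSet standing in for
Lucene's Bits and DocIdSet. The class and method names are hypothetical and
not part of Lucene; the only assumption is that a set bit in acceptDocs
means "this document is allowed", i.e. not deleted.

```java
import java.util.BitSet;

// Hypothetical model, NOT Lucene code: BitSet stands in for Bits/DocIdSet.
final class AcceptDocsSketch {

    // acceptDocs == null models "all documents allowed" (no deletions).
    // Otherwise a set bit means the document is allowed (live).
    static boolean isAccepted(BitSet acceptDocs, int docId) {
        return acceptDocs == null || acceptDocs.get(docId);
    }

    // matches == null models a null DocIdSet: "no documents matched".
    // Returns how many filter matches are also allowed by acceptDocs.
    static int countSurvivors(BitSet matches, BitSet acceptDocs) {
        if (matches == null) {
            return 0;
        }
        int count = 0;
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            if (isAccepted(acceptDocs, doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(0);
        matches.set(2);
        matches.set(3);

        BitSet live = new BitSet();   // document 2 is "deleted" in this model
        live.set(0);
        live.set(1);
        live.set(3);

        System.out.println(countSurvivors(matches, null));
        System.out.println(countSurvivors(matches, live));
    }
}
```

In this model, passing a bit set of deleted documents instead of live ones
would invert the result, which is one way a search could end up with zero
matches.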

I am also thinking about how to make use of a BitsFilteredDocIdSet,
along the following lines:

ChainedFilter filter = ...
AtomicReaderContext atomic = ...

Bits alldocs = atomic.reader().getLiveDocs();
DocIdSet docids = filter.getDocIdSet(atomic, alldocs);
BitsFilteredDocIdSet filtered = new BitsFilteredDocIdSet(docids, alldocs);
Spans luceneSpans = sq.getSpans(atomic, filtered.bits(), termContexts);

However, the documentation of the constructor public
BitsFilteredDocIdSet(DocIdSet innerSet, Bits acceptDocs) does not make
it clear to me whether I am applying the arguments correctly. In
particular, I again fail to understand the acceptDocs argument:
"acceptDocs - Allowed docs, all docids not in this set will not be
returned by this DocIdSet"
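
Read literally, that sentence describes an intersection. A small sketch of
that reading, again in plain Java with java.util.BitSet as a stand-in (the
class FilteredSetSketch and its restrict method are made up for
illustration and are not Lucene's implementation):

```java
import java.util.BitSet;

// Hypothetical model, NOT Lucene code: keeps only ids that are in acceptDocs.
final class FilteredSetSketch {

    // innerSet == null models "no documents matched" and is passed through.
    // acceptDocs == null models "all documents allowed", so nothing is removed.
    static BitSet restrict(BitSet innerSet, BitSet acceptDocs) {
        if (innerSet == null) {
            return null;
        }
        BitSet result = (BitSet) innerSet.clone(); // leave the input untouched
        if (acceptDocs != null) {
            result.and(acceptDocs);                // drop ids not in acceptDocs
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet inner = new BitSet();
        inner.set(1);
        inner.set(4);
        inner.set(6);

        BitSet accept = new BitSet();
        accept.set(4);
        accept.set(5);
        accept.set(6);

        BitSet once = restrict(inner, accept);
        BitSet twice = restrict(once, accept);
        System.out.println(once.equals(twice));
    }
}
```

In this model, restricting a second time with the same acceptDocs changes
nothing, so if the filter has already applied acceptDocs, wrapping its
result with the same bits again would be redundant rather than harmful.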

Would this be the correct way to apply a filter on a SpanQuery?
Thanks!
Carsten

-- 
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform

