RE: Statically store sub-collections for search (faceted search?)

Uwe Schindler Mon, 15 Apr 2013 04:43:45 -0700

Hi,

> Hi again,
> 
> >>> You are somehow "misusing" acceptDocs and DocIdSet here, so you have
> >>> to take care, semantics are different:
> >>> - For acceptDocs "null" means "all documents allowed" -> no deleted
> >>> documents
> >>> - For DocIdSet "null" means "no documents matched"
> >>
> >> Okay, as described above, I would now pass either the result of
> >> getLiveDocs() or Bits.MatchAllDocuments() as the acceptDocs argument
> >> to getDocIdSet():
> >>
> >> Map<Term, TermContext> termContexts = new HashMap<>();
> >> AtomicReaderContext atomic = ...
> >> ChainedFilter filter = ...
> >
> > You just pass getLiveDocs(), no null check needed. Using your code would
> > bring a slowdown for indexes without deletions.
> 
> This makes sense to me, but now I get zero matches in all searches using the
> filter. I am pondering this remark in the documentation of
> Filter.getDocIdSet(AtomicReaderContext context, Bits acceptDocs):
> "acceptDocs - Bits that represent the allowable docs to match (typically
> deleted docs but possibly filtering other documents)"


This just means that you can pass the liveDocs you get from AtomicReader (live == 
inverse of deleted docs), but you can also pass any other Bits implementation that 
removes more documents from the results. This is what you are doing with spans.

Passing null means all documents are allowed. If that were not the case, 
Lucene's queries and filters would not work at all, so if you get 0 docs, 
you must have missed something else, or your filter itself may 
behave incorrectly. Look at e.g. FilteredQuery, IndexSearcher or any other query in 
Lucene that handles acceptDocs - those pass getLiveDocs() down. If it returns 
null, that means all documents are allowed. The javadocs on Scorer/Filter/... 
should be clearer about this. Can you open an issue about the Javadocs?
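
To make the two null conventions concrete, here is a small plain-Java model. 
It deliberately does not use Lucene classes: AcceptDocs, isAccepted and collect 
are made-up names for this sketch, and the int[] stands in for whatever a 
filter returns. It only illustrates the semantics described above.

```java
import java.util.ArrayList;
import java.util.List;

public class NullSemanticsSketch {

    /** Hypothetical stand-in for Lucene's Bits: "is this doc allowed?". */
    public interface AcceptDocs {
        boolean get(int docId);
    }

    /** acceptDocs convention: null means "all documents allowed". */
    public static boolean isAccepted(AcceptDocs acceptDocs, int docId) {
        return acceptDocs == null || acceptDocs.get(docId);
    }

    /**
     * Result-set convention: null means "no documents matched".
     * The candidates array plays the role of a filter's result.
     */
    public static List<Integer> collect(int[] candidates, AcceptDocs acceptDocs) {
        List<Integer> hits = new ArrayList<>();
        if (candidates == null) {
            return hits; // null result set: nothing matched
        }
        for (int docId : candidates) {
            if (isAccepted(acceptDocs, docId)) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] candidates = {0, 1, 2, 3};
        AcceptDocs evenOnly = docId -> docId % 2 == 0; // pretend odd docs are deleted

        System.out.println(collect(candidates, null));     // [0, 1, 2, 3]
        System.out.println(collect(candidates, evenOnly)); // [0, 2]
        System.out.println(collect(null, evenOnly));       // []
    }
}
```

So a null acceptDocs can never be the reason for zero hits; a null (or empty) 
result set from the filter can.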

> I understand that getLiveDocs() returns the document bits set that represent
> NON-deleted documents which seems to match the first part of the
> description (allowable docs). However, why does it say in brackets "typically
> deleted docs"? I had ignored this so far, but as I get zero results now, this
> might be relevant.

See above.

> I am also thinking about how to possibly make use of a BitsFilteredDocIdSet
> in the following kind:
> 
> ChainedFilter filter = ...
> AtomicReaderContext atomic = ...
> 
> Bits alldocs = atomic.reader().getLiveDocs();
> DocIdSet docids = filter.getDocIdSet(atomic, alldocs);
> BitsFilteredDocIdSet filtered = new BitsFilteredDocIdSet(docids, alldocs);
> Spans luceneSpans = sq.getSpans(atomic, filtered.bits(), termContexts);
> 
> However, the documentation of the constructor public
> BitsFilteredDocIdSet(DocIdSet innerSet, Bits acceptDocs) does not make it
> clear to me whether I am applying the arguments correctly. I especially fail
> to understand the acceptDocs argument again:
> "acceptDocs - Allowed docs, all docids not in this set will not be returned by
> this DocIdSet"

You should use BitsFilteredDocIdSet.wrap(), the ctor does not do null checks.
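
The null checks matter because, with the conventions above, either argument 
can legitimately be null. A plain-Java sketch of the pattern follows. It is 
not the Lucene source: DocSet and this wrap method are made-up stand-ins, and 
it assumes that a null set stays null ("no matches") while a null acceptDocs 
leaves the set untouched ("all allowed").

```java
import java.util.function.IntPredicate;

public class WrapSketch {

    /** Hypothetical stand-in for a DocIdSet: "did this doc match?". */
    public interface DocSet {
        boolean contains(int docId);
    }

    /**
     * Null-safe wrapping: only build a filtered view when there is
     * both a set to restrict and a restriction to apply.
     */
    public static DocSet wrap(DocSet inner, IntPredicate acceptDocs) {
        if (inner == null || acceptDocs == null) {
            return inner; // null stays "no matches"; null acceptDocs changes nothing
        }
        return docId -> inner.contains(docId) && acceptDocs.test(docId);
    }

    public static void main(String[] args) {
        DocSet matches = docId -> docId < 3;      // docs 0, 1, 2 matched the filter
        IntPredicate live = docId -> docId != 1;  // pretend doc 1 is deleted

        System.out.println(wrap(matches, live).contains(0)); // true
        System.out.println(wrap(matches, live).contains(1)); // false
        System.out.println(wrap(matches, null) == matches);  // true
        System.out.println(wrap(null, live) == null);        // true
    }
}
```

A constructor called directly has no chance to return the inner set or null, 
which is why a static factory is the safer entry point here.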

> Would this be the correct way to apply a filter on a SpanQuery?

new FilteredQuery(SpanQuery,Filter)?

> Thanks!
> Carsten
> 
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation Next Generation Corpus
> Analysis Platform
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org

