RE: Statically store sub-collections for search (faceted search?)

Uwe Schindler Mon, 15 Apr 2013 01:42:52 -0700

There might be 2 problems:

Not every DocIdSet supports bits(). If it returns null, then bits are not 
supported. To guarantee that a bitset is available, use CachingWrapperFilter 
(which internally uses a BitSet to cache).
It might also happen that Filter.getDocIdSet() returns null, which means that 
no document matches the filter.
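
Both null cases can be handled in one small helper. What follows is only a 
sketch in plain Java: Bits and DocIdSet here are stand-in interfaces that mimic 
the shape of the Lucene types (they are not the Lucene classes), and toBits() 
and wrap() are hypothetical helper names.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

// Stand-ins that only mimic the shape of the Lucene types; not the real classes.
interface Bits {
    boolean get(int index);
    int length();
}

interface DocIdSet {
    Bits bits();                         // may return null: random access unsupported
    PrimitiveIterator.OfInt iterator();  // doc ids in increasing order
}

final class DocIdSetSketch {
    /** Wraps a java.util.BitSet as Bits with a fixed length. */
    static Bits wrap(final BitSet set, final int length) {
        return new Bits() {
            public boolean get(int index) { return set.get(index); }
            public int length() { return length; }
        };
    }

    /** Never returns null: a null DocIdSet becomes "no document matches". */
    static Bits toBits(DocIdSet docIdSet, int maxDoc) {
        if (docIdSet == null) {
            return wrap(new BitSet(maxDoc), maxDoc);  // empty set: nothing matches
        }
        Bits bits = docIdSet.bits();
        if (bits != null) {
            return bits;                              // random access is supported
        }
        // bits() returned null: collect the doc ids from the iterator instead.
        BitSet collected = new BitSet(maxDoc);
        PrimitiveIterator.OfInt it = docIdSet.iterator();
        while (it.hasNext()) {
            collected.set(it.nextInt());
        }
        return wrap(collected, maxDoc);
    }
}
```

The point of the design is that the caller never sees null, so "bits not 
supported" and "nothing matched" cannot be confused further down the line.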


AcceptDocs in Lucene are generally all non-deleted documents. For your call to 
Filter.getDocIdSet you should therefore pass AtomicReader.getLiveDocs() and not 
Bits.MatchAllBits.

You are somehow "misusing" acceptDocs and DocIdSet here, so you have to take 
care, semantics are different:
- For acceptDocs "null" means "all documents allowed" -> no deleted documents
- For DocIdSet "null" means "no documents matched"

Finally: The trick here is to make Spans think that there are more deleted docs 
than AtomicReader reports as deleted (which is all it would see if you passed 
getLiveDocs() directly to getSpans()). The filter is applied on top of the live 
docs bits, so documents rejected by the filter look deleted to Spans.
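
Put together, the acceptDocs handed to getSpans() is the intersection of the 
live docs and the filter's bits. A minimal sketch of that logic in plain Java, 
assuming a stand-in Bits interface with the same two methods as the Lucene one 
(it is not the Lucene class); combine() is a hypothetical helper name.

```java
import java.util.BitSet;

// Stand-in with the same two methods as Lucene's Bits; not the real class.
interface Bits {
    boolean get(int index);
    int length();
}

final class AcceptDocsSketch {
    /**
     * liveDocs:   null means "no deletions, every document is allowed".
     * filterBits: null means "the filter matched no document".
     * Result:     a document is accepted only if it is live AND matched
     *             by the filter.
     */
    static Bits combine(Bits liveDocs, Bits filterBits, final int maxDoc) {
        final BitSet accepted = new BitSet(maxDoc);
        if (filterBits != null) {
            for (int doc = 0; doc < maxDoc; doc++) {
                boolean live = (liveDocs == null) || liveDocs.get(doc);
                if (live && filterBits.get(doc)) {
                    accepted.set(doc);
                }
            }
        }
        // Never return null: to the consumer, null would mean "accept everything".
        return new Bits() {
            public boolean get(int index) { return accepted.get(index); }
            public int length() { return maxDoc; }
        };
    }
}
```

In Lucene terms, liveDocs would come from AtomicReader.getLiveDocs(), 
filterBits from the filter's DocIdSet, and the result would be passed as 
acceptDocs to getSpans().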

Uwe

> On 15.04.2013 10:04, Uwe Schindler wrote:
> > The limit also applies to filters. If you have a list of terms ORed
> > together, the fastest way is not to use a BooleanQuery at all, but instead
> > a TermsFilter (which has no limits).
> 
> Hi Uwe,
> thanks for the pointer, this looks promising! The only missing piece for me is
> now how to use that filter in SpanQuery#getSpans(). I have generated a
> DocIdSet from the filter with getDocIdSet(AtomicReaderContext context,
> Bits.MatchAllBits).bits(), but for some reason this just doesn't filter 
> anything.
> 
> I am not sure what getSpans() expects the acceptDocs to be (I suppose the
> bits that correspond to docs that should be returned are to be set).
> This uncertainty stems from the getDocIdSet method, because I am not sure
> what to use as an argument for acceptDocs there either.
> 
> Best,
> Carsten
> 
> 
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation Next Generation Corpus
> Analysis Platform
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org


