[
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1536:
----------------------------------
Attachment: LUCENE-1536.patch
Further investigations showed more problems:
- FilteredDocIdSet does never implement Bits, but it should if the wrapped
filter implements Bits. This cannot be done as two different implementation
would be needed. I have no idea how to solve this.
I uploaded a new patch that fixes the problems from before:
- CachingWrapperFilter now only set the flag for containsOnlyLiveDocs to true,
if it was true before, too. If the orginal filter returned a DocIdSet without
that flag, the cached filter cannot suddenly set it to true
- CachingWrapperFilter also copies the liveDocs when it copies to FixedBitSet
(e.g. QueryWrapperFilter).
- The default for containsOnlyLiveDocs is true, as all current filters were
always resepcting this (exept FieldCacheRangeFilter since the rewrite). All
filters in Lucene use liveDocs, because this was a requirement in older Lucene
versions!
- QueryWrapperFilter may ignore liveDocs and simply return false for the flag.
In general I would like it more to rip the deleted docs handling in
CachingWrapperFilter, as it no longer needs to take care. CWF should simply
return containsOnlyLiveDocs=false if the deleted docs need to be merged in.
There is no need to and them in using FilteredDocIdSet (which slows down for
the random access case, see above)
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
> * Index is first 2M docs of Wikipedia. Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
> * I test across multiple queries. 1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4. "u s" means "united states" (phrase search).
> * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.99999 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
> * Method high means I use random-access filter API in
> IndexSearcher's main loop. Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
> * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]