[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uwe Schindler updated LUCENE-1536:
----------------------------------
    Attachment: LUCENE-1536.patch

Further investigations showed more problems:
- FilteredDocIdSet never implements Bits, but it should if the wrapped filter implements Bits. This cannot be done, as two different implementations would be needed. I have no idea how to solve this.

I uploaded a new patch that fixes the problems from before:
- CachingWrapperFilter now only sets the containsOnlyLiveDocs flag to true if it was already true before. If the original filter returned a DocIdSet without that flag, the cached filter cannot suddenly set it to true.
- CachingWrapperFilter also copies the liveDocs when it copies to FixedBitSet (e.g. QueryWrapperFilter).
- The default for containsOnlyLiveDocs is true, as all current filters have always respected this (except FieldCacheRangeFilter since the rewrite). All filters in Lucene use liveDocs, because this was a requirement in older Lucene versions!
- QueryWrapperFilter may ignore liveDocs and simply return false for the flag.

In general I would prefer to rip the deleted docs handling out of CachingWrapperFilter, as it no longer needs to take care of it. CWF should simply return containsOnlyLiveDocs=false if the deleted docs need to be merged in.
There is no need to AND them in using FilteredDocIdSet (which slows down the random access case, see above).

> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia. Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries. 1-X means an OR query, eg 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>     AND 3 AND 4. "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop. Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (ie in IndexSearcher's search loop).
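
For readers unfamiliar with the two access styles compared in the quoted description, here is a minimal sketch of the difference. It does not use Lucene's classes and is not code from any of the attached patches: the class FilterAccessSketch and both method names are hypothetical, java.util.BitSet stands in for a filter's bit set, and a sorted int[] stands in for the doc IDs a query matches.

```java
import java.util.BitSet;

// Illustration only: hypothetical names, not Lucene API and not the patch.
public class FilterAccessSketch {

    // Iterator style: advance the query's matches and the filter's set bits
    // in lockstep, skipping whichever side is behind. queryDocs must be
    // sorted in ascending order.
    public static int countWithIterator(int[] queryDocs, BitSet filter) {
        int count = 0;
        int i = 0;
        int filterDoc = filter.nextSetBit(0);
        while (i < queryDocs.length && filterDoc >= 0) {
            int queryDoc = queryDocs[i];
            if (queryDoc == filterDoc) {
                count++;
                i++;
                filterDoc = filter.nextSetBit(filterDoc + 1);
            } else if (queryDoc < filterDoc) {
                i++; // query is behind, move it forward
            } else {
                filterDoc = filter.nextSetBit(queryDoc); // filter is behind
            }
        }
        return count;
    }

    // Random-access style: for each doc the query matches, test one bit.
    public static int countWithRandomAccess(int[] queryDocs, BitSet filter) {
        int count = 0;
        for (int doc : queryDocs) {
            if (filter.get(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] queryDocs = {1, 4, 5, 9, 12};
        BitSet filter = new BitSet(16);
        filter.set(4);
        filter.set(9);
        filter.set(10);
        System.out.println(countWithIterator(queryDocs, filter));     // 2
        System.out.println(countWithRandomAccess(queryDocs, filter)); // 2
    }
}
```

Both methods accept the same docs; they differ only in how the filter is consulted. The sketch says nothing about relative speed, which is what the benchmarks in the issue measure.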