[
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-1536:
----------------------------------
Attachment: LUCENE-1536.patch
Patch that fixes the Weight.scoresDocsOutOfOrder method to return the inner
weight's setting. The scorer can still return docs in order, but that was
identical behaviour in the previous unpatched trunk (IS looked at the
out-of-order setting of the weight and used the correct collector, but once a
filter was applied, the documents came in order). My earlier patch simply
forgot to pass this setting through in our wrapper query.
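
To illustrate the idea, here is a minimal sketch of the delegation. It is not
the patch itself: SimpleWeight and FilteredWeightSketch are hypothetical
stand-ins, not Lucene classes, and only the out-of-order flag is modelled.

```java
// Hypothetical, simplified stand-in for a query weight (not Lucene's Weight).
interface SimpleWeight {
    boolean scoresDocsOutOfOrder();
}

// Illustrative wrapper that would apply a filter on top of an inner weight.
final class FilteredWeightSketch implements SimpleWeight {
    private final SimpleWeight inner;

    FilteredWeightSketch(SimpleWeight inner) {
        this.inner = inner;
    }

    @Override
    public boolean scoresDocsOutOfOrder() {
        // Report the inner weight's setting instead of a fixed value, so the
        // caller can choose a collector that tolerates out-of-order docs.
        return inner.scoresDocsOutOfOrder();
    }
}
```

With this shape the wrapper never claims stricter ordering than the weight it
wraps, even if the filtered scorer happens to deliver docs in order.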
Mike: If you have time, can you check this? We may need a test that uses a
larger index and tests FilteredQuery on top of it; the current indexes used for
filtering are simply too small and in most cases have only one segment :(
There is no need for Robert's hack (which does not work correctly with
acceptDocs != liveDocs). If different BooleanScorers return significantly
different scores, it is a bug, not a problem in FilteredQuery. Slight score
changes, and therefore a different order in results, are not a problem at all;
this is just my opinion.
bq. If we want to check that the results are identical, the benchmark test must
explicitly request docs-in-order on trunk vs. patch to be consistent. But then
it's no longer a benchmark.
This is of course untrue, sorry. If the weight returns that docs *may* come out
of order, the collector should handle this.
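
As a rough illustration of what "handle this" can mean, here is a sketch of a
top-N collector that does not depend on arrival order. All names are
hypothetical and this is not Lucene's collector API. The assumption is that an
in-order collector may treat a later doc with an equal score as weaker, while
an out-of-order one has to compare doc IDs explicitly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical top-N collector that tolerates docs arriving in any order.
final class OutOfOrderTopDocsSketch {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Weakest hit first: lower score, and on equal scores the larger doc ID.
    private static final Comparator<Hit> WEAKEST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    private final PriorityQueue<Hit> queue = new PriorityQueue<>(WEAKEST_FIRST);
    private final int topN;

    OutOfOrderTopDocsSketch(int topN) {
        if (topN <= 0) throw new IllegalArgumentException("topN must be > 0");
        this.topN = topN;
    }

    void collect(int doc, float score) {
        Hit hit = new Hit(doc, score);
        if (queue.size() < topN) {
            queue.add(hit);
        } else if (WEAKEST_FIRST.compare(hit, queue.peek()) > 0) {
            // Explicit doc ID tie-break: needed because doc may be smaller
            // than IDs that were collected earlier.
            queue.poll();
            queue.add(hit);
        }
    }

    // Doc IDs of the retained hits, strongest first.
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(WEAKEST_FIRST.reversed());
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) docs.add(h.doc);
        return docs;
    }
}
```

Feeding the same hits in a different order yields the same result list, which
is the property a collector needs when the weight says docs may come out of
order.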
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
> Key: LUCENE-1536
> URL: https://issues.apache.org/jira/browse/LUCENE-1536
> Project: Lucene - Java
> Issue Type: Improvement
> Components: core/search
> Affects Versions: 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Labels: gsoc2011, lucene-gsoc-11, mentor
> Fix For: 4.0
>
> Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch,
> LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch,
> LUCENE-1536.patch, LUCENE-1536_hack.patch, changes-yonik-uwe.patch,
> luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
> * Index is first 2M docs of Wikipedia. Test machine is Mac OS X
> 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
> * I test across multiple queries. 1-X means an OR query, eg 1-4
> means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
> AND 3 AND 4. "u s" means "united states" (phrase search).
> * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
> 95, 98, 99, 99.99999 (filter is non-null but all bits are set),
> 100 (filter=null, control)).
> * Method high means I use random-access filter API in
> IndexSearcher's main loop. Method low means I use random-access
> filter API down in SegmentTermDocs (just like deleted docs
> today).
> * Baseline (QPS) is current trunk, where filter is applied as iterator up
> "high" (ie in IndexSearcher's search loop).