[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Otis Gospodnetic updated LUCENE-584:
------------------------------------

Attachment: bench-diff.txt

Perhaps I did something wrong with the benchmark, but I didn't get any speed-up when using searcher.match(Query, MatchCollector) vs. searcher.search(Query, HitCollector). Here are the benchmark numbers (50000 queries with each), HitCollector first, MatchCollector second:

HITCOLLECTOR:

------------> Report Sum By (any) Name (11 about 41 out of 41)
Operation            round   mrg   buf  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
Rounds_4                 0    10    10       1      808020      787.5    1,026.04   7,217,624   17,780,736
Populate                 -     -     -       4        2003      129.9       61.67   9,938,986   13,821,952
CreateIndex              -     -     -       4           1        4.4        0.91   3,937,522   10,916,864
MAddDocs_2000            -     -     -       4        2000      138.1       57.92   9,368,584   13,821,952
Optimize                 -     -     -       4           1        1.4        2.83   9,938,218   13,821,952
CloseIndex               -     -     -       4           1    2,000.0        0.00   9,938,986   13,821,952
OpenReader               -     -     -       4           1       24.0        0.17   9,957,592   13,821,952
SearchSameRdr_50000      -     -     -       4       50000    1,070.3      186.86  10,500,146   13,821,952
CloseReader              -     -     -       4           1    4,000.0        0.00   9,059,756   13,821,952
WarmNewRdr_50            -     -     -       4      100000   16,237.7       24.63   9,060,268   13,821,952
SrchNewRdr_50000         -     -     -       4       50000      265.9      752.02  10,800,006   13,821,952

------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 41)
Operation            round   mrg   buf  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
MAddDocs_2000            0    10    10       1        2000       94.6       21.15   7,844,112   10,407,936
MAddDocs_2000            1   100    10       1        2000      136.7       14.63   8,968,144   11,309,056
MAddDocs_2000            2    10   100       1        2000      173.2       11.55  10,528,264   15,740,928
MAddDocs_2000            3   100   100       1        2000      188.7       10.60  10,133,816   17,829,888

MATCHCOLLECTOR:

------------> Report Sum By (any) Name (11 about 41 out of 41)
Operation            round   mrg   buf  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
Rounds_4                 0    10    10       1      808020      781.0    1,034.62  10,566,608   15,859,712
Populate                 -     -     -       4        2003      130.9       61.23  10,963,452   14,806,016
CreateIndex              -     -     -       4           1       33.9        0.12   3,616,570   11,020,288
MAddDocs_2000            -     -     -       4        2000      137.3       58.29  10,445,568   14,806,016
Optimize                 -     -     -       4           1        1.4        2.82  10,979,398   14,806,016
CloseIndex               -     -     -       4           1    2,000.0        0.00  10,963,452   14,806,016
OpenReader               -     -     -       4           1       22.0        0.18  10,982,058   14,806,016
SearchSameRdr_50000      -     -     -       4       50000    1,064.7      187.84  11,060,036   14,806,016
CloseReader              -     -     -       4           1    4,000.0        0.00  10,353,206   14,806,016
WarmNewRdr_50            -     -     -       4      100000   16,419.0       24.36  10,431,062   14,806,016
SrchNewRdr_50000         -     -     -       4       50000      263.0      760.34  11,912,358   14,806,016

------------> Report sum by Prefix (MAddDocs) and Round (4 about 4 out of 41)
Operation            round   mrg   buf  runCnt  recsPerRun      rec/s  elapsedSec  avgUsedMem  avgTotalMem
MAddDocs_2000            0    10    10       1        2000       92.2       21.69   7,844,112   10,407,936
MAddDocs_2000            1   100    10       1        2000      136.6       14.64   7,720,352   10,407,936
MAddDocs_2000            2    10   100       1        2000      167.8       11.92  11,325,952   17,571,840
MAddDocs_2000            3   100   100       1        2000      199.3       10.03  14,891,856   20,836,352

This is what I did for the benchmark. I used Doron's handy conf/benchmark.
I added a new .alg based on micro-standard.alg; here's the diff:

$ diff conf/micro-standard.alg conf/matcher-micro-standard.alg
60c60
< { "SearchSameRdr" Search > : 50000
---
> { "SearchSameRdr" SearchMatch > : 50000
65c65
< { "SrchNewRdr" Search > : 50000
---
> { "SrchNewRdr" SearchMatch > : 50000

Then I added 2 new Tasks for benchmarking the Matcher (searcher.search(Query, MatchCollector)) and modified the ReadTask to call searcher.search(Query, HitCollector) instead of the method that returns Hits. I commented out all search result traversal and doc retrieval, as I didn't care to measure that.

> Decouple Filter from BitSet
> ---------------------------
>
>                 Key: LUCENE-584
>                 URL: https://issues.apache.org/jira/browse/LUCENE-584
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.0.1
>            Reporter: Peter Schäfer
>            Priority: Minor
>         Attachments: bench-diff.txt, BitsMatcher.java, Filter-20060628.patch,
> HitCollector-20060628.patch, IndexSearcher-20060628.patch,
> MatchCollector.java, Matcher.java, Matcher20070226.patch,
> Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch,
> Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java
>
>
> {code}
> // Filter.java
> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
>
> // AbstractBitSet.java, same package
> public interface AbstractBitSet
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of
> memory. It would be desirable to have an alternative BitSet implementation
> with a smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation
> could still delegate to =java.util.BitSet=.
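A minimal sketch of the proposed interface with two implementations, assuming only the single get(int) method from the description above. BitSetAdapter and SortedIntBitSet are hypothetical names for illustration; the sparse variant swaps in a plain sorted int[] with binary search rather than the attached SortedVIntList, and neither class is taken from the attached patches.

```java
import java.util.Arrays;
import java.util.BitSet;

// The interface from the proposal: a filter only has to answer
// "is document number <index> accepted?".
interface AbstractBitSet {
    boolean get(int index);
}

// Default implementation (hypothetical name): delegates to java.util.BitSet,
// as the proposal suggests. A BitSet sized for maxDoc documents holds about
// maxDoc / 8 bytes, no matter how few bits are set.
final class BitSetAdapter implements AbstractBitSet {
    private final BitSet bits;

    BitSetAdapter(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }
}

// Sparse alternative (hypothetical; a plain sorted int[], not the attached
// SortedVIntList): stores only the visible document numbers, 4 bytes each,
// and answers get() with a binary search in O(log n).
final class SortedIntBitSet implements AbstractBitSet {
    private final int[] docs;

    SortedIntBitSet(int[] visibleDocs) {
        this.docs = visibleDocs.clone();   // defensive copy of the caller's array
        Arrays.sort(this.docs);            // binarySearch requires sorted input
    }

    @Override
    public boolean get(int index) {
        return Arrays.binarySearch(docs, index) >= 0;
    }
}
```

Because both classes answer the same question, a filter could return whichever is smaller: the dense form costs about maxDoc / 8 bytes, the sorted array about 4 bytes per visible document, so the sparse form wins once fewer than roughly maxDoc / 32 documents are visible (ignoring object overhead).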