[
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Doron Cohen updated LUCENE-584:
-------------------------------
Attachment: bench-diff.txt
The benchmark does not search with filters. Is any speedup still expected
without them, and if so, why?
I applied the patch to current trunk and ran the benchmark. It shows that when
all queries use the same reader, Match is faster, while when each query opens
its own reader, BitSet is faster. Is this the expected result?
{noformat}
Operation            round  runCnt  recsPerRun  rec/s  elapsedSec  avgUsedMem  avgTotalMem
SrchMtchSamRdr_5000      -      10        5000  642.2       77.85  12,331,866   16,408,576
SrchBitsSamRdr_5000      -      10        5000  586.9       85.20   9,515,875   12,009,472
SrchMtchNewRdr_500       -      10         500  134.7       37.11  13,376,113   17,171,660
{noformat}
This test uses all Reuters documents, and the search rounds are repeated
10 times. The Match tasks were not included in the patch, so I wrote them. The
updated bench-diff.txt attached contains these task classes and the algorithm.
(When you use it, note that once the index has been created you can comment out
the first part, the "Populate" part, and then rerun only the querying part.)
> Decouple Filter from BitSet
> ---------------------------
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.0.1
> Reporter: Peter Schäfer
> Priority: Minor
> Attachments: bench-diff.txt, bench-diff.txt, BitsMatcher.java,
> Filter-20060628.patch, HitCollector-20060628.patch,
> IndexSearcher-20060628.patch, MatchCollector.java, Matcher.java,
> Matcher20070226.patch, Scorer-20060628.patch, Searchable-20060628.patch,
> Searcher-20060628.patch, Some Matchers.zip, SortedVIntList.java,
> TestSortedVIntList.java
>
>
> {code}
> package org.apache.lucene.search;
>
> import java.io.IOException;
> import org.apache.lucene.index.IndexReader;
>
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
>
> // separate source file, same package
> public interface AbstractBitSet
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract
> interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's
> privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of
> memory. It would be desirable to have an alternative BitSet implementation
> with smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was
> obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation
> could still delegate to =java.util.BitSet=.
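
To make the proposal above concrete, here is a minimal sketch of how the interface could be implemented in two ways. Only AbstractBitSet and its get(int) method come from the issue description; DenseBits, SparseBits and AbstractBitSetSketch are hypothetical names chosen for illustration, and the sorted-array approach is just one possible low-memory representation, not necessarily what the attached patches do.

```java
import java.util.Arrays;
import java.util.BitSet;

// Interface as proposed in the issue description.
interface AbstractBitSet {
    boolean get(int index);
}

// Hypothetical default implementation: simply delegates to java.util.BitSet.
final class DenseBits implements AbstractBitSet {
    private final BitSet bits;

    DenseBits(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean get(int index) {
        return bits.get(index);
    }
}

// Hypothetical sparse implementation: keeps only the set document numbers
// in a sorted array and answers get() with a binary search.
final class SparseBits implements AbstractBitSet {
    private final int[] sortedDocs;

    SparseBits(int... docs) {
        sortedDocs = docs.clone();
        Arrays.sort(sortedDocs);
    }

    @Override
    public boolean get(int index) {
        return Arrays.binarySearch(sortedDocs, index) >= 0;
    }
}

public class AbstractBitSetSketch {
    public static void main(String[] args) {
        // Two visible documents in a large index.
        BitSet visible = new BitSet(1_000_000);
        visible.set(42);
        visible.set(999_999);

        AbstractBitSet dense = new DenseBits(visible);
        AbstractBitSet sparse = new SparseBits(999_999, 42);

        System.out.println(dense.get(42) + " " + sparse.get(42)); // true true
        System.out.println(dense.get(7) + " " + sparse.get(7));   // false false
    }
}
```

In this sketch the dense variant is sized by the highest set bit, whereas the sparse variant stores one int per visible document, at the cost of a logarithmic lookup instead of a constant-time one.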