[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

Michael McCandless (JIRA) Fri, 23 Sep 2011 09:40:52 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113542#comment-13113542
 ]


Michael McCandless commented on LUCENE-1536:
--------------------------------------------


bq. Would it make more sense to use the 'LiveDocs' notion in this API too? So 
rather than it being bitsIncludeDeletedDocs, it would be liveDocsOnly.

+1

bq. In what situations would a DocIdSet return a different Bits implementation 
for getRandomAccessBits? If the DocIdSet impl doesn't support random access, 
then converting to a random-access Bits implementation could be intensive, 
right? That would probably outweigh the benefits. Therefore could we simply 
check if the DocIdSet is also a Bits impl? If it isn't, then we do the 
traditional filtering approach.

You mean an instanceof check instead of getRandomAccessBits?  I think
that's good.
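
To make the instanceof idea concrete, here is a minimal, self-contained
sketch.  It does not use the real Lucene classes: the Bits and DocIdSet
interfaces below are simplified stand-ins, and BitSetDocIdSet,
SortedArrayDocIdSet and filter are hypothetical names made up for
illustration.  It only shows the dispatch: if the set is also a Bits,
test each candidate doc directly; otherwise fall back to advancing an
iterator alongside the candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.PrimitiveIterator;

class FilterDispatchSketch {

    /** Simplified stand-in for a random-access bit interface (not Lucene's class). */
    interface Bits {
        boolean get(int index);
        int length();
    }

    /** Simplified stand-in for an iterator-only set of doc IDs (not Lucene's class). */
    interface DocIdSet {
        PrimitiveIterator.OfInt iterator(); // doc IDs in increasing order
    }

    /** Hypothetical set that supports both iteration and random access. */
    static final class BitSetDocIdSet implements DocIdSet, Bits {
        private final BitSet bits;
        private final int maxDoc;
        BitSetDocIdSet(BitSet bits, int maxDoc) { this.bits = bits; this.maxDoc = maxDoc; }
        @Override public boolean get(int index) { return bits.get(index); }
        @Override public int length() { return maxDoc; }
        @Override public PrimitiveIterator.OfInt iterator() { return bits.stream().iterator(); }
    }

    /** Hypothetical set that can only be iterated. */
    static final class SortedArrayDocIdSet implements DocIdSet {
        private final int[] docs;
        SortedArrayDocIdSet(int[] sortedDocs) { this.docs = sortedDocs; }
        @Override public PrimitiveIterator.OfInt iterator() { return Arrays.stream(docs).iterator(); }
    }

    /** Keeps the candidates (sorted ascending) that the set accepts. */
    static List<Integer> filter(int[] sortedCandidates, DocIdSet set) {
        List<Integer> accepted = new ArrayList<>();
        if (set instanceof Bits) {
            // Random-access path: one bit test per candidate.
            Bits bits = (Bits) set;
            for (int doc : sortedCandidates) {
                if (doc < bits.length() && bits.get(doc)) accepted.add(doc);
            }
        } else {
            // Traditional path: advance the filter's iterator up to each candidate.
            PrimitiveIterator.OfInt it = set.iterator();
            boolean hasCurrent = it.hasNext();
            int current = hasCurrent ? it.nextInt() : -1;
            for (int doc : sortedCandidates) {
                while (hasCurrent && current < doc) {
                    hasCurrent = it.hasNext();
                    if (hasCurrent) current = it.nextInt();
                }
                if (!hasCurrent) break; // filter exhausted, nothing else can match
                if (current == doc) accepted.add(doc);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        BitSet b = new BitSet();
        b.set(1); b.set(3); b.set(5);
        int[] candidates = {0, 1, 2, 3, 4};
        System.out.println(filter(candidates, new BitSetDocIdSet(b, 8)));
        System.out.println(filter(candidates, new SortedArrayDocIdSet(new int[] {1, 3, 5})));
    }
}
```

Both calls in main accept the same docs; only the access pattern
differs.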

bq. When would a Bits implementation not want to support random-access? It 
seems its interface is implicitly random-access. If so, could we cut the setter 
from FixedBitSet (I've moved it from OpenBitSet)?

Random access is actually slower when the number of docs that match
the filter is tiny (like less than 1% of the index), so even if it's
a Bits instance, I think we still need a way to force it to use the
old (iterator) way?
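
As a rough illustration of that kind of override, the sketch below
picks a strategy from the filter's density.  Everything here is an
assumption for illustration: the class and method names are
hypothetical, and the 1% cutoff is just the ballpark figure mentioned
above, not a measured or committed threshold.

```java
class FilterStrategySketch {

    enum Strategy { RANDOM_ACCESS, ITERATOR }

    // Assumed cutoff, taken from the "less than 1% of the index" ballpark.
    static final double SPARSE_THRESHOLD = 0.01;

    /**
     * Chooses how to apply a filter.
     *
     * @param supportsRandomAccess whether the filter's doc set is also random-access
     * @param matchCount           number of docs the filter accepts
     * @param maxDoc               total number of docs in the index (or segment)
     */
    static Strategy choose(boolean supportsRandomAccess, long matchCount, int maxDoc) {
        if (!supportsRandomAccess || maxDoc <= 0) {
            return Strategy.ITERATOR;
        }
        double density = (double) matchCount / maxDoc;
        // Very sparse filters stay on the iterator path even though
        // random access would be possible.
        return density < SPARSE_THRESHOLD ? Strategy.ITERATOR : Strategy.RANDOM_ACCESS;
    }

    public static void main(String[] args) {
        System.out.println(choose(true, 10_000, 2_000_000));   // density 0.5%
        System.out.println(choose(true, 500_000, 2_000_000));  // density 25%
    }
}
```

In practice the caller would need a cheap way to get the match count
(or an explicit flag on the filter), which this sketch simply takes as
a parameter.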

> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>     AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop.  Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (ie in IndexSearcher's search loop).

