[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

Chris Male (Commented) (JIRA) Tue, 27 Sep 2011 21:18:10 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116123#comment-13116123 ]


Chris Male commented on LUCENE-1536:
------------------------------------

I haven't had a chance to look at the latest patch, but:

{quote}
For DocIdSet, can we nuke getRandomAccessBits? Ie, if
supportRandomAccess() returns true, then we can cast the instance to
Bits? Maybe we should rename supportRandomAccess to useRandomAccess?
(Ie, it may support it, but we only want to use random access when the
filter is dense enough).
{quote}

I'm definitely +1 to useRandomAccess() but I think there is a usability 
question mark around removing getRandomAccessBits().  If we assume that if 
DocIdSet.useRandomAccess() returns true then the DocIdSet must also be a 
Bits implementation, then we need to prevent non-Bits implementations from 
returning true, or setting true in setUseRandomAccess.  If we don't, we're 
likely to confuse even expert users because this all comes together in a method 
deep inside IndexSearcher.
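
To make the concern concrete, here is a rough sketch of the cast-based
approach. It is not taken from the patch: Bits and DocIdSet below are
simplified stand-ins for the Lucene types, and BitsBackedSet, MisbehavingSet
and CastSketch are hypothetical names used only for illustration.

```java
import java.util.BitSet;

// Simplified stand-ins for illustration; NOT the real Lucene classes.
interface Bits {
    boolean get(int index);
    int length();
}

abstract class DocIdSet {
    // Proposed flag from the discussion; default is iterator-only.
    boolean useRandomAccess() { return false; }
}

// Hypothetical set that is also a Bits, so the cast below is safe.
class BitsBackedSet extends DocIdSet implements Bits {
    private final BitSet bits;
    private final int maxDoc;

    BitsBackedSet(BitSet bits, int maxDoc) {
        this.bits = bits;
        this.maxDoc = maxDoc;
    }

    @Override boolean useRandomAccess() { return true; }
    @Override public boolean get(int index) { return bits.get(index); }
    @Override public int length() { return maxDoc; }
}

// Hypothetical set that returns true without implementing Bits.
class MisbehavingSet extends DocIdSet {
    @Override boolean useRandomAccess() { return true; }
}

class CastSketch {
    // Roughly what the searcher would do if getRandomAccessBits() went away.
    static Bits asBits(DocIdSet set) {
        if (set.useRandomAccess()) {
            return (Bits) set; // ClassCastException if set is not a Bits
        }
        return null; // assumed convention: caller falls back to the iterator
    }
}
```

Nothing in the type system stops MisbehavingSet from compiling, so the
failure only shows up as a ClassCastException at search time.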

But if we're going to constrain useRandomAccess to only Bits implementations, 
then I once again feel these should be on Bits.  What if we added to Bits 
allowRandomAccessFiltering() or something like that? So even though Bits is 
inherently random-access, we control whether the Bits should be used to do 
filtering.
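
A minimal sketch of what I mean, assuming a simplified stand-in for Bits
rather than the real interface. The method name allowRandomAccessFiltering()
is only the suggestion above, and DensityAwareBits with its minDensity
threshold is a hypothetical example of one way an implementation might
decide.

```java
import java.util.BitSet;

// Simplified stand-in for illustration; NOT the real Lucene interface.
interface Bits {
    boolean get(int index);
    int length();

    // Hypothetical addition: Bits is always random-access, but this says
    // whether it should actually be used that way for filtering.
    boolean allowRandomAccessFiltering();
}

// Hypothetical implementation that opts in only when dense enough.
class DensityAwareBits implements Bits {
    private final BitSet bits;
    private final int length;
    private final double minDensity; // assumed tuning knob, e.g. 0.5

    DensityAwareBits(BitSet bits, int length, double minDensity) {
        this.bits = bits;
        this.length = length;
        this.minDensity = minDensity;
    }

    @Override public boolean get(int index) { return bits.get(index); }
    @Override public int length() { return length; }

    @Override public boolean allowRandomAccessFiltering() {
        // Fraction of set bits; assumes no bits are set at or past length.
        return length > 0
            && (double) bits.cardinality() / length >= minDensity;
    }
}
```

With this shape the decision lives on the type that is guaranteed to be
random-access, so there is no cast for a caller to get wrong.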

Alternatively, we keep getRandomAccessBits() and treat DocIdSet as a 
random-access Bits factory, which currently just returns itself in most cases 
but might not in the future?
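
As a sketch of the factory idea, again with simplified stand-ins rather than
the real classes: BitSetDocIdSet and SortedArrayDocIdSet are hypothetical
names, and returning null to mean "no random access, use the iterator" is my
assumption, not something the patch defines.

```java
import java.util.Arrays;
import java.util.BitSet;

// Simplified stand-ins for illustration; NOT the real Lucene classes.
interface Bits {
    boolean get(int index);
    int length();
}

abstract class DocIdSet {
    // Assumed convention: null means "no random access, use the iterator".
    Bits getRandomAccessBits() { return null; }
}

// The common case today: the set is itself a Bits and returns itself.
class BitSetDocIdSet extends DocIdSet implements Bits {
    private final BitSet bits;
    private final int maxDoc;

    BitSetDocIdSet(BitSet bits, int maxDoc) {
        this.bits = bits;
        this.maxDoc = maxDoc;
    }

    @Override Bits getRandomAccessBits() { return this; }
    @Override public boolean get(int index) { return bits.get(index); }
    @Override public int length() { return maxDoc; }
}

// A possible future case: the set is not a Bits but can build a view.
class SortedArrayDocIdSet extends DocIdSet {
    private final int[] sortedDocs; // must be sorted ascending
    private final int maxDoc;

    SortedArrayDocIdSet(int[] sortedDocs, int maxDoc) {
        this.sortedDocs = sortedDocs;
        this.maxDoc = maxDoc;
    }

    @Override Bits getRandomAccessBits() {
        return new Bits() {
            @Override public boolean get(int index) {
                return Arrays.binarySearch(sortedDocs, index) >= 0;
            }
            @Override public int length() { return maxDoc; }
        };
    }
}
```

The caller never casts; it just asks the DocIdSet for a Bits and falls back
to the iterator when none is offered.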
                
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, eg 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2
>     AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop.  Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (ie in IndexSearcher's search loop).

