[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it

Robert Muir (Commented) (JIRA) Fri, 07 Oct 2011 12:24:54 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13123095#comment-13123095 ]


Robert Muir commented on LUCENE-1536:
-------------------------------------

By the way, luceneutil noticed some problems:
{noformat}
Traceback (most recent call last):
  File "localrun.py", line 46, in <module>
    comp.benchmark("trunk_vs_patch")
  File "/home/rmuir/workspace/util/competition.py", line 194, in benchmark
    search=self.benchSearch, index=self.benchIndex, debugs=self._debug, 
debug=self._debug, verifyScores=self._verifyScores)
  File "/home/rmuir/workspace/util/searchBench.py", line 130, in run
    raise RuntimeError('results differ: %s' % str(cmpDiffs))
RuntimeError: results differ: ([], ['query=body:changer~1.0 
filter=CachingWrapperFilter(PreComputedRandomFilter(pctAccept=95.0)): hit 2 has 
wrong id/s [8684145] vs [6260043, 8684145]', 'query=body:changer~1.0 
filter=CachingWrapperFilter(PreComputedRandomFilter(pctAccept=75.0)): wrong 
collapsed hit count: 4 vs 5', 'query=body:changer~1.0 
filter=CachingWrapperFilter(PreComputedRandomFilter(pctAccept=99.0)): hit 2 has 
wrong id/s [8684145] vs [8043795]'])
{noformat}

I have no idea what's going on, but I'll upload my modifications to these 
filters to make them work with the patch (maybe I messed them up).

                
> if a filter can support random access API, we should use it
> -----------------------------------------------------------
>
>                 Key: LUCENE-1536
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1536
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 2.4
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>              Labels: gsoc2011, lucene-gsoc-11, mentor
>             Fix For: 4.0
>
>         Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, 
> LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, 
> luceneutil.patch
>
>
> I ran some performance tests, comparing applying a filter via
> random-access API instead of current trunk's iterator API.
> This was inspired by LUCENE-1476, where we realized deletions should
> really be implemented just like a filter, but then in testing found
> that switching deletions to iterator was a very sizable performance
> hit.
> Some notes on the test:
>   * Index is first 2M docs of Wikipedia.  Test machine is Mac OS X
>     10.5.6, quad core Intel CPU, 6 GB RAM, Java 1.6.0_07-b06-153.
>   * I test across multiple queries.  1-X means an OR query, e.g. 1-4
>     means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, i.e. 1 AND 2
>     AND 3 AND 4.  "u s" means "united states" (phrase search).
>   * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90,
>     95, 98, 99, 99.99999 (filter is non-null but all bits are set),
>     100 (filter=null, control)).
>   * Method high means I use random-access filter API in
>     IndexSearcher's main loop.  Method low means I use random-access
>     filter API down in SegmentTermDocs (just like deleted docs
>     today).
>   * Baseline (QPS) is current trunk, where filter is applied as iterator up
>     "high" (i.e. in IndexSearcher's search loop).
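
The difference between the iterator and random-access ways of applying a filter, as described in the quoted notes, can be sketched in a few lines. This is a toy model in Python, not Lucene code: query matches and filters are plain sorted lists of doc ids or a list of booleans, and the names `apply_filter_iterator` and `apply_filter_random_access` are made up for illustration.

```python
from bisect import bisect_left


def apply_filter_iterator(query_docs, filter_docs):
    """Intersect two sorted doc-id lists by leapfrogging (iterator style).

    Both arguments are assumed to be sorted lists of unique doc ids.
    """
    hits = []
    i = j = 0
    while i < len(query_docs) and j < len(filter_docs):
        q, f = query_docs[i], filter_docs[j]
        if q == f:
            hits.append(q)
            i += 1
            j += 1
        elif q < f:
            # Advance the query side to the first doc id >= f.
            i = bisect_left(query_docs, f, i + 1)
        else:
            # Advance the filter side to the first doc id >= q.
            j = bisect_left(filter_docs, q, j + 1)
    return hits


def apply_filter_random_access(query_docs, filter_bits):
    """Check every query match against a bit set with a direct lookup.

    filter_bits is assumed to be a list of booleans indexed by doc id.
    """
    return [doc for doc in query_docs if filter_bits[doc]]


if __name__ == "__main__":
    max_doc = 10
    query_docs = [1, 3, 4, 7, 9]
    filter_docs = [0, 3, 4, 5, 9]
    accepted = set(filter_docs)
    filter_bits = [doc in accepted for doc in range(max_doc)]

    print(apply_filter_iterator(query_docs, filter_docs))
    print(apply_filter_random_access(query_docs, filter_bits))
```

Both functions should return the same hits for the same filter. The "results differ" error in the traceback above reports a case where two runs that were expected to agree did not.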
