[
https://issues.apache.org/jira/browse/LUCENE-5049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691206#comment-13691206
]
Michael McCandless commented on LUCENE-5049:
--------------------------------------------
OK, fair enough, Uwe. I've continued this effort at
https://github.com/mikemccand/lucene-c-boost and wrote a blog post about it at
http://blog.mikemccandless.com/2013/06/screaming-fast-lucene-searches-using-c.html
> Native (C++) implementation of "pure OR" BooleanQuery
> -----------------------------------------------------
>
> Key: LUCENE-5049
> URL: https://issues.apache.org/jira/browse/LUCENE-5049
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Attachments: LUCENE-5049.patch
>
>
> I've been playing with a C++ implementation of BooleanQuery containing
> only OR'd (SHOULD) TermQuery clauses, collecting top N hits by score.
> The results are impressive: ~3X speedup for BQ OR over two terms, and
> also good speedups (~38-78%) for Fuzzy1/2 as well since they rewrite
> to BQ OR over N terms:
> {noformat}
> Task             QPS base   StdDev  QPS comp   StdDev  Pct diff
> MedTerm             69.47  (15.8%)     68.61  (13.4%)     -1.2%  ( -26% -  33%)
> HighTerm            55.25  (16.2%)     54.63  (13.9%)     -1.1%  ( -26% -  34%)
> LowTerm            333.10   (9.6%)    329.43   (8.0%)     -1.1%  ( -17% -  18%)
> IntNRQ               3.37   (2.6%)      3.36   (4.6%)     -0.2%  (  -7% -   7%)
> Prefix3             18.91   (2.0%)     19.04   (3.5%)      0.7%  (  -4% -   6%)
> Wildcard            29.40   (1.7%)     29.70   (2.8%)      1.0%  (  -3% -   5%)
> MedPhrase          132.69   (6.2%)    134.66   (7.0%)      1.5%  ( -11% -  15%)
> HighSloppyPhrase     0.82   (3.6%)      0.83   (3.5%)      1.9%  (  -5% -   9%)
> AndHighHigh         19.65   (0.6%)     20.02   (0.8%)      1.9%  (   0% -   3%)
> HighPhrase          11.74   (6.6%)     11.96   (7.1%)      1.9%  ( -11% -  16%)
> MedSloppyPhrase     29.09   (1.2%)     29.76   (1.9%)      2.3%  (   0% -   5%)
> LowSloppyPhrase     25.71   (1.4%)     26.98   (1.7%)      4.9%  (   1% -   8%)
> Respell            173.78   (3.0%)    182.41   (3.7%)      5.0%  (  -1% -  12%)
> MedSpanNear         27.67   (2.5%)     29.07   (2.4%)      5.1%  (   0% -  10%)
> HighSpanNear         2.95   (2.4%)      3.10   (2.8%)      5.4%  (   0% -  10%)
> LowSpanNear          8.29   (3.4%)      8.82   (3.3%)      6.4%  (   0% -  13%)
> AndHighMed          79.32   (1.6%)     84.44   (1.0%)      6.5%  (   3% -   9%)
> LowPhrase           23.20   (2.0%)     25.14   (1.6%)      8.4%  (   4% -  12%)
> AndHighLow         594.17   (3.4%)    660.32   (1.9%)     11.1%  (   5% -  16%)
> Fuzzy2              88.32   (6.4%)    121.44   (1.7%)     37.5%  (  27% -  48%)
> Fuzzy1              86.34   (6.0%)    153.49   (1.7%)     77.8%  (  66% -  90%)
> OrHighHigh          16.29   (2.5%)     48.29   (1.3%)    196.5%  ( 188% - 205%)
> OrHighMed           28.98   (2.7%)     87.81   (0.9%)    203.0%  ( 194% - 212%)
> OrHighLow           27.38   (2.6%)     84.94   (1.1%)    210.3%  ( 201% - 219%)
> {noformat}
> This is essentially a scaled back attempt at LUCENE-1594 in that it's
> "hardwired" to "just" the "OR of TermQuery" case.
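
For readers unfamiliar with the query shape described above, here is a rough, self-contained C++ sketch of "OR of term clauses, collect top N hits by score". It is not the code from the attached patch or from lucene-c-boost, and it does not use any Lucene API: the names Posting, Hit, WeakestOnTop and searchOrTopN are hypothetical, the per-term scores are assumed to be precomputed, and the hash-map accumulation is a deliberately simple stand-in for whatever scoring loop the real implementation uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <unordered_map>
#include <vector>

// Hypothetical types for illustration only; not Lucene or lucene-c-boost APIs.
struct Posting {
    int doc;      // document ID
    float score;  // assumed precomputed score contribution of one term for this doc
};

struct Hit {
    int doc;
    float score;
};

// Comparator for std::priority_queue: returns true when `a` is a stronger hit
// than `b`, so the weakest hit (lowest score, then highest doc ID) ends up on top.
struct WeakestOnTop {
    bool operator()(const Hit& a, const Hit& b) const {
        if (a.score != b.score) return a.score > b.score;
        return a.doc < b.doc;
    }
};

// Scores the OR of all terms and returns at most topN hits, best first.
std::vector<Hit> searchOrTopN(const std::vector<std::vector<Posting>>& termPostings,
                              std::size_t topN) {
    assert(topN > 0);

    // OR semantics: a document matches if any term matches it, and its score
    // is the sum of the contributions of the terms that matched.
    std::unordered_map<int, float> scores;
    for (const auto& postings : termPostings) {
        for (const Posting& p : postings) {
            scores[p.doc] += p.score;
        }
    }

    // Bounded heap of size topN: the weakest kept hit is on top and is
    // replaced whenever a stronger candidate arrives.
    WeakestOnTop stronger;
    std::priority_queue<Hit, std::vector<Hit>, WeakestOnTop> heap;
    for (const auto& entry : scores) {
        Hit h{entry.first, entry.second};
        if (heap.size() < topN) {
            heap.push(h);
        } else if (stronger(h, heap.top())) {
            heap.pop();
            heap.push(h);
        }
    }

    // The heap pops weakest first, so reverse to return the best hit first.
    std::vector<Hit> result;
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

Calling searchOrTopN with one posting list per term and topN = 10 would return up to ten hits ordered by descending score, with ties broken by ascending document ID. A production scorer would walk sorted postings and avoid the hash map; the sketch only aims to show the matching and collection semantics.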