[ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1644: --------------------------------------- Attachment: LUCENE-1644.patch Attached patch: fixed some bugs in the last rev, updated test cases, javadocs, CHANGES. I also optimized MultiTermQueryWrapperFilter to use the bulk-read API from termDocs. I confirmed all tests pass if I temporarily switch CONSTANT_SCORE_FILTER_REWRITE to CONSTANT_SCORE_AUTO_REWRITE_DEFAULT. I changed QueryParser to use CONSTANT_SCORE_AUTO for rewrite (it was previously CONSTANT_FILTER). I still need to run some perf tests to get a rough sense of decent defaults for CONSTANT_SCORE_AUTO cutover thresholds. bq. getFilter()/getEnum should stay protected. OK I made getEnum protected again. I had tentatively made it public so that one could create their own [external] rewrite methods. But I think (if we leave it protected), one could still make an inner/nested class that can access getEnum(). Do we even need getFilter()? I removed it in the patch. > Enable MultiTermQuery's constant score mode to also use BooleanQuery under > the hood > ----------------------------------------------------------------------------------- > > Key: LUCENE-1644 > URL: https://issues.apache.org/jira/browse/LUCENE-1644 > Project: Lucene - Java > Issue Type: Improvement > Components: Search > Reporter: Michael McCandless > Assignee: Michael McCandless > Priority: Minor > Fix For: 2.9 > > Attachments: LUCENE-1644.patch, LUCENE-1644.patch, LUCENE-1644.patch > > > When MultiTermQuery is used (via one of its subclasses, eg > WildcardQuery, PrefixQuery, FuzzyQuery, etc.), you can ask it to use > "constant score mode", which pre-builds a filter and then wraps that > filter as a ConstantScoreQuery. > If you don't set that, it instead builds a [potentially massive] > BooleanQuery with one SHOULD clause per term. > There are some limitations of this approach: > * The scores returned by the BooleanQuery are often quite > meaningless to the app, so, one should be able to use a > BooleanQuery yet get constant scores back. (Though I vaguely > remember at least one example someone raised where the scores were > useful...). > * The resulting BooleanQuery can easily have too many clauses, > throwing an extremely confusing exception to newish users. > * It'd be better to have the freedom to pick "build filter up front" > vs "build massive BooleanQuery", when constant scoring is enabled, > because they have different performance tradeoffs. > * In constant score mode, an OpenBitSet is always used, yet for > sparse bit sets this does not give good performance. > I think we could address these issues by giving BooleanQuery a > constant score mode, then empower MultiTermQuery (when in constant > score mode) to pick & choose whether to use BooleanQuery vs up-front > filter, and finally empower MultiTermQuery to pick the best (sparse vs > dense) bit set impl. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org