[ https://issues.apache.org/jira/browse/LUCENE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734067#action_12734067 ]

Uwe Schindler commented on LUCENE-1644:
---------------------------------------

Sorry for coming back to this issue so late; I am on holiday at the moment.

In my opinion, using a Parameter instead of a boolean is a good idea. The latest
patch is also a good idea; I only have some small problems with it:
- Why did you make so many internal things public? The additional ctor of
MultiTermQueryWrapperFilter should be package-private or protected (the class is
not abstract, but it should be used as if it were abstract, so it must have only
protected ctors). Only public subclasses such as TermRangeFilter should have
public ctors.
- getFilter()/getEnum() should stay protected.
- I do not like the weird caching of Terms. A cleaner API would be a new
class CachingFilteredTermEnum that can turn on caching for, e.g., the first 20
terms and then reset. In this case the API would stay clean, and the filter
code would not need to be changed at all (it just harvests the TermEnum,
whether it is cached or not). I would propose something like: new
CachingFilteredTermEnum(originalEnum), use it normally, then call
termEnum.reset() to consume it again, and termEnum.purgeCache() once caching
is no longer needed and should be switched off (after the first 25 terms or
so). The problem with MultiTermQueryWrapperFilter is that the filter is
normally stateless (it holds no reader or TermEnum), so to reuse the cached
terms, getDocIdSet() would have to receive the TermEnum or wrapper in
addition to the IndexReader. This is not very clean (it took me some time to
understand what you are doing).
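
A minimal sketch of the proposed CachingFilteredTermEnum, assuming a simplified
term-enumeration interface. SimpleTermEnum, ListTermEnum and
CachingEnumDemo.rewrite are hypothetical stand-ins (not Lucene's TermEnum API),
and the reset()/purgeCache() semantics are assumptions: reset() replays the
cached terms before resuming the wrapped enum, and purgeCache() stops caching
while still delivering any cached terms not yet replayed. The rewrite method
shows how a caller might peek at the first few terms, choose between the
BooleanQuery and filter paths, and then consume all terms from the start
without re-opening the enum.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a term enumeration; NOT Lucene's TermEnum API.
interface SimpleTermEnum {
    boolean next();   // advance; false once exhausted
    String term();    // current term, valid after next() returned true
}

// In-memory enum, used only to drive the demo.
final class ListTermEnum implements SimpleTermEnum {
    private final Iterator<String> it;
    private String current;

    ListTermEnum(List<String> terms) { this.it = terms.iterator(); }

    public boolean next() {
        if (!it.hasNext()) { current = null; return false; }
        current = it.next();
        return true;
    }

    public String term() { return current; }
}

// Hypothetical CachingFilteredTermEnum (name from the comment; API assumed).
final class CachingFilteredTermEnum implements SimpleTermEnum {
    private final SimpleTermEnum in;
    private final List<String> cache = new ArrayList<>();
    private boolean caching = true;
    private int replayPos = -1; // -1: reading from 'in'; else next cache index to replay
    private String current;

    CachingFilteredTermEnum(SimpleTermEnum in) { this.in = in; }

    public boolean next() {
        if (replayPos >= 0) {
            if (replayPos < cache.size()) {
                current = cache.get(replayPos++);
                return true;
            }
            replayPos = -1;              // replay finished: resume the wrapped enum
            if (!caching) cache.clear(); // purged earlier: release replayed terms
        }
        if (!in.next()) { current = null; return false; }
        current = in.term();
        if (caching) cache.add(current); // remember terms while caching is on
        return true;
    }

    public String term() { return current; }

    // Rewind: cached terms are delivered again, then the wrapped enum resumes.
    void reset() {
        if (!caching) throw new IllegalStateException("cache already purged");
        replayPos = 0;
        current = null;
    }

    // Stop caching; cached terms not yet replayed are still delivered once.
    void purgeCache() {
        caching = false;
        if (replayPos >= 0) {
            cache.subList(0, replayPos).clear(); // drop already-replayed terms
            replayPos = 0;
        } else {
            cache.clear();
        }
    }
}

final class CachingEnumDemo {
    // Hypothetical decision: few terms -> BooleanQuery path, many -> filter path.
    static String rewrite(CachingFilteredTermEnum terms, int cutoff, List<String> out) {
        int count = 0;
        while (count <= cutoff && terms.next()) count++; // peek at most cutoff + 1 terms
        terms.reset();
        if (count <= cutoff) {
            while (terms.next()) out.add(terms.term()); // replays the cached terms
            return "boolean";
        }
        terms.purgeCache(); // stop caching; the cached prefix is replayed once more
        while (terms.next()) out.add(terms.term());
        return "filter";
    }
}
```

Because the consumer only sees an ordinary enum, the filter code does not need
to know whether a term comes from the cache or from the wrapped enum.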

> Enable MultiTermQuery's constant score mode to also use BooleanQuery under 
> the hood
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1644
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1644.patch, LUCENE-1644.patch
>
>
> When MultiTermQuery is used (via one of its subclasses, eg
> WildcardQuery, PrefixQuery, FuzzyQuery, etc.), you can ask it to use
> "constant score mode", which pre-builds a filter and then wraps that
> filter as a ConstantScoreQuery.
> If you don't set that, it instead builds a [potentially massive]
> BooleanQuery with one SHOULD clause per term.
> There are some limitations of this approach:
>   * The scores returned by the BooleanQuery are often quite
>     meaningless to the app, so, one should be able to use a
>     BooleanQuery yet get constant scores back.  (Though I vaguely
>     remember at least one example someone raised where the scores were
>     useful...).
>   * The resulting BooleanQuery can easily have too many clauses,
>     throwing an extremely confusing exception to newish users.
>   * It'd be better to have the freedom to pick "build filter up front"
>     vs "build massive BooleanQuery", when constant scoring is enabled,
>     because they have different performance tradeoffs.
>   * In constant score mode, an OpenBitSet is always used, yet for
>     sparse bit sets this does not give good performance.
> I think we could address these issues by giving BooleanQuery a
> constant score mode, then empower MultiTermQuery (when in constant
> score mode) to pick & choose whether to use BooleanQuery vs up-front
> filter, and finally empower MultiTermQuery to pick the best (sparse vs
> dense) bit set impl.
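
The last point of the description, picking a sparse or dense bit set, can be
illustrated with a small self-contained sketch. DocSet, SparseDocSet,
DenseDocSet and DocSets.build are hypothetical names, not Lucene classes, and
the selection rule (32 bits per matching document versus one bit per indexed
document) is an assumed, memory-only heuristic; a real implementation would
also weigh iteration and lookup cost.

```java
import java.util.Arrays;

// Hypothetical doc-id set abstraction for this sketch; NOT Lucene's DocIdSet API.
interface DocSet {
    boolean contains(int doc);
    int cardinality();
}

// Sparse: sorted array of matching doc ids, 32 bits per hit.
final class SparseDocSet implements DocSet {
    private final int[] docs;
    SparseDocSet(int[] sortedDocs) { this.docs = sortedDocs.clone(); }
    public boolean contains(int doc) { return Arrays.binarySearch(docs, doc) >= 0; }
    public int cardinality() { return docs.length; }
}

// Dense: one bit per document in the index, like a bit set.
// Input must be sorted, distinct doc ids in [0, maxDoc).
final class DenseDocSet implements DocSet {
    private final long[] bits;
    private final int count;
    DenseDocSet(int[] sortedDocs, int maxDoc) {
        bits = new long[(maxDoc + 63) >>> 6];
        for (int d : sortedDocs) bits[d >>> 6] |= 1L << (d & 63);
        count = sortedDocs.length;
    }
    public boolean contains(int doc) {
        return doc >= 0 && doc < bits.length * 64
            && (bits[doc >>> 6] & (1L << (doc & 63))) != 0;
    }
    public int cardinality() { return count; }
}

final class DocSets {
    // Assumed rule (memory only): use the array when 32 * hits < maxDoc bits.
    static DocSet build(int[] sortedDocs, int maxDoc) {
        long sparseBits = 32L * sortedDocs.length;
        return sparseBits < maxDoc
            ? new SparseDocSet(sortedDocs)
            : new DenseDocSet(sortedDocs, maxDoc);
    }
}
```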

-- 
This message is automatically generated by JIRA.

