Am I missing something?   Across the project, I’m seeing over 1,000 references 
to BooleanQuery.add().   Already, this seems like a pretty major refactoring.  
And I haven’t checked the other types of queries: DisjunctionMax, Phrase, and 
MultiPhrase.   At that scale, bugs will be introduced.

I’m not disagreeing with the concept.  At all.   It’s part of the Collections 
contract that anything used in hashCode() and equals() be kept immutable.  Just 
wondering if the cost is worth the principle this time?

In the spirit of discussion, an alternate approach might be to:
  a. Locate the places in the code where a query is taken from the cache and 
modified after the fact.
  b. Remove the query object before modifying and placing back on the cache.

Easier said than done, I realize.  Note, changing the constructors and removing 
modifiers would force all of these changes anyway.  It's just that they would 
be lost in a forest of other minor modifications.    So, even if folks are ok 
with the larger scale changes, it might make sense to start with the 
problematic places first and then move on to the bulk of "syntax changes".

Please ignore this if I am missing something here.

--------------------------------------------------------------------------------------------------------------------------------
From: Terry Smith [mailto:sheb...@gmail.com] 
Sent: Tuesday, March 31, 2015 9:38 AM
To: dev@lucene.apache.org
Subject: Re: [DISCUSS] Change Query API to make queries immutable in 6.0

Adrien,

I missed the reason that boost is going to stay mutable. Is this to support 
query rewriting?

--Terry


On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir <rcm...@gmail.com> wrote:
Same with BooleanQuery. the go-to ctor should just take 'clauses'

On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless
<luc...@mikemccandless.com> wrote:
> +1
>
> For PhraseQuery we could also have a common-case ctor that just takes
> the terms (and assumes sequential positions)?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand <jpou...@gmail.com> wrote:
>> Recent changes that added automatic filter caching to IndexSearcher
>> uncovered some traps with our queries when it comes to using them as
>> cache keys. The problem comes from the fact that some of our main
>> queries are mutable, and modifying them while they are used as cache
>> keys makes the entry that they are caching invisible (because the hash
>> code changed too) yet still using memory.
>>
>> While I think most users would be unaffected as it is rather uncommon
>> to modify queries after having passed them to IndexSearcher, I would
>> like to remove this trap by making queries immutable: everything
>> should be set at construction time except the boost parameter that
>> could still be changed with the same clone()/setBoost() mechanism as
>> today.
>>
>> First I would like to make sure that it sounds good to everyone and
>> then to discuss what the API should look like. Most of our queries
>> happen to be immutable already (NumericRangeQuery, TermsQuery,
>> SpanNearQuery, etc.) but some aren't and the main exceptions are:
>>  - BooleanQuery,
>>  - DisjunctionMaxQuery,
>>  - PhraseQuery,
>>  - MultiPhraseQuery.
>>
>> We could take all parameters that are set as setters and move them to
>> constructor arguments. For the above queries, this would mean (using
>> varargs for ease of use):
>>
>>   BooleanQuery(boolean disableCoord, int minShouldMatch,
>>     BooleanClause... clauses)
>>   DisjunctionMaxQuery(float tieBreakMul, Query... clauses)
>>
>> For PhraseQuery and MultiPhraseQuery, the closest to what we have
>> today would require adding new classes to wrap terms and positions
>> together, for instance:
>>
>> class TermAndPosition {
>>   public final BytesRef term;
>>   public final int position;
>> }
>>
>> so that eg. PhraseQuery would look like:
>>
>>   PhraseQuery(int slop, String field, TermAndPosition... terms)
>>
>> MultiPhraseQuery would be the same with several terms at the same position.
>>
>> Comments/ideas/concerns are highly welcome.
>>
>> --
>> Adrien

*************************************************************************
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender immediately and 
then delete it.

TIAA-CREF
*************************************************************************

Reply via email to