Re: [DISCUSS] Change Query API to make queries immutable in 6.0
On Thu, Apr 2, 2015 at 6:23 PM, Adrien Grand wrote: > On Fri, Apr 3, 2015 at 12:10 AM, Reitzel, Charles > wrote: >> Unfortunately, since boost is used in hashCode() and equals() calculations, >> changing the boost will still make the queries "trappy". You will do all >> that work to make everything-but-boost immutable and still not fix the >> problem. > > This is why the query cache clones queries before putting them into > the cache and sets the boost to 1. The issue is that this clone method > is only shallow since its purpose is to change the boost. So it works > fine for the boost parameter but not for eg. boolean clauses. > > Again, I don't dismiss advantages of going fully immutable I will. I do not think it is worth it. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
On Fri, Apr 3, 2015 at 12:10 AM, Reitzel, Charles wrote: > Unfortunately, since boost is used in hashCode() and equals() calculations, > changing the boost will still make the queries "trappy". You will do all > that work to make everything-but-boost immutable and still not fix the > problem. This is why the query cache clones queries before putting them into the cache and sets the boost to 1. The issue is that this clone method is only shallow since its purpose is to change the boost. So it works fine for the boost parameter but not for eg. boolean clauses. Again, I don't dismiss advantages of going fully immutable, I'm just arguing that making all queries immutable up to the boost would already be a good improvement like Robert described. We can still discuss about going fully immutable in a different issue, it would benefit from the proposed change. -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [DISCUSS] Change Query API to make queries immutable in 6.0
Unfortunately, since boost is used in hashCode() and equals() calculations, changing the boost will still make the queries "trappy". You will do all that work to make everything-but-boost immutable and still not fix the problem. You can prove it to yourself like so (this test fails!): public void testMapOrphan() { Map map = new HashMap<>(); BooleanQuery booleanAB = new BooleanQuery(); booleanAB.add(new TermQuery(new Term("contents", "a")), BooleanClause.Occur.SHOULD); booleanAB.add(new TermQuery(new Term("contents", "b")), BooleanClause.Occur.SHOULD); map.put( booleanAB, 1 ); booleanAB.setBoost( 33.3f );// Set boost after map.put() assertTrue( map.containsKey(booleanAB) ); } Seems like the quick way to the coast is to write a failing test - before making changes. I realize this is easier said than done. Based on your testing that led you to start this discussion, can you narrow it down to a single Query class and/or IndexSearcher use case? Not there will be only one case. But, at least, it will be a starting point. Once the first failing test has been written, it should be relatively easy to write test variations to cover the remaining "mutuable" Query classes. With the scale of the changes you are proposing, "test first" seems like a reasonable approach. Another compromise approach might be to sub-class the mutable Query classes like so: class ImmutableBooleanQuery extends BooleanQuery { public void add(BooleanClause clause) { throw new UnsupportedOperationException( "ImmutableBooleanQuery.add(BooleanClause)" ); } public void setBoost( int boost ) { throw new UnsupportedOperationException( "ImmutableBooleanQuery.add(BooleanClause)" ); } // etc. public static ImmutableBooleanQuery cloneFrom(BooleanQuery original) { // Use field level access to by-pass mutator methods. } // Do NOT override rewrite(IndexReader)! } In theory, such a proxy class could be generated at runtime to force immutability: https://github.com/verhas/immutator Which could make a lot of sense in JUnit tests, if not production runtime. An immutable Query would be cloned from the original and place on the cache instead. Any attempt to modify the cache entry should fail quickly. To me, a less invasive approach seems like a faster and easier way to actually find and fix this bug. Once that is done, then it might make sense to perform the exhaustive updates to prevent a "relapse" in the future. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, April 02, 2015 9:46 AM To: dev@lucene.apache.org Subject: Re: [DISCUSS] Change Query API to make queries immutable in 6.0 Boosts might not make sense to become immutable, it might make the code too complex. Who is to say until the other stuff is fixed first. The downsides might outweight the upsides. So yeah, if you want to say "if anyone disagrees with what the future might look like i'm gonna -1 your progress", then i will bite right now. Fixing the rest of Query to be immutable, so filter caching isn't trappy, we should really do that. And we have been doing it already. I remember Uwe suggested this approach when adding automaton and related queries a long time ago. It made things simpler and avoided bugs, we ultimately made as much of it immutable as we could. Queries have to be well-behaved, they need a good hashcode/equals, thread safety, good error checking, etc. It is easier to do this when things are immutable. Someone today can make a patch for FooQuery that nukes setBar and moves it to a ctor parameter named 'bar' and chances are a lot of the time, it probably fixes bugs in FooQuery somehow. Thats just what it is. Boosts are the 'long tail'. they are simple primitive floating point values, so susceptible to less problems. The base class incorporates boosts into equals/hashcode already, which prevents the most common bugs with them. They are trickier because internal things like rewrite() might "shuffle them around" in conjunction with clone(), to do optimizations. They are also only relevant when scores are needed: so we can prevent nasty filter caching bugs as a step, by making everything else immutable. On Thu, Apr 2, 2015 at 9:27 AM, david.w.smi...@gmail.com wrote: > On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand wrote: >> >> first make queries immutable up to the boost and then discuss >> if/how/when we should go fully immutable with a new API to change >> boosts? > > > The “if” part concerns me; I don’t mind it being a separate issue to > make the changes more manageable (progress not perfection, and all > that). I’m all for the whole shebang. But if others think “no” > then…. will it have been
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
On Thu, Apr 2, 2015 at 9:45 AM, Robert Muir wrote: > They are also only relevant when scores are needed: > so we can prevent nasty filter caching bugs as a step, by making > everything else immutable. > That’s a good point. +1 to your progress Adrien! ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
Boosts might not make sense to become immutable, it might make the code too complex. Who is to say until the other stuff is fixed first. The downsides might outweight the upsides. So yeah, if you want to say "if anyone disagrees with what the future might look like i'm gonna -1 your progress", then i will bite right now. Fixing the rest of Query to be immutable, so filter caching isn't trappy, we should really do that. And we have been doing it already. I remember Uwe suggested this approach when adding automaton and related queries a long time ago. It made things simpler and avoided bugs, we ultimately made as much of it immutable as we could. Queries have to be well-behaved, they need a good hashcode/equals, thread safety, good error checking, etc. It is easier to do this when things are immutable. Someone today can make a patch for FooQuery that nukes setBar and moves it to a ctor parameter named 'bar' and chances are a lot of the time, it probably fixes bugs in FooQuery somehow. Thats just what it is. Boosts are the 'long tail'. they are simple primitive floating point values, so susceptible to less problems. The base class incorporates boosts into equals/hashcode already, which prevents the most common bugs with them. They are trickier because internal things like rewrite() might "shuffle them around" in conjunction with clone(), to do optimizations. They are also only relevant when scores are needed: so we can prevent nasty filter caching bugs as a step, by making everything else immutable. On Thu, Apr 2, 2015 at 9:27 AM, david.w.smi...@gmail.com wrote: > On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand wrote: >> >> first make queries immutable up to the boost and then discuss >> if/how/when we should go fully immutable with a new API to change >> boosts? > > > The “if” part concerns me; I don’t mind it being a separate issue to make > the changes more manageable (progress not perfection, and all that). I’m > all for the whole shebang. But if others think “no” then…. will it have > been worthwhile to do this big change and not go all the way? I think not. > Does anyone feel the answer is “no” to make boosts immutable? And if so why? > > If nobody comes up with a dissenting opinion to make boosts immutable within > a couple days then count me as “+1” to your plans, else “-1” pending that > discussion. > > ~ David Smiley > Freelance Apache Lucene/Solr Search Consultant/Developer > http://www.linkedin.com/in/davidwsmiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
If we were designing things from scratch again, would boost really be on Query, or would it be on BooleanClause? Just throwing that out there... although it may make it easier to implement immutable queries (and perhaps make more sense too), it may also be too big of a change to be worth while. -Yonik On Thu, Apr 2, 2015 at 9:27 AM, david.w.smi...@gmail.com wrote: > On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand wrote: >> >> first make queries immutable up to the boost and then discuss >> if/how/when we should go fully immutable with a new API to change >> boosts? > > > The “if” part concerns me; I don’t mind it being a separate issue to make > the changes more manageable (progress not perfection, and all that). I’m > all for the whole shebang. But if others think “no” then…. will it have > been worthwhile to do this big change and not go all the way? I think not. > Does anyone feel the answer is “no” to make boosts immutable? And if so why? > > If nobody comes up with a dissenting opinion to make boosts immutable within > a couple days then count me as “+1” to your plans, else “-1” pending that > discussion. > > ~ David Smiley > Freelance Apache Lucene/Solr Search Consultant/Developer > http://www.linkedin.com/in/davidwsmiley - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand wrote: > first make queries immutable up to the boost and then discuss > if/how/when we should go fully immutable with a new API to change > boosts? > The “if” part concerns me; I don’t mind it being a separate issue to make the changes more manageable (progress not perfection, and all that). I’m all for the whole shebang. But if others think “no” then…. will it have been worthwhile to do this big change and not go all the way? I think not. Does anyone feel the answer is “no” to make boosts immutable? And if so why? If nobody comes up with a dissenting opinion to make boosts immutable within a couple days then count me as “+1” to your plans, else “-1” pending that discussion. ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand wrote: > Would would be > ok to first make queries immutable up to the boost and then discuss > if/how/when we should go fully immutable with a new API to change > boosts? +1 ... progress not perfection. Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
I did not close this door, I agree this is something that should be considered and tried to list the pros/cons that I could think about. However I would like it to be dealt with in a different issue as it will already be a big change to change those 4 queries. Would would be ok to first make queries immutable up to the boost and then discuss if/how/when we should go fully immutable with a new API to change boosts? On Wed, Apr 1, 2015 at 9:25 PM, david.w.smi...@gmail.com wrote: > I’m +1 to going all the way (fully immutable) but the proposal stops short > by skipping the boost. I agree with Terry’s comments — what a shame to make > Queries “more immutable” but not really quite immutable. It kinda misses > the point? Otherwise why bother? If this is about progress not perfection, > then okay, but if we don’t ultimately go all the way then there isn’t the > benefit we’re after and we’ve both changed the API and made it a bit more > awkward to use. I like the idea of a method like cloneWithBoost() or > some-such. A no-arg clone() could be final and call that one with the > current boost. > > While we’re at it, BooleanQuery & other variable aggregates could cache the > hashCode at construction. > > ~ David Smiley > Freelance Apache Lucene/Solr Search Consultant/Developer > http://www.linkedin.com/in/davidwsmiley > > On Tue, Mar 31, 2015 at 11:06 AM, Adrien Grand wrote: >> >> On Tue, Mar 31, 2015 at 4:32 PM, Terry Smith wrote: >> > Thanks for the explanation. It seems a pity to make queries just nearly >> > immutable. Do you have any interest in adding a boost parameter to >> > clone() >> > so they really could be immutable? >> >> We could have a single method, but if we do it I would rather do it in >> a different change since it would affect all queries as opposed to >> only a handful of them. >> >> Also there is some benefit in having clone() and setBoost() in that >> cloning and setters are things that are familiar to everyone. If we >> replace them with a new method, we would need to specify its >> semantics. (Not a blocker, just wanted to mention what the pros/cons >> are in my opinion.) >> >> -- >> Adrien >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
I’m +1 to going all the way (fully immutable) but the proposal stops short by skipping the boost. I agree with Terry’s comments — what a shame to make Queries “more immutable” but not really quite immutable. It kinda misses the point? Otherwise why bother? If this is about progress not perfection, then okay, but if we don’t ultimately go all the way then there isn’t the benefit we’re after and we’ve both changed the API and made it a bit more awkward to use. I like the idea of a method like cloneWithBoost() or some-such. A no-arg clone() could be final and call that one with the current boost. While we’re at it, BooleanQuery & other variable aggregates could cache the hashCode at construction. ~ David Smiley Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley On Tue, Mar 31, 2015 at 11:06 AM, Adrien Grand wrote: > On Tue, Mar 31, 2015 at 4:32 PM, Terry Smith wrote: > > Thanks for the explanation. It seems a pity to make queries just nearly > > immutable. Do you have any interest in adding a boost parameter to > clone() > > so they really could be immutable? > > We could have a single method, but if we do it I would rather do it in > a different change since it would affect all queries as opposed to > only a handful of them. > > Also there is some benefit in having clone() and setBoost() in that > cloning and setters are things that are familiar to everyone. If we > replace them with a new method, we would need to specify its > semantics. (Not a blocker, just wanted to mention what the pros/cons > are in my opinion.) > > -- > Adrien > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
On Tue, Mar 31, 2015 at 4:32 PM, Terry Smith wrote: > Thanks for the explanation. It seems a pity to make queries just nearly > immutable. Do you have any interest in adding a boost parameter to clone() > so they really could be immutable? We could have a single method, but if we do it I would rather do it in a different change since it would affect all queries as opposed to only a handful of them. Also there is some benefit in having clone() and setBoost() in that cloning and setters are things that are familiar to everyone. If we replace them with a new method, we would need to specify its semantics. (Not a blocker, just wanted to mention what the pros/cons are in my opinion.) -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
Hi Charles, On Tue, Mar 31, 2015 at 4:12 PM, Reitzel, Charles wrote: > Am I missing something? Across the project, I’m seeing over 1,000 > references to BooleanQuery.add(). Already, this seems like a pretty major > refactoring. And I haven’t checked the other types of queries: > DisjunctionMax, Phrase, and MultiPhrase. At that scale, bugs will be > introduced. > > I’m not disagreeing with the concept. At all. It’s part of the Collections > contract that anything used in hashCode() and equals() be kept immutable. > Just wondering if the cost is worth the principle this time? The majority of call sites are in test folders. This does not make the change easier but it decreases the chances to introduce an actual bug. Also, the queries that we need to modify are those that are best tested so I'm quite confident that this change will not be a bug nest. However I totally agree that the change it huge, this is why I asked for opinions on the list before doing it actually doing it. -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
Adrien, Thanks for the explanation. It seems a pity to make queries just nearly immutable. Do you have any interest in adding a boost parameter to clone() so they really could be immutable? --Terry On Tue, Mar 31, 2015 at 9:44 AM, Adrien Grand wrote: > Hi Terry, > > Indeed this is for query rewriting. For instance if you have a boolean > query with a boost of 5 that wraps a single MUST clause with a term > query, then we rewrite to this to the inner term query and update its > boost using clone() and setBoost() in order to not modify in-place a > user-modified query. > > On Tue, Mar 31, 2015 at 3:37 PM, Terry Smith wrote: > > Adrien, > > > > I missed the reason that boost is going to stay mutable. Is this to > support > > query rewriting? > > > > --Terry > > > > > > On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir wrote: > >> > >> Same with BooleanQuery. the go-to ctor should just take 'clauses' > >> > >> On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless > >> wrote: > >> > +1 > >> > > >> > For PhraseQuery we could also have a common-case ctor that just takes > >> > the terms (and assumes sequential positions)? > >> > > >> > Mike McCandless > >> > > >> > http://blog.mikemccandless.com > >> > > >> > > >> > On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand > wrote: > >> >> Recent changes that added automatic filter caching to IndexSearcher > >> >> uncovered some traps with our queries when it comes to using them as > >> >> cache keys. The problem comes from the fact that some of our main > >> >> queries are mutable, and modifying them while they are used as cache > >> >> keys makes the entry that they are caching invisible (because the > hash > >> >> code changed too) yet still using memory. > >> >> > >> >> While I think most users would be unaffected as it is rather uncommon > >> >> to modify queries after having passed them to IndexSearcher, I would > >> >> like to remove this trap by making queries immutable: everything > >> >> should be set at construction time except the boost parameter that > >> >> could still be changed with the same clone()/setBoost() mechanism as > >> >> today. > >> >> > >> >> First I would like to make sure that it sounds good to everyone and > >> >> then to discuss what the API should look like. Most of our queries > >> >> happen to be immutable already (NumericRangeQuery, TermsQuery, > >> >> SpanNearQuery, etc.) but some aren't and the main exceptions are: > >> >> - BooleanQuery, > >> >> - DisjunctionMaxQuery, > >> >> - PhraseQuery, > >> >> - MultiPhraseQuery. > >> >> > >> >> We could take all parameters that are set as setters and move them to > >> >> constructor arguments. For the above queries, this would mean (using > >> >> varargs for ease of use): > >> >> > >> >> BooleanQuery(boolean disableCoord, int minShouldMatch, > >> >> BooleanClause... clauses) > >> >> DisjunctionMaxQuery(float tieBreakMul, Query... clauses) > >> >> > >> >> For PhraseQuery and MultiPhraseQuery, the closest to what we have > >> >> today would require adding new classes to wrap terms and positions > >> >> together, for instance: > >> >> > >> >> class TermAndPosition { > >> >> public final BytesRef term; > >> >> public final int position; > >> >> } > >> >> > >> >> so that eg. PhraseQuery would look like: > >> >> > >> >> PhraseQuery(int slop, String field, TermAndPosition... terms) > >> >> > >> >> MultiPhraseQuery would be the same with several terms at the same > >> >> position. > >> >> > >> >> Comments/ideas/concerns are highly welcome. > >> >> > >> >> -- > >> >> Adrien > >> >> > >> >> - > >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > >> >> For additional commands, e-mail: dev-h...@lucene.apache.org > >> >> > >> > > >> > - > >> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > >> > For additional commands, e-mail: dev-h...@lucene.apache.org > >> > > >> > >> - > >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > >> For additional commands, e-mail: dev-h...@lucene.apache.org > >> > > > > > > -- > Adrien > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
RE: [DISCUSS] Change Query API to make queries immutable in 6.0
Am I missing something? Across the project, I’m seeing over 1,000 references to BooleanQuery.add(). Already, this seems like a pretty major refactoring. And I haven’t checked the other types of queries: DisjunctionMax, Phrase, and MultiPhrase. At that scale, bugs will be introduced. I’m not disagreeing with the concept. At all. It’s part of the Collections contract that anything used in hashCode() and equals() be kept immutable. Just wondering if the cost is worth the principle this time? In the spirit of discussion, an alternate approach might be to: a. Locate the places in the code where a query is taken from the cache and modified after the fact. b. Remove the query object before modifying and placing back on the cache. Easier said than done, I realize. Note, changing the constructors and removing modifiers would force all of these changes anyway. It's just that they would be lost in a forest of other minor modifications.So, even if folks are ok with the larger scale changes, it might make sense to start with the problematic places first and then move on to the bulk of "syntax changes". Please ignore this if I am missing something here. From: Terry Smith [mailto:sheb...@gmail.com] Sent: Tuesday, March 31, 2015 9:38 AM To: dev@lucene.apache.org Subject: Re: [DISCUSS] Change Query API to make queries immutable in 6.0 Adrien, I missed the reason that boost is going to stay mutable. Is this to support query rewriting? --Terry On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir wrote: Same with BooleanQuery. the go-to ctor should just take 'clauses' On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless wrote: > +1 > > For PhraseQuery we could also have a common-case ctor that just takes > the terms (and assumes sequential positions)? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand wrote: >> Recent changes that added automatic filter caching to IndexSearcher >> uncovered some traps with our queries when it comes to using them as >> cache keys. The problem comes from the fact that some of our main >> queries are mutable, and modifying them while they are used as cache >> keys makes the entry that they are caching invisible (because the hash >> code changed too) yet still using memory. >> >> While I think most users would be unaffected as it is rather uncommon >> to modify queries after having passed them to IndexSearcher, I would >> like to remove this trap by making queries immutable: everything >> should be set at construction time except the boost parameter that >> could still be changed with the same clone()/setBoost() mechanism as >> today. >> >> First I would like to make sure that it sounds good to everyone and >> then to discuss what the API should look like. Most of our queries >> happen to be immutable already (NumericRangeQuery, TermsQuery, >> SpanNearQuery, etc.) but some aren't and the main exceptions are: >> - BooleanQuery, >> - DisjunctionMaxQuery, >> - PhraseQuery, >> - MultiPhraseQuery. >> >> We could take all parameters that are set as setters and move them to >> constructor arguments. For the above queries, this would mean (using >> varargs for ease of use): >> >> BooleanQuery(boolean disableCoord, int minShouldMatch, >> BooleanClause... clauses) >> DisjunctionMaxQuery(float tieBreakMul, Query... clauses) >> >> For PhraseQuery and MultiPhraseQuery, the closest to what we have >> today would require adding new classes to wrap terms and positions >> together, for instance: >> >> class TermAndPosition { >> public final BytesRef term; >> public final int position; >> } >> >> so that eg. PhraseQuery would look like: >> >> PhraseQuery(int slop, String field, TermAndPosition... terms) >> >> MultiPhraseQuery would be the same with several terms at the same position. >> >> Comments/ideas/concerns are highly welcome. >> >> -- >> Adrien * This e-mail may contain confidential or privileged information. If you are not the intended recipient, please notify the sender immediately and then delete it. TIAA-CREF *
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
Hi Terry, Indeed this is for query rewriting. For instance if you have a boolean query with a boost of 5 that wraps a single MUST clause with a term query, then we rewrite to this to the inner term query and update its boost using clone() and setBoost() in order to not modify in-place a user-modified query. On Tue, Mar 31, 2015 at 3:37 PM, Terry Smith wrote: > Adrien, > > I missed the reason that boost is going to stay mutable. Is this to support > query rewriting? > > --Terry > > > On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir wrote: >> >> Same with BooleanQuery. the go-to ctor should just take 'clauses' >> >> On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless >> wrote: >> > +1 >> > >> > For PhraseQuery we could also have a common-case ctor that just takes >> > the terms (and assumes sequential positions)? >> > >> > Mike McCandless >> > >> > http://blog.mikemccandless.com >> > >> > >> > On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand wrote: >> >> Recent changes that added automatic filter caching to IndexSearcher >> >> uncovered some traps with our queries when it comes to using them as >> >> cache keys. The problem comes from the fact that some of our main >> >> queries are mutable, and modifying them while they are used as cache >> >> keys makes the entry that they are caching invisible (because the hash >> >> code changed too) yet still using memory. >> >> >> >> While I think most users would be unaffected as it is rather uncommon >> >> to modify queries after having passed them to IndexSearcher, I would >> >> like to remove this trap by making queries immutable: everything >> >> should be set at construction time except the boost parameter that >> >> could still be changed with the same clone()/setBoost() mechanism as >> >> today. >> >> >> >> First I would like to make sure that it sounds good to everyone and >> >> then to discuss what the API should look like. Most of our queries >> >> happen to be immutable already (NumericRangeQuery, TermsQuery, >> >> SpanNearQuery, etc.) but some aren't and the main exceptions are: >> >> - BooleanQuery, >> >> - DisjunctionMaxQuery, >> >> - PhraseQuery, >> >> - MultiPhraseQuery. >> >> >> >> We could take all parameters that are set as setters and move them to >> >> constructor arguments. For the above queries, this would mean (using >> >> varargs for ease of use): >> >> >> >> BooleanQuery(boolean disableCoord, int minShouldMatch, >> >> BooleanClause... clauses) >> >> DisjunctionMaxQuery(float tieBreakMul, Query... clauses) >> >> >> >> For PhraseQuery and MultiPhraseQuery, the closest to what we have >> >> today would require adding new classes to wrap terms and positions >> >> together, for instance: >> >> >> >> class TermAndPosition { >> >> public final BytesRef term; >> >> public final int position; >> >> } >> >> >> >> so that eg. PhraseQuery would look like: >> >> >> >> PhraseQuery(int slop, String field, TermAndPosition... terms) >> >> >> >> MultiPhraseQuery would be the same with several terms at the same >> >> position. >> >> >> >> Comments/ideas/concerns are highly welcome. >> >> >> >> -- >> >> Adrien >> >> >> >> - >> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >> >> > >> > - >> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> > For additional commands, e-mail: dev-h...@lucene.apache.org >> > >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
Adrien, I missed the reason that boost is going to stay mutable. Is this to support query rewriting? --Terry On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir wrote: > Same with BooleanQuery. the go-to ctor should just take 'clauses' > > On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless > wrote: > > +1 > > > > For PhraseQuery we could also have a common-case ctor that just takes > > the terms (and assumes sequential positions)? > > > > Mike McCandless > > > > http://blog.mikemccandless.com > > > > > > On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand wrote: > >> Recent changes that added automatic filter caching to IndexSearcher > >> uncovered some traps with our queries when it comes to using them as > >> cache keys. The problem comes from the fact that some of our main > >> queries are mutable, and modifying them while they are used as cache > >> keys makes the entry that they are caching invisible (because the hash > >> code changed too) yet still using memory. > >> > >> While I think most users would be unaffected as it is rather uncommon > >> to modify queries after having passed them to IndexSearcher, I would > >> like to remove this trap by making queries immutable: everything > >> should be set at construction time except the boost parameter that > >> could still be changed with the same clone()/setBoost() mechanism as > >> today. > >> > >> First I would like to make sure that it sounds good to everyone and > >> then to discuss what the API should look like. Most of our queries > >> happen to be immutable already (NumericRangeQuery, TermsQuery, > >> SpanNearQuery, etc.) but some aren't and the main exceptions are: > >> - BooleanQuery, > >> - DisjunctionMaxQuery, > >> - PhraseQuery, > >> - MultiPhraseQuery. > >> > >> We could take all parameters that are set as setters and move them to > >> constructor arguments. For the above queries, this would mean (using > >> varargs for ease of use): > >> > >> BooleanQuery(boolean disableCoord, int minShouldMatch, > >> BooleanClause... clauses) > >> DisjunctionMaxQuery(float tieBreakMul, Query... clauses) > >> > >> For PhraseQuery and MultiPhraseQuery, the closest to what we have > >> today would require adding new classes to wrap terms and positions > >> together, for instance: > >> > >> class TermAndPosition { > >> public final BytesRef term; > >> public final int position; > >> } > >> > >> so that eg. PhraseQuery would look like: > >> > >> PhraseQuery(int slop, String field, TermAndPosition... terms) > >> > >> MultiPhraseQuery would be the same with several terms at the same > position. > >> > >> Comments/ideas/concerns are highly welcome. > >> > >> -- > >> Adrien > >> > >> - > >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > >> For additional commands, e-mail: dev-h...@lucene.apache.org > >> > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > > For additional commands, e-mail: dev-h...@lucene.apache.org > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
+1. Makes a lot of sense in general I think. Keeps them nicely thread safe as well, and as they are commonly used for keys in caches and such, that's a nice intrinsic property. - Mark On Tue, Mar 31, 2015 at 7:24 AM Robert Muir wrote: > Same with BooleanQuery. the go-to ctor should just take 'clauses' > > On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless > wrote: > > +1 > > > > For PhraseQuery we could also have a common-case ctor that just takes > > the terms (and assumes sequential positions)? > > > > Mike McCandless > > > > http://blog.mikemccandless.com > > > > > > On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand wrote: > >> Recent changes that added automatic filter caching to IndexSearcher > >> uncovered some traps with our queries when it comes to using them as > >> cache keys. The problem comes from the fact that some of our main > >> queries are mutable, and modifying them while they are used as cache > >> keys makes the entry that they are caching invisible (because the hash > >> code changed too) yet still using memory. > >> > >> While I think most users would be unaffected as it is rather uncommon > >> to modify queries after having passed them to IndexSearcher, I would > >> like to remove this trap by making queries immutable: everything > >> should be set at construction time except the boost parameter that > >> could still be changed with the same clone()/setBoost() mechanism as > >> today. > >> > >> First I would like to make sure that it sounds good to everyone and > >> then to discuss what the API should look like. Most of our queries > >> happen to be immutable already (NumericRangeQuery, TermsQuery, > >> SpanNearQuery, etc.) but some aren't and the main exceptions are: > >> - BooleanQuery, > >> - DisjunctionMaxQuery, > >> - PhraseQuery, > >> - MultiPhraseQuery. > >> > >> We could take all parameters that are set as setters and move them to > >> constructor arguments. For the above queries, this would mean (using > >> varargs for ease of use): > >> > >> BooleanQuery(boolean disableCoord, int minShouldMatch, > >> BooleanClause... clauses) > >> DisjunctionMaxQuery(float tieBreakMul, Query... clauses) > >> > >> For PhraseQuery and MultiPhraseQuery, the closest to what we have > >> today would require adding new classes to wrap terms and positions > >> together, for instance: > >> > >> class TermAndPosition { > >> public final BytesRef term; > >> public final int position; > >> } > >> > >> so that eg. PhraseQuery would look like: > >> > >> PhraseQuery(int slop, String field, TermAndPosition... terms) > >> > >> MultiPhraseQuery would be the same with several terms at the same > position. > >> > >> Comments/ideas/concerns are highly welcome. > >> > >> -- > >> Adrien > >> > >> - > >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > >> For additional commands, e-mail: dev-h...@lucene.apache.org > >> > > > > - > > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > > For additional commands, e-mail: dev-h...@lucene.apache.org > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > >
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
Same with BooleanQuery. the go-to ctor should just take 'clauses' On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless wrote: > +1 > > For PhraseQuery we could also have a common-case ctor that just takes > the terms (and assumes sequential positions)? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand wrote: >> Recent changes that added automatic filter caching to IndexSearcher >> uncovered some traps with our queries when it comes to using them as >> cache keys. The problem comes from the fact that some of our main >> queries are mutable, and modifying them while they are used as cache >> keys makes the entry that they are caching invisible (because the hash >> code changed too) yet still using memory. >> >> While I think most users would be unaffected as it is rather uncommon >> to modify queries after having passed them to IndexSearcher, I would >> like to remove this trap by making queries immutable: everything >> should be set at construction time except the boost parameter that >> could still be changed with the same clone()/setBoost() mechanism as >> today. >> >> First I would like to make sure that it sounds good to everyone and >> then to discuss what the API should look like. Most of our queries >> happen to be immutable already (NumericRangeQuery, TermsQuery, >> SpanNearQuery, etc.) but some aren't and the main exceptions are: >> - BooleanQuery, >> - DisjunctionMaxQuery, >> - PhraseQuery, >> - MultiPhraseQuery. >> >> We could take all parameters that are set as setters and move them to >> constructor arguments. For the above queries, this would mean (using >> varargs for ease of use): >> >> BooleanQuery(boolean disableCoord, int minShouldMatch, >> BooleanClause... clauses) >> DisjunctionMaxQuery(float tieBreakMul, Query... clauses) >> >> For PhraseQuery and MultiPhraseQuery, the closest to what we have >> today would require adding new classes to wrap terms and positions >> together, for instance: >> >> class TermAndPosition { >> public final BytesRef term; >> public final int position; >> } >> >> so that eg. PhraseQuery would look like: >> >> PhraseQuery(int slop, String field, TermAndPosition... terms) >> >> MultiPhraseQuery would be the same with several terms at the same position. >> >> Comments/ideas/concerns are highly welcome. >> >> -- >> Adrien >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [DISCUSS] Change Query API to make queries immutable in 6.0
+1 For PhraseQuery we could also have a common-case ctor that just takes the terms (and assumes sequential positions)? Mike McCandless http://blog.mikemccandless.com On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand wrote: > Recent changes that added automatic filter caching to IndexSearcher > uncovered some traps with our queries when it comes to using them as > cache keys. The problem comes from the fact that some of our main > queries are mutable, and modifying them while they are used as cache > keys makes the entry that they are caching invisible (because the hash > code changed too) yet still using memory. > > While I think most users would be unaffected as it is rather uncommon > to modify queries after having passed them to IndexSearcher, I would > like to remove this trap by making queries immutable: everything > should be set at construction time except the boost parameter that > could still be changed with the same clone()/setBoost() mechanism as > today. > > First I would like to make sure that it sounds good to everyone and > then to discuss what the API should look like. Most of our queries > happen to be immutable already (NumericRangeQuery, TermsQuery, > SpanNearQuery, etc.) but some aren't and the main exceptions are: > - BooleanQuery, > - DisjunctionMaxQuery, > - PhraseQuery, > - MultiPhraseQuery. > > We could take all parameters that are set as setters and move them to > constructor arguments. For the above queries, this would mean (using > varargs for ease of use): > > BooleanQuery(boolean disableCoord, int minShouldMatch, > BooleanClause... clauses) > DisjunctionMaxQuery(float tieBreakMul, Query... clauses) > > For PhraseQuery and MultiPhraseQuery, the closest to what we have > today would require adding new classes to wrap terms and positions > together, for instance: > > class TermAndPosition { > public final BytesRef term; > public final int position; > } > > so that eg. PhraseQuery would look like: > > PhraseQuery(int slop, String field, TermAndPosition... terms) > > MultiPhraseQuery would be the same with several terms at the same position. > > Comments/ideas/concerns are highly welcome. > > -- > Adrien > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[DISCUSS] Change Query API to make queries immutable in 6.0
Recent changes that added automatic filter caching to IndexSearcher uncovered some traps with our queries when it comes to using them as cache keys. The problem comes from the fact that some of our main queries are mutable, and modifying them while they are used as cache keys makes the entry that they are caching invisible (because the hash code changed too) yet still using memory. While I think most users would be unaffected as it is rather uncommon to modify queries after having passed them to IndexSearcher, I would like to remove this trap by making queries immutable: everything should be set at construction time except the boost parameter that could still be changed with the same clone()/setBoost() mechanism as today. First I would like to make sure that it sounds good to everyone and then to discuss what the API should look like. Most of our queries happen to be immutable already (NumericRangeQuery, TermsQuery, SpanNearQuery, etc.) but some aren't and the main exceptions are: - BooleanQuery, - DisjunctionMaxQuery, - PhraseQuery, - MultiPhraseQuery. We could take all parameters that are set as setters and move them to constructor arguments. For the above queries, this would mean (using varargs for ease of use): BooleanQuery(boolean disableCoord, int minShouldMatch, BooleanClause... clauses) DisjunctionMaxQuery(float tieBreakMul, Query... clauses) For PhraseQuery and MultiPhraseQuery, the closest to what we have today would require adding new classes to wrap terms and positions together, for instance: class TermAndPosition { public final BytesRef term; public final int position; } so that eg. PhraseQuery would look like: PhraseQuery(int slop, String field, TermAndPosition... terms) MultiPhraseQuery would be the same with several terms at the same position. Comments/ideas/concerns are highly welcome. -- Adrien - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org