RE: DisjunctionMaxQuery and scoring
Hi,

> I think BooleanQuery bq = new BooleanQuery(false); doesn't quite accomplish the desired name IN (dick, rich) scoring behavior. This is because (name:dick | name:rich) with coord=false would score the 'document' Dick Rich higher than Rich because the former has two term matches and the latter only one. In contrast, I think the desire is that one and only one of the terms in the document match those in the BooleanQuery so that Rich would score higher than Dick Rich, given document length normalization. It's almost like a desire for BooleanQuery bq = new BooleanQuery(false); bq.set*Maximum*NumberShouldMatch(1);

In that case DisjunctionMaxQuery is the way to go: it only counts the hit with the highest score and does not add the scores together, so whether coord is enabled or not doesn't matter here.

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: DisjunctionMaxQuery and scoring
On Thu, Apr 19, 2012 at 8:32 PM, David Murgatroyd dmu...@gmail.com wrote:

> In contrast, I think the desire is that one and only one of the terms in the document match those in the BooleanQuery so that Rich would score higher than Dick Rich, given document length normalization. It's almost like a desire for BooleanQuery bq = new BooleanQuery(false); bq.set*Maximum*NumberShouldMatch(1);

You can, by returning a customized weight with a coord impl that PUNISHES documents that match more than one sub. Take a look at http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/queries/src/java/org/apache/lucene/queries/BoostingQuery.java for some inspiration, especially this part:

    BooleanQuery result = new BooleanQuery() {
      @Override
      public Weight createWeight(IndexSearcher searcher) throws IOException {
        return new BooleanWeight(searcher, false) {
          @Override
          public float coord(int overlap, int max) {
            // your logic here when overlap == 1, 1, etc
          }
        };
      }
    };

-- lucidimagination.com
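The idea of a coord function that punishes multiple matches can be tried out without any Lucene classes. The following is only a toy model in plain Java: the class name, the PENALTY value and the sub-scores are made up for illustration, and the real hook is the coord(int overlap, int max) override quoted above.

```java
// Toy model only: not Lucene code. PENALTY and all scores are invented.
class PunishingCoordSketch {
    // Hypothetical factor applied when more than one sub-clause matches.
    static final float PENALTY = 0.5f;

    // Same shape as coord(int overlap, int max): overlap is the number of
    // matching sub-clauses, maxOverlap the number that could have matched.
    static float coord(int overlap, int maxOverlap) {
        if (overlap <= 0) {
            return 0.0f;      // nothing matched
        }
        if (overlap == 1) {
            return 1.0f;      // exactly one match: leave the score alone
        }
        return PENALTY;       // more than one match: punish
    }

    // A boolean-style score: the sum of the matching sub-scores times coord.
    static float score(float sumOfSubScores, int overlap, int maxOverlap) {
        return sumOfSubScores * coord(overlap, maxOverlap);
    }

    public static void main(String[] args) {
        // "Rich": one matching term with an invented score of 0.8.
        System.out.println("Rich      = " + score(0.8f, 1, 2));
        // "Dick Rich": two matching terms with invented scores summing to 1.0.
        System.out.println("Dick Rich = " + score(1.0f, 2, 2));
    }
}
```

With these invented numbers the single-match document ends up ahead of the double-match one, which is the ordering David asked for.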
RE: DisjunctionMaxQuery and scoring
Hi,

Ah sorry, I misunderstood, you wanted to score the duplicate match lower! To achieve this, you have to change the coord function in the similarity/BooleanWeight used for this query.

Either way: if you want a group of terms that gets only one score if at least one of the terms matches (SQL IN), without adding the scores together at all, DisjunctionMaxQuery is fine. I think this is what Benson asked for.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
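To make the difference concrete, here is a minimal sketch in plain Java (no Lucene classes) of the two combination rules discussed in this thread: a coord-disabled BooleanQuery of SHOULD clauses adds up the scores of the matching clauses, while DisjunctionMaxQuery keeps only the best one. The class name, method names and per-term scores are made up for illustration. Real scores depend on the Similarity in use, and the tie-breaker multiplier that DisjunctionMaxQuery also accepts is assumed to be 0 here.

```java
// Toy model only: not Lucene code. All sub-scores are invented numbers.
class DisjunctionSketch {
    // BooleanQuery-like combination with coord disabled:
    // the scores of all matching clauses are added.
    static double sumScore(double... matchingClauseScores) {
        double total = 0.0;
        for (double s : matchingClauseScores) {
            total += s;
        }
        return total;
    }

    // DisjunctionMaxQuery-like combination (tie-breaker assumed to be 0):
    // only the highest-scoring matching clause counts.
    static double maxScore(double... matchingClauseScores) {
        double best = 0.0;
        for (double s : matchingClauseScores) {
            best = Math.max(best, s);
        }
        return best;
    }

    public static void main(String[] args) {
        // "Rich": one matching term in a short field (invented score 0.8).
        // "Dick Rich": two matching terms, each scored lower (invented 0.5)
        // because length normalization penalizes the longer field.
        System.out.println("sum: Rich=" + sumScore(0.8) + " DickRich=" + sumScore(0.5, 0.5));
        System.out.println("max: Rich=" + maxScore(0.8) + " DickRich=" + maxScore(0.5, 0.5));
    }
}
```

Under the sum rule the two-match document wins; under the max rule the shorter single-match document wins, given these assumed scores.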
Re: DisjunctionMaxQuery and scoring
Uwe and Robert,

Thanks. David and I are two peas in one pod here at Basis.

--benson
DisjunctionMaxQuery and scoring
I am trying to solve a problem using DisjunctionMaxQuery.

Consider a query like:

a:b OR c:d OR e:f OR ... name:richard OR name:dick OR name:dickie OR name:rich ...

At most, one of the richard names matches. So the match score gets dragged down by the long list of things that don't match, as the list can get quite long.

It seemed to me, upon reading the documentation, that I could cure this problem by creating a query tree that used DisjunctionMaxQuery around all those nicknames. However, when I built a boolean query that had, as a clause, a DisjunctionMaxQuery in the place of a pile of these individual Term queries, the score and the explanation did not change at all -- in particular, the coord term shows the same number of total terms. So it looks as if the children of the disjunction still count.

Is there a way to control that term? Or a better way to express this? Thinking SQL for a moment, what I'm trying to express is

name IN (richard, dick, dickie, rich)

as a single term query. Reading the javadoc, I am seeing MultiTermQuery, and I'm that it is what we want.
Re: DisjunctionMaxQuery and scoring
On Thu, Apr 19, 2012 at 1:26 PM, Benson Margulies bimargul...@gmail.com wrote:

> Is there a way to control that term? Or a better way to express this? Thinking SQL for a moment, what I'm trying to express is
>
> name IN (richard, dick, dickie, rich)

I think you just want to disable coord() here? You can do this for that particular boolean query by passing true to the ctor:

public BooleanQuery(boolean disableCoord)

-- lucidimagination.com
Re: DisjunctionMaxQuery and scoring
On Thu, Apr 19, 2012 at 1:34 PM, Robert Muir rcm...@gmail.com wrote:

> I think you just want to disable coord() here? You can do this for that particular boolean query by passing true to the ctor:
>
> public BooleanQuery(boolean disableCoord)

Rob,

How do nested queries work with respect to this? If I build a boolean query one of whose clauses is a BooleanQuery with coord turned off, does just the nested query insides get left out of 'coord'?

If so, then your answer certainly seems to be what the doctor ordered.

--benson
Re: DisjunctionMaxQuery and scoring
Turning on disableCoord for a nested boolean query does not seem to change the overall maxCoord term as displayed in explain.
Re: DisjunctionMaxQuery and scoring
On Thu, Apr 19, 2012 at 3:49 PM, Benson Margulies bimargul...@gmail.com wrote:

> How do nested queries work with respect to this? If I build a boolean query one of whose clauses is a BooleanQuery with coord turned off, does just the nested query insides get left out of 'coord'?

It applies only to that query itself. So if this BQ is a clause to another BQ that has coord enabled, that would not change the top-level BQ's coord.

Note: if you don't want coord at all, then you can also plug in a Similarity that returns 1, or pick another Similarity like BM25: in trunk only the vector space impl even does anything for coord()

-- lucidimagination.com
Re: DisjunctionMaxQuery and scoring
On Thu, Apr 19, 2012 at 4:21 PM, Robert Muir rcm...@gmail.com wrote:

> Note: if you don't want coord at all, then you can also plug in a Similarity that returns 1, or pick another Similarity like BM25: in trunk only the vector space impl even does anything for coord()

Robert,

I'm sorry that my density is approaching lead. My problem is that I want coord, but I want to control which terms are counted and which are not. I suppose I can accomplish this with my own scorer. My hope was that there was a way to express "this group of terms counts as one for coord."

In other words, for a subset of fields in the query, I want to scale the entire score by the fraction of them that match.

Another way to think about this, which might be no use at all, is to wonder: is there a way to charge a score penalty for failure to match a particular query term? That would, from another direction, address the underlying effect I'm trying to get.
Re: DisjunctionMaxQuery and scoring
On Thu, Apr 19, 2012 at 5:05 PM, Benson Margulies bimargul...@gmail.com wrote:

> My problem is that I want coord, but I want to control which terms are counted and which are not. I suppose I can accomplish this with my own scorer. My hope was that there was a way to express This group of terms counts as one for coord.

So just structure your boolean query appropriately?

BQ1(coord=true)
  BQ2(coord=false): 25 terms
  BQ3(coord=false): 87 terms

BQ1's coord is based on how many subscorers match (out of 2, BQ2 and BQ3). If both match it's 2/2, otherwise 1/2.

But in this example BQ2 and BQ3 disable coord themselves, hiding the fact that they accept 25 and 87 terms respectively and appearing as a single sub for coord().

Does this make sense? You can extend this idea to control this however you want by structuring the BQ appropriately, so that your BQs with synonyms have coord disabled.

-- lucidimagination.com
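The arithmetic behind this structure can be sketched in plain Java. This is a toy model, not Lucene code: it assumes the default coord of overlap / maxOverlap, and the class and method names are made up for illustration. It compares one flat query in which all 112 terms are direct clauses with the nested layout above, where each coord-disabled group counts as a single clause of the top-level query.

```java
// Toy model only: not Lucene code. Assumes coord = overlap / maxOverlap.
class NestedCoordSketch {
    // Flat layout: every term is a direct clause of one coord-enabled query.
    static float flatCoord(int matchedTerms, int totalTerms) {
        return (float) matchedTerms / totalTerms;
    }

    // Nested layout: each entry is the number of terms that matched inside
    // one coord-disabled group. A group counts once if anything in it matched.
    static float groupedCoord(int[] matchedTermsPerGroup) {
        int overlap = 0;
        for (int matched : matchedTermsPerGroup) {
            if (matched > 0) {
                overlap++;
            }
        }
        return (float) overlap / matchedTermsPerGroup.length;
    }

    public static void main(String[] args) {
        // One term matches in the 25-term group and one in the 87-term group.
        System.out.println("flat    = " + flatCoord(2, 25 + 87));
        System.out.println("grouped = " + groupedCoord(new int[] {1, 1}));
        // Only the 25-term group matches.
        System.out.println("grouped = " + groupedCoord(new int[] {1, 0}));
    }
}
```

In the flat layout the long synonym lists drag coord down to 2/112, while in the nested layout the same match yields 2/2, and 1/2 when only one group matches.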
Re: DisjunctionMaxQuery and scoring
On Thu, Apr 19, 2012 at 5:10 PM, Robert Muir rcm...@gmail.com wrote:

> Does this make sense? you can extend this idea to control this however you want by structuring the BQ appropriately so your BQ's with synonyms have coord=0

Robert,

This makes perfect sense, it is what I thought you meant to begin with. I tried it and thought that it did not work. Or, perhaps, I am misreading the 'explain' output. Or, more likely, I goofed altogether. I'll go back and recheck my results and post some explain output if I can't find my mistake.

--benson
Re: DisjunctionMaxQuery and scoring
I see why I'm so confused, but I think I need to construct a simpler test case.

My top-level BooleanQuery, which has disableCoord=false, has 22 clauses. All but three are ordinary SHOULD TermQueries. The remainder are a spanNear, a nested BooleanQuery, and an empty PhraseQuery (that's a bug).

However, at the end of the explain trace, I see:

0.45 = coord(9/20)

I think that my nested Boolean, for which I've been flipping coord on and off to see what happens, is somehow not participating at all. So switching its coord on and off has no effect.

Why 20? Why not 22? Is this just an explain quirk?

Should I shove all this code up to 3.6 from 2.9.3 before bugging you further?
Re: DisjunctionMaxQuery and scoring
On Thu, Apr 19, 2012 at 6:36 PM, Benson Margulies bimargul...@gmail.com wrote:

> However, at the end of the explain trace, I see: 0.45 = coord(9/20)
>
> Why 20? Why not 22? Is this just an explain quirk?

I am not sure (and also not sure I understand your example totally), but it could be as simple as the fact that you have 2 prohibited (MUST_NOT) clauses. These don't count towards coord().

I think it's hard to tell from your description, since it doesn't have all the details. An explain output or a test case would probably be more efficient if it's still not making sense...

-- lucidimagination.com
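Robert's guess can be checked on paper with a small plain-Java sketch. It is a toy model, not Lucene code: the enum and class are made up for illustration, and the split of the 22 clauses into 20 scoring clauses and 2 MUST_NOT clauses is only the hypothesis from the reply above, not something confirmed in this thread.

```java
// Toy model only: not Lucene code. The 20 + 2 clause split is hypothetical.
class MaxCoordSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    // Prohibited clauses never contribute to a match, so they are left out
    // of the denominator used by coord.
    static int maxCoord(Occur[] clauses) {
        int max = 0;
        for (Occur occur : clauses) {
            if (occur != Occur.MUST_NOT) {
                max++;
            }
        }
        return max;
    }

    // Default-style coord: the fraction of countable clauses that matched.
    static float coord(int overlap, int maxCoord) {
        return (float) overlap / maxCoord;
    }

    // Hypothetical query: 22 clauses, the last two of them prohibited.
    static Occur[] exampleClauses() {
        Occur[] clauses = new Occur[22];
        java.util.Arrays.fill(clauses, Occur.SHOULD);
        clauses[20] = Occur.MUST_NOT;
        clauses[21] = Occur.MUST_NOT;
        return clauses;
    }

    public static void main(String[] args) {
        int max = maxCoord(exampleClauses());
        System.out.println("maxCoord = " + max);
        System.out.println("coord    = " + coord(9, max));
    }
}
```

Under that assumption, 22 clauses give a denominator of 20, and 9 matching clauses give the 9/20 seen in the explain output.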
Re: DisjunctionMaxQuery and scoring
[apologies for the earlier errant send]

I think

BooleanQuery bq = new BooleanQuery(false);

doesn't quite accomplish the desired name IN (dick, rich) scoring behavior. This is because (name:dick | name:rich) with coord=false would score the 'document' Dick Rich higher than Rich because the former has two term matches and the latter only one. In contrast, I think the desire is that one and only one of the terms in the document match those in the BooleanQuery so that Rich would score higher than Dick Rich, given document length normalization. It's almost like a desire for

BooleanQuery bq = new BooleanQuery(false);
bq.set*Maximum*NumberShouldMatch(1);

Is there a good way to accomplish this?
Re: DisjunctionMaxQuery and scoring
FWIW, there seems to be an explain bug in 2.9.1 that is fixed in 3.6.0, so I'm no longer confused about the actual behavior.