RE: DisjunctionMaxQuery and scoring

2012-04-20 Thread Uwe Schindler
Hi,

 I think
   BooleanQuery bq = new BooleanQuery(false);
 doesn't quite accomplish the desired name IN (dick, rich) scoring
 behavior. This is because (name:dick | name:rich) with coord=false would
 score the 'document' Dick Rich higher than Rich because the former has
 two term matches and the latter only one. In contrast, I think the desire
 is that one and only one of the terms in the document match those in the
 BooleanQuery so that Rich would score higher than Dick Rich, given
 document length normalization. It's almost like a desire for
   BooleanQuery bq = new BooleanQuery(false);
   bq.set*Maximum*NumberShouldMatch(1);

In that case DisjunctionMaxQuery is the way to go: it only counts the hit
with the highest score and does not add scores (coord or no coord doesn't
matter here).
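
For illustration only, here is a minimal plain-Java sketch (not Lucene code) of the difference between summing sub-scores, as a SHOULD-only BooleanQuery does before coord, and taking the best sub-score plus a tie-breaker share of the rest, as DisjunctionMaxQuery does. The class name DisMaxSketch, its methods, and the sample scores are all made up for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of score combination; deliberately independent of Lucene. */
final class DisMaxSketch {

    /** Sum of all matching sub-scores (BooleanQuery-style, before coord). */
    static float sumScore(List<Float> subScores) {
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
        }
        return sum;
    }

    /** Best sub-score plus tieBreaker times the remaining sub-scores (dismax-style). */
    static float disMaxScore(List<Float> subScores, float tieBreaker) {
        float sum = 0f;
        float max = 0f;
        for (float s : subScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Hypothetical per-term scores for a document matching both "dick" and "rich".
        List<Float> dickRich = Arrays.asList(0.5f, 0.25f);
        System.out.println("sum    = " + sumScore(dickRich));
        System.out.println("dismax = " + disMaxScore(dickRich, 0.0f));
    }
}
```

With a tie-breaker of 0, a document matching both names gets no more from this group than a document matching only the better-scoring name, which is the "count once" behavior described above.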


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: DisjunctionMaxQuery and scoring

2012-04-20 Thread Robert Muir
On Thu, Apr 19, 2012 at 8:32 PM, David Murgatroyd dmu...@gmail.com wrote:
 In contrast, I think the desire
 is that one and only one of the terms in the document match those in the
 BooleanQuery so that Rich would score higher than Dick Rich, given
 document length normalization. It's almost like a desire for
 BooleanQuery bq = new BooleanQuery(false);
  bq.set*Maximum*NumberShouldMatch(1);


You can, by returning a customized weight with a coord impl that
PUNISHES documents that match > 1 sub.

Take a look at
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/queries/src/java/org/apache/lucene/queries/BoostingQuery.java
for some inspiration, especially this part:

BooleanQuery result = new BooleanQuery() {
  @Override
  public Weight createWeight(IndexSearcher searcher) throws IOException {
    return new BooleanWeight(searcher, false) {
      @Override
      public float coord(int overlap, int max) {
        // your logic here when overlap == 1, > 1, etc.
        return 1.0f; // placeholder so the fragment is complete
      }
    };
  }
};
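
As a rough sketch of what such a coord() might compute, here is a plain-Java toy model (no Lucene classes). The penalty value 0.25 and the names PunishingCoordSketch and MULTI_MATCH_PENALTY are assumptions chosen for illustration; in a real query the logic would live in the coord(int overlap, int max) override shown above.

```java
/** Toy model of a coord function that penalizes multiple matches; not Lucene code. */
final class PunishingCoordSketch {

    /** Hypothetical factor applied when more than one sub-clause matches. */
    static final float MULTI_MATCH_PENALTY = 0.25f;

    /** Full credit for exactly one match, a penalty for more, nothing for none. */
    static float coord(int overlap, int maxOverlap) {
        if (overlap <= 0) {
            return 0f;
        }
        return overlap == 1 ? 1.0f : MULTI_MATCH_PENALTY;
    }

    /** Sum of the matching sub-scores scaled by the coord factor. */
    static float score(float[] matchingSubScores, int maxOverlap) {
        float sum = 0f;
        for (float s : matchingSubScores) {
            sum += s;
        }
        return sum * coord(matchingSubScores.length, maxOverlap);
    }

    public static void main(String[] args) {
        // Hypothetical scores: "Rich" matches one term, "Dick Rich" matches two.
        System.out.println("Rich      = " + score(new float[] {0.5f}, 2));
        System.out.println("Dick Rich = " + score(new float[] {0.5f, 0.5f}, 2));
    }
}
```

With these made-up numbers the single-match document scores higher than the double-match one, which is the ordering David asked for.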

-- 
lucidimagination.com




RE: DisjunctionMaxQuery and scoring

2012-04-20 Thread Uwe Schindler
Hi,

Ah sorry, I misunderstood: you wanted to score the duplicate match lower! To
achieve this, you have to change the coord function in the
Similarity/BooleanWeight used for this query.

Either way: if you want a group of terms that contributes only one score when
at least one of the terms matches (like SQL IN), without adding the individual
scores together, DisjunctionMaxQuery is fine. I think this is what Benson
asked for.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de





Re: DisjunctionMaxQuery and scoring

2012-04-20 Thread Benson Margulies
Uwe and Robert,

Thanks. David and I are two peas in one pod here at Basis.

--benson




DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
I am trying to solve a problem using DisjunctionMaxQuery.


Consider a query like:

a:b OR c:d OR e:f OR ...
name:richard OR name:dick OR name:dickie OR name:rich ...

At most, one of the richard names matches. So the match score gets
dragged down by the long list of things that don't match, as the list
can get quite long.

It seemed to me, upon reading the documentation, that I could cure
this problem by creating a query tree that used DisjunctionMaxQuery
around all those nicknames. However, when I built a boolean query that
had, as a clause, a DisjunctionMaxQuery in the place of a pile of
these individual Term queries, the score and the explanation did not
change at all -- in particular, the coord term shows the same number
of total terms. So it looks as if the children of the disjunction
still count.

Is there a way to control that term? Or a better way to express this?
Thinking SQL for a moment, what I'm trying to express is

   name IN (richard, dick, dickie, rich)

as a single term query. Reading the javadoc, I see MultiTermQuery, and
I'm wondering whether it is what we want.
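
To make the drag-down concrete, here is a small plain-Java sketch. It assumes the coordination factor is simply overlap / maxOverlap (the formula DefaultSimilarity uses); the clause counts and the class name CoordDragSketch are hypothetical.

```java
/** Toy arithmetic for the coord factor; not Lucene code. */
final class CoordDragSketch {

    /** overlap / maxOverlap: fraction of scoring clauses that matched. */
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        int fieldClauses = 3;           // a:b, c:d, e:f (hypothetical count)
        int nicknames = 4;              // richard, dick, dickie, rich
        int matched = fieldClauses + 1; // every field plus exactly one nickname

        // Flat query: every nickname is its own clause, so three misses are charged.
        System.out.println("flat    = " + coord(matched, fieldClauses + nicknames));
        // Desired: the nickname group counts as a single clause.
        System.out.println("grouped = " + coord(matched, fieldClauses + 1));
    }
}
```

In the flat form the best possible document still gets only 4/7 of its score, and that fraction shrinks as the nickname list grows.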




Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Robert Muir


I think you just want to disable coord() here? You can do this for
that particular boolean query by passing true to the ctor:

  public BooleanQuery(boolean disableCoord)

-- 
lucidimagination.com




Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies

Rob,

How do nested queries work with respect to this? If I build a boolean
query one of whose clauses is a BooleanQuery with coord turned off,
do only the nested query's insides get left out of 'coord'?

If so, then your answer certainly seems to be what the doctor ordered.

--benson






Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
Turning on disableCoord for a nested boolean query does not seem to
change the overall maxCoord term as displayed in explain.




Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Robert Muir


It applies only to that query itself. So if this BQ is a clause of
another BQ that has coord enabled,
that would not change the top-level BQ's coord.

Note: if you don't want coord at all, you can also plug in a
Similarity whose coord() returns 1,
or pick another Similarity like BM25: in trunk, only the vector space
impl does anything at all for coord().


-- 
lucidimagination.com




Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies

Robert, I'm sorry that my density is approaching lead. My problem is
that I want coord, but I want to control which terms are counted and
which are not. I suppose I can accomplish this with my own scorer. My
hope was that there was a way to express "this group of terms counts
as one for coord."

In other words, for a subset of fields in the query, I want to scale
the entire score by the fraction of them that match.

Another way to think about this, which might be no use at all, is to
wonder: is there a way to charge a score penalty for failure to match
a particular query term? That would, from another direction, address
the underlying effect I'm trying to get.








Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Robert Muir

So just structure your boolean query appropriately?

BQ1(coord=true)
  BQ2(coord=false): 25 terms
  BQ3(coord=false): 87 terms

BQ1's coord is based on how many subscorers match (out of 2: BQ2 and
BQ3). If both match it's 2/2, otherwise 1/2.

But in this example BQ2 and BQ3 disable coord themselves, hiding the
fact that they accept 25 and 87 terms respectively and appearing as a
single sub for coord().

Does this make sense? You can extend this idea to control this however
you want by structuring the BQ appropriately, so that your BQs with
synonyms have coord=false.
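
The BQ1/BQ2/BQ3 arithmetic can be sketched in plain Java as follows. This is a toy model rather than Lucene code: each group is just an array of per-term match flags, and the class and method names (NestedCoordSketch, groupCoord, topLevelCoord) are invented for the example.

```java
/** Toy model of coord in a two-level boolean query; not Lucene code. */
final class NestedCoordSketch {

    /** A SHOULD-only group matches when at least one of its terms matches. */
    static boolean groupMatches(boolean[] termMatches) {
        for (boolean m : termMatches) {
            if (m) {
                return true;
            }
        }
        return false;
    }

    /** Coord factor a group applies to its own score; 1 when coord is disabled. */
    static float groupCoord(boolean[] termMatches, boolean disableCoord) {
        if (disableCoord) {
            return 1.0f;
        }
        int overlap = 0;
        for (boolean m : termMatches) {
            if (m) {
                overlap++;
            }
        }
        return overlap / (float) termMatches.length;
    }

    /** Coord of the enclosing query: matching groups over total groups. */
    static float topLevelCoord(boolean[][] groups) {
        int overlap = 0;
        for (boolean[] g : groups) {
            if (groupMatches(g)) {
                overlap++;
            }
        }
        return overlap / (float) groups.length;
    }

    public static void main(String[] args) {
        boolean[] bq2 = new boolean[25]; // 25 terms, as in the example above
        boolean[] bq3 = new boolean[87]; // 87 terms
        bq2[3] = true;                   // exactly one synonym in BQ2 matches

        System.out.println("BQ1 coord          = " + topLevelCoord(new boolean[][] {bq2, bq3}));
        System.out.println("BQ2 coord disabled = " + groupCoord(bq2, true));
        System.out.println("BQ2 coord enabled  = " + groupCoord(bq2, false));
    }
}
```

The point of the structure is visible in the last two lines: with coord disabled the one matching synonym is not charged for the 24 that missed, while BQ1 still sees the whole group as one of its two clauses.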

-- 
lucidimagination.com




Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies

Robert,

This makes perfect sense, it is what I thought you meant to begin
with. I tried it and thought that it did not work. Or, perhaps, I am
misreading the 'explain' output. Or, more likely, I goofed altogether.
I'll go back and recheck my results and post some explain output if I
can't find my mistake.

--benson








Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
I see why I'm so confused, but I think I need to construct a simpler test case.

My top-level BooleanQuery, which has disableCoord=false, has 22
clauses. All but three are ordinary SHOULD TermQueries. The remainder
are a spanNear, a nested BooleanQuery, and an empty PhraseQuery
(that's a bug).

However, at the end of the explain trace, I see:

0.45 = coord(9/20)

I think that my nested Boolean, for which I've been
flipping coord on and off to see what happens, is somehow not
participating at all. So switching its coord on and off has no
effect.

Why 20? Why not 22? Is this just an explain quirk? Should I shove all
this code up to 3.6 from 2.9.3 before bugging you further?




Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread David Murgatroyd







Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Robert Muir
On Thu, Apr 19, 2012 at 6:36 PM, Benson Margulies bimargul...@gmail.com wrote:
 Why 20? Why not 22? Is this just an explain quirk?

I am not sure (I'm also not sure I understand your example totally), but
it could be as simple as the fact that you have 2 prohibited
(MUST_NOT) clauses. These don't count towards coord().

I think it's hard to tell from your description (just since it doesn't
have all the details). An explain or test case or something like that
might be more efficient if it's still not making sense...
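
If that guess applies, the arithmetic would look like the following plain-Java sketch. It is a toy model under the assumption that only non-prohibited clauses count toward the coord denominator; MaxCoordSketch and its Occur enum are stand-ins invented here, not Lucene's own types.

```java
/** Toy model of how the coord denominator could be counted; not Lucene code. */
final class MaxCoordSketch {

    /** Stand-in for the three clause types of a boolean query. */
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** Number of clauses that can contribute to coord: everything except MUST_NOT. */
    static int maxCoord(Occur[] clauses) {
        int max = 0;
        for (Occur o : clauses) {
            if (o != Occur.MUST_NOT) {
                max++;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // 22 clauses in total, two of them prohibited (hypothetical layout).
        Occur[] clauses = new Occur[22];
        java.util.Arrays.fill(clauses, Occur.SHOULD);
        clauses[20] = Occur.MUST_NOT;
        clauses[21] = Occur.MUST_NOT;

        int max = maxCoord(clauses);
        System.out.println("coord(9/" + max + ") = " + (9 / (float) max));
    }
}
```

Under that assumption, 22 clauses with two MUST_NOT entries give a denominator of 20, which would match the coord(9/20) seen in the explain output.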

-- 
lucidimagination.com




Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread David Murgatroyd
[apologies for the earlier errant send]

I think
 BooleanQuery bq = new BooleanQuery(false);
doesn't quite accomplish the desired name IN (dick, rich) scoring
behavior. This is because (name:dick | name:rich) with coord=false would
score the 'document' Dick Rich higher than Rich because the former has
two term matches and the latter only one. In contrast, I think the desire
is that one and only one of the terms in the document match those in the
BooleanQuery so that Rich would score higher than Dick Rich, given
document length normalization. It's almost like a desire for
BooleanQuery bq = new BooleanQuery(false);
  bq.set*Maximum*NumberShouldMatch(1);

Is there a good way to accomplish this?





Re: DisjunctionMaxQuery and scoring

2012-04-19 Thread Benson Margulies
FWIW, there seems to be an explain bug in 2.9.1 that is fixed in
3.6.0, so I'm no longer confused about the actual behavior.

