Re: Scoring on multi-valued fields

2010-08-06 Thread Chris Hostetter

: The other would be to somehow control the scores of each id. So a document
: with 2 ids matching should be worth more than the document with only 1 id
: matching (This is how it works now) but a document with 7 ids matching
: shouldn't be worth more, or at least not a lot more, than a document that
: matches only 3 ids (this is not how it works).

this is all driven by the "coord factor" of the outermost BooleanQuery ... 
you can provide a custom Similarity class that generates different values 
based on the field/number of clauses, or if you are already generating the 
BooleanQuery via custom code (ie: your own QParser or what not) you can 
override the Similarity there.
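
To make the coord effect concrete, here is a rough Python model (not Lucene
code) of how a BooleanQuery of optional clauses is scored under the classic
Similarity: the matching clause scores are summed, then multiplied by
coord(overlap, maxOverlap) = overlap / maxOverlap. The equal per-clause
score, the helper names and the capped_coord variant are assumptions made up
for illustration; a real change would live in a Similarity subclass that
overrides coord.

```python
def default_coord(overlap: int, max_overlap: int) -> float:
    """Classic coord factor: fraction of the query clauses that matched."""
    return overlap / max_overlap


def capped_coord(overlap: int, max_overlap: int, cap: int = 3) -> float:
    """Hypothetical variant: matches beyond `cap` earn no extra credit.

    Returning min(overlap, cap) / overlap cancels the growth of the summed
    clause scores once more than `cap` clauses match.
    """
    if overlap == 0:
        return 0.0
    return min(overlap, cap) / overlap


def boolean_score(clause_scores, max_overlap, coord):
    """Sum the scores of the matching clauses, then scale by coord."""
    return sum(clause_scores) * coord(len(clause_scores), max_overlap)


PER_ID = 1.0  # assumed: every matching id clause scores the same
for matched in (1, 3, 7):
    scores = [PER_ID] * matched
    print(matched,
          round(boolean_score(scores, 7, default_coord), 3),
          round(boolean_score(scores, 7, capped_coord), 3))
```

With the default coord the score grows roughly with the square of the number
of matching ids in this model, which is why 7 matches dwarf 3. With the
capped variant, 3 and 7 matches come out the same.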

: The reason this would be ideal for us is that we don't have any control over
: how many ids will be in the query and we don't want documents that have lots
: of ids to have an unnatural advantage over those with just a few.

If you put 'omitNorms="false"' on the field in question, then the length 
normalization (which rewards shorter documents) should help offset this -- 
no custom code required.
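
As a back-of-the-envelope check of how much the norm helps, the sketch below
assumes the DefaultSimilarity length norm of 1 / sqrt(numTerms), where
numTerms counts the terms across all values of the multi-valued field. It
ignores coord, idf differences and the lossy one-byte encoding of stored
norms, so the numbers are illustrative only. The function names and the
per-term score of 1.0 are made up for the example.

```python
import math


def length_norm(num_terms: int) -> float:
    """Length normalization, assumed to be 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)


def id_field_score(matching_ids: int, ids_in_doc: int,
                   per_term: float = 1.0) -> float:
    """Summed score of the matching id clauses for one document.

    Each matching clause contributes per_term scaled by the field norm.
    """
    return matching_ids * per_term * length_norm(ids_in_doc)


# Same 3 ids match, but one document carries 7 ids and the other only 3.
print(id_field_score(3, 7), id_field_score(3, 3))

# Every id matches: the 7-id document still wins, but by sqrt(7)/sqrt(3)
# rather than 7/3.
print(id_field_score(7, 7), id_field_score(3, 3))
```

So in this simplified model the norm dampens the advantage of id-heavy
documents rather than removing it.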

-Hoss



Re: Scoring on multi-valued fields

2010-08-03 Thread oleg.gnatovskiy

Well that does take care of some cases.

How about if we still want a hit on a tag to contribute to the weight
though? 

There would be 2 options. One is the one I described in the original post,
which is to grab the highest score of a set of ids.

The other would be to somehow control the scores of each id. So a document
with 2 ids matching should be worth more than the document with only 1 id
matching (This is how it works now) but a document with 7 ids matching
shouldn't be worth more, or at least not a lot more, than a document that
matches only 3 ids (this is not how it works).

The reason this would be ideal for us is that we don't have any control over
how many ids will be in the query and we don't want documents that have lots
of ids to have an unnatural advantage over those with just a few.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Scoring-on-multi-valued-fields-tp1017624p1020504.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Scoring on multi-valued fields

2010-08-03 Thread Yonik Seeley
On Tue, Aug 3, 2010 at 3:16 PM, oleg.gnatovskiy wrote:
>
> Sorry, I guess I messed up my example query. The query should look like this:
>
> name:pizza AND id:(10 OR 20 OR 30)
>
> Thus if I do name:pizza^10 AND id:(10 OR 20 OR 30)^0 wouldn't a document
> that has all the ids (10, 20, and 30) still come up higher than a document
> that has just one?

No, because the whole id:(10 OR 20 OR 30)^0 clause will contribute 0
to the final score.
Another way to get the same effect would be to pull it out as a filter:
q=name:pizza&fq=id:(10 OR 20 OR 30)
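
For anyone building that request in code, a minimal sketch of URL-encoding
the two parameters follows. The host, port and /solr/select path are
assumptions about a local install, and build_select_url is a made-up helper
name; only the q and fq parameters come from the message above.

```python
from urllib.parse import urlencode, parse_qs

# Assumed location of the select handler; adjust for your own install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(main_query: str, filter_query: str) -> str:
    """Put the scoring query in q and the non-scoring restriction in fq."""
    params = urlencode({"q": main_query, "fq": filter_query})
    return f"{SOLR_SELECT_URL}?{params}"


url = build_select_url("name:pizza", "id:(10 OR 20 OR 30)")
print(url)

# Decoding the query string gives back the original parameter values.
print(parse_qs(url.split("?", 1)[1]))
```

Because the id restriction sits in fq, it narrows the result set without
taking part in scoring, so ranking comes from name:pizza alone.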

-Yonik
http://www.lucidimagination.com


Re: Scoring on multi-valued fields

2010-08-03 Thread oleg.gnatovskiy

Sorry, I guess I messed up my example query. The query should look like this:

name:pizza AND id:(10 OR 20 OR 30)

Thus if I do name:pizza^10 AND id:(10 OR 20 OR 30)^0 wouldn't a document
that has all the ids (10, 20, and 30) still come up higher than a document
that has just one?


Re: Scoring on multi-valued fields

2010-08-03 Thread Yonik Seeley
On Tue, Aug 3, 2010 at 2:42 PM, oleg.gnatovskiy wrote:
>
> Oh sorry guys, I didn't correctly submit my original post to the mailing
> list. The original message was this:
> "
> Hello all. We are having some trouble with queries similar to the type shown
> below:
>
> name:pizza OR (id:10 OR id:20 OR id:30) (id is a multi-valued field)
>
> With the above query, we will always get documents with pizza in the name,
> and any document with id values of 10, 20, and 30 will always come up first.
> What we would like is for a document with only id 10 to be weighted the
> same as a document with ids 10, 20, and 30.

How do you want pizza weighted against 10, 20, or 30?
If pizza can always come first, you can boost the second clause to zero:
pizza OR (id:10 OR id:20 OR id:30)^0
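
A toy model of why the ^0 boost flattens the id matches is sketched below.
It deliberately ignores coord and query normalization, and the clause scores
and function name are invented for the example, so treat it as arithmetic
rather than as what Lucene computes.

```python
def or_query_score(name_score: float, id_scores, id_boost: float) -> float:
    """Simplified score for: name clause OR (id clauses)^id_boost.

    Assumption of this sketch: the boost simply multiplies the summed id
    clause scores; coord and query normalization are left out.
    """
    return name_score + id_boost * sum(id_scores)


PIZZA = 2.0   # assumed score of the pizza clause
PER_ID = 1.0  # assumed score of each matching id clause

one_id = or_query_score(PIZZA, [PER_ID], id_boost=0)
three_ids = or_query_score(PIZZA, [PER_ID] * 3, id_boost=0)
ids_only = or_query_score(0.0, [PER_ID] * 3, id_boost=0)
print(one_id, three_ids, ids_only)
```

In this model a document matching one id ties with a document matching all
three, and a document that matches only on ids is still returned but
contributes nothing to its own score.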

> What happens is that the scores of all the hits on id are added up. Is there a
> way to only grab the first score?

There is a way to grab only the highest score from a set of options
(DisjunctionMaxQuery) but unfortunately there is no general query
parser syntax to support that yet.
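
For comparison, here is a small Python sketch of the two scoring rules:
a plain sum over matching clauses versus the DisjunctionMaxQuery rule of
"best clause plus tie breaker times the rest". The clause scores are
invented, and the function names are only for this example.

```python
def sum_score(clause_scores) -> float:
    """BooleanQuery-style combination: every matching clause adds up."""
    return sum(clause_scores)


def dismax_score(clause_scores, tie_breaker: float = 0.0) -> float:
    """DisjunctionMaxQuery-style combination.

    The best clause counts in full; the remaining clauses are scaled by
    tie_breaker (0.0 means only the maximum counts).
    """
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


one_match = [0.8]
three_matches = [0.8, 0.5, 0.3]

print(sum_score(one_match), sum_score(three_matches))
print(dismax_score(one_match), dismax_score(three_matches))
print(dismax_score(three_matches, tie_breaker=0.1))
```

With a tie breaker of 0 the document matching three ids scores the same as
the document matching only the best one, which is the behaviour asked for.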

-Yonik
http://www.lucidimagination.com


Re: Scoring on multi-valued fields

2010-08-03 Thread oleg.gnatovskiy

Oh sorry guys, I didn't correctly submit my original post to the mailing
list. The original message was this:
"
Hello all. We are having some trouble with queries similar to the type shown
below:

name:pizza OR (id:10 OR id:20 OR id:30) (id is a multi-valued field)

With the above query, we will always get documents with pizza in the name,
and any document with id values of 10, 20, and 30 will always come up first.
What we would like is for a document with only id 10 to be weighted the
same as a document with ids 10, 20, and 30.

Is this possible with Lucene/Solr?

Thanks in advance for any assistance you might be able to offer. 
"


Re: Scoring on multi-valued fields

2010-08-03 Thread oleg.gnatovskiy

I checked the explain output for the query.

What happens is that the scores of all the hits on id are added up. Is there a
way to only grab the first score?

Thanks.