How to correctly boost results in Solr Dismax query

2009-03-12 Thread Pete Smith
Hi,

I have managed to build an index in Solr which I can search on keyword,
produce facets, query facets etc. This is all working great. I have
implemented my search using a dismax query so it searches predetermined
fields.

However, my results are coming back sorted by score which appears to be
calculated by keyword relevancy only. I would like to adjust the score
where fields have pre-determined values. I think I can do this with
boost query and boost functions but the documentation here:

http://wiki.apache.org/solr/DisMaxRequestHandler#head-6862070cf279d9a09bdab971309135c7aea22fb3

is not particularly helpful. I tried adding a bq argument to my
search:

&bq=media:DVD^2

(yes, this is an index of films!) but I find when I start adding more
and more:

&bq=media:DVD^2&bq=media:BLU-RAY^1.5

the non-matching results (e.g. films that are DVD but are not
BLU-RAY) get negatively affected in their score. In the end it all seems
to even out and my score is as it was before I started boosting.
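
For illustration, a request carrying several bq clauses could be assembled
as below. This is only a minimal sketch: the base URL and the qt=dismax
handler name are assumptions about a local setup, and only the q and bq
values are taken from the message above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and path for your own install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_dismax_url(keywords, boost_queries):
    """Return a select URL with one bq parameter per boost query."""
    params = [
        ("qt", "dismax"),  # assumed name of the dismax request handler
        ("q", keywords),
    ]
    # Each boost query becomes its own bq=... parameter.
    params.extend(("bq", bq) for bq in boost_queries)
    return SOLR_SELECT + "?" + urlencode(params)


url = build_dismax_url("indiana", ["media:DVD^2", "media:BLU-RAY^1.5"])
print(url)
```

Building the parameters as a list of pairs keeps the repeated bq keys, and
urlencode takes care of escaping the ':' and '^' characters.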

I must be doing this wrong and I wonder whether "boost function" comes
in somewhere. Any ideas on how to correctly use boost?

Cheers,
Pete

-- 
Pete Smith
Developer

No.9 | 6 Portal Way | London | W3 6RU |
T: +44 (0)20 8896 8070 | F: +44 (0)20 8896 8111

LOVEFiLM.com


Re: How to correctly boost results in Solr Dismax query

2009-03-12 Thread dabboo

Hi Pete,

The bq parameter works with the q.alt query parameter. If you are passing
the search criteria using the q.alt query parameter, then the bq parameter
comes into the picture. Also, q.alt doesn't support field boosting.

If you want to boost records by their field values then you must use the q
query parameter instead of q.alt. The 'q' parameter actually uses the qf
parameters from solrconfig.xml for field boosting.

Let me know if you have any questions.

Thanks,
Amit Garg






-- 
View this message in context: 
http://www.nabble.com/How-to-correctly-boost-results-in-Solr-Dismax-query-tp22476204p22490850.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi Amit,

Thanks very much for your reply. What you said makes things a bit
clearer but I am still a bit confused.

On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
> If you want to boost the records with their field value then you must use q
> query parameter instead of q.alt. 'q' parameter actually uses qf parameters
> from solrConfig for field boosting.

From the documentation for Dismax queries, I thought that "q" is simply
a keyword parameter:

From http://wiki.apache.org/solr/DisMaxRequestHandler:
q
The guts of the search defining the main "query". This is designed to be
support raw input strings provided by users with no special escaping.
'+' and '-' characters are treated as "mandatory" and "prohibited"
modifiers for the subsequent terms. Text wrapped in balanced quote
characters '"' are treated as phrases, any query containing an odd
number of quote characters is evaluated as if there were no quote
characters at all. Wildcards in this "q" parameter are not supported. 

And I thought 'qf' is a list of fields and boost scores:

From http://wiki.apache.org/solr/DisMaxRequestHandler:
qf (Query Fields)
List of fields and the "boosts" to associate with each of them when
building DisjunctionMaxQueries from the user's query. The format
supported is fieldOne^2.3 fieldTwo fieldThree^0.4, which indicates that
fieldOne has a boost of 2.3, fieldTwo has the default boost, and
fieldThree has a boost of 0.4 ... this indicates that matches in
fieldOne are much more significant than matches in fieldTwo, which are
more significant than matches in fieldThree. 
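
The qf format quoted above can be read mechanically. The sketch below is
not Solr code; parse_qf is a hypothetical helper, and treating a field
without an explicit boost as 1.0 is an assumption standing in for "the
default boost".

```python
def parse_qf(qf):
    """Split a qf string into a {field: boost} mapping."""
    boosts = {}
    for token in qf.split():
        field, sep, boost = token.partition("^")
        # A field listed without ^N is given a neutral boost of 1.0 here.
        boosts[field] = float(boost) if sep else 1.0
    return boosts


print(parse_qf("fieldOne^2.3 fieldTwo fieldThree^0.4"))
```

Note that the mapping only attaches a weight to each field name; nothing
in it refers to the values stored in those fields.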

But if I want to, say, search for films with 'indiana' in the title,
with media=DVD scoring higher than media=BLU-RAY then do I need to do
something like:

solr/select?q=indiana

And in my config:

media^2

But I don't see where the actual *contents* of the media field would
determine the boost.

Sorry if I have misunderstood what you mean.

Cheers,
Pete



Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread dabboo

Pete,

Sorry if I wasn't clear. Here is the explanation.

Suppose you have 2 records, with films and media as their 2 columns.

The first record has the values films="Indiana" and media="blue ray",
and the 2nd record has the values films="Bond" and media="Indiana".

Values for the qf parameter:

media^2.0 films^1.0

Now search for q=Indiana. It should display both of the records, but
record #2 will be displayed above the 1st.
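
The two-record example can be mimicked with a toy model. This is a rough
sketch and not Lucene's real scoring: it assumes every field match is
worth exactly its qf boost and that a record scores the maximum over its
matching fields. The function name and the "name" key are made up for
illustration.

```python
def toy_dismax_score(doc, term, field_boosts):
    """Best boosted field match for `term`, or 0.0 if no field matches."""
    scores = [
        boost
        for field, boost in field_boosts.items()
        if term.lower() in doc.get(field, "").lower().split()
    ]
    return max(scores, default=0.0)


docs = [
    {"name": "record 1", "films": "Indiana", "media": "blue ray"},
    {"name": "record 2", "films": "Bond", "media": "Indiana"},
]
qf = {"media": 2.0, "films": 1.0}

# Highest toy score first.
ranked = sorted(docs, key=lambda d: toy_dismax_score(d, "Indiana", qf), reverse=True)
for doc in ranked:
    print(doc["name"], toy_dismax_score(doc, "Indiana", qf))
```

In this model record #2 wins only because the keyword itself occurs in the
more heavily weighted media field.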

Let me know if you still have questions.

Cheers,
amit





Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi Amit,

Thanks again for your reply. I am understanding it a bit better but I
think it would help if I posted an example. Say I have three records:


Record 1
  media: BLU-RAY
  title: Indiana Jones and the Kingdom of the Crystal Skull

Record 2
  media: DVD
  title: Indiana Jones and the Kingdom of the Crystal Skull

Record 3
  media: DVD
  title: Casino Royale


Now, if I search for indiana: select?q=indiana

I want the first two rows to come back (not the third as it does not
contain 'indiana'). I would like record 2 to be scored higher than
record 1 as its media type is DVD.

At the moment I have in my config:

title

And I was trying to boost on media having a specific value by using 'bq',
but from what you told me that is incorrect.
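
The ordering asked for here can be written down as a toy model. This is a
sketch under loud assumptions, not Solr's actual behaviour: the keyword
match on title is mandatory and worth 1.0, and a boost on a media value is
simply added on top. Real scores also include normalization factors, so
absolute numbers will differ. The "id" key and toy_search are hypothetical
names.

```python
RECORDS = [
    {"id": 1, "media": "BLU-RAY",
     "title": "Indiana Jones and the Kingdom of the Crystal Skull"},
    {"id": 2, "media": "DVD",
     "title": "Indiana Jones and the Kingdom of the Crystal Skull"},
    {"id": 3, "media": "DVD", "title": "Casino Royale"},
]


def toy_search(records, keyword, media_boosts):
    """Rank title matches; add a bonus when media has a boosted value."""
    hits = []
    for rec in records:
        if keyword.lower() not in rec["title"].lower().split():
            continue  # the keyword match is mandatory, the boost is not
        score = 1.0 + media_boosts.get(rec["media"], 0.0)
        hits.append((score, rec["id"]))
    return sorted(hits, reverse=True)


results = toy_search(RECORDS, "indiana", {"DVD": 2.0, "BLU-RAY": 1.5})
print(results)
```

Record 3 never appears because the boost alone does not make a record
match; it only reorders records that already matched the keyword.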

Cheers,
Pete



Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread dabboo



Re: How to correctly boost results in Solr Dismax query

2009-03-13 Thread Pete Smith
Hi,

On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
> bq works only with q.alt query and not with q queries. So, in your case you
> would be using qf parameter for field boosting, you will have to give both
> the fields in qf parameter i.e. both title and media.
> 
> try this
> 
> media^1.0 title^100.0

But with that, how will it know to rank media:DVD higher than
media:BLU-RAY?

Cheers,
Pete



RE: How to correctly boost results in Solr Dismax query

2009-03-15 Thread Dean Missikowski (Consultant), CLSA
Hi,

My experience is that the bq parameter can be used with any query type.
You can define boosts on the query fields (qf) that are used with the
query terms (q) in your query, AND you can define additional boosts for
fields that are not used with the query terms through the bq or bf
parameters.

I think the relative weight that a particular bq boost has on the
overall score depends on the other fields in your query. If you're
searching on titles, you might want to consider setting omitNorms=true
(meaning don't store length normalization factors) for title in your
schema.xml and, if you're using Solr 1.4, omitTf=true (meaning don't
store term frequencies). That way results aren't skewed by short and
long titles, or by titles that contain multiple occurrences of the same
term. Changing either setting requires you to reindex. I think this
should have the effect of making bq boosts like
&bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective.
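
To see why title length can skew results, here is a small sketch. It
assumes the classic 1/sqrt(number of terms) length normalization from
Lucene's default similarity and ignores analysis details such as stop-word
removal and the lossy encoding of stored norms. The function names are
hypothetical.

```python
import math


def length_norm(num_terms):
    """Classic Lucene-style length normalization: 1 / sqrt(term count)."""
    return 1.0 / math.sqrt(num_terms)


def toy_title_score(title, omit_norms):
    """Score a single-term match, with or without length normalization."""
    num_terms = len(title.split())
    return 1.0 if omit_norms else length_norm(num_terms)


short_title = "Indiana Jones"
long_title = "Indiana Jones and the Kingdom of the Crystal Skull"

for omit in (False, True):
    print(omit, toy_title_score(short_title, omit), toy_title_score(long_title, omit))
```

With norms in play the shorter title scores noticeably higher for the same
keyword; with them omitted both titles start level, which leaves more room
for a bq boost to decide the order.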

-- Dean


RE: How to correctly boost results in Solr Dismax query

2009-03-16 Thread Dean Missikowski (Consultant), CLSA
If you just discovered the omitTf parameter because of this post, please
be aware that I've not really explained its purpose properly, and note
that using it will prevent phrase queries from working. See this thread
for clarification on its use:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200903.mbox/%3c897559.95769...@web50301.mail.re2.yahoo.com%3e

-- Dean

-Original Message-
From: Dean Missikowski (Consultant), CLSA 
Sent: 16/03/2009 10:30 AM
To: solr-user@lucene.apache.org
Subject: RE: How to correctly boost results in Solr Dismax query

Hi,

My experience is that the BQ parameter can be used with any query type.
You can define boosts on the query fields (qf) that are used with the
query terms (q) in your query, AND you can define additional boosts for
fields that are not used with the query terms through the bq or bf
parameters. 

I think the relative weight that assigning a particular boost to a field
via BQ has on the overall scoring needs to take into consideration the
other fields in your query. If you're searching on titles, you might
want to consider setting omitNorms=true (means don't generate length
normalization vectors) for title in your schema.xml, and if you're using
Solr 1.4 omitTf=true (means don't generate term frequency vectors), so
that results aren't skewed by short and long titles, or titles that
contain multiple occurrences of the same term (setting these requires
you to reindex). I think this should have the effect of making BQ boosts
like &bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective. 

-- Dean

-Original Message-
From: Pete Smith [mailto:pete.sm...@lovefilm.com] 
Sent: 13/03/2009 7:11 PM
To: solr-user@lucene.apache.org
Subject: Re: How to correctly boost results in Solr Dismax query

Hi,

On Fri, 2009-03-13 at 03:57 -0700, dabboo wrote:
> bq works only with q.alt query and not with q queries. So, in your
case you
> would be using qf parameter for field boosting, you will have to give
both
> the fields in qf parameter i.e. both title and media.
> 
> try this
> 
> media^1.0 title^100.0

But with that, how will it know to rank media:DVD higher than
media:BLU-RAY?

Cheers,
Pete


> Pete Smith-3 wrote:
> > 
> > Hi Amit,
> > 
> > Thanks again for your reply. I am understanding it a bit better but
I
> > think it would help if I posted an example. Say I have three
records:
> > 
> > 
> > 1
> > BLU-RAY
> > Indiana Jones and the Kingdom of the Crystal
> > Skull
> > 
> > 
> > 2
> > DVD
> > Indiana Jones and the Kingdom of the Crystal
> > Skull
> > 
> > 
> > 3
> > DVD
> > Casino Royale
> > 
> > 
> > Now, if I search for indiana: select?q=indiana
> > 
> > I want the first two rows to come back (not the third as it does not
> > contain 'indiana'). I would like record 2 to be scored higher than
> > record 1 as it's media type is DVD.
> > 
> > At the moment I have in my config:
> > 
> > title
> > 
> > And i was trying to boost by media having a specific value by using
'bq'
> > but from what you told me that is incorrect.
> > 
> > Cheers,
> > Pete
> > 
> > 
> > On Fri, 2009-03-13 at 03:21 -0700, dabboo wrote:
> >> Pete,
> >> 
> >> Sorry, if wasnt clear. Here is the explanation.
> >> 
> >> Suppose you have 2 records and they have films and media as 2
columns.
> >> 
> >> Now first record has values like films="Indiana" and media="blue
ray"
> >> and 2nd record has values like films="Bond" and media="Indiana"
> >> 
> >> Values for qf parameters
> >> 
> >> media^2.0 films^1.0
> >> 
> >> Now, search for q=Indiana .. it should display both of the records
but
> >> record #2 will display above than the 1st.
> >> 
> >> Let me know if you still have questions.
> >> 
> >> Cheers,
> >> amit
> >> 
> >> 
> >> Pete Smith-3 wrote:
> >> > 
> >> > Hi Amit,
> >> > 
> >> > Thanks very much for your reply. What you said makes things a bit
> >> > clearer but I am still a bit confused.
> >> > 
> >> > On Thu, 2009-03-12 at 23:14 -0700, dabboo wrote:
> >> >> If you want to boost the records with their field value then you must
> >> >> use q query parameter instead of q.alt. 'q' parameter actually uses qf
> >> >> parameters from solrConfig for field boosting.
> >> > 
> 

Re: How to correctly boost results in Solr Dismax query

2009-03-16 Thread Otis Gospodnetic

Also note that we have an open and related issue on Lucene's bug tracking 
system.  omitTf might get renamed so that it's more clear that positional 
information is not stored, which prevents phrase queries.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message -----
> From: "Dean Missikowski (Consultant), CLSA" 
> To: solr-user@lucene.apache.org
> Sent: Monday, March 16, 2009 4:46:32 AM
> Subject: RE: How to correctly boost results in Solr Dismax query
> 
> If you just discovered the omitTf parameter because of this post, please
> be aware that I've not really explained its purpose properly and note
> that using it will prevent phrase queries from working. See this thread
> for clarification on its use:
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/200903.mbox/%3
> c897559.95769...@web50301.mail.re2.yahoo.com%3e
> 
> -- Dean
> 
> -----Original Message-----
> From: Dean Missikowski (Consultant), CLSA 
> Sent: 16/03/2009 10:30 AM
> To: solr-user@lucene.apache.org
> Subject: RE: How to correctly boost results in Solr Dismax query
> 
> Hi,
> 
> My experience is that the BQ parameter can be used with any query type.
> You can define boosts on the query fields (qf) that are used with the
> query terms (q) in your query, AND you can define additional boosts for
> fields that are not used with the query terms through the bq or bf
> parameters. 
> 
> I think the relative weight that assigning a particular boost to a field
> via BQ has on the overall scoring needs to take into consideration the
> other fields in your query. If you're searching on titles, you might
> want to consider setting omitNorms=true (means don't generate length
> normalization vectors) for title in your schema.xml, and if you're using
> Solr 1.4 omitTf=true (means don't generate term frequency vectors), so
> that results aren't skewed by short and long titles, or titles that
> contain multiple occurrences of the same term (setting these requires
> you to reindex). I think this should have the effect of making BQ boosts
> like &bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective. 
> 
> -- Dean
> 

RE: How to correctly boost results in Solr Dismax query

2009-03-16 Thread Pete Smith
Thank you Dean. I thought I was on the right track with BQ but it was
the skewing of results that was frustrating me. I'll try out your
suggestion.

Cheers,
Pete

On Mon, 2009-03-16 at 10:29 +0800, Dean Missikowski (Consultant), CLSA
wrote:
> Hi,
> 
> My experience is that the BQ parameter can be used with any query type.
> You can define boosts on the query fields (qf) that are used with the
> query terms (q) in your query, AND you can define additional boosts for
> fields that are not used with the query terms through the bq or bf
> parameters. 
> 
> I think the relative weight that assigning a particular boost to a field
> via BQ has on the overall scoring needs to take into consideration the
> other fields in your query. If you're searching on titles, you might
> want to consider setting omitNorms=true (means don't generate length
> normalization vectors) for title in your schema.xml, and if you're using
> Solr 1.4 omitTf=true (means don't generate term frequency vectors), so
> that results aren't skewed by short and long titles, or titles that
> contain multiple occurrences of the same term (setting these requires
> you to reindex). I think this should have the effect of making BQ boosts
> like &bq=media:DVD^2&bq=media:BLU-RAY^1.5 more effective. 
> 
> -- Dean
> 

Re: How to correctly boost results in Solr Dismax query

2009-03-17 Thread Chris Hostetter

: bq works only with q.alt query and not with q queries. So, in your case you
: would be using qf parameter for field boosting, you will have to give both
: the fields in qf parameter i.e. both title and media.

FWIW: that statement is false.  The "boost query" (bq) is added to the 
query regardless of whether "q" or "q.alt" is ultimately used.

If you turn on debugQuery=true and look at the debug section of the 
response, you can see exactly what the resulting query is (parsedquery).

Using the example setup, compare the output from these examples...

http://localhost:8983/solr/select/?q.alt=baz&q=solr&defType=dismax&qf=name+cat&bq=foo&debugQuery=true
http://localhost:8983/solr/select/?q.alt=solr&q=&defType=dismax&qf=name+cat&bq=foo&debugQuery=true
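
If it helps to script that comparison, here is a minimal Python sketch that 
builds the same two request URLs. The helper name dismax_debug_url is made up 
for illustration, and the host, handler path and field names are simply the 
example-setup values from the URLs above; adjust them for your own index.

```python
from urllib.parse import urlencode

# Example-setup endpoint taken from the URLs above (an assumption for any other install).
SOLR_SELECT = "http://localhost:8983/solr/select/"


def dismax_debug_url(q, q_alt, bq_clauses, qf="name cat"):
    """Build a dismax request URL with debugQuery=true and one bq per clause.

    dismax_debug_url is a hypothetical helper, not part of Solr or any client library.
    """
    params = [
        ("q", q),
        ("q.alt", q_alt),
        ("defType", "dismax"),
        ("qf", qf),
    ]
    # Repeating the bq parameter adds one boost clause per value.
    params += [("bq", clause) for clause in bq_clauses]
    params.append(("debugQuery", "true"))
    return SOLR_SELECT + "?" + urlencode(params)


# Same two cases as above: "q" supplied, then "q" empty so that q.alt is used.
url_q = dismax_debug_url("solr", "baz", ["foo"])
url_alt = dismax_debug_url("", "solr", ["foo"])
print(url_q)
print(url_alt)
```

Fetch both URLs and compare the parsed query in the debug output; the bq 
clause should show up in each.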


-Hoss



Re: How to correctly boost results in Solr Dismax query

2009-03-17 Thread Chris Hostetter

: Is not particularly helpful. I tried adding adding a bq argument to my
: search: 
: 
: &bq=media:DVD^2
: 
: (yes, this is an index of films!) but I find when I start adding more
: and more:
: 
: &bq=media:DVD^2&bq=media:BLU-RAY^1.5
: 
: I find the negative results - e.g. films that are DVD but are not
: BLU-RAY get negatively affected in their score. In the end it all seems

That shouldn't be happening ... the outermost BooleanQuery (that the 
main "q" and all of the "bq" queries are added to) has its 
"coordFactor" disabled, so documents aren't penalized for not matching bq 
clauses.

What you may be seeing is that the raw numeric score values returned by 
Solr are lower for documents that match "DVD" when you add the "BLU-RAY" 
bq ... that's totally possible, because *absolute* scores from one query 
can't be compared to scores from another query -- what's important is 
that the *relative* order of scores from doc1 and doc2 should be 
consistent (i.e. the score for a doc matching DVD might go down when you 
add the BLU-RAY bq, but the scores for *all* documents not matching 
BLU-RAY should go down some).

The important thing to look for is:
  1) are DVD docs sorting higher than they would without the DVD bq?
  2) are BLU-RAY docs sorting higher than they would without the BLU-RAY bq?
  3) are two docs that are equivalent except for a DVD/BLU-RAY distinction 
 sorting such that the doc with the larger boost (DVD, in your example) 
 comes first?


...the answers to all of those should be yes.  If you're seeing otherwise, 
please post the query toString output for both queries, and the score 
explanations for the docs in question against both queries.
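
To make the absolute-vs-relative point concrete, here is a toy Python model. 
It is a deliberate simplification and NOT Lucene's real formula: it assumes 
each matching clause contributes only its boost, and that the total is 
multiplied by a query normalization factor of 1/sqrt(sum of squared boosts), 
with no coord penalty. The function name toy_scores and the doc labels are 
made up for illustration.

```python
import math


def toy_scores(docs, clauses):
    """Toy scoring: query_norm * sum of boosts of the clauses a doc matches.

    Simplified assumption, not Lucene's actual scoring (no tf, idf or norms).
    """
    # The normalization factor shrinks as more clauses are added to the query.
    query_norm = 1.0 / math.sqrt(sum(w * w for w in clauses.values()))
    return {
        doc_id: query_norm * sum(w for name, w in clauses.items() if name in matched)
        for doc_id, matched in docs.items()
    }


# Two otherwise identical titles; each doc lists the clauses it matches.
docs = {
    "1 (BLU-RAY)": {"title:indiana", "media:BLU-RAY"},
    "2 (DVD)": {"title:indiana", "media:DVD"},
}

one_bq = toy_scores(docs, {"title:indiana": 1.0, "media:DVD": 2.0})
two_bq = toy_scores(
    docs, {"title:indiana": 1.0, "media:DVD": 2.0, "media:BLU-RAY": 1.5}
)

for label, scores in (("DVD bq only", one_bq), ("DVD + BLU-RAY bq", two_bq)):
    ranked = sorted(scores, key=scores.get, reverse=True)
    print(label, {d: round(s, 3) for d, s in scores.items()}, "order:", ranked)
```

In this toy model the DVD doc's raw score drops once the second bq is added, 
yet it still sorts above the BLU-RAY doc, which is the kind of behaviour to 
check for in the real score explanations.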




-Hoss