Re: Index time boosts, payloads, and long query strings

2009-11-23 Thread Girish Redekar
Thanks Erick!

After reading your answer and re-reading the Solr wiki, I realized my
folly. I used to think that index-time boosts, when applied on a per-field
basis, are equivalent to query-time boosts on that field.

To make sure my new understanding is correct, I'll state it in my own words.
Index-time boosts determine the boost for a *document* whenever it is counted
as a hit. Query-time boosts give you control over how much a match on the
query in a specific field counts.

Please correct me if I'm wrong (again) :-)

Girish Redekar
http://girishredekar.net



Re: Index time boosts, payloads, and long query strings

2009-11-23 Thread Erick Erickson
Yep <G>


Re: Index time boosts, payloads, and long query strings

2009-11-22 Thread Girish Redekar
Hi Erick -

Maybe I mis-wrote.

My question is: would title:any_query^4.0 be faster or slower than applying
an index-time boost to the field title? Basically, if I take *every* user query
and search for it in title with a boost (say, 4.0), is that different from
saying that the field title has boost 4.0?

Cheers,
Girish Redekar
http://girishredekar.net





Re: Index time boosts, payloads, and long query strings

2009-11-22 Thread Erick Erickson
I still think they are apples and oranges. If you boost *all* titles,
you're effectively boosting none of them. Index-time boosting
expresses "this document's title is more important than other
document titles". What I think you're after is "titles are more
important than other parts of the document".

For the latter, you're talking about query-time boosting. Boosting only
really makes sense if there are multiple clauses, something
like title:important OR body:unimportant. If that is your situation, speed
is irrelevant; you need correct behavior.

Not that I think you'd notice either way. Modern computers
can do a LOT of FLOPS. Here's an experiment: time
some queries (but beware of timing the very first ones, see
the Wiki) with boosts and without boosts. I doubt you'll see
enough difference to matter (but please do report back if you
do, it'll further my education <G>).
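
The experiment described above could be set up roughly as follows. This is
only a sketch in Python (the language the original poster says the rest of
the project uses). The names time_query, compare and fake_search are
hypothetical; fake_search is a stand-in so the script runs without a server,
and in practice run_query would be whatever function sends the query string
to Solr. The warm-up runs are discarded so that the very first, cold queries
do not distort the numbers.

```python
import statistics
import time


def time_query(run_query, query, warmup=3, repeats=20):
    """Return the median wall-clock seconds taken by run_query(query).

    The first `warmup` calls are executed but not measured.
    """
    for _ in range(warmup):
        run_query(query)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(query)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(run_query, plain, boosted, **kwargs):
    """Time the same search with and without boosts."""
    return {
        "plain": time_query(run_query, plain, **kwargs),
        "boosted": time_query(run_query, boosted, **kwargs),
    }


if __name__ == "__main__":
    # Hypothetical stand-in for a real search call; replace it with a
    # function that actually sends `query` to your Solr instance.
    def fake_search(query):
        return sum(len(term) for term in query.split())

    print(compare(fake_search,
                  "title:solr OR body:solr",
                  "title:solr^4.0 OR body:solr"))
```

The median is used rather than the mean so that a single slow outlier does
not dominate the comparison.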

But, depending on your index structure, you may get titles counting
for more anyway. Generally, matches on shorter fields weigh more
in the score calculations than matches on longer fields. If you have
fields like title and body and you are querying on title:term OR
body:term, documents with term in the title will tend toward
higher scores.
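
To put rough numbers on the short-field effect: Lucene's default similarity
of that era computes a length norm of 1/sqrt(number of terms in the field),
and an index-time boost is multiplied into the same stored value. The Python
sketch below only illustrates that one factor. It ignores term frequency,
idf and the lossy single-byte encoding of the stored norm, and the field
lengths (4 and 400 terms) are made-up examples.

```python
import math


def length_norm(num_terms):
    """Length normalization: shorter fields get a larger factor."""
    return 1.0 / math.sqrt(num_terms)


def field_norm(num_terms, index_boost=1.0):
    """Length norm combined with an (assumed) index-time boost."""
    return index_boost * length_norm(num_terms)


title_norm = field_norm(4)      # a 4-term title   -> 0.5
body_norm = field_norm(400)     # a 400-term body  -> 0.05

print(title_norm, body_norm)
```

With these example lengths, a match in the title already carries a much
larger norm than the same match in the body, before any boost is applied.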

But before putting too much effort into this, do you have any
evidence that the default behavior is unsatisfactory? Because
unless and until you do, I think this is a distraction <G>...

Best
Erick




Re: Index time boosts, payloads, and long query strings

2009-11-21 Thread Erick Erickson
I'll take a whack at index vs. query boosting. They express very
different concepts. Let's say we're interested in boosting the title
field.

Index-time boosting expresses "this document's title is X times more
important than a normal document title". It doesn't matter *what* the title
is: any query that matches on anything in this document's title will give this
document a boost. I might use this to give preferential treatment to all
encyclopedia entries or something.

Query-time boosting, like title:solr^4.0, expresses "any document with solr
in its title is more important than documents without solr in the title".
This really only makes sense if you have other clauses that might cause a
document *without* solr in the title to match.

Since they are doing different things, efficiency isn't really relevant.
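
As an illustration of a query-time boost that has a second clause to compete
with, here is a small Python sketch that builds title:solr^4.0 OR body:solr
and URL-encodes it as a q parameter. The helper names boosted_clause and
either_field are hypothetical, the body field is an assumed example, and the
term is not escaped here, so it is only suitable for plain words.

```python
from urllib.parse import urlencode


def boosted_clause(field, term, boost=None):
    """Render field:term, with ^boost appended when a boost is given."""
    clause = f"{field}:{term}"
    if boost is not None:
        clause += f"^{boost}"
    return clause


def either_field(term, boosts):
    """OR together one clause per field; `boosts` maps field -> boost or None."""
    return " OR ".join(
        boosted_clause(field, term, boost) for field, boost in boosts.items()
    )


query = either_field("solr", {"title": 4.0, "body": None})
params = urlencode({"q": query})  # ready to append to a select URL

print(query)
print(params)
```

Documents that match only in body still match; the boost just ranks title
matches above them, which is the behavior the boost is meant to express.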

HTH
Erick





Index time boosts, payloads, and long query strings

2009-11-20 Thread Girish Redekar
Hi,

I'm relatively new to Solr/Lucene, and am using Solr (and not Lucene
directly) primarily because I can use it without writing Java code (the rest
of my project is written in Python).

My application has the following requirements:
(a) The ability to search over multiple fields, each with a different weight
(b) If possible, the ability to add extra/diminished weights to particular
tokens within a field
(c) My query strings are long (50-100 words)
(d) My index has 500K+ documents

1) The way to do (a) is field boosting (right?). My question is: is all field
boosting done at query time, even if I give index-time boosts to fields? Is
there a performance advantage in boosting fields at index time vs. using
something like fieldname:querystring^boost?
2) From what I've read, it seems that I can do (b) using payloads. However,
as this link
(http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/)
suggests, I will have to write a payload-aware query parser. I wanted to
confirm whether this is indeed the case, or whether there is an out-of-the-box
way to implement payloads (I am using Solr 1.4).
3) For my project, the user fills in multiple text boxes (for each query). I
combine these into a single query (with different treatment for the contents
of each text box). Consequently, my query looks something like
(fieldname1: queryterm1 queryterm2^2.0 queryterm3^3.0 +queryterm4)^1.0
Are there any guidelines for improving the performance of such a system?
(Sorry, this bit is vague.)
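
For concreteness, the query construction described in (3) might look like
the Python sketch below. It is an illustration under assumptions, not the
poster's actual code: the helper names escape, box_clause and combine are
hypothetical, each text box is assumed to map to one field, and the terms
are grouped as fieldname:(...) rather than with the exact spacing shown
above. The escaped characters are the ones the Lucene query syntax treats as
special.

```python
import re

# Characters (and the operators && and ||) that have special meaning in the
# Lucene query syntax and need a backslash when they occur in user input.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\]|&&|\|\|)')


def escape(term):
    """Backslash-escape query-syntax characters in a user-supplied term."""
    return _SPECIAL.sub(r"\\\1", term)


def box_clause(field, terms, box_boost=1.0):
    """Build one clause from one text box.

    `terms` is an iterable of (text, boost, required) tuples.
    """
    parts = []
    for text, boost, required in terms:
        part = escape(text)
        if boost != 1.0:
            part += f"^{boost}"
        if required:
            part = "+" + part
        parts.append(part)
    return f"{field}:({' '.join(parts)})^{box_boost}"


def combine(clauses):
    """Join the per-box clauses into a single query string."""
    return " ".join(clauses)


box1 = [
    ("queryterm1", 1.0, False),
    ("queryterm2", 2.0, False),
    ("queryterm3", 3.0, False),
    ("queryterm4", 1.0, True),
]
print(combine([box_clause("fieldname1", box1)]))
```

Escaping matters here because with 50-100 user-typed words per query, a
stray colon or parenthesis is likely to show up sooner or later.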

Any help with this would be great!

Girish Redekar
http://girishredekar.net