Re: weighted search and index

2010-03-04 Thread Erick Erickson
OK, lights are finally dawning. I think what you want is payloads for
your index-time term boosting, see:
http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
Query-time boosting is as you indicated.

HTH
Erick
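
A minimal sketch of how the per-keyword weights from this thread could be prepared for a payload-style field. It assumes a whitespace-tokenized field whose tokens carry their weight after a `|` delimiter, which is the token format that Solr's DelimitedPayloadTokenFilterFactory is designed to parse. The helper name `to_payload_field`, the lowercasing, and the underscore joining of multi-word keywords are illustrative assumptions, not something from the thread.

```python
def to_payload_field(weights, delimiter="|"):
    """Encode {keyword: weight} as space-separated 'keyword|weight' tokens."""
    tokens = []
    for keyword, weight in weights.items():
        # Assumption: the field is split on whitespace, so a multi-word
        # keyword is joined with underscores to remain a single token.
        token = keyword.strip().lower().replace(" ", "_")
        tokens.append(f"{token}{delimiter}{weight}")
    return " ".join(tokens)


# The two example contents from the thread.
c1 = {"fruit": 0.8, "apple": 0.4, "banana": 0.2}
c2 = {"music": 0.9, "pop song": 0.6, "Britney Spears": 0.4}

print(to_payload_field(c1))  # fruit|0.8 apple|0.4 banana|0.2
print(to_payload_field(c2))  # music|0.9 pop_song|0.6 britney_spears|0.4
```

The resulting string would be sent as the value of the payload field when the document is indexed. Whether the stored weight then influences ranking depends on a payload-aware query and similarity, which this sketch does not cover.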


RE: weighted search and index

2010-03-04 Thread Jianbin Dai
Thanks! Will try it.


Re: weighted search and index

2010-03-04 Thread Erick Erickson
Huh?

On Thu, Mar 4, 2010 at 1:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote:


 : Subject: weighted search and index
 : In-reply-to: 4b8f061b.3080...@gmail.com

 http://people.apache.org/~hossman/#threadhijack
 Thread Hijacking on Mailing Lists

 When starting a new discussion on a mailing list, please do not reply to
 an existing message, instead start a fresh email.  Even if you change the
 subject line of your email, other mail headers still track which thread
 you replied to and your question is hidden in that thread and gets less
 attention.   It makes following discussions in the mailing list archives
 particularly difficult.
 See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



 -Hoss




weighted search and index

2010-03-03 Thread Jianbin Dai
Hi,

I am trying to use solr for a content match application. 

Each content item is described by a set of keywords with associated weights, e.g.:

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

Those contents would be indexed in solr.
In the search, I also have a set of keywords with weights:

Query: Sports 0.8, golf 0.5

I am trying to find the closest matching contents for this query.

My question is how to index the contents with their weights, and how to
write the search query. I tried using boosting, but it does not seem to
work right.

Thanks.

Jianbin




Re: weighted search and index

2010-03-03 Thread Erick Erickson
You have to provide some more details to get meaningful help.

You say "I was trying to use boosting." How? At index time?
Search time? Both? Can you provide some code snippets?
What does your schema look like for the relevant field(s)?

You say "but seems not working right." What does that mean? No hits?
Hits not ordered as you expect? Have you tried putting debugQuery=on on
your URL and examined the return values?
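
As a rough illustration of the debugQuery suggestion, the sketch below builds a select URL with `debugQuery=on` appended. The host, port, and `/solr/select` path are assumptions based on a default local setup and may differ in your installation. The function name `debug_url` is hypothetical.

```python
from urllib.parse import urlencode


def debug_url(base_url, query):
    """Return a Solr select URL for `query` with score explanations enabled."""
    params = {"q": query, "debugQuery": "on"}
    # urlencode percent-escapes characters such as '^' and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


# Assumed base URL; adjust to your own Solr instance.
print(debug_url("http://localhost:8983/solr/select", "sports^0.8 golf^0.5"))
# http://localhost:8983/solr/select?q=sports%5E0.8+golf%5E0.5&debugQuery=on
```

The response to such a request includes a debug section explaining how each hit's score was computed, which is where the effect of a boost can be checked.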

Have you looked at your index with the admin page and/or Luke to see if
the data in the index is as you expect?

As far as I know, boosts are multiplicative, so boosting by a value less
than 1 will actually decrease the ranking. But see the Lucene scoring
documentation:
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

And remember that boosting will *tend* to move a hit up or down in the
ranking, not position it absolutely.
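
A toy model of the multiplicative point, under a loud assumption: this is not Lucene's actual scoring formula (which also involves term frequency, inverse document frequency, and normalization), only a simplification where each matching query term contributes `base * boost`. The names `toy_score`, `docs`, and `boosts` are made up for the illustration.

```python
def toy_score(doc_terms, query_boosts, base=1.0):
    """Toy model: each matching query term contributes base * boost."""
    return sum(base * boost
               for term, boost in query_boosts.items()
               if term in doc_terms)


boosts = {"sports": 0.8, "golf": 0.5}
docs = {
    "A": {"sports", "news"},   # matches only the higher-boosted term
    "B": {"golf", "news"},     # matches only the lower-boosted term
    "C": {"sports", "golf"},   # matches both terms
}

# Rank document names by descending toy score.
ranking = sorted(docs, key=lambda name: toy_score(docs[name], boosts), reverse=True)
print(ranking)  # ['C', 'A', 'B']
```

In this simplified view a boost below 1 shrinks a term's contribution, and the boosts shift the relative order of hits rather than fixing any document at a given position.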

HTH
Erick



RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Thank you very much Erick!

1. I used boosts at search time, but I don't know the best way to apply
them. For Sports 0.8, golf 0.5 in my example, would it be
sports^0.8 AND golf^0.5?
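
One way to assemble such a query string from the weighted keywords is sketched here. It relies on the standard Lucene query syntax `term^boost` and `"a phrase"^boost`. The helper name `boosted_query` is hypothetical, and escaping of special query characters is deliberately left out. Note that AND requires every keyword to be present, while OR also returns documents that match only some of them, which may be closer to a "closest match" use case.

```python
def boosted_query(weights, operator="OR"):
    """Join {keyword: boost} into a Lucene-style boosted query string."""
    clauses = []
    for term, boost in weights.items():
        # Quote multi-word keywords so they are searched as a phrase.
        text = f'"{term}"' if " " in term else term
        clauses.append(f"{text}^{boost}")
    return f" {operator} ".join(clauses)


query = {"sports": 0.8, "golf": 0.5}
print(boosted_query(query))                  # sports^0.8 OR golf^0.5
print(boosted_query(query, operator="AND"))  # sports^0.8 AND golf^0.5
print(boosted_query({"pop song": 0.6}))      # "pop song"^0.6
```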


2. I cannot use boosts at index time, because the weight varies per value,
not per field. Look at this example again:

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

There is no good way to boost it during indexing.

Thanks.

JB



Re: weighted search and index

2010-03-03 Thread Erick Erickson
Then I'm totally lost as to what you're trying to accomplish. Perhaps
a higher-level statement of the problem would help.

Because no matter how often I look at your point 2, I don't see
what relevance the numbers have if you're not using them to
boost at index time. Why are they even there?

Erick


RE: weighted search and index

2010-03-03 Thread Jianbin Dai
Hi Erick,

Each doc contains some keywords that are indexed. However each keyword is
associated with a weight to represent its importance. In my example, 
D1: fruit 0.8, apple 0.4, banana 0.2

The keyword "fruit" is the most important one, which means I really want
it to be matched in a search result; "banana" is less important (though it
would still be good to match it).
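
The ranking described here can be sketched outside Solr as a weighted overlap: each keyword shared by the query and a document contributes the product of the two weights. This is only an illustration of the intended behaviour, not how Solr or Lucene computes scores. The example query and the names `match_score` and `rank` are assumptions.

```python
def match_score(query_weights, doc_weights):
    """Sum of query_weight * doc_weight over the keywords both sides share."""
    return sum(weight * doc_weights[keyword]
               for keyword, weight in query_weights.items()
               if keyword in doc_weights)


def rank(query_weights, docs):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, match_score(query_weights, keywords))
              for name, keywords in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "C1": {"fruit": 0.8, "apple": 0.4, "banana": 0.2},
    "C2": {"music": 0.9, "pop song": 0.6, "Britney Spears": 0.4},
}

# Hypothetical query: cares mostly about "fruit", a little about "music".
for name, score in rank({"fruit": 0.5, "music": 0.2}, docs):
    print(name, round(score, 3))
```

With this query C1 ranks first, because its heavily weighted keyword "fruit" is also the keyword the query cares most about. A document sharing no keyword with the query scores zero.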

Hope that explains.

Thanks.

JB




Re: weighted search and index

2010-03-03 Thread Lance Norskog
By convention, a boost of 1.0 is neutral (no effect). Usually people boost
with numbers like 3 or 5 or 20.
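
If fractional weights such as 0.8 and 0.5 are the starting point, one option is to rescale them so the least important keyword sits at the neutral value 1.0 and the others become boosts above 1. The sketch assumes that only the ratios between the boosts in a query matter for ordering, which is a simplification worth verifying with debugQuery. The helper name `rescale_boosts` is hypothetical.

```python
def rescale_boosts(weights):
    """Scale weights so the smallest becomes 1.0, preserving their ratios."""
    smallest = min(weights.values())
    if smallest <= 0:
        raise ValueError("weights must be positive to be rescaled")
    return {term: round(weight / smallest, 3) for term, weight in weights.items()}


print(rescale_boosts({"sports": 0.8, "golf": 0.5}))
# {'sports': 1.6, 'golf': 1.0}
```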


-- 
Lance Norskog
goks...@gmail.com