Re: weighted search and index
OK, lights are finally dawning. I think what you want is payloads for your index-time term boosting; see:

http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/

Query-time boosting is as you indicated.

HTH
Erick
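For the payload route suggested in this reply, each weight has to travel with its term in the field text. Below is a minimal Python sketch of preparing such a field value. It assumes the field is analyzed with a delimited-payload token filter that uses `|` as the delimiter and a float encoder; the helper name `to_payload_text` and the choice to join multi-word keywords with underscores are illustrative and not from the thread.

```python
def to_payload_text(weights, delimiter="|"):
    """Render {keyword: weight} as 'term|weight' tokens separated by spaces."""
    tokens = []
    for keyword, weight in weights.items():
        # Assumption: multi-word keywords are collapsed into a single token
        # so that a whitespace tokenizer keeps the weight attached to them.
        term = "_".join(keyword.lower().split())
        tokens.append(f"{term}{delimiter}{weight}")
    return " ".join(tokens)


c1 = {"fruit": 0.8, "apple": 0.4, "banana": 0.2}
c2 = {"music": 0.9, "pop song": 0.6, "Britney Spears": 0.4}

print(to_payload_text(c1))  # fruit|0.8 apple|0.4 banana|0.2
print(to_payload_text(c2))  # music|0.9 pop_song|0.6 britney_spears|0.4
```

Storing the payload is only half of the setup. As far as I understand, scoring also has to be told to read payloads at query time, which the linked post covers.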
RE: weighted search and index
Thanks! Will try it.
Re: weighted search and index
Huh?

On Thu, Mar 4, 2010 at 1:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Subject: weighted search and index
: In-reply-to: 4b8f061b.3080...@gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message; instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult.

See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking

-Hoss
weighted search and index
Hi,

I am trying to use Solr for a content-matching application. Each content item is described by a set of keywords with associated weights, e.g.,

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

These items would be indexed in Solr. At search time, I also have a set of keywords with weights:

Query: Sports 0.8, golf 0.5

I am trying to find the closest matching content for this query. My question is how to index the content with weighted scores, and how to write the search query. I was trying to use boosting, but seems not working right.

Thanks.
Jianbin
Re: weighted search and index
You have to provide some more details to get meaningful help. You say "I was trying to use boosting". How? At index time? Search time? Both? Can you provide some code snippets? What does your schema look like for the relevant field(s)?

You say "but seems not working right". What does that mean? No hits? Hits not ordered as you expect? Have you tried putting debugQuery=on on your URL and examined the return values? Have you looked at your index with the admin page and/or Luke to see if the data in the index is as you expect?

As far as I know, boosts are multiplicative, so boosting by a value less than 1 will actually decrease the ranking. But see the Lucene scoring documentation:

http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html

And remember that boosting will *tend* to move a hit up or down in the ranking, not position it absolutely.

HTH
Erick
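To make the debugQuery=on suggestion concrete, here is a small Python sketch that builds a request URL with that parameter added. The base URL `http://localhost:8983/solr/select` and the helper name `debug_query_url` are assumptions for illustration; substitute whatever host and handler your installation uses.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Return a select URL that asks Solr to include scoring explanations."""
    params = {"q": query, "debugQuery": "on"}
    # urlencode escapes characters such as '^' and spaces in the query string.
    return f"{base_url}?{urlencode(params)}"


# Hypothetical local installation; the query is the one from this thread.
url = debug_query_url("http://localhost:8983/solr/select", "sports^0.8 golf^0.5")
print(url)
```

The response to such a request should carry a debug section explaining how each hit was scored, which is the place to check whether a boost had the expected effect.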
RE: weighted search and index
Thank you very much Erick!

1. I used boost in search, but I don't know exactly what the best way to boost is. For Sports 0.8, golf 0.5 in my example, would it be sports^0.8 AND golf^0.5?

2. I cannot use boost in indexing, because the weight varies per value, not per field. Look at this example again:

C1: fruit 0.8, apple 0.4, banana 0.2
C2: music 0.9, pop song 0.6, Britney Spears 0.4

There is no good way to boost it during indexing.

Thanks.
JB
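The query-time form asked about in point 1 (sports^0.8 AND golf^0.5) can be generated from a keyword-to-weight mapping. The following Python sketch shows one way to do it; the function name `boosted_query` and the default of OR are illustrative choices, not something the thread prescribes.

```python
def boosted_query(weights, operator="OR"):
    """Build a Lucene-syntax query string such as 'sports^0.8 OR golf^0.5'."""
    clauses = []
    for keyword, weight in weights.items():
        # Quote multi-word keywords so they are parsed as a single phrase.
        term = f'"{keyword}"' if " " in keyword else keyword
        clauses.append(f"{term}^{weight}")
    return f" {operator} ".join(clauses)


query = {"sports": 0.8, "golf": 0.5}
print(boosted_query(query, operator="AND"))  # sports^0.8 AND golf^0.5
print(boosted_query(query))                  # sports^0.8 OR golf^0.5
```

Note that AND requires every keyword to be present in a document, whereas OR also returns partial matches and lets the boosts influence their order. For a "closest match" use case, OR may be the better fit.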
Re: weighted search and index
Then I'm totally lost as to what you're trying to accomplish. Perhaps a higher-level statement of the problem would help.

Because no matter how often I look at your point 2, I don't see what relevance the numbers have if you're not using them to boost at index time. Why are they even there?

Erick
RE: weighted search and index
Hi Erick,

Each doc contains some keywords that are indexed. However, each keyword is associated with a weight that represents its importance. In my example,

D1: fruit 0.8, apple 0.4, banana 0.2

the keyword fruit is the most important one, which means I really want it to be matched in a search result, while banana is less important (it would still be good to match it, though).

Hope that explains it.

Thanks.
JB
Re: weighted search and index
Boosting by convention is flat at 1.0. Usually people boost with numbers like 3 or 5 or 20.

--
Lance Norskog
goks...@gmail.com
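If you want the weights from this thread to follow the convention above, where 1.0 is the neutral boost, one option is to rescale them so the least important keyword sits at 1.0. This Python sketch is only an illustration of that arithmetic; the helper name `rescale_to_unit_floor` is made up, and whether rescaling changes the final ranking depends on how the scores are normalized.

```python
def rescale_to_unit_floor(weights):
    """Scale weights so the smallest becomes 1.0, keeping their ratios."""
    smallest = min(weights.values())
    if smallest <= 0:
        raise ValueError("weights must be positive to be used as boosts")
    return {keyword: round(weight / smallest, 2) for keyword, weight in weights.items()}


d1 = {"fruit": 0.8, "apple": 0.4, "banana": 0.2}
print(rescale_to_unit_floor(d1))  # {'fruit': 4.0, 'apple': 2.0, 'banana': 1.0}
```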