Re: pruning search result with search score gradient
On Tue, 2011-01-11 at 12:12 +0100, Julien Piquot wrote:
> I would like to be able to prune my search result by removing the less relevant documents. I'm thinking about using the search score: I use the search scores of the document set (I assume they are sorted in descending order), normalise them (0 would be the lowest value and 1 the greatest value) and then calculate the gradient of the normalised scores. The documents with a gradient below a threshold value would be rejected.

As part of experimenting with federated search, this is one approach we'll be trying out to determine which results to discard when merging.

> If the scores are linearly decreasing, then no document is rejected. However, if there is a sharp score drop, then the documents below the drop are rejected.

So if we have the scores 1.0, 0.9, 0.2, 0.15, 0.1, 0.05 then the slopes will be 0.05, 0.4, 0.025, 0.025, 0.025, and with a slope threshold of 0.1 we would discard everything from score 0.2 and below. This only makes sense if the scores are linear with the relevance (a document with score 0.8 is twice as relevant as one with 0.4). I don't know if they are, so experiments must be made, and I fear that this is another demonstration of the inherent problem with quantifying quality.

- Toke
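A minimal sketch of the rule as Toke describes it, assuming the "slope" is simply the drop between consecutive scores and that the list is cut before the first drop larger than the threshold (the function name `cut_index` is hypothetical). With plain consecutive differences the drops come out as 0.1, 0.7, 0.05, 0.05, 0.05 rather than the slopes quoted above, but the cut lands in the same place: only 1.0 and 0.9 survive.

```python
def cut_index(scores, max_drop):
    """Return how many leading scores to keep: everything before the first
    drop between consecutive scores that is larger than max_drop.
    Assumes scores are sorted in descending order."""
    for i in range(1, len(scores)):
        if scores[i - 1] - scores[i] > max_drop:
            return i
    return len(scores)  # no sharp drop found: keep everything


scores = [1.0, 0.9, 0.2, 0.15, 0.1, 0.05]
# Rounded only for display; the comparison in cut_index uses raw floats.
drops = [round(a - b, 3) for a, b in zip(scores, scores[1:])]
print(drops)
# The first drop (0.1) is not larger than 0.1, so the cut happens at the 0.7 drop.
keep = cut_index(scores, max_drop=0.1)
print(scores[:keep])
```

A linearly decreasing list never produces a drop above the threshold, so nothing is rejected, which matches the behaviour described in the thread.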
Re: pruning search result with search score gradient
That's a pretty good idea, using 'delta score'.

Dennis Gearon

In reply to Toke Eskildsen (t...@statsbiblioteket.dk), Thu, January 20, 2011 11:31:48 PM.
Re: pruning search result with search score gradient
What's the use case you're trying to solve? Because if you're still showing results to the user, you're taking information away from them. Where are you expecting to get the list? If you try to return the entire list, you're going to pay the penalty of creating the entire list and transmitting it across the wire, rather than just a page's worth. And if you're paging, the user will do this for you by deciding for herself when she's getting less relevant results. So I don't understand what value you're trying to provide to the end user; perhaps if you elaborate on that, I'll have a more useful response.

Best,
Erick

In reply to Julien Piquot (julien.piq...@arisem.com), Tue, Jan 11, 2011 at 3:12 AM.
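Erick's cost point is about how much has to be built and sent over the wire. If the pruning is done on the client anyway, one way to keep it cheap (a sketch, not something proposed in the thread) is to ask Solr only for the ids and scores of a bounded number of hits, using the standard `fl`, `rows`, `start` and `wt` request parameters. The endpoint URL, the helper name `score_only_query` and the `rows` value are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; replace host, port and core with your own.
SOLR_SELECT = "http://localhost:8983/solr/select"


def score_only_query(q, rows=100):
    """Build a request URL asking Solr for just the ids and scores of the
    top `rows` hits, so a client-side pruning step can inspect the score
    curve without transferring whole documents."""
    params = {
        "q": q,
        "fl": "id,score",  # return only the unique key and the relevance score
        "rows": rows,      # bound the candidate list instead of fetching everything
        "start": 0,
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(score_only_query("pruning gradient", rows=50))
```

Whether a bounded window is acceptable depends on the use case Erick asks about: anything beyond `rows` is never seen by the pruning step.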
Re: pruning search result with search score gradient
The times I've _considered_ trying to do this (but generally decided it wasn't worth it) were when I didn't want the documents below the threshold to show up in the facet values. In my application the facet counts are sometimes very pertinent information, and they are not quite as useful as they could be when they include barely-relevant hits.

In reply to Erick Erickson, 1/12/2011 11:42 AM.
pruning search result with search score gradient
Hi everyone, I would like to be able to prune my search results by removing the less relevant documents. I'm thinking about using the search score: I take the search scores of the document set (I assume they are sorted in descending order), normalise them (0 would be the lowest value and 1 the greatest value) and then calculate the gradient of the normalised scores. The documents with a gradient below a threshold value would be rejected. If the scores are linearly decreasing, then no document is rejected. However, if there is a sharp score drop, then the documents below the drop are rejected. The threshold value would still have to be tuned, but I believe it would make a much stronger metric than an absolute search score. What do you think about this approach? Do you see any problem with it? Are there any SOLR tools that could help me deal with that? Thanks for your answer. Julien
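Julien's proposal can be sketched in Python under a few assumptions: the documents come from a Solr JSON response requested with `fl=id,score` (the response body below is made up), the "gradient" is the drop between consecutive min-max-normalised scores, and the list is cut before the first drop larger than `max_drop`. The function name `prune_by_gradient` and the threshold value are hypothetical.

```python
import json


def prune_by_gradient(docs, max_drop):
    """Keep the leading docs up to the first sharp drop in normalised score.

    docs: list of dicts with a "score" key, sorted by descending score.
    Scores are min-max normalised to [0, 1] and the cut is made before the
    first consecutive drop that is larger than max_drop.
    """
    if len(docs) < 2:
        return list(docs)
    scores = [d["score"] for d in docs]
    hi, lo = scores[0], scores[-1]  # sorted descending: first is max, last is min
    if hi == lo:
        return list(docs)  # flat curve: nothing to prune
    norm = [(s - lo) / (hi - lo) for s in scores]
    for i in range(1, len(norm)):
        if norm[i - 1] - norm[i] > max_drop:
            return docs[:i]
    return list(docs)


# Hypothetical response body for a request with fl=id,score&wt=json
body = """{"response": {"numFound": 6, "start": 0, "maxScore": 4.2, "docs": [
  {"id": "a", "score": 4.2}, {"id": "b", "score": 4.0}, {"id": "c", "score": 3.9},
  {"id": "d", "score": 1.1}, {"id": "e", "score": 1.0}, {"id": "f", "score": 0.9}]}}"""
docs = json.loads(body)["response"]["docs"]
kept = prune_by_gradient(docs, max_drop=0.2)
print([d["id"] for d in kept])
```

One consequence of min-max normalisation worth testing: the lowest score is taken from the fetched window, so the normalised drops, and therefore the cut, change with the number of rows requested.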
Re: pruning search result with search score gradient
Look at Solr function queries; they might help you. - Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/pruning-search-result-with-search-score-gradient-tp2233760p2233773.html Sent from the Solr - User mailing list archive at Nabble.com.
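One way to follow the function-query pointer, which also addresses the facet-count concern raised in the thread, is a two-pass request (a sketch under assumptions, not something proposed by the posters): first fetch ids and scores and find the cut, then re-run the same query with a function-range filter on the score, using the raw score of the last kept document as the lower bound. The endpoint, the helper name `pruned_facet_params`, the facet field `category` and the cutoff 3.9 are hypothetical; `{!frange}` and `query()` are standard Solr function-query features, but check that your Solr version supports them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace host, port and core with your own.
SOLR_SELECT = "http://localhost:8983/solr/select"


def pruned_facet_params(q, min_score, facet_field):
    """Parameters for a second-pass request that keeps only documents whose
    relevance score for q is at least min_score, so that both the result
    list and the facet counts leave out the pruned tail.

    {!frange l=...} is Solr's function-range filter; query($q) evaluates the
    main query (dereferenced from the q parameter) as a function, i.e. its
    score. min_score is assumed to be the raw score of the last document
    kept by a first-pass gradient cut.
    """
    return {
        "q": q,
        "fq": "{!frange l=%s}query($q)" % min_score,
        "facet": "true",
        "facet.field": facet_field,  # hypothetical field name supplied by the caller
        "fl": "id,score",
        "wt": "json",
    }


params = pruned_facet_params("pruning gradient", 3.9, "category")
print(SOLR_SELECT + "?" + urlencode(params))
```

The `frange` lower bound is inclusive by default, so documents tied with the cutoff score are kept too. Because Lucene scores are not comparable across different queries, the cutoff is only meaningful when the second pass uses the same `q` as the first.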