Re: Query ReRanking question
the results become irrelevant, which doesn't make sense. So can you kindly explain what's going on in the following query?

http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date desc&fl=headline,publish_date,score

I love the Solr community, so much to learn from so many knowledgeable people.

Thanks
Ravi Kiran Bhaskar

On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson erickerick...@gmail.com wrote:

OK, why can't you switch the clauses from Joel's suggestion? Something like:

q=Malaysia plane crash&rq={!rerank reRankDocs=1000 reRankQuery=$myquery}&myquery=*:*&sort=date+desc

(haven't tried this yet, but you get the idea).

Best,
Erick

On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma markus.jel...@openindex.io wrote:

Hi - You can already achieve this by boosting on the document's recency. The result set won't be exactly ordered by date but you will get the most relevant and recent documents on top.

Markus

-Original message-
From: Ravi Solr ravis...@gmail.com
Sent: Friday 5th September 2014 18:06
To: solr-user@lucene.apache.org
Subject: Re: Query ReRanking question

Thank you very much for responding. I want to do exactly the opposite of what you said. I want to sort the relevant docs in reverse chronological order. If you sort by date beforehand, the relevancy is lost. So I want to get the top N relevant results and then rerank those top N to achieve relevant, reverse-chronological results.

If you ask why I would want to do that, let's take an example: the Malaysian airline crash. Several articles might have been published over a period of time. When I search for - malaysia airline crash blackbox - I would want to see relevant results but would also like to see the recent developments on top, i.e. effectively a reverse chronological order within the relevant results, like telling a story over a period of time.

Hope I am clear. Thanks for your help.

Thanks
Ravi Kiran Bhaskar

On Thu, Sep 4, 2014 at 5:08 PM, Joel Bernstein joels...@gmail.com wrote:

If you want the main query to be sorted by date then the top N docs reranked by a query, that should work. Try something like this:

q=foo&sort=date+desc&rq={!rerank reRankDocs=1000 reRankQuery=$myquery}&myquery=blah

Joel Bernstein
Search Engineer at Heliosearch

On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr ravis...@gmail.com wrote:

Can the ReRanking API be used to sort within docs retrieved by a date field? Can somebody help me understand how to write such a query?

Thanks
Ravi Kiran Bhaskar
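
The request parameters discussed in this thread are easy to mangle when typed into a URL by hand (the rq value contains braces, "$", "!" and spaces). As a minimal sketch, assuming the host, handler path and field names from Ravi's own query, the URL could be assembled with Python's standard library so that each parameter is encoded and separated correctly. The helper name build_rerank_url is made up for this illustration and is not part of Solr.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_rerank_url(base_url, user_query, rerank_docs=1000):
    """Build the select URL from the thread with every parameter URL-encoded.

    build_rerank_url is a hypothetical helper; the parameter names and
    values (q, rq, rqq, sort, fl) are copied from the query in the thread.
    """
    params = [
        ("q", user_query),
        # The rerank parser reads its reRankQuery from the rqq parameter.
        ("rq", "{!rerank reRankQuery=$rqq reRankDocs=%d}" % rerank_docs),
        ("rqq", "*:*"),
        ("sort", "publish_date desc"),
        ("fl", "headline,publish_date,score"),
    ]
    return base_url + "?" + urlencode(params)


url = build_rerank_url("http://localhost:8080/solr/select",
                       "malaysian airline crash")
print(url)

# Decoding the query string again recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["rq"][0])
```

Changing rerank_docs here is the same experiment Ravi describes when he lowers reRankDocs from 1000 to 100.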
Re: Query ReRanking question
Joel and Erick,

Thank you very much for explaining how the ReRanking works. Now it's a bit more clear.

Thanks,
Ravi Kiran Bhaskar
Re: Query ReRanking question
Joel:

I find that whenever I say something totally wrong publicly, I remember the correction really, really well... Thanks for straightening that out!

Erick

On Sat, Sep 6, 2014 at 12:58 PM, Joel Bernstein joels...@gmail.com wrote:

The following query:

http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date desc&fl=headline,publish_date,score

is doing the following: the main query is sorted by publish_date. Then the results are reranked by *:*, which in theory would have no effect at all.

The reranker only uses the reRankQuery to re-rank the results. The sort param will always apply to the main query.

Joel Bernstein
Search Engineer at Heliosearch
Re: Query ReRanking question
Ok, just reviewed the code. The ReRankQParserPlugin always tracks the scores from the main query. So this explains things. Speaking of explaining things, the ReRankQParserPlugin also works with Lucene's explain. So if you use debugQuery=true we should see that the score from the initial query was combined with the score from the reRankQuery, which should be 1.

You have stumbled on an interesting usage pattern which I never considered. But basically what's happening is:

1) Main query is sorted by score.
2) Reranker is reRanking docs based on the score from the main query.

No worries, Erick, you've taught me a lot over the past couple of years!

Joel Bernstein
Search Engineer at Heliosearch
Re: Query ReRanking question
Oops, wrong usage pattern. It should be:

1) Main query is sorted by a field (scores tracked silently in the background).
2) Reranker is reRanking docs based on the score from the main query.

Joel Bernstein
Search Engineer at Heliosearch
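
Joel's corrected description (main query sorted by a field with scores tracked in the background, then the reranker reordering the top reRankDocs by score) can be modeled with a few lines of plain Python. This is only a toy simulation under stated assumptions, not Solr's code: the documents and scores are invented, the rerank score is taken to be a constant 1 for rqq=*:* as Joel says, and the additive combination with a weight of 2.0 is an assumption about how the two scores are merged.

```python
def simulate_rerank(docs, rerank_docs, rerank_weight=2.0):
    """Toy model of 'sort by field, then rerank the top N by score'.

    Assumptions (not taken from Solr source): the combined score is
    main_score + rerank_weight * rerank_score, and rqq=*:* gives every
    document the same rerank score of 1.0.
    """
    # Step 1: the sort param applies to the main query (newest first).
    by_date = sorted(docs, key=lambda d: d["publish_date"], reverse=True)

    # Step 2: only the first rerank_docs of that ordering are reranked.
    window, rest = by_date[:rerank_docs], by_date[rerank_docs:]

    # A constant rerank score shifts every doc equally, so the order inside
    # the window is decided by the tracked main-query score alone.
    rerank_score = 1.0
    window = sorted(
        window,
        key=lambda d: d["score"] + rerank_weight * rerank_score,
        reverse=True,
    )
    return window + rest


# Invented sample data: ISO dates sort correctly as strings.
docs = [
    {"id": "a", "publish_date": "2014-03-08", "score": 9.0},
    {"id": "b", "publish_date": "2014-07-17", "score": 8.0},
    {"id": "c", "publish_date": "2014-09-01", "score": 1.5},
    {"id": "d", "publish_date": "2014-09-05", "score": 0.7},
    {"id": "e", "publish_date": "2014-08-20", "score": 3.0},
]

wide = [d["id"] for d in simulate_rerank(docs, rerank_docs=5)]
narrow = [d["id"] for d in simulate_rerank(docs, rerank_docs=2)]
print("window covers all docs:", wide)
print("window covers 2 docs:  ", narrow)
```

In this model a window that covers every match ends up ordered by relevance, while a small window is filled with the newest documents regardless of relevance. That would be consistent with Ravi's observation that lowering reRankDocs from 1000 to 100 made the results look irrelevant.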
Re: Query ReRanking question
You can probably use the FunctionQParserPlugin in conjunction with Query ReRanking to achieve what you're trying to do.

q=foo&rq={!rerank reRankDocs=1000 reRankQuery=$qq}&qq={!func}someFunction()

What this is going to do is rerank the docs based on a function query. Your function query will need to return a float because the query reranker is expecting a score, which is a float. So you'll have to devise function query logic that will transform your date to a float.

Joel Bernstein
Search Engineer at Heliosearch

On Fri, Sep 5, 2014 at 7:06 PM, Ravi Solr ravis...@gmail.com wrote:

Walter, thank you for the valuable insight. The problem I am facing is that between the term frequencies, mm, date boost and stemming, the results can become very inconsistent. Look at the following examples.

Here the chronology is all over the place because of what I mentioned above:

http://www.washingtonpost.com/pb/newssearch/?query=malaysian+airline+crash

Now take the instance of an old topic/news which was covered a while ago for a period of time but not actively updated recently. In this case, the date boosting predominantly takes over because of common terms and we get a rash of irrelevant content:

http://www.washingtonpost.com/pb/newssearch/?query=faces+of+the+fallen

This has become such a balancing act, and hence I was looking to see if reRanking might help.

Thanks
Ravi Kiran Bhaskar

On Fri, Sep 5, 2014 at 1:32 PM, Walter Underwood wun...@wunderwood.org wrote:

Boosting on recency is probably a better approach. A fixed re-ranking horizon will always be a compromise, a guess at the precision of the query. It will give poor results for queries that are more or less specific than the assumption.

Think of the recency boost as a tie-breaker. When documents are similar in relevance, show the most recent. This can work over a wide range of queries.

For “malaysian airlines crash”, there are two sets of relevant documents, one set on MH 370 starting six months ago, and one set on MH 17, two months ago. But four hours ago, The Guardian published a “six months on” article on MH 370. A recency boost will handle that complexity.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
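
Joel leaves someFunction() open. One candidate, offered here only as an assumption and not as something anyone in the thread proposed, is the recency function commonly shown in the Solr function query documentation: recip(ms(NOW,publish_date),3.16e-11,1,1), where recip(x,m,a,b) is defined as a/(m*x+b) and 3.16e-11 is roughly one over the number of milliseconds in a year. The sketch below reproduces that arithmetic in Python to show what float each document age would map to; publish_date is the field name from Ravi's query, and the reference date is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


def recency_score(publish_date, now):
    """Float that decays with document age (about 0.5 after one year)."""
    age_ms = (now - publish_date).total_seconds() * 1000.0
    return recip(age_ms, 3.16e-11, 1.0, 1.0)


# Arbitrary reference time for the illustration.
now = datetime(2014, 9, 5, tzinfo=timezone.utc)

for days in (0, 60, 180, 365):
    published = now - timedelta(days=days)
    print(days, "days old ->", round(recency_score(published, now), 3))

# Request parameters that would plug this function into Joel's pattern
# (assumed field name publish_date; untested against a live Solr).
rerank_params = {
    "q": "foo",
    "rq": "{!rerank reRankDocs=1000 reRankQuery=$qq}",
    "qq": "{!func}recip(ms(NOW,publish_date),3.16e-11,1,1)",
}
```

Because the function value stays between 0 and 1 while relevance scores can be much larger, it behaves more like the tie-breaker Walter describes than like a strict date sort.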
Re: Query ReRanking question
Erick,

Your idea about reversing Joel's suggestion seems to give the best results of all the options I tried, but I can't seem to understand why. I thought the query shown below should give irrelevant results, as sorting by date would throw relevancy off, but somehow it's getting relevant results with fair enough reverse chronology. It is as if the sort is applied after the docs are collected and reranked (which is what I wanted). One more thing that baffled me was that if I change reRankDocs from 1000 to 100 the results become irrelevant, which doesn't make sense. So can you kindly explain what's going on in the following query?

http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date desc&fl=headline,publish_date,score

I love the Solr community, so much to learn from so many knowledgeable people.

Thanks
Ravi Kiran Bhaskar
Re: Query ReRanking question
Ravi:

bq: It is as if the sort is applied after the docs are collected

Exactly. The primary query gets the top 1,000 documents ranked by relevance, then sends those through the reranking query, i.e. sorts them by date. I do question whether you really want 1,000 docs to be re-ranked by date; perhaps a smaller number of docs would give better results, but that's for you to decide.

If I understand it correctly, conceptually reranking goes like this in your example:

1. Execute the first query with rows=1000, as:

   q=malaysian airline crash&rows=1000&fl=id

2. Now form a big OR clause in a filter query of all the docs returned in step 1, like

   fq=id:(id1 OR id2 OR id45 OR ...)

   and append it to the reranking query, as:

   q=*:*&sort=publish_date desc&fl=headline,publish_date,score&fq=id:(id1 OR id2 OR id45 OR ...)

I'm sure Joel will correct me if I'm wrong here. And of course the code is much more efficient than this, but that's the idea, I think.

Best,
Erick
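The two conceptual steps in Erick's message can be sketched as plain request parameters. This is only an illustration of the mental model he describes, not of how Solr executes a rerank internally; the function names are made up for the example, and the field names (id, publish_date, headline) are the ones used in the thread.

```python
from urllib.parse import urlencode


def first_pass_params(user_query, n):
    """Step 1: fetch only the ids of the top-n documents by relevance."""
    return {"q": user_query, "rows": n, "fl": "id"}


def second_pass_params(ids):
    """Step 2: match everything, restrict to the ids from step 1, sort by date."""
    id_filter = "id:(" + " OR ".join(ids) + ")"
    return {
        "q": "*:*",
        "fq": id_filter,
        "sort": "publish_date desc",
        "fl": "headline,publish_date,score",
    }


step1 = first_pass_params("malaysian airline crash", 1000)
# The ids below are placeholders standing in for whatever step 1 returned.
step2 = second_pass_params(["id1", "id2", "id45"])

print(urlencode(step1))
print(urlencode(step2))
```

Running two separate requests like this would be far slower than a built-in rerank; the sketch only makes the "collect first, reorder second" idea concrete.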
Re: Query ReRanking question
If you are using “bq”, that is a problem. An additive boost does not work well. If items are very popular, it overrides everything; if items are not so popular, it does nothing.

You need to use “boost” in edismax, a multiplicative boost. That works regardless of the magnitudes.

Example from my time at Netflix several years ago:

The query is “twilight zone”. The movie “Twilight” is massively popular (in the queues of 1.2 million subscribers), so an additive boost puts that movie above all the “Twilight Zone” matches. With a multiplicative boost, that doesn’t happen.

The query is “lord of the rings”. There are small differences between the popularity of the recent movies, the extended edition movies, and the ancient animated version. With an additive boost, those are not enough to make a difference. With a multiplicative boost, they are.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
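The “twilight zone” case can be worked through with a few numbers. Everything below is hypothetical: the relevance and popularity values are invented to show the effect Walter describes, and the two functions are simplified stand-ins for an additive boost (bq-style) and a multiplicative boost (edismax boost-style), not Solr's actual scoring.

```python
def additive(relevance, popularity):
    """Additive boost: popularity is simply added to the relevance score."""
    return relevance + popularity


def multiplicative(relevance, popularity):
    """Multiplicative boost: popularity scales the relevance score."""
    return relevance * popularity


# Invented scores for the query "twilight zone".
# "Twilight" matches only one term but is hugely popular.
docs = {
    "Twilight": {"relevance": 1.0, "popularity": 8.0},
    "The Twilight Zone": {"relevance": 5.0, "popularity": 2.0},
}


def rank(combine):
    """Return titles ordered best-first under the given combining function."""
    return sorted(
        docs,
        key=lambda title: combine(docs[title]["relevance"], docs[title]["popularity"]),
        reverse=True,
    )


print("additive:      ", rank(additive))
print("multiplicative:", rank(multiplicative))
```

With these numbers the additive scores are 9.0 and 7.0, so the popular partial match wins, while the multiplicative scores are 8.0 and 10.0, so the full match wins.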
Re: Query ReRanking question
The following query:

http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date desc&fl=headline,publish_date,score

is doing the following:

The main query is sorted by publish_date. Then the results are reranked by *:*, which in theory would have no effect at all.

The rerank uses only the reRankQuery to re-rank the results. The sort param always applies to the main query.

Joel Bernstein
Search Engineer at Heliosearch
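A small sketch of building the same request with its parameters kept separate and URL-encoded, which avoids any ambiguity about where one parameter ends and the next begins. The host, port, and field names are the ones quoted in the thread; nothing here contacts a server.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Parameters exactly as discussed in the thread, one per entry.
params = [
    ("q", "malaysian airline crash"),
    ("rq", "{!rerank reRankQuery=$rqq reRankDocs=1000}"),
    ("rqq", "*:*"),
    ("sort", "publish_date desc"),  # applies to the main query, per Joel
    ("fl", "headline,publish_date,score"),
]

url = "http://localhost:8080/solr/select?" + urlencode(params)
print(url)

# Decoding the query string again shows each parameter survived intact.
decoded = parse_qs(urlsplit(url).query)
print(decoded["rq"][0])
```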
Re: Query ReRanking question
What may be happening here:

http://localhost:8080/solr/select?q=malaysian airline crash&rq={!rerank reRankQuery=$rqq reRankDocs=1000}&rqq=*:*&sort=publish_date desc&fl=headline,publish_date,score

Because the fl is requesting the score, possibly the scores are being tracked in the initial query even though it is being sorted by publish_date. Then during the rerank phase the initial score is being combined with the *:* score, which will be 1. So the effect would be to rerank the docs by the scores from the main query.

One way to prove this would be to remove the score from the fl param and see if this changes the result ordering.

Joel Bernstein
Search Engineer at Heliosearch
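Joel's hypothesis can be simulated in a few lines. The sketch assumes, as I understand the rerank feature, that the final score of each of the top reRankDocs documents is the original score plus reRankWeight times the reRankQuery score; the weight of 2.0 and the constant *:* score of 1.0 are assumptions for illustration, and the documents are invented. It would also explain why shrinking reRankDocs from 1000 to 100 made the results look irrelevant: only the most recent handful would be reordered by relevance.

```python
def rerank(docs, rerank_docs, rerank_weight=2.0, match_all_score=1.0):
    """Sort by date, then reorder only the first rerank_docs by combined score."""
    by_date = sorted(docs, key=lambda d: d["date"], reverse=True)
    head, tail = by_date[:rerank_docs], by_date[rerank_docs:]
    # Adding the same constant to every score leaves the relevance order intact.
    rescored = sorted(
        head,
        key=lambda d: d["score"] + rerank_weight * match_all_score,
        reverse=True,
    )
    return [d["id"] for d in rescored + tail]


# Hypothetical documents: ISO dates sort correctly as strings.
docs = [
    {"id": "a", "date": "2014-09-06", "score": 0.2},
    {"id": "b", "date": "2014-09-05", "score": 0.9},
    {"id": "c", "date": "2014-07-18", "score": 0.5},
    {"id": "d", "date": "2014-03-09", "score": 1.5},
]

print(rerank(docs, rerank_docs=4))  # whole set reordered by relevance
print(rerank(docs, rerank_docs=2))  # only the two newest docs reordered
```

With a window covering every document the output is purely relevance-ordered; with a window of two, the older but more relevant documents stay at the bottom.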
Re: Query ReRanking question
Joel, that was exactly what I was thinking too, which is why I wanted to know the explanation. Anyway, I will modify the fl and report back. This is getting interesting :-)

Thanks

Ravi Kiran Bhaskar
Re: Query ReRanking question
Thank you very much for responding. I want to do exactly the opposite of what you said: I want to sort the relevant docs in reverse chronological order. If you sort by date beforehand, the relevancy is lost. So I want to get the top N relevant results and then rerank those top N to achieve relevant, reverse-chronological results.

If you ask why I would want to do that, let's take the example of the Malaysian airline crash. Several articles might have been published over a period of time. When I search for "malaysia airline crash blackbox", I want to see relevant results but would also like to see the recent developments on top, i.e. effectively a reverse chronological order within the relevant results, like telling a story over a period of time.

Hope I am clear. Thanks for your help.

Thanks

Ravi Kiran Bhaskar

On Thu, Sep 4, 2014 at 5:08 PM, Joel Bernstein joels...@gmail.com wrote:

If you want the main query to be sorted by date then the top N docs reranked by a query, that should work. Try something like this:

q=foo&sort=date+desc&rq={!rerank reRankDocs=1000 reRankQuery=$myquery}&myquery=blah

Joel Bernstein
Search Engineer at Heliosearch

On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr ravis...@gmail.com wrote:

Can the ReRanking API be used to sort within docs retrieved by a date field? Can somebody help me understand how to write such a query?

Thanks

Ravi Kiran Bhaskar
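The behaviour Ravi is asking for can be stated precisely in a few lines: keep the N most relevant documents, then order just those by date. The sketch below works on an in-memory list with invented ids, dates, and scores; it describes the desired result, not a Solr feature or its implementation.

```python
def top_n_then_by_date(docs, n):
    """Keep the n highest-scoring docs, then order them newest first."""
    most_relevant = sorted(docs, key=lambda d: d["score"], reverse=True)[:n]
    newest_first = sorted(most_relevant, key=lambda d: d["publish_date"], reverse=True)
    return [d["id"] for d in newest_first]


# Hypothetical articles; ISO dates sort correctly as strings.
docs = [
    {"id": "blackbox-found", "publish_date": "2014-04-05", "score": 3.1},
    {"id": "crash-report", "publish_date": "2014-09-01", "score": 2.7},
    {"id": "airline-stock", "publish_date": "2014-09-04", "score": 0.4},
    {"id": "search-begins", "publish_date": "2014-03-10", "score": 2.9},
]

print(top_n_then_by_date(docs, n=3))
```

The newest article, airline-stock, is left out because it falls outside the top three by relevance, which is the point: recency only reorders documents that were relevant enough to make the cut.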
RE: Query ReRanking question
Hi - You can already achieve this by boosting on the document's recency. The result set won't be exactly ordered by date, but you will get the most relevant and recent documents on top.

Markus
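One common way to express a recency boost is a multiplicative function boost with the edismax parser, using the recip and ms functions. The sketch below only builds the request parameters; the field name publish_date comes from the thread, while the constants in the recip call are the commonly cited "about one year" values and would need tuning for a real index. It assumes the default search fields are configured on the server.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical request: multiplicative recency boost on top of a relevance query.
params = {
    "defType": "edismax",
    "q": "malaysian airline crash",
    # recip(x, m, a, b) = a / (m * x + b), with x = document age in milliseconds
    "boost": "recip(ms(NOW/HOUR,publish_date),3.16e-11,1,1)",
    "fl": "headline,publish_date,score",
}

query_string = urlencode(params)
print(query_string)

# Decoding confirms the function survives URL encoding unchanged.
decoded = parse_qs(query_string)
print(decoded["boost"][0])
```

Rounding NOW to the hour is a choice made here so that repeated requests within the same hour produce the same boost expression.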
Re: Query ReRanking question
OK, why can't you switch the clauses from Joel's suggestion? Something like:

q=Malaysia plane crash&rq={!rerank reRankDocs=1000 reRankQuery=$myquery}&myquery=*:*&sort=date+desc

(haven't tried this yet, but you get the idea).

Best,
Erick
Re: Query ReRanking question
Boosting on recency is probably a better approach. A fixed re-ranking horizon will always be a compromise, a guess at the precision of the query. It will give poor results for queries that are more or less specific than the assumption.

Think of the recency boost as a tie-breaker. When documents are similar in relevance, show the most recent. This can work over a wide range of queries.

For “malaysian airlines crash”, there are two sets of relevant documents, one set on MH 370 starting six months ago, and one set on MH 17, two months ago. But four hours ago, The Guardian published a “six months on” article on MH 370. A recency boost will handle that complexity.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
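The tie-breaker idea can be checked with a little arithmetic. The sketch assumes a reciprocal decay of the form a / (m * x + b) applied multiplicatively to the relevance score, with x the document age in milliseconds and constants chosen so the boost is roughly halved after a year; the relevance values are invented.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Reciprocal decay: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_days):
    """Boost factor for a document of the given age; 1.0 when brand new."""
    return recip(age_days * MS_PER_DAY, 3.16e-11, 1.0, 1.0)


def boosted(relevance, age_days):
    """Multiplicative combination of relevance and recency."""
    return relevance * recency_boost(age_days)


# Equal relevance: the newer document wins the tie.
print(boosted(5.0, age_days=0), boosted(5.0, age_days=180))

# A clearly more relevant six-month-old document still beats a fresh, weaker one.
print(boosted(10.0, age_days=180), boosted(5.0, age_days=0))
```

How quickly recency overtakes relevance depends entirely on the constants, which is where the balancing act starts for older topics.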
Re: Query ReRanking question
Erick, I believe when you apply sort this way it runs the query and sort first and then tries to rerank...so basically it already lost the true relevancy because of sort taking precedence. Am I making sense ? Ravi Kiran Bhaskar On Fri, Sep 5, 2014 at 1:23 PM, Erick Erickson erickerick...@gmail.com wrote: OK, why can't you switch the clauses from Joel's suggestion? Something like: q=Malaysia plane crashrq={!rerank reRankDocs=1000 reRankQuery=$myquery}myquery=*:*sort=date+desc (haven't tried this yet, but you get the idea). Best, Erick On Fri, Sep 5, 2014 at 9:33 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi - You can already achieve this by boosting on the document's recency. The result set won't be exactly ordered by date but you will get the most relevant and recent documents on top. Markus -Original message- From:Ravi Solr ravis...@gmail.com mailto:ravis...@gmail.com Sent: Friday 5th September 2014 18:06 To: solr-user@lucene.apache.org mailto:solr-user@lucene.apache.org Subject: Re: Query ReRanking question Thank you very much for responding. I want to do exactly the opposite of what you said. I want to sort the relevant docs in reverse chronology. If you sort by date before hand then the relevancy is lost. So I want to get Top N relevant results and then rerank those Top N to achieve relevant reverse chronological results. If you ask Why would I want to do that ?? Lets take a example about Malaysian airline crash. several articles might have been published over a period of time. When I search for - malaysia airline crash blackbox - I would want to see relevant results but would also like to see the the recent developments on the top i.e. effectively a reverse chronological order within the relevant results, like telling a story over a period of time Hope i am clear. Thanks for your help. 
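The ordering asked for above (take the top N documents by relevance, then order only those N newest first) can be pictured outside Solr. The following is a minimal Python sketch, assuming each result is a dict carrying the `score` and `publish_date` fields from the `fl` list in the query earlier in the thread. The function name `top_n_then_newest_first` and the sample documents are hypothetical and invented for illustration; this is not how the rerank parser works internally, only a picture of the desired result order.

```python
from datetime import date


def top_n_then_newest_first(docs, n):
    """Keep the n highest-scoring docs, then order those newest first."""
    # Step 1: relevance cut. Sort by score descending and keep the top n.
    most_relevant = sorted(docs, key=lambda d: d["score"], reverse=True)[:n]
    # Step 2: reorder only the survivors by publish date, newest first.
    return sorted(most_relevant, key=lambda d: d["publish_date"], reverse=True)


# Hypothetical sample results; scores and dates are invented for illustration.
docs = [
    {"headline": "A", "score": 9.1, "publish_date": date(2014, 3, 9)},
    {"headline": "B", "score": 8.7, "publish_date": date(2014, 7, 18)},
    {"headline": "C", "score": 0.4, "publish_date": date(2014, 9, 5)},
    {"headline": "D", "score": 7.9, "publish_date": date(2014, 9, 4)},
]

ordered = top_n_then_newest_first(docs, n=3)
print([d["headline"] for d in ordered])
```

With n=3 the newest but barely relevant document C is cut before the date ordering is applied, which is the difference from sorting every match by date. The value of n is a judgment call and would need tuning per application.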
Re: Query ReRanking question
Walter, thank you for the valuable insight. The problem I am facing is that between the term frequencies, mm, date boost and stemming, the results can become very inconsistent. Look at the following examples.

Here the chronology is all over the place because of what I mentioned above:
http://www.washingtonpost.com/pb/newssearch/?query=malaysian+airline+crash

Now take the instance of an old topic/news item which was covered a while ago for a period of time but has not been actively updated recently. In this case, the date boosting predominantly takes over because of common terms and we get a rash of irrelevant content:
http://www.washingtonpost.com/pb/newssearch/?query=faces+of+the+fallen

This has become such a balancing act, and hence I was looking to see if reRanking might help.

Thanks
Ravi Kiran Bhaskar

On Fri, Sep 5, 2014 at 1:32 PM, Walter Underwood wun...@wunderwood.org wrote:

Boosting on recency is probably a better approach. A fixed re-ranking horizon will always be a compromise, a guess at the precision of the query. It will give poor results for queries that are more or less specific than the assumption.

Think of the recency boost as a tie-breaker. When documents are similar in relevance, show the most recent. This can work over a wide range of queries.

For “malaysian airlines crash”, there are two sets of relevant documents, one set on MH 370 starting six months ago, and one set on MH 17, two months ago. But four hours ago, The Guardian published a “six months on” article on MH 370. A recency boost will handle that complexity.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Sep 5, 2014, at 10:23 AM, Erick Erickson erickerick...@gmail.com wrote:

OK, why can't you switch the clauses from Joel's suggestion? Something like:

q=Malaysia plane crash&rq={!rerank reRankDocs=1000 reRankQuery=$myquery}&myquery=*:*&sort=date+desc

(haven't tried this yet, but you get the idea).
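The recency boost that Markus and Walter recommend is commonly written in Solr as a multiplicative function-query boost such as `recip(ms(NOW,publish_date),3.16e-11,1,1)`, where `recip(x,m,a,b)` computes `a/(m*x+b)` and 3.16e-11 is roughly 1 divided by the number of milliseconds in a year. The thread does not say which boost formula is actually in use, so the constants below are assumptions taken from that common recipe, and `publish_date` is the field name from the query earlier in the thread. A small Python sketch of how the multiplier decays with document age:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Same shape as Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)


def recency_multiplier(age_days, m=3.16e-11, a=1.0, b=1.0):
    """Boost multiplier for a document that is age_days old."""
    age_ms = age_days * MS_PER_DAY  # stands in for ms(NOW, publish_date)
    return recip(age_ms, m, a, b)


# Ages loosely based on the examples in the thread: hours, months, a year.
ages = [("4 hours", 4 / 24), ("2 months", 60), ("6 months", 180), ("1 year", 365)]
for label, age_days in ages:
    print(f"{label:>9}: x{recency_multiplier(age_days):.3f}")
```

With these constants the multiplier is 1.0 for a brand-new document and about one half at one year, so it changes slowly. That is why it behaves like the tie-breaker Walter describes rather than a hard sort: among documents of similar relevance the newer one wins, but a much more relevant older document can still rank first.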
Re: Query ReRanking question
If you want the main query to be sorted by date then the top N docs reranked by a query, that should work. Try something like this:

q=foo&sort=date+desc&rq={!rerank reRankDocs=1000 reRankQuery=$myquery}&myquery=blah

Joel Bernstein
Search Engineer at Heliosearch

On Thu, Sep 4, 2014 at 4:25 PM, Ravi Solr ravis...@gmail.com wrote:

Can the ReRanking API be used to sort within docs retrieved by a date field? Can somebody help me understand how to write such a query?

Thanks
Ravi Kiran Bhaskar
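Each of Joel's parameters (`q`, `sort`, `rq`, `myquery`) is a separate request parameter, and the local-params syntax in `rq` contains characters that need URL encoding when sent over HTTP. A minimal sketch of building such a request with Python's standard library; the host, port and `/solr/select` path are copied from the URL quoted earlier in the thread and are assumptions about your setup, while `foo`, `blah` and the `date` field are Joel's placeholders:

```python
from urllib.parse import urlencode, parse_qs

# Assumed endpoint, taken from the URL quoted earlier in the thread.
BASE_URL = "http://localhost:8080/solr/select"

params = {
    "q": "foo",                 # main query (placeholder)
    "sort": "date desc",        # main sort (placeholder field name)
    "rq": "{!rerank reRankDocs=1000 reRankQuery=$myquery}",
    "myquery": "blah",          # rerank query (placeholder)
}

query_string = urlencode(params)  # percent-encodes {, }, !, $, = and spaces
url = f"{BASE_URL}?{query_string}"
print(url)

# Round trip to confirm nothing was lost in encoding.
decoded = parse_qs(query_string)
```

Keeping the rerank query in its own parameter and referencing it as `$myquery` avoids nesting a second query inside the local-params braces, which keeps the escaping manageable.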