Re[2]: Sort hits in the order of subqueries
Hello,

I had a look at the constant score approach suggested by Emir:

(q0)^=100 OR (q1)^=90 ...

As Alexandre observed, it introduces stratification at the cost of the intra-query ranking, which is not satisfactory.

If I think of constant score as a function f(x) = C that operates on a document score and is constrained to a subquery, then what I would like to have is a sigmoid function

F(x, C) = C + 1 / (1 + exp(-x))

applied to the document scores of the subqueries. Instead of:

ConstantScore(q0, 100) OR ConstantScore(q1, 90) ...

I would have:

SigmoidScore(q0, 100) OR SigmoidScore(q1, 90) ...

I'm pretty sure it is possible to take the ConstantScore class and end up with a sigmoid variant as a custom extension. Still, I am hoping for a hint about the simplest approach to achieve the stratification.

A further question in this context: in some cases we sort individual subqueries by different fields. It looks like:

(q0 sorted by date) OR (q1 sorted by relevancy)

I wonder if you have any idea how that could be formulated in Solr.

Regards,
Robert

> Thursday, 7 June 2018, 15:20 +02:00 from Alexandre Rafalovitch:
>
> I think this solution will destroy intra-query ranking. So all results in
> q0 come before q1 but would be random within q0 results.
>
> Would instead just a bunch of boost queries with different weights
> (additive probably) be a better way to introduce stratification?
>
> Regards,
> Alex
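The sigmoid idea F(x, C) = C + 1 / (1 + exp(-x)) can be checked numerically outside of Solr. The following is only a sketch in plain Python, not a Lucene extension: the document ids, raw scores, and the `rank` helper are made up for illustration. It assumes raw subquery scores are non-negative, so the sigmoid part stays between 0.5 and 1 and the bands stay apart as long as the offsets C differ by more than 1. It also assumes the per-subquery values are combined with max rather than summed, which is not what a plain OR of SHOULD clauses does.

```python
import math


def sigmoid_score(x, c):
    """F(x, C) = C + 1 / (1 + exp(-x)), as proposed in the message above."""
    return c + 1.0 / (1.0 + math.exp(-x))


def rank(docs, offsets):
    """Rank documents by their best band.

    docs: {doc_id: {subquery_index: raw_score}} (hypothetical input shape)
    offsets: {subquery_index: C}
    A document matching several subqueries keeps only its highest
    band score (max), so it is placed in the band with the largest C.
    """
    scored = {
        doc_id: max(sigmoid_score(x, offsets[i]) for i, x in matches.items())
        for doc_id, matches in docs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Illustrative data: "d" matches both q0 and q1.
docs = {
    "a": {0: 0.3},
    "b": {0: 2.5},
    "c": {1: 9.0},
    "d": {0: 0.1, 1: 7.0},
}
order = rank(docs, offsets={0: 100.0, 1: 90.0})
print(order)
```

With these numbers, the q0 hits keep their relative order ("b" before "a" before "d") and all of them precede "c", which only matches q1. If the two band scores of "d" were summed instead, it would score above 190 and jump to the top, which is the overlap effect Emir warned about.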
Re: Sort hits in the order of subqueries
I think this solution will destroy intra-query ranking. All results in q0 would come before q1, but they would be in random order within the q0 results.

Would a bunch of boost queries with different weights (probably additive) instead be a better way to introduce stratification?

Regards,
Alex

On Thu, Jun 7, 2018, 13:19 Emir Arnautović wrote:

> Hi Robert,
> If I get your requirement right, you can solve it with the following:
> (q0)^=100 OR (q1)^=90…
>
> Assuming there are no overlaps - otherwise, one matching multiple
> conditions can change the ordering.
>
> HTH,
> Emir
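To see when additive boosts with different weights actually stratify, here is a small Python sketch. It is a simulation, not Solr code: it assumes the boost weight of each matched subquery is simply added to a non-negative base relevance score, and the function names and numbers are hypothetical.

```python
def boosted_score(base, matched, weights):
    """Base relevance plus the weight of every matched subquery (additive)."""
    return base + sum(weights[i] for i in matched)


def strata_separated(weights, max_base):
    """Check whether additive boosts keep the strata apart.

    Assumes each document matches exactly one subquery and base scores
    lie in [0, max_base]. A lower stratum can reach lo + max_base, a
    higher one can be as low as hi, so every gap must exceed max_base.
    """
    ordered = sorted(weights, reverse=True)
    return all(hi - lo > max_base for hi, lo in zip(ordered, ordered[1:]))


weights = [100.0, 90.0]  # weight for q0, weight for q1

# A strong q1 hit overtakes a weak q0 hit when base scores can exceed the gap.
q0_weak = boosted_score(0.4, [0], weights)
q1_strong = boosted_score(11.5, [1], weights)
print(q0_weak, q1_strong)
print(strata_separated(weights, max_base=12.0))
print(strata_separated(weights, max_base=5.0))
```

Under these assumptions the intra-query ranking is preserved, but the stratification only holds if the gaps between the weights are larger than the highest base score that can occur.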
Re: Sort hits in the order of subqueries
Hi Robert,

If I get your requirement right, you can solve it with the following:

(q0)^=100 OR (q1)^=90 ...

This assumes there are no overlaps; otherwise, a document matching multiple conditions can change the ordering.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/

> On 7 Jun 2018, at 11:53, Robert K. wrote:
>
> Hello,
>
> I am investigating the following use case.
>
> Suppose I have a list of queries q_0, q_1, ..., q_n which I combine to a
> boolean query using 'SHOULD'-clauses.
> The requirement for the hits sorting is that the results of q_0 precede the
> results of q_1, the results of q_1 precede the results of q_2 and so on.
> If a hit occurs in the results of more than one query, then we should see
> it only once in the results of the query with the smallest index.
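For a longer list of subqueries, the constant-score query string (q0)^=100 OR (q1)^=90 ... can be generated rather than typed. A minimal Python sketch follows; the helper name, the starting value 100, the step 10, and the example field names are assumptions for illustration only.

```python
def constant_score_query(subqueries, top=100, step=10):
    """Join subqueries as (q)^=C clauses with decreasing constants.

    The first subquery gets the highest constant so its hits sort first.
    """
    clauses = []
    for i, q in enumerate(subqueries):
        constant = top - i * step
        if constant <= 0:
            raise ValueError("too many subqueries for the chosen top/step")
        clauses.append(f"({q})^={constant}")
    return " OR ".join(clauses)


# Hypothetical subqueries; substitute the real q0, q1, ...
query = constant_score_query(["title:solr", "body:solr"])
print(query)  # (title:solr)^=100 OR (body:solr)^=90
```

As noted above, this only orders the strata correctly when no document matches more than one subquery, because the constants of all matching clauses are combined.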
Sort hits in the order of subqueries
Hello,

I am investigating the following use case.

Suppose I have a list of queries q_0, q_1, ..., q_n which I combine into a boolean query using 'SHOULD' clauses. The requirement for sorting the hits is that the results of q_0 precede the results of q_1, the results of q_1 precede the results of q_2, and so on. If a hit occurs in the results of more than one query, then we should see it only once, among the results of the query with the smallest index.

I have searched for solutions but didn't find anything useful so far.

I have considered the following approaches:

1. Reformulate: q_0 | (q_1 & !q_0) | (q_2 & !q_0 & !q_1) | ...

While possible, this seems to have a potential negative impact on performance due to multiple evaluations of the same queries. I didn't do any measurements, though. It is technically possible to optimize the execution of this query so that each subquery q_i is evaluated only once, but I don't know whether this kind of optimization is implemented in the current Lucene/Solr.

2. Implement a CustomScoreQuery. General idea: take a list of queries and execute them in the context of a BooleanQuery, mapping the scores of the corresponding subqueries to disjoint score ranges, like q_n -> [0,1), q_(n-1) -> [1,2) and so on.

Problem: CustomScoreQuery is deprecated, and FunctionQuery is the recommended approach. Still, I didn't see any obvious way to use FunctionQuery to implement the idea. Is it possible? Should I dive in and try to do it with FunctionQuery?

3. Assuming there is some possibility to solve the task with FunctionQuery (or anything within out-of-the-box Solr), my question is: is there any solution that does not require writing our own extension to Solr, using only what is delivered in the standard distribution of Solr?

Note: In the past we solved the problem within our legacy application with a modified BooleanQuery/BooleanScorer. We could migrate (i.e. rewrite) this extension to the current Solr/Lucene, but it may not be the best option, so I am exploring all the other possibilities.

Thank you all & best regards,

Robert
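The score-range idea from approach 2, combined with the "smallest index wins" rule, can be sketched in plain Python. This is a simulation of the intended ordering only, not a FunctionQuery or BooleanScorer implementation. The squashing function x / (1 + x), the input shape, and all names and numbers are assumptions chosen for illustration; raw scores are assumed to be non-negative.

```python
def band_score(matches, n_queries):
    """Map a document to the score band of its first matching subquery.

    matches: {subquery_index: raw_score >= 0} (hypothetical input shape)
    The band for q_i is [n_queries - 1 - i, n_queries - i), so the last
    subquery maps to [0, 1), the one before it to [1, 2), and so on.
    """
    first = min(matches)  # the smallest matching index decides the band
    x = matches[first]
    # Monotonic squash of [0, inf) into [0, 1); for extremely large x,
    # floating point rounding can return exactly 1.0.
    squashed = x / (1.0 + x)
    return (n_queries - 1 - first) + squashed


# Illustrative data for three subqueries q_0, q_1, q_2.
docs = {
    "a": {0: 0.5},
    "b": {1: 40.0},
    "c": {0: 3.0, 2: 8.0},  # matches q_0 and q_2, counted under q_0 only
    "d": {2: 1.0},
}
scores = {doc_id: band_score(m, n_queries=3) for doc_id, m in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Each document appears once, in the band of the lowest-numbered subquery it matches, and documents inside a band keep the relative order of their raw scores.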