Re: Want zero results from SOLR when there are no matches for "querystring"
Thanks - I'll look at it...

On Fri, Aug 12, 2016 at 1:21 PM, Erick Erickson wrote:

> Maybe rerankqparserplugin?
Re: Want zero results from SOLR when there are no matches for "querystring"
Maybe rerankqparserplugin?

On Aug 12, 2016 11:54, "John Bickerstaff" wrote:

> @Hossman -- thanks again.
>
> I've made the following change and so far things look good. I couldn't see
> debug or find results for what I put in for $func, so I just removed it,
> but making modifications as you suggested appears to be working.
>
> Including the actual line from my endpoint XML in case this thread helps
> someone else...
>
> {!boost defType=synonym_edismax qf='title' synonyms='true'
> synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
> v=$q}
Re: Want zero results from SOLR when there are no matches for "querystring"
@Hossman -- thanks again.

I've made the following change and so far things look good. I couldn't see
debug or find results for what I put in for $func, so I just removed it,
but making modifications as you suggested appears to be working.

Including the actual line from my endpoint XML in case this thread helps
someone else...

{!boost defType=synonym_edismax qf='title' synonyms='true' synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq='' v=$q}

On Fri, Aug 12, 2016 at 12:09 PM, John Bickerstaff wrote:

> Thanks! I'll check it out.
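For anyone assembling a line like the one above programmatically, here is a minimal Python sketch. It only builds the local-params string; `boost_local_params` is a hypothetical helper name, not part of Solr or of the hon-lucene-synonyms plugin, and the sketch deliberately refuses values containing single quotes rather than guessing at escaping rules.

```python
def boost_local_params(def_type, options, value_param):
    """Build a {!boost ...} local-params string (hypothetical helper).

    def_type    -- name of the wrapped query parser, e.g. "synonym_edismax"
    options     -- mapping of local-param names to values, emitted in order
    value_param -- name of the request parameter that v= dereferences
    """
    pieces = ["{!boost", "defType=" + def_type]
    for name, value in options.items():
        if "'" in value:
            # Assumption: this sketch does no escaping, so reject such values.
            raise ValueError("single quotes are not escaped in this sketch")
        pieces.append("%s='%s'" % (name, value))
    pieces.append("v=$" + value_param + "}")
    return " ".join(pieces)


line = boost_local_params(
    "synonym_edismax",
    {
        "qf": "title",
        "synonyms": "true",
        "synonyms.originalBoost": "2.5",
        "synonyms.synonymBoost": "1.1",
        "bf": "",
        "bq": "",
    },
    "q",
)
print(line)
```

Passing empty strings for bf and bq mirrors the line quoted above, where both were blanked out.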
Re: Want zero results from SOLR when there are no matches for "querystring"
Thanks! I'll check it out.

On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar wrote:

> Not exactly sure what you are looking for from chaining the results, but
> similar functionality is available in Streaming Expressions, where the
> results of inner expressions are passed to outer expressions and so on
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
> HTH
> Susheel
Re: Want zero results from SOLR when there are no matches for "querystring"
Not exactly sure what you are looking for from chaining the results, but
similar functionality is available in Streaming Expressions, where the
results of inner expressions are passed to outer expressions and so on
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

HTH
Susheel

On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff wrote:

> Hossman - many thanks again for your comprehensive and very helpful answer!
>
> All,
>
> I am (possibly mis-remembering) reading something about being able to pass
> the results of one query to another query... Essentially "chaining" result
> sets.
>
> I have looked in docs and can't find anything on a quick search -- I may
> have been reading about the Re-Ranking feature, which doesn't help me (I
> know because I just tried and it seems to return all results anyway, just
> re-ranking the number specified in the reRankDocs flag...)
>
> Is there a way to (cleanly) send the results of one query to another query
> for further processing? Essentially, pass ONLY the results (including an
> empty set of results) to another query for processing?
>
> thanks...
Re: Want zero results from SOLR when there are no matches for "querystring"
Hossman - many thanks again for your comprehensive and very helpful answer!

All,

I am (possibly mis-remembering) reading something about being able to pass
the results of one query to another query... Essentially "chaining" result
sets.

I have looked in docs and can't find anything on a quick search -- I may
have been reading about the Re-Ranking feature, which doesn't help me (I
know because I just tried and it seems to return all results anyway, just
re-ranking the number specified in the reRankDocs flag...)

Is there a way to (cleanly) send the results of one query to another query
for further processing? Essentially, pass ONLY the results (including an
empty set of results) to another query for processing?

thanks...

On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff wrote:

> Thanks!
>
> To answer your questions, while I digest the rest of that information...
Re: Want zero results from SOLR when there are no matches for "querystring"
Thanks! To answer your questions, while I digest the rest of that information... I'm using the hon-lucene-synonyms.5.0.4.jar from here: https://github.com/healthonnet/hon-lucene-synonyms The config looks like this - and IIRC, is simply a copy from the recommended cofig on the site mentioned above. solr.PatternTokenizerFactory solr.ShingleFilterFactory true true 2 4 solr.SynonymFilterFactory solr.KeywordTokenizerFactory example_synonym_file.txt true true On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter wrote: > > : First let me say that this is very possibly the "x - y problem" so let me > : state up front what my ultimate need is -- then I'll ask about the thing > I > : imagine might help... which, of course, is heavily biased in the > direction > : of my experience coding Java and writing SQL... > > Thank you so much for asking your question this way! > > Right off the bat, the background you've provided seems supicious... > > : I have a piece of a query that calculates a score based on a "weighting" > ... > : The specific line is this: > : product(field(category_weight),20) > : > : What I just realized is that when I query Solr for a string that has NO > : matches in the entire corpus, I still get a slew of results because EVERY > : doc has the weighting value in the category_weight field - and therefore > : every doc gets some score. > > ...that is *NOT* how dismax and edisamx normally work. > > While both the "bf" abd "bq" params result in "additive" boosting, and the > implementation of that "additive boost" comes from adding new optional > clauses to the top level BooleanQuery that is executed, that only happens > after the "main" query (from your "q" param) is added to that top level > BooleanQuery as a "mandaory" clause. > > So, for example, "bf=true()" and "bq=*:*" should match & boost every doc, > but with the techprducts configs/data these requests still don't match > anything... 
> > /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query > /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query > > ...and if you look at the debug output, the parsed queries shows that the > "bogus" part of the query is mandatory... > > +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) > FunctionQuery(const(true)) > > (i didn't use "pf" in that example, but the effect is the same, the "pf" > based clauses are optional, while the "qf" based clauses are mandatory) > > If you compare that example to your debug output, you'll notice a > difference in structure -- it's a bit hard to see in your example, but if > you simplify your qf, pf, and q fields it should be more obvious, but > AFAICT the "main" parts of your query are getting wrapped in an extra > layer of parents (ie: an extra BooleanQuery) which is *not* mandatory in > the top level query ... i don't see *any* mandatory clauses in your top > level BooleanQuery, which is why any match on a bf or bq function is > enough to cause a document to match. > > I suspect the reason your parsed query structure is so diff has to do with > this... > > :synonym_edismax> > > > 1) how exactly is "synonym_edismax" defined in your solrconfig.xml? > 2) what QParserPlugin are you using to implement that? > > I suspect whatever QParserPlugin you are using has a bug in it :) > > > If you can't fix the bug, one possibile workaround would be to abandon bf > and bq params completely, and instead wrap the query it produces in in a > {!boost} parser with whatever function you want (using functions like > sum() or prod() to combine multiple functions, and query() to incorporate > your current bq param). 
> Doing this will require changing how you specify your input (example
> below) and it will result in *multiplicative* boosts -- so your scores
> will be much different, and you will likely have to adjust your constants,
> but: 1) multiplicative boosts are almost always what people *really* want
> anyway; 2) it will ensure the boosts are only applied for things matching
> your main query, no matter how that query parser works or what bugs it has.
>
> Example of using {!boost} to wrap an arbitrary other parser...
>
> instead of...
>    defType=foofoo
>    q=barbarbar
>
> use...
>    q={!boost b=$func defType=foofoo v=$qq}
>    qq=barbarbar
>    func=sum(something,somethingelse)
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
> :
> : What I would like is to return zero results if there is no match for the
> : querystring. My collection is small enough that I don't care if the actual
> : calculation runs on each doc (although that's wasteful) -- I just don't
> : want to see results come back for zero matches to the querystring
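For anyone who finds this thread later, here is a rough sketch of how the {!boost} wrapper suggested in the quoted reply might be assembled for the synonym_edismax case discussed above. It only builds the request parameters and does not contact Solr. The base URL, collection name, and helper function names are assumptions made up for illustration; the "q", "qq" and "func" parameters follow the pattern in the reply, and the boost function is the category_weight line from the original question. Because the boost becomes multiplicative, the constant 20 would likely need adjusting.

```python
from urllib.parse import urlencode


def build_boosted_params(user_query,
                         inner_parser="synonym_edismax",
                         boost_func="product(field(category_weight),20)"):
    """Return Solr request params that wrap inner_parser in {!boost}.

    Hypothetical helper: the parameter layout mirrors the example in the
    quoted reply, with "foofoo" replaced by the parser named in this thread.
    """
    return {
        # $func and $qq are parameter dereferences resolved by Solr from the
        # "func" and "qq" request params below.
        "q": "{!boost b=$func defType=%s v=$qq}" % inner_parser,
        "qq": user_query,
        "func": boost_func,
        # debug=query shows the parsed query, as in the examples above.
        "debug": "query",
    }


def build_select_url(base_url, user_query):
    """Build a full request URL; base_url is an assumed endpoint."""
    return base_url + "?" + urlencode(build_boosted_params(user_query))


if __name__ == "__main__":
    # Assumed local Solr instance and collection name.
    print(build_select_url(
        "http://localhost:8983/solr/mycollection/select", "bogus"))
```

The point of the wrapper is that the boost function only scales the score of documents already matched by the wrapped query, so a query string with no matches should come back with zero results regardless of how the inner parser structures its clauses.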
Re: Want zero results from SOLR when there are no matches for "querystring"
: First let me say that this is very possibly the "x - y problem" so let me
: state up front what my ultimate need is -- then I'll ask about the thing I
: imagine might help... which, of course, is heavily biased in the direction
: of my experience coding Java and writing SQL...

Thank you so much for asking your question this way!

Right off the bat, the background you've provided seems suspicious...

: I have a piece of a query that calculates a score based on a "weighting"
...
: The specific line is this:
: product(field(category_weight),20)
:
: What I just realized is that when I query Solr for a string that has NO
: matches in the entire corpus, I still get a slew of results because EVERY
: doc has the weighting value in the category_weight field - and therefore
: every doc gets some score.

...that is *NOT* how dismax and edismax normally work.

While both the "bf" and "bq" params result in "additive" boosting, and the
implementation of that "additive boost" comes from adding new optional
clauses to the top level BooleanQuery that is executed, that only happens
after the "main" query (from your "q" param) is added to that top level
BooleanQuery as a "mandatory" clause.

So, for example, "bf=true()" and "bq=*:*" should match & boost every doc,
but with the techproducts configs/data these requests still don't match
anything...

   /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
   /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query

...and if you look at the debug output, the parsed queries show that the
"bogus" part of the query is mandatory...
   +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) FunctionQuery(const(true))

(I didn't use "pf" in that example, but the effect is the same: the "pf"
based clauses are optional, while the "qf" based clauses are mandatory)

If you compare that example to your debug output, you'll notice a
difference in structure -- it's a bit hard to see in your example, but if
you simplify your qf, pf, and q fields it should be more obvious. AFAICT
the "main" parts of your query are getting wrapped in an extra layer of
parens (ie: an extra BooleanQuery) which is *not* mandatory in the top
level query ... I don't see *any* mandatory clauses in your top level
BooleanQuery, which is why any match on a bf or bq function is enough to
cause a document to match.

I suspect the reason your parsed query structure is so different has to do
with this...

: synonym_edismax

1) how exactly is "synonym_edismax" defined in your solrconfig.xml?
2) what QParserPlugin are you using to implement that?

I suspect whatever QParserPlugin you are using has a bug in it :)

If you can't fix the bug, one possible workaround would be to abandon bf
and bq params completely, and instead wrap the query it produces in a
{!boost} parser with whatever function you want (using functions like
sum() or product() to combine multiple functions, and query() to
incorporate your current bq param). Doing this will require changing how
you specify your input (example below) and it will result in
*multiplicative* boosts -- so your scores will be much different, and you
will likely have to adjust your constants, but: 1) multiplicative boosts
are almost always what people *really* want anyway; 2) it will ensure the
boosts are only applied for things matching your main query, no matter how
that query parser works or what bugs it has.

Example of using {!boost} to wrap an arbitrary other parser...

instead of...
   defType=foofoo
   q=barbarbar

use...
   q={!boost b=$func defType=foofoo v=$qq}
   qq=barbarbar
   func=sum(something,somethingelse)

https://cwiki.apache.org/confluence/display/solr/Other+Parsers
https://cwiki.apache.org/confluence/display/solr/Function+Queries

:
: What I would like is to return zero results if there is no match for the
: querystring. My collection is small enough that I don't care if the actual
: calculation runs on each doc (although that's wasteful) -- I just don't
: want to see results come back for zero matches to the querystring
:
: (The /select endpoint does this of course, but my custom endpoint includes
: this "weighting" piece and therefore returns every doc in the corpus
: because they all have the weighting.)
:
:
: Enter my imagined solution... The potential X-Y problem...
:
:
: So - given that I come from a programming background, I immediately start
: thinking of an if statement ...
:
: if(some_score_for_the_primary_search_string) {
:   run_the_category_weight_calculation;
: } else {
:   do_NOT_run_category_weight_calc;
: }
:
:
: Another way of thinking of it would be something like the "WHERE" clause in
: SQL...
:
: run_category_weight_calculation WHERE "searchstring" is found in the
: document, not otherwise.
:
: I'm aware that things could be handled in the client-side of my web app,
: but if possible, I'd like the interface to SOL