Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
Thanks - I'll look at it...


Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread Erick Erickson
Maybe ReRankQParserPlugin?


Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
@Hossman --  thanks again.

I've made the following change and so far things look good.  I couldn't see
debug or find results for what I put in for $func, so I just removed it,
but making modifications as you suggested appears to be working.

Including the actual line from my endpoint XML in case this thread helps
someone else...

{!boost defType=synonym_edismax qf='title' synonyms='true'
synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
v=$q}


Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
Thanks!  I'll check it out.


Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread Susheel Kumar
Not exactly sure what you are looking for from chaining the results, but similar
functionality is available in Streaming Expressions, where the results of inner
expressions are passed to outer expressions and so on:
https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions

HTH
Susheel


Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-12 Thread John Bickerstaff
Hossman - many thanks again for your comprehensive and very helpful answer!

All,

I am (possibly mis-)remembering reading something about being able to pass
the results of one query to another query...  Essentially "chaining" result
sets.

I have looked in docs and can't find anything on a quick search -- I may
have been reading about the Re-Ranking feature, which doesn't help me (I
know because I just tried and it seems to return all results anyway, just
re-ranking the number specified in the reRankDocs flag...)

Is there a way to (cleanly) send the results of one query to another query
for further processing?  Essentially, pass ONLY the results (including an
empty set of results) to another query for processing?

thanks...
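
For readers unfamiliar with the Re-Ranking feature mentioned above, here is a minimal sketch of how such a request could be assembled. The `rq`, `reRankQuery`, `reRankDocs` and `reRankWeight` parameters come from Solr's re-ranking support; the helper name, the example queries and the numeric values are hypothetical. As observed in this message, re-ranking only reorders the top `reRankDocs` hits of the main query and does not remove anything from the result set.

```python
from urllib.parse import urlencode


def rerank_params(main_query, rerank_query, rerank_docs=1000, rerank_weight=3.0):
    """Build request parameters that re-rank the top hits of main_query.

    Hypothetical helper: the main query still decides WHICH documents match;
    the rq parameter only changes the ORDER of the first rerank_docs hits.
    """
    return {
        "q": main_query,
        # Local-params syntax; $rqq dereferences the separate rqq parameter.
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%s}"
        % (rerank_docs, rerank_weight),
        "rqq": rerank_query,
    }


params = rerank_params("title:solr", "category_weight:[5 TO *]", rerank_docs=200)
query_string = urlencode(params)  # ready to append to /select?
```

Because the matching set is still defined by `q` alone, this does not by itself give "pass ONLY the results of one query to another" semantics.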


Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-11 Thread John Bickerstaff
Thanks!

To answer your questions, while I digest the rest of that information...

I'm using the hon-lucene-synonyms.5.0.4.jar from here:
https://github.com/healthonnet/hon-lucene-synonyms

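The config looks like this - and IIRC, is simply a copy from the
recommended config on the site mentioned above.

<queryParser name="synonym_edismax"
             class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
  <lst name="synonymAnalyzers">
    <lst name="myCoolAnalyzer">
      <lst name="tokenizer">
        <str name="class">solr.PatternTokenizerFactory</str>
        <str name="pattern"><![CDATA[(?:\s|\")+]]></str>
      </lst>
      <lst name="filter">
        <str name="class">solr.ShingleFilterFactory</str>
        <str name="outputUnigramsIfNoShingles">true</str>
        <str name="outputUnigrams">true</str>
        <str name="minShingleSize">2</str>
        <str name="maxShingleSize">4</str>
      </lst>
      <lst name="filter">
        <str name="class">solr.SynonymFilterFactory</str>
        <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
        <str name="synonyms">example_synonym_file.txt</str>
        <str name="expand">true</str>
        <str name="ignoreCase">true</str>
      </lst>
    </lst>
  </lst>
</queryParser>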



On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter 
wrote:

>
> : First let me say that this is very possibly the "x - y problem" so let me
> : state up front what my ultimate need is -- then I'll ask about the thing I
> : imagine might help...  which, of course, is heavily biased in the direction
> : of my experience coding Java and writing SQL...
>
> Thank you so much for asking your question this way!
>
> Right off the bat, the background you've provided seems suspicious...
>
> : I have a piece of a query that calculates a score based on a "weighting"
> ...
> : The specific line is this:
> : product(field(category_weight),20)
> :
> : What I just realized is that when I query Solr for a string that has NO
> : matches in the entire corpus, I still get a slew of results because EVERY
> : doc has the weighting value in the category_weight field - and therefore
> : every doc gets some score.
>
> ...that is *NOT* how dismax and edismax normally work.
>
> While both the "bf" and "bq" params result in "additive" boosting, and the
> implementation of that "additive boost" comes from adding new optional
> clauses to the top level BooleanQuery that is executed, that only happens
> after the "main" query (from your "q" param) is added to that top level
> BooleanQuery as a "mandatory" clause.
>
> So, for example, "bf=true()" and "bq=*:*" should match & boost every doc,
> but with the techproducts configs/data these requests still don't match
> anything...
>
> /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
> /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query
>
> ...and if you look at the debug output, the parsed queries show that the
> "bogus" part of the query is mandatory...
>
> +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*)
> FunctionQuery(const(true))
>
> (i didn't use "pf" in that example, but the effect is the same, the "pf"
> based clauses are optional, while the "qf" based clauses are mandatory)
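
As a rough aid for reading `debug=query` output like the line quoted above, the following sketch splits a parsed-query string into its top-level clauses and reports whether any of them carries the `+` (mandatory) prefix. It is a heuristic string check that assumes the parenthesised format shown in this message; the function names are hypothetical, and the second example string is invented to illustrate the "no mandatory clause" case rather than taken from the thread.

```python
from typing import List


def top_level_clauses(parsed_query: str) -> List[str]:
    """Split a parsed-query string on whitespace that is outside parentheses."""
    clauses, current, depth = [], [], 0
    for ch in parsed_query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch.isspace() and depth == 0:
            # A space at depth 0 separates two top-level clauses.
            if current:
                clauses.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        clauses.append("".join(current))
    return clauses


def has_mandatory_clause(parsed_query: str) -> bool:
    """True if at least one top-level clause starts with '+'."""
    return any(c.startswith("+") for c in top_level_clauses(parsed_query))


healthy = ("+DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) "
           "FunctionQuery(const(true))")
# Invented example: the main query is wrapped in parentheses with no '+' prefix.
wrapped = "(DisjunctionMaxQuery((text:bogus))) FunctionQuery(const(true))"

print(has_mandatory_clause(healthy))
print(has_mandatory_clause(wrapped))
```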
>
> If you compare that example to your debug output, you'll notice a
> difference in structure -- it's a bit hard to see in your example, but if
> you simplify your qf, pf, and q fields it should be more obvious, but
> AFAICT the "main" parts of your query are getting wrapped in an extra
> layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in
> the top level query ... i don't see *any* mandatory clauses in your top
> level BooleanQuery, which is why any match on a bf or bq function is
> enough to cause a document to match.
>
> I suspect the reason your parsed query structure is so diff has to do with
> this...
>
> :synonym_edismax>
>
>
> 1) how exactly is "synonym_edismax" defined in your solrconfig.xml?
> 2) what QParserPlugin are you using to implement that?
>
> I suspect whatever QParserPlugin you are using has a bug in it :)
>
>
> If you can't fix the bug, one possible workaround would be to abandon bf
> and bq params completely, and instead wrap the query it produces in a
> {!boost} parser with whatever function you want (using functions like
> sum() or product() to combine multiple functions, and query() to incorporate
> your current bq param).  Doing this will require changing how you specify
> your input (example below) and it will result in *multiplicative* boosts --
> so your scores will be much diff, and you will likely have to adjust your
> constants, but: 1) multiplicative boosts are almost always what people
> *really* want anyway; 2) it will ensure the boosts are only applied for
> things matching your main query, no matter how that query parser works or
> what bugs it has.
>
> Example of using {!boost} to wrap an arbitrary other parser...
>
> instead of...
>   defType=foofoo
>   q=barbarbar
>
> use...
>   q={!boost b=$func defType=foofoo v=$qq}
>   qq=barbarbar
>   func=sum(something,somethingelse)
>
> https://cwiki.apache.org/confluence/display/solr/Other+Parsers
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
>
>
>
> :
> : What I would like is to return zero results if there is no match for the
> : querystring.  My collection is small enough that I don't care if the actual
> : calculation runs on each doc (although that's wasteful) -- I just don't
> : want to see results come back for zero matches to the querystring

Re: Want zero results from SOLR when there are no matches for "querystring"

2016-08-11 Thread Chris Hostetter

: First let me say that this is very possibly the "x - y problem" so let me
: state up front what my ultimate need is -- then I'll ask about the thing I
: imagine might help...  which, of course, is heavily biased in the direction
: of my experience coding Java and writing SQL...

Thank you so much for asking your question this way!

Right off the bat, the background you've provided seems suspicious...

: I have a piece of a query that calculates a score based on a "weighting"
...
: The specific line is this:
: product(field(category_weight),20)
: 
: What I just realized is that when I query Solr for a string that has NO
: matches in the entire corpus, I still get a slew of results because EVERY
: doc has the weighting value in the category_weight field - and therefore
: every doc gets some score.

...that is *NOT* how dismax and edismax normally work.

While both the "bf" and "bq" params result in "additive" boosting, and the
implementation of that "additive boost" comes from adding new optional
clauses to the top level BooleanQuery that is executed, that only happens
after the "main" query (from your "q" param) is added to that top level
BooleanQuery as a "mandatory" clause.

So, for example, "bf=true()" and "bq=*:*" should match & boost every doc, 
but with the techproducts configs/data these requests still don't match
anything...

/select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
/select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query

...and if you look at the debug output, the parsed query shows that the
"bogus" part of the query is mandatory...

+DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) 
FunctionQuery(const(true))

(i didn't use "pf" in that example, but the effect is the same, the "pf" 
based clauses are optional, while the "qf" based clauses are mandatory)
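
A toy model of the clause semantics described above, written in Python rather than anything Solr-specific. The function name `matches` and its arguments are hypothetical, for illustration only. It assumes the usual Lucene rule: a BooleanQuery with at least one mandatory clause requires all of its mandatory clauses, while one with only optional clauses requires at least one of them.

```python
def matches(mandatory, optional):
    """Return True if a doc matches a BooleanQuery-like set of clauses.

    mandatory and optional are lists of booleans, one per clause, saying
    whether that clause matched the document.
    """
    if mandatory:
        # With any mandatory clause present, optional clauses only add score.
        return all(mandatory)
    # With no mandatory clauses, a single optional match is enough.
    return any(optional)


# Expected edismax structure: +main_query bq bf
# The doc does not contain "bogus", but bq and bf both match it.
expected = matches(mandatory=[False], optional=[True, True])

# Structure suspected in the problem report: (main_query) bq bf
# Nothing is mandatory, so the bq/bf matches are enough on their own.
suspected = matches(mandatory=[], optional=[False, True, True])

print(expected, suspected)
```

In this sketch the first case rejects the document and the second accepts it, which is the difference in behaviour being described.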

If you compare that example to your debug output, you'll notice a 
difference in structure -- it's a bit hard to see in your example, but if 
you simplify your qf, pf, and q fields it should be more obvious, but 
AFAICT the "main" parts of your query are getting wrapped in an extra 
layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in
the top level query ... i don't see *any* mandatory clauses in your top 
level BooleanQuery, which is why any match on a bf or bq function is 
enough to cause a document to match.

I suspect the reason your parsed query structure is so diff has to do with 
this...

:synonym_edismax>


1) how exactly is "synonym_edismax" defined in your solrconfig.xml? 
2) what QParserPlugin are you using to implement that?

I suspect whatever QParserPlugin you are using has a bug in it :)


If you can't fix the bug, one possible workaround would be to abandon bf
and bq params completely, and instead wrap the query it produces in a
{!boost} parser with whatever function you want (using functions like
sum() or product() to combine multiple functions, and query() to incorporate
your current bq param).  Doing this will require changing how you specify
your input (example below) and it will result in *multiplicative* boosts --
so your scores will be much diff, and you will likely have to adjust your
constants, but: 1) multiplicative boosts are almost always what people
*really* want anyway; 2) it will ensure the boosts are only applied for
things matching your main query, no matter how that query parser works or
what bugs it has.
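
To make the additive vs multiplicative difference concrete, here is a rough arithmetic sketch in Python. The function names and the numbers (a category_weight of 3, main-query scores of 0.0 and 1.5) are made up for illustration. Treating a non-matching document as having a main score of 0.0 is a simplification: with a real {!boost} wrapper such a document is simply not returned.

```python
def additive_score(main_score, boost):
    # bf/bq style: the boost is added as an extra optional clause.
    return main_score + boost


def multiplicative_score(main_score, boost):
    # {!boost} style: the boost multiplies the wrapped query's score.
    return main_score * boost


# Hypothetical boost, mirroring product(field(category_weight),20)
category_weight = 3.0
boost = category_weight * 20

for main_score in (0.0, 1.5):
    print(
        main_score,
        additive_score(main_score, boost),
        multiplicative_score(main_score, boost),
    )
```

The multiplicative numbers are on a different scale from the additive ones, which is why the constants would likely need adjusting.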

Example of using {!boost} to wrap an arbitrary other parser...

instead of...
  defType=foofoo
  q=barbarbar

use...
  q={!boost b=$func defType=foofoo v=$qq}
  qq=barbarbar
  func=sum(something,somethingelse)

https://cwiki.apache.org/confluence/display/solr/Other+Parsers
https://cwiki.apache.org/confluence/display/solr/Function+Queries
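
If the request is built from client code, the local-params string needs URL encoding. A minimal Python sketch, assuming the /select path used earlier and the same placeholder names (foofoo, barbarbar, something, somethingelse), none of which are real parsers or fields:

```python
from urllib.parse import urlencode, parse_qs

# Placeholder values copied from the example above.
params = {
    "q": "{!boost b=$func defType=foofoo v=$qq}",
    "qq": "barbarbar",
    "func": "sum(something,somethingelse)",
}

# urlencode escapes the braces, spaces, '$' and '=' inside the q value.
query_string = urlencode(params)
request_path = "/select?" + query_string

# Round-trip check: decoding gives back the original parameter values.
decoded = parse_qs(query_string)

print(request_path)
```

Keeping the user's text in its own qq param means it never has to be escaped inside the {!boost ...} local params.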




: 
: What I would like is to return zero results if there is no match for the
: querystring.  My collection is small enough that I don't care if the actual
: calculation runs on each doc (although that's wasteful) -- I just don't
: want to see results come back for zero matches to the querystring
: 
: (The /select endpoint does this of course, but my custom endpoint includes
: this "weighting" piece and therefore returns every doc in the corpus
: because they all have the weighting).
: 
: 
: Enter my imagined solution...  The potential X-Y problem...
: 
: 
: So - given that I come from a programming background, I immediately start
: thinking of an if statement ...
: 
:  if(some_score_for_the_primary_search_string) {
:   run_the_category_weight_calculation;
:  } else {
:   do_NOT_run_category_weight_calc;
:  }
: 
: 
: Another way of thinking of it would be something like the "WHERE" clause in
: SQL...
: 
:  run_category_weight_calculation WHERE "searchstring" is found in the
: document, not otherwise.
: 
: I'm aware that things could be handled in the client-side of my web app,
: but if possible, I'd like the interface to SOL