: Suppose that I propose a file-type filter to the user, and the user typed
: some keywords, like "hello world". The user gets back results, and he now
: wants to filter those results by selecting "PDF" from the file-type filter. The
: only query the client application can send to the back-end is "hello world
: +filetype:pdf". But that doesn't work as expected. If queries are run with
: OR operator as the default, then the documents that will be returned are
: those that include filetype:pdf, and may or may not include "hello world".
: This is not what the user expected though.
I'm really not understanding what that example has to do with
minShouldMatch ... the fundamental problem in your example is that if you
start with a query for...
"hello world"
...and then want to restrict it to only docs that also match...
filetype:pdf
...the combined query must have *both* clauses marked as mandatory...
+"hello world" +filetype:pdf
minShouldMatch doesn't even factor in at all.
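
To make the difference concrete, here is a toy model in plain Java. It is not Lucene code: a "document" is just a set of strings, the phrase "hello world" is treated as a single atom, and the class and method names are made up for illustration. It only mimics the rule that mandatory clauses must all match, and that optional clauses are required only when there are no mandatory ones.

```java
import java.util.List;
import java.util.Set;

// Toy model only, not Lucene: a document is a set of atoms, and a query is a
// list of mandatory ('+') atoms plus a list of optional atoms.
class MandatoryClauseSketch {

    // Every mandatory atom must be present. If there are no mandatory atoms,
    // at least one optional atom must be present instead.
    static boolean matches(Set<String> doc, List<String> must, List<String> should) {
        for (String atom : must) {
            if (!doc.contains(atom)) {
                return false;
            }
        }
        if (must.isEmpty()) {
            for (String atom : should) {
                if (doc.contains(atom)) {
                    return true;
                }
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> pdfOnly = Set.of("filetype:pdf");
        Set<String> pdfWithPhrase = Set.of("filetype:pdf", "hello world");

        // "hello world" +filetype:pdf  (phrase is optional)
        System.out.println(matches(pdfOnly,
                List.of("filetype:pdf"), List.of("hello world")));     // prints true

        // +"hello world" +filetype:pdf  (both mandatory)
        System.out.println(matches(pdfOnly,
                List.of("hello world", "filetype:pdf"), List.of()));   // prints false
        System.out.println(matches(pdfWithPhrase,
                List.of("hello world", "filetype:pdf"), List.of()));   // prints true
    }
}
```

The first call is the case from your example: the PDF that never mentions "hello world" still matches, because only the filetype clause is mandatory.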
Independent of that, if you want to add minShouldMatch support to
QueryParser, there are two fairly straightforward ways to go, depending on
how generalized you want support to be...
1) minShouldMatch set on all BooleanQueries (as a function of length)
This is the approach the DisMaxQueryParser in Solr takes ... you override
the getBooleanQuery method in QueryParser, delegate to super, and then
modify the BooleanQuery returned, setting minShouldMatch based on some
function of the number of clauses it already contains. The version in
Solr supports a grammar for deciding what it should be relative to various
cut-off points, as either an absolute number or a percentage...
http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html
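
As a rough idea of what "some function of the number of clauses" could look like, here is a simplified sketch in plain Java. It is not the Solr code: the class and method names are invented, it handles only a bare integer or a bare percentage (either may be negative, meaning "this many may be missing"), and it leaves out the conditional cut-off forms entirely. Percentages are rounded down.

```java
// Simplified illustration only; names are hypothetical and this is not the
// Solr implementation. Supported specs: "3", "-2", "75%", "-25%".
class MinShouldMatchSketch {

    // Returns how many of the optional clauses must match, clamped to the
    // range [0, optionalClauses].
    static int compute(int optionalClauses, String spec) {
        String s = spec.trim();
        int result;
        if (s.endsWith("%")) {
            int percent = Integer.parseInt(s.substring(0, s.length() - 1));
            // Share of the clauses covered by the percentage, rounded down.
            int share = optionalClauses * Math.abs(percent) / 100;
            // Negative percentage: that share may be missing.
            result = percent < 0 ? optionalClauses - share : share;
        } else {
            int count = Integer.parseInt(s);
            // Negative count: that many clauses may be missing.
            result = count < 0 ? optionalClauses + count : count;
        }
        return Math.max(0, Math.min(optionalClauses, result));
    }

    public static void main(String[] args) {
        System.out.println(compute(5, "75%"));   // prints 3
        System.out.println(compute(5, "-25%"));  // prints 4
        System.out.println(compute(5, "2"));     // prints 2
    }
}
```

In the overridden getBooleanQuery you would call something like this with the number of optional clauses in the query super returned, and set the result as that query's minShouldMatch.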
2) overload the use of "~" in the parser grammar
Instead of adding a new special character to the grammar (I think you
suggested '#'), which could break back compatibility, you might want to
consider modifying the grammar to recognize the '~' character when it
follows a close paren as an indication of minShouldMatch on the boolean
query those parens wrap. Since '~' is currently used for specifying
slop on phrase queries and fuzziness on fuzzy queries, it's already a
reserved character.
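
Just to show that a '~' after a close paren is distinguishable from the existing uses, here is a small plain-Java sketch using a regular expression. The "(...)~N" syntax is the hypothetical extension discussed above, not something QueryParser understands today, and a real change would go into the parser grammar rather than a regex. The class and method names are invented, and nested parens are deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of the proposed (hypothetical) "(...)~N" syntax only.
class TildeAfterParenSketch {

    // A parenthesized group without nested parens, immediately followed by
    // '~' and one or more digits.
    private static final Pattern GROUP_WITH_MIN =
            Pattern.compile("\\(([^()]*)\\)~(\\d+)");

    // Returns N for the first "(...)~N" found in the query, or -1 if none.
    static int minShouldMatchOf(String query) {
        Matcher m = GROUP_WITH_MIN.matcher(query);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        System.out.println(minShouldMatchOf("(apple banana cherry)~2"));  // prints 2
        System.out.println(minShouldMatchOf("\"hello world\"~3"));        // prints -1
        System.out.println(minShouldMatchOf("roam~"));                    // prints -1
    }
}
```

Phrase slop and fuzzy queries are left alone because in those cases the '~' follows a closing quote or a term, never a close paren.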
-Hoss