[ https://issues.apache.org/jira/browse/SOLR-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228533#comment-13228533 ]
Robert Muir commented on SOLR-3145:
-----------------------------------

{quote}
But Jan is talking about just changing the default for just an example GUI (/browse), and not any query parsers.
{quote}

I think it's pretty important. The problem is that in some languages, someone enters a search query containing a useless particle or similar and misses documents completely, purely because of grammatical structure. Also, for a lot of languages (e.g. Chinese), tokenization into 'query terms' is not even close to completely accurate!

{quote}
That's pretty minor - not a big deal either way, but I do think that from a "finished product" perspective, more people expect all of their query terms to appear in matching documents (and I believe this is how Google does it?)
{quote}

This is false. Search for 'lucid in imagination' and look at the first result: it does not contain the word 'in'.

This is just an illustration of my point (it's hard to come up with examples for English), but other examples would be simple things like searching for U.S.A-China relations and missing documents that have U.S.-China relations.

In general, most of the stopword lists we have are very incomplete and minimal, and I think this is good. But if you choose to use AND as a default, you need to be much more aggressive about these things.

I'm also leaving out use cases with more natural-language searches (e.g. longer queries), which would suffer even more here.

Again, I think: don't wire the queryparser to force 100% query-term importance; lean on the ranking system to do this. As I mentioned, it's my opinion that there are serious problems with Lucene's sqrt() tf normalization (it grows too fast and does not represent the information gain of additional term occurrences well), causing additional occurrences of only a few terms to blow up the score versus documents that actually do contain all terms. But we shouldn't solve that with a hammer like this.
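To make the tf point concrete, here is a minimal sketch, not Lucene's actual scoring. It ignores idf, length normalization, coord and query normalization, and the saturating function (BM25-style, with k1=1.2) is only an assumed alternative for comparison, not something proposed in this issue. The function names and the two toy documents are hypothetical.

```python
import math

def sqrt_tf(tf):
    # Square-root term-frequency weight, as discussed above.
    return math.sqrt(tf)

def saturating_tf(tf, k1=1.2):
    # BM25-style weight (assumption for comparison): approaches k1 + 1 as tf grows.
    return tf * (k1 + 1) / (tf + k1)

def score(doc_tfs, query_terms, tf_weight):
    # Toy score: sum of tf weights only; idf, norms and coord are left out.
    return sum(tf_weight(doc_tfs.get(term, 0)) for term in query_terms)

query = ["lucid", "in", "imagination"]
partial_doc = {"lucid": 100}                           # one query term, repeated
full_doc = {"lucid": 1, "in": 1, "imagination": 1}     # every query term, once

for name, weight in (("sqrt", sqrt_tf), ("saturating", saturating_tf)):
    print(name,
          round(score(partial_doc, query, weight), 3),
          round(score(full_doc, query, weight), 3))
```

Under these assumptions the repeated single term outscores the document containing all three terms with sqrt(), while the saturating weight ranks the complete document first, which is the kind of fix that belongs in ranking rather than in the default operator.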
So from a 'finished product' perspective, I think it should work reasonably well for as many languages and use cases as possible out of the box: it should be generic. This kind of tuning, which is specific to only certain use cases/languages/configurations, is well documented (it's easy to change the default operator) and not tricky to do.

> Velocity /browse GUI should stick to AND as defaultOperator
> -----------------------------------------------------------
>
>                 Key: SOLR-3145
>                 URL: https://issues.apache.org/jira/browse/SOLR-3145
>             Project: Solr
>          Issue Type: Improvement
>          Components: web gui
>    Affects Versions: 4.0
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 4.0
>
>         Attachments: SOLR-3145.patch
>
>
> After SOLR-1889 was committed, the DisMax "mm" parameter defaults to whatever
> is set in q.op. Since defaultOperator in schema.xml is OR, this means that
> DisMax now defaults to OR (mm=0) instead of the old default (mm=100%). It
> should stick to AND as before.
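The difference between the two defaults in the issue description (mm=0 versus mm=100%) can be illustrated with a toy model. This is a rough sketch and not Solr's implementation: the `matches` helper, the whitespace-style tokens, and the simplified handling of "mm" as a fraction rounded down with a floor of one matching term are all assumptions made for illustration.

```python
def matches(doc_terms, query_terms, mm_fraction):
    # Number of query terms that must appear in the document.
    # mm_fraction=1.0 models mm=100% (AND-like); 0.0 models mm=0 (OR-like),
    # where at least one term still has to match.
    required = max(1, int(len(query_terms) * mm_fraction))
    matched = sum(1 for term in query_terms if term in doc_terms)
    return matched >= required

# Hypothetical tokenization of the example from the comment.
doc = {"u.s.", "china", "relations"}
query = ["u.s.a", "china", "relations"]

print(matches(doc, query, 1.0))  # AND-like: one unmatched term excludes the document
print(matches(doc, query, 0.0))  # OR-like: the document is still a candidate for ranking
```

With the AND-like setting a single mismatched token removes the document from the result set entirely, whereas the OR-like setting keeps it and leaves its position to the ranking function.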