[jira] [Commented] (SOLR-3145) Velocity /browse GUI should stick to AND as defaultOperator

Robert Muir (Commented) (JIRA) Tue, 13 Mar 2012 10:41:03 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-3145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228533#comment-13228533
 ]


Robert Muir commented on SOLR-3145:
-----------------------------------

{quote}
But Jan is talking about just changing the default for just an example GUI 
(/browse), and not any query parsers. 
{quote}

I think it's pretty important. The problem is that in some languages, someone 
enters a search query containing some useless particle
and misses documents completely, purely because of grammatical 
structure.

Also, for a lot of languages (e.g. Chinese), tokenization into 'query terms' is 
not even close to completely accurate!

{quote}
That's pretty minor - not a big deal either way, but I do think that from a 
"finished product" perspective, more people expect all of their query terms to 
appear in matching documents (and I believe this is how Google does it?)
{quote}

This is false. Search for 'lucid in imagination' and look at the first result: 
it does not contain the word 'in'. 
This is just an illustration of my point (it's hard to come up with examples for 
English), but other examples
would be simple things like searching for U.S.A-China relations and missing 
documents that have U.S.-China relations.
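
To make the AND-versus-OR trade-off concrete, here is a minimal sketch in Python. It assumes a deliberately naive whitespace tokenizer, and the helper names (tokenize, matches) are hypothetical; real Solr analysis chains behave differently, so this only illustrates the failure mode, not actual Solr behavior.

```python
def tokenize(text):
    # Hypothetical, deliberately naive analyzer: lowercase and split on whitespace.
    return text.lower().split()


def matches(query, document, operator):
    """Return True if the document satisfies the query under AND or OR semantics."""
    query_terms = tokenize(query)
    doc_terms = set(tokenize(document))
    hits = [term in doc_terms for term in query_terms]
    if operator == "AND":
        return all(hits)  # every query term must be present
    if operator == "OR":
        return any(hits)  # one matching term is enough; ranking orders the rest
    raise ValueError("operator must be 'AND' or 'OR'")


query = "U.S.A-China relations"
document = "Recent developments in U.S.-China relations"

print(matches(query, document, "AND"))
print(matches(query, document, "OR"))
```

Under these assumptions the AND query drops the document because of a single near-miss token, while the OR query still retrieves it and leaves the ordering to the ranking function.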

In general, most of the stopword lists we have are very incomplete and minimal: 
I think this is good. But if you choose
to use AND as a default, you need to be much more aggressive about these things.

Also, I haven't even mentioned use cases that involve more natural-language 
searches (e.g. longer queries), which would
suffer even more here. 

Again, I think: don't wire the queryparser to force 100% query-term-importance; 
lean on the ranking system to do this.
As I mentioned, it's my opinion that there are serious problems with Lucene's sqrt() 
tf normalization (it grows too fast and does
not represent the information gain of additional term occurrences well), 
causing additional occurrences of only a few terms
to blow up the score versus documents that actually do contain all terms. But 
we shouldn't solve that with a hammer like this.
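
The scoring concern can be illustrated with a toy model. The sketch below is an assumption-laden simplification, loosely modeled on a sqrt(tf) weight combined with a coordination factor (fraction of query terms matched); it omits idf, length normalization, and everything else a real similarity does, and the function name toy_score is hypothetical.

```python
import math


def toy_score(doc_tf, query_terms):
    """Toy score: coord * sum(sqrt(tf)) over the query terms present in the document.

    Simplification for illustration only: no idf, no length norm, no boosts.
    """
    matched = [t for t in query_terms if doc_tf.get(t, 0) > 0]
    if not matched:
        return 0.0
    coord = len(matched) / len(query_terms)  # fraction of query terms matched
    return coord * sum(math.sqrt(doc_tf[t]) for t in matched)


query_terms = ["lucid", "imagination"]
doc_all_terms = {"lucid": 1, "imagination": 1}  # contains every query term once
doc_one_term_repeated = {"lucid": 25}           # repeats a single query term

print(toy_score(doc_all_terms, query_terms))
print(toy_score(doc_one_term_repeated, query_terms))
```

In this toy setting, enough repetitions of one term let the partial match outscore the document that contains all terms, which is the ranking behavior described above.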

So from a 'finished product' perspective, I think it should work reasonably well for as many 
languages and use cases as possible out of the box:
it should be generic. The kind of tuning that's specific to only certain use 
cases/languages/configurations is well documented 
(it's easy to change the default operator) and not tricky to do.
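
As a rough illustration of how small that change is on a per-request basis, the sketch below builds a query URL that passes the q.op parameter. The base URL and the helper name build_query_url are assumptions made for the example, not taken from this issue; adjust them to your own deployment.

```python
from urllib.parse import urlencode

# Assumed endpoint for a local example install; not taken from this issue.
BASE_URL = "http://localhost:8983/solr/select"


def build_query_url(user_query, default_operator="OR"):
    """Build a request URL that sets the default operator via q.op."""
    if default_operator not in ("AND", "OR"):
        raise ValueError("default_operator must be 'AND' or 'OR'")
    params = {"q": user_query, "q.op": default_operator}
    return BASE_URL + "?" + urlencode(params)


print(build_query_url("lucid imagination"))
print(build_query_url("lucid imagination", default_operator="AND"))
```

A deployment that wants stricter matching can opt in per request or per handler, while the shipped default stays generic.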

                
> Velocity /browse GUI should stick to AND as defaultOperator
> -----------------------------------------------------------
>
>                 Key: SOLR-3145
>                 URL: https://issues.apache.org/jira/browse/SOLR-3145
>             Project: Solr
>          Issue Type: Improvement
>          Components: web gui
>    Affects Versions: 4.0
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 4.0
>
>         Attachments: SOLR-3145.patch
>
>
> After SOLR-1889 was committed, the DisMax "mm" parameter defaults to whatever 
> is set in q.op. Since defaultOperator in schema.xml is OR, this means that 
> DisMax now defaults to OR (mm=0) instead of the old default (mm=100%). It 
> should stick to AND as before.
