[ https://issues.apache.org/jira/browse/SOLR-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475109#comment-13475109 ]
KuroSaka TeruHiko edited comment on SOLR-3729 at 10/12/12 7:40 PM:
-------------------------------------------------------------------
I have indirect evidence that *:* is passed to the analysis chain and that the
resulting tokens are concatenated and fed to DisjunctionMaxQuery when pf= is in
use. The analysis chain (tokenizer+filters) should not be invoked for the special
query "*:*", should it? Probably because my custom tokenizer generates three
tokens, "*", ":", and "*", I am seeing this in the debug output (note the space
between "*" and ":"):
{noformat}
DisjunctionMaxQuery((body:"* : *"~100^0.5...
{noformat}
and the score is not 1.0.
Here are the steps to reproduce this using NGramTokenizer. (I'm using
NGramTokenizer in an unrealistic way because I couldn't find another tokenizer
that divides "*:*" into three tokens.)
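To illustrate why a query-time tokenizer with minGramSize=1 and maxGramSize=1 yields three tokens for "*:*", here is a minimal Python sketch of unigram splitting. It is only a stand-in and not Lucene's NGramTokenizer: the function name `char_ngrams` is hypothetical, and joining the tokens with spaces merely mimics how the phrase is printed in the debug output.

```python
def char_ngrams(text, min_gram=1, max_gram=1):
    """Return all character n-grams of text with sizes min_gram..max_gram.

    Hypothetical stand-in for an n-gram tokenizer; not Lucene's implementation.
    """
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


tokens = char_ngrams("*:*")   # corresponds to minGramSize=1, maxGramSize=1
phrase = " ".join(tokens)     # tokens joined the way a phrase query prints them
print(tokens, repr(phrase))
```

With gram size 1, every character becomes its own token, which is consistent with the spaces seen inside the quoted phrase in the debug output.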
1. After indexing the Solr sample docs normally, stop Solr and insert this field type into schema.xml:
{noformat}
<fieldtype name="text_fake" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory"
               maxGramSize="1"
               minGramSize="1"/>
  </analyzer>
</fieldtype>
{noformat}
2. Replace the field definition for "name":
{noformat}
<field name="name" type="text_fake" indexed="true" stored="true"/>
{noformat}
In solrconfig.xml, change the default search handler's definition like this:
{noformat}
<str name="defType">edismax</str>
<str name="pf">name^0.5</str>
{noformat}
(I guess you could just have these in the URL.)
3. Start Solr and give this URL:
{noformat}
http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on
{noformat}
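As a convenience for scripting the reproduction, the request URL can be assembled from its parameters. This is a minimal sketch using only the Python standard library; the parameter values are copied from the URL above, and the variable names are illustrative.

```python
from urllib.parse import urlencode, parse_qs

# Parameters copied from the URL above, in the same order.
params = [
    ("indent", "on"),
    ("version", "2.2"),
    ("q", "*:*"),
    ("fq", ""),
    ("start", "0"),
    ("rows", "10"),
    ("fl", "*,score"),
    ("qt", ""),
    ("wt", ""),
    ("debugQuery", "on"),
]

# safe="*" keeps the asterisks literal, matching the URL in the comment.
query_string = urlencode(params, safe="*")
url = "http://localhost:8983/solr/select?" + query_string
print(url)

# Decoding shows the values the server receives for q and fl.
decoded = parse_qs(query_string, keep_blank_values=True)
print(decoded["q"], decoded["fl"])
```

The encoded form `*%3A*` decodes back to `*:*`, so the query parser receives the plain match-all string.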
Hopefully you'll see what I see:
{noformat}
<float name="score">0.3663672</float>
{noformat}
and
{noformat}
+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *"^0.5))
{noformat}
in the debug output.
I think two things are wrong here:
(1) The score calculation should not be attempted when the query is *:*.
(2) Even if the score calculation is done, "*:*" shouldn't be passed to
Tokenizers.
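To make point (2) concrete, here is a purely hypothetical Python sketch of the kind of guard I have in mind. It is not Solr code and not the attached patch: `phrase_clause`, its parameters, and the string format of the clause are all made up for illustration, and `analyze` stands for any callable that turns text into tokens.

```python
MATCH_ALL = "*:*"


def phrase_clause(user_query, field, boost, analyze):
    """Build a pf-style phrase clause string, or None for the match-all query.

    Hypothetical helper for illustration only; not part of Solr.
    """
    if user_query.strip() == MATCH_ALL:
        return None  # special query: skip analysis and phrase boosting
    tokens = analyze(user_query)
    return '{}:"{}"^{}'.format(field, " ".join(tokens), boost)


# list() splits a string into single characters, like the unigram tokenizer.
print(phrase_clause("*:*", "name", 0.5, list))
print(phrase_clause("ipod", "name", 0.5, str.split))
```

With a check like this, the match-all query never reaches the tokenizer, so no `name:"* : *"` clause would be generated.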
If this phenomenon has nothing to do with this bug, my apologies. I'll file a
separate bug.
This behavior is observed with Solr 3.5.0. I haven't tried Solr 4.0.
> ExtendedDismaxQParser (edismax) doesn't parse (*:*) properly
> ------------------------------------------------------------
>
> Key: SOLR-3729
> URL: https://issues.apache.org/jira/browse/SOLR-3729
> Project: Solr
> Issue Type: Bug
> Components: query parsers
> Affects Versions: 4.0-BETA
> Reporter: Jack Krupansky
> Attachments: SOLR-3729.patch
>
>
> I just happened to notice that (\*:\*) is not parsed properly by the edismax
> (ExtendedDismaxQParser) query parser in 4.0-beta. It appears to require
> spaces before and after the \*:\*, otherwise it treats the colon as part of a
> wildcard term (see the escaping below). I haven’t tried other releases yet.
> My original query:
> http://localhost:8983/solr/select/?debugQuery=true&q=(*:*)&defType=edismax
> Produces this:
> {code}
> <str name="rawquerystring">(*:*)</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
> <str name="parsedquery_toString">+(text:*\:*)</str>
> <str name="QParser">ExtendedDismaxQParser</str>
> {code}
> Some variations I tried:
> {code}
> <str name="rawquerystring">( *:*)</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
> <str name="parsedquery_toString">+(text:*\:*)</str>
>
> <str name="rawquerystring">(*:* )</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
> <str name="parsedquery_toString">+(text:*\:*)</str>
>
> <str name="rawquerystring">( *:* )</str>
> <str name="parsedquery">(+MatchAllDocsQuery(*:*))/no_coord</str>
> <str name="parsedquery_toString">+*:*</str>
>
> <str name="rawquerystring">(*:* -fox)</str>
> <str name="parsedquery">
> (+(DisjunctionMaxQuery((text:*\:*))
> -DisjunctionMaxQuery((text:fox))))/no_coord
> </str>
> <str name="parsedquery_toString">+((text:*\:*) -(text:fox))</str>
>
> <str name="rawquerystring">( *:* -fox)</str>
> <str name="parsedquery">
> (+(MatchAllDocsQuery(*:*) -DisjunctionMaxQuery((text:fox))))/no_coord
> </str>
> <str name="parsedquery_toString">+(*:* -(text:fox))</str>
> {code}
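The variations in the description can be checked mechanically. The following is a minimal sketch that parses three of the debug fragments quoted above (wrapped in a made-up `<lst>` root element so they form one XML document) and reports which raw queries were parsed into a MatchAllDocsQuery. The labels "match-all" and "wildcard term" are only illustrative.

```python
import xml.etree.ElementTree as ET

# Debug fragments copied from the issue description, wrapped in a root element.
debug_xml = r"""<lst>
<str name="rawquerystring">(*:*)</str>
<str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
<str name="rawquerystring">( *:* )</str>
<str name="parsedquery">(+MatchAllDocsQuery(*:*))/no_coord</str>
<str name="rawquerystring">( *:* -fox)</str>
<str name="parsedquery">
(+(MatchAllDocsQuery(*:*) -DisjunctionMaxQuery((text:fox))))/no_coord
</str>
</lst>"""

root = ET.fromstring(debug_xml)
raw = [e.text for e in root.findall("str[@name='rawquerystring']")]
parsed = [e.text.strip() for e in root.findall("str[@name='parsedquery']")]

# True where the parser recognized *:* as the match-all query.
recognized = ["MatchAllDocsQuery" in p for p in parsed]

for query, ok in zip(raw, recognized):
    print(repr(query), "->", "match-all" if ok else "wildcard term")
```

Only the variants with a space immediately before `*:*` and a delimiter after it come out as match-all, which agrees with the observation in the description.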