[ 
https://issues.apache.org/jira/browse/SOLR-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475109#comment-13475109
 ] 

KuroSaka TeruHiko edited comment on SOLR-3729 at 10/12/12 7:40 PM:
-------------------------------------------------------------------

I have indirect evidence that *:* is passed to the analysis chain, and that the 
resulting tokens are concatenated and fed to DisjunctionMaxQuery when pf= is in 
use.  The analysis chain (tokenizer + filters) should not be invoked for the 
special query "*:*", should it? Probably because my custom tokenizer generates 
three tokens, "*", ":", and "*", I am seeing this in the debug output (note the 
space between "*" and ":"):
{noformat}
DisjunctionMaxQuery((body:"* : *"~100^0.5...
{noformat}
and the score is not 1.0.

Here are the steps to reproduce this using NGramTokenizer. (I'm using 
NGramTokenizer in an unrealistic way because I couldn't find another tokenizer 
that divides "*:*" into three tokens.)
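
To make the suspected mechanism concrete, here is a minimal sketch in plain Python. It is not Lucene or Solr code: `unigram_tokens` and `phrase_clause` are hypothetical helpers that only imitate a tokenizer with minGramSize=maxGramSize=1 and the way a phrase clause is rendered in the debug output.

```python
def unigram_tokens(text):
    """Split text into one-character tokens, imitating an n-gram
    tokenizer configured with min and max gram size of 1."""
    return list(text)


def phrase_clause(field, tokens, boost):
    """Render tokens as a boosted phrase clause. Terms are joined with
    single spaces, which is where the space in "* : *" would come from."""
    return f'{field}:"{" ".join(tokens)}"^{boost}'


tokens = unigram_tokens("*:*")
clause = phrase_clause("name", tokens, 0.5)
print(tokens)
print(clause)
```

If the query string really is tokenized this way, the rendered clause has the same shape as the phrase clause in the debug output.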

1. After indexing the Solr sample docs normally, stop Solr and insert this into schema.xml:
{noformat}
    <fieldtype name="text_fake" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.NGramTokenizerFactory"
                   maxGramSize="1"
                   minGramSize="1"/>
      </analyzer>
    </fieldtype>
{noformat}

2. Replace the field definition for "name":
{noformat}
   <field name="name" type="text_fake" indexed="true" stored="true"/>
{noformat}

In solrconfig.xml, change the default search handler's definition like this:
{noformat}

         <str name="defType">edismax</str>
         <str name="pf">name^0.5</str>
{noformat}
(I guess you could pass these as URL parameters instead.)

3. Start Solr and request this URL:
{noformat}
http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on
{noformat}
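
For anyone who would rather not edit solrconfig.xml, the request can also be assembled with defType and pf as URL parameters. This is only a sketch: it assumes that per-request parameters behave the same as the handler defaults above (I have not verified that), and it leaves out the empty fq, qt, and wt parameters. The host, port, and path are the ones from the URL above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Host, port, and handler path are taken from the URL in the report.
BASE = "http://localhost:8983/solr/select"

params = {
    "q": "*:*",
    "defType": "edismax",  # assumption: sent per request instead of via solrconfig.xml
    "pf": "name^0.5",      # assumption: sent per request instead of via solrconfig.xml
    "fl": "*,score",
    "start": 0,
    "rows": 10,
    "indent": "on",
    "debugQuery": "on",
}

# urlencode percent-encodes "*", ":", "^", and "," for us.
url = BASE + "?" + urlencode(params)
print(url)
```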

Hopefully you'll see what I see:
{noformat}
<float name="score">0.3663672</float>
{noformat}
and
{noformat}
+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *"^0.5))
{noformat}
in the debug output.

I think two things are wrong here:
(1) The score calculation should not be attempted when the query is *:*.
(2) Even if the score calculation is done, "*:*" shouldn't be passed to 
Tokenizers.
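
The behavior I would expect can be sketched as a simple guard. This is plain Python for illustration, not Solr's code and not a proposed patch: `phrase_boost_clauses`, `MATCH_ALL`, and the stand-in analyzer are all hypothetical names.

```python
MATCH_ALL = "*:*"


def phrase_boost_clauses(user_query, pf_boosts, analyze):
    """Build pf phrase clauses, bypassing analysis for the match-all query.

    pf_boosts maps field name -> boost; analyze is any callable that turns
    a query string into a list of tokens (a stand-in for the field's
    query analyzer).
    """
    if user_query.strip() == MATCH_ALL:
        # Match-all: no tokenizer call and no phrase clause, so nothing
        # is added to the score.
        return []
    clauses = []
    for field, boost in pf_boosts.items():
        tokens = analyze(user_query)
        if tokens:
            clauses.append(f'{field}:"{" ".join(tokens)}"^{boost}')
    return clauses


# Stand-in analyzer that imitates a 1-gram tokenizer.
unigram = list

print(phrase_boost_clauses("*:*", {"name": 0.5}, unigram))
print(phrase_boost_clauses("ab", {"name": 0.5}, unigram))
```

With a guard like this, "*:*" never reaches the tokenizer, while an ordinary query still gets its phrase clause.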


If this phenomenon has nothing to do with this bug, my apologies.  I'll file a 
separate bug.
This behavior was observed with Solr 3.5.0.  I haven't tried Solr 4.0.

                
                  
> ExtendedDismaxQParser (edismax) doesn't parse (*:*) properly
> ------------------------------------------------------------
>
>                 Key: SOLR-3729
>                 URL: https://issues.apache.org/jira/browse/SOLR-3729
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 4.0-BETA
>            Reporter: Jack Krupansky
>         Attachments: SOLR-3729.patch
>
>
> I just happened to notice that (\*:\*) is not parsed properly by the edismax 
> (ExtendedDismaxQParser) query parser in 4.0-beta. It appears to require 
> spaces before and after the \*:\*, otherwise it treats the colon as part of a 
> wildcard term (see the escaping below). I haven’t tried other releases yet.
> My original query:
> http://localhost:8983/solr/select/?debugQuery=true&q=(*:*)&defType=edismax
> Produces this:
> {code}
> <str name="rawquerystring">(*:*)</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
> <str name="parsedquery_toString">+(text:*\:*)</str>
> <str name="QParser">ExtendedDismaxQParser</str>
> {code}
> Some variations I tried:
> {code}
> <str name="rawquerystring">( *:*)</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
> <str name="parsedquery_toString">+(text:*\:*)</str>
>  
> <str name="rawquerystring">(*:* )</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
> <str name="parsedquery_toString">+(text:*\:*)</str>
>  
> <str name="rawquerystring">( *:* )</str>
> <str name="parsedquery">(+MatchAllDocsQuery(*:*))/no_coord</str>
> <str name="parsedquery_toString">+*:*</str>
>  
> <str name="rawquerystring">(*:* -fox)</str>
> <str name="parsedquery">
> (+(DisjunctionMaxQuery((text:*\:*)) 
> -DisjunctionMaxQuery((text:fox))))/no_coord
> </str>
> <str name="parsedquery_toString">+((text:*\:*) -(text:fox))</str>
>  
> <str name="rawquerystring">( *:* -fox)</str>
> <str name="parsedquery">
> (+(MatchAllDocsQuery(*:*) -DisjunctionMaxQuery((text:fox))))/no_coord
> </str>
> <str name="parsedquery_toString">+(*:* -(text:fox))</str>
> {code}
