[ https://issues.apache.org/jira/browse/SOLR-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475109#comment-13475109 ]
KuroSaka TeruHiko edited comment on SOLR-3729 at 10/12/12 7:40 PM:
-------------------------------------------------------------------

I have indirect evidence that *:* is passed to the analysis chain, and that the resulting tokens are concatenated and fed to DisjunctionMaxQuery when pf= is in use. The analysis chain (tokenizer + filters) should not be invoked for the special query "*:*", should it?

Probably because my custom tokenizer generates three tokens, "*", ":", and "*", I am seeing this in the debug output (note the space between "*" and ":"):
{noformat}
DisjunctionMaxQuery((body:"* : *"~100^0.5...
{noformat}
and the score is not 1.0.

Here are steps to reproduce this using NGramTokenizer. (I'm using NGramTokenizer in an unrealistic way because I couldn't find another Tokenizer that divides "*:*" into three tokens.)

1. After indexing the Solr sample docs normally, stop Solr and insert:
{noformat}
<fieldtype name="text_fake" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" maxGramSize="1" minGramSize="1" />
  </analyzer>
</fieldtype>
{noformat}

2. Replace the field definition for "name":
{noformat}
<field name="name" type="text_fake" indexed="true" stored="true"/>
{noformat}
In solrconfig.xml, change the default search handler's definition like this:
{noformat}
<str name="defType">edismax</str>
<str name="pf">name^0.5</str>
{noformat}
(I guess you could just pass these in the URL instead.)

3. Start Solr and request this URL:
{noformat}
http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on
{noformat}
Hopefully you'll see what I see:
{noformat}
<float name="score">0.3663672</float>
{noformat}
and
{noformat}
+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *"^0.5))
{noformat}
in the debug output.

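For illustration only, the token split described in the steps above can be mimicked in plain Java without Lucene. This sketch assumes that a tokenizer configured with minGramSize=1 and maxGramSize=1 emits each character as its own token, and that a phrase is displayed as its tokens joined by single spaces. The class and method names (UnigramSketch, unigrams, asPhrase) are hypothetical, and none of this is how NGramTokenizer or edismax is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

class UnigramSketch {
    // Assumption: a 1-gram tokenizer yields one token per character.
    // This is a stand-in for illustration, not Lucene code.
    static List<String> unigrams(String input) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < input.length(); i++) {
            tokens.add(String.valueOf(input.charAt(i)));
        }
        return tokens;
    }

    // Join the tokens with single spaces and wrap them in quotes,
    // mirroring how a phrase query appears in the debug output.
    static String asPhrase(List<String> tokens) {
        return "\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        List<String> tokens = unigrams("*:*");
        System.out.println(tokens);                      // [*, :, *]
        System.out.println("name:" + asPhrase(tokens));  // name:"* : *"
    }
}
```

If the query string really is handed to the query-time analyzer, a split like this would account for the spaces in name:"* : *".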
I think two things are wrong here:
(1) The score calculation should not be attempted when the query is *:*.
(2) Even if the score calculation is done, "*:*" shouldn't be passed to Tokenizers.

If this phenomenon has nothing to do with this bug, my apologies. I'll file a separate bug.

This behavior is observed with Solr 3.5.0. I haven't tried Solr 4.0.

> ExtendedDismaxQParser (edismax) doesn't parse (*:*) properly
> ------------------------------------------------------------
>
>                 Key: SOLR-3729
>                 URL: https://issues.apache.org/jira/browse/SOLR-3729
>             Project: Solr
>          Issue Type: Bug
>          Components: query parsers
>    Affects Versions: 4.0-BETA
>            Reporter: Jack Krupansky
>         Attachments: SOLR-3729.patch
>
>
> I just happened to notice that (\*:\*) is not parsed properly by the edismax
> (ExtendedDismaxQParser) query parser in 4.0-beta. It appears to require
> spaces before and after the \*:\*, otherwise it treats the colon as part of a
> wildcard term (see the escaping below). I haven’t tried other releases yet.
> My original query:
> http://localhost:8983/solr/select/?debugQuery=true&q=(*:*)&defType=edismax
> Produces this:
> {code}
> <str name="rawquerystring">(*:*)</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
> <str name="parsedquery_toString">+(text:*\:*)</str>
> <str name="QParser">ExtendedDismaxQParser</str>
> {code}
> Some variations I tried:
> {code}
> <str name="rawquerystring">( *:*)</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
> <str name="parsedquery_toString">+(text:*\:*)</str>
>
> <str name="rawquerystring">(*:* )</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((text:*\:*)))/no_coord</str>
> <str name="parsedquery_toString">+(text:*\:*)</str>
>
> <str name="rawquerystring">( *:* )</str>
> <str name="parsedquery">(+MatchAllDocsQuery(*:*))/no_coord</str>
> <str name="parsedquery_toString">+*:*</str>
>
> <str name="rawquerystring">(*:* -fox)</str>
> <str name="parsedquery">
> (+(DisjunctionMaxQuery((text:*\:*))
> -DisjunctionMaxQuery((text:fox))))/no_coord
> </str>
> <str name="parsedquery_toString">+((text:*\:*) -(text:fox))</str>
>
> <str name="rawquerystring">( *:* -fox)</str>
> <str name="parsedquery">
> (+(MatchAllDocsQuery(*:*) -DisjunctionMaxQuery((text:fox))))/no_coord
> </str>
> <str name="parsedquery_toString">+(*:* -(text:fox))</str>
> {code}
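To retry the query variations from the issue description against a local Solr, the request URLs can be generated instead of typed by hand. The following is a minimal sketch, assuming the same host, port, and path as the original query and Java 10 or later (for the URLEncoder.encode overload that takes a Charset). The class name EdismaxVariantUrls and the helper debugUrl are hypothetical; the sketch only prints URLs and does not contact Solr.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

class EdismaxVariantUrls {
    // Assumption: Solr is reachable at the address used in the original report.
    static final String BASE = "http://localhost:8983/solr/select/";

    // Build a debug URL for one raw query string, percent-encoding the q value
    // so that parentheses, colons, and spaces survive the request intact.
    static String debugUrl(String rawQuery) {
        return BASE + "?debugQuery=true&defType=edismax&q="
                + URLEncoder.encode(rawQuery, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The raw query strings listed in the issue description.
        List<String> variants = List.of(
                "(*:*)", "( *:*)", "(*:* )", "( *:* )", "(*:* -fox)", "( *:* -fox)");
        for (String q : variants) {
            System.out.println(debugUrl(q));
        }
    }
}
```

URLEncoder writes a space as "+", which is decoded back to a space in a query string, so the whitespace differences between the variants are preserved.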