KuroSaka TeruHiko created SOLR-3962: ---------------------------------------
             Summary: For the match-all-docs query *:*, (e)dismax parser passes "*:*" to tokenizer. Under certain conditions, hit suboptimal (<1.0) score is reported.
                 Key: SOLR-3962
                 URL: https://issues.apache.org/jira/browse/SOLR-3962
             Project: Solr
          Issue Type: Bug
          Components: query parsers
    Affects Versions: 4.0, 3.6, 3.5
            Reporter: KuroSaka TeruHiko

My understanding is that the special match-all-docs query "\*:\*" shouldn't invoke tokenizers and that all hits should have a score of 1.0. This is usually the case. But when all of the following conditions are met, suboptimal (<1.0) hit scores are reported:
* the dismax or edismax parser is used
* a tokenizer that splits "\*:\*" into multiple tokens is used
* the pf parameter is specified for a field that uses the above tokenizer

Use case:
* We created a Japanese tokenizer that happens to break "\*:\*" into three tokens, one per symbol.
* Our client uses this tokenizer for Japanese with edismax on Solr 3.6.
* They have pf=text^0.5 in the defaults section of solrconfig.xml.
* When a search is run with the query string "\*:\*", all the Japanese hits have scores much less than 1.0.

Below is how to simulate this situation with an NGramTokenizer. (The setup is not realistic.)

1. Run Solr with the default settings. Post all *.xml docs in examples/exampledocs.
2. Stop Solr.
3. Add this fieldType:
{noformat}
<fieldtype name="text_fake" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" maxGramSize="1" minGramSize="1" />
  </analyzer>
</fieldtype>
{noformat}
4. Change the field definition of "name" to use "text_fake".
5. Restart Solr.
6. GET this URL:
http://localhost:8983/solr/select?indent=on&version=2.2&q=*%3A*&fq=&start=0&rows=10&fl=*%2Cscore&qt=&wt=&debugQuery=on&defType=edismax&pf=name

Below is an excerpt of query debug output.
Notice that "\*:\*" is expanded with spaces to "\* : \*":
{noformat}
...
<doc>
  <str name="id">ati</str>
  <str name="compName_s">ATI Technologies</str>
  <str name="address_s">
    33 Commerce Valley Drive East Thornhill, ON L3T 7N6 Canada
  </str>
  <long name="_version_">1415830106362871808</long>
  <float name="score">0.07443535</float>
</doc>
</result>
<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">
    (+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((name:"* : *")))/no_coord
  </str>
{noformat}

And here is a partial stack trace at the time the tokenizer is called from the query parser:
{noformat}
NGramTokenizer.incrementToken() line: 112
CachingTokenFilter.fillCache() line: 90
CachingTokenFilter.incrementToken() line: 55
ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParserBase).newFieldQuery(Analyzer, String, String, boolean) line: 513
ExtendedDismaxQParser$ExtendedSolrQueryParser.newFieldQuery(Analyzer, String, String, boolean) line: 1018
ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParserBase).getFieldQuery(String, String, boolean) line: 474
ExtendedDismaxQParser$ExtendedSolrQueryParser(SolrQueryParser).getFieldQuery(String, String, boolean) line: 169
ExtendedDismaxQParser$ExtendedSolrQueryParser.getQuery() line: 1163
ExtendedDismaxQParser$ExtendedSolrQueryParser.getAliasedQuery() line: 1105
ExtendedDismaxQParser$ExtendedSolrQueryParser.getQueries(Alias) line: 1145
ExtendedDismaxQParser$ExtendedSolrQueryParser.getAliasedQuery() line: 1073
ExtendedDismaxQParser$ExtendedSolrQueryParser.getFieldQuery(String, String, int) line: 989
ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParserBase).handleQuotedTerm(String, Token, Token) line: 1082
ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParser).Term(String) line: 462
ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParser).Clause(String) line: 257
ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParser).Query(String) line: 181
ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParser).TopLevelQuery(String) line: 170
ExtendedDismaxQParser$ExtendedSolrQueryParser(QueryParserBase).parse(String) line: 120
ExtendedDismaxQParser.addShingledPhraseQueries(BooleanQuery, List<Clause>, Map<String,Float>, int, float, int) line: 506
ExtendedDismaxQParser.parse() line: 338
ExtendedDismaxQParser(QParser).getQuery() line: 143
QueryComponent.prepare(ResponseBuilder) line: 118
SearchHandler.handleRequestBody(SolrQueryRequest, SolrQueryResponse) line: 192
SearchHandler(RequestHandlerBase).handleRequest(SolrQueryRequest, SolrQueryResponse) line: 129
...
{noformat}
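
To make the report easier to follow, here is a minimal Python sketch of the two pieces involved. It does not call Solr or Lucene. It only imitates what a 1-gram query tokenizer would do to the string "*:*", and it rebuilds the essential parameters of the reproduction URL. The function names (unigram_tokens, phrase_clause, build_repro_url) are hypothetical and chosen for illustration, and the assumption that the tokenizer emits one token per character is taken from the minGramSize="1" maxGramSize="1" setting in the report.

```python
from urllib.parse import urlencode


def unigram_tokens(text):
    """Imitate a 1-gram tokenizer: every character becomes its own token.

    Assumption: this stands in for NGramTokenizer with min/max gram size 1;
    it is not the Lucene implementation.
    """
    return list(text)


def phrase_clause(field, text):
    """Render the tokens as a phrase clause, as the debug output prints it."""
    return '{}:"{}"'.format(field, " ".join(unigram_tokens(text)))


def build_repro_url(base="http://localhost:8983/solr/select"):
    """Build a URL with the parameters that matter for the reproduction."""
    params = [
        ("q", "*:*"),            # match-all-docs query
        ("fl", "*,score"),       # return all fields plus the score
        ("rows", "10"),
        ("debugQuery", "on"),    # expose parsedquery in the response
        ("defType", "edismax"),
        ("pf", "name"),          # phrase field that uses the 1-gram tokenizer
    ]
    # safe="*" keeps the asterisks literal, as in the URL from the report.
    return base + "?" + urlencode(params, safe="*")


if __name__ == "__main__":
    print(phrase_clause("name", "*:*"))
    print(build_repro_url())
```

The phrase clause this produces has the same shape as the DisjunctionMaxQuery clause in the parsedquery excerpt, which is the extra clause that pulls the score away from 1.0 when pf is set.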