Karl Wolf created SOLR-13367:
--------------------------------

Summary: Highlighting fails for Range queries on Multi-valued String fields
Key: SOLR-13367
URL: https://issues.apache.org/jira/browse/SOLR-13367
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: highlighter
Affects Versions: 7.7.1, 7.5
Environment: RedHat Linux v7
Java 1.8.0_201
Reporter: Karl Wolf
Fix For: 5.1

Range queries against multi-valued string fields produce useless highlighting, even though "hl.highlightMultiTerm":"true" is set.

I have uncovered what I believe is a bug. At the very least, it is a difference in behavior between Solr v5.1.0 and v7.5.0 (and v7.7.1).

I have a multi-valued string field defined in my schema as:

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="MyStringField" type="string" indexed="true" stored="true" multiValued="true" />

I am using a query containing a range clause, and I am using highlighting to get the list of values that actually matched the range query. All examples below were run from the Query page for the appropriate core in the Solr Admin UI.

***************************************************************************
First, a correctly working example of a range query using Solr v5.1.0, which produces useful results:

{
  "responseHeader": {
    "status": 0,
    "QTime": 366,
    "params": {
      "q": "MyStringField:[A TO B}",
      "hl": "true",
      "indent": "true",
      "hl.preserveMulti": "true",
      "fl": "MyStringField,MyUniqueID",
      "hl.requireFieldMatch": "true",
      "hl.usePhraseHighlighter": "true",
      "hl.fl": "MyStringField",
      "wt": "json",
      "hl.highlightMultiTerm": "true",
      "_": "1553275722025"
    }
  },
  "response": {
    "numFound": 999,
    "start": 0,
    "docs": [
      {
        "MyStringField": [
          "Stanley, Wendell M.",
          "Avery, Roy"
        ],
        "MyUniqueID": "UniqueID1"
      },
      {
        "MyStringField": [
          "Avery, Roy"
        ],
        "MyUniqueID": "UniqueID2"
      },
*** lots more docs correctly found
    ]
  },
*** we get to the highlighting portion of the response
*** this indicates which values of each MyStringField
*** actually matched the query
  "highlighting": {
    "UniqueID1": {
      "MyStringField": [
        "<em>Avery, Roy</em>"
      ]
    },
    "UniqueID2": {
      "MyStringField": [
        "<em>Avery, Roy</em>"
      ]
    },
    "UniqueID3": {
      "MyStringField": [
        "<em>American Institute of Biological Sciences</em>",
        "<em>Albritton, Errett C.</em>"
      ]
    },
    ... etc.
*** lots more useful highlight values.
Note the two matching values
*** for document UniqueID3.
}

***************************************************************************
* THE PROBLEM * Now using newer versions of Solr
***************************************************************************

Using the exact same parameters with Solr v7.5.0 or v7.7.1, the top portion of the response is basically the same, including the number of documents found:

{
  "responseHeader":{
    "status":0,
    "QTime":245,
    "params":{
      "q":"MyStringField:[A TO B}",
      "hl":"on",
      "hl.preserveMulti":"true",
      "fl":"MyUniqueID, MyStringField",
      "hl.requireFieldMatch":"true",
      "hl.fl":"MyStringField",
      "hightlightMultiTerm":"true",
      "wt":"json",
      "_":"1553105129887",
      "usePhraseHighLighter":"true"}},
  "response":{"numFound":999,"start":0,"docs":[

*** The problem is with the highlighting portion of the results, which is effectively empty.
*** There is no way to know which values in each document actually matched the query:

  "highlighting":{
    "UniqueID1":{},
    "UniqueID2":{},
    "UniqueID3":{},
    ... etc.

*** NOTE: The source data is the same for all of the tested Solr versions, and the Solr indexes
*** were properly rebuilt for each Solr version.

***************************************************************************
Changing the request to use the "unified" highlighter ("hl.method=unified"), the highlighting looks like:

  "highlighting":{
    "UniqueID1":{
      "MyStringField":[]},
    "UniqueID2":{
      "MyStringField":[]},
    "UniqueID3":{
      "MyStringField":[]},
    ... etc.

*** The highlighting now properly lists the matching field, but still no useful values are listed.

***************************************************************************
NOTE: If I change the query from a range clause to a wildcard query, q="MyStringField:A*", the highlighting is correct in both Solr v7.5.0 and v7.7.1. These are GOOD results!

  "highlighting":{
    "UniqueID1": {
      "MyStringField": ["<em>Avery, Roy</em>"]},
    "UniqueID2": {
      "MyStringField": ["<em>Avery, Roy</em>"]},
    "UniqueID3": {
      "MyStringField": [
        "<em>American Institute of Biological Sciences</em>",
        "<em>Albritton, Errett C.</em>"
      ]
    },
    ... etc.

*** This makes me think there is some problem with the way a range query
*** feeds the search results to the Solr highlighter code.

***************************************************************************
No variation of the hl settings or of any other query parameter that I have tried solves the problem. The wildcard query is my current workaround, but there is still a problem with range queries.

In summary, there is some incompatibility among:
1) A multi-valued string field AND
2) A range query against that field AND
3) The result highlighting. It is effectively empty.

I don't know when this issue was first introduced. I recently updated from 5.1.0 to 7.5.0 in one big leap. I attempted to read through the change logs for the intervening versions, but I gave up to save my sanity.

You should be able to reproduce this issue using any multi-valued, indexed and stored string field.