Erick Erickson created SOLR-4516:
------------------------------------

             Summary: Highlighting while querying on field:* highlights every 
value in the field.
                 Key: SOLR-4516
                 URL: https://issues.apache.org/jira/browse/SOLR-4516
             Project: Solr
          Issue Type: Improvement
            Reporter: Erick Erickson
            Priority: Minor


A query like 
q=*:*&hl=on&.....

doesn't attempt to highlight anything, as well it shouldn't. But 
q=field1:*&hl=on&...

does try to highlight. Of course it highlights every last term in the highlight 
fields, and is also very slow. 

Re-forming the query as 
q=*:*&fq=field1:*&hl=on&.... 
gets around the problem and is a better query anyway, but it still seems wrong 
to attempt highlighting for q=field1:* in the first place.
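
For anyone trying the workaround from a client, here is a minimal sketch of the
two request shapes side by side. The helper names (slow_form, filtered_form,
to_query_string) are made up for illustration. Only q, fq and hl=on come from
the description above, and the elided parameters are left out rather than
guessed.

```python
from urllib.parse import urlencode


def slow_form(field):
    """field:* sits in q, which is the case reported above as slow."""
    return {"q": f"{field}:*", "hl": "on"}


def filtered_form(field):
    """Match-all in q; the field restriction moves into fq instead."""
    return {"q": "*:*", "fq": f"{field}:*", "hl": "on"}


def to_query_string(params):
    # safe="*:" keeps the wildcard syntax readable instead of percent-encoding it.
    return urlencode(params, safe="*:")


print(to_query_string(slow_form("field1")))
print(to_query_string(filtered_form("field1")))
```

The second form returns the same documents but leaves q as *:*, which per the
description above is not highlighted.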

Comments from the dev list

Jack Krupansky:
If you want to add a highlight option to suppress or limit highlighting for 
wildcard terms (or any multi-term query, including fuzzy query), that would 
seem reasonable, but I’d hate to lose the highlighting for useful wildcards 
such as field1:invest*.
 
Maybe if it was something like &hl.maxMultiTerms=15, that would provide the 
best of both worlds – a reasonable default to prevent really slow highlighting, 
but still give reasonable highlighting in reasonable cases, and give you the 
ultimate control to completely turn off all multi-term expansion highlighting 
if you so choose.
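
To make the suggestion concrete, a rough sketch of how such a limit might
behave. hl.maxMultiTerms is only a proposed name at this point, not an existing
parameter, and the function below is hypothetical. It also assumes that going
over the limit suppresses highlighting entirely rather than truncating the
term list, which the comment above leaves open. Python's fnmatch stands in for
real wildcard expansion against the index.

```python
from fnmatch import fnmatchcase


def terms_to_highlight(pattern, indexed_terms, max_multi_terms=15):
    """Expand a wildcard pattern against known terms, subject to a cap.

    Hypothetical model of the proposed hl.maxMultiTerms option:
    if the pattern expands to more terms than the cap, highlight nothing.
    A cap of 0 therefore turns off multi-term highlighting altogether.
    """
    matches = [t for t in indexed_terms if fnmatchcase(t, pattern)]
    if len(matches) > max_multi_terms:
        return []  # too many expansions: skip highlighting for this clause
    return matches


terms = ["invest", "investor", "investment", "income", "bond"]
print(terms_to_highlight("invest*", terms))               # narrow wildcard is kept
print(terms_to_highlight("*", terms, max_multi_terms=3))  # over the cap, suppressed
```

With the default cap a useful wildcard such as invest* still highlights, while
a bare * over a large field would exceed the cap and be skipped.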


Me:
I was mostly thinking of this specific case, but a more general solution makes 
sense. I can still argue that the case of field:* shouldn't ever try to 
highlight, but field:some* could, as you say, actually be useful....

Mostly I'm drawing attention to the difference between *:* and field:*. I think 
we should be consistent across both.

Jack:
Could I subvert your “fix” by writing field1:* as field1:** or field1:?* ?
 
*:* is simply a shorthand for “MatchAllDocs”, with no implication that it is 
referencing any field values, while field1:* is an explicit wildcard query, so 
they are not really comparable other than at a superficial lexical level.
 
That said, somewhere there is a Jira that I filed that attempts to have * 
treated as a faster filter query for matching all docs that have any value 
(non-null) in a field. Your proposal makes more sense in that context since it 
is clear that * is semantically distinct from a true wildcard.
 
Back to my question above, I think it’s okay if only strict single-asterisk 
wildcard is covered by your change. Any other wildcard or fuzzy query would 
continue to behave as before – although adding my suggested limit on term 
expansion might still be worthwhile. And I might still argue that your fix 
should be an option even if the default is as you have suggested.
 
But, all these comments should be placed on a Jira!
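
The distinction discussed above (*:* versus a strict single-asterisk field:*
versus any other wildcard) can be sketched as a small classifier. This is
purely illustrative: the function and its labels are hypothetical, it assumes
the input is a single field:value clause, and it is not how the query parser
actually works.

```python
def classify(clause):
    """Classify a single field:value clause for highlighting purposes.

    Hypothetical labels:
      match_all    -> *:*            (no highlighting)
      field_exists -> field:*        (strict single asterisk; the proposed change)
      wildcard     -> anything else containing * or ? (behaves as before)
      term         -> no wildcard characters at all
    """
    field, _, value = clause.partition(":")
    if field == "*" and value == "*":
        return "match_all"
    if value == "*":
        return "field_exists"
    if "*" in value or "?" in value:
        return "wildcard"
    return "term"


for clause in ["*:*", "field1:*", "field1:**", "field1:?*", "field1:invest*"]:
    print(clause, "->", classify(clause))
```

Under this reading, field1:** and field1:?* are not covered by the change and
keep the existing wildcard behavior, which matches the "strict single-asterisk"
scope suggested above.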


