Erick Erickson created SOLR-4516:
------------------------------------
Summary: Highlighting while querying on field:* highlights every
value in the field.
Key: SOLR-4516
URL: https://issues.apache.org/jira/browse/SOLR-4516
Project: Solr
Issue Type: Improvement
Reporter: Erick Erickson
Priority: Minor
A query like
q=*:*&hl=on&.....
doesn't attempt to highlight anything, as well it shouldn't. But
q=field1:*&hl=on&...
does try to highlight. Of course it highlights every last term in the highlight
fields, and is also very slow.
Re-forming the query as
q=*:*&fq=field1:*&hl=on&....
gets around the problem and is a better query anyway, but it still seems like
trying to highlight in the q=field1:* case is wrong.
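For illustration, here is a minimal sketch of the two request forms above, built as URL query strings. The helper names are made up for this example, and only the q, fq and hl parameters from the description are used; the elided remainder of the request ("....") is left out.

```python
from urllib.parse import urlencode, parse_qs

def wildcard_in_q(field: str) -> str:
    # Problem form from the description: field:* is the main query,
    # so it is what highlighting is attempted against.
    return urlencode({"q": f"{field}:*", "hl": "on"})

def wildcard_in_fq(field: str) -> str:
    # Workaround form from the description: match everything in q and
    # move the field:* restriction into a filter query.
    return urlencode({"q": "*:*", "fq": f"{field}:*", "hl": "on"})
```

Decoding either string with parse_qs gives back the original parameter values, which is a quick way to check what would actually be sent.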
Comments from the dev list:
Jack Krupansky:
If you want to add a highlight option to suppress or limit highlighting for
wildcard terms (or any multi-term query, including fuzzy query), that would
seem reasonable, but I’d hate to lose the highlighting for useful wildcards
such as field1:invest*.
Maybe if it was something like &hl.maxMultiTerms=15, that would provide the
best of both worlds – a reasonable default to prevent really slow highlighting,
but still give reasonable highlighting in reasonable cases, and give you the
ultimate control to completely turn off all multi-term expansion highlighting
if you so choose.
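A rough sketch of how such a cap might behave, outside of Solr entirely. Everything here is an assumption for illustration: hl.maxMultiTerms is only a proposed parameter, the function names are hypothetical, the term list stands in for the index, and "skip highlighting when the expansion exceeds the cap" is just one possible reading of the proposal (truncating the term list would be another).

```python
from fnmatch import fnmatchcase

def expand_wildcard(pattern, indexed_terms):
    """Toy expansion: return the indexed terms matching a glob-style pattern."""
    return [t for t in indexed_terms if fnmatchcase(t, pattern)]

def terms_to_highlight(pattern, indexed_terms, max_multi_terms=15):
    """Return the expanded terms to highlight, or [] when over the cap.

    Assumed semantics: an expansion larger than max_multi_terms is not
    highlighted at all, so max_multi_terms=0 turns off multi-term
    highlighting completely.
    """
    expanded = expand_wildcard(pattern, indexed_terms)
    if len(expanded) > max_multi_terms:
        return []
    return expanded
```

With a default of 15, field1:invest* would still highlight in a field with a handful of matching terms, while field1:* on any sizeable field would exceed the cap and be skipped.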
Me:
I was mostly thinking of this specific case, but a more general solution makes
sense. I can still argue that the case of field:* shouldn't ever try to
highlight, but field:some* could, as you say, actually be useful....
Mostly I'm drawing attention to the difference between *:* and field:*. I think
we should be consistent across both.
Jack:
Could I subvert your “fix” by writing field1:* as field1:** or field1:?* ?
*:* is simply a shorthand for “MatchAllDocs”, with no implication that it is
referencing any field values, while field1:* is an explicit wildcard query, so
they are not really comparable other than at a superficial lexical level.
That said, somewhere there is a Jira that I filed that attempts to have *
treated as a faster filter query for matching all docs that have any value
(non-null) in a field. Your proposal makes more sense in that context since it
is clear that * is semantically distinct from a true wildcard.
Back to my question above, I think it’s okay if only the strict single-asterisk
wildcard is covered by your change. Any other wildcard or fuzzy query would
continue to behave as before – although adding my suggested limit on term
expansion might still be worthwhile. And I might still argue that your fix
should be an option even if the default is as you have suggested.
But, all these comments should be placed on a Jira!
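To make the "strict single-asterisk" distinction concrete, here is a toy check on a single query clause. It is only a string-level sketch with a hypothetical function name; it does not reflect how the Solr query parser represents these queries and ignores escaping, boosts and nested clauses.

```python
def is_bare_star(clause: str) -> bool:
    """True only for a clause of the form field:* with a named field."""
    field, sep, pattern = clause.partition(":")
    # *:* is the match-all shorthand, not a wildcard on a field, so it is excluded.
    return sep == ":" and field not in ("", "*") and pattern == "*"
```

Under this check field1:* is recognised, while field1:**, field1:?* and field1:invest* are not, so they would continue to behave as ordinary wildcard queries.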