[ 
https://issues.apache.org/jira/browse/SOLR-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-1910.
----------------------------------

    Resolution: Won't Fix

2013 Old JIRA cleanup

> Add hl.df (highlight-specific default field) param, so highlighting can have 
> a separate analysis path from search
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-1910
>                 URL: https://issues.apache.org/jira/browse/SOLR-1910
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>    Affects Versions: 1.4
>            Reporter: Chris Harris
>         Attachments: SOLR-1910.patch
>
>
> Summary: Patch adds a hl.df parameter, to help with (some) situations where 
> the highlighter currently uses the "wrong" analyzer for highlighting.
> What: hl.df is like the normal df parameter, except that it takes effect only 
> during highlighting. (In fact the implementation is basically to temporarily 
> mess with the normal df parameter at the start of highlighting, and then  
> revert to the original value when highlighting is complete.) When hl.df is 
> specified, we make sure not to use the Query object that was parsed by 
> QueryComponent, but rather make our own. In the right circumstances anyway, 
> this means that a more appropriate analyzer gets used for highlighting.
> Motivation: Currently, in a normal query+highlighting request, the 
> highlighter re-uses the Query object parsed by the QueryComponent. This can 
> result in incorrect highlights if the field being highlighted is of a 
> different type than the field being queried. In my particular case:
>  * My queries don't explicitly specify field names; they always rely on the 
> default field
>  * My default field for search is "body"
>  * body is a unigram-plus-bigram field. So, e.g. input "audit trail" gets 
> turned into tokens "audit / audit trail / trail". (This is a performance 
> optimization.)
>  * If I try to highlight directly on "body", the highlights get screwed up. 
> (This is because the highlighter doesn't really support the kind of 
> "continuously overlapping" tokens generated by my analysis chain. In short, 
> the bigrams confuse the TokenGroup class.)
>  * To avoid these highlighting problems, I don't directly highlight "body", 
> but rather a "highlight" field, which has no bigram tokens. ("highlight" is 
> populated from "body" with a copyField directive.)
>  * Without hl.df, I have a new class of highlighting problems. In particular, 
> if the user enters a phrase search (e.g. "audit trail"), then that phrase 
> appears unhighlighted in the highlighter output. In short, the analyzer used
> to parse the query produces a Query object that contains bigrams, but the
> text that we're highlighting doesn't contain bigrams.
>  * With hl.df, the analyzers match up for highlighting; the Query object used
> for highlighting does _not_ contain bigrams, just like the "highlight" field.
> (I realize it may help to expand the description of this use case, but I'm a 
> bit hurried right now.)
> I wanted to throw this out there, partly in case people have any better 
> solutions. One variation on the hl.df option that might be worth considering is
> hl.UseHighlightedFieldAsDefaultField, which would create a new Query object 
> not just once at the start of highlighting, but separately for each 
> particular field that's getting highlighted.
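
The override-and-restore behaviour described under "What" can be sketched as follows. This is only an illustration of the pattern, not the code from SOLR-1910.patch: the class and method names are hypothetical, and a plain Map stands in for Solr's request parameters as an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class HlDfSketch {
    // Hypothetical helper: runs `highlight` with "df" temporarily replaced by
    // the value of "hl.df" (if present), then restores the original "df".
    static <T> T withHighlightDefaultField(Map<String, String> params,
                                           Function<Map<String, String>, T> highlight) {
        String hlDf = params.get("hl.df");
        if (hlDf == null) {
            // No hl.df given: highlight with the normal default field.
            return highlight.apply(params);
        }
        boolean hadDf = params.containsKey("df");
        String originalDf = params.get("df");
        params.put("df", hlDf);
        try {
            return highlight.apply(params);
        } finally {
            // Revert even if highlighting throws.
            if (hadDf) {
                params.put("df", originalDf);
            } else {
                params.remove("df");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("df", "body");
        params.put("hl.df", "highlight");
        // The lambda stands in for re-parsing the query during highlighting.
        String seen = withHighlightDefaultField(params, p -> p.get("df"));
        System.out.println("df during highlighting: " + seen);
        System.out.println("df afterwards: " + params.get("df"));
    }
}
```

The try/finally is a design choice of this sketch, so that the search-time default field is restored even when highlighting fails.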

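The analyzer mismatch from the "Motivation" list can be made concrete with a small stand-alone sketch. It does not use Lucene's analysis classes; the two methods below are hypothetical stand-ins that only mimic the token output described for the "body" and "highlight" fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {
    // Mimics a unigram-plus-bigram chain: each word, followed by the bigram
    // that starts at that word (assumed behaviour of the "body" field).
    static List<String> unigramsAndBigrams(String text) {
        String[] words = text.trim().toLowerCase().split("\\s+");
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            tokens.add(words[i]);
            if (i + 1 < words.length) {
                tokens.add(words[i] + " " + words[i + 1]);
            }
        }
        return tokens;
    }

    // Mimics a plain unigram chain (assumed behaviour of the "highlight" field).
    static List<String> unigrams(String text) {
        return Arrays.asList(text.trim().toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> queryTokens = unigramsAndBigrams("audit trail");
        List<String> fieldTokens = unigrams("audit trail");
        System.out.println(queryTokens); // [audit, audit trail, trail]
        System.out.println(fieldTokens); // [audit, trail]
        // The bigram from the query side never occurs in the highlighted text.
        System.out.println(fieldTokens.contains("audit trail")); // false
    }
}
```

A query parsed against "body" carries the "audit trail" bigram, which has no counterpart among the tokens of the "highlight" field; that is the mismatch hl.df is meant to avoid.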


--
This message was sent by Atlassian JIRA
(v6.1#6144)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
