[ 
https://issues.apache.org/jira/browse/SOLR-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628775#comment-13628775
 ] 

Holger Floerke commented on SOLR-4686:
--------------------------------------

"""
Have you seen the XmlCharFilter on SOLR-2597 ?
"""
No, but that is a two-year-old bug report that never made it into a release...

You are right about phrase highlighting. I didn't think about that; this is a 
point where HTMLStripCharFilter (or XmlCharFilter) does not stand a chance.

Given the high volume of unresolved bugs for Solr, I would suggest 
closing this bug as "won't change". I will reopen it if I have a good idea for 
this issue.
                
> HTMLStripCharFilter and Highlighter generates invalid HTML
> ----------------------------------------------------------
>
>                 Key: SOLR-4686
>                 URL: https://issues.apache.org/jira/browse/SOLR-4686
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 4.1
>            Reporter: Holger Floerke
>              Labels: HTML, highlighter
>
> Using the HTMLStripCharFilter may yield an invalid HTML highlight.
> The HTMLStripCharFilter has a special treatment of inline elements (e.g. "a", 
> "b", ...). For these elements the CharFilter ignores the tag and does not 
> insert any split character.
> If you index
> """
> <a>xxx</a>
> """
> you get the word "xxx" starting at position 3 and ending at position 10(!).
> If you highlight a search on "xxx", you will get
> """
> <a><em>xxx</a></em>
> """
> which is invalid HTML.
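
To make the effect of the offsets concrete, here is a minimal sketch in Python. It is not Solr's highlighter code; the `highlight` helper and its parameters are hypothetical and only assume that a highlighter wraps the original text between the reported start and end offsets.

```python
def highlight(text, start, end, pre="<em>", post="</em>"):
    """Wrap text[start:end] in pre/post markers (hypothetical helper,
    standing in for any highlighter that works on raw character offsets)."""
    return text[:start] + pre + text[start:end] + post + text[end:]


source = "<a>xxx</a>"

# Offsets reported in the issue: the token "xxx" starts at 3 and ends at 10,
# so the end offset also covers the closing "</a>" tag.
reported = highlight(source, 3, 10)
print(reported)   # <a><em>xxx</a></em>  (tags overlap, invalid HTML)

# For comparison: an end offset of 6 covers only the three characters "xxx".
expected = highlight(source, 3, 6)
print(expected)   # <a><em>xxx</em></a>  (properly nested)
```

The comparison only shows that the overlap comes from the end offset including the closing tag; it does not claim how a fix inside the CharFilter would look.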

