[ https://issues.apache.org/jira/browse/SOLR-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628775#comment-13628775 ]
Holger Floerke commented on SOLR-4686:
--------------------------------------

"""
Have you seen the XmlCharFilter on SOLR-2597 ?
"""

No, but that is a two-year-old bug report that never reached a release...

You are right about phrase highlighting. I didn't think about that; it is a case where HTMLStripCharFilter (or XmlCharFilter) does not stand a chance.

Given the high volume of unresolved bugs for Solr, I would suggest closing this bug as "won't change". I will reopen it if I have a good idea on this issue.

> HTMLStripCharFilter and Highlighter generates invalid HTML
> ----------------------------------------------------------
>
>                 Key: SOLR-4686
>                 URL: https://issues.apache.org/jira/browse/SOLR-4686
>             Project: Solr
>          Issue Type: Bug
>          Components: highlighter
>    Affects Versions: 4.1
>            Reporter: Holger Floerke
>              Labels: HTML, highlighter
>
> Using the HTMLStripCharFilter may lead to an invalid HTML highlight.
> The HTMLStripCharFilter has a special treatment of inline elements (e.g. "a",
> "b", ...). For these elements the CharFilter ignores the tag and does not
> insert any split character.
> If you index
> """
> <a>xxx</a>
> """
> you get the word "xxx" starting at position 3 and ending at position 10(!)
> If you highlight a search on "xxx", you will get
> """
> <a><em>xxx</a></em>
> """
> which is invalid HTML.
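
To make the offset arithmetic in the description concrete, here is a toy sketch in plain Python. It is not Lucene or Solr code: the function name `highlight` and the idea of splicing tags in by character offset are assumptions made purely for illustration. It only shows why an end offset of 10 puts the closing highlight tag after `</a>`, while an end offset of 6 (just past the visible "xxx") would not.

```python
def highlight(text, start, end, pre="<em>", post="</em>"):
    """Wrap text[start:end] in highlight tags using raw character offsets.

    Hypothetical helper for illustration only; a real highlighter is more involved.
    """
    return text[:start] + pre + text[start:end] + post + text[end:]


stored = "<a>xxx</a>"

# Offsets as reported in the issue: the token "xxx" spans 3..10,
# so the end offset falls after the closing </a> tag.
reported = highlight(stored, 3, 10)

# Offsets covering only the visible characters "xxx" (3..6).
tight = highlight(stored, 3, 6)

print(reported)  # <a><em>xxx</a></em>   (tags overlap: invalid HTML)
print(tight)     # <a><em>xxx</em></a>   (properly nested)
```

Tightening the offsets helps only in this single-term case. As noted in the comment above, a phrase match that spans several inline elements cannot be fixed this way.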