[ https://issues.apache.org/jira/browse/LUCENE-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Rowe updated LUCENE-4185:
--------------------------------

    Description:
Debugging one of my test cases, I found that a TokenStream from an Analyzer constructed by Solr contains the configured chain of CharFilters twice.

While I may be mistaken, the fix for LUCENE-4142 appears to make the fix for LUCENE-3721 unnecessary, and the combination of the fixes results in the repeated application of the CharFilters.

I came across this with a test case involving an HTMLStripCharFilter, where the input string contains "&lt;h1>". After passing through one HTMLStripCharFilter, it becomes "<h1>", and then the HTML is removed by the second filter.

  was:
Debugging one of my test cases, I found that a TokenStream from an Analyzer constructed by Solr contains the configured chain of CharFilters twice.

While I may be mistaken, the fix for LUCENE-4142 appears to make the fix for LUCENE-3721 unnecessary, and the combination of the fixes results in the repeated application of the CharFilters.

I came across this with a test case involving an HTMLStripCharFilter, where the input string contains "<h1>". After passing through one HTMLStripCharFilter, it becomes "<h1>", and then the HTML is removed by the second filter.

(edited description to escape the ampersand in "&lt;h1>" so that JIRA readers understand the problem)

> CharFilters being added twice in Solr
> -------------------------------------
>
>                 Key: LUCENE-4185
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4185
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 4.0-ALPHA
>            Reporter: Michael Froh
>
> Debugging one of my test cases, I found that a TokenStream from an Analyzer
> constructed by Solr contains the configured chain of CharFilters twice.
> While I may be mistaken, the fix for LUCENE-4142 appears to make the fix for
> LUCENE-3721 unnecessary, and the combination of the fixes results in the
> repeated application of the CharFilters.
> I came across this with a test case involving an HTMLStripCharFilter, where
> the input string contains "&lt;h1>". After passing through one
> HTMLStripCharFilter, it becomes "<h1>", and then the HTML is removed by the
> second filter.
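
To make the reported symptom concrete, here is a minimal, self-contained Java sketch of why applying the same stripping step twice changes the result. It does not use Lucene at all: `ToyStripDemo` and its `strip` method are hypothetical names invented for this illustration, and `strip` only roughly approximates what an HTML-stripping char filter does (remove tags, then decode three entities). It is an assumption-laden toy, not the behavior of HTMLStripCharFilter itself.

```java
// Toy model only: NOT Lucene's HTMLStripCharFilter. It approximates
// "remove tags, then decode entities" to show why two passes differ from one.
class ToyStripDemo {

    // Hypothetical stand-in for a single char-filter pass.
    static String strip(String input) {
        // 1. Drop anything that looks like a tag: '<' up to the next '>'.
        String noTags = input.replaceAll("<[^>]*>", "");
        // 2. Decode a few entities. "&amp;" is decoded last so that
        //    "&amp;lt;" becomes "&lt;" rather than "<" within one pass.
        return noTags.replace("&lt;", "<")
                     .replace("&gt;", ">")
                     .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String input = "&lt;h1>";
        String once = strip(input);   // entity decoded, nothing removed
        String twice = strip(once);   // decoded text now looks like a tag
        System.out.println("once:  \"" + once + "\"");   // once:  "<h1>"
        System.out.println("twice: \"" + twice + "\"");  // twice: ""
    }
}
```

In this toy, a single pass turns the escaped input into the literal text "<h1>", which is the intended outcome. A second pass treats that literal text as markup and deletes it, which matches the symptom described in the issue when the char filter chain is applied twice.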