[ https://issues.apache.org/jira/browse/LUCENE-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-4185:
--------------------------------
    Attachment: LUCENE-4185.patch

Thanks for reporting this: you are right, TokenizerChain has a bug where it wraps the already-wrapped reader.

Here's a patch.

> CharFilters being added twice in Solr
> -------------------------------------
>
>                 Key: LUCENE-4185
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4185
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 4.0-ALPHA
>            Reporter: Michael Froh
>         Attachments: LUCENE-4185.patch
>
>
> Debugging one of my test cases, I found that a TokenStream from an Analyzer
> constructed by Solr contains the configured chain of CharFilters twice.
> While I may be mistaken, the fix for LUCENE-4142 appears to make the fix for
> LUCENE-3721 unnecessary, and the combination of the fixes results in the
> repeated application of the CharFilters.
> I came across this with a test case involving an HTMLStripCharFilter, where
> the input string contains "&lt;h1>". After passing through one
> HTMLStripCharFilter, it becomes "<h1>", and then the HTML is removed by the
> second filter.
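
The symptom described in the report can be illustrated with a minimal sketch. It does not use Lucene: the class DoubleCharFilterDemo and the method stripHtml are hypothetical names, and stripHtml is only a toy stand-in for HTMLStripCharFilter that removes tags and then decodes the &lt; and &gt; entities. It shows why applying the same filter twice loses text that a single pass would keep.

```java
class DoubleCharFilterDemo {

    // Toy stand-in for an HTML-stripping char filter (assumption, not Lucene code):
    // first drop anything that looks like a tag, then decode two basic entities.
    static String stripHtml(String in) {
        String noTags = in.replaceAll("<[^>]*>", "");
        return noTags.replace("&lt;", "<").replace("&gt;", ">");
    }

    public static void main(String[] args) {
        String input = "&lt;h1>";

        // Intended behaviour: the configured filter runs once.
        String once = stripHtml(input);

        // Buggy behaviour: the already-filtered text is filtered again.
        String twice = stripHtml(once);

        System.out.println("once:  \"" + once + "\"");   // once:  "<h1>"
        System.out.println("twice: \"" + twice + "\"");  // twice: ""
    }
}
```

After one pass the decoded text "<h1>" survives; the second pass treats it as markup and removes it, which matches the behaviour seen in the test case.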