[ https://issues.apache.org/jira/browse/LUCENE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-1118.
----------------------------------------
    Resolution: Fixed

> core analyzers should not produce tokens > N (100?) characters in length
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-1118
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1118
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1118.patch
>
>
> Discussion that led to this:
> http://www.gossamer-threads.com/lists/lucene/java-dev/56103
>
> I believe nearly any time a token > 100 characters in length is
> produced, it's a bug in the analysis that the user is not aware of.
> These long tokens cause all sorts of problems downstream, so it's
> best to catch them early at the source.
>
> We can accomplish this by tacking a LengthFilter onto the chains
> for StandardAnalyzer, SimpleAnalyzer, WhitespaceAnalyzer, etc.
>
> Should we do this in 2.3?  I realize this is technically a break in
> backwards compatibility; however, I think it must be incredibly rare
> that this break would in fact break something real in the application.

-- 
This message is automatically generated by JIRA.
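For context, the effect of appending a length filter to an analyzer chain can be sketched in plain Java. This is a minimal illustration of the idea only, not the actual Lucene LengthFilter/TokenStream API; the class and method names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class LengthFilterSketch {
    // Drop any token longer than maxLen, mimicking what a LengthFilter
    // tacked onto the end of an analyzer chain would do.
    static List<String> filterByLength(List<String> tokens, int maxLen) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() <= maxLen) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A pathological 150-char "token" (e.g. from un-tokenized binary or
        // concatenated text) is silently discarded; normal tokens pass through.
        List<String> tokens = List.of("normal", "x".repeat(150), "token");
        System.out.println(filterByLength(tokens, 100));
    }
}
```

Wrapping each core analyzer's token stream with such a filter is what makes the cutoff apply uniformly without changing each tokenizer itself.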