[ https://issues.apache.org/jira/browse/LUCENE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628563#action_12628563 ]
marklassau edited comment on LUCENE-1373 at 9/4/08 11:13 PM:
--------------------------------------------------------------

Just discovered LUCENE-1151, which attempts to make StandardAnalyzer NOT be buggy by default.
I think if the changes made to StandardAnalyzer here were moved to StandardTokenizer instead, then we would fix this issue.

> Most of the contributed Analyzers suffer from invalid recognition of acronyms.
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-1373
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1373
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/analyzers
>    Affects Versions: 2.3.2
>            Reporter: Mark Lassau
>            Priority: Minor
>
> LUCENE-1068 describes a bug in StandardTokenizer whereby a string like
> "www.apache.org." would be incorrectly tokenized as an acronym (note the dot
> at the end).
> Unfortunately, keeping the "backward compatibility" of a bug turns out to
> harm us.
> StandardTokenizer has a couple of ways to indicate "fix this bug", but
> unfortunately the default behaviour is still to be buggy.
> Most of the non-English analyzers provided in lucene-analyzers utilize the
> StandardTokenizer, and in v2.3.2 not one of these provides a way to get the
> non-buggy behaviour :(
> I refer to:
> * BrazilianAnalyzer
> * CzechAnalyzer
> * DutchAnalyzer
> * FrenchAnalyzer
> * GermanAnalyzer
> * GreekAnalyzer
> * ThaiAnalyzer
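
To illustrate the misclassification described in the issue, here is a minimal, self-contained sketch. It does not use Lucene and does not reproduce StandardTokenizer's actual grammar; the class name `AcronymSketch`, the three regular expressions, and the `strict` flag are all assumptions made for illustration. It only shows how a lenient "dot-separated runs ending in a dot" rule would accept "www.apache.org." as an acronym, while a stricter "single letters separated by dots" rule would not.

```java
import java.util.regex.Pattern;

public class AcronymSketch {
    // Hypothetical, simplified stand-ins for token rules (NOT Lucene's real grammar).

    // Lenient rule: alphanumeric runs, each followed by a dot, at least two of them.
    static final Pattern LENIENT_ACRONYM =
            Pattern.compile("[A-Za-z0-9]+\\.(?:[A-Za-z0-9]+\\.)+");

    // Strict rule: single letters, each followed by a dot, e.g. "U.S.A."
    static final Pattern STRICT_ACRONYM =
            Pattern.compile("[A-Za-z]\\.(?:[A-Za-z]\\.)+");

    // Host-like rule: alphanumeric runs joined by dots, with no trailing dot.
    static final Pattern HOST =
            Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+");

    // Classify a single whitespace-delimited chunk of text.
    static String classify(String text, boolean strict) {
        Pattern acronym = strict ? STRICT_ACRONYM : LENIENT_ACRONYM;
        if (acronym.matcher(text).matches()) {
            return "ACRONYM";
        }
        // Treat a trailing dot as sentence punctuation before testing for a host.
        String trimmed = text.endsWith(".")
                ? text.substring(0, text.length() - 1)
                : text;
        if (HOST.matcher(trimmed).matches()) {
            return "HOST";
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(classify("www.apache.org.", false)); // ACRONYM (the buggy outcome)
        System.out.println(classify("www.apache.org.", true));  // HOST
        System.out.println(classify("U.S.A.", true));           // ACRONYM
    }
}
```

Under these assumptions, the only difference between the two outcomes for "www.apache.org." is which acronym rule is active, which is why the location of the switch (analyzer versus tokenizer) matters for the analyzers listed above.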