[ https://issues.apache.org/jira/browse/LUCENE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-1373.
----------------------------------------

    Resolution: Duplicate

Dup of LUCENE-2002.

> Most of the contributed Analyzers suffer from invalid recognition of acronyms.
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-1373
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1373
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/analyzers
>    Affects Versions: 2.3.2
>            Reporter: Mark Lassau
>            Priority: Minor
>         Attachments: LUCENE-1373.patch
>
>
> LUCENE-1068 describes a bug in StandardTokenizer whereby a string like
> "www.apache.org." would be incorrectly tokenized as an acronym (note the dot
> at the end).
> Unfortunately, keeping the "backward compatibility" of a bug turns out to
> harm us.
> StandardTokenizer has a couple of ways to indicate "fix this bug", but
> unfortunately the default behaviour is still to be buggy.
> Most of the non-English analyzers provided in lucene-analyzers utilize the
> StandardTokenizer, and in v2.3.2 not one of these provides a way to get the
> non-buggy behaviour :(
> I refer to:
> * BrazilianAnalyzer
> * CzechAnalyzer
> * DutchAnalyzer
> * FrenchAnalyzer
> * GermanAnalyzer
> * GreekAnalyzer
> * ThaiAnalyzer
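
For readers unfamiliar with the bug quoted above, here is a rough sketch of the kind of misclassification it describes. Everything in it is an assumption made for illustration: the class name AcronymSketch and both regular expressions are hypothetical and are not StandardTokenizer's grammar or any Lucene API. The loose rule treats any run of dot-terminated segments as an acronym, so a host name with a trailing dot slips through. The strict rule only accepts single letters followed by dots.

```java
import java.util.regex.Pattern;

// Toy illustration only: these regexes are assumptions made for this sketch,
// not the grammar that StandardTokenizer actually uses.
final class AcronymSketch {
    // Over-broad rule: two or more alphanumeric segments, each followed by a dot.
    private static final Pattern LOOSE_ACRONYM =
            Pattern.compile("[A-Za-z0-9]+\\.(?:[A-Za-z0-9]+\\.)+");

    // Stricter rule: two or more single letters, each followed by a dot.
    private static final Pattern STRICT_ACRONYM =
            Pattern.compile("[A-Za-z]\\.(?:[A-Za-z]\\.)+");

    static String classifyLoose(String text) {
        return LOOSE_ACRONYM.matcher(text).matches() ? "ACRONYM" : "OTHER";
    }

    static String classifyStrict(String text) {
        return STRICT_ACRONYM.matcher(text).matches() ? "ACRONYM" : "OTHER";
    }

    public static void main(String[] args) {
        // Print how each rule classifies a host name with a trailing dot
        // and a genuine acronym.
        for (String s : new String[] {"www.apache.org.", "U.S.A."}) {
            System.out.println(s + " loose=" + classifyLoose(s)
                    + " strict=" + classifyStrict(s));
        }
    }
}
```

Under these toy rules, the loose pattern labels "www.apache.org." as an acronym while the strict one does not, and both accept "U.S.A.". Whether the real fix in StandardTokenizer works the same way is not something this sketch claims.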