If you want email addresses kept intact as single tokens, UAX29URLEmailAnalyzer is another option.
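It follows the same UAX#29 word-break rules but additionally recognises URLs and email addresses as tokens. A rough way to picture the difference, as a toy sketch only: this is plain java.util.regex, not Lucene code, and the class name EmailTokenSketch and both patterns are made up for illustration. They are far cruder than the real grammar and only mimic the tokens reported in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: these regexes are NOT Lucene's grammar, just a rough
// stand-in for the behaviour described in this thread.
class EmailTokenSketch {

    // A "word": runs of letters/digits, optionally joined by '.' or '\''.
    // '@' is not a joiner here, so it always ends a token.
    static final String WORD = "[\\p{L}\\p{N}]+(?:[.'][\\p{L}\\p{N}]+)*";

    // A deliberately simple email shape: local@host.tld (an assumption, not a
    // full address grammar).
    static final String EMAIL =
            "[\\p{L}\\p{N}._%+-]+@[\\p{L}\\p{N}-]+(?:\\.[\\p{L}\\p{N}-]+)+";

    // Word boundaries only: splits at '@'.
    static final Pattern WORDS_ONLY = Pattern.compile(WORD);

    // Same word rule, but an email address is tried first and kept whole.
    static final Pattern EMAIL_AWARE = Pattern.compile(EMAIL + "|" + WORD);

    // Collects every match of the pattern as a lower-cased token.
    static List<String> tokenize(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Mail user@domain.com today";
        // prints [mail, user, domain.com, today]
        System.out.println(tokenize(WORDS_ONLY, text));
        // prints [mail, user@domain.com, today]
        System.out.println(tokenize(EMAIL_AWARE, text));
    }
}
```

The only design point is the ordering in EMAIL_AWARE: the email alternative comes first, so an address is consumed as one token before the plain word rule gets a chance to split it at the '@'.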
-- Ian.

On Wed, Oct 24, 2012 at 3:56 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Yes, by design. StandardAnalyzer implements "simple word boundaries" (the
> technical term is "Unicode text segmentation"), period. As the javadoc says,
> "As of Lucene version 3.1, this class implements the Word Break rules from
> the Unicode Text Segmentation algorithm, as specified in Unicode Standard
> Annex #29." That is a "standard".
>
> See:
> http://lucene.apache.org/core/4_0_0-ALPHA/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html
> http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/standard/ClassicTokenizer.html
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: kiwi clive
> Sent: Wednesday, October 24, 2012 6:42 AM
> To: java-user@lucene.apache.org
> Subject: StandardAnalyzer functionality change
>
> Hi all,
>
> Sorry if I'm asking an age-old question, but we have migrated to Lucene 3.6.0
> and I see StandardAnalyzer has changed its behaviour, particularly when
> tokenizing email addresses. From reading the forums, I understand
> StandardAnalyzer was renamed to ClassicAnalyzer - is this the case?
>
> If I pass the string 'u...@domain.com' through these analyzers, I get the
> following tokens:
>
> Using StandardAnalyzer(Version.LUCENE_23): --> u...@domain.com (one token)
> Using StandardAnalyzer(Version.LUCENE_36): --> user domain.com (two tokens)
> Using ClassicAnalyzer(Version.LUCENE_36):  --> u...@domain.com (one token)
>
> StandardAnalyzer is normally a good compromise as a default analyzer, but the
> failure to keep an email address intact makes it less fit for purpose than
> it used to be. Is this a bug or is it by design? If by design, what is the
> reason for the change, and is ClassicAnalyzer now the de facto analyzer to use?
>
> Thanks,
> Clive
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org