Hi Kumaran,

WordDelimiterGraphFilter with PRESERVE_ORIGINAL should do what you want: <http://lucene.apache.org/core/6_6_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterGraphFilter.html>.
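As a rough illustration of the token output to expect, here is a plain-Java sketch. It is not Lucene code and does not use WordDelimiterGraphFilter; it only mimics the "keep the original, also emit the parts" behaviour for the simple case of splitting on non-alphanumeric characters (it ignores case-change and numeric splitting and does not model position increments). The class name and the example.com addresses are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration only; this is NOT Lucene's WordDelimiterGraphFilter. */
class PreserveOriginalSketch {

    /** Returns the original token first, followed by its alphanumeric parts. */
    static List<String> tokens(String token) {
        List<String> out = new ArrayList<>();
        out.add(token); // the preserved original, kept as a single term
        for (String part : token.split("[^A-Za-z0-9]+")) {
            // Skip empty pieces (leading delimiter) and avoid duplicating
            // a token that contains no delimiters at all.
            if (!part.isEmpty() && !part.equals(token)) {
                out.add(part);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical addresses, chosen only for illustration.
        System.out.println(tokens("jane.doe@example.com"));
        // prints [jane.doe@example.com, jane, doe, example, com]
        System.out.println(tokens("doe@example.com"));
        // prints [doe@example.com, doe, example, com]
    }
}
```

In this sketch the two full-address terms are different strings, so an exact term match on the shorter address would not hit the longer one, while the part "doe" appears in both token lists.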
Here’s a test I added to TestWordDelimiterGraphFilter.java that passed for me:

-----
public void testEmail() throws Exception {
  final int flags = GENERATE_WORD_PARTS | GENERATE_NUMBER_PARTS | SPLIT_ON_CASE_CHANGE | SPLIT_ON_NUMERICS | PRESERVE_ORIGINAL;
  Analyzer a = new Analyzer() {
    @Override public TokenStreamComponents createComponents(String field) {
      Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
      return new TokenStreamComponents(tokenizer, new WordDelimiterGraphFilter(tokenizer, flags, null));
    }
  };

  assertAnalyzesTo(a, "will.sm...@yahoo.com",
      new String[] { "will.sm...@yahoo.com", "will", "smith", "yahoo", "com" },
      null, null, null,
      new int[] { 1, 0, 1, 1, 1 },
      null, false);
  a.close();
}
-----

--
Steve
www.lucidworks.com

> On Jun 15, 2017, at 8:53 AM, Kumaran Ramasubramanian <kums....@gmail.com> wrote:
>
> Hi All,
>
> i want to index email fields as both analyzed and not analyzed using custom analyzer.
>
> for example,
> sm...@yahoo.com
> will.sm...@yahoo.com
>
> that is, indexing sm...@yahoo.com as single token as well as analyzed tokens in same email field...
>
>
> My existing custom analyzer,
>
> public class CustomSearchAnalyzer extends StopwordAnalyzerBase
> {
>
>     public CustomSearchAnalyzer(Version matchVersion, Reader stopwords) throws Exception
>     {
>         super(matchVersion, loadStopwordSet(stopwords, matchVersion));
>     }
>
>     @Override
>     protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader)
>     {
>         final ClassicTokenizer src = new ClassicTokenizer(getVersion(), reader);
>         src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
>         TokenStream tok = new ClassicFilter(src);
>         tok = new LowerCaseFilter(getVersion(), tok);
>         tok = new StopFilter(getVersion(), tok, stopwords);
>         tok = new ASCIIFoldingFilter(tok); // to enable AccentInsensitive search
>
>         return new Analyzer.TokenStreamComponents(src, tok)
>         {
>             @Override
>             protected void setReader(final Reader reader) throws IOException
>             {
>                 src.setMaxTokenLength(ClassicAnalyzer.DEFAULT_MAX_TOKEN_LENGTH);
>                 super.setReader(reader);
>             }
>         };
>     }
> }
>
>
> And so i want to achieve like,
>
> 1.if i search using query "sm...@yahoo.com", records with will.sm...@yahoo.com should not come...
> 2.Also i should be able to search using query "smith" in that field
> 3.if possible, should be able to detect email values in all other fields and apply the same type of tokenization
>
> How to achieve point 1 and 2 using UAX29URLEmailTokenizer? how to add UAX29URLEmailTokenizer in my existing custom analyzer without using email analyzer ( perfieldanalyzer ) for email field.. And so i can apply this tokenizer for email terms of all fields..
>
>
>
> -
> Kumaran R