Hello all,
Is there a filter I can use to remove email addresses from a TokenStream?
So far I'm using the code below to remove numbers and URLs, and I would like
to remove email addresses too:
// Tokenize the text; this tokenizer assigns a type to each token
Tokenizer tokenizer = new UAX29URLEmailTokenizer(Version.LUCENE_43,
        new StringReader(text));
// Token types to drop from the stream
Set<String> stopTypes = new HashSet<String>();
stopTypes.add("<URL>");
stopTypes.add("<NUM>");
// true = enablePositionIncrements
TokenStream stream = new TypeTokenFilter(true, tokenizer, stopTypes);
stream = new StandardFilter(Version.LUCENE_43, stream);
stream = new LowerCaseFilter(Version.LUCENE_43, stream);
Thanks a million!
Best