Thanks Ian.

What I am currently doing is duplicating the data into two different fields and using my own PerFieldAnalyzerWrapper, just as you pointed out.
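To make that concrete, the current setup behaves roughly like the plain-Java sketch below. It is not Lucene code: the class `TwoFieldSketch`, its method names, the simplified regexes, and the sample address `user@example.com` are all made up for illustration. The two methods stand in for the two analyzers that PerFieldAnalyzerWrapper would pick per field, and the same text is scanned once for each field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the two-field setup; no Lucene classes are used.
class TwoFieldSketch {
    // Simplified address pattern, for illustration only (not RFC 5322 complete).
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    // Runs of letters and digits; '@' and '.' act as separators.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]+");

    // Collects every match of the given pattern, in order of appearance.
    private static List<String> collect(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // "body" field: addresses are broken into their parts like any other text.
    public static List<String> analyzeBody(String text) {
        return collect(WORD, text);
    }

    // "body-addr" field: only whole addresses are kept, untokenized.
    public static List<String> analyzeBodyAddr(String text) {
        return collect(EMAIL, text);
    }

    // The same text is duplicated into both fields and scanned twice.
    public static Map<String, List<String>> index(String text) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("body", analyzeBody(text));
        fields.put("body-addr", analyzeBodyAddr(text));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(index("I have mailed user@example.com"));
    }
}
```

The point of the sketch is only that the input is read twice, once per field, which is the duplication I would like to avoid.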
Is there a good way to do this in a single pass, like how Bi-Grams or Common-Grams do?

--
Ravi

On Tue, Feb 17, 2015 at 3:08 PM, Ian Lea <ian....@gmail.com> wrote:

> Sounds like a job for
> org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
>
> --
> Ian.
>
> On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
> > We have a requirement in that E-mail addresses need to be added in a
> > tokenized form to one field while untokenized form is added to another
> > field
> >
> > Ex:
> >
> > "I have mailed a...@xyz.com" . It should tokenize as below
> >
> > body = {"I", "have", "mailed", "abc", "xyz", "com"};
> >
> > I also have a body-addr field. Tokenizer needs to extract e-mail
> > addresses from body field and add them as below
> >
> > body-addr = {"a...@xyz.com"}
> >
> > How to achieve this via tokenizer chain?
> >
> > --
> > Ravi