Re: URL/Email tokenizer

Ravikumar Govindarajan Tue, 17 Feb 2015 03:45:07 -0800

Thanks Ian

What I am currently doing is duplicating the data into two different fields
and using my own PerFieldAnalyzerWrapper, just as you pointed out.

Is there a good way to do this in a single pass, the way Bi-Grams or
Common-Grams do?
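
To make the single-pass idea concrete, here is a minimal sketch in plain
Java. It does not use any Lucene API and does not show how to wire this
into an analyzer chain. It only shows one scan over the text that fills both
token lists at once. The class name, the Result holder, the simplified e-mail
pattern and the sample address user@example.org are all assumptions made for
illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassEmailSplit {

    // Simplified pattern (an assumption, not RFC 5322): either a whole
    // e-mail address or a plain alphanumeric word. The address alternative
    // comes first so that it wins when both could match at a position.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<addr>[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)+)"
        + "|(?<word>[A-Za-z0-9]+)");

    // Hypothetical holder for the two per-field token lists.
    static final class Result {
        final List<String> body = new ArrayList<>();
        final List<String> bodyAddr = new ArrayList<>();
    }

    // Scans the text once. Words go to body; each address goes untokenized
    // to bodyAddr and, split on non-alphanumerics, to body as well.
    static Result split(String text) {
        Result r = new Result();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String addr = m.group("addr");
            if (addr != null) {
                r.bodyAddr.add(addr);
                for (String part : addr.split("[^A-Za-z0-9]+")) {
                    if (!part.isEmpty()) {
                        r.body.add(part);
                    }
                }
            } else {
                r.body.add(m.group("word"));
            }
        }
        return r;
    }

    public static void main(String[] args) {
        Result r = split("I have mailed user@example.org");
        System.out.println("body = " + r.body);
        System.out.println("body-addr = " + r.bodyAddr);
    }
}
```

For the sample input, body receives I, have, mailed, user, example, org and
body-addr receives user@example.org, which mirrors the layout in the original
question quoted below.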

--
Ravi

On Tue, Feb 17, 2015 at 3:08 PM, Ian Lea <[email protected]> wrote:

> Sounds like a job for
> org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper.
>
>
> --
> Ian.
>
>
> On Tue, Feb 17, 2015 at 8:51 AM, Ravikumar Govindarajan
> <[email protected]> wrote:
> > We have a requirement in that E-mail addresses need to be added in a
> > tokenized form to one field while untokenized form is added to another
> > field
> >
> > Ex:
> >
> > "I have mailed [email protected]". It should tokenize as below
> >
> > body = {"I", "have", "mailed", "abc", "xyz", "com"};
> >
> > I also have a body-addr field. Tokenizer needs to extract e-mail
> > addresses from body field and add them as below
> >
> > body-addr = {"[email protected]"}
> >
> > How to achieve this via tokenizer chain?
> >
> > --
> > Ravi
>
