Re: StandardTokenizer and e-mail

Erik Hatcher Fri, 21 May 2004 16:43:21 -0700

Further on this...

If you are using StandardTokenizer, the token for an e-mail address has the type value "<EMAIL>". You could key off that type in a custom TokenFilter implementation, splitting e-mail tokens however you like and passing everything else through unchanged. Take a look at StandardFilter's source code for an example of keying off the types emitted by StandardTokenizer.

        Erik
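
A rough sketch of the idea, combining the "<EMAIL>" type check with the setPositionIncrement(0) suggestion from Otis quoted below. To stay self-contained it does not use Lucene at all: Tok and EmailPartsFilter are hypothetical stand-ins for Token and a TokenFilter subclass, the sample address is only an illustration built from the parts named in the original question, and tagging the extracted parts as "<ALPHANUM>" is an assumption, not something the thread prescribes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Lucene's Token: term text, type, position increment.
final class Tok {
    final String text;
    final String type;
    final int positionIncrement;

    Tok(String text, String type, int positionIncrement) {
        this.text = text;
        this.type = type;
        this.positionIncrement = positionIncrement;
    }

    @Override
    public String toString() {
        return text + " " + type + " +" + positionIncrement;
    }
}

// Hypothetical stand-in for a TokenFilter subclass. next() returns one token
// per call and null once the upstream source is exhausted.
final class EmailPartsFilter {
    static final String EMAIL_TYPE = "<EMAIL>";

    private final Iterator<Tok> input;
    // Parts extracted from the last e-mail token, waiting to be returned.
    private final Deque<Tok> pending = new ArrayDeque<>();

    EmailPartsFilter(Iterator<Tok> input) {
        this.input = input;
    }

    Tok next() {
        if (!pending.isEmpty()) {
            return pending.removeFirst();
        }
        if (!input.hasNext()) {
            return null;
        }
        Tok tok = input.next();
        if (EMAIL_TYPE.equals(tok.type)) {
            // Queue each part with increment 0 so it shares the position
            // of the full address, which is returned first and unchanged.
            for (String part : tok.text.split("[@.]")) {
                if (!part.isEmpty()) {
                    pending.addLast(new Tok(part, "<ALPHANUM>", 0));
                }
            }
        }
        // Non-e-mail tokens pass straight through.
        return tok;
    }
}

final class EmailPartsDemo {
    // Drains the filter into a list so the result is easy to inspect.
    static List<Tok> run(List<Tok> tokens) {
        EmailPartsFilter filter = new EmailPartsFilter(tokens.iterator());
        List<Tok> out = new ArrayList<>();
        for (Tok t = filter.next(); t != null; t = filter.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = Arrays.asList(
                new Tok("mail", "<ALPHANUM>", 1),
                new Tok("xyz@company.com", EmailPartsFilter.EMAIL_TYPE, 1));
        for (Tok t : run(tokens)) {
            System.out.println(t);
        }
    }
}
```

With the real classes, the same logic would live in the overridden next() of a TokenFilter subclass. Because the parts sit at the same position as the full address, a search for the whole address or for any single part should match the same spot in the document.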


On May 21, 2004, at 11:50 AM, Otis Gospodnetic wrote:

Si, si.
Write your own TokenFilter sub-class that overrides next() and extracts
those other elements/tokens from an email address token and uses
Token's setPositionIncrement(0) to store the extracted tokens in the
same position as the original email.

Otis

--- Albert Vila <[EMAIL PROTECTED]> wrote:

Hi all,

I want to achieve the following: when indexing
'[EMAIL PROTECTED]', I want to index the '[EMAIL PROTECTED]' token,
then the 'xyz' token, the 'company' token and the 'com' token.
This way, you'll be able to find the document by searching for
'[EMAIL PROTECTED]', for 'xyz' only, or for 'company' only.

How can I achieve that? Do I need to write my own tokenizer?

Thanks
Albert

--
Albert Vila
R&D Project Director
http://www.imente.com
902 933 242
[iMente «La información con más beneficios»]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

