Further on this...
If you are using StandardTokenizer, the token for an e-mail address has the type "<EMAIL>". A custom TokenFilter can key off that type, split just those tokens however you like, and pass everything else through unchanged. Take a look at StandardFilter's source code for an example of keying off the types emitted by StandardTokenizer.
Erik
On May 21, 2004, at 11:50 AM, Otis Gospodnetic wrote:
Yes, yes. Write your own TokenFilter subclass that overrides next(), extracts those other elements/tokens from an email address token, and calls Token's setPositionIncrement(0) on each extracted token so that it is stored at the same position as the original email.
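To make the idea concrete, here is a rough, self-contained sketch of the splitting logic. It deliberately does not use the Lucene classes: Tok is a hypothetical stand-in for Lucene's Token (text, type, position increment), expand() stands in for what an overridden next() would emit, and the address user@example.com and the "word" type given to the parts are made up for illustration. It assumes the tokenizer tags addresses with the "<EMAIL>" type, as StandardTokenizer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmailSplitSketch {

    /** Hypothetical stand-in for a Lucene Token; not the real class. */
    static final class Tok {
        final String text;
        final String type;
        final int positionIncrement;

        Tok(String text, String type, int positionIncrement) {
            this.text = text;
            this.type = type;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "/" + positionIncrement;
        }
    }

    /** Type that StandardTokenizer assigns to e-mail address tokens. */
    static final String EMAIL_TYPE = "<EMAIL>";

    /**
     * Passes every token through unchanged. After each e-mail token it also
     * emits the pieces of the address with a position increment of 0, so the
     * pieces share the position of the full address.
     */
    static List<Tok> expand(List<Tok> input) {
        List<Tok> output = new ArrayList<>();
        for (Tok t : input) {
            output.add(t); // original token always goes through
            if (!EMAIL_TYPE.equals(t.type)) {
                continue; // not an e-mail address, nothing to add
            }
            // Split on '@' and '.'; "word" is an arbitrary type label here.
            for (String part : t.text.split("[@.]")) {
                if (!part.isEmpty()) {
                    output.add(new Tok(part, "word", 0));
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Tok> input = Arrays.asList(
                new Tok("mail", "<ALPHANUM>", 1),
                new Tok("user@example.com", EMAIL_TYPE, 1));
        // Prints each token as text/positionIncrement.
        System.out.println(expand(input));
    }
}
```

In a real filter the same loop would live inside next(), with the extra parts buffered in a queue and handed out one per call before pulling the next token from the wrapped stream.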
Otis
--- Albert Vila <[EMAIL PROTECTED]> wrote:
Hi all,
I want to achieve the following: when indexing '[EMAIL PROTECTED]', I want to index the '[EMAIL PROTECTED]' token, then the 'xyz' token, the 'company' token and the 'com' token. This way, you'll be able to find the document by searching for '[EMAIL PROTECTED]', for 'xyz' only, or for 'company' only.
How can I achieve that? Do I need to write my own tokenizer?
Thanks,
Albert
-- Albert Vila Director de proyectos I+D http://www.imente.com 902 933 242 [iMente "La información con más beneficios"]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
