Re: indexing multiple email addresses in one field

Matthew Hall Thu, 30 Jul 2009 11:44:24 -0700

Place a delimiter between the email addresses that doesn't get removed by your analyzer (preferably something you know will never be searched on).


That way you can ensure that each email address matches independently of the others, because a phrase can no longer span two adjacent addresses.


So something like

f...@bar.com DELIM123 b...@foo.com DELIM123 c...@bar.foo
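
To illustrate why the delimiter helps, here is a rough sketch in plain Python rather than Lucene itself. It assumes the addresses are foo@bar.com, bar@foo.com and com@bar.foo (inferred from the token stream quoted further down in this thread), and that the analyzer lowercases and splits on non-letters. analyze() and phrase_match() are hypothetical stand-ins for an analyzer and a phrase query, not real Lucene calls.

```python
import re

def analyze(text):
    # Stand-in analyzer: lowercase, split on anything that is not an ASCII letter.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def phrase_match(doc_tokens, phrase_tokens):
    # Stand-in phrase query: the phrase tokens must appear adjacent and in order.
    n = len(phrase_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )

# Assumed addresses, chosen to reproduce the token stream quoted in the thread.
plain = analyze("foo@bar.com bar@foo.com com@bar.foo")
delimited = analyze("foo@bar.com DELIM123 bar@foo.com DELIM123 com@bar.foo")

# foo@com.com is NOT one of the indexed addresses.
bogus = analyze("foo@com.com")
real = analyze("bar@foo.com")

# Without a delimiter the bogus phrase matches across the address boundary.
false_hit_plain = phrase_match(plain, bogus)
# With the delimiter token in between, it no longer matches.
false_hit_delimited = phrase_match(delimited, bogus)
# A genuine address still matches in the delimited field.
real_hit_delimited = phrase_match(delimited, real)
```

Note that this stand-in analyzer strips the digits, so DELIM123 survives as the single token "delim". That is enough to break up cross-address phrases, but check what your own analyzer actually emits for the delimiter you pick.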

Matt


Phil Whelan wrote:

On Thu, Jul 30, 2009 at 11:22 AM, Matthew Hall
<mh...@informatics.jax.org> wrote:

1. Sure, just have an analyzer that splits on all non-letter characters.
2. Phrase queries keep the order intact. (And yes, the positional information
for the terms is kept, which is what allows span queries to work.)

So searching on the phrase "foo bar com" will match f...@bar.com but not
b...@foo.com.
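
As a rough illustration of both points (plain Python, not Lucene code): the sketch below assumes the two addresses are foo@bar.com and bar@foo.com, and uses hypothetical helpers split_non_letters() and matches_phrase() in place of a real analyzer and phrase query. Only ASCII letters are treated as letters here, which is a simplification.

```python
import re

def split_non_letters(text):
    # Hypothetical analyzer: lowercase and split on every non-letter character.
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]

def matches_phrase(field_tokens, query_tokens):
    # Hypothetical phrase query: tokens must be consecutive and in the same order.
    width = len(query_tokens)
    for start in range(len(field_tokens) - width + 1):
        if field_tokens[start:start + width] == query_tokens:
            return True
    return False

query = split_non_letters("foo bar com")

# Assumed example addresses; each is indexed as its own field value here.
first = split_non_letters("foo@bar.com")   # tokens: foo, bar, com
second = split_non_letters("bar@foo.com")  # tokens: bar, foo, com

hit_first = matches_phrase(first, query)    # same terms, same order
hit_second = matches_phrase(second, query)  # same terms, different order
```

Both addresses contain exactly the same three terms, so only the positional check tells them apart.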


Thanks, I really appreciate your help with this. That's great to know.
Can I take this a little further...

If I have "f...@bar.com b...@foo.com c...@bar.foo" and analyze it, I get
"foo bar com bar foo com com bar foo", so perhaps I need a different
way of delimiting the emails, as it will also match combinations that
span two addresses, e.g. f...@com.com, which is not one of the emails.

Has anyone done anything similar? I can imagine that one option would
be to filter the returned docs based on the original content of the
string I'm analyzing. Does Lucene do this for me?

Thanks,
Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



--
Matthew Hall
Software Engineer
Mouse Genome Informatics
mh...@informatics.jax.org
(207) 288-6012


