Place a delimiter token between the email addresses, one that your analyzer
does not strip out (preferably something you know will never be searched on).
That way a phrase query cannot match across the boundary between two
addresses, so each email matches independently of the others.
So something like:
[email protected] DELIM123 [email protected] DELIM123 [email protected]
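
A minimal sketch of why the delimiter helps, written in plain Python rather
than against Lucene's API. It assumes an analyzer that splits on every
non-letter character and lowercases, and it uses illustrative addresses;
analyze, contains_phrase and build_field are hypothetical helpers that only
imitate tokenization and phrase matching.

```python
import re

DELIM = "DELIM123"  # delimiter from the example above; assumed never searched on


def analyze(text):
    """Imitate a letter-only analyzer: split on non-letters, lowercase."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def contains_phrase(tokens, phrase):
    """True if the phrase tokens occur at consecutive positions in tokens."""
    n = len(phrase)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def build_field(emails, delim=None):
    """Join the addresses (optionally with a delimiter) and analyze the result."""
    sep = f" {delim} " if delim else " "
    return analyze(sep.join(emails))


# Illustrative addresses (an assumption, not taken from a real index).
emails = ["foo@bar.com", "bar@foo.com", "com@bar.foo"]

plain = build_field(emails)             # no delimiter between addresses
delimited = build_field(emails, DELIM)  # DELIM123 is indexed as the token "delim"

# "bar com bar" spans the boundary between the first two addresses.
bogus = analyze("bar@com.bar")

print(contains_phrase(plain, bogus))      # True: false positive across a boundary
print(contains_phrase(delimited, bogus))  # False: the delimiter token breaks the run
print(all(contains_phrase(delimited, analyze(e)) for e in emails))  # True
```

Note that with a letter-only analyzer the digits in DELIM123 are dropped, so
the token that actually sits between the addresses is "delim". Any token that
survives analysis and is never queried would serve the same purpose.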
Matt
Phil Whelan wrote:
On Thu, Jul 30, 2009 at 11:22 AM, Matthew Hall
<[email protected]> wrote:
1. Sure, just have an analyzer that splits on all non-letter characters.
2. Phrase queries keep the order intact. (And yes, the positional information
for the terms is kept, which is what allows span queries to work.)
So searching on the phrase "foo bar com" will match [email protected] but not
[email protected]
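
As a rough illustration of the two points above, here is a small sketch in
plain Python (not Lucene code). The names split_non_letters and phrase_match
are hypothetical, and the addresses are illustrative stand-ins; the sketch
only imitates a letter-splitting analyzer and an in-order phrase match over
token positions.

```python
import re


def split_non_letters(text):
    """Split on every non-letter character and lowercase, keeping token order."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def phrase_match(doc_tokens, phrase_tokens):
    """True if phrase_tokens appear in the same order at adjacent positions."""
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


query = split_non_letters("foo bar com")

# Illustrative addresses (assumed for the example).
in_order = split_non_letters("foo@bar.com")   # ['foo', 'bar', 'com']
reordered = split_non_letters("bar@foo.com")  # ['bar', 'foo', 'com']

print(phrase_match(in_order, query))   # True: same terms, same order
print(phrase_match(reordered, query))  # False: same terms, different order
```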
Thanks, I really appreciate your help with this. That's great to know.
Can I take this a little further...
If I have "[email protected] [email protected] [email protected]" and analyze it, I get
"foo bar com bar foo com com bar foo". So perhaps I need a different
way of delimiting the emails, since a phrase query will also match
combinations that span two adjacent addresses, e.g. [email protected], which
is not one of the emails.
Has anyone done anything similar? I can imagine that one option would
be to filter the returned docs based on the original content of the
string I'm analyzing. Does Lucene do this for me?
Thanks,
Phil
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
--
Matthew Hall
Software Engineer
Mouse Genome Informatics
[email protected]
(207) 288-6012