After careful consideration, I have decided that the better approach is to create a separate "global" index in which all messages are stored. This should not only resolve my duplication issue but also scale better if/when there are several hundred or several thousand distinct indexes.
Thanks,
- JP
----- Original Message -----
From: "PA" <[EMAIL PROTECTED]>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Monday, January 24, 2005 10:43 PM
Subject: Re: Duplicate hits using ParallelMultiSearcher
On Jan 24, 2005, at 09:14, Jason Polites wrote:
> I am aware of the Filter object; however, the unique identifier of my document is a field within the Lucene document itself (messageid), and I am reluctant to access this field using the public API for every Hit as I fear it will have drastic performance implications.
Well... I don't see any way around that as you basically want to uniquely identify your messages based on their Message-ID.
That said, you don't need to do it during the search itself. You could simply perform your search as you do now and then build a set of unique messages, keyed on messageid, while preserving the sort order of the Lucene Hits for "relevance" purposes.
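As a rough sketch of that post-search pass (plain Java, not Lucene code): the UniqueMessages class and its Hit stand-in are hypothetical names made up for illustration. With the real API you would presumably fill messageId from something like hits.doc(i).get("messageid"), which does load the stored document for each hit, so the cost you are worried about is still there. It just moves out of the search itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
final class UniqueMessages {

    /** Minimal stand-in for one search hit: the stored messageid and its score. */
    static final class Hit {
        final String messageId;
        final float score;

        Hit(String messageId, float score) {
            this.messageId = messageId;
            this.score = score;
        }
    }

    /**
     * Keeps the first hit seen for each messageid. The input is assumed to be
     * in relevance order already, so the first occurrence is the best-ranked one.
     */
    static List<Hit> firstPerMessageId(List<Hit> rankedHits) {
        // LinkedHashMap iterates in insertion order, which preserves the ranking.
        Map<String, Hit> unique = new LinkedHashMap<String, Hit>();
        for (Hit hit : rankedHits) {
            if (!unique.containsKey(hit.messageId)) {
                unique.put(hit.messageId, hit);
            }
        }
        return new ArrayList<Hit>(unique.values());
    }
}
```

If you only display a page of results at a time, you could stop the loop once enough unique messages have been collected, so that only the top of the hit list is ever touched.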
HTH.
Cheers
-- PA http://alt.textdrive.com/
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]