Nice idea, John - one I hadn't considered. Once you have the checksum, do
you check the index first before storing the second document, or do you
filter on the query side?

Jerry Jalenak
Senior Programmer / Analyst, Web Publishing
LabOne, Inc.
10101 Renner Blvd.
Lenexa, KS  66219
(913) 577-1496

[EMAIL PROTECTED]


-----Original Message-----
From: John Haxby [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 01, 2005 9:06 AM
To: Lucene Users List
Subject: Re: Duplicate Hits


Jerry Jalenak wrote:

>Given Erik's response of 'don't put duplicate documents in the index', how
>can I accomplish this in the IndexWriter?
>
I was dealing with a similar requirement recently. I eventually
decided on storing the MD5 checksum of the document as a keyword. It
means reading it twice (once to calculate the checksum, once to index
it), but it seems to do the trick.
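
A rough sketch of the idea in Java, in case it helps. The class and method
names (ChecksumDedup, md5Hex, addIfNew) are made up for illustration, and the
HashSet is only a stand-in for the index: in a real setup the "have I seen
this checksum?" test would presumably be a term lookup against the checksum
keyword field before calling IndexWriter.addDocument. Whether to check at
index time (as here) or filter at query time is a design choice this sketch
does not settle.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
class ChecksumDedup {
    // Assumption: stands in for the index. A real implementation would ask
    // the index whether a document with this checksum keyword already exists.
    private final Set<String> seenChecksums = new HashSet<>();

    // First pass over the document: compute its MD5 checksum as lowercase hex.
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Returns true if the content is new and should be indexed (second pass),
    // false if a document with the same checksum was already recorded.
    boolean addIfNew(byte[] content) {
        return seenChecksums.add(md5Hex(content));
    }
}
```

Calling addIfNew twice with identical bytes returns true the first time and
false the second, so the caller can skip indexing the duplicate. Note that
this only catches byte-for-byte identical documents.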

jch

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



