Re: Skipping duplicates in DataImportHandler based on uniqueKey

2010-05-03 Thread Marc Sturlese

You can use deduplication to do that. Create the signature based on the
unique field or any field you want.
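
To illustrate the idea, here is a minimal sketch of signature-based duplicate skipping in plain Python. It is not Solr's deduplication code and does not touch the DataImportHandler; the helper names `signature` and `skip_seen` and the choice of MD5 are assumptions made for the example. It only shows how a signature built from the unique field (or any other fields) can be used to detect a record that has already been seen.

```python
import hashlib


def signature(record, fields):
    """Hash the chosen fields of a record into a hex digest (hypothetical helper)."""
    h = hashlib.md5()
    for name in fields:
        # Separate names and values with a NUL byte so that adjacent
        # fields cannot run together and produce the same digest.
        h.update(name.encode("utf-8"))
        h.update(b"\x00")
        h.update(str(record.get(name, "")).encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def skip_seen(records, fields, seen=None):
    """Yield only records whose signature has not been seen before."""
    seen = set() if seen is None else seen
    for record in records:
        sig = signature(record, fields)
        if sig in seen:
            continue  # duplicate: skip it instead of re-adding it
        seen.add(sig)
        yield record


if __name__ == "__main__":
    docs = [
        {"id": "a", "title": "first"},
        {"id": "b", "title": "second"},
        {"id": "a", "title": "first again"},
    ]
    for doc in skip_seen(docs, ["id"]):
        print(doc["id"], doc["title"])
```

Passing `["id"]` keys the signature on the unique field alone; passing more field names makes two records count as duplicates only when all of those fields match.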
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Skipping-duplicates-in-DataImportHandler-based-on-uniqueKey-tp771559p772768.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Skipping duplicates in DataImportHandler based on uniqueKey

2010-05-03 Thread Andrew Clegg


Marc Sturlese wrote:
> You can use deduplication to do that. Create the signature based on the
> unique field or any field you want.

Cool, thanks, I hadn't thought of that.


Skipping duplicates in DataImportHandler based on uniqueKey

2010-05-02 Thread Andrew Clegg

Hi,

Is there a way to get the DataImportHandler to skip already-seen records
rather than reindexing them?

The UpdateHandler has an <add overwrite="false"> ... capability which (as I
understand it) means that a document whose uniqueKey matches one already in
the index will be skipped instead of overwritten.
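
For reference, the XML update message carrying that flag can be sketched as below. This is only an illustration of the message shape, built with Python's standard library; the helper name `build_add_message` and the sample field values are made up for the example, and nothing here posts to a server or involves the DIH.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc, overwrite=False):
    """Build an <add> update message for one document (hypothetical helper)."""
    add = ET.Element("add", {"overwrite": "true" if overwrite else "false"})
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # Each document field becomes <field name="...">value</field>.
        field = ET.SubElement(doc_el, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message({"id": "doc-1", "title": "Example"}))
```

The resulting string would be sent as the body of an update request; how the server treats a matching uniqueKey is as described above and is not verified by this sketch.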

Can the DIH be made to behave this way?

If not, would it be an easy patch? This is using the XPathEntityProcessor by
the way.

Thanks,

Andrew.
--
:: http://biotext.org.uk/ ::