Re: Skipping duplicates in DataImportHandler based on uniqueKey
You can use deduplication to do that. Create the signature based on the unique field or any field you want.

--
View this message in context: http://lucene.472066.n3.nabble.com/Skipping-duplicates-in-DataImportHandler-based-on-uniqueKey-tp771559p772768.html
Sent from the Solr - User mailing list archive at Nabble.com.
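The idea behind signature-based deduplication can be sketched outside Solr: hash the chosen fields into a signature and skip any document whose signature has already been seen. The following is a plain-Python illustration only, not Solr's implementation; the function names `signature` and `skip_duplicates` and the sample documents are hypothetical. Whether Solr skips or replaces a matching document depends on how deduplication is configured, so check that against your Solr version.

```python
import hashlib


def signature(doc, fields):
    """Hash the chosen fields of a document into a hex digest (illustrative only)."""
    h = hashlib.md5()
    for name in fields:
        h.update(str(doc.get(name, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()


def skip_duplicates(docs, fields):
    """Keep the first document for each signature and drop later matches."""
    seen = set()
    kept = []
    for doc in docs:
        sig = signature(doc, fields)
        if sig in seen:
            continue  # already-seen record: skip rather than overwrite
        seen.add(sig)
        kept.append(doc)
    return kept


# Hypothetical sample data: the third record repeats the key of the first.
docs = [
    {"id": "1", "title": "first"},
    {"id": "2", "title": "second"},
    {"id": "1", "title": "first, re-imported"},
]
unique = skip_duplicates(docs, fields=["id"])
```

Basing the signature on `id` alone mirrors "the unique field"; passing several field names instead treats records as duplicates only when all of those fields match.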
Re: Skipping duplicates in DataImportHandler based on uniqueKey
Marc Sturlese wrote:
> You can use deduplication to do that. Create the signature based on the unique field or any field you want.

Cool, thanks, I hadn't thought of that.
Skipping duplicates in DataImportHandler based on uniqueKey
Hi,

Is there a way to get the DataImportHandler to skip already-seen records rather than reindexing them?

The UpdateHandler has an <add overwrite="false"> ... capability which (as I understand it) means that a document whose uniqueKey matches one already in the index will be skipped instead of overwritten.

Can the DIH be made to behave this way? If not, would it be an easy patch?

This is using the XPathEntityProcessor, by the way.

Thanks,
Andrew.

--
:: http://biotext.org.uk/ ::
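For reference, the XML update message the question refers to can be built as in the sketch below. It only constructs the message text. The helper name `build_add_message` and the sample field values are hypothetical, and how a given Solr version treats a uniqueKey collision when `overwrite` is false should be verified rather than assumed from this example.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc, overwrite=False):
    """Build a Solr-style XML <add> message for one document (illustrative helper)."""
    add = ET.Element("add", overwrite="true" if overwrite else "false")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document; "id" stands in for the schema's uniqueKey field.
message = build_add_message({"id": "doc-1", "title": "Example"})
```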