Re: Dedupe and overwriteDupes setting
Hi,

My solrconfig dedupe setting is as follows:

  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <bool name="overwriteDupes">false</bool>
      <str name="signatureField">dupesign</str>
      <str name="fields">title,url</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Even though overwriteDupes is set to false, search query results show that the contents are overwritten. Is this because there are duplicate documents in Solr and the query results display only the latest entry among the duplicates? I actually need the date field not to be overwritten. Please help.

Thanks,
Shameema
Dedupe and overwriteDupes setting
List,

I've stumbled upon an issue with the deduplication mechanism. It either deletes all documents or does nothing at all, depending on whether overwriteDupes is set to true or false, respectively. I use a slightly modified configuration:

  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">sig</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">content</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

  <field name="sig" type="string" stored="true" indexed="false" multiValued="true" />

After importing new documents I can clearly see the correct signatures (only with overwriteDupes=false). Most documents have a distinct signature, and some share the same one because the content field's value is identical for those documents.

Anyway, why does it delete all my documents? Any clues? The wiki is not very helpful on this subject.

Cheers.

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Dedupe and overwriteDupes setting
It seems this e-mail had already left the outbox yesterday. Apologies for the spam.

On Tuesday 11 May 2010 10:13:18 Markus Jelsma wrote:
> List,
>
> I've stumbled upon an issue with the deduplication mechanism. It either
> deletes all documents or does nothing at all and it depends on the
> overwriteDupes setting, resp. true and false.
> [...]

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: Dedupe and overwriteDupes setting
1. You need to set the sig field to indexed.
2. This should be added to the wiki.
3. Want to make a JIRA issue? This is not very friendly behavior (when you have the sig field set to indexed=false and overwriteDupes=true, it should likely complain).

--
- Mark
http://www.lucidimagination.com

On 5/11/10 4:13 AM, Markus Jelsma wrote:
> List,
>
> I've stumbled upon an issue with the deduplication mechanism. It either
> deletes all documents or does nothing at all and it depends on the
> overwriteDupes setting, resp. true and false.
> [...]
>   <field name="sig" type="string" stored="true" indexed="false" multiValued="true" />
> [...]
> Anyway, why does it delete all my documents? Any clues? The wiki is not
> very helpful on this subject.
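
For readers following along, Mark's first point could be sketched as the schema.xml change below. This is only an illustration, assuming the field definition from the original message is otherwise kept as posted. The thread does not say whether any other attribute (for example multiValued) should change, so those are simply carried over.

```xml
<!-- schema.xml sketch: the sig field from the original message,
     with indexed switched from false to true as suggested in point 1.
     All other attributes are copied unchanged (an assumption, not advice from the thread). -->
<field name="sig" type="string" stored="true" indexed="true" multiValued="true" />
```

The updateRequestProcessorChain in solrconfig.xml stays as posted; only the field definition differs.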
RE: Re: Dedupe and overwriteDupes setting
Thanks Mark,

I already fixed it in the meantime and quickly went on with the usual stuff. I know, bad me =). I'll file a Jira report tomorrow and update the wiki on this subject. I can also file another ticket based on another current thread on this subject; that one is about a proper use case for the update handler to return information on which documents were rejected due to dedupe.

I would think that updating the wiki with links to those new Jira tickets would be a good idea for other readers, would it not?

Cheers,

-Original message-
From: Mark Miller markrmil...@gmail.com
Sent: Tue 11-05-2010 17:25
To: solr-user@lucene.apache.org;
Subject: Re: Dedupe and overwriteDupes setting

> 1. You need to set the sig field to indexed.
> 2. This should be added to the wiki
> 3. Want to make a JIRA issue? This is not very friendly behavior (when
> you have the sig field set to indexed=false and overwriteDupes=true it
> should likely complain)
>
> On 5/11/10 4:13 AM, Markus Jelsma wrote:
> > I've stumbled upon an issue with the deduplication mechanism.
> > [...]