Re: Dedupe and overwriteDupes setting

2012-06-15 Thread Shameema Umer
Hi,
My solrconfig dedupe setting is as follows.

  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <bool name="overwriteDupes">false</bool>
      <str name="signatureField">dupesign</str>
      <str name="fields">title,url</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Even though overwriteDupes is set to false, search query results show that the
contents are overwritten.

Is this because there are duplicate documents in Solr and the query results
display only the latest entry among the duplicates?

I actually need the date field not to be overwritten. Please help.

Thanks
Shameema


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dedupe-and-overwriteDupes-setting-tp809320p3989807.html
Sent from the Solr - User mailing list archive at Nabble.com.


Dedupe and overwriteDupes setting

2010-05-11 Thread Markus Jelsma
List,


I've stumbled upon an issue with the deduplication mechanism. It either 
deletes all documents or does nothing at all, depending on whether 
overwriteDupes is set to true or false, respectively.

I use a slightly modified configuration:

  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">sig</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">content</str>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>


  <field name="sig" type="string" stored="true" indexed="false" multiValued="true" />

After importing new documents (only with overwriteDupes=false) I can clearly 
see the correct signatures. Most documents have a distinct signature, and some 
share the same one because the content field's value is identical for those 
documents.


Anyway, why does it delete all my documents? Any clues? The wiki is not very 
helpful on this subject.


Cheers.


Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Dedupe and overwriteDupes setting

2010-05-11 Thread Markus Jelsma
It seems this e-mail already left the outbox yesterday. Apologies for the 
spam.



Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Re: Dedupe and overwriteDupes setting

2010-05-11 Thread Mark Miller

1. You need to set the sig field to indexed="true".
2. This should be added to the wiki.
3. Want to make a JIRA issue? This is not very friendly behavior (when 
you have the sig field set to indexed=false and overwriteDupes=true it 
should likely complain)
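
As a rough sketch of point 1, the sig field from the schema quoted in the original message could be declared as follows. Only the indexed attribute is changed here; the name, type, stored, and multiValued values are copied from that message as posted and are not a recommendation in themselves.

```xml
<!-- Signature field used by the dedupe chain (signatureField = sig). -->
<!-- Assumption: indexed="true" is the only change; other attributes are as originally posted. -->
<field name="sig" type="string" stored="true" indexed="true" multiValued="true" />
```

Documents indexed before the schema change would presumably need to be reindexed so that their signatures are indexed as well.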




--
- Mark

http://www.lucidimagination.com





RE: Re: Dedupe and overwriteDupes setting

2010-05-11 Thread Markus Jelsma
Thanks Mark,

 

 

I already fixed it in the meantime and quickly went on with the usual stuff; I 
know, bad me =). I'll file a Jira report tomorrow and update the wiki on this 
subject. I can also file another ticket from another current topic on this 
subject; that one is about a proper use case for the update handler to return 
information on which documents were rejected due to dedupe.

 

I would like to think that updating the wiki with links to those new Jira 
tickets would be a good idea for other readers, is it not?

 

 

Cheers,
 