SolrDeleteDuplicates needs to clone the SolrRecord objects 
-----------------------------------------------------------

                 Key: NUTCH-850
                 URL: https://issues.apache.org/jira/browse/NUTCH-850
             Project: Nutch
          Issue Type: Bug
          Components: indexer
    Affects Versions: 1.1
            Reporter: Julien Nioche
            Assignee: Julien Nioche
             Fix For: 1.2, 2.0
         Attachments: NUTCH-850.patch

The reduce() method of SolrDeleteDuplicates deduplicates SolrRecords that share 
a signature. The first SolrRecord is stored in a variable _recordToKeep_ and is 
compared with the subsequent SolrRecords found with the same signature. The 
problem is that Hadoop reuses the same instance on every call to values.next(), 
so _recordToKeep_ always holds the values returned by the latest call to 
values.next(). 

The attached patch clones each SolrRecord before assigning it to 
_recordToKeep_, which avoids the problem.
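
The effect can be reproduced without Hadoop. The sketch below is not the code 
from the patch: Record, ReusingIterator, sample() and keepNewest() are 
hypothetical stand-ins, and the "keep the newest timestamp" rule is an 
assumption made only for the illustration. The iterator imitates Hadoop by 
refilling and returning one shared instance from every next() call.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ReuseDemo {

    // Hypothetical stand-in for SolrRecord: just an id and a timestamp.
    static final class Record {
        String id;
        long tstamp;

        Record() {}

        // Copy constructor, playing the role of the clone in the patch.
        Record(Record other) {
            this.id = other.id;
            this.tstamp = other.tstamp;
        }
    }

    // Imitates Hadoop's values iterator: one instance, refilled on each next().
    static final class ReusingIterator implements Iterator<Record> {
        private final String[] ids;
        private final long[] tstamps;
        private final Record shared = new Record();
        private int pos = 0;

        ReusingIterator(String[] ids, long[] tstamps) {
            this.ids = ids;
            this.tstamps = tstamps;
        }

        @Override
        public boolean hasNext() {
            return pos < ids.length;
        }

        @Override
        public Record next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            shared.id = ids[pos];
            shared.tstamp = tstamps[pos];
            pos++;
            return shared; // same object every time
        }
    }

    // Three records with the same signature; "a" is the newest.
    static Iterator<Record> sample() {
        return new ReusingIterator(new String[] {"a", "b", "c"},
                                   new long[] {30L, 10L, 20L});
    }

    // Returns the id of the record kept. With copy == false the reference to
    // the shared instance is stored directly, as in the buggy reduce().
    static String keepNewest(Iterator<Record> values, boolean copy) {
        Record first = values.next();
        Record recordToKeep = copy ? new Record(first) : first;
        while (values.hasNext()) {
            Record next = values.next();
            if (next.tstamp > recordToKeep.tstamp) {
                recordToKeep = copy ? new Record(next) : next;
            }
        }
        return recordToKeep.id;
    }

    public static void main(String[] args) {
        System.out.println("without copy: " + keepNewest(sample(), false));
        System.out.println("with copy:    " + keepNewest(sample(), true));
    }
}
```

Without the copy, _recordToKeep_ and the current value are the same object, so 
every comparison is a record against itself and the method ends up reporting 
"c", the last record read. With the copy it reports "a", the newest one.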

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
