[ 
https://issues.apache.org/jira/browse/NUTCH-684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675323#action_12675323
 ] 

Andrzej Bialecki  commented on NUTCH-684:
-----------------------------------------

IMHO it would be good to have this functionality in 1.0, and the patch is very 
close.

Ok, how about the following:

* we make the name of the unique field configurable, and provide a default 
value in nutch-default.xml, which is consistent with the one provided in the 
example schema.xml (yes, we should add an example schema, and the one in 
NUTCH-442 looks good enough).

* the UpdateRequest improvement: it's up to you whether to do it here or 
separately. It would certainly be nice to have.

* javadocs: yeah, map/reduce/configure are obvious, and good javadocs exist in 
superclasses. Same for bean-like getters/setters. Other public methods should be 
documented, so that in half a year we still know what they are for and we 
understand the arguments they expect.
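
To illustrate the first and last points, here is a minimal sketch of a documented helper that resolves the unique field name with a default. The property name "solr.unique.field", the default value "id", and the class name are all assumptions made for illustration, not names taken from the patch, and java.util.Properties stands in for Hadoop's Configuration so the snippet compiles on its own.

```java
import java.util.Properties;

public class UniqueFieldConfig {

  /** Hypothetical property key; the real key would be declared in nutch-default.xml. */
  public static final String UNIQUE_FIELD_KEY = "solr.unique.field";

  /** Assumed default; it should match the uniqueKey of the example schema.xml. */
  public static final String DEFAULT_UNIQUE_FIELD = "id";

  /**
   * Returns the name of the Solr field that uniquely identifies a document.
   *
   * @param conf job settings (a stand-in for Hadoop's Configuration)
   * @return the configured field name, or the default when it is unset or blank
   */
  public static String getUniqueField(Properties conf) {
    String value = conf.getProperty(UNIQUE_FIELD_KEY);
    if (value == null || value.trim().isEmpty()) {
      return DEFAULT_UNIQUE_FIELD;
    }
    return value.trim();
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    // Nothing configured yet, so the default applies.
    System.out.println(getUniqueField(conf));
    // An explicit setting overrides the default.
    conf.setProperty(UNIQUE_FIELD_KEY, "url");
    System.out.println(getUniqueField(conf));
  }
}
```

With a real Hadoop Configuration the lookup would collapse to conf.get(key, defaultValue), with the default kept in sync with nutch-default.xml.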

> Dedup support for Solr
> ----------------------
>
>                 Key: NUTCH-684
>                 URL: https://issues.apache.org/jira/browse/NUTCH-684
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>            Reporter: Doğacan Güney
>            Assignee: Doğacan Güney
>         Attachments: NUTCH-684_bin_nutch.patch, NUTCH-684_solrdedup_v2.patch, 
> solrdedup.patch
>
>
> After NUTCH-442, Nutch can now index to both Solr and Lucene. However, the 
> duplicate deletion feature (based on digests) is only available for Lucene. It 
> should also be available for Solr.
