[ https://issues.apache.org/jira/browse/NUTCH-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446002#comment-13446002 ]
Luca Cavanna commented on NUTCH-1100:
-------------------------------------

The problem with the approach I mentioned before is that the field digest would need to be made indexed in the solr schema, otherwise that query would always return 0 results.

> SolrDedup broken
> ----------------
>
>                 Key: NUTCH-1100
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1100
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.4
>            Reporter: Markus Jelsma
>             Fix For: 1.6
>
>         Attachments: NUTCH-1100-1.6-1.patch
>
>
> Some Solr indices are unable to be deduped from Nutch. For unknown reasons
> Nutch will throw the exception below. There are no peculiarities to be found
> in the Solr logs, the queries are normal and seem to succeed.
> {code}
> java.lang.NullPointerException
>     at org.apache.hadoop.io.Text.encode(Text.java:388)
>     at org.apache.hadoop.io.Text.set(Text.java:178)
>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:272)
>     at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat$1.next(SolrDeleteDuplicates.java:243)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
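
For illustration, the schema change the comment refers to might look like the fragment below. This is only a sketch: the field type `string` and `stored="true"` are assumptions about the Solr schema in use and are not stated in the comment. The one point taken from the comment is that `digest` would need `indexed="true"` for a query on that field to match anything.

```xml
<!-- Hypothetical schema.xml entry; type and stored values are assumed. -->
<!-- indexed="true" makes the digest field searchable, so a query on it can return results. -->
<field name="digest" type="string" stored="true" indexed="true"/>
```

Documents indexed before such a change would most likely need to be reindexed before queries on `digest` return them.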