[ https://issues.apache.org/jira/browse/NUTCH-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453907#comment-16453907 ]
Hudson commented on NUTCH-2526:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3522 (See [https://builds.apache.org/job/Nutch-trunk/3522/])
NUTCH-2526 NPE in scoring-opic when indexing document without CrawlDb (snagel: [https://github.com/apache/nutch/commit/90ae2d1f9159c3d30d5a937252f2bbb00e2110e4])
* (edit) src/plugin/scoring-link/src/java/org/apache/nutch/scoring/link/LinkAnalysisScoringFilter.java
* (edit) src/java/org/apache/nutch/scoring/ScoringFilter.java
* (edit) src/plugin/scoring-opic/src/java/org/apache/nutch/scoring/opic/OPICScoringFilter.java

> NPE in scoring-opic when indexing document without CrawlDb datum
> ----------------------------------------------------------------
>
>                 Key: NUTCH-2526
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2526
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser, scoring
>    Affects Versions: 1.14
>            Reporter: Yash Thenuan
>            Assignee: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.15
>
>
> I was trying to write a parse filter plugin whose job was to parse internal
> links as separate documents. What I did, basically, was break the page into
> multiple parseResults, each parseResult having ParseText and ParseData
> corresponding to the internal links. I was able to parse them separately,
> but an error occurred at scoring time.
> I am attaching the logs for indexing.
>
> 2018-03-07 15:41:52,327 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> 2018-03-07 15:41:52,327 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
> 2018-03-07 15:41:52,327 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20180307130959
> 2018-03-07 15:41:53,677 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2018-03-07 15:41:54,861 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.elasticrest.ElasticRestIndexWriter
> 2018-03-07 15:41:55,168 INFO client.AbstractJestClient - Setting server pool to a list of 1 servers: [http://localhost:9200]
> 2018-03-07 15:41:55,170 INFO client.JestClientFactory - Using multi thread/connection supporting pooling connection manager
> 2018-03-07 15:41:55,238 INFO client.JestClientFactory - Using default GSON instance
> 2018-03-07 15:41:55,238 INFO client.JestClientFactory - Node Discovery disabled...
> 2018-03-07 15:41:55,238 INFO client.JestClientFactory - Idle connection reaping disabled...
> 2018-03-07 15:41:55,282 INFO elasticrest.ElasticRestIndexWriter - Processing remaining requests [docs = 1, length = 210402, total docs = 1]
> 2018-03-07 15:41:55,361 INFO elasticrest.ElasticRestIndexWriter - Processing to finalize last execute
> 2018-03-07 15:41:55,458 INFO elasticrest.ElasticRestIndexWriter - Previous took in ms 175, including wait 97
> 2018-03-07 15:41:55,468 WARN mapred.LocalJobRunner - job_local1561152089_0001
> java.lang.Exception: java.lang.NullPointerException
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.NullPointerException
>     at org.apache.nutch.scoring.opic.OPICScoringFilter.indexerScore(OPICScoringFilter.java:171)
>     at org.apache.nutch.scoring.ScoringFilters.indexerScore(ScoringFilters.java:120)
>     at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:296)
>     at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:57)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> 2018-03-07 15:41:55,510 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
>     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
>     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
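
For readers following the stack trace in the quoted report: the NullPointerException is raised inside OPICScoringFilter.indexerScore, and the issue title ties it to documents that have no CrawlDb datum. The following is only a minimal sketch of that failure mode and of a null guard. It does not reproduce the Nutch source or the linked commit. The Datum class, both method names, the score formula, and the choice to fall back to initScore are all assumptions made for illustration.

```java
public class NullSafeScoreSketch {

    /** Hypothetical stand-in for a CrawlDb record; NOT Nutch's CrawlDatum class. */
    static final class Datum {
        private final float score;

        Datum(float score) {
            this.score = score;
        }

        float getScore() {
            return score;
        }
    }

    /** Unguarded variant: dereferences dbDatum and throws an NPE when it is null. */
    static float unguardedScore(Datum dbDatum, float initScore, float scorePower) {
        return (float) Math.pow(dbDatum.getScore(), scorePower) * initScore;
    }

    /**
     * Guarded variant: when no datum exists for the document, return
     * initScore unchanged (an assumed fallback policy, chosen for this sketch).
     */
    static float guardedScore(Datum dbDatum, float initScore, float scorePower) {
        if (dbDatum == null) {
            return initScore;
        }
        return (float) Math.pow(dbDatum.getScore(), scorePower) * initScore;
    }

    public static void main(String[] args) {
        // A document that has a datum: the score is derived from it.
        System.out.println(guardedScore(new Datum(4.0f), 1.0f, 0.5f));

        // A document without a datum, e.g. one synthesized by a parse filter.
        System.out.println(guardedScore(null, 1.0f, 0.5f));

        // The same input without the guard fails the way the trace shows.
        try {
            unguardedScore(null, 1.0f, 0.5f);
        } catch (NullPointerException e) {
            System.out.println("NPE without guard");
        }
    }
}
```

Whether a document with no datum should keep its initial score, receive a neutral value, or be skipped is a policy decision; the sketch picks the first option only to keep the example short.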