Re: Nutch/Solr communication problem

Zara Parst Mon, 18 Jan 2016 02:56:11 -0800

SolrIndexWriter
solr.server.type : Type of SolrServer to communicate with (default 'http' however options include 'cloud', 'lb' and 'concurrent')
solr.server.url : URL of the Solr instance (mandatory)
solr.zookeeper.url : URL of the Zookeeper URL (mandatory if 'cloud' value for solr.server.type)
solr.loadbalance.urls : Comma-separated string of Solr server strings to be used (madatory if 'lb' value for solr.server.type)
solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
solr.commit.size : buffer size when sending to Solr (default 1000)
solr.auth : use authentication (default false)
solr.auth.username : username for authentication
solr.auth.password : password for authentication
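The parameter list above says which settings are mandatory for which server type. A small pre-flight check along those lines can catch a missing setting before a crawl reaches the indexing step. This is a sketch only: SolrWriterConfigCheck and check() are hypothetical names, not part of Nutch, and treating solr.auth.username and solr.auth.password as required whenever solr.auth=true is an assumption (the list does not say so).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check, not part of Nutch. It encodes the "mandatory"
// rules from the SolrIndexWriter parameter list, plus one ASSUMED rule:
// solr.auth=true needs both solr.auth.username and solr.auth.password.
class SolrWriterConfigCheck {

    // Returns the problems found; an empty list means every rule was satisfied.
    static List<String> check(Properties p) {
        List<String> problems = new ArrayList<>();
        // 'http' is the documented default server type.
        String type = p.getProperty("solr.server.type", "http");

        if (isBlank(p.getProperty("solr.server.url"))) {
            problems.add("solr.server.url is mandatory");
        }
        if ("cloud".equals(type) && isBlank(p.getProperty("solr.zookeeper.url"))) {
            problems.add("solr.zookeeper.url is mandatory when solr.server.type=cloud");
        }
        if ("lb".equals(type) && isBlank(p.getProperty("solr.loadbalance.urls"))) {
            problems.add("solr.loadbalance.urls is mandatory when solr.server.type=lb");
        }
        // Assumed rule: authentication switched on without credentials is an error.
        if (Boolean.parseBoolean(p.getProperty("solr.auth", "false"))) {
            if (isBlank(p.getProperty("solr.auth.username"))) {
                problems.add("solr.auth=true but solr.auth.username is not set");
            }
            if (isBlank(p.getProperty("solr.auth.password"))) {
                problems.add("solr.auth=true but solr.auth.password is not set");
            }
        }
        return problems;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    public static void main(String[] args) {
        // Mirrors the crawl command in the thread, but with the password left out.
        Properties p = new Properties();
        p.setProperty("solr.server.url", "http://localhost:8983/solr/abc");
        p.setProperty("solr.auth", "true");
        p.setProperty("solr.auth.username", "xxxx");
        System.out.println(check(p));
    }
}
```

Run as written, it prints [solr.auth=true but solr.auth.password is not set]. For the full command in this thread the check would report nothing, since the URL, solr.auth and both credentials are supplied; together with the log line "Authenticating as: radmin", that suggests the credentials are being read and the failure lies elsewhere.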

2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawlDbyah/crawldb
2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawlDbyah/linkdb
2016-01-17 19:19:42,973 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawlDbyah/segments/20160117191906
2016-01-17 19:19:42,975 WARN  indexer.IndexerMapReduce - Ignoring linkDb for indexing, no linkDb found in path: crawlDbyah/linkdb
2016-01-17 19:19:43,807 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-01-17 19:19:43,809 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/staging/rakesh2114349538/.staging/job_local2114349538_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-01-17 19:19:43,963 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-01-17 19:19:43,980 WARN  conf.Configuration - file:/tmp/hadoop-rakesh/mapred/local/localRunner/rakesh/job_local2114349538_0001/job_local2114349538_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-01-17 19:19:44,260 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2016-01-17 19:19:45,128 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2016-01-17 19:19:45,148 INFO  solr.SolrUtils - Authenticating as: radmin
2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: content dest: content
2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: title dest: title
2016-01-17 19:19:45,318 INFO  solr.SolrMappingReader - source: host dest: host
2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: segment dest: segment
2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: boost dest: boost
2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: digest dest: digest
2016-01-17 19:19:45,319 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2016-01-17 19:19:45,360 INFO  solr.SolrIndexWriter - Indexing 2 documents
2016-01-17 19:19:45,507 INFO  solr.SolrIndexWriter - Indexing 2 documents
2016-01-17 19:19:45,526 WARN  mapred.LocalJobRunner - job_local2114349538_0001
java.lang.Exception: java.io.IOException
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:171)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:157)
        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:502)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:456)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:8983/solr/yah
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:566)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:153)
        ... 11 more
Caused by: org.apache.http.client.ClientProtocolException
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:448)
        ... 15 more
Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot retry request with a non-repeatable request entity.
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:208)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
        at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
        at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
        ... 19 more
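Each "Caused by:" entry wraps the one below it, so the innermost entry, org.apache.http.client.NonRepeatableRequestException, is the one worth searching for rather than the outer "java.io.IOException: Job failed!". A minimal sketch of walking such a chain with Throwable.getCause(); RootCause is a hypothetical name, and plain JDK exception types stand in for the SolrJ and HttpClient classes from the log.

```java
import java.io.IOException;

// Sketch: find the innermost "Caused by:" of an exception chain.
// RootCause is a hypothetical name; JDK exception types stand in for the
// SolrJ and HttpClient classes that appear in the real log.
class RootCause {

    // Follows getCause() until it returns null. Throwable.getCause() already
    // returns null for an exception that names itself as its cause; this
    // simple loop assumes the chain has no longer cycles.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for NonRepeatableRequestException.
        Throwable innermost = new IllegalStateException(
                "Cannot retry request with a non-repeatable request entity.");
        // Stand-in for ClientProtocolException.
        Throwable protocol = new RuntimeException(innermost);
        // Stand-in for SolrServerException.
        Throwable server = new IOException("IOException when talking to server", protocol);
        // Stand-ins for LocalJobRunner's wrapper around Nutch's IOException.
        Throwable chain = new Exception(new IOException(server));
        System.out.println(rootCause(chain).getMessage());
    }
}
```

It prints the innermost message, "Cannot retry request with a non-repeatable request entity.", which is the text to search for when diagnosing this failure.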
2016-01-17 19:19:46,055 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
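Apache HttpClient raises NonRepeatableRequestException when it has to send a request a second time but the request body could only be streamed once. With HTTP Basic authentication this typically happens when the first attempt goes out without credentials, the server answers 401 with a challenge, and the client then tries to resend the update with credentials. Sending the credentials preemptively on the first request avoids that resend. The sketch below only shows what a preemptive Basic Authorization header contains (RFC 7617: "Basic " followed by the Base64 of username:password). It is an illustration, not how SolrIndexWriter builds its requests; whether Nutch 1.11 can be configured to authenticate preemptively is not established in this thread, and PreemptiveBasicAuth and basicAuthHeaderValue are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration of the header a client sends when it authenticates
// preemptively with HTTP Basic (RFC 7617). Hypothetical helper; this is
// NOT how SolrIndexWriter builds its requests.
class PreemptiveBasicAuth {

    static String basicAuthHeaderValue(String username, String password) {
        // RFC 7617: Base64 of "username:password". UTF-8 is an assumption here;
        // the server must decode with the same charset.
        String pair = username + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The example credentials from RFC 7617, not the ones from this thread.
        System.out.println("Authorization: " + basicAuthHeaderValue("Aladdin", "open sesame"));
    }
}
```

With the RFC 7617 example pair the program prints "Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==". Basic credentials are only Base64-encoded, not encrypted, so outside a local test setup they should travel over HTTPS.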

On Mon, Jan 18, 2016 at 4:15 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> Hi - can you post the log output?
> Markus
>
>
> -----Original message-----
> From: Zara Parst<edotserv...@gmail.com>
> Sent: Monday 18th January 2016 2:06
> To: dev@nutch.apache.org
> Subject: Nutch/Solr communication problem
>
> Hi everyone,
>
> I have a situation here: I am using Nutch 1.11 and Solr 5.4.
>
> Solr is protected by a username and password. I am passing the credentials
> to Solr using the following command:
>
> bin/crawl -i -Dsolr.server.url=http://localhost:8983/solr/abc -D solr.auth=true -Dsolr.auth.username=xxxx -Dsolr.auth.password=xxx url crawlDbyah 1
>
> and I always get the same problem. Please help me figure out how to feed
> data to a password-protected Solr.
>
> Below is the error message.
>
> Indexer: starting at 2016-01-17 19:01:12
> Indexer: deleting gone documents: false
> Indexer: URL filtering: false
> Indexer: URL normalizing: false
> Active IndexWriters :
> SolrIndexWriter
>         solr.server.type : Type of SolrServer to communicate with (default http however options include cloud, lb and concurrent)
>         solr.server.url : URL of the Solr instance (mandatory)
>         solr.zookeeper.url : URL of the Zookeeper URL (mandatory if cloud value for solr.server.type)
>         solr.loadbalance.urls : Comma-separated string of Solr server strings to be used (madatory if lb value for solr.server.type)
>         solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>         solr.commit.size : buffer size when sending to Solr (default 1000)
>         solr.auth : use authentication (default false)
>         solr.auth.username : username for authentication
>         solr.auth.password : password for authentication
> Indexing 2 documents
> Indexing 2 documents
> Indexer: java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
>         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:228)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:237)
>
> I also tried setting the username and password in nutch-default.xml, but I
> get the same error again. Please help me out.
