[ 
https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423492#comment-15423492
 ] 

Jose-Marcio Martins commented on NUTCH-2269:
--------------------------------------------

Hello. I posted the message below to the nutch-users mailing list on June 7, 
2016, but nobody answered.
I tried older Solr releases, but the problem remains. So I rebuilt the crawl 
data (and the Solr data too) from scratch, incrementally, to see at what point 
the problem appears.
Here is the content of my message to the list:
To find what triggers the problem on "clean", I worked incrementally and found 
that it is triggered when Nutch tries to clean the following URLs from Solr:

********************************************************************************************

[nutch@crawler crawldb]$ ../../../../devel/show-urls part-00000  | grep gone
db_gone      http://www.armines.net/0.85
db_gone      http://www.armines.net/1.8
db_gone      http://www.armines.net/agenda/3%C3%A8me-a%C3%A9rogels
db_gone      http://www.armines.net/agenda/chercheurs-3d
db_gone      http://www.armines.net/agenda/rencontres-2016
db_gone      http://www.armines.net/association-armines/chiffres-dactivit%C3%A9
db_gone      http://www.armines.net/associations-reseaux
db_gone      http://www.armines.net/carnot-mines-tv/sciences-mat%C3%A9riaux/extinguo
db_gone      http://www.armines.net/centres-thematiques/%C3%A9conomie-management-soci%C3%A9t%C3%A9
db_gone      http://www.armines.net/centres-thematiques/%C3%A9nerg%C3%A9tique-proc%C3%A9d%C3%A9s
db_gone      http://www.armines.net/centres-thematiques/math%C3%A9matiques-9
db_gone      http://www.armines.net/centres-thematiques/sciences-lenvironnement
db_gone      http://www.armines.net/centres-thematiques/sciences-mat%C3%A9riaux
db_gone      http://www.armines.net/domaines-dapplication/energie-durable
db_gone      http://www.armines.net/domaines-dapplication/transformation-mati%C3%A8re
db_gone      http://www.armines.net/fr/grid4eu-solutions
db_gone      http://www.armines.net/text/javascript
[nutch@crawler crawldb]$

Is it possible that the problem comes from the percent-encoded URLs (those 
containing %XY escapes)?
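
One way to test this hypothesis would be to split the db_gone URLs into 
percent-encoded and plain ones, then run the clean step with only one group 
present at a time. Below is a minimal Python sketch of the splitting part, 
assuming the URLs are pasted in by hand from the dump above; GONE_URLS and 
split_by_encoding are illustrative names, not part of Nutch.

```python
from urllib.parse import unquote

# A sample of the db_gone URLs from the dump above (pasted by hand; this is
# an assumption of the sketch, not something read from the crawldb).
GONE_URLS = [
    "http://www.armines.net/0.85",
    "http://www.armines.net/agenda/3%C3%A8me-a%C3%A9rogels",
    "http://www.armines.net/agenda/chercheurs-3d",
    "http://www.armines.net/association-armines/chiffres-dactivit%C3%A9",
    "http://www.armines.net/text/javascript",
]


def split_by_encoding(urls):
    """Return (encoded, plain).

    A URL counts as percent-encoded when decoding its %XY escapes
    changes the string.
    """
    encoded, plain = [], []
    for url in urls:
        if unquote(url) != url:
            encoded.append(url)
        else:
            plain.append(url)
    return encoded, plain


if __name__ == "__main__":
    encoded, plain = split_by_encoding(GONE_URLS)
    print("percent-encoded:")
    for url in encoded:
        # Show the decoded form next to the raw one for comparison.
        print("  ", url, "->", unquote(url))
    print("plain:")
    for url in plain:
        print("  ", url)
```

If the clean step fails with only the percent-encoded group and succeeds with 
only the plain group, that would support the hypothesis. Note that the list 
above also contains plain URLs, so the list alone does not settle it.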


> Clean not working after crawl
> -----------------------------
>
>                 Key: NUTCH-2269
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2269
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.12
>         Environment: Vagrant, Ubuntu, Java 8, Solr 4.10
>            Reporter: Francesco Capponi
>             Fix For: 1.13
>
>
> I have been having this problem for a while, and I had to roll back to the 
> old solr clean instead of the newer version. 
> After it correctly inserts/updates every document in Nutch, it returns 
> error 255 when it tries to clean:
> {quote}
> 2016-05-30 10:13:04,992 WARN  output.FileOutputCommitter - Output Path is null in setupJob()
> 2016-05-30 10:13:07,284 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: content dest: content
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: title dest: title
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: host dest: host
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: segment dest: segment
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: boost dest: boost
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: digest dest: digest
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
> 2016-05-30 10:13:08,133 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 15/15 documents
> 2016-05-30 10:13:08,919 WARN  output.FileOutputCommitter - Output Path is null in cleanupJob()
> 2016-05-30 10:13:08,937 WARN  mapred.LocalJobRunner - job_local662730477_0001
> java.lang.Exception: java.lang.IllegalStateException: Connection pool shut down
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.IllegalStateException: Connection pool shut down
>       at org.apache.http.util.Asserts.check(Asserts.java:34)
>       at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
>       at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
>       at org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
>       at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
>       at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>       at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>       at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>       at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>       at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
>       at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
>       at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
>       at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
>       at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483)
>       at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:464)
>       at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:190)
>       at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:178)
>       at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
>       at org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:120)
>       at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
>       at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>       at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> 2016-05-30 10:13:09,299 ERROR indexer.CleaningJob - CleaningJob: java.io.IOException: Job failed!
>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
>       at org.apache.nutch.indexer.CleaningJob.delete(CleaningJob.java:172)
>       at org.apache.nutch.indexer.CleaningJob.run(CleaningJob.java:195)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>       at org.apache.nutch.indexer.CleaningJob.main(CleaningJob.java:206)
> {quote}



