[jira] [Commented] (NUTCH-2269) Clean not working after crawl

2016-08-16 Thread Jose-Marcio Martins (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423492#comment-15423492
 ] 

Jose-Marcio Martins commented on NUTCH-2269:


Hello, this is from a message I posted on the nutch-user mailing list on June 7,
2016; nobody answered.
I tried with older Solr releases, but the problem remains. So I rebuilt the crawl
data (and the Solr data too) from scratch, incrementally, to see at which point the
problem appears.
I copy here the content of my message to the nutch-user list...
Well, to find which "thing" could trigger the problem on "clean", I worked
incrementally, and I found that the problem is triggered when Nutch tries to
clean the following URLs from Solr:



[nutch@crawler crawldb]$ ../../../../devel/show-urls part-0  | grep gone
db_gone  http://www.armines.net/0.85
db_gone  http://www.armines.net/1.8
db_gone  http://www.armines.net/agenda/3%C3%A8me-a%C3%A9rogels
db_gone  http://www.armines.net/agenda/chercheurs-3d
db_gone  http://www.armines.net/agenda/rencontres-2016
db_gone  http://www.armines.net/association-armines/chiffres-dactivit%C3%A9
db_gone  http://www.armines.net/associations-reseaux
db_gone  http://www.armines.net/carnot-mines-tv/sciences-mat%C3%A9riaux/extinguo
db_gone  http://www.armines.net/centres-thematiques/%C3%A9conomie-management-soci%C3%A9t%C3%A9
db_gone  http://www.armines.net/centres-thematiques/%C3%A9nerg%C3%A9tique-proc%C3%A9d%C3%A9s
db_gone  http://www.armines.net/centres-thematiques/math%C3%A9matiques-9
db_gone  http://www.armines.net/centres-thematiques/sciences-lenvironnement
db_gone  http://www.armines.net/centres-thematiques/sciences-mat%C3%A9riaux
db_gone  http://www.armines.net/domaines-dapplication/energie-durable
db_gone  http://www.armines.net/domaines-dapplication/transformation-mati%C3%A8re
db_gone  http://www.armines.net/fr/grid4eu-solutions
db_gone  http://www.armines.net/text/javascript
[nutch@crawler crawldb]$

Is it possible that the problem comes from the encoded URLs (with %XY)?
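
To test that hypothesis in isolation, the same delete can be issued directly
through SolrJ, outside of Nutch. This is only a sketch: the core URL is
hypothetical, and it assumes the SolrJ 5.x client that the stack traces below
show Nutch 1.12 using.

{code}
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class DeleteEncodedUrl {
  public static void main(String[] args) throws Exception {
    // Hypothetical Solr core URL; adjust to the actual index.
    try (HttpSolrClient solr =
        new HttpSolrClient("http://localhost:8983/solr/nutch")) {
      // Nutch deletes by document id, i.e. the (possibly percent-encoded) URL.
      solr.deleteById("http://www.armines.net/agenda/3%C3%A8me-a%C3%A9rogels");
      solr.commit();
    }
  }
}
{code}

If a direct delete of one of these ids succeeds, the percent-encoding itself is
probably not what breaks the CleaningJob.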


> Clean not working after crawl
> -
>
> Key: NUTCH-2269
> URL: https://issues.apache.org/jira/browse/NUTCH-2269
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Vagrant, Ubuntu, Java 8, Solr 4.10
>Reporter: Francesco Capponi
> Fix For: 1.13
>
>
> I have been having this problem for a while and I had to roll back to using
> the old solr clean instead of the newer version.
> Once every document has been inserted/updated correctly in Nutch, when it
> tries to clean, it returns error 255:
> {quote}
> 2016-05-30 10:13:04,992 WARN  output.FileOutputCommitter - Output Path is 
> null in setupJob()
> 2016-05-30 10:13:07,284 INFO  indexer.IndexWriters - Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: content dest: 
> content
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: title dest: 
> title
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: host dest: host
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: segment dest: 
> segment
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: boost dest: 
> boost
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: digest dest: 
> digest
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: tstamp dest: 
> tstamp
> 2016-05-30 10:13:08,133 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 
> 15/15 documents
> 2016-05-30 10:13:08,919 WARN  output.FileOutputCommitter - Output Path is 
> null in cleanupJob()
> 2016-05-30 10:13:08,937 WARN  mapred.LocalJobRunner - job_local662730477_0001
> java.lang.Exception: java.lang.IllegalStateException: Connection pool shut 
> down
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.IllegalStateException: Connection pool shut down
>   at org.apache.http.util.Asserts.check(Asserts.java:34)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.

[jira] [Commented] (NUTCH-2139) Basic plugin to index inlinks and outlinks

2016-08-16 Thread Manish Bassi (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423299#comment-15423299
 ] 

Manish Bassi commented on NUTCH-2139:
-

I am using Nutch 1.12 and index-links is working only for outlinks, not for
inlinks.

The inlinks value is null, due to which they are not getting indexed.
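
As far as I can tell, indexing filters only receive inlinks through the Inlinks
argument of their filter() method, and that argument is null when no linkdb is
passed to the indexing job (i.e. invertlinks was not run, or the linkdb was not
given to the index step). A minimal, hypothetical helper (not the index-links
plugin itself) showing where that null shows up:

{code}
import java.util.Iterator;

import org.apache.nutch.crawl.Inlink;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.NutchDocument;

/** Hypothetical helper, only to illustrate the null-inlinks case. */
public class InlinksFieldHelper {

  /**
   * Adds one "inlinks" field per inlink. The Inlinks object handed to an
   * indexing filter is null when the indexer was not given a linkdb, so
   * nothing can be added in that case.
   */
  public static NutchDocument addInlinks(NutchDocument doc, Inlinks inlinks) {
    if (inlinks == null) {
      return doc;
    }
    for (Iterator<Inlink> it = inlinks.iterator(); it.hasNext();) {
      doc.add("inlinks", it.next().getFromUrl());
    }
    return doc;
  }
}
{code}

So one thing worth checking is whether the crawl actually builds a linkdb and
hands it to the indexer; if it does and inlinks are still null, that points at
a real bug in the plugin or the indexing job.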


> Basic plugin to index inlinks and outlinks
> --
>
> Key: NUTCH-2139
> URL: https://issues.apache.org/jira/browse/NUTCH-2139
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer, plugin
>Reporter: Jorge Luis Betancourt Gonzalez
>Priority: Minor
>  Labels: link, plugin
> Fix For: 1.13
>
>
> Basic plugin that allows indexing the inlinks and outlinks of web pages. This
> could be very useful for analytics purposes, including neat visualizations
> using d3.js, for instance.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Nutch-trunk #3389

2016-08-16 Thread Apache Jenkins Server
See 

Changes:

[snagel] Remove obsolete properties protocol.plugin.check.blocking and
protocol.plugin.check.robots

--
[...truncated 515 lines...]
init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: index-metadata

jar:

deps-test:

deploy:

copy-generated-lib:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: index-static

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: index-metadata

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: index-links

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: mimetype-filter

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: indexer-cloudsearch

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: indexer-dummy

jar:

deps-test:

deploy:

copy-generated-lib:

init:

init-plugin:

deps-jar:

clean-lib:

resolve-default:
[ivy:resolve] :: loading settings :: file = 


compile:
 [echo] Compiling plugin: indexer-elastic
[javac] Compiling 1 source file to 

[javac] :96: error: cannot find symbol
[javac]   .setBackoffPolicy(BackoffPolicy.exponentialBackoff(
[javac]   ^
[javac]   symbol:   method setBackoffPolicy(BackoffPolicy)
[javac]   location: class Builder
[javac] :107: error: cannot find symbol
[javac] Settings.Builder settingsBuilder = Settings.settingsBuilder();
[javac]^
[javac]   symbol:   method settingsBuilder()
[javac]   location: interface Settings
[javac] :119: error: cannot find symbol
[javac]   settingsBuilder.put(parts[0].trim(), parts[1].trim());
[javac]  ^
[javac]   symbol:   method put(String,String)
[javac]   location: variable settingsBuilder of type Builder
[javac] :126: error: cannot find symbol
[javac]   settingsBuilder.put("cluster.name", clusterName);
[javac]  ^
[javac]   symbol:   method put(String,String)
[javac]   location: variable settingsBuilder of type Builder
[javac] :134: error: cannot find symbol
[javac]   TransportClient transportClient = TransportClient.builder().settings(settings).build();
[javac]^
[javac]   symbol:   method builder()
[javac]   location: class TransportClient
[javac] 5 errors

BUILD FAILED
:116: The following error occurred while executing this line:
:43: The following error occurred while executing this line:
:133: Compile failed; see the compiler 
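
The unresolved symbols above all come from the Elasticsearch client API, and
calls like Settings.settingsBuilder() and TransportClient.builder() belong to
the 2.x-era API, so the failure looks like a mismatch between the major version
the indexer-elastic source targets and the Elasticsearch jar on the build
classpath. Purely as a hedged illustration (these names come from the
Elasticsearch 5.x transport client, not from the Nutch code), the equivalent
client setup would look roughly like this:

{code}
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class EsClientSketch {
  // Hypothetical helper; names follow the Elasticsearch 5.x client API only.
  public static TransportClient buildClient(String clusterName, String host,
      int port) throws Exception {
    Settings settings = Settings.builder()        // 5.x counterpart of Settings.settingsBuilder()
        .put("cluster.name", clusterName)         // Settings.Builder.put(String, String)
        .build();
    return new PreBuiltTransportClient(settings)  // 5.x counterpart of TransportClient.builder()
        .addTransportAddress(
            new InetSocketTransportAddress(InetAddress.getByName(host), port));
  }
}
{code}

Either direction of such a version mismatch produces "cannot find symbol"
errors like the ones above.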

[jira] [Resolved] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel resolved NUTCH-2299.

Resolution: Fixed

> Remove obsolete properties protocol.plugin.check.*
> --
>
> Key: NUTCH-2299
> URL: https://issues.apache.org/jira/browse/NUTCH-2299
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher, protocol
>Affects Versions: 1.12
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.13
>
>
> There are two properties {{protocol.plugin.check.blocking}} and 
> {{protocol.plugin.check.robots}} not used anymore since NUTCH-876. They can 
> be removed from Fetcher and Protocol.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423214#comment-15423214
 ] 

ASF GitHub Bot commented on NUTCH-2299:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/140


> Remove obsolete properties protocol.plugin.check.*
> --
>
> Key: NUTCH-2299
> URL: https://issues.apache.org/jira/browse/NUTCH-2299
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher, protocol
>Affects Versions: 1.12
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.13
>
>
> There are two properties {{protocol.plugin.check.blocking}} and 
> {{protocol.plugin.check.robots}} not used anymore since NUTCH-876. They can 
> be removed from Fetcher and Protocol.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] nutch pull request #140: NUTCH-2299 Remove obsolete properties protocol.plug...

2016-08-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/140


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Work started] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on NUTCH-2299 started by Sebastian Nagel.
--
> Remove obsolete properties protocol.plugin.check.*
> --
>
> Key: NUTCH-2299
> URL: https://issues.apache.org/jira/browse/NUTCH-2299
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher, protocol
>Affects Versions: 1.12
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.13
>
>
> There are two properties {{protocol.plugin.check.blocking}} and 
> {{protocol.plugin.check.robots}} not used anymore since NUTCH-876. They can 
> be removed from Fetcher and Protocol.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (NUTCH-2299) Remove obsolete properties protocol.plugin.check.*

2016-08-16 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel reassigned NUTCH-2299:
--

Assignee: Sebastian Nagel

> Remove obsolete properties protocol.plugin.check.*
> --
>
> Key: NUTCH-2299
> URL: https://issues.apache.org/jira/browse/NUTCH-2299
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher, protocol
>Affects Versions: 1.12
>Reporter: Sebastian Nagel
>Assignee: Sebastian Nagel
>Priority: Trivial
> Fix For: 1.13
>
>
> There are two properties {{protocol.plugin.check.blocking}} and 
> {{protocol.plugin.check.robots}} not used anymore since NUTCH-876. They can 
> be removed from Fetcher and Protocol.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (NUTCH-2269) Clean not working after crawl

2016-08-16 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423152#comment-15423152
 ] 

Lewis John McGibbney commented on NUTCH-2269:
-

{code}
2016-08-12 09:05:05,414 WARN  output.FileOutputCommitter - Output Path is null 
in setupJob()
2016-08-12 09:05:06,000 INFO  indexer.IndexWriters - Adding 
org.apache.nutch.indexwriter.solr.SolrIndexWriter
2016-08-12 09:05:06,132 INFO  solr.SolrMappingReader - source: content dest: 
content
2016-08-12 09:05:06,132 INFO  solr.SolrMappingReader - source: title dest: title
2016-08-12 09:05:06,132 INFO  solr.SolrMappingReader - source: host dest: host
2016-08-12 09:05:06,132 INFO  solr.SolrMappingReader - source: segment dest: 
segment
2016-08-12 09:05:06,132 INFO  solr.SolrMappingReader - source: boost dest: boost
2016-08-12 09:05:06,132 INFO  solr.SolrMappingReader - source: digest dest: 
digest
2016-08-12 09:05:06,132 INFO  solr.SolrMappingReader - source: tstamp dest: 
tstamp
2016-08-12 09:05:06,145 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 
198/198 documents
2016-08-12 09:05:06,303 WARN  output.FileOutputCommitter - Output Path is null 
in cleanupJob()
2016-08-12 09:05:06,304 WARN  mapred.LocalJobRunner - job_local2045546135_0001
java.lang.Exception: java.lang.IllegalStateException: Connection pool shut down
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.IllegalStateException: Connection pool shut down
at org.apache.http.util.Asserts.check(Asserts.java:34)
at 
org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
at 
org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
at 
org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483)
at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:464)
at 
org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:190)
at 
org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:178)
at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
at 
org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:120)
at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-08-12 09:05:06,434 ERROR indexer.CleaningJob - CleaningJob: 
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.nutch.indexer.CleaningJob.delete(CleaningJob.java:172)
at org.apache.nutch.indexer.CleaningJob.run(CleaningJob.java:195)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.CleaningJob.main(CleaningJob.java:206)
{code}
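
In this trace the connection pool is already shut down by the time
SolrIndexWriter.commit() is invoked from SolrIndexWriter.close() inside
CleaningJob$DeleterReducer.close(). Whatever shuts the pool down first, the
exception itself is easy to reproduce in isolation with SolrJ; a minimal sketch
(hypothetical core URL) that produces the same message:

{code}
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class PoolShutDownRepro {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URL, for illustration only.
    HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/nutch");
    solr.close();   // shuts down the underlying HTTP connection pool
    solr.commit();  // java.lang.IllegalStateException: Connection pool shut down
  }
}
{code}

Any commit or other request issued through a SolrClient whose connection
manager has already been released fails exactly like this, which is consistent
with a close/commit ordering problem in the cleaning path.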

> Clean not working after crawl
> -
>
> Key: NUTCH-2269
> URL: https://issues.apache.org/jira/browse/NUTCH-2269
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Vagrant, Ubuntu, Java 8, Solr 4.10
>Reporter: Francesco Capponi
> Fix For: 1.13
>
>
> 

[jira] [Commented] (NUTCH-2269) Clean not working after crawl

2016-08-16 Thread Kris (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423131#comment-15423131
 ] 

Kris commented on NUTCH-2269:
-

Encountered the same issue with clean, going from Nutch 1.12 to Solr 5.4.1.

./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/ARLInside urls/ crawlARLInside -1

produces:

2016-08-12 09:05:06,434 ERROR indexer.CleaningJob - CleaningJob: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at org.apache.nutch.indexer.CleaningJob.delete(CleaningJob.java:172)
at org.apache.nutch.indexer.CleaningJob.run(CleaningJob.java:195)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.nutch.indexer.CleaningJob.main(CleaningJob.java:206)

Crawl dump = 

2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - TOTAL urls: 4304
2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - retry 0:    4303
2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - retry 1:    1
2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - min score:  0.0
2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - avg score:  6.347584E-4
2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - max score:  1.011
2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - status 1 (db_unfetched):    139
2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - status 2 (db_fetched):      3966
2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - status 3 (db_gone): 66
2016-08-12 09:26:07,035 INFO  crawl.CrawlDbReader - status 5 (db_redir_perm):   1
2016-08-12 09:26:07,036 INFO  crawl.CrawlDbReader - status 7 (db_duplicate):    132
2016-08-12 09:26:07,036 INFO  crawl.CrawlDbReader - CrawlDb statistics: done

> Clean not working after crawl
> -
>
> Key: NUTCH-2269
> URL: https://issues.apache.org/jira/browse/NUTCH-2269
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Vagrant, Ubuntu, Java 8, Solr 4.10
>Reporter: Francesco Capponi
> Fix For: 1.13
>
>
> I have been having this problem for a while and I had to roll back to using
> the old solr clean instead of the newer version.
> Once every document has been inserted/updated correctly in Nutch, when it
> tries to clean, it returns error 255:
> {quote}
> 2016-05-30 10:13:04,992 WARN  output.FileOutputCommitter - Output Path is 
> null in setupJob()
> 2016-05-30 10:13:07,284 INFO  indexer.IndexWriters - Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: content dest: 
> content
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: title dest: 
> title
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: host dest: host
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: segment dest: 
> segment
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: boost dest: 
> boost
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: digest dest: 
> digest
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: tstamp dest: 
> tstamp
> 2016-05-30 10:13:08,133 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 
> 15/15 documents
> 2016-05-30 10:13:08,919 WARN  output.FileOutputCommitter - Output Path is 
> null in cleanupJob()
> 2016-05-30 10:13:08,937 WARN  mapred.LocalJobRunner - job_local662730477_0001
> java.lang.Exception: java.lang.IllegalStateException: Connection pool shut 
> down
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.IllegalStateException: Connection pool shut down
>   at org.apache.http.util.Asserts.check(Asserts.java:34)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
>   at 
> org.apache.sol

[jira] [Commented] (NUTCH-2269) Clean not working after crawl

2016-08-16 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15423108#comment-15423108
 ] 

Lewis John McGibbney commented on NUTCH-2269:
-

[~wastl-nagel] said

bq. Are you able to reproduce the problem with the correct Solr version?

It looks like we are able to reproduce this against Solr 5.4.1, using Nutch
1.12. I am going to try against the master branch and see if this is still the
case.

> Clean not working after crawl
> -
>
> Key: NUTCH-2269
> URL: https://issues.apache.org/jira/browse/NUTCH-2269
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Vagrant, Ubuntu, Java 8, Solr 4.10
>Reporter: Francesco Capponi
> Fix For: 1.13
>
>
> I have been having this problem for a while and I had to roll back to using
> the old solr clean instead of the newer version.
> Once every document has been inserted/updated correctly in Nutch, when it
> tries to clean, it returns error 255:
> {quote}
> 2016-05-30 10:13:04,992 WARN  output.FileOutputCommitter - Output Path is 
> null in setupJob()
> 2016-05-30 10:13:07,284 INFO  indexer.IndexWriters - Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: content dest: 
> content
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: title dest: 
> title
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: host dest: host
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: segment dest: 
> segment
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: boost dest: 
> boost
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: digest dest: 
> digest
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: tstamp dest: 
> tstamp
> 2016-05-30 10:13:08,133 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 
> 15/15 documents
> 2016-05-30 10:13:08,919 WARN  output.FileOutputCommitter - Output Path is 
> null in cleanupJob()
> 2016-05-30 10:13:08,937 WARN  mapred.LocalJobRunner - job_local662730477_0001
> java.lang.Exception: java.lang.IllegalStateException: Connection pool shut 
> down
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.IllegalStateException: Connection pool shut down
>   at org.apache.http.util.Asserts.check(Asserts.java:34)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
>   at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
>   at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483)
>   at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:464)
>   at 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:190)
>   at 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:178)
>   at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
>   at 
> org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:120)
>   at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.uti

[jira] [Updated] (NUTCH-2269) Clean not working after crawl

2016-08-16 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2269:

Fix Version/s: 1.13

> Clean not working after crawl
> -
>
> Key: NUTCH-2269
> URL: https://issues.apache.org/jira/browse/NUTCH-2269
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Vagrant, Ubuntu, Java 8, Solr 4.10
>Reporter: Francesco Capponi
> Fix For: 1.13
>
>
> I have been having this problem for a while and I had to roll back to using
> the old solr clean instead of the newer version.
> Once every document has been inserted/updated correctly in Nutch, when it
> tries to clean, it returns error 255:
> {quote}
> 2016-05-30 10:13:04,992 WARN  output.FileOutputCommitter - Output Path is 
> null in setupJob()
> 2016-05-30 10:13:07,284 INFO  indexer.IndexWriters - Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: content dest: 
> content
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: title dest: 
> title
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: host dest: host
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: segment dest: 
> segment
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: boost dest: 
> boost
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: digest dest: 
> digest
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: tstamp dest: 
> tstamp
> 2016-05-30 10:13:08,133 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 
> 15/15 documents
> 2016-05-30 10:13:08,919 WARN  output.FileOutputCommitter - Output Path is 
> null in cleanupJob()
> 2016-05-30 10:13:08,937 WARN  mapred.LocalJobRunner - job_local662730477_0001
> java.lang.Exception: java.lang.IllegalStateException: Connection pool shut 
> down
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.IllegalStateException: Connection pool shut down
>   at org.apache.http.util.Asserts.check(Asserts.java:34)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
>   at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
>   at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483)
>   at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:464)
>   at 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:190)
>   at 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:178)
>   at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
>   at 
> org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:120)
>   at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-05-30 10:13:09,299 ERROR indexer.CleaningJob - CleaningJob: 
> java.io.IOException: Job failed!
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)