[jira] [Commented] (NUTCH-2269) Clean not working after crawl

2016-06-27 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351824#comment-15351824
 ] 

Sebastian Nagel commented on NUTCH-2269:


Thanks for reporting the problems. As far as I can see, they can be solved by 
using "clean" the right way in combination with the required Solr version:
# "nutch clean" will not run on the linkdb:
#* the command-line help is clear
{noformat}
% bin/nutch clean
Usage: CleaningJob <crawldb> [-noCommit]
{noformat}
#* and also the error message gives a clear hint:
{noformat}
java.lang.Exception: java.lang.ClassCastException: org.apache.nutch.crawl.Inlinks cannot be cast to org.apache.nutch.crawl.CrawlDatum
...
2016-06-27 22:00:09,628 ERROR indexer.CleaningJob - CleaningJob: java.io.IOException: Job failed!
...
2016-06-27 22:00:52,057 ERROR indexer.CleaningJob - Missing crawldb. Usage: CleaningJob <crawldb> [-noCommit]
{noformat}
#* unfortunately, both CrawlDb and LinkDb are formally map files which makes it 
difficult to check the right usage in advance.
# I was able to reproduce the error "IllegalStateException: Connection pool 
shut down" when using Nutch 1.12 in combination with Solr 4.10.4. However, 
Nutch 1.12 is built against Solr 5.4.1, so the version mismatch is probably 
the cause. Are you able to reproduce the problem with the correct Solr version?
# The message
{noformat}
WARN output.FileOutputCommitter - Output Path is null in commitJob()
{noformat}
is only a warning, not a problem: the cleaning job is a map-reduce job 
without output; deletions are sent to the Solr server. It's uncommon for a 
map-reduce job to have no output, but it is not a problem.
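
The difficulty mentioned in the first point can be illustrated with a small sketch. The data file inside a map file is a Hadoop SequenceFile, and its header records the key and value class names. The code below is not what CleaningJob does; it is only an illustration, and it assumes the usual header layout (the bytes SEQ, one version byte, then two length-prefixed class names) and class names shorter than 128 bytes. The class SeqHeaderSketch and all of its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: inspects the first bytes of a SequenceFile "data" file. */
final class SeqHeaderSketch {

  /** Reads a string prefixed by a single length byte (0..127 only). */
  private static String readShortString(ByteBuffer buf) {
    if (!buf.hasRemaining()) {
      throw new IllegalArgumentException("truncated header");
    }
    int len = buf.get();
    if (len < 0 || len > buf.remaining()) {
      throw new IllegalArgumentException("length not handled by this sketch: " + len);
    }
    byte[] bytes = new byte[len];
    buf.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  /** Returns the value class name recorded in the header bytes. */
  static String valueClassName(byte[] header) {
    ByteBuffer buf = ByteBuffer.wrap(header);
    if (buf.remaining() < 4 || buf.get() != 'S' || buf.get() != 'E' || buf.get() != 'Q') {
      throw new IllegalArgumentException("not a SequenceFile header");
    }
    buf.get();             // version byte, ignored in this sketch
    readShortString(buf);  // key class name, skipped
    return readShortString(buf);
  }

  /** True if the header declares CrawlDatum values, which "nutch clean" expects. */
  static boolean looksLikeCrawlDb(byte[] header) {
    return "org.apache.nutch.crawl.CrawlDatum".equals(valueClassName(header));
  }

  /** Builds a synthetic header for experiments; this is not a complete SequenceFile. */
  static byte[] fakeHeader(String keyClass, String valueClass) {
    byte[] k = keyClass.getBytes(StandardCharsets.UTF_8);
    byte[] v = valueClass.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(4 + 1 + k.length + 1 + v.length);
    buf.put((byte) 'S').put((byte) 'E').put((byte) 'Q').put((byte) 6);
    buf.put((byte) k.length).put(k).put((byte) v.length).put(v);
    return buf.array();
  }

  public static void main(String[] args) {
    byte[] linkDb = fakeHeader("org.apache.hadoop.io.Text", "org.apache.nutch.crawl.Inlinks");
    System.out.println("looks like a CrawlDb: " + looksLikeCrawlDb(linkDb));
  }
}
```

In a real check the header bytes would come from a file such as current/part-00000/data under the directory passed on the command line; that path layout is an assumption here as well.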

> Clean not working after crawl
> -
>
> Key: NUTCH-2269
> URL: https://issues.apache.org/jira/browse/NUTCH-2269
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Vagrant, Ubuntu, Java 8, Solr 4.10
>Reporter: Francesco Capponi
>
> I have been having this problem for a while and I had to roll back to using 
> the old solr clean instead of the newer version. 
> Once it has correctly inserted/updated every document in Nutch, when it tries 
> to clean, it returns error 255:
> {quote}
> 2016-05-30 10:13:04,992 WARN  output.FileOutputCommitter - Output Path is 
> null in setupJob()
> 2016-05-30 10:13:07,284 INFO  indexer.IndexWriters - Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: content dest: 
> content
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: title dest: 
> title
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: host dest: host
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: segment dest: 
> segment
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: boost dest: 
> boost
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: digest dest: 
> digest
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: tstamp dest: 
> tstamp
> 2016-05-30 10:13:08,133 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 
> 15/15 documents
> 2016-05-30 10:13:08,919 WARN  output.FileOutputCommitter - Output Path is 
> null in cleanupJob()
> 2016-05-30 10:13:08,937 WARN  mapred.LocalJobRunner - job_local662730477_0001
> java.lang.Exception: java.lang.IllegalStateException: Connection pool shut 
> down
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.IllegalStateException: Connection pool shut down
>   at org.apache.http.util.Asserts.check(Asserts.java:34)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
>   at 
> org.apache.solr.client.solrj.SolrRequest.proces

[jira] [Issue Comment Deleted] (NUTCH-2269) Clean not working after crawl

2016-06-27 Thread Sebastian Nagel (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sebastian Nagel updated NUTCH-2269:
---
Comment: was deleted

(was: The message
{noformat}
WARN output.FileOutputCommitter - Output Path is null in commitJob()
{noformat}
is only a warning and no problem: Indeed, the cleaning job is a map-reduce job 
without output, deletions are sent to the Solr server.  That's uncommon but not 
a problem.)

> Clean not working after crawl
> -
>
> Key: NUTCH-2269
> URL: https://issues.apache.org/jira/browse/NUTCH-2269
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: Vagrant, Ubuntu, Java 8, Solr 4.10
>Reporter: Francesco Capponi
>
> I have been having this problem for a while and I had to roll back to using 
> the old solr clean instead of the newer version. 
> Once it has correctly inserted/updated every document in Nutch, when it tries 
> to clean, it returns error 255:
> {quote}
> 2016-05-30 10:13:04,992 WARN  output.FileOutputCommitter - Output Path is 
> null in setupJob()
> 2016-05-30 10:13:07,284 INFO  indexer.IndexWriters - Adding 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: content dest: 
> content
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: title dest: 
> title
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: host dest: host
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: segment dest: 
> segment
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: boost dest: 
> boost
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: digest dest: 
> digest
> 2016-05-30 10:13:08,114 INFO  solr.SolrMappingReader - source: tstamp dest: 
> tstamp
> 2016-05-30 10:13:08,133 INFO  solr.SolrIndexWriter - SolrIndexer: deleting 
> 15/15 documents
> 2016-05-30 10:13:08,919 WARN  output.FileOutputCommitter - Output Path is 
> null in cleanupJob()
> 2016-05-30 10:13:08,937 WARN  mapred.LocalJobRunner - job_local662730477_0001
> java.lang.Exception: java.lang.IllegalStateException: Connection pool shut 
> down
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
> Caused by: java.lang.IllegalStateException: Connection pool shut down
>   at org.apache.http.util.Asserts.check(Asserts.java:34)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169)
>   at 
> org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202)
>   at 
> org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184)
>   at 
> org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415)
>   at 
> org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>   at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
>   at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
>   at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
>   at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483)
>   at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:464)
>   at 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:190)
>   at 
> org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:178)
>   at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
>   at 
> org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:120)
>   at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo

[jira] [Resolved] (NUTCH-2022) Investigate better documentation for the Nutch REST API's

2016-06-27 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2022.
-
Resolution: Fixed

> Investigate better documentation for the Nutch REST API's
> -
>
> Key: NUTCH-2022
> URL: https://issues.apache.org/jira/browse/NUTCH-2022
> Project: Nutch
>  Issue Type: Wish
>  Components: REST_api
>Affects Versions: 2.3, 1.10
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.4, 1.11
>
>
> Over on Apache Tika we use [Miredot|http://www.miredot.com/] for better 
> representation of the Tika REST API.
> Based on recent development on both 1.X and 2.x REST API's, it would be nice 
> to have a better interface for people to see.
> An example of Miredot REST API docs can be seen on [Tika REST API 
> docs|http://tika.apache.org/1.8/miredot/index.html]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (NUTCH-2022) Investigate better documentation for the Nutch REST API's

2016-06-27 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2022:

Fix Version/s: 1.11
   2.4

> Investigate better documentation for the Nutch REST API's
> -
>
> Key: NUTCH-2022
> URL: https://issues.apache.org/jira/browse/NUTCH-2022
> Project: Nutch
>  Issue Type: Wish
>  Components: REST_api
>Affects Versions: 2.3, 1.10
>Reporter: Lewis John McGibbney
>Assignee: Furkan KAMACI
> Fix For: 2.4, 1.11
>
>
> Over on Apache Tika we use [Miredot|http://www.miredot.com/] for better 
> representation of the Tika REST API.
> Based on recent development on both 1.X and 2.x REST API's, it would be nice 
> to have a better interface for people to see.
> An example of Miredot REST API docs can be seen on [Tika REST API 
> docs|http://tika.apache.org/1.8/miredot/index.html]





[jira] [Commented] (NUTCH-2243) Documentation for Nutch 2.X REST API

2016-06-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351610#comment-15351610
 ] 

Hudson commented on NUTCH-2243:
---

SUCCESS: Integrated in Nutch-nutchgora #1562 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1562/])
NUTCH-2243 REST API documentation for Nutch 2.X (furkankamaci: rev 
728d0de8bac399ac8dff5d0a0eee89f5c53428b9)
* ivy/mvn.template
* build.xml
NUTCH-2243 Miredot plugin version and licence configuration are updated. 
(furkankamaci: rev 6f7ca5bc0b08c3e3f8f4aa23e4924ad159d7222f)
* ivy/mvn.template


> Documentation for Nutch 2.X REST API
> 
>
> Key: NUTCH-2243
> URL: https://issues.apache.org/jira/browse/NUTCH-2243
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> This issue should build on NUTCH-1769 with full Java documentation for all 
> classes in the following packages:
> org.apache.nutch.api.*
> for Nutch 2.x as done at NUTCH-1800 for Nutch 1.x 





[jira] [Commented] (NUTCH-2282) Incorrect content-type returned in 4 API calls

2016-06-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351609#comment-15351609
 ] 

Hudson commented on NUTCH-2282:
---

SUCCESS: Integrated in Nutch-nutchgora #1562 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1562/])
fix for NUTCH-2282 contributed by lancergr (g.adam: rev 
98814a0270cfc173998366618171f746ec4d304b)
* src/java/org/apache/nutch/api/resources/AdminResource.java
* src/java/org/apache/nutch/api/resources/ConfigResource.java
* src/java/org/apache/nutch/api/resources/JobResource.java


> Incorrect content-type returned in 4 API calls
> --
>
> Key: NUTCH-2282
> URL: https://issues.apache.org/jira/browse/NUTCH-2282
> Project: Nutch
>  Issue Type: Bug
>  Components: REST_api
>Affects Versions: 2.3.1
>Reporter: Giorgos Adam
>Assignee: Lewis John McGibbney
>Priority: Trivial
> Fix For: 2.4
>
>
> The REST API returns 'Content-type: application/json' instead of text/plain 
> (at least) on the following calls:
> 1. GET /admin/stop
> 2. GET /config/:configId/:property
> 3. POST /config/create
> 4. POST /job/create
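
A check of the kind this issue implies can be sketched as follows. It is an illustration only: ContentTypeSketch and its methods are hypothetical names, and the header strings used are sample inputs, not captured server output. The sketch strips any parameters (such as a charset) from a Content-Type value and compares the bare media type against text/plain.

```java
import java.util.Locale;

/** Hypothetical helper for comparing Content-Type header values. */
final class ContentTypeSketch {

  /** Extracts the bare media type, dropping parameters such as "; charset=UTF-8". */
  static String mediaType(String headerValue) {
    if (headerValue == null) {
      return "";
    }
    int semi = headerValue.indexOf(';');
    String type = semi >= 0 ? headerValue.substring(0, semi) : headerValue;
    return type.trim().toLowerCase(Locale.ROOT);
  }

  /** True if the header value denotes text/plain, ignoring case and parameters. */
  static boolean isPlainText(String headerValue) {
    return "text/plain".equals(mediaType(headerValue));
  }

  public static void main(String[] args) {
    // Sample values only; a real check would read the header from an HTTP response.
    System.out.println(isPlainText("text/plain; charset=UTF-8"));
    System.out.println(isPlainText("application/json"));
  }
}
```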





[jira] [Commented] (NUTCH-2289) SSL Support for REST API

2016-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351553#comment-15351553
 ] 

ASF GitHub Bot commented on NUTCH-2289:
---

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/128#discussion_r68630587
  
--- Diff: src/java/org/apache/nutch/api/NutchServer.java ---
@@ -99,17 +114,70 @@ public NutchServer() {
 component = new Component();
 component.getLogger().setLevel(Level.parse(logLevel));
 
-// Add a new HTTP server listening on defined port.
-component.getServers().add(Protocol.HTTP, port);
+AuthenticationTypeEnum authenticationType = configManager.get(ConfigResource.DEFAULT)
+.getEnum("restapi.auth", AuthenticationTypeEnum.NONE);
+
+if (authenticationType == AuthenticationTypeEnum.SSL) {
+  // Add a new HTTPS server listening on defined port.
+  Server server = component.getServers().add(Protocol.HTTPS, port);
+
+  Series<Parameter> parameters = server.getContext().getParameters();
+  parameters.add("sslContextFactory", "org.restlet.engine.ssl.DefaultSslContextFactory");
+
+  String keyStorePath = configManager.get(ConfigResource.DEFAULT)
+  .get("restapi.auth.ssl.storepath", "etc/nutch-ssl.keystore.jks");
+  parameters.add("keyStorePath", keyStorePath);
+
+  String keyStorePassword = configManager.get(ConfigResource.DEFAULT)
+  .get("restapi.auth.ssl.storepass", "password");
+  parameters.add("keyStorePassword", keyStorePassword);
+
+  String keyPassword = configManager.get(ConfigResource.DEFAULT)
+  .get("restapi.auth.ssl.keypass", "password");
+  parameters.add("keyPassword", keyPassword);
+
+  parameters.add("keyStoreType", "JKS");
+  LOG.info("SSL Authentication is set for NutchServer");
+} else {
+  // Add a new HTTP server listening on defined port.
+  component.getServers().add(Protocol.HTTP, port);
+}
 
 Context childContext = component.getContext().createChildContext();
 JaxRsApplication application = new JaxRsApplication(childContext);
 application.add(this);
 application.setStatusService(new ErrorStatusService());
 childContext.getAttributes().put(NUTCH_SERVER, this);
 
-// Attach the application.
-component.getDefaultHost().attach(application);
+if (authenticationType == AuthenticationTypeEnum.NONE || authenticationType == AuthenticationTypeEnum.SSL ) {
+  component.getDefaultHost().attach(application);
+  return;
+}
+
+String username = configManager.get(ConfigResource.DEFAULT).get("restapi.auth.username", "admin");
+String password = configManager.get(ConfigResource.DEFAULT).get("restapi.auth.password", "nutch");
+
+MapVerifier mapVerifier = new MapVerifier();
+mapVerifier.getLocalSecrets().put(username, password.toCharArray());
+
+if (authenticationType == AuthenticationTypeEnum.BASIC) {
--- End diff --

Please see comment on other issue for use of switch block.
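
The SSL branch of this diff reads three settings and falls back to defaults when they are absent. A minimal sketch of that lookup, using java.util.Properties in place of the Nutch configuration object, might look like the following. The property keys, parameter names and default values are copied from the diff; the class SslParametersSketch and the use of Properties are assumptions made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical stand-in for the SSL parameter setup shown in the diff. */
final class SslParametersSketch {

  /** Collects the server parameters, using the diff's defaults for missing keys. */
  static Map<String, String> sslParameters(Properties conf) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("sslContextFactory", "org.restlet.engine.ssl.DefaultSslContextFactory");
    params.put("keyStorePath",
        conf.getProperty("restapi.auth.ssl.storepath", "etc/nutch-ssl.keystore.jks"));
    params.put("keyStorePassword",
        conf.getProperty("restapi.auth.ssl.storepass", "password"));
    params.put("keyPassword",
        conf.getProperty("restapi.auth.ssl.keypass", "password"));
    params.put("keyStoreType", "JKS");
    return params;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty("restapi.auth.ssl.storepath", "/tmp/example.jks"); // example value
    sslParameters(conf).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```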


> SSL Support for REST API
> 
>
> Key: NUTCH-2289
> URL: https://issues.apache.org/jira/browse/NUTCH-2289
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>
> Add SSL Authentication for Nutch REST API.





[GitHub] nutch pull request #128: NUTCH-2289 SSL support for Nutch 2.X REST API.

2016-06-27 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/128#discussion_r68630587
  
--- Diff: src/java/org/apache/nutch/api/NutchServer.java ---
@@ -99,17 +114,70 @@ public NutchServer() {
 component = new Component();
 component.getLogger().setLevel(Level.parse(logLevel));
 
-// Add a new HTTP server listening on defined port.
-component.getServers().add(Protocol.HTTP, port);
+AuthenticationTypeEnum authenticationType = configManager.get(ConfigResource.DEFAULT)
+.getEnum("restapi.auth", AuthenticationTypeEnum.NONE);
+
+if (authenticationType == AuthenticationTypeEnum.SSL) {
+  // Add a new HTTPS server listening on defined port.
+  Server server = component.getServers().add(Protocol.HTTPS, port);
+
+  Series<Parameter> parameters = server.getContext().getParameters();
+  parameters.add("sslContextFactory", "org.restlet.engine.ssl.DefaultSslContextFactory");
+
+  String keyStorePath = configManager.get(ConfigResource.DEFAULT)
+  .get("restapi.auth.ssl.storepath", "etc/nutch-ssl.keystore.jks");
+  parameters.add("keyStorePath", keyStorePath);
+
+  String keyStorePassword = configManager.get(ConfigResource.DEFAULT)
+  .get("restapi.auth.ssl.storepass", "password");
+  parameters.add("keyStorePassword", keyStorePassword);
+
+  String keyPassword = configManager.get(ConfigResource.DEFAULT)
+  .get("restapi.auth.ssl.keypass", "password");
+  parameters.add("keyPassword", keyPassword);
+
+  parameters.add("keyStoreType", "JKS");
+  LOG.info("SSL Authentication is set for NutchServer");
+} else {
+  // Add a new HTTP server listening on defined port.
+  component.getServers().add(Protocol.HTTP, port);
+}
 
 Context childContext = component.getContext().createChildContext();
 JaxRsApplication application = new JaxRsApplication(childContext);
 application.add(this);
 application.setStatusService(new ErrorStatusService());
 childContext.getAttributes().put(NUTCH_SERVER, this);
 
-// Attach the application.
-component.getDefaultHost().attach(application);
+if (authenticationType == AuthenticationTypeEnum.NONE || authenticationType == AuthenticationTypeEnum.SSL ) {
+  component.getDefaultHost().attach(application);
+  return;
+}
+
+String username = configManager.get(ConfigResource.DEFAULT).get("restapi.auth.username", "admin");
+String password = configManager.get(ConfigResource.DEFAULT).get("restapi.auth.password", "nutch");
+
+MapVerifier mapVerifier = new MapVerifier();
+mapVerifier.getLocalSecrets().put(username, password.toCharArray());
+
+if (authenticationType == AuthenticationTypeEnum.BASIC) {
--- End diff --

Please see comment on other issue for use of switch block.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (NUTCH-2289) SSL Support for REST API

2016-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351549#comment-15351549
 ] 

ASF GitHub Bot commented on NUTCH-2289:
---

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/128#discussion_r68630436
  
--- Diff: conf/nutch-default.xml ---
@@ -1435,4 +1435,60 @@
   
 
 
+<property>
+  <name>restapi.auth</name>
+  <value>NONE</value>
+  <description>
+Configures authentication type for communicating with RESTAPI. Valid values are BASIC, DIGEST, SSL and NONE.
+When no authentication type is defined NONE will be used as default which does not provide security.
+Use the restapi.auth.username and restapi.auth.auth.password properties to configure
--- End diff --

Typo again
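
The property description in this diff implies a lookup that falls back to NONE when nothing is configured. A minimal sketch of such a lookup is given below. The enum values are the ones listed in the description; the class AuthTypeSketch, the trimming and the upper-casing are assumptions for illustration and are not taken from the pull request.

```java
import java.util.Locale;

/** Hypothetical parser for the restapi.auth setting. */
final class AuthTypeSketch {

  /** The four values named in the property description. */
  enum AuthType { NONE, BASIC, DIGEST, SSL }

  /** Returns NONE for a missing or blank value; rejects unknown values. */
  static AuthType parse(String configured) {
    if (configured == null || configured.trim().isEmpty()) {
      return AuthType.NONE;
    }
    // Enum.valueOf throws IllegalArgumentException for names outside the enum.
    return AuthType.valueOf(configured.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println(parse(null));
    System.out.println(parse(" digest "));
  }
}
```

Rejecting unknown values outright, rather than silently falling back to NONE, is a deliberate choice in this sketch: a misspelled setting would otherwise leave the API without authentication.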


> SSL Support for REST API
> 
>
> Key: NUTCH-2289
> URL: https://issues.apache.org/jira/browse/NUTCH-2289
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>
> Add SSL Authentication for Nutch REST API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] nutch pull request #128: NUTCH-2289 SSL support for Nutch 2.X REST API.

2016-06-27 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/128#discussion_r68630436
  
--- Diff: conf/nutch-default.xml ---
@@ -1435,4 +1435,60 @@
   
 
 
+
+  restapi.auth
+  NONE
+  
+Configures authentication type for communicating with RESTAPI. Valid 
values are BASIC, DIGEST, SSL and NONE.
+When no authentication type is defined NONE will be used as default 
which does not provide security.
+Use the restapi.auth.username and restapi.auth.auth.password 
properties to configure
--- End diff --

Typo again: restapi.auth.auth.password should be restapi.auth.password.




[GitHub] nutch pull request #126: NUTCH-2285 Digest Authentication support for Nutch ...

2016-06-27 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/126#discussion_r68630240
  
--- Diff: src/java/org/apache/nutch/api/NutchServer.java ---
@@ -108,8 +118,41 @@ public NutchServer() {
 application.setStatusService(new ErrorStatusService());
 childContext.getAttributes().put(NUTCH_SERVER, this);
 
-// Attach the application.
-component.getDefaultHost().attach(application);
+AuthenticationTypeEnum authenticationType = 
configManager.get(ConfigResource.DEFAULT)
+.getEnum("restapi.auth", AuthenticationTypeEnum.NONE);
+
+if (authenticationType == AuthenticationTypeEnum.NONE) {
--- End diff --

Please change this to the more efficient switch notation. Multiple ifs are 
messy, and JDK 1.8 has better switch support for string input values.
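
As a rough illustration of the suggestion above, the chain of `if (authenticationType == ...)` checks could be written as a single switch over the enum. This is only a sketch: the `AuthType` enum and the returned labels are hypothetical placeholders, not the code from the pull request. For reference, Java has supported switching on enum values since Java 5 and on strings since Java 7.

```java
class AuthSwitchSketch {
    // Hypothetical stand-in for the AuthenticationTypeEnum referenced in the diff.
    enum AuthType { NONE, BASIC, DIGEST, SSL }

    // One branch per authentication type; the strings are placeholders for
    // whatever server setup each branch would perform.
    static String setupFor(AuthType type) {
        switch (type) {
            case NONE:
                return "attach application without authentication";
            case BASIC:
                return "attach application behind HTTP basic authentication";
            case DIGEST:
                return "attach application behind HTTP digest authentication";
            case SSL:
                return "attach application over SSL";
            default:
                // Guards against enum constants added later without a branch.
                throw new IllegalStateException("Unhandled type: " + type);
        }
    }

    public static void main(String[] args) {
        for (AuthType type : AuthType.values()) {
            System.out.println(type + " -> " + setupFor(type));
        }
    }
}
```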




[jira] [Commented] (NUTCH-2285) Digest Authentication Support for REST API

2016-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351542#comment-15351542
 ] 

ASF GitHub Bot commented on NUTCH-2285:
---

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/126#discussion_r68630240
  
--- Diff: src/java/org/apache/nutch/api/NutchServer.java ---
@@ -108,8 +118,41 @@ public NutchServer() {
 application.setStatusService(new ErrorStatusService());
 childContext.getAttributes().put(NUTCH_SERVER, this);
 
-// Attach the application.
-component.getDefaultHost().attach(application);
+AuthenticationTypeEnum authenticationType = 
configManager.get(ConfigResource.DEFAULT)
+.getEnum("restapi.auth", AuthenticationTypeEnum.NONE);
+
+if (authenticationType == AuthenticationTypeEnum.NONE) {
--- End diff --

Please change this to the more efficient switch notation. Multiple ifs are 
messy, and JDK 1.8 has better switch support for string input values.


> Digest Authentication Support for REST API
> --
>
> Key: NUTCH-2285
> URL: https://issues.apache.org/jira/browse/NUTCH-2285
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>






[jira] [Commented] (NUTCH-2284) Basic Authentication Support for REST API

2016-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351530#comment-15351530
 ] 

ASF GitHub Bot commented on NUTCH-2284:
---

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/124#discussion_r68629674
  
--- Diff: src/java/org/apache/nutch/api/NutchServer.java ---
@@ -85,7 +88,12 @@
* well as the logging granularity. If the latter option is not provided 
via
* {@link org.apache.nutch.api.NutchServer#main(String[])} then it 
defaults to
* 'INFO' however best attempts should always be made to specify a 
logging
-   * level.
+   * level.
+   * {@link org.apache.nutch.api.NutchServer} can be run as secure. 
restapi.auth property
+   * should be set to true at nutch-site.xml to enable HTTP 
basic authentication
--- End diff --

In JDK 1.8 I think the relevant HTML encoding should be used e.g. 
nutch-site.xml


> Basic Authentication Support for REST API
> -
>
> Key: NUTCH-2284
> URL: https://issues.apache.org/jira/browse/NUTCH-2284
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>
> Add Basic Authentication for Nutch REST API.






[GitHub] nutch pull request #124: NUTCH-2284 Basic Authentication support for Nutch 2...

2016-06-27 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/124#discussion_r68629674
  
--- Diff: src/java/org/apache/nutch/api/NutchServer.java ---
@@ -85,7 +88,12 @@
* well as the logging granularity. If the latter option is not provided 
via
* {@link org.apache.nutch.api.NutchServer#main(String[])} then it 
defaults to
* 'INFO' however best attempts should always be made to specify a 
logging
-   * level.
+   * level.
+   * {@link org.apache.nutch.api.NutchServer} can be run as secure. 
restapi.auth property
+   * should be set to true at nutch-site.xml to enable HTTP 
basic authentication
--- End diff --

In JDK 1.8 I think the relevant HTML encoding should be used e.g. 
nutch-site.xml




[jira] [Commented] (NUTCH-2284) Basic Authentication Support for REST API

2016-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351528#comment-15351528
 ] 

ASF GitHub Bot commented on NUTCH-2284:
---

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/124#discussion_r68629394
  
--- Diff: conf/nutch-default.xml ---
@@ -1435,4 +1435,30 @@
   
 
 
+
+  restapi.auth
+  false
+  
+Whether to enable HTTP basic authentication for communicating with 
RESTAPI.
+Use the restapi.auth.username and restapi.auth.auth.password 
properties to configure
--- End diff --

Typo in restapi.auth.auth.password; it should be restapi.auth.password.


> Basic Authentication Support for REST API
> -
>
> Key: NUTCH-2284
> URL: https://issues.apache.org/jira/browse/NUTCH-2284
> Project: Nutch
>  Issue Type: Sub-task
>  Components: REST_api, web gui
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.5
>
>
> Add Basic Authentication for Nutch REST API.





[GitHub] nutch pull request #124: NUTCH-2284 Basic Authentication support for Nutch 2...

2016-06-27 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/124#discussion_r68629394
  
--- Diff: conf/nutch-default.xml ---
@@ -1435,4 +1435,30 @@
   
 
 
+
+  restapi.auth
+  false
+  
+Whether to enable HTTP basic authentication for communicating with 
RESTAPI.
+Use the restapi.auth.username and restapi.auth.auth.password 
properties to configure
--- End diff --

Typo in restapi.auth.auth.password; it should be restapi.auth.password.




[jira] [Updated] (NUTCH-2243) Documentation for Nutch 2.X REST API

2016-06-27 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2243:

Fix Version/s: (was: 2.5)
   2.4

> Documentation for Nutch 2.X REST API
> 
>
> Key: NUTCH-2243
> URL: https://issues.apache.org/jira/browse/NUTCH-2243
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> This issue should build on NUTCH-1769 with full Java documentation for all 
> classes in the following packages:
> org.apache.nutch.api.*
> for Nutch 2.x as done at NUTCH-1800 for Nutch 1.x 





[jira] [Commented] (NUTCH-2243) Documentation for Nutch 2.X REST API

2016-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351524#comment-15351524
 ] 

ASF GitHub Bot commented on NUTCH-2243:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/123


> Documentation for Nutch 2.X REST API
> 
>
> Key: NUTCH-2243
> URL: https://issues.apache.org/jira/browse/NUTCH-2243
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> This issue should build on NUTCH-1769 with full Java documentation for all 
> classes in the following packages:
> org.apache.nutch.api.*
> for Nutch 2.x as done at NUTCH-1800 for Nutch 1.x 





[jira] [Resolved] (NUTCH-2243) Documentation for Nutch 2.X REST API

2016-06-27 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2243.
-
Resolution: Fixed

Excellent, [~kamaci], thank you.

> Documentation for Nutch 2.X REST API
> 
>
> Key: NUTCH-2243
> URL: https://issues.apache.org/jira/browse/NUTCH-2243
> Project: Nutch
>  Issue Type: New Feature
>  Components: documentation, REST_api
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
> Fix For: 2.4
>
>
> This issue should build on NUTCH-1769 with full Java documentation for all 
> classes in the following packages:
> org.apache.nutch.api.*
> for Nutch 2.x as done at NUTCH-1800 for Nutch 1.x 





[GitHub] nutch pull request #123: NUTCH-2243 REST API documentation for Nutch 2.X

2016-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/123




[jira] [Resolved] (NUTCH-2282) Incorrect content-type returned in 4 API calls

2016-06-27 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-2282.
-
Resolution: Fixed
  Assignee: Lewis John McGibbney

Thanks for the patch [~lancergr] :)

> Incorrect content-type returned in 4 API calls
> --
>
> Key: NUTCH-2282
> URL: https://issues.apache.org/jira/browse/NUTCH-2282
> Project: Nutch
>  Issue Type: Bug
>  Components: REST_api
>Affects Versions: 2.3.1
>Reporter: Giorgos Adam
>Assignee: Lewis John McGibbney
>Priority: Trivial
>
> The REST API returns 'Content-type: application/json' instead of text/plain 
> (at least) on the following calls:
> 1. GET /admin/stop
> 2. GET /config/:configId/:property
> 3. POST /config/create
> 4. POST /job/create





[jira] [Updated] (NUTCH-2282) Incorrect content-type returned in 4 API calls

2016-06-27 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-2282:

Fix Version/s: 2.4

> Incorrect content-type returned in 4 API calls
> --
>
> Key: NUTCH-2282
> URL: https://issues.apache.org/jira/browse/NUTCH-2282
> Project: Nutch
>  Issue Type: Bug
>  Components: REST_api
>Affects Versions: 2.3.1
>Reporter: Giorgos Adam
>Assignee: Lewis John McGibbney
>Priority: Trivial
> Fix For: 2.4
>
>
> The REST API returns 'Content-type: application/json' instead of text/plain 
> (at least) on the following calls:
> 1. GET /admin/stop
> 2. GET /config/:configId/:property
> 3. POST /config/create
> 4. POST /job/create





[jira] [Commented] (NUTCH-2282) Incorrect content-type returned in 4 API calls

2016-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351514#comment-15351514
 ] 

ASF GitHub Bot commented on NUTCH-2282:
---

Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/120


> Incorrect content-type returned in 4 API calls
> --
>
> Key: NUTCH-2282
> URL: https://issues.apache.org/jira/browse/NUTCH-2282
> Project: Nutch
>  Issue Type: Bug
>  Components: REST_api
>Affects Versions: 2.3.1
>Reporter: Giorgos Adam
>Priority: Trivial
>
> The REST API returns 'Content-type: application/json' instead of text/plain 
> (at least) on the following calls:
> 1. GET /admin/stop
> 2. GET /config/:configId/:property
> 3. POST /config/create
> 4. POST /job/create





[GitHub] nutch pull request #120: fix for NUTCH-2282 contributed by lancergr

2016-06-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/nutch/pull/120




[jira] [Commented] (NUTCH-2264) Check Forbidden API's at Build

2016-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351511#comment-15351511
 ] 

ASF GitHub Bot commented on NUTCH-2264:
---

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/115#discussion_r68627471
  
--- Diff: build.xml ---
@@ -1035,4 +1039,11 @@
   
 
   
+
+  
+
+  
--- End diff --

What error?


> Check Forbidden API's at Build
> --
>
> Key: NUTCH-2264
> URL: https://issues.apache.org/jira/browse/NUTCH-2264
> Project: Nutch
>  Issue Type: Task
>Affects Versions: 2.3.1
>Reporter: Furkan KAMACI
>Assignee: Furkan KAMACI
>Priority: Minor
>
> We should avoid [forbidden 
> calls|https://github.com/policeman-tools/forbidden-apis/wiki]  and check in 
> the ant build for it.





[GitHub] nutch pull request #115: NUTCH-2264 Check Forbidden API's at Build

2016-06-27 Thread lewismc
Github user lewismc commented on a diff in the pull request:

https://github.com/apache/nutch/pull/115#discussion_r68627471
  
--- Diff: build.xml ---
@@ -1035,4 +1039,11 @@
   
 
   
+
+  
+
+  
--- End diff --

What error?




[Nutch Wiki] New attachment added to page GoogleSummerOfCode/SecurityLayer/MidtermReport

2016-06-27 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page 
"GoogleSummerOfCode/SecurityLayer/MidtermReport" for change notification. An 
attachment has been added to that page by kamaci. Following detailed 
information is available:

Attachment name: Furkan_KAMACI_Midterm_Report_NUTCH-1756.pdf
Attachment size: 58732
Attachment link: 
https://wiki.apache.org/nutch/GoogleSummerOfCode/SecurityLayer/MidtermReport?action=AttachFile&do=get&target=Furkan_KAMACI_Midterm_Report_NUTCH-1756.pdf
Page link: 
https://wiki.apache.org/nutch/GoogleSummerOfCode/SecurityLayer/MidtermReport


[jira] [Commented] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message

2016-06-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350993#comment-15350993
 ] 

ASF GitHub Bot commented on NUTCH-2267:
---

GitHub user sjwoodard opened a pull request:

https://github.com/apache/nutch/pull/129

NUTCH-2267 - Solr and Hadoop JAR mismatch

Explicitly pass in an instance of SystemDefaultHttpClient to 
CloudSolrClient, otherwise SolrJ will use a default implementation of 
CloseableHttpClient, which is not present in the HttpClient and HttpCore JARs 
in Hadoop < 2.8 (see https://issues.apache.org/jira/browse/SOLR-7948 and 
https://issues.apache.org/jira/browse/HADOOP-12767).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sjwoodard/nutch master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #129


commit f64686bb06cec2e31c9560d7e7e7f050311d62f1
Author: Steven 
Date:   2016-06-27T13:30:52Z

NUTCH-2267 - Solr and Hadoop JAR mismatch

Explicitly pass in an instance of SystemDefaultHttpClient to 
CloudSolrClient, otherwise SolrJ will use a default implementation of 
CloseableHttpClient, which is not present in the HttpClient and HttpCore JARs 
in Hadoop < 2.8 (see https://issues.apache.org/jira/browse/SOLR-7948 and 
https://issues.apache.org/jira/browse/HADOOP-12767).




> Solr indexer fails at the end of the job with a java error message
> --
>
> Key: NUTCH-2267
> URL: https://issues.apache.org/jira/browse/NUTCH-2267
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: hadoop v2.7.2  solr6 in cloud configuration with 
> zookeeper 3.4.6. I use the master branch from github currently on commit 
> da252eb7b3d2d7b70   ( NUTCH - 2263 mingram and maxgram support for Unigram 
> Cosine Similarity Model is provided. )
>Reporter: kaveh minooie
> Fix For: 1.13
>
>
> This is what I was getting first:
> 16/05/23 13:52:27 INFO mapreduce.Job:  map 100% reduce 100%
> 16/05/23 13:52:27 INFO mapreduce.Job: Task Id : 
> attempt_1462499602101_0119_r_00_0, Status : FAILED
> Error: Bad return type
> Exception Details:
>   Location:
> org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient;
>  @58: areturn
>   Reason:
> Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
>   Current Frame:
> bci: @58
> flags: { }
> locals: { 'org/apache/solr/common/params/SolrParams', 
> 'org/apache/http/conn/ClientConnectionManager', 
> 'org/apache/solr/common/params/ModifiableSolrParams', 
> 'org/apache/http/impl/client/DefaultHttpClient' }
> stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
>   Bytecode:
> 0x000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
> 0x010: 0099 001e b200 05bb 0007 59b7 0008 1209
> 0x020: b600 0a2c b600 0bb6 000c b900 0d02 002b
> 0x030: b800 104e 2d2c b800 0f2d b0
>   Stackmap Table:
> append_frame(@47,Object[#143])
> 16/05/23 13:52:28 INFO mapreduce.Job:  map 100% reduce 0% 
> As you can see, the failed reducer gets re-spawned. Then I found this issue: 
> https://issues.apache.org/jira/browse/SOLR-7657 and updated my Hadoop 
> config file. After that, the indexer seems to be able to finish (I got the 
> document in Solr, it seems), but I still get the error message at the end 
> of the job:
> 16/05/23 16:39:26 INFO mapreduce.Job:  map 100% reduce 99%
> 16/05/23 16:39:44 INFO mapreduce.Job:  map 100% reduce 100%
> 16/05/23 16:39:57 INFO mapreduce.Job: Job job_1464045047943_0001 completed 
> successfully
> 16/05/23 16:39:58 INFO mapreduce.Job: Counters: 53
>   File System Counters
>   FILE: Number of bytes read=42700154855
>   FILE: Number of bytes written=70210771807
>   FILE: Number of read operations=0
>   FILE: Number of large read operations=0
>   FILE: Number of write operations=0
>   HDFS: Number of bytes read=8699202825
>   HDFS: Number of bytes written=0
>   HDFS: Number of read operations=537
>   HDFS: Number of large read operations=0
>   HDFS: Number of write operations=0
>   Job Counters 
>   Launched map tasks=134
>   Launched reduce tasks=1
>   Data-local map ta

[GitHub] nutch pull request #129: NUTCH-2267 - Solr and Hadoop JAR mismatch

2016-06-27 Thread sjwoodard
GitHub user sjwoodard opened a pull request:

https://github.com/apache/nutch/pull/129

NUTCH-2267 - Solr and Hadoop JAR mismatch

Explicitly pass in an instance of SystemDefaultHttpClient to 
CloudSolrClient, otherwise SolrJ will use a default implementation of 
CloseableHttpClient, which is not present in the HttpClient and HttpCore JARs 
in Hadoop < 2.8 (see https://issues.apache.org/jira/browse/SOLR-7948 and 
https://issues.apache.org/jira/browse/HADOOP-12767).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sjwoodard/nutch master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/nutch/pull/129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #129


commit f64686bb06cec2e31c9560d7e7e7f050311d62f1
Author: Steven 
Date:   2016-06-27T13:30:52Z

NUTCH-2267 - Solr and Hadoop JAR mismatch

Explicitly pass in an instance of SystemDefaultHttpClient to 
CloudSolrClient, otherwise SolrJ will use a default implementation of 
CloseableHttpClient, which is not present in the HttpClient and HttpCore JARs 
in Hadoop < 2.8 (see https://issues.apache.org/jira/browse/SOLR-7948 and 
https://issues.apache.org/jira/browse/HADOOP-12767).






[jira] [Commented] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message

2016-06-27 Thread Steven W (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350979#comment-15350979
 ] 

Steven W commented on NUTCH-2267:
-

I think this is a valid bug; however, it is actually a JAR mismatch between 
Solr and Hadoop. There is an easy solution, though: change the following in 
the indexer-solr SolrUtils.java class:

```
SystemDefaultHttpClient httpClient = new SystemDefaultHttpClient();
CloudSolrClient sc = new CloudSolrClient(url.replace('|', ','), httpClient);
```

I'm working on a PR now.
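
The `url.replace('|', ',')` call in the snippet above only rewrites the separator in the configured URL string before it is handed to `CloudSolrClient`. A small self-contained sketch of that one step follows. The host names are made up, and the idea that the configured value is a pipe-separated ZooKeeper host list is an assumption drawn from the snippet, not something this thread confirms.

```java
class ZkHostSketch {
    // Mirrors the url.replace('|', ',') expression from the snippet above:
    // every '|' in the configured value becomes ','.
    static String toZkHost(String configuredUrl) {
        return configuredUrl.replace('|', ',');
    }

    public static void main(String[] args) {
        // Hypothetical ZooKeeper ensemble with a chroot suffix.
        String configured = "zk1:2181|zk2:2181|zk3:2181/solr";
        System.out.println(toZkHost(configured));
    }
}
```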

> Solr indexer fails at the end of the job with a java error message
> --
>
> Key: NUTCH-2267
> URL: https://issues.apache.org/jira/browse/NUTCH-2267
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: hadoop v2.7.2  solr6 in cloud configuration with 
> zookeeper 3.4.6. I use the master branch from github currently on commit 
> da252eb7b3d2d7b70   ( NUTCH - 2263 mingram and maxgram support for Unigram 
> Cosine Similarity Model is provided. )
>Reporter: kaveh minooie
> Fix For: 1.13
>
>
> This is what I was getting first:
> 16/05/23 13:52:27 INFO mapreduce.Job:  map 100% reduce 100%
> 16/05/23 13:52:27 INFO mapreduce.Job: Task Id : 
> attempt_1462499602101_0119_r_00_0, Status : FAILED
> Error: Bad return type
> Exception Details:
>   Location:
> org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient;
>  @58: areturn
>   Reason:
> Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
>   Current Frame:
> bci: @58
> flags: { }
> locals: { 'org/apache/solr/common/params/SolrParams', 
> 'org/apache/http/conn/ClientConnectionManager', 
> 'org/apache/solr/common/params/ModifiableSolrParams', 
> 'org/apache/http/impl/client/DefaultHttpClient' }
> stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
>   Bytecode:
> 0x000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
> 0x010: 0099 001e b200 05bb 0007 59b7 0008 1209
> 0x020: b600 0a2c b600 0bb6 000c b900 0d02 002b
> 0x030: b800 104e 2d2c b800 0f2d b0
>   Stackmap Table:
> append_frame(@47,Object[#143])
> 16/05/23 13:52:28 INFO mapreduce.Job:  map 100% reduce 0% 
> As you can see, the failed reducer gets re-spawned. Then I found this issue: 
> https://issues.apache.org/jira/browse/SOLR-7657 and updated my Hadoop 
> config file. After that, the indexer seems to be able to finish (I got the 
> document in Solr, it seems), but I still get the error message at the end 
> of the job:
> 16/05/23 16:39:26 INFO mapreduce.Job:  map 100% reduce 99%
> 16/05/23 16:39:44 INFO mapreduce.Job:  map 100% reduce 100%
> 16/05/23 16:39:57 INFO mapreduce.Job: Job job_1464045047943_0001 completed 
> successfully
> 16/05/23 16:39:58 INFO mapreduce.Job: Counters: 53
>   File System Counters
>   FILE: Number of bytes read=42700154855
>   FILE: Number of bytes written=70210771807
>   FILE: Number of read operations=0
>   FILE: Number of large read operations=0
>   FILE: Number of write operations=0
>   HDFS: Number of bytes read=8699202825
>   HDFS: Number of bytes written=0
>   HDFS: Number of read operations=537
>   HDFS: Number of large read operations=0
>   HDFS: Number of write operations=0
>   Job Counters 
>   Launched map tasks=134
>   Launched reduce tasks=1
>   Data-local map tasks=107
>   Rack-local map tasks=27
>   Total time spent by all maps in occupied slots (ms)=49377664
>   Total time spent by all reduces in occupied slots (ms)=32765064
>   Total time spent by all map tasks (ms)=3086104
>   Total time spent by all reduce tasks (ms)=1365211
>   Total vcore-milliseconds taken by all map tasks=3086104
>   Total vcore-milliseconds taken by all reduce tasks=1365211
>   Total megabyte-milliseconds taken by all map tasks=12640681984
>   Total megabyte-milliseconds taken by all reduce tasks=8387856384
>   Map-Reduce Framework
>   Map input records=25305474
>   Map output records=25305474
>   Map output bytes=27422869763
>   Map output materialized bytes=27489888004
>   Input split bytes=15225
>   Combine input records=0
>   Combine output records=0
>   Reduce input groups=

[jira] [Comment Edited] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message

2016-06-27 Thread Steven W (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350979#comment-15350979
 ] 

Steven W edited comment on NUTCH-2267 at 6/27/16 1:24 PM:
--

I think this is a valid bug; however, it is actually a JAR mismatch between 
Solr and Hadoop. There is an easy solution, though: change the following in 
the indexer-solr SolrUtils.java class:

SystemDefaultHttpClient httpClient = new SystemDefaultHttpClient();
CloudSolrClient sc = new CloudSolrClient(url.replace('|', ','), httpClient);

I'm working on a PR now.


was (Author: sjwoodard):
I think this is a valid bug; however, it is actually a JAR mismatch between 
Solr and Hadoop. There is an easy solution, though: change the following in 
the indexer-solr SolrUtils.java class:

```
SystemDefaultHttpClient httpClient = new SystemDefaultHttpClient();
CloudSolrClient sc = new CloudSolrClient(url.replace('|', ','), httpClient);
```

I'm working on a PR now.

> Solr indexer fails at the end of the job with a java error message
> --
>
> Key: NUTCH-2267
> URL: https://issues.apache.org/jira/browse/NUTCH-2267
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.12
> Environment: hadoop v2.7.2  solr6 in cloud configuration with 
> zookeeper 3.4.6. I use the master branch from github currently on commit 
> da252eb7b3d2d7b70   ( NUTCH - 2263 mingram and maxgram support for Unigram 
> Cosine Similarity Model is provided. )
>Reporter: kaveh minooie
> Fix For: 1.13
>
>
> This is what I was getting first:
> 16/05/23 13:52:27 INFO mapreduce.Job:  map 100% reduce 100%
> 16/05/23 13:52:27 INFO mapreduce.Job: Task Id : 
> attempt_1462499602101_0119_r_00_0, Status : FAILED
> Error: Bad return type
> Exception Details:
>   Location:
> org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient;
>  @58: areturn
>   Reason:
> Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, 
> stack[0]) is not assignable to 
> 'org/apache/http/impl/client/CloseableHttpClient' (from method signature)
>   Current Frame:
> bci: @58
> flags: { }
> locals: { 'org/apache/solr/common/params/SolrParams', 
> 'org/apache/http/conn/ClientConnectionManager', 
> 'org/apache/solr/common/params/ModifiableSolrParams', 
> 'org/apache/http/impl/client/DefaultHttpClient' }
> stack: { 'org/apache/http/impl/client/DefaultHttpClient' }
>   Bytecode:
> 0x000: bb00 0359 2ab7 0004 4db2 0005 b900 0601
> 0x010: 0099 001e b200 05bb 0007 59b7 0008 1209
> 0x020: b600 0a2c b600 0bb6 000c b900 0d02 002b
> 0x030: b800 104e 2d2c b800 0f2d b0
>   Stackmap Table:
> append_frame(@47,Object[#143])
> 16/05/23 13:52:28 INFO mapreduce.Job:  map 100% reduce 0% 
> As you can see, the failed reducer gets re-spawned. Then I found this issue: 
> https://issues.apache.org/jira/browse/SOLR-7657 and updated my Hadoop 
> config file. After that, the indexer seems to be able to finish (I got the 
> document in Solr, it seems), but I still get the error message at the end 
> of the job:
> 16/05/23 16:39:26 INFO mapreduce.Job:  map 100% reduce 99%
> 16/05/23 16:39:44 INFO mapreduce.Job:  map 100% reduce 100%
> 16/05/23 16:39:57 INFO mapreduce.Job: Job job_1464045047943_0001 completed 
> successfully
> 16/05/23 16:39:58 INFO mapreduce.Job: Counters: 53
>   File System Counters
>   FILE: Number of bytes read=42700154855
>   FILE: Number of bytes written=70210771807
>   FILE: Number of read operations=0
>   FILE: Number of large read operations=0
>   FILE: Number of write operations=0
>   HDFS: Number of bytes read=8699202825
>   HDFS: Number of bytes written=0
>   HDFS: Number of read operations=537
>   HDFS: Number of large read operations=0
>   HDFS: Number of write operations=0
>   Job Counters 
>   Launched map tasks=134
>   Launched reduce tasks=1
>   Data-local map tasks=107
>   Rack-local map tasks=27
>   Total time spent by all maps in occupied slots (ms)=49377664
>   Total time spent by all reduces in occupied slots (ms)=32765064
>   Total time spent by all map tasks (ms)=3086104
>   Total time spent by all reduce tasks (ms)=1365211
>   Total vcore-milliseconds taken by all map tasks=3086104
>   Total vcore-milliseconds taken by all reduce tasks=1365211
>   Total megabyte-milliseconds taken by all map tasks=12640681984
>