[jira] [Commented] (NUTCH-2269) Clean not working after crawl
[ https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351824#comment-15351824 ] Sebastian Nagel commented on NUTCH-2269: Thanks for reporting the problems. As far as I can see, they can be solved by using "clean" the right way in combination with the required Solr version: # "nutch clean" will not run on the linkdb: #* the command-line help is clear {noformat} % bin/nutch clean Usage: CleaningJob [-noCommit] {noformat} #* and the error message also gives a clear hint: {noformat} java.lang.Exception: java.lang.ClassCastException: org.apache.nutch.crawl.Inlinks cannot be cast to org.apache.nutch.crawl.CrawlDatum ... 2016-06-27 22:00:09,628 ERROR indexer.CleaningJob - CleaningJob: java.io.IOException: Job failed! ... 2016-06-27 22:00:52,057 ERROR indexer.CleaningJob - Missing crawldb. Usage: CleaningJob [-noCommit] {noformat} #* unfortunately, both CrawlDb and LinkDb are stored as Hadoop map files, which makes it difficult to verify correct usage in advance. # I was able to reproduce the error "IllegalStateException: Connection pool shut down" when using Nutch 1.12 in combination with Solr 4.10.4. However, Nutch 1.12 is built against Solr 5.4.1, which is probably the cause. Are you able to reproduce the problem with the correct Solr version? # The message {noformat} WARN output.FileOutputCommitter - Output Path is null in commitJob() {noformat} is only a warning, not a problem: the cleaning job is a map-reduce job without file output; deletions are sent directly to the Solr server. It is uncommon for a map-reduce job to have no output, but it is harmless.
> Clean not working after crawl > - > > Key: NUTCH-2269 > URL: https://issues.apache.org/jira/browse/NUTCH-2269 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 1.12 > Environment: Vagrant, Ubuntu, Java 8, Solr 4.10 >Reporter: Francesco Capponi > > I have been having this problem for a while and I had to roll back to the > old solr clean instead of the newer version. > Once it correctly inserts/updates every document in Nutch, when it tries to > clean, it returns error 255: > {quote} > 2016-05-30 10:13:04,992 WARN output.FileOutputCommitter - Output Path is > null in setupJob() > 2016-05-30 10:13:07,284 INFO indexer.IndexWriters - Adding > org.apache.nutch.indexwriter.solr.SolrIndexWriter > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: content dest: > content > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: title dest: > title > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: host dest: host > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: segment dest: > segment > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: boost dest: > boost > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: digest dest: > digest > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: tstamp dest: > tstamp > 2016-05-30 10:13:08,133 INFO solr.SolrIndexWriter - SolrIndexer: deleting > 15/15 documents > 2016-05-30 10:13:08,919 WARN output.FileOutputCommitter - Output Path is > null in cleanupJob() > 2016-05-30 10:13:08,937 WARN mapred.LocalJobRunner - job_local662730477_0001 > java.lang.Exception: java.lang.IllegalStateException: Connection pool shut > down > at > org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) > Caused by: java.lang.IllegalStateException: Connection pool shut down > at org.apache.http.util.Asserts.check(Asserts.java:34) > at > 
org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169) > at > org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202) > at > org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415) > at > org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230) > at > org.apache.solr.client.solrj.SolrRequest.proces
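Sebastian's first point above — that the CleaningJob must be pointed at the CrawlDb, not the LinkDb — is hard to verify up front because both databases are stored as Hadoop map files. One possible pre-check is to peek at the header of the map file's data file (a SequenceFile), which records the key and value class names: a CrawlDb holds org.apache.nutch.crawl.CrawlDatum values, a LinkDb holds org.apache.nutch.crawl.Inlinks. The sketch below is not part of Nutch (the class name SeqFileProbe is hypothetical); it uses only the JDK and a deliberately simplified reader for Hadoop's vint-prefixed strings, assuming class names shorter than 128 bytes:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class SeqFileProbe {
    /**
     * Reads the key and value class names from a SequenceFile header.
     * Layout (version >= 6): bytes 'S','E','Q', a version byte, then the
     * key class name and value class name as length-prefixed UTF-8 strings.
     */
    static String[] readHeaderClasses(InputStream in) throws IOException {
        DataInputStream din = new DataInputStream(in);
        byte[] magic = new byte[3];
        din.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("Not a SequenceFile");
        }
        din.readByte(); // format version, ignored in this sketch
        return new String[] { readShortString(din), readShortString(din) };
    }

    // Simplified Hadoop vint + UTF-8 string reader: assumes the length fits
    // in a single positive byte, which holds for class names of this size.
    private static String readShortString(DataInputStream din) throws IOException {
        int len = din.readByte();
        if (len < 0) {
            throw new IOException("multi-byte vint not supported in this sketch");
        }
        byte[] buf = new byte[len];
        din.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }
}
```

A wrapper script could run this against the data file of a map-file part (in a typical Nutch 1.x layout, something like crawldb/current/part-00000/data) and refuse to launch the cleaning job when the value class turns out to be Inlinks.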
[jira] [Issue Comment Deleted] (NUTCH-2269) Clean not working after crawl
[ https://issues.apache.org/jira/browse/NUTCH-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sebastian Nagel updated NUTCH-2269: --- Comment: was deleted (was: The message {noformat} WARN output.FileOutputCommitter - Output Path is null in commitJob() {noformat} is only a warning and no problem: Indeed, the cleaning job is a map-reduce job without output, deletions are sent to the Solr server. That's uncommon but not a problem.) > Clean not working after crawl > - > > Key: NUTCH-2269 > URL: https://issues.apache.org/jira/browse/NUTCH-2269 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 1.12 > Environment: Vagrant, Ubuntu, Java 8, Solr 4.10 >Reporter: Francesco Capponi > > I'm have been having this problem for a while and I had to rollback using the > old solr clean instead of the newer version. > Once it inserts/update correctly every document in Nutch, when it tries to > clean, it returns error 255: > {quote} > 2016-05-30 10:13:04,992 WARN output.FileOutputCommitter - Output Path is > null in setupJob() > 2016-05-30 10:13:07,284 INFO indexer.IndexWriters - Adding > org.apache.nutch.indexwriter.solr.SolrIndexWriter > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: content dest: > content > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: title dest: > title > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: host dest: host > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: segment dest: > segment > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: boost dest: > boost > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: digest dest: > digest > 2016-05-30 10:13:08,114 INFO solr.SolrMappingReader - source: tstamp dest: > tstamp > 2016-05-30 10:13:08,133 INFO solr.SolrIndexWriter - SolrIndexer: deleting > 15/15 documents > 2016-05-30 10:13:08,919 WARN output.FileOutputCommitter - Output Path is > null in cleanupJob() > 2016-05-30 
10:13:08,937 WARN mapred.LocalJobRunner - job_local662730477_0001 > java.lang.Exception: java.lang.IllegalStateException: Connection pool shut > down > at > org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) > Caused by: java.lang.IllegalStateException: Connection pool shut down > at org.apache.http.util.Asserts.check(Asserts.java:34) > at > org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:169) > at > org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:202) > at > org.apache.http.impl.conn.PoolingClientConnectionManager.requestConnection(PoolingClientConnectionManager.java:184) > at > org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:415) > at > org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106) > at > org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:480) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241) > at > org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230) > at > org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150) > at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:483) > at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:464) > at > org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:190) > at > org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:178) > at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115) > at > 
org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:120) > at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237) > at > org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459) > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392) > at > org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
[jira] [Resolved] (NUTCH-2022) Investigate better documentation for the Nutch REST API's
[ https://issues.apache.org/jira/browse/NUTCH-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2022. - Resolution: Fixed > Investigate better documentation for the Nutch REST API's > - > > Key: NUTCH-2022 > URL: https://issues.apache.org/jira/browse/NUTCH-2022 > Project: Nutch > Issue Type: Wish > Components: REST_api >Affects Versions: 2.3, 1.10 >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI > Fix For: 2.4, 1.11 > > > Over on Apache Tika we use [Miredot|http://www.miredot.com/] for better > representation of the Tika REST API. > Based on recent development on both 1.X and 2.x REST API's, it would be nice > to have a better interface for people to see. > An example of Miredot REST API docs can be seen on [Tika REST API > docs|http://tika.apache.org/1.8/miredot/index.html] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (NUTCH-2022) Investigate better documentation for the Nutch REST API's
[ https://issues.apache.org/jira/browse/NUTCH-2022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2022: Fix Version/s: 1.11 2.4 > Investigate better documentation for the Nutch REST API's > - > > Key: NUTCH-2022 > URL: https://issues.apache.org/jira/browse/NUTCH-2022 > Project: Nutch > Issue Type: Wish > Components: REST_api >Affects Versions: 2.3, 1.10 >Reporter: Lewis John McGibbney >Assignee: Furkan KAMACI > Fix For: 2.4, 1.11 > > > Over on Apache Tika we use [Miredot|http://www.miredot.com/] for better > representation of the Tika REST API. > Based on recent development on both 1.X and 2.x REST API's, it would be nice > to have a better interface for people to see. > An example of Miredot REST API docs can be seen on [Tika REST API > docs|http://tika.apache.org/1.8/miredot/index.html]
[jira] [Commented] (NUTCH-2243) Documentation for Nutch 2.X REST API
[ https://issues.apache.org/jira/browse/NUTCH-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351610#comment-15351610 ] Hudson commented on NUTCH-2243: --- SUCCESS: Integrated in Nutch-nutchgora #1562 (See [https://builds.apache.org/job/Nutch-nutchgora/1562/]) NUTCH-2243 REST API documentation for Nutch 2.X (furkankamaci: rev 728d0de8bac399ac8dff5d0a0eee89f5c53428b9) * ivy/mvn.template * build.xml NUTCH-2243 Miredot plugin version and licence configuration are updated. (furkankamaci: rev 6f7ca5bc0b08c3e3f8f4aa23e4924ad159d7222f) * ivy/mvn.template > Documentation for Nutch 2.X REST API > > > Key: NUTCH-2243 > URL: https://issues.apache.org/jira/browse/NUTCH-2243 > Project: Nutch > Issue Type: New Feature > Components: documentation, REST_api >Affects Versions: 2.3.1 >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > This issue should build on NUTCH-1769 with full Java documentation for all > classes in the following packages: > org.apache.nutch.api.* > for Nutch 2.x as done at NUTCH-1800 for Nutch 1.x -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2282) Incorrect content-type returned in 4 API calls
[ https://issues.apache.org/jira/browse/NUTCH-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351609#comment-15351609 ] Hudson commented on NUTCH-2282: --- SUCCESS: Integrated in Nutch-nutchgora #1562 (See [https://builds.apache.org/job/Nutch-nutchgora/1562/]) fix for NUTCH-2282 contributed by lancergr (g.adam: rev 98814a0270cfc173998366618171f746ec4d304b) * src/java/org/apache/nutch/api/resources/AdminResource.java * src/java/org/apache/nutch/api/resources/ConfigResource.java * src/java/org/apache/nutch/api/resources/JobResource.java > Incorrect content-type returned in 4 API calls > -- > > Key: NUTCH-2282 > URL: https://issues.apache.org/jira/browse/NUTCH-2282 > Project: Nutch > Issue Type: Bug > Components: REST_api >Affects Versions: 2.3.1 >Reporter: Giorgos Adam >Assignee: Lewis John McGibbney >Priority: Trivial > Fix For: 2.4 > > > The REST API returns 'Content-type: application/json' instead of text/plain > (at least) on the following calls: > 1. GET /admin/stop > 2. GET /config/:configId/:property > 3. POST /config/create > 4. POST /job/create
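The fix for NUTCH-2282 makes the affected resource classes declare their plain-text responses as text/plain. As a minimal stand-in (Nutch's REST layer actually uses Restlet/JAX-RS, not the JDK's built-in server), the sketch below shows the header-level behavior being fixed, using only com.sun.net.httpserver; the class name PlainTextEndpoint and the "stopping" response body are illustrative assumptions, not taken from the patch:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class PlainTextEndpoint {
    /** Starts a tiny HTTP server on an ephemeral port with one plain-text route. */
    public static HttpServer start() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/admin/stop", exchange -> {
            byte[] body = "stopping".getBytes(StandardCharsets.UTF_8);
            // The essence of the fix: declare the actual media type of the
            // plain-text body instead of letting a JSON default leak through.
            exchange.getResponseHeaders().set("Content-Type", "text/plain");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```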
[jira] [Commented] (NUTCH-2289) SSL Support for REST API
[ https://issues.apache.org/jira/browse/NUTCH-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351553#comment-15351553 ] ASF GitHub Bot commented on NUTCH-2289: --- Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/128#discussion_r68630587 --- Diff: src/java/org/apache/nutch/api/NutchServer.java --- @@ -99,17 +114,70 @@ public NutchServer() { component = new Component(); component.getLogger().setLevel(Level.parse(logLevel)); -// Add a new HTTP server listening on defined port. -component.getServers().add(Protocol.HTTP, port); +AuthenticationTypeEnum authenticationType = configManager.get(ConfigResource.DEFAULT) +.getEnum("restapi.auth", AuthenticationTypeEnum.NONE); + +if (authenticationType == AuthenticationTypeEnum.SSL) { + // Add a new HTTPS server listening on defined port. + Server server = component.getServers().add(Protocol.HTTPS, port); + + Series parameters = server.getContext().getParameters(); + parameters.add("sslContextFactory", "org.restlet.engine.ssl.DefaultSslContextFactory"); + + String keyStorePath = configManager.get(ConfigResource.DEFAULT) + .get("restapi.auth.ssl.storepath", "etc/nutch-ssl.keystore.jks"); + parameters.add("keyStorePath", keyStorePath); + + String keyStorePassword = configManager.get(ConfigResource.DEFAULT) + .get("restapi.auth.ssl.storepass", "password"); + parameters.add("keyStorePassword", keyStorePassword); + + String keyPassword = configManager.get(ConfigResource.DEFAULT) + .get("restapi.auth.ssl.keypass", "password"); + parameters.add("keyPassword", keyPassword); + + parameters.add("keyStoreType", "JKS"); + LOG.info("SSL Authentication is set for NutchServer"); +} else { + // Add a new HTTP server listening on defined port. 
+ component.getServers().add(Protocol.HTTP, port); +} Context childContext = component.getContext().createChildContext(); JaxRsApplication application = new JaxRsApplication(childContext); application.add(this); application.setStatusService(new ErrorStatusService()); childContext.getAttributes().put(NUTCH_SERVER, this); -// Attach the application. -component.getDefaultHost().attach(application); +if (authenticationType == AuthenticationTypeEnum.NONE || authenticationType == AuthenticationTypeEnum.SSL ) { + component.getDefaultHost().attach(application); + return; +} + +String username = configManager.get(ConfigResource.DEFAULT).get("restapi.auth.username", "admin"); +String password = configManager.get(ConfigResource.DEFAULT).get("restapi.auth.password", "nutch"); + +MapVerifier mapVerifier = new MapVerifier(); +mapVerifier.getLocalSecrets().put(username, password.toCharArray()); + +if (authenticationType == AuthenticationTypeEnum.BASIC) { --- End diff -- Please see comment on other issue for use of switch block. > SSL Support for REST API > > > Key: NUTCH-2289 > URL: https://issues.apache.org/jira/browse/NUTCH-2289 > Project: Nutch > Issue Type: Sub-task > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.5 > > > Add SSL Authentication for Nutch REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
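The diff above wires SSL through Restlet server parameters (keyStorePath, keyStorePassword, keyPassword, keyStoreType). The equivalent JSSE initialization can be sketched with the JDK alone; the class name SslContextSketch is hypothetical, only the restapi.auth.ssl.keypass property and its "password" default are taken from the patch, and loading the keystore itself (from restapi.auth.ssl.storepath) is left to the caller:

```java
import java.security.KeyStore;
import java.util.Properties;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

public class SslContextSketch {
    /**
     * Builds an SSLContext from an already-loaded JKS keystore, using the
     * same property-with-default lookup scheme as the patch above.
     */
    public static SSLContext fromConfig(Properties conf, KeyStore keyStore) throws Exception {
        // Fall back to the patch's default when the property is unset.
        char[] keyPass = conf.getProperty("restapi.auth.ssl.keypass", "password").toCharArray();
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(keyStore, keyPass);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), null, null);
        return ctx;
    }
}
```

In the Restlet-based server this plumbing is handled by DefaultSslContextFactory; the sketch only makes explicit which JSSE objects those string parameters ultimately configure.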
[GitHub] nutch pull request #128: NUTCH-2289 SSL support for Nutch 2.X REST API.
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/128#discussion_r68630587 --- Diff: src/java/org/apache/nutch/api/NutchServer.java --- @@ -99,17 +114,70 @@ public NutchServer() { component = new Component(); component.getLogger().setLevel(Level.parse(logLevel)); -// Add a new HTTP server listening on defined port. -component.getServers().add(Protocol.HTTP, port); +AuthenticationTypeEnum authenticationType = configManager.get(ConfigResource.DEFAULT) +.getEnum("restapi.auth", AuthenticationTypeEnum.NONE); + +if (authenticationType == AuthenticationTypeEnum.SSL) { + // Add a new HTTPS server listening on defined port. + Server server = component.getServers().add(Protocol.HTTPS, port); + + Series parameters = server.getContext().getParameters(); + parameters.add("sslContextFactory", "org.restlet.engine.ssl.DefaultSslContextFactory"); + + String keyStorePath = configManager.get(ConfigResource.DEFAULT) + .get("restapi.auth.ssl.storepath", "etc/nutch-ssl.keystore.jks"); + parameters.add("keyStorePath", keyStorePath); + + String keyStorePassword = configManager.get(ConfigResource.DEFAULT) + .get("restapi.auth.ssl.storepass", "password"); + parameters.add("keyStorePassword", keyStorePassword); + + String keyPassword = configManager.get(ConfigResource.DEFAULT) + .get("restapi.auth.ssl.keypass", "password"); + parameters.add("keyPassword", keyPassword); + + parameters.add("keyStoreType", "JKS"); + LOG.info("SSL Authentication is set for NutchServer"); +} else { + // Add a new HTTP server listening on defined port. + component.getServers().add(Protocol.HTTP, port); +} Context childContext = component.getContext().createChildContext(); JaxRsApplication application = new JaxRsApplication(childContext); application.add(this); application.setStatusService(new ErrorStatusService()); childContext.getAttributes().put(NUTCH_SERVER, this); -// Attach the application. 
-component.getDefaultHost().attach(application); +if (authenticationType == AuthenticationTypeEnum.NONE || authenticationType == AuthenticationTypeEnum.SSL ) { + component.getDefaultHost().attach(application); + return; +} + +String username = configManager.get(ConfigResource.DEFAULT).get("restapi.auth.username", "admin"); +String password = configManager.get(ConfigResource.DEFAULT).get("restapi.auth.password", "nutch"); + +MapVerifier mapVerifier = new MapVerifier(); +mapVerifier.getLocalSecrets().put(username, password.toCharArray()); + +if (authenticationType == AuthenticationTypeEnum.BASIC) { --- End diff -- Please see comment on other issue for use of switch block. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (NUTCH-2289) SSL Support for REST API
[ https://issues.apache.org/jira/browse/NUTCH-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351549#comment-15351549 ] ASF GitHub Bot commented on NUTCH-2289: --- Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/128#discussion_r68630436 --- Diff: conf/nutch-default.xml --- @@ -1435,4 +1435,60 @@ + + restapi.auth + NONE + +Configures authentication type for communicating with RESTAPI. Valid values are BASIC, DIGEST, SSL and NONE. +When no authentication type is defined NONE will be used as default which does not provide security. +Use the restapi.auth.username and restapi.auth.auth.password properties to configure --- End diff -- Typo again > SSL Support for REST API > > > Key: NUTCH-2289 > URL: https://issues.apache.org/jira/browse/NUTCH-2289 > Project: Nutch > Issue Type: Sub-task > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.5 > > > Add SSL Authentication for Nutch REST API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] nutch pull request #128: NUTCH-2289 SSL support for Nutch 2.X REST API.
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/128#discussion_r68630436 --- Diff: conf/nutch-default.xml --- @@ -1435,4 +1435,60 @@ + + restapi.auth + NONE + +Configures authentication type for communicating with RESTAPI. Valid values are BASIC, DIGEST, SSL and NONE. +When no authentication type is defined NONE will be used as default which does not provide security. +Use the restapi.auth.username and restapi.auth.auth.password properties to configure --- End diff -- Typo again
[GitHub] nutch pull request #126: NUTCH-2285 Digest Authentication support for Nutch ...
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/126#discussion_r68630240 --- Diff: src/java/org/apache/nutch/api/NutchServer.java --- @@ -108,8 +118,41 @@ public NutchServer() { application.setStatusService(new ErrorStatusService()); childContext.getAttributes().put(NUTCH_SERVER, this); -// Attach the application. -component.getDefaultHost().attach(application); +AuthenticationTypeEnum authenticationType = configManager.get(ConfigResource.DEFAULT) +.getEnum("restapi.auth", AuthenticationTypeEnum.NONE); + +if (authenticationType == AuthenticationTypeEnum.NONE) { --- End diff -- Please change to the more efficient switch notation. Multiple If's are messy and JDK1.8 has better switch support for string input values.
[jira] [Commented] (NUTCH-2285) Digest Authentication Support for REST API
[ https://issues.apache.org/jira/browse/NUTCH-2285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351542#comment-15351542 ] ASF GitHub Bot commented on NUTCH-2285: --- Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/126#discussion_r68630240 --- Diff: src/java/org/apache/nutch/api/NutchServer.java --- @@ -108,8 +118,41 @@ public NutchServer() { application.setStatusService(new ErrorStatusService()); childContext.getAttributes().put(NUTCH_SERVER, this); -// Attach the application. -component.getDefaultHost().attach(application); +AuthenticationTypeEnum authenticationType = configManager.get(ConfigResource.DEFAULT) +.getEnum("restapi.auth", AuthenticationTypeEnum.NONE); + +if (authenticationType == AuthenticationTypeEnum.NONE) { --- End diff -- Please change to the more efficient switch notation. Multiple If's are messy and JDK1.8 has better switch support for string input values. > Digest Authentication Support for REST API > -- > > Key: NUTCH-2285 > URL: https://issues.apache.org/jira/browse/NUTCH-2285 > Project: Nutch > Issue Type: Sub-task > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.5 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (NUTCH-2284) Basic Authentication Support for REST API
[ https://issues.apache.org/jira/browse/NUTCH-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351530#comment-15351530 ] ASF GitHub Bot commented on NUTCH-2284: --- Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/124#discussion_r68629674 --- Diff: src/java/org/apache/nutch/api/NutchServer.java --- @@ -85,7 +88,12 @@ * well as the logging granularity. If the latter option is not provided via * {@link org.apache.nutch.api.NutchServer#main(String[])} then it defaults to * 'INFO' however best attempts should always be made to specify a logging - * level. + * level. + * {@link org.apache.nutch.api.NutchServer} can be run as secure. restapi.auth property + * should be set to true at nutch-site.xml to enable HTTP basic authentication --- End diff -- In JDK 1.8 I think the relevant HTML encoding should be used, e.g. nutch-site.xml
> Basic Authentication Support for REST API > - > > Key: NUTCH-2284 > URL: https://issues.apache.org/jira/browse/NUTCH-2284 > Project: Nutch > Issue Type: Sub-task > Components: REST_api, web gui >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.5 > > > Add Basic Authentication for Nutch REST API.
[GitHub] nutch pull request #124: NUTCH-2284 Basic Authentication support for Nutch 2...
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/124#discussion_r68629674 --- Diff: src/java/org/apache/nutch/api/NutchServer.java --- @@ -85,7 +88,12 @@ * well as the logging granularity. If the latter option is not provided via * {@link org.apache.nutch.api.NutchServer#main(String[])} then it defaults to * 'INFO' however best attempts should always be made to specify a logging - * level. + * level. + * {@link org.apache.nutch.api.NutchServer} can be run as secure. restapi.auth property + * should be set to true at nutch-site.xml to enable HTTP basic authentication --- End diff -- In JDK 1.8 I think the relevant HTML encoding should be used, e.g. nutch-site.xml
[jira] [Commented] (NUTCH-2284) Basic Authentication Support for REST API
[ https://issues.apache.org/jira/browse/NUTCH-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351528#comment-15351528 ] ASF GitHub Bot commented on NUTCH-2284: --- Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/124#discussion_r68629394 --- Diff: conf/nutch-default.xml --- @@ -1435,4 +1435,30 @@ + + restapi.auth + false + +Whether to enable HTTP basic authentication for communicating with RESTAPI. +Use the restapi.auth.username and restapi.auth.auth.password properties to configure --- End diff -- Typo in restapi.auth.auth.password it should be restapi.auth.password
[GitHub] nutch pull request #124: NUTCH-2284 Basic Authentication support for Nutch 2...
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/124#discussion_r68629394 --- Diff: conf/nutch-default.xml --- @@ -1435,4 +1435,30 @@ + + restapi.auth + false + +Whether to enable HTTP basic authentication for communicating with RESTAPI. +Use the restapi.auth.username and restapi.auth.auth.password properties to configure --- End diff -- Typo in restapi.auth.auth.password it should be restapi.auth.password
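With the typo flagged in the review fixed, the property block in nutch-site.xml might look like the sketch below. This is a hedged reconstruction from the quoted diff, not the committed file: the credential values are placeholders, and the description wording is paraphrased.

```xml
<!-- Sketch of the corrected configuration: the property is
     restapi.auth.password, not restapi.auth.auth.password. -->
<property>
  <name>restapi.auth</name>
  <value>true</value>
  <description>Whether to enable HTTP basic authentication for communicating
  with the REST API. Use the restapi.auth.username and restapi.auth.password
  properties to configure the credentials.</description>
</property>
<property>
  <name>restapi.auth.username</name>
  <value>admin</value> <!-- placeholder -->
</property>
<property>
  <name>restapi.auth.password</name>
  <value>changeme</value> <!-- placeholder -->
</property>
```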
[jira] [Updated] (NUTCH-2243) Documentation for Nutch 2.X REST API
[ https://issues.apache.org/jira/browse/NUTCH-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2243: Fix Version/s: (was: 2.5) 2.4 > Documentation for Nutch 2.X REST API > > > Key: NUTCH-2243 > URL: https://issues.apache.org/jira/browse/NUTCH-2243 > Project: Nutch > Issue Type: New Feature > Components: documentation, REST_api >Affects Versions: 2.3.1 >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI > Fix For: 2.4 > > > This issue should build on NUTCH-1769 with full Java documentation for all > classes in the following packages: > org.apache.nutch.api.* > for Nutch 2.x as done at NUTCH-1800 for Nutch 1.x
[jira] [Commented] (NUTCH-2243) Documentation for Nutch 2.X REST API
[ https://issues.apache.org/jira/browse/NUTCH-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351524#comment-15351524 ] ASF GitHub Bot commented on NUTCH-2243: --- Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/123
[jira] [Resolved] (NUTCH-2243) Documentation for Nutch 2.X REST API
[ https://issues.apache.org/jira/browse/NUTCH-2243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2243. - Resolution: Fixed Excellent [~kamaci] thank you
[GitHub] nutch pull request #123: NUTCH-2243 REST API documentation for Nutch 2.X
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/123
[jira] [Resolved] (NUTCH-2282) Incorrect content-type returned in 4 API calls
[ https://issues.apache.org/jira/browse/NUTCH-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-2282. - Resolution: Fixed Assignee: Lewis John McGibbney Thanks for the patch [~lancergr] :) > Incorrect content-type returned in 4 API calls > -- > > Key: NUTCH-2282 > URL: https://issues.apache.org/jira/browse/NUTCH-2282 > Project: Nutch > Issue Type: Bug > Components: REST_api >Affects Versions: 2.3.1 >Reporter: Giorgos Adam >Assignee: Lewis John McGibbney >Priority: Trivial > > The REST API returns 'Content-type: application/json' instead of text/plain > (at least) on the following calls: > 1. GET /admin/stop > 2. GET /config/:configId/:property > 3. POST /config/create > 4. POST /job/create
[jira] [Updated] (NUTCH-2282) Incorrect content-type returned in 4 API calls
[ https://issues.apache.org/jira/browse/NUTCH-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-2282: Fix Version/s: 2.4
[jira] [Commented] (NUTCH-2282) Incorrect content-type returned in 4 API calls
[ https://issues.apache.org/jira/browse/NUTCH-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351514#comment-15351514 ] ASF GitHub Bot commented on NUTCH-2282: --- Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/120
[GitHub] nutch pull request #120: fix for NUTCH-2282 contributed by lancergr
Github user asfgit closed the pull request at: https://github.com/apache/nutch/pull/120
[jira] [Commented] (NUTCH-2264) Check Forbidden API's at Build
[ https://issues.apache.org/jira/browse/NUTCH-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351511#comment-15351511 ] ASF GitHub Bot commented on NUTCH-2264: --- Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/115#discussion_r68627471 --- Diff: build.xml --- @@ -1035,4 +1039,11 @@ + + + + --- End diff -- What error? > Check Forbidden API's at Build > -- > > Key: NUTCH-2264 > URL: https://issues.apache.org/jira/browse/NUTCH-2264 > Project: Nutch > Issue Type: Task >Affects Versions: 2.3.1 >Reporter: Furkan KAMACI >Assignee: Furkan KAMACI >Priority: Minor > > We should avoid [forbidden > calls|https://github.com/policeman-tools/forbidden-apis/wiki] and check in > the ant build for it.
[GitHub] nutch pull request #115: NUTCH-2264 Check Forbidden API's at Build
Github user lewismc commented on a diff in the pull request: https://github.com/apache/nutch/pull/115#discussion_r68627471 --- Diff: build.xml --- @@ -1035,4 +1039,11 @@ + + + + --- End diff -- What error?
[Nutch Wiki] New attachment added to page GoogleSummerOfCode/SecurityLayer/MidtermReport
Dear Wiki user, You have subscribed to a wiki page "GoogleSummerOfCode/SecurityLayer/MidtermReport" for change notification. An attachment has been added to that page by kamaci. Following detailed information is available: Attachment name: Furkan_KAMACI_Midterm_Report_NUTCH-1756.pdf Attachment size: 58732 Attachment link: https://wiki.apache.org/nutch/GoogleSummerOfCode/SecurityLayer/MidtermReport?action=AttachFile&do=get&target=Furkan_KAMACI_Midterm_Report_NUTCH-1756.pdf Page link: https://wiki.apache.org/nutch/GoogleSummerOfCode/SecurityLayer/MidtermReport
[jira] [Commented] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message
[ https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350993#comment-15350993 ] ASF GitHub Bot commented on NUTCH-2267: --- GitHub user sjwoodard opened a pull request: https://github.com/apache/nutch/pull/129 NUTCH-2267 - Solr and Hadoop JAR mismatch Explicitly pass in an instance of SystemDefaultHttpClient to CloudSolrClient, otherwise SolrJ will use a default implementation of CloseableHttpClient, which is not present in the HttpClient and HttpCore JARs in Hadoop < 2.8 (see https://issues.apache.org/jira/browse/SOLR-7948 and https://issues.apache.org/jira/browse/HADOOP-12767). You can merge this pull request into a Git repository by running: $ git pull https://github.com/sjwoodard/nutch master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nutch/pull/129.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #129 commit f64686bb06cec2e31c9560d7e7e7f050311d62f1 Author: Steven Date: 2016-06-27T13:30:52Z NUTCH-2267 - Solr and Hadoop JAR mismatch Explicitly pass in an instance of SystemDefaultHttpClient to CloudSolrClient, otherwise SolrJ will use a default implementation of CloseableHttpClient, which is not present in the HttpClient and HttpCore JARs in Hadoop < 2.8 (see https://issues.apache.org/jira/browse/SOLR-7948 and https://issues.apache.org/jira/browse/HADOOP-12767). > Solr indexer fails at the end of the job with a java error message > -- > > Key: NUTCH-2267 > URL: https://issues.apache.org/jira/browse/NUTCH-2267 > Project: Nutch > Issue Type: Bug > Components: indexer >Affects Versions: 1.12 > Environment: hadoop v2.7.2 solr6 in cloud configuration with > zookeeper 3.4.6. I use the master branch from github currently on commit > da252eb7b3d2d7b70 ( NUTCH - 2263 mingram and maxgram support for Unigram > Cosine Similarity Model is provided. 
) >Reporter: kaveh minooie > Fix For: 1.13 > > > this is was what I was getting first: > 16/05/23 13:52:27 INFO mapreduce.Job: map 100% reduce 100% > 16/05/23 13:52:27 INFO mapreduce.Job: Task Id : > attempt_1462499602101_0119_r_00_0, Status : FAILED > Error: Bad return type > Exception Details: > Location: > org/apache/solr/client/solrj/impl/HttpClientUtil.createClient(Lorg/apache/solr/common/params/SolrParams;Lorg/apache/http/conn/ClientConnectionManager;)Lorg/apache/http/impl/client/CloseableHttpClient; > @58: areturn > Reason: > Type 'org/apache/http/impl/client/DefaultHttpClient' (current frame, > stack[0]) is not assignable to > 'org/apache/http/impl/client/CloseableHttpClient' (from method signature) > Current Frame: > bci: @58 > flags: { } > locals: { 'org/apache/solr/common/params/SolrParams', > 'org/apache/http/conn/ClientConnectionManager', > 'org/apache/solr/common/params/ModifiableSolrParams', > 'org/apache/http/impl/client/DefaultHttpClient' } > stack: { 'org/apache/http/impl/client/DefaultHttpClient' } > Bytecode: > 0x000: bb00 0359 2ab7 0004 4db2 0005 b900 0601 > 0x010: 0099 001e b200 05bb 0007 59b7 0008 1209 > 0x020: b600 0a2c b600 0bb6 000c b900 0d02 002b > 0x030: b800 104e 2d2c b800 0f2d b0 > Stackmap Table: > append_frame(@47,Object[#143]) > 16/05/23 13:52:28 INFO mapreduce.Job: map 100% reduce 0% > as you can see the failed reducer gets re-spawned. then I found this issue: > https://issues.apache.org/jira/browse/SOLR-7657 and I updated my hadoop > config file. 
after that, the indexer seems to be able to finish ( I got the > document in the solr, it seems ) but I still get the error message at the end > of the job: > 16/05/23 16:39:26 INFO mapreduce.Job: map 100% reduce 99% > 16/05/23 16:39:44 INFO mapreduce.Job: map 100% reduce 100% > 16/05/23 16:39:57 INFO mapreduce.Job: Job job_1464045047943_0001 completed > successfully > 16/05/23 16:39:58 INFO mapreduce.Job: Counters: 53 > File System Counters > FILE: Number of bytes read=42700154855 > FILE: Number of bytes written=70210771807 > FILE: Number of read operations=0 > FILE: Number of large read operations=0 > FILE: Number of write operations=0 > HDFS: Number of bytes read=8699202825 > HDFS: Number of bytes written=0 > HDFS: Number of read operations=537 > HDFS: Number of large read operations=0 > HDFS: Number of write operations=0 > Job Counters > Launched map tasks=134 > Launched reduce tasks=1 > Data-local map ta
[GitHub] nutch pull request #129: NUTCH-2267 - Solr and Hadoop JAR mismatch
GitHub user sjwoodard opened a pull request: https://github.com/apache/nutch/pull/129 NUTCH-2267 - Solr and Hadoop JAR mismatch Explicitly pass in an instance of SystemDefaultHttpClient to CloudSolrClient, otherwise SolrJ will use a default implementation of CloseableHttpClient, which is not present in the HttpClient and HttpCore JARs in Hadoop < 2.8 (see https://issues.apache.org/jira/browse/SOLR-7948 and https://issues.apache.org/jira/browse/HADOOP-12767). You can merge this pull request into a Git repository by running: $ git pull https://github.com/sjwoodard/nutch master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nutch/pull/129.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #129 commit f64686bb06cec2e31c9560d7e7e7f050311d62f1 Author: Steven Date: 2016-06-27T13:30:52Z NUTCH-2267 - Solr and Hadoop JAR mismatch Explicitly pass in an instance of SystemDefaultHttpClient to CloudSolrClient, otherwise SolrJ will use a default implementation of CloseableHttpClient, which is not present in the HttpClient and HttpCore JARs in Hadoop < 2.8 (see https://issues.apache.org/jira/browse/SOLR-7948 and https://issues.apache.org/jira/browse/HADOOP-12767).
[jira] [Commented] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message
[ https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350979#comment-15350979 ] Steven W commented on NUTCH-2267: - I think this is a valid bug, however it's actually a JAR mismatch between SOLR and HADOOP. There's an easy solution though... Just change the following in the indexer-solr SolrUtils.java class: ``` SystemDefaultHttpClient httpClient = new SystemDefaultHttpClient(); CloudSolrClient sc = new CloudSolrClient(url.replace('|', ','), httpClient); ``` I'm working on a PR now.
[jira] [Comment Edited] (NUTCH-2267) Solr indexer fails at the end of the job with a java error message
[ https://issues.apache.org/jira/browse/NUTCH-2267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15350979#comment-15350979 ] Steven W edited comment on NUTCH-2267 at 6/27/16 1:24 PM: -- I think this is a valid bug, however it's actually a JAR mismatch between SOLR and HADOOP. There's an easy solution though... Just change the following in the indexer-solr SolrUtils.java class: SystemDefaultHttpClient httpClient = new SystemDefaultHttpClient(); CloudSolrClient sc = new CloudSolrClient(url.replace('|', ','), httpClient); I'm working on a PR now. was (Author: sjwoodard): I think this is a valid bug, however it's actually a JAR mismatch between SOLR and HADOOP. There's an easy solution though... Just change the following in the indexer-solr SolrUtils.java class: ``` SystemDefaultHttpClient httpClient = new SystemDefaultHttpClient(); CloudSolrClient sc = new CloudSolrClient(url.replace('|', ','), httpClient); ``` I'm working on a PR now.