Author: snagel
Date: Fri Jul 26 08:46:42 2019
New Revision: 1863783
URL: http://svn.apache.org/viewvc?rev=1863783&view=rev
Log:
Update links
- Wiki (MoinMoin -> Confluence migration)
- http:// -> https:// where applicable
- remove about.md (not related to Nutch)
Removed:
nutch/cms_site/trunk/content/about.md
Modified:
nutch/cms_site/trunk/content/index.md
nutch/cms_site/trunk/content/javadoc.md
nutch/cms_site/trunk/content/version_control.md
Modified: nutch/cms_site/trunk/content/index.md
URL: http://svn.apache.org/viewvc/nutch/cms_site/trunk/content/index.md?rev=1863783&r1=1863782&r2=1863783&view=diff
==============================================================================
--- nutch/cms_site/trunk/content/index.md (original)
+++ nutch/cms_site/trunk/content/index.md Fri Jul 26 08:46:42 2019
@@ -25,7 +25,7 @@ under the License.
<div class="carousel-caption">
<h1>Highly extensible, highly scalable Web crawler</h1>
<p class="lead">Nutch is a well matured, production ready Web crawler.
Nutch 1.x enables
- fine grained configuration, relying on <a
href="http://hadoop.apache.org">Apache Hadoop™</a>
+ fine grained configuration, relying on <a
href="https://hadoop.apache.org/">Apache Hadoop™</a>
data structures, which are great for batch processing.</p>
<a class="btn btn-large btn-primary" href="downloads.html">Download</a>
</div>
@@ -38,10 +38,10 @@ under the License.
<h1>Pluggable parsing, protocols, storage and indexing</h1>
<p class="lead">Being pluggable and modular of course has it's benefits,
Nutch provides extensible interfaces such as Parse, Index and
ScoringFilter's for custom
- implementations e.g. <a href="http://tika.apache.org">Apache
Tika™</a> for parsing.
- Additonally, pluggable indexing exists for <a
href="http://lucene.apache.org/solr">Apache Solr™</a>,
- <a href="http://www.elasticsearch.org/">Elastic Search</a>, <a
href="https://cwiki.apache.org/confluence/display/solr/SolrCloud">SolrCloud</a>,
etc.</p>
- <a class="btn btn-large btn-primary"
href="http://wiki.apache.org/nutch/">Learn About</a>
+ implementations e.g. <a href="https://tika.apache.org/">Apache
Tika™</a> for parsing.
+ Additonally, pluggable indexing exists for <a
href="https://lucene.apache.org/solr">Apache Solr™</a>,
+ <a href="https://www.elastic.co/">Elastic Search</a>, <a
href="https://cwiki.apache.org/confluence/display/solr/SolrCloud">SolrCloud</a>,
etc.</p>
+ <a class="btn btn-large btn-primary"
href="https://cwiki.apache.org/confluence/display/NUTCH/Home">Learn About</a>
</div>
</div>
</div>
@@ -53,7 +53,7 @@ under the License.
<p class="lead">Nutch 2.X branch is becoming an emerging alternative
taking direct inspiration from 1.X. 2.X differs in one key area;
storage is abstracted away from any specific underlying data store by
using
- <a href="http://gora.apache.org">Apache Gora™</a> for handling
object to persistent
+ <a href="https://gora.apache.org/">Apache Gora™</a> for handling
object to persistent
data store mappings.</p>
<a class="btn btn-large btn-primary" href="mailing_lists.html">Join the
Community</a>
</div>
@@ -106,6 +106,10 @@ under the License.
#Apache Nutch News
+##26 July 2019 - Nutch Wiki Migrated
+
+The [Apache Nutch wiki](https://cwiki.apache.org/confluence/display/NUTCH/Home) has been migrated from MoinMoin to Confluence.
+
##9 August 2018 - Nutch 1.15 Release
The Apache Nutch PMC are pleased to announce the immediate release of Apache
Nutch v1.15, we advise all
@@ -115,8 +119,8 @@ An account of the CHANGES in this releas
[release
report](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=10680&version=12342302).
As usual in the 1.X series, release artifacts are made available as both
source and binary and also available within
-[Maven
Central](http://search.maven.org/#search|gav|1|g%3A%22org.apache.nutch%22%20AND%20a%3A%22nutch%22)
as a Maven dependency.
-The release is available from our [DOWNLAODS
PAGE](http://nutch.apache.org/downloads.html).
+[Maven
Central](https://search.maven.org/search?q=g:org.apache.nutch%20AND%20a:nutch&core=gav)
as a Maven dependency.
+The release is available from our [DOWNLAODS PAGE](/downloads.html).
##23 December 2017 - Nutch 1.14 Release
@@ -127,8 +131,8 @@ An account of the CHANGES in this releas
[release
report](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=10680&version=12340218).
As usual in the 1.X series, release artifacts are made available as both
source and binary and also available within
-[Maven
Central](http://search.maven.org/#search|gav|1|g%3A%22org.apache.nutch%22%20AND%20a%3A%22nutch%22)
as a Maven dependency.
-The release is available from our [DOWNLAODS
PAGE](http://nutch.apache.org/downloads.html).
+[Maven
Central](https://search.maven.org/search?q=g:org.apache.nutch%20AND%20a:nutch&core=gav)
as a Maven dependency.
+The release is available from our [DOWNLAODS PAGE](/downloads.html).
##02 April 2017 - Nutch 1.13 Release
@@ -139,8 +143,8 @@ An account of the CHANGES in this releas
[release report](https://s.apache.org/wq3x).
As usual in the 1.X series, release artifacts are made available as both
source and binary and also available within
-[Maven
Central](http://search.maven.org/#search|gav|1|g%3A%22org.apache.nutch%22%20AND%20a%3A%22nutch%22)
as a Maven dependency.
-The release is available from our [DOWNLAODS
PAGE](http://nutch.apache.org/downloads.html).
+[Maven
Central](https://search.maven.org/search?q=g:org.apache.nutch%20AND%20a:nutch&core=gav)
as a Maven dependency.
+The release is available from our [DOWNLAODS PAGE](/downloads.html).
##18 June 2016 - Nutch 1.12 Release
@@ -151,8 +155,8 @@ This release is the result of many month
[release report](https://s.apache.org/nutch1.12).
As usual in the 1.X series, release artifacts are made available as both
source and binary and also available within
-[Maven
Central](http://search.maven.org/#search|gav|1|g%3A%22org.apache.nutch%22%20AND%20a%3A%22nutch%22)
as a Maven dependency.
-The release is available from our [DOWNLAODS
PAGE](http://nutch.apache.org/downloads.html).
+[Maven
Central](https://search.maven.org/search?q=g:org.apache.nutch%20AND%20a:nutch&core=gav)
as a Maven dependency.
+The release is available from our [DOWNLAODS PAGE](/downloads.html).
##21 January 2016 - Nutch 2.3.1 Release
@@ -160,11 +164,11 @@ The Apache Nutch PMC are pleased to anno
current users and developers of the 2.X series to upgrade to this release.
This bug fix release contains around 40 issues addressed. For a complete
overview of these issues please see the
-[release report](http://s.apache.org/nutch_2.3.1).
+[release report](https://s.apache.org/nutch_2.3.1).
As usual in the 2.X series, release artifacts are made available as only
source and also available within
-[Maven
Central](http://search.maven.org/#search|gav|1|g%3A%22org.apache.nutch%22%20AND%20a%3A%22nutch%22)
as a Maven dependency.
-The release is available from our [DOWNLAODS
PAGE](http://nutch.apache.org/downloads.html).
+[Maven
Central](https://search.maven.org/search?q=g:org.apache.nutch%20AND%20a:nutch&core=gav)
as a Maven dependency.
+The release is available from our [DOWNLAODS PAGE](/downloads.html).
The recommended Gora backends for this Nutch release are
@@ -185,11 +189,11 @@ The Apache Nutch PMC are pleased to anno
current users and developers of the 1.X series to upgrade to this release.
This release is the result of many months of work and around 100 issues
addressed. For a complete overview of these issues please see the
-[release report](http://s.apache.org/nutch11).
+[release report](https://s.apache.org/nutch11).
As usual in the 1.X series, release artifacts are made available as both
source and binary and also available within
-[Maven
Central](http://search.maven.org/#search|gav|1|g%3A%22org.apache.nutch%22%20AND%20a%3A%22nutch%22)
as a Maven dependency.
-The release is available from our [DOWNLAODS
PAGE](http://nutch.apache.org/downloads.html).
+[Maven
Central](https://search.maven.org/search?q=g:org.apache.nutch%20AND%20a:nutch&core=gav)
as a Maven dependency.
+The release is available from our [DOWNLAODS PAGE](/downloads.html).
##06 May 2015 - Nutch 1.10 Release
@@ -197,15 +201,15 @@ The Apache Nutch PMC are pleased to anno
current users and developers of the 1.X series to upgrade to this release.
This release is the result of many months of work and well over 100 issues
addressed. For a complete overview of these issues please see the
-[release report](http://s.apache.org/nutch10).
+[release report](https://s.apache.org/nutch10).
As usual in the 1.X series, release artifacts are made available as both
source and binary and also available within
-[Maven Central](http://search.maven.org/) as a Maven dependency.
-The release is available from our [DOWNLAODS
PAGE](http://nutch.apache.org/downloads.html).
+[Maven Central](https://search.maven.org/) as a Maven dependency.
+The release is available from our [DOWNLAODS PAGE](/downloads.html).
## 23 April 2015 - Apache Nutch Reaches 2000th Jira Issue
-<blockquote class="twitter-tweet" lang="en"><p><a
href="https://twitter.com/ApacheNutch">@ApacheNutch</a> reaches 2000th issue on
<a href="https://twitter.com/TheASF">@TheASF</a> <a
href="https://twitter.com/hashtag/JIRA?src=hash">#JIRA</a> with over a decade
of <a href="https://twitter.com/hashtag/opensource?src=hash">#opensource</a>
crawling on the <a href="https://twitter.com/hashtag/www?src=hash">#www</a> <a
href="http://t.co/k3VLhbJQhg">pic.twitter.com/k3VLhbJQhg</a></p>— Apache
Nutch (@ApacheNutch) <a
href="https://twitter.com/ApacheNutch/status/591359830171856896">April 23,
2015</a></blockquote>
+<blockquote class="twitter-tweet" lang="en"><p><a
href="https://twitter.com/ApacheNutch">@ApacheNutch</a> reaches 2000th issue on
<a href="https://twitter.com/TheASF">@TheASF</a> <a
href="https://twitter.com/hashtag/JIRA?src=hash">#JIRA</a> with over a decade
of <a href="https://twitter.com/hashtag/opensource?src=hash">#opensource</a>
crawling on the <a href="https://twitter.com/hashtag/www?src=hash">#www</a> <a
href="https://t.co/k3VLhbJQhg">pic.twitter.com/k3VLhbJQhg</a></p>— Apache
Nutch (@ApacheNutch) <a
href="https://twitter.com/ApacheNutch/status/591359830171856896">April 23,
2015</a></blockquote>
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
##22 January 2015 - Nutch 2.3 Release
@@ -214,35 +218,35 @@ The Apache Nutch PMC are pleased to anno
current users and developers of the 2.X series to upgrade to this release.
After successful completion of the first [Nutch Google Summer of Code
project](https://issues.apache.org/jira/browse/NUTCH-841)
we are pleased to announce that Nutch 2.3 release now comes packaged with a
self
-contained [Apache Wicket](http://wicket.apache.org)-based Web Application.
+contained [Apache Wicket](https://wicket.apache.org/)-based Web Application.
This release is the result of many months of work and 143 issues addressed.
For a complete overview of these issues please see the
-[release report](http://s.apache.org/nutch_2.3).
+[release report](https://s.apache.org/nutch_2.3).
As usual in the 2.x series, this release is made available only as source, but
is also available within
-[Maven Central](http://search.maven.org/) as a Maven dependency.
-The release is available from our [DOWNLAODS
PAGE](http://nutch.apache.org/downloads.html).
+[Maven Central](https://search.maven.org/) as a Maven dependency.
+The release is available from our [DOWNLAODS PAGE](/downloads.html).
-The supported [Apache Gora](http://gora.apache.org) v0.5 backends are;
+The supported [Apache Gora](https://gora.apache.org/) v0.5 backends are;
- * [Apache Hadoop](http://hadoop.apache.org) 1.0.1 & 2.4.0
- * [Apache Cassandra](http://cassandra.apache.org) 2.0.2
- * [Apache HBase](http://hbase.apache.org) 0.94.14
- * [Apache Accumulo](http://accumulo.apache.org) 1.5.1
- * [MongoDB](http://mongodb.org) 2.12.2
- * [Apache Solr](http://lucene.apache.org/solr) 4.8.1
- * [Apache Avro](http://avro.apache.org) 1.7.6
+ * [Apache Hadoop](https://hadoop.apache.org/) 1.0.1 & 2.4.0
+ * [Apache Cassandra](https://cassandra.apache.org/) 2.0.2
+ * [Apache HBase](https://hbase.apache.org/) 0.94.14
+ * [Apache Accumulo](https://accumulo.apache.org/) 1.5.1
+ * [MongoDB](https://mongodb.org/) 2.12.2
+ * [Apache Solr](https://lucene.apache.org/solr) 4.8.1
+ * [Apache Avro](https://avro.apache.org/) 1.7.6
Please note that the SQL backend for Gora has been deprecated.
##22 September 2014 - Wicket WebApp now part of Nutch 2.x Codebase
- <a title="Apache Wicket" href="http://wicket.apache.org">
- <img src="http://wicket.apache.org/guide/img/apache-wicket.png"
class="float-right" alt="Apache Wicket Logo" height="100" width="400"/>
+ <a title="Apache Wicket" href="https://wicket.apache.org/">
+ <img src="https://wicket.apache.org/guide/img/apache-wicket.png"
class="float-right" alt="Apache Wicket Logo" height="100" width="400"/>
</a>
After successful completion of the first [Nutch Google Summer of Code
project](https://issues.apache.org/jira/browse/NUTCH-841)
we are pleased to announce that Nutch 2.X branch now comes packaged with a
self
-contained [Apache Wicket](http://wicket.apache.org)-based Web Application.
+contained [Apache Wicket](https://wicket.apache.org/)-based Web Application.
This not only greatly lowers the barrier for direct interaction with the Nutch
2.X
REST API but also provides a stepping stone from which we intend to backport
this
@@ -251,8 +255,8 @@ work to the Nutch 1.X (trunk) series.
Some of the Web Application features include:
* Functionality to dynamically load seed URLs in order to bootstrap Nutch
crawls
- * Browsable and dynamic editing of [Configuration
overrides](http://wiki.apache.org/nutch/NutchPropertiesCompleteList)
- * Complete [REST API
documentation](https://wiki.apache.org/nutch/NutchRESTAPI) and UML
+ * Browsable and dynamic editing of [Configuration
overrides](https://cwiki.apache.org/nutch/NutchPropertiesCompleteList)
+ * Complete [REST API
documentation](https://cwiki.apache.org/nutch/NutchRESTAPI) and UML
model describing REST API calls, Administration and Job and Configuration
Management.
The new Web Application feature will be present within the upcoming Nutch 2.3
Release.
@@ -262,11 +266,11 @@ The new Web Application feature will be
<p>The Apache Nutch PMC are pleased to announce the immediate release of
Apache Nutch v1.9, we advise all
current users and developers of the 1.X series to upgrade to this release.
This release addressed
no fewer than 55 issues in total.
- Please see the <a
href="http://www.apache.org/dist/nutch/1.9/CHANGES.txt">list of changes</a> for
a full
- breakdown, or see the <a href="http://s.apache.org/1.9-release">release
report</a>.
+ Please see the <a
href="https://www.apache.org/dist/nutch/1.9/CHANGES.txt">list of changes</a>
for a full
+ breakdown, or see the <a href="https://s.apache.org/1.9-release">release
report</a>.
As usual in the 1.X series, this release is made available both as source
and binary. Additionally developers
- can find Maven artifacts within <a href="http://search.maven.org/">Maven
Central</a>.
- The release is available <a
href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
+ can find Maven artifacts within <a href="https://search.maven.org/">Maven
Central</a>.
+ The release is available <a
href="https://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
</p>
</div>
@@ -308,7 +312,7 @@ The new Web Application feature will be
<p>The Apache Nutch PMC are pleased to announce the immediate release of
Apache Nutch v1.8, we advise all
current users and developers of the 1.X series to upgrade to this release.
Alhough this
release includes library upgrades to <a
href="http://code.google.com/p/crawler-commons/">Crawler Commons</a> 0.3 and
- <a href="http://tika.apache.org">Apache Tika</a> 1.5, it also provides over
30 bug fixes as well as 18 improvements.
+ <a href="https://tika.apache.org/">Apache Tika</a> 1.5, it also provides
over 30 bug fixes as well as 18 improvements.
Please see the <a
href="http://www.apache.org/dist/nutch/1.8/CHANGES.txt">list of changes</a> for
a full
breakdown, or see the <a href="http://s.apache.org/oHY">release report</a>.
As usual in the 1.X series, this release is made available both as source
and binary. Additionally developers
@@ -322,8 +326,8 @@ The new Web Application feature will be
<h2>02 July 2013 - Apache Nutch v2.2.1 Released</h2>
<p>The Apache Nutch PMC are pleased to announce the immediate release of
Apache Nutch v2.2.1, we advise all
current users and developers of the 2.X series to upgrade to this release
ASAP. Although this
- release includes library upgrades to <a
href="http://hadoop.apache.org">Apache Hadoop</a> 1.2.0 and
- <a href="http://tika.apache.org">Apache Tika</a> 1.3, it is predominantly a
bug fix for
+ release includes library upgrades to <a
href="https://hadoop.apache.org/">Apache Hadoop</a> 1.2.0 and
+ <a href="https://tika.apache.org/">Apache Tika</a> 1.3, it is predominantly
a bug fix for
<a href="https://issues.apache.org/jira/browse/NUTCH-1591">NUTCH-1591 -
Incorrect conversion of ByteBuffer to String</a>.
Please see the <a
href="http://www.apache.org/dist/nutch/2.2.1/CHANGES-2.2.1.txt">list of
changes</a> for a full
breakdown, or see the <a href="http://s.apache.org/PGa">release report</a>.
@@ -342,8 +346,8 @@ The new Web Application feature will be
<a href="http://lucene.apache.org/solr">Apache Solr</a> and <a
href="http://www.elasticsearch.org/">Elastic Search</a>.
Shadowing the recent Nutch 2.2 release, parsing
of Robots.txt is now delegated to <a
href="http://code.google.com/p/crawler-commons/">
- Crawler-Commons</a>. Key library upgrades have been made to <a
href="http://hadoop.apache.org">Apache Hadoop</a> 1.2.0
- and <a href="http://tika.apache.org">Apache Tika</a> 1.3. Please see the <a
href="http://www.apache.org/dist/nutch/1.7/1.7-CHANGES.txt">list of
+ Crawler-Commons</a>. Key library upgrades have been made to <a
href="https://hadoop.apache.org/">Apache Hadoop</a> 1.2.0
+ and <a href="https://tika.apache.org/">Apache Tika</a> 1.3. Please see the
<a href="http://www.apache.org/dist/nutch/1.7/1.7-CHANGES.txt">list of
changes</a> or the <a href="http://s.apache.org/1zE">release report</a> made
in this version for a full
breakdown.
As usual in the 1.x series, the release is made available as binary and
source (zip + tar.gz) and is also available within
@@ -359,8 +363,8 @@ The new Web Application feature will be
release includes over 30 bug fixes and over 25 improvements representing the
third release of increasingly
popular 2.x Nutch series. This release features inclusion of <a
href="http://code.google.com/p/crawler-commons/">
Crawler-Commons</a> which Nutch now utilizes for improved robots.txt
parsing, library upgrades to
- <a href="http://hadoop.apache.org">Apache Hadoop</a> 1.1.1, <a
href="http://gora.apache.org">Apache Gora</a>
- 0.3, <a href="http://tika.apache.org">Apache Tika</a> 1.2 and <a
href="http://www.brics.dk/automaton/automaton">
+ <a href="https://hadoop.apache.org/">Apache Hadoop</a> 1.1.1, <a
href="https://gora.apache.org/">Apache Gora</a>
+ 0.3, <a href="https://tika.apache.org/">Apache Tika</a> 1.2 and <a
href="http://www.brics.dk/automaton/automaton">
Automaton</a> 1.11-8. Please see the <a
href="http://www.apache.org/dist/nutch/2.2/2.2-CHANGES.txt">list of
changes</a> or the <a href="http://s.apache.org/LPB">release report</a> made
in this version for a full
breakdown.
@@ -376,7 +380,7 @@ The new Web Application feature will be
release includes over 20 bug fixes, the same in improvements, as well as new
functionalities including a new HostNormalizer,
the ability to dynamically set fetchInterval by MIME-type and functional
enhancements to the Indexer API inluding the normalization
of URL's and the deletion of robots noIndex documents. Other notable
improvements include the upgrade of key dependencies to
- <a href="http://tika.apache.org/1.2/index.html">Tika 1.2</a> and <a
href="http://www.brics.dk/automaton/">Automaton 1.11-8</a>.
+ <a href="https://tika.apache.org/1.2/index.html">Tika 1.2</a> and <a
href="http://www.brics.dk/automaton/">Automaton 1.11-8</a>.
Please see the <a
href="http://www.apache.org/dist/nutch/1.6/CHANGES_1.6.txt">list of changes</a>
or the
<a
href="https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=10680&version=12319941">release
report</a> made
in this version for a full breakdown. The release is available
@@ -389,7 +393,7 @@ The new Web Application feature will be
<p>The Apache Nutch PMC are very pleased to announce the release of Apache
Nutch v2.1. This
release continues to provide Nutch users with a simplified Nutch
distribution building on the 2.x
development drive which is growing in popularity amongst the community. As
well as addressing ~20 bugs
- this release also offers improved properties for better <a
href="http://lucene.apache.org/solr/">Solr</a> configuration, upgrades to
various <a href="http://gora.apache.org">Gora</a> dependencies and the
introduction of the option to build indexes in <a
href="http://www.elasticsearch.org/">elastic search</a>.
+ this release also offers improved properties for better <a
href="http://lucene.apache.org/solr/">Solr</a> configuration, upgrades to
various <a href="https://gora.apache.org/">Gora</a> dependencies and the
introduction of the option to build indexes in <a
href="http://www.elasticsearch.org/">elastic search</a>.
Please see the <a
href="http://www.apache.org/dist/nutch/2.1/CHANGES-2.1.txt">list of changes</a>
made
in this version for a full breakdown. The release is available
<a href="http://www.apache.org/dyn/closer.cgi/nutch/">here</a>.
Modified: nutch/cms_site/trunk/content/javadoc.md
URL: http://svn.apache.org/viewvc/nutch/cms_site/trunk/content/javadoc.md?rev=1863783&r1=1863782&r2=1863783&view=diff
==============================================================================
--- nutch/cms_site/trunk/content/javadoc.md (original)
+++ nutch/cms_site/trunk/content/javadoc.md Fri Jul 26 08:46:42 2019
@@ -35,7 +35,7 @@ under the License.
<!-- div id="bodyColumn" class="span9"-->
<p>It should ne noted that the 1.X branch is currently the Nutch trunk code
base.</p>
<p>Nutch 2.X is a different code base and uses different data structures.
For more information
- on the 2.X branch, we urge users to approach the <a
href="http://wiki.apache.org/nutch/#Nutch_2.x">wiki documentation</a>
+ on the 2.X branch, we urge users to approach the <a
href="https://cwiki.apache.org/nutch/#Nutch_2.x">wiki documentation</a>
</p>
<div class="section">
Modified: nutch/cms_site/trunk/content/version_control.md
URL: http://svn.apache.org/viewvc/nutch/cms_site/trunk/content/version_control.md?rev=1863783&r1=1863782&r2=1863783&view=diff
==============================================================================
--- nutch/cms_site/trunk/content/version_control.md (original)
+++ nutch/cms_site/trunk/content/version_control.md Fri Jul 26 08:46:42 2019
@@ -4,4 +4,4 @@ Title: Nutch Version Control System
Nutch uses the Apache Software Foundation Git writeable repositories as its
master
repository.
-You can find more information about Nutch source control [here](https://wiki.apache.org/nutch/UsingGit).
+You can find more information about Nutch source control [here](https://cwiki.apache.org/nutch/UsingGit).