[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2015-09-24 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=299&rev2=300

   * HardwareRequirements
   * NutchResources
   * NutchScoring - The whats and wheres of Scoring implementations in Apache 
Nutch
+  * NutchFileFormats - Provides information on the Nutch file formats
  
  == Nutch Development ==
   * [[Becoming_A_Nutch_Developer|Becoming a Nutch Developer]] - Start 
developing and contributing to Nutch.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2015-05-22 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=298&rev2=299

   Nutch 1.X tutorial(s) 
   * NutchTutorial - How to configure Nutch to crawl in local mode and post to 
Apache Solr for search/index.
   * QuickStartparseChecker - Quick start tutorial on how to use the 
ParseChecker tool to quickly scrape a website.
+  * [[Nutch 1.X RESTAPI|https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI]] - An 
overview of the entire Nutch 1.X REST API. 
  
   Nutch 2.X tutorial(s) 
   * Nutch2Tutorial -- How to get Nutch 2.X to use HBase as persistence layer 
for Gora. This is the primary Nutch 2.X tutorial. 


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2015-05-14 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=297&rev2=298

   * InternalDocumentation -- How Nutch works.
   * [[http://nutch.apache.org/version_control.html|Nutch Version Control]]
   * HowToContribute
-  * TaskList -- Tasks for Nutch developers. /!\ :Severe update required: /!\
   * [[Committer's_Rules]] -- Committers should follow these guidelines when 
deciding, which branch to use for committing the patches and when to commit.
   * [[Release_HOWTO]]
   * [[CMS_Website_Update_HOWTO]] - How to edit the Nutch website based on the 
[[http://www.apache.org/dev/cms.html|Apache CMS]].


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=293&rev2=294

   * NutchMeetUps - Records of previous Nutch community meetup, hackathons, 
barcamps etc.
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
   * GoogleSummerOfCode - An area dedicated to GSoC projects and student/mentor 
development/documentation sandbox.
-  * AdvancedAJAXInteraction - Discussion centered on enabling Nutch to not 
only fetch, but also interact with JavaScript
+  * AdvancedAjaxInteraction - Discussion centered on enabling Nutch to not 
only fetch, but also interact with JavaScript
  
  == Nutch 2.x ==
   * Nutch2Crawling - A description of the crawling jobs and field to database 
mappings.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2015-02-04 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=292&rev2=293

   * NutchMeetUps - Records of previous Nutch community meetup, hackathons, 
barcamps etc.
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
   * GoogleSummerOfCode - An area dedicated to GSoC projects and student/mentor 
development/documentation sandbox.
+  * AdvancedAJAXInteraction - Discussion centered on enabling Nutch to not 
only fetch, but also interact with JavaScript
  
  == Nutch 2.x ==
   * Nutch2Crawling - A description of the crawling jobs and field to database 
mappings.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2015-01-21 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=291&rev2=292

   * QuickStartparseChecker - Quick start tutorial on how to use the 
ParseChecker tool to quickly scrape a website.
  
   Nutch 2.X tutorial(s) 
-  * Nutch2Tutorial -- How to get Nutch 2.X to use HBase as persistence layer 
for Gora  
+  * Nutch2Tutorial -- How to get Nutch 2.X to use HBase as persistence layer 
for Gora. This is the primary Nutch 2.X tutorial. 
-  * [[http://nlp.solutions.asia/?p=180|Setting up Nutch 2.0 with MySQL to 
handle UTF-8]] - A step-by-step tutorial
-  * [[http://www.covert.io/post/18414889381/accumulo-nutch-and-gora|Accumulo, 
Nutch, and Gora]] - A step-by-step tutorial
-  * [[Nutch2Cassandra|Setting up Nutch 2.x with Cassandra]] - How to setup and 
run Nutch 2.x using Cassandra as storage.
+  * [[Nutch2Cassandra|Setting up Nutch 2.x with Cassandra]] - How to setup and 
run Nutch 2.x using Cassandra as storage. 
+  * [[http://nlp.solutions.asia/?p=180|Setting up Nutch 2.0 with MySQL to 
handle UTF-8]] - A step-by-step tutorial () /!\ Very Old SQL is deprecated in 
Nutch 2.X /!\
+  * [[http://www.covert.io/post/18414889381/accumulo-nutch-and-gora|Accumulo, 
Nutch, and Gora]] - A step-by-step tutorial /!\ Very Old /!\
  
   Other Tutorial(s) 
   * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-11-25 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=290&rev2=291

   * Nutch 1.x: A well matured, production ready crawler. 1.x enables fine 
grained configuration, relying on [[http://hadoop.apache.org/|Apache Hadoop]] 
data structures, which are great for batch processing.
   * Nutch 2.x: An emerging alternative taking direct inspiration from 1.x, but 
which differs in one key area; storage is abstracted away from any specific 
underlying data store by using [[http://gora.apache.org|Apache Gora]] for 
handling object to persistent mappings. This means we can implement an 
extremely flexibile model/stack for storing everything (fetch time, status, 
content, parsed text, outlinks, inlinks, etc.) into a number of NoSQL storage 
solutions.
  
- Being pluggable and modular of course has it's benefits, Nutch provides 
extensible interfaces such as Parse, Index and ScoringFilter's for custom 
implementations e.g. [[htp://tika.apache.org|Apache Tika]] for parsing. 
Additionally, pluggable indexing exists for 
[[http://lucene.apache.org/solr|Apache Solr]], 
[[http://www.elasticsearch.org|Elastic Search]], etc.
+ Being pluggable and modular of course has it's benefits, Nutch provides 
extensible interfaces such as Parse, Index and ScoringFilter's for custom 
implementations e.g. [[http://tika.apache.org|Apache Tika]] for parsing. 
Additionally, pluggable indexing exists for 
[[http://lucene.apache.org/solr|Apache Solr]], 
[[http://www.elasticsearch.org|Elastic Search]], etc.
  
  Nutch can run on a single machine, but gains a lot of its strength from 
running in a Hadoop cluster
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-09-27 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=289&rev2=290

  == Nutch Version Administration ==
   * DownloadingNutch
   * Current CommandLineOptions: Command line options for 1.X and 2.X 
-  * [[https://nutch.apache.org/apidocs-1.8/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-1.X release.
+  * [[https://nutch.apache.org/apidocs-1.9/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-1.X release.
   * [[https://nutch.apache.org/apidocs-2.2.1/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-2.X release.
  
  === Tutorials ===


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-09-23 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=286&rev2=287

   * 
[[http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/|Recrawling 
with Nutch]] - How to re-crawl with Nutch. 
   * 
[[https://github.com/evolvingweb/ajax-solr/wiki/Tutorial%3A-Nutch|Ajax-Solr 
Tutorial: Nutch]] - Quick and easy guide to getting a nice UI on top of your 
Nutch crawl data. 
   * 
[[http://soryy.com/blog/2014/ajax-javascript-enabled-parsing-apache-nutch-selenium/|AJAX/JavaScript
 Enabled Parsing with Apache Nutch and Selenium]]
+  * SetupProxyForNutch - using Tinyproxy on Ubuntu
+  * SetupNutchAndTor - Crawling .onion hidden services using Nutch behind 
Polipo HTTP Proxy
  
  
  === Configuration ===
@@ -62, +64 @@

   * NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 
intranet crawling configuration.
   * OptimizingCrawls - How to optimise your crawling/fetching speed with Nutch.
   * ErrorMessages -- What they mean and suggestions for getting rid of them. 
/!\ :This requires extensive updating to reflect recent Nutch releases. In 
addition the legacy indexing and searching material should be archived. /!\
-  * SetupProxyForNutch - using Tinyproxy on Ubuntu
   * IndexStructure /!\ :This page needs a slight update to provide more 
information on plugins and the data they send to Solr for indexing: /!\
  
  == General Information ==


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-09-19 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=285&rev2=286

   * PluginCentral -- How to write your own plugins and use other people's.
   * InternalDocumentation -- How Nutch works.
   * [[http://nutch.apache.org/version_control.html|Nutch Version Control]]
-  * FixingOpicScoring - ''In planning''.
   * HowToContribute
   * TaskList -- Tasks for Nutch developers. /!\ :Severe update required: /!\
   * [[Committer's_Rules]] -- Committers should follow these guidelines when 
deciding, which branch to use for committing the patches and when to commit.
@@ -103, +102 @@

   * Nutch2Crawling - A description of the crawling jobs and field to database 
mappings.
   * Nutch2Architecture - A high level overview of the new architecture and 
design
   * Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
-  * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
-  * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.
   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
   * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of 
them.
   * [[NutchConfigurationFiles-2.x]] -- Configuration files that are specific 
to Nutch-2.x


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-09-19 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=284&rev2=285

   * [[FAQ]]
   * HardwareRequirements
   * NutchResources
+  * NutchScoring - The whats and wheres of Scoring implementations in Apache 
Nutch
  
  == Nutch Development ==
   * [[Becoming_A_Nutch_Developer|Becoming a Nutch Developer]] - Start 
developing and contributing to Nutch.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-09-15 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=283&rev2=284

   * [[NutchConfigurationFiles-2.x]] -- Configuration files that are specific 
to Nutch-2.x
   * [[http:///nlp.solutions.asia/?p=232|Understanding the columns/fields in 
Nutch 2.0 Webpage -  Detailed article]]
   * WorkingWithGoraSnapshots - A step by step guide to working with Gora 
development code within your Nutch 2.x deployment
-  * NutchRESTAPI - A UML diagram and overview of the entire Nutch 2.X REST 
API. 
+  * [[NutchRESTAPI]] - A UML diagram and overview of the entire Nutch 2.X REST 
API. 
  
  == Pre Nutch 1.3 and Archive ==
   * [[Archive and Legacy]]


Re: [Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-09-15 Thread Mattmann, Chris A (3980)
Boom, thanks Lewis

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-Original Message-
From: Apache Wiki 
Reply-To: "dev@nutch.apache.org" 
Date: Monday, September 15, 2014 12:14 PM
To: Apache Wiki 
Subject: [Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

>The "FrontPage" page has been changed by LewisJohnMcgibbney:
>https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=282&rev2=283
>
>   * [[NutchConfigurationFiles-2.x]] -- Configuration files that are
>specific to Nutch-2.x
>   * [[http:///nlp.solutions.asia/?p=232|Understanding the columns/fields
>in Nutch 2.0 Webpage -  Detailed article]]
>   * WorkingWithGoraSnapshots - A step by step guide to working with Gora
>development code within your Nutch 2.x deployment
>+  * NutchRESTAPI - A UML diagram and overview of the entire Nutch 2.X
>REST API. 
>  
>  == Pre Nutch 1.3 and Archive ==
>   * [[Archive and Legacy]]



[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-09-15 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=282&rev2=283

   * [[NutchConfigurationFiles-2.x]] -- Configuration files that are specific 
to Nutch-2.x
   * [[http:///nlp.solutions.asia/?p=232|Understanding the columns/fields in 
Nutch 2.0 Webpage -  Detailed article]]
   * WorkingWithGoraSnapshots - A step by step guide to working with Gora 
development code within your Nutch 2.x deployment
+  * NutchRESTAPI - A UML diagram and overview of the entire Nutch 2.X REST 
API. 
  
  == Pre Nutch 1.3 and Archive ==
   * [[Archive and Legacy]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-08-05 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=280&rev2=281

   * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search 
Microsoft Office, PDF etc. documents in a file system hierarchy with a Solr 
backend.
   * 
[[http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/|Recrawling 
with Nutch]] - How to re-crawl with Nutch. 
   * 
[[https://github.com/evolvingweb/ajax-solr/wiki/Tutorial%3A-Nutch|Ajax-Solr 
Tutorial: Nutch]] - Quick and easy guide to getting a nice UI on top of your 
Nutch crawl data. 
+  * 
[[http://soryy.com/blog/2014/ajax-javascript-enabled-parsing-apache-nutch-selenium/|AJAX/JavaScript
 Enabled Parsing with Apache Nutch and Selenium]]
  
  
  === Configuration ===


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-06-19 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=278&rev2=279

   * [[Getting_Started]]
   * NutchMeetUps - Records of previous Nutch community meetup, hackathons, 
barcamps etc.
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
+  * GoogleSummerOfCode - An area dedicated to GSoC projects and student/mentor 
development/documentation sandbox.
  
  == Nutch 2.x ==
   * Nutch2Crawling - A description of the crawling jobs and field to database 
mappings.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-04-07 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=276&rev2=277

  == Nutch Version Administration ==
   * DownloadingNutch
   * Current CommandLineOptions: Command line options for 1.X and 2.X 
-  * [[http://nutch.apache.org/apidocs-1.6/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-1.X release.
+  * [[https://nutch.apache.org/apidocs-1.8/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-1.X release.
-  * [[http://nutch.apache.org/apidocs-2.1/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-2.X release.
+  * [[https://nutch.apache.org/apidocs-2.2.1/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-2.X release.
  
  === Tutorials ===
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2014-03-05 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/FrontPage?action=diff&rev1=273&rev2=274

  = Welcome to the Apache Nutch Wiki =
  {{attachment:nutch_logo_medium.gif}}
  
- Please contribute your knowledge about Nutch here! <>
+ Please contribute your knowledge about Nutch here! 
+ '''If you would like to update any content, would like to add your own 
content or would like to see something added then please browse the 
[[http://s.apache.org/73z|Documentation issues]] and open a 
[[https://issues.apache.org/jira/browse/NUTCH|Jira ticket]] (tagging it with 
the [[http://s.apache.org/73z|Documentation label]]) if you cannot find 
something your looking for.
+ 
+ <>
  
  == Nutch Version Administration ==
   * DownloadingNutch


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2013-02-09 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=258&rev2=259

   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
  
  == Nutch 2.x ==
-  * Nutch2Crawling - A description of the crawling jobs
+  * Nutch2Crawling - A description of the crawling jobs and field to database 
mappings.
   * Nutch2Architecture - A high level overview of the new architecture and 
design
   * Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
   * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.
   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
   * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of 
them.
-  * [[http:///nlp.solutions.asia/?p=232|Understanding the columns/fields in 
Nutch 2.0 Webpage]]
+  * [[http:///nlp.solutions.asia/?p=232|Understanding the columns/fields in 
Nutch 2.0 Webpage -  Detailed article]]
  
  == Pre Nutch 1.3 and Archive ==
   * [[Archive and Legacy]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2013-01-24 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=257&rev2=258

   * RunNutchInEclipse - How to configure, build, crawl and debug Nutch within 
Eclipse
   * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search 
Microsoft Office, PDF etc. documents in a file system hierarchy with a Solr 
backend.
   * 
[[http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/|Recrawling 
with Nutch]] - How to re-crawl with Nutch. 
+  * 
[[https://github.com/evolvingweb/ajax-solr/wiki/Tutorial%3A-Nutch|Ajax-Solr 
Tutorial: Nutch]] - Quick and easy guide to getting a nice UI on top of your 
Nutch crawl data. 
  
  === Configuration ===
   * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect recent Nutch releases: /!\


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2013-01-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=256&rev2=257

   * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch 
in deploy mode over a Hadoop cluster. 
   * RunNutchInEclipse - How to configure, build, crawl and debug Nutch within 
Eclipse
   * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search 
Microsoft Office, PDF etc. documents in a file system hierarchy with a Solr 
backend.
+  * 
[[http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/|Recrawling 
with Nutch]] - How to re-crawl with Nutch. 
  
  === Configuration ===
   * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect recent Nutch releases: /!\


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2013-01-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=255&rev2=256

  == Nutch Version Administration ==
   * DownloadingNutch
   * Current CommandLineOptions: Command line options for 1.X and 2.X 
-  * [[http://nutch.apache.org/apidocs-1.5/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-1.5.X release.
+  * [[http://nutch.apache.org/apidocs-1.6/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-1.X release.
-  * [[http://nutch.apache.org/apidocs-2.0/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-2.X release.
+  * [[http://nutch.apache.org/apidocs-2.1/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-2.X release.
  
  === Tutorials ===
  
@@ -67, +67 @@

   * [[Image_Search_Design]]
   * StrategicGoals
   * [[Getting_Started]]
-  * ApacheConUs2009MeetUp - List of topics for !MeetUp at !ApacheCon US 2009 
in Oakland (Nov 2-6)
+  * NutchMeetUps - Records of previous Nutch community meetup, hackathons, 
barcamps etc.
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
  
  == Nutch 2.x ==


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2013-01-12 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=254&rev2=255

   * ApacheConUs2009MeetUp - List of topics for !MeetUp at !ApacheCon US 2009 
in Oakland (Nov 2-6)
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
  
- == Nutch 2.0 ==
+ == Nutch 2.x ==
   * Nutch2Crawling - A description of the crawling jobs
   * Nutch2Architecture - A high level overview of the new architecture and 
design
   * Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-12-12 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=253&rev2=254

  
  === Configuration ===
   * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect recent Nutch releases: /!\
-  * NutchConfigurationFiles
+  * NutchConfigurationFiles: An overview from Nutch developers.
+  * NutchPropertiesCompleteList: A fine grained account of all Nutch property 
configuration.
   * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself 
using NTLM, Basic or Digest authentication schemes.
   * NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 
intranet crawling configuration.
   * OptimizingCrawls - How to optimise your crawling/fetching speed with Nutch.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-08-27 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=252&rev2=253

   Nutch 2.X tutorial(s) 
   * Nutch2Tutorial -- How to get Nutch 2.X to use HBase as persistence layer 
for Gora  
   * [[http://nlp.solutions.asia/?p=180|Setting up Nutch 2.0 with MySQL to 
handle UTF-8]] - A step-by-step tutorial
-  * [[http://www.covert.io/post/18414889381/accumulo-nutch-and-gora]] - 
Accumulo, Nutch, and Gora
+  * [[http://www.covert.io/post/18414889381/accumulo-nutch-and-gora|Accumulo, 
Nutch, and Gora]] - A step-by-step tutorial
  
   Other Tutorial(s) 
   * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-08-27 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=251&rev2=252

  = Welcome to the Apache Nutch Wiki =
  {{http://www.interadvertising.co.uk/files/nutch_logo_medium.gif}}
  
- Please contribute your knowledge about Nutch here! <>
+ Please contribute your knowledge about Nutch here! <>
  
  == Nutch Version Administration ==
   * DownloadingNutch
@@ -12, +12 @@

   * [[http://nutch.apache.org/apidocs-2.0/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-2.X release.
  
  === Tutorials ===
+ 
+  Nutch 1.X tutorial(s) 
   * NutchTutorial - How to configure Nutch to crawl in local mode and post to 
Apache Solr for search/index.
+ 
+  Nutch 2.X tutorial(s) 
   * Nutch2Tutorial -- How to get Nutch 2.X to use HBase as persistence layer 
for Gora  
+  * [[http://nlp.solutions.asia/?p=180|Setting up Nutch 2.0 with MySQL to 
handle UTF-8]] - A step-by-step tutorial
+  * [[http://www.covert.io/post/18414889381/accumulo-nutch-and-gora]] - 
Accumulo, Nutch, and Gora
+ 
+  Other Tutorial(s) 
   * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.
   * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch 
in deploy mode over a Hadoop cluster. 
   * RunNutchInEclipse - How to configure, build, crawl and debug Nutch within 
Eclipse
@@ -68, +76 @@

   * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.
   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
-  * [[http://nlp.solutions.asia/?p=180|Setting up Nutch 2.0 with MySQL to 
handle UTF-8]] - A step-by-step tutorial
   * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of 
them.
   * [[http:///nlp.solutions.asia/?p=232|Understanding the columns/fields in 
Nutch 2.0 Webpage]]
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-08-06 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=248&rev2=249

   * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.
   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
+  * [[http://nlp.solutions.asia/?p=180|Setting up Nutch 2.0 with MySQL to 
handle UTF-8]] - A step-by-step tutorial
   * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of 
them.
  
  == Pre Nutch 1.3 and Archive ==


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-07-10 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=247&rev2=248

   * DownloadingNutch
   * Current CommandLineOptions: Command line options for 1.X and 2.X 
   * [[http://nutch.apache.org/apidocs-1.5/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-1.5.X release.
- * [[http://nutch.apache.org/apidocs-2.0/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-2.X release.
+  * [[http://nutch.apache.org/apidocs-2.0/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-2.X release.
  
  === Tutorials ===
   * NutchTutorial - How to configure Nutch to crawl in local mode and post to 
Apache Solr for search/index.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-07-10 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=246&rev2=247

  == Nutch Version Administration ==
   * DownloadingNutch
   * Current CommandLineOptions: Command line options for 1.X and 2.X 
-  * [[http://nutch.apache.org/apidocs-1.4/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.4 release.
+  * [[http://nutch.apache.org/apidocs-1.5/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-1.5.X release.
+ * [[http://nutch.apache.org/apidocs-2.0/index.html|JavaDocs]] -- The 
!JavaDocs for the most recent Nutch-2.X release.
  
  === Tutorials ===
   * NutchTutorial - How to configure Nutch to crawl in local mode and post to 
Apache Solr for search/index.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-06-25 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=244&rev2=245

   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
  
  == Nutch 2.0 ==
-  * ArchitecturalOverview - A high level overview of the new architecture and 
design
+  * Nutch2Architecture - A high level overview of the new architecture and 
design
   * Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
   * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-06-25 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=243&rev2=244

   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
  
  == Nutch 2.0 ==
+  * ArchitecturalOverview - A high level overview of the new architecture and 
design
   * Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
   * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=242&rev2=243

  
  === Tutorials ===
   * NutchTutorial - How to configure Nutch to crawl in local mode and post to 
Apache Solr for search/index.
+  * Nutch2Tutorial -- How to get Nutch 2.X to use HBase as persistence layer 
for Gora  
   * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.
   * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch 
in deploy mode over a Hadoop cluster. 
   * RunNutchInEclipse - How to configure, build, crawl and debug Nutch within 
Eclipse
@@ -60, +61 @@

   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
  
  == Nutch 2.0 ==
-  * Nutch2Tutorial -- How to get Nutch 2.X to use HBase as persistence layer 
for Gora  
   * Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
   * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=241&rev2=242

   * Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
   * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.
-  * [[GORA_HBase]] -- Configuring Nutch 2.0 with GORA and HBASE
   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
   * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of 
them.
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=240&rev2=241

   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
  
  == Nutch 2.0 ==
+  * Nutch2Tutorial -- How to get Nutch 2.X to use HBase as persistence layer 
for Gora  
   * Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
   * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-05-03 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=239&rev2=240

  
  == Nutch Version Administration ==
   * DownloadingNutch
-  * Current CommandLineOptions /!\ :New commands added which need to be 
documented: /!\
+  * Current CommandLineOptions: Command line options for 1.X and 2.X 
   * [[http://nutch.apache.org/apidocs-1.4/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.4 release.
  
  === Tutorials ===


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2012-04-03 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=238&rev2=239

  === Tutorials ===
   * NutchTutorial - How to configure Nutch to crawl in local mode and post to 
Apache Solr for search/index.
   * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.
-  * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch 
in deploy mode over a Hadoop cluster. /!\ :This tutorial is in development: /!\
+  * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch 
in deploy mode over a Hadoop cluster. 
   * RunNutchInEclipse - How to configure, build, crawl and debug Nutch within 
Eclipse
   * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search 
Microsoft Office, PDF etc. documents in a file system hierarchy with a Solr 
backend.
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-11-27 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=237&rev2=238

   * [[http://nutch.apache.org/apidocs-1.4/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.4 release.
  
  === Tutorials ===
-  * NutchTutorial - How to configure Nutch 1.3 to crawl in local mode and post 
to Apache Solr for search/index.
+  * NutchTutorial - How to configure Nutch to crawl in local mode and post to 
Apache Solr for search/index.
   * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.
   * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch 
in deploy mode over a Hadoop cluster. /!\ :This tutorial is in development: /!\
-  * RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 
within Eclipse
+  * RunNutchInEclipse - How to configure, build, crawl and debug Nutch within 
Eclipse
-  * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search 
Microsoft Office, PDF etc documentsin a file system hierachy with a Solr 
backend.
+  * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search 
Microsoft Office, PDF etc. documents in a file system hierarchy with a Solr 
backend.
  
  === Configuration ===
-  * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect Nutch 1.3 release: /!\
+  * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect recent Nutch releases: /!\
   * NutchConfigurationFiles
   * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself 
using NTLM, Basic or Digest authentication schemes.
-  * NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 
1.3 intranet crawling configuration.
+  * NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 
intranet crawling configuration.
-  * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
+  * OptimizingCrawls - How to optimise your crawling/fetching speed with Nutch.
-  * ErrorMessages -- What they mean and suggestions for getting rid of them. 
/!\ :This requires extensive updating to reflect Nutch 1.3. In addition the 
legacy indexing and searching material should be archived. /!\
+  * ErrorMessages -- What they mean and suggestions for getting rid of them. 
/!\ :This requires extensive updating to reflect recent Nutch releases. In 
addition the legacy indexing and searching material should be archived. /!\
   * SetupProxyForNutch - using Tinyproxy on Ubuntu
   * IndexStructure /!\ :This page needs a slight update to provide more 
information on plugins and the data they send to Solr for indexing: /!\
  
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
-  * [[Features]] /!\ :TODO:This needs to be completely overhauled to reflect 
Nutch 1.3 features. /!\
+  * [[Features]] /!\ :TODO:This needs to be completely overhauled to reflect 
recent Nutch features. /!\
   * Current [[NutchGotchas|Nutch Gotchas]]
   * PublicServers running Nutch
   * [[Presentations]] on Nutch


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-11-27 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=236&rev2=237

  == Nutch Version Administration ==
   * DownloadingNutch
   * Current CommandLineOptions /!\ :New commands added which need to be 
documented: /!\
-  * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
+  * [[http://nutch.apache.org/apidocs-1.4/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.4 release.
  
  === Tutorials ===
   * NutchTutorial - How to configure Nutch 1.3 to crawl in local mode and post 
to Apache Solr for search/index.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-11-17 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=235&rev2=236

  
  Please contribute your knowledge about Nutch here! <>
  
- == Nutch Version 1.3 Administration ==
+ == Nutch Version Administration ==
   * DownloadingNutch
   * Current CommandLineOptions /!\ :New commands added which need to be 
documented: /!\
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-11-15 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=234&rev2=235

   * [[Presentations]] on Nutch
   * Press [[Articles]]
   * [[Evaluations]] of Search Quality
-  * [[Help_Wanted]] organizations hiring Nutch expertise
   * Commercial [[Support]] and developers for hire
   * [[Mailing]] Lists
   * AcademicArticles that deal with Nutch


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-11-15 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=233&rev2=234

   * [[Release_HOWTO]]
   * [[Website_Update_HOWTO]]
   * [[Image_Search_Design]]
-  * [[NutchOSGi]]
   * StrategicGoals
   * [[Getting_Started]]
   * ApacheConUs2009MeetUp - List of topics for !MeetUp at !ApacheCon US 2009 
in Oakland (Nov 2-6)


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-09-23 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=231&rev2=232

   * Commercial [[Support]] and developers for hire
   * [[Mailing]] Lists
   * AcademicArticles that deal with Nutch
-  * [[FAQ]] /!\ :The Indexing and Searching section require update/archive to 
reflect new 1.3 release: /!\
+  * [[FAQ]] 
   * HardwareRequirements
   * NutchResources
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-09-10 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=230&rev2=231

  
  == Nutch Version 1.3 Administration ==
   * DownloadingNutch
-  * Current CommandLineOptions 
+  * Current CommandLineOptions /!\ :New commands added which need to be 
documented: /!\
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
   * NutchTutorial - How to configure Nutch 1.3 to crawl in local mode and post 
to Apache Solr for search/index.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-09-08 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=229&rev2=230

   * PluginCentral -- How to write your own plugins and use other people's. 
   * InternalDocumentation -- How Nutch works.
   * [[http://nutch.apache.org/version_control.html|Nutch Version Control]]
-  * MultiLingualSupport - ''In development''.
   * FixingOpicScoring - ''In planning''.
   * HowToContribute
   * TaskList -- Tasks for Nutch developers. /!\ :Severe update required: /!\


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-09-08 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=228&rev2=229

   * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
   * ErrorMessages -- What they mean and suggestions for getting rid of them. 
/!\ :This requires extensive updating to reflect Nutch 1.3. In addition the 
legacy indexing and searching material should be archived. /!\
   * SetupProxyForNutch - using Tinyproxy on Ubuntu 
+  * IndexStructure /!\ :This page needs a slight update to provide more 
information on plugins and the data they send to Solr for indexing: /!\
  
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
@@ -55, +56 @@

   * [[Image_Search_Design]]
   * [[NutchOSGi]]
   * StrategicGoals
-  * IndexStructure /!\ :This page needs a slight update to provide more 
information on plugins and the data they send to Solr for indexing: /!\
   * [[Getting_Started]]
   * ApacheConUs2009MeetUp - List of topics for !MeetUp at !ApacheCon US 2009 
in Oakland (Nov 2-6)
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-09-02 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=227&rev2=228

  === Tutorials ===
   * NutchTutorial - How to configure Nutch 1.3 to crawl in local mode and post 
to Apache Solr for search/index.
   *  [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.
-  * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch 
in deploy mode over a Hadoop cluster.
+  * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch 
in deploy mode over a Hadoop cluster. /!\ :This tutorial is in development: /!\
   * RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 
within Eclipse
  === Configuration ===
   * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect Nutch 1.3 release: /!\ 


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-09-02 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=226&rev2=227

   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
   * NutchTutorial - How to configure Nutch 1.3 to crawl in local mode and post 
to Apache Solr for search/index.
-  * RunningNutchAndSolr - 
-  * RunningNutchInDeployMode - How to configure Nutch 1.3 to crawl in deploy 
mode. /!\ :TODO:This tutorial is in construction. /!\
   *  [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.
+  * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch 
in deploy mode over a Hadoop cluster.
   * RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 
within Eclipse
  === Configuration ===
   * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect Nutch 1.3 release: /!\ 


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-09-02 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=225&rev2=226

   * Current CommandLineOptions 
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
-  * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl in local mode 
and post to Apache Solr for search/index.
+  * NutchTutorial - How to configure Nutch 1.3 to crawl in local mode and post 
to Apache Solr for search/index.
+  * RunningNutchAndSolr - 
   * RunningNutchInDeployMode - How to configure Nutch 1.3 to crawl in deploy 
mode. /!\ :TODO:This tutorial is in construction. /!\
   *  [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.
   * RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 
within Eclipse


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=224&rev2=225

   * MultiLingualSupport - ''In development''.
   * FixingOpicScoring - ''In planning''.
   * HowToContribute
-  * TaskList -- Tasks for Nutch developers.
+  * TaskList -- Tasks for Nutch developers. /!\ :Severe update required: /!\
   * [[Committer's_Rules]] -- Committers should follow these guidelines when 
deciding, which branch to use for committing the patches and when to commit.
   * [[Release_HOWTO]]
   * [[Website_Update_HOWTO]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=223&rev2=224

   * [[Image_Search_Design]]
   * [[NutchOSGi]]
   * StrategicGoals
-  * IndexStructure
+  * IndexStructure /!\ :This page needs a slight update to provide more 
information on plugins and the data they send to Solr for indexing: /!\
   * [[Getting_Started]]
   * ApacheConUs2009MeetUp - List of topics for !MeetUp at !ApacheCon US 2009 
in Oakland (Nov 2-6)
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=222&rev2=223

   * StrategicGoals
   * IndexStructure
   * [[Getting_Started]]
-  * JavaDemoApplication - A simple demonstration of how to use the Nutch APIin 
a Java application
   * ApacheConUs2009MeetUp - List of topics for !MeetUp at !ApacheCon US 2009 
in Oakland (Nov 2-6)
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-16 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=221&rev2=222

  
  == Nutch Development ==
   * [[Becoming_A_Nutch_Developer|Becoming a Nutch Developer]] - Start 
developing and contributing to Nutch.
-  * PluginCentral -- How to write your own plugins and use other people's. /!\ 
:This page requires a huge update to reflect plugins included in Nutch 1.3: /!\
+  * PluginCentral -- How to write your own plugins and use other people's. 
   * InternalDocumentation -- How Nutch works.
   * [[http://nutch.apache.org/version_control.html|Nutch Version Control]]
   * MultiLingualSupport - ''In development''.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-03 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=220&rev2=221

   * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself 
using NTLM, Basic or Digest authentication schemes.
   * NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 
1.3 intranet crawling configuration.
   * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
-  * ErrorMessages -- What they mean and suggestions for getting rid of them. 
/!\ :This requires extensive updating to reflect Nutch 1.3. In addition the 
legacy indexing and searching material should be archived. We also need to 
create a similar page for Nutch 2.0 as the errors are different in nature as 
are the solutions required to fix them. /!\
+  * ErrorMessages -- What they mean and suggestions for getting rid of them. 
/!\ :This requires extensive updating to reflect Nutch 1.3. In addition the 
legacy indexing and searching material should be archived. /!\
   * SetupProxyForNutch - using Tinyproxy on Ubuntu 
  
  == General Information ==


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-03 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=219&rev2=220

   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.
   * [[GORA_HBase]] -- Configuring Nutch 2.0 with GORA and HBASE
   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
-  * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of 
them. /!\ :This page is in construction: /!\
+  * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of 
them.
  
  == Pre Nutch 1.3 and Archive ==
   * [[Archive and Legacy]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-03 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=218&rev2=219

   * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl in local mode 
and post to Apache Solr for search/index.
   * RunningNutchInDeployMode - How to configure Nutch 1.3 to crawl in deploy 
mode. /!\ :TODO:This tutorial is in construction. /!\
   *  [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.
-  * BuildingNutchInEclipse - How to configure, build, crawl and debug Nutch 
1.3 within Eclipse
+  * RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 
within Eclipse
  === Configuration ===
   * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect Nutch 1.3 release: /!\ 
   * NutchConfigurationFiles


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-03 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=217&rev2=218

   * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl in local mode 
and post to Apache Solr for search/index.
   * RunningNutchInDeployMode - How to configure Nutch 1.3 to crawl in deploy 
mode. /!\ :TODO:This tutorial is in construction. /!\
   *  [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch 
being based Hadoop, it helps to have a better understanding of Hadoop.
+  * BuildingNutchInEclipse - How to configure, build, crawl and debug Nutch 
1.3 within Eclipse
  === Configuration ===
   * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect Nutch 1.3 release: /!\ 
   * NutchConfigurationFiles


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-02 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=216&rev2=217

   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.
   * [[GORA_HBase]] -- Configuring Nutch 2.0 with GORA and HBASE
   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
-  * ErrorMessagesInNutch2.0 -- What they mean and suggestions for getting rid 
of them. /!\ :This page is in construction: /!\
+  * ErrorMessagesInNutch2 -- What they mean and suggestions for getting rid of 
them. /!\ :This page is in construction: /!\
  
  == Pre Nutch 1.3 and Archive ==
   * [[Archive and Legacy]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-08-02 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=215&rev2=216

   * NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 
1.3 intranet crawling configuration.
   * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
   * ErrorMessages -- What they mean and suggestions for getting rid of them. 
/!\ :This requires extensive updating to reflect Nutch 1.3. In addition the 
legacy indexing and searching material should be archived. We also need to 
create a similar page for Nutch 2.0 as the errors are different in nature as 
are the solutions required to fix them. /!\
-  * SetupProxyForNutch - using Tinyproxy on Ubuntu /!\ Requires slight 
updating to correct references and subheadings /!\
+  * SetupProxyForNutch - using Tinyproxy on Ubuntu 
  
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-21 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=213&rev2=214

   * [[Getting_Started]]
   * JavaDemoApplication - A simple demonstration of how to use the Nutch APIin 
a Java application
   * ApacheConUs2009MeetUp - List of topics for !MeetUp at !ApacheCon US 2009 
in Oakland (Nov 2-6)
-  * TikaPlugin - Comments on the Tika integration and differences with 
existing parse plugins
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]
  
  == Nutch 2.0 ==


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-18 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=212&rev2=213

   * Commercial [[Support]] and developers for hire
   * [[Mailing]] Lists
   * AcademicArticles that deal with Nutch
-  * [[FAQ]]
+  * [[FAQ]] /!\ :The Indexing and Searching section require update/archive to 
reflect new 1.3 release: /!\
   * HardwareRequirements
   * NutchResources
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-12 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=211&rev2=212

   * FixingOpicScoring - ''In planning''.
   * HowToContribute
   * TaskList -- Tasks for Nutch developers.
-  * [[Development]] -- More tasks for Nutch developers.
   * [[Committer's_Rules]] -- Committers should follow these guidelines when 
deciding, which branch to use for committing the patches and when to commit.
   * [[Release_HOWTO]]
   * [[Website_Update_HOWTO]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-12 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=210&rev2=211

  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
   * [[Features]] /!\ :TODO:This needs to be completely overhauled to reflect 
Nutch 1.3 features. /!\
-  * Current [[NutchGotchas|Nutch Gotchas]] /!\ :TODO:At the moment this 
appears to contain no info! What are the current Nutch 1.3 User gotchas and 
1.4/2.0 Development gotchas. /!\
+  * Current [[NutchGotchas|Nutch Gotchas]] 
   * PublicServers running Nutch
   * [[Presentations]] on Nutch
   * Press [[Articles]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-04 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=207&rev2=208

   * NonDefaultIntranetCrawlingOptions - Desirable options to add to your 
intranet crawling configuration. /!\ :This is configured for  Nutch <1.3 and 
therefore requires an update and for the old page to be archived: /!\
   * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
   * ErrorMessages -- What they mean and suggestions for getting rid of them. 
/!\ :This requires extensive updating to reflect Nutch 1.3. In addition the 
legacy indexing and searching material should be archived. We also need to 
create a similar page for Nutch 2.0 as the errors are different in nature as 
are the solutions required to fix them. /!\
-  * SetupProxyForNutch - using Tinyproxy on Ubuntu
+  * SetupProxyForNutch - using Tinyproxy on Ubuntu /!\ Requires slight 
updating to correct references and subheadings /!\
  
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-04 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=205&rev2=206

   * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect Nutch 1.3 release: /!\ 
   * NutchConfigurationFiles
   * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself 
using NTLM, Basic or Digest authentication schemes.
-  * NonDefaultIntranetCrawlingOptions - Desirable options to add to your 
intranet crawling configuration.
+  * NonDefaultIntranetCrawlingOptions - Desirable options to add to your 
intranet crawling configuration. /!\ :This is configured for  Nutch <1.3 and 
therefore requires an update and for the old page to be archived: /!\
   * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
   * ErrorMessages -- What they mean and suggestions for getting rid of them.
   * SetupProxyForNutch - using Tinyproxy on Ubuntu


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-04 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=204&rev2=205

  
  == Nutch Development ==
   * [[Becoming_A_Nutch_Developer|Becoming a Nutch Developer]] - Start 
developing and contributing to Nutch.
-  * PluginCentral -- How to write your own plugins and use other people's.
+  * PluginCentral -- How to write your own plugins and use other people's. /!\ 
:This page requires a huge update to reflect plugins included in Nutch 1.3: /!\
   * InternalDocumentation -- How Nutch works.
   * [[http://nutch.apache.org/version_control.html|Nutch Version Control]]
   * MultiLingualSupport - ''In development''.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-04 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=203&rev2=204

  === Configuration ===
   * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect Nutch 1.3 release: /!\ 
   * NutchConfigurationFiles
-  * HowToMakeCustomSearch
   * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself 
using NTLM, Basic or Digest authentication schemes.
   * NonDefaultIntranetCrawlingOptions - Desirable options to add to your 
intranet crawling configuration.
   * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-03 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=202&rev2=203

  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
   * [[Features]] /!\ :TODO:This needs to be completely overhauled to reflect 
Nutch 1.3 features. /!\
-  * Current [[NutchGotchas|Nutch Gotchas]]/!\ :TODO:At the moment this appears 
to contain no info! What are the current Nutch 1.3 User gotchas and 1.4/2.0 
Development gotchas. /!\
+  * Current [[NutchGotchas|Nutch Gotchas]] /!\ :TODO:At the moment this 
appears to contain no info! What are the current Nutch 1.3 User gotchas and 
1.4/2.0 Development gotchas. /!\
   * PublicServers running Nutch
   * [[Presentations]] on Nutch
   * Press [[Articles]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-03 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=201&rev2=202

   * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl in local mode 
and post to Apache Solr for search/index.
   * RunningNutchInDeployMode - How to configure Nutch 1.3 to crawl in deploy 
mode. /!\ :TODO:This tutorial is in construction. /!\
  === Configuration ===
-  * OverviewDeploymentConfigs
+  * OverviewDeploymentConfigs /!\ :This full page requires a complete update 
to reflect Nutch 1.3 release: /!\ 
   * NutchConfigurationFiles
   * HowToMakeCustomSearch
   * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself 
using NTLM, Basic or Digest authentication schemes.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-02 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=200&rev2=201

Comment:
Removal of comments to reflect updates to CommandLineOptions for Nutch 1.3

  
  == Nutch Version 1.3 Administration ==
   * DownloadingNutch
-  * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release also available content for 
existing commands to be updated to include new parameters.  /!\ 
+  * Current CommandLineOptions 
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
   * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl in local mode 
and post to Apache Solr for search/index.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-07-01 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=199&rev2=200

  == Pre Nutch 1.3 and Archive ==
   * [[Archive and Legacy]]
  
- == Other Resources ==
-  * [[http://nutch.sourceforge.net/blog/cutting.html|Doug's Weblog]] -- He's 
the one who originally wrote Lucene and Nutch.
-  * [[http://wiki.media-style.com/display/nutchDocu/Home|Stefan's Nutch 
Documentation]]
-  * [[Search_Theory]] Search Theory & White Papers
-  * [[http://blog.foofactory.fi/|FooFactory]] Nutch and Hadoop related posts
-  * [[http://www.interadvertising.co.uk/blog/nutch_logos|Larger / better 
quality Nutch logos]] Re-created Nutch logos available in GIF, PNG & EPS in 
resolutions up to 1200 x 449
- 
  == How to edit this Wiki ==
  This Wiki is a collaborative site, anyone can contribute and share:
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-25 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=196&rev2=197

  
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
-  * [[Features]]
-  * Current [[NutchGotchas|Nutch Gotchas]]
+  * [[Features]] /!\ :TODO:This needs to be completely overhauled to reflect 
Nutch 1.3 features. /!\
+  * Current [[NutchGotchas|Nutch Gotchas]]/!\ :TODO:At the moment this appears 
to contain no info! What are the current Nutch 1.3 User gotchas and 1.4/2.0 
Development gotchas. /!\
   * PublicServers running Nutch
   * [[Presentations]] on Nutch
   * Press [[Articles]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-25 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=195&rev2=196

  
  == Nutch 2.0 ==
   * Nutch2Roadmap -- Discussions on the architecture and features of Nutch 2.0
-  * Nutch2Architecture -- Discussions on the Nutch 2.0 architecture (old)
   * NewScoring -- New stable pagerank like webgraph and link-analysis jobs.
   * NewScoringIndexingExample -- Two full fetch cycles of commands using new 
scoring and indexing systems.
   * [[GORA_HBase]] -- Configuring Nutch 2.0 with GORA and HBASE


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-25 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=194&rev2=195

   * [[GORA_HBase]] -- Configuring Nutch 2.0 with GORA and HBASE
   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
  
- == Pre Nutch 1.3 ==
+ == Pre Nutch 1.3 and Archive ==
   * [[Archive and Legacy]]
  
  == Other Resources ==


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-25 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=193&rev2=194

   * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release also available content for 
existing commands to be updated to include new parameters  /!\ 
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
-  * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl and post to 
Apache Solr for search/index /!\ :TODO:This tutorial is being updated to 
accomodate changes to Nutch 1.3 release /!\ 
+  * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl and post to 
Apache Solr for search/index /!\ :TODO:Content to display functionality within 
the new ${NUTCH_HOME}/runtime/deploy configuration is still required for this 
tutorial. /!\
  === Configuration ===
   * OverviewDeploymentConfigs
   * NutchConfigurationFiles


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-24 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=192&rev2=193

   * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release also available content for 
existing commands to be updated to include new parameters  /!\ 
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
-  * Nutch1.3WithSolrIntegration - How to configure Nutch 1.3 to crawl and post 
to Apache Solr for search/index /!\ :TODO:This tutorial is being updated to 
accomodate changes to Nutch 1.3 release /!\ 
+  * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl and post to 
Apache Solr for search/index /!\ :TODO:This tutorial is being updated to 
accomodate changes to Nutch 1.3 release /!\ 
  === Configuration ===
   * OverviewDeploymentConfigs
   * NutchConfigurationFiles


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-24 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=191&rev2=192

   * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release also available content for 
existing commands to be updated to include new parameters  /!\ 
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
-  * Nutch1.3WithSolrIntegration 
-- How to configure Nutch 1.3 to crawl and post to Apache Solr for 
search/index /!\ :TODO:This tutorial is being updated to accomodate changes to 
Nutch 1.3 release /!\ 
+  * Nutch1.3WithSolrIntegration - How to configure Nutch 1.3 to crawl and post 
to Apache Solr for search/index /!\ :TODO:This tutorial is being updated to 
accomodate changes to Nutch 1.3 release /!\ 
  === Configuration ===
   * OverviewDeploymentConfigs
   * NutchConfigurationFiles


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-24 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=190&rev2=191

   * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release also available content for 
existing commands to be updated to include new parameters  /!\ 
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
-  * Running Nutch 1.3 with Solr Integration 
+  * Nutch1.3WithSolrIntegration 
-  - How to configure Nutch 1.3 to crawl and post to Apache Solr for 
search/index /!\ :TODO:This tutorial is being updated to accomodate changes to 
Nutch 1.3 release /!\ 
+- How to configure Nutch 1.3 to crawl and post to Apache Solr for 
search/index /!\ :TODO:This tutorial is being updated to accomodate changes to 
Nutch 1.3 release /!\ 
  === Configuration ===
   * OverviewDeploymentConfigs
   * NutchConfigurationFiles


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-24 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=189&rev2=190

   * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release also available content for 
existing commands to be updated to include new parameters  /!\ 
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
+  * Running Nutch 1.3 with Solr Integration 
-  * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl and post to 
Apache Solr for search/index /!\ :TODO:This tutorial is being updated to 
accomodate changes to Nutch 1.3 release /!\ 
+  - How to configure Nutch 1.3 to crawl and post to Apache Solr for 
search/index /!\ :TODO:This tutorial is being updated to accomodate changes to 
Nutch 1.3 release /!\ 
  === Configuration ===
   * OverviewDeploymentConfigs
   * NutchConfigurationFiles


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-23 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=188&rev2=189

   * IndexStructure
   * [[Getting_Started]]
   * JavaDemoApplication - A simple demonstration of how to use the Nutch APIin 
a Java application
-  * InstallingWeb2
   * ApacheConUs2009MeetUp - List of topics for !MeetUp at !ApacheCon US 2009 
in Oakland (Nov 2-6)
   * TikaPlugin - Comments on the Tika integration and differences with 
existing parse plugins
   * [[NutchMavenSupport|Using Nutch as a Maven dependency]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-20 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=187&rev2=188

  
  == Nutch Version 1.3 Administration ==
   * DownloadingNutch
-  * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release /!\ 
+  * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release also available content for 
existing commands to be updated to include new parameters  /!\ 
   * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
   * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl and post to 
Apache Solr for search/index /!\ :TODO:This tutorial is being updated to 
accomodate changes to Nutch 1.3 release /!\ 


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-14 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=186&rev2=187

  == Nutch Version 1.3 Administration ==
   * DownloadingNutch
   * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release /!\ 
+  * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
  === Tutorials ===
   * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl and post to 
Apache Solr for search/index /!\ :TODO:This tutorial is being updated to 
accomodate changes to Nutch 1.3 release /!\ 
  === Configuration ===
@@ -40, +41 @@

   * [[Becoming_A_Nutch_Developer|Becoming a Nutch Developer]] - Start 
developing and contributing to Nutch.
   * PluginCentral -- How to write your own plugins and use other people's.
   * InternalDocumentation -- How Nutch works.
-  * [[http://nutch.apache.org/apidocs-1.3/index.html|JavaDocs]] -- The 
!JavaDocs for Nutch-1.3 release.
   * [[http://nutch.apache.org/version_control.html|Nutch Version Control]]
   * MultiLingualSupport - ''In development''.
   * FixingOpicScoring - ''In planning''.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-14 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=185&rev2=186

   * DownloadingNutch
   * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release /!\ 
  === Tutorials ===
-  * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr 
for search/index /!\ :TODO:This tutorial is being updated to accomodate changes 
to Nutch 1.3 release /!\ 
+  * RunningNutchAndSolr - How to configure Nutch 1.3 to crawl and post to 
Apache Solr for search/index /!\ :TODO:This tutorial is being updated to 
accomodate changes to Nutch 1.3 release /!\ 
  === Configuration ===
   * OverviewDeploymentConfigs
   * NutchConfigurationFiles
@@ -33, +33 @@

   * Commercial [[Support]] and developers for hire
   * [[Mailing]] Lists
   * AcademicArticles that deal with Nutch
-  * [[http://videolectures.net/iiia06_cutting_ense/|Experiences with the Nutch 
search engine]] author:Doug   Cutting,"Video Lecture"
-  * [[Lucene]]
   * [[FAQ]]
   * HardwareRequirements
  
@@ -72, +70 @@

   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
  
  == Pre Nutch 1.3 ==
-  * [[Archive]]
+  * [[Archive and Legacy]]
  
  == Other Resources ==
   * [[http://nutch.sourceforge.net/blog/cutting.html|Doug's Weblog]] -- He's 
the one who originally wrote Lucene and Nutch.
@@ -80, +78 @@

   * [[Search_Theory]] Search Theory & White Papers
   * [[http://blog.foofactory.fi/|FooFactory]] Nutch and Hadoop related posts
   * [[http://www.interadvertising.co.uk/blog/nutch_logos|Larger / better 
quality Nutch logos]] Re-created Nutch logos available in GIF, PNG & EPS in 
resolutions up to 1200 x 449
-  * [[http://openbixo.org/documentation/running-bixo-in-ec2/|Instructions for 
running Bixo on EC2]] (includes parts of Nutch)
  
  == How to edit this Wiki ==
  This Wiki is a collaborative site, anyone can contribute and share:


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-14 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=184&rev2=185

  
  == Nutch Version 1.3 Administration ==
   * DownloadingNutch
-  * Current CommandLineOptions /!\ :TODO:This page is being updated to 
accomodate changes to Nutch 1.3 release /!\ 
+  * Current CommandLineOptions /!\ :TODO:Missing pages to be added to 
accommodate new commands in Nutch 1.3 release /!\ 
  === Tutorials ===
   * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr 
for search/index /!\ :TODO:This tutorial is being updated to accomodate changes 
to Nutch 1.3 release /!\ 
  === Configuration ===


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-14 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=183&rev2=184

   * Create an account by clicking the "Login" link at the top of any page, and 
picking a username and password.
   * Edit any page by pressing '''<>''' at the top or the bottom 
of the page
  
- There are some conventions used on the Solr wiki:
+ There are some conventions used on the Nutch wiki:
  
   * /!\ :TODO: /!\  (`/!\ :TODO: /!\` ) is used to denote sections that 
definitely need to be cleaned up.
-  *  [[Solr4.0]] (` [[Solr4.0]]`) is used to draw attention to which 
version of Solr a feature was (or will be) added to Solr.
  
  Some general info on using this Wiki Software:
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=182&rev2=183

  
  == Nutch Version 1.3 Administration ==
   * DownloadingNutch
-  * Current CommandLineOptions
+  * Current CommandLineOptions /!\ :TODO:This page is being updated to 
accomodate changes to Nutch 1.3 release /!\ 
  === Tutorials ===
   * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr 
for search/index /!\ :TODO:This tutorial is being updated to accomodate changes 
to Nutch 1.3 release /!\ 
  === Configuration ===


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=181&rev2=182

   * [[http://nutch.sourceforge.net/blog/cutting.html|Doug's Weblog]] -- He's 
the one who originally wrote Lucene and Nutch.
   * [[http://wiki.media-style.com/display/nutchDocu/Home|Stefan's Nutch 
Documentation]]
   * [[Search_Theory]] Search Theory & White Papers
-  * [[http://wiki.apache.org/nutch-data/attachments/FrontPage/attachments/Hadoop-Nutch%200.8%20Tutorial%2022-07-06%20|Tutorial Hadoop+Nutch 0.8 night build Roberto Navoni 24-07-06]]
   * [[http://blog.foofactory.fi/|FooFactory]] Nutch and Hadoop related posts
   * [[http://www.interadvertising.co.uk/blog/nutch_logos|Larger / better 
quality Nutch logos]] Re-created Nutch logos available in GIF, PNG & EPS in 
resolutions up to 1200 x 449
   * [[http://openbixo.org/documentation/running-bixo-in-ec2/|Instructions for 
running Bixo on EC2]] (includes parts of Nutch)


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=180&rev2=181

   * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
   * ErrorMessages -- What they mean and suggestions for getting rid of them.
   * SetupProxyForNutch - using Tinyproxy on Ubuntu
- === Script Administration ===
- 
- 
  
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
@@ -40, +37 @@

   * [[Lucene]]
   * [[FAQ]]
   * HardwareRequirements
- 
- === Script Administration ===
-  * [[Automating_Fetches_with_Python|Automating Fetches with Python]] - How to 
automatic the Nutch fetching process using Python
-  * [[Nutch_0.9_Crawl_Script_Tutorial]]
-  * CrossPlatformNutchScripts
-  * MonitoringNutchCrawls - techniques for keeping an eye on a nutch crawl's 
progress.
-  * [[Crawl]] - script to crawl (and possible recrawl too)
-  * IntranetRecrawl - script to recrawl a crawl
-  * [[Whole-Web Crawling incremental script]] - crawled urls are searchable at 
each iteration after merging
-  * MergeCrawl - script to merge 2 (or more) crawls
  
  == Nutch Development ==
   * [[Becoming_A_Nutch_Developer|Becoming a Nutch Developer]] - Start 
developing and contributing to Nutch.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=179&rev2=180

  Please contribute your knowledge about Nutch here!
  <>
  
- == Version 1.3 release ==
+ == Nutch Version 1.3 Administration ==
   * DownloadingNutch
   * Current CommandLineOptions
  === Tutorials ===
   * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr 
for search/index /!\ :TODO:This tutorial is being updated to accomodate changes 
to Nutch 1.3 release /!\ 
+ === Configuration ===
+  * OverviewDeploymentConfigs
+  * NutchConfigurationFiles
+  * HowToMakeCustomSearch
+  * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself 
using NTLM, Basic or Digest authentication schemes.
+  * NonDefaultIntranetCrawlingOptions - Desirable options to add to your 
intranet crawling configuration.
+  * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
+  * ErrorMessages -- What they mean and suggestions for getting rid of them.
+  * SetupProxyForNutch - using Tinyproxy on Ubuntu
+ === Script Administration ===
  
  
  
@@ -31, +41 @@

   * [[FAQ]]
   * HardwareRequirements
  
- == Nutch Administration ==
- 
- === Configuration ===
-  * OverviewDeploymentConfigs
-  * NutchConfigurationFiles
-  * GettingNutchRunningWithUtf8 - For support of non-ASCII characters 
(Chinese, German, Japanese, Korean).
-  * GettingNutchRunningWithResin - Resin is a JSP/Servlet/EJB application 
server (alternative to tomcat).
-  * GettingNutchRunningWithJetty
-  * GettingNutchRunningWithJboss
-  * GettingNutchRunningWithUbuntu
-  * GettingNutchRunningWithWindows
-  * GettingNutchRunningWithMacOsx
-  * GettingNutchRunningWithRedHatApplicationServer
-  * GettingNutchRunningWithDebian
-  * GettingNutchRunningWithSocksProxy
-  * ErrorMessages -- What they mean and suggestions for getting rid of them.
-  * SetupProxyForNutch - using Tinyproxy on Ubuntu
-  * CreateNewFilter - for example to add a category metadata to your index and 
be able to search for it
-  * HowToMakeCustomSearch
-  * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself 
using NTLM, Basic or Digest authentication schemes.
-  * NonDefaultIntranetCrawlingOptions - Desirable options to add to your 
intranet crawling configuration.
-  * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
  === Script Administration ===
   * [[Automating_Fetches_with_Python|Automating Fetches with Python]] - How to 
automatic the Nutch fetching process using Python
   * [[Nutch_0.9_Crawl_Script_Tutorial]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=178&rev2=179

   * DownloadingNutch
   * Current CommandLineOptions
  === Tutorials ===
-  * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr 
for search/index
+  * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr 
for search/index /!\ :TODO:This tutorial is being updated to accomodate changes 
to Nutch 1.3 release /!\ 
  
  
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=177&rev2=178

  == Nutch Administration ==
  
  === Configuration ===
-  * [[Automating_Fetches_with_Python|Automating Fetches with Python]] - How to 
automatic the Nutch fetching process using Python
-  * [[Upgrading_Hadoop|Upgrading Hadoop Version in Nutch]] - Basic steps for 
upgrading Hadoop in Nutch.
-  * [[07CommandLineOptions|Commandline]] options for 0.7.x
-  * [[08CommandLineOptions|Commandline]] options for version 0.8
   * OverviewDeploymentConfigs
   * NutchConfigurationFiles
   * GettingNutchRunningWithUtf8 - For support of non-ASCII characters 
(Chinese, German, Japanese, Korean).
@@ -54, +50 @@

   * SetupProxyForNutch - using Tinyproxy on Ubuntu
   * CreateNewFilter - for example to add a category metadata to your index and 
be able to search for it
   * HowToMakeCustomSearch
-  * UpgradeFrom07To08
-  * [[Upgrading_from_0.8.x_to_0.9]]
   * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself 
using NTLM, Basic or Digest authentication schemes.
   * NonDefaultIntranetCrawlingOptions - Desirable options to add to your 
intranet crawling configuration.
   * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
  === Script Administration ===
+  * [[Automating_Fetches_with_Python|Automating Fetches with Python]] - How to 
automatic the Nutch fetching process using Python
   * [[Nutch_0.9_Crawl_Script_Tutorial]]
   * CrossPlatformNutchScripts
   * MonitoringNutchCrawls - techniques for keeping an eye on a nutch crawl's 
progress.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=176&rev2=177

  == Version 1.3 release ==
   * DownloadingNutch
   * Current CommandLineOptions
+ === Tutorials ===
+  * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr 
for search/index
+ 
+ 
  
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
@@ -29, +33 @@

  
  == Nutch Administration ==
  
- === Tutorials ===
-  * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr 
for search/index
-  * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup Nutch and 
Hadoop over a cluster of machines
-  * NutchTutorial (<=1.2).
-  * [[http://nutch.sourceforge.net/docs/en/tutorial.html|Tutorial]] -- A 
Step-by-Step guide to getting Nutch up and running (<=1.2).
-  * [[http://peterpuwang.googlepages.com/NutchGuideForDummies.htm|Tutorial]] 
-- A Step-by-Step installation guide for dummies: Nutch 0.9.
-  * [[Nutch_-_The_Java_Search_Engine]] (Builds on the basic tutorials. 
Includes index maintenance scripts)
-  * RunNutchInEclipse for v0.8
-  * [[RunNutchInEclipse0.9]] for v0.9 (Linux and Windows)
-  * [[RunNutchInEclipse1.0]] for v1.0 (Linux and Windows)
  === Configuration ===
   * [[Automating_Fetches_with_Python|Automating Fetches with Python]] - How to 
automatic the Nutch fetching process using Python
   * [[Upgrading_Hadoop|Upgrading Hadoop Version in Nutch]] - Basic steps for 
upgrading Hadoop in Nutch.


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=175&rev2=176

  == Other Resources ==
   * [[http://nutch.sourceforge.net/blog/cutting.html|Doug's Weblog]] -- He's 
the one who originally wrote Lucene and Nutch.
   * [[http://wiki.media-style.com/display/nutchDocu/Home|Stefan's Nutch 
Documentation]]
-  * [[http://frutch.free.fr/|Frutch Wiki]] -- French Nutch Wiki
-  * The [[http://nutch.sourceforge.net/cgi-bin/twiki/view/Main/Nutch|Old Wiki]]
   * [[Search_Theory]] Search Theory & White Papers
   * [[http://wiki.apache.org/nutch-data/attachments/FrontPage/attachments/Hadoop-Nutch%200.8%20Tutorial%2022-07-06%20|Tutorial Hadoop+Nutch 0.8 night build Roberto Navoni 24-07-06]]
   * [[http://blog.foofactory.fi/|FooFactory]] Nutch and Hadoop related posts


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=174&rev2=175

  == Other Resources ==
   * [[http://nutch.sourceforge.net/blog/cutting.html|Doug's Weblog]] -- He's 
the one who originally wrote Lucene and Nutch.
   * [[http://wiki.media-style.com/display/nutchDocu/Home|Stefan's Nutch 
Documentation]]
-  * [[http://frutch.free.fr/wikini/|Frutch Wiki]] -- French Nutch Wiki
+  * [[http://frutch.free.fr/|Frutch Wiki]] -- French Nutch Wiki
   * The [[http://nutch.sourceforge.net/cgi-bin/twiki/view/Main/Nutch|Old Wiki]]
   * [[Search_Theory]] Search Theory & White Papers
   * [[http://wiki.apache.org/nutch-data/attachments/FrontPage/attachments/Hadoop-Nutch%200.8%20Tutorial%2022-07-06%20|Tutorial Hadoop+Nutch 0.8 night build Roberto Navoni 24-07-06]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=173&rev2=174

  
  == Version 1.3 release ==
   * DownloadingNutch
+  * Current CommandLineOptions
  
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
@@ -43, +44 @@

   * [[Upgrading_Hadoop|Upgrading Hadoop Version in Nutch]] - Basic steps for 
upgrading Hadoop in Nutch.
   * [[07CommandLineOptions|Commandline]] options for 0.7.x
   * [[08CommandLineOptions|Commandline]] options for version 0.8
-  * Current CommandLineOptions
   * OverviewDeploymentConfigs
   * NutchConfigurationFiles
   * GettingNutchRunningWithUtf8 - For support of non-ASCII characters 
(Chinese, German, Japanese, Korean).


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=172&rev2=173

  <>
  
  == Version 1.3 release ==
- Find it at http://www.apache.org/dyn/closer.cgi/nutch/
+  * DownloadingNutch
  
  == General Information ==
   * [[http://nutch.apache.org|Nutch Website]]
@@ -24, +24 @@

   * [[http://videolectures.net/iiia06_cutting_ense/|Experiences with the Nutch 
search engine]] author:Doug   Cutting,"Video Lecture"
   * [[Lucene]]
   * [[FAQ]]
+  * HardwareRequirements
  
  == Nutch Administration ==
+ 
-  * DownloadingNutch
-  * HardwareRequirements
  === Tutorials ===
   * RunningNutchAndSolr - How to configure Nutch to crawl, but post to Solr 
for search/index
   * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup Nutch and 
Hadoop over a cluster of machines


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=171&rev2=172

   * [[GORA_HBase]] -- Configuring Nutch 2.0 with GORA and HBASE
   * [[http://techvineyard.blogspot.com/2010/12/build-nutch-20.html|Build Nutch 
2.0 in Eclipse]] -- How to setup your IDE environment comfortably.
  
+ == Pre Nutch 1.3 ==
+  * [[Archive]]
+ 
  == Other Resources ==
   * [[http://nutch.sourceforge.net/blog/cutting.html|Doug's Weblog]] -- He's 
the one who originally wrote Lucene and Nutch.
   * [[http://wiki.media-style.com/display/nutchDocu/Home|Stefan's Nutch 
Documentation]]


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=170&rev2=171

  Please contribute your knowledge about Nutch here!
  <>
  
- == Looking for the Version 1.3 release ==
+ == Version 1.3 release ==
  Find it at http://www.apache.org/dyn/closer.cgi/nutch/
  
  == General Information ==


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=169&rev2=170

   * [[Search_Theory]] Search Theory & White Papers
   * [[http://wiki.apache.org/nutch-data/attachments/FrontPage/attachments/Hadoop-Nutch%200.8%20Tutorial%2022-07-06%20|Tutorial Hadoop+Nutch 0.8 night build Roberto Navoni 24-07-06]]
   * [[http://blog.foofactory.fi/|FooFactory]] Nutch and Hadoop related posts
-  * [[http://spinn3r.com|Spinn3r]] [[http://spinn3r.com/opensource.php|Open 
Source components]] (our contribution to the crawling OSS community with more 
to come). /!\ 404 Not found
   * [[http://www.interadvertising.co.uk/blog/nutch_logos|Larger / better 
quality Nutch logos]] Re-created Nutch logos available in GIF, PNG & EPS in 
resolutions up to 1200 x 449
   * [[http://openbixo.org/documentation/running-bixo-in-ec2/|Instructions for 
running Bixo on EC2]] (includes parts of Nutch)
  


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=168&rev2=169

  {{http://www.interadvertising.co.uk/files/nutch_logo_medium.gif}}
  
  Please contribute your knowledge about Nutch here!
+ <>
  
  == Looking for the Version 1.3 release ==
  Find it at http://www.apache.org/dyn/closer.cgi/nutch/


[Nutch Wiki] Trivial Update of "FrontPage" by LewisJohnMcgibbney

2011-06-13 Thread Apache Wiki

The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=167&rev2=168

- == Welcome to the Apache Nutch Wiki ==
+ = Welcome to the Apache Nutch Wiki =
  {{http://www.interadvertising.co.uk/files/nutch_logo_medium.gif}}
  
  Please contribute your knowledge about Nutch here!

