Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "FrontPage" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=237&rev2=238

  * [[http://nutch.apache.org/apidocs-1.4/index.html|JavaDocs]] -- The !JavaDocs for Nutch-1.4 release.
  === Tutorials ===
- * NutchTutorial - How to configure Nutch 1.3 to crawl in local mode and post to Apache Solr for search/index.
+ * NutchTutorial - How to configure Nutch to crawl in local mode and post to Apache Solr for search/index.
  * [[http://hadoop.apache.org/common/docs/stable/|Hadoop Tutorial]] Nutch being based Hadoop, it helps to have a better understanding of Hadoop.
  * [[NutchHadoopTutorial|Nutch Hadoop Tutorial]] - How to setup and run Nutch in deploy mode over a Hadoop cluster. /!\ :This tutorial is in development: /!\
- * RunNutchInEclipse - How to configure, build, crawl and debug Nutch 1.3 within Eclipse
+ * RunNutchInEclipse - How to configure, build, crawl and debug Nutch within Eclipse
- * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search Microsoft Office, PDF etc documentsin a file system hierachy with a Solr backend.
+ * [[IntranetDocumentSearch|Intranet Document Search]] - Index and search Microsoft Office, PDF etc. documents in a file system hierarchy with a Solr backend.
  === Configuration ===
- * OverviewDeploymentConfigs /!\ :This full page requires a complete update to reflect Nutch 1.3 release: /!\
+ * OverviewDeploymentConfigs /!\ :This full page requires a complete update to reflect recent Nutch releases: /!\
  * NutchConfigurationFiles
  * HttpAuthenticationSchemes - How to enable Nutch to authenticate itself using NTLM, Basic or Digest authentication schemes.
- * NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch 1.3 intranet crawling configuration.
+ * NonDefaultIntranetCrawlingOptions - Desirable options to add to your Nutch intranet crawling configuration.
- * OptimizingCrawls - How to optimize your crawling/fetching speed with Nutch.
+ * OptimizingCrawls - How to optimise your crawling/fetching speed with Nutch.
- * ErrorMessages -- What they mean and suggestions for getting rid of them. /!\ :This requires extensive updating to reflect Nutch 1.3. In addition the legacy indexing and searching material should be archived. /!\
+ * ErrorMessages -- What they mean and suggestions for getting rid of them. /!\ :This requires extensive updating to reflect recent Nutch releases. In addition the legacy indexing and searching material should be archived. /!\
  * SetupProxyForNutch - using Tinyproxy on Ubuntu
  * IndexStructure /!\ :This page needs a slight update to provide more information on plugins and the data they send to Solr for indexing: /!\
  == General Information ==
  * [[http://nutch.apache.org|Nutch Website]]
- * [[Features]] /!\ :TODO:This needs to be completely overhauled to reflect Nutch 1.3 features. /!\
+ * [[Features]] /!\ :TODO:This needs to be completely overhauled to reflect recent Nutch features. /!\
  * Current [[NutchGotchas|Nutch Gotchas]]
  * PublicServers running Nutch
  * [[Presentations]] on Nutch