Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by ThiloPfennig:
http://wiki.apache.org/nutch/GettingNutchRunningWithFedoraCore

------------------------------------------------------------------------------
  This is based on GettingNutchRunningWithRedHatApplicationServer. To make this easier to start, we are using the yum command line as an example.

  /!\ This is not yet a working installation description.

  == Repositories we need ==
@@ -41, +38 @@
   * No Match for argument: jta-javadoc
+
+ == Install Java ==
+
+  * [http://javashoplm.sun.com/ECom/docs/Welcome.jsp?StoreId=22&PartDetailId=jdk-1.5.0_08-oth-JPR&SiteId=JSC&TransactionId=noregDownload Install Linux RPM in self-extracting file]
+
+ == Download and Testing ==
+  * DownloadingNutch: downloaded nutch-0.8.tar.gz
+ {{{
+ tar xzf nutch-0.8.tar.gz
+ cd nutch-0.8
+ export JAVA_HOME=/usr/java/jdk1.5.0_08/
+ bin/nutch
+ }}}
+
+  * Test using http://lucene.apache.org/nutch/tutorial.html
+
+  1. add a URL in a new file "urls"
+  1. add/edit conf/crawl-urlfilter.txt (under # accept hosts in MY.DOMAIN.NAME)
+
+ '''result:'''
+ {{{
+ Exception in thread "main" java.io.IOException: Input directory /home/vinci/Downloads/nutch-0.8/urls in local is invalid.
+     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
+     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
+     at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
+     at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
- Usage: nutch COMMAND
- where COMMAND is one of:
-   crawl             one-step crawler for intranets
-   readdb            read / dump crawl db
-   mergedb           merge crawldb-s, with optional filtering
-   readlinkdb        read / dump link db
-   inject            inject new urls into the database
-   generate          generate new segments to fetch
-   fetch             fetch a segment's pages
-   parse             parse a segment's pages
-   segread           read / dump segment data
-   mergesegs         merge several segments, with optional filtering and slicing
-   updatedb          update crawl db from segments after fetching
-   invertlinks       create a linkdb from parsed segments
-   mergelinkdb       merge linkdb-s, with optional filtering
-   index             run the indexer on parsed segments and linkdb
-   merge             merge several segment indexes
-   dedup             remove duplicates from a set of segment indexes
-   plugin            load a plugin and run one of its classes main()
-   server            run a search server
- or
-   CLASSNAME         run the class named CLASSNAME
- Most commands print help when invoked w/o parameters.
  }}}
+ ---
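A note on the IOException recorded above: in Nutch 0.8 the argument passed to the crawl/inject step is treated as a *directory* of seed-list files, not a single flat file, so creating "urls" as a plain file can trigger exactly this "Input directory ... is invalid" error. A minimal sketch of preparing the seed directory, assuming the layout from the page; the seed URL and file name here are illustrative placeholders, not from the original text:

```shell
# Sketch, not from the wiki page itself: make "urls" a directory
# containing one or more text files, one URL per line.
mkdir -p urls
echo 'http://lucene.apache.org/nutch/' > urls/seed.txt

# With JAVA_HOME exported as shown earlier on the page, the tutorial's
# test crawl would then be run from the nutch-0.8 directory, e.g.:
#   bin/nutch crawl urls -dir crawl -depth 2
```

The crawl command itself is left as a comment because it needs a working Nutch/Java installation; the point of the sketch is only the directory layout that the injector expects.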