Hi, I have 3 slaves listed in the conf/slaves file. I have also started all the processes using bin/start-all.sh and started crawling with the command bin/nutch crawl -dir crawld -depth 30 -topN 50, and the crawl finished successfully with no problems.

However, all the jobs are executed on the localhost machine. Is it possible to split the jobs across the 3 slave machines? If so, how can I do it? Please help me, it is urgent. At http://localhost:50030/ only one node is displayed:

    Maps: 0    Reduces: 0    Tasks/Node: 2    Nodes: 1
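For reference, my current understanding (please correct me if this is wrong) is that the slave tasktrackers only join the cluster when conf/slaves lists the slave hostnames and when fs.default.name and mapred.job.tracker in conf/hadoop-site.xml point at the master host rather than localhost, on the master and on every slave. A rough sketch of what I mean is below; master1, slave1, slave2, slave3 and the ports are only placeholder names, not my real hosts:

conf/slaves:

    slave1
    slave2
    slave3

conf/hadoop-site.xml (the same file copied to every machine):

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.default.name</name>
        <!-- master1:9000 is a placeholder for the real namenode host:port -->
        <value>master1:9000</value>
      </property>
      <property>
        <name>mapred.job.tracker</name>
        <!-- master1:9001 is a placeholder for the real jobtracker host:port -->
        <value>master1:9001</value>
      </property>
    </configuration>

Is that the right direction, or is something else needed before the map/reduce tasks are picked up by the tasktrackers on the slaves?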

Regards
Mohan Lal


"Håvard W. Kongsgård"-2 wrote:
>
> see:
> http://wiki.apache.org/nutch-data/attachments/FrontPage/attachments/Hadoop-Nutch%200.8%20Tutorial%2022-07-06%20%3CNavoni%20Roberto%3E
>
> Before you start Tomcat, remember to change the path of your search
> directory in the file nutch-site.xml in the webapps/ROOT/WEB-INF/classes
> directory.
>
> # This is an example of my configuration
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>LSearchDev01:9000</value>
>   </property>
>
>   <property>
>     <name>searcher.dir</name>
>     <value>/user/root/crawld</value>
>   </property>
> </configuration>
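(If I read the example above correctly, searcher.dir should point at the crawl directory on the DFS named by fs.default.name. Before starting Tomcat I assume it makes sense to check that the path really exists there, with something like the following, where /user/root/crawld is just the value from the example above:)

    bin/hadoop dfs -ls /user/root/crawld    # path taken from the example config

(I would expect it to list the crawldb, linkdb, segments and indexes directories once a crawl has finished.)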

> Mohan Lal wrote:
>> Hi,
>>
>> Thanks for your valuable information. I have solved that problem, but
>> after that I am facing another problem.
>> I have 2 slaves:
>> 1) MAC1
>> 2) MAC2
>>
>> But the job was running on MAC1 itself, and it takes a long time to
>> finish the crawling process. How can I assign the job to the distributed
>> machines I specified in the slaves file?
>>
>> My crawling process finished successfully. Also, how can I specify the
>> searcher dir in the nutch-site.xml file?
>>
>> <property>
>>   <name>searcher.dir</name>
>>   <value> ? </value>
>> </property>
>>
>> Please help me.
>>
>> I have done the following setup:
>>
>> [EMAIL PROTECTED] ~]# cd /home/lucene/nutch-0.8.1/
>> [EMAIL PROTECTED] nutch-0.8.1]# bin/hadoop namenode -format
>> Re-format filesystem in /tmp/hadoop/dfs/name ? (Y or N) Y
>> Formatted /tmp/hadoop/dfs/name
>> [EMAIL PROTECTED] nutch-0.8.1]# bin/start-all.sh
>> starting namenode, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-namenode-mohanlal.qburst.local.out
>> fpo: ssh: fpo: Name or service not known
>> localhost: starting datanode, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-datanode-mohanlal.qburst.local.out
>> starting jobtracker, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-jobtracker-mohanlal.qburst.local.out
>> fpo: ssh: fpo: Name or service not known
>> localhost: starting tasktracker, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-tasktracker-mohanlal.qburst.local.out
>> [EMAIL PROTECTED] nutch-0.8.1]# bin/stop-all.sh
>> stopping jobtracker
>> localhost: stopping tasktracker
>> sonu: no tasktracker to stop
>> stopping namenode
>> sonu: no datanode to stop
>> localhost: stopping datanode
>> [EMAIL PROTECTED] nutch-0.8.1]# bin/start-all.sh
>> starting namenode, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-namenode-mohanlal.qburst.local.out
>> sonu: starting datanode, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-datanode-sonu.qburst.local.out
>> localhost: starting datanode, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-datanode-mohanlal.qburst.local.out
>> starting jobtracker, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-jobtracker-mohanlal.qburst.local.out
>> localhost: starting tasktracker, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-tasktracker-mohanlal.qburst.local.out
>> sonu: starting tasktracker, logging to
>> /home/lucene/nutch-0.8.1/bin/../logs/hadoop-root-tasktracker-sonu.qburst.local.out
>> [EMAIL PROTECTED] nutch-0.8.1]# bin/hadoop dfs -put urls urls
>> [EMAIL PROTECTED] nutch-0.8.1]# bin/nutch crawl urls -dir crawl.1 -depth 2 -topN 10
>> crawl started in: crawl.1
>> rootUrlDir = urls
>> threads = 100
>> depth = 2
>> topN = 10
>> Injector: starting
>> Injector: crawlDb: crawl.1/crawldb
>> Injector: urlDir: urls
>> Injector: Converting injected urls to crawl db entries.
>> Injector: Merging injected urls into crawl db.
>> Injector: done
>> Generator: starting
>> Generator: segment: crawl.1/segments/20060929120038
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: Partitioning selected urls by host, for politeness.
>> Generator: done.
>> Fetcher: starting
>> Fetcher: segment: crawl.1/segments/20060929120038
>> Fetcher: done
>> CrawlDb update: starting
>> CrawlDb update: db: crawl.1/crawldb
>> CrawlDb update: segment: crawl.1/segments/20060929120038
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: done
>> Generator: starting
>> Generator: segment: crawl.1/segments/20060929120235
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: Partitioning selected urls by host, for politeness.
>> Generator: done.
>> Fetcher: starting
>> Fetcher: segment: crawl.1/segments/20060929120235
>> Fetcher: done
>> CrawlDb update: starting
>> CrawlDb update: db: crawl.1/crawldb
>> CrawlDb update: segment: crawl.1/segments/20060929120235
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: done
>> LinkDb: starting
>> LinkDb: linkdb: crawl.1/linkdb
>> LinkDb: adding segment: /user/root/crawl.1/segments/20060929120038
>> LinkDb: adding segment: /user/root/crawl.1/segments/20060929120235
>> LinkDb: done
>> Indexer: starting
>> Indexer: linkdb: crawl.1/linkdb
>> Indexer: adding segment: /user/root/crawl.1/segments/20060929120038
>> Indexer: adding segment: /user/root/crawl.1/segments/20060929120235
>> Indexer: done
>> Dedup: starting
>> Dedup: adding indexes in: crawl.1/indexes
>> Dedup: done
>> Adding /user/root/crawl.1/indexes/part-00000
>> Adding /user/root/crawl.1/indexes/part-00001
>> crawl finished: crawl.1
>>
>> Thanks and Regards
>> Mohanlal
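(A note on the log quoted above, in case it helps anyone else: the "fpo: ssh: fpo: Name or service not known" lines appeared while conf/slaves still contained a host name, fpo, that did not resolve; after it listed only resolvable hosts such as sonu, the datanode and tasktracker on that machine started. My assumption is that every name in conf/slaves has to resolve from the master and be reachable over passwordless ssh, for example with /etc/hosts entries roughly like the following, where the addresses are only placeholders:)

    # example entries only, replace the addresses with the real ones
    192.168.1.10    mohanlal.qburst.local    mohanlal
    192.168.1.11    sonu.qburst.local        sonu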
>>
>> "Håvard W. Kongsgård"-2 wrote:
>>
>>> Does /user/root/urls exist? Have you uploaded the urls folder to your
>>> DFS system?
>>>
>>> bin/hadoop dfs -mkdir urls
>>> bin/hadoop dfs -copyFromLocal urls.txt urls/urls.txt
>>>
>>> or
>>>
>>> bin/hadoop dfs -put <localsrc> <dst>
>>>
>>> Mohan Lal wrote:
>>>
>>>> Hi all,
>>>>
>>>> While I am trying to crawl using distributed machines, it throws an
>>>> error:
>>>>
>>>> bin/nutch crawl urls -dir crawl -depth 10 -topN 50
>>>> crawl started in: crawl
>>>> rootUrlDir = urls
>>>> threads = 10
>>>> depth = 10
>>>> topN = 50
>>>> Injector: starting
>>>> Injector: crawlDb: crawl/crawldb
>>>> Injector: urlDir: urls
>>>> Injector: Converting injected urls to crawl db entries.
>>>> Exception in thread "main" java.io.IOException: Input directory
>>>> /user/root/urls in localhost:9000 is invalid.
>>>>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
>>>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
>>>>         at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
>>>>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
>>>>
>>>> What's wrong with my configuration? Please help me.
>>>>
>>>> Regards
>>>> Mohan Lal
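(P.S. The "Input directory /user/root/urls in localhost:9000 is invalid" error quoted at the bottom went away once the urls directory had actually been put into the DFS as suggested above. I assume the upload can be double-checked with something like:)

    bin/hadoop dfs -ls urls    # relative paths resolve under /user/root

(which should show the seed list, e.g. urls/urls.txt, before bin/nutch crawl is run.)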
