"No input directories specified in:" while crawling with the nightly build from 14.1.2006: sh ./nutch crawl urllist.txt -dir tmpdir
------------------------------------------------------------------------------------------------------------------------------
         Key: NUTCH-175
         URL: http://issues.apache.org/jira/browse/NUTCH-175
     Project: Nutch
        Type: Bug
 Environment: SUSE Linux 9.3
    Reporter: Matthias Günter
    Priority: Trivial

[EMAIL PROTECTED]:~/workspace/lucene/nutch-nightly/bin> sh ./nutch crawl urllist.txt -dir tmpdir
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-default.xml
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/crawl-tool.xml
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/mapred-default.xml
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-site.xml
060114 205612 crawl started in: tmpdir
060114 205612 rootUrlDir = urllist.txt
060114 205612 threads = 10
060114 205612 depth = 5
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-default.xml
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/crawl-tool.xml
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-site.xml
060114 205612 Injector: starting
060114 205612 Injector: crawlDb: tmpdir/crawldb
060114 205612 Injector: urlDir: urllist.txt
060114 205612 Injector: Converting injected urls to crawl db entries.
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-default.xml
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/crawl-tool.xml
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/mapred-default.xml
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/mapred-default.xml
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-site.xml
060114 205612 Running job: job_n0o7ps
060114 205612 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-default.xml
060114 205613 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/mapred-default.xml
060114 205613 parsing /tmp/nutch/mapred/local/localRunner/job_n0o7ps.xml
060114 205613 parsing file:/home/guenter/workspace/lucene/nutch-nightly/conf/nutch-site.xml
java.io.IOException: No input directories specified in: NutchConf: nutch-default.xml , mapred-default.xml , /tmp/nutch/mapred/local/localRunner/job_n0o7ps.xml , nutch-site.xml
        at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:85)
        at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:95)
        at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:63)
060114 205613 map 0%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:102)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)

urllist.txt contains a single line:

http://www.mentor.ch

PS: Is there a committer or developer (near Switzerland) who can provide paid support for a mixed index covering an intranet, some internet sites, and scanning of local drives (P:\, S:\, etc.)?

--
This message is automatically generated by JIRA.