I see now what's causing the error. urls/nutch is a file, but you have to give only the urls folder as input, not the file inside it as I did ;)
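For anyone who hits the same "Input directory ... is invalid" error, here is a minimal sketch of what I mean. It assumes you run it from the Nutch install directory and that urls/nutch is a plain-text seed file with one URL per line; the example.com URL is only a placeholder I made up, not something from the tutorial.

```shell
# Sketch: pass the seed *directory* to the crawl command, not the file inside it.
seed_dir="urls"              # directory holding one or more seed files
seed_file="$seed_dir/nutch"  # plain-text seed file, one URL per line

mkdir -p "$seed_dir"
# Placeholder seed URL (assumption); replace it with the site you want to crawl.
[ -f "$seed_file" ] || echo "http://example.com/" > "$seed_file"

if [ -d "$seed_dir" ] && [ -x bin/nutch ]; then
    # Same flags as in my original command, but with the directory as input.
    bin/nutch crawl "$seed_dir" -dir crawl.test -depth 10
else
    echo "skipping crawl: need a seed directory and bin/nutch in the current directory"
fi
```

The only change from my failing command is the first argument: urls instead of urls/nutch.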
PS: Is there an IRC channel for Nutch, or 'only' the mailing list?

Thanks,
Martin

Quoting Briggs <[EMAIL PROTECTED]>:

> is urls/nutch a file or directory?
>
> On 6/6/07, Martin Kammerlander <[EMAIL PROTECTED]> wrote:
> > Hi
> >
> > I wanted to start a crawl like it is done in the nutch 0.8.x tutorial.
> > Unfortunately I get the following error:
> >
> > [EMAIL PROTECTED] nutch-0.8.1]$ bin/nutch crawl urls/nutch -dir crawl.test -depth 10
> > crawl started in: crawl.test
> > rootUrlDir = urls/nutch
> > threads = 10
> > depth = 10
> > Injector: starting
> > Injector: crawlDb: crawl.test/crawldb
> > Injector: urlDir: urls/nutch
> > Injector: Converting injected urls to crawl db entries.
> > Exception in thread "main" java.io.IOException: Input directory /scratch/nutch-0.8.1/urls/nutch in local is invalid.
> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:105)
> >
> > Any ideas what is causing that?
> >
> > regards
> > martin
>
> --
> "Conscious decisions by conscious minds are what make reality real"

_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general