Yes, I have added this to my crawl-urlfilter.txt:

+^http://([a-z0-9]*\.)*(yahoo.com|cnn.com|amazon.com|msn.com|google.com)/

but I still have the problem that I mentioned in my previous mail.

On 4/10/07, Michael Wechner <[EMAIL PROTECTED]> wrote:
> Meryl Silverburgh wrote:
>
> > Hi,
> >
> > I am trying to set up Nutch.
> > I set up 1 site in my urls file:
> > http://www.yahoo.com
>
> Have you added it to the URL/Crawl filters?
>
> Cheers
>
> Michael
>
> > And then I start the crawl using this command:
> > $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
> >
> > But I get this "No URLs to fetch". Can you please tell me what I am
> > missing?
> > $ bin/nutch crawl urls -dir crawl -depth 1 -topN 5
> > crawl started in: crawl
> > rootUrlDir = urls
> > threads = 10
> > depth = 1
> > topN = 5
> > Injector: starting
> > Injector: crawlDb: crawl/crawldb
> > Injector: urlDir: urls
> > Injector: Converting injected urls to crawl db entries.
> > Injector: Merging injected urls into crawl db.
> > Injector: done
> > Generator: Selecting best-scoring urls due for fetch.
> > Generator: starting
> > Generator: segment: crawl/segments/20070406140513
> > Generator: filtering: false
> > Generator: topN: 5
> > Generator: jobtracker is 'local', generating exactly one partition.
> > Generator: 0 records selected for fetching, exiting ...
> > Stopping at depth=0 - no more URLs to fetch.
> > No URLs to fetch - check your seed list and URL filters.
> > crawl finished: crawl
>
> --
> Michael Wechner
> Wyona - Open Source Content Management - Apache Lenya
> http://www.wyona.com http://lenya.apache.org
> [EMAIL PROTECTED] [EMAIL PROTECTED]
> +41 44 272 91 61
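One way to rule the filter line in or out is to test the pattern against the seed URL outside of Nutch. The snippet below is only a rough sketch in Python, not Nutch code: `accepts` and `RULES` are hypothetical names, and it assumes the first-match-wins behaviour described in the comments of the filter file (the first pattern found in the URL decides, and a URL that matches nothing is ignored). Python's `re` stands in for Java's regex engine here, which should behave the same for this particular pattern.

```python
import re

def accepts(rules, url):
    """Return True if the first rule whose pattern is found in url is a '+' rule.

    Assumption: rules are strings like '+regex' or '-regex', checked in order,
    and a URL that matches no rule is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as rejected

# The filter line quoted in this mail, as the only rule.
RULES = [
    r"+^http://([a-z0-9]*\.)*(yahoo.com|cnn.com|amazon.com|msn.com|google.com)/",
]

# Check the seed as written in the urls file, and with a trailing slash.
for url in ("http://www.yahoo.com", "http://www.yahoo.com/"):
    print(url, accepts(RULES, url))
```

The pattern ends in `/`, so taken by itself it only matches the form with the trailing slash. Whether that matters for the crawl depends on whether Nutch normalizes the seed URL before filtering it, which this sketch does not model. It may also be worth checking that no `-` rule earlier in the same file matches first.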
