All,

I am a master's student and want to crawl the whole web for my master's
project.

While trying to generate, fetch, and crawl the whole web using Nutch (I am
following the steps from http://lucene.apache.org/nutch/tutorial8.html), I got
confused by several Nutch terms and their usage:
1) What is the purpose of *crawl_fetch* and *crawldb*, and how do they differ?
If Nutch stores all the info regarding URLs in *crawldb*, then what is the
need for *crawl_fetch*?
2) Moreover, what do fetch and generate do? Can anyone describe them in detail?
Is there any documentation for Nutch commands like generate, fetch, etc.?


Thanks & Regards,
Gaurang Patel
