I have crawled a few websites. Everything worked fine, and I now have a
crawldb, segments, a linkdb and indexes.
I have decided to recrawl those websites, so I will:
- generate a list of URLs (a new segment) from my existing crawldb
- fetch this segment
- update the existing crawldb
- invert links and finally index.
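The recrawl steps above can be sketched as a small Python driver around the Nutch command-line tools. This is a hypothetical illustration, not an official Nutch script. It assumes the Nutch 0.8/0.9-era `bin/nutch` commands (`generate`, `fetch`, `updatedb`, `invertlinks`, `index`) with their positional arguments, a `crawl/` directory holding `crawldb`, `segments` and `linkdb`, and timestamp-named segment directories that sort chronologically. The names `NUTCH`, `newest_segment`, `post_generate_commands`, `recrawl_round` and the index path are made up for the example. Exact arguments differ between Nutch versions, so check the usage message of each command before relying on this.

```python
import os
import subprocess
from typing import List

# Assumption: path to the Nutch launcher script, relative to the Nutch install dir.
NUTCH = "bin/nutch"


def generate_command(crawl_dir: str) -> List[str]:
    """Build the command that creates a new segment (fetch list) from the existing crawldb."""
    return [NUTCH, "generate",
            os.path.join(crawl_dir, "crawldb"),
            os.path.join(crawl_dir, "segments")]


def newest_segment(segments_dir: str) -> str:
    """Return the newest segment, assuming timestamp-named dirs sort chronologically."""
    names = sorted(
        name for name in os.listdir(segments_dir)
        if os.path.isdir(os.path.join(segments_dir, name))
    )
    if not names:
        raise FileNotFoundError(f"no segments found in {segments_dir}")
    return os.path.join(segments_dir, names[-1])


def post_generate_commands(crawl_dir: str, segment: str,
                           new_index: str) -> List[List[str]]:
    """Fetch the new segment, update the crawldb, invert links and index it."""
    crawldb = os.path.join(crawl_dir, "crawldb")
    linkdb = os.path.join(crawl_dir, "linkdb")
    return [
        [NUTCH, "fetch", segment],
        [NUTCH, "updatedb", crawldb, segment],
        [NUTCH, "invertlinks", linkdb, segment],
        [NUTCH, "index", new_index, crawldb, linkdb, segment],
    ]


def recrawl_round(crawl_dir: str, new_index: str) -> None:
    """Run one recrawl round; raises CalledProcessError if any step fails."""
    subprocess.run(generate_command(crawl_dir), check=True)
    # The segment name is only known after generate has created it.
    segment = newest_segment(os.path.join(crawl_dir, "segments"))
    for cmd in post_generate_commands(crawl_dir, segment, new_index):
        subprocess.run(cmd, check=True)
```

For example, `recrawl_round("crawl", "crawl/index-recrawl")` would run one round from the Nutch install directory (both paths are hypothetical). Splitting command construction from execution makes it easy to print the commands first as a dry run.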
Should I create a new index and linkdb from all the segments in the folder
(both the segments from my first crawl and the segments from this recrawl)?
Or should I just run the index (or invertlinks) command against the existing
index (or existing linkdb), specifying only the segments that have just been
crawled?
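The two alternatives in the question can be written out as command lists for comparison. This is a hypothetical sketch that only builds the commands and does not say which option is correct. It assumes the Nutch 0.8/0.9-era `bin/nutch invertlinks <linkdb> <segment> ...` and `bin/nutch index <index> <crawldb> <linkdb> <segment> ...` forms, and assumes that `invertlinks` merges new links into an existing linkdb; verify both against your version. The function names and all paths are made up for illustration.

```python
import os
from typing import List

# Assumption: path to the Nutch launcher script.
NUTCH = "bin/nutch"


def rebuild_from_all_segments(crawl_dir: str, all_segments: List[str],
                              new_linkdb: str, new_index: str) -> List[List[str]]:
    """Option 1: build a fresh linkdb and index from every segment, old and new."""
    crawldb = os.path.join(crawl_dir, "crawldb")
    return [
        [NUTCH, "invertlinks", new_linkdb, *all_segments],
        [NUTCH, "index", new_index, crawldb, new_linkdb, *all_segments],
    ]


def update_with_new_segments(crawl_dir: str, new_segments: List[str],
                             new_index: str) -> List[List[str]]:
    """Option 2: add only the new segments to the existing linkdb and index just those."""
    crawldb = os.path.join(crawl_dir, "crawldb")
    linkdb = os.path.join(crawl_dir, "linkdb")
    return [
        [NUTCH, "invertlinks", linkdb, *new_segments],
        [NUTCH, "index", new_index, crawldb, linkdb, *new_segments],
    ]
```

In option 2 the new index covers only the new segments, so it would presumably still have to be combined with the existing index and deduplicated (Nutch provides `merge` and `dedup` commands, whose arguments vary by version).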
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general