The default refetch interval for fetched content is 30 days, so what's in your
index now will not be fetched again until those days have passed.
All the new links are ready to be fetched immediately. Just create another
segment from the same Nutch DB and it will include all of those new links to be
fetched.
You might want to run some stats on your Nutch DB before you do this, or at
least limit the size of the new segment being created. Depending on the size of
your first segment and the number of links on those pages, you might have
imported a lot more links than you're expecting.
Stats command:
bin/nutch readdb crawl/crawldb -stats
Limiting segment size:
bin/nutch generate crawl/crawldb crawl/segments -topN [maximum amount of links]
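
Putting the pieces together, one more crawl round could look roughly like the
sketch below. The generate / fetch / updatedb sequence is what I'd expect from
the tutorial's step-by-step section rather than something spelled out above, so
check it against your Nutch version. The TOPN value of 1000 and the function
names latest_segment and next_round are made up for illustration.

```shell
#!/bin/sh
# Sketch of one additional crawl round. Paths follow the commands above.
NUTCH="${NUTCH:-bin/nutch}"            # assumption: run from the Nutch install directory
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
SEGMENTS="${SEGMENTS:-crawl/segments}"
TOPN="${TOPN:-1000}"                   # hypothetical cap on links in the new segment

# Print the newest segment directory under $1. Segment directories are
# named by timestamp, so a lexical sort puts the newest one last.
latest_segment() {
  ls -d "$1"/*/ 2>/dev/null | sort | tail -n 1 | sed 's:/$::'
}

# Generate a new segment from the existing DB, fetch it, then fold the
# newly discovered links back into the DB.
next_round() {
  "$NUTCH" generate "$CRAWLDB" "$SEGMENTS" -topN "$TOPN" || return 1
  segment=$(latest_segment "$SEGMENTS")
  [ -n "$segment" ] || return 1        # nothing was generated
  "$NUTCH" fetch "$segment" || return 1
  "$NUTCH" updatedb "$CRAWLDB" "$segment"
}
```

Calling next_round once per round should take you one link "hop" further out
from the seeds. The index still has to be rebuilt afterwards before the new
pages become searchable.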
----- Original Message ----
From: Ricardo J. Méndez <[EMAIL PROTECTED]>
To: [email protected]
Sent: Wednesday, March 7, 2007 12:16:54 AM
Subject: Following outlinks during - or after - seed fetch
Hi,
I've written a plugin and have been running some tests with Nutch, based
on the tutorials on the wiki (specifically
http://wiki.apache.org/nutch/NutchTutorial ). I'm seeding the crawl
list with a limited item list, so that I can verify the items are being
loaded.
After the end of the fetch, the index is correctly populated with the
items I told it to fetch. How can I start a crawl from the outlinks on
the items I've seeded?
Thanks in advance,
Ricardo J. Méndez
http://ricardo.strangevistas.net/