Thanks a lot, Sean. It wasn't clear to me that those commands also acted on the outlinks waiting to be fetched, or that outlinks were basically in the same list as the injected URLs.
Ricardo J. Méndez
http://ricardo.strangevistas.net/

Sean Dean wrote:
> The default interval for fetched content is 30 days, so what's in your index
> now will not be re-fetched until that interval has passed.
>
> All the new links are ready to be fetched immediately. Just create another
> segment from the same Nutch DB and it will include all of those new links to
> be fetched.
>
> You might want to run some stats on your Nutch DB before you do this, or at
> least limit the size of the new segment being created. Depending on the size
> of your first segment and the number of links on those pages, you might have
> imported "a lot" more links than you're expecting.
>
> Stats command:
>
> bin/nutch readdb crawl/crawldb -stats
>
> Limiting segment size:
>
> bin/nutch generate crawl/crawldb crawl/segments -topN [maximum number of links]
>
> ----- Original Message ----
> From: Ricardo J. Méndez <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, March 7, 2007 12:16:54 AM
> Subject: Following outlinks during - or after - seed fetch
>
> Hi,
>
> I've written a plugin and have been running some tests with Nutch, based
> on the tutorials on the wiki (specifically
> http://wiki.apache.org/nutch/NutchTutorial ). I'm seeding the crawl
> list with a limited item list, so that I can verify the items are being
> loaded.
>
> After the end of the fetch, the index is correctly populated with the
> items I told it to fetch. How can I start a crawl from the outlinks on
> the items I've seeded?
>
> Thanks in advance,
>
> Ricardo J. Méndez
> http://ricardo.strangevistas.net/
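To make Sean's explanation concrete, here is a toy Python model of the cycle. It is an illustration only, not Nutch code, and the names ToyCrawlDb, ToyEntry, inject, update, stats and generate are all hypothetical. It assumes one table holding both injected seeds and discovered outlinks, a 30-day re-fetch interval for pages that have been fetched, and a generate step that returns the highest-scoring due entries up to an optional top_n cap, loosely mirroring -topN. The real generate/fetch/updatedb cycle has more states and options than this.

```python
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60              # seconds in a day
DEFAULT_INTERVAL = 30 * DAY     # assumed re-fetch interval (the 30-day default)


@dataclass
class ToyEntry:
    """Hypothetical crawl-DB row for this sketch; not Nutch's own class."""
    url: str
    status: str        # simplified: "unfetched" or "fetched"
    fetch_time: float  # earliest time (seconds) the URL may be generated again
    score: float = 1.0


class ToyCrawlDb:
    """Toy model: injected seeds and discovered outlinks share one table."""

    def __init__(self) -> None:
        self.entries: dict[str, ToyEntry] = {}

    def inject(self, url: str, now: float) -> None:
        # A seed URL is due for fetching immediately.
        self.entries.setdefault(url, ToyEntry(url, "unfetched", now))

    def update(self, fetched_url: str, outlinks: list[str], now: float) -> None:
        # Stands in for fetch + updatedb: mark the page fetched, push its next
        # fetch 30 days out, and add unseen outlinks as immediately due entries.
        entry = self.entries[fetched_url]
        entry.status = "fetched"
        entry.fetch_time = now + DEFAULT_INTERVAL
        for link in outlinks:
            self.entries.setdefault(link, ToyEntry(link, "unfetched", now))

    def stats(self) -> dict[str, int]:
        # Rough analogue of readdb -stats: count entries per status.
        counts: dict[str, int] = {}
        for entry in self.entries.values():
            counts[entry.status] = counts.get(entry.status, 0) + 1
        return counts

    def generate(self, now: float, top_n: Optional[int] = None) -> list[str]:
        # Pick entries that are due, best score first (ties broken by URL),
        # and cap the list at top_n, loosely like generate -topN.
        due = [e for e in self.entries.values() if e.fetch_time <= now]
        due.sort(key=lambda e: (-e.score, e.url))
        return [e.url for e in due[:top_n]]


db = ToyCrawlDb()
db.inject("http://example.com/a", now=0)
db.inject("http://example.com/b", now=0)
print(db.generate(now=0))              # ['http://example.com/a', 'http://example.com/b']
db.update("http://example.com/a", ["http://example.com/c", "http://example.com/d"], now=0)
db.update("http://example.com/b", ["http://example.com/d", "http://example.com/e"], now=0)
print(db.stats())                      # {'fetched': 2, 'unfetched': 3}
print(db.generate(now=DAY, top_n=2))   # ['http://example.com/c', 'http://example.com/d']
```

In the model, the seeds a and b are not due again until 30 days after they were fetched. The outlinks c, d and e are due immediately, which is why a second generate picks them up without re-injecting anything, and why a top_n cap matters when a first segment turns up many more links than expected.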
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
