Thanks a lot, Sean.  It wasn't clear to me that those commands acted
upon waiting outlinks as well, or that outlinks were basically in the
same list as injected URLs.



Ricardo J. Méndez
http://ricardo.strangevistas.net/

Sean Dean wrote:
> The default interval for fetched content is 30 days, so what's in your index 
> now will not be refetched until those days have passed.
>  
> All the new links are ready to be fetched immediately. Just create another 
> segment from the same Nutch DB and it will include all of those new links to 
> be fetched.
>  
> You might want to run some stats on your Nutch DB before you do this, or at 
> least limit the size of the new segment being created. Depending on the size 
> of your first segment and the number of links on those pages, you might have 
> imported "a lot" more links than you're expecting.
>  
> Stats command: 
>  
> bin/nutch readdb crawl/crawldb -stats
> 
>  
> Limiting segment size:
>  
> bin/nutch generate crawl/crawldb crawl/segments -topN [maximum number of links]
> 
>  
> ----- Original Message ----
> From: Ricardo J. Méndez <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, March 7, 2007 12:16:54 AM
> Subject: Following outlinks during - or after - seed fetch
> 
> 
> Hi,
> 
> I've written a plugin and have been running some tests with Nutch, based
> on the tutorials on the wiki (specifically
> http://wiki.apache.org/nutch/NutchTutorial ).  I'm seeding the crawl
> list with a limited item list, so that I can verify the items are being
> loaded.
> 
> After the end of the fetch, the index is correctly populated with the
> items I told it to fetch.   How can I start a crawl from the outlinks on
> the items I've seeded?
> 
> Thanks in advance,
> 
> 
> 
> Ricardo J. Méndez
> http://ricardo.strangevistas.net/
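
The two commands in the quoted reply can be chained into one small script.
What follows is only a rough sketch: the names NUTCH, DRY_RUN, run and
next_round are made up for illustration and are not part of Nutch, and the
paths and the -topN value of 1000 are assumed example values. By default it
only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch only: NUTCH, DRY_RUN, run and next_round are illustrative names,
# not part of Nutch. Paths and the topN value are assumed examples.
NUTCH="${NUTCH:-bin/nutch}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: next_round CRAWLDB SEGMENTS_DIR TOPN
next_round() {
  crawldb="$1"
  segments="$2"
  topn="$3"
  # Look at the db statistics first, to see how many new links were added.
  run "$NUTCH" readdb "$crawldb" -stats || return 1
  # Create another segment from the same db, capped at TOPN links.
  run "$NUTCH" generate "$crawldb" "$segments" -topN "$topn"
}

next_round crawl/crawldb crawl/segments 1000
```

The new segment still has to be fetched afterwards, the same way the first
one was.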
