Re: [Nutch-general] Fetcher threads & automation

Dennis Kubes Sun, 28 Jan 2007 08:48:09 -0800

We have a Python script with logging that fully automates the fetching
and updating process, but not the link inversion or indexing steps.  If
anybody wants a copy, send me an email and I will send it to you.
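
As a rough illustration only (this is not the script mentioned above), one
generate/fetch/updatedb cycle with logging might be sketched as follows.
The function names, the logger name, and the paths in the usage note are
assumptions made for the example; the only Nutch pieces relied on are the
`generate`, `fetch` and `updatedb` commands of `bin/nutch` and the fact
that segment directories are named by timestamp.

```python
import logging
import subprocess
from pathlib import Path

# Hypothetical logger name, chosen for this sketch.
log = logging.getLogger("nutch_cycle")


def newest_segment(segments_dir):
    """Return the newest segment directory under segments_dir.

    Segment directories are named by timestamp, so the lexicographically
    largest name is assumed to be the most recently generated one.
    """
    segments = sorted(p for p in Path(segments_dir).iterdir() if p.is_dir())
    if not segments:
        raise RuntimeError("no segments found in %s" % segments_dir)
    return segments[-1]


def run_step(cmd, run=subprocess.run):
    """Log and run one command, raising if it exits with a non-zero status."""
    log.info("running: %s", " ".join(cmd))
    result = run(cmd)
    if result.returncode != 0:
        raise RuntimeError("step failed (%d): %s"
                           % (result.returncode, " ".join(cmd)))


def crawl_cycle(nutch, crawldb, segments_dir, run=subprocess.run):
    """Run one generate -> fetch -> updatedb cycle and return the segment path.

    Link inversion and indexing are deliberately left out, matching the
    scope described above. The `run` parameter exists only so the cycle
    can be exercised without a Nutch installation.
    """
    run_step([nutch, "generate", crawldb, segments_dir], run)
    segment = str(newest_segment(segments_dir))
    run_step([nutch, "fetch", segment], run)
    run_step([nutch, "updatedb", crawldb, segment], run)
    return segment
```

With logging configured (for example `logging.basicConfig(level=logging.INFO)`),
calling `crawl_cycle("bin/nutch", "crawl/crawldb", "crawl/segments")` in a
loop would repeat the cycle; the paths here are placeholders, and a real
job would also need to decide when to stop and how to handle a failed step.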


We are currently working on a more in-depth framework for automating
these types of job streams in Python, but that is not complete yet.

Andrzej, do you think this is something we should post to the wiki?

Dennis Kubes

Justin Hartman wrote:
> Hi all
> 
> Just have a couple more questions which remain unclear to me at this stage.
> 
> 1. I'm fetching URLs on a P4 2.8 GHz machine with 1 GB RAM and a 100 Mbps
> connection. Based on this config, what would you recommend as the maximum
> number of fetcher threads?
> 
> 2. Does anyone know of a script or plugin that can automate the
> segment/fetch/indexing process? Basically I'm fetching about 20
> million pages and I have to run the segment, fetch and index process
> myself in a shell (which takes some time). I really would like some
> sort of a shell script that I can run and the whole process can run as
> a daemon in the background and I can worry about other issues.
> 
> Thank you in advance!!!!
