Hi Sean,

Firstly, thanks for the input - it is much appreciated!

> 1. I would try anything between 100 and 300 threads when using the latest
> trunk sources (I currently use 150). You don't really need that many threads,
> and with too many you might run out of stack memory.

What is your recommendation for threads per host? I was running 10, but then I noticed that one site I was indexing returned a 500 server error stating that "there were too many connections to localhost". The last thing I want to do is create a DoS attack on webservers, so I reduced this to 5, but I'm not sure what the recommended value is.

> 2. This isn't exactly what you wanted, but you can build upon it. It should
> save you at least some time as it will complete one full cycle (generate,
> fetch, updatedb, invertlinks, and index). Most of this is basically what's
> listed in the tutorial, and remember to edit so that it matches your paths
> and config.

When you say it will complete one full cycle, do you mean that only one segment will be created, and that the fetch, updatedb, invertlinks, and index steps will then run on that one segment?

One last question: can a URL be deleted from a segment and/or index once it has been fetched, or will the whole index need to be re-created?

-- 
Regards
Justin Hartman
PGP Key ID: 102CC123

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
