Lukáš Vlček wrote:
Hi,

I just noticed that Niocchi has been released recently.
http://www.niocchi.com/

Niocchi is a Java asynchronous crawl library implemented with NIO. It is designed to crawl several thousand hosts in parallel on a single low-end server. It is currently being used in production by Enormo <http://www.enormo.com/> to crawl thousands of websites daily, and by Vitalprix <http://www.vitalprix.com/>.

Well, of course we should optimize our use of resources, and we could check what this library can offer. But I doubt that optimizations at this level would significantly increase crawl speed: low-level IO handling is rarely the bottleneck. Most of the time the politeness limits (max rate of requests per host) are the bottleneck.
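
To put rough numbers on that, here is a back-of-the-envelope sketch. The class and method names are made up for illustration, and the figures (1000 hosts, a 5 second per-host delay, 0.2 seconds per fetch) are assumptions, not measurements from any real crawl. It only shows that the per-host delay caps throughput no matter how fast the IO layer is.

```java
public final class PolitenessEstimate {

    /**
     * Upper bound on the overall fetch rate when every host may receive
     * at most one request per perHostDelaySeconds.
     */
    public static double maxPagesPerSecond(int activeHosts, double perHostDelaySeconds) {
        if (activeHosts < 0 || perHostDelaySeconds <= 0) {
            throw new IllegalArgumentException("need activeHosts >= 0 and delay > 0");
        }
        return activeHosts / perHostDelaySeconds;
    }

    /**
     * Fraction of each per-host cycle spent waiting rather than fetching,
     * assuming the delay is measured from the start of one request to the
     * start of the next. Returns 0 when a fetch takes longer than the delay.
     */
    public static double idleFraction(double fetchSeconds, double perHostDelaySeconds) {
        if (fetchSeconds < 0 || perHostDelaySeconds <= 0) {
            throw new IllegalArgumentException("need fetchSeconds >= 0 and delay > 0");
        }
        return Math.max(0.0, 1.0 - fetchSeconds / perHostDelaySeconds);
    }

    public static void main(String[] args) {
        // Illustrative values only (assumed, not measured).
        int hosts = 1000;
        double delaySeconds = 5.0;
        double fetchSeconds = 0.2;

        System.out.println("max pages/s:   " + maxPagesPerSecond(hosts, delaySeconds));
        System.out.println("idle fraction: " + idleFraction(fetchSeconds, delaySeconds));
    }
}
```

Under these assumed values the ceiling is 200 pages per second for the whole crawl, and each per-host slot sits idle for most of its cycle. Making the fetch itself faster changes the idle fraction but not the ceiling, which moves only with the number of distinct hosts or the allowed per-host rate.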


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
