Fuad Efendi wrote:
Hi Andrzej,

The real bottleneck of Nutch is RegexURLNormalizer: it is still a synchronized 
singleton (shared by multiple threads). There are also similar synchronized plugins, 
which should probably be refactored into Nutch core...

It's not a singleton, but it's true that the normalize() method is synchronized. Did you actually measure the impact of this synchronization on the crawling speed? I very much doubt it outweighs the impact of politeness limits.
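
A rough way to check this is a throwaway micro-benchmark along the lines below. It is only a sketch under stated assumptions: NormalizerCostSketch and its regex are hypothetical stand-ins, not Nutch's RegexURLNormalizer, and the 5-second per-host delay, the 100 distinct hosts and the 10 threads are illustrative figures rather than measured ones. It times a synchronized, regex-based normalize() under thread contention and prints the result next to the fetch-rate ceiling that a politeness delay would impose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Pattern;

public class NormalizerCostSketch {
    // Hypothetical rule: strip a ";jsessionid=..." path parameter from the URL.
    private static final Pattern SESSION_ID = Pattern.compile("(?i);jsessionid=[^?#]*");

    // Synchronized on purpose, to mimic the method under discussion.
    public static synchronized String normalize(String url) {
        return SESSION_ID.matcher(url).replaceAll("");
    }

    // Upper bound on fetches per second when every host gets one request per delay.
    public static double politeFetchesPerSecond(double delaySeconds, int hosts) {
        return hosts / delaySeconds;
    }

    // Rough throughput of normalize() when several threads compete for its lock.
    public static double contendedCallsPerSecond(int threads, int callsPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long start = System.nanoTime();
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                pending.add(pool.submit(() -> {
                    for (int i = 0; i < callsPerThread; i++) {
                        normalize("http://example.com/page" + i + ";jsessionid=ABC123?x=1");
                    }
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the worker and surface any failure
            }
            long elapsed = Math.max(1L, System.nanoTime() - start);
            return (threads * (double) callsPerThread) * 1e9 / elapsed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed figures: 10 threads, 5 s delay per host, 100 distinct hosts.
        double normalized = contendedCallsPerSecond(10, 100_000);
        double polite = politeFetchesPerSecond(5.0, 100);
        System.out.printf("synchronized normalize(): ~%.0f calls/s%n", normalized);
        System.out.printf("politeness ceiling:       ~%.0f fetches/s%n", polite);
    }
}
```

There is no JIT warm-up here, so the first figure is only indicative; a proper harness such as JMH, run against the real normalizer and the real rule file, would give a number worth quoting.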

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
