Fuad Efendi wrote:
> Hi Andrzej,
> The real bottleneck of Nutch is RegexURLNormalizer; it is still a
> synchronized singleton (shared by multiple threads). There are similar
> synchronized plugins which should probably be refactored into Nutch core...
It's not a singleton, but it's true that the normalize() method is
synchronized. Did you actually measure the impact of this
synchronization on the crawling speed? I very much doubt it outweighs
the impact of politeness limits.
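
A rough way to put a number on this is a small timing harness like the sketch below. It is not Nutch code: the class name, the regex rule, the thread count, and the 1000 ms politeness delay are all assumptions chosen for illustration. It pushes URLs through a synchronized regex-based normalize() from several threads and reports the effective serialized cost per URL, which can then be compared with the per-host delay.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a synchronized regex normalizer; not the Nutch class.
public class SyncNormalizeTiming {
    // Assumed example rule: strip a ";jsessionid=..." path parameter.
    private static final Pattern SESSION_ID =
            Pattern.compile(";jsessionid=[0-9A-Za-z]+");

    // All threads contend on the class lock, as with a shared synchronized method.
    public static synchronized String normalize(String url) {
        return SESSION_ID.matcher(url).replaceAll("");
    }

    // Runs `threads` workers, each normalizing `urlsPerThread` URLs,
    // and returns the wall-clock time in nanoseconds.
    public static long timeNanos(int threads, int urlsPerThread)
            throws InterruptedException {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int j = 0; j < urlsPerThread; j++) {
                    normalize("http://example.com/p" + id + "_" + j
                            + ";jsessionid=ABC123?x=1");
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        int threads = 10;               // assumed fetcher thread count
        int urlsPerThread = 100_000;
        long politenessDelayMs = 1000;  // assumed per-host delay, illustration only

        timeNanos(threads, 10_000);     // warm-up so the JIT compiles the hot path
        long elapsed = timeNanos(threads, urlsPerThread);

        long totalUrls = (long) threads * urlsPerThread;
        double microsPerUrl = elapsed / 1000.0 / totalUrls;
        System.out.printf("Effective cost per URL under contention: %.3f us%n",
                microsPerUrl);
        System.out.printf("Assumed politeness delay per host: %d ms%n",
                politenessDelayMs);
    }
}
```

If the per-URL figure comes out orders of magnitude below the politeness delay, the lock is unlikely to be what limits crawl speed. A profiler run against a real crawl would be more convincing than a micro-benchmark like this one.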
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com