Doğacan Güney wrote:

> I am still not sure about the source of this bug, but I think I found
> some unnecessary waits in Fetcher2. Even if a url is blocked by
> robots.txt (or has a crawl delay larger than max.crawl.delay),
> Fetcher2 still waits fetcher.server.delay before fetching another url
> from the same host, which is not necessary, considering that Fetcher2
> didn't make a request to the server anyway.
> 
> So, I have put up a patch for this at (*) . What do you think? If you
> have no objections I am going to go ahead and open an issue for this.
> 
> (*) http://www.ceng.metu.edu.tr/~e1345172/fetcher2_robots.patch

Good catch! The patch looks good, too - please go ahead. One question: 
why did you remove the call to finishFetchItem() around line 505?

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
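
The behaviour discussed above can be illustrated with a small sketch. This is
not the Fetcher2 code and not the contents of the patch: the class
HostQueueSketch, its methods, and the requestWasSent flag are hypothetical
names chosen for illustration, and serverDelayMs only stands in for
fetcher.server.delay. The thread does not show the signature of
finishFetchItem(), so the sketch does not try to reproduce it. The idea is
simply that the per-host politeness delay is applied only when a request was
actually sent to the server.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-host queue; an illustration, not Nutch's Fetcher2. */
class HostQueueSketch {
    // Stands in for fetcher.server.delay (assumed to be in milliseconds here).
    private final long serverDelayMs;
    private final Deque<String> urls = new ArrayDeque<>();
    // Earliest time at which the next url for this host may be handed out.
    private long nextFetchTimeMs = 0L;

    HostQueueSketch(long serverDelayMs) {
        this.serverDelayMs = serverDelayMs;
    }

    void add(String url) {
        urls.addLast(url);
    }

    /** Returns the next url if the host is ready at nowMs, otherwise null. */
    String poll(long nowMs) {
        if (nowMs < nextFetchTimeMs) {
            return null; // still inside the politeness delay
        }
        return urls.pollFirst();
    }

    /**
     * Marks the current item as done. The delay is scheduled only when a
     * request really reached the server; a url rejected by robots.txt or by
     * an excessive crawl delay leaves the host immediately available.
     */
    void finish(long nowMs, boolean requestWasSent) {
        if (requestWasSent) {
            nextFetchTimeMs = nowMs + serverDelayMs;
        }
    }
}
```

With a 5000 ms delay, a url rejected by robots.txt is finished with
requestWasSent set to false, and the next url from the same host can be
polled right away. After a real fetch the queue returns null until the delay
has elapsed.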

