On 5/31/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Doğacan Güney wrote:
> >
> > I am still not sure about the source of this bug, but I think I found
> > some unnecessary waits in Fetcher2. Even if a url is blocked by
> > robots.txt (or has a crawl delay larger than max.crawl.delay),
> > Fetcher2 still waits fetcher.server.delay before fetching another url
> > from the same host, which is not necessary, considering that Fetcher2
> > didn't make a request to the server anyway.
> >
> > So, I have put up a patch for this at (*). What do you think? If you
> > have no objections, I am going to go ahead and open an issue for this.
> >
> > (*) http://www.ceng.metu.edu.tr/~e1345172/fetcher2_robots.patch
>
> Good catch! The patch looks good, too - please go ahead. One question:
> why did you remove the call to finishFetchItem() around line 505?
Because it seems we already call finishFetchItem() in that code path, just before the switch statement. I have opened NUTCH-495 for this; if I am mistaken, just give me a nudge and I will send an updated patch.

> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

--
Doğacan Güney
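For readers following the thread: the fix described above amounts to applying the per-host politeness delay only when an actual request was made. A minimal sketch of that control flow is below; the names (Outcome, delayAfter, SERVER_DELAY_MS) are illustrative stand-ins, not the actual Fetcher2 API or the patch itself.

```java
// Hedged sketch of the crawl-delay logic discussed above.
// Assumption: a fixed 5s fetcher.server.delay; the real value comes
// from the Nutch configuration.
public class CrawlDelaySketch {

    // Possible outcomes of attempting to fetch a url from a host.
    enum Outcome { FETCHED, BLOCKED_BY_ROBOTS, CRAWL_DELAY_TOO_LARGE }

    static final long SERVER_DELAY_MS = 5000; // stand-in for fetcher.server.delay

    // Delay to apply before the next fetch from the same host.
    static long delayAfter(Outcome outcome) {
        switch (outcome) {
            case FETCHED:
                // A request actually hit the server: be polite and wait.
                return SERVER_DELAY_MS;
            case BLOCKED_BY_ROBOTS:
            case CRAWL_DELAY_TOO_LARGE:
                // No request was made, so waiting gains nothing --
                // this is the unnecessary wait the patch removes.
                return 0;
            default:
                return SERVER_DELAY_MS;
        }
    }

    public static void main(String[] args) {
        System.out.println(delayAfter(Outcome.FETCHED));
        System.out.println(delayAfter(Outcome.BLOCKED_BY_ROBOTS));
        System.out.println(delayAfter(Outcome.CRAWL_DELAY_TOO_LARGE));
    }
}
```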
