On 5/18/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Doğacan Güney wrote:
> > On 5/18/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> >> Doğacan Güney wrote:
> >> > Hi everyone,
> >> >
> >> > Has anyone tried Fetcher2 from latest trunk? In our tests, Fetcher2 is
> >> > always slower (by a large margin) than Fetcher.
> >> >
> >> > For a segment with ~30000 urls, we ran Fetcher with 150 threads and
> >> > Fetcher2 with 50 threads. Fetcher finishes around 1 hour, while
> >> > Fetcher2 takes around 4 hours.  We ran this test more than once and
> >> > got similar results.
> >> >
> >> > Are we running Fetcher2 with too few/too many threads? I was under the
> >> > impression that Fetcher2 doesn't need as many threads as Fetcher since
> >> > threads do not block.
> >>
> >>
> >> Yes, that was the idea. Could you test it with the same number of
> >> threads? Is the configuration identical in all other aspects?
> >
> > Yes, it is identical in other aspects. I am currently testing with
> > the same number of threads. Will report if there is a difference.
> >
> >>
> >> Are you running the version with the fix from NUTCH-474?
> >>
> >>
> >> >
> >> > Any suggestions?
> >> >
> >>
> >> If you already have a setup to reproduce this, you could perhaps spend
> >> some time debugging this ... add some timing info, and queue info
> >> logging.
> >
> > What do you think would be a good place (or places) to add debug info?
> > Looking at the code, I am not sure where to add it.
>
> FetchItemQueues.getFetchItem() and FetchItemQueue.getFetchItem() would
> be good places to start - the logging here would show how frequently
> they are called, and why fetch items are not picked up (perhaps
> per-queue blocking is buggy?).
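
For reference, the kind of logging suggested above could look roughly like
the sketch below. This is not Fetcher2's code: InstrumentedQueueSketch,
blockUntil(), miss() and summary() are hypothetical names, and the class is
only a simplified stand-in for a per-host queue. It counts how often
getFetchItem() is called and records why a call returned nothing.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical instrumented queue; it only illustrates what to count and log. */
class InstrumentedQueueSketch {
    private final Queue<String> items = new ArrayDeque<>();
    // Reason for each empty-handed call, mapped to how often it happened.
    private final Map<String, Integer> missReasons = new LinkedHashMap<>();
    private long nextFetchTimeMs = 0L; // earliest time an item may be handed out
    private int calls = 0;
    private int hits = 0;

    void add(String url) { items.add(url); }

    /** Simulates the per-queue blocking that is under suspicion. */
    void blockUntil(long timeMs) { nextFetchTimeMs = timeMs; }

    /** Returns the next item, or null while recording why nothing was returned. */
    String getFetchItem(long nowMs) {
        calls++;
        if (items.isEmpty()) return miss("empty");
        if (nowMs < nextFetchTimeMs) return miss("delayed");
        hits++;
        return items.poll();
    }

    private String miss(String reason) {
        missReasons.merge(reason, 1, Integer::sum);
        return null;
    }

    /** One line suitable for a periodic debug log statement. */
    String summary() {
        return "calls=" + calls + " hits=" + hits + " misses=" + missReasons;
    }
}
```

Logging summary() every few seconds would show whether threads mostly find
empty queues or mostly hit the delay check.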

I am still not sure about the source of this bug, but I think I found
some unnecessary waits in Fetcher2. Even if a url is blocked by
robots.txt (or has a crawl delay larger than max.crawl.delay),
Fetcher2 still waits fetcher.server.delay before fetching another url
from the same host. That wait is not necessary, since Fetcher2 never
made a request to the server in that case.
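
To make the idea concrete, here is a minimal sketch of the intended
behaviour. It is not the patch itself and not Fetcher2's real classes:
HostQueueSketch, poll() and finish() are hypothetical names, and
serverDelayMs only plays the role of fetcher.server.delay. The point is that
the politeness delay starts only when a request was actually sent.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Simplified, hypothetical stand-in for a per-host fetch queue. */
class HostQueueSketch {
    private final Queue<String> urls = new ArrayDeque<>();
    private final long serverDelayMs;   // plays the role of fetcher.server.delay
    private long nextFetchTimeMs = 0L;  // earliest time the next url may be handed out

    HostQueueSketch(long serverDelayMs) { this.serverDelayMs = serverDelayMs; }

    void add(String url) { urls.add(url); }

    /** Returns the next url, or null if the queue is empty or the host is still in its delay window. */
    String poll(long nowMs) {
        if (urls.isEmpty() || nowMs < nextFetchTimeMs) return null;
        return urls.poll();
    }

    /**
     * Called when a url is done. A url skipped because of robots.txt or an
     * excessive crawl delay never touched the server, so it starts no delay.
     */
    void finish(long nowMs, boolean requestWasSent) {
        if (requestWasSent) nextFetchTimeMs = nowMs + serverDelayMs;
    }
}
```

With a 5000 ms delay, a url that was skipped at time 0 lets the next url
from the same host go out immediately, while a url that was really fetched
blocks the host until 5000 ms after it finished.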

So, I have put up a patch for this at (*). What do you think? If you
have no objections, I am going to go ahead and open an issue for this.

(*) http://www.ceng.metu.edu.tr/~e1345172/fetcher2_robots.patch




-- 
Doğacan Güney
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general