On 5/18/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Doğacan Güney wrote:
> > On 5/18/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> >> Doğacan Güney wrote:
> >> > Hi everyone,
> >> >
> >> > Has anyone tried Fetcher2 from latest trunk? On our tests, Fetcher2 is
> >> > always slower (by a large margin) than Fetcher.
> >> >
> >> > For a segment with ~30000 urls, we ran Fetcher with 150 threads and
> >> > Fetcher2 with 50 threads. Fetcher finishes around 1 hour, while
> >> > Fetcher2 takes around 4 hours. We ran this test more than once and
> >> > got similar results.
> >> >
> >> > Are we running Fetcher2 with too few/too many threads? I was under the
> >> > impression that Fetcher2 doesn't need as many threads as Fetcher since
> >> > threads do not block.
> >>
> >> Yes, that was the idea. Could you test it with the same number of
> >> threads? Is the configuration identical in all other aspects?
> >
> > Yes, it is identical in other aspects. I am currently testing with
> > same number of threads. Will report if there is a difference.
> >
> >> Are you running the version with the fix from NUTCH-474?
> >>
> >> > Any suggestions?
> >>
> >> If you already have a setup to reproduce this, you could perhaps spend
> >> some time debugging this ... add some timing info, and queue info
> >> logging.
> >
> > What do you think would be a good place (or places) to add debug info?
> > Looking at the code I am not sure where to add them?
>
> FetchItemQueues.getFetchItem() and FetchItemQueue.getFetchItem() would
> be good places to start - the logging here would show how frequently
> they are called, and why fetch items are not picked up (perhaps
> per-queue blocking is buggy?).

I am still not sure about the source of this bug, but I think I found
some unnecessary waits in Fetcher2. Even if a url is blocked by
robots.txt (or has a crawl delay larger than max.crawl.delay), Fetcher2
still waits fetcher.server.delay before fetching another url from the
same host. That wait is not necessary, since Fetcher2 never made a
request to the server in the first place.

So, I have put up a patch for this at (*). What do you think? If you
have no objections, I am going to go ahead and open an issue for this.

(*) http://www.ceng.metu.edu.tr/~e1345172/fetcher2_robots.patch

>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
>

--
Doğacan Güney

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
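
To make the unnecessary wait described above concrete, here is a minimal, self-contained sketch of the idea. It is not the patch itself and it does not use the real Fetcher2 classes: `HostQueueSketch`, `poll`, `finish` and the `requestMade` flag are hypothetical names made up for illustration, and the real fetcher also tracks in-progress items, which this ignores. The only point is that the per-host delay is armed only when a request actually reached the server.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-host queue; NOT the actual Fetcher2 FetchItemQueue. */
public class HostQueueSketch {
    private final Deque<String> urls = new ArrayDeque<>();
    private final long serverDelayMs;   // stands in for fetcher.server.delay
    private long nextFetchTimeMs = 0L;  // earliest time the host may be hit again

    public HostQueueSketch(long serverDelayMs) {
        this.serverDelayMs = serverDelayMs;
    }

    public void add(String url) {
        urls.addLast(url);
    }

    /** Returns the next url, or null if the host is still inside its delay window. */
    public String poll(long nowMs) {
        if (nowMs < nextFetchTimeMs) {
            return null;
        }
        return urls.pollFirst();
    }

    /**
     * Called when a url is done. The delay is only armed when a request was
     * really sent; a url rejected by robots.txt (or by a too-large crawl delay)
     * never touched the server, so the next url can be picked up right away.
     */
    public void finish(long nowMs, boolean requestMade) {
        if (requestMade) {
            nextFetchTimeMs = nowMs + serverDelayMs;
        }
    }

    public static void main(String[] args) {
        HostQueueSketch q = new HostQueueSketch(5000);
        q.add("http://example.com/private/a");
        q.add("http://example.com/b");
        q.add("http://example.com/c");

        System.out.println(q.poll(0));    // first url, denied by robots.txt below
        q.finish(0, false);               // no request made, so no delay
        System.out.println(q.poll(1));    // second url is available immediately
        q.finish(1, true);                // real request, delay starts now
        System.out.println(q.poll(2));    // null: host is inside the delay window
        System.out.println(q.poll(5001)); // third url, once the delay has passed
    }
}
```

With the behaviour described in the mail, the first `finish` call would arm the delay as well, so every url skipped because of robots.txt would cost a full fetcher.server.delay for that host.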
