On 5/31/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Doğacan Güney wrote:
> >
> > I am still not sure about the source of this bug, but I think I found
> > some unnecessary waits in Fetcher2. Even if a url is blocked by
> > robots.txt (or has a crawl delay larger than max.crawl.delay),
> > Fetcher2 still waits fetcher.server.delay before fetching another url
> > from the same host, which is not necessary, considering that Fetcher2
> > didn't make a request to the server anyway.
> >
> > So, I have put up a patch for this at (*). What do you think? If you
> > have no objections, I am going to go ahead and open an issue for this.
> >
> > (*) http://www.ceng.metu.edu.tr/~e1345172/fetcher2_robots.patch
>
> Good catch! The patch looks good, too - please go ahead. One question:
> why did you remove the call to finishFetchItem() around line 505?
Because it seems we already call finishFetchItem() in that code path, just before the switch statement. I have opened NUTCH-495 for this; if I am mistaken, just give me a nudge and I will send an updated patch.

> --
> Best regards,
> Andrzej Bialecki <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

--
Doğacan Güney
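For readers following the thread: the fix described above amounts to applying the per-host politeness delay only when an actual request was made. A minimal sketch of that control flow is below; the names (Outcome, delayAfter, SERVER_DELAY_MS) are illustrative stand-ins, not the actual Fetcher2 API or the patch itself.

```java
// Hedged sketch of the crawl-delay logic discussed above.
// Assumption: a fixed 5s fetcher.server.delay; the real value comes
// from the Nutch configuration.
public class CrawlDelaySketch {

    // Possible outcomes of attempting to fetch a url from a host.
    enum Outcome { FETCHED, BLOCKED_BY_ROBOTS, CRAWL_DELAY_TOO_LARGE }

    static final long SERVER_DELAY_MS = 5000; // stand-in for fetcher.server.delay

    // Delay to apply before the next fetch from the same host.
    static long delayAfter(Outcome outcome) {
        switch (outcome) {
            case FETCHED:
                // A request actually hit the server: be polite and wait.
                return SERVER_DELAY_MS;
            case BLOCKED_BY_ROBOTS:
            case CRAWL_DELAY_TOO_LARGE:
                // No request was made, so waiting gains nothing --
                // this is the unnecessary wait the patch removes.
                return 0;
            default:
                return SERVER_DELAY_MS;
        }
    }

    public static void main(String[] args) {
        System.out.println(delayAfter(Outcome.FETCHED));
        System.out.println(delayAfter(Outcome.BLOCKED_BY_ROBOTS));
        System.out.println(delayAfter(Outcome.CRAWL_DELAY_TOO_LARGE));
    }
}
```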
