> Andy Hedges wrote:
>> Any reason why this patch hasn't been committed?
>
> I have not yet had time to test it. Glancing at
> http://www.hedges.net/nutch/001/patch-02.txt, it does look like you
> fixed the issues I raised in:
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg01381.html
>
> but before I would feel comfortable committing a patch to such a critical
> component I would want to run it a fair amount to make sure that it
> performs well. I probably won't have time for that this week.
OK, fair enough. I'm keen to get an idea of commit dates because I want to be able to reference a CVS tag (or at least a date and time) for a version of Nutch that has certain capabilities. Next week is fine.

> Has anyone else applied this patch and had good luck with it?

I have now successfully created a corpus of over 1M pages with this version of the code. The client produces some interesting (and some not so interesting) log output. For example, many sites send cookies that are invalid according to the cookie RFCs, although this does not stop the code from fetching their pages.

Another thing of interest is that the code pools connections and can reuse the same socket connection to retrieve multiple resources from the same site (using HTTP/1.1's implicit 'keep-alive' mechanism). It might be worth grouping small batches of URLs from the same site to exploit this efficiency (without hammering any site, obviously).

> Doug
>
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
> _______________________________________________
> Nutch-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
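The suggestion about grouping small batches of URLs from the same site, so that one pooled keep-alive connection can serve several requests, could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not code from the patch or from Nutch: `HostBatcher` and `batchByHost` are hypothetical names, the fetch list is assumed to be a plain list of URL strings, and `maxPerBatch` is an assumed politeness cap on how many URLs go to one host in a single batch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Nutch or the patch): splits a fetch list
 * into small single-host batches so a pooled keep-alive connection can be
 * reused within a batch, while capping URLs per host per batch.
 */
public class HostBatcher {

    /** Returns batches of at most maxPerBatch URLs, each batch from one host. */
    public static List<List<String>> batchByHost(List<String> urls, int maxPerBatch) {
        if (maxPerBatch < 1) {
            throw new IllegalArgumentException("maxPerBatch must be >= 1");
        }
        // LinkedHashMap keeps hosts in the order they first appear in the list.
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (String url : urls) {
            String host = URI.create(url).getHost();
            if (host == null) {
                continue; // skip URLs without a host component
            }
            byHost.computeIfAbsent(host.toLowerCase(), h -> new ArrayList<>()).add(url);
        }
        List<List<String>> batches = new ArrayList<>();
        for (List<String> hostUrls : byHost.values()) {
            // Cut each host's URLs into chunks of at most maxPerBatch.
            for (int i = 0; i < hostUrls.size(); i += maxPerBatch) {
                int end = Math.min(i + maxPerBatch, hostUrls.size());
                batches.add(new ArrayList<>(hostUrls.subList(i, end)));
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "http://a.example/1", "http://b.example/1",
            "http://a.example/2", "http://a.example/3");
        for (List<String> batch : batchByHost(urls, 2)) {
            System.out.println(batch);
        }
    }
}
```

With a cap of 2, the three `a.example` URLs become two batches and the single `b.example` URL becomes a third. The sketch only groups URLs; a real fetcher would still need to space out successive batches to the same host (and honour robots.txt) to avoid hammering sites.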
