> Andy Hedges wrote:
>> Any reason why this patch hasn't been committed?
>
> I have not yet had time to test it.  Glancing at
> http://www.hedges.net/nutch/001/patch-02.txt, it does look like you
> fixed the issues I raised in:
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg01381.html
>
> but before I would feel comfortable committing a patch to such a critical
> component I would want to run it a fair amount to make sure that it
> performs well.  I probably won't have time for that this week.

OK, fair enough. I'm keen to get an idea of commit dates because I want to
be able to reference a CVS tag (or at least a datetime) for a version of
Nutch that has certain capabilities. Next week is fine.

>
> Has anyone else applied this patch and had good luck with it?

I have now successfully created a corpus of 1M+ pages with this version of
the code. There is some interesting (and some not so interesting) log
output from the client. For example, many sites send cookies that do not
conform to the cookie RFC (not that this stops the code from fetching from
those sites). Another thing of interest is that the code pools connections
and can reuse the same socket connection to retrieve multiple resources
from the same site (using HTTP/1.1's implicit 'keep-alive' mechanism). It
might be worth looking at grouping small chunks of URLs from the same site
to exploit this efficiency (obviously without hammering sites).
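
To make the grouping idea concrete, here is a rough sketch of what I have
in mind. It is not part of the patch and does not use any Nutch classes;
the HostBatcher class, the batchByHost method and the maxPerBatch
parameter are all hypothetical names. It splits a URL list into small
single-host batches and emits them round-robin across hosts, so that one
batch can ride on one kept-alive connection without the same site being
hit twice in a row.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Nutch or the patch.
final class HostBatcher {

    /**
     * Splits urls into batches of at most maxPerBatch entries, where every
     * batch contains URLs from a single host. Batches are emitted round-robin
     * across hosts, in order of first appearance of each host.
     * URLs without a host component are skipped; syntactically invalid URLs
     * make URI.create throw IllegalArgumentException.
     */
    static List<List<String>> batchByHost(List<String> urls, int maxPerBatch) {
        if (maxPerBatch < 1) {
            throw new IllegalArgumentException("maxPerBatch must be >= 1");
        }

        // Bucket URLs by lower-cased host, preserving first-seen host order.
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (String url : urls) {
            String host = URI.create(url).getHost();
            if (host == null) {
                continue; // no host to group on
            }
            String key = host.toLowerCase(Locale.ROOT);
            List<String> bucket = byHost.get(key);
            if (bucket == null) {
                bucket = new ArrayList<>();
                byHost.put(key, bucket);
            }
            bucket.add(url);
        }

        // Take one chunk per host per pass until every bucket is drained.
        List<List<String>> batches = new ArrayList<>();
        boolean more = true;
        for (int offset = 0; more; offset += maxPerBatch) {
            more = false;
            for (List<String> bucket : byHost.values()) {
                if (offset < bucket.size()) {
                    int end = Math.min(offset + maxPerBatch, bucket.size());
                    batches.add(new ArrayList<>(bucket.subList(offset, end)));
                    more = true;
                }
            }
        }
        return batches;
    }
}
```

A fetcher thread would then take one batch at a time and request its URLs
back to back on the same connection. Keeping maxPerBatch small, and
probably adding a delay between batches for the same host, is what stops
this from turning into hammering.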

>
> Doug
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
> _______________________________________________
> Nutch-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
>




