Ken Krugler wrote:

1. We needed to modify the commons-httpclient code to fix one hang that sometimes occurs in

[...]

So the question here is what to do with these changes. I will try to get them integrated into the commons-httpclient code, but that might take a while before they circle back into Nutch. Suggestions for what to do in the short term?


Please submit them to the commons-httpclient people - I found them very responsive to my bug reports. Even before they accept the patches, we could use a "fixed" version of the library - see e.g. parse-rss, where a similar situation occurred.

2. Our other changes are a mixture of dealing more effectively with bad hosts so fetcher threads don't get hung up, and changes to do a better job of crawling a limited domain space (vertical crawl).

The first set of changes seems like something that could get merged in (if deemed useful) without too much effort. The second set is more architectural in nature - and I'm a bit worried about what happens when we try to integrate these into 0.8. Plus we're still in the middle of getting the wrinkles ironed out, so it would be premature to submit any patches.


The Fetcher in 0.8 (or rather in the mapred branch) is somewhat different from the one in 0.7.

But are we going to be running into trouble by waiting? Would it make sense to send out patches of what we've done to date, even if the code isn't ready for prime time?


IMHO you should definitely submit the patches to commons-httpclient. Regarding our code - please create a bug issue and attach the patches. This gives others a chance to work on them.

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

