Ken Krugler wrote:
1. We needed to modify the commons-httpclient code to fix one hang
that sometimes occurs in
[...]
So the question here is what to do with these changes. I will try to
get them integrated into the commons-httpclient code, but that might
take a while before they circle back into Nutch. Suggestions for what
to do in the short term?
Please submit them to the commons-httpclient people - I found them very
responsive to my bug reports. Even before they accept the patches we
could use a "fixed" version of the library - see e.g. parse-rss where a
similar situation occurred.
2. Our other changes are a mixture of dealing more effectively with
bad hosts so fetcher threads don't get hung up, and changes to do a
better job of crawling a limited domain space (vertical crawl).
The first set of changes seems like something that could get merged in
(if deemed useful) without too much effort. The second set is more
architectural in nature - and I'm a bit worried about what happens
when we try to integrate these into 0.8. Plus we're still in the
middle of getting the wrinkles ironed out, so it would be premature to
submit any patches.
The Fetcher in 0.8 (or rather in the mapred branch) is somewhat different
from the one in 0.7.
But are we going to be running into trouble by waiting? Would it make
sense to send out patches of what we've done to date, even if the code
isn't ready for prime time?
IMHO you should definitely submit the patches to commons-httpclient.
Regarding our code - please create a bug issue and attach the patches.
This gives a chance for others to work on them.
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com