+1
I've been planning to switch my crawler over to use protocol-httpclient,
but haven't got there yet. Interesting that there seems
to be a performance impact with the new plugin.
(In my crawl setup, I override the default HTTP plugin so I can
modify HTML content before it is written to a segment. I'd prefer if
there was a hook for rewriting content regardless of protocol, but
this works for now.)
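A protocol-independent rewrite hook of the kind wished for above might look roughly like the following. This is a minimal sketch, not Nutch API. `ContentRewriter`, `ContentRewriteHook` and `StripHtmlComments` are hypothetical names. The example also assumes fetched bytes are UTF-8, where a real crawler would detect the charset. The fetcher would call `apply()` on every page after whichever protocol plugin produced it, so no HTTP plugin needs to be overridden.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a rewrite hook that runs after ANY protocol plugin,
// before content is written to a segment. Not part of the Nutch API.
public class ContentRewriteHook {

    /** Hypothetical extension point: sees fetched bytes before they are stored. */
    public interface ContentRewriter {
        byte[] rewrite(String url, String contentType, byte[] content);
    }

    /** Example rewriter: strips HTML comments from text/html content only. */
    public static class StripHtmlComments implements ContentRewriter {
        public byte[] rewrite(String url, String contentType, byte[] content) {
            if (contentType == null || !contentType.startsWith("text/html")) {
                return content; // leave non-HTML content untouched
            }
            // Assumption: content is UTF-8; a real crawler would detect the charset.
            String html = new String(content, StandardCharsets.UTF_8);
            // (?s) lets '.' match newlines; reluctant '.*?' stops at the first "-->".
            return html.replaceAll("(?s)<!--.*?-->", "").getBytes(StandardCharsets.UTF_8);
        }
    }

    private final List<ContentRewriter> rewriters = new ArrayList<>();

    public void register(ContentRewriter r) {
        rewriters.add(r);
    }

    /** Called once per fetched page, whichever protocol produced the bytes. */
    public byte[] apply(String url, String contentType, byte[] content) {
        byte[] current = content;
        for (ContentRewriter r : rewriters) {
            current = r.rewrite(url, contentType, current);
        }
        return current;
    }

    public static void main(String[] args) {
        ContentRewriteHook hook = new ContentRewriteHook();
        hook.register(new StripHtmlComments());
        byte[] in = "<p>a<!-- b -->c</p>".getBytes(StandardCharsets.UTF_8);
        byte[] out = hook.apply("http://example.com/", "text/html", in);
        System.out.println(new String(out, StandardCharsets.UTF_8)); // prints "<p>ac</p>"
    }
}
```

Chaining rewriters in registration order keeps each one small and lets them be enabled independently, the same way other plugin types are stacked.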
--Matt
On Nov 9, 2005, at 1:19 PM, Doug Cutting wrote:
I was recently benchmarking fetching at a site with lots of
bandwidth, and it seemed to me that protocol-http is capable of
faster crawling than protocol-httpclient. So I don't think we
should discard protocol-http just yet. But there's a lot of
duplicate code between these, which is difficult to maintain.
I think we should thus merge these, with a configuration parameter
determining which http backend is used, much like parse-html, which
can switch between neko and tagsoup.
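One way to picture the proposed merge, as a sketch only: the code duplicated between the two plugins moves into a shared base class, each backend supplies just its transport, and a configuration property picks the backend. The property name `http.backend.impl`, the class names, and the retry loop standing in for "shared code" are all hypothetical. The backends are stubs that do no real network I/O.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch of one merged HTTP protocol plugin with a switchable backend.
public class HttpBackendSelection {

    /** Shared code lives once here; backends differ only in rawFetch(). */
    public abstract static class HttpBackend {
        public abstract String name();

        /** Backend-specific transport: the only part each implementation supplies. */
        protected abstract byte[] rawFetch(String url) throws IOException;

        /** Shared logic (a simple retry loop) written once for both backends. */
        public byte[] fetch(String url, int maxRetries) throws IOException {
            if (maxRetries < 0) throw new IllegalArgumentException("maxRetries < 0");
            IOException last = null;
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                try {
                    return rawFetch(url);
                } catch (IOException e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        }
    }

    /** Stub standing in for the lightweight protocol-http code path. */
    public static class SimpleBackend extends HttpBackend {
        public String name() { return "http"; }
        protected byte[] rawFetch(String url) {
            // A real implementation would open a socket and speak HTTP here.
            return ("fetched " + url + " via " + name()).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Stub standing in for the Commons HttpClient-based code path. */
    public static class HttpClientBackend extends HttpBackend {
        public String name() { return "httpclient"; }
        protected byte[] rawFetch(String url) {
            // A real implementation would delegate to HttpClient here.
            return ("fetched " + url + " via " + name()).getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Picks a backend from a (hypothetical) property; defaults to "http" in this sketch. */
    public static HttpBackend select(Properties conf) {
        String impl = conf.getProperty("http.backend.impl", "http");
        if ("http".equals(impl)) return new SimpleBackend();
        if ("httpclient".equals(impl)) return new HttpClientBackend();
        throw new IllegalArgumentException("unknown http.backend.impl: " + impl);
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.setProperty("http.backend.impl", "httpclient");
        HttpBackend backend = select(conf);
        System.out.println(new String(backend.fetch("http://example.com/", 2), StandardCharsets.UTF_8));
        // prints "fetched http://example.com/ via httpclient"
    }
}
```

With this layout, a fix to the shared logic (robots handling, politeness, retries) is made once, while benchmarking the two transports is a one-line configuration change.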
What do others think?
Doug
--
Matt Kangas / [EMAIL PROTECTED]
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers