+1

I've been planning to switch my crawler over to use protocol-httpclient, but haven't got there yet. Interesting that there seems to be a performance impact with the new plugin.

(In my crawl setup, I override the default HTTP plugin so I can modify HTML content before it is written to a segment. I'd prefer it if there were a hook for rewriting content regardless of protocol, but this works for now.)

--Matt

On Nov 9, 2005, at 1:19 PM, Doug Cutting wrote:

I was recently benchmarking fetching at a site with lots of bandwidth, and it seemed to me that protocol-http is capable of faster crawling than protocol-httpclient. So I don't think we should discard protocol-http just yet. But there's a lot of duplicate code between these, which is difficult to maintain.

I think we should thus merge these, with a configuration parameter determining which http backend is used, much like parse-html, which can switch between neko and tagsoup.
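
As a rough illustration of the idea above, here is a minimal sketch of choosing an HTTP backend from a configuration value. Everything named here is an assumption made for the example: the `http.backend` property, the `HttpBackend` interface, and both implementation classes are hypothetical and are not taken from the Nutch code base. Plain `java.util.Properties` stands in for whatever configuration object the merged plugin would actually use.

```java
import java.util.Properties;

public class HttpBackendSelector {

    /** Hypothetical common interface that both backends would implement. */
    interface HttpBackend {
        String describe();
    }

    /** Stand-in for the logic currently in protocol-http. */
    static class SimpleHttpBackend implements HttpBackend {
        public String describe() { return "protocol-http"; }
    }

    /** Stand-in for the logic currently in protocol-httpclient. */
    static class HttpClientBackend implements HttpBackend {
        public String describe() { return "protocol-httpclient"; }
    }

    // Hypothetical property name; not an actual Nutch setting.
    static final String BACKEND_KEY = "http.backend";

    /** Picks a backend from the configuration, defaulting to the simple one. */
    static HttpBackend select(Properties conf) {
        String choice = conf.getProperty(BACKEND_KEY, "http");
        switch (choice) {
            case "http":
                return new SimpleHttpBackend();
            case "httpclient":
                return new HttpClientBackend();
            default:
                // Fail loudly on a typo rather than silently falling back.
                throw new IllegalArgumentException(
                    "Unknown value for " + BACKEND_KEY + ": " + choice);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(BACKEND_KEY, "httpclient");
        System.out.println(select(conf).describe());
    }
}
```

With a layout like this, the shared code (request setup, robots handling, content limits) would live in one place, and only the two backend classes would differ.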

What do others think?

Doug

--
Matt Kangas / [EMAIL PROTECTED]




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
