Andrzej Bialecki wrote:
> Hmm... I'm not saying it's flawless; there were surely some mysterious things going on with it. That large crawl you mention, was it with release 3.0 (recently updated in Nutch)? What were the issues?

No, it was in early December, with the previous version. I don't recall the details, but it seemed slower, had a higher error rate, and seemed to result in more hung thread incidents.

> The main advantage of protocol-http is that it's so simple that few things can go wrong, but this also means it's relatively unsophisticated, and adding more advanced features, namely support for https, cookies and authentication, could mean a lot of work.

These are all good reasons to use protocol-httpclient. But if you don't need any of those features, protocol-http presently seems to work better.

Perhaps we should get more feedback on the 3.0 version before we make a decision?

Doug


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
