Andrzej Bialecki wrote:
> Hmm... I'm not saying it's flawless; there were surely some mysterious things going on with it. That large crawl you mention: was it with release 3.0 (recently updated in Nutch)? What were the issues?
No, it was in early December, with the previous version. I don't recall the details, but it seemed slower, had a higher error rate, and seemed to result in more hung thread incidents.
> The main advantage of protocol-http is that it's so simple that few things can go wrong. But this also means it's relatively unsophisticated, and adding more advanced features, namely support for HTTPS, cookies and authentication, could mean a lot of work.
These are all good reasons to use protocol-httpclient. But if you don't need any of those features, protocol-http currently seems to work better.
Perhaps we should get more feedback on the 3.0 version before we make a decision?
Doug

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
