The issue was that I had not changed the plugin list (the 'plugin.includes' property in 'conf/nutch-site.xml'). The documentation did not make it clear that this was a prerequisite. Once I figured out exactly what was wrong, I edited the HTTP authentication page on the wiki to state that it is required.
rjsjr

-----Original Message-----
From: Susam Pal [mailto:[email protected]]
Sent: Wednesday, June 17, 2009 11:59 AM
To: [email protected]
Subject: Re: NTLM Authentication Not Occuring...

On Wed, Jun 17, 2009 at 8:37 PM, Robert Sanford <[email protected]> wrote:
> I installed "Fiddler" as a proxy on the server and compared the sessions from
> IE and Nutch. When IE receives the 401, it creates a new request with
> the NTLM authentication tokens, for which it receives a 200. When Nutch
> receives the 401, it does not make another request.
>
> This implies to me that the credentials that I've added to httpclient-auth.xml
> are being ignored.
>
> Is there something that I need to set in nutch-site.xml to enable
> authentication? Is there another configuration option that I've missed
> somewhere?
>
> Many thanks!
>
> rjsjr

Hi Robert,

Please provide the following information:

1. How are you running the Nutch crawler on Windows 2003 Server? Please mention the tools used and the commands invoked, e.g. Cygwin, any java commands, etc.

2. Have you modified 'conf/nutch-site.xml' to include 'protocol-httpclient' in the 'plugin.includes' property?

3. There should be more entries in the log file pertaining to HTTP authentication, e.g. log messages containing the word "Credentials", "auth.AuthChallengeProcessor", etc. Please send these log messages as well. If they are not present, you have probably not included 'protocol-httpclient'.

I would suggest that you go through the "Prerequisites" section of this article:

http://wiki.apache.org/nutch/HttpAuthenticationSchemes

to make sure that you have configured 'conf/nutch-site.xml' properly. You need to ensure that you have replaced 'protocol-http' with 'protocol-httpclient' in the 'plugin.includes' property of 'conf/nutch-site.xml'.

Next, please go through the "Need Help?" section of the same article and see if it helps you troubleshoot your issue. If it does not, please write again with the information I have requested above.

Regards,
Susam Pal
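In this thread, the fix came down to the 'plugin.includes' property. One quick way to catch the same mistake is to parse 'conf/nutch-site.xml' and test the property's value against the two protocol plugin IDs. The sketch below is a minimal helper and is not part of Nutch. The function names `plugin_includes` and `check_protocol_plugins` are hypothetical.

It assumes that 'plugin.includes' is a Java regular expression that must match a whole plugin ID, which is how Nutch's plugin repository filters plugins. Python's `re.fullmatch` is used as an approximation of Java's `Matcher.matches()`. That approximation is adequate for simple alternations such as `protocol-httpclient|urlfilter-regex|...`, but it is not guaranteed for every Java regex construct.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional


def plugin_includes(config_xml: str) -> Optional[str]:
    """Return the value of the plugin.includes property, or None if it is absent."""
    root = ET.fromstring(config_xml)
    # Hadoop-style config: <configuration><property><name/><value/></property>...
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "plugin.includes":
            return (prop.findtext("value") or "").strip()
    return None


def check_protocol_plugins(config_xml: str) -> List[str]:
    """List problems with the protocol plugin selection (empty list if none found)."""
    value = plugin_includes(config_xml)
    if value is None:
        # Not overridden in nutch-site.xml, so the default from nutch-default.xml applies.
        return ["plugin.includes is not set; the default from nutch-default.xml applies"]
    pattern = re.compile(value)
    problems = []
    # Assumption: the whole plugin ID must match, approximating Java's Matcher.matches().
    if not pattern.fullmatch("protocol-httpclient"):
        problems.append("protocol-httpclient is not included")
    if pattern.fullmatch("protocol-http"):
        problems.append("protocol-http is still included; replace it with protocol-httpclient")
    return problems


# Hypothetical example value for nutch-site.xml (not Nutch's shipped default).
SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html)|index-basic</value>
  </property>
</configuration>"""

for problem in check_protocol_plugins(SITE):
    print(problem)
```

With the example value above, nothing is printed. If the value still begins with `protocol-http|...`, the helper reports both problems. To check a real installation, pass it the contents of the file, e.g. `pathlib.Path("conf/nutch-site.xml").read_text()`.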
