The issue was that I had not modified the plugins. The documentation was less
than clear that this was a prerequisite, so once I had figured out exactly what
was wrong, I edited the HTTPAuthentication page in the wiki to clarify that it
is required.

rjsjr

-----Original Message-----
From: Susam Pal [mailto:[email protected]] 
Sent: Wednesday, June 17, 2009 11:59 AM
To: [email protected]
Subject: Re: NTLM Authentication Not Occurring...

On Wed, Jun 17, 2009 at 8:37 PM, Robert Sanford<[email protected]> wrote:
> I installed "Fiddler" as a proxy on the server and compared the sessions from 
> IE and Nutch. When IE receives the 401 it will then create a new request with 
> the NTLM authentication tokens for which it receives a 200. When Nutch 
> receives the 401 it does not make another request.
>
> This implies to me that the credentials that I've added to httpclient-auth.xml
> are being ignored.
>
> Is there something that I need to set in nutch-site.xml to enable 
> authentication? Is there another configuration option that I've missed 
> somewhere?
>
> Many thanks!
>
> rjsjr

Hi Robert,

Please provide the following information:

1. How are you running the Nutch crawler on Windows 2003 Server?
Please mention the tools used and the commands invoked. e.g. Cygwin,
java commands if any, etc.

2. Have you modified 'conf/nutch-site.xml' to include
'protocol-httpclient' in the 'plugin.includes' property?

3. There must be more logs in the log file pertaining to HTTP
authentication. e.g. Log messages containing the word "Credentials",
"auth.AuthChallengeProcessor", etc. Please send these log messages as
well. If they are not present, you have probably not included
'protocol-httpclient'.

I would suggest that you go through the "Prerequisites" section of
this article: http://wiki.apache.org/nutch/HttpAuthenticationSchemes
to make sure that you have configured 'conf/nutch-site.xml' properly.
You need to ensure that you have replaced 'protocol-http' with
'protocol-httpclient' in the 'plugin.includes' property of
'conf/nutch-site.xml'.
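
For reference, the override described above might look roughly like the
sketch below. Only the 'protocol-httpclient' entry comes from the advice
in this mail; the other plugin names in the value are assumed for
illustration and should be whatever your installation already lists.

```xml
<!-- conf/nutch-site.xml: sketch only; keep your existing plugin list -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- 'protocol-httpclient' replaces 'protocol-http'; the remaining
         entries are illustrative and may differ in your installation -->
    <value>protocol-httpclient|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
  </property>
</configuration>
```

If 'protocol-http' is still present in the value, the credentials in
httpclient-auth.xml are unlikely to be used, which would match the
behaviour described in the original report.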

Next, please go through the "Need Help?" section of the same article
and see if it helps you to troubleshoot your issue. If not, please
mail again with the information I have requested above.

Regards,
Susam Pal
