On Fri, May 15, 2009 at 6:57 PM, Rochelle D'souza
<[email protected]> wrote:
> hi Susam,
> very sorry for the mistake in the 1st code. I had put <default/> but omitted
> that line when i sent it across to u :(.
>
> for our intranet sites we do not require a proxy. hence i have now removed
> the proxy and ensured its default auth and did a crawl. have attached the
> log, still getting the same 401 :(

I have run out of ideas on what might be causing the problem.

2009-05-15 18:44:58,326 DEBUG httpclient.Http - Credentials -
username: devadmin; set as default for realm: ; scheme:

2009-05-15 18:44:58,326 DEBUG httpclient.Http - Pre-configured
credentials with scope -  host: googly; port: 80; not found for url:
http://googly/robots.txt

2009-05-15 18:44:58,842 INFO  auth.AuthChallengeProcessor - ntlm
authentication scheme selected

2009-05-15 18:44:59,888 INFO  fetcher.Fetcher - -activeThreads=1,
spinWaiting=0, fetchQueues.totalSize=0

2009-05-15 18:45:00,810 INFO  httpclient.HttpMethodDirector - Failure
authenticating with NTLM <any realm>@googly:80

2009-05-15 18:45:00,841 DEBUG httpclient.Http - url:
http://googly/robots.txt; status code: 401; bytes received: 1539;
Content-Length: 1539

2009-05-15 18:45:00,856 DEBUG httpclient.Http - Pre-configured
credentials with scope - host: googly; port: 80; found for url:
http://googly/

2009-05-15 18:45:00,856 INFO  auth.AuthChallengeProcessor - ntlm
authentication scheme selected

2009-05-15 18:45:00,888 INFO  httpclient.HttpMethodDirector - Failure
authenticating with NTLM <any realm>@googly:80


This part of the log shows that the 'devadmin' credentials were picked
up for the authentication, but the server refused access and returned
an HTTP 401 response. There is not much I can do to help here, since
everything looks fine on the Nutch side except that the server keeps
returning HTTP 401.

Here are a few other things you could check, though I do not think any
of them should cause a problem.

1. Does the password have any special characters? If yes, could you
try again with a simpler alphanumeric password?

2. Is http.agent.host set properly? This should be the host name or
the IP address of the machine on which your crawler is running.

3. Does this configuration help?

 <credentials username="devadmin" password="password">
  <authscope host="googly" port="80"/>
 </credentials>

4. This one?

 <credentials username="devadmin" password="password">
  <authscope host="googly" port="80" scheme="NTLM"/>
 </credentials>
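
For item 2, this is roughly how the property would look if you
override it in conf/nutch-site.xml. Treat it as a sketch: the value
"crawler-host" is only a placeholder I made up, so replace it with the
actual host name or IP address of the machine running the crawl.

```xml
<!-- conf/nutch-site.xml (fragment; goes inside <configuration>) -->
<property>
  <name>http.agent.host</name>
  <!-- placeholder: use the crawler machine's real host name or IP -->
  <value>crawler-host</value>
</property>
```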

If nothing helps, maybe it is time to use a network sniffer such as
Wireshark and analyze the HTTP packets to see whether the server or
the client is making a mistake here. (There could be a human error
too, so don't rule out that option.) It would be worthwhile to compare
the traffic between the browser and the server with the traffic
between Nutch and the server.

Regards,
Susam Pal
