Hi Susam,

Very sorry for the mistake in the first configuration. I had put <default/> in the file but omitted that line when I sent it across to you :(
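For reference, this is roughly what the httpclient-auth.xml looked like with the missing line included. It is only a sketch, assuming the sole difference from the file quoted below is the added <default/> element; the username and password are the same placeholders as before.

```xml
<?xml version="1.0"?>
<auth-configuration>
  <!-- Placeholder credentials, same values as in the quoted file -->
  <credentials username="devadmin" password="password">
    <!-- Marks these credentials as the default for any host, port and realm -->
    <default/>
  </credentials>
</auth-configuration>
```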
For our intranet sites we do not require a proxy, so I have now removed the proxy settings, made sure the configuration uses default authentication, and run the crawl again. I have attached the log; I am still getting the same 401 :(

Would you like me to send the nutch-site.xml and nutch-default.xml files?

Thanks,
Rochelle

On Fri, May 15, 2009 at 4:54 PM, Susam Pal <[email protected]> wrote:
> On Fri, May 15, 2009 at 2:43 PM, Rochelle D'souza
> <[email protected]> wrote:
> > hi Susam,
> >
> > Many thanks for your reply.
> >
> > As requested I have given only default authentication. Below is the
> > httpclient-auth.xml
> >
> > <?xml version="1.0"?>
> > <auth-configuration>
> > <credentials username="devadmin" password="password">
> > </credentials>
> > </auth-configuration>
>
> This is not the correct way to configure default credentials. You need
> to put a <default/> tag within the <credentials> tag.
>
> Please read the section, 'Crawling an Intranet with Default
> Authentication Scope' in
> http://wiki.apache.org/nutch/HttpAuthenticationSchemes to see an
> example.
>
> > The logs for the same are
> >
> > I have only masked the agent and proxy host name since I am sharing the log
> > file.
> >
> > Then I changed the httpclient-auth.xml to the below code
> >
> > <?xml version="1.0"?>
> > <auth-configuration>
> > <credentials username="devadmin" password="password">
> > <authscope host="googly" port="80" realm="xyz"/>
> > </credentials>
> > </auth-configuration>
>
> From the logs obtained with this configuration, I see:
>
> 2009-05-15 14:32:40,971 INFO auth.AuthChallengeProcessor - ntlm authentication scheme selected
> 2009-05-15 14:32:41,002 INFO httpclient.HttpMethodDirector - Failure authenticating with NTLM <any realm>@googly:80
> 2009-05-15 14:32:41,002 DEBUG httpclient.Http - url: http://googly/; status code: 401; bytes received: 1539; Content-Length: 1539
> 2009-05-15 14:32:41,205 DEBUG httpclient.Http - 401 Authentication Required
>
> These lines tell that authentication was tried but the authentication
> failed.
>
> In the logs I also see these lines:
>
> 2009-05-15 14:32:35,661 INFO httpclient.Http - http.proxy.host = proxy.companyname.com
> 2009-05-15 14:32:35,739 INFO httpclient.Http - http.proxy.port = 6050
>
> So, it seems you have configured a proxy server. Have you configured
> http.proxy.username and http.proxy.password too? If yes, this may be
> the cause of the problem. Authentication for proxy server as well as
> web server is not supported at the moment. For more on this please go
> through the 'NTLM' section of this article:
> http://hc.apache.org/httpclient-3.x/authentication.html
>
> Regards,
> Susam Pal
