Hi all,

I am trying to crawl password-protected web pages on our intranet, but the fetch fails with a "401 Authentication Required" error and I cannot work out why. I have gone through the previous mails sent by others on this problem, but they did not resolve it.
Below are the configuration files I have modified, following http://wiki.apache.org/nutch/HttpAuthenticationSchemes

My URL file contains a single URL, "http://10.2.44.34:8088/xwiki/" (this URL is actually redirected to "http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=CDsTIqqN").

"httpclient-auth.xml":

<credentials username="xyz" password="xyz">
  <default/>
  <authscope host="10.2.44.34" port="8088"/>
</credentials>

"nutch-default.xml":

<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|zip)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>

Output printed to terminal:

Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: crawl/segments/20090909151219
Fetcher: threads: 10
QueueFeeder finished: total 1 records.
fetching http://10.2.44.34:8088/xwiki/
http.proxy.host = null
http.proxy.port = 8080
http.timeout = 10000
http.content.limit = -1
http.agent = iiith/Nutch-1.0 ([email protected])
protocol.plugin.check.blocking = false
protocol.plugin.check.robots = false
Credentials - username: superadmin; set as default for realm: ; scheme:
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
Credentials - username: superadmin; set for AuthScope - host: 10.2.44.34; port: 8088; realm: ; scheme:
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/robots.txt
url: http://10.2.44.34:8088/robots.txt; status code: 401; bytes received: 6739; Content-Length: 6739
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/
url: http://10.2.44.34:8088/xwiki/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/view/Main/
-activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
* queue: http://10.2.44.34
maxThreads = 1
inProgress = 0
crawlDelay = 1000
minCrawlDelay = 0
nextFetchTime = 1252489344874
now = 1252489344577
0. http://10.2.44.34:8088/xwiki/bin/view/Main/
fetching http://10.2.44.34:8088/xwiki/bin/view/Main/
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/view/Main/
url: http://10.2.44.34:8088/xwiki/bin/view/Main/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
-activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
* queue: http://10.2.44.34
maxThreads = 1
inProgress = 0
crawlDelay = 1000
minCrawlDelay = 0
nextFetchTime = 1252489345884
now = 1252489345578
0. http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
fetching http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX
url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=yjACAWWX; status code: 401; bytes received: 6739; Content-Length: 6739
401 Authentication Required
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: done

Log file:

2009-09-09 15:46:55,602 INFO fetcher.Fetcher - fetching http://10.2.44.34:8088/xwiki/
2009-09-09 15:46:55,657 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2009-09-09 15:46:55,657 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2009-09-09 15:46:55,691 INFO httpclient.Http - http.proxy.host = null
2009-09-09 15:46:55,691 INFO httpclient.Http - http.proxy.port = 8080
2009-09-09 15:46:55,691 INFO httpclient.Http - http.timeout = 10000
2009-09-09 15:46:55,691 INFO httpclient.Http - http.content.limit = -1
2009-09-09 15:46:55,691 INFO httpclient.Http - http.agent = iiith/Nutch-1.0 ([email protected])
2009-09-09 15:46:55,691 INFO httpclient.Http - protocol.plugin.check.blocking = false
2009-09-09 15:46:55,691 INFO httpclient.Http - protocol.plugin.check.robots = false
2009-09-09 15:46:55,695 DEBUG httpclient.Http - Credentials - username: superadmin; set as default for realm: ; scheme:
2009-09-09 15:46:55,697 DEBUG httpclient.Http - Credentials - username: superadmin; set for AuthScope - host: 10.2.44.34; port: 8088; realm: ; scheme:
2009-09-09 15:46:55,697 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/robots.txt
2009-09-09 15:46:55,942 DEBUG httpclient.Http - url: http://10.2.44.34:8088/robots.txt; status code: 401; bytes received: 6739; Content-Length: 6739
2009-09-09 15:46:55,943 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/
2009-09-09 15:46:55,946 INFO httpclient.HttpMethodDirector - Redirect requested but followRedirects is disabled
2009-09-09 15:46:55,946 DEBUG httpclient.Http - url: http://10.2.44.34:8088/xwiki/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:56,657 INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - * queue: http://10.2.44.34
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - maxThreads = 1
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - inProgress = 0
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - crawlDelay = 1000
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - minCrawlDelay = 0
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - nextFetchTime = 1252491417050
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - now = 1252491416658
2009-09-09 15:46:56,658 INFO fetcher.Fetcher - 0. http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:57,051 INFO fetcher.Fetcher - fetching http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:57,051 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/view/Main/
2009-09-09 15:46:57,056 INFO httpclient.HttpMethodDirector - Redirect requested but followRedirects is disabled
2009-09-09 15:46:57,057 DEBUG httpclient.Http - url: http://10.2.44.34:8088/xwiki/bin/view/Main/; status code: 302; bytes received: 0; Content-Length: 0; Location: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:57,658 INFO fetcher.Fetcher - -activeThreads=1, spinWaiting=1, fetchQueues.totalSize=1
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - * queue: http://10.2.44.34
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - maxThreads = 1
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - inProgress = 0
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - crawlDelay = 1000
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - minCrawlDelay = 0
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - nextFetchTime = 1252491418057
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - now = 1252491417659
2009-09-09 15:46:57,659 INFO fetcher.Fetcher - 0. http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:58,058 INFO fetcher.Fetcher - fetching http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:58,058 DEBUG httpclient.Http - Pre-configured credentials with scope - host: 10.2.44.34; port: 8088; found for url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1
2009-09-09 15:46:58,170 DEBUG httpclient.Http - url: http://10.2.44.34:8088/xwiki/bin/login/XWiki/XWikiLogin?srid=2h453tM1; status code: 401; bytes received: 6739; Content-Length: 6739
2009-09-09 15:46:58,180 DEBUG httpclient.Http - 401 Authentication Required
2009-09-09 15:46:58,180 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2009-09-09 15:46:58,659 INFO fetcher.Fetcher - -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
2009-09-09 15:46:58,659 INFO fetcher.Fetcher - -activeThreads=0

Thank you in advance,
Kranthi Reddy. B
