Hi, I am crawling a site and getting the following error for some documents. When I open one of these documents myself, I am able to access its contents.
I am getting the following in the log file:

2008-08-19 09:37:20,609 INFO fetcher.Fetcher - fetching http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc
2008-08-19 09:37:21,593 DEBUG httpclient.Http - Pre-configured credentials with scope - host: finacleportal; port: 80; found for url: http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc
2008-08-19 09:37:21,593 ERROR httpclient.Http - java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc': escaped absolute path not valid
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:219)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:88)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:80)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:145)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:219)
2008-08-19 09:37:21,593 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
2008-08-19 09:37:21,593 INFO fetcher.Fetcher - fetch of http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc failed with: java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc': escaped absolute path not valid
------------------------------------------------------------
2008-08-19 09:53:08,609 INFO fetcher.Fetcher - fetching http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm
2008-08-19 09:53:09,609 DEBUG httpclient.Http - Pre-configured credentials with scope - host: finacleportal; port: 80; found for url: http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm
2008-08-19 09:53:09,609 ERROR httpclient.Http - java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm': escaped absolute path not valid
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.commons.httpclient.HttpMethodBase.<init>(HttpMethodBase.java:219)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.commons.httpclient.methods.GetMethod.<init>(GetMethod.java:88)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:80)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:145)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:219)
2008-08-19 09:53:09,609 ERROR httpclient.Http - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
2008-08-19 09:53:09,609 INFO fetcher.Fetcher - fetch of http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm failed with: java.lang.IllegalArgumentException: Invalid uri 'http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm': escaped absolute path not valid

The same error occurs with many more files. Please guide me.

Regards
Nisha Aggarwal
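
P.S. Both failing URLs contain characters that are not legal in a raw URI path (square brackets in the first, a space in the second), so the problem may be related to escaping. Below is a minimal sketch, in plain Java using only java.net.URI, of how such a URL could be percent-escaped before it is fetched. The class name UrlEscapeSketch and the method escape are hypothetical names chosen for illustration; this is not Nutch or HttpClient code, and I have not confirmed where (or whether) such a step belongs in the Nutch fetch path. The sketch assumes the URL has no port, no query string, and no existing percent-escapes (an existing '%' would be escaped a second time).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, for illustration only: percent-escapes the path of a
// simple "scheme://host/path" URL. Assumes no port, no query string, and no
// percent-escapes already present in the input.
public class UrlEscapeSketch {

    public static String escape(String rawUrl) throws URISyntaxException {
        int schemeEnd = rawUrl.indexOf("://");
        if (schemeEnd < 0) {
            throw new URISyntaxException(rawUrl, "missing scheme separator");
        }
        String scheme = rawUrl.substring(0, schemeEnd);

        // The path starts at the first '/' after the host.
        int pathStart = rawUrl.indexOf('/', schemeEnd + 3);
        String host = pathStart < 0
                ? rawUrl.substring(schemeEnd + 3)
                : rawUrl.substring(schemeEnd + 3, pathStart);
        String path = pathStart < 0 ? "" : rawUrl.substring(pathStart);

        // The multi-argument URI constructor quotes characters that are
        // illegal in a path (for example ' ', '[' and ']').
        URI uri = new URI(scheme, host, path, null);
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String[] urls = {
            "http://finacleportal/shadows/VssLatestShadow/UNIX-resume[1].doc",
            "http://finacleportal/shadows/VssLatestShadow/Treasury/Central Banks/Jamaica_.htm"
        };
        for (String url : urls) {
            System.out.println(escape(url));
        }
    }
}
```

Running main prints the two URLs from the log with the brackets and the space percent-escaped, while the rest of each URL is left unchanged.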
