Hi

Could someone please point me in the right direction? I have a problem when
crawling our intranet: many of the pages return status code 500, as the headers
below show, which (correctly, I agree) gives httpclient the impression that
the GET failed. However, the server actually redirects the GET by appending
"?OpenDocument" to the end of the URL originally requested.

I don't think there's a way to get round this in the configuration, so I looked
at Fetcher.java and tried to make it refetch the URL with "?OpenDocument"
appended, but my code didn't work. I can't really figure out how the fetcher
works! Could someone please tell me how to get Nutch to refetch the amended URL
when httpclient gets a 500 back?
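
For reference, a minimal standalone sketch of the retry behaviour described above. It is not Nutch code and assumes nothing about Fetcher.java or httpclient internals: the class and method names (OpenDocumentRetry, amendUrl, fetchWithFallback) are hypothetical, and the HTTP status lookup is passed in as a function so the logic can be tried out without a server.

```java
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of Nutch: retry once with "?OpenDocument" on a 500.
final class OpenDocumentRetry {
    static final String SUFFIX = "?OpenDocument";

    /** Returns the URL with "?OpenDocument" appended, or null if it already has a query string. */
    static String amendUrl(String url) {
        return url.contains("?") ? null : url + SUFFIX;
    }

    /**
     * Asks statusOf for the status of url. If that is 500, tries the amended
     * URL once. Returns the URL that answered with something other than 500,
     * or null if neither attempt did.
     */
    static String fetchWithFallback(String url, ToIntFunction<String> statusOf) {
        if (statusOf.applyAsInt(url) != 500) {
            return url;
        }
        String amended = amendUrl(url);
        if (amended != null && statusOf.applyAsInt(amended) != 500) {
            return amended;
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for the server: 200 only when the URL ends in "?OpenDocument".
        ToIntFunction<String> fakeServer = u -> u.endsWith(SUFFIX) ? 200 : 500;
        System.out.println(
            fetchWithFallback("http://example.invalid/db.nsf/Content/Page", fakeServer));
    }
}
```

In a real crawl, the function passed as statusOf would wrap whatever call actually performs the GET; where that hook belongs inside Nutch is exactly the open question here.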

Thanks,

Ed.


http://planetba.baplc.com/general/aptrix/aptprop.nsf/Content/Europe+%26+Africa+Home%5CLibrary%5C500+EA+LocCodes

GET /general/aptrix/aptprop.nsf/Content/Europe+%26+Africa+Home%5CLibrary%5C500+EA+LocCodes HTTP/1.1
Host: planetba.baplc.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie:
ObSSOCookie=DdKzZ2Ebcglw9MjchanSFA%2FKN0agvrJTAe6PEGDHOXTeEgfmCrvqYCxVBY0qwU24Xb2T6MV3%2BUwrIfNhKVQA97J54%2Fd2%2BjetZjNoC98N4638eJpf3ZDyE50llsTdOAADaNn%2BjqVfeFrvDjJ2agM1Pxo1Y7DGR0yME1P0%2FHcd6XgFaHwEq9CyUvPq5k6mKMr7Vy4oiZS75RRPAJwNTOxoj7cLuwHX%2Fugj2GJ%2F8Jdynj6Ov1rxgeCWqGdm1ltqEma1TkAbKayt8RtilHwZxRmYDRc3tnGlaqauVUZDNVNE3B3L3bQDyfaFWaDHuX3r67CP

HTTP/1.x 500 Internal Server Error
Server: Lotus-Domino
Date: Tue, 02 Sep 2008 21:35:52 GMT
Connection: close
Expires: Tue, 01 Jan 1980 06:00:00 GMT
Content-Type: text/html; charset=US-ASCII
Content-Length: 661
Cache-Control: no-cache


----------------------------------------------------------
http://planetba.baplc.com/general/aptrix/aptprop.nsf/Content/Europe+%26+Africa+Home%5CLibrary%5C500+EA+LocCodes?OpenDocument

GET /general/aptrix/aptprop.nsf/Content/Europe+%26+Africa+Home%5CLibrary%5C500+EA+LocCodes?OpenDocument HTTP/1.1
Host: planetba.baplc.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie:
ObSSOCookie=DdKzZ2Ebcglw9MjchanSFA%2FKN0agvrJTAe6PEGDHOXTeEgfmCrvqYCxVBY0qwU24Xb2T6MV3%2BUwrIfNhKVQA97J54%2Fd2%2BjetZjNoC98N4638eJpf3ZDyE50llsTdOAADaNn%2BjqVfeFrvDjJ2agM1Pxo1Y7DGR0yME1P0%2FHcd6XgFaHwEq9CyUvPq5k6mKMr7Vy4oiZS75RRPAJwNTOxoj7cLuwHX%2Fugj2GJ%2F8Jdynj6Ov1rxgeCWqGdm1ltqEma1TkAbKayt8RtilHwZxRmYDRc3tnGlaqauVUZDNVNE3B3L3bQDyfaFWaDHuX3r67CP

HTTP/1.x 200 OK
Server: Lotus-Domino
Date: Tue, 02 Sep 2008 21:35:52 GMT
Last-Modified: Tue, 02 Sep 2008 21:35:50 GMT
Expires: Tue, 01 Jan 1980 06:00:00 GMT
Content-Type: text/html; charset=ISO-8859-1
Content-Length: 104168
Cache-Control: no-cache