Hello,

I am using HttpClient to write a simple crawler, but it gets a 400 Bad Request response for one particular URL. I tested the same URL with HttpURLConnection, and the content was retrieved without any problem.

Here is the DEBUG output:
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.useragent = Jakarta Commons-HttpClient/3.1
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.protocol.version = HTTP/1.1
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.connection-manager.class = class org.apache.commons.httpclient.SimpleHttpConnectionManager
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.protocol.cookie-policy = default
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.protocol.element-charset = US-ASCII
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.protocol.content-charset = ISO-8859-1
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.method.retry-handler = [EMAIL PROTECTED]
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.dateparser.patterns = [EEE, dd MMM yyyy HH:mm:ss zzz, EEEE, dd-MMM-yy HH:mm:ss zzz, EEE MMM d HH:mm:ss yyyy, EEE, dd-MMM-yyyy HH:mm:ss z, EEE, dd-MMM-yyyy HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM-yyyy HH:mm:ss z, EEE dd MMM yyyy HH:mm:ss z, EEE dd-MMM-yyyy HH-mm-ss z, EEE dd-MMM-yy HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z, EEE,dd-MMM-yyyy HH:mm:ss z, EEE, dd-MM-yyyy HH:mm:ss z]
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.useragent = Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12)
DEBUG [org.apache.commons.httpclient.params.DefaultHttpParams] Set parameter http.connection.timeout = 10000
DEBUG [org.apache.commons.httpclient.HttpConnection] Open connection to www.cse.psu.edu:80
DEBUG [httpclient.wire.header] >> "GET /people/faculty.php HTTP/1.1[\r][\n]"
DEBUG [org.apache.commons.httpclient.HttpMethodBase] Adding Host request header
DEBUG [httpclient.wire.header] >> "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12)[\r][\n]"
DEBUG [httpclient.wire.header] >> "Host: www.cse.psu.edu[\r][\n]"
DEBUG [httpclient.wire.header] >> "[\r][\n]"
DEBUG [httpclient.wire.header] << "HTTP/1.1 400 Bad Request[\r][\n]"
DEBUG [httpclient.wire.header] << "HTTP/1.1 400 Bad Request[\r][\n]"
DEBUG [httpclient.wire.header] << "Date: Tue, 25 Mar 2008 15:21:24 GMT[\r][\n]"
DEBUG [httpclient.wire.header] << "Server: Apache2[\r][\n]"
DEBUG [httpclient.wire.header] << "Content-Length: 292[\r][\n]"
DEBUG [httpclient.wire.header] << "Connection: close[\r][\n]"
DEBUG [httpclient.wire.header] << "Content-Type: text/html; charset=iso-8859-1[\r][\n]"
DEBUG [httpclient.wire.header] << "[\r][\n]"

Any idea what is going on?

Thanks,
Yang

