Hello,

I've recently integrated HttpClient 3.0-rc2 into an application that uses a popular proxy service, Anonymizer.com. Unfortunately, HttpClient sometimes returns web pages with portions of the page corrupted. If I use any web browser (IE, Firefox, Opera) I never see the same corrupt data in the same pages. I was originally using plain sockets and java.net to find, connect to, and retrieve these pages, but when I switched to Anonymizer I ran into problems parsing chunked content. I had written a regular expression to find and discard the chunk-size identifiers (as opposed to reading the body based on those identifiers), but it would occasionally miss some of the hex identifiers. I cannot find anything wrong with the regular expression, so I suspect the proxy was not returning data per the RFC. Regardless, I decided to switch over to HttpClient for several reasons, one of which is its transparent reading of chunked data. Still, after implementing it (and I hope I followed the tutorials, docs, and sample code as closely as possible), I'm still getting corrupt data. I've searched the user list and most of the dev list and have not found a similar problem reported.

So my questions, if anyone can help, are:

1) Should HttpClient 3.0 return data as reliably as any web browser?
2) Has anyone run into similar problems with proxy services?
3) Does anyone have fine-tuning tips for using proxies?
4) Or tips for reading chunked data?

Below is a snippet of the code that connects to the proxy and retrieves data. Note that I did not follow the sample proxy code found in the rc2 source, because I sometimes need to connect to Google and Overture, both of which return 502 error pages when connecting with that particular method. Instead I opted for the tutorial method of connecting through proxies.

Also attached is one of the corrupted pages that was returned. Check the page source; about 70 lines down, you'll start seeing the
corrupted characters. Thanks in advance for any pointers or responses.

Chris

private HttpConnectionEngine(String pHost, int pPort,
        HttpConnectionEngineParams pConnEngineParams) {
    HttpConnectionManagerParams connManagerParams = new HttpConnectionManagerParams();
    connManagerParams.setDefaultMaxConnectionsPerHost(pConnEngineParams
            .getMaxConnectionsPerHost());
    connManagerParams.setMaxTotalConnections(pConnEngineParams
            .getMaxConnectionsPerHost());
    connManagerParams.setStaleCheckingEnabled(pConnEngineParams
            .isConnectionStaleCheckingEnabled());
    connManagerParams.setConnectionTimeout((int) pConnEngineParams
            .getIdleConnectionTimeout());

    cConnManager = new MultiThreadedHttpConnectionManager();
    cConnManager.setParams(connManagerParams);
    cConnManager.closeIdleConnections(pConnEngineParams.getIdleConnectionTimeout());

    cConnEngineParams = pConnEngineParams;

    cHostConfig = new HostConfiguration();
    cHostConfig.setHost("www.google.com"); // example host that sometimes returns a corrupt page
    cHostConfig.setProxy("quinstreet.anonymizer.com", 80); // proxy host
}

public void readFromServer(String pRequest, StringBuffer pWebPage)
        throws InvalidArgumentException {
    final String METHOD = "readFromServer()";
    int status = -1, lReadLine = -1;
    int lBufSize = 4 * 1024;
    char[] lHtmlBuf = new char[lBufSize];
    GetMethod lRequestMethod = null;
    InputStreamReader lIn = null;

    HttpClient lClient = new HttpClient();
    lClient.setHttpConnectionManager(cConnManager);
    lClient.setHostConfiguration(cHostConfig);

    // set number of retries on a bad connect
    lClient.getParams().setParameter(
            HttpMethodParams.RETRY_HANDLER,
            new DefaultHttpMethodRetryHandler(
                    cConnEngineParams.getNumOfRetryOnBadHttpStatus(),
                    cConnEngineParams.isRequestSentRetryEnabled()));

    log.log(Level.INFO, "Request: " + pRequest);
    lRequestMethod = new GetMethod(pRequest);

    // add headers to the request
    Properties lReqHeaderProps = cConnEngineParams.getReqHeaderProps();
    Enumeration headerNames = lReqHeaderProps.keys();
    while (headerNames.hasMoreElements()) {
        String key = (String) headerNames.nextElement();
        lRequestMethod.addRequestHeader(key, lReqHeaderProps.getProperty(key));
    }

    // clear the StringBuffer
    pWebPage.delete(0, pWebPage.length());

    // execute the request
    try {
        status = lClient.executeMethod(lRequestMethod);
        lIn = new InputStreamReader(
                lRequestMethod.getResponseBodyAsStream(),
                lRequestMethod.getResponseCharSet());
        while ((lReadLine = lIn.read(lHtmlBuf)) != -1) {
            // append only the lReadLine chars actually read this pass
            pWebPage.append(lHtmlBuf, 0, lReadLine);
        }
    } catch (HttpException he) {
        throw new InvalidArgumentException(
                "HttpException executing GetMethod on request: " + pRequest
                        + ", with: " + he.getMessage());
    } catch (IOException ioe) {
        throw new InvalidArgumentException(
                "IOException executing request or reading response on request: "
                        + pRequest + ", with: " + ioe.getMessage());
    } finally {
        // clean resources
        // NOTE: don't close the connection with HTTP/1.1
        lRequestMethod.releaseConnection();
        lRequestMethod = null;
        lClient = null;
        headerNames = null;

        // check the status for logging
        if (status != HttpStatus.SC_OK)
            log.logp(Level.INFO, CLASS, METHOD, "Bad request, status: " + status);
        else
            log.logp(Level.FINE, CLASS, METHOD,
                    "OK status, webpage-length: " + pWebPage.length());
    }
}
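For reference on question 4, the chunked transfer coding is length-prefixed, so it can be read deterministically rather than pattern-matched: each chunk is a hex size line, then exactly that many bytes, then CRLF, with a zero-size chunk terminating the body (RFC 2616, section 3.6.1). This is only an illustrative stdlib-only sketch of that scheme, not Anonymizer-specific and not HttpClient internals:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class ChunkedDecoder {

        // Decode an HTTP/1.1 chunked body: each chunk is "<hex-size>[;ext]CRLF",
        // then <size> raw bytes, then CRLF; a size of 0 terminates the body.
        public static byte[] decode(InputStream in) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (true) {
                String sizeLine = readLine(in);
                int semi = sizeLine.indexOf(';');     // drop any chunk extensions
                if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
                int size = Integer.parseInt(sizeLine.trim(), 16);
                if (size == 0) break;                 // last chunk
                for (int i = 0; i < size; i++) {
                    int b = in.read();
                    if (b == -1) throw new IOException("truncated chunk");
                    out.write(b);
                }
                readLine(in);                         // consume the CRLF after the data
            }
            return out.toByteArray();
        }

        // Read up to CRLF (or LF) and return the line without the terminator.
        private static String readLine(InputStream in) throws IOException {
            StringBuffer sb = new StringBuffer();
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                if (b != '\r') sb.append((char) b);
            }
            return sb.toString();
        }

        public static void main(String[] args) throws IOException {
            String body = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n";
            byte[] decoded = decode(new ByteArrayInputStream(body.getBytes("US-ASCII")));
            System.out.println(new String(decoded, "US-ASCII")); // prints "Wikipedia"
        }
    }

Because the size line says exactly how many bytes follow, a hex string that happens to appear inside the page body can never be mistaken for a chunk identifier, which is the failure mode a regex approach is exposed to.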
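Separately, one read-loop pitfall worth ruling out, since it produces exactly the kind of stale/NUL padding described: StringBuffer.append(char[]) appends the whole array, so any read that only partially fills the buffer drags leftover characters from the previous pass into the output; append(char[], 0, n) appends only what was read. A small stdlib-only illustration:

    import java.io.CharArrayReader;
    import java.io.IOException;
    import java.io.Reader;

    public class ReadLoopPitfall {
        public static void main(String[] args) throws IOException {
            char[] source = "hello world".toCharArray(); // 11 chars
            char[] buf = new char[8];                    // reads arrive in <= 8-char pieces

            // Wrong: append(buf) copies all 8 slots even when read() filled fewer.
            Reader r1 = new CharArrayReader(source);
            StringBuffer wrong = new StringBuffer();
            int n;
            while ((n = r1.read(buf)) != -1) {
                wrong.append(buf); // final read returns 3 chars but appends 8
            }

            // Right: append only the n characters the read actually produced.
            Reader r2 = new CharArrayReader(source);
            StringBuffer right = new StringBuffer();
            buf = new char[8];
            while ((n = r2.read(buf)) != -1) {
                right.append(buf, 0, n);
            }

            System.out.println(wrong.length()); // 16: padded with leftovers of the prior read
            System.out.println(right.length()); // 11
        }
    }

Allocating a fresh buffer on every pass hides this by padding with NULs instead of stale text, but the output length is still wrong; the three-argument append is the reliable fix.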
