Hi.

I am writing a robot for a search engine. The robot should only harvest files smaller than a certain limit (let's say 100 kB); larger files are not interesting, because they are usually archives or long pages about nothing.

I cannot find a robust way to drop a connection (GET over HTTP/1.0 and HTTP/1.1) once the incoming data stream exceeds that upper limit. At the moment I do it by closing the input stream returned by getResponseBodyAsStream and then calling releaseConnection. Is that OK?
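
To make it concrete, this is roughly what I do now (just a sketch; MAX_BYTES, SizeLimitedFetch and fetch are my own names for the 100 kB limit and the helper, not anything from your API):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpException;
    import org.apache.commons.httpclient.methods.GetMethod;

    public class SizeLimitedFetch {

        // my upper limit, roughly the 100 kB mentioned above
        private static final int MAX_BYTES = 100 * 1024;

        // returns the body, or null when the document is too large
        public static byte[] fetch(HttpClient client, String url)
                throws IOException, HttpException {
            GetMethod get = new GetMethod(url);
            try {
                client.executeMethod(get);
                InputStream in = get.getResponseBodyAsStream();
                if (in == null) {
                    return new byte[0];
                }
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int total = 0;
                int n;
                while ((n = in.read(chunk)) != -1) {
                    total += n;
                    if (total > MAX_BYTES) {
                        // give up: close the stream before the response
                        // has been fully read
                        in.close();
                        return null;
                    }
                    body.write(chunk, 0, n);
                }
                in.close();
                return body.toByteArray();
            } finally {
                // hand the connection back in any case
                get.releaseConnection();
            }
        }
    }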

My second point concerns the "retrying" example in your docs (http://jakarta.apache.org/commons/httpclient/tutorial.html, the catch block for HttpRecoverableException). When I do something like that, I find that I have to call method.recycle() in the catch block, otherwise the connection is not reinitialized and every retry fails. Could you enlighten me on this? Is it a bug in the tutorial? (I have tried it on 2.0-b1.)
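
What I ended up with looks roughly like this (a sketch of what I described, modelled on the tutorial's loop with the recycle() call added; RetryingFetch, executeWithRetry and the limit of 3 attempts are my own choices, and I am not sure this is the intended usage, which is partly my question):

    import java.io.IOException;

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpException;
    import org.apache.commons.httpclient.HttpRecoverableException;
    import org.apache.commons.httpclient.methods.GetMethod;

    public class RetryingFetch {

        // retry loop modelled on the tutorial, plus the recycle() call;
        // returns -1 if all attempts failed
        public static int executeWithRetry(HttpClient client, GetMethod get)
                throws IOException, HttpException {
            int status = -1;
            int attempts = 0;
            while (status == -1 && attempts < 3) {
                try {
                    status = client.executeMethod(get);
                } catch (HttpRecoverableException e) {
                    attempts++;
                    // without this the connection does not seem to be
                    // reinitialized and the next attempt fails (2.0-b1)
                    get.recycle();
                }
            }
            return status;
        }
    }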

And my last point: when I run the robot under stress conditions, some connections seem to freeze, even though I use setConnectionTimeout. Is this a known issue? And how should I debug it so that I can send you a useful log? It only happens after 1-2 hours of running, so a full log could be a few gigabytes...
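
Would switching on commons-logging debug output for the httpclient categories be the right way to capture it? Something like this, assuming the usual SimpleLog property names also work with 2.0-b1:

    // set at the very beginning of main(), before the first HttpClient
    // class is loaded; the "httpclient.wire" line could be left out to
    // keep the log smaller
    System.setProperty("org.apache.commons.logging.Log",
            "org.apache.commons.logging.impl.SimpleLog");
    System.setProperty("org.apache.commons.logging.simplelog.showdatetime",
            "true");
    System.setProperty("org.apache.commons.logging.simplelog.log.httpclient.wire",
            "debug");
    System.setProperty("org.apache.commons.logging.simplelog.log.org.apache.commons.httpclient",
            "debug");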

Thank you

-g-


