Re: strange behavior of 2.0-rc1
Yes, it works. THX Leo. Michael Becke wrote: Okay, I think I have the fix. It seems that there is a problem releasing connections that throw an exception when closing the response stream. Leo, please try out the patch attached to <http://issues.apache.org/bugzilla/show_bug.cgi?id=22841>. Mike On Saturday, August 30, 2003, at 12:59 PM, Michael Becke wrote: Well, I've narrowed it down a little. It seems that connections made to the hosts below are not being released. Coincidentally, all of these hosts use chunked encoding. I don't know if this is the cause, but it seems suspicious. I'm going to run some more tests. Attached is the perl script I'm using to find unreleased connections. bazar.technet.cz bonusweb.cz cheaty.bonusweb.cz foto.katedrala.cz index.katedrala.cz mms.idnes.cz stred.mojenoviny.cz vtipy.katedrala.cz www.bonusweb.cz www.bytycz.cz www.cestiny.cz www.domeny.cz www.ekolist.cz www.gamesy.cz www.jarmark.cz www.kisshady.cz www.meliorannis.cz www.mobil.cz www.netem.cz www.notebooky.cz www.palmare.cz www.plnehry.cz www.pocketpc.cz www.reality.cz www.reality21.cz www.technet.cz www2.ticketpro.cz Mike On Saturday, August 30, 2003, at 12:50 AM, Michael Becke wrote: Hi Leo, Well, I've done some analysis of the log file and I think I have a few answers. It seems that a few connections are not being released. To be exact, 47 connections are not released. 42 of them get reclaimed by the garbage collector, but the other 5 remain in limbo. We'll need to do some more work to figure out why these connections are not released. I wrote a quick perl script that analyzes the log file, which I will post tomorrow. That's all for tonight. I'm going to get some sleep. More to come tomorrow... Mike On Friday, August 29, 2003, at 11:38 AM, Leo Galambos wrote: Hi Mike, you have it here: http://com-os2.ms.mff.cuni.cz/temp/example.log.gz (2767808B) I set maxTotalConnections=5. -g- Michael Becke wrote: Hi Leo, Attached at the bottom of my message is another log4j configuration.
This should considerably decrease the amount of logging. It also keeps all of the logging info in a single file. To successfully debug this case we will need all of the logging data for a particular session. In conjunction with this debugging we also need to reduce the time it takes for this error to come up. As I suggested in my email a few days ago, you should also try decreasing the number of total connections via MultiThreadedHttpConnectionManager.setMaxTotalConnections(). Mike

log4j.rootLogger=WARN, R
log4j.appender.R=org.apache.log4j.FileAppender
log4j.appender.R.File=example.log
log4j.appender.R.append=false
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.org.apache.commons.httpclient.HttpMethodBase=DEBUG
log4j.logger.org.apache.commons.httpclient.MultiThreadedHttpConnectionManager=DEBUG
log4j.logger.org.apache.commons.httpclient.HttpConnection=DEBUG

On Thursday, August 28, 2003, at 10:36 PM, Leo Galambos wrote: Hi Mike, I've never used log4j and I missed your logging.html page, thus I tried my simple config (see below). Unfortunately, I cannot reproduce the bug easily (-> no test case yet) - the robot has to run more than 6 hours. BTW: I had to rotate log files due to a huge volume of data, so you can see the last 30 log files. I guess the format is obvious from the config file below. If you cannot find what you need, please edit my log4j cfg file and I will rerun it again. Thank you Leo -- My log4j config:

log4j.rootLogger=debug, R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
# Pattern to output the caller's file name and line number.
log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=example.log
log4j.appender.R.MaxFileSize=1000KB
# Keep up to 30 backup files
log4j.appender.R.MaxBackupIndex=30
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.org.apache.commons.httpclient=DEBUG

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
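[Editorial sketch] The failure mode discussed above (an exception while closing the response stream preventing the connection from being released) is exactly what an outer finally block guards against. A minimal sketch against the HttpClient 2.0-era API; the class and method names (GetMethod, getResponseBodyAsStream, releaseConnection) are from that API, and the fetch helper itself is hypothetical:

```java
import java.io.InputStream;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class ReleaseSketch {
    public static int fetch(HttpClient client, String url) throws Exception {
        GetMethod method = new GetMethod(url);
        try {
            int status = client.executeMethod(method);
            InputStream in = method.getResponseBodyAsStream();
            if (in != null) {
                try {
                    // ... consume (or deliberately truncate) the body here ...
                } finally {
                    in.close(); // may throw on a broken chunked stream
                }
            }
            return status;
        } finally {
            // Runs even if close() threw, so the connection always goes
            // back to the MultiThreadedHttpConnectionManager's pool.
            method.releaseConnection();
        }
    }
}
```

The point of the nesting is that releaseConnection() sits outside the block in which close() can fail, so no exception path can leak a pooled connection.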
Re: strange behavior of 2.0-rc1
Hi Mike, you have it here: http://com-os2.ms.mff.cuni.cz/temp/example.log.gz (2767808B) I set maxTotalConnections=5. -g- Michael Becke wrote: Hi Leo, Attached at the bottom of my message is another log4j configuration. This should considerably decrease the amount of logging. It also keeps all of the logging info in a single file. To successfully debug this case we will need all of the logging data for a particular session. In conjunction with this debugging we also need to reduce the time it takes for this error to come up. As I suggested in my email a few days ago, you should also try decreasing the number of total connections via MultiThreadedHttpConnectionManager.setMaxTotalConnections(). Mike

log4j.rootLogger=WARN, R
log4j.appender.R=org.apache.log4j.FileAppender
log4j.appender.R.File=example.log
log4j.appender.R.append=false
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.org.apache.commons.httpclient.HttpMethodBase=DEBUG
log4j.logger.org.apache.commons.httpclient.MultiThreadedHttpConnectionManager=DEBUG
log4j.logger.org.apache.commons.httpclient.HttpConnection=DEBUG

On Thursday, August 28, 2003, at 10:36 PM, Leo Galambos wrote: Hi Mike, I've never used log4j and I missed your logging.html page, thus I tried my simple config (see below). Unfortunately, I cannot reproduce the bug easily (-> no test case yet) - the robot has to run more than 6 hours. BTW: I had to rotate log files due to a huge volume of data, so you can see the last 30 log files. I guess the format is obvious from the config file below. If you cannot find what you need, please edit my log4j cfg file and I will rerun it again. Thank you Leo -- My log4j config:

log4j.rootLogger=debug, R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
# Pattern to output the caller's file name and line number.
log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=example.log
log4j.appender.R.MaxFileSize=1000KB
# Keep up to 30 backup files
log4j.appender.R.MaxBackupIndex=30
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.org.apache.commons.httpclient=DEBUG
Re: strange behavior of 2.0-rc1
Hi Mike, I've never used log4j and I missed your logging.html page, thus I tried my simple config (see below). Unfortunately, I cannot reproduce the bug easily (-> no test case yet) - the robot has to run more than 6 hours. BTW: I had to rotate log files due to a huge volume of data, so you can see the last 30 log files. I guess the format is obvious from the config file below. If you cannot find what you need, please edit my log4j cfg file and I will rerun it again. Thank you Leo -- My log4j config:

log4j.rootLogger=debug, R
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
# Pattern to output the caller's file name and line number.
log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=example.log
log4j.appender.R.MaxFileSize=1000KB
# Keep up to 30 backup files
log4j.appender.R.MaxBackupIndex=30
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n
log4j.logger.org.apache.commons.httpclient=DEBUG

Michael Becke wrote: Hi Leo, Could you give me a little more detail about what these various files are? Does each of them show the problem that's occurring? Also, though it's a little late now, we really only need the debug log for MultiThreadedHttpConnectionManager and perhaps the wire log, I think. Have you had any luck in creating a test case? Thanks, Mike On Thursday, August 28, 2003, at 01:45 PM, Leo Galambos wrote: Mike, the logs are here: http://com-os2.ms.mff.cuni.cz/temp/ (example.*.*) THX -g- Michael Becke wrote: Yea, "good news" :) So now we know that we're losing connections somewhere, and they are not being garbage collected. There are a bunch of things we can try. Here are a few ideas: - enable debug logging for the MultiThreadedHttpConnectionManager. You will probably also want to print out the thread name with the log.
This will require Log4j or JRE 1.4 logging. - reduce the number of threads executing methods as well as the max number of connections (MultiThreadedHttpConnectionManager.setMaxTotalConnections()). This should cause the error to occur more quickly. - if possible, try to come up with something standalone that reproduces the problem. If I can reproduce the error I will be better able to debug the problem. - run the application in debug mode so that when things get locked up we can see where each of the threads are. Also, any more detail you can provide about your application and environment would be helpful. Please let me know if you have any ideas or questions. Mike On Monday, August 25, 2003, at 01:59 PM, Leo Galambos wrote: Hi Mike, "good news":

c-0.log:org.apache.commons.httpclient.HttpException: Timeout waiting for connection
c-0.log-at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:314)
c-0.log-at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnection(MultiThreadedHttpConnectionManager.java:241)
c-0.log-at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:609)
c-0.log-at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:500)
c-0.log-at org.egothor.robot.Network.get(Network.java:109)
c-0.log-at org.egothor.robot.Capek$Fetcher.run(Capek.java:186)

What next? THX Leo Michael Becke wrote: Hi Leo, These 20 CLOSE_WAIT sockets are from the 20 connections in the pool held by MultiThreadedHttpConnectionManager. When connections are released they are not necessarily closed. My guess is that connections are being lost somewhere. If all 20 connections are checked out and not released, the 21st request for a connection will block until one is available. You can verify this by setting httpConnectionFactoryTimeout on HttpClient. This timeout is how long HttpClient will wait when getting a connection from the connection manager.
Setting this timeout to something big is probably best; that way you won't run into it by accident. If connections are getting lost somewhere and httpConnectionFactoryTimeout is set, you will start getting HttpExceptions with a message of "Timeout waiting for connection" on every method execute. Please give this a shot to ensure that the problem is with lost connections. Once verified we can start looking for a cause. Also, in your example Network.java class I noticed the following lines:

while (r.status == -1 && attempt++ < RETRIES) {
    try {
        r.status = client.executeMethod(method);
        r.msg = null;
    } catch (Throwable x) {
        method.recycle();
        r.msg = url + ": " + x.getMessage();
        if (logger.isLoggable(Level.FINE)) {
            logger.log(Level.FINE, attempt + " of " + url, x);
        }
    }
}
Re: strange behavior of 2.0-rc1
Mike, the logs are here: http://com-os2.ms.mff.cuni.cz/temp/ (example.*.*) THX -g- Michael Becke wrote: Yea, "good news" :) So now we know that we're losing connections somewhere, and they are not being garbage collected. There are a bunch of things we can try. Here are a few ideas: - enable debug logging for the MultiThreadedHttpConnectionManager. You will probably also want to print out the thread name with the log. This will require Log4j or JRE 1.4 logging. - reduce the number of threads executing methods as well as the max number of connections (MultiThreadedHttpConnectionManager.setMaxTotalConnections()). This should cause the error to occur more quickly. - if possible, try to come up with something standalone that reproduces the problem. If I can reproduce the error I will be better able to debug the problem. - run the application in debug mode so that when things get locked up we can see where each of the threads are. Also, any more detail you can provide about your application and environment would be helpful. Please let me know if you have any ideas or questions. Mike On Monday, August 25, 2003, at 01:59 PM, Leo Galambos wrote: Hi Mike, "good news":

c-0.log:org.apache.commons.httpclient.HttpException: Timeout waiting for connection
c-0.log-at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:314)
c-0.log-at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnection(MultiThreadedHttpConnectionManager.java:241)
c-0.log-at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:609)
c-0.log-at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:500)
c-0.log-at org.egothor.robot.Network.get(Network.java:109)
c-0.log-at org.egothor.robot.Capek$Fetcher.run(Capek.java:186)

What next?
THX Leo Michael Becke wrote: Hi Leo, These 20 CLOSE_WAIT sockets are from the 20 connections in the pool held by MultiThreadedHttpConnectionManager. When connections are released they are not necessarily closed. My guess is that connections are being lost somewhere. If all 20 connections are checked out and not released, the 21st request for a connection will block until one is available. You can verify this by setting httpConnectionFactoryTimeout on HttpClient. This timeout is how long HttpClient will wait when getting a connection from the connection manager. Setting this timeout to something big is probably best; that way you won't run into it by accident. If connections are getting lost somewhere and httpConnectionFactoryTimeout is set, you will start getting HttpExceptions with a message of "Timeout waiting for connection" on every method execute. Please give this a shot to ensure that the problem is with lost connections. Once verified we can start looking for a cause. Also, in your example Network.java class I noticed the following lines:

while (r.status == -1 && attempt++ < RETRIES) {
    try {
        r.status = client.executeMethod(method);
        r.msg = null;
    } catch (Throwable x) {
        method.recycle();
        r.msg = url + ": " + x.getMessage();
        if (logger.isLoggable(Level.FINE)) {
            logger.log(Level.FINE, attempt + " of " + url, x);
        }
    }
}

HttpMethod.recycle() resets all values in the method, including the URL and request headers. Calling recycle in the exception case and not resetting the various values will make the retry behave quite differently. My guess is that this is not what you were trying to do. Mike On Saturday, August 23, 2003, at 01:07 PM, Leo Galambos wrote: Hi. I am using httpclient (HC) in a webcrawler. After 6 hours of run, HC stops working and I think it is locked by some lock of a critical section in HC.
The "problematic" code I use is here: http://www.egothor.org/temp/Network.java When 20 threads are used, I have this in `netstat -an` after stopping the robot from its console (note: the number of dead connections is really *20*):

tcp 23786 0 censored:60083 censored2:80 CLOSE_WAIT
tcp 24179 0 censored:60080 censored2:80 CLOSE_WAIT
tcp 24143 0 censored:60092 censored2:80 CLOSE_WAIT
tcp 24164 0 censored:60091 censored2:80 CLOSE_WAIT
tcp 24169 0 censored:60090 censored2:80 CLOSE_WAIT
tcp 24189 0 censored:60089 censored2:80 CLOSE_WAIT
tcp 24188 0 censored:60071 censored2:80 CLOSE_WAIT
tcp 11697 0 c
Re: strange behavior of 2.0-rc1
OK, before I start with all the points, can you tell me whether the following strategy may cause the problem? In Network.java you can see that a response shorter than 64kB is read completely, but when it is longer, only the first 64kB are read. In both cases, method.releaseConnection is called. Is this correct? If not, how can I close the connection? I had already asked here, but I did not get a clear answer yet. You know, I cannot read the full stream, because a bad boy could redirect /dev/random to a socket. Thank you Leo Michael Becke wrote: Yea, "good news" :) So now we know that we're losing connections somewhere, and they are not being garbage collected. There are a bunch of things we can try. Here are a few ideas: - enable debug logging for the MultiThreadedHttpConnectionManager. You will probably also want to print out the thread name with the log. This will require Log4j or JRE 1.4 logging. - reduce the number of threads executing methods as well as the max number of connections (MultiThreadedHttpConnectionManager.setMaxTotalConnections()). This should cause the error to occur more quickly. - if possible, try to come up with something standalone that reproduces the problem. If I can reproduce the error I will be better able to debug the problem. - run the application in debug mode so that when things get locked up we can see where each of the threads are. Also, any more detail you can provide about your application and environment would be helpful. Please let me know if you have any ideas or questions.
Mike On Monday, August 25, 2003, at 01:59 PM, Leo Galambos wrote: Hi Mike, "good news":

c-0.log:org.apache.commons.httpclient.HttpException: Timeout waiting for connection
c-0.log-at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:314)
c-0.log-at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnection(MultiThreadedHttpConnectionManager.java:241)
c-0.log-at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:609)
c-0.log-at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:500)
c-0.log-at org.egothor.robot.Network.get(Network.java:109)
c-0.log-at org.egothor.robot.Capek$Fetcher.run(Capek.java:186)

What next? THX Leo Michael Becke wrote: Hi Leo, These 20 CLOSE_WAIT sockets are from the 20 connections in the pool held by MultiThreadedHttpConnectionManager. When connections are released they are not necessarily closed. My guess is that connections are being lost somewhere. If all 20 connections are checked out and not released the 21st request for a connection will block until one is available. You can verify this by setting httpConnectionFactoryTimeout on HttpClient. This timeout is how long HttpClient will wait when getting a connection from the connection manager. Setting this timeout to something big is probably best, that way you won't run into it by accident. If connections are getting lost somewhere and httpConnectionFactoryTimeout is set you will start getting HttpExceptions with a message of "Timeout waiting for connection" on every method execute. Please give this a shot to ensure that the problem is with lost connections. Once verified we can start looking for a cause.
Also, in your example Network.java class I noticed the following lines:

while (r.status == -1 && attempt++ < RETRIES) {
    try {
        r.status = client.executeMethod(method);
        r.msg = null;
    } catch (Throwable x) {
        method.recycle();
        r.msg = url + ": " + x.getMessage();
        if (logger.isLoggable(Level.FINE)) {
            logger.log(Level.FINE, attempt + " of " + url, x);
        }
    }
}

HttpMethod.recycle() resets all values in the method, including the URL and request headers. Calling recycle in the exception case and not resetting the various values will make the retry behave quite differently. My guess is that this is not what you were trying to do. Mike On Saturday, August 23, 2003, at 01:07 PM, Leo Galambos wrote: Hi. I am using httpclient (HC) in a webcrawler. After 6 hours of run, HC stops working and I think it is locked by some lock of a critical section in HC. The "problematic" code I use is here: http://www.egothor.org/temp/Network.java When 20 threads are used, I have this in `netstat -an` after stopping the robot from its console (note: the number of dead connections is really *20*):

tcp 23786 0 censored:60083 censored2:80 CLOSE_WAIT
tcp 24179 0 censored:60080 censored2:80 CLOSE_WAIT
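[Editorial sketch] The 64kB truncation strategy Leo describes above is, by itself, plain java.io and independent of HttpClient: read until a byte limit is hit, then stop, so a hostile peer streaming /dev/random can never force an unbounded read. A sketch; `readAtMost` is a hypothetical helper, not part of any library, and the 65536-byte limit mirrors the one in the thread:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedRead {
    // Read at most `limit` bytes from `in`; an endless or hostile stream
    // can never make us buffer more than the limit.
    public static byte[] readAtMost(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int total = 0;
        while (total < limit) {
            int n = in.read(buf, 0, Math.min(buf.length, limit - total));
            if (n == -1) {
                break; // stream ended before the limit was reached
            }
            out.write(buf, 0, n);
            total += n;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A 200000-byte "response" is cut off at 65536 bytes.
        InputStream big = new ByteArrayInputStream(new byte[200000]);
        System.out.println(readAtMost(big, 65536).length); // 65536
    }
}
```

After the bounded read, the remaining question in the thread is what to do with the half-read connection; in the 2.0-era API that is what releaseConnection() (and, if needed, closing the response stream first) is for.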
Re: strange behavior of 2.0-rc1
Hi Mike, "good news": c-0.log:org.apache.commons.httpclient.HttpException: Timeout waiting for connection c-0.log-at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:314) c-0.log-at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnection(MultiThreadedHttpConnectionManager.java:241) c-0.log-at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:609) c-0.log-at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:500) c-0.log-at org.egothor.robot.Network.get(Network.java:109) c-0.log-at org.egothor.robot.Capek$Fetcher.run(Capek.java:186) What next? THX Leo Michael Becke wrote: Hi Leo, These 20 CLOSE_WAIT sockets are from the 20 connections in the pool held by MultiThreadedHttpConnectionManager. When connections are released they are not necessarily closed. My guess is that connections are being lost somewhere. If all 20 connections are checked out and not released the 21st request for a connection will block until one is available. You can verify this by setting httpConnectionFactoryTimeout on HttpClient. This timeout is how long HttpClient will wait when getting a connection from the connection manager. Setting this timeout to something big is probably best, that way you won't run into it by accident. If connections are getting lost somewhere and httpConnectionFactoryTimeout is set you will start getting HttpExceptions with a message of "Timeout waiting for connection" on every method execute. Please give this a shot to ensure that the problem is with lost connections. Once verified we can start looking for a cause. 
Also, in your example Network.java class I noticed the following lines:

while (r.status == -1 && attempt++ < RETRIES) {
    try {
        r.status = client.executeMethod(method);
        r.msg = null;
    } catch (Throwable x) {
        method.recycle();
        r.msg = url + ": " + x.getMessage();
        if (logger.isLoggable(Level.FINE)) {
            logger.log(Level.FINE, attempt + " of " + url, x);
        }
    }
}

HttpMethod.recycle() resets all values in the method, including the URL and request headers. Calling recycle in the exception case and not resetting the various values will make the retry behave quite differently. My guess is that this is not what you were trying to do. Mike On Saturday, August 23, 2003, at 01:07 PM, Leo Galambos wrote: Hi. I am using httpclient (HC) in a webcrawler. After 6 hours of run, HC stops working and I think it is locked by some lock of a critical section in HC. The "problematic" code I use is here: http://www.egothor.org/temp/Network.java When 20 threads are used, I have this in `netstat -an` after stopping the robot from its console (note: the number of dead connections is really *20*):

tcp 23786 0 censored:60083 censored2:80 CLOSE_WAIT
tcp 24179 0 censored:60080 censored2:80 CLOSE_WAIT
tcp 24143 0 censored:60092 censored2:80 CLOSE_WAIT
tcp 24164 0 censored:60091 censored2:80 CLOSE_WAIT
tcp 24169 0 censored:60090 censored2:80 CLOSE_WAIT
tcp 24189 0 censored:60089 censored2:80 CLOSE_WAIT
tcp 24188 0 censored:60071 censored2:80 CLOSE_WAIT
tcp 11697 0 censored:60066 censored2:80 CLOSE_WAIT
tcp 24152 0 censored:60078 censored2:80 CLOSE_WAIT
tcp 24180 0 censored:60076 censored2:80 CLOSE_WAIT
tcp 24208 0 censored:60074 censored2:80 CLOSE_WAIT
tcp 24154 0 censored:60073 censored2:80 CLOSE_WAIT
tcp 24150 0 censored:60044 censored2:80 CLOSE_WAIT
tcp 24150 0 censored:60041 censored2:80 CLOSE_WAIT
tcp 24194 0 censored:60101 censored2:80 CLOSE_WAIT
tcp 24206 0 censored:60099 censored2:80 CLOSE_WAIT
tcp 24156 0 censored:60097 censored2:80 CLOSE_WAIT
tcp 23643 0 censored:60096 censored2:80 CLOSE_WAIT
tcp 23609 0 censored:60111 censored2:80 CLOSE_WAIT
tcp 23961 0 censored:60110 censored2:80 CLOSE_WAIT

Other connections are not allocated and the robot is waiting for something (the O/S does not report that the respective Java processes run). Any thoughts? Thank you Leo
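[Editorial sketch] Mike's two suggestions in this exchange, shrink the pool and set httpConnectionFactoryTimeout, combine into a few lines of setup. A sketch against the HttpClient 2.0-era API; the 5-connection limit is the value Leo used in the thread and the 10-minute wait is an illustrative placeholder, not a recommendation:

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

public class PoolSetup {
    public static HttpClient newClient() {
        MultiThreadedHttpConnectionManager manager =
                new MultiThreadedHttpConnectionManager();
        // A small pool: a leaked connection exhausts it (and surfaces) sooner.
        manager.setMaxTotalConnections(5);

        HttpClient client = new HttpClient(manager);
        // Large but finite wait for a free connection: a leak now shows up as
        // "Timeout waiting for connection" instead of hanging forever.
        client.setHttpConnectionFactoryTimeout(10 * 60 * 1000L);
        return client;
    }
}
```

The design trade-off is deliberate: a big timeout means a healthy crawler never trips it, so the exception is a reliable signal that connections are being checked out and never released.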
Re: strange behavior of 2.0-rc1
Michael Becke wrote: Also, in your example Network.java class I noticed the following lines:

while (r.status == -1 && attempt++ < RETRIES) {
    try {
        r.status = client.executeMethod(method);
        r.msg = null;
    } catch (Throwable x) {
        method.recycle();
        r.msg = url + ": " + x.getMessage();
        if (logger.isLoggable(Level.FINE)) {
            logger.log(Level.FINE, attempt + " of " + url, x);
        }
    }
}

Hi Mike, what do you suggest? Your tutorial does not cover this - a) you use System.exit(-1), which would not be a good solution in my case ;-); or b) you do nothing, which does not work, because "executeMethod" would throw some exception again (I tried it with a previous HC version). Should I use "releaseConnection"? Will it work correctly? BTW: I will try "httpConnectionFactoryTimeout" as soon as we are sure that "recycle" is not the problematic point which causes the issue. Thank you Leo
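[Editorial sketch] One answer to Leo's "what goes in the catch block instead of recycle()?" is to build a fresh method object for each attempt and release the connection in a finally block, so no state from a failed attempt leaks into the retry and no connection is ever lost. This is a sketch, not the tutorial's approach; names assume the HttpClient 2.0-era API, and RETRIES and the elided logging are placeholders:

```java
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class RetrySketch {
    static final int RETRIES = 3;

    // Returns the HTTP status, or -1 if every attempt failed.
    public static int get(HttpClient client, String url) {
        int status = -1;
        int attempt = 0;
        while (status == -1 && attempt++ < RETRIES) {
            // A fresh method per attempt: nothing to recycle(), no stale state.
            GetMethod method = new GetMethod(url);
            try {
                status = client.executeMethod(method);
            } catch (Exception x) {
                // log the failure and fall through to the next attempt
            } finally {
                method.releaseConnection(); // the connection is never lost
            }
        }
        return status;
    }
}
```

In real code the response body would be consumed between executeMethod and releaseConnection; it is elided here to keep the retry structure visible.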
Re: strange behavior of 2.0-rc1
Laura Werner wrote: Leo Galambos wrote: I am using httpclient (HC) in a webcrawler. After 6 hours of run, HC stops working and I think it is locked by some lock of a critical section in HC. The "problematic" code I use is here: http://www.egothor.org/temp/Network.java Are you running on Windows by any chance? It has a "feature" where the OS waits a very long time before deciding that half-closed sockets are really dead, so you can end up with lots of sockets in TIME_WAIT or CLOSE_WAIT states. There's a registry setting you can change to adjust the timeout down to a more reasonable value. This doesn't sound exactly like what you're seeing, but I thought I'd throw it out there... Unfortunately, I do not use MS-Windows. I have nothing against TIME_WAIT sockets; my problem is that httpclient stops operating (all threads are locked somewhere in HC) and those sockets may be the key. As you can see, their receive queues are not empty, which implies that httpclient did not read the data, and it also shows where the library may be locked. Leo
strange behavior of 2.0-rc1
Hi. I am using httpclient (HC) in a webcrawler. After 6 hours of run, HC stops working and I think it is locked by some lock of a critical section in HC. The "problematic" code I use is here: http://www.egothor.org/temp/Network.java When 20 threads are used, I have this in `netstat -an` after stopping the robot from its console (note: the number of dead connections is really *20*):

tcp 23786 0 censored:60083 censored2:80 CLOSE_WAIT
tcp 24179 0 censored:60080 censored2:80 CLOSE_WAIT
tcp 24143 0 censored:60092 censored2:80 CLOSE_WAIT
tcp 24164 0 censored:60091 censored2:80 CLOSE_WAIT
tcp 24169 0 censored:60090 censored2:80 CLOSE_WAIT
tcp 24189 0 censored:60089 censored2:80 CLOSE_WAIT
tcp 24188 0 censored:60071 censored2:80 CLOSE_WAIT
tcp 11697 0 censored:60066 censored2:80 CLOSE_WAIT
tcp 24152 0 censored:60078 censored2:80 CLOSE_WAIT
tcp 24180 0 censored:60076 censored2:80 CLOSE_WAIT
tcp 24208 0 censored:60074 censored2:80 CLOSE_WAIT
tcp 24154 0 censored:60073 censored2:80 CLOSE_WAIT
tcp 24150 0 censored:60044 censored2:80 CLOSE_WAIT
tcp 24150 0 censored:60041 censored2:80 CLOSE_WAIT
tcp 24194 0 censored:60101 censored2:80 CLOSE_WAIT
tcp 24206 0 censored:60099 censored2:80 CLOSE_WAIT
tcp 24156 0 censored:60097 censored2:80 CLOSE_WAIT
tcp 23643 0 censored:60096 censored2:80 CLOSE_WAIT
tcp 23609 0 censored:60111 censored2:80 CLOSE_WAIT
tcp 23961 0 censored:60110 censored2:80 CLOSE_WAIT

Other connections are not allocated and the robot is waiting for something (the O/S does not report that the respective Java processes run). Any thoughts? Thank you Leo
Connection break
Hi. I write a robot for a search engine. The robot must harvest all files which are shorter than a few kilobytes (let's say 100kB) - longer files are not important, because they are often archives or long sheets about nothing. I cannot find a robust way in which I could drop a connection (GET over HTTP/1.0 and HTTP/1.1) when the incoming data stream exceeds the upper limit. I do it by closing the input stream, which is constructed by getResponseBodyAsStream, followed by releaseConnection. Is it OK? My second point is related to "retrying" as you have it in your docs (http://jakarta.apache.org/commons/httpclient/tutorial.html - the catch block of HttpRecoverableException). When I do something like this, I found out that I had to call method.recycle() in the catch block, or the connection was not reinitialized and everything failed. Could you enlighten me on this? Is it a bug in the guide? (I have tried it on 2.0-b1.) And my last point - when I run the robot under stress conditions, some connections seem to be frozen, although I use setConnectionTimeout. Is it a known issue? How should I debug it so that you can get a valuable log? It happens after 1-2 hours of run, so the log could have a few gigabytes... Thank you -g-
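[Editorial sketch] On the last point (connections that freeze despite setConnectionTimeout): the connection timeout only bounds how long establishing the TCP connection may take; a server that accepts the connection and then never sends data is bounded by the separate socket (SO_TIMEOUT) timeout. A sketch; method names are from the HttpClient 2.0-era API and the millisecond values are illustrative placeholders:

```java
import org.apache.commons.httpclient.HttpClient;

public class TimeoutSketch {
    public static HttpClient newClient() {
        HttpClient client = new HttpClient();
        // Max time to establish the TCP connection.
        client.setConnectionTimeout(30 * 1000);
        // Max time a single read may block (SO_TIMEOUT); this is what
        // bounds a connected-but-silent server, which the connection
        // timeout alone does not.
        client.setTimeout(60 * 1000);
        return client;
    }
}
```

With only the first timeout set, a crawler thread stuck in read() on a stalled server can hang indefinitely, which matches the frozen-connection symptom described above.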