Re: [Resin-interest] Resin 4.0.9 release
Hi Jan,

Did your Resin server have heavy traffic? Did you observe any memory leaks or heap overflows?

-Wesley
___
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest
Re: [Resin-interest] Resin 4.0.9 release
Hi Wesley,

> Did your Resin server have heavy traffic? Did you observe any memory leaks or heap overflows?

Neither.

Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Out of curiosity, do you have multiple instances of Resin (separate JVMs) running on that same virtual machine? If so, do you use different ports for the watchdog on both instances? We had a similar situation, and it was fixed by just running the watchdog on a separate port for each instance of Resin. Probably not your situation, but thought I'd ask.

Aaron

On 8/13/2010 12:27 AM, Jan Kriesten wrote:
> Hi Scott,
>
>> I've put up a new snapshot. On a restart, you should see additional logging information in both the watchdog-manager.log and the jvm-default.log that should help narrow this down.
>
> The only new entry in the jvm-default.log on restart is this:
>
> [06:44:16.969] {main} ProResin[id=] started in 81883ms
> WarningService: Stopping Resin because ping did not complete in time.
> [07:13:56.772] {resin-shutdown} ProServer[id=,cluster=app-tier] stopping
> Shutdown Resin reason: HEALTH
>
> Almost the same is showing up in watchdog-manager.log:
>
> [2010/08/13 07:13:56.816] Watchdog received warning from Resin[1,pid=32039]: Stopping Resin because ping did not complete in time.
> [2010/08/13 07:13:58.579] Watchdog detected close of Resin[,pid=32039] exit reason: HEALTH (exit code=8)
> [2010/08/13 07:13:58.579] Watchdog starting Resin[]
>
> It's really strange, though, that it happens almost exactly every 30 minutes. The behavior didn't occur with 4.0.7 or previous versions.
>
> Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Hi Aaron,

> Out of curiosity, do you have multiple instances of Resin (separate JVMs) running on that same virtual machine? If so, do you use different ports for the watchdog on both instances? We had a similar situation, and it was fixed by just running the watchdog on a separate port for each instance of Resin. Probably not your situation, but thought I'd ask.

No, it's just one instance.

Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Jan Kriesten wrote:
> Hi Scott,
>
>> I've put up a new snapshot. On a restart, you should see additional logging information in both the watchdog-manager.log and the jvm-default.log that should help narrow this down.
>
> The only new entry in the jvm-default.log on restart is this:
>
> [06:44:16.969] {main} ProResin[id=] started in 81883ms
> WarningService: Stopping Resin because ping did not complete in time.
> [07:13:56.772] {resin-shutdown} ProServer[id=,cluster=app-tier] stopping
> Shutdown Resin reason: HEALTH
>
> Almost the same is showing up in watchdog-manager.log:
>
> [2010/08/13 07:13:56.816] Watchdog received warning from Resin[1,pid=32039]: Stopping Resin because ping did not complete in time.
> [2010/08/13 07:13:58.579] Watchdog detected close of Resin[,pid=32039] exit reason: HEALTH (exit code=8)
> [2010/08/13 07:13:58.579] Watchdog starting Resin[]

Excellent. That's exactly the kind of information I was hoping the log would provide.

The HEALTH exit reason means the exit was caused by a failed ping/health check. If Resin had detected an OOM or thread problem, the watchdog log would have shown MEMORY or THREAD instead.

> It's really strange, though, that it happens almost exactly every 30 minutes. The behavior didn't occur with 4.0.7 or previous versions.

As a workaround, you can disable the ping (or PingThread) until I figure out why that's happening.

-- Scott

> Best regards, --- Jan.
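[Editor's sketch] The exit reasons Scott lists can be pulled out of a watchdog log line mechanically. The following is only a minimal sketch, assuming the line format matches the single watchdog-manager.log sample quoted above; the class name, the regular expression, and the description strings are illustrative assumptions and not part of Resin.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WatchdogExitReason {
    // Pattern inferred from the one sample line in this thread (an assumption,
    // not a documented Resin log format).
    private static final Pattern EXIT =
        Pattern.compile("exit reason: (\\w+) \\(exit code=(\\d+)\\)");

    /** Returns the exit reason token (e.g. HEALTH) if the text contains one. */
    static Optional<String> reason(String logText) {
        Matcher m = EXIT.matcher(logText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Maps the three reasons mentioned in the thread to a short description. */
    static String describe(String reason) {
        switch (reason) {
            case "HEALTH": return "failed ping/health check";
            case "MEMORY": return "out-of-memory condition detected";
            case "THREAD": return "thread problem detected";
            default:       return "unrecognized exit reason";
        }
    }

    public static void main(String[] args) {
        String sample = "[2010/08/13 07:13:58.579] Watchdog detected close of "
            + "Resin[,pid=32039] exit reason: HEALTH (exit code=8)";
        reason(sample).ifPresent(r -> System.out.println(r + ": " + describe(r)));
    }
}
```

Run against a whole watchdog-manager.log, a helper like this would make it easy to see whether every restart carries the same reason.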
Re: [Resin-interest] Resin 4.0.9 release
Hi Scott,

> As a workaround, you can disable the ping (or PingThread) until I figure out why that's happening.

As an idea, I have the following in resin.xml from an old default:

    <resin:if test="${resin.professional}">
      <ping>
        <!-- <url>http://localhost:8080/test-ping.jsp</url> -->
      </ping>
    </resin:if>

So the ping configuration is actually empty. Maybe this is treated differently with the latest releases?

Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Hi Scott,

> I've put up a new snapshot. On a restart, you should see additional logging information in both the watchdog-manager.log and the jvm-default.log that should help narrow this down.

The only new entry in the jvm-default.log on restart is this:

[06:44:16.969] {main} ProResin[id=] started in 81883ms
WarningService: Stopping Resin because ping did not complete in time.
[07:13:56.772] {resin-shutdown} ProServer[id=,cluster=app-tier] stopping
Shutdown Resin reason: HEALTH

Almost the same is showing up in watchdog-manager.log:

[2010/08/13 07:13:56.816] Watchdog received warning from Resin[1,pid=32039]: Stopping Resin because ping did not complete in time.
[2010/08/13 07:13:58.579] Watchdog detected close of Resin[,pid=32039] exit reason: HEALTH (exit code=8)
[2010/08/13 07:13:58.579] Watchdog starting Resin[]

It's really strange, though, that it happens almost exactly every 30 minutes. The behavior didn't occur with 4.0.7 or previous versions.

Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Jan Kriesten wrote:
> Hi Scott,
>
>> It's not simply the ping because the restart isn't happening here. I'm working on improving the logging on both the Resin and Watchdog to get better information about this.
>
> That would be helpful. Funny thing is, it only happens on one of the machines. The configuration is the same on both; the only differences are
> a) the number of virtual hosts (104 to 2)
> b) the number of provided database JNDI connections (57 to 5)
>
> Maybe this helps?

The extra logging information will help. (I'm hoping to get a snapshot out today with the additional logging.)

I'm not sure about the difference. Both of those do have associated timers/alarms, so if there's a timer problem, it's more likely to show up on the server with more of them.

-- Scott

> Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Jan Kriesten wrote:
> Hi Scott,
>
>> It's not simply the ping because the restart isn't happening here. I'm working on improving the logging on both the Resin and Watchdog to get better information about this.
>
> That would be helpful. Funny thing is, it only happens on one of the machines. The configuration is the same on both; the only differences are
> a) the number of virtual hosts (104 to 2)
> b) the number of provided database JNDI connections (57 to 5)

I've put up a new snapshot. On a restart, you should see additional logging information in both the watchdog-manager.log and the jvm-default.log that should help narrow this down.

-- Scott

> Maybe this helps?
>
> Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Hi Scott,

the snapshot hasn't solved the restart problem on our server:

Resin Professional 4.0.s100809 (built Mon, 09 Aug 2010 11:41:02 PDT)

[07:19:42.671] {main} ProResin[id=] started in 91076ms
[...]
[07:50:33.998] {main} ProResin[id=] started in 77686ms
[...]
[08:21:38.785] {main} ProResin[id=] started in 77336ms
[...]
[08:52:42.604] {main} ProResin[id=] started in 76134ms

So it seems to be every 30 minutes. :-/

The watchdog-manager.log is cluttered with the following entries - but I don't know what to make of them:

[2010/08/10 07:58:40.775] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[HostController[null]]] 30116ms
[2010/08/10 07:58:40.775] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[ProServer[id=,cluster=]]] 30117ms
[2010/08/10 07:59:10.775] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[com.caucho.network.listen.SocketLinklistener$suspendrea...@71ce5e7a]] 60111ms
[2010/08/10 07:59:10.775] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[NetworkListenService[]]] 60110ms
[2010/08/10 07:59:40.775] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm alarm[alarm[com.caucho.boot.watchdogmana...@239cd5f5]] 90103ms
[2010/08/10 07:59:40.776] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm alarm[alarm[com.caucho.log.rotatestr...@4ab34646]] 66890ms
[2010/08/10 08:00:10.775] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[WebAppController$2034408626[null]]] 6ms
[2010/08/10 08:00:10.776] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[SessionManager[]]] 60001ms
[2010/08/10 08:00:40.775] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[HostController[null]]] 6ms
[2010/08/10 08:00:40.776] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[ProServer[id=,cluster=]]] 60001ms
[...]
[2010/08/10 11:11:10.665] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[SessionManager[]]] 59941ms
[2010/08/10 11:11:10.724] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[HostController[null]]] 30059ms
[2010/08/10 11:11:40.665] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[ProServer[id=,cluster=]]] 59941ms
[2010/08/10 11:11:40.724] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[NetworkListenService[]]] 6ms
[2010/08/10 11:12:10.665] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[com.caucho.network.listen.SocketLinklistener$suspendrea...@71ce5e7a]] 6ms
[2010/08/10 11:12:10.724] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm alarm[alarm[com.caucho.boot.watchdogmana...@239cd5f5]] 6ms
[2010/08/10 11:12:40.665] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[WebAppController$2034408626[null]]] 59941ms
[2010/08/10 11:12:40.724] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[SessionManager[]]] 30059ms

Other errors are not reported by Resin, though.

Best regards, --- Jan.
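[Editor's sketch] The "every 30 minutes" observation can be checked by subtracting consecutive "started" timestamps from the jvm-default.log excerpt in Jan's message. This is a minimal sketch using only the four timestamps quoted there; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RestartIntervals {
    /** Returns the gap between each pair of consecutive start times (same day assumed). */
    static List<Duration> gaps(List<String> startTimes) {
        List<Duration> result = new ArrayList<>();
        for (int i = 1; i < startTimes.size(); i++) {
            LocalTime previous = LocalTime.parse(startTimes.get(i - 1));
            LocalTime next = LocalTime.parse(startTimes.get(i));
            result.add(Duration.between(previous, next));
        }
        return result;
    }

    public static void main(String[] args) {
        // Timestamps copied from the "started in ...ms" lines quoted in the message.
        List<String> starts = Arrays.asList(
            "07:19:42.671", "07:50:33.998", "08:21:38.785", "08:52:42.604");
        for (Duration gap : gaps(starts)) {
            System.out.println(gap.getSeconds() + " s"); // 1851 s, 1864 s, 1863 s
        }
    }
}
```

The gaps come out at roughly 31 minutes each. Since every "started" line also reports a startup time of about 76 to 91 seconds, this is consistent with the server running for close to 30 minutes before each restart.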
Re: [Resin-interest] Resin 4.0.9 release
Jan Kriesten wrote:
> Hi Scott,
>
> the snapshot hasn't solved the restart problem on our server:
>
> Resin Professional 4.0.s100809 (built Mon, 09 Aug 2010 11:41:02 PDT)
>
> [07:19:42.671] {main} ProResin[id=] started in 91076ms
> [...]
> [07:50:33.998] {main} ProResin[id=] started in 77686ms
> [...]
> [08:21:38.785] {main} ProResin[id=] started in 77336ms
> [...]
> [08:52:42.604] {main} ProResin[id=] started in 76134ms
>
> So it seems to be every 30 minutes. :-/

Thanks. It's not simply the ping because the restart isn't happening here. I'm working on improving the logging on both the Resin and Watchdog to get better information about this.

> The watchdog-manager.log is cluttered with the following entries - but I don't know what to make of them:
>
> [2010/08/10 07:58:40.775] com.caucho.util.alarm$coordinatorthr...@3ce95a56 slow alarm Alarm[alarm[HostController[null]]] 30116ms

That may be related. When a Resin alarm (internal timer) is woken, it compares the current time against the time it expected to be woken up. If the delay is too big (over 30s), it prints that warning. If there's a big GC or 100% CPU, that might be expected, but that message is in the watchdog, so there's no good reason for the delay.

-- Scott

> [...]
>
> Other errors are not reported by Resin, though.
>
> Best regards, --- Jan.
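[Editor's sketch] Scott's description of the "slow alarm" warning (compare the actual wake-up time with the expected one and warn when the delay exceeds 30 seconds) amounts to a check like the one below. This is not Resin's code: the class, the method names, and the exact comparison are assumptions made for illustration, with only the 30-second threshold taken from the message.

```java
final class SlowAlarmCheck {
    // "over 30s" according to Scott's description; the constant name is hypothetical.
    static final long SLOW_THRESHOLD_MS = 30_000L;

    /** How late the alarm fired, in milliseconds (negative if it fired early). */
    static long delayMs(long expectedWakeMs, long actualWakeMs) {
        return actualWakeMs - expectedWakeMs;
    }

    /** True when the alarm fired more than the threshold after its expected time. */
    static boolean isSlow(long expectedWakeMs, long actualWakeMs) {
        return delayMs(expectedWakeMs, actualWakeMs) > SLOW_THRESHOLD_MS;
    }

    public static void main(String[] args) {
        long expected = 1_000_000L;          // arbitrary example timestamp
        long actual = expected + 30_116L;    // the first delay value in Jan's log
        if (isSlow(expected, actual)) {
            System.out.println("slow alarm " + delayMs(expected, actual) + "ms");
        }
    }
}
```

A long GC pause or a saturated CPU would delay the wake-up and trip a check of this kind, which is why Scott finds the warning surprising in the otherwise idle watchdog process.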
Re: [Resin-interest] Resin 4.0.9 release
Hi Scott,

> It's not simply the ping because the restart isn't happening here. I'm working on improving the logging on both the Resin and Watchdog to get better information about this.

That would be helpful. Funny thing is, it only happens on one of the machines. The configuration is the same on both; the only differences are
a) the number of virtual hosts (104 to 2)
b) the number of provided database JNDI connections (57 to 5)

Maybe this helps?

Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Jan Kriesten wrote:
> Hi Scott,
>
>> I assume there's nothing in the jvm-default.log about why Resin's exiting?
>
> No, nothing.

Hmm. Does the postmortem tab in /resin-admin show any unusual memory or thread behavior? Or were any hs_* files created?

One of the things I want to do soon for Resin is to combine all the exit methods into one, so Resin restart is completely centralized.

-- Scott

> Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Jan Kriesten wrote:
> Hi Scott,
>
>> Still, since these are core fixes to critical timing-related capabilities that are hard to test exhaustively, you may want to take extra care in your own testing before deploying on 4.0.9.
>
> At first, Resin 4.0.9 looked fine. But now I see in the jvm-default.log that Resin is starting/stopping about every half hour?! There are no notices in the watchdog log, though.
>
> Any hints on what's happening here? It doesn't happen with 4.0.7 (I can't check with 4.0.8 because it has the UTF8 buffer problem).

I assume there's nothing in the jvm-default.log about why Resin's exiting?

I'll see if I can reproduce it here.

-- Scott

> Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Hi Scott,

> I assume there's nothing in the jvm-default.log about why Resin's exiting?

No, nothing.

Best regards, --- Jan.
Re: [Resin-interest] Resin 4.0.9 release
Hi Scott,

> Still, since these are core fixes to critical timing-related capabilities that are hard to test exhaustively, you may want to take extra care in your own testing before deploying on 4.0.9.

At first, Resin 4.0.9 looked fine. But now I see in the jvm-default.log that Resin is starting/stopping about every half hour?! There are no notices in the watchdog log, though.

Any hints on what's happening here? It doesn't happen with 4.0.7 (I can't check with 4.0.8 because it has the UTF8 buffer problem).

Best regards, --- Jan.
[Resin-interest] Resin 4.0.9 release
Just an important note on Resin 4.0.9.

Most of the work for this release was related to low-level improvements and fixes to core capabilities like the locking for distributed caching, replacing synchronization with atomic locks in the core thread dispatching, improving the core alarm/timer functionality, fixing the watchdog/cluster authentication, and organizing the core Resin startup code to fix some startup timing problems.

We added extra tests for those capabilities, and added several automatic stress tests to our nightly check.

Still, since these are core fixes to critical timing-related capabilities that are hard to test exhaustively, you may want to take extra care in your own testing before deploying on 4.0.9.

-- Scott
Re: [Resin-interest] Resin 4.0.9 release
Oh my god Scott, you rule!

    /**
     * Returns the current environment container.
     */
    public static InjectManager getCurrent(ClassLoader loader)
    {
      return _localContainer.get(loader);
    }

Finally the synchronized (_localContainer) is gone! You have no idea how I suffered from the locking problem of previous Resin 4.0.x releases. My heavy-traffic website hung several times a day. Thanks a lot! I'll try 4.0.9 immediately.

-Wesley

2010/7/31 Scott Ferguson f...@caucho.com
> Just an important note on Resin 4.0.9.
>
> Most of the work for this release was related to low-level improvements and fixes to core capabilities like the locking for distributed caching, replacing synchronization with atomic locks in the core thread dispatching, improving the core alarm/timer functionality, fixing the watchdog/cluster authentication, and organizing the core Resin startup code to fix some startup timing problems.
>
> We added extra tests for those capabilities, and added several automatic stress tests to our nightly check.
>
> Still, since these are core fixes to critical timing-related capabilities that are hard to test exhaustively, you may want to take extra care in your own testing before deploying on 4.0.9.
>
> -- Scott
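[Editor's sketch] To illustrate why dropping a synchronized block from a hot lookup path matters under heavy traffic, here is a minimal sketch of a per-class-loader lookup whose reads take no lock. It is not Resin's implementation: the thread does not show the type of _localContainer, and the LoaderRegistry class and its ConcurrentHashMap backing are assumptions chosen for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LoaderRegistry<T> {
    // Hypothetical stand-in for a per-class-loader container table.
    // Note: ConcurrentHashMap rejects null keys, so a null (bootstrap) loader
    // is not supported here, and keys are held strongly.
    private final Map<ClassLoader, T> byLoader = new ConcurrentHashMap<>();

    /** Registers the value for a class loader. */
    void set(ClassLoader loader, T value) {
        byLoader.put(loader, value);
    }

    /** Looks up the value without a synchronized block; reads do not block each other. */
    T getCurrent(ClassLoader loader) {
        return byLoader.get(loader);
    }

    public static void main(String[] args) {
        LoaderRegistry<String> registry = new LoaderRegistry<>();
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        registry.set(loader, "app-container");
        System.out.println(registry.getCurrent(loader)); // prints "app-container"
    }
}
```

With a synchronized lookup, every request thread would queue on the same monitor. A concurrent map lets readers proceed in parallel, which is the kind of contention relief Wesley describes.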
Re: [Resin-interest] Resin 4.0.9 release
Do you have any release notes for 4.0.9 yet?

On Jul 30, 2010, at 11:29:05, Scott Ferguson wrote:
> Just an important note on Resin 4.0.9.
>
> Most of the work for this release was related to low-level improvements and fixes to core capabilities like the locking for distributed caching, replacing synchronization with atomic locks in the core thread dispatching, improving the core alarm/timer functionality, fixing the watchdog/cluster authentication, and organizing the core Resin startup code to fix some startup timing problems.
>
> We added extra tests for those capabilities, and added several automatic stress tests to our nightly check.
>
> Still, since these are core fixes to critical timing-related capabilities that are hard to test exhaustively, you may want to take extra care in your own testing before deploying on 4.0.9.
>
> -- Scott