Re: [Resin-interest] Regarding bug 3418, Watchdog Startup Synchronization

Rob Lockstone Thu, 14 May 2009 14:45:01 -0700

On May 14, 2009, at 14:03, Scott Ferguson wrote:

On May 14, 2009, at 12:57 PM, Rob Lockstone wrote:

Environment: Resin Pro 3.1.9 (100 Server License) on 64-bit Windows
2003/08 Server with Java 1.5_18.

This bug <http://bugs.caucho.com/view.php?id=3418> is still present in
Resin Pro 3.1.9. I've already updated the bug, but figured I would
post here because I don't know how often bugs are read/updated.


Looking at the code, the socket timeout is only 1s, which is pretty
short.  The timeout is in com.caucho.boot.WatchdogProcess.runInstance

Agreed. The logs indicate that the closingInstance() method is getting called. But look at the time stamps between the time when that happens and the time when it attempts to restart the WatchdogTask (which is when the problem occurs):


[2009/05/14 06:29:58.873] WatchdogProcess[Watchdog[],1] stopping Resin

[2009/05/14 06:30:01.827] java.lang.IllegalStateException: Can't start new task because of old task 'WatchdogTask[Watchdog[]]'

That's about 3 seconds. So I don't think the socket time out is an issue. It does get to the closeInstance() method.


In the closingInstance() method, line 322 is:

        int status = process.exitValue();

So that means it has to wait for the process itself to exit. The destroy() method does a waitFor() on the process, which is reasonable. However, that waitFor() isn't encapsulated by any kind of maximum wait time thread. Java doesn't offer a waitFor() method with a wait time. In code that I've written that uses Processes, I always encapsulate the Process inside a thread and specify a maximum amount of time that I'm willing to wait for the process to end. In theory, this could lead to memory leaks if the process *never* ends. But at least that information can be logged.
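
A minimal sketch of the kind of bounded wait described above. This is not Resin code: the class name BoundedProcessWait, the method name, and the "null means timed out" convention are all made up for illustration, and it only uses calls available in Java 1.5 (Thread.join with a timeout, Process.waitFor).

```java
// Hypothetical helper, not part of Resin: wraps Process.waitFor() in a
// thread so the caller can give up after a maximum wait time.
public final class BoundedProcessWait {
    private BoundedProcessWait() {}

    /**
     * Waits up to timeoutMillis for the process to exit.
     * Returns the exit code, or null if the process was still running
     * when the timeout expired (the caller can then log it).
     */
    public static Integer waitFor(final Process process, long timeoutMillis)
            throws InterruptedException {
        if (timeoutMillis <= 0) {
            // Thread.join(0) would wait forever, which defeats the purpose.
            throw new IllegalArgumentException("timeoutMillis must be > 0");
        }

        final Integer[] exitCode = new Integer[1];

        Thread waiter = new Thread(new Runnable() {
            public void run() {
                try {
                    exitCode[0] = Integer.valueOf(process.waitFor());
                } catch (InterruptedException e) {
                    // Interrupted by the timeout path below; leave exitCode null.
                }
            }
        }, "process-waiter");
        waiter.setDaemon(true); // never keep the JVM alive just for this wait
        waiter.start();

        waiter.join(timeoutMillis);
        if (waiter.isAlive()) {
            // Process has not exited in time; stop waiting on it.
            waiter.interrupt();
            return null;
        }
        return exitCode[0];
    }
}
```

A caller would check for null, log that the child did not exit within the limit, and decide whether to call destroy() again or give up, rather than blocking indefinitely.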

But anyway, none of this explains why the specified WatchdogTask isn't being set to null in the Watchdog (it does, but only in the kill() method), so that means the same Watchdog instance is getting re-used. That's where I'm getting a little lost in the code. I see the WatchdogManager, and I see that there are a few places that Watchdog instances get added to the Map, but I don't see where they get removed. Should there be some point where a Watchdog instance gets removed from the WatchdogManager?

Then again, given that resin is getting completely shut down, why is there even a WatchdogManager and list of Watchdogs to worry about? Shouldn't all that disappear along with everything else?

Rob



-- Scott



Our deployment system uses the Windows SC (Service Controller)
commands to stop and then start resin. I built in a five second delay
between the time the SC query command notifies me that resin has
stopped and the time that it attempts to start it up again. My
concern, of course, is that five seconds might not always be enough
time for the watchdog to completely exit.

The original reporter of this bug indicates that, on a busy machine,
he has to wait as long as 15 seconds. However, it's unclear to me if
he's confirmed that resin has stopped before initiating the 15 second
delay. The five second delay I put in kicks in *after* I've confirmed
that resin has stopped as reported by the sc query command.

Is there any way to know if this five second delay is a legitimate
hack? Waiting for the dev/QA cycle for the 3.1.10 release isn't going
to work for us on our time table. I'm looking at the watchdog code now
to see if I can figure out a fix, but I'm not going to be able to
spend too much time on it, I'm afraid.

Rob



_______________________________________________
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest




