On May 14, 2009, at 14:03, Scott Ferguson wrote:
On May 14, 2009, at 12:57 PM, Rob Lockstone wrote:
Environment: Resin Pro 3.1.9 (100 Server License) on 64-bit Windows
2003/08 Server with Java 1.5_18.
This bug <http://bugs.caucho.com/view.php?id=3418> is still present in
Resin Pro 3.1.9. I've already updated the bug, but figured I would
post here because I don't know how often bugs are read/updated.
Looking at the code, the socket timeout is only 1 s, which is pretty
short. The timeout is in com.caucho.boot.WatchdogProcess.runInstance.
Agreed. The logs indicate that the closingInstance() method is getting
called. But look at the timestamps of when that happens and when it
attempts to restart the WatchdogTask (which is when the problem occurs):
[2009/05/14 06:29:58.873] WatchdogProcess[Watchdog[],1] stopping Resin
[2009/05/14 06:30:01.827] java.lang.IllegalStateException: Can't start new task because of old task 'WatchdogTask[Watchdog[]]'
That's about 3 seconds, so I don't think the socket timeout is the
issue. It does get to the closeInstance() method.
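To double-check the "about 3 seconds" figure, a small sketch that parses the two log timestamps quoted above with java.text.SimpleDateFormat. The date pattern is an assumption read off the log prefix format; both timestamps are parsed in the same default time zone, so only their difference matters.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class LogGap {
    // Assumed pattern for the "[2009/05/14 06:29:58.873]" prefix in the log lines
    private static final String PATTERN = "yyyy/MM/dd HH:mm:ss.SSS";

    /** Returns the number of milliseconds between two log timestamps. */
    public static long gapMillis(String from, String to) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN);
        return fmt.parse(to).getTime() - fmt.parse(from).getTime();
    }

    public static void main(String[] args) throws ParseException {
        long gap = gapMillis("2009/05/14 06:29:58.873", "2009/05/14 06:30:01.827");
        System.out.println(gap + " ms"); // prints "2954 ms"
    }
}
```

So the restart attempt comes just under 3 s after "stopping Resin", well past a 1 s socket timeout.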
In the closingInstance() method, line 322 is:
int status = process.exitValue();
So the process itself must already have exited by that point:
exitValue() does not block, and it throws IllegalThreadStateException
if the process is still running. The destroy() method does a waitFor()
on the process, which is reasonable. However, that waitFor() isn't
guarded by any kind of maximum-wait-time thread, and Java 1.5's
Process API doesn't offer a waitFor() with a timeout.
In code I've written that uses Processes, I always wrap the Process
in a thread and specify the maximum amount of time I'm willing to wait
for the process to end. In theory, this could leak memory (and a
thread) if the process *never* ends, but at least that can be logged.
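The bounded-wait pattern described above can be sketched with Java 5-era APIs only: a helper thread blocks in Process.waitFor(), and the caller bounds its own wait with Thread.join(timeout). This is a minimal illustration of the general technique, not Resin's code; the class name ProcessWaiter and the null-on-timeout convention are assumptions made for the example.

```java
public class ProcessWaiter {

    /**
     * Waits at most timeoutMillis for the process to exit. Returns its exit
     * code, or null if it was still running when the timeout expired.
     */
    public static Integer waitFor(final Process process, long timeoutMillis)
            throws InterruptedException {
        if (timeoutMillis <= 0) {
            // Thread.join(0) means "wait forever", which defeats the purpose.
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        Thread waiter = new Thread("process-waiter") {
            public void run() {
                try {
                    process.waitFor();
                } catch (InterruptedException e) {
                    // The caller gave up; let this helper thread end quietly.
                }
            }
        };
        waiter.setDaemon(true); // a hung process must not keep the JVM alive
        waiter.start();
        waiter.join(timeoutMillis);
        try {
            return process.exitValue(); // does not block; throws if still running
        } catch (IllegalThreadStateException stillRunning) {
            waiter.interrupt(); // release the helper; the caller decides what to log or kill
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Example: run the current JVM's "java -version" and wait up to 10 s.
        String javaBin = System.getProperty("java.home") + "/bin/java";
        Process p = new ProcessBuilder(javaBin, "-version").start();
        Integer code = waitFor(p, 10000L);
        if (code == null) {
            System.err.println("process did not exit within 10 s; destroying it");
            p.destroy();
        } else {
            System.out.println("exit code " + code);
        }
    }
}
```

Interrupting the helper on timeout means a process that never exits costs a log line rather than a permanently blocked thread.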
But anyway, none of this explains why the WatchdogTask isn't being
set to null in the Watchdog (it is, but only in the kill() method),
and that means the same Watchdog instance is being reused.
That's where I'm getting a little lost in the code. I see the
WatchdogManager, and I see a few places where Watchdog instances get
added to the Map, but I don't see where they get removed. Should there
be some point where a Watchdog instance gets removed from the
WatchdogManager?
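To make the suspected failure mode concrete, here is a hypothetical model (WatchdogSketch is an invented name, and this is not Resin's Watchdog class) of an object that holds one task slot and refuses to start a new task while the slot is occupied. If the slot is cleared only in kill(), a task that ends some other way leaves a stale reference behind, and reusing the same instance produces exactly the "old task" IllegalStateException. The sketch clears the slot in a finally block whenever the task thread ends.

```java
public class WatchdogSketch {
    // Hypothetical model, NOT Resin's Watchdog: one task slot per instance.
    private Thread task;

    /** Starts body in a new task thread; fails if a previous task is still registered. */
    public synchronized void start(final Runnable body) {
        if (task != null) {
            throw new IllegalStateException(
                "Can't start new task because of old task '" + task.getName() + "'");
        }
        Thread t = new Thread("WatchdogTask") {
            public void run() {
                try {
                    body.run();
                } finally {
                    // Clear the slot on every exit path, not only in kill().
                    taskFinished(this);
                }
            }
        };
        task = t;
        t.start();
    }

    /** Explicit shutdown path: interrupt the task and forget it immediately. */
    public synchronized void kill() {
        if (task != null) {
            task.interrupt();
            task = null;
            notifyAll();
        }
    }

    private synchronized void taskFinished(Thread finished) {
        if (task == finished) { // a newer task may already own the slot after kill()
            task = null;
            notifyAll();
        }
    }

    public synchronized boolean isRunning() {
        return task != null;
    }

    /** Blocks until no task is registered or the timeout elapses; true if idle. */
    public synchronized boolean awaitIdle(long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (task != null) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false;
            }
            wait(remaining);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        WatchdogSketch w = new WatchdogSketch();
        w.start(new Runnable() { public void run() { System.out.println("first task"); } });
        w.awaitIdle(5000L);
        // Reusing the same instance works because the finished task cleared its slot.
        w.start(new Runnable() { public void run() { System.out.println("second task"); } });
        w.awaitIdle(5000L);
    }
}
```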
Then again, given that Resin is getting completely shut down, why is
there even a WatchdogManager and a list of Watchdogs to worry about?
Shouldn't all that disappear along with everything else?
Rob
-- Scott
Our deployment system uses the Windows SC (Service Controller)
commands to stop and then start Resin. I built in a five-second delay
between the time the sc query command reports that Resin has stopped
and the time the deployment system attempts to start it again. My
concern, of course, is that five seconds might not always be enough
time for the watchdog to exit completely.
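The stop, poll, delay, start sequence described above might look roughly like this in Java. It is only a sketch: the actual deployment system isn't shown, and the service name "resin", the 60-second stop deadline and the 1-second poll interval are assumptions. It relies on `sc query` printing a line such as `STATE : 1  STOPPED`. Like the original approach, it does not verify that the watchdog JVM itself has exited; the fixed five-second grace period is still a guess.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ServiceRestart {

    /** True if `sc query` output reports the service STATE as STOPPED. */
    public static boolean isStopped(String scQueryOutput) {
        for (String line : scQueryOutput.split("\\r?\\n")) {
            String t = line.trim();
            // STOP_PENDING does not contain "STOPPED", so it is not mistaken for done.
            if (t.startsWith("STATE") && t.contains("STOPPED")) {
                return true;
            }
        }
        return false;
    }

    /** Runs a command, returning its combined stdout/stderr once it exits. */
    private static String run(String... cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        } finally {
            r.close();
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String service = args.length > 0 ? args[0] : "resin"; // hypothetical service name
        run("sc", "stop", service); // returns quickly, typically while STOP_PENDING
        long deadline = System.currentTimeMillis() + 60000L; // assumed stop deadline
        while (!isStopped(run("sc", "query", service))) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException(service + " did not stop within 60 s");
            }
            Thread.sleep(1000L); // assumed poll interval
        }
        Thread.sleep(5000L); // the extra five-second grace period for the watchdog
        System.out.print(run("sc", "start", service));
    }
}
```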
The original reporter of this bug indicates that, on a busy machine,
he has to wait as long as 15 seconds. However, it's unclear to me
whether he confirmed that Resin had stopped before starting the
15-second delay. The five-second delay I put in kicks in *after* I've
confirmed that Resin has stopped, as reported by the sc query command.
Is there any way to know whether this five-second delay is a legitimate
hack? Waiting for the dev/QA cycle of the 3.1.10 release isn't going
to work on our timetable. I'm looking at the watchdog code now to see
if I can figure out a fix, but I'm afraid I won't be able to spend
much time on it.
Rob
_______________________________________________
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest