[
https://issues.apache.org/jira/browse/DAEMON-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rustam Abdullaev updated DAEMON-377:
Attachment: daemon-377.patch
Attaching a patch that fixes a couple of races with pid file handling
(including this issue).
Now the controller will lock and check that the pid file hasn't changed before
deleting it.
Also fixed an issue with the version check that creates a pid file
unnecessarily and never removes one.
> Race in PID file handing in jsvc resulting in Tomcat running without a pidfile
> --
>
> Key: DAEMON-377
> URL: https://issues.apache.org/jira/browse/DAEMON-377
> Project: Commons Daemon
> Issue Type: Bug
>Reporter: Rustam Abdullaev
> Attachments: daemon-377.patch
>
>
> This issue is reproducible in FreeBSD under moderate load during a restart of
> Tomcat 8 running via jsvc.
> The old jsvc controller process exits _after_ the new process is started,
> deleting the new pid file with it. As a result, the jsvc starter process
> fails with a timeout since it is waiting on the pid file to be created, which
> never happens. The Tomcat process itself is *started without a pid file*.
> Example: (35659/35660 is the old jsvc process, 56362/56363 is the new jsvc
> process):
> {noformat}
> 2017-11-08 09:36:59 35660 jsvc debug: Daemon destroyed successfully
> 2017-11-08 09:36:59 35660 jsvc debug: Calling System.exit(0)
> 2017-11-08 09:36:59 56362 jsvc debug: Switching umask back to 022 from 077
> (((/var/run/tomcat8.pid is written by 56362 here)))
> 2017-11-08 09:36:59 56363 jsvc debug: Using specific JVM in
> /usr/local/openjdk8/jre/lib/amd64/server/libjvm.so
> 2017-11-08 09:36:59 56363 jsvc debug: Attemtping to load library
> /usr/local/openjdk8/jre/lib/amd64/server/libjvm.so
> (((/var/run/tomcat8.pid is deleted by 35659 here)))
> 2017-11-08 09:36:59 35659 jsvc debug: Service shut down
> 2017-11-08 09:36:59 56363 jsvc debug: JVM library
> /usr/local/openjdk8/jre/lib/amd64/server/libjvm.so loaded
> 2017-11-08 09:36:59 56363 jsvc debug: JVM library entry point found
> (0x019DE640)
> {noformat}
> Restart script eventually times out:
> {noformat}
> >/usr/local/etc/rc.d/tomcat8 restart
> Stopping tomcat8.
> Waiting for PIDS: 35660.
> Starting tomcat8.
> /usr/local/etc/rc.d/tomcat8: WARNING: failed to start tomcat8
> {noformat}
> No PID file:
> {noformat}
> >ls -l /var/run/tomcat8.pid
> ls: /var/run/tomcat8.pid: No such file or directory
> {noformat}
> Yet Tomcat is running:
> {noformat}
> >ps ax|grep java|grep -v grep
> 56362 - Is 0:00.00 /usr/local/bin/jsvc -java-home /usr/local/openjdk8
> -server -user www -pidfile /var/run/tomcat8.pid -wait 300 -outfile /u
> 56363 - I 0:57.25 /usr/local/bin/jsvc -java-home /usr/local/openjdk8
> -server -user www -pidfile /var/run/tomcat8.pid -wait 300 -outfile /u
> {noformat}
> The issue is that the pidfile contains the PID of the _child_, but is being
> deleted by the _parent_ process (the controller), in the {{run_controller}}
> function which looks like:
> {code}
> static int run_controller(arg_data *args, home_data *data, uid_t uid,
> gid_t gid)
> . . .
> waitpid(pid, &status, 0);
> unlink(args->pidf);
> {code}
> If the controller process is paged out (which happens often because it is
> dormant while inside {{waitpid}}), then considerable amount of time can pass
> between the time the child terminates and the call to {{unlink(args->pidf)}}.
> The issue can be reproduced reliably by adding {{sleep(1);}} before
> {{unlink(args->pidf)}}.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)