[ 
https://issues.apache.org/jira/browse/HDFS-7708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129769#comment-15129769
 ] 

Vinayakumar B commented on HDFS-7708:
-------------------------------------

bq. However, this approach isn't going to work here. The bash script execs 
java. That means that any bash commands after the exec are not relevant, since 
the bash process is replaced by the Java process. If you want a signal handler 
or an atexit handler, you need to do it from java.

IMO, there is no need to handle this in the start-{daemon}.sh script or from 
java. Each daemon has its own stop script provided along with it, so it is 
better to handle removal of the daemon's pid file there.
For example, for the balancer: whether the balancer is still running or has 
exited after re-balancing, leave the pid file in place. On error, the stop 
script can be called, which internally checks the pid file and whether the 
corresponding command is still running. If some other process is running with 
the same pid as in the pid file, the script can just remove the pid file 
instead of killing the other process. 
Whether the process is the same command we are trying to stop can be 
determined by checking for {{-Dproc_balancer}} in its command line.
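
A minimal sketch of such a stop-side check, to make the idea concrete. The 
function names {{cmdline_of}} and {{stop_daemon}} and the pid file path in the 
usage comment are hypothetical, not taken from the existing Hadoop scripts; 
only {{kill -0}} and the {{-Dproc_balancer}} marker come from the discussion 
above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not the actual Hadoop stop script.

# Print the command line of a pid, or nothing if it cannot be read.
# Assumption: use /proc when available, otherwise fall back to ps.
cmdline_of() {
  local pid=$1
  if [ -r "/proc/$pid/cmdline" ]; then
    tr '\0' ' ' < "/proc/$pid/cmdline"
  else
    ps -o args= -p "$pid" 2>/dev/null
  fi
}

# stop_daemon PIDFILE MARKER
# Kills the recorded pid only if it is alive AND its command line contains
# MARKER (for the balancer, -Dproc_balancer). The pid file is removed in
# every case, so a stale file never blocks the next start.
stop_daemon() {
  local pidfile=$1 marker=$2 pid
  [ -f "$pidfile" ] || { echo "no pid file"; return 0; }
  pid=$(cat "$pidfile")
  if kill -0 "$pid" 2>/dev/null && cmdline_of "$pid" | grep -qF -- "$marker"; then
    kill "$pid"
    echo "stopped $pid"
  else
    # Either nothing runs under this pid, or the pid was reused by an
    # unrelated process: leave that process alone.
    echo "stale pid file"
  fi
  rm -f "$pidfile"
}

# Example call (the path is an assumption):
#   stop_daemon /tmp/hadoop-hdfs-balancer.pid -Dproc_balancer
```

With this shape, a pid reused by an unrelated process only causes the pid file 
to be dropped; the other process is never signalled.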

This is a general problem for all daemons; I have seen some other Jiras 
facing the same problem as well.

> Balancer should delete its pid file when it completes rebalance
> ---------------------------------------------------------------
>
>                 Key: HDFS-7708
>                 URL: https://issues.apache.org/jira/browse/HDFS-7708
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: balancer & mover
>    Affects Versions: 2.6.0
>            Reporter: Akira AJISAKA
>            Assignee: Rakesh R
>         Attachments: HDFS-7708-002.patch, HDFS-7708.patch
>
>
> When the balancer completes rebalancing and exits, it does not delete its pid 
> file. When the balancer is started again, the script runs "kill -0 pid" to 
> confirm that the balancer process is not already running.
> The problem is: if another process is running with the same pid as `cat 
> pidfile`, the balancer fails to start with the following message:
> {code}
>   balancer is running as process 3443. Stop it first.
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
