[ 
https://issues.apache.org/jira/browse/AMBARI-13007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741747#comment-14741747
 ] 

Hudson commented on AMBARI-13007:
---------------------------------

SUCCESS: Integrated in Ambari-trunk-Commit #3430 (See 
[https://builds.apache.org/job/Ambari-trunk-Commit/3430/])
AMBARI-13007: Stopping ambari-server may kill ambari-agent running on the same 
machine in some cases (Nahappan Somasundaram via jluniya) (jluniya: 
http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=02fcea6c3c38f2fff4e73bdc3d235cf87477d4e2)
* ambari-server/src/main/python/ambari_server_main.py


> Stopping ambari-server may kill ambari-agent running on the same machine in 
> some cases
> --------------------------------------------------------------------------------------
>
>                 Key: AMBARI-13007
>                 URL: https://issues.apache.org/jira/browse/AMBARI-13007
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.2.0
>            Reporter: Nahappan Somasundaram
>            Assignee: Nahappan Somasundaram
>             Fix For: 2.2.0
>
>
> I launch multinode Ambari clusters using a simple Python script. It logs in 
> to every node via ssh and runs the following shell script:
> {code}
> #!/usr/bin/env bash
> while [[ $# -gt 0 ]]
> do
>   key="$1"
>   case ${key} in
>       --server)
>         ASERVER="$2"        # Server hostname
>         shift # past argument
>       ;;
>       --noserver)
>         NOSERVER="NOSERVER"  # Don't install/start server
>       ;;
>       *)
>         echo unknown option
>         exit 1
>       ;;
>   esac
>   shift # past argument or value
> done
> yum clean all
> curl http://s3.amazonaws.com/dev.hortonworks.com/ambari/centos6/2.x/latest/trunk/ambaribn.repo > /etc/yum.repos.d/ambari.repo
> # server
> if [ "${ASERVER}" = "$(hostname -f)" ] && [ -z "${NOSERVER}" ] ; then
>   yum install sudo postgresql-server wget -y
>   rpm -i /tmp/rpms/ambari-server*.rpm
>   # Disable iptables
>   iptables -F
>   ambari-server setup -s
>   # Enable remote debug
>   sed -rie 's/-server -XX:NewRatio/-server -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -XX:NewRatio/g'  /usr/sbin/ambari_server_main.py
>   ## Sleep until debugger connects
>   # sed -rie 's/dt_socket,server=y,suspend=.,address=5005/dt_socket,server=y,suspend=y,address=5005/g' /usr/sbin/ambari-server.py
>   # Fix an issue with UI client version
>   gunzip /usr/lib/ambari-server/web/javascripts/app.js.gz
>   amb=$(ambari-server --version); sed -i "s/App\.version = '';/App.version = '$amb';/" /usr/lib/ambari-server/web/javascripts/app.js
>   gzip /usr/lib/ambari-server/web/javascripts/app.js
>   # Increase task timeout
>   sed -ri 's/agent.package.install.task.timeout=1800/agent.package.install.task.timeout=3600/g' /etc/ambari-server/conf/ambari.properties
>   find /var/lib/ambari-server/resources/ -name metainfo.xml | xargs -L 1 sed -ri 's/<timeout>[[:digit:]]+[[:digit:]]*<\//<timeout>1800<\//g'
>   # Start the server
>   ambari-server start -v || exit 1
> fi
> # Agent
> iptables -F
> yum clean all
> yum install -y wget
> rpm -i /tmp/rpms/ambari-agent*.rpm
> # Replace server hostname
> sed -rie "s/hostname=localhost/hostname=$ASERVER/g" /etc/ambari-agent/conf/ambari-agent.ini
> # Enable debug mode at agent
> # sed -rie 's/=INFO/=DEBUG/g' /etc/ambari-agent/conf/ambari-agent.ini
> ambari-agent start || exit 1
> {code}
> When I restart ambari-server, the agent running on the same node is killed 
> 100% of the time. This happens because the agent is launched in the same 
> process group as ambari-server, and ambari-server kills everything that 
> belongs to its process group. I assume this situation is common when 
> ambari-server and ambari-agent are launched from the same shell script via 
> ssh, and possibly also via configuration management tools such as Puppet or 
> Chef (I did not check this assumption).
> *More info:*
> {code}
> [root@dlysnichenko-ru3-1 ~]# ps -ejH
>   PID  PGID   SID TTY          TIME CMD
>  1584  1584  1584 ?        00:00:00   sshd
>  2659  2659  2659 ?        00:00:00     sshd
>  2662  2662  2662 pts/0    00:00:00       bash
>  3268  3268  2662 pts/0    00:00:00         ps
>  2056  2041  2041 ?        00:00:00   postmaster
>  2058  2058  2058 ?        00:00:00     postmaster
>  2060  2060  2060 ?        00:00:00     postmaster
>  2061  2061  2061 ?        00:00:00     postmaster
>  2062  2062  2062 ?        00:00:00     postmaster
>  2063  2063  2063 ?        00:00:00     postmaster
>  2380  2380  2380 ?        00:00:00     postmaster
>  2397  2397  2397 ?        00:00:00     postmaster
>  2649  2649  2649 ?        00:00:01     postmaster
>  2654  2654  2654 ?        00:00:00     postmaster
>  2655  2655  2655 ?        00:00:00     postmaster
>  2656  2656  2656 ?        00:00:00     postmaster
>  2360  1644  1644 ?        00:00:59   java
>  2507  1644  1644 ?        00:00:00   python2.6
>  2515  1644  1644 ?        00:00:01     python2.6
>  3230  3230  3230 ?        00:00:00   anacron
> [root@dlysnichenko-ru3-1 ~]# ambari-agent status
> Found ambari-agent PID: 2515
> ambari-agent running.
> Agent PID at: /var/run/ambari-agent/ambari-agent.pid
> Agent out at: /var/log/ambari-agent/ambari-agent.out
> Agent log at: /var/log/ambari-agent/ambari-agent.log
> [root@dlysnichenko-ru3-1 ~]# ambari-server stop
> Using python  /usr/bin/python2.6
> Stopping ambari-server
> Ambari Server stopped
> [root@dlysnichenko-ru3-1 ~]# ambari-agent status
> Found ambari-agent PID: 2515
> ambari-agent not running. Stale PID File at: /var/run/ambari-agent/ambari-agent.pid
> [root@dlysnichenko-ru3-1 ~]# 
> {code}
> Note: the agent and the server share the same process group, 1644. We should 
> either not kill the whole process group when stopping ambari-server, or 
> create a dedicated process group when launching it.
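
The second option in the note above (a dedicated process group at launch) can be sketched roughly as follows. This is only an illustration, assuming Python 3 on a POSIX system; it is not necessarily what the commit to ambari_server_main.py does. The helper names launch_in_own_group and stop_group are hypothetical, and a sleeping Python interpreter stands in for the server process. On Python 2, passing preexec_fn=os.setsid to Popen has a similar effect.

```python
import os
import signal
import subprocess
import sys


def launch_in_own_group(cmd):
    """Start cmd as the leader of a new session, and thus of a new process group."""
    # start_new_session=True makes the child call setsid() before exec,
    # so its PGID equals its own PID instead of the launcher's PGID.
    return subprocess.Popen(cmd, start_new_session=True)


def stop_group(proc):
    """Signal only the child's own process group, then reap the child."""
    os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    proc.wait()


if __name__ == "__main__":
    # Hypothetical stand-in for the long-running server process.
    child = launch_in_own_group(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("launcher PGID:", os.getpgrp())
    print("child PGID:", os.getpgid(child.pid))
    stop_group(child)
    print("child exit status:", child.returncode)
```

Because the child leads its own group, signalling that group cannot reach other processes started from the same shell script, such as an agent launched a few lines later.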



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
