[jira] [Commented] (SLIDER-1055) hbase-daemon executed by slider is excepted from nodemanager container monitoring

kyungwan nam (JIRA) Tue, 16 Feb 2016 00:28:25 -0800

    [ https://issues.apache.org/jira/browse/SLIDER-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148266#comment-15148266 ]


kyungwan nam commented on SLIDER-1055:
--------------------------------------

I created a new class to replace ProcfsBasedProcessTree.
ProcfsBasedProcessTree adds only the children of the container's root process 
to the process tree.
In addition to those children, the new class also adds every process whose 
pgrpid is the same as the pgrpid of the root process.
I then set "yarn.nodemanager.container-monitor.process-tree.class" to the new 
class.
After restarting the NM, the problem is fixed, as the following log shows.

{code}
2016-02-16 12:33:03,899 DEBUG NewProcfsBasedProcessTree 
(NewProcfsBasedProcessTree.java:updateProcessTree(288)) - [ 29794 29779 29564 
29559 ]
2016-02-16 12:33:03,899 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 29559 for 
container-id container_e07_1451897008090_0009_01_000002: 410.4 MB of 1 GB 
physical memory used; 2.3 GB of 2.1 GB virtual memory used
{code}

{code}
29556 29559 29559 29559 ?           -1 Ss     500   0:00  \_ /bin/bash -c 
python ./infra/agent/slider-agent/agent/main.py --label 
container_e07_1451897008090_0009_01_000002___HBASE_MASTER --zk-quorum 
zk01.com:2181, zk01.com:2181,zk03.com:2181 --zk-reg-path 
/registry/users/yarn/services/org-apache-slider/hbase1 > 
/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/slider-agent.out
 2>&1
29559 29564 29559 29559 ?           -1 Sl     500  99:43      \_ python 
./infra/agent/slider-agent/agent/main.py --label 
container_e07_1451897008090_0009_01_000002___HBASE_MASTER --zk-quorum  
zk01.com:2181, zk01.com:2181,zk03.com:2181 --zk-reg-path 
/registry/users/yarn/services/org-apache-slider/hbase1
    1 29779 29559 29559 ?           -1 S      500   0:00 bash 
/volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/app/install/hbase-0.98.13-hadoop2/bin/hbase-daemon.sh
 --config 
/volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/app/install/hbase-0.98.13-hadoop2/conf
 foreground_start master
29779 29794 29559 29559 ?           -1 Sl     500  62:56  \_ 
/jdk-1.7.0_45/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p 
-Xmx1000m -XX:+UseConcMarkSweepGC 
-XX:ErrorFile=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/hs_err_pid%p.log
 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-Xloggc:/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/gc.log-201601121408
 -Xmx1024m 
-Dhbase.log.dir=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002
 -Dhbase.log.file=hbase-yarn-master-csw024.log 
-Dhbase.home.dir=/volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/app/install/hbase-0.98.13-hadoop2/bin/..
 -Dhbase.id.str=yarn -Dhbase.root.logger=INFO,RFA 
-Djava.library.path=/package/hadoop-yarn-2.7.1-c3-20151026-arch-centos6-x86_64/lib/native
 -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.master.HMaster start
{code}

It works as I expected, but I am not sure this is the right approach.
Please correct me if it is wrong.
Is there a better way?
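
The selection rule described above can be sketched in a few lines. This is not the code of the new class (which is Java and is not shown in this thread); it is a minimal Python illustration, assuming a hypothetical function build_process_tree and a table of (ppid, pgrp) values copied from the "ps axjf" output above. It assumes that processes sharing the root's process group are simply added to the parent/child tree.

```python
from collections import defaultdict


def build_process_tree(procs, root_pid, include_pgrp=False):
    """Return the set of PIDs charged to the container rooted at root_pid.

    procs maps pid -> (ppid, pgrp), e.g. as read from /proc/<pid>/stat.
    """
    children = defaultdict(list)
    for pid, (ppid, _pgrp) in procs.items():
        children[ppid].append(pid)

    # Parent/child walk: the only rule applied when include_pgrp is False.
    tree, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in tree or pid not in procs:
            continue
        tree.add(pid)
        stack.extend(children[pid])

    if include_pgrp:
        # Extra rule from the comment: also take every process that shares
        # the root's process group, even if it was re-parented to init.
        root_pgrp = procs[root_pid][1]
        tree.update(pid for pid, (_ppid, pgrp) in procs.items()
                    if pgrp == root_pgrp)
    return tree


# (ppid, pgrp) values copied from the "ps axjf" output above.
procs = {
    29559: (29556, 29559),  # /bin/bash -c python ... main.py
    29564: (29559, 29559),  # python slider-agent
    29779: (1, 29559),      # hbase-daemon.sh, re-parented to init
    29794: (29779, 29559),  # HMaster JVM
}

print(sorted(build_process_tree(procs, 29559)))
print(sorted(build_process_tree(procs, 29559, include_pgrp=True)))
```

With the parent/child rule alone only 29559 and 29564 are found; with the process-group rule all four PIDs are found, which matches the updateProcessTree log line above.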


> hbase-daemon executed by slider is excepted from nodemanager container 
> monitoring
> ---------------------------------------------------------------------------------
>
>                 Key: SLIDER-1055
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1055
>             Project: Slider
>          Issue Type: Bug
>          Components: application/hbase
>    Affects Versions: Slider 0.81
>            Reporter: kyungwan nam
>
> Here is the NodeManager log from a host where an HBASE_REGIONSERVER component is 
> running:
> {code}
> 2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(361)) - Current ProcessTree list : [ 9801 ]
> 2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(436)) - Constructing ProcessTree for : PID = 
> 9801 ContainerId = container_e07_1451897008090_0009_01_000003
> 2016-01-12 14:11:49,262 DEBUG util.ProcfsBasedProcessTree 
> (ProcfsBasedProcessTree.java:updateProcessTree(274)) - [ 9801 9806 ]
> 2016-01-12 14:11:49,262 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 9801 for 
> container-id container_e07_1451897008090_0009_01_000003: 14.2 MB of 1 GB 
> physical memory used; 517.1 MB of 2.1 GB virtual memory used
> {code}
> The memory usage reported for the container is lower than I expected,
> because the monitored PIDs (9801, 9806) are the slider-agent processes. The 
> regionserver process was excluded from monitoring.
> Here is the output of "ps axjf":
> {code}
>  9798  9801  9801  9801 ?           -1 Ss     500   0:00      \_ /bin/bash -c 
> python ./infra/agent/slider-agent/agent/main.py --label 
> container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum 
>  9801  9806  9801  9801 ?           -1 Sl     500   0:01          \_ python 
> ./infra/agent/slider-agent/agent/main.py --label 
> container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum 
>     1  9979  9801  9801 ?           -1 S      500   0:00 bash 
> /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/hbase-daemon.sh
>  --config 
> /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/conf
>  foreground_start regionserver
>  9979  9994  9801  9801 ?           -1 Sl     500   0:10  \_ 
> /package/jdk-1.7.0_45/bin/java -Dproc_regionserver 
> -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC 
> -XX:ErrorFile=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/hs_err_pid%p.log
>  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
> -Xloggc:/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/gc.log-201601121408
>  -Xmn200m -XX:CMSInitiatingOccupancyFraction=70 -Xms1024m -Xmx1024m 
> -Dhbase.log.dir=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003
>  -Dhbase.log.file=hbase-yarn-regionserver.log 
> -Dhbase.home.dir=/volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/..
>  -Dhbase.id.str=yarn -Dhbase.root.logger=INFO,RFA 
> -Djava.library.path=/package/hadoop-yarn-2.7.1-arch-centos6-x86_64/lib/native 
> -Dhbase.security.logger=INFO,RFAS 
> org.apache.hadoop.hbase.regionserver.HRegionServer start
> {code}
> With the default ProcfsBasedProcessTree, the process tree is determined by 
> the parent-child relationships between processes,
> so a daemonized process (ppid=1) cannot be included in the process tree.
> I don't know whether this can be fixed in Slider.
> Does another ResourceCalculatorProcessTree need to be implemented to replace 
> the ProcfsBasedProcessTree?
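
On Linux, both the parent PID and the process group ID of a process are exposed in /proc/<pid>/stat, so a daemonized process with ppid=1 can still be matched to its container through its process group. The following is a minimal Python sketch of reading those two fields; the function name parse_proc_stat and the truncated sample line are illustrative assumptions, not part of Hadoop or Slider.

```python
def parse_proc_stat(line):
    """Extract (pid, comm, ppid, pgrp) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    # Fields after comm: state, ppid, pgrp, session, ...
    rest = line[rparen + 1:].split()
    return pid, comm, int(rest[1]), int(rest[2])


# Hypothetical, truncated stat line modelled on PID 9979 in the ps output:
# its parent is init (1), but its process group is still 9801.
sample = "9979 (bash) S 1 9801 9801 0 -1"
print(parse_proc_stat(sample))
```

For the sample line the parsed ppid is 1 while the pgrp is 9801, the PID of the container's root process.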



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
