[ https://issues.apache.org/jira/browse/SLIDER-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148266#comment-15148266 ]
kyungwan nam commented on SLIDER-1055:
--------------------------------------

I created a new class to replace ProcfsBasedProcessTree. ProcfsBasedProcessTree adds only the children of the container process to the process tree. The new class adds those children and also every process whose pgrpid is the same as the pgrpid of the container process.

I set "yarn.nodemanager.container-monitor.process-tree.class" to the new class. After restarting the NM, the problem is fixed, as shown below.

{code}
2016-02-16 12:33:03,899 DEBUG NewProcfsBasedProcessTree (NewProcfsBasedProcessTree.java:updateProcessTree(288)) - [ 29794 29779 29564 29559 ]
2016-02-16 12:33:03,899 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 29559 for container-id container_e07_1451897008090_0009_01_000002: 410.4 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used
{code}

{code}
29556 29559 29559 29559 ? -1 Ss 500 0:00 \_ /bin/bash -c python ./infra/agent/slider-agent/agent/main.py --label container_e07_1451897008090_0009_01_000002___HBASE_MASTER --zk-quorum zk01.com:2181, zk01.com:2181,zk03.com:2181 --zk-reg-path /registry/users/yarn/services/org-apache-slider/hbase1 > /var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/slider-agent.out 2>&1
29559 29564 29559 29559 ? -1 Sl 500 99:43 \_ python ./infra/agent/slider-agent/agent/main.py --label container_e07_1451897008090_0009_01_000002___HBASE_MASTER --zk-quorum zk01.com:2181, zk01.com:2181,zk03.com:2181 --zk-reg-path /registry/users/yarn/services/org-apache-slider/hbase1
1 29779 29559 29559 ? -1 S 500 0:00 bash /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/app/install/hbase-0.98.13-hadoop2/bin/hbase-daemon.sh --config /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/app/install/hbase-0.98.13-hadoop2/conf foreground_start master
29779 29794 29559 29559 ? -1 Sl 500 62:56 \_ /jdk-1.7.0_45/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/hs_err_pid%p.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/gc.log-201601121408 -Xmx1024m -Dhbase.log.dir=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002 -Dhbase.log.file=hbase-yarn-master-csw024.log -Dhbase.home.dir=/volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000002/app/install/hbase-0.98.13-hadoop2/bin/.. -Dhbase.id.str=yarn -Dhbase.root.logger=INFO,RFA -Djava.library.path=/package/hadoop-yarn-2.7.1-c3-20151026-arch-centos6-x86_64/lib/native -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.master.HMaster start
{code}

It works as I expected, but I am not sure it is the right approach. Please correct me if it is wrong. Is there a better way?

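The selection rule described in this comment can be sketched as follows. This is only a minimal illustration in Python, not the NewProcfsBasedProcessTree class itself (which is Java and presumably reads /proc); the `Proc` record and both function names are hypothetical. The PID, PPID, and PGID values are copied from the first three columns of the `ps axjf` output above.

```python
from collections import namedtuple

# Hypothetical record: pid, parent pid, and process group id,
# i.e. the PID, PPID and PGID columns of `ps axjf`.
Proc = namedtuple("Proc", ["pid", "ppid", "pgrp"])


def descendants_only(procs, root_pid):
    """Walk parent/child links only (the default behavior described in the issue)."""
    children = {}
    for p in procs:
        children.setdefault(p.ppid, []).append(p.pid)
    tree, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in tree:
            continue
        tree.add(pid)
        stack.extend(children.get(pid, []))
    return tree


def descendants_and_group(procs, root_pid):
    """Descendants of the root plus every process sharing the root's pgrp."""
    by_pid = {p.pid: p for p in procs}
    tree = descendants_only(procs, root_pid)
    root = by_pid.get(root_pid)
    if root is not None:
        tree |= {p.pid for p in procs if p.pgrp == root.pgrp}
    return tree


# Values taken from the HBASE_MASTER container in the comment.
procs = [
    Proc(29559, 29556, 29559),  # /bin/bash -c python ... (container root)
    Proc(29564, 29559, 29559),  # slider-agent
    Proc(29779, 1, 29559),      # hbase-daemon.sh, re-parented to pid 1
    Proc(29794, 29779, 29559),  # HMaster JVM
]

print(sorted(descendants_only(procs, 29559)))
print(sorted(descendants_and_group(procs, 29559)))
```

With the parent/child walk alone, the re-parented hbase-daemon.sh (ppid=1) and its JVM child are never reached; matching on the process group picks them up, which agrees with the four pids in the DEBUG line above.
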
> hbase-daemon executed by slider is excepted from nodemanager container monitoring
> ---------------------------------------------------------------------------------
>
> Key: SLIDER-1055
> URL: https://issues.apache.org/jira/browse/SLIDER-1055
> Project: Slider
> Issue Type: Bug
> Components: application/hbase
> Affects Versions: Slider 0.81
> Reporter: kyungwan nam
>
> Here is the nodemanager log from a host where an HBASE_REGIONSERVER component is running:
> {code}
> 2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(361)) - Current ProcessTree list : [ 9801 ]
> 2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(436)) - Constructing ProcessTree for : PID = 9801 ContainerId = container_e07_1451897008090_0009_01_000003
> 2016-01-12 14:11:49,262 DEBUG util.ProcfsBasedProcessTree (ProcfsBasedProcessTree.java:updateProcessTree(274)) - [ 9801 9806 ]
> 2016-01-12 14:11:49,262 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 9801 for container-id container_e07_1451897008090_0009_01_000003: 14.2 MB of 1 GB physical memory used; 517.1 MB of 2.1 GB virtual memory used
> {code}
> The memory usage reported for the container is lower than I expected, because the monitored pids ( 9801 9806 ) are both slider-agent processes. The regionserver process was excluded from monitoring.
> Here is the result of "ps axjf":
> {code}
> 9798 9801 9801 9801 ? -1 Ss 500 0:00 \_ /bin/bash -c python ./infra/agent/slider-agent/agent/main.py --label container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum
> 9801 9806 9801 9801 ? -1 Sl 500 0:01 \_ python ./infra/agent/slider-agent/agent/main.py --label container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum
> 1 9979 9801 9801 ? -1 S 500 0:00 bash /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/hbase-daemon.sh --config /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/conf foreground_start regionserver
> 9979 9994 9801 9801 ? -1 Sl 500 0:10 \_ /package/jdk-1.7.0_45/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/hs_err_pid%p.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/gc.log-201601121408 -Xmn200m -XX:CMSInitiatingOccupancyFraction=70 -Xms1024m -Xmx1024m -Dhbase.log.dir=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003 -Dhbase.log.file=hbase-yarn-regionserver.log -Dhbase.home.dir=/volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/.. -Dhbase.id.str=yarn -Dhbase.root.logger=INFO,RFA -Djava.library.path=/package/hadoop-yarn-2.7.1-arch-centos6-x86_64/lib/native -Dhbase.security.logger=INFO,RFAS org.apache.hadoop.hbase.regionserver.HRegionServer start
> {code}
> With ProcfsBasedProcessTree (the default), the process tree is determined by the relationship between parent and child processes, so a daemonized process (ppid=1) can’t be included in the process tree.
> I don't know whether this can be fixed in Slider.
> Is it necessary to implement another ResourceCalculatorProcessTree to replace ProcfsBasedProcessTree?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)