[ https://issues.apache.org/jira/browse/SLIDER-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192148#comment-15192148 ]
Josh Elser commented on SLIDER-1055:
------------------------------------

Great finds here, all!

bq. Besides, when nodemanager exits by accident, hbase process would exit as well(today, hbase process would remain and be out of control).

I think that's a limitation of how YARN works. If the nodemanager dies, the containers that it was running should also die (and get relaunched elsewhere). It might be possible to do something differently, but I think that's how it should work (backgrounding seems wrong).

bq. 1. don’t allow running daemon process in slider-app. as Tao Jie did, process should be ran on foreground. most of apps in apache slider need to be fixed to achieve it.

This would be a really good fix to make. Any interest in putting up a patch which changes this? That would be helpful for us to verify.

bq. 2. set "yarn.nodemanager.container-monitor.process-tree.class" to a new class to replace the ProcfsBasedProcessTree. (as I did above)

I don't exactly understand why we need the new ProcessTree implementation. Can you comment on why it was needed and what your implementation did differently? That would be something we'd have to work with the YARN project to address.
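For anyone following along, here is a rough sketch of the parent/child limitation described in the issue quoted below. It is only an illustration in Python, not the actual ProcfsBasedProcessTree code and not the reporter's replacement class. The rows are the PPID/PID/PGID/SID columns copied from the "ps axjf" output in the description; the function names and the session-based grouping are assumptions made up for this example.

```python
from collections import defaultdict

# (ppid, pid, pgid, sid) rows taken from the "ps axjf" output in the issue.
PS_ROWS = [
    (9798, 9801, 9801, 9801),  # /bin/bash -c python .../agent/main.py
    (9801, 9806, 9801, 9801),  # python slider-agent
    (1,    9979, 9801, 9801),  # bash hbase-daemon.sh (parent is pid 1)
    (9979, 9994, 9801, 9801),  # java HRegionServer
]


def tree_by_parent(rows, root_pid):
    """Collect root_pid and every process reachable via parent -> child links."""
    children = defaultdict(list)
    for ppid, pid, _pgid, _sid in rows:
        children[ppid].append(pid)
    found, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        found.append(pid)
        stack.extend(children[pid])
    return sorted(found)


def tree_by_session(rows, root_sid):
    """Hypothetical alternative: collect every process in the same session."""
    return sorted(pid for _ppid, pid, _pgid, sid in rows if sid == root_sid)


print(tree_by_parent(PS_ROWS, 9801))   # [9801, 9806]
print(tree_by_session(PS_ROWS, 9801))  # [9801, 9806, 9979, 9994]
```

The parent-based walk yields the same two pids the nodemanager log reports, because the hbase-daemon.sh process has ppid=1 and is never reached from 9801. Grouping by session id happens to pick up the regionserver in this particular ps output, but whether that is a safe rule in general is exactly the kind of question that would need to go to the YARN project.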
> hbase-daemon executed by slider is excepted from nodemanager container
> monitoring
> ---------------------------------------------------------------------------------
>
> Key: SLIDER-1055
> URL: https://issues.apache.org/jira/browse/SLIDER-1055
> Project: Slider
> Issue Type: Bug
> Components: application/hbase
> Affects Versions: Slider 0.81
> Reporter: kyungwan nam
>
> here is nodemanager log of a host where a HBASE_REGIONSERVER component is
> running
> {code}
> 2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:run(361)) - Current ProcessTree list : [ 9801 ]
> 2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:run(436)) - Constructing ProcessTree for : PID =
> 9801 ContainerId = container_e07_1451897008090_0009_01_000003
> 2016-01-12 14:11:49,262 DEBUG util.ProcfsBasedProcessTree
> (ProcfsBasedProcessTree.java:updateProcessTree(274)) - [ 9801 9806 ]
> 2016-01-12 14:11:49,262 INFO monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 9801 for
> container-id container_e07_1451897008090_0009_01_000003: 14.2 MB of 1 GB
> physical memory used; 517.1 MB of 2.1 GB virtual memory used
> {code}
> used memory for the container is lower than i expected.
> because pids ( 9801 9806 ) are slider-agent process. regionserver process was
> excepted from monitoring.
> here is the result of "ps axjf"
> {code}
> 9798 9801 9801 9801 ? -1 Ss 500 0:00 \_ /bin/bash -c
> python ./infra/agent/slider-agent/agent/main.py --label
> container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum
> 9801 9806 9801 9801 ? -1 Sl 500 0:01 \_ python
> ./infra/agent/slider-agent/agent/main.py --label
> container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum
> 1 9979 9801 9801 ? -1 S 500 0:00 bash
> /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/hbase-daemon.sh
> --config
> /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/conf
> foreground_start regionserver
> 9979 9994 9801 9801 ? -1 Sl 500 0:10 \_
> /package/jdk-1.7.0_45/bin/java -Dproc_regionserver
> -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC
> -XX:ErrorFile=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/hs_err_pid%p.log
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/gc.log-201601121408
> -Xmn200m -XX:CMSInitiatingOccupancyFraction=70 -Xms1024m -Xmx1024m
> -Dhbase.log.dir=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003
> -Dhbase.log.file=hbase-yarn-regionserver.log
> -Dhbase.home.dir=/volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/..
> -Dhbase.id.str=yarn -Dhbase.root.logger=INFO,RFA
> -Djava.library.path=/package/hadoop-yarn-2.7.1-arch-centos6-x86_64/lib/native
> -Dhbase.security.logger=INFO,RFAS
> org.apache.hadoop.hbase.regionserver.HRegionServer start
> {code}
> when i use the ProcfsBasedProcessTree (default)
> process-tree is determined by relationship between parent and child process.
> so, daemonized process (ppid=1) can’t be included in process-tree.
> I don't know it can be fixed in slider.
> does it need to implement another ResourceCalculatorProcessTree to replace
> the ProcfsBasedProcessTree?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)