[jira] [Commented] (SLIDER-1055) hbase-daemon executed by slider is excepted from nodemanager container monitoring

Josh Elser (JIRA) Sat, 12 Mar 2016 21:47:13 -0800

    [ https://issues.apache.org/jira/browse/SLIDER-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15192148#comment-15192148 ]


Josh Elser commented on SLIDER-1055:
------------------------------------

Great finds here, all!

bq. Besides, when nodemanager exits by accident, hbase process would exit as 
well(today, hbase process would remain and be out of control).

I think that's a limitation of how YARN works. If the nodemanager dies, 
the containers that it was running should also die (and get relaunched 
elsewhere). It would potentially be possible to do something differently, but I 
think that's how it should work (backgrounding seems wrong).

bq. 1. don’t allow running daemon process in slider-app. as Tao Jie did, 
process should be ran on foreground. most of apps in apache slider need to be 
fixed to achieve it.

This would be a really good fix to make. Any interest in putting up a patch 
which changes this? That would be helpful for us to verify.
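
To make "run in the foreground" concrete, here is a minimal Python sketch. It is not Slider or HBase code; the helper name run_in_foreground and the tiny child command are made up for illustration. The launcher starts the child directly and waits for it, so the child's parent PID is the launcher's own PID and a parent/child walk from the container's root process would reach it.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the real service process: it just reports
# the PID of its parent and exits.
CHILD_CODE = "import os; print(os.getppid())"


def run_in_foreground(argv):
    """Start argv as a direct child and block until it exits.

    Because the launcher neither forks twice nor exits early, the child
    is never reparented to init (ppid=1).
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate()
    return proc.returncode, out.strip()


returncode, reported_ppid = run_in_foreground([sys.executable, "-c", CHILD_CODE])
print("launcher pid:", os.getpid(), "child saw ppid:", reported_ppid)
```

A daemonizing launcher would instead return right away and leave the service behind with ppid=1, which is the situation shown in the "ps axjf" output in the issue description.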

bq.  2. set "yarn.nodemanager.container-monitor.process-tree.class" to a new 
class to replace the ProcfsBasedProcessTree. (as I did above)

I don't exactly understand why we need the new ProcessTree implementation. Can 
you comment on why it was needed and what your implementation did differently? 
That would be something we'd have to work with the YARN project to address.

> hbase-daemon executed by slider is excepted from nodemanager container 
> monitoring
> ---------------------------------------------------------------------------------
>
>                 Key: SLIDER-1055
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1055
>             Project: Slider
>          Issue Type: Bug
>          Components: application/hbase
>    Affects Versions: Slider 0.81
>            Reporter: kyungwan nam
>
> here is nodemanager log of a host where a HBASE_REGIONSERVER component is 
> running
> {code}
> 2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(361)) - Current ProcessTree list : [ 9801 ]
> 2016-01-12 14:11:49,237 DEBUG monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(436)) - Constructing ProcessTree for : PID = 
> 9801 ContainerId = container_e07_1451897008090_0009_01_000003
> 2016-01-12 14:11:49,262 DEBUG util.ProcfsBasedProcessTree 
> (ProcfsBasedProcessTree.java:updateProcessTree(274)) - [ 9801 9806 ]
> 2016-01-12 14:11:49,262 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 9801 for 
> container-id container_e07_1451897008090_0009_01_000003: 14.2 MB of 1 GB 
> physical memory used; 517.1 MB of 2.1 GB virtual memory used
> {code}
> The memory used by the container is lower than I expected, 
> because pids ( 9801 9806 ) are the slider-agent processes. The regionserver 
> process was excluded from monitoring.
> here is the result of "ps axjf" 
> {code}
>  9798  9801  9801  9801 ?           -1 Ss     500   0:00      \_ /bin/bash -c 
> python ./infra/agent/slider-agent/agent/main.py --label 
> container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum 
>  9801  9806  9801  9801 ?           -1 Sl     500   0:01          \_ python 
> ./infra/agent/slider-agent/agent/main.py --label 
> container_e07_1451897008090_0009_01_000003___HBASE_REGIONSERVER --zk-quorum 
>     1  9979  9801  9801 ?           -1 S      500   0:00 bash 
> /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/hbase-daemon.sh
>  --config 
> /volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/conf
>  foreground_start regionserver
>  9979  9994  9801  9801 ?           -1 Sl     500   0:10  \_ 
> /package/jdk-1.7.0_45/bin/java -Dproc_regionserver 
> -XX:OnOutOfMemoryError=kill -9 %p -Xmx1000m -XX:+UseConcMarkSweepGC 
> -XX:ErrorFile=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/hs_err_pid%p.log
>  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
> -Xloggc:/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/gc.log-201601121408
>  -Xmn200m -XX:CMSInitiatingOccupancyFraction=70 -Xms1024m -Xmx1024m 
> -Dhbase.log.dir=/var/logs/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003
>  -Dhbase.log.file=hbase-yarn-regionserver.log 
> -Dhbase.home.dir=/volume/nodemanager/usercache/yarn/appcache/application_1451897008090_0009/container_e07_1451897008090_0009_01_000003/app/install/hbase-0.98.13-hadoop2/bin/..
>  -Dhbase.id.str=yarn -Dhbase.root.logger=INFO,RFA 
> -Djava.library.path=/package/hadoop-yarn-2.7.1-arch-centos6-x86_64/lib/native 
> -Dhbase.security.logger=INFO,RFAS 
> org.apache.hadoop.hbase.regionserver.HRegionServer start
> {code}
> When I use the ProcfsBasedProcessTree (the default), the process tree is 
> determined by the parent-child relationship between processes, 
> so a daemonized process (ppid=1) can’t be included in the process tree.
> I don't know whether this can be fixed in Slider.
> Does it require implementing another ResourceCalculatorProcessTree to replace 
> the ProcfsBasedProcessTree?
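
The parent/child walk described in the report can be sketched in a few lines of Python, using the PPID, PID, PGID and SID columns copied from the "ps axjf" output above. This is only an illustration, not the ProcfsBasedProcessTree code. The function names are hypothetical, and grouping by session id is an assumed alternative shown for contrast. It is not necessarily what the replacement class mentioned in the comments does.

```python
from collections import defaultdict

# (ppid, pid, pgid, sid) taken from the "ps axjf" output in the report.
PS_ROWS = [
    (9798, 9801, 9801, 9801),  # /bin/bash -c python .../main.py (container root)
    (9801, 9806, 9801, 9801),  # python slider-agent
    (1,    9979, 9801, 9801),  # bash hbase-daemon.sh, reparented to init
    (9979, 9994, 9801, 9801),  # java HRegionServer
]


def tree_by_parent(rows, root_pid):
    """Collect root_pid and every process reachable via parent -> child links."""
    children = defaultdict(list)
    for ppid, pid, _pgid, _sid in rows:
        children[ppid].append(pid)
    seen, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        seen.append(pid)
        stack.extend(children.get(pid, []))
    return sorted(seen)


def group_by_session(rows, sid):
    """Assumed alternative: collect every process that shares a session id."""
    return sorted(pid for _ppid, pid, _pgid, s in rows if s == sid)


by_parent = tree_by_parent(PS_ROWS, 9801)
by_session = group_by_session(PS_ROWS, 9801)
print(by_parent)   # [9801, 9806]
print(by_session)  # [9801, 9806, 9979, 9994]
```

The parent/child walk yields the same two pids as the "[ 9801 9806 ]" line in the nodemanager log. The daemonized hbase-daemon.sh (9979) and the regionserver JVM (9994) are only reached when grouping by the shared session id 9801.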



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
