[jira] [Updated] (HADOOP-14858) Why Yarn crashes ?

anikad ayman (JIRA) Mon, 11 Sep 2017 08:01:48 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-14858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


anikad ayman updated HADOOP-14858:
----------------------------------
    Description: 
During MapReduce processing, YARN crashed and job processing stopped. I 
managed to resume processing by killing the first running job, but a few 
minutes later there was another crash, which I resolved by killing the second 
running job.

 

We are looking for the cause of this crash, which has already happened several 
times (once or twice a month).

 

In the ResourceManager logs, I find these messages repeated from the beginning 
of the crash until the jobs were killed:

 

    
{code:java}
2017-08-25 03:51:58,815 WARN org.apache.hadoop.ipc.Server: Large response size 4739374 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:38352 Call#33361 Retry#0
2017-08-25 03:53:39,255 WARN org.apache.hadoop.ipc.Server: Large response size 4739374 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:38456 Call#33364 Retry#0
2017-08-25 03:55:19,700 WARN org.apache.hadoop.ipc.Server: Large response size 4739374 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:38556 Call#33367 Retry#0
2017-08-25 03:57:00,262 WARN org.apache.hadoop.ipc.Server: Large response size 4739374 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:38674 Call#33370 Retry#0
2017-08-25 03:58:40,687 WARN org.apache.hadoop.ipc.Server: Large response size 4739374 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:38804 Call#33373 Retry#0
{code}

    .
    .
    .
    2017-08-25 11:02:44,086 WARN org.apache.hadoop.ipc.Server: Large response size 4751251 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:39778 Call#34159 Retry#0
    2017-08-25 11:02:47,933 WARN org.apache.hadoop.ipc.Server: Large response size 4751251 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:39778 Call#34162 Retry#0
    2017-08-25 11:03:06,800 WARN org.apache.hadoop.ipc.Server: Large response size 4751251 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:39814 Call#34165 Retry#0
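
To make the pattern in these warnings easier to see, here is a minimal Python sketch that extracts the timestamp, response size, call name and client address from each warning and reports the gap between consecutive calls. It is not part of the original report. It assumes each log record sits on a single line (the mail archive wraps them), and the names `WARN_RE`, `parse_large_responses` and `summarize` are hypothetical helpers invented for this illustration.

```python
import re
from datetime import datetime

# Matches the "Large response size" warning in the form shown in the excerpt above.
WARN_RE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN "
    r"org\.apache\.hadoop\.ipc\.Server: Large response size (?P<size>\d+) "
    r"for call (?P<call>\S+) from (?P<host>[\d.]+):(?P<port>\d+)"
)


def parse_large_responses(lines):
    """Return (timestamp, size_in_bytes, call, client_host) for each matching line."""
    records = []
    for line in lines:
        m = WARN_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
            records.append((ts, int(m.group("size")), m.group("call"), m.group("host")))
    return records


def summarize(records):
    """Return the gaps (seconds) between consecutive warnings and the largest size in MiB."""
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(records, records[1:])]
    max_mib = max(r[1] for r in records) / (1024 * 1024)
    return gaps, max_mib


# Two records copied from the excerpt above, unwrapped onto one line each.
SAMPLE = [
    "2017-08-25 03:51:58,815 WARN org.apache.hadoop.ipc.Server: Large response size 4739374 for call "
    "org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:38352 Call#33361 Retry#0",
    "2017-08-25 03:53:39,255 WARN org.apache.hadoop.ipc.Server: Large response size 4739374 for call "
    "org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.135.8.101:38456 Call#33364 Retry#0",
]

if __name__ == "__main__":
    gaps, max_mib = summarize(parse_large_responses(SAMPLE))
    print("gaps between warnings (s):", gaps)
    print("largest response (MiB): %.2f" % max_mib)
```

Run over the full ResourceManager log, this would show whether the caller at 10.135.8.101 polls `getApplications` at a fixed interval and whether the response keeps growing. The call name belongs to `ApplicationClientProtocolPB`, so the sender may be a client listing applications rather than a NodeManager heartbeat, but that is an inference from the log line only.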

 

NB: We still get this warning from time to time, and we are still wondering 
whether it concerns a connection between the NodeManager (10.135.8.101) and 
the ResourceManager, or something else.

 

In the NodeManager logs, I find these messages:

    2017-08-25 03:51:54,396 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 98201 for container-id container_e41_1500982512144_36679_01_000382: 1.4 GB of 10 GB physical memory used; 10.1 GB of 21 GB virtual memory used
    2017-08-25 03:51:54,791 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 112912 for container-id container_e41_1500982512144_36679_01_000387: 2.3 GB of 10 GB physical memory used; 10.1 GB of 21 GB virtual memory used
    2017-08-25 03:51:55,177 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 105848 for container-id container_e41_1500982512144_36627_01_001644: 619.4 MB of 10 GB physical memory used; 10.1 GB of 21 GB virtual memory used
    2017-08-25 03:51:58,938 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 98201 for container-id container_e41_1500982512144_36679_01_000382: 1.4 GB of 10 GB physical memory used; 10.1 GB of 21 GB virtual memory used
    .
    .
    .
    2017-08-25 11:05:40,104 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 112912 for container-id container_e41_1500982512144_36679_01_000387: 1.1 GB of 10 GB physical memory used; 10.1 GB of 21 GB virtual memory used
    2017-08-25 11:05:40,493 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 105848 for container-id container_e41_1500982512144_36627_01_001644: 648.4 MB of 10 GB physical memory used; 10.1 GB of 21 GB virtual memory used
    2017-08-25 11:05:43,867 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 98201 for container-id container_e41_1500982512144_36679_01_000382: 1.1 GB of 10 GB physical memory used; 10.1 GB of 21 GB virtual memory used
    2017-08-25 11:05:45,040 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 105848 for container-id container_e41_1500982512144_36627_01_001644: 648.4 MB of 10 GB physical memory used; 10.1 GB of 21 GB virtual memory used
    2017-08-25 11:05:48,397 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl: Container container_e41_1500982512144_36627_01_001644 transitioned from RUNNING to KILLING
    2017-08-25 11:05:48,397 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl: Application application_1500982512144_36627 transitioned from RUNNING to FINISHING_CONTAINERS_WAIT
    2017-08-25 11:05:48,397 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Cleaning up container container_e41_1500982512144_36627_01_001644
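
As a rough way to check whether any container was close to its memory limits, the following Python sketch turns a `ContainersMonitorImpl` line into physical and virtual memory fractions. It is an illustration only, not part of the original report. It assumes one log record per line and binary units (1 GB = 1024 MB), and `MEM_RE`, `to_bytes` and `parse_memory` are hypothetical names introduced here.

```python
import re

# Matches the memory report emitted by ContainersMonitorImpl, as seen in the excerpt above.
MEM_RE = re.compile(
    r"Memory usage of ProcessTree (?P<pid>\d+) for container-id (?P<cid>\S+): "
    r"(?P<pused>[\d.]+) (?P<punit>[KMG]B) of (?P<plimit>[\d.]+) (?P<plunit>[KMG]B) "
    r"physical memory used; "
    r"(?P<vused>[\d.]+) (?P<vunit>[KMG]B) of (?P<vlimit>[\d.]+) (?P<vlunit>[KMG]B) "
    r"virtual memory used"
)

# Assumption: the log uses binary multiples.
UNIT = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(value, unit):
    """Convert a value such as ('619.4', 'MB') to a number of bytes."""
    return float(value) * UNIT[unit]


def parse_memory(line):
    """Return container id plus used/limit fractions, or None if the line does not match."""
    m = MEM_RE.search(line)
    if m is None:
        return None
    phys_used = to_bytes(m.group("pused"), m.group("punit"))
    phys_limit = to_bytes(m.group("plimit"), m.group("plunit"))
    virt_used = to_bytes(m.group("vused"), m.group("vunit"))
    virt_limit = to_bytes(m.group("vlimit"), m.group("vlunit"))
    return {
        "container": m.group("cid"),
        "phys_fraction": phys_used / phys_limit,
        "virt_fraction": virt_used / virt_limit,
    }


# One record copied from the excerpt above, unwrapped onto a single line.
SAMPLE_LINE = (
    "2017-08-25 03:51:55,177 INFO "
    "org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: "
    "Memory usage of ProcessTree 105848 for container-id "
    "container_e41_1500982512144_36627_01_001644: 619.4 MB of 10 GB physical memory used; "
    "10.1 GB of 21 GB virtual memory used"
)

if __name__ == "__main__":
    print(parse_memory(SAMPLE_LINE))
```

In the lines quoted above, every container stays well below both limits, so these records on their own do not point to a container being killed for exceeding its memory allowance.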


and in the job history logs:

    2017-08-25 03:53:06,504 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
    2017-08-25 03:56:06,504 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
    2017-08-25 03:59:06,504 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
    2017-08-25 04:02:06,504 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
    2017-08-25 04:05:06,504 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
    2017-08-25 04:08:06,504 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
    2017-08-25 04:11:06,504 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
    .
    .
    .
    2017-08-25 11:05:36,504 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: History Cleaner started
    2017-08-25 11:05:41,271 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: History Cleaner complete
    2017-08-25 11:06:04,214 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
    2017-08-25 11:08:06,504 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
    2017-08-25 11:08:06,518 INFO org.apache.hadoop.mapreduce.jobhistory.JobSummary: jobId=job_1500982512144_36793,submitTime=1503647426340,launchTime=1503651960434,firstMapTaskLaunchTime=1503651982671,firstReduceTaskLaunchTime=0,finishTime=1503651985794,resourcesPerMap=5120,resourcesPerReduce=0,numMaps=1,numReduces=0,user=mapr,queue=default,status=SUCCEEDED,mapSlotSeconds=9,reduceSlotSeconds=0,jobName=SELECT `C_7361705f62736973`.`buk...20170825)(Stage-1)
    2017-08-25 11:08:06,518 INFO org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager: Deleting JobSummary file: [maprfs:/var/mapr/cluster/yarn/rm/staging/history/done_intermediate/mapr/job_1500982512144_36793.summary]
    2017-08-25 11:08:06,518 INFO org.apache.hadoop.mapreduce.jobhistory.JobSummary: jobId=job_1500982512144_36778,submitTime=1503642110785,launchTime=1503651960266,firstMapTaskLaunchTime=1503651969483,firstReduceTaskLaunchTime=0,finishTime=1503651976016,resourcesPerMap=5120,resourcesPerReduce=0,numMaps=1,numReduces=0,user=mapr,queue=default,status=SUCCEEDED,mapSlotSeconds=19,reduceSlotSeconds=0,jobName=SELECT `C_7361705f7662726b`.`vbe...20170825)(Stage-1)
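
The `JobSummary` records carry enough information to estimate how long each job waited before it was launched. The Python sketch below is an illustration under stated assumptions, not part of the original report: it treats `submitTime`, `launchTime` and `finishTime` as epoch milliseconds (which is how they appear), and `TIME_RE` and `job_timings` are hypothetical names introduced here.

```python
import re

# Pull the job id and the three timing fields out of a JobSummary record.
JOB_RE = re.compile(r"\bjobId=(?P<job>job_\d+_\d+)")
TIME_RE = re.compile(r"\b(submitTime|launchTime|finishTime)=(\d+)")


def job_timings(summary):
    """Return the job id, the wait before launch and the run time, both in seconds.

    Assumes the three timestamps are epoch milliseconds.
    """
    job = JOB_RE.search(summary)
    times = {name: int(value) for name, value in TIME_RE.findall(summary)}
    return {
        "job": job.group("job") if job else None,
        "wait_s": (times["launchTime"] - times["submitTime"]) / 1000.0,
        "run_s": (times["finishTime"] - times["launchTime"]) / 1000.0,
    }


# Timing fields copied from the two JobSummary records in the excerpt above.
SUMMARIES = [
    "jobId=job_1500982512144_36793,submitTime=1503647426340,launchTime=1503651960434,"
    "firstMapTaskLaunchTime=1503651982671,firstReduceTaskLaunchTime=0,finishTime=1503651985794",
    "jobId=job_1500982512144_36778,submitTime=1503642110785,launchTime=1503651960266,"
    "firstMapTaskLaunchTime=1503651969483,firstReduceTaskLaunchTime=0,finishTime=1503651976016",
]

if __name__ == "__main__":
    for s in SUMMARIES:
        t = job_timings(s)
        print("%s waited %.0f s, ran %.1f s" % (t["job"], t["wait_s"], t["run_s"]))
```

For the two records quoted above, the wait before launch is on the order of one to three hours while the run itself lasts well under a minute, which would be consistent with jobs sitting in the queue during the stall and starting only after the blocking jobs were killed.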

 

Do you have any explanation or solution for this issue?


> Why Yarn crashes ?
> ------------------
>
>                 Key: HADOOP-14858
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14858
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: Production 
>            Reporter: anikad ayman
>             Fix For: 2.7.0
>
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
