[jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

Sharad Agarwal (Commented) (JIRA) Tue, 07 Feb 2012 23:01:45 -0800

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203325#comment-13203325
 ]


Sharad Agarwal commented on MAPREDUCE-3802:
-------------------------------------------

bq. need to understand a little bit better how these names are determined
The task attemptIds are unique across all the generations of AM. This is to 
avoid any remote task attempt from previous generation of AM joining the 
current AM. The assumption is there won't be more than 1000 attempts of a task 
in AM run. The suffix part of task attemptId is determined as follows:
 _(AMGeneration-1)*1000. For first AM it will start from 0. For second it will 
start from 1000, for third from 2000 ..


                
> If an MR AM dies twice  it looks like the process freezes
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-3802
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3802
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: applicationmaster, mrv2
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Critical
>         Attachments: syslog
>
>
> It looks like recovering from an RM AM dieing works very well on a single 
> failure.  But if it fails multiple times we appear to get into a live lock 
> situation.
> {noformat}
> yarn jar 
> hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*-SNAPSHOT.jar 
> wordcount -Dyarn.app.mapreduce.am.log.level=DEBUG -Dmapreduce.job.reduces=30 
> input output
> 12/02/03 21:06:57 WARN conf.Configuration: fs.default.name is deprecated. 
> Instead, use fs.defaultFS
> 12/02/03 21:06:57 WARN conf.Configuration: mapred.used.genericoptionsparser 
> is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
> 12/02/03 21:06:57 INFO input.FileInputFormat: Total input paths to process : 
> 17
> 12/02/03 21:06:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 12/02/03 21:06:57 WARN snappy.LoadSnappy: Snappy native library not loaded
> 12/02/03 21:06:57 INFO mapreduce.JobSubmitter: number of splits:17
> 12/02/03 21:06:57 INFO mapred.ResourceMgrDelegate: Submitted application 
> application_1328302034486_0003 to ResourceManager at HOST/IP:8040
> 12/02/03 21:06:57 INFO mapreduce.Job: The url to track the job: 
> http://HOST:8088/proxy/application_1328302034486_0003/
> 12/02/03 21:06:57 INFO mapreduce.Job: Running job: job_1328302034486_0003
> 12/02/03 21:07:03 INFO mapreduce.Job: Job job_1328302034486_0003 running in 
> uber mode : false
> 12/02/03 21:07:03 INFO mapreduce.Job:  map 0% reduce 0%
> 12/02/03 21:07:09 INFO mapreduce.Job:  map 5% reduce 0%
> 12/02/03 21:07:10 INFO mapreduce.Job:  map 17% reduce 0%
> #KILLED AM with kill -9 here
> 12/02/03 21:07:16 INFO mapreduce.Job:  map 29% reduce 0%
> 12/02/03 21:07:17 INFO mapreduce.Job:  map 35% reduce 0%
> 12/02/03 21:07:30 INFO mapreduce.Job:  map 52% reduce 0%
> 12/02/03 21:07:35 INFO mapreduce.Job:  map 58% reduce 0%
> 12/02/03 21:07:37 INFO mapreduce.Job:  map 70% reduce 0%
> 12/02/03 21:07:41 INFO mapreduce.Job:  map 76% reduce 0%
> 12/02/03 21:07:43 INFO mapreduce.Job:  map 82% reduce 0%
> 12/02/03 21:07:44 INFO mapreduce.Job:  map 88% reduce 0%
> 12/02/03 21:07:47 INFO mapreduce.Job:  map 94% reduce 0%
> 12/02/03 21:07:49 INFO mapreduce.Job:  map 100% reduce 0%
> 12/02/03 21:07:53 INFO mapreduce.Job:  map 100% reduce 3%
> 12/02/03 21:08:00 INFO mapreduce.Job:  map 100% reduce 6%
> 12/02/03 21:08:06 INFO mapreduce.Job:  map 100% reduce 10%
> 12/02/03 21:08:12 INFO mapreduce.Job:  map 100% reduce 13%
> 12/02/03 21:08:18 INFO mapreduce.Job:  map 100% reduce 16%
> #killed AM with kill -9 here
> 12/02/03 21:08:20 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. 
> Already tried 0 time(s).
> 12/02/03 21:08:21 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. 
> Already tried 1 time(s).
> 12/02/03 21:08:22 INFO ipc.Client: Retrying connect to server: HOST/IP:44223. 
> Already tried 2 time(s).
> 12/02/03 21:08:26 INFO mapreduce.Job:  map 64% reduce 16%
> #It never makes any more progress...
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3802) If an MR AM dies twice it looks like the process freezes

Reply via email to