[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720982#comment-13720982
 ] 

Jason Lowe commented on MAPREDUCE-5423:
---------------------------------------

This may be a duplicate of MAPREDUCE-4842.
                
> Rare deadlock situation when reducers try to fetch map output
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-5423
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5423
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.0.2-alpha
>            Reporter: Chu Tong
>
> During our cluster deployment, we found there is a very rare deadlock 
> situation when reducers try to fetch map output. We had 5 fetchers and log 
> snippet illustrates this problem is below (all fetchers went into a wait 
> state after they can't acquire more RAM beyond the memoryLimit and no fetcher 
> is releasing memory):
> 2013-07-18 04:32:28,135 INFO [main] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: MergerManager: 
> memoryLimit=1503238528, maxSingleShuffleLimit=375809632, 
> mergeThreshold=992137472, ioSortFactor=10, memToMemMergeOutputsThreshold=10
> 2013-07-18 04:32:28,138 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_000001_0 Thread started: EventFetcher for 
> fetching Map Completion Events
> 2013-07-18 04:32:28,146 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:28,146 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:28,319 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_000017_0
>  sent hash and receievd reply
> 2013-07-18 04:32:28,320 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_000017_0 decomp: 27 len: 31 to 
> MEMORY
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 27 bytes from 
> map-output for attempt_1373902166027_0622_m_000017_0
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 27, inMemoryMapOutputs.size() -> 1, commitMemory -> 
> 0, usedMemory ->27
> 2013-07-18 04:32:28,325 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 179s
> 2013-07-18 04:32:33,158 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:33,158 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:33,161 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_000016_0
>  sent hash and receievd reply
> 2013-07-18 04:32:33,200 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_000016_0 decomp: 55841282 len: 
> 55841286 to MEMORY
> 2013-07-18 04:32:33,322 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 55841282 bytes from 
> map-output for attempt_1373902166027_0622_m_000016_0
> 2013-07-18 04:32:33,323 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 55841282, inMemoryMapOutputs.size() -> 2, commitMemory 
> -> 27, usedMemory ->55841309
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 118022137 bytes from 
> map-output for attempt_1373902166027_0622_m_000015_0
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 118022137, inMemoryMapOutputs.size() -> 3, 
> commitMemory -> 55841309, usedMemory ->173863446
> 2013-07-18 04:32:39,594 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 413s
> 2013-07-18 04:32:42,188 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:42,188 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:42,188 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:42,190 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_000014_0
>  sent hash and receievd reply
> 2013-07-18 04:32:42,277 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_000014_0 decomp: 140715962 len: 
> 140715966 to MEMORY
> 2013-07-18 04:32:42,493 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 140715962 bytes from 
> map-output for attempt_1373902166027_0622_m_000014_0
> 2013-07-18 04:32:42,493 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 140715962, inMemoryMapOutputs.size() -> 4, 
> commitMemory -> 173863446, usedMemory ->314579408
> 2013-07-18 04:32:42,494 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 306s
> 2013-07-18 04:32:43,192 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:43,192 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:43,192 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:43,195 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_000013_0
>  sent hash and receievd reply
> 2013-07-18 04:32:43,280 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_000013_0 decomp: 141243082 len: 
> 141243086 to MEMORY
> 2013-07-18 04:32:43,506 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 141243082 bytes from 
> map-output for attempt_1373902166027_0622_m_000013_0
> 2013-07-18 04:32:43,506 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 141243082, inMemoryMapOutputs.size() -> 5, 
> commitMemory -> 314579408, usedMemory ->455822490
> 2013-07-18 04:32:43,507 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#1 in 315s
> 2013-07-18 04:32:44,195 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:44,195 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 1 to fetcher#1
> 2013-07-18 04:32:44,195 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 1 of 1 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#1
> 2013-07-18 04:32:44,198 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_000011_0
>  sent hash and receievd reply
> 2013-07-18 04:32:44,305 INFO [fetcher#1] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 about to shuffle 
> output of map attempt_1373902166027_0622_m_000011_0 decomp: 173528412 len: 
> 173528416 to MEMORY
> ...
> 2013-07-18 04:32:56,901 INFO [fetcher#2] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 282474777 bytes from 
> map-output for attempt_1373902166027_0622_m_000001_0
> 2013-07-18 04:32:56,901 INFO [fetcher#2] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 282474777, inMemoryMapOutputs.size() -> 5, 
> commitMemory -> 1179552807, usedMemory ->1462027584
> 2013-07-18 04:32:56,901 INFO [fetcher#2] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#2 in 2682s
> 2013-07-18 04:32:56,901 INFO [fetcher#4] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 4 to fetcher#4
> 2013-07-18 04:32:56,902 INFO [fetcher#4] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 4 of 4 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#4
> 2013-07-18 04:32:56,904 INFO [fetcher#4] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_000006_0,attempt_1373902166027_0622_m_000002_0,attempt_1373902166027_0622_m_000003_0,attempt_1373902166027_0622_m_000005_0
>  sent hash and receievd reply
> 2013-07-18 04:32:57,336 INFO [EventFetcher for fetching Map Completion 
> Events] org.apache.hadoop.mapreduce.task.reduce.EventFetcher: 
> attempt_1373902166027_0622_r_000001_0: Got 1 new map-outputs
> 2013-07-18 04:32:57,414 INFO [fetcher#4] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#4 about to shuffle 
> output of map attempt_1373902166027_0622_m_000006_0 decomp: 280156692 len: 
> 280156696 to MEMORY
> 2013-07-18 04:32:57,867 INFO [fetcher#4] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: Read 280156692 bytes from 
> map-output for attempt_1373902166027_0622_m_000006_0
> 2013-07-18 04:32:57,867 INFO [fetcher#4] 
> org.apache.hadoop.mapreduce.task.reduce.MergeManager: closeInMemoryFile -> 
> map-output of size: 280156692, inMemoryMapOutputs.size() -> 6, 
> commitMemory -> 1462027584, usedMemory ->1742184276
> 2013-07-18 04:32:57,900 INFO [fetcher#4] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#4 - MergerManager 
> returned Status.WAIT ...
> 2013-07-18 04:32:57,901 INFO [fetcher#4] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: 
> 101-09-04.sc1.verticloud.com:8080 freed by fetcher#4 in 999s
> 2013-07-18 04:32:57,901 INFO [fetcher#3] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: Assiging 
> 101-09-04.sc1.verticloud.com:8080 with 4 to fetcher#3
> 2013-07-18 04:32:57,901 INFO [fetcher#3] 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler: assigned 4 of 4 to 
> 101-09-04.sc1.verticloud.com:8080 to fetcher#3
> 2013-07-18 04:32:57,903 INFO [fetcher#3] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: for 
> url=8080/mapOutput?job=job_1373902166027_0622&reduce=1&map=attempt_1373902166027_0622_m_000000_0,attempt_1373902166027_0622_m_000002_0,attempt_1373902166027_0622_m_000005_0,attempt_1373902166027_0622_m_000003_0
>  sent hash and receievd reply
> 2013-07-18 04:32:57,904 INFO [fetcher#3] 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#3 - MergerManager 
> returned Status.WAIT ...
> ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to