[jira] [Commented] (MAPREDUCE-6957) shuffle hangs after a node manager connection timeout

2017-09-14 Thread Jooseong Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166894#comment-16166894
 ] 

Jooseong Kim commented on MAPREDUCE-6957:
-----------------------------------------

Thank you for the quick review!

> shuffle hangs after a node manager connection timeout
> -----------------------------------------------------
>
> Key: MAPREDUCE-6957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6957
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Jooseong Kim
>Assignee: Jooseong Kim
> Fix For: 2.9.0, 3.0.0-beta1, 2.7.5, 2.8.3
>
> Attachments: MAPREDUCE-6957.001.patch, MAPREDUCE-6957.002.patch, 
> MAPREDUCE-6957.003.patch
>
>
> After a connection failure from the reducer to the node manager, shuffles 
> started to hang with the following message:
> {code}
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager 
> returned status WAIT ...
> {code}
> There are two problems that lead to the hang.
> Problem 1.
> When a reducer has trouble connecting to the node manager, copyFromHost may 
> call putBackKnownMapOutput on the same task attempt multiple times.
> Since MAPREDUCE-6303 there are two call sites of putBackKnownMapOutput in 
> copyFromHost:
> 1. In the finally block of copyFromHost
> 2. In the catch block of openShuffleUrl.
> When openShuffleUrl fails to connect, its catch block puts back all 
> remaining map outputs and returns null.
> So by the time openShuffleUrl returns null, putBackKnownMapOutput has 
> already been called for all remaining map outputs, yet the finally block 
> calls putBackKnownMapOutput on those same map outputs one more time.
> Problem 2. Problem 1 causes a leak in MergeManager.
> The problem occurs when multiple fetchers are given the same set of map 
> attempt outputs to fetch.
> Each of those fetchers reserves memory from MergeManager in 
> Fetcher.copyMapOutput for the same map outputs.
> When the fetches succeed, only the first fetch's output is committed through 
> ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because 
> commit() is gated by !finishedMaps[mapIndex].
> This can leave usedMemory > memoryLimit while commitMemory < mergeThreshold.
> That is a deadlock: a merge is never triggered, yet MergeManager cannot 
> reserve additional space for map outputs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6957) shuffle hangs after a node manager connection timeout

2017-09-13 Thread Jooseong Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jooseong Kim updated MAPREDUCE-6957:

Attachment: MAPREDUCE-6957.003.patch

Fixed checkstyle errors. Missed the errors before :( Sorry about that.

> shuffle hangs after a node manager connection timeout
> -----------------------------------------------------
>
> Key: MAPREDUCE-6957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6957
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Jooseong Kim
>Assignee: Jooseong Kim
> Attachments: MAPREDUCE-6957.001.patch, MAPREDUCE-6957.002.patch, 
> MAPREDUCE-6957.003.patch
>
>
> After a connection failure from the reducer to the node manager, shuffles 
> started to hang with the following message:
> {code}
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager 
> returned status WAIT ...
> {code}
> There are two problems that lead to the hang.
> Problem 1.
> When a reducer has trouble connecting to the node manager, copyFromHost may 
> call putBackKnownMapOutput on the same task attempt multiple times.
> Since MAPREDUCE-6303 there are two call sites of putBackKnownMapOutput in 
> copyFromHost:
> 1. In the finally block of copyFromHost
> 2. In the catch block of openShuffleUrl.
> When openShuffleUrl fails to connect, its catch block puts back all 
> remaining map outputs and returns null.
> So by the time openShuffleUrl returns null, putBackKnownMapOutput has 
> already been called for all remaining map outputs, yet the finally block 
> calls putBackKnownMapOutput on those same map outputs one more time.
> Problem 2. Problem 1 causes a leak in MergeManager.
> The problem occurs when multiple fetchers are given the same set of map 
> attempt outputs to fetch.
> Each of those fetchers reserves memory from MergeManager in 
> Fetcher.copyMapOutput for the same map outputs.
> When the fetches succeed, only the first fetch's output is committed through 
> ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because 
> commit() is gated by !finishedMaps[mapIndex].
> This can leave usedMemory > memoryLimit while commitMemory < mergeThreshold.
> That is a deadlock: a merge is never triggered, yet MergeManager cannot 
> reserve additional space for map outputs.






[jira] [Updated] (MAPREDUCE-6957) shuffle hangs after a node manager connection timeout

2017-09-12 Thread Jooseong Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jooseong Kim updated MAPREDUCE-6957:

Status: Patch Available  (was: In Progress)

> shuffle hangs after a node manager connection timeout
> -----------------------------------------------------
>
> Key: MAPREDUCE-6957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6957
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Jooseong Kim
>Assignee: Jooseong Kim
> Attachments: MAPREDUCE-6957.001.patch, MAPREDUCE-6957.002.patch
>
>
> After a connection failure from the reducer to the node manager, shuffles 
> started to hang with the following message:
> {code}
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager 
> returned status WAIT ...
> {code}
> There are two problems that lead to the hang.
> Problem 1.
> When a reducer has trouble connecting to the node manager, copyFromHost may 
> call putBackKnownMapOutput on the same task attempt multiple times.
> Since MAPREDUCE-6303 there are two call sites of putBackKnownMapOutput in 
> copyFromHost:
> 1. In the finally block of copyFromHost
> 2. In the catch block of openShuffleUrl.
> When openShuffleUrl fails to connect, its catch block puts back all 
> remaining map outputs and returns null.
> So by the time openShuffleUrl returns null, putBackKnownMapOutput has 
> already been called for all remaining map outputs, yet the finally block 
> calls putBackKnownMapOutput on those same map outputs one more time.
> Problem 2. Problem 1 causes a leak in MergeManager.
> The problem occurs when multiple fetchers are given the same set of map 
> attempt outputs to fetch.
> Each of those fetchers reserves memory from MergeManager in 
> Fetcher.copyMapOutput for the same map outputs.
> When the fetches succeed, only the first fetch's output is committed through 
> ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because 
> commit() is gated by !finishedMaps[mapIndex].
> This can leave usedMemory > memoryLimit while commitMemory < mergeThreshold.
> That is a deadlock: a merge is never triggered, yet MergeManager cannot 
> reserve additional space for map outputs.






[jira] [Updated] (MAPREDUCE-6957) shuffle hangs after a node manager connection timeout

2017-09-12 Thread Jooseong Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jooseong Kim updated MAPREDUCE-6957:

Attachment: MAPREDUCE-6957.002.patch

Thank you for the review and for pointing out the other bug, [~jlowe]. I added 
a call to output.abort() for the case where a fetch completes for an 
already-finished map. Please let me know what you think.
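
For illustration, the new branch can be sketched like this (stub MapOutput 
interface and simplified bookkeeping; a sketch of the idea, not the verbatim 
patch):

{code}
// Sketch of the copySucceeded change; MapOutput here is a stub interface.
class CopySucceededSketch {
  interface MapOutput { void commit(); void abort(); }

  private final boolean[] finishedMaps = new boolean[10];

  void copySucceeded(int mapIndex, MapOutput output) {
    if (!finishedMaps[mapIndex]) {
      output.commit();             // first successful fetch wins
      finishedMaps[mapIndex] = true;
    } else {
      // A duplicate fetch completed for an already-finished map. Without this
      // abort, the memory reserved in Fetcher.copyMapOutput is never returned
      // to the MergeManager.
      output.abort();
    }
  }
}
{code}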

> shuffle hangs after a node manager connection timeout
> -----------------------------------------------------
>
> Key: MAPREDUCE-6957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6957
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Jooseong Kim
>Assignee: Jooseong Kim
> Attachments: MAPREDUCE-6957.001.patch, MAPREDUCE-6957.002.patch
>
>
> After a connection failure from the reducer to the node manager, shuffles 
> started to hang with the following message:
> {code}
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager 
> returned status WAIT ...
> {code}
> There are two problems that lead to the hang.
> Problem 1.
> When a reducer has trouble connecting to the node manager, copyFromHost may 
> call putBackKnownMapOutput on the same task attempt multiple times.
> Since MAPREDUCE-6303 there are two call sites of putBackKnownMapOutput in 
> copyFromHost:
> 1. In the finally block of copyFromHost
> 2. In the catch block of openShuffleUrl.
> When openShuffleUrl fails to connect, its catch block puts back all 
> remaining map outputs and returns null.
> So by the time openShuffleUrl returns null, putBackKnownMapOutput has 
> already been called for all remaining map outputs, yet the finally block 
> calls putBackKnownMapOutput on those same map outputs one more time.
> Problem 2. Problem 1 causes a leak in MergeManager.
> The problem occurs when multiple fetchers are given the same set of map 
> attempt outputs to fetch.
> Each of those fetchers reserves memory from MergeManager in 
> Fetcher.copyMapOutput for the same map outputs.
> When the fetches succeed, only the first fetch's output is committed through 
> ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because 
> commit() is gated by !finishedMaps[mapIndex].
> This can leave usedMemory > memoryLimit while commitMemory < mergeThreshold.
> That is a deadlock: a merge is never triggered, yet MergeManager cannot 
> reserve additional space for map outputs.






[jira] [Work started] (MAPREDUCE-6957) shuffle hangs after a node manager connection timeout

2017-09-11 Thread Jooseong Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-6957 started by Jooseong Kim.
-----------------------------------------------
> shuffle hangs after a node manager connection timeout
> -----------------------------------------------------
>
> Key: MAPREDUCE-6957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6957
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Jooseong Kim
>Assignee: Jooseong Kim
> Attachments: MAPREDUCE-6957.001.patch
>
>
> After a connection failure from the reducer to the node manager, shuffles 
> started to hang with the following message:
> {code}
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager 
> returned status WAIT ...
> {code}
> There are two problems that lead to the hang.
> Problem 1.
> When a reducer has trouble connecting to the node manager, copyFromHost may 
> call putBackKnownMapOutput on the same task attempt multiple times.
> Since MAPREDUCE-6303 there are two call sites of putBackKnownMapOutput in 
> copyFromHost:
> 1. In the finally block of copyFromHost
> 2. In the catch block of openShuffleUrl.
> When openShuffleUrl fails to connect, its catch block puts back all 
> remaining map outputs and returns null.
> So by the time openShuffleUrl returns null, putBackKnownMapOutput has 
> already been called for all remaining map outputs, yet the finally block 
> calls putBackKnownMapOutput on those same map outputs one more time.
> Problem 2. Problem 1 causes a leak in MergeManager.
> The problem occurs when multiple fetchers are given the same set of map 
> attempt outputs to fetch.
> Each of those fetchers reserves memory from MergeManager in 
> Fetcher.copyMapOutput for the same map outputs.
> When the fetches succeed, only the first fetch's output is committed through 
> ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because 
> commit() is gated by !finishedMaps[mapIndex].
> This can leave usedMemory > memoryLimit while commitMemory < mergeThreshold.
> That is a deadlock: a merge is never triggered, yet MergeManager cannot 
> reserve additional space for map outputs.






[jira] [Updated] (MAPREDUCE-6957) shuffle hangs after a node manager connection timeout

2017-09-11 Thread Jooseong Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jooseong Kim updated MAPREDUCE-6957:

Attachment: MAPREDUCE-6957.001.patch

The patch removes the call to putBackKnownMapOutput from openShuffleUrl and 
leaves only one call site in copyFromHost.
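
Sketched with stubs (illustrative method bodies, not the actual Fetcher code), 
the post-patch flow concentrates the put-back in one place:

{code}
import java.io.IOException;
import java.util.List;

// Post-patch shape, sketched: openShuffleUrl only signals failure; the
// finally block of copyFromHost is now the single put-back site.
class SingleCallSiteSketch {
  Object openShuffleUrl(String host) {
    try {
      return connect(host); // may throw on a node manager timeout
    } catch (IOException e) {
      return null; // no putBackKnownMapOutput here anymore
    }
  }

  void copyFromHost(String host, List<String> remaining) {
    try {
      if (openShuffleUrl(host) == null) {
        return; // connection failed; fall through to the finally block
      }
      // ... fetch loop removes each entry from 'remaining' as it succeeds ...
    } finally {
      for (String mapId : remaining) {
        putBackKnownMapOutput(mapId); // runs exactly once per leftover output
      }
    }
  }

  Object connect(String host) throws IOException { return new Object(); } // stub
  void putBackKnownMapOutput(String mapId) {}                             // stub
}
{code}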

> shuffle hangs after a node manager connection timeout
> -----------------------------------------------------
>
> Key: MAPREDUCE-6957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6957
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Jooseong Kim
> Attachments: MAPREDUCE-6957.001.patch
>
>
> After a connection failure from the reducer to the node manager, shuffles 
> started to hang with the following message:
> {code}
> org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager 
> returned status WAIT ...
> {code}
> There are two problems that lead to the hang.
> Problem 1.
> When a reducer has trouble connecting to the node manager, copyFromHost may 
> call putBackKnownMapOutput on the same task attempt multiple times.
> Since MAPREDUCE-6303 there are two call sites of putBackKnownMapOutput in 
> copyFromHost:
> 1. In the finally block of copyFromHost
> 2. In the catch block of openShuffleUrl.
> When openShuffleUrl fails to connect, its catch block puts back all 
> remaining map outputs and returns null.
> So by the time openShuffleUrl returns null, putBackKnownMapOutput has 
> already been called for all remaining map outputs, yet the finally block 
> calls putBackKnownMapOutput on those same map outputs one more time.
> Problem 2. Problem 1 causes a leak in MergeManager.
> The problem occurs when multiple fetchers are given the same set of map 
> attempt outputs to fetch.
> Each of those fetchers reserves memory from MergeManager in 
> Fetcher.copyMapOutput for the same map outputs.
> When the fetches succeed, only the first fetch's output is committed through 
> ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because 
> commit() is gated by !finishedMaps[mapIndex].
> This can leave usedMemory > memoryLimit while commitMemory < mergeThreshold.
> That is a deadlock: a merge is never triggered, yet MergeManager cannot 
> reserve additional space for map outputs.






[jira] [Created] (MAPREDUCE-6957) shuffle hangs after a node manager connection timeout

2017-09-11 Thread Jooseong Kim (JIRA)
Jooseong Kim created MAPREDUCE-6957:
-------------------------------------

 Summary: shuffle hangs after a node manager connection timeout
 Key: MAPREDUCE-6957
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6957
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Jooseong Kim


After a connection failure from the reducer to the node manager, shuffles 
started to hang with the following message:

{code}
org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#1 - MergeManager 
returned status WAIT ...
{code}

There are two problems that lead to the hang.

Problem 1.
When a reducer has trouble connecting to the node manager, copyFromHost may 
call putBackKnownMapOutput on the same task attempt multiple times.

Since MAPREDUCE-6303 there are two call sites of putBackKnownMapOutput in 
copyFromHost:
1. In the finally block of copyFromHost
2. In the catch block of openShuffleUrl.

When openShuffleUrl fails to connect, its catch block puts back all remaining 
map outputs and returns null.
So by the time openShuffleUrl returns null, putBackKnownMapOutput has already 
been called for all remaining map outputs, yet the finally block calls 
putBackKnownMapOutput on those same map outputs one more time.
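
A minimal, self-contained sketch of that pre-patch flow (stub types and a toy 
attempt ID, not Hadoop's actual classes) shows the same output being put back 
twice:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only; stands in for Fetcher/ShuffleSchedulerImpl.
class DoublePutBackSketch {
  static void putBackKnownMapOutput(String mapId) {
    System.out.println("put back " + mapId);
  }

  // Mirrors openShuffleUrl: on a connect failure, the catch block puts back
  // every remaining map output and returns null.
  static Object openShuffleUrl(List<String> remaining) {
    try {
      throw new IOException("connect timed out"); // simulate the NM timeout
    } catch (IOException e) {
      for (String mapId : remaining) {
        putBackKnownMapOutput(mapId); // first put-back
      }
      return null;
    }
  }

  // Mirrors copyFromHost: its finally block puts the outputs back again.
  static void copyFromHost(List<String> remaining) {
    try {
      if (openShuffleUrl(remaining) == null) {
        return; // outputs were already put back above
      }
      // ... fetch loop elided ...
    } finally {
      for (String mapId : remaining) {
        putBackKnownMapOutput(mapId); // second put-back: the bug
      }
    }
  }

  public static void main(String[] args) {
    List<String> remaining = new ArrayList<>();
    remaining.add("attempt_1505174729305_0001_m_000007_0"); // toy ID
    copyFromHost(remaining);
    // Prints the put-back twice: the attempt can now go to two fetchers.
  }
}
{code}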

Problem 2. Problem 1 causes a leak in MergeManager.
The problem occurs when multiple fetchers are given the same set of map 
attempt outputs to fetch.
Each of those fetchers reserves memory from MergeManager in 
Fetcher.copyMapOutput for the same map outputs.
When the fetches succeed, only the first fetch's output is committed through 
ShuffleSchedulerImpl.copySucceeded -> InMemoryMapOutput.commit, because 
commit() is gated by !finishedMaps[mapIndex].
This can leave usedMemory > memoryLimit while commitMemory < mergeThreshold.
That is a deadlock: a merge is never triggered, yet MergeManager cannot 
reserve additional space for map outputs.
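
With toy numbers (all invented), the accounting dead end looks like this; the 
class below is a stripped-down model of the byte accounting, not 
MergeManagerImpl itself:

{code}
// Toy model of the MergeManager accounting involved in the deadlock.
class MergeAccountingSketch {
  static final long MEMORY_LIMIT = 100;    // cap on in-flight shuffle bytes
  static final long MERGE_THRESHOLD = 90;  // commitMemory needed to merge
  static long usedMemory = 0;              // bumped by every reserve()
  static long commitMemory = 0;            // bumped only by commit()

  // Once usedMemory exceeds the limit, fetchers are told to WAIT.
  static boolean reserve(long size) {
    if (usedMemory > MEMORY_LIMIT) {
      return false; // "MergeManager returned status WAIT ..."
    }
    usedMemory += size;
    return true;
  }

  static void commit(long size) {
    commitMemory += size; // a merge fires once commitMemory >= MERGE_THRESHOLD
  }

  public static void main(String[] args) {
    reserve(60); // fetcher A reserves a 60-byte map output
    reserve(60); // fetcher B reserves the SAME output (the Problem 1 duplicate)
    commit(60);  // only A commits; B is gated by !finishedMaps[mapIndex]
    // usedMemory = 120 > 100 while commitMemory = 60 < 90: no merge will ever
    // fire to free memory, and every further reserve() returns WAIT.
    System.out.println("used=" + usedMemory + ", committed=" + commitMemory);
  }
}
{code}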






[jira] [Commented] (MAPREDUCE-6302) Preempt reducers after a configurable timeout irrespective of headroom

2015-10-30 Thread Jooseong Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983656#comment-14983656
 ] 

Jooseong Kim commented on MAPREDUCE-6302:
-----------------------------------------

I think this usually happens when the RM sends out an overestimated headroom.
One thing we could do is skip scheduleReduces() if we ended up preempting 
reducers through preemptReducesIfNeeded().
Since the headroom is overestimated, scheduleReduces may otherwise schedule 
more reducers, which would then need to be preempted again.
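
A hypothetical sketch of that ordering (the boolean return on 
preemptReducesIfNeeded and the wiring are my assumptions, not 
RMContainerAllocator's actual API):

{code}
// Illustrative heartbeat wiring only; stub methods, invented signature.
class HeartbeatSketch {
  boolean preemptReducesIfNeeded() { return true; } // stub: did we preempt?
  void scheduleReduces() {}                         // stub

  void heartbeat() {
    if (!preemptReducesIfNeeded()) {
      // Only ask for more reducers when nothing was just preempted; with an
      // overestimated headroom, scheduling here would recreate the problem.
      scheduleReduces();
    }
  }
}
{code}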

> Preempt reducers after a configurable timeout irrespective of headroom
> ----------------------------------------------------------------------
>
> Key: MAPREDUCE-6302
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6302
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: mai shurong
>Assignee: Karthik Kambatla
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: AM_log_head10.txt.gz, AM_log_tail10.txt.gz, 
> log.txt, mr-6302-1.patch, mr-6302-2.patch, mr-6302-3.patch, mr-6302-4.patch, 
> mr-6302-5.patch, mr-6302-6.patch, mr-6302-7.patch, mr-6302-prelim.patch, 
> mr-6302_branch-2.patch, queue_with_max163cores.png, 
> queue_with_max263cores.png, queue_with_max333cores.png
>
>
> I submitted a big job, with 500 maps and 350 reduces, to a fair-scheduler 
> queue with a 300-core maximum. Once the job's maps were 100% complete, its 
> 300 running reduces occupied all 300 cores of the queue. Then a map failed 
> and was retried, waiting for a core, while the 300 reduces were waiting for 
> the failed map to finish, so a deadlock occurred. As a result, the job was 
> blocked, and later jobs in the queue could not run because the queue had no 
> available cores.
> I think there is a similar issue for the memory limit of a queue.
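
As a toy restatement of the arithmetic behind the quoted deadlock (numbers 
taken from the description, class invented for illustration):

{code}
// Toy check: every core is held by a reducer, so the retried map starves.
class QueueDeadlockSketch {
  public static void main(String[] args) {
    int maxCores = 300;
    int runningReduces = 300;                           // all cores occupied
    int coresForRetriedMap = maxCores - runningReduces; // = 0
    // The retried map waits for a core; the reducers wait for the map.
    System.out.println("cores free for the retried map: " + coresForRetriedMap);
  }
}
{code}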



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)