[jira] [Comment Edited] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-11 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123784#comment-16123784
 ] 

Erik Krogen edited comment on MAPREDUCE-6870 at 8/11/17 6:18 PM:
-

Given its rarity, and given that the worst-case scenario is {{(expected execution 
time) + (single mapper execution time)}}, I would not consider it a severe 
issue, which leans me towards compatibility. However, the current behavior is 
pretty confusing for an average user, so it is a tough call.

We would like to backport this to older release lines, in which case we 
definitely need to maintain compatibility and thus have default = false. As for 
trunk/3.0.0, I am on the fence.
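
To illustrate the compatibility argument, here is a minimal sketch of an opt-in flag whose default is false, so that a job which never sets the option keeps the existing behavior. The class name, the key string and the use of {{java.util.Properties}} are assumptions made for illustration only; they are not taken from the patch or from Hadoop's {{Configuration}} API.

```java
import java.util.Properties;

/** Illustrative only: models an opt-in option with a backward-compatible default. */
final class FinishOptionSketch {
    // Hypothetical key, not the property name used by the actual patch.
    static final String KEY = "example.job.finish-when-reducers-done";

    // default = false keeps the old behavior on backported release lines.
    static final boolean BACKPORT_DEFAULT = false;

    /** Returns the configured value, or the compatible default if the key is unset. */
    static boolean isEnabled(Properties conf) {
        String raw = conf.getProperty(KEY);
        return raw == null ? BACKPORT_DEFAULT : Boolean.parseBoolean(raw.trim());
    }
}
```

With this shape, only the default constant would need to differ between a backported release line and trunk, should trunk decide to enable the behavior by default.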


was (Author: xkrogen):
Given its rarity, and given that the worst-case scenario is {{(expected execution 
time) + (single mapper execution time)}}, I would not consider it a severe 
issue, which leans me towards compatibility. However, the current behavior is 
pretty confusing for an average user, so it is a tough call.

We would like to backport this to older release lines, in which case we 
definitely need to maintain compatibility and thus have default = false. As for 
trunk, I am on the fence.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.6.1
> Reporter: Zhe Zhang
> Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have a config option to finish the job once all reducers are complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-04 Thread Peter Bacsko (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114240#comment-16114240
 ] 

Peter Bacsko edited comment on MAPREDUCE-6870 at 8/4/17 1:49 PM:
-

What's your suggestion for the variable names?
My ideas: {{finishJobWhenReducersDone}}, 
{{MRJobConfig.FINISH_JOB_WHEN_REDUCERS_DONE}}

Preventing {{TA_KILL}} events: basically I just store state information in 
each {{MapTaskImpl}}. But that's unnecessary, since you can store this in a single 
variable after sending the kill events. So your approach is better.

New test: in a real environment, certain events are coming from {{TaskImpl}} 
and {{TaskAttemptImpl}}. However, in the tests, these are mocked inside 
{{JobImpl}}, so you have to generate them manually. To properly test the 
behavior of this change, it might make sense to use the real impl classes 
instead of mocks.

bq. Also, do we expect the job to succeed even when killMappers is set to 
false?

Only if we send the completion events. If we don't, then of course it stays in 
RUNNING. I took the idea of finishing mappers/reducers from this test: 
https://github.com/apache/hadoop/blob/78b487bde175544ebe40e4dafab35569baa1d79e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java#L597-L625
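
The two points above (a single job-level variable to avoid repeated kill events, and the job staying in RUNNING until the remaining map events arrive) can be sketched with a toy model. This is not Hadoop's {{JobImpl}} and makes no claim about how the patch is implemented: {{JobSketch}}, its methods and the kill-request counter are hypothetical names used only for illustration, and only {{finishJobWhenReducersDone}} comes from the naming proposal in this comment.

```java
/** Toy model of the job-level decision; not Hadoop's JobImpl state machine. */
final class JobSketch {
    enum State { RUNNING, SUCCEEDED }

    private final int totalMaps;
    private final int totalReduces;
    private final boolean finishJobWhenReducersDone; // name proposed in this comment
    private int finishedMaps;       // maps that either completed or were killed
    private int completedReduces;
    private boolean mapKillsSent;   // single job-level flag instead of per-map state
    private int killRequests;       // hypothetical counter, kept only for inspection
    private State state = State.RUNNING;

    JobSketch(int totalMaps, int totalReduces, boolean finishJobWhenReducersDone) {
        this.totalMaps = totalMaps;
        this.totalReduces = totalReduces;
        this.finishJobWhenReducersDone = finishJobWhenReducersDone;
    }

    void mapCompleted()    { finishedMaps++;     reevaluate(); }
    void mapKilled()       { finishedMaps++;     reevaluate(); }
    void reduceCompleted() { completedReduces++; reevaluate(); }

    State state()      { return state; }
    int killRequests() { return killRequests; }

    /** Called after every event, which is why the kill step needs a guard. */
    private void reevaluate() {
        if (state != State.RUNNING || completedReduces < totalReduces) {
            return; // already finished, or reducers still running
        }
        if (finishedMaps == totalMaps) {
            state = State.SUCCEEDED;
            return;
        }
        // Reducers are done but some maps are not: request kills at most once.
        if (finishJobWhenReducersDone && !mapKillsSent) {
            killRequests += totalMaps - finishedMaps;
            mapKillsSent = true;
        }
    }
}
```

In this model, a job created with the option disabled stays in RUNNING after the last reducer completes and only reaches SUCCEEDED once the outstanding map completion events are delivered, which mirrors the situation a test has to reproduce by generating those events manually.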


was (Author: pbacsko):
What's your suggestion for the variable names?
My ideas: {{finishJobWhenReducersDone}}, 
{{MRJobConfig.FINISH_JOB_WHEN_REDUCERS_DONE}}

Preventing {{TA_KILL}} events: basically I just store state information in 
each {{MapTaskImpl}}. But that's unnecessary, since you can store this in a single 
variable after sending the kill events. So your approach is better.

New test: in {{TestJobImpl}}, certain events are coming from {{TaskImpl}} and 
{{TaskAttemptImpl}}. However, these are mocked inside {{JobImpl}}, so you have 
to generate them manually. To properly test the behavior of this change, it 
might make sense to use the real impl classes instead of mocks.

bq. Also, do we expect the job to succeed even when killMappers is set to 
false?

Only if we send the completion events. If we don't, then of course it stays in 
RUNNING. I took the idea of finishing mappers/reducers from this test: 
https://github.com/apache/hadoop/blob/78b487bde175544ebe40e4dafab35569baa1d79e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestJobImpl.java#L597-L625

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.6.1
> Reporter: Zhe Zhang
> Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for a long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have a config option to finish the job once all reducers are complete.


