[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate

2018-08-28 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595181#comment-16595181
 ] 

Erik Krogen commented on MAPREDUCE-7131:


[~pbacsko], we are seeing the issue in 2.7.4, and MAPREDUCE-7015 only goes as 
far back as 2.10, so it should not be the cause.

> Job History Server has race condition where it moves files from intermediate 
> to finished but thinks file is in intermediate
> ---
>
> Key: MAPREDUCE-7131
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7131
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.4
>Reporter: Anthony Hsu
>Assignee: Anthony Hsu
>Priority: Major
>
> This is the race condition that can occur:
> # during the first *scanIntermediateDirectory()*, 
> *HistoryFileInfo.moveToDone()* is scheduled for job j1
> # during the second *scanIntermediateDirectory()*, j1 is found again and put 
> in the *fileStatusList* to process
> # *HistoryFileInfo.moveToDone()* is processed in another thread and history 
> files are moved to the finished directory
> # the *HistoryFileInfo* for j1 is removed from *jobListCache*
> # the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 
> is created (history, conf, and summary files will point to the intermediate 
> user directory, and state will be IN_INTERMEDIATE)
> # *moveToDone()* is scheduled for this new j1
> # *moveToDone()* fails during *moveToDoneNow()* for the history file because 
> the source path in the intermediate directory does not exist
> From this point on, while the new j1 *HistoryFileInfo* is in the 
> *jobListCache*, the JobHistoryServer will think the history file is in the 
> intermediate directory. If a user queries this job in the JobHistoryServer 
> UI, they will get:
> {code}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load 
> history file 
> ://:/mr-history/intermediate//job_1529348381246_27275711-1535123223269---1535127026668-1-0-SUCCEEDED--1535126980787.jhist
> {code}
> Noticed this issue while running 2.7.4, but the race condition seems to still 
> exist in trunk.
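
To make the interleaving concrete, here is a toy model of the sequence above 
(hypothetical simplified Java, not the actual HistoryFileManager code; a cache 
lookup stands in for the real scheduling and file moves):

{code:title=HistoryFileRaceSketch.java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the race: two scans observe j1 before the first move
// completes, so processing rebuilds stale IN_INTERMEDIATE state for a
// job whose files already live in the done directory.
public class HistoryFileRaceSketch {
  enum State { IN_INTERMEDIATE, IN_DONE }

  static final Map<String, State> jobListCache = new ConcurrentHashMap<>();

  public static void main(String[] args) {
    // Scan 1: j1 found in intermediate; moveToDone() is scheduled.
    jobListCache.put("j1", State.IN_INTERMEDIATE);

    // Scan 2: j1 is still visible, so it is queued again in fileStatusList.
    boolean queuedAgain = jobListCache.containsKey("j1");

    // Move thread runs: files are moved to done and the cache entry removed.
    jobListCache.remove("j1");

    // fileStatusList processing: the cache miss causes a fresh entry to be
    // created as IN_INTERMEDIATE, even though the files are already in done.
    if (queuedAgain && !jobListCache.containsKey("j1")) {
      jobListCache.put("j1", State.IN_INTERMEDIATE); // stale state
    }

    // The rescheduled moveToDone() then fails in moveToDoneNow() because the
    // intermediate source path no longer exists, and the stale entry remains.
    System.out.println("j1 state: " + jobListCache.get("j1"));
  }
}
{code}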






[jira] [Commented] (MAPREDUCE-7118) Distributed cache conflicts break backwards compatibility

2018-07-24 Thread Erik Krogen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554470#comment-16554470
 ] 

Erik Krogen commented on MAPREDUCE-7118:


[~leftnoteasy], we should put this in branch-3.0 as well, right?

> Distributed cache conflicts break backwards compatibility
> --
>
> Key: MAPREDUCE-7118
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7118
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 3.1.0, 3.2.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 3.2.0, 3.1.1
>
> Attachments: MAPREDUCE-7118.001.patch
>
>
> MAPREDUCE-4503 made distributed cache conflicts break job submission, but 
> this was quickly downgraded to a warning in MAPREDUCE-4549. Unfortunately 
> the latter did not go into trunk, so the fix is only in 0.23 and 2.x. When 
> Oozie, Pig, and other downstream projects that can occasionally generate 
> distributed cache conflicts move to Hadoop 3.x, the workflows that used to 
> work on 0.23 and 2.x no longer function.






[jira] [Resolved] (MAPREDUCE-6918) ShuffleMetrics.ShuffleConnections Gauge Metric Climbs Infinitely

2017-10-27 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen resolved MAPREDUCE-6918.

Resolution: Duplicate

> ShuffleMetrics.ShuffleConnections Gauge Metric Climbs Infinitely
> 
>
> Key: MAPREDUCE-6918
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6918
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Erik Krogen
>
> We recently noticed that the {{mapred.ShuffleMetrics.ShuffleConnections}} 
> metric seems to climb indefinitely, up to many millions (see attached graph), 
> despite supposedly being a gauge of the number of currently open connections:
> {code:title=ShuffleHandler.java}
> @Metric("# of current shuffle connections")
> MutableGaugeInt shuffleConnections;
> {code}
> It seems that shuffleConnections gets incremented once for every map fetched 
> but only decremented once for every request. It seems to me it should instead 
> be incremented once per request rather than once per map fetched, but I'm not 
> familiar with the original intent.
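
For illustration, a minimal sketch of the direction proposed above: increment 
the gauge once per request and decrement once per request. The hook names here 
are hypothetical; the real {{ShuffleHandler}} would wire this into its Netty 
channel handling.

{code:title=ShuffleGaugeSketch.java}
// Minimal sketch with hypothetical hook names; not the real ShuffleHandler.
class ShuffleGaugeSketch {
  private int shuffleConnections; // stands in for MutableGaugeInt

  void onRequestAccepted() {
    shuffleConnections++; // increment once per request...
  }

  void onMapFileSent() {
    // ...and not here: incrementing per map fetched is what lets the gauge
    // drift upward by (mapsFetched - 1) for every multi-map request.
  }

  void onRequestComplete() {
    shuffleConnections--; // the single decrement now matches the increment
  }
}
{code}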






[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-09-11 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated MAPREDUCE-6870:
---
Release Note: Enables {{mapreduce.job.finish-when-all-reducers-done}} by 
default. With this enabled, a MapReduce job will complete as soon as all of its 
reducers are complete, even if some mappers are still running. This can occur 
if a mapper was relaunched after node failure but the relaunched task's output 
is not actually needed. Previously the job would wait for all mappers to 
complete.
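
As a usage note (a minimal sketch; only the property name is taken from the 
release note above, the rest is ordinary job setup), a job that depends on 
mappers running to completion can opt back out:

{code:title=SubmitExample.java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Restore the old behavior (wait for all mappers) for this job only.
    conf.setBoolean("mapreduce.job.finish-when-all-reducers-done", false);
    Job job = Job.getInstance(conf, "wait-for-all-mappers-example");
    // ... set mapper/reducer classes and input/output paths, then
    // job.waitForCompletion(true).
  }
}
{code}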

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete but then run for a long time, 
> even after all reducers are complete. This could hurt the performance of 
> large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.






[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-09-11 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162015#comment-16162015
 ] 

Erik Krogen commented on MAPREDUCE-6870:


Good idea, [~andrew.wang], thanks for the reminder. Done.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete but then run for a long time, 
> even after all reducers are complete. This could hurt the performance of 
> large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.






[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-09-11 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated MAPREDUCE-6870:
---
Release Note: Enables mapreduce.job.finish-when-all-reducers-done by 
default. With this enabled, a MapReduce job will complete as soon as all of its 
reducers are complete, even if some mappers are still running. This can occur 
if a mapper was relaunched after node failure but the relaunched task's output 
is not actually needed. Previously the job would wait for all mappers to 
complete.  (was: Enables {{mapreduce.job.finish-when-all-reducers-done}} by 
default. With this enabled, a MapReduce job will complete as soon as all of its 
reducers are complete, even if some mappers are still running. This can occur 
if a mapper was relaunched after node failure but the relaunched task's output 
is not actually needed. Previously the job would wait for all mappers to 
complete.)

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete but then run for a long time, 
> even after all reducers are complete. This could hurt the performance of 
> large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.






[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-09-11 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161910#comment-16161910
 ] 

Erik Krogen commented on MAPREDUCE-6937:


Big thanks to [~pbacsko] and [~haibo.chen] for working on this and helping us 
to backport! It is much appreciated.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 2.9.0, 2.8.2, 2.7.5
>
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.






[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-22 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137053#comment-16137053
 ] 

Erik Krogen commented on MAPREDUCE-6937:


Those branches are where we would be interested in seeing it available, yes.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Erik Krogen
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.






[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-22 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated MAPREDUCE-6937:
---
Description: 
To maintain compatibility we need to disable this by default per discussion on 
MAPREDUCE-6870.

Using a separate JIRA to correctly track incompatibilities.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Erik Krogen
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.






[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-14 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125974#comment-16125974
 ] 

Erik Krogen commented on MAPREDUCE-6937:


Hey [~pbacsko]/[~haibo.chen], any interest in backporting to older release 
lines? Looks like branch-2 is clean and 2.8/2.7 have very minor conflicts.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Erik Krogen
>







[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-11 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123993#comment-16123993
 ] 

Erik Krogen commented on MAPREDUCE-6870:


Sounds good; I am in agreement. Since this should be marked as incompatible but 
the backport should not, shall I create a separate JIRA for the backport, so 
that we can have the release scripts properly track incompatibility?

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete but then run for a long time, 
> even after all reducers are complete. This could hurt the performance of 
> large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.






[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-11 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123784#comment-16123784
 ] 

Erik Krogen commented on MAPREDUCE-6870:


Given its rarity, and that the worst-case scenario is {{(expected execution 
time) + (single mapper execution time)}}, I would not consider it a severe 
issue, which leans me towards compatibility. However, the current behavior is 
pretty confusing for an average user, so it's a tough call.

We would like to backport this to older release lines, in which case we 
definitely need to maintain compatibility and thus have default = false. As for 
trunk, I am on the fence.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete but then run for a long time, 
> even after all reducers are complete. This could hurt the performance of 
> large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.






[jira] [Comment Edited] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-11 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123784#comment-16123784
 ] 

Erik Krogen edited comment on MAPREDUCE-6870 at 8/11/17 6:18 PM:
-

Given its rarity, and that the worst-case scenario is {{(expected execution 
time) + (single mapper execution time)}}, I would not consider it a severe 
issue, which leans me towards compatibility. However, the current behavior is 
pretty confusing for an average user, so it's a tough call.

We would like to backport this to older release lines, in which case we 
definitely need to maintain compatibility and thus have default = false. As for 
trunk/3.0.0, I am on the fence.


was (Author: xkrogen):
Given its rarity, and that the worst-case scenario is {{(expected execution 
time) + (single mapper execution time)}}, I would not consider it a severe 
issue, which leans me towards compatibility. However, the current behavior is 
pretty confusing for an average user, so it's a tough call.

We would like to backport this to older release lines, in which case we 
definitely need to maintain compatibility and thus have default = false. As for 
trunk, I am on the fence.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete but then run for a long time, 
> even after all reducers are complete. This could hurt the performance of 
> large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.






[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-11 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123493#comment-16123493
 ] 

Erik Krogen commented on MAPREDUCE-6870:


[~haibo.chen], [~pbacsko], thank you for working on this! To provide some 
context, the reason we wanted it to be configurable is in case mapper tasks 
have side effects which are expected to be executed in full. For example, you 
may have a map task which deletes an output directory as it starts, then 
populates that directory. With this patch in effect, you could potentially wipe 
the output of a previous map task's execution and then never fully repopulate 
it (since the mapper is preempted). It's a pretty niche case, but who knows what 
MR behavior people might be relying on.

Given that this patch enables the new behavior by default, should it be 
marked as an incompatible change? Ping [~templedf], [~andrew.wang], who I know 
are working on compatibility guidelines.
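
To make the side-effect scenario concrete, here is a purely hypothetical sketch 
of such a mapper (illustrative only; the path and class are invented, not from 
any real workload or patch):

{code:title=SideEffectMapper.java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SideEffectMapper extends Mapper<LongWritable, Text, Text, Text> {
  // Hypothetical side-effect location, purely for illustration.
  private static final Path SIDE_OUTPUT = new Path("/tmp/side-output");

  @Override
  protected void setup(Context context)
      throws IOException, InterruptedException {
    FileSystem fs = FileSystem.get(context.getConfiguration());
    fs.delete(SIDE_OUTPUT, true); // side effect: wipe the previous run's output
    fs.mkdirs(SIDE_OUTPUT);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // ... repopulate SIDE_OUTPUT here. If the job is declared complete while
    // this relaunched task is mid-run, the wipe above has already happened
    // but the repopulation never finishes.
  }
}
{code}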

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete but then run for a long time, 
> even after all reducers are complete. This could hurt the performance of 
> large MR jobs.
> In some cases, mappers don't have any materializable outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.






[jira] [Updated] (MAPREDUCE-6919) ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely

2017-07-25 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated MAPREDUCE-6919:
---
Attachment: MAPREDUCE-6919.test.patch

Attaching a unit test which reproduces the issue. Had to refactor 
{{ShuffleHandler.Shuffle}} a little to get the test to work nicely.

> ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely
> ---
>
> Key: MAPREDUCE-6919
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6919
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Erik Krogen
> Attachments: mapred_ShuffleMetrics_ShuffleConnections.png, 
> MAPREDUCE-6919.test.patch
>
>
> We recently noticed that the mapred.ShuffleMetrics.ShuffleConnections metric 
> rises indefinitely (see attached graph), despite supposedly being a gauge 
> measuring the number of currently open connections:
> {code:title=ShuffleHandler.java}
> @Metric("# of current shuffle connections")
> MutableGaugeInt shuffleConnections;
> {code}
> It seems this is because the metric is incremented once for each map file 
> sent, but decremented once for each request. Thus a request which fetches 
> multiple map files permanently increments shuffleConnections by (mapsFetched 
> - 1).






[jira] [Updated] (MAPREDUCE-6919) ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely

2017-07-25 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated MAPREDUCE-6919:
---
Attachment: mapred_ShuffleMetrics_ShuffleConnections.png

> ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely
> ---
>
> Key: MAPREDUCE-6919
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6919
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Erik Krogen
> Attachments: mapred_ShuffleMetrics_ShuffleConnections.png
>
>
> We recently noticed that the mapred.ShuffleMetrics.ShuffleConnections metric 
> rises indefinitely (see attached graph), despite supposedly being a gauge 
> measuring the number of currently open connections:
> {code:title=ShuffleHandler.java}
> @Metric("# of current shuffle connections")
> MutableGaugeInt shuffleConnections;
> {code}
> It seems this is because the metric is incremented once for each map file 
> sent, but decremented once for each request. Thus a request which fetches 
> multiple map files permanently increments shuffleConnections by (mapsFetched 
> - 1).






[jira] [Created] (MAPREDUCE-6919) ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely

2017-07-25 Thread Erik Krogen (JIRA)
Erik Krogen created MAPREDUCE-6919:
--

 Summary: ShuffleMetrics.ShuffleConnections Gauge Metric Rises 
Infinitely
 Key: MAPREDUCE-6919
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6919
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Erik Krogen


We recently noticed that the mapred.ShuffleMetrics.ShuffleConnections metric 
rises indefinitely (see attached graph), despite supposedly being a gauge 
measuring the number of currently open connections:
{code:title=ShuffleHandler.java}
@Metric("# of current shuffle connections")
MutableGaugeInt shuffleConnections;
{code}

It seems this is because the metric is incremented once for each map file sent, 
but decremented once for each request. Thus a request which fetches multiple 
map files permanently increments shuffleConnections by (mapsFetched - 1).






[jira] [Created] (MAPREDUCE-6918) ShuffleMetrics.ShuffleConnections Gauge Metric Climbs Infinitely

2017-07-25 Thread Erik Krogen (JIRA)
Erik Krogen created MAPREDUCE-6918:
--

 Summary: ShuffleMetrics.ShuffleConnections Gauge Metric Climbs 
Infinitely
 Key: MAPREDUCE-6918
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6918
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Erik Krogen


We recently noticed that the {{mapred.ShuffleMetrics.ShuffleConnections}} 
metric seems to climb indefinitely, up to many millions (see attached graph), 
despite supposedly being a gauge of the number of currently open connections:
{code:title=ShuffleHandler.java}
@Metric("# of current shuffle connections")
MutableGaugeInt shuffleConnections;
{code}

It seems that shuffleConnections gets incremented once for every map fetched 
but only decremented once for every request. It seems to me it should instead 
be incremented once per request rather than once per map fetched, but I'm not 
familiar with the original intent.






[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

2017-04-28 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989035#comment-15989035
 ] 

Erik Krogen commented on MAPREDUCE-5951:


Ah, excellent point, [~jlowe]... I actually would love to hear the reasoning 
behind the current strategy (AM downloads resource -> AM uploads resource to 
SCM) rather than the seemingly more obvious/simpler approach of having the 
client upload the resource to the SCM directly. Is this so that the uploading 
to SCM can be done by the NM, which is a privileged user, to have more secure 
control over it?

[~ctrezzo], first off thanks for getting back so quickly! And for the pointer 
to YARN-5727; that's an interesting issue. The public visibility solution is 
certainly simpler from the YARN side and seems pretty reasonable in terms of 
the burden it places on an application ("you want a publicly shared resource? 
put it somewhere public"). It doesn't add _too_ much complexity on the MR side, 
though having a separate staging directory just for public resources is a bit 
cumbersome. It also means that other application developers will have to build 
the same type of logic; in general I would lean towards pushing more logic into 
the YARN layer so that it is easy for application devs to support. I don't have 
good insight into how difficult your initially proposed solution in YARN-5727 
would be to implement, though.

> Add support for the YARN Shared Cache
> -
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-Overview.001.pdf, 
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, 
> MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, 
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, 
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, 
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, 
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, 
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, 
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).






[jira] [Comment Edited] (MAPREDUCE-5951) Add support for the YARN Shared Cache

2017-04-27 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987585#comment-15987585
 ] 

Erik Krogen edited comment on MAPREDUCE-5951 at 4/27/17 8:22 PM:
-

Hey [~ctrezzo], I have a question about the behavior of this patch. Currently 
the old logic for resource visibility is used, so if a resource is 
world-readable, it will be marked as PUBLIC, else PRIVATE. Given my current 
understanding of this patch's behavior, I see the following scenario:
* Client submits a job with libjar X, which has never been used before. Client 
contacts SCM to mark X as "used", SCM responds that it does not have X.
* Client uploads X to staging directory, which I assume here is _not_ 
world-readable. X is marked as PRIVATE.
* MR-AM localizes X, then uploads it to the shared cache. Other NMs all 
localize X as PRIVATE and do not share it with other applications.
* Client then submits the same job with the same X. Client contacts SCM, and 
SCM responds with a world-readable (755 dirs / 555 file) path inside of the 
shared cache.
* Client does not upload X, and marks X as PUBLIC, since it is currently in a 
world-readable location. 
* MR-AM and NMs all localize X as PUBLIC and share it with other applications.

Please correct me if I am wrong on any of these steps. It seems that it is the 
expected behavior that X is eventually PUBLIC, given that we asked for it to be 
uploaded to the publicly shared cache, but it seems unnecessary for it to be 
marked as PRIVATE the first time around. Do we do this just to avoid changing 
the existing logic for marking a resource as PRIVATE vs PUBLIC, is this an 
oversight, or is this behavior desired?


was (Author: xkrogen):
Hey [~ctrezzo], I have a question about the behavior of this patch. Currently 
the old logic for resource visibility is used, so if a resource is 
world-readable, it will be marked as PUBLIC, else PRIVATE. Given my current 
understanding of this patch's behavior, I see the following scenario:
* Client submits a job with libjar X, which has never been used before. Client 
contacts SCM to mark X as "used", SCM responds that it does not have X.
* Client uploads X to staging directory, which I assume here is _not_ 
world-readable. X is marked as PRIVATE.
* MR-AM localizes X, then uploads it to the shared cache. Other NMs all 
localize X as PRIVATE and do not share it with other applications.
* Client then submits the same job with the same X. Client contacts SCM, and 
SCM responds with a world-readable (755 dirs / 555 file) path inside of the 
shared cache.
* Client does not upload X, and marks X as PUBLIC, since it is currently in a 
world-readable location. 
* MR-AM and NMs all localize X as PUBLIC and share it with other applications.
Please correct me if I am wrong on any of these steps. It seems that it is the 
expected behavior that X is eventually PUBLIC, given that we asked for it to be 
uploaded to the publicly shared cache, but it seems unnecessary for it to be 
marked as PRIVATE the first time around. Do we do this just to avoid changing 
the existing logic for marking a resource as PRIVATE vs PUBLIC, is this an 
oversight, or is this behavior desired?

> Add support for the YARN Shared Cache
> -
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-Overview.001.pdf, 
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, 
> MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, 
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, 
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, 
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, 
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, 
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, 
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).






[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache

2017-04-27 Thread Erik Krogen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987585#comment-15987585
 ] 

Erik Krogen commented on MAPREDUCE-5951:


Hey [~ctrezzo], I have a question about the behavior of this patch. Currently 
the old logic for resource visibility is used, so if a resource is 
world-readable, it will be marked as PUBLIC, else PRIVATE. Given my current 
understanding of this patch's behavior, I see the following scenario:
* Client submits a job with libjar X, which has never been used before. Client 
contacts SCM to mark X as "used", SCM responds that it does not have X.
* Client uploads X to staging directory, which I assume here is _not_ 
world-readable. X is marked as PRIVATE.
* MR-AM localizes X, then uploads it to the shared cache. Other NMs all 
localize X as PRIVATE and do not share it with other applications.
* Client then submits the same job with the same X. Client contacts SCM, and 
SCM responds with a world-readable (755 dirs / 555 file) path inside of the 
shared cache.
* Client does not upload X, and marks X as PUBLIC, since it is currently in a 
world-readable location. 
* MR-AM and NMs all localize X as PUBLIC and share it with other applications.
Please correct me if I am wrong on any of these steps. It seems that it is the 
expected behavior that X is eventually PUBLIC, given that we asked for it to be 
uploaded to the publicly shared cache, but it seems unnecessary for it to be 
marked as PRIVATE the first time around. Do we do this just to avoid changing 
the existing logic for marking a resource as PRIVATE vs PUBLIC, is this an 
oversight, or is this behavior desired?
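
For reference, a rough sketch of the kind of visibility determination described 
in the first bullets, under the simplifying assumption that only the file's own 
permission bits are consulted (the real client-side check also requires 
ancestor directories to be world-executable):

{code:title=VisibilitySketch.java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;

public class VisibilitySketch {
  enum Visibility { PUBLIC, PRIVATE }

  // Simplified: PUBLIC iff the file itself is world-readable.
  static Visibility visibilityOf(FileSystem fs, Path p) throws IOException {
    FileStatus status = fs.getFileStatus(p);
    boolean worldReadable =
        status.getPermission().getOtherAction().implies(FsAction.READ);
    return worldReadable ? Visibility.PUBLIC : Visibility.PRIVATE;
  }
}
{code}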

> Add support for the YARN Shared Cache
> -
>
> Key: MAPREDUCE-5951
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-5951-Overview.001.pdf, 
> MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, 
> MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, 
> MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, 
> MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, 
> MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, 
> MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, 
> MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, 
> MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, 
> MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, 
> MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can 
> leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify 
> which set of resources they would like to cache (i.e. jobjar, libjars, 
> archives, files).






[jira] [Updated] (MAPREDUCE-6873) MR Job Submission Fails if MR framework application path not on defaultFS

2017-03-29 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated MAPREDUCE-6873:
---
Attachment: MAPREDUCE-6873.000.patch

Attaching a one-liner patch...

> MR Job Submission Fails if MR framework application path not on defaultFS
> -
>
> Key: MAPREDUCE-6873
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6873
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.6.0
>Reporter: Erik Krogen
>Priority: Minor
> Attachments: MAPREDUCE-6873.000.patch
>
>
> {{JobSubmitter#addMRFrameworkPathToDistributedCache()}} assumes that 
> {{mapreduce.application.framework.path}} is on a filesystem matching 
> {{fs.defaultFS}}, which may not always be true. This is just a consequence of 
> using {{FileSystem.get(Configuration)}} instead of {{FileSystem.get(URI, 
> Configuration)}}.






[jira] [Updated] (MAPREDUCE-6873) MR Job Submission Fails if MR framework application path not on defaultFS

2017-03-29 Thread Erik Krogen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen updated MAPREDUCE-6873:
---
Status: Patch Available  (was: Open)

> MR Job Submission Fails if MR framework application path not on defaultFS
> -
>
> Key: MAPREDUCE-6873
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6873
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.6.0
>Reporter: Erik Krogen
>Priority: Minor
> Attachments: MAPREDUCE-6873.000.patch
>
>
> {{JobSubmitter#addMRFrameworkPathToDistributedCache()}} assumes that 
> {{mapreduce.application.framework.path}} is on a filesystem matching 
> {{fs.defaultFS}}, which may not always be true. This is just a consequence of 
> using {{FileSystem.get(Configuration)}} instead of {{FileSystem.get(URI, 
> Configuration)}}.






[jira] [Created] (MAPREDUCE-6873) MR Job Submission Fails if MR framework application path not on defaultFS

2017-03-29 Thread Erik Krogen (JIRA)
Erik Krogen created MAPREDUCE-6873:
--

 Summary: MR Job Submission Fails if MR framework application path 
not on defaultFS
 Key: MAPREDUCE-6873
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6873
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Erik Krogen
Priority: Minor


{{JobSubmitter#addMRFrameworkPathToDistributedCache()}} assumes that 
{{mapreduce.application.framework.path}} is on a filesystem matching 
{{fs.defaultFS}}, which may not always be true. This is just a consequence of 
using {{FileSystem.get(Configuration)}} instead of {{FileSystem.get(URI, 
Configuration)}}.
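
A minimal sketch of the shape such a fix could take (assumed code illustrating 
the diagnosis above; see MAPREDUCE-6873.000.patch for the actual change):

{code:title=FrameworkPathSketch.java}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FrameworkPathSketch {
  static FileSystem frameworkFileSystem(URI frameworkUri, Configuration conf)
      throws IOException {
    // Buggy shape: always resolves against fs.defaultFS, regardless of the
    // scheme on the framework path:
    //   FileSystem fs = FileSystem.get(conf);

    // Fixed shape: resolve against the filesystem named by the URI itself.
    return FileSystem.get(frameworkUri, conf);
  }
}
{code}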


