[jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate
[ https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16595181#comment-16595181 ]

Erik Krogen commented on MAPREDUCE-7131:
----------------------------------------

[~pbacsko], we are seeing the issue in 2.7.4, and MAPREDUCE-7015 goes back only as far as 2.10, so it should not be the cause.

> Job History Server has race condition where it moves files from intermediate
> to finished but thinks file is in intermediate
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7131
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7131
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.7.4
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>            Priority: Major
>
> This is the race condition that can occur:
> # during the first *scanIntermediateDirectory()*, *HistoryFileInfo.moveToDone()* is scheduled for job j1
> # during the second *scanIntermediateDirectory()*, j1 is found again and put in the *fileStatusList* to process
> # *HistoryFileInfo.moveToDone()* is processed in another thread and history files are moved to the finished directory
> # the *HistoryFileInfo* for j1 is removed from *jobListCache*
> # the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 is created (history, conf, and summary files will point to the intermediate user directory, and state will be IN_INTERMEDIATE)
> # *moveToDone()* is scheduled for this new j1
> # *moveToDone()* fails during *moveToDoneNow()* for the history file because the source path in the intermediate directory does not exist
>
> From this point on, while the new j1 *HistoryFileInfo* is in the *jobListCache*, the JobHistoryServer will think the history file is in the intermediate directory. If a user queries this job in the JobHistoryServer UI, they will get
> {code}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file ://:/mr-history/intermediate//job_1529348381246_27275711-1535123223269---1535127026668-1-0-SUCCEEDED--1535126980787.jhist
> {code}
> Noticed this issue while running 2.7.4, but the race condition seems to still exist in trunk.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
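[Editor's note] The interleaving described above can be replayed as a toy, single-threaded sketch. All class and method names here are simplified stand-ins (assumptions for illustration), not the actual JobHistoryServer types; the point is only to show how a second scan that observes j1 before the first scheduled move runs leaves a poisoned cache entry behind:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class JhsRaceSketch {
    enum State { IN_INTERMEDIATE, IN_DONE, MOVE_FAILED }

    // moveToDone analogue: succeeds only if the source file still exists
    // in the intermediate directory.
    static void moveToDone(Set<String> intermediate, Map<String, State> cache, String job) {
        if (intermediate.remove(job)) {
            cache.put(job, State.IN_DONE);
        } else {
            cache.put(job, State.MOVE_FAILED); // source path no longer exists
        }
    }

    // Replays steps 1-7 sequentially and returns j1's final cached state.
    static State replayRace() {
        Set<String> intermediate = new HashSet<>();
        Map<String, State> cache = new HashMap<>();
        intermediate.add("j1");

        // Steps 1-2: both scans observe j1 in the intermediate directory.
        // Step 3: the first scheduled moveToDone() succeeds.
        moveToDone(intermediate, cache, "j1");
        // Step 4: the HistoryFileInfo for j1 is removed from the cache.
        cache.remove("j1");
        // Steps 5-7: the second scan's stale entry schedules moveToDone()
        // again, which fails because the intermediate file is gone.
        moveToDone(intermediate, cache, "j1");
        return cache.get("j1");
    }

    public static void main(String[] args) {
        System.out.println(replayRace());
    }
}
```

The files actually sit safely in the done directory, yet the surviving cache entry claims the move failed, which matches the YarnRuntimeException users see.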
[jira] [Commented] (MAPREDUCE-7118) Distributed cache conflicts breaks backwards compatability
[ https://issues.apache.org/jira/browse/MAPREDUCE-7118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554470#comment-16554470 ]

Erik Krogen commented on MAPREDUCE-7118:
----------------------------------------

[~leftnoteasy], we should put this in branch-3.0 as well, right?

> Distributed cache conflicts breaks backwards compatability
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-7118
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7118
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0, 3.1.0, 3.2.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Blocker
>             Fix For: 3.2.0, 3.1.1
>
>         Attachments: MAPREDUCE-7118.001.patch
>
>
> MAPREDUCE-4503 made distributed cache conflicts break job submission, but this was quickly downgraded to a warning in MAPREDUCE-4549. Unfortunately the latter did not go into trunk, so the fix is only in 0.23 and 2.x. When Oozie, Pig, and other downstream projects that can occasionally generate distributed cache conflicts move to Hadoop 3.x, the workflows that used to work on 0.23 and 2.x no longer function.
[jira] [Resolved] (MAPREDUCE-6918) ShuffleMetrics.ShuffleConnections Gauge Metric Climbs Infinitely
[ https://issues.apache.org/jira/browse/MAPREDUCE-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen resolved MAPREDUCE-6918.
------------------------------------
    Resolution: Duplicate

> ShuffleMetrics.ShuffleConnections Gauge Metric Climbs Infinitely
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-6918
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6918
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Erik Krogen
>
> We recently noticed that the {{mapred.ShuffleMetrics.ShuffleConnections}} metric seems to climb infinitely, up to many millions (see attached graph), despite supposedly being a gauge measuring the number of open connections:
> {code:title=ShuffleHandler.java}
> @Metric("# of current shuffle connections")
> MutableGaugeInt shuffleConnections;
> {code}
> It seems that shuffleConnections gets incremented once for every map fetched, but only decremented once for every request. It seems to me it should be modified to be incremented only once per request rather than once per map fetched, but I'm not familiar with the original intent.
[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated MAPREDUCE-6870:
-----------------------------------
    Release Note: 
Enables {{mapreduce.job.finish-when-all-reducers-done}} by default. With this enabled, a MapReduce job will complete as soon as all of its reducers are complete, even if some mappers are still running. This can occur if a mapper was relaunched after node failure but the relaunched task's output is not actually needed. Previously the job would wait for all mappers to complete.

> Add configuration for MR job to finish when all reducers are complete (even
> with unfinished mappers)
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 3.0.0-beta1
>
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have the config option to finish the job once all reducers are complete.
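[Editor's note] For reference, a job that needs the pre-3.x wait-for-all-mappers behavior can set the flag explicitly. A sketch: the key name comes from the release note above; the surrounding XML is ordinary Hadoop configuration boilerplate (mapred-site.xml or a per-job configuration file):

```xml
<!-- Set to false to restore the old behavior of waiting for all mappers;
     the 3.x default described in the release note above is true. -->
<property>
  <name>mapreduce.job.finish-when-all-reducers-done</name>
  <value>false</value>
</property>
```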
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162015#comment-16162015 ]

Erik Krogen commented on MAPREDUCE-6870:
----------------------------------------

Good idea, [~andrew.wang], thanks for the reminder. Done.

> Add configuration for MR job to finish when all reducers are complete (even
> with unfinished mappers)
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 3.0.0-beta1
>
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have the config option to finish the job once all reducers are complete.
[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated MAPREDUCE-6870:
-----------------------------------
    Release Note: 
Enables mapreduce.job.finish-when-all-reducers-done by default. With this enabled, a MapReduce job will complete as soon as all of its reducers are complete, even if some mappers are still running. This can occur if a mapper was relaunched after node failure but the relaunched task's output is not actually needed. Previously the job would wait for all mappers to complete.

  was:
Enables {{mapreduce.job.finish-when-all-reducers-done}} by default. With this enabled, a MapReduce job will complete as soon as all of its reducers are complete, even if some mappers are still running. This can occur if a mapper was relaunched after node failure but the relaunched task's output is not actually needed. Previously the job would wait for all mappers to complete.

> Add configuration for MR job to finish when all reducers are complete (even
> with unfinished mappers)
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 3.0.0-beta1
>
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have the config option to finish the job once all reducers are complete.
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161910#comment-16161910 ]

Erik Krogen commented on MAPREDUCE-6937:
----------------------------------------

Big thanks to [~pbacsko] and [~haibo.chen] for working on this and helping us to backport! It is much appreciated.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6937
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 2.9.0, 2.8.2, 2.7.5
>
>         Attachments: MAPREDUCE-6870-branch-2.01.patch, MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137053#comment-16137053 ]

Erik Krogen commented on MAPREDUCE-6937:
----------------------------------------

Those branches are where we would be interested in seeing it available, yes.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6937
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Zhe Zhang
>            Assignee: Erik Krogen
>
> To maintain compatibility we need to disable this by default per discussion on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.
[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated MAPREDUCE-6937:
-----------------------------------
    Description: 
To maintain compatibility we need to disable this by default per discussion on MAPREDUCE-6870.
Using a separate JIRA to correctly track incompatibilities.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6937
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Zhe Zhang
>            Assignee: Erik Krogen
>
> To maintain compatibility we need to disable this by default per discussion on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125974#comment-16125974 ]

Erik Krogen commented on MAPREDUCE-6937:
----------------------------------------

Hey [~pbacsko]/[~haibo.chen], any interest in backporting to older release lines? Looks like branch-2 is clean and 2.8/2.7 have very minor conflicts.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> ------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6937
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Zhe Zhang
>            Assignee: Erik Krogen
>
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123993#comment-16123993 ]

Erik Krogen commented on MAPREDUCE-6870:
----------------------------------------

Sounds good; I am in agreement. Since this should be marked as incompatible but the backport should not, shall I create a separate JIRA for the backport, so that we can have the release scripts properly track incompatibility?

> Add configuration for MR job to finish when all reducers are complete (even
> with unfinished mappers)
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 3.0.0-beta1
>
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have the config option to finish the job once all reducers are complete.
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123784#comment-16123784 ]

Erik Krogen commented on MAPREDUCE-6870:
----------------------------------------

Given its rarity and that the worst case scenario is {{(expected execution time) + (single mapper execution time)}}, I would consider it not a severe issue, which leans me towards compatibility. However, the current behavior is pretty confusing for an average user, so, tough call.

We would like to backport this to older release lines, in which case we definitely need to maintain compatibility and thus have default = false. As for trunk I am on the fence.

> Add configuration for MR job to finish when all reducers are complete (even
> with unfinished mappers)
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 3.0.0-beta1
>
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have the config option to finish the job once all reducers are complete.
[jira] [Comment Edited] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123784#comment-16123784 ]

Erik Krogen edited comment on MAPREDUCE-6870 at 8/11/17 6:18 PM:
-----------------------------------------------------------------

Given its rarity and that the worst case scenario is {{(expected execution time) + (single mapper execution time)}}, I would consider it not a severe issue, which leans me towards compatibility. However, the current behavior is pretty confusing for an average user, so, tough call.

We would like to backport this to older release lines, in which case we definitely need to maintain compatibility and thus have default = false. As for trunk/3.0.0 I am on the fence.

was (Author: xkrogen):
Given its rarity and that the worst case scenario is {{(expected execution time) + (single mapper execution time)}}, I would consider it not a severe issue, which leans me towards compatibility. However, the current behavior is pretty confusing for an average user, so, tough call.

We would like to backport this to older release lines, in which case we definitely need to maintain compatibility and thus have default = false. As for trunk I am on the fence.

> Add configuration for MR job to finish when all reducers are complete (even
> with unfinished mappers)
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 3.0.0-beta1
>
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have the config option to finish the job once all reducers are complete.
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123493#comment-16123493 ]

Erik Krogen commented on MAPREDUCE-6870:
----------------------------------------

[~haibo.chen], [~pbacsko], thank you for working on this! To provide some context, the reason we wanted it to be configurable is in case mapper tasks have side effects which are expected to be executed in full. For example, you may have a map task which deletes an output directory as it starts, then populates that directory. With this patch in effect, you could potentially wipe the output of a previous map task's execution and then never fully repopulate it (since the mapper is preempted). It's a pretty niche case, but who knows what MR behavior people might be relying on.

Given that this patch is enabling the new behavior by default, should this be marked as an incompatible change? Ping [~templedf], [~andrew.wang], who I know are working on compatibility guidelines.

> Add configuration for MR job to finish when all reducers are complete (even
> with unfinished mappers)
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6870
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.6.1
>            Reporter: Zhe Zhang
>            Assignee: Peter Bacsko
>             Fix For: 3.0.0-beta1
>
>         Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get scheduled before all reducers are complete, but those mappers run for a long time, even after all reducers are complete. This could hurt the performance of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than providing intermediate data to reducers. In that case, the job owner should have the config option to finish the job once all reducers are complete.
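[Editor's note] The side-effect hazard described in the comment above can be shown with a toy illustration (hypothetical names, not real MapReduce code): a relaunched mapper wipes its output directory on startup, then the job finishes early because all reducers are done, so the directory is never repopulated and the first attempt's work is lost.

```java
import java.util.HashSet;
import java.util.Set;

public class SideEffectSketch {
    // Runs one mapper attempt against a shared "output directory".
    static void runMapper(Set<String> outputDir, boolean cutShortEarly) {
        outputDir.clear();                // side effect: delete old output on start
        if (cutShortEarly) {
            return;                       // job completed early; attempt never finishes
        }
        outputDir.add("part-00000");      // normal completion repopulates the dir
    }

    // Returns whether any output survives after a relaunched attempt is
    // cut short by early job completion.
    static boolean outputSurvives() {
        Set<String> outputDir = new HashSet<>();
        runMapper(outputDir, false);      // first attempt completes normally
        runMapper(outputDir, true);       // relaunched attempt is cut short
        return !outputDir.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(outputSurvives());
    }
}
```

With the old wait-for-all-mappers behavior the second attempt would have run to completion and repopulated the directory, which is why the flag exists.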
[jira] [Updated] (MAPREDUCE-6919) ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely
[ https://issues.apache.org/jira/browse/MAPREDUCE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated MAPREDUCE-6919:
-----------------------------------
    Attachment: MAPREDUCE-6919.test.patch

Attaching a unit test which reproduces the issue. Had to refactor {{ShuffleHandler.Shuffle}} a little to get the test to work nicely.

> ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-6919
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6919
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Erik Krogen
>         Attachments: mapred_ShuffleMetrics_ShuffleConnections.png, MAPREDUCE-6919.test.patch
>
>
> We recently noticed that the mapred.ShuffleMetrics.ShuffleConnections metric rises indefinitely (see attached graph), despite supposedly being a gauge measuring the number of currently open connections:
> {code:title=ShuffleHandler.java}
> @Metric("# of current shuffle connections")
> MutableGaugeInt shuffleConnections;
> {code}
> It seems this is because the metric is incremented once for each map file sent, but decremented once for each request. Thus a request which fetches multiple map files permanently increments shuffleConnections by (mapsFetched - 1).
[jira] [Updated] (MAPREDUCE-6919) ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely
[ https://issues.apache.org/jira/browse/MAPREDUCE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Krogen updated MAPREDUCE-6919:
-----------------------------------
    Attachment: mapred_ShuffleMetrics_ShuffleConnections.png

> ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely
> ---------------------------------------------------------------
>
>                 Key: MAPREDUCE-6919
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6919
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>            Reporter: Erik Krogen
>         Attachments: mapred_ShuffleMetrics_ShuffleConnections.png
>
>
> We recently noticed that the mapred.ShuffleMetrics.ShuffleConnections metric rises indefinitely (see attached graph), despite supposedly being a gauge measuring the number of currently open connections:
> {code:title=ShuffleHandler.java}
> @Metric("# of current shuffle connections")
> MutableGaugeInt shuffleConnections;
> {code}
> It seems this is because the metric is incremented once for each map file sent, but decremented once for each request. Thus a request which fetches multiple map files permanently increments shuffleConnections by (mapsFetched - 1).
[jira] [Created] (MAPREDUCE-6919) ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely
Erik Krogen created MAPREDUCE-6919:
--------------------------------------

             Summary: ShuffleMetrics.ShuffleConnections Gauge Metric Rises Infinitely
                 Key: MAPREDUCE-6919
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6919
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
            Reporter: Erik Krogen

We recently noticed that the mapred.ShuffleMetrics.ShuffleConnections metric rises indefinitely (see attached graph), despite supposedly being a gauge measuring the number of currently open connections:

{code:title=ShuffleHandler.java}
@Metric("# of current shuffle connections")
MutableGaugeInt shuffleConnections;
{code}

It seems this is because the metric is incremented once for each map file sent, but decremented once for each request. Thus a request which fetches multiple map files permanently increments shuffleConnections by (mapsFetched - 1).
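[Editor's note] The accounting drift described above can be demonstrated with a minimal sketch, using a plain int as a stand-in for the real MutableGaugeInt (an assumption for illustration): +1 per map file sent but only -1 per request leaves the gauge inflated by (mapsFetched - 1) after every multi-map request.

```java
public class GaugeDriftSketch {
    static int shuffleConnections = 0;

    // Serve one shuffle request that fetches several map outputs, and
    // return the gauge value after the request completes.
    static int serveRequest(int mapsFetched) {
        for (int i = 0; i < mapsFetched; i++) {
            shuffleConnections++;  // incremented once per map file sent
        }
        shuffleConnections--;      // decremented only once, when the request ends
        return shuffleConnections;
    }

    public static void main(String[] args) {
        // After a single request fetching 5 maps the gauge should read 0
        // (the connection is closed), but it is left at 4 = mapsFetched - 1.
        System.out.println(serveRequest(5));
    }
}
```

Since reduce-side fetchers routinely batch many map outputs per connection, the residue accumulates with every request, matching the ever-climbing graph.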
[jira] [Created] (MAPREDUCE-6918) ShuffleMetrics.ShuffleConnections Gauge Metric Climbs Infinitely
Erik Krogen created MAPREDUCE-6918:
--------------------------------------

             Summary: ShuffleMetrics.ShuffleConnections Gauge Metric Climbs Infinitely
                 Key: MAPREDUCE-6918
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6918
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
            Reporter: Erik Krogen

We recently noticed that the {{mapred.ShuffleMetrics.ShuffleConnections}} metric seems to climb infinitely, up to many millions (see attached graph), despite supposedly being a gauge measuring the number of open connections:

{code:title=ShuffleHandler.java}
@Metric("# of current shuffle connections")
MutableGaugeInt shuffleConnections;
{code}

It seems that shuffleConnections gets incremented once for every map fetched, but only decremented once for every request. It seems to me it should be modified to be incremented only once per request rather than once per map fetched, but I'm not familiar with the original intent.
[jira] [Commented] (MAPREDUCE-5951) Add support for the YARN Shared Cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989035#comment-15989035 ]

Erik Krogen commented on MAPREDUCE-5951:
----------------------------------------

Ah, excellent point, [~jlowe]... I actually would love to hear the reasoning behind the current strategy (AM downloads resource, then AM uploads resource to SCM) rather than the seemingly more obvious/simpler alternative. Is this so that the uploading to SCM can be done by the NM, which is a privileged user, to have more secure control over it?

[~ctrezzo], first off, thanks for getting back so quickly! And for the pointer to YARN-5727; that's an interesting issue. The public-visibility solution is certainly simpler from the YARN side and seems pretty reasonable in terms of the burden it places on an application ("you want a publicly shared resource? put it somewhere public"). It doesn't add _too_ much complexity on the MR side, though having a separate staging directory just for public resources is a bit cumbersome. It also means that other application developers will have to build the same type of logic - in general I would lean towards pushing more logic into the YARN layer so that it is easy for application devs to support. I don't have good insight into how difficult your initially proposed solution in YARN-5727 would be to implement, though.

> Add support for the YARN Shared Cache
> -------------------------------------
>
>                 Key: MAPREDUCE-5951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5951
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>              Labels: BB2015-05-TBR
>         Attachments: MAPREDUCE-5951-Overview.001.pdf, MAPREDUCE-5951-trunk.016.patch, MAPREDUCE-5951-trunk.017.patch, MAPREDUCE-5951-trunk.018.patch, MAPREDUCE-5951-trunk.019.patch, MAPREDUCE-5951-trunk-v10.patch, MAPREDUCE-5951-trunk-v11.patch, MAPREDUCE-5951-trunk-v12.patch, MAPREDUCE-5951-trunk-v13.patch, MAPREDUCE-5951-trunk-v14.patch, MAPREDUCE-5951-trunk-v15.patch, MAPREDUCE-5951-trunk-v1.patch, MAPREDUCE-5951-trunk-v2.patch, MAPREDUCE-5951-trunk-v3.patch, MAPREDUCE-5951-trunk-v4.patch, MAPREDUCE-5951-trunk-v5.patch, MAPREDUCE-5951-trunk-v6.patch, MAPREDUCE-5951-trunk-v7.patch, MAPREDUCE-5951-trunk-v8.patch, MAPREDUCE-5951-trunk-v9.patch
>
>
> Implement the necessary changes so that the MapReduce application can leverage the new YARN shared cache (i.e. YARN-1492).
> Specifically, allow per-job configuration so that MapReduce jobs can specify which set of resources they would like to cache (i.e. jobjar, libjars, archives, files).
[jira] [Comment Edited] (MAPREDUCE-5951) Add support for the YARN Shared Cache
[ https://issues.apache.org/jira/browse/MAPREDUCE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987585#comment-15987585 ] Erik Krogen edited comment on MAPREDUCE-5951 at 4/27/17 8:22 PM: - Hey [~ctrezzo], I have a question about the behavior of this patch. Currently the old logic for resource visibility is used: if a resource is world-readable, it will be marked as PUBLIC, else PRIVATE. Given my current understanding of this patch's behavior, I see the following scenario:
* Client submits a job with libjar X, which has never been used before. Client contacts the SCM to mark X as "used"; the SCM responds that it does not have X.
* Client uploads X to the staging directory, which I assume here is _not_ world-readable. X is marked as PRIVATE.
* MR-AM localizes X, then uploads it to the shared cache. Other NMs all localize X as PRIVATE and do not share it with other applications.
* Client then submits the same job with the same X. Client contacts the SCM, and the SCM responds with a world-readable (755 dirs / 555 file) path inside the shared cache.
* Client does not upload X, and marks X as PUBLIC, since it is now in a world-readable location.
* MR-AM and NMs all localize X as PUBLIC and share it with other applications.
Please correct me if I am wrong on any of these steps. It seems to be the expected behavior that X is eventually PUBLIC, given that we asked for it to be uploaded to the publicly shared cache, but it seems unnecessary for it to be marked as PRIVATE the first time around. Do we do this just to avoid changing the existing logic for marking a resource as PRIVATE vs. PUBLIC, is this an oversight, or is this behavior desired?
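The visibility flip described in the scenario above can be sketched as follows. This is a minimal, illustrative stand-in for the check the comment describes (world-readable resource and ancestor directories => PUBLIC, else PRIVATE), not YARN's actual implementation; the class and method names here are hypothetical:

```java
public class Visibility {
    enum Vis { PUBLIC, PRIVATE }

    // Simplified visibility rule: a resource is PUBLIC only if the file
    // itself is world-readable and every ancestor directory is both
    // world-readable and world-executable. Permissions are octal ints,
    // e.g. 0755 for a directory, 0644 for a file.
    static Vis visibilityOf(int filePerm, int[] ancestorDirPerms) {
        if ((filePerm & 0004) == 0) {            // file must be o+r
            return Vis.PRIVATE;
        }
        for (int dirPerm : ancestorDirPerms) {   // dirs must be o+rx
            if ((dirPerm & 0005) != 0005) {
                return Vis.PRIVATE;
            }
        }
        return Vis.PUBLIC;
    }

    public static void main(String[] args) {
        // First submission: X sits under a 0700 staging dir -> PRIVATE.
        System.out.println(visibilityOf(0644, new int[]{0700}));
        // Second submission: shared-cache path, 755 dirs / 555 file -> PUBLIC.
        System.out.println(visibilityOf(0555, new int[]{0755, 0755}));
    }
}
```

Under this rule the same bytes get two different visibilities purely because of where they live, which is exactly why X is PRIVATE on the first run and PUBLIC on the second.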
[jira] [Updated] (MAPREDUCE-6873) MR Job Submission Fails if MR framework application path not on defaultFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated MAPREDUCE-6873: --- Attachment: MAPREDUCE-6873.000.patch Attaching one-liner patch... > MR Job Submission Fails if MR framework application path not on defaultFS > - > > Key: MAPREDUCE-6873 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6873 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 2.6.0 >Reporter: Erik Krogen >Priority: Minor > Attachments: MAPREDUCE-6873.000.patch > > > {{JobSubmitter#addMRFrameworkPathToDistributedCache()}} assumes that > {{mapreduce.framework.application.path}} has a FS which matches > {{fs.defaultFS}}, which may not always be true. This is just a consequence of > using {{FileSystem.get(Configuration)}} instead of {{FileSystem.get(URI, > Configuration)}}.
[jira] [Updated] (MAPREDUCE-6873) MR Job Submission Fails if MR framework application path not on defaultFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-6873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Krogen updated MAPREDUCE-6873: --- Status: Patch Available (was: Open)
[jira] [Created] (MAPREDUCE-6873) MR Job Submission Fails if MR framework application path not on defaultFS
Erik Krogen created MAPREDUCE-6873: -- Summary: MR Job Submission Fails if MR framework application path not on defaultFS Key: MAPREDUCE-6873 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6873 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Reporter: Erik Krogen Priority: Minor {{JobSubmitter#addMRFrameworkPathToDistributedCache()}} assumes that {{mapreduce.framework.application.path}} has a FS which matches {{fs.defaultFS}}, which may not always be true. This is just a consequence of using {{FileSystem.get(Configuration)}} instead of {{FileSystem.get(URI, Configuration)}}.
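The {{FileSystem.get(Configuration)}} vs. {{FileSystem.get(URI, Configuration)}} distinction comes down to which scheme/authority a path is resolved against. The sketch below models that resolution with plain `java.net.URI` (no Hadoop dependency) so the bug is visible in isolation; the class and helper names are illustrative, not Hadoop APIs:

```java
import java.net.URI;

public class FsResolve {
    // Pick the filesystem a path should be resolved against: the path's own
    // scheme/authority when it has one, otherwise the configured default FS.
    // This mirrors the difference between FileSystem.get(conf), which always
    // returns the fs.defaultFS filesystem, and FileSystem.get(uri, conf),
    // which returns the filesystem named by the URI itself.
    static String fsFor(URI path, URI defaultFs) {
        if (path.getScheme() == null) {
            return defaultFs.getScheme() + "://" + defaultFs.getAuthority();
        }
        String auth = path.getAuthority() == null ? "" : path.getAuthority();
        return path.getScheme() + "://" + auth;
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://clusterA:8020");
        // Framework tarball living on a different FS than fs.defaultFS:
        URI framework = URI.create("hdfs://clusterB:8020/mr/framework.tar.gz");
        System.out.println(fsFor(framework, defaultFs)); // hdfs://clusterB:8020
        // A relative/scheme-less path falls back to the default FS:
        System.out.println(fsFor(URI.create("/mr/framework.tar.gz"), defaultFs)); // hdfs://clusterA:8020
    }
}
```

Using only the default-FS branch for a path on `clusterB` is the failure mode described in the issue; the one-line fix is to derive the filesystem from the framework path's own URI.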