[jira] [Updated] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-4669:
----------------------------------
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved  (was: Patch Available)

> MRAM web UI does not work with HTTPS
>
> Key: MAPREDUCE-4669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha
> Reporter: Alejandro Abdelnur
> Assignee: Robert Kanter
> Priority: Major
> Attachments: MAPREDUCE-4669.001.patch, MAPREDUCE-4669.002.patch, MAPREDUCE-4669.003.patch, MAPREDUCE-4669.004.patch
>
> With Kerberos enabled, the MRAM runs as the user that submitted the job; thus the MRAM process cannot read the cluster keystore files to get the certificates needed to start its HttpServer with HTTPS.
> We need to decouple the keystore used by the RM/NM/NN/DN (which is cluster-provided) from the keystore used by AMs (which ought to be user-provided).
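For context on the description above: starting an HTTPS endpoint requires the server process to read the keystore file itself, which is exactly what an AM running as the submitting user cannot do with a cluster-owned keystore. The sketch below is illustrative only, not code from the patch; it uses standard JSSE calls, and the keystore path and password are hypothetical.

{code:java}
import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

public class UserKeystoreExample {
  // Builds a server-side SSLContext from a keystore readable by the submitting
  // user. Every step needs read access to the keystore file, which is why the
  // AM cannot simply reuse the cluster keystore owned by the service user.
  static SSLContext sslContextFrom(String keystorePath, char[] password) throws Exception {
    KeyStore ks = KeyStore.getInstance("JKS");
    try (FileInputStream in = new FileInputStream(keystorePath)) {
      ks.load(in, password);   // fails here if the AM user cannot read the file
    }
    KeyManagerFactory kmf =
        KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
    kmf.init(ks, password);
    SSLContext ctx = SSLContext.getInstance("TLS");
    ctx.init(kmf.getKeyManagers(), null, null);   // context the web server would use
    return ctx;
  }
}
{code}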
[jira] [Updated] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-4669:
----------------------------------
    Fix Version/s: 3.3.0

> MRAM web UI does not work with HTTPS
>
> Key: MAPREDUCE-4669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha
> Reporter: Alejandro Abdelnur
> Assignee: Robert Kanter
> Priority: Major
> Fix For: 3.3.0
> Attachments: MAPREDUCE-4669.001.patch, MAPREDUCE-4669.002.patch, MAPREDUCE-4669.003.patch, MAPREDUCE-4669.004.patch
>
> With Kerberos enabled, the MRAM runs as the user that submitted the job; thus the MRAM process cannot read the cluster keystore files to get the certificates needed to start its HttpServer with HTTPS.
> We need to decouple the keystore used by the RM/NM/NN/DN (which is cluster-provided) from the keystore used by AMs (which ought to be user-provided).
[jira] [Commented] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661391#comment-16661391 ]

Haibo Chen commented on MAPREDUCE-4669:
---------------------------------------
+1 on the latest patch. Checking it in shortly.

> MRAM web UI does not work with HTTPS
>
> Key: MAPREDUCE-4669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha
> Reporter: Alejandro Abdelnur
> Assignee: Robert Kanter
> Priority: Major
> Attachments: MAPREDUCE-4669.001.patch, MAPREDUCE-4669.002.patch, MAPREDUCE-4669.003.patch, MAPREDUCE-4669.004.patch
>
> With Kerberos enabled, the MRAM runs as the user that submitted the job; thus the MRAM process cannot read the cluster keystore files to get the certificates needed to start its HttpServer with HTTPS.
> We need to decouple the keystore used by the RM/NM/NN/DN (which is cluster-provided) from the keystore used by AMs (which ought to be user-provided).
[jira] [Commented] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657337#comment-16657337 ]

Haibo Chen commented on MAPREDUCE-4669:
---------------------------------------
Thanks [~rkanter] for the patch! I have a few comments/questions:
1) Can we rename withNeedsClientAuth() to withClientAuth(boolean)?
2) In WebApps, I think we can skip the truststore part if clientAuth is false.
3) Can we add another unit test that queries the AM web server with an untrusted certificate when client authentication is on?

> MRAM web UI does not work with HTTPS
>
> Key: MAPREDUCE-4669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mr-am
> Affects Versions: 2.0.3-alpha
> Reporter: Alejandro Abdelnur
> Assignee: Robert Kanter
> Priority: Major
> Attachments: MAPREDUCE-4669.001.patch, MAPREDUCE-4669.002.patch, MAPREDUCE-4669.003.patch
>
> With Kerberos enabled, the MRAM runs as the user that submitted the job; thus the MRAM process cannot read the cluster keystore files to get the certificates needed to start its HttpServer with HTTPS.
> We need to decouple the keystore used by the RM/NM/NN/DN (which is cluster-provided) from the keystore used by AMs (which ought to be user-provided).
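To make review comments 1) and 2) above concrete, here is a hypothetical builder shape; the actual WebApps/HttpServer2 builder touched by the patch may look quite different, and all names in this sketch are illustrative only.

{code:java}
// Hypothetical builder implied by comments 1) and 2); not the code in the patch.
public final class HttpsServerBuilderSketch {
  private boolean clientAuth;     // comment 1): take the flag as a parameter,
  private String keystorePath;    // instead of a no-arg withNeedsClientAuth()
  private String truststorePath;

  public HttpsServerBuilderSketch withClientAuth(boolean required) {
    this.clientAuth = required;
    return this;
  }

  public HttpsServerBuilderSketch withKeystore(String path) {
    this.keystorePath = path;
    return this;
  }

  public HttpsServerBuilderSketch withTruststore(String path) {
    this.truststorePath = path;
    return this;
  }

  public void start() {
    // comment 2): a truststore is only needed to verify client certificates,
    // so skip loading it entirely when client auth is off.
    if (clientAuth && truststorePath != null) {
      // load truststore and require client certificates (omitted in this sketch)
    }
    // load keystorePath and bind the HTTPS listener (omitted in this sketch)
  }
}
{code}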
[jira] [Updated] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-7150:
----------------------------------
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Fix Version/s: 3.3.0
    Status: Resolved  (was: Patch Available)

Thanks [~mi...@cloudera.com] for the contribution and [~pbacsko] for additional reviews!

> Optimize collections used by MR JHS to reduce its memory
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver, mrv2
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Priority: Major
> Fix For: 3.3.0
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, YARN-8872.03.patch, YARN-8872.04.patch, jhs-bad-collections.png
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a big heap in a large cluster, handling large MapReduce jobs. The heap is large (over 32GB) and 21.4% of it is wasted due to various suboptimal Java collections, mostly maps and lists that are either empty or contain only one element. In such under-populated collections a considerable amount of memory is still used by just the internal implementation objects. See the attached excerpt from the jxray report for the details. If certain collections are almost always empty, they should be initialized lazily. If others almost always have just 1 or 2 elements, they should be initialized with the appropriate initial capacity of 1 or 2 (the default capacity is 16 for HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1, since it contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use the more wasteful LinkedList here) and initialize with capacity 1.
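The two patterns recommended in the numbered list above (right-sizing rarely-large collections and lazily creating almost-always-empty ones) look roughly like the sketch below. This is a simplified stand-in for the classes named in the list, not the patched Hadoop code.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: simplified stand-ins for the fields named in the list above.
class CompletedTaskSketch {
  // Pattern 1: size small collections explicitly instead of relying on the
  // defaults (16 for HashMap, 10 for ArrayList).
  private final Map<String, Object> attempts = new HashMap<>(2);     // most tasks have 1-2 attempts
  private final List<String> reportDiagnostics = new ArrayList<>(1); // ArrayList(1) instead of LinkedList

  // Pattern 2: initialize an almost-always-empty map lazily, on first write.
  private Map<String, Long> counters;   // stays null for most instances

  void incrCounter(String name, long delta) {
    if (counters == null) {
      counters = new HashMap<>(2);      // allocated only when actually needed
    }
    counters.merge(name, delta, Long::sum);
  }
}
{code}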
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652407#comment-16652407 ]

Haibo Chen commented on MAPREDUCE-7150:
---------------------------------------
+1 on the latest patch. Will check it in shortly.

> Optimize collections used by MR JHS to reduce its memory
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver, mrv2
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, YARN-8872.03.patch, YARN-8872.04.patch, jhs-bad-collections.png
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a big heap in a large cluster, handling large MapReduce jobs. The heap is large (over 32GB) and 21.4% of it is wasted due to various suboptimal Java collections, mostly maps and lists that are either empty or contain only one element. In such under-populated collections a considerable amount of memory is still used by just the internal implementation objects. See the attached excerpt from the jxray report for the details. If certain collections are almost always empty, they should be initialized lazily. If others almost always have just 1 or 2 elements, they should be initialized with the appropriate initial capacity of 1 or 2 (the default capacity is 16 for HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1, since it contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use the more wasteful LinkedList here) and initialize with capacity 1.
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648621#comment-16648621 ]

Haibo Chen commented on MAPREDUCE-7150:
---------------------------------------
Thanks for the detailed explanation, [~mi...@cloudera.com]. That makes a lot of sense, and I agree with you that it is correct without the synchronized fix. It's more of a convention to address the warning as much as possible, so that the findbugs warning won't show up in every Jenkins build. The Jenkins job might be confused because the Jira has now been moved to MapReduce; I have manually kicked it off.

> Optimize collections used by MR JHS to reduce its memory
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver, mrv2
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, YARN-8872.03.patch, jhs-bad-collections.png
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a big heap in a large cluster, handling large MapReduce jobs. The heap is large (over 32GB) and 21.4% of it is wasted due to various suboptimal Java collections, mostly maps and lists that are either empty or contain only one element. In such under-populated collections a considerable amount of memory is still used by just the internal implementation objects. See the attached excerpt from the jxray report for the details. If certain collections are almost always empty, they should be initialized lazily. If others almost always have just 1 or 2 elements, they should be initialized with the appropriate initial capacity of 1 or 2 (the default capacity is 16 for HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1, since it contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use the more wasteful LinkedList here) and initialize with capacity 1.
[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648559#comment-16648559 ]

Haibo Chen commented on MAPREDUCE-7150:
---------------------------------------
Thanks [~mi...@cloudera.com] for the patch!
{quote}The {{map}} object is created lazily in the synchronized method {{findCounter()}}, so according to the Java Memory Model, once it's created, it's visible to all the code, both synchronized and unsynchronized.{quote}
I'm not a JMM expert. Doesn't the reader always need a read barrier to see the latest value of a variable? Is there something special that a synchronized block does here? Regardless, let's add synchronized to the write(DataOutput) method too, to fix the findbugs warning.

> Optimize collections used by MR JHS to reduce its memory
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver, mrv2
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, jhs-bad-collections.png
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a big heap in a large cluster, handling large MapReduce jobs. The heap is large (over 32GB) and 21.4% of it is wasted due to various suboptimal Java collections, mostly maps and lists that are either empty or contain only one element. In such under-populated collections a considerable amount of memory is still used by just the internal implementation objects. See the attached excerpt from the jxray report for the details. If certain collections are almost always empty, they should be initialized lazily. If others almost always have just 1 or 2 elements, they should be initialized with the appropriate initial capacity of 1 or 2 (the default capacity is 16 for HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1, since it contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use the more wasteful LinkedList here) and initialize with capacity 1.
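The pattern being debated above is lazy initialization inside a synchronized method, with the serialization path also synchronized so findbugs stops flagging an unsynchronized read of the lazily created field. The sketch below is a simplified illustration, not the actual FileSystemCounterGroup code.

{code:java}
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the discussion above: "map" is created lazily inside a
// synchronized method, and write(DataOutput) is synchronized as well so every
// access to the lazily initialized field happens under the same lock.
class CounterGroupSketch {
  private Map<String, Long> map;   // lazily initialized to save memory

  synchronized Long findCounter(String name) {
    if (map == null) {
      map = new HashMap<>(2);      // created on first use only
    }
    return map.computeIfAbsent(name, k -> 0L);
  }

  synchronized void write(DataOutput out) throws IOException {
    out.writeInt(map == null ? 0 : map.size());
    if (map != null) {
      for (Map.Entry<String, Long> e : map.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeLong(e.getValue());
      }
    }
  }
}
{code}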
[jira] [Updated] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-7150:
----------------------------------
    Summary: Optimize collections used by MR JHS to reduce its memory  (was: Optimize collections used by Yarn JHS to reduce its memory)

> Optimize collections used by MR JHS to reduce its memory
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver, mrv2
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, jhs-bad-collections.png
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a big heap in a large cluster, handling large MapReduce jobs. The heap is large (over 32GB) and 21.4% of it is wasted due to various suboptimal Java collections, mostly maps and lists that are either empty or contain only one element. In such under-populated collections a considerable amount of memory is still used by just the internal implementation objects. See the attached excerpt from the jxray report for the details. If certain collections are almost always empty, they should be initialized lazily. If others almost always have just 1 or 2 elements, they should be initialized with the appropriate initial capacity of 1 or 2 (the default capacity is 16 for HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1, since it contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use the more wasteful LinkedList here) and initialize with capacity 1.
[jira] [Assigned] (MAPREDUCE-7150) Optimize collections used by Yarn JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen reassigned MAPREDUCE-7150:
-------------------------------------
    Assignee: Misha Dmitriev

> Optimize collections used by Yarn JHS to reduce its memory
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver, mrv2
> Reporter: Misha Dmitriev
> Assignee: Misha Dmitriev
> Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, jhs-bad-collections.png
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a big heap in a large cluster, handling large MapReduce jobs. The heap is large (over 32GB) and 21.4% of it is wasted due to various suboptimal Java collections, mostly maps and lists that are either empty or contain only one element. In such under-populated collections a considerable amount of memory is still used by just the internal implementation objects. See the attached excerpt from the jxray report for the details. If certain collections are almost always empty, they should be initialized lazily. If others almost always have just 1 or 2 elements, they should be initialized with the appropriate initial capacity of 1 or 2 (the default capacity is 16 for HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1, since it contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use the more wasteful LinkedList here) and initialize with capacity 1.
[jira] [Assigned] (MAPREDUCE-7150) Optimize collections used by Yarn JHS to reduce its memory
[ https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen reassigned MAPREDUCE-7150:
-------------------------------------
    Assignee: (was: Misha Dmitriev)
    Component/s: (was: yarn)
                 mrv2
                 jobhistoryserver
    Key: MAPREDUCE-7150  (was: YARN-8872)
    Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> Optimize collections used by Yarn JHS to reduce its memory
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver, mrv2
> Reporter: Misha Dmitriev
> Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, jhs-bad-collections.png
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a big heap in a large cluster, handling large MapReduce jobs. The heap is large (over 32GB) and 21.4% of it is wasted due to various suboptimal Java collections, mostly maps and lists that are either empty or contain only one element. In such under-populated collections a considerable amount of memory is still used by just the internal implementation objects. See the attached excerpt from the jxray report for the details. If certain collections are almost always empty, they should be initialized lazily. If others almost always have just 1 or 2 elements, they should be initialized with the appropriate initial capacity of 1 or 2 (the default capacity is 16 for HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
> # {{FileSystemCounterGroup.map}} - initialize lazily
> # {{CompletedTask.attempts}} - initialize with capacity 2, given most tasks only have one or two attempts
> # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
> # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1, since it contains one diagnostic message most of the time
> # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to use the more wasteful LinkedList here) and initialize with capacity 1.
[jira] [Updated] (MAPREDUCE-6861) Add metrics tags for ShuffleClientMetrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-6861:
----------------------------------
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Fix Version/s: 3.2.0
    Status: Resolved  (was: Patch Available)

Thanks [~zsiegl] for your contribution! I have checked in the patch to trunk.

> Add metrics tags for ShuffleClientMetrics
>
> Key: MAPREDUCE-6861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6861
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.0.0-alpha1
> Reporter: Akira Ajisaka
> Assignee: Zoltan Siegl
> Priority: Major
> Labels: newbie++
> Fix For: 3.2.0
> Attachments: MAPREDUCE-6861.00.patch, MAPREDUCE-6861.002.patch, MAPREDUCE-6861.01.patch
>
> Metrics tags were unintentionally removed by MAPREDUCE-6526. Let's add them back.
[jira] [Commented] (MAPREDUCE-6861) Add metrics tags for ShuffleClientMetrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593716#comment-16593716 ]

Haibo Chen commented on MAPREDUCE-6861:
---------------------------------------
+1 on the latest 02 patch, pending Jenkins.

> Add metrics tags for ShuffleClientMetrics
>
> Key: MAPREDUCE-6861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6861
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.0.0-alpha1
> Reporter: Akira Ajisaka
> Assignee: Zoltan Siegl
> Priority: Major
> Labels: newbie++
> Attachments: MAPREDUCE-6861.00.patch, MAPREDUCE-6861.002.patch, MAPREDUCE-6861.01.patch
>
> Metrics tags were unintentionally removed by MAPREDUCE-6526. Let's add them back.
[jira] [Commented] (MAPREDUCE-6861) Add metrics tags for ShuffleClientMetrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593651#comment-16593651 ]

Haibo Chen commented on MAPREDUCE-6861:
---------------------------------------
Thanks [~zsiegl] for the patch. The patch looks good to me, with minor comments:
1) The description of RECORD_INFO can be "Metrics for Shuffle client" instead of "Metrics for Shuffle Plugin".
2) TestShuffleClientMetrics.testCreate() can probably be renamed to testShuffleMetricsTag() or the like.
3) "shuffleClientMetrics.threadBusy();" is not necessary as part of the act step in the unit test, given what the test is doing.

> Add metrics tags for ShuffleClientMetrics
>
> Key: MAPREDUCE-6861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6861
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.0.0-alpha1
> Reporter: Akira Ajisaka
> Assignee: Zoltan Siegl
> Priority: Major
> Labels: newbie++
> Attachments: MAPREDUCE-6861.00.patch, MAPREDUCE-6861.01.patch
>
> Metrics tags were unintentionally removed by MAPREDUCE-6526. Let's add them back.
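For readers unfamiliar with what "metrics tags" means here: tags attach identifying context (such as the user and job) to every record a metrics source emits. The sketch below is a generic metrics2 illustration, not the ShuffleClientMetrics code from the patch; it assumes the MetricsRegistry(String) constructor and the tag(String, String, String) overload from hadoop-common's metrics2 library, and the tag names are examples only.

{code:java}
import org.apache.hadoop.metrics2.lib.MetricsRegistry;

// Generic illustration of attaching tags to a metrics source; the real
// ShuffleClientMetrics may declare its registry and tags differently.
class TaggedShuffleMetricsSketch {
  private final MetricsRegistry registry = new MetricsRegistry("ShuffleClientMetrics");

  TaggedShuffleMetricsSketch(String user, String jobName) {
    // Tags let downstream sinks tell apart records from different users and
    // jobs -- the context that MAPREDUCE-6526 accidentally dropped.
    registry.tag("user", "Name of the user", user);
    registry.tag("jobName", "Name of the job", jobName);
  }
}
{code}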
[jira] [Assigned] (MAPREDUCE-6861) Add metrics tags for ShuffleClientMetrics
[ https://issues.apache.org/jira/browse/MAPREDUCE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen reassigned MAPREDUCE-6861:
-------------------------------------
    Assignee: Zoltan Siegl

> Add metrics tags for ShuffleClientMetrics
>
> Key: MAPREDUCE-6861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6861
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 3.0.0-alpha1
> Reporter: Akira Ajisaka
> Assignee: Zoltan Siegl
> Priority: Major
> Labels: newbie++
>
> Metrics tags were unintentionally removed by MAPREDUCE-6526. Let's add them back.
[jira] [Commented] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546895#comment-16546895 ] Haibo Chen commented on MAPREDUCE-6948: --- I am okay with closing this for now. > TestJobImpl.testUnusableNodeTransition failed > - > > Key: MAPREDUCE-6948 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6948 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Haibo Chen >Assignee: Jim Brennan >Priority: Major > Labels: unit-test > > *Error Message* > expected: but was: > *Stacktrace* > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615) > *Standard out* > {code} > 2017-08-30 10:12:21,928 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler > 2017-08-30 10:12:21,939 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class > org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob > 2017-08-30 10:12:21,940 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class > org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf > 2017-08-30 10:12:21,940 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.jobhistory.EventType for class > org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf > 2017-08-30 10:12:21,940 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class > org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf > 2017-08-30 10:12:21,941 INFO [Thread-49] impl.JobImpl > (JobImpl.java:setup(1534)) - Adding job token for job_123456789_0001 to > jobTokenSecretManager > 2017-08-30 10:12:21,941 WARN [Thread-49] impl.JobImpl > (JobImpl.java:setup(1540)) - Shuffle secret key missing from job credentials. > Using job token secret as shuffle secret. > 2017-08-30 10:12:21,944 INFO [Thread-49] impl.JobImpl > (JobImpl.java:makeUberDecision(1305)) - Not uberizing job_123456789_0001 > because: not enabled; > 2017-08-30 10:12:21,944 INFO [Thread-49] impl.JobImpl > (JobImpl.java:createMapTasks(1562)) - Input size for job > job_123456789_0001 = 0. 
Number of splits = 2 > 2017-08-30 10:12:21,945 INFO [Thread-49] impl.JobImpl > (JobImpl.java:createReduceTasks(1579)) - Number of reduces for job > job_123456789_0001 = 1 > 2017-08-30 10:12:21,945 INFO [Thread-49] impl.JobImpl > (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from NEW > to INITED > 2017-08-30 10:12:21,946 INFO [Thread-49] impl.JobImpl > (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from > INITED to SETUP > 2017-08-30 10:12:21,954 INFO [CommitterEvent Processor #0] > commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - > Processing the event EventType: JOB_SETUP > 2017-08-30 10:12:21,978 INFO [AsyncDispatcher event handler] impl.JobImpl > (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from > SETUP to RUNNING > 2017-08-30 10:12:21,983 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class > org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$5 > 2017-08-30 10:12:22,000 INFO [Thread-49] impl.JobImpl > (JobImpl.java:transition(1953)) - Num completed Tasks: 1 > 2017-08-30 10:12:22,029 INFO [Thread-49] impl.JobImpl > (JobImpl.java:transition(1953)) - Num completed Tasks: 2 > 2017-08-30 10:12:22,032 INFO [Thread-49] impl.JobImpl > (JobImpl.java:actOnUnusableNode(1354)) - TaskAttempt killed because it ran on > unusable node Mock for NodeId, hashCode: 1280187896. > AttemptId:attempt_123456789_0001_m_00_0 > 2017-08-30 10:12:22,032 INFO [Thread-49] impl.JobImpl > (JobIm
[jira] [Commented] (MAPREDUCE-7095) Race conditions in closing FadvisedChunkedFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470841#comment-16470841 ] Haibo Chen commented on MAPREDUCE-7095: --- Thanks [~miklos.szeg...@cloudera.com] for the fix. I have committed the patch to trunk. > Race conditions in closing FadvisedChunkedFile > --- > > Key: MAPREDUCE-7095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Fix For: 3.2.0 > > Attachments: YARN-8090.000.patch, YARN-8090.001.patch, > YARN-8090.002.patch, YARN-8090.003.patch > > > When a file is closed mutple times by multiple threads, all but the first > close will generate a WARNING message. > {code:java} > 11:04:33.605 AM WARNFadvisedChunkedFile > Failed to manage OS cache for > /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out > EBADF: Bad file descriptor > at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native > Method) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) > at > org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.Th
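The EBADF warnings above come from close() being invoked concurrently or repeatedly, so the fadvise call runs against an already-closed descriptor. A common way to make a close path idempotent is an atomic guard, as in the sketch below. This is a generic illustration of the idea only, not the actual FadvisedChunkedFile patch; the class and method names here are placeholders.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.atomic.AtomicBoolean;

// Only the first caller performs the cache cleanup and closes the file, so a
// second close from another thread cannot hit EBADF on a stale descriptor.
class IdempotentCloseSketch implements Closeable {
  private final RandomAccessFile file;
  private final AtomicBoolean closed = new AtomicBoolean(false);

  IdempotentCloseSketch(RandomAccessFile file) {
    this.file = file;
  }

  @Override
  public void close() throws IOException {
    if (!closed.compareAndSet(false, true)) {
      return;   // a concurrent or repeated close already won the race
    }
    try {
      manageOsCache();   // placeholder for the posix_fadvise step in the real class
    } finally {
      file.close();
    }
  }

  private void manageOsCache() {
    // In FadvisedChunkedFile this is where the OS cache is managed against the
    // still-open file descriptor.
  }
}
{code}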
[jira] [Updated] (MAPREDUCE-7095) Race conditions in closing FadvisedChunkedFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-7095: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.2.0 Status: Resolved (was: Patch Available) > Race conditions in closing FadvisedChunkedFile > --- > > Key: MAPREDUCE-7095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Fix For: 3.2.0 > > Attachments: YARN-8090.000.patch, YARN-8090.001.patch, > YARN-8090.002.patch, YARN-8090.003.patch > > > When a file is closed mutple times by multiple threads, all but the first > close will generate a WARNING message. > {code:java} > 11:04:33.605 AM WARNFadvisedChunkedFile > Failed to manage OS cache for > /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out > EBADF: Bad file descriptor > at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native > Method) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) > at > org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$
[jira] [Updated] (MAPREDUCE-7095) Race conditions in closing FadvisedChunkedFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-7095: -- Summary: Race conditions in closing FadvisedChunkedFile (was: Race conditions in FadvisedChunkedFile) > Race conditions in closing FadvisedChunkedFile > --- > > Key: MAPREDUCE-7095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-8090.000.patch, YARN-8090.001.patch, > YARN-8090.002.patch, YARN-8090.003.patch > > > When a file is closed mutple times by multiple threads, all but the first > close will generate a WARNING message. > {code:java} > 11:04:33.605 AM WARNFadvisedChunkedFile > Failed to manage OS cache for > /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out > EBADF: Bad file descriptor > at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native > Method) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) > at > org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at j
[jira] [Commented] (MAPREDUCE-7095) Race conditions in FadvisedChunkedFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470818#comment-16470818 ] Haibo Chen commented on MAPREDUCE-7095: --- +1. Checking this in shortly. > Race conditions in FadvisedChunkedFile > -- > > Key: MAPREDUCE-7095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-8090.000.patch, YARN-8090.001.patch, > YARN-8090.002.patch, YARN-8090.003.patch > > > When a file is closed mutple times by multiple threads, all but the first > close will generate a WARNING message. > {code:java} > 11:04:33.605 AM WARNFadvisedChunkedFile > Failed to manage OS cache for > /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out > EBADF: Bad file descriptor > at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native > Method) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) > at > org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at 
org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){co
[jira] [Commented] (MAPREDUCE-7095) Race conditions in FadvisedChunkedFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469342#comment-16469342 ] Haibo Chen commented on MAPREDUCE-7095: --- [~miklos.szeg...@cloudera.com] can you address the checkstyle issues? > Race conditions in FadvisedChunkedFile > -- > > Key: MAPREDUCE-7095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-8090.000.patch, YARN-8090.001.patch, > YARN-8090.002.patch > > > When a file is closed mutple times by multiple threads, all but the first > close will generate a WARNING message. > {code:java} > 11:04:33.605 AM WARNFadvisedChunkedFile > Failed to manage OS cache for > /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out > EBADF: Bad file descriptor > at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native > Method) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) > at > org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(
[jira] [Updated] (MAPREDUCE-7095) Race conditions in FadvisedChunkedFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-7095: -- Status: Patch Available (was: Open) > Race conditions in FadvisedChunkedFile > -- > > Key: MAPREDUCE-7095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-8090.000.patch, YARN-8090.001.patch, > YARN-8090.002.patch > > > When a file is closed mutple times by multiple threads, all but the first > close will generate a WARNING message. > {code:java} > 11:04:33.605 AM WARNFadvisedChunkedFile > Failed to manage OS cache for > /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out > EBADF: Bad file descriptor > at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native > Method) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) > at > org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) > at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --
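The EBADF above comes from a second close() issuing posix_fadvise on a file descriptor that an earlier close() already released. As a rough sketch of the usual shape of a fix for this kind of double-close race (not the attached YARN-8090 patches; the CloseOnceGuard class and the closeOnce/releaseOsCache names are hypothetical), the cache-cleanup step can be made idempotent with an atomic flag:
{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical sketch: run the fadvise/cleanup step at most once, even when
 * close() is invoked concurrently from several Netty I/O threads.
 */
public class CloseOnceGuard {
  private final AtomicBoolean closed = new AtomicBoolean(false);

  public void closeOnce(Runnable releaseOsCache) {
    // Only the thread that wins the compareAndSet performs the cleanup; every
    // later close() becomes a no-op instead of touching an already-closed fd.
    if (closed.compareAndSet(false, true)) {
      releaseOsCache.run();
    }
  }
}
{code}
Whether such a guard would live inside FadvisedChunkedFile or in its caller is a design choice the actual patches may make differently.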
[jira] [Moved] (MAPREDUCE-7095) Race conditions in FadvisedChunkedFile
[ https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen moved YARN-8090 to MAPREDUCE-7095: - Affects Version/s: (was: 3.1.0) 3.1.0 Key: MAPREDUCE-7095 (was: YARN-8090) Project: Hadoop Map/Reduce (was: Hadoop YARN) > Race conditions in FadvisedChunkedFile > -- > > Key: MAPREDUCE-7095 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Miklos Szegedi >Assignee: Miklos Szegedi >Priority: Minor > Attachments: YARN-8090.000.patch, YARN-8090.001.patch, > YARN-8090.002.patch > > > When a file is closed mutple times by multiple threads, all but the first > close will generate a WARNING message. > {code:java} > 11:04:33.605 AM WARNFadvisedChunkedFile > Failed to manage OS cache for > /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out > EBADF: Bad file descriptor > at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native > Method) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267) > at > org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146) > at > org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192) > at > org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493) > at > org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371) > at > org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667) > at > org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88) > at > org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) > at > 
org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) > at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375) > at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) > at > org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337) > at > org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) > at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) > at > org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) > at > org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPool
[jira] [Commented] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408722#comment-16408722 ] Haibo Chen commented on MAPREDUCE-6948: --- This was observed in alpha4, so I suspect there is another race condition. > TestJobImpl.testUnusableNodeTransition failed > - > > Key: MAPREDUCE-6948 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6948 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 3.0.0-alpha4 >Reporter: Haibo Chen >Assignee: Jim Brennan >Priority: Major > Labels: unit-test > > *Error Message* > expected: but was: > *Stacktrace* > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041) > at > org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615) > *Standard out* > {code} > 2017-08-30 10:12:21,928 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class > org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler > 2017-08-30 10:12:21,939 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class > org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob > 2017-08-30 10:12:21,940 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class > org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf > 2017-08-30 10:12:21,940 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.jobhistory.EventType for class > org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf > 2017-08-30 10:12:21,940 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class > org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf > 2017-08-30 10:12:21,941 INFO [Thread-49] impl.JobImpl > (JobImpl.java:setup(1534)) - Adding job token for job_123456789_0001 to > jobTokenSecretManager > 2017-08-30 10:12:21,941 WARN [Thread-49] impl.JobImpl > (JobImpl.java:setup(1540)) - Shuffle secret key missing from job credentials. > Using job token secret as shuffle secret. > 2017-08-30 10:12:21,944 INFO [Thread-49] impl.JobImpl > (JobImpl.java:makeUberDecision(1305)) - Not uberizing job_123456789_0001 > because: not enabled; > 2017-08-30 10:12:21,944 INFO [Thread-49] impl.JobImpl > (JobImpl.java:createMapTasks(1562)) - Input size for job > job_123456789_0001 = 0. 
Number of splits = 2 > 2017-08-30 10:12:21,945 INFO [Thread-49] impl.JobImpl > (JobImpl.java:createReduceTasks(1579)) - Number of reduces for job > job_123456789_0001 = 1 > 2017-08-30 10:12:21,945 INFO [Thread-49] impl.JobImpl > (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from NEW > to INITED > 2017-08-30 10:12:21,946 INFO [Thread-49] impl.JobImpl > (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from > INITED to SETUP > 2017-08-30 10:12:21,954 INFO [CommitterEvent Processor #0] > commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - > Processing the event EventType: JOB_SETUP > 2017-08-30 10:12:21,978 INFO [AsyncDispatcher event handler] impl.JobImpl > (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from > SETUP to RUNNING > 2017-08-30 10:12:21,983 INFO [Thread-49] event.AsyncDispatcher > (AsyncDispatcher.java:register(209)) - Registering class > org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class > org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$5 > 2017-08-30 10:12:22,000 INFO [Thread-49] impl.JobImpl > (JobImpl.java:transition(1953)) - Num completed Tasks: 1 > 2017-08-30 10:12:22,029 INFO [Thread-49] impl.JobImpl > (JobImpl.java:transition(1953)) - Num completed Tasks: 2 > 2017-08-30 10:12:22,032 INFO [Thread-49] impl.JobImpl > (JobImpl.java:actOnUnusableNode(1354)) - TaskAttempt killed because it ran on > unusable node Mock for NodeId, hashCode: 1280187896. > AttemptId:attempt_123456789_0001_m_00_0 > 2017-08-30 10:12:22,032 INF
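Since the failure is an assertion racing the AsyncDispatcher thread that delivers the unusable-node event, one common way to harden such checks (a sketch under that assumption, not the fix that eventually landed; WaitUtil and waitUntil are made-up names) is to poll the asynchronously updated state with a timeout before asserting on it:
{code:java}
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Sketch: wait for an asynchronously updated condition before asserting. */
public final class WaitUtil {
  private WaitUtil() {
  }

  public static void waitUntil(Supplier<Boolean> condition, long timeoutMs)
      throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (System.nanoTime() < deadline) {
      if (condition.get()) {
        return;          // condition reached; the assertion can now run safely
      }
      Thread.sleep(50);  // give the dispatcher thread time to deliver events
    }
    throw new AssertionError("Condition not met within " + timeoutMs + " ms");
  }
}
{code}
In the test this would look roughly like waitUntil(() -> job.getInternalState() == expectedState, 10000) ahead of the existing state assertion; the accessor name here is illustrative.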
[jira] [Updated] (MAPREDUCE-7065) Improve information stored in ATSv2 for MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-7065: -- Description: While exploring the possibility of retrieving every piece of information that JHS presents today through ATSv2, I found a few improvements we can make. 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT 2) Task attempt final state are stored in the events, so we can not use infofilter to group task attempts by final state, which is what JHS does. 3) Display names of counters are not stored in JHS. We are currently storing (counter name, display name, value) as a metric (counter name, value). We can potentially store (counter name, display name) as an info. Similarly for sources of Job configuration properties 4) Job level counters and configuration properties are stored both in ApplicationTable and EntityTable. It's probably safe just to store MR specific counters in EntityTable. One general problem I see around this area in MR: 1) We can precompute # of failed/killed/successful map/reduce task attempts and average map/reduce/shuffle/merge time in the AM. This would avoid iterating over all task attempts when JHS servers the Job Overview Page. To fully replace JHS with ATSv2, three functionalities need to be supported by ATSv2 1) /apps/ query so that a list of all jobs can be retrieved (YARN-6058) 2) support streaming api to get all generic entities (YARN-5627) 3) support per-app data retention policy. Likely a setting in TimelineWriter that allow admins specifies how long information of a given application should be kepts, in the form of TTL in HBase. was: While exploring the possibility of retrieving every piece of information that JHS presents today through ATSv2, I found a few improvements we can make. 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT 2) Task attempt final state are stored in the events, so we can not use infofilter to group task attempts by final state, which is what JHS does. 3) Display names of counters are not stored in JHS. We are currently storing (counter name, display name, value) as a metric (counter name, value). We can potentially store (counter name, display name) as an info. Similarly for sources of Job configuration properties 4) Job level counters and configuration properties are stored both in ApplicationTable and EntityTable. It's probably safe just to store MR specific counters in EntityTable. One general problem I see around this area in MR: 1) We can precompute # of failed/killed/successful map/reduce task attempts and average map/reduce/shuffle/merge time in the AM. This would avoid iterating over all task attempts when JHS servers the Job Overview Page. To fully replace JHS with ATSv2, three functionalities need to be supported by ATSv2 1) /apps/ query so that a list of all jobs can be retrieved 2) support streaming api to get all generic entities (YARN-5627) 3) support per-app data retention policy. Likely a setting in TimelineWriter that allow admins specifies how long information of a given application should be kepts, in the form of TTL in HBase. 
> Improve information stored in ATSv2 for MR jobs > --- > > Key: MAPREDUCE-7065 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > > While exploring the possibility of retrieving every piece of information that > JHS presents today through ATSv2, I found a few improvements we can make. > 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are > indistinguishably stored as entities of type MR_TASK. We can split MR_TASK > into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT > 2) Task attempt final state are stored in the events, so we can not use > infofilter to group task attempts by final state, which is what JHS does. > 3) Display names of counters are not stored in JHS. We are currently storing > (counter name, display name, value) as a metric (counter name, value). We can > potentially store (counter name, display name) as an info. Similarly for > sources of Job configuration properties > 4) Job level counters and configuration properties are stored both in > ApplicationTable and EntityTable. It's probably safe just to store MR > specific counters in EntityTable. > > One general problem I see around this area in M
[jira] [Commented] (MAPREDUCE-7065) Improve information stored in ATSv2 for MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396248#comment-16396248 ] Haibo Chen commented on MAPREDUCE-7065: --- CC [~rohithsharma] [~vrushalic]. Is it OK at this point to change how MR data is stored in ATSv2, for example splitting MR_TASK into MR_MAP_TASK and MR_REDUCE_TASK? > Improve information stored in ATSv2 for MR jobs > --- > > Key: MAPREDUCE-7065 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > > While exploring the possibility of retrieving every piece of information that > JHS presents today through ATSv2, I found a few improvements we can make. > 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are > indistinguishably stored as entities of type MR_TASK. We can split MR_TASK > into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT > 2) Task attempt final states are stored in the events, so we cannot use > infofilter to group task attempts by final state, which is what JHS does. > 3) Display names of counters are not stored in JHS. We are currently storing > (counter name, display name, value) as a metric (counter name, value). We can > potentially store (counter name, display name) as an info. Similarly for > sources of Job configuration properties > 4) Job level counters and configuration properties are stored both in > ApplicationTable and EntityTable. It's probably safe just to store MR > specific counters in EntityTable. > > One general problem I see around this area in MR: > 1) We can precompute # of failed/killed/successful map/reduce task attempts > and average map/reduce/shuffle/merge time in the AM. This would avoid > iterating over all task attempts when JHS serves the Job Overview Page. > > To fully replace JHS with ATSv2, three functionalities need to be supported > by ATSv2 > 1) /apps/ query so that a list of all jobs can be retrieved > 2) support streaming api to get all generic entities (YARN-5627) > 3) support per-app data retention policy. Likely a setting in TimelineWriter > that allows admins to specify how long information of a given application > should be kept, in the form of TTL in HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
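For a rough sketch of what items 1) through 3) could look like with the ATSv2 record classes (TimelineEntity/TimelineMetric, used here as I understand their API; the MR_MAP_TASK type string and the FINAL_STATE/..._DISPLAY_NAME info keys are purely illustrative, not what MR writes today):
{code:java}
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineMetric;

public class MrMapTaskEntitySketch {
  public static TimelineEntity mapTaskEntity(String taskId, long bytesWritten) {
    TimelineEntity entity = new TimelineEntity();
    entity.setType("MR_MAP_TASK");   // proposed split of MR_TASK by task type
    entity.setId(taskId);

    // Final state as info rather than only as an event, so an infofilter can
    // group task attempts by final state the way JHS does (item 2).
    entity.addInfo("FINAL_STATE", "SUCCEEDED");

    // The counter value stays a metric keyed by counter name ...
    TimelineMetric counter = new TimelineMetric();
    counter.setId("FILE_BYTES_WRITTEN");
    counter.addValue(System.currentTimeMillis(), bytesWritten);
    entity.addMetric(counter);

    // ... while the human-readable display name is stored as info (item 3).
    entity.addInfo("FILE_BYTES_WRITTEN_DISPLAY_NAME",
        "FILE: Number of bytes written");
    return entity;
  }
}
{code}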
[jira] [Updated] (MAPREDUCE-7065) Improve information stored in ATSv2 for MR jobs
[ https://issues.apache.org/jira/browse/MAPREDUCE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-7065: -- Description: While exploring the possibility of retrieving every piece of information that JHS presents today through ATSv2, I found a few improvements we can make. 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT 2) Task attempt final state are stored in the events, so we can not use infofilter to group task attempts by final state, which is what JHS does. 3) Display names of counters are not stored in JHS. We are currently storing (counter name, display name, value) as a metric (counter name, value). We can potentially store (counter name, display name) as an info. Similarly for sources of Job configuration properties 4) Job level counters and configuration properties are stored both in ApplicationTable and EntityTable. It's probably safe just to store MR specific counters in EntityTable. One general problem I see around this area in MR: 1) We can precompute # of failed/killed/successful map/reduce task attempts and average map/reduce/shuffle/merge time in the AM. This would avoid iterating over all task attempts when JHS servers the Job Overview Page. To fully replace JHS with ATSv2, three functionalities need to be supported by ATSv2 1) /apps/ query so that a list of all jobs can be retrieved 2) support streaming api to get all generic entities (YARN-5627) 3) support per-app data retention policy. Likely a setting in TimelineWriter that allow admins specifies how long information of a given application should be kepts, in the form of TTL in HBase. was: While exploring the possibility of retrieving every piece of information that JHS presents today through ATSv2, I found a few improvements we can make. 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT 2) Task attempt final state are stored in the events, so we can not use infofilter to group task attempts by final state, which is what JHS does. 3) Display names of counters are not stored in JHS. We are currently storing (counter name, display name, value) as a metric (counter name, value). We can potentially store (counter name, display name) as an info. Similarly for sources of Job configuration properties 4) Job level counters and configuration properties are stored both in ApplicationTable and EntityTable. It's probably safe just to store MR specific counters in EntityTable. One general problem I see around this area in MR: 1) We can precompute # of failed/killed/successful map/reduce task attempts and average map/reduce/shuffle/merge time in the AM. This would avoid iterating over all task attempts when JHS servers the Job Overview Page. To fully replace JHS with ATSv2, three functionalities need to be supported by ATSv2 1) /apps/ query so that a list of all jobs can be retrieved 2) support streaming api to get all generic entities (YARN-5672) 3) support per-app data retention policy. Likely a setting in TimelineWriter that allow admins specifies how long information of a given application should be kepts, in the form of TTL in HBase. 
> Improve information stored in ATSv2 for MR jobs > --- > > Key: MAPREDUCE-7065 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Major > > While exploring the possibility of retrieving every piece of information that > JHS presents today through ATSv2, I found a few improvements we can make. > 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are > indistinguishably stored as entities of type MR_TASK. We can split MR_TASK > into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT > 2) Task attempt final state are stored in the events, so we can not use > infofilter to group task attempts by final state, which is what JHS does. > 3) Display names of counters are not stored in JHS. We are currently storing > (counter name, display name, value) as a metric (counter name, value). We can > potentially store (counter name, display name) as an info. Similarly for > sources of Job configuration properties > 4) Job level counters and configuration properties are stored both in > ApplicationTable and EntityTable. It's probably safe just to store MR > specific counters in EntityTable. > > One general problem I see around this area in MR: > 1) We c
[jira] [Created] (MAPREDUCE-7065) Improve information stored in ATSv2 for MR jobs
Haibo Chen created MAPREDUCE-7065: - Summary: Improve information stored in ATSv2 for MR jobs Key: MAPREDUCE-7065 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Haibo Chen Assignee: Haibo Chen While exploring the possibility of retrieving every piece of information that JHS presents today through ATSv2, I found a few improvements we can make. 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT 2) Task attempt final state are stored in the events, so we can not use infofilter to group task attempts by final state, which is what JHS does. 3) Display names of counters are not stored in JHS. We are currently storing (counter name, display name, value) as a metric (counter name, value). We can potentially store (counter name, display name) as an info. Similarly for sources of Job configuration properties 4) Job level counters and configuration properties are stored both in ApplicationTable and EntityTable. It's probably safe just to store MR specific counters in EntityTable. One general problem I see around this area in MR: 1) We can precompute # of failed/killed/successful map/reduce task attempts and average map/reduce/shuffle/merge time in the AM. This would avoid iterating over all task attempts when JHS servers the Job Overview Page. To fully replace JHS with ATSv2, three functionalities need to be supported by ATSv2 1) /apps/ query so that a list of all jobs can be retrieved 2) support streaming api to get all generic entities (YARN-5672) 3) support per-app data retention policy. Likely a setting in TimelineWriter that allow admins specifies how long information of a given application should be kepts, in the form of TTL in HBase. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6441: -- Attachment: MAPREDUCE-6441.011.patch > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Ray Chiang >Priority: Major > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, > MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, > MAPREDUCE-6441.009.patch, MAPREDUCE-6441.010.patch, MAPREDUCE-6441.011.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are 
kicked off in the same second. The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
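One of the attached patches is named jobid-plus-uuid; a minimal illustration of that idea (a sketch only, not the actual patch; the LocalCacheDirNaming class is made up) replaces the timestamp-seeded counter with a name that cannot collide across concurrent local jobs:
{code:java}
import java.io.File;
import java.util.UUID;

public class LocalCacheDirNaming {
  /**
   * Sketch: derive the per-job local directory name from the job id plus a
   * random UUID instead of an AtomicLong seeded with System.currentTimeMillis(),
   * so two jobs started in the same millisecond no longer pick the same path.
   */
  public static File uniqueLocalDir(File baseDir, String jobId) {
    return new File(baseDir, jobId + "_" + UUID.randomUUID());
  }

  public static void main(String[] args) {
    File base = new File("/tmp/hadoop-hadoop/mapred/local");
    // Prints something like .../job_1406915233073_0001_<random uuid>
    System.out.println(uniqueLocalDir(base, "job_1406915233073_0001"));
  }
}
{code}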
[jira] [Updated] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6441: -- Attachment: MAPREDUCE-6441.010.patch > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Ray Chiang >Priority: Major > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, > MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, > MAPREDUCE-6441.009.patch, MAPREDUCE-6441.010.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same 
second. The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6441: -- Attachment: MAPREDUCE-6441.009.patch > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Ray Chiang >Priority: Major > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, > MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, > MAPREDUCE-6441.009.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same second. 
The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6441: -- Attachment: (was: MAPREDUCE-6441.009.patch) > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Ray Chiang >Priority: Major > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, > MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, > MAPREDUCE-6441.009.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same second. 
The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Assigned] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS
[ https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned MAPREDUCE-4669: - Assignee: (was: Haibo Chen) > MRAM web UI does not work with HTTPS > > > Key: MAPREDUCE-4669 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha >Reporter: Alejandro Abdelnur >Priority: Major > > With Kerberos enable, the MRAM runs as the user that submitted the job, thus > the MRAM process cannot read the cluster keystore files to get the > certificates to start its HttpServer using HTTPS. > We need to decouple the keystore used by RM/NM/NN/DN (which are cluster > provided) from the keystore used by AMs (which ought to be user provided). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363021#comment-16363021 ] Haibo Chen commented on MAPREDUCE-6441: --- [~rchiang] Do you plan to work on this? I can take it over if you don't have the cycles > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Ray Chiang >Priority: Major > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, > MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, > MAPREDUCE-6441.009.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > 
org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same second. The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-7041) MR should not try to clean up at first job attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-7041: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.10.0 3.0.1 3.1.0 Status: Resolved (was: Patch Available) Thanks [~tasanuma0829] for reporting the issue, and [~grepas] for the quick fix. I committed the patch to branch-2, branch-3.0 and trunk. > MR should not try to clean up at first job attempt > -- > > Key: MAPREDUCE-7041 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7041 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Gergo Repas >Priority: Major > Fix For: 3.1.0, 3.0.1, 2.10.0 > > Attachments: MAPREDUCE-7041.000.patch > > > These tests fail in trunk now. MAPREDUCE-6984 may be related. > {noformat} > hadoop.mapreduce.v2.TestMROldApiJobs.testJobSucceed > hadoop.mapred.TestJobCleanup.testCustomAbort > hadoop.mapreduce.lib.output.TestJobOutputCommitter.testCustomAbort > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
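The renamed summary points at the root cause: the first AM attempt has nothing left behind by an earlier attempt, so it should not run the previous-attempt cleanup at all. A tiny sketch of the guard this implies (illustrative only, not the committed MAPREDUCE-7041.000.patch; the class and method names are made up):
{code:java}
public class PreviousAttemptCleanupGuard {
  /**
   * Sketch: cleanup of a previous attempt's output only makes sense for
   * attempts after the first, and only when the AM restarts without recovery.
   */
  public static boolean shouldCleanUpPreviousAttempt(int appAttemptNumber,
      boolean recoveryUsed) {
    return appAttemptNumber > 1 && !recoveryUsed;
  }
}
{code}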
[jira] [Updated] (MAPREDUCE-7041) MR should not try to clean up at first job attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-7041: -- Summary: MR should not try to clean up at first job attempt (was: Several unit tests in hadoop-mapreduce-client-jobclient are failed) > MR should not try to clean up at first job attempt > -- > > Key: MAPREDUCE-7041 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7041 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Gergo Repas >Priority: Major > Attachments: MAPREDUCE-7041.000.patch > > > These tests fail in trunk now. MAPREDUCE-6984 may be related. > {noformat} > hadoop.mapreduce.v2.TestMROldApiJobs.testJobSucceed > hadoop.mapred.TestJobCleanup.testCustomAbort > hadoop.mapreduce.lib.output.TestJobOutputCommitter.testCustomAbort > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7041) Several unit tests in hadoop-mapreduce-client-jobclient are failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340329#comment-16340329 ] Haibo Chen commented on MAPREDUCE-7041: --- +1. Checking this in shortly. > Several unit tests in hadoop-mapreduce-client-jobclient are failed > -- > > Key: MAPREDUCE-7041 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7041 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Takanobu Asanuma >Assignee: Gergo Repas >Priority: Major > Attachments: MAPREDUCE-7041.000.patch > > > These tests fail in trunk now. MAPREDUCE-6984 may be related. > {noformat} > hadoop.mapreduce.v2.TestMROldApiJobs.testJobSucceed > hadoop.mapred.TestJobCleanup.testCustomAbort > hadoop.mapreduce.lib.output.TestJobOutputCommitter.testCustomAbort > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt in case of no recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6984: -- Fix Version/s: 2.10.0 > MR AM to clean up temporary files from previous attempt in case of no recovery > -- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Fix For: 3.1.0, 3.0.1, 2.10.0 > > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, > MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, > MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, > MAPREDUCE-6984.009.patch, MAPREDUCE-6984.010.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt in case of no recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6984: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.1 3.1.0 Status: Resolved (was: Patch Available) Thanks [~grepas] for your contribution. I have committed the patch to branch-2, branch-3.0 and trunk > MR AM to clean up temporary files from previous attempt in case of no recovery > -- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Fix For: 3.1.0, 3.0.1 > > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, > MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, > MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, > MAPREDUCE-6984.009.patch, MAPREDUCE-6984.010.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
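As a sketch of the cleanup the description asks for (assumed {outputDir}/_temporary/{attempt} layout; not the committed patch, and the StaleAttemptDirCleaner name is made up), the new attempt can delete the previous attempt's temporary directory up front when recovery is not used:
{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StaleAttemptDirCleaner {
  /**
   * Sketch: remove the previous AM attempt's temporary output directory when
   * the new attempt starts without recovery, freeing HDFS space early.
   */
  public static void cleanPreviousAttemptDir(Configuration conf, Path outputDir,
      int currentAttempt) throws IOException {
    if (currentAttempt <= 1) {
      return; // first attempt: nothing left behind by an earlier attempt
    }
    Path stale = new Path(outputDir, "_temporary/" + (currentAttempt - 1));
    FileSystem fs = stale.getFileSystem(conf);
    if (fs.exists(stale)) {
      fs.delete(stale, true); // recursive delete of the stale attempt directory
    }
  }
}
{code}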
[jira] [Updated] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt in case of no recovery
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6984: -- Summary: MR AM to clean up temporary files from previous attempt in case of no recovery (was: MR AM to clean up temporary files from previous attempt) > MR AM to clean up temporary files from previous attempt in case of no recovery > -- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, > MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, > MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, > MAPREDUCE-6984.009.patch, MAPREDUCE-6984.010.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332761#comment-16332761 ] Haibo Chen commented on MAPREDUCE-6984: --- +1 pending Jenkins. > MR AM to clean up temporary files from previous attempt > --- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, > MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, > MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, > MAPREDUCE-6984.009.patch, MAPREDUCE-6984.010.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332679#comment-16332679 ] Haibo Chen commented on MAPREDUCE-6984: --- Thanks [~grepas] for the update! Please address the check style issue and a minor comment. Not sure what this line of code is for. Otherwise, LGTM {quote} conf.setBoolean("want.am.recovery", true) {quote} > MR AM to clean up temporary files from previous attempt > --- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, > MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, > MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, > MAPREDUCE-6984.009.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328901#comment-16328901 ] Haibo Chen commented on MAPREDUCE-6984: --- In TestRecovery, we always create an instance of MRApp and stop it to simulate an AM failure, which in my opinion is more readable than mucking with the file system directly in our test. Plus, the current test is coupled to the behavior of FileOutputCommitter. We can add a similar test in TestRecovery, wrap the OutputCommitter in a spy, and verify that abortJob is called. This way, we cover other OutputCommitters. > MR AM to clean up temporary files from previous attempt > --- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, > MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, > MAPREDUCE-6984.006.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
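A rough sketch of the spy-based check suggested in the comment above (Mockito-style; createOutputCommitter() and the surrounding MRApp wiring are stand-ins, not the actual test code):
{code}
// Wrap the committer used by the restarted AM attempt in a Mockito spy ...
OutputCommitter committer = Mockito.spy(createOutputCommitter(configuration));

// ... drive the scenario (first attempt stopped via MRApp, second attempt
// started without recovery), then assert the previous output was aborted.
// This only relies on the OutputCommitter contract, so it also covers
// committers other than FileOutputCommitter.
Mockito.verify(committer).abortJob(
    Mockito.any(JobContext.class), Mockito.eq(JobStatus.State.FAILED));
{code}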
[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327558#comment-16327558 ] Haibo Chen commented on MAPREDUCE-6984: --- Thanks [~grepas] for the updated patch! A few more comments: # In cleanUpPreviousAttemptOutput(ApplicationAttemptId appAttemptId), we essentially clean up the whole job rather than an individual job attempt. If there was previously more than one attempt, only the first job cleanup will succeed. How about we rename it to CleanupPreviousJobOutput and call it only once in cleanUpPreviousAttemptOutput()? # We should probably pass FAILED to the abortJob() call. # For the unit test, we can do something similar in TestRecovery to simulate multiple job attempts, which I think is more readable. > MR AM to clean up temporary files from previous attempt > --- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas >Priority: Major > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, > MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, > MAPREDUCE-6984.006.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321295#comment-16321295 ] Haibo Chen commented on MAPREDUCE-6926: --- Thanks [~miklos.szeg...@cloudera.com] for your reviews! > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Fix For: YARN-1011 > > Attachments: MAPREDUCE-6926-YARN-1011.00.patch, > MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch, > MAPREDUCE-6926-YARN-1011.03.patch, MAPREDUCE-6926-YARN-1011.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6926: -- Attachment: MAPREDUCE-6926-YARN-1011.04.patch Good catch. I updated the patch to address that issue. > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch, > MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch, > MAPREDUCE-6926-YARN-1011.03.patch, MAPREDUCE-6926-YARN-1011.04.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6926: -- Attachment: MAPREDUCE-6926-YARN-1011.03.patch > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch, > MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch, > MAPREDUCE-6926-YARN-1011.03.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310315#comment-16310315 ] Haibo Chen commented on MAPREDUCE-6926: --- I have instead rebased YARN-1011 branch and manually triggered a Jenkins job > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch, > MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308813#comment-16308813 ] Haibo Chen commented on MAPREDUCE-6926: --- As pointed out by Miklos offline, HADOOP-15122 should fix the build issue here. Cherry-picking HADOOP-15122 and re-triggering another Jenkins job. > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch, > MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6926: -- Attachment: MAPREDUCE-6926-YARN-1011.02.patch > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch, > MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308705#comment-16308705 ] Haibo Chen commented on MAPREDUCE-6926: --- Good catch. I updated the message in the new patch. > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch, > MAPREDUCE-6926-YARN-1011.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-7015) Possible race condition in JHS if the job is not loaded
[ https://issues.apache.org/jira/browse/MAPREDUCE-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308511#comment-16308511 ] Haibo Chen commented on MAPREDUCE-7015: --- [~pbacsko] I think you meant to say step #6 is slow enough for the most times, right? Otherwise, the cli client would almost always try to find the config file in the intermediate directory when the file is already quickly moved into the done directory, and fail. > Possible race condition in JHS if the job is not loaded > --- > > Key: MAPREDUCE-7015 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7015 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: MAPREDUCE-7015-POC01.patch > > > There could be a race condition inside JHS. In our build environment, > {{TestMRJobClient.testJobClient()}} failed with this exception: > {noformat} > ava.io.FileNotFoundException: File does not exist: > hdfs://localhost:32836/tmp/hadoop-yarn/staging/history/done_intermediate/jenkins/job_1509975084722_0001_conf.xml > at > org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266) > at > org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292) > at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2123) > at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2092) > at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2068) > at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:460) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at > org.apache.hadoop.mapreduce.TestMRJobClient.runTool(TestMRJobClient.java:94) > at > org.apache.hadoop.mapreduce.TestMRJobClient.testConfig(TestMRJobClient.java:551) > at > org.apache.hadoop.mapreduce.TestMRJobClient.testJobClient(TestMRJobClient.java:167) > {noformat} > Root cause: > 1. MapReduce job completes > 2. CLI calls {{cluster.getJob(jobid)}} > 3. The job is finished and the client side gets redirected to JHS > 4. The job data is missing from {{CachedHistoryStorage}} so JHS tries to find > the job > 5. First it scans the intermediate directory and finds the job > 6. The call {{moveToDone()}} is scheduled for execution on a separate thread > inside {{moveToDoneExecutor}} but does not get the chance to run immediately > 7. RPC invocation returns with the path pointing to > {{/tmp/hadoop-yarn/staging/history/done_intermediate}} > 8. The call to {{moveToDone()}} completes which moves the contents of > {{done_intermediate}} to {{done}} > 9. Hadoop CLI tries to download the config file from done_intermediate but > it's no longer there > Usually step #6 is fast enough to complete before step #7, but sometimes it > can get behind, causing this race condition. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
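To make the race concrete (illustration only, not the attached POC patch): the path handed back in step #7 can disappear between steps #8 and #9, so one client-side way to tolerate the move is to re-resolve the job and retry the copy. Here cluster, jobId, fs, and localPath stand for objects the CLI already has, and it is an assumption that a second getJob() call returns the refreshed done-directory path.
{code}
Path confPath = new Path(cluster.getJob(jobId).getStatus().getJobFile());
try {
  fs.copyToLocalFile(confPath, localPath);
} catch (FileNotFoundException e) {
  // moveToDone() may have relocated the files between the RPC and the copy;
  // ask the JHS for the job again and retry against the new location.
  confPath = new Path(cluster.getJob(jobId).getStatus().getJobFile());
  fs.copyToLocalFile(confPath, localPath);
}
{code}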
[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308413#comment-16308413 ] Haibo Chen commented on MAPREDUCE-6926: --- Thanks [~miklos.szeg...@cloudera.com] for the review! I agree with you that using Clock.getTime() would ease unit testing in general. But in this case, there are already other ContainerRequest constructors that allow easy unit testing through direct specification of the request time. Plus, I believe my change preserves the current behavior of the code. > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch, > MAPREDUCE-6926-YARN-1011.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6926: -- Attachment: MAPREDUCE-6926-YARN-1011.01.patch Updated the patch to address the unit test failure plus the checkstyle issues. > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch, > MAPREDUCE-6926-YARN-1011.01.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6926: -- Status: Patch Available (was: Open) > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription
[ https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6926: -- Attachment: MAPREDUCE-6926-YARN-1011.00.patch > Allow MR jobs to opt out of oversubscription > > > Key: MAPREDUCE-6926 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: mrv2 >Affects Versions: 3.0.0-alpha3 >Reporter: Haibo Chen >Assignee: Haibo Chen > Attachments: MAPREDUCE-6926-YARN-1011.00.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221178#comment-16221178 ] Haibo Chen commented on MAPREDUCE-6984: --- Thanks [~grepas] for the patch! A few high-level comments: 1) The cleanup of previous attempts does not impact the current attempt in any case, so I think we should catch all exceptions in cleanUpPreviousAttemptOutput() instead of just IOExceptions. 2) If we have multiple previous attempts and the previous attempt already cleaned up its own previous attempts, we'd get a FileNotFoundException while doing cleanup in the current attempt, resulting in bogus error messages from LOG.error("Error while trying to clean up previous attempt (" + appAttemptId + ")", e); We could catch FileNotFoundException and ignore it first, and then log any other exception, like this:
{code}
private void cleanUpPreviousAttemptOutput(ApplicationAttemptId appAttemptId) {
  Configuration configuration = new Configuration(getConfig());
  configuration.setInt(MRJobConfig.APPLICATION_ATTEMPT_ID,
      appAttemptId.getAttemptId());
  JobContext jobContext = getJobContextFromConf(configuration);
  try {
    LOG.info("Starting to clean up previous attempt's (" + appAttemptId
        + ") temporary files");
    OutputCommitter outputCommitter = createOutputCommitter(configuration);
    outputCommitter.abortJob(jobContext, State.KILLED);
    LOG.info("Finished cleaning up previous attempt's (" + appAttemptId
        + ") temporary files");
  } catch (FileNotFoundException e) {
    // safely ignore: a previous attempt already removed the directory
  } catch (Exception e) {
    // the clean up of a previous attempt is not critical to the success
    // of this job - only logging the error
    LOG.error("Error while trying to clean up previous attempt ("
        + appAttemptId + ")", e);
  }
}
{code}
3) We only recover successful tasks, so we can also do the cleanup of failed/killed tasks even if recovery is enabled, but that's another optimization. I am ok with not doing it if it is complicated. > MR AM to clean up temporary files from previous attempt > --- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6992) Race for temp dir in LocalDistributedCacheManager.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220977#comment-16220977 ] Haibo Chen commented on MAPREDUCE-6992: --- I believe this is a duplicate of MAPREDUCE-6441 > Race for temp dir in LocalDistributedCacheManager.java > -- > > Key: MAPREDUCE-6992 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6992 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Philip Zeyliger > > When localizing distributed cache files in "local" mode, > LocalDistributedCacheManager.java chooses a "unique" directory based on a > millisecond time stamp. When running code with some parallelism, it's > possible to run into this. > The error message looks like > {code} > bq. java.io.FileNotFoundException: jenkins/mapred/local/1508958341829_tmp > does not exist > {code} > I ran into this in Impala's data loading. There, we run a HiveServer2 which > runs in MapReduce. If multiple queries are submitted simultaneously to the > HS2, they conflict on this directory. Googling found that StreamSets ran into > something very similar looking at > https://issues.streamsets.com/browse/SDC-5473. > I believe the buggy code is (link: > https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L94) > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = > new AtomicLong(System.currentTimeMillis()); > {code} > Notably, a similar code path uses an actual random number generator > ({{LocalJobRunner.java}}, > https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java#L912). > {code} > public String getStagingAreaDir() throws IOException { > Path stagingRootDir = new Path(conf.get(JTConfig.JT_STAGING_AREA_ROOT, > "/tmp/hadoop/mapred/staging")); > UserGroupInformation ugi = UserGroupInformation.getCurrentUser(); > String user; > randid = rand.nextInt(Integer.MAX_VALUE); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
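A small illustration of why the millisecond seed collides (illustrative only, not the committed fix): two local-mode processes that start in the same millisecond derive the same "unique" number and therefore the same temp directory name, whereas folding in something process-unique, such as a random UUID, cannot collide.
{code}
// What LocalDistributedCacheManager does today: a counter seeded with the
// current time, so two JVMs started in the same millisecond pick the same name.
AtomicLong uniqueNumberGenerator = new AtomicLong(System.currentTimeMillis());
String dirName = Long.toString(uniqueNumberGenerator.incrementAndGet());

// One possible alternative (sketch): keep the counter for uniqueness within
// the process, but append a per-process random component so concurrent
// processes cannot land on the same directory.
String safeDirName = uniqueNumberGenerator.incrementAndGet()
    + "_" + java.util.UUID.randomUUID();
{code}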
[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215975#comment-16215975 ] Haibo Chen commented on MAPREDUCE-6984: --- Manually trigger the jenkins job. > MR AM to clean up temporary files from previous attempt > --- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Assigned] (MAPREDUCE-6631) shuffle handler would benefit from per-local-dir threads
[ https://issues.apache.org/jira/browse/MAPREDUCE-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned MAPREDUCE-6631: - Assignee: (was: Haibo Chen) > shuffle handler would benefit from per-local-dir threads > > > Key: MAPREDUCE-6631 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6631 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: nodemanager >Affects Versions: 2.7.2, 3.0.0-alpha1 >Reporter: Nathan Roberts > > [~jlowe] and I discussed this while investigating I/O starvation we have been > seeing on our clusters lately (possibly amplified by increased tez > workloads). > If a particular disk is being slow, it is very likely that all shuffle netty > threads will be blocked on the read side of sendfile(). (sendfile() is > asynchronous on the outbound socket side, but not on the read side.) This > causes the entire shuffle subsystem to slow down. > It seems like we could make the netty threads more asynchronous by > introducing a small set of threads per local-dir that are responsible for the > actual sendfile() invocations. > This would not only improve shuffles that span drives, but also improve > situations where there is a single large shuffle from a single local-dir. It > would allow other drives to continue serving shuffle requests, AND avoid a > large number of readers (2X number_of_cores by default) all fighting for the > same drive, which becomes unfair to everything else on the system. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
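Purely as a sketch of the per-local-dir idea described above (hypothetical names, not code from any patch): a small executor per local dir runs the blocking sendfile() work, so a slow disk only stalls requests for data that lives on that disk.
{code}
class PerDirSenders {
  private final java.util.Map<String, java.util.concurrent.ExecutorService>
      sendersByDir = new java.util.HashMap<>();

  PerDirSenders(String[] localDirs) {
    for (String dir : localDirs) {
      // a small fixed pool per disk; the size would be tuned per spindle/SSD
      sendersByDir.put(dir,
          java.util.concurrent.Executors.newFixedThreadPool(2));
    }
  }

  /** Run the blocking transfer off the netty worker, on the owning dir's pool. */
  void submit(String localDir, Runnable sendFileWork) {
    sendersByDir.get(localDir).execute(sendFileWork);
  }
}
{code}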
[jira] [Assigned] (MAPREDUCE-5124) AM lacks flow control for task events
[ https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned MAPREDUCE-5124: - Assignee: Peter Bacsko (was: Haibo Chen) > AM lacks flow control for task events > - > > Key: MAPREDUCE-5124 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Jason Lowe >Assignee: Peter Bacsko > Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt > > > The AM does not have any flow control to limit the incoming rate of events > from tasks. If the AM is unable to keep pace with the rate of incoming > events for a sufficient period of time then it will eventually exhaust the > heap and crash. MAPREDUCE-5043 addressed a major bottleneck for event > processing, but the AM could still get behind if it's starved for CPU and/or > handling a very large job with tens of thousands of active tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Assigned] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt
[ https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned MAPREDUCE-6984: - Assignee: Gergo Repas > MR AM to clean up temporary files from previous attempt > --- > > Key: MAPREDUCE-6984 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: applicationmaster >Affects Versions: 3.0.0-beta1 >Reporter: Gergo Repas >Assignee: Gergo Repas > Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch > > > When the MR AM restarts, the > {outputDir}/_temporary/{appAttemptNumber} directory > remains on HDFS, even though this directory is not used during the next > attempt if the restart has been done without recovery. So if recovery is not > used for the AM restart, then the deletion of this directory can be done > earlier (at the start of the next attempt). The benefit is that more free > HDFS space is available for the next attempt. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193551#comment-16193551 ] Haibo Chen commented on MAPREDUCE-6441: --- Thanks for the update, [~rchiang]. We probably should check the future objects returned by invoking all the callables to see if there are exceptions. Otherwise, the test won't fail even if some of the callables fail > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Ray Chiang > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, > MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, > MAPREDUCE-6441.009.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: 
INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same second. The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
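For what it's worth, the check suggested above can be as small as calling get() on every Future returned by invokeAll(): get() rethrows whatever the callable threw (wrapped in an ExecutionException), so a failed submission now fails the test. Sketch only; "tasks" stands for the callables the test already builds.
{code}
ExecutorService pool = Executors.newFixedThreadPool(tasks.size());
try {
  for (Future<Void> future : pool.invokeAll(tasks)) {
    future.get();   // surfaces any exception thrown inside the callable
  }
} finally {
  pool.shutdown();
}
{code}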
[jira] [Updated] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6892: -- Release Note: This adds some new job counters, the number of failed MAP/REDUCE tasks and the number of killed MAP/REDUCE tasks. > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Fix For: 3.0.0-beta1 > > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch, > MAPREDUCE-6892-006.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
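For readers of this release note, the new counters can be read off a completed Job roughly like this (sketch only; the failed-task JobCounter constants already exist, while the exact names of the killed-task constants are an assumption here):
{code}
Counters counters = job.getCounters();
long failedMaps    = counters.findCounter(JobCounter.NUM_FAILED_MAPS).getValue();
long failedReduces = counters.findCounter(JobCounter.NUM_FAILED_REDUCES).getValue();
// The killed-task counters are the ones this change introduces (names assumed):
long killedMaps    = counters.findCounter(JobCounter.NUM_KILLED_MAPS).getValue();
long killedReduces = counters.findCounter(JobCounter.NUM_KILLED_REDUCES).getValue();
{code}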
[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162181#comment-16162181 ] Haibo Chen commented on MAPREDUCE-6441: --- I see. Talking with Daniel, there is really no good way to do this. Can we then add a javadoc to this new method to explain the issue with the test? As suggested by Daniel, we should probably use Barrier (All threads wait on the barrier and get notified at the same time) which will give us better chance to reproduce this. > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Ray Chiang > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, > MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > 
org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same second. The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
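As a rough illustration of the barrier suggestion above (hypothetical names, not the actual test code), java.util.concurrent.CyclicBarrier releases all waiting threads at once, so the concurrent submissions start as close to simultaneously as possible and have the best chance of colliding on the temp directory.
{code}
import java.util.concurrent.CyclicBarrier;

public class BarrierStartSketch {
  // stand-in for the per-thread work, e.g. one local-mode job submission
  static void submitLocalJob() { /* ... */ }

  public static void main(String[] args) {
    final int workers = 4;
    final CyclicBarrier barrier = new CyclicBarrier(workers);
    for (int i = 0; i < workers; i++) {
      new Thread(() -> {
        try {
          barrier.await();   // all workers released together
          submitLocalJob();  // so the submissions race as tightly as possible
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      }).start();
    }
  }
}
{code}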
[jira] [Resolved] (MAPREDUCE-6955) remove unnecessary dependency from hadoop-mapreduce-client-app to hadoop-mapreduce-client-shuffle
[ https://issues.apache.org/jira/browse/MAPREDUCE-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen resolved MAPREDUCE-6955. --- Resolution: Not A Problem > remove unnecessary dependency from hadoop-mapreduce-client-app to > hadoop-mapreduce-client-shuffle > - > > Key: MAPREDUCE-6955 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6955 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Haibo Chen >Assignee: Haibo Chen > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-6955) remove unnecessary dependency from hadoop-mapreduce-client-app to hadoop-mapreduce-client-shuffle
Haibo Chen created MAPREDUCE-6955: - Summary: remove unnecessary dependency from hadoop-mapreduce-client-app to hadoop-mapreduce-client-shuffle Key: MAPREDUCE-6955 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6955 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Haibo Chen Assignee: Haibo Chen -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes
[ https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153860#comment-16153860 ] Haibo Chen commented on MAPREDUCE-6441: --- bq. but I haven't managed to get it to fail with the old code My understanding is that the new test is supposed to fail with the old code and the new change is supposed to fix the test failure. Otherwise, the new test is not testing any new behavior, right? > Improve temporary directory name generation in LocalDistributedCacheManager > for concurrent processes > > > Key: MAPREDUCE-6441 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: William Watson >Assignee: Ray Chiang > Attachments: HADOOP-10924.02.patch, > HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, > MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch > > > Kicking off many sqoop processes in different threads results in: > {code} > 2014-08-01 13:47:24 -0400: INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: > Encountered IOException running import job: java.io.IOException: > java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot > overwrite non empty destination directory > /tmp/hadoop-hadoop/mapred/local/1406915233073 > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > java.security.AccessController.doPrivileged(Native Method) > 2014-08-01 13:47:24 -0400: INFO -at > javax.security.auth.Subject.doAs(Subject.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.submit(Job.java:1282) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.run(Sqoop.java:145) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.runTool(Sqoop.java:220) > 2014-08-01 13:47:24 -0400: INFO -at > 
org.apache.sqoop.Sqoop.runTool(Sqoop.java:229) > 2014-08-01 13:47:24 -0400: INFO -at > org.apache.sqoop.Sqoop.main(Sqoop.java:238) > {code} > If two are kicked off in the same second. The issue is the following lines of > code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: > {code} > // Generating unique numbers for FSDownload. > AtomicLong uniqueNumberGenerator = >new AtomicLong(System.currentTimeMillis()); > {code} > and > {code} > Long.toString(uniqueNumberGenerator.incrementAndGet())), > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149756#comment-16149756 ] Haibo Chen commented on MAPREDUCE-6937: --- Ah. Thanks [~djp]! > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 2.9.0, 2.8.2, 2.7.5 > > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6937: -- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 2.9.0, 2.8.2, 2.7.5 > > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6937: -- Fix Version/s: 2.8.2 > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 2.9.0, 2.8.2, 2.7.5 > > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6937: -- Fix Version/s: 2.9.0 > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 2.9.0, 2.7.5 > > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149184#comment-16149184 ] Haibo Chen commented on MAPREDUCE-6937: --- The unit tests always fail with error message "hadoop-mapreduce-client-app: There was a timeout or other error in the fork" It timed out while running org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler (See https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7111/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_151.txt). But it does not seem related to this patch. The same issue existed in the Jenkins job testing the MAPREDUCE-6641 patch against branch-2.8 > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 2.7.5 > > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Assigned] (MAPREDUCE-4955) NM container diagnostics for excess resource usage can be lost if task fails while being killed
[ https://issues.apache.org/jira/browse/MAPREDUCE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned MAPREDUCE-4955: - Assignee: (was: Haibo Chen) > NM container diagnostics for excess resource usage can be lost if task fails > while being killed > > > Key: MAPREDUCE-4955 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4955 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mr-am >Affects Versions: 2.0.3-alpha, 0.23.5 >Reporter: Jason Lowe > > When a nodemanager kills a container for being over resource budgets, it > provides a diagnostics message for the container status explaining why it was > killed. However this message can be lost if the task fails during the > shutdown from the SIGTERM (e.g.: lost DFS leases because filesystem closed) > and notifies the AM via the task umbilical *before* the AM receives the NM's > container status message via the RM heartbeat. > In that case the task attempt fails with the task's failure diagnostic, and > the user is left wondering exactly why the task failed because the NM's > diagnostics arrive too late, are not written to the history file, and are > lost. If the AM receives the container status via the RM heartbeat before > the task fails during shutdown then the diagnostics are written properly to > the history file, and the user can see why the task failed. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Assigned] (MAPREDUCE-6636) Clean up argument parsing in mapred CLI
[ https://issues.apache.org/jira/browse/MAPREDUCE-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen reassigned MAPREDUCE-6636: - Assignee: (was: Haibo Chen) > Clean up argument parsing in mapred CLI > --- > > Key: MAPREDUCE-6636 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6636 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Reporter: Robert Kanter > > {{org.apache.hadoop.mapreduce.tools.CLI}} manually parses arguments, which > has a number of downsides including (a) requiring strict ordering of > arguments by the user and (b) it's hard for devs to add new arguments. > We should replace all of this with a CLI parsing library like > {{org.apache.commons.cli.CommandLineParser}}, which is used in a number of > other places already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Resolved] (MAPREDUCE-6949) yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen resolved MAPREDUCE-6949. --- Resolution: Duplicate > yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml > --- > > Key: MAPREDUCE-6949 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6949 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: documentation >Affects Versions: 3.0.0-alpha4 >Reporter: Haibo Chen >Assignee: Haibo Chen >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147977#comment-16147977 ] Haibo Chen commented on MAPREDUCE-6937: --- Never mind my comments on the patch for 2.9. It's still branch-2 at this point. > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 2.7.5 > > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-6949) yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml
Haibo Chen created MAPREDUCE-6949: - Summary: yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml Key: MAPREDUCE-6949 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6949 Project: Hadoop Map/Reduce Issue Type: Bug Components: documentation Affects Versions: 3.0.0-alpha4 Reporter: Haibo Chen Assignee: Haibo Chen Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6892: -- Resolution: Fixed Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) Fix Version/s: 3.0.0-beta1 Status: Resolved (was: Patch Available) Thanks [~pbacsko] for the contribution. I have committed the patch to trunk. > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Fix For: 3.0.0-beta1 > > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch, > MAPREDUCE-6892-006.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
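For context on why the missing counts matter, the hedged sketch below shows the kind of post-hoc check described in the issue: parsing a .jhist file with {{JobHistoryParser}} and reading the failure counter from the resulting {{JobInfo}}. The file path is a placeholder, and the exact set of getters available on {{JobInfo}} should be verified against the branch in use.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;

public class JhistFailureCountCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder: path to a finished job's .jhist file copied from the history directory.
    Path jhist = new Path(args[0]);
    Configuration conf = new Configuration();
    FileSystem fs = jhist.getFileSystem(conf);

    // Parse the history file and read the aggregated job-level counters.
    JobHistoryParser parser = new JobHistoryParser(fs, jhist);
    JobInfo info = parser.parse();

    // Before this fix, a job with failed map attempts could still report 0 here,
    // because JobUnsuccessfulCompletionEvent carried only the successful task counts.
    System.out.println("finished maps: " + info.getFinishedMaps());
    System.out.println("failed maps:   " + info.getFailedMaps());
  }
}
{code}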
[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6937: -- Fix Version/s: 2.7.5 > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 2.7.5 > > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147615#comment-16147615 ] Haibo Chen commented on MAPREDUCE-6892: --- +1 on the latest patch. Will commit it shortly. > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch, > MAPREDUCE-6892-006.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147567#comment-16147567 ] Haibo Chen commented on MAPREDUCE-6937: --- [~pbacsko] Can you also upload a patch for 2.9? > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147554#comment-16147554 ] Haibo Chen commented on MAPREDUCE-6937: --- +1 on the branch-2.7-v5 patch. I have filed MAPREDUCE-6948 for the unit test failure TestJobImpl.testUnusableNodeTransition. The checkstyle issues are legit; we could address them in MAPREDUCE-6939. > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6948: -- Description: *Error Message* expected: but was: *Stacktrace* java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615) *Standard out* {code} 2017-08-30 10:12:21,928 INFO [Thread-49] event.AsyncDispatcher (AsyncDispatcher.java:register(209)) - Registering class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler 2017-08-30 10:12:21,939 INFO [Thread-49] event.AsyncDispatcher (AsyncDispatcher.java:register(209)) - Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob 2017-08-30 10:12:21,940 INFO [Thread-49] event.AsyncDispatcher (AsyncDispatcher.java:register(209)) - Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf 2017-08-30 10:12:21,940 INFO [Thread-49] event.AsyncDispatcher (AsyncDispatcher.java:register(209)) - Registering class org.apache.hadoop.mapreduce.jobhistory.EventType for class org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf 2017-08-30 10:12:21,940 INFO [Thread-49] event.AsyncDispatcher (AsyncDispatcher.java:register(209)) - Registering class org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf 2017-08-30 10:12:21,941 INFO [Thread-49] impl.JobImpl (JobImpl.java:setup(1534)) - Adding job token for job_123456789_0001 to jobTokenSecretManager 2017-08-30 10:12:21,941 WARN [Thread-49] impl.JobImpl (JobImpl.java:setup(1540)) - Shuffle secret key missing from job credentials. Using job token secret as shuffle secret. 2017-08-30 10:12:21,944 INFO [Thread-49] impl.JobImpl (JobImpl.java:makeUberDecision(1305)) - Not uberizing job_123456789_0001 because: not enabled; 2017-08-30 10:12:21,944 INFO [Thread-49] impl.JobImpl (JobImpl.java:createMapTasks(1562)) - Input size for job job_123456789_0001 = 0. 
Number of splits = 2 2017-08-30 10:12:21,945 INFO [Thread-49] impl.JobImpl (JobImpl.java:createReduceTasks(1579)) - Number of reduces for job job_123456789_0001 = 1 2017-08-30 10:12:21,945 INFO [Thread-49] impl.JobImpl (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from NEW to INITED 2017-08-30 10:12:21,946 INFO [Thread-49] impl.JobImpl (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from INITED to SETUP 2017-08-30 10:12:21,954 INFO [CommitterEvent Processor #0] commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - Processing the event EventType: JOB_SETUP 2017-08-30 10:12:21,978 INFO [AsyncDispatcher event handler] impl.JobImpl (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from SETUP to RUNNING 2017-08-30 10:12:21,983 INFO [Thread-49] event.AsyncDispatcher (AsyncDispatcher.java:register(209)) - Registering class org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$5 2017-08-30 10:12:22,000 INFO [Thread-49] impl.JobImpl (JobImpl.java:transition(1953)) - Num completed Tasks: 1 2017-08-30 10:12:22,029 INFO [Thread-49] impl.JobImpl (JobImpl.java:transition(1953)) - Num completed Tasks: 2 2017-08-30 10:12:22,032 INFO [Thread-49] impl.JobImpl (JobImpl.java:actOnUnusableNode(1354)) - TaskAttempt killed because it ran on unusable node Mock for NodeId, hashCode: 1280187896. AttemptId:attempt_123456789_0001_m_00_0 2017-08-30 10:12:22,032 INFO [Thread-49] impl.JobImpl (JobImpl.java:transition(1953)) - Num completed Tasks: 3 2017-08-30 10:12:22,032 INFO [Thread-49] impl.JobImpl (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from RUNNING to COMMITTING 2017-08-30 10:12:22,032 ERROR [Thread-49] impl.JobImpl (JobImpl.java:handle(1009)) - Can't handle this event at current state org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at COMMITTING at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.a
[jira] [Created] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed
Haibo Chen created MAPREDUCE-6948: - Summary: TestJobImpl.testUnusableNodeTransition failed Key: MAPREDUCE-6948 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6948 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0-alpha4 Reporter: Haibo Chen *Error Message* expected: but was: *Stacktrace* java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041) at org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147536#comment-16147536 ] Haibo Chen commented on MAPREDUCE-6937: --- branch-2.8 jenkins job failed due to time out. I have retriggered the job. > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6870-branch-2.01.patch, > MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, > MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, > MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, > MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, > MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6892: -- Tags: (was: incom) > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6892: -- Tags: incom Hadoop Flags: Incompatible change > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143964#comment-16143964 ] Haibo Chen commented on MAPREDUCE-6892: --- Thanks for the update [~pbacsko]! The patch is very close to being ready. But I still think we should give -1 as the default value to stand for unknown in the schema (Events.avpr). In TestJobHistoryParsing.testHistoryParsingForKilledAndFailedAttempts(), let's use // for the comments you've added (/**/ is for class/methods) and remove the nested for loop to assert RACK NAME for each of the tasks (I don't think they are relevant). Otherwise, the patch LGTM. > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143883#comment-16143883 ] Haibo Chen commented on MAPREDUCE-6937: --- Seems like the patch is not picked up by Jenkins. Can you also rename the patch in the format of [jira name]-[branch name].[patch version].patch, e.g. MAPREDUCE-6870-branch-2.7.02.patch? > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Erik Krogen > Attachments: MAPREDUCE-6870_branch2.7.patch, > MAPREDUCE-6870_branch2.7v2.patch, MAPREDUCE-6870_branch2.8.patch, > MAPREDUCE-6870_branch2.8v2.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
[ https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143857#comment-16143857 ] Haibo Chen commented on MAPREDUCE-6937: --- [~pbacsko], the default value for mapreduce.job.finish-when-all-reducers-done should be false to preserve the current behavior in branch-2.7 and 2.8. > Backport MAPREDUCE-6870 to branch-2 while preserving compatibility > -- > > Key: MAPREDUCE-6937 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Zhe Zhang >Assignee: Erik Krogen > Attachments: MAPREDUCE-6870_branch2.7.patch, > MAPREDUCE-6870_branch2.8.patch > > > To maintain compatibility we need to disable this by default per discussion > on MAPREDUCE-6870. > Using a separate JIRA to correctly track incompatibilities. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
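To make the compatibility point above concrete, here is a minimal hedged sketch of how a job would opt in on branches where the backport ships disabled by default. The property name is taken from the comment above; the class name and the use of the plain boolean setter are illustrative assumptions, not the prescribed way to enable it.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class OptInFinishWhenReducersDone {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed default on branch-2.7/2.8 after the backport: false (preserve old behavior).
    // A job that wants the MAPREDUCE-6870 behavior opts in explicitly:
    conf.setBoolean("mapreduce.job.finish-when-all-reducers-done", true);

    Job job = Job.getInstance(conf, "opt-in example");
    // ... configure mapper/reducer classes and input/output paths, then submit as usual:
    // job.waitForCompletion(true);

    // Confirm the flag made it into the job configuration.
    System.out.println(job.getConfiguration()
        .getBoolean("mapreduce.job.finish-when-all-reducers-done", false));
  }
}
{code}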
[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140481#comment-16140481 ] Haibo Chen commented on MAPREDUCE-6892: --- Thanks @Peter for the update. I'll wait for you to fix the test failure before I review. I do think the default value for killed/failed should be -1 in the schema. -1 means not available/known, IIUC. In our case, the killed and failed counters were not recorded previously; thus I think -1 makes more sense. > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
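A small hedged sketch of the convention argued for above: with -1 as the schema default, consumers can distinguish "not recorded in an older jhist file" from a genuine count of zero. The helper class and method names are made up for illustration only.
{code}
// Illustrative only: how a consumer of the parsed counters might treat the
// proposed -1 default. Older jhist files written before the new fields existed
// would parse to -1 ("unknown") rather than a misleading 0.
public final class FailureCountFormat {
  private FailureCountFormat() {}

  public static String describe(long count) {
    return count < 0 ? "n/a (not recorded)" : Long.toString(count);
  }

  public static void main(String[] args) {
    System.out.println(describe(-1)); // old file, counter absent -> "n/a (not recorded)"
    System.out.println(describe(0));  // new file, genuinely zero  -> "0"
  }
}
{code}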
[jira] [Commented] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common
[ https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129637#comment-16129637 ] Haibo Chen commented on MAPREDUCE-6936: --- Thanks [~rkanter] for the review! > Remove unnecessary dependency of hadoop-yarn-server-common from > hadoop-mapreduce-client-common > --- > > Key: MAPREDUCE-6936 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Reporter: Haibo Chen >Assignee: Haibo Chen > Fix For: 2.9.0, 3.0.0-beta1 > > Attachments: MAPREDUCE-6936.00.patch > > > The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common > seems unnecessary, as > it does not use any of the classes from hadoop-yarn-server-common. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6870: -- Hadoop Flags: Incompatible change,Reviewed (was: Reviewed) > Add configuration for MR job to finish when all reducers are complete (even > with unfinished mappers) > > > Key: MAPREDUCE-6870 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.6.1 >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 3.0.0-beta1 > > Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, > MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, > MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch > > > Even with MAPREDUCE-5817, there could still be cases where mappers get > scheduled before all reducers are complete, but those mappers run for long > time, even after all reducers are complete. This could hurt the performance > of large MR jobs. > In some cases, mappers don't have any materialize-able outcome other than > providing intermediate data to reducers. In that case, the job owner should > have the config option to finish the job once all reducers are complete. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124114#comment-16124114 ] Haibo Chen commented on MAPREDUCE-6870: --- Agreed. > Add configuration for MR job to finish when all reducers are complete (even > with unfinished mappers) > > > Key: MAPREDUCE-6870 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.6.1 >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 3.0.0-beta1 > > Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, > MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, > MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch > > > Even with MAPREDUCE-5817, there could still be cases where mappers get > scheduled before all reducers are complete, but those mappers run for long > time, even after all reducers are complete. This could hurt the performance > of large MR jobs. > In some cases, mappers don't have any materialize-able outcome other than > providing intermediate data to reducers. In that case, the job owner should > have the config option to finish the job once all reducers are complete. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123984#comment-16123984 ] Haibo Chen commented on MAPREDUCE-6870: --- In some cases, the single mapper can take hours to finish, thus delaying job completion by hours. We definitely want to default to false in 2.x for compatibility. For trunk, I think it is a good opportunity to fix it as an incompatible change, unless folks think strongly otherwise. IMO, it's better to fail the niche case in order to not confuse average users. > Add configuration for MR job to finish when all reducers are complete (even > with unfinished mappers) > > > Key: MAPREDUCE-6870 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.6.1 >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 3.0.0-beta1 > > Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, > MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, > MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch > > > Even with MAPREDUCE-5817, there could still be cases where mappers get > scheduled before all reducers are complete, but those mappers run for long > time, even after all reducers are complete. This could hurt the performance > of large MR jobs. > In some cases, mappers don't have any materialize-able outcome other than > providing intermediate data to reducers. In that case, the job owner should > have the config option to finish the job once all reducers are complete. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123624#comment-16123624 ] Haibo Chen commented on MAPREDUCE-6870: --- Thanks [~xkrogen] for pointing out the incompatibility! I agree. Maybe we even need to flip the default to false in order to avoid the incompatibility, since the case this is trying to fix is an edge case to begin with. > Add configuration for MR job to finish when all reducers are complete (even > with unfinished mappers) > > > Key: MAPREDUCE-6870 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 2.6.1 >Reporter: Zhe Zhang >Assignee: Peter Bacsko > Fix For: 3.0.0-beta1 > > Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, > MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, > MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch > > > Even with MAPREDUCE-5817, there could still be cases where mappers get > scheduled before all reducers are complete, but those mappers run for long > time, even after all reducers are complete. This could hurt the performance > of large MR jobs. > In some cases, mappers don't have any materialize-able outcome other than > providing intermediate data to reducers. In that case, the job owner should > have the config option to finish the job once all reducers are complete. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123608#comment-16123608 ] Haibo Chen commented on MAPREDUCE-6892: --- Thanks [~pbacsko] for updating the patch! A few more comments: 1) In JobFinishedEvent, can we add code in getDatum() and setDatum() to handle the newly added fields? 2) In UnparsedJob, let's return -1 to be consistent with getCompletedMaps() and getCompletedReduce(). Similarly for PartialJob, let's also return -1 to indicate the info is not available. 3) JobImpl.getCompletedMaps() returns successMap + killedMap + FailedMap, whereas CompletedJob.getCompletedMaps() returns only successMap. Let's do the same in CompletedJob.getCompletedMaps() as well as in CompletedJob.getCompletedReduce(). 4) In Job20LineHistoryEventEmitter, how much work is it to also parse the failed/killed map/reducer counters (I am not familiar with this code)? I am OK with leaving it if it is too much. 5) Not an issue with this patch, but let's also set killed/failed counters in JobHistoryParser.handleJobFinishedEvent(). 6) CompletedJob.getKillReduces() should return (int) jobInfo.getKilledReduces(); 7) Rename JobSummary.getNumFinishedMaps() to getNumSucceededMaps(). Also, let's add summary.setNumKilled[Map/Reduce] in TestJobSummary.before() as well. Can you look into the test failure and fix it if possible? > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Issue Comment Deleted] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haibo Chen updated MAPREDUCE-6892: -- Comment: was deleted (was: Thanks [~pbacsko] for updating the patch! A few more comments: 1) In JobFinishedEvent, can we add code in getDatum and setDatum() to handle the newly added fields? 2) In UnparsedJob, let return -1 to be consistent with getCompleted) > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file
[ https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123520#comment-16123520 ] Haibo Chen commented on MAPREDUCE-6892: --- Thanks [~pbacsko] for updating the patch! A few more comments: 1) In JobFinishedEvent, can we add code in getDatum and setDatum() to handle the newly added fields? 2) In UnparsedJob, let return -1 to be consistent with getCompleted > Issues with the count of failed/killed tasks in the jhist file > -- > > Key: MAPREDUCE-6892 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client, jobhistoryserver >Reporter: Peter Bacsko >Assignee: Peter Bacsko > Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, > MAPREDUCE-6892-003.patch > > > Recently we encountered some issues with the value of failed tasks. After > parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually > there were failures. > Another minor thing is that you cannot get the number of killed tasks > (although this can be calculated). > The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the > successful map/reduce task counts. Number of failed (or killed) tasks are not > stored. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org