[jira] [Updated] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS

2018-10-23 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-4669:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> MRAM web UI does not work with HTTPS
> 
>
> Key: MAPREDUCE-4669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Robert Kanter
>Priority: Major
> Attachments: MAPREDUCE-4669.001.patch, MAPREDUCE-4669.002.patch, 
> MAPREDUCE-4669.003.patch, MAPREDUCE-4669.004.patch
>
>
> With Kerberos enabled, the MRAM runs as the user that submitted the job, thus 
> the MRAM process cannot read the cluster keystore files to get the 
> certificates to start its HttpServer using HTTPS.
> We need to decouple the keystore used by the RM/NM/NN/DN (which is 
> cluster-provided) from the keystore used by AMs (which ought to be 
> user-provided).
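
For context, the decoupling described above amounts to the AM loading a user-supplied keystore (readable by the job-submitting user it runs as) instead of the cluster's keystore when it starts its HTTPS endpoint. A minimal sketch using plain JDK APIs, not the actual MR AM/WebApps code; the keystore path and password below are placeholders:

{code:java}
import com.sun.net.httpserver.HttpsConfigurator;
import com.sun.net.httpserver.HttpsServer;

import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.security.KeyStore;

public class UserKeystoreHttpsSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder path/password: a keystore owned by the submitting user,
    // not the cluster keystore that the RM/NM/NN/DN read.
    char[] password = "changeit".toCharArray();
    KeyStore keyStore = KeyStore.getInstance("JKS");
    try (FileInputStream in = new FileInputStream("/home/user/am-keystore.jks")) {
      keyStore.load(in, password);
    }

    KeyManagerFactory kmf =
        KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
    kmf.init(keyStore, password);

    SSLContext sslContext = SSLContext.getInstance("TLS");
    sslContext.init(kmf.getKeyManagers(), null, null);

    // Serve the AM web UI over HTTPS with the user's certificate.
    HttpsServer server = HttpsServer.create(new InetSocketAddress(0), 0);
    server.setHttpsConfigurator(new HttpsConfigurator(sslContext));
    server.createContext("/", exchange -> {
      byte[] body = "MRAM web UI over HTTPS".getBytes();
      exchange.sendResponseHeaders(200, body.length);
      exchange.getResponseBody().write(body);
      exchange.close();
    });
    server.start();
  }
}
{code}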






[jira] [Updated] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS

2018-10-23 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-4669:
--
Fix Version/s: 3.3.0

> MRAM web UI does not work with HTTPS
> 
>
> Key: MAPREDUCE-4669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Robert Kanter
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: MAPREDUCE-4669.001.patch, MAPREDUCE-4669.002.patch, 
> MAPREDUCE-4669.003.patch, MAPREDUCE-4669.004.patch
>
>
> With Kerberos enabled, the MRAM runs as the user that submitted the job, thus 
> the MRAM process cannot read the cluster keystore files to get the 
> certificates to start its HttpServer using HTTPS.
> We need to decouple the keystore used by the RM/NM/NN/DN (which is 
> cluster-provided) from the keystore used by AMs (which ought to be 
> user-provided).






[jira] [Commented] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS

2018-10-23 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661391#comment-16661391
 ] 

Haibo Chen commented on MAPREDUCE-4669:
---

+1 on the latest patch. Checking it in shortly.

> MRAM web UI does not work with HTTPS
> 
>
> Key: MAPREDUCE-4669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Robert Kanter
>Priority: Major
> Attachments: MAPREDUCE-4669.001.patch, MAPREDUCE-4669.002.patch, 
> MAPREDUCE-4669.003.patch, MAPREDUCE-4669.004.patch
>
>
> With Kerberos enabled, the MRAM runs as the user that submitted the job, thus 
> the MRAM process cannot read the cluster keystore files to get the 
> certificates to start its HttpServer using HTTPS.
> We need to decouple the keystore used by the RM/NM/NN/DN (which is 
> cluster-provided) from the keystore used by AMs (which ought to be 
> user-provided).






[jira] [Commented] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS

2018-10-19 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657337#comment-16657337
 ] 

Haibo Chen commented on MAPREDUCE-4669:
---

Thanks [~rkanter] for the patch! I have a few comments/questions.

1) Can we rename withNeedsClientAuth() to withClientAuth(boolean)?

2) In WebApps, I think we can skip the truststore part if clientAuth is false 
(see the sketch after these comments).

3) Can we add another unit test that queries the AM web server with an 
untrusted certificate when client auth is on?
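
A rough sketch of what comments 1 and 2 could look like together; this is a hypothetical builder for illustration only, not the actual WebApps code from the patch:

{code:java}
// Hypothetical names for illustration only.
public class HttpServerBuilderSketch {
  private boolean clientAuth;
  private String keyStorePath;
  private String trustStorePath;

  // Comment 1: take a boolean instead of a no-arg withNeedsClientAuth().
  public HttpServerBuilderSketch withClientAuth(boolean clientAuth) {
    this.clientAuth = clientAuth;
    return this;
  }

  public HttpServerBuilderSketch withKeyStore(String path) {
    this.keyStorePath = path;
    return this;
  }

  public HttpServerBuilderSketch withTrustStore(String path) {
    this.trustStorePath = path;
    return this;
  }

  public void build() {
    // ... load keyStorePath and configure the server certificate ...
    // Comment 2: only wire in the truststore when client auth is requested.
    if (clientAuth && trustStorePath != null) {
      // ... load trustStorePath and require client certificates ...
    }
  }
}
{code}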

> MRAM web UI does not work with HTTPS
> 
>
> Key: MAPREDUCE-4669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Robert Kanter
>Priority: Major
> Attachments: MAPREDUCE-4669.001.patch, MAPREDUCE-4669.002.patch, 
> MAPREDUCE-4669.003.patch
>
>
> With Kerberos enabled, the MRAM runs as the user that submitted the job, thus 
> the MRAM process cannot read the cluster keystore files to get the 
> certificates to start its HttpServer using HTTPS.
> We need to decouple the keystore used by the RM/NM/NN/DN (which is 
> cluster-provided) from the keystore used by AMs (which ought to be 
> user-provided).






[jira] [Updated] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory

2018-10-16 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-7150:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.3.0
   Status: Resolved  (was: Patch Available)

Thanks [~mi...@cloudera.com] for the contribution and [~pbacsko] for additional 
reviews!

> Optimize collections used by MR JHS to reduce its memory
> 
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver, mrv2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, 
> YARN-8872.03.patch, YARN-8872.04.patch, jhs-bad-collections.png
>
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a 
> big heap in a large cluster, handling large MapReduce jobs. The heap is large 
> (over 32GB) and 21.4% of it is wasted on various suboptimal Java 
> collections, mostly maps and lists that are either empty or contain only one 
> element. In such under-populated collections, a considerable amount of memory 
> is still used by just the internal implementation objects. See the attached 
> excerpt from the jxray report for details. If certain collections are 
> almost always empty, they should be initialized lazily. If others almost 
> always have just 1 or 2 elements, they should be initialized with an 
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for 
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with  capacity 2, given most tasks 
> only have one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
> contains one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to 
> use the more wasteful LinkedList here) and initialize with capacity 1.
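
The lazy-initialization and right-sizing suggestions above come down to two small patterns; a minimal illustrative sketch (generic field and method names, not the exact Hadoop classes):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TaskRecordSketch {
  // Almost always empty: leave it null and create it on first use.
  private Map<String, Long> counters;

  // Almost always a single element: size for 1 instead of ArrayList's default 10.
  private final List<String> diagnostics = new ArrayList<>(1);

  synchronized long incrCounter(String name) {
    if (counters == null) {
      counters = new HashMap<>(2);   // small initial capacity instead of 16
    }
    return counters.merge(name, 1L, Long::sum);
  }

  void addDiagnostic(String message) {
    diagnostics.add(message);
  }
}
{code}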






[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory

2018-10-16 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16652407#comment-16652407
 ] 

Haibo Chen commented on MAPREDUCE-7150:
---

+1 on the latest patch. Will check it in shortly.

> Optimize collections used by MR JHS to reduce its memory
> 
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver, mrv2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, 
> YARN-8872.03.patch, YARN-8872.04.patch, jhs-bad-collections.png
>
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a 
> big heap in a large cluster, handling large MapReduce jobs. The heap is large 
> (over 32GB) and 21.4% of it is wasted on various suboptimal Java 
> collections, mostly maps and lists that are either empty or contain only one 
> element. In such under-populated collections, a considerable amount of memory 
> is still used by just the internal implementation objects. See the attached 
> excerpt from the jxray report for details. If certain collections are 
> almost always empty, they should be initialized lazily. If others almost 
> always have just 1 or 2 elements, they should be initialized with an 
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for 
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with  capacity 2, given most tasks 
> only have one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
> contains one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to 
> use the more wasteful LinkedList here) and initialize with capacity 1.






[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory

2018-10-12 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648621#comment-16648621
 ] 

Haibo Chen commented on MAPREDUCE-7150:
---

Thanks for the detailed explanation, [~mi...@cloudera.com]. That makes a lot 
of sense, and I agree with you that it is correct without the synchronized fix. 
It's more of a convention to address the warning as much as possible, so that 
the findbugs warning won't show up in every Jenkins build.

The Jenkins job might be confused because the Jira has now been moved to 
MapReduce. I have manually kicked it off.

 

> Optimize collections used by MR JHS to reduce its memory
> 
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver, mrv2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, 
> YARN-8872.03.patch, jhs-bad-collections.png
>
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a 
> big heap in a large cluster, handling large MapReduce jobs. The heap is large 
> (over 32GB) and 21.4% of it is wasted on various suboptimal Java 
> collections, mostly maps and lists that are either empty or contain only one 
> element. In such under-populated collections, a considerable amount of memory 
> is still used by just the internal implementation objects. See the attached 
> excerpt from the jxray report for details. If certain collections are 
> almost always empty, they should be initialized lazily. If others almost 
> always have just 1 or 2 elements, they should be initialized with an 
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for 
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with  capacity 2, given most tasks 
> only have one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
> contains one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to 
> use the more wasteful LinkedList here) and initialize with capacity 1.






[jira] [Commented] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory

2018-10-12 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16648559#comment-16648559
 ] 

Haibo Chen commented on MAPREDUCE-7150:
---

Thanks [~mi...@cloudera.com] for the patch!
{quote}The {{map}} object is created lazily in the synchronized method 
{{findCounter()}}, so according to the Java Memory Model, once it's created, 
it's visible to all the code, both synchronized and unsynchronized.
{quote}
I'm not a JMM expert, but doesn't the reader always need a read barrier to see 
the latest value of a variable? Is there something special that a synchronized 
block does?

Regardless, let's add synchronized to the write(DataOutput) method too to fix 
the findbugs warning.
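
For reference, the warning in question comes from inconsistent synchronization: the lazily created field is written under the lock in findCounter() but read without it in write(). A minimal sketch of the shape of the fix being discussed (illustrative names, not the real FileSystemCounterGroup code):

{code:java}
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class CounterGroupSketch {
  private Map<String, Long> map;    // created lazily

  synchronized long findCounter(String name) {
    if (map == null) {
      map = new HashMap<>();        // lazy creation under the lock
    }
    return map.computeIfAbsent(name, k -> 0L);
  }

  // Also synchronized, so every access to 'map' happens under the same lock
  // and the inconsistent-synchronization warning no longer applies.
  synchronized void write(DataOutput out) throws IOException {
    out.writeInt(map == null ? 0 : map.size());
  }
}
{code}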

> Optimize collections used by MR JHS to reduce its memory
> 
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver, mrv2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, 
> jhs-bad-collections.png
>
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a 
> big heap in a large cluster, handling large MapReduce jobs. The heap is large 
> (over 32GB) and 21.4% of it is wasted on various suboptimal Java 
> collections, mostly maps and lists that are either empty or contain only one 
> element. In such under-populated collections, a considerable amount of memory 
> is still used by just the internal implementation objects. See the attached 
> excerpt from the jxray report for details. If certain collections are 
> almost always empty, they should be initialized lazily. If others almost 
> always have just 1 or 2 elements, they should be initialized with an 
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for 
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with  capacity 2, given most tasks 
> only have one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
> contains one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to 
> use the more wasteful LinkedList here) and initialize with capacity 1.






[jira] [Updated] (MAPREDUCE-7150) Optimize collections used by MR JHS to reduce its memory

2018-10-12 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-7150:
--
Summary: Optimize collections used by MR JHS to reduce its memory  (was: 
Optimize collections used by Yarn JHS to reduce its memory)

> Optimize collections used by MR JHS to reduce its memory
> 
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver, mrv2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, 
> jhs-bad-collections.png
>
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a 
> big heap in a large cluster, handling large MapReduce jobs. The heap is large 
> (over 32GB) and 21.4% of it is wasted on various suboptimal Java 
> collections, mostly maps and lists that are either empty or contain only one 
> element. In such under-populated collections, a considerable amount of memory 
> is still used by just the internal implementation objects. See the attached 
> excerpt from the jxray report for details. If certain collections are 
> almost always empty, they should be initialized lazily. If others almost 
> always have just 1 or 2 elements, they should be initialized with an 
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for 
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with  capacity 2, given most tasks 
> only have one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
> contains one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to 
> use the more wasteful LinkedList here) and initialize with capacity 1.






[jira] [Assigned] (MAPREDUCE-7150) Optimize collections used by Yarn JHS to reduce its memory

2018-10-12 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-7150:
-

Assignee: Misha Dmitriev

> Optimize collections used by Yarn JHS to reduce its memory
> --
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver, mrv2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, 
> jhs-bad-collections.png
>
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a 
> big heap in a large cluster, handling large MapReduce jobs. The heap is large 
> (over 32GB) and 21.4% of it is wasted on various suboptimal Java 
> collections, mostly maps and lists that are either empty or contain only one 
> element. In such under-populated collections, a considerable amount of memory 
> is still used by just the internal implementation objects. See the attached 
> excerpt from the jxray report for details. If certain collections are 
> almost always empty, they should be initialized lazily. If others almost 
> always have just 1 or 2 elements, they should be initialized with an 
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for 
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with  capacity 2, given most tasks 
> only have one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
> contains one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to 
> use the more wasteful LinkedList here) and initialize with capacity 1.






[jira] [Assigned] (MAPREDUCE-7150) Optimize collections used by Yarn JHS to reduce its memory

2018-10-12 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-7150:
-

   Assignee: (was: Misha Dmitriev)
Component/s: (was: yarn)
 mrv2
 jobhistoryserver
Key: MAPREDUCE-7150  (was: YARN-8872)
Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> Optimize collections used by Yarn JHS to reduce its memory
> --
>
> Key: MAPREDUCE-7150
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7150
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver, mrv2
>Reporter: Misha Dmitriev
>Priority: Major
> Attachments: YARN-8872.01.patch, YARN-8872.02.patch, 
> jhs-bad-collections.png
>
>
> We analyzed, using jxray (www.jxray.com), a heap dump of a JHS running with a 
> big heap in a large cluster, handling large MapReduce jobs. The heap is large 
> (over 32GB) and 21.4% of it is wasted on various suboptimal Java 
> collections, mostly maps and lists that are either empty or contain only one 
> element. In such under-populated collections, a considerable amount of memory 
> is still used by just the internal implementation objects. See the attached 
> excerpt from the jxray report for details. If certain collections are 
> almost always empty, they should be initialized lazily. If others almost 
> always have just 1 or 2 elements, they should be initialized with an 
> appropriate initial capacity of 1 or 2 (the default capacity is 16 for 
> HashMap and 10 for ArrayList).
> Based on the attached report, we should do the following:
>  # {{FileSystemCounterGroup.map}} - initialize lazily
>  # {{CompletedTask.attempts}} - initialize with  capacity 2, given most tasks 
> only have one or two attempts
>  # {{JobHistoryParser$TaskInfo.attemptsMap}} - initialize with capacity
>  # {{CompletedTaskAttempt.diagnostics}} - initialize with capacity 1 since it 
> contains one diagnostic message most of the time
>  # {{CompletedTask.reportDiagnostics}} - switch to ArrayList (no reason to 
> use the more wasteful LinkedList here) and initialize with capacity 1.






[jira] [Updated] (MAPREDUCE-6861) Add metrics tags for ShuffleClientMetrics

2018-08-27 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6861:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~zsiegl] for your contribution! I have checked in the patch to trunk.

> Add metrics tags for ShuffleClientMetrics
> -
>
> Key: MAPREDUCE-6861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Akira Ajisaka
>Assignee: Zoltan Siegl
>Priority: Major
>  Labels: newbie++
> Fix For: 3.2.0
>
> Attachments: MAPREDUCE-6861.00.patch, MAPREDUCE-6861.002.patch, 
> MAPREDUCE-6861.01.patch
>
>
> Metrics tags were unintentionally removed by MAPREDUCE-6526. Let's add them 
> back.






[jira] [Commented] (MAPREDUCE-6861) Add metrics tags for ShuffleClientMetrics

2018-08-27 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593716#comment-16593716
 ] 

Haibo Chen commented on MAPREDUCE-6861:
---

+1 on the latest 02 patch, pending Jenkins.

> Add metrics tags for ShuffleClientMetrics
> -
>
> Key: MAPREDUCE-6861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Akira Ajisaka
>Assignee: Zoltan Siegl
>Priority: Major
>  Labels: newbie++
> Attachments: MAPREDUCE-6861.00.patch, MAPREDUCE-6861.002.patch, 
> MAPREDUCE-6861.01.patch
>
>
> Metrics tags were unintentionally removed by MAPREDUCE-6526. Let's add them 
> back.






[jira] [Commented] (MAPREDUCE-6861) Add metrics tags for ShuffleClientMetrics

2018-08-27 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16593651#comment-16593651
 ] 

Haibo Chen commented on MAPREDUCE-6861:
---

Thanks [~zsiegl] for the patch. It looks good to me, with a few minor comments.

1) The description of the RECORD_INFO can be "Metrics for Shuffle client" 
instead of "Metrics for Shuffle Plugin"

2) TestShuffleClientMetrics.testCreate() can probably be renamed to 
testShuffleMetricsTag() or the like.

3) "shuffleClientMetrics.threadBusy();" is not necessary as part of the act 
step in the unit test given what the test is doing.

> Add metrics tags for ShuffleClientMetrics
> -
>
> Key: MAPREDUCE-6861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Akira Ajisaka
>Assignee: Zoltan Siegl
>Priority: Major
>  Labels: newbie++
> Attachments: MAPREDUCE-6861.00.patch, MAPREDUCE-6861.01.patch
>
>
> Metrics tags were unintentionally removed by MAPREDUCE-6526. Let's add them 
> back.






[jira] [Assigned] (MAPREDUCE-6861) Add metrics tags for ShuffleClientMetrics

2018-08-24 Thread Haibo Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-6861:
-

Assignee: Zoltan Siegl

> Add metrics tags for ShuffleClientMetrics
> -
>
> Key: MAPREDUCE-6861
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6861
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha1
>Reporter: Akira Ajisaka
>Assignee: Zoltan Siegl
>Priority: Major
>  Labels: newbie++
>
> Metrics tags were unintentionally removed by MAPREDUCE-6526. Let's add them 
> back.






[jira] [Commented] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed

2018-07-17 Thread Haibo Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546895#comment-16546895
 ] 

Haibo Chen commented on MAPREDUCE-6948:
---

I am okay with closing this for now.

> TestJobImpl.testUnusableNodeTransition failed
> -
>
> Key: MAPREDUCE-6948
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6948
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Jim Brennan
>Priority: Major
>  Labels: unit-test
>
> *Error Message*
> expected: but was:
> *Stacktrace*
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615)
> *Standard out*
> {code}
> 2017-08-30 10:12:21,928 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
> 2017-08-30 10:12:21,939 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.jobhistory.EventType for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,941 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:setup(1534)) - Adding job token for job_123456789_0001 to 
> jobTokenSecretManager
> 2017-08-30 10:12:21,941 WARN  [Thread-49] impl.JobImpl 
> (JobImpl.java:setup(1540)) - Shuffle secret key missing from job credentials. 
> Using job token secret as shuffle secret.
> 2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:makeUberDecision(1305)) - Not uberizing job_123456789_0001 
> because: not enabled;
> 2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:createMapTasks(1562)) - Input size for job 
> job_123456789_0001 = 0. Number of splits = 2
> 2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:createReduceTasks(1579)) - Number of reduces for job 
> job_123456789_0001 = 1
> 2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from NEW 
> to INITED
> 2017-08-30 10:12:21,946 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
> INITED to SETUP
> 2017-08-30 10:12:21,954 INFO  [CommitterEvent Processor #0] 
> commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - 
> Processing the event EventType: JOB_SETUP
> 2017-08-30 10:12:21,978 INFO  [AsyncDispatcher event handler] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
> SETUP to RUNNING
> 2017-08-30 10:12:21,983 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$5
> 2017-08-30 10:12:22,000 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:transition(1953)) - Num completed Tasks: 1
> 2017-08-30 10:12:22,029 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:transition(1953)) - Num completed Tasks: 2
> 2017-08-30 10:12:22,032 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:actOnUnusableNode(1354)) - TaskAttempt killed because it ran on 
> unusable node Mock for NodeId, hashCode: 1280187896. 
> AttemptId:attempt_123456789_0001_m_00_0
> 2017-08-30 10:12:22,032 INFO  [Thread-49] impl.JobImpl 
> (JobIm

[jira] [Commented] (MAPREDUCE-7095) Race conditions in closing FadvisedChunkedFile

2018-05-10 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470841#comment-16470841
 ] 

Haibo Chen commented on MAPREDUCE-7095:
---

Thanks [~miklos.szeg...@cloudera.com] for the fix. I have committed the patch 
to trunk.
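
The double-close race described in the quoted report below is typically handled by making close() idempotent; a minimal illustrative guard, not the actual FadvisedChunkedFile patch:

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;

class GuardedCloseSketch {
  private final AtomicBoolean closed = new AtomicBoolean(false);

  public void close() {
    // Only the first caller wins the compare-and-set; later closes from other
    // threads return quietly instead of calling fadvise on a closed descriptor.
    if (!closed.compareAndSet(false, true)) {
      return;
    }
    // ... manage the OS cache and release the file descriptor exactly once ...
  }
}
{code}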

> Race conditions in closing FadvisedChunkedFile 
> ---
>
> Key: MAPREDUCE-7095
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8090.000.patch, YARN-8090.001.patch, 
> YARN-8090.002.patch, YARN-8090.003.patch
>
>
> When a file is closed multiple times by multiple threads, all but the first 
> close will generate a WARNING message.
> {code:java}
> 11:04:33.605 AM   WARNFadvisedChunkedFile 
> Failed to manage OS cache for 
> /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out
> EBADF: Bad file descriptor
>   at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
> Method)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
>   at 
> org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>   at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.Th

[jira] [Updated] (MAPREDUCE-7095) Race conditions in closing FadvisedChunkedFile

2018-05-10 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-7095:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.2.0
   Status: Resolved  (was: Patch Available)

> Race conditions in closing FadvisedChunkedFile 
> ---
>
> Key: MAPREDUCE-7095
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Fix For: 3.2.0
>
> Attachments: YARN-8090.000.patch, YARN-8090.001.patch, 
> YARN-8090.002.patch, YARN-8090.003.patch
>
>
> When a file is closed multiple times by multiple threads, all but the first 
> close will generate a WARNING message.
> {code:java}
> 11:04:33.605 AM   WARNFadvisedChunkedFile 
> Failed to manage OS cache for 
> /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out
> EBADF: Bad file descriptor
>   at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
> Method)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
>   at 
> org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>   at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$

[jira] [Updated] (MAPREDUCE-7095) Race conditions in closing FadvisedChunkedFile

2018-05-10 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-7095:
--
Summary: Race conditions in closing FadvisedChunkedFile   (was: Race 
conditions in FadvisedChunkedFile)

> Race conditions in closing FadvisedChunkedFile 
> ---
>
> Key: MAPREDUCE-7095
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-8090.000.patch, YARN-8090.001.patch, 
> YARN-8090.002.patch, YARN-8090.003.patch
>
>
> When a file is closed multiple times by multiple threads, all but the first 
> close will generate a WARNING message.
> {code:java}
> 11:04:33.605 AM   WARNFadvisedChunkedFile 
> Failed to manage OS cache for 
> /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out
> EBADF: Bad file descriptor
>   at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
> Method)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
>   at 
> org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>   at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at j

[jira] [Commented] (MAPREDUCE-7095) Race conditions in FadvisedChunkedFile

2018-05-10 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470818#comment-16470818
 ] 

Haibo Chen commented on MAPREDUCE-7095:
---

+1. Checking this in shortly.

> Race conditions in FadvisedChunkedFile
> --
>
> Key: MAPREDUCE-7095
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-8090.000.patch, YARN-8090.001.patch, 
> YARN-8090.002.patch, YARN-8090.003.patch
>
>
> When a file is closed multiple times by multiple threads, all but the first 
> close will generate a WARNING message.
> {code:java}
> 11:04:33.605 AM   WARNFadvisedChunkedFile 
> Failed to manage OS cache for 
> /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out
> EBADF: Bad file descriptor
>   at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
> Method)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
>   at 
> org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>   at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748){co

[jira] [Commented] (MAPREDUCE-7095) Race conditions in FadvisedChunkedFile

2018-05-09 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469342#comment-16469342
 ] 

Haibo Chen commented on MAPREDUCE-7095:
---

[~miklos.szeg...@cloudera.com] can you address the checkstyle issues?

> Race conditions in FadvisedChunkedFile
> --
>
> Key: MAPREDUCE-7095
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-8090.000.patch, YARN-8090.001.patch, 
> YARN-8090.002.patch
>
>
> When a file is closed multiple times by multiple threads, all but the first 
> close will generate a WARNING message.
> {code:java}
> 11:04:33.605 AM   WARNFadvisedChunkedFile 
> Failed to manage OS cache for 
> /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out
> EBADF: Bad file descriptor
>   at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
> Method)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
>   at 
> org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>   at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(

[jira] [Updated] (MAPREDUCE-7095) Race conditions in FadvisedChunkedFile

2018-05-09 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-7095:
--
Status: Patch Available  (was: Open)

> Race conditions in FadvisedChunkedFile
> --
>
> Key: MAPREDUCE-7095
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-8090.000.patch, YARN-8090.001.patch, 
> YARN-8090.002.patch
>
>
> When a file is closed multiple times by multiple threads, all but the first 
> close will generate a WARNING message.
> {code:java}
> 11:04:33.605 AM   WARNFadvisedChunkedFile 
> Failed to manage OS cache for 
> /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out
> EBADF: Bad file descriptor
>   at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
> Method)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
>   at 
> org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>   at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

--

[jira] [Moved] (MAPREDUCE-7095) Race conditions in FadvisedChunkedFile

2018-05-08 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen moved YARN-8090 to MAPREDUCE-7095:
-

Affects Version/s: (was: 3.1.0)
   3.1.0
  Key: MAPREDUCE-7095  (was: YARN-8090)
  Project: Hadoop Map/Reduce  (was: Hadoop YARN)

> Race conditions in FadvisedChunkedFile
> --
>
> Key: MAPREDUCE-7095
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7095
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Minor
> Attachments: YARN-8090.000.patch, YARN-8090.001.patch, 
> YARN-8090.002.patch
>
>
> When a file is closed multiple times by multiple threads, all but the first 
> close will generate a WARNING message.
> {code:java}
> 11:04:33.605 AM   WARNFadvisedChunkedFile 
> Failed to manage OS cache for 
> /var/run/100/yarn/nm/usercache/systest/appcache/application_1521665017379_0062/output/attempt_1521665017379_0062_m_012797_0/file.out
> EBADF: Bad file descriptor
>   at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native 
> Method)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
>   at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
>   at 
> org.apache.hadoop.mapred.FadvisedChunkedFile.close(FadvisedChunkedFile.java:76)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.closeInput(ChunkedWriteHandler.java:303)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.discard(ChunkedWriteHandler.java:163)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.flush(ChunkedWriteHandler.java:192)
>   at 
> org.jboss.netty.handler.stream.ChunkedWriteHandler.handleUpstream(ChunkedWriteHandler.java:137)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelClosed(SimpleChannelUpstreamHandler.java:225)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.replay.ReplayingDecoder.cleanup(ReplayingDecoder.java:570)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:493)
>   at 
> org.jboss.netty.handler.codec.frame.FrameDecoder.channelClosed(FrameDecoder.java:371)
>   at 
> org.jboss.netty.handler.ssl.SslHandler.channelClosed(SslHandler.java:1667)
>   at 
> org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:88)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
>   at 
> org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
>   at org.jboss.netty.channel.Channels.fireChannelClosed(Channels.java:468)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:375)
>   at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
>   at 
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>   at 
> org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at 
> org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPool

[jira] [Commented] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed

2018-03-21 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16408722#comment-16408722
 ] 

Haibo Chen commented on MAPREDUCE-6948:
---

This was observed in alpha4, so I suspect there is another race condition.
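
If the failure is indeed a race between the test's assertion and the AsyncDispatcher delivering events, a common mitigation is to poll for the expected state instead of asserting a single snapshot. A hedged, generic sketch (not the actual TestJobImpl helper):
{code:java}
import java.util.function.Supplier;

// Illustrative only: wait for an asynchronous state machine to reach the
// expected state before failing, instead of asserting immediately.
public final class WaitForState {
  public static <T> void waitFor(Supplier<T> actual, T expected, long timeoutMs)
      throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!expected.equals(actual.get())
        && System.currentTimeMillis() < deadline) {
      Thread.sleep(10);   // give the dispatcher a chance to drain its queue
    }
    if (!expected.equals(actual.get())) {
      throw new AssertionError(
          "expected " + expected + " but was " + actual.get());
    }
  }
}
{code}
Usage in a test would then look like WaitForState.waitFor(job::getState, JobState.SUCCEEDED, 5000) rather than a direct assertEquals on a transition that may still be in flight.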

> TestJobImpl.testUnusableNodeTransition failed
> -
>
> Key: MAPREDUCE-6948
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6948
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Jim Brennan
>Priority: Major
>  Labels: unit-test
>
> *Error Message*
> expected: but was:
> *Stacktrace*
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041)
>   at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615)
> *Standard out*
> {code}
> 2017-08-30 10:12:21,928 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
> 2017-08-30 10:12:21,939 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.jobhistory.EventType for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class 
> org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
> 2017-08-30 10:12:21,941 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:setup(1534)) - Adding job token for job_123456789_0001 to 
> jobTokenSecretManager
> 2017-08-30 10:12:21,941 WARN  [Thread-49] impl.JobImpl 
> (JobImpl.java:setup(1540)) - Shuffle secret key missing from job credentials. 
> Using job token secret as shuffle secret.
> 2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:makeUberDecision(1305)) - Not uberizing job_123456789_0001 
> because: not enabled;
> 2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:createMapTasks(1562)) - Input size for job 
> job_123456789_0001 = 0. Number of splits = 2
> 2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:createReduceTasks(1579)) - Number of reduces for job 
> job_123456789_0001 = 1
> 2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from NEW 
> to INITED
> 2017-08-30 10:12:21,946 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
> INITED to SETUP
> 2017-08-30 10:12:21,954 INFO  [CommitterEvent Processor #0] 
> commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - 
> Processing the event EventType: JOB_SETUP
> 2017-08-30 10:12:21,978 INFO  [AsyncDispatcher event handler] impl.JobImpl 
> (JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
> SETUP to RUNNING
> 2017-08-30 10:12:21,983 INFO  [Thread-49] event.AsyncDispatcher 
> (AsyncDispatcher.java:register(209)) - Registering class 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$5
> 2017-08-30 10:12:22,000 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:transition(1953)) - Num completed Tasks: 1
> 2017-08-30 10:12:22,029 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:transition(1953)) - Num completed Tasks: 2
> 2017-08-30 10:12:22,032 INFO  [Thread-49] impl.JobImpl 
> (JobImpl.java:actOnUnusableNode(1354)) - TaskAttempt killed because it ran on 
> unusable node Mock for NodeId, hashCode: 1280187896. 
> AttemptId:attempt_123456789_0001_m_00_0
> 2017-08-30 10:12:22,032 INF

[jira] [Updated] (MAPREDUCE-7065) Improve information stored in ATSv2 for MR jobs

2018-03-12 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-7065:
--
Description: 
While exploring the possibility of retrieving every piece of information that 
JHS presents today through ATSv2, I found a few improvements we can make.

1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into 
MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT

2) Task attempt final states are stored in the events, so we cannot use 
infofilters to group task attempts by final state, which is what JHS does.

3) Display names of counters are not stored in JHS. We are currently storing 
(counter name, display name, value) as a metric (counter name, value). We can 
potentially store (counter name, display name) as an info. Similarly for 
sources of Job configuration properties

4) Job level counters and configuration properties are stored both in 
ApplicationTable and EntityTable. It's probably safe just to store MR specific 
counters in EntityTable.

 

One general problem I see around this area in MR:

1) We can precompute # of failed/killed/successful map/reduce task attempts and 
average map/reduce/shuffle/merge time in the AM. This would avoid iterating 
over all task attempts when JHS serves the Job Overview Page.

 

To fully replace JHS with ATSv2, three functionalities need to be supported by 
ATSv2

1) /apps/ query so that a list of all jobs can be retrieved (YARN-6058)

2) support streaming api to get all generic entities (YARN-5627)

3) support per-app data retention policy. Likely a setting in TimelineWriter 
that allows admins to specify how long information of a given application should 
be kept, in the form of TTL in HBase.

  was:
While exploring the possibility of retrieving every piece of information that 
JHS presents today through ATSv2, I found a few improvements we can make.

1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into 
MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT

2) Task attempt final states are stored in the events, so we cannot use 
infofilters to group task attempts by final state, which is what JHS does.

3) Display names of counters are not stored in JHS. We are currently storing 
(counter name, display name, value) as a metric (counter name, value). We can 
potentially store (counter name, display name) as an info. Similarly for 
sources of Job configuration properties

4) Job level counters and configuration properties are stored both in 
ApplicationTable and EntityTable. It's probably safe just to store MR specific 
counters in EntityTable.

 

One general problem I see around this area in MR:

1) We can precompute # of failed/killed/successful map/reduce task attempts and 
average map/reduce/shuffle/merge time in the AM. This would avoid iterating 
over all task attempts when JHS serves the Job Overview Page.

 

To fully replace JHS with ATSv2, three functionalities need to be supported by 
ATSv2

1) /apps/ query so that a list of all jobs can be retrieved

2) support streaming api to get all generic entities (YARN-5627)

3) support per-app data retention policy. Likely a setting in TimelineWriter 
that allows admins to specify how long information of a given application should 
be kept, in the form of TTL in HBase.


> Improve information stored in ATSv2 for MR jobs
> ---
>
> Key: MAPREDUCE-7065
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
>
> While exploring the possibility of retrieving every piece of information that 
> JHS presents today through ATSv2, I found a few improvements we can make.
> 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
> indistinguishably stored as entities of type MR_TASK. We can split MR_TASK 
> into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT
> 2) Task attempt final states are stored in the events, so we cannot use 
> infofilters to group task attempts by final state, which is what JHS does.
> 3) Display names of counters are not stored in JHS. We are currently storing 
> (counter name, display name, value) as a metric (counter name, value). We can 
> potentially store (counter name, display name) as an info. Similarly for 
> sources of Job configuration properties
> 4) Job level counters and configuration properties are stored both in 
> ApplicationTable and EntityTable. It's probably safe just to store MR 
> specific counters in EntityTable.
>  
> One general problem I see around this area in M

[jira] [Commented] (MAPREDUCE-7065) Improve information stored in ATSv2 for MR jobs

2018-03-12 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396248#comment-16396248
 ] 

Haibo Chen commented on MAPREDUCE-7065:
---

CC [~rohithsharma] [~vrushalic]. Is it OK at this point to change how MR data 
is stored in ATSv2, for example split MR_TASK into MR_MAP_TASK and 
MR_REDUCE_TASK?
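
For illustration, a hedged sketch of what the proposed split could look like when the MR AM publishes a task entity to ATSv2. The entity type names and the "STATE" info key are assumptions for this sketch, not a committed storage format:
{code:java}
import org.apache.hadoop.mapreduce.TaskType;
import org.apache.hadoop.yarn.api.records.timelineservice.TimelineEntity;

// Sketch only: pick the entity type from the task type instead of the single
// MR_TASK, and record the final state as an info field so that ATSv2
// infofilters can group attempts by state (rather than scanning events).
public class MrTimelineEntityExample {
  public static TimelineEntity toEntity(TaskType taskType, String taskId,
      String finalState) {
    TimelineEntity entity = new TimelineEntity();
    entity.setType(taskType == TaskType.MAP ? "MR_MAP_TASK" : "MR_REDUCE_TASK");
    entity.setId(taskId);
    // Stored as info (queryable via infofilters) rather than only as an event.
    entity.addInfo("STATE", finalState);
    return entity;
  }
}
{code}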

> Improve information stored in ATSv2 for MR jobs
> ---
>
> Key: MAPREDUCE-7065
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
>
> While exploring the possibility of retrieving every piece of information that 
> JHS presents today through ATSv2, I found a few improvements we can make.
> 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
> indistinguishably stored as entities of type MR_TASK. We can split MR_TASK 
> into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT
> 2) Task attempt final states are stored in the events, so we cannot use 
> infofilters to group task attempts by final state, which is what JHS does.
> 3) Display names of counters are not stored in JHS. We are currently storing 
> (counter name, display name, value) as a metric (counter name, value). We can 
> potentially store (counter name, display name) as an info. Similarly for 
> sources of Job configuration properties
> 4) Job level counters and configuration properties are stored both in 
> ApplicationTable and EntityTable. It's probably safe just to store MR 
> specific counters in EntityTable.
>  
> One general problem I see around this area in MR:
> 1) We can precompute # of failed/killed/successful map/reduce task attempts 
> and average map/reduce/shuffle/merge time in the AM. This would avoid 
> iterating over all task attempts when JHS serves the Job Overview Page.
>  
> To fully replace JHS with ATSv2, three functionalities need to be supported 
> by ATSv2
> 1) /apps/ query so that a list of all jobs can be retrieved
> 2) support streaming api to get all generic entities (YARN-5627)
> 3) support per-app data retention policy. Likely a setting in TimelineWriter 
> that allows admins to specify how long information of a given application 
> should be kept, in the form of TTL in HBase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7065) Improve information stored in ATSv2 for MR jobs

2018-03-12 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-7065:
--
Description: 
While exploring the possibility of retrieving every piece of information that 
JHS presents today through ATSv2, I found a few improvements we can make.

1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into 
MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT

2) Task attempt final states are stored in the events, so we cannot use 
infofilters to group task attempts by final state, which is what JHS does.

3) Display names of counters are not stored in JHS. We are currently storing 
(counter name, display name, value) as a metric (counter name, value). We can 
potentially store (counter name, display name) as an info. Similarly for 
sources of Job configuration properties

4) Job level counters and configuration properties are stored both in 
ApplicationTable and EntityTable. It's probably safe just to store MR specific 
counters in EntityTable.

 

One general problem I see around this area in MR:

1) We can precompute # of failed/killed/successful map/reduce task attempts and 
average map/reduce/shuffle/merge time in the AM. This would avoid iterating 
over all task attempts when JHS serves the Job Overview Page.

 

To fully replace JHS with ATSv2, three functionalities need to be supported by 
ATSv2

1) /apps/ query so that a list of all jobs can be retrieved

2) support streaming api to get all generic entities (YARN-5627)

3) support per-app data retention policy. Likely a setting in TimelineWriter 
that allows admins to specify how long information of a given application should 
be kept, in the form of TTL in HBase.

  was:
While exploring the possibility of retrieving every piece of information that 
JHS presents today through ATSv2, I found a few improvements we can make.

1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into 
MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT

2) Task attempt final states are stored in the events, so we cannot use 
infofilters to group task attempts by final state, which is what JHS does.

3) Display names of counters are not stored in JHS. We are currently storing 
(counter name, display name, value) as a metric (counter name, value). We can 
potentially store (counter name, display name) as an info. Similarly for 
sources of Job configuration properties

4) Job level counters and configuration properties are stored both in 
ApplicationTable and EntityTable. It's probably safe just to store MR specific 
counters in EntityTable.

 

One general problem I see around this area in MR:

1) We can precompute # of failed/killed/successful map/reduce task attempts and 
average map/reduce/shuffle/merge time in the AM. This would avoid iterating 
over all task attempts when JHS serves the Job Overview Page.

 

To fully replace JHS with ATSv2, three functionalities need to be supported by 
ATSv2

1) /apps/ query so that a list of all jobs can be retrieved

2) support streaming api to get all generic entities (YARN-5672)

3) support per-app data retention policy. Likely a setting in TimelineWriter 
that allows admins to specify how long information of a given application should 
be kept, in the form of TTL in HBase.


> Improve information stored in ATSv2 for MR jobs
> ---
>
> Key: MAPREDUCE-7065
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Major
>
> While exploring the possibility of retrieving every piece of information that 
> JHS presents today through ATSv2, I found a few improvements we can make.
> 1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
> indistinguishably stored as entities of type MR_TASK. We can split MR_TASK 
> into MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT
> 2) Task attempt final states are stored in the events, so we cannot use 
> infofilters to group task attempts by final state, which is what JHS does.
> 3) Display names of counters are not stored in JHS. We are currently storing 
> (counter name, display name, value) as a metric (counter name, value). We can 
> potentially store (counter name, display name) as an info. Similarly for 
> sources of Job configuration properties
> 4) Job level counters and configuration properties are stored both in 
> ApplicationTable and EntityTable. It's probably safe just to store MR 
> specific counters in EntityTable.
>  
> One general problem I see around this area in MR:
> 1) We c

[jira] [Created] (MAPREDUCE-7065) Improve information stored in ATSv2 for MR jobs

2018-03-12 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-7065:
-

 Summary: Improve information stored in ATSv2 for MR jobs
 Key: MAPREDUCE-7065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7065
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Haibo Chen
Assignee: Haibo Chen


While exploring the possibility of retrieving every piece of information that 
JHS presents today through ATSv2, I found a few improvements we can make.

1) MR tasks are split by type in JHS, map tasks or reduce tasks. They are 
indistinguishably stored as entities of type MR_TASK. We can split MR_TASK into 
MR_REDUCE_TASK and MR_MAP_TASK. Similarly for MR_TASK_ATTEMPT

2) Task attempt final states are stored in the events, so we cannot use 
infofilters to group task attempts by final state, which is what JHS does.

3) Display names of counters are not stored in JHS. We are currently storing 
(counter name, display name, value) as a metric (counter name, value). We can 
potentially store (counter name, display name) as an info. Similarly for 
sources of Job configuration properties

4) Job level counters and configuration properties are stored both in 
ApplicationTable and EntityTable. It's probably safe just to store MR specific 
counters in EntityTable.

 

One general problem I see around this area in MR:

1) We can precompute # of failed/killed/successful map/reduce task attempts and 
average map/reduce/shuffle/merge time in the AM. This would avoid iterating 
over all task attempts when JHS serves the Job Overview Page.

 

To fully replace JHS with ATSv2, three functionalities need to be supported by 
ATSv2

1) /apps/ query so that a list of all jobs can be retrieved

2) support streaming api to get all generic entities (YARN-5672)

3) support per-app data retention policy. Likely a setting in TimelineWriter 
that allows admins to specify how long information of a given application should 
be kept, in the form of TTL in HBase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2018-02-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6441:
--
Attachment: MAPREDUCE-6441.011.patch

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: William Watson
>Assignee: Ray Chiang
>Priority: Major
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, 
> MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, 
> MAPREDUCE-6441.009.patch, MAPREDUCE-6441.010.patch, MAPREDUCE-6441.011.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> This happens if two are kicked off in the same second. The issue is the following lines of 
> code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: 
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}
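
One of the attached approaches (the jobid-plus-uuid patch) replaces the millisecond-seeded counter with a name that cannot collide across processes. A hedged sketch of that idea, with illustrative names:
{code:java}
import java.util.UUID;

// Sketch only: derive the local distributed-cache directory name from the
// job id plus a random UUID instead of System.currentTimeMillis(), so two
// processes started in the same millisecond cannot pick the same directory.
// The actual patch may differ in naming and placement.
public class UniqueCacheDirNameExample {
  public static String uniqueDirName(String jobId) {
    return jobId + "_" + UUID.randomUUID();
  }

  public static void main(String[] args) {
    // e.g. job_1406915233073_0001_6f1c... : unique per process and per call
    System.out.println(uniqueDirName("job_1406915233073_0001"));
  }
}
{code}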



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2018-02-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6441:
--
Attachment: MAPREDUCE-6441.010.patch

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: William Watson
>Assignee: Ray Chiang
>Priority: Major
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, 
> MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, 
> MAPREDUCE-6441.009.patch, MAPREDUCE-6441.010.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> This happens if two are kicked off in the same second. The issue is the following lines of 
> code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: 
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2018-02-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6441:
--
Attachment: MAPREDUCE-6441.009.patch

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: William Watson
>Assignee: Ray Chiang
>Priority: Major
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, 
> MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, 
> MAPREDUCE-6441.009.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> This happens if two are kicked off in the same second. The issue is the following lines of 
> code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: 
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2018-02-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6441:
--
Attachment: (was: MAPREDUCE-6441.009.patch)

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: William Watson
>Assignee: Ray Chiang
>Priority: Major
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, 
> MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, 
> MAPREDUCE-6441.009.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> This happens if two are kicked off in the same second. The issue is the following lines of 
> code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: 
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-4669) MRAM web UI does not work with HTTPS

2018-02-13 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-4669:
-

Assignee: (was: Haibo Chen)

> MRAM web UI does not work with HTTPS
> 
>
> Key: MAPREDUCE-4669
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4669
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha
>Reporter: Alejandro Abdelnur
>Priority: Major
>
> With Kerberos enabled, the MRAM runs as the user that submitted the job, thus 
> the MRAM process cannot read the cluster keystore files to get the 
> certificates to start its HttpServer using HTTPS.
> We need to decouple the keystore used by RM/NM/NN/DN (which are cluster 
> provided) from the keystore used by AMs (which ought to be user provided).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2018-02-13 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363021#comment-16363021
 ] 

Haibo Chen commented on MAPREDUCE-6441:
---

[~rchiang] Do you plan to work on this? I can take it over if you don't have 
the cycles.

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: William Watson
>Assignee: Ray Chiang
>Priority: Major
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, 
> MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, 
> MAPREDUCE-6441.009.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> This happens if two are kicked off in the same second. The issue is the following lines of 
> code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: 
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7041) MR should not try to clean up at first job attempt

2018-01-25 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-7041:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.10.0
   3.0.1
   3.1.0
   Status: Resolved  (was: Patch Available)

Thanks [~tasanuma0829] for reporting the issue, and [~grepas] for the quick 
fix. I committed the patch to branch-2, branch-3.0, and trunk.

> MR should not try to clean up at first job attempt
> --
>
> Key: MAPREDUCE-7041
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7041
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 2.10.0
>
> Attachments: MAPREDUCE-7041.000.patch
>
>
> These tests fail in trunk now. MAPREDUCE-6984 may be related.
> {noformat}
> hadoop.mapreduce.v2.TestMROldApiJobs.testJobSucceed
> hadoop.mapred.TestJobCleanup.testCustomAbort
> hadoop.mapreduce.lib.output.TestJobOutputCommitter.testCustomAbort
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-7041) MR should not try to clean up at first job attempt

2018-01-25 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-7041:
--
Summary: MR should not try to clean up at first job attempt  (was: Several 
unit tests in hadoop-mapreduce-client-jobclient are failed)

> MR should not try to clean up at first job attempt
> --
>
> Key: MAPREDUCE-7041
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7041
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Gergo Repas
>Priority: Major
> Attachments: MAPREDUCE-7041.000.patch
>
>
> These tests fail in trunk now. MAPREDUCE-6984 may be related.
> {noformat}
> hadoop.mapreduce.v2.TestMROldApiJobs.testJobSucceed
> hadoop.mapred.TestJobCleanup.testCustomAbort
> hadoop.mapreduce.lib.output.TestJobOutputCommitter.testCustomAbort
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7041) Several unit tests in hadoop-mapreduce-client-jobclient are failed

2018-01-25 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340329#comment-16340329
 ] 

Haibo Chen commented on MAPREDUCE-7041:
---

+1. Checking this in shortly.

> Several unit tests in hadoop-mapreduce-client-jobclient are failed
> --
>
> Key: MAPREDUCE-7041
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7041
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Takanobu Asanuma
>Assignee: Gergo Repas
>Priority: Major
> Attachments: MAPREDUCE-7041.000.patch
>
>
> These tests fail in trunk now. MAPREDUCE-6984 may be related.
> {noformat}
> hadoop.mapreduce.v2.TestMROldApiJobs.testJobSucceed
> hadoop.mapred.TestJobCleanup.testCustomAbort
> hadoop.mapreduce.lib.output.TestJobOutputCommitter.testCustomAbort
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt in case of no recovery

2018-01-19 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6984:
--
Fix Version/s: 2.10.0

> MR AM to clean up temporary files from previous attempt in case of no recovery
> --
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1, 2.10.0
>
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, 
> MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, 
> MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, 
> MAPREDUCE-6984.009.patch, MAPREDUCE-6984.010.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt in case of no recovery

2018-01-19 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6984:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.1
   3.1.0
   Status: Resolved  (was: Patch Available)

Thanks [~grepas] for your contribution. I have committed the patch to branch-2, 
branch-3.0, and trunk.

> MR AM to clean up temporary files from previous attempt in case of no recovery
> --
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Fix For: 3.1.0, 3.0.1
>
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, 
> MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, 
> MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, 
> MAPREDUCE-6984.009.patch, MAPREDUCE-6984.010.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt in case of no recovery

2018-01-19 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6984:
--
Summary: MR AM to clean up temporary files from previous attempt in case of 
no recovery  (was: MR AM to clean up temporary files from previous attempt)

> MR AM to clean up temporary files from previous attempt in case of no recovery
> --
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, 
> MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, 
> MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, 
> MAPREDUCE-6984.009.patch, MAPREDUCE-6984.010.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt

2018-01-19 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332761#comment-16332761
 ] 

Haibo Chen commented on MAPREDUCE-6984:
---

+1 pending Jenkins.

> MR AM to clean up temporary files from previous attempt
> ---
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, 
> MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, 
> MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, 
> MAPREDUCE-6984.009.patch, MAPREDUCE-6984.010.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt

2018-01-19 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16332679#comment-16332679
 ] 

Haibo Chen commented on MAPREDUCE-6984:
---

Thanks [~grepas] for the update! Please address the checkstyle issue and one 
minor comment: I am not sure what this line of code is for. Otherwise, LGTM.
{quote}    conf.setBoolean("want.am.recovery", true)
{quote}

> MR AM to clean up temporary files from previous attempt
> ---
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, 
> MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, 
> MAPREDUCE-6984.006.patch, MAPREDUCE-6984.007.patch, MAPREDUCE-6984.008.patch, 
> MAPREDUCE-6984.009.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt

2018-01-17 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328901#comment-16328901
 ] 

Haibo Chen commented on MAPREDUCE-6984:
---

In TestRecovery, we always create an instance of MRApp and stop it to simulate an 
AM failure, which in my opinion is more readable than mucking with the file 
system directly in our test. Plus, the current test is coupled to the 
behavior of FileOutputCommitter. We can add a similar test in TestRecovery, 
wrap the OutputCommitter in a spy, and verify that abortJob is called. This way, 
we also cover other OutputCommitters.
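
A minimal sketch of that spy-based check, assuming Mockito is available in the test 
(the MRApp setup is elided and all names here are illustrative, not the actual patch):
{code}
// Hypothetical sketch: wrap the committer in a spy, simulate an AM failure by
// stopping the first MRApp, then assert abortJob() ran on the next attempt.
FileOutputCommitter committer = new FileOutputCommitter(outputPath, taskContext);
FileOutputCommitter spyCommitter = Mockito.spy(committer);

// ... inject spyCommitter into the app, run the first attempt, stop it,
// ... then start the second attempt without recovery ...

Mockito.verify(spyCommitter)
    .abortJob(Mockito.any(JobContext.class), Mockito.any(JobStatus.State.class));
{code}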

> MR AM to clean up temporary files from previous attempt
> ---
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, 
> MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, 
> MAPREDUCE-6984.006.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt

2018-01-16 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327558#comment-16327558
 ] 

Haibo Chen commented on MAPREDUCE-6984:
---

Thanks [~grepas] for the updated patch! A few more comments:
 # In cleanUpPreviousAttemptOutput(ApplicationAttemptId appAttemptId), we 
essentially clean up the whole job rather than an individual job attempt. If 
there was previously more than one attempt, only the first cleanup will 
succeed. How about we rename it to CleanupPreviousJobOutput and call it only 
once in cleanUpPreviousAttemptOutput()?
 # We should probably pass FAILED to the abortJob() call (a sketch follows this 
list).
 # For the unit test, we can do something similar to TestRecovery to simulate 
multiple job attempts, which I think is more readable.
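
A minimal sketch of what comments #1 and #2 could look like together, reusing the 
helpers already shown in this thread (createOutputCommitter, getJobContextFromConf); 
this is illustrative only, not the actual patch:
{code}
// Hypothetical sketch: clean up the whole previous job output exactly once,
// aborting with FAILED rather than KILLED.
private void cleanupPreviousJobOutput() {
  Configuration configuration = new Configuration(getConfig());
  JobContext jobContext = getJobContextFromConf(configuration);
  try {
    OutputCommitter outputCommitter = createOutputCommitter(configuration);
    outputCommitter.abortJob(jobContext, JobStatus.State.FAILED);
  } catch (FileNotFoundException e) {
    // an earlier attempt already cleaned up - safe to ignore
  } catch (Exception e) {
    // cleanup is not critical to this job attempt - only log the error
    LOG.error("Error while cleaning up previous job output", e);
  }
}
{code}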

> MR AM to clean up temporary files from previous attempt
> ---
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
>Priority: Major
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch, 
> MAPREDUCE-6984.003.patch, MAPREDUCE-6984.004.patch, MAPREDUCE-6984.005.patch, 
> MAPREDUCE-6984.006.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-10 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321295#comment-16321295
 ] 

Haibo Chen commented on MAPREDUCE-6926:
---

Thanks [~miklos.szeg...@cloudera.com] for your reviews!

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: YARN-1011
>
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch, 
> MAPREDUCE-6926-YARN-1011.03.patch, MAPREDUCE-6926-YARN-1011.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-10 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6926:
--
Attachment: MAPREDUCE-6926-YARN-1011.04.patch

Good catch. I updated the patch to address that issue.

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch, 
> MAPREDUCE-6926-YARN-1011.03.patch, MAPREDUCE-6926-YARN-1011.04.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-09 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6926:
--
Attachment: MAPREDUCE-6926-YARN-1011.03.patch

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch, 
> MAPREDUCE-6926-YARN-1011.03.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-03 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310315#comment-16310315
 ] 

Haibo Chen commented on MAPREDUCE-6926:
---

I have instead rebased the YARN-1011 branch and manually triggered a Jenkins job.

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308813#comment-16308813
 ] 

Haibo Chen commented on MAPREDUCE-6926:
---

As pointed out by Miklos offline, HADOOP-15122 should fix the build issue here. 
Cherry-picking HADOOP-15122 and re-triggering another Jenkins job.

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6926:
--
Attachment: MAPREDUCE-6926-YARN-1011.02.patch

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch, MAPREDUCE-6926-YARN-1011.02.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308705#comment-16308705
 ] 

Haibo Chen commented on MAPREDUCE-6926:
---

Good catch. I updated the message in the new patch.

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7015) Possible race condition in JHS if the job is not loaded

2018-01-02 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308511#comment-16308511
 ] 

Haibo Chen commented on MAPREDUCE-7015:
---

[~pbacsko] I think you meant to say step #6 is slow enough most of the time, 
right? Otherwise, the CLI client would almost always try to find the config 
file in the intermediate directory after the file has already been moved into 
the done directory, and fail.

> Possible race condition in JHS if the job is not loaded
> ---
>
> Key: MAPREDUCE-7015
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7015
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-7015-POC01.patch
>
>
> There could be a race condition inside JHS. In our build environment, 
> {{TestMRJobClient.testJobClient()}} failed with this exception:
> {noformat}
> java.io.FileNotFoundException: File does not exist: 
> hdfs://localhost:32836/tmp/hadoop-yarn/staging/history/done_intermediate/jenkins/job_1509975084722_0001_conf.xml
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:340)
>   at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2123)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2092)
>   at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2068)
>   at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:460)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.mapreduce.TestMRJobClient.runTool(TestMRJobClient.java:94)
>   at 
> org.apache.hadoop.mapreduce.TestMRJobClient.testConfig(TestMRJobClient.java:551)
>   at 
> org.apache.hadoop.mapreduce.TestMRJobClient.testJobClient(TestMRJobClient.java:167)
> {noformat}
> Root cause:
> 1. MapReduce job completes
> 2. CLI calls {{cluster.getJob(jobid)}}
> 3. The job is finished and the client side gets redirected to JHS
> 4. The job data is missing from {{CachedHistoryStorage}} so JHS tries to find 
> the job
> 5. First it scans the intermediate directory and finds the job
> 6. The call {{moveToDone()}} is scheduled for execution on a separate thread 
> inside {{moveToDoneExecutor}} but does not get the chance to run immediately
> 7. RPC invocation returns with the path pointing to 
> {{/tmp/hadoop-yarn/staging/history/done_intermediate}}
> 8. The call to {{moveToDone()}} completes which moves the contents of 
> {{done_intermediate}} to {{done}}
> 9. Hadoop CLI tries to download the config file from done_intermediate but 
> it's no longer there
> Usually step #6 is fast enough to complete before step #7, but sometimes it 
> can get behind, causing this race condition.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2018-01-02 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308413#comment-16308413
 ] 

Haibo Chen commented on MAPREDUCE-6926:
---

Thanks [~miklos.szeg...@cloudera.com] for the review! I agree with you that 
using Clock.getTime() would ease unit testing in general. But in this case, there 
are already other ContainerRequest constructors that allow easy unit testing 
through direct specification of the request time. Plus, I believe my change 
preserves the current behavior in the code.

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2017-11-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6926:
--
Attachment: MAPREDUCE-6926-YARN-1011.01.patch

Updated the patch to address the unit test failure plus the checkstyle issues.

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch, 
> MAPREDUCE-6926-YARN-1011.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2017-11-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6926:
--
Status: Patch Available  (was: Open)

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6926) Allow MR jobs to opt out of oversubscription

2017-11-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6926:
--
Attachment: MAPREDUCE-6926-YARN-1011.00.patch

> Allow MR jobs to opt out of oversubscription
> 
>
> Key: MAPREDUCE-6926
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6926
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: mrv2
>Affects Versions: 3.0.0-alpha3
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: MAPREDUCE-6926-YARN-1011.00.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt

2017-10-26 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221178#comment-16221178
 ] 

Haibo Chen commented on MAPREDUCE-6984:
---

Thanks [~grepas] for the patch! A few high level comments:
1) The cleanup of previous attempts does not impact the current attempt in any 
case, so I think we should catch all exceptions in 
cleanUpPreviousAttemptOutput() instead of just IOExceptions.

2) If we have multiple previous attempts and the previous attempt already cleaned 
up its own previous attempts, we'd get a FileNotFoundException while doing the 
cleanup in the current attempt, resulting in bogus error messages from
"LOG.error("Error while trying to clean up previous attempt (" + appAttemptId + 
")", e);"
We could catch FileNotFoundException and ignore it first, and then log any 
other exception, like this:
{code}
  private void cleanUpPreviousAttemptOutput(ApplicationAttemptId appAttemptId) {
    Configuration configuration = new Configuration(getConfig());
    configuration.setInt(MRJobConfig.APPLICATION_ATTEMPT_ID,
        appAttemptId.getAttemptId());
    JobContext jobContext = getJobContextFromConf(configuration);
    try {
      LOG.info("Starting to clean up previous attempt's (" +
          appAttemptId + ") temporary files");
      OutputCommitter outputCommitter = createOutputCommitter(configuration);
      outputCommitter.abortJob(jobContext, State.KILLED);
      LOG.info("Finished cleaning up previous attempt's (" +
          appAttemptId + ") temporary files");
    } catch (FileNotFoundException e) {
      // the directory is already gone - safe to ignore
    } catch (Exception e) {
      // the clean up of a previous attempt is not critical to the success
      // of this job - only logging the error
      LOG.error("Error while trying to clean up previous attempt (" +
          appAttemptId + ")", e);
    }
  }
{code}

3) We only recover successful tasks, so we could also do the cleanup of 
failed/killed tasks even when recovery is enabled, but that's another 
optimization. I am OK with not doing it if it is complicated.



> MR AM to clean up temporary files from previous attempt
> ---
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6992) Race for temp dir in LocalDistributedCacheManager.java

2017-10-26 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220977#comment-16220977
 ] 

Haibo Chen commented on MAPREDUCE-6992:
---

I believe this is a duplicate of MAPREDUCE-6441
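
For context, a rough sketch of the kind of change MAPREDUCE-6441 points toward 
(its attachments suggest a job-id-plus-UUID name); every name below is an 
illustrative assumption, not the actual patch:
{code}
// Seed the per-job local directory name with something collision-free instead
// of System.currentTimeMillis(), e.g. the job id plus a random UUID.
String uniqueSubDir = jobId.toString() + "_" + UUID.randomUUID();
Path localTempDir = new Path(localDirRoot, uniqueSubDir);
{code}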

> Race for temp dir in LocalDistributedCacheManager.java
> --
>
> Key: MAPREDUCE-6992
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6992
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Philip Zeyliger
>
> When localizing distributed cache files in "local" mode, 
> LocalDistributedCacheManager.java chooses a "unique" directory based on a 
> millisecond time stamp. When running code with some parallelism, it's 
> possible to run into this.
> The error message looks like 
> {code}
> bq. java.io.FileNotFoundException: jenkins/mapred/local/1508958341829_tmp 
> does not exist
> {code}
> I ran into this in Impala's data loading. There, we run a HiveServer2 which 
> runs in MapReduce. If multiple queries are submitted simultaneously to the 
> HS2, they conflict on this directory. Googling found that StreamSets ran into 
> something very similar looking at 
> https://issues.streamsets.com/browse/SDC-5473.
> I believe the buggy code is (link: 
> https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L94)
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
> new AtomicLong(System.currentTimeMillis());
> {code}
> Notably, a similar code path uses an actual random number generator 
> ({{LocalJobRunner.java}}, 
> https://github.com/apache/hadoop/blob/2da654e34a436aae266c1fbdec5c1067da8d854e/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalJobRunner.java#L912).
> {code}
>   public String getStagingAreaDir() throws IOException {
> Path stagingRootDir = new Path(conf.get(JTConfig.JT_STAGING_AREA_ROOT,
> "/tmp/hadoop/mapred/staging"));
> UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
> String user;
> randid = rand.nextInt(Integer.MAX_VALUE);
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt

2017-10-23 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16215975#comment-16215975
 ] 

Haibo Chen commented on MAPREDUCE-6984:
---

Manually triggered the Jenkins job.

> MR AM to clean up temporary files from previous attempt
> ---
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-6631) shuffle handler would benefit from per-local-dir threads

2017-10-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-6631:
-

Assignee: (was: Haibo Chen)

> shuffle handler would benefit from per-local-dir threads
> 
>
> Key: MAPREDUCE-6631
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6631
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: nodemanager
>Affects Versions: 2.7.2, 3.0.0-alpha1
>Reporter: Nathan Roberts
>
> [~jlowe] and I discussed this while investigating I/O starvation we have been 
> seeing on our clusters lately (possibly amplified by increased tez 
> workloads). 
> If a particular disk is being slow, it is very likely that all shuffle netty 
> threads will be blocked on the read side of sendfile(). (sendfile() is 
> asynchronous on the outbound socket side, but not on the read side.) This 
> causes the entire shuffle subsystem to slow down. 
> It seems like we could make the netty threads more asynchronous by 
> introducing a small set of threads per local-dir that are responsible for the 
> actual sendfile() invocations.
> This would not only improve shuffles that span drives, but also improve 
> situations where there is a single large shuffle from a single local-dir. It 
> would allow other drives to continue serving shuffle requests, AND avoid a 
> large number of readers (2X number_of_cores by default) all fighting for the 
> same drive, which becomes unfair to everything else on the system.
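
A rough sketch of the per-local-dir sender idea described above (all names are 
illustrative assumptions, not the actual shuffle handler code):
{code}
// One small executor per local dir: a slow disk then only blocks its own
// sendfile() calls instead of stalling every shuffle worker thread.
Map<String, ExecutorService> perDirSenders = new HashMap<>();
for (String localDir : localDirs) {
  perDirSenders.put(localDir, Executors.newFixedThreadPool(2));
}

// A request for a map output stored under localDir is handed to that dir's pool;
// transferMapOutput() is a placeholder for the actual sendfile-based transfer.
perDirSenders.get(localDir)
    .submit(() -> transferMapOutput(channel, mapOutputFilePath));
{code}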



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-5124) AM lacks flow control for task events

2017-10-20 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-5124:
-

Assignee: Peter Bacsko  (was: Haibo Chen)

> AM lacks flow control for task events
> -
>
> Key: MAPREDUCE-5124
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5124
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-5124-proto.2.txt, MAPREDUCE-5124-prototype.txt
>
>
> The AM does not have any flow control to limit the incoming rate of events 
> from tasks.  If the AM is unable to keep pace with the rate of incoming 
> events for a sufficient period of time then it will eventually exhaust the 
> heap and crash.  MAPREDUCE-5043 addressed a major bottleneck for event 
> processing, but the AM could still get behind if it's starved for CPU and/or 
> handling a very large job with tens of thousands of active tasks.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-6984) MR AM to clean up temporary files from previous attempt

2017-10-17 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-6984:
-

Assignee: Gergo Repas

> MR AM to clean up temporary files from previous attempt
> ---
>
> Key: MAPREDUCE-6984
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6984
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster
>Affects Versions: 3.0.0-beta1
>Reporter: Gergo Repas
>Assignee: Gergo Repas
> Attachments: MAPREDUCE-6984.000.patch, MAPREDUCE-6984.001.patch
>
>
> When the MR AM restarts, the 
> {outputDir}/_temporary/{appAttemptNumber} directory 
> remains on HDFS, even though this directory is not used during the next 
> attempt if the restart has been done without recovery. So if recovery is not 
> used for the AM restart, then the deletion of this directory can be done 
> earlier (at the start of the next attempt). The benefit is that more free 
> HDFS space is available for the next attempt.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2017-10-05 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193551#comment-16193551
 ] 

Haibo Chen commented on MAPREDUCE-6441:
---

Thanks for the update, [~rchiang]. We should probably check the Future objects 
returned from invoking all the callables to see if there are exceptions.
Otherwise, the test won't fail even if some of the callables fail.
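
A minimal sketch of that check, assuming the test submits its callables through an 
ExecutorService (the names are illustrative):
{code}
// invokeAll() returns once every callable has finished; calling get() on each
// Future rethrows any failure as an ExecutionException, so the test fails too.
List<Future<Void>> futures = executor.invokeAll(callables);
for (Future<Void> future : futures) {
  future.get();
}
{code}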

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: William Watson
>Assignee: Ray Chiang
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, 
> MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch, MAPREDUCE-6441.008.patch, 
> MAPREDUCE-6441.009.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> If two are kicked off in the same second. The issue is the following lines of 
> code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: 
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-09-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6892:
--
Release Note: This adds new job counters: the number of failed 
MAP/REDUCE tasks and the number of killed MAP/REDUCE tasks.

> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch, 
> MAPREDUCE-6892-006.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2017-09-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162181#comment-16162181
 ] 

Haibo Chen commented on MAPREDUCE-6441:
---

I see. Talking with Daniel, there is really no good way to do this. Can we then 
add a javadoc to this new method to explain the issue with the test?
As suggested by Daniel, we should probably use a barrier (all threads wait on the 
barrier and are released at the same time), which will give us a better chance of 
reproducing this.
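
A minimal sketch of the barrier idea, assuming a test that drives the code under 
test from several threads (the setup() call and names are illustrative):
{code}
// All workers block on the barrier and start the racy section at the same
// instant, which maximizes the chance of two threads colliding on the temp dir.
final CyclicBarrier barrier = new CyclicBarrier(numThreads);
List<Callable<Void>> workers = new ArrayList<>();
for (int i = 0; i < numThreads; i++) {
  workers.add(() -> {
    barrier.await();              // released together with every other worker
    cacheManager.setup(jobConf);  // hypothetical call into the code under test
    return null;
  });
}
{code}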

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: William Watson
>Assignee: Ray Chiang
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, 
> MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> If two are kicked off in the same second. The issue is the following lines of 
> code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: 
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6955) remove unnecessary dependency from hadoop-mapreduce-client-app to hadoop-mapreduce-client-shuffle

2017-09-08 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6955.
---
Resolution: Not A Problem

> remove unnecessary dependency from hadoop-mapreduce-client-app to 
> hadoop-mapreduce-client-shuffle
> -
>
> Key: MAPREDUCE-6955
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6955
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6955) remove unnecessary dependency from hadoop-mapreduce-client-app to hadoop-mapreduce-client-shuffle

2017-09-08 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6955:
-

 Summary: remove unnecessary dependency from 
hadoop-mapreduce-client-app to hadoop-mapreduce-client-shuffle
 Key: MAPREDUCE-6955
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6955
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Haibo Chen
Assignee: Haibo Chen






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6441) Improve temporary directory name generation in LocalDistributedCacheManager for concurrent processes

2017-09-05 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16153860#comment-16153860
 ] 

Haibo Chen commented on MAPREDUCE-6441:
---

bq. but I haven't managed to get it to fail with the old code
My understanding is that the new test is supposed to fail with the old code and 
the new change is supposed to fix the test failure. Otherwise, the new test is 
not testing any new behavior, right?

> Improve temporary directory name generation in LocalDistributedCacheManager 
> for concurrent processes
> 
>
> Key: MAPREDUCE-6441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: William Watson
>Assignee: Ray Chiang
> Attachments: HADOOP-10924.02.patch, 
> HADOOP-10924.03.jobid-plus-uuid.patch, MAPREDUCE-6441.004.patch, 
> MAPREDUCE-6441.005.patch, MAPREDUCE-6441.006.patch
>
>
> Kicking off many sqoop processes in different threads results in:
> {code}
> 2014-08-01 13:47:24 -0400:  INFO - 14/08/01 13:47:22 ERROR tool.ImportTool: 
> Encountered IOException running import job: java.io.IOException: 
> java.util.concurrent.ExecutionException: java.io.IOException: Rename cannot 
> overwrite non empty destination directory 
> /tmp/hadoop-hadoop/mapred/local/1406915233073
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:149)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.(LocalJobRunner.java:163)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> java.security.AccessController.doPrivileged(Native Method)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> javax.security.auth.Subject.doAs(Subject.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:186)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:159)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:239)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:645)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:415)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.run(Sqoop.java:145)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
> 2014-08-01 13:47:24 -0400:  INFO -at 
> org.apache.sqoop.Sqoop.main(Sqoop.java:238)
> {code}
> If two are kicked off in the same second. The issue is the following lines of 
> code in the org.apache.hadoop.mapred.LocalDistributedCacheManager class: 
> {code}
> // Generating unique numbers for FSDownload.
> AtomicLong uniqueNumberGenerator =
>new AtomicLong(System.currentTimeMillis());
> {code}
> and 
> {code}
> Long.toString(uniqueNumberGenerator.incrementAndGet())),
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-31 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149756#comment-16149756
 ] 

Haibo Chen commented on MAPREDUCE-6937:
---

Ah. Thanks [~djp]!

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 2.9.0, 2.8.2, 2.7.5
>
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-31 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6937:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 2.9.0, 2.8.2, 2.7.5
>
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-31 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6937:
--
Fix Version/s: 2.8.2

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 2.9.0, 2.8.2, 2.7.5
>
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-31 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6937:
--
Fix Version/s: 2.9.0

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 2.9.0, 2.7.5
>
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-31 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149184#comment-16149184
 ] 

Haibo Chen commented on MAPREDUCE-6937:
---

The unit tests always fail with the error message "hadoop-mapreduce-client-app: 
There was a timeout or other error in the fork". It timed out while running 
org.apache.hadoop.mapreduce.jobhistory.TestJobHistoryEventHandler (see 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7111/artifact/patchprocess/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app-jdk1.7.0_151.txt).

But it does not seem related to this patch. The same issue existed in the 
Jenkins job testing the MAPREDUCE-6641 patch against branch-2.8.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 2.7.5
>
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-4955) NM container diagnostics for excess resource usage can be lost if task fails while being killed

2017-08-31 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-4955:
-

Assignee: (was: Haibo Chen)

> NM container diagnostics for excess resource usage can be lost if task fails 
> while being killed 
> 
>
> Key: MAPREDUCE-4955
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4955
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.0.3-alpha, 0.23.5
>Reporter: Jason Lowe
>
> When a nodemanager kills a container for being over resource budgets, it 
> provides a diagnostics message for the container status explaining why it was 
> killed.  However this message can be lost if the task fails during the 
> shutdown from the SIGTERM (e.g.: lost DFS leases because filesystem closed) 
> and notifies the AM via the task umbilical *before* the AM receives the NM's 
> container status message via the RM heartbeat.
> In that case the task attempt fails with the task's failure diagnostic, and 
> the user is left wondering exactly why the task failed because the NM's 
> diagnostics arrive too late, are not written to the history file, and are 
> lost.  If the AM receives the container status via the RM heartbeat before 
> the task fails during shutdown then the diagnostics are written properly to 
> the history file, and the user can see why the task failed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Assigned] (MAPREDUCE-6636) Clean up argument parsing in mapred CLI

2017-08-31 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen reassigned MAPREDUCE-6636:
-

Assignee: (was: Haibo Chen)

> Clean up argument parsing in mapred CLI
> ---
>
> Key: MAPREDUCE-6636
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6636
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Robert Kanter
>
> {{org.apache.hadoop.mapreduce.tools.CLI}} manually parses arguments, which 
> has a number of downsides including (a) requiring strict ordering of 
> arguments by the user and (b) it's hard for devs to add new arguments.
> We should replace all of this with a CLI parsing library like 
> {{org.apache.commons.cli.CommandLineParser}}, which is used in a number of 
> other places already.
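
A hedged sketch of what commons-cli-based parsing could look like; the option names 
below are illustrative, not the actual mapred CLI flags:
{code}
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.DefaultParser;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;

public class MapredCliParsingSketch {
  public static void main(String[] args) throws ParseException {
    // Declaring options once removes the strict ordering requirement and makes
    // adding a new argument a one-line change.
    Options options = new Options();
    options.addOption("status", true, "print the status of the given job (illustrative)");
    options.addOption("kill", true, "kill the given job (illustrative)");

    CommandLineParser parser = new DefaultParser();
    CommandLine cmd = parser.parse(options, args);
    if (cmd.hasOption("status")) {
      System.out.println("would query status of job " + cmd.getOptionValue("status"));
    }
  }
}
{code}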



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-6949) yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml

2017-08-30 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen resolved MAPREDUCE-6949.
---
Resolution: Duplicate

> yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml
> ---
>
> Key: MAPREDUCE-6949
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6949
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 3.0.0-alpha4
>Reporter: Haibo Chen
>Assignee: Haibo Chen
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-30 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147977#comment-16147977
 ] 

Haibo Chen commented on MAPREDUCE-6937:
---

Never mind my comments on the patch for 2.9. It's still branch-2 at this point.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 2.7.5
>
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-6949) yarn.app.mapreduce.am.log.level is not documented in mapred-default.xml

2017-08-30 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6949:
-

 Summary: yarn.app.mapreduce.am.log.level is not documented in 
mapred-default.xml
 Key: MAPREDUCE-6949
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6949
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.0.0-alpha4
Reporter: Haibo Chen
Assignee: Haibo Chen
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-08-30 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6892:
--
   Resolution: Fixed
 Hadoop Flags: Incompatible change,Reviewed  (was: Incompatible change)
Fix Version/s: 3.0.0-beta1
   Status: Resolved  (was: Patch Available)

Thanks [~pbacsko] for the contribution. I have committed the patch to trunk.

> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch, 
> MAPREDUCE-6892-006.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-30 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6937:
--
Fix Version/s: 2.7.5

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 2.7.5
>
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-08-30 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147615#comment-16147615
 ] 

Haibo Chen commented on MAPREDUCE-6892:
---

+1 on the latest patch. Will commit it shortly.

> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch, 
> MAPREDUCE-6892-006.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-30 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147567#comment-16147567
 ] 

Haibo Chen commented on MAPREDUCE-6937:
---

[~pbacsko] Can you also upload a patch for 2.9?

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-30 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147554#comment-16147554
 ] 

Haibo Chen commented on MAPREDUCE-6937:
---

+1 on the branch-2.7-v5 patch. I have filed MAPREDUCE-6948 for the unit test 
failure TestJobImpl.testUnusableNodeTransition. 
The checkstyle issues are legit; we could address them in MAPREDUCE-6939.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed

2017-08-30 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6948:
--
Description: 
*Error Message*
expected: but was:

*Stacktrace*
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615)


*Standard out*
{code}
2017-08-30 10:12:21,928 INFO  [Thread-49] event.AsyncDispatcher 
(AsyncDispatcher.java:register(209)) - Registering class 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventType for class 
org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler
2017-08-30 10:12:21,939 INFO  [Thread-49] event.AsyncDispatcher 
(AsyncDispatcher.java:register(209)) - Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.JobEventType for class 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$StubbedJob
2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
(AsyncDispatcher.java:register(209)) - Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.TaskEventType for class 
org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
(AsyncDispatcher.java:register(209)) - Registering class 
org.apache.hadoop.mapreduce.jobhistory.EventType for class 
org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
2017-08-30 10:12:21,940 INFO  [Thread-49] event.AsyncDispatcher 
(AsyncDispatcher.java:register(209)) - Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.JobFinishEvent$Type for class 
org.apache.hadoop.yarn.event.EventHandler$$EnhancerByMockitoWithCGLIB$$79f96ebf
2017-08-30 10:12:21,941 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:setup(1534)) - Adding job token for job_123456789_0001 to 
jobTokenSecretManager
2017-08-30 10:12:21,941 WARN  [Thread-49] impl.JobImpl 
(JobImpl.java:setup(1540)) - Shuffle secret key missing from job credentials. 
Using job token secret as shuffle secret.
2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:makeUberDecision(1305)) - Not uberizing job_123456789_0001 
because: not enabled;
2017-08-30 10:12:21,944 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:createMapTasks(1562)) - Input size for job job_123456789_0001 
= 0. Number of splits = 2
2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:createReduceTasks(1579)) - Number of reduces for job 
job_123456789_0001 = 1
2017-08-30 10:12:21,945 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from NEW 
to INITED
2017-08-30 10:12:21,946 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
INITED to SETUP
2017-08-30 10:12:21,954 INFO  [CommitterEvent Processor #0] 
commit.CommitterEventHandler (CommitterEventHandler.java:run(231)) - Processing 
the event EventType: JOB_SETUP
2017-08-30 10:12:21,978 INFO  [AsyncDispatcher event handler] impl.JobImpl 
(JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from SETUP 
to RUNNING
2017-08-30 10:12:21,983 INFO  [Thread-49] event.AsyncDispatcher 
(AsyncDispatcher.java:register(209)) - Registering class 
org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptEventType for class 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl$5
2017-08-30 10:12:22,000 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:transition(1953)) - Num completed Tasks: 1
2017-08-30 10:12:22,029 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:transition(1953)) - Num completed Tasks: 2
2017-08-30 10:12:22,032 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:actOnUnusableNode(1354)) - TaskAttempt killed because it ran on 
unusable node Mock for NodeId, hashCode: 1280187896. 
AttemptId:attempt_123456789_0001_m_00_0
2017-08-30 10:12:22,032 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:transition(1953)) - Num completed Tasks: 3
2017-08-30 10:12:22,032 INFO  [Thread-49] impl.JobImpl 
(JobImpl.java:handle(1017)) - job_123456789_0001Job Transitioned from 
RUNNING to COMMITTING
2017-08-30 10:12:22,032 ERROR [Thread-49] impl.JobImpl 
(JobImpl.java:handle(1009)) - Can't handle this event at current state
org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
JOB_TASK_COMPLETED at COMMITTING
at 
org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
at 
org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at 
org.a

[jira] [Created] (MAPREDUCE-6948) TestJobImpl.testUnusableNodeTransition failed

2017-08-30 Thread Haibo Chen (JIRA)
Haibo Chen created MAPREDUCE-6948:
-

 Summary: TestJobImpl.testUnusableNodeTransition failed
 Key: MAPREDUCE-6948
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6948
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0-alpha4
Reporter: Haibo Chen


*Error Message*
expected: but was:

*Stacktrace*
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:1041)
at 
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:615)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-30 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147536#comment-16147536
 ] 

Haibo Chen commented on MAPREDUCE-6937:
---

The branch-2.8 Jenkins job failed due to a timeout. I have retriggered the job.

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6870-branch-2.01.patch, 
> MAPREDUCE-6870-branch-2.02.patch, MAPREDUCE-6870-branch-2.7.03.patch, 
> MAPREDUCE-6870-branch-2.7.04.patch, MAPREDUCE-6870-branch-2.7.05.patch, 
> MAPREDUCE-6870_branch2.7.patch, MAPREDUCE-6870_branch2.7v2.patch, 
> MAPREDUCE-6870-branch-2.8.03.patch, MAPREDUCE-6870-branch-2.8.04.patch, 
> MAPREDUCE-6870_branch2.8.patch, MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-08-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6892:
--
Tags:   (was: incom)

> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-08-28 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6892:
--
Tags: incom
Hadoop Flags: Incompatible change

> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-08-28 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143964#comment-16143964
 ] 

Haibo Chen commented on MAPREDUCE-6892:
---

Thanks for the update [~pbacsko]! The patch is very close to being ready. But I 
still think we should give -1 as the default value to stand for unknown in the 
schema (Events.avpr). 
In TestJobHistoryParsing.testHistoryParsingForKilledAndFailedAttempts(), let's 
use // for the comments you've added (/**/ is for class/methods) and remove the 
nested for loop to assert RACK NAME for each of the tasks (I don't think they 
are relevant). Otherwise, the patch LGTM.

> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch, MAPREDUCE-6892-005.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-28 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143883#comment-16143883
 ] 

Haibo Chen commented on MAPREDUCE-6937:
---

Seems like the patch is not picked up by Jenkins. Can you also rename the patch 
in the format of [jira name]-[branch name].[patch version].patch, e.g. 
MAPREDUCE-6870-branch-2.7.02.patch?

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Erik Krogen
> Attachments: MAPREDUCE-6870_branch2.7.patch, 
> MAPREDUCE-6870_branch2.7v2.patch, MAPREDUCE-6870_branch2.8.patch, 
> MAPREDUCE-6870_branch2.8v2.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6937) Backport MAPREDUCE-6870 to branch-2 while preserving compatibility

2017-08-28 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143857#comment-16143857
 ] 

Haibo Chen commented on MAPREDUCE-6937:
---

[~pbacsko], the default value for mapreduce.job.finish-when-all-reducers-done 
should be false to preserve the current behavior in branch-2.7 and 2.8.
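
For reference, a minimal sketch of a job opting in explicitly (the property name is 
taken from the comment above; the submission boilerplate is illustrative):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReducerDrivenFinishSketch {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();
    // With the branch-2 default of false, existing jobs keep the old behavior;
    // a job that wants to finish as soon as all reducers complete opts in here.
    conf.setBoolean("mapreduce.job.finish-when-all-reducers-done", true);
    return Job.getInstance(conf, "finish-when-all-reducers-done-example");
  }
}
{code}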

> Backport MAPREDUCE-6870 to branch-2 while preserving compatibility
> --
>
> Key: MAPREDUCE-6937
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6937
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Zhe Zhang
>Assignee: Erik Krogen
> Attachments: MAPREDUCE-6870_branch2.7.patch, 
> MAPREDUCE-6870_branch2.8.patch
>
>
> To maintain compatibility we need to disable this by default per discussion 
> on MAPREDUCE-6870.
> Using a separate JIRA to correctly track incompatibilities.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-08-24 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140481#comment-16140481
 ] 

Haibo Chen commented on MAPREDUCE-6892:
---

Thanks @Peter for the update; I'll wait for you to fix the test failure before 
I review. 
I do think the default value for killed/failed should be -1 in the schema. -1 
means not available/known IIUC.
In our case, the killed and failed counters were not recorded previously, thus 
I think -1 makes more sense.
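
A small consumer-side sketch of what a -1 default would mean when reading an old 
jhist file (the accessor name comes from the issue description; the guard itself is 
an assumption, not the patch):
{code}
import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;

public class FailedMapCountSketch {
  static String describeFailedMaps(JobInfo jobInfo) {
    long failedMaps = jobInfo.getFailedMaps();
    // -1 would mean the AM that wrote this history file never recorded the
    // counter, which is different from "zero maps failed".
    return failedMaps == -1
        ? "failed map count not recorded in this jhist file"
        : "failed maps: " + failedMaps;
  }
}
{code}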

> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch, MAPREDUCE-6892-004.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6936) Remove unnecessary dependency of hadoop-yarn-server-common from hadoop-mapreduce-client-common

2017-08-16 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129637#comment-16129637
 ] 

Haibo Chen commented on MAPREDUCE-6936:
---

Thanks [~rkanter] for the review!

> Remove unnecessary dependency of hadoop-yarn-server-common from 
> hadoop-mapreduce-client-common 
> ---
>
> Key: MAPREDUCE-6936
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6936
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Fix For: 2.9.0, 3.0.0-beta1
>
> Attachments: MAPREDUCE-6936.00.patch
>
>
> The dependency of hadoop-yarn-server-common in hadoop-mapreduce-client-common 
> seems unnecessary, as 
> it is not using any of the classes from hadoop-yarn-server-common. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-11 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6870:
--
Hadoop Flags: Incompatible change,Reviewed  (was: Reviewed)

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124114#comment-16124114
 ] 

Haibo Chen commented on MAPREDUCE-6870:
---

Agreed.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123984#comment-16123984
 ] 

Haibo Chen commented on MAPREDUCE-6870:
---

In some cases, the single mapper can take hours to finish, thus delaying job 
completion by hours. We definitely want to default to false in 2.x for 
compatibility. For trunk, I think it is a good opportunity to fix it as an 
incompatible change, unless folks think strongly otherwise. IMO, it's better to 
fail the niche case in order to not confuse average users.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6870) Add configuration for MR job to finish when all reducers are complete (even with unfinished mappers)

2017-08-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123624#comment-16123624
 ] 

Haibo Chen commented on MAPREDUCE-6870:
---

Thanks [~xkrogen] for pointing out the incompatibility! I agree. Maybe we even 
need to flip the default to false in order to avoid the incompatibility, since the 
case this is trying to fix is an edge case to begin with.

> Add configuration for MR job to finish when all reducers are complete (even 
> with unfinished mappers)
> 
>
> Key: MAPREDUCE-6870
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6870
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.6.1
>Reporter: Zhe Zhang
>Assignee: Peter Bacsko
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6870-001.patch, MAPREDUCE-6870-002.patch, 
> MAPREDUCE-6870-003.patch, MAPREDUCE-6870-004.patch, MAPREDUCE-6870-005.patch, 
> MAPREDUCE-6870-006.patch, MAPREDUCE-6870-007.patch
>
>
> Even with MAPREDUCE-5817, there could still be cases where mappers get 
> scheduled before all reducers are complete, but those mappers run for long 
> time, even after all reducers are complete. This could hurt the performance 
> of large MR jobs.
> In some cases, mappers don't have any materialize-able outcome other than 
> providing intermediate data to reducers. In that case, the job owner should 
> have the config option to finish the job once all reducers are complete.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-08-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123608#comment-16123608
 ] 

Haibo Chen commented on MAPREDUCE-6892:
---

Thanks [~pbacsko] for updating the patch! A few more comments:
1) In JobFinishedEvent, can we add code in getDatum() and setDatum() to handle 
the newly added fields?
2) In UnparsedJob, let's return -1 to be consistent with getCompletedMaps() and 
getCompletedReduce(). Similarly for PartialJob, let's also return -1 to 
indicate the info is not available.
3) JobImpl.getCompletedMaps() returns successMap + killedMap + FailedMap, 
whereas CompletedJob.getCompletedMaps() returns only successMap. 
Let's do the same in CompletedJob.getCompletedMaps() as well as in 
CompletedJob.getCompletedReduce().
4) In Job20LineHistoryEventEmitter, how much work is it to also parse the 
failed/killed map/reduce counters (I am not familiar with this code)? I am 
OK to leave it if it is too much.
5) Not an issue with this patch, but let's also set the killed/failed counters in 
JobHistoryParser.handleJobFinishedEvent().
6) CompletedJob.getKillReduces() should return (int) jobInfo.getKilledReduces();
7) Rename JobSummary.getNumFinishedMaps() to getNumSucceededMaps(). Also, let's 
add summary.setNumKilled[Map/Reduce] in TestJobSummary.before() as well.

Can you look into the test failure and fix it if possible? 



> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-08-11 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6892:
--
Comment: was deleted

(was: Thanks [~pbacsko] for updating the patch! A few more comments:
1) In JobFinishedEvent, can we add code in getDatum and setDatum() to handle 
the newly added fields?
2) In UnparsedJob, let return -1 to be consistent with getCompleted)

> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6892) Issues with the count of failed/killed tasks in the jhist file

2017-08-11 Thread Haibo Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16123520#comment-16123520
 ] 

Haibo Chen commented on MAPREDUCE-6892:
---

Thanks [~pbacsko] for updating the patch! A few more comments:
1) In JobFinishedEvent, can we add code in getDatum and setDatum() to handle 
the newly added fields?
2) In UnparsedJob, let return -1 to be consistent with getCompleted

> Issues with the count of failed/killed tasks in the jhist file
> --
>
> Key: MAPREDUCE-6892
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6892
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client, jobhistoryserver
>Reporter: Peter Bacsko
>Assignee: Peter Bacsko
> Attachments: MAPREDUCE-6892-001.patch, MAPREDUCE-6892-002.PATCH, 
> MAPREDUCE-6892-003.patch
>
>
> Recently we encountered some issues with the value of failed tasks. After 
> parsing the jhist file, {{JobInfo.getFailedMaps()}} returned 0, but actually 
> there were failures. 
> Another minor thing is that you cannot get the number of killed tasks 
> (although this can be calculated).
> The root cause is that {{JobUnsuccessfulCompletionEvent}} contains only the 
> successful map/reduce task counts. Number of failed (or killed) tasks are not 
> stored.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org


