date:20120223


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214542#comment-13214542
 ] 

Hudson commented on MAPREDUCE-3787:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1840 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1840/])
MAPREDUCE-3787. [Gridmix] Optimize job monitoring and STRESS mode for 
faster job submission. (amarrk) (Revision 1292736)

 Result = SUCCESS
amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292736
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/ExecutionSummarizer.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixJob.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/JobFactory.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/JobMonitor.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/JobSubmitter.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Statistics.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/StressJobFactory.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestGridmixStatistics.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestGridmixSubmission.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestGridmixSummary.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestSleepJob.java
* /hadoop/common/trunk/hadoop-mapreduce-project/src/docs/src/documentation/content/xdocs/gridmix.xml


 [Gridmix] Improve STRESS mode
 -

 Key: MAPREDUCE-3787
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3787
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/gridmix
Affects Versions: 0.24.0
Reporter: Amar Kamat
Assignee: Amar Kamat
  Labels: gridmix, stress
 Fix For: 0.23.1

 Attachments: MAPREDUCE-3787-v1.12.patch, MAPREDUCE-3787-v1.9.patch


 Gridmix STRESS mode can be improved as follows:
 1. The sleep time in JobMonitor can be reduced and/or made configurable
 2. Map and reduce load calculation in StressJobFactory can be done in one loop
 3. The overload status can be updated inline from the job submitter thread
 4. Unnecessary progress checks (which in turn cause delays) can be avoided
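The second item (computing map and reduce load in a single loop) can be pictured with the minimal Java sketch below. It is not the StressJobFactory code: `LoadSketch`, `JobLoad`, `ClusterLoad`, and the `overloadFactor` threshold are hypothetical names, assuming the load is simply the count of incomplete map and reduce tasks compared against slot capacity.

```java
import java.util.List;

// Hypothetical sketch, not StressJobFactory: one pass over the running jobs
// accumulates both map and reduce load.
final class LoadSketch {

    /** Hypothetical per-job snapshot of incomplete task counts. */
    record JobLoad(int incompleteMaps, int incompleteReduces) {}

    /** Hypothetical aggregate produced by a single pass. */
    record ClusterLoad(long pendingMaps, long pendingReduces) {}

    /** A single loop computes map and reduce load together. */
    static ClusterLoad computeLoad(List<JobLoad> runningJobs) {
        long maps = 0;
        long reduces = 0;
        for (JobLoad job : runningJobs) {
            maps += job.incompleteMaps();
            reduces += job.incompleteReduces();
        }
        return new ClusterLoad(maps, reduces);
    }

    /** Overloaded if either load exceeds its slot capacity times a hypothetical factor. */
    static boolean isOverloaded(ClusterLoad load, int mapSlots, int reduceSlots,
                                double overloadFactor) {
        return load.pendingMaps() > overloadFactor * mapSlots
            || load.pendingReduces() > overloadFactor * reduceSlots;
    }

    public static void main(String[] args) {
        ClusterLoad load = computeLoad(List.of(new JobLoad(10, 2), new JobLoad(5, 1)));
        System.out.println(load + " overloaded=" + isOverloaded(load, 10, 4, 2.0));
    }
}
```

Folding both counters into one loop halves the number of passes over the job list, which matters when the check runs before every submission.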

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214545#comment-13214545
 ] 

Hudson commented on MAPREDUCE-3787:
---

Integrated in Hadoop-Common-trunk-Commit #1766 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1766/])
MAPREDUCE-3787. [Gridmix] Optimize job monitoring and STRESS mode for 
faster job submission. (amarrk) (Revision 1292736)

 Result = SUCCESS
amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292736

[jira] [Resolved] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode

2012-02-23 Thread Amar Kamat (Resolved) (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Amar Kamat resolved MAPREDUCE-3787.
---

Resolution: Fixed
Release Note: JobMonitor can now deploy multiple threads for faster
job-status polling. Use 'gridmix.job-monitor.thread-count' to set the number of
threads. Stress mode now relies on the updates from the job monitor instead of
polling for job status. Failures in job submission are now reported to the
statistics module and, ultimately, to the user via the summary.
Hadoop Flags: Reviewed
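The monitoring model in the release note can be sketched as follows, assuming a shared queue of jobs drained by a fixed pool of polling threads that push completion events to a listener (the way a stress-mode submitter could react to updates instead of polling on its own). `MonitorSketch`, `PolledJob`, `completingAfter`, and `runDemo` are hypothetical; the `threadCount` argument only plays the role described for 'gridmix.job-monitor.thread-count', and the real JobMonitor may be structured differently.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch of a multi-threaded job monitor; not the Gridmix JobMonitor.
final class MonitorSketch {

    /** Hypothetical stand-in for a submitted job whose status can be polled. */
    interface PolledJob {
        String name();
        boolean isComplete();  // one status poll
    }

    private final BlockingQueue<PolledJob> queue = new LinkedBlockingQueue<>();
    private final ExecutorService pollers;
    private final Consumer<PolledJob> onComplete;
    private final long pollIntervalMs;

    MonitorSketch(int threadCount, long pollIntervalMs, Consumer<PolledJob> onComplete) {
        this.pollers = Executors.newFixedThreadPool(threadCount);
        this.pollIntervalMs = pollIntervalMs;
        this.onComplete = onComplete;
        for (int i = 0; i < threadCount; i++) {
            pollers.execute(this::pollLoop);  // one polling loop per thread
        }
    }

    void add(PolledJob job) {
        queue.add(job);
    }

    /** Each polling thread: take a job, poll it once, then report or re-queue it. */
    private void pollLoop() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                PolledJob job = queue.take();
                if (job.isComplete()) {
                    onComplete.accept(job);        // push the update to the listener
                } else {
                    queue.add(job);                // not done yet: poll it again later
                    Thread.sleep(pollIntervalMs);  // short, configurable pause
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // shutdown requested
        }
    }

    void shutdown() {
        pollers.shutdownNow();
    }

    /** Demo helper: a job that reports completion on its n-th poll. */
    static PolledJob completingAfter(String name, int polls) {
        AtomicInteger seen = new AtomicInteger();
        return new PolledJob() {
            public String name() { return name; }
            public boolean isComplete() { return seen.incrementAndGet() >= polls; }
        };
    }

    /** Demo: monitor three jobs with the given thread count; return the completed names. */
    static Set<String> runDemo(int threadCount) {
        Set<String> done = ConcurrentHashMap.newKeySet();
        CountDownLatch latch = new CountDownLatch(3);
        MonitorSketch monitor = new MonitorSketch(threadCount, 1, job -> {
            done.add(job.name());
            latch.countDown();
        });
        try {
            monitor.add(completingAfter("a", 1));
            monitor.add(completingAfter("b", 3));
            monitor.add(completingAfter("c", 5));
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            monitor.shutdown();
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(2));
    }
}
```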

I just committed this to trunk! Thanks, Ravi, for the review.


[jira] [Commented] (MAPREDUCE-3829) [Gridmix] Gridmix should give better error message when input-data directory already exists and -generate option is given

2012-02-23 Thread Amar Kamat (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214551#comment-13214551
 ] 

Amar Kamat commented on MAPREDUCE-3829:
---

Ravi,
Should we reuse the 'STARTUP_FAILED_ERROR' in DistributedCacheEmulator? LOG
statements should point to the real cause of the error. Let's try to keep all
the error codes in one place, i.e., Gridmix.java. Other changes look good to me.

 [Gridmix] Gridmix should give better error message when input-data directory 
 already exists and -generate option is given
 -

 Key: MAPREDUCE-3829
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3829
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Attachments: 3829.v0.patch


 Instead of dumping exceptions onto the console, Gridmix should give a better
 error message when the input-data directory already exists and the -generate
 option is given.
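A minimal sketch of the requested behavior, combined with the review suggestion to keep error codes in one place and let the log line carry the real cause. Only the name `STARTUP_FAILED_ERROR` comes from the comment above; its numeric value, `SUCCESS`, `ErrorCodesSketch`, `checkGenerateInput`, and the message wording are hypothetical, and the actual Gridmix check may differ.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical sketch: one shared exit code, with the log line naming the real cause.
final class ErrorCodesSketch {
    // Error codes kept in one place. Only the name STARTUP_FAILED_ERROR comes from
    // the review comment; both numeric values are hypothetical.
    static final int SUCCESS = 0;
    static final int STARTUP_FAILED_ERROR = 1;

    /** Returns an exit code and prints a one-line cause instead of a stack trace. */
    static int checkGenerateInput(Path inputDir, boolean generate) {
        if (generate && Files.exists(inputDir)) {
            System.err.println("Input-data directory " + inputDir + " already exists; "
                + "remove it or run without -generate.");
            return STARTUP_FAILED_ERROR;
        }
        return SUCCESS;
    }

    public static void main(String[] args) {
        Path existing = Paths.get(System.getProperty("java.io.tmpdir"));
        Path missing = existing.resolve("gridmix-" + UUID.randomUUID());
        System.out.println(checkGenerateInput(existing, true));
        System.out.println(checkGenerateInput(missing, true));
    }
}
```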


[jira] [Commented] (MAPREDUCE-3728) ShuffleHandler can't access results when configured in a secure mode

[
https://issues.apache.org/jira/browse/MAPREDUCE-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214553#comment-13214553
]

Hadoop QA commented on MAPREDUCE-3728:
--

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515738/MAPREDUCE-3728.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1914//testReport/
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1914//console

This message is automatically generated.

ShuffleHandler can't access results when configured in a secure mode

Key: MAPREDUCE-3728
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3728
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Roman Shaposhnik
Assignee: Ahmed Radwan
Priority: Critical
Fix For: 0.23.1

Attachments: MAPREDUCE-3728.patch

While running the simplest of jobs (Pi) on MR2 in a fully secure
configuration, I noticed that the job was failing on the reduce side with
the following messages littering the nodemanager logs:
{noformat}
2012-01-19 08:35:32,544 ERROR org.apache.hadoop.mapred.ShuffleHandler:
Shuffle error
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
usercache/rvs/appcache/application_1326928483038_0001/output/attempt_1326928483038_0001_m_03_0/file.out.index
in any of the configured local directories
{noformat}
Digging further, I found that the permissions on the files/dirs were
preventing the nodemanager (running as the user yarn) from accessing these files:
{noformat}
$ ls -l
/data/3/yarn/usercache/testuser/appcache/application_1327102703969_0001/output/attempt_1327102703969_0001_m_01_0
-rw-r- 1 testuser testuser 28 Jan 20 15:41 file.out
-rw-r- 1 testuser testuser 32 Jan 20 15:41 file.out.index
{noformat}
Digging even further revealed that the group-sticky bit that was faithfully
put on all the subdirectories between testuser and
application_1327102703969_0001 was gone from output and
attempt_1327102703969_0001_m_01_0.
Looking into how these subdirectories are created
(org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.initDirs())
{noformat}
// $x/usercache/$user/appcache/$appId/filecache
Path appFileCacheDir = new Path(appBase, FILECACHE);
appsFileCacheDirs[i] = appFileCacheDir.toString();
lfs.mkdir(appFileCacheDir, null, false);
// $x/usercache/$user/appcache/$appId/output
lfs.mkdir(new Path(appBase, OUTPUTDIR), null, false);
{noformat}
reveals that lfs.mkdir ends up manipulating permissions and thus clears the
sticky bit from output and filecache.
At this point I'm at a loss as to how this is supposed to work. My understanding
was that the whole sequence of events here was predicated on the sticky bit
being set, so that daemons running as the user yarn (default group yarn) can
access the resulting files and subdirectories at output and below. Please let
me know if I'm missing something or whether this is just a bug that needs to
be fixed.
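To make the reported mechanism concrete, here is a simplified Java model (not Hadoop code) of standard POSIX behavior on Linux: an entry created inside a directory that has the setgid bit (what the report calls the group-sticky bit) takes that directory's group, and new subdirectories inherit the bit; without it, the creator's primary group is used. The group `yarn` on the application directory follows the report's description of the daemons' default group; the modes and the `clearSetgid` flag, which stands in for an explicit permission reset such as the one suspected in `lfs.mkdir`, are assumptions.

```java
// Simplified model of POSIX setgid group inheritance; not Hadoop code.
final class SetgidSketch {

    /** A file or directory: owner, group, whether setgid is set, and rwx mode bits. */
    record Entry(String owner, String group, boolean setgid, int mode) {}

    /** New entries take the parent's group if the parent is setgid, else the creator's primary group. */
    static String inheritedGroup(Entry parent, String primaryGroup) {
        return parent.setgid() ? parent.group() : primaryGroup;
    }

    /** mkdir; clearSetgid models an explicit permission reset right after creation. */
    static Entry mkdir(Entry parent, String user, String primaryGroup, int mode,
                       boolean clearSetgid) {
        boolean setgid = parent.setgid() && !clearSetgid;  // subdirectories inherit setgid
        return new Entry(user, inheritedGroup(parent, primaryGroup), setgid, mode);
    }

    static Entry createFile(Entry parent, String user, String primaryGroup, int mode) {
        return new Entry(user, inheritedGroup(parent, primaryGroup), false, mode);
    }

    /** Group-read check only; owner and other bits are ignored for brevity. */
    static boolean groupCanRead(Entry e, String readerGroup) {
        return e.group().equals(readerGroup) && (e.mode() & 0040) != 0;
    }

    /** Walks appBase -> output -> attempt -> file.out as testuser; can group yarn read it? */
    static boolean nodeManagerCanRead(boolean clearSetgidOnOutput) {
        Entry appBase = new Entry("testuser", "yarn", true, 0750);
        Entry output = mkdir(appBase, "testuser", "testuser", 0750, clearSetgidOnOutput);
        Entry attempt = mkdir(output, "testuser", "testuser", 0750, false);
        Entry fileOut = createFile(attempt, "testuser", "testuser", 0640);
        return groupCanRead(fileOut, "yarn");
    }

    public static void main(String[] args) {
        System.out.println("setgid kept:    " + nodeManagerCanRead(false));
        System.out.println("setgid cleared: " + nodeManagerCanRead(true));
    }
}
```

In this model, clearing the bit once, at output, is enough: output itself still gets group yarn from its parent, but everything created below it falls back to testuser's own group, matching the testuser:testuser ownership shown in the listing.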
On a related note, when the shuffle side of the Pi job failed, the job itself
didn't. It went into an endless loop and only exited when it had exhausted all
the local storage for the log files (at which point the nodemanager died and
thus the job ended). Perhaps this is an even more serious side effect of this
issue, one that needs to be investigated separately.

[jira] [Commented] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214559#comment-13214559
 ] 

Hudson commented on MAPREDUCE-3787:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #1777 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/1777/])
MAPREDUCE-3787. [Gridmix] Optimize job monitoring and STRESS mode for 
faster job submission. (amarrk) (Revision 1292736)

 Result = ABORTED
amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292736

[jira] [Commented] (MAPREDUCE-3787) [Gridmix] Improve STRESS mode


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214584#comment-13214584
 ] 

Hudson commented on MAPREDUCE-3787:
---

Integrated in Hadoop-Hdfs-trunk #964 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/964/])
MAPREDUCE-3787. [Gridmix] Optimize job monitoring and STRESS mode for 
faster job submission. (amarrk) (Revision 1292736)

 Result = SUCCESS
amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292736

[jira] [Commented] (MAPREDUCE-3884) PWD should be first in the classpath of MR tasks


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214582#comment-13214582
 ] 

Hudson commented on MAPREDUCE-3884:
---

Integrated in Hadoop-Hdfs-trunk #964 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/964/])
MAPREDUCE-3884. PWD should be first in the classpath of MR tasks (tucu) 
(Revision 1292424)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292424
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java


 PWD should be first in the classpath of MR tasks
 

 Key: MAPREDUCE-3884
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3884
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.2
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
Priority: Critical
 Fix For: 0.23.2

 Attachments: MAPREDUCE-3884.patch, MAPREDUCE-3884.patch


 Currently, the current working directory is not part of the classpath. This
 is a regression from MR1, and existing applications that rely on it fail to
 work properly.
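A minimal sketch of the ordering the title asks for, assuming the task launch environment expands `$PWD` to the container's working directory. `ClasspathSketch.taskClasspath` is a hypothetical helper, not the MRApps change itself; it places the working directory first and drops any later duplicate of it.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not the MRApps change: working directory first on the task classpath.
final class ClasspathSketch {

    /** Builds a CLASSPATH value with $PWD first; later duplicates are dropped. */
    static String taskClasspath(List<String> otherEntries, String separator) {
        Set<String> entries = new LinkedHashSet<>();  // insertion order, no duplicates
        entries.add("$PWD");                          // assumed to be expanded by the launch script
        entries.addAll(otherEntries);
        return String.join(separator, entries);
    }

    public static void main(String[] args) {
        System.out.println(taskClasspath(List.of("$HADOOP_CONF_DIR", "$PWD", "job.jar"), ":"));
    }
}
```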


[jira] [Commented] (MAPREDUCE-3884) PWD should be first in the classpath of MR tasks


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214589#comment-13214589
 ] 

Hudson commented on MAPREDUCE-3884:
---

Integrated in Hadoop-Hdfs-0.23-Build #177 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/177/])
Merge -r 1292423:1292424 from trunk to branch. FIXES: MAPREDUCE-3884 
(Revision 1292427)

 Result = UNSTABLE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292427
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/test/java/org/apache/hadoop/mapreduce/v2/util/TestMRApps.java



[jira] [Commented] (MAPREDUCE-3884) PWD should be first in the classpath of MR tasks


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214620#comment-13214620
 ] 

Hudson commented on MAPREDUCE-3884:
---

Integrated in Hadoop-Mapreduce-0.23-Build #205 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Build/205/])
Merge -r 1292423:1292424 from trunk to branch. FIXES: MAPREDUCE-3884 
(Revision 1292427)

 Result = FAILURE
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292427

[jira] [Commented] (MAPREDUCE-3884) PWD should be first in the classpath of MR tasks


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214645#comment-13214645
 ] 

Hudson commented on MAPREDUCE-3884:
---

Integrated in Hadoop-Mapreduce-trunk #999 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/999/])
MAPREDUCE-3884. PWD should be first in the classpath of MR tasks (tucu) 
(Revision 1292424)

 Result = SUCCESS
tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292424

[jira] [Commented] (MAPREDUCE-3034) NM should act on a REBOOT command from RM

2012-02-23 Thread Eric Payne (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214790#comment-13214790
 ] 

Eric Payne commented on MAPREDUCE-3034:
---

@Devaraj,

Can you please upmerge the patch to the latest code in 
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.23

Thanks!

 NM should act on a REBOOT command from RM
 -

 Key: MAPREDUCE-3034
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3034
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0, 0.24.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Devaraj K
Priority: Critical
 Attachments: MAPREDUCE-3034-1.patch, MAPREDUCE-3034-2.patch, 
 MAPREDUCE-3034-3.patch, MAPREDUCE-3034-4.patch, MAPREDUCE-3034.patch, 
 MR-3034.txt


 The RM sends a reboot command to the NM in some cases, such as when the NM is
 lost and then rejoins. In such a case, the NM should act on the command and
 reboot/reinitialize itself.
 This is akin to the TT reinitializing on orders from the JT. We will need to
 shut down all the services properly and reinitialize; this should
 automatically take care of killing containers, cleaning up local temporary
 files, etc.
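The stop-then-reinitialize sequence described above might look like the following sketch. Everything here is hypothetical (`RebootSketch`, `RmCommand`, `Service`, and the `recording` demo helper); it is not the YARN NodeManager API, and in a real NM each service's own `stop()` would be where containers are killed and local files cleaned up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "stop everything, then reinitialize" on a reboot command.
final class RebootSketch {

    /** Hypothetical commands carried in a heartbeat response. */
    enum RmCommand { NORMAL, REBOOT, SHUTDOWN }

    /** Hypothetical service lifecycle. */
    interface Service {
        void init();
        void start();
        void stop();
    }

    private final List<Service> services;

    RebootSketch(List<Service> services) {
        this.services = List.copyOf(services);
    }

    void startAll() {
        for (Service s : services) {
            s.init();
            s.start();
        }
    }

    /** Stop in reverse start order, so dependents stop before what they depend on. */
    void stopAll() {
        for (int i = services.size() - 1; i >= 0; i--) {
            services.get(i).stop();
        }
    }

    void onHeartbeat(RmCommand command) {
        switch (command) {
            case REBOOT -> {
                stopAll();   // a real service's stop() would kill containers, clean temp files
                startAll();  // fresh init + start, as if the NM had just launched
            }
            case SHUTDOWN -> stopAll();
            case NORMAL -> { }
        }
    }

    /** Demo helper: a service that records its lifecycle calls. */
    static Service recording(String name, List<String> log) {
        return new Service() {
            public void init() { log.add("init:" + name); }
            public void start() { log.add("start:" + name); }
            public void stop() { log.add("stop:" + name); }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        RebootSketch nm = new RebootSketch(List.of(recording("a", log), recording("b", log)));
        nm.startAll();
        log.clear();
        nm.onHeartbeat(RmCommand.REBOOT);
        System.out.println(log);
    }
}
```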


[jira] [Created] (MAPREDUCE-3902) MR AM should reuse containers for map tasks

2012-02-23 Thread Arun C Murthy (Created) (JIRA)

MR AM should reuse containers for map tasks
---

 Key: MAPREDUCE-3902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster, mrv2
Reporter: Arun C Murthy
Assignee: Arun C Murthy


The MR AM is now in a great position to reuse containers across (map) tasks. 
This is similar to the JVM reuse we had in 0.20.x, but significantly better:
# Consider data locality when reusing containers
# Consider the new shuffle: ensure that reduces fetch the output of the whole
container at once (i.e., all maps)
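The first point, locality-aware reuse, can be sketched as a choice made whenever a container frees up: prefer a pending map whose split is on that host, otherwise take any pending map. `ReuseSketch` and `MapTask` are hypothetical names; the fallback policy is an assumption, and a real AM might instead release the container when no local work remains.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of choosing the next map for a freed container.
final class ReuseSketch {

    /** Hypothetical pending map task and the hosts holding its input split. */
    record MapTask(String id, List<String> splitHosts) {}

    private final Deque<MapTask> pending = new ArrayDeque<>();

    void addPending(MapTask task) {
        pending.addLast(task);
    }

    /** Picks the next task to run in a freed container on the given host. */
    Optional<MapTask> nextTaskFor(String host) {
        for (Iterator<MapTask> it = pending.iterator(); it.hasNext(); ) {
            MapTask task = it.next();
            if (task.splitHosts().contains(host)) {  // node-local: reuse the container for it
                it.remove();
                return Optional.of(task);
            }
        }
        return Optional.ofNullable(pending.pollFirst());  // no local work: take any (assumed policy)
    }

    public static void main(String[] args) {
        ReuseSketch am = new ReuseSketch();
        am.addPending(new MapTask("m1", List.of("h2")));
        am.addPending(new MapTask("m2", List.of("h1", "h3")));
        System.out.println(am.nextTaskFor("h1"));
        System.out.println(am.nextTaskFor("h1"));
    }
}
```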


[jira] [Assigned] (MAPREDUCE-3897) capacity scheduler - maxActiveApplicationsPerUser calculation can be wrong

2012-02-23 Thread Eric Payne (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Payne reassigned MAPREDUCE-3897:
-

Assignee: Eric Payne

 capacity scheduler - maxActiveApplicationsPerUser calculation can be wrong
 --

 Key: MAPREDUCE-3897
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3897
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Assignee: Eric Payne
Priority: Critical

 The capacity scheduler calculates the maxActiveApplications and the 
 maxActiveApplicationsPerUser based on the config 
 yarn.scheduler.capacity.maximum-applications or default 1.
 MaxActiveApplications = max(ceil(clusterMemory / minAllocation * 
 maxAMResource% * absoluteMaxCapacity), 1)
 MaxActiveAppsPerUser = max(ceil(maxActiveApplicationsComputedAbove * 
 (userLimit% / 100) * userLimitFactor), 1)
 maxActiveApplications is already multiplied by the queue's absolute MAXIMUM 
 capacity. If max capacity > capacity, the user limit factor is 1 (the 
 default), and only one user is running, that user will not be allowed to use 
 more than the queue capacity, so computing the limit relative to MAX capacity 
 doesn't make sense. That user could easily end up in a deadlock, with all of 
 its space used by application masters.
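Plugging illustrative numbers into the two formulas above shows the effect being described. The sketch below is a direct transcription of those formulas; the class and parameter names are hypothetical, not the CapacityScheduler's fields. With a 102400 MB cluster, a 1024 MB minimum allocation and maxAMResource% of 0.25, an absolute max capacity of 1.0 gives 25 active applications, while a queue absolute capacity of 0.5 would give 13.

```java
// Hypothetical transcription of the formulas quoted in the issue description.
final class MaxActiveAppsSketch {

    // MaxActiveApplications = max(ceil(clusterMemory / minAllocation
    //     * maxAMResource% * absoluteMaxCapacity), 1)
    static int maxActiveApplications(long clusterMemoryMb, long minAllocationMb,
                                     float maxAmResourcePercent, float absoluteMaxCapacity) {
        double value = Math.ceil((double) clusterMemoryMb / minAllocationMb
                * maxAmResourcePercent * absoluteMaxCapacity);
        return Math.max((int) value, 1);
    }

    // MaxActiveAppsPerUser = max(ceil(maxActiveApplications
    //     * (userLimit% / 100) * userLimitFactor), 1)
    static int maxActiveApplicationsPerUser(int maxActiveApplications,
                                            int userLimitPercent, float userLimitFactor) {
        double value = Math.ceil(maxActiveApplications
                * (userLimitPercent / 100.0) * userLimitFactor);
        return Math.max((int) value, 1);
    }
}
```

Because the per-user limit is derived from the max-capacity figure, a single user limited to the queue capacity can still be allowed as many AMs as the larger number permits.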


[jira] [Updated] (MAPREDUCE-3902) MR AM should reuse containers for map tasks

2012-02-23 Thread Arun C Murthy (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-3902:
-

Attachment: MAPREDUCE-3902.patch

Ok, I spent a long (isolated) flight on this - it clearly needs more work, but 
it's a start. *smile*

This patch improves the classic JVM re-use on both dimensions described in the 
jira.

We need to pay more attention to the user interface. Some options:
# Allow the user to specify the actual number of map slots to be used (supported 
now in the patch)
# Allow the user to specify a target block size for maps (greater than the real 
HDFS block size), i.e. to get around the small-files problem.
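One way the second option could work, sketched in Java: greedily pack consecutive input splits into groups no larger than a user-supplied target size, so that each group runs in a single container. `SplitGroupingSketch` and the greedy, order-preserving packing are assumptions for illustration, not what the patch does; a real version would also need to weigh split locality.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group small input splits up to a target size per map.
final class SplitGroupingSketch {

    // Packs split sizes, in input order, into groups whose total does not exceed
    // targetBytes. A split larger than the target still gets a group of its own.
    static List<List<Long>> group(long[] splitSizes, long targetBytes) {
        if (targetBytes <= 0) {
            throw new IllegalArgumentException("targetBytes must be positive");
        }
        List<List<Long>> groups = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : splitSizes) {
            // Close the current group if adding this split would overflow it.
            if (!current.isEmpty() && currentBytes + size > targetBytes) {
                groups.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```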

Thoughts?

 MR AM should reuse containers for map tasks
 ---

 Key: MAPREDUCE-3902
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3902
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: applicationmaster, mrv2
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Attachments: MAPREDUCE-3902.patch


 The MR AM is now in a great position to reuse containers across (map) tasks. 
 This is similar to the JVM re-use we had in 0.20.x, but done in a 
 significantly better manner:
 # Consider data-locality when re-using containers
 # Consider the new shuffle - ensure that reduces fetch the output of the whole 
 container (i.e. all of its maps) at once


[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page

2012-02-23 Thread Thomas Graves (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214807#comment-13214807
 ] 

Thomas Graves commented on MAPREDUCE-3878:
--

+1 looks good.  Thanks Jon.

 Null user on filtered jobhistory job page
 -

 Key: MAPREDUCE-3878
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3878
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Critical
 Attachments: MAPREDUCE-3878.patch


 If jobhistory/job.* is filtered to bypass ACLs, the resulting page will always 
 show a null user. This differs from 0.20, where filtering on this page bypasses 
 security to allow all access to the page. The filter essentially passes a null 
 user to AppController, where an exception is thrown. If a null user is 
 detected, we should assume ACL checking is disabled on this page.
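A minimal sketch of the behaviour the description asks for, assuming a hypothetical `checkAccess` helper rather than the real AppController code: a null remote user is treated as "ACL checking disabled" instead of causing an exception.

```java
import java.util.Set;

// Hypothetical sketch of an access check for a job page.
final class JobPageAccessSketch {

    // remoteUser is null when a filter lets the request through without
    // authenticating it. Instead of throwing, treat that as "ACLs disabled".
    static boolean checkAccess(String remoteUser, boolean aclsEnabled, Set<String> allowedUsers) {
        if (remoteUser == null) {
            return true;
        }
        return !aclsEnabled || allowedUsers.contains(remoteUser);
    }
}
```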


[jira] [Updated] (MAPREDUCE-3878) Null user on filtered jobhistory job page

2012-02-23 Thread Thomas Graves (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-3878:
-

   Resolution: Fixed
Fix Version/s: 0.23.2
   Status: Resolved  (was: Patch Available)

I committed this to trunk and branch-0.23. Thanks Jon!

 Null user on filtered jobhistory job page
 -

 Key: MAPREDUCE-3878
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3878
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
Priority: Critical
 Fix For: 0.23.2

 Attachments: MAPREDUCE-3878.patch


 If jobhistory/job.* is filtered to bypass ACLs, the resulting page will always 
 show a null user. This differs from 0.20, where filtering on this page bypasses 
 security to allow all access to the page. The filter essentially passes a null 
 user to AppController, where an exception is thrown. If a null user is 
 detected, we should assume ACL checking is disabled on this page.


[jira] [Commented] (MAPREDUCE-3902) MR AM should reuse containers for map tasks

2012-02-23 Thread Jay Finger (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214819#comment-13214819
 ] 

Jay Finger commented on MAPREDUCE-3902:
---

I haven't read the patch, so forgive me if the answer is already there.

Is there a cap on the amount of re-use?  For example, if the container has been 
in use for more than 1 minute, then do not re-use it.

Or, to rephrase: what prevents a cluster with a few large jobs from having its 
containers hogged?
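One possible shape for such a cap, as a hypothetical sketch (neither the class nor the limits come from the patch): a container is handed another task only while it is younger than a maximum age and has run fewer than a maximum number of tasks. Times are passed in explicitly to keep the check easy to test.

```java
// Hypothetical sketch: limits on how long, and for how many tasks, a single
// container may be reused before it is released back to the cluster.
final class ReuseCapSketch {
    private final long maxAgeMillis;
    private final int maxTasks;

    ReuseCapSketch(long maxAgeMillis, int maxTasks) {
        this.maxAgeMillis = maxAgeMillis;
        this.maxTasks = maxTasks;
    }

    // True if a container started at startMillis that has already run tasksRun
    // tasks may be given another task at nowMillis.
    boolean mayReuse(long startMillis, int tasksRun, long nowMillis) {
        return nowMillis - startMillis < maxAgeMillis && tasksRun < maxTasks;
    }
}
```

With a 1-minute cap as in the example above, `new ReuseCapSketch(60_000L, 10)` would stop handing out tasks once a container is a minute old or has run ten tasks, whichever comes first.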

[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214823#comment-13214823
 ] 

Hudson commented on MAPREDUCE-3878:
---

Integrated in Hadoop-Common-0.23-Commit #586 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/586/])
merge -r 1292830:1292831 from trunk to branch-0.23. FIXES: MAPREDUCE-3878 
(Revision 1292834)

 Result = SUCCESS
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292834
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java


[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214824#comment-13214824
 ] 

Hudson commented on MAPREDUCE-3878:
---

Integrated in Hadoop-Common-trunk-Commit #1767 (See 
[https://builds.apache.org/job/Hadoop-Common-trunk-Commit/1767/])
MAPREDUCE-3878. Null user on filtered jobhistory job page (Jonathon Eagles 
via tgraves) (Revision 1292831)

 Result = SUCCESS
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292831
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java


[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214821#comment-13214821
 ] 

Hudson commented on MAPREDUCE-3878:
---

Integrated in Hadoop-Hdfs-trunk-Commit #1841 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/1841/])
MAPREDUCE-3878. Null user on filtered jobhistory job page (Jonathon Eagles 
via tgraves) (Revision 1292831)

 Result = SUCCESS
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292831
Files : 
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java


[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214826#comment-13214826
 ] 

Hudson commented on MAPREDUCE-3878:
---

Integrated in Hadoop-Hdfs-0.23-Commit #573 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/573/])
merge -r 1292830:1292831 from trunk to branch-0.23. FIXES: MAPREDUCE-3878 
(Revision 1292834)

 Result = SUCCESS
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292834
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java


[jira] [Commented] (MAPREDUCE-3878) Null user on filtered jobhistory job page

2012-02-23 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214844#comment-13214844
 ] 

Hudson commented on MAPREDUCE-3878:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #588 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/588/])
merge -r 1292830:1292831 from trunk to branch-0.23. FIXES: MAPREDUCE-3878 
(Revision 1292834)

 Result = ABORTED
tgraves : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1292834
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/webapp/AppController.java


[jira] [Created] (MAPREDUCE-3904) Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true

2012-02-23 Thread Jonathan Eagles (Created) (JIRA)

Job history produced with mapreduce.cluster.acls.enabled false can not be 
viewed with mapreduce.cluster.acls.enabled true
-

 Key: MAPREDUCE-3904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles


Job history page displays 'null'. It looks like job history files only populate 
job acls when mapreduce.cluster.acls.enabled is true. Upon reading job history 
files, getAcls can return null, throwing an exception on the HsJobBlock page.
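A defensive sketch of the fix direction implied above, using hypothetical names and a plain `Map<String, String>` in place of Hadoop's job ACL types: treat a null ACL map read from a history file as empty before rendering it.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch: plain string maps stand in for job ACL types. The map
// may be null when the history file was written with ACLs disabled.
final class HistoryAclsSketch {

    // Treats a missing ACL map as empty instead of letting callers dereference null.
    static Map<String, String> aclsOrEmpty(Map<String, String> aclsFromHistory) {
        return aclsFromHistory == null ? Collections.<String, String>emptyMap() : aclsFromHistory;
    }

    // Renders one "name: value" line per ACL; renders nothing when ACLs are absent.
    static String render(Map<String, String> aclsFromHistory) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> acl : aclsOrEmpty(aclsFromHistory).entrySet()) {
            out.append(acl.getKey()).append(": ").append(acl.getValue()).append('\n');
        }
        return out.toString();
    }
}
```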


[jira] [Updated] (MAPREDUCE-3872) event handling races in ContainerLauncherImpl and TestContainerLauncher

2012-02-23 Thread Patrick Hunt (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated MAPREDUCE-3872:


Attachment: MAPREDUCE-3872.patch

Refreshing the patch. It looks like MAPREDUCE-3634 fixed a number of the issues I 
had originally seen/fixed in this patch.

The latest version of this patch fixes the obvious concurrency bug in updating 
allNodes. The patch is covered by the existing unit tests, but I don't see a way 
to reliably trigger the bad case, given that it's non-deterministic. However, by 
inspection you can see the concurrency bug that exists in the current code.
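A hedged illustration of this kind of race and one way to close it. `NodeTrackerSketch` is hypothetical and only loosely modeled on the description (a shared `allNodes` set whose size feeds a pool-size computation, with a made-up formula); it is not the ContainerLauncherImpl code or the attached patch. Holding one lock across the add and the size read keeps concurrent callers from losing updates to a plain `HashSet`.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a shared set of node IDs whose size feeds a pool-size
// computation. The formula is made up for illustration.
final class NodeTrackerSketch {
    private final Set<String> allNodes = new HashSet<>(); // not thread-safe by itself
    private final int limitOnPoolSize;
    private final int initialPoolSize;

    NodeTrackerSketch(int limitOnPoolSize, int initialPoolSize) {
        this.limitOnPoolSize = limitOnPoolSize;
        this.initialPoolSize = initialPoolSize;
    }

    // The add and the size read happen under one lock, so concurrent callers
    // cannot corrupt the HashSet or compute a size from a half-applied update.
    synchronized int addNodeAndComputePoolSize(String nodeId) {
        allNodes.add(nodeId);
        return Math.min(limitOnPoolSize, allNodes.size() + initialPoolSize);
    }

    synchronized int nodeCount() {
        return allNodes.size();
    }

    // Demonstration: several threads register distinct node IDs at once.
    static int registerConcurrently(int threads, int nodesPerThread) throws InterruptedException {
        NodeTrackerSketch tracker = new NodeTrackerSketch(500, 10);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * nodesPerThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < nodesPerThread; i++) {
                    tracker.addNodeAndComputePoolSize("node-" + (base + i));
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return tracker.nodeCount();
    }
}
```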

 event handling races in ContainerLauncherImpl and TestContainerLauncher
 ---

 Key: MAPREDUCE-3872
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3872
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 0.23.1
Reporter: Patrick Hunt
 Attachments: MAPREDUCE-3872.patch, MAPREDUCE-3872.patch


 TestContainerLauncher is failing intermittently for me.
 {noformat}
 junit.framework.AssertionFailedError: Expected: null but was: Expected 22 
 but found 21
   at junit.framework.Assert.fail(Assert.java:47)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertNull(Assert.java:233)
   at junit.framework.Assert.assertNull(Assert.java:226)
   at 
 org.apache.hadoop.mapreduce.v2.app.launcher.TestContainerLauncher.testPoolSize(TestContainerLauncher.java:117)
 {noformat}
 Patch momentarily.


[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214957#comment-13214957
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583:
---

> I got the following when running ant test-patch:

Sorry that I was not clear.  The full command looks like
{code}
ant -Dforrest.home=${FORREST_HOME} -Dfindbugs.home=${FINDBUGS_HOME} 
-Dpatch.file=a.patch test-patch
{code}
and it requires findbugs and forrest.

 ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
 -

 Key: MAPREDUCE-3583
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
 Environment: 64-bit Linux:
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
Assignee: Zhihong Yu
Priority: Critical
 Fix For: 0.24.0, 0.23.2

 Attachments: mapreduce-3583-trunk-v2.txt, 
 mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, 
 mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, 
 mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, 
 mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, 
 mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, 
 mapreduce-3583-v7.txt, mapreduce-3583.txt


 HBase PreCommit builds frequently gave us NumberFormatException.
 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
 {code}
 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file 
 set.  User classes may not be found. See JobConf(Class) or 
 JobConf#setJar(String).
 java.lang.NumberFormatException: For input string: 18446743988060683582
   at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Long.parseLong(Long.java:422)
   at java.lang.Long.parseLong(Long.java:468)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
   at 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {code}
 From the Hadoop 0.20.205 source code, it looks like the ppid was 
 18446743988060683582, causing the NFE:
 {code}
 // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
  pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
 {code}
 You can find information on the OS at the beginning of 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
 {code}
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 20
 file size   (blocks, -f) unlimited
 pending signals (-i) 16382
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 6
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 8192
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 2048
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 6
 Running in Jenkins mode
 {code}
 From Nicholas Sze:
 {noformat}
 It looks like the ppid is a 64-bit positive integer, but Java long is 
 signed and so only works with 63-bit positive integers.  In your case,
   2^64 > 18446743988060683582 > 2^63.
 Therefore, there is an NFE. 
 {noformat}
 I propose changing allProcessInfo to Map<String, ProcessInfo> so that we 
 avoid this problem by not parsing large integers.
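The arithmetic in the quoted comment can be checked directly. This sketch (hypothetical class name; the map values are just process names for illustration) shows `Long.parseLong` rejecting the value, Java 8's `Long.parseUnsignedLong` accepting it, and the string-keyed approach proposed above, which never parses the pid at all. The unsigned helpers were added in Java 8, so they were not an option for the Java versions Hadoop targeted at the time.

```java
import java.util.HashMap;
import java.util.Map;

final class PpidParsingSketch {

    // Long.parseLong accepts at most 2^63 - 1, so a larger value such as
    // 18446743988060683582 makes it throw NumberFormatException.
    static boolean fitsInSignedLong(String digits) {
        try {
            Long.parseLong(digits);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Java 8+: parses the full unsigned 64-bit range. Values above 2^63 - 1
    // come back as negative longs; Long.toUnsignedString recovers the digits.
    static long parseUnsigned(String digits) {
        return Long.parseUnsignedLong(digits);
    }

    // The approach proposed in the issue: key processes by the pid string so
    // the pid is never parsed as a number.
    static Map<String, String> indexByPid(String[][] pidAndName) {
        Map<String, String> byPid = new HashMap<>();
        for (String[] row : pidAndName) {
            byPid.put(row[0], row[1]);
        }
        return byPid;
    }
}
```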


[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3901:
--

Status: Open  (was: Patch Available)

 lazy load JobHistory Task and TaskAttempt details
 -

 Key: MAPREDUCE-3901
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3901
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver, mrv2
Affects Versions: 0.23.0
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: MR3901.txt, MR3901_v2.txt


 The job history UI and MRClientProtocol calls routed via JobHistory are very 
 slow for large jobs. Some of this time is spent parsing the history file, but a 
 good chunk is spent pre-creating lots of objects which may never be used. 
 Those can be created when required, bringing down the load times of job 
 history pages and getJobReport (and similar) calls to approximately the 
 history file parse time.
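A minimal sketch of the lazy-creation idea, assuming hypothetical types (strings stand in for parsed task records and for the task objects built from them): the derived objects are built once, on first access, under a lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: strings stand in for the raw records parsed from a
// history file and for the (costlier) task objects built from them.
final class LazyTasksSketch {
    private final List<String> rawTaskRecords;
    private List<String> tasks;   // null until first requested
    private int buildCount = 0;   // how many times the tasks were built

    LazyTasksSketch(List<String> rawTaskRecords) {
        this.rawTaskRecords = rawTaskRecords;
    }

    // Builds the task objects on first call and caches them; the lock keeps
    // concurrent callers (e.g. web UI and RPC threads) from building twice.
    synchronized List<String> getTasks() {
        if (tasks == null) {
            List<String> built = new ArrayList<>(rawTaskRecords.size());
            for (String record : rawTaskRecords) {
                built.add("Task[" + record + "]");
            }
            tasks = Collections.unmodifiableList(built);
            buildCount++;
        }
        return tasks;
    }

    synchronized int buildCount() {
        return buildCount;
    }
}
```

A job report that never asks for task details then pays only for parsing, not for building the task objects.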


[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3901:
--

Status: Patch Available  (was: Open)

[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3901:
--

Attachment: MR3901_v2.txt

Updated to fix the very valid Findbugs warnings.

[jira] [Created] (MAPREDUCE-3905) Allow per job log aggregation configuration

2012-02-23 Thread Siddharth Seth (Created) (JIRA)

Allow per job log aggregation configuration
---

 Key: MAPREDUCE-3905
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3905
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Siddharth Seth
Assignee: Siddharth Seth


Currently, if log aggregation is enabled for a cluster, logs for all jobs are 
aggregated, leading to a whole bunch of files on HDFS which users may not 
want.
Users should be able to control this per job, along with the aggregation 
policy (failed only, all, etc.).
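A hypothetical sketch of what a per-job setting could look like: a job-level value (how it would be supplied, for example through a new job configuration key, is left open) overrides the cluster default, and the policy decides per container whether logs are uploaded. The `NONE` value and all names are assumptions; the issue itself only mentions "failed only" and "all".

```java
import java.util.Locale;

// Hypothetical sketch of a per-job log aggregation policy.
final class LogAggregationPolicySketch {

    enum Policy { NONE, FAILED_ONLY, ALL }

    // Uses the per-job value when present, otherwise the cluster default.
    static Policy resolve(String perJobValue, Policy clusterDefault) {
        if (perJobValue == null || perJobValue.trim().isEmpty()) {
            return clusterDefault;
        }
        return Policy.valueOf(perJobValue.trim().toUpperCase(Locale.ROOT));
    }

    // Decides whether a finished container's logs should be aggregated.
    static boolean shouldAggregate(Policy policy, boolean containerFailed) {
        switch (policy) {
            case ALL:
                return true;
            case FAILED_ONLY:
                return containerFailed;
            default:
                return false;
        }
    }
}
```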



[jira] [Commented] (MAPREDUCE-3872) event handling races in ContainerLauncherImpl and TestContainerLauncher

[
https://issues.apache.org/jira/browse/MAPREDUCE-3872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214987#comment-13214987
]

Hadoop QA commented on MAPREDUCE-3872:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515773/MAPREDUCE-3872.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1915//testReport/
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1915//console

This message is automatically generated.


[jira] [Updated] (MAPREDUCE-3614) 55

2012-02-23 Thread Ravi Prakash (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated MAPREDUCE-3614:


Summary: 55  (was: finalState UNDEFINED if AM is killed by hand)

 55
 --

 Key: MAPREDUCE-3614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: MAPREDUCE-3614.branch-0.23.patch


 Courtesy [~dcapwell]
 {quote}
 If the AM is running and you kill the process (sudo kill #pid), the State in 
 YARN will be FINISHED and FinalStatus UNDEFINED.  The Tracking UI will 
 say History and point to the proxy URL (which redirects to the history 
 server).
 The state should make it clear that the job failed, and the tracking URL 
 shouldn't point to the history server.
 {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Jason Lowe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-3738:
--

Attachment: MAPREDUCE-3738.patch

Patch to ensure we always set the finished boolean in the log aggregation 
thread.

On a side note, we haven't seen a recurrence of the OOM condition on the 
nodemanager, so we haven't been able to track down what caused it.
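For illustration, here is a minimal sketch of the pattern this comment describes: the worker sets its completion flag in a finally block, so a thread that dies from an uncaught exception or Error still releases anyone blocked in join(). The class name AggregatorSketch and its methods are hypothetical; this is not the actual AppLogAggregatorImpl code from the patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a log aggregation thread; not Hadoop's class.
class AggregatorSketch implements Runnable {
  private final AtomicBoolean finished = new AtomicBoolean(false);
  private final Runnable work;

  AggregatorSketch(Runnable work) {
    this.work = work;
  }

  @Override
  public void run() {
    try {
      work.run(); // may throw any RuntimeException or Error (e.g. OutOfMemoryError)
    } finally {
      // Runs on every exit path, so the flag is set even if work.run() throws.
      synchronized (this) {
        finished.set(true);
        notifyAll();
      }
    }
  }

  // Blocks until run() has exited, re-checking the flag at least once a second.
  synchronized void join() {
    while (!finished.get()) {
      try {
        wait(1000);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve interrupt status and give up
        return;
      }
    }
  }

  boolean isFinished() {
    return finished.get();
  }
}
```

Without the finally block, a work item that throws would leave the flag false and join() would loop forever, which matches the shutdown hang reported in this issue.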

 NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
 

 Key: MAPREDUCE-3738
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1, 0.24.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Attachments: MAPREDUCE-3738.patch, livehistdump.txt


 If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught 
 exception such as the OutOfMemoryError in the case I saw), this will lead to a hang during 
 nodemanager shutdown.  The NM calls AppLogAggregatorImpl.join() during 
 shutdown to make sure log aggregation has completed, and that method 
 internally waits for an atomic boolean to be set by the log aggregation 
 thread to indicate it has finished.  Since the thread was killed off earlier 
 due to an uncaught exception, the boolean will never be set and the NM hangs 
 during shutdown repeating something like this every second in the log file:
 2012-01-25 22:20:56,366 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
  Waiting for aggregation to complete for application_1326848182580_2806


[jira] [Updated] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

2012-02-23 Thread Jason Lowe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-3738:
--

Target Version/s: 0.24.0, 0.23.2
  Status: Patch Available  (was: Open)


[jira] [Commented] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details

[
https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215033#comment-13215033
]

Hadoop QA commented on MAPREDUCE-3901:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515794/MR3901_v2.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1916//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1916//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-hs.html
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1916//console

lazy load JobHistory Task and TaskAttempt details
-

Key: MAPREDUCE-3901
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3901
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: jobhistoryserver, mrv2
Affects Versions: 0.23.0
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Attachments: MR3901.txt, MR3901_v2.txt

The job history UI and MRClientProtocol calls routed via JobHistory are very
slow for large jobs. Some of this time is spent parsing the history file, but a
good chunk is spent pre-creating lots of objects that may never be used.
Those can be created when required, bringing the load times of job history
pages and calls such as getJobReport down to approximately the history file
parse time.
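As a rough illustration of the lazy-loading idea, the sketch below builds its (placeholder) task objects only on the first call to getTasks() and caches them afterwards, using double-checked locking on a volatile field. LazyJobSketch and its String-based records are hypothetical stand-ins, not the JobHistory classes changed by MR3901.txt.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical job wrapper: per-task objects are created on first use only.
class LazyJobSketch {
  private final List<String> rawTaskRecords; // stands in for already-parsed history data
  private volatile List<String> tasks;       // null until first requested
  private int buildCount = 0;                // guarded by "this"; counts expensive builds

  LazyJobSketch(List<String> rawTaskRecords) {
    this.rawTaskRecords = rawTaskRecords;
  }

  List<String> getTasks() {
    List<String> result = tasks;
    if (result == null) {                    // fast path skips locking once built
      synchronized (this) {
        result = tasks;
        if (result == null) {                // re-check under the lock
          List<String> built = new ArrayList<>();
          for (String record : rawTaskRecords) {
            built.add("Task[" + record + "]"); // placeholder for costly object creation
          }
          buildCount++;
          result = Collections.unmodifiableList(built);
          tasks = result;
        }
      }
    }
    return result;
  }

  synchronized int buildCount() {
    return buildCount;
  }
}
```

A page that never asks for task details never pays for building them, which is the saving this issue is after.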

[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-02-23 Thread Zhihong Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215047#comment-13215047
 ] 

Zhihong Yu commented on MAPREDUCE-3583:
---

I installed Forrest and FindBugs on my MacBook.
{code}
/Users/zhihyu/205-hadoop/build.xml:1310: 'java5.home' is not defined.  Forrest 
requires Java 5.  Please pass -Djava5.home=<base of Java 5 distribution> to Ant 
on the command-line.
{code}
I still need to install Java 5.

 ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
 -

 Key: MAPREDUCE-3583
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
 Environment: 64-bit Linux:
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
Assignee: Zhihong Yu
Priority: Critical
 Fix For: 0.24.0, 0.23.2

 Attachments: mapreduce-3583-trunk-v2.txt, 
 mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, 
 mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, 
 mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, 
 mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, 
 mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, 
 mapreduce-3583-v7.txt, mapreduce-3583.txt


 HBase PreCommit builds frequently gave us NumberFormatException.
 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
 {code}
 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file 
 set.  User classes may not be found. See JobConf(Class) or 
 JobConf#setJar(String).
 java.lang.NumberFormatException: For input string: 18446743988060683582
   at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Long.parseLong(Long.java:422)
   at java.lang.Long.parseLong(Long.java:468)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
   at 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {code}
 From the Hadoop 0.20.205 source code, it looks like the ppid was 18446743988060683582, 
 causing the NFE:
 {code}
 // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
  pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
 {code}
 You can find information on the OS at the beginning of 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
 {code}
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 20
 file size   (blocks, -f) unlimited
 pending signals (-i) 16382
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 6
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 8192
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 2048
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 6
 Running in Jenkins mode
 {code}
 From Nicholas Sze:
 {noformat}
 It looks like the ppid is a 64-bit positive integer, but Java long is 
 signed and so only holds 63-bit positive integers.  In your case,
   2^64 > 18446743988060683582 > 2^63.
 Therefore, there is an NFE. 
 {noformat}
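The arithmetic above can be checked with a small sketch that uses only the JDK: BigInteger confirms 2^63 < 18446743988060683582 < 2^64, and Long.parseLong rejects the string. Long.parseUnsignedLong exists only in Java 8 and later, so it was not an option for the code discussed here; it is used below only to show that the same 64 bits, read as a signed two's-complement long, form a negative number. The class name UnsignedPidDemo is hypothetical.

```java
import java.math.BigInteger;

// Hypothetical demo class; uses only JDK APIs.
class UnsignedPidDemo {
  static final String VALUE = "18446743988060683582";

  // True if the string does not fit in Java's signed 64-bit long.
  static boolean overflowsSignedLong(String s) {
    try {
      Long.parseLong(s);
      return false;
    } catch (NumberFormatException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    BigInteger value = new BigInteger(VALUE);
    BigInteger twoTo63 = BigInteger.ONE.shiftLeft(63);
    BigInteger twoTo64 = BigInteger.ONE.shiftLeft(64);
    System.out.println("2^63 < value: " + (value.compareTo(twoTo63) > 0));
    System.out.println("value < 2^64: " + (value.compareTo(twoTo64) < 0));
    System.out.println("Long.parseLong throws: " + overflowsSignedLong(VALUE));
    // Java 8+ only: parse as unsigned, then view the same bits as a signed long.
    System.out.println("same bits as signed long: " + Long.parseUnsignedLong(VALUE));
  }
}
```

Running main prints true for all three checks, and the last line prints -85648868034.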
 I propose changing allProcessInfo to Map<String, ProcessInfo> so that we 
 avoid this problem by not parsing large integers at all.
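A minimal sketch of the proposal, assuming a simplified ProcessInfo: process ids are kept as the strings read from /proc and used directly as map keys, so no numeric parsing happens and oversized values cannot throw. ProcessInfoSketch and ProcessTreeSketch are hypothetical names, not the classes in the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified process record: ids stay as the strings read from /proc.
class ProcessInfoSketch {
  final String pid;
  String ppid; // never parsed, so values of 2^63 or more cannot throw

  ProcessInfoSketch(String pid) {
    this.pid = pid;
  }
}

// Hypothetical tree keyed by pid strings instead of parsed Integer/Long values.
class ProcessTreeSketch {
  private final Map<String, ProcessInfoSketch> allProcessInfo = new HashMap<>();

  // Records (or updates) one process and its parent id.
  void record(String pid, String ppid) {
    ProcessInfoSketch info = allProcessInfo.computeIfAbsent(pid, ProcessInfoSketch::new);
    info.ppid = ppid;
  }

  // Returns the parent's record, or null if the parent is not in the map.
  ProcessInfoSketch parentOf(String pid) {
    ProcessInfoSketch info = allProcessInfo.get(pid);
    return info == null ? null : allProcessInfo.get(info.ppid);
  }
}
```

The trade-off is that string equality only matches numeric equality if every id is printed in the same canonical decimal form, which this sketch assumes.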


[jira] [Updated] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand

2012-02-23 Thread Hitesh Shah (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated MAPREDUCE-3614:
---

Summary:  finalState UNDEFINED if AM is killed by hand  (was: 55)

  finalState UNDEFINED if AM is killed by hand
 -

 Key: MAPREDUCE-3614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: MAPREDUCE-3614.branch-0.23.patch


 Courtesy [~dcapwell]
 {quote}
 If the AM is running and you kill the process (sudo kill #pid), the State in 
 Yarn would be FINISHED and FinalStatus is UNDEFINED.  The Tracking UI would 
 say History and point to the proxy url (which will redirect to the history 
 server).
 The state should be more descriptive that the job failed and the tracker url 
 shouldn't point to the history server.
 {quote}


[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details

2012-02-23 Thread Siddharth Seth (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3901:
--

Status: Open  (was: Patch Available)


[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215052#comment-13215052
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583:
---

But I had to manually remove the cn-doc dependency to use Java 6.


[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-02-23 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215051#comment-13215051
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583:
---

I set java5.home to a Java 6 installation and it works, so you don't really 
have to install Java 5.


[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-02-23 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215055#comment-13215055
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583:
---

Okay, I just ran ant test-patch on mapreduce-3583-v7.txt.
{noformat}
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no tests are needed for this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] -1 findbugs.  The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.
 [exec]
{noformat}
The findbugs warnings are unrelated: the result is the same when running 
test-patch with an empty patch.


[jira] [Assigned] (MAPREDUCE-3903) no admin override to view jobs on mr app master and job history server

2012-02-23 Thread Thomas Graves (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned MAPREDUCE-3903:


Assignee: Thomas Graves

 no admin override to view jobs on mr app master and job history server
 --

 Key: MAPREDUCE-3903
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3903
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Critical
 Fix For: 0.23.0


 In 1.0 there was a config, mapreduce.cluster.administrators, that allowed 
 administrators to view anyone's job.  That no longer works on YARN.
 YARN has the new config yarn.admin.acl, but it appears the MR app master and 
 job history server don't use it.  
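For clarity, the sketch below shows the kind of admin override being asked for: an administrator may view any job, while other users must be the owner or be listed in the job's view ACL. JobViewAclSketch is hypothetical; it uses a plain Set of user names rather than Hadoop's ACL classes (which also handle groups), and it does not show how the MR app master or job history server would load yarn.admin.acl.

```java
import java.util.Set;

// Hypothetical view check with an administrator override; not a Hadoop API.
class JobViewAclSketch {
  private final Set<String> admins; // users granted the override (assumed to come from an admin ACL)

  JobViewAclSketch(Set<String> admins) {
    this.admins = admins;
  }

  // Admins may view any job; everyone else must own it or be in its view ACL.
  boolean canView(String user, String jobOwner, Set<String> jobViewAcl) {
    if (admins.contains(user)) {
      return true;
    }
    return user.equals(jobOwner) || jobViewAcl.contains(user);
  }
}
```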


[jira] [Created] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal

2012-02-23 Thread Eugene Koontz (Created) (JIRA)

Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
---

 Key: MAPREDUCE-3906
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3906
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Priority: Trivial


Currently the documentation at 
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
is inconsistent about the recommended or default value of 
{{mapreduce.jobhistory.principal}}. In the section headed "MapReduce 
JobHistory Server", the principal jhs/... is used, but later, in the section 
headed "Configurations for MapReduce JobHistory Server:", the principal 
mapred/... is used. 

The fix is to replace mapred/... with jhs/...


[jira] [Updated] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated MAPREDUCE-3906:
-

Component/s: security

 Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
 ---

 Key: MAPREDUCE-3906
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3906
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation, security
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Priority: Trivial

 Currently the documentation at 
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
  is inconsistent about the recommended or default value of 
 {{mapreduce.jobhistory.principal}}. In the section headed "MapReduce 
 JobHistory Server", the principal jhs/... is used, but later, in the section 
 headed "Configurations for MapReduce JobHistory Server:", the principal 
 mapred/... is used. 
 The fix is to replace mapred/... with jhs/...


[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-02-23 Thread Zhihong Yu (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215058#comment-13215058
 ] 

Zhihong Yu commented on MAPREDUCE-3583:
---

It turns out Java 5 was installed.

Here is the command I used:
{code}
ant -Dforrest.home=${FORREST_HOME} -Dfindbugs.home=${FINDBUGS_HOME} \
  -Djava5.home=/System/Library/Frameworks/JavaVM.framework/Versions/1.5/Home \
  -Dpatch.file=../mapreduce-3583-v7.txt test-patch
{code}

I got:
{code}
  [get] Error opening connection java.io.IOException: Server returned HTTP 
response code: 503 for URL: 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
  [get] Can't get 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar to 
/Users/zhihyu/205-hadoop/ivy/ivy-2.1.0.jar

BUILD FAILED
/Users/zhihyu/205-hadoop/build.xml:2393: Can't get 
http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar to 
/Users/zhihyu/205-hadoop/ivy/ivy-2.1.0.jar
{code}
I'm not sure whether the above was caused by a firewall.

 ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
 -

 Key: MAPREDUCE-3583
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
 Environment: 64-bit Linux:
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
Assignee: Zhihong Yu
Priority: Critical
 Fix For: 0.24.0, 0.23.2

 Attachments: mapreduce-3583-trunk-v2.txt, 
 mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, 
 mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, 
 mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, 
 mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, 
 mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, 
 mapreduce-3583-v7.txt, mapreduce-3583.txt


 HBase PreCommit builds frequently hit a NumberFormatException.
 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
 {code}
 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file 
 set.  User classes may not be found. See JobConf(Class) or 
 JobConf#setJar(String).
 java.lang.NumberFormatException: For input string: 18446743988060683582
   at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Long.parseLong(Long.java:422)
   at java.lang.Long.parseLong(Long.java:468)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
   at 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {code}
 From the Hadoop 0.20.205 source code, it looks like ppid was 18446743988060683582, 
 causing the NFE:
 {code}
 // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
  pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
 {code}
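For context: 18446743988060683582 is larger than Long.MAX_VALUE (9223372036854775807) but smaller than 2^64, so it looks like an unsigned 64-bit value that cannot be parsed into a signed Java long, let alone an int. Below is a minimal sketch of a defensive parse, assuming the field should be read as an unsigned 64-bit quantity and wrapped into a long. ProcStatFieldParser and its methods are hypothetical names, and this is not the patch attached to this issue. On Java 8 and later, Long.parseUnsignedLong gives the same wrapped result for in-range values.

```java
import java.math.BigInteger;

// Sketch only: a hypothetical defensive parser for numeric /proc/<pid>/stat
// fields; not the patch attached to MAPREDUCE-3583.
final class ProcStatFieldParser {
  private ProcStatFieldParser() {}

  // Parses a decimal field that the kernel may print as an unsigned 64-bit
  // value. Values above Long.MAX_VALUE make Long.parseLong throw
  // NumberFormatException, so fall back to BigInteger and keep the low
  // 64 bits (the same bit pattern a C unsigned long would hold).
  static long parseUnsigned64(String field) {
    try {
      return Long.parseLong(field);
    } catch (NumberFormatException e) {
      BigInteger value = new BigInteger(field); // throws NFE if malformed
      if (value.signum() < 0 || value.bitLength() > 64) {
        throw e; // outside the unsigned 64-bit range: keep the original error
      }
      return value.longValue(); // two's-complement wrap into a signed long
    }
  }

  // True when a decimal field cannot fit in a signed long.
  static boolean exceedsSignedLong(String field) {
    return new BigInteger(field).compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0;
  }
}
```

For the value in the trace, parseUnsigned64 returns -85648868034, the signed reading of the same 64 bits; whether such a value is meaningful as a ppid is a separate question.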
 You can find information on the OS at the beginning of 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
 {code}
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 20
 file size   (blocks, -f) unlimited
 pending signals (-i) 16382
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 6
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 8192
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 2048
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 6
 Running in Jenkins mode
 {code}
 From

[jira] [Updated] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated MAPREDUCE-3906:
-

Attachment: MAPREDUCE-3906.patch

 Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
 ---

 Key: MAPREDUCE-3906
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3906
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation, security
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Priority: Trivial
 Attachments: MAPREDUCE-3906.patch


 Currently the documentation on 
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
  is inconsistent about the recommended or default value of 
 {{mapreduce.jobhistory.principal}}. The section with the header 
 "MapReduce JobHistory Server" uses the principal jhs/..., but the later 
 section with the header "Configurations for MapReduce JobHistory 
 Server:" uses the principal mapred/... instead. 
 The fix is to replace mapred/... with jhs/...


[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3901:
--

Status: Patch Available  (was: Open)

 lazy load JobHistory Task and TaskAttempt details
 -

 Key: MAPREDUCE-3901
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3901
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver, mrv2
Affects Versions: 0.23.0
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: MR3901.txt, MR3901_v2.txt, MR3901_v3.txt


 The job history UI and MRClientProtocol calls routed via JobHistory are very 
 slow for large jobs. Some of this time is spent parsing the history file. A 
 good chunk is spent pre-creating lots of objects which may never be used. 
 Those can be created when required, bringing down the load times of job 
 history pages and calls such as getJobReport to approximately the history 
 file parse time.
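A minimal sketch of the lazy-creation idea described above, assuming the parsed history records are cheap to keep and the per-task objects are the expensive part: the map of task objects is built on the first getTasks() call and reused afterwards. LazyJobHistory, ParsedTask and TaskDetails are hypothetical names, not classes from the MR3901 patches.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: hypothetical types illustrating lazy creation of per-task
// objects after a history file has been parsed. Not the MR3901 patch.
final class LazyJobHistory {
  // Lightweight record produced by parsing the history file (hypothetical).
  static final class ParsedTask {
    final String taskId;
    ParsedTask(String taskId) { this.taskId = taskId; }
  }

  // Heavier object served to the UI / RPC layer (hypothetical).
  static final class TaskDetails {
    final String taskId;
    TaskDetails(ParsedTask parsed) { this.taskId = parsed.taskId; }
  }

  private final Map<String, ParsedTask> parsedTasks;
  private Map<String, TaskDetails> tasks; // null until first requested
  private int buildCount = 0;             // times the task map was built

  LazyJobHistory(Map<String, ParsedTask> parsedTasks) {
    this.parsedTasks = parsedTasks;
  }

  // Builds the TaskDetails map on the first call only; later calls reuse it.
  synchronized Map<String, TaskDetails> getTasks() {
    if (tasks == null) {
      Map<String, TaskDetails> built = new LinkedHashMap<String, TaskDetails>();
      for (Map.Entry<String, ParsedTask> e : parsedTasks.entrySet()) {
        built.put(e.getKey(), new TaskDetails(e.getValue()));
      }
      tasks = Collections.unmodifiableMap(built);
      buildCount++;
    }
    return tasks;
  }

  synchronized int getBuildCount() { return buildCount; }
}
```

Synchronizing getTasks() keeps concurrent web requests from building the map twice; a finer-grained variant could create each task's details individually on demand.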


[jira] [Updated] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated MAPREDUCE-3901:
--

Attachment: MR3901_v3.txt

Trying again; the previous patch should have been OK.

 lazy load JobHistory Task and TaskAttempt details
 -

 Key: MAPREDUCE-3901
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3901
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver, mrv2
Affects Versions: 0.23.0
Reporter: Siddharth Seth
Assignee: Siddharth Seth
 Attachments: MR3901.txt, MR3901_v2.txt, MR3901_v3.txt


 The job history UI and MRClientProtocol calls routed via JobHistory are very 
 slow for large jobs. Some of this time is spent parsing the history file. A 
 good chunk is spent pre-creating lots of objects which may never be used. 
 Those can be created when required, bringing down the load times of job 
 history pages and calls such as getJobReport to approximately the history 
 file parse time.


[jira] [Updated] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal

2012-02-23 Thread Mahadev konar (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-3906:
-

Component/s: mrv2

 Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
 ---

 Key: MAPREDUCE-3906
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3906
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: documentation, mrv2, security
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Priority: Trivial
 Attachments: MAPREDUCE-3906.patch


 Currently the documentation on 
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
  is inconsistent about the recommended or default value of 
 {{mapreduce.jobhistory.principal}}. The section with the header 
 "MapReduce JobHistory Server" uses the principal jhs/..., but the later 
 section with the header "Configurations for MapReduce JobHistory 
 Server:" uses the principal mapred/... instead. 
 The fix is to replace mapred/... with jhs/...


[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

[
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215077#comment-13215077
]

Hadoop QA commented on MAPREDUCE-3738:
--

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515804/MAPREDUCE-3738.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1917//testReport/
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1917//console

This message is automatically generated.

NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly

Key: MAPREDUCE-3738
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2, nodemanager
Affects Versions: 0.23.1, 0.24.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
Attachments: MAPREDUCE-3738.patch, livehistdump.txt

If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught exception
such as the OutOfMemoryError in the case I saw), the nodemanager hangs during
shutdown. The NM calls AppLogAggregatorImpl.join() during
shutdown to make sure log aggregation has completed, and that method
internally waits for an atomic boolean to be set by the log aggregation
thread to indicate it has finished. Since the thread was killed off earlier
by an uncaught exception, the boolean is never set and the NM hangs
during shutdown, repeating something like this every second in the log file:
2012-01-25 22:20:56,366 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
Waiting for aggregation to complete for application_1326848182580_2806
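A minimal sketch of one way to avoid this hang, assuming the completion flag is currently set only at the end of the normal run path: set the flag in a finally block so that join() returns even when the worker thread dies from an uncaught exception or Error. LogAggregatorSketch is a hypothetical class, not AppLogAggregatorImpl or the attached patch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: a hypothetical aggregator showing why the "done" flag must be
// set in a finally block. Not the AppLogAggregatorImpl code or the patch.
final class LogAggregatorSketch implements Runnable {
  private final AtomicBoolean finished = new AtomicBoolean(false);
  private final Runnable aggregationWork;

  LogAggregatorSketch(Runnable aggregationWork) {
    this.aggregationWork = aggregationWork;
  }

  @Override
  public void run() {
    try {
      aggregationWork.run();
    } finally {
      // Runs even if aggregationWork throws (including an Error such as
      // OutOfMemoryError), so join() below cannot wait forever.
      synchronized (finished) {
        finished.set(true);
        finished.notifyAll();
      }
    }
  }

  // Blocks until run() has exited, re-checking the flag at least once per second.
  void join() throws InterruptedException {
    synchronized (finished) {
      while (!finished.get()) {
        finished.wait(1000);
      }
    }
  }
}
```

Catching Throwable around the work and recording the failure would also work; the finally block is simply the smallest change that guarantees the flag is set.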

[jira] [Created] (MAPREDUCE-3907) Create a mapred-default.xml for the jobhistory server.

2012-02-23 Thread Eugene Koontz (Created) (JIRA)

Create a mapred-default.xml for the jobhistory server.
--

 Key: MAPREDUCE-3907
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3907
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2, security
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Priority: Minor


The following configuration properties are documented in 
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode

* mapreduce.jobhistory.address  
* mapreduce.jobhistory.keytab
* mapreduce.jobhistory.principal

Create a 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml
 that documents these and provides default values.
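For illustration, a sketch of what such a defaults file could contain, using the standard Hadoop <configuration>/<property> layout, together with a small DOM parser that reads it back into a map. The values (0.0.0.0:10020, /etc/security/keytab/jhs.service.keytab, jhs/_HOST@REALM.TLD) and the class name JhsDefaultsSketch are assumptions for the example, not the contents of the attached patch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch only: possible JobHistory-server defaults in Hadoop's usual XML
// layout. All values are illustrative placeholders.
final class JhsDefaultsSketch {
  static final String DEFAULTS_XML =
      "<?xml version=\"1.0\"?>\n"
      + "<configuration>\n"
      + "  <property>\n"
      + "    <name>mapreduce.jobhistory.address</name>\n"
      + "    <value>0.0.0.0:10020</value>\n"
      + "    <description>Host:port of the JobHistory server IPC endpoint.</description>\n"
      + "  </property>\n"
      + "  <property>\n"
      + "    <name>mapreduce.jobhistory.keytab</name>\n"
      + "    <value>/etc/security/keytab/jhs.service.keytab</value>\n"
      + "    <description>Kerberos keytab for the JobHistory server.</description>\n"
      + "  </property>\n"
      + "  <property>\n"
      + "    <name>mapreduce.jobhistory.principal</name>\n"
      + "    <value>jhs/_HOST@REALM.TLD</value>\n"
      + "    <description>Kerberos principal for the JobHistory server.</description>\n"
      + "  </property>\n"
      + "</configuration>\n";

  // Returns name -> value for every <property> element in the given XML.
  static Map<String, String> parse(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      Map<String, String> props = new LinkedHashMap<String, String>();
      NodeList nodes = doc.getElementsByTagName("property");
      for (int i = 0; i < nodes.getLength(); i++) {
        Element p = (Element) nodes.item(i);
        String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
        String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
        props.put(name, value);
      }
      return props;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse configuration XML", e);
    }
  }
}
```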





[jira] [Commented] (MAPREDUCE-3889) job client tries to use /tasklog interface, but that doesn't exist anymore

2012-02-23 Thread Thomas Graves (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215097#comment-13215097
 ] 

Thomas Graves commented on MAPREDUCE-3889:
--

{quote}

What is the impact of this? Is it crashing the client? Seems like it from the 
code, in which case we'll need to fix it.
{quote}

This is not crashing the client. It just prints the 400 message on the 
client if the job had a failed task (the default) or a task whose status 
matches whatever -Dmapreduce.client.output.filter is set to.

The 400 message looks like:
12/02/18 21:32:12 WARN mapreduce.Job: Error reading task output Server returned 
HTTP response code: 400 for URL:
http://nodemanager:8080/tasklog?plaintext=true&attemptid=attempt_1329857083014_0003_r_00_0&filter=stdout

So as far as I can tell it's benign, just possibly confusing to the user, and 
it's not actually giving them any of the log information for failed tasks.
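The tasklog request carries three '&'-separated query parameters (plaintext, attemptid, filter). A small sketch of how such a query string is assembled, with each value URL-encoded; buildTaskLogUrl is a hypothetical helper, not the job client's code. Building a well-formed URL does not avoid the 400, since per this report the NodeManager no longer serves the /tasklog interface.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch only: builds a tasklog-style URL with '&'-separated, URL-encoded
// query parameters. The helper is hypothetical, not the job client's code.
final class TaskLogUrlSketch {
  static String buildTaskLogUrl(String host, int port, String attemptId, String filter) {
    return "http://" + host + ":" + port + "/tasklog?plaintext=true"
        + "&attemptid=" + encode(attemptId)
        + "&filter=" + encode(filter);
  }

  // URL-encodes one query-parameter value as UTF-8.
  private static String encode(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }
}
```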

 job client tries to use /tasklog interface, but that doesn't exist anymore
 --

 Key: MAPREDUCE-3889
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3889
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Thomas Graves
Priority: Critical

 If you specify the -Dmapreduce.client.output.filter=SUCCEEDED option when 
 running a job, the client tries to fetch task logs to print on the client side 
 from a URL like: 
 http://nodemanager:8080/tasklog?plaintext=true&attemptid=attempt_1329857083014_0003_r_00_0&filter=stdout
 The request always fails with: Required param job, map and reduce
 We saw this error when using distcp and the distcp failed. I'm not sure whether it 
 is mandatory for distcp or just for informational purposes; I'm guessing the 
 latter.


[jira] [Updated] (MAPREDUCE-3907) Create a mapred-default.xml for the jobhistory server.


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated MAPREDUCE-3907:
-

Description: 
The following configuration properties are documented in 
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode

* mapreduce.jobhistory.address  
* mapreduce.jobhistory.keytab
* mapreduce.jobhistory.principal

Create a 
{{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}}
 that documents these properties and provides default values.




  was:
The following configuration properties are documented in 
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode

* mapreduce.jobhistory.address  
* mapreduce.jobhistory.keytab
* mapreduce.jobhistory.principal

Create a 
{{{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}}}
 that documents these properties and provides default values.





 Create a mapred-default.xml for the jobhistory server.
 --

 Key: MAPREDUCE-3907
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3907
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2, security
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Priority: Minor
 Attachments: MAPREDUCE-3907.patch


 The following configuration properties are documented in 
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
 * mapreduce.jobhistory.address
 * mapreduce.jobhistory.keytab
 * mapreduce.jobhistory.principal
 Create a 
 {{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}}
  that documents these properties and provides default values.


[jira] [Updated] (MAPREDUCE-3907) Create a mapred-default.xml for the jobhistory server.


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated MAPREDUCE-3907:
-

Description: 
The following configuration properties are documented in 
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode

* mapreduce.jobhistory.address  
* mapreduce.jobhistory.keytab
* mapreduce.jobhistory.principal

Create a 
{{{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}}}
 that documents these properties and provides default values.




  was:
The following configuration properties are documented in 
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode

* mapreduce.jobhistory.address  
* mapreduce.jobhistory.keytab
* mapreduce.jobhistory.principal

Create a 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml
 that documents these and provides default values.





 Create a mapred-default.xml for the jobhistory server.
 --

 Key: MAPREDUCE-3907
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3907
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2, security
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Priority: Minor
 Attachments: MAPREDUCE-3907.patch


 The following configuration properties are documented in 
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
 * mapreduce.jobhistory.address
 * mapreduce.jobhistory.keytab
 * mapreduce.jobhistory.principal
 Create a 
 {{{hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml}}}
  that documents these properties and provides default values.


[jira] [Updated] (MAPREDUCE-3907) Create a mapred-default.xml for the jobhistory server.


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated MAPREDUCE-3907:
-

Attachment: MAPREDUCE-3907.patch

 Create a mapred-default.xml for the jobhistory server.
 --

 Key: MAPREDUCE-3907
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3907
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2, security
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Priority: Minor
 Attachments: MAPREDUCE-3907.patch


 The following configuration properties are documented in 
 http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
 * mapreduce.jobhistory.address
 * mapreduce.jobhistory.keytab
 * mapreduce.jobhistory.principal
 Create a 
 hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/resources/mapred-default.xml
  that documents these and provides default values.


[jira] [Created] (MAPREDUCE-3908) jobhistory server trying to load job conf file from wrong location

2012-02-23 Thread Thomas Graves (Created) (JIRA)

jobhistory server trying to load job conf file from wrong location
--

 Key: MAPREDUCE-3908
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3908
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.23.0
Reporter: Thomas Graves


I have seen a few instances where I click on the job configuration link 
from the job history server web UI and it returns a 500 error. Looking at the 
job history server log file, it shows an exception like:

2012-02-23 22:16:32,519 ERROR org.apache.hadoop.yarn.webapp.View: Error while 
reading 
hdfs://host.com:9000/home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
java.io.FileNotFoundException: File does not exist: 
/home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:746)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:709)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:681)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:302)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:2

If I go look in hdfs, it doesn't exist in the done_intermediate directory 
anymore, it exists in the done directory structure.  
hdfs://host.com:9000/home/hadoop/mapred/history/done/2012/02/23/00/job_1330033607650_0001_conf.xml

I'm not exactly sure how to reproduce this, but I definitely see it every once 
in a while.


[jira] [Commented] (MAPREDUCE-3908) jobhistory server trying to load job conf file from wrong location

2012-02-23 Thread Thomas Graves (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215126#comment-13215126
 ] 

Thomas Graves commented on MAPREDUCE-3908:
--

I should also note that restarting the job history server makes the issue go 
away, and it then loads the file from the right location in the done directory.

 jobhistory server trying to load job conf file from wrong location
 --

 Key: MAPREDUCE-3908
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3908
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.23.0
Reporter: Thomas Graves

 I have seen a few instances where I click on the job configuration link 
 from the job history server web UI and it returns a 500 error. Looking at 
 the job history server log file, it shows an exception like:
 2012-02-23 22:16:32,519 ERROR org.apache.hadoop.yarn.webapp.View: Error while 
 reading 
 hdfs://host.com:9000/home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
 java.io.FileNotFoundException: File does not exist: 
 /home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:746)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:709)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:681)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:302)
 at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:2
 If I go look in hdfs, it doesn't exist in the done_intermediate directory 
 anymore, it exists in the done directory structure.  
 hdfs://host.com:9000/home/hadoop/mapred/history/done/2012/02/23/00/job_1330033607650_0001_conf.xml
 I'm not exactly sure how to reproduce this, but I definitely see it every 
 once in a while.


[jira] [Commented] (MAPREDUCE-3908) jobhistory server trying to load job conf file from wrong location

2012-02-23 Thread Siddharth Seth (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215128#comment-13215128
 ] 

Siddharth Seth commented on MAPREDUCE-3908:
---

This happens when the job history file is initially read from the 
done_intermediate directory, and later moved over to the done directory. The 
cached CompletedJob object continues to hold a reference to the conf file in 
the intermediate directory.
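A minimal sketch of one way to avoid the stale reference described above, assuming the conf file keeps its file name when it moves: check the cached path at read time and fall back to the done directory. Local java.nio paths stand in for HDFS, doneDir is assumed to be the already-resolved dated subdirectory (e.g. done/2012/02/23/00), and ConfPathResolver is a hypothetical helper, not the JobHistory server's code. Another option is to refresh or evict the cached CompletedJob when the history files are moved.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: re-resolve a cached conf-file path that may have moved from
// done_intermediate to the done directory. Local paths stand in for HDFS;
// the helper is hypothetical, not the JobHistory server's code.
final class ConfPathResolver {
  private ConfPathResolver() {}

  static Path resolve(Path cachedConfPath, Path doneDir) throws FileNotFoundException {
    if (Files.exists(cachedConfPath)) {
      return cachedConfPath; // still in the intermediate directory
    }
    Path moved = doneDir.resolve(cachedConfPath.getFileName());
    if (Files.exists(moved)) {
      return moved; // already moved to the done directory
    }
    throw new FileNotFoundException(
        "No conf file at " + cachedConfPath + " or " + moved);
  }
}
```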

 jobhistory server trying to load job conf file from wrong location
 --

 Key: MAPREDUCE-3908
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3908
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.23.0
Reporter: Thomas Graves

 I have seen a few instances where I click on the job configuration link 
 from the job history server web UI and it returns a 500 error. Looking at 
 the job history server log file, it shows an exception like:
 2012-02-23 22:16:32,519 ERROR org.apache.hadoop.yarn.webapp.View: Error while 
 reading 
 hdfs://host.com:9000/home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
 java.io.FileNotFoundException: File does not exist: 
 /home/hadoop/mapred/history/done_intermediate/user/job_1330033607650_0001_conf.xml
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:746)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:709)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:681)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:302)
 at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:2
 If I go look in hdfs, it doesn't exist in the done_intermediate directory 
 anymore, it exists in the done directory structure.  
 hdfs://host.com:9000/home/hadoop/mapred/history/done/2012/02/23/00/job_1330033607650_0001_conf.xml
 I'm not exactly sure how to reproduce this, but I definitely see it every 
 once in a while.


[jira] [Commented] (MAPREDUCE-3906) Fix inconsistency in documentation regarding mapreduce.jobhistory.principal

[
https://issues.apache.org/jira/browse/MAPREDUCE-3906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215133#comment-13215133
]

Hadoop QA commented on MAPREDUCE-3906:
--

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515819/MAPREDUCE-3906.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+0 tests included. The patch appears to be a documentation patch that
doesn't require tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1918//testReport/
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1918//console


Fix inconsistency in documentation regarding mapreduce.jobhistory.principal
---

Key: MAPREDUCE-3906
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3906
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: documentation, mrv2, security
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Priority: Trivial
Attachments: MAPREDUCE-3906.patch

Currently the documentation on
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html#Running_Hadoop_in_Secure_Mode
is inconsistent about the recommended or default value of
{{mapreduce.jobhistory.principal}}. The section with the header
"MapReduce JobHistory Server" uses the principal jhs/..., but the later
section with the header "Configurations for MapReduce JobHistory
Server:" uses the principal mapred/... instead.
The fix is to replace mapred/... with jhs/...

[jira] [Commented] (MAPREDUCE-3901) lazy load JobHistory Task and TaskAttempt details

2012-02-23 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215134#comment-13215134
]

Hadoop QA commented on MAPREDUCE-3901:
--

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515820/MR3901_v3.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1919//testReport/
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1919//console


lazy load JobHistory Task and TaskAttempt details
-

[jira] [Assigned] (MAPREDUCE-3792) job -list displays only the jobs submitted by a particular user

2012-02-23 Thread Jason Lowe (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe reassigned MAPREDUCE-3792:
-

Assignee: Jason Lowe  (was: Vinod Kumar Vavilapalli)

 job -list displays only the jobs submitted by a particular user
 ---

 Key: MAPREDUCE-3792
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3792
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Assignee: Jason Lowe
Priority: Critical

 mapred job -list lists only the jobs submitted by the user who ran the 
 command. This behavior is different from 1.x. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215156#comment-13215156
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-3583:
---

 I got two test failures ...

Both tests passed on my machine, and I don't think the failures you got are 
related to the patch.
{noformat}
[junit] Running org.apache.hadoop.hdfs.security.TestDelegationToken
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 15.793 sec

[junit] Running org.apache.hadoop.metrics2.impl.TestSinkQueue
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.321 sec
{noformat}


 ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
 -

 Key: MAPREDUCE-3583
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
 Environment: 64-bit Linux:
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
Assignee: Zhihong Yu
Priority: Critical
 Fix For: 0.24.0, 0.23.2

 Attachments: mapreduce-3583-trunk-v2.txt, 
 mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, 
 mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, 
 mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, 
 mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, 
 mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, 
 mapreduce-3583-v7.txt, mapreduce-3583.txt


 HBase PreCommit builds frequently gave us NumberFormatException.
 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
 {code}
 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file 
 set.  User classes may not be found. See JobConf(Class) or 
 JobConf#setJar(String).
 java.lang.NumberFormatException: For input string: 18446743988060683582
   at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Long.parseLong(Long.java:422)
   at java.lang.Long.parseLong(Long.java:468)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
   at 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {code}
 From the Hadoop 0.20.205 source code, it looks like the ppid was 18446743988060683582, 
 causing the NFE:
 {code}
 // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
  pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
 {code}
 You can find information on the OS at the beginning of 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
 {code}
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 20
 file size   (blocks, -f) unlimited
 pending signals (-i) 16382
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 6
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 8192
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 2048
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 6
 Running in Jenkins mode
 {code}
 From Nicholas Sze:
 {noformat}
 It looks like the ppid is a 64-bit positive integer, but Java long is 
 signed and so only works with 63-bit positive integers. In your case,
   2^64 > 18446743988060683582 > 2^63.
 Therefore, there is an NFE. 
 {noformat}
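The arithmetic above can be checked with a short, standalone Java sketch. The class and method names are illustrative and are not Hadoop code. The value from the stack trace lies strictly between 2^63 and 2^64, so Long.parseLong rejects it. Long.parseUnsignedLong, available in Java 8 and later, accepts the same digits as an unsigned 64-bit value.

```java
import java.math.BigInteger;

public class UnsignedProcField {
    // Value copied from the NumberFormatException message above.
    static final String RAW = "18446743988060683582";

    /** Returns true only if Long.parseLong accepts the string as a signed 64-bit value. */
    static boolean fitsSignedLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Checks 2^63 < value < 2^64 using arbitrary-precision arithmetic. */
    static boolean between2To63And2To64(String s) {
        BigInteger v = new BigInteger(s);
        BigInteger twoTo63 = BigInteger.ONE.shiftLeft(63);
        BigInteger twoTo64 = BigInteger.ONE.shiftLeft(64);
        return v.compareTo(twoTo63) > 0 && v.compareTo(twoTo64) < 0;
    }

    public static void main(String[] args) {
        System.out.println(between2To63And2To64(RAW)); // true
        System.out.println(fitsSignedLong(RAW));       // false
        // Java 8+: the same digits parse as an unsigned 64-bit value and round-trip.
        long bits = Long.parseUnsignedLong(RAW);
        System.out.println(Long.toUnsignedString(bits).equals(RAW)); // true
    }
}
```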
 I propose changing allProcessInfo to Map<String, ProcessInfo> so that we 
 don't encounter this problem, by avoiding parsing large integers.
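Here is a minimal sketch of the proposed Map<String, ProcessInfo> approach. It assumes a trimmed-down, hypothetical ProcessInfo that stores the ppid as the raw /proc token; Hadoop's real ProcessInfo carries more fields. Pids are only used as map keys and compared for equality, so keeping them as strings avoids the numeric parse entirely.

```java
import java.util.HashMap;
import java.util.Map;

public class ProcessInfoByString {
    /** Hypothetical, trimmed-down process record; the real Hadoop class has more fields. */
    static final class ProcessInfo {
        final String pid;
        final String ppid; // raw token from /proc/<pid>/stat, never parsed as a number

        ProcessInfo(String pid, String ppid) {
            this.pid = pid;
            this.ppid = ppid;
        }
    }

    // Keyed by the pid string exactly as read from /proc, so no Integer/Long parse is needed.
    private final Map<String, ProcessInfo> allProcessInfo = new HashMap<>();

    void add(String pid, String ppid) {
        allProcessInfo.put(pid, new ProcessInfo(pid, ppid));
    }

    /** Returns the parent's record, or null if the pid or its parent is unknown. */
    ProcessInfo parentOf(String pid) {
        ProcessInfo child = allProcessInfo.get(pid);
        return child == null ? null : allProcessInfo.get(child.ppid);
    }
}
```

With this layout, an out-of-range ppid such as 18446743988060683582 simply fails to match any known pid instead of throwing.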

[jira] [Updated] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-02-23 Thread Tsz Wo (Nicholas), SZE (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-3583:
--

   Resolution: Fixed
Fix Version/s: 1.0.2
   1.1.0
   Status: Resolved  (was: Patch Available)

I also have committed to branch-1 and branch-1.0.  Thanks Ted again!

 ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
 -

 Key: MAPREDUCE-3583
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
 Environment: 64-bit Linux:
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
Assignee: Zhihong Yu
Priority: Critical
 Fix For: 0.24.0, 1.1.0, 0.23.2, 1.0.2

 Attachments: mapreduce-3583-trunk-v2.txt, 
 mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, 
 mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, 
 mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, 
 mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, 
 mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, 
 mapreduce-3583-v7.txt, mapreduce-3583.txt



[jira] [Updated] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand

2012-02-23 Thread Ravi Prakash (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated MAPREDUCE-3614:


Attachment: MAPREDUCE-3614.patch

Oops! I'm sorry! It seems my random comment generator malfunctioned :D 
Apologies. Thanks Hitesh!

I'm uploading this patch which addresses our issues. I'll be adding unit tests 
to this, but in the meantime could some committer please bless it?

  finalState UNDEFINED if AM is killed by hand
 -

 Key: MAPREDUCE-3614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: MAPREDUCE-3614.branch-0.23.patch, MAPREDUCE-3614.patch


 Courtesy [~dcapwell]
 {quote}
 If the AM is running and you kill the process (sudo kill #pid), the State in 
 Yarn would be FINISHED and FinalStatus is UNDEFINED.  The Tracking UI would 
 say History and point to the proxy url (which will redirect to the history 
 server).
 The state should be more descriptive that the job failed and the tracker url 
 shouldn't point to the history server.
 {quote}


[jira] [Commented] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs

2012-02-23 Thread Bikas Saha (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215202#comment-13215202
 ] 

Bikas Saha commented on MAPREDUCE-2793:
---

The hashCode difference was because the ApplicationId internal to JobId was 
different. The test creates 3 jobs with the same app id. However, currently, 
having jobid == appid is baked into a lot of code, including the code used to fix 
the inconsistency in names. The test would create a list of 3 jobs with ids 
0, 1, 2 and app id = 0. Then it would fetch all the jobs from the webserver, 
pick the first job and verify that it exists in its list. Hence, when the new 
code in the webserver was used to generate the job id from the job id string, it 
returned a job id with an app id equal to the job id. This job id would have a 
different app id than the one in the test list, except when the job id was 
0. So when the first job in the list was job id 0, the test would pass; 
otherwise it would fail. The order in the list would change with each run 
because the list was a hash map.
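The failure mode described above can be reproduced with a small Java sketch. SimpleJobId, parse() and the two test lists are hypothetical simplifications, not the real MRv2 JobId classes. parse() mimics code that assumes jobid == appid when rebuilding an id from its string form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class JobIdFromString {
    /** Hypothetical JobId: equality depends on both the app id and the job number. */
    static final class SimpleJobId {
        final int appId;
        final int jobNum;

        SimpleJobId(int appId, int jobNum) {
            this.appId = appId;
            this.jobNum = jobNum;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleJobId)) {
                return false;
            }
            SimpleJobId other = (SimpleJobId) o;
            return appId == other.appId && jobNum == other.jobNum;
        }

        @Override
        public int hashCode() {
            return Objects.hash(appId, jobNum);
        }
    }

    /** Rebuilds an id from "job_<timestamp>_<n>", assuming jobid == appid. */
    static SimpleJobId parse(String s) {
        int n = Integer.parseInt(s.split("_")[2]);
        return new SimpleJobId(n, n);
    }

    /** Original test data: jobs 0, 1, 2 all share app id 0. */
    static List<SimpleJobId> originalTestJobs() {
        List<SimpleJobId> jobs = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            jobs.add(new SimpleJobId(0, i));
        }
        return jobs;
    }

    /** Revised test data: each job's app id equals its job number. */
    static List<SimpleJobId> revisedTestJobs() {
        List<SimpleJobId> jobs = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            jobs.add(new SimpleJobId(i, i));
        }
        return jobs;
    }
}
```

With the original data, only a lookup of job 0 matches. Which job the webserver reported "first" depended on hash iteration order, so the same check passed or failed from run to run. With jobid == appid in the test data, every lookup matches.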

 [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs 
 --

 Key: MAPREDUCE-2793
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Bikas Saha
Priority: Critical
 Fix For: 0.23.2

 Attachments: MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793.patch


 appIDs, jobIDs and attempt/container ids are not consistently named in the 
 logs, console and UI. For consistency, they should all follow a 
 common naming convention.
 Currently, 
 For appID
 =
 On the RM UI: app_1308259676864_5 
 On the JHS UI: No appID 
 Console/logs: No appID
 mapred-local dirs are named as: application_1308259676864_0005
 For jobID
 =
 On the RM UI: job_1308259676864_5_5 
 JHS UI: job_1308259676864_5_5 
 Console/logs: job_1308259676864_0005
 mapred-local dirs are named as: No jobID
 For attemptID
 
 On the RM UI: attempt_1308259676864_5_5_m_24_0
 JHS UI: attempt_1308259676864_5_5_m_24_0
 Console/logs: attempt_1308259676864_0005_m_24_0
 mapred-local dirs are named as: container_1308259676864_0005_24
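For comparison, here is a Java sketch of one uniform scheme. It matches the zero-padded console and local-dir forms quoted above (application_1308259676864_0005, job_1308259676864_0005). The six-digit task number in the attempt id is an assumption borrowed from the MRv1 TaskID style, and the helper names are hypothetical. This is not the naming code that was eventually committed.

```java
import java.util.Locale;

public class IdNames {
    // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
    static String appId(long clusterTimestamp, int appNum) {
        return String.format(Locale.ROOT, "application_%d_%04d", clusterTimestamp, appNum);
    }

    static String jobId(long clusterTimestamp, int appNum) {
        return String.format(Locale.ROOT, "job_%d_%04d", clusterTimestamp, appNum);
    }

    /** taskType is 'm' or 'r'; the 6-digit task padding is an assumption (MRv1 style). */
    static String attemptId(long clusterTimestamp, int appNum, char taskType,
                            int taskNum, int attemptNum) {
        return String.format(Locale.ROOT, "attempt_%d_%04d_%c_%06d_%d",
                clusterTimestamp, appNum, taskType, taskNum, attemptNum);
    }

    public static void main(String[] args) {
        long ts = 1308259676864L;
        System.out.println(appId(ts, 5));                 // application_1308259676864_0005
        System.out.println(jobId(ts, 5));                 // job_1308259676864_0005
        System.out.println(attemptId(ts, 5, 'm', 24, 0)); // attempt_1308259676864_0005_m_000024_0
    }
}
```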


[jira] [Updated] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs

2012-02-23 Thread Bikas Saha (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated MAPREDUCE-2793:
--

Status: Patch Available  (was: Open)

 [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs 
 --

 Key: MAPREDUCE-2793
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Bikas Saha
Priority: Critical
 Fix For: 0.23.2

 Attachments: MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793.patch



[jira] [Updated] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs

2012-02-23 Thread Bikas Saha (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated MAPREDUCE-2793:
--

Attachment: MAPREDUCE-2793-branch-0.23.patch

Changed the test to have jobid == appid. The AppContext methods that are supposed 
to return the appId return null in this TestAppContext, so that it crashes 
deterministically if they get used in the future.

 [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs 
 --

 Key: MAPREDUCE-2793
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Bikas Saha
Priority: Critical
 Fix For: 0.23.2

 Attachments: MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793.patch



[jira] [Updated] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs

2012-02-23 Thread Bikas Saha (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated MAPREDUCE-2793:
--

Status: Open  (was: Patch Available)

 [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs 
 --

 Key: MAPREDUCE-2793
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Bikas Saha
Priority: Critical
 Fix For: 0.23.2

 Attachments: MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch, 
 MAPREDUCE-2793.patch



[jira] [Commented] (MAPREDUCE-2942) TestNMAuditLogger.testNMAuditLoggerWithIP failing


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215218#comment-13215218
 ] 

Hudson commented on MAPREDUCE-2942:
---

Integrated in Hadoop-Hdfs-0.23-PB-Commit #2 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-PB-Commit/2/])
svn merge -c 1166842 from trunk for MAPREDUCE-2942. (Revision 1293033)

 Result = SUCCESS
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293033
Files : 
* /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project
* /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNMAuditLogger.java
* 
/hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java


 TestNMAuditLogger.testNMAuditLoggerWithIP failing
 -

 Key: MAPREDUCE-2942
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2942
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.24.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Thomas Graves
Priority: Critical
 Fix For: 0.24.0

 Attachments: audittest.patch, audittest2.patch


 This is failing right after the MAPREDUCE-2655 commit, but Jenkins did report 
 a success when that patch was submitted.
 {code}
 Standard Output
 2011-09-07 07:12:52,785 INFO  ipc.Server (Server.java:run(349)) - Starting 
 Socket Reader #1 for port 33000
 2011-09-07 07:12:52,787 INFO  ipc.Server 
 (WritableRpcEngine.java:registerProtocolAndImpl(399)) - 
 ProtocolImpl=org.apache.hadoop.yarn.server.nodemanager.TestNMAuditLogger$MyTestRPCServer
  
 protocolClass=org.apache.hadoop.yarn.server.nodemanager.TestNMAuditLogger$MyTestRPCServer
  version=1
 2011-09-07 07:12:52,788 INFO  ipc.Server (Server.java:run(642)) - IPC Server 
 Responder: starting
 2011-09-07 07:12:52,788 INFO  ipc.Server (Server.java:run(473)) - IPC Server 
 listener on 33000: starting
 2011-09-07 07:12:52,788 INFO  ipc.Server (Server.java:run(1459)) - IPC Server 
 handler 0 on 33000: starting
 2011-09-07 07:12:52,798 INFO  ipc.Server (Server.java:run(1497)) - IPC Server 
 handler 0 on 33000, call: ping(), rpc version=2, client version=1, 
 methodsFingerPrint=-1968962669 from 67.195.138.31:33806, error: 
 java.io.IOException: java.io.IOException: Unknown protocol: 
 org.apache.hadoop.ipc.TestRPC$TestProtocol
   at 
 org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:622)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1485)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1483)
 {code}


[jira] [Commented] (MAPREDUCE-2942) TestNMAuditLogger.testNMAuditLoggerWithIP failing


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215221#comment-13215221
 ] 

Hudson commented on MAPREDUCE-2942:
---

Integrated in Hadoop-Common-0.23-PB-Commit #2 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-PB-Commit/2/])
svn merge -c 1166842 from trunk for MAPREDUCE-2942. (Revision 1293033)

 Result = SUCCESS
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293033
Files : 
* /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project
* /hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/TestNMAuditLogger.java
* 
/hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/TestRMAuditLogger.java


 TestNMAuditLogger.testNMAuditLoggerWithIP failing
 -

 Key: MAPREDUCE-2942
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2942
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.24.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Thomas Graves
Priority: Critical
 Fix For: 0.24.0

 Attachments: audittest.patch, audittest2.patch



[jira] [Commented] (MAPREDUCE-2793) [MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs

[
https://issues.apache.org/jira/browse/MAPREDUCE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215270#comment-13215270
]

Hadoop QA commented on MAPREDUCE-2793:
--

+1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12515848/MAPREDUCE-2793-branch-0.23.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 27 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed unit tests in .

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1920//testReport/
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1920//console


[MR-279] Maintain consistency in naming appIDs, jobIDs and attemptIDs
--

Key: MAPREDUCE-2793
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2793
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Bikas Saha
Priority: Critical
Fix For: 0.23.2

Attachments: MAPREDUCE-2793-branch-0.23.patch,
MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch,
MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch,
MAPREDUCE-2793-branch-0.23.patch, MAPREDUCE-2793-branch-0.23.patch,
MAPREDUCE-2793.patch


[jira] [Updated] (MAPREDUCE-3614) finalState UNDEFINED if AM is killed by hand

2012-02-23 Thread Ravi Prakash (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated MAPREDUCE-3614:


Attachment: MAPREDUCE-3614.patch

I discussed this with Vinod, and he told me that we should not drain the event 
queue in stop() in the case of a SIGTERM. So I created a new shutdown hook that 
notifies the JHEH (JobHistoryEventHandler) that a SIGTERM has been received. 

I forgot to mention but thanks go to [~daryn] for helping me figure out a way 
to keep FileSystem objects open. :) 
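Here is a sketch of the shutdown-hook idea. It assumes a hypothetical HistoryHandler in place of the real JobHistoryEventHandler (JHEH). A plain kill sends SIGTERM, and the JVM runs registered shutdown hooks on SIGTERM but not on SIGKILL (kill -9). The hook can therefore record the signal before the AM exits without draining the event queue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SigtermHookSketch {
    /** Hypothetical stand-in for the JobHistoryEventHandler (JHEH). */
    static final class HistoryHandler {
        private final AtomicBoolean sigtermReceived = new AtomicBoolean(false);

        // Called from the shutdown-hook thread; AtomicBoolean makes the flag visible to other threads.
        void notifySigterm() {
            sigtermReceived.set(true);
        }

        // A hypothetical stop() could consult this to decide how to record the final state.
        boolean isSigtermReceived() {
            return sigtermReceived.get();
        }
    }

    /** Registers a hook that only flags the handler; it does not drain any event queue. */
    static Thread installHook(HistoryHandler handler) {
        Thread hook = new Thread(handler::notifySigterm, "jheh-sigterm-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```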

  finalState UNDEFINED if AM is killed by hand
 -

 Key: MAPREDUCE-3614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3614
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Attachments: MAPREDUCE-3614.branch-0.23.patch, MAPREDUCE-3614.patch, 
 MAPREDUCE-3614.patch



[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated MAPREDUCE-3904:
---

Summary: [NPE] Job history produced with mapreduce.cluster.acls.enabled 
false can not be viewed with mapreduce.cluster.acls.enabled true  (was: Job 
history produced with mapreduce.cluster.acls.enabled false can not be viewed 
with mapreduce.cluster.acls.enabled true)

 [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not 
 be viewed with mapreduce.cluster.acls.enabled true
 ---

 Key: MAPREDUCE-3904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: MAPREDUCE-3904.patch


 The job history page displays 'null'. It looks like job history files only 
 populate job ACLs when mapreduce.cluster.acls.enabled is true. When such a 
 job history file is read back, getAcls can return null, which causes an 
 exception on the HsJobBlock page.
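One null-safe way to render the page, sketched in Java under the assumption that getAcls returns a Map: treat a missing ACL map as empty instead of dereferencing null. The class AclViewSketch, its helpers aclsOrEmpty and render, and the "(not set)" placeholder are hypothetical, not code from the attached patch.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical view helpers that tolerate history files written without ACLs. */
final class AclViewSketch {
    private AclViewSketch() {}

    /**
     * Returns the ACL map, or an empty map when the history file carried none
     * (e.g. the job ran with mapreduce.cluster.acls.enabled=false).
     */
    static <K, V> Map<K, V> aclsOrEmpty(Map<K, V> acls) {
        return acls == null ? Collections.<K, V>emptyMap() : acls;
    }

    /** Renders one ACL entry, falling back to a placeholder instead of throwing. */
    static <K, V> String render(Map<K, V> acls, K key) {
        V value = aclsOrEmpty(acls).get(key);
        return value == null ? "(not set)" : value.toString();
    }
}
```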


[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated MAPREDUCE-3904:
---

Status: Patch Available  (was: Open)

 [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not 
 be viewed with mapreduce.cluster.acls.enabled true
 ---


[jira] [Updated] (MAPREDUCE-3904) Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated MAPREDUCE-3904:
---

Attachment: MAPREDUCE-3904.patch

 Job history produced with mapreduce.cluster.acls.enabled false can not be 
 viewed with mapreduce.cluster.acls.enabled true
 -


[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes

2012-02-23 Thread Bikas Saha (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215306#comment-13215306
]

Bikas Saha commented on MAPREDUCE-3353:
---

A potential solution would be the following:
1) Have the scheduler interface return the set of bad nodes on which it has
stopped scheduling. This keeps the decision of which node is bad in the
scheduler. The scheduler is the ultimate authority on what runs on a node and
should tell its clients about the nodes that it is not considering for
scheduling.
2) Item 1) above could be exposed as a separate interface API or piggybacked on
the scheduler.allocate() API.
3) The response could contain either all the known bad nodes or deltas relative
to the previous response. Deltas are cheaper to send but are susceptible to
message loss and retransmission. Also, deltas would have to be divided into new
bad nodes and new good nodes.
4) The AM might want to know the type of bad node, e.g. lost or unhealthy.
The bad nodes information could be enhanced by querying the RMNode object for
the actual reason/health.
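The delta bookkeeping from item 3 can be sketched as two set differences against the last list sent to a given AM. BadNodeDeltaTracker is a hypothetical name, and nodes are plain strings here for simplicity. As item 3 warns, a lost response silently drops a delta, so a real design would also need some way to resynchronize (periodically sending the full list is one assumed option).

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-AM-attempt tracker that turns full bad-node lists into deltas. */
final class BadNodeDeltaTracker {
    private Set<String> lastReported = new HashSet<>();

    /** Result of one comparison: nodes that turned bad and nodes that recovered. */
    static final class Delta {
        final Set<String> newlyBad;
        final Set<String> newlyGood;

        Delta(Set<String> newlyBad, Set<String> newlyGood) {
            this.newlyBad = newlyBad;
            this.newlyGood = newlyGood;
        }
    }

    /** Compares the scheduler's current bad-node set with the last one sent to this AM. */
    Delta update(Set<String> currentBad) {
        Set<String> newlyBad = new HashSet<>(currentBad);
        newlyBad.removeAll(lastReported);   // bad now, not reported as bad last time
        Set<String> newlyGood = new HashSet<>(lastReported);
        newlyGood.removeAll(currentBad);    // reported bad last time, no longer bad
        lastReported = new HashSet<>(currentBad);
        return new Delta(newlyBad, newlyGood);
    }
}
```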

As an enhancement, we could add a new RMNodeManager entity that manages all
the RMNodes. The above functionality could move from the scheduler into
RMNodeManager (though it would need to stay in sync with the scheduler). After
that, getting detailed information may not need direct access to the RMNode
object. Potentially, other interactions with RMNode could be forwarded through
the RMNodeManager. But this would be a fairly significant refactoring that's
best left to a separate future work item.

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
-

Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
Priority: Critical
Fix For: 0.23.2

When a node gets lost or turns faulty, the AM needs to know about that event so
that it can take some action, e.g. re-executing map tasks whose intermediate
output lives on that faulty node.

[jira] [Commented] (MAPREDUCE-3368) compile-mapred-test fails


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215318#comment-13215318
 ] 

Hudson commented on MAPREDUCE-3368:
---

Integrated in Hadoop-Hdfs-0.23-PB-Commit #4 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-PB-Commit/4/])
Revert TestAuditLogger changes from MAPREDUCE-3368. (Revision 1293058)

 Result = SUCCESS
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293058
Files : 
* 
/hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestAuditLogger.java


 compile-mapred-test fails
 -

 Key: MAPREDUCE-3368
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3368
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Hitesh Shah
Priority: Critical
 Fix For: 0.23.1

 Attachments: MR-3368.1.patch


 compile-mapred-test target is failing once again.
 Details: 
 https://builds.apache.org/view/G-L/view/Hadoop/job/Hadoop-Mapreduce-0.23-Build/83/consoleFull


[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215320#comment-13215320
 ] 

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Common-0.23-Commit #587 (See 
[https://builds.apache.org/job/Hadoop-Common-0.23-Commit/587/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
 

 Key: MAPREDUCE-3738
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3738
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.1, 0.24.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Critical
 Fix For: 0.23.2

 Attachments: MAPREDUCE-3738.patch, livehistdump.txt


 If an AppLogAggregator thread dies unexpectedly (e.g., from an uncaught 
 exception such as the OutOfMemoryError in the case I saw), the nodemanager 
 hangs during shutdown. The NM calls AppLogAggregatorImpl.join() during 
 shutdown to make sure log aggregation has completed, and that method 
 internally waits for an atomic boolean to be set by the log aggregation 
 thread to indicate it has finished. Since the thread was killed earlier by 
 an uncaught exception, the boolean is never set, and the NM hangs during 
 shutdown, repeating something like this every second in the log file:
 2012-01-25 22:20:56,366 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
  Waiting for aggregation to complete for application_1326848182580_2806
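A sketch of one way to avoid the hang, assuming a hypothetical LogAggregatorSketch class rather than the real AppLogAggregatorImpl: signal completion from a finally block, so the signal is sent even when the worker dies from an uncaught throwable, and give join() a timeout so shutdown cannot wait forever. A CountDownLatch stands in for the atomic boolean described above; this is not necessarily what the attached patch does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical aggregator illustrating a join() that survives a dead worker. */
final class LogAggregatorSketch implements Runnable {
    private final CountDownLatch done = new CountDownLatch(1);
    private final Runnable work;

    LogAggregatorSketch(Runnable work) {
        this.work = work;
    }

    @Override
    public void run() {
        try {
            work.run();        // may throw, including Errors such as OutOfMemoryError
        } finally {
            done.countDown();  // always signal completion, even on an uncaught throwable
        }
    }

    /** Waits for the worker; returns false on timeout or interrupt instead of looping forever. */
    boolean join(long timeout, TimeUnit unit) {
        try {
            return done.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```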


[jira] [Commented] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true

[
https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215321#comment-13215321
]

Hadoop QA commented on MAPREDUCE-3904:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12515862/MAPREDUCE-3904.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 3 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests:
org.apache.hadoop.mapred.TestIndexCache

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1921//testReport/
Console output:
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1921//console

This message is automatically generated.

[NPE] Job history produced with mapreduce.cluster.acls.enabled false can not
be viewed with mapreduce.cluster.acls.enabled true
---


[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215325#comment-13215325
 ] 

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Hdfs-0.23-Commit #574 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-0.23-Commit/574/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = SUCCESS
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
 


[jira] [Commented] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes

2012-02-23 Thread Bikas Saha (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215328#comment-13215328
]

Bikas Saha commented on MAPREDUCE-3353:
---

Not sending deltas on the RM-AM channel does not seem viable because of the
high message frequency. Sending information about 100 bad nodes at 100 bytes
per node to 1000 AMs every second amounts to 100 x 100 B x 1000 per second =
10^7 B/s, i.e. about 10 MB/s of traffic.
Sending deltas means tracking the last and current states on the RM on a
per-AM-attempt basis. That should not be done in the scheduler, because it is
not the scheduler's responsibility. So it needs to be done in each RMAttempt
object: the RMAttempt object gets the current list of bad nodes and compares it
with its last known list of bad nodes. Additions and deletions are sent to the
AM as new bad and new good nodes.
Alternatively, each RMNode could send an event to each RMAppAttempt on every
healthy-to-unhealthy transition and vice versa. These events could be
accumulated and copied to the AM via the allocate response.
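The event-accumulation alternative might look like the following sketch: each attempt buffers node health transitions as they arrive and hands the whole batch to the AM on the next allocate response, then starts over. NodeTransitionBuffer and its methods are hypothetical names, not RM classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-attempt buffer of node health transitions, drained on each allocate(). */
final class NodeTransitionBuffer {
    /** One health transition reported by a node. */
    static final class Transition {
        final String nodeId;
        final boolean nowHealthy;

        Transition(String nodeId, boolean nowHealthy) {
            this.nodeId = nodeId;
            this.nowHealthy = nowHealthy;
        }
    }

    private final List<Transition> pending = new ArrayList<>();

    /** Called when a node flips between healthy and unhealthy. */
    synchronized void onTransition(String nodeId, boolean nowHealthy) {
        pending.add(new Transition(nodeId, nowHealthy));
    }

    /** Returns everything accumulated since the last call and clears the buffer. */
    synchronized List<Transition> drainForAllocateResponse() {
        List<Transition> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }
}
```

Because the buffer is cleared on every drain, only transitions since the previous allocate call are sent, which keeps the per-heartbeat payload proportional to the number of changes rather than to the number of bad nodes.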

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
-


[jira] [Commented] (MAPREDUCE-3368) compile-mapred-test fails


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215330#comment-13215330
 ] 

Hudson commented on MAPREDUCE-3368:
---

Integrated in Hadoop-Mapreduce-0.23-PB-Commit #2 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-PB-Commit/2/])
Revert TestAuditLogger changes from MAPREDUCE-3368. (Revision 1293058)

 Result = ABORTED
szetszwo : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293058
Files : 
* 
/hadoop/common/branches/branch-0.23-PB/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/TestAuditLogger.java


 compile-mapred-test fails
 -


[jira] [Commented] (MAPREDUCE-3738) NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215335#comment-13215335
 ] 

Hudson commented on MAPREDUCE-3738:
---

Integrated in Hadoop-Mapreduce-0.23-Commit #589 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-0.23-Commit/589/])
merge MAPREDUCE-3738 from trunk (Revision 1293061)

 Result = ABORTED
sseth : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1293061
Files : 
* /hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/CHANGES.txt
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/AppLogAggregatorImpl.java
* 
/hadoop/common/branches/branch-0.23/hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/TestLogAggregationService.java


 NM can hang during shutdown if AppLogAggregatorImpl thread dies unexpectedly
 


[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated MAPREDUCE-3904:
---

Status: Open  (was: Patch Available)

 [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not 
 be viewed with mapreduce.cluster.acls.enabled true
 ---


[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated MAPREDUCE-3904:
---

Attachment: MAPREDUCE-3904.patch

 [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not 
 be viewed with mapreduce.cluster.acls.enabled true
 ---

 Key: MAPREDUCE-3904
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3904
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Jonathan Eagles
Assignee: Jonathan Eagles
 Attachments: MAPREDUCE-3904.patch, MAPREDUCE-3904.patch



[jira] [Updated] (MAPREDUCE-3904) [NPE] Job history produced with mapreduce.cluster.acls.enabled false can not be viewed with mapreduce.cluster.acls.enabled true