[jira] [Updated] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Attachment: MAPREDUCE-5508.2.patch

 JobTracker memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 --

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win, 1.2.1
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Attachments: MAPREDUCE-5508.1.patch, MAPREDUCE-5508.2.patch, 
 MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak but introduced another FileSystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
 void cleanupJob() {
   ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
       new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
   ...
   // Only fs is closed below; tempDirFs is never released.
   if (tempDirFs != fs) {
     try {
       fs.close();
     } catch (IOException ie) {
       ...
 }
 {code}
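
 One way to avoid this kind of leak can be sketched as follows. This is a 
 minimal, self-contained illustration and not necessarily what the attached 
 patches do: CountingHandle and CleanupSketch are hypothetical names, and a 
 plain java.io.Closeable stands in for the Hadoop FileSystem objects. The 
 helper closes both handles, and closes a shared handle only once.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a FileSystem handle; it only counts close() calls.
class CountingHandle implements Closeable {
  int closeCount = 0;

  @Override
  public void close() {
    closeCount++;
  }
}

// Hypothetical helper, not part of Hadoop.
class CleanupSketch {
  // Closes fs, then closes tempDirFs only when it is a distinct instance,
  // so a handle shared by both references is closed exactly once.
  static void closeBoth(Closeable fs, Closeable tempDirFs) {
    try {
      try {
        fs.close();
      } finally {
        if (tempDirFs != fs) {
          tempDirFs.close();
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

 The try/finally ensures that a failure while closing the first handle does 
 not prevent the second one from being released.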

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774322#comment-13774322
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks, Chris. I attached a new patch and will launch a large-scale test 
tomorrow.



[jira] [Commented] (MAPREDUCE-5524) java.io.IOException: Task process exit with nonzero status of 255. how to fix it?

2013-09-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774379#comment-13774379
 ] 

Steve Loughran commented on MAPREDUCE-5524:
---

You're going to have to look in the TaskTracker logs to find out what 
happened there. All that message means so far is that the process exited for 
some reason.

 java.io.IOException: Task process exit with nonzero status of   255. how to 
 fix it?
 ---

 Key: MAPREDUCE-5524
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5524
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: hawkswood

  Task ..FAILED
  java.lang.Throwable: Child Error
  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
  Caused by: java.io.IOException: Task process exit with nonzero status of
  255.
  at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)



[jira] [Updated] (MAPREDUCE-5526) TestMRJobClient fails on Windows and Linux

2013-09-23 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5526:
--

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Duplicate of MAPREDUCE-5503.

 TestMRJobClient fails on Windows and Linux
 --

 Key: MAPREDUCE-5526
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5526
 Project: Hadoop Map/Reduce
  Issue Type: Test
Affects Versions: 3.0.0
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor
 Attachments: MAPREDUCE-5526.patch


 The unit test fails on both Windows and Linux. I think the failures are due 
 to wrong assertions in several places.



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774620#comment-13774620
 ] 

Sandy Ryza commented on MAPREDUCE-5508:
---

Does this mean that
{code}
FileSystem fs1 = FileSystem.get(conf);
FileSystem fs2 = FileSystem.get(conf);
{code}
could create either one or two FileSystem objects?

If that's the case, we should document that FileSystem#close implementations 
must be idempotent.



[jira] [Created] (MAPREDUCE-5527) Add CONTAINERS_MILLIS_MAPS|REDUCES counters

2013-09-23 Thread Sandy Ryza (JIRA)
Sandy Ryza created MAPREDUCE-5527:
-

 Summary: Add CONTAINERS_MILLIS_MAPS|REDUCES counters
 Key: MAPREDUCE-5527
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5527
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Sandy Ryza


It would be helpful to have counters that report the total wallclock time 
spent in all map/reduce tasks.  This is what SLOTS_MILLIS_MAPS usually did in 
MR1. 



[jira] [Updated] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs

2013-09-23 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated MAPREDUCE-5502:
--

Attachment: patch_5502_3.txt


Uploading a patch rebased against trunk so that Hadoop QA can apply it

 History link in resource manager is broken for KILLED jobs
 --

 Key: MAPREDUCE-5502
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5502
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: Vrushali C
Assignee: Vrushali C
  Labels: ui
 Attachments: patch_5502_2.txt, patch_5502_3.txt, patch_5502.txt


 History link in resource manager is broken for KILLED jobs.
 This seems to happen with jobs whose State is 'KILLED' and FinalStatus is 
 'KILLED'. If the State is 'FINISHED' and FinalStatus is 'KILLED', then the 
 History link is fine.
 The problem isn't easy to reproduce, since the time at which the app is 
 killed determines the state it ends up in, and that is hard to predict. The 
 affected jobs seem to get a Diagnostics message of "Application killed by 
 user.", whereas the other killed jobs get "Kill Job received from client 
 job_1378766187901_0002
 Job received Kill while in RUNNING state."



[jira] [Updated] (MAPREDUCE-5522) Incorrectly expect the array of JobQueueInfo returned by o.a.h.mapred.QueueManager#getJobQueueInfos to have a specific order.

2013-09-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-5522:
---

      Resolution: Fixed
   Fix Version/s: 2.3.0  (was: 2.1.0-beta)
Target Version/s: 2.1.0-beta, 3.0.0  (was: 3.0.0, 2.1.0-beta)
          Status: Resolved  (was: Patch Available)

I merged this into trunk and branch-2.  I am not totally sure what is happening 
with the 2.2 and 2.1 releases.  If you really want this to go into 2.1 and 2.2 
I'll look at what it takes to get things merged in properly.

 Incorrectly expect the array of JobQueueInfo returned by 
 o.a.h.mapred.QueueManager#getJobQueueInfos to have a specific order.
 -

 Key: MAPREDUCE-5522
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5522
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
 Environment: Red Hat Enterprise 6 with Sun Java 1.7 & IBM Java 1.6
Reporter: Jinghui Wang
Assignee: Jinghui Wang
Priority: Minor
 Fix For: 3.0.0, 2.3.0

 Attachments: MAPREDUCE-5522.patch


 There is a bug in the test o.a.h.mapred.TestQueue. The implementation of 
 getJobQueueInfos in QueueManager uses the keySet of a HashMap to populate the 
 return value. Since the ordering of the elements in a HashMap's keySet is 
 not guaranteed, the test fails whenever the order returned by 
 getJobQueueInfos differs from the order the test expects.
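
 The ordering problem described above can be illustrated with a small, 
 self-contained sketch. It does not use the real QueueManager or TestQueue 
 code: QueueOrderSketch, queueNames and sameElements are hypothetical names, 
 and a plain HashMap of queue names stands in for the queue registry. The 
 point is that an assertion on such a result should compare contents rather 
 than positions.

```java
import java.util.Arrays;
import java.util.Map;

// Hypothetical illustration; not the actual Hadoop test code.
class QueueOrderSketch {
  // Returns the queue names in keySet iteration order, which HashMap
  // does not guarantee to be stable or to match insertion order.
  static String[] queueNames(Map<String, Integer> queues) {
    return queues.keySet().toArray(new String[0]);
  }

  // Order-insensitive comparison: sort copies of both arrays, then compare.
  static boolean sameElements(String[] expected, String[] actual) {
    if (expected.length != actual.length) {
      return false;
    }
    String[] e = expected.clone();
    String[] a = actual.clone();
    Arrays.sort(e);
    Arrays.sort(a);
    return Arrays.equals(e, a);
  }
}
```

 A test written this way passes regardless of which JVM or HashMap 
 implementation determines the iteration order.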



[jira] [Commented] (MAPREDUCE-5514) TestRMContainerAllocator fails on trunk

2013-09-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775566#comment-13775566
 ] 

Zhijie Shen commented on MAPREDUCE-5514:


bq. May be the following will work:

I've done some more investigation and found:

1. We'd like to set the user name with the submitter's user name, which is 
consistent with what the production code does.

2. We'd like to add the AMRMToken for the ugi, and set it as the login user, 
because in addition to the register operation, the following operations may 
require the ugi and its AMRMToken as well, such as makeRequest.

Therefore, IMHO, the current patch is doing the correct thing.



 TestRMContainerAllocator fails on trunk
 ---

 Key: MAPREDUCE-5514
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5514
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker
 Attachments: MAPREDUCE-5514.1.patch, 
 org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator-output.txt






[jira] [Updated] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs

2013-09-23 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated MAPREDUCE-5502:
--

Status: Open  (was: Patch Available)



[jira] [Updated] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs

2013-09-23 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated MAPREDUCE-5502:
--

Status: Patch Available  (was: Open)



[jira] [Updated] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs

2013-09-23 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated MAPREDUCE-5502:
--

Status: Open  (was: Patch Available)



[jira] [Updated] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs

2013-09-23 Thread Vrushali C (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vrushali C updated MAPREDUCE-5502:
--

Status: Patch Available  (was: Open)



[jira] [Commented] (MAPREDUCE-5522) Incorrectly expect the array of JobQueueInfo returned by o.a.h.mapred.QueueManager#getJobQueueInfos to have a specific order.

2013-09-23 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774824#comment-13774824
 ] 

Robert Joseph Evans commented on MAPREDUCE-5522:


The patch looks good to me and the tests pass. +1

I'll check it in.

 Incorrectly expect the array of JobQueueInfo returned by 
 o.a.h.mapred.QueueManager#getJobQueueInfos to have a specific order.
 -

 Key: MAPREDUCE-5522
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5522
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
 Environment: Red Hat Enterprise 6 with Sun Java 1.7 & IBM Java 1.6
Reporter: Jinghui Wang
Assignee: Jinghui Wang
Priority: Minor
 Fix For: 3.0.0, 2.1.0-beta

 Attachments: MAPREDUCE-5522.patch


 There is a bug in the test o.a.h.mapred.TestQueue. The implementation of 
 getJobQueueInfos in QueueManager uses the keySet of a HashMap to populate the 
 return value. Since the ordering of the elements in a HashMap's keySet is 
 not guaranteed, the test fails whenever the order returned by 
 getJobQueueInfos differs from the order the test expects.



[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2013-09-23 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-5517:
---

Attachment: MAPREDUCE_5517_v3.patch.txt

 enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb 
 to be less than yarn.app.mapreduce.am.resource.mb
 -

 Key: MAPREDUCE-5517
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Priority: Minor
 Attachments: MAPREDUCE_5517_v3.patch.txt


 Since there is no reducer, the memory allocated to reducers is irrelevant 
 to enabling uber mode for the job.
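
 The requested behaviour can be sketched as a memory check that ignores the 
 reducer setting when the job has no reducers. This is a hedged illustration 
 only: UberMemorySketch and fitsInAm are hypothetical names, the parameters 
 merely mirror the mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and 
 yarn.app.mapreduce.am.resource.mb settings, and the use of "at most" as the 
 comparison is an assumption rather than the actual Hadoop logic or the 
 attached patch.

```java
// Hypothetical sketch; not the actual MapReduce ApplicationMaster code.
class UberMemorySketch {
  // Returns true when the task memory that matters for this job fits in the
  // memory configured for the ApplicationMaster. With zero reducers, the
  // reducer memory setting is not considered at all.
  static boolean fitsInAm(long mapMemoryMb, long reduceMemoryMb,
                          int numReduces, long amResourceMb) {
    long requiredMb = (numReduces == 0)
        ? mapMemoryMb
        : Math.max(mapMemoryMb, reduceMemoryMb);
    return requiredMb <= amResourceMb;
  }
}
```

 For example, with 1024 MB maps, a 4096 MB reducer setting and a 1536 MB AM, 
 a map-only job passes the check in this sketch, while the same job with one 
 reducer does not.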



[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2013-09-23 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-5517:
---

Status: Patch Available  (was: Open)



[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2013-09-23 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-5517:
---

Attachment: (was: MAPREDUCE_5517_v1.patch.txt)

 enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb 
 to be less than yarn.app.mapreduce.am.resource.mb
 -

 Key: MAPREDUCE-5517
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Priority: Minor
 Attachments: MAPREDUCE_5517_v2.patch.txt


 Since there is no reducer, the memory allocated to reducers is irrelevant 
 to enabling uber mode for the job.



[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2013-09-23 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-5517:
---

Status: Open  (was: Patch Available)



[jira] [Updated] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2013-09-23 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-5517:
---

Attachment: (was: MAPREDUCE_5517_v2.patch.txt)

 enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb 
 to be less than yarn.app.mapreduce.am.resource.mb
 -

 Key: MAPREDUCE-5517
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Priority: Minor

 Since there is no reducer, the memory allocated to reducers is irrelevant 
 to enabling uber mode for the job.



[jira] [Commented] (MAPREDUCE-5522) Incorrectly expect the array of JobQueueInfo returned by o.a.h.mapred.QueueManager#getJobQueueInfos to have a specific order.

2013-09-23 Thread Jinghui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775560#comment-13775560
 ] 

Jinghui Wang commented on MAPREDUCE-5522:
-

Thanks Robert. I don't think it's really necessary to merge it with 2.1 and 2.2 
as long as it's in trunk.



[jira] [Commented] (MAPREDUCE-5522) Incorrectly expect the array of JobQueueInfo returned by o.a.h.mapred.QueueManager#getJobQueueInfos to have a specific order.

2013-09-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775554#comment-13775554
 ] 

Hudson commented on MAPREDUCE-5522:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #4457 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4457/])
MAPREDUCE-5522. Incorrect order expected from JobQueueInfo (Jinghui Wang via 
bobby) (bobby: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1525670)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestQueue.java




[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775633#comment-13775633
 ] 

Sandy Ryza commented on MAPREDUCE-5508:
---

There are some tabs that should be converted to spaces. Other than that I am +1 
for the patch.




[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774702#comment-13774702
 ] 

Chris Nauroth commented on MAPREDUCE-5508:
--

+1 for the patch.  Thanks again, Xi.

[~sandyr], assuming that Xi's large scale tests come back showing no memory 
leaks, are you also +1 for this patch?  If so, then I will commit to branch-1 
and branch-1-win.

bq. Does this mean that ... could create either one or two FileSystem objects?

If that code sample is the only thread running, then only one instance is 
created, and fs1 == fs2.  With multiple threads running, it's 
non-deterministic, because the other threads could be running 
{{FileSystem#get}} and {{FileSystem#close}} on the same cached instances at 
just the right moment.  It's possible to get 2 instances created, and fs1 != 
fs2.

It's a good idea to document that {{FileSystem#close}} requires an idempotent 
implementation, in scope of a separate jira.  In practice, 
{{DistributedFileSystem}} does guarantee idempotence via a synchronized close 
method and an isRunning flag inside the {{DFSClient}}.

BTW, while researching some of these issues around the cache, I started to 
think that we ought to ref-count the instances to better guard against problems 
like this.  Then, I found HADOOP-4655.  Discussion in that issue made an 
intentional choice not to ref count in order to preserve 
backwards-compatibility with clients that don't call close.

 JobTracker memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 --

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win, 1.2.1
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Attachments: MAPREDUCE-5508.1.patch, MAPREDUCE-5508.2.patch, 
 MAPREDUCE-5508.patch




[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server

2013-09-23 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5332:
--

Attachment: MAPREDUCE-5332-7.patch

Thanks for the thorough review, Daryn!  Updated the patch to address all but 
one of the concerns.  High-level changes include:

* Added an updateToken method to the state store interface; the filesystem 
store uses rename to try to make the update atomic.
* Token buckets are created up front.
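
The rename-based update could look roughly like the sketch below. It is an illustration on the local filesystem using java.nio, not the code from the patch; {{TokenFileStore}}, its method names, the ".tmp" suffix and the file contents are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical file-backed store; not the history server's state store.
class TokenFileStore {
  private final Path dir;

  TokenFileStore(Path dir) {
    this.dir = dir;
  }

  // Write the new contents to a temporary file, then rename it over the
  // old file so that readers see either the old or the new version.
  void updateToken(String tokenName, String contents) throws IOException {
    Path target = dir.resolve(tokenName);
    Path tmp = dir.resolve(tokenName + ".tmp");
    Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
    try {
      Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    } catch (AtomicMoveNotSupportedException e) {
      // Fallback for filesystems without atomic rename; this path is
      // not atomic, hence "try to make this atomic".
      Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  String readToken(String tokenName) throws IOException {
    byte[] bytes = Files.readAllBytes(dir.resolve(tokenName));
    return new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("tokens");
    TokenFileStore store = new TokenFileStore(dir);
    store.updateToken("token_1", "renewDate=100");
    store.updateToken("token_1", "renewDate=200");
    System.out.println(store.readToken("token_1"));
  }
}
```

Whether an atomic move replaces an existing target is implementation specific in java.nio, so a real store would need to check the behaviour of the filesystem it runs on.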

bq. The DTSM has the stateStore so its recovery method could load the state - 
instead of the caller loading the state from the stateStore and passing it in. 
The code may become a bit easier to follow, but just a suggestion.

I kept this as-is.  It makes more sense if the history server were to persist 
more items in the future than just these tokens, as you'd want to load the 
state once and then dole out the bits of state to the various entities that 
need it to recover.  Alternatively, the state stores could be separate and 
per-service, in which case I agree that each service should handle its own 
recovery.


 Support token-preserving restart of history server
 --

 Key: MAPREDUCE-5332
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5332
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobhistoryserver
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5332-2.patch, MAPREDUCE-5332-3.patch, 
 MAPREDUCE-5332-4.patch, MAPREDUCE-5332-5.patch, MAPREDUCE-5332-5.patch, 
 MAPREDUCE-5332-6.patch, MAPREDUCE-5332-7.patch, MAPREDUCE-5332.patch


 To better support rolling upgrades through a cluster, the history server 
 needs the ability to restart without losing track of delegation tokens.



[jira] [Created] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2013-09-23 Thread Albert Chu (JIRA)
Albert Chu created MAPREDUCE-5528:
-

 Summary: TeraSort fails with can't read paritions file - does 
not read partition file from distributed cache
 Key: MAPREDUCE-5528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 3.0.0
Reporter: Albert Chu
Priority: Minor


I was trying to run TeraSort against a parallel networked file system,
setting things up via the 'file://' scheme.  I always got the
following error when running terasort:

{noformat}
13/09/23 11:15:12 INFO mapreduce.Job: Task Id : 
attempt_1379960046506_0001_m_80_1, Status : FAILED
Error: java.lang.IllegalArgumentException: can't read paritions file
at 
org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:678)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
at org.apache.hadoop.util.Shell.run(Shell.java:417)
at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:137)
at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
at 
org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
at 
org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
... 10 more
{noformat}

After digging into TeraSort, I noticed that the partitions file was
created in the output directory, then added to the distributed cache:

{noformat}
Path outputDir = new Path(args[1]);
...
Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
...
job.addCacheFile(partitionUri);
{noformat}

but the partitions file doesn't seem to be read back from the output
directory or distributed cache:

{noformat}
FileSystem fs = FileSystem.getLocal(conf);
...
Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
splitPoints = readPartitions(fs, partFile, conf);
{noformat}

It seems the file is being read from whatever the working directory is
for the filesystem returned from FileSystem.getLocal(conf).

Under HDFS this code works; the working directory seems to be the
distributed cache (I guess by default?).

But when I set things up with the networked file system and 'file://'
scheme, the working directory was the directory I was running my
Hadoop binaries out of.
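
The underlying effect, a bare relative file name resolving against whatever the working directory happens to be, can be shown without Hadoop. This is a plain java.nio sketch; the directories "/task/workdir" and "/opt/hadoop" are made-up examples and {{RelativePathSketch}} is a hypothetical name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Plain-Java illustration of relative path resolution; not TeraSort code.
class RelativePathSketch {
  // Resolve a relative file name against an explicit base directory.
  static Path resolve(Path baseDir, String name) {
    return baseDir.resolve(name).normalize();
  }

  public static void main(String[] args) {
    String name = "_partition.lst";

    // A bare relative name is interpreted against the process working
    // directory, so the result depends on where the JVM was started.
    System.out.println(Paths.get(name).toAbsolutePath());

    // Same name, two different working directories, two different files.
    System.out.println(resolve(Paths.get("/task/workdir"), name));
    System.out.println(resolve(Paths.get("/opt/hadoop"), name));
  }
}
```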

The attached patch fixed things for me.  It always grabs the partition file 
from the distributed cache, instead of relying on the local working directory 
to resolve it.  It seems to be the right thing to do?

Apologies, I was unable to reproduce this under the TeraSort
example tests, such as TestTeraSort.java, so no test is added.  I'm not sure 
what the subtle difference in the setup is.  I tested under both HDFS and the 
'file' scheme, and the patch worked under both.




[jira] [Updated] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2013-09-23 Thread Albert Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Albert Chu updated MAPREDUCE-5528:
--

Attachment: MAPREDUCE-5528.patch

 TeraSort fails with can't read paritions file - does not read partition 
 file from distributed cache
 -

 Key: MAPREDUCE-5528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 3.0.0
Reporter: Albert Chu
Priority: Minor
 Attachments: MAPREDUCE-5528.patch



[jira] [Updated] (MAPREDUCE-5528) TeraSort fails with can't read paritions file - does not read partition file from distributed cache

2013-09-23 Thread Albert Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Albert Chu updated MAPREDUCE-5528:
--

Description: 
I was trying to run TeraSort against a parallel networked file system, setting 
things up via the 'file://' scheme.  I always got the following error when 
running terasort:

{noformat}
13/09/23 11:15:12 INFO mapreduce.Job: Task Id : 
attempt_1379960046506_0001_m_80_1, Status : FAILED
Error: java.lang.IllegalArgumentException: can't read paritions file
at 
org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:678)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1499)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: java.io.FileNotFoundException: File _partition.lst does not exist
at org.apache.hadoop.fs.Stat.parseExecResult(Stat.java:124)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:486)
at org.apache.hadoop.util.Shell.run(Shell.java:417)
at org.apache.hadoop.fs.Stat.getFileStatus(Stat.java:74)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getNativeFileLinkStatus(RawLocalFileSystem.java:808)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:740)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:525)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.init(ChecksumFileSystem.java:137)
at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
at 
org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.readPartitions(TeraSort.java:161)
at 
org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:246)
... 10 more
{noformat}

After digging into TeraSort, I noticed that the partitions file was created in 
the output directory, then added into the distributed cache

{noformat}
Path outputDir = new Path(args[1]);
...
Path partitionFile = new Path(outputDir, TeraInputFormat.PARTITION_FILENAME);
...
job.addCacheFile(partitionUri);
{noformat}

but the partitions file doesn't seem to be read back from the output directory 
or distributed cache:

{noformat}
FileSystem fs = FileSystem.getLocal(conf);
...
Path partFile = new Path(TeraInputFormat.PARTITION_FILENAME);
splitPoints = readPartitions(fs, partFile, conf);
{noformat}

It seems the file is being read from whatever the working directory is for the 
filesystem returned from FileSystem.getLocal(conf).

Under HDFS this code works; the working directory seems to be the distributed 
cache (I guess by default?).

But when I set things up with the networked file system and 'file://' scheme, 
the working directory was the directory I was running my Hadoop binaries out of.

The attached patch fixed things for me.  It always grabs the partition file 
from the distributed cache, instead of relying on the local working directory 
to resolve it.  It seems to be the right thing to do?

Apologies, I was unable to reproduce this under the TeraSort example 
tests, such as TestTeraSort.java, so no test is added.  I'm not sure what the 
subtle difference in the setup is.  I tested under both HDFS and the 'file' 
scheme, and the patch worked under both.


  was:
I was trying to run TeraSort against a parallel networked file system,
setting things up via the 'file:// scheme.  I always got the
following error when running terasort:

{noformat}
13/09/23 11:15:12 INFO mapreduce.Job: Task Id : 
attempt_1379960046506_0001_m_80_1, Status : FAILED
Error: java.lang.IllegalArgumentException: can't read paritions file
at 
org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.setConf(TeraSort.java:254)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.init(MapTask.java:678)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
at 

[jira] [Created] (MAPREDUCE-5529) Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2

2013-09-23 Thread Robert Kanter (JIRA)
Robert Kanter created MAPREDUCE-5529:


 Summary: Binary incompatibilities in 
mapred.lib.TotalOrderPartitioner between branch-1 and branch-2
 Key: MAPREDUCE-5529
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5529
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.1.1-beta
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Blocker


{{mapred.lib.TotalOrderPartitioner}} in branch-1 has these two methods:
{code:java}
public static String getPartitionFile(JobConf job)
public static void setPartitionFile(JobConf job, Path p)
{code}

In branch-2, {{mapred.lib.TotalOrderPartitioner}} is now a subclass of 
{{mapreduce.lib.partition.TotalOrderPartitioner}}, from which it inherits the 
similar methods:
{code:java}
public static String getPartitionFile(Configuration conf)
public static void setPartitionFile(Configuration conf, Path p)
{code}

This means that any code that does either of the following:
{code:java}
TotalOrderPartitioner.setPartitionFile(new JobConf(), new Path("/"));
String str = TotalOrderPartitioner.getPartitionFile(new JobConf());
{code}
will not be binary compatible (that is, if compiled against branch-1, it will 
throw a {{NoSuchMethodError}} if run against branch-2).
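
The reason the code stays source compatible but not binary compatible is that the JVM links a call by method name and exact parameter types, while the compiler is free to widen a {{JobConf}} argument to {{Configuration}}. The sketch below approximates this with a reflective lookup by exact signature. The nested {{Configuration}}, {{JobConf}}, {{Path}} and {{NewPartitioner}} classes are stand-ins defined here for illustration, not the Hadoop classes.

```java
// Stand-in types only; this does not use or reproduce Hadoop classes.
class BinaryCompatSketch {
  static class Configuration {}

  static class JobConf extends Configuration {}

  static class Path {
    Path(String p) {}
  }

  // Models the branch-2 shape: only a Configuration-typed method exists.
  static class NewPartitioner {
    public static void setPartitionFile(Configuration conf, Path p) {}
  }

  // True if the class exposes a public method with exactly these
  // parameter types (no widening is applied by the lookup).
  static boolean hasExactMethod(Class<?> c, String name, Class<?>... params) {
    try {
      c.getMethod(name, params);
      return true;
    } catch (NoSuchMethodException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Source compatible: the compiler widens JobConf to Configuration.
    NewPartitioner.setPartitionFile(new JobConf(), new Path("/"));

    // A lookup by the old, exact signature finds nothing.
    System.out.println(hasExactMethod(
        NewPartitioner.class, "setPartitionFile", JobConf.class, Path.class)); // false
    System.out.println(hasExactMethod(
        NewPartitioner.class, "setPartitionFile", Configuration.class, Path.class)); // true
  }
}
```

A class file compiled against branch-1 records the {{JobConf}} signature, so at run time it fails in the same way as the first lookup.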



[jira] [Updated] (MAPREDUCE-5529) Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2

2013-09-23 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-5529:
-

Description: 
{{mapred.lib.TotalOrderPartitioner}} in branch-1 has these two methods:
{code:java}
public static String getPartitionFile(JobConf job)
public static void setPartitionFile(JobConf job, Path p)
{code}

In branch-2, {{mapred.lib.TotalOrderPartitioner}} is now a subclass of 
{{mapreduce.lib.partition.TotalOrderPartitioner}}, from which it inherits the 
similar methods:
{code:java}
public static String getPartitionFile(Configuration conf)
public static void setPartitionFile(Configuration conf, Path p)
{code}

This means that any code that does either of the following:
{code:java}
TotalOrderPartitioner.setPartitionFile(new JobConf(), new Path("/"));
String str = TotalOrderPartitioner.getPartitionFile(new JobConf());
{code}
will not be binary compatible (that is, if compiled against branch-1, it will 
throw a {{NoSuchMethodError}} if run against branch-2).

  was:
{{mapred.lib.TotalPartitioner}} in branch-1 has these two methods:
{code:java}
public static String getPartitionFile(JobConf job)
public static void setPartitionFile(JobConf job, Path p)
{code}

In branch-2, {{mapred.lib.TotalPartitioner}} is now a subclass of 
{{mapreduce.lib.TotalPartitioner}}, from which it inherits the similar methods:
{code:java}
public static String getPartitionFile(Configuration conf)
public static void setPartitionFile(Configuration conf, Path p)
{code}

This means that any code that does either of the following:
{code:java}
TotalOrderPartitioner.setPartitionFile(new JobConf(), new Path(/));
String str = TotalOrderPartitioner.getPartitionFile(new JobConf());
{code}
will not be binary compatible (that is, if compiled against branch-1, it will 
throw a {{NoSuchMethodError}} if run against branch-2).


 Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 
 and branch-2
 --

 Key: MAPREDUCE-5529
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5529
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.1.1-beta
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Blocker
 Attachments: MAPREDUCE-5529.patch




[jira] [Updated] (MAPREDUCE-5529) Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2

2013-09-23 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-5529:
-

Status: Patch Available  (was: Open)

 Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 
 and branch-2
 --

 Key: MAPREDUCE-5529
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5529
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.1.1-beta
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Blocker
 Attachments: MAPREDUCE-5529.patch




[jira] [Updated] (MAPREDUCE-5529) Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2

2013-09-23 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-5529:
-

Attachment: MAPREDUCE-5529.patch

The patch adds the two missing methods to {{mapred.lib.TotalOrderPartitioner}}; 
they simply delegate to the new methods in 
{{mapreduce.lib.partition.TotalOrderPartitioner}}, which is possible because 
the two sets of methods are source compatible.
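
A minimal sketch of what such delegating methods might look like follows. It is not the contents of the attached patch: every class here is a stand-in defined for the example, and the {{partitionFile}} field is a hypothetical replacement for the real configuration lookup.

```java
// Stand-in types only; this is not the attached patch.
class DelegatingOverloadSketch {
  static class Configuration {
    String partitionFile; // hypothetical storage for the setting
  }

  static class JobConf extends Configuration {}

  static class Path {
    final String value;

    Path(String value) {
      this.value = value;
    }
  }

  // Models the new-API class with Configuration-typed methods.
  static class NewPartitioner {
    public static void setPartitionFile(Configuration conf, Path p) {
      conf.partitionFile = p.value;
    }

    public static String getPartitionFile(Configuration conf) {
      return conf.partitionFile;
    }
  }

  // Models the old-API subclass: it re-declares the JobConf signatures so
  // that previously compiled callers link, and delegates to the new methods.
  static class OldPartitioner extends NewPartitioner {
    public static void setPartitionFile(JobConf job, Path p) {
      // The cast selects the Configuration overload and avoids recursion.
      NewPartitioner.setPartitionFile((Configuration) job, p);
    }

    public static String getPartitionFile(JobConf job) {
      return NewPartitioner.getPartitionFile((Configuration) job);
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf();
    OldPartitioner.setPartitionFile(job, new Path("/tmp/parts"));
    System.out.println(OldPartitioner.getPartitionFile(job)); // /tmp/parts

    // The JobConf-typed signature is now declared on the old-API class;
    // this lookup would throw NoSuchMethodException if it were missing.
    OldPartitioner.class.getDeclaredMethod("getPartitionFile", JobConf.class);
  }
}
```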

Unit tests aren't really possible for this.

 Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 
 and branch-2
 --

 Key: MAPREDUCE-5529
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5529
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.1.1-beta
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Blocker
 Attachments: MAPREDUCE-5529.patch




[jira] [Created] (MAPREDUCE-5530) Binary and source incompatibility in mapred.lib.CombineFileInputFormat between branch-1 and branch-2

2013-09-23 Thread Robert Kanter (JIRA)
Robert Kanter created MAPREDUCE-5530:


 Summary: Binary and source incompatibility in 
mapred.lib.CombineFileInputFormat between branch-1 and branch-2
 Key: MAPREDUCE-5530
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5530
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.1.1-beta
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Blocker


{{mapred.lib.CombineFileInputFormat}} in branch-1 has this method:
{code:java}
protected boolean isSplitable(FileSystem fs, Path file)
{code}

In branch-2, {{mapred.lib.CombineFileInputFormat}} is now a subclass of 
{{mapreduce.lib.input.CombineFileInputFormat}}, from which it inherits the 
similar method:
{code:java}
protected boolean isSplitable(JobContext context, Path file)
{code}

This means that any code that subclasses {{mapred.lib.CombineFileInputFormat}} 
without providing its own implementation of {{protected boolean 
isSplitable(FileSystem fs, Path file)}}, and that calls {{isSplitable}} with a 
{{FileSystem}} argument anywhere, is neither binary nor source compatible: if 
compiled against branch-1, it will throw a {{NoSuchMethodError}} when run 
against branch-2, and it won't even compile against branch-2.



[jira] [Updated] (MAPREDUCE-5530) Binary and source incompatibility in mapred.lib.CombineFileInputFormat between branch-1 and branch-2

2013-09-23 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-5530:
-

Attachment: MAPREDUCE-5530.patch

The patch adds the missing method to {{mapred.lib.CombineFileInputFormat}}.  
It's almost identical to the one that already exists in 
{{mapreduce.lib.input.CombineFileInputFormat}}.
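
To make the shape of the fix concrete, here is a small sketch with stand-in classes. It is not the attached patch: the types are defined locally, and the "gzip files are not splittable" rule is a placeholder assumption standing in for whatever the real methods check.

```java
// Stand-in types only; this is not the attached patch.
class IsSplitableSketch {
  static class FileSystem {}

  static class JobContext {}

  static class Path {
    final String name;

    Path(String name) {
      this.name = name;
    }
  }

  // Models the new-API base class: only the JobContext overload exists here.
  static class NewCombineFormat {
    protected boolean isSplitable(JobContext context, Path file) {
      return splittableByName(file);
    }

    // Placeholder rule (an assumption for this sketch only).
    static boolean splittableByName(Path file) {
      return !file.name.endsWith(".gz");
    }
  }

  // Models the old-API class with the FileSystem overload restored.
  static class OldCombineFormat extends NewCombineFormat {
    protected boolean isSplitable(FileSystem fs, Path file) {
      return splittableByName(file);
    }
  }

  // Models user code written against branch-1: it calls the FileSystem
  // overload without overriding it.
  static class UserFormat extends OldCombineFormat {
    boolean check(FileSystem fs, Path file) {
      return isSplitable(fs, file);
    }
  }

  public static void main(String[] args) {
    UserFormat format = new UserFormat();
    System.out.println(format.check(new FileSystem(), new Path("part-00000"))); // true
    System.out.println(format.check(new FileSystem(), new Path("part-00000.gz"))); // false
  }
}
```

If the {{FileSystem}} overload were removed from {{OldCombineFormat}}, the call in {{UserFormat}} would no longer compile, which mirrors the source incompatibility described in the issue.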

Unit tests aren't really possible for this.

 Binary and source incompatibility in mapred.lib.CombineFileInputFormat 
 between branch-1 and branch-2
 

 Key: MAPREDUCE-5530
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5530
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.1.1-beta
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Blocker
 Attachments: MAPREDUCE-5530.patch




[jira] [Updated] (MAPREDUCE-5530) Binary and source incompatibility in mapred.lib.CombineFileInputFormat between branch-1 and branch-2

2013-09-23 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-5530:
-

Status: Patch Available  (was: Open)

 Binary and source incompatibility in mapred.lib.CombineFileInputFormat 
 between branch-1 and branch-2
 

 Key: MAPREDUCE-5530
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5530
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.1.1-beta
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Blocker
 Attachments: MAPREDUCE-5530.patch




[jira] [Commented] (MAPREDUCE-5332) Support token-preserving restart of history server

2013-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775891#comment-13775891
 ] 

Hadoop QA commented on MAPREDUCE-5332:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12604682/MAPREDUCE-5332-7.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapreduce.TestMRJobClient

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapreduce.v2.TestUberAM

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4028//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4028//console

This message is automatically generated.

 Support token-preserving restart of history server
 --

 Key: MAPREDUCE-5332
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5332
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobhistoryserver
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5332-2.patch, MAPREDUCE-5332-3.patch, 
 MAPREDUCE-5332-4.patch, MAPREDUCE-5332-5.patch, MAPREDUCE-5332-5.patch, 
 MAPREDUCE-5332-6.patch, MAPREDUCE-5332-7.patch, MAPREDUCE-5332.patch


 To better support rolling upgrades through a cluster, the history server 
 needs the ability to restart without losing track of delegation tokens.



[jira] [Updated] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Attachment: MAPREDUCE-5508.3.patch

 JobTracker memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 --

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win, 1.2.1
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Attachments: MAPREDUCE-5508.1.patch, MAPREDUCE-5508.2.patch, 
 MAPREDUCE-5508.3.patch, MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak problem but introduced another FileSystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
 void cleanupJob() {
   ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
       new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
   ...
   if (tempDirFs != fs) {
     try {
       fs.close();
     } catch (IOException ie) {
       ...
 }
 {code}
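
The leak pattern above comes down to two variables that may or may not refer to the same handle. The following is a minimal sketch, not the committed patch: closeDistinct and CountingHandle are hypothetical names, the handles are plain java.io.Closeable stand-ins rather than Hadoop FileSystem objects, and the sketch ignores the question of when the CleanupQueue has finished using tempDirFs. It only shows how to close each distinct handle once, whether or not the two variables alias each other.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class CloseDistinctSketch {
    /** Closes each distinct (by identity) non-null handle once; returns how many closed cleanly. */
    static int closeDistinct(Closeable... handles) {
        Set<Closeable> seen =
            Collections.newSetFromMap(new IdentityHashMap<Closeable, Boolean>());
        int closed = 0;
        for (Closeable handle : handles) {
            if (handle == null || !seen.add(handle)) {
                continue; // skip nulls and aliases of a handle already seen
            }
            try {
                handle.close();
                closed++;
            } catch (IOException e) {
                // A real caller would log this; one failed close must not skip the rest.
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        CountingHandle fs = new CountingHandle();
        CountingHandle tempDirFs = new CountingHandle();
        System.out.println(closeDistinct(fs, tempDirFs)); // two distinct handles

        CountingHandle shared = new CountingHandle();
        System.out.println(closeDistinct(shared, shared)); // one handle under two names
    }
}

// Hypothetical stand-in for a FileSystem handle; counts how often close() is called.
final class CountingHandle implements Closeable {
    int closeCalls = 0;

    @Override
    public void close() {
        closeCalls++;
    }
}
```

Comparing by identity rather than equals() mirrors the tempDirFs != fs check in the snippet above.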



[jira] [Commented] (MAPREDUCE-5517) enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb to be less than yarn.app.mapreduce.am.resource.mb

2013-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775905#comment-13775905
 ] 

Hadoop QA commented on MAPREDUCE-5517:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12604660/MAPREDUCE_5517_v3.patch.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app:

org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4029//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4029//console

This message is automatically generated.

 enabling uber mode with 0 reducer still requires mapreduce.reduce.memory.mb 
 to be less than yarn.app.mapreduce.am.resource.mb
 -

 Key: MAPREDUCE-5517
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5517
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.5-alpha
Reporter: Siqi Li
Priority: Minor
 Attachments: MAPREDUCE_5517_v3.patch.txt


 When a job has no reducers, the memory allocated to reducers is irrelevant to 
 whether uber mode can be enabled for the job.
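
The requested behaviour can be sketched as a small predicate. This is an illustration only, not the code in the attached patch: fitsInAm and its parameters are hypothetical names, the values stand for mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and yarn.app.mapreduce.am.resource.mb, and the use of <= for the comparison is an assumption.

```java
final class UberMemoryCheckSketch {
    /**
     * Hypothetical memory check for uber mode: do the job's tasks fit in the AM container?
     * Reducer memory is consulted only when the job actually has reducers.
     */
    static boolean fitsInAm(long mapMemoryMb, long reduceMemoryMb,
                            int numReduces, long amResourceMb) {
        long required = (numReduces == 0)
            ? mapMemoryMb
            : Math.max(mapMemoryMb, reduceMemoryMb);
        return required <= amResourceMb;
    }

    public static void main(String[] args) {
        // Map-only job: the large reducer setting no longer blocks uber mode.
        System.out.println(fitsInAm(1024, 4096, 0, 1536));
        // Same settings with one reducer: the reducer setting matters again.
        System.out.println(fitsInAm(1024, 4096, 1, 1536));
    }
}
```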



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775904#comment-13775904
 ] 

Xi Fang commented on MAPREDUCE-5508:


Thanks, Chris and Sandy. I just finished the large-scale test and did not find 
any memory leak. I removed the tabs and attached a new patch.

So Chris, do you think we should file a new Jira for the idempotent 
implementation?





[jira] [Commented] (MAPREDUCE-5530) Binary and source incompatibility in mapred.lib.CombineFileInputFormat between branch-1 and branch-2

2013-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775939#comment-13775939
 ] 

Hadoop QA commented on MAPREDUCE-5530:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604710/MAPREDUCE-5530.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4031//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4031//console

This message is automatically generated.



[jira] [Commented] (MAPREDUCE-5502) History link in resource manager is broken for KILLED jobs

2013-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775947#comment-13775947
 ] 

Hadoop QA commented on MAPREDUCE-5502:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604638/patch_5502_3.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

  org.apache.hadoop.mapreduce.TestMRJobClient

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient:

org.apache.hadoop.mapreduce.v2.TestUberAM

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4030//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4030//console

This message is automatically generated.

 History link in resource manager is broken for KILLED jobs
 --

 Key: MAPREDUCE-5502
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5502
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.0.5-alpha
Reporter: Vrushali C
Assignee: Vrushali C
  Labels: ui
 Attachments: patch_5502_2.txt, patch_5502_3.txt, patch_5502.txt


 History link in resource manager is broken for KILLED jobs.
 This seems to happen with jobs whose State is 'KILLED' and FinalStatus is 'KILLED'. If 
 the State is 'FINISHED' and the FinalStatus is 'KILLED', then the History link 
 is fine.
 The problem isn't easy to reproduce, since the time at which the app is 
 killed determines the state it ends up in, which is hard to predict. These 
 particular jobs seem to get a Diagnostics message of "Application killed by 
 user.", whereas the other killed jobs get "Kill Job received from client 
 job_1378766187901_0002
 Job received Kill while in RUNNING state."



[jira] [Commented] (MAPREDUCE-5529) Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 and branch-2

2013-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775956#comment-13775956
 ] 

Hadoop QA commented on MAPREDUCE-5529:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12604702/MAPREDUCE-5529.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4032//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4032//console

This message is automatically generated.

 Binary incompatibilities in mapred.lib.TotalOrderPartitioner between branch-1 
 and branch-2
 --

 Key: MAPREDUCE-5529
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5529
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 2.1.1-beta
Reporter: Robert Kanter
Assignee: Robert Kanter
Priority: Blocker
 Attachments: MAPREDUCE-5529.patch


 {{mapred.lib.TotalOrderPartitioner}} in branch-1 has these two methods:
 {code:java}
 public static String getPartitionFile(JobConf job)
 public static void setPartitionFile(JobConf job, Path p)
 {code}
 In branch-2, {{mapred.lib.TotalOrderPartitioner}} is now a subclass of 
 {{mapreduce.lib.partition.TotalOrderPartitioner}}, from which it inherits the 
 similar methods:
 {code:java}
 public static String getPartitionFile(Configuration conf)
 public static void setPartitionFile(Configuration conf, Path p)
 {code}
 This means that any code that does either of the following:
 {code:java}
 TotalOrderPartitioner.setPartitionFile(new JobConf(), new Path("/"));
 String str = TotalOrderPartitioner.getPartitionFile(new JobConf());
 {code}
 will not be binary compatible (that is, if compiled against branch-1, it will 
 throw a {{NoSuchMethodError}} if run against branch-2).
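
Why this is a binary but not a source incompatibility can be shown with stand-in classes. The following is a sketch under stated assumptions: ConfStub, JobConfStub, NewApiPartitioner and OldApiPartitioner are hypothetical stand-ins for Configuration, JobConf and the two TotalOrderPartitioner classes, and the returned string is a placeholder. A source-level call with the subclass argument still compiles through widening, while a lookup by the exact old parameter type fails, which is roughly how the JVM links a call site compiled against branch-1.

```java
// All types in this file are hypothetical stand-ins, not the real Hadoop classes.
final class StaticDescriptorSketch {
    /** True if cls exposes a public method with exactly these parameter types. */
    static boolean resolvesExactly(Class<?> cls, String name, Class<?>... paramTypes) {
        try {
            cls.getMethod(name, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Source level: compiles, because JobConfStub widens to ConfStub.
        System.out.println(OldApiPartitioner.getPartitionFile(new JobConfStub()));
        // Binary level: an exact match on the old parameter type is not found.
        System.out.println(
            resolvesExactly(OldApiPartitioner.class, "getPartitionFile", JobConfStub.class));
        // The inherited new-API signature is found.
        System.out.println(
            resolvesExactly(OldApiPartitioner.class, "getPartitionFile", ConfStub.class));
    }
}

class ConfStub {}

class JobConfStub extends ConfStub {}

// Stand-in for the new-API partitioner that takes the general configuration type.
class NewApiPartitioner {
    public static String getPartitionFile(ConfStub conf) {
        return "partition-file-placeholder";
    }
}

// Stand-in for the old-API partitioner, which only inherits the new-API method.
class OldApiPartitioner extends NewApiPartitioner {}
```

Recompiling the caller against branch-2 is enough in this model, which matches the report that only binary compatibility is affected.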



[jira] [Updated] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered

2013-09-23 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated MAPREDUCE-5505:
---

Attachment: MAPREDUCE-5505.4.patch

Thanks [~bikassaha] for reviewing the patch. I've updated it accordingly.

bq. Typo - already ...

Fixed

bq. Let's use a true value to be backward-compatible in case it gets used.

Fixed

bq. Typo - mock

Fixed

bq. Please put a comment for this non-obvious code. We need it because no one is 
calling shutdown job, right? Alternatively, can MRApp.createJob() be changed to 
call MRAppMaster.shutdown() or set the boolean value to true? This would be 
closer to ideal than the current approach.

Agreed, the code is non-obvious. Instead of setting 
safeToReportTerminationToUser in MRApp.createJob(), I moved it to the 
constructor of MRApp, because safeToReportTerminationToUser is a 
per-MRAppMaster variable.

bq. Can we have a test that verifies the main straight-line case: the job succeeds 
and returns RUNNING until the boolean is set?

Added TestMRApp#testJobSuccess

bq. How can we be sure that the previous state == RUNNING?

It's a general issue. To solve it for all the transitions of JobImpl, I added 
code to remember the last non-final state. Then, whenever 
safeToReportTerminationToUser is false, JobImpl returns the stored previous 
state instead of the final state, i.e., SUCCEEDED, FAILED, KILLED or ERROR.

bq. Has this been tested on a single node cluster with a real job?

Tested locally. The job client saw RUNNING until AM got unregistered.
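
The state-masking approach described in this comment can be sketched as follows. This is not the code from the patch: ReportedStateSketch, transitionTo, markSafeToReport and reportedState are hypothetical names, the state list is reduced to what the comment mentions plus NEW and RUNNING, and only the flag name safeToReportTerminationToUser is taken from the discussion.

```java
final class ReportedStateSketch {
    enum State {
        NEW, RUNNING, SUCCEEDED, FAILED, KILLED, ERROR;

        boolean isFinal() {
            return this == SUCCEEDED || this == FAILED || this == KILLED || this == ERROR;
        }
    }

    private State current = State.NEW;
    private State lastNonFinal = State.NEW;
    private boolean safeToReportTerminationToUser = false;

    /** Records a transition and remembers the most recent non-final state. */
    synchronized void transitionTo(State next) {
        current = next;
        if (!next.isFinal()) {
            lastNonFinal = next;
        }
    }

    /** Called once it is safe to tell clients that the job has terminated. */
    synchronized void markSafeToReport() {
        safeToReportTerminationToUser = true;
    }

    /** State shown to clients: a final state is masked until the flag is set. */
    synchronized State reportedState() {
        if (current.isFinal() && !safeToReportTerminationToUser) {
            return lastNonFinal;
        }
        return current;
    }

    public static void main(String[] args) {
        ReportedStateSketch job = new ReportedStateSketch();
        job.transitionTo(State.RUNNING);
        job.transitionTo(State.SUCCEEDED);
        System.out.println(job.reportedState()); // still the last non-final state
        job.markSafeToReport();
        System.out.println(job.reportedState()); // now the real final state
    }
}
```

Tracking the last non-final state in the transition itself avoids assuming that the previous state was RUNNING.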

 Clients should be notified job finished only after job successfully 
 unregistered 
 -

 Key: MAPREDUCE-5505
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Zhijie Shen
 Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch, 
 MAPREDUCE-5505.3.patch, MAPREDUCE-5505.4.patch


 This is to make sure the user is notified that the job has finished only after 
 the job is really done. This increases client latency but can reduce some 
 races during unregistration, such as YARN-540.



[jira] [Commented] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered

2013-09-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775992#comment-13775992
 ] 

Hadoop QA commented on MAPREDUCE-5505:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12604732/MAPREDUCE-5505.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 5 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs:

  org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl

  The following test timeouts occurred in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs:

org.apache.hadoop.mapreduce.v2.app.TestRMContainerAllocator

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4033//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4033//console

This message is automatically generated.



[jira] [Updated] (MAPREDUCE-5351) JobTracker memory leak caused by CleanupQueue reopening FileSystem

2013-09-23 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated MAPREDUCE-5351:
-

Fix Version/s: 1-win

I have also committed this patch to branch-1-win.

 JobTracker memory leak caused by CleanupQueue reopening FileSystem
 --

 Key: MAPREDUCE-5351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1.1.2
Reporter: Sandy Ryza
Assignee: Sandy Ryza
Priority: Critical
 Fix For: 1-win, 1.2.1

 Attachments: MAPREDUCE-5351-1.patch, MAPREDUCE-5351-2.patch, 
 MAPREDUCE-5351-addendum-1.patch, MAPREDUCE-5351-addendum.patch, 
 MAPREDUCE-5351.patch


 When a job is completed, closeAllForUGI is called to close all the cached 
 FileSystems in the FileSystem cache.  However, the CleanupQueue may run after 
 this occurs and call FileSystem.get() to delete the staging directory, adding 
 a FileSystem to the cache that will never be closed.
 People on the user-list have reported this causing their JobTrackers to OOME 
 every two weeks.
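
The ordering problem described above can be modelled with a tiny cache. This is a simplified sketch, not Hadoop's FileSystem cache: FsCacheLeakSketch, get, closeAllFor and size are hypothetical names, the cache is keyed by a plain user string, and the cached values are bare objects standing in for FileSystem instances.

```java
import java.util.HashMap;
import java.util.Map;

final class FsCacheLeakSketch {
    // Hypothetical cache keyed by user name; values stand in for FileSystem instances.
    private final Map<String, Object> cache = new HashMap<String, Object>();

    /** Get-or-create lookup, standing in for a cached FileSystem.get() call. */
    Object get(String user) {
        Object handle = cache.get(user);
        if (handle == null) {
            handle = new Object();
            cache.put(user, handle);
        }
        return handle;
    }

    /** Drops the entry cached for a user, standing in for closeAllForUGI. */
    void closeAllFor(String user) {
        cache.remove(user);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        FsCacheLeakSketch fsCache = new FsCacheLeakSketch();
        fsCache.get("alice");          // the job uses its file system
        fsCache.closeAllFor("alice");  // the job completes and its entries are closed
        System.out.println(fsCache.size());
        fsCache.get("alice");          // a late cleanup task asks for the file system again
        System.out.println(fsCache.size()); // the new entry has no owner left to close it
    }
}
```

Every completed job that triggers a late lookup leaves one more entry behind, which is consistent with the slow growth reported above.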



[jira] [Resolved] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-5508.
--

      Resolution: Fixed
   Fix Version/s: 1.3.0, 1-win
Target Version/s: 1-win, 1.3.0
    Hadoop Flags: Reviewed

I have committed this to branch-1 and branch-1-win.  Xi, thank you for 
providing a patch for this tricky issue.  Sandy, thank you for help with code 
reviews.

 JobTracker memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 --

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win, 1.2.1
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Fix For: 1-win, 1.3.0

 Attachments: MAPREDUCE-5508.1.patch, MAPREDUCE-5508.2.patch, 
 MAPREDUCE-5508.3.patch, MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak problem but introduced another FileSystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
 void cleanupJob() {
   ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
       new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
   ...
   if (tempDirFs != fs) {
     try {
       fs.close();
     } catch (IOException ie) {
       ...
 }
 {code}



[jira] [Updated] (MAPREDUCE-5505) Clients should be notified job finished only after job successfully unregistered

2013-09-23 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated MAPREDUCE-5505:
---

Attachment: MAPREDUCE-5505.5.patch

Updated the patch to fix the test failure.

 Clients should be notified job finished only after job successfully 
 unregistered 
 -

 Key: MAPREDUCE-5505
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He
Assignee: Zhijie Shen
 Attachments: MAPREDUCE-5505.1.patch, MAPREDUCE-5505.1.patch, 
 MAPREDUCE-5505.3.patch, MAPREDUCE-5505.4.patch, MAPREDUCE-5505.5.patch


 This is to make sure the user is notified that the job has finished only after 
 the job is really done. This increases client latency but can reduce some 
 races during unregistration, such as YARN-540.



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-23 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776014#comment-13776014
 ] 

Chris Nauroth commented on MAPREDUCE-5508:
--

I filed HADOOP-9993 for documenting the requirement that {{FileSystem#close}} 
implementations must be idempotent.
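
For illustration, an idempotent close can be as small as the sketch below. It assumes nothing about any particular FileSystem implementation: IdempotentCloseSketch and releaseCount are hypothetical names, and the counter merely stands in for releasing real resources.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;

final class IdempotentCloseSketch implements Closeable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private int releaseCount = 0; // stands in for releasing real resources

    @Override
    public void close() {
        // compareAndSet lets only the first caller through; later calls are no-ops.
        if (closed.compareAndSet(false, true)) {
            releaseCount++;
        }
    }

    int releaseCount() {
        return releaseCount;
    }

    public static void main(String[] args) {
        IdempotentCloseSketch handle = new IdempotentCloseSketch();
        handle.close();
        handle.close(); // second call has no further effect
        System.out.println(handle.releaseCount());
    }
}
```

With close() behaving this way, a caller that holds the same handle under two names can close both without tracking which one went first.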
