[jira] [Commented] (MAPREDUCE-5379) Include token tracking ids in jobconf

2013-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766097#comment-13766097
 ] 

Hadoop QA commented on MAPREDUCE-5379:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12602922/mr-5379-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3999//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3999//console

This message is automatically generated.

 Include token tracking ids in jobconf
 -

 Key: MAPREDUCE-5379
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission, security
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
 Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, 
 MAPREDUCE-5379.patch, mr-5379-3.patch


 HDFS-4680 enables audit logging of delegation tokens. By storing the tracking 
 ids in the job conf, we can enable tracking what files each job touches.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4680) Job history cleaner should only check timestamps of files in old enough directories

2013-09-13 Thread Robert Kanter (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Kanter updated MAPREDUCE-4680:
-

Attachment: MAPREDUCE-4680.patch

New patch suppresses the 3 new javac warnings (caused by a test) and fixes the 
test failure.

 Job history cleaner should only check timestamps of files in old enough 
 directories
 ---

 Key: MAPREDUCE-4680
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4680
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.0.0-alpha
Reporter: Sandy Ryza
Assignee: Robert Kanter
 Attachments: MAPREDUCE-4680.patch, MAPREDUCE-4680.patch


 Job history files are stored in yyyy/mm/dd folders.  Currently, the job 
 history cleaner checks the modification date of each file in every one of 
 these folders to see whether it's past the maximum age.  The load on HDFS 
 could be reduced by only checking the ages of files in directories that are 
 old enough, as determined by their name.
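
 The proposed directory-name check could be sketched roughly as below. This is 
 an illustrative sketch only, not the actual patch; the class and method names 
 are assumptions, and it assumes the date-based directory layout described 
 above:

 ```java
 import java.time.LocalDate;
 import java.time.temporal.ChronoUnit;

 /** Sketch: decide from a yyyy/mm/dd directory name whether its files can
  *  possibly exceed maxAgeDays, so per-file checks can be skipped otherwise. */
 public class HistoryDirAgeCheck {
     /** dirPath like "2013/09/13"; returns true if the directory's date is
      *  old enough that files inside may need age checks at all. */
     static boolean dirOldEnough(String dirPath, long maxAgeDays, LocalDate today) {
         String[] parts = dirPath.split("/");
         LocalDate dirDate = LocalDate.of(
             Integer.parseInt(parts[0]),
             Integer.parseInt(parts[1]),
             Integer.parseInt(parts[2]));
         // Files in this directory were written on dirDate, so if the
         // directory date itself is not past the cutoff, no file inside can
         // be, and the per-file HDFS status calls are skipped entirely.
         return ChronoUnit.DAYS.between(dirDate, today) > maxAgeDays;
     }

     public static void main(String[] args) {
         LocalDate today = LocalDate.of(2013, 9, 13);
         System.out.println(dirOldEnough("2013/01/01", 30, today)); // old directory
         System.out.println(dirOldEnough("2013/09/12", 30, today)); // recent directory
     }
 }
 ```

 Only directories for which this returns true would then have their files' 
 modification times checked individually, which is where the HDFS load saving 
 comes from.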

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4680) Job history cleaner should only check timestamps of files in old enough directories

2013-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766152#comment-13766152
 ] 

Hadoop QA commented on MAPREDUCE-4680:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12602936/MAPREDUCE-4680.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4000//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4000//console

This message is automatically generated.

 Job history cleaner should only check timestamps of files in old enough 
 directories
 ---

 Key: MAPREDUCE-4680
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4680
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.0.0-alpha
Reporter: Sandy Ryza
Assignee: Robert Kanter
 Attachments: MAPREDUCE-4680.patch, MAPREDUCE-4680.patch


 Job history files are stored in yyyy/mm/dd folders.  Currently, the job 
 history cleaner checks the modification date of each file in every one of 
 these folders to see whether it's past the maximum age.  The load on HDFS 
 could be reduced by only checking the ages of files in directories that are 
 old enough, as determined by their name.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4680) Job history cleaner should only check timestamps of files in old enough directories

2013-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13765857#comment-13765857
 ] 

Hadoop QA commented on MAPREDUCE-4680:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12602866/MAPREDUCE-4680.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 1149 javac 
compiler warnings (more than the trunk's current 1146 warnings).

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs:

  org.apache.hadoop.mapreduce.v2.hs.TestJobHistory

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3997//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3997//artifact/trunk/patchprocess/diffJavacWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/3997//console

This message is automatically generated.

 Job history cleaner should only check timestamps of files in old enough 
 directories
 ---

 Key: MAPREDUCE-4680
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4680
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.0.0-alpha
Reporter: Sandy Ryza
Assignee: Robert Kanter
 Attachments: MAPREDUCE-4680.patch


 Job history files are stored in yyyy/mm/dd folders.  Currently, the job 
 history cleaner checks the modification date of each file in every one of 
 these folders to see whether it's past the maximum age.  The load on HDFS 
 could be reduced by only checking the ages of files in directories that are 
 old enough, as determined by their name.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5505) Clients should be notified job finished after job successfully unregistered

2013-09-13 Thread Jian He (JIRA)
Jian He created MAPREDUCE-5505:
--

 Summary: Clients should be notified job finished after job 
successfully unregistered 
 Key: MAPREDUCE-5505
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5505
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jian He


This is to make sure the user is notified that the job finished only after the 
job is really done. This does increase client latency, but it can reduce some 
races during unregister, like YARN-540.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5329) APPLICATION_INIT is never sent to AuxServices other than the builtin ShuffleHandler

2013-09-13 Thread Avner BenHanoch (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766295#comment-13766295
 ] 

Avner BenHanoch commented on MAPREDUCE-5329:


Hi Siddharth,

My patch is ready for your review.  Its core part is ~25 lines.  The rest is 
mainly tests.

Thanks,
  Avner


 APPLICATION_INIT is never sent to AuxServices other than the builtin 
 ShuffleHandler
 ---

 Key: MAPREDUCE-5329
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5329
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.1.0-beta, 2.0.6-alpha
Reporter: Avner BenHanoch
 Fix For: trunk

 Attachments: MAPREDUCE-5329.patch


 APPLICATION_INIT is never sent to AuxServices other than the built-in 
 ShuffleHandler.  This means that 3rd-party ShuffleProvider(s) will not be 
 able to function, because APPLICATION_INIT enables the AuxiliaryService to 
 map jobId to userId. This is needed for properly finding the MOFs of a job per 
 the reducers' requests.
 NOTE: The built-in ShuffleHandler does get APPLICATION_INIT events due to a 
 hard-coded expression in the Hadoop code. The current TaskAttemptImpl.java code 
 explicitly calls serviceData.put(ShuffleHandler.MAPREDUCE_SHUFFLE_SERVICEID, 
 ...) and ignores any additional AuxiliaryService. As a result, only the 
 built-in ShuffleHandler will get APPLICATION_INIT events; any 3rd-party 
 AuxiliaryService will never get them.
 I think a solution can be done in one of two ways:
 1. Change TaskAttemptImpl.java to loop over all Auxiliary Services and register 
 each of them, by calling serviceData.put(...) in a loop.
 2. Change AuxServices.java, similar to the fix in MAPREDUCE-2668 
 (APPLICATION_STOP is never sent to AuxServices). This means that when the 
 'handle' method gets an APPLICATION_INIT event, it will demultiplex it to all 
 Aux Services regardless of the value in event.getServiceID().
 I prefer the 2nd solution.  I welcome any ideas, and I can provide the 
 needed patch for whichever option people like.
 See the [Pluggable Shuffle in Hadoop 
 documentation|http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html]
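
 The first proposed fix could be sketched as below. This is a hedged sketch 
 only: the method and parameter names are illustrative assumptions, not the 
 attached patch; only the idea (registering service data for every configured 
 aux service instead of one hard-coded id) comes from the description:

 ```java
 import java.nio.ByteBuffer;
 import java.util.HashMap;
 import java.util.Map;

 /** Sketch of option 1: build the container's serviceData map with an entry
  *  for every auxiliary service, so each one receives APPLICATION_INIT. */
 public class AuxServiceInitSketch {
     static Map<String, ByteBuffer> buildServiceData(String[] auxServiceIds,
                                                     ByteBuffer jobTokenData) {
         Map<String, ByteBuffer> serviceData = new HashMap<>();
         // Instead of putting only the built-in shuffle service id, loop over
         // all configured aux service ids so none misses APPLICATION_INIT.
         for (String id : auxServiceIds) {
             serviceData.put(id, jobTokenData.duplicate());
         }
         return serviceData;
     }

     public static void main(String[] args) {
         ByteBuffer token = ByteBuffer.wrap(new byte[] {1, 2, 3});
         Map<String, ByteBuffer> sd = buildServiceData(
             new String[] {"mapreduce_shuffle", "my_shuffle"}, token);
         System.out.println(sd.keySet());
     }
 }
 ```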

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5506) Hadoop-1.1.1 occurs ArrayIndexOutOfBoundsException with MultithreadedMapRunner

2013-09-13 Thread sam liu (JIRA)
sam liu created MAPREDUCE-5506:
--

 Summary: Hadoop-1.1.1 occurs ArrayIndexOutOfBoundsException with 
MultithreadedMapRunner
 Key: MAPREDUCE-5506
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5506
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.1.1
 Environment: RHEL 6.3 x86_64
Reporter: sam liu
Priority: Blocker


After I set:
- 'jobConf.setMapRunnerClass(MultithreadedMapRunner.class);' in MR app
- 'mapred.map.multithreadedrunner.threads = 2' in mapred-site.xml

A simple MR app failed because its Map task encountered an 
ArrayIndexOutOfBoundsException, as below (please ignore the line numbers in the 
exception, as I added some logging code):
java.lang.ArrayIndexOutOfBoundsException
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1331)
at java.io.DataOutputStream.write(DataOutputStream.java:101)
at org.apache.hadoop.io.Text.write(Text.java:282)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1060)
at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:591)
at study.hadoop.mapreduce.sample.WordCount$Map.map(WordCount.java:41)
at study.hadoop.mapreduce.sample.WordCount$Map.map(WordCount.java:1)
at 
org.apache.hadoop.mapred.lib.MultithreadedMapRunner$MapperInvokeRunable.run(MultithreadedMapRunner.java:231)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:738)


The exception happens on the line 'System.arraycopy(b, off, kvbuffer, bufindex, 
len)' in MapTask.java#MapOutputBuffer#Buffer#write(). When the exception 
occurs, 'b.length=4' but 'len=9'. 

Btw, if I set 'mapred.map.multithreadedrunner.threads = 1', no exception 
happens, so it appears to be an issue caused by multiple threads.
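
The symptom is consistent with multiple mapper threads sharing one output 
buffer whose position updates are not synchronized. The sketch below is purely 
illustrative (the names are made up, and this is not Hadoop's actual buffer 
code); it contrasts an unsafe shared-buffer write with a serialized one:

```java
/** Illustrative sketch (not Hadoop code): two threads appending to one shared
 *  byte buffer without synchronization can interleave the copy and the index
 *  update, the kind of corruption the stack trace above suggests. */
public class SharedBufferHazard {
    final byte[] buf = new byte[1 << 20];
    int index = 0; // shared, unsynchronized write position

    // Unsafe: the copy and the index advance are not atomic across threads,
    // so two writers can stomp on each other's region of the buffer.
    void unsafeWrite(byte[] b) {
        System.arraycopy(b, 0, buf, index, b.length);
        index += b.length; // another thread may have moved index meanwhile
    }

    // Safe variant: serialize writers, as a locked collector would.
    synchronized void safeWrite(byte[] b) {
        System.arraycopy(b, 0, buf, index, b.length);
        index += b.length;
    }

    public static void main(String[] args) throws InterruptedException {
        SharedBufferHazard h = new SharedBufferHazard();
        Runnable w = () -> {
            for (int i = 0; i < 1000; i++) h.safeWrite(new byte[] {1, 2, 3});
        };
        Thread t1 = new Thread(w), t2 = new Thread(w);
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println(h.index); // 6000 with the synchronized variant
    }
}
```

With `unsafeWrite` instead, the final index is nondeterministic under 
contention, matching the observation that a single thread never triggers the 
exception.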


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5164) command mapred job and mapred queue omit HADOOP_CLIENT_OPTS

2013-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766508#comment-13766508
 ] 

Hudson commented on MAPREDUCE-5164:
---

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1547 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1547/])
MAPREDUCE-5164. mapred job and queue commands omit HADOOP_CLIENT_OPTS. 
Contributed by Nemon Lou. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1522595)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred.cmd


 command  mapred job and mapred queue omit HADOOP_CLIENT_OPTS 
 -

 Key: MAPREDUCE-5164
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5164
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Nemon Lou
Assignee: Nemon Lou
 Fix For: 2.1.1-beta

 Attachments: MAPREDUCE-5164.patch, MAPREDUCE-5164.patch, 
 MAPREDUCE-5164.patch, MAPREDUCE-5164.patch


 HADOOP_CLIENT_OPTS does not take effect when typing 'mapred job -list' or 
 'mapred queue -list'.
 The mapred script omits it. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5164) command mapred job and mapred queue omit HADOOP_CLIENT_OPTS

2013-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766400#comment-13766400
 ] 

Hudson commented on MAPREDUCE-5164:
---

SUCCESS: Integrated in Hadoop-Yarn-trunk #331 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/331/])
MAPREDUCE-5164. mapred job and queue commands omit HADOOP_CLIENT_OPTS. 
Contributed by Nemon Lou. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1522595)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred.cmd


 command  mapred job and mapred queue omit HADOOP_CLIENT_OPTS 
 -

 Key: MAPREDUCE-5164
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5164
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Nemon Lou
Assignee: Nemon Lou
 Fix For: 2.1.1-beta

 Attachments: MAPREDUCE-5164.patch, MAPREDUCE-5164.patch, 
 MAPREDUCE-5164.patch, MAPREDUCE-5164.patch


 HADOOP_CLIENT_OPTS does not take effect when typing 'mapred job -list' or 
 'mapred queue -list'.
 The mapred script omits it. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5164) command mapred job and mapred queue omit HADOOP_CLIENT_OPTS

2013-09-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766470#comment-13766470
 ] 

Hudson commented on MAPREDUCE-5164:
---

FAILURE: Integrated in Hadoop-Hdfs-trunk #1521 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1521/])
MAPREDUCE-5164. mapred job and queue commands omit HADOOP_CLIENT_OPTS. 
Contributed by Nemon Lou. (devaraj: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1522595)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred
* /hadoop/common/trunk/hadoop-mapreduce-project/bin/mapred.cmd


 command  mapred job and mapred queue omit HADOOP_CLIENT_OPTS 
 -

 Key: MAPREDUCE-5164
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5164
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.0.3-alpha
Reporter: Nemon Lou
Assignee: Nemon Lou
 Fix For: 2.1.1-beta

 Attachments: MAPREDUCE-5164.patch, MAPREDUCE-5164.patch, 
 MAPREDUCE-5164.patch, MAPREDUCE-5164.patch


 HADOOP_CLIENT_OPTS does not take effect when typing 'mapred job -list' or 
 'mapred queue -list'.
 The mapred script omits it. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-5507) MapReduce reducer preemption gets hanged

2013-09-13 Thread Omkar Vinit Joshi (JIRA)
Omkar Vinit Joshi created MAPREDUCE-5507:


 Summary: MapReduce reducer preemption gets hanged
 Key: MAPREDUCE-5507
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5507
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Omkar Vinit Joshi


Today, if we set yarn.app.mapreduce.am.job.reduce.rampup.limit and 
mapreduce.job.reduce.slowstart.completedmaps, reducers are launched more 
aggressively. However, the calculation for whether to ramp reducers up or down 
is not done in the most optimal way. 
* If the MR AM at any point sees a situation like 
** scheduledMaps : 30
** scheduledReducers : 10
** assignedMaps : 0
** assignedReducers : 11
** finishedMaps : 120
** headroom : 756 (when your map/reduce task needs only 512mb)
* then today it simply hangs, because it thinks there is sufficient room to 
launch one more mapper and therefore no need to ramp down. However, if this 
continues forever, that is not the correct / optimal behavior.
* Ideally, when the MR AM sees that assignedMaps has dropped to 0 while 
reducers are still running, it should wait for a certain time (upper-bounded by 
the average map task completion time, for heuristic's sake), but after that, if 
it still doesn't get a new container for a map task, it should preempt the 
reducers one by one at some interval and ramp back up slowly.
** Preemption of a reducer can be done in a slightly smarter way:
*** preempt a reducer on a node manager for which there is a pending map 
request;
*** otherwise, preempt any other reducer. The MR AM will contribute to getting 
a new mapper by releasing such a reducer / container, because doing so reduces 
its cluster consumption and thereby may make it a candidate for an allocation.
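
The proposed heuristic could be sketched as a small decision function. All 
names and thresholds below are illustrative assumptions for discussion, not 
the eventual patch:

```java
/** Sketch of the proposed ramp-down rule: if maps are pending but none are
 *  assigned, reducers hold containers, and we have waited longer than the
 *  average map runtime without a map allocation, preempt one reducer. */
public class ReducePreemptionSketch {
    static boolean shouldPreemptOneReducer(int scheduledMaps, int assignedMaps,
                                           int assignedReducers,
                                           long msSinceLastMapAllocation,
                                           long avgMapTaskMs) {
        boolean mapsStarved = scheduledMaps > 0 && assignedMaps == 0;
        boolean waitedLongEnough = msSinceLastMapAllocation > avgMapTaskMs;
        return mapsStarved && assignedReducers > 0 && waitedLongEnough;
    }

    public static void main(String[] args) {
        // The hang from the description: 30 scheduled maps, none assigned,
        // 11 reducers holding containers, wait past the average map time:
        // preempt a reducer instead of waiting forever.
        System.out.println(shouldPreemptOneReducer(30, 0, 11, 120_000, 60_000));
    }
}
```

Picking which reducer to preempt (a node with a pending map request first, any 
reducer otherwise) would be a separate selection step on top of this check.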

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (MAPREDUCE-5507) MapReduce reducer preemption gets hanged

2013-09-13 Thread Omkar Vinit Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omkar Vinit Joshi reassigned MAPREDUCE-5507:


Assignee: Omkar Vinit Joshi

 MapReduce reducer preemption gets hanged
 

 Key: MAPREDUCE-5507
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5507
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Omkar Vinit Joshi
Assignee: Omkar Vinit Joshi

 Today, if we set yarn.app.mapreduce.am.job.reduce.rampup.limit and 
 mapreduce.job.reduce.slowstart.completedmaps, reducers are launched more 
 aggressively. However, the calculation for whether to ramp reducers up or down 
 is not done in the most optimal way. 
 * If the MR AM at any point sees a situation like 
 ** scheduledMaps : 30
 ** scheduledReducers : 10
 ** assignedMaps : 0
 ** assignedReducers : 11
 ** finishedMaps : 120
 ** headroom : 756 (when your map/reduce task needs only 512mb)
 * then today it simply hangs, because it thinks there is sufficient room 
 to launch one more mapper and therefore no need to ramp down. 
 However, if this continues forever, that is not the correct / optimal 
 behavior.
 * Ideally, when the MR AM sees that assignedMaps has dropped to 0 
 while reducers are still running, it should wait for a certain time 
 (upper-bounded by the average map task completion time, for heuristic's 
 sake), but after that, if it still doesn't get a new container for a map 
 task, it should preempt the reducers one by one at some interval and ramp 
 back up slowly.
 ** Preemption of a reducer can be done in a slightly smarter way:
 *** preempt a reducer on a node manager for which there is a pending map 
 request;
 *** otherwise, preempt any other reducer. The MR AM will contribute to getting 
 a new mapper by releasing such a reducer / container, because doing so reduces 
 its cluster consumption and thereby may make it a candidate for an allocation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5379) Include token tracking ids in jobconf

2013-09-13 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766901#comment-13766901
 ] 

Andrew Wang commented on MAPREDUCE-5379:


Thanks Karthik, the patch looks good to me. As I'm not well-versed in the ways 
of MR, it'd be good to get confirmation from someone else as well.

 Include token tracking ids in jobconf
 -

 Key: MAPREDUCE-5379
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission, security
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
 Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, 
 MAPREDUCE-5379.patch, mr-5379-3.patch


 HDFS-4680 enables audit logging of delegation tokens. By storing the tracking 
 ids in the job conf, we can enable tracking what files each job touches.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5379) Include token tracking ids in jobconf

2013-09-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13766896#comment-13766896
 ] 

Karthik Kambatla commented on MAPREDUCE-5379:
-

I verified this manually on both non-secure and secure clusters. On the secure 
cluster, the tracking id shows up in the jobconf. On the non-secure cluster, I 
made sure there were no regressions and the jobs ran fine. 

 Include token tracking ids in jobconf
 -

 Key: MAPREDUCE-5379
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission, security
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
 Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, 
 MAPREDUCE-5379.patch, mr-5379-3.patch


 HDFS-4680 enables audit logging of delegation tokens. By storing the tracking 
 ids in the job conf, we can enable tracking what files each job touches.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5332) Support token-preserving restart of history server

2013-09-13 Thread Jason Lowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5332:
--

Attachment: MAPREDUCE-5332-6.patch

Minor tweak to the patch to set the permissions on the file during the create, 
which should reduce the number of RPC calls when using HDFS as the filesystem.

 Support token-preserving restart of history server
 --

 Key: MAPREDUCE-5332
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5332
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobhistoryserver
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5332-2.patch, MAPREDUCE-5332-3.patch, 
 MAPREDUCE-5332-4.patch, MAPREDUCE-5332-5.patch, MAPREDUCE-5332-5.patch, 
 MAPREDUCE-5332-6.patch, MAPREDUCE-5332.patch


 To better support rolling upgrades through a cluster, the history server 
 needs the ability to restart without losing track of delegation tokens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5379) Include token tracking ids in jobconf

2013-09-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5379:


Status: Open  (was: Patch Available)

 Include token tracking ids in jobconf
 -

 Key: MAPREDUCE-5379
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission, security
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
 Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, 
 MAPREDUCE-5379.patch, mr-5379-3.patch


 HDFS-4680 enables audit logging of delegation tokens. By storing the tracking 
 ids in the job conf, we can enable tracking what files each job touches.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-5379) Include token tracking ids in jobconf

2013-09-13 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5379:


Attachment: mr-5379-4.patch

Updated patch to not use a Joiner, and use conf.setStrings instead.
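
For context, Hadoop's Configuration.setStrings stores a String[] as a single 
comma-separated property value, which is why a separate Joiner is unnecessary. 
The self-contained sketch below emulates that round-trip behavior with plain 
JDK classes (the property name is illustrative, not necessarily the one the 
patch uses, and this is not Hadoop's actual implementation):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Self-contained sketch of the setStrings/getStrings round trip: a list of
 *  tracking ids goes in as varargs and comes back out as a list, with the
 *  comma-joining handled internally rather than by a caller-side Joiner. */
public class TrackingIdsConfSketch {
    static final Map<String, String> conf = new HashMap<>();

    static void setStrings(String key, String... values) {
        conf.put(key, String.join(",", values)); // comma-joined internally
    }

    static List<String> getStrings(String key) {
        return Arrays.asList(conf.get(key).split(","));
    }

    public static void main(String[] args) {
        // Illustrative property name for the token tracking ids.
        setStrings("mapreduce.job.token.tracking.ids", "id1", "id2", "id3");
        System.out.println(getStrings("mapreduce.job.token.tracking.ids"));
    }
}
```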

 Include token tracking ids in jobconf
 -

 Key: MAPREDUCE-5379
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission, security
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
 Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, 
 MAPREDUCE-5379.patch, mr-5379-3.patch, mr-5379-4.patch


 HDFS-4680 enables audit logging of delegation tokens. By storing the tracking 
 ids in the job conf, we can enable tracking what files each job touches.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5379) Include token tracking ids in jobconf

2013-09-13 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767055#comment-13767055
 ] 

Karthik Kambatla commented on MAPREDUCE-5379:
-

Manually verified the updated patch as well - the tracking id shows up in the 
jobconf.

 Include token tracking ids in jobconf
 -

 Key: MAPREDUCE-5379
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5379
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission, security
Affects Versions: 2.1.0-beta
Reporter: Sandy Ryza
Assignee: Karthik Kambatla
 Attachments: MAPREDUCE-5379-1.patch, MAPREDUCE-5379-2.patch, 
 MAPREDUCE-5379.patch, mr-5379-3.patch, mr-5379-4.patch


 HDFS-4680 enables audit logging of delegation tokens. By storing the tracking 
 ids in the job conf, we can enable tracking what files each job touches.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4421) Remove dependency on deployed MR jars

2013-09-13 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13767146#comment-13767146
 ] 

Jason Lowe commented on MAPREDUCE-4421:
---

Thanks for the review, Hitesh!

bq. Why does classpath need to include all of common, hdfs and yarn jar 
locations? Assuming that MR is running on a YARN-based cluster, shouldn't the 
location of the core dependencies come from the cluster deployment, i.e. via 
the env that the NM sets for a container? I believe the only jars that MR 
should have in its uploaded tarball should be the client jars. I understand 
that there is no clear boundary for client-side-only jars for common and hdfs 
today (for YARN, I believe it should be simple to split out the client-side 
requirements), but it is something we should aim for, or we should assume that 
the jars deployed on the cluster are compatible.

This is primarily for avoiding jar conflicts and removing dependencies on the 
nodes.  If the cluster upgrades and picks up a new version of 
jackson/jersey/guava/name-your-favorite-jar-that-breaks-apps-when-updated then 
that means existing apps can suddenly break due to jar conflicts.  Another case 
we've seen is when a dependency jar is dropped between versions, and apps were 
depending upon those to be provided by Hadoop.  Having the apps provide all of 
their dependencies means we can focus on just the RPC layer compatibilities 
(something we have to solve anyway) rather than have to worry as well about the 
myriad of combinations between jars within the app and those being picked up 
from the nodes.

However if desired the user could configure it to work with just a partial 
tarball by setting the classpath to pickup the jars on the nodes via 
HADOOP_COMMON_HOME/HADOOP_HDFS_HOME/HADOOP_YARN_HOME references in the 
classpath like MRApps is doing today.

bq. I would vote to make the tar-ball in HDFS be the only way to run MR on 
YARN. Obviously, this cannot be done for 2.x but we should move to this model 
on trunk and not support the current approach at all there. Comments?

I'm all for it, and I see this as being a stepping stone to getting there.  
We'd like to have the ability to run out of HDFS in 2.x as a potential way to 
do a rolling upgrade of bugfixes in the MR framework.  It probably won't be a 
complete solution to all forms of upgrades (i.e.: what if the client code or 
ShuffleHandler needs the fix), but it could still be very useful in practice.

bq. The other point is related to configs.

Yes, final parameter configs on the nodes conflicting with the job.xml settings 
are another concern.  In practice I don't expect that to be a common issue, but 
it is something we should try to address in a followup JIRA.

bq. How do you see framework name extracted from the path to be used? Is it 
just a safety check to ensure that it is found in the classpath? Will it have 
any relation to a version?

I see the framework fragment alias primarily used for sanity-checks in case 
the classpath wasn't updated when using a specified framework and to allow the 
classpath settings to be a bit more general.  For example, ops could configure 
the classpath once based on an expected framework tarball layout (e.g.: 
mrframework/share/mapreduce/*, mrframework/share/mapreduce/lib/*, etc.) and 
different versions of the tarball can be used without modifying the classpath 
as long as they match that layout, e.g.: mrtarball-2.3.1.tgz#mrframework, 
mrtarball-2.3.4.tgz#mrframework, etc.  It's sort of like the assumed-layout 
approach from your last comment.  Ops could set the classpath and users could 
select the framework version without having to set the classpath as long as the 
layout is compatible.  Users could still override the classpath if using a 
framework that isn't compatible with the assumed layout.

One problem with the common classpath approach is that the archives need to 
have the same directory structure, so top-level directories with the version 
number in them break it.  The tarballs deployed to HDFS would have to be 
reorganized to have a common dir name rather than the versioned name.  Not 
difficult to do, but it is annoying.
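Concretely, the ops-configured layout described above could look like the following mapred-site.xml fragment. The property names and HDFS path are illustrative assumptions (naming was still being settled in this JIRA), not final:

{code}
<!-- Illustrative only: property names and paths are assumptions -->
<property>
  <name>mapreduce.application.framework.path</name>
  <value>hdfs:///apps/mr/mrtarball-2.3.1.tgz#mrframework</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>mrframework/share/mapreduce/*,mrframework/share/mapreduce/lib/*</value>
</property>
{code}

Upgrading to mrtarball-2.3.4.tgz would then only require changing the framework 
path, as long as the new tarball keeps the same internal layout.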

bq. A minor nit - framework name seems confusing in relation to the framework 
name in use from earlier i.e yarn vs local framework.

Yeah, that's true.  I'm open to suggestions for what to call this instead of 
framework.

bq. Regarding versions, it seems like users will need to do 2 things. Change 
the location of the tarball on HDFS and modify the classpath. Users will need 
to know the exact structure of the classpath.  In such a scenario, do defaults 
even make sense?

I wanted this to be flexible so ops/users could decide how to organize the 
framework (i.e.: partial/complete tarball, monolithic jar, whatever) and be 
able to set the classpath accordingly.  I thought about hardcoding the 
assumption of the layout, but then that 

[jira] [Updated] (MAPREDUCE-5504) mapred queue -info inconsistent with types

2013-09-13 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-5504:
--

Attachment: MAPREDUCE-5504.patch

Hi,
I've created a patch for this issue.

 mapred queue -info inconsistent with types
 --

 Key: MAPREDUCE-5504
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5504
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.23.9
Reporter: Thomas Graves
 Attachments: MAPREDUCE-5504.patch


 $ mapred queue -info default
 ==
 Queue Name : default
 Queue State : running
 Scheduling Info : Capacity: 4.0, MaximumCapacity: 0.67, CurrentCapacity: 
 0.9309831
 The capacity is displayed as a percentage (4), whereas maximum capacity is 
 displayed as an absolute fraction (0.67) instead of 67%.
 We should make these consistent in the type we are displaying.
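A fix would normalize all the capacity values to one representation before printing. A hypothetical formatting helper, assuming the raw scheduler values are fractions in [0, 1]:

```java
import java.util.Locale;

class QueueInfoFormat {
    // Render a scheduler capacity value, assumed to be a fraction in [0, 1],
    // as a percentage string so every field prints the same way.
    static String asPercent(float fraction) {
        return String.format(Locale.ROOT, "%.1f%%", fraction * 100f);
    }

    public static void main(String[] args) {
        System.out.println("MaximumCapacity : " + asPercent(0.67f));      // 67.0%
        System.out.println("CurrentCapacity : " + asPercent(0.9309831f)); // 93.1%
    }
}
```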



[jira] [Updated] (MAPREDUCE-5504) mapred queue -info inconsistent with types

2013-09-13 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-5504:
--

Target Version/s: 3.0.0, 0.23.10  (was: 0.23.10)
  Status: Patch Available  (was: Open)



[jira] [Commented] (MAPREDUCE-5379) Include token tracking ids in jobconf

2013-09-13 Thread Alejandro Abdelnur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767170#comment-13767170
 ] 

Alejandro Abdelnur commented on MAPREDUCE-5379:
---

+1 LGTM



[jira] [Commented] (MAPREDUCE-5504) mapred queue -info inconsistent with types

2013-09-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767188#comment-13767188
 ] 

Hadoop QA commented on MAPREDUCE-5504:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12603154/MAPREDUCE-5504.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4002//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4002//console

This message is automatically generated.



[jira] [Created] (MAPREDUCE-5508) Memory Leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)
Xi Fang created MAPREDUCE-5508:
--

 Summary: Memory Leak caused by unreleased FileSystem objects in 
JobInProgress#cleanupJob
 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical


MAPREDUCE-5351 fixed a memory leak problem but introduced another FileSystem 
object that is not properly released.
{code}
// JobInProgress#cleanupJob()
void cleanupJob() {
  ...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
      new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
  ...
{code}




[jira] [Updated] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Summary: Memory leak caused by unreleased FileSystem objects in 
JobInProgress#cleanupJob  (was: Memory Leak caused by unreleased FileSystem 
objects in JobInProgress#cleanupJob)

 Memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 ---

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical

 MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem 
 object that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
 {code}



[jira] [Commented] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767210#comment-13767210
 ] 

Xi Fang commented on MAPREDUCE-5508:


This bug was found in Microsoft's large-scale test with about 200,000 job 
submissions. Memory usage grows steadily over time.

There was a long discussion between Hortonworks (thanks [~cnauroth] and 
[~vinodkv]) and Microsoft on this issue. Here is a summary of the discussion.

1. The heap dumps show DistributedFileSystem instances that are referred to 
only from the cache's HashMap entries. Since nothing else holds a reference, 
nothing else can ever attempt to close them, and therefore they will never be 
removed from the cache.

2. The special check for tempDirFs (see code in description) in the patch for 
MAPREDUCE-5351 is intended as an optimization so that CleanupQueue doesn't need 
to immediately reopen a FileSystem that was just closed. However, we observed 
different identity hash code values for the Subject in the cache key. The code 
assumes that CleanupQueue will find the same Subject that was used inside 
JobInProgress. Unfortunately, this is not guaranteed, because we may have 
crossed into a different access control context at this point, via 
UserGroupInformation#doAs. Even though it is conceptually the same user, the 
Subject is a function of the current AccessControlContext:
{code}
  public synchronized
  static UserGroupInformation getCurrentUser() throws IOException {
AccessControlContext context = AccessController.getContext();
Subject subject = Subject.getSubject(context);
{code}
Even if the contexts are logically equivalent between JobInProgress and 
CleanupQueue, we see no guarantee that Java will give you the same Subject 
instance, which is required for successful lookup in the FileSystem cache 
(because of the use of identity hash code).

A fix is to abandon this optimization and close the FileSystem within the same 
AccessControlContext that opened it.
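The cache-miss mechanics can be reproduced without Hadoop. The sketch below is a toy cache keyed by subject identity, mimicking (not reproducing) how the UGI-keyed FileSystem cache behaves; {{Subject}} here is a stand-in class, not javax.security.auth.Subject:

```java
import java.util.HashMap;
import java.util.Map;

class IdentityCacheDemo {
    // Toy stand-in for a Subject: two instances can represent the same
    // logical user while still being distinct objects.
    static final class Subject {
        final String user;
        Subject(String user) { this.user = user; }
    }

    // Cache keyed by identity hash code, mimicking how the UGI-based
    // FileSystem cache distinguishes entries by Subject identity.
    static final Map<Integer, String> cache = new HashMap<>();

    static void put(Subject s, String fs) {
        cache.put(System.identityHashCode(s), fs);
    }

    static String lookup(Subject s) {
        return cache.get(System.identityHashCode(s));
    }

    public static void main(String[] args) {
        Subject opened = new Subject("jobtracker");
        put(opened, "DistributedFileSystem@1");

        // A logically identical subject from a different doAs context is a
        // different instance, so the lookup misses; a second FileSystem
        // would be opened and the first entry stranded in the cache.
        Subject sameUserNewContext = new Subject("jobtracker");
        System.out.println(lookup(opened) != null);             // true
        System.out.println(lookup(sameUserNewContext) == null); // true
    }
}
```

Two logically identical subjects created in different contexts miss each 
other's entries, so each doAs round-trip can open, and strand, a fresh 
FileSystem.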




[jira] [Updated] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Description: 
MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem 
object (see tempDirFs) that is not properly released.
{code} JobInProgress#cleanupJob()

  void cleanupJob() {
...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
  new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
...
{code}


  was:
MAPREDUCE-5351 fixed a memory leak problem but introducing another filesystem 
object that is properly released.
{code} JobInProgress#cleanupJob()

  void cleanupJob() {
...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
  new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
...
{code}



 Memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 ---

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical

 MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
 {code}



[jira] [Updated] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Description: 
MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem 
object (see tempDirFs) that is not properly released.
{code} JobInProgress#cleanupJob()

  void cleanupJob() {
...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
  new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
...
 if (tempDirFs != fs) {
  try {
fs.close();
  } catch (IOException ie) {
...
}
{code}


  was:
MAPREDUCE-5351 fixed a memory leak problem but introducing another filesystem 
object (see tempDirFs) that is not properly released.
{code} JobInProgress#cleanupJob()

  void cleanupJob() {
...
  tempDirFs = jobTempDirPath.getFileSystem(conf);
  CleanupQueue.getInstance().addToQueue(
  new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
...
{code}



 Memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 ---

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical

 MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
  if (tempDirFs != fs) {
   try {
 fs.close();
   } catch (IOException ie) {
 ...
 }
 {code}



[jira] [Updated] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Attachment: MAPREDUCE-5508.patch

 Memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 ---

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Attachments: MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
  if (tempDirFs != fs) {
   try {
 fs.close();
   } catch (IOException ie) {
 ...
 }
 {code}



[jira] [Work started] (MAPREDUCE-5508) Memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on MAPREDUCE-5508 started by Xi Fang.



[jira] [Updated] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Xi Fang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xi Fang updated MAPREDUCE-5508:
---

Summary: JobTracker memory leak caused by unreleased FileSystem objects in 
JobInProgress#cleanupJob  (was: Memory leak caused by unreleased FileSystem 
objects in JobInProgress#cleanupJob)

 JobTracker memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 --

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Attachments: MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
  if (tempDirFs != fs) {
   try {
 fs.close();
   } catch (IOException ie) {
 ...
 }
 {code}



[jira] [Updated] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated MAPREDUCE-5508:
--

Affects Version/s: 1.2.1

 JobTracker memory leak caused by unreleased FileSystem objects in 
 JobInProgress#cleanupJob
 --

 Key: MAPREDUCE-5508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 1-win, 1.2.1
Reporter: Xi Fang
Assignee: Xi Fang
Priority: Critical
 Attachments: MAPREDUCE-5508.patch


 MAPREDUCE-5351 fixed a memory leak problem but introduced another filesystem 
 object (see tempDirFs) that is not properly released.
 {code} JobInProgress#cleanupJob()
   void cleanupJob() {
 ...
   tempDirFs = jobTempDirPath.getFileSystem(conf);
   CleanupQueue.getInstance().addToQueue(
   new PathDeletionContext(jobTempDirPath, conf, userUGI, jobId));
 ...
  if (tempDirFs != fs) {
   try {
 fs.close();
   } catch (IOException ie) {
 ...
 }
 {code}



[jira] [Commented] (MAPREDUCE-5508) JobTracker memory leak caused by unreleased FileSystem objects in JobInProgress#cleanupJob

2013-09-13 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767308#comment-13767308
 ] 

Sandy Ryza commented on MAPREDUCE-5508:
---

Have you tested this fix?  I took a deeper look into this, and it doesn't 
appear that tempDirFs and fs ever end up equal, because tempDirFs is created 
with the wrong UGI.

The deeper problem to me is that we are creating a new UGI, which can have a 
new subject, which can create a new entry in the FS cache, every time 
CleanupQueue#deletePath is called with a null UGI.  This occurs here:
{code}
CleanupQueue.getInstance().addToQueue(
new PathDeletionContext(tempDir, conf));
{code}

A better fix would be to avoid this, either by having CleanupQueue hold a UGI 
of the login user for use in these situations, or by avoiding the doAs entirely 
when the given UGI is null.
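The first option amounts to resolving every null UGI to one shared login-user identity, so repeated deletions reuse a single cache entry. A Hadoop-free sketch of that pattern ({{Ugi}} is a stand-in for {{UserGroupInformation}}, and the real code would obtain it via {{UserGroupInformation.getLoginUser()}}):

```java
class CleanupQueueSketch {
    // Stand-in for UserGroupInformation.
    static final class Ugi {
        final String name;
        Ugi(String name) { this.name = name; }
    }

    // One shared identity for deletions submitted without an explicit UGI,
    // so they all resolve to the same FileSystem cache entry.
    private static final Ugi LOGIN_USER = new Ugi("jobtracker");

    static Ugi resolveUgi(Ugi requested) {
        // Avoid minting a fresh identity (and a fresh cache entry) per call.
        return requested != null ? requested : LOGIN_USER;
    }

    public static void main(String[] args) {
        // Two null-UGI deletions resolve to the very same identity object.
        System.out.println(resolveUgi(null) == resolveUgi(null)); // true
    }
}
```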
