[jira] [Created] (MAPREDUCE-3101) Security issues in YARN

2011-09-26 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Security issues in YARN
---

 Key: MAPREDUCE-3101
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3101
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


Most of the chassis for security in YARN is set up and is working. There are 
known bugs and security holes though. This JIRA is an umbrella ticket for 
tracking those.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-3102) NodeManager should fail fast with wrong configuration or permissions for LinuxContainerExecutor

2011-09-26 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
NodeManager should fail fast with wrong configuration or permissions for 
LinuxContainerExecutor
---

 Key: MAPREDUCE-3102
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3102
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3103) Implement Job ACLs for MRAppMaster

2011-09-26 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Implement Job ACLs for MRAppMaster
--

 Key: MAPREDUCE-3103
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3103
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3104) Implement Application ACLs, Queue ACLs and their interaction

2011-09-26 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Implement Application ACLs, Queue ACLs and their interaction


 Key: MAPREDUCE-3104
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3104
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3105) NM<->RM shared secrets should be rolled every so often.

2011-09-26 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
NM<->RM shared secrets should be rolled every so often. 


 Key: MAPREDUCE-3105
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3105
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3121) NodeManager should handle disk-failures

2011-09-29 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
NodeManager should handle disk-failures
---

 Key: MAPREDUCE-3121
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3121
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


This is akin to MAPREDUCE-2413 but for YARN's NodeManager. We want to minimize 
the impact of transient/permanent disk failures on containers. With a larger 
number of disks per node, the ability to continue to run containers on other 
disks is crucial.
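
One way to picture the idea, as a minimal sketch and not the NodeManager's actual design: probe each configured local directory and keep using whichever ones still pass. The function names and the write-probe check below are assumptions made for illustration only.

```python
import os
import tempfile


def is_dir_healthy(path):
    """Return True if `path` is a directory we can create and delete a file in."""
    if not os.path.isdir(path):
        return False
    try:
        # A tiny write probe: a failed or read-only disk typically fails here.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


def usable_local_dirs(configured_dirs):
    """Filter the configured directories down to the ones that pass the probe."""
    return [d for d in configured_dirs if is_dir_healthy(d)]
```

Re-running such a probe periodically would let a node drop a bad disk and pick it up again if the failure was transient.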





[jira] [Created] (MAPREDUCE-3141) Yarn+MR secure mode is broken, uncovered after MAPREDUCE-3056

2011-10-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Yarn+MR secure mode is broken, uncovered after MAPREDUCE-3056
-

 Key: MAPREDUCE-3141
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3141
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3143) Complete aggregation of user-logs spit out by containers onto DFS

2011-10-05 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Complete aggregation of user-logs spit out by containers onto DFS
-

 Key: MAPREDUCE-3143
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3143
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


The feature for handling user-logs spit out by containers is already 
implemented in NodeManager, but it is currently disabled due to user-interface 
issues.

This is the umbrella ticket for tracking the pending bugs w.r.t. putting 
container-logs on DFS.





[jira] [Created] (MAPREDUCE-3144) Augment JobHistory to include information needed for serving aggregated logs.

2011-10-05 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Augment JobHistory to include information needed for serving aggregated logs.
-

 Key: MAPREDUCE-3144
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3144
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.0








[jira] [Created] (MAPREDUCE-3145) Fix NM UI to serve logs from DFS once application finishes

2011-10-05 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Fix NM UI to serve logs from DFS once application finishes
--

 Key: MAPREDUCE-3145
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3145
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3146) Add a MR specific command line to dump logs for a given TaskAttemptID

2011-10-05 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Add a MR specific command line to dump logs for a given TaskAttemptID
-

 Key: MAPREDUCE-3146
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3146
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3152) Miscellaneous web UI issues

2011-10-07 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Miscellaneous web UI issues
---

 Key: MAPREDUCE-3152
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3152
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.0


We need to fix the following issues on YARN web-UI:
 - Remove the "Note" column from the application list. When a failure happens, 
this "Note" spoils the table layout.
 - When the Application is still not running, the Tracking UI should be titled 
"UNASSIGNED". For some reason it is titled "ApplicationMaster", but it 
(correctly) links to "#".
 - The per-application page has all the RM related information like version, 
start-time etc. Must be some accidental change by one of the patches.
 - The diagnostics for a failed app on the per-application page don't retain 
newlines; the text simply wraps around, which makes it hard to read.





[jira] [Created] (MAPREDUCE-3153) TestFileOutputCommitter.testFailAbort() is failing on trunk on Jenkins

2011-10-07 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
TestFileOutputCommitter.testFailAbort() is failing on trunk on Jenkins
--

 Key: MAPREDUCE-3153
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3153
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


This is most likely caused by MAPREDUCE-2702.





[jira] [Created] (MAPREDUCE-3164) Decorate event transitions and the event-types with their behaviour

2011-10-10 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Decorate event transitions and the event-types with their behaviour
---

 Key: MAPREDUCE-3164
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3164
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


Helps to annotate the transitions with (start-state, end-state) pair and the 
events with (source, destination) pair.

Not just readability, we may also use them to generate the event diagrams 
across components.

Not a blocker for 0.23, but let's see.
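
A rough sketch of what such decoration could look like, assuming a registry that records each handler's (start-state, end-state) pair so a diagram can be generated from it later. The state names, handler names and the `transition` decorator are all hypothetical; events could be tagged with a (source, destination) pair in the same way.

```python
TRANSITIONS = []  # (start_state, end_state, handler_name), in registration order


def transition(start, end):
    """Decorator that tags a handler with its (start-state, end-state) pair."""
    def wrap(func):
        func.transition = (start, end)
        TRANSITIONS.append((start, end, func.__name__))
        return func
    return wrap


@transition("NEW", "INITED")
def on_init(event):
    return "INITED"


@transition("INITED", "RUNNING")
def on_start(event):
    return "RUNNING"


def to_dot(transitions):
    """Render the recorded pairs as a Graphviz DOT digraph."""
    lines = ["digraph states {"]
    for start, end, name in transitions:
        lines.append('  "%s" -> "%s" [label="%s"];' % (start, end, name))
    lines.append("}")
    return "\n".join(lines)
```

Calling `to_dot(TRANSITIONS)` yields text that Graphviz can draw, so the diagram follows the annotations instead of being maintained by hand.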





[jira] [Created] (MAPREDUCE-3172) Add cluster-level stats available via RPCs

2011-10-11 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Add cluster-level stats available via RPCs
-

 Key: MAPREDUCE-3172
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3172
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


MAPREDUCE-2738 already added the stats to the UI. It'll be helpful to add them 
to YarnClusterMetrics and make them available via the command-line/RPC.





[jira] [Created] (MAPREDUCE-3199) TestJobMonitorAndPrint is broken on trunk

2011-10-18 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
TestJobMonitorAndPrint is broken on trunk
-

 Key: MAPREDUCE-3199
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3199
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


I bisected this down to MAPREDUCE-3003 changes. The parent project for 
client-core changed to hadoop-project which doesn't have the log4j 
configuration unlike the previous parent hadoop-mapreduce-client.





[jira] [Created] (MAPREDUCE-3209) Jenkins reports 160 FindBugs warnings

2011-10-18 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Jenkins reports 160 FindBugs warnings
-

 Key: MAPREDUCE-3209
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3209
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


See
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1055//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-common.html
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1055//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-app.html
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1055//artifact/trunk/hadoop-mapreduce-project/patchprocess/newPatchFindbugsWarningshadoop-mapreduce-client-core.html





[jira] [Created] (MAPREDUCE-3226) Few reduce tasks hanging in a gridmix-run

2011-10-19 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Few reduce tasks hanging in a gridmix-run
-

 Key: MAPREDUCE-3226
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3226
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, task
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.0


In a gridmix run with ~1000 jobs, one job is getting stuck because of 2-3 
hanging reducers. All of them are stuck after downloading all map outputs 
and have the following thread dump.

{code}
"EventFetcher for fetching Map Completion Events" daemon prio=10 tid=0xa325fc00 nid=0x1ca4 waiting on condition [0xa315c000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.mapreduce.task.reduce.EventFetcher.run(EventFetcher.java:71)

"main" prio=10 tid=0x080ed400 nid=0x1c71 in Object.wait() [0xf73a2000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
    at java.lang.Thread.join(Thread.java:1143)
    - locked <0xa94b23d8> (a org.apache.hadoop.mapreduce.task.reduce.EventFetcher)
    at java.lang.Thread.join(Thread.java:1196)
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:135)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:367)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:147)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1135)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:142)
{code}

Thanks to [~karams] for helping track this down.
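
The dump above shows "main" blocked in Thread.join() on the EventFetcher thread while that thread sleeps in its loop. The root cause is not established here; the sketch below only illustrates that shape, plus one common way to make such a join return promptly, assuming a cooperative stop signal. The class and its names are hypothetical and are not taken from the Hadoop code.

```python
import threading


class Fetcher(threading.Thread):
    """Polling loop that sleeps between rounds, like the thread in the dump."""

    def __init__(self, poll_interval_s=0.05):
        super().__init__(daemon=True)
        self._stop_requested = threading.Event()
        self._poll_interval_s = poll_interval_s
        self.rounds = 0

    def run(self):
        # Event.wait() doubles as an interruptible sleep: it returns True
        # as soon as stop is requested instead of finishing the interval.
        while not self._stop_requested.wait(self._poll_interval_s):
            self.rounds += 1

    def shutdown(self, join_timeout_s=1.0):
        """Ask the loop to stop, then join with a bound instead of forever."""
        self._stop_requested.set()
        self.join(join_timeout_s)
        return not self.is_alive()
```

If the stop request never reaches the loop, an unbounded join waits indefinitely, which is the state the "main" thread is in above.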





[jira] [Created] (MAPREDUCE-3228) MR AM hangs when one node goes bad

2011-10-20 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR AM hangs when one node goes bad
--

 Key: MAPREDUCE-3228
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3228
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.0


Found this on one of the gridmix runs, again. One of the nodes went really bad 
while the job had three containers running on it. Eventually, the AM marked the 
tasks as timed out and initiated cleanup of the failed containers via 
{{stopContainer()}}. The latter got stuck at the faulty node, so the tasks are 
stuck in the FAIL_CONTAINER_CLEANUP stage and the job sits there waiting 
forever.

Thanks to [~Karams] for helping with this.
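
The hang described above involves a blocking cleanup call that never returns. As a hedged illustration only, and not the fix adopted in the AM, a caller can bound such a call and move on when the remote side never answers. `call_with_timeout` and its status strings are hypothetical.

```python
import threading


def call_with_timeout(fn, timeout_s):
    """Run fn() in a helper thread and give up waiting after timeout_s seconds.

    Returns ("OK", value), ("FAILED", exception) or ("TIMED_OUT", None).
    The helper thread is a daemon, so a stuck call cannot block process exit.
    """
    outcome = {}

    def runner():
        try:
            outcome["value"] = fn()
        except Exception as exc:  # report the failure to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=runner, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return ("TIMED_OUT", None)
    if "error" in outcome:
        return ("FAILED", outcome["error"])
    return ("OK", outcome["value"])
```

On a "TIMED_OUT" result the caller can treat the cleanup as failed and let the task leave its cleanup state rather than waiting on the faulty node.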





[jira] [Created] (MAPREDUCE-3229) Tests for verifying application-acl checks on the web-UI

2011-10-20 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Tests for verifying application-acl checks on the web-UI


 Key: MAPREDUCE-3229
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3229
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


MAPREDUCE-3104 added application-acls. We need tests which pull the web-pages 
with various login users and validate the authorization checks.





[jira] [Created] (MAPREDUCE-3240) NM should send a SIGKILL for completed containers also

2011-10-21 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
NM should send a SIGKILL for completed containers also
--

 Key: MAPREDUCE-3240
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3240
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


This is to address containers which exit properly after spawning sub-processes 
themselves. We don't want to leave these sub-process trees behind, or else 
they can pillage the NM's resources.

Today, we already have code to send SIGKILL to the whole process tree (possible 
because of the single sessionId resulting from setsid) when the container is 
alive. We need to obtain the PID of each container when it starts and use that 
PID to send the signal in the completed-containers case also.
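
A small POSIX-only sketch of the mechanism, under the assumption that the container is started through setsid so that it leads a new session and process group whose ID equals its PID. Recording that PID at launch is then enough to signal any children left behind after the leader exits. The helper names are hypothetical.

```python
import os
import signal
import subprocess


def launch_in_new_session(argv):
    """Start argv as the leader of a new session (setsid) and return the Popen."""
    return subprocess.Popen(argv, start_new_session=True)


def kill_process_group(pgid):
    """SIGKILL every process still in the group; False if none were left."""
    try:
        os.killpg(pgid, signal.SIGKILL)
        return True
    except ProcessLookupError:
        return False
```

For example, after `proc = launch_in_new_session(["sh", "-c", "sleep 30 & exit 0"])` and `proc.wait()`, the shell has exited but its background child is still in the group, and `kill_process_group(proc.pid)` reaches it.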





[jira] [Created] (MAPREDUCE-3245) Write an integration test for validating MR AM restart and recovery

2011-10-22 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Write an integration test for validating MR AM restart and recovery
---

 Key: MAPREDUCE-3245
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3245
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3249) Recovery of MR AMs with reduces fails the subsequent generation of the job

2011-10-24 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Recovery of MR AMs with reduces fails the subsequent generation of the job
--

 Key: MAPREDUCE-3249
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3249
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3250) When AM restarts, client keeps reconnecting to the new AM and prints a lot of logs.

2011-10-24 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
When AM restarts, client keeps reconnecting to the new AM and prints a lot of 
logs.


 Key: MAPREDUCE-3250
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3250
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3256) Authorization checks needed for AM->NM and AM->RM protocols

2011-10-24 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Authorization checks needed for AM->NM and AM->RM protocols
---

 Key: MAPREDUCE-3256
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3256
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3257) Authorization checks needed for AM->RM protocol

2011-10-24 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Authorization checks needed for AM->RM protocol
---

 Key: MAPREDUCE-3257
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3257
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3280) MR AM should not read the username from configuration

2011-10-27 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR AM should not read the username from configuration
-

 Key: MAPREDUCE-3280
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3280
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


MR AM reads the value for mapreduce.job.user.name from the configuration in 
several places. It should instead get the app-submitter name from the RM.

Once that is done, we can remove the default value for mapreduce.job.user.name 
from mapred-default.xml





[jira] [Created] (MAPREDUCE-3281) TestLinuxContainerExecutorWithMocks failing on trunk.

2011-10-27 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
TestLinuxContainerExecutorWithMocks failing on trunk.
-

 Key: MAPREDUCE-3281
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3281
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.0








[jira] [Created] (MAPREDUCE-3296) Pending(9) findBugs warnings

2011-10-27 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Pending(9) findBugs warnings


 Key: MAPREDUCE-3296
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3296
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.0








[jira] [Created] (MAPREDUCE-3306) Cannot run apps after MAPREDUCE-2989

2011-10-28 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Cannot run apps after MAPREDUCE-2989


 Key: MAPREDUCE-3306
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3306
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.0


Seeing this in NM logs when trying to run jobs.
{code}
2011-10-28 21:40:21,263 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Processing application_1319818154209_0001 of type APPLICATION_INITED
2011-10-28 21:40:21,264 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread. Exiting..
java.util.NoSuchElementException
    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:796)
    at java.util.HashMap$ValueIterator.next(HashMap.java:822)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl$AppInitDoneTransition.transition(ApplicationImpl.java:251)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl$AppInitDoneTransition.transition(ApplicationImpl.java:245)
    at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:385)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationImpl.handle(ApplicationImpl.java:58)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:407)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher.handle(ContainerManagerImpl.java:399)
    at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:116)
    at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
    at java.lang.Thread.run(Thread.java:662)
{code}





[jira] [Created] (MAPREDUCE-3333) MR AM for sort-job going out of memory

2011-11-02 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR AM for sort-job going out of memory
--

 Key: MAPREDUCE-
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker


[~Karams] just found this. The usual sort job on a 350-node cluster hung due to 
OutOfMemory and eventually failed after an hour instead of the usual 20-odd 
minutes.
{code}
2011-11-02 11:40:36,438 ERROR [ContainerLauncher #258] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Container launch failed for container_1320233407485_0002_01_001434 : java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:88)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:290)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: com.google.protobuf.ServiceException: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450;
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
    at $Proxy20.startContainer(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagerPBClientImpl.startContainer(ContainerManagerPBClientImpl.java:81)
    ... 4 more
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: Couldn't set up IO streams; Host Details : local host is: "gsbl91281.blue.ygrid.yahoo.com/98.137.101.189"; destination host is: ""gsbl91525.blue.ygrid.yahoo.com":45450;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:655)
    at org.apache.hadoop.ipc.Client.call(Client.java:1089)
    at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
    ... 6 more
Caused by: java.io.IOException: Couldn't set up IO streams
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:621)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
    at org.apache.hadoop.ipc.Client.call(Client.java:1065)
    ... 7 more
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:597)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:614)
    ... 10 more
{code}





[jira] [Created] (MAPREDUCE-3338) Remove hardcoded version of mr-app jar from the tests

2011-11-03 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Remove hardcoded version of mr-app jar from the tests
-

 Key: MAPREDUCE-3338
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3338
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


MiniMRYarnCluster and its related tests, as well as TestDistributedShell, depend 
on a hard-coded version of the mr-app jar. We need to figure out whether we can 
avoid this. Otherwise, we have to change these files manually for every 
release, which is a pain.





[jira] [Created] (MAPREDUCE-3340) Deprecate Job.setJobSetupCleanupNeeded()

2011-11-03 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Deprecate Job.setJobSetupCleanupNeeded()


 Key: MAPREDUCE-3340
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3340
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


We should deprecate the setJobSetupCleanupNeeded() API. It was originally added 
for performance reasons, to avoid launching new JVMs altogether for job-setup 
and job-cleanup. With YARN and MRAppMaster, setup and cleanup run inside the AM 
itself, so nothing much can be gained by making them optional.

Before 0.23, we could disable setup and cleanup and still obtain the output in 
the job-output directory when using FileOutputCommitter. From 0.23.0 onwards, 
that won't be the case because of the nested temporary directories used to 
support AM recoverability. So it makes sense to *not* make cleanupJob optional.





[jira] [Created] (MAPREDUCE-3345) Race condition in ResourceManager causing TestContainerManagerSecurity to fail sometimes

2011-11-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Race condition in ResourceManager causing TestContainerManagerSecurity to fail 
sometimes


 Key: MAPREDUCE-3345
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3345
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


See 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1247//testReport/org.apache.hadoop.yarn.server/TestContainerManagerSecurity/testUnauthorizedUser/





[jira] [Created] (MAPREDUCE-3349) No rack-name logged in JobHistory for unsuccessful tasks

2011-11-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
No rack-name logged in JobHistory for unsuccessful tasks


 Key: MAPREDUCE-3349
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3349
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


Found this while running jobs on a cluster with [~Karams].

This is because the TaskAttemptUnsuccessfulCompletionEvent history record 
doesn't have a rack field.





[jira] [Created] (MAPREDUCE-3350) Per-app RM page should have the list of application-attempts like on the app JHS page

2011-11-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Per-app RM page should have the list of application-attempts like on the app 
JHS page
-

 Key: MAPREDUCE-3350
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3350
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, webapps
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1








[jira] [Created] (MAPREDUCE-3351) TaskAttempt's state string is not consumed by MR AM web-UI

2011-11-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
TaskAttempt's state string is not consumed by MR AM web-UI
--

 Key: MAPREDUCE-3351
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3351
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


Jobs like random-writer use the state string to report the amount of work they 
have completed. The JT used to print this on its UI; the AM webapp should do 
the same.





[jira] [Created] (MAPREDUCE-3352) Separate installation of mapreduce libraries from YARN_HOME

2011-11-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Separate installation of mapreduce libraries from YARN_HOME
---

 Key: MAPREDUCE-3352
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3352
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


Time and again, I run into scenarios where I just want to fix bugs in the 
mapreduce app and replace the mapreduce libraries, without bringing down the 
YARN daemons and/or replacing the YARN RM/NM installation.

Today, we have separate HADOOP_MAPRED_HOME and YARN_HOME, but the installation 
directory is the same. We need to separate the two.

We will need this eventually anyway, as MR is strictly a user-land library, 
just like PIG, HIVE etc.





[jira] [Created] (MAPREDUCE-3353) Need a RM->AM channel to inform AMs about faulty/unhealthy/lost nodes

2011-11-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Need a RM->AM channel to inform AMs about faulty/unhealthy/lost nodes
-

 Key: MAPREDUCE-3353
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


When a node gets lost or turns faulty, the AM needs to know about that event so 
that it can take some action, e.g. re-executing map tasks whose intermediate 
output lives on that faulty node.





[jira] [Created] (MAPREDUCE-3354) JobHistoryServer should be started by bin/mapred and not by bin/yarn

2011-11-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
JobHistoryServer should be started by bin/mapred and not by bin/yarn


 Key: MAPREDUCE-3354
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3354
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


JobHistoryServer belongs to mapreduce land.





[jira] [Created] (MAPREDUCE-3355) AM scheduling hangs frequently with sort job on 350 nodes

2011-11-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
AM scheduling hangs frequently with sort job on 350 nodes
-

 Key: MAPREDUCE-3355
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3355
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1


Another collaboration with [~karams]. The sort job hangs fairly often on a 
350-node cluster. Found this in the AM logs:
{code}

Exception in thread "ContainerLauncher #60" org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:379)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
... 4 more

Exception in thread "ContainerLauncher #53" org.apache.hadoop.yarn.YarnException: java.lang.InterruptedException
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:170)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl.sendContainerLaunchFailedMsg(ContainerLauncherImpl.java:405)
at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:330)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:168)
... 5 more
{code}
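
Both traces end in LinkedBlockingQueue.put(), reached through 
ReentrantLock.lockInterruptibly(). As a minimal sketch of that JDK behaviour 
(not of the AM code itself): a thread whose interrupt flag is already set gets 
an InterruptedException from put() even when the queue has room, so the event 
is never enqueued. The class name and event string below are made up for 
illustration, and whether such a lost event is what stalls scheduling is an 
assumption the ticket does not confirm.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; it only exercises java.util.concurrent, not Hadoop.
class InterruptedPutSketch {
    /**
     * Calls put() on a thread whose interrupt flag is already set.
     * Returns true if put() threw InterruptedException and nothing was enqueued.
     */
    static boolean eventDroppedWhenInterrupted() {
        // Unbounded queue, so put() never has to wait for free space.
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Stand-in for whatever interrupted the launcher thread.
        Thread.currentThread().interrupt();
        try {
            queue.put("CONTAINER_LAUNCH_FAILED"); // hypothetical event name
            return false;
        } catch (InterruptedException e) {
            // put() acquires its lock interruptibly and bails out before enqueueing.
            return queue.isEmpty();
        } finally {
            Thread.interrupted(); // clear the flag so the caller is unaffected
        }
    }

    public static void main(String[] args) {
        System.out.println("event dropped: " + eventDroppedWhenInterrupted());
    }
}
```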





[jira] [Created] (MAPREDUCE-3382) Network ACLs can prevent AMs to ping the Job-end notification URL

2011-11-08 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Network ACLs can prevent AMs to ping the Job-end notification URL
-

 Key: MAPREDUCE-3382
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3382
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


MAPREDUCE-3028 added support for job-end notification from MR AMs after the job 
finishes. Network ACLs can interfere with this: outgoing connections from the 
compute nodes may be restricted in some settings, so job-end notifications 
(which originate from the AMs, and those may run on arbitrary nodes in the 
cluster) may fail to reach the notification URL.





[jira] [Created] (MAPREDUCE-3402) AMScalability test of Sleep job with 100K 1-sec maps regressed into running very slowly

2011-11-15 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
AMScalability test of Sleep job with 100K 1-sec maps regressed into running 
very slowly
---

 Key: MAPREDUCE-3402
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3402
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


The world was rosier before October 19-25, [~karams] says.

The 100K 1-second sleep job used to take around 800 secs, or 13-14 mins. It now 
runs for 45 mins and still manages to complete only about 45K tasks.

One/more of the flurry of commits for 0.23.0 deserve(s) the blame.





[jira] [Created] (MAPREDUCE-3558) Integration test needed for MRV2 job-end notification feature

2011-12-14 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Integration test needed for MRV2 job-end notification feature
-

 Key: MAPREDUCE-3558
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3558
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2, test
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


We can modify/port {{NotificationTestCase}} to work with {{MiniMRYarnCluster}}.





[jira] [Created] (MAPREDUCE-3559) MR AM can change to SUCCESS state but crash and fail to write JobHistory

2011-12-14 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR AM can change to SUCCESS state but crash and fail to write JobHistory


 Key: MAPREDUCE-3559
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3559
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


In such corner cases, clients can see that the job has become successful but 
they cannot get any information from JobHistoryServer. And there is no means of 
figuring out what happened in the system except from the AM logs.

Ideally we should set the JobState to SUCCESS only as the last step, once 
everything else is done. The code changes needed to achieve this may be a bit 
complicated, but we can investigate and see.
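
The ordering proposed above can be sketched as follows. This is an 
illustration under assumptions, not the MR AM state machine: the class, the 
State enum and the idea of passing finishing steps as Runnables are all 
hypothetical. The point is only that SUCCESS is published after every other 
finishing step (such as writing job history) has completed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of "publish SUCCESS last"; none of these names are Hadoop APIs.
class JobCompletionSketch {
    enum State { RUNNING, SUCCESS, ERROR }

    private State state = State.RUNNING;

    State getState() {
        return state;
    }

    /** Runs every finishing step first; sets SUCCESS only if all of them completed. */
    State finish(List<Runnable> finishingSteps) {
        try {
            for (Runnable step : finishingSteps) {
                step.run(); // e.g. flush history, commit output (assumed steps)
            }
            state = State.SUCCESS; // last step: clients can only see SUCCESS from here on
        } catch (RuntimeException e) {
            state = State.ERROR; // a failed step never leaves a visible SUCCESS behind
        }
        return state;
    }

    public static void main(String[] args) {
        JobCompletionSketch job = new JobCompletionSketch();
        Runnable writeHistory = () -> { throw new IllegalStateException("history write failed"); };
        System.out.println(job.finish(Arrays.asList(writeHistory)));
    }
}
```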





[jira] [Created] (MAPREDUCE-3560) TestRMNodeTransitions is failing on trunk

2011-12-14 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
TestRMNodeTransitions is failing on trunk
-

 Key: MAPREDUCE-3560
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3560
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, resourcemanager, test
Affects Versions: 0.23.1
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1


Apparently Jenkins is screwed up. It is happily blessing patches, even though 
tests are failing.

Link to logs: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1454//testReport/org.apache.hadoop.yarn.server.resourcemanager/TestRMNodeTransitions/testExpiredContainer/





[jira] [Created] (MAPREDUCE-3561) [Umbrella ticket] Performance issues in YARN+MR

2011-12-14 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
[Umbrella ticket] Performance issues in YARN+MR
---

 Key: MAPREDUCE-3561
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3561
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, performance
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


Been working on measuring performance of YARN+MR relative to the 0.20.xx 
release line together with [~karams].

This is an umbrella ticket to track all the issues related to performance.





[jira] [Created] (MAPREDUCE-3566) MR AM slows down due to repeatedly constructing ContainerLaunchContext

2011-12-15 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR AM slows down due to repeatedly constructing ContainerLaunchContext
--

 Key: MAPREDUCE-3566
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3566
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


Constructing the context is expensive: it includes per-task trips to the 
NameNode to obtain information about job.jar, the job splits etc., all of which 
is redundant across tasks.

We should have a common job-level context and a task-specific context 
inheriting from the common job-level context.
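
A minimal sketch of that split, assuming the context can be modelled as a 
plain map. All names here are hypothetical, and "inheriting" is modelled as 
copy-then-extend rather than class inheritance; the real 
ContainerLaunchContext is not used. The counter stands in for the NameNode 
round trips, which in this arrangement happen once per job instead of once per 
task.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; keys and values are placeholders, not real YARN fields.
class LaunchContextSketch {
    /** Counts simulated expensive lookups (stand-in for NameNode trips). */
    static int expensiveLookups = 0;

    /** Built once per job: everything that is identical for all tasks. */
    static Map<String, String> buildJobLevelContext() {
        expensiveLookups++; // pretend job.jar and split metadata are resolved here
        Map<String, String> common = new HashMap<>();
        common.put("job.jar", "<resolved once>");
        common.put("job.splits", "<resolved once>");
        return Collections.unmodifiableMap(common);
    }

    /** Built per task: copies the shared part and adds only task-specific entries. */
    static Map<String, String> buildTaskContext(Map<String, String> common, int taskId) {
        Map<String, String> context = new HashMap<>(common);
        context.put("task.id", Integer.toString(taskId));
        return context;
    }

    public static void main(String[] args) {
        Map<String, String> common = buildJobLevelContext();
        for (int task = 0; task < 1000; task++) {
            buildTaskContext(common, task);
        }
        System.out.println("expensive lookups: " + expensiveLookups);
    }
}
```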





[jira] [Created] (MAPREDUCE-3567) Extraneous JobConf objects in AM heap

2011-12-15 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Extraneous JobConf objects in AM heap
-

 Key: MAPREDUCE-3567
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3567
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3568) Optimize Job's progress calculations in MR AM

2011-12-15 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Optimize Job's progress calculations in MR AM
-

 Key: MAPREDUCE-3568
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3568
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3569) TaskAttemptListener holds a global lock for all task-updates

2011-12-15 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
TaskAttemptListener holds a global lock for all task-updates


 Key: MAPREDUCE-3569
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3569
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3572) MR AM's dispatcher is blocked by heartbeats to ResourceManager

2011-12-15 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR AM's dispatcher is blocked by heartbeats to ResourceManager
--

 Key: MAPREDUCE-3572
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3572
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3586) Lots of AMs hanging around in PIG testing

2011-12-20 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Lots of AMs hanging around in PIG testing
-

 Key: MAPREDUCE-3586
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3586
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1


[~daijy] found this. Here's what he says:
bq. I see hundreds of MRAppMaster process on my machine, and lots of tests fail 
for "Too many open files".





[jira] [Created] (MAPREDUCE-3616) Thread pool for launching containers in MR AM not expanding as expected

2012-01-04 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Thread pool for launching containers in MR AM not expanding as expected
---

 Key: MAPREDUCE-3616
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3616
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, performance
Affects Versions: 0.23.1
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


Found this while running some benchmarks on 350 nodes. The thread pool stays at 
60 for a long time and only expands to 350 towards the fag end of the job.
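
The ticket does not state a cause. As background only, one 
java.util.concurrent behaviour worth ruling out is that a ThreadPoolExecutor 
backed by an unbounded queue does not grow past its core size while work is 
merely queued; new threads beyond the core are created only when the queue 
rejects an offer. The sketch below demonstrates that JDK behaviour with 
made-up sizes. It is an assumption that this is relevant to the MR AM, and the 
eventual growth reported above implies some other resizing logic is involved.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo of JDK thread-pool sizing; not the ContainerLauncher code.
class PoolGrowthSketch {
    /** Submits blocking tasks and returns the pool size observed while all are pending. */
    static int poolSizeUnderLoad(int core, int max, int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 1, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // keep every worker busy until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int observed = pool.getPoolSize(); // extra tasks sit in the queue instead of spawning threads
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sizes chosen for illustration only.
        System.out.println("pool size under load: " + poolSizeUnderLoad(4, 32, 200));
    }
}
```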





[jira] [Created] (MAPREDUCE-3634) All daemons should crash instead of hanging around when their EventHandlers get exceptions

2012-01-06 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
All daemons should crash instead of hanging around when their EventHandlers get 
exceptions
--

 Key: MAPREDUCE-3634
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3634
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


We should make sure that the daemons crash in case the dispatchers get 
exceptions and stop processing. That way we will be debugging RM/NM/AM crashes 
instead of hard-to-track hanging jobs. 
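
A minimal sketch of the fail-fast idea, under assumptions: the class below is 
hypothetical and is not the YARN AsyncDispatcher API. The dispatcher stops at 
the first handler exception and hands it to a fatal-error callback; in a real 
daemon that callback might log and exit the process, whereas here it is 
injected so the behaviour can be observed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical fail-fast dispatcher; events are plain strings for brevity.
class FailFastDispatcherSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final Consumer<String> handler;
    private final Consumer<Throwable> onFatal;

    FailFastDispatcherSketch(Consumer<String> handler, Consumer<Throwable> onFatal) {
        this.handler = handler;
        this.onFatal = onFatal;
    }

    void post(String event) {
        events.add(event);
    }

    /** Handles queued events in order; on the first exception, reports it and stops. */
    int drain() {
        int handled = 0;
        String event;
        while ((event = events.poll()) != null) {
            try {
                handler.accept(event);
                handled++;
            } catch (Throwable t) {
                onFatal.accept(t); // crash loudly instead of silently stalling
                return handled;
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        FailFastDispatcherSketch dispatcher = new FailFastDispatcherSketch(
                e -> { if (e.equals("bad")) throw new IllegalStateException("handler failed on " + e); },
                t -> System.err.println("fatal, a daemon would exit here: " + t));
        dispatcher.post("ok");
        dispatcher.post("bad");
        dispatcher.post("never-handled");
        System.out.println("handled before crash: " + dispatcher.drain());
    }
}
```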





[jira] [Created] (MAPREDUCE-3672) Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS

2012-01-13 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Killed maps shouldn't be counted towards JobCounter.NUM_FAILED_MAPS
---

 Key: MAPREDUCE-3672
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3672
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


We count maps that are killed, say by the speculator, towards 
JobCounter.NUM_FAILED_MAPS. We should instead have a separate JobCounter for 
killed maps.

Same with reduces too.
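
As a sketch of the proposed bookkeeping, assuming a local stand-in enum rather 
than the real JobCounter: NUM_KILLED_MAPS and the Outcome type below are 
hypothetical names introduced for illustration. Killed attempts increment 
their own counter and leave the failed count untouched.

```java
import java.util.EnumMap;

// Hypothetical counters; only NUM_FAILED_MAPS is named in the ticket.
class KilledCounterSketch {
    enum Counter { NUM_FAILED_MAPS, NUM_KILLED_MAPS }

    enum Outcome { SUCCEEDED, FAILED, KILLED }

    private final EnumMap<Counter, Long> counters = new EnumMap<>(Counter.class);

    /** Records a finished map attempt under the counter matching its outcome. */
    void record(Outcome outcome) {
        if (outcome == Outcome.FAILED) {
            counters.merge(Counter.NUM_FAILED_MAPS, 1L, Long::sum);
        } else if (outcome == Outcome.KILLED) {
            counters.merge(Counter.NUM_KILLED_MAPS, 1L, Long::sum); // no longer folded into failures
        }
    }

    long get(Counter counter) {
        return counters.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        KilledCounterSketch job = new KilledCounterSketch();
        job.record(Outcome.KILLED); // e.g. a speculative attempt that lost the race
        job.record(Outcome.FAILED);
        System.out.println("failed=" + job.get(Counter.NUM_FAILED_MAPS)
                + " killed=" + job.get(Counter.NUM_KILLED_MAPS));
    }
}
```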





[jira] [Created] (MAPREDUCE-3699) Default RPC handlers are very low for YARN servers

2012-01-19 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Default RPC handlers are very low for YARN servers
--

 Key: MAPREDUCE-3699
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3699
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


The NM defaults to 5 handlers, and the RM and AM to 10 each, irrespective of 
num-slots, num-nodes and num-tasks respectively. Though ideally we want to 
scale according to slots/nodes/tasks, for now increasing the defaults should be 
enough.





[jira] [Created] (MAPREDUCE-3714) Reduce hangs in a corner case

2012-01-23 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Reduce hangs in a corner case
-

 Key: MAPREDUCE-3714
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3714
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, task
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


[~karams] found this a long time back and we (Sid/I) ran into it again.

Logs to follow..





[jira] [Created] (MAPREDUCE-3718) Default AM heartbeat interval should be one second

2012-01-24 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Default AM heartbeat interval should be one second
--

 Key: MAPREDUCE-3718
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3718
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3719) Make gridmix performance on YARN+MR to match or exceed that on 1.0

2012-01-24 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Make gridmix performance on YARN+MR to match or exceed that on 1.0
--

 Key: MAPREDUCE-3719
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3719
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3720) Command line listJobs should not visit each AM

2012-01-24 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Command line listJobs should not visit each AM
--

 Key: MAPREDUCE-3720
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3720
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


When the RM has a large number of jobs, {{bin/mapred job -list}} takes a long 
time as it visits each AM to get information like map-progress, reduce-progress 
etc.

We should move all per-AM information to {{bin/mapred job -status}} and keep 
the list just a list.





[jira] [Created] (MAPREDUCE-3729) Commit build failing TestJobClientGetJob, TestMRWithDistributedCache, TestLocalModeWithNewApis

2012-01-25 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Commit build failing TestJobClientGetJob, TestMRWithDistributedCache, 
TestLocalModeWithNewApis
--

 Key: MAPREDUCE-3729
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3729
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.1


See https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1671//testReport/.






[jira] [Created] (MAPREDUCE-3731) SecureIOUtils should be used by NodeManager for serving logs and intermediate outputs

2012-01-25 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
SecureIOUtils should be used by NodeManager for serving logs and intermediate 
outputs
-

 Key: MAPREDUCE-3731
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3731
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3754) RM webapp should have pages filtered based on App-state

2012-01-29 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
RM webapp should have pages filtered based on App-state
---

 Key: MAPREDUCE-3754
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3754
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, webapps
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


Helps a lot when we have a lot of apps. We are already having difficulties with 
gridmix, given a single big list of apps of all states.





[jira] [Created] (MAPREDUCE-3767) Fix and enable env tests in TestMiniMRChildTask

2012-01-30 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Fix and enable env tests in TestMiniMRChildTask
---

 Key: MAPREDUCE-3767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3767
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


This test is ported to YARN+MR via MAPREDUCE-3716. We should try to enable the 
env tests also.





[jira] [Created] (MAPREDUCE-3778) Per-state RM app-pages should have search ala JHS pages

2012-01-31 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Per-state RM app-pages should have search ala JHS pages
---

 Key: MAPREDUCE-3778
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3778
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, webapps
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3795) "job -status" command line output is malformed

2012-02-02 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
"job -status" command line output is malformed
--

 Key: MAPREDUCE-3795
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3795
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


The output is missing newlines after numMaps and numReduces. Caused by 
MAPREDUCE-3720.





[jira] [Created] (MAPREDUCE-3805) MR AM not respecting MaxReduceRampUpLimit

2012-02-03 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR AM not respecting MaxReduceRampUpLimit
-

 Key: MAPREDUCE-3805
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3805
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.1


While running GridMixV3 with high-memory reduces, we ran into an issue with 
jobs that have a significant number of maps and reduces: when map progress 
hits 98-99% but maps are still pending, reduces get every new container that 
the RM allocates. The job then takes much longer than it does with usual 
reduces.

A configurable limit on reduce ramp-up was introduced to address precisely 
these issues. Unfortunately, this limit is not working correctly.
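
A minimal sketch of how such a ramp-up cap might behave, assuming the limit is 
expressed as a fraction of the memory available to the job. The class, method 
and parameter names are hypothetical and are not taken from the MR AM code:

```java
// Hypothetical sketch; not the actual MR AM allocator logic.
final class ReduceRampUpSketch {
  private ReduceRampUpSketch() {}

  /**
   * Memory that reduces may use: it grows with map progress, but never
   * exceeds the configured ramp-up fraction of the total.
   */
  static int reduceMemLimit(int totalMemLimit, float completedMapFraction,
      float maxReduceRampUpLimit) {
    int byProgress = (int) (completedMapFraction * totalMemLimit);
    int byCap = (int) (maxReduceRampUpLimit * totalMemLimit);
    return Math.min(byProgress, byCap);
  }

  /** Whatever is not granted to reduces stays available for pending maps. */
  static int mapMemLimit(int totalMemLimit, float completedMapFraction,
      float maxReduceRampUpLimit) {
    return totalMemLimit
        - reduceMemLimit(totalMemLimit, completedMapFraction, maxReduceRampUpLimit);
  }
}
```

With a cap of 0.5, for example, reduces could claim at most half of the memory 
even at 99% map progress, which leaves room for the remaining maps.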





[jira] [Created] (MAPREDUCE-3810) MR AM's ContainerAllocator is assigning the allocated containers very slowly

2012-02-05 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR AM's ContainerAllocator is assigning the allocated containers very slowly


 Key: MAPREDUCE-3810
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3810
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3812) Change default AM slot size to be 1GB

2012-02-05 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Change default AM slot size to be 1GB
-

 Key: MAPREDUCE-3812
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3812
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3813) RackResolver should maintain a cache to avoid repetitive lookups.

2012-02-05 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
RackResolver should maintain a cache to avoid repetitive lookups.
-

 Key: MAPREDUCE-3813
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3813
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3818) Trunk MRV1 compilation is broken.

2012-02-06 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Trunk MRV1 compilation is broken.
-

 Key: MAPREDUCE-3818
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3818
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, test
Affects Versions: 0.24.0
Reporter: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.24.0


Seeing this:
{code}
[javac] /Users/vinodkv/Workspace/eclipse-workspace/apache-git/hadoop-common/hadoop-mapreduce-project/src/test/mapred/org/apache/hadoop/mapred/TestSubmitJob.java:155: cannot find symbol
[javac] symbol  : class ClientNamenodeWireProtocol
[javac] location: class org.apache.hadoop.mapred.TestSubmitJob
[javac]   RPC.getProxy(ClientNamenodeWireProtocol.class,
[javac]^
{code}





[jira] [Created] (MAPREDUCE-3823) Counters are getting calculated twice at job-finish and delaying clients.

2012-02-06 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Counters are getting calculated twice at job-finish and delaying clients.
-

 Key: MAPREDUCE-3823
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3823
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3827) Counters aggregation slowed down significantly after MAPREDUCE-3749

2012-02-06 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Counters aggregation slowed down significantly after MAPREDUCE-3749
---

 Key: MAPREDUCE-3827
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3827
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3836) TestContainersMonitor failing intermittently

2012-02-07 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
TestContainersMonitor failing intermittently


 Key: MAPREDUCE-3836
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3836
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, test
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


See https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/1814//testReport/ 
for an example failure.





[jira] [Created] (MAPREDUCE-3846) Restarted+Recovered AM hangs in some corner cases

2012-02-09 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Restarted+Recovered AM hangs in some corner cases
-

 Key: MAPREDUCE-3846
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3846
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli








[jira] [Created] (MAPREDUCE-3865) RM should throw different exceptions for while querying app/node/queue

2012-02-15 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
RM should throw different exceptions for while querying app/node/queue
--

 Key: MAPREDUCE-3865
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3865
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


We should distinguish the exceptions for absent app/node/queue, illegally 
accessed app/node/queue etc. Today everything is a {{YarnRemoteException}}. We 
should extend {{YarnRemoteException}} to add {{NotFoundException}}, 
{{AccessControlException}} etc. Today, {{AccessControlException}} exists but 
not as part of the protocol descriptions (i.e. only available to Java).
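
A rough sketch of the kind of hierarchy meant here. The classes below are 
local stand-ins with hypothetical names, not the real YARN exception classes, 
and the lookup logic is only an assumed example of where each exception would 
be thrown:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the common remote exception type.
class SketchRemoteException extends Exception {
  SketchRemoteException(String message) { super(message); }
}

// Thrown when the requested app/node/queue does not exist.
class SketchNotFoundException extends SketchRemoteException {
  SketchNotFoundException(String message) { super(message); }
}

// Thrown when the entity exists but the caller may not access it.
class SketchAccessControlException extends SketchRemoteException {
  SketchAccessControlException(String message) { super(message); }
}

final class AppQuerySketch {
  private final Map<String, String> ownerByApp = new HashMap<>();

  void register(String appId, String owner) {
    ownerByApp.put(appId, owner);
  }

  /** Returns a short description, or throws a specific exception subtype. */
  String describe(String appId, String caller) throws SketchRemoteException {
    String owner = ownerByApp.get(appId);
    if (owner == null) {
      throw new SketchNotFoundException("No such application: " + appId);
    }
    if (!owner.equals(caller)) {
      throw new SketchAccessControlException(
          caller + " may not view " + appId);
    }
    return appId + " owned by " + owner;
  }
}
```

Clients that only care about failure in general can still catch the base 
type, while others can tell an absent app from a forbidden one.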





[jira] [Created] (MAPREDUCE-3866) bin/yarn prints the command line unnecessarily

2012-02-15 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
bin/yarn prints the command line unnecessarily
--

 Key: MAPREDUCE-3866
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3866
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Minor
 Fix For: 0.23.2


For commands like rmadmin, version, etc., it also prints the whole command line 
unnecessarily.

This was my doing from a long time ago, pre-alpha :)





[jira] [Created] (MAPREDUCE-3888) Ensure/confirm that the NodeManager cleanup their local filesystem when they restart

2012-02-21 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Ensure/confirm that the NodeManager cleanup their local filesystem when they 
restart


 Key: MAPREDUCE-3888
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3888
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.2


We have to make sure that NodeManagers clean up their local files on restart.

It may already be working like that, in which case we should have tests 
validating this.





[jira] [Created] (MAPREDUCE-3921) MR AM should act on the nodes liveliness information when nodes go up/down/unhealthy

2012-02-24 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR AM should act on the nodes liveliness information when nodes go 
up/down/unhealthy


 Key: MAPREDUCE-3921
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3921
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.2








[jira] [Created] (MAPREDUCE-3931) MR tasks failing due to changing timestamps on Resources to download

2012-02-27 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR tasks failing due to changing timestamps on Resources to download


 Key: MAPREDUCE-3931
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3931
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 1.0.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 1.0.1


[~karams] reported this offline. It seems that tasks are randomly failing 
during gridmix runs:
{code}
2012-02-24 21:03:34,912 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1330116323296_0140_m_003868_0: RemoteTrace:
java.io.IOException: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:90)
   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
 at LocalTrace:
   org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Resource hdfs://hostname.com:8020/user/hadoop15/.staging/job_1330116323296_0140/job.jar changed on src filesystem (expected 2971811411, was 1330116705875
   at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
   at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:827)
   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:497)
   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222)
   at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
   at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
   at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:342)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1493)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1489)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1487)
{code}
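
For context, a minimal sketch of the kind of timestamp check that appears to 
be failing in the trace above. This is an assumed reconstruction for 
illustration only; the class and method names are hypothetical and this is not 
the FSDownload code itself:

```java
import java.io.IOException;

// Hypothetical sketch of a "resource unchanged" check before download.
final class TimestampCheckSketch {
  private TimestampCheckSketch() {}

  /**
   * Compares the timestamp recorded when the resource was registered with the
   * timestamp currently reported by the source filesystem.
   */
  static void verifyUnchanged(String resource, long expectedTimestamp,
      long actualTimestamp) throws IOException {
    if (expectedTimestamp != actualTimestamp) {
      throw new IOException("Resource " + resource
          + " changed on src filesystem (expected " + expectedTimestamp
          + ", was " + actualTimestamp + ")");
    }
  }
}
```

Under this reading, the failure means the expected value handed to the check 
differs from what the filesystem reports, whichever side is actually wrong.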





[jira] [Created] (MAPREDUCE-3932) MR tasks failing and crashing the AM when available-resources/headRoom becomes zero

2012-02-27 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
MR tasks failing and crashing the AM when available-resources/headRoom becomes 
zero
---

 Key: MAPREDUCE-3932
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3932
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.2


[~karams] reported this offline. One reduce task gets preempted because of zero 
headRoom and crashes the AM.
{code}
2012-02-23 11:30:15,956 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
2012-02-23 11:30:16,959 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 44544
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:23 AssignedMaps:0 AssignedReduces:0 completedMaps:4 completedReduces:0 containersAllocated:4 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 3
2012-02-23 11:30:16,965 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_06 to attempt_1329995034628_0983_r_00_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_07 to attempt_1329995034628_0983_r_01_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned container container_1329995034628_0983_01_08 to attempt_1329995034628_0983_r_02_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Assign: PendingReduces:377 ScheduledMaps:6 ScheduledReduces:20 AssignedMaps:0 AssignedReduces:3 completedMaps:4 completedReduces:0 containersAllocated:7 containersReleased:0 hostLocalAssigned:0 rackLocalAssigned:4 availableResources(headroom):memory: 0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down all scheduled reduces:20
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Going to preempt 2
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_02_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Preempting attempt_1329995034628_0983_r_01_0
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule...
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: completedMapPercent 0.4 totalMemLimit:4608 finalMapMemLimit:2765 finalReduceMemLimit:1843 netScheduledMapMem:9216 netScheduledReduceMem:4608
2012-02-23 11:30:16,966 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Ramping down 0
2012-02-23 11:30:16,968 INFO [AsyncDispatcher event handler]

[jira] [Created] (MAPREDUCE-3940) ContainerTokens should have an expiry interval

2012-02-29 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
ContainerTokens should have an expiry interval
--

 Key: MAPREDUCE-3940
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3940
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, security
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.2


 - RM should generate the expiry time for a container.
 - A ContainerToken should have its expiry time encoded.
 - NMs should reject containers with expired tokens.
 - The expiry interval for a ContainerToken is the same as the expiry interval 
for a container.
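
The points above could be sketched roughly as follows. This is only an 
illustration under assumed names; `ContainerTokenSketch` and its methods are 
hypothetical and are not the real ContainerToken classes:

```java
// Hypothetical sketch; not the real ContainerToken implementation.
final class ContainerTokenSketch {
  final String containerId;
  final long expiryTimeMillis;

  private ContainerTokenSketch(String containerId, long expiryTimeMillis) {
    this.containerId = containerId;
    this.expiryTimeMillis = expiryTimeMillis;
  }

  /** RM side: stamp the token with an absolute expiry time. */
  static ContainerTokenSketch issue(String containerId, long nowMillis,
      long expiryIntervalMillis) {
    return new ContainerTokenSketch(containerId,
        nowMillis + expiryIntervalMillis);
  }

  /** NM side: reject the token once the current time is past its expiry. */
  boolean isExpiredAt(long nowMillis) {
    return nowMillis > expiryTimeMillis;
  }
}
```

In a real token the expiry time would presumably also need to be covered by 
the token's signature so that it cannot be altered; that part is left out 
here.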





[jira] [Created] (MAPREDUCE-3942) Randomize master key generation for ApplicationTokenSecretManager and roll it every so often

2012-02-29 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Randomize master key generation for ApplicationTokenSecretManager and roll it 
every so often


 Key: MAPREDUCE-3942
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3942
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, security
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.2


 - The master key for authenticating AMs needs to be generated automatically.
 - The key needs to be rolled every so often, but AMs with old keys should 
continue to be able to talk to the RM.
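
A minimal sketch of the rolling scheme described above, assuming keys are 
identified by an integer id and that old keys are simply retained. The class 
name, key size and id scheme are hypothetical, not the 
ApplicationTokenSecretManager design:

```java
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a randomly generated, periodically rolled master key.
final class MasterKeyRingSketch {
  private final SecureRandom random = new SecureRandom();
  private final Map<Integer, byte[]> keysById = new HashMap<>();
  private int currentKeyId = 0;

  MasterKeyRingSketch() {
    roll(); // start with one randomly generated key
  }

  /** Generates a fresh random key and makes it current; old keys are kept. */
  int roll() {
    byte[] key = new byte[32];
    random.nextBytes(key);
    currentKeyId++;
    keysById.put(currentKeyId, key);
    return currentKeyId;
  }

  int currentKeyId() {
    return currentKeyId;
  }

  /** Returns a copy of the key with the given id, or null if it is unknown. */
  byte[] lookup(int keyId) {
    byte[] key = keysById.get(keyId);
    return key == null ? null : key.clone();
  }
}
```

Because a token would carry the id of the key it was issued under, an AM 
holding an older key could still be verified after a roll. How long old keys 
are kept is a policy choice that this sketch does not address.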





[jira] [Created] (MAPREDUCE-3943) RM-NM secret-keys should be randomly generated and rolled every so often

2012-02-29 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
RM-NM secret-keys should be randomly generated and rolled every so often


 Key: MAPREDUCE-3943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3943
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, security
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
 Fix For: 0.23.2


 - RM should generate the master-key randomly.
 - The master-key should roll every so often.
 - NM should remember old, expired keys so that container-requests that were 
already doled out can still be satisfied.





[jira] [Created] (MAPREDUCE-3945) JobHistoryServer should store tokens to authenticate clients across restart

2012-02-29 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
JobHistoryServer should store tokens to authenticate clients across restart
---

 Key: MAPREDUCE-3945
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3945
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, security
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


JobHistoryServer hands out delegation tokens so that clients can talk to it. It 
needs to persist them somewhere so that it can authenticate clients across a 
server restart.





[jira] [Created] (MAPREDUCE-3966) Add separate cache for

2012-03-03 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
Add separate cache for 
---

 Key: MAPREDUCE-3966
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3966
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 0.23.1
Reporter: Vinod Kumar Vavilapalli


After MAPREDUCE-3901, we should have separate caches for the list of 
CompletedJob instances with tasks and the list of CompletedJob instances 
without tasks.
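
One way this could look, sketched with two independently sized LRU caches. 
The LRU policy, the sizes and all names here are assumptions for illustration; 
the values are plain strings standing in for CompletedJob objects:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache that evicts the least recently used entry.
final class BoundedLruCacheSketch<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private final int maxEntries;

  BoundedLruCacheSketch(int maxEntries) {
    super(16, 0.75f, true); // access order, so the eldest entry is the LRU one
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > maxEntries;
  }
}

// Hypothetical holder that keeps the two kinds of entries apart.
final class JobCachesSketch {
  // Jobs loaded with their tasks are assumed to be heavier, so this cache
  // is kept small in the sketch.
  final Map<String, String> jobsWithTasks = new BoundedLruCacheSketch<>(2);
  // Jobs without tasks are assumed to be cheap, so many more can be kept.
  final Map<String, String> jobsWithoutTasks = new BoundedLruCacheSketch<>(100);
}
```

Keeping the caches separate means that loading a few task-heavy jobs cannot 
evict the lightweight entries, and each cache can be sized on its own.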





[jira] [Created] (MAPREDUCE-3973) [] JobHistoryServer improvements in YARN+MR

2012-03-05 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
[] JobHistoryServer improvements in YARN+MR
---

 Key: MAPREDUCE-3973
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3973
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


A few parallel efforts are under way to improve and fix issues with 
JobHistoryServer in MR over YARN. This is the umbrella ticket so that we have 
the complete picture.





[jira] [Created] (MAPREDUCE-3988) mapreduce.job.local.dir doesn't point to a single directory on a node.

2012-03-08 Thread Vinod Kumar Vavilapalli (Created) (JIRA)
mapreduce.job.local.dir doesn't point to a single directory on a node.
--

 Key: MAPREDUCE-3988
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3988
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli


After MAPREDUCE-3975, mapreduce.job.local.dir is set correctly for the tasks 
but it doesn't point to the same directory for all tasks running on the node.

It is a public API. We should either point it to a single directory, or point 
it to all directories and change the documentation to say that it points to 
all dirs.
