[jira] [Assigned] (MAPREDUCE-4039) Sort Avoidance

2012-03-26 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-4039:


Assignee: anty

 Sort Avoidance
 --

 Key: MAPREDUCE-4039
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4039
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.2
Reporter: anty.rao
Assignee: anty
Priority: Minor
 Fix For: 0.23.2

 Attachments: MAPREDUCE-4039-branch-0.23.2.patch, 
 MAPREDUCE-4039-branch-0.23.2.patch


 Inspired by 
 [Tenzing|http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37200.pdf],
  in 5.1 MapReduce Enhancements:
 {quote}*Sort Avoidance*. Certain operators such as hash join
 and hash aggregation require shuffling, but not sorting. The
 MapReduce API was enhanced to automatically turn off
 sorting for these operations. When sorting is turned off, the
 mapper feeds data to the reducer which directly passes the
 data to the Reduce() function bypassing the intermediate
 sorting step. This makes many SQL operators significantly
 more efficient.{quote}
 Many applications need aggregation only, not sorting. Using sorting to 
 achieve aggregation is costly and inefficient. Without sorting, an 
 application can use a hash table or hash map to aggregate efficiently. But 
 the application should bear in mind that reduce-side memory is limited: it 
 must manage the reducer's memory itself and guard against running out of 
 memory. A map-side combiner is not supported, but hash aggregation can also 
 be done on the map side as a workaround.
 The following are the main points of the sort avoidance implementation:
 # Add a boolean configuration parameter, ??mapreduce.sort.avoidance??, to 
 turn the sort avoidance workflow on and off. The two workflows coexist.
 # Key/value pairs emitted by the map function are sorted by partition only, 
 using a more efficient sorting algorithm: counting sort.
 # The map-side merge uses a byte-level merge that simply concatenates bytes 
 from the generated spills, reading bytes in and writing bytes out, without 
 the key/value serialization/deserialization and comparison overhead that 
 the current implementation incurs.
 # Reduce can start up as soon as any map output is available, in contrast 
 to the sort workflow, which must wait until all map outputs are fetched and 
 merged.
 # Map output in memory can be consumed directly by reduce. When reduce 
 can't keep up with the rate of incoming map outputs, the in-memory merge 
 thread kicks in, merging in-memory map outputs onto disk.
 # On-disk files are read sequentially to feed reduce, in contrast to the 
 current implementation, which reads multiple files concurrently and incurs 
 many disk seeks. Map output in memory takes precedence over on-disk files 
 when feeding the reduce function.
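Point 2 above, sorting by partition only, can be sketched as follows. This is an illustrative Python sketch, not code from the patch; function and record shapes are made up for the example. Because partition ids lie in [0, numPartitions), a counting sort groups records by partition in O(n + p) with no key comparisons at all:

```python
def sort_by_partition(records, num_partitions):
    """Group (partition, key, value) records by partition id using
    counting sort: O(n + p), no key comparisons needed."""
    # First pass: count how many records land in each partition.
    counts = [0] * num_partitions
    for part, _, _ in records:
        counts[part] += 1
    # Prefix sums give each partition's start offset in the output.
    offsets = [0] * num_partitions
    for p in range(1, num_partitions):
        offsets[p] = offsets[p - 1] + counts[p - 1]
    # Second pass: place each record at its partition's next free slot.
    out = [None] * len(records)
    for rec in records:
        part = rec[0]
        out[offsets[part]] = rec
        offsets[part] += 1
    return out
```

Note the placement pass is stable: records within a partition keep their arrival order, which is all a hash-aggregating consumer needs.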
 I have already implemented this feature on top of Hadoop CDH3U3 and done 
 some performance evaluation; see [https://github.com/hanborq/hadoop] for 
 details. Now I'm willing to port it to YARN. Comments are welcome.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (MAPREDUCE-4025) AM can crash if task attempt reports bogus progress value

2012-03-19 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-4025:


Assignee: Jason Lowe

 AM can crash if task attempt reports bogus progress value
 -

 Key: MAPREDUCE-4025
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4025
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.2
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
 Attachments: MAPREDUCE-4025.patch


 If a task attempt reports a bogus progress value (e.g.: something above 1.0) 
 then the AM can crash like this:
 {noformat}
 java.lang.ArrayIndexOutOfBoundsException: 12
   at 
 org.apache.hadoop.mapred.PeriodicStatsAccumulator.extend(PeriodicStatsAccumulator.java:185)
   at 
 org.apache.hadoop.mapred.WrappedPeriodicStatsAccumulator.extend(WrappedPeriodicStatsAccumulator.java:31)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.updateProgressSplits(TaskAttemptImpl.java:1043)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.access$4100(TaskAttemptImpl.java:136)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:1509)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl$StatusUpdater.transition(TaskAttemptImpl.java:1490)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:357)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:298)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
   at 
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:931)
   at 
 org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl.handle(TaskAttemptImpl.java:135)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:886)
   at 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster$TaskAttemptEventDispatcher.handle(MRAppMaster.java:878)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:125)
   at 
 org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:74)
   at java.lang.Thread.run(Thread.java:619)
 {noformat}
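One defensive shape such a fix could take (an illustrative sketch, not the committed patch; the bucket count and function name are hypothetical) is clamping the reported progress into [0.0, 1.0] before using it to index the progress-split buckets:

```python
def bucket_for_progress(progress, num_buckets=12):
    """Map a task-attempt progress report to a bucket index, tolerating
    bogus values (negative, > 1.0, NaN) instead of letting them produce
    an out-of-range array index."""
    if progress != progress or progress < 0.0:  # NaN compares unequal to itself
        progress = 0.0
    elif progress > 1.0:
        progress = 1.0
    # progress == 1.0 must land in the last bucket, not one past the end.
    return min(int(progress * num_buckets), num_buckets - 1)
```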





[jira] [Assigned] (MAPREDUCE-3921) MR AM should act on the nodes liveliness information when nodes go up/down/unhealthy

2012-02-29 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3921:


Assignee: Bikas Saha

 MR AM should act on the nodes liveliness information when nodes go 
 up/down/unhealthy
 

 Key: MAPREDUCE-3921
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3921
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
 Fix For: 0.23.2








[jira] [Assigned] (MAPREDUCE-3833) Capacity scheduler queue refresh doesn't recompute queue capacities properly

2012-02-07 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3833:


Assignee: Jason Lowe  (was: Arun C Murthy)

Jason deserves credit for the hard work here...

 Capacity scheduler queue refresh doesn't recompute queue capacities properly
 

 Key: MAPREDUCE-3833
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3833
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1, 0.24.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-3833-testcase.patch, MAPREDUCE-3833.patch


 Refreshing the capacity scheduler configuration (e.g.: via yarn rmadmin 
 -refreshQueues) can fail to compute the proper absolute capacity for leaf 
 queues.
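The invariant a refresh should restore can be sketched as follows (illustrative data shapes, not the scheduler's actual code): each queue's absolute capacity is its configured fraction times its parent's absolute capacity, recomputed top-down on every refresh.

```python
def recompute_absolute(queues, name="root", parent_abs=1.0):
    """Recompute absolute capacity top-down after a config refresh.
    `queues` maps queue name -> {"capacity": fraction, "children": [...]}."""
    q = queues[name]
    # A leaf's absolute capacity is its fraction of its parent's share.
    q["absolute"] = q["capacity"] * parent_abs
    for child in q.get("children", []):
        recompute_absolute(queues, child, q["absolute"])
```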





[jira] [Assigned] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes

2012-02-07 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3353:


Assignee: Vinod Kumar Vavilapalli

 Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
 -

 Key: MAPREDUCE-3353
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Critical
 Fix For: 0.23.1


 When a node is lost or turns faulty, the AM needs to know about that event 
 so that it can take action, e.g. re-executing map tasks whose intermediate 
 output lives on that faulty node.





[jira] [Assigned] (MAPREDUCE-3747) Memory Total is not refreshed until an app is launched

2012-02-02 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3747:


Assignee: Arun C Murthy

 Memory Total is not refreshed until an app is launched
 --

 Key: MAPREDUCE-3747
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3747
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Assignee: Arun C Murthy
 Fix For: 0.23.1


 Memory Total on the RM UI is not refreshed until an application is 
 launched. This is a problem when the cluster is started for the first time 
 or when there are lost/decommissioned NMs.
 When the cluster is started for the first time, Active Nodes is > 0 but 
 Memory Total = 0. Also, when there are lost/decommissioned nodes, Memory 
 Total has the wrong value.
 This is a useful tool for cluster admins and has to be updated correctly 
 without needing to submit an app each time.
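The expected behavior can be sketched as follows (hypothetical class and field names): Memory Total is maintained incrementally from node registration and removal events, so it is already correct before any app is submitted.

```python
class ClusterMetrics:
    """Sketch: keep Memory Total in sync with node liveness events."""
    def __init__(self):
        self.node_mem = {}          # node id -> memory in MB

    def node_added(self, node, mem_mb):
        self.node_mem[node] = mem_mb

    def node_removed(self, node):   # lost or decommissioned
        self.node_mem.pop(node, None)

    @property
    def memory_total(self):
        # Always derived from the live node set, never from app launches.
        return sum(self.node_mem.values())
```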





[jira] [Assigned] (MAPREDUCE-3784) maxActiveApplications(|PerUser) per queue is too low for small clusters

2012-02-01 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3784:


Assignee: Arun C Murthy

 maxActiveApplications(|PerUser) per queue is too low for small clusters
 ---

 Key: MAPREDUCE-3784
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3784
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Assignee: Arun C Murthy

 We ran into this issue while testing on small clusters.
 On a 7-node cluster with 8G per node, for a queue with absolute capacity 
 30% and user limit 100%, maxActiveApplications and 
 maxActiveApplicationsPerUser are calculated to be 1.
 This means that even though the queue has ~17GB (0.3*8*7), only 1 user can 
 run 1 app at a given time, queuing up the rest of the apps/users. This 
 hurts performance on small clusters.
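The effect can be reproduced with a toy model of the limit. The formula shape and parameter names here are assumptions for illustration, not the scheduler's exact code: a small AM-resource fraction of a small cluster, scaled by the queue's absolute capacity, floors to 1.

```python
import math

def max_active_apps(cluster_mem_gb, abs_capacity, am_mem_gb=2.0,
                    max_am_resource_percent=0.1):
    """Toy model: AM memory budget for the queue divided by per-AM
    memory, floored, but never below 1."""
    am_budget = cluster_mem_gb * max_am_resource_percent * abs_capacity
    return max(1, int(math.floor(am_budget / am_mem_gb)))
```

On the 7-node/8G cluster above the budget is 56 * 0.1 * 0.3 = 1.68 GB, which holds less than one 2 GB AM, so the limit bottoms out at 1; a 10x larger cluster gets a sensible limit.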





[jira] [Assigned] (MAPREDUCE-3427) streaming tests fail with MR2

2012-01-30 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3427:


Assignee: Hitesh Shah

 streaming tests fail with MR2
 -

 Key: MAPREDUCE-3427
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3427
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming, mrv2
Affects Versions: 0.23.1, 0.24.0
Reporter: Alejandro Abdelnur
Assignee: Hitesh Shah
Priority: Blocker
 Fix For: 0.23.1, 0.24.0


 After Mavenizing streaming and getting its testcases to use the MiniMRCluster 
 wrapper (MAPREDUCE-3169), 4 testcases fail to pass.
 Following is an assessment of those failures. Note that the testcases have 
 been tweaked only to set the streaming JAR and yarn as the framework.
  
 (If these issues are unrelated we should create sub-tasks for each one of 
 them).
 *TestStreamingCombiner*, fails because returned counters don't match 
 assertion. However, counters printed in the test output indicate values that 
 would satisfy the assertion. As Tom has indicated, it seems MR/YARN are not 
 passing back counter information to the client API.
 *TestStreamingBadRecords*, the job is failing with the following exception
 {code}
 Application application_1321575850006_0001 failed 1 times due to AM Container 
 for 
 appattempt_1321575850006_0001_01 exited with  exitCode: 127 due to: 
 .Failing this attempt.. Failing the application.
 {code}
 This is difficult to troubleshoot because there are no task logs from the 
 Mini MR/YARN run.
 *TestStreamingStatus* fails in validateTaskStatus() in the following assertion
 {code}
 expected:[before consuming input  sort] but was:[SUCCEEDED]
 {code}
 *TestUlimit* fails with
 {code}
 org.junit.ComparisonFailure: output is wrong expected:[786432] but 
 was:[unlimited]
 {code}





[jira] [Assigned] (MAPREDUCE-3640) AMRecovery should pick completed task form partial JobHistory files

2012-01-30 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3640:


Assignee: Arun C Murthy  (was: Siddharth Seth)

 AMRecovery should pick completed task form partial JobHistory files
 ---

 Key: MAPREDUCE-3640
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3640
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Siddharth Seth
Assignee: Arun C Murthy
Priority: Blocker
 Attachments: MAPREDUCE-3640.patch


 Currently, if the JobHistory file has a partial record, AMRecovery will start 
 from scratch. This will become more relevant after MAPREDUCE-3512.





[jira] [Assigned] (MAPREDUCE-3763) Failed refreshQueues due to misconfiguration prevents further refreshing of queues

2012-01-30 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3763:


Assignee: Arun C Murthy

 Failed refreshQueues due to misconfiguration prevents further refreshing of 
 queues
 --

 Key: MAPREDUCE-3763
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3763
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Assignee: Arun C Murthy
 Fix For: 0.23.1


 Stumbled upon this problem while refreshing queues with an incorrect 
 configuration. The exact scenario was:
 1. Added a new queue newQueue without defining its capacity.
 2. bin/mapred queue -refreshQueues fails correctly with Illegal capacity 
 of -1 for queue root.newQueue.
 3. However, after defining the capacity of newQueue, a second 
 bin/mapred queue -refreshQueues throws 
 org.apache.hadoop.metrics2.MetricsException: Metrics source 
 QueueMetrics,q0=root,q1=newQueue already exists! Also, the 
 Hadoop:name=QueueMetrics,q0=root,q1=newQueue,service=ResourceManager 
 metrics are available even though the queue was not added.
 The expected behavior would be to refresh the queues correctly and allow 
 addition of newQueue.
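A common remedy for this "already exists" failure mode (an illustrative sketch, not the actual metrics2 API) is to make source registration idempotent, so a refresh that partially failed earlier can simply be retried:

```python
class MetricsRegistry:
    """Sketch of an idempotent registry: re-registering a source name
    left behind by a failed refresh returns the existing source instead
    of raising an 'already exists' error."""
    def __init__(self):
        self._sources = {}

    def register(self, name, source):
        # get-or-create: never fails on a duplicate name.
        return self._sources.setdefault(name, source)

    def unregister(self, name):
        # Safe to call even if the source was never registered.
        self._sources.pop(name, None)
```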





[jira] [Assigned] (MAPREDUCE-3699) Default RPC handlers are very low for YARN servers

2012-01-25 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3699:


Assignee: Hitesh Shah

 Default RPC handlers are very low for YARN servers
 --

 Key: MAPREDUCE-3699
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3699
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Hitesh Shah
 Fix For: 0.23.1


 Mainly, the NM has a default of 5 handlers, the RM 10, and the AM 10, 
 irrespective of num-slots, num-nodes and num-tasks respectively. Though 
 ideally we want to scale according to slots/nodes/tasks, for now increasing 
 the defaults should be enough.





[jira] [Assigned] (MAPREDUCE-3681) capacity scheduler LeafQueues calculate used capacity wrong

2012-01-17 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3681:


Assignee: Arun C Murthy

 capacity scheduler LeafQueues calculate used capacity wrong
 ---

 Key: MAPREDUCE-3681
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3681
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Assignee: Arun C Murthy
Priority: Blocker

 In the Capacity scheduler, if you configure the queues hierarchically 
 (root -> parent queue -> leaf queue), the leaf queue doesn't calculate its 
 used capacity properly. It seems to use the entire cluster memory rather 
 than its parent's memory capacity. 
 In updateResource in LeafQueue:
 setUsedCapacity(
 usedResources.getMemory() / (clusterResource.getMemory() * capacity));
 I think clusterResource.getMemory() should be something like 
 getParentsMemory().
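The reported behavior and the suggested fix can be contrasted in toy form (illustrative function names and MB units, not the real LeafQueue code):

```python
def used_capacity_buggy(used_mb, cluster_mb, capacity):
    # Reported behavior: divides by the whole cluster's memory.
    return used_mb / (cluster_mb * capacity)

def used_capacity_fixed(used_mb, parent_mb, capacity):
    # Suggested: divide by the parent queue's memory share instead.
    return used_mb / (parent_mb * capacity)
```

With a 100 GB cluster, a parent holding 50% of it, and a leaf at 50% of the parent, a leaf using its full 25 GB share should report 100% used, not 50%.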





[jira] [Assigned] (MAPREDUCE-3683) Capacity scheduler LeafQueues maximum capacity calculation issues

2012-01-17 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3683:


Assignee: Arun C Murthy

 Capacity scheduler LeafQueues maximum capacity calculation issues
 -

 Key: MAPREDUCE-3683
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3683
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Assignee: Arun C Murthy
Priority: Blocker

 In the Capacity scheduler, if you configure the queues hierarchically 
 (root -> parent queue -> leaf queue), the leaf queue doesn't take into 
 account its parent's maximum capacity when calculating its own maximum 
 capacity; instead it seems to use the parent's capacity. Looking at the 
 code, it uses the parent's absoluteCapacity, and I think it should use the 
 parent's absoluteMaximumCapacity.
 It also seems to use the parent's capacity in the leaf queue's max capacity 
 calculation only when the leaf queue has a max capacity configured. If the 
 leaf queue's maximum-capacity is not configured, it can use 100% of the 
 cluster.
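In toy form, the two behaviors the report describes (illustrative names, not the real scheduler code):

```python
def abs_max_capacity(leaf_max, parent_abs_capacity, parent_abs_max):
    """leaf_max is the leaf's configured maximum-capacity fraction, or
    None if unset. Returns (reported, suggested) absolute maximums."""
    # Reported behavior: scales by the parent's absoluteCapacity, and an
    # unconfigured leaf falls back to 100% of the cluster.
    reported = 1.0 if leaf_max is None else leaf_max * parent_abs_capacity
    # Suggested: scale by the parent's absoluteMaximumCapacity, and cap
    # an unconfigured leaf at its parent's absolute maximum.
    suggested = parent_abs_max if leaf_max is None else leaf_max * parent_abs_max
    return reported, suggested
```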





[jira] [Assigned] (MAPREDUCE-3612) Task.TaskReporter.done method blocked for some time when task is finishing

2012-01-03 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3612:


Assignee: Binglin Chang

 Task.TaskReporter.done method blocked for some time when task is finishing
 --

 Key: MAPREDUCE-3612
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3612
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Binglin Chang
Assignee: Binglin Chang
 Attachments: MAPREDUCE-3612.patch


 We recently ran some tests to evaluate the performance of different Hadoop 
 versions (1.0, 0.23, Baidu's internal version) and found some weird 
 results. One of them is that in 1.0, Task.TaskReporter.done() takes too 
 much time, about 2s, which is bad for small tasks. After reviewing the 
 source code and adding some logging, we found that the following code 
 blocks Task.TaskReporter.done:
 {code:title=src/mapred/org/apache/hadoop/mapred/Task.java}
  658   try {
  659 Thread.sleep(PROGRESS_INTERVAL);
  660   }
  723 public void stopCommunicationThread() throws InterruptedException {
  724   // Updating resources specified in ResourceCalculatorPlugin
  725   if (pingThread != null) {
  726 synchronized(lock) {
  727   while(!done) {
  728 lock.wait();
  729   }
  730 }
  731 pingThread.interrupt();
  732 pingThread.join();
  733   }
  734 }
 {code}
 Originally lines 724-730 didn't exist; I don't know why they were added. 
 If they are needed, we can use Object.wait(timeout) and Object.notify 
 instead of Thread.sleep, so done() won't block.
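The suggested wait/notify shape, sketched with Python's threading primitives for brevity (the real fix would use Java's Object.wait/notify; class and method names here are made up):

```python
import threading

class CommunicationThreadStopper:
    """Sketch: the progress loop waits on a condition with a timeout
    instead of an unconditional sleep, so done() can wake it at once
    rather than blocking for up to a full PROGRESS_INTERVAL."""
    def __init__(self):
        self._cond = threading.Condition()
        self._done = False

    def wait_interval(self, interval_s):
        # Replaces Thread.sleep(PROGRESS_INTERVAL): wakes early on done.
        with self._cond:
            if not self._done:
                self._cond.wait(timeout=interval_s)
        return self._done

    def mark_done(self):
        # Replaces the blocking lock.wait() handshake with a notify.
        with self._cond:
            self._done = True
            self._cond.notify_all()
```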





[jira] [Assigned] (MAPREDUCE-3596) Sort benchmark got hang after completion of 99% map phase

2012-01-03 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3596:


Assignee: Vinod Kumar Vavilapalli

 Sort benchmark got hang after completion of 99% map phase
 -

 Key: MAPREDUCE-3596
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3596
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Attachments: logs.tar.bz2


 Courtesy [~vinaythota]
 {quote}
 Ran the sort benchmark a couple of times, and every time the job hung 
 after completing 99% of the map phase. Some map tasks failed. Also, some of 
 the pending map tasks were never scheduled.
 Cluster size is 350 nodes.
 Build Details:
 ==
 Compiled:   Fri Dec 9 16:25:27 PST 2011 by someone from 
 branches/branch-0.23/hadoop-common-project/hadoop-common 
 ResourceManager version:revision 1212681 by someone source checksum 
 on Fri Dec 9 16:52:07 PST 2011
 Hadoop version: revision 1212592 by someone Fri Dec 9 16:25:27 PST 
 2011
 {quote}





[jira] [Assigned] (MAPREDUCE-3605) Allow mr commands to be run via bin/hadoop

2011-12-28 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3605:


Assignee: Arun C Murthy

 Allow mr commands to be run via bin/hadoop
 --

 Key: MAPREDUCE-3605
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3605
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Ramya Sunil
Assignee: Arun C Murthy
 Fix For: 0.23.1


 MR command line options are not supported in bin/hadoop.
 {noformat}
 bin/hadoop job
 Exception in thread main java.lang.NoClassDefFoundError: job
 Caused by: java.lang.ClassNotFoundException: job
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 Could not find the main class: job.  Program will exit.
 {noformat}
 A deprecation message like "DEPRECATED: Use of this script to execute 
 mapred commands is deprecated. Instead use the mapred command for it." 
 should be displayed along with the correct output.





[jira] [Assigned] (MAPREDUCE-3490) RMContainerAllocator counts failed maps towards Reduce ramp up

2011-12-21 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3490:


Assignee: Arun C Murthy

 RMContainerAllocator counts failed maps towards Reduce ramp up
 --

 Key: MAPREDUCE-3490
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3490
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Siddharth Seth
Assignee: Arun C Murthy
Priority: Blocker
 Attachments: MAPREDUCE-3490.patch


 The RMContainerAllocator does not differentiate between failed and 
 successful maps when calculating whether reduce tasks are ready to launch. 
 Failed tasks are also counted towards total completed tasks.
 Example: 4 failed maps, 10 total maps. Map % complete = 4/14 * 100 instead 
 of 0.
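The intended computation can be sketched as follows (an illustrative model, not the RMContainerAllocator code): only successful maps count toward the completion fraction that gates reduce ramp-up, while failed attempts only grow the denominator, since they must be re-run.

```python
def map_completion_fraction(succeeded, failed, scheduled):
    """Fraction of map work done, for reduce ramp-up decisions.
    Failed attempts add to the attempt total but not to progress."""
    total = scheduled + failed  # failed maps are re-run, growing the total
    return succeeded / total if total else 0.0
```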





[jira] [Assigned] (MAPREDUCE-3483) CapacityScheduler reserves container on same node as AM but can't ever use due to never enough avail memory

2011-12-13 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3483:


Assignee: Arun C Murthy

 CapacityScheduler reserves container on same node as AM but can't ever use 
 due to never enough avail memory
 ---

 Key: MAPREDUCE-3483
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3483
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Assignee: Arun C Murthy
Priority: Blocker

 Saw a case where a job was stuck trying to get reducers. The issue is that 
 the capacity scheduler reserved a container on the same node as the 
 application master, but there was never enough memory to run the reducer on 
 that node. Node total memory was 8G, the reducer needed 8G, and the AM was 
 using 2G. This particular job had 10 reducers and was stuck waiting on the 
 one because the AM + reserved reducer memory was already over the queue 
 limit.
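A guard of the kind this implies, in sketch form (hypothetical names, not the scheduler's actual reservation logic): never hold a reservation on a node whose capacity, minus memory pinned for the job's whole lifetime (such as its own AM), can never satisfy the request.

```python
def can_ever_satisfy(node_total_mb, pinned_mb, request_mb):
    """A reservation only makes sense if the node could satisfy the
    request once transient containers finish. Memory held by the job's
    own AM never frees before its reducers, so it counts as pinned."""
    return node_total_mb - pinned_mb >= request_mb
```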





[jira] [Assigned] (MAPREDUCE-3524) Scan runtime is more than 1.5x slower in 0.23

2011-12-13 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3524:


Assignee: Vinod Kumar Vavilapalli

 Scan runtime is more than 1.5x slower in 0.23
 -

 Key: MAPREDUCE-3524
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3524
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker

 Scan runtime is more than 1.5x slower (almost a 92% increase) in 0.23 than 
 in Hadoop-0.20.204 on a 350-node cluster.





[jira] [Assigned] (MAPREDUCE-3534) Compression benchmark run-time increased by 13% in 0.23

2011-12-13 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3534:


Assignee: Vinod Kumar Vavilapalli

 Compression benchmark run-time increased by 13% in 0.23
 ---

 Key: MAPREDUCE-3534
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3534
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Vinay Kumar Thota
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker

 Compression runtime increased by 13% and throughput decreased by 24% in 
 0.23 compared to 0.20.204 on a 350-node cluster.





[jira] [Assigned] (MAPREDUCE-3525) Shuffle runtime is nearly 1.5x slower in 0.23

2011-12-13 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3525:


Assignee: Vinod Kumar Vavilapalli

 Shuffle runtime is nearly 1.5x slower in 0.23
 -

 Key: MAPREDUCE-3525
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3525
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Vinod Kumar Vavilapalli

 Shuffle runtime is nearly 1.5x slower (an almost 55% increase) in 0.23 than 
 in Hadoop-0.20.204 on a 350-node cluster.





[jira] [Assigned] (MAPREDUCE-3511) Counters occupy a good part of AM heap

2011-12-13 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3511:


Assignee: Vinod Kumar Vavilapalli  (was: Devaraj K)

Devaraj - if you don't mind, I'll assign this to Vinod since he is blocked by 
this. Thanks.

 Counters occupy a good part of AM heap
 --

 Key: MAPREDUCE-3511
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3511
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am, mrv2
Affects Versions: 0.23.0
Reporter: Siddharth Seth
Assignee: Vinod Kumar Vavilapalli

 Per-task counters seem to occupy a good part of an AM's heap: more than 50% 
 of what a TaskAttemptImpl object uses.
 This could be optimized by interning strings, or possibly by using the 
 already-optimized mrv1 counters. Currently counters are converted from mrv1 
 to mrv2 format for in-memory storage; the conversion could be delayed until 
 it is actually required for RPC transfers.
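The string-interning idea can be sketched in plain Java (this is an illustration of the technique, not the actual Hadoop counter classes; the counter name below is just an example): when thousands of task attempts each deserialize their own copy of an equal counter-name string, interning collapses them to a single shared instance on the heap.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterInterning {
    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int task = 0; task < 1000; task++) {
            // Each attempt "deserializes" its own copy of the counter name...
            String name = new String("FILE_BYTES_READ");
            // ...but interning collapses them to one shared instance.
            names.add(name.intern());
        }
        // Every entry now refers to the same object, not merely an equal one.
        boolean allSame = true;
        for (String n : names) {
            if (n != names.get(0)) { allSame = false; }
        }
        System.out.println(allSame); // true
    }
}
```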





[jira] [Assigned] (MAPREDUCE-3530) Sometimes NODE_UPDATE to the scheduler throws NPE causes scheduling to stop

2011-12-13 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3530:


Assignee: Arun C Murthy  (was: Vinod Kumar Vavilapalli)

I'll take a look.

 Sometimes NODE_UPDATE to the scheduler throws NPE causes scheduling to stop
 ---

 Key: MAPREDUCE-3530
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3530
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, resourcemanager, scheduler
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Arun C Murthy
Priority: Blocker

 Sometimes a NODE_UPDATE event to the scheduler throws an NPE, causing 
 scheduling to stop while the ResourceManager keeps running.
 I have been observing this intermittently for the last 3 weeks.
 With the latest svn code, I tried to run sort twice and both times the job 
 got stuck due to the NPE.
 {code}
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerLaunchedOnNode(SchedulerApp.java:181)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.containerLaunchedOnNode(CapacityScheduler.java:596)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:539)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:617)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:77)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:294)
 at java.lang.Thread.run(Thread.java:619)
 {code}
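The stack trace points at an unguarded dereference in containerLaunchedOnNode. A hypothetical sketch of that failure mode and the usual defensive fix (the class and map below are invented stand-ins, not the real SchedulerApp code): a node update reports a launched container whose record is already gone, the map lookup returns null, and the dereference throws.

```java
import java.util.HashMap;
import java.util.Map;

public class NodeUpdateGuard {
    static final Map<String, String> liveContainers = new HashMap<>();

    // Unguarded version: NPE when the container is unknown.
    static int unguarded(String containerId) {
        return liveContainers.get(containerId).length();
    }

    // Guarded version: log and skip the stale container instead of
    // letting the NPE kill the scheduler's event-processing thread.
    static int guarded(String containerId) {
        String c = liveContainers.get(containerId);
        if (c == null) {
            System.out.println("Unknown container " + containerId + ", ignoring");
            return -1;
        }
        return c.length();
    }

    public static void main(String[] args) {
        try {
            unguarded("container_001");
        } catch (NullPointerException e) {
            System.out.println("NPE");
        }
        System.out.println(guarded("container_001"));
    }
}
```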





[jira] [Assigned] (MAPREDUCE-3399) ContainerLocalizer should request new resources after completing the current one

2011-12-13 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3399:


Assignee: Siddharth Seth

 ContainerLocalizer should request new resources after completing the current 
 one
 

 Key: MAPREDUCE-3399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3399
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Blocker

 Currently, the ContainerLocalizer heartbeats to the NM every second. 
 Not very significant, but this causes a ~4-second delay in jobs (job jar, 
 splits, etc.). Instead, it should heartbeat to ask for additional resources to 
 localize as soon as the previous one is localized. There's already a TODO in 
 the ContainerLocalizer for this.
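The cost of the fixed heartbeat can be modeled with logical time (a toy calculation, not NodeManager code; the millisecond values are assumptions for illustration): with a 1-second poll, each of N small resources pays up to a full heartbeat interval before it is even requested, which is where the ~4-second delay for the job jar, splits, etc. comes from.

```java
public class LocalizerPacing {
    public static void main(String[] args) {
        int resources = 4;       // e.g. job jar, job.xml, splits, split meta
        int heartbeatMs = 1000;  // fixed heartbeat interval
        int localizeMs = 50;     // assume each resource localizes quickly

        // Fixed-interval polling: each request waits for the next heartbeat.
        int polled = 0;
        for (int i = 0; i < resources; i++) {
            polled += heartbeatMs;
        }

        // Event-driven: request the next resource as soon as one completes.
        int eventDriven = 0;
        for (int i = 0; i < resources; i++) {
            eventDriven += localizeMs;
        }

        System.out.println(polled);      // the ~4-second delay reported
        System.out.println(eventDriven); // bounded by actual localization time
    }
}
```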





[jira] [Assigned] (MAPREDUCE-3530) Sometimes NODE_UPDATE to the scheduler throws NPE causes scheduling to stop

2011-12-12 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3530:


Assignee: Vinod Kumar Vavilapalli

 Sometimes NODE_UPDATE to the scheduler throws NPE causes scheduling to stop
 ---

 Key: MAPREDUCE-3530
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3530
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, resourcemanager, scheduler
Affects Versions: 0.23.1
Reporter: Karam Singh
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker

 Sometimes a NODE_UPDATE event to the scheduler throws an NPE, causing 
 scheduling to stop while the ResourceManager keeps running.
 I have been observing this intermittently for the last 3 weeks.
 With the latest svn code, I tried to run sort twice and both times the job 
 got stuck due to the NPE.
 {code}
 java.lang.NullPointerException
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApp.containerLaunchedOnNode(SchedulerApp.java:181)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.containerLaunchedOnNode(CapacityScheduler.java:596)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.nodeUpdate(CapacityScheduler.java:539)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:617)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:77)
 at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher$EventProcessor.run(ResourceManager.java:294)
 at java.lang.Thread.run(Thread.java:619)
 {code}





[jira] [Assigned] (MAPREDUCE-1118) Capacity Scheduler scheduling information is hard to read / should be tabular format

2011-11-22 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-1118:


Assignee: Krishna Ramachandran  (was: Milind Bhandarkar)

 Capacity Scheduler scheduling information is hard to read / should be tabular 
 format
 

 Key: MAPREDUCE-1118
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1118
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/capacity-sched
Affects Versions: 0.20.2
Reporter: Allen Wittenauer
Assignee: Krishna Ramachandran
 Fix For: 0.20.203.0, 0.22.0

 Attachments: MR-1118-22.patch, mapred-1118-1.patch, 
 mapred-1118-2.patch, mapred-1118-3.patch, mapred-1118.20S.patch, 
 mapred-1118.patch


 The scheduling information provided by the capacity scheduler is extremely 
 hard to read on the job tracker web page.  Instead of just flat text, it 
 should be presenting the information in a tabular format, similar to what the 
 fair share scheduler provides.  This makes it much easier to compare what 
 different queues are doing.





[jira] [Assigned] (MAPREDUCE-3329) capacity schedule maximum-capacity allowed to be less then capacity

2011-11-21 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3329:


Assignee: Arun C Murthy

 capacity schedule maximum-capacity allowed to be less then capacity
 ---

 Key: MAPREDUCE-3329
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3329
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Assignee: Arun C Murthy
Priority: Blocker

 When configuring the capacity scheduler's capacity and maximum-capacity, it 
 allows the maximum-capacity to be less than the capacity.  I did not test to 
 see what the effective limit is; I assume maximum-capacity.
 Output from mapred queue -list where capacity = 10% and max capacity = 5%:
 Queue Name : test2 
 Queue State : running 
 Scheduling Info : queueName: test2, capacity: 0.1, maximumCapacity: 5.0, 
 currentCapacity: 0.0, state: Q_RUNNING,  
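The missing sanity check could look roughly like this (a minimal sketch; the method and class names are invented, not the actual CapacityScheduler configuration code): reject a maximum-capacity below capacity at configuration time instead of silently accepting it.

```java
public class QueueCapacityCheck {
    // Validates that a queue's maximum-capacity is not below its capacity.
    static float validate(float capacity, float maximumCapacity) {
        if (maximumCapacity < capacity) {
            throw new IllegalArgumentException(
                "maximum-capacity " + maximumCapacity
                + " cannot be less than capacity " + capacity);
        }
        return maximumCapacity;
    }

    public static void main(String[] args) {
        System.out.println(validate(10f, 50f)); // valid configuration
        try {
            validate(10f, 5f); // the misconfiguration from the report
        } catch (IllegalArgumentException e) {
            System.out.println("rejected");
        }
    }
}
```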





[jira] [Assigned] (MAPREDUCE-3307) Improve logging on the console during job execution

2011-11-21 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3307:


Assignee: Arun C Murthy

 Improve logging on the console during job execution
 ---

 Key: MAPREDUCE-3307
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3307
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Arun C Murthy
Priority: Blocker
 Fix For: 0.23.1


 There is a lot of redundant information printed on the console and a 
 not-so-intuitive flow of events. We should improve the console logging 
 during job execution. More details in the next comment.





[jira] [Assigned] (MAPREDUCE-3265) Reduce log level on MR2 IPC construction, etc

2011-11-21 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3265:


Assignee: Arun C Murthy

 Reduce log level on MR2 IPC construction, etc
 -

 Key: MAPREDUCE-3265
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3265
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Arun C Murthy
Priority: Blocker

 Currently MR's IPC logging is very verbose. For example, I see a lot of:
 11/10/25 12:14:06 INFO ipc.YarnRPC: Creating YarnRPC for 
 org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
 11/10/25 12:14:06 INFO mapred.ResourceMgrDelegate: Connecting to 
 ResourceManager at c0309.hal.cloudera.com/172.29.81.91:40012
 11/10/25 12:14:06 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy 
 for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
 11/10/25 12:14:07 INFO mapred.ResourceMgrDelegate: Connected to 
 ResourceManager at c0309.hal.cloudera.com/172.29.81.91:40012
 11/10/25 12:14:08 INFO mapred.ClientCache: Connecting to HistoryServer at: 
 c0309.hal.cloudera.com:10020
 11/10/25 12:14:08 INFO ipc.YarnRPC: Creating YarnRPC for 
 org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
 11/10/25 12:14:08 INFO mapred.ClientCache: Connected to HistoryServer at: 
 c0309.hal.cloudera.com:10020
 11/10/25 12:14:08 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy 
 for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
 ... when submitting a job. This should be DEBUG level.
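The proposed change amounts to demoting routine construction messages below the default level. A sketch using java.util.logging as a stand-in (Hadoop itself uses commons-logging; the logger name and method are illustrative): guard the message behind a debug-level check so job submission stays quiet unless debugging is enabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RpcLogLevel {
    static final Logger LOG = Logger.getLogger("ipc.YarnRPC");

    static String createProxy(String protocol) {
        // Was effectively INFO-level: printed on every job submission.
        // Demoted: only emitted when debug logging is turned on.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Creating proxy for protocol " + protocol);
        }
        return "proxy:" + protocol;
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // default level: the fine() call is skipped
        System.out.println(createProxy("ClientRMProtocol"));
    }
}
```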





[jira] [Assigned] (MAPREDUCE-3284) bin/mapred queue fails with JobQueueClient ClassNotFoundException

2011-10-27 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3284:


Assignee: Arun C Murthy

Trivial fix is to move JobQueueClient to hadoop-mapreduce-client-core. We 
should have done this long ago...

 bin/mapred queue fails with JobQueueClient ClassNotFoundException
 -

 Key: MAPREDUCE-3284
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3284
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Arun C Murthy
 Fix For: 0.23.0


 bin/mapred queue fails with the following exception:
 {code}
 -bash$ bin/mapred queue
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/hadoop/mapred/JobQueueClient
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.mapred.JobQueueClient
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 Could not find the main class: org.apache.hadoop.mapred.JobQueueClient.  
 Program will exit.
 {code}





[jira] [Assigned] (MAPREDUCE-3282) bin/mapred job -list throws exception

2011-10-27 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3282:


Assignee: Arun C Murthy

 bin/mapred job -list throws exception
 -

 Key: MAPREDUCE-3282
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3282
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Arun C Murthy
Priority: Critical
 Fix For: 0.23.0

 Attachments: MAPREDUCE-3282.patch


 bin/mapred job -list throws an exception when mapreduce.framework.name is 
 set to yarn.





[jira] [Assigned] (MAPREDUCE-3290) list-active-trackers throws NPE

2011-10-27 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3290:


Assignee: Arun C Murthy

Ramya, I can't seem to reproduce this with latest builds. Can you please let me 
know further details? Tx

 list-active-trackers throws NPE
 ---

 Key: MAPREDUCE-3290
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3290
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Arun C Murthy
 Fix For: 0.23.0


 bin/mapred -list-active-trackers throws NPE in mrV2. Trace in the next 
 comment.





[jira] [Assigned] (MAPREDUCE-2766) [MR-279] Set correct permissions for files in dist cache

2011-10-26 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-2766:


Assignee: Hitesh Shah

 [MR-279] Set correct permissions for files in dist cache
 

 Key: MAPREDUCE-2766
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2766
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Hitesh Shah
Priority: Blocker
 Fix For: 0.23.0


 Currently, the files in both the public and private dist cache have 777 
 permissions. Also, the group ownership of files in the private cache has to 
 be set to $TT_SPECIAL_GROUP.





[jira] [Assigned] (MAPREDUCE-2821) [MR-279] Missing fields in job summary logs

2011-10-24 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-2821:


Assignee: Mahadev konar  (was: Harsh J)

 [MR-279] Missing fields in job summary logs 
 

 Key: MAPREDUCE-2821
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2821
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Mahadev konar
Priority: Blocker
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2821.patch


 The following fields are missing in the job summary logs in mrv2:
 - numSlotsPerMap
 - numSlotsPerReduce
 - clusterCapacity (Earlier known as clusterMapCapacity and 
 clusterReduceCapacity in 0.20.x)
 The first two fields are needed to tell whether the job was a high-RAM job, 
 and the last field gives the total available resources in the cluster during 
 job execution.





[jira] [Assigned] (MAPREDUCE-3254) Streaming jobs failing with PipeMapRunner ClassNotFoundException

2011-10-24 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3254:


Assignee: Hitesh Shah

 Streaming jobs failing with PipeMapRunner ClassNotFoundException
 

 Key: MAPREDUCE-3254
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3254
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming, mrv2
Affects Versions: 0.23.0
Reporter: Ramya Sunil
Assignee: Hitesh Shah
Priority: Blocker

 ClassNotFoundException: org.apache.hadoop.streaming.PipeMapRunner encountered 
 while running streaming jobs. Stack trace in the next comment.





[jira] [Assigned] (MAPREDUCE-3253) ContextFactory throw NoSuchFieldException

2011-10-24 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3253:


Assignee: Arun C Murthy  (was: Mahadev konar)

Ok, I can reproduce this on branch-0.23.

 ContextFactory throw NoSuchFieldException
 -

 Key: MAPREDUCE-3253
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3253
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Daniel Dai
Assignee: Arun C Murthy
Priority: Blocker

 I see exceptions from ContextFactory when I am running Pig unit tests:
 Caused by: java.lang.IllegalArgumentException: Can't find field
 at 
 org.apache.hadoop.mapreduce.ContextFactory.<clinit>(ContextFactory.java:139)
 Caused by: java.lang.NoSuchFieldException: reporter
 at java.lang.Class.getDeclaredField(Class.java:1882)
 at 
 org.apache.hadoop.mapreduce.ContextFactory.<clinit>(ContextFactory.java:126)
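The failure pattern is a reflective shim assuming a private field that is absent in the class it was given. A minimal reproduction in plain Java (the nested classes and field names below are stand-ins, not the real ContextFactory internals): getDeclaredField throws NoSuchFieldException, so a shim that must survive layout changes probes a list of candidate names instead of failing class initialization.

```java
import java.lang.reflect.Field;

public class FieldProbe {
    static class OldContext { Object reporter; }
    static class NewContext { Object progressReporter; } // field renamed

    // Probe known field names defensively rather than assuming one layout.
    static String findReporterField(Class<?> cls) {
        for (String candidate : new String[] {"reporter", "progressReporter"}) {
            try {
                Field f = cls.getDeclaredField(candidate);
                return f.getName();
            } catch (NoSuchFieldException e) {
                // Try the next known name instead of failing the class init.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findReporterField(OldContext.class));
        System.out.println(findReporterField(NewContext.class));
    }
}
```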





[jira] [Assigned] (MAPREDUCE-2977) ResourceManager needs to renew and cancel tokens associated with a job

2011-10-22 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-2977:


Assignee: Arun C Murthy  (was: Vinod Kumar Vavilapalli)

 ResourceManager needs to renew and cancel tokens associated with a job
 --

 Key: MAPREDUCE-2977
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2977
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2, resourcemanager, security
Affects Versions: 0.23.0
Reporter: Owen O'Malley
Assignee: Arun C Murthy
Priority: Blocker
 Fix For: 0.23.0


 The JobTracker currently manages tokens for the applications and the resource 
 manager needs the same functionality.





[jira] [Assigned] (MAPREDUCE-3028) Support job end notification in .next /0.23

2011-10-22 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3028:


Assignee: Ravi Prakash  (was: Arun C Murthy)

 Support job end notification in .next /0.23
 ---

 Key: MAPREDUCE-3028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Mohammad Kamrul Islam
Assignee: Ravi Prakash
Priority: Blocker
 Fix For: 0.23.0

 Attachments: MAPREDUCE-3028.branch-0.23.patch, MAPREDUCE-3028.patch, 
 MAPREDUCE-3028.patch, MAPREDUCE-3028.patch, MAPREDUCE-3028.patch, 
 MAPREDUCE-3028.patch


 Oozie primarily depends on the job end notification to determine when a 
 job finishes. In the current version, job end notification is implemented in 
 the JobTracker. Since the JobTracker will be removed in the upcoming Hadoop 
 release (.next), we wonder where this support will move. I think this 
 best-effort notification could be implemented in the new Application Manager 
 as one of the last steps of job completion.
 Whatever the implementation, Oozie badly needs this feature to continue in 
 the next releases as well.
  





[jira] [Assigned] (MAPREDUCE-3143) Complete aggregation of user-logs spit out by containers onto DFS

2011-10-20 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3143:


Assignee: (was: Vinod Kumar Vavilapalli)

 Complete aggregation of user-logs spit out by containers onto DFS
 -

 Key: MAPREDUCE-3143
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3143
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
 Fix For: 0.23.0


 The feature for handling user-logs spit out by containers is already 
 implemented in the NodeManager, but it is currently disabled due to 
 user-interface issues.
 This is the umbrella ticket for tracking the pending bugs w.r.t. putting 
 container-logs on DFS.





[jira] [Assigned] (MAPREDUCE-2746) [MR-279] [Security] Yarn servers can't communicate with each other with hadoop.security.authorization set to true

2011-10-20 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-2746:


Assignee: Arun C Murthy

 [MR-279] [Security] Yarn servers can't communicate with each other with 
 hadoop.security.authorization set to true
 -

 Key: MAPREDUCE-2746
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2746
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2, security
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Arun C Murthy
Priority: Blocker
 Fix For: 0.23.0


 Because of this problem, until now we've been testing YARN+MR with 
 {{hadoop.security.authorization}} set to false. We need to register the yarn 
 communication protocols in the implementation of the authorization-related 
 PolicyProvider (MapReducePolicyProvider.java).
 [~devaraj] also found this issue independently.





[jira] [Assigned] (MAPREDUCE-3178) Capacity Schedular shows incorrect cluster information in the RM logs

2011-10-20 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3178:


Assignee: Bhallamudi Venkata Siva Kamesh

Bala, this is a good bug to fix. Can you please add a unit test? Tx.

 Capacity Schedular shows incorrect cluster information in the RM logs
 -

 Key: MAPREDUCE-3178
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3178
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Bhallamudi Venkata Siva Kamesh
Assignee: Bhallamudi Venkata Siva Kamesh
 Attachments: MAPREDUCE-3178.patch


 When we start the NM after stopping it (in quick succession), the CS shows 
 incorrect information about clusterResource in the logs.
 I encountered this issue in pseudo-cluster mode; the steps to reproduce are:
 1) start the YARN cluster
 2) stop a NM and start it again (in quick succession)
 A NM should then be running in the cluster; however, as I observed, the RM 
 only detects the NM as dead after the default expiry time from its actual 
 unavailability (in this case, when the NM was stopped).
  
 If you restart the NM before this expiry time, the ResourceTracker throws an 
 IOException, yet the CS still adds the NM's capacity to clusterResource. 
 After the expiry time elapses and the RM detects the old NM as dead, the RM 
 removes it and subtracts the NM's capacity from the cluster capacity.
 Eventually no NM is running in the cluster, but the cluster capacity still 
 shows one NM's worth of capacity (by default).
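The accounting error described above can be reduced to a toy model (invented names and numbers, not the CapacityScheduler code): capacity is added on every registration, even the rejected re-registration, but only subtracted once when the node expires, leaving phantom capacity behind.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterResourceModel {
    static int clusterResource = 0;
    static final Map<String, Integer> nodes = new HashMap<>();

    static void register(String nodeId, int capacity) {
        // Bug sketch: capacity is added even when the node is already
        // tracked (the re-registration the tracker rejected with an IOEx).
        clusterResource += capacity;
        nodes.put(nodeId, capacity);
    }

    static void expire(String nodeId) {
        Integer cap = nodes.remove(nodeId);
        if (cap != null) clusterResource -= cap;
    }

    public static void main(String[] args) {
        register("nm1", 8);  // initial start: +8
        register("nm1", 8);  // quick restart: +8 again, map entry overwritten
        expire("nm1");       // RM later declares the node dead: -8, once
        // No NM is running, yet one NM's capacity remains in the total.
        System.out.println(clusterResource);
    }
}
```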





[jira] [Assigned] (MAPREDUCE-3034) NM should act on a REBOOT command from RM

2011-10-18 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3034:


Assignee: Arun C Murthy  (was: Vinod Kumar Vavilapalli)

 NM should act on a REBOOT command from RM
 -

 Key: MAPREDUCE-3034
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3034
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, nodemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Arun C Murthy
Priority: Blocker
 Fix For: 0.23.0


 The RM sends a reboot command to the NM in some cases, such as when the NM 
 gets lost and rejoins. In such a case, the NM should act on the command and 
 reboot/reinitialize itself.
 This is akin to the TT reinitializing on order from the JT. We will need to 
 shut down all the services properly and reinitialize; this should 
 automatically take care of killing containers, cleaning up local temporary 
 files, etc.





[jira] [Assigned] (MAPREDUCE-3103) Implement Job ACLs for MRAppMaster

2011-10-18 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3103:


Assignee: Vinod Kumar Vavilapalli

 Implement Job ACLs for MRAppMaster
 --

 Key: MAPREDUCE-3103
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3103
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: mrv2, security
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker
 Fix For: 0.23.0








[jira] [Assigned] (MAPREDUCE-3058) Sometimes task keeps on running while its Syslog says that it is shutdown

2011-10-17 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3058:


Assignee: Hitesh Shah  (was: Vinod Kumar Vavilapalli)

Hitesh - can you please take a look since you are already working on related 
areas?

Currently the TT sends SIGTERM followed by SIGKILL...

 Sometimes task keeps on running while its Syslog says that it is shutdown
 -

 Key: MAPREDUCE-3058
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3058
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix, mrv2
Affects Versions: 0.23.0
Reporter: Karam Singh
Assignee: Hitesh Shah
Priority: Critical
 Fix For: 0.23.0

 Attachments: MAPREDUCE-3058-20110923.txt


 While running GridMixV3, one of the jobs got stuck for 15 hrs. After clicking 
 on the job page, we found one of its reducers to be stuck. Looking at the 
 syslog of the stuck reducer, we found this:
 Task-logs' head:
 {code}
 2011-09-19 17:57:22,002 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period 
 at 10 second(s).
 2011-09-19 17:57:22,002 INFO 
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system 
 started
 {code}
 Task-logs' tail:
 {code}
 2011-09-19 18:06:49,818 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
 createBlockOutputStream java.io.IOException: Bad connect ack with 
 firstBadLink as DATANODE1
 2011-09-19 18:06:49,818 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
 for block 
 BP-1405370709-NAMENODE-1316452621953:blk_-7004355226367468317_79871 in 
 pipeline  DATANODE2,  DATANODE1: bad datanode  DATANODE1
 2011-09-19 18:06:49,818 DEBUG 
 org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtocol: 
 lastAckedSeqno = 26870
 2011-09-19 18:06:49,820 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
 (26613121) connection to NAMENODE from gridperf sending #454
 2011-09-19 18:06:49,826 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
 (26613121) connection to NAMENODE from gridperf got value #454
 2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.ipc.RPC: Call: 
 getAdditionalDatanode 8
 2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.hdfs.DFSClient: Connecting to 
 datanode DATANODE2
 2011-09-19 18:06:49,827 DEBUG org.apache.hadoop.hdfs.DFSClient: Send buf size 
 131071
 2011-09-19 18:06:49,833 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
 Exception
 java.io.EOFException: Premature EOF: no length prefix available
 at 
 org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:158)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:860)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:929)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:740)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:415)
 2011-09-19 18:06:49,837 WARN org.apache.hadoop.mapred.YarnChild: Exception 
 running child : java.io.EOFException: Premature EOF: no length prefix 
 available
 at 
 org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:158)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:860)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:838)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:929)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:740)
 at 
 org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:415)
 2011-09-19 18:06:49,837 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
 (26613121) connection to APPMASTER from job_1316452677984_0862 sending #455
 2011-09-19 18:06:49,839 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
 (26613121) connection to APPMASTER from job_1316452677984_0862 got value 
 #455
 2011-09-19 18:06:49,840 DEBUG org.apache.hadoop.ipc.RPC: Call: statusUpdate 3
 2011-09-19 18:06:49,840 INFO org.apache.hadoop.mapred.Task: Runnning cleanup 
 for the task
 2011-09-19 18:06:49,840 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
 (26613121) connection to NAMENODE from gridperf sending #456
 2011-09-19 18:06:49,858 DEBUG org.apache.hadoop.ipc.Client: IPC Client 
 (26613121) connection to NAMENODE from gridperf got value #456
 2011-09-19 18:06:49,858 DEBUG 
 {code}

[jira] [Assigned] (MAPREDUCE-3126) mr job stuck because reducers using all slots and mapper isn't scheduled

2011-10-10 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3126:


Assignee: Arun C Murthy

I'll fix the CS corner case at least with this jira...

 mr job stuck because reducers using all slots and mapper isn't scheduled
 

 Key: MAPREDUCE-3126
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3126
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Thomas Graves
Assignee: Arun C Murthy
Priority: Blocker
 Fix For: 0.23.0


 The command in MAPREDUCE-3124 was run and this job hung with 1 map task 
 waiting for resources and 7 reducers running (2 waiting). The mapper got 
 scheduled, then the AM scheduled the reducers; the map task failed and tried 
 to start a new attempt, but the reducers were using all the slots.
 I will try to add some more info from the logs.





[jira] [Assigned] (MAPREDUCE-2889) Add docs for writing new application frameworks

2011-09-26 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-2889:


Assignee: Hitesh Shah

 Add docs for writing new application frameworks
 ---

 Key: MAPREDUCE-2889
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2889
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: documentation, mrv2
Affects Versions: 0.23.0
Reporter: Arun C Murthy
Assignee: Hitesh Shah
Priority: Critical
 Fix For: 0.23.0


 We need to add docs for writing new application frameworks, including 
 examples, javadocs and sample apps.





[jira] [Assigned] (MAPREDUCE-3084) race when KILL_CONTAINER is received for a LOCALIZED container

2011-09-26 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-3084:


Assignee: Hitesh Shah

 race when KILL_CONTAINER is received for a LOCALIZED container
 --

 Key: MAPREDUCE-3084
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3084
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Siddharth Seth
Assignee: Hitesh Shah
Priority: Blocker

 Depending on when the ContainersLauncher starts a container, a 
 {{KILL_CONTAINER}} received while the container state is {{LOCALIZED}} (with 
 the {{LAUNCH_CONTAINER}} event already sent) can end up generating a 
 {{CONTAINER_LAUNCHED}} event - which isn't handled by ContainerState: 
 {{KILLING}}. Also, the launched container won't be killed, since 
 {{CLEANUP_CONTAINER}} would have already been processed.
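 The race above can be sketched with a hypothetical, minimal state machine - 
 not the actual NM {{ContainerImpl}}; states, events, and actions are 
 simplified stand-ins. The point is the missing transition arm: if 
 {{CONTAINER_LAUNCHED}} arrives while the container is already {{KILLING}}, 
 handling it there (by killing the just-launched process) is what closes the 
 race.

 ```java
 import java.util.ArrayList;
 import java.util.List;

 public class ContainerRaceSketch {
     enum State { LOCALIZED, RUNNING, KILLING, DONE }
     enum Event { LAUNCH_CONTAINER, CONTAINER_LAUNCHED, KILL_CONTAINER }

     State state = State.LOCALIZED;
     List<String> actions = new ArrayList<>();

     void handle(Event e) {
         switch (state) {
             case LOCALIZED:
                 if (e == Event.KILL_CONTAINER) {
                     // LAUNCH_CONTAINER may already be in flight to the
                     // launcher, so the process can still come up.
                     state = State.KILLING;
                     actions.add("cleanup");
                 }
                 break;
             case KILLING:
                 if (e == Event.CONTAINER_LAUNCHED) {
                     // Without this arm the event is unhandled and the
                     // just-launched process is never killed.
                     actions.add("kill-launched-process");
                     state = State.DONE;
                 }
                 break;
             default:
                 break;
         }
     }

     public static void main(String[] args) {
         ContainerRaceSketch c = new ContainerRaceSketch();
         c.handle(Event.KILL_CONTAINER);     // arrives while LOCALIZED
         c.handle(Event.CONTAINER_LAUNCHED); // launcher raced ahead
         // Prints: [cleanup, kill-launched-process] DONE
         System.out.println(c.actions + " " + c.state);
     }
 }
 ```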





[jira] [Assigned] (MAPREDUCE-2989) JobHistory should link to task logs

2011-09-26 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-2989:


Assignee: Siddharth Seth

 JobHistory should link to task logs
 ---

 Key: MAPREDUCE-2989
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2989
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.0, 0.24.0
Reporter: Siddharth Seth
Assignee: Siddharth Seth
Priority: Critical

 The log link on the task attempt page is currently broken, since it relies 
 on a ContainerId. We should either pass the containerId via a history event, 
 or add some kind of field with information about the log location.





[jira] [Assigned] (MAPREDUCE-2977) ResourceManager needs to renew and cancel tokens associated with a job

2011-09-26 Thread Arun C Murthy (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy reassigned MAPREDUCE-2977:


Assignee: Vinod Kumar Vavilapalli

 ResourceManager needs to renew and cancel tokens associated with a job
 --

 Key: MAPREDUCE-2977
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2977
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Owen O'Malley
Assignee: Vinod Kumar Vavilapalli
Priority: Blocker

 The JobTracker currently manages tokens for the applications and the resource 
 manager needs the same functionality.
