[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-15 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


Description: Re-factor MapReduce into a generic resource scheduler and a 
per-job, user-defined component that manages the application execution.   (was: 
We, at Yahoo!, have been using Hadoop-On-Demand as the resource 
provisioning/scheduling mechanism. 

With HoD, the user uses a self-service system to ask for a set of nodes. HoD 
allocates these from a global pool and also provisions a private Map-Reduce 
cluster for the user. She then runs her jobs and shuts the cluster down via HoD 
when done. All user-private clusters use the same humongous, static HDFS (e.g. a 
2k-node HDFS). 

More details about HoD are available here: HADOOP-1301.



h3. Motivation

The current deployment (Hadoop + HoD) has a couple of implications:

 * _Non-optimal Cluster Utilization_

   1. Job-private Map-Reduce clusters imply that the user-cluster could 
potentially be *idle* for at least a while before being detected and shut down.

   2. Elastic Jobs: Map-Reduce jobs typically have lots of maps and a much 
smaller number of reduces, with maps being light and quick and reduces being 
I/O-heavy and longer-running. Users typically allocate clusters based on the 
number of maps (i.e. input size), which leads to the scenario where all the 
maps are done (idle nodes in the cluster) while the few reduces are chugging 
along. Right now we do not have the ability to shrink the HoD'ed Map-Reduce 
clusters, which would alleviate this issue. 

 * _Impact on data-locality_

With the current setup of a static, large HDFS and much smaller (5/10/20/50 
node) clusters, there is a good chance of losing one of Map-Reduce's primary 
features: the ability to execute tasks on the datanodes where the input splits 
are located. In fact, we have seen data-local tasks drop to 20-25 percent in 
the GridMix benchmarks, from the 95-98 percent we see on the randomwriter+sort 
runs that are part of the hadoopqa benchmarks (admittedly a synthetic 
benchmark, but still). That said, HADOOP-1985 (rack-aware Map-Reduce) helps 
significantly here.



Primarily, the notion of *job-level scheduling* leading to private clusters, 
as opposed to *task-level scheduling*, bears the majority of the blame.

Keeping the above factors in mind, here are some thoughts on how to 
re-structure Hadoop Map-Reduce to solve some of these issues.



h3. State of the Art

As it exists today, a large, static Hadoop Map-Reduce cluster (forget HoD for 
a bit) does provide task-level scheduling; however, its scalability to tens of 
thousands of user jobs per week is in question.

Let's review its current architecture and main components (a minimal sketch of 
the heartbeat interaction follows the list):

 * JobTracker: It does both *task-scheduling* and *task-monitoring* 
(TaskTrackers send task statuses via periodic heartbeats), which implies it is 
fairly loaded. It is also a _single point of failure_ in the Map-Reduce 
framework, i.e. its failure implies that all the jobs in the system fail. This 
means a static, large Map-Reduce cluster is fairly susceptible and a definite 
suspect. Clearly HoD solves this by having per-job clusters, albeit with the 
above drawbacks.
 * TaskTracker: The slave in the system, which executes one task at a time 
under direction from the JobTracker.
 * JobClient: The per-job client which just submits the job and polls the 
JobTracker for status. 
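
For orientation, here is a minimal sketch of the heartbeat interaction 
described above. The names and signatures are simplified illustrations, not 
the actual InterTrackerProtocol API.

{code:java}
// Simplified illustration of the TaskTracker/JobTracker heartbeat loop.
// All types here are hypothetical stand-ins for the real protocol classes.
interface TrackerProtocol {
  /** The TaskTracker reports its status; the JobTracker replies with actions. */
  HeartbeatResponse heartbeat(TaskTrackerStatus status);
}

class TaskTrackerStatus { /* task states, free slots, host name, ... */ }
class HeartbeatResponse { /* launch-task / kill-task directives, ... */ }

class TaskTrackerLoop {
  private final TrackerProtocol jobTracker;
  private final long intervalMs;

  TaskTrackerLoop(TrackerProtocol jobTracker, long intervalMs) {
    this.jobTracker = jobTracker;
    this.intervalMs = intervalMs;
  }

  void run() throws InterruptedException {
    while (true) {
      // The JobTracker both schedules (returns directives) and monitors
      // (consumes the reported status) in this single call, which is why
      // it ends up fairly loaded.
      HeartbeatResponse response = jobTracker.heartbeat(collectStatus());
      applyActions(response);
      Thread.sleep(intervalMs);   // periodic heartbeat
    }
  }

  private TaskTrackerStatus collectStatus() { return new TaskTrackerStatus(); }
  private void applyActions(HeartbeatResponse response) { /* launch or kill tasks */ }
}
{code}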



h3. Proposal - Map-Reduce 2.0 

The primary idea is to move to task-level scheduling and static Map-Reduce 
clusters (so as to maintain the same storage cluster and compute cluster 
paradigm) as a way to directly tackle the two main issues illustrated above. 
Clearly, we will have to get around the existing problems, especially w.r.t. 
scalability and reliability.

The proposal is to re-work Hadoop Map-Reduce to make it suitable for a large, 
static cluster. 

Here is an overview of how its main components would look:
 * JobTracker: Turn the JobTracker into a pure task-scheduler, a global one. 
Let's call this the *JobScheduler* henceforth. Clearly, (data-locality-aware) 
Maui/Moab are candidates for being the scheduler, in which case the 
JobScheduler is just a thin wrapper around them. 
 * TaskTracker: These stay as before, apart from some minor changes 
illustrated later in the piece.
 * JobClient: Fatten up the JobClient by putting a lot more intelligence into 
it. Enhance it to talk to the JobTracker to ask for available TaskTrackers and 
then contact them to schedule and monitor the tasks. So we'll have lots of 
per-job clients talking to the JobScheduler and the relevant TaskTrackers for 
their respective jobs, a big change from today. Let's call this the 
*JobManager* henceforth (see the sketch after this list).
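
A hedged sketch of that per-job JobManager flow follows; every type here is a 
hypothetical placeholder for whatever the re-worked client API would expose, 
not an existing Hadoop class.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical placeholders, not real Hadoop types.
interface JobScheduler { List<TaskTrackerHandle> allocate(int pendingTasks); }
interface TaskTrackerHandle { void launch(TaskSpec task); TaskStatus poll(); }
interface JobPlan {
  boolean isComplete();
  int pendingTasks();
  TaskSpec nextTaskFor(TaskTrackerHandle tracker);
  void update(List<TaskStatus> statuses);
}
class TaskSpec { }
class TaskStatus { }

// The per-job JobManager: asks the global JobScheduler for TaskTrackers,
// then schedules and monitors its own tasks directly on those trackers.
class JobManager {
  private final JobScheduler scheduler;
  private final JobPlan plan;

  JobManager(JobScheduler scheduler, JobPlan plan) {
    this.scheduler = scheduler;
    this.plan = plan;
  }

  void run() {
    while (!plan.isComplete()) {
      // 1. Ask the global scheduler which TaskTrackers have free capacity.
      List<TaskTrackerHandle> trackers = scheduler.allocate(plan.pendingTasks());
      List<TaskStatus> statuses = new ArrayList<TaskStatus>();
      for (TaskTrackerHandle tracker : trackers) {
        // 2. Contact the TaskTrackers directly to launch tasks.
        tracker.launch(plan.nextTaskFor(tracker));
        // 3. Poll the TaskTrackers (not the scheduler) for task status.
        statuses.add(tracker.poll());
      }
      plan.update(statuses);
    }
  }
}
{code}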

A broad sketch of how things would work: 

h4. Deployment

There is a single, static, large Map-Reduce cluster, and no per-job clusters.

[jira] Commented: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-15 Thread eric baldeschwieler (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994693#comment-12994693
 ] 

eric baldeschwieler commented on MAPREDUCE-279:
---

We're having a baby!

Todd Papaioannou (p9u) is acting head of Hadoop.
Most line issues can continue to go to Amol, Kazi, Satish, Avik or Senthil as 
appropriate.

I'll be back on roughly March 9th.

CUSoon,
E14


 Map-Reduce 2.0
 --

 Key: MAPREDUCE-279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, tasktracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.23.0


 Re-factor MapReduce into a generic resource scheduler and a per-job, 
 user-defined component that manages the application execution. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-15 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-279:


Comment: was deleted

(was: We're having a baby!

Todd Papaioannou (p9u) is acting head of Hadoop.
Most line issues can continue to go to Amol, Kazi, Satish, Avik or Senthil as 
appropriate.

I'll be back on roughly March 9th.

CUSoon,
E14
)

 Map-Reduce 2.0
 --

 Key: MAPREDUCE-279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, tasktracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.23.0


 Re-factor MapReduce into a generic resource scheduler and a per-job, 
 user-defined component that manages the application execution. 





[jira] Created: (MAPREDUCE-2329) RAID BlockFixer should exclude temporary files

2011-02-15 Thread Ramkumar Vadali (JIRA)
RAID BlockFixer should exclude temporary files
--

 Key: MAPREDUCE-2329
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2329
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor


RAID BlockFixer should exclude files matching the pattern ^/tmp/.*
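
As a sketch of the proposed filtering (not the actual BlockFixer patch), a 
check against that pattern might look like this:

{code:java}
import java.util.regex.Pattern;

// Sketch only: shows the kind of exclusion implied by the pattern ^/tmp/.*;
// the real change would live inside the RAID BlockFixer's file selection.
public class TemporaryPathFilter {
  private static final Pattern TMP_PATTERN = Pattern.compile("^/tmp/.*");

  /** Returns true if the path should be skipped by block fixing. */
  public static boolean isTemporary(String path) {
    return TMP_PATTERN.matcher(path).matches();
  }

  public static void main(String[] args) {
    System.out.println(isTemporary("/tmp/job_123/part-00000")); // true
    System.out.println(isTemporary("/user/alice/data"));        // false
  }
}
{code}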





[jira] Updated: (MAPREDUCE-2329) RAID BlockFixer should exclude temporary files

2011-02-15 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2329:
---

Component/s: contrib/raid

 RAID BlockFixer should exclude temporary files
 --

 Key: MAPREDUCE-2329
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2329
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.20.2, 0.20.3
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
Priority: Minor

 RAID BlockFixer should exclude files matching the pattern ^/tmp/.*





[jira] Commented: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-15 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994928#comment-12994928
 ] 

Arun C Murthy commented on MAPREDUCE-279:
-

h5. Proposal

The fundamental idea of the re-factor is to divide the two major functions of 
the JobTracker, resource management and job scheduling/monitoring, into 
separate components: a generic resource scheduler and a per-job, user-defined 
component that manages the application execution.

The new ResourceManager manages the global assignment of compute resources to 
applications, and the per-application ApplicationMaster manages the 
application's scheduling and coordination. An application is either a single 
job in the classic MapReduce sense or a DAG of such jobs. The ResourceManager 
and the per-machine NodeManager server, which manages the user processes on 
that machine, form the computation fabric. The per-application 
ApplicationMaster is, in effect, a framework-specific library and is tasked 
with negotiating resources from the ResourceManager and working with the 
NodeManager(s) to execute and monitor the tasks.

The ResourceManager is a pure scheduler in the sense that it performs no 
monitoring or tracking of status for the application. Also, it offers no 
guarantees about restarting failed tasks, whether they fail due to application 
errors or hardware failures.

The ResourceManager performs its scheduling function based on the resource 
requirements of the applications; each application has multiple resource 
request types that represent the resources required for containers. The 
resource requests include memory, CPU, disk, network, etc. Note that this is a 
significant change from the current model of fixed-type slots in Hadoop 
MapReduce, which has a significant negative impact on cluster utilization. 
The ResourceManager has a scheduler policy plug-in, which is responsible for 
partitioning the cluster resources among the various queues, applications, 
etc. Scheduler plug-ins can be based, for example, on the current 
CapacityScheduler and FairScheduler.

The NodeManager is the per-machine framework agent that is responsible for 
launching the applications' containers, monitoring their resource usage (CPU, 
memory, disk, network) and reporting the same to the Scheduler.

The per-application ApplicationMaster has the responsibility of negotiating 
appropriate resource containers from the Scheduler, launching tasks, tracking 
their status and monitoring progress, handling task failures, and recovering 
from saved state on a ResourceManager fail-over.

Since downtime is more expensive at scale, high availability is built in from 
the beginning via Apache ZooKeeper for the ResourceManager and HDFS 
checkpoints for the MapReduce ApplicationMaster. Security and multi-tenancy 
support are critical to supporting many users on larger clusters. The new 
architecture will also increase innovation and agility by allowing for 
user-defined versions of the MapReduce runtime. Support for generic resource 
requests will increase cluster utilization by removing artificial bottlenecks 
such as the hard partitioning of resources into map and reduce slots.
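
To make the container-level resource model concrete, here is a hedged sketch. 
The types and fields are illustrative assumptions only; the prototype's actual 
interfaces are not part of this ticket yet.

{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a generic, container-level resource request,
// as opposed to fixed map/reduce slots. Not the prototype's real API.
class ResourceRequest {
  final int memoryMb;       // memory per container
  final int virtualCores;   // CPU per container
  final int numContainers;  // how many containers of this shape
  final String locality;    // host, rack, or "*" for anywhere

  ResourceRequest(int memoryMb, int virtualCores, int numContainers, String locality) {
    this.memoryMb = memoryMb;
    this.virtualCores = virtualCores;
    this.numContainers = numContainers;
    this.locality = locality;
  }
}

class MapReduceApplicationMasterSketch {
  /** The ApplicationMaster, not the ResourceManager, decides what its job needs. */
  List<ResourceRequest> describeNeeds(int pendingMaps, int pendingReduces) {
    return Arrays.asList(
        new ResourceRequest(1024, 1, pendingMaps, "*"),     // lighter map containers
        new ResourceRequest(3072, 1, pendingReduces, "*"));  // heavier reduce containers
  }
}
{code}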



We have a *prototype* we'd like to commit to a branch soon, and we look 
forward to feedback. From there on, we would love to collaborate to get it 
committed to trunk.



 Map-Reduce 2.0
 --

 Key: MAPREDUCE-279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, tasktracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.23.0


 Re-factor MapReduce into a generic resource scheduler and a per-job, 
 user-defined component that manages the application execution. 





[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-15 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-279:


Comment: was deleted

(was: h5. Proposal 

The fundamental idea of the re-factor is to divide the two major functions of 
the JobTracker, resource management and job scheduling/monitoring, into 
separate components: a generic resource scheduler and a per-job, user-defined 
component that manages the application execution. 

The new ResourceManager manages the global assignment of compute resources to 
applications, and the per-application ApplicationMaster manages the 
application's scheduling and coordination. An application is either a single 
job in the classic MapReduce sense or a DAG of such jobs. The ResourceManager 
and the per-machine NodeManager server, which manages the user processes on 
that machine, form the computation fabric. The per-application 
ApplicationMaster is, in effect, a framework-specific library and is tasked 
with negotiating resources from the ResourceManager and working with the 
NodeManager(s) to execute and monitor the tasks.

The ResourceManager is a pure scheduler in the sense that it performs no 
monitoring or tracking of status for the application. Also, it offers no 
guarantees about restarting failed tasks, whether they fail due to application 
errors or hardware failures.

The ResourceManager performs its scheduling function based on the resource 
requirements of the applications; each application has multiple resource 
request types that represent the resources required for containers. The 
resource requests include memory, CPU, disk, network, etc. Note that this is a 
significant change from the current model of fixed-type slots in Hadoop 
MapReduce, which has a significant negative impact on cluster utilization. 
The ResourceManager has a scheduler policy plug-in, which is responsible for 
partitioning the cluster resources among the various queues, applications, 
etc. Scheduler plug-ins can be based, for example, on the current 
CapacityScheduler and FairScheduler.

The NodeManager is the per-machine framework agent that is responsible for 
launching the applications' containers, monitoring their resource usage (CPU, 
memory, disk, network) and reporting the same to the Scheduler.

The per-application ApplicationMaster has the responsibility of negotiating 
appropriate resource containers from the Scheduler, launching tasks, tracking 
their status and monitoring progress, handling task failures, and recovering 
from saved state on a ResourceManager fail-over.

Since downtime is more expensive at scale, high availability is built in from 
the beginning via Apache ZooKeeper for the ResourceManager and HDFS 
checkpoints for the MapReduce ApplicationMaster. Security and multi-tenancy 
support are critical to supporting many users on larger clusters. The new 
architecture will also increase innovation and agility by allowing for 
user-defined versions of the MapReduce runtime. Support for generic resource 
requests will increase cluster utilization by removing artificial bottlenecks 
such as the hard partitioning of resources into map and reduce slots.


We have a *prototype* we'd like to commit to a branch soon, and we look 
forward to feedback. From there on, we would love to collaborate to get it 
committed to trunk.

)

 Map-Reduce 2.0
 --

 Key: MAPREDUCE-279
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker, tasktracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.23.0


 Re-factor MapReduce into a generic resource scheduler and a per-job, 
 user-defined component that manages the application execution. 





[jira] Commented: (MAPREDUCE-2314) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-15 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994991#comment-12994991
 ] 

Roman Shaposhnik commented on MAPREDUCE-2314:
-

$ ant veryclean test-patch -Dpatch.file=/home/rvs/MAPREDUCE-2314.patch 
-Dscratch.dir=/tmp/mapred/scratch -Dfindbugs.home=/home/rvs/findbugs-1.3.9 
-Dforrest.home=/home/rvs/src/apache-forrest-0.8
.

 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] -1 tests included.  The patch doesn't appear to include any new 
or modified tests.
 [exec] Please justify why no new tests are needed 
for this patch.
 [exec] Also please list what manual steps were 
performed to verify this patch.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 system test framework.  The patch passed system test 
framework compile.


And given that it is a build change, the -1 on tests included can be ignored.

 configure files that are generated as part of the released tarball need to 
 have executable bit set
 --

 Key: MAPREDUCE-2314
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2314
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Attachments: MAPREDUCE-2314.patch


 Currently the configure files that are packaged in a tarball are -rw-rw-r--





[jira] Commented: (MAPREDUCE-2314) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-15 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994993#comment-12994993
 ] 

Roman Shaposhnik commented on MAPREDUCE-2314:
-

To answer the chmod question -- I followed the already-established pattern of 
not doing chmods in parallel.

 configure files that are generated as part of the released tarball need to 
 have executable bit set
 --

 Key: MAPREDUCE-2314
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2314
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Attachments: MAPREDUCE-2314.patch


 Currently the configure files that are packaged in a tarball are -rw-rw-r--





[jira] Updated: (MAPREDUCE-2314) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-15 Thread Roman Shaposhnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Shaposhnik updated MAPREDUCE-2314:


  Component/s: build
Affects Version/s: 0.22.0

 configure files that are generated as part of the released tarball need to 
 have executable bit set
 --

 Key: MAPREDUCE-2314
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2314
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Attachments: MAPREDUCE-2314.patch


 Currently the configure files that are packaged in a tarball are -rw-rw-r--





[jira] Updated: (MAPREDUCE-2314) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-15 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-2314:
--

Hadoop Flags: [Reviewed]

+1 patch looks good.

 configure files that are generated as part of the released tarball need to 
 have executable bit set
 --

 Key: MAPREDUCE-2314
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2314
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Attachments: MAPREDUCE-2314.patch


 Currently the configure files that are packaged in a tarball are -rw-rw-r--





[jira] Commented: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run

2011-02-15 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995003#comment-12995003
 ] 

Konstantin Boudnik commented on MAPREDUCE-1783:
---

Looks like when the change was committed to 0.22, the CHANGES.txt file was 
updated improperly (added to the IMPROVEMENTS section instead of BUG FIXES), 
which causes problems now for downstream merges. Also, the description of the 
JIRA has been written to CHANGES.txt differently from what it says on the 
ticket ;(

Please fix.

 Task Initialization should be delayed till when a job can be run
 

 Key: MAPREDUCE-1783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.22.0, 0.23.0

 Attachments: 0001-Pool-aware-job-initialization.patch, 
 0001-Pool-aware-job-initialization.patch.1, MAPREDUCE-1783.patch, 
 submit-mapreduce-1783.patch


 The FairScheduler task scheduler uses PoolManager to impose limits on the 
 number of jobs that can be running at a given time. However, jobs that are 
 submitted are initialized immediately by EagerTaskInitializationListener by 
 calling JobInProgress.initTasks. This causes the job split file to be read 
 into memory. The split information is not needed until the number of running 
 jobs is less than the maximum specified. If the amount of split information 
 is large, this leads to unnecessary memory pressure on the Job Tracker.
 To ease memory pressure, FairScheduler can use another implementation of 
 JobInProgressListener that is aware of PoolManager limits and can delay task 
 initialization until the number of running jobs is below the maximum.
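
A minimal sketch of that idea, using hypothetical placeholder types rather 
than the actual FairScheduler classes (the real listener would build on 
JobInProgressListener, JobInProgress and PoolManager):

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical stand-ins for JobInProgress and the PoolManager limits.
interface Job { void initTasks(); }
interface PoolLimits { int maxRunning(); int currentlyRunning(); }

// Queues submitted jobs and initializes them only while there is headroom,
// so the split file is not read into memory before the job can actually run.
class PoolAwareInitializer {
  private final Queue<Job> pending = new ArrayDeque<Job>();
  private final PoolLimits limits;

  PoolAwareInitializer(PoolLimits limits) { this.limits = limits; }

  /** Called when a job is submitted: queue it instead of initializing eagerly. */
  synchronized void jobAdded(Job job) {
    pending.add(job);
    maybeInitialize();
  }

  /** Called when a running job finishes, freeing headroom under the limit. */
  synchronized void jobCompleted() {
    maybeInitialize();
  }

  private void maybeInitialize() {
    int headroom = limits.maxRunning() - limits.currentlyRunning();
    while (headroom-- > 0 && !pending.isEmpty()) {
      pending.poll().initTasks();   // split information is read only now
    }
  }
}
{code}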





[jira] Commented: (MAPREDUCE-1783) Task Initialization should be delayed till when a job can be run

2011-02-15 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995004#comment-12995004
 ] 

Konstantin Boudnik commented on MAPREDUCE-1783:
---

I have fixed it.

 Task Initialization should be delayed till when a job can be run
 

 Key: MAPREDUCE-1783
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1783
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/fair-share
Affects Versions: 0.20.1
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali
 Fix For: 0.22.0, 0.23.0

 Attachments: 0001-Pool-aware-job-initialization.patch, 
 0001-Pool-aware-job-initialization.patch.1, MAPREDUCE-1783.patch, 
 submit-mapreduce-1783.patch


 The FairScheduler task scheduler uses PoolManager to impose limits on the 
 number of jobs that can be running at a given time. However, jobs that are 
 submitted are initialized immediately by EagerTaskInitializationListener by 
 calling JobInProgress.initTasks. This causes the job split file to be read 
 into memory. The split information is not needed until the number of running 
 jobs is less than the maximum specified. If the amount of split information 
 is large, this leads to unnecessary memory pressure on the Job Tracker.
 To ease memory pressure, FairScheduler can use another implementation of 
 JobInProgressListener that is aware of PoolManager limits and can delay task 
 initialization until the number of running jobs is below the maximum.





[jira] Updated: (MAPREDUCE-2314) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-15 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-2314:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I have committed this to trunk and branch-0.22. Thanks Roman!

 configure files that are generated as part of the released tarball need to 
 have executable bit set
 --

 Key: MAPREDUCE-2314
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2314
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Attachments: MAPREDUCE-2314.patch


 Currently the configure files that are packaged in a tarball are -rw-rw-r--





[jira] Updated: (MAPREDUCE-2314) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-15 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-2314:
--

Fix Version/s: 0.22.0

 configure files that are generated as part of the released tarball need to 
 have executable bit set
 --

 Key: MAPREDUCE-2314
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2314
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 0.22.0

 Attachments: MAPREDUCE-2314.patch


 Currently the configure files that are packaged in a tarball are -rw-rw-r--





[jira] Commented: (MAPREDUCE-2314) configure files that are generated as part of the released tarball need to have executable bit set

2011-02-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995010#comment-12995010
 ] 

Hudson commented on MAPREDUCE-2314:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #611 (See 
[https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/611/])
MAPREDUCE-2314. configure files that are generated as part of the released 
tarball need to have executable bit set. Contributed by Roman Shaposhnik.


 configure files that are generated as part of the released tarball need to 
 have executable bit set
 --

 Key: MAPREDUCE-2314
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2314
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.22.0
Reporter: Roman Shaposhnik
Assignee: Roman Shaposhnik
 Fix For: 0.22.0

 Attachments: MAPREDUCE-2314.patch


 Currently the configure files that are packaged in a tarball are -rw-rw-r--





[jira] Created: (MAPREDUCE-2330) Forward port MapReduce server MXBeans

2011-02-15 Thread Luke Lu (JIRA)
Forward port MapReduce server MXBeans
-

 Key: MAPREDUCE-2330
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2330
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Luke Lu
 Fix For: 0.23.0


Some JMX classes, e.g. JobTrackerMXBean and TaskTrackerMXBean in 0.20.100~, 
need to be forward-ported to 0.23 in some fashion, depending on how MapReduce 
2.0 emerges.

Note: a similar item for HDFS, HDFS-1318, is already in 0.22.
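
For context, these follow the standard JMX MXBean pattern: getters on an 
interface whose name ends in MXBean become read-only JMX attributes once an 
implementing bean is registered with the platform MBeanServer. The interface 
below is an illustrative stand-in, not the actual JobTrackerMXBean or 
TaskTrackerMXBean definition.

{code:java}
// Illustrative shape only; attribute names follow JavaBean conventions,
// e.g. getRunningTasks() shows up as the "RunningTasks" attribute.
public interface TaskTrackerInfoMXBean {
  String getHostname();
  int getRunningTasks();
  long getLastHeartbeatTime();
}
{code}

The implementing class would register an instance via 
ManagementFactory.getPlatformMBeanServer().registerMBean(...) under an 
ObjectName such as "Hadoop:service=TaskTracker,name=TaskTrackerInfo" (the name 
here is assumed for illustration), making the attributes visible in jconsole 
and other JMX clients.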





[jira] Commented: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat

2011-02-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995017#comment-12995017
 ] 

Todd Lipcon commented on MAPREDUCE-2254:


Looks good except for one minor nit - can you add the Apache license to the 
new test file?

 Allow setting of end-of-record delimiter for TextInputFormat
 

 Key: MAPREDUCE-2254
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ahmed Radwan
 Attachments: MAPREDUCE-2245.patch, MAPREDUCE-2254_r2.patch


 It will be useful to allow setting the end-of-record delimiter for 
 TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as 
 the only possible record delimiters. This is a problem if users have embedded 
 newlines in their data fields (which is pretty common). This is also a 
 problem for other tools using this TextInputFormat (See for example: 
 https://issues.apache.org/jira/browse/PIG-836 and 
 https://issues.cloudera.org/browse/SQOOP-136).
 I have written a patch to address this issue. This patch allows users to 
 specify any custom end-of-record delimiter using a newly added configuration 
 property. For backward compatibility, if this new configuration property is 
 absent, then the exact same previous delimiters are used (i.e., '\n', '\r' or 
 '\r\n').
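
Usage would look roughly like the following. The property name is an 
assumption here (it is whatever the patch defines), shown only to illustrate a 
configuration-driven delimiter:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class CustomDelimiterExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed property name, for illustration only. When unset, the existing
    // '\n', '\r', '\r\n' behaviour applies, per the backward-compatibility note.
    conf.set("textinputformat.record.delimiter", "\u0001");

    Job job = new Job(conf, "custom-delimiter-demo");
    job.setInputFormatClass(TextInputFormat.class);
    // ... set mapper, reducer, input and output paths as usual ...
  }
}
{code}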





[jira] Updated: (MAPREDUCE-2254) Allow setting of end-of-record delimiter for TextInputFormat

2011-02-15 Thread Ahmed Radwan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Radwan updated MAPREDUCE-2254:


Attachment: MAPREDUCE-2254_r3.patch

Adding the Apache license header to the test file.

 Allow setting of end-of-record delimiter for TextInputFormat
 

 Key: MAPREDUCE-2254
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2254
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ahmed Radwan
 Attachments: MAPREDUCE-2245.patch, MAPREDUCE-2254_r2.patch, 
 MAPREDUCE-2254_r3.patch


 It will be useful to allow setting the end-of-record delimiter for 
 TextInputFormat. The current implementation hardcodes '\n', '\r' or '\r\n' as 
 the only possible record delimiters. This is a problem if users have embedded 
 newlines in their data fields (which is pretty common). This is also a 
 problem for other tools using this TextInputFormat (See for example: 
 https://issues.apache.org/jira/browse/PIG-836 and 
 https://issues.cloudera.org/browse/SQOOP-136).
 I have written a patch to address this issue. This patch allows users to 
 specify any custom end-of-record delimiter using a newly added configuration 
 property. For backward compatibility, if this new configuration property is 
 absent, then the exact same previous delimiters are used (i.e., '\n', '\r' or 
 '\r\n').





[jira] Commented: (MAPREDUCE-1921) IOExceptions should contain the filename of the broken input files

2011-02-15 Thread Krishna Ramachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995126#comment-12995126
 ] 

Krishna Ramachandran commented on MAPREDUCE-1921:
-

Tom,
You are right. I will update the patch to address the lapse. It has been a 
long time since I looked at this; I am glad to fix and revive it.

Regards,
Krishna

 IOExceptions should contain the filename of the broken input files
 --

 Key: MAPREDUCE-1921
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1921
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Krishna Ramachandran
Assignee: Krishna Ramachandran
 Attachments: mapred-1921-1.patch, mapred-1921-3.patch, 
 mapreduce-1921.patch


 If bzip or other decompression fails, the IOException does not contain the 
 name of the broken file that caused the exception.
 It would be nice if such situations could be avoided in the future by having 
 the names of the broken files spelled out in the exception.
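
The kind of wrapping being asked for is roughly the following; this is a 
sketch with assumed names, not the attached patch:

{code:java}
import java.io.IOException;
import java.io.InputStream;

// Sketch: re-throw decompression failures with the offending file's name attached.
class NamedStreamOpener {

  interface StreamFactory {
    InputStream create(String fileName) throws IOException;
  }

  InputStream open(String fileName, StreamFactory factory) throws IOException {
    try {
      return factory.create(fileName);
    } catch (IOException e) {
      throw new IOException("Error while reading compressed input file "
          + fileName, e);
    }
  }
}
{code}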





[jira] Created: (MAPREDUCE-2331) Add coverage of task graph servlet to fair scheduler system test

2011-02-15 Thread Todd Lipcon (JIRA)
Add coverage of task graph servlet to fair scheduler system test


 Key: MAPREDUCE-2331
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2331
 Project: Hadoop Map/Reduce
  Issue Type: Test
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.22.0


It would be useful to hit the TaskGraph servlet in the fair scheduler system 
test. This way, when run under JCarder, it will check for any lock inversions 
in this code.





[jira] Updated: (MAPREDUCE-2116) optimize getTasksToKill to reduce JobTracker contention

2011-02-15 Thread Kang Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kang Xiao updated MAPREDUCE-2116:
-

Attachment: 2116.4.patch

 optimize getTasksToKill to reduce JobTracker contention
 ---

 Key: MAPREDUCE-2116
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2116
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
 Attachments: 2116.1.patch, 2116.2.patch, 2116.3.patch, 2116.4.patch, 
 getTaskToKill.JPG


 getTasksToKill shows up as one of the top routines holding the JT lock. 
 Specifically, the translation from attemptid to tip is very expensive:
 at java.util.TreeMap.getEntry(TreeMap.java:328)
 at java.util.TreeMap.get(TreeMap.java:255)
 at 
 org.apache.hadoop.mapred.TaskInProgress.shouldClose(TaskInProgress.java:500)
 at 
 org.apache.hadoop.mapred.JobTracker.getTasksToKill(JobTracker.java:3464)
  - locked <0x2aab6ebb6640> (a org.apache.hadoop.mapred.JobTracker)
 at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3181)
 This seems like an avoidable expense since the TIP for a given attempt is 
 fixed (and one should not need a map lookup to find the association). On a 
 different note - it is not clear to me why TreeMaps are in use here (I didn't 
 find any iteration over these maps). Any background info on why things are 
 arranged the way they are would be useful.





[jira] Commented: (MAPREDUCE-2116) optimize getTasksToKill to reduce JobTracker contention

2011-02-15 Thread Kang Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995176#comment-12995176
 ] 

Kang Xiao commented on MAPREDUCE-2116:
--

I am sorry for the late response. A patch is attached for the suggestion. It 
just uses the same approach as getTasksToSave() to iterate only over the 
running tasks on the TaskTracker.
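
A hedged sketch of that idea (hypothetical types; the real code works on 
TaskInProgress inside the JobTracker): build the kill list by iterating the 
tracker's running tasks instead of doing a per-attempt TreeMap lookup while 
holding the JobTracker lock.

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical placeholders, not the actual JobTracker types.
interface RunningTask {
  String attemptId();
  boolean shouldClose();   // stands in for TaskInProgress.shouldClose(attempt)
}

class TrackerKillList {
  /**
   * Iterate only the tasks already known to be running on this TaskTracker,
   * avoiding a TreeMap lookup per attempt under the JobTracker lock.
   */
  List<String> tasksToKill(List<RunningTask> runningOnTracker) {
    List<String> toKill = new ArrayList<String>();
    for (RunningTask task : runningOnTracker) {
      if (task.shouldClose()) {
        toKill.add(task.attemptId());
      }
    }
    return toKill;
  }
}
{code}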

 optimize getTasksToKill to reduce JobTracker contention
 ---

 Key: MAPREDUCE-2116
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2116
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Joydeep Sen Sarma
Assignee: Joydeep Sen Sarma
 Attachments: 2116.1.patch, 2116.2.patch, 2116.3.patch, 2116.4.patch, 
 getTaskToKill.JPG


 getTasksToKill shows up as one of the top routines holding the JT lock. 
 Specifically, the translation from attemptid to tip is very expensive:
 at java.util.TreeMap.getEntry(TreeMap.java:328)
 at java.util.TreeMap.get(TreeMap.java:255)
 at 
 org.apache.hadoop.mapred.TaskInProgress.shouldClose(TaskInProgress.java:500)
 at 
 org.apache.hadoop.mapred.JobTracker.getTasksToKill(JobTracker.java:3464)
  - locked <0x2aab6ebb6640> (a org.apache.hadoop.mapred.JobTracker)
 at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3181)
 This seems like an avoidable expense since the TIP for a given attempt is 
 fixed (and one should not need a map lookup to find the association). On a 
 different note - it is not clear to me why TreeMaps are in use here (I didn't 
 find any iteration over these maps). Any background info on why things are 
 arranged the way they are would be useful.





[jira] Created: (MAPREDUCE-2332) Improve error messages when MR dirs on local FS have bad ownership

2011-02-15 Thread Todd Lipcon (JIRA)
Improve error messages when MR dirs on local FS have bad ownership
--

 Key: MAPREDUCE-2332
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2332
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.23.0
 Attachments: mr-2332.txt

A common source of user difficulty on a secure cluster is understanding which 
paths should be owned by which users. The task log directory in particular is 
often missed, since it has to be owned by mapred but may be inside a logs dir 
which has different ownership. Right now the user has to spelunk in the code to 
understand the exception they get if this dir has bad ownership.
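
The improvement amounts to checking ownership up front and failing with a 
message that names the path, the actual owner and the expected owner. A hedged 
sketch using the FileSystem API (the path and user name below are made-up 
examples):

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: fail fast with a message naming the path and both owners,
// instead of a bare exception the user has to trace through the code.
public class OwnershipCheck {

  static void checkOwner(FileSystem fs, Path dir, String expectedOwner)
      throws IOException {
    FileStatus status = fs.getFileStatus(dir);
    if (!expectedOwner.equals(status.getOwner())) {
      throw new IOException("Path " + dir + " is owned by " + status.getOwner()
          + " but must be owned by " + expectedOwner);
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem local = FileSystem.getLocal(new Configuration());
    // Example path and owner, assumed for illustration.
    checkOwner(local, new Path("/var/log/hadoop/userlogs"), "mapred");
  }
}
{code}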





[jira] Updated: (MAPREDUCE-2332) Improve error messages when MR dirs on local FS have bad ownership

2011-02-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-2332:
---

Attachment: mr-2332.txt

Something like this?

 Improve error messages when MR dirs on local FS have bad ownership
 --

 Key: MAPREDUCE-2332
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2332
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 0.23.0

 Attachments: mr-2332.txt


 A common source of user difficulty on a secure cluster is understanding which 
 paths should be owned by which users. The task log directory in particular is 
 often missed, since it has to be owned by mapred but may be inside a logs dir 
 which has different ownership. Right now the user has to spelunk in the code 
 to understand the exception they get if this dir has bad ownership.
