[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-10-17 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


 Description: 
Re-factor MapReduce into a generic resource scheduler and a per-job, 
user-defined component that manages the application execution.



  was:
Re-factor MapReduce into a generic resource scheduler and a per-job, 
user-defined component that manages the application execution.

Check it out by following [the instructions|http://goo.gl/rSJJC].

Release Note: 
MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have what 
we call MapReduce 2.0 (MRv2).

The fundamental idea of MRv2 is to split the two major functions of the 
JobTracker, resource management and job scheduling/monitoring, into separate 
daemons: a global ResourceManager (RM) and a per-application ApplicationMaster 
(AM). An application is either a single job in the classical sense of 
Map-Reduce jobs or a DAG of jobs. The ResourceManager and the per-node slave, 
the NodeManager (NM), form the data-computation framework. The ResourceManager 
is the ultimate authority that arbitrates resources among all the applications 
in the system. The per-application ApplicationMaster is, in effect, a 
framework-specific library tasked with negotiating resources from the 
ResourceManager and working with the NodeManager(s) to execute and monitor the 
tasks.
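
The division of labour described above can be sketched in a few lines of Python. This is only an illustrative toy, not the MRv2 code: the class and method names (`allocate`, `launch`, `run_tasks`), the memory-only resource model and the first-fit placement are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NodeManager:
    """Hypothetical per-node agent that launches containers."""
    node_id: str
    free_memory_mb: int
    running: List[str] = field(default_factory=list)

    def launch(self, container_id: str, memory_mb: int) -> None:
        # Account for the memory handed to the new container.
        self.free_memory_mb -= memory_mb
        self.running.append(container_id)


class ResourceManager:
    """Hypothetical global arbiter; first-fit placement is an assumption."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes
        self._next_id = 0

    def allocate(self, memory_mb: int) -> Optional[Tuple[str, NodeManager]]:
        # Return (container id, node) for the first node with room, else None.
        for node in self.nodes:
            if node.free_memory_mb >= memory_mb:
                self._next_id += 1
                return f"container_{self._next_id}", node
        return None


class ApplicationMaster:
    """Hypothetical per-application component that negotiates containers."""

    def __init__(self, rm: ResourceManager) -> None:
        self.rm = rm

    def run_tasks(self, task_memory_mb: List[int]) -> Dict[str, str]:
        placements: Dict[str, str] = {}
        for memory_mb in task_memory_mb:
            grant = self.rm.allocate(memory_mb)
            if grant is None:
                continue  # a real AM would wait and retry; skipped in this toy
            container_id, node = grant
            node.launch(container_id, memory_mb)
            placements[container_id] = node.node_id
        return placements
```

The point of the sketch is that the ResourceManager only hands out containers, while the ApplicationMaster decides what to run in them and asks the NodeManager to launch it.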



The ResourceManager has two main components:
* Scheduler (S)
* ApplicationsManager (ASM)


The Scheduler is responsible for allocating resources to the various running 
applications subject to familiar constraints of capacities, queues etc. The 
Scheduler is a pure scheduler in the sense that it performs no monitoring or 
tracking of status for the application. Also, it offers no guarantees on 
restarting tasks that fail due to either application failure or hardware 
failure. The Scheduler performs its scheduling function based on the resource 
requirements of the applications; it does so using the abstract notion of a 
Resource Container, which incorporates elements such as memory, cpu, disk, 
network etc. 
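
As a rough illustration of the Resource Container notion, the following Python sketch models a container request as a bundle of resources and grants requests while they fit on a node. The `Resource` class, its two fields and their units, and the in-order `allocate` function are hypothetical names chosen for this example; the real abstraction may cover more dimensions (disk, network) and a different allocation order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource bundle; fields and units are assumptions."""
    memory_mb: int
    cpu_cores: int

    def fits_within(self, other: "Resource") -> bool:
        # A request fits only if every dimension fits.
        return (self.memory_mb <= other.memory_mb
                and self.cpu_cores <= other.cpu_cores)

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb,
                        self.cpu_cores - other.cpu_cores)


def allocate(free: Resource,
             requests: List[Resource]) -> Tuple[List[Resource], Resource]:
    """Grant requests in order while they fit; return (granted, remaining)."""
    granted: List[Resource] = []
    for request in requests:
        if request.fits_within(free):
            granted.append(request)
            free = free.minus(request)
    return granted, free
```

Because a request is checked on every dimension, a node with spare memory but no spare cores will not be handed a container that needs a core.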

The Scheduler has a pluggable policy plug-in, which is responsible for 
partitioning the cluster resources among the various queues, applications etc. 
The current Map-Reduce schedulers such as the CapacityScheduler and the 
FairScheduler would be some examples of the plug-in.
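
One way to picture the policy plug-in is as an interface with interchangeable implementations. The Python sketch below is an assumption-laden toy: `SchedulerPolicy`, `partition`, and the two policies are hypothetical names, and the policies are deliberately simplified stand-ins rather than the behaviour of the actual CapacityScheduler or FairScheduler.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SchedulerPolicy(ABC):
    """Hypothetical plug-in interface: split cluster memory among queues."""

    @abstractmethod
    def partition(self, cluster_memory_mb: int,
                  queue_weights: Dict[str, int]) -> Dict[str, int]:
        ...


class WeightedPolicy(SchedulerPolicy):
    """Simplified capacity-style policy: shares proportional to weights."""

    def partition(self, cluster_memory_mb, queue_weights):
        total = sum(queue_weights.values())
        return {name: cluster_memory_mb * weight // total
                for name, weight in queue_weights.items()}


class EqualSharePolicy(SchedulerPolicy):
    """Simplified fair-style policy: every queue gets the same share."""

    def partition(self, cluster_memory_mb, queue_weights):
        share = cluster_memory_mb // len(queue_weights)
        return {name: share for name in queue_weights}
```

Swapping the policy object changes how the cluster is divided without touching the code that asks for the partition.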

The ApplicationsManager is responsible for accepting job submissions, 
negotiating the first container for executing the application-specific 
ApplicationMaster, and providing the service for restarting the 
ApplicationMaster container on failure.
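
The restart service can be pictured as a bounded retry loop around the ApplicationMaster container. The Python sketch below is a hypothetical illustration: the function name, the `launch` callback and the `max_attempts` limit are assumptions, not settings or APIs taken from MRv2.

```python
from typing import Callable


def run_application_master(launch: Callable[[int], bool],
                           max_attempts: int = 2) -> int:
    """Launch the AM container, relaunching it when an attempt fails.

    `launch(attempt)` is a hypothetical callback that returns True when the
    ApplicationMaster finished successfully. Returns the number of the
    attempt that succeeded, or raises RuntimeError if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        if launch(attempt):
            return attempt
    raise RuntimeError(
        f"ApplicationMaster failed after {max_attempts} attempts")
```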

The NodeManager is the per-machine framework agent that is responsible for 
launching the applications' containers, monitoring their resource usage (cpu, 
memory, disk, network) and reporting the same to the Scheduler.
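
A minimal sketch of the monitoring-and-reporting role, assuming the NodeManager samples per-container usage and sends a summary upstream. The `ContainerUsage` fields and the report layout are hypothetical; the actual report format is not described here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContainerUsage:
    """Hypothetical usage sample for one container."""
    container_id: str
    memory_mb: int
    cpu_percent: int


def build_node_report(node_id: str,
                      usages: List[ContainerUsage]) -> Dict[str, object]:
    """Summarise per-container usage into one report for the Scheduler."""
    return {
        "node": node_id,
        "containers": {u.container_id: (u.memory_mb, u.cpu_percent)
                       for u in usages},
        "total_memory_mb": sum(u.memory_mb for u in usages),
    }
```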

The per-application ApplicationMaster has the responsibility of negotiating 
appropriate resource containers from the Scheduler, tracking their status and 
monitoring for progress.


Editorial pass over hadoop-0.23 content.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279-script-20110817.sh, MR-279-script-final.sh, 
> MR-279-script.sh, MR-279-script.sh, MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move-20110817.txt, MR-279_MR_files_to_move.txt, 
> MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, 
> NodeManager.gv, NodeManager.png, ResourceManager.gv, ResourceManager.png, 
> capacity-scheduler-dark-theme.png, hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, post-move-patch-20110817.2.txt, 
> post-move-patch-final.txt, post-move.patch, post-move.patch, post-move.patch, 
> yarn-state-machine.job.dot, yarn-state-machine.job.png, 
> yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
> yarn-state-machine.task.dot, yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-10-18 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


Release Note: 
MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have what 
we call MapReduce 2.0 (MRv2).

The fundamental idea of MRv2 is to split the two major functions of the 
JobTracker, resource management and job scheduling/monitoring, into separate 
daemons: a global ResourceManager (RM) and a per-application ApplicationMaster 
(AM). An application is either a single job in the classical sense of 
Map-Reduce jobs or a DAG of jobs. The ResourceManager and the per-node slave, 
the NodeManager (NM), form the data-computation framework. The ResourceManager 
is the ultimate authority that arbitrates resources among all the applications 
in the system. The per-application ApplicationMaster is, in effect, a 
framework-specific library tasked with negotiating resources from the 
ResourceManager and working with the NodeManager(s) to execute and monitor the 
tasks.

The ResourceManager has two main components:
* Scheduler (S)
* ApplicationsManager (ASM)

The Scheduler is responsible for allocating resources to the various running 
applications subject to familiar constraints of capacities, queues etc. The 
Scheduler is a pure scheduler in the sense that it performs no monitoring or 
tracking of status for the application. Also, it offers no guarantees on 
restarting tasks that fail due to either application failure or hardware 
failure. The Scheduler performs its scheduling function based on the resource 
requirements of the applications; it does so using the abstract notion of a 
Resource Container, which incorporates elements such as memory, cpu, disk, 
network etc. 

The Scheduler has a pluggable policy plug-in, which is responsible for 
partitioning the cluster resources among the various queues, applications etc. 
The current Map-Reduce schedulers such as the CapacityScheduler and the 
FairScheduler would be some examples of the plug-in.

The CapacityScheduler supports hierarchical queues to allow for more 
predictable sharing of cluster resources.

The ApplicationsManager is responsible for accepting job submissions, 
negotiating the first container for executing the application-specific 
ApplicationMaster, and providing the service for restarting the 
ApplicationMaster container on failure.
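
To make the hierarchical queues mentioned above for the CapacityScheduler a little more concrete, here is a small Python sketch that flattens nested per-queue percentages into absolute shares of the cluster. The tree layout, the dotted `root.` path naming and the function name are assumptions made for illustration, not the scheduler's configuration format.

```python
from typing import Dict


def absolute_capacities(tree: dict,
                        parent_share: float = 100.0,
                        prefix: str = "root") -> Dict[str, float]:
    """Flatten nested percentages into absolute cluster shares.

    `tree` is a hypothetical mapping of queue name to a pair
    (percent of the parent's capacity, child queues).
    """
    result: Dict[str, float] = {}
    for name, (percent, children) in tree.items():
        path = f"{prefix}.{name}"
        share = parent_share * percent / 100.0
        if children:
            # Child percentages are relative to this queue's share.
            result.update(absolute_capacities(children, share, path))
        else:
            result[path] = share
    return result
```

With a `prod` queue at 60 percent split evenly between two children, each child ends up with 30 percent of the cluster, which is what makes the sharing predictable for the teams behind each sub-queue.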

The NodeManager is the per-machine framework agent that is responsible for 
launching the applications' containers, monitoring their resource usage (cpu, 
memory, disk, network) and reporting the same to the Scheduler.

The per-application ApplicationMaster has the responsibility of negotiating 
appropriate resource containers from the Scheduler, tracking their status and 
monitoring for progress.



[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-14 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


  Component/s: tasktracker
   jobtracker
Fix Version/s: 0.23.0
 Assignee: Arun C Murthy

h5. Proposal 

The fundamental idea of the re-factor is to divide the two major functions of 
the JobTracker, resource management and job scheduling/monitoring, into 
separate components: a generic resource scheduler and a per-job, user-defined 
component that manages the application execution. 

The new ResourceManager manages the global assignment of compute resources to 
applications and the per-application ApplicationMaster manages the 
application's scheduling and coordination. An application is either a single 
job in the classic MapReduce sense or a DAG of such jobs. The ResourceManager 
and per-machine NodeManager server, which manages the user processes on that 
machine, form the computation fabric. The per-application ApplicationMaster is, 
in effect, a framework-specific library and is tasked with negotiating 
resources from the ResourceManager and working with the NodeManager(s) to 
execute and monitor the tasks.

The ResourceManager is a pure scheduler in the sense that it performs no 
monitoring or tracking of status for the application. Also, it offers no 
guarantees on restarting failed tasks either due to application failure or 
hardware failures.

The ResourceManager performs its scheduling function based on the resource 
requirements of the applications; each application has multiple resource 
request types that represent the resources required for containers. The 
resource requests include memory, CPU, disk, network etc. Note that this is a 
significant change from the current model of fixed-type slots in Hadoop 
MapReduce, which has a significant negative impact on cluster utilization. 
The ResourceManager has a scheduler policy plug-in, which is responsible for 
partitioning the cluster resources among various queues, applications etc. 
Scheduler plug-ins can be based, e.g., on the current CapacityScheduler and 
FairScheduler.

The NodeManager is the per-machine framework agent that is responsible for 
launching the applications' containers, monitoring their resource usage (cpu, 
memory, disk, network) and reporting the same to the Scheduler.

The per-application ApplicationMaster has the responsibility of negotiating 
appropriate resource containers from the Scheduler, launching tasks, tracking 
their status and monitoring for progress, handling task-failures and recovering 
from saved state on a ResourceManager fail-over.

Since downtime is more expensive at scale, high-availability is built-in from 
the beginning via Apache ZooKeeper for the ResourceManager and HDFS checkpoint 
for the MapReduce ApplicationMaster. Security and multi-tenancy support are 
critical to support many users on the larger clusters. The new architecture 
will also increase innovation and agility by allowing for user-defined versions 
of MapReduce runtime. Support for generic resource requests will increase 
cluster utilization by removing artificial bottlenecks such as 
hard-partitioning of resources into map and reduce slots.
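
The utilization argument can be checked with a little arithmetic. The Python sketch below compares a node whose slots are hard-partitioned by type with one whose capacity is generic. The numbers in the usage note are made up, and the model assumes every task needs exactly one uniform unit of capacity, which is a simplification.

```python
def slot_utilization(map_slots: int, reduce_slots: int,
                     pending_maps: int, pending_reduces: int) -> float:
    """Fraction of slots in use when slots are hard-partitioned by type."""
    used = min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)
    return used / (map_slots + reduce_slots)


def generic_utilization(total_units: int, pending_tasks: int) -> float:
    """Fraction of capacity in use when any task may use any unit."""
    return min(total_units, pending_tasks) / total_units
```

For a hypothetical node with 6 map slots and 4 reduce slots facing 20 pending maps and no reduces, `slot_utilization(6, 4, 20, 0)` gives 0.6 because the reduce slots sit idle, while `generic_utilization(10, 20)` gives 1.0.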



We have a *prototype* we'd like to commit to a branch soon, where we look 
forward to feedback. From there on, we would love to collaborate to get it 
committed to trunk.



> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
>
> We, at Yahoo!, have been using Hadoop-On-Demand as the resource 
> provisioning/scheduling mechanism. 
> With HoD the user uses a self-service system to ask-for a set of nodes. HoD 
> allocates these from a global pool and also provisions a private Map-Reduce 
> cluster for the user. She then runs her jobs and shuts the cluster down via 
> HoD when done. All user-private clusters use the same humongous, static HDFS 
> (e.g. 2k node HDFS). 
> More details about HoD are available here: HADOOP-1301.
> 
> h3. Motivation
> The current deployment (Hadoop + HoD) has a couple of implications:
>  * _Non-optimal Cluster Utilization_
>1. Job-private Map-Reduce clusters imply that the user-cluster potentially 
> could be *idle* for atleast a while before being detected and shut-down.
>2. Elastic Jobs: Map-Reduce jobs, typically, have lots of maps with 
> much-smaller no. of reduces; with maps being light and quick and reduces 
> being i/o heavy and longer-running. Users typically allocate clusters 
> depending on the no. of maps (i.e. input size) which leads to the scenario 
> where all the maps are done (idle nod

[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-15 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


Description: Re-factor MapReduce into a generic resource scheduler and a 
per-job, user-defined component that manages the application execution.   (was: 
We, at Yahoo!, have been using Hadoop-On-Demand as the resource 
provisioning/scheduling mechanism. 

With HoD the user uses a self-service system to ask-for a set of nodes. HoD 
allocates these from a global pool and also provisions a private Map-Reduce 
cluster for the user. She then runs her jobs and shuts the cluster down via HoD 
when done. All user-private clusters use the same humongous, static HDFS (e.g. 
2k node HDFS). 

More details about HoD are available here: HADOOP-1301.



h3. Motivation

The current deployment (Hadoop + HoD) has a couple of implications:

 * _Non-optimal Cluster Utilization_

   1. Job-private Map-Reduce clusters imply that the user-cluster potentially 
could be *idle* for at least a while before being detected and shut-down.

   2. Elastic Jobs: Map-Reduce jobs, typically, have lots of maps with 
much-smaller no. of reduces; with maps being light and quick and reduces being 
i/o heavy and longer-running. Users typically allocate clusters depending on 
the no. of maps (i.e. input size) which leads to the scenario where all the 
maps are done (idle nodes in the cluster) and the few reduces are chugging 
along. Right now, we do not have the ability to shrink the HoD'ed Map-Reduce 
clusters which would alleviate this issue. 

 * _Impact on data-locality_

With the current setup of a static, large HDFS and much smaller (5/10/20/50 
node) clusters there is a good chance of losing one of Map-Reduce's primary 
features: the ability to execute tasks on the datanodes where the input splits 
are located. In fact, we have seen the data-local tasks go down to 20-25 percent 
in the GridMix benchmarks, from the 95-98 percent we see on the 
randomwriter+sort runs that are part of the hadoopqa benchmarks (admittedly a 
synthetic benchmark, but still). That said, HADOOP-1985 (rack-aware Map-Reduce) 
helps significantly here.



Primarily, the notion of *job-level scheduling* leading to private clusters, as 
opposed to *task-level scheduling*, is a good peg on which to hang the majority 
of the blame.

Keeping the above factors in mind, here are some thoughts on how to 
re-structure Hadoop Map-Reduce to solve some of these issues.



h3. State of the Art

As it exists today, a large, static, Hadoop Map-Reduce cluster (forget HoD for 
a bit) does provide task-level scheduling; however, its scalability to 
tens-of-thousands of user-jobs, per-week, is in question.

Let's review its current architecture and main components:

 * JobTracker: It does both *task-scheduling* and *task-monitoring* 
(tasktrackers send task-statuses via periodic heartbeats), which implies it is 
fairly loaded. It is also a _single-point of failure_ in the Map-Reduce 
framework i.e. its failure implies that all the jobs in the system fail. This 
means a static, large Map-Reduce cluster is fairly susceptible and a definite 
suspect. Clearly HoD solves this by having per-job clusters, albeit with the 
above drawbacks.
 * TaskTracker: The slave in the system which executes one task at-a-time under 
directions from the JobTracker.
 * JobClient: The per-job client which just submits the job and polls the 
JobTracker for status. 



h3. Proposal - Map-Reduce 2.0 

The primary idea is to move to task-level scheduling and static Map-Reduce 
clusters (so as to maintain the same storage cluster and compute cluster 
paradigm) as a way to directly tackle the two main issues illustrated above. 
Clearly, we will have to get around the existing problems, especially w.r.t. 
scalability and reliability.

The proposal is to re-work Hadoop Map-Reduce to make it suitable for a large, 
static cluster. 

Here is an overview of what its main components would look like:
 * JobTracker: Turn the JobTracker into a pure task-scheduler, a global one. 
Let's call this the *JobScheduler* henceforth. Clearly (data-locality aware) 
Maui/Moab are candidates for being the scheduler, in which case, the 
JobScheduler is just a thin wrapper around them. 
 * TaskTracker: These stay as before, with some minor changes as illustrated 
later in the piece.
 * JobClient: Fatten up the JobClient by putting a lot more intelligence into 
it. Enhance it to talk to the JobTracker to ask for available TaskTrackers and 
then contact them to schedule and monitor the tasks. So we'll have lots of 
per-job clients talking to the JobScheduler and the relevant TaskTrackers for 
their respective jobs, a big change from today. Let's call this the *JobManager* 
henceforth. 

A broad sketch of how things would work: 

h4. Deployment

There is a single, static, large Map-Reduce cluster, and no per-job c

[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-15 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-279:


Comment: was deleted

(was: We're having a baby!

Todd Papaioannou (p9u) is acting head of Hadoop.
Most line issues can continue to go to Amol, Kazi, Satish, Avik or Senthil as 
appropriate.

I'll be back on roughly March 9th.

CUSoon,
E14
)

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 





[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-15 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-279:


Comment: was deleted

(was: h5. Proposal 

The fundamental idea of the re-factor is to divide the two major functions of 
the JobTracker, resource management and job scheduling/monitoring, into 
separate components: a generic resource scheduler and a per-job, user-defined 
component that manages the application execution. 

The new ResourceManager manages the global assignment of compute resources to 
applications and the per-application ApplicationMaster manages the 
application's scheduling and coordination. An application is either a single 
job in the classic MapReduce sense or a DAG of such jobs. The ResourceManager 
and per-machine NodeManager server, which manages the user processes on that 
machine, form the computation fabric. The per-application ApplicationMaster is, 
in effect, a framework-specific library and is tasked with negotiating 
resources from the ResourceManager and working with the NodeManager(s) to 
execute and monitor the tasks.

The ResourceManager is a pure scheduler in the sense that it performs no 
monitoring or tracking of status for the application. Also, it offers no 
guarantees on restarting failed tasks either due to application failure or 
hardware failures.

The ResourceManager performs its scheduling function based on the resource 
requirements of the applications; each application has multiple resource 
request types that represent the resources required for containers. The 
resource requests include memory, CPU, disk, network etc. Note that this is a 
significant change from the current model of fixed-type slots in Hadoop 
MapReduce, which has a significant negative impact on cluster utilization. 
The ResourceManager has a scheduler policy plug-in, which is responsible for 
partitioning the cluster resources among various queues, applications etc. 
Scheduler plug-ins can be based, e.g., on the current CapacityScheduler and 
FairScheduler.

The NodeManager is the per-machine framework agent that is responsible for 
launching the applications' containers, monitoring their resource usage (cpu, 
memory, disk, network) and reporting the same to the Scheduler.

The per-application ApplicationMaster has the responsibility of negotiating 
appropriate resource containers from the Scheduler, launching tasks, tracking 
their status and monitoring for progress, handling task-failures and recovering 
from saved state on a ResourceManager fail-over.

Since downtime is more expensive at scale, high-availability is built-in from 
the beginning via Apache ZooKeeper for the ResourceManager and HDFS checkpoint 
for the MapReduce ApplicationMaster. Security and multi-tenancy support are 
critical to support many users on the larger clusters. The new architecture 
will also increase innovation and agility by allowing for user-defined versions 
of MapReduce runtime. Support for generic resource requests will increase 
cluster utilization by removing artificial bottlenecks such as 
hard-partitioning of resources into map and reduce slots.



We have a *prototype* we'd like to commit to a branch soon, where we look 
forward to feedback. From there on, we would love to collaborate to get it 
committed to trunk.

)

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 





[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-279:


Comment: was deleted

(was: I'm way out of the office; I'm helping with the newest addition to our 
family, Jack Baldeschwieler Yoshikawa.

Todd Papaioannou (p9u) is acting head of Hadoop.
Most line issues can continue to go to Amol, Kazi, Satish, Avik or Senthil as 
appropriate.

I'll be back on roughly March 9th.

CUSoon,
E14
)

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 





[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-02-25 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-279:


Comment: was deleted

(was: Am out of office and will return on March 2 2011.
)

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 





[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-03-16 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


Attachment: MR-279_MR_files_to_move.txt
MR-279.sh
MR-279.patch

Folks, we are happy to put out a first cut of MRv2.

A brief overview:

A global ResourceManager (RM) tracks machine availability and scheduling 
invariants while a per-application ApplicationMaster (AM) runs inside the 
cluster and tracks the program semantics for a given job. An application can 
be a single MapReduce job, as the JobTracker supports today, a directed 
acyclic graph (DAG) of MapReduce jobs, or a new framework. Each machine in 
the cluster runs a per-node daemon, the NodeManager (NM), responsible for 
enforcing and reporting the resource allocations made by the RM and monitoring 
the lifecycle of processes spawned on behalf of an application. Each process 
started by the NM is conceptually a container, or a bundle of resources 
allocated by the RM.

We call the new framework (RM/NM) YARN (Yet Another Resource Negotiator)... 
;-)

Source layout:

# A new yarn source folder contains the RM and NM.
# A new mr-client folder contains all of the MapReduce runtime. This includes 
the MapReduce ApplicationMaster and all of the classes for running MapReduce 
applications. Please note that the MR runtime has not changed at all, including 
the user APIs - we continue to support both the old 'mapred' API and the new 
'mapreduce' API (context-objects). We are moving some classes from 
src/java/mapred/* to mr-client to achieve this.
# We have continued to keep the old JobTracker/TaskTracker based MapReduce 
framework in src/java.

Build:
# We decided to embrace Maven for MRv2, hence yarn and mr-client are built via 
Maven.
# For now the old JT/TT based MR framework continues to use ant/ivy. Hopefully 
we can change this soon - I know Giri is working on this for common, hdfs and 
mapreduce in one go.

There is an INSTALL file that describes how to build and deploy MRv2, and also 
how to run MR applications.




I'm planning on committing this patch to a development branch (named 
MAPREDUCE-279) soon so that we can continue all our work via Apache in the 
open. We *really* look forward to feedback and working with the community 
henceforth. We have many many miles to go and promises to keep! ;-)

PS: I have attached a script (MR-279.sh) to show the files being moved to 
mr-client for the MR runtime, a list of the files being moved, and the actual 
patch to apply afterwards. Also, please note that the patch is significantly 
bigger than it should be since it includes binary images (via git diff --text).

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.sh, MR-279_MR_files_to_move.txt
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 



[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-03-17 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-279:


Comment: was deleted

(was: Hi Folks,

I'm back part-time, but I'm mainly focused on catching up, annual focal reviews 
and adjusting to life with a newborn at home.

Todd Papaioannou (p9u) remains acting head of Hadoop this week.

Most line issues can continue to go to Amol, Kazi, Satish, Avik or Senthil as 
appropriate.

I am about, drop me a line on my personal email or call my cell if you need 
rapid response, but I am reading mail now.

CUSoon,
E14
)

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.sh, MR-279_MR_files_to_move.txt
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 



[jira] Updated: (MAPREDUCE-279) Map-Reduce 2.0

2011-03-17 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


Attachment: MR-279.patch

Updated patch, adding missing license headers for some files.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move.txt
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 



[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-03-21 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-279:
--

Comment: was deleted

(was: I'm traveling and will return to the office on Monday, March 28th.

For urgent matters, please contact Aparna Ramani.

Thanks!

-- Philip
)

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move.txt
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 



[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-03-21 Thread Luke Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lu updated MAPREDUCE-279:
--

Attachment: multi-column-stable-sort-default-theme.png
capacity-scheduler-dark-theme.png

Arun suggested that I attach some screenshots of the new mapreduce web UI here.

* multi-column-stable-sort: demonstrates the resource manager apps UI (in the 
default theme) with a multi-column sort by user name (ascending) and progress 
(descending).
* capacity-scheduler: demonstrates the capacity scheduler UI (in a dark theme) 
with a sub-queue selected.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move.txt, capacity-scheduler-dark-theme.png, 
> multi-column-stable-sort-default-theme.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 



[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-04-15 Thread Greg Roelofs (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Roelofs updated MAPREDUCE-279:
---

Attachment: yarn-state-machine.task.png
yarn-state-machine.task-attempt.png
yarn-state-machine.job.png
yarn-state-machine.task.dot
yarn-state-machine.task-attempt.dot
yarn-state-machine.job.dot

dot(1) files for the Job, Task, and TaskAttempt state machines in MRv2, at 
least as of late March.  I found the graphs very useful while learning and 
modifying the MRv2 code for MAPREDUCE-2405.

These can be converted to PostScript or PNG or whatnot with dot, which is part 
of the Graphviz distribution (graphviz.org, I think).  Here's a sample command 
for PostScript:

{{dot -Tps yarn-state-machine.task-attempt.dot > yarn-state-machine.task-attempt.ps}}

Ultimately a version of these should be produced natively in some StateMachine 
method ({{toDot()}}?), and I think Chris Douglas may take that up eventually.  
However, some of the desirable info (e.g., which states send events to or 
receive them from other state machines) can't really be discovered 
automatically, so there will continue to be a place for hand-rolled graphs.
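
For converting several of the attached graphs at once, a small wrapper around dot may help. This is only a sketch: {{render_graphs}} and the {{DRY_RUN}} variable are hypothetical names introduced here, and it assumes Graphviz's dot is on the PATH and that the input files end in .dot.

```shell
# Sketch: render Graphviz sources to another format with dot.
# render_graphs and DRY_RUN are hypothetical names, not part of any attachment.
render_graphs() {
  fmt="$1"   # output format understood by dot, e.g. png or ps
  shift
  for src in "$@"; do
    out="${src%.dot}.${fmt}"   # strip the .dot suffix, append the new one
    if [ -n "${DRY_RUN:-}" ]; then
      # Only print the command that would be run.
      echo "dot -T${fmt} ${src} -o ${out}"
    else
      dot "-T${fmt}" "${src}" -o "${out}"
    fi
  done
}
```

Calling {{render_graphs png yarn-state-machine.job.dot yarn-state-machine.task.dot}} would write one PNG next to each source file; setting DRY_RUN=1 first prints the dot commands without running them.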

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker, tasktracker
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move.txt, capacity-scheduler-dark-theme.png, 
> multi-column-stable-sort-default-theme.png, yarn-state-machine.job.dot, 
> yarn-state-machine.job.png, yarn-state-machine.task-attempt.dot, 
> yarn-state-machine.task-attempt.png, yarn-state-machine.task.dot, 
> yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution. 



[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-06-16 Thread Luke Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lu updated MAPREDUCE-279:
--

Component/s: (was: jobtracker)
 (was: tasktracker)
 mrv2
Description: 
Re-factor MapReduce into a generic resource scheduler and a per-job, 
user-defined component that manages the application execution.

Check it out by following [the instructions|http://goo.gl/rSJJC].

  was:Re-factor MapReduce into a generic resource scheduler and a per-job, 
user-defined component that manages the application execution. 

   Tags: mr2,mapreduce-2.0

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move.txt, capacity-scheduler-dark-theme.png, 
> multi-column-stable-sort-default-theme.png, yarn-state-machine.job.dot, 
> yarn-state-machine.job.png, yarn-state-machine.task-attempt.dot, 
> yarn-state-machine.task-attempt.png, yarn-state-machine.task.dot, 
> yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-07-05 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-279:
-

Attachment: hadoop_contributors_meet_07_01_2011.pdf

Slides from the Hadoop Contributors meet held on 07/01, with some design 
details on the RM and AM as well as the high-level APIs for writing new AMs.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move.txt, capacity-scheduler-dark-theme.png, 
> hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, yarn-state-machine.job.dot, 
> yarn-state-machine.job.png, yarn-state-machine.task-attempt.dot, 
> yarn-state-machine.task-attempt.png, yarn-state-machine.task.dot, 
> yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].






[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-07-11 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


Attachment: MapReduce_NextGen_Architecture.pdf

MRv2 architecture document.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, 
> capacity-scheduler-dark-theme.png, hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, yarn-state-machine.job.dot, 
> yarn-state-machine.job.png, yarn-state-machine.task-attempt.dot, 
> yarn-state-machine.task-attempt.png, yarn-state-machine.task.dot, 
> yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-08-16 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-279:


Attachment: post-move.patch
MR-279_MR_files_to_move.txt
MR-279-script.sh

Thanks to Vinod, attached are a script (MR-279-script.sh) and an input file 
(MR-279_MR_files_to_move.txt).

The script needs to be changed to point to the MR-279 branch you have checked 
out and to trunk. The script will move the MapReduce runtime files around in 
trunk and copy the new framework from the MR-279 branch to trunk.

After running the script you will have to apply the patch (post-move.patch) on 
trunk. These small changes are needed for the new framework to work with trunk.

You will have to run mvn install -DskipTests before you run any ant targets.

Also, the script just does a local mv/cp. When we actually merge the changes, 
it will be svn mv/copy.
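
The order of the steps above can be sketched as a small shell function that prints each command rather than running it. Everything beyond the file names and the mvn command is an assumption: {{print_merge_steps}} and its two path arguments are hypothetical, the script is assumed to be run with sh after being edited by hand, the -p0 strip level for patch is a guess that should be checked against the patch headers, and {{ant jar}} stands in for whichever ant target you need.

```shell
# Sketch of the order of operations; prints the steps instead of running them.
# print_merge_steps and both path arguments are hypothetical.
print_merge_steps() {
  trunk_dir="$1"     # your trunk checkout
  patch_file="$2"    # path to the downloaded post-move.patch
  echo "sh MR-279-script.sh"          # after editing it to point at both checkouts
  echo "cd ${trunk_dir}"
  echo "patch -p0 < ${patch_file}"    # strip level is an assumption
  echo "mvn install -DskipTests"      # must come before any ant target
  echo "ant jar"                      # example ant target only
}
```

Printing the steps keeps the sketch safe to run anywhere; review the output before executing any of it by hand.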



> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279-script.sh, MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move.txt, MR-279_MR_files_to_move.txt, 
> MapReduce_NextGen_Architecture.pdf, capacity-scheduler-dark-theme.png, 
> hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, post-move.patch, 
> yarn-state-machine.job.dot, yarn-state-machine.job.png, 
> yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
> yarn-state-machine.task.dot, yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-08-16 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


Attachment: MR-279-script.sh
post-move.patch

Updated the script to change the layout as suggested by Alejandro in 
MAPREDUCE-2842, and updated post-move.patch to apply to the new layout. This is 
WIP; I still need to change the artifact names and dependencies as suggested by 
Alejandro.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279-script.sh, MR-279-script.sh, MR-279.patch, 
> MR-279.patch, MR-279.sh, MR-279_MR_files_to_move.txt, 
> MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, 
> capacity-scheduler-dark-theme.png, hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, post-move.patch, post-move.patch, 
> yarn-state-machine.job.dot, yarn-state-machine.job.png, 
> yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
> yarn-state-machine.task.dot, yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-08-16 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


Attachment: post-move.patch

nearly there with the artifact/deps changes... not done yet.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279-script.sh, MR-279-script.sh, MR-279.patch, 
> MR-279.patch, MR-279.sh, MR-279_MR_files_to_move.txt, 
> MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, 
> capacity-scheduler-dark-theme.png, hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, post-move.patch, post-move.patch, 
> post-move.patch, yarn-state-machine.job.dot, yarn-state-machine.job.png, 
> yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
> yarn-state-machine.task.dot, yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-08-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-279:
--

Attachment: post-move-patch-20110817.2.txt
MR-279_MR_files_to_move-20110817.txt
MR-279-script-20110817.sh

Updated the script, the to-be-moved files list and the post-move patch to 
reflect the directory structure suggested at MAPREDUCE-2842 (all modules with 
the hadoop- prefix).

This is close now: mvn install, ant jar jar-test binary etc. pass with this. 
Making sure 'ant test' passes is the pending item.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279-script-20110817.sh, MR-279-script.sh, 
> MR-279-script.sh, MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move-20110817.txt, MR-279_MR_files_to_move.txt, 
> MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, 
> capacity-scheduler-dark-theme.png, hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, post-move-patch-20110817.2.txt, 
> post-move.patch, post-move.patch, post-move.patch, 
> yarn-state-machine.job.dot, yarn-state-machine.job.png, 
> yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
> yarn-state-machine.task.dot, yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-08-17 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-279:


Attachment: post-move-patch-final.txt

An updated patch on top of Vinod's latest scripts. This fixes ant test.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279-script-20110817.sh, MR-279-script.sh, 
> MR-279-script.sh, MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move-20110817.txt, MR-279_MR_files_to_move.txt, 
> MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, 
> capacity-scheduler-dark-theme.png, hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, post-move-patch-20110817.2.txt, 
> post-move-patch-final.txt, post-move.patch, post-move.patch, post-move.patch, 
> yarn-state-machine.job.dot, yarn-state-machine.job.png, 
> yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
> yarn-state-machine.task.dot, yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-08-17 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-279:


Attachment: MR-279-script-final.sh

Per the vote in mapreduce-dev@ I've merged MR-279 to a preview branch 
(MR-279-merge) and am doing the final set of tests.

I'm attaching the shell script I used for the merge.

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279-script-20110817.sh, MR-279-script-final.sh, 
> MR-279-script.sh, MR-279-script.sh, MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move-20110817.txt, MR-279_MR_files_to_move.txt, 
> MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, 
> capacity-scheduler-dark-theme.png, hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, post-move-patch-20110817.2.txt, 
> post-move-patch-final.txt, post-move.patch, post-move.patch, post-move.patch, 
> yarn-state-machine.job.dot, yarn-state-machine.job.png, 
> yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
> yarn-state-machine.task.dot, yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-09-02 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated MAPREDUCE-279:


Attachment: ResourceManager.gv
ResourceManager.png

State graph for ResourceManager


> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279-script-20110817.sh, MR-279-script-final.sh, 
> MR-279-script.sh, MR-279-script.sh, MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move-20110817.txt, MR-279_MR_files_to_move.txt, 
> MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, 
> ResourceManager.gv, ResourceManager.png, capacity-scheduler-dark-theme.png, 
> hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, post-move-patch-20110817.2.txt, 
> post-move-patch-final.txt, post-move.patch, post-move.patch, post-move.patch, 
> yarn-state-machine.job.dot, yarn-state-machine.job.png, 
> yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
> yarn-state-machine.task.dot, yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].





[jira] [Updated] (MAPREDUCE-279) Map-Reduce 2.0

2011-09-02 Thread Binglin Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Binglin Chang updated MAPREDUCE-279:


Attachment: NodeManager.png
NodeManager.gv

> Map-Reduce 2.0
> --
>
> Key: MAPREDUCE-279
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-279
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Arun C Murthy
> Fix For: 0.23.0
>
> Attachments: MR-279-script-20110817.sh, MR-279-script-final.sh, 
> MR-279-script.sh, MR-279-script.sh, MR-279.patch, MR-279.patch, MR-279.sh, 
> MR-279_MR_files_to_move-20110817.txt, MR-279_MR_files_to_move.txt, 
> MR-279_MR_files_to_move.txt, MapReduce_NextGen_Architecture.pdf, 
> NodeManager.gv, NodeManager.png, ResourceManager.gv, ResourceManager.png, 
> capacity-scheduler-dark-theme.png, hadoop_contributors_meet_07_01_2011.pdf, 
> multi-column-stable-sort-default-theme.png, post-move-patch-20110817.2.txt, 
> post-move-patch-final.txt, post-move.patch, post-move.patch, post-move.patch, 
> yarn-state-machine.job.dot, yarn-state-machine.job.png, 
> yarn-state-machine.task-attempt.dot, yarn-state-machine.task-attempt.png, 
> yarn-state-machine.task.dot, yarn-state-machine.task.png
>
>
> Re-factor MapReduce into a generic resource scheduler and a per-job, 
> user-defined component that manages the application execution.
> Check it out by following [the instructions|http://goo.gl/rSJJC].
