resource management provisioning for Hadoop
-------------------------------------------
Key: HADOOP-1301
URL: https://issues.apache.org/jira/browse/HADOOP-1301
Project: Hadoop
Issue Type: New Feature
Components: contrib/hbase
Reporter: Pete Wyckoff
Priority: Minor
The Hadoop On Demand (HOD) project addresses the provisioning and management of
MapReduce instances on cluster resources. With HOD, the MapReduce user
interacts with the cluster solely through a self-service interface and the
JobTracker (JT) and TaskTracker (TT) info ports. The user never needs to log
into the cluster or even have an account on it. HOD allocates nodes, provisions
MapReduce (and optionally HDFS) on them and, when the user is done with
MapReduce jobs, cleanly shuts down MapReduce and de-allocates the nodes (i.e.,
returns them to the pool of available resources in the cluster).
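The allocate, provision, run, shut down, de-allocate cycle described above can be sketched as follows. This is only an illustration under stated assumptions: `NodePool` and `session` are hypothetical names, not part of HOD, and the real provisioning of MapReduce and HDFS daemons is reduced to a comment.

```python
from contextlib import contextmanager


class NodePool:
    """Hypothetical pool of cluster nodes (illustration only, not HOD's API)."""

    def __init__(self, nodes):
        self.free = set(nodes)   # nodes available for allocation
        self.allocated = {}      # user -> set of nodes held by that user

    def allocate(self, user, count):
        """Move `count` free nodes to `user`, or fail if too few are free."""
        if count > len(self.free):
            raise RuntimeError("not enough free nodes")
        picked = set(sorted(self.free)[:count])
        self.free -= picked
        self.allocated.setdefault(user, set()).update(picked)
        return picked

    def release(self, user):
        """Return all of the user's nodes to the free pool."""
        self.free |= self.allocated.pop(user, set())


@contextmanager
def session(pool, user, count):
    """Allocate nodes for one session and always give them back at the end."""
    nodes = pool.allocate(user, count)
    try:
        # Provisioning MapReduce (and optionally HDFS) and running the
        # user's jobs would happen while the caller holds these nodes.
        yield nodes
    finally:
        # Clean shutdown and de-allocation, even if a job raised an error.
        pool.release(user)
```

A caller would write `with session(pool, "alice", 2) as nodes: ...`; the nodes return to the pool when the block exits, whether or not the jobs succeeded.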
Using HOD, a cluster can be shared among different users in a fair and
efficient manner. HOD is not a replacement for, or re-implementation of, a
traditional resource manager. HOD is implemented using the resource manager
paradigm and is currently envisioned to support Torque and Condor out of the
box. It also supports "static" resources, i.e., a dedicated set of resources
not managed by a resource manager.
HOD is also self-provisioning and, thus, can be used on systems such as EC2 or
a campus cluster not already running MapReduce software or a resource manager.
Figure 1 depicts a cluster using HOD. As the figure shows, the user never logs
into the cluster itself. The user's jobs run as the 'hod' user (a configurable
unix id).
The user interacts with MapReduce and the cluster using the hod shell, hodsh.
Once in hodsh, the user can allocate and de-allocate nodes and automatically
run the JT, TTs, NameNode (NN) and DataNodes (DNs) on those nodes without
knowing which nodes are running which daemons or logging into any of those
boxes. HOD transparently masks failures by allocating nodes to replace failed
nodes. Once the user has allocated nodes, she can run /bin/MapReduce my1.jar
and then /bin/MapReduce my2.jar ... from within the hod shell, which
automatically generates the configuration file for the MapReduce script. When
done, the user exits the shell.
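As a rough illustration of the automatically generated configuration file, the sketch below builds a Hadoop-style XML configuration pointing at the allocated JT (and, optionally, NN). It assumes the file resembles hadoop-site.xml and uses the `mapred.job.tracker` and `fs.default.name` properties; the function name `build_site_config` and the host:port values are hypothetical, and the issue does not say how hodsh actually produces the file.

```python
import xml.etree.ElementTree as ET


def build_site_config(jobtracker, namenode=None):
    """Return a Hadoop-style XML configuration as a string.

    `jobtracker` and `namenode` are "host:port" strings for the daemons
    started on the allocated nodes (illustrative values only).
    """
    root = ET.Element("configuration")
    props = {"mapred.job.tracker": jobtracker}
    if namenode is not None:
        # HDFS is optional, so only point at a NameNode when one exists.
        props["fs.default.name"] = namenode
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_site_config("node07:9001", "node03:9000"))
```

Generating the file from the allocation result means the user never has to know which node ended up hosting which daemon.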
The hod shell has an automatic timeout so that users cannot hog resources they
aren't using. The timeout applies only when there is no MapReduce job running.
In addition, HOD can optionally track and enforce user/group resource limits.
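The timeout rule above (idle time counts only while no MapReduce job is running) might look like the following minimal sketch. `IdleTimer` and its method names are hypothetical, and the injectable `clock` parameter is an assumption added so the behavior can be checked without waiting.

```python
import time


class IdleTimer:
    """Hypothetical idle timeout that is suspended while jobs are running."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock
        self.running_jobs = 0
        self.idle_since = clock()  # the session starts out idle

    def job_started(self):
        self.running_jobs += 1
        self.idle_since = None     # not idle while any job is running

    def job_finished(self):
        self.running_jobs -= 1
        if self.running_jobs == 0:
            self.idle_since = self.clock()  # idle period restarts now

    def expired(self):
        """True only if the session has been idle for the full timeout."""
        if self.idle_since is None:
            return False
        return self.clock() - self.idle_since >= self.timeout
```

A shell loop could poll `expired()` and release the user's nodes once it returns true; a long-running job never triggers the timeout because the idle period restarts only when the last job finishes.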
Optionally, HOD can run dedicated log and directory services in the cluster.
The log service is a central repository for collecting and retrieving Hadoop
logs for any given job. The directory service provides an easy way to inspect
what's running in the cluster and gives end users an HTML interface for
reaching their JT and TT info ports.