[05/43] hadoop git commit: YARN-3168. Convert site documentation from apt to markdown (Gururaj Shetty via aw)

zjshen Tue, 03 Mar 2015 11:32:26 -0800

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md
new file mode 100644
index 0000000..e516afb
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/ResourceManagerRestart.md
@@ -0,0 +1,181 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+ResourceManager Restart
+=======================
+
+* [Overview](#Overview)
+* [Feature](#Feature)
+* [Configurations](#Configurations)
+    * [Enable RM Restart](#Enable_RM_Restart)
+    * [Configure the state-store for persisting the RM state](#Configure_the_state-store_for_persisting_the_RM_state)
+    * [How to choose the state-store implementation](#How_to_choose_the_state-store_implementation)
+    * [Configurations for Hadoop FileSystem based state-store implementation](#Configurations_for_Hadoop_FileSystem_based_state-store_implementation)
+    * [Configurations for ZooKeeper based state-store implementation](#Configurations_for_ZooKeeper_based_state-store_implementation)
+    * [Configurations for LevelDB based state-store implementation](#Configurations_for_LevelDB_based_state-store_implementation)
+    * [Configurations for work-preserving RM recovery](#Configurations_for_work-preserving_RM_recovery)
+* [Notes](#Notes)
+* [Sample Configurations](#Sample_Configurations)
+
+Overview
+--------
+
+ResourceManager is the central authority that manages resources and schedules applications running on top of YARN. Hence, it is potentially a single point of failure in an Apache YARN cluster.
+
+This document gives an overview of ResourceManager Restart, a feature that enhances ResourceManager to keep functioning across restarts and also makes ResourceManager down-time invisible to end-users.
+
+The ResourceManager Restart feature is divided into two phases:
+
+* **ResourceManager Restart Phase 1 (Non-work-preserving RM restart)**: Enhance RM to persist application/attempt state and other credentials information in a pluggable state-store. RM reloads this information from the state-store upon restart and re-kicks the previously running applications. Users are not required to re-submit the applications.
+
+* **ResourceManager Restart Phase 2 (Work-preserving RM restart)**: Focus on reconstructing the running state of ResourceManager by combining the container statuses from NodeManagers and container requests from ApplicationMasters upon restart. The key difference from Phase 1 is that previously running applications are not killed after RM restarts, so applications do not lose their work because of an RM outage.
+
+Feature
+-------
+
+* **Phase 1: Non-work-preserving RM restart** 
+
+     As of the Hadoop 2.4.0 release, only ResourceManager Restart Phase 1 is implemented; it is described below.
+
+     The overall concept is that RM persists the application metadata (i.e. ApplicationSubmissionContext) in a pluggable state-store when a client submits an application, and also saves the final status of the application, such as the completion state (failed, killed, finished) and diagnostics, when the application completes. RM also saves credentials such as security keys and tokens so that it can work in a secure environment. Whenever RM shuts down, as long as the required information (i.e. application metadata and the accompanying credentials if running in a secure environment) is available in the state-store, RM can pick up the application metadata from the state-store when it restarts and re-submit the application. RM won't re-submit applications that had already completed (i.e. failed, killed, finished) before RM went down.
+
+     During RM down-time, NodeManagers and clients keep polling RM until it comes back up. When RM becomes alive, it sends a re-sync command to all the NodeManagers and ApplicationMasters it was talking to via heartbeats. As of the Hadoop 2.4.0 release, NodeManagers and ApplicationMasters handle this command as follows: NMs kill all their managed containers and re-register with RM. From the RM's perspective, these re-registered NodeManagers are similar to newly joining NMs. AMs (e.g. the MapReduce AM) are expected to shut down when they receive the re-sync command. After RM restarts, loads all the application metadata and credentials from the state-store, and populates them into memory, it creates a new attempt (i.e. ApplicationMaster) for each application that was not yet completed and re-kicks that application as usual. As described before, the previously running applications' work is lost in this manner, since they are essentially killed by RM via the re-sync command on restart.
+
+* **Phase 2: Work-preserving RM restart** 
+
+     As of Hadoop 2.6.0, the RM restart feature has been further enhanced so that applications running on the YARN cluster are not killed when RM restarts.
+
+     Beyond all the groundwork done in Phase 1 to ensure the persistence of application state and to reload that state on recovery, Phase 2 primarily focuses on reconstructing the entire running state of the YARN cluster, the majority of which is the state of the central scheduler inside RM, which keeps track of all containers' life-cycles, applications' headroom and resource requests, queues' resource usage, etc. In this way, RM doesn't need to kill the AM and re-run the application from scratch as it does in Phase 1. Applications can simply re-sync with RM and resume from where they left off.
+
+     RM recovers its running state by taking advantage of the container statuses sent from all NMs. An NM does not kill its containers when it re-syncs with the restarted RM. It continues managing the containers and sends the container statuses to RM when it re-registers. RM reconstructs the container instances and the associated applications' scheduling status by absorbing this container information. In the meantime, the AM needs to re-send its outstanding resource requests to RM, because RM may lose the unfulfilled requests when it shuts down. Application writers using the AMRMClient library to communicate with RM do not need to worry about re-sending resource requests on re-sync, as the library takes care of it automatically.
+
+Configurations
+--------------
+
+This section describes the configurations involved in enabling the RM Restart feature.
+
+### Enable RM Restart
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.recovery.enabled` | Set to `true` to enable RM restart. |
+
+### Configure the state-store for persisting the RM state
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.store.class` | The class name of the state-store to be used for saving application/attempt state and the credentials. The available state-store implementations are `org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore`, a ZooKeeper based state-store implementation; `org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore`, a Hadoop FileSystem based state-store implementation backed by, for example, HDFS or the local FS; and `org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore`, a LevelDB based state-store implementation. The default value is `org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore`. |
+
+### How to choose the state-store implementation
+
+   * **ZooKeeper based state-store**: Users are free to pick any storage to set up RM restart, but must use the ZooKeeper based state-store to support RM HA. The reason is that only the ZooKeeper based state-store supports a fencing mechanism to avoid a split-brain situation, where multiple RMs assume they are active and can edit the state-store at the same time.
+
+   * **FileSystem based state-store**: HDFS and local FS based state-stores are supported. Fencing mechanism is not supported.
+
+   * **LevelDB based state-store**: The LevelDB based state-store is considered more lightweight than the HDFS and ZooKeeper based state-stores. LevelDB supports better atomic operations, fewer I/O ops per state update, and far fewer total files on the filesystem. Fencing mechanism is not supported.
+
+### Configurations for Hadoop FileSystem based state-store implementation
+
+   Both HDFS and local FS based state-store implementations are supported. The type of file system to be used is determined by the scheme of the URI, e.g. `hdfs://localhost:9000/rmstore` uses HDFS as the storage and `file:///tmp/yarn/rmstore` uses the local FS as the storage. If no scheme (`hdfs://` or `file://`) is specified in the URI, the type of storage to be used is determined by `fs.defaultFS` defined in `core-site.xml`.
+
+* Configure the URI where the RM state will be saved in the Hadoop FileSystem state-store.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.fs.state-store.uri` | URI pointing to the location of the FileSystem path where RM state will be stored (e.g. `hdfs://localhost:9000/rmstore`). Default value is `${hadoop.tmp.dir}/yarn/system/rmstore`. If the FileSystem name is not provided, `fs.defaultFS` specified in **conf/core-site.xml** will be used. |
+
+* Configure the retry policy that the state-store client uses to connect with the Hadoop FileSystem.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.fs.state-store.retry-policy-spec` | Hadoop FileSystem client retry policy specification. Hadoop FileSystem client retry is always enabled. Specified in pairs of sleep-time and number-of-retries, i.e. (t0, n0), (t1, n1), ...: the first n0 retries sleep t0 milliseconds on average, the following n1 retries sleep t1 milliseconds on average, and so on. Default value is (2000, 500). |
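+
+As a minimal sketch, the `yarn-site.xml` fragment below selects the FileSystem based state-store and points it at HDFS, assuming `yarn.resourcemanager.recovery.enabled` is already set to `true`. The URI `hdfs://localhost:9000/rmstore` is only the example path used above, and writing the retry pair as `2000, 500` (comma-separated, without parentheses) is an assumption about the expected value syntax; adjust both for your cluster.
+
+```xml
+<property>
+  <name>yarn.resourcemanager.store.class</name>
+  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value>
+</property>
+
+<property>
+  <!-- Example location; a file:/// URI would select the local FS instead -->
+  <name>yarn.resourcemanager.fs.state-store.uri</name>
+  <value>hdfs://localhost:9000/rmstore</value>
+</property>
+
+<property>
+  <!-- Assumed syntax: 500 retries, sleeping about 2000 ms on average between them -->
+  <name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
+  <value>2000, 500</value>
+</property>
+```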
+
+### Configurations for ZooKeeper based state-store implementation
+  
+* Configure the ZooKeeper server address and the root path where the RM state is stored.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.zk-address` | Comma-separated list of Host:Port pairs. Each corresponds to a ZooKeeper server (e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002") to be used by the RM for storing RM state. |
+| `yarn.resourcemanager.zk-state-store.parent-path` | The full path of the root znode where RM state will be stored. Default value is `/rmstore`. |
+
+* Configure the retry policy that the state-store client uses to connect with the ZooKeeper server.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.zk-num-retries` | Number of times RM tries to connect to the ZooKeeper server if the connection is lost. Default value is 500. |
+| `yarn.resourcemanager.zk-retry-interval-ms` | The interval in milliseconds between retries when connecting to a ZooKeeper server. Default value is 2000 (2 seconds). |
+| `yarn.resourcemanager.zk-timeout-ms` | ZooKeeper session timeout in milliseconds. This configuration is used by the ZooKeeper server to determine when the session expires. Session expiration happens when the server does not hear from the client (i.e. no heartbeat) within the session timeout period specified by this configuration. Default value is 10000 (10 seconds). |
+
+* Configure the ACLs to be used for setting permissions on ZooKeeper znodes.
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.zk-acl` | ACLs to be used for setting permissions on ZooKeeper znodes. Default value is `world:anyone:rwcda`. |
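+
+A sketch of the corresponding `yarn-site.xml` entries follows, assuming the ZooKeeper based state-store is already selected through `yarn.resourcemanager.store.class` and `yarn.resourcemanager.zk-address`. The values simply restate the defaults listed in the tables above, so in practice they only need to appear in `yarn-site.xml` when you change them.
+
+```xml
+<property>
+  <!-- Root znode under which RM state is stored -->
+  <name>yarn.resourcemanager.zk-state-store.parent-path</name>
+  <value>/rmstore</value>
+</property>
+
+<property>
+  <!-- Reconnect up to 500 times after the connection is lost -->
+  <name>yarn.resourcemanager.zk-num-retries</name>
+  <value>500</value>
+</property>
+
+<property>
+  <!-- Wait 2000 ms between reconnection attempts -->
+  <name>yarn.resourcemanager.zk-retry-interval-ms</name>
+  <value>2000</value>
+</property>
+
+<property>
+  <!-- Session expires if the server hears nothing from the RM for 10 seconds -->
+  <name>yarn.resourcemanager.zk-timeout-ms</name>
+  <value>10000</value>
+</property>
+
+<property>
+  <name>yarn.resourcemanager.zk-acl</name>
+  <value>world:anyone:rwcda</value>
+</property>
+```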
+
+### Configurations for LevelDB based state-store implementation
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.leveldb-state-store.path` | Local path where the RM state will be stored. Default value is `${hadoop.tmp.dir}/yarn/system/rmstore`. |
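+
+For example, a minimal `yarn-site.xml` sketch that switches RM state storage to LevelDB might look like the following. The path shown is just the documented default; you may prefer a local directory that is not subject to temporary-file cleanup.
+
+```xml
+<property>
+  <name>yarn.resourcemanager.store.class</name>
+  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore</value>
+</property>
+
+<property>
+  <!-- Local path on the RM host; this is the documented default -->
+  <name>yarn.resourcemanager.leveldb-state-store.path</name>
+  <value>${hadoop.tmp.dir}/yarn/system/rmstore</value>
+</property>
+```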
+
+### Configurations for work-preserving RM recovery
+
+| Property | Description |
+|:---- |:---- |
+| `yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms` | The amount of time, in milliseconds, RM waits before allocating new containers during work-preserving recovery. This wait period gives RM a chance to finish re-syncing with the NMs in the cluster before it assigns new containers to applications. |
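+
+A minimal sketch, assuming RM restart is already enabled via `yarn.resourcemanager.recovery.enabled`. The 10000 ms (10 seconds) wait below is an illustrative value, not a recommendation; check `yarn-default.xml` for the shipped default.
+
+```xml
+<property>
+  <!-- Illustrative: wait 10 seconds for NMs to re-sync before allocating new containers -->
+  <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
+  <value>10000</value>
+</property>
+```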
+
+Notes
+-----
+
+The ContainerId string format changes if RM restarts with work-preserving recovery enabled. It used to be in this format:
+
+    container_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
+    e.g. container_1410901177871_0001_01_000005
+
+It is now changed to:
+
+    container_e{epoch}_{clusterTimestamp}_{appId}_{attemptId}_{containerId}
+    e.g. container_e17_1410901177871_0001_01_000005
+
+Here, the additional epoch number is a monotonically increasing integer that starts from 0 and is increased by 1 each time RM restarts. If the epoch number is 0, it is omitted and the ContainerId string format stays the same as before.
+
+Sample Configurations
+---------------------
+
+Below is a minimal set of configurations for enabling RM work-preserving restart using the ZooKeeper based state-store.
+
+
+     <property>
+       <description>Enable RM to recover state after starting. If true, then 
+       yarn.resourcemanager.store.class must be specified</description>
+       <name>yarn.resourcemanager.recovery.enabled</name>
+       <value>true</value>
+     </property>
+   
+     <property>
+       <description>The class to use as the persistent store.</description>
+       <name>yarn.resourcemanager.store.class</name>
+       <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
+     </property>
+
+     <property>
+       <description>Comma separated list of Host:Port pairs. Each corresponds to a ZooKeeper server
+       (e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002") to be used by the RM for storing RM state.
+       This must be supplied when using org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
+       as the value for yarn.resourcemanager.store.class</description>
+       <name>yarn.resourcemanager.zk-address</name>
+       <value>127.0.0.1:2181</value>
+     </property>
+
+


http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/SecureContainer.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/SecureContainer.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/SecureContainer.md
new file mode 100644
index 0000000..f32e460
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/SecureContainer.md
@@ -0,0 +1,135 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+YARN Secure Containers
+======================
+
+* [Overview](#Overview)
+
+Overview
+--------
+
+YARN containers in a secure cluster use operating system facilities to offer execution isolation for containers. Secure containers execute under the credentials of the job user. The operating system enforces access restrictions for the container. The container must run as the user that submitted the application.
+
+Secure Containers work only in the context of secured YARN clusters.
+
+### Container isolation requirements
+
+  The container executor must access the local files and directories needed by the container, such as jars, configuration files, log files, shared objects, etc. Although it is launched by the NodeManager, the container should not have access to the NodeManager's private files and configuration. Containers running applications submitted by different users should be isolated and unable to access each other's files and directories. Similar requirements apply to other non-file securable system objects like named pipes, critical sections, LPC queues, shared memory, etc.
+
+### Linux Secure Container Executor
+
+  In a Linux environment, the secure container executor is the `LinuxContainerExecutor`. It uses an external program called the **container-executor** to launch the container. This program has the `setuid` access right flag set, which allows it to launch the container with the permissions of the YARN application user.
+
+### Configuration
+
+  The configured directories for `yarn.nodemanager.local-dirs` and `yarn.nodemanager.log-dirs` must be owned by the configured NodeManager user (`yarn`) and group (`hadoop`). The permission set on these directories must be `drwxr-xr-x`.
+
+  The `container-executor` program must be owned by `root` and have the permission set `---sr-s---`.
+
+  To configure the `NodeManager` to use the `LinuxContainerExecutor`, set the following in the **conf/yarn-site.xml**:
+
+```xml
+<property>
+  <name>yarn.nodemanager.container-executor.class</name>
+  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
+</property>
+
+<property>
+  <name>yarn.nodemanager.linux-container-executor.group</name>
+  <value>hadoop</value>
+</property>
+```
+
+  Additionally, the LCE requires the `container-executor.cfg` file, which is read by the `container-executor` program.
+
+```
+yarn.nodemanager.linux-container-executor.group=#configured value of yarn.nodemanager.linux-container-executor.group
+banned.users=#comma separated list of users who can not run applications
+allowed.system.users=#comma separated list of allowed system users
+min.user.id=1000#Prevent other super-users
+```
+
+### Windows Secure Container Executor (WSCE)
+
+  The Windows environment secure container executor is the `WindowsSecureContainerExecutor`. It uses the Windows S4U infrastructure to launch the container as the YARN application user. The WSCE requires the presence of the `hadoopwinutilsvc` service. This service is hosted by `%HADOOP_HOME%\bin\winutils.exe` started with the `service` command line argument. It offers some privileged operations that require LocalSystem authority, so that the NM is not required to run the entire JVM and all the NM code in an elevated context. The NM interacts with the `hadoopwinutilsvc` service by means of Local RPC (LRPC) via JNI calls to the RPC client hosted in `hadoop.dll`.
+
+### Configuration
+
+  To configure the `NodeManager` to use the `WindowsSecureContainerExecutor`, set the following in the **conf/yarn-site.xml**:
+
+```xml
+<property>
+  <name>yarn.nodemanager.container-executor.class</name>
+  <value>org.apache.hadoop.yarn.server.nodemanager.WindowsSecureContainerExecutor</value>
+</property>
+
+<property>
+  <name>yarn.nodemanager.windows-secure-container-executor.group</name>
+  <value>yarn</value>
+</property>
+```
+   
+  The hadoopwinutilsvc uses `%HADOOP_HOME%\etc\hadoop\wsce_site.xml` to configure access to the privileged operations.
+
+```xml
+<property>
+  <name>yarn.nodemanager.windows-secure-container-executor.impersonate.allowed</name>
+  <value>HadoopUsers</value>
+</property>
+
+<property>
+  <name>yarn.nodemanager.windows-secure-container-executor.impersonate.denied</name>
+  <value>HadoopServices,Administrators</value>
+</property>
+
+<property>
+  <name>yarn.nodemanager.windows-secure-container-executor.allowed</name>
+  <value>nodemanager</value>
+</property>
+
+<property>
+  <name>yarn.nodemanager.windows-secure-container-executor.local-dirs</name>
+  <value>nm-local-dir, nm-log-dirs</value>
+</property>
+
+<property>
+  <name>yarn.nodemanager.windows-secure-container-executor.job-name</name>
+  <value>nodemanager-job-name</value>
+</property>  
+```
+
+  `yarn.nodemanager.windows-secure-container-executor.allowed` should contain the name of the service account running the NodeManager. This user will be allowed to access the hadoopwinutilsvc functions.
+
+  `yarn.nodemanager.windows-secure-container-executor.impersonate.allowed` should contain the users that are allowed to create containers in the cluster. These users will be allowed to be impersonated by hadoopwinutilsvc.
+
+  `yarn.nodemanager.windows-secure-container-executor.impersonate.denied` should contain users that are explicitly forbidden from creating containers. hadoopwinutilsvc will refuse to impersonate these users.
+
+  `yarn.nodemanager.windows-secure-container-executor.local-dirs` should contain the NodeManager local dirs. hadoopwinutilsvc will allow file operations only under these directories. This should contain the same values as `$yarn.nodemanager.local-dirs, $yarn.nodemanager.log-dirs`, but note that hadoopwinutilsvc XML configuration processing does not do substitutions, so the value must be the final value. All paths must be absolute, and no environment variable substitution will be performed. Paths are compared using a `LOCAL_INVARIANT` case-insensitive string comparison: the file path being validated must start with one of the paths listed in the local-dirs configuration. Use a comma (`,`) as the path separator.
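+
+  To illustrate, here is a sketch in which the NodeManager directories and the WSCE `local-dirs` list agree. The paths `C:\hadoop\nm-local-dir` and `C:\hadoop\nm-log-dir` are hypothetical placeholders; the first two properties belong in **conf/yarn-site.xml** and the last one in `wsce_site.xml`, written out literally because hadoopwinutilsvc performs no substitution. Because the check is a string prefix comparison, whether the NodeManager hands hadoopwinutilsvc paths spelled exactly like these (drive letter, slash direction) is an assumption worth verifying on your installation.
+
+```xml
+<!-- conf/yarn-site.xml (hypothetical paths) -->
+<property>
+  <name>yarn.nodemanager.local-dirs</name>
+  <value>C:\hadoop\nm-local-dir</value>
+</property>
+
+<property>
+  <name>yarn.nodemanager.log-dirs</name>
+  <value>C:\hadoop\nm-log-dir</value>
+</property>
+
+<!-- wsce_site.xml: the same absolute paths, comma-separated, no variables -->
+<property>
+  <name>yarn.nodemanager.windows-secure-container-executor.local-dirs</name>
+  <value>C:\hadoop\nm-local-dir,C:\hadoop\nm-log-dir</value>
+</property>
+```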
+
+  `yarn.nodemanager.windows-secure-container-executor.job-name` should contain a Windows NT job name that all containers should be added to. This configuration is optional. If not set, the container is not added to a global NodeManager job. Normally this should be set to the job that the NM is assigned to, so that killing the NM also kills all containers. Hadoopwinutilsvc will not attempt to create this job; the job must exist when the container is launched. If the value is set and the job does not exist, container launch will fail with error 2 `The system cannot find the file specified`. Note that this global NM job is not related to the container job, which always gets created for each container and is named after the container ID. This setting controls a global job that spans all containers and the parent NM, and as such it requires nested jobs. Nested jobs are available only on Windows 8 and Windows Server 2012 or later.
+
+#### Useful Links
+
+  * [Exploring S4U Kerberos Extensions in Windows Server 2003](http://msdn.microsoft.com/en-us/magazine/cc188757.aspx)
+
+  * [Nested Jobs](http://msdn.microsoft.com/en-us/library/windows/desktop/hh448388.aspx)
+
+  * [Winutils needs ability to create task as domain user](https://issues.apache.org/jira/browse/YARN-1063)
+
+  * [Implement secure Windows Container Executor](https://issues.apache.org/jira/browse/YARN-1972)
+
+  * [Remove the need to run NodeManager as privileged account for Windows Secure Container Executor](https://issues.apache.org/jira/browse/YARN-2198)
+
+

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
new file mode 100644
index 0000000..4889936
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/TimelineServer.md
@@ -0,0 +1,231 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+YARN Timeline Server
+====================
+
+* [Overview](#Overview)
+* [Current Status](#Current_Status)
+* [Basic Configuration](#Basic_Configuration)
+* [Advanced Configuration](#Advanced_Configuration)
+* [Generic-data related Configuration](#Generic-data_related_Configuration)
+* [Per-framework-date related Configuration](#Per-framework-date_related_Configuration)
+* [Running Timeline server](#Running_Timeline_server)
+* [Accessing generic-data via command-line](#Accessing_generic-data_via_command-line)
+* [Publishing of per-framework data by applications](#Publishing_of_per-framework_data_by_applications)
+
+Overview
+--------
+
+In YARN, the storage and retrieval of applications' current and historic information in a generic fashion is handled by the Timeline Server (previously also called the Generic Application History Server). It has two responsibilities:
+
+* Generic information about completed applications
+    
+    Generic information includes application-level data such as the queue name and user information in the ApplicationSubmissionContext, the list of application-attempts that ran for an application, information about each application-attempt, the list of containers run under each application-attempt, and information about each container. Generic data is stored by the ResourceManager to a history-store (the default implementation is on a file-system) and used by the web UI to display information about completed applications.
+
+* Per-framework information of running and completed applications
+    
+    Per-framework information is completely specific to an application or framework. For example, the Hadoop MapReduce framework can include pieces of information like the number of map tasks, reduce tasks, counters, etc. Application developers can publish this specific information to the Timeline server via TimelineClient from within a client, the ApplicationMaster, and/or the application's containers. This information is then queryable via REST APIs for rendering by application- or framework-specific UIs.
+
+Current Status
+--------------
+
+The Timeline server is a work in progress. The basic storage and retrieval of information, both generic and framework specific, are in place. The Timeline server doesn't work in secure mode yet. The generic information and the per-framework information are currently collected and presented separately, and thus are not well integrated. Finally, the per-framework information is only available via RESTful APIs using JSON content; the ability to install framework-specific UIs in YARN isn't supported yet.
+
+Basic Configuration
+-------------------
+
+Users need to configure the Timeline server before starting it. The simplest configuration you should add to `yarn-site.xml` is the hostname of the Timeline server.
+
+```xml
+<property>
+  <description>The hostname of the Timeline service web application.</description>
+  <name>yarn.timeline-service.hostname</name>
+  <value>0.0.0.0</value>
+</property>
+```
+
+Advanced Configuration
+----------------------
+
+In addition to the hostname, admins can also configure whether the service is enabled or not, the ports of the RPC and the web interfaces, and the number of RPC handler threads.
+
+```xml
+<property>
+  <description>Address for the Timeline server to start the RPC server.</description>
+  <name>yarn.timeline-service.address</name>
+  <value>${yarn.timeline-service.hostname}:10200</value>
+</property>
+
+<property>
+  <description>The http address of the Timeline service web application.</description>
+  <name>yarn.timeline-service.webapp.address</name>
+  <value>${yarn.timeline-service.hostname}:8188</value>
+</property>
+
+<property>
+  <description>The https address of the Timeline service web application.</description>
+  <name>yarn.timeline-service.webapp.https.address</name>
+  <value>${yarn.timeline-service.hostname}:8190</value>
+</property>
+
+<property>
+  <description>Handler thread count to serve the client RPC requests.</description>
+  <name>yarn.timeline-service.handler-thread-count</name>
+  <value>10</value>
+</property>
+
+<property>
+  <description>Enables cross-origin support (CORS) for web services where
+  cross-origin web response headers are needed. For example, javascript making
+  a web services request to the timeline server.</description>
+  <name>yarn.timeline-service.http-cross-origin.enabled</name>
+  <value>false</value>
+</property>
+
+<property>
+  <description>Comma separated list of origins that are allowed for web
+  services needing cross-origin (CORS) support. Wildcards (*) and patterns
+  allowed</description>
+  <name>yarn.timeline-service.http-cross-origin.allowed-origins</name>
+  <value>*</value>
+</property>
+
+<property>
+  <description>Comma separated list of methods that are allowed for web
+  services needing cross-origin (CORS) support.</description>
+  <name>yarn.timeline-service.http-cross-origin.allowed-methods</name>
+  <value>GET,POST,HEAD</value>
+</property>
+
+<property>
+  <description>Comma separated list of headers that are allowed for web
+  services needing cross-origin (CORS) support.</description>
+  <name>yarn.timeline-service.http-cross-origin.allowed-headers</name>
+  <value>X-Requested-With,Content-Type,Accept,Origin</value>
+</property>
+
+<property>
+  <description>The number of seconds a pre-flighted request can be cached
+  for web services needing cross-origin (CORS) support.</description>
+  <name>yarn.timeline-service.http-cross-origin.max-age</name>
+  <value>1800</value>
+</property>
+```
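+
+As a sketch, the fragment below turns on CORS but narrows the defaults shown above: only a single hypothetical UI origin, `http://ui.example.com:8080`, is allowed, and only `GET` and `HEAD` requests are permitted. The host name is a placeholder, and whether a read-only method list suits your clients is an assumption to check.
+
+```xml
+<property>
+  <name>yarn.timeline-service.http-cross-origin.enabled</name>
+  <value>true</value>
+</property>
+
+<property>
+  <!-- Hypothetical origin; replace with the origin serving your UI -->
+  <name>yarn.timeline-service.http-cross-origin.allowed-origins</name>
+  <value>http://ui.example.com:8080</value>
+</property>
+
+<property>
+  <!-- Read-only access: POST is left out of the allowed methods -->
+  <name>yarn.timeline-service.http-cross-origin.allowed-methods</name>
+  <value>GET,HEAD</value>
+</property>
+```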
+
+Generic-data related Configuration
+----------------------------------
+
+Users can specify whether the generic data collection is enabled or not, and also choose the storage-implementation class for the generic data. There are more configurations related to generic data collection, and users can refer to `yarn-default.xml` for all of them.
+
+```xml
+<property>
+  <description>Indicate to ResourceManager as well as clients whether
+  history-service is enabled or not. If enabled, ResourceManager starts
+  recording historical data that Timeline service can consume. Similarly,
+  clients can redirect to the history service when applications
+  finish if this is enabled.</description>
+  <name>yarn.timeline-service.generic-application-history.enabled</name>
+  <value>false</value>
+</property>
+
+<property>
+  <description>Store class name for history store, defaulting to file system
+  store</description>
+  <name>yarn.timeline-service.generic-application-history.store-class</name>
+  <value>org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore</value>
+</property>
+```
+
+Per-framework-date related Configuration
+----------------------------------------
+
+Users can specify whether per-framework data service is enabled or not, choose the store implementation for the per-framework data, and tune the retention of the per-framework data. There are more configurations related to per-framework data service, and users can refer to `yarn-default.xml` for all of them.
+
+```xml
+<property>
+  <description>Indicate to clients whether Timeline service is enabled or not.
+  If enabled, the TimelineClient library used by end-users will post entities
+  and events to the Timeline server.</description>
+  <name>yarn.timeline-service.enabled</name>
+  <value>true</value>
+</property>
+
+<property>
+  <description>Store class name for timeline store.</description>
+  <name>yarn.timeline-service.store-class</name>
+  <value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
+</property>
+
+<property>
+  <description>Enable age off of timeline store data.</description>
+  <name>yarn.timeline-service.ttl-enable</name>
+  <value>true</value>
+</property>
+
+<property>
+  <description>Time to live for timeline store data in milliseconds.</description>
+  <name>yarn.timeline-service.ttl-ms</name>
+  <value>604800000</value>
+</property>
+```
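`ttl-ms` is expressed in milliseconds, so the default of 604800000 corresponds to 7 days (7 * 24 * 3600 * 1000). A minimal Java sketch for deriving the value from a retention period; `TimelineTtl` is a hypothetical helper for illustration, not part of YARN:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not a YARN API): converts a retention period into the
// millisecond value expected by yarn.timeline-service.ttl-ms.
final class TimelineTtl {
  private TimelineTtl() {}

  static long ttlMs(long amount, TimeUnit unit) {
    // TimeUnit handles the unit arithmetic, e.g. 7 DAYS -> 604800000 ms.
    return unit.toMillis(amount);
  }
}
```

For example, `TimelineTtl.ttlMs(30, TimeUnit.DAYS)` returns 2592000000, which could serve as the `yarn.timeline-service.ttl-ms` value for a 30-day retention.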
+
+Running Timeline server
+-----------------------
+
+Assuming all the aforementioned configurations are set properly, admins can start the Timeline server/history service with the following command:
+
+      $ yarn timelineserver
+
+Alternatively, users can start the Timeline server/history service as a daemon:
+
+      $ yarn --daemon start timelineserver
+
+Accessing generic-data via command-line
+---------------------------------------
+
+Users can access applications' generic historic data via the command line as shown below. Note that the same commands can also be used to obtain the corresponding information about running applications.
+
+```
+$ yarn application -status <Application ID>
+$ yarn applicationattempt -list <Application ID>
+$ yarn applicationattempt -status <Application Attempt ID>
+$ yarn container -list <Application Attempt ID>
+$ yarn container -status <Container ID>
+```
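The IDs these commands take follow fixed string layouts. As an illustration only, and assuming the Hadoop 2.x formats seen in the REST API examples elsewhere in this documentation (for instance `container_1326821518301_0010_01_000001` for the first container of an application's first attempt), the following Java sketch derives attempt and container IDs from an application ID. `YarnIds` is a hypothetical helper; Java clients should prefer YARN's own `ApplicationId`, `ApplicationAttemptId` and `ContainerId` records, and this sketch does not handle other ID variants.

```java
// Hypothetical, illustration-only helper. Assumed Hadoop 2.x layouts:
//   application_<clusterTimestamp>_<appSeq, 4 digits>
//   appattempt_<clusterTimestamp>_<appSeq>_<attempt, 6 digits>
//   container_<clusterTimestamp>_<appSeq>_<attempt, 2 digits>_<container, 6 digits>
final class YarnIds {
  private YarnIds() {}

  // Splits "application_<ts>_<seq>" into its three underscore-separated parts.
  private static String[] parts(String appId) {
    String[] p = appId.split("_");
    if (p.length != 3 || !p[0].equals("application")) {
      throw new IllegalArgumentException("Not an application ID: " + appId);
    }
    return p;
  }

  // Application attempt ID, e.g. for "yarn container -list".
  static String attemptId(String appId, int attempt) {
    String[] p = parts(appId);
    return String.format("appattempt_%s_%s_%06d", p[1], p[2], attempt);
  }

  // Container ID, e.g. for "yarn container -status".
  static String containerId(String appId, int attempt, long container) {
    String[] p = parts(appId);
    return String.format("container_%s_%s_%02d_%06d", p[1], p[2], attempt, container);
  }
}
```

For example, `YarnIds.attemptId("application_1326821518301_0010", 1)` returns `appattempt_1326821518301_0010_000001`, which could be passed to `yarn container -list`.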
+
+Publishing of per-framework data by applications
+------------------------------------------------
+
+Developers can define what information they want to record for their applications by composing `TimelineEntity` and `TimelineEvent` objects, and posting the entities and events to the Timeline server via `TimelineClient`. The following is an example:
+
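+```java
+import java.io.IOException;
+import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
+import org.apache.hadoop.yarn.api.records.timeline.TimelineEvent;
+import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
+import org.apache.hadoop.yarn.client.api.TimelineClient;
+import org.apache.hadoop.yarn.conf.YarnConfiguration;
+import org.apache.hadoop.yarn.exceptions.YarnException;
+
+// Create and start the Timeline client
+TimelineClient client = TimelineClient.createTimelineClient();
+client.init(new YarnConfiguration());
+client.start();
+
+// Compose the entity; the type and ID strings here are only illustrative
+TimelineEntity entity = new TimelineEntity();
+entity.setEntityType("MY_APPLICATION");
+entity.setEntityId("my_entity_1");
+entity.setStartTime(System.currentTimeMillis());
+TimelineEvent event = new TimelineEvent();
+event.setEventType("MY_EVENT");
+event.setTimestamp(System.currentTimeMillis());
+entity.addEvent(event);
+
+try {
+  // Entities rejected by the server are reported via response.getErrors()
+  TimelinePutResponse response = client.putEntities(entity);
+} catch (IOException | YarnException e) {
+  // Handle the exception
+} finally {
+  // Stop the Timeline client
+  client.stop();
+}
+```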

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebApplicationProxy.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebApplicationProxy.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebApplicationProxy.md
new file mode 100644
index 0000000..8d6187d
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebApplicationProxy.md
@@ -0,0 +1,24 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Web Application Proxy
+=====================
+
+The Web Application Proxy is part of YARN. By default it runs as part of the ResourceManager (RM), but it can be configured to run in standalone mode. The purpose of the proxy is to reduce the possibility of web-based attacks through YARN.
+
+In YARN, the ApplicationMaster (AM) is responsible for providing a web UI and sending that link to the RM. This opens up a number of potential issues. The RM runs as a trusted user, and people visiting that web address will treat it, and the links it provides to them, as trusted, when in reality the AM is running as a non-trusted user, and the links it gives to the RM could point to anything, malicious or otherwise. The Web Application Proxy mitigates this risk by warning users who do not own the given application that they are connecting to an untrusted site.
+
+In addition, the proxy tries to reduce the impact that a malicious AM could have on a user. It primarily does this by stripping out the user's cookies and replacing them with a single cookie providing the user name of the logged-in user. This is because most web-based authentication systems identify a user based on a cookie, and passing that cookie to an untrusted application opens up the potential for an exploit. If the cookie is designed properly, that potential should be fairly minimal, but stripping it further reduces that attack vector. The current proxy implementation does nothing to prevent the AM from providing links to malicious external sites, nor does it do anything to prevent malicious JavaScript code from running. In fact, JavaScript can be used to get the cookies, so stripping the cookies from the request has minimal benefit at this time.
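Conceptually, the cookie handling described above is a rewrite of the request headers before they are forwarded to the AM. The Java sketch below only illustrates that idea and is not the proxy's actual implementation; the `ProxyHeaders` class and the `proxy-user` cookie name are hypothetical placeholders. As the text notes, this kind of rewrite offers no protection against JavaScript running in the AM's pages.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual sketch only: not the Web Application Proxy's real code.
final class ProxyHeaders {
  private ProxyHeaders() {}

  static final String USER_COOKIE = "proxy-user"; // hypothetical cookie name

  // Returns the headers to forward to the AM: every incoming Cookie header is
  // dropped and replaced by a single cookie carrying the logged-in user name.
  static Map<String, String> forwardHeaders(Map<String, String> incoming, String remoteUser) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : incoming.entrySet()) {
      // HTTP header names are case-insensitive.
      if (!e.getKey().equalsIgnoreCase("Cookie")) {
        out.put(e.getKey(), e.getValue());
      }
    }
    if (remoteUser != null && !remoteUser.isEmpty()) {
      out.put("Cookie", USER_COOKIE + "=" + remoteUser);
    }
    return out;
  }
}
```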
+
+In the future we hope to address the attack vectors described above and make attaching to an AM's web UI safer.

http://git-wip-us.apache.org/repos/asf/hadoop/blob/2e44b75f/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
----------------------------------------------------------------------
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
new file mode 100644
index 0000000..0e89a50
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/WebServicesIntro.md
@@ -0,0 +1,569 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Hadoop YARN - Introduction to the web services REST APIs
+==========================================================
+
+* [Overview](#Overview)
+* [URIs](#URIs)
+* [HTTP Requests](#HTTP_Requests)
+    * [Summary of HTTP operations](#Summary_of_HTTP_operations)
+    * [Security](#Security)
+    * [Headers Supported](#Headers_Supported)
+* [HTTP Responses](#HTTP_Responses)
+    * [Compression](#Compression)
+    * [Response Formats](#Response_Formats)
+    * [Response Errors](#Response_Errors)
+    * [Response Examples](#Response_Examples)
+* [Sample Usage](#Sample_Usage)
+
+Overview
+--------
+
+The Hadoop YARN web service REST APIs are a set of URI resources that give access to the cluster, nodes, applications, and application historical information. The URI resources are grouped into APIs based on the type of information returned. Some URI resources return collections while others return singletons.
+
+URIs
+-----
+
+The URIs for the REST-based Web services have the following syntax:
+
+      http://{http address of service}/ws/{version}/{resourcepath}
+
+The elements in this syntax are as follows:
+
+      {http address of service} - The http address of the service to get information about.
+                                  Currently supported are the ResourceManager, NodeManager,
+                                  MapReduce application master, and history server.
+      {version} - The version of the APIs. In this release, the version is v1.
+      {resourcepath} - A path that defines a singleton resource or a collection of resources.
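Because the syntax is fixed, a client can assemble these URIs mechanically. A minimal Java sketch using `java.net.URI`; `YarnRestUri` is a hypothetical helper, and it assumes plain HTTP (a cluster serving its web endpoints over HTTPS would need a different scheme):

```java
import java.net.URI;

// Hypothetical helper: builds http://{http address of service}/ws/{version}/{resourcepath}
final class YarnRestUri {
  private YarnRestUri() {}

  static URI build(String httpAddress, String version, String resourcePath) {
    // Accept both "cluster/apps" and "/cluster/apps" as the resource path.
    String path = resourcePath.replaceFirst("^/+", "");
    // URI.create throws IllegalArgumentException for a malformed result.
    return URI.create("http://" + httpAddress + "/ws/" + version + "/" + path);
  }
}
```

For example, `YarnRestUri.build("rmhost.domain:8088", "v1", "cluster/apps")` yields `http://rmhost.domain:8088/ws/v1/cluster/apps`.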
+
+HTTP Requests
+-------------
+
+To invoke a REST API, your application calls an HTTP operation on the URI associated with a resource.
+
+### Summary of HTTP operations
+
+Currently only GET is supported. It retrieves information about the resource specified.
+
+### Security
+
+The web service REST APIs go through the same security as the web UI. If your cluster administrators have filters enabled, you must authenticate via the mechanism they specified.
+
+### Headers Supported
+
+* `Accept`
+* `Accept-Encoding`
+
+Currently the only header fields used are `Accept` and `Accept-Encoding`. `Accept` currently supports XML and JSON as the accepted response type. `Accept-Encoding` currently supports only the gzip format and returns gzip-compressed output if it is specified; otherwise, output is uncompressed. All other header fields are ignored.
+
+HTTP Responses
+--------------
+
+The next few sections describe some of the syntax and other details of the HTTP Responses of the web service REST APIs.
+
+### Compression
+
+This release supports gzip compression if you specify gzip in the Accept-Encoding header of the HTTP request (`Accept-Encoding: gzip`).
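On the client side, the body only needs to be decompressed when the server actually marked it as gzip. A Java sketch using the JDK's `GZIPInputStream`; `ResponseBodies` is a hypothetical helper, and it assumes a UTF-8 response body:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: reads a response body, gunzipping it only when the
// Content-Encoding response header says "gzip".
final class ResponseBodies {
  private ResponseBodies() {}

  static String read(InputStream body, String contentEncoding) throws IOException {
    InputStream in = "gzip".equalsIgnoreCase(contentEncoding) ? new GZIPInputStream(body) : body;
    try (InputStream src = in) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      for (int n; (n = src.read(buf)) != -1; ) {
        out.write(buf, 0, n);
      }
      return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
  }
}
```

With `java.net.HttpURLConnection`, a client would call `setRequestProperty("Accept-Encoding", "gzip")` before connecting, then pass `getInputStream()` and `getContentEncoding()` to `ResponseBodies.read`.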
+
+### Response Formats
+
+This release of the web service REST APIs supports responses in JSON and XML formats. JSON is the default. To set the response format, you can specify the format in the Accept header of the HTTP request.
+
+As specified in HTTP Response Codes, the response body can contain the data that represents the resource or an error message. In the case of success, the response body is in the selected format, either JSON or XML. In the case of error, the response body is in either JSON or XML based on the format requested. The Content-Type header of the response contains the format requested. If the application requests an unsupported format, the response status code is 500. Note that the order of the fields within the response body is not specified and might change. Also, additional fields might be added to a response body. Therefore, your applications should use parsing routines that can extract data from a response body in any order.
+
+### Response Errors
+
+After calling an HTTP request, an application should check the response status code to verify success or detect an error. If the response status code indicates an error, the response body contains an error message. The first field is the exception type; currently only RemoteException is returned. The following table lists the items within the RemoteException error message:
+
+| Item | Data Type | Description |
+|:---- |:---- |:---- |
+| exception | String | Exception type |
+| javaClassName | String | Java class name of exception |
+| message | String | Detailed message of exception |
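Because the order of fields in a response body is unspecified, error fields are best looked up by name. The Java sketch below uses only the JDK's DOM parser. It assumes the client requested XML (`Accept: application/xml`) and that the XML error body carries the same `exception`, `javaClassName` and `message` items as in the table, wrapped in a `RemoteException` element. `RemoteErrors` is a hypothetical helper, and treating every 4xx/5xx status as an error is a simplifying choice of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads RemoteException items by element name, so the
// result does not depend on the order in which the fields appear.
final class RemoteErrors {
  private RemoteErrors() {}

  // Simplification: 4xx and 5xx responses are treated as errors.
  static boolean isError(int statusCode) {
    return statusCode >= 400;
  }

  // Returns the text of the first element with the given name, or null.
  static String field(String xmlBody, String name) throws Exception {
    DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
    // Precaution: refuse DOCTYPE declarations in bodies received over the network.
    f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
    Document doc = f.newDocumentBuilder().parse(
        new ByteArrayInputStream(xmlBody.getBytes(StandardCharsets.UTF_8)));
    NodeList nodes = doc.getElementsByTagName(name);
    return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
  }
}
```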
+
+### Response Examples
+
+#### JSON response with single resource
+
+HTTP Request: GET http://rmhost.domain:8088/ws/v1/cluster/apps/application\_1324057493980\_0001
+
+Response Status Line: HTTP/1.1 200 OK
+
+Response Header:
+
+      HTTP/1.1 200 OK
+      Content-Type: application/json
+      Transfer-Encoding: chunked
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```json
+{
+  "app":
+  {
+    "id":"application_1324057493980_0001",
+    "user":"user1",
+    "name":"",
+    "queue":"default",
+    "state":"ACCEPTED",
+    "finalStatus":"UNDEFINED",
+    "progress":0,
+    "trackingUI":"UNASSIGNED",
+    "diagnostics":"",
+    "clusterId":1324057493980,
+    "startedTime":1324057495921,
+    "finishedTime":0,
+    "elapsedTime":2063,
+    "amContainerLogs":"http:\/\/amNM:2\/node\/containerlogs\/container_1324057493980_0001_01_000001",
+    "amHostHttpAddress":"amNM:2"
+  }
+}
+```
+
+#### JSON response with Error response
+
+Here we request information about an application that doesn't exist yet.
+
+HTTP Request: GET http://rmhost.domain:8088/ws/v1/cluster/apps/application\_1324057493980\_9999
+
+Response Status Line: HTTP/1.1 404 Not Found
+
+Response Header:
+
+      HTTP/1.1 404 Not Found
+      Content-Type: application/json
+      Transfer-Encoding: chunked
+      Server: Jetty(6.1.26)
+
+Response Body:
+
+```json
+{
+   "RemoteException" : {
+      "javaClassName" : "org.apache.hadoop.yarn.webapp.NotFoundException",
+      "exception" : "NotFoundException",
+      "message" : "java.lang.Exception: app with id: application_1324057493980_9999 not found"
+   }
+}
+```
+
+Sample Usage
+-------------
+
+You can use the web services REST APIs from any number of tools or languages. This example uses the curl command-line interface to do the REST GET calls.
+
+In this example, a user submits a MapReduce application to the ResourceManager using a command like:
+
+      hadoop jar hadoop-mapreduce-test.jar sleep -Dmapred.job.queue.name=a1 -m 1 -r 1 -rt 1200000 -mt 20
+
+The client prints information about the job submitted along with the application id, similar to:
+
+    12/01/18 04:25:15 INFO mapred.ResourceMgrDelegate: Submitted application application_1326821518301_0010 to ResourceManager at host.domain.com/10.10.10.10:8032
+    12/01/18 04:25:15 INFO mapreduce.Job: Running job: job_1326821518301_0010
+    12/01/18 04:25:21 INFO mapred.ClientServiceDelegate: The url to track the job: host.domain.com:8088/proxy/application_1326821518301_0010/
+    12/01/18 04:25:22 INFO mapreduce.Job: Job job_1326821518301_0010 running in uber mode : false
+    12/01/18 04:25:22 INFO mapreduce.Job:  map 0% reduce 0%
+
+The user then wishes to track the application. The user starts by getting the information about the application from the ResourceManager, using the `--compressed` option to request compressed output. curl handles decompression on the client side.
+
+    curl --compressed -H "Accept: application/json" -X GET "http://host.domain.com:8088/ws/v1/cluster/apps/application_1326821518301_0010"
+
+Output:
+
+```json
+{
+   "app" : {
+      "finishedTime" : 0,
+      "amContainerLogs" : "http://host.domain.com:8042/node/containerlogs/container_1326821518301_0010_01_000001",
+      "trackingUI" : "ApplicationMaster",
+      "state" : "RUNNING",
+      "user" : "user1",
+      "id" : "application_1326821518301_0010",
+      "clusterId" : 1326821518301,
+      "finalStatus" : "UNDEFINED",
+      "amHostHttpAddress" : "host.domain.com:8042",
+      "progress" : 82.44703,
+      "name" : "Sleep job",
+      "startedTime" : 1326860715335,
+      "elapsedTime" : 31814,
+      "diagnostics" : "",
+      "trackingUrl" : "http://host.domain.com:8088/proxy/application_1326821518301_0010/",
+      "queue" : "a1"
+   }
+}
+```
+
+The user then wishes to get more details about the running application and goes directly to the MapReduce application master for this application. The ResourceManager lists the trackingUrl that can be used for this application: http://host.domain.com:8088/proxy/application\_1326821518301\_0010. This URL can be opened in a web browser or used with the web services REST APIs. The user uses the web services REST APIs to get the list of jobs this MapReduce application master is running:
+
+     curl --compressed -H "Accept: application/json" -X GET "http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs"
+
+Output:
+
+```json
+{
+   "jobs" : {
+      "job" : [
+         {
+            "runningReduceAttempts" : 1,
+            "reduceProgress" : 72.104515,
+            "failedReduceAttempts" : 0,
+            "newMapAttempts" : 0,
+            "mapsRunning" : 0,
+            "state" : "RUNNING",
+            "successfulReduceAttempts" : 0,
+            "reducesRunning" : 1,
+            "acls" : [
+               {
+                  "value" : " ",
+                  "name" : "mapreduce.job.acl-modify-job"
+               },
+               {
+                  "value" : " ",
+                  "name" : "mapreduce.job.acl-view-job"
+               }
+            ],
+            "reducesPending" : 0,
+            "user" : "user1",
+            "reducesTotal" : 1,
+            "mapsCompleted" : 1,
+            "startTime" : 1326860720902,
+            "id" : "job_1326821518301_10_10",
+            "successfulMapAttempts" : 1,
+            "runningMapAttempts" : 0,
+            "newReduceAttempts" : 0,
+            "name" : "Sleep job",
+            "mapsPending" : 0,
+            "elapsedTime" : 64432,
+            "reducesCompleted" : 0,
+            "mapProgress" : 100,
+            "diagnostics" : "",
+            "failedMapAttempts" : 0,
+            "killedReduceAttempts" : 0,
+            "mapsTotal" : 1,
+            "uberized" : false,
+            "killedMapAttempts" : 0,
+            "finishTime" : 0
+         }
+      ]
+   }
+}
+```
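The jobs request above appends `ws/v1/mapreduce/jobs` to the application's trackingUrl. Since a trackingUrl may or may not end in `/`, a client composing such URLs should normalize the separator. A hypothetical Java helper, assuming the MapReduce application master's `ws/v1/mapreduce` prefix used in these examples:

```java
// Hypothetical helper: joins an RM-reported trackingUrl with a resource path
// under the MapReduce application master's ws/v1/mapreduce prefix.
final class AmRestUrls {
  private AmRestUrls() {}

  static String amResource(String trackingUrl, String resourcePath) {
    // Ensure exactly one '/' between the tracking URL and the REST prefix.
    String base = trackingUrl.endsWith("/") ? trackingUrl : trackingUrl + "/";
    String path = resourcePath.startsWith("/") ? resourcePath.substring(1) : resourcePath;
    return base + "ws/v1/mapreduce/" + path;
  }
}
```

For example, `AmRestUrls.amResource(trackingUrl, "jobs/job_1326821518301_10_10/tasks")` produces the task-list URL requested in the next step.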
+
+The user then wishes to get the task details about the job with job id job\_1326821518301\_10\_10 that was listed above.
+
+     curl --compressed -H "Accept: application/json" -X GET "http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs/job_1326821518301_10_10/tasks"
+
+Output:
+
+```json
+{
+   "tasks" : {
+      "task" : [
+         {
+            "progress" : 100,
+            "elapsedTime" : 5059,
+            "state" : "SUCCEEDED",
+            "startTime" : 1326860725014,
+            "id" : "task_1326821518301_10_10_m_0",
+            "type" : "MAP",
+            "successfulAttempt" : "attempt_1326821518301_10_10_m_0_0",
+            "finishTime" : 1326860730073
+         },
+         {
+            "progress" : 72.104515,
+            "elapsedTime" : 0,
+            "state" : "RUNNING",
+            "startTime" : 1326860732984,
+            "id" : "task_1326821518301_10_10_r_0",
+            "type" : "REDUCE",
+            "successfulAttempt" : "",
+            "finishTime" : 0
+         }
+      ]
+   }
+}
+```
+
+The map task has finished but the reduce task is still running. The user wishes to get the task attempt information for the reduce task task\_1326821518301\_10\_10\_r\_0. Note that the Accept header isn't required here, since JSON is the default output format:
+
+      curl --compressed -X GET "http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs/job_1326821518301_10_10/tasks/task_1326821518301_10_10_r_0/attempts"
+
+Output:
+
+```json
+{
+   "taskAttempts" : {
+      "taskAttempt" : [
+         {
+            "elapsedMergeTime" : 158,
+            "shuffleFinishTime" : 1326860735378,
+            "assignedContainerId" : "container_1326821518301_0010_01_000003",
+            "progress" : 72.104515,
+            "elapsedTime" : 0,
+            "state" : "RUNNING",
+            "elapsedShuffleTime" : 2394,
+            "mergeFinishTime" : 1326860735536,
+            "rack" : "/10.10.10.0",
+            "elapsedReduceTime" : 0,
+            "nodeHttpAddress" : "host.domain.com:8042",
+            "type" : "REDUCE",
+            "startTime" : 1326860732984,
+            "id" : "attempt_1326821518301_10_10_r_0_0",
+            "finishTime" : 0
+         }
+      ]
+   }
+}
+```
+
+The reduce attempt is still running and the user wishes to see the current counter values for that attempt:
+
+     curl --compressed -H "Accept: application/json" -X GET "http://host.domain.com:8088/proxy/application_1326821518301_0010/ws/v1/mapreduce/jobs/job_1326821518301_10_10/tasks/task_1326821518301_10_10_r_0/attempts/attempt_1326821518301_10_10_r_0_0/counters"
+
+Output:
+
+```json
+{
+   "JobTaskAttemptCounters" : {
+      "taskAttemptCounterGroup" : [
+         {
+            "counterGroupName" : "org.apache.hadoop.mapreduce.FileSystemCounter",
+            "counter" : [
+               {
+                  "value" : 4216,
+                  "name" : "FILE_BYTES_READ"
+               }, 
+               {
+                  "value" : 77151,
+                  "name" : "FILE_BYTES_WRITTEN"
+               }, 
+               {
+                  "value" : 0,
+                  "name" : "FILE_READ_OPS"
+               },
+               {
+                  "value" : 0,
+                  "name" : "FILE_LARGE_READ_OPS"
+               },
+               {
+                  "value" : 0,
+                  "name" : "FILE_WRITE_OPS"
+               },
+               {
+                  "value" : 0,
+                  "name" : "HDFS_BYTES_READ"
+               },
+               {
+                  "value" : 0,
+                  "name" : "HDFS_BYTES_WRITTEN"
+               },
+               {
+                  "value" : 0,
+                  "name" : "HDFS_READ_OPS"
+               },
+               {
+                  "value" : 0,
+                  "name" : "HDFS_LARGE_READ_OPS"
+               },
+               {
+                  "value" : 0,
+                  "name" : "HDFS_WRITE_OPS"
+               }
+            ]  
+         }, 
+         {
+            "counterGroupName" : "org.apache.hadoop.mapreduce.TaskCounter",
+            "counter" : [
+               {
+                  "value" : 0,
+                  "name" : "COMBINE_INPUT_RECORDS"
+               }, 
+               {
+                  "value" : 0,
+                  "name" : "COMBINE_OUTPUT_RECORDS"
+               }, 
+               {  
+                  "value" : 1767,
+                  "name" : "REDUCE_INPUT_GROUPS"
+               },
+               {  
+                  "value" : 25104,
+                  "name" : "REDUCE_SHUFFLE_BYTES"
+               },
+               {
+                  "value" : 1767,
+                  "name" : "REDUCE_INPUT_RECORDS"
+               },
+               {
+                  "value" : 0,
+                  "name" : "REDUCE_OUTPUT_RECORDS"
+               },
+               {
+                  "value" : 0,
+                  "name" : "SPILLED_RECORDS"
+               },
+               {
+                  "value" : 1,
+                  "name" : "SHUFFLED_MAPS"
+               },
+               {
+                  "value" : 0,
+                  "name" : "FAILED_SHUFFLE"
+               },
+               {
+                  "value" : 1,
+                  "name" : "MERGED_MAP_OUTPUTS"
+               },
+               {
+                  "value" : 50,
+                  "name" : "GC_TIME_MILLIS"
+               },
+               {
+                  "value" : 1580,
+                  "name" : "CPU_MILLISECONDS"
+               },
+               {
+                  "value" : 141320192,
+                  "name" : "PHYSICAL_MEMORY_BYTES"
+               },
+              {
+                  "value" : 1118552064,
+                  "name" : "VIRTUAL_MEMORY_BYTES"
+               }, 
+               {  
+                  "value" : 73728000,
+                  "name" : "COMMITTED_HEAP_BYTES"
+               }
+            ]
+         },
+         {  
+            "counterGroupName" : "Shuffle Errors",
+            "counter" : [
+               {  
+                  "value" : 0,
+                  "name" : "BAD_ID"
+               },
+               {  
+                  "value" : 0,
+                  "name" : "CONNECTION"
+               },
+               {  
+                  "value" : 0,
+                  "name" : "IO_ERROR"
+               },
+               {  
+                  "value" : 0,
+                  "name" : "WRONG_LENGTH"
+               },
+               {  
+                  "value" : 0,
+                  "name" : "WRONG_MAP"
+               },
+               {  
+                  "value" : 0,
+                  "name" : "WRONG_REDUCE"
+               }
+            ]
+         },
+         {  
+            "counterGroupName" : "org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter",
+            "counter" : [
+              {  
+                  "value" : 0,
+                  "name" : "BYTES_WRITTEN"
+               }
+            ]
+         }
+      ],
+      "id" : "attempt_1326821518301_10_10_r_0_0"
+   }
+}
+```
+
+The job finishes and the user wishes to get the final job information from the history server for this job.
+
+      curl --compressed -X GET "http://host.domain.com:19888/ws/v1/history/mapreduce/jobs/job_1326821518301_10_10"
+
+Output:
+
+```json
+{
+   "job" : {
+      "avgReduceTime" : 1250784,
+      "failedReduceAttempts" : 0,
+      "state" : "SUCCEEDED",
+      "successfulReduceAttempts" : 1,
+      "acls" : [
+         {
+            "value" : " ",
+            "name" : "mapreduce.job.acl-modify-job"
+         },
+         {
+            "value" : " ",
+            "name" : "mapreduce.job.acl-view-job"
+         }
+      ],
+      "user" : "user1",
+      "reducesTotal" : 1,
+      "mapsCompleted" : 1,
+      "startTime" : 1326860720902,
+      "id" : "job_1326821518301_10_10",
+      "avgMapTime" : 5059,
+      "successfulMapAttempts" : 1,
+      "name" : "Sleep job",
+      "avgShuffleTime" : 2394,
+      "reducesCompleted" : 1,
+      "diagnostics" : "",
+      "failedMapAttempts" : 0,
+      "avgMergeTime" : 2552,
+      "killedReduceAttempts" : 0,
+      "mapsTotal" : 1,
+      "queue" : "a1",
+      "uberized" : false,
+      "killedMapAttempts" : 0,
+      "finishTime" : 1326861986164
+   }
+}
+```
+
+The user also gets the final application information from the ResourceManager.
+
+      curl --compressed -H "Accept: application/json" -X GET "http://host.domain.com:8088/ws/v1/cluster/apps/application_1326821518301_0010"
+
+Output:
+
+```json
+{
+   "app" : {
+      "finishedTime" : 1326861991282,
+      "amContainerLogs" : "http://host.domain.com:8042/node/containerlogs/container_1326821518301_0010_01_000001",
+      "trackingUI" : "History",
+      "state" : "FINISHED",
+      "user" : "user1",
+      "id" : "application_1326821518301_0010",
+      "clusterId" : 1326821518301,
+      "finalStatus" : "SUCCEEDED",
+      "amHostHttpAddress" : "host.domain.com:8042",
+      "progress" : 100,
+      "name" : "Sleep job",
+      "startedTime" : 1326860715335,
+      "elapsedTime" : 1275947,
+      "diagnostics" : "",
+      "trackingUrl" : "http://host.domain.com:8088/proxy/application_1326821518301_0010/jobhistory/job/job_1326821518301_10_10",
+      "queue" : "a1"
+   }
+}
+```
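In this final output, `elapsedTime` equals `finishedTime - startedTime`: 1326861991282 - 1326860715335 = 1275947 ms, a little over 21 minutes. In the earlier RUNNING output, `finishedTime` is 0 and `elapsedTime` appears to be measured up to the time of the request. The Java sketch below encodes that relationship; it is inferred from these sample outputs rather than taken from YARN's source, and `AppTimes` is a hypothetical name:

```java
// Sketch inferred from the sample outputs (not YARN's implementation):
// how elapsedTime relates to startedTime and finishedTime.
final class AppTimes {
  private AppTimes() {}

  static long elapsed(long startedTime, long finishedTime, long now) {
    // finishedTime is 0 while the application is still running,
    // so fall back to the current time in that case.
    long end = finishedTime > 0 ? finishedTime : now;
    return end - startedTime;
  }
}
```

For the finished application above, `AppTimes.elapsed(1326860715335L, 1326861991282L, System.currentTimeMillis())` returns 1275947, matching the reported `elapsedTime`.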
\ No newline at end of file
