Repository: hadoop Updated Branches: refs/heads/trunk c3d475070 -> 1e215e8ba
YARN-2632. Document NM Restart feature. Contributed by Junping Du and Vinod Kumar Vavilapalli Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/1e215e8b Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/1e215e8b Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/1e215e8b Branch: refs/heads/trunk Commit: 1e215e8ba2e801eb26f16c307daee756d6b2ca66 Parents: c3d4750 Author: Jason Lowe <jl...@apache.org> Authored: Fri Nov 7 23:40:22 2014 +0000 Committer: Jason Lowe <jl...@apache.org> Committed: Fri Nov 7 23:40:22 2014 +0000 ---------------------------------------------------------------------- hadoop-project/src/site/site.xml | 1 + hadoop-yarn-project/CHANGES.txt | 3 + .../src/site/apt/NodeManagerRestart.apt.vm | 86 ++++++++++++++++++++ 3 files changed, 90 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/hadoop/blob/1e215e8b/hadoop-project/src/site/site.xml ---------------------------------------------------------------------- diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml index e1d4c92..6a61a83 100644 --- a/hadoop-project/src/site/site.xml +++ b/hadoop-project/src/site/site.xml @@ -123,6 +123,7 @@ <item name="Writing YARN Applications" href="hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html"/> <item name="YARN Commands" href="hadoop-yarn/hadoop-yarn-site/YarnCommands.html"/> <item name="Scheduler Load Simulator" href="hadoop-sls/SchedulerLoadSimulator.html"/> + <item name="NodeManager Restart" href="hadoop-yarn/hadoop-yarn-site/NodeManagerRestart.html"/> </menu> <menu name="YARN REST APIs" inherit="top"> http://git-wip-us.apache.org/repos/asf/hadoop/blob/1e215e8b/hadoop-yarn-project/CHANGES.txt ---------------------------------------------------------------------- diff --git a/hadoop-yarn-project/CHANGES.txt b/hadoop-yarn-project/CHANGES.txt index e4b116d..df7e3ea 100644 --- a/hadoop-yarn-project/CHANGES.txt +++ b/hadoop-yarn-project/CHANGES.txt @@ -195,6 +195,9 @@ Release 2.6.0 - UNRELEASED YARN-2647. Added a queue CLI for getting queue information. (Sunil Govind via vinodkv) + YARN-2632. Document NM Restart feature. (Junping Du and Vinod Kumar + Vavilapalli via jlowe) + IMPROVEMENTS YARN-2197. Add a link to YARN CHANGES.txt in the left side of doc http://git-wip-us.apache.org/repos/asf/hadoop/blob/1e215e8b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm ---------------------------------------------------------------------- diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm new file mode 100644 index 0000000..ba03f4e --- /dev/null +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRestart.apt.vm @@ -0,0 +1,86 @@ +~~ Licensed under the Apache License, Version 2.0 (the "License"); +~~ you may not use this file except in compliance with the License. +~~ You may obtain a copy of the License at +~~ +~~ http://www.apache.org/licenses/LICENSE-2.0 +~~ +~~ Unless required by applicable law or agreed to in writing, software +~~ distributed under the License is distributed on an "AS IS" BASIS, +~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +~~ See the License for the specific language governing permissions and +~~ limitations under the License. See accompanying LICENSE file. + + --- + NodeManager Restart + --- + --- + ${maven.build.timestamp} + +NodeManager Restart + +* Introduction + + This document gives an overview of NodeManager (NM) restart, a feature that + enables the NodeManager to be restarted without losing + the active containers running on the node. At a high level, the NM stores any + necessary state to a local state-store as it processes container-management + requests. When the NM restarts, it recovers by first loading state for + various subsystems and then letting those subsystems perform recovery using + the loaded state. + +* Enabling NM Restart + + [[1]] To enable NM Restart functionality, set the following property in <<conf/yarn-site.xml>> to true: + +*--------------------------------------+--------------------------------------+ +|| Property || Value | +*--------------------------------------+--------------------------------------+ +| <<<yarn.nodemanager.recovery.enabled>>> | | +| | <<<true>>>, (default value is set to false) | +*--------------------------------------+--------------------------------------+ + + [[2]] Configure a path to the local file-system directory where the + NodeManager can save its run state + +*--------------------------------------+--------------------------------------+ +|| Property || Description | +*--------------------------------------+--------------------------------------+ +| <<<yarn.nodemanager.recovery.dir>>> | | +| | The local filesystem directory in which the node manager will store state | +| | when recovery is enabled. | +| | The default value is set to | +| | <<<${hadoop.tmp.dir}/yarn-nm-recovery>>>. | +*--------------------------------------+--------------------------------------+ + + [[3]] Configure a valid RPC address for the NodeManager + +*--------------------------------------+--------------------------------------+ +|| Property || Description | +*--------------------------------------+--------------------------------------+ +| <<<yarn.nodemanager.address>>> | | +| | Ephemeral ports (port 0, which is default) cannot be used for the | +| | NodeManager's RPC server specified via yarn.nodemanager.address as it can | +| | make NM use different ports before and after a restart. This will break any | +| | previously running clients that were communicating with the NM before | +| | restart. Explicitly setting yarn.nodemanager.address to an address with | +| | specific port number (for e.g 0.0.0.0:45454) is a precondition for enabling | +| | NM restart. | +*--------------------------------------+--------------------------------------+ + + [[4]] Auxiliary services + + NodeManagers in a YARN cluster can be configured to run auxiliary services. + For a completely functional NM restart, YARN relies on any auxiliary service + configured to also support recovery. This usually includes (1) avoiding usage + of ephemeral ports so that previously running clients (in this case, usually + containers) are not disrupted after restart and (2) having the auxiliary + service itself support recoverability by reloading any previous state when + NodeManager restarts and reinitializes the auxiliary service. + + A simple example for the above is the auxiliary service 'ShuffleHandler' for + MapReduce (MR). ShuffleHandler respects the above two requirements already, + so users/admins don't have do anything for it to support NM restart: (1) The + configuration property <<mapreduce.shuffle.port>> controls which port the + ShuffleHandler on a NodeManager host binds to, and it defaults to a + non-ephemeral port. (2) The ShuffleHandler service also already supports + recovery of previous state after NM restarts. \ No newline at end of file