Author: stevel Date: Wed Mar 25 17:59:26 2015 New Revision: 1669188 URL: http://svn.apache.org/r1669188 Log: SLIDER-824 Update documentation with new placement configuration options
Added: incubator/slider/site/trunk/content/developing/chaosmonkey.md - copied unchanged from r1668701, incubator/slider/site/trunk/content/docs/slider_specs/chaosmonkey.md incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md incubator/slider/site/trunk/content/docs/configuration/internal.md incubator/slider/site/trunk/content/docs/configuration/resource_specification.md - copied, changed from r1668701, incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md incubator/slider/site/trunk/content/docs/configuration/revision-1/ incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md - copied, changed from r1660946, incubator/slider/site/trunk/content/docs/configuration/index.md incubator/slider/site/trunk/content/docs/configuration/revision-1/original-hbase.json - copied unchanged from r1660946, incubator/slider/site/trunk/content/docs/configuration/original-hbase.json incubator/slider/site/trunk/content/docs/configuration/revision-1/proposed-hbase.json - copied unchanged from r1660946, incubator/slider/site/trunk/content/docs/configuration/proposed-hbase.json incubator/slider/site/trunk/content/docs/configuration/revision-1/redesign.md - copied unchanged from r1660946, incubator/slider/site/trunk/content/docs/configuration/redesign.md incubator/slider/site/trunk/content/docs/configuration/revision-1/specification.md - copied unchanged from r1660946, incubator/slider/site/trunk/content/docs/configuration/specification.md Removed: incubator/slider/site/trunk/content/docs/configuration/original-hbase.json incubator/slider/site/trunk/content/docs/configuration/proposed-hbase.json incubator/slider/site/trunk/content/docs/configuration/redesign.md incubator/slider/site/trunk/content/docs/configuration/specification.md incubator/slider/site/trunk/content/docs/slider_specs/chaosmonkey.md incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md Modified: 
incubator/slider/site/trunk/content/design/rolehistory.md incubator/slider/site/trunk/content/developing/index.md incubator/slider/site/trunk/content/docs/client-configuration.md incubator/slider/site/trunk/content/docs/configuration/core.md incubator/slider/site/trunk/content/docs/configuration/index.md incubator/slider/site/trunk/content/docs/examples.md incubator/slider/site/trunk/content/docs/getting_started.md incubator/slider/site/trunk/content/docs/high_availability.md incubator/slider/site/trunk/content/docs/security.md incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md incubator/slider/site/trunk/content/docs/slider_specs/index.md incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md incubator/slider/site/trunk/content/docs/ssl.md Modified: incubator/slider/site/trunk/content/design/rolehistory.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/design/rolehistory.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/design/rolehistory.md (original) +++ incubator/slider/site/trunk/content/design/rolehistory.md Wed Mar 25 17:59:26 2015 @@ -35,20 +35,23 @@ that have reached their escalation timeo 1. Such requests are cancelled and "relaxed" requests re-issued. 1. Labels are always respected; even relaxed requests use any labels specified in `resources.json` 1. If a node is considered unreliable (as per-the slider 0.70 changes), it is not used in the initial -request. +request. YARN may still allocate relaxed instances on such nodes. That is: there is no explicit +blacklisting, merely deliberate exclusion of unreliable nodes from explicitly placed requests. 
-#### `strict` placement +#### Placement policies + +`strict` placement Again, "strict placement" has a different policy: once a component has been deployed on a node, one component request will be made against that node, even if it is considered unreliable. No relaxation of the request will ever take place. -#### `none` placement +`none` placement If the placement policy is "none", the request will always be relaxed. While tracking of recent failure counts takes place, it is not used in placement requests. -#### `anti-affine` placement +`anti-affine` placement There's still no explicit support for this in YARN or slider. As noted above, Slider does try to spread placement when rebuilding an application, but otherwise it accepts which Modified: incubator/slider/site/trunk/content/developing/index.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/developing/index.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/developing/index.md (original) +++ incubator/slider/site/trunk/content/developing/index.md Wed Mar 25 17:59:26 2015 @@ -33,6 +33,7 @@ Slider * [Submitting Patches](submitting_patches.html) * [Windows Development and Testing](windows.html) * [Demo Script](demo.html) +* [Configuring the Slider Chaos Monkey](chaosmonkey.html) ## Historical Documents Modified: incubator/slider/site/trunk/content/docs/client-configuration.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/client-configuration.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/client-configuration.md (original) +++ incubator/slider/site/trunk/content/docs/client-configuration.md Wed Mar 25 17:59:26 2015 @@ -328,4 +328,4 @@ and, 2. 
in a secure cluster, the security flag (`slider.security.enabled`) and the HDFS Kerberos principal. -3. The yarn registry options. +3. The YARN registry options. Added: incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md?rev=1669188&view=auto ============================================================================== --- incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md (added) +++ incubator/slider/site/trunk/content/docs/configuration/appconfig.html.md Wed Mar 25 17:59:26 2015 @@ -0,0 +1,16 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# title + \ No newline at end of file Modified: incubator/slider/site/trunk/content/docs/configuration/core.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/core.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/configuration/core.md (original) +++ incubator/slider/site/trunk/content/docs/configuration/core.md Wed Mar 25 17:59:26 2015 @@ -15,7 +15,7 @@ limitations under the License. 
--> -# Apache Slider Core Configuration Specification +# Apache Slider Core Configuration Specification, version 2.0 ## Terminology @@ -45,6 +45,9 @@ and what their resource requirements are size of the application in terms of its component requirements: how many, and what their resource requirements are. +*`internal.json`*: A file which contains Slider's internal configuration +parameters. + ## Structure Configurations are stored in well-formed JSON files. @@ -63,9 +66,9 @@ The JSON specification files all have a 1. A component section, `/components`. 1. 0 or more sections under `/components` for each component, identified by component name, containing string properties. -1. 0 or 1 section `/metadata` containing arbitrary metadata (such as a description, +1. An optional section `/metadata` containing arbitrary metadata (such as a description, author, or any other information that is not parsed or processed directly). - +1. An optional section, `/credentials` containing security information. The simplest valid specification file is @@ -288,37 +291,6 @@ the master component, using 1 vcore and each using one vcore and 512 MB of RAM. -## Internal information, `internal.json` - -This contains internal data related to the deployment -it is not -intended for manual editing. - -There MAY be a component, `diagnostics`. 
If defined, its content contains -diagnostic information for support calls, and MUST NOT be interpreted -during application deployment, (though it may be included in the generation -of diagnostics reports) - - - { - "schema": "http://example.org/specification/v2.0.0", - - "metadata": { - "description": "Internal configuration DO NOT EDIT" - }, - "global": { - "name": "small_cluster", - "application": "hdfs://cluster:8020/apps/hbase/v/1.0.0/application.tar" - }, - "components": { - - "diagnostics": { - "create.hadoop.deployed.info": "(release-2.3.0) @dfe463", - "create.hadoop.build.info": "2.3.0", - "create.time.millis": "1393512091276", - "create.time": "27 Feb 2014 14:41:31 GMT" - } - } - } ## Deployment specification: `app_configuration.json` @@ -351,6 +323,8 @@ application, and instances of the indivi "jvm.heapsize": "512M" } } + "credentials": { + } } The resolved specification defines the values that are passed to the @@ -397,6 +371,8 @@ different components. "jvm.heapsize": "512M" } } + "credentials": { + } } The `site.` properties have been passed down to each component, components @@ -407,17 +383,29 @@ there is no way to declare an attribute of the author of the configuration file (and their tools) to detect such issues. ### Key Application Configuration Items -The following sections provides details about certain application configuration properties that can be utilized to tailor the deployment of a given application: + +The following sections provide details about certain application configuration + properties that can be used to tailor the deployment of a given application: #### Controlling assigned port ranges -For certain deployments, the ports available for communication with clients (Web UI ports, RPC ports, etc) are restricted to a specific set (e.g when leveraging a firewall). In those situations you can designate the set of allowed ports with the "site.global.slider.allowed.ports" setting. 
This settings takes a comma-delimited set of port numbers and ranges, e.g.: + +For certain deployments, the ports available for communication with clients +(Web UI ports, RPC ports, etc) are restricted to a specific set (e.g when using a firewall). +In those situations you can designate the set of allowed ports with the +`site.global.slider.allowed.ports` setting. + +This takes a comma-delimited set of port numbers and ranges, e.g.: "site.global.slider.allowed.ports": "48000, 49000, 50001-50010" - -The AM exposed ports (Web UI, RPC), as well as the ports allocated to launched application containers, will be limited to the ranges specified by the property value. + +The AM exposed ports (Web UI, RPC), as well as the ports allocated to launched +application containers, will be limited to the ranges specified by the property value. #### Delaying container launch -In situations where container restarts may need to be delayed to allow for platform resources to be released (e.g. a port assigned to a container that is stopped may be slow to release), a delay can be designated by setting the "container.launch.delay.sec" property in the component's configuration section: + +In situations where container restarts may need to be delayed to allow for +platform resources to be released (e.g. a port assigned to a previous container +may be slow to release), a delay can be designated by setting the `container.launch.delay.sec` property. "worker": { "jvm.heapsite": "512M", @@ -425,14 +413,20 @@ In situations where container restarts m } #### Specifying the Python Executable Path -Currently the Slider containers leverage python for component scripts (the scripts responsible for component lifecycle operations). When deploying applications on certain variations of linux or other operating systems (e.g. Centos 5) , the version of python on the system path may be incompatible with the component script (e.g. methods or imports utilized are not available). 
In those circumstances the path to the python executable for container script execution can be specified by the "agent.python.exec.path" property: + +Slider containers use Python to run component scripts. +When deploying applications on certain variations of Linux or other operating systems (e.g. CentOS 5), +the version of Python on the system PATH may be incompatible with the component script. +In those circumstances the path to the Python executable for container script execution can be +specified by the `agent.python.exec.path` property: "global": { "agent.python.exec.path": "/usr/bin/python", . . . } -This property may also be specified in the slider-client.xml file (typically in the "conf" directory of the slider installation) if the python version specified is to be utilized across multiple deployments: +This property may also be specified in the `slider-client.xml` file (typically in the "conf" directory +of the Slider installation) if the Python version specified is to be used across multiple deployments: <property> <name>agent.python.exec.path</name> Modified: incubator/slider/site/trunk/content/docs/configuration/index.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/index.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/configuration/index.md (original) +++ incubator/slider/site/trunk/content/docs/configuration/index.md Wed Mar 25 17:59:26 2015 @@ -25,15 +25,8 @@ requirements. 1. The dynamic description of the running application, including information on the location of components and aggregated statistics. -The specifics of this are covered in the [Core Configuration Specification](core.html) - -## Historical References - -1. [Specification](specification.html) -1. [Redesign](redesign.html) - - -1. [Example: current](original-hbase.json) -1. 
[Example: proposed](proposed-hbase.json) +* [Core Configuration Specification](core.html) +* [internal.json](internal.html) +* [resources.json](resources.html) Added: incubator/slider/site/trunk/content/docs/configuration/internal.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/internal.md?rev=1669188&view=auto ============================================================================== --- incubator/slider/site/trunk/content/docs/configuration/internal.md (added) +++ incubator/slider/site/trunk/content/docs/configuration/internal.md Wed Mar 25 17:59:26 2015 @@ -0,0 +1,55 @@ +<!--- + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. See accompanying LICENSE file. +--> + +# internal.json: Slider's internal configuration document + + +## Internal information, `internal.json` + +This contains internal data related to the deployment; it is not +intended for normal use. + +There MAY be a component, `diagnostics`. 
If defined, its content contains +diagnostic information for support calls, and MUST NOT be interpreted +during application deployment (though it may be included in the generation +of diagnostics reports). + + + { + "schema": "http://example.org/specification/v2.0.0", + + "metadata": { + "description": "Internal configuration DO NOT EDIT" + }, + "global": { + "name": "small_cluster", + "application": "hdfs://cluster:8020/apps/hbase/v/1.0.0/application.tar" + }, + "components": { + + "diagnostics": { + "create.hadoop.deployed.info": "(release-2.3.0) @dfe463", + "create.hadoop.build.info": "2.3.0", + "create.time.millis": "1393512091276", + "create.time": "27 Feb 2014 14:41:31 GMT" + } + } + } + +## Chaos Monkey + +The Slider application has a built-in "Chaos Monkey", which is configured in the `internal.json` +file. + +Consult ["Configuring the Slider Chaos Monkey"](../developing/chaosmonkey.html) for details. Copied: incubator/slider/site/trunk/content/docs/configuration/resource_specification.md (from r1668701, incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md) URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/resource_specification.md?p2=incubator/slider/site/trunk/content/docs/configuration/resource_specification.md&p1=incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md&r1=1668701&r2=1669188&rev=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/slider_specs/resource_specification.md (original) +++ incubator/slider/site/trunk/content/docs/configuration/resource_specification.md Wed Mar 25 17:59:26 2015 @@ -17,31 +17,159 @@ # Apache Slider Resource Specification +* [Core Properties](#core) * [Container Failure Policy](#failurepolicy) -* [Using Labels](#labels) -* [Specifying Log Aggregation](#logagg) +* [Placement Policies and escalation](#placement) +* [Labels](#labels) +* 
[Log Aggregation](#logagg) + + +The Resource specification file, `resources.json`, defines the YARN resource needs for each component type that belongs to the application. +This includes: +* container CPU and memory requirements +* component placement policy, including YARN labels to explicitly request nodes on. +* failure policy: what to do if components keep failing. +* placement escalation policy. +* where logs generated by applications will be saved: information which is passed to YARN to enable +these logs to be copied to HDFS and remotely retrieved, even while the application is running. +As such, it is the core file used by Slider to configure and manage +the application. -Resource specification is an input to Slider to specify the Yarn resource needs for each component type that belong to the application. +## <a name="core"></a>Core Properties An example resource requirement for an application that has two components "master" and "worker" is as follows. Slider will automatically add the requirements for the AppMaster for the application. This component is named "slider-appmaster". Some parameters that can be specified for a component instance include: -* `yarn.memory`: amount of memory required for the component instance -* `yarn.vcores`: number of vcores requested -* `yarn.role.priority`: each component must be assigned unique priority. Component with higher priority come up earlier than components with lower priority -* `yarn.component.instances`: number of instances for this component type +<table> + <tr> + <td>yarn.component.instances</td> + <td> + Number of instances of this component type + </td> + </tr> + <tr> + <td>yarn.memory</td> + <td> + Amount of memory in MB required for the component instance. 
+ </td> + </tr> + <tr> + <td>yarn.vcores</td> + <td> + Number of "virtual cores" requested + </td> + </tr> + <tr> + <td>yarn.role.priority</td> + <td> + Unique priority for this component + </td> + </tr> + + +</table> + + +### Component instance count: `yarn.component.instances` + +The property `yarn.component.instances` is one of the most fundamental in Slider: +it declares how many instances of a component to instantiate on the cluster. + +If the value is set to "0", no instances of a component will be created. If set +to a larger number, more instances will be requested. Thus the property sets the size +of the application, component by component. + +The number of instances of each component is application-specific; there are no recommended +values. + +### Container resource requirements: `yarn.memory` and `yarn.vcores` + +These two properties define how much memory and CPU capacity each +YARN container of this component requires. YARN will queue +container requests until enough capacity exists within the cluster +to satisfy them. When there is capacity, a container is allocated to Slider, +which then deploys an instance of the component. + +The larger these numbers, the more capacity the application gets. + +If more memory or CPU is requested than needed, containers will take +longer to be allocated than necessary, and other work may not be scheduled: +the cluster will be under-utilized. + +`yarn.memory` declares the amount of memory to ask for in YARN containers; it should +be defined for each component based on the expected memory consumption. It is measured +in MB. + +If the cluster has hard memory limits enabled and the processes in a container +use more physical or virtual memory than was granted, YARN will kill the container. +Slider will attempt to recreate the component instance by requesting a new container, +though if the number of failures of a component is too great then it will eventually +give up and fail the application. 
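The core properties in the table above combine into a per-component entry in `resources.json`. A minimal sketch of building such an entry follows; the component name and all values are illustrative, not recommendations:

```python
import json

def component_entry(priority, instances, memory_mb, vcores):
    """Return one per-component dictionary for resources.json.
    Slider option values are strings, hence the str() conversions."""
    return {
        "yarn.role.priority": str(priority),         # unique per component type
        "yarn.component.instances": str(instances),  # number of containers
        "yarn.memory": str(memory_mb),               # MB per container
        "yarn.vcores": str(vcores),                  # virtual cores per container
    }

resources = {
    "schema": "http://example.org/specification/v2.0.0",
    "metadata": {},
    "global": {},
    "components": {
        "worker": component_entry(priority=2, instances=10, memory_mb=512, vcores=1),
    },
}
print(json.dumps(resources, indent=2))
```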
+ +A YARN cluster is usually configured with a minimum container allocation, set in `yarn-site.xml` +by the configuration parameter `yarn.scheduler.minimum-allocation-mb`; the default value is +1024 MB. It will also have a maximum size set in `yarn.scheduler.maximum-allocation-mb`; +the default is 8192, that is, 8 GB. Asking for more than this will result in the request +being rejected. + + +`yarn.vcores` declares the number of "virtual cores" to request. These are a site-configured +fraction of a physical CPU core; if the ratio of virtual to physical is 1:1 then a physical core +is allocated to each one (this may include a hyperthreaded core if enabled in the BIOS). +If the ratio is lower, such as 2:1, then each vcore allocates half a physical one. + +This notion of a virtual core is intended to partially isolate applications from differences +in cluster performance: a process which needs 2 vcores on one cluster should ideally +still ask for 2 vcores on a different cluster, even if the latter has newer CPU parts. +In practice, it's not so consistent. Ask for more vcores if your process needs more CPU +time. + +YARN clusters may be configured to throttle CPU usage: if a process tries to use more than +has been granted to the container, it will simply be scheduled with less CPU time. The penalty +for using more CPU than requested is therefore less destructive than attempting to +use more memory than requested/granted. + + + +#### Relationship between `yarn.memory` and JVM Heap Size + +Java applications deployed by Slider usually have a JVM heap size property which needs +to be defined as part of the application configuration. + +The value of `yarn.memory` MUST be bigger than the heap size allocated to any JVM, as a JVM +uses a lot more memory than simply the heap alone. We have found that asking for at least 50% +more appears to work, though some experimentation will be needed. 
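The "heap plus at least 50%" guideline above can be sketched as a small helper. The 1.5 overhead factor is the starting assumption from the text and should be tuned per application; rounding up to multiples of the scheduler's minimum allocation is a common, though not universal, YARN configuration:

```python
import math

def yarn_memory_for_heap(heap_mb, overhead_factor=1.5, min_alloc_mb=1024):
    """Estimate yarn.memory for a JVM component: heap plus ~50% overhead,
    rounded up to the scheduler's minimum-allocation granularity.
    overhead_factor and the rounding behaviour are assumptions to tune."""
    needed = math.ceil(heap_mb * overhead_factor)
    return max(min_alloc_mb, math.ceil(needed / min_alloc_mb) * min_alloc_mb)

print(yarn_memory_for_heap(512))   # 512 MB heap -> 1024 MB container
print(yarn_memory_for_heap(4096))  # 4 GB heap -> 6144 MB container
```

Remember that the result must not exceed `yarn.scheduler.maximum-allocation-mb`, or the request will be rejected.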
+ +Slider does not attempt to derive a heap size for any component from the YARN allocation. + +### Component priority: `yarn.role.priority` + +The property `yarn.role.priority` has two purposes within Slider: +1. It provides a unique index of individual component types. That is, it is not +the name of a component which Slider uses to index components, it is its priority +value. +1. It defines the priority within an application for YARN to use when allocating +components. Components with higher priority get allocated first. + +Generally the latter use, YARN allocation priority, is less important for Slider-deployed +applications than for analytics applications designed to scale to as many nodes as can +be instantiated. A static Slider cluster has a predefined number of each component to +request (defined by `yarn.component.instances`), with memory and CPU requirements of +each component's container defined by `yarn.memory` and `yarn.vcores`. It will request +the specified number of components, and keep those requests outstanding until they are +satisfied. + +### Example -Sample: { "schema" : "http://example.org/specification/v2.0.0", "metadata" : { }, "global" : { - "yarn.container.failure.threshold":"10", - "yarn.container.failure.window.hours":"1" + }, "components" : { "HBASE_MASTER" : { @@ -51,6 +179,8 @@ Sample: "yarn.vcores" : "1" }, "slider-appmaster" : { + "yarn.memory" : "1024", + "yarn.vcores" : "1" }, "HBASE_REGIONSERVER" : { "yarn.role.priority" : "2", @@ -59,6 +189,16 @@ Sample: } } +## <a name="slider-appmaster"></a>The `slider-appmaster` component + +The examples here all have a component `slider-appmaster`. This defines the settings of +the application master itself: the memory and CPU it requires, and optionally a label (see +["Labels"](#labels)). The `yarn.role.priority` value is ignored: the priority is always "0"; +and the instance count, `yarn.component.instances`, is implicitly set to "1". 
+ +The entry exists primarily to allow applications to configure the amount of RAM the AM should +request. + ## <a name="failurepolicy"></a>Container Failure Policy YARN containers hosting component instances may fail. This can happen because of @@ -89,17 +229,27 @@ The limits are defined in `resources.jso This duration can span days. 1. The maximum number of failures of any component in this time period. +### Failure threshold for a component -The parameters defining the failure policy are as follows. -* `yarn.container.failure.threshold` +The number of times a component may fail within a failure window is +defined by the property `yarn.container.failure.threshold`. - -The threshold for failures. If set to "0" there are no limits on + +If set to "0" there are no limits on the number of times containers may fail. +The failure thresholds for individual components can be set independently. + -* `yarn.container.failure.window.days`, `yarn.container.failure.window.hours` -and ``yarn.container.failure.window.minutes` +### Failure window + +The failure window can be set in minutes, hours and days. These must be set +in the `global` options, as they apply to Slider only. + + yarn.container.failure.window.days + yarn.container.failure.window.hours + yarn.container.failure.window.minutes These properties define the duration of the window; they are all combined so the window is, in minutes: @@ -113,12 +263,14 @@ is exceeded, all failure counts are rese If the AM itself fails, the failure counts are reset and the window is restarted. -### Per-component and global failure thresholds - -The failure thresholds for individual components can be set independently +The default value is `yarn.container.failure.window.hours=6`; when changing +the window size, the hours value must be explicitly set, even if it is zero. 
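Assuming the three window properties combine additively as described above, the effective window duration can be computed as:

```python
def failure_window_minutes(days=0, hours=0, minutes=0):
    """Combine the yarn.container.failure.window.* properties into a
    single window duration in minutes (additive combination, as described)."""
    return (days * 24 + hours) * 60 + minutes

# The default window, yarn.container.failure.window.hours=6:
print(failure_window_minutes(hours=6))             # 360 minutes
print(failure_window_minutes(days=1, minutes=30))  # 1470 minutes
```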
### Recommended values + + We recommend having a duration of a few hours for the window, and a large failure limit proportional to the number of instances of that component @@ -130,16 +282,19 @@ trying to reinstantiate all the componen repeatedly, eventually slider will conclude that there is a problem and fail with the exit code 73, `EXIT_DEPLOYMENT_FAILED`. + ### Example -Here is a `resource.json` file for an HBase cluster +Here is a `resource.json` file for an HBase cluster: - "resources": { + { "schema" : "http://example.org/specification/v2.0.0", "metadata" : { }, "global" : { "yarn.container.failure.threshold" : "4", - "yarn.container.failure.window.hours" : "1' + "yarn.container.failure.window.days" : "0", + "yarn.container.failure.window.hours" : "1", + "yarn.container.failure.window.minutes" : "0" }, "components" : { "slider-appmaster" : { @@ -147,13 +302,13 @@ Here is a `resource.json` file for an HB "yarn.vcores" : "1", "yarn.component.instances" : "1" }, - "master" : { + "HBASE_MASTER" : { "yarn.role.priority" : "1", "yarn.memory" : "256", "yarn.vcores" : "1", "yarn.component.instances" : "2" }, - "worker" : { + "HBASE_REGIONSERVER" : { "yarn.role.priority" : "2", "yarn.memory" : "512", "yarn.container.failure.threshold" : "15", @@ -165,13 +320,18 @@ Here is a `resource.json` file for an HB The window size is set to one hour: after that the counters are reset. -There is a global failure threshold of 4. As two instances of the HBase master -are requested, the failure threshold per hour is double that of the number of masters. +There is a global failure threshold of 4 failures per component. + There are ten worker components requested; the failure threshold for these -components is overridden to be fifteen. This allows all workers to fail and -the cluster to recover, but only another five failures would be tolerated -for the remaining hour. +components is overridden to be fifteen. 
Given that there are more region servers +than masters, a higher failure rate among worker nodes is to be expected **if the failures +are caused by the underlying hardware**. + +Choosing a higher value for the region servers ensures that the application is resilient +to hardware problems. If there were some configuration problem in the region server +deployments, resulting in them all failing rapidly, this threshold would soon be breached, +which would cause the application to fail. Thus, configuration problems would be detected. These failure thresholds are all heuristics. When initially configuring an application instance, low thresholds reduce the disruption caused by components @@ -181,8 +341,172 @@ In a production application, large failu ensures that the application is resilient to transient failures of the underlying YARN cluster and hardware. + +## <a name="placement"></a>Placement Policies and escalation + +Slider can be configured with different options for **placement**: the +policies by which it chooses where to ask YARN for nodes. + +### Placement Policy + +The "placement policy" of a component is the set of rules by which Slider makes +a decision on where to request instances of that component from YARN. + +<table> +<tr> + <td>0</td> + <td> + Default: placement is spread + across the cluster on re-starts, with escalation if requests are + unmet. Unreliable nodes are avoided. + </td> +</tr> + +<tr> + <td>1</td> + <td>strict: a component is requested on every node used, irrespective + of failure history. No escalation takes place.</td> +</tr> + +<tr> + <td>2</td> + <td>Anywhere. Place requests anywhere and ignore the history.</td> +</tr> + +<tr> + <td>4</td> + <td>Anti-affinity required. This option is not currently supported.</td> +</tr> + +</table> + +The placement policy is a binary "or" of all the values, and can be +set in the property `"yarn.component.placement.policy"`. 
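Since the policy value is a bitwise "or" of the options in the table above, the composition can be sketched as follows; the constant names are illustrative, not part of the Slider API:

```python
# Placement policy bits from the table above; names are illustrative only.
PLACEMENT_DEFAULT = 0
PLACEMENT_STRICT = 1
PLACEMENT_ANYWHERE = 2
PLACEMENT_ANTI_AFFINITY = 4  # not currently supported

def placement_policy(*flags):
    """Combine placement option bits into the string value expected
    for yarn.component.placement.policy in resources.json."""
    value = 0
    for flag in flags:
        value |= flag
    return str(value)  # Slider option values are strings

print(placement_policy(PLACEMENT_STRICT))  # "1"
```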
+ +Example: + + "HBASE_REST": { + "yarn.role.priority": "3", + "yarn.component.instances": "1", + "yarn.component.placement.policy": "1", + "yarn.memory": "556" + }, + +This defines an HBASE_REST component with a placement policy of "1": strict. + +On application restarts Slider will re-request the same node. + +If the component were configured to request an explicit port for its REST endpoint, +then the same URL would reach it whenever this application were deployed, +provided the host was available and the port not already in use. + +#### Notes + +1. There's no support for **anti-affinity**, i.e. to mandate that component +instances must never be deployed on the same hosts. Once YARN adds support for +this, Slider will support it. + +1. Slider never explicitly blacklists nodes. It does track which nodes have been +unreliable "recently", and avoids explicitly requesting them. If YARN does +actually allocate a container there, Slider will attempt to deploy the component +there. + +1. Apart from an (optional) label, placement policies for the application master itself + cannot be specified. The Application Master is deployed wherever YARN sees fit. + + +### Node Failure Threshold, `yarn.node.failure.threshold` + +The configuration property `yarn.node.failure.threshold` defines how "unreliable" +a node must be before it is skipped for placement requests. + +1. This is per-component. +1. It is ignored for "strict" or "anywhere" placements. +1. It is reset at the same time as the container failure counters; that is, at +the interval defined by the `yarn.container.failure.window` properties. + +### Escalation: `yarn.placement.escalate.seconds` + +For any component whose placement policy is not "any", Slider saves to HDFS +a record of the nodes on which instances were running. When starting a cluster, it uses +this history to identify hosts on which to request instances. + +1. 
Slider initially asks for nodes on those specific hosts -- provided their recent failure +history is considered acceptable. +1. It tracks which 'placed' requests are outstanding. +1. If, after the specified escalation time, YARN containers have not been allocated +on those nodes, Slider will "escalate" the placement of those requests that are +outstanding. +1. It currently does this by cancelling each request and re-requesting a container on +that node, this time with the `relaxLocality` flag set. +1. This tells YARN to seek an alternative location in the cluster if it cannot +allocate one on the target host. +1. If there is enough capacity in the cluster, the new node will then be allocated. + + +The higher the cost of migrating a component instance from one host to another, the longer +we would recommend for an escalation timeout. + +Example: + + { + "schema": "http://example.org/specification/v2.0.0", + "metadata": { + }, + "global": { + }, + "components": { + "HBASE_MASTER": { + "yarn.role.priority": "1", + "yarn.component.instances": "1", + "yarn.placement.escalate.seconds": "10" + }, + "HBASE_REGIONSERVER": { + "yarn.role.priority": "2", + "yarn.component.instances": "10", + "yarn.placement.escalate.seconds": "600" + }, + "slider-appmaster": { + } + } + } + +This declares that the `HBASE_MASTER` placement should be escalated after ten seconds, +but that `HBASE_REGIONSERVER` instances should have an escalation timeout of 600 +seconds -- ten minutes. These values were chosen as an HBase Master can be allocated +anywhere in the cluster, but a region server is significantly faster if restarted +on the same node on which it previously saved all its data. Even though HDFS will +have replicated all data elsewhere, it will have been scattered across the cluster +-- resulting in remote access for most of the data, at least until a full compaction +has taken place. + + +#### Notes + +1. 
Escalation goes directly from "specific node" to "anywhere in cluster"; it does +not have any intermediate "same-rack" policy. + +1. If components were assigned to specific labels, then even when placement is +"escalated", Slider will always ask for containers on the specified labels. That +is -- it will never relax the constraint of "deploy on the labels specified". If +there are not enough labelled nodes for the application, either the cluster +administrators need to add more labelled nodes, or the application must be reconfigured +with a different label policy. + +1. Escalated components may be allocated containers on nodes which already have a running +instance of the same component. + +1. If the placement policy is "strict", there is no escalation. If the node +is not available or lacks capacity, the request will remain unsatisfied. + +1. There is no placement escalation option for the application master. + +1. For more details, see: [Role History](/design/rolehistory.html) + + ## <a name="labels"></a>Using Labels -The resources.json file can be used to specify the labels to be used when allocating containers for the components. The details of the YARN Label feature can be found at [YARN-796](https://issues.apache.org/jira/browse/YARN-796). + +The `resources.json` file includes the specification of the labels to be used when allocating containers for the components. The details of the YARN Label feature can be found at [YARN-796](https://issues.apache.org/jira/browse/YARN-796). In summary: @@ -193,55 +517,77 @@ In summary: This way, you can guarantee that a certain set of nodes are reserved for an application or for a component within an application. -Label expression is specified through property "yarn.label.expression". When no label expression is specified then it is assumed that only non-labeled nodes are used when allocating containers for component instances. +Label expression is specified through property `yarn.label.expression`. 
When no label expression is specified, it is assumed that only non-labeled nodes are used when allocating containers for component instances. + +If a label expression is specified for the `slider-appmaster` component then it also becomes the default label expression for all components. -If label expression is specified for slider-appmaster then it also becomes the default label expression for all component. To take advantage of default label expression leave out the property (see HBASE_REGIONSERVER in the example). Label expression with empty string ("yarn.label.expression":"") means nodes without labels. +#### Example -Example -Here is a resource.json file for an HBase cluster which uses labels. The label for the application instance is "hbase1" and the label expression for the HBASE_MASTER components is "hbase1_master". HBASE_REGIONSERVER instances will automatically use label "hbase1". Alternatively, if you specify ("yarn.label.expression":"") for HBASE_REGIONSERVER then the containers will only be allocated on nodes with no labels. +Here is a `resource.json` file for an HBase cluster which uses labels. + +The label for the application master is `hbase1` and the label expression for the `HBASE_MASTER` component is `hbase1_master`. +`HBASE_REGIONSERVER` instances will automatically use label `hbase1`. 
{ - "schema": "http://example.org/specification/v2.0.0", - "metadata": { + "schema": "http://example.org/specification/v2.0.0", + "metadata": { + }, + "global": { + }, + "components": { + "HBASE_MASTER": { + "yarn.role.priority": "1", + "yarn.component.instances": "1", + "yarn.label.expression":"hbase1_master" }, - "global": { + "HBASE_REGIONSERVER": { + "yarn.role.priority": "2", + "yarn.component.instances": "10" }, - "components": { - "HBASE_MASTER": { - "yarn.role.priority": "1", - "yarn.component.instances": "1", - "yarn.label.expression":"hbase1_master" - }, - "HBASE_REGIONSERVER": { - "yarn.role.priority": "1", - "yarn.component.instances": "1", - }, - "slider-appmaster": { - "yarn.label.expression":"hbase1" - } + "slider-appmaster": { + "yarn.label.expression":"hbase1" } + } } -Specifically, for the above example you will need: +To deploy this application in a YARN cluster, the following steps must be followed. + +1. Create two labels, `hbase1` and `hbase1_master` (use `yarn rmadmin` commands) +1. Assign the labels to nodes (use `yarn rmadmin` commands) +1. Refresh the queues (`yarn rmadmin -refreshQueues`) +1. Create a queue by defining it in the capacity scheduler configuration. +1. Allow the queue access to the labels and ensure that appropriate min/max capacity is assigned +1. Refresh the queues again (`yarn rmadmin -refreshQueues`) +1. Create the Slider application against the above queue using the `--queue` parameter when creating the application + +### Notes + +1. If a label is defined in the `global` section, it will also apply to all components which do +not explicitly identify a label. If such a label expression is set there and another is defined +for the `slider-appmaster`, the app master's label is only used for its placement. + +1. 
To explicitly request that components are not requested on a label, irrespective of +any global or appmaster settings, set the `yarn.label.expression` to an empty string: + + "HBASE_REGIONSERVER": { + "yarn.role.priority": "2", + "yarn.component.instances": "10", + "yarn.label.expression":"" + } -* Create two labels, hbase1 and hbase1_master (use yarn rmadmin commands) -* Assign the labels to nodes (use yarn rmadmin commands) -* Perform refresh queue (yarn -refreshqueue) -* Create a queue by defining it in the capacity scheduler config -* Allow the queue to access to the labels and ensure that appropriate min/max capacity is assigned -* Perform refresh queue (yarn -refreshqueue) -* Create the Slider application against the above queue using parameter --queue while creating the application +1. If there is not enough capacity within a set of labelled nodes for the requested containers, +the application instance will not reach its requested size. +## <a name="logagg"></a>Log Aggregation -## <a name="logagg"></a>Using Log Aggregation Log aggregation at regular intervals for long running services (LRS) needs to be enabled at the YARN level before any application can exploit this functionality. To enable it, set the following property to a positive value of 3600 (in seconds) or more. If set to a positive value less than 3600 (1 hour), this property defaults to 3600. To disable log aggregation set it to -1. 
<property> - <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name> - <value>3600</value> + <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name> + <value>3600</value> </property> Subsequently every application owner has the flexibility to set the include and exclude patterns of file names that @@ -250,27 +596,105 @@ of files that need to be backed up under set at the global level as shown below - { - "schema": "http://example.org/specification/v2.0.0", - "metadata": { + "schema": "http://example.org/specification/v2.0.0", + "metadata": { + }, + "global": { + "yarn.log.include.patterns": "hbase*.*", + "yarn.log.exclude.patterns": "hbase*.out" + }, + "components": { + "HBASE_MASTER": { + "yarn.role.priority": "1", + "yarn.component.instances": "1" + }, + "HBASE_REGIONSERVER": { + "yarn.role.priority": "2", + "yarn.component.instances": "10" }, - "global": { - "yarn.log.include.patterns": "hbase*.*", - "yarn.log.exclude.patterns": "hbase*.out" - }, - "components": { - "HBASE_MASTER": { - "yarn.role.priority": "1", - "yarn.component.instances": "1", - }, - "HBASE_REGIONSERVER": { - "yarn.role.priority": "1", - "yarn.component.instances": "1", - }, - "slider-appmaster": { - } + "slider-appmaster": { } + } } The details of the YARN Log Aggregation feature can be found at [YARN-2468](https://issues.apache.org/jira/browse/YARN-2468). + +## Putting it all together + +Here is an example of a definition of an HBase cluster. 
+ + + + { + "schema": "http://example.org/specification/v2.0.0", + "metadata": { + }, + "global": { + "yarn.log.include.patterns": "hbase*.*", + "yarn.log.exclude.patterns": "hbase*.out", + "yarn.container.failure.window.hours": "0", + "yarn.container.failure.window.minutes": "30", + "yarn.label.expression":"development" + }, + "components": { + "slider-appmaster": { + "yarn.memory": "1024", + "yarn.vcores": "1", + "yarn.label.expression":"" + }, + "HBASE_MASTER": { + "yarn.role.priority": "1", + "yarn.component.instances": "1", + "yarn.placement.escalate.seconds": "10", + "yarn.vcores": "1", + "yarn.memory": "1500" + }, + "HBASE_REGIONSERVER": { + "yarn.role.priority": "2", + "yarn.component.instances": "10", + "yarn.vcores": "1", + "yarn.memory": "1500", + "yarn.container.failure.threshold": "15", + "yarn.placement.escalate.seconds": "60" + }, + "HBASE_REST": { + "yarn.role.priority": "3", + "yarn.component.instances": "1", + "yarn.component.placement.policy": "1", + "yarn.container.failure.threshold": "3", + "yarn.vcores": "1", + "yarn.memory": "556" + }, + "HBASE_THRIFT": { + "yarn.role.priority": "4", + "yarn.component.instances": "0", + "yarn.component.placement.policy": "1", + "yarn.vcores": "1", + "yarn.memory": "556", + "yarn.label.expression":"stable" + }, + "HBASE_THRIFT2": { + "yarn.role.priority": "5", + "yarn.component.instances": "1", + "yarn.component.placement.policy": "1", + "yarn.vcores": "1", + "yarn.memory": "556", + "yarn.label.expression":"stable" + } + } + } + +There are ten region servers, with a 60-second timeout for placement escalation; +15 containers can fail in the "recent" time window before the application is +considered to have failed. + +The time window to reset failures is set to 30 minutes. + +The Thrift, Thrift2 and REST servers all have strict placement. The REST +server also has a container failure threshold of 3: if it cannot come up +three times, the entire application deployment is considered a failure. 
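The failure-window accounting described above -- counters accumulate until the window elapses, then reset -- can be sketched roughly in Python. This is an illustration of the model only, not Slider's actual implementation; the class and method names are hypothetical.

```python
# Illustrative model (not Slider source) of the yarn.container.failure.threshold
# and yarn.container.failure.window behaviour: failures accumulate per component
# and the counter resets once the window (30 minutes in the example) elapses.

import time

class FailureTracker:
    def __init__(self, threshold, window_seconds, clock=time.monotonic):
        self.threshold = threshold          # e.g. 15 for the region servers
        self.window_seconds = window_seconds  # e.g. 30 * 60
        self.clock = clock                  # injectable for testing
        self.window_start = clock()
        self.failures = 0

    def record_failure(self):
        """Record one container failure; True means the threshold is breached."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # Window elapsed: reset the counter, as the failure window does.
            self.window_start = now
            self.failures = 0
        self.failures += 1
        return self.failures > self.threshold

# Region servers from the example: threshold 15, 30-minute window.
tracker = FailureTracker(threshold=15, window_seconds=30 * 60)
```

Under this model, a low threshold such as the REST server's 3 trips quickly on a repeated startup failure, while the region servers tolerate scattered hardware losses inside each 30-minute window.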
+ +The default label for nodes is "development". For the application master itself it is "", +meaning anywhere. Both Thrift services are requested on the label "stable". \ No newline at end of file Copied: incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md (from r1660946, incubator/slider/site/trunk/content/docs/configuration/index.md) URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md?p2=incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md&p1=incubator/slider/site/trunk/content/docs/configuration/index.md&r1=1660946&r2=1669188&rev=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/configuration/index.md (original) +++ incubator/slider/site/trunk/content/docs/configuration/revision-1/index.md Wed Mar 25 17:59:26 2015 @@ -15,20 +15,10 @@ limitations under the License. --> -# Apache Slider: Specification of an application instance, revision 2.0 +# Apache Slider: Specification of an application instance, revision 1 -The specification of an application comprises - -1. The persistent description of an application's configuration -1. The persistent description of the desired topology and YARN resource -requirements. -1. The dynamic description of the running application, including information -on the location of components and aggregated statistics. - -The specifics of this are covered in the [Core Configuration Specification](core.html) - - -## Historical References +This is the original specification of an application instance, including discussion +on a proposed rework. 1. [Specification](specification.html) 1. [Redesign](redesign.html) @@ -37,3 +27,4 @@ The specifics of this are covered in the 1. [Example: current](original-hbase.json) 1. [Example: proposed](proposed-hbase.json) +This design has been supplanted by the [version 2.0](../index.html) design. 
\ No newline at end of file Modified: incubator/slider/site/trunk/content/docs/examples.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/examples.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/examples.md (original) +++ incubator/slider/site/trunk/content/docs/examples.md Wed Mar 25 17:59:26 2015 @@ -102,7 +102,7 @@ or ### Optional: point bin/slider at your chosen cluster configuration -export SLIDER_CONF_DIR=~/Projects/slider/slider-core/src/test/configs/ubuntu-secure/slider + export SLIDER_CONF_DIR=~/Projects/slider/slider-core/src/test/configs/ubuntu-secure/slider ## Optional: Clean up any existing slider cluster details Modified: incubator/slider/site/trunk/content/docs/getting_started.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/getting_started.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/getting_started.md (original) +++ incubator/slider/site/trunk/content/docs/getting_started.md Wed Mar 25 17:59:26 2015 @@ -240,7 +240,7 @@ As Slider creates each instance of a com All this information goes into the **Resources Specification** file ("Resource Spec") named `resources.json`. The Resource Spec tells Slider how many instances of each component in the application (such as an HBase RegionServer) to deploy and the parameters for YARN. -An application package should contain the default resources.json and you can start from there. Or you can create one based on [Resource Specification](slider_specs/resource_specification.html). +An application package should contain the default resources.json and you can start from there. Or you can create one based on [Resource Specification](/configuration/resource.html). Store the Resource Spec file on your local disk (e.g. 
`/tmp/resources.json`). Modified: incubator/slider/site/trunk/content/docs/high_availability.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/high_availability.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/high_availability.md (original) +++ incubator/slider/site/trunk/content/docs/high_availability.md Wed Mar 25 17:59:26 2015 @@ -116,7 +116,6 @@ for setup details. <property> <description>The class to use as the persistent store.</description> <name>yarn.resourcemanager.store.class</name> - <!ÂÂ--value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</valueÂ--> <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> </property> @@ -126,7 +125,7 @@ for setup details. org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore as the value for yarn.resourcemanager.store.class </description> - <name>yarn.resourcemanager.zkÂ-address</name> + <name>yarn.resourcemanager.zk-address</name> <value>127.0.0.1:2181</value> </property> Modified: incubator/slider/site/trunk/content/docs/security.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/security.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/security.md (original) +++ incubator/slider/site/trunk/content/docs/security.md Wed Mar 25 17:59:26 2015 @@ -118,7 +118,7 @@ The Application Master will read in the relevant number of components. ### The Keytab distribution/access Options - Rather than relying on delegation token based authentication mechanisms, the AM leverages keytab files for obtaining the principals to authenticate to the configured cluster KDC. 
In order to perform this login the AM requires access to a keytab file that contains the principal representing the user identity to be associated with the launched application instance (e.g. in an HBase installation you may elect to leverage the âhbaseâ principal for this purpose). There are two mechanisms supported for keytab access and/or distribution: + Rather than relying on delegation token based authentication mechanisms, the AM leverages keytab files for obtaining the principals to authenticate to the configured cluster KDC. In order to perform this login the AM requires access to a keytab file that contains the principal representing the user identity to be associated with the launched application instance (e.g. in an HBase installation you may elect to leverage the `hbase` principal for this purpose). There are two mechanisms supported for keytab access and/or distribution: #### Local Keytab file access: @@ -129,11 +129,11 @@ relevant number of componentss. "slider-appmaster": { "jvm.heapsize": "256M", "slider.am.keytab.local.path": "/etc/security/keytabs/hbase.headless.keytab", - âslider.keytab.principal.nameâ : âhbase" + "slider.keytab.principal.name" : "hbase" } } - The âslider.am.keytab.local.pathâ property provides the full path to the keytab file location and is mandatory for the local lookup mechanism. The principal to leverage from the file is identified by the âslider.keytab.principal.nameâ property. + The `slider.am.keytab.local.path` property provides the full path to the keytab file location and is mandatory for the local lookup mechanism. The principal to leverage from the file is identified by the `slider.keytab.principal.name` property. In this scenario the distribution of keytab files for the AM AND the application itself is the purview of the application deployer. 
So, for example, for an hbase deployment, the hbase site service keytab will have to be distributed as well and indicated in the hbase-site properties: @@ -152,32 +152,47 @@ relevant number of componentss. "jvm.heapsize": "256M", "slider.hdfs.keytab.dir": ".slider/keytabs/hbase", "slider.am.login.keytab.name": "hbase.headless.keytab", - âslider.keytab.principal.nameâ : âhbase" + "slider.keytab.principal.name" : "hbase" } } - The âslider.hdfs.keytab.dirâ points to an HDFS path, relative to the userâs home directory (e.g. /users/hbase), in which slider can find all keytab files required for both AM login as well as application services (e.g. for hbase that would be the headless keytab for the AM and the service keytab for the HBase application components). If no value is specified, a default location of â.slider/keytabs/<cluster name>â is assumed. - The âslider.am.login.keytab.nameâ is the name of the keytab file (mandatory property), found within the specified directory, that the AM will use to lookup up the login principal and authenticate. - - If leveraging the slider-based distribution mechanism, the keytab files for components will be accessible from a âkeytabsâ sub-directory of the container work folder and can therefore be specified relative to the $AGENT_WORK_ROOT/keytabs directory, e.g.: +The `slider.hdfs.keytab.dir` points to an HDFS path, relative to the user's home directory +(e.g. `/users/hbase`), in which Slider can find all keytab files required for both AM login +as well as application services. For example, for Apache HBase these would be the headless keytab +for the AM and the service keytab for the HBase application components. + +If no value is specified, a default location of `.slider/keytabs/<cluster name>` is assumed. + +The `slider.am.login.keytab.name` is the name of the keytab file (mandatory property), +found within the specified directory, that the AM will use to look up the login principal and authenticate. 
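The path-resolution rules just described can be summarised in a small sketch. This is illustrative only (the helper function is hypothetical, not part of Slider); it just combines the documented defaults: the directory is relative to the user's HDFS home, and defaults to `.slider/keytabs/<cluster name>` when `slider.hdfs.keytab.dir` is unset.

```python
# Hypothetical helper showing how the documented AM keytab settings resolve
# to a full HDFS path. Not Slider source code.

def resolve_keytab_path(home_dir, cluster_name, config):
    # slider.hdfs.keytab.dir is optional; the documented default is
    # .slider/keytabs/<cluster name>
    keytab_dir = config.get("slider.hdfs.keytab.dir") \
        or ".slider/keytabs/" + cluster_name
    # slider.am.login.keytab.name is a mandatory property
    keytab_name = config["slider.am.login.keytab.name"]
    return "{}/{}/{}".format(home_dir, keytab_dir, keytab_name)

print(resolve_keytab_path(
    "/users/hbase", "hbase1",
    {"slider.hdfs.keytab.dir": ".slider/keytabs/hbase",
     "slider.am.login.keytab.name": "hbase.headless.keytab"}))
# prints /users/hbase/.slider/keytabs/hbase/hbase.headless.keytab
```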
+ +When using the slider-based distribution mechanism, the keytab files for components will be +accessible from a `keytabs` sub-directory of the container work folder and can therefore be +specified relative to the `$AGENT_WORK_ROOT/keytabs` directory, e.g.: . . . "site.hbase-site.hbase.master.kerberos.principal": "hbase/_h...@example.com", "site.hbase-site.hbase.master.keytab.file": "${AGENT_WORK_ROOT}/keytabs/hbase.service.keytab", . . . - For both mechanisms above, the principal name used for authentication is either: +For both mechanisms above, the principal name used for authentication is either: -* The principal name established on the client side before invocation of the Slider CLI (the principal used to âkinitâ) or -* The value specified for a âslider.keytab.principal.nameâ property. +* The principal name established on the client side before invocation of the Slider CLI (the principal used to `kinit`) or +* The value specified for a `slider.keytab.principal.name` property. #### Slider Client Keytab installation: -The Slider client can be leveraged to install keytab files individually into a designated keytab HDFS folder. The format of the command is: +The Slider client can be leveraged to install keytab files individually into a designated +keytab HDFS folder. The format of the command is: slider install-keytab --keytab <path to keytab on local file system> --folder <name of HDFS folder to store keytab> [--overwrite] -The command will store the keytab file specified by the ââkeytabâ option in to an HDFS folder that is created or exists under /user/username/.slider/keytabs named by the ââfolderâ option (e.g. if the folder name specified is âHBASEâ the keytab will be stored in /user/username/.slider/keytabs/HBASE). The command can be used to upload keytab files individually up to HDFS. 
For example, if uploading both AM and HBase service keytabs to the âHBASEâ folder, the command will be invoked twice: +The command will store the keytab file specified by the `--keytab` option into +an HDFS folder that is created or exists under `/user/username/.slider/keytabs` named by the +`--folder` option (e.g. if the folder name specified is `HBASE` the keytab will be stored in + `/user/username/.slider/keytabs/HBASE`). +The command can be used to upload keytab files individually to HDFS. + For example, if uploading both AM and HBase service keytabs to the `HBASE` folder, the command will be invoked twice: slider install-keytab --keytab /my/local/keytabs/folder/hbase.headless.keytab --folder HBASE slider install-keytab --keytab /my/local/keytabs/folder/hbase.service.keytab --folder HBASE @@ -195,10 +210,10 @@ Subsequently, the associated hbase-site "jvm.heapsize": "256M", "slider.hdfs.keytab.dir": ".slider/keytabs/HBASE", "slider.am.login.keytab.name": "hbase.headless.keytab" - âslider.keytab.principal.nameâ : âhbase" + "slider.keytab.principal.name" : "hbase" } } - + ## Securing communications between the Slider Client and the Slider AM. When the AM is deployed in a secure cluster, @@ -255,16 +270,25 @@ They can also be set on the Slider comma -S java.security.krb5.realm=MINICLUSTER -S java.security.krb5.kdc=hadoop-kdc ## Generation and deployment of application keystores/truststores -Application components may make use of keystores and truststores to establish secure communications. Given the nature of application deployments in a YARN cluster and the lack of certainty concerning the nodemanager host on which a component container may be spawned, Slider provides the facility for creating and deploying the keystores and truststores that may be required. + +Application components may make use of keystores and truststores to establish secure communications. 
+Given the nature of application deployments in a YARN cluster and the lack of certainty concerning +the host on which a component container may be allocated, +Slider provides the facility for creating and deploying the keystores and truststores that may be required. The process of enabling application keystore/truststore generation and deployment is: -* Set the "slider.component.security.stores.required" property to "true". This property can be set as a global property (indicating all components require stores) or can be set/overridden at the component level to selectively enable store generation for a given component. +* Set the configuration option `"slider.component.security.stores.required"` to `"true"`. + This option can be set as a global property (indicating all components require stores) or can be set/overridden at the component level to selectively enable store generation for a given component. * Specify the password property for the component keystore or truststore or, * Specify the property providing the alias that references a credential managed by the Hadoop Credential Provider. This credential provides the password for securing the keystore/truststore. ### Specifying a keystore/truststore password - Applications that make use of a keystore and/or truststore may already have configuration properties that reference the value for the password used to secure the given certificate store. In those instances the application configuration can reference the value of the password property in the component specific configuration section: + +Applications that make use of a keystore and/or truststore may already have configuration +properties that reference the value for the password used to secure the given certificate store. 
+In those instances the application configuration can reference the value of the password property +in the component specific configuration section: "APP_COMPONENT": { "slider.component.security.stores.required": "true", @@ -273,13 +297,14 @@ Applications that make use of a keystore In this example: -* The store required property is set to "true" for the APP_COMPONENT component -* The application has a property in its site configuration file named "app_component.keystore.password". This property is specified in the appConfig file's global section (with the "site.myapp-site" prefix), and is referenced here to indicate to Slider which application property provides the store password. +* The store required property is set to `"true"` for the `APP_COMPONENT` component +* The application has a property in its site configuration file named `"app_component.keystore.password"`. +This property is specified in the appConfig file's global section (with the "site.myapp-site" prefix), and is referenced here to indicate to Slider which application property provides the store password. ### Specifying a keystore/truststore Credential Provider alias Applications that utilize the Credential Provider API to retrieve application passwords can specify the following configuration: -* Indicate the credential storage path in the "credentials" section of the app configuration file: +* Indicate the credential storage path in the `credentials` section of the app configuration file: "credentials": { "jceks://hdfs/user/${USER}/myapp.jceks": ["app_component.keystore.password.alias"] @@ -302,7 +327,9 @@ At runtime, Slider will read the credent When trying to talk to a secure cluster you may see the message: No valid credentials provided (Mechanism level: Illegal key size)] - +or + No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt) + This means that the JRE does not have the extended cryptography package needed to work with the keys that Kerberos needs. 
This must be downloaded from Oracle (or other supplier of the JVM) and installed according to Modified: incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md (original) +++ incubator/slider/site/trunk/content/docs/slider_specs/application_instance_configuration.md Wed Mar 25 17:59:26 2015 @@ -84,7 +84,7 @@ An appConfig.json contains the applicati appConf.json allows you to pass in arbitrary set of configuration that Slider will forward to the application component instances. ## Variable naming convention -In order to understand how the naming convention work, lets look at how the config is passed on to component commands. Slider agent recevies a structured bag of commands as input for all commands, INSTALL, CONFIGURE, START, etc. The command includes a section "configuration" which has config properties arranged into named property bags. +In order to understand how the naming convention works, let's look at how the config is passed on to component commands. The Slider agent receives a structured bag of commands as input for all commands, INSTALL, CONFIGURE, START, etc. The command includes a section "configuration" which has config properties arranged into named property bags. * Variables of the form `site.xx.yy` translate to variables by the name `yy` within the group `xx` and are typically converted to site config files by the name `xx` containing variable `yy`. 
For example, `"site.hbase-site.hbase.regionserver.port":""` will be sent to the Slider-Agent as `"hbase-site" : { "hbase.regionserver.port": ""}` and app definition scripts can access all variables under `hbase-site` as a single property bag. * Similarly, `site.core-site.fs.defaultFS` allows you to pass in the default fs. *This specific variable is automatically made available by Slider but it's shown here as an example.* @@ -114,7 +114,7 @@ For example, HBase master info port need There is no set guideline for doing so. How an application emits metrics and how the metrics are emitted to the right place is completely defined by the application. In the following example, we show how the HBase app is configured to emit metrics to a ganglia server. -Ganglia server lifecycle is not controlled by the app instance. So the app instance only needs to know where to emit the metrics. This is achieved by three global variables +Ganglia server lifecycle is not controlled by the app instance. So the app instance only needs to know where to emit the metrics. 
This is achieved through global variables: * "site.global.ganglia_enabled":"true" * "site.global.ganglia_server_host": "gangliaserver.my.org" Modified: incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md (original) +++ incubator/slider/site/trunk/content/docs/slider_specs/creating_app_definitions.md Wed Mar 25 17:59:26 2015 @@ -26,7 +26,7 @@ An application instance consists of seve Figure 1 - High-level view of a container For example: - + yarn 8849 -- python ./infra/agent/slider-agent/agent/main.py --label container_1397675825552_0011_01_000003___HBASE_REGIONSERVER --host AM_HOST --port 47830 yarn 9085 -- bash /hadoop/yarn/local/usercache/yarn/appcache/application_1397675825552_0011/ ... internal_start regionserver yarn 9114 -- /usr/jdk64/jdk1.7.0_45/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=... @@ -37,7 +37,7 @@ The above list shows three processes, th The following command creates an HBase application using the AppPackage for HBase. ./slider create cl1 --template /work/appConf.json --resources /work/resources.json - + Let's analyze the various parameters from the perspective of app creation: * `--template`: app configuration @@ -110,12 +110,12 @@ Looking at the content through unzip -l Sample **resources-default.json** and **appConfig-default.json** files are also included in the enlistment. These are samples and are typically tested on one-node test installations. These files are not used during the create command; rather, the files provided as input parameters are the ones that are used. 
*So you can leave these files as is in the package.* -### --template appConfig.json +### `--template appConfig.json` An appConfig.json contains the application configuration. See [Specifications InstanceConfiguration](application_instance_configuration.html) for details on how to create a template config file. The enlistment includes sample config files for HBase, Accumulo, and Storm. -### --resources resources.json -Resource specification is an input to Slider to specify the Yarn resource needs for each component type that belong to the application. [Specification of Resources](resource_specification.html) describes how to write a resource config json file. The enlistment includes sample config files for HBase, Accumulo, and Storm. +### `--resources resources.json` +Resource specification is an input to Slider to specify the Yarn resource needs for each component type that belongs to the application. [Specification of Resources](/docs/configuration/resource.html) describes how to write a resource config JSON file. The enlistment includes sample config files for HBase, Accumulo, and Storm. ## Scripting for AppPackage Modified: incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md (original) +++ incubator/slider/site/trunk/content/docs/slider_specs/hello_world_slider_app.md Wed Mar 25 17:59:26 2015 @@ -106,7 +106,8 @@ Most applications release a tarball that ## Step 3: Create a default resources file (resources.json) -By default all resources.json file must include slider-appmaster. Add one more entry for the component MEMCACHED and assign a unique priority and default number of instances.
Ensure, that a suitable default value is provided for yarn.memory. Additional details are available [here](/docs/slider_specs/resource_specification.html). +By default all `resources.json` files must include a `slider-appmaster` component. +Add one more entry for the component `MEMCACHED` and assign a unique priority and default number of instances. Ensure that a suitable default value is provided for `yarn.memory`. Additional details are available [here](/docs/configuration/resource.html). { Modified: incubator/slider/site/trunk/content/docs/slider_specs/index.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/index.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/slider_specs/index.md (original) +++ incubator/slider/site/trunk/content/docs/slider_specs/index.md Wed Mar 25 17:59:26 2015 @@ -40,18 +40,20 @@ Refer to [Creating a Slider package for Memcached](hello_world_slider_app.html) for a quick overview of how to write a Slider app. -Packaging enhancements: [Simplified Packaging](simple_pkg.html) describes a simplified version of packaging that Slider supports for applications that do not need full capability of a Slider application package. *The work is available in the develop branch and is targeted for the next relase.* +Packaging enhancements: [Simplified Packaging](simple_pkg.html) describes a simplified version of packaging that Slider +supports for applications that do not need the full capability of a Slider application package. +*The work is available in the develop branch and is targeted for the next release.* -The entry points to leverage Slider are: +The entry points to use Slider are: - [Application Needs](application_needs.html) What it takes to be deployable by Slider. - [Slider AppPackage](creating_app_definitions.html) Overview of how to create a Slider AppPackage.
- [Specifications for AppPackage](application_package.html) Describes the structure of an AppPackage -- [Specifications for Application Definition](application_definition.html) How to write metainfo.xml? -- [Specification of Resources](resource_specification.html) How to write a resource spec for an app? -- [Specifications InstanceConfiguration](application_instance_configuration.html) How to write a template config for an app? +- [Specifications for Application Definition](application_definition.html) How to write metainfo.xml +- [Specification of Resources](/docs/configuration/resource.html) How to write a resource spec for an app +- [Specifications InstanceConfiguration](application_instance_configuration.html) How to write a template config for an app - [Specifications for Configuration](application_configuration.html) Default application configuration - [Specifying Exports](specifying_exports.html) How to specify exports for an application? - [Documentation for "General Developer Guidelines"](/developing/index.html) -* [Configuring the Slider Chaos Monkey](chaosmonkey.html) - + + Modified: incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md (original) +++ incubator/slider/site/trunk/content/docs/slider_specs/specifying_exports.md Wed Mar 25 17:59:26 2015 @@ -30,7 +30,7 @@ All exports are specified in the metadat Slider application packages accept an appConfig.json file for all application configuration supplied by the user. Any property whose name starts with "site" is considered configuration. [Specifications InstanceConfiguration](application_instance_configuration.html) describes the naming convention.
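To make that naming convention concrete, the grouping can be sketched as follows. This is an illustrative sketch only: `group_site_properties` is a hypothetical helper, not part of Slider, and it merely mimics how `site.<config-file>.<property>` entries in appConfig.json end up as one property bag per config file (as with the `hbase-site` example earlier in this change).

```python
# Illustrative sketch only: group_site_properties is a hypothetical helper,
# not part of Slider. It mimics how "site.<config-file>.<property>" keys
# in appConfig.json are grouped into one property bag per config file.
def group_site_properties(app_config):
    bags = {}
    for key, value in app_config.items():
        if not key.startswith("site."):
            continue  # only "site."-prefixed properties count as configuration
        _, config_file, prop = key.split(".", 2)
        bags.setdefault(config_file, {})[prop] = value
    return bags

conf = {
    "site.hbase-site.hbase.regionserver.port": "",
    "site.global.ganglia_enabled": "true",
}
print(group_site_properties(conf))
# -> {'hbase-site': {'hbase.regionserver.port': ''}, 'global': {'ganglia_enabled': 'true'}}
```

The agent-side property bags then let application scripts read everything under one config file (e.g. `hbase-site`) without caring about the flat key prefix.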
### Export specific configs -By default all configurations are exported (e.g. http://hos1:44500/ws/v1/slider/publisher/slider/storm-site). They can be disabled by specifying `<exportedConfigs>None</exportedConfigs>` under `<application>`. If you want to explicitly specify what to publish you can use comma separated named such as `<exportedConfigs>storm-site,another-site</exportedConfigs>`. +By default all configurations are exported (e.g. `http://host1:44500/ws/v1/slider/publisher/slider/storm-site`). They can be disabled by specifying `<exportedConfigs>None</exportedConfigs>` under `<application>`. If you want to explicitly specify what to publish, you can use comma-separated names such as `<exportedConfigs>storm-site,another-site</exportedConfigs>`. ### Which component is responsible for export By default an arbitrary master is chosen as the master responsible for exporting the config. *What this means is that when this master is STARTED the applied config known at that time is exported*. Otherwise, you can specify which master component type should export configuration by specifying `<publishConfig>true</publishConfig>` under `<component>`. Modified: incubator/slider/site/trunk/content/docs/ssl.md URL: http://svn.apache.org/viewvc/incubator/slider/site/trunk/content/docs/ssl.md?rev=1669188&r1=1669187&r2=1669188&view=diff ============================================================================== --- incubator/slider/site/trunk/content/docs/ssl.md (original) +++ incubator/slider/site/trunk/content/docs/ssl.md Wed Mar 25 17:59:26 2015 @@ -1,3 +1,21 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License.
You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + # Set Up Two-Way SSL Between the Slider Agents and the Application Master Two-way SSL provides a higher level of secure communication between the Slider Application Master and Agents by requiring both to verify each other's identity prior to the exchange of HTTP requests and responses. By default the communication mechanism between the two is One-Way SSL. To enable Two-way SSL: @@ -11,4 +29,4 @@ Two-way SSL provides a higher level of s } } -* Create and start the cluster (e.g. by using the slider command line leveraging the "create" option) +* Create and start the cluster (e.g. by using the slider command line and the "create" option)
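As a sketch of that final "create" step, the command matches the one shown earlier in these docs; the cluster name (`cl1`) and file paths below are placeholders for your own values, and a Slider client install is assumed.

```shell
# Sketch only: cluster name and paths are placeholders; assumes the Slider
# client is installed and appConfig.json/resources.json already exist.
CREATE_CMD="./slider create cl1 --template /work/appConf.json --resources /work/resources.json"
echo "$CREATE_CMD"
# Uncomment to actually run it against a live cluster:
# $CREATE_CMD
```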