Repository: samza
Updated Branches:
  refs/heads/master 2a75a209b -> c3db20483


Improve documentation for Resource Localization

This is a follow-up to Fred Ji's original PR:
https://github.com/apache/samza/pull/191

Author: vjagadish1989 <jvenk...@linkedin.com>

Reviewers: Prateek Maheshwari <pmahe...@linkedin.com>

Closes #199 from vjagadish1989/doc-improvements


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/c3db2048
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/c3db2048
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/c3db2048

Branch: refs/heads/master
Commit: c3db2048374f1ed520a32b2f9e5534e8dcdb413b
Parents: 2a75a20
Author: vjagadish1989 <jvenk...@linkedin.com>
Authored: Mon May 22 17:42:32 2017 -0700
Committer: vjagadish1989 <jvenk...@linkedin.com>
Committed: Mon May 22 17:42:32 2017 -0700

----------------------------------------------------------------------
 .../yarn/yarn-resource-localization.md          | 59 +++++++-------------
 1 file changed, 19 insertions(+), 40 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/c3db2048/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md b/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md
index a55670b..3d1c87a 100644
--- a/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md
+++ b/docs/learn/documentation/versioned/yarn/yarn-resource-localization.md
@@ -18,58 +18,49 @@ title: YARN Resource Localization
    See the License for the specific language governing permissions and
    limitations under the License.
 -->
-
-When Samza jobs run on YARN clusters, sometimes there are needs to preload 
some files or data (called as resources in this doc) before job starts, such as 
preparing the job package, fetching job certificate, or etc., Samza supports a 
general configuration way to localize difference resources.
+When running Samza jobs on YARN clusters, you may need to download some resources before startup (for example, downloading the job binaries or fetching certificate files). This step is called Resource Localization.
 
 ### Resource Localization Process
 
-For the Samza jobs running on YARN, the resource localization leverages the 
YARN node manager localization service. Here is a good [deep 
dive](https://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/) 
from Horton Works on how localization works in YARN. 
-
-Depending on where and how the resource comes from, fetching the resource is 
associated with a scheme in the path, such as `http`, `https`, `hdfs`, `ftp`, 
`file`, etc., which maps to a certain FileSystem for handling the localization. 
+For Samza jobs running on YARN, resource localization leverages the YARN node 
manager's localization service. Here is a [deep 
dive](https://hortonworks.com/blog/resource-localization-in-yarn-deep-dive/) on 
how localization works in YARN. 
 
-If there is an implementation of 
[FileSystem](https://hadoop.apache.org/docs/stable/api/index.html?org/apache/hadoop/fs/FileSystem.html)
 on YARN supporting a scheme, then that scheme can be used for resource 
localization. 
+Depending on where and how the resource comes from, fetching the resource is associated with a scheme in the path (such as `http`, `https`, `hdfs`, `ftp`, or `file`). The scheme maps to a corresponding `FileSystem` implementation for handling the localization.
 
-There are some predefined file systems in Hadoop or Samza, which are provided 
if you run Samza jobs on YARN:
+There are some predefined `FileSystem` implementations in Hadoop and Samza, 
which are provided if you run Samza jobs on YARN:
 
-* `org.apache.samza.util.hadoop.HttpFileSystem`: used for fetching resources 
based on http, or https without client side authentication requirement.
+* `org.apache.samza.util.hadoop.HttpFileSystem`: used for fetching resources based on http or https without client-side authentication.
 * `org.apache.hadoop.hdfs.DistributedFileSystem`: used for fetching resource 
from DFS system on Hadoop.
 * `org.apache.hadoop.fs.LocalFileSystem`: used for copying resources from 
local file system to the job directory.
 * `org.apache.hadoop.fs.ftp.FTPFileSystem`: used for fetching resources based 
on ftp.
-* ...
 
-If you would like to have your own file system, you need to implement a class 
which extends from `org.apache.hadoop.fs.FileSystem`. 
+You can create your own file system implementation by writing a class that extends `org.apache.hadoop.fs.FileSystem`.
 
-### Job Configuration
-With the configuration properly defined, the resources a job requiring from 
external or internal locations may be prepared automatically before it runs.
-
-For each resource with the name `<resourceName>` in the Samza job, the 
following set of job configurations are used when running on a YARN cluster. 
The first one which definiing resource path is required, but the others are 
optional and they have default values.
+### Resource Configuration
+You can specify a resource to be localized using the following configuration.
 
+#### Required Configuration
 1. `yarn.resources.<resourceName>.path`
-    * Required
-    * The path for fetching the resource for localization, e.g. 
http://hostname.com/packages/mySamzaJob
+    * The path for fetching the resource for localization, e.g. 
http://hostname.com/packages/myResource
+
+#### Optional Configuration
 2. `yarn.resources.<resourceName>.local.name`
-    * Optional 
     * The local name used for the localized resource.
-    * If not set, the default one will be `<resourceName>` from the config key.
+    * If it is not set, the default will be the `<resourceName>` specified in 
`yarn.resources.<resourceName>.path`
 3. `yarn.resources.<resourceName>.local.type`
-    * Optional 
-    * Localized resource type with valid values from: `ARCHIVE`, `FILE`, 
`PATTERN`.
+    * The type of the resource. Valid values are `ARCHIVE`, `FILE`, and `PATTERN`.
         * ARCHIVE: the localized resource will be an archived directory;
         * FILE: the localized resource will be a file;
         * PATTERN: the localized resource will be the entries extracted from 
the archive with the pattern.
-    * If not set, the default value is `FILE`.
+    * If it is not set, the default value is `FILE`.
 4. `yarn.resources.<resourceName>.local.visibility`
-    * Optional
-    * Localized resource visibility for the resource, and it can be a value 
from `PUBLIC`, `PRIVATE`, `APPLICATION`
+    * Visibility of the resource. Valid values are `PUBLIC`, `PRIVATE`, and `APPLICATION`.
         * PUBLIC: visible to everyone 
         * PRIVATE: visible to just the account which runs the job
         * APPLICATION: visible only to the specific application job which has 
the resource configuration
-    * If not set, the default value is `APPLICATION`
-
-It is up to you how to name the resource, but `<resourceName>` should be the 
same in the above configurations to apply to the same resource. 
+    * If it is not set, the default value is `APPLICATION`
 
 ### YARN Configuration
-Make sure the scheme used in the yarn.resources.&lt;resourceName&gt;.path is 
configured in YARN core-site.xml with a FileSystem implementation. For example, 
for scheme `http`, you should have the following property in YARN core-site.xml:
+Make sure the scheme used in the `yarn.resources.<resourceName>.path` is 
configured with a corresponding FileSystem implementation in YARN core-site.xml.
 
 {% highlight xml %}
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
@@ -81,19 +72,7 @@ Make sure the scheme used in the yarn.resources.&lt;resourceName&gt;.path is con
 </configuration>
 {% endhighlight %}
 
-You can override a behavior for a scheme by linking it to another file system. 
For example, you have a special need for localizing a resource for your job 
through http request, you may implement your own Http File System by extending 
[FileSystem](https://hadoop.apache.org/docs/stable/api/index.html?org/apache/hadoop/fs/FileSystem.html),
 and have the following configuration:
-
-{% highlight xml %}
-<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-<configuration>
-    <property>
-      <name>fs.http.impl</name>
-      <value>com.myCompany.MyHttpFileSystem</value>
-    </property>
-</configuration>
-{% endhighlight %}
-
-If you are using other scheme which is not defined in Hadoop or Samza, for 
example, `yarn.resources.mySampleResource.path=myScheme://host.com/test`, in 
your job configuration, you may implement your own 
[FileSystem](https://hadoop.apache.org/docs/stable/api/index.html?org/apache/hadoop/fs/FileSystem.html)
 such as com.myCompany.MySchemeFileSystem and link it with your own scheme in 
yarn core-site.xml configuration.
+If you are using your own scheme (for example, 
`yarn.resources.myResource.path=myScheme://host.com/test`), you can link your 
[FileSystem](https://hadoop.apache.org/docs/stable/api/index.html?org/apache/hadoop/fs/FileSystem.html)
 implementation with it as follows.
 
 {% highlight xml %}
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
