Repository: spark
Updated Branches:
  refs/heads/branch-1.5 37c5edf1c -> 88a07d89e


Docs small fixes

Author: Jacek Laskowski <ja...@japila.pl>

Closes #8629 from jaceklaskowski/docs-fixes.

(cherry picked from commit 6ceed852ab716d8acc46ce90cba9cfcff6d3616f)
Signed-off-by: Sean Owen <so...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/88a07d89
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/88a07d89
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/88a07d89

Branch: refs/heads/branch-1.5
Commit: 88a07d89e91c139a65d3a2d46632500a93b615c3
Parents: 37c5edf
Author: Jacek Laskowski <ja...@japila.pl>
Authored: Tue Sep 8 14:38:10 2015 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Tue Sep 8 14:38:19 2015 +0100

----------------------------------------------------------------------
 docs/building-spark.md   | 23 +++++++++++------------
 docs/cluster-overview.md | 15 ++++++++-------
 2 files changed, 19 insertions(+), 19 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/88a07d89/docs/building-spark.md
----------------------------------------------------------------------
diff --git a/docs/building-spark.md b/docs/building-spark.md
index f133eb9..4db32cf 100644
--- a/docs/building-spark.md
+++ b/docs/building-spark.md
@@ -61,12 +61,13 @@ If you don't run this, you may see errors like the following:
 You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
 
 **Note:**
-* *For Java 8 and above this step is not required.*
-* *If using `build/mvn` and `MAVEN_OPTS` were not already set, the script will automate this for you.*
+
+* For Java 8 and above this step is not required.
+* If using `build/mvn` with no `MAVEN_OPTS` set, the script will automate this for you.
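
(For reference, a typical `MAVEN_OPTS` setting along the lines suggested earlier on this page; the exact memory sizes are illustrative and may vary between Spark versions:)

    # Illustrative sizes; adjust for your environment and Spark version
    export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"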
 
 # Specifying the Hadoop Version
 
-Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions:
+Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the `hadoop.version` property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions:
 
 <table class="table">
   <thead>
@@ -91,7 +92,7 @@ mvn -Dhadoop.version=1.2.1 -Phadoop-1 -DskipTests clean package
 mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phadoop-1 -DskipTests clean package
 {% endhighlight %}
 
-You can enable the "yarn" profile and optionally set the "yarn.version" property if it is different from "hadoop.version". Spark only supports YARN versions 2.2.0 and later.
+You can enable the `yarn` profile and optionally set the `yarn.version` property if it is different from `hadoop.version`. Spark only supports YARN versions 2.2.0 and later.
 
 Examples:
 
@@ -125,7 +126,7 @@ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -Dskip
 # Building for Scala 2.11
 To produce a Spark package compiled with Scala 2.11, use the `-Dscala-2.11` property:
 
-    dev/change-scala-version.sh 2.11
+    ./dev/change-scala-version.sh 2.11
     mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
 
 Spark does not yet support its JDBC component for Scala 2.11.
@@ -163,11 +164,9 @@ the `spark-parent` module).
 
 Thus, the full flow for running continuous-compilation of the `core` submodule may look more like:
 
-```
- $ mvn install
- $ cd core
- $ mvn scala:cc
-```
+    $ mvn install
+    $ cd core
+    $ mvn scala:cc
 
 # Building Spark with IntelliJ IDEA or Eclipse
 
@@ -193,11 +192,11 @@ then ship it over to the cluster. We are investigating the exact cause for this.
 
 # Packaging without Hadoop Dependencies for YARN
 
-The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath.  The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
+The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with `yarn.application.classpath`.  The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
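
(A hedged sketch of such a build, combining the `hadoop-provided` profile with the YARN and Hadoop 2.4 profiles used in the examples above; adjust profiles and versions for your environment:)

    # Profile/version choices are illustrative, not prescriptive
    mvn -Pyarn -Phadoop-2.4 -Phadoop-provided -DskipTests clean package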
 
 # Building with SBT
 
-Maven is the official recommendation for packaging Spark, and is the "build of reference".
+Maven is the official build tool recommended for packaging Spark, and is the *build of reference*.
 But SBT is supported for day-to-day development since it can provide much faster iterative
 compilation. More advanced developers may wish to use SBT.
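
(As a sketch, assuming the bundled `build/sbt` script and the same Maven profiles as above, an SBT assembly build might look like:)

    # Profiles mirror the Maven examples; swap in the ones you need
    build/sbt -Pyarn -Phadoop-2.4 assembly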
 

http://git-wip-us.apache.org/repos/asf/spark/blob/88a07d89/docs/cluster-overview.md
----------------------------------------------------------------------
diff --git a/docs/cluster-overview.md b/docs/cluster-overview.md
index 7079de5..faaf154 100644
--- a/docs/cluster-overview.md
+++ b/docs/cluster-overview.md
@@ -5,18 +5,19 @@ title: Cluster Mode Overview
 
 This document gives a short overview of how Spark runs on clusters, to make it easier to understand
 the components involved. Read through the [application submission guide](submitting-applications.html)
-to submit applications to a cluster.
+to learn about launching applications on a cluster.
 
 # Components
 
-Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext
+Spark applications run as independent sets of processes on a cluster, coordinated by the `SparkContext`
 object in your main program (called the _driver program_).
+
 Specifically, to run on a cluster, the SparkContext can connect to several types of _cluster managers_
-(either Spark's own standalone cluster manager or Mesos/YARN), which allocate resources across
+(either Spark's own standalone cluster manager, Mesos or YARN), which allocate resources across
 applications. Once connected, Spark acquires *executors* on nodes in the cluster, which are
 processes that run computations and store data for your application.
 Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to
-the executors. Finally, SparkContext sends *tasks* for the executors to run.
+the executors. Finally, SparkContext sends *tasks* to the executors to run.
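
(As a hedged illustration of the flow described above, launching an application against a standalone cluster manager might look roughly like the following; the master host, example class and jar path are placeholders:)

    # <master-host> and the jar path are placeholders for your own cluster and application
    ./bin/spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master spark://<master-host>:7077 \
      /path/to/spark-examples.jar 100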
 
 <p style="text-align: center;">
   <img src="img/cluster-overview.png" title="Spark cluster components" alt="Spark cluster components" />
@@ -33,9 +34,9 @@ There are several useful things to note about this architecture:
 2. Spark is agnostic to the underlying cluster manager. As long as it can acquire executor
    processes, and these communicate with each other, it is relatively easy to run it even on a
    cluster manager that also supports other applications (e.g. Mesos/YARN).
-3. The driver program must listen for and accept incoming connections from its executors throughout 
-   its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config 
-   section](configuration.html#networking)). As such, the driver program must be network 
+3. The driver program must listen for and accept incoming connections from its executors throughout
+   its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
+   section](configuration.html#networking)). As such, the driver program must be network
    addressable from the worker nodes.
 4. Because the driver schedules tasks on the cluster, it should be run close to the worker
    nodes, preferably on the same local area network. If you'd like to send requests to the

