Repository: spark
Updated Branches:
  refs/heads/master c5fe41292 -> c9914cf04
[MINOR][DOCS] Add note about Spark network security

## What changes were proposed in this pull request?

In response to a recent question, this reiterates that network access to a Spark cluster should be
disabled by default, and that access to its hosts and services from outside a private network should
be added back explicitly.

Also, some minor touch-ups while I was at it.

## How was this patch tested?

N/A

Author: Sean Owen <sro...@gmail.com>

Closes #21947 from srowen/SecurityNote.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c9914cf0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c9914cf0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c9914cf0

Branch: refs/heads/master
Commit: c9914cf0490d13820fb4081eb05188b4903eb980
Parents: c5fe412
Author: Sean Owen <sro...@gmail.com>
Authored: Thu Aug 2 10:22:52 2018 +0800
Committer: hyukjinkwon <gurwls...@apache.org>
Committed: Thu Aug 2 10:22:52 2018 +0800

----------------------------------------------------------------------
 docs/security.md         | 23 ++++++++++++++++++-----
 docs/spark-standalone.md | 15 +++++++++++----
 2 files changed, 29 insertions(+), 9 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/c9914cf0/docs/security.md
----------------------------------------------------------------------
diff --git a/docs/security.md b/docs/security.md
index 6ef3a80..1de1d63 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -278,7 +278,7 @@ To enable authorization in the SHS, a few extra options are used:
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
   <tr>
-    <td>spark.history.ui.acls.enable</td>
+    <td><code>spark.history.ui.acls.enable</code></td>
     <td>false</td>
     <td>
       Specifies whether ACLs should be checked to authorize users viewing the applications in
@@ -292,7 +292,7 @@ To enable authorization in the SHS, a few extra options are used:
     </td>
   </tr>
   <tr>
-    <td>spark.history.ui.admin.acls</td>
+    <td><code>spark.history.ui.admin.acls</code></td>
     <td>None</td>
     <td>
       Comma separated list of users that have view access to all the Spark applications in history
@@ -300,7 +300,7 @@ To enable authorization in the SHS, a few extra options are used:
     </td>
   </tr>
   <tr>
-    <td>spark.history.ui.admin.acls.groups</td>
+    <td><code>spark.history.ui.admin.acls.groups</code></td>
     <td>None</td>
     <td>
       Comma separated list of groups that have view access to all the Spark applications in history
@@ -501,6 +501,7 @@ can be accomplished by setting `spark.ssl.useNodeLocalConf` to `true`. In that c
 provided by the user on the client side are not used.
 
 ### Mesos mode
+
 Mesos 1.3.0 and newer supports `Secrets` primitives as both file-based and environment based
 secrets. Spark allows the specification of file-based and environment variable based secrets with
 `spark.mesos.driver.secret.filenames` and `spark.mesos.driver.secret.envkeys`, respectively.
@@ -562,8 +563,12 @@ Security.
 
 # Configuring Ports for Network Security
 
-Spark makes heavy use of the network, and some environments have strict requirements for using tight
-firewall settings. Below are the primary ports that Spark uses for its communication and how to
+Generally speaking, a Spark cluster and its services are not deployed on the public internet.
+They are generally private services, and should only be accessible within the network of the
+organization that deploys Spark. Access to the hosts and ports used by Spark services should
+be limited to origin hosts that need to access the services.
+
+Below are the primary ports that Spark uses for its communication and how to
 configure those ports.
 
 ## Standalone mode only
@@ -598,6 +603,14 @@ configure those ports.
     <td>Set to "0" to choose a port randomly. Standalone mode only.</td>
   </tr>
   <tr>
+    <td>External Service</td>
+    <td>Standalone Master</td>
+    <td>6066</td>
+    <td>Submit job to cluster via REST API</td>
+    <td><code>spark.master.rest.port</code></td>
+    <td>Use <code>spark.master.rest.enabled</code> to enable/disable this service. Standalone mode only.</td>
+  </tr>
+  <tr>
     <td>Standalone Master</td>
     <td>Standalone Worker</td>
     <td>(random)</td>


http://git-wip-us.apache.org/repos/asf/spark/blob/c9914cf0/docs/spark-standalone.md
----------------------------------------------------------------------
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 14d742d..7975b0c 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -362,8 +362,15 @@ You can run Spark alongside your existing Hadoop cluster by just launching it as
 
 # Configuring Ports for Network Security
 
-Spark makes heavy use of the network, and some environments have strict requirements for using
-tight firewall settings. For a complete list of ports to configure, see the
+Generally speaking, a Spark cluster and its services are not deployed on the public internet.
+They are generally private services, and should only be accessible within the network of the
+organization that deploys Spark. Access to the hosts and ports used by Spark services should
+be limited to origin hosts that need to access the services.
+
+This is particularly important for clusters using the standalone resource manager, as they do
+not support fine-grained access control in a way that other resource managers do.
+
+For a complete list of ports to configure, see the
 [security page](security.html#configuring-ports-for-network-security).
 
 # High Availability
@@ -376,7 +383,7 @@ By default, standalone scheduling clusters are resilient to Worker failures (ins
 
 Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple
 Masters in your cluster connected to the same ZooKeeper instance. One will be elected "leader" and
 the others will remain in standby mode. If the current leader dies, another Master will be elected,
 recover the old Master's state, and then resume scheduling. The entire recovery process (from the
 time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only
 affects scheduling _new_ applications -- applications that were already running during Master
 failover are unaffected.
 
-Learn more about getting started with ZooKeeper [here](http://zookeeper.apache.org/doc/current/zookeeperStarted.html).
+Learn more about getting started with ZooKeeper [here](https://zookeeper.apache.org/doc/current/zookeeperStarted.html).
 
 **Configuration**
 
@@ -419,6 +426,6 @@ In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spa
 
 **Details**
 
-* This solution can be used in tandem with a process monitor/manager like [monit](http://mmonit.com/monit/), or just to enable manual recovery via restart.
+* This solution can be used in tandem with a process monitor/manager like [monit](https://mmonit.com/monit/), or just to enable manual recovery via restart.
 
 * While filesystem recovery seems straightforwardly better than not doing any recovery at all, this
 mode may be suboptimal for certain development or experimental purposes. In particular, killing a
 master via stop-master.sh does not clean up its recovery state, so whenever you start a new Master,
 it will enter recovery mode. This could increase the startup time by up to 1 minute if it needs to
 wait for all previously-registered Workers/clients to timeout.

* While it's not officially supported, you could mount an NFS directory as the recovery directory.
If the original Master node dies completely, you could then start a Master on a different node,
which would correctly recover all previously registered Workers/applications (equivalent to
ZooKeeper recovery). Future applications will have to be able to find the new Master, however, in
order to register.
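The patched spark-standalone.md text mentions enabling the ZooKeeper recovery mode by setting SPARK_DAEMON_JAVA_OPTS. As a minimal sketch of what that looks like in practice (the ZooKeeper hostnames, port, and the `/spark` znode path below are placeholder assumptions, not values from this commit):

```shell
# conf/spark-env.sh on each standalone Master host.
# spark.deploy.recoveryMode=ZOOKEEPER turns on ZooKeeper-based leader
# election; url/dir point the Masters at a shared ZooKeeper ensemble.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1.example.com:2181,zk2.example.com:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"
```

All Masters started with this configuration join the same election; only the elected leader schedules applications, and the standbys take over on failure as described above.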