Repository: spark
Updated Branches:
  refs/heads/branch-2.4 8c508da2a -> ea11d1142


[SPARK-25023] Clarify Spark security documentation

## What changes were proposed in this pull request?

Clarify documentation about security.

## How was this patch tested?

None, just documentation

Closes #22852 from tgravescs/SPARK-25023.

Authored-by: Thomas Graves <tgra...@thirteenroutine.corp.gq1.yahoo.com>
Signed-off-by: Thomas Graves <tgra...@apache.org>
(cherry picked from commit c00186f90cfcc33492d760f874ead34f0e3da6ed)
Signed-off-by: Thomas Graves <tgra...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ea11d114
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ea11d114
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ea11d114

Branch: refs/heads/branch-2.4
Commit: ea11d114264560638129eac1db3aa1dc12a206a2
Parents: 8c508da
Author: Thomas Graves <tgra...@thirteenroutine.corp.gq1.yahoo.com>
Authored: Fri Nov 2 10:56:30 2018 -0500
Committer: Thomas Graves <tgra...@apache.org>
Committed: Fri Nov 2 10:56:44 2018 -0500

----------------------------------------------------------------------
 docs/index.md                 |  5 +++++
 docs/quick-start.md           |  5 +++++
 docs/running-on-kubernetes.md |  5 +++++
 docs/running-on-mesos.md      |  5 +++++
 docs/running-on-yarn.md       |  5 +++++
 docs/security.md              | 17 +++++++++++++++--
 docs/spark-standalone.md      |  5 +++++
 7 files changed, 45 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ea11d114/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index 40f628b..0300528 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -10,6 +10,11 @@ It provides high-level APIs in Java, Scala, Python and R,
 and an optimized engine that supports general execution graphs.
 It also supports a rich set of higher-level tools including [Spark SQL](sql-programming-guide.html) for SQL and structured data processing, [MLlib](ml-guide.html) for machine learning, [GraphX](graphx-programming-guide.html) for graph processing, and [Spark Streaming](streaming-programming-guide.html).
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) before downloading and running Spark.
+
 # Downloading
 
 Get Spark from the [downloads page](https://spark.apache.org/downloads.html) of the project website. This documentation is for Spark version {{site.SPARK_VERSION}}. Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions.

http://git-wip-us.apache.org/repos/asf/spark/blob/ea11d114/docs/quick-start.md
----------------------------------------------------------------------
diff --git a/docs/quick-start.md b/docs/quick-start.md
index ef7af6c..28186c1 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -17,6 +17,11 @@ you can download a package for any version of Hadoop.
 
 Note that, before Spark 2.0, the main programming interface of Spark was the Resilient Distributed Dataset (RDD). After Spark 2.0, RDDs are replaced by Dataset, which is strongly-typed like an RDD, but with richer optimizations under the hood. The RDD interface is still supported, and you can get a more detailed reference at the [RDD programming guide](rdd-programming-guide.html). However, we highly recommend you to switch to use Dataset, which has better performance than RDD. See the [SQL programming guide](sql-programming-guide.html) to get more information about Dataset.
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) before running Spark.
+
 # Interactive Analysis with the Spark Shell
 
 ## Basics

http://git-wip-us.apache.org/repos/asf/spark/blob/ea11d114/docs/running-on-kubernetes.md
----------------------------------------------------------------------
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index f19aa41..754b1ff 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -12,6 +12,11 @@ Kubernetes scheduler that has been added to Spark.
 In future versions, there may be behavioral changes around configuration,
 container images and entrypoints.**
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
+
 # Prerequisites
 
 * A runnable distribution of Spark 2.3 or above.

http://git-wip-us.apache.org/repos/asf/spark/blob/ea11d114/docs/running-on-mesos.md
----------------------------------------------------------------------
diff --git a/docs/running-on-mesos.md b/docs/running-on-mesos.md
index b473e65..2502cd4 100644
--- a/docs/running-on-mesos.md
+++ b/docs/running-on-mesos.md
@@ -13,6 +13,11 @@ The advantages of deploying Spark with Mesos include:
   [frameworks](https://mesos.apache.org/documentation/latest/frameworks/)
 - scalable partitioning between multiple instances of Spark
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
+
 # How it Works
 
 In a standalone cluster deployment, the cluster manager in the below diagram is a Spark master

http://git-wip-us.apache.org/repos/asf/spark/blob/ea11d114/docs/running-on-yarn.md
----------------------------------------------------------------------
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index e3d67c3..f265075 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -9,6 +9,11 @@ Support for running on [YARN (Hadoop
 NextGen)](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html)
 was added to Spark in version 0.6.0, and improved in subsequent releases.
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
+
 # Launching Spark on YARN
 
 Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster.
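The environment-variable setup described above can be sketched as a shell snippet. This is a hedged illustration only: the `/etc/hadoop/conf` path is an assumed common default, not something specified by this commit, and must be adjusted for your cluster:

```shell
# Assumed path: point Spark at the directory holding the Hadoop client-side configs
export HADOOP_CONF_DIR=/etc/hadoop/conf
# YARN mode reads YARN_CONF_DIR the same way; reusing the Hadoop dir is typical
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
echo "Hadoop config dir: $HADOOP_CONF_DIR"
```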

http://git-wip-us.apache.org/repos/asf/spark/blob/ea11d114/docs/security.md
----------------------------------------------------------------------
diff --git a/docs/security.md b/docs/security.md
index 7fb3e17..2948fbc 100644
--- a/docs/security.md
+++ b/docs/security.md
@@ -6,7 +6,20 @@ title: Security
 * This will become a table of contents (this text will be scraped).
 {:toc}
 
-# Spark RPC
+# Spark Security: Things You Need To Know
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Spark supports multiple deployment types and each one supports different levels of security. Not
+all deployment types will be secure in all environments and none are secure by default. Be
+sure to evaluate your environment, what Spark supports, and take the appropriate measures to secure
+your Spark deployment.
+
+There are many different types of security concerns. Spark does not necessarily protect against
+all of them. Listed below are some of the things Spark supports. Also check the documentation
+for the type of deployment you are using for deployment-specific settings. Anything
+not documented, Spark does not support.
+
+# Spark RPC (Communication protocol between Spark processes)
 
 ## Authentication
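As a hedged sketch of what enabling RPC authentication can look like in practice (the secret value is a placeholder; `spark.authenticate` and `spark.authenticate.secret` are the standard properties, but consult security.md for the authoritative list and per-deployment behavior), a `conf/spark-defaults.conf` fragment might read:

```
# Require shared-secret authentication on the Spark RPC channel
spark.authenticate         true
# Pre-shared secret (placeholder); on YARN the secret is generated automatically instead
spark.authenticate.secret  <your-secret-here>
```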
 
@@ -123,7 +136,7 @@ The following table describes the different options available for configuring th
 Spark supports encrypting temporary data written to local disks. This covers shuffle files, shuffle
 spills and data blocks stored on disk (for both caching and broadcast variables). It does not cover
 encrypting output data generated by applications with APIs such as `saveAsHadoopFile` or
-`saveAsTable`.
+`saveAsTable`. It also may not cover temporary files created explicitly by the user.
 
 The following settings cover enabling encryption for data written to disk:
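As a sketch of what those local-disk encryption settings look like (the property names are the standard Spark I/O encryption options; the key size shown is an illustrative assumption, not part of this commit), a `conf/spark-defaults.conf` fragment might read:

```
# Enable encryption of temporary data written to local disks
# (shuffle files, shuffle spills, cached and broadcast blocks)
spark.io.encryption.enabled        true
# Optional tuning: key size in bits for the generated encryption key
spark.io.encryption.keySizeBits    128
```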
 

http://git-wip-us.apache.org/repos/asf/spark/blob/ea11d114/docs/spark-standalone.md
----------------------------------------------------------------------
diff --git a/docs/spark-standalone.md b/docs/spark-standalone.md
index 7975b0c..49ef2e1 100644
--- a/docs/spark-standalone.md
+++ b/docs/spark-standalone.md
@@ -8,6 +8,11 @@ title: Spark Standalone Mode
 
 In addition to running on the Mesos or YARN cluster managers, Spark also provides a simple standalone deploy mode. You can launch a standalone cluster either manually, by starting a master and workers by hand, or use our provided [launch scripts](#cluster-launch-scripts). It is also possible to run these daemons on a single machine for testing.
 
+# Security
+
+Security in Spark is OFF by default. This could mean you are vulnerable to attack by default.
+Please see [Spark Security](security.html) and the specific security sections in this doc before running Spark.
+
 # Installing Spark Standalone to a Cluster
 
 To install Spark Standalone mode, you simply place a compiled version of Spark on each node on the cluster. You can obtain pre-built versions of Spark with each release or [build it yourself](building-spark.html).

