[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-20 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r158168783
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,573 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized Spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `kubernetes/dockerfiles/` directory and can be 
customized further before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--master k8s://https://: \
+--deploy-mode cluster \
+--name spark-pi \
--- End diff --

We need to explicitly call out that app names in k8s mode cannot have 
spaces and special characters.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-20 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r158008588
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-20 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157967544
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-20 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157966640
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized Spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
--- End diff --

This has been addressed and the script has been fixed now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-20 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157965524
  
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by 
the
 to be runnable, use `./dev/make-distribution.sh` in the project root 
directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
--- End diff --

Ping @mridulm, @rxin - can you please confirm the scope of the renaming you 
were referring to here? Is it just the maven target? Changing all the config 
options etc would be a considerably large change at this point. Also, a point 
that was brought up today was - while k8s is common shorthand, it's not 
universal as is the full name.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-15 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157293530
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

https://github.com/apache/spark/pull/19995 PTAL


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-15 Thread erikerlandson
Github user erikerlandson commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157274013
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

I'd expect the container run-time to be transparent to the particular 
image, and very definitely transparent to spark.  Using `container.image` makes 
sense from either standpoint.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-15 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157269442
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

Makes sense. I think `container.image` should be fine then - will send a PR 
changing that. 
I don't think there are plans to support choosing the runtime on the fly, 
or letting the CRI abstraction leak into the application layer.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157266862
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

+1 to @ueshin's comment; if the same property supports more than docker 
images, it should be renamed to use a more generic name.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157133964
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

I wonder if k8s can tell the container runtime from the image name? If it 
can, we can use `container.image` here, but otherwise, I guess we need another 
config like `xxx.container` to tell the container runtime and 
`xxx.${container}.image` to specify the image, e.g. `xxx.container=docker` and 
`xxx.docker.image=something`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157120443
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

Just my two cents, if we could foresee that we may add support to other 
containers, then we should consider using `container.image` instead of 
`docker.image` at the beginning, because rename a config is also considered a 
kind of behavior change, that commonly takes more effort to process. If it is 
not that case (we are satisfied to be only supporting docker containers), then 
it would be fine to just keep the `docker.image` phrase.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157111248
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

I see your point - Although there is flexibility in theory, as of now, it's 
safe to assume that *most* people are running docker containers when using k8s 
- making the name docker much more intuitive. If the other runtimes do see 
traction in future (and we do some testing around them), we can rename to 
`container.image` instead of `docker.image`. As of now, I can make the 
documentation clearer that spark on k8s only supports docker images. Sound like 
a reasonable thing to do here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157110529
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

My comment is more about the properties being called "docker" and whether 
that means only docker images are supported. If you can use any image supported 
by the k8s cluster, than pehaps the properties should be renamed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157095536
  
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by 
the
 to be runnable, use `./dev/make-distribution.sh` in the project root 
directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
--- End diff --

Are you both also referring to the config options etc, which are still 
`spark.kubernetes.*`, or just the maven build target? If it's everything, it 
would be a fairly large change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157093891
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized Spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--master k8s://https://: \
+--deploy-mode cluster \
+--name spark-pi \
+--class org.apache.spark.examples.SparkPi \
+--conf spark.executor.instances=5 \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///path/to/examples.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP protocol is specified in 
the URL, it defaults to `https`. For example,
+setting the master to `k8s://example.com:443` is 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157093878
  
--- Diff: docs/running-on-yarn.md ---
@@ -18,7 +18,8 @@ Spark application's configuration (driver, executors, and 
the AM when running in
 
 There are two deploy modes that can be used to launch Spark applications 
on YARN. In `cluster` mode, the Spark driver runs inside an application master 
process which is managed by YARN on the cluster, and the client can go away 
after initiating the application. In `client` mode, the driver runs in the 
client process, and the application master is only used for requesting 
resources from YARN.
 
-Unlike [Spark standalone](spark-standalone.html) and 
[Mesos](running-on-mesos.html) modes, in which the master's address is 
specified in the `--master` parameter, in YARN mode the ResourceManager's 
address is picked up from the Hadoop configuration. Thus, the `--master` 
parameter is `yarn`.
+Unlike [Spark standalone](spark-standalone.html), 
[Mesos](running-on-mesos.html) and [Kubernetes](running-on-kubernetes.html) 
modes,
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157093475
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized Spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
--- End diff --

It is not yet true - it's a TODO item on this PR. Will clarify after we get 
that change in.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157093212
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

@vanzin, we don't make any runtime level assumptions in Spark code. The K8s 
abstraction layer 
[CRI](http://blog.kubernetes.io/2016/12/container-runtime-interface-cri-in-kubernetes.html)
 should in theory allow using a differnet runtime.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread erikerlandson
Github user erikerlandson commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157076409
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

the images should be runnable by other container runtimes (cri-o, rkt, 
etc), although I'm not aware of anybody testing that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread erikerlandson
Github user erikerlandson commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157075744
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized Spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--master k8s://https://: \
+--deploy-mode cluster \
+--name spark-pi \
+--class org.apache.spark.examples.SparkPi \
+--conf spark.executor.instances=5 \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///path/to/examples.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP protocol is specified in 
the URL, it defaults to `https`. For example,
+setting the master to `k8s://example.com:443` is 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157064360
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized Spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--master k8s://https://: \
+--deploy-mode cluster \
+--name spark-pi \
+--class org.apache.spark.examples.SparkPi \
+--conf spark.executor.instances=5 \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///path/to/examples.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP protocol is specified in 
the URL, it defaults to `https`. For example,
+setting the master to `k8s://example.com:443` is 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157063804
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized Spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
--- End diff --

"the `dockerfiles/` directory in the Spark distribution archive"?

(Assuming that's true. Need to clarify where to find `dockerfiles/`.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157066105
  
--- Diff: docs/running-on-yarn.md ---
@@ -18,7 +18,8 @@ Spark application's configuration (driver, executors, and 
the AM when running in
 
 There are two deploy modes that can be used to launch Spark applications 
on YARN. In `cluster` mode, the Spark driver runs inside an application master 
process which is managed by YARN on the cluster, and the client can go away 
after initiating the application. In `client` mode, the driver runs in the 
client process, and the application master is only used for requesting 
resources from YARN.
 
-Unlike [Spark standalone](spark-standalone.html) and 
[Mesos](running-on-mesos.html) modes, in which the master's address is 
specified in the `--master` parameter, in YARN mode the ResourceManager's 
address is picked up from the Hadoop configuration. Thus, the `--master` 
parameter is `yarn`.
+Unlike [Spark standalone](spark-standalone.html), 
[Mesos](running-on-mesos.html) and [Kubernetes](running-on-kubernetes.html) 
modes,
--- End diff --

Should just say "other cluster managers supported by Spark" at this point.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157063535
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
--- End diff --

> a container runtime environment that Kubernetes supports

Does the current code support these other container runtimes, or just 
docker? From my memory (and the propery names) it seems that only docker is 
supported?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157065401
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,502 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest release of minikube with the DNS addon 
enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark 
application to a Kubernetes cluster.
+The submission mechanism works as follows:
+
+* Spark creates a Spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API until it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized Spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--master k8s://https://: \
+--deploy-mode cluster \
+--name spark-pi \
+--class org.apache.spark.examples.SparkPi \
+--conf spark.executor.instances=5 \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///path/to/examples.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP protocol is specified in 
the URL, it defaults to `https`. For example,
+setting the master to `k8s://example.com:443` is 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157039537
  
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by 
the
 to be runnable, use `./dev/make-distribution.sh` in the project root 
directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
--- End diff --

I actually second @rxin, I do get the spelling wrong at times :-)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-14 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r157006909
  
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by 
the
 to be runnable, use `./dev/make-distribution.sh` in the project root 
directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
--- End diff --

It's `k8s://` in the URL scheme we use and the package names are also `k8s` 
- so, users should never have to type the name.
This is one of the last few holdouts. I'd say that it's consistent here, 
with the use of other cluster manager names in full in their maven projects. I 
can change it here if you feel strongly about it.





---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156821519
  
--- Diff: docs/building-spark.md ---
@@ -49,7 +49,7 @@ To create a Spark distribution like those distributed by 
the
 to be runnable, use `./dev/make-distribution.sh` in the project root 
directory. It can be configured
 with Maven profile settings and so on like the direct Maven build. Example:
 
-./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn
+./dev/make-distribution.sh --name custom-spark --pip --r --tgz 
-Psparkr -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Pyarn -Pkubernetes
--- End diff --

should we use k8s? I kept bringing this up and that's because I can never 
spell Kubernetes properly. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156728882
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
--- End diff --

Some edit probably buried this query ... would be great if you can opine 
@foxish, thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156728638
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
--- End diff --

You are right, I did not realize mesos did not honour it and it was yarn 
only option !
It might make sense for k8s also to support it too (since it does not right 
now, my proposal for doc change would be incorrect) .. what do you think 
@foxish  ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156727858
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
--- End diff --

Just to clarify - local:///my/path/jar implies local to the docker 
container ? and /my/path/jar points to submitters machine ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156693078
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156692510
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
--- End diff --

That is specified currently as a YARN-only option in `spark-submit`. Does 
it make sense to move it out to suit both K8s and YARN? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156691165
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156688079
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
--- End diff --

Removed


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156687169
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
--- End diff --

Done. The other references have been removed. Couldn't find any others.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156690501
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
--- End diff --

Fair enough, I just wanted to show that it had to be a path. The protocol 
`local:///` is necessary to point at container-local files, as opposed to local 
file paths on the submitting user's machine which will both be handled 
separately. Changed it to `local:///path/to/examples.jar`. Is that better?


---


[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156687477
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156686786
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
--- End diff --

As of today, preemption is at random. 
[priority and 
preemption](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/)
 are in alpha as of now. As soon as they go to beta (in the Spark 2.4 
timeframe), we'll add the required pieces to make it honor the rule as you said 
- driver is the last to go, etc. 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156675246
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
--- End diff --

As an aside - should have asked about it in scheduler review - how does 
kubernetes handle prioritization of resources ? In particular, if it is 
pre-empting containers, is there a relative ordering ?
Will driver will always be the last to go ?
If this has to be user specified, do we need to add doc link to it here ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156677468
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
--- End diff --

--num-executors instead ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156672841
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
--- End diff --

Remove experimental ? I think there are other references (a grep should 
catch them all) and we can remove them all now ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156677077
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
--- End diff --

Remove this default value from example ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156683488
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156679139
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
--- End diff --

Just the filename ?

I want to make sure it is intuitive and easy for users looking at examples 
- and it looks familiar with how spark is typically invoked. (Though the scheme 
is referenced here 
https://github.com/apache/spark/pull/19946/files#diff-b5527f236b253e0d9f5db5164bdb43e9R108
 : I am assuming avoiding it defaults to local)


---


[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156674433
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
--- End diff --

Just curious - what about disk usage ? What is it proportional to ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156677643
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
--- End diff --

Reorder so that master is above deploy-mode ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156678749
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
--- End diff --

--name instead ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156682354
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156680716
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156679585
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156680213
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156682749
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156683182
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156678936
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156680776
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156682918
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156679094
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156679614
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156680505
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156679949
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156683316
  
--- Diff: docs/submitting-applications.md ---
@@ -155,6 +165,12 @@ The master URL passed to Spark can be in one of the 
following formats:
 client or cluster mode depending on the 
value of --deploy-mode.
 The cluster location will be found based on the 
HADOOP_CONF_DIR or YARN_CONF_DIR variable.
 
+ k8s://HOST:PORT  Connect to a  Kubernetes  cluster in
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156679055
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156678238
  
--- Diff: docs/index.md ---
@@ -112,7 +113,7 @@ options for deployment:
   * [Mesos](running-on-mesos.html): deploy a private cluster using
   [Apache Mesos](http://mesos.apache.org)
   * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen 
(YARN)
-  * [Kubernetes 
(experimental)](https://github.com/apache-spark-on-k8s/spark): deploy Spark on 
top of Kubernetes
+  * [Kubernetes (experimental)](running-on-kubernetes.html): deploy Spark 
on top of Kubernetes
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156679300
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156641676
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156639188
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156641285
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156640797
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156643797
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156633323
  
--- Diff: docs/index.md ---
@@ -112,7 +113,7 @@ options for deployment:
   * [Mesos](running-on-mesos.html): deploy a private cluster using
   [Apache Mesos](http://mesos.apache.org)
   * [YARN](running-on-yarn.html): deploy Spark on top of Hadoop NextGen 
(YARN)
-  * [Kubernetes 
(experimental)](https://github.com/apache-spark-on-k8s/spark): deploy Spark on 
top of Kubernetes
+  * [Kubernetes (experimental)](running-on-kubernetes.html): deploy Spark 
on top of Kubernetes
--- End diff --

We can remove `(experimental)` here as well?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156641748
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-13 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156639540
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156499655
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156497503
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
--- End diff --

s/till/until


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156499210
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156497184
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
--- End diff --

`spark-submit`

"The submission mechanism works as follows:"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156498525
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156499173
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156500437
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156500162
  
--- Diff: docs/submitting-applications.md ---
@@ -155,6 +165,12 @@ The master URL passed to Spark can be in one of the 
following formats:
 client or cluster mode depending on the 
value of --deploy-mode.
 The cluster location will be found based on the 
HADOOP_CONF_DIR or YARN_CONF_DIR variable.
 
+ k8s://HOST:PORT  Connect to a  Kubernetes  cluster in
--- End diff --

Remove spaces around "Kubernetes".


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156498835
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156496946
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
--- End diff --

This sentence doesn't read right.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156499363
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,498 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This feature makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also running within Kubernetes 
pods and connects to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+{% highlight bash %}
+$ bin/spark-submit \
+--deploy-mode cluster \
+--class org.apache.spark.examples.SparkPi \
+--master k8s://https://: \
+--conf spark.kubernetes.namespace=default \
+--conf spark.executor.instances=5 \
+--conf spark.app.name=spark-pi \
+--conf spark.kubernetes.driver.docker.image= \
+--conf spark.kubernetes.executor.docker.image= \
+local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+{% endhighlight %}
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156460094
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+bin/spark-submit \
+  --deploy-mode cluster \
+  --class org.apache.spark.examples.SparkPi \
+  --master k8s://https://: \
+  --conf spark.kubernetes.namespace=default \
+  --conf spark.executor.instances=5 \
+  --conf spark.app.name=spark-pi \
+  --conf spark.kubernetes.driver.docker.image= \
+  --conf spark.kubernetes.executor.docker.image= \
+  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP protocol 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156460137
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+bin/spark-submit \
+  --deploy-mode cluster \
+  --class org.apache.spark.examples.SparkPi \
+  --master k8s://https://: \
+  --conf spark.kubernetes.namespace=default \
+  --conf spark.executor.instances=5 \
+  --conf spark.app.name=spark-pi \
+  --conf spark.kubernetes.driver.docker.image= \
+  --conf spark.kubernetes.executor.docker.image= \
+  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP protocol 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156460053
  
--- Diff: sbin/build-push-docker-images.sh ---
@@ -0,0 +1,67 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This script builds and pushes docker images when run from a release of 
Spark
+# with Kubernetes support.
+
+declare -A path=( [spark-driver]=dockerfiles/driver/Dockerfile \
+  [spark-executor]=dockerfiles/executor/Dockerfile )
+
+function build {
+  docker build -t spark-base -f dockerfiles/spark-base/Dockerfile .
+  for image in "${!path[@]}"; do
+docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
+  done
+}
+
+
+function push {
+  for image in "${!path[@]}"; do
+docker push ${REPO}/$image:${TAG}
+  done
+}
+
+function usage {
+  echo "Usage: ./sbin/build-push-docker-images.sh -r  -t  build"
+  echo "   ./sbin/build-push-docker-images.sh -r  -t  push"
+  echo "for example: ./sbin/build-push-docker-images.sh -r 
docker.io/myrepo -t v2.3.0 push"
+}
+
+if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
+  usage
+  exit 0
+fi
+
+while getopts r:t: option
+do
+ case "${option}"
+ in
+ r) REPO=${OPTARG};;
+ t) TAG=${OPTARG};;
+ esac
+done
+
+if [ -z "$REPO" ] || [ -z "$TAG" ]; then
+usage
+else
+  case "${@: -1}" in
+build) build;;
+push) push;;
+*) usage;;
+  esac
+fi
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156460169
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+bin/spark-submit \
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156457660
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156459761
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+bin/spark-submit \
+  --deploy-mode cluster \
+  --class org.apache.spark.examples.SparkPi \
+  --master k8s://https://: \
+  --conf spark.kubernetes.namespace=default \
+  --conf spark.executor.instances=5 \
+  --conf spark.app.name=spark-pi \
+  --conf spark.kubernetes.driver.docker.image= \
+  --conf spark.kubernetes.executor.docker.image= \
+  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP protocol 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156457806
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
--- End diff --

Done


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread foxish
Github user foxish commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156457426
  
--- Diff: sbin/build-push-docker-images.sh ---
@@ -0,0 +1,67 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This script builds and pushes docker images when run from a release of 
Spark
+# with Kubernetes support.
+
+declare -A path=( [spark-driver]=dockerfiles/driver/Dockerfile \
+  [spark-executor]=dockerfiles/executor/Dockerfile )
+
+function build {
+  docker build -t spark-base -f dockerfiles/spark-base/Dockerfile .
+  for image in "${!path[@]}"; do
+docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
+  done
+}
+
+
+function push {
+  for image in "${!path[@]}"; do
+docker push ${REPO}/$image:${TAG}
+  done
+}
+
+function usage {
--- End diff --

We do already do that [in our 
fork](https://github.com/apache-spark-on-k8s/spark/blob/branch-2.2-kubernetes/sbin/build-push-docker-images.sh#L24)
 - and when we get PySpark and R submitted (in Spark 2.4 hopefully), we will 
extend this script.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156354521
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
--- End diff --

"which are also Kubernetes pods" -> "which are also running within 
Kubernetes pods"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156353526
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
--- End diff --

"This features" -> "This feature"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156401095
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+bin/spark-submit \
+  --deploy-mode cluster \
+  --class org.apache.spark.examples.SparkPi \
+  --master k8s://https://: \
+  --conf spark.kubernetes.namespace=default \
+  --conf spark.executor.instances=5 \
+  --conf spark.app.name=spark-pi \
+  --conf spark.kubernetes.driver.docker.image= \
+  --conf spark.kubernetes.executor.docker.image= \
+  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156348786
  
--- Diff: sbin/build-push-docker-images.sh ---
@@ -0,0 +1,67 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This script builds and pushes docker images when run from a release of 
Spark
+# with Kubernetes support.
+
+declare -A path=( [spark-driver]=dockerfiles/driver/Dockerfile \
+  [spark-executor]=dockerfiles/executor/Dockerfile )
+
+function build {
+  docker build -t spark-base -f dockerfiles/spark-base/Dockerfile .
+  for image in "${!path[@]}"; do
+docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
+  done
+}
+
+
+function push {
+  for image in "${!path[@]}"; do
+docker push ${REPO}/$image:${TAG}
+  done
+}
+
+function usage {
+  echo "Usage: ./sbin/build-push-docker-images.sh -r  -t  build"
+  echo "   ./sbin/build-push-docker-images.sh -r  -t  push"
+  echo "for example: ./sbin/build-push-docker-images.sh -r 
docker.io/myrepo -t v2.3.0 push"
+}
+
+if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
+  usage
+  exit 0
+fi
+
+while getopts r:t: option
+do
+ case "${option}"
+ in
+ r) REPO=${OPTARG};;
+ t) TAG=${OPTARG};;
+ esac
+done
+
+if [ -z "$REPO" ] || [ -z "$TAG" ]; then
+usage
+else
+  case "${@: -1}" in
+build) build;;
+push) push;;
+*) usage;;
+  esac
+fi
--- End diff --

nit: add extra empty line


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156371701
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+bin/spark-submit \
+  --deploy-mode cluster \
+  --class org.apache.spark.examples.SparkPi \
+  --master k8s://https://: \
+  --conf spark.kubernetes.namespace=default \
+  --conf spark.executor.instances=5 \
+  --conf spark.app.name=spark-pi \
+  --conf spark.kubernetes.driver.docker.image= \
+  --conf spark.kubernetes.executor.docker.image= \
+  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156403004
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+bin/spark-submit \
+  --deploy-mode cluster \
+  --class org.apache.spark.examples.SparkPi \
+  --master k8s://https://: \
+  --conf spark.kubernetes.namespace=default \
+  --conf spark.executor.instances=5 \
+  --conf spark.app.name=spark-pi \
+  --conf spark.kubernetes.driver.docker.image= \
+  --conf spark.kubernetes.executor.docker.image= \
+  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
+
+The Spark master, specified either via passing the `--master` command line 
argument to `spark-submit` or by setting
+`spark.master` in the application's configuration, must be a URL with the 
format `k8s://`. Prefixing the
+master string with `k8s://` will cause the Spark application to launch on 
the Kubernetes cluster, with the API server
+being contacted at `api_server_url`. If no HTTP 

[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156369803
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -0,0 +1,496 @@
+---
+layout: global
+title: Running Spark on Kubernetes
+---
+* This will become a table of contents (this text will be scraped).
+{:toc}
+
+Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). 
This features makes use of the new experimental native
+Kubernetes scheduler that has been added to Spark.
+
+# Prerequisites
+
+* A runnable distribution of Spark 2.3 or above.
+* A running Kubernetes cluster at version >= 1.6 with access configured to 
it using
+[kubectl](https://kubernetes.io/docs/user-guide/prereqs/).  If you do not 
already have a working Kubernetes cluster,
+you may setup a test cluster on your local machine using
+[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
+  * We recommend using the latest releases of minikube be updated to the 
most recent version with the DNS addon enabled.
+* You must have appropriate permissions to list, create, edit and delete
+[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You 
can verify that you can list these resources
+by running `kubectl auth can-i  pods`.
+  * The service account credentials used by the driver pods must be 
allowed to create pods, services and configmaps.
+* You must have [Kubernetes 
DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) 
configured in your cluster.
+
+# How it works
+
+
+  
+
+
+spark-submit can be directly used to submit a Spark application to a 
Kubernetes cluster. The mechanism by which spark-submit happens is as follows:
+
+* Spark creates a spark driver running within a [Kubernetes 
pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/).
+* The driver creates executors which are also Kubernetes pods and connects 
to them, and executes application code.
+* When the application completes, the executor pods terminate and are 
cleaned up, but the driver pod persists
+logs and remains in "completed" state in the Kubernetes API till it's 
eventually garbage collected or manually cleaned up.
+
+Note that in the completed state, the driver pod does *not* use any 
computational or memory resources.
+
+The driver and executor pod scheduling is handled by Kubernetes. It will 
be possible to affect Kubernetes scheduling
+decisions for driver and executor pods using advanced primitives like
+[node 
selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector)
+and [node/pod 
affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
+in a future release.
+
+# Submitting Applications to Kubernetes
+
+## Docker Images
+
+Kubernetes requires users to supply images that can be deployed into 
containers within pods. The images are built to
+be run in a container runtime environment that Kubernetes supports. Docker 
is a container runtime environment that is
+frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles 
provided in the runnable distribution that can be customized
+and built for your usage.
+
+You may build these docker images from sources.
+There is a script, `sbin/build-push-docker-images.sh` that you can use to 
build and push
+customized spark distribution images consisting of all the above 
components.
+
+Example usage is:
+
+./sbin/build-push-docker-images.sh -r  -t my-tag build
+./sbin/build-push-docker-images.sh -r  -t my-tag push
+
+Docker files are under the `dockerfiles/` and can be customized further 
before
+building using the supplied script, or manually.
+
+## Cluster Mode
+
+To launch Spark Pi in cluster mode,
+
+bin/spark-submit \
--- End diff --

nit: add `{% highlight bash %}` over the command.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...

2017-12-12 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19946#discussion_r156352778
  
--- Diff: sbin/build-push-docker-images.sh ---
@@ -0,0 +1,67 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This script builds and pushes docker images when run from a release of 
Spark
+# with Kubernetes support.
+
+declare -A path=( [spark-driver]=dockerfiles/driver/Dockerfile \
+  [spark-executor]=dockerfiles/executor/Dockerfile )
+
+function build {
+  docker build -t spark-base -f dockerfiles/spark-base/Dockerfile .
+  for image in "${!path[@]}"; do
+docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
+  done
+}
+
+
+function push {
+  for image in "${!path[@]}"; do
+docker push ${REPO}/$image:${TAG}
+  done
+}
+
+function usage {
--- End diff --

So likely not for this PR, but I'm wondering for the future what you think 
about the idea of extending this to package up a usable Python env so we can 
solve the dependency management issue as well?

Really excited to see the progress :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   >