Repository: spark
Updated Branches:
  refs/heads/branch-2.3 84707f0c6 -> ea9da6152


[SPARK-22960][K8S] Make build-push-docker-images.sh more dev-friendly.

- Make it possible to build images from a git clone.
- Make it easy to use minikube to test things.

Also fixed what seemed like a bug: the base image wasn't getting the tag
provided in the command line. Adding the tag allows users to use multiple
Spark builds in the same kubernetes cluster.
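
For context, the kind of invocation this enables (the tag "testing" is
illustrative, mirroring the usage text added below):

  # from either a git clone or a distribution, build images straight into
  # minikube's Docker daemon with a custom tag
  ./sbin/build-push-docker-images.sh -m -t testing build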

Tested by deploying images on minikube and running spark-submit from a dev
environment; also by building the images with different tags and verifying
"docker images" in minikube.

Author: Marcelo Vanzin <van...@cloudera.com>

Closes #20154 from vanzin/SPARK-22960.

(cherry picked from commit 0428368c2c5e135f99f62be20877bbbda43be310)
Signed-off-by: Marcelo Vanzin <van...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ea9da615
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ea9da615
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ea9da615

Branch: refs/heads/branch-2.3
Commit: ea9da6152af9223787cffd83d489741b4cc5aa34
Parents: 84707f0
Author: Marcelo Vanzin <van...@cloudera.com>
Authored: Thu Jan 4 16:34:56 2018 -0800
Committer: Marcelo Vanzin <van...@cloudera.com>
Committed: Thu Jan 4 16:35:07 2018 -0800

----------------------------------------------------------------------
 docs/running-on-kubernetes.md                   |   9 +-
 .../src/main/dockerfiles/driver/Dockerfile      |   3 +-
 .../src/main/dockerfiles/executor/Dockerfile    |   3 +-
 .../main/dockerfiles/init-container/Dockerfile  |   3 +-
 .../src/main/dockerfiles/spark-base/Dockerfile  |   7 +-
 sbin/build-push-docker-images.sh                | 120 +++++++++++++++----
 6 files changed, 117 insertions(+), 28 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ea9da615/docs/running-on-kubernetes.md
----------------------------------------------------------------------
diff --git a/docs/running-on-kubernetes.md b/docs/running-on-kubernetes.md
index e491329..2d69f63 100644
--- a/docs/running-on-kubernetes.md
+++ b/docs/running-on-kubernetes.md
@@ -16,6 +16,9 @@ Kubernetes scheduler that has been added to Spark.
 you may setup a test cluster on your local machine using
 [minikube](https://kubernetes.io/docs/getting-started-guides/minikube/).
   * We recommend using the latest release of minikube with the DNS addon enabled.
+  * Be aware that the default minikube configuration is not enough for running Spark applications.
+  We recommend 3 CPUs and 4g of memory to be able to start a simple Spark application with a single
+  executor.
 * You must have appropriate permissions to list, create, edit and delete
 [pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources
 by running `kubectl auth can-i <list|create|edit|delete> pods`.
@@ -197,7 +200,7 @@ kubectl port-forward <driver-pod-name> 4040:4040
 
 Then, the Spark driver UI can be accessed on `http://localhost:4040`.
 
-### Debugging 
+### Debugging
 
 There may be several kinds of failures. If the Kubernetes API server rejects the request made from spark-submit, or the
 connection is refused for a different reason, the submission logic should indicate the error encountered. However, if there
@@ -215,8 +218,8 @@ If the pod has encountered a runtime error, the status can be probed further usi
 kubectl logs <spark-driver-pod>
 ```
 
-Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark 
-application, includling all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of 
+Status and logs of failed executor pods can be checked in similar ways. Finally, deleting the driver pod will clean up the entire spark
+application, including all executors, associated service, etc. The driver pod can be thought of as the Kubernetes representation of
 the Spark application.
 
 ## Kubernetes Features
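
For reference, a minikube instance sized per the recommendation added above can
be started with something like the following (flag spellings may differ across
minikube versions):

  minikube start --cpus 3 --memory 4096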

http://git-wip-us.apache.org/repos/asf/spark/blob/ea9da615/resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile
----------------------------------------------------------------------
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile
index 45fbcd9..ff5289e 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/driver/Dockerfile
@@ -15,7 +15,8 @@
 # limitations under the License.
 #
 
-FROM spark-base
+ARG base_image
+FROM ${base_image}
 
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.

http://git-wip-us.apache.org/repos/asf/spark/blob/ea9da615/resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile
----------------------------------------------------------------------
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile
index 0f806cf..3eabb42 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/executor/Dockerfile
@@ -15,7 +15,8 @@
 # limitations under the License.
 #
 
-FROM spark-base
+ARG base_image
+FROM ${base_image}
 
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.

http://git-wip-us.apache.org/repos/asf/spark/blob/ea9da615/resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile
----------------------------------------------------------------------
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile
index 047056a..e0a249e 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/init-container/Dockerfile
@@ -15,7 +15,8 @@
 # limitations under the License.
 #
 
-FROM spark-base
+ARG base_image
+FROM ${base_image}
 
 # If this docker file is being used in the context of building your images from a Spark distribution, the docker build
 # command should be invoked from the top level directory of the Spark distribution. E.g.:

http://git-wip-us.apache.org/repos/asf/spark/blob/ea9da615/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile
----------------------------------------------------------------------
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile
index 222e777..da1d6b9 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile
@@ -17,6 +17,9 @@
 
 FROM openjdk:8-alpine
 
+ARG spark_jars
+ARG img_path
+
 # Before building the docker image, first build and make a Spark distribution following
 # the instructions in http://spark.apache.org/docs/latest/building-spark.html.
 # If this docker file is being used in the context of building your images from a Spark
@@ -34,11 +37,11 @@ RUN set -ex && \
     ln -sv /bin/bash /bin/sh && \
     chgrp root /etc/passwd && chmod ug+rw /etc/passwd
 
-COPY jars /opt/spark/jars
+COPY ${spark_jars} /opt/spark/jars
 COPY bin /opt/spark/bin
 COPY sbin /opt/spark/sbin
 COPY conf /opt/spark/conf
-COPY kubernetes/dockerfiles/spark-base/entrypoint.sh /opt/
+COPY ${img_path}/spark-base/entrypoint.sh /opt/
 
 ENV SPARK_HOME /opt/spark
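
For reference, the new build arguments above are what the updated
sbin/build-push-docker-images.sh (below) supplies; run by hand from a Spark
distribution, the equivalent would look roughly like this (the "dev" tag is
illustrative):

  docker build --build-arg spark_jars=jars \
    --build-arg img_path=kubernetes/dockerfiles \
    -t spark-base:dev \
    -f kubernetes/dockerfiles/spark-base/Dockerfile .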
 

http://git-wip-us.apache.org/repos/asf/spark/blob/ea9da615/sbin/build-push-docker-images.sh
----------------------------------------------------------------------
diff --git a/sbin/build-push-docker-images.sh b/sbin/build-push-docker-images.sh
index b313759..bb8806d 100755
--- a/sbin/build-push-docker-images.sh
+++ b/sbin/build-push-docker-images.sh
@@ -19,29 +19,94 @@
 # This script builds and pushes docker images when run from a release of Spark
 # with Kubernetes support.
 
-declare -A path=( [spark-driver]=kubernetes/dockerfiles/driver/Dockerfile \
-                  [spark-executor]=kubernetes/dockerfiles/executor/Dockerfile \
-                  [spark-init]=kubernetes/dockerfiles/init-container/Dockerfile )
+function error {
+  echo "$@" 1>&2
+  exit 1
+}
+
+# Detect whether this is a git clone or a Spark distribution and adjust paths
+# accordingly.
+if [ -z "${SPARK_HOME}" ]; then
+  SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
+fi
+. "${SPARK_HOME}/bin/load-spark-env.sh"
+
+if [ -f "$SPARK_HOME/RELEASE" ]; then
+  IMG_PATH="kubernetes/dockerfiles"
+  SPARK_JARS="jars"
+else
+  IMG_PATH="resource-managers/kubernetes/docker/src/main/dockerfiles"
+  SPARK_JARS="assembly/target/scala-$SPARK_SCALA_VERSION/jars"
+fi
+
+if [ ! -d "$IMG_PATH" ]; then
+  error "Cannot find docker images. This script must be run from a runnable distribution of Apache Spark."
+fi
+
+declare -A path=( [spark-driver]="$IMG_PATH/driver/Dockerfile" \
+                  [spark-executor]="$IMG_PATH/executor/Dockerfile" \
+                  [spark-init]="$IMG_PATH/init-container/Dockerfile" )
+
+function image_ref {
+  local image="$1"
+  local add_repo="${2:-1}"
+  if [ $add_repo = 1 ] && [ -n "$REPO" ]; then
+    image="$REPO/$image"
+  fi
+  if [ -n "$TAG" ]; then
+    image="$image:$TAG"
+  fi
+  echo "$image"
+}
 
 function build {
-  docker build -t spark-base -f kubernetes/dockerfiles/spark-base/Dockerfile .
+  local base_image="$(image_ref spark-base 0)"
+  docker build --build-arg "spark_jars=$SPARK_JARS" \
+    --build-arg "img_path=$IMG_PATH" \
+    -t "$base_image" \
+    -f "$IMG_PATH/spark-base/Dockerfile" .
   for image in "${!path[@]}"; do
-    docker build -t ${REPO}/$image:${TAG} -f ${path[$image]} .
+    docker build --build-arg "base_image=$base_image" -t "$(image_ref $image)" -f ${path[$image]} .
   done
 }
 
-
 function push {
   for image in "${!path[@]}"; do
-    docker push ${REPO}/$image:${TAG}
+    docker push "$(image_ref $image)"
   done
 }
 
 function usage {
-  echo "This script must be run from a runnable distribution of Apache Spark."
-  echo "Usage: ./sbin/build-push-docker-images.sh -r <repo> -t <tag> build"
-  echo "       ./sbin/build-push-docker-images.sh -r <repo> -t <tag> push"
-  echo "for example: ./sbin/build-push-docker-images.sh -r docker.io/myrepo -t 
v2.3.0 push"
+  cat <<EOF
+Usage: $0 [options] [command]
+Builds or pushes the built-in Spark Docker images.
+
+Commands:
+  build       Build images.
+  push        Push images to a registry. Requires a repository address to be provided, both
+              when building and when pushing the images.
+
+Options:
+  -r repo     Repository address.
+  -t tag      Tag to apply to built images, or to identify images to be pushed.
+  -m          Use minikube's Docker daemon.
+
+Using minikube when building images will do so directly into minikube's Docker daemon.
+There is no need to push the images into minikube in that case, they'll be automatically
+available when running applications inside the minikube cluster.
+
+Check the following documentation for more information on using the minikube Docker daemon:
+
+  https://kubernetes.io/docs/getting-started-guides/minikube/#reusing-the-docker-daemon
+
+Examples:
+  - Build images in minikube with tag "testing"
+    $0 -m -t testing build
+
+  - Build and push images with tag "v2.3.0" to docker.io/myrepo
+    $0 -r docker.io/myrepo -t v2.3.0 build
+    $0 -r docker.io/myrepo -t v2.3.0 push
+EOF
 }
 
 if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
@@ -49,21 +114,36 @@ if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
   exit 0
 fi
 
-while getopts r:t: option
+REPO=
+TAG=
+while getopts mr:t: option
 do
  case "${option}"
  in
  r) REPO=${OPTARG};;
  t) TAG=${OPTARG};;
+ m)
+   if ! which minikube 1>/dev/null; then
+     error "Cannot find minikube."
+   fi
+   eval $(minikube docker-env)
+   ;;
  esac
 done
 
-if [ -z "$REPO" ] || [ -z "$TAG" ]; then
+case "${@: -1}" in
+  build)
+    build
+    ;;
+  push)
+    if [ -z "$REPO" ]; then
+      usage
+      exit 1
+    fi
+    push
+    ;;
+  *)
     usage
-else
-  case "${@: -1}" in
-    build) build;;
-    push) push;;
-    *) usage;;
-  esac
-fi
+    exit 1
+    ;;
+esac
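
As an illustration of the image naming this produces (repository and tag values
are taken from the usage example above), running
"./sbin/build-push-docker-images.sh -r docker.io/myrepo -t v2.3.0 build" yields:

  spark-base:v2.3.0                        # base image is tagged but never repo-prefixed
  docker.io/myrepo/spark-driver:v2.3.0
  docker.io/myrepo/spark-executor:v2.3.0
  docker.io/myrepo/spark-init:v2.3.0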

