This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new dc2da72  [SPARK-26685][K8S] Correct placement of ARG declaration
dc2da72 is described below

commit dc2da72100811988ee1b31190f219b620f88f8de
Author: Rob Vesse <rve...@dotnetrdf.org>
AuthorDate: Tue Jan 22 10:31:17 2019 -0800

    [SPARK-26685][K8S] Correct placement of ARG declaration
    
    Recent Docker releases enforce build argument scope more strictly. The
    `ARG spark_uid` declaration in the Python and R Dockerfiles sits before the
    `FROM` instruction, so the variable is out of scope by the time the `USER`
    instruction uses it, and the container runs as root rather than as the
    default/configured UID.
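    
    The scoping rule can be sketched as a small check. An `ARG` declared
    before `FROM` is only visible to `FROM` lines, so a value used later must
    be declared inside the build stage. The helper name `arg_in_final_stage`
    and the two sample files are hypothetical and are not part of this
    commit; the check is deliberately simplified and ignores line
    continuations.
    
    ```shell
    # Hypothetical helper, not part of docker-image-tool.sh.
    # $1 = Dockerfile path, $2 = ARG name.
    # Succeeds only when the ARG is declared after the last FROM.
    arg_in_final_stage() {
      awk -v name="$2" '
        toupper($1) == "FROM" { found = 0 }   # a new stage clears earlier ARGs
        toupper($1) == "ARG" {
          split($2, kv, "=")                  # strip an optional default value
          if (kv[1] == name) found = 1
        }
        END { exit found ? 0 : 1 }
      ' "$1"
    }
    
    # Sample files mirroring the layout before and after this change.
    tmp=$(mktemp -d)
    printf '%s\n' 'ARG base_img' 'ARG spark_uid=185' 'FROM $base_img' \
      'USER ${spark_uid}' > "$tmp/before"
    printf '%s\n' 'ARG base_img' 'FROM $base_img' 'ARG spark_uid=185' \
      'USER ${spark_uid}' > "$tmp/after"
    
    arg_in_final_stage "$tmp/before" spark_uid \
      && echo "before: in scope" || echo "before: out of scope"
    arg_in_final_stage "$tmp/after" spark_uid \
      && echo "after: in scope" || echo "after: out of scope"
    rm -rf "$tmp"
    # prints:
    #   before: out of scope
    #   after: in scope
    ```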
    
    Also, after some of the refactoring of the script since my PR that
    introduced the configurable UID, the `-u <uid>` argument is no longer
    passed properly to the Python and R image builds when those are opted into.
    
    ## What changes were proposed in this pull request?
    
    This commit moves the `ARG` declaration to just before the argument is used
    so that it is in scope. It also ensures that the Python and R image builds
    receive the build arguments, which include the `spark_uid` argument where
    relevant.
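    
    A minimal bash sketch of the second change, assuming the shared arguments
    live in an array named `BUILD_ARGS` as in the diff. The value of
    `SPARK_UID`, the image name, and the body of `image_ref` are placeholders
    chosen for illustration, and how the real script populates `BUILD_ARGS`
    is not shown here.
    
    ```shell
    # Placeholder values; the real script derives these from its options.
    SPARK_UID=456
    image_ref() { echo "example/$1:uid456"; }   # simplified stand-in
    
    # Arguments shared by every image build, including the configured UID.
    BUILD_ARGS=(--build-arg "spark_uid=$SPARK_UID")
    
    # The binding builds extend the shared arguments instead of dropping them.
    BINDING_BUILD_ARGS=(
      "${BUILD_ARGS[@]}"
      --build-arg
      "base_img=$(image_ref spark)"
    )
    
    # Print one argument per line to show what docker build would receive.
    printf '%s\n' "${BINDING_BUILD_ARGS[@]}"
    # prints:
    #   --build-arg
    #   spark_uid=456
    #   --build-arg
    #   base_img=example/spark:uid456
    ```
    
    Expanding the array as `"${BUILD_ARGS[@]}"` keeps each element a separate
    word, so the `spark_uid` build argument reaches the Python and R builds
    intact.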
    
    ## How was this patch tested?
    
    Prior to the patch, the Python and R images that are produced ignore the
    default/configured UID:
    
    ```
    > docker run -it --entrypoint /bin/bash rvesse/spark-py:uid456
    bash-4.4# whoami
    root
    bash-4.4# id -u
    0
    bash-4.4# exit
    > docker run -it --entrypoint /bin/bash rvesse/spark:uid456
    bash-4.4$ id -u
    456
    bash-4.4$ exit
    ```
    
    Note that the Python image is still running as `root`, having ignored the
    configured UID of 456, while the base image has the correct UID because
    its `ARG` declaration is correctly in scope.
    
    After the patch the correct UID is observed:
    
    ```
    > docker run -it --entrypoint /bin/bash rvesse/spark-r:uid456
    bash-4.4$ id -u
    456
    bash-4.4$ exit
    exit
    > docker run -it --entrypoint /bin/bash rvesse/spark-py:uid456
    bash-4.4$ id -u
    456
    bash-4.4$ exit
    exit
    > docker run -it --entrypoint /bin/bash rvesse/spark:uid456
    bash-4.4$ id -u
    456
    bash-4.4$ exit
    ```
    
    Closes #23611 from rvesse/SPARK-26685.
    
    Authored-by: Rob Vesse <rve...@dotnetrdf.org>
    Signed-off-by: Marcelo Vanzin <van...@cloudera.com>
---
 bin/docker-image-tool.sh                                               | 3 ++-
 .../kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile | 2 +-
 .../docker/src/main/dockerfiles/spark/bindings/python/Dockerfile       | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/bin/docker-image-tool.sh b/bin/docker-image-tool.sh
index 4f66137..efaf09e 100755
--- a/bin/docker-image-tool.sh
+++ b/bin/docker-image-tool.sh
@@ -154,10 +154,11 @@ function build {
   fi
 
   local BINDING_BUILD_ARGS=(
-    ${BUILD_PARAMS}
+    ${BUILD_ARGS[@]}
     --build-arg
     base_img=$(image_ref spark)
   )
+
   local BASEDOCKERFILE=${BASEDOCKERFILE:-"kubernetes/dockerfiles/spark/Dockerfile"}
   local PYDOCKERFILE=${PYDOCKERFILE:-false}
   local RDOCKERFILE=${RDOCKERFILE:-false}
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile
index 9ded57c..34d449c 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile
@@ -16,7 +16,6 @@
 #
 
 ARG base_img
-ARG spark_uid=185
 
 FROM $base_img
 WORKDIR /
@@ -35,4 +34,5 @@ WORKDIR /opt/spark/work-dir
 ENTRYPOINT [ "/opt/entrypoint.sh" ]
 
 # Specify the User that the actual main process will run as
+ARG spark_uid=185
 USER ${spark_uid}
diff --git a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
index 36b91eb..5044900 100644
--- a/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
+++ b/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile
@@ -16,7 +16,6 @@
 #
 
 ARG base_img
-ARG spark_uid=185
 
 FROM $base_img
 WORKDIR /
@@ -46,4 +45,5 @@ WORKDIR /opt/spark/work-dir
 ENTRYPOINT [ "/opt/entrypoint.sh" ]
 
 # Specify the User that the actual main process will run as
+ARG spark_uid=185
 USER ${spark_uid}


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
