GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/23019

    [SPARK-26025][k8s] Speed up docker image build on dev repo.

    The "build context" for a docker image - basically the entire contents of
    the directory where "docker" is invoked - can be huge in a dev build,
    easily exceeding a couple of gigabytes.
    
    Copying that context three times while building the docker images severely
    slows down the process.
    
    This patch creates a smaller build context - basically mimicking what the
    make-distribution.sh script does - so that when building the docker images,
    only the necessary bits are in the current directory. For PySpark and R this
    is optimized further, since those images are built on top of the previously
    built Spark main image.
    
    In my current local clone, the directory is about 2G, but with this script
    the context sent to docker is about 250M for the main image, 1M for the
    PySpark image, and 8M for the R image. That speeds up the image builds
    considerably.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-26025

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23019.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23019
    
----
commit 1625baa24923fd2bb6dbfb06cc52d059b64f3d0f
Author: Marcelo Vanzin <vanzin@...>
Date:   2018-11-12T22:13:00Z

    [SPARK-26025][k8s] Speed up docker image build on dev repo.
    
    The "build context" for a docker image - basically the entire contents of
    the directory where "docker" is invoked - can be huge in a dev build,
    easily exceeding a couple of gigabytes.
    
    Copying that context three times while building the docker images severely
    slows down the process.
    
    This patch creates a smaller build context - basically mimicking what the
    make-distribution.sh script does - so that when building the docker images,
    only the necessary bits are in the current directory. For PySpark and R this
    is optimized further, since those images are built on top of the previously
    built Spark main image.
    
    In my current local clone, the directory is about 2G, but with this script
    the context sent to docker is about 250M for the main image, 1M for the
    PySpark image, and 8M for the R image. That speeds up the image builds
    considerably.

----


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
