potiuk commented on a change in pull request #4543: [AIRFLOW-3718] Multi-layered version of the docker image URL: https://github.com/apache/airflow/pull/4543#discussion_r248651860
########## File path: Dockerfile ##########
@@ -16,26 +16,83 @@
 FROM python:3.6-slim
-COPY . /opt/airflow/
+SHELL ["/bin/bash", "-c"]
+
+# Make sure noninteractive debian install is used
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Increase the value to force reinstalling all apt-get dependencies
+ENV FORCE_REINSTALL_APT_GET_DEPENDENCIES=1
+
+# Install core build dependencies
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    libkrb5-dev libsasl2-dev libssl-dev libffi-dev libpq-dev git \
+    && apt-get clean
+
+# Install useful utilities and other airflow required dependencies
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    libsasl2-dev freetds-bin build-essential default-libmysqlclient-dev apt-utils \
+    curl rsync netcat locales \
+    && apt-get clean
 
 ARG AIRFLOW_HOME=/usr/local/airflow
-ARG AIRFLOW_DEPS="all"
-ARG PYTHON_DEPS=""
-ARG buildDeps="freetds-dev libkrb5-dev libsasl2-dev libssl-dev libffi-dev libpq-dev git"
-ARG APT_DEPS="$buildDeps libsasl2-dev freetds-bin build-essential default-libmysqlclient-dev apt-utils curl rsync netcat locales"
+RUN mkdir -p $AIRFLOW_HOME
+
+# Airflow extras to be installed
+ARG AIRFLOW_EXTRAS="all"
+
+# Increase the value here to force reinstalling Apache Airflow pip dependencies
+ENV FORCE_REINSTALL_ALL_PIP_DEPENDENCIES=1
+
+# Speeds up building the image - cassandra driver without CYTHON saves around 10 minutes
+# of build on typical machine
+ARG CASS_DRIVER_NO_CYTHON_ARG=""
+
+# Build cassandra driver on multiple CPUs
+ENV CASS_DRIVER_BUILD_CONCURRENCY=8
+
+# Speeds up the installation of cassandra driver
+ENV CASS_DRIVER_NO_CYTHON=${CASS_DRIVER_NO_CYTHON_ARG}
+
+# Airflow requires this variable to be set on installation to avoid a GPL dependency.
+ENV SLUGIFY_USES_TEXT_UNIDECODE yes
+
+# Airflow sources change frequently but dependency configuration won't change that often
+# We copy setup.py and other files needed to perform setup of dependencies
+# This way cache here will only be invalidated if any of the
+# version/setup configuration change but not when airflow sources change
+COPY setup.* /opt/airflow/
+COPY README.md /opt/airflow/
+COPY airflow/version.py /opt/airflow/airflow/version.py
+COPY airflow/__init__.py /opt/airflow/airflow/__init__.py
+COPY airflow/bin/airflow /opt/airflow/airflow/bin/airflow
 
 WORKDIR /opt/airflow
-RUN set -x \
-    && apt update \
-    && if [ -n "${APT_DEPS}" ]; then apt install -y $APT_DEPS; fi \
-    && if [ -n "${PYTHON_DEPS}" ]; then pip install --no-cache-dir ${PYTHON_DEPS}; fi \
-    && pip install --no-cache-dir -e .[$AIRFLOW_DEPS] \
-    && apt purge --auto-remove -yqq $buildDeps \
-    && apt autoremove -yqq --purge \
-    && apt clean
-
-WORKDIR $AIRFLOW_HOME
-RUN mkdir -p $AIRFLOW_HOME
+# First install only dependencies but no Apache Airflow itself - this way regular Airflow
+RUN pip install --no-cache-dir -e .[$AIRFLOW_EXTRAS]
+
+# Cache for this will be automatically invalidated if any of airflow sources change
+COPY . /opt/airflow/

Review comment:
It's the whole point of the solution. We must copy only the setup*-related files first, to avoid cache invalidation whenever the sources change. A COPY of setup.py will not invalidate the cache if that file has not changed: docker calculates a hash of the file, and if the hash is unchanged, the cache is not invalidated. If we copy the whole directory instead, docker calculates hashes for all files in the directory and invalidates the cache when any of them changes. That's why we need to copy only the files that might potentially impact the dependencies (like setup.py), plus the files that are needed to run setup.py (but not all the sources, as they would trigger invalidation on every commit).
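The layering idea described here can be reduced to a minimal sketch (illustrative only, not the PR's exact Dockerfile; the bare `pip install -e .` line and the set of copied files are simplified assumptions):

```dockerfile
FROM python:3.6-slim

WORKDIR /opt/airflow

# Layer 1: copy only the files that define the dependencies. Docker hashes
# just these files, so ordinary source-code commits do not invalidate this layer.
COPY setup.py /opt/airflow/

# Layer 2: install dependencies. Rebuilt only when the layer above changes,
# i.e. when setup.py itself changes.
RUN pip install --no-cache-dir -e .

# Layer 3: copy the full source tree. Every commit invalidates this layer,
# but the expensive dependency layer above is still served from the cache.
COPY . /opt/airflow/
```

The design choice is purely about cache-key granularity: the fewer files a COPY touches, the fewer events can invalidate everything built after it.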
On the other hand, if setup.py changes, it will invalidate the cache. Only after the installation of dependencies should we COPY all of the airflow sources (at that point the packages are already installed and are taken from the cache). Or maybe we are talking about something different?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services