potiuk commented on a change in pull request #4543: [AIRFLOW-3718] Multi-layered version of the docker image
URL: https://github.com/apache/airflow/pull/4543#discussion_r248747685
 
 

 ##########
 File path: Dockerfile
 ##########
 @@ -16,26 +16,83 @@
 
 FROM python:3.6-slim
 
-COPY . /opt/airflow/
+SHELL ["/bin/bash", "-c"]
+
+# Make sure noninteractive debian install is used
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Increase the value to force reinstalling all apt-get dependencies
+ENV FORCE_REINSTALL_APT_GET_DEPENDENCIES=1
+
+# Install core build dependencies
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    libkrb5-dev libsasl2-dev libssl-dev libffi-dev libpq-dev git \
+    && apt-get clean
+
+# Install useful utilities and other dependencies required by airflow
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+    libsasl2-dev freetds-bin build-essential default-libmysqlclient-dev apt-utils \
+    curl rsync netcat locales \
+    && apt-get clean
 
 ARG AIRFLOW_HOME=/usr/local/airflow
-ARG AIRFLOW_DEPS="all"
-ARG PYTHON_DEPS=""
-ARG buildDeps="freetds-dev libkrb5-dev libsasl2-dev libssl-dev libffi-dev libpq-dev git"
-ARG APT_DEPS="$buildDeps libsasl2-dev freetds-bin build-essential default-libmysqlclient-dev apt-utils curl rsync netcat locales"
+RUN mkdir -p $AIRFLOW_HOME
+
+# Airflow extras to be installed
+ARG AIRFLOW_EXTRAS="all"
+
+# Increase the value here to force reinstalling Apache Airflow pip dependencies
+ENV FORCE_REINSTALL_ALL_PIP_DEPENDENCIES=1
+
+# Speeds up building the image - building the cassandra driver without CYTHON
+# saves around 10 minutes of build time on a typical machine
+ARG CASS_DRIVER_NO_CYTHON_ARG=""
+
+# Build cassandra driver on multiple CPUs
+ENV CASS_DRIVER_BUILD_CONCURRENCY=8
+
+# Speeds up the installation of cassandra driver
+ENV CASS_DRIVER_NO_CYTHON=${CASS_DRIVER_NO_CYTHON_ARG}
+
+# Airflow requires this variable to be set on installation to avoid a GPL dependency.
+ENV SLUGIFY_USES_TEXT_UNIDECODE yes
+
+# Airflow sources change frequently but dependency configuration won't change
+# that often. We copy setup.py and other files needed to perform setup of
+# dependencies. This way the cache here will only be invalidated if any of the
+# version/setup configuration files change, but not when airflow sources change
 
 Review comment:
   The dependencies should be resolved properly and minor updates should be 
installed as needed. It works as follows:
   
   1) The first `pip install` will install the dependencies as they are at the moment the Dockerfile is first built (line 74).
   
   2) Whenever any of the airflow sources change, the docker layer at line 77 gets invalidated and the image is rebuilt starting from that line. Then we upgrade apt-get dependencies to the latest ones (line 80) and run `pip install` again (without using the old cache) to see if the transitive pip dependencies resolve to different versions of related packages. In most cases this will be a no-op, but if some transitive dependencies resolve differently, the new versions will get installed at that time (line 89).
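   The re-resolution step described in 2) could look roughly like this in Dockerfile form (a simplified sketch with illustrative paths, not the exact file in this PR):
   
   ```dockerfile
   # Any change under the sources invalidates this COPY layer and every
   # layer after it (path is illustrative)
   COPY . /opt/airflow/
   
   # Pick up minor apt-get updates released since the cached layers were built
   RUN apt-get update \
       && apt-get upgrade -y --no-install-recommends \
       && apt-get clean
   
   # Re-run pip install without the old layer cache; usually a no-op, but
   # transitive dependencies may now resolve to newer versions
   RUN pip install -e /opt/airflow/[all]
   ```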
   
   3) Whenever any of the setup.*, README, version.py or airflow script files change (all files related to setup), we go back one step further - the Docker cache is invalidated at one of the lines 66-70 and the build continues from there - which means that a completely fresh `pip install` is run from scratch.
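   The layer ordering behind 3) is the usual pattern of copying the dependency-defining files before the rest of the sources (a sketch; file names are illustrative):
   
   ```dockerfile
   # Only the files that define dependencies are copied first, so the
   # expensive pip install layer below stays cached until one of them changes
   COPY setup.py setup.cfg airflow/version.py /opt/airflow/
   RUN pip install -e /opt/airflow/[all]
   
   # Sources are copied afterwards; editing them leaves the layers above cached
   COPY . /opt/airflow/
   ```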
   
   4) We can always force installation from scratch by increasing the value of ENV FORCE_REINSTALL_APT_GET_DEPENDENCIES (currently = 1). If we manually increase it to 2 and commit such a Dockerfile, it will invalidate the docker cache at line 25 and the whole installation is run from scratch.
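   Mechanically, forcing that full reinstall is just editing the counter line in the Dockerfile; a small sketch of the edit (using a throwaway file rather than the real Dockerfile):
   
   ```shell
   # Write a throwaway copy of the counter line
   printf 'ENV FORCE_REINSTALL_APT_GET_DEPENDENCIES=1\n' > /tmp/Dockerfile.demo
   
   # Bumping the counter changes the instruction's text, so docker's build
   # cache for this layer and everything after it is invalidated next build
   sed -i 's/APT_GET_DEPENDENCIES=1/APT_GET_DEPENDENCIES=2/' /tmp/Dockerfile.demo
   
   cat /tmp/Dockerfile.demo
   ```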
   
   5) The only thing I am not 100% sure about (but about 99%) is what happens when a new version of python:3.6-slim gets released. I believe it will trigger the build from scratch, but that's something we will have to confirm with DockerHub (it might depend on how they do caching, as this part might be done differently). It's not a big problem even if this is not the case, because at point 3) we do `apt-get upgrade`, and at that time the python version will get upgraded as well if a new one has been released. So this is more of an optimisation question than a problem.
    
   
   So eventually - whenever you make any change to airflow sources, you are sure that both `apt-get upgrade` and `pip install` have been called. Whenever you make a change to setup.py, you are sure that `pip install` is called from scratch; and whenever you need to, you can force `apt-get install` from scratch.
   
   I think this is fairly robust and expected behaviour.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
