potiuk commented on issue #4483: [AIRFLOW-3673] Add official dockerfile
URL: https://github.com/apache/airflow/pull/4483#issuecomment-453675025
 
 
   I created a pull request from my private repo: 
https://github.com/ffinfo/incubator-airflow/pull/1 
(https://github.com/potiuk/incubator-airflow/commit/f7e3e2646823122c05f0075e5b019b21426a90fa).
It improves the original Dockerfile in the following ways:
   * there are several layers of the image:
     * two apt-get layers for basic/complex dependencies
     * upgrade apt-get layer for future upgrades in apt-get dependencies
     * layer with pip airflow dependencies installed
     * layer with source changes of airflow
     * layer with additional pip dependencies
   * Cassandra driver install time is greatly reduced: no Cython compilation 
is done, which shortens the build by 10 minutes or so. A multi-processor 
build (8 processors) is also enabled by default. Both settings can be changed 
by editing the default values in the Dockerfile, or overridden with the 
`--build-arg` flag of `docker build`
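
   As a rough illustration of the Cassandra driver point (a sketch, not the 
exact content of the commit): the base image and the pip command below are 
assumptions, and the two variable names are the ones that, as far as I know, 
the cassandra-driver setup.py reads when the driver is built from source.

```dockerfile
# Sketch only: base image and install command are assumptions.
FROM python:3.6-slim

# Default values; they can be edited here or overridden at build time, e.g.
#   docker build --build-arg CASS_DRIVER_BUILD_CONCURRENCY=4 .
ARG CASS_DRIVER_NO_CYTHON="1"
ARG CASS_DRIVER_BUILD_CONCURRENCY="8"

# ARG values are exposed as environment variables to RUN instructions,
# so a source build of the driver can pick them up here.
RUN pip install --no-cache-dir cassandra-driver
```

   Using ARG rather than ENV keeps these build-only settings out of the 
environment of the final image.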
   
   This should work as follows (if caching is enabled in Dockerhub):
   - if only the airflow sources change, without any change to the setup.py 
dependencies (the most common case), only the last layer is rebuilt and all 
the other layers are taken from the cache. This should reduce not only the 
build time but also the amount of data downloaded by users/developers. This 
rebuild can also be forced by increasing the value of the 
FORCE_REINSTALL_AIRFLOW_SOURCES env in the Dockerfile
   - if dependencies are changed in setup.py, then the last two layers are 
invalidated and rebuilt, so all pip dependencies are installed from scratch. 
This can also be forced by increasing the value of 
FORCE_REINSTALL_ALL_PIP_DEPENDENCIES
   - if we want to upgrade all apt-get dependencies, we can increase the 
value of the FORCE_UPGRADE_OF_APT_GET variable in the Dockerfile. This will 
invalidate the cache and force "apt-get upgrade"
   - if we want to force reinstalling everything from scratch, we can 
increase the value of the FORCE_REINSTALL_APT_GET_DEPENDENCIES variable in the 
Dockerfile
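
   The cache behaviour described above can be sketched roughly as follows. 
This is a simplified illustration, not the Dockerfile from the commit: the 
FORCE_* names are the ones mentioned above, while the base image, apt 
packages, paths and pip commands are assumptions. Changing the value of an 
ENV instruction invalidates the cache for that instruction and for every 
layer after it.

```dockerfile
# Sketch only: base image, packages, paths and commands are assumptions.
FROM python:3.6-slim

# Increase to rebuild everything, starting with the apt-get layer.
ENV FORCE_REINSTALL_APT_GET_DEPENDENCIES="1"
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Increase to invalidate the cache from here on and force "apt-get upgrade".
ENV FORCE_UPGRADE_OF_APT_GET="1"
RUN apt-get update \
    && apt-get upgrade -y \
    && rm -rf /var/lib/apt/lists/*

# Increase to reinstall all pip dependencies from scratch.
ENV FORCE_REINSTALL_ALL_PIP_DEPENDENCIES="1"
WORKDIR /opt/airflow
# Copy only the dependency definition, so that edits to other sources
# leave this layer cached. (The real setup.py may need more files.)
COPY setup.py setup.cfg ./
RUN pip install --no-cache-dir -e .

# Increase to force re-copying the sources even if nothing changed.
ENV FORCE_REINSTALL_AIRFLOW_SOURCES="1"
# The editable install above already points at this directory, so copying
# the sources is the only step repeated when just the code changes.
COPY . .
```

   The ordering is the important part: the layers that change least often 
come first, so the most common change (sources only) invalidates the least.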
   
   As a result, if we just push code to the airflow repo, the image built in 
Dockerhub will be prepared in an optimal way and users will only download 
incremental updates to the base image they already have. We also have a way to 
force rebuilding parts of the image or the whole image if we choose to.
