[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r337465801 ## File path: setup.py ## @@ -287,46 +286,75 @@ def write_version(filename: str = os.path.join(*["airflow", "git_version"])): 'jira', 'mongomock', 'moto==1.3.5', +'mypy==0.720', 'nose', 'nose-ignore-docstring==0.2', 'nose-timer', 'parameterized', -'paramiko', 'pre-commit', 'pylint~=2.3.1', # to be upgraded after fixing https://github.com/PyCQA/pylint/issues/3123 # We should also disable checking docstring at the module level -'pysftp', -'pywinrm', -'qds-sdk>=1.9.6', 'rednose', 'requests_mock', 'yamllint' ] + +devel = sorted(devel + doc) + # IMPORTANT NOTE!!! # IF you are removing dependencies from the above list, please make sure that you also increase # DEPENDENCIES_EPOCH_NUMBER in the Dockerfile -if PY3: -devel += ['mypy==0.720'] -else: -devel += ['unittest2'] - devel_minreq = devel + kubernetes + mysql + doc + password + cgroups devel_hadoop = devel_minreq + hive + hdfs + webhdfs + kerberos -devel_all = (sendgrid + devel + all_dbs + doc + samba + slack + oracle + - docker + ssh + kubernetes + celery + redis + gcp + grpc + - datadog + zendesk + jdbc + ldap + kerberos + password + webhdfs + jenkins + - druid + pinot + segment + snowflake + elasticsearch + sentry + - atlas + azure + aws + salesforce + cgroups + papermill + virtualenv) -# Snakebite & Google Cloud Dataflow are not Python 3 compatible :'( -if PY3: -devel_ci = [package for package in devel_all if package not in -['snakebite>=2.7.8', 'snakebite[kerberos]>=2.7.8']] -else: -devel_ci = devel_all +all_packages = ( +async_packages + +atlas + +all_dbs + +aws + +azure + +celery + +cgroups + +datadog + +dask + +databricks + +datadog + +docker + +druid + +elasticsearch + +gcp + +grpc + +flask_oauth + +jdbc + +jenkins + +kerberos + +kubernetes + +ldap + +oracle + +papermill + 
+password + +pinot + +redis + +salesforce + +samba + +sendgrid + +sentry + +segment + +slack + +snowflake + +ssh + +statsd + +virtualenv + +webhdfs + +winrm + +zendesk +) + +# Snakebite is not Python 3 compatible :'( +all_packages = [package for package in all_packages if not package.startswith('snakebite')] Review comment: There's an open issue for upgrading/replacing snakebite. Removing Kerberos: eek, no, definitely not. Some enterprises need kerberos support. If it's broken on Py3 then we need to fix that before 2.0.0 can be released. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
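The filtering the diff settles on is a plain list comprehension over requirement strings. A minimal, self-contained sketch — the package lists below are illustrative stand-ins, not the real setup.py extras:

```python
# Hypothetical stand-ins for the extras lists in setup.py.
hdfs = ['snakebite>=2.7.8', 'hdfs[avro,dataframe,kerberos]>=2.0.4']
kerberos = ['pykerberos>=1.1.13', 'snakebite[kerberos]>=2.7.8']
aws = ['boto3>=1.7.0']

all_packages = hdfs + kerberos + aws

# Snakebite is not Python 3 compatible, so drop every snakebite requirement
# while leaving the rest of the kerberos support intact.
all_packages = [package for package in all_packages
                if not package.startswith('snakebite')]
```

Note that the filter only drops the snakebite entries; `pykerberos` and the other kerberos requirements survive, which matches the point above that enterprise kerberos support must stay.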
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r337073537 ## File path: Dockerfile ## @@ -77,252 +75,300 @@ RUN curl -sL https://deb.nodesource.com/setup_10.x | bash - \ libssl-dev \ locales \ netcat \ - nodejs \ rsync \ sasl2-bin \ sudo \ + libmariadb-dev-compat \ && apt-get autoremove -yqq --purge \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* -# Install graphviz - needed to build docs with diagrams -RUN apt-get update \ -&& apt-get install -y --no-install-recommends \ - graphviz \ -&& apt-get autoremove -yqq --purge \ -&& apt-get clean \ -&& rm -rf /var/lib/apt/lists/* - -# Install MySQL client from Oracle repositories (Debian installs mariadb) -RUN KEY="A4A9406876FCBD3C456770C88C718D3B5072E1F5" \ -&& GNUPGHOME="$(mktemp -d)" \ -&& export GNUPGHOME \ -&& for KEYSERVER in $(shuf -e \ -ha.pool.sks-keyservers.net \ -hkp://p80.pool.sks-keyservers.net:80 \ -keyserver.ubuntu.com \ -hkp://keyserver.ubuntu.com:80 \ -pgp.mit.edu) ; do \ - gpg --keyserver "${KEYSERVER}" --recv-keys "${KEY}" && break || true ; \ - done \ -&& gpg --export "${KEY}" | apt-key add - \ -&& gpgconf --kill all \ -rm -rf "${GNUPGHOME}"; \ -apt-key list > /dev/null \ -&& echo "deb http://repo.mysql.com/apt/debian/ stretch mysql-5.6" | tee -a /etc/apt/sources.list.d/mysql.list \ -&& apt-get update \ -&& apt-get install --no-install-recommends -y \ -libmysqlclient-dev \ -mysql-client \ -&& apt-get autoremove -yqq --purge \ -&& apt-get clean && rm -rf /var/lib/apt/lists/* - RUN adduser airflow \ && echo "airflow ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/airflow \ && chmod 0440 /etc/sudoers.d/airflow -# This is an image with all APT dependencies needed by CI. It is built on top of the airlfow APT image -# Parameters: -# airflow-apt-deps - this is the base image for CI deps image. 
+# CI airflow image -FROM airflow-apt-deps-ci-slim as airflow-apt-deps-ci +FROM airflow-base as airflow-ci SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"] -ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/ +# Setting to 1 speeds up building the image. Cassandra driver without CYTHON saves around 10 minutes +# But might not be suitable for production image +ENV CASS_DRIVER_NO_CYTHON="1" +ENV CASS_DRIVER_BUILD_CONCURRENCY=8 + +ENV JAVA_HOME=/usr/lib/jvm/adoptopenjdk-8-hotspot-amd64/ + +# By changing the CI build epoch we can force reinstalling apt dependencies for CI +# It can also be overwritten manually by setting the build variable. +ARG CI_APT_DEPENDENCIES_EPOCH_NUMBER="1" +ENV CI_APT_DEPENDENCIES_EPOCH_NUMBER=${CI_APT_DEPENDENCIES_EPOCH_NUMBER} + +RUN apt-get update \ +&& apt-get install --no-install-recommends -y \ + apt-transport-https ca-certificates wget dirmngr gnupg software-properties-common curl gnupg2 \ +&& export APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=1 \ +&& curl -sL https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | apt-key add - \ +&& curl -sL https://deb.nodesource.com/setup_10.x | bash - \ +&& add-apt-repository --yes https://adoptopenjdk.jfrog.io/adoptopenjdk/deb/ \ +&& apt-get update \ +&& apt-get install --no-install-recommends -y \ + gnupg \ + graphviz \ + krb5-user \ + ldap-utils \ + less \ + lsb-release \ + nodejs \ + net-tools \ + adoptopenjdk-8-hotspot \ + openssh-client \ + openssh-server \ + postgresql-client \ + python-selinux \ + sqlite3 \ + tmux \ + unzip \ + vim \ +&& apt-get autoremove -yqq --purge \ +&& apt-get clean \ +&& rm -rf /var/lib/apt/lists/* \ +; + +ENV HADOOP_DISTRO="cdh" HADOOP_MAJOR="5" HADOOP_DISTRO_VERSION="5.11.0" HADOOP_VERSION="2.6.0" \ +HADOOP_HOME="/tmp/hadoop-cdh" +ENV HIVE_VERSION="1.1.0" HIVE_HOME="/tmp/hive" +ENV HADOOP_URL="https://archive.cloudera.com/${HADOOP_DISTRO}${HADOOP_MAJOR}/${HADOOP_DISTRO}/${HADOOP_MAJOR}/" +ENV MINICLUSTER_BASE="https://github.com/bolkedebruin/minicluster/releases/download/" \ +MINICLUSTER_HOME="/tmp/minicluster" \ +MINICLUSTER_VER="1.1" + +RUN mkdir -pv "${HADOOP_HOME}" \ +&& mkdir -pv "${HIVE_HOME}" \ +&& mkdir -pv "${MINICLUSTER_HOME}" \ +&& mkdir -pv "/user/hive/warehouse" \ +&& chmod -R 777 "${HIVE_HOME}" \ +&& chmod -R 777 "/user/" + +ENV
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336916999 ## File path: Dockerfile ## @@ -334,56 +380,52 @@ COPY --chown=airflow:airflow airflow/version.py ${AIRFLOW_SOURCES}/airflow/versi COPY --chown=airflow:airflow airflow/__init__.py ${AIRFLOW_SOURCES}/airflow/__init__.py COPY --chown=airflow:airflow airflow/bin/airflow ${AIRFLOW_SOURCES}/airflow/bin/airflow -# The goal of this line is to install the dependencies from the most current setup.py from sources -# This will be usually incremental small set of packages in CI optimized build, so it will be very fast -# In non-CI optimized build this will install all dependencies before installing sources. -RUN pip install -e ".[${AIRFLOW_EXTRAS}]" - +# Setting to 1 speeds up building the image. Cassandra driver without CYTHON saves around 10 minutes +# But might not be suitable for production image +ENV CASS_DRIVER_NO_CYTHON="" +ENV CASS_DRIVER_BUILD_CONCURRENCY="8" -WORKDIR ${AIRFLOW_SOURCES}/airflow/www - -# Copy all www files here so that we can run npm building for production -COPY --chown=airflow:airflow airflow/www/ ${AIRFLOW_SOURCES}/airflow/www/ +ENV PATH="/home/airflow/.local/bin:/home/airflow:${PATH}" -# Package NPM for production -RUN gosu ${AIRFLOW_USER} npm run prod +# The goal of this line is to install the dependencies from the most current setup.py from sources +# This will be usually incremental small set of packages in CI optimized build, so it will be very fast +# For production optimised build it is the first time dependencies are installed so it will be slower +RUN pip install --user ".[${AIRFLOW_PROD_EXTRAS}]" \ +&& pip uninstall --yes apache-airflow snakebite # Cache for this line will be automatically invalidated if any # of airflow sources change COPY --chown=airflow:airflow . 
${AIRFLOW_SOURCES}/ -WORKDIR ${AIRFLOW_SOURCES} - -# Finally install the requirements from the latest sources -RUN pip install -e ".[${AIRFLOW_EXTRAS}]" +# Reinstall airflow again - this time with sources and remove the sources after installation +RUN pip install --user ".[${AIRFLOW_PROD_EXTRAS}]" Review comment: I don't see where we remove the sources?
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336905484 ## File path: Dockerfile (quotes the same Dockerfile hunk as the first comment above)
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336913399 ## File path: setup.py (quotes the same setup.py hunk as the first comment above, anchored at the following line) +devel = sorted(devel + doc) Review comment: Why not just merge the doc extra into the `devel` list in the source code above?
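Either way — appending the doc requirements in a separate statement, as the diff does, or inlining them into the `devel` list as suggested — the resulting sorted list is identical. A minimal sketch with hypothetical stand-in entries:

```python
# Hypothetical stand-ins for the real setup.py lists.
doc = ['sphinx>=2.1.2', 'sphinx-argparse>=0.1.13']
devel = ['mypy==0.720', 'nose', 'pylint~=2.3.1']

# The approach in the diff: merge the doc extra in afterwards, then sort.
devel = sorted(devel + doc)
```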
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336904053 ## File path: Dockerfile (quotes the same Dockerfile hunk as the comment above, anchored at the following lines) +RUN pip install --user ".[${AIRFLOW_PROD_EXTRAS}]" \ +&& pip uninstall --yes apache-airflow snakebite Review comment: ??
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336898067 ## File path: Dockerfile ## @@ -16,19 +16,17 @@ # WARNING: THIS DOCKERFILE IS NOT INTENDED FOR PRODUCTION USE OR DEPLOYMENT. # # Base image for the whole Docker file -ARG APT_DEPS_IMAGE="airflow-apt-deps-ci-slim" -ARG PYTHON_BASE_IMAGE="python:3.6-slim-stretch" +ARG PYTHON_BASE_IMAGE="python:3.6-slim-buster" +ARG NODE_BASE_IMAGE="node:12.11.1-buster" + -# This is the slim image with APT dependencies needed by Airflow. It is based on a python slim image -# Parameters: -#PYTHON_BASE_IMAGE - base python image (python:x.y-slim-stretch) +# Base image for Airflow - contains dependencies used by both - Production and CI images -FROM ${PYTHON_BASE_IMAGE} as airflow-apt-deps-ci-slim - +FROM ${PYTHON_BASE_IMAGE} as airflow-base SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"] -ARG PYTHON_BASE_IMAGE="python:3.6-slim-stretch" +ARG PYTHON_BASE_IMAGE="python:3.6-slim-buster" ENV PYTHON_BASE_IMAGE=${PYTHON_BASE_IMAGE} ARG AIRFLOW_VERSION="2.0.0.dev0" Review comment: It might be worth adding these two ENVs as LABELs too.
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336903775 ## File path: Dockerfile (quotes the same Dockerfile hunk as the first comment above)
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336914233 ## File path: setup.py (quotes the same setup.py hunk as the first comment above, anchored at the following lines) +# Snakebite is not Python 3 compatible :'( +all_packages = [package for package in all_packages if not package.startswith('snakebite')] Review comment: We should do this better and declare those deps as `'snakebite;python_version<"3"'` etc.
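The environment-marker approach suggested here would let pip itself skip Python-2-only requirements at install time instead of setup.py filtering them out. A hedged sketch — the tiny evaluator below only imitates what pip/setuptools do with PEP 508 markers, purely for illustration:

```python
import sys

# Declaring the dependency with an environment marker, as suggested,
# instead of filtering it out of all_packages in setup.py:
hdfs = ['snakebite>=2.7.8; python_version < "3"']

def marker_applies(requirement):
    """Toy stand-in for PEP 508 marker evaluation (pip does the real thing)."""
    if ';' not in requirement:
        return True
    marker = requirement.split(';', 1)[1].strip()
    if marker == 'python_version < "3"':
        return sys.version_info[0] < 3
    return True

# On Python 3, the snakebite requirement is skipped automatically.
to_install = [r.split(';', 1)[0].strip() for r in hdfs if marker_applies(r)]
```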
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336909165 ## File path: scripts/ci/ci_flake8.sh ## @@ -22,16 +22,21 @@ MY_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" export AIRFLOW_CI_SILENT=${AIRFLOW_CI_SILENT:="true"} -export PYTHON_VERSION=3.5 Review comment: I'm not sure about this. If I'm following the code right, removing this will autodetect the python version to use based on the host python. But we _explicitly_ want to run these tests against 3.5 locally. If I have python 3.7 locally, it would attempt to use that version, which isn't what we want.
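The pin-vs-autodetect behaviour described in this comment can be sketched in a few lines. The function name and dict-based environment are illustrative only — the real logic lives in the project's bash scripts:

```python
import os
import sys

def resolve_python_version(env=None):
    """Fall back to the host interpreter's version unless PYTHON_VERSION is
    set -- the autodetection behaviour the review comment warns about."""
    env = os.environ if env is None else env
    host_version = '%d.%d' % (sys.version_info[0], sys.version_info[1])
    return env.get('PYTHON_VERSION', host_version)

# Pinning explicitly, as `export PYTHON_VERSION=3.5` did, keeps the result
# host-independent regardless of which python the developer runs locally:
pinned = resolve_python_version({'PYTHON_VERSION': '3.5'})
```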
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336896918 ## File path: BREEZE.rst ## @@ -912,32 +914,26 @@ Docker images used by Breeze For all development tasks related integration tests and static code checks we are using Docker images that are maintained in DockerHub under ``apache/airflow`` repository. -There are three images that we currently manage: +There are those images that we currently manage (all of them are stages in multi-staging +``_ dockerfile.: -* **Slim CI** image that is used for static code checks (size around 500MB) - tag follows the pattern - of ``-python-ci-slim`` (for example ``apache/airflow:master-python3.6-ci-slim``). - The image is built using the ``_ dockerfile. -* **Full CI image*** that is used for testing - containing a lot more test-related installed software +* ** CI image*** that is used for testing - containing a lot more test-related installed software (size around 1GB) - tag follows the pattern of ``-python-ci`` - (for example ``apache/airflow:master-python3.6-ci``). The image is built using the - ``_ dockerfile. + (for example ``apache/airflow:master-python3.6-ci``). It is also used to run some of the + static checks (pylint, mypy, flake8, as well as to generate the documentation) +* ** PROD image*** that is a base for Production-ready image of Apache Airflow. + (size around 1GB) - tag follows the pattern of ``-python`` Review comment: Do we have a slim image too that just contains airflow core?
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704] URL: https://github.com/apache/airflow/pull/6266#discussion_r336901053 ## File path: Dockerfile (quotes the same Dockerfile hunk as the first comment above)
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336914812
## File path: tests/operators/test_operators.py ##

@@ -78,8 +78,10 @@ def test_mysql_operator_test_multi(self):
         )
         t.run(start_date=DEFAULT_DATE, end_date=DEFAULT_DATE, ignore_ti_state=True)

-    @unittest.skipUnless('mysql' in configuration.conf.get('core', 'sql_alchemy_conn'),
-                         "This is a MySQL test")
+    @unittest.skip("""
+    This tests is not working for mariadb as it uses not secure API of mysql

Review comment:

```suggestion
This test is not working for mariadb or modern MySQL versions as it uses an insecure API of mysql
```

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
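The decorator being replaced in this hunk gated the test on the configured backend. As a rough sketch of that pattern (not Airflow's actual helper; `backend_from_conn` is a hypothetical name), the same gate can be derived from the `sql_alchemy_conn` URI:

```python
import unittest


def backend_from_conn(sql_alchemy_conn: str) -> str:
    """Return a coarse backend name ('mysql', 'postgresql', 'sqlite', ...)
    from an SQLAlchemy URI such as 'mysql+mysqldb://user@host/db'."""
    scheme = sql_alchemy_conn.split("://", 1)[0]
    # 'mysql+mysqldb' and plain 'mysql' should both count as MySQL.
    return scheme.split("+", 1)[0]


# Example connection string; in Airflow this would come from configuration.
conn = "postgresql+psycopg2://airflow@localhost/airflow"


# Gate a MySQL-only test case the way the old skipUnless decorator did.
@unittest.skipUnless(backend_from_conn(conn) == "mysql", "This is a MySQL test")
class MySqlOnlyTest(unittest.TestCase):
    def test_multi(self):
        pass
```

An unconditional `unittest.skip`, as in the diff, drops this backend check entirely, which is why the skip message itself becomes the only remaining documentation of the reason.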
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336904694
## File path: Dockerfile ##
(Quoted diff context, hunk @@ -334,56 +380,52 @@: the prod-image section switches from an editable `pip install -e ".[${AIRFLOW_EXTRAS}]"` plus an in-image `gosu ${AIRFLOW_USER} npm run prod` build to `pip install --user ".[${AIRFLOW_PROD_EXTRAS}]"` followed by `pip uninstall --yes apache-airflow snakebite`, copies the built www assets and docs from the airflow-www and airflow-docs stages, and ends with `RUN mkdir -pv "${AIRFLOW_HOME}" \`, the line this comment is attached to.)

Review comment: This first command isn't needed as we have `chown airflow.airflow ${AIRFLOW_HOME}` earlier in the file - this directory must already exist otherwise that previous command would fail.
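The reasoning in the review comment - an ownership or permission change can only succeed on a path that already exists, so a later `mkdir -p` on the same path is a no-op - can be checked with a small Python sketch (the paths here are stand-ins, not the image's real layout):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    airflow_home = os.path.join(tmp, "airflow")  # stand-in for ${AIRFLOW_HOME}

    # A permission change on a missing path fails loudly, mirroring the
    # reviewer's point: the earlier `chown airflow.airflow ${AIRFLOW_HOME}`
    # could only have succeeded if the directory already existed.
    try:
        os.chmod(airflow_home, 0o755)
        chmod_succeeded_on_missing_path = True
    except FileNotFoundError:
        chmod_succeeded_on_missing_path = False

    os.makedirs(airflow_home)                  # what the earlier chown implies
    os.chmod(airflow_home, 0o755)              # now succeeds
    os.makedirs(airflow_home, exist_ok=True)   # the later `mkdir -pv`: a no-op
    still_there = os.path.isdir(airflow_home)
```

The same logic is why dropping the redundant `mkdir` is safe: if the directory were missing, the build would already have failed at the earlier `chown` layer.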
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336899269
## File path: Dockerfile ##
(Quoted diff context, hunk @@ -77,252 +75,300 @@: the same base-image and CI-stage hunk quoted above; the comment is attached to the line `curl -sL https://deb.nodesource.com/setup_10.x | bash -` in the CI stage's apt setup.)

Review comment: Now we are on Debian Buster we could use https://packages.debian.org/buster/nodejs couldn't we? It's 10.15.2 right now.
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336905828
## File path: airflow/contrib/hooks/winrm_hook.py ##

@@ -204,6 +203,7 @@ def get_conn(self):
         try:
             if self.password and self.password.strip():
+                # pylint: disable=unexpected-keyword-arg

Review comment: Unrelated changes?
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336902442
## File path: Dockerfile ##
(Quoted diff context, hunk @@ -77,252 +75,300 @@: the same base-image and CI-stage hunk quoted above, including the CDH Hadoop/Hive/minicluster environment variables and directory setup. No review comment text is included in this entry.)
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336917284
## File path: Dockerfile ##
(Quoted diff context, hunk @@ -334,56 +380,52 @@: the prod-image dependency-install section. The comment is attached to the retained comment lines above `RUN pip install --user ".[${AIRFLOW_PROD_EXTRAS}]"`: "The goal of this line is to install the dependencies from the most current setup.py from sources / This will be usually incremental small set of packages in CI optimized build, so it will be very fast".)

Review comment: This is in the prod image section so the comment doesn't make sense anymore.
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336900127
## File path: Dockerfile ##
(Quoted diff context, hunk @@ -77,252 +75,300 @@: the same base-image and CI-stage hunk quoted above. No review comment text is included in this entry.)
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336896656
## File path: BREEZE.rst ##
(Quoted diff context, hunk @@ -912,32 +914,26 @@: the "Docker images used by Breeze" section is reworded from three separately built images - a Slim CI image tagged `<branch>-python<version>-ci-slim` and a Full CI image tagged `<branch>-python<version>-ci`, each with its own Dockerfile - to two stages of one multi-stage Dockerfile: a CI image tagged `<branch>-python<version>-ci`, for example `apache/airflow:master-python3.6-ci`, also used for some static checks (pylint, mypy, flake8) and for generating the documentation, and a PROD image that is a base for a production-ready image of Apache Airflow, tagged `<branch>-python<version>`, for example `apache/airflow:master-python3.6`.)

Review comment: We shouldn't imply that `apache/airflow:master-python3.6` is a suitable prod image, which this bullet point does to me.
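The tag patterns discussed in this BREEZE.rst hunk can be sketched as a tiny helper (hypothetical; the actual Breeze scripts compose tags in bash, and the function name here is invented for illustration):

```python
def image_tag(branch: str, python_version: str, image_type: str = "") -> str:
    """Compose an apache/airflow DockerHub tag following the patterns in the
    hunk: '<branch>-python<version>-ci' for the CI image and plain
    '<branch>-python<version>' for the prod base image."""
    suffix = f"-{image_type}" if image_type else ""
    return f"apache/airflow:{branch}-python{python_version}{suffix}"


print(image_tag("master", "3.6", "ci"))  # apache/airflow:master-python3.6-ci
print(image_tag("master", "3.6"))        # apache/airflow:master-python3.6
```

The reviewer's objection maps directly onto the second call: the suffix-free tag reads like a recommended production image rather than a base to build one from.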
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336895542
## File path: .travis.yml ##
(Quoted diff context, hunk @@ -32,39 +33,49 @@: the pre-test jobs - static checks, licence check, pylint, docs - are pinned to python 3.5 and their scripts renamed, and the Kubernetes test jobs drop the explicit ENV and KUBERNETES_VERSION settings in favour of RUN_KUBERNETES_TESTS=true with a `travis_wait 30` wrapper. The comment is attached to the removed line of the "Tests postgres kubernetes python 3.6 (git)" job:)

-  env: BACKEND=postgres ENV=kubernetes KUBERNETES_VERSION=v1.15.0 KUBERNETES_MODE=git_mode

Review comment: Do we not support more than one version of Kube in tests anymore?
[GitHub] [airflow] ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
ashb commented on a change in pull request #6266: [AIRFLOW-2439] Production Docker image support including refactoring of build scripts - depends on [AIRFLOW-5704]
URL: https://github.com/apache/airflow/pull/6266#discussion_r336894816
## File path: .pre-commit-config.yaml ##

@@ -20,21 +20,6 @@ default_language_version:
   # force all unspecified python hooks to run python3
   python: python3
 repos:
-  - repo: local
-    hooks:
-      - id: build
-        name: Check if image build is needed
-        entry: ./scripts/ci/local_ci_build.sh
-        language: system
-        always_run: true
-        pass_filenames: false
-      - id: check-apache-license

Review comment: Why did we move these? (I'm not too familiar with what a "repo" is in pre-commit terms.)