This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git
commit dbc9ab95ae733cced3f285b9472c28cbf5ef3fcf
Author: Jarek Potiuk <[email protected]>
AuthorDate: Sun Oct 11 06:19:57 2020 +0200

    Add capability of customising PyPI sources (#11385)

    * Add capability of customising PyPI sources

    This change adds capability of customising installation of PyPI
    modules via custom .pypirc file. This might allow to install
    dependencies from in-house, vetted registry of PyPI

    (cherry picked from commit 45d33dbd432fd010f6ff2b698c682c31ac436c24)
---
 Dockerfile                     |  4 ++++
 docs/production-deployment.rst | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/Dockerfile b/Dockerfile
index f257606..7cc7f94 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -164,6 +164,8 @@ RUN mkdir -p /root/.local/bin
 ARG AIRFLOW_PRE_CACHED_PIP_PACKAGES="true"
 ENV AIRFLOW_PRE_CACHED_PIP_PACKAGES=${AIRFLOW_PRE_CACHED_PIP_PACKAGES}

+COPY .pypirc /root/.pypirc
+
 # In case of Production build image segment we want to pre-install master version of airflow
 # dependencies from github so that we do not have to always reinstall it from the scratch.
 RUN if [[ ${AIRFLOW_PRE_CACHED_PIP_PACKAGES} == "true" ]]; then \
@@ -385,6 +387,8 @@ RUN chmod a+x /entrypoint /clean-logs
 # See https://github.com/apache/airflow/issues/9248
 RUN chmod g=u /etc/passwd

+COPY .pypirc ${AIRFLOW_USER_HOME_DIR}/.pypirc
+
 ENV PATH="${AIRFLOW_USER_HOME_DIR}/.local/bin:${PATH}"
 ENV GUNICORN_CMD_ARGS="--worker-tmp-dir /dev/shm"

diff --git a/docs/production-deployment.rst b/docs/production-deployment.rst
index 5e6cad2..7c4bfab 100644
--- a/docs/production-deployment.rst
+++ b/docs/production-deployment.rst
@@ -262,6 +262,14 @@ You can combine both - customizing & extending the image. You can build the imag
 ``customize`` method (either with docker command or with ``breeze`` and then you can ``extend``
 the resulting image using ``FROM:`` any dependencies you want.

+Customizing PYPI installation
+.............................
+
+You can customize the PyPI sources used during the image build by providing a .pypirc file in
+the root of the Airflow directory. This .pypirc file is never committed to the repository; it is
+added and used only in the build segment of the image, so it never ends up in the final
+production image.
+
 External sources for dependencies
 ---------------------------------
@@ -595,3 +603,35 @@ More details about the images
 You can read more details about the images - the context, their parameters and internal structure in the
 `IMAGES.rst <https://github.com/apache/airflow/blob/master/IMAGES.rst>`_ document.
+
+.. _production-deployment:kerberos:
+
+Kerberos-authenticated workers
+==============================
+
+Apache Airflow has a built-in mechanism for authenticating operations with a KDC (Key Distribution
+Center). Airflow has a separate command, ``airflow kerberos``, that acts as a token refresher. It uses
+the pre-configured Kerberos Keytab to authenticate with the KDC and obtain a valid token, and then
+refreshes the token at regular intervals within the current token expiry window.
+
+Each refresh request uses a configured principal, and only a keytab valid for the specified principal
+can retrieve the authentication token.
+
+The best practice for implementing a proper security mechanism in this case is to make sure that worker
+workloads have no access to the Keytab and only have access to the periodically refreshed, temporary
+authentication tokens. In a Docker environment this can be achieved by running the ``airflow kerberos``
+command and the worker command in separate containers, where only the ``airflow kerberos`` container has
+access to the Keytab file (preferably configured as a secret resource). The two containers should share
+a volume where the temporary token is written by ``airflow kerberos`` and read by the workers.
+
+In a Kubernetes environment, this can be realized with the side-car concept, where both the Kerberos
+token refresher and the worker are part of the same Pod. Only the Kerberos side-car has access to the
+Keytab secret, and both containers in the same Pod share the volume where the temporary token is
+written by the side-car container and read by the worker container.
+
+This concept is implemented in the development version of the Helm Chart that is part of the Airflow
+source code.
+
+
+.. spelling::
+
+    pypirc
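For anyone trying out the .pypirc customization added by this commit: the file uses the standard distutils configuration format. A minimal sketch pointing at an in-house registry might look like the following — note that the index name, URL, and credentials below are placeholders I made up for illustration, not values from the commit:

```ini
# Hypothetical example -- "company-internal", the repository URL, and the
# credentials are placeholders for an in-house, vetted PyPI registry.
[distutils]
index-servers =
    company-internal

[company-internal]
repository = https://pypi.example.internal/
username = build-bot
password = <token-from-your-secret-store>
```

Since the file may contain credentials, it is deliberately only used in the build segment and never copied into the final image, as the docs change explains.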

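The Kerberos side-car pattern described in the docs change can be sketched as a Pod spec along these lines. This is an illustrative assumption, not an excerpt from the Helm Chart — container images, volume names, and mount paths are all made up for the example:

```yaml
# Illustrative sketch only: the Keytab secret is mounted exclusively into the
# side-car, while both containers share an emptyDir for the refreshed token.
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker
spec:
  volumes:
    - name: kerberos-keytab          # Keytab, stored as a Kubernetes secret
      secret:
        secretName: airflow-kerberos-keytab
    - name: kerberos-ccache          # shared volume for the temporary token
      emptyDir: {}
  containers:
    - name: worker
      image: apache/airflow:1.10.12
      args: ["worker"]
      volumeMounts:
        - name: kerberos-ccache      # worker only reads the refreshed token
          mountPath: /var/kerberos-ccache
    - name: kerberos-sidecar
      image: apache/airflow:1.10.12
      args: ["kerberos"]             # token refresher writes the ccache
      volumeMounts:
        - name: kerberos-keytab      # only this container sees the Keytab
          mountPath: /etc/kerberos
          readOnly: true
        - name: kerberos-ccache
          mountPath: /var/kerberos-ccache
```

The design point is the asymmetry of the mounts: compromise of the worker container exposes only a short-lived token, never the long-lived Keytab.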