potiuk commented on issue #35255: URL: https://github.com/apache/airflow/issues/35255#issuecomment-1806743084
Well, it's not obvious - that's why we are having that discussion in the first place. Since you are asking, I will give you a very detailed answer, from which you will understand WHY it is difficult to explain it succinctly in the user documentation.

The reason why we install database clients ("system level" software) in the slim image is that installing them is not a one-liner, and we do not want to make our users repeat it in all their Dockerfiles.

The idea of the slim image is that you can install any Python package or provider by easily extending the image. For example, this will install the latest postgres provider (see examples in https://airflow.apache.org/docs/docker-stack/build.html#building-the-image):

```
FROM apache/airflow:slim-2.7.3
RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" "apache-airflow-providers-postgres"
```

or this will install any packages you specify in requirements.txt:

```
FROM apache/airflow:slim-2.7.3
COPY requirements.txt /
RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /requirements.txt
```

(note `apache-airflow==` must be there in order to prevent `pip` from accidentally upgrading/downgrading airflow in case you add conflicting dependencies - you can also see my talk from the Summit explaining why, with some more details: https://www.youtube.com/watch?v=zPjIQjjjyHI)

This is easy to explain and (except the `==` for airflow, which is a protection against accidental upgrades) is pretty "standard" and "expected".
If we removed the clients and ALSO asked users to install the postgres binary clients themselves, it would be way more complex. This is the script that installs all the system dependencies necessary to be able to connect to Postgres: https://github.com/apache/airflow/blob/main/scripts/docker/install_postgres.sh

```bash
install_postgres_client() {
    echo
    echo "${COLOR_BLUE}Installing postgres client${COLOR_RESET}"
    echo

    if [[ "${1}" == "dev" ]]; then
        packages=("libpq-dev" "postgresql-client")
    elif [[ "${1}" == "prod" ]]; then
        packages=("postgresql-client")
    else
        echo
        echo "Specify either prod or dev"
        echo
        exit 1
    fi

    curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
    echo "deb https://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list
    apt-get update
    apt-get install --no-install-recommends -y "${packages[@]}"
    apt-get autoremove -yqq --purge
    apt-get clean && rm -rf /var/lib/apt/lists/*
}

# Install Postgres client from Postgres repositories
# But only if it is not disabled
if [[ ${INSTALL_POSTGRES_CLIENT:="true"} == "true" ]]; then
    install_postgres_client "${@}"
fi
```

And we certainly would not want the users of the slim image to have to understand and copy all the lines of that installation into their Dockerfiles.

Of course it could be slightly simplified - for example you could install only the runtime dependencies, and you could rely on the postgres client that is available in the Debian repositories. But our image is really "production ready", so our installation of clients uses the latest, most secure and most up-to-date version of the Debian-compatible system level libraries that the producer of the software (Postgres) offers - that's why we are installing it directly from the source.

Now, having all that explanation: I would really appreciate it if you could find a way to describe it clearly and succinctly enough in the user documentation :)
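P.S. For illustration only, here is a rough sketch of the "slightly simplified" variant mentioned above: runtime dependencies only, taken from the Debian repositories instead of the Postgres ones. It assumes a hypothetical slim image that ships without the client (the current slim image already includes it), and it assumes the Debian-packaged `postgresql-client` version is acceptable for your use case. The `USER root` / `USER airflow` switch follows the pattern used in the image-building documentation linked earlier.

```dockerfile
# Sketch only: assumes the slim image did NOT already contain the postgres client.
FROM apache/airflow:slim-2.7.3

# System packages have to be installed as root.
USER root
RUN apt-get update \
    && apt-get install --no-install-recommends -y postgresql-client \
    && apt-get autoremove -yqq --purge \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Switch back to the unprivileged user before installing Python packages.
USER airflow
RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" "apache-airflow-providers-postgres"
```

Even this shortened form is several lines that every user would have to carry in their own Dockerfile, and it gives up the newer client versions that the Postgres repositories provide.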