potiuk commented on issue #35255:
URL: https://github.com/apache/airflow/issues/35255#issuecomment-1806743084

   Well, it's not obvious, and that's why we are having this discussion in the first place. Since you are asking, I will give you a very detailed answer, from which you will understand WHY it is difficult to explain succinctly in the user documentation.
   
   
   The reason we install the database clients ("system level" software) in the slim image is that installing them is not a one-liner, and we do not want to make our users repeat it in all their Dockerfiles.
   
   The idea behind the slim image is that you can install any Python package or provider simply by extending the image.
   
   For example, this will install the latest Postgres provider (see the examples in https://airflow.apache.org/docs/docker-stack/build.html#building-the-image):
   
   ```
   FROM apache/airflow:slim-2.7.3
   RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" "apache-airflow-providers-postgres"
   ```
   
   Or this will install any packages you specify in `requirements.txt`:
   
   ```
   FROM apache/airflow:slim-2.7.3
   COPY requirements.txt /
   RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /requirements.txt
   ```
   
   (Note: `apache-airflow==` must be there to prevent `pip` from accidentally upgrading or downgrading Airflow in case you add conflicting dependencies. You can also see my talk from the Summit explaining why, with some more details: https://www.youtube.com/watch?v=zPjIQjjjyHI)
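   To make the pinning rule concrete, here is a minimal sketch, assuming `AIRFLOW_VERSION` is set in the environment as it is in the images used above. `build_pip_command` is a hypothetical helper (not something shipped in the image): it always puts the Airflow pin next to whatever extra requirements you pass, so pip has to resolve them against the installed Airflow version instead of moving it, and it only prints the command so you can review it.

   ```shell
   #!/usr/bin/env bash
   # Hypothetical helper, not part of the Airflow image: compose the pinned
   # pip command used in the Dockerfile examples and print it for review.
   build_pip_command() {
       # Abort (non-zero exit) if AIRFLOW_VERSION is unset or empty.
       local version="${AIRFLOW_VERSION:?AIRFLOW_VERSION must be set}"
       # The pin is always passed alongside the extra requirements, so a
       # conflicting dependency makes pip fail instead of changing Airflow.
       local cmd=(pip install --no-cache-dir "apache-airflow==${version}" "$@")
       echo "${cmd[*]}"
   }
   ```

   For example, `AIRFLOW_VERSION=2.7.3 build_pip_command -r /requirements.txt` prints `pip install --no-cache-dir apache-airflow==2.7.3 -r /requirements.txt`.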
   
   This is easy to explain and, apart from the `==` pin for Airflow that protects against accidental upgrades, it is pretty "standard" and "expected".
   
   If we removed the clients and ALSO asked users to install the Postgres binary clients themselves, it would be way more complex.
   
   This is the script that installs all the system dependencies needed to connect to Postgres:
   
   
https://github.com/apache/airflow/blob/main/scripts/docker/install_postgres.sh
   
   ```bash
   install_postgres_client() {
       echo
       echo "${COLOR_BLUE}Installing postgres client${COLOR_RESET}"
       echo
   
       if [[ "${1}" == "dev" ]]; then
           packages=("libpq-dev" "postgresql-client")
       elif [[ "${1}" == "prod" ]]; then
           packages=("postgresql-client")
       else
           echo
           echo "Specify either prod or dev"
           echo
           exit 1
       fi
   
       curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
       echo "deb https://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list
       apt-get update
       apt-get install --no-install-recommends -y "${packages[@]}"
       apt-get autoremove -yqq --purge
       apt-get clean && rm -rf /var/lib/apt/lists/*
   }
   
   # Install Postgres client from Postgres repositories
   # But only if it is not disabled
   if [[ ${INSTALL_POSTGRES_CLIENT:="true"} == "true" ]]; then
       install_postgres_client "${@}"
   fi
   ```
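   The decision logic of that script can be checked without root, network access or apt. Below is a minimal sketch with hypothetical function names (they are not in the Airflow repository): one function maps the build mode to the packages the script installs, the other mirrors the `INSTALL_POSTGRES_CLIENT` guard. It uses `${INSTALL_POSTGRES_CLIENT:-true}`, which substitutes the default without assigning it, whereas the original `:=` also assigns the variable as a side effect; both treat an unset or empty value as "true".

   ```shell
   #!/usr/bin/env bash
   # Hypothetical, side-effect-free version of the decisions made in
   # install_postgres.sh; nothing here calls apt or curl.

   # Map the build mode to the Debian packages the script would install.
   select_postgres_packages() {
       case "${1:-}" in
           dev)  echo "libpq-dev postgresql-client" ;;
           prod) echo "postgresql-client" ;;
           *)    echo "Specify either prod or dev" >&2; return 1 ;;
       esac
   }

   # Unset or empty means "true"; any value other than "true" skips installation.
   should_install_postgres_client() {
       [[ "${INSTALL_POSTGRES_CLIENT:-true}" == "true" ]]
   }
   ```

   So `select_postgres_packages prod` prints only `postgresql-client`, and building with `INSTALL_POSTGRES_CLIENT=false` skips the client entirely.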
   
   And we certainly would not want the users of the slim image to have to understand and copy all of those installation steps into their Dockerfiles.
   
   Of course, it could be slightly simplified: for example, you could install only the runtime dependencies and rely on the Postgres client available in the Debian repositories. But our image is really "production ready", so our client installation uses the latest, most secure and most up-to-date Debian-compatible system-level libraries that the producer of the software (Postgres) offers. That's why we install it directly from the source (the Postgres project's own apt repository).
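   For comparison, the simplified variant mentioned above could look roughly like this. It is a hypothetical sketch, NOT what the Airflow image does: it installs only the runtime client (no `libpq-dev`) from Debian's default repositories and skips the PGDG repository and its signing key, so you get whatever Postgres client version your Debian release ships. It must run as root (in a Dockerfile based on the Airflow image, that means `USER root` before it and `USER airflow` after it). `DRY_RUN` and both function names are assumptions added for illustration.

   ```shell
   #!/usr/bin/env bash
   # Hypothetical simplified alternative: Debian's own postgresql-client,
   # no extra apt repository, no apt-key. Requires root when not dry-running.

   # Print commands instead of executing them when DRY_RUN=1.
   run_or_print() {
       if [[ "${DRY_RUN:-0}" == "1" ]]; then
           echo "$*"
       else
           "$@"
       fi
   }

   install_postgres_client_from_debian() {
       # Stop at the first failing step.
       run_or_print apt-get update &&
           run_or_print apt-get install --no-install-recommends -y postgresql-client &&
           run_or_print apt-get autoremove -yqq --purge &&
           run_or_print apt-get clean &&
           # Drop downloaded package lists to keep the image layer small.
           run_or_print rm -rf /var/lib/apt/lists/*
   }
   ```

   It is shorter, but it trades the latest upstream client for the distribution's version, which is exactly the trade-off we chose not to make in the image.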
   
   Now, with all that explained: I would really appreciate it if you could find a way to describe this clearly and succinctly enough in the user documentation :)

