TL;DR: Yes, if you use our reference docker images, you are already
following the "How to reproducibly install Airflow" recommendation, because
this is what we do when preparing the release, and also because those
images are generally "frozen" in time once released.

*A slightly longer explanation if you care*

The Dockerfiles are convenience packages (also called reference images)
which we publish for our users. They are not an "official source
artifact" - they just conveniently package the wheel packages + the system
packages that were available at the time of release, so that they run on
top of a Python-debian-bullseye base image. As of Airflow 2.8.0 - assuming
lazy consensus is reached tomorrow morning - they will be based on a
Python-debian-bookworm base image, following our policies (see this LAZY
CONSENSUS thread on the devlist -
https://lists.apache.org/thread/gcy143nqodf8dqbjxo2xt5gq4npv334p)

They are just that - a conveniently packaged installation based on a
Python Debian base image, the necessary system packages, and - yes, this is
what you refer to - Airflow packages with a pre-selected provider list
installed via `pip`. These are indeed installed with constraints - so the
way we build and publish those already strictly follows the reproducible
installation recommendation I repeated.
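
For reference, a constrained installation follows the pattern from the
installing-from-PyPI docs (the version numbers below are illustrative
placeholders - substitute the Airflow and Python versions you actually use):

```shell
# Illustrative versions only - pick the ones matching your deployment.
AIRFLOW_VERSION=2.7.3
PYTHON_VERSION=3.8
# The constraint file pins every transitive dependency to the exact
# versions tested at release time, which is what makes the install
# reproducible now and in the future.
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```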

In fact it's even a bit more "reproducible" than a plain reproducible
`pip` installation - because once we publish the images, we generally
(barring truly exceptional situations) do not update those "released"
reference images. They are basically frozen in time. If someone wants to
update them (for example to upgrade some packages that received security
fixes), they should take our reference image and upgrade whatever they
want themselves (of course, the best way to get the latest compliant
packages and security fixes is to update to the latest released image when
it comes out).

This is nicely described in the Docker image documentation as well:
https://airflow.apache.org/docs/docker-stack/index.html#fixing-images-at-release-time.
Quoting the documentation here for convenience:

-------------------------------------

*Fixing images at release time*

The released "versioned" reference images are mostly fixed when we release
an Airflow version, and we only update them in exceptional circumstances -
for example, when we find dependency errors that might prevent important
Airflow or embedded providers' functionality from working. In normal
circumstances, the images are not going to change after release, even if
new versions of Airflow dependencies are released - not even when those
versions contain critical security fixes. The Airflow release process is
designed around upgrading dependencies automatically where applicable, but
only when we release a new version of Airflow, not for already released
versions.

If you want to make sure that the Airflow dependencies in the image you
use are upgraded to the latest released versions containing the latest
security fixes, you should implement your own process to upgrade them when
you build a custom image based on the Airflow reference one. Airflow
usually does not upper-bound the versions of its dependencies via
requirements, so you should be able to upgrade them to the latest
versions - usually without any problems. You can follow the process
described in Building the image to do it (even in an automated way).
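
As a sketch of what such a custom image could look like (the base tag and
the upgraded package below are hypothetical placeholders, not a
recommendation - pick whatever is relevant to your deployment):

```dockerfile
# Placeholder tag - use the reference image version you actually run.
FROM apache/airflow:2.7.3
# Upgrade a dependency that received a security fix after the reference
# image was frozen (package name is illustrative only).
RUN pip install --no-cache-dir --upgrade cryptography
```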

Obviously - since we have no control over what gets released in new
versions of the dependencies, we cannot give any guarantees that the tests
and functionality of those dependencies will remain compatible with
Airflow after you upgrade them. Testing whether Airflow still works with
them is in your hands, and in case of any problems, you should raise an
issue with the authors of the problematic dependencies. In such cases you
can also look at the Airflow Issues, Airflow Pull Requests and Airflow
Discussions, searching for similar problems, to see if there are any fixes
or workarounds found in the main version of Airflow that you can apply to
your custom image.

The easiest way to keep up with the latest released dependencies is,
however, to upgrade to the latest released Airflow version, by switching
to the newly released images as the base for your images whenever a new
version of Airflow is released. Whenever we release a new version of
Airflow, we upgrade all dependencies to the latest applicable versions and
test them together, so if you want to benefit from those tests, staying
up to date with the latest version of Airflow is the easiest way to update
those dependencies.

J,


On Sun, Nov 5, 2023 at 6:41 PM Herve Ballans <herve.ball...@ias.u-psud.fr>
wrote:

> Dear Jarek,
>
> Thank you for this really useful recommendation!
>
> But, just, I would like to be sure of something: when you say that 'pip'
> is the only way to install Airflow in a reproducible way, do you mean
> compared to installation from sources?
>
> I guess the installation from Docker images is also recommended as well
> right? (unless I'm wrong, an official Airflow Docker image uses pip for
> installing Airflow, including also the constraints file for the
> dedicated version).
>
> Here we've been using the Docker Compose installation for years without
> encountering any major problems, even in the case of upgrades...
>
> Best,
> Hervé
>
> On 04/11/2023 12:57, Jarek Potiuk wrote:
> >
> > If you want to make sure released Airflow installs in a reproducible
> > way from scratch - now and in the future - the only way to
> > achieve that is described here:
> >
> >
> https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html
> >
> > It involves using constraints. It only works with `pip`. There are no
> > other ways, and this cannot easily be achieved with other tools, so we
> > strongly recommend you use `pip` when installing Airflow.
> >
>