Hi Jarek,

I really like the idea of having a slim Airflow Docker image. 500MB
uncompressed is tiny 👍


Thanks,

Ping


On Sun, May 1, 2022 at 8:41 AM Jarek Potiuk <[email protected]> wrote:

> And just to clarify. Those "slim" images are not at all "toothless". You
> can actually do stuff with them :)
>
> These 4 providers are preinstalled:
>
> apache-airflow-providers-ftp    | File Transfer Protocol (FTP) https://tools.ietf.org/html/rfc114             | 2.1.2
> apache-airflow-providers-http   | Hypertext Transfer Protocol (HTTP) https://www.w3.org/Protocols/            | 2.1.2
> apache-airflow-providers-imap   | Internet Message Access Protocol (IMAP) https://tools.ietf.org/html/rfc3501 | 2.2.3
> apache-airflow-providers-sqlite | SQLite https://www.sqlite.org/                                              | 2.1.3
>
> We could probably further slim them down but that would limit the
> extensibility a bit and I consider 500 MB uncompressed as pretty "decent" -
> it's ~ 130-160 MB of compressed data when you pull the image.
>
> J.
>
>
>
> On Sun, May 1, 2022 at 5:26 PM Jarek Potiuk <[email protected]> wrote:
>
>> Hello everyone,
>>
>> TL;DR: I am looking for consensus on releasing "slim" versions of PROD
>> images - ones that will be way smaller and contain no providers nor
>> other extras and would be database-specific.
>>
>> Context:
>>
>> Now that we are done with some infra changes that were also released
>> in 2.3.0, I came back to the issue raised in
>> https://github.com/apache/airflow/issues/20849 which was originally
>> about "vanilla" image for Airflow, but I renamed the idea to "slim"
>> image (following similar convention by various distro and Python
>> providers). The issue itself explains why there is a need for such
>> images.
>>
>> The idea is to have a very small "base" ("slim") image that users will
>> be able to extend, in addition to the "regular" (see the relation with
>> "slim" :D ?) image where we pre-install a set of providers and
>> support multiple database backends.
>>
>> The "slim" images also have the advantage that we can use
>> "no-constraints" dependencies with them - which means that in those
>> images, the dependencies are "latest" that airflow supports even if
>> some providers would limit the dependencies.
>>
>> I looked at what it would mean and really what it translates to is
>> that we would have to push many more images.
>>
>> The bad news:
>>
>> We need to push a matrix of 4 * 3 = 12 new "slim" images (plus some
>> aliases for "latest")
>> *  Python versions: 3.7, 3.8, 3.9, 3.10
>> *  Database: postgres, mysql, mssql
>>
>> Postgres images would additionally be multiplatform (AMD64/ARM64) and
>> for now MySQL and MSSQL would be just AMD64 (until we add support for
>> ARM for those).
>> That is plenty of images, but this is a rather normal approach if
>> you look at a number of other images published by multiple
>> "platform-like" products.
>>
>> The good news:
>>
>> We only need to do it at release time and we already have the right
>> set of scripts and parameters to enable that. It will take a bit
>> longer, but those images are much smaller, and building and pushing
>> them is WAY faster than for the regular image.
>>
>> Some comparison:
>>
>> Size (uncompressed): Regular (1.1G), Slim (500MB)
>> Time to build a single image: Regular (6m), Slim (up to 3m)
>>
>> Overall the release process would take some 20 mins longer if we
>> release the slim images (and I already made it a separate step so it
>> should not block "regular" release).
>>
>> The very good news:
>>
>> I've actually prepared a PR:
>> https://github.com/apache/airflow/pull/23391 to add this feature
>> (including the docs), and it's a very small change. It does not change
>> any of the source code of airflow or the Dockerfile; we basically need to
>> extend our "dev" script that builds and pushes images to ... build and push
>> more images. I even prepared and pushed 2.3.0 images of
>> airflow to my private dockerhub account so that everyone can see what
>> it will look like.
>>
>> You can see it here:
>>
>> https://hub.docker.com/repository/docker/potiuk/airflow/tags?page=1&ordering=last_updated&name=2.3.0
>>
>> I **believe** those changes don't even need PMC votes for release, as
>> this is more a procedural change than a software release, so we
>> **could** release the "slim" 2.3.0 images even now - so that they are
>> available as of 2.3.0. I think that if we see that this is a welcome
>> change (despite the added complexity of our available dockerhub images) it
>> could even be agreed to via lazy consensus if we see consensus
>> forming.
>>
>> J.
>>
>
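
For anyone who wants to check the "4 * 3 = 12" image matrix from the
quoted proposal, here is a minimal sketch in Python. The Python versions,
databases and per-database platforms come from the message above; the
function name, the tag format and the "linux/..." platform strings are
assumptions made up for illustration and are not the naming used by the
actual release scripts.

```python
from itertools import product

# Values taken from the quoted proposal.
PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10"]
DATABASES = ["postgres", "mysql", "mssql"]

# Postgres images are multiplatform; MySQL and MSSQL are AMD64 only for now.
# The "linux/..." strings are an assumed spelling of those platforms.
PLATFORMS = {
    "postgres": ["linux/amd64", "linux/arm64"],
    "mysql": ["linux/amd64"],
    "mssql": ["linux/amd64"],
}


def slim_image_matrix(airflow_version):
    """Return one entry per (Python version, database) combination.

    The tag format below is hypothetical and only illustrates the matrix.
    """
    return [
        {
            "tag": f"slim-{airflow_version}-python{py}-{db}",
            "platforms": PLATFORMS[db],
        }
        for py, db in product(PYTHON_VERSIONS, DATABASES)
    ]


matrix = slim_image_matrix("2.3.0")
multiplatform = [entry for entry in matrix if len(entry["platforms"]) > 1]
print(len(matrix), "images,", len(multiplatform), "of them multiplatform")
```

Under these assumptions, only the four Postgres entries carry more than
one platform; the rest are single-platform builds.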
